The Question
Spotify offers an enormous surface area for discovering music: algorithmic playlists, editorial curation, search, artist radio, social sharing, the home feed. But how do people actually navigate all of this? Do the platform’s discovery mechanisms match how real listeners think about finding music, or do they work around them, against them, despite them?
We set out to find out. Over the course of a semester, our team of five conducted contextual inquiries with eight Spotify users, observed them in their natural listening environments, built work models of their behaviors, synthesized findings through affinity diagramming, and designed three targeted interventions addressing the most significant pain points, prototyped at high fidelity for both desktop (macOS) and mobile (iOS).
Contextual Inquiry
We recruited eight participants spanning a range of ages (16–50), occupations (college students, software engineers, a nurse, a USPS maintenance worker), and listening contexts (commuting, studying, working out, driving, working from home, church). Each was observed using Spotify in their natural environment (not a lab) while a team member conducted a contextual inquiry, capturing not just what they did but why, and what the physical and social context demanded of them.
From each interview, we built four types of work models: flow models mapping how information and influence moved between the user, the app, and external sources; sequence models capturing the step-by-step structure of discovery and listening tasks; artifact models annotating the actual screens and interfaces participants interacted with; and physical models documenting how the environment shaped behavior (a phone in a car cupholder, a laptop on a desk with the library sidebar visible, AirPods at the gym).
We then ran three structured interpretation sessions where the full team reviewed each interview collaboratively, with rotating roles (interviewer, modeler, recorder, moderator) ensuring every perspective shaped the analysis.
What We Found
The findings were rich and sometimes surprising. Through affinity diagramming and cross-participant analysis, several patterns emerged that reframed how we understood the relationship between Spotify users and the platform’s discovery systems.
There is no singular “Spotify experience”
Despite using the same application, participants’ approaches to finding, organizing, and engaging with music differed dramatically. Discovery methods depended on goals, moods, and contexts, and almost no user discovered new music in only one way. This produced flow and sequence diagrams that branched extensively, reflecting the genuine complexity of real-world music interaction.
The Liked Songs junk drawer
The one near-universal behavior was reliance on the Liked Songs playlist. Its low friction (a single tap on the heart icon, no interruption to the current workflow) made it the path of least resistance for saving tracks. But this same frictionlessness created a downstream problem: Liked Songs became a catch-all storehouse rather than a curated collection. Participants accumulated hundreds or thousands of tracks without organization, and the continually growing size of the playlist made the prospect of organizing it feel increasingly daunting. The very simplicity that made Liked Songs the most dependable interface in the app was also what made it dysfunctional at scale.
Meta-skipping: gaming the algorithm
Our contextual inquiries uncovered an unexpected behavior pattern: users who deliberately skipped songs not because they disliked them, but to influence how Spotify’s recommendation algorithm interpreted their listening preferences. Some participants said they consciously avoided playing music they felt ambivalent about, specifically to prevent the algorithm from categorizing their taste too narrowly or flooding subsequent recommendations with similar content.
Skipping, in these cases, wasn’t feedback about the song; it was a strategic move to keep the algorithm in check. Users were “gaming the system” to maintain control over their own recommendation profile.
Bimodal algorithm sentiment
User sentiment about Spotify’s recommendation algorithm split cleanly into two camps. Users with favorable impressions appreciated how recommendations freed them from decision-making, surfaced new releases, and promoted discovery. These users actively engaged with Daily Mixes, Discover Weekly, homepage recommendations, and other algorithmic features.
Users with negative impressions expressed frustration that the algorithm “doesn’t get them,” recommending popular or repetitive tracks rather than content aligned with their niche or evolving tastes. They felt the algorithm couldn’t understand why they enjoyed specific songs, that skips and likes weren’t meaningfully factored into recommendations, and that the system prioritized globally popular content over personally relevant content. Several users mentioned encountering AI-generated songs surfaced by the algorithm, which they found disrespectful to music creators.
Critically, both groups wanted the same thing: more transparency and control. The satisfied users were satisfied by accident of taste alignment, not because the system was transparent. The dissatisfied users had no recourse, no way to directly correct an algorithm that had gone wrong.
External discovery as a workaround
Because of the limitations in Spotify’s own recommendation systems, many participants turned to external sources (friends, family, Instagram, YouTube, review sites like Rate Your Music) to discover new music. They often felt these sources understood their tastes better than Spotify did. Users would discover music elsewhere and then navigate to Spotify to listen, effectively using the platform as a playback tool rather than a discovery tool.
This pattern suggests that if Spotify’s internal recommendations were more effective, users might not need to leave the app to find music they actually enjoy.
Context shapes everything
Participants curated their music based on environment and activity. High-energy playlists for driving or working out; calm, instrumental, or lo-fi music for studying or deep work, because lyrics were described as distracting or mentally intrusive during cognitively demanding tasks. When relaxed, lyrical music became more prominent, associated with nostalgia, emotional expression, and narrative engagement.
Physical context imposed its own constraints: a phone in a car cupholder demanded eyes-free, minimal-interaction workflows (large playlists on shuffle, skip-heavy listening). The gym meant almost no interaction after the initial song selection. Studying meant selecting a playlist and not touching the app again. These contextual demands shaped not just what music users chose, but how much they were willing to engage with discovery at all.
Three Interventions
The research findings pointed toward three high-impact, addressable problems. Each redesign intervention maps directly to specific findings from the contextual inquiries.
1. Homepage Customization
Finding: Users found the homepage cluttered with irrelevant content: podcasts, audiobooks, globally popular artists they had no interest in. What users wanted to see varied widely, reflecting the diversity of individual musical tastes, but the homepage offered no way to shape it.
Intervention: A widget-based homepage system. Users can toggle widgets on or off, reorder them via drag-and-drop, customize data sources (e.g., swap “Popular Radio: Spotify Global” for “Popular Radio: Pitchfork” to surface genre-relevant content), and choose layout styles. The system includes widgets for Daily Mixes, popular radio, popular artists, recently played, new releases, podcasts, and a song organization widget that surfaces a few unsorted recently-saved songs for quick triage.
The design is non-destructive: “deleting” a widget toggles visibility rather than permanently removing it, so users can experiment freely. On mobile, the delete button is hidden to prevent accidental taps, and form inputs use standard iOS components.
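The non-destructive widget model can be sketched in Swift (the names and structure here are illustrative assumptions, not the actual prototype code): "deleting" only toggles visibility, and reordering is a simple move within the list.

```swift
import Foundation

// Hypothetical sketch of the widget-based homepage model: widgets can be
// hidden (non-destructive "delete") or reordered, but never removed.
struct HomeWidget {
    let kind: String        // e.g. "Daily Mixes", "New Releases"
    var dataSource: String  // e.g. "Spotify Global" or "Pitchfork"
    var isVisible: Bool = true
}

struct Homepage {
    var widgets: [HomeWidget]

    // "Deleting" a widget only toggles visibility, so users can experiment freely.
    mutating func toggle(_ kind: String) {
        if let i = widgets.firstIndex(where: { $0.kind == kind }) {
            widgets[i].isVisible.toggle()
        }
    }

    // Drag-and-drop reorder: move the widget at `from` to position `to`.
    mutating func move(from: Int, to: Int) {
        let w = widgets.remove(at: from)
        widgets.insert(w, at: to)
    }

    // Only visible widgets are rendered on the homepage.
    var visibleKinds: [String] { widgets.filter(\.isVisible).map(\.kind) }
}
```

Keeping hidden widgets in the array (rather than deleting them) is what makes the toggle reversible, which is the whole point of the non-destructive design.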
2. Recommendation Fine-Tuning
Finding: Users felt the recommendation algorithm was a black box: opaque, unresponsive to their feedback, and impossible to correct when it went wrong. Skipping and liking songs felt like indirect, uncertain ways of shaping recommendations. Users wanted a direct mechanism to tell the algorithm what they actually like and dislike.
Intervention: A dedicated Recommendation Fine-Tuning interface, accessible from the main navigation. The system generates two snapshot playlists reflecting its current understanding of the user: “Songs We Think You’ll Love” and “Songs That Aren’t Your Jam.” The user can review these, identify misclassifications, and enter a focused rating flow.
In the rating flow, the system presents songs one at a time with a 5-point scale (Strongly Dislike → Strongly Like) and, critically, explainability tags showing why the system predicted the user would like or dislike each song (“You like this genre,” “Your friends have listened to this,” “You liked similar songs”). When the system’s prediction diverges significantly from the user’s rating, an optional follow-up asks why, with options like “I like this artist but not this song,” “Somebody else was listening, not me,” “I was just trying it out,” or “I have to be in the right mood for it.”
On mobile, the rating interaction uses a Tinder-inspired swipe gesture (left to dislike, right to like) optimized for quick, low-effort input during casual use.
A text field for additional context (“I’m going through a breakup, I want more sad music”) allows natural-language steering of the algorithm.
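The rating flow's core logic is small enough to sketch in Swift; the type names and the divergence threshold (two or more points on the 5-point scale) are assumptions for illustration, not the prototype's actual values.

```swift
import Foundation

// Hypothetical model of the fine-tuning rating flow.
// 5-point scale: Strongly Dislike (1) → Strongly Like (5).
enum Rating: Int {
    case stronglyDislike = 1, dislike, neutral, like, stronglyLike
}

struct RatedSong {
    let title: String
    let predicted: Rating       // the system's current guess about the user
    let explanations: [String]  // explainability tags, e.g. "You like this genre"
}

// When the user's rating diverges significantly from the prediction
// (assumed here: 2+ points apart), show the optional "why?" follow-up.
func needsFollowUp(prediction: Rating, userRating: Rating) -> Bool {
    abs(prediction.rawValue - userRating.rawValue) >= 2
}
```

The explainability tags travel with each song so the interface can always answer "why was this predicted?", and the follow-up fires only on large mismatches, keeping the common case a single tap or swipe.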
3. Recently Saved Playlist & Archive
Finding: The Liked Songs playlist was universally used but universally dysfunctional at scale. Users saved songs to Liked because it was the easiest option in the moment, especially in constrained contexts like driving, but this created an ever-growing collection that was impossible to organize later. Worse, indiscriminate “liking” polluted the recommendation algorithm’s understanding of the user’s actual preferences.
Intervention: A new “Save for Later” action, distinct from “Like,” with its own first-class playlist called Recently Saved. This separates the low-commitment act of bookmarking a song from the higher-signal act of expressing genuine preference, preserving the recommendation algorithm’s integrity while giving users a frictionless way to capture songs they want to revisit.
Recently Saved operates on a rolling 30-day window. Songs older than 30 days are automatically moved to an Archive playlist (not deleted, just deprioritized). This keeps the active playlist compact and manageable, creating a low-stakes sense of urgency to organize without punishing users who don’t get around to it. A “Moving to Archive Soon” section at the top of the playlist highlights songs approaching the window, and the homepage Song Organization widget periodically surfaces a handful of unsorted songs for quick triage, keeping the task bite-sized rather than overwhelming.
Use Case Design
Each intervention was formalized as a detailed use case specification, not just happy-path descriptions, but documents that model frequency of use, alternative courses, exceptions, special requirements, and open questions.
The recommendation fine-tuning use case, for example, includes a frequency-of-use model predicting divergent adoption patterns: one user group that never needs the feature (their recommendations are already good), and another that exhibits an initial spike of corrective activity followed by gradual decline as the algorithm converges on their preferences, with periodic returns when taste drifts over time. Special requirements include non-destructiveness (users must be able to experiment without fear of degrading performance), explainability (the system must surface why it made its predictions), and latency transparency (users must understand whether the fine-tuned model is in place yet or still updating).
Native Proof-of-Concept
In addition to the high-fidelity prototypes, I built a native SwiftUI proof-of-concept for both iOS and macOS that implemented the redesigned navigation and interaction flows. The app doesn’t play music (it’s a UI shell) but it demonstrates how the new features integrate into a native Spotify-like experience with platform-appropriate controls and interactions.
My Role
This was a five-person team project across the full semester. I served as a primary contributor across all phases. In the research phase, I conducted contextual inquiries, participated in all interpretation sessions, and co-authored the key findings and analysis, working particularly closely with Anisa Callis, whose research instincts and writing complemented my own throughout the synthesis process. In the design phase, I authored all three use case specifications, including the frequency-of-use modeling and special requirements analysis. I designed and built the high-fidelity prototypes for both desktop and mobile, and independently developed the native SwiftUI proof-of-concept.
Why This Matters
This project is a complete walk through the contextual design process, from field observation to work modeling to consolidation to redesign, applied to a product used by hundreds of millions of people. The value isn’t in proposing flashy features. It’s in the rigor of tracing each design decision back to observed behavior: real people, in real environments, doing real things that the current product doesn’t adequately support.
The meta-skipping finding, the junk drawer dynamic, the bimodal algorithm sentiment: these aren’t insights you get from surveys or analytics dashboards. They come from sitting with someone while they use the app and asking “why did you just do that?” Contextual inquiry surfaces the invisible workarounds, the quiet frustrations, the strategic behaviors that users develop when a system doesn’t quite fit. And those are exactly the insights that lead to designs people actually want.
HCIN 730, User-Centered Design Methods · Prof. Lawrence Roth · Rochester Institute of Technology · Fall 2025