Recommender Systems
Introduction to Recommender Systems
Roughly 80% of what people watch on Netflix starts from a recommendation rather than a search. It is the algorithm's choice, not a conscious one. TikTok's For You feed reportedly generates 30 billion dollars in revenue - there the algorithm matters more than any single piece of content. Amazon: a widely cited estimate attributes 35% of all sales to recommendations such as "similar items". A recommender system is not a product feature. It is the business model.
- **Netflix** saves an estimated 1 billion dollars per year through recommendations - users stay because they always find something worth watching (the lineage goes back to GroupLens in 1994, one of the first automated collaborative filters)
- **Amazon** generates 35% of revenue through "Customers who bought this also bought..." - item-based collaborative filtering, which the company described publicly back in 2003
- **Spotify Discover Weekly** - a personalized playlist of 30 songs, generated by a hybrid of collaborative filtering and audio analysis, for each of 500M users (30M+ plays in the first week after launch)
- **TikTok For You** - a hybrid system with reinforcement learning: 70% of users' time on the platform is driven by the recommendation engine, not by the accounts they follow
Prerequisites
- Vectors and vector spaces - user and item profiles are represented as vectors
- Basic probability - estimating how likely a user is to enjoy an item
- What machine learning is - recommendation models are trained on interaction data
Tapestry and GroupLens: the Birth of Collaborative Filtering
The term "collaborative filtering" was coined in 1992 at Xerox PARC, where David Goldberg and colleagues built Tapestry, a system that filtered email by letting users annotate messages and react to each other's annotations. In 1994 the GroupLens project (Paul Resnick, John Riedl and others at MIT and the University of Minnesota) applied the idea to Usenet news, predicting how interesting an article would be from the ratings of like-minded readers. These two projects established the core insight behind every modern recommender: people with shared past tastes are good predictors of each other's future tastes. Everything from Amazon to Netflix grew from that seed.
Content-Based Filtering: Recommendations from Item Features
Netflix is open. A film appears on screen that nobody explicitly chose. The algorithm chose it - and got it right. **Content-based filtering** works without knowing anything about other people: it looks only at the features of the content itself. Three Christopher Nolan sci-fi films in the watch history? The system builds a preference vector and searches the catalog for matches by genre, tags, and director.
Every film, book, or track is described by a set of features: genre, author, year, keywords. The user gets a **profile** too - a vector assembled from the features of items they liked. The rest is simple: find catalog entries whose feature vector is closest to that profile. Nearest neighbor by cosine distance - next recommendation.
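The profile-and-nearest-neighbor idea can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the four-genre feature layout, the genre labels given to each film, and the helper names `build_profile` and `recommend` are all assumptions made for the example.

```python
import math

# Toy catalog. Feature order (an assumption): [sci-fi, thriller, comedy, documentary]
CATALOG = {
    "Interstellar": [1, 0, 0, 0],
    "Inception": [1, 1, 0, 0],
    "Tenet": [1, 1, 0, 0],
    "Arrival": [1, 0, 0, 0],
    "Superbad": [0, 0, 1, 0],
    "Free Solo": [0, 0, 0, 1],
}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_profile(liked):
    """User profile = element-wise mean of the liked items' feature vectors."""
    vectors = [CATALOG[title] for title in liked]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(liked, k=2):
    """Rank unseen catalog items by similarity to the user's profile."""
    profile = build_profile(liked)
    unseen = [title for title in CATALOG if title not in liked]
    return sorted(unseen, key=lambda t: cosine(profile, CATALOG[t]), reverse=True)[:k]

history = ["Interstellar", "Inception", "Tenet"]
print(build_profile(history))
print(recommend(history, k=1))  # the only unseen sci-fi title ranks first
```

Averaging the liked vectors is only one way to build a profile; weighting by rating or recency would be equally reasonable choices.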
For text descriptions the key tool is **TF-IDF (Term Frequency - Inverse Document Frequency)**. It converts text into a numerical vector, weighting words by rarity. The word "film" appears everywhere - near-zero weight. The word "cyberpunk" is rare and informative - high weight. Rare words carry the meaning.
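To make the weighting concrete, here is a small hand-rolled sketch using the textbook form tf × log(N / df). The three one-line descriptions are invented for the example, and production libraries typically use smoothed variants of this formula, so exact numbers will differ.

```python
import math
from collections import Counter

# Invented one-line descriptions, used only for illustration.
docs = {
    "Blade Runner": "a cyberpunk film about a detective",
    "Notting Hill": "a romantic film about a bookseller",
    "Heat": "a crime film about a detective",
}

def tfidf(docs):
    """Return {title: {word: weight}} using tf * log(N / df)."""
    tokenized = {title: text.split() for title, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many descriptions does each word occur?
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    weights = {}
    for title, words in tokenized.items():
        counts = Counter(words)
        weights[title] = {
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return weights

w = tfidf(docs)
print(w["Blade Runner"]["film"])       # appears in every description
print(w["Blade Runner"]["cyberpunk"])  # appears in only one description
```

With this unsmoothed formula a word that occurs in every document gets a weight of exactly zero, which is the extreme case of the "film" example above.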
**Cosine similarity** measures the angle between two vectors - not distance, but direction. A value of 1.0 means perfectly aligned directions (ideal similarity); 0.0 means orthogonal vectors (nothing in common). Formula: cos(θ) = (A · B) / (||A|| × ||B||). Normalizing by length makes it fair to compare a film described by 10 genres and one described by three.
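The effect of length normalization is easiest to see next to a plain dot product. The sketch below assumes a hypothetical 10-genre binary encoding; the vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||); 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical encoding: one slot per genre, 10 genres in total.
profile = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # user likes the first three genres
focused = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # film tagged with exactly those three
broad   = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]   # film tagged with all ten genres

# The raw dot product cannot tell the two films apart...
print(dot_product(profile, focused), dot_product(profile, broad))
# ...while cosine similarity penalizes the film that matches everything.
print(cosine_similarity(profile, focused), cosine_similarity(profile, broad))
```

Both films share three genres with the profile, but only the focused one points in the same direction, so it scores close to 1.0 while the broad one scores roughly 0.55.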
| Advantage | Disadvantage |
|---|---|
| No data from other users required | Limited to what the user has already enjoyed |
| Transparent recommendations (explainable) | Does not surface new genres (filter bubble) |
| Works for a new user as soon as they have some history of their own | Requires high-quality item descriptions |
| No cold start problem for items | Doesn't draw on the opinions of other users |
**The filter bubble** is the main trap of the content-based approach. Only watched action films? The system only recommends action films. A documentary that could have changed everything never appears. The algorithm loops on the history it already knows.
A user has watched three films - all sci-fi thrillers. A content-based system recommends a fourth. What criterion does it use?
Collaborative Filtering: The Wisdom of the Crowd
What if we throw out item features entirely? No genre, no director, no tags - only **people's behavior**. If Alice and Bob have rated 50 films almost identically, and Bob also watched Dune and gave it 5 stars, Alice will probably love it too. This is **collaborative filtering** - recommendations from collective experience, with no item descriptions at all.
The foundation is a **rating matrix**: rows are users, columns are items, cells contain ratings. Most cells are empty (the user has not seen the item). The goal is to **fill in the blanks** - predict how a user would rate something they have not yet watched.
Two main variants:
- **User-based CF** - finds users with similar ratings and uses their scores to predict. "People like you also watched..."
- **Item-based CF** - finds items that were rated similarly. "People who liked this film also watched..."

Item-based CF is generally more stable: people's tastes change, but the relationships between items change far more slowly. Amazon published its item-to-item approach back in 2003.
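A user-based prediction can be sketched as follows. The ratings are toy data, and two design choices here are assumptions rather than anything prescribed above: similarity is a Pearson correlation over co-rated items, and only positively correlated neighbors contribute to the weighted average.

```python
import math

# Toy rating matrix stored sparsely: user -> {item: rating}.
ratings = {
    "alice": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob":   {"Alien": 5, "Heat": 4, "Up": 1, "Dune": 5},
    "carol": {"Alien": 1, "Heat": 2, "Up": 5, "Dune": 2},
}

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    dev_u = [ratings[u][i] - mean_u for i in common]
    dev_v = [ratings[v][i] - mean_v for i in common]
    denom = math.sqrt(sum(x * x for x in dev_u)) * math.sqrt(sum(y * y for y in dev_v))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(dev_u, dev_v)) / denom

def predict(user, item):
    """Similarity-weighted average of ratings from like-minded neighbors."""
    weighted_sum = weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = pearson(user, other)
        if sim <= 0:          # ignore users with unrelated or opposite taste
            continue
        weighted_sum += sim * their_ratings[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None

print(predict("alice", "Dune"))  # driven by bob, whose taste matches alice's
```

Carol's ratings run opposite to Alice's, so her low score for Dune is ignored and the prediction follows Bob.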
**The sparsity problem:** In real systems the rating matrix is typically only a few percent filled, and often far less. Take a Netflix-scale service with ~500M user profiles and ~15,000 titles: that is 7.5 trillion cells, of which only a small fraction contain a rating. Cosine similarity between two users who have co-rated only 2 films is statistically meaningless.
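The arithmetic, and the danger of trusting tiny overlaps, can be checked directly. In this sketch the 75 billion rating count and the `min_overlap` threshold of 5 are hypothetical values picked for illustration.

```python
import math

n_users, n_items = 500_000_000, 15_000
cells = n_users * n_items
print(f"{cells:,} cells")  # 7.5 trillion

def density(n_ratings, n_users, n_items):
    """Fraction of the rating matrix that is actually filled."""
    return n_ratings / (n_users * n_items)

# Hypothetical: 75 billion stored ratings would fill 1% of the matrix.
print(density(75_000_000_000, n_users, n_items))

def cosine_on_overlap(a, b, min_overlap=1):
    """Cosine over co-rated items; None if the overlap is below min_overlap."""
    common = sorted(set(a) & set(b))
    if len(common) < min_overlap:
        return None
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

alice = {"Alien": 4, "Up": 2, "Heat": 5}
bob = {"Alien": 2, "Up": 1, "Dune": 5}

naive = cosine_on_overlap(alice, bob)                   # "perfect" match from 2 films
guarded = cosine_on_overlap(alice, bob, min_overlap=5)  # refuses to answer
print(naive, guarded)
```

Two co-rated films are enough to produce a similarity of about 1.0, which is why practical systems tend to require a minimum overlap or shrink similarities computed from little evidence.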
**Item-based vs user-based in practice:** Item-based CF dominates in production. The reason: there are typically far fewer items than users (100K films vs 500M users), so the item-item matrix is more compact. Relationships between items are also more stable - a film does not change its genre, while user tastes drift over time.
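An item-item similarity table can be sketched like this. The ratings are toy data and the raw cosine over co-rating users is a simplification chosen for brevity; mean-centering the ratings first is a common refinement.

```python
import math
from collections import defaultdict

# Toy data: user -> {item: rating}.
ratings = {
    "u1": {"Alien": 5, "Aliens": 5, "Up": 1},
    "u2": {"Alien": 4, "Aliens": 5, "Up": 2},
    "u3": {"Alien": 1, "Aliens": 2, "Up": 5},
    "u4": {"Alien": 5, "Up": 1},
}

def item_columns(ratings):
    """Transpose the matrix: item -> {user: rating}."""
    columns = defaultdict(dict)
    for user, row in ratings.items():
        for item, rating in row.items():
            columns[item][user] = rating
    return columns

def item_similarity(col_a, col_b):
    """Cosine similarity over the users who rated both items."""
    common = sorted(set(col_a) & set(col_b))
    if not common:
        return 0.0
    dot = sum(col_a[u] * col_b[u] for u in common)
    norm_a = math.sqrt(sum(col_a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(col_b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def most_similar(item, k=1):
    """Items ranked by similarity to `item`; can be precomputed offline."""
    columns = item_columns(ratings)
    scores = [
        (other, item_similarity(columns[item], columns[other]))
        for other in columns if other != item
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

print(most_similar("Alien"))
```

Because the table depends only on items, it can be rebuilt on a schedule and looked up at request time, which is part of why the item-based variant suits production use.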
The Netflix rating matrix is less than 2% filled. What problem does this create for collaborative filtering?
Hybrid Systems: The Best of Both Worlds
2006. Netflix announces a competition: one million dollars to whoever can improve the algorithm by 10%. Three years. Thousands of teams. The 2009 winner - "BellKor's Pragmatic Chaos" - delivered a blend of hundreds of models; the 2007 Progress Prize entry alone combined **107 algorithms**. No single approach produced the required improvement on its own. The combination covered the blind spots of each.
**Hybrid recommender systems** combine multiple approaches. Content-based works well for new items - they have features. Collaborative works well for users with history - neighbor ratings exist. Together they cover each other's weaknesses.
| Strategy | How it works | Example |
|---|---|---|
| Weighted | Linear combination: score = α × CB + (1-α) × CF | score = 0.3 × content + 0.7 × collab |
| Switching | Switch method based on context: if CF is not possible → use CB | New user → content-based; otherwise → collaborative |
| Cascade | Pipeline: CB selects candidates → CF ranks them | Genre sci-fi → then rank by ratings |
| Feature Augmentation | One method enriches the input of the other | CF → latent features → input for CB model |
| Meta-level | One method builds a model; the other uses it | CB builds user profile → input for CF |
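The first two strategies in the table are simple enough to write out. This is a minimal sketch: the function names, the `min_ratings` threshold, and the assumption that both scores are already on the same 0-1 scale are illustrative choices.

```python
def weighted(cb_score, cf_score, alpha=0.3):
    """Weighted hybrid: score = alpha * CB + (1 - alpha) * CF."""
    return alpha * cb_score + (1 - alpha) * cf_score

def switching(cb_score, cf_score, n_ratings, min_ratings=5):
    """Switching hybrid: fall back to CB when CF has too little to go on.

    `min_ratings` is a hypothetical threshold for "enough history".
    """
    if cf_score is None or n_ratings < min_ratings:
        return cb_score
    return cf_score

print(weighted(0.8, 0.6))               # 0.3 * 0.8 + 0.7 * 0.6
print(switching(0.8, None, n_ratings=0))   # new user -> content-based score
print(switching(0.8, 0.6, n_ratings=20))   # established user -> collaborative score
```

In practice the two component scores usually need to be normalized before they are mixed, otherwise α does not mean what it appears to mean.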
Netflix Prize: $1M for a 10% Improvement
The Netflix Prize (2006-2009) was a watershed moment for recommender systems. Netflix's own algorithm - Cinematch - had an RMSE of 0.9525. The prize required reducing it to 0.8572 (a 10% improvement). Thousands of teams around the world competed for three years. The winners blended hundreds of models into a single ensemble; the 2007 Progress Prize entry alone combined **107 distinct algorithms**. A notable footnote: Netflix never deployed the final winning ensemble - the accuracy gain did not justify the engineering effort of running it in production. But the competition spawned an entire generation of research.
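RMSE (root mean squared error) and the 10% target are easy to reproduce. The sketch below uses the two figures quoted above; the sample predictions are invented.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Invented example: three predictions against three true ratings.
print(rmse([4, 3, 5], [5, 3, 3]))

cinematch_rmse = 0.9525
target_rmse = cinematch_rmse * (1 - 0.10)  # a 10% reduction
print(round(target_rmse, 5))               # close to the 0.8572 threshold
```

Because errors are squared before averaging, RMSE punishes a few large misses more heavily than many small ones.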
**In practice, virtually every large-scale system is a hybrid.** YouTube combines collaborative filtering (what similar users watched) with content-based (video tags, descriptions) and contextual signals (time of day, device, country). Spotify uses collaborative (user playlists), content-based (audio features), and NLP (lyrics, playlist descriptions).
**Cascade hybrid is the most practical starting point.** The first stage (retrieval) quickly selects 100-1,000 candidates using a fast, simple method. The second stage (ranking) precisely reorders the candidates with a sophisticated model. This balances speed against accuracy.
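The two stages can be sketched as a pair of small functions. The catalog, the `avg_rating` values, and the use of average rating as a stand-in for the "sophisticated model" are all assumptions made to keep the example short.

```python
# Toy catalog with made-up average ratings.
CATALOG = [
    {"title": "Arrival", "genre": "sci-fi", "avg_rating": 4.4},
    {"title": "Dune", "genre": "sci-fi", "avg_rating": 4.6},
    {"title": "Heat", "genre": "crime", "avg_rating": 4.5},
    {"title": "Moon", "genre": "sci-fi", "avg_rating": 4.1},
]

def retrieve(catalog, genre, limit=1000):
    """Stage 1 (retrieval): cheap filter that keeps at most `limit` candidates."""
    return [item for item in catalog if item["genre"] == genre][:limit]

def rank(candidates, score, k=10):
    """Stage 2 (ranking): order the survivors with a costlier scoring function."""
    return sorted(candidates, key=score, reverse=True)[:k]

candidates = retrieve(CATALOG, "sci-fi")
top = rank(candidates, score=lambda item: item["avg_rating"], k=2)
titles = [item["title"] for item in top]
print(titles)
```

The expensive scoring function only ever sees the retrieved candidates, so its cost scales with `limit` rather than with the size of the catalog.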
A Netflix recommendation system surfaces different films depending on the time of day and context. Which hybrid strategy is being used?
Cold Start: The Newcomer Problem
A new user signs up for Spotify. Zero plays, zero likes, zero playlists. Collaborative filtering is helpless - no data to compare against anyone. Content-based is also helpless - no preference profile. This is the **cold start problem** - one of the most fundamental challenges in recommender systems. Data is needed to give recommendations. Recommendations are needed to gather data. A closed loop.
Cold start comes in three forms:
- **New user cold start** - a new user with no history
- **New item cold start** - a new item that nobody has rated yet
- **New system cold start** - a new platform with no data at all

Each form demands its own strategy, and every major service addresses it differently.
| Strategy | Cold start type | How it works |
|---|---|---|
| Popularity-based | User / System | Recommend the most popular items - a "safe" baseline choice |
| Demographic | User | Use age, country, and language for initial recommendations |
| Onboarding | User | Ask the user to rate 5-10 items at registration |
| Content features | Item | Describe the new item through features - genre, tags, author |
| Bandits (Explore/Exploit) | User / Item | Show varied content at random and learn quickly from reactions |
**Explore vs Exploit (Multi-Armed Bandit):** Advanced systems use bandit algorithms for cold start. The idea: occasionally show the user varied items at random to learn from their reactions (explore), and otherwise show whatever is currently estimated to work best (exploit). The epsilon-greedy algorithm with ε = 0.1 shows the current best option 90% of the time and a random item 10% of the time. After just 10-20 interactions, the system can already have a usable picture of the user's tastes.
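An epsilon-greedy bandit fits in a small class. This is a sketch under stated assumptions: the three genres, their hidden like-probabilities, and the binary like/skip reward are invented to simulate a user, and the class name `EpsilonGreedy` is hypothetical.

```python
import random

class EpsilonGreedy:
    """Explore with probability epsilon, otherwise exploit the best-known item."""

    def __init__(self, items, epsilon=0.1, seed=None):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {item: 0 for item in self.items}
        self.reward_sum = {item: 0.0 for item in self.items}

    def mean_reward(self, item):
        n = self.shows[item]
        return self.reward_sum[item] / n if n else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)          # explore: random item
        return max(self.items, key=self.mean_reward)    # exploit: current best

    def update(self, item, reward):
        self.shows[item] += 1
        self.reward_sum[item] += reward

# Simulated user: hidden like-probabilities, made up for this demo.
TRUE_LIKE_PROB = {"pop": 0.2, "jazz": 0.8, "metal": 0.1}
env = random.Random(0)

bandit = EpsilonGreedy(TRUE_LIKE_PROB, epsilon=0.1, seed=1)
for _ in range(5000):
    item = bandit.choose()
    liked = 1.0 if env.random() < TRUE_LIKE_PROB[item] else 0.0
    bandit.update(item, liked)

most_shown = max(bandit.shows, key=bandit.shows.get)
print(most_shown, bandit.shows)
```

With a fixed ε the system never stops exploring, which wastes some impressions on known-bad items; decaying ε over time or switching to methods such as UCB or Thompson sampling are common alternatives.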
**New item cold start is equally serious.** A new song uploaded by an unknown artist on Spotify will not be recommended through collaborative filtering - nobody has listened to it yet. For this reason, Spotify analyzes the song's **audio features** (tempo, energy, danceability) via content-based methods, as well as its lyrics and metadata. Without this, new content falls into a silence trap: no recommendations - no plays - no recommendations.
**Onboarding is the most effective remedy for user cold start.** Netflix asks new users to rate several films. Spotify asks them to choose favorite artists. Pinterest asks them to mark topics of interest. Even 3-5 signals from a user radically improve recommendation quality compared to a pure popularity-based approach.
**Misconception:** Collaborative filtering is always better than content-based because it uses data from real users.
**Reality:** The choice between CF and CB depends on context: data density, the proportion of new items, and the specific task. Under sparse data and cold start conditions, content-based can outperform collaborative.
CF requires a sufficient number of ratings to produce reliable similarity estimates. On a platform with millions of items and few active users (e.g., e-commerce with a long-tail catalog), the rating matrix is extremely sparse, and CF only "sees" popular items. CB works for every item that has a description, even if nobody has rated it yet. In practice, the best results come from a hybrid that adapts to the volume of available data.
A new song by an unknown artist appears on a music platform. Which strategy is most effective for recommending it?
Key Takeaways
- **Content-based filtering** recommends based on item features (genre, author, TF-IDF). Transparent and explainable, but creates a filter bubble - it loops on the user's own history.
- **Collaborative filtering** taps into collective behavior: user-based (similar people) and item-based (similar items). More powerful, but data-hungry and vulnerable to sparsity.
- **Hybrid systems** combine approaches (weighted, switching, cascade). The Netflix Prize demonstrated: a blend of 100+ algorithms beat any single method.
- **Cold start** is a fundamental challenge: without data about a user or an item, meaningful recommendations are impossible. Solutions: popularity, onboarding, content features, bandits.
Related Topics
Recommender systems draw on linear algebra, similarity metrics, and machine learning techniques:
- Collaborative Filtering in Depth — A deeper look at user-based and item-based approaches, prediction formulas, and metric selection
- Matrix Factorization — SVD and ALS - advanced collaborative filtering methods that decompose the rating matrix
Questions for Reflection
- Consider a high-traffic service (YouTube, Spotify, TikTok). What type of recommendations does it most likely use - content-based, collaborative, or hybrid? What signals suggest this?
- The filter bubble is a serious issue: recommendation systems narrow a user's horizons. How can a system be designed to balance accuracy with diversity of recommendations?
- For a new streaming service launching from scratch (0 users, 10,000 films), which cold start strategy fits best and why?
Related Lessons
- rec-02 — Collaborative filtering and matrix factorization
- ml-01-intro — ML as foundation for training recommendation models
- la-01-vectors-intro — Vector spaces for user/item embeddings
- prob-04-bayes — Bayesian methods for preference estimation
- ir-01 — Search and recommendations - two sides of relevance
- ml-14-knn — kNN - simplest recommendation algorithm