Recommender Systems
Multi-Objective and Re-Ranking
Spotify Discover Weekly - 30 personalized tracks every Monday. The first version (2015) delivered very similar tracks: high relevance, zero variety. Users listened to the first few and closed it. After adding diversity-aware re-ranking, completion rate grew dramatically - the playlist began to "feel like discovery" rather than "more of the same".
- **Spotify Discover Weekly** - diversity re-ranking to balance personalization and discovery of new music
- **LinkedIn Feed** - fairness constraints: new content creators get guaranteed minimum exposure regardless of initial engagement
- **TikTok For You Page** - epsilon-greedy-style exploration: every Nth video is an exploration slot for new content with no history
Prerequisites
- Ranking and scoring of candidates
- Embeddings and cosine similarity
- Basic probability (Beta distribution, expectation)
From Learning to Rank to Multi-Task Ranking
In 2005, Chris Burges and colleagues at Microsoft Research published RankNet, a neural learning-to-rank model trained on pairwise preferences with a probabilistic cost. RankNet went on to power ranking in Microsoft's web search (the product later rebranded as Bing) and started the modern learning-to-rank era, later followed by LambdaRank and LambdaMART. The next leap was learning many objectives at once. In 2018 Jiaqi Ma and colleagues at Google introduced Multi-gate Mixture-of-Experts (MMoE), where shared experts are combined through per-task gating networks so that loosely related objectives interfere with each other less. In 2019 Zhe Zhao and the YouTube team applied multi-task ranking in production, predicting engagement and satisfaction objectives together and adding a shallow tower to correct position bias. That architecture is why a single ranking model can supply separate click, watch-time, and satisfaction scores that are then balanced at the re-ranking stage.
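To make the "shared experts, per-task gates" idea concrete, here is a minimal forward-pass sketch in plain Python. It is an illustration under simplifying assumptions, not the published architecture in full: each expert is a single ReLU layer, each task tower is a single linear layer with a sigmoid, the weights are random rather than trained, and all names (`mmoe_forward`, `expert_weights`, `gate_weights`, `tower_weights`) are hypothetical.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_forward(x, expert_weights, gate_weights, tower_weights):
    """Return one probability per task from shared experts and per-task gates."""
    # Shared experts: every task sees the same expert outputs.
    expert_outputs = [[max(0.0, h) for h in linear(W, x)] for W in expert_weights]
    hidden = len(expert_outputs[0])
    predictions = {}
    for task, G in gate_weights.items():
        # Task-specific gate: one mixing weight per expert, summing to 1.
        gate = softmax(linear(G, x))
        mixed = [
            sum(g * out[d] for g, out in zip(gate, expert_outputs))
            for d in range(hidden)
        ]
        # Task tower (assumed here to be a single linear layer plus sigmoid).
        logit = sum(w * m for w, m in zip(tower_weights[task], mixed))
        predictions[task] = 1.0 / (1.0 + math.exp(-logit))
    return predictions


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


n_features, n_experts, hidden_size = 4, 3, 5
tasks = ("click", "satisfaction")
expert_weights = [rand_matrix(hidden_size, n_features) for _ in range(n_experts)]
gate_weights = {t: rand_matrix(n_experts, n_features) for t in tasks}
tower_weights = {t: [rng.uniform(-1, 1) for _ in range(hidden_size)] for t in tasks}

predictions = mmoe_forward([0.2, -0.5, 1.0, 0.3], expert_weights, gate_weights, tower_weights)
```

Because each task has its own gate, two tasks can lean on different experts for the same input, which is what lets loosely related objectives share a model.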
Diversity: why 10 similar recommendations are worse than 10 varied ones
Early versions of Spotify Discover Weekly delivered 30 tracks very similar to each other - same genre, same tempo. Users listened to the first 5 and closed the playlist. After introducing diversity-aware re-ranking, the playlist began to "feel fresh" - and completion rate grew. Relevance and diversity are different quality axes.
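**Maximal Marginal Relevance (MMR)** is an iterative selection algorithm that balances relevance and novelty. At each step it selects the candidate that maximizes lambda * relevance - (1 - lambda) * (maximum similarity to any already-selected item). With lambda=1 this reduces to plain relevance ranking; with lambda=0 only dissimilarity to the selected items matters.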
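The selection loop can be sketched in a few lines. This is a minimal illustration that assumes the standard MMR weighting (lambda on relevance, 1 - lambda on the similarity penalty) and cosine similarity over item embeddings; the function names and the toy tracks are made up for the example.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_rerank(candidates, relevance, embeddings, k, lam=0.7):
    """Greedy MMR: pick k items, each maximizing
    lam * relevance - (1 - lam) * max similarity to already-selected items."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def mmr_score(item):
            # The first pick has nothing to be similar to, so the penalty is 0.
            max_sim = max(
                (cosine(embeddings[item], embeddings[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[item] - (1 - lam) * max_sim

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: three near-identical rock tracks and one less relevant jazz track.
relevance = {"rock_a": 0.95, "rock_b": 0.93, "rock_c": 0.90, "jazz_a": 0.80}
embeddings = {
    "rock_a": [1.0, 0.0],
    "rock_b": [0.99, 0.05],
    "rock_c": [0.98, 0.10],
    "jazz_a": [0.0, 1.0],
}

by_relevance = mmr_rerank(list(relevance), relevance, embeddings, k=2, lam=1.0)
balanced = mmr_rerank(list(relevance), relevance, embeddings, k=2, lam=0.5)
```

With `lam=1.0` the top two are both rock tracks; with `lam=0.5` the second slot goes to the jazz track, because the near-duplicate rock tracks are penalized for their similarity to the first pick.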
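**Submodular optimization:** diversity objectives are often submodular: each additional item yields a diminishing marginal gain in variety. For monotone submodular objectives under a cardinality constraint, the greedy algorithm is guaranteed to reach at least (1-1/e) ≈ 63% of the optimal value in polynomial time. MMR follows the same greedy pattern, although the formal guarantee applies only when its objective satisfies those conditions.
What does MMR with lambda=0 (the minimum value) prioritize?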
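A small example of a monotone submodular objective is tag coverage: the value of a playlist is the number of distinct tags it covers. The sketch below runs the greedy algorithm on that objective and records the marginal gain of each pick. The tracks and tags are invented for illustration, and coverage is used here as a stand-in for a diversity objective rather than as MMR itself.

```python
def greedy_max_coverage(item_tags, k):
    """Greedily pick up to k items, each adding the most not-yet-covered tags."""
    covered = set()
    chosen = []
    gains = []
    remaining = dict(item_tags)
    for _ in range(min(k, len(remaining))):
        # Marginal gain of an item = number of new tags it would add.
        best = max(remaining, key=lambda item: len(remaining[item] - covered))
        gains.append(len(remaining[best] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, gains


item_tags = {
    "track_1": {"rock", "indie", "guitar"},
    "track_2": {"rock", "indie"},
    "track_3": {"jazz", "piano"},
    "track_4": {"rock", "piano"},
}

chosen, gains = greedy_max_coverage(item_tags, k=3)
```

The recorded gains never increase from one pick to the next, which is the diminishing-returns property the guarantee relies on.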
Fairness: equity for providers and users
Spotify found that 90% of listens go to 1% of artists - even among tracks of equal quality. New artists receive almost no exposure. This is a **provider fairness** problem - inequity for content creators. A **user fairness** problem arises when the system systematically serves minority groups of users worse (for example, users with niche tastes or non-English speakers).
| Fairness type | Who suffers | Metric | Solution |
|---|---|---|---|
| Provider fairness | New/niche artists | Exposure per group | Min-exposure constraints |
| User fairness | Users with niche tastes | Error rate by group | Group-specific calibration |
| Disparate impact | Protected categories | DI ratio (>= 0.8) | Post-processing re-ranking |
| Popularity bias | Unpopular items | Long-tail coverage | Exploration boost for new items |
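Two of the metrics in the table can be computed directly. The sketch below assumes a logarithmic position discount for exposure (the same discount used in DCG, chosen here for illustration) and defines the disparate impact ratio as the selection rate of the protected group divided by that of the reference group. The group labels and counts are made up.

```python
import math


def exposure_by_group(ranking, group_of):
    """Sum position-discounted exposure, 1 / log2(rank + 1), per provider group."""
    totals = {}
    for rank, item in enumerate(ranking, start=1):
        group = group_of[item]
        totals[group] = totals.get(group, 0.0) + 1.0 / math.log2(rank + 1)
    return totals


def disparate_impact_ratio(selected_protected, total_protected,
                           selected_reference, total_reference):
    """Selection rate of the protected group relative to the reference group."""
    rate_protected = selected_protected / total_protected
    rate_reference = selected_reference / total_reference
    return rate_protected / rate_reference


ranking = ["a", "b", "c", "d"]
group_of = {"a": "established", "b": "established", "c": "new", "d": "new"}
exposure = exposure_by_group(ranking, group_of)

# Hypothetical counts: 12 of 100 new-artist tracks were recommended,
# versus 20 of 100 tracks by established artists.
di = disparate_impact_ratio(12, 100, 20, 100)
passes_four_fifths = di >= 0.8
```

In this toy case the ratio is about 0.6, below the 0.8 threshold from the table, so the new-artist group is recommended at well under four fifths of the reference rate.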
What does a disparate impact ratio below 0.8 mean?
Exploration-Exploitation: bandits for new items
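Cold-start problem: a new track on Spotify has no interaction history, so collaborative filtering gives it a zero score and it never surfaces. Epsilon-greedy recommends a random item with probability ε (exploration) and the best item by current estimate with probability 1-ε (exploitation). Thompson Sampling is a Bayesian alternative with no fixed ε: it keeps a posterior over each item's success rate (for example, a Beta distribution), draws one sample per item, and recommends the item with the highest sample.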
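Both strategies fit in a few lines. This sketch assumes binary feedback (click or skip) and a uniform Beta(1, 1) prior for Thompson Sampling; the item names and counts are invented.

```python
import random


def epsilon_greedy_pick(estimated_ctr, epsilon, rng):
    """With probability epsilon pick a random item, otherwise the current best."""
    if rng.random() < epsilon:
        return rng.choice(list(estimated_ctr))
    return max(estimated_ctr, key=estimated_ctr.get)


def thompson_pick(stats, rng):
    """stats maps item -> (clicks, skips); sample each item's Beta posterior
    (assumed Beta(1, 1) prior) and pick the item with the highest sample."""
    samples = {
        item: rng.betavariate(1 + clicks, 1 + skips)
        for item, (clicks, skips) in stats.items()
    }
    return max(samples, key=samples.get)


rng = random.Random(42)
trials = 2000

# A brand-new track with no feedback versus an established track with ~30% CTR.
cold = {"established": (300, 700), "new_track": (0, 0)}
share_cold = sum(thompson_pick(cold, rng) == "new_track" for _ in range(trials)) / trials

# The same new track after 50 impressions that went badly (2 clicks, 48 skips).
warm = {"established": (300, 700), "new_track": (2, 48)}
share_warm = sum(thompson_pick(warm, rng) == "new_track" for _ in range(trials)) / trials
```

While the new track has no data, its wide posterior wins most draws, so it gets shown often. Once the evidence says it underperforms, it is almost never picked. Epsilon-greedy would keep spending the same ε share on random items in both situations.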
**When to use which method:** epsilon-greedy is simple but inefficient, because its exploration rate stays constant no matter how much has already been learned. Thompson Sampling is adaptive - high uncertainty leads to more exploration, low uncertainty to more exploitation. LinUCB incorporates user context, which makes it a natural choice when exploration itself should be personalized.
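For comparison, here is a minimal sketch of a single LinUCB arm in the "disjoint" form, where each item keeps its own linear model over the user context. It assumes a ridge prior of the identity matrix and maintains the inverse with the Sherman-Morrison update instead of inverting a matrix on every call; the class name and the `alpha` default are illustrative.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB: score = theta . x + alpha * sqrt(x^T A^-1 x)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # A starts as the identity (ridge regularization), so A^-1 does too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context

    def _matvec(self, x):
        return [sum(a * xi for a, xi in zip(row, x)) for row in self.A_inv]

    def ucb(self, x):
        theta = self._matvec(self.b)  # ridge estimate A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        a_inv_x = self._matvec(x)
        bonus = self.alpha * math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + bonus

    def update(self, x, reward):
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


arm = LinUCBArm(dim=2)
seen_context, unseen_context = [1.0, 0.0], [0.0, 1.0]
score_before = arm.ucb(seen_context)
arm.update(seen_context, reward=1.0)
score_seen = arm.ucb(seen_context)
score_unseen = arm.ucb(unseen_context)
```

After one positive observation the uncertainty bonus shrinks only in the direction of the observed context, so the arm still explores for users whose context it has not seen. To recommend, score every arm for the current user and pick the highest `ucb`.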
Why is Thompson Sampling more efficient than epsilon-greedy for cold-start?
Business Rule Injection: reality on top of ML
An ML model optimizes a proxy metric. The business has additional requirements: content safety, licensing restrictions, sponsored content, regional legal prohibitions. A **post-processing re-ranking layer** applies these rules after ML scoring - without retraining the model.
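A rule layer of this kind can be as simple as a filter over the already-ranked list. The sketch below is illustrative only: the `Item` fields, the `apply_business_rules` signature, and the three rules (regional licensing, explicit-content restriction, a cap on sponsored items) are assumptions chosen to mirror the requirements listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    score: float
    licensed_regions: frozenset
    explicit: bool = False
    sponsored: bool = False


def apply_business_rules(ranked, region, allow_explicit, max_sponsored):
    """Filter a ranked list without touching the model scores or their order."""
    kept = []
    sponsored_count = 0
    for item in ranked:
        if region not in item.licensed_regions:
            continue  # licensing restriction for this region
        if item.explicit and not allow_explicit:
            continue  # content-safety or age restriction
        if item.sponsored:
            if sponsored_count >= max_sponsored:
                continue  # cap on sponsored items per list
            sponsored_count += 1
        kept.append(item)
    return kept


ranked = [
    Item("a", 0.9, frozenset({"US", "DE"})),
    Item("b", 0.8, frozenset({"US"}), sponsored=True),
    Item("c", 0.7, frozenset({"DE"})),
    Item("d", 0.6, frozenset({"US"}), explicit=True),
    Item("e", 0.5, frozenset({"US"}), sponsored=True),
]

visible = apply_business_rules(ranked, region="US", allow_explicit=False, max_sponsored=1)
visible_ids = [item.item_id for item in visible]
```

Changing a rule means changing an argument or a few lines of this function, with no retraining involved.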
**Re-ranking pipeline architecture:** ML candidate generation (1000 items) -> ML scoring & ranking -> business rules post-processing -> diversity re-ranking (MMR) -> fairness constraints -> final top-K. Each layer is independent and can be changed without retraining the model.
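The layering can be sketched as a chain of small functions applied to the scored candidates. Everything here is a simplified stand-in under stated assumptions: `cap_per_genre` substitutes a per-genre cap for MMR, `ensure_min_new` implements one possible minimum-exposure rule, and the dictionary fields are hypothetical.

```python
def drop_blocked(items, blocked_ids):
    """Business-rule stage: remove items that must not be shown."""
    return [it for it in items if it["id"] not in blocked_ids]


def cap_per_genre(items, max_per_genre):
    """Diversity stage (stand-in for MMR): push items over the genre cap to the tail."""
    counts, head, tail = {}, [], []
    for it in items:
        counts[it["genre"]] = counts.get(it["genre"], 0) + 1
        (head if counts[it["genre"]] <= max_per_genre else tail).append(it)
    return head + tail


def ensure_min_new(items, k, min_new):
    """Fairness stage: guarantee at least min_new new-creator items in the top k."""
    top = items[:k]
    reserve = [it for it in items[k:] if it["is_new"]]
    new_in_top = sum(1 for it in top if it["is_new"])
    while new_in_top < min_new and reserve:
        established = [i for i, it in enumerate(top) if not it["is_new"]]
        if not established:
            break
        top.pop(established[-1])  # drop the lowest-ranked established item
        top.append(reserve.pop(0))  # promote the best remaining new item
        new_in_top += 1
    return top


def rerank(scored, k, blocked_ids, max_per_genre, min_new):
    items = sorted(scored, key=lambda it: it["score"], reverse=True)
    items = drop_blocked(items, blocked_ids)
    items = cap_per_genre(items, max_per_genre)
    return ensure_min_new(items, k, min_new)


scored = [
    {"id": "a", "score": 0.90, "genre": "rock", "is_new": False},
    {"id": "b", "score": 0.80, "genre": "rock", "is_new": False},
    {"id": "c", "score": 0.70, "genre": "rock", "is_new": False},
    {"id": "d", "score": 0.60, "genre": "jazz", "is_new": False},
    {"id": "e", "score": 0.50, "genre": "pop", "is_new": True},
    {"id": "f", "score": 0.95, "genre": "rock", "is_new": False},
]

final = rerank(scored, k=3, blocked_ids={"f"}, max_per_genre=2, min_new=1)
final_ids = [it["id"] for it in final]
```

Each stage takes a list and returns a list, so any one of them can be replaced or reordered without touching the scoring model. Note that later stages can undo part of an earlier stage's work: in this toy run the fairness step replaces the jazz item that the diversity step had promoted.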
A common misconception: diversity and fairness conflict with relevance, so they cannot be optimized together.
In practice, accepting a small relevance drop in exchange for more diversity and fairness often increases long-term engagement and user satisfaction. MMR with lambda=0.7 typically loses 2-5% precision while gaining 20-30% diversity, which is an acceptable trade-off for most products.
Users tire of filter bubbles: homogeneous content reduces session engagement. Studies at Netflix and Spotify showed that diverse recommendations increase user return rate.
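To put a number on the diversity side of this trade-off, one common choice is intra-list diversity: the average pairwise cosine distance between the items in a list. The sketch below assumes item embeddings are available; the function name and toy vectors are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intra_list_diversity(embeddings):
    """Mean of (1 - cosine similarity) over all unordered pairs in the list."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)


homogeneous = intra_list_diversity([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
mixed = intra_list_diversity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

Tracking this value next to precision for several lambda settings turns the choice of lambda into a measurable trade-off rather than a guess.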
Why are business rules applied after ML scoring rather than as constraints during model training?
Summary: Multi-Objective and Re-Ranking
- **Diversity (MMR):** iterative selection by lambda * relevance - (1 - lambda) * max_similarity_to_selected; a greedy algorithm, with a (1-1/e) guarantee when the objective is monotone submodular
- **Fairness:** provider fairness (exposure for new creators), user fairness (equal quality for minorities), disparate impact ratio >= 0.8
- **Exploration:** epsilon-greedy (constant), Thompson Sampling (adaptive posterior), LinUCB (contextual) for cold-start items
- **Business rules post-processing:** licensing, age, sponsored fraction - fast changes without model retraining
Related Topics
Re-ranking is the final layer of the recommender pipeline, sitting on top of retrieval and scoring.
- Context-Aware Recommendations — Multi-task scores from context-aware models are the input to the re-ranking layer
- Candidate Generation — Re-ranking operates on the top-1000 from retrieval; retrieval quality bounds re-ranking
Questions for Reflection
- How should lambda be chosen in MMR for a specific product - what data is needed for a principled decision rather than an intuitive one?
- Why is provider fairness (exposure for new artists) beneficial for the platform in the long run, even if it temporarily reduces engagement metrics?
- How does Thompson Sampling behave with seasonality - should the prior be reset when a new season begins?
Related Lessons
- rec-07 — Context produces the signals balanced during re-ranking
- rec-09 — Candidate generation supplies items to re-rank
- rec-11 — Bandit exploration drives online re-ranking choices
- ml-48-rl-intro — Exploration-exploitation comes from reinforcement learning
- stat-05-hypothesis — Fairness checks resemble statistical hypothesis testing
- ml-01-intro