Recommender Systems
Candidate Generation
Цели урока
- Understand how HNSW works and its trade-offs
- Design a two-stage retrieval pipeline with multiple candidate sources
- Apply a Bloom Filter for efficient deduplication of seen items
- Compare LSH and HNSW on key characteristics
In 2016 YouTube published its paper on the recommendation system. The core engineering problem: from 800 million videos, return a personalized list of 10 within 200 ms. The solution - a two-stage architecture with ANN Search for retrieval - became the industry standard. Netflix, Spotify, and TikTok all use the same pattern.
- **YouTube** - two-stage with matrix factorization for retrieval, DNN for ranking
- **Spotify** - HNSW for ANN search across 80M tracks, Bloom Filter for viewed exclusion
- **Pinterest** - LSH for pin recommendations against a continuously updated catalog
Предварительные знания
- Embeddings and vector similarity
- Matrix factorization basics
- Big-O reasoning about latency and throughput
Two-Tower Retrieval and the YouTube Deep Learning Era
In 2016 Paul Covington, Jay Adams, and Emre Sargin of YouTube published "Deep Neural Networks for YouTube Recommendations", which split the problem into two stages: a deep candidate-generation network that narrows millions of videos down to hundreds, then a separate ranking network. The candidate model framed retrieval as extreme multiclass classification and trained it with sampled softmax, approximating a softmax over the whole catalog from a small set of negatives. At serving time the learned user vector queries a precomputed index of video vectors through approximate nearest neighbor search. This two-tower, two-stage design became the default retrieval pattern across the industry, because exact scoring of an entire catalog per request is impossible at web scale.
ANN Search: HNSW and ScaNN
Spotify's catalog holds 80 million tracks, each represented as a 128-dimensional vector. Finding the 1,000 nearest tracks to a user profile via brute-force means 80 million dot products per request. At 10,000 requests per second that is computationally infeasible. **ANN (Approximate Nearest Neighbor) Search** is a family of algorithms that return sufficiently close vectors in milliseconds, trading a small fraction of recall for orders-of-magnitude speedup.
**HNSW (Hierarchical Navigable Small World)** builds a multi-layer graph: upper layers are sparse "highways" for fast traversal; lower layers are dense neighbor maps. Search starts at the top and descends, approaching the target at each level. **ScaNN (Scalable Nearest Neighbors)** from Google uses product quantization and asymmetric distance scoring to achieve 10-20x speedup via SIMD CPU instructions.
HNSW stores the entire graph in RAM - 80M vectors at dim=128 requires ~40 GB. In production, the index is sharded by `userId % N`, with each shard on a separate host. Milvus and Weaviate implement this out of the box.
What is the main trade-off of ANN Search compared to exact (brute-force) search?
Two-Stage Architecture
A sophisticated ranking model (200+ features, 50 ms inference) cannot run against 80 million tracks. But ANN alone lacks contextual signals. The industry solution is a **two-stage architecture**: the first stage retrieves hundreds of candidates quickly; the second stage ranks only those candidates precisely.
Why does a two-stage recommender use multiple candidate sources (ANN + item2item + trending) instead of one?
Bloom Filter for deduplication
A user has already watched 50,000 videos. Every recommendation request must exclude them from candidates. Keeping a Set of 50,000 IDs in memory per request is expensive, and checking each candidate against Redis is too slow. A **Bloom Filter** is a probabilistic data structure: it uses ~100x less memory and answers "possibly seen" or "definitely not seen" in O(1).
A Bloom Filter never produces false negatives: if it says "not seen", that item is guaranteed to be new. False positives ("possibly seen") are acceptable: occasionally a new item is suppressed. For recommendations, that is preferable to repeatedly showing already-watched content.
A Bloom Filter reports that a user "possibly saw" video X. This means:
Locality-Sensitive Hashing
HNSW stores its entire graph in RAM - hundreds of gigabytes for a billion vectors. **Locality-Sensitive Hashing (LSH)** offers an alternative: instead of a graph, it uses hash tables where similar vectors land in the same bucket. The method is based on random projections: if two vectors are close in the original space, their projections onto a random hyperplane have the same sign with high probability.
LSH places similar vectors into the same bucket via random projections. What property makes this work?
Candidate Generation
- HNSW: graph-based ANN, recall 95-97%, ~2-5 ms, effective up to ~100M vectors
- ScaNN: SIMD-optimized ANN from Google, 10-20x faster than brute-force
- Two-stage: retrieval (ANN + item2item + trending) -> ranking -> re-ranking
- Bloom Filter: O(1) deduplication with 33x memory saving, 1% false positive rate
- LSH: HNSW alternative for billions of vectors or frequently updated indexes
Related Topics
Candidate generation is the first stage of the retrieval pipeline; its output feeds ranking and re-ranking.
- Matrix Factorization — User and item vectors from MF are the primary input for ANN Search
- Feature Engineering for Recommendations — Retrieved candidates are enriched with features before being passed to the Stage 2 ranker
- Multi-Objective and Re-Ranking — Re-ranking operates on the top-1000 from candidate generation
Вопросы для размышления
- How should the number of Stage 1 candidates be chosen given a fixed latency budget?
- When does it make sense to maintain multiple ANN indexes (by genre, language) instead of one global index?
- How does a Bloom Filter behave as the number of seen items per user grows over time?
Связанные уроки
- rec-05 — Embeddings from sequential models seed ANN search
- rec-10 — Generated candidates get enriched with features
- rec-08 — Retrieved candidates pass into multi-objective re-ranking
- aie-10-vector-databases — Vector databases serve the same ANN retrieval
- ds-24-bloom-filter — Bloom filters deduplicate retrieved candidates
- alg-10-binary-search — ANN trades exactness for speed like search
- ml-01-intro