Natural Language Processing
NLP System Design
The gap between a working NLP model and a reliable production NLP system is enormous. Gmail's spam filter serves 1.8 billion users with 99.9% uptime and sub-100ms latency - not because the ML model is perfect, but because the serving infrastructure, fallbacks, monitoring, and human review pipeline are engineered for production. The same principle applies to every search engine, chatbot, and moderation system. FAANG NLP system design interviews test exactly this: can a candidate reason through the full stack, not just the model?
- **Airbnb Search** uses a multi-stage NLP pipeline - query understanding, semantic retrieval (dense + sparse fusion), listing reranking, and explanation generation - serving 150 million users with p99 latency under 200ms, with model changes A/B tested every week.
- **Meta's content moderation** system applies 3-tier classification to 100 billion pieces of content per day across Facebook and Instagram, using 350+ specialized classifiers for different violation types and languages and handling 99.5% of content automatically before any human review.
- **Stripe's fraud detection** uses NLP on transaction descriptions and merchant names in addition to structured signals - a BERT-based merchant category classifier runs on every transaction and feeds into the risk scoring pipeline, processing 250 million transactions per day.
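To make the "dense + sparse fusion" step in the search example concrete, here is a minimal sketch of one common way to merge two ranked result lists: reciprocal rank fusion (RRF). This is an illustrative assumption, not a description of how Airbnb actually combines its retrievers; the function name, the listing IDs, and the constant `k=60` (a conventional default) are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked highly by multiple retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a dense (embedding) and a sparse (keyword) retriever.
dense_hits = ["listing_7", "listing_2", "listing_9"]
sparse_hits = ["listing_2", "listing_4", "listing_7"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF uses only rank positions, so it sidesteps the problem that dense similarity scores and sparse keyword scores live on different scales. In a multi-stage pipeline, the fused list would typically be passed on to a reranking stage.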
Prerequisites
- Embeddings, bi-encoders, and approximate nearest neighbor search
- Retrieval-Augmented Generation: retrieval, reranking, and grounding
- Text classification, used for intent routing and content moderation tiers
- Basic systems thinking: latency percentiles (p50/p99), throughput, caching, and fallbacks