Natural Language Processing

NLP System Design

The gap between a working NLP model and a reliable production NLP system is enormous. Gmail's spam filter serves 1.8 billion users with 99.9% uptime and sub-100ms latency - not because the ML model is perfect, but because the serving infrastructure, fallbacks, monitoring, and human review pipeline are engineered for production. The same principle applies to every search engine, chatbot, and moderation system. FAANG NLP system design interviews test exactly this: can a candidate reason through the full stack, not just the model?

**Airbnb Search** uses a multi-stage NLP pipeline - query understanding, semantic retrieval (dense + sparse fusion), listing reranking, and explanation generation - serving 150 million users with p99 latency under 200ms, with model changes A/B-tested every week.
**Meta's content moderation** system applies 3-tier classification to 100 billion pieces of content per day across Facebook and Instagram, using 350+ specialized classifiers for different violation types and languages, and handles 99.5% automatically before any human review.
**Stripe's fraud detection** uses NLP on transaction descriptions and merchant names in addition to structured signals - a BERT-based merchant category classifier runs on every transaction and feeds into the risk scoring pipeline, processing 250 million transactions per day.
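
To make "dense + sparse fusion" concrete, here is a minimal sketch of one common way to merge two ranked result lists: reciprocal rank fusion (RRF). This is an illustration only, not a description of Airbnb's actual method; the function name, the listing IDs, and the constant `k=60` are all assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked highly by both retrievers rise to the top.
    `k` is a smoothing constant (60 is a commonly used default; assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from a dense (embedding) retriever
# and a sparse (keyword, e.g. BM25-style) retriever.
dense_hits = ["listing_7", "listing_2", "listing_9"]
sparse_hits = ["listing_2", "listing_4", "listing_7"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF uses only rank positions, so it sidesteps the problem that dense similarity scores and sparse keyword scores live on different scales. A production system would typically pass the fused list to a separate reranking stage.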

Prerequisites

Embeddings, bi-encoders, and approximate nearest neighbor search
Retrieval-Augmented Generation: retrieval, reranking, and grounding
Text classification, used for intent routing and content moderation tiers
Basic systems thinking: latency percentiles (p50/p99), throughput, caching, and fallbacks
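
As a quick refresher on latency percentiles, the sketch below computes p50 and p99 with the nearest-rank method. The sample latencies are made up for illustration, and real monitoring stacks often use interpolated or streaming estimates instead.

```python
import math


def percentile(samples, p):
    """Return the p-th percentile of `samples` using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds; one slow outlier.
latencies_ms = [12, 15, 14, 13, 18, 16, 11, 17, 19, 250]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p99={p99}ms")
```

The median hides the single slow request while p99 exposes it, which is why latency targets for user-facing systems are usually stated at the tail rather than at the average.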

From feature engineering to LLM-as-a-service

Semantic Search System Design

Production Chatbot Design

Content Moderation at Scale

Production NLP Pipeline Design

Key Ideas

Related Topics

Questions for Reflection

Related Lessons