Stream Processing
Streaming at FAANG Interviews
At Google, Meta, Amazon, and Netflix, system design is half of a FAANG interview. Typical streaming questions: 'design YouTube Analytics', 'design Uber's notification system', 'design Twitter feed'. They all share the same structure: Kafka + streaming processor + storage. The goal is to know the trade-offs, not a single 'correct' answer.
- **Meta Scribe**: log aggregation via Kafka -> Hadoop. 10TB/hour. Notification system on top: Kafka -> Flink -> push/email/SMS workers
- **Twitter timeline**: hybrid push/pull feed. Redis Sorted Set for active users, pull for celebrities. Evolution from monolith to streaming 2012-2016
- **Airbnb Minerva**: real-time analytics pipeline. Kafka + Flink + Druid. 100M+ events/day. A/B test results in real-time
Design: Notification System for 100M Users
A classic FAANG question: design a notification system like Facebook or Twitter. Scale: 100M MAU, 1B notifications per day. Types: push (mobile), email, SMS, in-app. Requirements: delivery guarantee (at-least-once), latency < 1 second for push, idempotency (no duplicates).
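The requirements above can be made concrete with a short sketch. It first converts 1B notifications per day into an average per-second rate, then shows one way to combine at-least-once delivery with idempotency: every notification gets a deterministic key, and a redelivered message with a known key is skipped. The names `average_rate`, `idempotency_key`, and `DeliveryWorker` are hypothetical, and the in-memory set is an assumption standing in for a shared store.

```python
import hashlib

SECONDS_PER_DAY = 86_400


def average_rate(per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return per_day / SECONDS_PER_DAY


def idempotency_key(user_id: str, event_id: str, channel: str) -> str:
    """Deterministic key: the same notification always maps to the same key."""
    raw = f"{user_id}:{event_id}:{channel}".encode()
    return hashlib.sha256(raw).hexdigest()


class DeliveryWorker:
    """Hypothetical worker; `seen` stands in for a shared dedup store."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.sent: list[tuple[str, str, str]] = []

    def handle(self, user_id: str, event_id: str, channel: str) -> bool:
        """Return True if the notification was sent, False if it was a duplicate."""
        key = idempotency_key(user_id, event_id, channel)
        if key in self.seen:
            return False  # redelivery from the queue: skip it
        self.sent.append((user_id, event_id, channel))  # stand-in for the real send
        # The key is recorded after the send: a crash between the two steps
        # leads to a retry (at-least-once), never to a silent loss.
        self.seen.add(key)
        return True


if __name__ == "__main__":
    print(round(average_rate(1_000_000_000)))  # average notifications per second
    worker = DeliveryWorker()
    print(worker.handle("u1", "e1", "push"))   # first delivery
    print(worker.handle("u1", "e1", "push"))   # redelivered duplicate
```

The average works out to roughly 11.6K notifications per second. Peak traffic is higher, and the peak factor is something to clarify with the interviewer rather than assume.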
Why is a Redis Bloom filter better than a MySQL existence check for notification deduplication at 1B/day scale?
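One part of the answer is memory and lookup cost, which the standard Bloom filter sizing formulas make visible: m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for n items at false-positive rate p. The sketch below is a minimal in-memory filter, not the Redis implementation. The 1% false-positive target and the SHA-256 double-hashing scheme are assumptions chosen for illustration. In Redis, the RedisBloom module exposes comparable operations as `BF.ADD` and `BF.EXISTS`.

```python
import hashlib
import math


def bloom_size(n: int, p: float) -> tuple[int, int]:
    """Return (bits, hash_count) for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    """Minimal Bloom filter sketch backed by a bytearray."""

    def __init__(self, n: int, p: float) -> None:
        self.m, self.k = bloom_size(n, p)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never 0
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not seen"; True means "probably seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bits, hashes = bloom_size(1_000_000_000, 0.01)
    print(f"{bits / 8 / 1e9:.2f} GB, {hashes} hash functions")
```

For 1B keys at a 1% false-positive rate this comes to about 9.6 bits per key, roughly 1.2 GB in total, with constant-time in-memory lookups. The trade-off to state out loud: a false positive means a legitimate notification is treated as a duplicate and suppressed, so the acceptable rate is a product decision.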
Design: News Feed for a Social Network
The news feed is the most common FAANG streaming question. On Twitter, Instagram, or Facebook, a user publishes a post and every follower's feed updates. There are two basic approaches. Push (fanout on write): on publish, the post is written to each follower's feed. Pull (fanout on read): the feed is assembled at read time from the posts of followed accounts. A hybrid combines the two.
Why does Instagram use a hybrid approach (push for regular users, pull for celebrities)?
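The hybrid can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not how any of these companies implement it: plain dictionaries stand in for the per-user feed cache (a Redis Sorted Set in a real system), and `HybridFeed` and its methods are hypothetical names. Regular authors fan out on write; authors above a follower threshold are merged in at read time.

```python
import heapq
from collections import defaultdict


class HybridFeed:
    """Push for regular authors, pull for celebrities (illustrative sketch)."""

    def __init__(self, celebrity_threshold: int = 100_000, feed_size: int = 100) -> None:
        self.celebrity_threshold = celebrity_threshold
        self.feed_size = feed_size
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = defaultdict(list)     # author -> [(timestamp, post_id)]
        self.feeds = defaultdict(list)     # user -> precomputed [(timestamp, post_id)]

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) > self.celebrity_threshold

    def publish(self, author: str, post_id: str, ts: int) -> int:
        """Store the post; return the number of feed writes performed."""
        self.posts[author].append((ts, post_id))
        if self.is_celebrity(author):
            return 0  # no fanout: followers pull this author at read time
        for user in self.followers[author]:
            feed = self.feeds[user]
            feed.append((ts, post_id))
            feed.sort(reverse=True)
            del feed[self.feed_size:]  # keep only the newest entries
        return len(self.followers[author])

    def read(self, user: str, limit: int = 10) -> list[str]:
        """Merge the precomputed feed with celebrity posts pulled on demand."""
        candidates = set(self.feeds[user])  # a set avoids duplicates after a threshold crossing
        for author in self.following[user]:
            if self.is_celebrity(author):
                candidates.update(self.posts[author])
        return [post_id for _, post_id in heapq.nlargest(limit, candidates)]
```

The cost asymmetry is visible in `publish`: a regular author triggers one write per follower, while a celebrity post costs a single write and shifts the work to each follower's read.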
Design: Real-Time Analytics Pipeline
A typical question: design a system like Google Analytics or Amplitude, with event tracking and real-time dashboards. Scale: 100K events/sec, dashboard latency < 30 seconds. There are two layers: a hot path (streaming, freshness of seconds to minutes) and a cold path (batch, hours to days). Lambda Architecture runs both paths and merges their results at query time.
Why does Lambda Architecture use two paths (hot + cold) instead of one?
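A minimal sketch of the two-path idea, assuming events are dictionaries with a `ts` (seconds) and a `page` field, both hypothetical. The cold path counts everything before a cutoff, the hot path counts everything from the cutoff onward, and the serving layer adds the two views. Both paths use the same tumbling-window function here only to keep the example short; in a real Lambda setup they are separate jobs.

```python
from collections import Counter


def window_counts(events, window_sec, start, end):
    """Tumbling-window counts per (window_start, page) for start <= ts < end."""
    counts = Counter()
    for event in events:
        if start <= event["ts"] < end:
            window_start = event["ts"] - event["ts"] % window_sec
            counts[(window_start, event["page"])] += 1
    return counts


def merged_view(events, cutoff, window_sec=30):
    """Serving layer: the batch view up to the cutoff plus the streaming view after it."""
    cold = window_counts(events, window_sec, 0, cutoff)             # batch recompute
    hot = window_counts(events, window_sec, cutoff, float("inf"))   # recent events
    return cold + hot


def total_for(view, page):
    """Total views of one page across all windows."""
    return sum(count for (_, p), count in view.items() if p == page)


if __name__ == "__main__":
    events = [
        {"ts": 5, "page": "/home"},
        {"ts": 40, "page": "/home"},
        {"ts": 65, "page": "/home"},
        {"ts": 70, "page": "/cart"},
    ]
    view = merged_view(events, cutoff=60)
    print(total_for(view, "/home"))
```

The split exists because the two paths optimize for different things: the hot path gives fresh but approximate numbers, while the cold path periodically recomputes from the full event log and can correct late or duplicated events.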
Trade-off Analysis: How to Answer FAANG Interviews
A FAANG interviewer is not looking for the 'correct' answer; they are looking for structured thinking. Framework: (1) Clarify: scale, SLA, is consistency needed? (2) High-level design: draw the components. (3) Deep dive: the trade-off at each layer. (4) Bottleneck analysis: where does it break at 10x load? (5) Monitoring: what to monitor, what alerts to set.
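Step (4) is easier to discuss with numbers. The sketch below is a back-of-the-envelope model under loud assumptions: each worker processes a fixed number of events per second, and all figures in the usage example are hypothetical. It shows how fast consumer lag grows once ingest exceeds capacity, and how many workers a 10x load would need with some headroom.

```python
import math


def lag_growth_per_sec(ingest_rate: int, workers: int, per_worker_rate: int) -> int:
    """Events per second that pile up in the queue when ingest exceeds capacity."""
    return max(0, ingest_rate - workers * per_worker_rate)


def workers_needed(ingest_rate: int, per_worker_rate: int, headroom: float = 0.25) -> int:
    """Workers required to absorb the ingest rate with the given spare capacity."""
    return math.ceil(ingest_rate * (1 + headroom) / per_worker_rate)


if __name__ == "__main__":
    # Hypothetical figures: 100K events/sec today, 5K events/sec per worker, 25 workers.
    print(lag_growth_per_sec(100_000, 25, 5_000))    # today: capacity covers ingest
    print(lag_growth_per_sec(1_000_000, 25, 5_000))  # 10x: backlog growth per second
    print(workers_needed(1_000_000, 5_000))          # workers needed at 10x
```

Naming the first component whose capacity is exceeded, and the metric that would reveal it (consumer lag in this model), is the kind of specific answer the framework asks for.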
Misconception: at a streaming design interview, the goal is to choose the 'best' technology.
Reality: at an interview, the goal is to explain the trade-off: why Kafka over RabbitMQ, why Flink over Spark, why ClickHouse over PostgreSQL, for the specific requirements of the problem.
There is no 'best' technology, only trade-offs. RabbitMQ is better for task queues (ack per message). Kafka is better for replay and fan-out. Flink is better for stateful streaming. Spark is better for batch. A senior engineer demonstrates an understanding of trade-offs, not technology holy wars.
An interview question: 'what happens at 10x load in your system?' Which answer demonstrates a senior level?
Related Topics
FAANG streaming design is built on several patterns:
- CQRS Pattern — Feed: write path (Kafka fanout) and read path (Redis cache) - that is CQRS
- Change Data Capture — CDC is the event source for notification and feed systems
- Designing Systems at Scale — Backpressure and partitioning are required topics in any streaming design
Key Ideas
- **Notification**: Kafka fan-out -> delivery workers. Redis bloom filter for dedup. Rate limiting per user. At-least-once + idempotent delivery.
- **Feed**: push (Redis Sorted Set) for regular users, pull for celebrities (>100K followers). Hybrid is the standard.
- **Analytics**: Lambda Architecture = hot path (Flink + ClickHouse, 30 sec) + cold path (Spark + S3, hours). Both are needed.
- **Interview framework**: Clarify -> High-level -> Deep dive -> Bottleneck 10x -> Monitoring. Specific metrics, not abstractions.
- **Trade-off > 'correct' answer**: explain why Kafka not RabbitMQ, when Flink vs Spark. That is what the interviewer wants.
Questions for Reflection
- Notification system: if FCM (push provider) is down for 30 minutes - how to buffer notifications and avoid loss or duplication on recovery?
- Feed with Redis Sorted Set: what happens on a cache miss (user has not logged in for 30 days)? How to build cold cache rebuild without a thundering herd?
- Lambda Architecture requires maintaining two codebases (Flink + Spark). When would Kappa Architecture (streaming only with replay) be better?
Related Lessons
- stream-15 — Design patterns from the previous lesson are the foundation for interview solutions
- stream-09 — Exactly-once is key in notification and financial systems
- stream-11 — CQRS is the standard pattern in feed and notification systems
- stream-13 — CDC is a typical event source in notification pipelines
- ds-04-consistent-hashing — The system design interview framework also applies to streaming problems
- dist-12-consistency