System Design: Twitter Feed

Beyonce performed the halftime show at the 2013 Super Bowl. Within minutes, 5 million tweets were posted and Twitter collapsed. The naive timeline approach, a JOIN across the tweets and follows tables on every read request, could not handle 300,000 reads per second. This incident shaped the pre-computed timeline cache architecture that Twitter uses today: fan out writes to Redis at post time, then serve reads from cache in O(1).

  • **Twitter Fanout Service**: pre-computes timelines for all active users in Redis. Approximately 99% of 300,000 reads/sec are served from cache. The database only handles tweet persistence and cache rebuilds for inactive users.
  • **Instagram Feed and Stories**: hybrid fanout architecture for 800M+ daily timeline reads. Regular accounts use fanout-on-write; verified accounts with large followings use fanout-on-read merged at request time.

Requirements and Scale Estimates

Before designing any system, clarify functional requirements (what the system does) and non-functional requirements (scale, latency, availability). Capacity estimation grounds the design in concrete numbers and reveals the bottlenecks. For Twitter-scale feed generation, reads outnumber writes by roughly 52:1, which makes timeline pre-computation the dominant design decision.
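A back-of-envelope estimate can be sketched in a few lines. The 300,000 reads/sec figure and the 52:1 ratio quoted in the Key Ideas come from this lesson; the average follower count is an assumed placeholder chosen only to make the arithmetic concrete, and the function name `estimate` is hypothetical.

```python
# Back-of-envelope capacity estimate (a sketch, not measured data).
READS_PER_SEC = 300_000   # from the lesson
READ_WRITE_RATIO = 52     # reads per write, from the Key Ideas
AVG_FOLLOWERS = 200       # ASSUMPTION: illustrative value only


def estimate(reads_per_sec, read_write_ratio, avg_followers):
    """Derive write load and fanout load from the read rate."""
    writes_per_sec = reads_per_sec / read_write_ratio
    # With fanout-on-write, every tweet becomes one cache write per follower.
    fanout_writes_per_sec = writes_per_sec * avg_followers
    return {
        "tweet_writes_per_sec": round(writes_per_sec),
        "fanout_cache_writes_per_sec": round(fanout_writes_per_sec),
    }


est = estimate(READS_PER_SEC, READ_WRITE_RATIO, AVG_FOLLOWERS)
print(est)
```

Under these assumptions, a few thousand tweet writes per second turn into over a million cache writes per second, which is why the fanout path needs as much attention as the read path.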

Beyonce's 2013 Super Bowl moment, with 5 million tweets in minutes, took down Twitter's timeline service because the naive JOIN approach could not keep up with the read load. That incident led directly to the pre-computed timeline cache architecture.

Twitter's timeline read rate is 300,000 requests per second. Without precomputation, each request requires a JOIN across millions of rows. What fundamental approach solves this?

Fanout on Write vs Fanout on Read

Fanout on Write (push model): when a user posts a tweet, the tweet ID is immediately pushed into each follower's timeline cache. Timeline reads are fast (cache lookup), but posting is expensive for users with many followers. Fanout on Read (pull model): the timeline is assembled at read time from each followee's recent tweets. Read is expensive; write is cheap.
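The two models can be contrasted with a minimal in-memory sketch. Plain dictionaries stand in for the database and the timeline cache, and all function and variable names here are hypothetical; the point is only where the work happens in each model.

```python
from collections import defaultdict

followers = defaultdict(set)          # author -> users who follow them
following = defaultdict(set)          # user -> authors they follow
tweets_by_author = defaultdict(list)  # author -> [(timestamp, tweet_id)]
timeline_cache = defaultdict(list)    # user -> [(timestamp, tweet_id)], newest first


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def post_fanout_on_write(author, tweet_id, ts):
    """Push model: one cache write per follower at post time."""
    tweets_by_author[author].append((ts, tweet_id))
    for follower in followers[author]:
        # Assumes timestamps only increase, so the newest entry goes first.
        timeline_cache[follower].insert(0, (ts, tweet_id))
    return len(followers[author])  # number of cache writes performed


def read_fanout_on_write(user, limit=10):
    """Push model read: a single cache lookup."""
    return [tid for _, tid in timeline_cache[user][:limit]]


def read_fanout_on_read(user, limit=10):
    """Pull model read: gather and sort every followee's tweets on each request."""
    merged = []
    for author in following[user]:
        merged.extend(tweets_by_author[author])
    merged.sort(reverse=True)
    return [tid for _, tid in merged[:limit]]


follow("bob", "alice")
follow("carol", "alice")
follow("bob", "dave")
writes_alice = post_fanout_on_write("alice", "t1", ts=1)
writes_dave = post_fanout_on_write("dave", "t2", ts=2)
print(read_fanout_on_write("bob"), read_fanout_on_read("bob"))
```

Both reads return the same timeline; the difference is that the push model paid for it at post time, with a cost that grows with the author's follower count.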

Elon Musk has 150M followers and posts a tweet. In fanout-on-write, how many Redis operations does this trigger?

Timeline Cache Design

The timeline cache stores a sorted list of tweet IDs per user. Redis sorted sets (ZADD with timestamp as score) or lists (LPUSH with LTRIM) are the primary implementation. The cache holds a fixed window of recent tweet IDs (typically 1,000); older content is retrieved from the database.
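The capped sorted-set behaviour can be illustrated without a running Redis server. The class below is a hypothetical in-memory stand-in; the comments name the Redis commands that would play the same role, with `timeline:{user_id}` as an assumed key layout. Unlike a real ZADD, this sketch does not deduplicate a tweet ID that is added twice.

```python
import bisect


class TimelineCache:
    """In-memory stand-in for one Redis sorted set per user."""

    def __init__(self, max_len=1000):
        self.max_len = max_len
        self._timelines = {}  # user_id -> ascending list of (timestamp, tweet_id)

    def add(self, user_id, tweet_id, timestamp):
        # Redis analogue: ZADD timeline:{user_id} <timestamp> <tweet_id>
        entries = self._timelines.setdefault(user_id, [])
        bisect.insort(entries, (timestamp, tweet_id))
        # Redis analogue: ZREMRANGEBYRANK timeline:{user_id} 0 -(max_len + 1)
        # which drops the oldest entries and keeps the newest max_len.
        overflow = len(entries) - self.max_len
        if overflow > 0:
            del entries[:overflow]

    def latest(self, user_id, count):
        # Redis analogue: ZREVRANGE timeline:{user_id} 0 <count - 1>
        if count <= 0:
            return []
        entries = self._timelines.get(user_id, [])
        return [tid for _, tid in reversed(entries[-count:])]


cache = TimelineCache(max_len=3)
for i in range(1, 6):
    cache.add("u1", f"t{i}", timestamp=i)
print(cache.latest("u1", 10))
```

With a window of 3, only the three newest IDs survive; the lesson's window of 1,000 works the same way.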

Twitter's Fanout Service (described in their 2013 blog post) maintains timeline caches for all active users in Redis. The cache holds 1,000 tweet IDs per user. When a user scrolls past the 1,000th tweet, the request falls back to the database. Approximately 99% of read traffic is served entirely from Redis, keeping database load manageable.
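The read path described above might look roughly like the following sketch. A dictionary stands in for Redis, `load_from_db` is a hypothetical callable standing in for the database query, and rebuilding the cache on a miss is one plausible reading of the "cache rebuilds for inactive users" mentioned earlier, not a confirmed detail of Twitter's implementation.

```python
CACHE_WINDOW = 1000  # tweet IDs kept per user, as stated in the lesson


def read_timeline(user_id, offset, count, cache, load_from_db):
    """Return (tweet_ids, source) for one page of a user's timeline.

    cache: dict of user_id -> list of tweet IDs, newest first (Redis stand-in).
    load_from_db: hypothetical callable (user_id, offset, count) -> list of IDs.
    """
    cached = cache.get(user_id)
    if cached is None:
        # Cache miss, e.g. an inactive user: rebuild the window lazily.
        cached = load_from_db(user_id, 0, CACHE_WINDOW)
        cache[user_id] = cached
    if offset + count <= CACHE_WINDOW:
        return cached[offset:offset + count], "cache"
    # Scrolled past the cached window: fall back to the database.
    return load_from_db(user_id, offset, count), "database"


# Fake database holding 1,500 tweet IDs, newest first.
fake_db = [f"t{i}" for i in range(1500, 0, -1)]


def fake_load(user_id, offset, count):
    return fake_db[offset:offset + count]


redis_stand_in = {}
first_page = read_timeline("u1", 0, 3, redis_stand_in, fake_load)
deep_page = read_timeline("u1", 1000, 2, redis_stand_in, fake_load)
print(first_page, deep_page)
```

The first call rebuilds the cache and serves from it; the second asks for entries beyond the 1,000-ID window and is answered by the database.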

A user returns to Twitter after 3 days of inactivity. Their timeline cache may be stale. What is the standard handling?

Celebrity Problem: Handling Hot Users

Fanout on write breaks for users with millions of followers: a single tweet triggers millions of Redis writes, creating a write spike. The celebrity problem requires treating high-follower users differently. The hybrid approach: fanout on write for regular users (<1M followers), fanout on read (merged at request time) for celebrities.
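A sketch of the hybrid read, assuming each feed is already sorted newest first. The helper names are hypothetical, and treating exactly 1M followers as a celebrity is an assumption drawn from the "<1M" boundary for regular users.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 1_000_000  # regular users have fewer followers than this


def is_celebrity(follower_count):
    return follower_count >= CELEBRITY_THRESHOLD


def read_hybrid_timeline(precomputed, celebrity_feeds, limit=10):
    """Merge the pre-computed timeline with celebrity tweets at request time.

    precomputed: [(timestamp, tweet_id)] newest first, filled by fanout-on-write.
    celebrity_feeds: one newest-first list per followed celebrity.
    """
    # heapq.merge streams already-sorted inputs; reverse=True expects
    # each input to be ordered from largest to smallest.
    merged = heapq.merge(precomputed, *celebrity_feeds, reverse=True)
    return [tid for _, tid in islice(merged, limit)]


own_timeline = [(5, "a5"), (2, "a2")]
celebrity_feeds = [[(4, "c4"), (1, "c1")], [(3, "d3")]]
print(read_hybrid_timeline(own_timeline, celebrity_feeds))
```

The read cost now grows with the number of celebrities a user follows rather than with their total followee count, while a celebrity's post no longer triggers millions of cache writes.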

Architecture: 99% of users use fanout-on-write, 1% celebrities use fanout-on-read. When a user reads their timeline, what happens for celebrity tweets?

Complete System Design

The complete Twitter feed architecture combines: a tweet storage database (MySQL/PostgreSQL), a Fanout Service for write-time distribution, a Redis cluster for timeline and tweet caches, a celebrity detection system, and a Timeline Read Service that merges pre-computed and real-time feeds.

Instagram uses a similar hybrid architecture for Stories and Feed. Regular users get fanout-on-write; verified accounts with large followings get fanout-on-read. Instagram's engineering team published that their Feed system processes over 800 million timeline reads per day, with 99% served from Redis cache.

A user deletes a tweet. Their 5 million followers' timeline caches still contain the tweet ID. How is this handled?

Key Ideas

  • **Scale first**: Twitter's 52:1 read:write ratio makes timeline pre-computation the critical design decision. Never design for scale without concrete numbers.
  • **Fanout on write** pushes tweet IDs to each follower's Redis sorted set at post time. O(1) reads, O(followers) writes.
  • **Fanout on read** assembles the timeline at read time from each followee's recent tweets. O(1) writes, O(followees) reads. Unacceptable at 300K reads/sec.
  • **Celebrity problem**: fanout-on-write for >1M followers creates a write spike. Hybrid: regular users = write fanout; celebrities = their own cache merged at read time.
  • **Tweet deletion**: soft delete in tweet cache. Timeline caches are not updated. Deleted tweets are filtered at hydration time when the ID is encountered.
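The deletion rule in the last bullet can be sketched as a filter applied during hydration. The tweet cache is modelled as a plain dictionary with a `deleted` flag; that layout and both function names are assumptions for illustration.

```python
def soft_delete(tweet_id, tweet_cache):
    """Mark a tweet as deleted; follower timeline caches are left untouched."""
    if tweet_id in tweet_cache:
        tweet_cache[tweet_id]["deleted"] = True


def hydrate(tweet_ids, tweet_cache):
    """Turn timeline tweet IDs into tweet objects, skipping deleted ones."""
    hydrated = []
    for tweet_id in tweet_ids:
        tweet = tweet_cache.get(tweet_id)
        if tweet is None or tweet["deleted"]:
            continue  # stale ID in the timeline cache: drop it at read time
        hydrated.append(tweet)
    return hydrated


tweet_cache = {
    "t1": {"id": "t1", "text": "hello", "deleted": False},
    "t2": {"id": "t2", "text": "oops", "deleted": False},
}
timeline_ids = ["t2", "t1"]  # still contains t2 after the delete below
soft_delete("t2", tweet_cache)
visible = hydrate(timeline_ids, tweet_cache)
print([t["id"] for t in visible])
```

One write to the tweet cache replaces millions of timeline updates; the cost is a small check on every hydrated ID.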

Related Topics

Twitter feed design applies several database patterns:

  • Redis: sorted sets (ZADD/ZREVRANGE) are the core timeline data structure. Redis Cluster shards timeline data by user_id across nodes.
  • Caching: the timeline cache is populated at write time, in the style of a write-through cache. When a tweet is posted, the fanout service pushes its ID into follower caches. Stampedes on cache misses are handled by rebuilding a timeline lazily, only when it is next requested.
  • Polyglot Persistence: the Twitter feed design uses multiple databases: PostgreSQL for tweets, MySQL for followers, Redis for the timeline cache. Each database is chosen for its specific access pattern.

Questions for Reflection

  • Twitter's threshold for switching from fanout-on-write to fanout-on-read is ~1M followers. How would you determine the optimal threshold for your system based on read/write traffic ratios?
  • A user follows 500 people, 10 of whom are celebrities. At read time, 10 additional sorted set lookups are made. How does this overhead scale as the number of celebrities on the platform grows?
  • How would you implement the 'Top Tweets' feature (showing most-liked tweets from the last 24h) within the pre-computed timeline architecture?

Related Lessons

  • sd-14-twitter: System Design: Twitter Feed