System Design

Case Study: Twitter/X

When Obama tweets to 130 million followers, the naive design writes 130 million records before the HTTP response returns. At the historical world-record tweet rate of 143,199 tweets per second (FIFA Women's World Cup final, 2011), pure fan-out on write would require 18 trillion writes per second - more than the entire global database industry processes. The architecture had to bend.

Twitter's 2010 Fail Whale era forced the rewrite from Ruby-on-Rails monolith to JVM microservices, documented by Raffi Krikorian.
Snowflake (open-sourced 2010) generates time-sortable 64-bit IDs and is now standard at Discord, Instagram, Shopify, and Sony.
Twitter ran roughly 105 TB of RAM across Redis Cluster nodes in 2013 just to hold home timelines as ZSETs.
Manhattan, Twitter's homegrown distributed KV store, replaced FlockDB and Cassandra for tweet storage to control tail latency below 10ms p99.

Requirements and Scale

**Obama has 130 million followers. When he tweets, what happens in your system?**

**Functional Requirements:** - Post tweets (text, images, video) - Follow / Unfollow users - Home Timeline: tweets from people I follow - User Timeline: all tweets of a specific user - Search tweets - Notifications (mentions, retweets, likes) **Non-Functional Requirements:** - High availability: 99.99% - Eventually consistent timeline (users accept slight delay) - Low timeline read latency: < 50ms - Tweet posting: < 100ms **Scale:** - 350 million monthly active users - 500 million tweets per day - 500,000 tweets per second at peak - Read/Write ratio: 100:1 (reading is dominant) - Average user follows 200 people - Celebrities: millions of followers

**Real World:** Twitter was founded in 2006. Architecture challenges emerged with growth: in 2009 the Fail Whale appeared during high-traffic events. The transition to microservices and the current architecture took years.

Why is making timeline reads fast more important than making tweet posting fast?

The Fan-Out Problem

**What are the trade-offs between fan-out on write and fan-out on read?**

**Fan-out on Write (Push Model)** When a tweet is posted → immediately write to all followers' timelines. ``` User A posts tweet → Write to timeline of follower 1 → Write to timeline of follower 2 → ... → Write to timeline of 1M followers ``` Pros: - Timeline read is instant (pre-computed) - No computation on read Cons: - If a user has 100M followers → 100M writes per tweet - Slow for celebrities and hot users - Wasted writes for inactive followers **Fan-out on Read (Pull Model)** When a user opens Twitter → dynamically query all followed users' tweets. ``` User opens timeline → Fetch IDs of 200 people I follow → Query tweets from each → Sort by time → Return top 100 ``` Pros: - Tweet posting is instant - No waste for inactive users Cons: - Timeline construction is slow (N+1 queries) - High load on read - Hard to paginate and sort

**Real World:** Early Twitter used pure fan-out on read and couldn't handle the load. Switching to fan-out on write helped but broke for celebrity accounts. The solution: hybrid approach.

What is the main problem with pure fan-out on write for users with millions of followers?

Hybrid Fan-Out Architecture

**How does Twitter combine both approaches to get the best of each?**

Twitter uses a **Hybrid Fan-Out** approach: **Regular Users (< 100K followers) → Fan-out on Write** - When posting, write tweet_id to the timeline cache of all followers - Fast reads from Redis **Celebrities (> 100K followers) → Fan-out on Read** - Celebrity tweets are NOT pre-written to followers' feeds - When a user opens their timeline, celebrity tweets are fetched separately and merged **Timeline Construction:** ``` 1. Get pre-computed feed from Redis (regular users' tweets) 2. Get list of celebrities I follow 3. Fetch their recent tweets from their user timeline 4. Merge + sort by time 5. Return top N tweets ``` **Why this threshold?** - A post by a 100-follower user: 100 writes - acceptable - A post by a 1M-follower user: 1M writes - too expensive - Threshold ~100K followers balances write load **Hot User Identification:** Monitor follower count. Cross the threshold → switch strategy automatically.

**Real World:** This is exactly Twitter's actual architecture (described by Raffi Krikorian, former VP Engineering). The system monitors follower counts and dynamically switches between strategies.

Why are celebrity tweets (1M+ followers) processed with fan-out on read instead of fan-out on write?

Timeline Storage in Redis

**How does Redis store timelines for 350 million users?**

Twitter stores home timelines in **Redis** using **Sorted Sets**. **Data Structure:** ``` Key: timeline:{userId} Value: Sorted Set - member: tweet_id - score: timestamp (Unix) ``` **Operations:** - Post tweet: ZADD timeline:{followerId} {timestamp} {tweetId} - Read timeline: ZREVRANGE timeline:{userId} 0 99 (top 100 newest) - All operations: O(log N) **Tweet Storage:** Redis only stores tweet_ids. Actual content in a separate DB (MySQL/Manhattanβ). To display a timeline: 1. Get 100 tweet_ids from Redis 2. Batch fetch tweets from MySQL 3. Merge with user data **Memory Optimization:** - Max 800 tweets per timeline in Redis - Older tweets → paginate from disk storage - Compression: store only tweet_id (8 bytes) + score (8 bytes) = 16 bytes per record - 350M users × 800 tweets × 16 bytes ≈ 4.5 TB of Redis (distributed cluster)

**Real World:** Twitter engineers revealed at a 2013 conference that they had ~105 TB of RAM for Redis clusters storing timelines. Redis Cluster with consistent hashing across hundreds of nodes.

Why does Redis store only tweet_ids and not the full tweet content?

Data Model

**What databases does Twitter use for different types of data?**

**Tweet Storage - MySQL (Manhattan)** Twitter has custom MySQL infrastructure: ```sql CREATE TABLE tweets ( id BIGINT PRIMARY KEY, - Snowflake ID user_id BIGINT NOT NULL, content VARCHAR(280), media_ids JSON, reply_to_id BIGINT, retweet_id BIGINT, like_count INT DEFAULT 0, created_at TIMESTAMP ); ``` **User Data - MySQL** ```sql CREATE TABLE users ( id BIGINT PRIMARY KEY, username VARCHAR(50) UNIQUE, display_name VARCHAR(100), bio TEXT, follower_count INT, following_count INT, verified BOOLEAN ); ``` **Follow Graph - FlockDB (Graph DB)** Custom distributed graph database for follower/following relationships: - Forward index: Who do I follow? - Reverse index: Who follows me? **Tweet IDs - Snowflake** Twitter developed Snowflake for distributed unique ID generation: ``` 64-bit ID: [41 bits: timestamp] [10 bits: machine_id] [12 bits: sequence] ``` - Globally unique - Sortable by time (no extra index needed) - Generates 4096 IDs per millisecond per machine

**Real World:** Twitter open-sourced Snowflake in 2010. Now it's the standard approach for distributed ID generation - used by Discord, Instagram, and others in various forms.

The follower graph is graph data, so Twitter must use a real graph database like Neo4j with traversal queries.

Twitter's FlockDB stored the follow graph as two sharded inverted indexes (forwards/backwards) over MySQL, not as a property graph. Almost all production queries are 1-hop lookups (who do I follow, who follows me), which are key-prefix scans, not traversals.

Real graph engines optimize for deep traversal and pattern matching. For 1-hop fan-out at billions of edges, partitioned inverted indexes on commodity stores outperform property graphs by an order of magnitude in throughput and operational simplicity. FlockDB powered 13 billion edges before being retired in 2020.

What are the advantages of using Snowflake IDs over UUID for tweets?

What stays in memory

Hybrid fan-out: write-based for sub-100K-follower accounts, read-based for celebrities - the threshold balances write amplification against read fan-in.
Home timelines live in Redis Sorted Sets keyed by user_id, scored by timestamp, holding only tweet_ids (16 bytes per entry).
Snowflake IDs encode time in the high 41 bits, making ORDER BY id equivalent to ORDER BY created_at - no separate index needed.
Read-to-write ratio is 100:1, so every architectural choice prioritizes read latency over write throughput.
FlockDB stored the follow graph as two inverted indexes (forwards, backwards) - 1-hop queries dominated, deep traversal was rare.

Where this connects

Twitter's hybrid fan-out is a reference pattern reused at Instagram, LinkedIn, Pinterest, and Mastodon. Snowflake IDs propagated industry-wide. The 100:1 read-write skew is the key invariant that shapes every downstream choice.

URL shortener (precursor scalability case) — builds-on
Message queues for fan-out — applies
Caching for celebrity timelines — applies
NoSQL data models — applies

Вопросы для размышления

If notifications (mentions, retweets, likes) had the same 100:1 read-write ratio as timelines, would you choose fan-out on write or fan-out on read for that service, and why does the answer differ from the timeline case?
Snowflake IDs leak timestamps - given two adjacent IDs, an attacker can infer when an account was created. What is the trade-off between time-sortability and information disclosure, and would you accept it for a healthcare or messaging product?
The hybrid threshold of 100K followers is a tuning parameter, not a law. If write IOPS got 10x cheaper tomorrow, how would the threshold move, and what feature changes would that unlock?

Связанные уроки

sd-13-url-shortener — Basic scaling patterns before a complex social network
sd-09-message-queue — Fan-out via message queue for tweet delivery to timelines
sd-07-caching — Celebrity timelines are cached separately due to fan-out problem
db-17-nosql-overview — NoSQL for timelines: rows = users, columns = tweets
dist-14-sharding