Chat architecture
WhatsApp delivers 100 billion messages a day. Telegram supports groups of up to 200,000 people. Discord serves 19 million servers. What do their architectures have in common?
- **WhatsApp** (100B messages/day) uses a deterministic channel key `min(id):max(id)` - no DB lookup on the first message
- **Slack** introduced threads in 2017 and added `threadId` plus `parentId` - two fields instead of one to allow future nesting
- **Discord** picks its fan-out strategy by server size: under 1,000 members - direct broadcast, larger - a Kafka pipeline
- **Telegram** mega-groups (200k members) switch to a pull model when the chat opens instead of pushing to everyone
1:1 chat architecture
A 1:1 chat is the simplest topology, but it raises a non-trivial question: how do you identify the channel between two users? With N users there are up to N*(N-1)/2 possible pairs, so creating rooms up front is wasteful, and creating a room on the first message forces a lookup to check whether one already exists. WhatsApp and Telegram use a deterministic key instead: `min(userId_A, userId_B):max(userId_A, userId_B)`.
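A minimal sketch of the deterministic key, assuming numeric user IDs. The function name `dm_channel_id` is hypothetical and not taken from any of the products mentioned:

```python
def dm_channel_id(user_a: int, user_b: int) -> str:
    """Return the same channel key no matter which user starts the chat."""
    low, high = sorted((user_a, user_b))
    return f"{low}:{high}"


# Both directions produce the same key, so no lookup is needed to find the channel.
print(dm_channel_id(42, 7))  # 7:42
print(dm_channel_id(7, 42))  # 7:42
```

With string IDs the same idea works as long as both sides compare the IDs the same way (for example, plain lexicographic order).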
Telegram processes 15 billion messages per day. The key to its scalability is sharding by `channelId`: all messages for a single conversation live on the same shard, which guarantees data locality when paginating history.
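Routing by `channelId` can be sketched as follows. This is an illustration under stated assumptions, not Telegram's actual scheme: the shard count `NUM_SHARDS` and the function `shard_for` are hypothetical, and a simple hash-modulo mapping is used.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical cluster size


def shard_for(channel_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a channel to a shard using a hash that is stable across processes."""
    # hashlib is used instead of the built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(channel_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every message of channel "3:5" routes to the same shard,
# so paginating its history is a single-shard query.
print(shard_for("3:5") == shard_for("3:5"))  # True
```

Hash-modulo is the simplest option; changing `num_shards` remaps most channels, which is why real deployments often prefer consistent hashing or a lookup table.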
Storing a channel lazily (on the first message) saves storage: a user with 1,000 contacts has up to 1,000 potential DM channels but actually uses 20-30. Facebook Messenger works exactly that way.
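The effect of lazy creation can be illustrated with an in-memory stand-in for a channels table. `ChannelStore` and its `send` method are hypothetical names used only for this sketch:

```python
from dataclasses import dataclass, field


@dataclass
class ChannelStore:
    """In-memory stand-in for a channels table (hypothetical)."""
    channels: dict[str, list[str]] = field(default_factory=dict)

    def send(self, sender: int, recipient: int, text: str) -> str:
        low, high = sorted((sender, recipient))
        channel_id = f"{low}:{high}"
        # setdefault creates the channel only when its first message arrives.
        self.channels.setdefault(channel_id, []).append(text)
        return channel_id


store = ChannelStore()
contacts = list(range(2, 1002))   # user 1 has 1,000 contacts
for contact in contacts[:25]:     # but only ever messages 25 of them
    store.send(1, contact, "hi")

print(len(store.channels))  # 25
```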
Users A (id=5) and B (id=3) start a conversation. The deterministic channel ID is...
Group chats: scaling delivery
Group chats break the simple 1:1 model. A Telegram group can hold 200,000 members. When a new message arrives it has to be delivered to every online member - this is the fan-out problem.
Discord uses a hybrid approach: up to 1,000 members - direct fan-out through internal pub/sub; above that - asynchronous queues via Elixir/Phoenix Channels. Across 19 million servers Discord processes 4 million messages per minute.
- **Small groups (< 100)**: direct fan-out via WebSocket broadcast
- **Medium groups (100-10k)**: Redis Pub/Sub with sharding across servers
- **Large groups (> 10k)**: Kafka fan-out with batch delivery by workers
- **Telegram mega-groups (> 100k)**: push delivery is not guaranteed for every member; only online users receive the message, and everyone else pulls it when they open the chat
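The tiers above can be written as a small selection function. The thresholds come from the list; the enum `FanOut` and the function `choose_fanout` are hypothetical names, and how the boundary values (exactly 10,000 or 100,000) are assigned is an assumption:

```python
from enum import Enum


class FanOut(Enum):
    DIRECT_WEBSOCKET = "direct WebSocket broadcast"
    REDIS_PUBSUB = "Redis Pub/Sub sharded across servers"
    KAFKA_BATCH = "Kafka fan-out with batch workers"
    PUSH_ONLINE_PULL_ON_OPEN = "push to online users, pull on open"


def choose_fanout(member_count: int) -> FanOut:
    """Pick a delivery strategy from the group size."""
    if member_count < 100:
        return FanOut.DIRECT_WEBSOCKET
    if member_count <= 10_000:
        return FanOut.REDIS_PUBSUB
    if member_count <= 100_000:
        return FanOut.KAFKA_BATCH
    return FanOut.PUSH_ONLINE_PULL_ON_OPEN


for size in (50, 5_000, 50_000, 200_000):
    print(size, "->", choose_fanout(size).value)
```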
A Discord server with 50,000 members receives a new message. Why is direct fan-out over WebSocket not a fit?
Threads: nested conversations
Threads are replies tied to a specific message, forming a nested conversation. Slack introduced them in 2017, which significantly complicated the data model: every message can now be the root of a tree.
Slack stores 10+ billion messages. The key decision: `threadId` and `parentId` are two different fields. `threadId` points to the thread root (for group-by); `parentId` points to the immediate parent (for future nested-reply support). Slack itself uses only one level of nesting.
Denormalizing `replyCount` and `lastReplyAt` is critical for performance. Without them, every channel-list render would require an aggregating query across all thread messages.
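A sketch of the denormalization using SQLite from the Python standard library. The field names `threadId`, `parentId`, `replyCount`, and `lastReplyAt` come from the text; the rest of the schema, the helper functions, and the convention that root messages store `threadId` as NULL are assumptions made for illustration, not Slack's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id          INTEGER PRIMARY KEY,
        channelId   TEXT    NOT NULL,
        threadId    INTEGER,             -- thread root; NULL on root messages (assumption)
        parentId    INTEGER,             -- immediate parent; NULL on root messages
        body        TEXT    NOT NULL,
        createdAt   INTEGER NOT NULL,    -- epoch seconds
        replyCount  INTEGER NOT NULL DEFAULT 0,
        lastReplyAt INTEGER
    )
""")


def post_root(channel_id: str, body: str, now: int) -> int:
    with conn:
        cur = conn.execute(
            "INSERT INTO messages (channelId, body, createdAt) VALUES (?, ?, ?)",
            (channel_id, body, now),
        )
    return cur.lastrowid


def post_reply(root_id: int, body: str, now: int) -> int:
    """Insert a reply and bump the root's counters in one transaction."""
    with conn:
        channel_id = conn.execute(
            "SELECT channelId FROM messages WHERE id = ?", (root_id,)
        ).fetchone()[0]
        cur = conn.execute(
            "INSERT INTO messages (channelId, threadId, parentId, body, createdAt) "
            "VALUES (?, ?, ?, ?, ?)",
            (channel_id, root_id, root_id, body, now),  # one nesting level: parent is the root
        )
        conn.execute(
            "UPDATE messages SET replyCount = replyCount + 1, lastReplyAt = ? WHERE id = ?",
            (now, root_id),
        )
    return cur.lastrowid


root = post_root("general", "Deploy at 5pm?", now=1000)
post_reply(root, "Works for me", now=1010)
post_reply(root, "Same", now=1030)

summary = conn.execute(
    "SELECT replyCount, lastReplyAt FROM messages WHERE id = ?", (root,)
).fetchone()
print(summary)  # (2, 1030)
```

Rendering the channel then reads `replyCount` and `lastReplyAt` straight from the root row instead of aggregating over the thread.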
When loading channel messages in a Slack-style app you only want to show root messages (not thread replies). Which WHERE clause is correct?
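One possible filter, sketched with SQLite. It assumes root messages are stored with `threadId` set to NULL; a schema that instead sets `threadId` equal to the message's own `id` on roots would need `threadId = id`. The table layout and the function `channel_roots` are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY, channelId TEXT NOT NULL,"
    " threadId INTEGER, parentId INTEGER, body TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO messages (id, channelId, threadId, parentId, body) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "general", None, None, "Deploy at 5pm?"),  # root
        (2, "general", 1, 1, "Works for me"),          # reply in thread 1
        (3, "general", None, None, "Lunch?"),          # root
        (4, "random", None, None, "Cat picture"),      # root in another channel
    ],
)


def channel_roots(channel_id: str, limit: int = 50) -> list[tuple[int, str]]:
    """Return the newest root messages of a channel, skipping thread replies."""
    rows = conn.execute(
        "SELECT id, body FROM messages "
        "WHERE channelId = ? AND threadId IS NULL "
        "ORDER BY id DESC LIMIT ?",
        (channel_id, limit),
    )
    return rows.fetchall()


print(channel_roots("general"))  # [(3, 'Lunch?'), (1, 'Deploy at 5pm?')]
```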
Reactions: real-time counters
Emoji reactions look simple, but they create hotspots: a popular Slack message can pick up hundreds of reactions per minute. The naive approach (UPDATE a single counter on every reaction) makes every writer contend for the same row and quickly becomes a bottleneck.
Slack uses a CRDT-like approach for reactions: every add or remove is a separate entry in an append-only log. The final counter is computed at read time. This handles concurrent reactions without locks.
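The append-only idea can be sketched in a few lines. This illustrates the pattern only and is not Slack's implementation: the `ReactionEvent` type, the in-memory `log`, and the replay-in-order semantics are all assumptions.

```python
from collections import Counter
from typing import NamedTuple


class ReactionEvent(NamedTuple):
    message_id: int
    user_id: int
    emoji: str
    op: str  # "add" or "remove"


log: list[ReactionEvent] = []  # writers only ever append, never update in place


def reaction_counts(message_id: int) -> Counter:
    """Replay the log in order and count the reactions that are still active."""
    active: set[tuple[int, str]] = set()
    for event in log:
        if event.message_id != message_id:
            continue
        key = (event.user_id, event.emoji)
        if event.op == "add":
            active.add(key)       # a repeated add by the same user is a no-op
        else:
            active.discard(key)   # removing a missing reaction is a no-op
    return Counter(emoji for _, emoji in active)


log.extend([
    ReactionEvent(1, 10, "thumbsup", "add"),
    ReactionEvent(1, 11, "thumbsup", "add"),
    ReactionEvent(1, 10, "thumbsup", "add"),     # duplicate add
    ReactionEvent(1, 11, "thumbsup", "remove"),
    ReactionEvent(1, 12, "tada", "add"),
])
print(dict(reaction_counts(1)))
```

Writers never touch a shared counter row, so concurrent reactions do not block each other; the cost moves to the read side, which is why the computed counts are usually cached or denormalized.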
- Store `(messageId, userId, emoji)` as a unique row - one user cannot place the same reaction twice
- Denormalize counters into `message.reactionSummary` JSONB - read from a single row
- Use atomic UPDATE with JSONB functions to avoid race conditions
- Throttle reaction broadcasts: if 10 reactions arrive within 100 ms, send a single batch
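The throttling rule in the last bullet can be sketched as a windowing function over timestamped events. The function `batch_by_window` and the `(timestamp_ms, payload)` event shape are hypothetical; a real server would do this with a timer per message rather than over a finished list:

```python
def batch_by_window(
    events: list[tuple[int, str]], window_ms: int = 100
) -> list[list[str]]:
    """Group (timestamp_ms, payload) events into one broadcast batch per window."""
    batches: list[list[str]] = []
    window_start = None
    for ts, payload in sorted(events):
        # Open a new batch when the current window has elapsed.
        if window_start is None or ts - window_start >= window_ms:
            batches.append([])
            window_start = ts
        batches[-1].append(payload)
    return batches


# Ten reactions within 100 ms, then one straggler at 250 ms.
events = [(i * 10, f"user{i}:thumbsup") for i in range(10)] + [(250, "user10:tada")]
batches = batch_by_window(events)
print(len(batches), [len(b) for b in batches])  # 2 [10, 1]
```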
- **Misconception**: reactions are just counters, INCREMENT/DECREMENT in a single column
- **Reality**: reactions need a separate table with uniqueness on `(messageId, userId, emoji)`; the counter is a read-side denormalization
- **Why**: without a separate table you cannot (1) show who reacted, (2) guarantee that one user does not place the same reaction twice, (3) atomically undo a reaction
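A minimal sketch of such a table, using SQLite from the Python standard library. The column names follow `(messageId, userId, emoji)` from the text; the helper functions are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reactions (
        messageId INTEGER NOT NULL,
        userId    INTEGER NOT NULL,
        emoji     TEXT    NOT NULL,
        PRIMARY KEY (messageId, userId, emoji)  -- one row per user per emoji
    )
""")


def add_reaction(message_id: int, user_id: int, emoji: str) -> bool:
    """Return True if the reaction was new, False if it already existed."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO reactions VALUES (?, ?, ?)",
            (message_id, user_id, emoji),
        )
    return cur.rowcount == 1


def remove_reaction(message_id: int, user_id: int, emoji: str) -> bool:
    """Undo a reaction; return True if a row was actually deleted."""
    with conn:
        cur = conn.execute(
            "DELETE FROM reactions WHERE messageId = ? AND userId = ? AND emoji = ?",
            (message_id, user_id, emoji),
        )
    return cur.rowcount == 1


def who_reacted(message_id: int, emoji: str) -> list[int]:
    rows = conn.execute(
        "SELECT userId FROM reactions WHERE messageId = ? AND emoji = ? ORDER BY userId",
        (message_id, emoji),
    )
    return [row[0] for row in rows]


add_reaction(1, 10, "thumbsup")
add_reaction(1, 10, "thumbsup")   # ignored: same user, same emoji
add_reaction(1, 11, "thumbsup")
print(who_reacted(1, "thumbsup"))  # [10, 11]
```

The denormalized counter can always be rebuilt from this table with a `COUNT(*) ... GROUP BY emoji`, which is what makes the table the source of truth.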
Why use a separate `(messageId, userId, emoji)` table for reactions instead of a simple counter on the message?
Takeaways
- **Deterministic channel ID**: `min(A,B):max(A,B)` - no need to create a channel up front, the key is computed on the fly
- **Fan-out strategy depends on size**: small groups - direct broadcast, large - Kafka plus workers
- **Reactions = separate table**: the counter is denormalized; the source of truth is `(messageId, userId, emoji)`
Related topics
Chat architecture stands on several core patterns:
- WebSocket scaling - real-time message delivery to channel members
- Database sharding - sharding by channelId for data locality
- Fan-out patterns - Pub/Sub and queues for scaling group delivery
Questions for reflection
- How does the architecture change when you add support for forwarding messages between channels?
- Telegram channels (not groups) have millions of subscribers. How does their delivery differ from group chats?
- A message with 1M views attracts 10k reactions per minute. How do you protect the database?