System Design

Introduction to System Design

Lesson Objectives

Understand what System Design is and why it matters
Learn the core components of modern systems
Master the concept of trade-offs
Learn a framework for solving SD problems in interviews

Prerequisites

Basic understanding of HTTP and REST APIs
Experience working with any database
Understanding of client-server architecture

Twitter, August 2013: 143,199 tweets per second during a TV broadcast in Japan - a record that held for years. Instagram sold for 1 billion dollars in 2012 with 13 engineers and 30 million users. WhatsApp, 2014 - 450 million users, 55 engineers, 19 billion dollars. System Design is not about technology. It's about trade-offs: under the CAP theorem, a distributed system facing a network partition must give up either consistency or availability. There are no right answers - only justified compromises.

**Twitter fan-out at 143k tweets per second:** at peak load, a single tweet from a celebrity account has to reach 10+ million follower feeds within seconds - only aggressive push caching made it survivable
**Instagram at 13 engineers:** aggressive Redis caching, PostgreSQL for user data, CDN for media - simple architecture taken to its limit
**WhatsApp 55 engineers / 450M users:** Erlang + FreeBSD, minimal dependencies, reliability above everything - the highest engineering density in the industry
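**CAP in production:** DynamoDB (Amazon) and Cassandra (used at Netflix) lean AP. Google Spanner is technically CP, but its TrueTime clocks (GPS plus atomic clocks) keep availability high enough that it behaves close to CA in practice. Every choice carries consequences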
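
The fan-out point above can be made concrete with a small sketch. It assumes a pure push model (fan-out on write), where each post is copied into every follower's cached timeline. The class name `PushFeed`, the `timeline_limit` value, and the in-memory dictionaries are illustrative assumptions, not a description of Twitter's actual implementation.

```python
from collections import defaultdict, deque


class PushFeed:
    """Toy fan-out-on-write feed: one post becomes one write per follower."""

    def __init__(self, timeline_limit=800):  # hypothetical cap per cached timeline
        self.followers = defaultdict(set)  # author -> set of follower ids
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_limit))
        self.writes = 0  # counts timeline writes to expose the fan-out cost

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text):
        # The expensive part: work grows linearly with the follower count.
        for follower in self.followers[author]:
            self.timelines[follower].appendleft((author, text))
            self.writes += 1

    def home_timeline(self, user, limit=10):
        # Reads are cheap: the timeline was precomputed at write time.
        return list(self.timelines[user])[:limit]


feed = PushFeed()
for i in range(1000):
    feed.follow(f"fan{i}", "star")
feed.post("star", "hello")
print(feed.writes)  # one post, 1000 timeline writes
```

The trade-off is visible in the counter: reads become trivial, while a single post from an account with 10 million followers turns into 10 million writes.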

What is System Design?

**13 engineers. 30 million users. 1 billion dollars.** Instagram sold to Facebook in 2012 with a team roughly the size of a basketball roster. WhatsApp in 2014 - 450 million active users, 55 engineers, 19 billion dollars. Scale isn't defined by technology choices - it's defined by architectural decisions.

**System Design** is not about code and frameworks. It's about **components**, their **interactions**, and **trade-offs**. "How would a designer build Twitter?" is not a question about programming languages. It's about how the system handles 143,000 tweets per second at peak without going down.

System Design is the bridge between business requirements and technical implementation. A senior engineer turns 'we need a chat' into an architecture with concrete components, concrete numbers, and justified trade-offs.

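**CAP theorem in one line:** of three properties - Consistency, Availability, Partition tolerance - a distributed system can guarantee at most two at once. Because network partitions cannot be ruled out, the practical choice is between Consistency and Availability while a partition lasts. There are no correct answers - only justified trade-offs. That is the essence of System Design.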

**Analogy:** an architect designing a skyscraper doesn't start with wallpaper colors - they start with foundations, load-bearing structures, elevators. System Design is the same discipline for software systems.

System Design covers a wide range of decisions:

**Database selection**: SQL vs NoSQL, when to use which
**Scalability**: how to handle growing load
**Reliability**: what to do when a server goes down
**Performance**: how to reduce latency
**Security**: authentication, authorization, encryption

Myth: System Design is about choosing the right technology

Reality: System Design is about choosing the right trade-off

There is no 'best' database or 'best' pattern - there are solutions that fit specific requirements. Instagram ran on PostgreSQL at 30M users. A store like Cassandra becomes the better fit at around 10B events per day. Technology is secondary - requirements and trade-offs come first.

Which of the following is NOT part of System Design?

Unit tests are part of code development, not system architecture. System Design operates at the level of components and their interactions, not individual functions.

Why Study System Design?

System Design is not an academic exercise. It's the skill that separates engineers by value and compensation more than almost anything else.

**1. Interviews at top companies**

System Design is a mandatory interview stage at Google, Meta, Amazon, Apple, Netflix, and Microsoft. At the Senior+ level it's often the deciding round: a weak coding round can be recovered from, but a failed System Design round means no offer.

**2. Career growth**

Junior - writes code to spec. Mid - picks technologies, designs the API. Senior - spots the bottleneck, proposes architectural changes. Staff - designs the evolution of the entire platform for 10x growth. Every step up requires System Design.

**3. Practical value**

Understanding System Design helps in day-to-day work:

Understanding why a system fails under load
Knowing how to read architecture diagrams
Participating in design reviews meaningfully
Communicating with the team using the same vocabulary
Making informed technical decisions

Even without FAANG ambitions, System Design shifts perspective. Instead of seeing only a single endpoint, engineers start seeing the system as a living organism with failure points, bottlenecks, and headroom.

At what level does System Design become a critically important skill?

System Design becomes critically important when transitioning to the Senior level. Juniors and Mids can work from ready-made specs, but a Senior must make architectural decisions independently.

Core Building Blocks

Instagram, Twitter, WhatsApp - different products, but the same building blocks under the hood. The difference lies in how they're combined and which trade-offs they optimize for.

Each component solves one problem better than anything else - that's what makes architectures predictable:

**1. Load Balancer** - the entry point that distributes traffic across servers. Solves two problems at once: scalability (one server can't handle 143k RPS) and fault tolerance (a server goes down, traffic routes to live ones). Nginx, HAProxy, AWS ALB.

**2. Application Server** - business logic. Key property: **stateless**. Any server handles any request. This means adding 10 servers takes 5 minutes - the Load Balancer starts using them immediately.

**3. Database** - durable storage. SQL (PostgreSQL, MySQL) - ACID transactions, complex JOINs, strict schema. NoSQL (MongoDB, Cassandra) - flexible schema, horizontal sharding, massive write throughput. The choice follows from the data, not from trends.

**4. Cache** - Redis/Memcached in front of the database. Hot data is served from memory, so read latency can drop from roughly 50 ms to under 1 ms. Twitter reportedly kept the top 1,000 most-followed users in Redis permanently, so their fans' feeds could be built without touching the database.

**5. CDN** - Cloudflare, AWS CloudFront. Static assets and media served from the nearest edge node. A user in Tokyo gets the image from Tokyo, not from Virginia.

**6. Message Queue** - Kafka, RabbitMQ, SQS. A heavy task (photo resize, email send, invoice generation) goes into the queue - the API responds to the client immediately, a worker processes it asynchronously.
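
The Message Queue pattern from the list above can be sketched with the standard library alone. Here `queue.Queue` stands in for Kafka, RabbitMQ, or SQS, and a background thread stands in for a separate worker process. The names `handle_request`, `start_worker`, `results`, and `STOP` are hypothetical and exist only for this illustration.

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a real broker (Kafka, RabbitMQ, SQS)
results = {}          # stand-in for wherever the worker stores its output
STOP = object()       # sentinel that tells the worker to exit


def worker():
    """Pull jobs off the queue and process them one at a time."""
    while True:
        job = jobs.get()
        try:
            if job is STOP:
                return
            job_id, payload = job
            results[job_id] = f"processed:{payload}"  # placeholder for heavy work
        finally:
            jobs.task_done()  # lets jobs.join() know this item is finished


def handle_request(job_id, payload):
    """API handler: enqueue the heavy task and respond immediately."""
    jobs.put((job_id, payload))
    return {"status": "accepted", "job_id": job_id}


def start_worker():
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

A caller would run `start_worker()` once, then call `handle_request("job-1", "resize photo.jpg")` and get the `accepted` response back without waiting for the work itself. `jobs.join()` blocks until the worker has drained the queue.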

Myth: Stateless means no sessions

Reality: Stateless means state is stored separately from the server

Sessions can live in Redis and the architecture is still stateless: any server finds the user's session through shared Redis. The point is that killing or adding a server loses no data and creates no duplicates. That's what enables horizontal scaling
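
A minimal sketch of that idea, assuming an in-memory `SessionStore` class as a stand-in for shared Redis and a simple round-robin balancer. All class and method names here are hypothetical.

```python
class SessionStore:
    """Stand-in for a shared store such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, value):
        self._data[session_id] = value


class AppServer:
    """Stateless server: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        session = self.store.get(session_id) or {"visits": 0}
        session["visits"] += 1
        self.store.set(session_id, session)
        return self.name, session["visits"]


class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0

    def route(self, session_id):
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server.handle(session_id)

    def remove(self, name):
        # Simulates a server going down or being scaled in.
        self.servers = [s for s in self.servers if s.name != name]


store = SessionStore()
lb = RoundRobinBalancer(AppServer(f"app-{i}", store) for i in (1, 2, 3))
print(lb.route("user-1"))  # handled by app-1
print(lb.route("user-1"))  # handled by app-2, which still sees the session
lb.remove("app-3")
print(lb.route("user-1"))  # the visit count survives the server removal
```

Because every server reads and writes the same store, the visit counter keeps increasing no matter which server answers or which one disappears.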

Why should an Application Server be stateless?

Stateless servers enable easy scaling: the Load Balancer can route a request to any server because state is stored separately (in DB, Cache). This is the key to horizontal scaling.

Trade-offs: The Heart of System Design

**Perfect architecture doesn't exist.** Every decision is a trade-off. A good engineer understands what they're giving up and explains why. A bad engineer picks a technology because it's trendy.

In an interview, candidates are not evaluated on the "right answer" (there isn't one), but on the understanding of trade-offs and the ability to reason through decisions out loud. "I'm choosing X here because..." - that's the expected format.

**Trade-off #1: Consistency vs Availability**

CAP theorem: under network partition, the system picks one of two. Consistency - all nodes see the same data at any moment. Availability - the system always responds, even if data may diverge. A bank transfer demands Consistency. An Instagram feed tolerates eventual consistency just fine.
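
The choice can be illustrated with a toy replica that is cut off from its primary. This is a deliberately simplified model, not a real replication protocol: the `Replica` class, its `mode` flag, and the `Unavailable` exception are assumptions made for the example.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def replicate(self, value):
        # Updates from the primary are lost while the network is partitioned.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk returning stale data.
            raise Unavailable("cannot confirm the latest value during a partition")
        # Availability first: always answer, even if the value may be stale.
        return self.value


ap, cp = Replica("AP"), Replica("CP")
for replica in (ap, cp):
    replica.replicate(100)      # balance of 100 reaches both replicas
    replica.partitioned = True  # the network splits
    replica.replicate(50)       # this update never arrives

print(ap.read())                # stale value, but the system responds
try:
    cp.read()
except Unavailable as exc:
    print("CP replica:", exc)
```

The AP replica keeps serving an outdated balance, which is acceptable for a feed and dangerous for a bank transfer. The CP replica stays correct by refusing to answer.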

**Trade-off #2: Latency vs Throughput**

Latency - response time for a single request. Throughput - requests per second. Google Search: p95 latency < 200 ms, every individual query matters. Kafka log processing: aggregate throughput matters - millions of events per second, a few seconds of delay is fine. Optimizing for one often sacrifices the other.
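
One common way this trade-off shows up is batching. The sketch below uses a simple cost model with made-up numbers: a fixed `overhead_ms` per batch and a `per_item_ms` cost per item. Both parameters are assumptions chosen for illustration, and the model ignores the time items spend waiting for a batch to fill.

```python
def batch_stats(batch_size, overhead_ms=5.0, per_item_ms=0.1):
    """Return (time to finish one batch in ms, throughput in items per second)."""
    batch_time_ms = overhead_ms + batch_size * per_item_ms
    throughput = batch_size / batch_time_ms * 1000
    return batch_time_ms, throughput


for size in (1, 10, 100, 1000):
    latency_ms, items_per_sec = batch_stats(size)
    print(f"batch={size:>4}  latency={latency_ms:6.1f} ms  throughput={items_per_sec:8.0f}/s")
```

Under these assumptions, larger batches amortize the fixed overhead and raise throughput, while every item in the batch waits longer for its result.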

**Trade-off #3: SQL vs NoSQL**

A classic mistake: choosing MongoDB "because of the flexible schema" for a system with financial transactions. Or choosing PostgreSQL for 100 billion events per day. Technology follows from requirements - never the other way around.

A team is designing a payment system. What matters most?

Strong Consistency is critical for financial systems: two concurrent requests must never both spend the same balance (double-spending). It's better for the system to be unavailable than to show incorrect data.
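
A minimal sketch of guarding against double-spending on a single database, assuming an in-memory SQLite ledger. The table layout, the `open_ledger` and `debit` helpers, and the account name are hypothetical. The key idea is that the balance check and the update happen in one atomic statement.

```python
import sqlite3


def open_ledger():
    """Create an in-memory ledger with a single hypothetical account."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.commit()
    return conn


def debit(conn, account_id, amount):
    """Withdraw only if funds suffice; check and update are one statement."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        return cur.rowcount == 1  # no row updated means insufficient funds


conn = open_ledger()
print(debit(conn, "alice", 80))  # first withdrawal succeeds
print(debit(conn, "alice", 80))  # second is rejected instead of overdrawing
```

Reading the balance in one query and writing it in another would leave a window for two requests to pass the check at the same time. Folding the condition into the `UPDATE` closes that window on a single node. Distributed setups need additional coordination that this sketch does not cover.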

Framework for Solving SD Problems

A System Design interview lasts 45-60 minutes. Without structure - chaos: the candidate dives into implementation details too early, misses the important parts, loses the interviewer. Here's a framework that covers all expectations:

**Stage 1: Requirements - ask questions**

"Design Twitter" is intentionally vague. Just posting, or also search, recommendations, monetization? 100K DAU or 300M? Text only or video too? The right questions demonstrate experience - and they determine the entire architecture.

**Stage 2: Estimation - back-of-envelope**

Twitter: 300M DAU × 50 reads/day / 86,400 sec = ~170k reads/sec on average. Assuming peak load is 3× the average gives roughly 500k RPS. That alone tells the story: aggressive caching is required, along with read replicas for the DB and sharding. Precision isn't the goal - the right order of magnitude is.
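
The same arithmetic as a small helper, using the numbers from the text. The function name `estimate_rps` and the default `peak_factor` of 3 are illustrative assumptions.

```python
SECONDS_PER_DAY = 86_400


def estimate_rps(dau, actions_per_user_per_day, peak_factor=3):
    """Return (average requests/sec, peak requests/sec) for a daily workload."""
    average = dau * actions_per_user_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


avg, peak = estimate_rps(300_000_000, 50)
print(f"average ~{avg:,.0f} reads/sec, peak ~{peak:,.0f} reads/sec")
# average ~173,611 reads/sec, peak ~520,833 reads/sec
```

Rounded to the order of magnitude that matters in an interview, that is about 170k reads/sec on average and about 500k at peak.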

**Stage 3: High-Level Design - draw a diagram**

Start with the simplest architecture: Client → Load Balancer → App Server → Database. Add components as their need becomes apparent - Cache if read-heavy, Queue if there are heavy background tasks, CDN if the audience is global.

Think out loud. The interviewer evaluates reasoning, not just results. "I'm adding a Redis cache here because read:write is 100:1 and most requests read the same hot profiles" - that's exactly the right format.
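
The Redis example in that sentence usually refers to the cache-aside pattern. Below is a minimal sketch, assuming a plain dictionary in place of Redis and a caller-supplied `load_from_db` function. The class name `CacheAside`, the TTL value, and the hit/miss counters are hypothetical.

```python
import time


class CacheAside:
    """Read-through-on-miss cache with a per-entry time-to-live."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: fall back to the database and repopulate.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call after a write so the next read fetches fresh data.
        self._entries.pop(key, None)


db_calls = []


def load_profile(user_id):
    db_calls.append(user_id)  # records each trip to the "database"
    return {"id": user_id}


cache = CacheAside(load_profile)
for _ in range(100):
    cache.get("hot-profile")
print(len(db_calls), cache.hits, cache.misses)  # 1 database read, 99 hits, 1 miss
```

With a 100:1 read-to-write ratio on hot keys, almost every read is absorbed by the cache, which is the reasoning the interviewer wants to hear stated out loud.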

**Stage 4: Deep Dive - demonstrate expertise**

"Tell me more about X" is the standard interviewer move. Be ready to go deep on any component: exactly how does sharding by user_id work, what eviction policy fits the cache, how is replication structured and what happens at failover.

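As one example of a deep-dive answer, sharding by `user_id` can be sketched as a stable hash followed by a modulo. The helpers `shard_for` and `moved_fraction` are hypothetical, and SHA-256 is used only because Python's built-in `hash()` is not stable across processes for strings.

```python
import hashlib
from collections import Counter


def shard_for(user_id, num_shards):
    """Map a user id to a shard index using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(user_ids, old_shards, new_shards):
    """Fraction of users whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(shard_for(u, old_shards) != shard_for(u, new_shards) for u in ids)
    return moved / len(ids)


load = Counter(shard_for(u, 4) for u in range(10_000))
print(sorted(load.items()))                   # roughly even spread over 4 shards
print(moved_fraction(range(10_000), 4, 5))    # most keys move when a shard is added
```

With plain modulo, going from 4 to 5 shards relocates about 80% of keys, which is why consistent hashing tends to come up as the follow-up topic.
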
What is the first step when solving a System Design problem?

Requirements first! A system cannot be designed without understanding what it should do, what scale it needs to support, and what the constraints are. Different requirements → different architectures.

Key Takeaways

**System Design** - architectural decisions that determine whether a system survives 143k RPS or collapses under the first traffic spike
**CAP theorem** - the foundation: of Consistency, Availability, and Partition tolerance, a system can guarantee at most two, and since partitions happen, the real choice is C or A during a partition. Banks lean CP. Social networks lean AP
**6 building blocks**: Load Balancer, App Server (stateless!), DB, Cache, CDN, Message Queue - every system is assembled from this set
**Trade-offs, not technologies**: MongoDB vs PostgreSQL follows from data requirements, not hype. Latency vs Throughput follows from load type
**4-stage framework**: Requirements (clarify scope) → Estimation (order of magnitude) → High-Level Design (diagram) → Deep Dive (details on demand)

What's Next

Upcoming lessons: Requirements Gathering (how to ask the right questions), Load Estimation (back-of-envelope calculations), then breakdown of each component (Load Balancer, Cache, Message Queue) and patterns (Microservices, API Gateway). The finale - real-world cases: designing Twitter, YouTube, Uber.

Questions for Reflection

Consider a product or service used every day - a messenger, a streaming platform, maps. What trade-offs are hidden behind its behavior? When it shows stale data or goes briefly unavailable - is that accidental, or a deliberate architectural decision?

Related Lessons

sd-02-requirements — Requirements gathering is the first step of the 4-stage framework introduced here
alg-01-big-o — Back-of-envelope estimation mirrors complexity analysis for systems
st-01-feedback-loops — CAP theorem and consistency models map directly to feedback loop theory
prob-01-intro — Probabilistic thinking needed for SLA, error budgets, and latency percentiles
bt-01-overview — Transport protocols are the connective tissue between system design components
net-01-intro