System Design

Scalability Fundamentals

Lesson Objectives

Understand when and why scaling is needed
Grasp vertical vs horizontal scaling
Master the Stateless vs Stateful concept
Understand the CAP theorem and its practical implications
Learn the core scaling patterns

Prerequisites

Requirements gathering (previous lesson)
Basic understanding of client-server architecture

2008. Twitter. Super Bowl. Traffic spikes far past what the service was built to handle. The service buckles. Users stare at the Fail Whale. Advertisers bleed revenue. That day became the turning point - the team started building the architecture that, five years later, would carry 143,000 tweets per second without a single hiccup.

**Twitter 2013** - the transition to horizontal scaling let the service survive Super Bowl load without a single Fail Whale
**Netflix** caches 99% of requests through EVCache (distributed Memcached) - without caching, AWS bills would grow by an order of magnitude
**Amazon** began splitting its monolith into 100+ services in 2002 - each stateless, each scaling independently
**Instagram** in 2012 with 13 employees served 30 million users - PostgreSQL + Redis + Nginx + horizontal scaling

Amdahl's Law and the birth of horizontal scaling

In 1967, Gene Amdahl (IBM) described the limits of parallel computing. His formula shows that the sequential part of a program bounds the speedup of the whole system. In the 2000s, as single servers hit physical limits, the industry shifted en masse to horizontal scaling. Google published the GFS paper in 2003 and the MapReduce paper in 2004 - widely read public descriptions of systems built on commodity hardware. Amazon switched to a service-oriented architecture by Bezos mandate in 2002. Netflix publicly described its microservices transformation in 2009. Each followed the same logic: not one high-end server, but many cheap ones.

Why Scale?

The Fail Whale is what missing scalability looks like from the outside: load outgrows the architecture, the service goes down in front of millions of users, and ad revenue goes with it. For Twitter, those outages kicked off the real architecture work.

**Scaling** is a system's ability to absorb growing load. Not just serving 100 users - going from 100 to 100,000 to 10 million without a complete rewrite.

**When should scaling begin?**

**Latency is growing** - requests are getting slower
**CPU/Memory near capacity** - the server runs at 80-90% utilization
**Errors under load** - the system crashes during peak hours
**Database is a bottleneck** - DB queries take up most of the response time
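
The four signals above can be turned into a simple check. This is a minimal sketch, not a monitoring recipe: the `Metrics` fields and every threshold except the 80% CPU figure are assumptions chosen for illustration and should be tuned per system.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    p95_latency_ms: float       # current 95th-percentile latency
    baseline_latency_ms: float  # latency when the system was healthy
    cpu_utilization: float      # 0.0 .. 1.0
    error_rate: float           # fraction of failed requests at peak
    db_time_share: float        # fraction of response time spent in DB queries


def scaling_signals(m: Metrics) -> list[str]:
    """Return the warning signs that currently apply (thresholds are illustrative)."""
    signals = []
    if m.p95_latency_ms > 2 * m.baseline_latency_ms:  # assumed: 2x baseline
        signals.append("latency is growing")
    if m.cpu_utilization >= 0.80:                     # from the 80-90% rule above
        signals.append("CPU near capacity")
    if m.error_rate > 0.01:                           # assumed: more than 1% errors
        signals.append("errors under load")
    if m.db_time_share > 0.5:                         # assumed: DB takes over half
        signals.append("database is a bottleneck")
    return signals


print(scaling_signals(Metrics(120, 100, 0.85, 0.0, 0.7)))
```

An empty list means none of the signals fired, which is the case where scaling work would be premature.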

Premature optimization is the root of all evil. Do not scale early - that just piles on complexity. Design so scaling is possible later.

When should a system start scaling?

Vertical vs Horizontal Scaling

Two fundamental approaches to scaling: **vertical** (Scale Up) and **horizontal** (Scale Out). The difference is not just technical - it dictates the entire system architecture.

**Vertical scaling (Scale Up)** - increase the power of a single server: more CPU, RAM, faster disk.

**Horizontal scaling (Scale Out)** - add more servers and distribute the load across them.

**Amdahl's Law: the math behind enthusiasm**

Not everything parallelizes. Amdahl's Law is the cold shower architects need: the sequential portion of the code caps the maximum speedup no matter how many servers get added.
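
Written out, the law looks like this. The symbols are notation introduced here for illustration: p is the fraction of the work that can run in parallel, N is the number of servers, and S(N) is the resulting speedup.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}

% Worked example with 5% sequential code (p = 0.95):
S(100) = \frac{1}{0.05 + 0.0095} \approx 16.8,
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{0.05} = 20
```

With 5% sequential code, 100 servers yield roughly a 16.8x speedup, and no number of servers gets past 20x.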

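AWS EC2 instances range from t3.micro (2 vCPU, 1 GiB RAM) to u-24tb1.metal (448 vCPU, 24 TiB RAM). Vertical scaling stops at the largest machine available; horizontal scaling has no comparable hardware ceiling, although the sequential part of the workload still limits how much each extra server helps.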

The killer design question: which part of the system CANNOT be parallelized? That is the Amdahl bottleneck. The database is usually that part - which is exactly why sharding and replication matter so much.
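
A short calculation shows why the non-parallel part deserves the attention. This is a sketch using the standard form of Amdahl's Law; the function name and the example fractions are chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# 5% sequential work: adding servers gives diminishing returns.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.95, n), 1))   # 1.0, 6.9, 16.8, 19.6

# Shrinking the sequential part to 1% helps more than 10x the servers.
print(round(amdahl_speedup(0.99, 100), 1))        # 50.3
```

Going from 100 to 1000 servers moves the speedup from 16.8x to 19.6x, while cutting the sequential share from 5% to 1% at 100 servers reaches about 50x. That is the arithmetic behind attacking the database bottleneck first.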

A stateless REST API. Which type of scaling is preferable?

Stateless vs Stateful

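**Stateless** versus **Stateful** is the dividing line for scaling. Stateless services scale easily: any server can handle any request, so adding capacity means adding servers. Stateful services scale painfully, because each request must reach the server that holds its state, or that state must be moved or replicated.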

A **stateless service** holds no information between requests. Each request carries everything it needs to be processed.

A **stateful service** holds state between requests. The next request depends on the previous one.
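
The two definitions can be contrasted with a pagination example. This is an illustrative sketch; `StatefulPager` and `stateless_page` are hypothetical names, and the "servers" are plain Python objects.

```python
class StatefulPager:
    """Remembers each client's position between requests."""

    def __init__(self, items, page_size):
        self.items = items
        self.page_size = page_size
        self.position = {}  # client_id -> next offset (state kept on this server)

    def next_page(self, client_id):
        start = self.position.get(client_id, 0)
        self.position[client_id] = start + self.page_size
        return self.items[start:start + self.page_size]


def stateless_page(items, offset, page_size):
    """The request carries the offset, so any server can answer it."""
    return items[offset:offset + page_size]


items = list(range(10))
server_a = StatefulPager(items, 3)
server_b = StatefulPager(items, 3)

first = server_a.next_page("alice")    # [0, 1, 2]
second = server_b.next_page("alice")   # lands on another server: [0, 1, 2] again
correct = stateless_page(items, 3, 3)  # [3, 4, 5] from any server
```

When the second request lands on a different server, the stateful version repeats the first page because that server never saw the first request. The stateless version gives the same answer everywhere.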

**How does Stateful become Stateless?** The trick: lift state out into an external store (Redis, Database) and keep the servers stateless.

**JWT tokens** - state inside the token itself, server stores no sessions
**Client-side state** - state in the browser (Redux, localStorage)
**Centralized cache** - Redis/Memcached for shared state
**Database** - all data in DB, server handles only business logic
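
The external-store approach can be sketched as follows. `SessionStore` is an in-memory stand-in for Redis or Memcached, and `AppServer` is a hypothetical application server; a real deployment would talk to the store over the network and set expiry times on sessions.

```python
import uuid


class SessionStore:
    """In-memory stand-in for a shared store such as Redis or Memcached."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, value):
        self._data[session_id] = value


class AppServer:
    """Keeps no session data of its own; everything goes through the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.set(session_id, {"user": user})
        return session_id

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

sid = server_a.login("alice")   # login handled by server a
print(server_b.whoami(sid))     # server b recognizes the session: alice
```

Because neither server owns the session, either one can be restarted or removed without logging users out.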

Stateful services are still required: databases, queues, caches. They scale separately, more carefully, with their own playbook. The application layer stays stateless.

A web app stores user sessions in server memory. How can it be made stateless?

CAP Theorem

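The **CAP theorem** (Brewer's theorem) is a fundamental law of distributed systems. A system cannot guarantee all three properties at once: Consistency (every read sees the latest write), Availability (every request gets a non-error response), and Partition Tolerance (the system keeps working when the network drops messages between nodes). This is not a rule of thumb; it is a formally proven theorem (Gilbert and Lynch, 2002).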

In a distributed system, network partitions are **inevitable**. Servers spread across data centers, networks misbehave. A system that cannot survive a partition is not distributed. The real choice boils down to: CP or AP.
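
The CP/AP choice can be made concrete with a toy two-node model. This is a deliberately simplified sketch: `Replica`, `Unavailable`, and `make_pair` are hypothetical names, replication is a direct assignment, and a real CP system would use quorums and may also refuse reads on the minority side.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to serve during a partition."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the peer is unreachable

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: reject the write rather than diverge.
                raise Unavailable("cannot reach peer; refusing write")
            # Availability first: accept locally, replicas now disagree.
            self.value = value
            return
        self.value = value
        self.peer.value = value   # healthy network: replicate immediately


def make_pair(mode):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


a, b = make_pair("AP")
a.write(1)
a.partitioned = b.partitioned = True
a.write(2)
print(a.value, b.value)  # 2 1: both nodes answer, but b serves stale data
```

In `"CP"` mode the same second write raises `Unavailable`: the data stays consistent, and the client sees an error instead.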

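**The PACELC clarification.** CAP describes behavior during a partition. What about normal operation? PACELC fills the gap: if there is a Partition (P), choose between Availability (A) and Consistency (C); Else (E), choose between Latency (L) and Consistency (C). Even without a partition there is a trade-off, because keeping replicas in sync before answering costs time.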

In practice, most systems accept eventual consistency for the bulk of operations and reserve strong consistency for the critical ones (payments, inventory).

You are designing a like-counter system for a social network. Which trade-off would you choose?

Scaling Patterns

Theory laid down. Now the practice: five patterns powering real systems from startups to FAANG.

**1. Load Balancing** - distributes requests across servers. The first step of horizontal scaling.
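
Round-robin is one of the simplest distribution strategies. The sketch below assumes a fixed list of healthy servers; the server names are placeholders, and production balancers add health checks, weights, and connection awareness.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [lb.pick() for _ in range(5)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```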

**2. Database Replication** - copies data across multiple servers. Increases read throughput and fault tolerance.
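
A primary with read replicas can be modeled in a few lines. This is a toy sketch with dictionaries standing in for database nodes; `ReplicatedDB` is a hypothetical name, and the copy to replicas is synchronous here, whereas real systems often replicate asynchronously and can serve slightly stale reads.

```python
import itertools


class ReplicatedDB:
    """Writes go to the primary; reads rotate across replicas."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:  # synchronous copy, for simplicity
            replica[key] = value

    def read(self, key):
        index = next(self._next_replica)
        return self.replicas[index].get(key)


db = ReplicatedDB(replica_count=2)
db.write("user:1", "alice")
print(db.read("user:1"), db.read("user:1"))  # served by replica 0, then replica 1
```

Adding replicas raises read capacity, but every write still passes through the single primary.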

**3. Database Sharding** - splits data into parts (shards) by key. Scales both reads and writes.
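
Hash-based sharding is one common way to pick a shard from a key. The sketch below uses a stable hash from the standard library; `shard_for` and `ShardedStore` are hypothetical names, and the shards are plain dictionaries.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Each shard holds only the keys that hash to it."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for user_id in range(100):
    store.put(f"user:{user_id}", user_id)

print(store.get("user:42"))                    # 42
print([len(shard) for shard in store.shards])  # keys spread across the 4 shards
```

Plain modulo hashing remaps most keys when the shard count changes. Consistent hashing is the usual way to soften that.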

**4. Caching** - keeps hot data in memory. Slashes DB load and latency. Netflix caches film metadata in memory - 90% of requests never touch the DB.
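
The cache-aside pattern is a common way to apply this: check the cache first, fall back to the database on a miss, then store the result. A minimal sketch, with a hypothetical `CacheAside` class and a lambda standing in for the database query; eviction, TTLs, and invalidation are left out.

```python
class CacheAside:
    """Serve from memory when possible; load from the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db
        self._cache = {}
        self.db_hits = 0  # counts how often the database was actually queried

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value


cache = CacheAside(load_from_db=lambda key: key.upper())
for _ in range(3):
    cache.get("title")

print(cache.db_hits)  # 1: only the first request reached the database
```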

**5. Asynchronous Processing** - offloads heavy operations to background processing via queues.
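
The idea can be sketched with an in-process queue and one worker thread. In production the queue would be an external broker and the worker a separate process; `handle_request` and the squaring "work" are placeholders.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    """Pull jobs off the queue and process them in the background."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel value: stop the worker
            jobs.task_done()
            break
        results.append(job * job)  # stands in for a heavy operation
        jobs.task_done()


def handle_request(n):
    """Enqueue the work and answer immediately."""
    jobs.put(n)
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_request(n) for n in (2, 3, 4)]
jobs.join()      # wait for the background work (needed only for this demo)
jobs.put(None)
thread.join()

print(responses)  # ['accepted', 'accepted', 'accepted']
print(results)    # [4, 9, 16]
```

The request handler returns before the work is done, so response time no longer depends on how long the heavy operation takes.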

Real systems combine multiple patterns. Load Balancing + Caching + Database Replication is the baseline kit for any high-load system.

The database is a bottleneck. Reads happen 100x more frequently than writes. Which pattern should be applied first?

Key Ideas

Vertical scaling (Scale Up) - increasing server power, simple but has a ceiling
Horizontal scaling (Scale Out) - adding servers, requires stateless architecture
Amdahl's Law: 5% sequential code caps speedup at 20x no matter how many servers are added; 100 servers give only about 16.8x
CAP theorem: during a network partition the choice is Consistency or Availability - no third option
Patterns: Load Balancing, Replication, Sharding, Caching, Async Processing - applied in combination

What's next

Next lesson: Database fundamentals - SQL vs NoSQL, ACID, replication and sharding in practice. It covers the data-layer scaling patterns in detail.

CAP Theorem (in depth) — Full breakdown of CP vs AP with Cassandra, etcd, Spanner examples
Databases: Replication and Sharding — Scaling pattern details for the data layer

Questions for Reflection

Instagram with 13 employees served 30 million users. Which specific scaling decisions made that possible? What would have happened if they had chosen CP instead of AP for feeds?

Related Lessons

ds-02 — CAP theorem is the foundation for choosing between CP and AP systems
bd-03 — Replication and sharding detail scaling patterns for the data layer
cloud-03 — Cloud auto-scaling builds on horizontal scaling principles
devops-05 — Load balancing is the first step of horizontal scaling in production
alg-01 — Big-O helps identify the bottleneck - the sequential part in Amdahl's law
se-08 — CQRS separates read/write paths - the same logic as read replicas