Computer Networks

Latency Numbers Every Programmer Should Know

Knowing latency numbers is a superpower in system design interviews. Suppose you say "we need a cache because a network call within the datacenter adds ~500µs while a RAM read costs ~100ns." The interviewer then sees that you understand the trade-off rather than repeating a slogan. Quantitative reasoning like this is a hallmark of senior engineers.

**Google**: Jeff Dean's latency numbers - required knowledge for SRE and system design roles
**HFT**: microseconds matter - firmware optimizations to minimize latency
**Gaming**: 16ms frame budget requires understanding every component of the pipeline

Prerequisites

Why Networks Exist

Lesson Goals

Memorize the orders of magnitude: L1 ~1ns, RAM ~100ns, SSD ~100µs, datacenter RTT ~0.5ms, cross-region ~150ms
Distinguish throughput vs latency: 10GbE = 1.25 GB/s but a single RTT still costs 0.5ms
Estimate fan-out scenarios: 100 parallel RPCs of 1ms = ~1ms wall-clock; sequential = 100ms
Apply numbers in interviews: justify 'we need a cache' with '100ns vs 500µs', not 'cache is faster'
Understand why physical limits set the floor for cross-continent RTT (light in fiber needs ~21ms one-way from NYC to SF)
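The fan-out goal reduces to two formulas:

* Sequential calls add up.
* Parallel calls cost as much as the slowest one.

This is a minimal sketch. It assumes every RPC has a fixed latency and ignores queuing and connection limits. `sequential_ms` and `parallel_ms` are hypothetical helper names.

```python
def sequential_ms(latencies_ms: list[float]) -> float:
    """Calls issued one after another: wall-clock time is the sum."""
    return sum(latencies_ms)


def parallel_ms(latencies_ms: list[float]) -> float:
    """Calls issued all at once: wall-clock time is set by the slowest call."""
    return max(latencies_ms)


# 100 RPCs of 1ms each (assumed constant latency).
rpcs = [1.0] * 100
print(f"sequential: {sequential_ms(rpcs):.0f} ms")  # sequential: 100 ms
print(f"parallel:   {parallel_ms(rpcs):.0f} ms")    # parallel:   1 ms
```

A parallel fan-out waits for its slowest call, so one slow backend is enough to stretch the whole request.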

Latency Hierarchy

Understanding the orders of magnitude of latency is critical for system design. The gap between an L1 cache reference (~1ns) and a cross-continent round trip (~100ms) spans **8 orders of magnitude**. Optimization effort should therefore go to the slowest operation on the critical path.

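**Rule of thumb**: RAM ~100ns, SSD ~10–100µs, network within DC ~500µs, cross-continent ~100ms. Each step down is slower, but the gaps are uneven. RAM to SSD is ~100–1000×, SSD to a datacenter round trip is only ~5–50×, and a datacenter round trip to a cross-continent one is ~200×.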
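To see how uneven the steps are, the sketch below encodes the rule-of-thumb numbers and prints the ratio between adjacent levels. The dictionary keys and the choice of 100µs for the SSD entry (the upper end of the range) are assumptions made for illustration.

```python
# Rule-of-thumb latencies from the text, in nanoseconds.
LATENCY_NS = {
    "RAM read": 100,                      # ~100ns
    "SSD random read": 100_000,           # ~100µs (upper end of 10–100µs)
    "DC round trip": 500_000,             # ~500µs
    "Cross-continent RTT": 100_000_000,   # ~100ms
}


def ratio(slow: str, fast: str) -> float:
    """How many `fast` operations fit into the time of one `slow` operation."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


# Compare each level with the one directly above it (dicts keep insertion order).
levels = list(LATENCY_NS)
for fast, slow in zip(levels, levels[1:]):
    print(f"{slow}: ~{ratio(slow, fast):,.0f}x {fast}")
```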

Jeff Dean (Google) popularized these numbers. They change as hardware evolves (NVMe SSDs are faster than SATA SSDs, DDR5 offers more bandwidth than DDR4), but the orders of magnitude remain stable.

How many RAM reads (100ns) can be performed in the time of one network round trip within a datacenter (500µs)?

Network Latency

Network latency consists of: **propagation delay** (speed of light in fiber ~200,000 km/s), **transmission delay** (size/bandwidth), **queuing delay** (waiting in buffers), and **processing delay** (processing at each hop).
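The four components simply add up. The sketch below computes a one-way latency estimate. The function name and the example values (1,000 km, a 1 MB payload, a 1 Gbit/s link) are hypothetical and chosen to keep the arithmetic easy. Queuing and processing delays are passed in as parameters because they depend on load and hardware.

```python
FIBER_KM_PER_S = 200_000  # speed of light in fiber (from the text)


def one_way_latency_ms(distance_km: float, size_bytes: int, bandwidth_bps: float,
                       queuing_ms: float = 0.0, processing_ms: float = 0.0) -> float:
    """Sum of propagation, transmission, queuing, and processing delay, in ms."""
    propagation_ms = distance_km / FIBER_KM_PER_S * 1000
    transmission_ms = size_bytes * 8 / bandwidth_bps * 1000  # bytes -> bits
    return propagation_ms + transmission_ms + queuing_ms + processing_ms


# 1 MB over 1,000 km of fiber on a 1 Gbit/s link: 5 ms propagation + 8 ms transmission.
print(f"{one_way_latency_ms(1_000, 1_000_000, 1e9):.1f} ms")  # 13.0 ms
```

For a small request, the transmission term shrinks toward zero while propagation stays fixed. This is why a faster link raises throughput but does not remove the round-trip cost.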

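**Speed of light limit**: over fiber, the minimum possible NYC–London RTT is about 56ms (≈5,600 km each way at ~200,000 km/s). Real cable routes are longer, so measured RTTs are higher. No protocol or server optimization can beat this floor. The practical remedy is a CDN, which moves content closer to the user.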

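High-frequency trading (HFT) firms pay millions for colocation (servers next to the exchange) and for microwave links. Microwave beats fiber for two reasons. Radio waves travel through air at nearly the vacuum speed of light, about 1.5× faster than light in glass. They also follow a straighter line-of-sight path. Every microsecond is money.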
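The speed-of-light floor is just distance divided by signal speed, doubled for the round trip. This is a minimal sketch that assumes a straight path. The ~5,570 km NYC–London and ~4,130 km NYC–SF great-circle distances are approximate values supplied here, not taken from the text. Swapping in the vacuum speed of light shows how much a line-of-sight radio link could gain on the same path.

```python
FIBER_KM_PER_S = 200_000   # light in glass, roughly 2/3 of c
VACUUM_KM_PER_S = 299_792  # c; radio through air is very close to this


def min_rtt_ms(distance_km: float, speed_km_per_s: float = FIBER_KM_PER_S) -> float:
    """Propagation-only round trip on a straight path: a lower bound, not a prediction."""
    return 2 * distance_km / speed_km_per_s * 1000


NYC_LONDON_KM = 5_570  # approximate great-circle distance (assumption)
NYC_SF_KM = 4_130      # approximate great-circle distance (assumption)

print(f"NYC-London RTT, fiber:  {min_rtt_ms(NYC_LONDON_KM):.0f} ms")                   # 56 ms
print(f"NYC-London RTT, vacuum: {min_rtt_ms(NYC_LONDON_KM, VACUUM_KM_PER_S):.0f} ms")  # 37 ms
print(f"NYC-SF one-way, fiber:  {min_rtt_ms(NYC_SF_KM) / 2:.0f} ms")                   # 21 ms
```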

Why is a CDN critical for a global website?

Disk vs Memory

**SSD** changed the landscape: random read 10–100µs vs HDD 2–10ms. But RAM is still orders of magnitude faster. Understanding this difference is critical when choosing between an in-memory cache and disk storage.

**Sequential vs Random**: on an HDD, every random access pays a 2–10ms head seek, while sequential reads stream at ~200MB/s. An SSD has no mechanical seek, but sequential access is still faster thanks to internal parallelism and prefetching.
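A quick calculation shows why access patterns matter so much on an HDD. The sketch makes these assumptions:

* a 5ms seek (the middle of the 2–10ms range)
* 4KB blocks
* 200MB/s sequential throughput

It also ignores per-block transfer time in the random case. The function names are hypothetical.

```python
GB = 1_000_000_000  # decimal gigabyte


def hdd_random_read_s(total_bytes: int, block_bytes: int = 4096, seek_ms: float = 5.0) -> float:
    """Each block costs one head seek; transfer time is ignored (tiny next to the seek)."""
    blocks = total_bytes // block_bytes
    return blocks * seek_ms / 1000


def hdd_sequential_read_s(total_bytes: int, throughput_mb_s: float = 200) -> float:
    """Ignores the single initial seek; data streams at the sequential rate."""
    return total_bytes / (throughput_mb_s * 1_000_000)


print(f"1 GB as random 4KB reads: {hdd_random_read_s(GB):,.0f} s")  # 1,221 s (~20 min)
print(f"1 GB sequentially:        {hdd_sequential_read_s(GB):.0f} s")  # 5 s
```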

Databases optimize for this. B-trees use a high fanout to stay shallow, so a lookup needs only a few random page reads. LSM-trees (RocksDB) turn writes into sequential appends. The write-ahead log (WAL) provides durability through sequential writes.
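The depth of a B-tree explains why it needs so few random reads: with fanout f, d levels can address about f^d keys. This rough sketch counts one page read per level. It assumes a hypothetical fanout of 100; real fanout depends on page and key size.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Smallest number of levels d with fanout**d >= n_keys (one page read per level)."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


n = 1_000_000_000
print(btree_levels(n, 100))  # 5 page reads with fanout 100
print(btree_levels(n, 2))    # 30 reads for a binary tree
```

At ~5ms per HDD seek, that is roughly 25ms versus 150ms per lookup. When the upper levels are cached in RAM, the number of disk reads drops further.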

Why is Redis (in-memory) orders of magnitude faster than PostgreSQL (disk-based) for simple key-value lookups?

Estimation Techniques

In a system design interview you need to quickly estimate throughput and latency. Rough rules of thumb:

* **1 server ≈ 1000 RPS** (typical web app)
* **1 modern optimized server ≈ 10,000–50,000 RPS**
* **1 database ≈ 5000–10,000 QPS** (simple queries)
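These rules of thumb turn directly into a fleet-size estimate: divide peak load by per-server capacity and round up. This is a minimal sketch. The 50,000 RPS peak is a hypothetical figure.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float) -> int:
    """Round up: a fraction of a server still means one more machine."""
    return math.ceil(peak_rps / rps_per_server)


peak = 50_000  # hypothetical peak load
print(servers_needed(peak, 1_000))   # 50 typical web servers
print(servers_needed(peak, 10_000))  # 5 optimized servers
print(servers_needed(peak, 5_000))   # 10 database nodes if every request issued one simple query
```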

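**Little's Law**: Concurrency = Throughput × Latency. Suppose latency is 100ms and 10,000 RPS is required. Then about 1000 requests are in flight at any moment, so you need capacity for ~1000 concurrent connections or workers. This is useful for sizing thread pools and connection pools.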
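Little's Law can be written down directly and rearranged to get throughput from a concurrency limit. This is a minimal sketch with hypothetical function names. It describes long-run averages, not bursts.

```python
def concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's Law, L = λ·W: average number of requests in flight."""
    return throughput_rps * latency_s


def max_throughput(concurrency_limit: float, latency_s: float) -> float:
    """Rearranged, λ = L / W: the most requests per second a fixed pool can sustain."""
    return concurrency_limit / latency_s


# Example from the text: 10,000 RPS at 100ms latency.
print(f"{concurrency(10_000, 0.100):.0f} requests in flight")  # 1000 requests in flight
```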

When estimating, round to powers of 10. The goal is an order of magnitude, not an exact number. 1 million vs 10 million matters more than 1.2 vs 1.8 million.
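Rounding to the nearest power of 10 on a logarithmic scale can be made mechanical. This is a small sketch with a hypothetical helper. On a log scale the cut-over point is about 3.16×10^k, not 5×10^k.

```python
import math


def round_pow10(x: float) -> float:
    """Round a positive number to the nearest power of 10 on a log scale."""
    return 10.0 ** round(math.log10(x))


print(round_pow10(86_400))     # 100000.0 (seconds per day)
print(round_pow10(1_200_000))  # 1000000.0
print(round_pow10(1_800_000))  # 1000000.0, same order of magnitude as 1.2M
```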

A service with 10ms latency and 100 threads. What is the maximum throughput (Little's Law)?

Back-of-Envelope Calculations

**Back-of-envelope** calculations are quick, approximate estimates on a napkin. In an interview they demonstrate that you understand scale and can justify architectural decisions with numbers.

**Powers of 2 for quick math**: 2^10 ≈ 1000 (KB), 2^20 ≈ 1M (MB), 2^30 ≈ 1B (GB), 2^40 ≈ 1T (TB). Day = 86,400s ≈ 100,000s for easy division.
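The powers-of-2 shortcut drifts as exponents grow. The sketch below measures how far each binary unit is from its decimal approximation. The error stays at about 10% or less even at TB, well inside order-of-magnitude precision.

```python
UNITS = [(10, "KB"), (20, "MB"), (30, "GB"), (40, "TB")]


def approx_error_pct(exp: int) -> float:
    """Relative error, in percent, of treating 2**exp as 10**(3 * exp // 10)."""
    exact = 2 ** exp
    decimal = 10 ** (3 * exp // 10)
    return (exact - decimal) / decimal * 100


for exp, unit in UNITS:
    print(f"2^{exp} = {2 ** exp:,} vs 10^{3 * exp // 10} ({unit}): +{approx_error_pct(exp):.1f}%")
```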

Useful approximations: 1 day ≈ 100K seconds, 1 month ≈ 2.5M seconds, 1 year ≈ 30M seconds. 1 million requests/day ≈ 10 RPS average.
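Converting a daily volume into an average rate is a single division. The sketch compares the exact 86,400 s/day with the napkin value of 100,000. The helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def avg_rps(requests_per_day: float, seconds_per_day: float = SECONDS_PER_DAY) -> float:
    """Average requests per second over a day (peaks will be higher)."""
    return requests_per_day / seconds_per_day


print(f"exact:  {avg_rps(1_000_000):.1f} RPS")           # exact:  11.6 RPS
print(f"napkin: {avg_rps(1_000_000, 100_000):.0f} RPS")  # napkin: 10 RPS
```

The napkin value is about 14% low, which does not matter when the goal is the order of magnitude.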

A common claim: latency numbers become outdated and do not need to be known precisely.

Exact numbers do change, and precision is not the point, but the orders of magnitude (ns, µs, ms) remain stable. The key insight: RAM ~100ns, SSD ~10–100µs, network within DC ~500µs, cross-continent ~100ms. These ratios drive architectural decisions.

Choosing between an in-memory cache and a database, CDN and origin server, sharding and replication - all are based on understanding the latency hierarchy. The numbers provide the intuition needed to make the right trade-offs.

1 million requests per day. What is the average RPS?

Key Numbers to Remember

**RAM**: ~100ns random read
**SSD**: ~10-100µs random read (NVMe faster than SATA)
**Network within DC**: ~500µs round trip
**Cross-continent**: ~100-200ms round trip
**1 server**: ~1000-10,000 RPS (web), ~100,000 ops/sec (Redis)

Questions for Reflection

How will these numbers change in 5–10 years? What will remain constant?
When is it worth paying for a faster storage tier?
How does latency affect UX? What latency is 'felt' by a user?

Related Lessons

rt-36
