Parallel Computing
Concurrency at Scale
The Amazon Prime Day 2018 outage lasted 65 minutes and cost an estimated $100 million in lost sales. The root cause: a small internal service became slow under load, its thread pool filled up, callers blocked on the full queue, their thread pools filled up, and the cascade propagated up the call chain until the front-end was down. This is the cascade failure pattern that circuit breakers, backpressure, and rate limiting exist to prevent. Netflix's Hystrix was built to make their microservice architecture resilient to exactly this class of failure. Every distributed system at scale eventually encounters it.
- **Netflix's Hystrix** circuit breaker library was developed after a 2012 cascade failure where a single degraded dependency caused a 3-hour outage. Hystrix now runs on every microservice at Netflix (400+ services), isolating failures via thread pool isolation and circuit breaking - credited with preventing multiple potential full-site outages per year.
- **Stripe's rate limiter** uses a distributed token bucket backed by Redis to enforce per-customer API limits (100-25,000 requests/second depending on plan) across a globally distributed API fleet processing 250 million transactions per day, preventing abuse while ensuring fair capacity allocation.
- **Cloudflare's global rate limiting** applies distributed rate limiting at the network edge (290 data centers) to block DDoS attacks, using approximate counting with local token buckets synced across the PoP - processing 40 million HTTP requests per second without Redis round-trips on the critical path.
Thread Pools and Executor Services
A thread pool amortizes the cost of thread creation (typically 1-10ms) across many tasks by reusing a fixed set of threads. The core pool size (minimum threads always alive) and maximum pool size (peak threads under load) define the pool's scaling behavior. Work queue type matters: an unbounded queue allows infinite task accumulation (OOM risk under load); a bounded queue with rejection policy (CallerRunsPolicy, AbortPolicy) implements backpressure.
Java's ThreadPoolExecutor is the reference implementation. Sizing rule of thumb: for I/O-bound tasks, pool size = CPU cores * (1 + wait_time/compute_time); for CPU-bound tasks, pool size = CPU cores + 1 (one extra to prevent stalls from page faults). A 16-core server serving HTTP requests with average 50ms I/O wait and 5ms compute should have 16 * (1 + 50/5) = 176 threads.
Virtual threads (Java 21 Project Loom) change thread pool sizing entirely: virtual threads are mapped M:N to OS threads by the JVM, similar to goroutines. With virtual threads, the pool can have 10,000+ threads without OS overhead - the JVM scheduler handles multiplexing onto platform threads. This makes thread-per-request viable again for I/O-bound services.
Why is an unbounded work queue in a thread pool dangerous under sustained high load?
Backpressure
Backpressure is a flow control mechanism that slows down producers when downstream consumers cannot keep up, preventing unbounded resource accumulation. Without backpressure, a fast producer and slow consumer leads to OOM: the producer's output accumulates in buffers until the process crashes. With backpressure, the producer is blocked or rate-limited as soon as the consumer falls behind.
Reactive Streams (the standard adopted by Java 9 Flow API, RxJava, Project Reactor, Akka Streams) defines the backpressure protocol: a subscriber sends request(n) to the publisher to signal capacity for n more items. The publisher sends at most n items, then waits for another request(). This pull-based protocol prevents the fast-producer slow-consumer problem structurally.
TCP implements backpressure via the receive window: the receiver advertises how much buffer space is available; the sender cannot transmit more bytes than the window allows. This is why a TCP connection automatically adjusts throughput when the receiver is slow - no application-level flow control is needed for byte streams.
How does the Reactive Streams backpressure protocol prevent fast-producer slow-consumer OOM?
Rate Limiting
Rate limiting controls the rate of operations to protect downstream systems or enforce fairness. Two fundamental algorithms: Token Bucket (allows burst up to bucket capacity, then limits to refill rate) and Leaky Bucket (enforces constant output rate, absorbing bursts by queueing). Token bucket is preferred for APIs (allows reasonable burst); leaky bucket for SLAs requiring constant rate (network shaping).
Distributed rate limiting (across multiple servers) requires coordination: a Redis INCR with TTL implements a sliding window counter; Redis Lua scripts make the check-and-increment atomic. For high-throughput rate limiting (>100k checks/sec), local token bucket with occasional Redis sync (approximate limiting) dramatically reduces Redis load compared to per-request Redis calls.
The fixed window counter algorithm (simplest: INCR key with TTL) has a boundary attack: a user can send limit requests at the end of window N and limit more requests at the start of window N+1, getting 2x the intended rate. Sliding window (sorted set in Redis) or token bucket algorithms prevent this edge case.
When would a Token Bucket rate limiter be preferred over a Leaky Bucket?
Circuit Breaker Pattern
A circuit breaker prevents cascade failures by stopping calls to a failing downstream service. States: CLOSED (normal operation, requests flow through), OPEN (failure detected, requests fail-fast without calling downstream), HALF-OPEN (probe state, allow one test request to check if downstream recovered). This prevents a slow downstream from exhausting thread pools and cascading to callers.
Netflix's Hystrix popularized circuit breakers for microservices (now deprecated in favor of Resilience4j). The default Hystrix configuration opens the circuit when >= 50% of the last 10 requests in a 10-second window fail or time out. The half-open probe allows one request after a 5-second sleep window - if it succeeds, the circuit closes; if it fails, the sleep resets.
Circuit breakers combine with retries carefully: retry + circuit breaker can conflict if not ordered correctly. The correct order: retry -> circuit breaker -> call. Retries should see the circuit breaker's fail-fast; not bypass it. Resilience4j's decorator ordering enforces this pattern.
A circuit breaker prevents all downstream failures from affecting callers
A circuit breaker only prevents cascading failures once the threshold is reached - the first N failing requests (before the threshold) still fail; the circuit breaker prevents the N+1 through infinity failures from wasting resources
The circuit breaker must observe enough failures to distinguish a temporarily slow service from a permanently failed one - this minimum sample window means some failures reach callers before the protection activates
Why is the HALF-OPEN state necessary in a circuit breaker, rather than going directly from OPEN to CLOSED?
Key Ideas
- **Thread pools** must be sized based on I/O vs. CPU work ratios (cores * (1 + wait/compute)) and use bounded queues with backpressure rejection policies to prevent OOM under load.
- **Backpressure** via Reactive Streams demand signals (request(n)) prevents fast-producer slow-consumer OOM by making the subscriber control the flow rate explicitly.
- **Circuit breakers** prevent cascade failures: CLOSED (normal), OPEN (fail-fast, no downstream calls), HALF-OPEN (probe recovery) - triggered at configurable failure rate and latency thresholds.
Related Topics
Concurrency at scale draws on synchronization and distributed systems patterns:
- Asynchronous Programming — Backpressure and rate limiting are implemented using async primitives (bounded channels, semaphores) in async runtimes like Tokio, Akka Streams, and Project Reactor
- Actor Model: Erlang, Akka — Akka Streams implements the Reactive Streams backpressure protocol on top of the actor model, combining location transparency with flow-controlled stream processing
Вопросы для размышления
- Design a thread pool configuration for a Java REST API that calls a database (avg 50ms query time) and an external payment API (avg 200ms, 99th pct 2s) on a 16-core server handling 5,000 req/s peak. What pool sizes, queue sizes, and rejection policies would you use?
- A microservice receives 100k req/s and calls a downstream service that can handle 10k req/s. Without rate limiting or backpressure, what happens? Design the flow control to handle this disparity.
- A circuit breaker is configured with a 50% failure threshold and 10-call window. If the downstream service alternates between success and failure (SFSFSF...), does the circuit open? Why or why not?