System Design
Case Study: Uber
When a rider taps Request in San Francisco at 18:47, 14 driver phones light up within 230 milliseconds. The matching service ran a constrained optimization over 4,300 candidate drivers, refreshed surge multipliers across 8,000 H3 cells, and reserved a $24.50 authorization hold - all before the rider's screen finished the transition animation.
- Uber engineering disclosed (2019 talk by Matt Ranney) that DISCO, the dispatcher, processes 50,000 matching decisions per second at peak.
- H3 hexagonal grid (open-sourced by Uber in 2018) indexes the planet at 16 resolutions - resolution 9 hexagons have an edge of 174m, the working unit for rider-driver proximity.
- Halloween 2014 in SF: surge pricing disabled by a bug for 90 minutes; completion rate dropped from approximately 100% to 80% (Diamond and Hall, NBER 2015).
- Trip state machine handles 17M trips per day with optimistic locking on a versioned status field - no global trip lock, no central transaction coordinator.
Requirements and Scale
Цели урока
- Define functional and non-functional requirements
- Perform back-of-envelope estimation
- Understand the unique challenges of a ride-sharing platform
Functional Requirements
Uber is a real-time platform connecting drivers and riders. Key features:
Uber at Scale (Real Numbers)
Back-of-Envelope Estimation
Key Technical Challenges
High-Level Architecture
Key Takeaways
- Uber is a real-time, location-centric system
- Key challenges: 1M+ location updates/sec, matching in <15 sec, accurate ETA, dynamic pricing
- Core components: Location Service (Redis Geo), Matching Service, Trip Service, Pricing Service
What is the primary load source in a system like Uber?
Geospatial Indexing
Цели урока
- Understand different approaches to geospatial indexing
- Study Geohash, S2, and H3 systems
- Master Redis Geospatial for location queries
The Problem: Spatial Proximity Queries
The core query in Uber is: "find all drivers within 3 km of point (lat, lng)". How do you answer this quickly with millions of drivers?
Geohash
Geohash encodes coordinates into a string. Nearby points share a common prefix, enabling prefix-based search.
H3 (Uber's Solution)
H3 is a hexagonal hierarchical spatial index developed by Uber. Hexagons solve the edge-effect problem and provide uniform distance.
Redis Geospatial Implementation
H3 + Redis for High Load
Comparing Approaches
Key Takeaways
- Geospatial indexing transforms O(N) proximity search into O(1) cell lookup
- Geohash is simple and natively supported in Redis
- H3 is a hexagonal grid with uniform distance, designed by Uber specifically for ride-sharing
- The choice depends on requirements: simplicity (Geohash) vs. optimality (H3)
Why does Uber use hexagonal cells (H3) instead of rectangular ones (Geohash)?
Matching Algorithm
Цели урока
- Understand the driver-rider matching problem
- Study scoring and ranking of drivers
- Master batch matching for optimization
The Problem: Optimal Matching
The naive "closest driver" approach performs poorly under high load. Why?
Batch Matching Approach
Driver Scoring
Distance is not the only factor. Uber uses multi-factor scoring:
Matching Service Implementation
Race Conditions and Locking
Key Takeaways
- Matching in Uber is not simply 'find the nearest driver'
- Batch matching every 2-3 seconds enables globally optimal assignment
- Scoring considers ETA, heading, rating, and acceptance rate
- Optimistic locking prevents race conditions during concurrent matching
Why is batch matching better than sequential matching?
Dynamic Pricing (Surge)
Цели урока
- Understand supply-demand economics in ride-sharing
- Study the surge pricing algorithm
- Master zone-based pricing calculation
Why Surge Pricing Exists
Surge pricing (dynamic pricing) is a mechanism for balancing supply and demand in real time. Without it, the system breaks down during demand spikes.
Zone-Based Surge Calculation
Surge Calculation Algorithm
Fare Calculation
Anti-Gaming Measures
Key Takeaways
- Surge pricing balances supply and demand in real time
- The city is divided into zones (H3 cells); the demand/supply ratio and surge multiplier are computed per zone
- Smoothing prevents abrupt jumps
- Anti-gaming measures protect against manipulation by both drivers and riders
Why is smoothing applied to the surge multiplier?
Real-Time Architecture
Цели урока
- Design the trip state machine
- Understand real-time communication patterns
- Study fault tolerance for critical operations
Trip State Machine
Every trip moves through a defined set of states. The state machine ensures consistency and correct handling of edge cases.
Real-Time Communication
Trip Service Implementation
Fault Tolerance
Key Takeaways
- Uber's real-time architecture includes: a Trip State Machine with validated transitions, dual-channel communication (HTTP/2 for updates, WebSocket for push), and optimistic locking for consistency
- Fault tolerance is achieved through authorization holds, multi-layer fallbacks, and geographic partitioning
Surge pricing exists to extract maximum revenue when demand spikes - it is a profit lever that punishes riders during emergencies.
Surge is a control loop that rebalances driver supply against rider demand in a closed feedback cycle. When the H3-cell ratio of pending requests to idle drivers crosses a threshold, the multiplier rises - which both filters price-sensitive demand and attracts drivers from adjacent cells. The objective function is wait-time minimization, not revenue maximization.
Diamond and Hall (NBER 2015) analyzed 142M Uber trips and showed that disabling surge during the 2014 Halloween outage in San Francisco produced an 80% completion rate vs 100% with surge enabled. The control loop, not the price tag, is what kept the marketplace cleared. If surge were tuned for revenue, Uber would maximize rates in low-supply low-demand pockets - it does the opposite.
Why does Uber use an authorization hold instead of charging directly after the trip?
Where this connects
Uber is the case study where geospatial indexing, real-time pricing as a control loop, and consistency-vs-availability trade-offs all converge in a single workload. It builds directly on YouTube's CDN intuition and feeds back into consistent hashing and Redis-as-state-store patterns.
- YouTube case study (real-time at scale) — builds-on
- Consistent hashing for geo-sharding — applies
- Redis as ephemeral location store — applies
- Self-organization in markets — analogous-to
What stays in memory
- H3 hexagonal geospatial indexing replaces lat/lng range queries with O(1) hex-id lookups; resolution 9 (174m edge) is the working unit for proximity.
- Driver locations live in Redis with TTL 5s; the hot path never touches PostgreSQL, only the trip state does.
- DISCO matching uses constrained optimization over batched candidates (ETA, rating, vehicle type, surge), not greedy nearest-driver - greedy starves long-waiting riders.
- Surge pricing is a closed-loop controller: ratio of pending requests to idle drivers per H3 cell drives the multiplier, which throttles demand and attracts supply.
- Trip state machine uses optimistic locking with a version column - no global lock, but invalid transitions raise ConcurrentModificationError on conflict.
- Real-time architecture splits transport: HTTP/2 server push for state updates, WebSocket for bidirectional events, Kafka for the event log, Pinot for OLAP queries on trip events.
Вопросы для размышления
- Surge pricing operates as a control loop, not a revenue lever. If Uber switched the objective from 'minimize wait time' to 'maximize revenue per request', what failure modes would emerge in the marketplace?
- Geo-sharding via H3 works on Earth's surface, but flight booking and orbital satellite networks face the same nearest-neighbor problem on different geometries. What does H3 lose when projected onto a sphere, and where would a different indexing scheme win?
- Optimistic locking on trip state was chosen over pessimistic locks. What happens at 50K matching decisions per second when conflicting transitions arrive within a single millisecond, and how does the retry policy avoid a thundering herd on hot drivers?
Связанные уроки
- sd-15-youtube — YouTube scale patterns: geo-distributed data
- sd-18-consistent-hashing — Consistent hashing for geo-data sharding
- net-15-tcp-basics — WebSocket/long polling for real-time location updates
- db-19-redis — Redis stores current driver positions (TTL = 5 sec)
- alg-14-dijkstra