Operating Systems
System Interview Patterns
A candidate preparing for a system interview at Google. The interviewer opens Coderpad and writes: "Design a Thread Pool". The pulse quickens. 45 minutes to demonstrate understanding of concurrency, memory management, distributed systems. These patterns are not abstract puzzles. They are real questions that separate L4 from L5, junior from senior. Each pattern is used in production: rate limiter in API Gateway, thread pool in web servers, memory allocator in every C++ program.
- **Rate Limiter in production:** Stripe processes millions of API requests per second. Without rate limiting, one client can bring down the service. Redis-based rate limiter with Sliding Window Counter protects the infrastructure. This is not theory - it's a critical component of production systems.
- **Thread Pool everywhere:** Nginx uses thread pool for disk I/O. Java Executors serve millions of Android applications. Go goroutines (8KB stack!) scale up to 100,000 on one machine. Understanding thread pool is key to writing scalable systems.
- **Custom Allocators for performance:** jemalloc (Facebook) sped up MySQL by 30%. tcmalloc (Google) is used in Chrome. Game engines write custom allocators to minimize GC pauses. In High-Frequency Trading, every microsecond is critical - malloc() from glibc is too slow.
Цели урока
- Rate limiter: token bucket vs sliding window, distributed (Redis-based)
- Thread pool: queue, worker threads, graceful shutdown, error propagation
- Memory allocator: free list, buddy, slab; trade-offs of jemalloc vs tcmalloc
- Bounded buffer (producer-consumer): mutex + cond vars vs lock-free queue
- Structure the answer: clarify → API → state → concurrency → trade-offs
Design Rate Limiter
**Rate Limiter** - one of the top-5 questions in system interviews at FAANG. Task: limit the number of requests from a client (protection against DDoS, fair usage). The interviewer checks knowledge of algorithms, threading, distributed systems.
Task Setup (as in an interview)
**Interviewer:** Design a rate limiter for an API. Requirements: 1. Maximum 100 requests per minute per user 2. Must work in a distributed environment (multiple servers) 3. Minimal latency (check < 1ms) 4. Graceful degradation under overload **Candidate's move:** Which algorithm is best? Token Bucket? Leaky Bucket? Sliding Window?
**Popular rate limiting algorithms:** 1. **Token Bucket** - tokens are replenished at a constant rate, a request consumes a token 2. **Leaky Bucket** - requests are processed at a fixed rate (smooths burst) 3. **Fixed Window** - counter resets every minute (problem: burst at the boundary) 4. **Sliding Window Log** - stores timestamp of each request (accurate but memory-intensive) 5. **Sliding Window Counter** - hybrid of Fixed Window + Sliding (optimal!)
**Key question from the interviewer:** How does this work in a distributed system? One problem: state is not synchronized between servers. Solutions: 1. **Sticky Sessions** - user always lands on the same server (not scalable) 2. **Centralized Redis** - counters in Redis with TTL (industry standard) 3. **Rate Limiter Middleware** - separate service (Envoy, Kong)
ASCII diagram Token Bucket
``` Token Bucket Algorithm: refill_rate = 2 tokens/sec | v ┌─────────────────────┐ │ 🪙 🪙 🪙 🪙 │ ← Current: 4 tokens │ capacity = 10 │ (max: 10) └─────────────────────┘ | v (request consumes 1 token) ┌─────────────────────┐ │ 🪙 🪙 🪙 │ ← After request: 3 └─────────────────────┘ Time → Refill: - t=0: 4 tokens - t=0.5s: 4 + 0.5*2 = 5 tokens - t=1s: 4 + 1*2 = 6 tokens Advantage: burst allowed (up to capacity) Disadvantage: burst can overload backend ```
**Follow-up questions from the interviewer:** - How to handle burst traffic? (Token Bucket vs Leaky Bucket) - What to do during Redis downtime? (Fallback: allow all or deny all?) - How to avoid race conditions? (Lua scripts in Redis - atomicity) - How to monitor rate limiting? (Metrics: reject_rate, p99 latency)
Why is Fixed Window Counter not suitable for production rate limiting despite its simplicity?
Design Thread Pool
**Thread Pool** - a classic of system design. Task: efficiently handle thousands of tasks with a limited number of threads. Creating a thread is expensive (1-2ms, several MB of stack), so reuse is critical.
Task Setup (real Meta question)
**Interviewer:** Implement a Thread Pool with these requirements: 1. Fixed number of worker threads (configurable) 2. Task queue - FIFO processing 3. Graceful shutdown - wait for current tasks to complete 4. Thread-safe - can submit from any thread 5. Support for priority tasks (extension) **What is being tested:** mutex/condition variables, producer-consumer pattern, resource management.
**Key points for the interview:** 1. **Condition Variable** - workers sleep while there are no tasks (not busy-wait!) 2. **Lock minimization** - hold lock only when working with queue, execute task without lock 3. **Graceful shutdown** - check `stop && tasks.empty()` (complete pending tasks) 4. **Move semantics** - `std::move(task)` avoids copying functions
**Production Thread Pool features:** - **Work Stealing** - if one worker is idle, it steals tasks from another's queue (Java ForkJoinPool) - **Priority Queue** - critical tasks are executed first (std::priority_queue) - **Dynamic Sizing** - pool grows/shrinks based on load (ExecutorService in Java) - **Task Cancellation** - ability to cancel a task that hasn't started yet (std::future::cancel)
ASCII diagram Thread Pool
``` Thread Pool Architecture: ┌──────────────────────────────────────────┐ │ Main Thread │ │ pool.submit(task1) │ │ pool.submit(task2) │ │ pool.submit(task3) │ └────────┬─────────────────────────────────┘ │ v ┌──────────────────────────────────────────┐ │ Task Queue (mutex protected) │ │ [task1] [task2] [task3] [task4] ... │ └─┬────┬────┬────┬───────────────────────┬─┘ │ │ │ │ │ v v v v v ┌───┐┌───┐┌───┐┌───┐ ┌───┐ │ W1││ W2││ W3││ W4│ ... workers ... │ Wn│ └───┘└───┘└───┘└───┘ └───┘ W1: wait(cv) → took task1 → execute W2: wait(cv) → took task2 → execute W3: wait(cv) → sleeping (no tasks) When task is completed → worker waits(cv) again ```
**Follow-up questions:** - What if a task throws an exception? (Catch in worker_loop, otherwise thread will die) - How to return a result from a task? (std::future + std::promise) - How to avoid deadlock with nested submit? (Do not block in task!) - What is the optimal pool size? (CPU-bound: num_cores, IO-bound: num_cores * 2-4)
Why do we execute the task OUTSIDE the critical section (after unlock) in worker_loop?
Design Memory Allocator
**Memory Allocator** - a question for senior+ positions. Task: implement `malloc()/free()` from scratch. Tests understanding of memory layout, fragmentation, alignment, thread-safety. Often asked at Google, Meta.
Task Setup (real Google question)
**Interviewer:** Implement a memory allocator: ```c void* my_malloc(size_t size); void my_free(void* ptr); ``` Requirements: 1. Minimize fragmentation (external and internal) 2. O(1) or O(log n) allocation (not O(n) search!) 3. Thread-safe for multithreaded programs 4. Memory alignment (8-byte on x64) **Hint:** Use free list, segregated free lists, or buddy allocator.
**Main allocation strategies:** 1. **Free List** - list of free blocks, search for suitable: - First Fit - first suitable block (fast, but fragmentation) - Best Fit - smallest suitable (slow, less fragmentation) - Worst Fit - largest block (works poorly) 2. **Segregated Free Lists** - separate lists for sizes (8, 16, 32, 64... bytes). O(1) search! 3. **Buddy Allocator** - blocks of power of 2 (4KB, 8KB, 16KB...). On free, merge "buddy" blocks. 4. **Slab Allocator** - for fixed-size objects (used in Linux kernel)
**Problems of simple Free List:** 1. **Fragmentation** - after many malloc/free memory is "holey" 2. **O(n) search** - with thousands of blocks find_free_block is slow 3. **No coalescing** - free blocks are not merged (external fragmentation) 4. **Not thread-safe** - two threads can take the same block!
ASCII diagram Segregated Free Lists
``` Segregated Free Lists (as in tcmalloc): bin[0] (8-16 bytes): [blk1] → [blk5] → [blk9] → NULL bin[1] (17-32 bytes): [blk2] → NULL bin[2] (33-64 bytes): [blk3] → [blk7] → NULL bin[3] (65-128 bytes): NULL bin[4] (129-256): [blk4] → [blk6] → NULL ... malloc(24): bin[1] (17-32) → take blk2 (O(1)!) free(blk2): return to bin[1] Advantages: - O(1) search for small allocations - Less fragmentation (similar sizes in one bin) - Cache-friendly (blocks close in memory) ```
**Thread-Safety in allocators:** Problem: two threads simultaneously called `malloc()` - both can take the same block! Solutions: 1. **Global Lock** - mutex on the entire allocator (simple but slow) 2. **Per-Thread Arenas** - each thread has its own pool (tcmalloc, jemalloc) 3. **Lock-Free Data Structures** - CAS (compare-and-swap) for bins (complex!)
**Follow-up questions:** - How to detect memory leaks? (Metadata: track all allocations) - What is internal vs external fragmentation? (Internal: unused in block, External: gaps between blocks) - How does `realloc()` work? (Try extend in-place, else alloc+copy+free) - Why is jemalloc faster than glibc malloc? (Better thread-locality, less fragmentation)
Why do modern allocators (tcmalloc, jemalloc) use per-thread arenas instead of a global lock?
Concurrency Puzzles
**Concurrency Puzzles** - synchronization puzzles that are popular in FAANG interviews. Classic tasks: Dining Philosophers, Reader-Writer Lock, Bounded Buffer. They test understanding of deadlocks, starvation, race conditions.
Task: Implement Barrier
**Interviewer:** Implement a Barrier for N threads. Barrier - a synchronization point: all threads must reach the barrier before any can proceed. ```cpp class Barrier { public: Barrier(int count); // count = number of threads void wait(); // Blocks until all arrive }; ``` **Usage:** ```cpp Barrier barrier(3); // Thread 1: do_work_phase1(); barrier.wait(); // Waits for Thread 2 and 3 do_work_phase2(); ```
**Key concurrency patterns for interviews:** 1. **Producer-Consumer** - queue + cv (Thread Pool is based on this) 2. **Reader-Writer Lock** - many readers OR one writer 3. **Barrier** - phase synchronization (parallel algorithms) 4. **Semaphore** - limit the number of threads (connection pool) 5. **Dining Philosophers** - classic deadlock example
Task: Reader-Writer Lock
**Setup:** Implement RWLock: ```cpp class RWLock { public: void read_lock(); // Many readers can be simultaneous void read_unlock(); void write_lock(); // Only one writer, blocks readers void write_unlock(); }; ``` **Requirement:** Readers do not block each other, but block writers. Writer blocks everyone.
**Dining Philosophers - classic deadlock:** 5 philosophers sit at a round table. Between each pair is one fork. To eat, a philosopher needs TWO forks (left and right). If each takes the left fork - deadlock!
ASCII diagram Dining Philosophers
``` Dining Philosophers Problem: P0 F0 F1 P4 P1 F4 F2 P3 P2 F3 P = Philosopher, F = Fork Deadlock scenario: - P0 takes F0 - P1 takes F1 - P2 takes F2 - P3 takes F3 - P4 takes F4 → All wait for the right fork → DEADLOCK! Solution (Resource Ordering): - P0: takes F0, then F1 (min → max) - P1: takes F1, then F2 - ... - P4: takes F0, then F4 (0 < 4!) Now P4 competes with P0 for F0 → no cycle! ```
**Common Interview Questions:** - **Print in Order:** 3 threads print "first", "second", "third" in the correct order (semaphore/cv) - **FizzBuzz Multithreaded:** 4 threads print FizzBuzz (condition variables) - **H2O Molecule:** threads H and O form water 2H+O (barrier + semaphore) - **Traffic Intersection:** simulate traffic light (mutex + state machine) All are solved using mutex + condition_variable!
Condition Variable automatically protects against race conditions - just cv.wait()/notify() is enough
Condition Variable is a synchronization primitive but requires PROPER use with mutex and predicate for correctness
Common mistake: calling cv.wait() without checking the condition. Problems: 1. **Spurious Wakeup** - a thread can wake up WITHOUT notify() (OS/hardware feature). A predicate is needed: ```cpp cv.wait(lock, [&]{ return condition; }); // ✓ cv.wait(lock); // ✗ Can wake up when condition=false! ``` 2. **Lost Wakeup** - if notify() is called BEFORE wait(), the wakeup is lost: ```cpp // Thread 1: Thread 2: notify() // (hasn't called wait yet) wait() // Sleeps forever! ``` Solution: check the state variable. 3. **Race Condition without lock** - changing shared state without mutex will lead to data race. Correct pattern: ```cpp // Producer: { lock_guard lock(mtx); ready = true; // Change state } cv.notify_one(); // Notify // Consumer: unique_lock lock(mtx); cv.wait(lock, []{return ready;}); // Wait with predicate ``` Condition Variable is a useful primitive but requires discipline!
Key Ideas
- **Rate Limiter patterns:** Token Bucket (burst-friendly), Sliding Window Counter (accurate and efficient). Distributed rate limiting via Redis with Sorted Sets. Trade-off: accuracy vs performance vs memory.
- **Thread Pool architecture:** Producer-Consumer with condition variables. Critical: lock ONLY on queue operations, task executed outside lock. Graceful shutdown via flag + cv.notify_all(). Production features: work stealing, priority queues, dynamic sizing.
- **Memory Allocator strategies:** Segregated Free Lists for O(1) search. Per-thread arenas (tcmalloc) minimize lock contention. Alignment, coalescing, fragmentation - key concepts. Buddy allocator for kernel-level systems.
- **Concurrency Puzzles solved through:** Mutex + Condition Variables + correct predicate. Barrier for phase synchronization. Reader-Writer Lock (watch out for starvation!). Dining Philosophers teaches avoiding deadlock through resource ordering.
Related Topics
System interview patterns are related to fundamental OS and distributed systems concepts:
- Concurrency and Synchronization — Mutex, condition variables, semaphores - primitives for Thread Pool, Barrier, RWLock. Understanding race conditions and deadlocks is critical
- Memory Management — Virtual memory, paging, alignment - foundation for understanding custom allocators. Segmentation, fragmentation, buddy system
- Scheduling Algorithms — Thread Pool scheduler - simplified version of OS scheduler. Fair scheduling, starvation, priority inversion
- Distributed Systems — Rate Limiter in a distributed system - CAP theorem in action. Consistency vs Availability trade-off when using Redis
Вопросы для размышления
- How should the Thread Pool be modified to support task cancellation? What happens to tasks already being executed?
- Rate Limiter on Redis: what happens during Redis downtime? Fail-open (allow all) or fail-closed (deny all)? What choice for a banking API vs a social network?
- Is it possible to implement a Memory Allocator WITHOUT system calls (mmap/brk)? How to pre-allocate a large pool and manage within?
- Dining Philosophers: can it be solved WITHOUT mutex, only through atomic operations (CAS)? What are the trade-offs?