Optimization

Optimization in Interviews

Dijkstra proved his shortest-path algorithm in 20 minutes on a cafe napkin. Decades later LeetCode asks the same question to a trillion candidates, almost none of whom realize it was an optimization problem.

  • Twitter feed: 100M DAU x 300 follows = 30B fan-out edges. Pure push for a celebrity tweet writes 1B rows. Hybrid push+pull cuts write amplification by 100x.
  • Google PageRank is a linear optimization across 100B pages: $r = (1 - d) \cdot \mathbf{1} + d M r$. Power iteration converges in 50-100 steps.
  • AWS spot instances plus checkpointing save 60% over on-demand at the cost of about 5% hourly interruption probability - an explicit Pareto trade-off.
  • Dynamic batching for GPU inference at max_batch=32, max_wait_ms=10 lifts throughput by 4x while P99 latency rises by only 10 ms.

Предварительные знания

  • Optimization in ML and Production

## Optimization in Interviews Mathematical optimization and interview algorithms are the same idea in different clothing. Dijkstra = greedy optimization. Knapsack DP = exact algorithm for a special case of ILP. System design = Pareto multi-objective optimization. Architecture choice = constrained optimization with explicit trade-offs. This lesson covers: seeing any algorithm through the lens of optimization, structuring system design answers as Pareto analysis, and explaining every engineering decision via objective function + constraints.

## Summary: Optimization in Interviews **Algorithms = optimization**: Dijkstra -> greedy with exchange argument. Knapsack -> ILP with DP for optimal substructure. A* -> lower bound guided search = Branch & Bound. Greedy is correct only when provable---not just intuitive. **System design = multi-objective**: clarify objectives (latency, cost, consistency), name constraints (SLA, budget, team size), propose Pareto-optimal solutions, recommend with explicit trade-off rationale. **Strong answer pattern**: 1) Clarify objectives -> 2) Establish baseline -> 3) Enumerate Pareto solutions -> 4) Recommend with reasoning. **The core insight of this course**: optimization is a unified language. LP, DP, greedy, SGD, system design---all are searching for the minimum of an objective function subject to constraints. Different contexts, one mathematics.

Algorithms as Optimization Problems

## Shortest Path Is an Optimization Problem Dijkstra's algorithm solves: ``` min sum_{(u,v) in path} w(u,v) subject to: path starts at s path ends at t path is connected Dijkstra = greedy optimization by nearest unvisited: Invariant: d[v] = minimum-cost path from s to v Step: pick v* = argmin_{v unvisited} d[v] -> Greedy choice property: d[v*] is already optimal! ``` ## Knapsack Is an ILP ``` 0-1 Knapsack: max sum_i v_i * x_i (maximize value) s.t. sum_i w_i * x_i <= W (capacity constraint) x_i in {0, 1} (binary variables) -> This is Integer Linear Programming with n binary variables! 0-1 Knapsack DP = Branch & Bound for this specific structure: dp[i][w] = max value using first i items with capacity w Optimization interpretation: dp[i][w] = exact solution to the ILP restricted to items 1..i at capacity w. ``` ## A* as Heuristic Optimization ``` A* minimizes: f(n) = g(n) + h(n) where: g(n) = actual cost from start to n h(n) = heuristic estimate of cost from n to goal Dijkstra: h(n) = 0 (no heuristic) A* with Euclidean distance: faster for geometric graphs Optimality guarantee: if h(n) <= true remaining distance (h is admissible), A* finds the optimal path. -> A* = optimization of f(n) with an admissible lower bound Analogy: lower bounds in Branch & Bound! ``` ## LeetCode Through an Optimization Lens ```python # Problem: Jump Game II (LC #45) # Find minimum number of jumps to reach the end # -> Optimization: min jumps s.t. reach the last index def jump(nums): """Greedy: at each step, maximize the reachable index.""" n = len(nums) jumps = 0 current_end = 0 farthest = 0 for i in range(n - 1): farthest = max(farthest, i + nums[i]) # maximize reach if i == current_end: # end of current jump jumps += 1 current_end = farthest return jumps # Why is greedy optimal? # Exchange argument: # If an optimal solution makes a shorter jump than ours, # we can replace it with ours (no worse) -> greedy = optimal. # Problem: Coin Change (LC #322) # -> Unbounded Knapsack (a variant of ILP) # -> DP = exact algorithm for this special ILP structure def coinChange(coins, amount): dp = [float('inf')] * (amount + 1) dp[0] = 0 for coin in coins: for x in range(coin, amount + 1): dp[x] = min(dp[x], dp[x - coin] + 1) return dp[amount] if dp[amount] != float('inf') else -1 ``` ## How to Talk About Optimality in Interviews ``` Question: "Why is your algorithm optimal?" Don't say: "I tested it on examples and it works." Do say (three ways to prove optimality): 1. Exchange argument: "Assume a better solution exists. I can show that we can swap steps to produce a solution no worse than ours." 2. Optimal substructure: "The problem has optimal substructure: an optimal solution contains optimal solutions to subproblems. -> Dynamic programming applies." 3. Lower bound: "Here is a lower bound for any algorithm: [bound]. Our algorithm achieves this bound -> it is optimal." ```

System Design as Optimization

## Every System Design Question Is Multi-Objective Optimization ``` Example: "Design Twitter feed" Objectives (competing): min latency(read) - feed load time min cost(storage) - storage cost max consistency(data) - data freshness max availability - SLA uptime Constraints: DAU = 100M users avg follows = 300 tweets/day = 500M -> This is Pareto optimization! There is no single "correct" answer---only Pareto-optimal architectures. ``` ## Pull vs. Push: An Explicit Trade-off ```python # PULL model (compute on read): # latency_read ~ O(follows) = O(300) <- slow for large follow counts # storage = O(tweets only) # consistency = perfect (always fresh) # PUSH model (fan-out on write): # latency_read ~ O(1) <- instant # storage = O(DAU * follows) ~ 30B entries # latency_write ~ O(follows) = O(300) <- slow for celebrities # Hybrid (real Twitter/Meta approach): def get_feed(user_id): # Pre-computed feed for regular users (PUSH) cached_feed = redis.get(f'feed:{user_id}') if cached_feed: return cached_feed # For celebrity followers - pull on demand celeb_tweets = [] for celeb in get_celebrities_following(user_id): celeb_tweets.extend(db.get_recent_tweets(celeb, limit=10)) # Merge with cached regular feed regular_feed = redis.get(f'feed_regular:{user_id}') or [] return merge_sorted(regular_feed, celeb_tweets)[:100] # Pareto trade-off: # regular users -> push (O(1) read) # celebrities -> pull (avoid O(300M) fan-out) ``` ## Reduce Latency: Typical Interview Questions ``` Question: "How would you reduce our recommendation system latency from 500ms to 50ms?" Optimization frame: min latency s.t. accuracy >= threshold Strategies (each is a point on the Pareto front): 1. Cache precomputed recommendations: latency: O(1) read, cost: O(DAU * recs) storage trade-off: staleness vs. freshness 2. Smaller model (distillation): latency: proportional to model size trade-off: accuracy vs. speed 3. Approximate Nearest Neighbor (HNSW, FAISS): exact k-NN: O(N*d), HNSW: O(log N * d) trade-off: recall % vs. speed 4. Two-stage ranking (retrieval + reranking): Stage 1: fast, low precision, 1000 candidates Stage 2: slow, high precision, top 10 trade-off: quality vs. throughput FAANG answer structure: 1. Identify the bottleneck (where do the 500ms go?) 2. Propose 2-3 Pareto-optimal options 3. Choose one with explicit trade-off rationale ``` ## Reduce Cost: Infrastructure Optimization ```python # Classic problem: optimize ML pipeline cost # Formalization: # min cost(instance_type, count) # s.t. throughput >= 1000 req/sec # latency_p99 <= 100 ms # Pareto-optimal solutions: # 1. Spot instances + checkpointing: config = { 'instance_type': 'g4dn.xlarge', 'pricing': 'spot', # ~60% cheaper than on-demand 'checkpoint_freq': '10min', # lose at most 10 min on interruption } # 2. Dynamic batching (throughput-latency trade-off): def dynamic_batching(queue, max_batch=32, max_wait_ms=10): """ Accumulate requests up to max_batch or max_wait_ms. GPU throughput grows linearly with batch; latency barely increases. """ batch = [] deadline = time.time() + max_wait_ms / 1000 while len(batch) < max_batch and time.time() < deadline: if queue.not_empty(): batch.append(queue.get()) return process_batch(batch) # 3. Cache popular predictions (Zipf law): # If top 20% of users drive 80% of requests, # caching them yields 80% cache hit rate -> 80% less compute. ```

Engineering Decisions as Constrained Optimization

## Every Engineering Decision Is an Optimization Problem Framing a decision in optimization terms forces you to: 1. State the objective function explicitly (what are we optimizing?) 2. Name the constraints (what cannot be violated?) 3. Show you understand the trade-offs ## Choosing a Database: Formalized ``` Problem: choose a database for user sessions (100K RPS) Objective: min latency_read Constraints: availability >= 99.9% (SLA) durability = strong (cannot lose sessions) cost <= $X/month ops_complexity <= medium (team of 3) Candidates on the Pareto front: Redis: latency=0.1ms, consistency=eventual, cost=high Postgres: latency=1ms, consistency=strong, cost=low DynamoDB: latency=0.5ms, consistency=eventual, cost=medium Cassandra: latency=0.3ms, consistency=tunable, cost=medium Choice depends on relative objective importance: Latency critical: Redis (with persistence enabled) Cost critical + strong cons.: Postgres + read replicas Global scale: DynamoDB ``` ## Microservices vs. Monolith ```python objectives = [ 'deployment_velocity', # how fast can we ship features 'operational_complexity', # how hard to operate 'team_autonomy', # how independently teams can work 'latency', # network hops between services 'cost' # infrastructure + DevOps time ] # Monolith: # deployment_velocity: low (all-or-nothing deploys) # operational_complexity: low (one process) # team_autonomy: low (shared codebase conflicts) # latency: zero (in-process calls) # cost: low # Microservices: # deployment_velocity: high (independent deploys) # operational_complexity: high (service mesh, observability) # team_autonomy: high (own repo/deploy/runtime) # latency: higher (network + serialization) # cost: higher (more infra) # The decision is a function of constraints: def choose_architecture(team_size, product_maturity, traffic): if team_size < 10 and product_maturity == 'early': return 'monolith' # premature optimization is evil elif team_size > 50 and traffic > '100k_rps': return 'microservices' else: return 'modular_monolith' # Pareto-optimal compromise ``` ## FAANG Interview Pattern ``` Question: "How would you optimize our ML inference pipeline?" Answer structure: 1. Clarify objectives and constraints first: "What are we optimizing---P99 latency? Cost? Throughput? What are the constraints on accuracy and availability?" 2. Measure before optimizing: "Without profiling I cannot identify the bottleneck. First step: add tracing and find the slowest component." 3. Present Pareto analysis: "I see several points on the Pareto front: a) Prediction cache: latency O(1), accuracy -3% b) Model distillation: latency -50%, accuracy -1% c) INT8 quantization: latency -60%, accuracy -0.5% d) More replicas: latency -0%, cost +50%" 4. Recommend with explicit trade-off rationale: "I recommend INT8 quantization (c)---best latency/accuracy trade-off. If accuracy cannot degrade---distillation (b). Cache (a) only works if requests repeat (rules out personalized recommendations)." Question: "How do you decide whether to roll out a feature to 100%?" Optimization framing: Objective: max(engagement_delta) Constraints: error_rate <= baseline, latency <= +10ms, P(harm) < 0.001 Method: A/B test -> sequential hypothesis testing (minimize sample size s.t. power >= 0.8, alpha <= 0.05) ``` ## Answer Pattern for Any Engineering Question ``` 1. Clarify objectives: "What are we maximizing/minimizing?" "Which constraints are hard vs. soft preferences?" 2. Establish a baseline: "What is the current state? We need numbers before optimizing." 3. Enumerate Pareto-optimal solutions: "Here are N solutions with different trade-offs: [solution] -> [objective improvement] at [constraint cost]" 4. Recommend with reasoning: "I choose [X] because [objective priority] given [constraint]. If [context changes], I would reconsider in favor of [Y]." ```

Connection to previous

Optimization in ML and Production showed how to deploy models faster - SAM, QAT, fusion. This lesson transfers the same optimization lens to algorithmic problems and system design. Greedy, DP, A* are all special cases of optimization with provable guarantees. System design lives on the same map, only the Pareto front spans latency / cost / consistency.

  • Optimization in ML and Production — ML optimization is a special case; the same lens applies to algorithms and architecture
  • Algorithms and complexity — Big-O quantifies how close an algorithm sits to the lower bound

Summary

  • Every LeetCode greedy is optimization with an exchange argument. Without the proof it is guesswork.
  • 0-1 Knapsack DP is an exact ILP solver for a special structure with pseudo-polynomial O(n·W) complexity.
  • A* is Dijkstra plus an admissible lower bound - Branch & Bound in pure form.
  • System design is multi-objective optimization. The Pareto front (latency, cost, consistency) admits no single right answer.
  • Twitter feed is the celebrity problem: fan-out O(followers) breaks for Musk-class accounts. Hybrid push+pull is the Pareto-optimal compromise.
  • Strong candidates clarify objectives, measure a baseline, enumerate Pareto options, and recommend with an explicit trade-off.

Вопросы для размышления

  • If every interview algorithm can be framed as optimization, how does the approach to a new LeetCode problem change?
  • When does monolith Pareto-dominate microservices, and which constants (team size, traffic) trigger the transition?
  • Can the optimization framing extend to non-technical decisions such as feature prioritization or hiring? Where does it break down?

Связанные уроки

  • opt-15 — Interview questions build on ML/Production optimization knowledge
  • ml-09-gradient-descent — Gradient descent is a mandatory ML interview topic
  • alg-01-big-o — Algorithm complexity is a baseline interview question
  • db-11-query-optimization — Query optimization is a frequent backend interview topic
  • dl-09 — Comparing AdamW and SGD is a classic optimization interview question
  • calc-01-sequences
Optimization in Interviews

0

1

Sign In

  • Shortest path = min-cost optimization; Dijkstra = greedy with exchange argument proof
  • Knapsack = binary ILP; 0-1 DP = exact Branch & Bound exploiting optimal substructure
  • A* with admissible heuristic = lower bound guided Branch & Bound, guaranteed optimal
  • Greedy is optimal only when provable via exchange argument or matroid structure---never just intuition

Why can 0-1 Knapsack be solved with dynamic programming?

  • Every system design question is multi-objective: latency, cost, consistency, availability all compete
  • Pull vs. Push is an explicit Pareto trade-off; the right choice depends on workload distribution
  • Reduce latency options---cache, distillation, ANN, two-stage ranking---are each points on the Pareto front
  • The goal is not one "right answer" but structured trade-off analysis with explicit assumptions

Why does Twitter use a hybrid push+pull approach instead of pure push for the feed?

  • Every engineering decision = objective function + constraints + trade-offs---structure answers this way
  • Measure before optimizing: without profiling you cannot know the bottleneck (Knuth's rule)
  • Pareto analysis: offering multiple solutions with explicit trade-offs outperforms one "correct" answer
  • Microservices vs. monolith is not a technical decision---it is a function of team size, product maturity, and traffic

System design interviews reward jumping straight to an architecture: Kafka + Redis + Cassandra + microservices wins every time.

A strong candidate first names the objective (what is being minimized) and constraints (SLA, budget, team size), then enumerates 2-3 Pareto-optimal options and defends the choice via relative importance.

FAANG interviewers have heard hundreds of 'Kafka + Redis' answers. What separates senior engineers is recognizing that no single answer is correct and articulating the trade-off explicitly. Without objective and constraints, any architecture is guesswork.

What is the primary marker of a strong system design answer?