Stochastic Processes
Stochastic Processes in Interviews
Technical interviews in quantitative finance, ML research, and SRE regularly test understanding of stochastic processes. Not formulas-but the ability to think quickly through probabilistic models, apply Little's Law from memory, explain queue nonlinearity, or say why e^{B(t)} is not a martingale. This lesson is the final practice before a real interview.
- **Quantitative Finance**-questions on GBM, martingales, Ito's formula are standard for quant developer and researcher roles
- **ML Research**-SGD convergence, diffusion models, Langevin MCMC are typical topics in research interviews
- **SRE and Backend**-Little's Law, queue nonlinearity, system design with queueing theory are expected at the Senior level
Предварительные знания
Markov Chains: Interview Questions
Markov chain questions are among the most common in quantitative finance, ML, and data engineering interviews. Key topics: stationary distributions, first return times, and absorption probabilities. Being able to quickly extract the stationary distribution π from balance equations is a required skill.
**Top Markov interview questions:** 1. "Find the stationary distribution of a chain with matrix P"-solve πP = π, ∑πᵢ = 1 2. "What is the absorption probability from state i?"-system of linear equations hᵢ = pᵢⱼ + ∑ pᵢₖ hₖ 3. "Expected first return time to state i?"-mᵢ = 1/πᵢ 4. "PageRank: what distribution is it?"-stationary distribution of a random walk on the web graph 5. "The chain converges to π. How fast?"-spectral gap |λ₂|, mixing time
**A common interview trap:** questions about periodic chains. If a chain is periodic (e.g., a random walk on a bipartite graph), the stationary distribution exists but π^{(t)} does not converge to π. The right answer: "For an ergodic chain (aperiodic + irreducible), convergence is guaranteed; the period must be checked."
| Interview question | Key fact | Common mistake |
|---|---|---|
| Find stationary π | πP = π, ∑πᵢ = 1 (left eigenvector) | Solving Pᵀπ = π is also correct |
| Expected return time to i | E[τᵢ] = 1/πᵢ | Confusing with first passage time |
| Convergence to π | Geometric: ‖πᵗ-π‖ ≤ C·|λ₂|ᵗ | Assuming any chain converges |
| Absorption in random walk | P(reach N from i) = i/N | Trying to compute directly via DP |
PageRank is a complex Google algorithm unrelated to probability theory
PageRank = the stationary distribution of a random walk on the web graph (with teleportation: with probability α jump to a random page). This is a classic problem of finding π for an ergodic Markov chain
π = stationary distribution of P = (1-α)·(transition matrix) + α·(1/n)·J. Power iteration (repeatedly multiplying by P) is nothing other than simulating the random walk to stationarity
Interview question: "A random walk on {0,...,10} is absorbed at both ends. Starting at 4, the goal is to reach 10. What is P(success)?"
Brownian Motion: Interview Questions
Questions on Brownian motion and stochastic calculus are standard for quantitative analyst and research scientist roles. Key skills: computing covariances correctly, applying Ito's formula, and understanding geometric BM. Many questions have elegant solutions via martingale arguments.
**Top Brownian motion interview questions:** 1. "What is the quadratic variation of BM?"-[B,B]_t = t (not t², not 0) 2. "What is E[B(s)·B(t)]?"-min(s,t) 3. "Compute E[e^{σB(t)}]"-e^{σ²t/2} (moment generating function of normal) 4. "Is B(t)² a martingale?"-No! B²(t) - t is a martingale (Ito's formula) 5. "Solve dX = aX dt + bX dB (GBM)"-X(t) = X₀ exp((a-b²/2)t + bB(t)) 6. "How does Ito's formula change for f(t, B)?"-add ∂f/∂t · dt
**A common interview trap:** "Is e^{B(t)} a martingale?" The answer is **no**. E[e^{B(t)}] = e^{t/2} grows over time-it is a submartingale. The martingale is e^{B(t) - t/2}. Another trap: "Find E[B(3) | B(1) = 2]". The answer is 2 (martingale property), not 2/3 or 6.
| Question | Correct answer | Common mistake |
|---|---|---|
| E[B(s)·B(t)], s≤t | min(s,t) = s | Answering s·t or 0 |
| E[B(3) | B(1)=2] | 2 (martingale) | Answering 6 (linear interpolation) |
| Is e^{B(t)} a martingale? | No, E[e^{B(t)}] = e^{t/2} grows | Confusing with e^{B(t)-t/2} |
| d(B(t)³) by Ito's formula | 3B² dB + 3B dt | Writing 3B² dB (forgetting correction) |
| Quadratic variation of cos(B) | ∫sin²(B_s)ds | Answering 0 or t |
B(t)² is a martingale because B(t) is a martingale and x² is a convex function
B(t)² is a submartingale (Jensen's inequality). The martingale is B(t)² - t. The correction -t comes from Ito's formula: d(B²) = 2B dB + dt
Jensen's inequality: E[f(B_t)|F_s] ≥ f(B_s) for convex f-this is a submartingale, not a martingale. Ito's formula is precise: d(B²) = 2B dB + dt, so integrating: B(t)² = 2∫B dB + t. The martingale part is 2∫B dB; the compensator is t
Interview question: "Compute E[B(5) | B(2) = 3]."
Queues and Applications: Interview Questions
Queueing theory questions appear in system design, SRE, and backend interviews. Key tasks: apply Little's Law, compute M/M/1 metrics, find the bottleneck in a Jackson network. Many candidates know M/M/1 but miss the nonlinearity of queues-that is exactly what interviewers test.
**Top queueing interview questions:** 1. "Given: λ=80 req/s, μ=100 req/s. Mean response time?"-W = 1/(μ-λ) = 50 ms 2. "Why cap utilization at 70% and not 95%?"-nonlinearity: L = ρ/(1-ρ) 3. "L=20 concurrent requests, throughput=200 req/s. Mean time?"-W = L/λ = 100 ms (Little's Law) 4. "Two servers vs one fast one?"-M/M/2 vs M/M/1 with 2μ-M/M/2 always wins at the same total capacity 5. "Latency with CV=2 vs CV=1?"-P-K formula: 2.5x worse
**System design + queues:** When asked "How can service latency be improved?", the right structure is: 1. measure ρ = λ/μ 2. find the bottleneck via Jackson networks 3. apply Little's Law L = λW to estimate W from observable metrics 4. reduce CV of service time through batching or load balancing.
| Interview scenario | Formula | Answer |
|---|---|---|
| "How many replicas are needed?" | ρ_per_server = λ/(c·μ) < 0.7 | c = ceil(λ/(0.7·μ)) |
| "Why did latency spike at 90% load?" | L = ρ/(1-ρ): from ρ 0.8→0.9, L: 4→9 | Nonlinear queue growth |
| "Why is P95 so much higher than median?" | M/G/1: CV affects the tail | Heavy-tailed service times |
| "What about request hedging?" | W ~ 1/(μ-λ): lower effective CV | Reduces P99 at cost of extra requests |
Two servers with speed μ/2 each are worse than one server with speed μ
At high load, two servers with speed μ/2 are better than one with speed μ (M/M/2 beats M/M/1). At low load they are roughly equivalent
M/M/2: each server handles half the traffic at half the speed-ρ is unchanged, but queueing follows the gentler Erlang-C criterion. Two servers provide two parallel "exits", reducing wait time during temporary load spikes
System design interview: "The dashboard shows L=50 active connections, throughput=100 req/s. What is the mean response time?"
Practical Applications: Senior-Level Questions
Senior ML and systems interviews require not just knowing formulas but quickly modeling real problems through stochastic processes. Typical questions: explain system behavior under load, choose the right model, estimate the order of magnitude without exact calculations.
**Senior-level question archetypes:** **"Estimate" questions:** - "How many SGD steps to reach ε-optimality?"-O(1/(η·ε)) for convex, O(1/ε²) for non-convex - "How much does latency grow when load goes from 70% to 90%?"-L(0.9)/L(0.7) = (9)/(2.33) ≈ 4x (nonlinear!) **"Design" questions:** - "How can an A/B test be designed with correct guarantees?"-Sequential probability ratio test (SPRT), martingales - "How can anomalies be detected in streaming data?"-CUSUM (cumulative sum)-martingale monitor **"Explain" questions:** - "Why does batch size affect model generalization?"-Large batch = small gradient noise = fast convergence to sharp minimum
**Answering "design" questions** follows this structure: 1. identify the stochastic process describing the system 2. write the key equations (stationary distribution / SDE / balance equations) 3. apply the theorem (OST / Little's Law / Ito's formula) 4. interpret the result from an engineering perspective. Demonstrating this chain of thinking is what separates a strong candidate.
| Real-world problem | Model | Key result |
|---|---|---|
| A/B test: when to stop? | SPRT-martingale E-value | Stop when E_t > 1/α |
| Anomaly in metrics? | CUSUM-accumulate LLR | Detect mean shift optimally |
| Rate limiting (token bucket) | M/M/1 with finite buffer | P(reject) = ρ^K/(1-ρ^{K+1}) |
| Exponential backoff | Geometric waiting distribution | Optimal for Poisson collisions |
In an interview all formulas must be known by heart
What matters more is understanding the problem structure and being able to derive answers. "I remember the principle: L = ρ/(1-ρ) is nonlinear" is more valuable than a memorized Erlang-C formula without understanding
The interviewer tests the ability to think through stochastic processes: identify the process, write equations, apply a theorem. A correctly framed problem and an order-of-magnitude answer are stronger than an exact formula without insight into its origin
System design interview: "Load grew from 70% to 90%. By how much did the mean number of requests in queue increase?"
Key Ideas
- **Markov in interviews**-P(absorption from i) = i/N, E[return] = 1/πᵢ, PageRank = stationary π; check periodicity
- **BM in interviews**-Cov(B(s),B(t)) = min(s,t), E[B(t)|B(s)] = B(s), B²-t is a martingale, d(B³) = 3B²dB + 3B dt
- **Queues in interviews**-W = L/λ (Little's Law), W = 1/(μ-λ) for M/M/1, nonlinearity L = ρ/(1-ρ)
- **Senior questions**-answer structure: process → equations → theorem → engineering interpretation; order of magnitude beats a memorized formula
Related Topics
This lesson brings together key results from the entire course in interview format:
- Martingales — OST for the ruin problem, martingale arguments for B(t)² - t, sequential tests
- Queueing Theory — Little's Law, M/M/1, load nonlinearity-standard system design questions
- Stochastic in ML and Systems — SGD as a Markov chain, DDPM as an SDE, inference latency via queues-the practical context
Вопросы для размышления
- An interviewer asks: "Why does SGD with large batch size often generalize worse?" How can this be connected to stochastic process theory?
- "Design an experiment that shows the service follows M/M/1 rather than M/G/1"-how should this be answered?
- The hardest part of stochastic processes from practice-what should be explained differently from how it appears in textbooks?