Stochastic Processes

Stochastic Processes in Interviews

Technical interviews in quantitative finance, ML research, and SRE regularly test understanding of stochastic processes. Not formulas-but the ability to think quickly through probabilistic models, apply Little's Law from memory, explain queue nonlinearity, or say why e^{B(t)} is not a martingale. This lesson is the final practice before a real interview.

  • **Quantitative Finance**-questions on GBM, martingales, Ito's formula are standard for quant developer and researcher roles
  • **ML Research**-SGD convergence, diffusion models, Langevin MCMC are typical topics in research interviews
  • **SRE and Backend**-Little's Law, queue nonlinearity, system design with queueing theory are expected at the Senior level

Предварительные знания

  • MCMC and Sampling

Markov Chains: Interview Questions

Markov chain questions are among the most common in quantitative finance, ML, and data engineering interviews. Key topics: stationary distributions, first return times, and absorption probabilities. Being able to quickly extract the stationary distribution π from balance equations is a required skill.

**Top Markov interview questions:** 1. "Find the stationary distribution of a chain with matrix P"-solve πP = π, ∑πᵢ = 1 2. "What is the absorption probability from state i?"-system of linear equations hᵢ = pᵢⱼ + ∑ pᵢₖ hₖ 3. "Expected first return time to state i?"-mᵢ = 1/πᵢ 4. "PageRank: what distribution is it?"-stationary distribution of a random walk on the web graph 5. "The chain converges to π. How fast?"-spectral gap |λ₂|, mixing time

**A common interview trap:** questions about periodic chains. If a chain is periodic (e.g., a random walk on a bipartite graph), the stationary distribution exists but π^{(t)} does not converge to π. The right answer: "For an ergodic chain (aperiodic + irreducible), convergence is guaranteed; the period must be checked."

Interview questionKey factCommon mistake
Find stationary ππP = π, ∑πᵢ = 1 (left eigenvector)Solving Pᵀπ = π is also correct
Expected return time to iE[τᵢ] = 1/πᵢConfusing with first passage time
Convergence to πGeometric: ‖πᵗ-π‖ ≤ C·|λ₂|ᵗAssuming any chain converges
Absorption in random walkP(reach N from i) = i/NTrying to compute directly via DP

PageRank is a complex Google algorithm unrelated to probability theory

PageRank = the stationary distribution of a random walk on the web graph (with teleportation: with probability α jump to a random page). This is a classic problem of finding π for an ergodic Markov chain

π = stationary distribution of P = (1-α)·(transition matrix) + α·(1/n)·J. Power iteration (repeatedly multiplying by P) is nothing other than simulating the random walk to stationarity

Interview question: "A random walk on {0,...,10} is absorbed at both ends. Starting at 4, the goal is to reach 10. What is P(success)?"

Brownian Motion: Interview Questions

Questions on Brownian motion and stochastic calculus are standard for quantitative analyst and research scientist roles. Key skills: computing covariances correctly, applying Ito's formula, and understanding geometric BM. Many questions have elegant solutions via martingale arguments.

**Top Brownian motion interview questions:** 1. "What is the quadratic variation of BM?"-[B,B]_t = t (not t², not 0) 2. "What is E[B(s)·B(t)]?"-min(s,t) 3. "Compute E[e^{σB(t)}]"-e^{σ²t/2} (moment generating function of normal) 4. "Is B(t)² a martingale?"-No! B²(t) - t is a martingale (Ito's formula) 5. "Solve dX = aX dt + bX dB (GBM)"-X(t) = X₀ exp((a-b²/2)t + bB(t)) 6. "How does Ito's formula change for f(t, B)?"-add ∂f/∂t · dt

**A common interview trap:** "Is e^{B(t)} a martingale?" The answer is **no**. E[e^{B(t)}] = e^{t/2} grows over time-it is a submartingale. The martingale is e^{B(t) - t/2}. Another trap: "Find E[B(3) | B(1) = 2]". The answer is 2 (martingale property), not 2/3 or 6.

QuestionCorrect answerCommon mistake
E[B(s)·B(t)], s≤tmin(s,t) = sAnswering s·t or 0
E[B(3) | B(1)=2]2 (martingale)Answering 6 (linear interpolation)
Is e^{B(t)} a martingale?No, E[e^{B(t)}] = e^{t/2} growsConfusing with e^{B(t)-t/2}
d(B(t)³) by Ito's formula3B² dB + 3B dtWriting 3B² dB (forgetting correction)
Quadratic variation of cos(B)∫sin²(B_s)dsAnswering 0 or t

B(t)² is a martingale because B(t) is a martingale and x² is a convex function

B(t)² is a submartingale (Jensen's inequality). The martingale is B(t)² - t. The correction -t comes from Ito's formula: d(B²) = 2B dB + dt

Jensen's inequality: E[f(B_t)|F_s] ≥ f(B_s) for convex f-this is a submartingale, not a martingale. Ito's formula is precise: d(B²) = 2B dB + dt, so integrating: B(t)² = 2∫B dB + t. The martingale part is 2∫B dB; the compensator is t

Interview question: "Compute E[B(5) | B(2) = 3]."

Queues and Applications: Interview Questions

Queueing theory questions appear in system design, SRE, and backend interviews. Key tasks: apply Little's Law, compute M/M/1 metrics, find the bottleneck in a Jackson network. Many candidates know M/M/1 but miss the nonlinearity of queues-that is exactly what interviewers test.

**Top queueing interview questions:** 1. "Given: λ=80 req/s, μ=100 req/s. Mean response time?"-W = 1/(μ-λ) = 50 ms 2. "Why cap utilization at 70% and not 95%?"-nonlinearity: L = ρ/(1-ρ) 3. "L=20 concurrent requests, throughput=200 req/s. Mean time?"-W = L/λ = 100 ms (Little's Law) 4. "Two servers vs one fast one?"-M/M/2 vs M/M/1 with 2μ-M/M/2 always wins at the same total capacity 5. "Latency with CV=2 vs CV=1?"-P-K formula: 2.5x worse

**System design + queues:** When asked "How can service latency be improved?", the right structure is: 1. measure ρ = λ/μ 2. find the bottleneck via Jackson networks 3. apply Little's Law L = λW to estimate W from observable metrics 4. reduce CV of service time through batching or load balancing.

Interview scenarioFormulaAnswer
"How many replicas are needed?"ρ_per_server = λ/(c·μ) < 0.7c = ceil(λ/(0.7·μ))
"Why did latency spike at 90% load?"L = ρ/(1-ρ): from ρ 0.8→0.9, L: 4→9Nonlinear queue growth
"Why is P95 so much higher than median?"M/G/1: CV affects the tailHeavy-tailed service times
"What about request hedging?"W ~ 1/(μ-λ): lower effective CVReduces P99 at cost of extra requests

Two servers with speed μ/2 each are worse than one server with speed μ

At high load, two servers with speed μ/2 are better than one with speed μ (M/M/2 beats M/M/1). At low load they are roughly equivalent

M/M/2: each server handles half the traffic at half the speed-ρ is unchanged, but queueing follows the gentler Erlang-C criterion. Two servers provide two parallel "exits", reducing wait time during temporary load spikes

System design interview: "The dashboard shows L=50 active connections, throughput=100 req/s. What is the mean response time?"

Practical Applications: Senior-Level Questions

Senior ML and systems interviews require not just knowing formulas but quickly modeling real problems through stochastic processes. Typical questions: explain system behavior under load, choose the right model, estimate the order of magnitude without exact calculations.

**Senior-level question archetypes:** **"Estimate" questions:** - "How many SGD steps to reach ε-optimality?"-O(1/(η·ε)) for convex, O(1/ε²) for non-convex - "How much does latency grow when load goes from 70% to 90%?"-L(0.9)/L(0.7) = (9)/(2.33) ≈ 4x (nonlinear!) **"Design" questions:** - "How can an A/B test be designed with correct guarantees?"-Sequential probability ratio test (SPRT), martingales - "How can anomalies be detected in streaming data?"-CUSUM (cumulative sum)-martingale monitor **"Explain" questions:** - "Why does batch size affect model generalization?"-Large batch = small gradient noise = fast convergence to sharp minimum

**Answering "design" questions** follows this structure: 1. identify the stochastic process describing the system 2. write the key equations (stationary distribution / SDE / balance equations) 3. apply the theorem (OST / Little's Law / Ito's formula) 4. interpret the result from an engineering perspective. Demonstrating this chain of thinking is what separates a strong candidate.

Real-world problemModelKey result
A/B test: when to stop?SPRT-martingale E-valueStop when E_t > 1/α
Anomaly in metrics?CUSUM-accumulate LLRDetect mean shift optimally
Rate limiting (token bucket)M/M/1 with finite bufferP(reject) = ρ^K/(1-ρ^{K+1})
Exponential backoffGeometric waiting distributionOptimal for Poisson collisions

In an interview all formulas must be known by heart

What matters more is understanding the problem structure and being able to derive answers. "I remember the principle: L = ρ/(1-ρ) is nonlinear" is more valuable than a memorized Erlang-C formula without understanding

The interviewer tests the ability to think through stochastic processes: identify the process, write equations, apply a theorem. A correctly framed problem and an order-of-magnitude answer are stronger than an exact formula without insight into its origin

System design interview: "Load grew from 70% to 90%. By how much did the mean number of requests in queue increase?"

Key Ideas

  • **Markov in interviews**-P(absorption from i) = i/N, E[return] = 1/πᵢ, PageRank = stationary π; check periodicity
  • **BM in interviews**-Cov(B(s),B(t)) = min(s,t), E[B(t)|B(s)] = B(s), B²-t is a martingale, d(B³) = 3B²dB + 3B dt
  • **Queues in interviews**-W = L/λ (Little's Law), W = 1/(μ-λ) for M/M/1, nonlinearity L = ρ/(1-ρ)
  • **Senior questions**-answer structure: process → equations → theorem → engineering interpretation; order of magnitude beats a memorized formula

Related Topics

This lesson brings together key results from the entire course in interview format:

  • Martingales — OST for the ruin problem, martingale arguments for B(t)² - t, sequential tests
  • Queueing Theory — Little's Law, M/M/1, load nonlinearity-standard system design questions
  • Stochastic in ML and Systems — SGD as a Markov chain, DDPM as an SDE, inference latency via queues-the practical context

Вопросы для размышления

  • An interviewer asks: "Why does SGD with large batch size often generalize worse?" How can this be connected to stochastic process theory?
  • "Design an experiment that shows the service follows M/M/1 rather than M/G/1"-how should this be answered?
  • The hardest part of stochastic processes from practice-what should be explained differently from how it appears in textbooks?

Связанные уроки

  • prob-01-intro
Stochastic Processes in Interviews

0

1

Sign In