Stochastic Processes

Stochastic Processes in Interviews

Technical interviews in quantitative finance, ML research, and SRE regularly test understanding of stochastic processes. Not formulas-but the ability to think quickly through probabilistic models, apply Little's Law from memory, explain queue nonlinearity, or say why e^{B(t)} is not a martingale. This lesson is the final practice before a real interview.

**Quantitative Finance**-questions on GBM, martingales, Ito's formula are standard for quant developer and researcher roles
**ML Research**-SGD convergence, diffusion models, Langevin MCMC are typical topics in research interviews
**SRE and Backend**-Little's Law, queue nonlinearity, system design with queueing theory are expected at the Senior level

Предварительные знания

MCMC and Sampling

Markov Chains: Interview Questions

Markov chain questions are among the most common in quantitative finance, ML, and data engineering interviews. Key topics: stationary distributions, first return times, and absorption probabilities. Being able to quickly extract the stationary distribution π from balance equations is a required skill.

**Top Markov interview questions:** 1. "Find the stationary distribution of a chain with matrix P"-solve πP = π, ∑πᵢ = 1 2. "What is the absorption probability from state i?"-system of linear equations hᵢ = pᵢⱼ + ∑ pᵢₖ hₖ 3. "Expected first return time to state i?"-mᵢ = 1/πᵢ 4. "PageRank: what distribution is it?"-stationary distribution of a random walk on the web graph 5. "The chain converges to π. How fast?"-spectral gap |λ₂|, mixing time

**A common interview trap:** questions about periodic chains. If a chain is periodic (e.g., a random walk on a bipartite graph), the stationary distribution exists but π^{(t)} does not converge to π. The right answer: "For an ergodic chain (aperiodic + irreducible), convergence is guaranteed; the period must be checked."

Interview question	Key fact	Common mistake
Find stationary π	πP = π, ∑πᵢ = 1 (left eigenvector)	Solving Pᵀπ = π is also correct
Expected return time to i	E[τᵢ] = 1/πᵢ	Confusing with first passage time
Convergence to π	Geometric: ‖πᵗ-π‖ ≤ C·\|λ₂\|ᵗ	Assuming any chain converges
Absorption in random walk	P(reach N from i) = i/N	Trying to compute directly via DP

PageRank is a complex Google algorithm unrelated to probability theory

PageRank = the stationary distribution of a random walk on the web graph (with teleportation: with probability α jump to a random page). This is a classic problem of finding π for an ergodic Markov chain

π = stationary distribution of P = (1-α)·(transition matrix) + α·(1/n)·J. Power iteration (repeatedly multiplying by P) is nothing other than simulating the random walk to stationarity

Interview question: "A random walk on {0,...,10} is absorbed at both ends. Starting at 4, the goal is to reach 10. What is P(success)?"

Brownian Motion: Interview Questions

Questions on Brownian motion and stochastic calculus are standard for quantitative analyst and research scientist roles. Key skills: computing covariances correctly, applying Ito's formula, and understanding geometric BM. Many questions have elegant solutions via martingale arguments.

**Top Brownian motion interview questions:** 1. "What is the quadratic variation of BM?"-[B,B]_t = t (not t², not 0) 2. "What is E[B(s)·B(t)]?"-min(s,t) 3. "Compute E[e^{σB(t)}]"-e^{σ²t/2} (moment generating function of normal) 4. "Is B(t)² a martingale?"-No! B²(t) - t is a martingale (Ito's formula) 5. "Solve dX = aX dt + bX dB (GBM)"-X(t) = X₀ exp((a-b²/2)t + bB(t)) 6. "How does Ito's formula change for f(t, B)?"-add ∂f/∂t · dt

**A common interview trap:** "Is e^{B(t)} a martingale?" The answer is **no**. E[e^{B(t)}] = e^{t/2} grows over time-it is a submartingale. The martingale is e^{B(t) - t/2}. Another trap: "Find E[B(3) | B(1) = 2]". The answer is 2 (martingale property), not 2/3 or 6.

Question	Correct answer	Common mistake
E[B(s)·B(t)], s≤t	min(s,t) = s	Answering s·t or 0
E[B(3) \| B(1)=2]	2 (martingale)	Answering 6 (linear interpolation)
Is e^{B(t)} a martingale?	No, E[e^{B(t)}] = e^{t/2} grows	Confusing with e^{B(t)-t/2}
d(B(t)³) by Ito's formula	3B² dB + 3B dt	Writing 3B² dB (forgetting correction)
Quadratic variation of cos(B)	∫sin²(B_s)ds	Answering 0 or t

B(t)² is a martingale because B(t) is a martingale and x² is a convex function

B(t)² is a submartingale (Jensen's inequality). The martingale is B(t)² - t. The correction -t comes from Ito's formula: d(B²) = 2B dB + dt

Jensen's inequality: E[f(B_t)|F_s] ≥ f(B_s) for convex f-this is a submartingale, not a martingale. Ito's formula is precise: d(B²) = 2B dB + dt, so integrating: B(t)² = 2∫B dB + t. The martingale part is 2∫B dB; the compensator is t

Interview question: "Compute E[B(5) | B(2) = 3]."

Queues and Applications: Interview Questions

Queueing theory questions appear in system design, SRE, and backend interviews. Key tasks: apply Little's Law, compute M/M/1 metrics, find the bottleneck in a Jackson network. Many candidates know M/M/1 but miss the nonlinearity of queues-that is exactly what interviewers test.

**Top queueing interview questions:** 1. "Given: λ=80 req/s, μ=100 req/s. Mean response time?"-W = 1/(μ-λ) = 50 ms 2. "Why cap utilization at 70% and not 95%?"-nonlinearity: L = ρ/(1-ρ) 3. "L=20 concurrent requests, throughput=200 req/s. Mean time?"-W = L/λ = 100 ms (Little's Law) 4. "Two servers vs one fast one?"-M/M/2 vs M/M/1 with 2μ-M/M/2 always wins at the same total capacity 5. "Latency with CV=2 vs CV=1?"-P-K formula: 2.5x worse

**System design + queues:** When asked "How can service latency be improved?", the right structure is: 1. measure ρ = λ/μ 2. find the bottleneck via Jackson networks 3. apply Little's Law L = λW to estimate W from observable metrics 4. reduce CV of service time through batching or load balancing.

Interview scenario	Formula	Answer
"How many replicas are needed?"	ρ_per_server = λ/(c·μ) < 0.7	c = ceil(λ/(0.7·μ))
"Why did latency spike at 90% load?"	L = ρ/(1-ρ): from ρ 0.8→0.9, L: 4→9	Nonlinear queue growth
"Why is P95 so much higher than median?"	M/G/1: CV affects the tail	Heavy-tailed service times
"What about request hedging?"	W ~ 1/(μ-λ): lower effective CV	Reduces P99 at cost of extra requests

Two servers with speed μ/2 each are worse than one server with speed μ

At high load, two servers with speed μ/2 are better than one with speed μ (M/M/2 beats M/M/1). At low load they are roughly equivalent

M/M/2: each server handles half the traffic at half the speed-ρ is unchanged, but queueing follows the gentler Erlang-C criterion. Two servers provide two parallel "exits", reducing wait time during temporary load spikes

System design interview: "The dashboard shows L=50 active connections, throughput=100 req/s. What is the mean response time?"

Practical Applications: Senior-Level Questions

Senior ML and systems interviews require not just knowing formulas but quickly modeling real problems through stochastic processes. Typical questions: explain system behavior under load, choose the right model, estimate the order of magnitude without exact calculations.

**Senior-level question archetypes:** **"Estimate" questions:** - "How many SGD steps to reach ε-optimality?"-O(1/(η·ε)) for convex, O(1/ε²) for non-convex - "How much does latency grow when load goes from 70% to 90%?"-L(0.9)/L(0.7) = (9)/(2.33) ≈ 4x (nonlinear!) **"Design" questions:** - "How can an A/B test be designed with correct guarantees?"-Sequential probability ratio test (SPRT), martingales - "How can anomalies be detected in streaming data?"-CUSUM (cumulative sum)-martingale monitor **"Explain" questions:** - "Why does batch size affect model generalization?"-Large batch = small gradient noise = fast convergence to sharp minimum

**Answering "design" questions** follows this structure: 1. identify the stochastic process describing the system 2. write the key equations (stationary distribution / SDE / balance equations) 3. apply the theorem (OST / Little's Law / Ito's formula) 4. interpret the result from an engineering perspective. Demonstrating this chain of thinking is what separates a strong candidate.

Real-world problem	Model	Key result
A/B test: when to stop?	SPRT-martingale E-value	Stop when E_t > 1/α
Anomaly in metrics?	CUSUM-accumulate LLR	Detect mean shift optimally
Rate limiting (token bucket)	M/M/1 with finite buffer	P(reject) = ρ^K/(1-ρ^{K+1})
Exponential backoff	Geometric waiting distribution	Optimal for Poisson collisions

In an interview all formulas must be known by heart

What matters more is understanding the problem structure and being able to derive answers. "I remember the principle: L = ρ/(1-ρ) is nonlinear" is more valuable than a memorized Erlang-C formula without understanding

The interviewer tests the ability to think through stochastic processes: identify the process, write equations, apply a theorem. A correctly framed problem and an order-of-magnitude answer are stronger than an exact formula without insight into its origin

System design interview: "Load grew from 70% to 90%. By how much did the mean number of requests in queue increase?"

Key Ideas

**Markov in interviews**-P(absorption from i) = i/N, E[return] = 1/πᵢ, PageRank = stationary π; check periodicity
**BM in interviews**-Cov(B(s),B(t)) = min(s,t), E[B(t)|B(s)] = B(s), B²-t is a martingale, d(B³) = 3B²dB + 3B dt
**Queues in interviews**-W = L/λ (Little's Law), W = 1/(μ-λ) for M/M/1, nonlinearity L = ρ/(1-ρ)
**Senior questions**-answer structure: process → equations → theorem → engineering interpretation; order of magnitude beats a memorized formula

Вопросы для размышления

An interviewer asks: "Why does SGD with large batch size often generalize worse?" How can this be connected to stochastic process theory?
"Design an experiment that shows the service follows M/M/1 rather than M/G/1"-how should this be answered?
The hardest part of stochastic processes from practice-what should be explained differently from how it appears in textbooks?

Связанные уроки

prob-01-intro

Stochastic Processes

Stochastic Processes in Interviews

**Quantitative Finance**-questions on GBM, martingales, Ito's formula are standard for quant developer and researcher roles
**ML Research**-SGD convergence, diffusion models, Langevin MCMC are typical topics in research interviews
**SRE and Backend**-Little's Law, queue nonlinearity, system design with queueing theory are expected at the Senior level

Предварительные знания

MCMC and Sampling

Markov Chains: Interview Questions

Interview question	Key fact	Common mistake
Find stationary π	πP = π, ∑πᵢ = 1 (left eigenvector)	Solving Pᵀπ = π is also correct
Expected return time to i	E[τᵢ] = 1/πᵢ	Confusing with first passage time
Convergence to π	Geometric: ‖πᵗ-π‖ ≤ C·\|λ₂\|ᵗ	Assuming any chain converges
Absorption in random walk	P(reach N from i) = i/N	Trying to compute directly via DP

PageRank is a complex Google algorithm unrelated to probability theory

π = stationary distribution of P = (1-α)·(transition matrix) + α·(1/n)·J. Power iteration (repeatedly multiplying by P) is nothing other than simulating the random walk to stationarity

Interview question: "A random walk on {0,...,10} is absorbed at both ends. Starting at 4, the goal is to reach 10. What is P(success)?"

Brownian Motion: Interview Questions

Question	Correct answer	Common mistake
E[B(s)·B(t)], s≤t	min(s,t) = s	Answering s·t or 0
E[B(3) \| B(1)=2]	2 (martingale)	Answering 6 (linear interpolation)
Is e^{B(t)} a martingale?	No, E[e^{B(t)}] = e^{t/2} grows	Confusing with e^{B(t)-t/2}
d(B(t)³) by Ito's formula	3B² dB + 3B dt	Writing 3B² dB (forgetting correction)
Quadratic variation of cos(B)	∫sin²(B_s)ds	Answering 0 or t

B(t)² is a martingale because B(t) is a martingale and x² is a convex function

B(t)² is a submartingale (Jensen's inequality). The martingale is B(t)² - t. The correction -t comes from Ito's formula: d(B²) = 2B dB + dt

Interview question: "Compute E[B(5) | B(2) = 3]."

Queues and Applications: Interview Questions

Interview scenario	Formula	Answer
"How many replicas are needed?"	ρ_per_server = λ/(c·μ) < 0.7	c = ceil(λ/(0.7·μ))
"Why did latency spike at 90% load?"	L = ρ/(1-ρ): from ρ 0.8→0.9, L: 4→9	Nonlinear queue growth
"Why is P95 so much higher than median?"	M/G/1: CV affects the tail	Heavy-tailed service times
"What about request hedging?"	W ~ 1/(μ-λ): lower effective CV	Reduces P99 at cost of extra requests

Two servers with speed μ/2 each are worse than one server with speed μ

At high load, two servers with speed μ/2 are better than one with speed μ (M/M/2 beats M/M/1). At low load they are roughly equivalent

System design interview: "The dashboard shows L=50 active connections, throughput=100 req/s. What is the mean response time?"

Practical Applications: Senior-Level Questions

Real-world problem	Model	Key result
A/B test: when to stop?	SPRT-martingale E-value	Stop when E_t > 1/α
Anomaly in metrics?	CUSUM-accumulate LLR	Detect mean shift optimally
Rate limiting (token bucket)	M/M/1 with finite buffer	P(reject) = ρ^K/(1-ρ^{K+1})
Exponential backoff	Geometric waiting distribution	Optimal for Poisson collisions

In an interview all formulas must be known by heart

System design interview: "Load grew from 70% to 90%. By how much did the mean number of requests in queue increase?"

Key Ideas

**Markov in interviews**-P(absorption from i) = i/N, E[return] = 1/πᵢ, PageRank = stationary π; check periodicity
**BM in interviews**-Cov(B(s),B(t)) = min(s,t), E[B(t)|B(s)] = B(s), B²-t is a martingale, d(B³) = 3B²dB + 3B dt
**Queues in interviews**-W = L/λ (Little's Law), W = 1/(μ-λ) for M/M/1, nonlinearity L = ρ/(1-ρ)
**Senior questions**-answer structure: process → equations → theorem → engineering interpretation; order of magnitude beats a memorized formula

Вопросы для размышления

An interviewer asks: "Why does SGD with large batch size often generalize worse?" How can this be connected to stochastic process theory?
"Design an experiment that shows the service follows M/M/1 rather than M/G/1"-how should this be answered?
The hardest part of stochastic processes from practice-what should be explained differently from how it appears in textbooks?

Связанные уроки

prob-01-intro

Stochastic Processes in Interviews

Предварительные знания

Markov Chains: Interview Questions

Brownian Motion: Interview Questions

Queues and Applications: Interview Questions

Practical Applications: Senior-Level Questions

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Stochastic Processes in Interviews

Предварительные знания

Markov Chains: Interview Questions

Brownian Motion: Interview Questions

Queues and Applications: Interview Questions

Practical Applications: Senior-Level Questions

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки