Bayesian Statistics
'Candidate X has a 70% chance of winning the election' - this is a Bayesian statement. Frequentist statistics can't make such claims: an election happens once, there's no 'limiting frequency.' Bayesian statistics quantifies uncertainty where the frequentist approach falls silent.
- Netflix uses Bayesian models for personalization.
- Naive Bayes spam filters were among the early widely deployed applications of Bayes' theorem.
- Medical diagnosis, financial risk models, autonomous vehicles - anywhere uncertainty needs to be expressed quantitatively.
Prerequisites
Bayesian vs. Frequentist Approach
**Two views on probability:** Frequentist: probability = limiting frequency over infinite repetitions. Cannot talk about the 'probability of a hypothesis' (it is either true or not). Bayesian: probability = degree of belief in the truth of a statement, updated as data arrive.
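**The counterintuitive medical test result!** Even with a good test (95% sensitivity), a positive result for a rare disease (1% prevalence) can mean only a ~9% probability of actually having the disease. The exact figure also depends on the false-positive rate: ~9% corresponds to a rate of about 10% (90% specificity), because false positives among the 99% of healthy people far outnumber true positives among the 1% who are ill. That's why confirmatory tests are recommended. Base rate neglect - the failure to account for the prior - is a classic cognitive bias.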
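The arithmetic behind the medical test example can be checked with a short sketch. The sensitivity (95%) and prevalence (1%) come from the text; the 10% false-positive rate is an assumption chosen here because it reproduces the ~9% figure, and the function name is purely illustrative.

```python
def prob_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) from Bayes' theorem."""
    true_positives = sensitivity * prevalence                   # P(+ | ill) * P(ill)
    false_positives = false_positive_rate * (1.0 - prevalence)  # P(+ | healthy) * P(healthy)
    return true_positives / (true_positives + false_positives)


# Assumed false-positive rate of 10% (not stated in the text).
ppv = prob_disease_given_positive(
    prevalence=0.01, sensitivity=0.95, false_positive_rate=0.10
)
print(f"P(disease | positive) = {ppv:.3f}")

# The same test applied to a common condition (50% prevalence) behaves very differently.
ppv_common = prob_disease_given_positive(
    prevalence=0.50, sensitivity=0.95, false_positive_rate=0.10
)
print(f"P(disease | positive), common condition = {ppv_common:.3f}")
```

Changing only the prevalence while keeping the test fixed shows that the low posterior is driven by the prior, not by the quality of the test.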
In the Bayesian approach, what is the 'prior'?
Bayesian Updating: From Prior to Posterior
**Bayes' Theorem:** P(θ|X) = P(X|θ) × P(θ) / P(X)

Where: θ - parameter/hypothesis; X - data; P(θ) - prior; P(X|θ) - likelihood; P(θ|X) - posterior; P(X) - normalizing constant (the evidence).

Repeated updating is the central power of the Bayesian approach: as new data arrive, the current posterior serves as the prior for the next update.
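To make the updating loop concrete, here is a minimal sketch that applies Bayes' theorem one observation at a time. The three candidate values of θ, the uniform prior, and the flip sequence are all hypothetical choices made for illustration.

```python
def bayes_update(prior, likelihoods):
    """One Bayes step over a discrete set of hypotheses."""
    unnormalized = [lik * p for lik, p in zip(likelihoods, prior)]
    evidence = sum(unnormalized)  # P(X), the normalizing constant
    return [u / evidence for u in unnormalized]


# Hypothetical setup: three candidate values of theta = P(heads).
thetas = [0.25, 0.50, 0.75]
belief = [1 / 3, 1 / 3, 1 / 3]  # uniform prior P(theta)

for flip in "HHTH":  # illustrative data, processed one flip at a time
    likelihoods = [t if flip == "H" else 1.0 - t for t in thetas]
    belief = bayes_update(belief, likelihoods)  # the posterior becomes the next prior
    print(flip, [round(b, 3) for b in belief])
```

Processing the flips one by one gives the same final beliefs as a single update on all four flips, since the likelihoods simply multiply.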
**Conjugate priors:** a prior is conjugate to a likelihood when the resulting posterior belongs to the same distribution family as the prior. Examples (prior/likelihood): Beta/Binomial, Normal/Normal, Gamma/Poisson. Conjugacy yields an analytical posterior, so no MCMC is needed. In production systems, conjugate updates are used for online updating (A/B tests, recommender systems).
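For the Beta/Binomial pair the analytical update is just addition: a Beta(α, β) prior combined with s successes and f failures gives a Beta(α + s, β + f) posterior. The sketch below illustrates this with hypothetical conversion data; the helper names are not from any library.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior parameters for a Beta(alpha, beta) prior and binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: uniform Beta(1, 1) prior on a conversion rate.
alpha, beta = 1.0, 1.0

# First batch: 12 conversions, 8 non-conversions.
alpha, beta = beta_binomial_update(alpha, beta, successes=12, failures=8)
print(f"after batch 1: Beta({alpha:g}, {beta:g}), mean = {beta_mean(alpha, beta):.3f}")

# Online updating: the posterior is reused as the prior for the next batch.
alpha, beta = beta_binomial_update(alpha, beta, successes=3, failures=7)
print(f"after batch 2: Beta({alpha:g}, {beta:g}), mean = {beta_mean(alpha, beta):.3f}")
```

Because only two numbers are stored per parameter, this kind of update is cheap enough to run on every incoming event.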
Prior: Beta(2, 2) for a coin's p. We observe 7 heads and 3 tails. What is the posterior?
Bayesian Inference in Practice: MCMC and A/B Tests
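For complex models, an analytical posterior is usually out of reach. **MCMC (Markov Chain Monte Carlo)** is a family of algorithms that draw samples from the posterior without computing its normalizing constant P(X); summaries such as means and intervals are then estimated from the samples. PyMC and Stan are the main tools. For A/B tests, the Bayesian approach gives direct answers, such as the probability that variant B is better than A, without p-values.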
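The idea behind MCMC can be illustrated with a random-walk Metropolis sampler, one of the simplest members of the family. This is a teaching sketch using only the standard library, not how PyMC or Stan work internally; the coin data (14 heads, 6 tails), the uniform prior, the step size, and the burn-in fraction are all assumptions made for the example.

```python
import math
import random


def log_unnormalized_posterior(theta, heads, tails):
    """log[P(X|theta) * P(theta)] for coin flips with a uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior probability outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, step=0.15, seed=0):
    """Random-walk Metropolis: only ratios of posterior values are needed."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_unnormalized_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_unnormalized_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio); P(X) cancels in the ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples[n_samples // 5:]  # drop the first 20% as burn-in


samples = metropolis(heads=14, tails=6)
posterior_mean = sum(samples) / len(samples)
print(f"estimated posterior mean: {posterior_mean:.3f}")
print(f"exact mean of Beta(15, 7): {15 / 22:.3f}")
```

This model is conjugate, so the exact posterior Beta(15, 7) is available for comparison; the sampler itself never uses that fact, which is why the same approach carries over to models with no closed form.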
**Credible interval vs Confidence interval:** a 95% credible interval [a, b] means 'given the data and the model, there is a 95% probability that the true parameter lies in [a, b]' - which is what most people naively assume a confidence interval means! A 95% confidence interval means: 'if the experiment were repeated many times, 95% of the intervals constructed this way would contain the true value.' Bayesian inference gives the more intuitive interpretation.
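Given samples from a posterior, an equal-tailed credible interval can be read off from the sample quantiles. The sketch below assumes a hypothetical Beta(15, 7) posterior and draws from it directly with the standard library; with MCMC output the same function would be applied to the chain's samples.

```python
import random


def credible_interval(samples, level=0.95):
    """Equal-tailed interval: cut (1 - level) / 2 of the samples from each tail."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper


rng = random.Random(0)
# Hypothetical posterior for a coin's p: Beta(15, 7).
draws = [rng.betavariate(15, 7) for _ in range(50000)]

low, high = credible_interval(draws, level=0.95)
share_inside = sum(low <= d <= high for d in draws) / len(draws)
print(f"95% credible interval: [{low:.3f}, {high:.3f}]")
print(f"share of posterior draws inside: {share_inside:.3f}")
```

The share of draws inside the interval is the direct probabilistic statement P(θ ∈ [a, b] | data) estimated from samples.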
A Bayesian A/B test shows P(B > A) = 0.92. What does this mean?
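A quantity like P(B > A) can be estimated by sampling from each variant's posterior and counting how often B's rate exceeds A's. The sketch below assumes Beta(1, 1) priors and made-up conversion counts; the function name and the numbers are illustrative only.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=50000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A | data) with Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Conjugate Beta posteriors: prior counts plus observed successes/failures.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical data: 100/1000 conversions for A, 120/1000 for B.
p_b_better = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=120, n_b=1000)
print(f"P(B > A) is roughly {p_b_better:.2f}")
```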
Key Ideas
- Bayesian approach: probability = degree of belief, updated as data arrive
- Bayes' Theorem: P(θ|X) ∝ P(X|θ) × P(θ) - likelihood × prior
- Prior → Posterior: each observation refines beliefs
- Conjugate priors (Beta/Binomial, Normal/Normal) yield analytical posteriors
- Credible interval: P(θ ∈ [a,b] | data) = 0.95 - direct probabilistic interpretation
- MCMC (PyMC, Stan) - for complex models without analytical posteriors
What's Next
Non-parametric tests are an alternative for data that violate the assumptions of parametric methods. Bayesian non-parametric models (e.g., Gaussian processes) combine both approaches.
- Non-Parametric Tests: non-parametric methods make no assumption about the form of the underlying distribution; many of them work with ranks instead of raw values
Questions for Reflection
- How is a prior chosen when expert knowledge about a parameter is available? How does this change when a lot of data is available?
- Why is 'base rate neglect' (ignoring the prior) so common? Give a real-life example where ignoring the base rate leads to incorrect conclusions.
- Compare the interpretation of a 95% confidence interval and a 95% credible interval. Why is the latter more intuitive for most people?