Statistical Learning Theory
PAC Learning Extensions
Standard PAC assumes the correct answer lies in class H. In practice it does not. Agnostic PAC is honest: it guarantees competition with the best in H, nothing more.
- **Search ranking:** Google Search uses agnostic PAC guarantees with 10^9 examples and VC-dimension 10^6 - realizability is known to be violated
- **Medical diagnosis:** Learning with imprecise or contested diagnoses - a classic agnostic setting where the true label is unknown or noisy
- **AutoML:** Model selection from class H without realizability assumption: the agnostic bound guides architecture search
- **Financial forecasting:** Markets are not generated by any model in H; agnostic PAC formalizes competition with the best available strategy
Предварительные знания
- Standard PAC learning
- VC-dimension
- Online learning and bandits
Agnostic PAC learning
Google Search uses agnostic PAC guarantees for ranking models: with 10^9 training examples and a hypothesis class of VC-dimension d = 10^6, the generalization error stays below epsilon = 0.01 with probability 1 - delta = 0.95.
How does agnostic PAC differ from realisable PAC?
In realisable PAC: best-in-class achieves 0 error (some h* ∈ H with R(h*) = 0). In agnostic PAC: the best hypothesis h* = argmin R(h) may have R(h*) > 0. ERM guarantees R(ERM) <= R(h*) + sqrt(O(VC log(n))/n) - suitable for noisy or misspecified models.
Uniform convergence and ERM
Which property of H guarantees ERM-learnability in the agnostic setting?
Fundamental theorem of statistical learning (Vapnik-Chervonenkis): VC(H) < ∞ ↔ H is agnostically PAC-learnable via ERM with sample complexity O((VC(H) + log(1/delta))/eps²). Rademacher complexity gives finer distribution-aware bounds.
Improper learning and extensions
Agnostic PAC as competing with the best team member - Class H is a team of specialists. The true answer is unknown to everyone. The task is to perform no worse than the best team member, measured by actual (not empirical) performance. Zero error cannot be guaranteed - but competition with the best available tool can. This is the honest formulation for real-world ML.
What is improper learning?
In proper learning A(S) ∈ H. In improper learning A(S) may lie outside H. This often yields strictly better bounds: for disjunctions on n variables proper learning requires Theta(n/eps) examples while improper learning needs only O(log(n)/eps). DNNs can be viewed as improper learners for classes of simple functions.
Connections
Agnostic PAC connects classical learnability theory with modern guarantees for neural networks and online methods.
- VC theory — Related topic
- Rademacher complexity — Related topic
- Online learning — Related topic
- Noise models — Related topic
Итоги
- Agnostic PAC requires no realizability: the goal is to compete with opt_H, not find c*
- ERM bound: L(h_S) no worse than opt_H + O(sqrt(log|H|/m)) for finite H
- Uniform convergence is necessary and sufficient for agnostic PAC learnability
- Sample size: m = O((log|H| + log(1/delta)) / epsilon^2) for an (epsilon, delta)-agnostic guarantee
- Massart noise (eta < 1/2) allows a smaller sample complexity than the fully agnostic case
What is the key difference between agnostic and standard PAC learning?
Agnostic PAC: the true concept may not lie in H. The goal is to find h with error no worse than opt_H + epsilon. No realizability assumption is made.