Convex Optimization
Bayesian Optimization
Mockus proposed the Expected Improvement criterion in 1978, and by around 2012 Bayesian Optimization (BO) had become a standard tool in AutoML. Labs such as Google, DeepMind, and OpenAI have used it for neural network hyperparameter tuning, where avoiding unnecessary training runs can save thousands of GPU-hours.
- AutoML: BO tunes neural network architecture and hyperparameters (SMAC, Optuna, HyperOpt)
- Materials science: searching for new alloys with minimal lab experiments
- Drug discovery: molecular optimization via GP surrogate over molecular space
Prerequisites
Gaussian Process Surrogate
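Bayesian Optimization (Mockus, 1978) builds a surrogate model of the objective f with a Gaussian process (GP). After n observations (X, y), the posterior at any point x is Gaussian: f(x) | X, y ~ N(mu_n(x), sigma_n^2(x)). Kernel hyperparameters are typically fit by maximizing the marginal likelihood. An acquisition function built from mu_n and sigma_n balances exploration (large sigma_n) against exploitation (large mu_n) and requires no gradient of f. Srinivas et al. (2010) showed that GP-UCB attains cumulative regret R_T = O(sqrt(T*gamma_T)) up to logarithmic factors, which is sublinear in T whenever gamma_T grows slowly enough.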
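To make the posterior N(mu_n(x), sigma_n^2(x)) concrete, here is a minimal sketch of GP conditioning for a one-dimensional input. It assumes a zero prior mean, a squared-exponential kernel with a fixed length scale, and a small noise term for numerical stability. None of these choices come from the text, and the function names (`rbf`, `gp_posterior`, and the helpers) are illustrative only. Kernel hyperparameters are held fixed here rather than fit by maximum likelihood.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel k(a, b) for scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, b):
    """Solve L^T z = b by back substitution."""
    n = len(b)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (b[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def gp_posterior(X, y, x_new, noise=1e-6, length_scale=1.0):
    """Return (mu_n(x_new), sigma_n^2(x_new)) for a zero-mean GP prior."""
    # Gram matrix K + noise * I over the observed inputs.
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # alpha = (K + noise I)^{-1} y
    k_star = [rbf(a, x_new, length_scale) for a in X]
    mu = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)                   # v = L^{-1} k_*
    var = rbf(x_new, x_new, length_scale) - sum(t * t for t in v)
    return mu, max(var, 0.0)                     # clamp tiny negative round-off
```

Calling `gp_posterior([0.0, 1.0, 2.0], [0.0, 1.0, 0.5], 1.0)` returns a mean close to the observed value and a variance close to zero, while a query far from the data, such as `x_new=10.0`, falls back to the prior (mean near 0, variance near 1). That gap in variance is what an acquisition function exploits when it explores.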
What is gamma_T in the GP-UCB regret formula?
Applications and Extensions of Bayesian Optimization
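AutoML systems use Bayesian Optimization for hyperparameter tuning, for example SMAC (random-forest surrogate) and TPE (tree-structured Parzen estimator). BOHB (Falkner et al., 2018) combines BO with Hyperband for resource-efficient search: Hyperband allocates the training budget across configurations, while a model-based sampler proposes which configurations to try. Multi-task BO (MTL-BO) transfers prior knowledge between related tasks. Parallel BO uses q-EI to select a batch of q points per round.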
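Batch acquisition with q-EI has no simple closed form for general q, so it is usually estimated by sampling. The sketch below assumes a maximization problem and that the joint GP posterior over the q candidate points is already available as a mean vector `mu` and a positive definite covariance matrix `cov`. The function name `q_ei_mc`, the sample count, and the seed are illustrative choices, not part of any library.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def q_ei_mc(mu, cov, f_best, n_samples=20000, seed=0):
    """Monte Carlo estimate of q-EI = E[max(0, max_j f(x_j) - f_best)].

    mu, cov: joint posterior mean and covariance at the q batch points.
    f_best:  best objective value observed so far (maximization).
    """
    rng = random.Random(seed)
    L = cholesky(cov)
    q = len(mu)
    total = 0.0
    for _ in range(n_samples):
        # Draw one joint sample f = mu + L z with z ~ N(0, I).
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        f = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(q)]
        # Improvement of the best point in the batch over the incumbent.
        total += max(0.0, max(f) - f_best)
    return total / n_samples


if __name__ == "__main__":
    single = q_ei_mc([0.0], [[1.0]], f_best=0.0)
    pair = q_ei_mc([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], f_best=0.0)
    print(f"q=1 estimate: {single:.3f}, q=2 estimate: {pair:.3f}")
```

With q = 1 the estimate should approach the closed-form EI, which gives a quick sanity check on the sampler. Adding a second, uncorrelated candidate raises the estimate, which is why batch selection prefers points whose posterior values are not strongly correlated.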
Why is Bayesian Optimization efficient with few function evaluations?
Key Ideas
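- GP surrogate: f(x) | X, y ~ N(mu_n(x), sigma_n^2(x)), obtained by conditioning the GP prior on the data (Bayes' rule)
- EI(x) = (mu_n(x) - f*) Phi(Z) + sigma_n(x) phi(Z), with Z = (mu_n(x) - f*) / sigma_n(x) and f* the best value observed so far (maximization) -- analytically tractable for GP
- GP-UCB: R_T = O(sqrt(T*gamma_T)) up to log factors, where gamma_T = max_{|S|=T} I(f; y_S) is the maximum information gain from T observations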
- BOHB: BO + Hyperband -- resource-efficient hyperparameter search
- q-EI: parallel selection of q points via MC sampling
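The EI and GP-UCB items above can be sketched as scalar functions of the posterior mean and standard deviation at a single point. This assumes maximization, with Z = (mu - f*) / sigma for EI. The names `expected_improvement` and `ucb` are illustrative, and the fixed `beta` is a simplifying assumption: the regret analysis of Srinivas et al. uses a schedule beta_t that grows with t.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI at a point whose posterior is N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(0.0, mu - f_best)
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm_cdf(z) + sigma * norm_pdf(z)


def ucb(mu, sigma, beta=4.0):
    """Upper confidence bound mu + sqrt(beta) * sigma (beta fixed by assumption)."""
    return mu + math.sqrt(beta) * sigma


if __name__ == "__main__":
    # Hypothetical candidates as (x, mu_n(x), sigma_n(x)) triples.
    candidates = [(0.2, 0.9, 0.05), (0.5, 0.7, 0.40), (0.8, 0.3, 0.10)]
    f_best = 0.85
    by_ei = max(candidates, key=lambda c: expected_improvement(c[1], c[2], f_best))
    by_ucb = max(candidates, key=lambda c: ucb(c[1], c[2]))
    print("EI picks x =", by_ei[0], "| UCB picks x =", by_ucb[0])
```

Both scores reward a high mean and a high standard deviation, so a point with a mediocre mean but large uncertainty can outrank a point that is only slightly better than the incumbent.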
Further Directions
Related topics that build on the material in this section:
- co-28-bandit-opt — extends
Questions for Reflection
- Give a concrete example of a problem where Bayesian Optimization is a good fit.
- How does this connect to other areas of mathematics?