Convex Optimization

Bayesian Optimization

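Jonas Mockus proposed the Expected Improvement criterion in 1978, and since about 2012 Bayesian Optimization (BO) has been a standard tool in AutoML. Labs such as Google, DeepMind, and OpenAI have used it to tune neural network hyperparameters, because it typically needs far fewer training runs than grid or random search and can save thousands of GPU-hours.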

AutoML: BO tunes neural network architecture and hyperparameters (SMAC, Optuna, HyperOpt)
Materials science: searching for new alloys with minimal lab experiments
Drug discovery: molecular optimization via GP surrogate over molecular space

Prerequisites

Previous lesson

Gaussian Process Surrogate

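Bayesian Optimization (Mockus 1978) builds a surrogate model of the objective f with a Gaussian process. Given observations (X, y), the posterior at any point x is Gaussian: f(x) | X,y ~ N(mu_n(x), sigma^2_n(x)). Kernel hyperparameters are fitted by maximizing the marginal likelihood. An acquisition function computed from mu_n and sigma_n balances exploration and exploitation and picks the next evaluation point without requiring gradients of f. Srinivas et al. (2010) showed that GP-UCB achieves cumulative regret R_T = O(sqrt(T*gamma_T)) up to logarithmic factors, with high probability; this is sublinear in T whenever gamma_T grows slowly enough.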
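The posterior mean mu_n and standard deviation sigma_n can be computed in a few lines. The following is a minimal sketch, not a reference implementation: it assumes NumPy, a squared-exponential (RBF) kernel with fixed hyperparameters (no maximum-likelihood fitting), and a toy objective sin(3x). The names rbf_kernel and gp_posterior are illustrative only.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X, y, X_new, noise=1e-6, **kernel_args):
    """Return (mu_n, sigma_n) at the rows of X_new, given data (X, y).

    Assumes a zero prior mean; `noise` is a small diagonal term that
    also stabilizes the Cholesky factorization.
    """
    K = rbf_kernel(X, X, **kernel_args) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_new, **kernel_args)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y via two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(rbf_kernel(X_new, X_new, **kernel_args))
    var = np.maximum(prior_var - np.sum(v**2, axis=0), 0.0)
    return mu, np.sqrt(var)

# Toy data: three observations of a hypothetical objective sin(3x)
X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.sin(3.0 * X_train[:, 0])
X_query = np.linspace(0.0, 1.0, 5)[:, None]
mu_n, sigma_n = gp_posterior(X_train, y_train, X_query, lengthscale=0.3)
```

At the training points sigma_n shrinks to nearly zero, and far from the data it returns to the prior standard deviation. That difference is the uncertainty signal an acquisition function uses to decide where to explore.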

What is gamma_T in the GP-UCB regret formula?

Applications and Extensions of Bayesian Optimization

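AutoML systems use Bayesian Optimization for hyperparameter tuning (SMAC, TPE). BOHB (Falkner et al. 2018) combines BO with Hyperband for resource-efficient search: Hyperband allocates the training budget and stops weak configurations early, while a model-based sampler proposes new configurations. Multi-task BO (MTL-BO) transfers prior knowledge between related tasks. Parallel BO selects a batch of q points per round, for example by maximizing q-EI.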
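For batch acquisition, q-EI has no simple closed form for general q, so it is commonly estimated by sampling. The sketch below assumes NumPy, a maximization problem, and that the joint posterior mean and covariance at the q candidate points are already available. The function name and its parameters are illustrative, not taken from any library.

```python
import numpy as np

def q_expected_improvement(mean, cov, best_f, n_samples=20000, seed=0):
    """Monte Carlo estimate of q-EI for maximization.

    mean:   (q,) joint posterior mean at the q candidate points
    cov:    (q, q) joint posterior covariance at those points
    best_f: best objective value observed so far
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    q = len(mean)
    # Small jitter keeps the Cholesky factorization numerically stable
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(q))
    z = rng.standard_normal((n_samples, q))
    samples = mean + z @ L.T            # joint draws of f at the q points
    # Improvement of the best point in each sampled batch over best_f
    improvement = np.maximum(samples.max(axis=1) - best_f, 0.0)
    return improvement.mean()
```

For q = 1 the estimate should agree with the analytic EI up to Monte Carlo error. Adding a second, independent candidate can only raise the value, because the batch maximum is at least as large as either point alone.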

Why is Bayesian Optimization efficient with few function evaluations?

Key Ideas

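GP surrogate: f(x) | X,y ~ N(mu_n(x), sigma^2_n(x)), obtained by conditioning the GP prior with kernel k on the data (Bayes' rule)
EI(x) = (mu_n(x) - f*) Phi(Z) + sigma_n(x) phi(Z), where Z = (mu_n(x) - f*) / sigma_n(x) and f* is the best value observed so far (maximization) -- analytically tractable for GP
GP-UCB: R_T = O(sqrt(T*gamma_T)) up to log factors, where gamma_T = max_{|S|=T} I(f; y_S) is the maximum information gain from T observations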
BOHB: BO + Hyperband -- resource-efficient hyperparameter search
q-EI: parallel selection of q points via MC sampling
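The EI formula and the UCB rule in the list above translate directly into code. This is a minimal sketch using only the Python standard library, written for maximization. The fixed beta in ucb is a simplifying assumption: the GP-UCB analysis uses a schedule beta_t that grows with t.

```python
import math

def expected_improvement(mu, sigma, best_f):
    """Closed-form EI for maximization under a posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: improvement is deterministic
        return max(mu - best_f, 0.0)
    z = (mu - best_f) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(Z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(Z)
    return (mu - best_f) * cdf + sigma * pdf

def ucb(mu, sigma, beta=2.0):
    """Upper confidence bound mu + sqrt(beta) * sigma (beta fixed here)."""
    return mu + math.sqrt(beta) * sigma
```

With mu equal to f*, the first term vanishes and EI reduces to sigma * phi(0), so EI grows with posterior uncertainty even where the mean promises no gain. This is how the criterion encodes exploration.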

Further Directions

The GP-UCB regret analysis places Bayesian Optimization in the bandit setting, the subject of the linked lesson below.

co-28-bandit-opt — extends

Questions for Reflection

Give a concrete example of a problem where Bayesian Optimization is a good fit.
How does Bayesian Optimization connect to other areas of mathematics?