Optimization

Constrained Optimization

Training a Tesla autopilot that won't run red lights, or optimizing a trading portfolio with a strict risk limit - these are constrained optimization problems. KKT conditions are the language that describes all of them-from SVM training to policy gradient in safe RL.

**SVM:** SVM training is a KKT problem; support vectors are points with lambda_i > 0 (active constraints); the dual formulation enables the kernel trick
**Safe RL:** constrained policy gradient (CPO, TRPO, PPO) uses KKT for constraint satisfaction-the foundation of safe autonomous systems
**Federated Learning:** ADMM decomposes the learning problem into client subproblems; used in FL with privacy constraints at Apple and Google

Предварительные знания

Nonlinear Optimization

Lagrangian and Multipliers

Constrained problem: min f(x) s.t. hi(x) = 0, gj(x) <= 0. The **Lagrangian** merges objective and constraints into one function: L(x, lambda, mu) = f(x) + sum(lambda_i * hi(x)) + sum(mu_j * gj(x)). Multipliers lambda_i (equalities) and mu_j >= 0 (inequalities) are the 'prices' of constraint violation.

**Economic interpretation:** lambda_i is the 'shadow price' of constraint hi = 0. Relaxing the constraint by epsilon changes the optimum by approximately lambda_i * epsilon. This is identical to dual variables in LP-sensitivity of the optimum to constraint right-hand sides.

In min f(x) s.t. g(x)=0, the Lagrange multiplier lambda = 5. What happens to the optimum if we relax the constraint to g(x) <= 0.1?

KKT Conditions

**KKT conditions** (Karush-Kuhn-Tucker, 1951) are the necessary optimality conditions for a constrained problem. For convex problems they are also sufficient. Any optimum x* must satisfy four groups of conditions.

KKT condition	Math	Meaning
Stationarity	nabla_x L(x,lam,mu*) = 0	Optimal in x with fixed multipliers
Primal feasibility	g_i(x) <= 0, h_j(x) = 0	Solution satisfies all constraints
Dual feasibility	lam_i >= 0	Inequality multipliers are non-negative
Complementary slackness	lam_i * g_i(x*) = 0	Only tight constraints have nonzero price

**Complementary slackness** is the most intuitive KKT condition: either a constraint is active (gi = 0) or its price is zero (lambda_i = 0). Both conditions cannot hold simultaneously-unless degenerate. This explains why in SVM only support vectors have nonzero alpha_i.

At the optimum of min f(x) s.t. g(x) <= 0, we have g(x*) = -3 (constraint is inactive). What does this imply about lambda*?

Penalty and Augmented Lagrangian

To solve a constrained problem using unconstrained methods, the **penalty method** adds a quadratic penalty: min f(x) + (rho/2)||h(x)||^2. As rho goes to infinity, the solution approaches the constrained optimum. Problem: large rho causes ill-conditioning. **Augmented Lagrangian** combines Lagrangian and penalty: La = f + lambda^T*h + (rho/2)||h||^2-a finite rho suffices.

**Augmented Lagrangian vs plain penalty:** plain penalty needs rho -> infinity for exact constraint satisfaction (ill-conditioning). Augmented Lagrangian updates lambda-a finite rho achieves exact satisfaction. This is the Hestenes-Powell method of multipliers (1969), the foundation of ADMM.

Why is Augmented Lagrangian better than plain penalty method for exact constraint satisfaction?

ADMM: Distributed Optimization

**ADMM** (Alternating Direction Method of Multipliers) extends Augmented Lagrangian to problems of the form min f(x) + g(z) s.t. Ax + Bz = c. It splits the problem into two subproblems, solved alternately: an x-update and a z-update. Natural fit for distributed ML.

**Safe RL via constrained optimization:** the constrained policy optimization problem (max reward s.t. E[violations] <= delta) is solved through CPO (Constrained Policy Optimization) or ADMM-based approaches. KKT conditions give the optimality criterion for policy gradients with constraints.

KKT conditions are purely theoretical and not used in practice

KKT conditions underlie all industrial constrained solvers: interior point methods solve perturbed KKT systems, SVM is built on dual KKT, ADMM updates dual variables per KKT.

Primal-dual IPM at each iteration solves: nabla_x L = 0, h(x)=0, g(x)<=0, lambda>=0, lambda*g=0-with a barrier smoothing. This is the KKT system. SVM dual: the KKT condition w=sum(alpha_i*yi*xi) is used directly for prediction. KKT is not just theory-it's the algorithmic foundation of modern constrained optimization.

Why is ADMM popular for distributed ML problems?

Key Ideas

**Lagrangian** L(x,lambda,mu) merges objective and constraints; multipliers are shadow prices of constraints
**KKT:** stationarity + primal feasibility + dual feasibility + complementary slackness-necessary optimality conditions (sufficient for convex problems)
**Augmented Lagrangian** solves constrained problems with unconstrained methods; dual update achieves exact constraint satisfaction with finite rho
**ADMM** decomposes composite problems into subproblems; standard for distributed ML and federated learning

Вопросы для размышления

Complementary slackness says: either the constraint is active or its price is zero. Why are both conditions impossible simultaneously? How does this connect to SVM support vectors?
Gradient descent can't handle constraints directly. Why does Augmented Lagrangian transform a constrained problem into a sequence of unconstrained ones, and how does this help?
How does ADMM apply to federated learning? What corresponds to the x-update, z-update, and dual update in that context?

Связанные уроки

calc-01-sequences

Optimization

Constrained Optimization

**SVM:** SVM training is a KKT problem; support vectors are points with lambda_i > 0 (active constraints); the dual formulation enables the kernel trick
**Safe RL:** constrained policy gradient (CPO, TRPO, PPO) uses KKT for constraint satisfaction-the foundation of safe autonomous systems
**Federated Learning:** ADMM decomposes the learning problem into client subproblems; used in FL with privacy constraints at Apple and Google

Предварительные знания

Nonlinear Optimization

Lagrangian and Multipliers

In min f(x) s.t. g(x)=0, the Lagrange multiplier lambda = 5. What happens to the optimum if we relax the constraint to g(x) <= 0.1?

KKT Conditions

KKT condition	Math	Meaning
Stationarity	nabla_x L(x,lam,mu*) = 0	Optimal in x with fixed multipliers
Primal feasibility	g_i(x) <= 0, h_j(x) = 0	Solution satisfies all constraints
Dual feasibility	lam_i >= 0	Inequality multipliers are non-negative
Complementary slackness	lam_i * g_i(x*) = 0	Only tight constraints have nonzero price

At the optimum of min f(x) s.t. g(x) <= 0, we have g(x*) = -3 (constraint is inactive). What does this imply about lambda*?

Penalty and Augmented Lagrangian

Why is Augmented Lagrangian better than plain penalty method for exact constraint satisfaction?

ADMM: Distributed Optimization

KKT conditions are purely theoretical and not used in practice

KKT conditions underlie all industrial constrained solvers: interior point methods solve perturbed KKT systems, SVM is built on dual KKT, ADMM updates dual variables per KKT.

Why is ADMM popular for distributed ML problems?

Key Ideas

**Lagrangian** L(x,lambda,mu) merges objective and constraints; multipliers are shadow prices of constraints
**KKT:** stationarity + primal feasibility + dual feasibility + complementary slackness-necessary optimality conditions (sufficient for convex problems)
**Augmented Lagrangian** solves constrained problems with unconstrained methods; dual update achieves exact constraint satisfaction with finite rho
**ADMM** decomposes composite problems into subproblems; standard for distributed ML and federated learning

Вопросы для размышления

Complementary slackness says: either the constraint is active or its price is zero. Why are both conditions impossible simultaneously? How does this connect to SVM support vectors?
Gradient descent can't handle constraints directly. Why does Augmented Lagrangian transform a constrained problem into a sequence of unconstrained ones, and how does this help?
How does ADMM apply to federated learning? What corresponds to the x-update, z-update, and dual update in that context?

Связанные уроки

calc-01-sequences

Constrained Optimization

Предварительные знания

Lagrangian and Multipliers

KKT Conditions

Penalty and Augmented Lagrangian

ADMM: Distributed Optimization

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Constrained Optimization

Предварительные знания

Lagrangian and Multipliers

KKT Conditions

Penalty and Augmented Lagrangian

ADMM: Distributed Optimization

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки