
Convex Optimization & Lagrange Duality

Chee Wei Tan

CS6491 Topics in Optimization and its Applications to Computer Science


Outline

• Convex optimization

• Optimality condition

• Lagrange duality

• KKT optimality condition

• Sensitivity analysis


Convex Optimization

A convex optimization problem with variables x:

minimize f0(x)subject to fi(x) 0, i = 1, 2, . . . ,m

aTi x = bi, i = 1, 2, . . . , p

where f0, f1, . . . , fm are convex functions.

• Minimize a convex objective function (or maximize a concave objective function)

• Upper-bound inequality constraints on convex functions (⇒ the constraint set is convex)

• Equality constraints must be affine


Convex Optimization

• Epigraph form:

minimize t
subject to f0(x) − t ≤ 0
  fi(x) ≤ 0, i = 1, 2, . . . , m
  ai^T x = bi, i = 1, 2, . . . , p

• Can you reformulate the Illumination Problem in Lecture 1?


• Can you reformulate the following problem (not in convex optimization form)?

minimize x1^2 + x2^2
subject to x1/(1 + x2^2) ≤ 0
  (x1 + x2)^2 = 0

Now transformed into a convex optimization problem:

minimize x1^2 + x2^2
subject to x1 ≤ 0
  x1 + x2 = 0


Locally Optimal ⇒ Globally Optimal

Given x is locally optimal for a convex optimization problem, i.e., x is feasible and for some R > 0,

f0(x) = inf{ f0(z) | z feasible, ||z − x||_2 ≤ R }

Suppose x is not globally optimal, i.e., there is a feasible y such that f0(y) < f0(x)

Since ||y − x||_2 > R, we can construct the point z = (1 − θ)x + θy, where θ = R/(2||y − x||_2). By convexity of the feasible set, z is feasible, and ||z − x||_2 = θ||y − x||_2 = R/2 ≤ R. By convexity of f0, we have

f0(z) ≤ (1 − θ)f0(x) + θf0(y) < f0(x)

which contradicts local optimality of x

Therefore, there exists no feasible y such that f0(y) < f0(x)


Optimality Condition for Differentiable f0

x is optimal for a convex optimization problem iff x is feasible and, for all feasible y:

∇f0(x)^T (y − x) ≥ 0

−∇f0(x) defines a supporting hyperplane to the feasible set at x

Unconstrained convex optimization: the condition reduces to

∇f0(x) = 0

Proof: take y = x − t∇f0(x), where t ∈ R+. For small enough t, y is feasible, so ∇f0(x)^T (y − x) = −t||∇f0(x)||_2^2 ≥ 0. Thus ∇f0(x) = 0


Unconstrained Quadratic Optimization

Minimize f0(x) = (1/2) x^T P x + q^T x + r

P is positive semidefinite, so this is a convex optimization problem

x minimizes f0 iff it satisfies the linear equation

∇f0(x) = Px + q = 0

• If q ∉ R(P), there is no solution: f0 is unbounded below

• If q ∈ R(P) and P ≻ 0, there is a unique minimizer x* = −P^{−1} q

• If q ∈ R(P) and P is singular, the set of optimal x is −P† q + N(P)
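The three cases can be checked numerically; a minimal NumPy sketch with toy matrices of my own choosing:

```python
import numpy as np

# Toy positive definite case: the unique minimizer solves P x = -q.
P = np.array([[2.0, 0.0],
              [0.0, 4.0]])
q = np.array([-2.0, -8.0])
x_star = np.linalg.solve(P, -q)            # stationarity: P x + q = 0

# Singular P with q in R(P): -P^+ q is one optimal point; adding any
# element of N(P) gives the whole optimal set.
P_sing = np.array([[1.0, 0.0],
                   [0.0, 0.0]])
q_rng = np.array([-1.0, 0.0])              # chosen to lie in R(P_sing)
x_part = -np.linalg.pinv(P_sing) @ q_rng   # one element of -P^+ q + N(P)
```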


Equality Constrained Convex Optimization

minimize f0(x)
subject to Ax = b

Assume a nonempty feasible set. Since ∇f0(x)^T (y − x) ≥ 0, ∀y : Ay = b, and every feasible y can be written as y = x + v for some v ∈ N(A), the optimality condition is:

∇f0(x)^T v ≥ 0, ∀v ∈ N(A)

Since v ∈ N(A) implies −v ∈ N(A), this forces ∇f0(x)^T v = 0, i.e., ∇f0(x) is orthogonal to N(A). Thus ∇f0(x) ∈ R(A^T), i.e., there exists ν such that

∇f0(x) + A^T ν = 0

Conclusion: x is optimal iff Ax = b and ∃ν s.t. ∇f0(x) + A^T ν = 0
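When f0 is quadratic, the conclusion becomes one linear (KKT) system in (x, ν); a small sketch under that assumption, with toy data of my own:

```python
import numpy as np

# minimize (1/2) x^T P x + q^T x  subject to  A x = b
# Optimality: P x + q + A^T nu = 0 and A x = b, i.e. one linear system.
P = np.eye(2)                       # toy objective: (1/2)||x||^2
q = np.zeros(2)
A = np.array([[1.0, 1.0]])          # single constraint x1 + x2 = 1
b = np.array([1.0])

K = np.block([[P, A.T],
              [A, np.zeros((1, 1))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(K, rhs)
x_star, nu_star = sol[:2], sol[2:]  # minimum-norm point on the line
```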


Duality Mentality

Bound or solve an optimization problem via a different optimization problem!

We'll develop the basic Lagrange duality theory for a general optimization problem, then specialize it to convex optimization


Lagrange Dual Function

An optimization problem in standard form:

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, 2, . . . , m
  hi(x) = 0, i = 1, 2, . . . , p

Variables: x ∈ R^n. Assume a nonempty feasible set

Optimal value: p*. Optimizer: x*

Idea: augment the objective with a weighted sum of the constraints

Lagrangian: L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x)

Lagrange multipliers (dual variables): λ ⪰ 0 and ν (unconstrained)

Lagrange dual function: g(λ, ν) = inf_x L(x, λ, ν)


Lower Bound on Optimal Value

Claim: g(λ, ν) ≤ p*, ∀λ ⪰ 0, ∀ν

Proof: Consider any feasible x̃:

L(x̃, λ, ν) = f0(x̃) + Σ_{i=1}^m λi fi(x̃) + Σ_{i=1}^p νi hi(x̃) ≤ f0(x̃)

since fi(x̃) ≤ 0, λi ≥ 0, and hi(x̃) = 0

Hence, g(λ, ν) ≤ L(x̃, λ, ν) ≤ f0(x̃) for all feasible x̃

Therefore, g(λ, ν) ≤ p*
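The bound can be made concrete on a one-variable toy problem (my example, not from the slides): minimize x^2 subject to 1 − x ≤ 0, so p* = 1, and minimizing L(x, λ) = x^2 + λ(1 − x) over x gives g(λ) = λ − λ^2/4:

```python
import numpy as np

# Toy problem: minimize x^2 s.t. 1 - x <= 0, so x* = 1 and p* = 1.
# Dual function: g(lam) = inf_x (x^2 + lam*(1 - x)) = lam - lam**2 / 4.
p_star = 1.0
lams = np.linspace(0.0, 10.0, 1001)
g = lams - lams**2 / 4.0

assert np.all(g <= p_star + 1e-12)   # weak duality: every g(lam) <= p*
best = lams[np.argmax(g)]            # dual optimum, attained at lam = 2
```

Here the best lower bound equals p*, i.e., strong duality holds for this convex problem.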


Lagrange Dual Function and Conjugate Function

• Lagrange dual function: g(λ, ν)

• Conjugate function: f*(y) = sup_{x ∈ dom f} (y^T x − f(x))

Consider linearly constrained optimization:

minimize f0(x)
subject to Ax ⪯ b
  Cx = d

g(λ, ν) = inf_x ( f0(x) + λ^T (Ax − b) + ν^T (Cx − d) )
  = −b^T λ − d^T ν + inf_x ( f0(x) + (A^T λ + C^T ν)^T x )
  = −b^T λ − d^T ν − f0*(−A^T λ − C^T ν)


Example

We'll use the simplest version of entropy maximization as our example for the rest of this lecture on duality. Entropy maximization is an important basic problem in information theory:

minimize f0(x) = Σ_{i=1}^n xi log xi
subject to Ax ⪯ b
  1^T x = 1

Since the conjugate function of u log u is e^{y−1}, and the sum separates across independent coordinates, we have

f0*(y) = Σ_{i=1}^n e^{yi − 1}
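This conjugate can be verified by brute-force maximization over a grid; a numerical sanity check, not a proof:

```python
import numpy as np

# Check that f*(y) = sup_{u>0} (y*u - u*log(u)) equals e^(y-1).
# The supremum is attained at u = e^(y-1) (set y - log(u) - 1 = 0).
u = np.linspace(1e-6, 50.0, 500_000)

def conj_numeric(y):
    return np.max(y * u - u * np.log(u))

for y in (-1.0, 0.0, 1.0, 2.0):
    assert abs(conj_numeric(y) - np.exp(y - 1.0)) < 1e-3
```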


Therefore, the dual function of entropy maximization is

g(λ, ν) = −b^T λ − ν − e^{−ν−1} Σ_{i=1}^n e^{−ai^T λ}

where the ai are the columns of A


Lagrange Dual Problem

The lower bound from the Lagrange dual function depends on (λ, ν). What's the best lower bound that can be obtained from the Lagrange dual function?

maximize g(λ, ν)
subject to λ ⪰ 0

This is the Lagrange dual problem with dual variables (λ, ν)

Always a convex optimization problem! (The dual objective is always a concave function, since it is the infimum of a family of affine functions of (λ, ν))

Denote the optimal value of the Lagrange dual problem by d*


Weak Duality

What's the relationship between d* and p*?

Weak duality always holds (even if the primal problem is not convex):

d* ≤ p*

Optimal duality gap:

p* − d*

Efficient generation of lower bounds through the (convex) dual problem


Strong Duality

Strong duality (zero optimal duality gap):

d* = p*

If strong duality holds, solving the dual is ‘equivalent’ to solving the primal. But strong duality does not always hold

Convexity and constraint qualifications ⇒ strong duality

A simple constraint qualification: Slater's condition (there exists a strictly feasible point, i.e., fi(x) < 0 for all non-affine fi)

Another reason why convex optimization is ‘easy’


Example: Entropy Maximization

Primal optimization problem (variables x):

minimize f0(x) = Σ_{i=1}^n xi log xi
subject to Ax ⪯ b, 1^T x = 1

Dual optimization problem (variables λ, ν):

maximize −b^T λ − ν − e^{−ν−1} Σ_{i=1}^n e^{−ai^T λ}
subject to λ ⪰ 0

Analytically maximizing over the unconstrained ν ⇒ a simplified dual optimization problem (variables λ), and strong duality holds:

maximize −b^T λ − log Σ_{i=1}^n exp(−ai^T λ)
subject to λ ⪰ 0


Saddle Point Interpretation

Assume no equality constraints. We can express the primal optimal value as

p* = inf_x sup_{λ⪰0} L(x, λ)

By definition of the dual optimal value:

d* = sup_{λ⪰0} inf_x L(x, λ)

Weak duality (max-min inequality):

sup_{λ⪰0} inf_x L(x, λ) ≤ inf_x sup_{λ⪰0} L(x, λ)


Strong duality (saddle-point property):

sup_{λ⪰0} inf_x L(x, λ) = inf_x sup_{λ⪰0} L(x, λ)


Complementary Slackness

Assume strong duality holds:

f0(x*) = g(λ*, ν*)
  = inf_x ( f0(x) + Σ_{i=1}^m λi* fi(x) + Σ_{i=1}^p νi* hi(x) )
  ≤ f0(x*) + Σ_{i=1}^m λi* fi(x*) + Σ_{i=1}^p νi* hi(x*)
  ≤ f0(x*)


Complementary Slackness

So the two inequalities must hold with equality. This implies:

λi* fi(x*) = 0, i = 1, 2, . . . , m

Complementary Slackness Property:

λi* > 0 ⇒ fi(x*) = 0
fi(x*) < 0 ⇒ λi* = 0


KKT Optimality Conditions

Since x* minimizes L(x, λ*, ν*) over x, we have

∇f0(x*) + Σ_{i=1}^m λi* ∇fi(x*) + Σ_{i=1}^p νi* ∇hi(x*) = 0

Karush-Kuhn-Tucker optimality conditions:

fi(x*) ≤ 0, hi(x*) = 0, λi* ≥ 0
λi* fi(x*) = 0
∇f0(x*) + Σ_{i=1}^m λi* ∇fi(x*) + Σ_{i=1}^p νi* ∇hi(x*) = 0


KKT Optimality Conditions

• For any optimization problem (with differentiable objective and constraint functions) for which strong duality holds, the KKT conditions are a necessary condition for primal-dual optimality

• For a convex optimization problem (with differentiable objective and constraint functions) satisfying Slater's condition, the KKT conditions are also a sufficient condition for primal-dual optimality (useful for theoretical and numerical purposes)


Example: Entropy Maximization

Primal optimization problem (variables x):

minimize f0(x) = Σ_{i=1}^n xi log xi
subject to Ax ⪯ b, 1^T x = 1

Dual optimization problem (variables λ, ν):

maximize −b^T λ − ν − e^{−ν−1} Σ_{i=1}^n e^{−ai^T λ}
subject to λ ⪰ 0

Having solved the dual problem, recover the optimal primal variables as the minimizer of

L(x, λ*, ν*) = Σ_{i=1}^n xi log xi + λ*^T (Ax − b) + ν* (1^T x − 1)


i.e., xi* = 1/exp(ai^T λ* + ν* + 1)

If the x* above is primal feasible, it is the primal optimum. Otherwise, the primal optimum is not attained
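A sketch of the whole pipeline on a made-up instance (the instance, step size, and iteration count are all my choices, not the slides'): maximize the simplified dual by projected gradient ascent, then recover the primal variables, which after maximizing over ν normalize to xi* ∝ exp(−ai^T λ*). The single inequality here is loose at the uniform distribution, so λ* = 0 and x* is uniform.

```python
import numpy as np

# Toy instance: one inequality x1 - x2 <= 0.5 over the probability simplex.
A = np.array([[1.0, -1.0, 0.0]])   # the a_i are the columns of A
b = np.array([0.5])

def x_of(lam):
    # Primal recovery (after maximizing over nu): x_i proportional to exp(-a_i^T lam)
    w = np.exp(-A.T @ lam)
    return w / w.sum()

lam = np.array([1.0])
for _ in range(500):
    grad = -b + A @ x_of(lam)              # gradient of the simplified dual
    lam = np.maximum(lam + 0.5 * grad, 0.0)  # projected ascent step, lam >= 0

x_star = x_of(lam)                          # recovered primal optimum
```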


Example: Maximizing Channel Capacity as Water-filling

maximize Σ_{i=1}^n log(1 + xi/αi)
subject to x ⪰ 0, 1^T x = 1

Variables: x (powers). Constants: αi > 0 (channel noise)

KKT conditions:

x* ⪰ 0, 1^T x* = 1, λ* ⪰ 0
λi* xi* = 0, −1/(αi + xi*) − λi* + ν* = 0


Since the λ* are slack variables, the conditions reduce to

x* ⪰ 0, 1^T x* = 1
xi* (ν* − 1/(αi + xi*)) = 0, ν* ≥ 1/(αi + xi*)

If ν* < 1/αi, then xi* > 0, so xi* = 1/ν* − αi. Otherwise, xi* = 0

Thus, xi* = [1/ν* − αi]+, where ν* is chosen such that Σi xi* = 1

• Draw a picture to interpret this optimal solution

• Design an algorithm to compute this optimal solution
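One such algorithm is bisection on the water level w = 1/ν*: Σi [w − αi]+ is nondecreasing in w, so the level meeting the unit power budget can be bracketed and bisected (the α values below are placeholders of my own):

```python
import numpy as np

# Water-filling: x_i = [w - alpha_i]_+, with the water level w = 1/nu*
# found by bisection so that sum_i x_i equals the power budget.
def water_fill(alpha, budget=1.0, iters=100):
    lo, hi = np.min(alpha), np.max(alpha) + budget   # brackets the level
    for _ in range(iters):
        w = 0.5 * (lo + hi)
        if np.maximum(w - alpha, 0.0).sum() > budget:
            hi = w                                   # too much water
        else:
            lo = w                                   # not enough water
    w = 0.5 * (lo + hi)
    return np.maximum(w - alpha, 0.0), w

alpha = np.array([0.1, 0.5, 1.0])   # hypothetical channel noise levels
x, w = water_fill(alpha)            # noisiest channel gets no power
```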


Global Sensitivity Analysis

Perturbed optimization problem:

minimize f0(x)
subject to fi(x) ≤ ui, i = 1, 2, . . . , m
  hi(x) = vi, i = 1, 2, . . . , p

Optimal value p*(u, v) as a function of the parameters (u, v)

Assume strong duality holds and that the dual optimum is attained. For any x feasible for the perturbed problem:

p*(0, 0) = g(λ*, ν*) ≤ f0(x) + Σi λi* fi(x) + Σi νi* hi(x)
  ≤ f0(x) + λ*^T u + ν*^T v


Global Sensitivity Analysis

p*(u, v) ≥ p*(0, 0) − λ*^T u − ν*^T v

• If λi* is large, tightening the ith constraint (ui < 0) will increase the optimal value greatly

• If λi* is small, loosening the ith constraint (ui > 0) will reduce the optimal value only slightly


Local Sensitivity Analysis

Assume that p*(u, v) is differentiable at (0, 0):

λi* = −∂p*(0, 0)/∂ui, νi* = −∂p*(0, 0)/∂vi

Shadow-price interpretation of Lagrange dual variables

A small λi* means tightening or loosening the ith constraint will not change the optimal value by much
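The shadow-price identity can be checked by finite differences on a toy problem (my example, not from the slides): minimize x^2 subject to 1 − x ≤ u, for which p*(u) = (1 − u)^2 and λ* = 2:

```python
# Perturbed toy problem: minimize x^2 s.t. 1 - x <= u, so the optimizer is
# x*(u) = 1 - u (for u < 1) and p*(u) = (1 - u)**2; KKT gives lam* = 2.
def p_star(u):
    return (1.0 - u) ** 2

eps = 1e-6
fd = -(p_star(eps) - p_star(-eps)) / (2 * eps)   # -dp*/du at u = 0
lam_star = 2.0
assert abs(fd - lam_star) < 1e-6                 # matches the shadow price
```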


Summary

• Convexity mentality. Convex optimization is ‘nice’ for several reasons: a local optimum is a global optimum, zero optimal duality gap (under mild conditions), and the KKT optimality conditions are necessary and sufficient

• Duality mentality. Can always bound the primal through the dual, and sometimes indirectly solve the primal through the dual

• Primal-dual: where is the optimum, and how sensitive is it to perturbation of the problem parameters?

Reading assignment: Sections 4.1-4.2 and 5.1-5.7 of textbook.
