short course robust optimization and machine …robust optimization and machine learning lecture 2:...

Robust Optimization &Machine Learning

2. ConvexOptimization

Convex problemsConvex sets

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Short CourseRobust Optimization and Machine Learning

Lecture 2: Convex Optimization

Laurent El Ghaoui

EECS and IEOR DepartmentsUC Berkeley

Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012

January 16, 2012

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Outline

Convex problemsConvex setsConvex functionsConvex problems

DualityWeak dualityExamplesStrong duality

References

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Outline

References

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex setsDefinition

A set C in Rn is convex if, or any two pair of points, such as x1 and x2,the line segment joining the two points is entirely in the set.Mathematically:

∀ x1, x2 ∈ C, ∀ λ ∈ [0, 1] : λx1 + (1− λx2) ∈ C.

Points x1,x2 are in the set, so the linebetween them is.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex setsIntersection rule

The intersection of convex sets isconvex: if Cα is a family of convexsets indexed by α ∈ A, then∩α∈ACα is convex. (We allow forinfinite sets A.)

Examples:I the second-order cone {(y , t) :: ‖y‖2 ≤ t} is convex, as it is the

intersection of half-spaces of the form uT y ≤ t with u spanningthe unit Euclidean sphere.

I the cone of positive, semi-definite (PSD) matrices is convex,since a n × n symmetric matrix X is PSD iff zT Xz ≥ 0 for everyn-vector z.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex setsAffine transformation rule

Affine transformations of sets are convex. If C ⊆ Rm is convex, thenfor any x ∈ Rn and R ∈ Rn×m, the set {x + Ru : u ∈ C} is convex.

Example: For a given m × n matrix A, m-vector b, n-vector c, andscalar d , the set {

x ∈ Rn : ‖Ax + b‖2 ≤ cT x + d}

is convex.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex setsExample: chance constraint

If a ∈ Rn is a Gaussian random variable with mean a and covariancematrix Σ, then the set of points x ∈ Rn such that

Prob{a : aT x ≤ b} ≥ 0.99

is convex, and can be expressed via the second-order cone condition

aT x + κ√

xT Σx︸︷︷︸=‖Σ1/2x‖2

≤ b,

where κ = Φ−1(0.99) ≈ 2.33, with Φ the CDF of the standardGaussian.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex functionsExtended value functions

Extended value functions and domain:I The domain of a function f , denoted by domf , is the set of

points where it is finite.Example: f : x → log x , dom f = R++.

I We can extend functions outside the domain using the values±∞.Example: f : x → log x , x > 0, +∞ otherwise.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex functionsDefinition

A (extended value) function f : Rn → R is convex if

∀ x1, x2, ∀λ ∈ [0, 1] : f (λx1 + (1− λx2)) ≤ λf (x1) + (1− λ)f (x2).

(This requires the domain to be convex.)

Examples:I f : x → 1/x if x > 0, +∞ otherwise, is convex.I f : x → 1/x if x 6= 0, +∞ otherwise, is not convex.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex functionsEpigraph

Epigraph of a function:

epi f :={

(x , t) ∈ Rn × R : f (x) ≤ t}.

f convex⇐⇒ epi f is convex.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex functionsAlternate characterizations

f differentiable, convex⇐⇒

∀ x , x0 : f (x) ≥ f (x0) +∇f (x0)T (x − x0).

Affine approximation to f at anypoint is global lower bound onf .

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex functionsAlternate characterizations

f twice differentiable, convex if and only if its Hessian

∇2f (x) =

(∂2f∂xi∂xj

)1≤ij≤n

is positive semidefinite (PSD) for every x .

In general, this condition is hard to check. For quadratic functionsthough, it is easy, as seen next.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Quadratic functionsRepresentation via symmetric matrices

A quadratic function q : Rn → R can be represented as

q(x) =12

xT Qx + bT x + c,

for appropriate symmetric matrix Q, vector b ∈ Rn, and scalar c.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Positive semidefinite matrices

Flashback from linear algebra:I A square matrix A has n (possibly non-distinct) eigenvalues,

which are (in general complex) numbers that solvedet(λI − A) = 0.

I Symmetric matrices have real eigenvalues only.I A symmetric matrix Q is said to be positive semi-definite (PSD) if

∀ x : xT Qx ≥ 0.

We write Q � 0.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Eigenvalue decomposition for symmetric matrices

Theorem (EVD of symmetric matrices)We can decompose any symmetric p × p matrix Q as

Q = UΛUT =

p∑i=1

λiuiuTi ,

where Λ = diag(λ1, . . . , λp), with λ1 ≥ . . . ≥ λn the eigenvalues, andU = [u1, . . . , up] is a p × p orthogonal matrix (UT U = Ip) that containsthe eigenvectors of Q.

Corollary

Q � 0⇐⇒ every eigenvalue is non-negative.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex functionsConvex quadratic functions

A quadratic function q : Rn → R can be represented as

q(x) =12

xT Qx + bT x + c,

for appropriate symmetric matrix Q, vector b ∈ Rn, and scalar c.Since ∇2q(x) = Q:

Theorem (Convex quadratic functions)q convex⇐⇒ Q � 0.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex functionsSome properties

I Pointwise maxima of convex functions are convex: if fα is a familyof convex functions sets indexed by α ∈ A, then f with valuesmaxα∈A

fα(x) is convex.

I Conversely, any convex function can be represented as maximaof affine ones. (Hence, “convex means max-linear”).

I If f is convex and component-wise monotone increasing andg1, . . . , gk are convex, then h = f ◦ g with valuesh(x) = f (g1(x), . . . , gk (x)) is convex.Proof: the epigraph of h can be represented via convexconstraints:

h(x) ≤ t ⇐⇒ ∃ u ∈ Rk : f (u) ≤ t , gi (x) ≤ ui , i = 1, . . . , k .

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex functionsExamples

The following functions are convex:I Norms, such as l1, l2 and l∞. Hence, for example,

x → ‖Ax + b‖2, with A a matrix and b a vector, is convex.I The function x → λmax(F (x)), with F (x) = F0 + x1F1 + . . .+ xmFm

and affine combination of given symmetric matrices, and λmax isthe largest eigenvalue, is convex.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Convex problemsDefinition

The problem in standard form

p∗ := minx

f0(x) subject to fi (x) ≤ 0, i = 1, . . . ,m,Ax = b,

is convex if the functions f0, . . . , fm are all convex. Here A ∈ Rp×n,b ∈ Rp are given.

Note that only affine equality constraints are allowed.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

DefinitionMaximization problems

The problem in standard form

f0(x) subject to fi (x) ≤ 0, i = 1, . . . ,m,Ax = b,

is convex ifI The function f0 is concave.I The functions f1, . . . , fm are all convex.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Problem classesLinear programming

Linear programming (LP) involves the minimization of a linear functionover a polytope:

cT0 x : cT

i x + di ≥ 0, i = 1, . . . ,m.

The problem

3x1 + 1.5x2

s.t. −1 ≤ x1 ≤ 2,0 ≤ x2 ≤ 3

is an LP, since the objective andconstraint functions are all affine.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Problem classesQuadratic programming

Quadratic programming (QP) involves the minimization of a quadraticconvex function over a polytope.

cT x + xT Qx : cTi x + di ≥ 0, i = 1, . . . ,m.

Here, Q must be PSD.The problem

x21 − x1x2 + 2x2

2−3x1 − 1.5x2

s.t. −1 ≤ x1 ≤ 2,0 ≤ x2 ≤ 3

is a QP, since objective isquadratic convex, and the theconstraint functions are all affine.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Example of a quadratic programThe slalom problem

The problem seen in Overview lecture can be formulated as a QP:

v1u0 +n∑

(yi + vi+1 − vi )2

: ‖v‖∞ ≤ c.

where σi (resp. yi ) is the horizontal (resp. vertical) distance betweenthe middle of the gates.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Problem classesSecond-order cone programming

Second-order cone programming (SOCP) generalizes LP and QP viathe inclusion of Euclidean norms in the constraint functions.

cT0 x : ‖Aix + bi‖2 ≤ cT

i x + di , i = 1, . . . ,m.

Application: Chance-constrained linear programming

cT x : Prob{aTi x ≤ b} ≥ 0.99, i = 1, . . . ,m,

where each ai is a Gaussian random variable with mean ai andcovariance matrix Σi , i = 1, . . . ,m.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Problem classesSemi-definite programming programming

Semi-definite programming (SDP) involves the minimization of a linearfunction over the constraint that a symmetric matrix affine in thedecision variables be positive-semidefinite:

cT0 x : F0 +

n∑i=1

xiFi � 0.

(Here, F0, . . . ,Fn are given symmetric matrices.)

Application: relaxation of Boolean problems, see later.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Outline

References

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Lagrangian

Consider the (non-necessarily convex) “primal” problem in standardform

p∗ := minx

f0(x) subject to fi (x) ≤ 0, i = 1, . . . ,m,

We can write the problem as an unconstrained one:

p∗ := minx

maxy≥0L(x , y),

where L is the Lagrange function :

L(x , y) := f0(x) +m∑

yi fi (x).

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Min-max inequality

Theorem (Min-max inequality)For any function L of two variables:

maxyL(x , y) ≥ max

xL(x , y).

Proof.In the LHS, maximizing Player y has full information on x (y∗ is afunction of x). The RHS is better (smaller) for the minimizing Player x ,as then y has no information on x .

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Weak duality

Weak duality is the process of applying the min-max inequality to theproblem in unconstrained form:

p∗ = minx

maxy≥0L(x , y) ≥ d∗ := max

y≥0min

xL(x , y).

The dual problem is

d∗ = maxy≥0

G(y), where G(y) := minxL(x , y)

is called the dual function .

Since G is concave, the dual problem is convex (but not necessarilyeasy to solve!)

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

ExampleLP duality

Linear programming:

p∗ := min cT x : Ax ≤ b.

Lagrangian:L(x , y) = cT x + yT (Ax − b).

Dual is also an LP

p∗ ≥ d∗ = maxy≥0

minxL(x , y) = max

y≥0, AT y+c=0−bT y .

Turns out to be equivalent (p∗ = d∗), as seen later.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Example of weak dualityA Boolean problem

Let W = W T be a given n × n matrix.

p∗ := maxx

xT Wx : x2i = 1, i = 1, . . . , n

I Arises in e.g., segmentation problems (with W the Laplacianmatrix of a graph).

I Hard combinatorial problem.I Duality yields a provably good, non-trivial bound.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Example of weak dualityDual of Boolean problem

We can express the problem in an unconstrained way

p∗ = minx

maxyL(x , y)

where L is the Lagrangian:

L(x , y) := xT Wx +n∑

yi (1− x2i )

Dual function:

G(y) := maxxL(x , y) =

{ ∑ni=1 yi if diag(y)−W is PSD,

+∞ otherwise.

Dual problem is an SDP:

p∗ ≤ d∗ := maxy

G(y) = maxy

n∑i=1

yi : diag(y) � W .

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Example of weak duality

CVX syntax CVX is a matlab prototyping tool for convexoptimization [1].

Here is a matlab snippet that solves for the dual bound, assumingW , n are in the workspace:

cvx_beginvariable y(n,1);minimize( sum(y) );subject to

diag(y)-W == semidefinite(n) ;cvx_enddstar = sum(y);

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Strong dualityDefinition & main result

Recall weak duality:

p∗ = minx

maxy≥0L(x , y) ≥ d∗ := max

y≥0min

xL(x , y).

We say that strong duality holds when p∗ = d∗.I It holds when the primal problem is convex and strictly feasible.I It holds for LPs that are feasible.I It holds in some very specific non-convex problems. For example,

if m = 1 and both f0, f1 are quadratic.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Example of strong dualityBidual of Boolean problem

Return to Lagrange relaxation (“dual”) of Boolean problem:

d∗ = maxy

n∑i=1

yi : diag(y) � W .

Express it as max-min problem:

d∗ = maxy

minX�0

n∑i=1

yi + Tr X (diag(y)−W ).

Here we use the fact that for any symmetric matrix Q,

maxX�0

Tr QX =

{0 if Q � 0,

+∞ otherwise.

Exchanging min and max leads to the so-called bidual :

p∗∗ := maxX

Tr WX : X � 0, Xii = 1, i = 1, . . . , n.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Example of strong dualityBidual of Boolean problem

Bidual:max

XTr WX : X � 0, Xii = 1, i = 1, . . . , n.

I Also an SDP, with objective value equal to d∗ ( strong dualityholds ).

I Can obtain this directly by expressing original problem in terms ofmatrix variable X := xxT , and dropping the rank constraint on X .

I Suggests a way to generate primal feasible points from bidualvariable X , using distribution with mean 0 and covariance matrixgiven by X .

I Allows to prove that (2/π)d∗ ≤ p∗ ≤ d∗ (independent of problemsize!).

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Example of strong dualityCVX syntax

Here is a matlab snippet that solves the bidual, assuming W , n are inthe workspace:

cvx_beginvariable X(n,n) symmetric;maximize( trace(X*W) );subject to

X == semidefinite(n) ;diag(X) == 1;

cvx_enddstar = trace(X*W);

Alternatively, we can use the dual command in the original dualproblem:

cvx_beginvariable y(n,1);dual variable Xminimize( sum(y) );subject to

diag(y)-W == semidefinite(n) : X;cvx_end

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Optimality conditions

If the primal problem is convex, and both primal and dual are strictlyfeasible, then

I Strong duality holds;I both problems are attained;I the “Karush-Kuhn-Tucker” (KKT) conditions

x∗ ∈ arg minxL(x , y∗), y∗i fi (x∗) = 0, i = 1, . . . ,m.

characterize optimality (primal-dual feasible pairs are optimal iffthey satisfy them).

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Applications of dualityI Sensitivity analysis of convex problems.I Stopping criteria in convex optimization algorithms.I Decomposition methods for large-scale convex optimization.I Bounds and approximations for non-convex problems.I For convex problems, leads to often surprising connections.

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

Outline

References

Convex functions

Convex problems

DualityWeak duality

Examples

Strong duality

References

S. Boyd and M. Grant.

The CVX optimization package, 2010.

S. Boyd and L. Vandenberghe.

Convex Optimization.Cambridge University Press, 2004.

short course robust optimization and machine …robust optimization and machine learning lecture 2:...

Documents

robust optimization applied to a currency...

overview optimization robust portfolio optimization

robust and data-driven optimization: modern decision...

robust discrete optimization under ellipsoidal...

robust discrete optimization and network...

robust optimization for unconstrained simulation...

robust portfolio optimization - etheses...

adversarially robust optimization with gaussian...

a robust energy optimization

optimization models - eecs 127 / eecs...

derivative free optimization and robust optimization

nonconvex robust optimization for problems with...

robust optimization for unconstrained simulation-based...

short course robust optimization and machine …short course...

uncertainty: sensitivity analysis & robust optimization ·...

robust optimization: applications in portfolio selection...

convex optimization lecture notes for ee 227bt draft,...

experiments in robust optimization

optimization applications - eecs at uc...

distributionally robust portfolio optimization