short course robust optimization and machine …robust optimization and machine learning lecture 2:...
TRANSCRIPT
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Short CourseRobust Optimization and Machine Learning
Lecture 2: Convex Optimization
Laurent El Ghaoui
EECS and IEOR DepartmentsUC Berkeley
Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012
January 16, 2012
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Outline
Convex problemsConvex setsConvex functionsConvex problems
DualityWeak dualityExamplesStrong duality
References
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Outline
Convex problemsConvex setsConvex functionsConvex problems
DualityWeak dualityExamplesStrong duality
References
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex setsDefinition
A set C in Rn is convex if, or any two pair of points, such as x1 and x2,the line segment joining the two points is entirely in the set.Mathematically:
∀ x1, x2 ∈ C, ∀ λ ∈ [0, 1] : λx1 + (1− λx2) ∈ C.
Points x1,x2 are in the set, so the linebetween them is.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex setsIntersection rule
The intersection of convex sets isconvex: if Cα is a family of convexsets indexed by α ∈ A, then∩α∈ACα is convex. (We allow forinfinite sets A.)
Examples:I the second-order cone {(y , t) :: ‖y‖2 ≤ t} is convex, as it is the
intersection of half-spaces of the form uT y ≤ t with u spanningthe unit Euclidean sphere.
I the cone of positive, semi-definite (PSD) matrices is convex,since a n × n symmetric matrix X is PSD iff zT Xz ≥ 0 for everyn-vector z.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex setsAffine transformation rule
Affine transformations of sets are convex. If C ⊆ Rm is convex, thenfor any x ∈ Rn and R ∈ Rn×m, the set {x + Ru : u ∈ C} is convex.
Example: For a given m × n matrix A, m-vector b, n-vector c, andscalar d , the set {
x ∈ Rn : ‖Ax + b‖2 ≤ cT x + d}
is convex.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex setsExample: chance constraint
If a ∈ Rn is a Gaussian random variable with mean a and covariancematrix Σ, then the set of points x ∈ Rn such that
Prob{a : aT x ≤ b} ≥ 0.99
is convex, and can be expressed via the second-order cone condition
aT x + κ√
xT Σx︸ ︷︷ ︸=‖Σ1/2x‖2
≤ b,
where κ = Φ−1(0.99) ≈ 2.33, with Φ the CDF of the standardGaussian.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex functionsExtended value functions
Extended value functions and domain:I The domain of a function f , denoted by domf , is the set of
points where it is finite.Example: f : x → log x , dom f = R++.
I We can extend functions outside the domain using the values±∞.Example: f : x → log x , x > 0, +∞ otherwise.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex functionsDefinition
A (extended value) function f : Rn → R is convex if
∀ x1, x2, ∀λ ∈ [0, 1] : f (λx1 + (1− λx2)) ≤ λf (x1) + (1− λ)f (x2).
(This requires the domain to be convex.)
Examples:I f : x → 1/x if x > 0, +∞ otherwise, is convex.I f : x → 1/x if x 6= 0, +∞ otherwise, is not convex.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex functionsEpigraph
Epigraph of a function:
epi f :={
(x , t) ∈ Rn × R : f (x) ≤ t}.
f convex⇐⇒ epi f is convex.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex functionsAlternate characterizations
f differentiable, convex⇐⇒
∀ x , x0 : f (x) ≥ f (x0) +∇f (x0)T (x − x0).
Affine approximation to f at anypoint is global lower bound onf .
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex functionsAlternate characterizations
f twice differentiable, convex if and only if its Hessian
∇2f (x) =
(∂2f∂xi∂xj
(x)
)1≤ij≤n
is positive semidefinite (PSD) for every x .
In general, this condition is hard to check. For quadratic functionsthough, it is easy, as seen next.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Quadratic functionsRepresentation via symmetric matrices
A quadratic function q : Rn → R can be represented as
q(x) =12
xT Qx + bT x + c,
for appropriate symmetric matrix Q, vector b ∈ Rn, and scalar c.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Positive semidefinite matrices
Flashback from linear algebra:I A square matrix A has n (possibly non-distinct) eigenvalues,
which are (in general complex) numbers that solvedet(λI − A) = 0.
I Symmetric matrices have real eigenvalues only.I A symmetric matrix Q is said to be positive semi-definite (PSD) if
∀ x : xT Qx ≥ 0.
We write Q � 0.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Eigenvalue decomposition for symmetric matrices
Theorem (EVD of symmetric matrices)We can decompose any symmetric p × p matrix Q as
Q = UΛUT =
p∑i=1
λiuiuTi ,
where Λ = diag(λ1, . . . , λp), with λ1 ≥ . . . ≥ λn the eigenvalues, andU = [u1, . . . , up] is a p × p orthogonal matrix (UT U = Ip) that containsthe eigenvectors of Q.
Corollary
Q � 0⇐⇒ every eigenvalue is non-negative.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex functionsConvex quadratic functions
A quadratic function q : Rn → R can be represented as
q(x) =12
xT Qx + bT x + c,
for appropriate symmetric matrix Q, vector b ∈ Rn, and scalar c.Since ∇2q(x) = Q:
Theorem (Convex quadratic functions)q convex⇐⇒ Q � 0.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex functionsSome properties
I Pointwise maxima of convex functions are convex: if fα is a familyof convex functions sets indexed by α ∈ A, then f with valuesmaxα∈A
fα(x) is convex.
I Conversely, any convex function can be represented as maximaof affine ones. (Hence, “convex means max-linear”).
I If f is convex and component-wise monotone increasing andg1, . . . , gk are convex, then h = f ◦ g with valuesh(x) = f (g1(x), . . . , gk (x)) is convex.Proof: the epigraph of h can be represented via convexconstraints:
h(x) ≤ t ⇐⇒ ∃ u ∈ Rk : f (u) ≤ t , gi (x) ≤ ui , i = 1, . . . , k .
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex functionsExamples
The following functions are convex:I Norms, such as l1, l2 and l∞. Hence, for example,
x → ‖Ax + b‖2, with A a matrix and b a vector, is convex.I The function x → λmax(F (x)), with F (x) = F0 + x1F1 + . . .+ xmFm
and affine combination of given symmetric matrices, and λmax isthe largest eigenvalue, is convex.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Convex problemsDefinition
The problem in standard form
p∗ := minx
f0(x) subject to fi (x) ≤ 0, i = 1, . . . ,m,Ax = b,
is convex if the functions f0, . . . , fm are all convex. Here A ∈ Rp×n,b ∈ Rp are given.
Note that only affine equality constraints are allowed.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
DefinitionMaximization problems
The problem in standard form
maxx
f0(x) subject to fi (x) ≤ 0, i = 1, . . . ,m,Ax = b,
is convex ifI The function f0 is concave.I The functions f1, . . . , fm are all convex.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Problem classesLinear programming
Linear programming (LP) involves the minimization of a linear functionover a polytope:
minx
cT0 x : cT
i x + di ≥ 0, i = 1, . . . ,m.
The problem
minx
3x1 + 1.5x2
s.t. −1 ≤ x1 ≤ 2,0 ≤ x2 ≤ 3
is an LP, since the objective andconstraint functions are all affine.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Problem classesQuadratic programming
Quadratic programming (QP) involves the minimization of a quadraticconvex function over a polytope.
minx
cT x + xT Qx : cTi x + di ≥ 0, i = 1, . . . ,m.
Here, Q must be PSD.The problem
minx
x21 − x1x2 + 2x2
2−3x1 − 1.5x2
s.t. −1 ≤ x1 ≤ 2,0 ≤ x2 ≤ 3
is a QP, since objective isquadratic convex, and the theconstraint functions are all affine.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Example of a quadratic programThe slalom problem
The problem seen in Overview lecture can be formulated as a QP:
minv
v1u0 +n∑
i=1
(yi + vi+1 − vi )2
σ2i
: ‖v‖∞ ≤ c.
where σi (resp. yi ) is the horizontal (resp. vertical) distance betweenthe middle of the gates.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Problem classesSecond-order cone programming
Second-order cone programming (SOCP) generalizes LP and QP viathe inclusion of Euclidean norms in the constraint functions.
minx
cT0 x : ‖Aix + bi‖2 ≤ cT
i x + di , i = 1, . . . ,m.
Application: Chance-constrained linear programming
minx
cT x : Prob{aTi x ≤ b} ≥ 0.99, i = 1, . . . ,m,
where each ai is a Gaussian random variable with mean ai andcovariance matrix Σi , i = 1, . . . ,m.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Problem classesSemi-definite programming programming
Semi-definite programming (SDP) involves the minimization of a linearfunction over the constraint that a symmetric matrix affine in thedecision variables be positive-semidefinite:
minx
cT0 x : F0 +
n∑i=1
xiFi � 0.
(Here, F0, . . . ,Fn are given symmetric matrices.)
Application: relaxation of Boolean problems, see later.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Outline
Convex problemsConvex setsConvex functionsConvex problems
DualityWeak dualityExamplesStrong duality
References
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Lagrangian
Consider the (non-necessarily convex) “primal” problem in standardform
p∗ := minx
f0(x) subject to fi (x) ≤ 0, i = 1, . . . ,m,
We can write the problem as an unconstrained one:
p∗ := minx
maxy≥0L(x , y),
where L is the Lagrange function :
L(x , y) := f0(x) +m∑
i=1
yi fi (x).
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Min-max inequality
Theorem (Min-max inequality)For any function L of two variables:
minx
maxyL(x , y) ≥ max
ymin
xL(x , y).
Proof.In the LHS, maximizing Player y has full information on x (y∗ is afunction of x). The RHS is better (smaller) for the minimizing Player x ,as then y has no information on x .
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Weak duality
Weak duality is the process of applying the min-max inequality to theproblem in unconstrained form:
p∗ = minx
maxy≥0L(x , y) ≥ d∗ := max
y≥0min
xL(x , y).
The dual problem is
d∗ = maxy≥0
G(y), where G(y) := minxL(x , y)
is called the dual function .
Since G is concave, the dual problem is convex (but not necessarilyeasy to solve!)
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
ExampleLP duality
Linear programming:
p∗ := min cT x : Ax ≤ b.
Lagrangian:L(x , y) = cT x + yT (Ax − b).
Dual is also an LP
p∗ ≥ d∗ = maxy≥0
minxL(x , y) = max
y≥0, AT y+c=0−bT y .
Turns out to be equivalent (p∗ = d∗), as seen later.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Example of weak dualityA Boolean problem
Let W = W T be a given n × n matrix.
p∗ := maxx
xT Wx : x2i = 1, i = 1, . . . , n
I Arises in e.g., segmentation problems (with W the Laplacianmatrix of a graph).
I Hard combinatorial problem.I Duality yields a provably good, non-trivial bound.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Example of weak dualityDual of Boolean problem
We can express the problem in an unconstrained way
p∗ = minx
maxyL(x , y)
where L is the Lagrangian:
L(x , y) := xT Wx +n∑
i=1
yi (1− x2i )
Dual function:
G(y) := maxxL(x , y) =
{ ∑ni=1 yi if diag(y)−W is PSD,
+∞ otherwise.
Dual problem is an SDP:
p∗ ≤ d∗ := maxy
G(y) = maxy
n∑i=1
yi : diag(y) � W .
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Example of weak duality
CVX syntax CVX is a matlab prototyping tool for convexoptimization [1].
Here is a matlab snippet that solves for the dual bound, assumingW , n are in the workspace:
cvx_beginvariable y(n,1);minimize( sum(y) );subject to
diag(y)-W == semidefinite(n) ;cvx_enddstar = sum(y);
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Strong dualityDefinition & main result
Recall weak duality:
p∗ = minx
maxy≥0L(x , y) ≥ d∗ := max
y≥0min
xL(x , y).
We say that strong duality holds when p∗ = d∗.I It holds when the primal problem is convex and strictly feasible.I It holds for LPs that are feasible.I It holds in some very specific non-convex problems. For example,
if m = 1 and both f0, f1 are quadratic.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Example of strong dualityBidual of Boolean problem
Return to Lagrange relaxation (“dual”) of Boolean problem:
d∗ = maxy
n∑i=1
yi : diag(y) � W .
Express it as max-min problem:
d∗ = maxy
minX�0
n∑i=1
yi + Tr X (diag(y)−W ).
Here we use the fact that for any symmetric matrix Q,
maxX�0
Tr QX =
{0 if Q � 0,
+∞ otherwise.
Exchanging min and max leads to the so-called bidual :
p∗∗ := maxX
Tr WX : X � 0, Xii = 1, i = 1, . . . , n.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Example of strong dualityBidual of Boolean problem
Bidual:max
XTr WX : X � 0, Xii = 1, i = 1, . . . , n.
I Also an SDP, with objective value equal to d∗ ( strong dualityholds ).
I Can obtain this directly by expressing original problem in terms ofmatrix variable X := xxT , and dropping the rank constraint on X .
I Suggests a way to generate primal feasible points from bidualvariable X , using distribution with mean 0 and covariance matrixgiven by X .
I Allows to prove that (2/π)d∗ ≤ p∗ ≤ d∗ (independent of problemsize!).
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Example of strong dualityCVX syntax
Here is a matlab snippet that solves the bidual, assuming W , n are inthe workspace:
cvx_beginvariable X(n,n) symmetric;maximize( trace(X*W) );subject to
X == semidefinite(n) ;diag(X) == 1;
cvx_enddstar = trace(X*W);
Alternatively, we can use the dual command in the original dualproblem:
cvx_beginvariable y(n,1);dual variable Xminimize( sum(y) );subject to
diag(y)-W == semidefinite(n) : X;cvx_end
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Optimality conditions
If the primal problem is convex, and both primal and dual are strictlyfeasible, then
I Strong duality holds;I both problems are attained;I the “Karush-Kuhn-Tucker” (KKT) conditions
x∗ ∈ arg minxL(x , y∗), y∗i fi (x∗) = 0, i = 1, . . . ,m.
characterize optimality (primal-dual feasible pairs are optimal iffthey satisfy them).
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Applications of dualityI Sensitivity analysis of convex problems.I Stopping criteria in convex optimization algorithms.I Decomposition methods for large-scale convex optimization.I Bounds and approximations for non-convex problems.I For convex problems, leads to often surprising connections.
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
Outline
Convex problemsConvex setsConvex functionsConvex problems
DualityWeak dualityExamplesStrong duality
References
Robust Optimization &Machine Learning
2. ConvexOptimization
Convex problemsConvex sets
Convex functions
Convex problems
DualityWeak duality
Examples
Strong duality
References
References
S. Boyd and M. Grant.
The CVX optimization package, 2010.
S. Boyd and L. Vandenberghe.
Convex Optimization.Cambridge University Press, 2004.