
Algorithms for convex optimization

Michal Kočvara

Institute of Information Theory and Automation

Academy of Sciences of the Czech Republic

and

Czech Technical University

[email protected]

http://www.utia.cas.cz/kocvara


Convex programs

A general problem of mathematical programming:

min f(x) subject to x ∈ X ⊂ Rn (MP)

where

n is the design dimension of the problem;

f : Rn → R is the objective function;

X ⊂ Rn is the feasible domain of the problem.

Assume:

the function f and the feasible domain X are convex;

the feasible domain X is defined by convex functions gi : Rn → R, i = 1, . . . , m:

X := {x ∈ Rn | gi(x) ≤ 0, i = 1, . . . , m}.
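As a concrete illustration (a hypothetical instance, not taken from the slides), a convex program of this form can be encoded by plain Python callables for f and the gi:

import numpy as np

# Hypothetical convex program: minimize f(x) over X = {x | g_i(x) <= 0}.
def f(x):
    return x[0] ** 2 + x[1] ** 2              # convex objective

def g1(x):
    return 1.0 - x[0] - x[1]                  # affine, hence convex: g1(x) <= 0

def g2(x):
    return -x[0]                              # x1 >= 0 written as g2(x) <= 0

constraints = [g1, g2]

def is_feasible(x, tol=0.0):
    # x belongs to X iff every constraint value is <= 0.
    return all(g(x) <= tol for g in constraints)

print(is_feasible(np.array([1.0, 1.0])))      # True: the point lies in X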


Convex programs

A mathematical program satisfying the above assumptions is called a convex mathematical program:

min f(x) subject to gi(x) ≤ 0, i = 1, . . . , m. (CP)

Why are we interested in this class of optimization problems? Because:

(i) Convex programs are computationally tractable: there exist numerical methods which efficiently solve every convex program satisfying “mild” additional assumptions;

(ii) In contrast, no efficient universal methods for nonconvex mathematical programs are known.


Convex functions

A function h : Rn → R is convex when

∀x, x′ ∀λ ∈ [0, 1] : h(λx + (1 − λ)x′) ≤ λh(x) + (1 − λ)h(x′)

Convex (a) and nonconvex (b) functions in R1: [figure]
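The definition can be sanity-checked numerically; the sketch below (illustrative only, and sampling λ gives evidence rather than a proof of convexity) tests the inequality on a grid of λ values:

import numpy as np

def looks_convex(h, x, xp, n_lambda=11, tol=1e-12):
    # Test h(lam*x + (1-lam)*xp) <= lam*h(x) + (1-lam)*h(xp) for sampled lam in [0, 1].
    for lam in np.linspace(0.0, 1.0, n_lambda):
        lhs = h(lam * x + (1.0 - lam) * xp)
        rhs = lam * h(x) + (1.0 - lam) * h(xp)
        if lhs > rhs + tol:
            return False
    return True

print(looks_convex(lambda x: np.sum(x ** 2), np.array([1.0, 2.0]), np.array([-3.0, 0.5])))  # True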


Convex feasible region

Convex feasible region X described by convex gi’s: [figure]


Method of Newton

The “simplest” convex program: a problem without constraints and with a strongly convex objective,

min f(x) (UCP)

strongly convex means that the Hessian matrix ∇2f(x) is positive definite at every point x and that f(x) → ∞ as ‖x‖2 → ∞.

Method of choice: Newton’s method. The idea: at every iterate, we approximate the function f by a quadratic function, the second-order Taylor expansion:

f(x) + (y − x)T∇f(x) + (1/2)(y − x)T∇2f(x)(y − x).

The next iterate is obtained by minimizing this quadratic function.


Method of Newton

Given a current iterate x, compute the gradient ∇f(x) and Hessian H(x) of f at x.

Compute the direction vector

d = −H(x)−1∇f(x).

Compute the new iterate

xnew = x + d.

The method is

extremely fast whenever we are “close enough” to the solution x∗. Theory: the method is locally quadratically convergent, i.e.,

‖xnew − x∗‖2 ≤ c‖x − x∗‖2²,

provided that ‖x − x∗‖2 ≤ r, where r is small enough;

rather slow when we are not “close enough” to x∗.
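A minimal Python sketch of the iteration above, assuming the gradient and Hessian of f are available as callables (the names and the stopping rule are illustrative):

import numpy as np

def newton(x, grad, hess, tol=1e-10, max_iter=50):
    # Pure Newton iteration: d = -H(x)^{-1} grad f(x), x_new = x + d (no line search).
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:           # stop when the gradient is (almost) zero
            break
        d = np.linalg.solve(hess(x), -g)       # direction d = -H(x)^{-1} grad f(x)
        x = x + d                              # new iterate
    return x

# Example: f(x) = x1^2 + 2*x2^2, minimizer at the origin (one Newton step suffices here).
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton(np.array([3.0, -1.0]), grad, hess))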


Interior-point methods

Transform the “difficult” constrained problem into an “easy” unconstrained problem, or into a sequence of unconstrained problems.

Once we have an unconstrained problem, we can solve it by Newton’s method.

The idea is to use a barrier function that sets a barrier against leaving the feasible region. If the optimal solution occurs at the boundary of the feasible region, the procedure moves from the interior toward the boundary, hence the name interior-point methods.

The barrier function approach was first proposed in the early sixties and later popularised and thoroughly investigated by Fiacco and McCormick.


Classic approach

min f(x) subject to gi(x) ≤ 0, i = 1, . . . , m. (CP)

We introduce a barrier function B that is nonnegative and continuous over the region {x | gi(x) < 0}, and approaches infinity as the boundary of the region {x | gi(x) ≤ 0} is approached from the interior.

(CP) −→ a one-parametric family of functions generated by the objective and the barrier:

Φ(µ; x) := f(x) + µB(x)

and the corresponding unconstrained convex programs

min_x Φ(µ; x);

here the penalty parameter µ is assumed to be nonnegative.
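A minimal sketch of this construction, assuming the objective f and the constraints gi are given as callables and using the logarithmic barrier introduced below (so B is finite only on the interior {x | gi(x) < 0}; all names are illustrative):

import numpy as np

def make_log_barrier(constraints):
    # B(x) = -sum_i log(-g_i(x)) on the interior, +inf outside of it.
    def B(x):
        vals = np.array([g(x) for g in constraints])
        if np.any(vals >= 0.0):
            return np.inf
        return -np.sum(np.log(-vals))
    return B

def make_phi(f, B):
    # One-parametric family Phi(mu; x) = f(x) + mu * B(x), mu >= 0.
    def phi(mu, x):
        return f(x) + mu * B(x)
    return phi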


Classic approach

The idea behind the barrier methods is now obvious: we start with some µ (say µ = 1) and solve the unconstrained auxiliary problem. Then we decrease µ by some factor and solve the auxiliary problem again, and so on.

The auxiliary problem has a unique solution x(µ) for any µ > 0.

The central path, defined by the solutions x(µ), µ > 0, is a smooth curve and its limit points (for µ → 0) belong to the set of optimal solutions of (CP).
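A rough sketch of this outer loop; the unconstrained solver (for instance the Newton routine sketched earlier) is passed in as an argument, and the factor by which µ is decreased is an illustrative choice:

def classic_barrier_method(phi, solve_unconstrained, x0, mu0=1.0, factor=0.5, n_outer=20):
    # Minimize Phi(mu; .) for a decreasing sequence of penalty parameters mu,
    # warm-starting each solve at the previous (approximate) x(mu).
    x, mu = x0, mu0
    path = []                                  # approximations x(mu) of the central path
    for _ in range(n_outer):
        x = solve_unconstrained(lambda y: phi(mu, y), x)
        path.append((mu, x))
        mu *= factor                           # decrease mu by a fixed factor
    return x, path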


Example

One of the most popular barrier functions is (Frisch’s) logarithmic barrier function,

B(x) = − Σ_{i=1}^{m} log(−gi(x)).

Consider a one-dimensional CP

min −x subject to x ≤ 0.

Auxiliary problem:

Φ(µ; x) := −x + µB(x)
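In this one-dimensional example the minimizer of the auxiliary problem can be computed by hand: with B(x) = −log(−x) we get Φ(µ; x) = −x − µ log(−x), and Φ′(µ; x) = −1 − µ/x = 0 gives x(µ) = −µ. As µ → 0, the minimizer x(µ) tends to 0, the optimal solution of the constrained problem.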


Example

Barrier function (a) and function Φ (b) for µ = 1 and µ = 0.3: [figure]


Example

min (x1 + x2)

s. t. −x1 − 2x2 + 2 ≤ 0

−x1 ≤ 0, −x2 ≤ 0

The auxiliary function

Φ = x1 + x2 − µ [log(x1 + 2x2 − 2) + log(x1) + log(x2)]
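A small numerical sketch of this example (assuming scipy is available; the strictly feasible starting point and the Nelder-Mead solver are illustrative choices, and Φ is set to +∞ outside the interior so the search stays feasible):

import numpy as np
from scipy.optimize import minimize

def phi(x, mu):
    # Phi(mu; x) = x1 + x2 - mu * [log(x1 + 2*x2 - 2) + log(x1) + log(x2)]
    slacks = np.array([x[0] + 2.0 * x[1] - 2.0, x[0], x[1]])
    if np.any(slacks <= 0.0):
        return np.inf                          # outside the interior of the feasible region
    return x[0] + x[1] - mu * np.sum(np.log(slacks))

x = np.array([2.0, 2.0])                       # strictly feasible starting point
for mu in [1.0, 0.5, 0.25, 0.125]:
    x = minimize(phi, x, args=(mu,), method='Nelder-Mead').x
    print(mu, x)                               # approximate central-path points x(mu)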


Example

Level lines of the function Φ for µ = 1, 0.5, 0.25, 0.125: [figure]


Central path

[Figure: the central path through the points x(µ) for µ = 1, 0.5, 0.25, 0.125]


The “algorithm”

At the i-th step, we are at a point x(µi) of the central path:

decrease µi a bit, thus getting a new “target point” x(µi+1) on the path;

approach the new target point x(µi+1) by running Newton’s method started at our current iterate xi.

Hope: x(µi) is in the region of quadratic convergence of Newton’s method approaching x(µi+1).

Hence, following the central path, Newton’s method is always efficient (always in the region of quadratic convergence).


Theory and practice

THEORY: Under some mild technical assumptions, the method converges to the solution of (CP):

x(µ) → x∗ as µ → 0.

PRACTICE: disappointing. The method may have serious numerical difficulties.

The idea to stay on the central path, and thus to solve the auxiliary problems exactly, is too restrictive.

The idea to stay all the time in the region of quadratic convergence of Newton’s method may lead to extremely short steps.

How to guarantee in practice that we are “close enough”? The theory does not give any quantitative results.

If we take longer steps (i.e., decrease µ more rapidly), then Newton’s method may become inefficient and we may even leave the feasible region.


Modern approach—Brief history

The “classic” barrier methods date from the 60’s and 70’s. Due to their disadvantages, practitioners soon lost interest in them.

Linear programming: before 1984, linear programs were solved exclusively by the simplex method (Dantzig ’47):

good practical performance

bad theoretical behaviour (examples with exponential behaviour)

Looking for a polynomial-time method:

1979 Khachian: ellipsoid method; polynomial-time (theoretically), bad practical performance

1984 Karmarkar: polynomial-time method for LP; reported 50 times faster than simplex

1986 Gill et al.: Karmarkar’s method = the classic barrier method


Asymptotic vs. Complexity analysis

Asymptotic analysis

Classic analysis of the Newton and barrier methods uses terms like “sufficiently close”, “sufficiently small”, “close enough”, “asymptotic quadratic convergence”.

It does not give any quantitative estimates like:

how close is “sufficiently close” in terms of the problem data?

how much time (operations) do we need to reach our goal?


Asymptotic vs. Complexity analysis

Complexity analysis

Answers the question:

Given an instance of a generic problem and a desired accuracy, how many arithmetic operations do we need to get a solution?

The classic theory of barrier methods gives no information in this respect.

Moreover, from the complexity viewpoint,

Newton’s method has no advantage over first-order algorithms (there is no such phenomenon as “local quadratic convergence”);

constrained problems have the same complexity as unconstrained ones.


Modern approach (Nesterov-Nemirovski)

It appears that all the problems of the classic barrier methods come from the fact that we have too much freedom in the choice of the penalty function B. Nesterov and Nemirovski (SIAM, 1994):

1. There is a class of “good” (self-concordant) barrier functions. Every barrier function B of this type is associated with a real parameter θ(B) > 1.

2. If B is self-concordant, one can specify the notion of “closeness to the central path” and the policy of updating the penalty parameter µ in the following way. If an iterate xi is close (in the above sense) to x(µi) and we update the parameter µi to µi+1, then in a single Newton step we get a new iterate xi+1 which is close to x(µi+1). In other words, after every update of µ we can perform only one Newton step and stay close to the central path. Moreover, points “close to the central path” belong to the interior of the feasible region.


Modern approach (Nesterov-Nemirovski)

3. The penalty updating policy can be defined in terms of the problem data:

1/µi+1 = (1 + 0.1/√θ(B)) · 1/µi;

this shows, in particular, that the reduction factor is independent of the size of µ; the penalty parameter decreases linearly.

4. Assume that we are close to the central path (this can be realized by a certain initialization step). Then every O(√θ(B)) steps of the algorithm improve the quality of the generated approximate solutions by an absolute constant factor. In particular, we need at most

O(1) √θ(B) log(1 + µ0 θ(B)/ε)

steps to generate a strictly feasible ε-solution to (CP).
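A schematic rendering of this short-step scheme, with the penalty update taken from item 3; the single-Newton-step routine and the barrier parameter θ(B) are supplied by the caller (an illustrative sketch, not the authors’ code):

import numpy as np

def short_step_path_following(phi, newton_step, x, mu, theta, n_steps=100):
    # After every update of mu, one Newton step on Phi(mu; .) keeps the iterate
    # close to the central path (short-step interior-point scheme).
    for _ in range(n_steps):
        mu = mu / (1.0 + 0.1 / np.sqrt(theta))     # 1/mu_{i+1} = (1 + 0.1/sqrt(theta(B))) * 1/mu_i
        x = newton_step(lambda y: phi(mu, y), x)   # a single Newton step on the new problem
    return x, mu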


Modern approach (Nesterov-Nemirovski)

Staying close to the central path in a single Newton step:

[Figure: the central path with points x(µi−1), x(µi), x(µi+1); one Newton step takes the iterate xi to xi+1, close to x(µi+1)]

The class of self-concordant functions is sufficiently large and contains many popular barriers, in particular the logarithmic barrier function.
