new convex optimization - nanjing university · 2020. 9. 3. · introduction problems duality...

68
Introduction Problems Duality Methods Convex Optimization Lijun Zhang Nanjing University, China December 24, 2017 http://cs.nju.edu.cn/zlj Convex Optimization Learning And Mining from DatA A D A M L NANJING UNIVERSITY

Upload: others

Post on 16-Oct-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Optimization

Lijun Zhang

Nanjing University, China

December 24, 2017

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAMLNANJING UNIVERSITY

Page 2: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Outline

1 Introduction

2 Convex Optimization Problems

3 Duality

4 Optimization Methods

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 3: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Outline

1 Introduction

2 Convex Optimization Problems

3 Duality

4 Optimization Methods

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 4: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Mathematical Optimization

Optimization Problem [Boyd and Vandenberghe, 2004]

minx

f0(x)

s. t. fi(x) ≤ 0, i = 1, . . . , n

x = [x1, . . . , xd ]⊤ ∈ R

d : optimization variable

f0 : Rd 7→ R: objective function

fi : Rd 7→ R, i = 1, . . . , n: constraint functions

Optimal Solution

x∗|f0(x∗) ≤ f0(x), ∀x ∈ Ωwhere Ω = x|fi(x) ≤ 0, i = 1, . . . , n

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 5: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Least-squares

The Problem

minx

f0(x) =n∑

i=1

(

a⊤i x − bi

)2= ‖Ax − b‖2

A = [a1, . . . , an]⊤ ∈ R

n×d

b = [b1, . . . , bn]⊤ ∈ R

n

Solving Least-squares

Analytical solution: x∗ = (A⊤A)−1A⊤b

Reliable and efficient algorithms and software

Computation time proportional to d2n

A mature technology

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 6: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Linear Programming

The Problemmin

xc⊤x

s. t. a⊤i x ≤ bi , i = 1, . . . , n

x = [x1, . . . , xd ]⊤: optimization variable

c, a1, . . . , an ∈ Rd and b1, . . . , bn ∈ R : parameters

Solving Linear Programs

No analytical formula for solution

Reliable and efficient algorithms and software

Computation time proportional to d2n if n ≥ d

A mature technology

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 7: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Optimization Problem

The Problemmin

xf0(x)

s. t. fi(x) ≤ 0, i = 1, . . . , n

Objective and constraint functions are convex:

fi(θx + (1 − θ)y) ≤ θfi(x) + (1 − θ)fi(y)

if 0 ≤ θ ≤ 1

Solving Convex Optimization Problems

No analytical solution

Reliable and efficient algorithms

Almost a technology

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 8: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Outline

1 Introduction

2 Convex Optimization Problems

3 Duality

4 Optimization Methods

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 9: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Set

Line Segment between x1 and x2

x = θx1+(1−θ)x2, 0 ≤ θ ≤ 1 !

"

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 10: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Set

Line Segment between x1 and x2

x = θx1+(1−θ)x2, 0 ≤ θ ≤ 1 !

"

Convex Set

A set C is convex if the line segment between any two points inC lies in C.

x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⇒ θx1 + (1 − θ)x2 ∈ C

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 11: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Functions

Convex Functions

A function f : Rd 7→ R is convex if dom f is a convex set and iffor all x, y ∈ dom f , and θ with 0 ≤ θ ≤ 1, we have

f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 12: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Functions

Convex Functions

A function f : Rd 7→ R is convex if dom f is a convex set and iffor all x, y ∈ dom f , and θ with 0 ≤ θ ≤ 1, we have

f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)

-3 -2 -1 0 1 2 3-1

0

1

2

3

4

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 13: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Functions

Convex Functions

A function f : Rd 7→ R is convex if dom f is a convex set and iffor all x, y ∈ dom f , and θ with 0 ≤ θ ≤ 1, we have

f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)

Concave Functions

A function f : Rd 7→ R is concave if dom f is a convex set and iffor all x, y ∈ dom f , and θ with 0 ≤ θ ≤ 1, we have

f (θx + (1 − θ)y) ≥ θf (x) + (1 − θ)f (y)

f is concave if −f is convex

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 14: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Examples

Examples on R

Affine: ax + b on R, for any a, b ∈ R

Exponential: eax on R, for any a ∈ R

Powers: xa on R++, for a ≥ 1 or a ≤ 0

Powers of absolute value: |x |p on R, for p ≥ 1

Negative entropy: x log x on R++

Examples on Rd

Affine function: a⊤x + b, for any a ∈ Rd , b ∈ R

Norm: ‖x‖p = (∑d

i=1 |xi |p)1/p, for any p ≥ 1

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 15: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

First-order Condition

The Gradient of f (·) : Rd 7→ R

∇f (x) =[

∂f (x)∂x1

, . . . ,∂f (x)∂xd

]⊤∈ R

d

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 16: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

First-order Condition

The Gradient of f (·) : Rd 7→ R

∇f (x) =[

∂f (x)∂x1

, . . . ,∂f (x)∂xd

]⊤∈ R

d

First-order Condition

Suppose f is differentiable. Then f is convex if and only if dom fis convex and

f (y) ≥ f (x) +∇f (x)⊤(y − x)

holds for all x, y ∈ dom f .

First-order approximation of f is a global underestimator

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 17: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

First-order Condition

An Exmaple

-3 -2 -1 0 1 2 3-1

0

1

2

3

4

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 18: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Second-order Condition

The Hessian of f (·) : Rd 7→ R

∇2f (x) ∈ Rd×d with ∇2f (x)ij =

∂2f (x)∂xi∂xj

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 19: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Second-order Condition

The Hessian of f (·) : Rd 7→ R

∇2f (x) ∈ Rd×d with ∇2f (x)ij =

∂2f (x)∂xi∂xj

Second-order Condition

Suppose f is twice differentiable. Then f is convex if and only ifdom f is convex and its Hessian is positive semidefinite:

∇2f (x) 0

holds for all x dom f .

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 20: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Second-order Condition

The Hessian of f (·) : Rd 7→ R

∇2f (x) ∈ Rd×d with ∇2f (x)ij =

∂2f (x)∂xi∂xj

Second-order Condition

Suppose f is twice differentiable. Then f is convex if and only ifdom f is convex and its Hessian is positive semidefinite:

∇2f (x) 0

holds for all x dom f .

Quadratic functions

f (x) =12

x⊤Ax + b⊤x + c

where A 0http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 21: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Operations that Preserve Convexity

Nonnegative Weighted SumIf f1, . . . , fn are convex and w1, . . . ,wn ≥ 0, then

f = w1f1 + w2f2 + · · · ,wnfn is convex.

Composition with Affine FunctionIf f is convex, then

g(x) = f (Ax + b) is convex.

Pointwise Maximum and SupremumIf f1, . . . , fn are convex, then

f (x) = max f1(x), . . . , fn(x) is convex.

Hinge loss: ℓ(w) = max(0, 1 − yx⊤w)If for each y ∈ A, f (x , y) is convex in x , then the function

g(x) = supy∈A

f (x , y) is convex.

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 22: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Jensen’s inequality

Basic Inequality

If f is convex, then for 0 ≤ θ ≤ 1,

f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)

Extensions

If f is convex, then

f (E[X ]) ≤ E [f (X )]

where X is a random variable

Examples:

(E[X ])2 ≤ E[

X 2]

E[√

X]

≤√

E[X ]

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 23: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Optimization Problem

Standard Formmin

xf0(x)

s. t. fi(x) ≤ 0, i = 1, . . . , n

a⊤i x = bi , i = 1, . . . , p

f0, f1, . . . , fn are convex

Equality constraints are affine

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 24: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Convex Optimization Problem

Standard Formmin

xf0(x)

s. t. fi(x) ≤ 0, i = 1, . . . , n

a⊤i x = bi , i = 1, . . . , p

f0, f1, . . . , fn are convex

Equality constraints are affine

Local and Global Optima

Any locally optimal point of a convex problem is (glo-bally) optimal.

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 25: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Optimality Criterion for Differentiable f0

Optimality Criterionx is optimal if and only if it is feasible and

∇f0(x)⊤(y − x) ≥ 0 for all feasible y

!

"#$(!)

Unconstrained: x is optimal if and only if ∇f0(x) = 0http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 26: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Conjugate Function

Conjugate Function

The conjugate of function f : Rd 7→ R is defined as

f ∗(y) = supx∈dom f

(

y⊤x − f (x))

dom f ∗

y| supx∈dom f

(

y⊤x − f (x))

< ∞

f ∗ is convex

Suppose f is differentiable

f ∗(y) = ∇f (x)⊤x − f (x) = −[

f (x) +∇f (x)⊤(0 − x)]

where y = ∇f (x)

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 27: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Conjugate Function

Conjugate Function

The conjugate of function f : Rd 7→ R is defined as

f ∗(y) = supx∈dom f

(

y⊤x − f (x))

-5 0 5-5

0

5

10

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 28: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Conjugate Function

Conjugate Function

The conjugate of function f : Rd 7→ R is defined as

f ∗(y) = supx∈dom f

(

y⊤x − f (x))

-5 0 5-5

0

5

10

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 29: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Conjugate Function

Conjugate Function

The conjugate of function f : Rd 7→ R is defined as

f ∗(y) = supx∈dom f

(

y⊤x − f (x))

-5 0 5-5

0

5

10

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 30: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Outline

1 Introduction

2 Convex Optimization Problems

3 Duality

4 Optimization Methods

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 31: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Lagrangian

The Optimization Problem (not necessarily convex)

minx

f0(x)

s. t. fi(x) ≤ 0, i = 1, . . . , n

hi(x) = 0, i = 1, . . . , p

Variable x ∈ Rd , domain D, optimal value p∗

The Lagrangian L : Rd × Rn × R

p 7→ R

L(x,λ,ν) = f0(x) +n∑

i=1

λi fi(x) +p∑

i=1

νihi(x)

λi is Lagrange multiplier associated with fi(x) ≤ 0

νi is Lagrange multiplier associated with hi(x) = 0

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 32: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Lagrange Dual Function

Lagrange dual function g : Rn × Rp 7→ R

g(λ,ν) = infx∈D

L(x,λ,ν)

= infx∈D

f0(x) +n∑

i=1

λi fi(x) +p∑

i=1

νihi(x)

g is concave

Lower Bound Property

If λ 0, then g(λ,ν) ≤ p∗.

f0(x) ≥ L(x,λ,ν) ≥ infx∈D

L(x,λ,ν) = g(λ,ν)

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 33: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Dual Problem

Lagrange Dual Problem

maxλ,ν

g(λ,ν)

s. t. λ 0

The best lower bound on p∗A convex optimization problemOptimal value d∗

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 34: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Dual Problem

Lagrange Dual Problem

maxλ,ν

g(λ,ν)

s. t. λ 0

The best lower bound on p∗A convex optimization problemOptimal value d∗

Weak Dualityd∗ ≤ p∗

Strong Dualityd∗ = p∗

Doesnot hold in general, but usually holds for convexproblems

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 35: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Complementary SlacknessUnder Strong Duality

f0(x∗) = g(λ∗,ν∗) = infx∈D

f0(x) +n∑

i=1

λ∗i fi(x) +p∑

i=1

ν∗ihi(x)

≤f0(x∗) +n∑

i=1

λ∗i fi(x∗) +p∑

i=1

ν∗ihi(x∗) ≤ f0(x∗)

x∗ is primal optimal and (λ∗,ν∗) is dual optimal

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 36: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Complementary SlacknessUnder Strong Duality

f0(x∗) = g(λ∗,ν∗) = infx∈D

f0(x) +n∑

i=1

λ∗i fi(x) +p∑

i=1

ν∗ihi(x)

≤f0(x∗) +n∑

i=1

λ∗i fi(x∗) +p∑

i=1

ν∗ihi(x∗) ≤ f0(x∗)

x∗ is primal optimal and (λ∗,ν∗) is dual optimal

Complementary Slackness

λ∗i fi(x∗) = 0 ⇒

λ∗i > 0 ⇒fi(x∗) = 0

fi(x∗) < 0 ⇒λ∗i = 0, i = 1, . . . , n

Optimality of x∗

x∗ = argminx∈D

f0(x) +n∑

i=1

λ∗i fi(x) +p∑

i=1

ν∗ihi(x)

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 37: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Karush-Kuhn-Tucker (KKT) Conditions

1 Primal constraintsfi(x) ≤ 0, i = 1, . . . , n

hi(x) = 0, i = 1, . . . , p

2 Dual constraintsλ 0

3 Complementary Slackness

λ∗i fi(x∗) = 0, i = 1, . . . , n

4 Gradient of Lagrangian with respect to x vanishes

∇f0(x) +n∑

i=1

λi∇fi(x) +p∑

i=1

νi∇hi(x) = 0

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 38: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Implications of KKT Conditions

General Problems

For any optimization problem for which strong duality obtains,any pair of primal and dual optimal points must satisfy the KKTconditions.

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 39: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Implications of KKT Conditions

General Problems

For any optimization problem for which strong duality obtains,any pair of primal and dual optimal points must satisfy the KKTconditions.

Convex Optimizations

For any convex optimization problem, any points that satisfy theKKT conditions are primal and dual optimal, and have zeroduality gap.

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 40: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Implications of KKT Conditions

General Problems

For any optimization problem for which strong duality obtains,any pair of primal and dual optimal points must satisfy the KKTconditions.

Convex Optimizations

For any convex optimization problem, any points that satisfy theKKT conditions are primal and dual optimal, and have zeroduality gap.

Convex Optimizations

For any convex optimization problem for which strong dualityobtains, x∗ is optimal if and only if there exist (λ∗,ν∗) thatsatisfy KKT conditions.

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 41: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines I

The Optimization Problem

minw∈Rd ,b∈R

n∑

i=1

max(

0, 1 − yi(w⊤xi + b)

)

2‖w‖2

2

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 42: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines I

The Optimization Problem

minw∈Rd ,b∈R

n∑

i=1

max(

0, 1 − yi(w⊤xi + b)

)

2‖w‖2

2

Redefine the Hinge Loss

ℓ(x) = max(0, 1 − x)

Its Conjugate Function

ℓ∗(y) = supx(yx − ℓ(x)) =

y , −1 ≤ y ≤ 0

∞, otherwise

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 43: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines I

The Optimization Problem

minw∈Rd ,b∈R

n∑

i=1

max(

0, 1 − yi(w⊤xi + b)

)

2‖w‖2

2

Redefine the Hinge Loss

ℓ(x) = max(0, 1 − x)

Its Conjugate Function

ℓ∗(y) = supx(yx − ℓ(x)) =

y , −1 ≤ y ≤ 0

∞, otherwise

A General Problem

minw∈Rd ,b∈R

n∑

i=1

ℓ(

yi(w⊤xi + b)

)

2‖w‖2

2

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 44: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines II

An Equivalent Form

minw∈Rd ,b∈R,u∈Rn

n∑

i=1

ℓ(ui) +λ

2‖w‖2

2

s. t. ui = yi(w⊤xi + b), i = 1 . . . , n

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 45: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines II

An Equivalent Form

minw∈Rd ,b∈R,u∈Rn

n∑

i=1

ℓ(ui) +λ

2‖w‖2

2

s. t. ui = yi(w⊤xi + b), i = 1 . . . , n

The Lagrangian

L(w, b,u,ν) =n∑

i=1

ℓ(ui) +λ

2‖w‖2

2 +n∑

i=1

νi

(

ui − yi(w⊤xi + b)

)

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 46: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines II

An Equivalent Form

minw∈Rd ,b∈R,u∈Rn

n∑

i=1

ℓ(ui) +λ

2‖w‖2

2

s. t. ui = yi(w⊤xi + b), i = 1 . . . , n

The Lagrangian

L(w, b,u,ν) =n∑

i=1

ℓ(ui) +λ

2‖w‖2

2 +n∑

i=1

νi

(

ui − yi(w⊤xi + b)

)

The Lagrange Dual Function

g(ν) = infw,b,u

L(w, b,u,ν)

= infw,b,u

n∑

i=1

ℓ(ui) +λ

2‖w‖2

2 +n∑

i=1

νi

(

ui − yi(w⊤xi + b)

)

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 47: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines III

The Lagrange Dual Function

g(ν)

= infw,b,u

n∑

i=1

(ℓ(ui) + νiui)) +

(

λ

2‖w‖2

2 − w⊤n∑

i=1

νiyixi

)

− bn∑

i=1

νiyi

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 48: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines III

The Lagrange Dual Function

g(ν)

= infw,b,u

n∑

i=1

(ℓ(ui) + νiui)) +

(

λ

2‖w‖2

2 − w⊤n∑

i=1

νiyixi

)

− bn∑

i=1

νiyi

Minimizing w, b,uinfui

(ℓ(ui) + νiui)) = − supui

(−νiui − ℓ(ui))

=− ℓ∗(−νi) = νi , if 0 ≤ νi ≤ 1

∇wL(w, b,u,ν) = λw −n∑

i=1

νiyixi = 0 ⇒ w =1λ

n∑

i=1

νiyixi

∇bL(w, b,u,ν) = −n∑

i=1

νiyi = 0

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 49: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines IV

The Lagrange Dual Function

g(ν) =n∑

i=1

νi −1

n∑

i=1

n∑

j=1

νiνjyiyjx⊤i xj

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 50: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines IV

The Lagrange Dual Function

g(ν) =n∑

i=1

νi −1

n∑

i=1

n∑

j=1

νiνjyiyjx⊤i xj

The Dual Problem

maxν∈Rn

n∑

i=1

νi −1

n∑

i=1

n∑

j=1

νiνjyiyjx⊤i xj

s. t. 0 ≤ νi ≤ 1, i = 1 . . . , n

s. t.n∑

i=1

νiyi = 0

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 51: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Support Vector Machines V

KKT Conditions

(w∗, b∗,u∗) and ν∗ are primal and dual solutions

u∗i = yi(w⊤∗ xi + b∗)

0 ≤ ν∗i ≤ 1n∑

i=1

ν∗iyi = 0

w∗ =1λ

n∑

i=1

ν∗iyixi

u∗i = argminui

(ℓ(ui) + ν∗iui) = 1 if 0 < ν∗i < 1

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 52: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Outline

1 Introduction

2 Convex Optimization Problems

3 Duality

4 Optimization Methods

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 53: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Performance Measure

The Optimization Problemminw∈W

f (w)

f (·) is a convex function

W is a convex domain

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 54: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Performance Measure

The Optimization Problemminw∈W

f (w)

f (·) is a convex function

W is a convex domain

Convergence Rate

After T iterations, the gap between objectives

f (wT )− f (w∗) = O(

1√T

)

, O(

1T

)

, O(

1T 2

)

,O(

1αT

)

Iteration Complexity

To ensure f (wT )− f (w∗) ≤ ǫ, the order of T

T = Ω

(

1ǫ2

)

, Ω

(

)

, Ω

(

1√ǫ

)

(

log1ǫ

)

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 55: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Analytical Properties

Lipschitz Continuous

‖∇f (x)‖ ≤ G, or ‖∂f (x)‖ ≤ G

Strongly Convex

∇2f (x) µI

〈∇f (x)−∇f (y), x − y〉 ≥ µ‖x − y‖22

f (y) ≥ f (x) + 〈∇f (x), y − x〉+ µ

2‖y − x‖2

f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)− θ(1 − θ)µ

2‖x − y‖2

Smooth∇2f (x) LI

〈∇f (x)−∇f (y), x − y〉 ≤ L‖x − y‖22

f (y) ≤ f (x) + 〈∇f (x), y − x〉+ L2‖y − x‖2

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 56: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

An Example

-4 -2 0 2 4 6

0

1

2

3

4

5

6

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 57: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

An Example

-4 -2 0 2 4 6

0

1

2

3

4

5

6

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 58: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Existing Results

Convergence RateLipschitz

ContinuousStrongly Convex Smooth

SmoothStrongly Convex

GD O(

1√T

)

EGD/SGDα O( 1

T

)

AGD O( 1

T 2

)

GD/AGD O( 1αT

)

GD—Gradient Descent [Nesterov, 2004]

EGD—Epoch Gradient Descent [Hazan and Kale, 2011]

SGDα—SGD with α-suffix Averaging [Rakhlin et al., 2012]

AGD—Nesterov’s Accelerated Gradient Descent[Nesterov, 2005, Nesterov, 2007, Tseng, 2008]

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 59: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Gradient Descent

-3 -2 -1 0 1 2 3

0

1

2

3

4

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 60: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Gradient Descent with Projection

The Problemminw∈W

f (w)

for t = 1, . . . ,T do

w′t+1 = wt − ηt∇f (wt)

wt+1 = ΠW(w′t+1)

end forreturn wT = 1

T

∑Tt=1 wt

The Projection Operator

ΠW(y) = argminx∈W

‖x − y‖2

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 61: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Convergence Rate I

For any w ∈ W, we have

f (wt)− f (w)

≤〈∇f (wt),wt − w〉

=1ηt〈wt − w′

t+1,wt − w〉

=1

2ηt

(

‖wt − w‖22 − ‖w′

t+1 − w‖22 + ‖wt − w′

t+1‖22

)

=1

2ηt

(

‖wt − w‖22 − ‖w′

t+1 − w‖22

)

+ηt

2‖∇f (wt)‖2

2

≤ 12ηt

(

‖wt − w‖22 − ‖wt+1 − w‖2

2

)

+ηt

2‖∇f (wt)‖2

2

To simplify the above inequality, we assume

ηt = η, ‖∇f (w)‖2 ≤ G, ∀w ∈ W, and ‖x − y‖2 ≤ D, ∀x, y ∈ W

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 62: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Convergence Rate II

Then, we have

f (wt)− f (w) ≤ 12η

(

‖wt − w‖22 − ‖wt+1 − w‖2

2

)

2G2

By adding the inequalities of all iterations, we haveT∑

t=1

f (wt)− Tf (w)

≤ 12η

(

‖w1 − w‖22 − ‖wT+1 − w‖2

2

)

+ηT2

G2

≤ 12η

‖w1 − w‖22 +

ηT2

G2

≤ 12η

D2 +ηT2

G2 = GD√

T

where we set

η =D

G√

Thttp://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 63: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Convergence Rate III

Then, we have

f (wT )− f (w) = f

(

1T

T∑

t=1

wt

)

− f (w)

≤ 1T

T∑

t=1

f (wt)− f (w) ≤ 1T

GD√

T =GD√

T

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 64: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

The Convergence Rate III

Then, we have

f (wT )− f (w) = f

(

1T

T∑

t=1

wt

)

− f (w)

≤ 1T

T∑

t=1

f (wt)− f (w) ≤ 1T

GD√

T =GD√

T

Limitations

The step size η = DG√

Tdepends on T

f (wT ) doesnot decrease monotonically

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 65: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Gradients or Subgradients

The Logit Loss

ℓi(w) = log(

1 + exp(−yix⊤i w)

)

∇ℓi(w) =1

1 + exp(−yix⊤i w)

∇(

1 + exp(−yix⊤i w)

)

=exp(−yix⊤

i w)

1 + exp(−yix⊤i w)

∇(−yix⊤i w) =

exp(−yix⊤i w)

1 + exp(−yix⊤i w)

− yixi

The Hinge Loss

ℓi(w) = max(0, 1 − yix⊤i w)

∂ℓi(w) =

− yixi , yix⊤i w < 1

0, yix⊤i w > 1

−αyixi : α ∈ [0, 1], yix⊤i w = 1

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 66: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Homework

Analyze the smoothness of the following function

f (x) = log (1 + exp(−x)) +λ

2x2

ΠW(·) is a nonexpanding operator

‖ΠW(x)− ΠW(y)‖2 ≤ ‖x − y‖2, ∀x, y

The analysis of varied step size

ηt = O(1/√

t)

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 67: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Reference I

Boyd, S. and Vandenberghe, L. (2004).Convex Optimization.Cambridge University Press.

Hazan, E. and Kale, S. (2011).Beyond the regret minimization barrier: an optimal algorithm for stochasticstrongly-convex optimization.In Proceedings of the 24th Annual Conference on Learning Theory, pages421–436.

Nesterov, Y. (2004).Introductory lectures on convex optimization: a basic course, volume 87 ofApplied optimization.Kluwer Academic Publishers.

Nesterov, Y. (2005).Smooth minimization of non-smooth functions.Mathematical Programming, 103(1):127–152.

Nesterov, Y. (2007).Gradient methods for minimizing composite objective function.Core discussion papers.

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML

Page 68: New Convex Optimization - Nanjing University · 2020. 9. 3. · Introduction Problems Duality Methods Convex Functions Convex Functions A function f : Rd → Ris convex if domf is

Introduction Problems Duality Methods

Reference II

Rakhlin, A., Shamir, O., and Sridharan, K. (2012).Making gradient descent optimal for strongly convex stochastic optimization.In Proceedings of the 29th International Conference on Machine Learning, pages449–456.

Tseng, P. (2008).On acclerated proximal gradient methods for convex-concave optimization.Technical report, University of Washington.

http://cs.nju.edu.cn/zlj Convex Optimization

Learning And Mining from DatA

A DAML