Modern Optimization Techniques
1. Theory

Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)
Institute of Computer Science
University of Hildesheim, Germany



Syllabus

Mon. 30.10. (0) 0. Overview

1. Theory
Mon. 6.11. (1) 1. Convex Sets and Functions

2. Unconstrained Optimization
Mon. 13.11. (2) 2.1 Gradient Descent
Mon. 20.11. (3) 2.2 Stochastic Gradient Descent
Mon. 27.11. (4) 2.3 Newton's Method
Mon. 4.12. (5) 2.4 Quasi-Newton Methods
Mon. 11.12. (6) 2.5 Subgradient Methods
Mon. 18.12. (7) 2.6 Coordinate Descent

— Christmas Break —

3. Equality Constrained Optimization
Mon. 8.1. (8) 3.1 Duality
Mon. 15.1. (9) 3.2 Methods

4. Inequality Constrained Optimization
Mon. 22.1. (10) 4.1 Primal Methods
Mon. 29.1. (11) 4.2 Barrier and Penalty Methods
Mon. 5.2. (12) 4.3 Cutting Plane Methods


Outline

1. Introduction

2. Convex Sets

3. Convex Functions

4. Optimization Problems

1. Introduction

A convex function

Figure: the graph of f(x) = x^2, a convex function.


A non-convex function

Figure: the graph of f(x) = 0.1x^2 + sin x, a non-convex function.


Convex Optimization Problem

An optimization problem

    minimize    f(x)
    subject to  h_q(x) ≤ 0,   q = 1, ..., Q
                Ax = b

is said to be convex if f, h_1, ..., h_Q are convex.

How do we know if a function is convex or not?
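Before turning to that question: a problem of this form can be written down almost literally in a modeling tool. Below is a minimal sketch (not part of the slides) using the cvxpy package, with a convex quadratic objective, an affine equality constraint Ax = b and a convex inequality constraint; the data A, b and the bound 1 are made up for illustration.

import numpy as np
import cvxpy as cp

# made-up problem data, for illustration only
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 2.0])

x = cp.Variable(3)

# f(x) = ||x||_2^2 is convex, h(x) = x - 1 is convex, Ax = b is affine,
# so this is a convex optimization problem
objective = cp.Minimize(cp.sum_squares(x))
constraints = [A @ x == b, x <= 1.0]

problem = cp.Problem(objective, constraints)
problem.solve()
print(problem.status, x.value)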


2. Convex Sets

Affine Sets

For any two points x_1, x_2 we can define the line through them as:

    x = θx_1 + (1 − θ)x_2,   θ ∈ R

Example (figure): points on the line through x_1 (θ = 1) and x_2 (θ = 0), e.g., at θ = 0.4, θ = 1.3 and θ = −0.5.


Affine Sets - Definition

An affine set is a set containing the line through any two distinct points in it.

Examples:

- R^N for N ∈ N^+
- the solution set of a system of linear equations, {x ∈ R^N | Ax = b}
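As a quick numerical illustration of the second example (my own sketch, with made-up A, b and a hand-picked null-space direction), the whole line through two solutions of Ax = b again consists of solutions:

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 2.0])

# one exact solution (the system is consistent, so least squares solves it exactly)
x1 = np.linalg.lstsq(A, b, rcond=None)[0]
# a second solution, obtained by adding a direction n with A n = 0
n = np.array([-2.0, 1.0, -1.0])
x2 = x1 + n

for theta in [-0.5, 0.0, 0.4, 1.0, 1.3]:   # including values outside [0, 1]
    x = theta * x1 + (1.0 - theta) * x2
    assert np.allclose(A @ x, b)           # every point on the line solves Ax = b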


Convex Sets

The line segment between any two points x_1, x_2 is the set of all points:

    x = θx_1 + (1 − θ)x_2,   0 ≤ θ ≤ 1

Example (figure): the segment between x_1 (θ = 1) and x_2 (θ = 0).

A convex set contains the line segment between any two points in the set.


Convex Sets - Examples

Figures: examples of convex sets and of non-convex sets.


Convex Combination and Convex Hull

(standard) simplex:

    ∆_N := {θ ∈ R^N | θ_n ≥ 0, n = 1, ..., N; ∑_{n=1}^{N} θ_n = 1}

convex combination of some points x_1, ..., x_N ∈ R^M: any point x with

    x = θ_1 x_1 + θ_2 x_2 + ... + θ_N x_N,   θ ∈ ∆_N

convex hull of a set X ⊆ R^M of points:

    conv(X) := {θ_1 x_1 + θ_2 x_2 + ... + θ_N x_N | N ∈ N, x_1, ..., x_N ∈ X, θ ∈ ∆_N}

i.e., the set of all convex combinations of points in X.
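A small numpy sketch (mine, not from the slides): weights θ drawn from the standard simplex and the corresponding convex combination of four example points in R^2; the resulting point lies in their convex hull, the unit square.

import numpy as np

rng = np.random.default_rng(0)

# four example points in R^2 (made up): the corners of the unit square
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

# theta from the standard simplex: nonnegative entries summing to 1
theta = rng.dirichlet(np.ones(len(X)))
assert np.all(theta >= 0) and np.isclose(theta.sum(), 1.0)

# convex combination x = sum_n theta_n x_n; it lies in conv(X)
x = theta @ X
print(theta, x)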

3. Convex Functions

Convex Functions

A function f : X → R, X ⊆ R^N is convex iff:

- dom f = X is a convex set
- for all x_1, x_2 ∈ dom f and 0 ≤ θ ≤ 1 it satisfies

    f(θx_1 + (1 − θ)x_2) ≤ θf(x_1) + (1 − θ)f(x_2)

(the function lies below its secant segments/chords.)

Figure: a chord from (x_1, f(x_1)) to (x_2, f(x_2)) lying above the graph.
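The defining inequality can be spot-checked numerically on random chords (this can only refute convexity, never prove it). A small sketch of mine, applied to f(x) = x^2 and to the non-convex 0.1x^2 + sin x from the introduction:

import numpy as np

def chord_violated(f, lo=-10.0, hi=10.0, trials=10000, seed=0):
    # returns True if some sampled chord lies below the graph of f
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(lo, hi, trials)
    x2 = rng.uniform(lo, hi, trials)
    theta = rng.uniform(0.0, 1.0, trials)
    lhs = f(theta * x1 + (1 - theta) * x2)
    rhs = theta * f(x1) + (1 - theta) * f(x2)
    return bool(np.any(lhs > rhs + 1e-9))

print(chord_violated(lambda x: x**2))                     # False: consistent with convexity
print(chord_violated(lambda x: 0.1 * x**2 + np.sin(x)))   # True: not convex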


Convex functions

Figure: for x_1, x_2 and θ ∈ [0, 1], the point (θx_1 + (1 − θ)x_2, f(θx_1 + (1 − θ)x_2)) on the graph lies below the point (θx_1 + (1 − θ)x_2, θf(x_1) + (1 − θ)f(x_2)) on the chord.


How are Convex Functions Related to Convex Sets?

epigraph of a function f : X → R, X ⊆ R^N:

    epi(f) := {(x, y) ∈ X × R | y ≥ f(x)}

f is convex (as a function) ⇐⇒ epi(f) is convex (as a set).

The proof is straightforward (try it!).


Concave Functions

A function f is called concave if −f is convex.

Figures: a concave function f(x) = −x^2 and a convex function f(x) = x^2.


Strictly Convex Functions

A function f : X → R, X ⊆ R^N is strictly convex if:

- dom f is a convex set
- for all x_1, x_2 ∈ dom f with x_1 ≠ x_2 and 0 < θ < 1 it satisfies

    f(θx_1 + (1 − θ)x_2) < θf(x_1) + (1 − θ)f(x_2)


Examples

Examples of convex functions:

- affine: f(x) = ax + b, with dom f = R and a, b ∈ R
- exponential: f(x) = e^{ax}, with a ∈ R
- powers: f(x) = x^a, with dom f = R^+_0 and a ≥ 1 or a ≤ 0
- powers of absolute value: f(x) = |x|^a, with dom f = R and a ≥ 1
- negative entropy: f(x) = x log x, with dom f = R^+

Examples of concave functions:

- affine: f(x) = ax + b, with dom f = R and a, b ∈ R
- powers: f(x) = x^a, with dom f = R^+_0 and 0 ≤ a ≤ 1
- logarithm: f(x) = log x, with dom f = R^+


Examples

Examples of convex functions: all norms are convex!

- Immediate consequence of the triangle inequality and absolute homogeneity.
- For x ∈ R^N and p ≥ 1, the p-norm: ||x||_p := (∑_{n=1}^{N} |x_n|^p)^{1/p}
- ||x||_∞ := max_{n=1:N} |x_n|

Affine functions on vectors are also convex: f(x) = a^T x + b
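As a small numerical aside (not from the slides), numpy's norm routine evaluates the p-norms and the ∞-norm, and the chord inequality behind their convexity can be spot-checked directly:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)

for p in [1, 2, 3, np.inf]:
    print(p, np.linalg.norm(x, ord=p))

# convexity follows from the triangle inequality and absolute homogeneity:
# ||t x1 + (1 - t) x2|| <= t ||x1|| + (1 - t) ||x2||
x1, x2, t = rng.normal(size=5), rng.normal(size=5), 0.3
lhs = np.linalg.norm(t * x1 + (1 - t) * x2, 1)
rhs = t * np.linalg.norm(x1, 1) + (1 - t) * np.linalg.norm(x2, 1)
assert lhs <= rhs + 1e-12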


1st-Order Condition

f is differentiable if dom f is open and the gradient

    ∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, ..., ∂f(x)/∂x_N)

exists everywhere.

1st-order condition: a differentiable function f is convex iff

- dom f is a convex set
- for all x, y ∈ dom f

    f(y) ≥ f(x) + ∇f(x)^T (y − x)

(the function lies above all of its tangents.)
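A quick numerical illustration (my own sketch) of the tangent inequality for the convex function f(x) = ||x||^2, whose gradient is ∇f(x) = 2x, checked at random pairs x, y:

import numpy as np

f = lambda x: float(x @ x)   # f(x) = ||x||^2 is convex
grad = lambda x: 2.0 * x     # its gradient

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    # 1st-order condition: f lies above its tangent at x
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9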


1st-Order Condition

Figure: for a convex f, the tangent h(y) = f(x) + ∇f(x)^T (y − x) at the point (x, f(x)) lies below the graph of f.


1st-Order Condition / Proof

Let dom f = X be convex.

    f : X → R convex  ⇔  f(y) ≥ f(x) + ∇f(x)^T (y − x)   ∀ x, y

"⇒": convexity gives f(x + t(y − x)) ≤ (1 − t)f(x) + t f(y); dividing by t and rearranging,

    f(y) ≥ (f(x + t(y − x)) − f(x)) / t + f(x)  →  ∇f(x)^T (y − x) + f(x)   as t → 0^+

"⇐": apply the inequality twice to z := θx + (1 − θ)y:

    f(x) ≥ f(z) + ∇f(z)^T (x − z)
    f(y) ≥ f(z) + ∇f(z)^T (y − z)

so that

    θf(x) + (1 − θ)f(y) ≥ f(z) + ∇f(z)^T (θx + (1 − θ)y) − ∇f(z)^T z
                        = f(z) + ∇f(z)^T z − ∇f(z)^T z = f(z) = f(θx + (1 − θ)y)


1st-Order Condition / Strict Variant

strict 1st-order condition: a differentiable function f is strictly convex iff

- dom f is a convex set
- for all x, y ∈ dom f with x ≠ y

    f(y) > f(x) + ∇f(x)^T (y − x)


Global Minima

Let dom f = X be convex.

    f : X → R convex  ⇔  f(y) ≥ f(x) + ∇f(x)^T (y − x)   ∀ x, y

Consequence: points x with ∇f(x) = 0 are (equivalent) global minima.

- the minima form a convex set
- if f is strictly convex, there is at most one global minimum x*.
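For a strictly convex quadratic this can be made very concrete: the unique stationary point is the global minimum. A small sketch of mine with f(x) = 1/2 x^T Q x − c^T x for a made-up positive definite Q:

import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite => f is strictly convex
c = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ Q @ x - c @ x
grad = lambda x: Q @ x - c

x_star = np.linalg.solve(Q, c)        # the unique point with grad f(x) = 0
assert np.allclose(grad(x_star), 0.0)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = x_star + rng.normal(size=2)
    assert f(x) >= f(x_star) - 1e-9   # x* is the global minimum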


2nd-Order Condition

f is twice differentiable if dom f is open and the Hessian ∇²f(x),

    ∇²f(x)_{n,m} = ∂²f(x) / (∂x_n ∂x_m),

exists everywhere.

2nd-order condition: a twice differentiable function f is convex iff

- dom f is a convex set
- ∇²f(x) ⪰ 0 for all x ∈ dom f

Furthermore:

- if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex
- the converse is not true, e.g., f(x) = x^4 is strictly convex, but ∇²f(0) = 0.
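Numerically, the 2nd-order condition amounts to checking the smallest eigenvalue of the Hessian wherever it is evaluated. A small sketch (my own examples): f(x) = x_1^2 + x_1 x_2 + x_2^2 has a constant positive definite Hessian, while the 1-D function 0.1x^2 + sin x from the introduction has f''(x) = 0.2 − sin x, which becomes negative:

import numpy as np

# f(x) = x1^2 + x1*x2 + x2^2 has the constant Hessian [[2, 1], [1, 2]]
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(np.linalg.eigvalsh(H))        # [1. 3.], all > 0  ->  (strictly) convex

# f(x) = 0.1 x^2 + sin x: the "Hessian" is the scalar f''(x) = 0.2 - sin x
xs = np.linspace(-10.0, 10.0, 1001)
print(np.min(0.2 - np.sin(xs)))     # negative somewhere  ->  not convex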


Positive Semidefinite Matrices (A Reminder)

A symmetric matrix A ∈ R^{N×N} is positive semidefinite (A ⪰ 0) iff:

    x^T A x ≥ 0   ∀ x ∈ R^N

Equivalent:
(i) all eigenvalues of A are ≥ 0.
(ii) A = B^T B for some matrix B.

A symmetric matrix A ∈ R^{N×N} is positive definite (A ≻ 0) iff:

    x^T A x > 0   ∀ x ∈ R^N \ {0}

Equivalent:
(i) all eigenvalues of A are > 0.
(ii) A = B^T B for some nonsingular matrix B.
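Both characterizations are easy to exercise in numpy (a sketch of mine, with a made-up matrix B): any A = B^T B is positive semidefinite, its eigenvalues are nonnegative, and a Cholesky factorization exists exactly for positive definite matrices:

import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 3))
A = B.T @ B                      # A = B^T B  =>  A is symmetric positive semidefinite

print(np.linalg.eigvalsh(A))     # all eigenvalues >= 0 (up to rounding)

x = rng.normal(size=3)
print(x @ A @ x)                 # x^T A x = ||B x||^2 >= 0

# positive definite case: B square and nonsingular; Cholesky then succeeds
B = rng.normal(size=(3, 3))
L = np.linalg.cholesky(B.T @ B)
print(np.allclose(L @ L.T, B.T @ B))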


Recognizing Convex Functions

- There are a number of operations that preserve the convexity of a function.
- If f can be obtained by applying those operations to a convex function, f is also convex.

Nonnegative multiple:

- if f is convex and a ≥ 0, then af is convex.
- Example: 5x^2 is convex since x^2 is convex.

Sum:

- if f_1 and f_2 are convex functions, then f_1 + f_2 is convex.
- Example: f(x) = e^{3x} + x log x with dom f = R^+ is convex since e^{3x} and x log x are convex.
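The last example can also be cross-checked with the 2nd-order condition from earlier: on R^+, (e^{3x})'' = 9e^{3x} and (x log x)'' = 1/x are both positive, so f''(x) = 9e^{3x} + 1/x > 0. A tiny numerical spot-check of mine:

import numpy as np

xs = np.linspace(1e-3, 5.0, 10000)             # dom f = R^+ (truncated grid)
f_second = 9.0 * np.exp(3.0 * xs) + 1.0 / xs   # f''(x) for f(x) = e^{3x} + x log x
print(np.min(f_second) > 0)                    # True: consistent with convexity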


Recognizing Convex Functions

Composition with an affine function:

- if f is convex, then f(Ax + b) is convex.
- Example: the norm of an affine function, ||Ax + b||.

Pointwise maximum:

- if f_1, ..., f_m are convex functions, then f(x) = max{f_1(x), ..., f_m(x)} is convex.
- Example: f(x) = max_{i=1,...,m} (a_i^T x + b_i) is convex.
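A short sketch of the pointwise-maximum example (my own, with made-up a_i and b_i): f(x) = max_i (a_i^T x + b_i) evaluated with numpy, together with a random chord check of its convexity:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # rows are the a_i^T (made up)
b = rng.normal(size=5)

def f(x):
    # pointwise maximum of affine functions a_i^T x + b_i
    return np.max(A @ x + b)

for _ in range(1000):
    x1, x2, t = rng.normal(size=3), rng.normal(size=3), rng.uniform()
    assert f(t * x1 + (1 - t) * x2) <= t * f(x1) + (1 - t) * f(x2) + 1e-9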


Recognizing Convex Functions

Composition with scalar functions:

- if g : R^N → R, h : R → R and f(x) = h(g(x)), then f is convex if:
  - g is convex and h is convex and nondecreasing, or
  - g is concave and h is convex and nonincreasing.
- Examples:
  - e^{g(x)} is convex if g is convex
  - 1/g(x) is convex if g is concave and positive



There are many different ways to establish the convexity of a function:

- Apply the definition directly.

- Show that ∇²f(x) ⪰ 0 (positive semidefinite Hessian) for twice differentiable functions (see the numerical sketch below).

- Show that f can be obtained from other convex functions by operations that preserve convexity.
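As an illustration of the second-order condition (a sketch with made-up data, not from the original slides): for a quadratic function f(x) = ½ x^T Q x + c^T x the Hessian is the symmetric part of Q, so convexity can be checked by verifying that its smallest eigenvalue is nonnegative.

    import numpy as np

    # made-up quadratic f(x) = 0.5 * x^T Q x + c^T x
    Q = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    c = np.array([1.0, -3.0])

    hessian = 0.5 * (Q + Q.T)                 # Hessian of f is the symmetric part of Q
    eigenvalues = np.linalg.eigvalsh(hessian)
    print("smallest eigenvalue:", eigenvalues.min())
    print("convex:", bool(eigenvalues.min() >= -1e-12))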


Modern Optimization Techniques 4. Optimization Problems



Optimization Problem

minimize    f(x)
subject to  g_p(x) = 0,   p = 1, ..., P
            h_q(x) ≤ 0,   q = 1, ..., Q

- f: R^N → R is the objective function
- x ∈ R^N are the optimization variables
- g_p: R^N → R, p = 1, ..., P are the equality constraint functions
- h_q: R^N → R, q = 1, ..., Q are the inequality constraint functions


Convex Optimization Problem

An optimization problem

minimize    f(x)
subject to  g_p(x) = 0,   p = 1, ..., P
            h_q(x) ≤ 0,   q = 1, ..., Q

is said to be convex if

- f is convex,
- g_1, ..., g_P are affine and
- h_1, ..., h_Q are convex.

Since the affine equality constraints can be collected into matrix form, a convex problem can equivalently be written as

minimize    f(x)
subject to  Ax = a
            h_q(x) ≤ 0,   q = 1, ..., Q
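As a concrete illustration of this standard form, here is a minimal modeling sketch (added for illustration; it assumes the third-party cvxpy package and uses made-up problem data): a convex quadratic objective with one affine equality constraint and convex (here affine) inequality constraints.

    import numpy as np
    import cvxpy as cp

    # made-up problem data
    P = np.diag([2.0, 1.0, 3.0])      # positive definite -> convex objective
    A = np.ones((1, 3))               # one affine equality constraint: sum(x) = 1
    a = np.array([1.0])

    x = cp.Variable(3)
    objective = cp.Minimize(cp.quad_form(x, P))   # f(x) = x^T P x
    constraints = [A @ x == a,                    # affine equality constraint
                   x >= 0]                        # convex inequality constraints
    problem = cp.Problem(objective, constraints)
    problem.solve()

    print("optimal value:", problem.value)
    print("optimal x:", x.value)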


Example 1: Linear Regression / Household Spending

Suppose we have the following data about different households:

- Number of workers in the household (a_1)
- Household composition (a_2)
- Region (a_3)
- Gross normal weekly household income (a_4)
- Weekly household spending (y)

We want to create a model of the weekly household spending.


Example 1: Linear Regression

If we have data about M households, we can represent it as:

    A = ( 1  a_{1,1}  a_{1,2}  a_{1,3}  a_{1,4} )
        ( 1  a_{2,1}  a_{2,2}  a_{2,3}  a_{2,4} )
        ( ⋮     ⋮        ⋮        ⋮        ⋮   )
        ( 1  a_{M,1}  a_{M,2}  a_{M,3}  a_{M,4} )

    y = ( y_1, y_2, ..., y_M )^T

We can model the weekly household spending as a linear combination of the household features with parameters β:

    y_m = β^T A_{m,.} = β_0 · 1 + β_1 a_{m,1} + β_2 a_{m,2} + β_3 a_{m,3} + β_4 a_{m,4},   m = 1, ..., M
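A minimal numpy sketch of this setup (added for illustration; the feature values and parameters are made up): build the design matrix A with a leading column of ones for the intercept β_0 and compute the predictions Aβ.

    import numpy as np

    # made-up feature matrix: one row per household, columns a_1 ... a_4
    features = np.array([[2, 3, 1, 950.0],
                         [1, 2, 4, 610.0],
                         [0, 1, 2, 380.0]])
    M = features.shape[0]

    # design matrix with a leading column of ones for the intercept beta_0
    A = np.hstack([np.ones((M, 1)), features])

    beta = np.array([50.0, 30.0, 20.0, -5.0, 0.4])  # made-up parameters
    y_hat = A @ beta                                # predicted weekly spending
    print(y_hat)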



In matrix notation the model reads Aβ ≈ y:

    ( 1  a_{1,1}  a_{1,2}  a_{1,3}  a_{1,4} )   ( β_0 )     ( y_1 )
    ( 1  a_{2,1}  a_{2,2}  a_{2,3}  a_{2,4} )   ( β_1 )     ( y_2 )
    ( ⋮     ⋮        ⋮        ⋮        ⋮   ) · ( β_2 )  ≈  (  ⋮  )
    ( 1  a_{M,1}  a_{M,2}  a_{M,3}  a_{M,4} )   ( β_3 )     ( y_M )
                                                ( β_4 )

We want to find parameters β such that the measured error of the predictions is minimal:

    Σ_{m=1}^{M} (β^T A_{m,.} − y_m)² = ||Aβ − y||_2²


Example 1: Linear Regression / Least Squares Problem

minimize_β  ||Aβ − y||_2²

Expanding the objective and setting its gradient to zero:

    ||Aβ − y||_2² = (Aβ − y)^T (Aβ − y)

    d/dβ (Aβ − y)^T (Aβ − y) = 2 A^T (Aβ − y)

    2 A^T (Aβ − y) = 0
    A^T A β − A^T y = 0
    A^T A β = A^T y
    β = (A^T A)^{-1} A^T y
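A minimal numpy sketch of this closed-form solution (added for illustration; the data are made up). In practice, solving the normal equations with np.linalg.solve or using np.linalg.lstsq is preferred over forming the explicit inverse.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 4))])  # made-up design matrix
    y = rng.normal(size=50)                                      # made-up targets

    # normal equations: A^T A beta = A^T y
    beta_normal = np.linalg.solve(A.T @ A, A.T @ y)

    # numerically preferable least-squares solver
    beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

    print(np.allclose(beta_normal, beta_lstsq))  # both solve the same problem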



Example 1: Linear Regression / Least Squares Problem

minimize_β  ||Aβ − y||_2²

- Convex problem!
- Analytical solution: β* = (A^T A)^{-1} A^T y
- Often applied for data fitting
- Aβ − y is usually called the residual or error
- Extensions such as regularized least squares (see the sketch below)
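One common regularized variant (not spelled out on the slide; added as an illustration) is ridge regression, which adds an L2 penalty λ||β||_2² to the squared error; the closed-form solution then becomes β = (A^T A + λI)^{-1} A^T y. A minimal sketch with made-up data (for simplicity the intercept is penalized as well):

    import numpy as np

    rng = np.random.default_rng(1)
    A = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 4))])  # made-up design matrix
    y = rng.normal(size=50)                                      # made-up targets

    lam = 0.1                         # regularization strength (made up)
    N = A.shape[1]

    # ridge regression: minimize ||A beta - y||_2^2 + lam * ||beta||_2^2
    beta_ridge = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)
    print(beta_ridge)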


Example 2: Linear Classification / Household Location

Suppose we have the following data about different households:

- Number of workers in the household (a_1)
- Household composition (a_2)
- Weekly household spending (a_3)
- Gross normal weekly household income (a_4)
- Region (y): north (y = 1) or south (y = 0)

We want to create a model of the location of the household.


Example 2: Linear Classification

If we have data about M households, we can represent it as:

    A = ( 1  a_{1,1}  ...  a_{1,4} )
        ( 1  a_{2,1}  ...  a_{2,4} )
        ( ⋮     ⋮            ⋮    )
        ( 1  a_{M,1}  ...  a_{M,4} )

    y = ( y_1, y_2, ..., y_M )^T

We can model the probability that a household is located in the north (y = 1) by applying the logistic function to a linear combination of the household features with parameters β:

    y_m = σ(β^T A_{m,.})
        = σ(β_0 · 1 + β_1 a_{m,1} + β_2 a_{m,2} + β_3 a_{m,3} + β_4 a_{m,4}),   m = 1, ..., M

where σ(x) := 1 / (1 + e^{−x}) is the logistic function.
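A minimal numpy sketch of this model (added for illustration; features and parameters are made up): compute σ(Aβ) to obtain, for each household, the predicted probability of being in the north (y = 1).

    import numpy as np

    def sigmoid(z):
        # logistic function sigma(z) = 1 / (1 + e^{-z})
        return 1.0 / (1.0 + np.exp(-z))

    # made-up design matrix (intercept column + 4 features) and parameters
    A = np.array([[1.0, 2.0, 3.0, 450.0, 950.0],
                  [1.0, 1.0, 2.0, 300.0, 610.0]])
    beta = np.array([-0.5, 0.3, 0.1, -0.001, 0.002])

    probs = sigmoid(A @ beta)   # predicted probability of "north" (y = 1)
    print(probs)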


Example 2: Linear Classification / Logistic Regression

The logistic regression learning problem is

    maximize_β  Σ_{m=1}^{M}  y_m log σ(β^T A_{m,.}) + (1 − y_m) log(1 − σ(β^T A_{m,.}))

with the design matrix A and the labels y defined as above.
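A minimal training sketch for this objective (added for illustration; the data are made up and scipy is assumed to be available): the maximization above is turned into an equivalent minimization of the negative log-likelihood, which is then passed to scipy.optimize.minimize.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    M = 200
    A = np.hstack([np.ones((M, 1)), rng.normal(size=(M, 4))])  # made-up design matrix
    y = (rng.random(M) < 0.5).astype(float)                    # made-up 0/1 labels

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neg_log_likelihood(beta):
        p = sigmoid(A @ beta)
        eps = 1e-12                                            # avoid log(0)
        return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    result = minimize(neg_log_likelihood, x0=np.zeros(A.shape[1]))
    print("estimated beta:", result.x)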


Example 3: Linear Programming

minimize    c^T x
subject to  a_q^T x ≤ b_q,   q = 1, ..., Q
            x ≥ 0

with c, a_q, x ∈ R^N and b_q ∈ R.

- No simple analytical solution.
- There are reliable algorithms available (see the sketch below):
  - Simplex
  - Interior point methods
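A minimal sketch of solving such a linear program numerically (added for illustration; the data are made up and scipy is assumed to be available). scipy.optimize.linprog expects the inequality constraints in the form A_ub x ≤ b_ub and handles x ≥ 0 through variable bounds; its backends implement simplex- and interior-point-type algorithms.

    import numpy as np
    from scipy.optimize import linprog

    # made-up LP data: minimize c^T x  s.t.  a_q^T x <= b_q,  x >= 0
    c = np.array([1.0, 2.0, -1.0])
    A_ub = np.array([[1.0, 1.0, 1.0],
                     [2.0, 0.5, 1.0]])
    b_ub = np.array([10.0, 8.0])

    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
    print("optimal value:", result.fun)
    print("optimal x:", result.x)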


Summary (1/2)

- Convex sets are closed under line segments (convex combinations).

- Convex functions are defined on a convex domain and
  - lie below any of their secant segments / chords (definition),
  - lie globally above their tangents (1st-order condition),
  - have a positive semidefinite Hessian (2nd-order condition).

- For convex functions, points with vanishing gradient are (equivalently) global minima.

- Operations that preserve convexity:
  - scaling with a nonnegative constant
  - sums
  - pointwise maximum
  - composition with an affine function
  - composition with a nondecreasing convex scalar function
  - composition of a nonincreasing convex scalar function with a concave function
    (in particular, −g is convex for concave g)


Summary (2/2)

- General optimization problems consist of
  - an objective function,
  - equality constraints and
  - inequality constraints.

- Convex optimization problems have
  - a convex objective function,
  - affine equality constraints and
  - convex inequality constraints.

- Examples of convex optimization problems:
  - linear regression / least squares
  - linear classification / logistic regression
  - linear programming
  - quadratic programming


Further Readings

- Convex sets:
  - Boyd and Vandenberghe [2004], chapter 2, esp. 2.1
  - see also ch. 2.2 and 2.3

- Convex functions:
  - Boyd and Vandenberghe [2004], chapter 3, esp. 3.1.1–7, 3.2.1–5

- Convex optimization:
  - Boyd and Vandenberghe [2004], chapter 4, esp. 4.1–3
  - see also ch. 4.4


References

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
