MATH2070 Optimisation Nonlinear optimisation without constraints Semester 2, 2012 Lecturer: I.W. Guo Lecture slides courtesy of J.R. Wishart


Page 1: MATH2070 - Non-linear optimisation without constraints

MATH2070 Optimisation

Nonlinear optimisation without constraints

Semester 2, 2012
Lecturer: I.W. Guo

Lecture slides courtesy of J.R. Wishart

Page 2

Review: Nonlinear optimisation without constraints

The full non-linear optimisation problem with no constraints

Bivariate (two dimensional) method

Generalised Multivariate (n-dimensional) method

Use of Eigenvalues in Hessian

Convex functions and sets


Page 4

Non-linear optimisation

Interested in optimising a function of several variables with no constraints.

Multivariate framework

Variables x = (x1, x2, . . . , xn) ∈ D ⊂ Rn

Objective Function Z = f (x1, x2, . . . , xn)

Example

Typical non-linear functions:

- f(x1, x2) = 7x1^5 − 10x2^3 + 3x1x2.

- f(x1, x2, x3) = e^(−x3) (x1^2 + x2^2).

- f(x1, x2) = (1/(2π)) exp(−(x1^2 + x2^2)/2).

Page 5

Example: Trigonometric function

Figure 1: Objective function f(x1, x2) = sin(x1) × sin(x2)

Page 6

Example: Bivariate Normal density

Figure 2: Objective function f(x1, x2) = (1/(π√3)) exp(−(2/3)(x1^2 + x2^2 − x1x2))

Page 7

Optimisation of multivariate functions

Consider first minimising the objective function over its domain D,

Z∗ = min_{(x1, x2, . . . , xn) ∈ D} f(x1, x2, . . . , xn).

Written in vector notation,

Z∗ = min_{x ∈ D} f(x).

Consider the problem of local minima, which is simpler.

Methodology

1. Find critical points, i.e. points where the first partial derivatives of f are zero.

2. Determine the nature of the critical points by looking at second order derivatives.

Page 8

No loss of generality

Can extend this to maximisation through the following,

max_{x ∈ D} f(x) = −min_{x ∈ D} (−f(x))

Figure 3: Objective function f(x1, x2) = −(1/(π√3)) exp(−(2/3)(x1^2 + x2^2 − x1x2))

Page 9

Univariate review

Univariate framework

Domain x ∈ D ⊂ R.

Objective function f : [a, b]−→R.

Necessary conditions for Extrema

Extrema occur at the boundary of D, or

in the interior of D when df/dx = 0.

Page 10

Example

(Figure: graph of f(x) on [a, b], with extrema marked at x∗.)

- Global max at the left boundary

- Global min in the interior

- Local max near the right boundary.

Page 11

Nature of extrema

Generalised higher derivative test

Let m be a positive integer and assume that there exists an x0 ∈ (a, b) such that

f^(1)(x0) = f^(2)(x0) = · · · = f^(2m−1)(x0) = 0.

Then the following holds:

1. If f^(2m)(x0) > 0, then x0 is the location of a local minimum.

2. If f^(2m)(x0) < 0, then x0 is the location of a local maximum.

3. If f^(2m)(x0) = 0 and f^(2m+1)(x0) ≠ 0, then the test fails.
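A quick numerical illustration of the test (not from the slides; plain Python, with f chosen by us): for f(x) = x^4 at x0 = 0 we have m = 2, the first three derivatives vanish at 0, and f''''(0) = 24 > 0, so the test predicts a local minimum.

```python
# Higher derivative test illustration: f(x) = x^4, x0 = 0, m = 2.
# f'(0) = f''(0) = f'''(0) = 0 and f''''(0) = 24 > 0, so 0 should be a local minimum.
def f(x):
    return x**4

x0 = 0.0
# f exceeds f(x0) at nearby points on both sides, as the test predicts.
for h in [0.1, 0.01, -0.1, -0.01]:
    assert f(x0 + h) > f(x0)
print("local minimum at 0 confirmed numerically")
```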

Page 12

Taylor’s Theorem

Theorem (Taylor's Theorem)

Suppose that f ∈ C^p[a, b] and the derivative f^(p+1) exists on [a, b], and let x0 ∈ [a, b]. For every x ∈ [a, b] there exists η(x) between x0 and x such that

f(x) = f(x0) + f^(1)(x0)(x − x0) + (f^(2)(x0)/2!)(x − x0)^2 + · · · + (f^(p)(x0)/p!)(x − x0)^p + (f^(p+1)(η)/(p + 1)!)(x − x0)^(p+1).

To prove the results on the previous slide, choose p = 2m − 1.

Page 13

Proof of result 1.

Choosing p = 2m − 1 in Taylor's theorem yields,

f(x) = f(x0) + (f^(2m)(η)/(2m)!)(x − x0)^(2m),

where η is between x and x0.

- The term (x − x0)^(2m) is positive for x ≠ x0.

- By assumption and continuity, f^(2m)(η) will be positive if x is in a neighbourhood of x0. Then f(x) > f(x0) when x is in a neighbourhood of x0, so x0 is a local minimum.

Page 14

Proof of result 2.

Again, choose p = 2m − 1 in Taylor's theorem,

f(x) = f(x0) + (f^(2m)(η)/(2m)!)(x − x0)^(2m),

where η is between x and x0.

- The term (x − x0)^(2m) is positive for x ≠ x0.

- By assumption and continuity, f^(2m)(η) will be negative if x is in a neighbourhood of x0. Then f(x) < f(x0) when x is in a neighbourhood of x0, so x0 is a local maximum.

Page 15

The full non-linear optimisation problem with no constraints

Bivariate (two dimensional) method

Generalised Multivariate (n-dimensional) method

Use of Eigenvalues in Hessian

Convex functions and sets

Page 16

Bivariate Analysis

Consider now the necessary and sufficient conditions for extrema of a bivariate function.

Bivariate framework

Domain x = (x, y) ∈ D ⊂ R2.

Objective function f = f(x, y) : D −→ R.

Extend the Taylor expansion argument to two dimensional case.

Page 17

Necessary condition for extrema

Critical point

A point x0 is a stationary point of the function f = f(x) if

∂f/∂x = ∂f/∂y = 0.

Introduce notation for partial derivatives,

Notation

Let f = f(x, y) then,

∂f/∂x = fx ,  ∂f/∂y = fy ,  ∂^2f/∂x^2 = fxx ,  ∂^2f/∂x∂y = fxy ,

and so on.

Page 18

Minimisation example

Example

Minimise z = f(x, y) = (1/2)(x − 1)^2 + (1/2)(y − 2)^2 + 1.

Solution

Consider the necessary conditions:

∂f/∂x = x − 1 = 0

∂f/∂y = y − 2 = 0

so the only critical point is (x, y) = (1, 2) and the result follows.

Page 19

Bivariate version

Theorem (Taylor’s Theorem)

Suppose that f(x, y) and its partial derivatives of all orders less than or equal to p + 1 are continuous on D ⊆ R2, and let x0 = (x0, y0) ∈ D. For every x = (x, y) ∈ D, there exists ξ between x and x0, and η between y and y0, such that

f(x, y) = f(x0, y0) + fx(x0)(x − x0) + fy(x0)(y − y0)
  + (1/2!)[fxx(x0)(x − x0)^2 + 2fxy(x0)(x − x0)(y − y0) + fyy(x0)(y − y0)^2]
  + · · · + (1/p!) Σ_{j=0}^{p} C(p, j) (∂^p f / ∂x^(p−j)∂y^j)|_{x0} (x − x0)^(p−j)(y − y0)^j + Rp(x, y, ξ, η),

where Rp is a remainder term.

Page 20

Example: Inverted linear function

Figure 4: Objective function z = f(x, y) = 1/(x − y)

Page 21

Quadratic level Taylor expansion: Example

Find a quadratic approximation to z = 1/(x − y) at (x, y) = (1, 0).

Let z = 1/(x − y) = (x − y)^(−1). Finding the partial derivatives,

zx = −1/(x − y)^2 = −zy ,  zxx = 2/(x − y)^3 = zyy = −zxy .

Evaluating z and these derivatives at x0 = (1, 0) gives

z(1, 0) = 1 ,  zx(1, 0) = −1 = −zy ,  zxx = 2 = zyy = −zxy .

Applying Taylor's Theorem,

z(x, y) = 1 − (x − 1) + y + (1/2)[2(x − 1)^2 − 4(x − 1)y + 2y^2] + . . .

        = 3 − 3x + 3y − 2xy + x^2 + y^2 + . . .
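As a numerical sanity check (our own addition, plain Python): near (1, 0) the quadratic approximation should track z = 1/(x − y) with a cubic-order error.

```python
def z(x, y):
    """Exact objective z = 1 / (x - y)."""
    return 1.0 / (x - y)

def z_quad(x, y):
    """Quadratic Taylor approximation of z about (x0, y0) = (1, 0)."""
    return 3 - 3*x + 3*y + x**2 - 2*x*y + y**2

# Near (1, 0) the approximation tracks the function with a small error.
for (x, y) in [(1.05, 0.02), (0.95, -0.03), (1.1, 0.05)]:
    err = abs(z(x, y) - z_quad(x, y))
    print(f"({x}, {y}): exact={z(x, y):.5f}  approx={z_quad(x, y):.5f}  err={err:.1e}")
    assert err < 1e-3
```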

Page 22

Hessian matrix

Consider the Taylor expansion in matrix form.

Introduce the gradient operator ∇f := (∂f/∂x, ∂f/∂y)^T.

Definition

Define the Hessian matrix

H(x) = H(x, y) := ( ∂^2f/∂x^2   ∂^2f/∂x∂y
                    ∂^2f/∂y∂x   ∂^2f/∂y^2 ).

Page 23

Matrix version of Taylor formula

Define the displacement from x0 to x as,

d := x − x0 = (x − x0, y − y0)^T.

Then Taylor's formula can be rewritten,

Matrix version

f(x) = f(x0 + d) = f(x0) + d^T ∇f(x0) + (1/2) d^T H(x0) d + · · ·

Page 24

Proof of Matrix version

∇f(x0)^T d = ( ∂f/∂x(x0)  ∂f/∂y(x0) ) (x − x0, y − y0)^T
           = ∂f/∂x(x0)(x − x0) + ∂f/∂y(x0)(y − y0)

d^T H(x0) d = ( x − x0  y − y0 ) ( ∂^2f/∂x^2(x0)   ∂^2f/∂x∂y(x0)
                                   ∂^2f/∂y∂x(x0)   ∂^2f/∂y^2(x0) ) (x − x0, y − y0)^T

            = ∂^2f/∂x^2(x0)(x − x0)^2 + 2 ∂^2f/∂x∂y(x0)(x − x0)(y − y0) + ∂^2f/∂y^2(x0)(y − y0)^2.

Page 25

Application of Matrix version

Recall, the necessary condition for a critical point at x0 is,

∂f/∂x = ∂f/∂y = 0  ⇒  ∇f(x0) = 0.

So, at a critical point the Taylor expansion reduces to,

f(x) = f(x0) + (1/2) d^T H(x0) d + . . .

Focus the analysis on the matrix H to determine sufficient conditions for the behaviour of critical points.

Page 26

Quadratic Form

Assume the matrix M is symmetric. Then:

Definition

The expression Q := x^T M x is known as the quadratic form associated with the matrix M.

It is called a quadratic form because every term is a pairwise product (degree two) of elements of the vector.

Hessian matrices are symmetric whenever the second partial derivatives of f are continuous, since the mixed partial derivatives then agree.

Page 27

Invariant under order of partial differentiation

If the second partial derivatives of f are continuous, then

∂^2f/∂x∂y = ∂^2f/∂y∂x,

and therefore the Hessian matrix becomes,

H(x) = ( ∂^2f/∂x^2   ∂^2f/∂y∂x     =  ( ∂^2f/∂x^2   ∂^2f/∂x∂y
         ∂^2f/∂x∂y   ∂^2f/∂y^2 )        ∂^2f/∂x∂y   ∂^2f/∂y^2 ).

Why is this useful?

Page 28

Quadratic form

Example

If H is the symmetric matrix,

H = ( 2 5
      5 2 ),

the associated quadratic form Q = x^T H x is

Q = ( x y ) ( 2 5; 5 2 ) ( x; y ) = 2x^2 + 5xy + 5yx + 2y^2

  = 2x^2 + 10xy + 2y^2 .

Page 29

General Quadratic form for bivariate functions

Bivariate Quadratic form

Given x = (x, y)^T and a symmetric matrix,

M = ( a b
      b c ),

then

Q := x^T M x = ax^2 + 2bxy + cy^2.
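The identity Q = x^T M x = ax^2 + 2bxy + cy^2 can be checked numerically (our own sketch using NumPy, with illustrative coefficients):

```python
import numpy as np

a, b, c = 2.0, 5.0, 2.0          # illustrative coefficients (our choice)
M = np.array([[a, b],
              [b, c]])           # the associated symmetric matrix

def Q_matrix(v):
    """Quadratic form evaluated as x^T M x."""
    return float(v @ M @ v)

def Q_poly(x, y):
    """The same form expanded as a*x^2 + 2*b*x*y + c*y^2."""
    return a*x**2 + 2*b*x*y + c*y**2

# The two expressions agree on arbitrary vectors.
for v in [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([-2.0, 3.0])]:
    assert abs(Q_matrix(v) - Q_poly(v[0], v[1])) < 1e-12
print("x^T M x matches a*x^2 + 2*b*x*y + c*y^2")
```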

Page 30

Inverse operation of Quadratic form

Inverse Quadratic form result

Consider the quadratic form Q := ax^2 + bxy + cyx + dy^2. Then the associated symmetric matrix M is

M = ( a          (b + c)/2
      (b + c)/2  d         ).

Example

If Q = 3x^2 + 14xy + y^2, then the associated symmetric matrix M is,

M = ( 3 7
      7 1 ).

Page 31

Link to Quadratic form

The quadratic form Q = x^T H x, and its associated symmetric matrix H, may be classified as follows:

1. Q and H are positive definite, if Q > 0 for x ≠ 0.

2. Q and H are negative definite, if Q < 0 for x ≠ 0.

3. Q and H are positive semi-definite, if Q ≥ 0 and Q = 0 for some x ≠ 0.

4. Q and H are negative semi-definite, if Q ≤ 0 and Q = 0 for some x ≠ 0.

5. Q and H are indefinite, if there exist x1, x2 such that Q > 0 at x = x1 and Q < 0 at x = x2.

Page 32

Positive definite function

Figure 5: Q1(x1, x2) = x1^2 + x2^2

Notice that the function is positive for all values in its domain (excluding x = 0).

Page 33

Negative definite function

Figure 6: Q2(x1, x2) = −x1^2 − x2^2

Notice that the function is negative for all values in its domain (excluding x = 0).

Page 34

Indefinite function

Figure 7: Q3(x1, x2) = x1^2 − x2^2

Notice the saddle point around x = 0.

Page 35

Positive semi-definite function

Figure 8: Q4(x1, x2) = x1^2 + 2x1x2 + x2^2

The function is non-negative but can be zero along the line x1 = −x2.

Page 36

Negative semi-definite function

Figure 9: Q5(x1, x2) = −(x1^2 + 2x1x2 + x2^2)

The function is non-positive but can be zero along the line x1 = −x2.

Page 37

Why bother with Quadratic form?

Reason

There exist sufficient conditions for the nature of critical points, and those conditions are linked to the Hessian H and the quadratic form Q.

Sufficient conditions

1. If H is positive definite, then x0 is a local minimum of f(x).

2. If H is negative definite, then x0 is a local maximum of f(x).

3. If H is positive semi-definite, then the test fails.

4. If H is negative semi-definite, then the test fails.

5. If H is indefinite, then x0 is a saddle point of f(x).

Page 38

Why does the test fail sometimes?

Recall the Taylor expansion,

f(x) = f(x0) + d^T ∇f(x0) + (1/2) d^T H(x0) d + · · ·

If H(x0) is semi-definite, then d^T H(x0) d keeps one sign but vanishes for some directions d ≠ 0, so the quadratic term alone cannot decide the nature of x0.

So, the nature is determined by higher order terms in the expansion.

Page 39

Example of completing the square

Example

Let Q be given by Q = x2 + 6xy + 11y2.

Determine the nature of this quadratic form.

Solution

Q = x^2 + 6xy + 11y^2

  = (x^2 + 6xy + (3y)^2) + 2y^2

  = (x + 3y)^2 + 2y^2

  > 0

for (x, y) ≠ (0, 0). Therefore Q is positive definite.
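The same conclusion can be cross-checked numerically (our own addition, using NumPy): the symmetric matrix associated with Q is M = (1 3; 3 11), and both of its eigenvalues are positive.

```python
import numpy as np

# Symmetric matrix associated with Q = x^2 + 6xy + 11y^2 (b = 6 gives off-diagonal 3).
M = np.array([[1.0, 3.0],
              [3.0, 11.0]])

eigvals = np.linalg.eigvalsh(M)  # eigenvalues of a symmetric matrix, ascending order
print("eigenvalues:", eigvals)

# Both eigenvalues are positive, so Q is positive definite,
# agreeing with the completing-the-square argument.
assert np.all(eigvals > 0)
```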

Page 40

The full non-linear optimisation problem with no constraints

Bivariate (two dimensional) method

Generalised Multivariate (n-dimensional) method

Use of Eigenvalues in Hessian

Convex functions and sets

Page 41

Generalise the optimisation

Consider the full generalised framework.

Multivariate framework

Variables x = (x1, x2, . . . , xn) ∈ D ⊂ Rn

Objective Function Z = f (x1, x2, . . . , xn)

Wish to extend the arguments in the Bivariate section with Hessian and Quadratic forms to the n-dimensional case.

Notation

For the generalised partial derivatives denote,

∂^p f / (∂x1^i1 ∂x2^i2 · · · ∂xn^in) = f_{x1^i1 x2^i2 ... xn^in}

Page 42

Taylor’s Theorem

Theorem

Suppose that f(x) and its partial derivatives of all orders less than or equal to p + 1 are continuous on an open set D ⊂ Rn, and let x0 ∈ D be such that the line segment joining x to x0 lies in D. For every x ∈ D, there exists ξ between x and x0, such that

f(x) = f(x0) + Σ_{i=1}^{n} fxi(x0)(xi − x0i)
  + (1/2!) Σ_{i=1}^{n} Σ_{j=1}^{n} fxixj(x0)(xi − x0i)(xj − x0j) + · · ·
  + (1/p!) Σ_{i∈S} [p!/(i1! i2! · · · in!)] f_{x1^i1 x2^i2 ... xn^in}(x0) (x1 − x01)^i1 (x2 − x02)^i2 · · · (xn − x0n)^in + Rp(x),

where the summation set S = {i1, i2, . . . , in : 0 ≤ i1, i2, . . . , in ≤ p, i1 + i2 + · · · + in = p} and Rp is the remainder term.

Page 43

Generalised gradient operator

Definition

Define the n-dimensional gradient operator

∇f := (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn)^T.

So the first summation is

Σ_{i=1}^{n} fxi(x0)(xi − x0i) = (x − x0)^T ∇f(x0).

Page 44

Generalised Hessian

Definition

Define the Hessian matrix

H = ( ∂^2f/∂xi∂xj )_{1≤i,j≤n} =

( ∂^2f/∂x1^2     ∂^2f/∂x1∂x2   · · ·  ∂^2f/∂x1∂xn
  ∂^2f/∂x2∂x1   ∂^2f/∂x2^2    · · ·  ⋮
  ⋮             ⋮             ⋱      ⋮
  ∂^2f/∂xn∂x1   · · ·         · · ·  ∂^2f/∂xn^2 ).

Then the second summation is

Σ_{i=1}^{n} Σ_{j=1}^{n} fxixj(x0)(xi − x0i)(xj − x0j) = d^T H(x0) d.

Page 45

Functions

Consider functions f such that,

∂^2f/∂xi∂xj = ∂^2f/∂xj∂xi,

so that the Hessian H is always a symmetric matrix, i.e. Hij = Hji:

H = ( ∂^2f/∂x1^2     ∂^2f/∂x1∂x2   · · ·  ∂^2f/∂x1∂xn
      ∂^2f/∂x1∂x2   ∂^2f/∂x2^2    · · ·  ⋮
      ⋮             ⋮             ⋱      ⋮
      ∂^2f/∂x1∂xn   · · ·         · · ·  ∂^2f/∂xn^2 ).

Page 46

Necessary conditions

Critical point

A point x0 is a stationary point of the function f = f(x) if

∂f/∂x1 = ∂f/∂x2 = · · · = ∂f/∂xn = 0.

Page 47

General Quadratic form for trivariate functions

Trivariate Quadratic form

Given x = (x, y, z)^T and a symmetric matrix,

M = ( a b c
      b d e
      c e f ),

then

Q := x^T M x = ax^2 + 2bxy + 2cxz + dy^2 + 2eyz + fz^2.

Page 48

Example of going Q→M

Example

The quadratic form

Q = x1^2 + 4x1x2 + 5x2^2 + 6x2x3 + 2x3^2 + 2x1x3

has the associated symmetric matrix,

M = ( 1 2 1
      2 5 3
      1 3 2 ).

Page 49

Inverse operation of Quadratic form (Trivariate)

Inverse Quadratic form result

Consider the quadratic form

Q := ax^2 + bxy + cxz + dyz + ey^2 + fz^2.

Then the associated symmetric matrix M is

M = ( a    b/2  c/2
      b/2  e    d/2
      c/2  d/2  f   ).


Page 51

Completing the square

Example

Complete the square for the form

Q = x1^2 + 4x1x2 + 5x2^2 + 6x2x3 + 2x3^2 + 2x1x3.

Solution

Q = (x1^2 + 4x1x2 + 2x1x3) + 5x2^2 + 6x2x3 + 2x3^2

  = (x1 + 2x2 + x3)^2 + x2^2 + 2x2x3 + x3^2

  = (x1 + 2x2 + x3)^2 + (x2 + x3)^2 .

- Q is non-negative (a sum of squares).

- Q can be zero if x = (a, −a, a) for some a ≠ 0, so Q is positive semi-definite.
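Numerically (our own check with NumPy): the symmetric matrix associated with this Q has one zero eigenvalue and two positive ones, confirming positive semi-definiteness, and Q vanishes along the direction x = (a, −a, a).

```python
import numpy as np

# Symmetric matrix associated with Q = x1^2 + 4x1x2 + 5x2^2 + 6x2x3 + 2x3^2 + 2x1x3.
M = np.array([[1.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 2.0]])

eigvals = np.linalg.eigvalsh(M)  # ascending order
print("eigenvalues:", eigvals)

# Smallest eigenvalue is 0 and the rest are positive: positive semi-definite.
assert abs(eigvals[0]) < 1e-9 and eigvals[1] > 0

# Q vanishes on the direction x = (a, -a, a), matching the completed square.
v = np.array([1.0, -1.0, 1.0])
assert abs(v @ M @ v) < 1e-9
```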

Page 52

The same sufficient conditions apply in the multivariate case.

Let x = (x1, x2, . . . , xn)^T and let H be the associated Hessian matrix. Assume ∇f(x0) = 0 for some x0 ∈ Rn.

Sufficient conditions

1. If H is positive definite, then x0 is a local minimum of f(x).

2. If H is negative definite, then x0 is a local maximum of f(x).

3. If H is positive semi-definite, then the test fails.

4. If H is negative semi-definite, then the test fails.

5. If H is indefinite, then x0 is a saddle point of f(x).

Page 53

3-dimensional example

Find the critical points of z = f(x1, x2, x3) = 3x1^2 x2^2 + x3^2.

Show that the Hessian is only positive semi-definite and the local minimum test fails.

Are the critical points actually local minima?

Page 54

The full non-linear optimisation problem with no constraints

Bivariate (two dimensional) method

Generalised Multivariate (n-dimensional) method

Use of Eigenvalues in Hessian

Convex functions and sets

Page 55

Alternative to completing the square

The first method considered to check the nature of H is completing the square.

An alternative method to determine the nature of the Hessian H is to check the eigenvalues of the matrix H.

In particular, we check the signs of the eigenvalues of the matrix to show it is positive or negative definite.

Page 56

Brief review of linear algebra

Definition

The eigenvalues of a square n × n matrix H are the values λ1, λ2, . . . , λn such that,

det(H − λi I) = 0,

where det(M) is the determinant of the matrix M and I is the identity matrix.

Recall, H is a symmetric matrix which means the eigenvalues arereal.

To ease the notation, assume eigenvalues are ordered so that,

λ1 ≤ λ2 ≤ · · · ≤ λn .

Page 57

Sufficient conditions for nature of H

Eigenvalue Test

1. H is positive definite if and only if all its eigenvalues arepositive, i.e. λ1 > 0.

2. H is negative definite if and only if all its eigenvalues arenegative, i.e. λn < 0.

3. H is indefinite if and only if it has positive and negativeeigenvalues, i.e. λ1 < 0 and λn > 0.

4. H is positive semi-definite if and only if all its eigenvalues arenon-negative and at least one is zero, i.e. λ1 = 0.

5. H is negative semi-definite if and only if all its eigenvalues arenon-positive and at least one is zero, i.e. λn = 0.
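The eigenvalue test translates directly into a small numerical helper (a sketch of ours using NumPy; the function name and tolerance are illustrative, not from the notes):

```python
import numpy as np

def classify(H, tol=1e-9):
    """Classify a symmetric matrix by the signs of its eigenvalues.
    (Illustrative helper; the name and tolerance are our own choices.)"""
    lam = np.linalg.eigvalsh(H)  # real eigenvalues of a symmetric matrix, ascending
    if lam[0] > tol:
        return "positive definite"
    if lam[-1] < -tol:
        return "negative definite"
    if lam[0] < -tol and lam[-1] > tol:
        return "indefinite"
    if lam[0] > -tol:
        # all eigenvalues >= 0 (numerically) and at least one is ~ 0
        return "positive semi-definite"
    return "negative semi-definite"

print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))   # positive definite
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))   # positive semi-definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```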

Page 58

The same sufficient conditions apply in the multivariate case.

Let x = (x1, x2, . . . , xn)^T and let H be the associated Hessian matrix. Assume ∇f(x0) = 0 for some x0 ∈ Rn.

Sufficient conditions

1. If H is positive definite, then x0 is a local minimum of f(x).

2. If H is negative definite, then x0 is a local maximum of f(x).

3. If H is positive semi-definite, then the test fails.

4. If H is negative semi-definite, then the test fails.

5. If H is indefinite, then x0 is a saddle point of f(x).

Page 59

Simple Hessian eigenvalue example

Example

Find all critical points for the function

f(x, y) = x2 + y2

and determine the nature of the critical points.

Solution

Finding the critical points,

∂f/∂x = 2x = 0 ,  ∂f/∂y = 2y = 0 .

Critical point at x0 = (0, 0). The Hessian is

H = ( 2 0
      0 2 ).

Both eigenvalues equal 2 > 0, so H is positive definite and (0, 0) is a local minimum.

Page 60

3-dimensional Hessian example

Find all extrema of the function,

f(x, y, z) = 3xy − x3 − y3 − 3z2

Critical points at,

∂f/∂x = 3y − 3x^2 = 0 ,
∂f/∂y = 3x − 3y^2 = 0 ,
∂f/∂z = −6z = 0 .

These equations have two critical points, (1, 1, 0) and (0, 0, 0). The Hessian is

H(x, y, z) = ( −6x   3    0
                3   −6y   0
                0    0   −6 ) .


At (0, 0, 0) the Hessian is

H(0, 0, 0) = ( 0   3   0
               3   0   0
               0   0  −6 ) .

The eigenvalues of H(0, 0, 0) are

λ = 3, −3, −6.

Therefore H(0, 0, 0) is indefinite and (0, 0, 0) is a saddle point. At the other critical point (1, 1, 0),

H(1, 1, 0) = ( −6   3   0
                3  −6   0
                0   0  −6 ) .

The eigenvalues are λ = −3, −6, −9, so H(1, 1, 0) is negative definite and hence there is a local maximum at (1, 1, 0).
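As a numerical check (my own addition, not part of the slides), the eigenvalues of the two Hessians can be verified with NumPy:

```python
import numpy as np

# Hessians of f(x, y, z) = 3xy - x^3 - y^3 - 3z^2 at the two critical points
H0 = np.array([[ 0.0,  3.0,  0.0],
               [ 3.0,  0.0,  0.0],
               [ 0.0,  0.0, -6.0]])   # H(0, 0, 0)
H1 = np.array([[-6.0,  3.0,  0.0],
               [ 3.0, -6.0,  0.0],
               [ 0.0,  0.0, -6.0]])   # H(1, 1, 0)

print(np.linalg.eigvalsh(H0))  # approx [-6, -3, 3]: indefinite, saddle point
print(np.linalg.eigvalsh(H1))  # approx [-9, -6, -3]: negative definite, local maximum
```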


The full non-linear optimisation problem with no constraints

Bivariate (two dimensional) method

Generalised Multivariate (n-dimensional) method

Use of Eigenvalues in Hessian

Convex functions and sets


Convex Set

Definition

A set Ω ⊆ Rn is convex if, for any points P, Q in Ω, the line segment PQ joining P and Q lies in Ω. Equivalently, if p and q are the position vectors of P and Q, then the point R with position vector r = cp + (1 − c)q lies in Ω for all 0 ≤ c ≤ 1.

Roughly speaking, if two points x, y ∈ Ω, then the line joining thepoints x and y is also in Ω.

I Linear constraints are convex,

g(x) = ∑_{i=1}^{n} ci xi − b = 0,

for some constants ci and b.


Convex function

Definition

A function f(x) = f(x1, x2, . . . , xn) is convex on the convex set Ω if

f(cp + (1 − c)q) ≤ cf(p) + (1 − c)f(q) ,

for all p, q ∈ Ω and all 0 ≤ c ≤ 1.

Note in particular that linear functions are convex. Proof: left as an exercise.

I A function is called strictly convex if the inequality is strict whenever p ≠ q and 0 < c < 1.

I A function f is (strictly) concave if −f is (strictly) convex.
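The defining inequality is easy to spot-check numerically. The following is my own sketch (not from the slides), sampling random points for f(x) = x^2 on Ω = R:

```python
import numpy as np

f = lambda x: x**2  # a convex function on R

rng = np.random.default_rng(0)
for _ in range(1000):
    p, q = rng.uniform(-10.0, 10.0, size=2)
    c = rng.uniform(0.0, 1.0)
    # f(c p + (1 - c) q) <= c f(p) + (1 - c) f(q), up to rounding error
    assert f(c * p + (1 - c) * q) <= c * f(p) + (1 - c) * f(q) + 1e-9
print("convexity inequality held on all samples")
```

Such a sampling check can only refute convexity, never prove it; a proof needs the Hessian criterion given later.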


Convex function example

[Figure: graph of a convex function f, with points p, r = cp + (1 − c)q and q on the x-axis; the chord value cf(p) + (1 − c)f(q) lies above f(r).]

Roughly speaking, the chord between two points on f lies above f.


Theorem

If f1, . . . , fn are convex functions on the convex set Ω and ci ≥ 0, then

∑_{i=1}^{n} ci fi is convex on Ω.

Proof is left as an exercise.

Theorem

If f is a convex function on the convex set Ω, then

Uc = { x ∈ Ω | f(x) ≤ c }

is a convex set for all c ∈ R.


Differentiable Convex functions

Theorem

Let f ∈ C1(Ω) with Ω convex. Then f is convex if and only if

f(y) ≥ f(x) + (y − x)T∇f(x)

for all x,y ∈ Ω.

Theorem

Let f ∈ C2(Ω), where Ω is a non-empty convex set. Then f is convex on Ω if and only if Hf, the Hessian of f, is positive semi-definite on Ω.


Check for Convexity

Example

Is the function

f(x, y) = 2x^2 + xy + 2y^2 − 12x + 12y − 12

convex?
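One way to answer, sketched here with NumPy rather than as the slides' worked solution: by the second-order characterisation, it suffices to check that the (constant) Hessian of f is positive semi-definite.

```python
import numpy as np

# Hessian of f(x, y) = 2x^2 + xy + 2y^2 - 12x + 12y - 12 (constant in x, y)
H = np.array([[4.0, 1.0],
              [1.0, 4.0]])
print(np.linalg.eigvalsh(H))  # approx [3, 5]: positive definite
```

Both eigenvalues are positive, so H is positive definite everywhere and f is (strictly) convex.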


Check for Convexity

Example

Is the function

f(x, y) = −2x^2 + xy + 2y^2 − 2x + 8y + 1

convex?
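The same Hessian check, again my own sketch rather than the slides' worked solution, settles this one too:

```python
import numpy as np

# Hessian of f(x, y) = -2x^2 + xy + 2y^2 - 2x + 8y + 1 (constant in x, y)
H = np.array([[-4.0, 1.0],
              [ 1.0, 4.0]])
eig = np.linalg.eigvalsh(H)
print(eig)  # approx [-4.12, 4.12], i.e. -sqrt(17) and sqrt(17)
```

The eigenvalues have opposite signs, so H is indefinite and f is neither convex nor concave.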


Convex optimisation

Theorem

If f is a convex function on a convex set Ω, then the set of minimisers

M = { x ∈ Ω | f(x) = min_{y∈Ω} f(y) }

is a convex set and any local minimum is a global minimum.

Theorem

If f ∈ C1(Ω) with Ω convex and there exists an x∗ such that for all y ∈ Ω,

(y − x∗)T∇f(x∗) ≥ 0,

then x∗ is a global minimum of f over Ω.