
IntroductionLine Search Optimization Methods

Trust Region Optimization MethodsOptimization for Partly Smooth Functions

ImplementationsExperiments and Applications

Conclusions

Optimization Algorithms on Riemannian Manifolds with Applications

Wen Huang

Coadvisors: Kyle A. Gallivan and Pierre-Antoine Absil

Florida State University and Catholic University of Louvain

November 5, 2013


Problem Statements

Finding an optimum of a real-valued function f on a Riemannian manifold, i.e.,

min f(x), x ∈ M

Finite-dimensional manifold

Roughly speaking, a manifold is a set endowed with coordinate patches that overlap smoothly, e.g.,

sphere: {x ∈ R^n : ‖x‖2 = 1}.
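To make this setting concrete (this example is not from the slides), a standard instance is the Rayleigh quotient f(x) = x^T A x on the unit sphere, whose minimizer is an eigenvector associated with the smallest eigenvalue of A. A minimal NumPy sketch, with the matrix and dimension chosen arbitrarily:

```python
import numpy as np

# Illustrative setup (names and sizes are arbitrary choices for this sketch):
# minimize f(x) = x^T A x over the sphere M = {x in R^n : ||x||_2 = 1}.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                      # symmetric cost matrix

def f(x):
    """Rayleigh quotient cost restricted to the unit sphere."""
    return float(x @ A @ x)

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                 # a feasible point on the sphere
print(f(x), np.linalg.eigvalsh(A)[0])  # f(x) is bounded below by the smallest eigenvalue
```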


Motivations

Optimization on manifolds is used in many areas [AMS08].

Numerical linear algebra

Signal processing

Data mining

Statistical image analysis


Frameworks of Optimization

Line search optimization methods

Find a search direction,

Apply a line search algorithm and obtain the next iterate.

Trust region optimization methods

Build a local model that approximates the objective function f,

Optimize the local model and obtain a candidate for the next iterate,

If the local model is close to f, then accept the candidate as the next iterate; otherwise, reject the candidate,

Update the local model.


Existing Euclidean Optimization Algorithms

There are many algorithms developed for problems in Euclidean space (see e.g. [NW06]), e.g.,

Newton-based (requires gradient and Hessian)

gradient-based (requires gradient only)

Steepest descent

Quasi-Newton

Restricted Broyden family (BFGS, DFP)

Symmetric rank-1 update

These ideas can be combined with line search or trust region strategies.


Existing Riemannian Optimization Algorithms

The algorithmic and theoretical work on Riemannian manifolds is quite limited.

Trust region with Newton-Steihaug CG (C. G. Baker [Bak08])

Riemannian BFGS (C. Qi [Qi11])

Riemannian BFGS (W. Ring and B. Wirth [RW12])

Quadratic: lim_{k→∞} dist(xk+1, x∗) / dist(xk, x∗)^2 < ∞

Superlinear: lim_{k→∞} dist(xk+1, x∗) / dist(xk, x∗) = 0

Linear: lim_{k→∞} dist(xk+1, x∗) / dist(xk, x∗) < 1


Framework of Line Search Optimization Methods

Line search optimization methods on Euclidean space

x+ = x + αd ,

where d is a descent direction and α is a step size.

This cannot be applied directly to problems on a Riemannian manifold:

direction? addition?

Riemannian concepts can be found in [O’N83, AMS08].


Tangent Space

γ is a curve on M. The tangent vector γ′(0), where γ(0) = x, gives the direction along γ at x.

The tangent space at x is the set of all tangent vectors (directions) at x, denoted by TxM.

The tangent space is a linear space.

[Figure: a curve γ on M through x with a tangent vector ηx in the tangent space TxM.]


Riemannian Metric

A Riemannian metric g is defined on each TxM as an inner product gx : TxM × TxM → R. A Riemannian manifold is the combination (M, g). This results in:

angle between directions and length of directions

distance:

d(x, y) = inf_γ { ∫_0^1 ‖γ̇(t)‖_{γ(t)} dt },

where γ is a curve on M with γ(0) = x and γ(1) = y.

neighborhood:

Bδ(x) = {y ∈ M : d(x, y) < δ}.


Retraction

A retraction is a mapping from a tangent vector to a point on M, denoted by Rx(ηx), where x ∈ M and ηx ∈ TxM.

[Figure: a tangent vector ηx at x and the point Rx(ηx) on M.]
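As an illustration (an assumption of this sketch rather than a definition from the slide), a commonly used retraction on the unit sphere moves along x + ηx in the ambient space and renormalizes:

```python
import numpy as np

def retract_sphere(x, eta):
    """Projection-like retraction on the unit sphere:
    R_x(eta) = (x + eta) / ||x + eta||, for eta in T_x M (x^T eta = 0)."""
    y = x + eta
    return y / np.linalg.norm(y)

# quick check: the result stays on the manifold
x = np.array([1.0, 0.0, 0.0])
eta = np.array([0.0, 0.3, -0.2])               # tangent at x since x^T eta = 0
print(np.linalg.norm(retract_sphere(x, eta)))  # 1.0
```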


Framework of Line Search Optimization Methods

Line search optimization methods on Riemannian manifolds

x+ = Rx(αd),

where d ∈ TxM and α is a step size.


Riemannian Gradient

The Riemannian gradient grad f of f at x is the unique tangent vector such that

〈grad f (x), η〉x = D f (x)[η], ∀η ∈ TxM,

where D f (x)[η] denotes the derivative of f along η.

grad f (x) is the steepest ascent direction.
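For a Riemannian submanifold of R^n with the induced metric, grad f(x) is the orthogonal projection of the Euclidean gradient onto TxM. A sketch for the unit sphere, using the Rayleigh quotient cost purely as an assumed example:

```python
import numpy as np

def egrad_rayleigh(A, x):
    """Euclidean gradient of f(x) = x^T A x (A symmetric)."""
    return 2.0 * (A @ x)

def rgrad_sphere(A, x):
    """Riemannian gradient on the unit sphere: project the Euclidean
    gradient onto T_x M = {v : x^T v = 0}."""
    g = egrad_rayleigh(A, x)
    return g - (x @ g) * x

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
x = rng.standard_normal(4); x /= np.linalg.norm(x)
print(x @ rgrad_sphere(A, x))   # ~0: the gradient lies in the tangent space
```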


Search Direction and Step Size

Search direction: the angle between −grad f and d does not approach π/2.

Step size:

f decreases sufficiently,

Step size is not too small,

e.g., the Wolfe conditions, the Armijo-Goldstein conditions.

The above conditions are sufficient to guarantee convergence.

Example: The figure shows the contour curves of f around a minimizer x∗.

[Figure: contour curves on M around x∗, with the step αd taken from x compared to −grad f(x).]


Steepest Descent

Riemannian steepest descent (RSD): d = − grad f (x),

Converges slowly, i.e., linearly

lim_{k→∞} dist(xk+1, x∗) / dist(xk, x∗) < 1

The Riemannian Hessian of f at x is a linear operator on TxM.

Let Hess f(x∗) denote the Hessian at the minimizer x∗ and let λmin and λmax respectively denote the smallest and largest eigenvalues of Hess f(x∗). The smaller λmin/λmax is, the more slowly steepest descent converges. [AMS08, Theorem 4.5.6]
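Putting the pieces together, below is a minimal Riemannian steepest descent loop on the sphere, assuming the projection retraction and projected gradient sketched above and a simple Armijo backtracking line search (the constants are arbitrary illustrative choices, not values from the slides):

```python
import numpy as np

def rsd_sphere(A, x0, tol=1e-8, max_iter=500):
    """Riemannian steepest descent for f(x) = x^T A x on the unit sphere,
    with Armijo backtracking along the retraction R_x(eta) = (x+eta)/||x+eta||."""
    f = lambda x: x @ A @ x
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        g = 2.0 * (A @ x)
        g = g - (x @ g) * x                  # Riemannian gradient
        if np.linalg.norm(g) < tol:
            break
        d = -g                               # steepest descent direction
        alpha, c1 = 1.0, 1e-4
        while True:                          # Armijo: sufficient decrease
            x_new = (x + alpha * d) / np.linalg.norm(x + alpha * d)
            if f(x_new) <= f(x) + c1 * alpha * (g @ d) or alpha < 1e-12:
                break
            alpha *= 0.5
        x = x_new
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6)); A = (A + A.T) / 2
x_star = rsd_sphere(A, rng.standard_normal(6))
print(x_star @ A @ x_star, np.linalg.eigvalsh(A)[0])   # typically close to the smallest eigenvalue
```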


An Example for Steepest Descent

f is a function defined on a Euclidean space.

x∗ is a minimizer and λmin/λmax is small.

The following figure shows contour curves of f(x) around x∗ and the iterates generated by an exact line search algorithm.

[Figure: elongated contour curves around x∗ with the zigzag path xk, xk+1, xk+2 produced by the directions −grad f(xk) and −grad f(xk+1).]


Newton Method

Riemannian Newton update formula:

x+ = Rx(α[−Hess f(x)^{-1} grad f(x)]),

where α is chosen to be 1 when x is close enough to x∗.

The search direction is not necessarily descent.

When xk is close enough to x∗, the search direction is descent.

The Riemannian Newton method converges quadratically [AMS08, Theorem 6.3.2], i.e., lim_{k→∞} dist(xk+1, x∗) / dist(xk, x∗)^2 < ∞.

[Figure: near x∗, the Newton step −Hess f(xk)^{-1} grad f(xk) lands within O(dist(xk, x∗)^2) of x∗, whereas the steepest descent step −grad f(xk) is only within O(dist(xk, x∗)).]


Quasi-Newton Methods

Steepest descent method

Converges slowly

Newton method

Requires the action of the Hessian, which may be expensive or unavailable,

The search direction may not be descent; therefore, extra considerations are required.

Quasi-Newton method

Approximates the action of the Hessian or its inverse and therefore accelerates the convergence rate,

Provides an approach to produce a descent direction.


Secant Condition

A one-dimensional example to show the idea of the secant condition.

[Figure: f(x) = x^4 and grad f(x) = 4x^3; left panel: the Newton update from xk to xk+1; right panel: the secant update built from xk−1 and xk.]

Newton: xk+1 = xk − (Hess f(xk))^{-1} grad f(xk)

Secant: xk+1 = xk − Bk^{-1} grad f(xk), where Bk(xk − xk−1) = grad f(xk) − grad f(xk−1)
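A small sketch of the two updates for this one-dimensional example (the starting points are arbitrary):

```python
# Newton vs. secant updates for f(x) = x**4, grad f(x) = 4*x**3, Hess f(x) = 12*x**2.
def grad(x):  return 4.0 * x**3
def hess(x):  return 12.0 * x**2

def newton_step(xk):
    # x_{k+1} = x_k - (Hess f(x_k))^{-1} grad f(x_k)
    return xk - grad(xk) / hess(xk)

def secant_step(xk, xk_prev):
    # B_k approximates the Hessian from the secant equation
    # B_k (x_k - x_{k-1}) = grad f(x_k) - grad f(x_{k-1}).
    Bk = (grad(xk) - grad(xk_prev)) / (xk - xk_prev)
    return xk - grad(xk) / Bk

x_newton, (x_prev, x_sec) = 1.0, (1.2, 1.0)
for _ in range(5):
    x_newton = newton_step(x_newton)
    x_prev, x_sec = x_sec, secant_step(x_sec, x_prev)
    print(f"newton: {x_newton:.6f}   secant: {x_sec:.6f}")   # both approach the minimizer 0
```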


Riemannian Secant Conditions

Euclidean:

grad f (xk+1)− grad f (xk ) = Bk+1(xk+1 − xk).

Riemannian:

xk+1 − xk can be replaced by R_{xk}^{-1}(xk+1),

grad f(xk+1) and grad f(xk) are in different tangent spaces. A method of comparing tangent vectors in different tangent spaces is required.


Vector Transport

Vector transport

Transport a tangent vector from one tangent space to another.

Notation: T_{ηx} ξx denotes the transport of ξx to the tangent space of Rx(ηx). R is a retraction associated with T.

An isometric vector transport, denoted by TS, additionally satisfies

gx(ηx, ξx) = gy(T_{S ζx} ηx, T_{S ζx} ξx),

where x, y ∈ M, y = Rx(ζx) and ηx, ξx, ζx ∈ TxM.

[Figure: a tangent vector ξx at x and its transport Tηx(ξx) in the tangent space at Rx(ηx).]
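As a concrete instance (the slides keep TS abstract, so this choice is an assumption), parallel translation along great circles of the unit sphere is an isometric vector transport with a well-known closed form; the sketch below also checks isometry numerically:

```python
import numpy as np

def exp_sphere(x, eta):
    """Exponential mapping on the unit sphere (a particular retraction)."""
    t = np.linalg.norm(eta)
    if t < 1e-16:
        return x.copy()
    return np.cos(t) * x + np.sin(t) * (eta / t)

def parallel_transport_sphere(x, eta, xi):
    """Parallel translation of xi in T_x M along the geodesic from x to Exp_x(eta).
    Standard closed form for the unit sphere; it is an isometric vector transport."""
    t = np.linalg.norm(eta)
    if t < 1e-16:
        return xi.copy()
    u = eta / t
    return xi + (u @ xi) * ((np.cos(t) - 1.0) * u - np.sin(t) * x)

rng = np.random.default_rng(3)
x = rng.standard_normal(4); x /= np.linalg.norm(x)
P = np.eye(4) - np.outer(x, x)            # projector onto T_x M
eta, xi = P @ rng.standard_normal(4), P @ rng.standard_normal(4)
y = exp_sphere(x, eta)
xi_t = parallel_transport_sphere(x, eta, xi)
print(np.linalg.norm(xi), np.linalg.norm(xi_t))   # norms agree: the transport is isometric
print(abs(y @ xi_t) < 1e-12)                      # transported vector is tangent at y
```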


Riemannian Secant Conditions

The secant condition of Qi [Qi11]:

grad f(xk+1) − P_{γk}^{1←0} grad f(xk) = Bk+1 (P_{γk}^{1←0} Exp_{xk}^{-1} xk+1),

where Exp is a particular retraction, called the exponential mapping, and P is a particular vector transport, called the parallel translation.


Riemannian Secant Conditions

The secant condition of Ring and Wirth [RW12]:

(grad f(xk+1)♭ T_{R ξk} − grad f(xk)♭) T_{S ξk}^{-1} = (Bk+1 T_{S ξk} ξk)♭,

where T_R is the differentiated retraction of R, i.e.,

T_{R ηx} ζx = d/dt Rx(ηx + tζx)|_{t=0},

and ηx♭ denotes a function from TxM to R, i.e., ηx♭ ξx = gx(ηx, ξx).

Their work is on infinite-dimensional manifolds. It is rewritten here in a finite-dimensional form so that it can be compared to our secant condition.


Riemannian Secant Conditions

We use

grad f(xk+1)/βk − T_{S ξk} grad f(xk) = Bk+1 T_{S ξk} ξk,

where ξk = R_{xk}^{-1}(xk+1), βk = ‖ξk‖ / ‖T_{R ξk} ξk‖, T_R is the differentiated retraction, and T_S is an isometric vector transport that satisfies

T_{S ξ} ξ = β T_{R ξ} ξ.


Euclidean DFP

The Euclidean secant condition and some additional constraints are imposed:

min_B ‖B − Bk‖_{WB}  s.t. B = B^T and B sk = yk,

where WB is any positive definite matrix satisfying WB yk = sk and ‖A‖_{WB} = ‖WB^{1/2} A WB^{1/2}‖_F.

Bk+1 = (I − (yk sk^T)/(yk^T sk)) Bk (I − (sk yk^T)/(yk^T sk)) + (yk yk^T)/(yk^T sk),

where sk = xk+1 − xk and yk = grad f(xk+1) − grad f(xk). This is called the Davidon-Fletcher-Powell (DFP) update.


Euclidean BFGS

Let Hk = Bk^{-1}.

min_H ‖H − Hk‖_{WH}  s.t. H = H^T and H yk = sk,

where WH is any positive definite matrix satisfying WH sk = yk and ‖A‖_{WH} = ‖WH^{1/2} A WH^{1/2}‖_F. The resulting update, written in terms of Bk, is

Bk+1 = Bk − (Bk sk sk^T Bk)/(sk^T Bk sk) + (yk yk^T)/(yk^T sk).

This is called the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update.


Euclidean Broyden Family

The linear combination of the BFGS update and the DFP update is called the Broyden family update, (1 − φk) BFGS + φk DFP:

Bk+1 = Bk − (Bk sk sk^T Bk)/(sk^T Bk sk) + (yk yk^T)/(yk^T sk) + (φk sk^T Bk sk) vk vk^T,

where

vk = yk/(yk^T sk) − (Bk sk)/(sk^T Bk sk).

If φk ∈ [0, 1], then it is the restricted Broyden family update.

Properties

If yk^T sk > 0, then Bk+1 is positive definite if and only if Bk is positive definite.

yk^T sk > 0 is guaranteed by the second Wolfe condition.
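A Euclidean sketch of the restricted Broyden family update (φ = 0 gives BFGS, φ = 1 gives DFP). It checks the secant equation Bk+1 sk = yk and that positive definiteness is preserved when yk^T sk > 0; the test data are arbitrary:

```python
import numpy as np

def broyden_family_update(B, s, y, phi):
    """Euclidean Broyden family update:
    B+ = B - (B s s^T B)/(s^T B s) + (y y^T)/(y^T s) + phi * (s^T B s) * v v^T,
    with v = y/(y^T s) - (B s)/(s^T B s)."""
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    v = y / ys - Bs / sBs
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * sBs * np.outer(v, v)

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)            # symmetric positive definite B_k
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if y @ s < 0:                          # enforce y^T s > 0 (as the Wolfe conditions would)
    y = -y
for phi in (0.0, 0.5, 1.0):            # BFGS, a mixture, DFP
    B_new = broyden_family_update(B, s, y, phi)
    print(phi, np.allclose(B_new @ s, y), np.all(np.linalg.eigvalsh(B_new) > 0))
```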


Riemannian Broyden Family

The Riemannian restricted Broyden family update is

Bk+1 = B̃k − (B̃k sk (B̃k^* sk)♭)/((B̃k^* sk)♭ sk) + (yk yk♭)/(yk♭ sk) + φk g(sk, B̃k sk) vk vk♭,

where φk ∈ [0, 1], ηx♭ denotes a function from TxM to R, i.e., ηx♭ ξx = gx(ηx, ξx), sk = T_{S αkηk} αkηk, yk = grad f(xk+1)/βk − T_{S αkηk} grad f(xk), B̃k = T_{S αkηk} ◦ Bk ◦ T_{S αkηk}^{-1}, and

vk = yk/g(yk, sk) − (B̃k sk)/g(sk, B̃k sk).


Riemannian Broyden Family

Properties

If g(yk, sk) > 0, then Bk+1 is positive definite if and only if Bk is positive definite.

g(yk, sk) > 0 is not guaranteed by the most natural way of generalizing the second Wolfe condition for an arbitrary retraction and isometric vector transport.

We impose another condition, called the 'locking condition',

T_{S ξ} ξ = β T_{R ξ} ξ,  β = ‖ξ‖ / ‖T_{R ξ} ξ‖,

where T_R is the differentiated retraction.


Line Search Riemannian Broyden Family Method

(1) Given initial x0 and symmetric positive definite B0. Let k = 0.

(2) Obtain the search direction ηk = −Bk^{-1} grad f(xk).

(3) Set the next iterate xk+1 = Rxk(αkηk), where αk is set to satisfy the Wolfe conditions

f(xk+1) ≤ f(xk) + c1 αk g(grad f(xk), ηk), (1)

d/dt f(Rxk(tηk))|_{t=αk} ≥ c2 d/dt f(Rxk(tηk))|_{t=0}, (2)

where 0 < c1 < 0.5 < c2 < 1.

(4) Use the update formula to obtain Bk+1.

(5) If not converged, then k ← k + 1 and go to Step 2.


Euclidean Theoretical Results

If f ∈ C^2 and strongly convex, then the sequence {xk} generated by a Broyden family algorithm with φk ∈ [0, 1 − δ) converges to the minimizer x∗, where δ > 0. Furthermore, the convergence rate is linear.

If, additionally, Hess f is Hölder continuous at the minimizer x∗, i.e., there exist p > 0 and L > 0 such that

‖Hess f(x) − Hess f(x∗)‖ ≤ L‖x − x∗‖^p,

for all x in a neighborhood of x∗, then the step size αk = 1 satisfies the Wolfe conditions eventually. Moreover, if 1 is chosen to be the step size whenever it satisfies the Wolfe conditions, {xk} converges to x∗ superlinearly, i.e.,

lim_{k→∞} ‖xk+1 − x∗‖2 / ‖xk − x∗‖2 = 0.


Riemannian Theoretical Results

1. The (strong) convexity of a function is generalized to the Riemannian setting and is called (strong) retraction-convexity.

2. Suppose some reasonable assumptions hold. If f ∈ C^2 and strongly retraction-convex, then the sequence {xk} generated by a Riemannian Broyden family algorithm with φk ∈ [0, 1 − δ) converges to the minimizer x∗, where δ > 0. Furthermore, the convergence rate is linear.

3. If, additionally, Hess f satisfies a generalization of Hölder continuity at the minimizer x∗, then the step size αk = 1 satisfies the Wolfe conditions eventually. Moreover, if 1 is chosen to be the step size whenever it satisfies the Wolfe conditions, {xk} converges to x∗ superlinearly, i.e.,

lim_{k→∞} dist(xk+1, x∗) / dist(xk, x∗) = 0.


Convergence Rate

Step size αk = 1:

Eventually works for the Riemannian Broyden family algorithm and the Riemannian quasi-Newton algorithm,

Does not work for RSD in general.

[Figure: near x∗, the Newton step −Hess f(xk)^{-1} grad f(xk) lands within O(dist(xk, x∗)^2) of x∗, the quasi-Newton step −Bk^{-1} grad f(xk) within o(dist(xk, x∗)), and the steepest descent step −grad f(xk) only within O(dist(xk, x∗)).]


Limited-memory RBFGS

The Riemannian restricted Broyden family requires computing B̃k = T_{S αkηk} ◦ Bk ◦ T_{S αkηk}^{-1}.

An explicit form of TS may not exist.

Even when it exists, matrix multiplication is needed.

Limited-memory

Similar to the Euclidean case, it requires less memory.

It avoids the requirement of an explicit form of TS.

We only consider the limited-memory RBFGS algorithm.


Limited-memory RBFGS

Consider the update of the inverse Hessian approximation of RBFGS, Hk = Bk^{-1}. We have

Hk+1 = Vk♭ Hk Vk + ρk sk sk♭, where ρk = 1/g(yk, sk) and Vk = id − ρk yk sk♭.

If the number of latest sk and yk we use is m + 1, then

Hk+1 = Vk♭ Vk−1♭ · · · Vk−m♭ H0_{k+1} Vk−m · · · Vk−1 Vk
 + ρk−m Vk♭ Vk−1♭ · · · Vk−m+1♭ sk−m^{(k+1)} (sk−m^{(k+1)})♭ Vk−m+1 · · · Vk−1 Vk
 + · · ·
 + ρk sk^{(k+1)} (sk^{(k+1)})♭,

where Vi = id − ρi yi^{(k+1)} (si^{(k+1)})♭ and H0_{k+1} = (g(sk, yk)/g(yk, yk)) id.


Construct TS

Methods to construct TS satisfying the locking condition

T_{S ξ} ξ = β T_{R ξ} ξ,  β = ‖ξ‖ / ‖T_{R ξ} ξ‖,

for all ξ ∈ TxM.

Method 1: Modify an existing isometric vector transport.

Method 2: Construct TS when a smooth function that builds an orthonormal basis of the tangent space is known.

Both ideas use Householder reflections twice.

Method 3: Given an isometric vector transport TS, a retraction is obtained by solving d/dt Rx(tηx) = T_{S tηx} ηx. In some cases, a closed form of the solution exists.


Framework of Trust Region Optimization Methods

The Euclidean trust region method builds a local model

mk(η) = f(xk) + grad f(xk)^T η + (1/2) η^T Bk η

and finds

ηk = arg min_{‖η‖2 ≤ δk} mk(η),

where δk is the trust region radius. The candidate for the next iterate is

x̃k+1 = xk + ηk.

If (f(xk) − f(x̃k+1))/(mk(0) − mk(ηk)) is big enough, then accept the candidate, xk+1 = x̃k+1; otherwise, reject the candidate. Finally, update the local model.
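Below is a minimal Euclidean trust region loop in which the model is minimized only along the steepest descent direction (the Cauchy point); this is enough to illustrate the accept/reject and radius-update logic, and the Riemannian version would replace xk + η by Rxk(η). The constants and test problem are arbitrary choices for this sketch, not the method of the dissertation:

```python
import numpy as np

def trust_region_cauchy(f, grad, B, x, delta=1.0, delta_max=10.0, tol=1e-8, max_iter=200):
    """Basic trust region iteration: minimize the quadratic model along -grad f(x)
    inside the ball ||eta|| <= delta (the Cauchy point), then accept/reject the
    candidate and update the radius."""
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        Bk = B(x)
        gBg = g @ Bk @ g
        # Cauchy step length along -g, clipped to the trust region boundary
        tau = 1.0 if gBg <= 0 else min(1.0, gnorm ** 3 / (delta * gBg))
        eta = -(tau * delta / gnorm) * g
        pred = -(g @ eta + 0.5 * eta @ Bk @ eta)      # model decrease m(0) - m(eta)
        rho = (f(x) - f(x + eta)) / pred              # actual vs. predicted decrease
        if rho > 0.1:                                 # accept the candidate
            x = x + eta
        if rho > 0.75:                                # radius update (simplified rule)
            delta = min(2.0 * delta, delta_max)
        elif rho < 0.1:
            delta = 0.25 * delta
    return x

# usage: a convex quadratic, so B returns the exact (constant) Hessian
A = np.diag([1.0, 10.0]); b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(trust_region_cauchy(f, grad, lambda x: A, np.zeros(2)), np.linalg.solve(A, b))
```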


Framework of Trust Region Optimization Methods

The Riemannian trust region method builds a model on the tangent space of the current iterate xk,

mk(η) = f(xk) + g(grad f(xk), η) + (1/2) g(η, Bk η)

and finds

ηk = arg min_{‖η‖ ≤ δk} mk(η),

where δk is the trust region radius. The candidate for the next iterate is

x̃k+1 = Rxk(ηk).

If (f(xk) − f(x̃k+1))/(mk(0) − mk(ηk)) is big enough, then accept the candidate, xk+1 = x̃k+1; otherwise, reject the candidate. Finally, update the local model.


Steepest Descent

Riemannian trust region steepest descent (SD)

Bk = id,

If the local model is solved exactly, then ηk = −min(1, δk/‖grad f(xk)‖) grad f(xk),

Converges linearly.


Newton Method

Riemannian trust region Newton method

Bk = Hess f(xk),

Converges quadratically [Bak08],

In [Bak08], the local model is not required to be solved exactly and a Riemannian truncated conjugate gradient is proposed.


Quasi-Newton Method

Symmetric rank-1 update

Euclidean: Bk+1 = Bk + ((yk − Bk sk)(yk − Bk sk)^T)/((yk − Bk sk)^T sk),

Riemannian: Bk+1 = B̃k + ((yk − B̃k sk)(yk − B̃k sk)♭)/g(sk, yk − B̃k sk),

where B̃k = T_{S ηk} ◦ Bk ◦ T_{S ηk}^{-1}.

Properties:

It does not preserve positive definiteness of Bk,

It produces a better Hessian approximation as an operator.

These properties suggest we use a trust region strategy.
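A Euclidean sketch of the symmetric rank-1 update with the customary safeguard of skipping the update when the denominator is tiny; it verifies the secant equation and shows that, unlike BFGS, the result need not be positive definite (test data are arbitrary):

```python
import numpy as np

def sr1_update(B, s, y, eps=1e-8):
    """Euclidean SR1 update: B+ = B + (y - Bs)(y - Bs)^T / ((y - Bs)^T s).
    Skip the update if the denominator is too small (standard safeguard)."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) <= eps * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom

rng = np.random.default_rng(5)
n = 4
B = np.eye(n)
s, y = rng.standard_normal(n), rng.standard_normal(n)
B_new = sr1_update(B, s, y)
print(np.allclose(B_new @ s, y))                 # secant equation holds (update was applied)
print(np.linalg.eigvalsh(B_new))                 # may contain negative eigenvalues
```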


Riemannian Trust Region with Symmetric Rank-1 Method (RTR-SR1)

(1) Given τ1, c ∈ (0, 1), τ2 > 1, initial x0, and symmetric B0. Let k = 0.

(2) Obtain ηk by (approximately) solving the local model mk(η).

(3) Set the candidate for the next iterate x̃k+1 = Rxk(ηk).

(4) Let ρk = (f(xk) − f(x̃k+1))/(mk(0) − mk(ηk)). If ρk > c, then xk+1 = x̃k+1; otherwise xk+1 = xk.

(5) Update the local model by first using the update formula to obtain Bk+1 and then setting

δk+1 = τ2 δk, if ρk > 0.75 and ‖ηk‖ ≥ 0.8 δk;  τ1 δk, if ρk < 0.1;  δk, otherwise.

(6) If not converged, then k ← k + 1 and go to Step 2.


Euclidean Theoretical Results

If f is Lipschitz continuously differentiable and bounded below, and ‖Bk‖ ≤ C for some constant C, then the sequence {xk} generated by the trust region method with symmetric rank-1 update converges to a stationary point x∗. [NW06]

Suppose some reasonable assumptions hold. If f ∈ C^2 and Hess f is Lipschitz continuous around the minimizer x∗, then the sequence {xk} converges to x∗ (n + 1)-step superlinearly, i.e.,

lim_{k→∞} ‖xk+n+1 − x∗‖2 / ‖xk − x∗‖2 = 0,

where n is the dimension of the domain. [BKS96]


Riemannian Theoretical Results

The global convergence property has been proved in [Bak08] and is applicable to RTR-SR1.

Suppose some reasonable assumptions hold. If f ∈ C^2 and Hess f satisfies a Riemannian generalization of Lipschitz continuity around the minimizer x∗, then the sequence {xk} converges to x∗ (d + 1)-step superlinearly, i.e.,

lim_{k→∞} dist(xk+d+1, x∗) / dist(xk, x∗) = 0,

where d is the dimension of the manifold.


Limited-memory RTR-SR1

Same motivation as limited-memory RBFGS

Less storage complexity,

Avoids some expensive operations.

Similar techniques

Use a few previous sk and yk to approximate the action of the Hessian.


Important Theorems

The Dennis-Moré conditions give necessary and sufficient conditions for a sequence {xk} to converge superlinearly to x∗ [DM77]. We have generalized these to:

Riemannian Dennis-Moré conditions for root solving,

Riemannian Dennis-Moré conditions for optimization.


Optimization for Partly Smooth Functions

f is called partly smooth on S if it is continuously differentiable on an open dense subset.

Gradient sampling algorithm (GS) [BLO05],

Global convergence analysis.

Works for non-Lipschitz continuous functions empirically.

BFGS [LO13],

Modify the line search algorithm,

Modify the stopping criterion,

No convergence analysis,

Does not work for non-Lipschitz continuous functions empirically.


Optimization for Partly Smooth Functions

Complexity

GS,

Many gradient evaluations in each iteration,

Each iteration needs to solve a convex quadratic program.

BFGS,

Fewer gradient evaluations than GS,

Solving a convex quadratic program is needed when the sequence is close to convergence.

Solving a convex quadratic program is expensive.


Optimization for Riemannian Partly Smooth Functions

Generalized the framework of GS to the Riemannian setting.

Generalized the modifications of BFGS to the Riemannian setting.

Empirical performance testing.


General Implementations

All the discussions about Riemannian optimization algorithms are general,

General implementations are given for Riemannian manifolds that can be represented in R^n:

M is a subset of R^n,

M is a quotient manifold whose total space is a subset of R^n,

M is a product of two or more manifolds, each of which is of one of the first two types.


General Implementations

The discussions include

Representation of the metric, linear operators, and vector transports,

n-dimensional representation,

d-dimensional representation (intrinsic approach),

Constructions and implementations of the vector transports.


Implementations for Four Specific Manifolds

Providing detailed, efficient implementations for four particular manifolds:

the sphere,

the compact Stiefel manifold,

the orthogonal group,

the Grassmann manifold.


Experiments

Four cost functions are tested:

the Brockett cost function on the Stiefel manifold,

the Rayleigh quotient function on the Grassmann manifold,

the Lipschitz minmax function on the sphere,

the non-Lipschitz minmax function on the sphere.


Experiments

Ten algorithms are compared

RBFGS,

Riemannian Broyden family using Davidon’s update φ [Dav75],

Riemannian Broyden family using a problem-specific φ,

Limited-memory RBFGS,

Riemannian SD,

Riemannian GS,

RTR-SR1,

Limited-memory RTR-SR1,

RTR-SD,

RTR-Newton [Bak08].


Experiments

Systematic comparisons are made. The following are shown in the dissertation.

Performance of different retractions and vector transports,

Performance of different choices of φk ,

Performance of different algorithms,

The locking condition yields robustness and reliability of the Riemannian Broyden family in our framework. Empirical evidence shows it is not necessary, but the behavior without it is often difficult to predict.

The value of limited-memory versions for large-scale problems,

The value of Riemannian GS for non-Lipschitz continuous functions on a manifold.


Applications

Applications with smooth enough cost functions,

The joint diagonalization problem for independent component analysis,

The synchronization of rotations problem,

The rotation and reparameterization problem of closed curves in elastic shape analysis,

Secant-based nonlinear dimension reduction.

Application with a partly smooth cost function.

Secant-based nonlinear dimension reduction.


The Joint Diagonalization Problem for Independent Component Analysis

Independent component analysis (ICA)

Determine an independent component form of a random vector,

Determine a few components of an independent component form of a random vector.

Different cost functions are used [AG06, TCA09]. We used the joint diagonalization cost function [TCA09].

The previously used algorithm is RTR-Newton. It is relatively slow when the number of samples is large.

RTR-SR1 and LRBFGS are the two fastest algorithms when the number of samples is large.


The Synchronization of Rotation Problem

The synchronization of rotations problem is to find N unknown rotations R1, . . . , RN from M noisy measurements H̃ij of Hij = Ri Rj^T.

A review and a Riemannian approach for this problem can be found in [BSAB12].

Using Riemannian optimization algorithms for the Riemannian approach, we showed that RBFGS and limited-memory RBFGS are the two fastest and most reliable algorithms.


Rotation and Reparameterization Problem of Closed Curves in Elastic Shape Analysis

In elastic shape analysis, a shape is invariant to:

Scaling,

Translation,

Rotation,

Reparametrization.

shape 1: x = cos(2πt^3), y = sin(2πt^3), t ∈ [0, 1]

shape 2: x = cos(2πt), y = sin(2πt), t ∈ [0, 1]

Our work is based on the framework of [SKJJ11].


Rotation and Reparameterization Problem of Closed Curves in Elastic Shape Analysis

The elastic shape space is a quotient space. When two closed curves are compared, an important problem in elastic shape analysis is to find the best rotation and reparametrization function.

The previous algorithm is a coordinate relaxation over rotation and reparameterization:

Rotation: singular value decomposition,

Reparameterization: dynamic programming,

One iteration.

Difficulties

High complexity.

Not robust when more iterations are used.


Rotation and Reparameterization Problem of Closed Curves in Elastic Shape Analysis

Gradient methods:

Hessian is unknown,

Infinite-dimensional problem,

Riemannian quasi-Newton algorithms can be applied,

They work for the closed curves problem,

Reliable and much faster than the coordinate relaxation algorithm.


Secant-based Nonlinear Dimension Reduction

Suppose M is a d-dimensional manifold embedded in R^n. The idea is to find a projection π[U] = U(U^T U)^{-1} U^T such that π[U]|M is easy to invert, i.e., maximize kπ[U],

where

kπ[U] = inf_{x,y ∈ M, x ≠ y} ‖π[U](x − y)‖2 / ‖x − y‖2.

The cost function φ([U]) = ‖π[U](x − y)‖2 / ‖x − y‖2 is partly smooth.

An alternative smooth cost function F([U]) is proposed in [BK05].

Discretization is needed to approximate F and φ; the discretized approximations are again denoted F and φ, respectively.
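A sketch of the discretized criterion: given sample points on M, form all pairwise secants and evaluate the worst-case ratio kπ[U] for a column-orthonormal U. The data and dimensions are arbitrary, and this only evaluates the cost; it does not perform the optimization over the Grassmann manifold:

```python
import numpy as np

def k_pi(U, points):
    """Discretized k_{pi[U]}: the smallest value of ||pi_U (x - y)|| / ||x - y||
    over all distinct sample pairs, where pi_U is the orthogonal projector
    onto span(U) (U is assumed to have orthonormal columns)."""
    worst = np.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            sec = points[i] - points[j]
            worst = min(worst, np.linalg.norm(U.T @ sec) / np.linalg.norm(sec))
    return worst

# toy data: a curve (1-D manifold) embedded in R^3, projected onto a 2-D subspace
t = np.linspace(0.0, 1.0, 50)
points = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t), 0.1 * t], axis=1)
U = np.linalg.qr(np.random.default_rng(6).standard_normal((3, 2)))[0]
print(k_pi(U, points))   # value in (0, 1]; larger means pi_U restricted to M is easier to invert
```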


Secant-based Nonlinear Dimension Reduction

The previous method used in [BK05] is a Riemannian conjugate gradient algorithm for F([U]).

An example (used in [BK00] and [BK05]) is tested:

For the smooth function F, RBFGS and LRBFGS are the two fastest algorithms.

For the partly smooth function φ, RBFGS is the fastest algorithm.

Even though Riemannian GS is relatively slow, it can escape from local optima and usually finds the global optimum.

F is a worse cost function than φ in the sense that the global optimum of F may be non-invertible.


Conclusions

Generalized the Broyden family update and the symmetric rank-1 update to the Riemannian setting; combined them with line search and trust region strategies, respectively, and provided a complete convergence analysis.

Generalized the limited-memory versions of SR1 and BFGS to the Riemannian setting.

The main work of generalizing quasi-Newton algorithms to the Riemannian setting is finished.

Generalized the gradient sampling (GS) algorithm and a modified version of BFGS to the Riemannian setting.

Developed general, detailed, and efficient implementations for Riemannian optimization.

Empirical performance is assessed through experiments and four applications.


Thank you!



References

P.-A. Absil and K. A. Gallivan. Joint diagonalization on the oblique manifold for independent component analysis. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), 5:V945–V948, 2006.

P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008.

C. G. Baker. Riemannian manifold trust-region methods with applications to eigenproblems. PhD thesis, Florida State University, 2008.


D. S. Broomhead and M. Kirby. A new approach to dimensionality reduction: theory and algorithms. SIAM Journal on Applied Mathematics, 60(6):2114–2142, 2000.

D. S. Broomhead and M. J. Kirby. Dimensionality reduction using secant-based projection methods: the induced dynamics in projected systems. Nonlinear Dynamics, 41(1-3):47–67, 2005.

R. H. Byrd, H. F. Khalfan, and R. B. Schnabel. Analysis of a symmetric rank-one trust region method. SIAM Journal on Optimization, 6(4):1025–1039, 1996.


J. V. Burke, A. S. Lewis, and M. L. Overton. A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM Journal on Optimization, 15(3):751–779, January 2005. doi:10.1137/030601296.

N. Boumal, A. Singer, P.-A. Absil, and V. D. Blondel. Cramér-Rao bounds for synchronization of rotations, 2012. arXiv:1211.1621v1.

W. C. Davidon. Optimally conditioned optimization algorithms without line searches. Mathematical Programming, 9(1):1–30, 1975.


J. E. Dennis and J. J. Moré. Quasi-Newton methods, motivation and theory. SIAM Review, 19(1):46–89, 1977.

A. S. Lewis and M. L. Overton. Nonsmooth optimization via quasi-Newton methods. Mathematical Programming, 141(1-2):135–163, February 2013. doi:10.1007/s10107-012-0514-2.

J. Nocedal and S. J. Wright. Numerical Optimization. Springer, second edition, 2006.


B. O'Neill. Semi-Riemannian Geometry. Academic Press [Harcourt Brace Jovanovich Publishers], 1983.

C. Qi. Numerical optimization methods on Riemannian manifolds. PhD thesis, Florida State University, 2011.

W. Ring and B. Wirth. Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization, 22(2):596–627, January 2012. doi:10.1137/11082885X.


A. Srivastava, E. Klassen, S. H. Joshi, and I. H. Jermyn. Shape analysis of elastic curves in Euclidean spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7):1415–1428, September 2011. doi:10.1109/TPAMI.2010.184.

F. J. Theis, T. P. Cason, and P.-A. Absil. Soft dimension reduction for ICA by joint diagonalization on the Stiefel manifold. Proceedings of the 8th International Conference on Independent Component Analysis and Signal Separation, 5441:354–361, 2009.
