optimization methods in machine learning lectures 13-14optimization methods in machine learning...

33
Optimization Methods in Machine Learning Lectures 13-14 Katya Scheinberg Lehigh University [email protected]

Upload: others

Post on 11-Oct-2020

4 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Optimization Methods in Machine Learning

Lectures 13-14

TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAAAAAAAAA

Katya Scheinberg Lehigh University

[email protected]

Page 2: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

First Order Methods

Page 3: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Consider:

•  Linear lower approximation

•  Quadratic upper approximation

First-order proximal gradient methods

Page 4: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize quadratic upper approximation on each iteration

•  If µ· 1/L then

First-order proximal gradient method

Page 5: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

Page 6: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Complexity bound derivation outline

Page 7: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize quadratic upper approximation on each iteration

•  If µ· 1/L then in O(L||x0-x*||/²) iterations finds solution

Complexity of proximal gradient method

Compare to O(log(L/²)) of interior point methods.

Can we do better?

Page 8: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize upper approximation at an intermediate point.

•  If µ· 1/L then

Accelerated first-order method Nesterov, ’83, ‘00s,

Beck&Teboulle ‘09

Page 9: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize upper approximation at an intermediate point.

•  If µ· 1/L then in iterations finds solution

Complexity of accelerated first-order method Nesterov, ’83, ‘00s,

Beck&Teboulle ‘09

This method is optimal if only gradient information is used.

Page 10: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete
Page 11: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

Page 12: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize upper approximation at an intermediate point.

•  If µ· 1/L then in iterations finds solution

FISTA method Beck&Teboulle ‘09

Page 13: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

µ is not a prox parameter here

Page 14: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

Page 15: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

Page 16: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Unconstrained formulation of the SVM problem

Page 17: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

SVM problem using Huber loss function

Page 18: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

First order methods for composite functions

Page 19: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Lasso or CS:

•  Group Lasso or MMV

•  Matrix Completion

•  Robust PCA

•  SICS

Examples

Page 20: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Consider:

•  Quadratic upper approximation

Prox method with nonsmooth term

Assume that g(y) is such that the above function is easy to optimize over y

Page 21: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize upper approximation function Qf,µ(x,y) on each iteration

Example 1 (Lasso and SICS)

Closed form solution!

O(n) effort

Page 22: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete
Page 23: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Very similar to the previous case, but with ||.|| instead of |.|

Example 2 (Group Lasso)

Closed form solution!

O(n) effort

Page 24: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Example 3 (Collaborative Prediction)

Closed form solution!

O(n^3) effort

Page 25: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize quadratic upper approximation on each iteration

•  If µ· 1/L then in O(L/²) iterations finds solution

ISTA/Gradient prox method

Page 26: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize upper approximation at an “accelerated” point.

•  If µ· 1/L then in iterations finds solution

Fast first-order method Nesterov, Beck & Teboulle

Page 27: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Practical first order algorithms using backtracking search

Page 28: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize quadratic upper relaxation on each iteration

•  Using line search find µk such that

•  In O(1/µmin²) iterations finds ²-optimal solution (in practice better)

Nesterov, 07 Beck&Teboulle, Tseng, Auslender&Teboulle, 08

Iterative Shrinkage Threshholding Algorithm (ISTA)

Page 29: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  Minimize quadratic upper relaxation on each iteration

•  Using line search find µk · µk-1 such that

•  In iterations finds ²-optimal solution

Fast Iterative Shrinkage Threshholding Algorithm (FISTA)

Nesterov, Beck&Teboulle, Tseng

Very restrictive

Page 30: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

•  ISTA’s complexity is O(L/²) while FISTA’s is

•  However, FISTA’s condition µk · µk-1 often slows down practical performance and simply ignoring the condition does not help.

•  We want to modify FISTA algorithm to relax µk · µk-1, while maintaining complexity bound or maybe even improving it

FISTA with line search Goldfarb and S. 2010

Page 31: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Find µk · µk-1 such that

Cycle to find µk

Convergence rate:

Page 32: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Find µk such that

Goldfarb & S. 2011

This condition….

… gives this bound on the error

Page 33: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete

Find µk such that

Goldfarb & S. 2011

FISTA with full line search

Cycle to find µ and t