large scale optimization for machine learning...• useful optimization tools for machine learning...

39
Large Scale Optimization for Machine Learning Meisam Razaviyayn Lecture 0 [email protected]

Upload: others

Post on 28-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

LargeScaleOptimizationforMachineLearning

Meisam RazaviyaynLecture 0

[email protected]

Page 2: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

What is this course about?• Useful optimization tools for machine learning

• Focus is on the algorithms and the analysis

• This is NOT a ML course• Don’t expect to learn detailed ML

• This is NOT a classical optimization course• We won’t cover many classical optimization results

• We cover some basics though• Few weeks on optimization• Some ML examples will be explained in details

1

Page 3: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Prerequisites

• ProbabilityandStatistics• Expectedvalue,variance,statisticalindependence,conditionalprobability,maximumlikelihoodestimation,regression,etc.

• LinearAlgebraandMathematicalAnalysis• Sets,functions,limits,liminf,limsup,derivative,gradient,subspace,lineardependence,innerproduct,eigenvalue,singularvalue,norms,etc.

• Programmingskills• Matrix/vectoroperationsinMatlab/Python/C++• “For,while,repeatuntil”loops

2

Page 4: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

What is optimization?

3

Page 5: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

What is optimization?

Decisionvariable(Discrete/Continuous)

Objective/Costfunction

FeasibleRegion

3

Page 6: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

What is optimization?

Decisionvariable(Discrete/Continuous)

Objective/Costfunction

FeasibleRegion• Existenceofasolution?• Checkingifacandidatexisoptimal?• Findingtheoptimalsolutionnumerically

3

Page 7: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Why do we care?

• Manyengineeringproblemsrequiresoptimization

• OptimizationiskeytoefficientimplementationofMLalgorithms

4

Page 8: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Problem Example: RegressionArea CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

5

Page 9: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Problem Example: RegressionSamples/Datapo

ints

Features/independentVariablesTarget/DependentVariables/Label

Area CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

5

Page 10: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Problem Example: Regression

Canweusethisdatasettopredictthepriceofthishouse?

1400 2.2 3 3.1 7.6 2 … ????

Samples/Datapo

ints

Area CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

5

Features/independentVariablesTarget/DependentVariables/Label

Page 11: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Problem Example: RegressionArea CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

Canweusethisdatasettopredictthepriceofthishouse?

1400 2.2 3 3.1 7.6 2 … ????

TrainingData

5

Page 12: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Problem Example: RegressionArea CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

1400 2.2 3 3.1 7.6 2 … ????

TrainingData

LearningAlgorithm Prediction

5

Page 13: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Learning Algorithms

• VariousmethodsinML• Decisiontrees,deeplearning,Bayes,empiricalBayes,linearregression,logisticregression,…

• Manymethods

• Model

• Minimizetheloss/Maximizethelikelihood

6

Page 14: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Algorithm Example: Linear regressionArea CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

Canweusethisdatasettopredictthepriceofthishouse?

1400 2.2 3 3.1 7.6 2 … ????

7

Page 15: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Algorithm Example: Linear regressionArea CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

7

Page 16: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Algorithm Example: Linear regressionArea CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

7

Page 17: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Algorithm Example: Linear regressionArea CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

Model:LinearpredictorLoss:L2difference

7

Page 18: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Algorithm Example: Linear regressionArea CrimeRate Age RAD PTRATIO Bedrooms … Price(K)

600 1.05 12 2.4 10.1 1 … 500

1000 2.34 10 2.5 20.1 1 … 800

1200 1.45 3 3.1 9.7 3 … 1500

1500 1.56 30 1.7 7.2 2 … 1200

… … … … … … … …

2700 1.01 20 0.9 4.3 4 … 5000

Model:LinearpredictorLoss:L2difference

Offsetincluded

7

Page 19: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Another Example: Classification

Radius Texture Area Compactness Symmetry … Rec/non-Rec

1.1 2.3 3.5 2.4 1.4 … 1

0.7 1.2 2.5 1.4 3.2 … 0

1.7 2.4 1.5 3.3 1.3 … 1

… … … … … … …

0.2 3.4 0.7 4.3 2.0 … 1

0.2 2.7 0.9 2.3 1.0 … ????

8

Page 20: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Algorithm Example: Logistic Regression

Radius Texture Area Compactness Symmetry … Rec/non-Rec

1.1 2.3 3.5 2.4 1.4 … 1

0.7 1.2 2.5 1.4 3.2 … 0

1.7 2.4 1.5 3.3 1.3 … 1

… … … … … … …

0.2 3.4 0.7 4.3 2.0 … 1

9

Page 21: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Algorithm Example: Logistic Regression

Radius Texture Area Compactness Symmetry … Rec/non-Rec

1.1 2.3 3.5 2.4 1.4 … 1

0.7 1.2 2.5 1.4 3.2 … 0

1.7 2.4 1.5 3.3 1.3 … 1

… … … … … … …

0.2 3.4 0.7 4.3 2.0 … 1

Model:logisticMaximumlikelihoodestimator

9

Page 22: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Algorithm Example: Logistic Regression

Radius Texture Area Compactness Symmetry … Rec/non-Rec

1.1 2.3 3.5 2.4 1.4 … 1

0.7 1.2 2.5 1.4 3.2 … 0

1.7 2.4 1.5 3.3 1.3 … 1

… … … … … … …

0.2 3.4 0.7 4.3 2.0 … 1

Model:logisticMaximumlikelihoodestimator

9

Page 23: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Optimization in ML

• Many more examples (K-means, SVM, Deep learning, …)

• Efficient algorithms: CPU, Memory requirements, Parallelizable, robustness, etc.

• Other issues: Non-convexity, Sparsity, Large values of n/d, Online implementation, Implicit

bias, Overfitting, etc.

• But first, we need to review a bit of optimization (Targeted review!)

10

Page 24: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Optimization in ML

• Many more examples (K-means, SVM, Deep learning, …)

• Efficient algorithms: CPU, Memory requirements, Parallelizable, robustness, etc.

• Other issues: Non-convexity, Sparsity, Large values of n/d, Online implementation, Implicit

bias, Overfitting, etc.

• But first, we need to review a bit of optimization (Targeted review!)

• Even before this, let’s review a bit of linear algebra and mathematical analysis10

Page 25: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Notations• Sets•• Realnumbers,Complexnumbers

11

Page 26: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Notations• Sets•• Realnumbers,Complexnumbers

• Inf and Sup• Supremumoftheset isthesmallestscalar suchthat• Infimumoftheset isthelargestscalar suchthat

11

Page 27: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Notations• Sets•• Realnumbers,Complexnumbers

• Inf and Sup• Supremumoftheset isthesmallestscalar suchthat• Infimumoftheset isthelargestscalar suchthat

11

Page 28: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Notations• Sets•• Realnumbers,Complexnumbers

• Inf and Sup• Supremumoftheset isthesmallestscalar suchthat• Infimumoftheset isthelargestscalar suchthat

inf{sinn : n � 1} =?

11

Page 29: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Notations• Sets•• Realnumbers,Complexnumbers

• Inf and Sup• Supremumoftheset isthesmallestscalar suchthat• Infimumoftheset isthelargestscalar suchthat

inf{sinn : n � 1} =?

• Functions:

11

Page 30: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Vectors and Subspaces• Linear combination:

12

Page 31: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Vectors and Subspaces• Linear combination:

• Subspace and linear independence• Asetiscalledsubspaceifitisclosedunderlinearcombination• Asetofvectorsiscalledlinearlyindependentifnolinearcombinationofthemisequaltozero• Innerproduct:• Orthogonality:

12

Page 32: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Vectors and Subspaces• Linear combination:

• Subspace and linear independence• Asetiscalledsubspaceifitisclosedunderlinearcombination• Asetofvectorsiscalledlinearlyindependentifnolinearcombinationofthemisequaltozero• Innerproduct:• Orthogonality:

• Cauchy-Schwarz inequality

12

Page 33: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Matrices• Matrix addition• Matrix product• Square matrix• Inner product:

13

Page 34: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Matrices• Matrix addition• Matrix product• Square matrix• Inner product:

• Spectral radius:

13

Page 35: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Matrices• Matrix addition• Matrix product• Square matrix• Inner product:

• Spectral radius:

• Eigenvalue decomposition of symmetric matrices• Positive (Semi-)definite matrices

13

Page 36: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Matrices• Singular values:

14

Page 37: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Matrices• Singular values:• Singular value decomposition of :

14

Page 38: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Matrices• Singular values:• Singular value decomposition of :

• Norms:• Frobenius norm:

• Nuclear norm:

• Matrix 2-norm:

14

Page 39: Large Scale Optimization for Machine Learning...• Useful optimization tools for machine learning • Focus is on the algorithms and the analysis ... • Matrix/vector operations

Matrices• Singular values:• Singular value decomposition of :

• Norms:• Frobenius norm:

• Nuclear norm:

• Matrix 2-norm:

• Useful inequalities:

14