![Page 1: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/1.jpg)
Optimization for Data Science
Master 2 Data Science, Univ. Paris Saclay
Robert M. Gower & Alexandre Gramfort
![Page 2: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/2.jpg)
Core Info
● Where: Telecom ParisTech
● Location: Amphi Estaunié or B312
● ECTS: 5 ECTS
● Volume: 40h
● When: 12 weeks (including one week break for holidays + one week for exam)
● Online: All teaching materials on moodle: http://datascience-x-master-paris-saclay.fr/education/
● Students upload their projects / reports via moodle too.
● All students **must** be registered on moodle.
![Page 3: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/3.jpg)
Who am I? Robert M. Gower
● Assistant Prof at Telecom
● [email protected]
● www.ens.fr/~rgower
● Research topics: Stochastic algorithms for optimization, numerical linear algebra, quasi-Newton methods and automatic differentiation (backpropagation).
![Page 4: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/4.jpg)
Introduction to Optimization in Machine Learning
Robert M. Gower
Master 2 Data Science, Univ. Paris Saclay
Optimisation for Data Science
![Page 5: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/5.jpg)
An Introduction to Supervised Learning
![Page 6: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/6.jpg)
References for this class
Convex Optimization
Pages 67 to 79
Understanding Machine Learning: From Theory to Algorithms
Chapter 1
![Page 7: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/7.jpg)
Is There a Cat in the Photo?
Yes
No
![Page 8: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/8.jpg)
Is There a Cat in the Photo?
Yes
![Page 9: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/9.jpg)
Is There a Cat in the Photo?
Yes
![Page 10: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/10.jpg)
Is There a Cat in the Photo?
No
![Page 11: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/11.jpg)
Is There a Cat in the Photo?
Yes
![Page 12: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/12.jpg)
Find mapping h that assigns the “correct” target to each input
Is There a Cat in the Photo?
Yes
No
x: Input/Feature y: Output/Target
![Page 13: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/13.jpg)
Labeled Data: The training set
![Page 14: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/14.jpg)
Labeled Data: The training set
y= -1 means no/false
![Page 15: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/15.jpg)
Labeled Data: The training set
Learning Algorithm
y= -1 means no/false
![Page 16: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/16.jpg)
Labeled Data: The training set
Learning Algorithm
y= -1 means no/false
![Page 17: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/17.jpg)
Labeled Data: The training set
Learning Algorithm
-1
y= -1 means no/false
![Page 18: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/18.jpg)
Example: Linear Regression for Height
Sex: Male, Age: 30, Height: 1.72 m
Sex: Female, Age: 70, Height: 1.52 m
Labeled data
![Page 19: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/19.jpg)
Example Hypothesis: Linear Model
Example: Linear Regression for Height
Sex: Male, Age: 30, Height: 1.72 m
Sex: Female, Age: 70, Height: 1.52 m
Labeled data
![Page 20: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/20.jpg)
Example Training Problem:
Example Hypothesis: Linear Model
Example: Linear Regression for Height
Sex: Male, Age: 30, Height: 1.72 m
Sex: Female, Age: 70, Height: 1.52 m
Labeled data
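A minimal sketch of the training problem for a linear hypothesis, assuming the model h_w(x) = <w, x> and a mean squared loss. The first two rows are the two labeled examples above; the other two rows, the 0/1 encoding of sex and the intercept column are hypothetical choices made only for illustration.

```python
import numpy as np

# Features per row: [1 (intercept), sex (1 = male, 0 = female), age in years].
# Rows 1-2 are the slide examples; rows 3-4 are made-up points.
X = np.array([
    [1.0, 1.0, 30.0],
    [1.0, 0.0, 70.0],
    [1.0, 1.0, 50.0],
    [1.0, 0.0, 25.0],
])
y = np.array([1.72, 1.52, 1.68, 1.61])  # heights in metres


def linear_hypothesis(w, X):
    """Linear model h_w(x) = <w, x>, evaluated on every row of X."""
    return X @ w


def training_loss(w, X, y):
    """Mean squared error between predictions and labels."""
    return np.mean((linear_hypothesis(w, X) - y) ** 2)


# Training: solve min_w ||X w - y||^2 with a least-squares solver.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("weights:", w_hat)
print("training loss:", training_loss(w_hat, X, y))
```

The made-up points were chosen to lie exactly on a linear model, so the fit is perfect here; real data would leave a nonzero training loss.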
![Page 21: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/21.jpg)
Linear Regression for Height
Age
Height
![Page 22: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/22.jpg)
Linear Regression for Height
The Training Algorithm
Age
Height
![Page 23: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/23.jpg)
Linear Regression for Height
The Training Algorithm
Age
Height
Other options aside from linear?
![Page 24: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/24.jpg)
Parametrizing the Hypothesis
Linear:
Polynomial:
Neural Net:
(Plots: Height against Age)
![Page 25: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/25.jpg)
Loss Functions
Why a Squared Loss?
![Page 26: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/26.jpg)
Loss Functions
Why a Squared Loss?
The Training Problem
![Page 27: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/27.jpg)
Loss Functions
Why a Squared Loss?
The Training Problem
Typically a convex function
![Page 28: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/28.jpg)
Choosing the Loss Function
Quadratic Loss
Binary Loss
Hinge Loss
![Page 29: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/29.jpg)
Choosing the Loss Function
Quadratic Loss
Binary Loss
Hinge Loss
y=1 in all figures
![Page 30: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/30.jpg)
Choosing the Loss Function
Quadratic Loss
Binary Loss
Hinge Loss
EXE: Plot the binary and hinge loss function in when
y=1 in all figures
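The three losses can be sketched as plain functions of a prediction h and a label y in {-1, +1}. The definitions below are the usual textbook ones and are an assumption, since the slide formulas are not reproduced here; in particular the binary loss is taken as the 0-1 loss on the sign of h.

```python
def quadratic_loss(h, y):
    """Squared difference between prediction and label."""
    return (h - y) ** 2


def binary_loss(h, y):
    """0-1 loss (assumed form): 0 if h and y have the same sign, else 1."""
    return 0.0 if h * y > 0 else 1.0


def hinge_loss(h, y):
    """Hinge loss: zero once the margin h * y reaches 1, linear below that."""
    return max(0.0, 1.0 - h * y)


# Tabulate the three losses for y = 1, as in the figures.
y = 1
for h in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(h, quadratic_loss(h, y), binary_loss(h, y), hinge_loss(h, y))
```

Feeding the same functions to a plotting library over a grid of h values gives the curves asked for in the exercise.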
![Page 31: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/31.jpg)
Loss Functions
Is a notion of Loss enough?
What happens when we do not have enough data?
![Page 32: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/32.jpg)
Loss Functions
The Training Problem
Is a notion of Loss enough?
What happens when we do not have enough data?
![Page 33: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/33.jpg)
Overfitting and Model Complexity
Fitting 1st order polynomial
![Page 34: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/34.jpg)
Overfitting and Model Complexity
Fitting 1st order polynomial
![Page 35: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/35.jpg)
Overfitting and Model Complexity
Fitting 3rd order polynomial
![Page 36: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/36.jpg)
Overfitting and Model Complexity
Fitting 9th order polynomial
![Page 37: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/37.jpg)
Regularizer Functions
General Training Problem
Regularization
![Page 38: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/38.jpg)
Regularizer Functions
General Training Problem
Regularization
Goodness of fit, fidelity term, etc.
![Page 39: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/39.jpg)
Regularizer Functions
General Training Problem
Regularization
Goodness of fit, fidelity term, etc.
Penalizes complexity
![Page 40: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/40.jpg)
Regularizer Functions
General Training Problem
Regularization
Goodness of fit, fidelity term, etc.
Penalizes complexity
Controls tradeoff between fit and complexity
![Page 41: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/41.jpg)
Regularizer Functions
General Training Problem
Regularization
Exe:
Goodness of fit, fidelity term, etc.
Penalizes complexity
Controls tradeoff between fit and complexity
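One common way to write a regularized training problem that matches the three annotations is sketched below. The symbols are assumptions, not taken from the slide: ℓ is the loss, h_w the hypothesis with parameters w, R the regularizer and λ ≥ 0 the regularization parameter.

```latex
\min_{w \in \mathbb{R}^d} \;
  \underbrace{\frac{1}{n} \sum_{i=1}^{n} \ell\big(h_w(x_i),\, y_i\big)}_{\text{goodness of fit}}
  \; + \;
  \underbrace{\lambda}_{\text{tradeoff}} \,
  \underbrace{R(w)}_{\text{penalizes complexity}}
```

With λ = 0 only the fit matters; as λ grows, the regularizer dominates and simpler parameter vectors are preferred.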
![Page 42: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/42.jpg)
Overfitting and Model Complexity
Fitting kth order polynomial
![Page 43: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/43.jpg)
Overfitting and Model Complexity
Fitting kth order polynomial
For λ big enough, the solution is a 2nd order polynomial
![Page 44: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/44.jpg)
Linear hypothesis
Exe: Ridge Regression
Ridge Regression
L2 loss
L2 regularizer
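A minimal sketch of ridge regression, assuming the objective (1/n)·||Xw − y||² + λ·||w||² with one sample per row of X. The scaling of the two terms is an assumption (other conventions use λ/2 or drop the 1/n), and the data is randomly generated for illustration.

```python
import numpy as np


def ridge_objective(w, X, y, lam):
    """L2 loss plus L2 regularizer for a linear hypothesis."""
    return np.mean((X @ w - y) ** 2) + lam * np.dot(w, w)


def ridge_fit(X, y, lam):
    """Closed-form minimizer: solve (X^T X + n * lam * I) w = X^T y."""
    n, d = X.shape
    A = X.T @ X + n * lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y)


# Hypothetical data: 20 samples, 3 features, noisy linear labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(20)

w_small = ridge_fit(X, y, lam=0.1)
w_large = ridge_fit(X, y, lam=10.0)
print("lam = 0.1 :", w_small, ridge_objective(w_small, X, y, 0.1))
print("lam = 10.0:", w_large, ridge_objective(w_large, X, y, 10.0))
```

Setting the gradient (2/n)·Xᵀ(Xw − y) + 2λw to zero gives the linear system solved in `ridge_fit`. A larger λ shrinks the solution towards zero.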
![Page 45: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/45.jpg)
Linear hypothesis
Exe: Support Vector Machines
SVM with soft margin
Hinge loss
L2 regularizer
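A sketch of a soft-margin SVM trained by subgradient descent, assuming the objective (1/n)·Σ max(0, 1 − yᵢ<w, xᵢ>) + λ·||w||² without a bias term. The four data points, the step size and the iteration count are hypothetical, and subgradient descent is only one of several possible solvers.

```python
import numpy as np


def svm_subgradient(w, X, y, lam):
    """A subgradient of the hinge loss plus L2 regularizer at w."""
    margins = y * (X @ w)
    active = margins < 1.0  # points that violate the margin
    g_loss = -(y[active][:, None] * X[active]).sum(axis=0) / len(y)
    return g_loss + 2.0 * lam * w


def svm_train(X, y, lam=0.01, step=0.1, n_iters=200):
    """Plain subgradient descent with a constant step size."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w - step * svm_subgradient(w, X, y, lam)
    return w


# Hypothetical linearly separable data with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = svm_train(X, y)
predictions = np.sign(X @ w)
print("weights:", w)
print("predictions:", predictions)
```

The hinge loss is not differentiable at margin 1, which is why a subgradient is used in place of a gradient.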
![Page 46: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/46.jpg)
Linear hypothesis
Exe: Logistic Regression
Logistic Regression
Logistic loss
L2 regularizer
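A sketch of L2-regularized logistic regression trained by gradient descent, assuming the logistic loss log(1 + exp(−y·<w, x>)) with labels in {-1, +1} and no bias term. The data, the step size and the iteration count are hypothetical.

```python
import numpy as np


def logistic_objective(w, X, y, lam):
    """Mean logistic loss plus L2 regularizer."""
    margins = y * (X @ w)
    # logaddexp(0, -m) computes log(1 + exp(-m)) in a numerically stable way.
    return np.mean(np.logaddexp(0.0, -margins)) + lam * np.dot(w, w)


def logistic_gradient(w, X, y, lam):
    """Gradient of logistic_objective with respect to w."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))  # derivative of the loss per sample
    return X.T @ coeff / len(y) + 2.0 * lam * w


def logistic_train(X, y, lam=0.01, step=0.1, n_iters=500):
    """Gradient descent with a constant step size."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w - step * logistic_gradient(w, X, y, lam)
    return w


# Hypothetical linearly separable data with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = logistic_train(X, y)
print("weights:", w)
print("objective at 0:", logistic_objective(np.zeros(2), X, y, 0.01))
print("objective at w:", logistic_objective(w, X, y, 0.01))
```

Unlike the hinge loss, the logistic loss is smooth, so ordinary gradient descent applies.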
![Page 47: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/47.jpg)
The Machine Learner's Job
![Page 48: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/48.jpg)
The Machine Learner's Job
![Page 49: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/49.jpg)
The Machine Learner's Job
![Page 50: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/50.jpg)
The Machine Learner's Job
![Page 51: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/51.jpg)
The Machine Learner's Job
![Page 52: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/52.jpg)
The Machine Learner's Job
![Page 53: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/53.jpg)
The Statistical Learning Problem: The hard truth
Do we really care if the loss is small on the known labelled data pairs $(x_i, y_i)$? Nope.
We really want to have a small loss on new unlabelled observations! Assume the data is sampled as $(x_i, y_i) \sim \mathcal{D}$, where $\mathcal{D}$ is an unknown distribution.
![Page 54: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science](https://reader033.vdocuments.mx/reader033/viewer/2022042302/5ecce711e8355a7b3b2fa879/html5/thumbnails/54.jpg)
The Statistical Learning Problem: The hard truth
The statistical learning problem: minimize the expected loss over an unknown distribution
Variance of sample mean:
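For n i.i.d. samples with variance σ², the sample mean has variance σ²/n. This is why an average over the training set can stand in for an expectation. The simulation below is a sketch that checks this numerically; the Gaussian distribution, σ² = 4 and the number of repetitions are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0      # variance of a single sample (assumed value)
n_repeats = 2000  # number of independent sample means per n

empirical = {}
for n in (10, 100, 1000):
    # Each row is one data set of n i.i.d. samples.
    samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(n_repeats, n))
    sample_means = samples.mean(axis=1)
    empirical[n] = sample_means.var()
    print(n, "empirical:", empirical[n], "sigma^2 / n:", sigma2 / n)
```

The empirical variance of the sample mean drops by roughly a factor of 10 each time n grows by a factor of 10.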