Statistical Machine Learning Part I: Statistical Learning ...
Course Introduction
• The Promises of Big Data
• What kind of tools will we use?
• Do we have to program?
• For starters... a first assignment
• Why is this useful for me?
SML - 2015 2
The Promises of Big Data
Personal Health
• Data can help us predict when people will have to go to the hospital
Heritage Health Prize
Small Businesses
• Data can help us predict the dynamics of restaurants’ popularity
Yelp.com dataset challenge
Lending Money
• Data can help us predict who we can lend money to
www.lendingclub.com
Movies
• Data can help us predict whether people will like a given movie
Netflix Prize, Research@ATT Hao Zhang
All these problems have in common that...
Data is Available
all you have to do is download it... and analyze it!
What we will do in 7 lectures
The graduate school has many courses on how to handle data. Check the course offerings.
In these 7 lectures, we will focus on 3 things:
• Present elementary tools: regression and classification
• Study the mathematical foundations of statistical learning theory:
◦ Choose the right models, address computational issues.
◦ Address the problem of overfitting.
• Introduce advanced topics: kernel methods, sparsity.
What kind of mathematical tools?
We will adopt a mathematical formalism to propose and study algorithms.
Probability & Statistics, Linear Algebra, Optimization
Mathematical Tools
• Probability & Statistics (to handle uncertainty & randomness)
◦ Probability spaces, random variables
◦ Expectation, variance, inequalities
◦ Central limit theorem, convergence in probability
• Linear Algebra (to handle high-dimensional problems)
◦ Matrix inverse, eigenvalues/eigenvectors
◦ Positive-definiteness
• Optimization (to give the best possible answer)
◦ Convex programs
◦ Lagrangian, Lagrange multipliers, etc.
Programming
This is not a course about programming, but we will implement algorithms
I encourage you to use MATLAB,
but you can use any other language or environment (R, Python, etc.).
I do not recommend using C/C++ or other compiled languages.
For Starters...
Some simple ideas and a 1st assignment.
A function
[Figure: plot of $f(x) = (x-1)^4 - (x-3)^2 + (x-2)^3$ for $0 \le x \le 4$]
a polynomial plotted between 0 and 4...
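A minimal Python sketch of the polynomial on this slide (Python being one of the alternatives to MATLAB allowed in this course; the evaluation points are an arbitrary choice). Expanding the three powers gives $f(x) = x^4 - 3x^3 - x^2 + 14x - 16$, so $f$ has degree 4; the code checks the expansion numerically.

```python
def f(x):
    """The polynomial from the slides: (x - 1)^4 - (x - 3)^2 + (x - 2)^3."""
    return (x - 1) ** 4 - (x - 3) ** 2 + (x - 2) ** 3


def f_expanded(x):
    """The same polynomial after expanding the three powers."""
    return x ** 4 - 3 * x ** 3 - x ** 2 + 14 * x - 16


# Values at the integers of the plotted interval [0, 4].
for x in range(5):
    print(x, f(x))  # f(0) = -16, f(1) = -5, f(2) = 0, f(3) = 17, f(4) = 88

# Two polynomials of degree <= 4 that agree at 21 points are identical.
# Integer inputs keep the arithmetic exact.
assert all(f(x) == f_expanded(x) for x in range(-10, 11))
```

The range of values, from $-16$ at $x = 0$ to $88$ at $x = 4$, matches the vertical extent of the plot.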
A function
[Figure: the same plot of $f(x) = (x-1)^4 - (x-3)^2 + (x-2)^3$ on $0 \le x \le 4$]
... can be seen as a very detailed scatter plot.
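To make the "detailed scatter plot" view concrete, here is a sketch that samples $f$ on a fine, evenly spaced grid. The grid size of 401 points (step 0.01) is an arbitrary choice. Drawing these $(x, y)$ pairs as dots, e.g. with matplotlib's `scatter`, would look just like the continuous curve.

```python
def f(x):
    return (x - 1) ** 4 - (x - 3) ** 2 + (x - 2) ** 3


# 401 evenly spaced abscissae covering [0, 4] with step 0.01.
n = 401
xs = [4 * i / (n - 1) for i in range(n)]

# A very dense scatter plot of f is just a long list of (x, f(x)) pairs.
points = [(x, f(x)) for x in xs]
print(len(points), points[0], points[-1])  # 401 (0.0, -16.0) (4.0, 88.0)
```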
A function
[Figure: the same plot of $f(x) = (x-1)^4 - (x-3)^2 + (x-2)^3$ on $0 \le x \le 4$]
Yet, when fewer points are available...
A function
[Figure: the same plot of $f(x) = (x-1)^4 - (x-3)^2 + (x-2)^3$ on $0 \le x \le 4$]
can we still guess the whole blue line?
A partially observed function
[Figure: the function observed only at a few red sample points, $0 \le x \le 4$]
Assume we only have the red points.
We can guess by using interpolating polynomials
[Figure: the sample points (data 1) with quadratic, cubic, and 4th-degree polynomial fits]
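Curve fitting tools can help us get back the original function. We can actually reconstruct it perfectly: since f is a polynomial of degree 4, five or more exact samples determine it uniquely, and the 4th-degree fit coincides with f.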
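The slides use MATLAB's curve fitting tool. As a library-free stand-in, here is a sketch of Lagrange interpolation, a classical technique swapped in here rather than the tool's own method. The five evenly spaced nodes are an assumed placement for the red points, which the slides do not list. Because $f$ has degree 4, the interpolant through five distinct points of its graph is $f$ itself, so the error on a fine grid is only floating-point round-off.

```python
def f(x):
    return (x - 1) ** 4 - (x - 3) ** 2 + (x - 2) ** 3


def lagrange(points):
    """Return the polynomial of degree < len(points) through `points`."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            # Basis polynomial L_i(x): 1 at node xi, 0 at every other node.
            li = 1.0
            for j, (xj, _) in enumerate(points):
                if j != i:
                    li *= (x - xj) / (xi - xj)
            total += yi * li
        return total
    return p


# Hypothetical positions of the red points: five evenly spaced nodes.
nodes = [0.0, 1.0, 2.0, 3.0, 4.0]
p = lagrange([(x, f(x)) for x in nodes])

# Compare interpolant and original on a fine grid, mostly away from the nodes.
grid = [4 * k / 400 for k in range(401)]
max_err = max(abs(p(x) - f(x)) for x in grid)
print(max_err)  # round-off level, since f itself has degree 4
```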
Polynomial Interpolation
[Figure: unevenly spaced sample points of the function]
even if points are not evenly spaced...
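Uneven spacing does not matter for exact data. Any five distinct nodes give an invertible Vandermonde system, and its solution is the coefficient vector of $f$. The sketch below uses hypothetical uneven nodes and solves the system by Gaussian elimination with partial pivoting. It recovers the coefficients of $x^4 - 3x^3 - x^2 + 14x - 16$, listed from the constant term upward.

```python
def f(x):
    return (x - 1) ** 4 - (x - 3) ** 2 + (x - 2) ** 3


def solve(a, b):
    """Solve the square system a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - s) / m[r][r]
    return coeffs


# Hypothetical unevenly spaced nodes in [0, 4].
nodes = [0.0, 0.3, 1.1, 2.6, 4.0]
# Vandermonde matrix: row i is [1, x_i, x_i^2, x_i^3, x_i^4].
vandermonde = [[x ** k for k in range(5)] for x in nodes]
coeffs = solve(vandermonde, [f(x) for x in nodes])
print([round(c, 6) for c in coeffs])  # [-16.0, 14.0, -1.0, -3.0, 1.0]
```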
Polynomial Interpolation
[Figure: unevenly spaced sample points (data 1) with quadratic, cubic, and 4th-degree polynomial fits]
Uncertainty in measurements
[Figure: sample points of the function]
sometimes, we do not have access to the correct information...
Uncertainty in measurements
[Figure: noisy sample points of the function]
but only to information corrupted by “noise”.
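A common way to model such measurements, assumed here rather than taken from the slides, is additive Gaussian noise: each observation is $y_i = f(x_i) + \varepsilon_i$ with independent $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. The sketch below uses hypothetical choices for the nine inputs, $\sigma = 10$, and the seed.

```python
import random


def f(x):
    return (x - 1) ** 4 - (x - 3) ** 2 + (x - 2) ** 3


def noisy_samples(xs, sigma, seed=0):
    """Return pairs (x, f(x) + e) with independent e ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # private, seeded generator: reproducible draws
    return [(x, f(x) + rng.gauss(0.0, sigma)) for x in xs]


# Hypothetical setup: nine evenly spaced inputs in [0, 4], noise level sigma = 10.
xs = [0.5 * i for i in range(9)]
data = noisy_samples(xs, sigma=10.0)
for x, y in data:
    print(f"x = {x:.1f}   observed {y:8.2f}   true {f(x):8.2f}")
```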
Things become a lot more difficult
[Figure: noisy samples (data 1, data 2) with a 4th-degree polynomial fit]
If we use standard tools...
Things become a lot more difficult
[Figure: noisy samples (data 1, data 2), the 4th-degree fit, and the original curve $f(x) = (x-1)^4 - (x-3)^2 + (x-2)^3$]
we might be very far from the original function.
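Here is a sketch of what "standard tools" do with noisy data: a least-squares fit of a 4th-degree polynomial, computed via the normal equations $(V^\top V)\,c = V^\top y$, where $V$ is the Vandermonde matrix. The data model (Gaussian noise, $\sigma = 10$, nine inputs, seed 0) is hypothetical. Without noise the fit recovers $f$ up to round-off. With noise the fitted curve is pulled away from $f$, and the size of the gap depends on the particular noise draw.

```python
import random


def f(x):
    return (x - 1) ** 4 - (x - 3) ** 2 + (x - 2) ** 3


def solve(a, b):
    """Solve a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - s) / m[r][r]
    return coeffs


def polyfit(points, degree):
    """Least-squares coefficients (constant term first) via the normal equations."""
    v = [[x ** k for k in range(degree + 1)] for x, _ in points]
    y = [yy for _, yy in points]
    vtv = [[sum(row[i] * row[j] for row in v) for j in range(degree + 1)]
           for i in range(degree + 1)]
    vty = [sum(row[i] * yy for row, yy in zip(v, y)) for i in range(degree + 1)]
    return solve(vtv, vty)


def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


xs = [0.5 * i for i in range(9)]
grid = [4 * k / 400 for k in range(401)]

# Exact data: the degree-4 fit reproduces f up to round-off.
clean = polyfit([(x, f(x)) for x in xs], 4)
clean_err = max(abs(evaluate(clean, x) - f(x)) for x in grid)

# Noisy data: same fit, but the curve now deviates from f.
rng = random.Random(0)
noisy = polyfit([(x, f(x) + rng.gauss(0.0, 10.0)) for x in xs], 4)
noisy_err = max(abs(evaluate(noisy, x) - f(x)) for x in grid)
print(clean_err, noisy_err)
```

The normal equations are adequate for a problem this small. For larger or higher-degree fits, a QR-based least-squares solver is the numerically safer choice.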
Things become a lot more difficult
[Figure: noisy samples (data 1, data 2), the 4th-degree fit, and the original curve $f(x) = (x-1)^4 - (x-3)^2 + (x-2)^3$]
Can we handle uncertainty in a better way? Can we quantify how far we might be from the true function?
How many points do we need to reconstruct a more general curve? Does this work for surfaces in higher dimensions?
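One way to make the last question concrete: a general polynomial of total degree at most $d$ in $n$ variables has $\binom{n+d}{d}$ coefficients. Recovering it exactly from noise-free samples therefore needs at least that many points. The sketch below tabulates this count for $d = 4$, the degree used in the slides; the dimensions shown are arbitrary.

```python
from math import comb


def num_coefficients(n_vars, degree):
    """Number of monomials of total degree <= `degree` in `n_vars` variables."""
    return comb(n_vars + degree, degree)


for n in (1, 2, 10, 100):
    print(n, num_coefficients(n, 4))
# 1 5
# 2 15
# 10 1001
# 100 4598126
```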
Things become a lot more difficult
[Figure: noisy samples (data 1, data 2), the 4th-degree fit, and the original curve $f(x) = (x-1)^4 - (x-3)^2 + (x-2)^3$]
First assignment - due Monday October 13th 23:59 by email
• Look for a definition of interpolation, e.g. check the Wikipedia page.
• Do what I just did with MATLAB and send me an email with the results:
◦ Choose a function (you can use fancier functions: sin, cos, exp, etc.).
◦ Plot it. Scatter plot a few points.
◦ Use these points with the curve fitting tool. Interpolate and compare.
• Finally: give me a hint of what might go wrong in higher dimensions.
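Here is a possible Python version of the assignment's steps. The assignment asks for MATLAB; Python is one of the alternatives allowed in this course. The sketch assumes $\sin$ on $[0, 2\pi]$ as the chosen function and uses Lagrange interpolation at evenly spaced nodes as a stand-in for the curve fitting tool. Plotting is left out to keep it dependency-free. Because $\sin$ is not a polynomial, the error no longer drops to round-off, but for the node counts tried here it shrinks as nodes are added.

```python
import math


def lagrange(points):
    """Return the polynomial of degree < len(points) interpolating `points`."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            li = 1.0  # Lagrange basis polynomial L_i evaluated at x
            for j, (xj, _) in enumerate(points):
                if j != i:
                    li *= (x - xj) / (xi - xj)
            total += yi * li
        return total
    return p


def max_interp_error(func, a, b, n_points, n_grid=1001):
    """Sample `func` at n_points evenly spaced nodes of [a, b], interpolate,
    and return the largest deviation from `func` on a fine grid."""
    nodes = [a + (b - a) * i / (n_points - 1) for i in range(n_points)]
    p = lagrange([(x, func(x)) for x in nodes])
    grid = [a + (b - a) * k / (n_grid - 1) for k in range(n_grid)]
    return max(abs(p(x) - func(x)) for x in grid)


# Choose sin, sample it at n nodes, interpolate, and compare with sin itself.
errors = {n: max_interp_error(math.sin, 0.0, 2 * math.pi, n) for n in (3, 5, 9)}
for n, err in errors.items():
    print(n, err)
```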