STK2100 - Machine Learning and Statistical Methods for Prediction and Classification

Lecturer: Geir Storvik
Exercises: Lars Henry Berge Olsen
Plan for today:
- Quick overview of the course
- Some topics from the first two chapters
Big data, data science, statistics
- Huge amounts of data are now more easily available:
  - Receipts from supermarkets
  - Credit card data
  - Genetic data
  - Mobile phone data
  - Meteorological data
  - Web data
  - Astronomical data
  - Large physical experiments
- Automatic methods for collecting data
- Major reductions in cost for storage
- Data Science is central in extracting knowledge from data
Data Science
The importance of Data Science:
- https://blog.edx.org/the-importance-of-data-science-in-the-21st-century
- http://bigdata.teradata.com/US/Big-Data-Quick-Start/People-And-Roles/Data-Scientist/
- https://blog.alexa.com/know-data-science-important/
- https://blog.udacity.com/2014/11/data-science-job-skills.html
This course:
- Focus on data analysis
- Statistics, machine learning, statistical learning
Statistics/machine learning/data mining
- Azzalini and Scarpa (2012): "Data mining represents the work of processing, graphically or numerically, large amounts or continuous streams of data, with the aim of extracting information useful to those who possess them."
  This includes statistics, machine learning and database management (IN3020).
- In many cases statistics and machine learning are presented as different things:
  - Statistics: model based
  - Machine learning: algorithm based
  - https://www.svds.com/machine-learning-vs-statistics/
  - Breiman (2001)
- In practice there is considerable overlap, with a common goal: extract knowledge from data
- Required background:
  - Programming
  - Mathematics (calculus, linear algebra)
  - Probability theory, statistical inference
This course: Focus on prediction
- Predicting a stochastic outcome is important in many situations:
  - Numbers infected or hospitalized due to Covid-19
  - Numbers that die within each age group (life insurance)
  - Extreme rainfall (meteorology)
  - Outcome of a medical treatment
  - What people buy in stores
  - and many more
- Prediction of the outcome is typically based on a choice of relevant explanatory variables:
  - Residence, age, education
  - Temperature, wind
  - Genomic data
- Statistical terminology:
  - Continuous response: prediction
  - Categorical response: classification
- Huge datasets are (sometimes) collected:
  - Gives possibilities for more flexible methods/models
  - Can give new challenges in the evaluation of methods
  - Can also give computational challenges
- Many of the main ideas are common with the analysis of smaller datasets.
Car data
- Problem: Predict distance covered per unit fuel (or fuel consumption) as a function of certain characteristics of a car
[Figure: scatterplot matrix of City distance against Engine size, Number of cylinders and Curb weight]
Car data

- Problem: Predict distance covered per unit fuel (or fuel consumption) as a function of certain characteristics of a car
[Figure: scatterplot matrix of City consumption against Engine size, Number of cylinders and Curb weight]
Wage data
- Variable of interest: Wage
- Explanatory variables:
  - age
  - year
  - education
[Figure: Wage plotted against Age, Year and Education Level]
Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.
Stock market data
- Variable of interest: increase/reduction in stock value (direction)
- Explanatory variables:
  - Value on earlier days (lag1-lag5)
  - Volume (volume)
[Figure: boxplots of the percentage change in S&P (Yesterday, Two Days Previous, Three Days Previous) grouped by Today's Direction (Down/Up)]
Eye data
- Gene expression data from a microarray experiment on 120 rats (Scheetz et al., 2006)
- Response y: expression level of the TRIM32 gene
- Explanatory variables x_1, ..., x_200: data from 200 gene probes
[Figure: scatterplot matrix of the response y against the gene probes x.1377, x.1748, x.2487, x.2679 and x.2789]
- Possible model: Y = β_0 + Σ_{j=1}^p β_j X_j + ε
- Least squares estimate:

  β̂ = (X^T X)^{-1} X^T Y

- Problem: X^T X is not invertible!
- Problem: many explanatory variables compared to the number of individuals
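The invertibility problem can be checked numerically: with more columns than rows, X^T X cannot have full rank. A minimal sketch with simulated dimensions matching the eye data (n = 120, p = 200); the matrix entries are random placeholders, not the actual gene-expression values.

```python
import numpy as np

# Simulated design with the eye-data dimensions: n = 120 rats, p = 200 probes.
rng = np.random.default_rng(0)
n, p = 120, 200
X = rng.normal(size=(n, p))

XtX = X.T @ X                       # p x p matrix
rank = np.linalg.matrix_rank(XtX)   # rank(X^T X) = rank(X) <= min(n, p) = n

print(rank)      # 120
print(rank < p)  # True: X^T X is singular, so (X^T X)^{-1} X^T Y cannot be formed
```

Since the rank is at most n = 120 < p = 200, the least squares formula breaks down, which is why methods beyond plain least squares are needed here.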
Images of faces
- Images of students at Stanford (100 women, 100 men)
- Each image consists of 100 × 100 = 10,000 pixels
- Is it possible to classify gender based on these images?
[Figure: four example face images]
- Problem: many explanatory variables compared to the number of individuals
Gene expression data

- 6,830 gene expression measurements
- No specific response/variable of interest
- Data from 64 cancer cells
- Of interest to group the cells
[Figure: two scatterplots of the cells in the coordinates Z1 and Z2]
STK2100
- Focus on prediction and classification
- Will also touch upon problems related to big data:
  - Statistical issues:
    - p larger than n problems
    - Multiple testing
  - Computational issues
- Textbook: Azzalini and Scarpa: Data Analysis and Data Mining
- Supplementary text: James, Witten, Hastie and Tibshirani: An Introduction to Statistical Learning - with applications in R
- Supplemented by extra exercises/notes
- Recommended previous knowledge:
  - Calculus/linear algebra (MAT1100/1110/1120)
  - Probability theory/statistical inference (STK1100/1110)
  - Basic programming (IN1900 or IN1000)
Statistical learning
- Focus: prediction of a response variable based on explanatory variables
  - If quantitative response variable: regression
  - If categorical response variable: classification
- Machine learning: focus on algorithms for prediction
  - Often used as a black box
  - Evaluation often based on a separate test set
- Statistical learning: focus on methods for prediction
  - More of an "open box"
  - Focus on inference:
    - Quantification of uncertainty
    - Evaluation of methods
    - Hypothesis testing
- No clear distinction; both focus on prediction
Prediction - advertising data
- Response: sales of the product in 200 different markets (sales)
- Explanatory variables:
  - Advertising budget on TV (TV)
  - Advertising budget on radio (radio)
  - Advertising budget in newspapers (newspaper)
[Figure: Sales plotted against the TV, Radio and Newspaper advertising budgets]
Prediction - Wage data
- Response: income (income)
- Explanatory variable: years of education
[Figure: two panels of Income against Years of Education]
Statistical model for prediction
- x = (x_1, ..., x_p) is a set of explanatory variables, y is the response
- Possible model: Y = f(x) + ε, E[ε] = 0
- Sales data: possible choice of f:

  f(x) = β_0 + Σ_{j=1}^p β_j x_j

- Wage data: non-linear relationship, not obvious how to choose f
Estimation of f
- Assume Y = f(x) + ε, E[ε] = 0, with f unknown, so that

  E[Y | x] = f(x)

- Why estimate f?
  - Prediction: ŷ = f̂(x)
    - Can think of f̂ as a black box; not important to know the form of f̂
  - Understanding how y is influenced by x_1, ..., x_p:
    - Which variable(s) influence the response?
    - What is the relationship between a variable and the response? Is the relationship linear?
- Precision of an estimate (for given x and f̂):

  E[(Y − Ŷ)²] = E[(f(x) − f̂(x))²]  (reducible)  +  Var(ε)  (irreducible)

  The choice of estimation method influences the reducible part of the prediction error.
Estimation of f - parametric methods

- Linear regression: assume

  f(x) = β_0 + β_1 x_1 + · · · + β_p x_p

  This is an example of a parametric model: it is described by the parameters β = (β_0, ..., β_p), and estimation of f reduces to estimation of β.
- Now assume p = 1 and

  f(x) = β_1 e^{λ_1 x} + β_2 e^{λ_2 x}

  This is also a parametric model, described by the parameters θ = (β_1, β_2, λ_1, λ_2); estimation of f reduces to estimation of θ.
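A minimal sketch of the linear case in Python: estimating f amounts to solving a least squares problem for β. The dataset, dimensions and noise level below are made up for illustration, not taken from the lecture's data.

```python
import numpy as np

# Simulated data from a linear parametric model; all numbers are illustrative.
rng = np.random.default_rng(1)
n, p = 100, 2
X = rng.uniform(0, 10, size=(n, p))
beta_true = np.array([3.0, 1.5, -0.5])                 # (beta_0, beta_1, beta_2)
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.1, size=n)

# Estimate f by estimating beta: least squares on the design with an intercept.
design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

print(beta_hat)  # should be close to beta_true = [3.0, 1.5, -0.5]
```

Because the model is parametric, the whole unknown function f is recovered once the three numbers in β are estimated.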
[Figure: scatterplot of y against x with a fitted curve]
Estimation of f - non-parametric methods
- Non-parametric methods: assume f(x) is a smooth function of x
- Data: {(x_1, y_1), ..., (x_n, y_n)}
- Estimation: f(x) = E[Y | X = x] can be estimated by

  f̂(x) = Σ_{i=1}^n I(x_i = x) y_i / Σ_{i=1}^n I(x_i = x)

  Problem: there can be very few observations with x_i = x.
- Idea: since f(x) ≈ f(x_i) for x_i close to x,

  f̂(x) = Σ_{i=1}^n I(x_i ≈ x) y_i / Σ_{i=1}^n I(x_i ≈ x)
[Figure: estimated Income surface as a function of Years of Education and Seniority]
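The local-averaging idea can be sketched in a few lines of Python. The simulated data, the true function sin(x) and the window width h are all hypothetical choices for illustration.

```python
import numpy as np

def f_hat(x, xs, ys, h=0.5):
    """Estimate f(x) = E[Y | X = x] by averaging y_i over the x_i close to x."""
    close = np.abs(xs - x) <= h       # I(x_i ~ x): indicator of being within h of x
    if not close.any():
        raise ValueError("no observations near x")
    return ys[close].mean()           # sum I(...) y_i / sum I(...)

# Simulated data: true f(x) = sin(x) plus noise.
rng = np.random.default_rng(2)
xs = rng.uniform(0, 10, size=500)
ys = np.sin(xs) + rng.normal(scale=0.1, size=500)

print(f_hat(np.pi / 2, xs, ys))  # near sin(pi/2) = 1, slightly smoothed downwards
```

The window width h controls the trade-off discussed on the next slides: a small h uses few points (high variance), a large h averages over a region where f varies (high bias).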
Parametric or non-parametric
- Precision of estimate:

  E[(Y − Ŷ)²] = E[(f(x) − f̂(x))²]  (reducible)  +  Var(ε)  (irreducible)

- Reducible part:

  E[(f(x) − f̂(x))²] = E[(f(x) − E[f̂(x)] + E[f̂(x)] − f̂(x))²]
                    = [f(x) − E[f̂(x)]]² + E[(E[f̂(x)] − f̂(x))²]
                    = [f(x) − E[f̂(x)]]²  (squared bias)  +  Var[f̂(x)]  (variance)

- Few assumptions about f give small bias but can give high variance
- Stronger assumptions about f can give small variance but high bias (if the assumptions are wrong)
- For interpretation, simpler forms of f are preferable
Overview of different methods
[Figure: methods placed along Flexibility (low to high) and Interpretability (high to low): Subset Selection and Lasso; Least Squares; Generalized Additive Models and Trees; Bagging and Boosting; Support Vector Machines]
Regression vs classification
- Variables are often divided into two types:
  - Quantitative: numerical values (ordered)
  - Qualitative: categorical, no ordering
- Regression: the response is quantitative
- Classification: the response is qualitative
- Note:
  - There can be qualitative explanatory variables in regression
  - There can be quantitative explanatory variables in classification
Unsupervised learning
- So far: response Y and explanatory variable(s) X
  - Supervised learning
- In many situations: only X
  - Gene expression data
- Want to find relations between the variables
- Common approach: cluster/group the data
[Figure: two scatterplots of observations in (X1, X2)]
- Unsupervised learning (Chapter 6)
Evaluation of precision
- Goal: introduce many statistical learning methods (extending linear regression/logistic regression)
- No method will be the best in all situations
  - Depends on how complex f is, how much data is available, etc.
- In a specific problem setting: try out several methods
  - How do we evaluate which one is the best?
- Possible goal (regression): minimize

  E[(Y* − f̂(x*))²]

  where x* are values of the explanatory variables at which we want to predict a new Y*.
- Problem: Y* is unknown
Estimation of measurement error

- Possible estimate of E[(Y* − f̂(x*))²]:

  MSE = (1/n) Σ_{i=1}^n (Y_i − f̂(x_i))²

- The measurement error is then evaluated on the same data as those used for estimating f
  - Problem: we are interested in prediction on new data
  - Use of the same data can give too optimistic answers
[Figure: simulated data with fits of varying flexibility (left), and mean squared error against flexibility (right)]
- Idea: divide the data into two parts:
  - Training data: used for estimating f
  - Test data: used for estimating E[(Y* − f̂(x*))²]
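The training/test idea can be sketched as follows. The simulated dataset, the 50/50 split and the polynomial degrees are hypothetical choices for illustration, not from the lecture.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Simulated data: true f(x) = sin(2*pi*x) plus noise.
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)

train, test = np.arange(100), np.arange(100, 200)    # two halves of the data

results = {}
for degree in (1, 3, 12):                            # increasing flexibility
    fit = Polynomial.fit(x[train], y[train], degree) # estimate f on training data
    train_mse = np.mean((y[train] - fit(x[train])) ** 2)
    test_mse = np.mean((y[test] - fit(x[test])) ** 2)
    results[degree] = (train_mse, test_mse)
    # Training MSE always decreases with flexibility; test MSE need not.
    print(degree, round(train_mse, 3), round(test_mse, 3))
```

Because the polynomial models are nested, the training MSE is guaranteed to decrease with the degree, while the test MSE reflects actual predictive performance on new data.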
Variance vs bias
[Figure: three simulated examples; data with fits of varying flexibility (left) and mean squared error against flexibility (right), together with the MSE, squared bias and variance curves]
E[(f(x) − f̂(x))²] = [f(x) − E[f̂(x)]]²  (squared bias)  +  Var[f̂(x)]  (variance)

- Bias decreases with higher flexibility
- Variance increases with higher flexibility
Classification
- Regression: measurement error measured by E[(Y* − f̂(x*))²]
- Classification: error rate E[I(Y ≠ Ŷ)]
  - Can be estimated by (1/n) Σ_{i=1}^n I(y_i ≠ ŷ_i)
- Same problems as before:
  - The estimated error rate is too optimistic if estimated from the same data
  - Trade-off between bias and variance
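The plug-in estimate of the error rate is just the fraction of misclassified observations; a minimal sketch with made-up labels:

```python
import numpy as np

# Made-up true labels and predictions for illustration.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
y_hat  = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0])

# (1/n) * sum_i I(y_i != yhat_i): two of ten points are misclassified.
error_rate = np.mean(y_true != y_hat)
print(error_rate)  # 0.2
```

As with MSE in regression, this estimate is optimistic when y_hat was produced by a model fitted to the same data, so in practice it should be computed on a held-out test set.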
Bayes classifier
- One can show (exercise) that the classification rule which minimizes the error rate is

  Ŷ = argmax_j Pr(Y = j | X = x)

- This is called the Bayes classifier
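For a made-up model where Pr(Y = j | X = x) is actually known — two univariate Gaussian classes with equal priors — the Bayes classifier can be written out directly. The means and standard deviations below are illustrative choices.

```python
from math import exp, sqrt, pi

def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def bayes_classify(x):
    # argmax_j Pr(Y = j | X = x); with equal priors this is the argmax of
    # the class-conditional densities (by Bayes' rule).
    p0 = normal_pdf(x, mean=-1.0, sd=1.0)
    p1 = normal_pdf(x, mean=1.0, sd=1.0)
    return 0 if p0 > p1 else 1

print(bayes_classify(-2.0))  # 0
print(bayes_classify(0.5))   # 1  (the decision boundary is at x = 0)
```

In real problems Pr(Y = j | X = x) is unknown, which is exactly what motivates methods such as K-nearest neighbors below.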
[Figure: simulated data from two classes in (X1, X2) with the Bayes decision boundary]

The borders between the regions are called Bayes decision boundaries.
K-nearest neighbors
- Ideal: use the Bayes classifier
- In practice Pr(Y = k | X = x*) is unknown and needs to be estimated
- K-nearest neighbors: let N_0 ⊂ {1, ..., n} be the K indices with x_i nearest to x*
- Estimation of Pr(Y = k | X = x*) = E[I(Y = k) | X = x*]:

  P̂r(Y = k | X = x*) = (1/K) Σ_{i ∈ N_0} I(y_i = k)

- Classification rule: classify a test point with X = x* to the class with the highest number of cases among the K nearest points
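A minimal sketch of this rule in Python; the toy points, labels and the choice K = 3 are hypothetical.

```python
import numpy as np

def knn_predict(x_star, X, y, K=3):
    """Classify x_star by majority vote among its K nearest neighbours."""
    dists = np.linalg.norm(X - x_star, axis=1)    # distance from x* to every x_i
    N0 = np.argsort(dists)[:K]                    # indices of the K nearest points
    labels, counts = np.unique(y[N0], return_counts=True)
    return labels[np.argmax(counts)]              # class with the most votes

# Toy data: two well-separated classes in the plane.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(np.array([0.15, 0.1]), X, y, K=3))  # 0
print(knn_predict(np.array([1.05, 1.0]), X, y, K=3))  # 1
```

The choice of K governs the bias-variance trade-off shown in the figures below: K = 1 gives a very flexible (high-variance) rule, while a large K gives a smooth (high-bias) one.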
[Figure: illustration of the nearest neighbours of a test point]
[Figure: KNN decision boundaries for K = 10, K = 1 and K = 100, and training and test error rates plotted against 1/K]
Plan
- Chapter 2: Linear models
  - Linear regression
  - Logistic regression
  - Least squares and maximum likelihood
  - Also parts of Appendix A
- Chapter 3: Optimism, conflicts and trade-offs
  - General concepts, evaluation, comparison
- Chapter 4: Regression methods
- Chapter 5: Classification methods
- Chapter 6: Cluster analysis
A. Azzalini and B. Scarpa. Data Analysis and Data Mining: An Introduction. OUP USA, 2012.