TRANSCRIPT
Introduction to Machine Learning
Gilles Gasso
INSA Rouen - ASI Department, LITIS Lab
September 5, 2019
Gilles Gasso Introduction to Machine Learning 1 / 32
Machine Learning: introduction
Machine Learning ≡ data-based programming
Ability of computers to learn how to perform tasks (classification, detection, translation, ...) without being explicitly programmed
Study of algorithms that improve their performance at some task based on experience
The rise
Data
Big Data: continuous increase in the amount of data generated
Twitter: 50M tweets/day (= 7 terabytes)
Facebook: 10 terabytes/day
Youtube: 50 hours of video uploaded/minute
2.9 million e-mails/second

Computing Power
Moore's law
Massively distributed computing

The value-adding process
Interest: from product to customers.
Data Mining ≡ discovering patterns in large data sets
Historical perspective
Today's AI is Deep Learning (a Machine Learning technique)
https://blog.alore.io/machine-learning-and-artificial-intelligence/
Applications: sentiment analysis
Classify stored reviews according to users’ sentiment
Score(x) > 0: positive sentiment; Score(x) < 0: negative sentiment
Product recommendation
[Figure: the purchase history matrix of customers is factorized into user features and product features]
https://katbailey.github.io/images/matrix_factorization.png
[Figure: a purchased item and the resulting recommended products]
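The factorization idea can be sketched with a truncated SVD standing in for learned user/product factors. The rating matrix, its values, and the chosen rank are assumptions made for this illustration, not data from the slide.

```python
import numpy as np

# Hypothetical user-item rating matrix (0 = not rated); purely illustrative.
R = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [1., 0., 5., 4.],
              [0., 1., 4., 5.]])

# Rank-2 truncated SVD as a simple stand-in for learned user/product factors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # low-rank score matrix

# Recommend, for user 0, the unrated item with the highest predicted score.
unrated = np.where(R[0] == 0)[0]
best = int(unrated[np.argmax(R_hat[0, unrated])])
print(best)
```

In practice the factors would be learned by minimizing a loss over the observed entries only; the SVD shortcut here treats missing entries as zeros, which is acceptable only as a sketch.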
Image classification
Labeled training images Deep classification architecture
https://www.researchgate.net/profile/Y_Nikitin/publication/270163511/figure/download/fig5/AS:295194831409153@1447391340221/MPCNN-architecture-using-alternating-convolutional-and-max-pooling-layers-13.png
Input images and predicted category
https://adriancolyer.files.wordpress.com/2016/04/imagenet-fig4l.png?w=656
Medical diagnosis
Malignant melanoma detection: 130,000 images, including over 2,000 cases of cancer; error rate 28% (humans: 34%)
Digital Mammography DREAM Challenge: 640,000 mammograms (1,209 participants); false-positive rate decreased by 5%
Heart rhythm analysis: 500,000 ECGs; accuracy 92.6% (humans: 80.0%), sensitivity 97%
Object detection
https://www.mdpi.com/applsci/applsci-09-01128/article_deploy/html/images/applsci-09-01128-g004.pn
https://arxiv.org/pdf/1506.02640.pdf
Audio captioning
https://ars.els-cdn.com/content/image/1-s2.0-S1574954115000151-gr1.jpg
Chatbot
https://miro.medium.com/max/2058/1*LF5T9fsr4w2EqyFJkb-gng.png
Implementing a Machine Learning project
Real World: industries, insurance, Internet, social networks, medical, ...
Data Gathering: sensors, transactions, web-clicks, logs, mobility, open data, docs, ...
Data Strengthening: cleaning, organizing, aggregation, management of errors, ...
Data Engineering: exploration, representation, display, Machine Learning, Data Mining, ...
Business: interpretation, patterns of interest, strategic decisions, valuation, ...
Chain of the data engineering process
Data → Pre-processing → Train the model → Evaluate the model → Model

1 Understand and specify project goals
2 Pre-process/visualize/analyze the data
3 Which ML problem is it?
4 Design a solving approach
5 Evaluate its performance
6 Go to 2) if needed
Course Goal: Study the steps from 2 to 5
The data
Information (past experience) comes as examples described by attributes. Assume the data set consists of N samples.

Attributes
An attribute is a property or characteristic of the phenomenon being observed. It is also called a feature or a variable.

Sample
An entity characterizing an object; it is made up of attributes. Synonyms: instance, point, vector (usually in R^d)
Data: visualization

Points x \ Features   citric acid   residual sugar   chlorides   sulfur dioxide
 1                    0.00          1.9              0.076       11
 2                    0.00          2.6              0.098       25
 3                    0.04          2.3              0.092       15
 4                    0.56          1.9              0.075       17
 5                    0.00          1.9              0.076       11
 6                    0.00          1.8              0.075       13
 7                    0.06          1.6              0.069       15
 8                    0.02          2.0              0.073        9
 9                    0.36          2.8              0.071       17
10                    0.08          1.8              0.097       15

Each point x ∈ R^4.
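The table can be loaded as an N × d array to make the sample/attribute vocabulary concrete; the "mean of points" computed below is the quantity plotted in the original figure.

```python
import numpy as np

# The ten wine samples from the table above (citric acid, residual sugar,
# chlorides, sulfur dioxide); each row is one point x in R^4.
X = np.array([
    [0.00, 1.9, 0.076, 11],
    [0.00, 2.6, 0.098, 25],
    [0.04, 2.3, 0.092, 15],
    [0.56, 1.9, 0.075, 17],
    [0.00, 1.9, 0.076, 11],
    [0.00, 1.8, 0.075, 13],
    [0.06, 1.6, 0.069, 15],
    [0.02, 2.0, 0.073,  9],
    [0.36, 2.8, 0.071, 17],
    [0.08, 1.8, 0.097, 15],
])

print(X.shape)                 # N = 10 samples, d = 4 attributes
mean_point = X.mean(axis=0)    # the "mean of points" shown in the figure
print(mean_point)
```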
[Figure: 2D scatter of Variable 2 (Residual Sugar) vs. Variable 3 (Chlorides), and a 3D scatter adding Variable 4 (Sulfur); the points and their mean are shown]
Data types
Sensors → quantitative and qualitative variables (ordinal, nominal)
Text → strings
Speech → time series
Images → 2D data
Videos → 2D data + time
Networks → graphs
Streams → logs, coupons, ...
Labels → expected output prediction
Approaches of Machine Learning
Supervised learning
Principle
Given a set of N training examples {(xi, yi) ∈ X × Y, i = 1, · · · , N}, we want to estimate a prediction function y = f(x).
The supervision comes from the label knowledge
Examples: image classification, object detection, stock price prediction, ...
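A minimal sketch of the principle, assuming a synthetic dataset (pairs with y ≈ 2x + 1) and a hypothesis class restricted to affine functions:

```python
import numpy as np

# Toy supervised data: x_i in R, y_i = 2*x_i + 1 plus a little noise
# (the generating process is an assumption made for this sketch).
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=50)
y = 2 * x + 1 + 0.01 * rng.normal(size=50)

# Hypothesis class H = affine functions f(x) = w*x + b, fitted by least squares.
A = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

print(w, b)   # close to the true parameters (2, 1)
```

The "supervision" is exactly the availability of the labels y; without them the least-squares fit above would have nothing to minimize.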
Unsupervised learning
Principle
Only the inputs {xi ∈ X, i = 1, · · · , N} are available. We aim to describe how the data is organized and to extract homogeneous subsets from it.
Examples: customer segmentation, image segmentation, data visualization, categorization of similar documents, ...
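As an illustration, a bare-bones k-means (Lloyd's algorithm) recovering two homogeneous subsets from unlabeled points; the synthetic data and the crude initialization are assumptions of this sketch:

```python
import numpy as np

# Two well-separated groups of unlabeled 2D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)),     # cluster around (0, 0)
               rng.normal(5, 0.3, (30, 2))])    # cluster around (5, 5)

k = 2
centers = X[[0, -1]].copy()          # crude init: one point from each end
for _ in range(10):
    # assign each point to its nearest center, then recompute the centers
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    centers = np.array([X[labels == j].mean(0) for j in range(k)])

print(centers)   # close to the two group means
```

No label ever enters the loop: the structure (two clusters) is inferred from the inputs alone, which is what distinguishes this from the supervised setting above.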
[Figure: 2D embedding of handwritten digit images; each point is drawn as its class label 0-9, and the ten classes form distinct clusters]
Supervised learning : concept
Let X and Y be two sets. Assume p(X, Y) is the joint probability distribution of (X, Y) ∈ X × Y.

Goal: find a prediction function f : X → Y which correctly estimates the output y corresponding to an input x.

f belongs to a space H called the hypothesis class. Example of H: the set of polynomial functions.
Supervised learning: principle
Loss function L(Y, f(X))
evaluates how "close" the prediction f(x) is to the true label y
it penalizes errors: L(y, f(x)) = 0 if y = f(x), and L(y, f(x)) ≥ 0 if y ≠ f(x)
True risk (expected prediction error)

R(f) = E_{(X,Y)}[L(Y, f(X))] = ∫_{X×Y} L(y, f(x)) p(x, y) dx dy

Objective
Identify the prediction function which minimizes the true risk, i.e.

f⋆ = arg min_{f∈H} R(f)
Loss functions
Quadratic loss: L(Y, f(X)) = (Y − f(X))²
ℓ1 loss (absolute deviation): L(Y, f(X)) = |Y − f(X)|
0-1 loss: L(y, f(x)) = 0 if y = f(x), 1 otherwise
Hinge loss: L(y, f(x)) = max(0, 1 − y f(x))
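The four losses, written as vectorized functions (labels in {−1, +1} are assumed for the hinge loss):

```python
import numpy as np

# The four losses above, vectorized over samples.
def quadratic(y, fx):  return (y - fx) ** 2
def l1(y, fx):         return np.abs(y - fx)
def zero_one(y, fx):   return (y != fx).astype(float)
def hinge(y, fx):      return np.maximum(0.0, 1.0 - y * fx)   # y in {-1, +1}

y  = np.array([ 1., -1.,  1.])
fx = np.array([ 0.8, 0.3, -2.0])
print(hinge(y, fx))   # [0.2, 1.3, 3.0]
```

Note the asymmetry of roles: the quadratic and ℓ1 losses compare values (regression), while the 0-1 and hinge losses compare decisions (classification), the hinge loss also rewarding a confident margin y·f(x) ≥ 1.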
Some supervised learning problems
Regression
We talk about regression when Y is a subset of R^d.
Usual related loss function: the quadratic loss (y − f(x))²
[Figure: Support Vector Machine regression fit of y against x]
Some supervised learning problems
Classification
The output space Y is an unordered discrete set.
Binary classification: card(Y) = 2. Example: Y = {−1, 1}. Loss functions: 0-1 loss, hinge loss.
Multiclass classification: card(Y) > 2. Example: Y = {1, 2, · · · , K}
From true risk to empirical risk
Minimizing the true risk is not doable in almost all practical applications:
The joint distribution p(X, Y) is unknown!
Only a finite training set {(xi, yi) ∈ X × Y}, i = 1, · · · , N is available
Empirical risk

Remp(f) = (1/N) ∑_{i=1}^{N} L(yi, f(xi))

Empirical risk minimization

f̂ = arg min_{f∈H} Remp(f)
Overfitting
Overfitting
The empirical risk alone is not appropriate for model selection: if H is large enough, Remp(f) → 0 while the generalization error (true risk) remains high.
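The phenomenon can be reproduced numerically: fitting polynomials of growing degree to a few noisy points (synthetic data, assumed for this sketch), the training error keeps dropping while the error on fresh test data typically does not:

```python
import numpy as np

# Fit polynomials of increasing degree to a few noisy training points and
# compare the empirical (training) error with the error on fresh test data.
rng = np.random.default_rng(0)
f_true = lambda x: np.sin(2 * x)
x_tr = rng.uniform(0, 3, 15);  y_tr = f_true(x_tr) + 0.2 * rng.normal(size=15)
x_te = rng.uniform(0, 3, 200); y_te = f_true(x_te) + 0.2 * rng.normal(size=200)

errs_tr, errs_te = {}, {}
for deg in (1, 3, 12):
    coeffs = np.polyfit(x_tr, y_tr, deg)
    errs_tr[deg] = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    errs_te[deg] = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(deg, errs_tr[deg], errs_te[deg])
```

The degree plays the role of model complexity: the training curve is monotonically non-increasing in the degree, as the figure's "training set" curve shows, while the test error reflects the true risk.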
[Figure: prediction error versus model complexity (from low to high); the training-set error keeps decreasing while the test-set error eventually increases]
Illustration of overfitting
[Figure: empirical risk as a function of decreasing values of k; training (App), validation (Val), and test (Test) error curves]
https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/ML_basics_lecture6_overfitting.pdf
Model selection
Find in H the best function f that, learned from the training set, will generalize well (low true risk).

Example: we look for a polynomial function fα of degree α minimizing the empirical risk Remp(fα) = ∑_{i=1}^{N} (yi − fα(xi))².

Goal:
1 propose a model estimation method in order to choose (approximately) the best model belonging to H
2 once the model is selected, estimate its generalization error
Model selection: basic approach

Case 1: N is really big (large-scale DN)
[Figure: available data randomly split into training {X_train, Y_train}, validation {X_val, Y_val}, and test {X_test, Y_test} sets]
1 Randomly split DN = Dtrain ∪ Dval ∪ Dtest
2 For each α, train fα on Dtrain
3 Evaluate its performance on Dval: Rval = (1/Nval) ∑_{i∈Dval} L(yi, fα(xi))
4 Select the model with the best performance on Dval
5 Test the selected model on Dtest
Note: Dtest is used only once!
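A sketch of the five steps on synthetic data, with the polynomial degree playing the role of α; the data-generating process and the candidate degrees are assumptions of this illustration:

```python
import numpy as np

# Synthetic data standing in for DN; the generating process is an assumption.
rng = np.random.default_rng(1)
N = 3000
x = rng.uniform(0, 3, N)
y = np.sin(2 * x) + 0.2 * rng.normal(size=N)

# 1) random split DN = Dtrain ∪ Dval ∪ Dtest
idx = rng.permutation(N)
tr, va, te = idx[:2000], idx[2000:2500], idx[2500:]

def mse(i, coeffs):
    return float(np.mean((np.polyval(coeffs, x[i]) - y[i]) ** 2))

# 2) for each alpha (here: polynomial degree), train on Dtrain
models = {d: np.polyfit(x[tr], y[tr], d) for d in (1, 2, 3, 5, 9)}

# 3-4) evaluate each candidate on Dval and keep the best one
best = min(models, key=lambda d: mse(va, models[d]))

# 5) estimate the generalization error on Dtest, touched exactly once
print(best, mse(te, models[best]))
```

Keeping Dtest out of the selection loop is what makes its error an honest estimate: as soon as it influences any choice, it stops measuring generalization.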
Model selection : Cross-validation
Case 2 : Small or medium scale DN
Estimate the generalization error by re-sampling.
Principle
1 Split DN into K sets of equal size.
2 For each k = 1, · · · , K, train a model using the K − 1 remaining sets and evaluate it on the k-th part.
3 Average the K error estimates to obtain the cross-validation error.
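The three steps can be sketched as follows, assuming synthetic data and a fixed model family (degree-3 polynomial):

```python
import numpy as np

# K-fold cross-validation error for one model family (degree-3 polynomial).
rng = np.random.default_rng(0)
N, K = 200, 5
x = rng.uniform(0, 3, N)
y = np.sin(2 * x) + 0.2 * rng.normal(size=N)

folds = np.array_split(rng.permutation(N), K)   # K parts of (almost) equal size
errs = []
for k in range(K):
    val = folds[k]                                          # held-out part
    trn = np.concatenate([folds[j] for j in range(K) if j != k])
    coeffs = np.polyfit(x[trn], y[trn], 3)
    errs.append(np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2))

cv_error = float(np.mean(errs))   # average of the K fold errors
print(cv_error)
```

Every sample serves once for validation and K − 1 times for training, which is what makes cross-validation attractive when DN is too small to sacrifice a large held-out set.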
[Figure: K-fold cross-validation; in each of the K rounds, one block serves as validation/test and the remaining blocks are used for training]
Conclusions
To successfully carry out an automatic data processing project:
Clearly identify and spell out the needs
Create or obtain data representative of the problem
Identify the context of learning
Analyze and reduce data size
Choose an algorithm and/or a space of hypotheses
Choose a model by applying the algorithm to pre-processed data
Validate the performance of the method
In the end ...
Find your way. . .