
Introduction to Machine Learning

Gilles Gasso

INSA Rouen - ASI Department, LITIS Lab

September 5, 2019


Machine Learning: introduction

Machine Learning ≡ data-based programming

ability of computers to learn how to perform tasks (classification, detection, translation, ...) without being explicitly programmed

study of algorithms that improve their performance at some task based on experience


The rise

Data
Big Data: continuous increase in data generated
- Twitter: 50M tweets/day (= 7 terabytes)
- Facebook: 10 terabytes/day
- Youtube: 50 hours of uploaded videos/minute
- 2.9 million e-mails/second

Computing Power
- Moore's law
- Massively distributed computing

The value-adding process
- Interest: from product to customers
- Data Mining ≡ discovering patterns in large data sets


Historical perspective

Today's AI is Deep Learning (a technique of Machine Learning)

https://blog.alore.io/machine-learning-and-artificial-intelligence/


Applications: sentiment analysis

Classify stored reviews according to users’ sentiment

Score(x) > 0: positive sentiment

Score(x) < 0: negative sentiment
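A minimal sketch of such a scorer, assuming a bag-of-words linear model (the vocabulary, weights, and function names are illustrative, not from the slides):

```python
# Linear sentiment scorer: Score(x) = b + sum of the weights of words in x.
# Vocabulary and weights are made up for the example.
weights = {"great": 1.5, "good": 1.0, "bad": -1.0, "awful": -2.0}
bias = 0.0

def score(review: str) -> float:
    """Sum the weights of the known words appearing in the review."""
    return bias + sum(weights.get(w, 0.0) for w in review.lower().split())

def classify(review: str) -> str:
    return "positive" if score(review) > 0 else "negative"

print(classify("great camera and good battery"))  # Score(x) > 0 -> positive
print(classify("awful screen"))                   # Score(x) < 0 -> negative
```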


Product recommendation

Product features

User features

Matrix factorization of the purchase history of customers

https://katbailey.github.io/images/matrix_factorization.png

Purchased items → recommended products
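A small sketch of the matrix-factorization idea, assuming a toy purchase/rating matrix and plain gradient descent on the observed entries (matrix values, hyper-parameters, and names are illustrative):

```python
import numpy as np

# Factorize a (users x items) matrix R ~= U @ V.T with k latent features,
# fitting only the observed entries.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
observed = R > 0
k, lr, reg = 2, 0.01, 0.1
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user features
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # product features

for _ in range(2000):
    E = observed * (R - U @ V.T)      # error on observed entries only
    U += lr * (E @ V - reg * U)       # gradient step with L2 regularization
    V += lr * (E.T @ U - reg * V)

print(np.round(U @ V.T, 1))  # predicted affinities, incl. unpurchased items
```

The highest predicted scores among items a user has not yet purchased are natural candidates for recommendation.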


Image classification

Labeled training images → deep classification architecture (alternating convolutional and max-pooling layers)

https://www.researchgate.net/profile/Y_Nikitin/publication/270163511/figure/download/fig5/AS:295194831409153@1447391340221/MPCNN-architecture-using-alternating-convolutional-and-max-pooling-layers-13.png

Input images and predicted category

https://adriancolyer.files.wordpress.com/2016/04/imagenet-fig4l.png?w=656


Medical diagnosis

Malignant melanoma detection
- 130,000 images, including over 2,000 cases of cancer
- error rate: 28% (human: 34%)

Digital Mammography DREAM Challenge
- 640,000 mammographies (1,209 participants)
- false-positive rate decreased by 5%

Heart rhythm analysis
- 500,000 ECGs
- accuracy: 92.6% (human: 80.0%), sensitivity: 97%


Object detection

https://www.mdpi.com/applsci/applsci-09-01128/article_deploy/html/images/applsci-09-01128-g004.pn

https://arxiv.org/pdf/1506.02640.pdf


Audio captioning

https://ars.els-cdn.com/content/image/1-s2.0-S1574954115000151-gr1.jpg


Chatbot

https://miro.medium.com/max/2058/1*LF5T9fsr4w2EqyFJkb-gng.png


Implementing a Machine Learning project

Real World
- Industries: insurance, Internet, social networks, medical, ...

Data Gathering
- Sensors, transactions, web-clicks, logs, mobility, open data, docs, ...

Data Strengthening
- Cleaning, organizing, aggregation, management of errors, ...

Data Engineering
- Exploration, representation, display, Machine Learning, Data Mining, ...

Business
- Interpretation, patterns of interest, strategic decisions, valuation, ...


Chain of the data engineering process

Data → Pre-processing → Train the model → Evaluate the model → Model

1. Understand and specify project goals
2. Pre-process/visualize/analyze the data
3. Which ML problem is it?
4. Design a solving approach
5. Evaluate its performance
6. Go back to 2) if needed


Course Goal: Study the steps from 2 to 5
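As a rough illustration of steps 2 to 5, a minimal scikit-learn sketch (the wine dataset and the logistic-regression model are assumptions made for the example):

```python
# Steps 2-5 in miniature: pre-process, train, evaluate; iterate if needed.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)                        # 2) get/inspect data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(),                  # 2) pre-processing
                      LogisticRegression(max_iter=1000)) # 3-4) classification
model.fit(X_tr, y_tr)                                    # 4) train
print("test accuracy:", model.score(X_te, y_te))         # 5) evaluate
```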


The data

Information (past experience) comes as examples with attributes. Assume the data set consists of N samples.

Attribute
An attribute is a property or characteristic of the phenomenon being observed. Also called a feature or a variable.

Sample
An entity characterising an object; it is made up of attributes. Synonyms: instance, point, vector (usually in R^d)


Data: visualization

Each point x ∈ R^4 is one row; each feature is one column.

Point x | citric acid | residual sugar | chlorides | sulfur dioxide
   1    |    0        |      1.9       |   0.076   |      11
   2    |    0        |      2.6       |   0.098   |      25
   3    |    0.04     |      2.3       |   0.092   |      15
   4    |    0.56     |      1.9       |   0.075   |      17
   5    |    0        |      1.9       |   0.076   |      11
   6    |    0        |      1.8       |   0.075   |      13
   7    |    0.06     |      1.6       |   0.069   |      15
   8    |    0.02     |      2.0       |   0.073   |       9
   9    |    0.36     |      2.8       |   0.071   |      17
  10    |    0.08     |      1.8       |   0.097   |      15

[Figure: scatter plot of the points, Variable 2 (residual sugar) vs. Variable 3 (chlorides), with the mean of the points highlighted; and a 3D view adding Variable 4 (sulfur dioxide).]


Data types

Sensors → quantitative and qualitative (ordinal, nominal) variables
Text → strings
Speech → time series
Images → 2D data
Videos → 2D data + time
Networks → graphs
Streams → logs, coupons, ...
Labels → expected output prediction


Approaches of Machine Learning


Supervised learning

Principle

Given a set of N training examples {(x_i, y_i) ∈ X × Y, i = 1, …, N}, we want to estimate a prediction function y = f(x).

The supervision comes from knowledge of the labels.

Examples
Image classification, object detection, stock price prediction, ...
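A minimal supervised-learning sketch, assuming synthetic (x_i, y_i) pairs and a linear hypothesis class (the data and model choice are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic supervised data: inputs x_i with known labels y_i.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(100, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.3, size=100)

f = LinearRegression().fit(X, y)   # estimate f from the (x_i, y_i) pairs
print(f.predict([[3.0]]))          # prediction y = f(x) for a new input
```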


Unsupervised learning

Principle

Only the inputs {x_i ∈ X, i = 1, …, N} are available. We aim to describe how the data is organized and to extract homogeneous subsets from it.

Examples
Customer segmentation, image segmentation, data visualization, categorization of similar documents, ...
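A minimal unsupervised sketch, assuming synthetic unlabeled points and K-means clustering with K = 3 (data and parameters are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # inputs only
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])  # homogeneous subset (cluster) assigned to each point
```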

[Figure: 2D embedding of the samples (axes roughly -80 to 80); the points form ten homogeneous clusters, one per handwritten digit 0-9.]


Supervised learning : concept

Let X and Y be two sets, and let p(X, Y) be the joint probability distribution of (X, Y) ∈ X × Y.

Goal: find a prediction function f : X → Y which correctly estimates the output y corresponding to x.

f belongs to a space H called the hypothesis class. Example of H: the set of polynomial functions.


Supervised learning: principle

Loss function L(Y, f(X))

evaluates how "close" the prediction f(x) is to the true label y

it penalizes errors: L(y, f(x)) = 0 if y = f(x), and L(y, f(x)) ≥ 0 if y ≠ f(x)

True risk (expected prediction error)

R(f) = E_{(X,Y)}[L(Y, f(X))] = ∫_{X×Y} L(y, f(x)) p(x, y) dx dy

Objective
Identify the prediction function which minimizes the true risk, i.e.

f* = argmin_{f ∈ H} R(f)


Loss functions

Quadratic loss: L(Y, f(X)) = (Y − f(X))^2

ℓ1 loss (absolute deviation): L(Y, f(X)) = |Y − f(X)|

0-1 loss: L(y, f(x)) = 0 if y = f(x), 1 otherwise

Hinge loss: L(y, f(x)) = max(0, 1 − y f(x))
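These four losses can be written directly in NumPy; a small sketch (the sample values are illustrative):

```python
import numpy as np

# y is the true label, s = f(x) the prediction
# (for the hinge loss, y is in {-1, +1} and s is a real score).
quadratic = lambda y, s: (y - s) ** 2
l1        = lambda y, s: np.abs(y - s)
zero_one  = lambda y, s: np.where(y == s, 0.0, 1.0)
hinge     = lambda y, s: np.maximum(0.0, 1.0 - y * s)

print(quadratic(1.0, 0.7), l1(1.0, 0.7), zero_one(1, -1), hinge(1, 0.3))
```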


Some supervised learning problems

Regression
We talk about regression when Y is a subset of R^d.

Usual related loss function: the quadratic loss (y − f(x))^2

[Figure: Support Vector Machine Regression; fitted curve y = f(x) for x ∈ [0, 5], y ∈ [-1.5, 1].]


Some supervised learning problems

Classification
The output space Y is an unordered discrete set.

Binary classification: card(Y) = 2. Example: Y = {−1, 1}. Loss functions: 0-1 loss, hinge loss

Multiclass classification: card(Y) > 2. Example: Y = {1, 2, …, K}


From true risk to empirical risk

Minimizing the true risk is not doable (in almost all practical applications):

The joint distribution p(X, Y) is unknown!

Only a finite training set {(x_i, y_i) ∈ X × Y}_{i=1}^N is available

Empirical risk

R_emp(f) = (1/N) Σ_{i=1}^N L(y_i, f(x_i))

Empirical risk minimization

f̂ = argmin_{f ∈ H} R_emp(f)
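A minimal sketch of empirical risk minimization, taking H as the polynomials of degree 3 and the quadratic loss (the synthetic sample is an assumption; np.polyfit returns the least-squares minimizer over that class):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 50)                 # finite training set (x_i, y_i)
y = np.sin(x) + rng.normal(scale=0.1, size=50)

f_hat = np.poly1d(np.polyfit(x, y, 3))    # argmin of R_emp over degree-3 H
r_emp = np.mean((y - f_hat(x)) ** 2)      # (1/N) sum of quadratic losses
print("empirical risk:", r_emp)
```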


Overfitting

Empirical risk alone is not appropriate for model selection: if H is large enough, R_emp(f) → 0 while the generalization error (true risk) remains high.

[Figure: prediction error vs. model complexity (low → high); the training-set error keeps decreasing while the test-set error eventually rises.]
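This behaviour is easy to reproduce numerically; a sketch using polynomial degree as the complexity knob (data and degrees are illustrative):

```python
import numpy as np

# Train/test errors as model complexity (polynomial degree) grows.
rng = np.random.default_rng(0)
x_tr, x_te = rng.uniform(0, 5, 30), rng.uniform(0, 5, 200)
y_tr = np.sin(x_tr) + rng.normal(scale=0.2, size=30)
y_te = np.sin(x_te) + rng.normal(scale=0.2, size=200)

for deg in (1, 3, 6, 9):
    f = np.poly1d(np.polyfit(x_tr, y_tr, deg))
    print(deg, np.mean((y_tr - f(x_tr)) ** 2), np.mean((y_te - f(x_te)) ** 2))
```

The training error keeps shrinking as the degree grows, while the test error eventually increases: overfitting.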


Illustration of overfitting

[Figure: empirical risk for decreasing values of k, on the training (App), validation (Val), and test (Test) sets.]

https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/ML_basics_lecture6_overfitting.pdf


Model selection

Find in H the best function f that, learned on the training set, will generalize well (low true risk).

Example: we look for the polynomial function f_α of degree α minimizing the risk R_emp(f_α) = Σ_{i=1}^N (y_i − f_α(x_i))^2.

Goal:
1. Propose a model estimation method in order to choose (approximately) the best model belonging to H.
2. Once the model is selected, estimate its generalization error.


Model selection: basic approach

Case 1: N is really big (large-scale D_N)

[Diagram: the available data (données disponibles) split into training {X_app, Y_app}, validation {X_val, Y_val}, and test {X_test, Y_test} sets.]

1. Randomly split D_N = D_train ∪ D_val ∪ D_test
2. For each α, train f_α on D_train
3. Evaluate its performance on D_val: R_val = (1/N_val) Σ_{i ∈ D_val} L(y_i, f_α(x_i))
4. Select the model with the best performance on D_val
5. Test the selected model on D_test

Note: D_test is used once!
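A sketch of this hold-out procedure, with the polynomial degree playing the role of α (data, split sizes, and candidate degrees are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, 500)
y = np.sin(X) + rng.normal(scale=0.2, size=500)

# 1) random split D_N = D_train + D_val + D_test (60/20/20 here)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25,
                                            random_state=0)

def risk(f, x, t):
    """Empirical quadratic risk of f on the sample (x, t)."""
    return np.mean((t - f(x)) ** 2)

# 2) train f_alpha on D_train for each alpha; 3-4) select on D_val
models = {a: np.poly1d(np.polyfit(X_tr, y_tr, a)) for a in range(1, 8)}
best = min(models, key=lambda a: risk(models[a], X_val, y_val))

# 5) D_test is used once, for the selected model only
print("selected degree:", best,
      "| test risk:", risk(models[best], X_test, y_test))
```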


Model selection: cross-validation

Case 2: small or medium scale D_N

Estimate the generalization error by re-sampling.

Principle

1. Split D_N into K sets of equal size.
2. For each k = 1, …, K, train a model using the K − 1 remaining sets and evaluate it on the k-th part.
3. Average the K error estimates to obtain the cross-validation error.

[Diagram: the K folds rotate between the training (apprentissage), validation, and test roles across the rounds.]
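A sketch of K-fold cross-validation for one candidate model (the data, K = 5, and the polynomial degree are illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, 100)
y = np.sin(X) + rng.normal(scale=0.2, size=100)

errors = []
for tr, val in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    f = np.poly1d(np.polyfit(X[tr], y[tr], 3))          # train on K-1 folds
    errors.append(np.mean((y[val] - f(X[val])) ** 2))   # evaluate on fold k
print("cross-validation error:", np.mean(errors))
```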


Conclusions

To successfully carry out an automatic data processing project:

Clearly identify and spell out the needs

Create or obtain data representative of the problem

Identify the context of learning

Analyze and reduce data size

Choose an algorithm and/or a space of hypotheses

Choose a model by applying the algorithm to pre-processed data

Validate the performance of the method


In the end ...

Find your way...

