
Introduction to Machine Learning

Gilles Gasso

INSA Rouen - ASI Department, LITIS Lab

September 5, 2019


Machine Learning: introduction

Machine Learning ≡ data-based programming

ability of computers to learn how to perform tasks (classification, detection, translation, ...) without being explicitly programmed

study of algorithms that improve their performance at some task based on experience


The rise

Data
Big Data: continuous increase in data generated
- Twitter: 50M tweets/day (= 7 terabytes)
- Facebook: 10 terabytes/day
- Youtube: 50 hours of uploaded videos/minute
- 2.9 million e-mails/second

Computing Power
- Moore's law
- Massively distributed computing

The value-adding process
- Interest: from product to customers
- Data Mining ≡ discovering patterns in large data sets


Historical perspective

Today's AI is Deep Learning (a technique of Machine Learning)

https://blog.alore.io/machine-learning-and-artificial-intelligence/


Applications: sentiment analysis

Classify stored reviews according to users’ sentiment

Score(x) > 0: positive sentiment

Score(x) < 0: negative sentiment
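A minimal sketch of such a scorer, assuming a bag-of-words linear model (the vocabulary, weights, and function names are illustrative, not from the slides):

```python
# Linear sentiment scorer: Score(x) = b + sum of the weights of words in x.
# Vocabulary and weights are made up for the example.
weights = {"great": 1.5, "good": 1.0, "bad": -1.0, "awful": -2.0}
bias = 0.0

def score(review: str) -> float:
    """Sum the weights of the known words appearing in the review."""
    return bias + sum(weights.get(w, 0.0) for w in review.lower().split())

def classify(review: str) -> str:
    return "positive" if score(review) > 0 else "negative"

print(classify("great camera and good battery"))  # Score(x) > 0 -> positive
print(classify("awful screen"))                   # Score(x) < 0 -> negative
```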


Product recommendation

Product features

User features

Matrix factorization of the purchase history of customers

https://katbailey.github.io/images/matrix_factorization.png

Purchased items → recommended products
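A small sketch of the matrix-factorization idea, assuming a toy purchase/rating matrix and plain gradient descent on the observed entries (matrix values, hyper-parameters, and names are illustrative):

```python
import numpy as np

# Factorize a (users x items) matrix R ~= U @ V.T with k latent features,
# fitting only the observed entries.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
observed = R > 0
k, lr, reg = 2, 0.01, 0.1
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user features
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # product features

for _ in range(2000):
    E = observed * (R - U @ V.T)      # error on observed entries only
    U += lr * (E @ V - reg * U)       # gradient step with L2 regularization
    V += lr * (E.T @ U - reg * V)

print(np.round(U @ V.T, 1))  # predicted affinities, incl. unpurchased items
```

The highest predicted scores among items a user has not yet purchased are natural candidates for recommendation.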


Image classification

Labeled training images → deep classification architecture (alternating convolutional and max-pooling layers)

https://www.researchgate.net/profile/Y_Nikitin/publication/270163511/figure/download/fig5/AS:295194831409153@1447391340221/MPCNN-architecture-using-alternating-convolutional-and-max-pooling-layers-13.png

Input images and predicted category

https://adriancolyer.files.wordpress.com/2016/04/imagenet-fig4l.png?w=656


Medical diagnosis

Malignant melanoma detection
- 130,000 images, including over 2,000 cases of cancer
- error rate: 28% (human: 34%)

Digital Mammography DREAM Challenge
- 640,000 mammographies (1,209 participants)
- false-positive rate decreased by 5%

Heart rhythm analysis
- 500,000 ECGs
- accuracy: 92.6% (human: 80.0%), sensitivity: 97%


Object detection

https://www.mdpi.com/applsci/applsci-09-01128/article_deploy/html/images/applsci-09-01128-g004.pn

https://arxiv.org/pdf/1506.02640.pdf


Audio captioning

https://ars.els-cdn.com/content/image/1-s2.0-S1574954115000151-gr1.jpg


Chatbot

https://miro.medium.com/max/2058/1*LF5T9fsr4w2EqyFJkb-gng.png


Implementing a Machine Learning project

Real World
- Industries: insurance, Internet, social networks, medical, ...

Data Gathering
- Sensors, transactions, web-clicks, logs, mobility, open data, docs, ...

Data Strengthening
- Cleaning, organizing, aggregation, management of errors, ...

Data Engineering
- Exploration, representation, display, Machine Learning, Data Mining, ...

Business
- Interpretation, patterns of interest, strategic decisions, valuation, ...


Chain of the data engineering process

Data → Pre-processing → Train the model → Evaluate the model → Model

1. Understand and specify project goals
2. Pre-process/visualize/analyze the data
3. Which ML problem is it?
4. Design a solving approach
5. Evaluate its performance
6. Go back to 2) if needed


Course Goal: Study the steps from 2 to 5
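As a rough illustration of steps 2 to 5, a minimal scikit-learn sketch (the wine dataset and the logistic-regression model are assumptions made for the example):

```python
# Steps 2-5 in miniature: pre-process, train, evaluate; iterate if needed.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)                        # 2) get/inspect data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(),                  # 2) pre-processing
                      LogisticRegression(max_iter=1000)) # 3-4) classification
model.fit(X_tr, y_tr)                                    # 4) train
print("test accuracy:", model.score(X_te, y_te))         # 5) evaluate
```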


The data

Information (past experience) comes as examples with attributes. Assume the data set consists of N samples.

Attribute
An attribute is a property or characteristic of the phenomenon being observed. Also called a feature or a variable.

Sample
An entity characterising an object; it is made up of attributes. Synonyms: instance, point, vector (usually in R^d)


Data: visualization

Each point x ∈ R^4 is one row; each feature is one column.

Point x | citric acid | residual sugar | chlorides | sulfur dioxide
   1    |    0        |      1.9       |   0.076   |      11
   2    |    0        |      2.6       |   0.098   |      25
   3    |    0.04     |      2.3       |   0.092   |      15
   4    |    0.56     |      1.9       |   0.075   |      17
   5    |    0        |      1.9       |   0.076   |      11
   6    |    0        |      1.8       |   0.075   |      13
   7    |    0.06     |      1.6       |   0.069   |      15
   8    |    0.02     |      2.0       |   0.073   |       9
   9    |    0.36     |      2.8       |   0.071   |      17
  10    |    0.08     |      1.8       |   0.097   |      15

[Figure: scatter plot of the points, Variable 2 (residual sugar) vs. Variable 3 (chlorides), with the mean of the points highlighted; and a 3D view adding Variable 4 (sulfur dioxide).]


Data types

Sensors → quantitative and qualitative (ordinal, nominal) variables
Text → strings
Speech → time series
Images → 2D data
Videos → 2D data + time
Networks → graphs
Streams → logs, coupons, ...
Labels → expected output prediction


Approaches of Machine Learning


Supervised learning

Principle

Given a set of N training examples {(x_i, y_i) ∈ X × Y, i = 1, …, N}, we want to estimate a prediction function y = f(x).

The supervision comes from knowledge of the labels.

Examples
Image classification, object detection, stock price prediction, ...
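A minimal supervised-learning sketch, assuming synthetic (x_i, y_i) pairs and a linear hypothesis class (the data and model choice are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic supervised data: inputs x_i with known labels y_i.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(100, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.3, size=100)

f = LinearRegression().fit(X, y)   # estimate f from the (x_i, y_i) pairs
print(f.predict([[3.0]]))          # prediction y = f(x) for a new input
```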


Unsupervised learning

Principle

Only the inputs {x_i ∈ X, i = 1, …, N} are available. We aim to describe how the data is organized and to extract homogeneous subsets from it.

Examples
Customer segmentation, image segmentation, data visualization, categorization of similar documents, ...
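A minimal unsupervised sketch, assuming synthetic unlabeled points and K-means clustering with K = 3 (data and parameters are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # inputs only
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])  # homogeneous subset (cluster) assigned to each point
```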

[Figure: 2D embedding of the samples (axes roughly -80 to 80); the points form ten homogeneous clusters, one per handwritten digit 0-9.]


Supervised learning : concept

Let X and Y be two sets, and let p(X, Y) be the joint probability distribution of (X, Y) ∈ X × Y.

Goal: find a prediction function f : X → Y which correctly estimates the output y corresponding to x.

f belongs to a space H called the hypothesis class. Example of H: the set of polynomial functions.


Supervised learning: principle

Loss function L(Y, f(X))

evaluates how "close" the prediction f(x) is to the true label y

it penalizes errors: L(y, f(x)) = 0 if y = f(x), and L(y, f(x)) ≥ 0 if y ≠ f(x)

True risk (expected prediction error)

R(f) = E_{(X,Y)}[L(Y, f(X))] = ∫_{X×Y} L(y, f(x)) p(x, y) dx dy

Objective
Identify the prediction function which minimizes the true risk, i.e.

f* = argmin_{f ∈ H} R(f)


Loss functions

Quadratic loss: L(Y, f(X)) = (Y − f(X))^2

ℓ1 loss (absolute deviation): L(Y, f(X)) = |Y − f(X)|

0-1 loss: L(y, f(x)) = 0 if y = f(x), 1 otherwise

Hinge loss: L(y, f(x)) = max(0, 1 − y f(x))
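These four losses can be written directly in NumPy; a small sketch (the sample values are illustrative):

```python
import numpy as np

# y is the true label, s = f(x) the prediction
# (for the hinge loss, y is in {-1, +1} and s is a real score).
quadratic = lambda y, s: (y - s) ** 2
l1        = lambda y, s: np.abs(y - s)
zero_one  = lambda y, s: np.where(y == s, 0.0, 1.0)
hinge     = lambda y, s: np.maximum(0.0, 1.0 - y * s)

print(quadratic(1.0, 0.7), l1(1.0, 0.7), zero_one(1, -1), hinge(1, 0.3))
```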


Some supervised learning problems

Regression
We talk about regression when Y is a subset of R^d.

Usual related loss function: the quadratic loss (y − f(x))^2

[Figure: Support Vector Machine Regression; fitted curve y = f(x) for x ∈ [0, 5], y ∈ [-1.5, 1].]


Some supervised learning problems

Classification
The output space Y is an unordered discrete set.

Binary classification: card(Y) = 2. Example: Y = {−1, 1}. Loss functions: 0-1 loss, hinge loss

Multiclass classification: card(Y) > 2. Example: Y = {1, 2, …, K}


From true risk to empirical risk

Minimizing the true risk is not doable (in almost all practical applications):

The joint distribution p(X, Y) is unknown!

Only a finite training set {(x_i, y_i) ∈ X × Y}_{i=1}^N is available

Empirical risk

R_emp(f) = (1/N) Σ_{i=1}^N L(y_i, f(x_i))

Empirical risk minimization

f̂ = argmin_{f ∈ H} R_emp(f)
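A minimal sketch of empirical risk minimization, taking H as the polynomials of degree 3 and the quadratic loss (the synthetic sample is an assumption; np.polyfit returns the least-squares minimizer over that class):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 50)                 # finite training set (x_i, y_i)
y = np.sin(x) + rng.normal(scale=0.1, size=50)

f_hat = np.poly1d(np.polyfit(x, y, 3))    # argmin of R_emp over degree-3 H
r_emp = np.mean((y - f_hat(x)) ** 2)      # (1/N) sum of quadratic losses
print("empirical risk:", r_emp)
```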


Overfitting

Empirical risk alone is not appropriate for model selection: if H is large enough, R_emp(f) → 0 while the generalization error (true risk) remains high.

[Figure: prediction error vs. model complexity (low → high); the training-set error keeps decreasing while the test-set error eventually rises.]
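This behaviour is easy to reproduce numerically; a sketch using polynomial degree as the complexity knob (data and degrees are illustrative):

```python
import numpy as np

# Train/test errors as model complexity (polynomial degree) grows.
rng = np.random.default_rng(0)
x_tr, x_te = rng.uniform(0, 5, 30), rng.uniform(0, 5, 200)
y_tr = np.sin(x_tr) + rng.normal(scale=0.2, size=30)
y_te = np.sin(x_te) + rng.normal(scale=0.2, size=200)

for deg in (1, 3, 6, 9):
    f = np.poly1d(np.polyfit(x_tr, y_tr, deg))
    print(deg, np.mean((y_tr - f(x_tr)) ** 2), np.mean((y_te - f(x_te)) ** 2))
```

The training error keeps shrinking as the degree grows, while the test error eventually increases: overfitting.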


Illustration of overfitting

[Figure: empirical risk for decreasing values of k, on the training (App), validation (Val), and test (Test) sets.]

https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/ML_basics_lecture6_overfitting.pdf


Model selection

Find in H the best function f that, learned on the training set, will generalize well (low true risk).

Example: we look for the polynomial function f_α of degree α minimizing the risk R_emp(f_α) = Σ_{i=1}^N (y_i − f_α(x_i))^2.

Goal:
1. Propose a model estimation method in order to choose (approximately) the best model belonging to H.
2. Once the model is selected, estimate its generalization error.


Model selection: basic approach

Case 1: N is really big (large-scale D_N)

[Diagram: the available data (données disponibles) split into training {X_app, Y_app}, validation {X_val, Y_val}, and test {X_test, Y_test} sets.]

1. Randomly split D_N = D_train ∪ D_val ∪ D_test
2. For each α, train f_α on D_train
3. Evaluate its performance on D_val: R_val = (1/N_val) Σ_{i ∈ D_val} L(y_i, f_α(x_i))
4. Select the model with the best performance on D_val
5. Test the selected model on D_test

Note: D_test is used once!
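A sketch of this hold-out procedure, with the polynomial degree playing the role of α (data, split sizes, and candidate degrees are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, 500)
y = np.sin(X) + rng.normal(scale=0.2, size=500)

# 1) random split D_N = D_train + D_val + D_test (60/20/20 here)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25,
                                            random_state=0)

def risk(f, x, t):
    """Empirical quadratic risk of f on the sample (x, t)."""
    return np.mean((t - f(x)) ** 2)

# 2) train f_alpha on D_train for each alpha; 3-4) select on D_val
models = {a: np.poly1d(np.polyfit(X_tr, y_tr, a)) for a in range(1, 8)}
best = min(models, key=lambda a: risk(models[a], X_val, y_val))

# 5) D_test is used once, for the selected model only
print("selected degree:", best,
      "| test risk:", risk(models[best], X_test, y_test))
```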


Model selection: cross-validation

Case 2: small or medium scale D_N

Estimate the generalization error by re-sampling.

Principle

1. Split D_N into K sets of equal size.
2. For each k = 1, …, K, train a model using the K − 1 remaining sets and evaluate it on the k-th part.
3. Average the K error estimates to obtain the cross-validation error.

[Diagram: the K folds rotate between the training (apprentissage), validation, and test roles across the rounds.]
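A sketch of K-fold cross-validation for one candidate model (the data, K = 5, and the polynomial degree are illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, 100)
y = np.sin(X) + rng.normal(scale=0.2, size=100)

errors = []
for tr, val in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    f = np.poly1d(np.polyfit(X[tr], y[tr], 3))          # train on K-1 folds
    errors.append(np.mean((y[val] - f(X[val])) ** 2))   # evaluate on fold k
print("cross-validation error:", np.mean(errors))
```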


Conclusions

To successfully carry out an automatic data processing project:

Clearly identify and spell out the needs

Create or obtain data representative of the problem

Identify the context of learning

Analyze and reduce data size

Choose an algorithm and/or a space of hypotheses

Choose a model by applying the algorithm to pre-processed data

Validate the performance of the method


In the end ...

Find your way...

