
Page 1: Lectures on Machine Learning - Lecture 1: from artificial ...benasque.org/2018tae/talks_contr/062_carrazza_lesson1.pdf · Lectures on Machine Learning Lecture 1: from arti cial intelligence

Lectures on Machine Learning

Lecture 1: from artificial intelligence to machine learning

Stefano Carrazza

TAE2018, 2-15 September 2018

European Organization for Nuclear Research (CERN)

Acknowledgement: This project has received funding from HICCUP ERC Consolidator grant (614577) and by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 740006.

N3PDF: Machine Learning • PDFs • QCD


Why lectures on machine learning?

Because:

• it is an essential set of algorithms for building models in science,

• new tools and algorithms have been developed rapidly in recent years,

• it is nowadays a requirement in experimental and theoretical physics,

• there is large interest from the HEP community: IML, conferences, grants.


What to expect from these lectures?

• Learn the basics of machine learning techniques.

• Learn when and how to apply machine learning algorithms.


The talk is divided into three lectures:

Lecture 1 (today)

• Artificial intelligence

• Machine learning

• Model representation

• Metrics

Lecture 2 (tomorrow)

• Parameter learning

• Non-linear models

• Beyond neural networks

• Clustering

Lecture 3 (tomorrow)

• Hyperparameter tuning

• Cross-validation

• ML in practice

• The PDF case study


Some references

Books:

• The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman.

• An Introduction to Statistical Learning, G. James, D. Witten, T. Hastie, R. Tibshirani.

• Deep Learning, I. Goodfellow, Y. Bengio, A. Courville.

Online resources:

• HEP-ML: https://github.com/iml-wg/HEP-ML-Resources

• Tensorflow: http://tensorflow.org

• Keras: http://keras.io

• Scikit: http://scikit-learn.org

Artificial Intelligence

Artificial intelligence timeline


Defining A.I.

Artificial intelligence (A.I.) is the science and engineering of making intelligent machines. (John McCarthy, '56)

Artificial intelligence includes:

• Machine learning
• Natural language processing
• Knowledge reasoning
• Computer vision
• Planning
• Robotics
• Speech

A.I. consists in the development of computer systems that perform tasks commonly associated with intelligence, such as learning.


A.I. and humans

There are two categories of A.I. tasks:

• abstract and formal: easy for computers but difficult for humans, e.g. playing chess (IBM's Deep Blue, 1997).
  → Knowledge-based approach to artificial intelligence.

• intuitive for humans but hard to describe formally, e.g. recognizing faces in images or spoken words.
  → Concept capture and generalization.


A.I. technologies

Historically, the knowledge-based approach has not led to major success with tasks that are intuitive for humans, because it:

• requires human supervision and hard-coded logical inference rules,
• lacks representation learning ability.

Solution:

The A.I. system needs to acquire its own knowledge.

This capability is known as machine learning (ML).

→ e.g. write a program which learns the task.


Venn diagram for A.I.

The sets are nested, from the broadest to the most specific:

• Artificial intelligence, e.g. knowledge bases
• Machine learning, e.g. logistic regression
• Representation learning, e.g. autoencoders
• Deep learning, e.g. MLPs

When representation learning is difficult, ML provides deep learning techniques which allow the computer to build complex concepts out of simpler concepts, e.g. artificial neural networks (MLP).

Machine Learning

Machine learning definition

Definition from A. Samuel in 1959:

Field of study that gives computers the ability to learn without being explicitly programmed.

Definition from T. Mitchell in 1998:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance on T, as measured by P, improves with experience E.


Machine learning examples

Thanks to work in A.I. and new capabilities of computers:

• Database mining:
  • Search engines
  • Spam filters
  • Medical and biological records

• Intuitive tasks for humans:
  • Autonomous driving
  • Natural language processing
  • Robotics (reinforcement learning)
  • Game playing (DQN algorithms)

• Human learning:
  • Concept/human recognition
  • Computer vision
  • Product recommendation


ML applications in HEP


ML in experimental HEP

There are many applications in experimental HEP involving LHC measurements, including the Higgs discovery, such as:

• Tracking

• Fast Simulation

• Particle identification

• Event filtering


Some remarkable examples are:

• Signal-background detection: decision trees, artificial neural networks, support vector machines.

• Jet discrimination: deep learning imaging techniques via convolutional neural networks.

• HEP detector simulation: generative adversarial networks, e.g. LAGAN and CaloGAN.


ML in theoretical HEP

• Supervised learning:
  • The structure of the proton at the LHC
    • parton distribution functions
  • Theoretical prediction and combination
    • Monte Carlo reweighting techniques
    • neural network Sudakov
  • BSM searches and exclusion limits

• Unsupervised learning:
  • Clustering and compression
    • PDF4LHC15 recommendation
  • Density estimation and anomaly detection
    • Monte Carlo sampling

[Figure: NNPDF3.1 (NNLO) parton distributions $x f(x, \mu^2)$ versus $x$ at $\mu^2 = 10\ \mathrm{GeV}^2$ and $\mu^2 = 10^4\ \mathrm{GeV}^2$, for the gluon (g/10) and the quark flavours; top quark rapidity $y(t)$, $\sigma$ per bin [pb], for ST, STJ and STJ★ with POWHEG BOX + PYTHIA8, with ratio panels.]


Machine learning algorithms

Machine learning algorithms:

• Supervised learning:

regression, classification, ...

• Unsupervised learning:

clustering, dim-reduction, ...

• Reinforcement learning:

real-time decisions, ...

Supervised learning: labels are known.

[Diagram with elements: Input Data, Processing, Algorithm, Output, Supervisor, Training Data Set, Desired Output.]


Unsupervised learning: labels are unknown.

[Diagram with elements: Input Data, Processing, Algorithm, Output, No Training Data Set, Unknown Output, Discover Interpretation from Features.]


Reinforcement learning:

[Diagram with elements: Input Data, Algorithm, Output, Agent, Environment, Best Action, Reward.]


Machine learning algorithms

More than 60 algorithms.

Workflow in machine learning

The operative workflow in ML is summarized by the following steps:

[Diagram with elements: Data, Model, Cost function, Optimizer, Training, Cross-validation, Best model.]

The best model is then used:

• in supervised learning, to make predictions for newly observed data.

• in unsupervised learning, to extract features from the input data.

Models and metrics

Model representation in supervised learning

We define parametric and structured models for statistical inference:

• examples: linear models, neural networks, decision trees...

[Diagram with elements: Data Set for Training, Machine Learning Algorithm, Model, Input x, Estimated Prediction.]

• Given a training set of input-output pairs $A = \{(x_1, y_1), \ldots, (x_n, y_n)\}$.

• Find a model $M$ such that:

$$M(x) \sim y$$

where $x$ is the input vector and $y$ takes discrete labels in classification and real values in regression.


Examples of models:

→ Linear regression: we define a vector $x \in \mathbb{R}^n$ as input and predict the value of a scalar $y \in \mathbb{R}$ as its output:

$$y(x) = w^T x + b$$

where $w \in \mathbb{R}^n$ is a vector of parameters and $b$ a constant.

→ Generalized linear models are also available, increasing the power of linear models.

→ Non-linear models: neural networks (discussed later).
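As a minimal sketch of the linear regression model above, the snippet below evaluates $y(x) = w^T x + b$ and estimates $w$ and $b$ by least squares with NumPy. The function names `predict` and `fit_linear`, the synthetic data and the chosen parameter values are assumptions made for illustration and do not come from the lecture.

```python
import numpy as np

def predict(X, w, b):
    """Linear model y(x) = w^T x + b, applied to each row of X."""
    return X @ w + b

def fit_linear(X, y):
    """Least-squares estimate of (w, b) for y ~ X w + b."""
    # Append a column of ones so the constant b is fitted together with w.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params[:-1], params[-1]

# Synthetic, noise-free data generated from known (hypothetical) parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true, b_true = np.array([1.0, -2.0, 0.5]), 0.3
y = predict(X, w_true, b_true)

w_fit, b_fit = fit_linear(X, y)
```

On noise-free data the fit should recover the generating parameters up to numerical precision; with noisy data it returns the least-squares estimate instead.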


Model representation trade-offs

However, the selection of the appropriate model comes with trade-offs:

• Prediction accuracy vs interpretability:

→ e.g. linear model vs splines or neural networks.

• Optimal capacity/flexibility: number of parameters, architecture

→ deal with overfitting and underfitting situations.

[Figure: accuracy versus interpretability for Neural Nets, Support Vector Machines, Random Forest, K-Nearest Neighbors, Decision Tree and Linear Regression.]


Assessing the model performance

How to check model performance?

→ define metrics and statistical estimators for model performance.

Examples:

• Regression: cost / loss / error function,

• Classification: cost function, precision, accuracy, recall, ROC, AUC


Assessing the model performance - cost function

To assess the model performance we define a cost function $J(w)$, which often measures the difference between the target and the model output.

In an optimization procedure, given a model $y_w$, we search for:

$$\arg\min_w J(w)$$

The mean square error (MSE) is the most commonly used for regression:

$$J(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_w(x_i))^2$$

a quadratic and convex function of $w$ in linear regression.
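The search for $\arg\min_w J(w)$ with the MSE can be sketched as follows for a linear model $y_w(x) = w^T x$. Plain gradient descent is used here only as one possible optimizer; the learning rate, number of steps and synthetic data are illustrative assumptions, not choices made in the lecture.

```python
import numpy as np

def mse(w, X, y):
    """Mean square error J(w) for the linear model y_w(x) = w^T x."""
    residuals = y - X @ w
    return np.mean(residuals ** 2)

def mse_gradient(w, X, y):
    """Gradient of J(w) with respect to w."""
    return (-2.0 / len(y)) * (X.T @ (y - X @ w))

def gradient_descent(X, y, learning_rate=0.1, n_steps=500):
    """Approximate argmin_w J(w) by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w = w - learning_rate * mse_gradient(w, X, y)
    return w

# Synthetic data (hypothetical): two features, small Gaussian noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
w_true = np.array([2.0, -1.0])
y = X @ w_true + rng.normal(scale=0.1, size=100)

w_hat = gradient_descent(X, y)
```

Because $J(w)$ is convex for a linear model, gradient descent with a sufficiently small learning rate approaches the same minimum as the closed-form least-squares solution.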


Other cost functions are used depending on the nature of the problem.

Some other examples:

• regression with uncertainties, chi-square:

$$J(w) = \sum_{i,j=1}^{n} (y_i - y_w(x_i)) (\sigma^{-1})_{ij} (y_j - y_w(x_j))$$

where:

• $\sigma_{ij}$ is the data covariance matrix, e.g. for LHC data, experimental statistical and systematic correlations.
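A minimal sketch of the chi-square cost with a full covariance matrix is given below. The data values and the covariance matrix are hypothetical numbers chosen only to make the example runnable; `chi2` is an illustrative helper name.

```python
import numpy as np

def chi2(y, y_model, cov):
    """Chi-square r^T cov^{-1} r, with residuals r = y - y_model."""
    r = y - y_model
    # Solve cov @ z = r rather than forming the inverse explicitly.
    return r @ np.linalg.solve(cov, r)

# Hypothetical example: three data points with correlated uncertainties.
y = np.array([1.0, 2.0, 3.0])
y_model = np.array([1.1, 1.9, 3.2])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.04, 0.01],
                [0.00, 0.01, 0.04]])

value = chi2(y, y_model, cov)
```

When the covariance matrix is diagonal, this expression reduces to the familiar sum of squared residuals divided by the variances.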

[Figure: ATLAS1JET11, R=0.4, k-factor models. NNLO/NLO ratio versus pT (GeV) in six rapidity bins (|y| = 0.2, 0.8, 1.2, 1.8, 2.2, 2.8), comparing the NN model, the k-factor and CGP.]


• logistic regression (binary classification): cross-entropy

$$J(w) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log y_w(x_i) + (1 - y_i) \log(1 - y_w(x_i)) \right]$$

where $y_w(x_i) = 1/(1 + e^{-w^T x_i})$.
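The cross-entropy above can be evaluated directly, as in the following sketch. The clipping constant `eps`, which guards against `log(0)`, and the four-point toy data set are assumptions added for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, y, eps=1e-12):
    """Binary cross-entropy J(w) for the model y_w(x) = sigmoid(w^T x)."""
    p = sigmoid(X @ w)
    # Clip the probabilities to avoid log(0) for saturated predictions.
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Toy data (hypothetical): positive x is labelled 1, negative x is labelled 0.
X = np.array([[2.0], [1.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])

J_zero = cross_entropy(np.zeros(1), X, y)      # all predictions equal 0.5
J_good = cross_entropy(np.array([3.0]), X, y)  # weights aligned with the labels
```

With $w = 0$ every prediction is 0.5 and the cost equals $\log 2$; weights that separate the two classes give a smaller value.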


• density estimate / regression: negative log-likelihood:

$$J(w) = -\sum_{i=1}^{n} \log(y_w(x_i))$$

[Figure: Gaussian mixture pdf compared with an RTBM model and with sampling ($N_s = 10^5$): P versus v in one dimension, and the (v1, v2) plane with the marginals P(v1) and P(v2).]

• Kullback-Leibler, RMSE, MAE, etc.
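To illustrate the negative log-likelihood as a cost function for density estimation, the sketch below uses a single Gaussian as the model density $y_w(x)$, with parameters $w = (\mu, \sigma)$. This is a deliberately simpler model than the RTBM shown in the lecture figure, and the sample and its generating parameters are assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Gaussian density, used here as the model y_w(x) with w = (mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def negative_log_likelihood(x, mu, sigma):
    """J(w) = -sum_i log y_w(x_i)."""
    return -np.sum(np.log(gaussian_pdf(x, mu, sigma)))

# Hypothetical sample drawn from a Gaussian with mean 1.5 and width 0.7.
rng = np.random.default_rng(2)
samples = rng.normal(loc=1.5, scale=0.7, size=1000)

# For a Gaussian model the NLL is minimized by the sample mean and the
# (biased) sample standard deviation.
mu_hat, sigma_hat = samples.mean(), samples.std()
nll_best = negative_log_likelihood(samples, mu_hat, sigma_hat)
```

Any other choice of $(\mu, \sigma)$, including the generating values, gives an NLL at least as large on this sample.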


Training and test sets

Another common issue related to model capacity in supervised learning:

• The model should not learn noise from data.

• The model should be able to generalize its output to new samples.

To observe this issue we split the input data into training and test sets:

• training set error, $J_{Tr}(w)$

• test set/generalization error, $J_{Test}(w)$

[Diagram: the total number of examples is divided into a Training Set and a Test Set.]


The test set is independent from the training set but follows the same probability distribution.

[Diagram with elements: Training Set, Model building, Permanent model; Test Set, Prediction, Estimate performance.]
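The split into training and test sets can be sketched as follows: the model is built on the training set only, and the test set is used to estimate the generalization error. The 80/20 split, the helper name `train_test_split` (written by hand here, not a library call) and the synthetic data are assumptions for illustration.

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Randomly split the examples into a training and a test set."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

def mse(w, X, y):
    """Mean square error of the linear model y_w(x) = w^T x."""
    return np.mean((y - X @ w) ** 2)

# Synthetic data (hypothetical): linear signal plus Gaussian noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(scale=0.2, size=200)

X_tr, y_tr, X_te, y_te = train_test_split(X, y)

# Model building uses the training set only.
w_fit, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

J_train = mse(w_fit, X_tr, y_tr)  # training set error
J_test = mse(w_fit, X_te, y_te)   # estimate of the generalization error
```

Because both sets are drawn from the same distribution, a large gap between `J_test` and `J_train` would signal that the model has learned noise from the training data.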


Bias-variance trade-off

From a practical point of view, the input data is divided into training and test sets.

The conflict between the training error and the test/generalization error is known as the bias-variance trade-off.


Suppose we have a model $\hat{y}(x)$ determined from a training data set, and consider as the true model

$$Y = y(X) + \epsilon, \quad \text{with } y(x) = E(Y | X = x),$$

where the noise $\epsilon$ has zero mean and constant variance.

If we take $(x_0, y_0)$ from the test set then:

$$E[(y_0 - \hat{y}(x_0))^2] = (\mathrm{Bias}[\hat{y}(x_0)])^2 + \mathrm{Var}[\hat{y}(x_0)] + \mathrm{Var}(\epsilon),$$

where

• $\mathrm{Bias}[\hat{y}(x_0)] = E[\hat{y}(x_0)] - y(x_0)$
• $\mathrm{Var}[\hat{y}(x_0)] = E[\hat{y}(x_0)^2] - (E[\hat{y}(x_0)])^2$

So, the expectation averages over the variability of $y_0$ (bias) and the variability in the training data.
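The bias and variance terms can be estimated numerically by refitting a model on many independent training sets and looking at the spread of its prediction at a fixed point $x_0$. The sketch below does this for polynomial fits of different degree. The true function $\sin(2\pi x)$, the noise level, the number of trials and the helper name `bias_variance_at_point` are all assumptions chosen for illustration.

```python
import numpy as np

def true_function(x):
    """Assumed true model y(x), used only for this illustration."""
    return np.sin(2.0 * np.pi * x)

def bias_variance_at_point(degree, x0=0.25, n_train=30, sigma=0.3,
                           n_trials=500, seed=0):
    """Estimate the squared bias and the variance of a polynomial fit at x0."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_train)
    predictions = np.empty(n_trials)
    for k in range(n_trials):
        # Each trial draws a new noisy training set from the same true model.
        y = true_function(x) + rng.normal(0.0, sigma, size=n_train)
        coefficients = np.polyfit(x, y, degree)
        predictions[k] = np.polyval(coefficients, x0)
    bias = predictions.mean() - true_function(x0)
    variance = predictions.var()
    return bias ** 2, variance

bias2_linear, var_linear = bias_variance_at_point(degree=1)
bias2_quintic, var_quintic = bias_variance_at_point(degree=5)
```

In this setup the straight line is too rigid to follow the sine (large bias, small variance), while the degree-5 polynomial follows it closely but reacts more strongly to the noise in each training set (small bias, larger variance).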


If $\hat{y}$ increases in flexibility, its variance increases and its bias decreases.

Choosing the flexibility based on the average test error amounts to a bias-variance trade-off:

• High Bias → underfitting: erroneous assumptions in the learning algorithm.

• High Variance → overfitting: erroneous sensitivity to small fluctuations (noise) in the training set.


More examples of bias-variance trade-off:


Regularization techniques can be applied to modify the learning algorithm and reduce its generalization error but not its training error.

For example, adding weight decay to the MSE cost function:

$$J(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_w(x_i))^2 + \lambda w^T w,$$

where $\lambda$ is a real number which expresses the preference for weights with smaller squared $L^2$ norm.
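For a linear model $y_w(x) = w^T x$, the MSE cost with weight decay has a closed-form minimum, which the sketch below uses to show the effect of $\lambda$. The helper names `ridge_fit` and `cost`, the two values of $\lambda$ and the synthetic data are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/n) * ||y - X w||^2 + lam * w^T w in closed form."""
    n, d = X.shape
    # Setting the gradient to zero gives (X^T X + n * lam * I) w = X^T y.
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

def cost(w, X, y, lam):
    """MSE cost function with the weight decay term."""
    return np.mean((y - X @ w) ** 2) + lam * (w @ w)

# Synthetic data (hypothetical): 40 examples, 5 features.
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=40)

w_weak = ridge_fit(X, y, lam=0.01)     # weak regularization
w_strong = ridge_fit(X, y, lam=10.0)   # strong regularization
```

A larger $\lambda$ pulls the weights towards zero, which increases the training error but can reduce the sensitivity to noise in the training set.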


Solution for the bias-variance trade off

By tuning the hyperparameter $\lambda$ we can regularize a model without explicitly modifying its capacity.

36

Page 61: Lectures on Machine Learning - Lecture 1: from artificial ...benasque.org/2018tae/talks_contr/062_carrazza_lesson1.pdf · Lectures on Machine Learning Lecture 1: from arti cial intelligence

Solution for the bias-variance trade-off

A common way to manage the bias-variance trade-off and choose suitable learning hyperparameters is to create a validation set that is:

• not used by the training algorithm,

• not used as the test set.

[Diagram: the total number of examples is split into a training set, a validation set and a test set.]

• Training set: examples used for learning.

• Validation set: examples used to tune the hyperparameters.

• Test set: examples used only to assess the performance.

Techniques are available to deal with data samples with either a large or a small number of examples (discussed later).
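The three-way split can be sketched as follows. The 60/20/20 proportions, the synthetic data, the linear model with weight decay and the grid of λ values are all assumptions made for this example, not choices stated in the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic regression data, for illustration only.
n = 300
X = rng.normal(size=(n, 10))
w_true = rng.normal(size=10)
y = X @ w_true + rng.normal(scale=0.5, size=n)

# Shuffle once, then cut into disjoint training / validation / test sets.
idx = rng.permutation(n)
train, val, test = idx[:180], idx[180:240], idx[240:]

def fit(X, y, lam):
    # Linear least squares with weight decay (closed form).
    m, d = X.shape
    return np.linalg.solve(X.T @ X + m * lam * np.eye(d), X.T @ y)

def mse(X, y, w):
    return float(np.mean((y - X @ w) ** 2))

# Hyperparameter tuning uses the validation set only.
lambdas = [0.0, 1e-3, 1e-2, 1e-1, 1.0]
val_errors = [mse(X[val], y[val], fit(X[train], y[train], lam)) for lam in lambdas]
best_lam = lambdas[int(np.argmin(val_errors))]

# The test set is touched once, after the hyperparameter is fixed.
w_best = fit(X[train], y[train], best_lam)
test_error = mse(X[test], y[test], w_best)
print("best lambda:", best_lam, "test MSE:", test_error)
```

Keeping the test set out of the tuning loop is what makes the final test error an unbiased estimate of the generalization error.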


Assessing model performance for classification

In binary classification tasks we usually complement the cost function with the accuracy metric, defined as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$

Example:

True Positives (TP): e.g. 8

False Positives (FP): e.g. 2

True Negatives (TN): e.g. 20

False Negatives (FN): e.g. 4

• Accuracy = 82%

However, accuracy does not represent the overall situation for skewed classes, i.e. imbalanced data sets with a large disparity between classes, e.g. signal and background.

In these cases we define precision and recall.
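The accuracy of the example, and the reason it can mislead for skewed classes, can be checked with a few lines. The second set of counts (990 background events, 10 signal events, a classifier that always answers "background") is a hypothetical case added for illustration.

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of all examples that are classified correctly.
    return (tp + tn) / (tp + tn + fp + fn)

# Counts from the example above: (8 + 20) / 34.
acc_example = accuracy(tp=8, tn=20, fp=2, fn=4)

# Hypothetical skewed data set: the classifier never predicts "signal",
# yet it still reaches a high accuracy.
acc_skewed = accuracy(tp=0, tn=990, fp=0, fn=10)

print(f"example accuracy: {acc_example:.0%}")
print(f"always-background accuracy: {acc_skewed:.0%}")
```

The always-background classifier finds no signal at all, which the accuracy alone does not reveal.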


Assessing model performance for classification

Precision: proportion of positive identifications that are actually correct.

Recall: proportion of actual positives that are correctly identified.

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

True Positives (TP): e.g. 8

False Positives (FP): e.g. 2

True Negatives (TN): e.g. 20

False Negatives (FN): e.g. 4

• Accuracy = 82%

• Precision = 80%

• Recall = 67%

Various metrics have been developed that rely on both precision and recall, e.g. the F1 score:

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = 73\%$$
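The quoted percentages can be reproduced directly from the four counts. This is a plain-Python sketch with illustrative function names.

```python
def precision(tp, fp):
    # Of everything predicted positive, how much is truly positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything truly positive, how much was found.
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

tp, fp, tn, fn = 8, 2, 20, 4
p = precision(tp, fp)   # 8 / 10
r = recall(tp, fn)      # 8 / 12
f1 = f1_score(p, r)     # equivalently 2*TP / (2*TP + FP + FN) = 16 / 22

print(f"precision = {p:.0%}, recall = {r:.0%}, F1 = {f1:.0%}")
```

Note that TN does not enter precision, recall or F1, which is why these metrics are more informative than accuracy when the negative class dominates.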


Assessing model performance for classification

In binary classification we can vary the probability threshold and define:

• the receiver operating characteristic curve (ROC curve), a metric which shows the relationship between the correctly classified positive cases, the true positive rate (TPR, i.e. recall), and the incorrectly classified negative cases, the false positive rate (FPR, i.e. 1 − specificity).

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
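A minimal sketch of how the ROC points arise from a threshold scan is given below. The labels, scores and thresholds are made-up toy values, and the function name is hypothetical. An example is predicted positive when its score is at or above the threshold.

```python
import numpy as np

def roc_points(labels, scores, thresholds):
    # Returns the (FPR, TPR) pair for each threshold.
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    fpr, tpr = [], []
    for t in thresholds:
        predicted = scores >= t
        tp = np.sum(predicted & labels)
        fn = np.sum(~predicted & labels)
        fp = np.sum(predicted & ~labels)
        tn = np.sum(~predicted & ~labels)
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return np.array(fpr), np.array(tpr)

# Toy classifier output: 1 = positive (signal), 0 = negative (background).
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.01]

fpr, tpr = roc_points(labels, scores, thresholds)
for t, x, y in zip(thresholds, fpr, tpr):
    print(f"threshold {t:4.2f}: FPR = {x:.2f}, TPR = {y:.2f}")
```

Raising the threshold moves along the curve from (FPR, TPR) = (1, 1), where everything is accepted, to (0, 0), where everything is rejected.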


Assessing model performance for classification

The area under the ROC curve (AUC) represents the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.

AUC provides an aggregate measure of performance across all possible

classification thresholds.

• AUC is 0 if all predictions are wrong.

• AUC is 1 if all predictions are correct.

• AUC is scale-invariant and classification-threshold-invariant.
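The ranking interpretation of the AUC can be turned into code directly: count the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below uses made-up toy scores and counts ties as one half, which is a common convention assumed here rather than something stated in the lecture.

```python
import numpy as np

def auc_pairwise(labels, scores):
    # Probability that a random positive is ranked above a random negative.
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all positive-negative pairs
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])

auc = auc_pairwise(labels, scores)
auc_rescaled = auc_pairwise(labels, 10.0 * scores + 3.0)  # same ranking
auc_perfect = auc_pairwise([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
auc_inverted = auc_pairwise([1, 1, 0, 0], [0.1, 0.2, 0.8, 0.9])

print(auc, auc_rescaled, auc_perfect, auc_inverted)
```

Only the ordering of the scores enters the computation, which illustrates the scale invariance, and no threshold appears anywhere, which illustrates the threshold invariance.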


Summary

We have covered the following topics:

• Motivation and overview of A.I.

• Definition and overview of ML.

• Model representation definition and trade-offs

• Learning metrics for assessing the model performance.

• Metrics for classification.
