[fdd 2017] pablo ribalta - machine learning done right

Upload: future-processing

Post on 22-Jan-2018

Page 1: [FDD 2017] Pablo Ribalta - Machine learning done right

Machine learning done right

An approach to successfully building AI products

Pablo Ribalta Lorenzo

R&D Lead Engineer

[email protected]

Building a Machine Learning product

Machine learning done right: An approach to building successful products

Choosing your metric

Building your dataset

Tuning your parameters

Comparing your results

Choosing your metric

Manual MRI segmentation

[Figure: MRI scan; doctor’s prediction]

Automatic MRI segmentation

[Figure: MRI scan; ground truth; training; ML-system prediction]

ML-system prediction vs ground truth

Approach #0: Pixelwise comparison
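The slide names pixelwise comparison without further detail. A minimal sketch, assuming the prediction and the ground truth are binary masks stored as NumPy arrays of the same shape, could look like this. The function name and the toy 100x100 scan are hypothetical and not from the talk; the example illustrates one commonly cited limitation of the approach when the lesion is small.

```python
import numpy as np


def pixelwise_accuracy(prediction, ground_truth):
    """Fraction of pixels on which two binary masks agree."""
    prediction = np.asarray(prediction, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    if prediction.shape != ground_truth.shape:
        raise ValueError("masks must have the same shape")
    return float(np.mean(prediction == ground_truth))


# Hypothetical 100x100 scan in which the lesion covers a 5x5 patch.
ground_truth = np.zeros((100, 100), dtype=bool)
ground_truth[10:15, 10:15] = True

# A prediction that marks nothing at all still agrees on 9975 of 10000 pixels.
empty_prediction = np.zeros_like(ground_truth)
accuracy_of_empty = pixelwise_accuracy(empty_prediction, ground_truth)
print(accuracy_of_empty)
```

Because background pixels dominate, a model that never finds the lesion scores close to 1 here, which is one reason to look at the confusion matrix instead.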

Approach #1: Exploiting confusion matrices

• True positives

• False positives

• True negatives

• False negatives

[Figure: relevant elements vs selected elements]

$$\text{Precision} = \frac{TP}{TP + FP} \in [0, 1]$$

What is our tendency to oversegment?

$$\text{Recall} = \frac{TP}{TP + FN} \in [0, 1]$$

What is our tendency to miss items?
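The precision and recall definitions above can be sketched for segmentation masks as follows. This is an illustrative example, not code from the talk: the function names and the 4x4 toy masks are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import numpy as np


def confusion_counts(prediction, ground_truth):
    """Return (TP, FP, FN, TN) pixel counts for two binary masks."""
    prediction = np.asarray(prediction, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    tp = int(np.sum(prediction & ground_truth))
    fp = int(np.sum(prediction & ~ground_truth))
    fn = int(np.sum(~prediction & ground_truth))
    tn = int(np.sum(~prediction & ~ground_truth))
    return tp, fp, fn, tn


def precision_recall(prediction, ground_truth):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(prediction, ground_truth)
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical masks: the lesion is a 2x2 block, the model marks three pixels.
ground_truth = np.array([[1, 1, 0, 0],
                         [1, 1, 0, 0],
                         [0, 0, 0, 0],
                         [0, 0, 0, 0]])
prediction = np.array([[1, 1, 1, 0],
                       [0, 0, 0, 0],
                       [0, 0, 0, 0],
                       [0, 0, 0, 0]])

counts = confusion_counts(prediction, ground_truth)
precision, recall = precision_recall(prediction, ground_truth)
print(counts, precision, recall)
```

In this toy case one of the three marked pixels is outside the lesion (lower precision, oversegmentation) and two lesion pixels are missed (lower recall).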

Ultimate goal: Single metric

$$F_1\ \text{score} = 2 \cdot \frac{1}{\frac{1}{\text{recall}} + \frac{1}{\text{precision}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \in [0, 1]$$
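A short sketch of the F1 score as the harmonic mean of precision and recall. The function name and the example values are hypothetical, and returning 0.0 when both inputs are zero is an assumed convention. The example shows how the harmonic mean penalises an imbalance that an arithmetic mean would hide.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 * precision * recall / (precision + recall)


# Hypothetical model that rarely oversegments but misses most of the lesion.
precision, recall = 0.9, 0.1
f1 = f1_score(precision, recall)
arithmetic_mean = (precision + recall) / 2
print(f1, arithmetic_mean)
```

Here the arithmetic mean is about 0.5 while F1 is about 0.18, so a single F1 value stays low unless both precision and recall are reasonable.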

Choosing your metric: Summary

• Like business requirements, choosing a good metric comes as a result of understanding the needs and expectations of the model’s users

• A model can be excellent in one metric, but very poor in others

• Train using the metric you plan on judging the model with

Building your dataset

How much data can we collect?

Dealing with data scarcity

Medical records: only a few patients

Secret sauce: Data augmentation

[Figure: augmentations applied to the original and deformed images: rotation (0°, 45°, 90°), horizontal flip (yes), vertical flip (yes)]
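A minimal sketch of the rotation and flip augmentations, assuming 2D NumPy arrays for the scan and its segmentation mask. The `augment` helper and the toy arrays are hypothetical and not from the talk. Only the 0° and 90° rotations are shown: a 45° rotation needs interpolation (for example with `scipy.ndimage.rotate`), and the deformation step is left out to keep the sketch dependency-free.

```python
import numpy as np


def augment(image, mask):
    """Yield (image, mask) pairs: rotations of 0 and 90 degrees, each
    unflipped, horizontally flipped, and vertically flipped."""
    for quarter_turns in (0, 1):
        rot_image = np.rot90(image, k=quarter_turns)
        rot_mask = np.rot90(mask, k=quarter_turns)
        yield rot_image, rot_mask
        yield np.fliplr(rot_image), np.fliplr(rot_mask)
        yield np.flipud(rot_image), np.flipud(rot_mask)


# Hypothetical 3x4 "scan" and a mask derived from it.
image = np.arange(12).reshape(3, 4)
mask = image > 6

augmented = list(augment(image, mask))
print(len(augmented))
```

The same transform is applied to the image and to its mask, so each augmented pair stays aligned; this matters for segmentation, where the label is itself an image.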

Building your dataset: Summary

• Many approaches to augmenting data

• We must ensure that our dataset is balanced and correctly describes the data’s statistical distribution

• Although not covered in this talk, splitting a dataset into training, validation and test sets is fundamental for correct training and evaluation of the results
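The training, validation and test split in the last bullet can be sketched as a shuffled index partition. The function name, the 70/15/15 proportions and the fixed seed are assumptions for illustration; the talk does not specify them.

```python
import numpy as np


def split_indices(n_samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle sample indices and partition them into train/val/test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    n_val = int(round(n_samples * val_fraction))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

When augmentation is used, a common precaution is to split first and augment only the training portion, so that variants of one original image do not end up on both sides of the split.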

Tuning your model

Hyperparameter optimisation

Automatic hyper-parameter selection: Particle Swarm Optimization
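The slides do not describe the algorithm itself, so the following is only a textbook-style sketch of global-best particle swarm optimisation, not the method from the authors' GECCO papers. Everything in it is an assumption for illustration: the function names, the swarm settings (inertia 0.7, cognitive and social weights 1.5), and the toy objective, which stands in for a real validation loss over two hypothetical hyper-parameters (log10 of the learning rate and the dropout rate).

```python
import numpy as np


def particle_swarm_minimise(objective, lower, upper, n_particles=20,
                            n_iterations=100, inertia=0.7,
                            cognitive=1.5, social=1.5, seed=0):
    """Minimise `objective` inside the box [lower, upper] with global-best PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    positions = rng.uniform(lower, upper, size=(n_particles, dim))
    velocities = np.zeros((n_particles, dim))
    personal_best = positions.copy()
    personal_best_loss = np.array([objective(p) for p in positions])
    best = int(np.argmin(personal_best_loss))
    global_best = personal_best[best].copy()
    global_best_loss = float(personal_best_loss[best])

    for _ in range(n_iterations):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Each particle is pulled towards its own best and the swarm's best.
        velocities = (inertia * velocities
                      + cognitive * r1 * (personal_best - positions)
                      + social * r2 * (global_best - positions))
        positions = np.clip(positions + velocities, lower, upper)

        losses = np.array([objective(p) for p in positions])
        improved = losses < personal_best_loss
        personal_best[improved] = positions[improved]
        personal_best_loss[improved] = losses[improved]

        best = int(np.argmin(personal_best_loss))
        if personal_best_loss[best] < global_best_loss:
            global_best = personal_best[best].copy()
            global_best_loss = float(personal_best_loss[best])

    return global_best, global_best_loss


def toy_validation_loss(params):
    """Stand-in for training a model and returning its validation loss."""
    log10_learning_rate, dropout = params
    return (log10_learning_rate + 3.0) ** 2 + (dropout - 0.5) ** 2


best_params, best_loss = particle_swarm_minimise(
    toy_validation_loss, lower=[-6.0, 0.0], upper=[0.0, 1.0])
print(best_params, best_loss)
```

In a real setting each call to the objective would train and validate a network, so the number of particles and iterations is bounded by the compute budget rather than chosen freely as here.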

• Pablo Ribalta Lorenzo, Jakub Nalepa, Michal Kawulok, Luciano Sanchez Ramos, and José Ranilla Pastor. 2017. Particle swarm optimization for hyper-parameter selection in deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17). ACM, New York, NY, USA, 481-488.

• Pablo Ribalta Lorenzo, Jakub Nalepa, Luciano Sanchez Ramos, and José Ranilla Pastor. 2017. Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '17). ACM, New York, NY, USA, 1864-1871.

When possible, go automatic

Tuning your model: Summary

• Hyper-parameter optimization is probably the most time-consuming aspect of building a Machine Learning product

• We need to be confident that our selected settings will translate well to the majority of cases

• Use automatic approaches when possible

Comparing your results

Surpassing human performance in medical classification

• Typical human performance: 3% error

• Typical doctor performance: 1% error

• Experienced doctor performance: 0.7% error

• Team of experienced doctors performance: 0.5% error

What is human performance?

[Figure: four panels labelled F1 score = 0.817, F1 score = 0.845, F1 score = 0.545 and F1 score = 0.801]

Comparing with the state of the art

• Superpixel segmentation algorithm

3x State of the art performance for single stage lesions

2x State of the art performance for multiple stage lesions

Comparing your results: Summary

• Comparing with human performance is hard and, most of the time, can be misleading

• We must strive to achieve statistically significant results across different subsets of our data

• Comparing with the state of the art is always a good idea, but we must ensure a fair comparison
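One way to check for statistical significance across data subsets is a paired test on per-subset scores. The talk does not say which test was used; the sketch below swaps in an exact paired sign-flip permutation test as an assumption. The function name is hypothetical, and the F1 values are made-up numbers for illustration, not results from the talk.

```python
from itertools import product

import numpy as np


def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided paired permutation test on the mean score difference.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so every sign pattern is enumerated.
    """
    differences = (np.asarray(scores_a, dtype=float)
                   - np.asarray(scores_b, dtype=float))
    observed = abs(differences.mean())
    n_extreme = 0
    n_total = 0
    for signs in product((1.0, -1.0), repeat=differences.size):
        permuted = abs((differences * np.array(signs)).mean())
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Made-up per-subset F1 scores for a model and a baseline (8 subsets).
model_f1 = [0.82, 0.85, 0.79, 0.88, 0.81, 0.84, 0.80, 0.86]
baseline_f1 = [0.78, 0.80, 0.77, 0.81, 0.79, 0.80, 0.75, 0.83]

p_value = paired_sign_flip_p_value(model_f1, baseline_f1)
print(p_value)
```

Pairing the scores by subset keeps the comparison fair in the sense of the last bullet: both methods are judged on exactly the same data, and the enumeration stays cheap only for a small number of subsets (2^n sign patterns).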

About us

ECONIB in numbers

• 18 months ongoing

• 8 publications

• Featured in social media

• Healthcare and research partnership

• NVIDIA Inception member

• Still more research in progress

Conclusions

• Building ML products is possible with a rigorous scientific approach

• Maximising the performance of our model is a nuanced process that requires a thorough understanding of the problem and the theory behind it

• It is not only about the model, but also about what’s around it

Machine learning done right

An approach to building successful ML projects

[email protected]

www.future-processing.pl