[fdd 2017] pablo ribalta - machine learning done right

Post on 22-Jan-2018

TRANSCRIPT

Machine learning done right

An approach to successfully building AI products

Pablo Ribalta Lorenzo

R&D Lead Engineer

pribalta@future-processing.com

Building a Machine Learning product

Machine learning done right: An approach to building successful products

Choosing your metric

Building your dataset

Tuning your parameters

Comparing your results

Choosing your metric

[Figure: Manual MRI segmentation. MRI scan, then the doctor's prediction]

[Figure: Automatic MRI segmentation. MRI scan, then "?"; the ground truth is used for training, and the ML-system prediction is compared against the ground truth (ML-system prediction vs ground truth)]

Approach #0: Pixelwise comparison
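A minimal sketch of a pixelwise comparison, assuming binary masks stored as nested Python lists of the same shape (1 = lesion, 0 = background); the function name `pixel_accuracy` and the 10x10 example are hypothetical. It also hints at a weakness of this approach: when the lesion covers only a few pixels, a prediction that misses it entirely still scores highly.

```python
def pixel_accuracy(prediction, ground_truth):
    """Fraction of pixels whose predicted label equals the ground-truth label.

    Assumes both masks have the same shape (zip would silently truncate otherwise).
    """
    total = 0
    matches = 0
    for pred_row, true_row in zip(prediction, ground_truth):
        for p, t in zip(pred_row, true_row):
            total += 1
            matches += int(p == t)
    return matches / total


# Hypothetical 10x10 scan with a 2x2 lesion (4 of 100 pixels).
ground_truth = [[0] * 10 for _ in range(10)]
for r in (4, 5):
    for c in (4, 5):
        ground_truth[r][c] = 1

# A "model" that predicts background everywhere misses the lesion completely.
all_background = [[0] * 10 for _ in range(10)]

print(pixel_accuracy(all_background, ground_truth))  # 0.96
```

A 96% pixelwise score for a segmentation that finds no lesion at all is one reason to look beyond raw pixel agreement.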

Approach #1: Exploiting confusion matrices

[Figure: confusion matrix with true positives, false positives, true negatives and false negatives]
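The four cells of the confusion matrix can be counted directly from two binary masks. This is a minimal sketch, assuming masks as nested lists of the same shape with 1 marking lesion pixels; `confusion_counts` and the example masks are hypothetical.

```python
def confusion_counts(prediction, ground_truth):
    """Count (TP, FP, TN, FN) for two binary masks of the same shape (1 = lesion)."""
    tp = fp = tn = fn = 0
    for pred_row, true_row in zip(prediction, ground_truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # lesion pixel correctly predicted as lesion
            elif p and not t:
                fp += 1      # background pixel wrongly predicted as lesion
            elif not p and t:
                fn += 1      # lesion pixel that the model missed
            else:
                tn += 1      # background pixel correctly predicted as background
    return tp, fp, tn, fn


ground_truth = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
prediction = [[0, 1, 1, 1],
              [0, 0, 1, 0]]

print(confusion_counts(prediction, ground_truth))  # (3, 1, 3, 1)
```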

[Figure: relevant elements vs. selected elements]

$$\text{Precision} = \frac{TP}{TP + FP} \in [0, 1]$$

What is our tendency to oversegment?

$$\text{Recall} = \frac{TP}{TP + FN} \in [0, 1]$$

What is our tendency to miss items?
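The two formulas translate directly into code. In this sketch, returning 0.0 when a denominator is zero (no predicted or no relevant pixels) is an assumed convention, not something the slides specify; the counts in the example are hypothetical.

```python
def precision(tp, fp):
    """TP / (TP + FP). A low value signals a tendency to oversegment."""
    # Assumed convention: 0.0 when nothing was predicted as positive.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """TP / (TP + FN). A low value signals a tendency to miss items."""
    # Assumed convention: 0.0 when the ground truth contains no positives.
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical counts: 30 lesion pixels found, 10 false alarms, 20 missed.
print(precision(30, 10))  # 0.75
print(recall(30, 20))     # 0.6
```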

Ultimate goal: Single metric

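$$F_1\ \text{score} = 2 \cdot \frac{1}{\frac{1}{\text{recall}} + \frac{1}{\text{precision}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \in [0, 1]$$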
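Both forms of the F1 score give the same value, since 2 / (1/R + 1/P) = 2PR / (P + R). The sketch below checks this and shows why the harmonic mean is a useful single metric: a model with perfect precision but 10% recall gets an F1 score of about 0.18, whereas the arithmetic mean would report 0.55. The function names and the zero-handling convention are assumptions for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * precision * recall / (precision + recall)


def f1_via_reciprocals(precision, recall):
    """Equivalent form 2 / (1/R + 1/P); requires both values to be non-zero."""
    return 2 / (1 / recall + 1 / precision)


print(round(f1_score(0.75, 0.6), 4))            # 0.6667
print(round(f1_via_reciprocals(0.75, 0.6), 4))  # 0.6667

# Perfect precision, poor recall: the harmonic mean stays low.
print(round(f1_score(1.0, 0.1), 4))             # 0.1818
```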

Choosing your metric: Summary

• Like business requirements, choosing a good metric comes as a result of understanding the needs and expectations of the model’s users

• A model can be excellent in one metric, but very poor in others

• Train using the metric you plan on judging the model with
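For binary segmentation the F1 score coincides with the Dice coefficient, 2·TP / (2·TP + FP + FN). One common way to train with the metric you judge the model by is a differentiable "soft Dice" loss that replaces hard counts with predicted probabilities. This plain-Python sketch is an assumption about how that could look, not the method from the talk; in practice it would be written in a framework with automatic differentiation, and the `eps` smoothing constant is a conventional, assumed choice. The example masks are hypothetical.

```python
def soft_dice_loss(probabilities, ground_truth, eps=1e-6):
    """Return 1 - soft Dice (a soft F1 score) for one image.

    probabilities: predicted lesion probabilities in [0, 1], same shape as ground_truth.
    ground_truth: binary mask (1 = lesion). With hard 0/1 predictions the Dice
    coefficient 2*TP / (2*TP + FP + FN) is exactly the F1 score.
    """
    soft_tp = 0.0      # sum of p * t: a soft count of true positives
    soft_total = 0.0   # sum of p + t: equals 2*TP + FP + FN for hard predictions
    for p_row, t_row in zip(probabilities, ground_truth):
        for p, t in zip(p_row, t_row):
            soft_tp += p * t
            soft_total += p + t
    # eps (assumed smoothing constant) keeps the ratio defined for empty masks.
    dice = (2.0 * soft_tp + eps) / (soft_total + eps)
    return 1.0 - dice


ground_truth = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
hard_prediction = [[0, 1, 1, 1],
                   [0, 0, 1, 0]]   # TP = 3, FP = 1, FN = 1, so F1 = 0.75
soft_prediction = [[0.1, 0.9, 0.8, 0.6],
                   [0.0, 0.4, 0.9, 0.2]]

print(round(1 - soft_dice_loss(hard_prediction, ground_truth), 4))  # 0.75
print(round(soft_dice_loss(soft_prediction, ground_truth), 4))
```

Minimising this loss pushes the model directly towards a higher F1 score, instead of towards a proxy such as per-pixel accuracy.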

Building your dataset

How much data can we collect?

Dealing with data scarcity

Medical records: only a few patients

Secret sauce: Data augmentation

[Figure: original and deformed images; transformations shown: rotation (0°, 45°, 90°), horizontal flip, vertical flip]
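A minimal sketch of the lossless geometric augmentations from the slide, assuming images and masks as nested Python lists; all function names are hypothetical. For segmentation, the key point is that the same transform must be applied to the scan and to its mask so that labels stay aligned with pixels. Arbitrary angles such as 45° need interpolation and are omitted here; an interpolating routine such as `scipy.ndimage.rotate` could be used, with nearest-neighbour interpolation (`order=0`) for the mask.

```python
def identity(grid):
    """Return an unchanged copy of a 2-D grid."""
    return [list(row) for row in grid]


def rotate_90(grid):
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid):
    """Mirror a 2-D grid left to right."""
    return [list(row[::-1]) for row in grid]


def flip_vertical(grid):
    """Mirror a 2-D grid top to bottom."""
    return [list(row) for row in grid[::-1]]


def augment_pair(image, mask):
    """Return (image, mask) pairs where each transform is applied to both.

    Applying the same geometric transform to the scan and its segmentation keeps
    every lesion pixel aligned with its label.
    """
    transforms = [identity, rotate_90, flip_horizontal, flip_vertical]
    return [(t(image), t(mask)) for t in transforms]


image = [[1, 2],
         [3, 4]]
mask = [[0, 1],
        [0, 0]]     # the lesion is the pixel with intensity 2
for augmented_image, augmented_mask in augment_pair(image, mask):
    print(augmented_image, augmented_mask)
```

Each original example yields four training examples here; combining transforms (for instance a rotation followed by a flip) multiplies the count further.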

Building your dataset: Summary

• There are many approaches to augmenting data

• We must ensure that our dataset is balanced and correctly describes the data’s statistical distribution

• Although not covered here, splitting the dataset into training, validation and test sets is fundamental for correct training and evaluation of the results
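The split and the balance requirement can be combined in a stratified split, which keeps class proportions roughly equal in every subset. This is a sketch under assumptions: one label per sample, a 70/15/15 split and a fixed seed are all illustrative choices, and `stratified_split` is a hypothetical name. With medical images, one would typically split patient identifiers rather than individual slices, so that no patient appears in more than one subset.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split sample indices into train/validation/test lists, per class.

    labels[i] is the class of sample i; every class is split with the same
    fractions, so class proportions are (approximately) preserved.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]   # the remainder goes to the test set
    return train, val, test


# Hypothetical imbalanced dataset: 80 healthy (0) and 20 lesion (1) samples.
labels = [0] * 80 + [1] * 20
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))   # 70 15 15
print(sum(labels[i] for i in test))      # 3 lesion samples in the test set
```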

Tuning your model

Hyperparameter optimisation

Automatic hyper-parameter selection: Particle Swarm Optimization

• Pablo Ribalta Lorenzo, Jakub Nalepa, Michal Kawulok, Luciano Sanchez Ramos, and José Ranilla Pastor. 2017. Particle swarm optimization for hyper-parameter selection in deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17). ACM, New York, NY, USA, 481-488.

• Pablo Ribalta Lorenzo, Jakub Nalepa, Luciano Sanchez Ramos, and José Ranilla Pastor. 2017. Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '17). ACM, New York, NY, USA, 1864-1871.

When possible, go automatic
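To give a feel for how particle swarm optimization searches a hyper-parameter space, here is a textbook global-best PSO in plain Python. It is a generic sketch, not the implementation from the GECCO '17 papers cited above: the inertia and acceleration coefficients, swarm size, iteration count and the toy objective `toy_validation_loss` are all assumptions. In a real setting each objective call would train a network with the candidate settings and return its validation loss, which is what makes the search expensive.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iterations=100,
                 inertia=0.7, c_personal=1.5, c_social=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best PSO.

    bounds: list of (low, high) pairs, one per hyper-parameter.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [list(p) for p in positions]
    personal_best_value = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=lambda i: personal_best_value[i])
    global_best = list(personal_best[best_index])
    global_best_value = personal_best_value[best_index]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: keep some momentum, pull towards personal and global bests.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                                    + c_social * r2 * (global_best[d] - positions[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clip it back into the search box.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            value = objective(positions[i])
            if value < personal_best_value[i]:
                personal_best[i] = list(positions[i])
                personal_best_value[i] = value
                if value < global_best_value:
                    global_best = list(positions[i])
                    global_best_value = value
    return global_best, global_best_value


def toy_validation_loss(params):
    # Hypothetical stand-in for "train with these settings, return validation loss".
    # Its minimum is at log10(learning rate) = -3 and dropout = 0.5.
    log_lr, dropout = params
    return (log_lr + 3) ** 2 + (dropout - 0.5) ** 2


best, best_value = pso_minimize(toy_validation_loss, bounds=[(-6.0, -1.0), (0.0, 0.9)])
print([round(x, 3) for x in best], round(best_value, 6))
```

Because particles are evaluated independently within an iteration, the objective calls are natural candidates for parallel execution.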

Tuning your model: Summary

• Hyper-parameter optimization is probably the most time-consuming aspect of building a Machine Learning product

• We need to be confident that our selected settings will carry over well to the majority of cases

• Use automatic approaches when possible
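One way to gain that confidence is to score each candidate setting with k-fold cross-validation and look at both the mean and the spread across folds. In this sketch `train_and_score` is a hypothetical callable that trains a model with fixed hyper-parameters and returns a validation score; contiguous folds assume the data has already been shuffled.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Assumes the samples were shuffled beforehand.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(train_and_score, n_samples, k=5):
    """Return (mean, stdev) of the validation score over k train/validation splits.

    train_and_score(train_idx, val_idx) is a hypothetical callable that trains a
    model with one fixed hyper-parameter setting and returns its validation score.
    """
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return statistics.mean(scores), statistics.stdev(scores)


print(k_fold_indices(10, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

A setting with a slightly lower mean but a much smaller standard deviation across folds may be the safer choice for a product.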

Comparing your results

Surpassing human performance in medical classification

• Typical human performance: 3% error

• Typical doctor performance: 1% error

• Experienced doctor performance: 0.7% error

• Team of experienced doctors’ performance: 0.5% error

What is human performance?

[Figure: four examples with F1 scores of 0.817, 0.845, 0.545 and 0.801]

Comparing with the state of the art

• Superpixel segmentation algorithm

3x state-of-the-art performance for single-stage lesions

2x state-of-the-art performance for multiple-stage lesions

Comparing your results: Summary

• Comparing with human performance is hard and, most of the time, can be misleading

• We have to strive for statistically significant results across different subsets of our data

• Comparing with the state of the art is always a good idea, but we must ensure a fair comparison
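One simple way to check whether one model is significantly better than another on a given subset is a paired permutation (sign-flip) test on per-case scores, such as per-image F1 scores. This sketch is one possible choice of test, not the procedure used in the talk; the number of permutations, the seed and the score lists are assumptions, and the scores are made up purely to show the call. Running it separately on each subset of the data shows whether an improvement holds consistently.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Two-sided paired permutation (sign-flip) test on the mean score difference.

    scores_a[i] and scores_b[i] are the scores of two models on the same case i.
    Returns an estimated p-value.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs)) / len(diffs)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, the sign of each paired difference is arbitrary.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) / len(diffs) >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator avoids reporting a p-value of exactly 0.
    return (extreme + 1) / (n_permutations + 1)


# Made-up per-case F1 scores, for illustration only.
model_a = [0.82, 0.79, 0.88, 0.75, 0.91, 0.84, 0.80, 0.86]
model_b = [0.78, 0.80, 0.83, 0.70, 0.87, 0.81, 0.79, 0.82]
print(paired_permutation_test(model_a, model_b))
```

Pairing the scores case by case removes the variation between cases, which is usually much larger than the difference between two models.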

About us

ECONIB in numbers

• 18 months ongoing

• 8 publications

• Featured in social media

• Healthcare and research partnership

• NVIDIA Inception member

• Still more research in progress

Conclusions

• Building ML products is possible with a rigorous scientific approach

• Maximising the performance of our model is a nuanced process that requires a thorough understanding of the problem and the theory behind it

• It is not only about the model, but also what’s around it

Machine learning done right: An approach to building successful ML projects

pribalta@future-processing.com

www.future-processing.pl
