[fdd 2017] pablo ribalta - machine learning done right

Machine learning done right

An approach to successfully building AI products

Pablo Ribalta Lorenzo

R&D Lead Engineer

pribalta@future-processing.com

Building a Machine Learning product

Machine learning done right: An approach to building successful products

Choosing your metric

Building your dataset

Tuning your parameters

Comparing your results

[Figure: an MRI scan and its manual segmentation (the doctor's prediction)]

[Figure: an MRI scan and its automatic segmentation]

[Figure: MRI scans and their ground truth feed the training; the trained ML system then outputs a prediction for an MRI scan]

Approach #0: Pixelwise comparison

[Figure: ML-system prediction compared with the ground truth]
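A pixelwise comparison can be sketched as the fraction of pixels on which the prediction and the ground truth agree. This is a minimal illustration, not the speaker's code: the function name `pixel_accuracy` and the toy masks are assumptions made up for the example.

```python
def pixel_accuracy(prediction, ground_truth):
    """Fraction of pixels on which two same-sized binary masks agree."""
    if len(prediction) != len(ground_truth):
        raise ValueError("masks must have the same number of rows")
    agree = 0
    total = 0
    for pred_row, truth_row in zip(prediction, ground_truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, truth_row):
            agree += int(p == t)
            total += 1
    if total == 0:
        raise ValueError("masks must not be empty")
    return agree / total


# Toy 3x4 masks (made-up values): 1 = lesion pixel, 0 = background.
ground_truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
prediction = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

accuracy = pixel_accuracy(prediction, ground_truth)
print(f"pixel accuracy: {accuracy:.3f}")
```

In this toy case the prediction misses half of the lesion, yet 11 of the 12 pixels still match, because background pixels dominate the image. That is one reason to look beyond a plain pixelwise score.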

Approach #1: Exploiting confusion matrices

• True positives
• False positives
• True negatives
• False negatives

Relevant elements: true positives + false negatives

Selected elements: true positives + false positives

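Precision = $\frac{TP}{TP + FP}$, in [0, 1]: What is our tendency to oversegment?

Recall = $\frac{TP}{TP + FN}$, in [0, 1]: What is our tendency to miss items?

Here TP, FP and FN are the numbers of true positives, false positives and false negatives.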
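Precision and recall follow directly from the confusion-matrix counts: precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below counts the four cells for a pair of binary masks and derives both ratios. The function names, the toy masks, and the choice to return 0.0 when a denominator is zero are assumptions for illustration only.

```python
def confusion_counts(prediction, ground_truth):
    """Count (TP, FP, TN, FN) over two same-sized binary masks."""
    tp = fp = tn = fn = 0
    for pred_row, truth_row in zip(prediction, ground_truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # selected and relevant
            elif p and not t:
                fp += 1      # selected but not relevant (oversegmentation)
            elif not p and t:
                fn += 1      # relevant but missed
            else:
                tn += 1      # correctly left out
    return tp, fp, tn, fn


def precision(tp, fp):
    # Assumed convention: 0.0 when nothing was selected.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Assumed convention: 0.0 when there is nothing relevant to find.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy masks (made-up values): 1 = lesion pixel, 0 = background.
ground_truth = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
prediction = [[0, 1, 1, 1],
              [0, 0, 1, 1]]

tp, fp, tn, fn = confusion_counts(prediction, ground_truth)
print("precision:", precision(tp, fp))
print("recall:", recall(tp, fn))
```

Here two background pixels are wrongly selected, which lowers precision, and one lesion pixel is missed, which lowers recall.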

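Ultimate goal: Single metric

$$F_1\ \text{score} = 2 \cdot \frac{1}{\frac{1}{\text{recall}} + \frac{1}{\text{precision}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \in [0, 1]$$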
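The F1 score is the harmonic mean of precision and recall, and its two usual written forms are algebraically identical. The sketch below implements both and checks that they agree; the function names and the zero-handling convention are assumptions made for the example.

```python
def f1_score(precision, recall):
    """F1 as 2 * precision * recall / (precision + recall)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


def f1_harmonic_form(precision, recall):
    """F1 as 2 / (1 / recall + 1 / precision), the harmonic-mean form."""
    if precision == 0 or recall == 0:
        return 0.0  # the harmonic mean tends to zero if either input does
    return 2 / (1 / recall + 1 / precision)


# Illustrative inputs, not results from the talk.
p, r = 0.6, 0.75
score = f1_score(p, r)
print(f"F1 = {score:.4f}, arithmetic mean = {(p + r) / 2:.4f}")
```

Unlike the arithmetic mean, the harmonic mean is pulled towards the smaller of the two inputs, so a model cannot reach a high F1 score by excelling at only one of precision or recall.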

Choosing your metric: Summary

• As with business requirements, choosing a good metric is the result of understanding the needs and expectations of the model’s users

• A model can be excellent in one metric, but very poor in others

• Train using the metric you plan on judging the model with
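The point that a model can look excellent on one metric and poor on another is easy to reproduce with imbalanced data. The sketch below uses made-up confusion-matrix counts (1000 pixels, of which only 10 belong to the lesion) and a hypothetical helper `summarise`; none of the numbers come from the talk.

```python
def summarise(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 for confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts: the model finds only 1 of the 10 lesion pixels
# and never raises a false alarm.
metrics = summarise(tp=1, fp=0, tn=990, fn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and precision both look nearly perfect here, while recall and F1 reveal that most of the lesion was missed. Which of these numbers matters depends on what the model's users need.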

How much data can we collect?

Dealing with data scarcity

Medical records: only a few patients

Secret sauce: Data augmentation

[Figure: original image alongside its deformed variants]

Rotation          0°    45°   90°
Horizontal flip   Yes   Yes   Yes
Vertical flip     Yes   Yes   Yes
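Rotations and flips of this kind can be sketched in a few lines. The example below is a dependency-free illustration on nested lists, not the pipeline used in the talk: it covers only the 0° and 90° rotations, because a 45° rotation needs interpolation and would normally be delegated to an image library. The function names are assumptions. For segmentation, the same transform has to be applied to the image and to its ground-truth mask so that the two stay aligned.

```python
def identity(grid):
    """Return a copy of the grid (the 0-degree rotation / no flip)."""
    return [list(row) for row in grid]


def rotate90(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def horizontal_flip(grid):
    """Mirror the grid left to right."""
    return [list(row[::-1]) for row in grid]


def vertical_flip(grid):
    """Mirror the grid top to bottom."""
    return [list(row) for row in grid[::-1]]


def augment_pair(image, mask):
    """Apply each rotation/flip combination to image and mask together."""
    pairs = []
    for rotate in (identity, rotate90):
        for flip in (identity, horizontal_flip, vertical_flip):
            pairs.append((flip(rotate(image)), flip(rotate(mask))))
    return pairs


# Toy 2x2 image and mask (made-up values); the mask marks the pixel
# whose intensity is 2.
image = [[1, 2],
         [3, 4]]
mask = [[0, 1],
        [0, 0]]

pairs = augment_pair(image, mask)
print(len(pairs), "image/mask pairs from one original")
```

Two rotations times three flip settings give six pairs from a single annotated scan. Some combinations can coincide for symmetric images, so the number of distinct samples may be lower in practice.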

Building your dataset: Summary

• There are many approaches to augmenting data

• We must ensure that our dataset is balanced and correctly describes the data’s statistical distribution

• Although not covered here, splitting the dataset into training, validation and test sets is fundamental to training the model and evaluating its results correctly
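A training/validation/test split can be sketched as a seeded shuffle followed by slicing. The function name `split_dataset`, the fractions, and the patient identifiers below are assumptions for illustration. Splitting by patient rather than by individual image is also an assumption here, chosen so that images from one patient do not end up on both sides of the split.

```python
import random


def split_dataset(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split into (train, validation, test)."""
    if val_fraction < 0 or test_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if val_fraction + test_fraction >= 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split stable
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


# Hypothetical patient identifiers.
patients = [f"patient_{i:02d}" for i in range(10)]
train_ids, val_ids, test_ids = split_dataset(patients)
print(len(train_ids), len(val_ids), len(test_ids))
```

The validation set is then used for choosing hyper-parameters, and the test set is kept aside for the final evaluation only.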

Tuning your model

Hyperparameter optimisation

Automatic hyper-parameter selection: Particle Swarm Optimization

• Pablo Ribalta Lorenzo, Jakub Nalepa, Michal Kawulok, Luciano Sanchez Ramos, and José Ranilla Pastor. 2017. Particle swarm optimization for hyper-parameter selection in deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17). ACM, New York, NY, USA, 481-488.

• Pablo Ribalta Lorenzo, Jakub Nalepa, Luciano Sanchez Ramos, and José Ranilla Pastor. 2017. Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '17). ACM, New York, NY, USA, 1864-1871.

When possible, go automatic
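To give a feel for what automatic selection with particle swarm optimization involves, here is a minimal textbook-style sketch. It is not the implementation from the GECCO papers cited above. The objective is a made-up stand-in for a validation error, the two hyper-parameters (log10 of the learning rate and a dropout rate) are hypothetical, and the swarm settings are arbitrary assumptions.

```python
import random


def particle_swarm_minimise(objective, bounds, n_particles=12, n_iterations=40,
                            inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimise objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_positions = [p[:] for p in positions]          # per-particle bests
    best_scores = [objective(p) for p in positions]
    g = min(range(n_particles), key=best_scores.__getitem__)
    global_best, global_score = best_positions[g][:], best_scores[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (best_positions[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp to the allowed range.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < best_scores[i]:
                best_positions[i], best_scores[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_validation_error(params):
    """Made-up objective with its minimum at log_lr = -3, dropout = 0.5."""
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.5) ** 2


bounds = [(-5.0, -1.0), (0.0, 0.9)]  # hypothetical search ranges
best, score = particle_swarm_minimise(toy_validation_error, bounds)
print("best position:", best, "score:", score)
```

In a real setting each call to the objective would mean training and validating a model, which is why the number of particles and iterations has to be budgeted carefully.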

Tuning your model: Summary

• Hyper-parameter optimization is probably the most time-consuming aspect of building a Machine Learning product

• We need to be confident that the selected settings will carry over well to the majority of cases

• Use automatic approaches when possible

Surpassing human performance in medical classification

• Typical human performance: 3% error

• Typical doctor performance: 1% error

• Experienced doctor performance: 0.7% error

• Team of experienced doctors performance: 0.5% error

What is human performance?

[Figure: four panels, with F1 scores of 0.817, 0.845, 0.545 and 0.801]

Comparing with the state of the art

• Superpixel segmentation algorithm

3x state-of-the-art performance for single-stage lesions

2x state-of-the-art performance for multiple-stage lesions

Comparing your results: Summary

• Comparing with human performance is hard and, most of the time, can be misleading

• We have to strive to achieve statistically significant results across different subsets of our data

• Comparing with the state of the art is always a good idea, but we must ensure a fair comparison
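One possible way to check whether a difference between two methods is statistically significant across subsets is an exact paired sign-flip permutation test. This technique is swapped in for illustration; the talk does not say which test was used. The function name and the per-subset F1 scores below are made up.

```python
from itertools import product


def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided p-value for paired scores under random sign flips.

    Enumerates all 2**n sign assignments, so it is only practical for a
    small number of pairs; larger n would call for random sampling.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / total


# Made-up F1 scores of two methods on the same eight data subsets.
ours = [0.81, 0.84, 0.79, 0.88, 0.75, 0.83, 0.80, 0.86]
baseline = [0.78, 0.80, 0.77, 0.83, 0.70, 0.81, 0.76, 0.80]

p_value = paired_sign_flip_p_value(ours, baseline)
print(f"p-value: {p_value:.4f}")
```

Because both methods are scored on the same subsets, the comparison is paired, which is one ingredient of the fair comparison the summary asks for.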

About us

ECONIB in numbers

• 18 months ongoing

• 8 publications

• Featured in social media

• Healthcare and research partnership

• NVIDIA Inception member

• Still more research in progress

Conclusions

• Building ML products is possible with a rigorous scientific approach

• Maximising the performance of our model is a nuanced process that requires a thorough understanding of the problem and the theory behind it

• It is not only about the model, but also about what’s around it

Machine learning done right: An approach to building successful ML projects

pribalta@future-processing.com

www.future-processing.pl
