[fdd 2017] pablo ribalta - machine learning done right
TRANSCRIPT
Machine learning done right
An approach to successfully building AI products
Pablo Ribalta Lorenzo
R&D Lead Engineer
Building a Machine Learning product
Machine learning done right: An approach to building successful products
Choosing your metric
Building your dataset
Tuning your parameters
Comparing your results
Choosing your metric
[Figure: an MRI scan is segmented manually, with the doctor's prediction serving as ground truth; an ML system trained on that ground truth produces the automatic segmentation, and the ML-system prediction is compared with the ground truth]
Approach #0: Pixelwise comparison
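A minimal sketch of what a pixelwise comparison could look like, assuming both the prediction and the ground truth are binary masks stored as nested lists (the mask size and the helper name `pixel_accuracy` are illustrative, not taken from the talk). It also hints at why this naive approach can mislead: when the lesion is small, predicting "background everywhere" still scores very high.

```python
def pixel_accuracy(prediction, ground_truth):
    """Fraction of pixels on which two binary masks agree."""
    total = 0
    agree = 0
    for pred_row, truth_row in zip(prediction, ground_truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            agree += int(p == t)
    return agree / total


# Toy 10x10 ground truth with a 2x2 lesion (4 lesion pixels out of 100).
ground_truth = [[0] * 10 for _ in range(10)]
for r in (4, 5):
    for c in (4, 5):
        ground_truth[r][c] = 1

# A useless prediction that marks every pixel as background.
all_background = [[0] * 10 for _ in range(10)]

print(pixel_accuracy(all_background, ground_truth))  # 0.96
```

The prediction finds none of the lesion, yet agrees with the ground truth on 96 of 100 pixels.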
Approach #1: Exploiting confusion matrices
True positives
False positives
True negatives
False negatives
Relevant elements
Selected elements
$$\text{Precision} = \frac{TP}{TP + FP} \in [0, 1]$$
What is our tendency to oversegment?
$$\text{Recall} = \frac{TP}{TP + FN} \in [0, 1]$$
What is our tendency to miss items?
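As a rough illustration of the two ratios, the sketch below counts the confusion-matrix cells for two flattened binary masks and derives precision and recall from them. The function names and the toy masks are assumptions for the example, and returning 0.0 when a denominator is zero is one possible convention, not something the slides specify.

```python
def confusion_counts(prediction, ground_truth):
    """Count TP, FP, FN, TN for two flat sequences of 0/1 labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(prediction, ground_truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    # Share of selected pixels that are truly relevant.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Share of relevant pixels that were selected.
    return tp / (tp + fn) if (tp + fn) else 0.0


truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(pred, truth)
print(tp, fp, fn, tn)     # 3 2 1 4
print(precision(tp, fp))  # 0.6
print(recall(tp, fn))     # 0.75
```

Here the two false positives lower precision (oversegmentation), while the single missed pixel lowers recall.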
Ultimate goal: Single metric
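$$F_1\ \text{score} = 2 \cdot \frac{1}{\frac{1}{\text{recall}} + \frac{1}{\text{precision}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \in [0, 1]$$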
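A small sketch of the F1 score as the harmonic mean of precision and recall. The function name and the sample values are illustrative, and returning 0.0 when both inputs are zero is an assumed convention. The second call shows why a single harmonic-mean metric is useful: a model with perfect precision but poor recall is not rewarded the way an arithmetic mean would reward it.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * precision * recall / (precision + recall)


# Balanced precision and recall give a score close to both.
print(round(f1_score(0.6, 0.75), 3))  # 0.667

# Perfect precision cannot compensate for poor recall:
# the arithmetic mean would be 0.55, the F1 score is far lower.
print(round(f1_score(1.0, 0.1), 3))   # 0.182
```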
Choosing your metric: Summary
• Like business requirements, a good metric comes from understanding the needs and expectations of the model’s users
• A model can be excellent in one metric, but very poor in others
• Train using the metric you plan on judging the model with
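One possible reading of the last point, sketched under assumptions: the talk does not say which loss was used, but for segmentation a common choice is a "soft Dice" loss, because for hard 0/1 predictions the Dice coefficient 2TP / (2TP + FP + FN) coincides with the F1 score. The function name, the `eps` smoothing term and the toy inputs below are all illustrative.

```python
def soft_dice_loss(probabilities, ground_truth, eps=1e-7):
    """1 - Dice coefficient, computed on per-pixel probabilities.

    probabilities: predicted foreground probabilities in [0, 1]
    ground_truth:  0/1 labels of the same length
    eps:           assumed smoothing term that avoids division by zero
    """
    intersection = sum(p * t for p, t in zip(probabilities, ground_truth))
    denominator = sum(probabilities) + sum(ground_truth)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hard prediction with TP=3, FP=2, FN=1: Dice = 6/9, so the loss is about 1/3.
hard_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(soft_dice_loss(hard_pred, truth))

# Soft probabilities are accepted as well, which is what makes the
# quantity usable as a training objective.
soft_pred = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.1, 0.0, 0.0, 0.1]
print(soft_dice_loss(soft_pred, truth))
```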
Building your dataset
How much data can we collect?
Dealing with data scarcity
Medical records: only a few patients
Secret sauce: Data augmentation
[Figure: augmented variants of an original image and a deformed copy, combining rotation (0°, 45°, 90°), horizontal flip and vertical flip]
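A minimal sketch of flip-and-rotate augmentation on images stored as nested lists. It is not the pipeline used in the talk: the 45° rotation and the elastic deformation from the slides need interpolation (typically done with an image library) and are left out here, and all function names are illustrative. The one point worth noting for segmentation is that every transform is applied to the image and its mask together, so the labels stay aligned.

```python
from itertools import product


def rotate90(image):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def horizontal_flip(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the row order (top-bottom flip)."""
    return [list(row) for row in image[::-1]]


def augment(image, mask):
    """Return 8 (image, mask) pairs: {0, 90 degrees} x h-flip x v-flip."""
    pairs = []
    for turn, h_flip, v_flip in product((False, True), repeat=3):
        img, msk = image, mask
        if turn:
            img, msk = rotate90(img), rotate90(msk)
        if h_flip:
            img, msk = horizontal_flip(img), horizontal_flip(msk)
        if v_flip:
            img, msk = vertical_flip(img), vertical_flip(msk)
        pairs.append((img, msk))
    return pairs


image = [[1, 2], [3, 4]]
mask = [[0, 1], [0, 0]]
variants = augment(image, mask)
print(len(variants))  # 8
```

Each scan therefore yields eight training samples, which is one way to stretch a dataset that covers only a few patients.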
Building your dataset: Summary
• There are many approaches to augmenting data
• We must ensure that our dataset is balanced and correctly describes the data’s statistical distribution
• Although not covered here, splitting a dataset into Training, Validation and Test sets is fundamental for correct training and evaluation of the results
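The Training / Validation / Test split could be sketched as follows. The fractions, the seed and the idea of splitting by patient identifier (so that scans of one patient never land in two subsets) are assumptions chosen for illustration; the talk does not give these details.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle items reproducibly and cut them into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical patient identifiers; splitting at patient level avoids
# leaking near-identical slices of one patient into the test set.
patients = [f"patient_{i:02d}" for i in range(20)]
train, val, test = split_dataset(patients)
print(len(train), len(val), len(test))  # 14 3 3
```

Augmented copies should be generated only after the split, and only for the training subset, so that the validation and test sets keep describing unseen data.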
Tuning your model
Hyperparameter optimisation
Automatic hyper-parameter selection: Particle Swarm Optimization
• Pablo Ribalta Lorenzo, Jakub Nalepa, Michal Kawulok, Luciano Sanchez Ramos, and José Ranilla Pastor. 2017. Particle swarm optimization for hyper-parameter selection in deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17). ACM, New York, NY, USA, 481-488.
• Pablo Ribalta Lorenzo, Jakub Nalepa, Luciano Sanchez Ramos, and José Ranilla Pastor. 2017. Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '17). ACM, New York, NY, USA, 1864-1871.
When possible, go automatic
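To give a feel for the idea, here is a generic, textbook-style particle swarm optimiser. It is not the algorithm from the GECCO papers cited above: the inertia and acceleration coefficients are common default choices, and `toy_validation_loss` is a made-up stand-in for the expensive step of training a network with the given hyper-parameters and measuring its validation error.

```python
import random


def particle_swarm_minimise(objective, bounds, n_particles=20, n_iterations=100,
                            inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimise objective(position) inside box bounds [(low, high), ...]."""
    rng = random.Random(seed)
    dims = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dims for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_score = [objective(p) for p in positions]
    best = min(range(n_particles), key=lambda i: personal_best_score[i])
    global_best = personal_best[best][:]
    global_best_score = personal_best_score[best]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp the particle to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_best_score[i]:
                personal_best[i] = positions[i][:]
                personal_best_score[i] = score
                if score < global_best_score:
                    global_best = positions[i][:]
                    global_best_score = score
    return global_best, global_best_score


def toy_validation_loss(params):
    """Hypothetical stand-in: pretend the best setting is lr=1e-3, dropout=0.5."""
    log10_learning_rate, dropout = params
    return (log10_learning_rate + 3.0) ** 2 + (dropout - 0.5) ** 2


best_params, best_loss = particle_swarm_minimise(
    toy_validation_loss, bounds=[(-6.0, 0.0), (0.0, 0.9)]
)
print(best_params, best_loss)
```

In a real setting each call to the objective is a full training run, so the swarm size and iteration count trade search quality against compute budget.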
Tuning your model: Summary
• Hyper-parameter optimization is probably the most time-consuming aspect of building a Machine Learning product
• We need to be confident that our selected settings will translate well to the majority of cases
• Use automatic approaches when possible
Comparing your results
Surpassing human performance in medical classification
• Typical human performance: 3% error
• Typical doctor performance: 1% error
• Experienced doctor performance: 0.7% error
• Team of experienced doctors performance: 0.5% error
What is human performance?
F1 score = 0.817; F1 score = 0.845; F1 score = 0.545; F1 score = 0.801
Comparing with the state of the art
• Superpixel segmentation algorithm
3x state-of-the-art performance for single-stage lesions
2x state-of-the-art performance for multiple-stage lesions
Comparing your results: Summary
• Comparing with human performance is hard, and most of the time the comparison is misleading
• We have to strive for statistically significant results across different subsets of our data
• Comparing with the state of the art is always a good idea, but we must ensure a fair comparison
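One way to check significance across subsets, sketched under assumptions: an exact paired sign-flip (permutation) test on per-subset F1 scores of two methods evaluated on the same subsets. The choice of test is ours, not the talk's, and the scores below are invented purely to make the example runnable.

```python
from itertools import product


def paired_sign_flip_p_value(scores_a, scores_b):
    """Two-sided exact p-value for the mean paired difference being zero.

    Under the null hypothesis each paired difference is equally likely to
    have either sign, so every sign pattern is enumerated (2**n of them).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        if statistic >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical F1 scores on 8 data subsets (illustrative numbers only).
ours = [0.82, 0.85, 0.79, 0.81, 0.84, 0.80, 0.83, 0.86]
baseline = [0.78, 0.80, 0.77, 0.78, 0.79, 0.77, 0.80, 0.81]

p_value = paired_sign_flip_p_value(ours, baseline)
print(p_value)  # 0.0078125, i.e. 2 of the 256 sign patterns
```

Because both methods are scored on the same subsets, the comparison is paired, which is part of what makes it fair.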
About us
ECONIB in numbers
• 18 months ongoing
• 8 publications
• Featured in social media
• Healthcare and research partnership
• NVIDIA Inception member
• Still more research in progress
Conclusions
• Building ML products is possible with a rigorous scientific approach
• Maximising the performance of our model is a nuanced process that requires a thorough understanding of the problem and the theory behind it
• It is not only about the model, but also what’s around it
Machine learning done right: An approach to building successful ML projects
www.future-processing.pl