datasciencebowl 2017 · 2020. 6. 5. · team cowboy bebop - 11th place dmitry altuhov alexander...
![Page 1: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/1.jpg)
DataScienceBowl 2017
Lung cancer prediction
![Page 2: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/2.jpg)
Team
Cowboy Bebop - 11th place
Dmitry Altuhov
Alexander Guschin
Dmitry Ulyanov
Michail Trofimov
![Page 3: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/3.jpg)
Overview
X: 3d images of lungs (CT)
Y: 1 if (lung cancer was diagnosed during 1 year after scan) else 0
Metric: Logloss
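For reference, a minimal sketch of the logloss metric in plain Python. The function name, the clipping constant `eps` and the toy labels are assumptions for illustration, not the competition's scoring code.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy; predictions are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 0, 1]                                 # toy labels (hypothetical)
constant = log_loss(labels, [0.5] * 4)                # predicting 0.5 everywhere gives ln 2
confident = log_loss(labels, [0.9, 0.1, 0.2, 0.8])    # good confident predictions score lower
```

Logloss punishes confident mistakes heavily, so a constant 0.5 submission (score ln 2, about 0.693) is a useful baseline to compare against.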
![Page 4: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/4.jpg)
Example
(X,Y) slices of 3d image
(Z,Y) slices of 3d image
![Page 5: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/5.jpg)
Overview
X: 3d images of lungs (CT)
Y: 1 if (lung cancer was diagnosed during 1 year after scan) else 0
Additional data: Luna 2016 challenge
1. 3d images of lungs (CT)
2. Nodule candidates (detected automatically)
3. Candidate assessment (by radiologists)
A candidate == (x,y,z)
![Page 6: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/6.jpg)
Example: nodule
![Page 7: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/7.jpg)
Data size
DataScienceBowl:
3d images of lungs (CT) x 1600 patients (stage1) ~ 120GB
3d images of lungs (CT) x 500 patients (stage2) ~ 60GB
Additional data: Luna 2016 challenge
3d images of lungs (CT) x 1000 patients ~ 80GB
![Page 8: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/8.jpg)
Common pipeline
1. Preprocess data
   a. Rescale to 1 voxel == 1mm^3
   b. Refine data: segmentation
2. Train networks on Luna Dataset
   a. Classify candidate/annotation (3d convnet)
   b. Segmentation candidate/annotation (3d Unet)
   c. Classify nodule’s malignancy
3. Use networks on DSB to “preprocess” data
   a. “Probability” map
4. Train
   a. 3d convnet
   b. Xgboost
![Page 9: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/9.jpg)
1. Preprocess: rescaling
Different spacing between slices
1. Between different patients in one dataset
2. Between different datasets
Stage2 has smaller spacings => higher-quality data
Stage1: Stage2:
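A minimal sketch of rescaling a scan to 1 voxel == 1mm^3, assuming the volume is a NumPy array in (z, y, x) order and the per-axis spacing in mm is already known. The function name and the toy volume are hypothetical, and the nearest-neighbour (floor) lookup is a simplification chosen to keep the example dependency-free; an interpolating resampler such as `scipy.ndimage.zoom` is a common alternative.

```python
import numpy as np

def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a (z, y, x) volume to new_spacing (mm) by nearest-neighbour lookup."""
    spacing = np.asarray(spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    # The physical extent stays fixed, so the voxel count scales by spacing / new_spacing.
    scaled = np.array(volume.shape) * spacing / new_spacing
    new_shape = np.maximum(np.round(scaled), 1).astype(int)
    # For each output index along each axis, pick the matching source index.
    idx = [
        np.minimum((np.arange(n) * volume.shape[a] / n).astype(int), volume.shape[a] - 1)
        for a, n in enumerate(new_shape)
    ]
    return volume[np.ix_(*idx)]

# Toy scan (hypothetical): 40 slices 2.5 mm apart, 0.5 mm in-plane pixels.
scan = np.zeros((40, 64, 64), dtype=np.int16)
resampled = resample_nearest(scan, spacing=(2.5, 0.5, 0.5))
print(resampled.shape)
```

After resampling, every patient's volume shares the same physical scale, so a fixed receptive field covers the same amount of tissue regardless of the scanner settings.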
![Page 10: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/10.jpg)
1. Preprocess: segmentation
Two goals:
1. Remove redundant parts
2. Refine useful parts
![Page 11: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/11.jpg)
1. Preprocess: segmentation
Examples
![Page 12: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/12.jpg)
1. Preprocess: segmentation
Examples
![Page 13: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/13.jpg)
2. Train networks on Luna dataset
Differences in network architecture.
In input:
1. Different receptive fields: 32^3, 64^3, etc.
2. Some use 2d conv networks (see 7th place report)
In target:
1. Classification: candidate/annotation or malignancy (XML)
2. Segmentation using 3d-contour (XML)
3. Regression: nodule size
![Page 14: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/14.jpg)
2. Train networks on Luna
Luna winners paper
![Page 15: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/15.jpg)
3. Use networks on DSB to “preprocess” data
After “preprocessing”
![Page 16: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/16.jpg)
4. Train on DSB after “preprocessing”
1. 3d-Convnet
2. Xgboost
   a. Features: max, np.where(patch == patch.max()), relative coordinates
      ~0.405 cv (5 folds), ~0.44 lb
   b. Preprocessing: mask
      ~0.405 cv (5 folds), ~0.435 lb
   c. Features for predictions from 3 zoom levels (2nd place)
      1 voxel == 1mm^3, 1.5mm^3, 2mm^3
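A minimal sketch of the kind of features listed for Xgboost: the maximum of a “probability” map, the location of that maximum via `np.where`, and the location expressed as relative coordinates. The function name, the normalisation to [0, 1] and the toy map are assumptions; the slides do not show the actual feature code.

```python
import numpy as np

def patch_features(prob_map):
    """Return [max probability, relative z, relative y, relative x] for a 3D map."""
    peak = prob_map.max()
    # np.where returns one index array per axis; take the first maximum if tied.
    z, y, x = (int(axis[0]) for axis in np.where(prob_map == peak))
    # Relative coordinates: position divided by the last valid index on each axis.
    rel = [c / (s - 1) if s > 1 else 0.0 for c, s in zip((z, y, x), prob_map.shape)]
    return [float(peak)] + rel

prob_map = np.zeros((5, 5, 5))   # toy "probability" map (hypothetical)
prob_map[4, 2, 0] = 0.9          # a single suspicious voxel
features = patch_features(prob_map)
```

Relative rather than absolute coordinates keep the features comparable between patients whose scans have different sizes.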
![Page 17: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/17.jpg)
4. Train on DSB after “preprocessing”
Use mask to clip redundant parts of images
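One way this clipping could look, as a sketch: blank everything outside a binary lung mask and crop the volume to the mask's bounding box. The function name and the fill value of -1000 (roughly air in Hounsfield units) are assumptions, not details taken from the slides.

```python
import numpy as np

def crop_to_mask(volume, mask, fill_value=-1000):
    """Blank voxels outside the mask, then crop to the mask's bounding box."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        return volume.copy()         # empty mask: nothing to crop to
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1      # +1 because slice ends are exclusive
    box = tuple(slice(a, b) for a, b in zip(lo, hi))
    return np.where(mask, volume, fill_value)[box]

volume = np.arange(64).reshape(4, 4, 4)   # toy volume (hypothetical)
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 2:4] = True                # toy lung mask
cropped = crop_to_mask(volume, mask)
```

Cropping shrinks the input the network has to process, and blanking removes structures outside the lungs that could otherwise act as distractors.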
![Page 18: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/18.jpg)
Pipeline overview: 2nd and 11th places
1. Preprocess data
   a. Rescale to 1 voxel == 1mm^3
   b. Refine data: segmentation
2. Train networks on Luna Dataset
   a. Classify candidate/annotation (3d convnet)
   b. Segmentation candidate/annotation (3d Unet)
   c. Classify nodule’s malignancy
3. Use networks on DSB to “preprocess” data
   a. “Probability” map
4. Train
   a. 3d convnet
   b. Xgboost
![Page 19: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/19.jpg)
Pipeline overview: 7th place
1. Preprocess data
   a. Rescale to 1 voxel == 1mm^3
   b. Refine data: segmentation
2. Train networks on Luna Dataset
   a. Classify candidate/annotation (3d convnet)
   b. Segmentation candidate/annotation (2d Unet)
   c. Classify nodule’s malignancy
3. Use networks on DSB to “preprocess” data
   a. “Probability” map + Clustering
4. Train
   a. Top-20 most suspicious clusters > 3d convnet classification (labels from patients)
   b. Max probability from top-20 > patient prediction
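Step 4 of this pipeline can be sketched as follows, under assumptions: each cluster is a dict carrying a hypothetical `suspicion` score from the probability map, and `classify` stands in for the 3d convnet. Returning 0.0 for a patient with no clusters is also an assumption.

```python
def patient_prediction(clusters, classify, top_k=20):
    """Score the top_k most suspicious clusters and return the maximum probability."""
    ranked = sorted(clusters, key=lambda c: c["suspicion"], reverse=True)[:top_k]
    if not ranked:
        return 0.0  # assumption: no clusters found means no evidence of cancer
    return max(classify(c) for c in ranked)

# Toy clusters (hypothetical); "cnn_prob" stands in for the convnet's output.
clusters = [
    {"suspicion": 0.9, "cnn_prob": 0.2},
    {"suspicion": 0.8, "cnn_prob": 0.7},
    {"suspicion": 0.1, "cnn_prob": 0.99},
]
prediction = patient_prediction(clusters, classify=lambda c: c["cnn_prob"], top_k=2)
```

With `top_k=2` the third cluster is never classified, so its high classifier output does not affect the patient-level prediction.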
![Page 20: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/20.jpg)
Pipeline overview: 9th place
1. Preprocess data
   a. Rescale to 1 voxel == 1mm^3
   b. Refine data: segmentation
2. Train networks on Luna Dataset
   a. Classify candidate/annotation (3d convnet)
   b. Segmentation candidate/annotation (3d Unet)
   c. Classify nodule’s malignancy
3. Use networks on DSB to “preprocess” data
   a. Find candidates by 2b > False positive reduction by 2a and 2c
4. Train
   a. Top-N most suspicious nodules > 3d convnet classification (by networks like 2a/2c)
   b. Aggregating probabilities from top-N > patient prediction
![Page 21: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/21.jpg)
Technical details
Of the networks trained on the Luna dataset, the heaviest takes up to 12 hours to train on 4x Nvidia M60
Pytorch
![Page 22: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/22.jpg)
Fun moments
Perfect score script <Oleg Trott>
The core algebraic insight needed here is that if we choose 15 probabilities to be
sigmoid(- n * epsilon * 2 ** i)
where n=198, 0 <= i < 15, and epsilon = 1.05e-5 for example, and choose the rest of the probabilities to be 0.5, then the 15 labels corresponding to those 15 probabilities are easily discoverable from the score we get, because all 32768 possible label combinations lead to different scores.
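Why this works: for a probe with p = sigmoid(-t), the logloss term is log(1 + exp(-t)) when the label is 0 and t + log(1 + exp(-t)) when it is 1, so the mean score equals a known baseline plus epsilon * sum(label_i * 2 ** i). The sketch below simulates that decoding; it is not the original script, the function names are hypothetical, and the assumption that the score is reported to five decimal places is ours.

```python
import math

N, EPS, BITS = 198, 1.05e-5, 15   # values quoted in the text above

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def probe_predictions():
    """15 probe probabilities followed by 0.5 for every remaining row."""
    probes = [sigmoid(-N * EPS * 2 ** i) for i in range(BITS)]
    return probes + [0.5] * (N - BITS)

def log_loss(labels, preds):
    return -sum(math.log(p) if y else math.log(1.0 - p)
                for y, p in zip(labels, preds)) / len(labels)

def decode_labels(score):
    """Recover the 15 probed labels from a reported logloss score."""
    # Baseline: the score if all probed labels were 0 (0.5 rows cost ln 2 either way).
    base = log_loss([0] * N, probe_predictions())
    code = round((score - base) / EPS)   # integer whose bits are the labels
    return [(code >> i) & 1 for i in range(BITS)]

# Simulated "leaderboard": made-up true labels, score rounded to 5 decimals.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0] + [i % 2 for i in range(N - BITS)]
reported = round(log_loss(true_labels, probe_predictions()), 5)
recovered = decode_labels(reported)
```

Choosing epsilon slightly above 1e-5 keeps the rounding error of a five-decimal score below half a step, which is what makes every one of the 32768 label combinations distinguishable.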
![Page 23: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/23.jpg)
Fun moments
Using “pretrained” convolutions from Luna winners paper
![Page 24: DataScienceBowl 2017 · 2020. 6. 5. · Team Cowboy Bebop - 11th place Dmitry Altuhov Alexander Guschin Dmitry Ulyanov Michail Trofimov](https://reader033.vdocuments.mx/reader033/viewer/2022052808/6070749935637d15de58a4be/html5/thumbnails/24.jpg)
Pictures are taken both from our work and from competition forum/kernels:
https://www.kaggle.com/c/data-science-bowl-2017