ultrasound nerve segmentation, kaggle review

Kaggle ultrasound nerve segmentation

Tyantov Eduard

#1 Description

Thu 19 May 2016 – Thu 18 Aug 2016

#2 Data

Task: find the Brachial Plexus (BP)
– 420x580 resolution
– 5635 train images with masks, 5508 test
– ~120 images for each of 47 patients
– 47% of the images don’t have a mask
– result in RLE encoding
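The RLE step can be sketched in a few lines. This is a minimal illustration, not the code used in the solution; it assumes the usual Kaggle convention of 1-based pixel indices counted top to bottom, then left to right, with the result written as "start length" pairs. The function name `rle_encode` is hypothetical.

```python
import numpy as np

def rle_encode(mask):
    """Run-length encode a binary mask as 'start length' pairs.

    Assumption: pixels are numbered from 1, down the columns first
    (top to bottom, then left to right).
    """
    flat = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    # Pad with zeros so that every run has a detectable start and end.
    padded = np.concatenate([[0], flat, [0]])
    changes = np.flatnonzero(padded[1:] != padded[:-1]) + 1
    starts, ends = changes[0::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))

example = np.array([[0, 1],
                    [1, 1],
                    [0, 0]])
encoded = rle_encode(example)
```

An image without a nerve encodes to an empty string, which is what an "empty submission" consists of.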

#3 Data: mistakes in the ground truth
– 45 known errors in near-duplicate images
– the metric is sensitive to nerve presence mistakes

https://www.kaggle.com/agalea91/ultrasound-nerve-segmentation/mislabeled-training-images/comments

#4 Evaluation

Peculiarities
– (!) A mask presence mistake leads to a zero score
– Needs smoothing in the denominator

Loss functions
– 1 - dice
– -dice
– weighted cross entropy (2 classes, per pixel prediction)
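A small NumPy sketch of the smoothed Dice coefficient and the "1 - dice" loss shows both peculiarities. It assumes the common formulation in which the `smooth` term is added to the numerator and the denominator; the function names are illustrative only.

```python
import numpy as np

def dice(y_true, y_pred, smooth=1.0):
    """Smoothed Dice coefficient for binary masks or per-pixel probabilities."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    # Without `smooth`, two empty masks would give 0/0.
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_loss(y_true, y_pred, smooth=1.0):
    """The '1 - dice' variant of the loss."""
    return 1.0 - dice(y_true, y_pred, smooth)

empty = np.zeros((10, 10))
full = np.ones((10, 10))
both_empty = dice(empty, empty)      # smoothing turns 0/0 into a perfect score
presence_mistake = dice(empty, full)  # predicting a nerve where none exists
```

With these inputs two empty masks score 1, while a predicted nerve on an empty ground truth scores close to 0, regardless of how plausible the predicted shape is.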

Mean mask

#5 Baselines

Score | Description | Framework | Author
0.51 | Empty submission | - | -
0.00 | Top left pixel | - | -
0.57 | U-Net, in the beginning of the competition | Keras code | Marko Jocic, kaggler
0.62 | U-Net, almost at the end | Torch code | Qure.ai (host)

https://github.com/jocicmarko/ultrasound-nerve-segmentation/

http://blog.qure.ai/notes/ultrasound-nerve-segmentation-using-torchnet

#6 What is U-Net

Overview
– (May 2015 article) “U-Net: Convolutional Networks for Biomedical Image Segmentation”
– Winner of “Grand Challenge for Computer-Automated Detection of Caries in Bitewing Radiography at ISBI 2015”
– Encoder-decoder architecture with skip connections on the same level
– Fully convolutional, dropout in the middle
– Augmentation: “Smooth deformations using random displacement vectors on a coarse 3 by 3 grid. The displacements are sampled from a Gaussian distribution with 10 pixels standard deviation.”

https://arxiv.org/pdf/1505.04597.pdf

#7 Another approach, FCN

Overview

– (20 May 2016, article) “Fully Convolutional Networks for Semantic Segmentation”
– VGG-16
– Segmentation prediction on different layers of the net, + upsampling
– Average predictions

https://arxiv.org/pdf/1605.06211v1.pdf

#8 Starting point, Marko Jocic’s solution

Overview
– Classic U-Net: VGG-like
– Very simple Keras code
– Image resize to 64x80, bicubic interpolation
– Loss = -Dice coefficient, per batch averaging, smooth=1
– Training on the whole dataset, no validation
– RLE-encoding function
– Adam optimizer

Training
– 20 epochs, ~30 seconds on Titan X, memory footprint 800 MB
– Overfits: 0.68 on training -> 0.57 on LB

#9 Aspects of the solution: basics

Overfitting basics (+2%)
– Split train/valid, 20%, and early stopping with patience=5 epochs
• a random split was used instead of the more suitable split by patient (due to a subtle bug)
– Dropout after each conv layer

General enhancements
– Resolution 64x80 -> 80x112 (+1%)
– ELU instead of ReLU -> faster convergence
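For reference, a split by patient could look like the sketch below. It keeps all images of one patient on the same side of the split, so near-duplicate frames cannot leak into validation. The file naming `<patient>_<image>.tif` and the function name `split_by_patient` are assumptions made for the illustration, not taken from the solution's code.

```python
import random
from collections import defaultdict

def split_by_patient(filenames, valid_fraction=0.2, seed=0):
    """Split image names so that no patient appears in both train and valid.

    Assumption: names look like '<patient>_<image>.tif'.
    """
    by_patient = defaultdict(list)
    for name in filenames:
        by_patient[name.split("_")[0]].append(name)

    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_valid = max(1, round(len(patients) * valid_fraction))
    valid_patients = set(patients[:n_valid])

    train = [n for p in patients if p not in valid_patients for n in by_patient[p]]
    valid = [n for p in patients if p in valid_patients for n in by_patient[p]]
    return train, valid

# Toy example: 10 patients with 3 images each.
names = [f"{p}_{i}.tif" for p in range(1, 11) for i in range(1, 4)]
train_names, valid_names = split_by_patient(names)
```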

#10 Aspects of the solution: augmentation

Augmentation*
– flip x, y
– random rotate (5)
– random zoom (0.9, 1.1)
– random channel shift (5.0)

*all transformations must be applied to the mask as well

All transformations can be done on the fly with a generator (randomly applied), but this didn’t improve results.

Elastic transform: convolve random displacement fields with a Gaussian

Result: no added effect
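The point about transforming the mask together with the image can be illustrated with a small on-the-fly generator. This is a NumPy-only sketch that covers just the flips; rotation, zoom, channel shift and the elastic transform are left out, and the names `random_flip` and `augmenting_generator` are hypothetical.

```python
import numpy as np

def random_flip(image, mask, rng):
    """Apply the same random flips to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]  # flip along x
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]  # flip along y
    return image, mask

def augmenting_generator(images, masks, batch_size, seed=0):
    """Yield endless (images, masks) batches, augmented on the fly."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        pairs = [random_flip(images[i], masks[i], rng) for i in idx]
        yield (np.stack([p[0] for p in pairs]),
               np.stack([p[1] for p in pairs]))

# Toy data: the mask is a per-pixel function of the image, so the pair
# stays consistent only if both receive the same transformation.
images = np.random.default_rng(1).random((4, 6, 8))
masks = images > 0.5
batch_x, batch_y = next(augmenting_generator(images, masks, batch_size=2))
```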

#11 Aspects of the solution: blocks

Modifications of U-Net
– 2 3x3 convolutions -> inception_v3 block
– BNA after each convolution
– BNA + activation after summation
– nxn -> 1xn + nx1

Results:
– fewer parameters (1M)
– faster convergence
– LB: +2%

v3 block

v3 + split
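The parameter saving from the nxn -> 1xn + nx1 replacement can be checked with simple arithmetic. The sketch below assumes the intermediate tensor keeps `c_out` channels and that each convolution has a bias; the channel count 64 is just an example, not a figure from the solution.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Number of trainable parameters of a k_h x k_w convolution."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)

def factorized_params(n, c_in, c_out, bias=True):
    """Parameters of a 1xn convolution followed by an nx1 convolution.

    Assumption: the intermediate feature map has c_out channels.
    """
    return (conv_params(1, n, c_in, c_out, bias)
            + conv_params(n, 1, c_out, c_out, bias))

full = conv_params(3, 3, 64, 64)      # one 3x3 convolution
split = factorized_params(3, 64, 64)  # 1x3 followed by 3x1
```

For n=3 the weights drop from 9 to 6 per input/output channel pair, and the saving grows with n.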

#12 Aspects of the solution: 2nd head, postfilter

2nd head
– mask presence branch in the middle of NN (after decoder part)
• Conv 1x1, sigmoid
• FC=1, sigmoid
– leads to better convergence

Post filter
– presence prob < 0.5 or sum(pixels) < 3000 -> empty mask (+4.5%)
– in the end: combining p_nerve = (p_score + p_segment)/2 -> +0.5%
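The post filter rules can be sketched as follows. Several details are assumptions: `sum(pixels)` is read as the number of pixels above a 0.5 threshold, the 3000 limit is applied at whatever resolution the mask is passed in, and `p_segment` is taken as a given input because its exact definition is not stated here. The function names are hypothetical.

```python
import numpy as np

def post_filter(pixel_probs, p_score, min_pixels=3000, threshold=0.5):
    """Return a binary mask, emptied when the nerve is judged absent.

    pixel_probs: per-pixel probabilities from the segmentation head.
    p_score:     presence probability from the 2nd head.
    """
    mask = (np.asarray(pixel_probs) > threshold).astype(np.uint8)
    if p_score < 0.5 or mask.sum() < min_pixels:
        return np.zeros_like(mask)
    return mask

def combined_presence(p_score, p_segment):
    """p_nerve = (p_score + p_segment) / 2; p_segment is supplied by the caller."""
    return (p_score + p_segment) / 2.0

probs = np.full((100, 100), 0.9)              # a large, confident mask
kept = post_filter(probs, p_score=0.8)        # passes both checks
dropped = post_filter(probs, p_score=0.3)     # presence head says "no nerve"
```

Emptying a doubtful mask is cheap insurance because a false nerve on an empty image scores zero.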

#13 Aspects of the solution: other

Modifications
– Skip connections with residual blocks (+1%)
– Max pool -> Conv 3x3 with stride=2 (+BNA)
– Ensemble (+1%)
• k-fold 5,6,7,8, average
– Prediction on augmented versions of test images (averaging)

Final result:
– single model: 0.694 score
– ensemble: 0.70399 (an hour before the competition’s end)
– the last submission was human verified ;) but it did not help

code: https://github.com/EdwardTyantov/ultrasound-nerve-segmentation
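The two averaging steps (over augmented test images and over k-fold models) can be sketched as below. Which test-time augmentations were used is not stated here, so the horizontal flip is an assumption, and `predict` stands for any hypothetical callable that maps an (N, H, W) batch to (N, H, W) probabilities.

```python
import numpy as np

def predict_with_flips(predict, images):
    """Average predictions over the original and horizontally flipped inputs."""
    plain = predict(images)
    # Predict on the flipped batch, then flip the result back before averaging.
    flipped = predict(images[:, :, ::-1])[:, :, ::-1]
    return (plain + flipped) / 2.0

def ensemble_average(predictions):
    """Average per-pixel probabilities from several models (e.g. k folds)."""
    return np.mean(np.stack(predictions), axis=0)

# Toy "model": scales each column differently, so it is not flip-symmetric.
ramp = np.linspace(0.0, 1.0, 8)
toy_predict = lambda batch: batch * ramp

images = np.ones((2, 6, 8))
tta = predict_with_flips(toy_predict, images)
merged = ensemble_average([np.zeros((2, 6, 8)), np.ones((2, 6, 8))])
```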

#14 Leaderboard
– 31st place; see the huge shake-up on the Private LB

#15 What didn’t help

– Inception Resnet v4
– sequential training of decoder, encoder parts
– more or fewer layers/blocks/n_filters
– pixel clustering
– higher or lower resolution
– dropout, different probs
– Torch version
– deconv layers instead of upsampling
– weight decay for layers
– FCN
– Deepmask architecture

#16 Technical

– Ubuntu 14 or 16, CUDA 8, cuDNN 5, latest Keras, latest Torch
– batch_size=64, 128 (depends on GPU memory)
– Single model: 2-3 hours on Titan/1080
– Ensemble: 24 hours

– train dataset: re-labeling or zeroing out errors
– FCN with several heads in different resolutions (regularize)
– post process: mask to ellipse, no holes
– separate training: mask/no mask
– crop images, super-resolution
– models on different resolutions
– higher resolution
– loss
– smart post-processing
• which obviously led to overfitting on the Public Score
– replication padding instead of zero-padding

#17 Other competitors

#18 Deepmask (FB)
– no low-level features
– CNN with two heads: mask and score
– training set: patch, mask, y_k (object is centered and fully contained)
• mask pixel=1 if it is part of the object in the center
– VGG, 8 layers, can be trained
– Training
• joint learning, score * 1/32
• first branch only positives
• augmentation: shift 16 pix, small scale changes, horiz. flip
– Evaluation
• full image: 16 pixel stride

#19 Sharpmask (refined deepmask)

#20 Sharpmask (refined deepmask)
– Architecture
• trunk: resnet-50 pretrained
• reflect-pad instead of zero
• for DeepMask: headC
– Training
• Deepmask (score + coarse mask)
• then freeze and learn refinements
• Why: faster convergence, deepmask or sharpmask result, minimal finetuning
– Inference
• only top-n locations refined
