Blood Cell Detection Using Single Shot Multibox Detector

Inyoung Huh

Computational Science, Mathematics and Engineering

University of California, San [email protected]

Abstract: Automatic cell detection in microscopy images has become an important task across a wide range of biomedical research. In machine learning, state-of-the-art methods for detection have advanced beyond classification. In this paper, I design a red blood cell detector using the Single Shot Multibox Detector algorithm, which detects only blood cells in an image. The cell detector finds the locations of red blood cells with more than 99% accuracy.

I. INTRODUCTION

Over the past few years, neural networks have evolved quickly. The advent of deep learning has enabled machine learning to be applied not only to everyday life but also to interdisciplinary research, and applications of deep learning are now easy to find. Deep learning architectures have been used in many fields, such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics and drug design. Its application in medicine is especially promising. Deep learning can help doctors make faster, more accurate diagnoses by enhancing their ability to analyze medical images. For example, a deep learning model trained to detect tumors in input images can assist doctors in deciding on a proper diagnosis. This is of significant interest to a wide range of medical imaging tasks and clinical applications. This paper is about cell detection, motivated by this insight. Cell detection means finding whether certain types of cells are present in an input image and localizing them in the image. This paper begins with a discussion of the Single Shot Multibox Detector algorithm. The data used in this paper is from a Kaggle dataset. The inputs to the algorithm are blood cell images, together with the bounding boxes and classes of the blood cells in each image. The Single Shot Multibox Detector algorithm predicts the class and location of the blood cells in each image.

II. RELATED WORK

In the last few decades, different cell recognition methods have been proposed. These methods can be divided into traditional methods and fully deep learning based methods. Traditional cell detection combines two components: a hand-crafted feature representation and a classifier. For example, the Laplacian-of-Gaussian (LoG) operator [1] is well known for blob detection. Gabor filters and LBP features [2] provide interesting texture properties and have been tried for cell detection tasks [3]. In these methods, the detection system first extracts features as the representation of the input images, and a machine learning algorithm is then applied to the feature vectors to recognize the locations of the target cells. However, even when a deep learning method is used as the classifier, these pipelines still rely mostly on hand-crafted feature representations. There is always a risk because suitable features must be selected by a human and the features are tightly coupled with one another, and the approach is limited when massive inputs are needed. A fully deep learning approach can resolve these problems. Such methods move from fixed features toward automated learning of problem-specific features directly from the training data. Therefore, users do not have to go through an elaborate feature extraction procedure. For example, combining a convolutional neural network with compressed sensing performed well for cell detection, which was the first successful use of a convolutional neural network with compressed sensing based output space encoding [4]. A sparse color unmixing technique combined with a convolutional neural network was also successful for automatic immune cell detection [5].

III. METHODS

A. Convolutional Neural Network

A convolutional neural network is specialized for analyzing visual imagery. While a traditional neural network simply consists of input, hidden and output layers, a convolutional neural network consists of four main layer types: convolutional layers, pooling layers, dropout layers and fully connected layers. These layers form the basic architecture of a convolutional neural network, and a full network is designed by stacking them. A convolutional layer consists of a set of independent filters. When designing a convolutional layer, we must decide on a kernel size. The kernel is slid over the image, taking a "snapshot" at each step. The slide is controlled by the stride size, which allows each snapshot to overlap the previous one. At each step, the kernel weights are multiplied with the pixels in the snapshot and summed, which is the convolution. A strength of the convolutional network in image classification is its use of local structure in the image to support feature learning. This ability is rooted in the overlapping of these "snapshots" in the convolutional layers.
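The sliding-kernel process can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions (a single-channel image, one square kernel, no padding). The function name `conv2d` and the toy values are hypothetical and are not taken from the experiments in this paper.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a single-channel image (no padding).

    At each position the kernel weights are multiplied with the pixels
    of the current "snapshot" and summed into one output value.
    """
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 image and 2x2 kernel (illustrative values only).
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

overlapping = conv2d(image, kernel, stride=1)   # 3x3 map, snapshots overlap
non_overlapping = conv2d(image, kernel, stride=2)  # 2x2 map, no overlap
```

With a stride of 1 and a 2×2 kernel, neighbouring snapshots share pixels. With a stride of 2 they do not, and the output map is smaller.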

Sandwiched between convolutional layers are pooling, dropout, or normalization layers, depending on the design choice. A pooling layer performs a downsampling operation along the spatial dimensions, most commonly with 2×2 receptive fields. From each receptive field, the pixel with the maximum value is chosen as active and passed forward through the network, while the non-active pixels are discarded. Gradients backpropagate through the active pixel. This process reduces the spatial size of the input, but not necessarily its depth. Convolutional layers can increase the depth passed to the next layer from, say, 3 initially for RGB to 64. The added depth allows more complex features to be learned, but it comes with an explosion in the number of parameters. Dropout layers help with overfitting by limiting the effective complexity of the network: a dropout layer randomly drops some percentage of the units during training, in effect creating a different architecture at each step. These changing architectures give more flexibility to what the network may learn. As in a traditional neural network, the fully connected layer connects all output neurons and computes the class scores.
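As a small illustration of the pooling step, the following sketch applies 2×2 max pooling to a toy feature map in plain Python. The function name `max_pool2d` and the input values are hypothetical. The sketch assumes non-overlapping windows, which is the common 2×2 configuration.

```python
def max_pool2d(feature_map, size=2):
    """Downsample a 2D feature map by keeping the maximum of each
    non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))  # only the "active" pixel is passed on
        pooled.append(row)
    return pooled


# Toy 4x4 feature map (illustrative values only).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 4],
]
pooled = max_pool2d(feature_map)  # spatial size goes from 4x4 to 2x2
```

The depth of the input is untouched here because pooling is applied to each channel separately.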

B. VGG16

The Single Shot Multibox Detector is based on a convolutional network. Its base network is VGG16. VGG16 is a pre-trained model: it has already been trained on a dataset and contains the weights and biases that represent the features of the dataset it was trained on. Learned features are often transferable to different data. Usually, when we use a pre-trained model, we keep only the convolutional part of the architecture and discard the fully connected layers. Therefore, in the Single Shot Multibox Detector, VGG16 is used as the base network without its fully connected layers, which are replaced by another network.

Fig. 1. Sliding Window

C. Single Shot Multibox Detector

Traditionally, image processing problems were limited to classification. However, there is a deeper problem: object detection. While classification is about predicting the label of the object present in an image, detection goes further and finds the locations of those objects. In image classification, we predict the probability of each class, while in object detection we also predict a bounding box containing the object of that class.

Fig. 2. Sliding Window

The Single Shot Multibox Detector has a good balance of accuracy and speed. As mentioned before, the base network for the Single Shot Multibox Detector is VGG16, which is used to extract feature maps. The layers that replace the fully connected layers, called extra feature layers, are the key to this architecture. The basic idea of object detection is the sliding window: we crop the patches contained in the window, resize them and feed them to a convolutional network. By repeating this process with a smaller window size, the detector can capture smaller objects. However, this requires a lot of computation. To reduce the computational cost, the Single Shot Multibox Detector uses default bounding boxes. Every feature map cell is associated with a set of default bounding boxes, which allows the detector to cover objects of different sizes and aspect ratios. The default boxes are manually chosen, and a default box is matched to a ground truth box when their intersection over union (IoU) is over 0.5.
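The IoU-based matching rule can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments. Boxes are assumed to be given as `(xmin, ymin, xmax, ymax)` tuples, the helper names `iou` and `match_default_boxes` are hypothetical, and only the threshold rule described above is shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def match_default_boxes(default_boxes, ground_truth_boxes, threshold=0.5):
    """Return (i, j) pairs where default box i overlaps ground truth j
    with IoU above the threshold."""
    matches = []
    for i, default_box in enumerate(default_boxes):
        for j, truth_box in enumerate(ground_truth_boxes):
            if iou(default_box, truth_box) > threshold:
                matches.append((i, j))
    return matches


# Toy boxes (illustrative values only).
default_boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (0, 0, 4, 4)]
ground_truth_boxes = [(1, 1, 10, 10)]
matches = match_default_boxes(default_boxes, ground_truth_boxes)
```

In this toy case only the first default box overlaps the ground truth box strongly enough to count as a positive match. The other two default boxes would be treated as negatives.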

Fig. 3. Single Shot Multibox Detector Architecture

Since object detection solves both a classification and a localization problem, we should consider two types of loss: confidence loss and location loss. The confidence loss measures how accurately the network distinguishes the object's class. Cross-entropy is used to compute this loss. The location loss measures how far the network's predicted bounding boxes are from the ground truth boxes in the training set. The Single Shot Multibox Detector uses a smooth L1 loss for this. Let $x_{ij}^{p} \in \{1, 0\}$ be an indicator for matching the i-th default box to the j-th ground truth box of category p, and let c denote the class confidences. The confidence loss is the softmax loss over the confidences of multiple classes.

The confidence loss is calculated as follows.

$$ L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log(c_{i}^{p}) - \sum_{i \in Neg} \log(c_{i}^{0}) \tag{1} $$
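A minimal sketch of Eq. (1) in plain Python is given below. It assumes that each default box has a list of raw class scores with index 0 reserved for the background class, and that matched (positive) boxes carry their class index while unmatched (negative) boxes carry label 0. The function names and this data layout are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into confidences that sum to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_loss(class_scores, labels):
    """Softmax cross-entropy summed over positive and negative boxes.

    class_scores: one list of raw scores per default box (index 0 = background).
    labels: matched class index per box; 0 marks a negative box.
    """
    loss = 0.0
    for scores, label in zip(class_scores, labels):
        confidences = softmax(scores)
        # Positives add -log(c_i^p); negatives add -log(c_i^0).
        loss -= math.log(confidences[label])
    return loss


# One positive box (class 1) and one negative box, both with uniform scores.
example_loss = confidence_loss([[0.0, 0.0], [0.0, 0.0]], [1, 0])
```

For this single-class detector there are two entries per box: background and red blood cell.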

Here l denotes the predicted box parameters and g the ground truth box parameters. The location loss is calculated as follows.

$$ L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \mathrm{smooth}_{L1}(l - g) \tag{2} $$

The total loss is the sum of the confidence loss and the location loss weighted by a term α, normalized by the number of matched boxes N.

$$ L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right) \tag{3} $$
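The location loss and the total loss can be sketched in the same style. The smooth L1 function below uses its standard definition (quadratic near zero, linear elsewhere). The sketch applies it to the raw difference between predicted and ground truth `(cx, cy, w, h)` values of matched boxes, which is a simplification. The value `alpha=1.0` and the function names are assumptions, since the weight used in the experiments is not stated.

```python
def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    magnitude = abs(x)
    return 0.5 * x * x if magnitude < 1.0 else magnitude - 0.5


def location_loss(predicted_boxes, ground_truth_boxes):
    """Sum of smooth L1 over (cx, cy, w, h) for each matched box pair."""
    return sum(
        smooth_l1(l - g)
        for predicted, truth in zip(predicted_boxes, ground_truth_boxes)
        for l, g in zip(predicted, truth)
    )


def total_loss(conf_loss, loc_loss, num_matched, alpha=1.0):
    """Weighted sum of both losses, normalized by the number of matches."""
    if num_matched == 0:
        return 0.0  # no positive boxes: avoid dividing by zero
    return (conf_loss + alpha * loc_loss) / num_matched


# One matched pair (illustrative values only).
predicted = [(0.5, 0.5, 1.0, 1.0)]
truth = [(0.0, 0.0, 1.0, 3.0)]
loc = location_loss(predicted, truth)        # 0.125 + 0.125 + 0.0 + 1.5
combined = total_loss(2.0, loc, num_matched=1)
```

The quadratic region keeps gradients small for boxes that are already close to the ground truth, while the linear region limits the influence of large errors.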

IV. EXPERIMENTS/RESULTS

A. Data

To train a model with the Single Shot Multibox Detector, we need the object images, their categories and their bounding boxes (x, y coordinates). The data used for this project comes from a Kaggle dataset. This project only detects red blood cells, so the single class category is "Red Blood Cell". Each image contains multiple red blood cells, so each image has more than four bounding boxes. In total, 200 images with 1,453 labels are used as input data, divided into training data and test data with an 8:2 ratio. An example of the input data is shown below.
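The 8:2 split can be sketched as below. The image identifiers, the fixed random seed and the function name `split_dataset` are hypothetical. The paper does not state how the split was drawn, so a shuffled split is assumed here.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# 200 placeholder image identifiers standing in for the dataset.
image_ids = ["image_%03d" % i for i in range(200)]
train_ids, test_ids = split_dataset(image_ids)
```

With 200 images this gives 160 training images and 40 test images. Splitting by image rather than by label keeps all bounding boxes of one image in the same subset.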

Fig. 4. Input data example

B. Experiments

The TensorFlow Object Detection API provides several different models for object detection. All models are pre-trained on the COCO dataset, a photo collection of common objects. The models are provided in two variants: MobileNet and Inception. The difference is that MobileNet uses depthwise separable convolutions while Inception uses standard convolutions. MobileNet is lighter than Inception because it is designed for mobile applications. Since I trained the model on a local computer, MobileNet was used for detection training. The model detects a single object type, blood cells, so the number of classes is one. I set the batch size to 24, because a batch size that is too large causes a memory error. The initial learning rate is 0.004 and it decreases every epoch. It is often useful to reduce the learning rate as training progresses because it helps the model converge slowly without overshooting the optimum.
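A decaying learning rate of this kind can be sketched as an exponential schedule. The initial rate of 0.004 is the value stated above. The decay factor of 0.95 and the function name `learning_rate` are assumptions for illustration, since the actual schedule is not given.

```python
def learning_rate(epoch, initial_rate=0.004, decay_factor=0.95):
    """Exponentially decayed learning rate after a number of epochs.

    decay_factor is a hypothetical value, not the one used in the paper.
    """
    return initial_rate * decay_factor ** epoch


# Learning rates for the first five epochs.
schedule = [learning_rate(epoch) for epoch in range(5)]
```

Each epoch multiplies the previous rate by the same factor, so the steps become smaller as training approaches an optimum.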

Fig. 5. Test Data 1: (a) test image; (b) bounding boxes by detector

Fig. 6. Test Data 2: (a) test image; (b) bounding boxes by detector

Fig. 7. Loss: (a) classification loss; (b) localization loss; (c) total loss

C. Results

I trained the model for 14,000K epochs over one day. The final results show some interesting features. First, the final image shows more bounding boxes than before training. In terms of accuracy, the predicted bounding boxes are accurate. In terms of detecting bounding boxes, the detector shows better results than the human annotation. In Figures 4 and 5(a), there are only four bounding boxes even though there are more blood cells in the image. With the Single Shot Multibox Detector, however, more red blood cells are found. This implies that a cell detector trained with the Single Shot Multibox Detector algorithm can outperform a human observer. It can be applied to help humans by finding features that are hard to distinguish with the human eye.

In Figure 4(b), the detector labeled the wrong cell as a red blood cell. I can suggest a likely reason for this result. The input images were limited to blood cells, but the blood cell images do not all look the same. Some cells have a spot in the middle while others do not. In Figure 4(b), the wrongly detected cell, which is not a red blood cell, also has a spot in it. Inconsistent input images may cause the detector to label this cell as a red blood cell. This also shows the risk of detection by machine.

V. CONCLUSIONS

This paper is about finding the blood cells in an image with the Single Shot Multibox Detector. Object detection is modeled as a classification problem, but it also includes a localization problem. Therefore, it predicts not only an object's class but also its location in the image. There are several state-of-the-art methods for object detection. Among these methods, the Single Shot Multibox Detector is faster and more accurate than the others. Using the Single Shot Multibox Detector algorithm, I trained a model to detect the red blood cells in an image. This detector found more cells than the human observer. I would like to improve this work in two directions. The first is extending it to a multi-label model. Since the cell detector was trained with a single label, it has difficulty distinguishing two different but similar types of cells, as in Figure 4. If the model is trained with multiple labels, it should distinguish which cells are red blood cells and which are not. Additionally, in the terms of the related work, the approach is still not a fully deep learning algorithm. In the process of annotating bounding boxes around cells, much of the work depends on hand-crafted feature representation. This process carries risk and limitations. The model could be improved by using a fully deep learning approach or by applying other machine learning algorithms.

REFERENCES

[1] Hui Kong, Hatice Cinar Akakin, and Sanjay E. Sarma. A generalized Laplacian of Gaussian filter for blob detection and its applications. IEEE Transactions on Cybernetics, 43:1719-1733, 2013.

[2] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29:51-59, 1996.

[3] Yousef Al-Kofahi, Wiem Lassoued, William Lee, and Badrinath Roysam. Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE Transactions on Biomedical Engineering, 57:841-852, 2010.

[4] Yao Xue and Nilanjan Ray. Cell Detection in Microscopy Images with Deep Convolutional Neural Network and Compressed Sensing.

[5] Ting Chen and Christophe Chefd'hotel. Deep Learning Based Automatic Immune Cell Detection for Immunohistochemistry Images.

[6] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single Shot MultiBox Detector.

[7] https://github.com/datitran/raccoon_dataset

[8] https://github.com/tensorflow/models/tree/master/research/object_detection
