
High Throughput Automatic Muscle Image Segmentation Using Cloud Computing and Multi-core Programming

Zizhao Zhang1, Fuyong Xing2, Fujun Liu2, Lin Yang2,3

1 Dept. of Computer and Information Science and Engineering, University of Florida

2 Dept. of Electrical and Computer Engineering, University of Florida

3 J. Crayton Pruitt Family Dept. of Biomedical Engineering, University of Florida

Abstract. Automatic segmentation of skeletal muscle cell images is crucial for the diagnosis of many muscle diseases. It avoids manual image analysis, which is labor intensive, and improves objectivity and reproducibility. Many state-of-the-art algorithms have been presented for automatic muscle cell segmentation and have achieved great success. However, most approaches exhibit high model complexity and time cost, and they do not scale to large images such as whole slide digitized specimens. In this paper, we propose a novel distributed computing framework, which adopts both data and model parallelism, for fast muscle cell segmentation using cloud computing. In a master-worker parallelization scheme, the image data is distributed onto multiple workers based on the Spark cloud computing platform. On each worker, we develop a parallelized hierarchical tree based region selection algorithm to efficiently segment muscle cells using multi-core techniques. Compared with the standalone implementation, the proposed method achieves more than 10 times speed improvement on very large-scale muscle images containing hundreds of cells while producing promising segmentation results.

1 Introduction

Skeletal muscle has been extensively recognized as a tissue related to many diseases such as heart failure and chronic obstructive pulmonary disease (COPD). To accelerate disease diagnosis at the cellular level and reduce inter-observer variations, there exist increasing demands for accurate and efficient computer-aided muscle image analysis. Automatic muscle cell segmentation is usually the first step for further image feature quantification. In recent years, a large number of state-of-the-art algorithms have been reported in the literature for skeletal muscle image segmentation. Liu et al. [11] proposed a deformable model-based segmentation algorithm which achieves impressive performance on muscle image patches, and a region-based selection method dealing with low quality muscle cell images was later presented in [13]. Due to their high model complexity, these methods are computationally expensive on large-scale images (e.g. 4000 × 4000).

2 Z. Zhang, et al.

[Figure 1: plot of time (s), axis from 0 to 800, for three steps: contour detection, segments initialization, and hierarchical tree based region selection.]

Fig. 1: The time profile for each step of the entire proposed segmentation algorithm running on a standalone machine with a 6000 × 6000 image. The hierarchical tree based region selection step dominates the running time.

Recently, there has been encouraging evidence that running medical image analysis [16,18] on high performance computing resources can significantly reduce the running time of the algorithms. Meanwhile, analyzing whole slide images can provide much richer information, which is helpful for clinical diagnosis [6]. Therefore, there is an urgent need for efficient image analysis algorithms that can handle large-scale data. High performance computing techniques have emerged as one solution to this challenge, and have attracted a great deal of research interest in medical image analysis [15,18,8]. In particular, we have successfully applied a cloud computing framework [16,17] to content-based subimage retrieval on whole slide tissue microarray images, and another application is reported in [18] for high throughput landmark based image registration. Although many high performance computing applications in medical image analysis have been presented in the recent literature, there exist very few reports focusing on cell segmentation.

In this paper, we propose an efficient muscle cell segmentation algorithm using the Spark cloud computing platform on very large muscle images. It consists of three steps: 1) muscle cell contour detection using random forest; 2) region candidate generation using superpixel techniques; 3) hierarchical tree based region selection for cell segmentation. Based on the time profile in Figure 1, which indicates that region selection dominates the running time (accounting for around 94%), we focus on accelerating this step using both data and model parallelism. Specifically, a master-worker parallelization scheme is exploited to distribute image data using Spark, and a parallel hierarchical tree based region selection algorithm is applied to cell segmentation with multi-core techniques on each worker.


2 Contour Detection and Region Candidate Generation

We present the proposed cell segmentation framework in this section, and discuss the parallel implementation in Section 3. Effective contour detection is the first step of most region-based image segmentation methods [5,2], and in this paper we propose a structured random forest algorithm that extracts the object contours without being affected by texture variations. Next, a superpixel algorithm is exploited to generate region candidates using the oriented watershed transform (OWT) and ultra-metric contour map (UCM) [1]. Finally, we present a hierarchical tree based method to select the optimal regions for cell segmentation.

2.1 Contour Detection

The random forest (RF) classifier is an ensemble learning technique which combines $t$ decision trees to form a forest $F = \{T_j\}_{j=1}^{t}$. Each tree $T_j$ is trained independently in a recursive manner, and the final classification is determined by applying majority voting to the outputs of all trees in $F$. RF is a fast classifier that satisfies the real-time requirement in video or large image processing applications, and it has been successfully applied to contour detection with state-of-the-art performance [10,5].

In order to consider the structures in the label space, we propose to exploit structured random forest (SRF) [4], a variation of the conventional RF classifier, to detect the muscle cell boundaries (contours). SRF is trained with a set of training data $D$ composed of $N$ image patches $X = \{x_1, x_2, ..., x_N\}$, $x_n \in \mathbb{R}^{d \times d \times c}$, and the corresponding labels $Y = \{y_1, y_2, ..., y_N\}$, where $D \subset X \times Y$. In our implementation, $x_n$ is a $d \times d \times c$-dimensional feature representation extracted from a $d \times d$ image patch, where each pixel is represented as a $c$-dimensional vector. In the conventional RF, the label is determined by the membership of the center pixel of the corresponding patch (1 means it is on a cell edge, otherwise 0). Instead of using this simple strategy, SRF extracts a structured label $g_n \in \mathbb{Z}^{d \times d}$ for each patch $x_n$, which considers the neighborhood information around its center pixel. At node $i$ of a tree during training, a mapping function is used to transform $g_n$ to the discrete label $y_n$ [4] by considering the structure information of all data in $D_i$. A split function $h(x, \theta) = \mathbb{1}[x(k) < \tau]$ then splits the data $D_i \subset X \times Y$ into the left ($L$) or right ($R$) subtree of node $i$. The threshold $\tau$ is determined by maximizing the standard information gain criterion $C_i$ at node $i$:

$$C_i = H(D_i) - \sum_{o \in \{L,R\}} \frac{|D_i^o|}{|D_i|} H(D_i^o), \qquad (1)$$

where $H(D_i) = \sum_y c_y (1 - c_y)$ is the Gini impurity measure, with $c_y$ denoting the proportion of data in $D_i$ with label $y$. The split function $h(x, \theta)$ sends the data in $D_i$ into the left subtree if $h(\cdot) = 1$ and into the right subtree otherwise.
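The split criterion in Equation (1) can be illustrated with a minimal sketch. It assumes a single scalar feature per sample and plain integer labels; the helper names `gini`, `information_gain`, and `best_threshold` are hypothetical and are not taken from the SRF implementation in [4].

```python
from collections import Counter


def gini(labels):
    """Gini impurity H(D) = sum_y c_y * (1 - c_y)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((k / n) * (1 - k / n) for k in Counter(labels).values())


def information_gain(values, labels, tau):
    """C = H(D) - sum over o in {L, R} of |D^o| / |D| * H(D^o)."""
    # Samples with value < tau go left, matching h(x, theta) = 1[x(k) < tau].
    left = [y for v, y in zip(values, labels) if v < tau]
    right = [y for v, y in zip(values, labels) if v >= tau]
    n = len(labels)
    weighted = sum(len(part) / n * gini(part) for part in (left, right))
    return gini(labels) - weighted


def best_threshold(values, labels):
    """Pick tau among midpoints of consecutive distinct feature values."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda tau: information_gain(values, labels, tau))


# Toy data (made up for illustration): low values are background, high are edge.
values = [1, 2, 3, 7, 8, 9]
labels = [0, 0, 0, 1, 1, 1]
tau = best_threshold(values, labels)
gain = information_gain(values, labels, tau)
```

On this toy data the best threshold separates the two classes perfectly, so the gain equals the impurity of the parent node.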

Following the suggestions in [4], we use 3 color channels, 2 magnitude channels, and 8 orientation channels to represent the feature $x_n$, such that in total we have $c = 13$ channel features for each pixel. In the training stage, each tree randomly selects a subset of training samples and features to prevent overfitting, and the most representative structural label $g_i^\star$ at node $i$ is recorded for classification [3]. In the testing stage, the label of each pixel is determined by voting over the $d \times d \times t$ binary decisions (patches centered at all pixels need to be classified). Since the classification of each pixel is independent, we can parallelize this stage using a multi-thread programming technique.
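As a rough illustration of the voting step, the sketch below averages overlapping binary patch predictions into a per-pixel edge score. It is a simplification under stated assumptions: patches are indexed by their top-left corner rather than their center, and the function name `vote_edges` and its input layout are hypothetical.

```python
def vote_edges(height, width, d, patch_predictions):
    """Average overlapping d x d binary patch predictions into an edge map.

    patch_predictions maps a patch's top-left corner (row, col) to a list of
    d x d binary masks, one mask per tree in the forest.
    """
    votes = [[0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for (top, left), masks in patch_predictions.items():
        for mask in masks:
            for i in range(d):
                for j in range(d):
                    votes[top + i][left + j] += mask[i][j]
                    counts[top + i][left + j] += 1
    # Fraction of decisions that marked each pixel as an edge.
    return [[votes[r][c] / counts[r][c] if counts[r][c] else 0.0
             for c in range(width)] for r in range(height)]


# Two overlapping 2 x 2 patches on a 2 x 3 image, one tree each (toy data).
predictions = {
    (0, 0): [[[0, 1], [0, 1]]],
    (0, 1): [[[1, 0], [1, 0]]],
}
edge_map = vote_edges(height=2, width=3, d=2, patch_predictions=predictions)
```

Because every pixel accumulates its votes independently, the outer loops are the part that a multi-threaded implementation would split across threads.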

2.2 Region Candidate Generation

Given a detected contour image, superpixels (or region candidates) can be generated to group similar pixels in terms of color and spatial configuration. In this way, the computation cost of subsequent segmentation can be greatly reduced. There exist numerous algorithms for superpixel creation; in this paper we choose the oriented watershed transform and ultra-metric contour map (OWT-UCM) [1] for three main reasons: 1) OWT-UCM handles large-scale images very efficiently; 2) regions in a UCM image are well nested at different thresholds; 3) OWT-UCM guarantees that the boundaries of regions are closed and single-pixel wide, and each edge (boundary arc separated by branch points) has the same intensity value. These characteristics facilitate the parallelization of the subsequent hierarchical tree based region selection method.

2.3 Hierarchical Tree-based Region Selection

Based on the generated region candidates, we formulate cell segmentation as a region selection problem and solve it using a hierarchical tree, a graph-based segmentation approach [14]. The leaf nodes represent the initial small regions generated by the aforementioned superpixel algorithm, and the root node corresponds to the whole image. Each node has a label $l$ and its conditional probability is $P(l|p, w) \propto \exp(-E(l|p, w))$, where $p$ is the feature vector recorded in the corresponding tree node. $E(l|p, w)$ is often defined as the negative inner product:

$$E(l|p, w) = -\langle w, \phi(p, y) \rangle, \qquad (2)$$

where $\phi(p, y)$ is the concatenation of the features of all selected regions [14]. During the optimization, we aim to calculate the maximum a posteriori (MAP) estimate as

$$\arg\max_{l} P(l|p, w) = \arg\min_{l} E(l|p, w). \qquad (3)$$

Intuitively, the objective of Equation (3) is to search for a combination of regions that maximizes the conditional probability. The MAP estimate can be efficiently computed using dynamic programming or loopy belief propagation algorithms. Unlike flat graph-based segmentation methods, the non-overlapping constraints need to be considered when optimizing Equation (3) [9].
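A minimal sketch of how dynamic programming can respect the non-overlapping constraint on a region tree is given below. It is a simplification under stated assumptions, not the inference procedure of [14] or [12]: each node carries a single hypothetical score, every leaf must be covered by exactly one selected region, and the total score of the selection is maximized. The names `Node` and `select_regions` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    score: float  # hypothetical cell-likeness score of this region
    children: List["Node"] = field(default_factory=list)


def select_regions(node: Node) -> Tuple[float, List[str]]:
    """Return (best total score, selected region names) for a subtree.

    A region and its descendants overlap, so at each node we either keep the
    region whole or keep the best selection made inside its children.
    """
    if not node.children:
        return node.score, [node.name]
    child_total = 0.0
    child_names: List[str] = []
    for child in node.children:
        total, names = select_regions(child)
        child_total += total
        child_names.extend(names)
    if node.score >= child_total:
        return node.score, [node.name]
    return child_total, child_names


# Toy tree: region "a" is better kept whole, region "b" is better split.
tree = Node("root", 0.2, [
    Node("a", 0.9, [Node("a1", 0.3), Node("a2", 0.4)]),
    Node("b", 0.1, [Node("b1", 0.6), Node("b2", 0.7)]),
])
best_score, selected = select_regions(tree)
```

Each node is visited once, so the cost is linear in the number of tree nodes once the per-node scores are available.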

In our implementation, we build the hierarchical tree as a dendrogram using the region-wise distance of the UCM map (see Figure 2 (b)).


[Figure 2: (a) a master distributing image tiles to workers, each using cores 1 to n; (b) close-up patches and the dendrogram with a top-to-bottom cut.]

Fig. 2: (a): The partially overlapped tiles (left muscle image) are distributed to workers as tasks. The returned segmentation results are combined to generate the right image. (b): Close-up patches of the test image in (a) are shown. The four patches are the original image, the edge image, the initial UCM image and the UCM image cut by a high threshold. The initial UCM is built into a tree (in dendrogram structure). Each region in the high-thresholded UCM image is separated into small trees using the region-wise distance computed using the edge image. The hierarchical tree based inference algorithm is parallelized using multi-core techniques.

Thereafter, we exploit a max-product algorithm to calculate the optimal value of Equation (3). Meanwhile, we compute $P(l|p, w)$ using a binary support vector machine (SVM) classifier, where $w$ denotes the parameters of the SVM hyperplane, and calculate the feature $p$ by following [12].

3 Parallel Muscle Image Segmentation

3.1 Data Distribution Using Spark

Due to the extremely high resolution of muscle images, processing a whole image on a standalone machine takes a large amount of running time. Since the segmentation of each cell is independent of the others, we propose to divide the image into multiple partially overlapped tiles and distribute them onto multiple workers for concurrent processing, as shown in Figure 2 (a). To this end, we implement this parallel strategy in a master-worker manner with the Spark cloud computing platform [20]. In comparison with other distributed computing frameworks, Spark has the following advantages: 1) it has a flexible cluster management mechanism such that a parallel system can be easily built and run on local clusters; 2) it uses the Resilient Distributed Datasets (RDDs) abstraction [19] to perform in-memory computations, which is suitable for image processing applications; 3) it exhibits strong compatibility, supporting multiple standard programming languages and avoiding the need to learn new languages.

The Spark-based parallel segmentation algorithm consists of four steps: 1) Data preparation: given a test muscle image $I$, we compute the contour image $O$ with SRF and the UCM map $U$ with OWT-UCM, and load the learned SVM model $M$ (see Section 2) at the master machine; 2) Data distribution: the data $D = (I, O, U, M)$ is divided into $w$ tiles $D_1, ..., D_w$ (the SVM model is a broadcast variable), where $D_k = (I_k, O_k, U_k, M)$ for $k = 1, ..., w$, and the master dynamically maps the tiles to several workers using a user-defined map function; 3) Segmentation: the proposed cell segmentation algorithm is executed via another map function on each worker using multi-core techniques; 4) Data collection: the segmentation results returned by each worker are collected to form the final segmentation. To avoid the loss of cells crossing the stitching positions of different tiles, we simply pad the tiles so that neighboring tiles partially overlap. In order to reduce the overhead of data transfer between master and workers and alleviate the extra cost of combining results returned from workers, we only require workers to return masked binary images.
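The padding of tiles described above can be sketched in a few lines. This is only an illustration of the overlap bookkeeping: the function name `tile_boxes`, the 2000-pixel tile size, and the 150-pixel padding per side are assumptions chosen for the example and are not taken from the paper's implementation.

```python
def tile_boxes(height, width, tile, overlap):
    """Return (top, left, bottom, right) boxes that cover the image.

    Each core tile of size `tile` is padded by `overlap` pixels on every side
    and clipped to the image, so neighboring boxes share a band of pixels.
    """
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((
                max(top - overlap, 0),
                max(left - overlap, 0),
                min(top + tile + overlap, height),
                min(left + tile + overlap, width),
            ))
    return boxes


# A 4600 x 3800 image (width x height) cut into padded 2000-pixel tiles.
boxes = tile_boxes(height=3800, width=4600, tile=2000, overlap=150)
```

In a Spark setting, a list of such boxes (together with the cropped image, contour, and UCM data) could be turned into an RDD with `SparkContext.parallelize` and processed with `map`, with the results gathered by `collect`; that wiring is omitted here to keep the sketch free of cluster dependencies.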

With data-level parallelization, we can speed up the segmentation algorithm by no more than K times (because of data communication overhead) with K nodes in the cluster. To further speed up our segmentation algorithm, we propose to parallelize the hierarchical tree inference algorithm, as discussed in the next section.

3.2 Hierarchical Inference in Parallel

In order to further speed up the proposed method, we parallelize the hierarchical tree inference algorithm. The algorithm is mainly composed of: 1) building a tree structure using the UCM image $U$; 2) extracting the feature $p$ of each tree node; 3) computing $P(l|p, w)$ for each $p$; and 4) calculating the optimal solution of Equation (3). Based on our experiments, we observe that steps 2 and 3 dominate the time cost, since the number of nodes and the depth of the tree grow with the number of cells in a muscle image. However, it is unnecessary to build a very deep tree for a whole image with hundreds of cells. Instead, we can cut the tree from top to bottom by the region-wise distance computed from the detected contour image $O$ and the UCM image $U$ (see Figure 2 (b)). The tree is thereby separated into several subtrees, and the inference processes of the subtrees are independent of each other. We parallelize the inference algorithm using a multi-core programming technique on a single machine.
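The top-to-bottom cut and the independent processing of subtrees can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Node` class, the `height` field standing in for the region-wise distance, the threshold value, and the `count_nodes` stand-in for per-subtree inference are all hypothetical. A thread pool is used for portability; in CPython a process pool (`ProcessPoolExecutor`) would be the usual choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    height: float  # region-wise distance at which this node's children merge
    children: List["Node"] = field(default_factory=list)


def cut(node: Node, threshold: float) -> List[Node]:
    """Descend from the root and return the roots of subtrees at or below threshold."""
    if node.height <= threshold or not node.children:
        return [node]
    roots: List[Node] = []
    for child in node.children:
        roots.extend(cut(child, threshold))
    return roots


def count_nodes(node: Node) -> int:
    """Stand-in for per-subtree inference (feature extraction, scoring, MAP)."""
    return 1 + sum(count_nodes(c) for c in node.children)


tree = Node("root", 0.9, [
    Node("A", 0.4, [Node("a1", 0.0), Node("a2", 0.0)]),
    Node("B", 0.8, [Node("B1", 0.3, [Node("b1", 0.0), Node("b2", 0.0)]),
                    Node("b3", 0.0)]),
])
subtrees = cut(tree, threshold=0.5)

# Subtrees share no nodes, so each one can be handled by a separate worker.
with ThreadPoolExecutor(max_workers=2) as pool:
    sizes = list(pool.map(count_nodes, subtrees))
```

A higher threshold yields fewer, larger subtrees, while a lower one yields many small subtrees, which mirrors the granularity trade-off seen at the tile level.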

4 Results

To evaluate the proposed parallel algorithm, we build a small cluster using 8 Linux machines, each with 6 cores (Intel [email protected] × 6) and 32 GB RAM. In total we create a cloud system with 48 cores (or nodes) and 256 GB RAM. To avoid losing cells crossing the stitching positions of neighboring tiles, we set a 300 × 300 overlapped region size.


[Figure 3: (a) time (s), axis from 0 to 450, versus number of nodes (1, 2, 4, 6, 9, 12, 16) for Spark; (b) time (s), axis from 0 to 500, versus image size (1X to 5X) for Standalone and Spark.]

Fig. 3: (a): The running time using different numbers of nodes on Spark. (b): The comparison of running time between the proposed parallel method and the standalone version. The x-axis is the image size (1X = 1000 × 800 pixels).

4.1 Efficiency Evaluation

The parallelism of the proposed method has two levels: data level parallelism using cloud computing and model level parallelism using multiple cores. Based on our observation, there is a trade-off between the size of a tile and the number of tiles (each tile is a task distributed to a node) that need to be processed in parallel. Given a test image, the more tiles we have, the smaller each tile is. If the tile size is too small, the computational load is too light to fully exploit the multi-core parallel hierarchical tree region selection algorithm. Meanwhile, a large number of tiles incurs too much data communication cost. On the other hand, our model level parallelism may have resource (core) conflicts with data level parallelism. In practice we ensure that only 2 cores of each worker are used in the cluster, and thus in total we have a maximum of 16 nodes.

In Figure 3 (a), we visualize the time cost using different numbers of nodes in the cluster with a 4600 × 3800 test image. As the number of nodes increases, the time cost drops dramatically. We achieve a significant speed improvement when the number of nodes increases from 1 to 8, but the decrease in time is not obvious from 9 to 12. This is attributed to the trade-off between the size and the number of image tiles: the time cost for data communication gradually increases as the tile size decreases. In Figure 3 (b), we compare the time cost between the Spark based parallel segmentation and the standalone machine based algorithm. We obtain more than 10 times speedup at 5X image size.

4.2 Segmentation Performance

To evaluate segmentation performance, we report precision, recall and F1 score. The details of the evaluation method can be found in [13].
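For reference, the three metrics follow the standard definitions based on counts of true positives, false positives, and false negatives. The sketch below assumes these counts are already available; how a segmented cell is matched to a ground-truth cell is defined in [13] and is not reproduced here, and the function name and the counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for one image: 80 correct cells, 20 spurious, 20 missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```

Because F1 is a harmonic mean, a method with high precision but low recall is penalized more than a simple average of the two would suggest.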

Figure 4 visualizes the segmentation results on three H&E skeletal muscle image samples, which exhibit significant variations in cell sizes, shapes and appearances. It is clear that the proposed algorithm can accurately segment out each individual cell. We compare the proposed parallel muscle image segmentation algorithm with two state-of-the-art image segmentation algorithms: 1) gPb [1] is an edge-based image segmentation algorithm which has been widely used in the image segmentation field. Its major drawback is its low efficiency; 2) Isoperimetric graph partitioning (ISO) [7] produces high quality segmentations as a spectral method with improved speed and stability. As shown in Table 1, our algorithm achieves the highest F1-score on our muscle images, although ISO and gPb achieve higher precision. Compared with those algorithms, our algorithm achieves substantially improved recall.

Fig. 4: The segmentation results of three H&E skeletal muscle images. The left column shows the original images and the right column shows the corresponding segmentation mask images.

Table 1: The comparison results of state-of-the-art image segmentation algorithms.

Method      F1-score (%)      Precision (%)     Recall (%)
            mean    std       mean    std       mean    std
ISO [7]     80.50   0.0993    89.88   0.0589    74.29   0.1369
gPb [1]     79.04   0.0780    91.23   0.0515    70.11   0.0962
Proposed    87.24   0.0114    87.00   0.0032    87.47   0.0191

5 Conclusion

In this paper, we propose a novel parallel cell segmentation method using cloud computing on large-scale skeletal muscle images. We implement the algorithm on the Spark cloud computing platform in a master-worker parallelization manner. Image tiles in the master node are distributed to nodes in the cluster for data-level parallelism. At each worker, we parallelize the hierarchical tree inference algorithm for region selection using multi-core techniques. Experimental results indicate a more than 10 times speed improvement compared with the standalone version of the proposed segmentation method. Comparison results demonstrate promising segmentation results on our skeletal H&E muscle image dataset.

References

1. Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 33(5), 898–916 (2011)

2. Arbelaez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. pp. 328–335 (2014)

3. Dollar, P., Zitnick, C.: Fast edge detection using structured forests. Pattern Analysis and Machine Intelligence, IEEE Transactions on pp. 1–1 (2014)

4. Dollar, P., Zitnick, C.L.: Structured forests for fast edge detection. In: Computer Vision (ICCV), IEEE International Conference on. pp. 1841–1848 (2013)

5. Donoser, M., Schmalstieg, D.: Discrete-continuous gradient orientation estimation for faster image segmentation. In: Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. pp. 3158–3165 (2014)

6. Ghaznavi, F., Evans, A., Madabhushi, A., Feldman, M.: Digital imaging in pathology: whole-slide imaging and beyond. Annual Review of Pathology: Mechanisms of Disease 8, 331–359 (2013)

7. Grady, L., Schwartz, E.L.: Isoperimetric graph partitioning for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(3), 469–475 (2006)

8. Kagadis, G.C., Kloukinas, C., Moore, K., Philbin, J., Papadimitroulas, P., Alexakos, C., Nagy, P.G., Visvikis, D., Hendee, W.R.: Cloud computing in medical imaging. Medical Physics 40(7) (2013)

9. Lempitsky, V., Vedaldi, A., Zisserman, A.: Pylon model for semantic segmentation. In: Advances in Neural Information Processing Systems. pp. 1485–1493 (2011)

10. Lim, J.J., Zitnick, C.L., Dollar, P.: Sketch tokens: A learned mid-level representation for contour and object detection. In: Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. pp. 3158–3165 (2013)

11. Liu, F., Mackey, A., Srikuea, R., Esser, K., Yang, L.: Automated image segmentation of haematoxylin and eosin stained skeletal muscle cross-sections. Journal of Microscopy 252(3), 275–285 (2013)

12. Liu, F., Xing, F., Yang, L.: Robust muscle cell segmentation using region selection with dynamic programming. In: Biomedical Imaging (ISBI), IEEE International Symposium on. pp. 521–524 (2014)

13. Liu, F., Xing, F., Zhang, Z., Mcgough, M., Yang, L.: Robust muscle cell quantification using structured edge detection and hierarchical segmentation. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) (2015)

14. Uzunbas, M.G., Chen, C., Metaxas, D.: Optree: A learning-based adaptive watershed algorithm for neuron segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 97–105 (2014)

15. Van Aart, E., Sepasian, N., Jalba, A., Vilanova, A.: CUDA-accelerated geodesic ray-tracing for fiber tracking. Journal of Biomedical Imaging p. 6 (2011)

16. Xing, F., Qi, X., Foran, D.J., Kurc, T., Saltz, J., Yang, L.: Content-based parallel sub-image retrieval (2012)

17. Yang, L., Qi, X., Xing, F., Kurc, T., Saltz, J., Foran, D.J.: Parallel content-based sub-image retrieval using hierarchical searching. Bioinformatics 30(7), 996–1002 (2014)

18. Yang, L., Kim, H., Parashar, M., Foran, D.J.: High throughput landmark based image registration using cloud computing (2011)

19. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. pp. 2–2 (2012)

20. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. vol. 10, p. 10 (2010)
