
Automatic Extraction of Shape Features for Classification of Leukocytes

Ermai Xie, T. M. McGinnity, QingXiang Wu

Intelligent Systems Research Centre,

University of Ulster at Magee, Londonderry, BT48 7JL, N.I., UK,

{xie-e, tm.mcginnity, q.wu}@ulster.ac.uk, http://isrc.ulster.ac.uk/

Abstract— Microscope-based white blood cell classification plays an important role in diagnosing disease. The number of nuclear segments and the shape of those segments are regarded as important features. Because these features are difficult to extract automatically from a blood smear image, current automatic classifiers based on smear images do not use them. In this paper, an approach based on the Poisson equation is proposed to extract the number of nuclear segments in a more straightforward manner, and inner distances are used to represent the shape of each segment. The experimental results show that the proposed approach extracts these features efficiently. The features can be added to the input feature set of neural networks or other classifiers to improve the classification of leukocytes in blood smear images.

Keywords- Shape feature extraction, Poisson equation, Inner distance, Leukocyte classification

I. INTRODUCTION

Microscope-based white blood cell classification is still an important source of data for clinical cytology in pathology, even though blood cell analysis has been progressively developed using various new technologies. This process is usually performed by hematologists and can be slow and subjective. Therefore, an automatic classifier based on blood smear images is proposed to improve the speed and accuracy of classification.

There are five types of white blood cell, which can be divided into two categories: granulocytes and agranulocytes [2-5]. The granulocytic series includes the neutrophilic granulocyte (N), the eosinophilic granulocyte (E) and the basophilic granulocyte (B). The agranulocytic series includes the lymphocyte (L) and the monocyte (M). Typical examples of these types are shown in Figure 1: N has small granules in the cytoplasm and a single nucleus with a variable number of lobes; E has a bilobed nucleus and coarse cytoplasmic granules; B has many cytoplasmic granules overlying the nucleus; L has a round nucleus and no specific granules; M has a kidney-shaped nucleus and slightly basophilic cytoplasm. Table I summarizes the characteristic features of these cells (diameter, nuclear shape and granules).

Current approaches classify leukocytes by the color of the nucleus and cytoplasm [12] [13], which provides only limited accuracy. To achieve higher performance, most classifiers also use the size of the nucleus, the shape of the nucleus, the segmentation of the nucleus, the presence of granules in the cytoplasm and the cell structure [7] [8] [9]. However, shape characteristics are not straightforward for a computer to recognize automatically. Most approaches describe these shapes with a variety of geometric characteristics, e.g. circularity, concavity, convexity and principal axis ratio. Neural networks can also be used for classification [7]-[9]. Their results are good, but they do not use the number of segments or the shape of the segments as inputs, which makes it very hard to identify the various maturity stages of the cells. This paper focuses on extracting these two important shape features, which are then used as inputs to neural networks to improve classification accuracy. The proposed approach combines random walks, whose mean times can be obtained by solving a Poisson equation, with the inner distance to obtain these features.

The remainder of this paper is organized as follows. Section II outlines the methodology and algorithm, Section III presents the experimental results, and Section IV concludes the paper.

TABLE I. THE FEATURES OF LEUKOCYTES

| Feature | Neutrophil (granulocyte) | Eosinophil (granulocyte) | Basophil (granulocyte) | Monocyte (agranulocyte) | Lymphocyte (agranulocyte) |
|---|---|---|---|---|---|
| Diameter | 12-15 µm | 12-15 µm | 12-15 µm | 12-20 µm | 6-18 µm |
| Nucleus | U-shaped, S-shaped, 2-5 segmented | Segmented, bilobed | Poorly shown, S-shaped | Kidney-shaped | Round |
| Granules | Azurophilic granules; specific granules | Eosinophilic granules | Basophilic granules of different sizes | Basophilic bluish-gray | Scanty, light blue |

2010 International Conference on Artificial Intelligence and Computational Intelligence

978-0-7695-4225-6/10 $26.00 © 2010 IEEE

DOI 10.1109/AICI.2010.168


Figure 1. Examples of 5 types of human leukocytes.

II. METHODOLOGY AND ALGORITHM

Random walks are used in vision applications such as perceptual grouping and segmentation. In this paper, they are used to extract features from the contour of the nucleus. Each shape is assumed to be bounded by a simple closed contour and is divided into a grid. A set of particles is placed at each grid point inside the shape, and each particle moves by random walk until it hits the contour. The random walks from one grid point to the contour reflect the relationship between that point and the global shape. For each grid point, the mean time a particle takes to reach the contour, averaged over all possible paths, is recorded.

The mean time is high near the center of the shape and low near its contour (in this work, the value at every contour grid point is set to zero). Unlike the distance transform, which uses the minimum distance to the contour, the random walk reflects more global properties of the silhouette. This makes it well suited to analyzing the shape of the nucleus and to describing how the nucleus changes as the cell matures. Some examples are shown in Figure 2, where the images show successive stages of white blood cell development from left (immature stage) to right (mature stage); the nucleus changes from one segment to two. Simulating the random walks directly is computationally expensive because a large number of random paths is required, so the Poisson equation is employed instead.
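To make the idea concrete, the following Python sketch estimates the mean time by direct simulation, which is exactly the expensive approach that the Poisson equation replaces. It is a minimal illustration under stated assumptions, not the authors' code: the shape is a boolean pixel mask, every pixel outside the mask counts as contour, the walker moves to one of its four neighbours with equal probability in each time unit, and the function name `mean_hitting_time` is hypothetical.

```python
import random


def mean_hitting_time(mask, start, n_walks=20000, seed=0):
    """Monte Carlo estimate of the mean number of steps a random walker
    starting at `start` needs to reach the contour.

    mask  : list of lists of bools, True for pixels inside the shape
    start : (row, col) of an interior pixel
    Any pixel outside the mask (or outside the array) counts as contour.
    """
    rng = random.Random(seed)
    rows, cols = len(mask), len(mask[0])
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))  # one grid per time unit

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c]

    total = 0
    for _ in range(n_walks):
        r, c = start
        t = 0
        while inside(r, c):          # walk until the particle hits the contour
            dr, dc = rng.choice(moves)
            r, c = r + dr, c + dc
            t += 1
        total += t
    return total / n_walks
```

For a 1 × 3 strip of pixels, equation (1) gives exact mean times of 10/7, 12/7 and 10/7, so `mean_hitting_time([[True, True, True]], (0, 1))` should return a value close to 1.71; the statistical error shrinks only as the square root of `n_walks`, which is why a direct solver is preferred.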

Figure 2. Example results of the mean time distribution obtained from the solution of the Poisson equation. Bright points correspond to high values of the mean time.

A. Poisson Equation

The mean time of the random walk can be computed as the solution of a Poisson equation. Poisson's equation is a partial differential equation commonly used in electrostatics, mechanical engineering and theoretical physics.

Let $U(x,y)$ denote the mean time and $h$ the grid size. $U(x,y)$ can then be calculated for every grid point inside the silhouette using the following equation:

$$U(x,y) = 1 + \frac{1}{4}\big(U(x+h,y) + U(x-h,y) + U(x,y+h) + U(x,y-h)\big) \tag{1}$$

Here $U(x,y)$ equals the mean of its four immediate neighbors plus a constant (the constant 1 means that the random walk moves at a speed of one grid per time unit). Equation (1) is a discrete approximation of the Poisson equation, in which the Laplacian is approximated by

$$\Delta U(x,y) \approx \frac{U(x+h,y) + U(x-h,y) + U(x,y+h) + U(x,y-h) - 4U(x,y)}{h^2} \tag{2}$$

Consider a shape consisting of a single grid point surrounded by four contour grid points. With the Dirichlet boundary conditions, we have:

$$U(x-h,y) = U(x+h,y) = U(x,y-h) = U(x,y+h) = 0 \tag{3}$$

Based on the definition of equation (1), we have:

$$U(x,y) = 1 + \frac{1}{4}\big(U(x+h,y) + U(x-h,y) + U(x,y+h) + U(x,y-h)\big) = 1 + \frac{1}{4}(0+0+0+0) = 1 \tag{4}$$

Substituting this value into equation (2) gives:


$$\Delta U(x,y) \approx \frac{U(x+h,y) + U(x-h,y) + U(x,y+h) + U(x,y-h) - 4U(x,y)}{h^2} = -\frac{4}{h^2} \tag{5}$$

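Here $4/h^2$ is an overall scale factor that represents the speed of the random walk. Because this factor only rescales the solution uniformly, it is set to 1, in accordance with the unit constant in equation (1):

$$\frac{4}{h^2} = 1 \tag{6}$$

so that $\Delta U = -1$ inside the shape.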
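The single-pixel argument extends to every interior point: rearranging equation (1) with $h = 1$ gives $U(x+1,y)+U(x-1,y)+U(x,y+1)+U(x,y-1)-4U(x,y) = -4$, so by (2) the discrete Laplacian equals $-4$ everywhere inside the shape, not only in the one-pixel case. The NumPy sketch below checks this numerically and also shows that the scaling in (6) only rescales the solution: solving with a right-hand side of 1 instead of 4 gives exactly one quarter of the mean time. The Jacobi iteration and the names `solve_poisson` and `discrete_laplacian` are illustrative assumptions; the paper itself uses SOR.

```python
import numpy as np


def solve_poisson(mask, rhs=4.0, tol=1e-12, max_iter=100_000):
    """Jacobi iteration for 4U - (sum of the four neighbours) = rhs inside the
    shape, with U = 0 on the surrounding contour (grid size h = 1).

    rhs = 4 reproduces equation (1); rhs = 1 corresponds to Delta U = -1.
    Returns U on a grid padded by one contour pixel, and the padded mask.
    """
    inside = np.pad(np.asarray(mask, dtype=bool), 1)  # ring of zero-valued contour
    U = np.zeros(inside.shape)
    for _ in range(max_iter):
        nb = np.roll(U, 1, 0) + np.roll(U, -1, 0) + np.roll(U, 1, 1) + np.roll(U, -1, 1)
        U_new = np.where(inside, (rhs + nb) / 4.0, 0.0)
        done = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if done:
            break
    return U, inside


def discrete_laplacian(U):
    """Five-point approximation of equation (2) with h = 1."""
    return (np.roll(U, 1, 0) + np.roll(U, -1, 0)
            + np.roll(U, 1, 1) + np.roll(U, -1, 1) - 4.0 * U)
```

For `solve_poisson([[True]])` the single interior value is 1 and its discrete Laplacian is -4, matching equations (4) and (5); for any larger mask the Laplacian at interior points is also -4 once the iteration has converged.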

B. Solving the Poisson Equation

In this paper, each pixel is treated as one grid point, and successive over-relaxation (SOR) is applied to solve the Poisson equation. For a shape with $m$ pixels, the solution is given by the following $m \times m$ linear system:

$$[A][U] = [b] \tag{7}$$

where [U] is the vector of all unknowns of the Poisson equation in natural ordering:

$$[U] = [u_1, u_2, \ldots, u_m]^T \tag{8}$$

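and, for a rectangular region in natural ordering, [A] has the block-tridiagonal form

$$[A] = \begin{bmatrix} D & -I & 0 & \cdots & 0 & 0 \\ -I & D & -I & \cdots & 0 & 0 \\ 0 & -I & D & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -I & D & -I \\ 0 & 0 & \cdots & 0 & -I & D \end{bmatrix} \tag{9}$$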

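where $I$ is the identity matrix and $D$ is:

$$D = \begin{bmatrix} 4 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 4 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 4 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 & 4 & -1 \\ 0 & 0 & \cdots & 0 & -1 & 4 \end{bmatrix} \tag{10}$$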

The right-hand side [b] is:

$$[b] = [b_1, b_2, \ldots, b_m]^T \tag{11}$$

where $b_1, b_2, \ldots, b_m$ are determined by the Dirichlet boundary condition and the value of $\Delta U(x,y)$. In this paper, all contour grid values are set to 0 and $\Delta U(x,y) = -1$ everywhere inside the shape, so that

$$b_1 = b_2 = \cdots = b_m \tag{12}$$

For convenience, we set $b_1 = b_2 = \cdots = b_m = 1$. Algorithm 1 converts a binary image into [A]; pixels are numbered in raster order, so x indexes rows and y indexes columns.

Algorithm 1: Converter
1. Load a binary image
2. count = 0
3. For x = 1 to height
4.   For y = 1 to width
5.     If pixel (x, y) is not blank
6.       count = count + 1
7.       A(count, count) = 4
8.       Record count as the order number of pixel (x, y)
9.       If the left neighbor (x, y-1) is not blank (its order number is count-1)
10.        A(count-1, count) = -1
11.        A(count, count-1) = -1
12.      If the upper neighbor (x-1, y) is not blank
13.        A(upper neighbor's order number, count) = -1
14.        A(count, upper neighbor's order number) = -1
15.  End for
16. End for
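A minimal NumPy sketch of the converter, assuming the input is a boolean mask indexed as `mask[row, column]`. Because pixels are numbered in raster order, the left and upper neighbours of a pixel are always numbered before it, so the sketch looks up both order numbers in an `order` array. It builds a dense [A] for readability; a real nucleus with thousands of pixels would call for a sparse matrix. The names `build_system` and `block_tridiagonal` are hypothetical; the second one builds equations (9) and (10) directly for a full rectangle so that the two constructions can be compared.

```python
import numpy as np


def build_system(mask):
    """Number the shape pixels in raster order and assemble the m x m matrix
    [A] (4 on the diagonal, -1 for each pair of 4-neighbours) and [b] = 1."""
    mask = np.asarray(mask, dtype=bool)
    height, width = mask.shape
    order = -np.ones((height, width), dtype=int)   # order number, -1 = blank
    count = 0
    for x in range(height):                        # x indexes rows
        for y in range(width):                     # y indexes columns
            if mask[x, y]:
                order[x, y] = count
                count += 1
    A = np.zeros((count, count))
    for x in range(height):
        for y in range(width):
            k = order[x, y]
            if k < 0:
                continue
            A[k, k] = 4.0
            # left and upper neighbours were numbered earlier in raster order
            for nx, ny in ((x, y - 1), (x - 1, y)):
                if nx >= 0 and ny >= 0 and mask[nx, ny]:
                    j = order[nx, ny]
                    A[j, k] = A[k, j] = -1.0
    return A, np.ones(count)


def block_tridiagonal(rows, cols):
    """[A] from equations (9)-(10) for a full rows x cols rectangle."""
    def offdiag(n):                                # ones next to the diagonal
        return np.eye(n, k=1) + np.eye(n, k=-1)
    D = 4.0 * np.eye(cols) - offdiag(cols)         # equation (10)
    return np.kron(np.eye(rows), D) - np.kron(offdiag(rows), np.eye(cols))  # (9)
```

For a full 3 × 4 rectangle, `build_system` and `block_tridiagonal(3, 4)` produce the same matrix; for a non-rectangular nucleus only the general construction applies.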

The following algorithm is then used to calculate [U]:

Algorithm 2: Solution of the Poisson equation
1. n = length of [b]; [U] = {0}
2. Repeat until the average iteration error over all pixels is smaller than $1 \times 10^{-5}$
3.   For i = 1 to n
4.     temp = 0
5.     For j = 1 to i-1
6.       temp = temp + A(i, j)*U(j)
7.     End for
8.     For j = i+1 to n
9.       temp = temp + A(i, j)*U(j)
10.    End for
11.    U(i) = (1-ω)*U(i) + ω*(b(i) - temp)/A(i, i)
12.  End for
13. End repeat

In step 11, U(i) on the right-hand side still holds the value from the previous sweep, while U(1), ..., U(i-1) already hold the values from the current sweep.
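The following Python sketch implements the SOR sweep with an in-place update, so that when unknown i is updated, entries before i already hold new values and entries after i hold values from the previous sweep. It also computes a relaxation factor from the spectral radius $\rho_J$ of the Jacobi iteration matrix using the classical formula $\omega = 2/(1+\sqrt{1-\rho_J^2})$, which holds for consistently ordered matrices such as the five-point Laplacian. The stopping rule (mean absolute change per unknown below `tol`) is one plausible reading of the "average iteration error" criterion, and the function names are hypothetical.

```python
import numpy as np


def sor_solve(A, b, omega=1.0, tol=1e-5, max_sweeps=10_000):
    """Successive over-relaxation for [A][U] = [b]; omega = 1 is Gauss-Seidel.

    Stops when the mean absolute change per unknown in one sweep is < tol.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    U = np.zeros(n)
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(n):
            # U[:i] holds current-sweep values, U[i+1:] previous-sweep values
            temp = A[i, :i] @ U[:i] + A[i, i + 1:] @ U[i + 1:]
            new = (1.0 - omega) * U[i] + omega * (b[i] - temp) / A[i, i]
            change += abs(new - U[i])
            U[i] = new
        if change / n < tol:
            break
    return U


def optimal_omega(A):
    """omega = 2 / (1 + sqrt(1 - rho_J**2)), where rho_J is the spectral radius
    of the Jacobi iteration matrix I - D^{-1} A (assumes rho_J < 1 and a
    consistently ordered matrix, as for the five-point Laplacian)."""
    A = np.asarray(A, dtype=float)
    jacobi = np.eye(len(A)) - A / np.diag(A)[:, None]
    rho = np.max(np.abs(np.linalg.eigvals(jacobi)))
    return 2.0 / (1.0 + np.sqrt(1.0 - rho ** 2))
```

A typical call would be `sor_solve(A, b, omega=optimal_omega(A))` with [A] and [b] from Algorithm 1. Computing all eigenvalues costs O(m³), so for large shapes an estimate of $\rho_J$ may be preferable.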


In Algorithm 2, ω is the relaxation factor. With the default ω = 1, the method reduces to the Gauss-Seidel (GS) method; ω can also be calculated from the spectral radius of the Jacobi iteration matrix.

C. Feature Extraction

To extract more direct features, namely the number of segments and the shape of each segment, a skeleton algorithm and the inner distance are combined with the solution of the Poisson equation.

First, $\nabla U$ is calculated using central differences, and the points where $U$ reaches a local maximum are found using a threshold value. The number of local maximum points gives the number of segments.
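A minimal sketch of the segment count under stated assumptions: U is obtained here by plain Jacobi iteration with the right-hand side b = 1 used in Algorithm 1, and because the text does not specify exactly how the threshold on $\nabla U$ is applied, the sketch instead marks a pixel as a local maximum when its U is at least as large as all eight neighbours, grouping touching maxima so that a flat peak counts once. The names `poisson_u` and `count_segments` are hypothetical.

```python
from collections import deque

import numpy as np


def poisson_u(mask, iters=3000):
    """U = 0 outside the shape and 4U - (sum of the four neighbours) = 1 inside,
    i.e. [A][U] = [b] with b = 1, solved by plain Jacobi iteration."""
    inside = np.pad(np.asarray(mask, dtype=bool), 1)  # zero contour all round
    U = np.zeros(inside.shape)
    for _ in range(iters):
        nb = np.roll(U, 1, 0) + np.roll(U, -1, 0) + np.roll(U, 1, 1) + np.roll(U, -1, 1)
        U = np.where(inside, (1.0 + nb) / 4.0, 0.0)
    return U, inside


def count_segments(U, inside):
    """Number of 8-connected groups of shape pixels whose U is >= all 8 neighbours."""
    is_max = inside.copy()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:
                is_max &= U >= np.roll(U, (dx, dy), axis=(0, 1))
    seen = np.zeros_like(is_max)
    groups = 0
    for start in zip(*np.nonzero(is_max)):
        if seen[start]:
            continue
        groups += 1                       # a new peak (or flat plateau)
        seen[start] = True
        queue = deque([start])
        while queue:                      # flood-fill the touching maxima
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    p = (x + dx, y + dy)
                    if (0 <= p[0] < U.shape[0] and 0 <= p[1] < U.shape[1]
                            and is_max[p] and not seen[p]):
                        seen[p] = True
                        queue.append(p)
    return groups
```

As a crude stand-in for a two-lobed nucleus, two 7 × 7 squares joined by a one-pixel-wide bridge give two maxima, while a single square gives one.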

Second, the skeleton of the nucleus is extracted using the skeleton measure $\tilde{\Psi}$ (13) proposed in [1], which is computed from $U$ and its gradient $\nabla U$; $\tilde{\Psi}$ gives the skeleton value at each point, and the skeleton can be narrowed down using a second threshold.

Finally, based on the skeleton, the longest inner distance of each segment is calculated and recorded in two steps: 1) build a graph over all points of one segment's skeleton, adding an edge between each pair of points whose connecting line segment lies entirely within the skeleton, weighted by the Euclidean distance between the two points; 2) apply a shortest-path algorithm to the graph. For a round shape, the skeleton is short and the inner distance is the same as the Euclidean distance.

Figure 3. Inner distance and Euclidean distance.

For a kidney shape, the inner distance is longer than the Euclidean distance.
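The two steps above can be sketched in Python as follows. As a simplifying assumption, edges join only 8-connected skeleton pixels (weight 1 or √2) instead of every pair whose connecting segment stays inside the skeleton, so the result only approximates the construction described in the text. Dijkstra's algorithm is run from every skeleton pixel, the longest shortest-path length is taken as the longest inner distance, and its ratio to the straight-line distance between the same end points is returned. The function names are hypothetical.

```python
import heapq
import math


def _dijkstra(points, src):
    """Shortest path lengths from src over 8-connected skeleton pixels."""
    pts = set(points)
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if d > dist[(x, y)]:
            continue                                  # stale heap entry
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                q = (x + dx, y + dy)
                if (dx or dy) and q in pts:
                    nd = d + math.hypot(dx, dy)       # edge weight = Euclidean length
                    if nd < dist.get(q, math.inf):
                        dist[q] = nd
                        heapq.heappush(heap, (nd, q))
    return dist


def longest_inner_distance(points):
    """Return (longest inner distance, Euclidean distance between its
    end points, inner/Euclidean ratio) for a list of (row, col) pixels."""
    best, p_best, q_best = 0.0, None, None
    for p in points:
        for q, d in _dijkstra(points, p).items():
            if d > best:
                best, p_best, q_best = d, p, q
    if p_best is None:                                # single-pixel skeleton
        return 0.0, 0.0, 1.0
    euclid = math.dist(p_best, q_best)
    return best, euclid, best / euclid
```

For a straight 5-pixel skeleton the ratio is 1. For an L-shaped skeleton with two 5-pixel arms, the longest inner distance is $6+\sqrt{2} \approx 7.41$ pixels against a Euclidean distance of $4\sqrt{2} \approx 5.66$, a ratio of about 1.31.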

To use these features as inputs to a neural network or other classifier, the number of local maximum points found in the first step represents the number of nuclear segments, and the shape of each segment is encoded as one of three values {0, 1, 2}: 0 for a round segment, 1 for a kidney-shaped segment and 2 for a U-shaped segment, as illustrated in Figure 3.

III. RESULTS OF THE EXPERIMENT

The system has been implemented in Matlab. Figure 1 shows some samples from the test set, and Figure 4 shows the results of the processing. Sub-image A shows the nucleus of the white blood cell after segmentation with the method proposed in [6].

Figure 4. Experimental results: sub-images A-F for a neutrophil, an eosinophil, a basophil, a lymphocyte and two monocytes.


The solution of the Poisson equation, computed with Algorithms 1 and 2, is shown in sub-image B. Values are displayed with a color map: bright colors represent high values, which are normally located in the center of the shape, and blue represents low values. Sub-image C shows $\nabla U$, the rate of change of the Poisson solution, calculated by central differences. Sub-image D shows the local maxima of the Poisson solution; in this experiment, the threshold on $\nabla U$ is set to 0.5 to extract the local maximum points. The result of equation (13), shown in sub-image E, is used to extract the skeleton of the shape, and sub-image F shows the skeleton obtained with a threshold of 0.4. To classify the shapes of segments, the length of the longest inner distance and the ratio between the longest inner distance and the Euclidean distance are used. If the longest inner distance is smaller than 3 pixels and the ratio is 1, the segment is classified as round (a circle). If the longest inner distance is greater than 3 pixels and the ratio is greater than 1 but smaller than 1.57, the segment is classified as kidney-shaped. If the ratio is greater than 1.57, the segment is classified as U-shaped. These parameters can be used as efficient input features for classifiers. A classification test using these new shape features was carried out, and the results are shown in Table II. Compared with current approaches, the method improves the recognition accuracy for eosinophils and neutrophils. It also provides straightforward shape features that doctors can use to confirm the classification of the cells.
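The decision rules above can be written as a small function producing the {0, 1, 2} code used as classifier input. This sketch follows the stated thresholds literally and returns `None` for cases they leave open, for example a long straight segment with a ratio of exactly 1, or a ratio of exactly 1.57; how such cases were handled in the experiment is not stated. The function name and the tolerance `eps` used to test "the ratio is 1" are assumptions. Note that 1.57 is close to π/2, the ratio of a semicircle's arc length to its chord.

```python
ROUND, KIDNEY, U_SHAPE = 0, 1, 2   # shape codes used as classifier inputs


def segment_shape_code(longest_inner, ratio, min_len=3.0, u_ratio=1.57, eps=1e-6):
    """Map (longest inner distance in pixels, inner/Euclidean ratio) to a shape
    code using the reported thresholds; None if no rule applies."""
    if ratio > u_ratio:
        return U_SHAPE
    if longest_inner < min_len and abs(ratio - 1.0) <= eps:
        return ROUND
    if longest_inner > min_len and 1.0 < ratio < u_ratio:
        return KIDNEY
    return None
```

For instance, an L-shaped skeleton with an inner distance of about 7.41 pixels and a ratio of about 1.31 would be coded as a kidney shape (1).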

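TABLE II. THE CONFUSION MATRIX

| | N | E | B | M | L | Correct (%) |
|---|---|---|---|---|---|---|
| N | 58 | 0 | 5 | 0 | 1 | 90.63 |
| E | 0 | 32 | 0 | 0 | 2 | 94.12 |
| B | 2 | 0 | 25 | 0 | 1 | 89.29 |
| M | 1 | 0 | 0 | 31 | 8 | 75.5 |
| L | 0 | 0 | 0 | 1 | 63 | 98.48 |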
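Recomputing the per-class accuracies from the counts in Table II is a quick consistency check. Taking each row's diagonal entry over the row total reproduces the printed percentages for N, E and B, but gives 77.50% (31/40) for M and 98.44% (63/64) for L rather than the printed 75.5% and 98.48%, so one of the two values in each of these rows appears to contain a typo. Over all 230 cells, the counts give an overall accuracy of 209/230, about 90.87%. The sketch below simply reproduces this arithmetic; the variable names are illustrative.

```python
import numpy as np

labels = ["N", "E", "B", "M", "L"]
# counts copied from Table II, rows and columns in the table's order
counts = np.array([
    [58, 0, 5, 0, 1],
    [0, 32, 0, 0, 2],
    [2, 0, 25, 0, 1],
    [1, 0, 0, 31, 8],
    [0, 0, 0, 1, 63],
])

row_accuracy = 100.0 * np.diag(counts) / counts.sum(axis=1)   # per-row correct (%)
overall = 100.0 * np.trace(counts) / counts.sum()             # all classes together

for name, acc in zip(labels, row_accuracy):
    print(f"{name}: {acc:.3f}%")
print(f"overall: {overall:.3f}%")
```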

IV. CONCLUSION

This paper presents a Poisson-equation-based approach for leukocyte classification from blood smear images. It provides a set of efficient shape features for leukocyte classifiers such as neural networks. Its main advantage is that these important shape features can be extracted automatically from a blood smear image. Using such features can also reduce the number of inputs and the complexity of leukocyte classifiers. The features efficiently distinguish leukocytes at different stages of maturity in a smear image, so the performance of leukocyte classifiers can be improved.

REFERENCES

[1] L. Gorelick, M. Galun, E. Sharon, R. Basri, A. Brandt, "Shape representation and classification using the Poisson equation," in CVPR (2), 2004.

[2] L. C. Junqueira, J. Carneiro, R. O. Kelley, Basic Histology. Norwalk, Conn.: Appleton & Lange, 1992.

[3] S. Zhang, An Atlas of Histology. New York: Springer-Verlag, 1998, pp. 393-403.

[4] I. Berman, Color Atlas of Basic Histology, 2nd ed. Stamford, Connecticut, USA: Appleton and Lange, 1998.

[5] W. Sandritter, C. Thomas, W. B. Wartman, Color Atlas & Textbook of Histopathology. Year Book Medical Publishers, 1979.

[6] Q. X. Wu, X. Huang, J. Cai, Y. Wu, M. Lin, "Segmentation of leukocytes in blood smear images using color processing mechanism inspired by the visual system," in Proc. BMEI'09, IEEE, 2009, pp. 368-372.

[7] S. Mircic, N. Jorgovanovic, "Application of neural network for automatic classification of leukocytes," in Proc. 8th IEEE Seminar on Neural Network Applications in Electrical Engineering, 2006, pp. 141-144.

[8] D. M. U. Sabino, L. F. Costa, E. G. Rizzatti, M. A. Zago, "Toward leukocyte recognition using morphometry, texture and color," in Proc. IEEE International Symposium on Biomedical Imaging, 2004.

[9] H. Ramoser, V. Laurain, H. Bischof, R. Ecker, "Leukocyte segmentation and classification in blood-smear images," in Proc. IEEE EMBS 2005, 2005, pp. 3371-3374.

[10] M. Ferri, S. Lombardini, C. Pallotti, "Leukocyte classification by size functions," in Proc. 2nd IEEE Workshop on Applications of Computer Vision. Los Alamitos, CA: IEEE Computer Society Press, 1994, pp. 223-229.

[11] D. M. U. Sabino, L. F. Costa, E. G. Rizzatti, M. A. Zago, "A texture approach to leukocyte recognition," Real-Time Imaging, vol. 10, 2004, pp. 205-216.

[12] D. H. Tycko, S. Anbalagan, H. C. Liu, L. Ornstein, "Automatic leukocyte classification using cytochemically stained smears," J. Histochem. Cytochem., vol. 24, no. 1, 1976, pp. 178-194.

[13] P. E. Pavlova, K. P. Cyrrilov, I. N. Moumdjiev, "Application of HSV colour system in the identification by colour of biological objects on the basis of microscopic images," Computerized Medical Imaging and Graphics, vol. 20, no. 5, 1996, pp. 357-364.
