Hybrid segmentation, characterization and classification of basal cell nuclei from histopathological images of normal oral mucosa and oral submucous fibrosis

M. Muthu Rama Krishnan a, Chandan Chakraborty a,*, Ranjan Rashmi Paul b, Ajoy K. Ray c

a School of Medical Science and Technology, IIT Kharagpur, India
b Department of Oral and Maxillofacial Pathology, Guru Nanak Institute of Dental Science and Research, Kolkata, India
c Department of Electronics and Electrical Communication Engineering, IIT Kharagpur, India

* Corresponding author. Address: School of Medical Science and Technology, Indian Institute of Technology Kharagpur, West Bengal 721 302, India. Tel.: +91 3222 283570; fax: +91 3222 28881. E-mail address: [email protected] (C. Chakraborty).

Expert Systems with Applications 39 (2012) 1062–1077. doi:10.1016/j.eswa.2011.07.107. 0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.

Keywords: Oral submucous fibrosis; Zernike moments; Parabola fitting; Color deconvolution; Fuzzy divergence; Unsupervised feature selection; Gradient vector flow; Pattern classification

Abstract

This work presents a quantitative microscopic approach for discriminating oral submucous fibrosis (OSF) from normal oral mucosa (NOM) in respect to morphological and textural properties of the basal cell nuclei. Practically, basal cells constitute the proliferative compartment (called basal layer) of the epithelium. In the context of histopathological evaluation, the morphometry and texture of basal nuclei are assumed to vary during malignant transformation according to onco-pathologists. In order to automate the pathological understanding, the basal layer is initially extracted from histopathological images of NOM (n = 341) and OSF (n = 429) samples using fuzzy divergence, morphological operations and parabola fitting followed by median filter-based noise reduction. Next, the nuclei are segmented from the layer using color deconvolution, marker-controlled watershed transform and gradient vector flow (GVF) active contour method. Eighteen morphological, 4 gray-level co-occurrence matrix (GLCM) based texture features and 1 intensity feature are quantized from five types of basal nuclei characteristics. Afterwards, unsupervised feature selection method is used to evaluate significant features and hence 18 are obtained as most discriminative out of 23. Finally, supervised and unsupervised classifiers are trained and tested with 18 features for the classification between normal and OSF samples. Experimental results are obtained and compared. It is observed that linear kernel based support vector machine (SVM) leads to 99.66% accuracy in comparison with Bayesian classifier (96.56%) and Gaussian mixture model (90.37%).

1. Introduction

Oral cancer (OC) is the sixth most common cancer in the world. It accounts for approximately 4% of all cancers and 2% of all cancer deaths world-wide. In India it is the commonest malignant neoplasm, accounting for 20–30% of all cancers (Banoczy, 1982; Burkhardt, 1985; Daftary et al., 1993). A higher incidence of OC is observed on the Indian subcontinent mainly due to the late diagnosis of potentially precancerous lesions. Oral submucous fibrosis (OSF) is an insidious chronic, progressive, precancerous condition with a high degree of malignant potentiality. A large number of these cases transform into OC. Through progression of this pathosis, OC develops in the epithelial region of the oral mucosa. The precancerous status is judged on the basis of light microscopic histopathological features of oral epithelial dysplasia (OED) and/or cellular atypia which have different grades according to involvement of the epithelial region (Paul et al., 2005).

There is no established quantitative technique by which histopathologically significant features of the diseased tissue like (i) thickness of different histological layers, (ii) density, distribution, and alignment of tissue components, and (iii) cell population density, distribution, and their different morphological attributes could be analyzed. Actually, a precancerous state generally depicts mixed features of normalcy as well as pro- or pre-malignancy. With the disease progression, the histological scenario alters slowly in different combinations, characterizing the specific pathological state of progression toward malignancy (Paul et al., 2005).

Pathologists have been using microscopic images to study tissue biopsies for a long time, relying on their personal experience on giving decisions about the healthiness state of the examined biopsy. This includes distinguishing normal from abnormal (i.e., cancerous) tissue, benign versus malignant tumors and identifying the level of tumor malignancy. Nevertheless, variability in the reported diagnosis may still occur (Duncan & Ayache, 2000), which could be due to, but not limited to the heterogeneous nature of the diseases; ambiguity caused by nuclei overlapping; noise arising from the staining process of the tissue samples; intra-observer variability, i.e., pathologists are not able to give the same reading of the same image at more than one occasion; and inter-observer
variability, i.e., increase in classification variation among pathologists. Therefore, over the past three decades, quantitative techniques have been developed for computer-aided diagnosis, which aims to avoid unnecessary biopsies and to assist pathologists in the process of cancer diagnosis (Gilles et al., 2008; Grootscholten et al., 2008; Shuttleworth, Todman, Norrish, & Bennett, 2005). Thus, quantitative evaluation of the histopathological features is not only important for accurate diagnostics, but is also vital for assessing the relative involvement of the different tissue components in the pathology of the disease.

Generally, basal cells form the proliferative compartment (Shabana, Gel-Labban, & Lee, 1987) of the epithelium, from which cells migrate, differentiate as they progress and are eventually desquamated at the surface. The keratinocytes of the basal cell layer of the oral stratified squamous epithelium represent the progenitor cells (Satheesh, Paul, & Hammond, 2007) that are responsible for the production of the other cells making up the various layers of the epithelium. Changes in the basal cells may have serious implications for future cell behavior, including malignant transformation. The measurement of their size and shape in OSF may be an important prognostic marker, as studies have shown that both the cell and the nucleus increase in size and change in shape during OSF.

Automatic grading of pathological images has been investigated in various fields during the past few years, including brain tumor astrocytomas (ASTs) (Glotsos, 2003; McKeown & Ramsey, 1996; Scarpelli, Bartels, Montironi, Galluzzi, & Thompson, 1994; Schad, Schmitt, Oberwittler, & Lorenz, 1987), prostate carcinoma (Farjam, Soltanian-Zadeh, Zorro, & Jafari-Khouzani, 2005; Jafari-Khouzani & Soltanian-Zadeh, 2003; Smith, Zajicek, Werman, Pizov, & Sherman, 1999; Tabesh et al., 2007), renal cell carcinoma (RCC) (Fuhrman, Lasky, & Limas, 1982; Hand & Broders, 1931; Kim, Choi, Cha, & Choi, 2005; Lohse, Blute, Zincke, Weaver, & Chenille, 2002; Novara, Martignoni, Artibani, & Ficarra, 2007), and hepatocellular carcinoma (Huang & Lai, 2010). However, automated screening of OSF biopsy images has not been reported exhaustively in the literature, although some work exists on other layers of the oral mucosa (Muthu Rama Krishnan et al., 2009; Muthu Rama Krishnan, Shah et al., 2010). Since the grading system for a specific type of cancer cannot be applied to other types of cancer, it is necessary to develop appropriate segmentation, feature extraction, and classification methods for each type. This is particularly true for oral cancer because OSF biopsy images always suffer from impurities, undesirable elements, and uneven exposure. In OSF screening, the characteristics of basal cell nuclei are the key to estimating the degree of oral malignancy. However, the areas of nuclei, cytoplasm, and cells are difficult to identify and measure. In this paper, we propose a novel method to segment the basal cell nuclei. Twenty-three features are extracted from OSF biopsy images according to five types of characteristics commonly adopted by pathologists.

We use a set of supervised (Bayesian and SVM) and unsupervised (k-means, fuzzy c-means and Gaussian mixture model (GMM)) classifiers to test the effectiveness of classification for OSF biopsy images. In this study, we find that not all 23 features are equally important or necessary to distinguish normal and OSF images. Therefore, we implemented unsupervised feature selection to obtain the optimal feature subset, so that the best performance in classifying OSF images can be achieved.

2. Materials and methods

2.1. Histology

Twelve study subjects clinically diagnosed with OSF were subjected to incisional biopsy, with their informed consent, at the Department of Oral and Maxillofacial Pathology, Guru Nanak Institute of Dental Sciences and Research, Kolkata, India. Normal study samples were collected, with prior written consent, from the buccal mucosa of 10 healthy volunteers without any oral habits or any other known systemic diseases. The study subjects are of similar age (21–40 years) and food habits. This study was duly approved by the ethics review committee of the institute. All biopsy samples were processed for histopathological examination; paraffin-embedded tissue sections of 5 μm thickness were prepared and then stained with haematoxylin and eosin (H&E).

2.2. Image acquisition

Images of the basal layer of normal oral mucosa and OSF were optically grabbed by a Zeiss Observer.Z1 microscope from H&E stained histological sections under a 100× objective (N.A. 1.4) at the School of Medical Science & Technology, at a resolution of 0.24 μm and a pixel size of 0.06 μm. The image database for this analysis consists of 1194 cells, which are extracted from 341 normal images and 429 images of OSF with dysplasia. The grabbed images are digitized at 1388 × 1040 pixels and stored in a computer.

2.3. Image processing

The histopathological images of oral mucosa grabbed by the Carl Zeiss microscope contain randomly distributed white and black pixels (noise). A median filter is used to remove this noise (Gonzalez & Woods, 2002). The block diagram of the proposed methodology for quantitative evaluation of basal cell nuclei is shown in Fig. 1.

2.4. Basal cell nuclei segmentation

The novel approach for basal cell nuclei analysis consists of three main stages: (1) basal nuclei extraction, (2) feature extraction and (3) classification. Basal nuclei extraction is a four-step mechanism: (a) extraction of the lower boundary of the epithelium, (b) parabola fitting to segment the basal layer, (c) segmentation of basal cells using marker-controlled watershed and (d) extraction of nuclei using gradient vector flow. Anatomically, the basal layer is the first layer of the epithelium, and the first step of basal nuclei extraction exploits this by extracting the epithelio-mesenchymal (EM) junction. This first step consists of morphological operations on the H&E stained image to diminish the local maxima (cell boundaries and variation in collagen fibers) present in the epithelium and connective tissue, followed by edge enhancement using anisotropic diffusion and generation of a binary image using fuzzy divergence based thresholding (Chaira & Ray, 2003, 2009).

2.4.1. Edge enhancement using anisotropic diffusion

Anisotropic diffusion (Grieg, Kubler, Kikinis, & Jolesz, 1992) is a technique that smooths the homogeneous parts of an image while keeping the significant parts, such as edges, lines and other details that are important for image interpretation. The main idea of this approach is to embed the original image in a family of derived images obtained by convolving the original image with a Gaussian kernel having variance t, which is a scale-space parameter. Larger values of t correspond to coarser resolution and lower values correspond to fine resolution. This one-parameter family of derived images can be represented as a solution of the heat conduction or diffusion equation. Mathematically, anisotropic diffusion is defined as

$$\frac{\partial I}{\partial t} = \operatorname{div}\big(c(x, y, t)\,\nabla I\big) = \nabla c \cdot \nabla I + c(x, y, t)\,\Delta I, \tag{1}$$

where $\Delta$ denotes the Laplacian operator, $\nabla$ denotes the gradient and $c(x, y, t)$ is the diffusion coefficient, which controls the rate of diffusion. For the diffusion Eq. (1), Perona and Malik (1990) proposed two functions for the diffusion coefficient:

$$c(\|\nabla I\|) = e^{-\left(\|\nabla I\| / K\right)^{2}}, \tag{2}$$

$$c(\|\nabla I\|) = \frac{1}{1 + \left(\|\nabla I\| / K\right)^{2}}. \tag{3}$$

The constant K controls the sensitivity to edges. The first function privileges high-contrast edges over low-contrast edges, and the second privileges wide regions over smaller ones. An 8-nearest-neighbors discretization of the Laplacian operator is used. After diffusion, the image edges are sharper (Fig. 2(d)), as can be inferred from the histograms before and after diffusion. Thresholding is then performed using fuzzy divergence to segment out the surface epithelium.
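To make the update rule concrete, the following is a minimal NumPy sketch of Perona–Malik diffusion using the two coefficient functions of Eqs. (2) and (3). It is an illustration only, not the authors' implementation: the function name, the default values of `n_iter`, `K` and `step`, and the choice of weighting diagonal neighbours by the inverse squared distance are assumptions made for this example.

```python
import numpy as np

def anisotropic_diffusion(image, n_iter=20, K=15.0, step=0.1, option=1):
    """Perona-Malik diffusion on a 2-D gray image (illustrative sketch).

    option=1 uses the exponential coefficient of Eq. (2),
    option=2 the rational coefficient of Eq. (3).
    All default parameter values are assumptions, not taken from the paper.
    """
    img = np.asarray(image, dtype=float).copy()
    rows, cols = img.shape
    # Offsets of the 8 nearest neighbours.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1),
               (-1, -1), (-1, 1), (1, -1), (1, 1)]
    for _ in range(n_iter):
        # Replicating the border gives zero flux across the image boundary.
        padded = np.pad(img, 1, mode="edge")
        update = np.zeros_like(img)
        for dy, dx in offsets:
            neighbour = padded[1 + dy:1 + dy + rows, 1 + dx:1 + dx + cols]
            grad = neighbour - img                      # directional difference
            if option == 1:
                c = np.exp(-(grad / K) ** 2)            # Eq. (2)
            else:
                c = 1.0 / (1.0 + (grad / K) ** 2)       # Eq. (3)
            dist2 = dy * dy + dx * dx                   # 1 axial, 2 diagonal
            update += c * grad / dist2
        # step * (4 * 1 + 4 * 0.5) = 0.6 <= 1 keeps the explicit scheme stable.
        img += step * update
    return img
```

With a small `K` relative to the edge contrast, differences across an edge receive a coefficient close to zero, so smoothing acts mainly inside homogeneous regions.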

2.4.2. Fuzzy divergence based threshold selection

Fan and Xie (1999) proposed fuzzy divergence from fuzzy exponential entropy by using a single row vector. Here the divergence concept of Fan and Xie is extended to an image, represented by a matrix. In an image of size M × M with L distinct gray levels having probabilities $(p_0, p_1, p_2, \ldots, p_{L-1})$, the exponential entropy is defined as $H = \sum_{i=0}^{L-1} p_i e^{1-p_i}$.

Fig. 1. Block diagram of the proposed methodology.

The fuzzy entropy for an image A of size M × M is defined as

$$H(A) = \frac{1}{n(\sqrt{e} - 1)} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ \mu_A(f_{ij})\, e^{1-\mu_A(f_{ij})} + \big(1 - \mu_A(f_{ij})\big)\, e^{\mu_A(f_{ij})} - 1 \right]. \tag{4}$$

Here $n = M^2$ and $i, j = 0, 1, 2, 3, \ldots, (M-1)$; $\mu_A(f_{ij})$ is the membership value of the pixel in the image and $f_{ij}$ is the (i, j)th pixel of image A. For two images A and B, at the (i, j)th pixel of the image, the information of discrimination between $\mu_A(a_{ij})$ and $\mu_B(b_{ij})$ of images A and B is given by (Chaira & Ray, 2003, 2009)

$$e^{\mu_A(a_{ij})} / e^{\mu_B(b_{ij})} = e^{\mu_A(a_{ij}) - \mu_B(b_{ij})}, \tag{5}$$

where $\mu_A(a_{ij})$ and $\mu_B(b_{ij})$ are the membership values of the (i, j)th pixels in images A and B, respectively, and $i, j = 0, 1, 2, \ldots, M-1$. The discrimination of image A against image B may be given as

$$D_1(A, B) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ 1 - \big(1 - \mu_A(a_{ij})\big)\, e^{\mu_A(a_{ij}) - \mu_B(b_{ij})} - \mu_A(a_{ij})\, e^{\mu_B(b_{ij}) - \mu_A(a_{ij})} \right]. \tag{6}$$

Likewise, the discrimination of B against A is

$$D_2(B, A) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ 1 - \big(1 - \mu_B(b_{ij})\big)\, e^{\mu_B(b_{ij}) - \mu_A(a_{ij})} - \mu_B(b_{ij})\, e^{\mu_A(a_{ij}) - \mu_B(b_{ij})} \right]. \tag{7}$$

So, the total fuzzy divergence between images A and B is obtained from Eqs. (6) and (7):

$$D(A, B) = D_1(A, B) + D_2(B, A), \tag{8}$$

$$D(A, B) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ 2 - \big(1 - \mu_A(a_{ij}) + \mu_B(b_{ij})\big)\, e^{\mu_A(a_{ij}) - \mu_B(b_{ij})} - \big(1 - \mu_B(b_{ij}) + \mu_A(a_{ij})\big)\, e^{\mu_B(b_{ij}) - \mu_A(a_{ij})} \right]. \tag{9}$$

In the method, image A is the original image and image B is an ideally segmented image. An ideally segmented image is defined as an image that is perfectly thresholded, so that each pixel belongs exactly either to the object or to the background region. In this situation, the membership value of each pixel of the ideally segmented image in its object/background region should be equal to one, i.e., $\mu_B(b_{ij}) = 1$. Hence Eq. (9) becomes

$$D(A, B) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \left[ 2 - \big(2 - \mu_A(a_{ij})\big)\, e^{\mu_A(a_{ij}) - 1} - \mu_A(a_{ij})\, e^{1 - \mu_A(a_{ij})} \right]. \tag{10}$$

In this way the divergence value of each pixel is calculated for the whole image and the corresponding gray level is noted. The gray value corresponding to the minimum divergence (Fig. 3(b)) is chosen as the initial threshold for segmenting the object (epithelium) and background (rest of the image) regions. The minimum divergence value indicates the maximum belongingness of each object pixel to the object region (epithelium) and of each background pixel to the background region (connective tissue). Morphological operations are performed on Fig. 3(c) to diminish the local maxima (cell boundaries and variation in collagen fibers) present in the epithelium and connective tissue; the output is shown in Fig. 3(d). The area having the maximum number of white pixels is extracted using connected component labeling (Fig. 3(e)).
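The threshold search built on Eq. (10) can be sketched as follows. The text above does not state which membership function is used, so this example assumes an exponential membership $\mu = e^{-c\,|g - m|}$, where $m$ is the mean gray level of the region (background or object) to which gray level $g$ is assigned by the candidate threshold and $c = 1/(g_{max} - g_{min})$. The function name, the 8-bit input assumption and this membership choice are illustrative, not the authors' implementation.

```python
import numpy as np

def fuzzy_divergence_threshold(gray):
    """Return (threshold, divergence) minimising Eq. (10) for an 8-bit image.

    ASSUMPTION: membership mu = exp(-c * |g - region_mean|); the paper's
    section does not specify the membership function.
    """
    gray = np.asarray(gray).astype(np.int64)
    levels = np.arange(256, dtype=float)
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    g_min, g_max = int(gray.min()), int(gray.max())
    c = 1.0 / max(g_max - g_min, 1)
    best_t, best_d = g_min, np.inf
    for t in range(g_min, g_max):            # both regions are non-empty
        below = levels <= t
        n_b = hist[below].sum()
        n_o = hist[~below].sum()
        mean_b = (hist[below] * levels[below]).sum() / n_b
        mean_o = (hist[~below] * levels[~below]).sum() / n_o
        region_mean = np.where(below, mean_b, mean_o)
        mu = np.exp(-c * np.abs(levels - region_mean))
        # Per-gray-level term of Eq. (10); it is zero when mu == 1.
        per_level = 2.0 - (2.0 - mu) * np.exp(mu - 1.0) - mu * np.exp(1.0 - mu)
        d = float((hist * per_level).sum())  # histogram-weighted sum over pixels
        if d < best_d:
            best_t, best_d = t, d
    return best_t, best_d
```

Working on the histogram rather than on individual pixels gives the same sum as Eq. (10) under this membership, because all pixels with the same gray level share the same membership value.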

Next, the edges and boundaries are extracted from this binary image using the Canny edge detector (Fig. 3(f)). The longest edge present in this image is the epithelio-mesenchymal (EM) junction, which is extracted by connected component labeling to locate the lower boundary of the basal layer (Fig. 3(f)). The abrupt variation in this edge is lessened by filtering it with a band-pass filter. The shape and orientation of this lower boundary at 100× magnification can be approximated by a parabola (Fig. 3(g)). Parabola fitting is performed by linear regression, since the parabola equation is a linear model (Rust, 2001).

2.4.3. Parabola fitting

The generalized equation of a parabola is $Y = aX^2 + bX + C$. If the straight-line model is inadequate for a given data set, a polynomial of degree 2, i.e., a parabola, may be a good choice, as higher-order polynomials are unstable. A polynomial is linear in its coefficients, so the generalized linear model can be used to obtain the fit by linear regression (Muthu Rama Krishnan, Pal et al., 2010). The model for the (n + 1)th order or nth degree polynomial is

$$y_t = \sum_{i=1}^{n+1} a_i X_t^{\,i-1}. \tag{11}$$

In matrix form,

$$\begin{bmatrix} y_{t_1} \\ y_{t_2} \\ y_{t_3} \\ \vdots \\ y_{t_m} \end{bmatrix} = \begin{bmatrix} 1 & X_{t_1} & X_{t_1}^2 & \cdots & X_{t_1}^n \\ 1 & X_{t_2} & X_{t_2}^2 & \cdots & X_{t_2}^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{t_m} & X_{t_m}^2 & \cdots & X_{t_m}^n \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{n+1} \end{bmatrix}$$

or in shorter form, $Y = Xa$, where X is an m × (n + 1) matrix (n ≪ m) and a is an (n + 1)-element column vector.

Fig. 2. (a) Normal colour image; (b) gray scale image of (a); (c) histogram of image (b); (d) diffused image of (b) using anisotropic diffusion; and (e) histogram of image (d).

Let us write the objective function for the least squares estimation as

$$L(a) = \sum_{i=1}^{m} (Y - Xa)_i^2 = (Y - Xa)^T (Y - Xa), \tag{12}$$

which we can expand to give

$$L(a) = Y^T Y - 2a^T X^T Y + a^T X^T X a. \tag{13}$$

Geometrically, the objective function defines a quadratic hypersurface over the a-space, sometimes called the response surface, whose level curves correspond to concentric ellipsoids in the a-space. It has a unique global minimum that we can find by differentiating L(a) with respect to a and equating the result to the zero vector,

$$\frac{\partial L}{\partial a} = -2X^T Y + 2X^T X a = 0.$$

Thus, the minimizing a must satisfy the (n + 1) × (n + 1) system of linear equations

$$X^T X a = X^T Y, \tag{14}$$

which are often called the normal equations. Because the columns of X are linearly independent, the matrix product on the left side is nonsingular, so the unique solution is

$$a = (X^T X)^{-1} X^T Y. \tag{15}$$

If relatively small perturbations in the data produce relatively large perturbations in the solution, we can get a more numerically stable algorithm by computing an orthogonal factorization of the form

$$X = Q \begin{bmatrix} R \\ 0 \end{bmatrix},$$

where Q is an m × m orthogonal matrix ($Q^T Q = I = Q Q^T$), R is an (n + 1) × (n + 1) upper triangular matrix, and 0 is an (m − n − 1) × (n + 1) matrix of zeros. By substituting this factorization into the normal equations, we can easily verify that a satisfies the (n + 1) × (n + 1) upper triangular system

$$R a = Q_1^T Y, \tag{16}$$

where $Q_1$ is the m × (n + 1) matrix formed by the first n + 1 columns of Q (Rust, 2001).

Assuming the model fitted to the data is correct, the residuals approximate the random errors. They are defined as $r_i = Y_i - X_i a$, for $i = t_1, t_2, \ldots, t_m$. Therefore, if the residuals appear to behave randomly, this suggests that the model fits the data well. The parabola is fitted over the lower boundary (EM junction).

Fig. 3. (a) Normal gray scale image; (b) plots of gray level against fuzzy divergence, for selection of the threshold value; (c) thresholded image of (a); (d) morphological operation to remove small objects within the epithelium; (e) larger white area extracted using connected component labeling; and (f) lower boundary extraction from (e).

The next step is to generate n parabolas parallel to the parabola fitted to the lower boundary of the basal layer, such that the image generated from these parallel parabolas overlays the basal layer completely (Fig. 4(a) and (b)). The effective distance between two parallel parabolas is not the same at the distal ends as at the center. This property of the parallel parabolas is used to generate an image mask (Fig. 4(b)), and this mask is superimposed on Fig. 3(a), which gives the basal layer (Fig. 4(c)).
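The least-squares parabola fit via orthogonal factorization, as described in Section 2.4.3, can be sketched in a few lines of NumPy. The function name and the synthetic boundary points in the usage note are hypothetical; `np.linalg.qr` in its default reduced mode returns the matrix called $Q_1$ in Eq. (16) together with the upper triangular factor R.

```python
import numpy as np

def fit_parabola_qr(x, y):
    """Least-squares fit of y = a1 + a2*x + a3*x^2 using a QR factorization.

    Returns the coefficient vector (a1, a2, a3) and the residuals r = Y - X a.
    Illustrative sketch; not the authors' implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, 3, increasing=True)   # design matrix with columns 1, x, x^2
    Q1, R = np.linalg.qr(X)                # reduced QR: Q1 is m x 3, R is 3 x 3
    a = np.linalg.solve(R, Q1.T @ y)       # upper triangular system R a = Q1^T Y
    residuals = y - X @ a
    return a, residuals
```

For example, calling `fit_parabola_qr(cols, rows)` on the column and row coordinates of the extracted lower-boundary pixels would give the coefficients of the fitted parabola; inspecting whether `residuals` look random is the check on model adequacy described in the text.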

2.4.4. Basal cell segmentation using color deconvolution

Epithelial cell borders cannot be isolated accurately in H&E stained images, but they can be estimated statistically using a space partition procedure. Initially, the haematoxylin plane, which has high contrast between nuclei and cytoplasm, is extracted using color deconvolution (Ruifrok & Johnston, 2001).

2.4.4.1. Color deconvolution. According to the Lambert–Beer law, the detected intensity of light transmitted through the specimen and the amount (A) of stain with absorption factor c are related by

$$I = I_0\, e^{-Ac}, \tag{17}$$

where $I_0$ is the intensity of light entering the specimen and I is the intensity of light detected after passing through the specimen. This suggests that the gray values of each RGB channel depend on the concentration of stain in a non-linear way. Hence it is difficult to separate the stains by intensity. However, the optical density (OD) can be used to separate them; it is defined as

$$OD = -\log_{10}\!\left(\frac{I}{I_0}\right) = Ac. \tag{18}$$

Hence OD is proportional to the absorption factor c for a given amount of stain. This helps us to separate the contribution of each stain in a multi-stained specimen. Each pure stain is characterized by a specific optical density for the light in each of the three RGB channels, which can be represented by a 3 × 1 OD vector describing the stain in the OD-converted RGB color space. The length of the vector is proportional to the amount of stain, while the relative values of the vector describe the actual OD for the detection channels (Ruifrok & Johnston, 2001).

In the case of three channels, the color system can be described as a matrix of the form

$$OD = \begin{pmatrix} O_{11} & O_{12} & O_{13} \\ O_{21} & O_{22} & O_{23} \\ O_{31} & O_{32} & O_{33} \end{pmatrix}$$

with every row representing a specific stain, and every column representing the optical density as detected by the red, green and blue channel for each stain. Stain-specific values for the OD in each of the three channels can be easily determined by measuring the relative absorption for red, green and blue on slides stained with a single stain.

If we can find the orthonormal transformation of this matrix, it is easy to separate the contribution of each stain. The transformation has to be normalized to achieve correct balancing of the absorption factor for each separate stain. If matrix M is the normalized form of the OD matrix, then it is defined as

$$M = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix},$$

where $m_{ij} = O_{ij} \big/ \sqrt{\sum_{k=1}^{3} O_{ik}^2}$ and $O_{ij}$ is the element of the OD matrix.

If C is the 3 × 1 vector of the amounts of the three stains at a particular pixel, then the vector of OD levels detected at that pixel is $y = MC$. From the above it is clear that $C = M^{-1} y$. This means that multiplication of the OD image with the inverse of the OD matrix, which we define as the color-deconvolution matrix $D = M^{-1}$, results in an orthonormal representation of the stains forming the image:

$$C = Dy. \tag{19}$$

Fig. 4. (a) Parabola fitting using lower boundary; (b) mask of the fitted parabola; and (c) extracted basal layer using the mask.

The enhanced nuclei (Fig. 5(b)), after morphological operations (Fig. 5(d)), work as markers in the watershed algorithm to segment the epithelium into different compartments. These compartments effectively show the segmentation of the epithelium into basal cells (Fig. 5(f)). Not all partitions contain exactly one basal cell, as some of them contain suprabasal cells or clumps of basal cells. Each partition, a so-called pseudo cell, therefore has to be classified as a basal cell or a non-basal cell.

Fig. 5. (a) Extracted basal layer; (b) contrast enhanced nuclei using color deconvolution; (c) thresholded image of (b) using fuzzy divergence; (d) after performing morphological operations on image (c); (e) watershed output over image (d); (f) segmented boundaries of basal cells are superimposed on the extracted basal layer.
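The color deconvolution of Section 2.4.4.1 (OD conversion by Eq. (18), row normalization, and multiplication by the inverse matrix as in Eq. (19)) can be sketched as below. The stain vectors in `STAIN_OD` are illustrative values commonly quoted for haematoxylin, eosin and DAB; they are an assumption for this example and are not the values used in the paper. Pixels are stored here as row vectors, so the product is written as OD·D on the right rather than D·y.

```python
import numpy as np

# ASSUMED illustrative OD vectors (rows: stains; columns: R, G, B).
# These are not taken from the paper.
STAIN_OD = np.array([[0.65, 0.70, 0.29],    # haematoxylin
                     [0.07, 0.99, 0.11],    # eosin
                     [0.27, 0.57, 0.78]])   # third stain (DAB)

def normalise_rows(od_matrix):
    """Divide every stain vector by its Euclidean length (the matrix M)."""
    return od_matrix / np.linalg.norm(od_matrix, axis=1, keepdims=True)

def color_deconvolve(rgb, od_matrix=STAIN_OD, i0=255.0):
    """Return per-pixel stain amounts for an RGB image of shape (..., 3)."""
    rgb = np.asarray(rgb, dtype=float)
    # Eq. (18) per channel; clipping avoids log10(0) on black pixels.
    od = -np.log10(np.clip(rgb, 1.0, i0) / i0)
    D = np.linalg.inv(normalise_rows(od_matrix))   # color-deconvolution matrix
    # Row-vector form of Eq. (19): each pixel's OD row times D.
    return (od.reshape(-1, 3) @ D).reshape(rgb.shape)
```

Under these assumptions, channel 0 of the result plays the role of the haematoxylin plane in which the nuclei appear with high contrast.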

The first step of this classification is to find the neighbors of every pseudo cell and to evaluate the area of each pseudo cell. If the area is not within the threshold, the pseudo cell is either merged or ignored, depending on whether it is part of a cell or of the background, and it is labelled a to-be-merged cell. Further, the shape parameters compactness and variance are evaluated for the to-be-merged cell and its respective neighbors. These features are fuzzy in nature and are evaluated by a trapezoidal membership function. The to-be-merged cell is then merged with the neighbor having the highest membership value. Moreover, the suprabasal layer is eliminated by extracting the lowest cell from the image, as the basal layer is the first layer of the epithelium.

2.5. Basal cell nuclei tracking using GVF snakes

The watershed segmentation gives an initial boundary around each nucleus, which also contains part of the background epithelial region. To segment the exact boundaries of objects, we use an energy-minimizing contour, called a snake (Xu & Prince, 1997), which is guided by external constraint forces and influenced by image forces that pull it towards the edges. The snake provides a powerful interactive tool for image segmentation. We use the contour obtained from the previous segmentation result as the initial contour, and then move this contour closer to the accurate nuclei contour under the influence of internal forces, which depend on the intrinsic properties of the curve, and external forces derived from the image edge data.

To obtain a good segmentation result, Gaussian blur is applied to the image to suppress the noise in the cell image. The edge gradient of the image is computed with the Sobel operator. The flexibility parameter α and the rigidity parameter β were also analysed experimentally. Tests on different cases show that the snake model does not converge well if α is less than 1; hence α is set to 1.2. Parameter β had no effect in any of the cases. The number of iterations of the GVF snake was analysed as well: the segmentation result becomes stable if the number of iterations for gradient computation is greater than 80 in these cases.

Fig. 6(a–e) shows an example of a cell being tracked by the GVF snake. The red lines indicate the moving contour at different points of time. The segmented nuclei are shown in Fig. 7(d–f).

Fig. 6. (a) Basal cell image; (b) gradient image; (c) normalized GVF field; (d) deformation of the contour; and (e) final contour obtained using GVF.

Fig. 7. (a–c) Basal cell nuclei contours tracked using GVF based snakes. (d–f) Segmented nucleus.

3. Feature extraction

The criteria for grading OSF with dysplasia are usually based on the following four types of characteristics: nuclear changes (variation in size and shape), polymorphism (nuclei of the basal layer are elongated and perpendicular to the basement membrane), nuclear irregularity, and hyperchromasia (excessive pigmentation in hemoglobin content of basal cell nuclei). These four types of characteristics are provided by experienced onco-pathologists (Paul et al., 2005) and are usually used for grading OSF with dysplasia. In addition, to facilitate computer processing and image analysis, the onco-pathologists also suggest nuclear texture as the fifth type of characteristic. Then, 23 features based on these five types of characteristics are extracted from oral histopathological images for classification.

The following features are evaluated for each nucleus: (a) area, (b) perimeter, (c) eccentricity, (d) area equivalent diameter, (e) perimeter equivalent diameter, (f) convex area, (g) Zernike moments, and (h) Fourier descriptors, etc. Counting the number of pixels present in the binary image of the nucleus gives the (f1) area, whereas the (f2) perimeter of the nucleus is obtained by counting the number of boundary pixels present in the nucleus. The (f3) form factor is proportional to the area of each nucleus divided by the square of its perimeter (http://www.dentistry.bham.ac.uk/landinig/software/software.html).

(f3) Form factor is mathematically defined as

$$\text{Form factor} = \frac{4\pi \cdot \text{area}}{\text{perimeter}^2}. \tag{20}$$

(f ) Perimeter equivalent diameter (PED) mathematically

area

(f6)

whereby ell

been done by first fitting the nucleus by a minimum bounding rectangle.

For a discrete case such as image, if p(x, y) is the current pixel, the

To cal

    M. Muthu Rama Krishnan et al. / Expert Syperimeter2

    (f4) Area equivalent diameter (AED) mathematically dened as

    unit dthe orare nox y

    culate the Zernike moment, the image is rst mapped to theexpression of Zernike moment becomes

    Amn m 1pXX

    px; yVmnx; y: 30where m = 0, 1, 2,. . . denes order of the moment and f(x, y) is thefunction being described. Here n is an integer that depicting theangular dependence or rotation subject to the following condition:

    m jnj even; jnj 6 m: 26Now, its expression in polar coordinates is

    Vmnr; h Rmnr expjnh: 27Here Rmn is the orthogonal radial polynomial and is dened as

    Rmnr Xmjnj2s0

    1sFm;n; s; r; 28

    and Fm;n; s; r m s!s! mjnj2 s

    ! mjnj2 s

    !rm2s: 29Amn m 1p x yf x; yVmnx; y dxdy; 25where Feret: Largest axis length of minimum bounding rectangle;Breadth: The largest axis perpendicular to the Feret (not necessarilycolinear).

    (f8) Zernike moment.The Zernike polynomials are rst proposed in 1934 by Zernike.Their moment formulation appears to be one of the most pop-ular, outperforming in terms of noise resilience, informationredundancy and reconstruction capability. Complex Zernikemoments are constructed using a set of complex polynomialswhich form a complete orthogonal basis set dened on the unitdisc. They are expressed as Two dimensional Zernike moment(Khotanzad & Hong, 1990):Z ZAspect Ratio FeretBreadth

    ; 24Chaudari and Samal (2007).(f7) Aspect ratioIt is mathematically dened asThe algorithm for minimum bounding rectangle is given inby a minimum bounding rectangle. The ellipse approximation hasa and b indicates major and minor axis. Which are obtainediptical approximation. Each nucleus has been approximatedPerimeter equivalent diameter p

    ; 22

    Eccentricity is calculated by the following equation:

    Eccentricity a2 b2

    pa

    ; 235

    dened as rp

    Area equivalent diameter

    4 area

    r21

    s with Applications 39 (2012) 10621077 1069isc using polar coordinates, where the centre of the image isigin of the unit disc. Those pixels falling outside the unit disct used in the calculation. The coordinates are then described

  • byandrotScathe

    age obance i

    Now t

    quencthe edcompu

    heightrectan

    hu(f13dia

    some structuring element, and bottom-hat transform is the dif-ference between the closing and the input image. Top-hattransform returns an image containing elements that are smal-ler than the structuring element and brighter than their sur-roundings. Bottom-hat transform returns an image containingelements that are smaller than the structuring elements anddarker than their surroundings.

    Spot areas ratio 1n

    Xni1

    1kNik kBNik kDNik

    ; 40

    where B(Ni): the overall size of all bright-spots innucleus Ni; D(Ni):the overallsize of all dark-spots innucleus Ni; Ni: ith nucleus in theimage 1 6 i 6 n.

    107 tem(f14)

    Concav ity : Convex Area-Nuclear Area: 37(f15) Orientation: Angle (in degrees) between the x-axis and themajor axis of the ellipse that has the same second-moments asthe region.(f ) Area Irregularity: The nucleus is rotated so that its major16axiboll.) Roundness: Nuclear area divided by the area of a circle withmeter equal to the length of the major axis.(f11) Convex area: Area of the convex hull (area of the smallestconvex set of pixels containing the entire nuclear object).(f12) Solidity: Nuclear area divided by the area of the convex. R describes the deviation degree of the nuclei to thegle.where FR(u, v) and FI(u, v)are real and imaginary parts of the Fouriertransform of the image respectively, and u and v are the frequenciesalong the x and y axes of the image, respectively. Fourier descriptorsare not invariants to scaling and translation. Scaling and translationinvariant can be achieved using Eqs. (31) and (32).

    (f10) Rectangularity of the nuclei region mathematically denedas

    R AW H : 36

    A stands for the area, W stands for the width, H stands for theu0;v0

    PAC

    XF2Ru;v F2I u; v; 35e, the total number of the descriptors varies as the length ofge changes. Here the AC power of the Fourier descriptor isted as follows:au XK1k0

    skej2puk=K : 34

    If we consider length of DFT of any sequence is same as original se-sk xi jyi: 33he DFT of s(k) iscount for a binary image) is m00 = b, where b is a predetermined va-lue (Khotanzad & Hong, 1990).

    (f9) Fourier descriptorsIn any image (xi, yi) where i = 1, 2, . . . , K represents the edgepoints of an object, Fourier descriptors of that edge can be rep-resented by the following approach. Each point can be treatedas a complex number (Gonzalez & Woods, 2002) so thatject center, causingm01 =m10 = 0. Following this, scale invari-s produced by altering each object so that its area (or pixelmoment

    hx; y f xa x; y

    a y

    where a

    b

    m00

    s31

    and, x m10m00 ; y m01m00

    .Here, m01, m00, m10 are the regular moments

    mpq Xx

    Xy

    xpyqf x; y: 32

    Translation invariance is achieved by moving the origin to the im-the length of the vector from the origin to the coordinate point rthe angle from the x-axis to the vector r. Zernike moments are

    ation invariants but not invariants to scaling and translation.ling and translation invariant can be achieved by transformingpixel coordinate using following rule before applying Zernike

    0 M. Muthu Rama Krishnan et al. / Expert Syss becomes horizontal & is then enclosed by a minimumunding rectangle (MBR). There is at least one intersectingline, and a horizontal line. The area irregularity is given as

    Area Irregularity 1n

    Xni1

    14

    Xnj1

    maxk1...4;kj

    j kSijk kSikk

    j !

    :

    38(f17) Contour irregularity: The contour of the nucleus can be rep-resented by a sequence of k equal spacing sample boundarypoints {p0, p1, p2, . . . , pj1pj, . . . , pk1} with pk = p0 andp1 = pk1. Let pj(w) be the boundary point with a distance ofw pixels from the current point pj. The curvature at point pj isdened as:

    dij tan1yj yjwxj xjw tan

    1 yj1 yj1wxj1 xj1w ; d

    i1 dik1

    Therefore, contour irregularity is dened as

    Contour Irregularity 1k

    Xk1j0

    jdij dij1j !

    ; 39

    (f18) Spot areas ratio: Pigmentation is an important characteris-tic appearing in a malignant tumor. In our system, the brightand dark spots can be detected by top-hat and bottom-hattransforms, respectively, on nuclei using a disk shape structur-ing element of radius 5 (Huang & Lai, 2010). Top-hat transformis the difference between an input image and its opening bypoint between a nucleus and each side of its MBR as shown inFig. 8. If there are two or more intersecting points at one side,the middle one is selected as the representative intersectingpoint (Huang & Lai, 2010). Then, a nucleus is partitioned intofour parts as follows. If an intersecting point is on a vertical sideof theMBR, a horizontal cutting line will go through this point. Ifan intersecting point is on a horizontal side of theMBR, a verticalcutting line will go through this point. Consequently, four possi-bly overlapping areas S1, S2, S3, and S4 will be formed with eacharea surrounded by a segment of nucleuss boundary, a vertical

    P1

    P2

    P3

    P4

    P1

    P2

    P3

    P4

    (a) (b)Fig. 8. Area irregularity (a) Round nucleus. (b) Irregular nucleus.
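    Several of the shape descriptors defined above depend only on a few scalar measurements of a segmented nucleus. The following is a minimal sketch, not the authors' implementation, assuming the area, perimeter, ellipse axes, bounding-rectangle size and convex area have already been measured; all function names are hypothetical.

```python
import math


def form_factor(area, perimeter):
    """Eq. (20): 4*pi*area / perimeter**2 (1.0 for an ideal circle)."""
    return 4.0 * math.pi * area / perimeter ** 2


def area_equivalent_diameter(area):
    """Eq. (21): diameter of the circle that has the same area."""
    return math.sqrt(4.0 * area / math.pi)


def eccentricity(a, b):
    """Eq. (23): sqrt(a**2 - b**2) / a, with a >= b > 0 the major/minor axes."""
    return math.sqrt(a * a - b * b) / a


def rectangularity(area, width, height):
    """Eq. (36): area divided by the area of the bounding rectangle."""
    return area / (width * height)


def concavity(convex_area, area):
    """Eq. (37): convex-hull area minus nuclear area."""
    return convex_area - area
```

    For an ideal circle the form factor is 1 and the area equivalent diameter equals the true diameter, which gives a quick sanity check on any measured nucleus.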

    Texture features: Haralick's texture features (Haralick, Shanmugan, & Dinstein, 1973) are calculated using the gray-level co-occurrence matrix. This matrix is square with dimension $N_g$, where $N_g$ is the number of gray levels in the image. Element [i, j] of the matrix is generated by counting the number of times a pixel with value i is adjacent to a pixel with value j and then dividing the entire matrix by the total number of such comparisons made. Each entry is therefore considered to be the probability that a pixel with value i will be found adjacent to a pixel of value j. Four statistics, namely (f19) contrast, (f20) correlation, (f21) homogeneity and (f22) energy, are calculated from the co-occurrence matrices computed using the offsets (1, 0); (−1, 0); (0, 1); (0, −1). Thus

    $$\text{Contrast} = \sum_{i,j} |i - j|^2\, p(i, j), \tag{41}$$

    $$\text{Correlation} = \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j)\, p(i, j)}{\sigma_i \sigma_j}, \tag{42}$$

    $$\text{Homogeneity} = \sum_{i,j} \frac{p(i, j)}{1 + |i - j|}, \tag{43}$$

    $$\text{Energy} = \sum_{i,j} p(i, j)^2. \tag{44}$$

    (f23) Hyperchromatism: Hyperchromatism represents excessive pigmentation in hemoglobin content of basal cell nuclei (Huang & Lai, 2010). It is an important characteristic appearing in a malignant tumor. For the case of severe dysplasia, chromatin abnormality will result in increasing staining capacity of nuclei. Thus, the intensity of a nucleus in severe dysplasia usually appears darker than that of a normal nucleus. To find the hyperchromatism, the mean intensity of nuclei (MNI) is calculated as follows:

    $$\text{Mean intensity of nuclei} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{\|N_i\|} \sum_{\forall (x, y) \in N_i} N_i(x, y) \right), \tag{45}$$

    where n: total number of nuclei; $N_i$: ith nucleus in the image, $1 \le i \le n$.

    4. Unsupervised feature selection

    All extracted features are checked for possibly highly correlated features. This process assists in removing any bias towards certain features which might afterwards affect the classification procedure. Mitra, Murthy, and Pal (2002) proposed a feature-similarity measure between two random variables based on linear dependency, called the maximal information compression index. Let $\Sigma$ be the covariance matrix of random variables x and y. Define the maximal information compression index as $\lambda_2(x, y)$ = smallest eigenvalue of $\Sigma$, i.e.,

    $$2\lambda_2(x, y) = \operatorname{var}(x) + \operatorname{var}(y) - \sqrt{\left(\operatorname{var}(x) + \operatorname{var}(y)\right)^2 - 4 \operatorname{var}(x) \operatorname{var}(y) \left(1 - \rho(x, y)^2\right)}. \tag{46}$$

    The value of $\lambda_2$ is zero when the features are linearly dependent and increases as the amount of dependency decreases. It may be noted that the measure $\lambda_2$ is nothing but the eigenvalue for the direction normal to the principal component direction of the feature pair (x, y). It is shown that maximum information compression is achieved if multivariate data are projected along their principal component direction. The corresponding loss of information in reconstruction of the pattern (in terms of second order statistics) is equal to the eigenvalue along the direction normal to the principal component. Hence, $\lambda_2$ is the amount of reconstruction error committed if the data are projected to a reduced dimension in the best possible way. Therefore, it is a measure of the minimum amount of information loss or the maximum amount of information compression. The feature selection involves two steps, namely, partitioning the original feature set into a number of homogeneous subsets (clusters) and selecting a representative feature from each such cluster. Partitioning of the features is done based on the k-NN principle using the maximal information compression index. In doing so, we first compute the k nearest features of each feature. Among them, the feature having the most compact subset (as determined by its distance to the farthest neighbor) is selected, and its k neighboring features are discarded. This process is repeated for the remaining features until all of them are either selected or discarded.

    While determining the k nearest-neighbors of features, we assign a constant error threshold ($\varepsilon$) which is set equal to the distance of the kth nearest-neighbor of the feature selected in the first iteration. In subsequent iterations, we check whether the $\lambda_2$ value corresponding to the subset of a feature is greater than $\varepsilon$ or not. If yes, then we decrease the value of k.
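    The maximal information compression index of Eq. (46) can be evaluated directly from the sample variances and covariance of two features. The sketch below is an illustration under the assumption of population (divide-by-N) statistics; the function name is hypothetical, and it uses the identity var(x) var(y) (1 − ρ²) = var(x) var(y) − cov(x, y)².

```python
import math


def max_info_compression_index(x, y):
    """Smallest eigenvalue of the 2x2 covariance matrix of (x, y), Eq. (46)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # var_x * var_y * (1 - rho**2) equals var_x * var_y - cov_xy**2
    disc = (var_x + var_y) ** 2 - 4.0 * (var_x * var_y - cov_xy ** 2)
    # Clamp tiny negative values caused by floating-point rounding
    return 0.5 * (var_x + var_y - math.sqrt(max(disc, 0.0)))
```

    Two linearly dependent features give an index of zero, so one of them can be discarded with no loss of second-order information.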

    5. k-Fold cross validation

    In k-fold cross validation, the data set is divided into k subsets. Each time, one of the k subsets is used as the test set and the other k − 1 subsets are put together to form a training set. The advantage of this method is that it matters less how the data gets divided. Every data point gets to be in a test set exactly once, and gets to be in a training set k − 1 times. The variance of the resulting estimate is reduced as k is increased. The disadvantage of this method is that the training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation. A variant of this method is to randomly divide the data into a test and training set k different times. The advantage of doing this is that we can independently choose how large each test set is and how many trials we average over (http://www.cs.cmu.edu/schneide/tut5/node42.html).
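    The partitioning described above can be sketched as follows. This is a plain contiguous split, given only as an illustration: it assumes the samples have already been shuffled, it does not stratify by class, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional test sample
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

    With 1194 samples and k = 10 this yields four test folds of 120 samples and six of 119, and every sample appears in exactly one test fold.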

    6. Basal cell nuclei classification

    The performance of our automatic basal cell nuclei classification system in this study is evaluated by two supervised and three unsupervised classifiers: the Bayesian classifier, the support vector machine (SVM) classifier, the k-means, the fuzzy c-means and GMM clustering.

    6.1. Bayesian classification

    Bayesian classification is based on probability theory, and the fundamental approach to the problem of classification is Bayes decision theory (Duda, Hart, & Stork, 2007). The principle of the decision is to choose the most probable or the lowest risk (expected cost) option. The feature vector $x = [x_1, x_2, \ldots, x_d]$ is assumed to be generated by a d-dimensional Gaussian process having ensemble mean $\mu$ and covariance matrix $\Sigma$. Such a process is represented using the probability density function given by

    $$p(x_i \mid \lambda_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) \right\}. \tag{47}$$

    The posterior probability of such a process is computed by Bayes rule,

    $$P(k \mid x_n) = \frac{\alpha_k\, p(x_n \mid \lambda_k)}{\sum_{j=1}^{c} \alpha_j\, p(x_n \mid \lambda_j)}, \tag{48}$$

    where c is the number of classes present in the data and $\alpha_j$ is the jth class prior probability (> 0). Here we have c = 2, viz., normal and OSF without dysplasia. In order to make a Bayesian decision, the following classification rule is adopted:

    If $P(k = 1 \mid x_n) > P(k = 2 \mid x_n)$ then $x_n \in$ Normal class, else $x_n \in$ OSF class.
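    The decision rule above can be illustrated with a small sketch. To keep it dependency-free, it assumes a diagonal covariance matrix for each class, which is a simplification of the full covariance in Eq. (47); the function names and the parameter layout are hypothetical.

```python
import math


def gaussian_pdf_diag(x, mean, var):
    """Gaussian density of Eq. (47), restricted to a diagonal covariance."""
    log_p = -0.5 * len(x) * math.log(2.0 * math.pi)
    for xi, mi, vi in zip(x, mean, var):
        log_p -= 0.5 * math.log(vi) + 0.5 * (xi - mi) ** 2 / vi
    return math.exp(log_p)


def posteriors(x, priors, means, variances):
    """Posterior class probabilities by Bayes rule, Eq. (48)."""
    joint = [a * gaussian_pdf_diag(x, m, v)
             for a, m, v in zip(priors, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def classify(x, priors, means, variances):
    """Class 1 is taken as Normal and class 2 as OSF, as in the rule above."""
    post = posteriors(x, priors, means, variances)
    return "Normal" if post[0] > post[1] else "OSF"
```

    With equal priors and equal variances the rule reduces to assigning a pattern to the class with the nearer mean.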

    6.2. Support vector machine classification

    The support vector machine classifier (El-Naqa, Yang, Wernick, Galatsanos, & Nishikawa, 2002; Vapnik, 1998) is based on the idea of margin maximization, and it can be found by solving the following optimization problem:

    $$\min\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i,\; i = 1, \ldots, l,\; \xi_i \ge 0. \tag{49}$$

    The decision function for linear SVMs is given as $f(x) = w^T x + b$. In this formulation, we have the training data set $\{x_i, y_i\}$, $i = 1, \ldots, l$, where $x_i \in R^n$ are the training data points or the tissue sample vectors, $y_i$ are the class labels, l is the number of samples and n is the number of features in each sample. By solving the optimization problem (49), i.e., by finding the parameters w and b for a given training set, we are effectively designing a decision hyperplane over an n-dimensional input space that produces the maximal margin in the space. Generally, the optimization problem (49) is solved by changing it into the dual problem below:

    $$\max\; L_d(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j x_i^T x_j \tag{50}$$

    $$\text{subject to} \quad 0 \le \alpha_i \le C,\; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0. \tag{51}$$

    In this setting, one needs to maximize the dual objective function $L_d(\alpha)$ with respect to the dual variables $\alpha_i$ only, subject to the box constraints $0 \le \alpha_i \le C$. The optimization problem can be solved by various established techniques for solving general quadratic programming problems with inequality constraints.
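    As a small illustration of the dual formulation, the sketch below evaluates the dual objective of Eq. (50) and checks the constraints of Eq. (51) for a given set of dual variables. It does not solve the quadratic program. The weight recovery w = Σ α_i y_i x_i is the standard stationarity condition for linear SVMs and is not stated in the text; all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, y):
    """Value of L_d(alpha) in Eq. (50) for a linear kernel."""
    n = len(alpha)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            quad += y[i] * y[j] * alpha[i] * alpha[j] * dot(X[i], X[j])
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, y, C, tol=1e-9):
    """Box constraints and equality constraint of Eq. (51)."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    return in_box and balanced


def weight_vector(alpha, X, y):
    """w = sum_i alpha_i * y_i * x_i (standard result, assumed here)."""
    return [sum(alpha[i] * y[i] * X[i][k] for i in range(len(X)))
            for k in range(len(X[0]))]
```

    For the two one-dimensional points x = 1 (label +1) and x = −1 (label −1), the dual variables α = (0.5, 0.5) are feasible and give w = 1, so the separating hyperplane passes through the origin.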

    6.3. k-means clustering

    The k-means clustering algorithm initially assumes k centroids (in our case k = 2). Based on the initial centroids, it assigns a cluster label to each pattern (consisting of 18 features) based on the minimum Euclidean distance (MacQueen, 1967). Based on these labels, the centroids are re-estimated as the average of all the patterns belonging to that class at that iteration. The convergence criterion is that the total mean squared error (MSE) should be below a threshold. The iterations are continued until the total MSE is below the threshold. The k-means clustering minimizes the following objective function:

    $$J = \sum_{k=1}^{K} \sum_{i=1}^{N} \|x_i - c_k\|^2, \tag{52}$$

    where $x_i$ is the ith pattern and $c_k$ is the kth centroid.
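    The assignment and re-estimation loop described above can be sketched as below. As an assumption for this illustration, the loop stops when the total squared error no longer changes, rather than when it falls below a fixed threshold; the function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(patterns, initial_centroids, tol=1e-9, max_iter=100):
    """Return (centroids, labels) after alternating assignment and update."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    previous_error = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [min(range(len(centroids)),
                      key=lambda k: sq_dist(x, centroids[k]))
                  for x in patterns]
        # Update step: each centroid becomes the mean of its members
        for k in range(len(centroids)):
            members = [x for x, lab in zip(patterns, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        error = sum(sq_dist(x, centroids[lab]) for x, lab in zip(patterns, labels))
        if previous_error is not None and abs(previous_error - error) <= tol:
            break
        previous_error = error
    return centroids, labels
```

    On two well-separated groups of points the loop settles after the second pass, with each centroid at the mean of its group.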

    6.4. Fuzzy c-means clustering

    The fuzzy c-means clustering algorithm optimizes (Bezdek, 1981) the following objective function:

    $$J = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ji}^{m} \|x_i - V_j\|^2, \tag{53}$$

    where $u_{ji}$ is the fuzzy membership, with m as the weighting exponent, with which pattern $x_i$ associates with the cluster j having centroid $V_j$. The fuzzy membership has the property that

    $$\sum_{j=1}^{c} u_{ji} = 1 \quad \forall i. \tag{54}$$

    The algorithm works in almost the same manner as the k-means algorithm. The update equations for the cluster center and the fuzzy membership are as follows:

    $$V_j^{new} = \frac{\sum_{i=1}^{N} u_{ji}^{m} x_i}{\sum_{i=1}^{N} u_{ji}^{m}}, \tag{55}$$

    $$u_{ji}^{new} = \frac{\left( \dfrac{1}{\|x_i - V_j^{new}\|} \right)^{\frac{2}{m-1}}}{\sum_{l=1}^{c} \left( \dfrac{1}{\|x_i - V_l^{new}\|} \right)^{\frac{2}{m-1}}}. \tag{56}$$

    The iterations are stopped when $\|U^{new} - U\|_F < \varepsilon$, where $\varepsilon$ is a predefined small real number and $U = \{u_{ji};\; 1 \le j \le c;\; 1 \le i \le N\}$.

    For different weighting exponents, it is possible to get different clustering accuracies.
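    One pass of the update equations (55) and (56) can be sketched as follows. This is an illustrative sketch only: the membership matrix is stored as one row per cluster, the handling of a pattern that coincides with a center is an added assumption, and the function names are hypothetical.

```python
import math


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fcm_step(patterns, memberships, m=2.0):
    """Apply Eq. (55) then Eq. (56); memberships[j][i] is u_ji."""
    c, n, d = len(memberships), len(patterns), len(patterns[0])
    centers = []
    for j in range(c):
        weights = [memberships[j][i] ** m for i in range(n)]
        total = sum(weights)
        centers.append([sum(weights[i] * patterns[i][k] for i in range(n)) / total
                        for k in range(d)])
    new_u = [[0.0] * n for _ in range(c)]
    for i in range(n):
        dists = [math.sqrt(sq_dist(patterns[i], centers[j])) for j in range(c)]
        zero = [j for j, dist in enumerate(dists) if dist == 0.0]
        if zero:
            # Assumption: a pattern lying on a center belongs fully to it
            for j in zero:
                new_u[j][i] = 1.0 / len(zero)
        else:
            inv = [(1.0 / dist) ** (2.0 / (m - 1.0)) for dist in dists]
            total = sum(inv)
            for j in range(c):
                new_u[j][i] = inv[j] / total
    return centers, new_u
```

    Repeating this step until the membership matrix stops changing gives the stopping rule stated above; after every step the memberships of each pattern still sum to one, as required by Eq. (54).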

    6.5. Gaussian mixture model based clustering

    Here we have a binary class problem of classification of normal and OSF with dysplasia cases. The GMM (Bilmes, 1998) assumes that the features are drawn from a normal distribution. We have two mixing components corresponding to the normal and OSF classes, respectively. Therefore we have two class conditional densities, $p(x_n \mid \omega_k)$, $1 \le k \le 2$ and $1 \le n \le N$, where k indexes the classes and N is the total number of observations or patterns, and the corresponding class prior probabilities, $p(\omega_k)$, $1 \le k \le 2$. Each of the two mixing components has a mean vector and a covariance matrix. Since we have applied an orthogonal transformation in a compactly supported basis, the off-diagonal elements in the covariance matrix are all approximately zero, since the data will be highly uncorrelated. The probability density function of such a model is given by

    $$p(x_n \mid \omega_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x_n - \bar{x}_k)^T \Sigma_k^{-1} (x_n - \bar{x}_k) \right\}, \tag{57}$$

    $$\text{where} \quad \bar{x}_k = \frac{1}{|\Omega_k|} \sum_{x_n \in \omega_k} x_n, \tag{58}$$

    $$\text{and} \quad \Sigma_k = \frac{1}{|\Omega_k|} \sum_{x_n \in \omega_k} (x_n - \bar{x}_k)(x_n - \bar{x}_k)^T = \operatorname{diag}(\sigma_i^2),\; 1 \le i \le d. \tag{59}$$

    The corresponding posterior probabilities are given by Bayes rule as follows:

    $$P(\omega_k \mid x_i) = \frac{p(\omega_k)\, p(x_i \mid \omega_k)}{\sum_{j=1}^{2} p(\omega_j)\, p(x_i \mid \omega_j)}. \tag{60}$$

    Since our data consist of missing observations, or do not represent the whole of the sample space, the mean vectors and the covariance matrices computed are not the correct ones. Therefore the means and variances are recomputed using the Expectation Maximization (EM) algorithm and the maximum likelihood estimation method. The re-estimating formulae are the following:

    $$\hat{\mu}_j = \frac{\sum_{i=1}^{N} x_i\, P(\omega_j \mid x_i)}{\sum_{i=1}^{N} P(\omega_j \mid x_i)}, \tag{61}$$

    $$\hat{\sigma}_j^2 = \frac{\sum_{i=1}^{N} (x_i - \hat{\mu}_j)^2\, P(\omega_j \mid x_i)}{\sum_{i=1}^{N} P(\omega_j \mid x_i)}, \tag{62}$$

    $$\hat{p}(\omega_j) = \frac{1}{N} \sum_{i=1}^{N} P(\omega_j \mid x_i). \tag{63}$$

    The initial prior probability is taken to be 0.5 for each of the classes. An initial model is assumed from the data. The EM algorithm used has two core steps: the E step and the M step. During the E step, the class conditional density is computed according to Eq. (57), and from it the posterior density is computed according to Eq. (60). During the M step, the class model is re-estimated according to Eqs. (61)–(63). The process is continued until the new estimate does not change much from the previous estimate and the model gets stabilized. Then the EM based GMM is said to have converged. The logarithm of the class conditional density, called the log-likelihood, is computed for each iteration, and it stops increasing at convergence.

    The GMM algorithm is an optimization problem which maximizes the following objective function:

    $$J = \prod_n \sum_k p(\omega_k)\, p(x_n \mid \omega_k). \tag{64}$$

    The converged centroids are such that the product, over all the observations, of the total class conditional densities weighted with the respective prior probabilities is maximized. The EM algorithm determines its new estimate such that it approaches the optimum of the objective function, so that the algorithm converges. GMM is an iterative algorithm which can be performed in O(ndkT) floating point operations, where n is the number of patterns, d is the total number of features in a pattern, k is the total number of classes present in the data, and T is the number of iterations required for convergence of the algorithm.

    6.6. Performance analysis

    In practice, each of the classifiers is required to be evaluated in order to compare their sensitivity and specificity along with overall accuracy. In view of this, the following confusion matrix (see Table 1) is usually designed based on the trade-off between actual and classifier generated outputs.

    Table 1
    A 2 × 2 confusion matrix for performance evaluation.

    Classifier output    Patients with OSF (as confirmed on biopsy)
                         Negative (absent)    Positive (present)
    Negative             TN                   FN
    Positive             FP                   TP

    where

    TP: True Positive: A patient predicted with OSF when the subject actually has OSF.
    TN: True Negative: A patient predicted healthy when the subject actually is healthy.
    FP: False Positive: A patient predicted with OSF when the subject actually is healthy.
    FN: False Negative: A patient predicted healthy when the subject actually has OSF.

    Sensitivity: It is a measure of accuracy of diagnosis of malignant (true) cases of OSF. Mathematically, it is defined as

    $$\text{Sensitivity} = \frac{TP}{TP + FN}\,\%. \tag{65}$$

    Specificity: It is a measure of accuracy of diagnosis of benign (false) cases of OSF. Mathematically, it is defined as

    $$\text{Specificity} = \frac{TN}{FP + TN}\,\%. \tag{66}$$

    Overall accuracy: The overall accuracy of a test is the measure of true findings (true-positive + true-negative results) divided by all test results. This is also termed the efficiency of the test.

    $$\text{Overall accuracy} = \frac{TP + TN}{TP + FP + FN + TN}\,\%. \tag{67}$$

    7. Results and discussion

    The basal cell nuclei boundaries are overlaid on the extracted basal layer (shown in Fig. 9) of the H&E image shown in Fig. 5(a). Fig. 7(a–c) shows some of the extracted cells after performing fuzzy classification for identifying cells of NOM and OSF, respectively. The segmented nuclei of the cells, obtained using GVF, are shown in Fig. 7(d–f). Fig. 10(a) and (c) shows the segmented normal and dysplastic basal cells. Fig. 10(b) and (d) shows the respective segmented nuclei, which show that the normal nucleus has taken much less stain compared to the dysplastic nucleus. This is due to hyperchromatism.
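    The sensitivity, specificity and overall accuracy defined in Section 6.6 follow directly from the four confusion-matrix counts. A minimal sketch is given below; the function name is hypothetical and the counts in the usage line are made-up values for illustration, not results from this study.

```python
def screening_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, overall accuracy) in percent."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (fp + tn)
    accuracy = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 90 true positives, 80 true negatives,
# 20 false positives and 10 false negatives
print(screening_metrics(tp=90, tn=80, fp=20, fn=10))
```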

    The features are extracted from the segmented basal cell nuclei (Fig. 7(d–f)). Here we have 771 nuclei for normal and 423 nuclei for OSF with dysplasia.

    The features of normal and OSF are summarized as mean and standard deviation (Table 2). The results suggest that 18 features, all except eccentricity, solidity, rectangularity, orientation and contour irregularity, are significant in discriminating the normal and OSF groups using unsupervised feature selection. An advantage of using unsupervised feature selection for inspecting feature separability is that the algorithm is generic in nature and has the capability of multiscale representation of the data sets. Fig. 11 shows the plot between feature index and feature weights of the unsupervised feature selection between the normal and OSF without dysplasia groups. Feature weights are basically the distance of k-NN for each feature. Moreover, the plot indicates the significance of each feature in discriminating the two groups.

    Further, the numeric values of most of the features increase steadily from normal to OSF with dysplasia. The nucleus area of the dysplastic cells is twice as large as that of the normal cells. The increase in nucleus area in this study may be a reflection of the increase in DNA synthesis. The changes occurring in the basal cell nuclei might indicate an increased metabolic activity prior to the invasion of the underlying connective tissues. The mean intensity of the nucleus in severe dysplasia usually appears darker than that of the normal nucleus (Fig. 10(b) and (d)), which can be inferred from the results. In the normal case the intensity value is 24.98, which indicates that the stain taken by the nucleus is less, but in OSF with dysplasia the intensity value is 18.69, which indicates that the stain taken by the nucleus is high. This is due to hyperchromatism, i.e., excessive pigmentation in hemoglobin content of basal cell nuclei. It is an important characteristic appearing in a malignant tumor. For the case of severe dysplasia, chromatin abnormality results in increasing staining capacity of nuclei.

    Fig. 9. Segmented boundaries of basal cells are superimposed on the extracted basal layer.

    Table 2
    Features extracted from nucleus of normal and OSF basal cells.

    Sl. no   Nucleus features                   Normal μ     Normal σ     OSF with dysplasia μ   OSF with dysplasia σ
    1        Area                               7.93         1.75         13.79                  2.07*
    2        Perimeter                          9.31         1.15         12.50                  1.13*
    3        Eccentricity                       0.89         0.12         0.88                   0.14
    4        Fourier descriptors                1.01e+013    7.38e+012    8.21e+013              5.17e+013*
    5        Zernike moments (m = 1; n = 3)     2.38         0.32         2.44                   0.42*
    6        Area equivalent diameter           3.16         0.36         4.18                   0.31*
    7        Perimeter equivalent diameter      2.52         0.56         4.39                   0.66*
    8        Form factor                        1.14         0.08         1.11                   0.09*
    9        Convex area                        8.22         1.82         14.34                  2.19*
    10       Solidity                           0.96         0.01         0.96                   0.02
    11       Roundness                          0.68         0.12         0.70                   0.12*
    12       Concavity                          0.29         0.13         0.55                   0.26*
    13       Orientation                        1.73         60.21        1.17                   60.09
    14       Aspect ratio                       10.25        2.26         17.82                  2.69*
    15       Rectangularity                     0.77         0.01         0.77                   0.01
    16       Area irregularity                  1.45         0.69         2.42                   1.23*
    17       Contour irregularity               0.25         0.07         0.25                   0.07
    18       Spot areas ratio                   1.59         0.07         1.56                   0.07*
    19       Contrast                           0.09         0.03         0.10                   0.03*
    20       Correlation                        0.98         0.01         0.97                   0.01*
    21       Homogeneity                        0.64         0.07         0.59                   0.06*
    22       Energy                             0.91         0.03         0.94                   0.02*
    23       Mean nuclei intensity              18.69        5.71         24.98                  7.03*

    * Significant based on feature weights.

    Fig. 10. (a) Segmented normal basal cell; (b) less intense normal basal cell nucleus; (c) segmented dysplastic basal cell; (d) high intense dysplastic basal cell nucleus.

    Fig. 11. Plot between feature indexes vs. feature weights for showing significance of features.

    Fig. 12(a) shows the box plot for one of the features, area of nucleus, which suggests that the median of the feature is almost the same as the mean, so the chance of outliers contributing to the higher difference between the two classes can be neglected. Fig. 12(b) shows the density plot of perimeter for normal and OSF with dysplasia cases, which shows the distinct discrimination between the two groups. The 3D scatter plot in Fig. 12(c) shows that the features Zernike moments, Fourier descriptors and area equivalent diameter are quite separable from the discrimination point of view; from this we can infer that a simple linear classifier can achieve higher accuracy.

    Fig. 12. (a) Box plot for area; (b) density plot of perimeter for normal and OSF with dysplasia; (c) 3D plot of features for normal and OSF with dysplasia cases.

    We have evaluated the performance of the OSF screening system using 341 normal and 429 OSF with dysplasia biopsy images of size 1388 × 1040 pixels obtained from more than 20 patients. To establish the ground truth, biopsy images are commonly graded by a group of experienced pathologists. Before feature extraction, nuclei segmentation must be performed. Fig. 6 shows examples of successful nuclei segmentation.

    To evaluate the performance of our screening system, we used 1194 nuclei images, of which 771 are normal nuclei and 423 are OSF with dysplasia nuclei images. In our study we have used k-fold cross validation for training/testing data partitioning. The advantage of doing this is that we can independently choose how large each test set is and how many trials we average over (Schneider 1997). In our study the number of cases (normal: 771, OSF with dysplasia: 423) is divided into 10 folds; the size of each fold is not the same, as shown in Table 3.

    Table 3
    Stratified 10-fold cross validation of the given data set.

    Fold       Size of training set   Size of testing set
    Fold#1     1075                   119
    Fold#2     1074                   120
    Fold#3     1074                   120
    Fold#4     1074                   120
    Fold#5     1074                   120
    Fold#6     1075                   119
    Fold#7     1075                   119
    Fold#8     1075                   119
    Fold#9     1075                   119
    Fold#10    1075                   119

    Here we have employed two supervised classifiers, viz., Bayesian and SVM, and three unsupervised classifiers, viz., k-means, FCM and GMM, to evaluate the screening system using 18 features. The best overall performance (99.66%) is obtained with 10-fold cross validation using the SVM classifier. The corresponding sensitivity (99.74%) and specificity (99.53%) are also sufficiently high. The supervised classifiers' results are listed in Table 4. In the case of the Bayesian classifier we have obtained 96.56% overall performance. The corresponding sensitivity is 96.43% and specificity is 96.62%. Fig. 13(a–c) shows the sensitivity, specificity and accuracy plots over the 10 folds. With SVM we have observed that both sensitivity and specificity are more than 99% in all 10 folds consistently, but with the Bayesian classifier there is a drastic reduction in sensitivity, specificity and accuracy in the 7th fold; all other folds are more than 90%.

    Table 4
    Performance measure for supervised classifiers.

    Classifier   Average sensitivity (%)   Average specificity (%)   Average accuracy (%)
    Bayesian     96.43                     96.62                     96.56
    SVM          99.74                     99.53                     99.66

    Fig. 13. (a) Sensitivity plot for SVM and Bayesian classifiers over 10-fold; (b) specificity plot for SVM and Bayesian classifiers over 10-fold; (c) accuracy plot for SVM and Bayesian classifiers over 10-fold.

    The classification accuracy is listed in Table 5 for all three unsupervised classifiers, i.e., k-means, FCM and GMM; among them GMM performs well. The best overall performance (90.37%) is obtained using the GMM classifier. The corresponding sensitivity (89.62%) and specificity (91.73%) are also sufficiently high. The GMM is trained to classify the data, and the log-likelihood converges while estimating the model parameters. The log-likelihood plot is given in Fig. 14. It converges in seven iterations and becomes stable.

    Table 5
    Performance measure for unsupervised classifiers.

    Classifier   Sensitivity (%)   Specificity (%)   Accuracy (%)
    k-Means      84.44             83.22             84.00
    FCM          90.14             88.18             89.45
    GMM          89.62             91.73             90.37

    Fig. 14. Log-likelihood values of GMM classifier during training over iterations.

    From the above results (Tables 4 and 5), we conclude that the SVM obtains very promising results in classifying the possible OSF patients. We believe that the proposed system can be very helpful to onco-pathologists for the final decision on their patients. By using such an efficient tool, they can make very accurate decisions.

    8. Conclusion

    Accurate screening for OSF biopsy images is important to prognosis and treatment planning. Visual grading by humans is time-consuming, subjective, and inconsistent, while computerized analysis for OSF biopsy images is a very complex task requiring a lot of appropriate image processing steps and experts' domain knowledge for correct screening.

    In this paper, we propose an automated system for screeningOSF biopsy images. In image preprocessing, a median lteringmethod is proposed to remove noise. Initially basal layer extractedfrom histopathological images using various steps viz., fuzzy diver-gence based thresholding subsequently morphological operationsto nd the lower boundary of the basal layer and parabola tting.Further, nuclei are extracted from these cells using color deconvo-lution, marker-controlled watershed transform and GVF activecontour method, such a hybrid approach is robust in terms ofremoving noise and preserving shapes of nuclei in OSF biopsyimages. In feature extraction, 23 features are extracted from seg-mented biopsy images according to ve types of OSF characteris-tics including nuclear changes (variation in size and shape,polymorphism (nuclei of the basal layer are elongated and perpen-dicular to basement membrane), nuclear irregularity, hyperchro-masia (excessive pigmentation in hemoglobin content of basalcell nuclei) and nuclear texture. These features comprise both localand global characteristics so that normal and OSF with dysplasiacan be distinguished effectively. In classication, unsupervised fea-ture selection method is used to select an optimal feature subset(18 features) from the 23 features for the supervised and unsuper-vised classiers.

    plot for SVM and Bayesian classiers over 10-fold; (c) accuracy plot for SVM and
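    Among the steps summarized above, the parabola fitting of the lower basal-layer boundary is the simplest to illustrate. The following Python sketch fits y = a*x^2 + b*x + c to boundary pixel coordinates by linear least squares. The function names `fit_parabola` and `evaluate_parabola`, the synthetic boundary points and the use of NumPy are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import numpy as np


def fit_parabola(cols, rows):
    """Least-squares fit of rows ~ a*cols**2 + b*cols + c.

    cols, rows: 1-D sequences of boundary pixel coordinates (hypothetical
    input; in the paper such points come from the thresholded and
    morphologically processed basal-layer boundary).
    Returns the coefficients (a, b, c).
    """
    cols = np.asarray(cols, dtype=float)
    rows = np.asarray(rows, dtype=float)
    if cols.size < 3:
        raise ValueError("at least three boundary points are needed")
    # Design matrix with columns [x^2, x, 1]; lstsq solves the
    # linear least-squares problem for the three coefficients.
    design = np.column_stack([cols**2, cols, np.ones_like(cols)])
    coeffs, *_ = np.linalg.lstsq(design, rows, rcond=None)
    return tuple(coeffs)


def evaluate_parabola(coeffs, cols):
    """Evaluate the fitted parabola at the given column positions."""
    a, b, c = coeffs
    cols = np.asarray(cols, dtype=float)
    return a * cols**2 + b * cols + c


# Synthetic, noisy boundary (illustrative values only, not data from the paper).
rng = np.random.default_rng(0)
x = np.arange(0, 200, 5, dtype=float)
true_coeffs = (0.002, -0.4, 150.0)
y = evaluate_parabola(true_coeffs, x) + rng.normal(scale=1.0, size=x.size)

a_hat, b_hat, c_hat = fit_parabola(x, y)
print(f"fitted a={a_hat:.4f}, b={b_hat:.3f}, c={c_hat:.1f}")
```

    A least-squares fit is used here because it tolerates the pixel-level noise that remains on the boundary after thresholding; other fitting strategies would serve equally well in this sketch.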

    The major contribution of this study is an efficient and effective automated screening system for OSF biopsy images that combines several methods for image preprocessing, segmentation, feature extraction and image classification. The system is effective in that the experimental results show an average accuracy of 99.66% on a set of 341 normal and 429 OSF with dysplasia images obtained from more than 20 patients. This paper also defines a compact set of 18 features whose quantitative measurements are particularly useful for screening. The best accuracy is 99.66% with the SVM classifier and 90.37% with the GMM classifier, which we attribute to the careful selection of the feature subset. We believe that the proposed system can be very helpful to onco-pathologists in reaching their final decisions about their patients.
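    The sensitivity, specificity and accuracy figures quoted above are averages over the folds of a stratified 10-fold cross validation. The Python sketch below shows how such averages can be computed. It is a minimal illustration under stated assumptions: the data are synthetic, OSF is arbitrarily coded as the positive class (label 1), all function names are hypothetical, and a simple nearest-centroid rule stands in for the SVM and Bayesian classifiers used in the paper, so it does not reproduce the values in Tables 3 and 4.

```python
import numpy as np


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k test folds with roughly equal class
    proportions. Assumes every class has at least k samples."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, sample in enumerate(idx):
            folds[pos % k].append(int(sample))
    return [np.array(sorted(f), dtype=int) for f in folds]


def screening_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) as fractions.
    Assumes both classes occur in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / y_true.size
    return float(sensitivity), float(specificity), float(accuracy)


def nearest_centroid_predict(x_train, y_train, x_test):
    """Stand-in classifier (NOT the paper's SVM or Bayesian classifier):
    assign each test sample to the class with the closest mean vector."""
    classes = np.unique(y_train)
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(x_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def cross_validate(x, y, k=10, seed=0):
    """Average sensitivity, specificity and accuracy over k stratified folds."""
    per_fold = []
    all_idx = np.arange(y.size)
    for test_idx in stratified_folds(y, k, seed):
        train_idx = np.setdiff1d(all_idx, test_idx)
        y_pred = nearest_centroid_predict(x[train_idx], y[train_idx], x[test_idx])
        per_fold.append(screening_metrics(y[test_idx], y_pred))
    return np.mean(per_fold, axis=0)


# Synthetic two-class data with 18 features per sample (illustrative only).
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (60, 18)), rng.normal(3.0, 1.0, (80, 18))])
y = np.array([0] * 60 + [1] * 80)

sens, spec, acc = cross_validate(x, y)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}, accuracy={acc:.3f}")
```

    Stratifying the folds keeps the ratio of normal to OSF samples similar in every test set, which is why the per-fold sensitivity and specificity remain well defined.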

    Acknowledgement

    The authors would like to thank Dr. M. Pal, GNDSIR, Kolkata, India, and Dr. J. Chatterjee, SMST, IIT Kharagpur, India for their clinical support and valuable advice. The authors are very grateful to Mr. Pratik Shah, SMST, IIT Kharagpur, India for assistance during the implementation of the parabola fitting and colour deconvolution algorithms.

    References

    Banoczy, J. (1982). Oral leucoplakia (p. 231). Akademiai Kiado: Budapest.Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New

    York: Plenum Press.Bilmes, J. A. (1998). A gentle tutorial of the EM algorithm and its application to

    parameter estimation for Gaussian mixture and hidden Markov models. TechnicalReport, UC Berkeley.

    Burkhardt, A. (1985). Advanced methods in the evaluation of premalignant lesionsand carcinoma of the oral mucosa. Journal of Oral Pathology, 14, 751758.

    Chaira, T., & Ray, A. K. (2003). Segmentation using fuzzy divergence. PatternRecognition Letters, 24(12), 18371844.

    Chaira, T., & Ray, A. K. (2009). Fuzzy image processing and applications with MATLAB.New York: CRC Press, pp. 8081.

    Chaudari, D., & Samal, A. (2007). A simple method for tting of bounding rectangleto closed regions. Pattern Recognition, 40, 19811989.

    Daftary, D. K., Murti, P. R., Bhonsale, R. B., Gupta, P. C., Mehta, F. S., & Pindborg, J.J.(1993). Oral precancerous lesions and conditions of tropical interest. In:Prabhu, S. R., Wilson, D. F., Daftary, D. K., Johnson, N. W., (Eds.), Oral diseases inthe tropics (pp. 402424). Oxford: Oxford University Press.

    Duda, R., Hart, P., & Stork, D. (2007). Pattern classication (2nd ed.). India: Wiley.Duncan, J. S., & Ayache, N. (2000). Medical image analysis: Progress over two

    decades and the challenges ahead. IEEE Transactions on Pattern Analysis andMachine Intelligence, 22, 85106.

    El-Naqa, I., Yang, Y., Wernick, M. N., Galatsanos, N. P., & Nishikawa, M. R. (2002). Asupport vector machine approach for detection of microcalcications. IEEETransactions on medical imaging, 21, 15521563.

    Fan, J., & Xie, W. (1999). Distance measure and induced fuzzy entropy. Fuzzy SetsSystems, 104, 305314.

    Farjam, R., Soltanian-Zadeh, H., Zoroo, R. A., & Jafari-Khouzani, K. (2005). Tree-structured grading of pathological images of prostate. Proceedings of SPIE:Medical Imaging, 5747, 840851.

    Fuhrman, S. A., Lasky, L. C., & Limas, C. (1982). Prognostic signicance ofmorphologic parameters in renal cell carcinoma. American Journal of SurgicalPathology, 6, 655663.

    Gilles, F. H., Tavare, C. J., Becker, L. E., Burger, P. C., Yates, A. J., Pollack, I. F., et al.(2008). Pathologist interobserver variability of histologic features in childhoodbrain tumors: Results from the CCG-945 study. Pediatric and DevelopmentalPathology, 11, 08117.

    Glotsos, D. (2003). A hierarchical decision tree classication scheme for braintumour astrocytoma grading using support vector machines. In Proceedings ofthird international symposium on image and signal processing analysis (Vol. 2, pp.10341038).

    Gonzalez, R. C., & Woods, R. E. (2002). Digital image processing (2nd ed.). New York:Prentice Hall, pp. 655659.

    Grieg, G., Kubler, O., Kikinis, R., & Jolesz, F. A. (1992). Nonlinear anisotropic lteringof MRI data. IEEE Transactions on Medical Imaging, 11(2), 221232.

    Grootscholten, C., Bajema, I. M., Florquin, S., Steenbergen, E. J., Peutz-Kootstra, C. J.,Goldschmeding, R., et al. (2008). Interobserver agreement of scoring ofhistopathological characteristics and classication of lupus nephritis.Nephrology Dialysis Transplantation, 23, 223230.Hand, J. R., & Broders, A. (1931). Carcinoma of the kidney: The degree of malignancyin relation to factors bearing on prognosis. Journal of Urology, 28, 199216.

    Haralick, R. M., Shanmugan, K., & Dinstein, I. (1973). Textural features for imageclassication. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3,610621.

    http://www.cs.cmu.edu/~schneide/tut5/node42.html last accessed March 2010.http://www.dentistry.bham.ac.uk/landinig/software/software.html last accessed

    March 2010.Huang, P. W., & Lai, Y. H. (2010). Effective segmentation and classication for HCC

    biopsy images. Pattern Recognition, 43(4), 15501563.Jafari-Khouzani, K., & Soltanian-Zadeh, H. (2003). Multiwavelet grading of

    pathological images of prostate. IEEE Transactions on Biomedical Engineering,50, 697704.

    Khotanzad, A., & Hong, Y. H. (1990). Invariant image recognition by zernikemoments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5),489497.

    Kim, T. Y., Choi, H. J., Cha, S. J., & Choi, H. K. (2005). Study on texture analysis of renalcell carcinoma nuclei based on the Fuhrman grading system. In Proceedings ofseventh international workshop on enterprise networking and computing inhealthcare industry (pp. 384387).

    Lohse, C. M., Blute, M. L., Zincke, H., Weaver, A. L., & Chenille, J. C. (2002).Comparison of standardized and non-standardized nuclear grade of renal cellcarcinoma to predict outcome among 2042 patients. American Journal of SurgicalPathology, 118, 877886.

    MacQueen, J. B. (1967). Some methods for classication and analysis of multivariateobservations. In Proceedings of fth Berkeley symposium on mathematicalstatistics and probability (Vol. 1, pp. 281297). Berkeley: University ofCalifornia Press.

    McKeown, M. J., & Ramsey, D. A. (1996). Classication of astrocytomas andmalignant astrocytomas by principal component analysis and a neural net.Journal of Neuropathology and Experimental Neurology, 55, 12381245.

    Mitra, P., Murthy, C. A., & Pal, S. K. (2002). Unsupervised feature selection usingfeature similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence,24(4), 301312.

    Muthu Rama Krishnan, M., Pal, M., Bomminayuni, S. K., Chakraborty, C., Paul, R. R.,Chatterjee, J., et al. (2009). Automated classication of cells in sub-epithelialconnective tissue of oral sub-mucous brosis-an SVM based approach.Computers in Biology and Medicine, 39(12), 10961104.

    Muthu Rama Krishnan, M., Shah, P., Pal, M., Chakraborty, C., Paul, R. R., Chatterjee, J.,et al. (2010). Structural markers for normal oral mucosa and oral sub-mucousbrosis. Micron, 41(4), 312320.

    Muthu Rama Krishnan, M., Pal, M., Paul, R. R., Chakraborty, C., Chatterjee, J., & Ray, A.K. (2010). Computer vision approach to morphometric feature analysis of basalcell nuclei for evaluating malignant potentiality of oral submucous brosis.Journal of Medical Systems. doi:10.1007/s10916-010-9634-5 [Epub ahead ofprint].

    Novara, G., Martignoni, G., Artibani, W., & Ficarra, V. (2007). Grading systems inrenal cell carcinoma. Journal of Urology, 177, 430436.

    Paul, R. R., Mukherjee, A., Dutta, P. K., Banerjee, S., Pal, M., Chatterjee, J., et al.(2005). A novel wavelet neural network based pathological stage detectiontechnique for an oral precancerous condition. Journal of Clinical Pathology, 58,932938.

    Perona, P., & Malik, J. (1990). Scale-space and edge detection using anisotropicdiffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7),629639.

    Ruifrok, A. C., & Johnston, D. A. (2001). Quantication of histochemical staining bycolor deconvolution. Analytical Quantitative Cytology and Histology, 291299.

    Rust, B. W. (2001). Fitting natures basic functions part I: Polynomials and linearleast squares. Computing in Science and Engineering, 8489.

    Satheesh, M., Paul, M., & Hammond, S. P. (2007). Modeling epithelial cell behaviorand organization. IEEE Transactions on NanoBioscience, 6(1), 7785.

    Scarpelli, M., Bartels, P. H., Montironi, R., Galluzzi, C. M., & Thompson, D. (1994).Morphometrically assisted grading of astrocytomas. Analytical QuantitativeCytology and Histology, 16, 351356.

    Schad, L. R., Schmitt, H. P., Oberwittler, C., & Lorenz, W. J. (1987). Numerical gradingof astrocytomas. Medical Informatics, 12, 1122.

    Shabana, A. H., Gel-Labban, N., & Lee, K. W. (1987). Morphometric analysis of basalcell layer in oral premalignant white lesions and squamous cell carcinoma.Journal of Clinical Pathology, 40(4), 454458.

    Shuttleworth, J., Todman, A., Norrish. M., & Bennett, M. (2005). Learninghistopathological microscopy. Pattern Recognition and Image Analysis, Pt 2,Proceedings. 3687, 764772.

    Smith, Y., Zajicek, G., Werman, M., Pizov, G., & Sherman, Y. (1999). Similaritymeasurement method for the classication of architecturally differentiatedimages. Computers and Biomedical Research, 32, 112.

    Tabesh, A., Teverovskiy, A. M., Pang, H. Y., Kumar, V. P., Verbel, D., Kotsianti, A., et al.(2007). Multifeature prostate cancer diagnosis and gleason grading ofhistological images. IEEE Transaction on Medical Imaging, 26, 13661378.

    Vapnik, V. (1998). Statistical learning theory (2nd ed.). New York: Wiley.Xu, C., & Prince, J. L. (1997). Gradient vector ow: A new external force for snakes. In

    Proceeding of IEEE conference on computer vision and pattern recognition (CVPR)(pp. 6671). Los Alamitos: Comp. Soc. Press.
