CHAPTER 3
SEGMENTATION OF LUNG REGION USING FUZZY
POSSIBILISTIC C-MEANS (FPCM)
3.1 INTRODUCTION
Cancer is a disease in which abnormal cells of the body divide very
fast and generate excessive tissue that forms a tumor. Cancer cells are
capable of spreading to other parts of the body through the blood and lymph
systems. When this uncontrolled cell growth occurs in one or both lungs, it is
called lung cancer. Instead of developing into healthy, normal lung
tissue, these abnormal cells continue dividing and form lumps or masses of
tissue called tumors. The main function of the lungs, which is to supply the
bloodstream with oxygen for the entire body, is disturbed by these tumors.
There are many types of cancer.
3.1.1 Types of Lung Cancer
Cancers that begin in the lungs are divided into two major types,
non-small cell lung cancer and small cell lung cancer, depending on how the
cells look under a microscope. Each type of lung cancer grows and spreads in
different ways and is treated differently.
3.1.2 Small cell lung cancer (SCLC)
This is usually believed to be a systemic disease at the time of
diagnosis and thus surgery plays no part in the management of this disease.
3.1.3 SCLC staging
Limited disease: The disease is limited to one hemithorax and can be
included in a reasonable field of thoracic radiation therapy.
Extensive disease: The disease extends beyond one hemithorax or cannot be
included in a reasonable field of thoracic radiation therapy.
3.1.4 Non-small cell lung cancer (NSCLC)
NSCLC progresses later in its course than SCLC, and
consequently surgery offers the best chance of cure. Patients who are
considered for surgical treatment must be carefully staged to determine tumor
resectability. Positron Emission Tomography (PET) helps to assess nodal
involvement. However, only 15% of patients are suitable for resection at
diagnosis. The patient must also be carefully assessed pre-operatively to
ensure fitness for surgery.
Small-cell lung cancer differs from non-small-cell lung cancer in
the following ways:
SCLC grows quickly.
SCLC spreads quickly.
SCLC responds well to chemotherapy and radiation therapy
SCLC is often associated with distinct paraneoplastic
syndromes
Lung cancer is one of the most dangerous cancers in the world,
with one of the lowest survival rates after diagnosis and a gradual increase in the
mortality rate every year. Figure 3.1 shows an example of a cancerous lung
image. The probability of surviving lung cancer is inversely proportional to
its growth at detection time. Successful treatment is
possible only if the disease is detected at an early stage. An estimate
shows that 85% of lung cancer cases in males and 75% in females are caused
by cigarette smoking (Singapore Cancer Society). The overall survival rate
for all types of cancer is 63%. Though surgery, radiation therapy, and
chemotherapy have been used in the treatment of lung cancer, the five-year
survival rate for all stages combined is only 14%. This has not changed in the
past three decades (American Cancer Society, 2005).
Figure 3.1 An example of cancerous lung image
Several thin-section CT images are produced in the clinic for each
patient and are evaluated by a radiologist by looking at each image in the
axial mode. Many of the images are very difficult to interpret and consume a
lot of time, which can cause high false-negative rates for detecting small lung
nodules and thus potentially missed cancers. The fundamental idea of
designing a CAD system is to make a machine algorithm that acts as a support
to the radiologist and points out the locations of suspicious objects, so that the
overall sensitivity is raised.
A CAD system must meet the following needs:
improving the quality and accuracy of diagnosis,
increasing success in therapy by early detection of cancer,
avoiding unnecessary biopsies,
reducing radiologist interpretation time (Wiemker et al 2005).
A CAD system for early detection of lung cancer by analyzing the
images of CT is proposed in this chapter. Fuzzy Possibilistic C-Means
(FPCM) is used for clustering the cancerous and non-cancerous nodules.
3.2 NEURAL NETWORKS IN CANCER DETECTION
Various neural network techniques have been employed in
cancer detection approaches. Recently, ANNs have become a major research area in
health care modeling, and it is believed that they will receive extensive
application in biomedical systems in the coming years (Lin et al 2002).
Neural networks learn from examples, so explicit rules for recognizing the disease
are not needed; only a set of examples (patterns) that are
representative of all the variations of the particular disease is required. A high accuracy
level in disease recognition is obtained by carefully choosing the patterns.
3.2.1 Artificial neural networks (ANN)
These are basic models of the biological nervous system, inspired
by the kind of computing performed by the human brain. An ANN
is a highly parallel distributed processing system made up of highly
interconnected neural computing elements (neurons) that offer the ability
to learn and thus acquire knowledge and make it available for use
(Rajasekaran et al 2003). The data obtained by electrical impedance
spectroscopy has a strong relation to soft computing in distinguishing
cancerous areas from normal areas. So an ANN, which is an information
processing system, can be used as an appropriate tool for cancer detection.
Certain performance characteristics of ANNs are shared with
biological neural networks. An ANN contains a number of nodes that are
connected through weights. Each node obtains data from the preceding nodes,
combines it, passes it through a nonlinear function, and then propagates it
to the succeeding nodes. Two phases are involved in the evaluation of ANN
performance:
Training phase
Test phase
In the training phase, the input patterns are presented to the ANN, and the weights
are adjusted and fixed so that the network learns these patterns. In this way,
the ANN learns the input patterns. In the test phase, on the other hand, patterns
that were not used in the training phase are presented to the ANN, and the
ANN's outputs are used to estimate its performance (Alizadeh et al 2008). If
the performance of the ANN is satisfactory, it can be used in its own specific
application.
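The two phases can be illustrated with a toy single neuron in Python; the logistic activation, the delta-rule update and the OR patterns below are illustrative assumptions, not details given in the text (and, for brevity, the same toy patterns are reused in the test phase):

```python
import math

def neuron(inputs, weights, bias):
    # A node combines its inputs with weights and passes the sum
    # through a nonlinear (sigmoid) activation function
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# Training phase: weights are adjusted on example patterns
# (logical OR stands in for "representative disease patterns")
patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b, lr = [0.0, 0.0], 0.0, 1.0
for _ in range(5000):
    for x, target in patterns:
        y = neuron(x, w, b)
        delta = lr * (target - y) * y * (1 - y)   # delta-rule update
        w = [wi + delta * xi for wi, xi in zip(w, x)]
        b += delta

# Test phase: patterns are presented and the thresholded outputs are read
predictions = [round(neuron(x, w, b)) for x, _ in patterns]
```

In a real setting, the test phase would use patterns held out from training, as the text describes.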
ANNs are classified into two categories, supervised and
unsupervised learning. An ANN is supervised only if the outputs for the input
patterns used in its training phase are available through a
particular experiment; otherwise it is unsupervised. Supervised ANNs can
further be categorized into two groups, namely error-based and prototype-based.
The main aim of an error-based network is to reduce a cost function
defined on the basis of the error between the desired output and the network
output. The main aim of a prototype-based network is to reduce the distance
between the input patterns and the prototypes assigned to each
cluster.
The Multilayer Perceptron (MLP) and Radial Basis Function (RBF)
networks are examples of error-based networks, and Learning Vector
Quantization (LVQ) is an example of a prototype-based network. MLP
training is based on the minimization of a suitable cost function and is called
the back propagation algorithm. The first version of this algorithm, based on
the gradient descent technique, was proposed by Werbos (1974) and Parker
(1982).
The fundamental construction of a Radial Basis Function (RBF)
network constitutes three layers with entirely different roles. The input layer
consists of source nodes that connect the network to its environment. A
nonlinear transformation from the input space to the hidden space is applied
in the second layer; in most applications the hidden space is of high
dimensionality. The output layer is linear, providing the response of the
network to the activation pattern applied to the input layer (Vakil Baghmisheh
et al 2004, Haykin 2006).
Learning Vector Quantization (LVQ) was introduced by Linde et al
(1980) and Gray (1984). It was initially used for image data compression and
was later adapted by Kohonen (1990) for pattern recognition. The
fundamental idea is to divide the input space into a number of distinct regions,
called decision regions.
These different ANN structures are used to predict the malignancy of
different cancers.
3.2.2 Wavelet Neural Network
The multilayer perceptron (MLP), together with the back propagation
learning algorithm, is the most popular type of ANN in practical
situations (Zainuddin et al 2001, Shirvany et al 2009). However, the
disadvantages of an MLP are:
1. difficulty in reaching the global minimum in a complex
search space,
2. time-consuming training, and
3. failure to converge when high nonlinearities exist.
To overcome the deficiencies of an MLP, the Wavelet Neural
Network (WNN) has been introduced as a vital alternative to the MLP
(Banakar et al 2008). Wavelet families are integrated as the activation
function in the hidden layer of WNNs. Several issues are associated with
WNNs, ranging from the choice of learning algorithm and network
architecture to the type of activation functions used in the hidden layer and
the parameter initialization.
A proper initialization of the network parameters is a key factor in
achieving a faster convergence rate and a higher accuracy rate. Explicit
expressions, hierarchical clustering, support vector machines,
genetic algorithms and K-Means clustering are among the approaches that
have been implemented for parameter initialization (Maoan et al 2004,
Xiao-Guang et al 2006). Various clustering algorithms, namely K-Means
(KM), Fuzzy C-Means (FCM), symmetry-based K-Means (SBKM),
symmetry-based Fuzzy C-Means (SBFCM) and modified point symmetry-based
K-Means (MPKM), are available for initializing the
WNN translation parameter. These clustering algorithms can be
integrated into the WNN and applied in a real-world application, where the
classification of heterogeneous cancers using microarray data is
the main concern.
3.2.3 Probabilistic Neural Network (PNN)
PNN was developed by Specht (1988, 1990). It provides a
general solution to pattern classification problems by following a
probabilistic approach based on Bayes' formula. The Bayes decision
theory that emerges from this formula takes into account the relative likelihood of
events and uses 'a priori' information to improve prediction. Parzen
estimators are used by the network model to obtain the probability density
functions (p.d.f.) corresponding to the classification categories.
Parzen (1962) showed that this class of p.d.f. estimators asymptotically approaches
the underlying density function, provided that it is continuous. Cacoullos
(1966) extended Parzen's approach to the multivariate case.
A supervised training set is used by the PNN to develop probability
density functions within a pattern layer. Training a PNN is much easier
than training other ANNs. Key advantages of the PNN are that training needs
only a single pass and that the decision hyper-surfaces are guaranteed to
approach the Bayes-optimal decision boundaries as the number of training
samples grows. On the other hand, the main limitation of the PNN is that all
training samples must be stored and used in classifying new patterns. In
order to decrease the computational cost, dimensionality reduction and
clustering approaches are usually applied prior to the PNN construction.
The PNN-based decision approach has been applied to categorize a
group of individuals into certain diagnostic categories in the area of cancer
diseases.
3.2.4 Hopfield Artificial Neural Network (HANN)
The Hopfield Artificial Neural Network (HANN) is an ANN that has
been used in the literature for various purposes. Its main use in
the medical image processing field is the classification of
Magnetic Resonance (MR) images of the brain based
on energy minimization, as described in (Armatur et al 1992). The
performance of the HANN is found to be significant. The algorithm has been
enhanced to overcome some problems, such as considering the
minimization of the sum of squared errors and ensuring the convergence of
the network in a pre-specified period of time.
The improved version of the HANN was used in (Sammouda et al
1996) for MR images of the brain. The same algorithm is used here for the
segmentation of the extracted lung regions. The extracted lung region
segmentation problem is then formulated as the minimization of an energy function
constructed of a cost term defined as a sum of squared errors. In order to guarantee the
convergence of the network, the minimization is carried out with a step function
permitting the network to reach its stability in a pre-specified period of time.
Figure 3.2 Architecture of HANN
The HANN architecture (Figure 3.2) consists of a single layer
representing a grid of N x M neurons, with each column representing a class
and each row representing a pixel. All neurons work as both input and output
neurons simultaneously. In fact, the neurons under each class hold the probability
that the corresponding pixel belongs to that class. N is the size of the given
image and M is the number of classes, which is given as 'a priori' information.
The network is designed to classify the feature space without a teacher, based
on the compactness of each class, calculated using the distance measure (R_kl)
between the kth pixel and the centroid of class l. The problem of segmentation
is formulated as a partition of the N pixels among the M classes such that the
assignment of the pixels minimizes the cost term of the energy (error) function:
E = (1/2) Σ_{k=1}^{N} Σ_{l=1}^{M} R_kl (V_kl)^n (3.1)

where R_kl represents the distance measure between the kth pixel and the
centroid of class l, defined as follows:

R_kl = ||X_k - X̄_l||² (3.2)

where X_k is the feature value (intensity value) of the kth pixel and X̄_l is the
centroid value of class l, defined as follows:

X̄_l = (1/n_l) Σ_{k ∈ class l} X_k (3.3)

where n_l is the number of pixels in class l. Taking n = 2 means that the
energy is defined as the sum-squared error. V_kl is the output of the (k, l)th
neuron. This approach adopts the winner-takes-all learning rule, where the
input-output function for the kth row (to assign a label m to the kth pixel) is
given by:

V_km(t + 1) = 1 if U_km = max{U_kl : l = 1, …, M}
V_kl(t + 1) = 0 otherwise, for l ≠ m (3.4)
The minimization is achieved by using the Hopfield neural network
(HANN) and by solving a set of motion equations satisfying the condition:

dU_i/dt = -μ(t) ∂E/∂V_i (3.5)

where U_i and V_i respectively represent the input and output of the ith neuron,
and μ(t) represents a scalar positive function of time which determines the length
of the step to be taken in the direction of the vector d = -∇E(V). The suitable
selection of the step μ(t) requires proper skill; experimentation and
familiarity with a given class of optimization problems are often needed to
find the best function. The μ(t) function used by Sammouda et al
(1996) for segmenting the MR data using the HANN is adopted in this approach and
works well for segmenting the CT data:

μ(t) = t(T_s - t) (3.6)

where t represents the iteration step and T_s is the pre-specified convergence
time. The HANN segmentation algorithm can be summarized in the following
steps:
1. Initialize the input of neurons to random values.
2. Apply the input-output function (Vkl) defined above, to obtain
the new output values for each neuron, establishing the
assignment of pixels to classes. The class membership
probabilities grow or diminish in a winner-takes-all style as a
result of contention between classes. In winner-takes-all
model, the neuron with the highest input value fires and takes
the value 1, and all remaining neurons take the value 0.
3. Compute the centroid (X̄_l) as defined above, for each class l.
4. Compute the energy function (E) as defined above.
5. Update the inputs (U_kl) using the following equation; learning
occurs when the neuron inputs are adjusted in an attempt
to reduce the output error:
U_kl(t + 1) = U_kl(t) + dU_kl/dt (3.7)
6. Repeat from step 2 until t = Ts. This process iteratively
modifies the pixel label assignments to reach a near optimal
final segmentation map.
Equations 3.1 to 3.7 explain the application of the HANN to the
segmentation of MR images.
The existing neural network techniques are susceptible to a high
rate of false positives as well as false negatives. Thus a novel technique is
necessary to increase the accuracy, specificity, and sensitivity of the system.
3.3 ARTIFICIAL INTELLIGENCE (AI)
The most efficient way to overcome the drawbacks of the
existing approaches is to use Artificial Intelligence. Fundamentally, basic image
processing is followed by fuzzy logic. These AI techniques remove human
errors in detection, involve fewer steps and, more
importantly, are cost effective.

Artificial intelligence has been widely used for logistics, data
mining, medical diagnosis and several other areas. The popularity of AI is due
to factors like the increasing power of computers, a greater
emphasis on solving specific subproblems, and the formation of new ties
between AI and other fields working on related problems. AI techniques are
expected to display capabilities like deduction, reasoning and problem
solving, knowledge representation, planning, perception and creativity.
3.3.1 Importance of Fuzzy Logic
Supervised methods are extensively used in medical image
segmentation, but they require conditions that are difficult to satisfy in the
medical field (Francesco Masuli et al 1999).
Fuzzy logic can be widely used for controlling a process that is too
nonlinear or too poorly understood for traditional control designs.
Therefore, it can be used to deal with complex systems. Moreover, fuzzy
logic enables control engineers to easily implement the control approaches used
by human operators.
Fuzzy logic offers an efficient way to reach a definite solution
based upon unclear, ambiguous, imprecise, noisy or missing input
information. It can be easily implemented in hardware, software or a
combination of both. It uses an imprecise but very descriptive language to
deal with input data, much as a human operator does. It provides a foundation for
the development of new tools for dealing with natural languages and
knowledge representation.
Clustering is the process of dividing data elements into classes or
clusters so that the data in the same class are similar and data in different
classes are dissimilar. In hard clustering, each element belongs to
exactly one class; in soft or fuzzy clustering, an
element can belong to more than one class, and associated with each element
is a set of membership levels. Fuzzy clustering is the process of assigning these
membership values and using them to assign elements to one or more clusters.
3.3.2 Fuzzy C-Means (FCM)
The commonly used fuzzy method for clustering is Fuzzy C-Means
(FCM). The objective function of FCM aims to find the cluster
centers and to produce a class membership matrix, which assigns to each data
point a membership that depends on how close the point is to a particular
class relative to the other classes. The objective function minimized by
FCM is

J_m(U, v) = Σ_{k=1}^{N} Σ_{i=1}^{c} (μ_ik)^m ||X_k - v_i||²_A (3.8)

where X = {x_1, x_2, …, x_N} ⊂ R^n is the data set, c is the number of
clusters in X (2 ≤ c < N), m is a weighting exponent (1 ≤ m < ∞),
U = {μ_ik} is the fuzzy c-partition of X, ||X_k - v_i||_A is an induced
A-norm on R^n, and A is a positive-definite (n x n) weight matrix.
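A minimal sketch of the FCM iteration in Python, assuming one-dimensional data and the Euclidean norm (A = I); the data values are illustrative:

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal FCM sketch for 1-D data, assuming the Euclidean norm (A = I)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)      # fuzzy c-partition: rows sum to 1
    for _ in range(n_iter):
        Um = U ** m
        v = (Um * X[:, None]).sum(axis=0) / Um.sum(axis=0)   # cluster centers
        d2 = (X[:, None] - v[None, :]) ** 2 + 1e-12          # squared distances
        w = d2 ** (-1.0 / (m - 1))
        U = w / w.sum(axis=1, keepdims=True)  # membership: closeness relative to all centers
    return U, v

# Two well-separated 1-D groups
X = np.array([1.0, 1.2, 0.8, 8.0, 8.3, 7.9])
U, v = fcm(X, c=2)
labels = U.argmax(axis=1)
```

The membership update is exactly the relative form implied by equation 3.8: each point's membership in a class grows as its distance to that center shrinks compared with the other centers.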
3.3.3 Possibilistic C-Means (PCM)
FCM treats the image as separate data points whose memberships are
relative across clusters. Hence, an alternative method called Possibilistic
C-Means (PCM) is used. The Possibilistic
C-Means algorithm uses a possibilistic type of membership function to
describe the degree of belonging. It is advantageous that the memberships for
representative feature points be as high as possible while unrepresentative
points have low membership. The objective function, which satisfies these
requirements, is formulated as follows:

J(U, v) = Σ_{i=1}^{c} Σ_{j=1}^{N} (μ_ij)^m d_ij² + Σ_{i=1}^{c} η_i Σ_{j=1}^{N} (1 - μ_ij)^m (3.9)

where d_ij represents the distance between the jth data point and the ith cluster center,
μ_ij denotes the degree of belonging, m represents the degree of fuzziness, η_i is
a suitable positive number, c is the number of clusters, and N denotes the
number of pixels. The main advantage of the PCM technique is that the value
of η_i can be fixed or changed at each iteration. PCM is more
robust in the presence of noise, in finding valid clusters, and in giving a robust
estimate of the centers. The integration of FCM and PCM in Fuzzy
Possibilistic C-Means can provide a much better result.
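In the standard PCM literature, the degree of belonging that minimizes an objective of this form is μ_ij = 1 / (1 + (d_ij²/η_i)^{1/(m-1)}). A small sketch with illustrative values shows that, unlike an FCM membership, it depends only on the distance to one cluster:

```python
def pcm_membership(d2, eta, m=2.0):
    # Possibilistic degree of belonging (standard PCM update): depends only
    # on the distance to this cluster and its scale eta, not on other clusters
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1)))

eta = 4.0
u_center = pcm_membership(0.0, eta)    # point at the cluster center
u_scale = pcm_membership(4.0, eta)     # squared distance equal to eta
u_far = pcm_membership(100.0, eta)     # distant outlier
```

At the center the typicality is 1, at squared distance η it is 0.5, and far outliers receive near-zero typicality, which is what makes PCM robust to noise.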
[Figure 3.3 flowchart: Lung Regions Extraction → Segmentation of lung region using FPCM → Analysis of segmented lung region → Formation of diagnosis rules → Testing and Evaluation]
Because of various advantages of the AI and fuzzy logic, the
proposed system uses the Fuzzy Possibilistic C-Means (FPCM) for the
segmentation purpose.
3.4 PROPOSED METHODOLOGY
A new method for segmenting lung cancer images is presented
in Figure 3.3. This system is used for the detection of lung cancer by analyzing
chest computed tomography (CT) images. In the first stage of this CAD
system, pure basic image processing techniques are used to extract the lung
regions. The extracted lung regions in each slice are segmented using Fuzzy
Possibilistic C-Means (FPCM), which shows good segmentation results in a
short time. The FPCM algorithm presented incorporates spatial information
into the membership function for clustering.
Figure 3.3 The Lung Cancer Detection System using FPCM
3.4.1 Lung Regions Extraction
The main limitation of the earlier gray-level thresholding
techniques is the problem of selecting suitable and accurate threshold values.
Moreover, some approaches, as in (Kanazawa et al 1998), need a post-processing
step to compensate for the lost parts that may occur as a result of
using the thresholding technique. To overcome the problems of the
thresholding methods, a new method is used for the automatic extraction of
lung regions based on one of the different features of the raw data obtained
using the bit-plane slicing technique. The extraction approach described in
this section is fully automatic and depends on a set of basic digital image
processing techniques adapted to the CT data. A chest CT image contains
different regions such as the background, lungs, heart, liver and other
organs. The main aim of the lung region extraction step is to separate the
lung regions, the regions of interest (ROIs), from the surrounding anatomical
structures.
Figure 3.4 clearly explains the method (Sammouda et al 2006) for
the extraction of the lung regions from CT chest image. Initially, the bit-plane
slicing algorithm (Gonzalez et al 2002) is applied to each CT image of the
raw data. The resulting binary slices are then examined to select the best bit-
plane image that helps in extracting the lung regions from the raw CT-image
data with a certain degree of accuracy and sharpness.
[Figure 3.4 pipeline: Original Image → Bit-Plane Slicing → Erosion → Median Filter → Dilation → Outlining → Lung Border Extraction → Flood-Fill Algorithm → Extracted Lung]
Figure 3.4 The Lung regions extraction method
In order to refine the selected bit-plane image, other approaches
were used for different purposes in a sequence of steps. The main aim of
erosion, median filter and dilation steps is to eliminate irrelevant details that
may add extra difficulties to the lung border extraction process. The outlining
step mainly aims to extract the structure’s borders. The lung border extraction
step is used to separate lung structure from all other uninteresting structures.
Finally, a stack-based flood-fill approach is used to fill the extracted lung
regions with their original intensities. Figure 3.5 shows the results of applying
the lung regions extraction method, step by step, to a given CT image.
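The core of the pipeline can be sketched on a toy 8-bit image as follows; the 4-neighbour erosion and dilation are minimal stand-ins for the morphological steps, and the outlining, border extraction and flood-fill steps are omitted for brevity:

```python
import numpy as np

def bit_plane(img, k):
    # Bit-plane slicing: keep only bit k of each 8-bit pixel
    return (img >> k) & 1

def erode(b):   # 4-neighbour binary erosion (minimal stand-in)
    p = np.pad(b, 1, constant_values=0)
    return b & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]

def dilate(b):  # 4-neighbour binary dilation (minimal stand-in)
    p = np.pad(b, 1, constant_values=0)
    return b | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]

# Toy 8-bit "CT slice": bright body tissue with two dark lung-like regions
img = np.full((12, 12), 200, dtype=np.uint8)
img[3:9, 2:5] = 40     # left "lung"
img[3:9, 7:10] = 40    # right "lung"

plane7 = bit_plane(img, 7)     # most significant bit separates 40 from 200
lungs = 1 - plane7             # lungs are the dark (bit = 0) regions
lungs = dilate(erode(lungs))   # opening removes small irrelevant detail
```

Here the most significant bit plane cleanly separates the dark lung-like regions from the brighter body, which is the role the best bit-plane image plays in the method above.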
3.4.2 Lung Regions Segmentation
After extracting the lung regions successfully from the raw CT
images, as described in the previous section, the second step of the proposed
CAD system is lung region segmentation, which aims to segment the extracted
lung regions in search of cancerous cell candidates, the new Regions Of
Interest (ROIs). A large number of candidates is chosen, containing many
non-cancerous candidates (false positives) and a few cancerous
candidates.
Figure 3.5 Lung regions extraction algorithm: a. original CT image,
b. bit-plane, c. erosion, d. median filter, e. dilation,
f. outlining, g. lung region borders, and h. extracted lung.
There are various segmentation techniques available in the
literature which can be used for this purpose. But FPCM is used here due to
its significant performance.
Fuzzy Possibilistic C-Means (FPCM)
FPCM is a clustering algorithm that integrates the characteristics of
both Fuzzy and Possibilistic C-Means. Memberships and typicalities are both
important for correctly characterizing the substructure of the data in a
clustering problem. Thus, an objective function in FPCM depending on both
memberships and typicalities can be written as:

J(U, T, V) = Σ_{i=1}^{c} Σ_{k=1}^{N} (u_ik^m + t_ik^η) d²(X_k, v_i) (3.10)
with the following constraints:

Σ_{i=1}^{c} u_ik = 1, ∀ k ∈ {1, …, N} (3.11)

Σ_{k=1}^{N} t_ik = 1, ∀ i ∈ {1, …, c} (3.12)

A solution of the objective function can be obtained through an
iterative process, where the degrees of membership, the typicalities and the
cluster centers are updated using

u_ik = [ Σ_{j=1}^{c} ( d²(X_k, v_i) / d²(X_k, v_j) )^{1/(m-1)} ]^{-1}, 1 ≤ i ≤ c, 1 ≤ k ≤ N (3.13)

t_ik = [ Σ_{j=1}^{N} ( d²(X_k, v_i) / d²(X_j, v_i) )^{1/(η-1)} ]^{-1}, 1 ≤ i ≤ c, 1 ≤ k ≤ N (3.14)

v_i = Σ_{k=1}^{N} (u_ik^m + t_ik^η) X_k / Σ_{k=1}^{N} (u_ik^m + t_ik^η), 1 ≤ i ≤ c (3.15)
Equations 3.8 to 3.15 explain the FPCM algorithm. FPCM
produces memberships and possibilities simultaneously, along with the usual
point prototypes or cluster centers for each cluster. FPCM is a hybridization
of Possibilistic C-Means (PCM) and Fuzzy C-Means (FCM) which provides
the solution to various problems.
The advantages of the FPCM method are the following:
Provides regions more homogeneous than other techniques
Reduces the spurious blobs
Removes noisy spots
Is less sensitive to noise than other techniques
For the segmentation of lung images, the weighting exponent (m)
has been chosen as 2, the typicality weight (η) was chosen as 2, the
termination threshold was fixed as 1e-5, and the maximum number of
iterations was fixed as 100.
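With these settings, the FPCM iteration of equations 3.13 to 3.15 can be sketched for one-dimensional data as follows; the deterministic initialization of the centers and the data values are assumptions made for this illustration:

```python
import numpy as np

def fpcm(X, c, m=2.0, eta=2.0, eps=1e-5, max_iter=100):
    """FPCM sketch (equations 3.13 to 3.15) for 1-D data with the settings above."""
    v = np.linspace(X.min(), X.max(), c)   # deterministic initial centers (assumption)
    for _ in range(max_iter):
        d2 = (X[:, None] - v[None, :]) ** 2 + 1e-12
        wu = d2 ** (-1.0 / (m - 1))
        U = wu / wu.sum(axis=1, keepdims=True)   # eq. 3.13: sum over clusters = 1
        wt = d2 ** (-1.0 / (eta - 1))
        T = wt / wt.sum(axis=0, keepdims=True)   # eq. 3.14: sum over points = 1
        w = U ** m + T ** eta
        v_new = (w * X[:, None]).sum(axis=0) / w.sum(axis=0)   # eq. 3.15
        if np.abs(v_new - v).max() < eps:        # termination threshold 1e-5
            v = v_new
            break
        v = v_new
    return U, T, v

X = np.array([1.0, 1.1, 0.9, 9.0, 9.2, 8.8])
U, T, v = fpcm(X, c=2)
labels = U.argmax(axis=1)
```

Note how the two constraint directions differ: memberships are normalized across clusters for each point, while typicalities are normalized across points for each cluster.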
Figure 3.6 Input feature vectors of an image
Figure 3.7 Clustered feature vectors using FPCM
Figure 3.6 shows the graphical representation of the feature vectors
of a lung image and Figure 3.7 shows the plot of the clustered feature vectors
using FPCM.
Figure 3.8 Number of iterations taken by FPCM
Figure 3.8 shows the number of iterations taken by FPCM to
achieve the desired result.
FPCM with the specifications mentioned above is applied to each
of the extracted lung regions over the whole data set, and the results are
maintained for further processing in the following steps. The FPCM segmentation
results are accurate and homogeneous, and it takes a shorter time to achieve the
desired segmentation results. FPCM needs fewer than 100 iterations to reach the
targeted results.
3.4.3 Feature Extraction and Formation of Diagnostic Rules
Once the segmentation results are obtained, the approach starts from the
initial cancerous candidate objects, or nodules, which denote all the members
of one of the classes resulting from the FPCM segmentation approach. The
objects are labeled and different features are extracted for those objects, and
they are used in the following diagnostic step, where diagnostic rules
are formulated to remove the large number of false candidates that usually
results from the segmentation step.
3.4.3.1 Feature Extraction
Different features are extracted for the labeled candidates:
area, centroid, filled area, perimeter, radius and average intensity.
Among the extracted features, the following are selected and used
for framing the diagnostic rules (Sammouda et al 2005):
1. Area of the candidate region
2. The Maximum Drawable Circle (MDC) inside the candidate
region
3. Average intensity value of the candidate region.
It is found experimentally that these features are suitable for achieving an
accurate diagnosis. The first feature (the area of the candidate region or
object) is used to eliminate isolated pixels (seen as noise in the segmented
image) and very small candidate objects (those whose area is less than a threshold
value). This feature generally eliminates a good number of extra candidate
regions that have no chance of forming a nodule; moreover, it also reduces
the computation time needed in the following steps.
The second feature denotes each candidate region by its
equivalent MDC. The method begins by drawing a circle from a point
inside a candidate region or object. This circle must satisfy the condition
that all the pixels inside it belong to the object being processed. Every
pixel inside the object is regarded as a possible starting point. The process
starts by drawing a circle of one-pixel radius from a point inside the
candidate region. If the process succeeds, the radius is increased by one pixel
and the circle is redrawn. This process is repeated until the circle
exceeds the border of the region. Each candidate object saves its circle, to be
used in the diagnostic process to eliminate more false positive
cancerous candidates.
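A brute-force sketch of the MDC computation, under one simple reading of the procedure above (pixel-grid circles, no optimization):

```python
import numpy as np

def max_drawable_circle(mask):
    """Brute-force MDC: the largest radius r such that a circle of radius r,
    centred on some object pixel, lies entirely inside the object."""
    H, W = mask.shape
    yy, xx = np.mgrid[0:H, 0:W]
    best = 0
    for cy, cx in zip(*np.nonzero(mask)):    # every object pixel is a start point
        r = 1
        while True:
            circle = (yy - cy) ** 2 + (xx - cx) ** 2 <= r * r
            if np.all(mask[circle]):         # all circle pixels inside the object?
                best = max(best, r)
                r += 1                       # grow the radius by one pixel
            else:
                break
    return best

# Thin, vessel-like object versus a compact, nodule-like object
vessel = np.zeros((9, 9), dtype=bool); vessel[4, 1:8] = True
nodule = np.zeros((9, 9), dtype=bool); nodule[2:7, 2:7] = True
```

With the threshold T2 = 2 used below, the vessel-like object (MDC 0) would be deleted from the candidate list, while the compact nodule-like object (MDC 2) is kept.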
The average CT intensity value of the candidate region is the third
feature and is used to remove more regions that do not have features of
cancerous cells. The average intensity value denotes the average intensity
value of all the pixels that belong to the same region (object) and is calculated
as follows:
average(j) = (1/n) Σ_{i=1}^{n} Intensity(i)

where j denotes the object index and ranges from 1 to the total number of
candidate objects in the whole image, Intensity(i) denotes the CT intensity
value of pixel i, and i ranges from 1 to n, where n represents the total
number of pixels belonging to object j.
3.4.3.2 Formation of Diagnostic Rules
Diagnostic rules are formulated for the extracted features and they
are used in the proposed CAD system.
Rule 1: For each candidate object, if its area is below the threshold value
'T1', then it is deleted from the candidate list. Applying
this condition reduces the number of false
positives that exist in the initial candidate objects.
Rule 2: If the value of the MDC of an object is below the threshold
value 'T2', then it is deleted from the candidate list. 'T2' is chosen to be a
2-pixel radius; thus any candidate object with less than a 2-pixel radius
is removed, as it is far from being a nodule and very close to
being a blood vessel. This rule is associated with the medical fact that true
lung nodules show a certain circularity, especially small lung nodules. Applying
this condition removes a large number of
vessels, which in general have a thin, oblong or line shape.
Rule 3: For each candidate object, if the value of the average
intensity of the object lies outside a particular range, i.e. between 'T3' and
'T4', then it is deleted from the candidate list. The values chosen for the
thresholds 'T3' and 'T4' are based on medical information and
experimentation. The proposed approach used a value of -9000 CT intensity
units for the threshold 'T3' and -12500 CT intensity units for the
threshold 'T4'. This rule removes a few more false positives.
The lung nodule detected for a CT image is shown in Figure 3.9.
Figure 3.9(a) represents the original CT scan image and Figure 3.9(b) represents
the cancer nodule detected after segmentation and the application of the
diagnosis rules.
(a) (b)
Figure 3.9 (a) Original Image (b) Lung Nodule Detected after
Segmentation
3.4.3.3 Early Detection
The diagnostic rules formulated above are useful for detecting cancerous
regions at an early stage. The threshold values used in the rules help
detect a cancerous region at its starting stage (i.e. when the region is
still very small).
After all the rules are applied, only a small number of cancerous
candidate objects remain. These remaining candidates are marked by the
CAD system as possible cancerous regions. The images containing these
regions are then reported and presented to the radiologist, who takes the
final decision. Thus, the purpose of the proposed CAD system is not to
replace radiologists, but to assist them by offering a tool that
facilitates the detection of lung cancer at early stages, alerting them to
possible abnormalities. Moreover, the proposed approach aims at improving
the accuracy of detection and minimizing the time radiologists spend
analyzing the vast number of slices per patient.
3.5 DATASET
The experiments were conducted on the proposed computer-aided
diagnosis system using real-time lung images. Nearly 500 real-time images
of people from different categories were collected for the study. These
images included both benign and malignant nodules. Of these, the images of
270 patients with no signs of cancer were left out, and the images of the
remaining 230 patients were analyzed for the occurrence of benign and
malignant nodules. Malignant nodules are cancerous while benign nodules
are not. The images were acquired with a CT scanner at a slice thickness
of 1 mm, a peak voltage of 120 kVp, a tube current of 220 mA, and a scan
rate of 300 slices per minute. Each image is stored as an array of
unsigned integers of size 512 x 512 pixels.
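Reading one such slice can be sketched as below. The integer width and byte order of the stored arrays are assumptions here (the chapter says only "unsigned integer"), so `dtype` would need to match the scanner's actual export format.

```python
import numpy as np

def load_ct_slice(path, shape=(512, 512), dtype=np.uint16):
    """Read one raw CT slice stored as a 512 x 512 array of unsigned
    integers. The integer width and byte order are assumptions; adjust
    `dtype` to match the dataset's actual export format."""
    data = np.fromfile(path, dtype=dtype)
    if data.size != shape[0] * shape[1]:
        raise ValueError(f"expected {shape[0] * shape[1]} pixels, got {data.size}")
    return data.reshape(shape)
```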
Patients in the age group of 35-50 are likely to have benign nodules
(99% are non-cancerous nodules), whereas patients above the age of 50 are
more likely to develop malignant nodules. With regard to gender, males are
more often affected by lung cancer than females.
With the ground truth provided by the radiologist, a dataset
consisting of 300 nodules was taken up for the study. Among these 300
nodules, 123 are malignant and 177 are benign. The size of the nodules
ranges from 5 mm to 10 mm, and among the 123 malignant nodules, 28 have a
size greater than or equal to 2 mm.
3.6 PERFORMANCE EVALUATION
The performance of this approach is evaluated in terms of sensitivity,
specificity and accuracy:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + FP + TN + FN)
The number of true positives (TPs) identified by this approach was 94,
so the sensitivity was 76.42% (94/123). The number of false positives (FPs)
was 38 and the number of true negatives (TNs) was 139, so the specificity
was 78.53% (139/177). The accuracy was 77.67% (233/300).
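The three formulas of Section 3.6 can be checked directly against the reported counts (with FN = 123 - 94 = 29 malignant nodules missed):

```python
def evaluate(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from the confusion counts,
    as defined in Section 3.6."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Chapter's counts: TP=94, FN=29, TN=139, FP=38
sens, spec, acc = evaluate(tp=94, fn=29, tn=139, fp=38)
```

Rounding to two decimals reproduces the reported 76.42%, 78.53% and 77.67%.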
3.7 SUMMARY
This chapter discusses an automatic CAD system for the early
detection of lung cancer by analyzing raw chest CT images. The approach
starts by extracting the lung regions from the CT image using several image
processing techniques, including bit plane slicing, erosion, median filter,
dilation, outlining, and the flood-fill algorithm. A novel use of the
bit-plane slicing technique is introduced in place of the thresholding
technique as the first step of the extraction process, converting the CT
image into a binary image. Bit-plane slicing is both faster and more data-
and user-independent than thresholding. After the extraction
step, the extracted lung regions are segmented using Fuzzy Possibilistic
C-Means (FPCM) algorithm. The FPCM algorithm yields homogeneous results in
a short time; it is a powerful method for segmenting noisy images and
works for both single- and multiple-feature data with spatial information.
The initial lung candidate nodules resulting from the FPCM segmentation
are then analyzed to extract a set of features used in the diagnostic
rules, which are formulated in the next step to discriminate between
cancerous and non-cancerous candidate nodules.
The extracted features in the proposed system are: the segmented lung
regions, the Maximum Drawable Circle (MDC) and the average intensity
value of the region.
The next chapter deals with the next proposed segmentation
scheme called “Segmentation of Lung Region using Modified Fuzzy
Possibilistic C-Means (MFPCM)”.