Performance Analysis of Distance Measures for Computer tomography
Image Segmentation
V. V. Gomathi
Research Scholar, Research and Development Centre, Bharathiar University, Coimbatore, India

Dr. S. Karthikeyan
Assistant Professor, Department of Information Technology, College of Applied Sciences, Sohar, Oman
Abstract
This paper presents a comparative evaluation of different distance metrics for clustering data points in organ segmentation. Selecting an appropriate distance measure is a challenging problem in clustering. In this research work, we compare the Euclidean, Manhattan, Minkowski, Chebyshev and Signature Quadratic Form Distance measures. The main aim of this work is to identify the distance measure that gives the most exact cluster-based segmentation of the images while minimizing the fragmentation issue. Real patient data sets are used to evaluate the distance measures.
Keywords: Euclidean distance, Manhattan
Distance, Minkowski distance, Chebyshev
distance, Signature Quadratic form Distance
measures, clustering, segmentation, Classification.
1. Introduction
Image segmentation is the process of partitioning a digital image into segments. Segmentation simplifies and/or changes the representation of an image into something more meaningful and easier to analyze [11]. Image segmentation is one of the most interesting and challenging problems in computer vision in general, and in medical imaging applications in particular. Accurate, fast and reproducible image segmentation techniques are required for various applications. The available segmentation algorithms vary widely depending on the specific application, image modality and other factors.
Medical image segmentation is the process of
outlining relevant anatomical structures in an
image dataset. It is a problem that is central to a
variety of medical applications including image
enhancement and reconstruction, surgical planning,
disease classification, data storage and
compression, and 3D visualization [21] [9].
Cluster-based medical image segmentation algorithms produce a large number of unwanted fragments, and the fragments are not consistent across runs: when the same image is processed several times, the number, position and size of the fragments change from one run to the next. To diminish these drawbacks, a distance-based segmentation algorithm is proposed.
A central problem in image recognition and
computer vision is determining the distance
between images [8][12]. Clustering is the process
of organizing objects into groups. The aim of
clustering is to find out the intrinsic grouping in a
set of unlabeled data. An important component of a
clustering algorithm is the distance measure
between data points. Any segmentation or
classification of images involves combining or
identifying objects that are close or similar to each
other.
The choice of distance is very important and should not be made lightly. Generally, some experience or subject-matter knowledge helps in selecting a suitable distance for any clustering-based application. Distance metrics play a very important role in the clustering process. In image processing, distance metrics are used to segment objects by region growing and to classify image pixels by cluster analysis [1]. Traditionally, most
V V Gomathi et al, Int.J.Computer Technology & Applications,Vol 5 (2),400-405
IJCTA | March-April 2014 Available [email protected]
400
ISSN:2229-6093
image segmentation techniques are based on classical metrics such as the Euclidean metric.
In this paper, different distance metrics, namely Euclidean distance, Manhattan distance, Minkowski distance, Chebyshev distance and Signature Quadratic Form Distance, are analyzed. The main contribution of this paper is to demonstrate the performance of these distance metrics on computer tomography images. To the best of our knowledge, such a performance comparison has not been done on real medical images, especially computer tomography images.
The rest of the paper is organized as follows:
Section 2 describes the Related Works. Section 3
discusses Materials and Methods. Section 4
presents experimental results and discussion.
Section 5 concludes the work in the paper.
2. Related Works
Archana Singh et al. implemented K-means with different distance measures and found that the Euclidean distance metric gives the best result and the Manhattan distance metric performs worst [2]. Vadivel et al. used Manhattan distance, Euclidean distance, Vector Cosine Angle distance and Histogram Intersection distance for a number of color histograms on a large database of images; their experimental results show that the Manhattan distance performs better than the other distance metrics for all five types of histograms [22]. N. Selvarasu et al. proposed a Euclidean distance based color image segmentation algorithm for abnormality extraction in thermographs [17]. Sourav Paul et al. integrated a self-organizing map with the Mahalanobis distance to determine the winner unit: the distance between the input vector and each weight vector is measured by the Mahalanobis distance, and the unit whose weight vector has the smallest Mahalanobis distance from the input vector is chosen [19]. Hsiang-Chuan Liu et al. proposed an improved Fuzzy C-Means algorithm based on a standard Mahalanobis distance (FCM-SM) [10]. O. A. Mohamed Jafar et al. made a comparative study of the K-Means and FCM algorithms with the Chebyshev and Chi-square distance measures and found that FCM based on the Chi-square distance measure gave better results than the Chebyshev distance measure [15]. Luh Yen et al. proposed a new distance metric called the Euclidean Commute Time (ECT) distance, based on a random walk model on a graph derived from the data, which allows retrieving well-separated clusters of arbitrary shapes [13]. Modh Jigar S. et al. used the L*a*b color space and the cosine distance instead of the squared Euclidean distance with a K-means based segmentation technique [14].
3. Materials and Methods
3.1. Data set
Data sets from patients with different kinds of tumours were collected with a SIEMENS SOMATOM EMOTION SPIRAL CT scanner located at Multi Speciality Hospital, Coimbatore, Tamilnadu, India. Besides a normal scan performed at a routine clinical dosage (130 mA), an additional scan of the same patient was acquired at a much lower tube current, i.e. 20 mA. The 3D image data consisted of consecutive DICOM (Digital Imaging and Communications in Medicine) slices, each of size 512 by 512 with 16-bit grey-level resolution. Each organ of interest in this research was manually contoured by an expert so that the auto-segmented output could be compared with the manually contoured image.
3.2. Methods
A. An Overview of Distance Measures in Clustering
The distance metric is a key issue in many machine learning algorithms [19]. The distance measure plays an important role in acquiring exact clusters. It is used to discover the similarity and dissimilarity between pairs of objects: clustering techniques measure the similarity and dissimilarity between data objects by calculating the distance between each pair. The choice of distance measure has a large effect on the shape of the resulting clusters. Euclidean distance is generally used in many clustering techniques. In this work we consider Euclidean distance, Manhattan distance, Minkowski distance, Chebyshev distance and Signature Quadratic Form Distance.
1) Euclidean distance
This distance is the most commonly used in applications, especially in clustering problems. Euclidean distance computes the square root of the sum of squared differences between the coordinates of a pair of objects. Euclidean distance is calculated for every image pixel from the average intensities. It is computed in medical image segmentation as:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
2) Manhattan Distance
Manhattan distance is also called city block distance. It represents the distance between points in a city road grid. Manhattan distance computes the sum of the absolute differences between the coordinates of a pair of objects:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
3) Minkowski Distance
Minkowski distance is a generalized distance metric: it generalizes the distance between points in Euclidean space. It is defined as:
$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
4) Chebyshev distance
Chebyshev distance is also called maximum value distance or chessboard distance. It computes the largest absolute difference between the clustering variable values. It is calculated by the following formula:
$$d(x, y) = \max_{i = 1, 2, \ldots, n} |x_i - y_i|$$
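The four coordinate-wise measures defined so far can be sketched in a few lines of Python. This is only an illustration of the formulas, not the authors' Matlab implementation; the function names and the sample vectors are assumptions made for the example. It also shows that the Minkowski distance reproduces the Manhattan distance for p = 1 and the Euclidean distance for p = 2, and approaches the Chebyshev distance as p grows.

```python
import math


def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    """Minkowski distance of order p (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical sample vectors, chosen only for illustration.
x = [0.0, 3.0]
y = [4.0, 0.0]

print(euclidean(x, y))      # 5.0
print(manhattan(x, y))      # 7.0
print(chebyshev(x, y))      # 4.0
print(minkowski(x, y, 1))   # same value as manhattan(x, y)
print(minkowski(x, y, 50))  # close to chebyshev(x, y)
```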
5) Signature Quadratic form distance
The Signature Quadratic Form Distance (SQFD) [4] is a generalization of the Quadratic Form Distance. It is a recently introduced, adaptive distance-based similarity measure for content-based similarity that allows efficient similarity computations based on flexible feature representations. This approach bridges the gap between the well-known concept of Quadratic Form Distances and feature signatures, which are a flexible way to summarize the features of a multimedia object. The SQFD shows good retrieval performance for various multimedia databases [5]. It works on feature signatures consisting of sets of points, where each point has a weight and a set of coordinates. If the points are generated by clustering, they are also called weighted centroids.
The similarity matrix A for P and Q is calculated with the similarity function f, using the following formula:
$$f(Q, P) = \frac{1}{1 + \sqrt{(Q_x - P_x)^2 + (Q_y - P_y)^2}}$$
Where
P - Intensity vector pixel values (signatures)
Q - Input image pixel (signatures)
$Q_x$ and $Q_y$ - X and Y position of Q
$P_x$ and $P_y$ - X and Y position of P
The Signature Quadratic Form Distance [3] [4] is defined as
$$SQFD_A(Q, P) = \sqrt{(Q \mid -P) \cdot A \cdot (Q \mid -P)^T}$$
Here we have changed the parameter of the Signature Quadratic Form Distance for medical image segmentation.
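As a rough illustration of how a Signature Quadratic Form Distance can be evaluated, the sketch below follows the general definition from [3] [4]: two feature signatures are concatenated into one weight vector (with the second signature negated), the similarity matrix A is filled with f = 1 / (1 + Euclidean distance) between centroid positions, and the distance is the square root of the quadratic form. The representation of a signature as a list of (weight, (x, y)) pairs and all names are assumptions made for the example; the modified parameterization used in this paper is not reproduced here.

```python
import math


def similarity(c1, c2):
    """Similarity function f = 1 / (1 + Euclidean distance of two positions)."""
    return 1.0 / (1.0 + math.dist(c1, c2))


def sqfd(sig_q, sig_p):
    """SQFD between two signatures given as lists of (weight, (x, y)) pairs."""
    # Concatenated weight vector (w_q | -w_p) with the matching centroids.
    weights = [w for w, _ in sig_q] + [-w for w, _ in sig_p]
    centroids = [c for _, c in sig_q] + [c for _, c in sig_p]
    # Quadratic form w * A * w^T, with A[i][j] = f(centroid_i, centroid_j).
    total = 0.0
    for wi, ci in zip(weights, centroids):
        for wj, cj in zip(weights, centroids):
            total += wi * similarity(ci, cj) * wj
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(total, 0.0))


# Hypothetical signatures, for illustration only.
sig_a = [(0.5, (0.0, 0.0)), (0.5, (1.0, 0.0))]
sig_b = [(1.0, (3.0, 4.0))]
sig_c = [(1.0, (0.0, 0.0))]

print(sqfd(sig_a, sig_a))  # identical signatures give distance 0.0
print(sqfd(sig_b, sig_c))
```

The double loop is quadratic in the total number of centroids, which is acceptable for small signatures; a matrix library would be the natural choice for larger ones.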
B. Major Algorithm for Organ
Segmentation using Distance measures
Step 1: Read the Radiotherapy Structure Set (RTSS) file
Step 2: Extract the manually segmented contour data from the RTSS
Step 3: Construct the manually segmented organs from the extracted contour data
Step 4: Enhance the contrast of the input image using DICOM contrast
Step 5: Clone the input image as the output image
Step 6: Initialize the cluster step value
Step 7: Generate the cluster elements of the cluster vector based on the cluster step
Step 8: Calculate the distance between all cluster elements and every pixel value individually
Step 9: Replace the pixel value in the output image with the best cluster element by evaluating the minimum distance
Various distance measures have been evaluated for calculating the minimum distance for our application (computer tomography image segmentation) as follows:
1) Euclidean distance
$$M_k = \sqrt{(X_k - Y_{ij})^2}, \qquad Z_{ij} = \min(M)$$
2) Manhattan distance
$$M_k = |X_k - Y_{ij}|, \qquad Z_{ij} = \min(M)$$
3) Minkowski distance
$$M_k = \sqrt{|X_k - Y_{ij}|^2}, \qquad Z_{ij} = \min(M)$$
4) Chebyshev distance
$$M_k = \max |X_k - Y_{ij}|, \qquad Z_{ij} = \min(M)$$
Where
X - Intensity vector
$X_k$ - kth value of the intensity vector
$Y_{ij}$ - Input image pixel
M - Output vector
$M_k$ - kth value of the output vector
$Z_{ij}$ - Output image pixel
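The per-pixel computation described by these definitions can be sketched as follows. The sketch assumes that each comparison is between two scalar intensities, as the definitions of X_k and Y_ij suggest, and it follows Step 9 by returning the cluster element at the position of min(M). The function names, the Minkowski order p and the sample values are hypothetical. Under this scalar assumption all four measures reduce to the absolute intensity difference, so they select the same cluster element.

```python
def distance(xk, yij, measure, p=2):
    """Distance M_k between cluster element xk and pixel yij (both scalars)."""
    diff = abs(xk - yij)
    if measure == "euclidean":
        return (diff ** 2) ** 0.5
    if measure == "manhattan":
        return diff
    if measure == "minkowski":
        return (diff ** p) ** (1.0 / p)
    if measure == "chebyshev":
        return max([diff])  # maximum over a single coordinate
    raise ValueError(f"unknown measure: {measure}")


def best_cluster_element(yij, clusters, measure):
    """Return the cluster element X_k whose distance M_k to the pixel is minimal."""
    m = [distance(xk, yij, measure) for xk in clusters]  # output vector M
    k = m.index(min(m))  # position of min(M)
    return clusters[k]


clusters = [0, 50, 100, 150, 200]  # hypothetical intensity vector X
pixel = 130                        # hypothetical input pixel Y_ij

for measure in ("euclidean", "manhattan", "minkowski", "chebyshev"):
    # Every measure selects the same element, 150, for this pixel.
    print(measure, best_cluster_element(pixel, clusters, measure))
```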
5) Signature Quadratic Form Distance is calculated by the following steps:
a) Extract the feature signatures from the intensity vector and the input image pixel, i.e. $P = X_k$, $Q = Y_{ij}$
Where
X - Intensity vector
$X_k$ - kth value of the intensity vector
$Y_{ij}$ - Input image pixel
P, Q - Feature signatures
b) Calculate the similarity matrix
$$A = \frac{1}{1 + \sqrt{(Q_m - P_m)^2 + (Q_n - P_n)^2}}$$
Where
A - Similarity matrix
$Q_m$ and $Q_n$ - X and Y position of the Q elements, respectively
$P_m$ and $P_n$ - X and Y position of the P elements, respectively
c) The Signature Quadratic Form Distance is defined as
$$M_k = \sqrt{(Y_{ij} \mid -X_k) \cdot A \cdot (Y_{ij} \mid -X_k)^T}$$
Where
A - Similarity matrix
X - Intensity vector
$X_k$ - kth value of the intensity vector
$Y_{ij}$ - Input image pixel
M - Output vector
$M_k$ - kth value of the output vector
d) $$Z_{ij} = \min(M)$$
Where
M - Output vector
$Z_{ij}$ - Output image pixel
Initially, the Radiotherapy Structure Set (RTSS) file is read from the set of DICOM images, in which all the necessary organs have already been contoured by the medical expert. The manually segmented organs are generated from the extracted contour data. The contrast of the input image is enhanced using the DICOM contrast technique. In this algorithm, the initial cluster step value is chosen either manually or randomly. The cluster elements of the cluster vector are generated based on the cluster step. Then the distance between all cluster elements and every pixel value is calculated individually, and each pixel value is replaced with the best cluster element, i.e. the one at minimum distance. Different distance measures are used to calculate the distance between the cluster elements and every pixel.
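Steps 5 to 9 of the procedure can be sketched for a whole image as shown below. The paper does not state how the cluster elements are derived from the cluster step, so the sketch assumes evenly spaced intensity levels between the smallest and largest pixel value; that rule, the function names and the tiny sample image are all assumptions made for illustration. RTSS reading and contrast enhancement (Steps 1 to 4) are left out.

```python
def generate_cluster_elements(min_value, max_value, cluster_step):
    """Step 7 (assumed rule): intensity levels spaced cluster_step apart."""
    if cluster_step <= 0:
        raise ValueError("cluster_step must be positive")
    return list(range(min_value, max_value + 1, cluster_step))


def segment(image, cluster_step):
    """Replace every pixel with its nearest cluster element (Steps 5 to 9)."""
    output = [row[:] for row in image]           # Step 5: clone the input image
    lowest = min(min(row) for row in image)
    highest = max(max(row) for row in image)
    clusters = generate_cluster_elements(lowest, highest, cluster_step)
    for i, row in enumerate(image):
        for j, pixel in enumerate(row):
            # Step 8: distance between the pixel and every cluster element.
            # Step 9: keep the element at minimum distance.
            output[i][j] = min(clusters, key=lambda xk: abs(xk - pixel))
    return output


# Hypothetical 2 x 3 image with integer grey levels.
image = [[10, 12, 200],
         [11, 190, 205]]

print(segment(image, cluster_step=100))  # [[10, 10, 110], [10, 110, 110]]
```

A smaller cluster step yields more cluster elements and therefore a finer quantization of the grey levels.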
4. Experimental Results and Discussion
Experimentation was carried out on data from 100 different tumour patients, with 100 to 1000 slices of computer tomography images, using the different segmentation algorithms. The image format is DICOM (Digital Imaging and Communications in Medicine). The algorithm was implemented in the Matlab environment. Manual segmentation was done by the experts. Experimental results for the images are illustrated here.
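The comparison of an auto-segmented organ against the expert's manual contour can be quantified as sketched below. The paper does not give formulas for its quantitative parameters, so the sketch assumes the standard confusion-matrix definitions of sensitivity, specificity and accuracy (in percent), and it assumes that a fragment is a 4-connected component of the auto-segmented mask. The function names and the small masks are hypothetical.

```python
def segmentation_scores(auto_mask, manual_mask):
    """Return (sensitivity, specificity, accuracy) in percent for binary masks."""
    tp = fp = tn = fn = 0
    for auto_row, manual_row in zip(auto_mask, manual_mask):
        for a, m in zip(auto_row, manual_row):
            if a and m:
                tp += 1
            elif a and not m:
                fp += 1
            elif m:
                fn += 1
            else:
                tn += 1
    sensitivity = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    specificity = 100.0 * tn / (tn + fp) if tn + fp else 0.0
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def count_fragments(mask):
    """Count 4-connected foreground components with an iterative flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    fragments = 0
    for i in range(rows):
        for j in range(cols):
            if mask[i][j] and not seen[i][j]:
                fragments += 1
                seen[i][j] = True
                stack = [(i, j)]
                while stack:
                    r, c = stack.pop()
                    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
    return fragments


# Hypothetical masks: 1 marks organ pixels, 0 marks background.
auto = [[1, 1, 0, 0],
        [0, 0, 0, 1]]
manual = [[1, 1, 1, 0],
          [0, 0, 0, 0]]

print(segmentation_scores(auto, manual))
print(count_fragments(auto))  # 2
```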
Figure 1. Segmentation output using different distance measures: (a) input CT image, (b) Euclidean segmentation, (c) Manhattan segmentation, (d) Minkowski segmentation, (e) Chebyshev segmentation, (f) SQFD segmentation.
Table 1. Performance analysis of the Euclidean Distance, Manhattan Distance, Minkowski Distance, Chebyshev Distance and Signature Quadratic Form Distance segmentation algorithms

Distance measures: Euclidean Distance, Manhattan Distance, Minkowski Distance, Chebyshev Distance, Signature Quadratic Form Distance (one set of values, shared by all five measures)

Organ          Sensitivity   Specificity   Accuracy   No. of Fragments
Lung           93.04         99.93         89.68      9
Liver          97.30         99.31         99.25      39
Heart          88.45         99.93         99.67      15
Spinal Cord    98.12         99.33         99.33      11
Bones          67.32         99.98         99.92      3

The main objective of this paper is to study the performance of different distance metrics. We carried out the experiments by applying all the distance measures; the algorithm was applied to computer tomography images. Table 1 and Figure 1 show the results of all experiments. In our experimental evaluation we found that the sensitivity, specificity, accuracy and number of fragments are the same for all distance measures for all organs. This shows that there is no single best distance measure for segmenting these images.
5. Conclusion
The selection of a distance metric plays a very important role in clustering and is also a very challenging task. The main aim of our study is to determine good distance metrics for clustering the images. Traditionally, Euclidean distance is used in clustering algorithms. In this paper we have implemented the Euclidean, Manhattan, Minkowski, Chebyshev and Signature Quadratic Form Distance measures on a real data set of computer tomography images. The results are identical, and the segmentation output is the same for all five metrics. The result can vary with the task, the amount of data and the complexity of the task. There is no universal distance measure that is best suited to all clustering applications, and in our observation none of the metrics is the best metric for medical image segmentation. Researchers can use any distance measure suited to their application with respect to clustering.
Acknowledgement
The authors would like to thank Dr. M. Hemalatha, Professor, Department of Computer Science, Karpagam University, for her valuable suggestions, comments and words of encouragement, which helped us to make this research work successful.
References
[1] Andras Hajdu, Janos Kormos, Benedek Nagy and Zoltan Zorgo, "Choosing appropriate distance measurement in digital image segmentation", 2004.
[2] Archana Singh, Avantika Yadav, Ajay Rana, "K-means with Three different Distance Metrics", International Journal of Computer Applications, Volume 67, No. 10, April 2013.
[3] Beecks C., Uysal M.S., Seidl T., "Signature Quadratic Form Distances for Content-Based Similarity", in Proceedings of the ACM International Conference on Multimedia, 2009, pp. 697-700.
[4] Beecks C., Uysal M.S., Seidl T., "Signature Quadratic Form Distances for Content-based Similarity", ACM CIVR 2010.
[5] Beecks C., Uysal M.S., Seidl T., "A Comparative Study of Similarity Measures for Content-Based Multimedia Retrieval", in Proceedings of the IEEE International Conference on Multimedia & Expo, pp. 1552-1557, 2010.
[6] Christian Beecks, Anca Maria Ivanescu, Steffen Kirchhoff and Thomas Seidl, "Modeling Image Similarity by Gaussian Mixture Models and the Signature Quadratic Form Distance", IEEE International Conference on Computer Vision, 2011.
[7] Christian Beecks, Jakub Lokoc, Thomas Seidl, Tomas Skopal, "Indexing the Signature Quadratic Form Distance for Efficient Content-Based Multimedia Retrieval", ICMR '11, April 17-20, 2011.
[8] Daniel P. Huttenlocher, Gregory A. Klanderman and William J. Rucklidge, "Comparing Images Using the Hausdorff Distance", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 9, September 1993.
[9] Gomathi V. V., Karthikeyan S., "A Proposed Hybrid Medoid Shift with K-Means (HMSK) Segmentation Algorithm to Detect Tumor and Organs for Effective Radiotherapy", International Conference on Mining Intelligence and Knowledge Exploration, Lecture Notes in Computer Science (Springer), Vol. 8284, Dec 18-20, 2013, pp. 139-147.
[10] Hsiang-Chuan Liu, Bai-Cheng Jeng, Jeng-Ming Yih and Yen-Kuei Yu, "Fuzzy C-Means Algorithm Based on Standard Mahalanobis Distances", Proceedings of the International Symposium on Information Processing (ISIP'09), August 21-23, 2009, pp. 422-427.
[11] Linda G. Shapiro and George C. Stockman, "Computer Vision", New Jersey, Prentice-Hall, ISBN 0-13-030796-3, pp. 279-325.
[12] Liwei Wang, Yan Zhang, Jufu Feng, "On the Euclidean Distance of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, Aug. 2005, pp. 1334-1339.
[13] Luh Yen, Denis Vanvyve, Fabien Wouters, Francois Fouss, "Clustering using a random walk based distance measure", European Symposium on Artificial Neural Networks, Bruges, ISBN 2-930307-05-6, April 2005.
[14] Modh Jigar S., Shah Brijesh, Shah Satish K., "A New K-mean Color Image Segmentation with Cosine Distance for Satellite Images", International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume 1, Issue 5, June 2012.
[15] Mohamed Jafar O.A., Sivakumar R., "A Comparative Study of Hard and Fuzzy Data Clustering Algorithms with Cluster Validity Indices", Proceedings of the International Conference on Emerging Research in Computing, Information, Communication and Application, Elsevier Publication, 2013.
[16] Peter Grabusts, "The Choice of Metrics for Clustering Algorithms", Proceedings of the 8th International Scientific and Practical Conference, Volume 2, 2011.
[17] Selvarasu N., Alamelu Nachiappan and Nandhitha N.M., "Euclidean Distance Based Color Image Segmentation of Abnormality Detection from Pseudo Color Thermographs", International Journal of Computer Theory and Engineering, Vol. 2, No. 4, August 2010.
[18] Soni Madhulatha T., "An Overview on Clustering Methods", IOSR Journal of Engineering, Apr. 2012, Vol. 2(4), pp. 719-725.
[19] Sourav Paul, Mousumi Gupta, "Image Segmentation by Self Organizing Map with Mahalanobis Distance", International Journal of Emerging Technology and Advanced Engineering, Volume 3, Issue 2, February 2013.
[20] Sung-Hyuk Cha, "Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions", International Journal of Mathematical Models and Methods in Applied Sciences, Issue 4, Volume 1, 2007.
[21] Tsai A., Wells W., Tempany C., Grimson E., Willsky A., "Mutual information in coupled multi-shape model for medical image segmentation", Medical Image Analysis (Elsevier), 2004, pp. 429-445.
[22] Vadivel A., Majumdar A.K., Shamik Sural, "Performance comparison of distance metrics in content-based Image retrieval applications".