Implementation of Artificial Intelligence to Diagnose and Predict Breast Cancer Disease

Upload: himawan

Post on 08-Aug-2018


  • 8/23/2019 Implementation of Artificial Intellegence to Diagnose and Predicting Breast Cancer Disease

    1/6

    Implementation of Artificial Intelligence to Diagnose and Predict Breast Cancer Disease: Review

    Irfan1, Himawan2, Ni Wayan Parwati3

    1 Computer Science Faculty, Budi Luhur University, Jakarta, 12260
    Telp: (021) 5853753 ext 253, Fax: (021)
    E-mail: [email protected]

    2 Computer Science Faculty, Budi Luhur University, Jakarta, 12260
    Telp: (021) 5853753 ext 253, Fax: (021)
    E-mail: [email protected]

    3 Computer Science Faculty, Budi Luhur University, Jakarta, 12260
    Telp: (021) 5853753 ext 253, Fax: (021)
    E-mail: [email protected]

    Abstract

    Breast cancer is one of the most common and deadly diseases among women worldwide. Detecting breast cancer at an early stage is the key to curing it, so automatic diagnosis of breast cancer is important. Artificial intelligence is now a challenging research area in medicine, and in this paper we show how artificial intelligence is used to diagnose breast cancer. Neural networks and fuzzy logic have been successfully applied to the problem of breast cancer diagnosis, so that people can be checked for breast cancer quickly and at an early stage. This review indicates that various artificial intelligence techniques can be used effectively for breast cancer diagnosis. The prediction can help a doctor plan better medication and give the patient an early diagnosis. The experiments in the papers we reviewed use the Wisconsin Breast Cancer Dataset (WBCD).

    Keywords: breast cancer diagnosis, artificial intelligence, fuzzy logic, neural network

    1. INTRODUCTION

    Breast cancer is a disease initially found in the form of a tumor around the breast. These tumors are classified as benign (non-cancerous) or malignant (cancerous). Malignant tumors are cancer: the cancer cells can invade and damage tissues and organs near the tumor. Breast cancer is the second leading cause of cancer deaths among women in the world [2]. According to World Health Organization reports, breast cancer is detected in 1.3 million women worldwide every year. Improvements in diagnostic procedures and effective medical care have greatly reduced the breast cancer death rate. A major problem in medical science is the diagnosis of disease based on the various tests performed on the patient [17]. Accurate diagnosis and early detection can improve the survival rate of breast cancer patients [3].

    Breast cancer classification, diagnosis, and prediction techniques have been a heavily researched area in medical informatics. Several articles have been published that successfully classify breast cancer datasets using techniques such as fuzzy logic and neural networks. Because breast tumors, whether malignant or benign, share structural similarities, diagnosing them manually is an extremely tedious and time-consuming task. Accurate classification is important because the cytotoxic drugs administered during treatment are potent enough to be life-threatening or to lead to another cancer. Manual laboratory analysis and biopsies are accurate but time-consuming; hence, an automated system that provides faster and more reliable diagnosis for patients is needed. Classifier systems help the medical community to a great extent in cancer detection. The advantages of such systems are that they are fast, capable of detailed examination, free from subjective errors, and cause minimal patient inconvenience.

    2. LITERATURE REVIEW

    A large amount of research on breast cancer diagnosis can be found in the literature, and much of it reports good classification accuracy. Quinlan [4] achieved a classification accuracy of 94.74% with the C4.5 decision tree method using 10-fold cross-validation. Hamilton, Shan, and Cercone [5] obtained 94.99% with the RIAC method. Ster and Dobnikar [6] obtained 96.8% with linear discriminant analysis. Nauck and Kruse [7] obtained 95.06% with neuro-fuzzy techniques. Pena-Reyes and Sipper [8] used a fuzzy-GA classification method, reaching a classification accuracy of 97.36%. Setiono [9] employed classification based on a feed-forward neural network rule extraction algorithm and reported an accuracy of 98.10%. Goodman, Boggess, and Watkins [10] used three different methods, optimized learning vector quantization (LVQ), big LVQ, and artificial immune recognition system (AIRS), and obtained accuracies of 96.7%, 96.8%, and 97.2%, respectively. Albrecht, Lappas, Vinterbo, Wong, and Ohno-Machado [11] applied a learning algorithm that combined logarithmic simulated annealing with the Perceptron algorithm; the reported accuracy was 98.8%. Abonyi and Szeifert [12] obtained an accuracy of 95.57% with a supervised fuzzy clustering technique. Polat and Gunes [13] used a least square SVM and obtained an accuracy of 98.53%. Mehmet Fatih Akay [14] increased the accuracy to 99.51% by combining SVM with feature selection.

    Table 1: Classification accuracies of classifiers reported in the literature

    No  Author (year)                      Method                                                       Classification Accuracy (%)
    1   Quinlan (1996)                     C4.5                                                         94.74
    2   Hamilton et al. (1996)             RIAC                                                         94.99
    3   Ster and Dobnikar (1996)           Linear Discriminant Analysis (LDA)                           96.80
    4   Nauck and Kruse (1999)             Neuro-fuzzy                                                  95.06
    5   Pena-Reyes and Sipper (1999)       Fuzzy GA                                                     97.36
    6   Setiono (2000)                     Feed-forward neural network rule extraction algorithm        98.10
    7   Goodman et al. (2002)              LVQ                                                          96.70
    8   Goodman et al. (2002)              Big LVQ                                                      96.80
    9   Goodman et al. (2002)              AIRS                                                         97.20
    10  Albrecht et al. (2002)             Logarithmic simulated annealing combined with Perceptron     98.80
    11  Abonyi and Szeifert (2003)         Supervised fuzzy clustering                                  95.57
    12  Polat and Gunes (2007)             Least square SVM                                             98.53
    13  Mehmet Fatih Akay (2009)           SVM combined with feature selection                          99.51
    14  N. Kumaravel, J. Palanivel (2011)  Fuzzy C-Means Clustering                                     99.87

    3. METHODOLOGY

    3.1. Breast Cancer Dataset

    Ten features are computed from a digitized image [1], as follows:

    1. Radius (mean of distances from center to points on the perimeter)
    2. Texture (standard deviation of gray-scale values)
    3. Perimeter
    4. Area
    5. Smoothness (local variation in radius lengths)
    6. Compactness (perimeter^2 / area - 1.0)
    7. Concavity (severity of concave portions of the contour)
    8. Concave points (number of concave portions of the contour)
    9. Symmetry
    10. Fractal dimension ("coastline approximation" - 1)

    Many researchers use the WBCD in their research, so the results of the reviewed papers can be compared.
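Several of the features listed above are simple geometric quantities. The following is a minimal sketch, not the procedure used to build the WBCD, of how radius, perimeter, area, and compactness could be computed for a cell-nucleus boundary given as a closed polygon. The function name `shape_features`, the use of the boundary centroid as the center, and the unit-square test contour are illustrative assumptions.

```python
import math


def shape_features(contour):
    """Radius, perimeter, area and compactness of a closed polygon.

    `contour` is a list of (x, y) boundary points in order. Taking the mean
    of the boundary points as the "center" is an assumption of this sketch.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n

    # Radius: mean distance from the center to the boundary points.
    radius = sum(math.hypot(x - cx, y - cy) for x, y in contour) / n

    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]          # wrap around to close the polygon
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0        # shoelace formula term
    area = abs(twice_area) / 2.0

    # Compactness as defined in the feature list: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0
    return {"radius": radius, "perimeter": perimeter,
            "area": area, "compactness": compactness}


# Hypothetical contour: a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = shape_features(square)
print(features)
```

For a real image the contour would come from a segmentation step, which is outside the scope of this sketch.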

    A. Ahirwar and R. S. Jadon [15] extract features using gray level co-occurrence matrices (GLCM). The feature extraction formulas are listed in Table 2:

    Table 2: Feature extraction on GLCM

    No  Method                     Formula
    1   Contrast                   Sc = [formula lost in extraction]
    2   Entropy                    Se = - [formula lost in extraction]
    3   Energy                     Sen = [formula lost in extraction]
    4   Mean                       Sm = [formula lost in extraction]
    5   Inverse Difference Moment  Sidm = [formula lost in extraction]
    6   Standard Deviation         Sd = [formula lost in extraction]
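As an illustration of the six GLCM features named in Table 2, the sketch below builds a normalized co-occurrence matrix p(i, j) for a horizontal neighbor offset and evaluates commonly used textbook definitions: contrast as the sum of (i - j)^2 p(i, j), entropy as the negative sum of p log2 p, energy as the sum of p^2, and inverse difference moment as the sum of p / (1 + (i - j)^2). These definitions, the mean and standard deviation taken over the row index, the function names, and the 4x4 test image are assumptions of this sketch and may differ in detail from the formulas used in [15].

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Texture features from a normalized GLCM (assumed textbook definitions)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    energy = sum(v * v for _, _, v in cells)
    mean = sum(i * v for i, _, v in cells)
    idm = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    std = math.sqrt(sum((i - mean) ** 2 * v for i, _, v in cells))
    return {"contrast": contrast, "entropy": entropy, "energy": energy,
            "mean": mean, "idm": idm, "std": std}


# Hypothetical 4-level test image (not taken from the reviewed papers).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(image, levels=4)
feats = glcm_features(p)
print(feats)
```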

    3.2. Fuzzy C-Means Clustering

    Fuzzy C-means (FCM) is a clustering method that allows one piece of data to belong to two or more clusters. This method, developed by Dunn [16] in 1973 and later improved, is frequently used in pattern recognition. The FCM algorithm is as follows:

    Figure 1. The algorithm of Fuzzy C-means (FCM)
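The flowchart steps (initialize the membership matrix, calculate the fuzzy cluster centers, compute the cost function, compute a new membership matrix, show the result) can be sketched in plain Python as follows. This is a minimal illustration of the standard FCM updates, not the implementation from [1]. The fuzzifier m = 2, the stopping tolerance, the random seed, and the toy one-dimensional data are all assumptions.

```python
import random


def fcm(points, n_clusters=2, m=2.0, tol=1e-9, max_iter=300, seed=0):
    """Standard fuzzy C-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Step 1: random membership matrix, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers, prev_cost = [], float("inf")
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])

        # Squared distances (a tiny epsilon avoids division by zero).
        d2 = [[sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim)) + 1e-12
               for j in range(n_clusters)] for i in range(n)]

        # Step 3: cost function J = sum_i sum_j u_ij^m * d_ij^2.
        cost = sum(u[i][j] ** m * d2[i][j]
                   for i in range(n) for j in range(n_clusters))
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost

        # Step 4: new memberships, u_ij proportional to d_ij^(-2/(m-1)).
        for i in range(n):
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2[i]]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return centers, u


# Hypothetical data: two well-separated groups around 1.0 and 5.0.
points = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]]
centers, memberships = fcm(points)
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
print(sorted(c[0] for c in centers), labels)
```

Each row of the membership matrix sums to 1, so a sample near the boundary between two clusters keeps a non-negligible degree of membership in both, which is the property that distinguishes FCM from hard clustering.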

    3.4. The Generalized Regression Neural Networks (GRNN)

    There are four layers in a GRNN: the input layer, pattern layer, summation layer, and output layer. The input layer is fully connected to the pattern layer. Each pattern unit is connected to the neurons of the summation layer. The S-summation neuron computes the sum of the weighted outputs of the pattern layer, while the D-summation neuron computes the sum of the unweighted outputs of the pattern neurons [17]. The connection weight between the i-th neuron in the pattern layer and the S-summation neuron is Yi. For the D-summation neuron, the connection weight is unity. The output layer divides the output of each S-summation neuron by that of the D-summation neuron, yielding the predicted value for an unknown input vector [18].

    [Flowchart of the FCM algorithm (Figure 1): Start -> Initialize the membership matrix KA -> Calculate fuzzy cluster center -> Compute the cost function -> Compute a new KA -> Show the result -> Finish. The accompanying equation "Vij =" was lost in extraction.]

    Figure 2. GRNN built up in a way such that it can be used as a parallel Neural Network
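The four GRNN layers described in Section 3.4 can be sketched as a single prediction function. This is a minimal illustration, assuming a Gaussian kernel in the pattern layer with a smoothing parameter `sigma`. The kernel choice, the value of `sigma`, the function name, and the toy training data are assumptions not stated in the text above.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Predict a scalar output for input vector x with a GRNN."""
    # Pattern layer: one unit per training sample, Gaussian of the
    # squared distance between x and that sample (assumed kernel).
    activations = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * sigma ** 2))
        for xi in train_x
    ]
    # Summation layer: S sums outputs weighted by the targets Y_i,
    # D sums the same outputs with unit weights.
    s_sum = sum(y * a for y, a in zip(train_y, activations))
    d_sum = sum(activations)
    # Output layer: divide S by D.
    return s_sum / d_sum


# Hypothetical one-dimensional training set.
train_x = [[0.0], [1.0]]
train_y = [0.0, 1.0]
midpoint = grnn_predict([0.5], train_x, train_y)
near_zero = grnn_predict([0.0], train_x, train_y)
print(midpoint, near_zero)
```

Because there is no iterative weight fitting, "training" amounts to storing the samples. The only free parameter in this sketch is `sigma`, and for inputs very far from all training samples the denominator can underflow, which a production implementation would need to guard against.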

    3.5. Probabilistic Neural Network

    A probabilistic neural network is an implementation of a statistical algorithm called kernel discriminant analysis, in which the operations are organized into a multilayered feed-forward network with four layers: input layer, pattern layer, summation layer, and output layer. This method gives a fast training process and is guaranteed to converge to an optimal classifier as the size of the representative training set increases. In a probabilistic neural network, training samples can be added or removed without extensive retraining.

    Pattern layer: there is one pattern node for each training example. Each pattern node forms the product of the weight vector and the given example for classification, where the weights entering a node come from a particular training example. The product is passed through the activation function. Summation layer: each summation node receives the outputs from the pattern nodes associated with a given class. Output layer: the output nodes are binary neurons that produce the classification decision.
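The pattern, summation, and output layers described above can be sketched as follows. This is an illustrative sketch only: it uses a Gaussian kernel of the squared distance as the pattern-node activation, which is equivalent to the dot-product form when all vectors are normalized to unit length. The kernel width `sigma`, the function name, and the toy samples are assumptions.

```python
import math
from collections import defaultdict


def pnn_classify(x, train_x, train_y, sigma=1.0):
    """Return (predicted label, per-class scores) for input vector x."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, label in zip(train_x, train_y):
        # Pattern layer: one node per training example.
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        activation = math.exp(-d2 / (2.0 * sigma ** 2))
        # Summation layer: accumulate activations per class.
        sums[label] += activation
        counts[label] += 1
    # Average per class so that class size does not dominate the score.
    scores = {label: sums[label] / counts[label] for label in sums}
    # Output layer: pick the class with the largest score.
    return max(scores, key=scores.get), scores


# Hypothetical two-feature samples for the two classes.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["benign", "benign", "malignant", "malignant"]
label_a, _ = pnn_classify([0.2, 0.3], train_x, train_y)
label_b, _ = pnn_classify([5.1, 5.2], train_x, train_y)
print(label_a, label_b)
```

Adding or removing a training sample only changes the stored lists, which reflects the statement that samples can be added or removed without extensive retraining.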

    4. RESULT AND DISCUSSION

    4.1. Fuzzy Logic

    In [1], the data were used to train an SVM. In this work there are two classes: the first is benign (non-cancerous) and the other is malignant (cancerous). The classification performance of the classifier is calculated using fuzzy C-means clustering. The highest classification accuracy achieved is 97.007%. For comparison, a poor classification yields an accuracy of 54.401%. Experiments are conducted with six different epochs of the data. The first epoch contains 5-10% of the data, the second epoch 10-20%, the third epoch 20-40%, the fourth epoch 40-50%, the fifth epoch 50-70%, and the sixth epoch 70-100%. The results are shown in Table 3.

    Table 3. Experiments conducted with six different epochs of the data (all values in %)

    Epoch  Epoch percentage  Sensitivity  Specificity  Positive predictive value  Negative predictive value
    1      5-10              96.69        95.54        96.9                       97.97
    2      10-20             98.6         97.05        98.41                      94.28
    3      20-40             97.83        96.84        92.30                      96.61
    4      40-50             98.93        98.27        97.67                      99.74
    5      50-70             98.74        95.54        95.45                      98.71
    6      70-100            99.82        99.71        99.73                      99.83
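The four measures reported in Table 3 are all derived from the counts of a binary confusion matrix. The sketch below shows the standard definitions, treating malignant as the positive class. The counts in the example are hypothetical and are not taken from the reviewed experiments.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts.

    tp: malignant cases classified as malignant
    fp: benign cases classified as malignant
    tn: benign cases classified as benign
    fn: malignant cases classified as benign
    """
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
m = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```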

    4.2. Neural Network

    The data used in [17] were obtained from the website of the University of California at Irvine (UCI) Machine Learning Data Repository. The files contain medical data on breast cancer classification cases that were categorized as benign or malignant. Table 4 describes the three datasets:

    Table 4. Data Descriptions

    Dataset Number  Number of Instances        Number of Attributes  Data Type  Class Distribution
    1               699 (missing values = 16)  10                    Integer    Benign: 458, Malignant: 241
    2               569                        32                    Real       Benign: 357, Malignant: 212
    3               198                        35                    Real       Benign: 151, Malignant: 47

    Table 5. Classes and their data description

    Dataset Number  Train/Test  Class A Train  Class B Train  Class A Test  Class B Test
    1               477/203     311            166            133           73
    2               398/171     250            148            107           64
    3               136/58      32             104            14            44

    Selecting a proper neural network structure is one of the most difficult problems in neural network modeling. Three different neural network structures, multilayer perceptron (MLP), generalized regression neural network (GRNN), and probabilistic neural network (PNN), were applied to three Wisconsin Breast Cancer Datasets (WBCD) to show the performance of neural networks on breast cancer data. The following table gives the resulting ANN performance:

    Table 6. ANNs Performance [17]

    Data Set  MLP Performance (%)  GRNN Performance (%)  PNN Performance (%)
    1         99                   97                    99
    2         98                   95                    96
    3         70                   75                    75

    5. CONCLUSION

    The Fuzzy C-Means Clustering method produced results with higher classification accuracies. The 80-20% training-test partition gives the highest classification accuracy, 99.87%.

    Neural networks have been applied to pattern classification and recognition problems. The performance of three networks, MLP, GRNN, and PNN, was investigated for breast cancer diagnosis using three datasets. The results differ on the third dataset because it has fewer instances than the other sets. The overall result shows that the most suitable neural networks for classifying WBCD data are MLP and GRNN.

    REFERENCES

    [1] J. Palanivel, N. Kumaravel, An Efficient Breast Cancer Screening System Based on Adaptive Support Vector Machines with Fuzzy C-Means Clustering, European Journal of Scientific Research, ISSN 1450-216X, 2011, vol.51, no.1, pp.115-123.

    [2] http://napavalley.patch.com/articles/healthy-living-can-prevent-breast-cancer-napa-valley-resources

    [3] D. West, P. Mangiameli, R. Rampal, and V. West, Ensemble strategies for a medical diagnosis decision support system: A breast cancer diagnosis application. European Journal of Operational Research, 2005, vol.162, pp.532-551.

    [4] J. R. Quinlan, Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 1996, vol.4, pp.77-90.

    [5] H. J. Hamilton, N. Shan, and N. Cercone, RIAC: A rule induction algorithm based on approximate classification. Technical Report CS 96-06, University of Regina, 1996.

    [6] B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. Proceedings of the international conference on engineering applications of neural networks, 1996, pp.427-430.

    [7] D. Nauck and R. Kruse, Obtaining interpretable fuzzy classification rules from medical data. Artificial Intelligence in Medicine, 1999, vol.16, pp.149-169.

    [8] C. A. Pena-Reyes and M. Sipper, A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine, 1999, vol.17, pp.131-155.

    [9] R. Setiono, Generating concise and accurate classification rules for breast cancer diagnosis. Artificial Intelligence in Medicine, 2000, vol.18(3), pp.205-217.

    [10] D. E. Goodman, L. Boggess, and A. Watkins, Artificial immune system classification of multiple-class problems. Proceedings of the artificial neural networks in engineering, 2002, pp.179-183.

    [11] A. A. Albrecht, G. Lappas, S. A. Vinterbo, C. K. Wong, and L. Ohno-Machado, Two applications of the LSA machine. Proceedings of the 9th international conference on neural information processing, 2002, pp.184-189.

    [12] J. Abonyi and F. Szeifert, Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognition Letters, 2003, vol.14(24), pp.2195-2207.

    [13] K. Polat and S. Gunes, Breast cancer diagnosis using least square support vector machine. Digital Signal Processing, 2007, vol.17(4), pp.694-701.

    [14] Mehmet Fatih Akay, Support vector machines combined with feature selection for breast cancer diagnosis. Expert Systems with Applications, Elsevier, 2009, vol.36, pp.3240-324.

    [15] Ahirwar and R. S. Jadon, Characterization of tumor region using SOM and Neuro Fuzzy techniques in Digital Mammography. International Journal of Computer Science & Information Technology (IJCSIT), Feb 2011, vol.3, no.1, pp.199-211.

    [16] J. C. Dunn, A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. Journal of Cybernetics, 1973, vol.3, pp.32-57.

    [17] A. A. E. Howida, H. H. Mohammed, Breast Cancer Diagnosis Using Intelligence Neural Network. J.Sc. Tech, (1) 2011, pp.159-171.

    [18] Kerem H. Cığızoğlu, Pınar Akın, Artificial neural network models in rainfall-runoff modeling of Turkish rivers. 1990.