Performance Comparison of Classification Algorithms Using WEKA


Sreenath M.#1, D. Sumathi*2
1 PG Scholar, 2 Assistant Professor
Department of Computer Science and Engineering, PPG Institute of Technology, Coimbatore, India
1 sreenath.m.91@gmail.com, 2 sumicse.sai@gmail.com

Abstract
Data mining is the knowledge discovery process of analysing large volumes of data from various perspectives and summarizing them into useful information. It is used to find hidden patterns in large data sets. Classification is one of the most important applications of data mining: classification techniques assign data items to predefined class labels and employ supervised learning, in which the classifier builds classification models from an input data set that are then used to predict future data trends. Many algorithms are available for classification, such as decision trees, k-nearest neighbour, Naive Bayes, neural networks with back-propagation, multi-class classifiers, multi-layer perceptrons and support vector machines. In this paper we select four of these algorithms, apply them to the NSL-KDD data set and compare them on standard evaluation criteria using the Waikato Environment for Knowledge Analysis (WEKA).

Keywords: classifier algorithms, data mining, NSL-KDD, OneR, Hoeffding tree, decision stump, alternating decision tree, WEKA.

I. INTRODUCTION

Data mining is the technique of automated data analysis that reveals previously undetected dependences among data. Three of the major data mining techniques are classification, regression and clustering. This paper works with classification, the most important of these processes when the database is very large, and uses the Weka tool for the task. Classification [1] is one of the most important techniques in data mining for building classification models from an input data set; the resulting models are used to predict future data trends [2, 3]. Once classification is complete, our knowledge about the data becomes greater and easier to use: we can deduce logic from the classified data, data retrieval becomes faster and gives better results, and new data can be sorted easily.

There are many data mining tools available [4, 5]. In this paper we use the Weka data mining tool, an open-source tool developed in Java [6]. It contains tools for data preprocessing, clustering, classification, visualization, association rules and regression. It supports not only learning algorithms but also meta-learners such as bagging and boosting, as well as data preparation. The Weka toolkit has achieved the highest applicability compared with Orange, Tanagra and KNIME [4]. While using Weka for classification, performance is tested by applying the cross-validation test mode instead of the percentage-split test mode [7].
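To make the last point concrete, the sketch below shows the two test modes side by side through the WEKA Java API. It is only an illustration of the evaluation set-up, not the authors' original script; the ARFF file name, the 66% split ratio and the use of OneR as the example classifier are assumptions.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.rules.OneR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TestModesSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF copy of the data set; the class is assumed to be the last attribute.
        Instances data = DataSource.read("KDDTrain+.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Percentage-split test mode: train on 66%, test on the remaining 34%.
        Instances shuffled = new Instances(data);
        shuffled.randomize(new Random(1));
        int trainSize = (int) Math.round(shuffled.numInstances() * 0.66);
        Instances train = new Instances(shuffled, 0, trainSize);
        Instances test = new Instances(shuffled, trainSize, shuffled.numInstances() - trainSize);

        OneR splitClassifier = new OneR();
        splitClassifier.buildClassifier(train);
        Evaluation splitEval = new Evaluation(train);
        splitEval.evaluateModel(splitClassifier, test);
        System.out.printf("Percentage-split accuracy: %.4f%n", splitEval.pctCorrect() / 100.0);

        // Cross-validation test mode: every record is used for both training and testing.
        Evaluation cvEval = new Evaluation(data);
        cvEval.crossValidateModel(new OneR(), data, 10, new Random(1));
        System.out.printf("10-fold CV accuracy      : %.4f%n", cvEval.pctCorrect() / 100.0);
    }
}
```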
II. RELATED WORKS

Data mining is the method of extracting information from vast data sets. Its techniques apply advanced computational methods to uncover unknown relations and to summarize results by analysing the data set under study, so that these relations become clear and comprehensible. Hu et al. [8] conducted an experimental comparison of C4.5, LibSVM, AdaBoost with C4.5, Bagging with C4.5 and Random Forest on seven microarray cancer data sets. They concluded that C4.5 ranked highest among all the algorithms and additionally found that data preprocessing and cleansing improve the efficiency of classification algorithms. Shin et al. [9] compared C4.5 with Naive Bayes and concluded that C4.5 outperforms Naive Bayes. Sharma [10] experimented in the Weka environment with four algorithms, namely ID3, J48, Simple CART and the Alternating Decision Tree (ADTree), on a spam e-mail data set; according to his simulation results, the J48 classifier performs better than ID3, CART and ADTree in terms of classification accuracy. Abdul Hamid M. Ragab et al. [11] compared classification algorithms for students' college enrolment approval using data mining; they found that C4.5 gives the best performance, accuracy and lowest absolute errors, followed by PART, Random Forest, Multilayer Perceptron and Naive Bayes, respectively. Table I outlines a few recent works on the performance of classification algorithms and the application areas of the experimental data sets used; it illustrates that many data mining algorithms can be applied in completely different application areas.

TABLE I
A SUMMARY OF RELATED DATA MINING ALGORITHMS AND THE APPLICATION DATA SETS USED

Year | Authors | Data mining algorithms | Data set
2010 | Nesreen K. Ahmed, Amir F. Atiya, Neamat El Gayar, Hisham El-Shishiny [12] | MLP, BNN, RBF, k-nearest neighbour regression | M3 competition data
2011 | S. Aruna, S. P. Rajagopalan and L. V. Nandakishore [13] | RBF networks, Naive Bayes, J48, CART, SVM with RBF kernel | WBC, WDBC, Pima Indians Diabetes database
2011 | R. Kishore Kumar, G. Poonkuzhali, P. Sudhakar [14] | ID3, J48, Simple CART and Alternating Decision Tree (ADTree) | Spam e-mail data
2012 | Abdullah H. Wahbeh, Mohammed Al-Kabi [15] | C4.5, SVM, Naive Bayes | Arabic text
2012 | Rohit Arora, Suman [16] | C4.5, MLP | Diabetes and Glass
2013 | S. Vijayarani, M. Muthulakshmi [17] | Attribute Selected Classifier, Filtered Classifier, LogitBoost | Classifying computer files
2013 | Murat Koklu and Yavuz Unal [18] | MLP, J48 and Naive Bayes | Classifying computer files
2014 | Devendra Kumar Tiwary [19] | Decision Tree (DT), Naive Bayes (NB), Artificial Neural Networks (ANN), Support Vector Machine (SVM) | Credit card

III. METHODOLOGY

All experiments were run on an Intel Core i3 platform with 4 GB of main memory, a 500 GB hard disk and the Windows 7 Ultimate operating system. In every experiment we used Weka 3.7.11 to measure the performance characteristics on the input data set.

A. Weka interface

Weka (Waikato Environment for Knowledge Analysis) is a widely used machine learning suite written in Java, originally developed at the University of Waikato, New Zealand. The Weka suite contains a collection of algorithms and visualization tools for data analysis, with graphical user interfaces that give easy access to this functionality. Weka is employed in many application areas, particularly for academic purposes and research. Its main benefits are that it is freely available under the GNU General Public License; it is portable, since it is fully implemented in the Java programming language and therefore runs on almost any architecture; it offers a large collection of data preprocessing and modelling techniques; and it is simple to use thanks to its graphical interface. Weka supports multiple data mining tasks, namely data preprocessing, clustering, classification, regression, feature selection and visualization. All of Weka's techniques are based on the assumption that the data is available as a single file or relation, where each data point is described by a fixed number of attributes.

B. Data set

The NSL-KDD data set is used for evaluation. It was proposed to resolve some of the inherent problems of the KDD CUP'99 data set, which is the most widely used data set for anomaly detection. Tavallaee et al. carried out a statistical investigation of KDD CUP'99 and found two essential issues that greatly influence the performance of evaluated systems and lead to a very poor analysis of anomaly detection approaches. To resolve these problems, they proposed a new data set, NSL-KDD, consisting of selected records of the complete KDD data set [20].
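One of the widely reported problems with the original KDD CUP'99 data, discussed further in the list of NSL-KDD benefits below, is the large number of redundant records. The following minimal sketch, which assumes a local ARFF copy of whichever split is being examined (the file name here is only a placeholder), counts exact duplicate rows so that this property can be checked directly:

```java
import java.util.HashSet;
import java.util.Set;

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CheckRedundantRecords {
    public static void main(String[] args) throws Exception {
        // File name is an assumption; point it at your local NSL-KDD (or KDD'99) ARFF copy.
        Instances data = DataSource.read("KDDTrain+.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Count exact duplicate rows by comparing their textual representation.
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            if (!seen.add(inst.toString())) {
                duplicates++;
            }
        }
        System.out.println("Total records    : " + data.numInstances());
        System.out.println("Duplicate records: " + duplicates);
    }
}
```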
The NSL-KDD data set has the following advantages over the original KDD data set. First, it does not include redundant records in the training set, so classifiers are not biased towards the more frequent records. Second, the number of records selected from each difficulty-level group is inversely proportional to the percentage of records in the original KDD data set. As a result, the classification rates of distinct machine learning methods vary over a much wider range, which makes an accurate evaluation of different learning techniques more effective. Third, the numbers of records in the training and test sets are reasonable, so experiments can affordably be run on the complete set without having to randomly select a small portion. Consequently, the evaluation results of different research works are consistent and comparable.

C. Classification algorithms

The following classifier algorithms were taken for the performance comparison on the NSL-KDD data set.

(a) OneR

OneR [21], short for "One Rule", is a simple yet accurate classification algorithm that generates one rule for every predictor in the data and then selects the rule with the smallest total error as its "one rule". To build the rule for a predictor, a frequency table of that predictor against the target is constructed. It has been shown that OneR produces rules only slightly less accurate than state-of-the-art classification algorithms while producing rules that are easy for humans to interpret.

(b) Hoeffding Tree

A Hoeffding tree [22] is an incremental, anytime decision tree induction algorithm capable of learning from data streams, under the assumption that the distribution generating the examples does not change over time. Hoeffding trees exploit the fact that a small sample is often enough to choose the optimal splitting attribute. This is justified mathematically by the Hoeffding bound, which quantifies the number of observations needed to estimate a statistic within a prescribed precision. One feature of Hoeffding trees not shared by other incremental decision tree learners is their sound guarantees of performance.
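For reference, the Hoeffding bound states that, with probability 1 - delta, the true mean of a random variable with range R differs from its mean over n observations by at most epsilon = sqrt(R^2 * ln(1/delta) / (2n)). The small helper below simply evaluates this expression; the class and method names are our own and are not part of WEKA.

```java
public final class HoeffdingBound {

    /**
     * Hoeffding bound: with probability 1 - delta, the true mean of a random
     * variable with range R differs from its sample mean over n observations
     * by at most epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).
     */
    static double epsilon(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    public static void main(String[] args) {
        // Example: information gain has range log2(numClasses), so 1.0 for a
        // binary class. With delta = 1e-7, the bound shrinks as n grows.
        for (long n : new long[] {100, 1_000, 10_000, 100_000}) {
            System.out.printf("n = %7d  epsilon = %.5f%n", n, epsilon(1.0, 1e-7, n));
        }
    }
}
```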
(c) Decision Stump

A decision stump [23] is a machine learning model consisting of a one-level decision tree: a decision tree with a single internal node that is immediately connected to its leaves. The predictions made by a decision stump are therefore based on just one input feature, which is why decision stumps are also known as 1-rules. Decision stumps are commonly used as base learners in ensemble techniques such as boosting and bagging; for example, the Viola-Jones face detection algorithm employs AdaBoost with decision stumps as weak learners.

(d) Alternating Decision Tree

An alternating decision tree (ADTree) [24] combines the simplicity of a single decision tree with the effectiveness of boosting. Its representation combines tree stumps, a model commonly used in boosting, into a decision-tree-like structure, but the branches are no longer mutually exclusive. The root node is a prediction node and carries just a numeric score. The next layer consists of decision nodes, which are essentially a collection of decision tree stumps. The layer after that again consists of prediction nodes, and so on, alternating between prediction nodes and decision nodes. A model is applied by identifying the (possibly multiple) paths from the root node to the leaves that correspond to the attribute values of the observation being classified; the observation's classification score is the sum of the prediction values along those paths.

IV. CLASSIFIER PERFORMANCE MEASURES

A confusion matrix contains information about the actual and predicted classifications produced by a classification system, and the performance of such a system is commonly evaluated using the data in this matrix (Fig. 1).

[Fig. 1: Confusion matrix]

The entries of the confusion matrix have the following meaning in the context of our study: a is the number of correct predictions that an instance is negative, b is the number of incorrect predictions that an instance is positive, c is the number of incorrect predictions that an instance is negative, and d is the number of correct predictions that an instance is positive. The following metrics are used for the evaluation of the data set:

Accuracy: the proportion of the total number of predictions that were correct, determined as
Accuracy = (a + d) / (a + b + c + d)

Detection Rate: the proportion of predicted positive cases that were correct, calculated as
Detection Rate = d / (b + d)

False Alarm Rate: the proportion of negative cases that were incorrectly classified as positive, calculated as
False Alarm Rate = b / (a + b)
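As a quick illustration of these three definitions, the following standalone helper computes them from the four confusion matrix entries; the counts used in main are made-up example values, not results from this paper.

```java
public final class ConfusionMetrics {

    // a = true negatives, b = false positives, c = false negatives, d = true positives.
    static double accuracy(long a, long b, long c, long d) { return (double) (a + d) / (a + b + c + d); }
    static double detectionRate(long b, long d)            { return (double) d / (b + d); }
    static double falseAlarmRate(long a, long b)           { return (double) b / (a + b); }

    public static void main(String[] args) {
        // Made-up example counts for illustration only.
        long a = 9000, b = 500, c = 400, d = 10100;
        System.out.printf("Accuracy         = %.5f%n", accuracy(a, b, c, d));
        System.out.printf("Detection Rate   = %.5f%n", detectionRate(b, d));
        System.out.printf("False Alarm Rate = %.5f%n", falseAlarmRate(a, b));
    }
}
```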
V. EXPERIMENTAL RESULTS AND COMPARATIVE ANALYSIS

We investigated the performance of the selected classification algorithms. The classifications were done using 10-fold cross-validation. In WEKA, all data are treated as instances and the features in the data are referred to as attributes. The simulation results are divided into separate bar charts for easier analysis and evaluation. Table II shows the performance of the classifier algorithms on the NSL-KDD data set.

TABLE II
PERFORMANCE OF CLASSIFIER ALGORITHMS

Classifier | Accuracy | Detection rate | False alarm rate
OneR | 0.94615 | 0.94954 | 0.06714
Decision Stump | 0.81733 | 0.94964 | 0.05025
Hoeffding Tree | 0.95120 | 0.95515 | 0.05952
ADTree | 0.95094 | 0.94592 | 0.07321

Fig. 2 shows the accuracy of the classifiers on the NSL-KDD data set. From the results it can be observed that the Hoeffding Tree is the best classifier, followed by ADTree, OneR and Decision Stump.

[Fig. 2: Accuracy of classifiers]

Fig. 3 shows the detection rate of the classifiers on the NSL-KDD data set. The experimental results show that the Hoeffding Tree is the best classifier, followed by Decision Stump, OneR and ADTree.

[Fig. 3: Detection rate of classifiers]

Fig. 4 shows the false alarm rate of the classifiers on the NSL-KDD data set. The results show that Decision Stump is the best classifier, followed by Hoeffding Tree, OneR and ADTree.

[Fig. 4: False alarm rate of classifiers]

VI. CONCLUSION

Four classification algorithms, Hoeffding Tree, ADTree, OneR and Decision Stump, were investigated in this paper on the NSL-KDD data set. A comparative study and analysis of the classification measures Accuracy, Detection Rate and False Alarm Rate was carried out by simulation using the Weka toolkit. The experimental results show that the Hoeffding Tree gives the best performance in terms of Accuracy and Detection Rate, but when False Alarm Rate is considered, Decision Stump is the best performer.
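For readers who want to reproduce this kind of comparison, the sketch below runs the same four classifiers through 10-fold cross-validation with the WEKA Java API and prints accuracy, detection rate (WEKA's per-class precision) and false alarm rate (WEKA's per-class false positive rate). It is an illustrative reconstruction, not the authors' script: the ARFF file name, the assumption that the class is the last attribute, the index of the "positive" class and the availability of ADTree (distributed as the alternatingDecisionTrees package in WEKA 3.7+) are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.OneR;
import weka.classifiers.trees.ADTree; // requires the alternatingDecisionTrees package in WEKA 3.7+
import weka.classifiers.trees.DecisionStump;
import weka.classifiers.trees.HoeffdingTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NslKddComparison {
    public static void main(String[] args) throws Exception {
        // Hypothetical local copy of the NSL-KDD data; class assumed to be the last attribute.
        Instances data = DataSource.read("KDDTrain+.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Index of the class value treated as "positive"; adjust to match your ARFF header.
        int positiveClass = 0;

        Map<String, Classifier> classifiers = new LinkedHashMap<>();
        classifiers.put("OneR", new OneR());
        classifiers.put("Decision Stump", new DecisionStump());
        classifiers.put("Hoeffding Tree", new HoeffdingTree());
        classifiers.put("ADTree", new ADTree());

        for (Map.Entry<String, Classifier> entry : classifiers.entrySet()) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(entry.getValue(), data, 10, new Random(1));
            System.out.printf("%-15s accuracy=%.5f  detectionRate=%.5f  falseAlarmRate=%.5f%n",
                    entry.getKey(),
                    eval.pctCorrect() / 100.0,             // overall accuracy
                    eval.precision(positiveClass),         // d / (b + d) for the positive class
                    eval.falsePositiveRate(positiveClass)  // b / (a + b) for the positive class
            );
        }
    }
}
```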
REFERENCES

[1] O. Serhat and A. C. Yilamaz, Classification and Prediction in a Data Mining Application, Journal of Marmara for Pure and Applied Sciences, Vol. 18, pp. 159-174.
[2] H. Kaushik and B. G. Raviya, Performance Evaluation of Different Data Mining Classification Algorithm Using WEKA, Indian Journal of Research (PARIPEX), Vol. 2, 2013.
[3] Wai-Ho Au, Keith C. C. Chan and Xin Yao, A Novel Evolutionary Data Mining Algorithm with Applications to Churn Prediction, IEEE Transactions on Evolutionary Computation, Vol. 7, pp. 532-545, 2003.
[4] G. Karina, S. M. Miquel, and S. Beatriz, Tools for Environmental Data Mining and Intelligent Decision Support, International Congress on Environmental Modelling and Software, 2012.
[5] C. Giraud-Carrier and O. Povel, Characterising Data Mining Software, Intelligent Data Analysis, pp. 181-192, 2003.
[6] H. Mark, F. Eibe, H. Geoffrey, P. Bernhard, R. Peter, and H. W. Ian, The WEKA Data Mining Software: An Update, SIGKDD Explorations, Vol. 11, 2009.
[7] H. W. Abdullah, A. A. Qasem, N. A. Mohammed, and M. A. Emad, A Comparison Study between Data Mining Tools over some Classification Methods, International Journal of Advanced Computer Science and Applications, pp. 18-26, 2012.
[8] H. Hong, L. Jiuyong, P. Ashley, W. Hua, and D. Grant, A Comparative Study of Classification Methods for Microarray Data Analysis, in Australasian Data Mining Conference, 2006.
[9] C. T. My, S. Dongil, and S. Dongkyoo, A Comparative Study of Medical Data Classification Methods Based on Decision Tree and Bagging Algorithms, in Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, 2009.
[10] K. S. Aman, A Comparative Study of Classification Algorithms for Spam Email Data Analysis, International Journal on Computer Science and Engineering, Vol. 3, pp. 1890-1895, 2011.
[11] A. H. M. Ragab et al., A Comparative Analysis of Classification Algorithms for Students College Enrollment Approval Using Data Mining, in Proceedings of the 2014 Workshop on Interaction Design in Educational Environments, 2014.
[12] N. K. Ahmed, A. F. Atiya, N. El Gayar, and H. El-Shishiny, An Empirical Comparison of Machine Learning Models for Time Series Forecasting, Econometric Reviews, Vol. 29, pp. 594-621, 2010.
[13] S. Aruna, S. P. Rajagopalan and L. V. Nandakishore, An Empirical Comparison of Supervised Learning Algorithms in Disease Detection, International Journal of Information Technology Convergence and Services, Vol. 1, pp. 81-92, 2011.
[14] R. Kishore Kumar, G. Poonkuzhali, and P. Sudhakar, Comparative Study on Email Spam Classifier using Data Mining Techniques, in Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. 1, 2012.
[15] H. W. Abdullah and A. K. Mohamedi, Comparative Assessment of the Performance of Three, Basic Sci. & Eng., Vol. 21, pp. 15-28, 2012.
[16] A. Rohit and Suman, Comparative Analysis of Classification Algorithms on Different Datasets using WEKA, International Journal of Computer Applications, Vol. 54, pp. 21-25, 2012.
[17] S. Vijayarani and M. Muthulakshmi, Comparative Study on Classification Meta Algorithms, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 1, pp. 1768-1774, 2013.
[18] K. Murat and U. Yavuz, Analysis of a Population of Diabetic Patients Databases with Classifiers, International Journal of Medical, Pharmaceutical Science and Engineering, Vol. 7, pp. 176-178, 2013.
[19] K. T. Devendra, A Comparative Study of Classification Algorithms for Credit Card Approval using WEKA, GALAXY International Interdisciplinary Research Journal, Vol. 2, pp. 165-174, 2014.
[20] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, A Detailed Analysis of the KDD CUP 99 Data Set, in IEEE International Conference on Computational Intelligence for Security and Defense Applications, pp. 53-58, 2009.
[21] R. C. Holte, Very Simple Classification Rules Perform Well on Most Commonly Used Datasets, Machine Learning, Vol. 11, pp. 63-90, 1993.
[22] G. Hulten, L. Spencer, and P. Domingos, Mining Time-Changing Data Streams, in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
[23] W. Iba and P. Langley, Induction of One-Level Decision Trees, in Proceedings of the Ninth International Conference on Machine Learning, pp. 233-240, 1992.
[24] Y. Freund and L. Mason, The Alternating Decision Tree Learning Algorithm, in Proceedings of the Sixteenth International Conference on Machine Learning, pp. 124-133, 1999.
