Weka (Data Mining Tool)
Ajou University
Introduction to Weka
• Developed at the University of Waikato (continuously updated since 1999)
• Java-based open-source application (free to download)
• Used for data mining and machine learning
Installation
• Download Link: http://www.cs.waikato.ac.nz/ml/weka/downloading.html
How to Start Weka
Options of Weka
- Preprocess: open, edit, and save data; modify the data through preprocessing before using it
- Classify: select a classifier to perform classification or regression; train and test on the data
- Cluster: select a clustering algorithm to build clusters from the data
- Associate: analyze the data with an associator to produce association rules
- Select attributes: select the most effective attributes
- Visualize: show 2D plots of the data
Preprocessing
Useful and Most-Used Preprocessing Filters
- Discretize: discretizes a range of numeric attributes in the dataset into nominal attributes
- Normalize: normalizes all numeric values in the dataset; the default output range is [0, 1], and a scale of 2.0 with a translation of -1.0 gives [-1, 1]
- NumericToNominal: turns numeric attributes into nominal ones
- ReplaceMissingValues: replaces all missing values for nominal and numeric attributes in the dataset with the modes and means from the training data
- Standardize: standardizes all numeric attributes in the dataset to have zero mean and unit variance
- StringToNominal: turns string attributes into nominal ones
- SwapValues: swaps two values of a nominal attribute
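What Normalize, Standardize, and ReplaceMissingValues compute can be sketched in plain Python. This is not Weka's implementation: the function names are hypothetical, missing values are represented as None, and the use of the sample standard deviation is an assumption.

```python
from collections import Counter
from statistics import mean, stdev


def normalize(values, scale=1.0, translation=0.0):
    """Min-max scale values into [translation, translation + scale]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [translation for _ in values]
    return [(v - lo) / (hi - lo) * scale + translation for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (sample) variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def replace_missing_numeric(values):
    """Replace None entries with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def replace_missing_nominal(values):
    """Replace None entries with the most frequent observed value (mode)."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]
```

Calling normalize(values, scale=2.0, translation=-1.0) maps the same data to [-1, 1] instead of [0, 1].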
Example: Discretize filter (using “diabetes.arff”)
[Screenshots: the data before and after discretization]
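The idea behind discretization can be sketched as equal-width binning in plain Python. This is an illustration only, not Weka's code: the function name, the bin labels, and the handling of values on a bin boundary are assumptions.

```python
def equal_width_bins(values, bins=10):
    """Map each numeric value to a nominal label 'bin1' .. 'binN'."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        if width == 0:  # constant attribute: everything lands in the first bin
            idx = 0
        else:
            # the maximum value is clamped into the last bin
            idx = min(int((v - lo) / width), bins - 1)
        labels.append(f"bin{idx + 1}")
    return labels
```

For example, equal_width_bins([0, 4, 6, 10], bins=2) splits the range 0 to 10 into two intervals of width 5 and replaces each number with the label of its interval.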
Classification
Useful and Most-Used Classifiers
- AttributeSelectedClassifier: reduces the dimensionality of the training and test data by attribute selection before passing them on to a classifier
- ClassificationViaClustering / ClassificationViaRegression: simple meta-classifiers that use a clusterer (or a regression model) for classification
- DecisionTable: builds and uses a simple decision table majority classifier
- IBk (k-NN): k-nearest-neighbor classifier
- LibSVM: a wrapper class for LibSVM (a library for support vector machines)
- NaiveBayes: a Naïve Bayes classifier using estimator classes
Example: IBk (k-NN) classifier (using “KDDCup99_sample.arff”)
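The k-nearest-neighbor rule behind IBk can be sketched in a few lines of Python. This is a minimal illustration assuming Euclidean distance and an unweighted majority vote; the function name, the toy training rows, and the "normal"/"attack" labels are hypothetical and are not taken from the sample file.

```python
import math
from collections import Counter


def knn_predict(train, query, k=1):
    """Predict the label of `query` from a list of (features, label) rows."""
    # sort the training rows by Euclidean distance to the query, keep the k closest
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    # majority vote among the labels of the k nearest rows
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# hypothetical toy data with two numeric attributes
train = [
    ((0.0, 0.0), "normal"),
    ((0.1, 0.2), "normal"),
    ((1.0, 1.0), "attack"),
    ((0.9, 1.1), "attack"),
]
```

With this data, knn_predict(train, (0.2, 0.1), k=3) looks at two "normal" rows and one "attack" row and therefore votes "normal".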
Saving, loading, and using a built model
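The train, save, load, predict cycle can be mirrored in plain Python. This is only an analogy and not how Weka stores a model: the trivial majority-label model, the JSON file format, and all function names are assumptions made for illustration.

```python
import json
from collections import Counter


def train(labels):
    """Build a trivial model that always predicts the most frequent label."""
    return {"predict": Counter(labels).most_common(1)[0][0]}


def save_model(model, path):
    """Write the model to disk so it can be reused without retraining."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_model(path):
    """Read a previously saved model back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(model, instance):
    """Apply the loaded model to a new instance (the instance is ignored here)."""
    return model["predict"]
```

The point of the cycle is that the saved file, not the training data, is all that is needed later to label new instances.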
Clustering
Useful and Most-Used Clusterers
- EM: uses the simple EM (expectation-maximization) algorithm
- HierarchicalClusterer: uses an agglomerative clustering algorithm
- SimpleKMeans: uses the k-means clustering algorithm
Example: SimpleKMeans (using “KDDCup99_sample.arff”)
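The k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched as follows. This is a simplified illustration, not Weka's SimpleKMeans: the function name, the fixed iteration count, and seeding with the first k points are assumptions.

```python
import math


def kmeans(points, k, iterations=10):
    """Return (centroids, assignment) after a fixed number of k-means passes."""
    # assumption: use the first k points as initial centroids
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # assignment step: index of the nearest centroid for every point
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment
```

Because the seeds are taken from the data in order, the result is deterministic, which makes the sketch easy to check by hand on a few points.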
Visualization
Select attributes
Example: CfsSubsetEval/BestFirst (using “KDDCup99_sample.arff”)
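Correlation-based feature selection scores a subset of attributes by rewarding correlation with the class and penalizing correlation among the attributes themselves. The sketch below computes the commonly cited merit heuristic for one subset; Weka's CfsSubsetEval may differ in detail, the function name is hypothetical, and the search over subsets (the BestFirst part) is not shown.

```python
import math


def cfs_merit(class_corrs, mean_feature_corr):
    """Merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    class_corrs: correlation of each attribute in the subset with the class
    mean_feature_corr: average correlation between pairs of attributes
    """
    k = len(class_corrs)
    r_cf = sum(class_corrs) / k
    return k * r_cf / math.sqrt(k + k * (k - 1) * mean_feature_corr)
```

Two attributes that each correlate 0.5 with the class score higher when they are uncorrelated with each other than when they are fully redundant, which is the behavior the heuristic is meant to capture.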
Association
Example: Apriori (using “KDDCup99_sample.arff”)
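The level-wise search that Apriori is based on can be sketched in plain Python: find frequent single items, join them into larger candidates, and keep only candidates that meet the minimum support. This is a simplified illustration and not Weka's implementation; the function names and the toy transactions are hypothetical, and the usual subset-pruning optimization is omitted.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting `min_support`."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    result = {}
    size = 1
    while level:
        # keep only the candidates of the current size that are frequent
        kept = {s: support(s, transactions) for s in level}
        kept = {s: sup for s, sup in kept.items() if sup >= min_support}
        result.update(kept)
        size += 1
        # join frequent sets into candidates that are one item larger
        level = list({a | b for a in kept for b in kept if len(a | b) == size})
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# hypothetical toy transactions
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a"}),
    frozenset({"b", "c"}),
]
```

With a minimum support of 0.5, the itemset {b, c} is frequent here, and the rule c -> b has confidence 1.0 because every transaction containing c also contains b.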