

Running Clustering Algorithm in Weka

Presented by Rachsuda Jiamthapthaksin

Computer Science Department, University of Houston


What is Weka?

• Data mining software in Java
  – Supervised learning (classification)
  – Unsupervised learning (clustering)

• Tools
  – Exploration
  – Visualization
  – Experiment
  – Statistical summary


Getting Started


Memory Limitation in Weka

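• Run the GUI Chooser from the DOS command prompt to increase the memory available to Weka

• The -Xmx128m option sets the maximum Java heap size to 128 MB:

  C:\> java -Xmx128m -classpath .;/progra~1/weka-3-5/weka.jar weka.gui.GUIChooser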
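One way to confirm that a heap setting such as -Xmx128m took effect is to ask the JVM for its maximum heap size. The sketch below uses only the standard Runtime API; the class name HeapCheck is hypothetical and is not part of Weka.

```java
// Minimal sketch: report the maximum heap the JVM will try to use.
// HeapCheck is a hypothetical helper, not a Weka class.
class HeapCheck {
    /** Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx value, possibly rounded by the JVM.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
    }
}
```

Running it with the same flag as the Chooser, for example `java -Xmx128m HeapCheck`, should report a value close to 128 MiB; the exact figure can differ slightly between JVMs.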


Weka GUI


Explorer


Open Files (.csv, .arff)


Dataset’s Description

Attributes

Dataset’s statistics


Remove Class Attribute

Non-class attributes
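Clustering is unsupervised, so the class label is removed and only the non-class attributes are clustered. As a rough illustration of what removing attribute 5 means for the data, the sketch below drops one column from in-memory rows. DropColumn and dropColumn are hypothetical names, and this is not how Weka's Remove filter is implemented; the two sample rows are the first two iris instances with their class label appended for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: drop one column (1-based index) from every row.
// DropColumn is a hypothetical helper, not Weka's Remove filter.
class DropColumn {
    /** Returns a copy of each row without the column at the given 1-based index. */
    static List<String[]> dropColumn(List<String[]> rows, int oneBasedIndex) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            if (oneBasedIndex < 1 || oneBasedIndex > row.length) {
                throw new IllegalArgumentException("No column " + oneBasedIndex);
            }
            String[] kept = new String[row.length - 1];
            for (int i = 0, j = 0; i < row.length; i++) {
                if (i != oneBasedIndex - 1) {
                    kept[j++] = row[i]; // copy every value except the dropped one
                }
            }
            out.add(kept);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"5.1", "3.5", "1.4", "0.2", "Iris-setosa"});
        rows.add(new String[] {"4.9", "3", "1.4", "0.2", "Iris-setosa"});
        for (String[] row : dropColumn(rows, 5)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The index is 1-based here to match the way attribute ranges are written in the Weka GUI.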


Select A Clustering Algorithm


Parameter Settings


Run A Clustering Algorithm


DBSCAN Results

=== Run information ===

Scheme:       weka.clusterers.DBScan -E 0.9 -M 6 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R5
Instances:    150
Attributes:   4
              sepallength
              sepalwidth
              petallength
              petalwidth
Test mode:    evaluate on training data

=== Model and evaluation on training set ===

DBScan clustering results
========================================================================================

Clustered DataObjects: 150
Number of attributes: 4
Epsilon: 0.9; minPoints: 6
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 1
Elapsed time: .06

(  0.) 5.1,3.5,1.4,0.2  -->  0
(  1.) 4.9,3,1.4,0.2  -->  0
(  2.) 4.7,3.2,1.3,0.2  -->  0
(  3.) 4.6,3.1,1.5,0.2  -->  0
(  4.) 5,3.6,1.4,0.2  -->  0
…
(146.) 6.3,2.5,5,1.9  -->  0
(147.) 6.5,3,5.2,2  -->  0
(148.) 6.2,3.4,5.4,2.3  -->  0
(149.) 5.9,3,5.1,1.8  -->  0

Clustered Instances

0      150 (100%)
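To make the roles of Epsilon and minPoints concrete, here is a minimal sketch of the DBSCAN idea in plain Java. It is not Weka's DBScan class: MiniDbscan, its methods, and the toy points are all hypothetical, and it uses raw Euclidean distance with a brute-force neighbor search, so it is not meant to reproduce the numbers above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal DBSCAN sketch (hypothetical helper, not Weka's implementation).
class MiniDbscan {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Indices of all points within eps of points[p], including p itself. */
    static List<Integer> neighbors(double[][] points, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int q = 0; q < points.length; q++) {
            if (distance(points[p], points[q]) <= eps) {
                result.add(q);
            }
        }
        return result;
    }

    /** Returns one label per point: 0, 1, ... for clusters, NOISE for outliers. */
    static int[] cluster(double[][] points, double eps, int minPoints) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int p = 0; p < points.length; p++) {
            if (labels[p] != UNVISITED) continue;
            List<Integer> seeds = neighbors(points, p, eps);
            if (seeds.size() < minPoints) {
                labels[p] = NOISE; // not a core point; may become a border point later
                continue;
            }
            labels[p] = clusterId;
            // Grow the cluster; the seed list is extended while we walk it by index.
            for (int i = 0; i < seeds.size(); i++) {
                int q = seeds.get(i);
                if (labels[q] == NOISE) labels[q] = clusterId; // border point joins the cluster
                if (labels[q] != UNVISITED) continue;
                labels[q] = clusterId;
                List<Integer> qNeighbors = neighbors(points, q, eps);
                if (qNeighbors.size() >= minPoints) {
                    seeds.addAll(qNeighbors); // q is a core point, so expand through it
                }
            }
            clusterId++;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] toy = {
            {1.0, 1.0}, {1.1, 1.0}, {1.0, 1.1},
            {5.0, 5.0}, {5.1, 5.0}, {5.0, 5.1},
            {9.0, 1.0}
        };
        System.out.println(Arrays.toString(cluster(toy, 0.3, 3)));   // two clusters plus one noise point
        System.out.println(Arrays.toString(cluster(toy, 20.0, 3)));  // eps too large: everything merges
    }
}
```

On the toy data, the small epsilon separates the two dense groups and marks the isolated point as noise, while the large epsilon puts every point into cluster 0, which is the same kind of behavior as the single-cluster result above.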


Simplify A Tested Dataset



Parameter Settings


DBSCAN Clustering Results

=== Run information ===

Scheme:       weka.clusterers.DBScan -E 0.3 -M 50 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances:    150
Attributes:   2
              petallength
              petalwidth
Test mode:    evaluate on training data

=== Model and evaluation on training set ===

DBScan clustering results
========================================================================================

Clustered DataObjects: 150
Number of attributes: 2
Epsilon: 0.3; minPoints: 50
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 2
Elapsed time: .03

(  0.) 1.4,0.2  -->  0
(  1.) 1.4,0.2  -->  0
(  2.) 1.3,0.2  -->  0
(  3.) 1.5,0.2  -->  0
…
(146.) 5,1.9  -->  1
(147.) 5.2,2  -->  1
(148.) 5.4,2.3  -->  1
(149.) 5.1,1.8  -->  1

Clustered Instances

0       50 ( 33%)
1      100 ( 67%)


Run k-Means in Weka


Parameter Settings


k-Means Clustering Results

=== Run information ===

Scheme:       weka.clusterers.SimpleKMeans -N 2 -S 10
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances:    150
Attributes:   2
              petallength
              petalwidth
Test mode:    evaluate on training data

=== Model and evaluation on training set ===

kMeans
======

Number of iterations: 6
Within cluster sum of squared errors: 5.179687509974782

Cluster centroids:

Cluster 0
    Mean/Mode: 4.906 1.676
    Std Devs:  0.8256 0.4248
Cluster 1
    Mean/Mode: 1.464 0.244
    Std Devs:  0.1735 0.1072

Clustered Instances

0      100 ( 67%)
1       50 ( 33%)
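The quantities in this report (iterations, centroids, within-cluster sum of squared errors) can be illustrated with a minimal k-means sketch in plain Java. This is not Weka's SimpleKMeans: MiniKMeans and its methods are hypothetical, the starting centroids are chosen by hand rather than from a random seed, and distances are computed on the raw values, so the figures will not match the output above. The eight sample points are the petallength/petalwidth pairs shown in the DBSCAN listing.

```java
import java.util.Arrays;

// Minimal k-means (Lloyd's algorithm) sketch; hypothetical helper, not Weka's SimpleKMeans.
class MiniKMeans {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the point (ties go to the lower index). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Alternates assignment and centroid update; centroids are modified in place. */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched cluster
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] != c) continue;
                    count++;
                    for (int d = 0; d < sum.length; d++) sum[d] += points[i][d];
                }
                if (count == 0) continue; // leave an empty cluster's centroid where it is
                for (int d = 0; d < sum.length; d++) centroids[c][d] = sum[d] / count;
            }
        }
        return assignment;
    }

    /** Sum of squared distances from each point to its assigned centroid. */
    static double withinClusterSse(double[][] points, double[][] centroids, int[] assignment) {
        double sse = 0.0;
        for (int i = 0; i < points.length; i++) {
            sse += squaredDistance(points[i], centroids[assignment[i]]);
        }
        return sse;
    }

    public static void main(String[] args) {
        double[][] points = {
            {1.4, 0.2}, {1.4, 0.2}, {1.3, 0.2}, {1.5, 0.2},
            {5.0, 1.9}, {5.2, 2.0}, {5.4, 2.3}, {5.1, 1.8}
        };
        double[][] centroids = {{1.4, 0.2}, {5.1, 1.8}}; // hand-picked starting centroids
        int[] assignment = cluster(points, centroids, 100);
        System.out.println("Assignment: " + Arrays.toString(assignment));
        System.out.println("Centroids:  " + Arrays.deepToString(centroids));
        System.out.println("SSE:        " + withinClusterSse(points, centroids, assignment));
    }
}
```

With these starting centroids the first four points end up in cluster 0 and the last four in cluster 1. Different starting centroids can change the cluster numbering, which is presumably what the -S seed option influences in the Weka run.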


ArffViewer: Convert Dataset’s Extension


Open A Dataset’s File


Select A Dataset’s File


View the Dataset


Manipulate the Dataset (Optional)


Save As .Arff File
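For readers who have not seen an ARFF file, the sketch below shows roughly what the conversion from .csv produces: an @relation line, one @attribute line per column, and an @data section. It is a simplified illustration in plain Java, not the ArffViewer's code; CsvToArff, the relation name iris-petals, and the assumption that every column is numeric are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Minimal CSV-to-ARFF sketch (hypothetical helper, not Weka's converter).
// Assumes a header row, comma separators, and numeric columns only.
class CsvToArff {
    /** Builds ARFF text from CSV lines; the first line must hold the column names. */
    static String convert(String relation, List<String> csvLines) {
        StringBuilder arff = new StringBuilder();
        arff.append("@relation ").append(relation).append("\n\n");
        for (String name : csvLines.get(0).split(",")) {
            arff.append("@attribute ").append(name.trim()).append(" numeric\n");
        }
        arff.append("\n@data\n");
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (!line.trim().isEmpty()) {
                arff.append(line.trim()).append("\n"); // data rows are already comma-separated
            }
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
            "petallength,petalwidth",
            "1.4,0.2",
            "5.1,1.8");
        System.out.print(convert("iris-petals", csv));
    }
}
```

Nominal attributes such as a class label would need an @attribute line listing the allowed values in braces instead of the numeric type, which this sketch does not handle.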


Weka Documentation