
Department of Computer Science, University of Waikato, New Zealand

Eibe Frank

WEKA: A Machine Learning Toolkit

The Explorer

• Classification and Regression

• Clustering

• Association Rules

• Attribute Selection

• Data Visualization

The Experimenter

The Knowledge Flow GUI

Conclusions

Machine Learning with WEKA

2/22/2011 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer


WEKA: the software

Machine learning/data mining software written in Java (distributed under the GNU General Public License)

Used for research, education, and applications

Complements “Data Mining” by Witten & Frank

Main features:

• Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods

• Graphical user interfaces (incl. data visualization)

• Environment for comparing learning algorithms


WEKA: versions

There are several versions of WEKA:

• WEKA 3.0: “book version” compatible with description in data mining book

• WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)

• WEKA 3.3: “development version” with lots of improvements

This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)


@relation heart-disease-simplified

@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...

WEKA only deals with “flat” files
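A “flat” file is a single table: one row per instance, one column per attribute. As a rough illustration only, the Python sketch below reads the small subset of ARFF used in the example above (numeric and nominal attributes, `?` for a missing value). The function name `parse_arff` is hypothetical and not part of WEKA, and the sketch ignores the rest of the format (quoted values, other attribute types).

```python
def parse_arff(text):
    """Parse a minimal ARFF subset: numeric and nominal attributes only."""
    relation = None
    attributes = []   # list of (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row[name] = None             # missing value
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value            # nominal value kept as a string
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

Each parsed row is a plain dictionary keyed by attribute name, which mirrors the one-table structure the slide describes.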


Explorer: pre-processing the data

Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary

Data can also be read from a URL or from an SQL database (using JDBC)

Pre-processing tools in WEKA are called “filters”

WEKA contains filters for:

• Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
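To make two of the listed filter types concrete, here is a minimal Python sketch of normalization and discretization applied to one numeric attribute. It is not WEKA's filter code: the function names are hypothetical, and equal-width binning is chosen here only because it is the simplest discretization scheme to show.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Replace each numeric value with the index of its equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Cholesterol values of the three complete instances in the ARFF example.
cholesterol = [233, 286, 229]
print(normalize(cholesterol))
print(discretize_equal_width(cholesterol, bins=3))
```

Both functions take a column and return a transformed column of the same length, which is the general shape of a filter: data in, data out, with the learning scheme applied afterwards.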


Explorer: building “classifiers”

Classifiers in WEKA are models for predicting nominal or numeric quantities

Implemented learning schemes include:

• Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

“Meta”-classifiers include:

• Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
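The difference between a learning scheme and a “meta”-classifier can be sketched in a few lines of Python. Below, a 1-nearest-neighbour rule stands in for an instance-based classifier, and bagging wraps it by voting over models built on bootstrap samples. This is an illustrative sketch, not WEKA's implementation: the function names, the number of members and the toy data are all assumptions made for the example.

```python
import random
from collections import Counter


def nearest_neighbour(train, query):
    """Instance-based base learner: label of the closest training point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda item: sq_dist(item[0], query))
    return label


def bagging_predict(train, query, members=11, seed=0):
    """Meta-classifier: majority vote of base learners on bootstrap samples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(members):
        # Bootstrap sample: draw len(train) instances with replacement.
        sample = [rng.choice(train) for _ in train]
        votes[nearest_neighbour(sample, query)] += 1
    return votes.most_common(1)[0][0]


# Made-up toy data: two well-separated groups of 2-d points.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
    ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"), ((11, 11), "b"),
]
print(bagging_predict(train, (0.5, 0.5)))
print(bagging_predict(train, (10.5, 10.5)))
```

The meta-classifier never looks inside the base learner; it only calls it repeatedly on resampled data, which is why such wrappers can be combined with almost any scheme.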


Explorer: clustering data

WEKA contains “clusterers” for finding groups of similar instances in a dataset

Implemented schemes are:

• k-Means, EM, Cobweb, X-means, FarthestFirst

Clusters can be visualized and compared to “true” clusters (if given)

Evaluation based on log-likelihood if clustering scheme produces a probability distribution
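As an illustration of the first scheme in the list, the following Python sketch runs the standard k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is a minimal sketch under stated assumptions: the initial centroids are passed in by the caller, the points are made up, and nothing here reflects how WEKA's own k-means is initialized or configured.

```python
def kmeans(points, centroids, iterations=10):
    """Standard k-means iteration with caller-supplied initial centroids."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)          # keep the centroid of an empty cluster
        if updated == centroids:             # nothing moved: converged
            break
        centroids = updated
    return centroids, assignment


# Made-up 2-d points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, assignment = kmeans(points, [(1, 1), (8, 8)])
print(assignment)
print(centroids)
```

When “true” cluster labels are available, the returned assignment is what would be compared against them.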


Explorer: finding associations

WEKA contains an implementation of the Apriori algorithm for learning association rules

• Works only with discrete data

Can identify statistical dependencies between groups of attributes:

• milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)

Apriori can compute all rules that have a given minimum support and exceed a given confidence
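Support and confidence are easiest to see on a tiny example. The Python sketch below mines frequent itemsets level by level, pruning candidates that have an infrequent subset, and then derives rules from them. It is a simplified illustration rather than WEKA's Apriori: the function names and the five transactions are made up, support is counted as a number of transactions, and the confidence threshold is treated as a minimum (at least, rather than strictly greater than).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into larger candidates, then prune any
        # candidate that has an infrequent subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for each rule."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence = support(whole itemset) / support(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, support, confidence


# Made-up market-basket transactions.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
frequent = apriori(transactions, min_support=2)
for antecedent, consequent, support, confidence in rules(frequent, 0.6):
    print(sorted(antecedent), "=>", sorted(consequent), support, round(confidence, 2))
```

In this toy data the rule milk, butter ⇒ bread, eggs holds in 2 of the 3 transactions that contain both milk and butter, so its support is 2 and its confidence is 2/3.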


Explorer: attribute selection

Panel that can be used to investigate which (subsets of) attributes are the most predictive ones

Attribute selection methods contain two parts:

• A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking

• An evaluation method: correlation-based, wrapper, information gain, chi-squared, …

Very flexible: WEKA allows (almost) arbitrary combinations of these two
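The split into a search method and an evaluation method can be sketched with the simplest pairing from the lists above: ranking as the search, information gain as the evaluator. The Python below is an illustrative sketch with hypothetical function names, not WEKA's code; the rows are the nominal attributes of the four instances in the ARFF example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Evaluation method: drop in class entropy after splitting on one attribute."""
    labels = [row[target] for row in rows]
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(rows, attributes, target, evaluate=information_gain):
    """Search method 'ranking': score each attribute on its own, best first."""
    scored = [(evaluate(rows, a, target), a) for a in attributes]
    return sorted(scored, reverse=True)


rows = [
    {"sex": "male", "chest_pain_type": "typ_angina",
     "exercise_induced_angina": "no", "class": "not_present"},
    {"sex": "male", "chest_pain_type": "asympt",
     "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "male", "chest_pain_type": "asympt",
     "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "female", "chest_pain_type": "non_anginal",
     "exercise_induced_angina": "no", "class": "not_present"},
]
ranking = rank_attributes(
    rows, ["sex", "chest_pain_type", "exercise_induced_angina"], "class")
for score, name in ranking:
    print(round(score, 3), name)
```

Because the evaluator is passed in as a parameter, a different scoring function could be swapped in without touching the search, which is the kind of free combination the slide refers to. With only four instances the scores themselves mean little.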


Explorer: data visualization

Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem

WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)

• To do: rotating 3-d visualizations (Xgobi-style)

Color-coded class values

“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)

“Zoom-in” function
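Why jitter helps with nominal attributes can be shown in a few lines of Python: instances that share the same nominal values land on exactly the same plot position, and a little random noise pulls them apart. This is a sketch of the general idea only; the function name, the uniform noise and the integer coding of nominal values are assumptions, not a description of WEKA's plotting code.

```python
import random


def jitter(points, amount, seed=0):
    """Add uniform noise in [-amount, amount] to each coordinate for plotting."""
    rng = random.Random(seed)
    return [
        tuple(coord + rng.uniform(-amount, amount) for coord in point)
        for point in points
    ]


# Nominal values encoded as integer codes: three instances share (0, 1),
# so an unjittered scatter plot would draw them as a single dot.
encoded = [(0, 1), (0, 1), (0, 1), (1, 0)]
spread = jitter(encoded, amount=0.2)
print(len(set(encoded)), "distinct positions before,", len(set(spread)), "after")
```

The noise is applied to the plotted coordinates only; the underlying data stay unchanged.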


Conclusion: try it yourself!

WEKA is available at http://www.cs.waikato.ac.nz/ml/weka

• Also has a list of projects based on WEKA

WEKA contributors:

Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang