Data Mining: STATISTICA

user.engineering.uiowa.edu/~comp/public/statistica.pdf

The University of Iowa Intelligent Systems Laboratory

Outline

• Prepare the data
• Classification and regression
• Clustering
• Association rules
• Graphic user interface

Prepare the Data
• Statistica can read Excel, .txt, and many other file types
• Compared with WEKA, Statistica makes data preparation much easier

Open an Excel File
• Click “Import selected sheet to Spreadsheet”
• Select the Excel sheet where your data is stored
• Get variable names from the first row

Open an Excel File
• Change the variable type
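
For readers without Statistica at hand, the two import steps (variable names taken from the first row, then a change of variable type) can be sketched in plain Python. This is only an illustration under stated assumptions: it reads tab-delimited text rather than an Excel sheet, and the column names and values are made up.

```python
import csv
import io

# Hypothetical tab-delimited text standing in for an exported sheet;
# the column names and values are made up for illustration.
raw = (
    "sepal_length\tsepal_width\tspecies\n"
    "5.1\t3.5\tsetosa\n"
    "7.0\t3.2\tversicolor\n"
)

# DictReader takes the variable names from the first row.
reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
rows = list(reader)
variable_names = reader.fieldnames

# Every value is read as text, so change the type of the numeric variables.
numeric_variables = ["sepal_length", "sepal_width"]
for row in rows:
    for name in numeric_variables:
        row[name] = float(row[name])

print(variable_names)
print(rows[0])
```

The categorical variable (`species`) is left as text, which mirrors keeping a variable as a label rather than a number.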

Classification and Regression

• C&RT
• Boosting tree
• Neural Networks

C&RT Classification
• The Iris data set is used as an example

C&RT Classification
• Click the “Data Mining” menu and find “Interactive Trees”

C&RT Classification
• View the final tree and interpret the results
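
To help read a classification tree, it may be useful to see how a single split is chosen. The sketch below searches one variable for the threshold with the lowest weighted Gini impurity, which is the usual C&RT criterion for classification. The slides do not state which criterion or settings Statistica applies, so treat this as an assumption; the measurements are made up and are not the actual Iris data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Made-up petal lengths (cm) and species labels, for illustration only.
petal_length = [1.4, 1.5, 1.3, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica"]

threshold, impurity = best_split(petal_length, species)
print(threshold, round(impurity, 3))
```

A full tree repeats this search on each resulting subset and over every predictor; each node in the final tree corresponds to one such threshold.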

C&RT Regression
• Use the CPU data set and select regression analysis

C&RT Regression
• Regression tree structure

C&RT Regression
[Plot; axis label: Predicted values]

Boosting tree Classification
• In the “Data Mining” menu, find “Boosted tree classifier and regression”

Boosting tree Classification
• See the results and the importance of each predictor

Boosting tree Regression
• CPU data set
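
The idea behind boosted regression trees can be sketched in a few lines: start from the mean, then repeatedly fit a very small tree to the current residuals and add a damped version of it to the model. The code below is a minimal sketch of gradient boosting for squared error with one-split trees (stumps). It is not Statistica's implementation, the parameter names are hypothetical, and the numbers are made up rather than taken from the CPU data set.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) that minimizes squared error."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on the given cases."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up values standing in for one predictor and the target.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 12, 15, 30, 32, 35, 60, 64]

one_tree = boost(xs, ys, n_trees=1)
many_trees = boost(xs, ys, n_trees=20)
print(round(mse(one_tree, xs, ys), 2), round(mse(many_trees, xs, ys), 2))
```

The training error shrinks as trees are added, which is why boosted models are normally checked against a separate testing set.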

Boosting tree Classification
• See the results and the importance of each predictor
[Plot; axis label: Predicted values]

Neural Networks Classification
• In the “Data Mining” menu, find “Automated Neural Networks”

Neural Networks Classification
• Choose “Classification”, then select variables

Neural Networks Classification
• Statistica tries a set of different neural networks and keeps the best ones
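
The "try several networks, keep the best" idea can be illustrated with a deliberately tiny stand-in. The slides do not describe how Statistica generates or ranks its candidate networks, so everything below is an assumption: each candidate is a single logistic neuron, the candidates differ only in learning rate and number of epochs, and the ranking uses accuracy on a small made-up validation set.

```python
import math
import random

def train_neuron(data, learning_rate, epochs, seed):
    """Train a single logistic neuron with stochastic gradient descent."""
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in data:
            z = bias + sum(w * x for w, x in zip(weights, inputs))
            output = 1.0 / (1.0 + math.exp(-z))
            error = output - target  # gradient of the log loss with respect to z
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

def accuracy(model, data):
    """Fraction of cases whose predicted class matches the target."""
    weights, bias = model
    hits = 0
    for inputs, target in data:
        z = bias + sum(w * x for w, x in zip(weights, inputs))
        hits += int((z > 0) == (target == 1))
    return hits / len(data)

# Made-up two-variable cases with a 0/1 class label.
train = [((0.1, 0.2), 0), ((0.3, 0.1), 0), ((0.2, 0.4), 0),
         ((0.8, 0.9), 1), ((0.7, 0.6), 1), ((0.9, 0.7), 1)]
validation = [((0.2, 0.2), 0), ((0.8, 0.8), 1), ((0.4, 0.3), 0), ((0.6, 0.9), 1)]

# Hypothetical candidate settings: (learning rate, epochs).
candidates = [(0.01, 5), (0.1, 50), (0.5, 200), (1.0, 500)]

results = []
for seed, (lr, epochs) in enumerate(candidates):
    model = train_neuron(train, lr, epochs, seed)
    results.append((accuracy(model, validation), lr, epochs, model))

# Keep the two candidates with the highest validation accuracy.
results.sort(key=lambda r: r[0], reverse=True)
retained = results[:2]
for score, lr, epochs, _ in retained:
    print(f"accuracy={score:.2f} learning_rate={lr} epochs={epochs}")
```

Ranking on held-out cases rather than on the training cases is the design choice that matters here; a network that only memorizes its training data would otherwise look best.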

Neural Networks Classification
• See the classification results

Neural Networks Classification
• See the classification results: Predictions

Neural Networks Classification
• See the classification results: Confusion matrix
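
A confusion matrix simply counts cases by actual class and predicted class. The sketch below builds one from two label lists; the labels are made up for illustration and are not output from Statistica.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count cases for every (actual class, predicted class) pair."""
    classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = {a: {p: counts[(a, p)] for p in classes} for a in classes}
    return classes, matrix

# Made-up labels for illustration.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

classes, matrix = confusion_matrix(actual, predicted)
correct = sum(matrix[c][c] for c in classes)
accuracy = correct / len(actual)

# Rows are actual classes, columns are predicted classes.
for a in classes:
    print(a, [matrix[a][p] for p in classes])
print("accuracy:", round(accuracy, 3))
```

Off-diagonal counts show which classes are confused with each other, which a single accuracy figure hides.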

Neural Networks Regression
• CPU data set

Neural Networks Regression
• CPU data set: select variables

Neural Networks Regression
• Training and results

Neural Networks Regression
• Predictions

Neural Networks Regression
• Some statistics about the predictions
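
The slide does not list which statistics Statistica reports, so the following is an assumption about what such a summary typically contains: mean absolute error, root mean squared error, and the correlation between observed and predicted values. The numbers are made up.

```python
import math

def prediction_stats(observed, predicted):
    """Summarize regression predictions with MAE, RMSE, and Pearson correlation."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    correlation = cov / math.sqrt(var_o * var_p)
    return {"mae": mae, "rmse": rmse, "correlation": correlation}

# Made-up observed and predicted values for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

stats = prediction_stats(observed, predicted)
for name, value in stats.items():
    print(name, round(value, 3))
```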

Clustering
• Use the Deere data set

Clustering
• Select k-Means and choose the variables

Clustering
• Choose the distance metric and the initial cluster centers

Clustering
• Set 5 clusters and see the results

Clustering
• Centroids (cluster means)

Clustering
• Members and their distances to the centroids
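
The choices made in the clustering dialogs (distance metric, initial centers, number of clusters) map directly onto the inputs of the k-means procedure. The sketch below is a plain implementation of Lloyd's algorithm, assuming Euclidean distance and hand-picked initial centers. It uses two clusters and made-up points rather than five clusters on the Deere data set, and it is not Statistica's implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, centers, distance=euclidean, max_iter=100):
    """Lloyd's algorithm starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each case joins the cluster with the nearest center.
        labels = [min(range(len(centers)), key=lambda k: distance(p, centers[k]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Made-up two-variable cases for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
initial_centers = [points[0], points[3]]

centroids, labels = k_means(points, initial_centers)
print("centroids:", centroids)
for p, label in zip(points, labels):
    print(p, "cluster", label, "distance", round(euclidean(p, centroids[label]), 3))
```

The final loop prints the same two views the slides show: the centroids (cluster means) and each member's distance to its centroid.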

Association rules
• Use the Deere data set

Association rules
• Select variables and set appropriate parameters

Association rules
• See the rules
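
The parameters of an association-rule search are usually a minimum support and a minimum confidence; the slides do not name the exact parameters in Statistica's dialog, so that is an assumption here. The sketch below mines rules between single items only, on made-up transactions, to show how the two thresholds filter the output.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) between single items."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Made-up transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rules = mine_rules(transactions, min_support=0.5, min_confidence=0.7)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

Lowering either threshold yields more rules, so the "proper parameters" are a trade-off between coverage and reliability.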

Graphic User Interface
• Divide the CPU data into training and testing data sets
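
Dividing a data set into training and testing parts is normally done at random so that both parts resemble the whole. The slides do not say how the CPU data was divided or in what proportion, so the 30% testing fraction and the fixed seed below are assumptions, and the integers are stand-ins for data rows.

```python
import random

def split_cases(cases, test_fraction=0.3, seed=0):
    """Shuffle the cases reproducibly and split them into training and testing sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

cases = list(range(20))  # stand-in for the rows of the data set
training, testing = split_cases(cases)
print(len(training), "training cases,", len(testing), "testing cases")
```

Fixing the seed makes the split repeatable, so different algorithms can be compared on exactly the same testing cases.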

Graphic User Interface
• Choose different algorithms

Graphic User Interface
• Insert the selected data mining algorithms into the workspace

Graphic User Interface
• Select data sources

Graphic User Interface
• Specify whether the data is used to build the model or as a testing set

Graphic User Interface
• Connect the data with the data mining algorithms

Graphic User Interface
• Set up deployment: double-click the data mining algorithm icon

Graphic User Interface
• Click the “Run” button

Graphic User Interface
• See the deployment code by double-clicking the icons in the “Reports” section
[Screenshot label: C code]

Graphic User Interface
• Test the learnt models on the testing data set
• First disable the connections between the training data set and the data mining algorithms
• Connect the testing data set with the data mining algorithms

Graphic User Interface
• Test the learnt models

Graphic User Interface
• See the prediction results
