Source: user.engineering.uiowa.edu/~comp/public/statistica.pdf
Data Mining: STATISTICA
The University of Iowa Intelligent Systems Laboratory
Outline
• Prepare the data
• Classification and regression
• Clustering
• Association rules
• Graphic user interface
Prepare the Data
• Statistica can read Excel, .txt, and many other file types
• Compared with WEKA, Statistica makes data preparation much easier
Open an Excel File
• Click “Import selected sheet to Spreadsheet”
• Select the Excel sheet where your data is stored
• Get variable names from the first row
• Change the variable type
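The import steps above happen in Statistica's dialogs. As a rough illustration of the same idea outside Statistica, the Python sketch below reads delimited text, takes the variable names from the first row, and converts selected columns from text to numbers. The sample data, the column names, and the `load_table` helper are assumptions made for this example; they do not come from the slides.

```python
import csv
import io

# Made-up sample standing in for an exported sheet; row 1 holds the variable names.
RAW = """sepal_length,sepal_width,species
5.1,3.5,setosa
7.0,3.2,versicolor
6.3,3.3,virginica
"""

def load_table(text, numeric_columns):
    """Read delimited text, name the variables from row 1, convert chosen columns to float."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        row = dict(record)
        for name in numeric_columns:
            # "Change the variable type": text becomes a continuous value.
            row[name] = float(row[name])
        rows.append(row)
    return reader.fieldnames, rows

names, rows = load_table(RAW, numeric_columns=["sepal_length", "sepal_width"])
print(names)
print(rows[0])
```

Columns left out of `numeric_columns` (here `species`) stay as text, which corresponds to treating them as categorical variables.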
Classification and Regression
• C&RT
• Boosting tree
• Neural Networks
C&RT Classification
• The Iris data set is used as an example
• Open the “Data Mining” menu and choose “Interactive Trees”
• View the final tree and interpret the results
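To help read the final tree, it may be useful to see how a classification tree can choose a single split. The Python sketch below scores every candidate threshold on one variable by weighted Gini impurity, a criterion commonly used in C&RT-style trees. It is not Statistica's implementation, and the measurements are made-up values rather than the real Iris data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # start from the unsplit node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best

# Made-up petal lengths (cm) and species, for illustration only.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 5.9, 5.6]
species = ["setosa", "setosa", "setosa", "versicolor",
           "versicolor", "versicolor", "virginica", "virginica"]

threshold, impurity = best_split(petal_length, species)
print(threshold, impurity)
```

A full tree repeats this search on each resulting subset and over all variables until a stopping rule is met.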
C&RT Regression
• Use the CPU data set and select the regression analysis
• Regression tree structure
• Predicted values (figure)
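A regression tree differs from a classification tree mainly in its split criterion and in what each leaf predicts. The Python sketch below fits a one-split tree that minimizes the total squared error and predicts the mean of each side. It assumes a least-squares criterion, which is common for regression trees but is not confirmed by the slides, and the `memory` and `performance` values are made up rather than taken from the CPU data set.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_regression_stump(x, y):
    """Find the 'x <= threshold' split that minimizes the total squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best["error"]:
            best = {
                "threshold": (lo + hi) / 2,
                "error": error,
                "left_mean": sum(left) / len(left),
                "right_mean": sum(right) / len(right),
            }
    return best

def predict(stump, value):
    """Each leaf predicts the mean target of the training cases that fell into it."""
    if value <= stump["threshold"]:
        return stump["left_mean"]
    return stump["right_mean"]

# Made-up predictor and target values, for illustration only.
memory = [1, 2, 3, 10, 11, 12]
performance = [10, 12, 11, 50, 52, 51]

stump = fit_regression_stump(memory, performance)
for m, observed in zip(memory, performance):
    print(m, observed, predict(stump, m))
```

Printing observed next to predicted values gives the same kind of comparison as a predicted-values plot.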
Boosting Tree Classification
• In the “Data Mining” menu, choose “Boosted tree classifier and regression”
• View the results and the predictor importance
Boosting Tree Regression
• Use the CPU data set
• View the results and the predictor importance
• Predicted values (figure)
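The idea behind boosted trees can be sketched in a few lines: start from a constant prediction, then repeatedly fit a small tree to the current residuals and add a damped copy of it to the model. The Python sketch below does this for squared-error regression with one-split trees. It is a generic gradient-boosting illustration under those assumptions, not Statistica's algorithm, and the data, `rounds`, and `learning_rate` values are made up.

```python
def fit_stump(x, residuals):
    """Best single-threshold split of x, scored by squared error on the residuals."""
    best = None
    distinct = sorted(set(x))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [r for v, r in zip(x, residuals) if v <= threshold]
        right = [r for v, r in zip(x, residuals) if v > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(x, y, rounds=20, learning_rate=0.5):
    """Fit an additive model of stumps; return the base value, stumps, and fitted values."""
    base = sum(y) / len(y)
    fitted = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [target - f for target, f in zip(y, fitted)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        fitted = [
            f + learning_rate * (left_mean if v <= threshold else right_mean)
            for v, f in zip(x, fitted)
        ]
    return base, stumps, fitted

def mse(observed, predicted):
    """Mean squared error between two equally long lists."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 4, 8, 9, 12, 11]

base, stumps, fitted = boost(x, y)
print(mse(y, [base] * len(y)), mse(y, fitted))
```

The two printed numbers are the training error of the constant model and of the boosted model; the second should be smaller because each round reduces the squared error on the training data.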
Neural Networks Classification
• In the “Data Mining” menu, choose “Automated Neural Networks”
• Choose “Classification”, then select the variables
• Statistica tries a set of different neural networks and keeps the best ones
• View the classification results: predictions
• View the classification results: confusion matrix
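A confusion matrix is simply a count of (actual class, predicted class) pairs. The Python sketch below builds one from two label lists and derives the overall accuracy from its diagonal. The labels are made-up stand-ins for a classifier's output, and the `confusion_matrix` helper is written for this example.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Rows are actual classes, columns are predicted classes, in sorted label order."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

# Made-up actual and predicted classes, for illustration only.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

labels, matrix = confusion_matrix(actual, predicted)
# Correct predictions sit on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)

print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
print(accuracy)
```

Off-diagonal cells show which classes are confused with each other, which a single accuracy figure hides.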
Neural Networks Regression
• Use the CPU data set and select the variables
• Training and results
• Predictions
• Some statistics about the predictions
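The slides do not list which prediction statistics are reported. As an assumption, the Python sketch below computes three that are commonly used for regression models: mean absolute error, root mean squared error, and the Pearson correlation between observed and predicted values. The numbers are made up.

```python
import math

def prediction_statistics(observed, predicted):
    """Return MAE, RMSE, and Pearson correlation for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    covariance = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    correlation = covariance / math.sqrt(spread_o * spread_p)
    return {"mae": mae, "rmse": rmse, "correlation": correlation}

# Made-up observed and predicted values, for illustration only.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

stats = prediction_statistics(observed, predicted)
print(stats)
```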
Clustering
• Use the Deere data set
• Select k-Means and choose the variables
• Choose the distance metric and the initial cluster centers
• Set 5 clusters and view the results
• Centroids (cluster means)
• Members and their distances to the centroids
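The choices made in the clustering dialogs (distance metric, initial centers, number of clusters) map directly onto the k-means loop. The Python sketch below assumes Euclidean distance and takes the initial centers as an argument. It uses two clusters on made-up two-dimensional points to keep the output short; it does not use the Deere data or reproduce Statistica's implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
            for p in points]

def k_means(points, initial_centers, max_iter=100):
    """Alternate assignment and centroid update until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        assignment = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                # The centroid is the per-variable mean of the cluster members.
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

# Made-up points, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignment = k_means(points, initial_centers=[points[0], points[3]])

print(centers)
for point, cluster in zip(points, assignment):
    print(point, cluster, euclidean(point, centers[cluster]))
```

The final loop lists each member with its cluster and its distance to that cluster's centroid, the same information as the members table in the results.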
Association Rules
• Use the Deere data set
• Select the variables and set appropriate parameters
• View the rules
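The slides do not name the parameters to set. In most association-rule tools they are a minimum support and a minimum confidence, and the Python sketch below assumes exactly those two. It finds rules between single items only, on made-up transactions, so it illustrates the measures rather than a full rule-mining algorithm.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def single_item_rules(transactions, min_support, min_confidence):
    """Rules 'antecedent -> consequent' between single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            # Confidence: how often the consequent appears when the antecedent does.
            confidence = pair_support / support({antecedent}, transactions)
            if confidence >= min_confidence:
                found.append((antecedent, consequent, pair_support, confidence))
    return found

# Made-up transactions, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rules = single_item_rules(transactions, min_support=0.5, min_confidence=0.8)
for antecedent, consequent, s, c in rules:
    print(antecedent, "->", consequent, s, c)
```

Raising either threshold shortens the rule list; lowering them produces more rules, many of them weak.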
Graphic User Interface
• Divide the CPU data into training and testing data sets
• Choose different algorithms
• Insert the selected data mining algorithms into the workspace
• Select the data sources
• Specify whether the data is used to build the model or as a testing set
• Connect the data with the data mining algorithms
• Set up deployment by double-clicking the data mining algorithm icon
• Click the “Run” button
• View the deployment code by double-clicking the icons in the “Reports” section
• Test the learnt models with the testing data set
• First disable the connections between the training data set and the data mining algorithms
• Connect the testing data set with the data mining algorithms
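The workflow above separates model building from model testing. The Python sketch below shows the same separation in miniature: a seeded random split into training and testing sets, a model built from the training set only, and an error measured on the testing set only. The 70/30 split, the seed, the made-up records, and the mean-value baseline model are all assumptions for illustration; the slides do not state how the CPU data was divided.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows with a fixed seed and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Made-up (predictor, target) records, for illustration only.
records = [(i, 2.0 * i + 1.0) for i in range(10)]
train, test = train_test_split(records)

# Build the "model" from the training set only: here, just the mean target.
baseline = sum(target for _, target in train) / len(train)

# Evaluate on the testing set only, so the error reflects unseen cases.
test_mae = sum(abs(target - baseline) for _, target in test) / len(test)

print(len(train), len(test), test_mae)
```

Fixing the seed makes the split repeatable, so different algorithms can be compared on the same testing set.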