Data Mining
Dr. Saed Sayad, University of Toronto
2010
http://chem-eng.utoronto.ca/~datamining/
Data Mining
Data mining is about explaining the past and predicting the
future by means of data analysis.
Data Mining
Figure: Data Mining in relation to AI & Machine Learning, Statistics, and Database & DW (data warehousing).
Data Mining Applications
Figure: bar chart of data mining application areas (horizontal axis 0 to 60). Areas shown: Gambling, Entertainment/Music, Investment/Stocks, Junk email/Anti-spam, Security/Anti-terrorism, Travel/Hospitality, Web, Biotech/Genomics, e-Commerce, Other, Government applications, Medical/Pharma, Health care/HR, Science, Manufacturing, Telecom, Insurance, Retail, Fraud Detection, Direct Marketing/Fundraising, Credit Scoring, Banking, CRM.
Source: KDnuggets.com
Data mining activity in 2007 compared to 2006
Much higher: 20%
Somewhat higher: 30%
About the same: 41%
Somewhat lower: 4%
Much lower: 5%
Source: KDnuggets.com
Data Mining Steps
1. Problem Definition
2. Data Preparation
3. Data Exploration
4. Modeling
5. Evaluation
6. Deployment
CRISP-DM Process Model (CRoss-Industry Standard Process for Data Mining)
Source: http://www.crisp-dm.org/Process/index.htm
1. Problem Definition
Understanding the project objectives and requirements from a business perspective and then converting this knowledge into a data mining problem definition with a preliminary plan designed to achieve the objectives.
Source: http://www.crisp-dm.org/Process/index.htm
2. Data Preparation
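Figure: source data (Text files, and databases reached through a DSN, or data source name) passes through ETL (extract, transform, load) to produce the Modeling Data.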
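A minimal Python sketch of the ETL step, assuming a CSV text file and an in-memory SQLite database stand in for the Text and DSN sources. The table and column names (`purchases`, `id`, `age`, `amount`) and the helper functions are hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical text-file source: customer id and age (customer 2 has no age).
CUSTOMERS_CSV = """id,age
1,34
2,
3,51
"""

def extract_text(text):
    """Extract: read rows from a CSV text source."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_db(conn):
    """Extract: total purchase amount per customer from a database source."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM purchases GROUP BY customer_id"
    )
    return {cid: total for cid, total in rows}

def build_modeling_data(text, conn):
    """Transform and load: join both sources into one modeling table."""
    totals = extract_db(conn)
    table = []
    for row in extract_text(text):
        if not row["age"]:  # transform: drop records with a missing age
            continue
        cid = int(row["id"])
        table.append({
            "id": cid,
            "age": int(row["age"]),
            "amount": totals.get(cid, 0.0),  # no purchases -> 0
        })
    return table

# Hypothetical database source (in-memory SQLite instead of a real DSN).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)",
                 [(1, 200.0), (1, 150.0), (2, 80.0)])
modeling_data = build_modeling_data(CUSTOMERS_CSV, conn)
print(modeling_data)
```

Dropping incomplete records is only one cleaning choice; imputing a missing age is another.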
3. Data Exploration
Univariate Analysis
  Statistics: Average, StDev, Min, Max, ...
  Charts: Bar, Line, Pie, ...
Bivariate Analysis
  Statistics: Correlation, Z test, ...
  Charts: Combination Charts
Data Exploration - Univariate
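Univariate analysis looks at one variable at a time. A small Python sketch of the statistics listed for it (Average, StDev, Min, Max) and of the category counts a bar or pie chart displays; the sample values are hypothetical.

```python
from collections import Counter
import statistics

def univariate_summary(values):
    """Average, standard deviation, min and max of one numeric variable."""
    return {
        "average": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }

def category_counts(values):
    """Frequency of each category: the numbers a bar or pie chart displays."""
    return Counter(values)

ages = [23, 35, 41, 29, 52]  # hypothetical sample
summary = univariate_summary(ages)
responders = category_counts(["Y", "N", "N", "Y", "N"])
print(summary)
print(responders)
```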
Data Exploration - Bivariate
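Bivariate analysis looks at two variables together. A sketch of two of the statistics listed for it: the Pearson correlation between two numeric variables, and a two-sample Z test for a difference in means. The Z test here uses sample variances as estimates, which assumes reasonably large groups; the data are hypothetical.

```python
import math
import statistics

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two numeric variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def two_sample_z_test(group1, group2):
    """Z statistic and two-sided p-value for the difference of two means."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    se = math.sqrt(statistics.variance(group1) / len(group1)
                   + statistics.variance(group2) / len(group2))
    z = (m1 - m2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal CDF
    return z, p

r = pearson_correlation([20, 30, 40, 50], [100, 200, 300, 400])
z, p = two_sample_z_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(r, z, p)
```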
4. Modeling
Classification: Bayesian, Decision Tree, Logistic Regression, SVM
Regression: Linear Regression, Robust Regression, Neural Network
Clustering: Hierarchical, K-Means
Association: Apriori
Data Mining: Classification & Regression
Frequency Table: OneR, Bayesian, Decision Tree, Markov Chains, HMM
Covariance Matrix: Linear Regression, LDA (Z Score), PCA/PCR, Logistic Regression, Robust Regression
Similarity Functions: KNN
Neural Networks: Perceptron, Back Propagation, RBF
Others: SVM, GA, Scalable Methods
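OneR, listed under Frequency Table, builds one frequency table per predictor, lets each predictor value predict its most frequent target class, and keeps the predictor whose rules make the fewest errors. A minimal Python sketch, assuming categorical predictors (numeric ones would need binning first); the rows and field names are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, predictors, target):
    """Return (best_predictor, rules, errors) using the OneR procedure."""
    best = None
    for p in predictors:
        # Frequency table: target class counts for each value of predictor p.
        table = defaultdict(Counter)
        for row in rows:
            table[row[p]][row[target]] += 1
        # Each value predicts its majority class; all other rows are errors.
        rules = {v: c.most_common(1)[0][0] for v, c in table.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in table.values())
        if best is None or errors < best[2]:
            best = (p, rules, errors)
    return best

rows = [
    {"age": "young", "income": "high", "responder": "N"},
    {"age": "young", "income": "low", "responder": "N"},
    {"age": "old", "income": "high", "responder": "Y"},
    {"age": "old", "income": "low", "responder": "Y"},
    {"age": "old", "income": "low", "responder": "N"},
]
best_predictor, rules, errors = one_r(rows, ["age", "income"], "responder")
print(best_predictor, rules, errors)
```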
Modeling - Classification
Figure: predicting Responder, a categorical target (e.g., Y or N), from Age.
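As one illustration of classification, here is a K nearest neighbors (KNN) sketch, a method from the Similarity Functions group, that predicts a Y/N responder label from age by majority vote of the k closest training ages. The training pairs are hypothetical, and absolute age difference is an assumed similarity function.

```python
from collections import Counter

def knn_predict(train, query_age, k=3):
    """Majority label among the k training examples with the closest ages."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query_age))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (age, responder) training data.
train = [(22, "N"), (25, "N"), (31, "N"), (45, "Y"), (52, "Y"), (60, "Y")]
print(knn_predict(train, 48))
print(knn_predict(train, 24))
```

An odd k avoids tied votes when there are only two classes.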
Modeling - Regression
Figure: predicting AmountPurchased, a numeric target (e.g., $350), from Age.
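For regression, simple linear regression (listed under Covariance Matrix) fits AmountPurchased = b0 + b1 × Age with slope b1 = cov(Age, Amount) / var(Age) and intercept b0 = mean(Amount) − b1 × mean(Age). A sketch on hypothetical data; the n − 1 normalizers of covariance and variance cancel, so plain sums are used.

```python
import statistics

def fit_simple_linear_regression(x, y):
    """Least-squares line y = b0 + b1 * x, with b1 = cov(x, y) / var(x)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    b1 = cov_xy / var_x
    b0 = my - b1 * mx
    return b0, b1

ages = [20, 30, 40, 50]         # hypothetical customers
amounts = [100, 200, 300, 400]  # hypothetical AmountPurchased in $
b0, b1 = fit_simple_linear_regression(ages, amounts)
predicted = b0 + b1 * 45
print(b0, b1, predicted)
```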
Modeling - Clustering
Figure: customers grouped into clusters by Age and Income.
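A plain K-Means sketch for clustering customers by (Age, Income). It alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points. Two simplifying assumptions: the first k points serve as initial centroids, and the features are left unscaled. In practice Age and Income are usually standardized first so the wider-ranging variable does not dominate the distance. The data (income in thousands) are hypothetical.

```python
import math

def k_means(points, k, iterations=100):
    """Return (centroids, clusters); initial centroids are the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical (age, income in $1000s) pairs.
points = [(25, 30), (60, 95), (27, 35), (58, 90), (30, 32), (62, 100)]
centroids, clusters = k_means(points, 2)
print(centroids)
print(clusters)
```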
Association Rules
Market Basket Analysis
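Association rules in market basket analysis are scored by support (the fraction of baskets that contain an itemset) and confidence (support of A ∪ C divided by support of A, for a rule A → C). The sketch below computes these two measures and brute-forces the frequent item pairs. It is not the full Apriori algorithm, only the measures Apriori builds on. The baskets are hypothetical.

```python
from itertools import combinations

def support(baskets, itemset):
    """Fraction of baskets containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A | C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

def frequent_pairs(baskets, min_support):
    """All item pairs whose support meets min_support (brute force)."""
    items = sorted(set().union(*baskets))
    return [pair for pair in combinations(items, 2)
            if support(baskets, pair) >= min_support]

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
print(frequent_pairs(baskets, 0.5))
```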
5. Evaluation
Charts and stats: Variables Contribution, Mean Square Error, Confusion Matrix, K-S Chart, Lift Chart, Gain Chart
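Two of the evaluation measures listed above can be computed directly. Mean square error averages the squared differences between actual and predicted values for a regression model. The K-S statistic ranks records by model score and takes the largest gap between the cumulative share of positives and of negatives seen so far. A sketch with hypothetical data, assuming labels coded 1/0 and ignoring tied scores.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def ks_statistic(scores, labels):
    """Max gap between cumulative % of positives and negatives, highest score first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    cum_pos = cum_neg = 0
    best = 0.0
    for _, label in ranked:
        if label == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        best = max(best, abs(cum_pos / total_pos - cum_neg / total_neg))
    return best

mse = mean_squared_error([3, 5, 2], [2.5, 5, 3])
ks = ks_statistic([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0])
print(mse, ks)
```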
Evaluation - Confusion Matrix
                      Positive Cases    Negative Cases
Predicted Positive    True Positive     False Positive
Predicted Negative    False Negative    True Negative
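A sketch that fills the four cells of the confusion matrix from actual and predicted labels, plus four common rates derived from it (accuracy, precision, sensitivity, specificity). The derived rates are standard measures but are not shown on the slide. The "Y"/"N" labels and data are hypothetical.

```python
def confusion_matrix(actual, predicted, positive="Y"):
    """Counts of TP, FP, FN, TN for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

def summary_rates(cm):
    """Rates commonly read off a confusion matrix."""
    total = sum(cm.values())
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "precision": cm["TP"] / (cm["TP"] + cm["FP"]),
        "sensitivity": cm["TP"] / (cm["TP"] + cm["FN"]),
        "specificity": cm["TN"] / (cm["TN"] + cm["FP"]),
    }

actual = ["Y", "Y", "N", "N", "Y", "N"]
predicted = ["Y", "N", "N", "Y", "Y", "N"]
cm = confusion_matrix(actual, predicted)
rates = summary_rates(cm)
print(cm, rates)
```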
Evaluation – Gain Chart
Figure: gain chart plotting Responder% (y-axis) against Population% (x-axis), both from 0% to 100%; marked values include 10%, 45%, and 50%.
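A gain chart is built by ranking records from highest to lowest model score and plotting, at each point, the share of the population contacted against the share of all responders captured so far. Random selection gives the diagonal (x% of the population yields x% of the responders). A sketch with hypothetical scores and 1/0 responder labels:

```python
def gain_chart(scores, labels):
    """Cumulative (population %, responder %) points, highest scores first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_responders = sum(labels)
    points = [(0.0, 0.0)]
    captured = 0
    for i, (_, label) in enumerate(ranked, start=1):
        captured += label
        points.append((100.0 * i / len(ranked),
                       100.0 * captured / total_responders))
    return points

scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
points = gain_chart(scores, labels)
print(points)
```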
6. Deployment
SQL, VB, Java, HTML
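One way to deploy a simple rule-based model (for example, OneR rules) in SQL is to translate the rules into a CASE expression that the database evaluates at scoring time. A minimal sketch, assuming SQLite. The `customers` table, the `age_band` column, and the `rules_to_sql_case` helper are hypothetical. The column name is inserted into the SQL text directly, so it must come from trusted code, not user input.

```python
import sqlite3

def rules_to_sql_case(column, rules, default):
    """Translate {value: class} rules into a SQL CASE expression string."""
    def quote(s):
        # SQL string literal with embedded single quotes escaped.
        return "'" + str(s).replace("'", "''") + "'"
    whens = " ".join(f"WHEN {column} = {quote(v)} THEN {quote(c)}"
                     for v, c in rules.items())
    return f"CASE {whens} ELSE {quote(default)} END"

expr = rules_to_sql_case("age_band", {"young": "N", "old": "Y"}, "N")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age_band TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "young"), (2, "old")])
scored = conn.execute(
    f"SELECT id, {expr} FROM customers ORDER BY id").fetchall()
print(scored)
```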
Data Mining Team
Modeler, Analyst, DBA, Domain Expert
Data Mining Software Vendors
SAS, KXEN, KNIME, Angoss, SPSS
Case Study...