Prediction in Data Mining
-
8/8/2019 Prediction in Data Mining
Hong Vit Lm
Nguyn Minh Tn
-
Outline
• Data Mining Prediction in general
  • Definition
  • How does Prediction work?
  • Prediction Evaluation
• Some current research in Data Mining prediction
  • Regression
  • SVM
  • Neural Network
-
Definition
• Prediction in data mining is the process of building a continuous-valued function that predicts future values from current data.
• Difference from classification:
  • Classification: predicts categorical labels
  • Prediction: predicts continuous values
• Fields of application:
  • Medicine, economics
-
How does Prediction work?
Data preprocessing
• Data cleaning
• Relevance analysis
• Data transformation and reduction
Learning phase
• Input: training set
• Output: continuous-valued function (the predictor)
Testing phase
• Input: testing set, continuous-valued function
• Output: accuracy evaluation, suitable predictor
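The preprocessing steps and the split into training and testing sets can be sketched in a few lines of Python. This is only an illustration: the records, the use of None for a missing value, the choice of min-max normalization, and the 75% training fraction are all assumptions, not details from the slides. Relevance analysis and data reduction are not shown.

```python
def clean(records):
    """Data cleaning: drop records that contain a missing (None) value."""
    return [r for r in records if None not in r]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split(records, train_fraction=0.75):
    """Split records into a training set and a testing set."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Hypothetical (x, y) records; None marks a missing value.
raw = [(1.0, 10.0), (2.0, None), (3.0, 30.0),
       (4.0, 40.0), (None, 5.0), (5.0, 50.0)]

cleaned = clean(raw)
xs = min_max_normalize([x for x, _ in cleaned])
ys = [y for _, y in cleaned]
train, test = split(list(zip(xs, ys)))
```

The training set would then go to the learning phase and the testing set to the testing phase. In practice the normalization bounds would usually be computed from the training set alone, so that no information from the testing set reaches the learner.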
-
Criteria for comparing prediction methods
Accuracy
Speed
Robustness
Scalability
Interpretability
-
Predictor Error Measures
• Loss functions:
  • Absolute error
  • Squared error
• Test error rates:
  • Mean absolute error
  • Mean squared error
  • Relative absolute error
  • Relative squared error
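The four test error rates can be written out directly. The sketch below uses the usual textbook definitions: the relative measures divide the total error by the error of a baseline that always predicts the mean. Here the mean is taken over the actual values passed in, which is an assumption; some texts use the mean of the training targets instead. The sample values are made up.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute errors |y - y'|."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared errors (y - y')^2."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual, predicted):
    """Total absolute error relative to always predicting the mean."""
    mean_y = sum(actual) / len(actual)
    total = sum(abs(y - p) for y, p in zip(actual, predicted))
    baseline = sum(abs(y - mean_y) for y in actual)
    return total / baseline


def relative_squared_error(actual, predicted):
    """Total squared error relative to always predicting the mean."""
    mean_y = sum(actual) / len(actual)
    total = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    baseline = sum((y - mean_y) ** 2 for y in actual)
    return total / baseline


# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
```

Squared error penalizes large deviations more heavily than absolute error, so the mean squared error is more sensitive to outliers.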
-
Evaluating the Accuracy of a Predictor
• Holdout method and random subsampling
• Cross-validation
• Bootstrap
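As one illustration of these methods, k-fold cross-validation can be sketched as follows. Each fold is held out once for testing while the remaining folds are used for learning. The "predictor" here is deliberately trivial (it always predicts the mean of the training values) and the round-robin fold assignment is an assumption made to keep the example deterministic; real use would normally shuffle the data first. Holdout and bootstrap are not shown.

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k folds of near-equal size."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds


def cross_validate(ys, k):
    """Mean squared error of a mean-value predictor, averaged over k folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(ys), k):
        held_out = set(test_idx)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        prediction = sum(train) / len(train)  # learning phase: training mean
        mse = sum((ys[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
        fold_errors.append(mse)  # testing phase: error on the held-out fold
    return sum(fold_errors) / k


# Hypothetical target values.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
average_error = cross_validate(values, k=3)
```

Every record is used for testing exactly once, which gives a less variable estimate than a single holdout split at the cost of training k times.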
-
Regression
• Statistical methodology developed by Sir Francis Galton
• A good choice when all of the predictor variables are continuous-valued
• Types of regression:
  • Linear regression / non-linear regression
  • Single variable / multiple variables
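The simplest case, linear regression with a single variable, fits a line y = intercept + slope * x by least squares. The sketch below uses the standard closed-form solution; the function names and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the fitted continuous-valued function to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data lying exactly on y = 1 + 2x.
model = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
forecast = predict(model, 5.0)
```

With multiple variables the same idea applies, but the coefficients are found by solving a system of linear equations rather than with these two formulas.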
-
Support Vector Machines
• SVM: uses a nonlinear mapping to transform the original training data into a higher dimension, then finds the optimal linear separating hyperplane in that space
• In regression, it can be used to learn the relationship between the inputs and outputs of the training tuples
• Pros and cons:
  • Training can be slow, but the resulting models are highly accurate
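The effect of the nonlinear mapping can be shown on a toy example. The sketch below does not train an SVM; it only illustrates why the mapping helps. The 1-D points cannot be separated by a single threshold, but after mapping each x to (x, x^2) a straight line separates them. The data, the mapping, and the hand-picked hyperplane are all assumptions; an actual SVM would learn the maximum-margin hyperplane from the data.

```python
def feature_map(x):
    """Nonlinear mapping from a 1-D input x to the 2-D point (x, x^2)."""
    return (x, x * x)


def linear_decision(point, weights, bias):
    """Side of the hyperplane w . z + b = 0 on which the point falls."""
    score = sum(w * z for w, z in zip(weights, point)) + bias
    return 1 if score > 0 else -1


# Hypothetical 1-D data: label +1 when |x| > 1, otherwise -1.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

# Hand-picked (not learned) hyperplane in the mapped space: x^2 - 1 = 0.
weights, bias = (0.0, 1.0), -1.0
predictions = [linear_decision(feature_map(x), weights, bias) for x in xs]
```

In the original space the positive points lie on both sides of the negative ones, so no single cut point works; in the mapped space the second coordinate alone separates the two groups.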
-
Neural Network
• Artificial neural networks: non-linear predictive models that learn through training and resemble biological neural networks in structure.
• One of the most commonly used techniques in data mining
• Pros:
  • High tolerance of noisy data and of patterns on which they have not been trained
  • Well suited to continuous-valued inputs and outputs
  • Successful in a wide range of applications
• Cons:
  • Long training times
  • Poor interpretability
• Architecture:
  • Feed-forward
• Algorithm:
  • Backpropagation
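A feed-forward network trained with backpropagation can be sketched in plain Python. The example below is a minimal illustration under several assumptions not taken from the slides: one input, one hidden layer of four tanh units, a linear output unit, squared-error loss, per-sample gradient descent, and the target function y = x^2.

```python
import math
import random


def init_network(n_hidden, seed=0):
    """Small random weights for a 1-input, 1-output network."""
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],  # input -> hidden
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],  # hidden -> output
        "b2": 0.0,
    }


def forward(net, x):
    """Feed-forward pass: tanh hidden layer, then a linear output unit."""
    hidden = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output


def train_step(net, x, y, lr):
    """One backpropagation update for the loss 0.5 * (output - y)^2."""
    hidden, output = forward(net, x)
    error = output - y  # gradient of the loss with respect to the output
    for j, h in enumerate(hidden):
        # Propagate the error back through w2 and the tanh derivative,
        # using the old value of w2[j] before it is updated.
        grad_hidden = error * net["w2"][j] * (1.0 - h * h)
        net["w2"][j] -= lr * error * h
        net["w1"][j] -= lr * grad_hidden * x
        net["b1"][j] -= lr * grad_hidden
    net["b2"] -= lr * error


def mean_squared_error(net, data):
    return sum((forward(net, x)[1] - y) ** 2 for x, y in data) / len(data)


# Hypothetical training set: y = x^2 sampled on [-1, 1].
data = [(k / 10.0, (k / 10.0) ** 2) for k in range(-10, 11)]
net = init_network(n_hidden=4)
loss_before = mean_squared_error(net, data)
for epoch in range(500):
    for x, y in data:
        train_step(net, x, y, lr=0.05)
loss_after = mean_squared_error(net, data)
```

Comparing loss_before with loss_after shows how much the training reduced the error on this data. The many passes over the data reflect the long training times listed among the cons, and the learned weights have no direct meaning to a reader, which is the interpretability problem.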