Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

MSBA 6420 Rapid Winners Kaushik Nuvvula, Pankaj Singhal, Wenqiuli Zhang, John Tong, Rohith D Rapid Approaches

Upload: kaushik-nuvvula

Post on 10-Feb-2017

TRANSCRIPT

Page 1: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

MSBA 6420 Rapid Winners

Kaushik Nuvvula, Pankaj Singhal, Wenqiuli Zhang, John Tong, Rohith D

Rapid Approaches

Page 2: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Agenda

Approach

Different techniques used

Techniques that worked

Techniques that did not work

Best Model

Future Scope and Learnings

Page 3: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Approach

Top 5 Models

Model

Neural Network

Voting (SVM, Neural)

Boosting (Neural)

Bagging (Neural)

Stacking (k-NN, Neural)

Techniques Used

Sampling, Normalization

Data Pre-processing, Weights, Bagging, Voting, Sampling

Data Pre-processing, Sampling, Generate Attributes

Select by weights, Data Pre-processing, Sampling

Data Pre-processing, Sampling

Page 4: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

F-measure and cost: Top 5 Models

Page 5: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Techniques that worked

Data Processing

Data Preprocessing

Generate Attributes

Attribute Selection Techniques

Attributes Selection

Optimize Parameters

Techniques

PCA, SMOTE

Voting

Normalization

Bagging, Boosting, Stacking

Filter Examples, Sampling

Page 6: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website


SMOTE: Resampling Approach

• SMOTE (Synthetic Minority Over-sampling Technique) combines informed oversampling of the minority class with random under-sampling of the majority class.

• For each minority sample:
– Find its k nearest minority neighbors
– Randomly select j of these neighbors
– Randomly generate synthetic samples along the lines joining the minority sample and its j selected neighbors

*Among re-sampling and probabilistic-estimate-modifying techniques, SMOTE currently yields the best results (Chawla, 2003).
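The oversampling steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the RapidMiner or any library implementation: the function name `smote_samples`, its parameters, and the toy points are assumptions made for the example, and the random under-sampling of the majority class is left out.

```python
import math
import random

def smote_samples(minority, k=5, j=2, seed=None):
    """Return synthetic points built from a list of minority-class points.

    Assumes k <= len(minority) - 1 and j <= k (hypothetical helper).
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Step 1: find the k nearest minority neighbors of x (excluding x itself)
        others = [p for idx, p in enumerate(minority) if idx != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        # Step 2: randomly select j of these neighbors
        for nb in rng.sample(neighbors, j):
            # Step 3: pick a random point on the segment joining x and the neighbor
            gap = rng.random()
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy minority class with two features (made-up values)
minority_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_samples(minority_points, k=3, j=2, seed=0)
```

Because every synthetic point is an interpolation between two minority points, it never falls outside the range already spanned by the minority class; it can still land near majority samples, which is the concern raised on the next slide.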

Page 7: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Deep Dive: SMOTE Sampling

[Figure: SMOTE sampling diagram. Legend: minority sample, synthetic sample, majority sample.]

What happens if there is a nearby majority sample?

Page 8: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Techniques that did not work

• Meta-cost
• Forward Selection
• Logistic Regression

Page 9: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Best Model

Neural Network

Class 0: Above 0.1

Class 0: Between 0.03 – 0.1

F-measure and Misclassification Cost Improvement

Page 10: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Scope - Improvements

Filter Examples

Metric                Change          Improvement
Average Friend Age    17 to 31        Positive
Tenure                > 4             Positive
Songs Listened        > 1             Negative
Age                   > 8 and < 70    Negative
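If the Change column is read as the condition that Filter Examples keeps in the training data, the two filters marked Positive could be sketched as below. The field names `avg_friend_age` and `tenure`, the inclusive bounds on the age range, and the sample records are all assumptions for illustration; the slides do not give the actual attribute names or units.

```python
def filter_examples(records, friend_age_range=(17, 31), min_tenure=4):
    """Keep records that satisfy both filters marked Positive on the slide.

    Field names and inclusive range bounds are hypothetical.
    """
    low, high = friend_age_range
    return [
        r for r in records
        if low <= r["avg_friend_age"] <= high and r["tenure"] > min_tenure
    ]

# Made-up training records
training_records = [
    {"avg_friend_age": 24.0, "tenure": 10},
    {"avg_friend_age": 45.0, "tenure": 12},
    {"avg_friend_age": 20.0, "tenure": 2},
]
kept = filter_examples(training_records)
```

Only the training set would be filtered this way; evaluation data would normally be left untouched so the reported F-measure reflects the full population.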

Page 11: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Key Learnings: Warnings

• Remove oversampling: bias in the data
• Generate calculated attributes
• Complex F-measure
• Try to train your models on records that capture relatively higher variability, using Filter Examples

Page 12: Predictive Modeling: Predict Premium Subscriber for a Leading International Music Website

Appendix

           True 0    True 1
Pred. 0    24259     335
Pred. 1    1302      109

           True 0    True 1
Pred. 0    23442     320
Pred. 1    1289      400

F-measure

Misclassification Cost
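As a worked check, the F-measure can be recomputed from the two confusion matrices in this appendix. The sketch assumes class 1 (premium subscriber) is the positive class. The slides do not state the misclassification costs, so `cost_fp` and `cost_fn` are hypothetical parameters left at 1.0.

```python
def f_measure(tp, fp, fn):
    """F-measure (harmonic mean of precision and recall) for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def misclassification_cost(fp, fn, cost_fp=1.0, cost_fn=1.0):
    """Total cost of errors; the per-error costs are assumed, not from the slides."""
    return cost_fp * fp + cost_fn * fn

# Counts read from the appendix, with class 1 taken as positive:
# TP = Pred. 1 / True 1, FP = Pred. 1 / True 0, FN = Pred. 0 / True 1
first = {"tp": 109, "fp": 1302, "fn": 335}
second = {"tp": 400, "fp": 1289, "fn": 320}

print(round(f_measure(**first), 4))   # 0.1175
print(round(f_measure(**second), 4))  # 0.3321
```

Under this reading, the second matrix has roughly the same number of false positives but far more true positives, which is what lifts the F-measure.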