H2O World: Solving Customer Churn with Machine Learning (Julian Bharadwaj)
TRANSCRIPT
© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
Consumer Churn Program: framework, capabilities, and lessons learned (well, at least so far…)
Before and after…
The thinking around churn

Before:
• "Wait, the consumer hasn't churned yet; we'll do xx after they churn."
• Churn happens when we find out someone hasn't transacted.

After:
• Let's assign a probability every day and figure out, today, whether someone is going to churn in the next pre-defined churn period. It's OK if you're not super accurate.
• A consumer churned on the day of their last transaction: not when we found out, but when they did their last transaction (probably).
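The "after" framing can be sketched as a daily labeling rule. This is a minimal illustration only: the 90-day window and all names are assumptions, not PayPal's actual definitions.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=90)  # hypothetical pre-defined churn period

def churn_date(txn_dates):
    """A consumer's churn date is the day of their last transaction,
    not the day we later noticed the inactivity."""
    return max(txn_dates)

def churns_in_window(txn_dates, as_of):
    """Daily label: True if the consumer makes no transaction in the
    churn window that starts on the scoring date `as_of`."""
    window_end = as_of + CHURN_WINDOW
    return not any(as_of < d <= window_end for d in txn_dates)

txns = [date(2014, 1, 5), date(2014, 2, 10)]
```

Scoring every consumer with a rule like this, every day, is what turns churn from something discovered after the fact into something forecast in advance.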
Rough idea of end product
What do we think will resonate with our internal customers?

Cust Segment   Driver                     P(churn)
C1             Month of prior txn         0.945
C2             Days since your last txn   0.883
C3             Days since / max. gap      0.657
C4             Lifetime spend             0.760
Rough idea of audience
How will our internal customers use the Churn Model output?

Executive
• Consistency (can't change 12-month churn to 45.87 days, or refer to churn as a "brief hiatus")
• Aggregates and segments
• What drives action here may be related to, but different from, what drives action for other personas, so code needs to be written
• Easy to put into PowerPoint, email, Excel

Marketing Analysts
• A moderately fast tool to size populations
• Must have filters on region and country
• Actual population is much smaller
• Test/control clarity and a size estimator
• Data, documentation
Predictive Modeling Exercise
Mission Statement to Data Product

Exploratory Data Analysis
• Feature engineering and reduction
• SQL, Pig, Python, JMP, R, scikit-learn
• Transaction variables: very important. Behavioral variables: moderately important. Demographic variables: meh.
• Automation is critical; it saves time in the long run
• Optimize SQL or MapReduce now; don't wait until production
• JDBC >> ODBC

Modeling
• Further feature reduction, fitting, tuning, validation
• R, H2O
• Ensemble models rock! Validate your sample size, go multiprocessing early, QC your data
• Train/test/validate data sets
• AUC to set the threshold
• Compare models on confusion-matrix metrics such as accuracy, in-class error, recall, …
• Build an MVP for time/accuracy and iterate

Production
• R, H2O, C3 (PayPal's S3), HTML, Tableau, FEXP
• Scale with C3 and a Unix cluster-management tool
• An HTML wrapper helps keep things organized and version-controlled
• I/O is time-consuming; FEXP on a DT ETL box is super fast
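The "AUC to set the threshold" step in the Modeling column can be illustrated with a tiny pure-Python sketch. The scores and labels below are made up; in practice they would come from the model's predictions on a validation set.

```python
def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    (churner) outscores a randomly chosen negative (non-churner)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_threshold(labels, scores):
    """Pick the probability cut-off that maximizes F1 on a validation set."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= t)
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= t)
        fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < t)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
```

F1 is only one reasonable objective for the cut-off; a cost-weighted choice (misclassifying a churner is usually more expensive than a false alarm) is a common alternative.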
Modeling
Performance

[Diagram: train on 365 days of data with 5-fold cross-validation (CV1 … CV5); validate on the following 90 days]

Metric     Value
F1         0.87
Precision  0.86
Recall     0.88
Accuracy   0.87

Train: 2 million samples
Validation: 1 million samples

Precision = TP / (TP + FP): the % of wolves when I cried "Wolf"
Recall = TP / (TP + FN): the % of wolves I actually identified
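Those metric definitions can be written out directly. The counts below are illustrative, chosen only so the results land near the table above; they are not the actual confusion-matrix counts behind it.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)                   # % of wolves when I cried "Wolf"
    recall = tp / (tp + fn)                      # % of wolves I actually identified
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # overall fraction correct
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Hypothetical counts on a balanced 200-row validation sample
p, r, a, f1 = confusion_metrics(tp=88, fp=14, fn=12, tn=86)
```

Reporting precision and recall alongside accuracy matters here because churners and non-churners are rarely balanced in the real population, and accuracy alone hides that.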
Modeling
Benchmarking R's Random Forest against H2O's Distributed Random Forest

Software            Hardware                             Modeling   Scoring
R, ODBC             1 processor, 32 GB RAM               6 hrs      72 hrs
Revolution R, ODBC  8 processors, 32 GB RAM              1 hr       48+ hrs (did not complete)
H2O, JDBC           3 machines, 24 processors, 50 GB     30 min     12 hrs (mainly I/O)
H2O, JDBC           16 machines, 128 processors, 300 GB  20 min     25 min (unzip)
H2O, Hadoop         20 nodes                             10 min     5 min (about 4 min is I/O)

Every configuration trained on hundreds of thousands of rows and scored the entire consumer base, except H2O on Hadoop, which trained and scored on the entire consumer base!

Goal: modeling in under 30 minutes, scoring in under 1 hour. That enables multiple models daily: a true forecast!
Production
Process used for identifying individual features
Current
• Normalize feature importance
• Normalize features per consumer:
  standard score = (feature value - mean) / standard deviation
• Sort feature columns by feature importance × standard score for each feature
• Works for most cases, but misses obvious branching in corner cases
• OK for an MVP, but not a great process

Enhancement
• Multiple runs of the same model, each with one feature left out
• Evaluate the difference in predicted probability for each run
• Order the differences by feature to find the features with the most impact
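The "Current" per-consumer ranking can be sketched as below: rank each feature by its global importance weighted by how unusual this consumer's value is. All feature names, importances, and values are made-up illustrations; taking the absolute standard score is a sketch choice so that unusually low values rank as highly as unusually high ones.

```python
from statistics import mean, stdev

def rank_features(consumer, population, importance):
    """Rank a consumer's features by importance * |standard score|."""
    scores = {}
    for feat, value in consumer.items():
        col = [row[feat] for row in population]
        z = (value - mean(col)) / stdev(col)      # standard score
        scores[feat] = importance[feat] * abs(z)  # importance-weighted
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical population, importances, and consumer
population = [
    {"days_since_txn": 10, "lifetime_spend": 500},
    {"days_since_txn": 30, "lifetime_spend": 800},
    {"days_since_txn": 50, "lifetime_spend": 200},
]
importance = {"days_since_txn": 0.6, "lifetime_spend": 0.2}
consumer = {"days_since_txn": 90, "lifetime_spend": 600}
```

The enhancement replaces this heuristic with a leave-one-feature-out loop: retrain (or rescore) without each feature in turn and rank features by how much the consumer's predicted probability moves, which captures the branching behavior the linear weighting misses.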
So what?
Data science matters!
I can't share the $$ impact, so here are some proxies:
• Resources dedicated to the overall program: budget, headcount, and tech spend
• Feature-importance output fed into an enterprise-level framework
• An ongoing program built around the model (literally, around the output of the Random Forest and GBM); no longer a prototype (I need to figure out a way to productionize this stuff, quickly)