H2O World: Solving Customer Churn with Machine Learning (Julian Bharadwaj)
TRANSCRIPT
© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
Consumer Churn Program: framework, capabilities, and lessons learned (well, at least so far…)
Before and after…
The thinking around churn

Before:
• "Wait, the consumer hasn't churned yet; we'll do xx after they churn."
• Churn happens when we find out someone hasn't transacted.

After:
• Let's assign a probability every day and figure out, today, whether someone is going to churn in the next pre-defined churn period. It's OK if you're not super accurate.
• A consumer churned on the day of their last transaction: not when we found out, but when they did their last transaction (probably).
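The "after" framing can be sketched as a daily labeling rule. This is a minimal illustration only: the 90-day window and all names are assumptions, not PayPal's actual definitions.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=90)  # hypothetical pre-defined churn period

def churn_date(txn_dates):
    """A consumer's churn date is the day of their last transaction,
    not the day we later noticed the inactivity."""
    return max(txn_dates)

def churns_in_window(txn_dates, as_of):
    """Daily label: True if the consumer makes no transaction in the
    churn window that starts on the scoring date `as_of`."""
    window_end = as_of + CHURN_WINDOW
    return not any(as_of < d <= window_end for d in txn_dates)

txns = [date(2014, 1, 5), date(2014, 2, 10)]
```

Scoring every consumer with a rule like this, every day, is what turns churn from something discovered after the fact into something forecast in advance.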
Rough idea of end product
What do we think will resonate with our internal customers?

Cust Segment   Driver                     P(churn)
C1             Month of prior txn         0.945
C2             Days since your last txn   0.883
C3             Days since / max. gap      0.657
C4             Lifetime spend             0.760
Rough idea of audience
How will our internal customers use the Churn Model output?

Executive
• Consistency (can't change 12-month churn to 45.87 days, or refer to churn as a "brief hiatus")
• Aggregates and segments
• What drives action here may be related to, but different from, what drives action for other personas, so code needs to be written
• Easy to put into PowerPoint, email, Excel

Marketing Analysts
• A moderately fast tool to size populations
• Must have filters on region and country
• Actual population is much smaller
• Test/control clarity and a size estimator
• Data, documentation
Predictive Modeling Exercise
Mission Statement to Data Product

Exploratory Data Analysis
• Feature engineering and reduction
• SQL, Pig, Python, JMP, R, scikit-learn
• Transaction variables: very important. Behavioral variables: moderately important. Demographic variables: meh.
• Automation is critical; it saves time in the long run
• Optimize SQL or MapReduce now; don't wait until production
• JDBC >> ODBC

Modeling
• Further feature reduction, fitting, tuning, validation
• R, H2O
• Ensemble models rock! Validate your sample size, go multiprocessing early, QC your data
• Train/test/validate data sets
• AUC to set the threshold
• Compare models on confusion-matrix metrics such as accuracy, in-class error, recall, …
• Build an MVP for time/accuracy and iterate

Production
• R, H2O, C3 (PayPal's S3), HTML, Tableau, FEXP
• Scale with C3 and a Unix cluster-management tool
• An HTML wrapper helps keep things organized and version-controlled
• I/O is time-consuming; FEXP on a DT ETL box is super fast
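The "AUC to set the threshold" step in the Modeling column can be illustrated with a tiny pure-Python sketch. The scores and labels below are made up; in practice they would come from the model's predictions on a validation set.

```python
def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    (churner) outscores a randomly chosen negative (non-churner)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_threshold(labels, scores):
    """Pick the probability cut-off that maximizes F1 on a validation set."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= t)
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= t)
        fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < t)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
```

F1 is only one reasonable objective for the cut-off; a cost-weighted choice (misclassifying a churner is usually more expensive than a false alarm) is a common alternative.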
Modeling
Performance

[Diagram: train on 365 days of data with 5-fold cross-validation (CV1 … CV5); validate on the following 90 days]

Metric     Value
F1         0.87
Precision  0.86
Recall     0.88
Accuracy   0.87

Train: 2 million samples
Validation: 1 million samples

Precision = TP / (TP + FP): the % of wolves when I cried "Wolf"
Recall = TP / (TP + FN): the % of wolves I actually identified
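Those metric definitions can be written out directly. The counts below are illustrative, chosen only so the results land near the table above; they are not the actual confusion-matrix counts behind it.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)                   # % of wolves when I cried "Wolf"
    recall = tp / (tp + fn)                      # % of wolves I actually identified
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # overall fraction correct
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Hypothetical counts on a balanced 200-row validation sample
p, r, a, f1 = confusion_metrics(tp=88, fp=14, fn=12, tn=86)
```

Reporting precision and recall alongside accuracy matters here because churners and non-churners are rarely balanced in the real population, and accuracy alone hides that.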
Modeling
Benchmarking R's Random Forest against H2O's Distributed Random Forest

Software            Hardware                             Modeling   Scoring
R, ODBC             1 processor, 32 GB RAM               6 hrs      72 hrs
Revolution R, ODBC  8 processors, 32 GB RAM              1 hr       48+ hrs (did not complete)
H2O, JDBC           3 machines, 24 processors, 50 GB     30 min     12 hrs (mainly I/O)
H2O, JDBC           16 machines, 128 processors, 300 GB  20 min     25 min (unzip)
H2O, Hadoop         20 nodes                             10 min     5 min (about 4 min is I/O)

Every configuration trained on hundreds of thousands of rows and scored the entire consumer base, except H2O on Hadoop, which trained and scored on the entire consumer base!

Goal: modeling in under 30 minutes, scoring in under 1 hour. That enables multiple models daily: a true forecast!
Production
Process used for identifying individual features
Current
• Normalize feature importance
• Normalize features per consumer:
  standard score = (feature value - mean) / standard deviation
• Sort feature columns by feature importance × standard score for each feature
• Works for most cases, but misses obvious branching in corner cases
• OK for an MVP, but not a great process

Enhancement
• Multiple runs of the same model, each with one feature left out
• Evaluate the difference in predicted probability for each run
• Order the differences by feature to find the features with the most impact
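The "Current" per-consumer ranking can be sketched as below: rank each feature by its global importance weighted by how unusual this consumer's value is. All feature names, importances, and values are made-up illustrations; taking the absolute standard score is a sketch choice so that unusually low values rank as highly as unusually high ones.

```python
from statistics import mean, stdev

def rank_features(consumer, population, importance):
    """Rank a consumer's features by importance * |standard score|."""
    scores = {}
    for feat, value in consumer.items():
        col = [row[feat] for row in population]
        z = (value - mean(col)) / stdev(col)      # standard score
        scores[feat] = importance[feat] * abs(z)  # importance-weighted
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical population, importances, and consumer
population = [
    {"days_since_txn": 10, "lifetime_spend": 500},
    {"days_since_txn": 30, "lifetime_spend": 800},
    {"days_since_txn": 50, "lifetime_spend": 200},
]
importance = {"days_since_txn": 0.6, "lifetime_spend": 0.2}
consumer = {"days_since_txn": 90, "lifetime_spend": 600}
```

The enhancement replaces this heuristic with a leave-one-feature-out loop: retrain (or rescore) without each feature in turn and rank features by how much the consumer's predicted probability moves, which captures the branching behavior the linear weighting misses.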
So what?
Data science matters!
I can't share the $$ impact, so here are some proxies:
• Resources dedicated to the overall program: budget, headcount, and tech spend
• Feature-importance output fed into an enterprise-level framework
• An ongoing program built around the model (literally, around the output of the Random Forest and GBM); no longer a prototype (I need to figure out a way to productionize this stuff, quickly)