Customer-Base Analysis Using Aggregated Data (Or: The Joys of RCSS)
Kinshuk Jerath, Carnegie Mellon University
Peter S. Fader, Wharton/Univ. of Penn
Bruce G. S. Hardie, London Business School
Customer-Base Analysis
Faced with a customer transaction database, we may wish to determine:
The level of transactions we expect in future periods, both collectively and individually
Key characteristics of the cohort (e.g., degree of heterogeneity in behavior)
Formal financial metrics (such as “customer lifetime value”) to guide resource allocation decisions
Typical Data Structure
Models for customer-base analysis typically require access to individual-customer-level data
Long-Standing IT Challenges
Too-Much-Data Problem
Data Privacy Issues
Barriers to Disaggregate Data
Many firms may not (be able to) keep detailed individual-level records:
General weaknesses with the firm’s information systems capabilities
Corporate information silos make data integration difficult
Wariness given high-profile stories on data loss
Data protection laws (with bans on trans-border data flows)
“Anonymizing” (and other statistical disclosure control methods) costly and potentially ineffective
Key Challenges
What data formats are easy to create/maintain and privacy-preserving?
Can we adapt our “tried and true” models to accommodate these data limitations but still work well?
How much do we lose in the process?
Repeated Cross-Sectional Summary Data
Proof of Concept: Tuscan Lifestyles
Tuscan Lifestyles Data
How would we proceed if we had disaggregate data?
“Buy Till You Die” Model
Transaction Process (“Buy”)
While “alive”, a customer purchases randomly around his mean transaction rate
Transaction rates vary across customers
Dropout Process (“Till You Die”)
Each customer has an unobserved “lifetime”
Dropout rates vary across customers
“Shop Till You Drop”
The Pareto/NBD Model (Schmittlein, Morrison, and Colombo 1987)
Transaction Process:
While active, the number of transactions made by a customer follows a Poisson process with transaction rate λ
Transaction rates are distributed gamma(r, α) across the population
Dropout Process:
Each customer has an unobserved lifetime of length τ, which is distributed exponential with dropout rate μ
Dropout rates are distributed gamma(s, β) across the population
Astonishingly good fit and predictive performance
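The two processes above can be sketched as a small simulation of one customer. This is a minimal illustration using only the Python standard library, not the authors' code. It assumes that gamma(r, α) and gamma(s, β) are shape/rate parameterizations (so the scale passed to `gammavariate` is 1/α or 1/β); the function name `simulate_customer` and the `horizon` argument are hypothetical.

```python
import random

def simulate_customer(r, alpha, s, beta, horizon, rng):
    """Return sorted purchase times within [0, horizon] for one simulated customer."""
    # Assumption: gamma(shape, rate); random.gammavariate takes (shape, scale).
    lam = rng.gammavariate(r, 1.0 / alpha)   # individual transaction rate (lambda)
    mu = rng.gammavariate(s, 1.0 / beta)     # individual dropout rate (mu)
    tau = rng.expovariate(mu)                # unobserved lifetime (tau)
    end = min(tau, horizon)                  # purchases stop at dropout or at the horizon

    # While active, purchases form a Poisson process:
    # inter-purchase times are exponential with rate lam.
    times = []
    t = rng.expovariate(lam)
    while t <= end:
        times.append(t)
        t += rng.expovariate(lam)
    return times

rng = random.Random(42)
cohort = [simulate_customer(0.5, 5, 0.5, 5, horizon=104, rng=rng) for _ in range(1000)]
print(sum(1 for times in cohort if not times), "of", len(cohort), "customers never buy")
```

Each customer draws their own λ and μ once, which is what produces heterogeneity in both buying and dropout across the cohort.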
The Pareto/NBD works very well…
…given individual-level (disaggregate) data.
Pareto/NBD using RCSS data
Same assumptions as for the usual Pareto/NBD implementation
Calculate purchase probabilities over discrete intervals: P(X(t, t+1) = x), P(X(t+1, t+2) = x), P(X(t+2, t+3) = x), etc.
Apply to RCSS histograms and use standard maximum likelihood estimation
Parameter estimation is fast, stable, and robust
All of the usual Pareto/NBD diagnostics (e.g., “P(Alive)”) can be obtained from the parameter estimates
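To make the estimation idea concrete, here is a rough sketch of the interval probabilities and of a log-likelihood over histograms. It is not the authors' implementation: the interval probabilities are approximated by Monte Carlo instead of derived in closed form, the multinomial form of the likelihood is an assumption, and all function names are hypothetical. It uses the fact that, given λ and τ, the number of purchases in (a, b] is Poisson with mean λ · max(0, min(b, τ) − a).

```python
import math
import random

def interval_probs(r, alpha, s, beta, a, b, max_x, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(X(a, b) = x) for x = 0..max_x-1, plus P(X(a, b) >= max_x)."""
    rng = random.Random(seed)
    probs = [0.0] * (max_x + 1)
    for _ in range(n_draws):
        lam = rng.gammavariate(r, 1.0 / alpha)   # assumes gamma(shape, rate)
        mu = rng.gammavariate(s, 1.0 / beta)
        tau = rng.expovariate(mu)
        exposure = max(0.0, min(b, tau) - a)     # time spent alive inside (a, b]
        m = lam * exposure                       # Poisson mean given lam and tau
        pmf = math.exp(-m)                       # P(X = 0 | lam, tau)
        tail = 1.0
        for x in range(max_x):
            probs[x] += pmf
            tail -= pmf
            pmf *= m / (x + 1)                   # step from P(X = x) to P(X = x + 1)
        probs[max_x] += max(tail, 0.0)           # right-censored "max_x or more" bucket
    return [p / n_draws for p in probs]

def rcss_log_likelihood(histograms, windows, params, max_x):
    """Sum over windows of sum_x n_x * log P(X(a, b) = x) (assumed multinomial form)."""
    r, alpha, s, beta = params
    ll = 0.0
    for counts, (a, b) in zip(histograms, windows):
        probs = interval_probs(r, alpha, s, beta, a, b, max_x)
        ll += sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)
    return ll

# Hypothetical histogram: customers making 0, 1, 2, and 3+ purchases in weeks 0-13.
print(rcss_log_likelihood([[1900, 400, 120, 80]], [(0, 13)], (0.5, 5, 0.5, 5), max_x=3))
```

A maximum likelihood fit would pass `rcss_log_likelihood` to a numerical optimizer over (r, α, s, β). With closed-form probabilities in place of the simulation, the objective becomes smooth and cheap to evaluate.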
Model Fit
Do We Need All Five Years of Data?
Calibrate the model on years 1-3 only, predict for years 4 and 5.
Customer-Base Analysis Using Repeated Cross-Sectional Summary (RCSS) Data
Under more general conditions, what is the “information loss” from aggregating data?
Under what conditions can a model built using aggregated data accurately mimic its individual-level counterpart?
How much aggregated data is required to do this job well?
Reminder – RCSS Data
Research Design
Manipulate the four parameters of the Pareto/NBD:
r, s = 0.5, 1.0, 1.5
α, β = 5, 10, 15
We have 3^4 = 81 “worlds”
For each “world,” simulate 104 weeks of data for five synthetic panels of 2500 customers (first 78 weeks for calibration, last 26 weeks for holdout)
Fit the Pareto/NBD model to the raw transaction data – obtain disaggregate LL and parameters
“Backward-looking” (“Chopping it up”) analysis
“Forward-looking” (“Build as you go”) analysis
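The design grid can be enumerated in a few lines. This sketch only reproduces the counts stated on the slide; the variable names are illustrative.

```python
from itertools import product

rs_levels = (0.5, 1.0, 1.5)   # levels used for r and for s
ab_levels = (5, 10, 15)       # levels used for alpha and for beta

# One "world" per (r, alpha, s, beta) combination.
worlds = [
    {"r": r, "alpha": a, "s": s, "beta": b}
    for r, a, s, b in product(rs_levels, ab_levels, rs_levels, ab_levels)
]

CALIBRATION_WEEKS, HOLDOUT_WEEKS = 78, 26
N_PANELS, PANEL_SIZE = 5, 2500

print(len(worlds), "worlds")
print(CALIBRATION_WEEKS + HOLDOUT_WEEKS, "weeks simulated per panel")
print(len(worlds) * N_PANELS, "synthetic panels in total")
```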
“Backward-Looking” Analysis
How many cross-sectional summaries should be created? (How to “chop it up”?)
One 78-week histogram? Two 39-week histograms? Three 26-week histograms? … Six 13-week histograms?
For each of the six aggregation conditions, fit the Pareto/NBD to the resulting RCSS data, and:
1. Compare RCSS parameter estimates to the disaggregate benchmarks
2. Evaluate the disaggregate LL functions using the RCSS parameter estimates and compare to the disaggregate benchmark LL
3. Evaluate the fit of the predicted histograms from RCSS and disaggregate parameter estimates to the actual holdout histograms
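“Chopping up” a calibration period into equal-length cross-sectional histograms might look like the sketch below. The input format (one list of purchase times per customer), the right-censored top bucket, and the name `rcss_histograms` are assumptions made for illustration.

```python
def rcss_histograms(purchase_times, total_weeks, n_hist, max_x):
    """Split (0, total_weeks] into n_hist equal windows and build one histogram per window.

    purchase_times: list of lists, one list of purchase times (in weeks) per customer.
    Each histogram counts customers with 0, 1, ..., max_x-1 purchases in the window,
    plus a final bucket for "max_x or more".
    """
    width = total_weeks / n_hist
    histograms = []
    for k in range(n_hist):
        start, end = k * width, (k + 1) * width
        hist = [0] * (max_x + 1)
        for times in purchase_times:
            n = sum(1 for t in times if start < t <= end)
            hist[min(n, max_x)] += 1      # cap at the right-censored bucket
        histograms.append(hist)
    return histograms

# Three hypothetical customers over a 78-week calibration period.
panel = [[1.0, 2.0, 40.0], [], [50.0, 60.0, 61.0, 62.0]]
print(rcss_histograms(panel, total_weeks=78, n_hist=2, max_x=3))
```

Note that the histograms carry no customer identifiers, so nothing links a customer's bucket in one window to their bucket in the next.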
Scenario 1: r = 0.5, α = 5, s = 0.5, β = 5
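| # Hist. | Avg. LL | Dev. | RMSE | r | α | s | β |
|---|---|---|---|---|---|---|---|
| 1 | -23452.4 | 3.1% | 37.1 | 0.37 | 3.83 | 1.65 | 40.01 |
| 2 | -22813.3 | 0.3% | 5.4 | 0.40 | 4.63 | 0.65 | 11.65 |
| 3 | -22759.0 | 0.0% | 5.0 | 0.41 | 4.48 | 0.57 | 7.89 |
| 4 | -22759.4 | 0.0% | 4.9 | 0.41 | 4.56 | 0.58 | 8.54 |
| 5 | -22767.8 | 0.1% | 5.0 | 0.41 | 4.58 | 0.56 | 8.18 |
| 6 | -22754.9 | 0.0% | 5.0 | 0.46 | 4.79 | 0.50 | 5.32 |
| Disagg. | -22748.1 | | 5.7 | 0.44 | 4.85 | 0.56 | 7.35 |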
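The Dev. column above appears consistent with the percentage deviation of each average LL from the disaggregate benchmark LL. The slides do not define it, so treat the formula in this sketch as an inference from the numbers, and the function name as hypothetical.

```python
def ll_deviation_pct(ll_rcss, ll_disagg):
    """Percentage deviation of an RCSS-based LL from the disaggregate benchmark LL."""
    return 100.0 * (ll_rcss - ll_disagg) / ll_disagg

# Average LL values from the Scenario 1 table, keyed by number of histograms.
avg_ll = {1: -23452.4, 2: -22813.3, 3: -22759.0, 4: -22759.4, 5: -22767.8, 6: -22754.9}
benchmark_ll = -22748.1

for n_hist, ll in avg_ll.items():
    print(n_hist, f"{ll_deviation_pct(ll, benchmark_ll):.1f}%")
```

Rounded to one decimal place, this reproduces the Dev. values in the table for all six conditions.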
“Forward-Looking” Analysis
How many quarterly (13-week) histograms are required? (How many to “build as you go”?)
One (total 13 weeks)? Two (total 26 weeks)? Three (total 39 weeks)? … Six (total 78 weeks)?
For each of the six “number of histogram” conditions, fit the Pareto/NBD to the resulting RCSS data, and:
1. Compare RCSS parameter estimates to the disaggregate benchmarks
2. Evaluate the disaggregate LL functions on the full data using the RCSS parameter estimates and compare to the disaggregate benchmark LL
3. Evaluate the fit of the predicted histograms from RCSS and disaggregate parameter estimates to the actual holdout histograms
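The “build as you go” conditions and the holdout comparison in step 3 can be sketched as follows. The slides do not say exactly how RMSE is computed, so the bucket-wise definition here is an assumption, as are both function names.

```python
import math

def first_k_quarters(quarterly_histograms, k, weeks_per_quarter=13):
    """Return the first k quarterly histograms and their (start, end) windows in weeks."""
    windows = [(q * weeks_per_quarter, (q + 1) * weeks_per_quarter) for q in range(k)]
    return quarterly_histograms[:k], windows

def histogram_rmse(predicted, actual):
    """Root mean squared error between predicted and actual bucket counts (assumed definition)."""
    if len(predicted) != len(actual):
        raise ValueError("histograms must have the same number of buckets")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical holdout histogram (0, 1, 2, 3+ purchases) and a model prediction for it.
actual = [2100, 250, 90, 60]
predicted = [2092.5, 256.0, 93.5, 58.0]
print(round(histogram_rmse(predicted, actual), 2))
```

Fitting the model once per value of k, each time on the output of `first_k_quarters`, gives the six conditions compared on this slide.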
Scenario 1: r = 0.5, α = 5, s = 0.5, β = 5
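| # Qtrs. | Avg. LL | Dev. | RMSE | r | α | s | β |
|---|---|---|---|---|---|---|---|
| 1 | -23411.8 | 2.9% | 167.1 | 0.45 | 4.20 | 3.28 | 40.01 |
| 2 | -22761.9 | 0.1% | 19.5 | 0.49 | 4.91 | 0.37 | 2.67 |
| 3 | -22756.5 | 0.0% | 17.0 | 0.49 | 4.88 | 0.44 | 3.49 |
| 4 | -22749.8 | 0.0% | 7.2 | 0.45 | 4.75 | 0.49 | 5.22 |
| 5 | -22750.2 | 0.0% | 4.8 | 0.46 | 4.83 | 0.49 | 4.93 |
| 6 | -22750.1 | 0.0% | 5.0 | 0.46 | 4.79 | 0.50 | 5.32 |
| Disagg. | -22748.1 | | 5.7 | 0.44 | 4.85 | 0.56 | 7.37 |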
Summary of Results
Using three or more quarters always provides the same performance as disaggregate data in terms of:
Parameter recovery
In-sample LL
Out-of-sample predictions
Conclusions
We can estimate the Pareto/NBD using RCSS data; the findings from the Tuscan Lifestyles study are generalizable
Useful/interesting model diagnostics still emerge – even in the absence of any individual-level data
Three cross-sections are generally sufficient
Other Desirable Properties
Just the percentage of total customers in each bucket is sufficient – don’t even need actual numbers
Data can be “aperiodic” (they just have to be “repeated”)
Histograms can be of different time lengths, e.g., 3-month + 6-month + 4-month
Histograms can be missing, e.g., Qtr. 1, –, Qtr. 3, Qtr. 4
Data management/storage benefits
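One way to see why percentages suffice and why the histograms need not be periodic: assume the model is fit by maximizing a multinomial log-likelihood over the histograms (the slides only say “standard MLE”, so this form is an assumption), and that every histogram describes the same cohort of N customers. With θ = (r, α, s, β), n_{t,x} the number of customers in bucket x of histogram t, and p_{t,x} the corresponding proportion:

```latex
\begin{align*}
LL(\theta)
  &= \sum_{t \in \mathcal{T}} \sum_{x} n_{t,x}\,
     \log P_\theta\bigl(X(a_t, b_t) = x\bigr),
     \qquad n_{t,x} = N\, p_{t,x} \\
  &= N \sum_{t \in \mathcal{T}} \sum_{x} p_{t,x}\,
     \log P_\theta\bigl(X(a_t, b_t) = x\bigr)
\end{align*}
```

Because N is a positive constant, the proportions give the same maximizing θ as the counts. The windows (a_t, b_t] enter only through their own probabilities, so they can have different lengths, and a missing histogram simply drops out of the index set 𝒯.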
What Would Managers (and Customers) Rather Use?
or