Modeling of the Individual Factors / Modeling of the Collective Factors
Experimental Results / References
Collective Response Spike Prediction
for Mutually Interacting Consumers
Rikiya Takahashi¹, Hideyuki Mizuta¹, Naoki Abe²,
Ruby L. Kennedy³, Vincent J. Jeffs³, Ravi Shah³,
Robert H. Crites³
¹IBM Research - Tokyo
²IBM Thomas J. Watson Research Center
³IBM Software Group, Enterprise Marketing Management
December 8, 2013
ICDM 2013: IEEE International Conference on Data Mining Collective Response Spike Prediction
Response Spike Forecasting in e-Commerce
Goal: predict the probability of consumer i's response spike in the time interval [t, t+∆t) using two types of factors:
Individual factor: consumer i's own experiences before time t.
Collective factor: many consumers' experiences before time t.
Omni-Channel Events and Essential Questions
Examples of past events that could affect future responses:
Response: purchase, website browsing, inquiry to a call center
Stimulus: e-mail, direct mail, exposure to a TV ad
Interaction: observable word-of-mouth on online review sites; unobservable offline word-of-mouth; physiological synchronization among humans
Two essential questions must be answered:
1. How should the time dependency among these events be modeled?
2. How should the unobservable interactions be handled?
Agenda
1. Introduction of the Goal and Issues
2. Modeling of the Individual Factors
   Basics in Continuous-Time Event Prediction
   Hyperbolic Discounting in Human Memory
   Efficient Learning with Piecewise-Constant States
3. Modeling of the Collective Factors
4. Experimental Results
5. Conclusion
Regression for Inhomogeneous Poisson Processes
Regressing Poisson Process Rates with a State Vector
Inhomogeneous Poisson process: a point process whose probability of event occurrence per unit time varies over time.
$Y_{ik}(t, t+\Delta t)$: random variable representing the number of type-$k$ response events of consumer $i$ in the time interval $[t, t+\Delta t)$.
Model the time-varying log-intensity
$$z_{ik}(t) = \lim_{\Delta t \to 0} \log \frac{P\left(Y_{ik}(t, t+\Delta t) \ge 1\right)}{\Delta t}$$
as a function of some state vector $x_i(t) \in \mathbb{R}^d$.
$x_i(t)$ must be constructed from the events before time $t$ across multiple response and stimulus channels.
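A quick numerical check of the limit definition of the log-intensity. This sketch assumes the intensity is locally constant at $\exp(z)$ over the short interval, so that $P(Y \ge 1) = 1 - e^{-e^{z}\Delta t}$; the function name `log_rate_estimate` and the value $z = 0.5$ are hypothetical. As $\Delta t$ shrinks, $\log(P/\Delta t)$ approaches $z$ from below.

```python
import math

def log_rate_estimate(z, dt):
    """log(P(Y >= 1) / dt) for a Poisson process with constant intensity exp(z).

    P(at least one event in [t, t + dt)) = 1 - exp(-exp(z) * dt); math.expm1
    keeps this accurate when the probability is tiny.
    """
    p_at_least_one = -math.expm1(-math.exp(z) * dt)
    return math.log(p_at_least_one / dt)

z = 0.5  # hypothetical log-intensity
for dt in (1e-1, 1e-3, 1e-6):
    print(dt, log_rate_estimate(z, dt))  # approaches z = 0.5 as dt -> 0
```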
Desirable Property of State: Piecewise Constancy
How should the state $x_i(t)$ be modeled? Exploit piecewise-constant features (Rajaram et al., 2005; Gunawardana et al., 2011).
They yield analytically tractable Poisson log-likelihood terms. Example: "number of type-k events in the past L days/weeks".
They are efficiently computed from the terminators (end points) of sliding windows.
Figure: Computing the state time series with multiple sliding windows. Every element of the state vector is finally time-aligned.
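The sliding-window construction can be sketched as follows. This is an illustrative reading, not the authors' code: each event enters a window of length L at its own time and leaves it L time units later, so the sorted set of these terminators splits the timeline into intervals on which every window count, and hence the whole state vector, is constant and time-aligned. The function name `sliding_window_states` and the convention of counting events in the open interval (t - L, t) are assumptions.

```python
from collections import defaultdict

def sliding_window_states(event_times, window_lengths, t_start, t_end):
    """Piecewise-constant counts "# of events in the past L" for several L.

    Returns a list of (interval_start, interval_end, counts) where counts[k] is
    the number of events in the past window_lengths[k] anywhere inside the interval.
    """
    n_windows = len(window_lengths)
    # delta[time][k]: change of the k-th count at a window terminator.
    delta = defaultdict(lambda: [0] * n_windows)
    for t_e in event_times:
        for k, length in enumerate(window_lengths):
            delta[t_e][k] += 1           # event enters window k
            delta[t_e + length][k] -= 1  # event leaves window k
    breakpoints = sorted(set(delta) | {t_start, t_end})
    counts = [0] * n_windows
    segments = []
    for a, b in zip(breakpoints, breakpoints[1:]):
        for k in range(n_windows):
            counts[k] += delta[a][k]
        if a >= t_start and b <= t_end:
            segments.append((a, b, tuple(counts)))
    return segments

for seg in sliding_window_states([1.0, 3.0], [2.0, 4.0], 0.0, 8.0):
    print(seg)
# (0.0, 1.0, (0, 0))
# (1.0, 3.0, (1, 1))
# (3.0, 5.0, (1, 2))
# (5.0, 7.0, (0, 1))
# (7.0, 8.0, (0, 0))
```

Because all windows share one sorted list of breakpoints, each interval carries a complete, time-aligned state vector, which is what the variable-interval likelihood later consumes.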
Desirable Property: Hyperbolic Discounting
The state should be psychologically interpretable for marketers. Human memory shows power-law decay (hyperbolic discounting) (Rubin and Wenzel, 1996; Wixted and Ebbesen, 1997).
Fast initial decay combined with long-term persistence.
[Figure: magnitude (0 to 1) vs. elapsed time after an event, comparing hyperbolic and exponential discounting.]
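Formed as an infinite mixture of exponential discounting:
$$\frac{1}{(1 + t/\tau)^{\alpha}} \equiv \int_0^{\infty} e^{-\lambda t}\, \mathrm{Ga}(\lambda; \alpha, \tau)\, d\lambda, \quad \text{where } \mathrm{Ga}(\lambda; \alpha, \tau) \triangleq \frac{\tau^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} \exp(-\tau\lambda).$$
Finite-mixture approximation with sampling:
$$\frac{1}{(1 + t/\tau)^{\alpha}} \simeq \sum_{i=1}^{K} w_i e^{-\lambda_i t}, \quad \text{where } w_i = 1/K,\ \lambda_i \sim \mathrm{Ga}(\lambda; \alpha, \tau).$$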
Nonparametric Staircase Modeling
An effective compromise between piecewise constancy and hyperbolic discounting: staircase-function approximation.
[Figure: staircase function over time.]
A finite mixture of step functions instead of a mixture of exponentials:
$$\sum_{i=1}^{K} w_i \, I(t_i < t < t_i + L_i)$$
Nonparametric curve fitting: include many combinations of event types and window lengths in the state $x_i(t)$.
Nonparametric Staircase Modeling (cont.)
Figure: An example of forecasting future response rates.
How can event-specific curves be fitted efficiently from real data?
Fix every sliding-window length a priori, and optimize only the height of each step function.
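A toy illustration of fixing the window lengths a priori and optimizing only the step heights. As an assumption for this sketch, the heights are fitted by least squares to a known hyperbolic curve rather than to event data, and the windows are nested, $I(0 \le s < L_k)$ in the lag $s$ since an event. Because the design matrix is fixed once the lengths are fixed, the problem in the heights is linear. The window list and the helper `fit_step_heights` are hypothetical.

```python
import numpy as np

def fit_step_heights(lags, target, window_lengths):
    """Least-squares heights w_k of the staircase sum_k w_k * I(0 <= lag < L_k).

    Window lengths are fixed in advance; only the heights are optimized.
    Returns the heights and the fitted staircase evaluated at `lags`.
    """
    lengths = np.asarray(window_lengths, dtype=float)
    design = (lags[:, None] < lengths[None, :]).astype(float)  # one column per window
    heights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return heights, design @ heights

windows = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]   # hypothetical window lengths (weeks)
lags = np.linspace(0.0, 32.0, 3201)[:-1]     # evaluation grid over [0, 32)
target = (1.0 + lags) ** -1.5                # hyperbolic curve, alpha = 1.5, tau = 1
heights, staircase = fit_step_heights(lags, target, windows)
print(np.round(heights, 4))
```

On each segment between consecutive window lengths, the fitted staircase equals the average of the target there, so doubling window lengths give fine resolution at short lags and coarse resolution in the long tail, matching the shape of hyperbolic decay.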
Convex Variable-Interval Poisson Regression
A linear model with a mapping $\Phi: \mathbb{R}^d \to \mathbb{R}^{d_\Phi}$ that represents diminishing returns (e.g., an element-wise sub-linear function):
$$P\left(Y_{ik}(t, t+\Delta t) \ge 1\right) = \Delta t \exp\left(b_k + w_k^{\top} \Phi(x_i(t))\right)$$
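Maximum a posteriori estimation with an L1 penalty:
$$\max_{b_k, w_k} \left[ -n C_0 \|w_k\|_1 + \sum_{i=1}^{n} \sum_{j=1}^{T_i} \ell\left(y_{ijk};\, b_k + w_k^{\top} \Phi(x_{ij}),\, \tau_{ij}\right) \right]$$
$\ell(y; z, \tau) \triangleq yz - \tau \exp(z)$: continuous-time log-likelihood
$n$: number of consumers
$T_i$: number of intervals of consumer $i$
$\tau_{ij}$: length of the $j$-th interval of consumer $i$
$x_{ij}$, $y_{ijk}$: state of consumer $i$ and number of type-$k$ responses in the $j$-th interval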
Update with either coordinate-wise batch learning or online learning algorithms (e.g., FOBOS (Duchi and Singer, 2009)).
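A minimal sketch of the L1-penalized variable-interval Poisson regression, fitted with batch forward-backward splitting: a gradient step on the log-likelihood followed by soft-thresholding. Assumptions not stated on the slide: the features are already transformed by $\Phi$, the objective is averaged over intervals so the hypothetical weight `lam` stands in for $nC_0$ only up to scaling, the intercept is unpenalized, and the step size is fixed. It is not the authors' implementation.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_1: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def objective(b, w, Phi, y, tau, lam):
    """Mean of y*z - tau*exp(z) over intervals, minus lam * ||w||_1."""
    z = b + Phi @ w
    return np.mean(y * z - tau * np.exp(z)) - lam * np.abs(w).sum()

def fit_l1_poisson(Phi, y, tau, lam, step=0.3, n_iter=3000):
    """Batch forward-backward splitting (the intercept b is not penalized).

    Phi: (N, d) features already mapped by Phi(.), one row per constant-state interval
    y:   (N,) number of events observed in each interval
    tau: (N,) length of each interval
    """
    n_rows, d = Phi.shape
    b, w = 0.0, np.zeros(d)
    for _ in range(n_iter):
        resid = y - tau * np.exp(b + Phi @ w)  # d(log-likelihood)/dz per interval
        b = b + step * resid.mean()            # forward (gradient) step on b
        w = soft_threshold(w + step * (Phi.T @ resid) / n_rows, step * lam)
    return b, w

# Synthetic check with hypothetical parameters: the middle feature has no effect.
rng = np.random.default_rng(0)
N = 20_000
Phi = rng.uniform(0.0, 1.0, size=(N, 3))
tau = rng.uniform(0.5, 2.0, size=N)
true_b, true_w = -1.0, np.array([0.8, 0.0, -0.5])
y = rng.poisson(tau * np.exp(true_b + Phi @ true_w))
lam = 0.002
b_hat, w_hat = fit_l1_poisson(Phi, y, tau, lam)
print(round(b_hat, 3), np.round(w_hat, 3))
```

Replacing the full-data gradient with a per-interval gradient would turn this into the online variant the slide mentions; the soft-thresholding step stays the same.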
Examples of the Fitted Curves
Data: two years (2009-2011) of individual-level daily records of marketing-action and response events at an online retailer.
[Figure panels, each plotted against time in weeks: purchase→purchase, catalog→purchase, e-mail→purchase, e-mail→browsing, browsing→purchase, browsing→browsing.]
Agenda
1. Introduction of the Goal and Issues
2. Modeling of the Individual Factors
3. Modeling of the Collective Factors
   Dependence among Consumer Responses
   Frequency Aggregation with Residual Clustering
   Multi-Task Learning of Cluster-Specific Models
4. Experimental Results
5. Conclusion
Evidence of Interactions among Consumers
The hypothesis of independence among consumers is rejected.
Left: the actual totals frequently fall outside the confidence intervals. If each individual's response frequency is Poisson-distributed, the sum of the response frequencies of independent consumers must also obey a Poisson distribution.
Right: significant autocorrelation (predictability) in the sum of the regression residuals.
[Figure, left: daily response frequency vs. date. Right: autocorrelation (0 to 1) of the summed residuals vs. lag in days (0 to 25).]
Figure: High dispersion of the sum over individuals. Black: actual. Blue: predicted mean. Green: predicted 0.5th and 99.5th percentiles.
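The independence argument can be illustrated with a variance-to-mean (dispersion) check. In this toy simulation, with hypothetical rates and sizes, independent Poisson consumers give a daily total whose variance is close to its mean, while a shared daily shock that scales everyone's intensity (one simple stand-in for unobserved interaction, assumed here for illustration) inflates the variance far beyond the Poisson level.

```python
import numpy as np

def dispersion_index(totals):
    """Variance-to-mean ratio of daily totals: close to 1 for a Poisson variable."""
    return totals.var(ddof=1) / totals.mean()

rng = np.random.default_rng(1)
n_consumers, n_days = 2000, 500
rates = rng.uniform(0.001, 0.02, size=n_consumers)  # hypothetical daily intensities

# Independent consumers: the daily total is a sum of independent Poissons, hence Poisson.
independent = rng.poisson(rates, size=(n_days, n_consumers)).sum(axis=1)

# Toy interaction: a shared daily shock (mean 1, variance 0.25) scales every intensity.
shock = rng.gamma(shape=4.0, scale=0.25, size=n_days)
coupled = rng.poisson(shock[:, None] * rates[None, :]).sum(axis=1)

print(dispersion_index(independent), dispersion_index(coupled))
```

With a mean daily total of about 21, the shared shock adds roughly 0.25 × 21² to the variance, so the coupled ratio lands near 6 while the independent one stays near 1.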
Approaches to Incorporate Interactions
Graphical Granger modeling (e.g., Lozano et al., 2009) to connect individuals: unscalable, with a cost quadratic in the number of consumers.
Our approach: fit a graph that connects only clusters, with a cost quadratic only in the number of clusters.
In our data, fitting an individual-to-individual graph did not improve accuracy, even with L1 sparsification.
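A back-of-the-envelope count makes the scalability gap concrete. This sketch counts directed (source, target) pairs per response type, including self-pairs, and ignores how many window features each pair carries; the 30K figure is the approximate per-fold customer count from the experimental setting, and the cluster counts mirror the m values evaluated later.

```python
def n_interaction_pairs(n_nodes):
    """Directed (source, target) pairs, including self-pairs, needing interaction weights."""
    return n_nodes * n_nodes

for label, n_nodes in [("individuals (30K)", 30_000), ("clusters (m=16)", 16), ("clusters (m=64)", 64)]:
    print(f"{label}: {n_interaction_pairs(n_nodes):,} pairs per response type")
```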
The 3-Step Estimator with Residual Clustering
Aggregating frequencies over the population stabilizes the fitting: detect each group whose members follow a common trend.
Clustering is used to exclude the autocorrelation within the same consumer.
A) Initial fitting: regression using only the individual factors.
B) Residual clustering: clustering of the residual time series from step A.
C) Final fitting: fit interaction coefficients among clusters.
Residual Clustering
Cluster the vectors $\{r_{ik} \triangleq (r_{i1k}, \ldots, r_{iHk})^{\top}\}_{i=1}^{n}$, where
$$r_{ijk} \triangleq y_{ijk} - \int_{\tilde t_{j-1}}^{\tilde t_j} \exp\left(b_k + w_k^{\top} \Phi(x_i(t))\right) dt.$$
$y_{ijk}$: weekly-smoothed actual response frequency. We recommend m-medians to suppress outlying peaks.
Persistent biases are confirmed in the residual time series of each cluster, implying autocorrelation within the same cluster.
[Figure panels: actual, predicted, and residual frequencies vs. date.]
Figure: Frequency time series for 4 example clusters. The aggregated residuals of each cluster have non-zero means.
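A minimal m-medians sketch for the residual vectors: L1 distances with coordinate-wise median centers, which damp isolated spikes better than means would. Assumptions: the integrated intensity per period is already available as the array `expected`, initialization is farthest-first (the slide does not specify one), and the synthetic data and helper names are hypothetical.

```python
import numpy as np

def residual_vectors(observed, expected):
    """r_ijk: observed frequency minus the model's integrated intensity, per period.

    observed, expected: arrays of shape (n_consumers, H) for one response type k.
    """
    return observed - expected

def _assign(R, centers):
    """Index of the nearest center under L1 distance, for every row of R."""
    dist = np.abs(R[:, None, :] - centers[None, :, :]).sum(axis=2)  # (n, m)
    return dist.argmin(axis=1)

def m_medians(R, m, n_iter=50):
    """Cluster the rows of R with L1 distance and coordinate-wise median centers."""
    # Farthest-first initialization: deterministic and spreads the starting centers.
    centers = [R[0]]
    for _ in range(1, m):
        dist = np.min([np.abs(R - c).sum(axis=1) for c in centers], axis=0)
        centers.append(R[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = _assign(R, centers)
        new_centers = centers.copy()
        for c in range(m):
            members = R[labels == c]
            if len(members) > 0:
                new_centers[c] = np.median(members, axis=0)  # L1-optimal center
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(R, centers), centers

rng = np.random.default_rng(0)
n, H = 40, 20
expected = np.full((n, H), 0.5)                              # hypothetical predictions
bias = np.where(np.arange(n) < n // 2, 0.4, -0.3)[:, None]   # two persistently biased groups
observed = expected + bias + rng.normal(0.0, 0.1, size=(n, H))
labels, centers = m_medians(residual_vectors(observed, expected), m=2)
print(labels)
```

Each recovered center then plays the role of a cluster-level trend: a persistent non-zero residual that the individual factors alone did not explain.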
Multi-Task Regression for Clusters
Reformulate the log-intensity as
$$z_{ik}(t) = b_{c[i]k} + w_{c[i]k}^{\top} \Phi(x_i(t)) + \sum_{c'=1}^{m} \theta_{c[i]c'k}^{\top} \Psi(z_{c'}(t)),$$
$c[i]$: the cluster of consumer $i$; $m$: number of clusters
$z_{c'}(t)$: sum of the frequencies within cluster $c'$ over sliding windows
$\Psi$: another mapping function
$\theta_{c[i]c'k}$: interaction strength from cluster $c'$ to cluster $c[i]$
Using the initial estimate $W$ from step A, we maximize
$$\mathcal{L}(D_k \mid \Theta^*_k) - \frac{n}{m} \sum_{c=1}^{m} \left( C_1 \|w_{ck} - w_k\|_1 + C_2 \sum_{c'=1}^{m} \|\theta_{cc'k}\|_1 \right),$$
which is the sum of the data log-likelihood $\mathcal{L}(D_k \mid \Theta^*_k)$ (details omitted) and a convex multi-task learning penalty.
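The penalty and its proximal step can be sketched as follows. The slide only states that the penalty is convex; pairing it with a proximal (forward-backward) update is an assumption here, as are the array layouts and helper names. The $C_1$ term pulls each cluster's weights toward the shared step-A estimate $w_k$: coordinates close enough to $w_k$ snap onto it exactly, so clusters deviate only where the data justify it.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def shrink_toward(w_c, w_shared, thresh):
    """Proximal step for thresh * ||w_c - w_shared||_1: pull w_c toward the shared w_k."""
    return w_shared + soft_threshold(w_c - w_shared, thresh)

def multitask_penalty(W_clusters, w_shared, Theta, C1, C2, n):
    """(n/m) * sum_c (C1 * ||w_ck - w_k||_1 + C2 * sum_c' ||theta_cc'k||_1).

    W_clusters: (m, d) cluster-specific weights; w_shared: (d,) step-A estimate;
    Theta: (m, m, p) interaction weights, Theta[c, c'] from cluster c' to cluster c.
    """
    m = W_clusters.shape[0]
    deviation = np.abs(W_clusters - w_shared[None, :]).sum(axis=1)  # (m,)
    interaction = np.abs(Theta).sum(axis=(1, 2))                    # (m,)
    return n / m * np.sum(C1 * deviation + C2 * interaction)

print(shrink_toward(np.array([1.0, 0.2, 0.55]), np.array([0.5, 0.5, 0.5]), 0.1))
Theta = np.zeros((2, 2, 1))
Theta[0, 1, 0] = 0.5
print(multitask_penalty(np.array([[1.0, 0.0], [0.0, 0.0]]), np.zeros(2), Theta, C1=1.0, C2=2.0, n=4))
```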
Experimental Setting
Evaluate the predictive accuracy using two years (2009-2011) of individual-level daily events from an online retailer.
3 folds: each fold contains about 2M events (browsing: 900K; purchase: 60K) by 30K customers.
Response: 1 type of purchase and 9 types of browsing
Stimulus: 5 types of omni-channel direct marketing
8 types of window lengths: 1 day, 2 days, 4 days, 1 week, 2 weeks, 4 weeks, . . ., and 32 weeks
Each dataset is split into a training dataset and a test dataset, using the middle date as the dividing point.
Performance Metric
In addition to the log-likelihood, we evaluate the Continuous-Time Area Under the Curve (CTAUC), based on the Continuous-Time Receiver Operating Characteristic (CTROC) curve, which represents what fraction of the actual response events is covered by the high-score periods.
Figure: Principle of computing the CTAUC.
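The computation is not spelled out beyond the figure, so the following is a sketch under assumptions: predicted intensities are piecewise constant over periods, periods are ranked by predicted intensity (highest first), and the curve plots the fraction of total time covered (x) against the fraction of actual responses covered (y). The `segments` format and the function name are hypothetical. An uninformative score gives 0.5; concentrating all responses in the top-ranked periods pushes the area toward 1.

```python
def ctauc(segments):
    """Area under a continuous-time ROC curve.

    segments: iterable of (duration, score, n_events) for piecewise-constant periods.
    Events are assumed to be spread uniformly within a period, so each piece of the
    curve is a straight line and the trapezoid rule is exact. Ties are not merged.
    """
    ranked = sorted(segments, key=lambda s: s[1], reverse=True)
    total_time = sum(s[0] for s in ranked)
    total_events = sum(s[2] for s in ranked)
    area, covered = 0.0, 0.0  # covered: fraction of events covered so far
    for duration, _score, n_events in ranked:
        dx = duration / total_time
        dy = n_events / total_events
        area += dx * (covered + dy / 2.0)  # trapezoid: dx * (y_start + y_end) / 2
        covered += dy
    return area

print(ctauc([(1.0, 5.0, 4), (3.0, 1.0, 0)]))  # all events in the top quarter of time
print(ctauc([(1.0, 5.0, 1), (3.0, 1.0, 3)]))  # events proportional to time: chance level
```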
Predictability with Rich Individual Factors
Results when using only the individual factors:
Baseline: models containing only past purchase events.
Adding past browsing events and stimuli improves accuracy.
Over 80% of responses are covered by the top 20% highest-intensity periods.
[Figure panels: (1) CTAUC vs. dataset index (1-3); (2) average test-set log-likelihood vs. response-type index (1-3); (3) coverage of actual responses vs. coverage of high-intensity periods. Each panel compares All (Proposed), PastPurchase+Action, and OnlyPastPurchase.]
Figure: Performance comparison for different sets of covariates.
Implications from the Fitted Curves
Marketing implications from autocorrelation: browsing has a strong impact on the next purchase.
Campaigns therefore cause a sequence (campaign→browsing→purchase), so long-term forecasting requires simulation
to marginalize over both the direct impacts and the chains of responses.
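A toy Monte Carlo sketch of why simulation is needed: a campaign raises browsing, and each browse temporarily raises both purchasing and further browsing, so the campaign's effect on purchases flows entirely through a chain of responses. The event types `email`, `browse`, `purchase`, the rates, and the windows are hypothetical, and the step-function (staircase) excitation is assumed for illustration; this is not the authors' simulator. Because intensities are piecewise constant, waiting times can be drawn exactly as exponentials between the points where a window opens or closes.

```python
import math
import random

def simulate_counts(horizon, base, effects, windows, initial_events, seed):
    """Exact simulation of event types whose intensities are step functions.

    base[k]: baseline intensity of event type k (only these types are generated)
    effects[(src, dst)]: intensity added to dst while a window opened by src is active
    windows[src]: length of the window opened by each src event
    initial_events: scheduled (time, type) stimuli, e.g. a campaign at t = 0
    """
    rng = random.Random(seed)
    types = list(base)
    pending = sorted(initial_events)
    active = []                      # (window_end, src) for currently open windows
    counts = {k: 0 for k in types}
    t = 0.0
    while t < horizon:
        rates = [base[k] + sum(effects.get((src, k), 0.0) for _, src in active) for k in types]
        total = sum(rates)
        # Rates stay constant until a window closes, a stimulus arrives, or the horizon.
        next_change = min([end for end, _ in active] + [p[0] for p in pending] + [horizon])
        wait = rng.expovariate(total) if total > 0 else math.inf
        if t + wait < next_change:
            t += wait
            k = rng.choices(types, weights=rates)[0]
            counts[k] += 1
            if k in windows:
                active.append((t + windows[k], k))
        else:
            t = next_change
            while pending and pending[0][0] <= t:
                _, src = pending.pop(0)
                if src in windows:
                    active.append((t + windows[src], src))
            active = [(end, src) for end, src in active if end > t]
    return counts

def mean_purchases(n_runs, **kwargs):
    """Monte Carlo average of the purchase count over independent runs."""
    return sum(simulate_counts(seed=s, **kwargs)["purchase"] for s in range(n_runs)) / n_runs

setup = dict(
    horizon=30.0,  # days (hypothetical)
    base={"browse": 0.05, "purchase": 0.02},
    effects={("email", "browse"): 0.3, ("browse", "browse"): 0.1, ("browse", "purchase"): 0.1},
    windows={"email": 7.0, "browse": 3.0},
)
print(mean_purchases(2000, initial_events=[], **setup))
print(mean_purchases(2000, initial_events=[(0.0, "email")], **setup))
```

The campaign has no direct purchase effect in this setup, yet it raises expected purchases; reading only the e-mail→purchase curve would miss that indirect path.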
[Figure panels, each plotted against time in weeks: purchase→purchase, catalog→purchase, e-mail→purchase, e-mail→browsing, browsing→purchase, browsing→browsing.]
Figure: Examples of the nonparametrically fitted curves.
Gains Obtained with the Collective Factors
Collective factors are useful for some of the labels.
Adding the collective factor significantly improves CTAUC, even without clustering (m=1).
Log-likelihood is somewhat better when m>1.
[Figure panels: (1) CTAUC vs. response-type index (1-10) for Individual, m=1, m=16, and m=64; (2) gain in average test log-likelihood vs. response-type index for m=1, m=16, and m=64. Dataset #1 (results for datasets #2 and #3 are similar).]
Figure: Label-dependent performance of the collective-factor models.
Conclusion and Future Directions
Proposed continuous-time response prediction models for marketing decision making:
   Piecewise-constant states as a staircase function to approximate hyperbolic discounting
   Collective factors for unobservable interactions
   Stabilized estimation by frequency aggregation
   Detection of groups following common trends by clustering regression residuals
Issues to be handled in the future:
   Richer structures of social interactions
   Nonlinear functional approximations
   Non-stationarity of correlation: rare but suddenly exploding word-of-mouth events
References
Duchi, J. and Singer, Y. (2009). Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research, 10:2899–2934.
Gunawardana, A., Meek, C., and Xu, P. (2011). A model for temporal dependencies in event streams. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems 24, pages 1962–1970.
Lozano, A. C., Abe, N., Liu, Y., and Rosset, S. (2009). Grouped graphical Granger modeling methods for temporal causal modeling. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pages 577–586, New York, NY, USA. ACM.
Rajaram, S., Graepel, T., and Herbrich, R. (2005). Poisson-networks: A model for structured point processes. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS 2005).
Rubin, D. C. and Wenzel, A. E. (1996). One hundred years of forgetting: A quantitative description of retention. Psychological Review, 103:734–760.
Wixted, J. T. and Ebbesen, E. B. (1997). Genuine power curves in forgetting: A quantitative analysis of individual subject forgetting functions. Memory and Cognition, 25:731–739.