Hierarchical Signal Propagation for Household Level Sales in
Bayesian Dynamic Models
by
Di Deng
Department of Statistical Science
Duke University
Date:
Approved:
Mike West, Advisor
Peter Hoff
Andrew Cron
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the Department of Statistical Science in the Graduate School of
Duke University
2021
Abstract
Large consumer sales companies frequently face challenges in customizing decision making for each individual customer or household. This thesis presents a novel, efficient, and interpretable approach to such personalized business strategies, involving multi-scale dynamic modeling, Bayesian decision analysis, and detailed application in the context of supermarket promotion decisions and sales forecasting.
We use a hierarchical, sequential, probabilistic, and computationally efficient Bayesian dynamic modeling framework to propagate signals down the hierarchy: from overall supermarket sales in a store, to items sold in a department of the store, to refined categories within a department, and then to the finest level of individual items on sale. Scalability is achieved by extending the decouple-recouple concept: the core example involves 162,319 time series over a span of 112 weeks, arising from combinations of 211 items and 2,000 households. In addition to novel dynamic model
developments and application in this multi-scale framework, this thesis also develops a comprehensive customer labeling system based on customer purchasing behavior in the context of prices and discounts offered by the store. This labeling system addresses a main goal in the applied context: defining customer categorizations to aid business decision making beyond the currently adopted models. Further, a key and complementary contribution of the thesis is the development of Bayesian decision analysis using a set of loss functions suited to the context of price discount selection for supermarket promotions. Formal decision analysis is explored both theoretically and via simulations. Finally, some of the modeling developments in the multi-scale framework are of general interest beyond the specific applied motivating context, and are incorporated into the latest version of PyBATS, a Python package for Bayesian time series analysis and forecasting.
Contents
Abstract iv
List of Figures viii
List of Tables ix
1 Introduction 1
1.1 Data and Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Prior Relevant Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Thesis Scope and Contributions . . . . . . . . . . . . . . . . . . . . . 3
2 Dynamic Models 5
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 DGLMs: Dynamic Generalized Linear Models . . . . . . . . . . . . . 6
2.3 DMMs: Dynamic Mixture Models . . . . . . . . . . . . . . . . . . . . 7
2.3.1 DCMMs: Dynamic Count Mixture Models . . . . . . . . . . . 7
2.3.2 DLMMs: Dynamic Linear Mixture Models . . . . . . . . . . . 8
2.4 Multi-scale Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Case Study and Examples . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.1 Individual Household DGLMs . . . . . . . . . . . . . . . . . . 11
2.5.2 Multi-scale Modeling . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.3 Model Evaluation and Comparison . . . . . . . . . . . . . . . 14
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Labeling System 18
3.1 Motivations and Purposes . . . . . . . . . . . . . . . . . . . . . . . . 18
v
3.2 Labeling System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Case Study and Examples . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4 Decision Analysis 25
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Business Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Tentative Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.1 Poisson Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.2 Mixture Model . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4 DCMMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5 Computation, Implementation and Code 39
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2 Copula Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2.1 VBLB for Latent Factor DGLMs . . . . . . . . . . . . . . . . 39
5.2.2 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3 Clustering Visualization . . . . . . . . . . . . . . . . . . . . . . . . . 49
6 Conclusions and Summary 60
Appendices 63
A DGLMs 64
A.1 VBLB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
A.2 Discount Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
A.3 Random Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
vi
A.4 Multi-scale Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
B More Figures 69
C More Code 73
vii
List of Figures
2.1 Modeling hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 (a) Coefficient of average discount percent in the external model M0; (b) Product of (a) and actual discount percent; (c) Coefficient of (b) in an individual model M2 . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Model comparison in terms of forecasting accuracy. Naive model: Mnaive;
DGLM: M1; Latent: M2; TF: a logistic regression model written in Tensor-
Flow by 84.51° . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1 Interactive visual aids of labeling system . . . . . . . . . . . . . . . . . . 23
4.1 Distributions of four model parameters . . . . . . . . . . . . . . . . . . 29
4.2 Utility vs. Discount . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 Distributions of simulated outcomes over a year (p/c = 1.2). . . . . . 34
4.4 Distributions of simulated outcomes over a year (p/c = 2). . . . . . . 35
4.5 Distributions of simulated outcomes over a year (p/c = 10). . . . . . . 36
B.1 Distributions of simulated parameters over a year (p/c = 1.2). . . . . 70
B.2 Distributions of simulated parameters over a year (p/c = 2). . . . . . 71
B.3 Distributions of simulated parameters over a year (p/c = 10). . . . . . 72
viii
List of Tables
3.1 Example items for each group . . . . . . . . . . . . . . . . . . . . . . 21
4.1 Summary statistics of logistic and Poisson regressions . . . . . . . . . . . 30
ix
Chapter 1
Introduction
1.1 Data and Context
The data processed throughout the case study are provided by 84.51°. They record the weekly purchasing data of 211 actively selling items across over 2,000 households over a span of 112 weeks, from September 5th, 2017 to October 22nd, 2019. For each row/visit, the key numeric variables include the regular price, the discounted/net price of items, and the units of items sold, as well as identifiers such as the date, the household identification number, and the category/department identification numbers of items.
For modeling purposes, we create a few derived variables: the total money spent on items in each visit, a dummy variable indicating whether a promotion was offered, and the discount percentage, i.e. the ratio of the discount to the regular price.
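As an illustration, the derived variables can be computed directly from the transaction table; the column names below are hypothetical stand-ins, not the actual 84.51° schema:

```python
import pandas as pd

# Hypothetical visit rows; column names are illustrative stand-ins.
df = pd.DataFrame({
    "regular_price": [2.50, 2.50, 3.00],
    "net_price":     [2.00, 2.50, 2.40],
    "units":         [3, 1, 2],
})

# Total money spent on the item in each visit.
df["spend"] = df["net_price"] * df["units"]

# Dummy variable: was a promotion in effect on this visit?
df["on_promo"] = (df["net_price"] < df["regular_price"]).astype(int)

# Discount percentage: ratio of the discount to the regular price.
df["discount_pct"] = (df["regular_price"] - df["net_price"]) / df["regular_price"]
```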
1.2 Prior Relevant Work
The motivation of this thesis is closely related to customized forecasting and decision making in various retail contexts (e.g. Chen et al., 2018), though individualized statistical models have broader applications in recommendation systems, ranging from images (Niu et al., 2018) to music (Wang et al., 2013).
Collaborative filtering and matrix factorization (Su and Khoshgoftaar, 2009; Du et al., 2018) are usually the pillars of customized recommendation systems. More recently, deep learning has shown potential in dealing with big, complex data (He et al., 2018; Niu et al., 2018; Naumov et al., 2019). Unlike most of these non-dynamic methods, this thesis aims to forecast the purchasing behavior of individuals in a fully dynamic setting, in the spirit of the dynamic extension of matrix factorization (Jerfel et al., 2017) and the temporal features of Chu and Park (2009) and Hu et al. (2015).
Customized prediction is also significant in medical applications, such as genomics (e.g. Nevins et al., 2003; Pittman et al., 2004; West et al., 2006) and glaucoma progression prediction (Kazemian et al., 2018). Such work, however, typically focuses on forecasting a single value of interest. This thesis presents methodology beyond that setting: it deals with forecasting across thousands of households and hundreds of items.
Finally, in the retail domain, which is the setting of this thesis, relevant personalized models are either not explicitly dynamic and unable to handle a large number of time series (e.g. Lichman and Smyth, 2018; Kazemian et al., 2018; Thai-Nghe et al., 2011), or difficult to interpret (e.g. Salinas et al., 2019; Chen et al., 2018). The former have hierarchical structure, but their formulations do not allow computational scalability, which makes it difficult to model many time series. The deep learning methods in the latter are probabilistic and scalable, but trade away interpretability. For forecasting and decision making in the retail domain, this thesis presents a model that is probabilistic, scalable, and interpretable. The interpretability enables clear communication and easy decision making for downstream collaborators: for example, one can readily determine how many discount coupons a household needs for a given item, so that the store can send the appropriate amount of promotions to targeted households at the right time.
1.3 Thesis Scope and Contributions
When it comes to commercial usage of dynamic modeling, we prefer online models
that generate full distributions of quantities of interest with both computational
speed and forecasting accuracy. This thesis adopts the Bayesian Dynamic Generalized
Linear Modeling framework by West and Harrison (1997), which is designed to be
sequential and probabilistic and is reviewed in section 2.2.
The challenge in this context arises from the inherent sparsity of the data at the finest level of individual household and item, where random noise can dominate the real signal. The sporadic counts can be well modeled by mixture models, such as Dynamic Count Mixture Models (Berry and West, 2020) and Dynamic Linear Mixture Models (Yanchenko et al., 2021), which are described in section 2.3.
In practice, computational speed and accuracy trade off against each other: sacrificing one compensates for the other. In commercial settings, new information arrives so fast that one cannot afford to run a computationally intense model, for example one that requires MCMC. To promote efficiency while maintaining forecasting accuracy, we resort to the decouple-recouple modeling strategy proposed by Berry and West (2020). Models adopting this strategy are named multi-scale models; they first treat each series independently, and then propagate common simultaneous high-level signals down to the decoupled series to restore dependence. Decoupling enables fast parallel computation, while recoupling mitigates the noise issue at the finest level, which contributes to overall accuracy along with the restored dependence. The rest
of chapter 2 reviews the framework of multi-scale modeling and showcases modeling results on the data described in section 1.1. Note that even though section 2.4 describes the multi-scale modeling of Berry and West (2020), section 2.5 utilizes an approximate but much more efficient version by Lavine et al. (2020).
As noted in section 1.1, the data contain no demographic information about the households. Since the multi-scale modeling scheme requires signals from some aggregate level, we need to create a set of criteria to classify households into groups. Chapter 3 discusses the motivation for the labeling/classification system, and its significance beyond modeling, in section 3.1; defines the specific standards in section 3.2; and showcases a few examples in both tabular and graphical form in section 3.3.
Ultimately, the pursuit of better modeling is in the service of better decision making. In the commercial context, questions such as how much discount one should give to a particular household, or even what the optimal discount is, are of great interest. The answers to such questions involve the decision maker's utility function, i.e. whether he/she prioritizes short-term profits or long-term customer relationships. Chapter 4 explores these questions under the framework of chapter 2, in the context of businesses like supermarkets, which is described in detail in section 4.2. Then, sections 4.3 and 4.4 walk through the mathematical details of decision optimization under the relevant models, from the simple to the more sophisticated, complemented by further illustrations in section 4.4.
Chapter 5, as the last segment of the thesis, details my programming contributions to the project (Yanchenko et al., 2021). It covers both the latent factor modeling and the labeling system based on household purchasing behaviors, which is introduced in chapter 3.
Chapter 2
Dynamic Models
2.1 Introduction
Dynamic modeling is of great interest to commercial outlets such as e-commerce companies like Amazon and supermarkets like Walmart and Target. This chapter reviews the framework of the relevant models for our problem. Specifically, we use dynamic multi-scale mixture models that are well suited to multivariate time series that are either non-negative counts or continuous-valued. Built on extensions to the dynamic generalized linear models of West and Harrison (1997), these models inherit the advantages of being sequential and probabilistic, and can generate samples from the implied predictive distributions of target quantities, which allows inference on various statistics and further decision analysis (Chapter 4).
Background
Many prior works pave the way for this thesis. Over 20 years ago, the framework of dynamic generalized linear models was established (West and Harrison, 1997, chap. 14). In recent years, researchers have picked up the baton, extending and modifying the framework to build tailor-made models for count-valued time series (Berry and West, 2020; Berry et al., 2020). The multi-scale modeling framework leverages information from the aggregate level, which provides a potential solution to zero-inflated data. To improve computational efficiency, Lavine et al. (2020) propose
a copula-based approximation that drastically speeds up the modeling while maintaining forecasting accuracy.
2.2 DGLMs: Dynamic Generalized Linear Models
DGLMs are dynamic models whose primary variables come from the exponential family. The sampling model for a time series i over time t is given by Equation 2.1, where i indexes the individual series and t is time:

\[
p(y_{i,t} \mid \mu_{i,t}, \phi_{i,t}, D_t) = b(y_{i,t}, \phi_{i,t}) \exp\!\left[\phi_{i,t}\left(y_{i,t}\mu_{i,t} - a(\mu_{i,t})\right)\right], \quad i = 1{:}N, \; t = 1, 2, 3, \ldots \tag{2.1}
\]
Equation 2.1 is the conditional distribution of y_{i,t} given all the information available up to time t, denoted by D_t = {y_t, D_{t−1}, I_{t−1}}, where I_{t−1} represents any additional relevant information beyond the observed data. μ_{i,t} and φ_{i,t} are the natural parameter and precision parameter, respectively. In the DGLM framework the focus is μ_{i,t}, which maps to the linear predictor λ_{i,t} = g(μ_{i,t}) via a link function g(·). As a state-space model, the dynamic Markov evolution is defined as

\[
\lambda_{i,t} = F'_{i,t}\theta_{i,t} \quad \text{where} \quad \theta_{i,t} = G_{i,t}\theta_{i,t-1} + \omega_{i,t} \quad \text{with} \quad \omega_{i,t} \sim [0, W_{i,t}] \tag{2.2}
\]

where

• F_{i,t} is a vector of known covariates at time t,

• θ_{i,t} is the state vector, which evolves via a first-order Markov process,
• G_{i,t} is a known state evolution matrix,

• ω_{i,t} is the stochastic innovation vector, or evolution "noise", with E(ω_{i,t} | D_{t−1}, I_{t−1}) = 0 and V(ω_{i,t} | D_{t−1}, I_{t−1}) = W_{i,t}, independently over time; W_{i,t} can be controlled by the discount-factor scheme described in detail in Appendix A.2.

• For Poisson DGLMs, the design includes a random effect parameter ρ ∈ (0, 1] to account for overdispersion. The models in Sections 2.5.1 and 2.5.2 both specify this parameter. More details on random effects can be found in Appendix A.3.
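As an illustration, the evolve/forecast/update cycle implied by Equation 2.2 can be sketched for the Normal special case, where the recursions of West and Harrison (1997) are available in closed form. This is an illustrative numpy sketch, not the conjugate-updating code used for the non-Gaussian DGLMs of this thesis (those linear-Bayes updates are detailed in Appendix A.1):

```python
import numpy as np

def dlm_step(m, C, y, F, G, W, v):
    """One evolve/forecast/update cycle of a Normal DLM, the Gaussian
    special case of the state evolution in Equation 2.2.

    m, C : posterior mean and covariance of the state at time t-1
    y    : new observation y_t
    F    : regressor vector F_t;  G : state evolution matrix G_t
    W    : evolution covariance W_t;  v : observation variance
    """
    # Evolve: theta_t = G theta_{t-1} + omega_t, omega_t ~ [0, W]
    a = G @ m
    R = G @ C @ G.T + W
    # One-step forecast of lambda_t = F' theta_t (identity link here)
    f = float(F @ a)
    q = float(F @ R @ F + v)
    # Update the state posterior after observing y
    A = R @ F / q                       # adaptive coefficient vector
    m_new = a + A * (y - f)
    C_new = R - np.outer(A, A) * q
    return m_new, C_new, f, q

# Local intercept + promotion-indicator regression, as in Section 2.5
G = np.eye(2)
W = 0.05 * np.eye(2)                    # in practice set via discount factors
m, C = np.zeros(2), np.eye(2)
for y, promo in [(1.0, 0.0), (3.0, 1.0), (0.5, 0.0)]:
    F = np.array([1.0, promo])
    m, C, f, q = dlm_step(m, C, y, F, G, W, v=1.0)
```

In the Poisson and Bernoulli DGLMs used in this thesis, the same evolve/forecast/update structure holds, but forecasting and updating use conjugate approximations via linear Bayes (Appendix A.1), and W_{i,t} is set implicitly by discount factors (Appendix A.2).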
2.3 DMMs: Dynamic Mixture Models
When it comes to zero-inflated data, a single DGLM is not flexible enough to capture the signal at the finest level, which is where the goal of our project lies: customized modeling and decision making for individual households and items. We therefore resort to mixture models that treat zeros separately, such as the Dynamic Count Mixture Models of Berry and West (2020) and the Dynamic Linear Mixture Models of Yanchenko et al. (2021). These two models are designed for non-negative counts and continuous values, respectively. In the context of our business problem, the former models weekly sales, while the latter is used for weekly spending.
2.3.1 DCMMs: Dynamic Count Mixture Models
To deal with non-negative counts with many zeros, Berry and West (2020) propose Dynamic Count Mixture Models (DCMMs). A DCMM is a mixture of a Bernoulli DGLM and a shifted Poisson DGLM, as described by Equation 2.3:

\[
\begin{aligned}
\text{Bernoulli DGLM:}\quad & z_t \sim \operatorname{Ber}(\pi_t) \\
\text{Poisson DGLM:}\quad & y_t \mid z_t = \begin{cases} 0, & z_t = 0 \\ 1 + s_t, \; s_t \sim \operatorname{Po}(\mu_t), & z_t = 1 \end{cases}
\end{aligned} \tag{2.3}
\]
The two components of this mixture, Bernoulli and Poisson, evolve, predict, and update separately, each just like a univariate DGLM, as detailed in Appendix A.1.
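A minimal sketch of simulating from the DCMM predictive of Equation 2.3, assuming the Bernoulli probability π_t and the shifted-Poisson mean μ_t have already been obtained from the two component DGLMs:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dcmm(pi_t, mu_t, n=10_000):
    """Draw n samples from the DCMM predictive of Equation 2.3:
    z ~ Ber(pi_t); y = 0 when z = 0, otherwise y = 1 + Po(mu_t)."""
    z = rng.random(n) < pi_t                   # Bernoulli component
    shifted = 1 + rng.poisson(mu_t, size=n)    # shifted Poisson component
    return np.where(z, shifted, 0)

# With pi_t = 0.3 and mu_t = 2.0, the predictive mean is 0.3 * (1 + 2.0) = 0.9
y = sample_dcmm(0.3, 2.0)
```

Because the Poisson component is shifted by one, every nonzero draw is at least 1, matching the "a purchase occurred" interpretation of z_t = 1; the Monte Carlo average of the draws recovers the implied predictive mean π_t(1 + μ_t).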
2.3.2 DLMMs: Dynamic Linear Mixture Models
With the similar strategy of treating the inflated zeros separately, Yanchenko et al. (2021) propose Dynamic Linear Mixture Models (DLMMs), which are mixtures of Bernoulli and Normal DGLMs, as in Equation 2.4, and are used to model the logarithm of the weekly spending of each individual household:

\[
\begin{aligned}
\text{Bernoulli DGLM:}\quad & z_t \sim \operatorname{Ber}(\pi_t) \\
\text{Normal DGLM:}\quad & x_t \mid z_t = \begin{cases} 0, & z_t = 0 \\ x_t \sim N(F'_t\theta_t, V_t), & z_t = 1 \end{cases}
\end{aligned} \tag{2.4}
\]
Similarly, DLMMs retain the flexibility, computational efficiency, and full probabilistic uncertainty quantification of the Bayesian DGLMs of West and Harrison (1997).
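A similar sketch for the DLMM of Equation 2.4: conditional on a purchase, the Normal component models log-spend, so draws on the original spending scale are lognormal. The forecast mean f_t and variance v_t here are illustrative placeholders, not fitted values:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_dlmm_spend(pi_t, f_t, v_t, n=100_000):
    """Draw weekly spend implied by the DLMM of Equation 2.4, where the
    Normal component models log-spend: x ~ N(f_t, v_t) when z = 1."""
    z = rng.random(n) < pi_t                       # Bernoulli component
    log_spend = rng.normal(f_t, np.sqrt(v_t), n)   # Normal component
    return np.where(z, np.exp(log_spend), 0.0)     # back to the $ scale

# E[spend] = pi_t * exp(f_t + v_t / 2) by the lognormal mean
s = sample_dlmm_spend(0.4, f_t=1.0, v_t=0.25)
```

This plug-in simplification ignores parameter uncertainty in f_t and v_t; the full DLMM predictive integrates over the state posterior.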
2.4 Multi-scale Framework
In this project, the multi-scale modeling framework ties all the dynamic models together. The framework has been conceptualized, developed, elaborated, and exemplified in prior work such as Berry and West (2020), Carvalho and West (2007), and Ferreira et al. (2003, 2006). This prior work takes "bottom-up/top-down" ideas and adopts them in novel ways in multi-scale time series models and Bayesian forecasting.
Specifically, the application of multi-scale modeling relies on the natural hierarchy of items defined by the store (see section 3.1 of Yanchenko et al. (2021)) and on the household grouping achieved by the labeling system/criteria of Chapter 3. This modeling strategy utilizes information across items and households that are close in the hierarchy, allowing signals from the shared level to be propagated in "top-down" fashion and thus improving forecast accuracy at the finest level: each (item, household) pair.
Under the multi-scale framework, the shared signals from the aggregate level are simultaneous with the quantity of interest, as opposed to lagged. This retains the online learning of the model of interest and also accounts for the additional uncertainty introduced by the high-level signals, which is critical for any inference on the predictive samples. Note that this requires some knowledge or control of future values, such as the discount percentages offered next week. The multi-scale framework is described by Equation 2.5. (A summary of this section can be found in Appendix A.4.)
\[
\begin{aligned}
M_i:\;& \text{Equations 2.1 and 2.2 with } \theta_{i,t} = (\gamma_{i,t}, \beta_{i,t})' \text{ and } F_{i,t} = (h_{i,t}, \phi_t)', \quad i = 1{:}N \\
M_0:\;& \phi_t \sim p(\phi_t \mid D_{t-1})
\end{aligned} \tag{2.5}
\]
An independent external model, denoted M0, models the simultaneous predictor vector φ_t, which is incorporated into each individual model M_i through the regressor vector F_{i,t}. Each M_i has its own dynamic state vector β_{i,t} for the shared signal φ_t, which allows individual models to respond uniquely to the shared higher-level signal, thereby improving forecast accuracy.
For the implementation of the multi-scale model, Berry and West (2020) propose a direct Monte Carlo method, which obviates the use of Markov chain Monte Carlo, while more recently Lavine et al. (2020) adopt an analytical approximation that significantly boosts computational efficiency while maintaining similar forecast accuracy.
2.5 Case Study and Examples
As part of the project (Yanchenko et al., 2021), one of the major goals of this thesis is to identify, capture, and utilize the price sensitivity of each household. Specifically, as displayed in Figure 2.1, I find a multi-scale model that utilizes an aggregate discount percentage across households to improve forecasting accuracy. The household hierarchy used to aggregate the discount information is introduced and elaborated in Chapter 3. In this chapter, models and modeling results for households with high price sensitivity are exemplified, evaluated, and discussed.
Similar to Figure 5 of Yanchenko et al. (2021), which visualizes the modeling decomposition of each (item, household) pair, Figure 2.1 demonstrates the two main implementations of this problem: "top-down" propagation along the hierarchy of an item (store, department, category, product) or of a household (groups with different price sensitivity/loyalty). This thesis mainly contributes to the latter.
Figure 2.1: Modeling hierarchy
2.5.1 Individual Household DGLMs
Following the decouple-recouple strategy, univariate DCMMs (Equation 2.3) model the weekly sales of each household, with the first two covariates of the regressor vector F′_t being (1, discount_t), where discount_t is the simultaneous binary indicator of a weekly promotion. The third covariate explored is the weekly discount percentage, used either directly or in aggregate. Models M1 and M2 below give the full specifications.
M1:

• Response variable y_t: weekly sales of an item for a particular household

• F′_t = (1, discount_t, discount percent_t), G = I_3

• Random effect ρ = 0.6; discount factors δ_local linear = 0.98, δ_regression = 0.98

M2:

• Response variable y_t: weekly sales of an item for a particular household

• F′_t = (1, discount_t, aggregate discount percent_t), G = I_3

• Random effect ρ = 0.6; discount factors δ_local linear = 0.98, δ_regression = 0.98
2.5.2 Multi-scale Modeling
The idea of multi-scale modeling is to extract an aggregate-level signal from group behavior as a baseline reference, so that we have at least some "safety" information to draw on when there is nothing but noise at the finest level (household). In contrast to M1, which is a household-specific model, M2 is a multi-scale model (Equation 2.5) with a simultaneous covariate that incorporates price sensitivity across a group of households. The exploration finds that M2 outperforms M1 and the other alternatives, especially for households with high price sensitivity. This section describes the external model that generates the third covariate of M2: aggregate discount percent_t.
External Model Specification
The external model M0, whose parameters are specified below, is a Poisson DGLM (Equations 2.1 and 2.2); a DCMM would have been practically equivalent, due to the lack of zero inflation in the aggregate data.

M0:

• Response variable y_t: weekly sales of an item across a group of households

• F′_t = (1, average discount percent_t), G = I_3

• Random effect ρ = 0.6; discount factors δ_local linear = 0.998, δ_regression = 0.998
Model Integration: Signal Propagation
Given the coefficients of M0, the next step is to combine the aggregate-level signal with the household-specific information. Specifically, given the state vector θ_t = (α_t, β_t)′ of M0, the third covariate of M2 becomes aggregate discount percent_t = β_t × discount percent_t.
In the context of this application, β_t can be interpreted as a measure of price sensitivity to the item for the whole group of households, while multiplying it by the household-specific discount percent accounts for the heterogeneity of promotions within the group.
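A plug-in sketch of this propagation step, using posterior means of β_t for simplicity (the full treatment propagates samples of β_t so that the aggregate-level uncertainty is retained, as emphasized in Section 2.4); all numbers are illustrative:

```python
import numpy as np

# Posterior means of the price-sensitivity coefficient beta_t from the
# external model M0 over the forecast weeks (illustrative values).
beta_t = np.array([1.8, 1.9, 2.1, 2.0])

# This household's own weekly discount percent; zeros are weeks in which
# no promotion was offered to this household (as in Figure 2.2b).
discount_pct = np.array([0.25, 0.0, 0.30, 0.0])

# The multi-scale latent covariate of model M2:
aggregate_discount_pct = beta_t * discount_pct

# Regressor vectors F_t = (1, discount indicator, aggregate discount pct)
promo = (discount_pct > 0).astype(float)
F = np.column_stack([np.ones_like(promo), promo, aggregate_discount_pct])
```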
Figure 2.2 shows the critical quantities over the forecasting period in the described process. Figure 2.2a shows the coefficient of average discount percent, β_t, which is multiplied by the simultaneous individual-household discount percent to obtain the customized aggregate discount percent for that particular household, shown in Figure 2.2b. Note that in Figure 2.2a the coefficient is well above zero, indicating strong price sensitivity in the chosen household group, which validates the multi-scale strategy. The latent factor in Figure 2.2b is displayed for one household of the group; it has zero values in some weeks because the household was not offered any promotion on item 62 in those weeks. Figure 2.2c gives the coefficient of the latent factor, aggregate discount percent, in model M2. The mean and one-standard-deviation region
Figure 2.2: (a) Coefficient of average discount percent in the external model M0; (b) Product of (a) and actual discount percent; (c) Coefficient of (b) in an individual model M2
imply that the combined covariate plays a consistently valuable role in that particular model.
2.5.3 Model Evaluation and Comparison
Figure 2.2 illustrates the multi-scale model at the individual-household level; this section discusses the performance of the model on the chosen group and compares it to the alternatives.
Figure 2.3 shows the accuracy of the models: M1, M2, the TensorFlow logistic regression model, and the naive guess of Equation 2.6 based on the promotion indicator. Figure 2.3a compares the multi-scale model with each of the others, while Figure 2.3b displays their individual accuracy distributions.

\[
\text{Naive Guess:}\quad z_t = \begin{cases} 0, & \text{No discount} \\ 1, & \text{Discount} \end{cases} \tag{2.6}
\]
In Figure 2.3a, each point represents a household in the group, and the straight line is the y = x line; points above the line are where the multi-scale model outperforms the alternative. From the first subplot of Figure 2.3a, we can see that the multi-scale model beats the naive guess when households follow the promotions between 60% and 80% of the time. This is significant because that is the most common case, and the one in which behavior is difficult to predict: if all customers followed the promotions, there would be no need for models more complex than the naive guess. In the second subplot, for a few households, using the aggregate discount percent conspicuously dominates the DGLM without the aggregate-level signal, which emphasizes the importance of "top-down" propagation, as mentioned at the beginning of Section 2.5.2. Compared to the TensorFlow model, the multi-scale model generates similar outcomes while having the added benefits of being probabilistic, sequential, and much faster.
In the parent project of this work, Yanchenko et al. (2021) compare models M2 and M1 in terms of other metrics, such as MAD, MAPE, and ZAPE. In particular, their Tables 4 and 6 exemplify the improvement at a larger scope: the "simultaneous" column of Table 4 and the "multi-scale" column of Table 6 report the performance of the model chosen in this section. In that paper, the model is applied to a more heterogeneous household group and outperforms the alternatives on all metrics.
Figure 2.3: Model comparison in terms of forecasting accuracy. (a) Accuracy pairwise comparisons; (b) accuracy distributions. Naive model: M_naive; DGLM: M1; Latent: M2; TF: a logistic regression model written in TensorFlow by 84.51°
2.6 Summary
This chapter reviews the framework of Bayesian dynamic generalized linear models and their extensions to count-valued time series, then elaborates and exemplifies the multi-scale modeling approach. Compared to the other models described in Section 2.5.3, this "top-down" strategy shows potential for handling the difficult modeling problems that arise with sparse data.
The extension of the multi-scale approach to hierarchical decomposition (Figure
2.1) not only captures household behavior, but also maintains scalability. The key is
to identify a group of individual series that share information (Chapter 3).
All models throughout this chapter are extensions of fully probabilistic, interpretable, sequential dynamic generalized linear models, tailor-made for the individualized forecasting and decision-making problem.
Chapter 3
Labeling System
The "top-down" modeling strategy (Berry and West, 2020) is well suited to the personalized household forecasting problem described in Section 2.5. It seeks common signals at the aggregate level and propagates those signals down the hierarchy, which is effective for sparse data such as I face throughout this thesis. One obstacle to realizing this modeling concept is the lack of proper aggregate information, which is the main incentive for this chapter (Section 3.1).
3.1 Motivations and Purposes
To implement the multi-scale modeling strategy, which propagates clearer signals from the aggregate level to each household, it is natural to develop a set of grouping/clustering criteria with which we can circumvent the lack of demographic information and identify the appropriate aggregate signals. The goal is to group thousands of households according to their promotion scenarios and purchasing behaviors. Based on a quantification of such scenarios and behaviors, households are classified into eight categories, each geometrically represented by an octant of the unit cube. Note that this process can be carried out for every item actively sold in the store, which allows us to identify and then model the aggregate signals for every (household, item) combination. From a holistic perspective, the grouping not only enables the multi-scale strategy but also, as guidance, illuminates the proper actions for different groups of households and identifies the strengths of the model.
3.2 Labeling System
Since demographic information about the households is unavailable, the following grouping is developed on the basis of promotion circumstances and buying behaviors, defined for every (household, item) combination as below.

For every (item, household) pair (i, h), i = 1:I, h = 1:H, with I and H being the total numbers of items sold and households recorded:

• Discount Offered Percentage (DOP): over the span of the 112 weeks recorded, the proportion of weeks in which promotions were offered to household h on item i.

• Discounted Purchase Percentage (DPP): among the weeks in which item i was discounted for household h, the proportion of weeks in which household h made a purchase.

• Regular Purchase Percentage (RPP): among the weeks in which item i was at regular price for household h, the proportion of weeks in which household h made a purchase.

These three quantities together define a household space for each item, whose domain is the unit cube, with the eight divisions established and interpreted as below:
For i = 1:I,
1. The octant containing (DOPi, DPPi, RPPi) = (0, 0, 1);
Interpretation: Loyal households who are very consistent on item i.
2. The octant containing (DOPi, DPPi, RPPi) = (0, 1, 1);
Interpretation: Loyal households, similar to type 1, who are very consistent on
item i.
3. The octant containing (DOPi, DPPi, RPPi) = (1, 1, 0);
Interpretation: Promotion sensitive households who are responding and enjoy-
ing the discounts on item i.
4. The octant containing (DOPi, DPPi, RPPi) = (1, 1, 1);
Interpretation: Similar to type 3.
5. The octant containing (DOPi, DPPi, RPPi) = (0, 0, 0);
Interpretation: Untouched or pristine households who might respond to pro-
motions of item i if delivered.
6. The octant containing (DOPi, DPPi, RPPi) = (0, 1, 0);
Interpretation: Similar to type 5.
7. The octant containing (DOPi, DPPi, RPPi) = (1, 0, 0);
Interpretation: Disinterested households, despite promotions of item i.
8. The octant containing (DOPi, DPPi, RPPi) = (1, 0, 1);
Interpretation: Similar to type 7.
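As a concrete sketch of how the three criteria and the octant labels might be computed, the following Python fragment derives (DOP, DPP, RPP) and a type label for each item-household pair. The input column names ('on_promotion', 'purchased') are hypothetical, and thresholding each proportion at 0.5 is one plausible reading of the octant construction:

```python
import numpy as np
import pandas as pd

def label_household(dop, dpp, rpp):
    """Map (DOP, DPP, RPP) in [0, 1]^3 to one of the eight octant types.

    Each proportion is thresholded at 0.5, so the octant is encoded by the
    binary triple (DOP > .5, DPP > .5, RPP > .5). Octants are keyed by the
    vertex they contain, as enumerated in the text above.
    """
    octant = (int(dop > 0.5), int(dpp > 0.5), int(rpp > 0.5))
    types = {(0, 0, 1): 1, (0, 1, 1): 2, (1, 1, 0): 3, (1, 1, 1): 4,
             (0, 0, 0): 5, (0, 1, 0): 6, (1, 0, 0): 7, (1, 0, 1): 8}
    return types[octant]

def label_pairs(df):
    """Compute DOP, DPP, RPP and the type label for every (item, household)
    pair from a weekly table with (hypothetical) columns 'item', 'household',
    'on_promotion' (0/1) and 'purchased' (0/1)."""
    out = []
    for (i, h), g in df.groupby(['item', 'household']):
        dop = g['on_promotion'].mean()           # share of weeks on promotion
        disc = g[g['on_promotion'] == 1]
        reg = g[g['on_promotion'] == 0]
        dpp = disc['purchased'].mean() if len(disc) else 0.0
        rpp = reg['purchased'].mean() if len(reg) else 0.0
        out.append({'item': i, 'household': h, 'DOP': dop, 'DPP': dpp,
                    'RPP': rpp, 'type': label_household(dop, dpp, rpp)})
    return pd.DataFrame(out)
```

This is only a sketch under the stated assumptions about the input layout; the actual data manipulation used in the thesis appears in Section 5.3.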
Based on the classification, it is natural for the eight types of customers to coalesce
into four larger groups, as follows.

For i = 1:I, households for whom:
1. habit and loyalty for item i are established.
Actions: Maintain the relationship and occasionally compensate for their loy-
alty to item i.
2. promotion sensitivity and interest in item i are detectable or even conspicuous,
making this the ideal group of customers for modeling price sensitivity.
Actions: Find the amount of promotion on item i that generates the most profit,
which depends on the distribution of sales and the quantity being optimized.
3. promotions are not available.
Actions: Explore and experiment with these customers by delivering promo-
tions of item i.
4. disinterest in the item or disregard for the promotions is noticeable.
Actions: Check the validity of the promotions sent out. If they are disregarded,
stop the promotions of item i.
Note that households in group 2 above are of the most modeling interest, given
that price is the covariate.
3.3 Case Study and Examples
The examples demonstrated in this section are results of applying the labeling system
of Section 3.2 to a portion of the data (the highest-spending household group)
described in Section 1.1.
Table 3.1: Example items for each group

        group1          group2          group3          group4
item    type1   type2   type3   type4   type5   type6   type7   type8   total
...
62        0       0      578     254      0       0     1071     44     1947
...
72       34     395       71       2     567     412       3      0     1484
...
176       5      46        0       0    1522      98      10      0     1681
...
199       0       0        5       6       7       0    1653      1     1672
...
As mentioned in Section 3.1, the labeling system aids in identifying signals to
model (discussed in Section 2.5) and sheds light on decision making at the individual
item, household level.
Table 3.1 displays four items, each with a significant number of households in one
group. As discussed in Section 3.2, group 2 is of the most interest when it comes to
modeling households' sensitivity to promotions, and item 62 is the item chosen to
illustrate the multi-scale modeling strategy in Section 2.5 and the multi-step decision
analysis in Section 4.4. Group 1 comprises the loyal customers with consistent
spending on a given item, exemplified by the 429 households purchasing item 72.
The majority of the households recorded for item 176 are classified in group 3, which
suggests tentative promotions. Lastly, item 199 is not a selling item despite the
promotions, which should draw the attention of decision makers. Potential questions
to be investigated are whether the store should:
1. check the validity or accessibility of the promotions being sent out;
2. shrink the promotions to save on mailing, etc.;
3. reduce the inventory, since the item does not sell;
4. bundle it with other items that sell.
Figure 3.1 visualizes the definition of the clustering criteria in Section 3.2, with
a couple of examples with and without the grouping regions.

Figure 3.1: Interactive visual aids of the labeling system. (a) Subspaces defined by
the labeling system; (b) 3D scatterplot of all households recorded for item X; (c) 3D
grouped scatterplot; (d) Interactive legend.

For a particular item, these kinds of plots demonstrate its customers' sensitivity to
promotions and loyalty to the product, and help identify anomalies in the delivery of
its promotions. The four cuboids, each consisting of two octants, represent the four
customer groups. Each point in Figures 3.1b and 3.1c is a household recorded for
that particular item. The axes are the three quantities defined in Section 3.2 (DOP,
DPP, RPP), with more information incorporated in the plots, such as the household
id, the exact values on the three axes, and a couple of categorical attributes, as shown
in Figure 3.1d.
This visualization serves as a dictionary and enables easy, straightforward searching
for any particular record in the data. For example, one might be curious about
the exact information for a point after locating it in group 2 of Figure 3.1c. The
user can then turn off the shading for the grouping, to display in the mode of
Figure 3.1b, and hover over the chosen point for the household index, the average
discounted sales, whether or not the customer buys more with promotions, etc. The
tool also simplifies anomaly detection. For instance, a household buying significantly
more without discounts than with them is indicated by a cross, and is easily
distinguished from a household buying more with discounts than without, which is
shown as a circle.
3.4 Summary
This chapter defines a set of criteria to classify households according to their
purchasing behavior. These criteria are best demonstrated and utilized interactively,
as illustrated by Section 3.3 and Figure 3.1. The outcomes are referred to in Chapter 2,
especially for the definition of aggregate information. In addition, the user can
interact with the figures, exploring features of interest such as the popularity of items,
promotion availability, and the distribution of households in the space of purchasing
behaviors.
Chapter 4
Decision Analysis
4.1 Introduction
In any business analysis, it is essential to make decisions and understand the
corresponding consequences and the uncertainties attached to them. Decision analysis
converts our statistical efforts into business potential and bestows real-life significance
upon the project. This chapter begins with a few simple examples of decision analysis
tailor-made for this context of item-specific discount offers. It then proceeds to
more realistic settings where a simulation-based approach shows advantages in terms
of efficiency. Finally, the chapter concludes with an example focusing not only on
optimization of the expected utility, but also on the uncertainty analysis arising from
the full distribution, showcasing the advantages of the probabilistic model.
4.2 Business Context
For retail businesses such as grocery stores or supermarkets, it is of great interest to
understand, for a given item, how discounts impact sales and eventually profits per
unit time. A typical setup is the following:

• An item has a usual/nominal selling price $p.

• The item cost is $c, intended to capture all real costs for the store (purchase/wholesale
costs, storage, labour, etc.).
• Percent discount 100d% for decision variable d ∈ (0, 1]; the discounted price is
$(1 − d)p.

• The implied profit per item at discount d is then $p_d = ${(1 − d)p − c}. Note
that a short-term decision maker would always keep this value positive, i.e.
d < 1 − c/p, which is the scenario considered here. However, it is sometimes
beneficial in the long term to set d ≥ 1 − c/p for a controlled span of time,
i.e. sacrificing short-term profits to build an entrenched relationship with
customers, which suggests a more sophisticated setup than described here
(an extra term describing the expected gain would be needed in the expected
utility).

• y is the number of items sold per unit time at the offered discount.

• The implied expected profit (utility) is $u_d, where

u_d = E(y|d){(1 − d)p − c}.    (4.1)

A smaller d implies a higher price and lower expected sales; a higher d increases
expected sales but reduces the profit per sale. Hence u_d may have interior
maximizing point(s) within a reasonable range.
4.3 Tentative Models
4.3.1 Poisson Model
When it comes to non-negative counts, the Poisson model is one of the lower-hanging
fruits. Conditional on a chosen discount d, assume the sales y of a particular item
follow a Poisson distribution with a linear model on the log link. Statistical details
of the model and its optimization are as follows.

• Sales: y|d ~ Po(μ_d) with log(μ_d) = α + βd. Naturally β > 0.

• Expected profit:

u_d = μ_d{(1 − d)p − c} = {(1 − d)p − c} exp{α + βd}.    (4.2)

• Maximizing u_d is equivalent to maximizing log(u_d), with d ∈ (0, 1 − c/p]:

d_optimal = argmax_d log(u_d) =
    1 − c/p − 1/β,  if β > p/(p − c)
    0,              otherwise    (4.3)

• Sometimes in practice, the actual selling price $p and the cost $c are not of
great interest. In those circumstances, it makes sense to replace p/c with r,
the markup plus 1. Equation 4.3 then simply becomes

d_optimal = argmax_d log(u_d) =
    1 − 1/r − 1/β,  if β > r/(r − 1)
    0,              otherwise    (4.4)
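Equation 4.4 is simple enough to code directly; a minimal sketch (the function name is my own, not part of the thesis code):

```python
def optimal_discount_poisson(beta, r):
    """Optimal discount under the Poisson model (Equation 4.4).

    beta: promotion-sensitivity slope in log(mu_d) = alpha + beta * d
    r:    the markup plus 1, i.e. r = p / c
    """
    # interior optimum exists only when the sensitivity beta is large enough
    if beta > r / (r - 1):
        return 1 - 1 / r - 1 / beta
    return 0.0
```

For example, with r = 2 (a 100% markup) and beta = 4, the optimal discount is 1 − 1/2 − 1/4 = 25%; with beta ≤ 2 the store should not discount at all.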
4.3.2 Mixture Model
• Sales: y|d = z(x + 1), where z and x are independent,

z ~ Ber(π_d), logit(π_d) = α_0 + β_0 d
x ~ Po(μ_d), log(μ_d) = α + βd    (4.5)

Naturally β_0, β > 0.

• Expected profit:

u_d = π_d(μ_d + 1){(1 − d)p − c}
    = {(1 − d)p − c} logit⁻¹(α_0 + β_0 d)(1 + exp(α + βd))
    = {(1 − d)p − c} exp(α_0 + β_0 d)(1 + exp(α + βd)) / (1 + exp(α_0 + β_0 d))    (4.6)

• Maximizing u_d is equivalent to maximizing log(u_d), with d ∈ (0, 1 − c/p].
Setting the first derivative to zero gives the condition

β_0(1 − π_d) + βμ_d/(1 + μ_d) = p/{(1 − d)p − c},  β_0, β > 0.    (4.7)

Due to the difficulty of solving this for d_optimal analytically, we resort to a
numerical method for mode hunting, which is a seven-dimensional problem in
(d, α_0, β_0, α, β, p, c). Similar to how we obtained Equation 4.4, with
p/{(1 − d)p − c} written as r/{(1 − d)r − 1}, where r = p/c, we reduce the
total dimension to six. Furthermore, by incorporating information from the
business context, we are able to narrow down the plausible values of some
parameters, thus mitigating the computational burden.
• Referring to Table 4.1, which shows the distributions of these four coefficients
over 300 household-level datasets, some plausible domains are chosen for the
purpose of this analysis:

d ∈ (0, 1 − 1/r], where r = p/c
α_0 ∈ (−0.9, 1.2), taking the 10th and 90th percentiles
β_0 ∈ (0, 1.7), truncating to the positive portion
α ∈ (−0.55, 0.95), taking the 10th and 90th percentiles
β ∈ (0, 2.2), truncating the positive portion up to the 75th percentile
r ∈ (1.1, 2), a reasonable guess
Figure 4.1: Distributions of four model parameters
For each set of parameters, it is trivial to compute the optimal discount under the
given utility. Figure 4.2 shows the relationships between the discount d and π_d, μ_d
and u_d, with four different sets of intercepts but the same slopes: (β_0, β) = (1.7, 2.2).

Table 4.1: Summary statistics of the logistic and Poisson regressions

         alpha0       beta0        alpha        beta
count   300          300          300          300
mean      0.220211    -5.577831     0.206850     1.466264
std       0.911970     2.894260     0.596577     1.207612
min      -4.153728   -19.449333    -2.457143    -2.229326
10%      -0.872800    -8.751870    -0.556313    -0.010213
25%      -0.199389    -7.043806    -0.209543     0.716735
50%       0.323359    -5.459991     0.199102     1.416220
75%       0.788633    -3.830893     0.603878     2.208956
90%       1.229518    -2.197984     0.958858     3.157952
max       2.602268     1.693713     2.071097     5.358105
Since the intercepts represent the circumstances without discounts, high values of
the intercepts relative to the slopes lead to d_optimal = 0: the item is already
popular without discounts, so a lower price would simply hurt the profit and bring
only a marginal increase in sales. On the other hand, very low values give the same
result for a different reason: customers are so indifferent to the item that even high
discounts are not able to attract them. In terms of slopes, high values indicate high
sensitivity to discounts, and a pronounced peak in utility can be expected.
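A minimal sketch of this numerical search, using a plain grid over d with the utility written in terms of r (so u_d is computed up to the positive factor c, which does not affect the maximizer). The function names and grid size are my own illustrative choices, not the thesis code:

```python
import numpy as np

def mixture_utility(d, a0, b0, a, b, r):
    """Expected profit (Equation 4.6) up to the positive factor c,
    i.e. u_d / c, written with r = p / c."""
    pi_d = 1.0 / (1.0 + np.exp(-(a0 + b0 * d)))   # Bernoulli probability
    mu_d = np.exp(a + b * d)                      # shifted-Poisson mean
    return ((1 - d) * r - 1) * pi_d * (mu_d + 1)

def optimal_discount_mixture(a0, b0, a, b, r, n_grid=2001):
    """Grid search for the maximizing discount over d in [0, 1 - 1/r]."""
    d = np.linspace(0.0, 1.0 - 1.0 / r, n_grid)
    u = mixture_utility(d, a0, b0, a, b, r)
    return d[np.argmax(u)]
```

With all slopes set to zero the utility is decreasing in d and the optimizer is 0, matching the discussion of flat customers above; with a large Poisson slope an interior optimum appears.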
4.4 DCMMs
The decision analysis can be incorporated into the framework of Bayesian dynamic
linear models. Without approximations of the digamma and trigamma functions,
the optimization problem cannot be written in closed form; one has to resort to an
iterative numerical solution based on standard Newton-Raphson to find the implied
conjugate parameters. With the following approximations of the digamma and
trigamma functions, respectively, we are able to write the optimization problem in
terms of the regressor vector F_t:

ψ(x) ≈ log(x)
ψ′(x) ≈ 1/x    (4.8)
For Binomial DGLMs, we have

α_t = (1 + exp(f_t))/q_t
β_t = (1 + exp(−f_t))/q_t    (4.9)

and for Poisson DGLMs,

α_t = 1/q_t
β_t = exp(−f_t)/q_t.    (4.10)
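For concreteness, the conjugate-parameter maps in Equations 4.9 and 4.10 can be sketched as small helper functions taking the prior moments (f_t, q_t) of the linear predictor (function names are my own):

```python
import numpy as np

def binomial_conjugate_params(f, q):
    """Implied Beta(alpha_t, beta_t) parameters for a Binomial DGLM
    (Equation 4.9), given moments (f_t, q_t) of the linear predictor."""
    alpha = (1 + np.exp(f)) / q
    beta = (1 + np.exp(-f)) / q
    return alpha, beta

def poisson_conjugate_params(f, q):
    """Implied Gamma(alpha_t, beta_t) parameters for a Poisson DGLM
    (Equation 4.10)."""
    alpha = 1 / q
    beta = np.exp(-f) / q
    return alpha, beta
```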
In general, we might want to optimize the expectation of the scaled direct outcome:
the profit in this case. That is the product of the expected sales and a linear function
of the regressor vector F_t. If we continue to work through the math, we have the
following optimization problem:

π_t = α_t/(α_t + β_t) = (1 + exp(f_t))/(2 + exp(f_t) + exp(−f_t))
μ_t = α_t/β_t = exp(f_t)
u_t = π_t(μ_t + 1) F′_t b    (4.11)

where f_t = F′_t a_t, with a_t the first moment of the evolved state vector θ_t and
b the known linear coefficients. Note that this does not depend on the second
moment q_t, which makes sense because it is the first moment of the profit that we
are optimizing. After simplification, we have

u_t(F_t) = (1 + exp(F′_t a_t))² / (2 + exp(F′_t a_t) + exp(−F′_t a_t)) · F′_t b    (4.12)
While Equation 4.12 is hard to solve analytically, it is straightforward to approximate
the optimal solution computationally when F_t is low-dimensional. In this study,
F_t = (1, d)′, where d is the discount percentage, with a plausible range from 0 to 1.
In reality, d < 1 − c/p for any positive profit, which means we need to discuss the
problem under various cost-to-price ratios c/p.

Here I explore three different sale-to-cost ratios: 1.2, 2 and 10. The first two are
realistic, while 10 is an experiment with an extreme case. The question of interest
is to forecast the outcomes if the store were to use the optimal discount determined
by Equation 4.12, with F_t = (1, d)′ and b = (p − c, −p)′, every week for 52 weeks.
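A sketch of this computational approximation: maximize Equation 4.12 over a grid of discounts, with F_t = (1, d)′ and b = (p − c, −p)′ as above, so that F′_t b = (1 − d)p − c. Function names and grid resolution are illustrative choices of mine:

```python
import numpy as np

def dcmm_expected_profit(d, a_t, p, c):
    """Expected profit u_t (Equation 4.12) with F_t = (1, d)' and
    b = (p - c, -p)', so that F_t' b = (1 - d) p - c."""
    f = a_t[0] + a_t[1] * d                      # f_t = F_t' a_t
    profit_per_unit = (1 - d) * p - c            # F_t' b
    return (1 + np.exp(f))**2 / (2 + np.exp(f) + np.exp(-f)) * profit_per_unit

def optimal_discount_dcmm(a_t, p, c, n_grid=1001):
    """Grid-approximate the maximizing discount over d in [0, 1 - c/p]."""
    d = np.linspace(0.0, 1.0 - c / p, n_grid)
    u = dcmm_expected_profit(d, np.asarray(a_t), p, c)
    return d[np.argmax(u)], u.max()
```

With a flat state a_t = (0, 0)′ the weight factor is constant and the optimal discount is 0; a negative intercept with a positive slope produces an interior optimum, consistent with the discussion of intercepts and slopes in Section 4.3.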
In order to obtain the distributions of the optimal discounts, corresponding profits
and parameters, it is convenient to utilize DCMMs, which, as emphasized throughout
this study and delineated in Appendix A, are probabilistic and sequential. Figures 4.3,
4.4 and 4.5 show such distributions for each scenario; Appendix B has more figures
of the relevant model parameters π_t and μ_t.

The figures show simulated outcomes, given the models trained up until week 1,
where the store always picks the discount percentage that maximizes the total profit
in the impending weeks. All three figures use the same household, item pair, only
with p/c differing.
Figure 4.3: Distributions of simulated outcomes over a year (p/c = 1.2): (a)
distributions of optimal discounts; (b) distributions of optimal profits.

Figure 4.4: Distributions of simulated outcomes over a year (p/c = 2): (a)
distributions of optimal discounts; (b) distributions of optimal profits.

Figure 4.5: Distributions of simulated outcomes over a year (p/c = 10): (a)
distributions of optimal discounts; (b) distributions of optimal profits.

Comparing the results across the three, it is conspicuous that a lower cost/higher
sale price affords the store more room to offer discounts, thereby increasing the
weekly sales enough to compensate for the discounted price and, as a result,
generating more total profit. One can also investigate each figure along its one-year
horizon. It is worth noting that the distributions appear to converge as enough time
elapses. This is easy to accept once we realize that, despite the exquisite design of
DCMMs/DGLMs, the model does not inject new information or introduce
disturbances. So, after accepting the model up until week 1, we are bound to have a
stationary forecast after enough time has passed. This discloses the difficulty of
long-term forecasting: without any definite, deterministic insight, forecasts made by
stationary models (all pragmatic time series models) are simply reflections of the
observed. In comparison, short-term forecasting is more plausible (Figure 2.3b),
because our variables of interest are not as volatile in the short term as in the long.
In short, statistical models are capsules of the available information; after training
on the past, all one can hope is that history sheds light on the future.
4.5 Summary
This chapter begins with a business setting that approximates reality, and explores
the decision analysis problem under a few models. The goal throughout is to
maximize the profit earned on an item from a household (Equation 4.1). Simple
models such as Poisson regression (Section 4.3) as well as DCMMs are studied for
the optimization problem (Section 4.4). I derive the mathematics for each case, at
least up to a simplification of the problem, and resort to numerical methods and
simulation-based computation when the analytical solution is difficult to obtain
(Section 4.4).

More areas can be explored in terms of the loss/utility function. Relevant loss
functions for zero-inflated count-valued time series include ZAPE, adjusted ZAPE,
MAPE, etc. (Yanchenko et al., 2021). Moreover, the probabilistic model allows much
more complicated utility functions than those that only provide a point forecast. For
instance, a decision maker can ask for a 0.5 or higher probability of gaining four
dollars of profit from a household over a span of two weeks.
Of course, there are anticipated but unsolved questions, such as long-term forecasting
and decision making. Long-term forecasting has always been challenging but
intriguing, regardless of the field; applications include natural disasters like
earthquakes (Talebi, 2017) as well as artificial advancements (Kott and Perconti,
2018). The ability to forecast long-term has significance for policy-makers, business
owners, residents of a particular area, potentially everyone. However, since a bad
forecast is worse than no forecast at all, there are far fewer studies on long-term
horizons than on short-term ones. In my personal opinion, the best forecast is to
push the future toward the desired direction. We will meet the future where our eyes
are set; it could be late, but hopefully not absent.
Chapter 5
Computation, Implementation and Code
5.1 Introduction
This chapter showcases the programming contributions I have made to the project.
Section 5.2 first introduces the mathematics behind the programming, followed by
Section 5.3, which contains the code generating the interactive 3D clustering plots
(Figure 3.1) in Section 3.3.
5.2 Copula Approximation
Lavine et al. (2020) propose a copula-based analytic method to approximate the
simulation-based one in Berry and West (2020). This approximation balances speed
and accuracy, substantially reducing the computational cost. This section derives
the mathematics behind Variational Bayes and Linear Bayes (VBLB) for multiscale
DGLMs, as well as the programming contribution I have made to the published
Python package PyBATS.
5.2.1 VBLB for Latent Factor DGLMs
This subsection extends the VBLB of Appendix A.1 to the latent factor modeling
context. To implement the method in this context, we first need the first two
moments of the linear predictor λ_i,t for all i = 1:N, and their covariances.
Expanding the expression for λ_i,t, we get

λ_i,t = F′_i,t θ_i,t = h′_i,t γ_i,t + φ′_t β_i,t    (5.1)

We denote the first two moments of φ_t by φ_t | D_{t−1} ~ [b_t, B_t] and partition
the moments of the state vector as

θ_i,t | D_t ~ [ (a′_γ,i,t, a′_β,i,t)′, ( R_γ,i,t  S_i,t ; S′_i,t  R_β,i,t ) ]    (5.2)

The mean of the linear predictor is then

f_i,t = E[λ_i,t] = E[F′_i,t θ_i,t] = h′_i,t a_γ,i,t + b′_t a_β,i,t    (5.3)
The variance of the linear predictor can be calculated using the law of total
covariance:

q_i,t = Var[λ_i,t] = Var[F′_i,t θ_i,t]
      = Cov(h′_i,t γ_i,t + φ′_t β_i,t, h′_i,t γ_i,t + φ′_t β_i,t)
      = Cov(E[h′_i,t γ_i,t + φ′_t β_i,t | φ_t], E[h′_i,t γ_i,t + φ′_t β_i,t | φ_t])
        + E[Cov(h′_i,t γ_i,t + φ′_t β_i,t, h′_i,t γ_i,t + φ′_t β_i,t | φ_t)]
      = Cov(h′_i,t a_γ,i,t + φ′_t a_β,i,t, h′_i,t a_γ,i,t + φ′_t a_β,i,t)
        + E[Var[h′_i,t γ_i,t] + φ′_t Var[β_i,t] φ_t + h′_i,t Cov(γ_i,t, β_i,t) φ_t + φ′_t Cov(β_i,t, γ_i,t) h_i,t]
      = Var[φ′_t a_β,i,t] + E[h′_i,t R_γ,i,t h_i,t + φ′_t Var[β_i,t] φ_t + 2h′_i,t S_i,t φ_t]
      = a′_β,i,t B_t a_β,i,t + h′_i,t R_γ,i,t h_i,t + 2h′_i,t S_i,t b_t + E[tr(φ′_t R_β,i,t φ_t)]    (5.4)

where

E[tr(φ′_t R_β,i,t φ_t)] = E[tr(R_β,i,t φ_t φ′_t)]
                        = tr(R_β,i,t E[φ_t φ′_t])
                        = tr(R_β,i,t (Var[φ_t] + E[φ_t]E[φ_t]′))
                        = tr(R_β,i,t B_t) + tr(R_β,i,t b_t b′_t)
                        = tr(R_β,i,t B_t) + b′_t R_β,i,t b_t    (5.5)

Therefore, the moments of the linear predictor in the extended VBLB for latent
factor modeling are

f_i,t = h′_i,t a_γ,i,t + b′_t a_β,i,t
q_i,t = h′_i,t R_γ,i,t h_i,t + 2h′_i,t S_i,t b_t + b′_t R_β,i,t b_t + a′_β,i,t B_t a_β,i,t + tr(R_β,i,t B_t)    (5.6)
Accordingly, the adaptive vector in the LB update step is R_i,t F̃_i,t / q_i,t, where
F̃_i,t = (h′_i,t, b′_t)′. In contrast to traditional DGLMs, this modified analysis
introduces more uncertainty, because φ_t is itself random and comes from an external
model; this appears explicitly in q_i,t as the last two terms.
Now that we have the means and variances, we only need the pairwise covariances
between λ_i,t and λ_j,t, i ≠ j, i, j = 1:N, to complete the joint covariance matrix:

q_i,j,t = Cov(λ_i,t, λ_j,t)
        = Cov(E[λ_i,t | φ_t], E[λ_j,t | φ_t]) + E[Cov(λ_i,t, λ_j,t | φ_t)]
        = Cov(h′_i,t a_γ,i,t + φ′_t a_β,i,t, h′_j,t a_γ,j,t + φ′_t a_β,j,t) + 0
        = Cov(φ′_t a_β,i,t, φ′_t a_β,j,t)
        = a′_β,i,t B_t a_β,j,t    (5.7)

The conditional covariance term is zero because of the independence between M_i
and M_j given φ_t, which is the key assumption of the decouple-recouple modeling
strategy. At this point, we have finished the modifications for the multiscale modeling
context, which paves the road for the construction of the copula in Section 3 of Lavine
et al. (2020).
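The moments in Equations 5.6 and 5.7 translate directly into a few lines of linear algebra; a sketch with generic NumPy arrays (function and argument names are my own, dropping the i, t subscripts):

```python
import numpy as np

def latent_factor_moments(h, a_gamma, R_gamma, S, a_beta, R_beta, b, B):
    """Mean f and variance q of the linear predictor lambda_{i,t}
    (Equation 5.6) under the latent factor DGLM, given the partitioned
    state moments and the latent factor moments (b_t, B_t)."""
    f = h @ a_gamma + b @ a_beta
    q = (h @ R_gamma @ h          # regression-state uncertainty
         + 2 * h @ S @ b          # cross-covariance term
         + b @ R_beta @ b         # latent-coefficient uncertainty at b_t
         + a_beta @ B @ a_beta    # latent-factor uncertainty
         + np.trace(R_beta @ B))  # interaction of the two uncertainties
    return f, q

def latent_factor_cov(a_beta_i, a_beta_j, B):
    """Covariance between lambda_{i,t} and lambda_{j,t} (Equation 5.7)."""
    return a_beta_i @ B @ a_beta_j
```

The last two terms of q are exactly the extra uncertainty noted above, vanishing when B_t = 0, i.e. when the latent factor is known.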
5.2.2 Code
This subsection contains aspects of the code I developed for the main modeling
components of the thesis research. It covers aspects of the dynamic latent factor
framework that is part of the PyBATS package (https://lavinei.github.io/pybats/).
The first pair of functions extracts the linear predictor λ for the latent factor, while
the second pair generates scaled versions of the model coefficients. The latter can
be achieved by using dlm_coef_fxn() and dlm_coef_forecast_fxn() together with
merge_lf_with_predictor(), whose explanations are available at
https://lavinei.github.io/pybats/latent_factor.html
## Latent factor functions for the linear predictor lambda
# numpy is required; latent_factor, forecast_R_cov and related helpers are
# assumed to be imported from the PyBATS package.
import numpy as np

def lambda_fxn(date, mod, k, **kwargs):
    """
    Return the mean and variance of the linear predictor lambda.

    :param date: date index
    :param mod: model that is being run
    :param k: forecast horizon
    :param kwargs: other arguments
    :return: mean and variance of lambda
    """
    return (mod.F.T @ mod.m).copy().reshape(-1), (mod.F.T @ mod.C @ mod.F).copy()

def lambda_forecast_fxn(date, mod, k, forecast_path=False, **kwargs):
    """
    Return the forecast mean and variance of lambda, plus the path
    covariance when forecast_path is True.

    :param date: date index
    :param mod: model that is being run
    :param k: forecast horizon
    :param forecast_path: True or False
    :param kwargs: other arguments
    :return: forecast mean and variance, plus covariance of lambda
        when forecast_path is True
    """
    lambda_mean = []
    lambda_var = []
    if forecast_path:
        lambda_cov = [np.zeros([1, h]) for h in range(1, k)]
    for j in range(1, k + 1):
        f, q = mod.get_mean_and_var(mod.F, mod.a.reshape(-1), mod.R)
        lambda_mean.append(f.copy())
        lambda_var.append(q.copy())
        if forecast_path and j > 1:
            for i in range(1, j):
                lambda_cov[j-2][i-1] = mod.F.T @ forecast_R_cov(mod, i, j) @ mod.F
    if forecast_path:
        return lambda_mean, lambda_var, lambda_cov
    else:
        return lambda_mean, lambda_var

lambda_lf = latent_factor(gen_fxn=lambda_fxn,
                          gen_forecast_fxn=lambda_forecast_fxn)
## Latent factor functions for scaled model coefficients
def dlm_coef_scale_fxn(date, mod, scale=None, idx=None, scale_which=None,
                       **kwargs):
    """
    Get the mean and variance of the coefficient latent factor.

    :param date: date index
    :param mod: model that is being run
    :param scale: scalars used to scale the mean and variance, as known fixed
        values (for example, covariates of models that use this latent
        factor); a pandas data frame with scalars in columns and dates as
        index
    :param scale_which: indices of the coefficients to be scaled by the
        series in scale (must be within idx)
    :param idx: indices of the coefficients to extract
    :param kwargs: other arguments
    :return: mean and variance of the scaled coefficients
    """
    if scale is None:
        return dlm_coef_fxn(date, mod, idx, **kwargs)
    if idx is None:
        idx = np.arange(0, len(mod.m))
    if not set(scale_which).issubset(set(idx)):
        raise ValueError("scale_which needs to be a subset of idx")
    m_scale, C_scale = mod.m.copy(), mod.C.copy()
    scale_matrix = np.identity(C_scale.shape[0])
    scale_matrix[np.ix_(scale_which, scale_which)] = (
        scale.loc[date].values
        * scale_matrix[np.ix_(scale_which, scale_which)])
    m_scale = scale_matrix @ m_scale
    C_scale = scale_matrix @ C_scale @ scale_matrix
    return (m_scale[idx]).reshape(-1), (C_scale[np.ix_(idx, idx)]).copy()
def dlm_coef_scale_forecast_fxn(date, mod, k, scale=None, idx=None,
                                scale_which=None, forecast_path=False,
                                **kwargs):
    """
    Compute the forecast mean and variance of the scaled coefficients, plus
    the path covariance when forecast_path is True.

    :param date: date index
    :param mod: model that is being run
    :param k: forecast horizon
    :param scale: scalars used to scale the mean and variance, as known fixed
        values (for example, covariates of models that use this latent
        factor); a pandas data frame with scalars in columns and dates as
        index
    :param scale_which: indices of the coefficients to be scaled by the
        series in scale (must be within idx)
    :param idx: indices of the coefficients to extract
    :param forecast_path: True or False
    :param kwargs: other arguments
    :return: forecast mean and variance, plus covariance when forecast_path
        is True
    """
    if scale is None:
        # pass through idx and forecast_path to the unscaled version
        return dlm_coef_forecast_fxn(date, mod, k, idx=idx,
                                     forecast_path=forecast_path, **kwargs)
    if idx is None:
        idx = np.arange(0, len(mod.m))
    p = len(idx)
    if not set(scale_which).issubset(set(idx)):
        raise ValueError("scale_which needs to be a subset of idx")
    dlm_coef_mean = []
    dlm_coef_var = []
    if forecast_path:
        dlm_coef_cov = [np.zeros([p, p, h]) for h in range(1, k)]
    for j in range(1, k + 1):
        a, R = forecast_aR(mod, j)
        a_scale = a.copy()
        R_scale = R.copy()
        scale_matrix = np.identity(R_scale.shape[0])
        scale_matrix[np.ix_(scale_which, scale_which)] = (
            scale.loc[date].values
            * scale_matrix[np.ix_(scale_which, scale_which)])
        a_scale = scale_matrix @ a_scale
        R_scale = scale_matrix @ R_scale @ scale_matrix
        dlm_coef_mean.append(a_scale[idx].copy().reshape(-1))
        dlm_coef_var.append(R_scale[np.ix_(idx, idx)].copy())
        if forecast_path and j > 1:
            for i in range(1, j):
                R_cov_scale = forecast_aR(mod, i)[1]
                R_cov_scale = scale_matrix @ R_cov_scale @ scale_matrix
                Gk = np.linalg.matrix_power(mod.G, j - i)
                dlm_coef_cov[j-2][:, :, i-1] = (Gk @ R_cov_scale)[np.ix_(idx, idx)]
    if forecast_path:
        return dlm_coef_mean, dlm_coef_var, dlm_coef_cov
    else:
        return dlm_coef_mean, dlm_coef_var

# register the scaled versions as a latent factor
dlm_coef_scale_lf = latent_factor(gen_fxn=dlm_coef_scale_fxn,
                                  gen_forecast_fxn=dlm_coef_scale_forecast_fxn)
5.3 Clustering Visualization
Below is the Python code that outputs the interactive 3D plots exemplified by Figure
3.1. It includes the required packages and the data manipulation before the plotly
plotting commands.
import pandas as pd
import numpy as np
from functools import reduce
from plotly.graph_objects import Scatter3d, Volume
from plotly.subplots import make_subplots
import plotly.io as pio

pio.renderers.default = "browser"

## define a function to merge multiple dataframes
def df_merge(df, on, how='outer'):
    """
    :param df: list of dataframes to be merged
    :param on: column(s) to merge on
    :param how: 'outer', 'inner', or the index of the dataframe to put first
    :return: a merged dataframe
    """
    if how in ['outer', 'inner']:
        return reduce(lambda x, y: pd.merge(x, y, on=on, how=how), df)
    else:
        df_reorder = [df.pop(how)]
        df_reorder.extend(df)
        return reduce(lambda x, y: pd.merge(x, y, on=on, how="left"),
                      df_reorder)
## read in the provided data
data = pd.read_pickle('The household data')

## clean and create sensitivity data
item_discount_ratio = data.loc[:, ['date', 'item', 'household', 'discount',
                                   'discount_pot', 'item_qty', 'net_price',
                                   'regular_price']]
item_discount_ratio['discount_percentage'] = (
    (item_discount_ratio['regular_price'] - item_discount_ratio['net_price'])
    / item_discount_ratio['regular_price'])
item_discount_ratio['discount_sen'] = (
    item_discount_ratio.discount == item_discount_ratio.discount_pot)
sen = item_discount_ratio.groupby(['item', 'household'], as_index=False,
                                  observed=True)[
    ['discount_sen', 'discount_pot', 'item_qty']].mean().\
    sort_values(by=['discount_sen', 'discount_pot', 'item_qty'],
                ascending=[False, False, True])

## clean and create discounted purchase data
item_discount_ratio_buyd = item_discount_ratio.loc[
    item_discount_ratio.discount_pot == 1]
item_discount_ratio_buyd['buy_discount'] = item_discount_ratio_buyd.item_qty > 0
buyd = item_discount_ratio_buyd.groupby(['item', 'household'], as_index=False,
                                        observed=True)[
    ['buy_discount', 'item_qty', 'discount_percentage']].mean().\
    sort_values(by=['buy_discount', 'item_qty', 'discount_percentage'],
                ascending=[False, True, False])

## clean and create regular purchase data
item_discount_ratio_buyr = item_discount_ratio.loc[
    item_discount_ratio.discount_pot == 0]
item_discount_ratio_buyr['buy_regular'] = item_discount_ratio_buyr.item_qty > 0
buyr = item_discount_ratio_buyr.groupby(['item', 'household'], as_index=False,
                                        observed=True)[
    ['buy_regular', 'item_qty']].mean().\
    sort_values(by=['buy_regular', 'item_qty'], ascending=[False, True])
# create extra variables for plotting
buy_data = df_merge([sen, buyd, buyr], on=['item', 'household'], how='outer')
buy_data.columns = ['item', 'household', 'Discount sensitivity',
                    'Discount offered', 'Sales', 'Discount buy',
                    'Discount sales', 'Discount percent',
                    'Regular buy', 'Regular sales']
buy_data.iloc[:, 2:] = buy_data.iloc[:, 2:].fillna(0)
buy_data['Buy more with discount'] = (buy_data['Discount buy']
                                      > buy_data['Regular buy'])
buy_data['Discount level'] = ['small' if d < 0.25 else 'median' if d < 0.6
                              else 'large' for d in buy_data['Discount percent']]
# 3D interactive plots for households
# Initialize a figure of 4x4 3D subplot panels
rows = 4
cols = 4
specs = [[{'type': 'scene'} for j in range(cols)] for i in range(rows)]
subplot_titles = ['panel' + str(i) for i in range(16)]
fig = make_subplots(rows=rows, cols=cols,
                    specs=specs, subplot_titles=subplot_titles)

## There are 211 items, thus 211 plot panels; indices 211 to 274 are used to
## create color shadows for each customer group
for item in range(275):
## Plot 3D scatterplot for each household item pair
if item < 211:
data = buy_data.loc[buy_data.item == item]
# Generate data
x = data[āDiscount offeredā]
y = data[āDiscount buyā]
z = data[āRegular buyā]
symbol = data[āBuy more with discountā].map({True: ācircleā,
āŖā False: āxā})
color = data[āDiscount levelā].map({āsmallā: āblueā, āmedianā:
āŖā āgreenā, ālargeā: āredā})
size = data[āDiscount salesā]
size = (size - np.min(size)) / (np.max(size) - np.min(size)) *
53
āŖā 20 + 6
# adding surfaces to subplots.
fig.add_trace(
Scatter3d(
x=x,
y=y,
z=z,
name=āitemā + str(item),
visible=False,
## more information added to the scatterplots
customdata=np.stack((data[āhouseholdā].values,
data[āBuy more with discountā].values,
data[āBuy more with discountā].map({True: āCircleā, False: āXā
āŖā }).values,
data[āDiscount levelā].values,
data[āDiscount levelā].map(
{āsmallā: ā<25%ā, āmedianā: ā25-60%ā, ālargeā: ā>60%ā}).values
āŖā ),
axis=-1),
mode=āmarkersā,
marker=dict(
size=size,
color=color,
cauto=True,
54
symbol=symbol,
opacity=0.8
),
hovertemplate=
ā<b>Household</b>: %{customdata[0]}<br>ā +
ā<b>Discount offered</b>: %{x:.0%}<br>ā +
ā<b>Discount buy</b>: %{y:.0%}<br>ā +
ā<b>Regular buy</b>: %{z:.0%}<br>ā +
ā<b>Discount sales</b>: %{marker.size:.2f} units<br>ā +
ā<b>Buy more with discount</b>: %{customdata[1]} (%{customdata
āŖā [2]})<br>ā +
ā<b>Discount level</b>: %{customdata[3]} (%{customdata[4]})<br
āŖā >ā,
hoverlabel=dict(bgcolor=color)
),
row=(np.floor((item % 16) / 4) + 1).astype(āintā), col=(item %
āŖā 16) % 4 + 1)
if item < 16:
fig[ālayoutā][āsceneā + str(item + 1)][āxaxisā] = {ātitleā: {ā
āŖā textā: āDiscount offeredā}}
fig[ālayoutā][āsceneā + str(item + 1)][āyaxisā] = {ātitleā: {ā
āŖā textā: āDiscount buyā}}
fig[ālayoutā][āsceneā + str(item + 1)][āzaxisā] = {ātitleā: {ā
āŖā textā: āRegular buyā}}
    else:
        ## Create shadow for each customer group
        if (item - 211) % 4 == 0:
            X, Y, Z = np.mgrid[0:0.5:2j, 0:1:2j, 0.5:1:2j]
            values = np.zeros(X.shape)
        elif (item - 211) % 4 == 1:
            X, Y, Z = np.mgrid[0.5:1:2j, 0.5:1:2j, 0:1:2j]
            values = np.ones(X.shape)
        elif (item - 211) % 4 == 2:
            X, Y, Z = np.mgrid[0:0.5:2j, 0:1:2j, 0:0.5:2j]
            values = np.ones(X.shape) * 2
        elif (item - 211) % 4 == 3:
            X, Y, Z = np.mgrid[0.5:1:2j, 0:0.5:2j, 0:1:2j]
            values = np.ones(X.shape) * 3
        x = X.flatten()
        y = Y.flatten()
        z = Z.flatten()
        value = values.flatten()
        fig.add_trace(
            Volume(
                name='group' + str((item - 211) % 4),
                x=x,
                y=y,
                z=z,
                value=value,
                opacity=0.3,  # needs to be small to see through all surfaces
                surface_count=50,  # needs to be a large number for good volume rendering
                colorscale="RdBu",
                showlegend=False,
                showscale=False,
                isomax=3,
                isomin=0,
                hovertemplate='<b>Group</b>: #%{value:.0f}'
            ),
            row=(item - 211) // 4 // 4 + 1,
            col=(item - 211) // 4 % 4 + 1
        )
# create buttons: two per page of 16 items, with and without the group shadows.
# 211 items give 13 full pages of 16 plus a final page of 3 (items 208--210).
buttons = []
for i in range(14):
    if i < 13:
        label = "item" + str(i * 16) + "--" + str((i + 1) * 16 - 1)
    else:
        label = "item208--210"
    buttons.append(dict(method='update',
                        args=[{"visible": [item // 16 == i for item in range(211)] +
                                          [False] * 64}],
                        label=label))
    buttons.append(dict(method='update',
                        args=[{"visible": [item // 16 == i for item in range(211)] +
                                          [True] * 64}],
                        label=label + " grouped"))
fig.update_layout(
    scene=dict(
        xaxis_title='Discount offered',
        yaxis_title='Discount buy',
        zaxis_title='Regular buy'),
    title_text='Comprehensive 3D plots for all items (Author: Daniel Deng)',
    height=2500,
    width=1800,
    updatemenus=[dict(type='buttons',
                      buttons=buttons,
                      x=1.09,
                      xanchor='left',
                      y=1,
                      yanchor='top')],
    hovermode='closest'
)
fig.show()
Chapter 6
Conclusions and Summary
Recently, the application of statistical modeling to commercial problems has surged, as
people have discovered the tremendous upside once insights are revealed. Businesses like
Walmart, Amazon, and Harris Teeter have begun to seek statistical methods that improve their
decision making and thereby consolidate their customer relationships.
In this thesis, I introduce multi-scale modeling within the Bayesian Dynamic
Modeling framework. It showcases the power of hierarchical, sequential, probabilistic
and computationally efficient models, and emphasizes the novel decouple/recouple
modeling strategy, which propagates signals down the hierarchy. I also demonstrate
the resulting improvement in forecasting accuracy.
This method aims to mitigate the difficulty of forecasting sporadic data, which
constitutes the finest level of our hierarchy. This has been a challenge for years,
so the improvement brought by this approach is another step forward. The multi-
scale modeling successfully inherits the hierarchical structure of the retail setting:
households visit a store and spend on items, and purchasing outcomes connect from
large categories of items to small, refined categories, and eventually to specific items.
In Chapter 3, I elaborate the design and criteria I use to classify thousands
of households based on their purchasing behaviors. This not only enables the multi-
scale modeling of Chapter 2, but also sets an example of visualized learning, providing
a valuable way of thinking about customer behaviors. The case study exemplifies
the identification of price-sensitive households, which paves the way for customized
decision analysis.
In the end, making good decisions is the ultimate goal of modeling. In Chapter 4,
I explore the optimal-discount problem under various models. I also attempt the decision
analysis over a longer period of time (a year). Even though the results
(Section 4.4) align with business sense, the use of the model over such a long time
span remains questionable.
Finally, I describe my programming contributions to this project
(Yanchenko et al., 2021). First, my research has contributed extensions and func-
tionality for latent factor dynamic modeling to the existing PyBATS package. Sec-
ond, my development of innovative data assessment and dynamic visualization with
household labeling has produced software that is available for further applications.
Future Work and Comments
This thesis presents a novel approach to efficiently forecast sparse time series. How-
ever, the decision analysis based on the model has much more to explore than is presented
in Chapter 4. The main obstacle is long-term forecasting. First, one needs to define
how long "long-term" is, based on the context. For instance, three to six months might
be long for a retail setting, while one to two years can be short for earthquake or volcanic-
eruption forecasting. Problems with more artificial components are generally easier than
those without control. Second, accounting for the significant factors can be challenging.
Sometimes, even for a social-behavior problem like retailing, there may be unex-
pected shocks that make our forecast obsolete (e.g., COVID-19 in 2020). Lastly, the
uncertainty associated with our forecast increases rapidly with the length of the horizon
and the number of uncertain factors. This can leave us with a statistically sound forecast
that has no pragmatic use.
I prefer to think about long-term forecasting in the following way.
No one truly foresees the future. Instead, we can only study the past for insights that
help our decision making in the present, which in turn has a significant impact on
the future. As statisticians, we learn from history in a quantitative way: from
data. We extract and summarize information buried in the data that is not visible
to the naked eye. As a result, interpretability and openness are key, assuming that
we do not trust some "black box" to determine our future (consider what happened
to Catholicism when the plague hit). Therefore, I think the problem of long-term
forecasting is not simply a modeling or mathematical problem; rather, it is closely
related to the horizon of total human knowledge.
Returning to statistical modeling and decision analysis, a rational decision maker
should listen to multiple sources in order to reduce uncertainty about the quality of
individual agents. Bayesian Predictive Synthesis (McAlinn et al., 2020; West and Crosse,
1992; West, 1992) provides a potential framework for future researchers. A decision
maker using such a framework takes into account all probabilistic information from the
available agents and updates his or her own opinion on the quantity of interest.
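As a far simpler illustration of the idea of combining forecasts from multiple agents (a linear opinion pool rather than full Bayesian Predictive Synthesis; the function name and weighting scheme here are my own, for illustration only):

```python
import numpy as np

def linear_pool(agent_samples, weights, seed=0):
    """Draw from a weighted mixture of the agents' predictive samples."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize to a probability vector
    rng = np.random.default_rng(seed)
    n = min(len(s) for s in agent_samples)     # common sample size across agents
    picks = rng.choice(len(agent_samples), size=n, p=weights)
    return np.array([agent_samples[a][i] for i, a in enumerate(picks)])
```

Full BPS goes well beyond this: it learns the synthesis function, and hence the agent weights, dynamically from each agent's past forecast performance.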
Appendix A
DGLMs
• yt denotes the time series of interest, whether it is continuous, binary or a
non-negative count.
• At any given time t, the available information is denoted by Dt = {yt, Dt−1, It−1},
where It−1 is any relevant additional information at time t−1.
• Ft, θt are the dynamic regression vector and state vector at time t, respectively.
• λt = F′t θt, where λt is the linear predictor at time t. It links the parameter of
interest to the linear regression via a link function, i.e., λt = logit(πt) for the bino-
mial DGLM and λt = log(μt) for the Poisson DGLM, where πt, μt are the probability
of success and the mean for these processes.
• The state vector θt evolves via θt = Gt θt−1 + wt with wt ∼ (0, Wt), where Gt is the
known evolution matrix and wt is the stochastic innovation vector.
• wt is independent of current and past states, with moments E[wt|Dt−1, It−1] = 0
and V[wt|Dt−1, It−1] = Wt.
A.1 VBLB
1. Current information is summarized in the mean vector and variance matrix of the
posterior state vector: θt−1 | Dt−1, It−1 ∼ [mt−1, Ct−1].
2. Via the evolution equation θt = Gt θt−1 + wt, the implied 1-step-ahead prior
moments at time t are θt | Dt−1, It−1 ∼ [at, Rt], with at = Gt mt−1 and
Rt = Gt Ct−1 G′t + Wt.
3. The time t conjugate prior satisfies E[λt|Dt−1, It−1] = ft = F′t at and V[λt|Dt−1, It−1] =
qt = F′t Rt Ft.
i.e.
Binomial: yt ∼ Bin(ht, πt), conjugate prior πt ∼ Be(αt, βt), with ft =
ψ(αt) − ψ(βt) and qt = ψ′(αt) + ψ′(βt), where ψ(x), ψ′(x) are the digamma and
trigamma functions.
Poisson: yt ∼ Poi(μt), conjugate prior μt ∼ Ga(αt, βt), with ft = ψ(αt) −
log(βt) and qt = ψ′(αt).
4. Forecast yt 1-step ahead using the conjugacy-induced predictive distribution
p(yt|Dt−1, It−1). This can be simulated trivially.
5. Observing yt, update to the posterior.
i.e.
Binomial: conjugate posterior πt ∼ Be(αt + yt, βt + ht − yt).
Poisson: conjugate posterior μt ∼ Ga(αt + yt, βt + 1).
6. Update the posterior mean and variance of the linear predictor λt: gt = E[λt|Dt]
and pt = V[λt|Dt].
7. Linear Bayes estimation gives the posterior moments mt = at + Rt Ft (gt − ft)/qt
and Ct = Rt − Rt Ft F′t Rt (1 − pt/qt)/qt.
This completes the time (t−1)-to-t evolve-predict-update cycle.
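The evolve-predict-update cycle above can be sketched for the Poisson case as follows. This is a minimal illustration, not the PyBATS implementation; in particular, the Newton inversion of the trigamma function is just one simple way to match the conjugate Gamma prior to the moments (ft, qt).

```python
import numpy as np
from scipy.special import digamma, polygamma

def solve_gamma_params(f, q, iters=30):
    """Invert psi(a) - log(b) = f, psi'(a) = q for the conjugate Gamma(a, b) prior."""
    a = max(1.0 / q, 1e-3)                 # initial guess: psi'(a) ~ 1/a for large a
    for _ in range(iters):                 # Newton iterations on psi'(a) = q
        a -= (polygamma(1, a) - q) / polygamma(2, a)
        a = max(a, 1e-6)
    b = np.exp(digamma(a) - f)
    return a, b

def poisson_dglm_step(m, C, G, W, F, y):
    """One evolve-predict-update cycle of a Poisson DGLM (steps 1-7 above)."""
    # steps 1-2: evolve to the 1-step-ahead prior moments [a_t, R_t]
    a_vec = G @ m
    R = G @ C @ G.T + W
    # step 3: prior moments of the linear predictor lambda_t = F' theta_t
    f = F @ a_vec
    q = F @ R @ F
    alpha, beta = solve_gamma_params(f, q)   # conjugate Gamma prior matched to (f, q)
    # steps 5-6: observe y; posterior Gamma(alpha + y, beta + 1) and its moments on lambda
    g = digamma(alpha + y) - np.log(beta + 1.0)
    p = polygamma(1, alpha + y)
    # step 7: linear Bayes update of the state moments
    RF = R @ F
    m_new = a_vec + RF * (g - f) / q
    C_new = R - np.outer(RF, RF) * (1.0 - p / q) / q
    return m_new, C_new
```

Step 4 (simulating the predictive) is omitted here; under conjugacy it is a negative binomial draw.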
A.2 Discount Factors
• The regression vector Ft can include an intercept and known quantities, such as the
price of an item or an indicator of whether the item is on promotion.
i.e.
F′t = (1, pricet, promotiont, 1, 0, 1, 0, 1, 0)
• The evolution matrix Gt is usually a block-diagonal matrix. For regular covariates
in Ft, Gt takes values of 1 to allow the corresponding coefficients to
evolve with the random innovation wt, while Gt can also include seasonal effects
by adding blocks of seasonal components.
i.e.
Gt = blockdiag(1, 1, 1, H1, H2, H3), where
Hj = [  cos(2πj/7)   sin(2πj/7)
       −sin(2πj/7)   cos(2πj/7) ],  for j = 1, 2, 3.
• The evolution variance matrix Wt can be controlled by discount factors δj ∈ (0, 1],
j = 1:J, via the following design:
Note that Rt = Gt Ct−1 G′t + Wt.
Let Pt = Gt Ct−1 G′t and Wt = blockdiag(Pt,1 (1 − δ1)/δ1, . . . , Pt,J (1 − δJ)/δJ),
where Pt,j is the corresponding diagonal block of Pt.
This design enables separate discount factors for different components: each
component's uncertainty increases by the factor (1 − δj)/δj while maintaining the correlations
in Pt,j.
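A small sketch of this discounting construction (the function and argument names here are my own):

```python
import numpy as np
from scipy.linalg import block_diag

def discounted_W(G, C_prev, blocks, deltas):
    """Build W_t from component-wise discount factors.

    blocks: list of (start, stop) index ranges, one per component
    deltas: one discount factor in (0, 1] per component
    """
    P = G @ C_prev @ G.T                        # P_t = G_t C_{t-1} G'_t
    W_parts = [P[s:e, s:e] * (1.0 - d) / d      # inflate each diagonal block
               for (s, e), d in zip(blocks, deltas)]
    return block_diag(*W_parts)
```

With a single component, Rt = Pt + Wt = Pt/δ, i.e. the prior variance is simply the propagated posterior variance inflated by 1/δ.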
A.3 Random Effects
• Applicable to any DGLM.
• Captures additional variation.
• Extended state vector: θt = (ξt, θ′t,0)′ and regression vector: Ft = (1, F′t,0)′,
where ξt is a sequence of independent, zero-mean random effects and θt,0, Ft,0
are the baseline state vector and regression vector. Extended linear predictor:
λt = ξt + λt,0.
• ξt provides additional, day-specific "shocks" to the latent coefficients.
• A random-effect discount factor ρ ∈ (0, 1] is used to control the level of vari-
ability injected (in a similar fashion to the other discount factors):
i.e.
with qt,0 = V[λt,0|Dt−1, It−1], let vt = V[ξt|Dt−1, It−1] = qt,0 (1 − ρ)/ρ, which inflates
the variance of λt by the factor (1 − ρ)/ρ.
A.4 Multi-scale Modeling
• Use the decouple/recouple method to enable information sharing across series as
well as scalability.
• Add information at the aggregate level to avoid being obscured by noise.
• Each of the N univariate series has a state vector and regression vector
defined by
Mi : θi,t = (γ′i,t, β′i,t)′,  Fi,t = (f′i,t, φ′t)′,  i = 1:N,
which implies λi,t = γ′i,t fi,t + β′i,t φt, where the first term contains series-specific
information, while φt is a latent factor shared by all series.
• The common latent factor φt can be any shared signal, modeled by an-
other DGLM, denoted M0; conditional on φt, the updating and forecasting
of each Mi proceed separately and in parallel.
• This decoupling/recoupling technique enables scalability over the N individual
series while creating linkage across series.
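The conditional decoupling can be sketched as the following pattern (a schematic only; the fitting functions are placeholders, and the thesis code uses joblib for the parallel step):

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def fit_multiscale(fit_aggregate, fit_series, series_list, n_jobs=1):
    """Decouple/recouple pattern: fit M0 once, then each M_i conditionally."""
    phi = fit_aggregate()                       # M0: shared latent factor phi_{1:T}
    worker = partial(fit_series, phi=phi)       # each M_i conditions on phi
    if n_jobs == 1:
        return [worker(s) for s in series_list]
    with ProcessPoolExecutor(max_workers=n_jobs) as ex:
        return list(ex.map(worker, series_list))
```

Because the Mi never communicate with each other given φt, the per-series cost is constant and the whole scheme scales linearly (and embarrassingly parallel) in N.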
Appendix B
More Figures
Here are the figures of the parameter distributions from the case study in Section 4.4.
We can see that one is able to boost the shifted mean μt of the DCMMs and the probability
πt by offering more discounts, provided the item remains profitable.
(a) Distributions of optimal Bernoulli probability over a year
(b) Distributions of optimal Poisson mean over a year
Figure B.1: Distributions of simulated parameters over a year (p/c = 1.2).
(a) Distributions of optimal Bernoulli probability over a year
(b) Distributions of optimal Poisson mean over a year
Figure B.2: Distributions of simulated parameters over a year (p/c = 2).
(a) Distributions of optimal Bernoulli probability over a year
(b) Distributions of optimal Poisson mean over a year
Figure B.3: Distributions of simulated parameters over a year (p/c = 10).
Appendix C
More Code
This last appendix contains the code used for the modeling.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
####### My pybats is called pybats_latest; you need to change that to your file name.
from pybats_latest.analysis import analysis, analysis_dcmm
from pybats_latest.latent_factor import dlm_coef_scale_fxn, dlm_coef_scale_forecast_fxn, \
    latent_factor, merge_lf_with_predictor, dlm_coef_fxn, dlm_coef_forecast_fxn
from pybats_latest.point_forecast import zape_point_estimate
from sklearn.metrics import roc_auc_score, f1_score
from functools import partial, reduce
## These are used for the actual modeling; see the commented part at the bottom.
from complement import create_agg_data, latent_factor_generator, multi_scale_modeling, \
    list_files
from joblib import Parallel, delayed
import multiprocessing
import time
import os
def create_agg_data(directory, data_names, data_name):
    """
    function that creates aggregate-level data
    :param directory: directory of the data and where you want to save it
    :param data_names: data paths for each individual data file
    :param data_name: name of the data file to be stored
    :return:
    """
    item_total_sales = np.zeros(112)
    item_total_transaction = np.zeros(112)
    item_discount = np.zeros(112)
    item_discount_perc = np.zeros(112)
    total_household = np.zeros(112)
    for f in data_names:
        data = pd.read_pickle(directory + '/' + f)
        item_total_sales += data.item_qty.values
        item_total_transaction += (data.item_qty > 0).astype('int').values
        item_discount += data.discount_pot.values
        item_discount_perc += data.discount_amount_pot.values / data.regular_price.values
        total_household += 1
    item_discount = item_discount / len(data_names)
    item_discount_perc = item_discount_perc / len(data_names)
    item_data = pd.DataFrame({'date': data.date.values,
                              'total_sales': item_total_sales,
                              'total_transaction': item_total_transaction,
                              'discount': item_discount,
                              'discount_perc': item_discount_perc,
                              'total_household': total_household})
    item_data.to_pickle(directory + '/' + data_name)
def latent_factor_generator(agg_data_path):
    """
    function that generates the latent factor at the aggregate level
    :param agg_data_path: a string of the data path
    :return: a list of latent factors
    """
    agg_data = pd.read_pickle(agg_data_path)
    Y_sales = agg_data.total_sales.values
    X = np.c_[agg_data.discount.values, agg_data.discount_perc.values]
    # X[-8:, 0] = 1
    # X[-8:, 1] += 0.1
    n = agg_data.total_household.values
    # latent factor parameters
    prior_length = 8
    nsamps = 5000
    delregn = 0.98
    deltrend = 0.98
    delseas = 0.98
    rho = 0.6
    adapt_discount = None
    forecast_start = 52
    forecast_end = 103
    k = 8
    T = 112
    start_date = pd.to_datetime('2017-09-05')  # Make up a start date
    dates = pd.date_range(start_date, start_date + pd.DateOffset(days=T - 1), freq='D')
    forecast_start_date = start_date + pd.DateOffset(days=forecast_start)
    forecast_end_date = dates[-1] - pd.DateOffset(days=k)
    # latent factor modeling
    ## latent factor for sales
    idx = np.array([2])
    dlm_coef_fxn_sales = partial(dlm_coef_fxn, idx=idx)
    dlm_coef_forecast_fxn_sales = partial(dlm_coef_forecast_fxn, idx=idx)
    discount_sensitivity_lf_sales = latent_factor(
        gen_fxn=dlm_coef_fxn_sales,
        gen_forecast_fxn=dlm_coef_forecast_fxn_sales)
    discount_latent_sales = analysis(
        Y=Y_sales, X=X, family='poisson', prior_length=prior_length,
        k=k, rho=rho,
        forecast_start=forecast_start, forecast_end=forecast_end,
        forecast_start_date=forecast_start_date, forecast_end_date=forecast_end_date,
        dates=dates,
        nsamps=nsamps,
        deltrend=deltrend, delregn=delregn, adapt_discount=adapt_discount,
        ret=['new_latent_factors'],
        new_latent_factors=[discount_sensitivity_lf_sales.copy()])
    # ##### This part can be replaced by a function called latent_factor_plot().
    # M = np.array([])
    # V = np.array([])
    #
    # for date in dates[forecast_start:forecast_end]:
    #     m, v = discount_latent_sales.get_lf_forecast(date)
    #     M = np.append(M, m[0])
    #     V = np.append(V, v[0])
    #
    # lf_mean = pd.DataFrame({'average': M, 'upper': M + np.sqrt(V),
    #                         'lower': M - np.sqrt(V),
    #                         'date': dates[forecast_start:forecast_end]})
    # fig, ax = plt.subplots(1, 1)
    # ax.plot(np.arange(0, len(lf_mean.date), 1), lf_mean.average.values,
    #         color='red', alpha=0.5, label='mean')
    # ax.fill_between(np.arange(0, len(lf_mean.date), 1),
    #                 lf_mean.upper.values, lf_mean.lower.values,
    #                 alpha=0.4, label='unit sd region')
    # plt.xticks(rotation=20)
    # plt.legend()
    # ax.set_ylabel("Coefficient of average discount percentage")
    #
    # ax.set_title("Coefficient of " + "item" + "Your name")
    # fig.savefig("Your path")
    return [discount_latent_sales]
def multi_scale_modeling(data_path, latent_factor):
    """
    implement multi-scale modeling on the data for one household-item series
    :param data_path: path of the time series chosen
    :param latent_factor: latent factor to be used
    :return:
    """
    discount_latent = latent_factor
    try:
        data = pd.read_pickle(data_path)
        Y = data.item_qty.values
        buy = (Y > 0).astype('float')
        X = np.c_[data.discount_pot.values,
                  data.discount_amount_pot.values / data.regular_price.values]
        # X[-8:, 0] = 1
        # X[-8:, 1] += 0.1
        household = data.household.iloc[0]
        group = data_path[-5:-4]
        # model parameters
        prior_length = 52
        nsamps = 5000
        delregn = 0.998
        deltrend = 0.998
        delseas = 0.998
        rho = 0.6
        adapt_discount = None
        forecast_start = 52
        forecast_end = 103
        k = 8
        T = len(Y)
        start_date = pd.to_datetime('2017-09-05')  # Make up a start date
        dates = pd.date_range(start_date, start_date + pd.DateOffset(days=T - 1), freq='D')
        forecast_start_date = start_date + pd.DateOffset(days=forecast_start)
        forecast_end_date = dates[-1] - pd.DateOffset(days=k)
        ### You can add your own signal as a latent factor in the latent factor list.
        ### Here the first one is my latent factor: discount times coefficients.
        discount_latent[0] = merge_lf_with_predictor(discount_latent[0], X[:, 1], dates)
        # discount_latent[1] = "Your latent factor" (you will need to append it to the
        # input of this function)
        # this function plots mean and sd shadows for the latent factors in the input
        # latent_factor_plot(discount_latent, directory=, names=)
        print("begin " + str(household))
        try:
            samples = analysis_dcmm(
                Y=Y, X=X[:, 0].reshape(112, -1), k=k, prior_length=prior_length,
                forecast_start=forecast_start, forecast_end=forecast_end,
                forecast_start_date=forecast_start_date,
                forecast_end_date=forecast_end_date,
                dates=dates, latent_factor=discount_latent[0],
                nsamps=nsamps, rho=rho,
                delseas=delseas, deltrend=deltrend, delregn=delregn,
                adapt_discount=adapt_discount,
                ret=['forecast'])
            print(samples.shape)
            # point forecasts
            buy_samples = (samples > 0).astype('float')
            medians = np.median(buy_samples[:, :, 0], axis=0).astype('int')
            # probs = np.mean(buy_samples, axis=(0, 2))
            # performance scores
            accuracy = (medians == buy[forecast_start + 1:forecast_end + 2]).astype('float').mean()
            f1 = f1_score(buy[forecast_start + 1:forecast_end + 2].astype('int'), medians)
            naive = (X[:, 0][forecast_start + 1:forecast_end + 2] ==
                     buy[forecast_start + 1:forecast_end + 2]).astype('float').mean()
            naive_f1 = f1_score(buy[forecast_start + 1:forecast_end + 2].astype('int'),
                                X[:, 0][forecast_start + 1:forecast_end + 2].astype('int'))
            zape = zape_point_estimate(samples)
            print(str(household) + " finished")
            return [household, accuracy, f1, naive, naive_f1, zape,
                    np.median(buy_samples, axis=0)]
        except ValueError:
            print("error!!!!!!!!!!!")
    except EOFError:
        print("Oops")
def latent_factor_plot(latent_factor, directory, names):
    """
    :param latent_factor: a list of latent factors you want to plot
    :param directory: path where you want to save the figures
    :param names: list of names for the figures
    :return: it saves the figures to the given directory
    """
    # note: relies on dates, forecast_start and forecast_end from the enclosing scope
    for l, n in zip(latent_factor, names):
        M = np.array([])
        V = np.array([])
        for date in dates[forecast_start:forecast_end]:
            m, v = l.get_lf_forecast(date)
            M = np.append(M, m[0])
            V = np.append(V, v[0])
        lf_mean = pd.DataFrame({'average': M, 'upper': M + np.sqrt(V),
                                'lower': M - np.sqrt(V),
                                'date': dates[forecast_start:forecast_end]})
        fig, ax = plt.subplots(1, 1)
        ax.plot(np.arange(0, len(lf_mean.date), 1), lf_mean.average.values,
                color='red', alpha=0.5, label='mean')
        ax.fill_between(np.arange(0, len(lf_mean.date), 1),
                        lf_mean.upper.values, lf_mean.lower.values,
                        alpha=0.4, label='unit sd region')
        plt.xticks(rotation=20)
        plt.legend()
        ax.set_ylabel("Coefficient multiplied by household discount percent")
        ax.set_title("Latent factor for " + n)
        fig.savefig(directory + '/' + n + '.png')
## Here is how I fit those models; you can adapt this if you want to.
## These work with the functions above.
# # Data names
# item72_names_group0 = list_files(os.getcwd() + "/Data/Items/group0", "item72", ".pkl")
# item62_names_group1 = list_files(os.getcwd() + "/Data/Items/group1", "item62", ".pkl")
# item17_names_group2 = list_files(os.getcwd() + "/Data/Items/group2", "item17", ".pkl")
# item76_names_group3 = list_files(os.getcwd() + "/Data/Items/group3", "item76", ".pkl")
#
# # # create aggregate-level data
# create_agg_data(os.getcwd() + "/Data/Items/group0", item72_names_group0, 'agg-72-group0.pkl')
# create_agg_data(os.getcwd() + "/Data/Items/group1", item62_names_group1, 'agg-62-group1.pkl')
# create_agg_data(os.getcwd() + "/Data/Items/group2", item17_names_group2, 'agg-17-group2.pkl')
# create_agg_data(os.getcwd() + "/Data/Items/group3", item76_names_group3, 'agg-76-group3.pkl')
#
#
# # create latent factors
# latent_factor72 = latent_factor_generator(os.getcwd() + "/Data/Items/group0/" + 'agg-72-group0.pkl')
# latent_factor62 = latent_factor_generator(os.getcwd() + "/Data/Items/group1/" + 'agg-62-group1.pkl')
# latent_factor17 = latent_factor_generator(os.getcwd() + "/Data/Items/group2/" + 'agg-17-group2.pkl')
# latent_factor76 = latent_factor_generator(os.getcwd() + "/Data/Items/group3/" + 'agg-76-group3.pkl')
#
#
# # parallelism
# num_cores = multiprocessing.cpu_count()
#
# scores72 = []
# scores72.append(Parallel(n_jobs=num_cores)(
#     delayed(multi_scale_modeling)(data_path="Data/Items/group0/" + data_path,
#                                   latent_factor=latent_factor72)
#     for data_path in item72_names_group0))
#
# scores62 = []
# scores62.append(Parallel(n_jobs=num_cores)(
#     delayed(multi_scale_modeling)(data_path="Data/Items/group1/" + data_path,
#                                   latent_factor=latent_factor62)
#     for data_path in item62_names_group1))
#
# scores17 = []
# scores17.append(Parallel(n_jobs=num_cores)(
#     delayed(multi_scale_modeling)(data_path="Data/Items/group2/" + data_path,
#                                   latent_factor=latent_factor17)
#     for data_path in item17_names_group2))
#
# scores76 = []
# scores76.append(Parallel(n_jobs=num_cores)(
#     delayed(multi_scale_modeling)(data_path="Data/Items/group3/" + data_path,
#                                   latent_factor=latent_factor76)
#     for data_path in item76_names_group3))
#
# # get rid of results that are None
# scores72 = [[score for score in scores72[0] if score is not None]]
# scores62 = [[score for score in scores62[0] if score is not None]]
# scores17 = [[score for score in scores17[0] if score is not None]]
# scores76 = [[score for score in scores76[0] if score is not None]]
#
# # print out how many households are left
# print(len(scores72[0]))
# print(len(scores62[0]))
# print(len(scores17[0]))
# print(len(scores76[0]))
#
# # save the results in a numpy zip file
# np.savez(os.getcwd() + "/plots/performances",
#          item72=np.array(scores72[0]), item62=np.array(scores62[0]),
#          item17=np.array(scores17[0]), item76=np.array(scores76[0]))
Bibliography
Berry, L. R., P. Helman, and M. West (2020). Probabilistic forecasting of heterogeneous consumer transaction-sales time series. International Journal of Forecasting 36, 552–569.

Berry, L. R. and M. West (2020). Bayesian forecasting of many count-valued time series. Journal of Business and Economic Statistics 38, 872–887.

Carvalho, C. M. and M. West (2007). Dynamic matrix-variate graphical models. Bayesian Analysis 2, 69–98.

Chen, T., B. Keng, and J. Moreno (2018). Multivariate arrival times with recurrent neural networks for personalized demand forecasting. In 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 810–819.

Chu, W. and S.-T. Park (2009). Personalized recommendation on dynamic content using predictive bilinear models. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, New York, NY, USA, pp. 691–700. Association for Computing Machinery.

Du, C., C. Li, Y. Zheng, J. Zhu, and B. Zhang (2018, February). Collaborative filtering with user-item co-autoregressive models. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana. Association for the Advancement of Artificial Intelligence.

Ferreira, M. A. R., Z. Bi, M. West, H. K. H. Lee, and D. M. Higdon (2003). Multiscale modelling of 1-D permeability fields. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West (Eds.), Bayesian Statistics 7, pp. 519–528. Oxford University Press.

Ferreira, M. A. R., M. West, H. K. H. Lee, and D. M. Higdon (2006). Multiscale and hidden resolution time series models. Bayesian Analysis 2, 294–314.

He, X., Z. He, X. Du, and T.-S. Chua (2018, July). Adversarial personalized ranking for recommendation. In SIGIR '18: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI.

Hu, Y., Q. Peng, X. Hu, and R. Yang (2015). Web service recommendation based on time series forecasting and collaborative filtering. In 2015 IEEE International Conference on Web Services, pp. 233–240.

Jerfel, G., M. Basbug, and B. Engelhardt (2017, April). Dynamic collaborative filtering with compound Poisson factorization. In Proceedings of Machine Learning Research, Volume 54, Fort Lauderdale, FL, USA, pp. 738–747. PMLR.

Kazemian, P., M. S. Lavieri, M. P. Van Oyen, C. Andrews, and J. D. Stein (2018, April). Personalized prediction of glaucoma progression under different target intraocular pressure levels using filtered forecasting methods. Ophthalmology 125(4), 569–577.

Kott, A. and P. Perconti (2018). Long-term forecasts of military technologies for a 20–30 year horizon: An empirical assessment of accuracy. Technological Forecasting and Social Change 137, 272–279.

Lavine, I., A. J. Cron, and M. West (2020). Bayesian computation in dynamic latent factor models. Technical report, Department of Statistical Science, Duke University. arXiv:2007.04956.

Lichman, M. and P. Smyth (2018, April). Prediction of sparse user-item consumption rates with zero-inflated Poisson regression. In WWW '18: Proceedings of the 2018 World Wide Web Conference, pp. 719–728.

McAlinn, K., K. A. Aastveit, J. Nakajima, and M. West (2020). Multivariate Bayesian predictive synthesis in macroeconomic forecasting. Journal of the American Statistical Association 115, 1092–1110. arXiv:1711.01667. Published online: Oct 9, 2019.

Naumov, M., D. Mudigere, H. M. Shi, J. Huang, N. Sundaraman, J. Park, X. Wang, U. Gupta, C. Wu, A. G. Azzolini, D. Dzhulgakov, A. Mallevich, I. Cherniavskii, Y. Lu, R. Krishnamoorthi, A. Yu, V. Kondratenko, S. Pereira, X. Chen, W. Chen, V. Rao, B. Jia, L. Xiong, and M. Smelyanskiy (2019). Deep learning recommendation model for personalization and recommendation systems. arXiv:1906.00091.

Nevins, J. R., E. S. Huang, H. Dressman, J. L. Pittman, A. T. Huang, and M. West (2003). Towards integrated clinico-genomic models for personalized medicine: Combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Human Molecular Genetics 12, 153–157.

Niu, W., J. Caverlee, and H. Lu (2018). Neural personalized ranking for image recommendation. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM 2018). ACM.

Pittman, J. L., E. S. Huang, H. K. Dressman, C. F. Horng, S. H. Cheng, M. H. Tsou, C. M. Chen, A. Bild, E. S. Iversen, A. T. Huang, J. R. Nevins, and M. West (2004). Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes. Proceedings of the National Academy of Sciences 101, 8431–8436.

Salinas, D., M. Bohlke-Schneider, L. Callot, R. Medico, and J. Gasthaus (2019). High-dimensional multivariate forecasting with low-rank Gaussian copula processes. In Advances in Neural Information Processing Systems 32, pp. 6827–6837. Curran Associates, Inc.

Su, X. and T. M. Khoshgoftaar (2009, January). A survey of collaborative filtering techniques. Advances in Artificial Intelligence.

Talebi, M., et al. (2017). Long-term probabilistic forecast for M ≥ 5.0 earthquakes in Iran. Pure and Applied Geophysics 174, 1561–1580.

Thai-Nghe, N., T. Horváth, and L. Schmidt-Thieme (2011). Personalized forecasting student performance. In 2011 IEEE 11th International Conference on Advanced Learning Technologies, pp. 412–414.

Wang, X., Y. Wang, D. Hsu, and Y. Wang (2013). Exploration in interactive personalized music recommendation: A reinforcement learning approach. ACM Transactions on Multimedia Computing, Communications, and Applications 2(3).

West, M. (1992). Modelling agent forecast distributions. Journal of the Royal Statistical Society (Series B) 54, 553–567.

West, M. and J. Crosse (1992). Modelling of probabilistic agent opinion. Journal of the Royal Statistical Society (Series B) 54, 285–299.

West, M. and P. J. Harrison (1997). Bayesian Forecasting and Dynamic Models (2nd ed.). Springer-Verlag, New York.

West, M., A. T. Huang, G. S. Ginsburg, and J. R. Nevins (2006). Embracing the complexity of genomic data for personalized medicine. Genome Research 16, 559–566.

Yanchenko, A., D. D. Deng, J. Li, A. J. Cron, and M. West (2021). Hierarchical dynamic modelling for individualized Bayesian forecasting. Department of Statistical Science, Duke University. Submitted for publication. arXiv:2101.03408.