sales forecasting using longitudinal data models

16
Sales forecasting using longitudinal data models Edward W. Frees * , Thomas W. Miller University of Wisconsin—Madison, School of Business, 975 University Avenue, Madison, WI 53706, USA Abstract This paper shows how to forecast using a class of linear mixed longitudinal, or panel, data models. Forecasts are derived as special cases of best linear unbiased predictors, also known as BLUPs, and hence are optimal predictors of future realizations of the response. We show that the BLUP forecast arises from three components: (1) a predictor based on the conditional mean of the response, (2) a component due to time-varying coefficients, and (3) a serial correlation correction term. The forecasting techniques are applicable in a wide variety of settings. This article discusses forecasting in the context of marketing and sales. In particular, we consider a data set of the Wisconsin State Lottery, in which 40 weeks of sales are available for each of 50 postal codes. Using sales data as well as economic and demographic characteristics of each postal code, we forecast sales for each postal code. D 2003 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. Keywords: Panel data models; Unobserved effects; Random coefficients; Heterogeneity 1. Introduction Forecasting is an integral part of a marketing manager’s role. Sales forecasts are important for understanding market share and the competition, future production needs, and the determinants of sales, including promotions, pricing, advertising and distribution. Market researchers and business analysts are often faced with the task of predicting sales. One approach to prediction is to use cross-sectional data, working with one point in time or aggregating over several time periods. We search for variables that relate to sales and use those variables as explanatory variables in our models. Another approach is to work with time series data, aggregating across sales territories or accounts. We use past and current sales as a predictor of future sales as we search for explanatory variables that relate to sales. Cross-sectional and simple time series approaches do not make full use of data available to sales and marketing managers. Typical sales data have a hier- archical structure. They are longitudinal or panel data, having both cross-sectional and time series characteristics. They are cross-sectional because they include observations from many cases, sales across stores, territories or accounts; we say that these data are differentiated across ‘space’. They are time series data because they represent many points in time. Longitudinal data methods are appropriate for these types of data. Longitudinal data methods have been widely developed for understanding relationships among variables in the social and biological sciences includ- ing marketing research; see, for example, Ailawadi and Neslin (1998) and Erdem (1996). But there is 0169-2070/$ - see front matter D 2003 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. doi:10.1016/S0169-2070(03)00005-0 * Corresponding author. E-mail addresses: [email protected] (E.W. Frees), [email protected] (T.W. Miller). www.elsevier.com/locate/ijforecast International Journal of Forecasting 20 (2004) 99 – 114

Upload: edward-w-frees

Post on 15-Sep-2016

227 views

Category:

Documents


3 download

TRANSCRIPT

www.elsevier.com/locate/ijforecast

International Journal of Forecasting 20 (2004) 99–114

Sales forecasting using longitudinal data models

Edward W. Frees*, Thomas W. Miller

University of Wisconsin—Madison, School of Business, 975 University Avenue, Madison, WI 53706, USA

Abstract

This paper shows how to forecast using a class of linear mixed longitudinal, or panel, data models. Forecasts are derived as

special cases of best linear unbiased predictors, also known as BLUPs, and hence are optimal predictors of future realizations of

the response. We show that the BLUP forecast arises from three components: (1) a predictor based on the conditional mean of

the response, (2) a component due to time-varying coefficients, and (3) a serial correlation correction term. The forecasting

techniques are applicable in a wide variety of settings. This article discusses forecasting in the context of marketing and sales. In

particular, we consider a data set of the Wisconsin State Lottery, in which 40 weeks of sales are available for each of 50 postal

codes. Using sales data as well as economic and demographic characteristics of each postal code, we forecast sales for each

postal code.

D 2003 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Panel data models; Unobserved effects; Random coefficients; Heterogeneity

1. Introduction

Forecasting is an integral part of a marketing

manager’s role. Sales forecasts are important for

understanding market share and the competition,

future production needs, and the determinants of

sales, including promotions, pricing, advertising and

distribution.

Market researchers and business analysts are often

faced with the task of predicting sales. One approach

to prediction is to use cross-sectional data, working

with one point in time or aggregating over several

time periods. We search for variables that relate to

sales and use those variables as explanatory variables

in our models. Another approach is to work with

time series data, aggregating across sales territories

0169-2070/$ - see front matter D 2003 International Institute of Forecaste

doi:10.1016/S0169-2070(03)00005-0

* Corresponding author.

E-mail addresses: [email protected] (E.W. Frees),

[email protected] (T.W. Miller).

or accounts. We use past and current sales as a

predictor of future sales as we search for explanatory

variables that relate to sales.

Cross-sectional and simple time series approaches

do not make full use of data available to sales and

marketing managers. Typical sales data have a hier-

archical structure. They are longitudinal or panel

data, having both cross-sectional and time series

characteristics. They are cross-sectional because they

include observations from many cases, sales across

stores, territories or accounts; we say that these data

are differentiated across ‘space’. They are time series

data because they represent many points in time.

Longitudinal data methods are appropriate for these

types of data.

Longitudinal data methods have been widely

developed for understanding relationships among

variables in the social and biological sciences includ-

ing marketing research; see, for example, Ailawadi

and Neslin (1998) and Erdem (1996). But there is

rs. Published by Elsevier B.V. All rights reserved.

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114100

relatively little literature available for forecasting

using longitudinal data methods. Some important

exceptions include Battese, Harter, and Fuller

(1988) and Baltagi and Li (1992).

By using information in both the cross section

(space) and time, we are able to provide forecasts

that are superior to traditional forecasts that use only

one dimension. We can forecast at the subject/micro

level, providing managers with additional informa-

tion for making both strategic and tactical decisions,

including decisions about sizing and capacity plan-

ning for manufacturing plants, pricing, marketing

promotions, advertising, sales organization and sales

processes.

The longitudinal data mixed model is introduced

in Section 2. Appendix A shows that the longitudi-

nal data mixed model can be represented as a special

case of the mixed linear model. Thus, there is a large

literature on estimation of the regression parameters

(B) as well as variance components; see, for exam-

ple, Searle, Casella, and McCulloch (1992) or Ver-

beke and Molenbergs (2000). In the data analysis

Section 4, we find it convenient to use the SAS

procedure for mixed linear models (PROC MIXED)

when estimating longitudinal data mixed models;

See Littell, Milliken, Stroup, and, Wolfinger (1996)

for an introduction from this perspective. Similar

procedures are available for S-PLUS (Pinheiro and

Bates, 2000).

Section 3 develops longitudinal data mixed model

forecasts using best linear unbiased predictors

(BLUPs). These predictors were introduced by Gold-

berger (1962) and developed by Harville (1976) in

the context of the mixed linear model. One goal of

this paper is to show how this type of predictor can

be used as an optimal forecast for longitudinal data

mixed models. This section is more technical and

many readers may wish to go directly to the Section

4 case study.

Specifically, Section 4 describes the case study

motivating the theoretical modeling work, forecasting

Wisconsin lottery sales. Here, we consider a data set

that contains forty weeks of lottery sales from a

random sample of 50 Wisconsin postal codes. Sec-

tion 4 shows how to specify an appropriate longitu-

dinal data mixed model and forecast using the

specified model.

Section 5 closes with some concluding remarks.

2. Longitudinal data mixed model

Longitudinal data models are regression models in

which repeated observations of subjects, such as

stores, are available. Using longitudinal data models,

we can provide detailed representations of character-

istics that are unique to each subject, thus accounting

for the classical misspecification problem of hetero-

geneity. Furthermore, the repeated observations over

time allow us to consider flexible models of the

evolution of responses, such as sales, known as the

dynamic structure of a model.

This article introduces forecasting for a broad

class of dynamic longitudinal data models that we

call the longitudinal data mixed model. As an

example of this class of models, consider the basic

two-way model

yit ¼ ai þ kt þ xitVbþ eit; t ¼ 1; . . . ; T ;

i ¼ 1; . . . ; n: ð2:1Þ

Baltagi (1988) and Koning (1989) developed forecasts

for this model. Here, yit denotes the response (sales) for

the ith subject, such as store, during the tth time period.

This is a model of balanced data in that we assume that

the same number, T, observations is available for each

of n stores. The quantity b ¼ ðb1; . . . ; bKÞV is a K � 1

vector of parameters that is common to all subjects and

xit ¼ ðxit;1; xit;2; . . . ; xit;KÞV is the corresponding vector

of covariates. The term ai is specific to subject i

yet is common to all time periods. This variable may

account for features that are unique, yet unobserved,

characteristics of each subject. The term kt is specificto the time period t yet is common to all stores.

This variable may account for common, yet unob-

served, events that affect sales. Both terms ai and ktare random variables and hence the model in Eq.

(2.1) is also known as the two-way error components

model.

The longitudinal data mixed model is considerably

more complex than the model in Eq. (2.1) because it

has the ability to capture many additional features of

the data that may be of interest to an analyst. We

focus on three aspects:

1. The longitudinal data mixed model does not

require balanced data. To illustrate, it is possible

to allow new subjects to enter the data by

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114 101

allowing the first observation to be after time t=1.

Similarly, the last observation may be prior to

time t=T, allowing for early departure. Using an

underlying continuous stochastic process for the

disturbances, the model allows for unequally

spaced (in time) observations as well as missing

data.

2. The longitudinal data mixed model allows for

covariates associated with vector error compo-

nents. This allows one to handle broad classes of

mixed models, such as random coefficient

models.

3. The longitudinal data mixed model allows for

specification of dynamic aspects in two fashions,

through the error terms {eit} and through the

specification of {kt} as a stochastic process.

To illustrate the third point, in the traditional

longitudinal data mixed model, such as introduced

by Laird and Ware (1982), the dynamics are speci-

fied through the correlation structure of subject-

specific errors. For example, it is common to con-

sider an autoregressiveoforderp ðARðpÞÞ model for

the disturbances {eit} of the form:

ei;t ¼ /1ei;t�1 þ /2ei;t�2 þ . . .þ /pei;t�p þ fi;t:

ð2:2Þ

where {fi,t} are initially assumed to be identically

and independently distributed, mean zero, random

variables. Alternative structures are easily accommo-

dated; see Section 3 for further discussion.

Alternatively, we may model the dynamics using a

stochastic process for {kt}. For comparison, note that

Eq. (2.2) is a model of serial relationships at the

subject level, whereas a dynamic model of kt is one

that is common to all subjects. To illustrate the latter

specification, Section 3 considers a random walk

model for the common, time-specific components.

Beginning with the basic two-way model in Eq.

(2.1), more generally, we use

za;i;t;1ai;1 þ : : : þ za;i;t;qai;q ¼ zVa;i;tai ð2:3Þ

and

zk;i;t;1kt;1 þ : : : þ zk;i;t;rkt;r ¼ zVk;i;tkt: ð2:4Þ

With these terms, we define the longitudinal data

mixed model as

yit ¼ zVa;i;tai þ zVk;i;tlt þ xVitb þ eit;

t ¼ 1; . . . ; Ti; i ¼ 1; . . . ; n: ð2:5Þ

Here, ai ¼ ðai;1; . . . ; ai;qÞV is a q� 1 vector of sub-

ject-specific terms and za;i;t ¼ ðza;i;t;1; . . . ; za;i;t;qÞV is

the corresponding vector of covariates. Similarly, lt=

(kt;1; . . . ; kt;rÞVis a r � 1 vector of time-specific terms

and zk;i;t ¼ ðzk;i;t;1; . . . ; zk;i;t;rÞVis the corresponding

vector of covariates. We use the notation t = 1; . . . ; Tito indicate the unbalanced nature of the data. Without

the time-specific terms, this model was introduced by

Laird and Ware (1982) and is widely used in the

biological sciences (Diggle, Liang, & Zeger, 1994).

We have added the time-specific terms to provide

another mechanism for handling temporal, or dynam-

ic, patterns. We allow the time-specific term to be a

vector for symmetry with the subject-specific terms

and to handle some special cases described in Section

3 where we give more details of the assumptions of

these quantities.

3. Longitudinal data mixed model and forecasting

3.1. The longitudinal data mixed model

A more compact form of Eq. (2.5) can be given by

stacking over t . This yields a matrix form of the

longitudinal data mixed model

yi ¼ Za;iai þ Zk;il þ Xib þ ei;

i ¼ 1; . . . ; n: ð3:1Þ

This expression uses vectors of responses, yi ¼ ðyi1;yi2; . . . ; yiTiÞV, and of disturbances, eei ¼ ðei1; ei2; . . . ;eiTiÞV. Similarly, the matrices of covariates are

Xi ¼ xi1; xi2; . . . ; xiTið ÞV, of dimension Ti � K;Za;i ¼ðza,i;1; za;i;2; . . . ; za;i;TiÞV, of dimension Ti � q matrix

and

Z��;i ¼

z0��;i;1 0 : : : 0

0 z0��;i;2

: : : 0

..

. ...

. .. ..

.

0 0 : : : z0��;i;Ti

0BBB@

1CCCA: 0i

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114102

of dimension Ti � rT , where 0i is a Ti � rðT � TiÞzero matrix. Finally, L=(L1,. . . , LT)Vis the rT � 1

vector of time-specific coefficients.

We assume that sources of variability, ei;ai and

lt , are mutually independent and mean zero. The

non-zero means are accounted for in the B para-

meters. The disturbances are independent bet-

ween subjects, yet we allow for serial correlation

and heteroscedasticity through the notation Var ei ¼Ri. Further, we assume that the subject-specific effects

{ai} are random with variance–covariance matrix D,

a q� q positive definite matrix. Time-specific effects

L have variance–covariance matrix Sk, a rT � rT

positive definite matrix. With this notation, we may

express the variance of each subject as Var yi ¼ Va;i+Zl;i SlZl;iV where

Va;i ¼ Za;i DZa;iV þ Ri: ð3:2Þ

3.2. Forecasting for the longitudinal data mixed

model

For forecasting, we wish to predict

yi;TiþL ¼ zVa;i;Ti þ L ai þ zVl;i;Ti þ L lTi þ L

þ ei;Ti þ L ð3:3Þ

for L lead time units in the future. We use results for

best linear unbiased prediction (BLUP) for the mixed

linear model; see Robinson (1991) or Frees, Young,

and Luo (1999, 2001) for recent reviews. A BLUP is

the best linear combination of responses that is

unbiased and has the smallest mean square error over

the class of all linear, unbiased predictors. When using

available data to approximate random variables, such

as yi;TiþL , we use the term ‘prediction’ in lieu of

‘estimation.’

To calculate these predictors, we use the sum of

squares SZZ ¼P

ni¼1 ZVl ; iV

�1a;i Zl ; i. We summarize

the results in the following proposition. The details

of the derivation are in Appendix A.

Proposition. Consider the longitudinal data mixed

model described in Section 3.1. Then, the best linear

unbiased predictor L is

lBLUP ¼ SZZþ �1l

� �1Xni¼1

ZVl;iV�1a;i ei;GLS ð3:4Þ

þxVi;Ti þ Lb

with residuals ei,GLS=yi�XibGLS and bGLS is the

generalized least squares estimator of B. The best

linear unbiased predictors for ei and ai are,

ei;BLUP ¼ RiV�1a;i ei;GLS � Zl;ilBLUP

� ð3:5Þ

and

ai;BLUP ¼ DZ Va;iR�1i ei;BLUP: ð3:6Þ

Further, the best linear unbiased predictor of yi;TiþL is

yi;TiþL ¼ x Vi;TiþLbGLS þ z Va;i;TiþLai;BLUP

þzVl;iTiþLCovðlTiþL;lÞV �1l lBLUP

þCovðei;TiþL; eiÞVR�1i ei;BLUP: ð3:7Þ

Remarks. We may interpret the BLUP forecast as

arising from three components. The first component,

xVi;TiþLbGLS þ zVa;i;TiþLai;BLUP, is due to the condition-

al mean. The second component, zVk;i;TiþLCovðlTiþL;lÞV �1

l EBLUP , is due to time-varying coefficients.

The third component, Covðei;TiþL; eiÞVR�1i ei;BLUP, is a

serial correlation ‘correction term,’ analogous to a

result due to Goldberger (1962); see Example 1.2

below. An expression for the variance of the forecast

error, Var yi;TiþL � yi;TiþL

� , is available from the

authors.

3.3. Forecasting for special cases of the longitudinal

data mixed model

The Proposition provides sufficient structure to

calculate forecasts for a wide variety of models. Still,

it is instructive to interpret the BLUP forecast in a

number of special cases. We first consider the case of

independent and identically distributed time-specific

components {Lt}.

Example 1. (Random time-specific components). We

consider the special case where {Lt} are i.i.d. and

assume that Ti þ L > T . Thus, from Eq. (3.7), we

have the BLUP forecast of yi;TiþL is

yi;TiþL ¼ xVi;TiþLbGLS þ zVa;i;TiþL ai;BLUP

þ Covðei;TiþL; eiÞVR�1i ei;BLUP: ð3:8Þ

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114 103

Suppose further that the disturbance terms are serially

uncorrelated so that Covðei;TiþL; eiÞ ¼ 0 . As an

immediate consequence of Eq. (3.8), we have

yi;TiþL ¼ xVi;TiþLbGLS þ zVa;i;TiþLai;BLUP:

We note that even when {lt } are i.i.d., the time-

specific components appear in ai,BLUP. Thus, the

presence of Lt changes the forecasts.

Example 1.1. (No time-specific components) We

now consider the case of no time-specific component

lt . Here, using Eq. (3.8), the BLUP forecast of

yi;TiþL is

yi;TiþL ¼ xVi;TiþLbGLS þ zVa;i;TiþLai;BLUP

þ Covðei;TiþL; eiÞVV�1a;i yi � XibGLSð Þ

where, from Eqs. (3.5) and (3.6), ai;BLUP ¼ DZVa;iV�1

a;i yi � XibGLSð Þ. To help further interpret this case,

consider:

Example 1.2. (AR(1) serial correlation) An interest-

ing special case that provides a great deal of intuition

is the case where we assume autoregressive of order 1

(AR(1)), serially correlated errors. For example,

Baltagi and Li (1991, 1992) considered this serially

correlated structure in the error components model

(q ¼ 1 ) in the balanced data case. More generally,

from Eq. (3.7), it can be checked that

yi;TiþL ¼ xVi;TiþLbGLS þ zVi;TiþLai;BLUP þ qLeiTi;BLUP:

Thus, the L step forecast equals conditional mean,

with the correction factor of qL times the most recent

BLUP residual. This result was originally given by

Goldberger (1962) in the context of ordinary

regression without random effects (that is, assuming

D ¼ 0).

Example 1.3. (Time-varying coefficients) Suppose

that the model is

yit ¼ xVit t þ eit;

where f tg are i.i.d. We can re-write this as:

yit ¼ zVk;i;tlt þ xitV þ eit;

B

B

B

where E t¼ ;lt ¼ t � and zk;i;t ¼ xi;t . With this

notation and Eq. (3.8), the forecast of yi;TiþL is

yi;TiþL ¼ xVi;TiþLbGLS.

Example 1.4. (Two-way error components model)

Consider the basic two-way model given in Eq.

(2.1). As in Example 1.2, we have that q ¼ r ¼ 1 and

D ¼ r2a and za;i;TiþL ¼ 1. Thus, from Eq. (3.8), we

have that the BLUP forecast of yi;TiþL is

yi;TiþL ¼ ai;BLUP þ x Vi;TiþLbGLS:

For additional interpretation, we assume balanced

data so that Ti ¼ T as in Baltagi (1988) and Koning

(1989); see also Baltagi (1995, p. 38). To ease

notation, define f ¼ Tr2a ðr2 þ Tr2

a�

. Then, it can be

shown that

yi;TiþL ¼ xVi;TiþLbGLS þ f

�yi � xVibGLSð Þ

� n 1� fð Þr2k

r2 þ n 1� fð Þr2k

y� xVbGLSð Þ :

Example 2. (Random walk model) Through minor

modifications, other temporal patterns of common, yet

unobserved, components can be easily included. For

this example, we assume that r ¼ 1; fktg are i.i.d., so

that the partial sum process {k1 þ k2 þ : : : þ kt} is a

random walk process. Thus, the model is

yit ¼ zVa;i;tai þ k1 þ k2 þ : : : þ kt þ xVit

þeit; t ¼ 1; . . . ; Ti; i ¼ 1; . . . ; n:

Stacking over t, this can be expressed in matrix form

as Eq. (3.1) where the Ti � T matrix Zl;i is a lower

triangular matrix of 1’s for the first Ti rows, and zero

elsewhere. That is,

Zl;i ¼

1 0 0 : : : 0 0 : : : 0

1 1 0 : : : 0 0 : : : 0

1 1 1 : : : 0 0 : : : 0

..

. ... ..

.O 0 0 : : : 0

1 1 1 : : : 1 0 : : : 0

0BBBB@

1CCCCA

1

Then, it can be shown that

yi;TiþL ¼ xVi;TiþLbGLS þXt

s¼1lt;BLUP

þzVa;i;TiþLai;BLUP

þCovðei;TiþL; qiÞVR�1i ei;BLUP:

BB BB

B

2

Ti

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114104

4. Case study: Forecasting Wisconsin lottery sales

In this section, we forecast the sale of state

lottery tickets from 50 postal (ZIP) codes in Wis-

consin. Lottery sales are an important component of

state revenues. Accurate forecasting helps in the

budget planning process. Further, a model is useful

in assessing the important determinants of lottery

sales. Understanding the determinants of lottery

sales is useful for improving the design of the

lottery sales system and making decisions about

numbers of retail sales licenses to grant within

postal (ZIP) codes.

4.1. Sources and characteristics of data

State of Wisconsin lottery administrators provid-

ed weekly lottery sales data. We consider online

lottery tickets that are sold by selected retail estab-

lishments in Wisconsin. These tickets are generally

priced at $1.00, so the number of tickets sold

equals the lottery revenue. We analyze lottery sales

(ZOLSALES) over a 40-week period, April, 1998

through January, 1999, from 50 ZIP codes random-

ly selected from more than 700 ZIP codes within

the state of Wisconsin. We also consider the num-

ber of retailers within a ZIP code for each time

(NRETAIL).

A budding literature, such as Ashley, Liu, and

Chang (1999), suggests variables that influence

lottery sales. In developing models for lottery sales

we can draw upon this literature and anecdotal

evidence concerning the determinants of sales vol-

Table 1

Lottery, economic and demographic characteristics of 50 Wisconsin ZIP c

Lottery characteristics

ZOLSALES

NRETAIL

Economic and demographic characteristics

PERPERHH

MEDSCHYR

OOMEDHVL

PRCRENT

PRC55P

HHMEDAGE

CEMI

POPULAT

ume. Higher lottery jackpots lead to higher sales.

Ticket sales should be higher in areas with higher

population. Ticket sales should be higher in areas

better served by online ticket retailers; i.e. higher

numbers of retailers should lead to higher sales.

Lower income, less educated people may buy more

lottery tickets per capita than higher income, more

educated people. Senior citizens may buy more

lottery tickets per person than people in other age

groups. The thinking here is that seniors have more

free time to engage in recreational and gaming

activities.

Table 1 lists economic and demographic character-

istics that we consider in this analysis. Much of the

empirical literature on lotteries is based on annual data

that examine the state as the unit of analysis. In

contrast, we examine much finer economic units, the

ZIP code level, and weekly lottery sales. The eco-

nomic and demographic characteristics were abstract-

ed from the 1990 and 1995 United States census, as

organized and distributed by the Direct Marketing

Education Foundation. These variables summarize

characteristics of individuals within ZIP codes at a

single point in time and thus are not time-varying.

Table 2 summarizes the economic and demograph-

ic characteristics of 50 Wisconsin ZIP codes. To

illustrate, for the population variable (POPULAT),

we see that the smallest ZIP code contained 280

people whereas the largest contained 39 098. The

average, over 50 ZIP codes, was 9311.04. Table 2

also summarizes average online sales and average

number of retailers. Here, these are averages over 40

weeks. To illustrate, we see that the 40-week average

odes

Online lottery sales to individual consumers

Number of listed retailers

Persons per household times 10

Median years of schooling times 10

Median home value in $100 s for owner-occupied homes

Percent of housing that is renter occupied

Percent of population that is 55 or older

Household median age

Estimated median household income, in $100 s

Population

Table 2

Summary statistics of lottery, economic and demographic characteristics of 50 Wisconsin ZIP codes

Variable Mean Median Standard Minimum Maximum

deviation

Average 6494.83 2426.41 8103.01 189 33 181

ZOLSALES

Average 11.94 6.36 13.29 1 68.625

NRETAIL

PERPERHH 27.06 27 2.09 22 32

MEDSCHYR 126.96 126 5.51 122 159

OOMEDHVL 570.92 539 183.73 345 1200

PRCRENT 24.68 24 9.34 6 62

PRC55P 39.70 40 7.51 25 56

HHMEDAGE 48.76 48 4.14 41 59

CEMI 451.22 431 97.84 279 707

POPULAT 9311.04 4405.5 11 098 280 39 098

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114 105

of online sales was as low as $189 and as high as

$33 181.

It is possible to examine cross-sectional relation-

ships between sales and economic/demographic char-

acteristics. For example, Fig. 1 shows a positive

relationship between average online sales and popu-

lation. Further, the ZIP code corresponding to city of

Kenosha, Wisconsin has unusually large average

sales for its population size. However, cross-sectional

relationships alone, such as correlations and plots

Fig. 1. Scatter plot of average lottery sales versus population size.

similar to Fig. 1, do not show dynamic patterns of

sales.

Fig. 2 is a multiple time series plot of logarithmic

(weekly) sales over time. Here, each line traces the

sales patterns for a ZIP code. This figure shows the

increase in sales for most ZIP codes, at approximately

weeks eight and 18. For both time points, the jackpot

prize of one online game, PowerBall, grew to an

amount in excess of $100 million. Interest in lotteries,

and sales, increases dramatically when jackpot prizes

Sales for Kenosha are unusually large for its population size.

Fig. 2. Multiple time series plot of logarithmic (base 10) lottery sales.

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114106

reach large amounts. Moreover, Fig. 2 suggests a

dynamic pattern that is common to all ZIP codes.

Specifically, logarithmic sales for each ZIP code are

relatively stable with the same approximate level of

variability.

Another form of the response variable to consider

is the proportional, or percentage, change. Specifical-

ly, define the percentage change to be

pchangeit ¼ 100salesit

salesi;t�1

� 1

� : ð4:1Þ

A multiple times series plot of the percentage changes,

not displayed here, shows autocorrelated serial pat-

terns. We consider models of this transform of the

series in the following subsection on model selection.

4.2. In-sample model specification

This subsection considers the specification of a

model, a necessary component prior to forecasting.

We decompose model specification criteria into two

components, in-sample and out-of-sample criteria. To

this end, we partition the data into two subsamples:

we use the first 35 weeks to develop alternative

models and estimate parameters, and we use the last

5 weeks to ‘predict’ our held-out sample. The choice

of 5 weeks for the out-of-sample validation is some-

what arbitrary; it was made with the rationale that

lottery officials consider it reasonable to try to predict

5 weeks of sales based on 35 weeks of historical sales

data.

Our first forecasting model is an ordinary regres-

sion model

yit ¼ a þ xVit þ eit;

where the intercept is common to all subjects, also

known as a pooled cross-sectional model. The model

fits the data well; the coefficient of determination turns

out to be R2 ¼ 69:6%: The estimated regression coef-

ficients appear in Table 3. From the corresponding t-

statistics, we see that each variable is statistically

significant.

Our second forecasting model is an error compo-

nents model

yit ¼ ai þ xVit þ eit;

B

B

Table 3

Lottery model coefficient estimates

Variable Pooled cross-sectional Error components Error components model

model model with AR(1) term

Parameter t-Statistic Parameter t-Statistic Parameter t-Statistic

estimate estimate estimate

Intercept 13.821 10.32 18.096 2.47 15.255 2.18

PERPERHH �0.108 �6.77 �0.129 �1.45 �0.115 �1.36

MEDSCHYR �0.082 �11.90 �0.108 �2.87 �0.091 �2.53

OOMEDHVL 0.001 5.19 0.001 0.50 0.001 0.81

PRCRENT 0.032 8.51 0.026 1.27 0.030 1.53

PRC55P �0.070 �5.19 �0.073 �0.98 �0.071 �1.01

HHMEDAGE 0.118 5.64 0.119 1.02 0.120 1.09

CEMI 0.004 8.18 0.005 1.55 0.004 1.58

POP/1000 0.057 9.41 0.121 4.43 0.080 2.73

NRETAIL 0.021 5.22 �0.027 �1.56 0.004 0.20

Var a (r2a) 0.607 0.528

Var e (r2e ) 0.700 0.263 0.279

AR(1) corr (q) 0.555 25.88

AIC 4353.25 2862.74 2269.83

Based on in-sample data of n=50 ZIP codes and T=35 weeks. The response is (natural) logarithmic sales.

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114 107

where the intercept varies according to subject. Table

3 provides parameter estimates and the corresponding

t-statistics, as well as estimates of the variance com-

ponents, r2a and r2

e. As is common in longitudinal data

analysis, allowing intercepts to vary by subject can

result in regression coefficients for other variables

becoming statistically insignificant.

When comparing this model to the pooled cross-

sectional model, we may use the Lagrange multiplier

test described in Baltagi (1995, Chapter 3). The test

statistic turns out to be TS ¼ 11; 395:5, indicating that

the error components model is strongly preferred to

the pooled cross-sectional model. Another piece of

evidence is Akaike’s Information Criterion (AIC). The

smaller this criterion, the more preferred is the model.

Table 3 shows that the error components model is

preferred compared to the pooled cross-sectional

model based on the smaller value of the AIC statistic.

To assess further the adequacy of the error compo-

nents model, we calculated residuals from the fitted

model and examined diagnostic tests and graphs. One

such diagnostic graph, not displayed here, is a plot of

residuals versus lagged residuals. This graph shows a

strong relationship between residuals and lagged

residuals which we can represent using an autocorre-

lation structure for the error terms. The graph also

shows a strong pattern of clustering corresponding to

weeks with large PowerBall jackpots. A variable that

captures information about the size of PowerBall

jackpots would help in developing a model of lottery

sales. However, for forecasting purposes, we require

one or more variables that anticipates large PowerBall

jackpots. That is, because the size of PowerBall

jackpots is not known in advance, variables that proxy

the event of large jackpots are not suitable for fore-

casting models. These variables could be developed

through a separate forecasting model of PowerBall

jackpots.

Other types of random effects models for fore-

casting lottery sales could also be considered. To

illustrate, we also fit a more parsimonious version of

the AR(1) version of the error components model;

specifically, we fit this model, deleting those varia-

bles with insignificant t-statistics. It turned out that

this fitted model did not perform substantially better

in terms of overall model fit statistics such as AIC.

We explore alternative transforms of the response

when examining a held-out sample in the following

subsection.

Table 4 reports the estimation results from fitting

the two-way error components model in Eq. (2.1),

with and without an AR(1) term. For comparison

Table 4

Lottery model coefficient estimates

Variable One-way error components Two-way error Two-way error components

model with AR(1) term components model model with AR(1) term

Parameter t-Statistic Parameter t-Statistic Parameter t-Statistic

estimate estimate estimate

Intercept 15.255 2.18 16.477 2.39 15.897 2.31

PERPERHH �0.115 �1.36 �0.121 �1.43 �0.118 �1.40

MEDSCHYR �0.091 �2.53 �0.098 �2.79 �0.095 �2.70

OOMEDHVL 0.001 0.81 0.001 0.71 0.001 0.75

PRCRENT 0.030 1.53 0.028 1.44 0.029 1.49

PRC55P �0.071 �1.01 �0.071 �1.00 �0.072 �1.02

HHMEDAGE 0.120 1.09 0.118 1.06 0.120 1.08

CEMI 0.004 1.58 0.004 1.59 0.004 1.59

POP/1000 0.001 2.73 0.001 5.45 0.001 4.26

NRETAIL 0.004 0.20 �0.009 �1.07 �0.003 �0.26

Var aðr2aÞ 0.528 0.564 0.554

Var eðr2e Þ 0.279 0.022 0.024

Var kðr2kÞ 0.241 0.241

AR(1) corr ðqÞ 0.555 25.88 0.518 25.54

AIC 2270.97 �1109.61 �1574.02

Based on in-sample data of n=50 ZIP codes and T=35 weeks. The response is (natural) logarithmic sales.

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114108

purposes, the fitted coefficient for the one-way

model with an AR(1) term are also presented in

this table. As in Table 3, we see that the model

selection criterion, AIC, indicates that the more

complex two-way models provide an improved fit

compared to the one-way models. As with the one-

way models, the autocorrelation coefficient is statis-

tically significant even with the time-varying pa-

rameter kt. In each of the three models in Table 4,

only the population size (POP) and education levels

(MEDSCHYR) have a significant effect on lottery

sales.

4.3. Out-of-sample model specification

This subsection compares the ability of several

competing models to forecast values outside of the

sample used for model parameter estimation. As in

Section 4.2, we use the first 35 weeks of data to

estimate model parameters. The remaining 5 weeks

are used to assess the validity of model forecasts. For

each model, we compute forecasts of lottery sales for

weeks 36 through 40, by ZIP code level, based on the

first 35 weeks. Denote these forecast values as

ZOLSALESi;35þL , for L=1 to 5. We summarize the

accuracy of the forecasts through two statistics, the

mean absolute error

MAE ¼ 1

5n

Xni¼1

X5L¼1

ZOLSALESi;35þL

��

�ZOLSALESi;35þLj ð4:2Þ

and the mean absolute percentage error

MAPE ¼ 100

5n

Xni¼1

X5L¼1

�ZOLSALESi;35þL � ZOLSALESi;35þL

�� ��ZOLSALESi;35þL

ð4:3Þ

The several competing models include the models

of logarithmic sales summarized in Tables 3 and 4.

Because the autocorrelation term appears to be

highly statistically significant in Table 3, we also

fit a pooled cross-sectional model with an AR(1)

term. Further, we fit two modifications of the error

components model with the AR(1) term. In the first

case we use lottery sales as the response (not the

logarithmic version) and in the second case we use

percentage change of lottery sales, defined in

E.W. Frees, T.W. Miller / International Jour

Eq. (4.1), as the response. Finally, we also consider a

basic fixed effects model,

yit ¼ ai þ eit;

with an AR(1) error structure. For fixed effects

models, the term ai is treated as a fixed parameter,

not a random variable. Because this parameter is

time-invariant, it is not possible to include our time-

invariant demographic and economic characteristics

as part of the fixed effects model.

Table 5 presents the model forecast criteria in Eqs.

(4.2) and (4.3) for eachmodel.We first note that Table 5

re-confirms the point that the AR(1) term improves

each model. Specifically, for the pooled cross-section-

al, as well as one-way and two-way error components

models, the version with an AR(1) term outperforms

the analogous model without this term. Table 5 also

shows that the one-way error components model dom-

inates the pooled cross-sectional model. This was also

anticipated by our pooling test, an in-sample test

procedure. Somewhat surprisingly, the two-way model

did not perform as well as the one-way model.

Table 5 confirms that the error components

model with an AR(1) term with logarithmic sales

as the response is the preferred model, based on

either the MAE or MAPE criterion. The next best

model was the corresponding fixed effects model. It

is interesting to note that the models with sales as

the response outperformed the model with percent-

age change as the response based on the MAE

criterion, although the reverse is true based on the

MAPE criterion.

Table 5

Out-of-sample forecast comparison of nine alternative models

Model Model res

Pooled cross-sectional model Logarithm

Pooled cross-sectional model with AR(1) Logarithm

term

Error components model Logarithm

Error components model with AR(1) term Logarithm

Error components model with AR(1) term Sales

Error components model with AR(1) term Percentag

Fixed effects model with AR(1) term Logarithm

Two-way error components model Logarithm

Two-way error components model with Logarithm

AR(1) term

4.4. Forecasts

We now forecast using the model that provides the

best fit to the data, the error components model with

an AR(1) term. The forecasts and forecast intervals for

this model are a special case of the results for the more

general longitudinal data mixed model, given in Eqs.

(3.6) and Appendix A, respectively.

Fig. 3 displays the forecasts and forecast intervals.

Here, we use T ¼ 40 weeks of data to estimate

parameters and provide forecasts for L ¼ 5 weeks.

Calculation of the parameter estimates, point fore-

casts and forecast intervals were done using loga-

rithmic sales as the response. Then, point forecasts

and forecast intervals were converted to dollars to

display the ultimate impact of the model forecasting

strategy.

Fig. 3 shows the forecasts and forecast intervals for

two selected postal codes. The lower forecast repre-

sents a postal from Dane County whereas the upper

represents a postal code from Milwaukee. For each

postal code, the middle line represents the point

forecast and the upper and lower lines represent the

bounds on a 95% forecast interval. When calculating

this interval, we applied a normal curve approxima-

tion, using the point forecast plus or minus 1.96 times

the standard error. Compared to the Dane County

code, the Milwaukee postal code has higher forecast

sales. Thus, although standard errors on a logarithmic

scale are about the same as Dane County, this higher

point forecast leads to a larger interval when rescaled

to dollars.

nal of Forecasting 20 (2004) 99–114 109

ponse Model forecast criteria

MAE MAPE

ic sales 3012.68 83.41

ic sales 680.64 21.19

ic sales 1318.05 33.85

ic sales 571.14 18.79

1409.61 140.25

e change 1557.82 48.70

ic sales 584.55 19.07

ic sales 1257.21 33.14

ic sales 1202.97 32.47

Fig. 3. Forecast intervals for two selected postal codes. For each postal code, the middle line corresponds to point forecasts for 5 weeks. The

upper and lower lines correspond to endpoints of 95% prediction intervals.

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114110

4.5. Sales and marketing applications

The lottery sales example has a structure similar

to many sales and marketing applications. Sales data

are organized in time—days, weeks, months, or

years. Sales data are organized in space—country,

geographical region, sales areas or districts. We

model sales volume as a function of market con-

ditions and the ‘marketing mix.’ Market conditions

include information about competitors, substitute

products, and economic climate. ‘Marketing mix’

refers to the actions that firms take with regard to

product features, pricing, distribution, advertising,

and promotion. Like sales response, most explanato-

ry variables vary in time and space.

Consider the case of supermarket scanner data

collected weekly from thousands of stores across the

United States. To forecast sales volume for a partic-

ular product, say a frozen dinner, in a particular

store, researchers build upon observed sales volumes

for all products in the frozen dinner category. Prod-

uct price, discount coupon, promotion, and advertis-

ing data would be collected, and appropriate

explanatory variables defined. In building a forecast-

ing model for a food manufacturer or retail chain, a

researcher might combine data across stores within

cities and across products within brands. Models at

various levels of aggregation can be built from

longitudinal data. Leeflang, Wittink, Wedell, and

Naert (2000) and Hanssens, Parsons, and Schultz

(2001) discuss traditional time series and regression

approaches to scanner data analysis. Longitudinal

data mixed models, as discussed in this paper,

represent a set of flexible, dynamic models for sales

and marketing applications of this type.

5. Summary and concluding remarks

This article considers the longitudinal data mixed

model, a class of models that extends the traditional

two-way error components longitudinal data models

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114 111

yet still falls with the framework of a mixed linear

model. In particular, the theory allows us to consider

unbalanced data, slopes that may vary by subject or

time and parametric forms of heteroscedasticity as

well as serial correlation. Forecasts for these models

are derived as special cases of best linear unbiased

predictors, BLUPs, together with the variance of the

forecast errors.

The theory provides optimal predictions assuming

that the variance components are known. If estima-

tors replace the true variance components, then the

mean squared error of the predictors increases. The

magnitude of this increase, as well as approximate

adjustments, have been studied by Kackar and Har-

ville (1984), Harville and Jeske (1992) and Baillie

and Baltagi (1999).

The theory is substantially broader than prevailing

forecasting practice. To illustrate how this theory may

be applied, this article considers forecasting Wisconsin

lottery sales. We examined a variety of model specifi-

cations to arrive at a simple one-way error components

with an AR(1) term. This simplemodel provides a good

fit to the data. In subsequent work, we intend to

investigate more complex models in order to realize

more useful forecasts of future sales. One direction that

subsequent research may take is to examine longitudi-

nal data models with spatial, as well as time-series,

error components.

Appendix A. Inference for the longitudinal data

mixed model

To express the model more compactly, we use

the mixed linear model specification. Thus, define

y ¼ yV1; yV2; . . . ; yVnð ÞV; e ¼ eV1 ; eV2 ; . . . ; eVnð ÞV; a ¼aV1;aV2 ; . . . ;aVnð ÞV,

X ¼

X1

X2

X3

..

.

Xn

0BBBBBBBBBBBB@

1CCCCCCCCCCCCA

; Zl ¼

ZE;1

ZE;2

ZE;3

..

.

ZE;n

0BBBBBBBBBBBB@

1CCCCCCCCCCCCA

and

Za ¼

Za;1 0 0 : : : 0

0 Za;2 0 : : : 0

0 0 Za;3: : : 0

..

. ... ..

.O ..

.

0 0 0 : : : Za;n

0BBBBB@

1CCCCCA:

With these choices, the longitudinal data mixed

model in Eq. (2.1) can be expressed as a mixed linear

model, given by

y ¼ Zaaþ Zll þ Xb þ e: ðA:0Þ

Further, we also use the notation R=Var

E=blockdiag(R1,. . ., Rn) and note that Var A=In �D. With this notation, we may express the variance–

covariance matrix of y as

Var y ¼ V ¼ ZaðIn � DÞZaVþ ZEREZEVþ R: ðA:1Þ

Moreover, define

Va ¼ R þ ZaðIn � DÞZaV

¼ blockdiagðVa;1; . . . ;Va;nÞ;

where Va;i is defined in Eq. (3.2). With this notation,

we use standard matrix algebra results to write

V�1 ¼ ðVa þ ZEREZEVÞ�1

¼ V�1a � V�1

a ZEðZEVV�1a ZE

þ R�1E Þ�1ZEVV�1

a : ðA:2Þ

For best linear unbiased prediction (BLUP) in the

mixed linear model, suppose that we observe an N � 1

random vector y with mean E y=X B and variance Var

y=V. The generic goal is to predict a random variable

w, such that E w ¼ cVB and Var w ¼ r2w. Denote the

covariance between w and y as Cov(w, y). The BLUP

of w is

wBLUP ¼ cVbGLS þ Covðw; yÞVV�1 y � XbGLSð Þ

ðA:3Þ

¼ CovðceVei; yÞVV�1a ðy � XbGLSÞ

� CovðceVei; yÞVV�1a ZllBLUP

¼ ceVRiV�1a ;i yi � XibGLSð Þ � Zk;ikBLUP

� : ðA:6Þ

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114112

where bGLS ¼ XVV�1X� �1

XVV�1y. The mean square

error is

VarðwBLUP � wÞ ¼ c V� Covðw; yÞVV�1X�

� XVV�1X� �1

c V� Covðw; yÞVV�1X�

V

� Covðw; yÞVV�1Covðw; yÞ þ r2w: ðA:4Þ

Proof of the Proposition

We first derive the BLUP predictor of L. Let cL be

an arbitrary vector of constants and setw=cLVL. For thischoice of w, we have ;=0. From Eq. (A.2), we have

clVlBLUP ¼ clVCovðl;ZllÞVV�1ðy� XbGLSÞ:

Using Eq. (A.2) and SZZ ¼Pni¼1

ZVl ;i V�1a;iZl;i ¼

ZVlV�1a Zl, we have

RlZlVV�1 ¼ RlZlVðV�1a � V�1

a ZlðZlVV�1a Zl

þ R�1l Þ�1ZlVV

�1a Þ

¼ RlðI � SZZðSZZ þ R�1l Þ�1ÞZlVV�1

a

¼ ðSZZ þ R�1l Þ�1

ZlVV�1a :

Thus,

kBLUP ¼ RlZlVV�1ðy � XbGLSÞ

¼ ðSZZ þ R�1l Þ�1ZlVV�1

a ðy�XbGLSÞ; ðA:5Þ

which is sufficient for Eq. (3.4).

To derive the BLUP predictor of ei , let ce be an

arbitrary vector of constants and set w=ceV ei . With

; = 0, we have

CovðceVei; yjÞ ¼ceVRi for j ¼ i

0 for j p i:

Using this, Eqs. (A.2) and (A.3) yield

ceVei;BLUP ¼ CovðceVei; yÞVV�1ðy � XbGLSÞ

¼ CovðceVei; yÞVV�1a ðy � XbGLSÞ

� CovðceVei; yÞVV�1a ZlðSZZ

þ R�1l Þ�1ZlVV�1

a ðy � XbGLSÞ

Using Wald’s device, we have the BLUP of ei,given in Eq. (3.5).

The derivation for the BLUP of Ai is similar. Let

cA be an arbitrary vector of constants and set w=cAVAi.

This yields

CovðcaVei; yjÞ ¼caVRi for j ¼ i

0 for j p i:

Using this, Eqs. (A.2), (A.3) and (3.4), we have

caVai;BLUP ¼ CovðcaVai; yÞVV�1ðy � XbGLSÞ

¼ CovðcaVai; yÞVV�1a ðy � XbGLSÞ

� CovðcaVai; yÞVV�1a ZkðSZZ

þ R�1l Þ�1

ZlVV�1a ðy� XbGLSÞ

¼ cVaDZVa;iV�1a;i ei;GLS � Zl; iBLUP�

; ðA:7Þ

which is sufficient for Eq. (3.6).

Now, to calculate the BLUP forecast of yi;TiþL in

Eq. (3.7), we wish to predict w ¼ yi;TiþL . With this

choice of w, we have ; ¼ xi;TiþL. Now, we examine

Covðyi;TiþL; yÞ ¼ CovðzVa;i;TiþLai; yÞ

þ CovðzVl ;i ; Ti þL lTiþL; yÞ

þ Covðei;TiþL; yÞ: ðA:8Þ

Using Eqs. (A.0) and (A.5), we have

CovðzVl; i; Ti þ LlTiþL; yÞVV�1 y � XbGLSð Þ

¼ zVl; i; TiþLCovðlTiþL;lVÞZVlV�1 y � XbGLSð Þ

¼ zVl; i; Ti þ LCovðlTiþL;lVÞR�1l lBLUP :

ðA:9Þ

Similarly, from Eq. (A.7), we have

CovðzVa;i;TiþLai; yÞVV�1 y � XbGLSð Þ

¼ zVa;i;TiþLai;BLUP ðA:10Þ

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114 113

and, similar to Eq. (A.6), we have

Covðei;TiþL; yÞVV�1 y � XbGLSð Þ

¼ Covðei;TiþL; eiÞVR�1i ei;BLUP: ðA:11Þ

Thus, from Eqs. (A.3), (3.3) and (A.8)–(A.11), the

BLUP forecast of yi;TiþL is

yi;TiþL ¼ xVi;TiþLbGLS

þ Covðyi;TiþL; yÞVV�1 y � XbGLSð Þ

¼ xVi;TiþLbGLS

þ z Vl;i;TiþL CovðlTiþL;lÞVS�1l lBLUP

þ z Va ; i; Ti þ Lai;BLUP

þ Covðei;TiþL; eiÞVR�1i ei;BLUP;

as in Eq. (3.7). This is sufficient for the proof of the

Proposition.

References

Ailawadi, K. L., & Neslin, S. A. (1998). The effect of promotion on

consumption: Buying more and consuming it faster. Journal of

Marketing Research, 35, 390–398.

Ashley, T., Liu, Y., & Chang, S. (1999). Estimating net lottery

revenues for states. Atlantic Economics Journal, 27, 170–178.

Baillie, R. T., & Baltagi, B. H. (1999). Prediction from the regres-

sion model with one-way error components. In Hsiao, C.,

Lahiri, K., Lee, L., & Pesaran, M. H. (Eds.), Analysis of panels

and limited dependent variable models. Cambridge, UK: Cam-

bridge University Press.

Baltagi, B. H. (1988). Prediction with a two-way error component

regression model. Problem 88.1.1. Econometric Theory, 4,

171.

Baltagi, B. H. (1995). Econometric analysis of panel data.

NY: Wiley.

Baltagi, B. H., & Li, Q. (1991). A transformation that will circum-

vent the problem of autocorrelation in an error-component

model. Journal of Econometrics, 48(3), 385–393.

Baltagi, B. H., & Li, Q. (1992). Prediction in the one-way error

component model with serial correlation. Journal of Forecast-

ing, 11, 561–567.

Battese, G. E., Harter, R. M., & Fuller, W. A. (1988). An error

components model for prediction of county crop areas using

survey and satellite data. Journal of the American Statistical

Association, 83, 28–36.

Diggle, P. J., Liang, K. -Y., & Zeger, S. L. (1994). Analysis of

longitudinal data. Oxford University Press.

Erdem, T. (1996). A dynamic analysis of market structure based on

panel data. Marketing Science, 15, 359–378.

Frees, E. W., Young, V., & Luo, Y. (1999). A longitudinal data

analysis interpretation of credibility models. Insurance: Mathe-

matics and Economics, 24, 229–247.

Frees, E. W., Young, V., & Luo, Y. (2001). Credibility ratemaking

using panel data models. North American Actuarial Journal,

5(4), 24–42.

Goldberger, A. S. (1962). Best linear unbiased prediction in the

generalized linear regression model. Journal of the American

Statistical Association, 57, 369–375.

Hanssens, D. M., Parsons, L. J., & Schultz, R. L. (2001). Market

response models: econometric and time series analysis (2nd

edition). Boston: Kluwer.

Harville, D. (1976). Extension of the Gauss–Markov theorem to

include the estimation of random effects. Annals of Statistics, 2,

384–395.

Harville, D., & Jeske, J. R. (1992). Mean square error of estimation

or prediction under a general linear model. Journal of the Amer-

ican Statistical Association, 87, 724–731.

Kackar, R. N., & Harville, D. (1984). Approximations for standard

errors of estimators of fixed and random effects in mixed linear

models. Journal of the American Statistical Association, 79,

853–862.

Koning, R. H. (1989). Prediction with a two-way error component

regression model. Solution 88.1.1. Econometric Theory, 5, 175.

Laird, N. M., & Ware, J. H. (1982). Random-effects models for

longitudinal data. Biometrics, 38, 963–974.

Leeflang, P. S. H., Wittink, D. R., Wedel, M., & Naert, P. A. (2000).

Building models for marketing decisions. Boston: Kluwer.

Littell, R. C., Milliken, G. A., Stroup, W. W., & Wolfinger,

R. D. (1996). SAS system for mixed models. Cary, North

Carolina: SAS Institute.

Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S

and S-PLUS. New York: Springer.

Robinson, G. K. (1991). The estimation of random effects. Statis-

tical Science, 6, 15–51.

Searle, S. R., Casella, G., & McCulloch, C. E. (1992).

Variance components. New York: John Wiley and Sons.

Verbeke, G., & Molenbergs, G. (2000). Linear mixed models for

longitudinal data. New York: Springer–Verlag.

Biographies: Edward W. (Jed) FREES is a Professor of Business

and Statistics at the University of Wisconsin—Madison and is

holder of the Fortis Health Professorship of Actuarial Science. He

is a Fellow of both the Society of Actuaries and the American

Statistical Association. Professor Frees is a frequent contributor to

scholarly journals. His papers have won several awards for quality,

including the Actuarial Education and Research Fund’s annual

Halmstad Prize for best paper published in the actuarial literature

(three times). Professor Frees currently is the Editor of the North

American Actuarial Journal and an Associate Editor for Insurance:

Mathematics and Economics. The National Science Foundation

(Grant Number SES-0095343) provided funding to support this

research.

E.W. Frees, T.W. Miller / International Journal of Forecasting 20 (2004) 99–114114

Thomas W. MILLER is Director of the A.C. Nielsen Center for

Marketing Research at the University of Wisconsin—Madison. He

holds graduate degrees in psychology (PhD, psychometrics) and

statistics (MS) from the University of Minnesota and in business

(MBA) and economics (MS) from the University of Oregon. An

expert in applied statistics and modeling, Tom has designed and

conducted numerous empirical and simulation studies comparing

traditional and data-adaptive methods. Tom’s current research

includes explorations of online research methods and studies of

consumer life-styles, choices and uses of technology products. He

won the David K. Hardin Award for the best paper in Marketing

Research in 2001.