car demand forecasting using pseudo panel method

Upload: varsha-valecha

Post on 07-Apr-2018


  • 8/4/2019 Car Demand Forecasting Using Pseudo Panel Method


    Car Demand Forecasting Using Dynamic Pseudo Panel Model

    Biao Huang¹

    MVA and Department of Economics, Birkbeck College

    1. INTRODUCTION

    The forecasts of car ownership and use play a central role in the planning and decision making of numerous public agencies and private organisations. It has been a lively area of research and numerous models have been constructed. The international literature review reveals that the static approach dominates car ownership forecasting (see, for example, NRTF, 1997; Whelan, 2001; Hensher et al., 1989; Brownstone et al., 2000; De Jong, 1989a, 1989b). It is envisaged that the inclusion of dynamics in car demand forecasting will yield fruitful results. Nevertheless, the use of the dynamic approach in car demand forecasting is still limited due to its heavy data requirements. There have been relatively few forecasting models that use the dynamic approach, except some using aggregate time series methods. It is possible to forecast car demand using a panel data model. However, there is only one panel survey in Britain containing limited transport-related information: the British Household Panel Survey (BHPS), which is inadequate for the purpose of our study². Furthermore, due to the attrition problem, the size and representativeness of the samples decline over time, rendering the panel data inferior to other national cross-sectional data.

    One approach to circumvent the need for panel data is to construct pseudo panels from cross-sectional data. The pseudo-panel approach is a relatively new econometric approach to estimating dynamic demand models. It is based on grouping individuals or households into cohorts and treating the averages within these cohorts as observations in a panel. In this way, it enables us to follow over time a representative sample of the same cohorts of individuals or households and to overcome the deficiencies of both static models and aggregate time series. In most empirical studies, the authors impose certain restrictions on the pseudo panel before treating it as actual panel data, although it has been shown that the undesirable effects of using only a synthetic panel would be small if the cohort sizes are sufficiently large (more than 100 individuals) and if the true means within each cohort exhibit sufficient time variation (Verbeek and Nijman, 1992).

    The use of pseudo panel data was introduced by Deaton (1985) for the


    Vythoulkas (1999) was the first study of a dynamic car ownership model using the pseudo panel approach.

    The main contribution of the current paper lies in the extension of the pseudo panel method to non-linear models, where it is possible to incorporate the effect of saturation. More specifically, the following models were estimated: a standard logit model based on proportions data; a dynamic mixed logit model; and a dynamic mixed logit model with saturation. All models have a good level of fit and the forecast performance is very satisfactory. This paper is organized as follows: section two discusses the construction of the pseudo panel and presents the descriptive statistics of selected variables; section three describes three non-linear models and the results of estimation; section four reports the forecasts of private car ownership in Great Britain to 2021 and evaluates the model performance; section five is a brief conclusion.

    2. PSEUDO PANEL DATA

    The motivation for using the pseudo panel model is to take advantage of the high-quality cross-sectional survey data available in the UK. Among the several national surveys containing transport-related information, the longest running and most comprehensive one is the Family Expenditure Survey (FES). The FES is a voluntary survey of a random sample of private households in the United Kingdom carried out by the Office for National Statistics. It is a continuous survey with an annual sample of around 6,500 households. It ran from 1957 to 2001, when it was merged with the National Food Survey to form a new Expenditure and Food Survey. Data are collected throughout the year to cover seasonal variations in expenditure. The FES contains rich data on expenditure and income, including vehicle purchasing and servicing costs. It also collects information on socio-economic characteristics of the households, e.g. composition, size, social class, occupation and age of the head of household. Many of these variables have been identified as the main factors influencing car ownership.

    2.1 Constructing the Pseudo Panel

    To compile a pseudo panel dataset, the cohorts should be defined on the basis of common shared characteristics. Such characteristics should be time invariant, such as year of birth of the head of the household, education level, geographic region, etc. (Dargay and Vythoulkas, 1999). In the current study, the cohort is defined based on the year of birth of the head of the household. The choice of the width of the birth cohort is a trade-off between the need to have a large number of observations per cohort and the desire to have as


    cohort is 79; in 1983, this mean age is 80; in 1984, this mean age is 81, and so on. Likewise, for each sampling year, all the households whose head was born between 1906 and 1910 are grouped into a cohort, as are those born between 1911 and 1915, and so on. The objective of such grouping is to track notionally the same group of people. Table A.1 in the Appendix shows the mean age of all the cohorts constructed in this study. It should be noted that only cohorts with more than 100 observations are included.

    Furthermore, the FES survey year changed from calendar year to fiscal year in 1994. Since this affects the age of the household head, an adjustment has been made to allocate each observation to a calendar year based on the data collection year. The final wave of the FES data is for year 2000/2001. However, only data for year 2000 are used, as there are only a few hundred observations for 2001. In total, the constructed pseudo panel has 254 observations, covering 19 years from 1982 to 2000.
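The cohort grouping described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`survey_year`, `birth_year`, `cars`), the function names and the toy data are hypothetical and are not taken from the FES. The minimum cell size defaults to the 100-observation rule mentioned above and is lowered in the toy call.

```python
from collections import defaultdict

def birth_cohort(birth_year, width=5, origin=1901):
    """Return the (start, end) birth-year band, e.g. 1903 -> (1901, 1905)."""
    start = origin + ((birth_year - origin) // width) * width
    return (start, start + width - 1)

def build_pseudo_panel(households, min_obs=100):
    """Average household records within each (survey year, birth cohort) cell.

    households: iterable of dicts with the hypothetical keys
    'survey_year', 'birth_year' (of the household head) and 'cars'.
    Cells with fewer than min_obs households are dropped.
    """
    cells = defaultdict(list)
    for h in households:
        cells[(h["survey_year"], birth_cohort(h["birth_year"]))].append(h)

    panel = {}
    for (year, cohort), members in cells.items():
        n = len(members)
        if n < min_obs:
            continue
        panel[(year, cohort)] = {
            "n_obs": n,
            "mean_age": sum(year - m["birth_year"] for m in members) / n,
            "mean_cars": sum(m["cars"] for m in members) / n,
            "share_1plus": sum(m["cars"] >= 1 for m in members) / n,
        }
    return panel

# Toy data: three households; cell-size threshold lowered for illustration.
toy = [
    {"survey_year": 1982, "birth_year": 1932, "cars": 1},
    {"survey_year": 1982, "birth_year": 1934, "cars": 2},
    {"survey_year": 1982, "birth_year": 1903, "cars": 0},
]
panel = build_pseudo_panel(toy, min_obs=2)
```

Each surviving cell is one pseudo panel observation; following the same birth band across survey years gives the cohort's time series.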

    2.2 Descriptive Statistics of the Pseudo Panel

    The pseudo panel data set contains 17 primary variables directly derived from the FES. They fall into five categories. Table 2.1 summarizes these variables.

    Table 2-1 Variables in the Pseudo Panel Dataset

    Category                                Variable
    Transport data                          Number of cars owned by household
                                            Number of cars owned or used by household
                                            Percentage of households owning at least one car
                                            Percentage of households owning two or more cars
                                            Average weekly public transport expenditure per person
    Household income and expenditure data   Weekly household income
                                            Weekly household disposable income
                                            Weekly household expenditure
    Household demographic data              Household size
                                            Number of adults
                                            Number of workers
    Residence area data                     Percentage of households living in metropolitan area
                                            Percentage of households living in rural area
    General data                            Year
                                            Birth cohort
                                            Number of observations within cohort

    Some of the variables show strong trends across time and cohorts. For


    2.2.1 Number of Cars Owned or Used by the Household

    The pseudo panel data clearly show the difference in car ownership between cohorts and between years. Figure 2.1 compares the number of cars owned or used by different household cohorts in 1982 and 2000. For a given year, car ownership is highest for the cohort whose household head is in their late 40s. In 1982, the cohort with the highest car ownership was the one with head of household born between 1931 and 1935, i.e. aged between 47 and 51. The average number of cars owned or used was 1.11. In 2000, the cohort with the highest car ownership was the one with household head born between 1951 and 1955, i.e. aged between 45 and 49. The number of cars owned or used by that cohort was 1.38, significantly higher than that of the comparable cohort in 1982.

    Figure 2-1 Number of Cars Owned or Used by Household, 1982 and 2000

    [Figure: "Car Owned or Used by Household, Comparison by Year". Vertical axis from 0.20 to 1.40; horizontal axis: birth cohort, 1901-05 to 1976-80; series: 1982 and 2000.]

    Figure 2.2 compares the car ownership of eight cohorts. Over the sample period, each cohort sees its household head getting older year by year. Hence, between 1982 and 2000, the mean age of the household head in these eight cohorts covers different ranges, although there are overlaps between these ranges. By plotting the number of cars owned against the mean age of the household head, we are able to make some sensible comparisons between these eight cohorts.


    be declining for the most recent generations. Using more recent data (1982-2000 as opposed to the 1974-1994 data used by Dargay and Vythoulkas), the current study found that this diminishing generation effect is more apparent.

    Figure 2-2 Number of Cars Owned or Used by Household, comparison of eight cohorts

    [Figure: "Number of Cars Owned or Used by Household, Comparison by Cohorts". Vertical axis from 0.20 to 1.40; horizontal axis: age of household head, 19 to 85; series: birth cohorts 1901-05, 1911-15, 1921-25, 1931-35, 1941-45, 1951-55, 1961-65, 1971-75.]

    2.2.2 Weekly Household Disposable Income

    The weekly household disposable income also shows a strong trend across cohorts. First, we compared household income across birth cohorts for 1982 and 2000 (Figure 2.3). It shows that for young and middle-aged households, cohorts with an older head have higher disposable income; for older households, cohorts with an older head have lower disposable income. This trend is very similar to that of car ownership, suggesting that car ownership is highly correlated with income level.

    Figure 2-3 Household Weekly Disposable Income, 1982 and 2000

    [Figure: "Household Weekly Disposable Income (in 1995 prices), Comparison by Year".]


    Figure 2.3 also shows the rise in income level for corresponding age groups from 1982 to 2000 (all the expenditure and income data in the pseudo panel dataset have been converted to 1995 prices based on the Retail Price Index).

    It is revealing to track the change in household income level according to the age of the household head for the eight selected cohorts. Figure 2.4 shows that weekly household disposable income rises as the age of the household head increases and reaches its peak when the household head is in their late 40s. For the cohort whose household head was born between 1941 and 1945, weekly disposable income reaches its highest value, 490 (in 1995 prices), when the household head is aged 47.

    Figure 2-4 Household Weekly Disposable Income, Eight Cohorts

    [Figure: "Household Weekly Disposable Income (in 1995 prices), Comparison by Cohorts". Vertical axis from 100 to 500; horizontal axis: age of household head, 19 to 85; series: birth cohorts 1901-05, 1911-15, 1921-25, 1931-35, 1941-45, 1951-55, 1961-65, 1971-75.]

    3. NONLINEAR MODELLING

    The main advantage of estimating non-linear models is the possibility of including a saturation level. By specifying car ownership models with an S-shaped functional form and a saturation level, forecasts of vehicle ownership will be curtailed as saturation is approached. Although probably not significant in developing countries, this feature is highly significant for forecasts in more mature markets such as Great Britain (Whelan et al., 2000). This section discusses three non-linear models, starting with a static model, followed by dynamic models without and with saturation.

    3.1 Static Logit Model based on Proportions Data


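    $$y^* = \beta' x + \varepsilon \qquad (1)$$

    As we do not directly observe the net benefit of owning a car, the observation we have is whether a household has a car or not:

    $$y = 1 \text{ if } y^* > 0, \qquad y = 0 \text{ if } y^* \le 0.$$

    Then the probability that $y$ equals one is:

    $$\mathrm{Prob}(y = 1 \mid x) = \mathrm{Prob}(y^* > 0 \mid x) = \mathrm{Prob}(-\varepsilon < \beta' x \mid x)$$

    Assuming that $\varepsilon$ follows a logistic distribution, this gives the familiar logit model:

    $$\mathrm{Prob}(y = 1 \mid x) = \frac{\exp(\beta' x)}{1 + \exp(\beta' x)} \qquad (2)$$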

    The current study follows NRTF (1997) and separately estimates models of households with one or more cars and households with two or more cars. The dependent variable is expressed as the proportion of households owning at least one car in a cohort (R1+) and the proportion of households owning two or more cars, conditional on ownership of at least one car (R2+|1+)³. These variables have the property that they generally increase monotonically as a function of income and other household characteristics.

    The estimation of a standard logit model based on aggregate data is relatively straightforward and can be done using standard econometric software such as Limdep. The log likelihood function is derived here, as it will be required for the estimation of more complex logit models, such as mixed logit and logit with a saturation level, which are discussed in the following sections.

    Let $n_i$ be the number of households in cohort $i$, and $N$ be the total number of cohorts. Let $c_i$ be the number of households owning at least one car in that cohort (the same applies to households owning two or more cars). Further define $r_i = c_i / n_i$, the proportion of households owning at least one car within the cohort. Then, the likelihood function is as follows:

    $$L = \prod_{i=1}^{N} P_i^{c_i} (1 - P_i)^{n_i - c_i} = \prod_{i=1}^{N} \left[ P_i^{r_i} (1 - P_i)^{1 - r_i} \right]^{n_i} \qquad (3)$$

    where $P_i$ is the probability of an average household in cohort $i$ owning at least one car, as defined by equation (2).
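As a rough illustration of how the likelihood based on proportions data can be evaluated, the sketch below computes the logarithm of equation (3), i.e. the sum over cohorts of n_i [r_i ln P_i + (1 - r_i) ln(1 - P_i)], with P_i taken from the logit formula (2). The function names, the cohort tuples and the coefficient values are hypothetical; the paper's own estimation was carried out in GAUSS and is not reproduced here.

```python
import math

def logit_prob(beta, x):
    """Equation (2): P = exp(b'x) / (1 + exp(b'x)), in a numerically stable form."""
    z = sum(b * v for b, v in zip(beta, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(beta, cohorts):
    """Log of equation (3): sum_i n_i [r_i ln P_i + (1 - r_i) ln(1 - P_i)].

    cohorts: list of (n_i, r_i, x_i), where n_i is the number of households,
    r_i the observed proportion owning a car and x_i the cohort-mean regressors.
    """
    total = 0.0
    for n, r, x in cohorts:
        p = logit_prob(beta, x)
        total += n * (r * math.log(p) + (1.0 - r) * math.log(1.0 - p))
    return total

# Hypothetical cohorts: (size, proportion with a car, [constant, log income]).
cohorts = [(150, 0.60, [1.0, 5.2]), (220, 0.75, [1.0, 5.8])]
ll = log_likelihood([-11.0, 2.0], cohorts)
```

A maximum likelihood routine would search over `beta` for the value that maximises this function.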


    The logit model is estimated using maximum likelihood in GAUSS. Following the general-to-specific modelling approach, the initial model includes nine explanatory variables. For the model of one plus car (dependent variable R1+), all but three variables, MET, RURAL and LNMCOST, are significant at the 1% level. Table 3.1 describes the explanatory variables included in the model.

    Table 3-1 Description of explanatory variables

    Variable    Description
    LNINC       Log of household disposable income (1995 prices)
    HHSIZE      Household size
    WORKER      Number of employed persons in the household
    LNAGE       Log of age of household head
    LNMCOST     Log of index of real motoring costs (all costs, GB*)
    MET         Proportion of households living in metropolitan area
    RURAL       Proportion of households living in rural area
    AGEDUMMY    Dummy variable for "young" household, whose head is younger than 50

    (* Source: Transport Trends, 2004)

    Consequently, a reduced model with six variables was re-estimated. The log-likelihood of the reduced model is -70290.4. As the log-likelihood of the full model is -70287.4, the hypothesis that the reduced model is as good as the full model is not rejected. A similar model was estimated for conditional two plus cars (dependent variable R2+|1+). AGEDUMMY was not significant for the R2+|1+ model and was thus omitted. Table 3.2 presents the results of the model of one plus car and that of conditional two plus cars.
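The comparison of the two log-likelihoods can be checked with a short calculation. The sketch below assumes that the comparison is a standard likelihood ratio test with three restrictions (nine variables in the full model, six in the reduced one); the text does not name the test, so this is an assumption. The critical value is the standard tabulated 5% point of a chi-squared distribution with three degrees of freedom.

```python
def likelihood_ratio_statistic(ll_restricted, ll_unrestricted):
    """LR = 2 * (lnL_unrestricted - lnL_restricted)."""
    return 2.0 * (ll_unrestricted - ll_restricted)

# Log-likelihoods reported in the text.
ll_full = -70287.4      # nine explanatory variables
ll_reduced = -70290.4   # six explanatory variables

lr = likelihood_ratio_statistic(ll_reduced, ll_full)

# Assumed test: chi-squared with 3 degrees of freedom (three dropped
# variables); 7.815 is the standard tabulated 5% critical value.
CHI2_5PCT_3DF = 7.815

reject_reduced = lr > CHI2_5PCT_3DF
print(f"LR = {lr:.1f}, reject reduced model: {reject_reduced}")
```

Under these assumptions the statistic is about 6.0, below the critical value, which agrees with the statement that the reduced model is not rejected.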

    Table 3-2 Modelling results of Car1+ and Car2+|1+ (t-statistics in parentheses)

              Constant    LNINC       HHSIZE      WORKER      LNAGE       AGEDUMMY
    R1+       -11.827     2.317       0.392       -0.450      -0.206      -0.378
              (-42.855)   (42.832)    (13.889)    (-13.367)   (-5.267)    (-10.045)
    R2+|1+    -13.293     2.172       -0.301      0.277       0.052
              (-46.543)   (38.203)    (-15.510)   (8.880)     (2.043)

    Besides the constant, LNINC is the most significant explanatory variable, and its parameter is a large positive value. This suggests that an increase in average household disposable income has a significant and positive impact on the proportion of car-owning households in a cohort. Due to the interaction between the explanatory variables, it is difficult to directly interpret the impact


    Lagged effects in a binary choice setting can arise from three sources: serial correlation in $\varepsilon_{it}$, the heterogeneity, $\alpha_i$, or true state dependence through the term $y_{i,t-1}$. As modelling dynamic effects in binary choice models is more complex than in the linear model, there are comparatively few firm results in the applied literature (Greene, 2003). Traditionally, the focus has been on methods of avoiding the strong parametric assumptions of the probit and logit models. A different approach is taken in the current study, i.e. the method of Maximum Simulated Likelihood.

    The use of simulation enables the estimation of a highly flexible model, the mixed logit model. Mixed logit alleviates the three limitations of standard logit by allowing for random taste variation, unrestricted substitution patterns and correlation in unobserved factors over time (Train, 2003). In recent years, the mixed logit model has been applied in many empirical studies based on panel data (see, for example, Revelt and Train, 1998; Bhat, 2000) as well as cross-sectional data (e.g. Bhat, 1998; Brownstone and Train, 1999). This study is the first application of the mixed logit model in the pseudo panel setting.

    Mixed logit probabilities are the integrals of standard logit probabilities over a density of parameters. The choice probabilities of a mixed logit model can be expressed as follows:

    $$P_i = \int p_i(\beta) \, f(\beta \mid \theta) \, d\beta \qquad (6)$$

    where $p_i(\beta)$ is the logit probability evaluated at $\beta$ (for binary choice model this is defined by equation 1) and $f(\beta \mid \theta)$ is the density function of $\beta$, whose parameters are denoted as $\theta$. The method of Maximum Simulated Likelihood is based on the simulated probability of (6):

    $$\tilde{P}_i = \frac{1}{D} \sum_{d=1}^{D} p_i(\beta^d) \qquad (7)$$

    where $\beta^d$ is a value of $\beta$ from the $d$th draw of $f(\beta \mid \theta)$ and $D$ is the number of draws. By construction, $\tilde{P}_i$ is an unbiased estimator of $P_i$. Inserting the simulated probability into the log likelihood function of the binary choice model based on aggregate data (4) gives a simulated log likelihood function:

    $$SLL = \sum_{i=1}^{N} n_i \left[ r_i \ln(\tilde{P}_i) + (1 - r_i) \ln(1 - \tilde{P}_i) \right] \qquad (8)$$
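The simulation step in equations (7) and (8) can be illustrated with a minimal Python sketch. It is not the GAUSS routine used in the study: the function names and the cohort data are hypothetical, and the coefficients are drawn independently from normal distributions, which is a simplifying assumption (a full multivariate normal would also allow correlation between coefficients).

```python
import math
import random

def logit(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def simulated_prob(x, mean, std, n_draws, rng):
    """Equation (7): average the logit probability over draws of beta.

    Each coefficient is drawn independently from N(mean_k, std_k^2);
    independence across coefficients is a simplifying assumption.
    """
    total = 0.0
    for _ in range(n_draws):
        beta = [rng.gauss(m, s) for m, s in zip(mean, std)]
        total += logit(sum(b * v for b, v in zip(beta, x)))
    return total / n_draws

def simulated_log_likelihood(cohorts, mean, std, n_draws=200, seed=0):
    """Equation (8): sum_i n_i [r_i ln P~_i + (1 - r_i) ln(1 - P~_i)]."""
    rng = random.Random(seed)  # fixed seed, so repeated calls reuse the same draws
    sll = 0.0
    for n, r, x in cohorts:
        p = simulated_prob(x, mean, std, n_draws, rng)
        sll += n * (r * math.log(p) + (1.0 - r) * math.log(1.0 - p))
    return sll

# Hypothetical cohorts: (size, proportion with a car, [constant, log income]).
cohorts = [(150, 0.60, [1.0, 5.2]), (220, 0.75, [1.0, 5.8])]
sll = simulated_log_likelihood(cohorts, mean=[-11.0, 2.0], std=[0.5, 0.1])
```

Fixing the seed keeps the draws identical across evaluations, so an optimiser searching over `mean` and `std` sees a smooth objective rather than simulation noise.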


    with $\varepsilon_{it}$ being iid (independent and identically distributed) extreme value over time and people. Conditional on $\beta$, the probability that a person makes a sequence of choices over $T$ periods is the product of logit formulas:

    $$p_{it}(\beta) = \prod_{t=1}^{T} \frac{\exp(\beta' x_{it})}{1 + \exp(\beta' x_{it})} \qquad (10)$$

    since the $\varepsilon_{it}$'s are independent over time. The unconditional probability is the integral of this product over all values of $\beta$:

    $$P_{it} = \int p_{it}(\beta) \, f(\beta) \, d\beta \qquad (11)$$

    As the only difference in mixed logit with repeated choices is that the integrand involves a product of logit formulas, the probability is simulated similarly to the probability with one choice period. Lagged dependent variables can be added to the mixed logit model in a given period to represent lagged response behaviour without changing the estimation procedure (Train, 2003). Conditional on $\beta$, the only remaining random terms in the mixed logit are the $\varepsilon_{it}$'s, which are independent over time. A lagged dependent variable entering $y_{it}^*$ is uncorrelated with these remaining error terms for period $t$, since these terms are independent over time. The conditional probabilities are therefore the same as in equation (10), but with the $x$'s including lagged dependent variables, and the unconditional probability is the integral of this conditional probability over all values of $\beta$ (equation 11).

    Extending the static models in Table 3.2 to include a lagged dependent variable yields a dynamic pseudo panel model. It is assumed that the parameter vector $\beta$ follows a multivariate normal distribution⁵. The model was estimated using the method of Maximum Simulated Likelihood in GAUSS. For both models (R1+ and R2+|1+), the means of most parameters in $\beta$ are statistically significant at the 1% level. However, the standard deviations of all parameters in $\beta$ are close to zero and none is significant⁶, which seems to suggest that after the data have been aggregated in a pseudo panel, heterogeneity across cohorts becomes insignificant.

    Table 3-3 Results of dynamic pseudo panel model for Car1+ and Car2+|1+ (t-statistics in parentheses)

                  Constant    LNINC       HHSIZE      WORKER      LNAGE       AGEDUMMY    LagY
    R1+   Mean    -6.0930     1.0131      0.1529      -0.1712     -0.1697     -0.1454     2.5073
                  (-14.4190)  (11.2280)   (4.6780)    (-4.3350)   (-4.0860)   (-3.5530)   (19.3560)


    Table 3.3 shows the estimated means and standard deviations of the parameters for the model of one plus car and that of conditional two plus cars. The lagged dependent variable is highly significant, suggesting that the proportion of households owning cars in a cohort in one year is strongly influenced by that proportion in the previous year. The model also indicates that household disposable income has the most significant impact on car ownership among all independent variables (except for the constant term). Evaluated at the means of the regressors, the income elasticity⁷ is 0.312 for households owning at least one car and 0.468 for those owning two plus cars given ownership of one car. The log likelihood for R1+ and R2+|1+ is -66085 and -64999 respectively.

    Due to the non-linear specification of the model, it is not possible to obtain the long run income elasticity directly. Hence, this study uses a linear Taylor approximation to transform the long run equilibrium equation. The expansion points are the weighted average values of R1+ and R2+|1+ respectively (R1+ = 0.711; R2+ = 0.277). Evaluated at the chosen expansion point and the means of the regressors, the long run income elasticity is 0.618 for R1+ and 0.837 for R2+|1+. This implies that the long run elasticity for households owning at least one car is about double that in the short run, while for multi-car households the long run elasticity is about 80% higher than that in the short run. Note that this difference is not as large as that reported in Dargay and Vythoulkas (1999), where the long run income elasticity is about three times the short run one.
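One way to see why the long run elasticity is roughly twice the short run one is to linearise a dynamic logit around the expansion point. The derivation below is a sketch under stated assumptions, not the paper's own calculation. It assumes that the index takes the form $z_t = \beta_{inc} \ln INC_t + \gamma P_{t-1} + \dots$, with $\gamma$ the LagY coefficient (2.5073 in Table 3.3), and it evaluates everything at the expansion point $\bar{R} = 0.711$ rather than at the means of the regressors, so the numbers differ slightly from those reported above.

```latex
% Assumed model: P_t = \Lambda(z_t), with \Lambda(z) = e^z / (1 + e^z)
\begin{align*}
\text{Short run:}\quad
  \frac{\partial \ln P_t}{\partial \ln INC_t}
  &= \beta_{inc}\,(1 - P_t) \\
\text{Long run } (P_t = P_{t-1} = \bar{P}):\quad
  d\bar{P} &\approx \bar{R}(1-\bar{R})\,
     \bigl(\beta_{inc}\, d\ln INC + \gamma\, d\bar{P}\bigr) \\
  \left.\frac{d \ln \bar{P}}{d \ln INC}\right|_{\bar{P}=\bar{R}}
  &= \frac{\beta_{inc}\,(1-\bar{R})}{1 - \gamma\,\bar{R}(1-\bar{R})} \\
\text{Ratio of long run to short run:}\quad
  \frac{1}{1 - \gamma\,\bar{R}(1-\bar{R})}
  &= \frac{1}{1 - 2.5073 \times 0.711 \times 0.289} \approx 2.06
\end{align*}
```

Under these assumptions the multiplier is of the same order as the reported ratio 0.618 / 0.312, which is about 1.98.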

    3.3 Dynamic Model with Saturation Level

    In car ownership forecasting models, saturation is an important concept. It is a limit on the choices faced by the decision maker, which may be reached but not exceeded. A model with a saturation level explicitly assumes that increasing income will bring car ownership levels closer to, but never in excess of, a saturation limit. Similar models that restrict the range of possible choice fractions have been used under the name of DOGIT. Although these saturation models are well established, there have been problems with estimating them. In the current study, the attempt to estimate the following model directly failed:

    $$P_i = S \, \frac{\exp(\beta' x)}{1 + \exp(\beta' x)} \qquad (12)$$

    where $S$ is the saturation level. However, after a bit of manipulation, it is possible to transform $S$ into a linear term in the exponential function. Rewrite


    Then,

    $$P_i = \frac{\exp(\beta' x)}{1 + \exp(\beta' x) + \exp(-S^*) + \exp(\beta' x - S^*)} \qquad (15)$$

    Instead of directly estimating $S$, we now estimate a linear term $S^*$ in the exponential function. One advantage of this formulation is that it implicitly constrains $S$ within the range of zero and one. It should be noted that equation (15) is equivalent to the choice probability function of the nested logit model proposed by Daly (1999) and Whelan et al. (2000).

    Assuming $\beta$ follows a multivariate normal distribution, expression (15) can be simulated in the same manner as (6). The simulated probability is inserted in the log likelihood function (4), which enables the model to be estimated by a similar GAUSS routine. Table 3.4 presents the results for the models of R1+ and R2+|1+.

    Table 3-4 Results of dynamic pseudo panel model with saturation level (t-statistics in parentheses)

                    Constant    LNINC       HHSIZE      WORKER      LNAGE       AGEDUMMY    LagY        S*
    R1+     Mean    -8.2544     1.3945      0.2415      -0.2262     -0.1317     -0.1725     2.5309      2.5759
                    (-9.5190)   (8.5480)    (4.4400)    (-4.1280)   (-2.2610)   (-2.9860)   (16.3860)   (10.3420)
            Std Dv  0.0001      0.0000      0.0004      0.0008      0.0001      0.0006      0.0006      N.A.
                    (0.0110)    (0.0010)    (0.1050)    (0.1090)    (0.0410)    (0.0490)    (0.0530)
    R2+|1+  Mean    -5.5842     0.7730      -0.0742     0.1252      -0.1294                 5.4581      0.4120
                    (-9.2090)   (6.3590)    (-2.2330)   (2.4330)    (-3.3900)               (16.5520)   (4.6000)
            Std Dv  0.0002      0.0002      0.0001      0.0006      0.0003                  0.0061      N.A.
                    (0.0160)    (0.0850)    (0.0400)    (0.0880)    (0.1170)                (0.1800)

    For both models, the means of most parameters in vector $\beta$ are significant at the 1% level (AGEDUMMY is not significant for the R2+|1+ model and is thus omitted). None of the standard deviations of $\beta$ is significant. The log likelihood for R1+ is -66079, while that for R2+|1+ is -64967. Compared to the models without a saturation level (Table 3.3), this represents an increase in log likelihood of 6 and 32 respectively, i.e. likelihood ratio statistics of 12 and 64. As the 5% critical value of a chi-squared statistic with one degree of freedom is 3.84, this suggests that the explanatory power of the models increases after the saturation level is included. This result is particularly significant for the R2+|1+ model, as the saturation level for R2+|1+ is substantially lower than one.

    From equation (14), it is easy to derive the saturation level $S$ based on the estimated $S^*$:


    4. CAR DEMAND FORECAST AND EVALUATION

    After a robust model is estimated, the next step is to apply the model to demand forecasting. Due to data availability problems, the geographic area covered is limited to Great Britain only (as opposed to the United Kingdom), and the forecast horizon is between 2001 and 2021. As the model is estimated using pseudo panel data, the common problem of aggregation bias in models based on individual data can be avoided. However, there are two important issues that need to be resolved in the pseudo panel setting. The first is the treatment of new cohorts. The second is the separation of the cohort effect and the time trend effect in the input data.

    The compiled pseudo panel dataset includes 16 cohorts. The head of the oldest cohort was born between 1901 and 1905, while that of the youngest cohort was born between 1976 and 1980. Over the forecast horizon, five new cohorts will be introduced, the youngest of which has a head born between 2001 and 2006. For a longer forecast period, there will be more new cohorts. Whether the estimated model is applicable to these new cohorts remains a question. Since the model concerned is a random parameter model and none of the standard deviations of the parameters is significant, the hypothesis of heterogeneity across cohorts appears to be rejected. As a result, it can be argued that the marginal effects of all explanatory variables remain the same for all cohorts and that it is acceptable to apply the model to the new cohorts.

    For each of the twenty cohorts⁸ over the forecast period, two categories of input data are required: number of households and characteristics of households (household disposable income, household size, number of workers in the household, age of household head). The analysis of cohort characteristics in Section 2 reveals that for pseudo panel data, household characteristics such as income go through a life cycle peaking when the head is in their late 40s; furthermore, at a given age, households in younger cohorts tend to have higher income than those in older cohorts. In order to estimate future year inputs to the model, the current study develops a household sub-model, which includes 81 overlapping age bands and explicitly separates the cohort and time trend effects. The first part of this chapter describes the household sub-model in further detail; the second part presents the forecast results and compares them to the observed data between 2001 and 2004.

    4.1 Household sub-model

    The household sub-model estimates the total number of households for each cohort as well as relevant household characteristics, which are input to the car


    2. For each of the 81 age bands, forecast the future year figure based on standard growth assumptions, derived from various sources. This stage introduces the trend effect over time.

    3. The first two stages have produced four 81 by 21 matrices (81 age bands by 21 years by 4 variables). Within each matrix, identify the twenty cohorts by the age of the household head. For example, in 2001, age band 16-20 is in effect the cohort whose head was born between 1981 and 1985 (Cohort ID F5); age band 21-25 is the cohort born between 1976 and 1980 (ID F6). In 2002, it is age band 17-21 that refers to cohort F5 and age band 22-26 that refers to cohort F6. Similarly, age band 36-40 refers to cohort F5 and age band 41-45 refers to cohort F6 in 2021.
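The correspondence between overlapping age bands and birth cohorts in stage 3 is simple arithmetic, sketched below. The function names are hypothetical; the cohort label F5 and its birth years are taken from the example in the text.

```python
def cohort_for_age_band(year, age_low, age_high):
    """Birth-year range of household heads aged age_low..age_high in `year`."""
    return (year - age_high, year - age_low)

def age_band_for_cohort(year, born_from, born_to):
    """Age band occupied in `year` by the cohort born born_from..born_to."""
    return (year - born_to, year - born_from)

# Trace the cohort born 1981-1985 (labelled F5 in the text) across the
# 21 forecast years, 2001 to 2021.
f5_track = {year: age_band_for_cohort(year, 1981, 1985)
            for year in range(2001, 2022)}
```

Reading one age band per year along this track out of an 81 by 21 matrix gives the time series for a single cohort, which is how the age-band forecasts are converted back into cohort inputs.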

    The base year household characteristics data are estimated from the Family Expenditure Survey. The Office for National Statistics product, Focus on Family (ONS, 2005), contains data on the number of families based on the 14 age bands of the family reference person in 2001. By further taking into account the number of one-person households in different age groups, it is possible to derive the number of households for all cohorts in 2001.

    The future year growth assumptions are derived from various socio-economic forecasts. In particular, the assumption on household number growth was obtained from ODPM (1999) and Scottish Executive (2002). Real disposable income is assumed to grow in line with Gross Domestic Product (GDP), adjusted by the number of households in each cohort. The assumption on GDP growth is obtained from Treasury (2005a; 2005b).

    4.2 Car Demand Forecast and Model Performance Evaluation

    For every year, the total number of cars is estimated by multiplying the total number of households by the proportion of car-owning households for each cohort, and summing over all cohorts:

    $$TC_t = \sum_i \left[ HH_{it} P^{1+}_{it} + HH_{it} P^{1+}_{it} P^{2+|1+}_{it} (AC_{it} - 1) \right]$$

    where
    $TC_t$ = total number of cars in year $t$;
    $HH_{it}$ = total number of households for cohort $i$ in year $t$;
    $AC_{it}$ = average number of cars in multi-car households for cohort $i$ in year $t$.
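The aggregation over cohorts can be sketched as below. The decomposition is an assumption based on the definitions above: households with exactly one car contribute one car each, and multi-car households contribute the average AC. The dictionary keys and the input numbers are made up for illustration.

```python
def total_cars(cohorts):
    """Total cars in one year, summed over cohorts.

    Each cohort is a dict with hypothetical keys:
      hh  - number of households
      p1  - proportion owning at least one car
      p21 - proportion owning two or more cars, given at least one
      ac  - average number of cars in multi-car households
    Algebraically this equals hh * [p1 + p1 * p21 * (ac - 1)].
    """
    total = 0.0
    for c in cohorts:
        one_car = c["hh"] * c["p1"] * (1.0 - c["p21"])       # exactly one car
        multi_car = c["hh"] * c["p1"] * c["p21"] * c["ac"]   # two or more cars
        total += one_car + multi_car
    return total

# Made-up inputs for a single year and two cohorts.
example = [
    {"hh": 1000, "p1": 0.8, "p21": 0.25, "ac": 2.2},
    {"hh": 500, "p1": 0.6, "p21": 0.10, "ac": 2.0},
]
cars = total_cars(example)
```

With these inputs the first cohort contributes 600 single cars plus 440 cars in multi-car households, and the second contributes 270 plus 60.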

    The base year value of $AC_i$ is derived using the Family Expenditure Survey


    Table 4-1 Average number of cars in multi-car household

    Year    16-19    20-25    25-45    45-65    65-75    75+
    2001    2.009    2.029    2.163    2.306    2.071    2.009
    2006    2.053    2.073    2.210    2.356    2.117    2.053
    2011    2.075    2.095    2.234    2.382    2.140    2.075
    2016    2.082    2.103    2.242    2.390    2.147    2.082
    2021    2.084    2.105    2.244    2.392    2.149    2.084

    P1+ and P2+|1+ are forecast using the parameters estimated from the car ownership models (see Table 3.4 in the previous section) and output from the household sub-model. Figures 4.1 and 4.2 show the estimated proportion of households owning at least one car and that of households owning two or more cars (P2+ = P1+ * P2+|1+). Both show a strong cohort effect and time trend effect. For a given year, the proportion of households owning cars is lower for young and old cohorts and reaches its peak when the household head is aged around 50. The time trend effect is also clear, as at a given age the proportion of car ownership is higher in later years, although the difference becomes smaller as the household head gets older. Finally, it is worth noting that the distribution curve across cohorts becomes much less peaked in 2021 compared to 2001, indicating a strong saturation effect. Table A2 and Table A3 in the appendix present the proportion of car ownership for all cohorts over the forecast period.

    Figure 4-1 Forecast proportion of households owning at least one car, five selected years

    [Figure: "Proportion of household owning at least one car, Comparison by Years". Vertical axis: proportion, 0 to 1. Horizontal axis: birth cohorts, 1906-10 to 1996-00. One curve for each of 2001, 2006, 2011, 2016 and 2021.]


    Figure 4-2 Forecast proportion of households owning two or more cars, five selected years

    [Figure: "Proportion of household owning two or more cars, Comparison by Years". Vertical axis: proportion, 0 to 0.5. Horizontal axis: birth cohorts, 1906-10 to 1996-00. One curve for each of 2001, 2006, 2011, 2016 and 2021.]

    As four years of car ownership data (2001 to 2004) are available within the forecast period, the model performance can be evaluated by comparing the forecasts with the observed data. The forecast total number of cars is higher than the total number of private cars currently licensed in Great Britain by between 1.15% and 2.40%. Taking into account the unlicensed car stock, the overall forecast results seem very accurate. Table 4.2 compares the forecast and actual private car stock in Britain.

    Table 4-2 Forecast and Actual Private Car Stock in Britain, 2001-2004

    Year    Total Cars    Vehicle Licensed (a)    Difference    Estimated Unlicensed PLG Stock (b)
    2001    24,473        23,899                  2.40%         -
    2002    24,868        24,543                  1.33%         4.40%
    2003    25,420        24,985                  1.74%         -
    2004    26,052        25,755                  1.15%         2.90%

    (Source: a DfT, 2005; b DfT, 2004)
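The Difference column of Table 4-2 can be checked with a short calculation. The sketch below assumes the difference is the forecast minus the licensed stock, expressed as a percentage of the licensed stock; the paper does not state the formula, and the function name is illustrative. Because the inputs are the rounded figures printed in the table, a recomputed value may differ from the reported one in the last digit.

```python
# Figures copied from Table 4-2:
# year -> (forecast total cars, licensed vehicles, reported difference in %)
TABLE_4_2 = {
    2001: (24473, 23899, 2.40),
    2002: (24868, 24543, 1.33),
    2003: (25420, 24985, 1.74),
    2004: (26052, 25755, 1.15),
}


def pct_difference(forecast: float, observed: float) -> float:
    """Forecast error as a percentage of the observed value (assumed definition)."""
    return 100.0 * (forecast - observed) / observed


for year, (forecast, licensed, reported) in TABLE_4_2.items():
    computed = pct_difference(forecast, licensed)
    print(f"{year}: computed {computed:.2f}%, reported {reported:.2f}%")
```

Under this definition every forecast lies above the licensed stock, and each recomputed figure is within about 0.01 percentage point of the reported one.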

    Another evaluation criterion is to compare the forecast proportion of households owning cars to the observed value. However, the data available


    5. CONCLUSION

    The paper presents a fresh attempt at car demand forecasting. A pseudo panel dataset was constructed from the UK Family Expenditure Survey covering the period 1982 to 2000. It enables the estimation of dynamic models and the identification of long run and short run elasticities. Allowing for state dependence over time is expected to improve the performance of the forecasting model. Furthermore, this study introduces a new approach that enables the direct estimation of the saturation level in a dynamic mixed logit model. The estimated saturation levels are highly significant for both models (one plus car and conditional two plus cars). The explanatory power of the models also improves after the inclusion of the saturation level, especially for the conditional two plus cars model. All these results suggest that it is very important to consider the effects of saturation as an integral part of any car demand forecasting model.

    After a robust model has been estimated, this study forecasts the number of cars in Great Britain to the year 2021. For the four years with data available, the forecast results closely match the actual values. Overall, the performance of the forecast model is highly satisfactory. One particular difficulty encountered in forecasting is the treatment of new cohorts. It is debatable whether they can be treated in the same way as the existing ones. This problem is expected to become more acute if the forecast horizon is extended to 30 or even 50 years, as it becomes increasingly uncertain what the choice behaviour of the newer cohorts will be. The solution might lie in the consideration of the fixed effect and its trend over cohorts, which remains an area for future research.


    Notes:

    1. The author would like to thank Prof. Ron Smith for his detailed and constructive comments.

    2. Hanly and Dargay (2000) was a good attempt to explore the BHPS.

    3. The conditional proportion of households owning two or more cars can be derived from the unconditional proportions of households owning two or more cars and at least one car: R2+|1+ = (R2+) / (R1+).

    4. In the pseudo panel setting, the decision (choice) maker is the average household in a cohort.

    5. Separate models assuming that the random parameters follow a multivariate uniform distribution or a multivariate triangular distribution have been estimated and produced similar results.

    6. These estimated standard deviations are for the underlying distribution of the parameter. Note that the parameter is distributed with density f, which in the current study is defined by two parameters: the mean and the standard deviation.

    7. The elasticity is $(\partial R / \partial LNINC) / R$, LNINC being log disposable income. Note that for discrete choice models, the marginal effect and elasticity depend on the values of the regressors.

    8. The oldest cohort (head born between 1901 and 1906) has been dropped in the forecasting.

    9. For the number of households the base year is 2001, the census year; for household characteristics, the base year is 2000, the last year of the Family Expenditure Survey series.

    10. It is assumed that the growth rate g = 1/x, where x = 2, 3, 4 for the three subsequent five-year periods.

    11. To obtain the multi-car factor for all cohorts, we follow a process similar to the household sub-model, which involves expanding the future year factors to an 81 by 21 matrix.

  • 8/4/2019 Car Demand Forecasting Using Pseudo Panel Method

    19/24

    Reference:

    Anderson, G. F. and Hussey, P. S. (2000), Population Aging: A Comparison among Industrialized Countries, Health Affairs, 19 (3), pp191-204

    Baldini, M. and Mazzaferro, C. (1999), Demographic transition and Household Saving in Italy, paper presented to the Bank of Italy Conference Quantitative Research for Political Economy, Perugia, Dec. 1999

    Beach, C. M. and Finnie, R. (2004), A Longitudinal Analysis of Earnings Change in Canada, Canadian Journal of Economics, 37 (1), pp219-241

    Bhat, C. (1998), Accommodating Variations in Responsiveness to Level-of-Service Variables in Travel Mode Choice Models, Transportation Research A, 32, pp455-507

    Bhat, C. (2000), Incorporating Observed and Unobserved Heterogeneity in Urban Work Mode Choice Modeling, Transportation Science, 34, pp228-238

    Bourguignon, F., Goh, C. and Kim, D. (2004), Estimating individual vulnerability to poverty with pseudo-panel data, World Bank Policy Research Working Paper 3375

    Brownstone, D. and Train, K. (1999), Forecasting New Product Penetration with Flexible Substitution Patterns, Journal of Econometrics, 89, pp109-129

    Brownstone, D., Bunch, D. and Train, K. (2000), Joint Mixed Logit Models of Stated and Revealed Preferences for Alternative-Fuel Vehicles, Transportation Research Part B: Methodological, 34 (5), pp315-338

    Campbell, J. Y. and Cocco, J. F. (2005), How Do House Prices Affect Consumption? Evidence from Micro Data, NBER Working Paper Series No. 11534

    Daly, A. (1999), How Much is Enough? Saturation Effects Using Choice Models, Traffic Engineering and Control, Oct. 1999, pp493-495

    Dargay, J. and Vythoulkas, P. (1999), Estimation of a Dynamic Car Ownership Model, A Pseudo-Panel Approach, Journal of Transport Economics and Policy, 33 (3), pp287-302


    Department for Transport (2005), Transport Statistics Great Britain, 2004, http://www.dft.gov.uk/stellent/groups/dft_control/documents/contentservertemplate/dft_index.hcst?n=11691&l=3

    De Jong, G.C. (1989a), Some joint models of car ownership and car use, Ph.D. thesis, Faculty of Economic Science and Econometrics, University of Amsterdam

    De Jong, G.C. (1989b), Simulating car cost changes using an indirect utility model of car ownership and car use, paper presented at PTRC SAM 1989, PTRC, Brighton

    Garner, B.R., Godley, S. H. and Funk, R. R. (2002), Evaluating Admission Alternatives in an Outpatient Substance Abuse Treatment Program for Adolescents, Evaluation & Program Planning, 25 (3), pp287-295

    Glied, S. (2002), Youth Tobacco Control: Reconciling Theory and Empirical Evidence, Journal of Health Economics, 21 (1), pp117-136

    Greene, W. H. (2003), Econometric Analysis, 5th Edition, New Jersey: Pearson Education Inc.

    Hanly, M. and Dargay, J. (2000), Car Ownership in Great Britain - A Panel Data Analysis, ESRC Transport Studies Unit, University College London

    Hensher, D., Bernard, P.O., Smith, N.C. and Wilthorpe, F.W. (1989), An Empirical Model of Household Automobile holdings, Applied Economics, 21, pp35-57

    Lauer, C. (2003), Family Background, Cohort and Education: A French-German Comparison Based on A Multivariate Ordered Probit Model of Educational Attainment, Labour Economics, 10 (2), pp231-252

    NRTF (1997), National Road Traffic Forecasts (Great Britain) 1997, Working Paper No. 1, Car Ownership: Modelling and Forecasting, Department of the Environment, Transport and the Regions

    Office of Deputy Prime Minister (1999), Projections of households in England 2021, http://www.odpm.gov.uk/stellent/groups/odpm_housing/documents/page/odpm_house_604206.hcsp


    Scottish Executive (2002), Household Projections for Scotland: 2000-Based, http://www.scotland.gov.uk/stats/bulletins/00179-00.asp

    Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press

    Treasury (2005a), HM Treasury Pocket Data Bank, 9th August 2005, http://www.hm-treasury.gov.uk/media/9B0/A8/pdb090805.xls

    Treasury (2005b), Budget 2005, Investing for our future: Fairness and opportunity for Britain's hard-working families, http://www.hm-treasury.gov.uk/budget/budget_05/budget_report/bud_bud05_report.cfm

    Verbeek, M. and Nijman, T. (1992), Can Cohort Data be Treated as Genuine Panel Data? Empirical Economics, 17, pp9-23

    Weir, G. (2003), Self-employment in the UK labour market, Labour Market Trends, 111 (9), pp441-452

    Whelan, G. (2001), Methodological Advances in Modelling and Forecasting Car Ownership in Great Britain, paper presented to European Transport Conference 2001, PTRC, Cambridge

    Whelan, G., Wardman, M. and Daly, A. (2000), Is There a Limit to Car Ownership Growth? An Exploration of Household Saturation Levels Using two Novel Approaches, paper presented to European Transport Conference 2000, PTRC, Cambridge


    Association for European Transport and contributors 2005

    Appendix

    Table A-1 Constructing pseudo panel by household head's date of birth (mean age for all cohorts)

    Group   Born
    16      1976-1980
    15      1971-1975
    14      1966-1970
    13      1961-1965
    12      1956-1960
    11      1951-1955
    10      1946-1950
    9       1941-1945
    8       1936-1940
    7       1931-1935
    6       1926-1930
    5       1921-1925
    4       1916-1920
    3       1911-1915
    2       1906-1910
    1       1901-1905

    Year/Group  16  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1
    1982         -   -   -  19  24  29  34  39  44  49  54  59  64  69  74  79
    1983         -   -   -  20  25  30  35  40  45  50  55  60  65  70  75  80
    1984         -   -   -  21  26  31  36  41  46  51  56  61  66  71  76  81
    1985         -   -   -  22  27  32  37  42  47  52  57  62  67  72  77  82
    1986         -   -   -  23  28  33  38  43  48  53  58  63  68  73  78  83
    1987         -   -  19  24  29  34  39  44  49  54  59  64  69  74  79   -
    1988         -   -  20  25  30  35  40  45  50  55  60  65  70  75  80   -
    1989         -   -  21  26  31  36  41  46  51  56  61  66  71  76  81   -
    1990         -   -  22  27  32  37  42  47  52  57  62  67  72  77  82   -
    1991         -   -  23  28  33  38  43  48  53  58  63  68  73  78  83   -
    1992         -  19  24  29  34  39  44  49  54  59  64  69  74  79  84   -
    1993         -  20  25  30  35  40  45  50  55  60  65  70  75  80  85   -
    1994         -  21  26  31  36  41  46  51  56  61  66  71  76  81  86   -
    1995         -  22  27  32  37  42  47  52  57  62  67  72  77  82  87   -
    1996         -  23  28  33  38  43  48  53  58  63  68  73  78  83   -   -
    1997         -  24  29  34  39  44  49  54  59  64  69  74  79  84   -   -
    1998        20  25  30  35  40  45  50  55  60  65  70  75  80  85   -   -
    1999        21  26  31  36  41  46  51  56  61  66  71  76  81  86   -   -
    2000        22  27  32  37  42  47  52  57  62  67  72  77  82  87   -   -
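The cohort layout of Table A-1 can be reproduced with a small helper. This sketch rests on an inferred rule, not one stated by the author: the ages in the table are consistent with the survey year minus the midpoint of each five-year birth band (for example 1903 for the band 1901-1905). The function names are illustrative.

```python
def cohort_group(birth_year: int) -> int:
    """Map a household head's birth year to a cohort group (1 = born 1901-1905)."""
    if not 1901 <= birth_year <= 1980:
        raise ValueError("birth year outside the cohorts listed in Table A-1")
    return (birth_year - 1901) // 5 + 1


def cohort_birth_band(group: int) -> tuple[int, int]:
    """Return the first and last birth year of a cohort group."""
    first = 1901 + 5 * (group - 1)
    return first, first + 4


def mean_age(group: int, survey_year: int) -> int:
    """Nominal mean age of a cohort: survey year minus the band midpoint (inferred rule)."""
    first, last = cohort_birth_band(group)
    return survey_year - (first + last) // 2


print(cohort_group(1963))    # 13
print(mean_age(13, 1982))    # 19
print(mean_age(16, 2000))    # 22
```

Grouping each survey year's households with `cohort_group` and averaging their variables within each group would then give one pseudo panel observation per cohort and year.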
