Periodic autoregressive model identification using genetic algorithms



Eugen Ursu(a) and Kamil Feridun Turkman(a,*,†)

Periodic autoregressive (PAR) models extend the classical autoregressive models by allowing the parameters to vary with seasons. Selecting PAR time-series models can be computationally expensive, and the results are not always satisfactory. In this article, we propose a new automatic procedure for the model selection problem by using the genetic algorithm. The Bayesian information criterion is used as a tool to identify the order of the PAR model. The success of the proposed procedure is illustrated in a small simulation study, and an application with monthly data is presented.

Keywords: Periodic time series; identification; genetic algorithms; parameter constraints; BIC.

1. INTRODUCTION

Periodic time series are useful for modelling a variety of data sets drawn from climatology, hydrology, economics and electrical engineering, amongst others. See, for example, Lund and Basawa (2000), Basawa and Lund (2001), Lund et al. (2006) and the references cited in these papers for applications of periodic time-series models to these fields. Pioneering work in the statistical literature has been done by Jones and Brelsford (1967), Pagano (1978) and Troutman (1979), who examined the fundamental properties of univariate periodic autoregressive (PAR) processes, the inferential techniques for parameter estimation and prediction, as well as the connections with the related stationary multi-variate autoregressive (AR) process. Historically, the concept of periodically correlated stochastic processes goes back to Gladyshev (1961).

In McLeod (1994), we find a complete description of the usual stages of model development in univariate PAR models: identification, estimation and diagnostic checking.

Model identification is typically the most difficult aspect of the model-building procedure. Noakes et al. (1985) recommend the examination of plots of the periodic partial autocorrelation function as the best way to identify PAR models. This approach works well when the parameter space is not constrained, but it requires a high level of user experience. Another approach is to use an automatic selection criterion, such as the Akaike information criterion (AIC) (Akaike, 1974) or the Bayesian information criterion (BIC) (Schwarz, 1978), when all possible models are examined. A possible difficulty with this procedure is the large number of models to be investigated. For example, for a PAR model with 12 seasons and a maximum of 15 parameters for each season, there are in total 12 × 2^15 = 393,216 models for comparison. For PAR models, the order of the AR operators can differ across the seasons, and the efficient algorithm to search for the optimal AR model proposed by McClave (1975) may be used. The disadvantage of this algorithm was outlined by Gaetan (2000): if several models have the same number of parameters, only one of them will be considered in the comparison procedure. Other relevant references are Boshnakov (1996) for a recursive procedure for the determination of a periodic ARMA model and Tesfaye et al. (2006) for the identification of the PARMA model using the innovations algorithm. For a general discussion of recursive estimation of AR-moving average order, see Hannan and Rissanen (1982).

There are two convenient and efficient estimation techniques available for PAR models, namely the least squares method and the method of moments based on Yule–Walker equations. The least squares method was used in Franses and Paap (2004) for the univariate case and in Lutkepohl (2005) for the multi-variate case. Ursu and Duchesne (2009) recently studied least squares estimators for vector PAR models, considering situations where there are linear constraints on the parameters of a given season. For the method of moments based on Yule–Walker equations and its asymptotic properties, see Pagano (1978), Troutman (1979) and Hipel and McLeod (1994). Pagano (1978) also showed that the estimators for different periods are asymptotically uncorrelated. As a consequence, identification, estimation and diagnostic checking can be carried out independently for each season (Noakes et al., 1985; Hipel and McLeod, 1994).

McLeod (1994) established the asymptotic distribution of the residual autocorrelations, and as a consequence, portmanteau test statistics based on a fixed number of residual autocorrelations were proposed for diagnosing PAR models. As discussed in McLeod (1994) in the univariate case and confirmed in the simulation experiments of Ursu and Duchesne (2009) in the multi-variate framework, a modified version of this test offers better finite-sample properties than the original version.

a CEAUL - FCUL, University of Lisbon
*Correspondence to: Kamil Feridun Turkman, Department of Statistics and Operations Research, Faculty of Sciences, Bloco C6, Campo Grande, 1749-016 Lisboa Codex, Portugal.
†E-mail: [email protected]

Original Article

First version received July 2011. Published online in Wiley Online Library: 19 January 2012

(wileyonlinelibrary.com) DOI: 10.1111/j.1467-9892.2011.00772.x

J. Time Ser. Anal. 2012, 33, 398–405. © 2012 Blackwell Publishing Ltd.



In this article, we develop a selection procedure for PAR models based on one of the new tools for statistical optimization, namely a version of the genetic algorithm (GA). The GA was introduced by Holland (1975) and has become a popular tool due to the work of Goldberg (1989). It is appropriate to use a GA when the solution space is discrete and large, and the objective function does not meet the usual requirements, such as continuity, differentiability and convexity. The advantages and limitations of GAs compared with other optimization techniques are enumerated, for example, in Sivanandam and Deepa (2008). Statistical applications of GAs have been discussed by Chatterjee et al. (1996) and Baragona and Battaglia (2009), amongst others. Various versions of the GA have been proposed and shown to be useful for similar problems in time-series analysis, such as multi-regime models based on thresholds for describing structural breaks (Davis et al., 2008), complicated nonlinear models (Baragona et al., 2004b) and subset AR moving-average identification (Gaetan, 2000).

This article is organized as follows. In Section 2, the PAR model is introduced and least squares estimators are given. In Section 3, an overview of the GA is given, and in Section 4, its implementation for the identification problem is discussed. In Section 5, some simulation results are reported, and in Section 6, an application to the well-known Fraser river data, previously studied in McLeod (1994), is made. Finally, Section 7 offers some concluding remarks.

2. PERIODIC MODELS

The class of PAR models extends the class of AR models by allowing the AR parameters to vary with the seasons. Let Y = {Y_t; t ∈ Z} be a PAR stochastic process given by

    Y_{ns+m} = Σ_{k=1}^{p(m)} φ_k(m) Y_{ns+m−k} + ε_{ns+m}.    (1)

For fixed m and predetermined value s, the random variable Y_{ns+m} represents the realization during the mth season, with m ∈ {1, . . . , s}, at year n, n ∈ Z. The AR model order at season m is given by p(m), whereas φ_k(m), k = 1, . . . , p(m), represent the AR model coefficients during season m, m = 1, . . . , s. The error process ε = {ε_t; t ∈ Z} in eqn (1) corresponds to a periodic white noise, with E(ε_t) = 0 and var(ε_{ns+m}) = σ²(m) > 0, m = 1, . . . , s. In this article, we assume that the PAR process (1) has mean equal to zero in each of the s seasons, that is, E(Y_{ns+m}) = 0, m = 1, 2, . . . , s. In practical applications, trends and seasonal means are first removed from the series, meaning that a model is formulated by examining Y_{ns+m} − μ_m, say, where in general the mathematical expectation E(Y_{ns+m}) = μ_m may be a function of the season m. Note that if s = 1, then model (1) reduces to a classical AR model.
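As a concrete illustration, model (1) can be simulated season by season. The following Python sketch (the authors' own code is in MATLAB; the function and variable names here are ours) generates a zero-mean PAR series, using as an example the DGP1 parameters of Table 1 in Section 5:

```python
import numpy as np

def simulate_par(phi, sigma2, n_years, burn=50, seed=0):
    """Simulate a zero-mean PAR process as in eqn (1).

    phi    : list of s arrays, phi[m-1][k-1] = phi_k(m), k = 1..p(m)
    sigma2 : length-s sequence of seasonal noise variances sigma^2(m)
    A sketch with illustrative names; burn-in discards start-up effects.
    """
    rng = np.random.default_rng(seed)
    s = len(phi)
    total = (n_years + burn) * s
    y = np.zeros(total)
    for t in range(total):
        m = t % s                      # 0-based season index (season m = t % s + 1)
        for k, c in enumerate(phi[m], start=1):
            if t - k >= 0:
                y[t] += c * y[t - k]   # phi_k(m) * Y_{t-k}
        y[t] += rng.normal(0.0, np.sqrt(sigma2[m]))
    return y[burn * s:]

# Example: the four-season model DGP1 of Section 5 (parameters from Table 1)
phi = [np.array([0.30, 0.50]), np.array([0.42]),
       np.array([-0.80, 0.20, 0.35]), np.array([-0.30])]
y = simulate_par(phi, [1.0, 1.6, 2.0, 2.5], n_years=200)
```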

Equation (1) can be written in vector form, as a special case of the multi-variate AR model (Ula, 1990; Franses and Paap, 2004). The stationarity conditions for a multi-variate AR model are well known (see Brockwell and Davis, 1991); therefore, they are also readily available for a PAR model.

Clearly, a periodic model encapsulates s potentially different models, one for each of the seasons. Also, the memory structure reflected by the AR components can vary with the seasons. This fact is illustrated in McLeod (1993) for seasonal river flow time series, since river flows for a particular season of the year may be statistically similar from year to year, but may depend intrinsically on the season, resulting in a periodic correlation structure.

2.1. Unconstrained least squares estimators

Consider the time-series data Y_{ns+m}, n = 0, 1, . . . , N − 1, m = 1, . . . , s, with sample size n = Ns. Let

    z(m) = (Y_m, Y_{s+m}, . . . , Y_{(N−1)s+m})^T,
    e(m) = (ε_m, ε_{s+m}, . . . , ε_{(N−1)s+m})^T,

    X(m) = [ Y_{m−1}          Y_{m−2}          . . .   Y_{m−p(m)}
             Y_{s+m−1}        Y_{s+m−2}        . . .   Y_{s+m−p(m)}
             ⋮                ⋮                ⋱       ⋮
             Y_{(N−1)s+m−1}   Y_{(N−1)s+m−2}   . . .   Y_{(N−1)s+m−p(m)} ]

be the N × 1, N × 1 and N × p(m) random matrices.

The PAR model can be reformulated as

    z(m) = X(m)β(m) + e(m),  m = 1, . . . , s.    (2)

The model parameters are collected in the p(m) × 1 vector β(m), which is defined as

    β(m) = (φ_1(m), . . . , φ_{p(m)}(m))^T.    (3)



The covariance matrix of the random vector e(m) is σ²(m)I_N. The least squares estimators of β(m) are obtained by minimizing

    S(β) = Σ_{m=1}^{s} e^T(m) e(m).    (4)

Consequently, the least squares estimators of β(m) satisfy the relation

    β̂(m) = [{X^T(m)X(m)}^{−1} X^T(m)] z(m),

and the residuals are given by ε̂_{ns+m} = Y_{ns+m} − X_n(m)β̂(m).
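The estimator above amounts to an ordinary least squares regression per season. A minimal Python sketch (our own illustration with invented names, not the authors' MATLAB code) that builds z(m) and X(m) from an observed series and returns β̂(m) together with the residual variance estimate:

```python
import numpy as np

def fit_par_season(y, s, m, p_m):
    """Least squares fit of season m (1-based) of a PAR model, as in eqn (2).

    Builds z(m) and X(m) from a zero-seasonal-mean series y of length N*s and
    returns (b_hat, resid_var). Years whose lags reach before the start of the
    sample are skipped; this is a sketch, not the paper's exact implementation.
    """
    N = len(y) // s
    rows = [n for n in range(N) if n * s + (m - 1) - p_m >= 0]
    t = np.array([n * s + (m - 1) for n in rows])          # 0-based time index
    z = y[t]                                                # z(m)
    X = np.column_stack([y[t - k] for k in range(1, p_m + 1)])  # X(m)
    b_hat, *_ = np.linalg.lstsq(X, z, rcond=None)           # beta_hat(m)
    resid = z - X @ b_hat
    return b_hat, resid @ resid / len(z)
```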

2.2. Least squares estimation with linear constraints on the parameters

Assume that linear constraints on the p(m) × 1 vector β(m) are given in the form

    β(m) = R(m)ξ(m) + b(m),    (5)

where R(m) is a known p(m) × K(m) matrix of rank K(m), ξ(m) is an unrestricted K(m) × 1 vector of unknown parameters and b(m) is a p(m) × 1 vector of known constants. All the linear constraints of interest can be written in this form. For example, the constraint φ_1(m) = 0 can be expressed as in (5) by choosing K(m) = p(m) − 1,

    R(m) = [ 0 0 . . . 0
             1 0 . . . 0
             0 1 . . . 0
             ⋮ ⋮ ⋱  ⋮
             0 0 . . . 1 ],   ξ(m) = (φ_2(m), φ_3(m), . . . , φ_{p(m)}(m))^T,   b(m) = (0, 0, . . . , 0)^T.

Note that the case when R(m) = I_{p(m)}, where I_{p(m)} denotes the p(m) × p(m) identity matrix, and b(m) = 0, m = 1, 2, . . . , s, corresponds to the unconstrained case described in Section 2.1. In general, the matrices R(m) and the vectors b(m) allow for linear constraints on the parameters of the same season m, m = 1, 2, . . . , s. When there are constrained parameters, the residuals in (2) satisfy

    e(m) = z(m) − X(m){R(m)ξ(m) + b(m)}.

Proceeding as in Section 2.1, it is possible to show that the least squares estimator ξ̂(m) of ξ(m) is given by

    ξ̂(m) = [R^T(m)X^T(m)X(m)R(m)]^{−1} R^T(m)X^T(m) [z(m) − X(m)b(m)].

Furthermore, ξ̂(m) asymptotically follows a multi-variate normal distribution. The asymptotic properties of the PAR parameter estimators can be deduced from the results for the multi-variate case (see Ursu and Duchesne, 2009).
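The constrained estimator can be sketched in a few lines of Python (again an illustration with names of our choosing; the toy check below uses a generic design matrix rather than a PAR lag matrix, and imposes the example constraint φ_1(m) = 0):

```python
import numpy as np

def fit_constrained(X, z, R, b):
    """Constrained least squares for one season (sketch of Section 2.2).

    Computes xi_hat = [R^T X^T X R]^{-1} R^T X^T (z - X b) and returns
    both xi_hat and the implied parameter vector beta_hat = R xi_hat + b.
    """
    XR = X @ R
    xi_hat = np.linalg.solve(XR.T @ XR, XR.T @ (z - X @ b))
    return xi_hat, R @ xi_hat + b

# Toy check: constrain phi_1(m) = 0 in a p(m) = 3 model
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_beta = np.array([0.0, 0.5, -0.3])                # phi_1 really is zero
z = X @ true_beta + 0.1 * rng.normal(size=200)
R = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])    # as in the example above
b0 = np.zeros(3)
xi_hat, beta_hat = fit_constrained(X, z, R, b0)       # beta_hat[0] stays 0
```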

2.3. Identification method

Various selection criteria using AIC or BIC can be employed for PAR model identification. One possible way is to employ the BIC selection criterion separately for each of the seasonal components, thus defining the overall BIC criterion as

    BIC = Σ_{m=1}^{s} BIC(m),    (6)

with

    BIC(m) = log σ̂²(m) + {log(N)/N} p(m),    (7)

where σ̂²(m) corresponds to the least squares estimator of σ²(m) and p(m) represents the number of AR parameters in season m; see McLeod (1994) for details. The problem with this piecewise identification method is that the number of possible models may become excessively large, especially for monthly or weekly data, so a random search algorithm may have to be employed. We suggest a version of the GA as a possible solution to this problem.
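Criteria (6) and (7) are straightforward to compute; a small Python sketch (function names are ours):

```python
import math

def bic_season(resid_var, p_m, N):
    """BIC(m) of eqn (7): log sigma_hat^2(m) + p(m) log(N) / N,
    where resid_var is the least squares estimate of sigma^2(m)."""
    return math.log(resid_var) + p_m * math.log(N) / N

def bic_total(resid_vars, orders, N):
    """Overall criterion of eqn (6): the sum of the s seasonal BIC values."""
    return sum(bic_season(v, p, N) for v, p in zip(resid_vars, orders))
```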

3. THE GENETIC ALGORITHM PROCEDURE

The genetic algorithm is a search technique for finding approximate solutions to optimization problems. A simple GA starts with an arbitrarily chosen population of possible solutions of size N_p. Each solution is represented through a chromosome, encoded as a binary string of length L, and its relative performance (fitness) is evaluated. Based on these performances, a new population of potential chromosomes is created using three simple evolutionary operators: selection, crossover and mutation. This is motivated by the hope that the new population will perform better than the previous one.



The three evolutionary operators are as follows:

• Selection: Selection is the process of randomly choosing parents from the population, giving a chromosome a selection probability proportional to the value of its fitness function.

• Crossover: Assuming that the parent chromosomes are considered in pairs (N_p is an even integer), the crossover operation is performed by randomly choosing pairs of parents with probability P_c and mixing these parents. The aim is to produce second-generation chromosomes preserving the best from their parents.

• Mutation: Any part of a chromosome in the population is allowed to change with probability P_m. The mutation operation prevents the algorithm from being trapped in local optima.

This cycle continues until the maximum number of generations N_g is attained, or until a stop condition is reached. An additional step, called the elitist step, may be performed: the best chromosome found in each of the N_g generations is always maintained in the current population. This elite chromosome replaces the chromosome with the worst fitness value in the next generation. These generation-best chromosomes might otherwise be lost if they are not selected to reproduce or if crossover or mutation destroys them.

The convergence to the globally optimal chromosome is based on Holland's schema theorem. For many variations of the basic GA and detailed explanations, see Goldberg (1989) and Mitchell (1996).

4. APPLYING GA TO THE IDENTIFICATION PROBLEM

The large number of possible solutions of the PAR identification problem suggests that a GA will be useful for efficiently examining the space of solutions and selecting the combination of parameters that corresponds to the best model. In this section, the GA will be used for the PAR model identification problem when subset AR models are specified for each season and the space of solutions is rather large.

When using the GA approach, string representations for chromosomes have to be provided. Since the identification will be made separately for each period, one gene is reserved for each possible lag, filled with 1 if the parameter is free and with 0 if the parameter is constrained to zero. The resulting string represents the chromosome. For example, if we take s = 4, m = 1 and p(1) = 15, the model

    Y_{4n+1} = φ_6(1)Y_{4n−5} + φ_7(1)Y_{4n−6} + ε_{4n+1}

is represented in terms of the chromosome

    000001100000000.
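This encoding is easy to implement. A Python sketch (function names are ours) that maps between a set of free lags and the binary chromosome of the example above:

```python
def lags_to_chromosome(lags, L=15):
    """Encode the set of free lags of one season as a 0/1 string of length L."""
    return "".join("1" if k in lags else "0" for k in range(1, L + 1))

def chromosome_to_lags(chrom):
    """Decode a 0/1 string back to the sorted list of unconstrained lags."""
    return [k for k, g in enumerate(chrom, start=1) if g == "1"]

# The example from the text: free parameters at lags 6 and 7, p(1) = 15
assert lags_to_chromosome({6, 7}) == "000001100000000"
```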

An alternative coding and its advantages are described in Baragona and Battaglia (2009). Our implementation of the GA starts with an initial population of chromosomes generated at random.

We evaluate each chromosome by means of a positive real-valued function, called the fitness function, which measures the goodness of a solution. Essentially, the fitness function is linked to some version of an identification criterion. Our choice is the BIC criterion given in eqn (7), one of the most popular identification criteria in time series. A frequent problem with this choice is that the proposed criterion has to be minimized, and therefore cannot be implemented directly as a fitness function (which has to be maximized). Since the BIC given in (7) may be negative, a natural candidate is the simple linear transformation (Goldberg, 1989; Gaetan, 2000)

    f_j(m) = M + 1 − BIC_j(m),    (8)

where BIC_j(m) is the BIC(m) value for the jth chromosome in the current population and M is the maximum value of BIC(m) in the current population. A negative exponential transformation could also be used, as in Baragona et al. (2004a,b).
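Transformation (8) can be sketched directly (the function name is ours); note that the result is strictly positive even when some BIC values are negative:

```python
def fitness(bic_values):
    """Linear transformation of eqn (8): f_j = M + 1 - BIC_j, where M is the
    maximum BIC value in the current population. Always strictly positive."""
    M = max(bic_values)
    return [M + 1.0 - b for b in bic_values]
```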

Based on these fitness values f_j(m), a tournament selection of the individuals is performed. Tournament selection is based on running several tournaments among groups of chromosomes chosen at random from the current population. The winner of each tournament (the one with the best fitness) is selected to be part of a new population, which replaces the previous population. Clearly, each individual can be selected more than once to become a member of the new population. According to Mitchell (1996), rank and tournament selection have become increasingly common in recent years. The tournament selection method is computationally more efficient than rank selection; therefore, we adopt it in our case.
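Tournament selection can be sketched as follows (Python; the tournament size k and the seeding are our choices, since the text does not fix them):

```python
import random

def tournament_select(population, fitnesses, k, rng):
    """One tournament: draw k distinct chromosomes at random, keep the fittest."""
    idx = rng.sample(range(len(population)), k)
    return population[max(idx, key=lambda i: fitnesses[i])]

def next_population(population, fitnesses, k=2, seed=0):
    """Build a new population of the same size by repeated tournaments.
    As noted in the text, an individual may be selected more than once."""
    rng = random.Random(seed)
    return [tournament_select(population, fitnesses, k, rng)
            for _ in range(len(population))]
```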

The crossover probability P_c describes how often crossovers are performed and is usually chosen high, in the interval (0.5, 0.9). Upon studying many variants of the crossover method found in the literature, uniform crossover with P_c = 0.8 was found to be adequate. The probability of mutation P_m is usually assumed small, often taken in the interval (0.001, 0.1); larger values turn the GA into a random search. We chose P_m = 0.01. In our simulations, presented in the following section, we use one elitist chromosome, which replaces the lowest-fitness chromosome. We stop the procedure after a fixed number N_g of generations.
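The two remaining operators can be sketched as follows (Python, with the uniform crossover and the probabilities P_c = 0.8 and P_m = 0.01 adopted in the text as defaults; the function names are ours):

```python
import random

def uniform_crossover(p1, p2, pc=0.8, rng=None):
    """Uniform crossover: with probability pc, each gene of the two children is
    drawn independently from one parent or the other; otherwise the parents
    are copied unchanged."""
    rng = rng or random.Random()
    if rng.random() >= pc:
        return p1, p2
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return "".join(c1), "".join(c2)

def mutate(chrom, pm=0.01, rng=None):
    """Flip each gene independently with probability pm."""
    rng = rng or random.Random()
    return "".join(g if rng.random() >= pm else ("0" if g == "1" else "1")
                   for g in chrom)
```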



5. SIMULATION EXPERIMENTS

It is natural to provide empirical evidence by evaluating the finite-sample properties of the proposed method for identifying the PAR order. Here we report the results of a small Monte Carlo experiment.

To illustrate the selection of PAR models, three data-generating processes (DGP) are used:

    DGP_i : Y_{ns+m} = φ_1(m)Y_{ns+m−1} + · · · + φ_{p(m)}(m)Y_{ns+m−p(m)} + ε_{ns+m},

where i = 1, 2, 3. The models DGP1 and DGP2 correspond to a PAR with four seasons, while DGP3 represents a PAR with 12 seasons. No parameter constraints are hypothesized for model DGP1 (Table 1), but for models DGP2 and DGP3, it is assumed that the zero-valued parameters in Tables 2 and 3 are known. It is worth mentioning that the AR orders differ across the seasons. For each DGP, the stochastic process ε = {ε_t; t ∈ Z} is assumed to be a periodic Gaussian white noise, with mean 0 and variances given in the last column of Tables 1, 2 and 3.

For the purpose of order notation, we indicate only the positions of the unconstrained parameters for each period. For example, for the DGP2 model, the notation

    (p(1), . . . , p(4)) = ((1, 2), (2, 4), (1, 2, 3), (6, 7))

indicates that for the first period the maximum order is 2 and the parameters in the first two positions are unconstrained, whereas for the second period the maximum order is 4 and the parameters in the second and fourth positions are unconstrained, and so on.

We examine the selection algorithm for different lengths: 200 observations per season for DGP1 and DGP2, and 500 observations per season for DGP3. For all three models, seasonal orders vary from 0 to 15, leading to a chromosome of length L = 15. The size of the population, N_p, is chosen in the set {20, 40}, the maximum allowed number of generations, N_g, is equal to 25, 50 or 100, and the number of elite individuals is 1. For each N_p and N_g indicated above, the selection algorithm is applied to 100 independent simulations.

The empirical frequencies with which the true model is detected are given in Table 4. Three statistics are used to summarize the results: q1 is the proportion of simulations in which the true model is detected; q2 is the proportion of simulations in which the detected chromosome differs from the true string in one position (meaning that at most one nonexistent lag was included or one existent lag was excluded); and q3 is the proportion of the remaining cases, namely the proportion of simulations in which the detected chromosome differs from the true string in at least two positions.
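The classification into q1, q2 and q3 is a Hamming-distance computation between binary strings; a Python sketch (names ours):

```python
def hamming(a, b):
    """Number of positions in which two chromosome strings differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(detected, true):
    """Assign a detected chromosome to the q1 / q2 / q3 category of the text:
    exact match, a one-gene miss, or at least two differing genes."""
    d = hamming(detected, true)
    return "q1" if d == 0 else ("q2" if d == 1 else "q3")
```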

Table 1. The autoregressive parameters of the model DGP1; the periodic variances are given in the last column. The autoregressive orders are given by (p(1), . . . , p(4)) = ((1,2), (1), (1,2,3), (1)).

            φ1       φ2      φ3      σ²
    m = 1   0.30     0.50            1.00
    m = 2   0.42                     1.60
    m = 3   −0.80    0.20    0.35    2.00
    m = 4   −0.30                    2.50

Table 3. The autoregressive parameters of the model DGP3; the autoregressive orders are given by (p(1), . . . , p(12)) = ((2), (1,2), (1,4), (1,2,3), (1,5), (1,2), (1), (1,2,3), (2,3), (1), (1,2), (1)).

             φ1       φ2       φ3       φ4      φ5      σ²
    m = 1    0        0.50                              1.00
    m = 2    0.42     0.80                              1.60
    m = 3    −0.80    0        0        0.66            2.00
    m = 4    −0.30    0.20     −0.50                    2.50
    m = 5    0.70     0        0        0       0.80    3.00
    m = 6    0.40     −0.50                             0.90
    m = 7    0.70                                       1.70
    m = 8    −0.60    0.40     −0.30                    2.40
    m = 9    0        0.30     0.70                     1.50
    m = 10   0.90                                       3.20
    m = 11   −0.60    0.40                              2.80
    m = 12   0.72                                       0.70

Table 2. The autoregressive parameters of the model DGP2; the autoregressive orders are given by (p(1), . . . , p(4)) = ((1,2), (2,4), (1,2,3), (6,7)).

            φ1       φ2       φ3      φ4      φ5      φ6      φ7      σ²
    m = 1   0.30     0.50                                             1.00
    m = 2   0        −0.65    0       0.50                            1.60
    m = 3   −0.80    0.30     0.35                                    2.00
    m = 4   0        0        0       0       0       0.81    0.70    2.50



The performance of the GA is quite satisfactory, with a high rate of correct identification. The rate of correct identification clearly depends on N_p, and from this limited simulation study, N_p = 40 seems to be sufficient to obtain excellent results even for models as complex as DGP3. Total computing time is also reasonable. For example, in the worst case, with N_p = 40 and N_g = 100, running the GA for model DGP3 took around 5 minutes. On the other hand, for models DGP1 and DGP2, the computing time ranges from 6 to 45 seconds depending on the population size N_p and the number of generations N_g. All the computer codes are written using the MATLAB software and are available on the web page of the second author. The tests were run on an Intel Quad Core 2.4 GHz processor.

Given the number of parameters involved in periodic time series, it is not really surprising that moderate to large sample sizes are needed to obtain satisfactory results.

6. CASE STUDY: MEAN MONTHLY FLOWS OF THE FRASER RIVER

We analyze the time series of mean monthly flows of the Fraser river discussed in McLeod (1994). The series of mean monthly flows of the Fraser river at Hope, British Columbia, from March 1912 to December 1990 includes 946 observations. The data are transformed by applying the natural logarithm, and the seasonal means are removed. This data set is available as fraser in the pear package (Periodic Autoregression Analysis) of the R software.

Using the plot of the sample periodic partial autocorrelation (see Figure 1), McLeod found the AR orders

    (p(1), . . . , p(12)) = ((1), (1), (1), (1), (1), (1,2,3), (1,2), (1), (1), (1,2,3), (1), (1)).

No parameters were constrained to zero in his model, and the corresponding BIC is −3149.37. We ran the GA for 50 iterations with parameters L = 15, N_p = 40, P_c = 0.8, P_m = 0.01 and with one elite individual. Note that in this case, the number of possible models is approximately 4 × 10^5. The model corresponding to the minimum BIC is a PAR having the AR orders

Table 4. Results from simulation experiments.

    Model   Np    Ng     q1     q2     q3
    DGP1    20     25    72     11     17
            20     50    86      0     14
            20    100    90      0     10
            40     25    99      0      1
            40     50   100      0      0
            40    100    99      0      1
    DGP2    20     25    74     23      3
            20     50    98      2      0
            20    100   100      0      0
            40     25    99      1      0
            40     50   100      0      0
            40    100   100      0      0
    DGP3    20     25    34     35     31
            20     50    80     20      0
            20    100    84     13      3
            40     25    92      6      2
            40     50    97      2      1
            40    100    96      4      0

[Figure 1 near here] Figure 1. Periodic partial autocorrelations for Fraser river data. Period 1 corresponds to January.



    (p(1), . . . , p(12)) = ((1), (1,12), (1), (1), (1,11), (1,2,7), (1), (1), (1), (1,2,3), (1), (1)),

with constraints on the parameters. The constrained parameters are the zero-valued parameters given in Table 5. The corresponding BIC is −3175.53. In this latter case, we add one free parameter, but the BIC is smaller than that obtained by using the model proposed by McLeod. The modified portmanteau test statistic proposed by McLeod (1994) serves the purpose of diagnostic checking. Our model and McLeod's model are comparable, since both models have one (the same) significant value at the 5% level.

It is clear that model identification based on the GA may not always yield a significant improvement in terms of BIC value, but taking into consideration the difficulty of interpreting the plot of the periodic partial autocorrelation function, it is a reliable and easy way of identifying PAR models.

7. CONCLUSIONS

In this article, we propose a new selection method for the class of PAR models. We have examined the GA as a method for identifying subset PAR processes. As seen in the simulation experiments, the rate at which this automatic method correctly identifies the right model is high. The methodology has been applied successfully to the well-known Fraser river monthly data. The comparison of the results with the findings of McLeod (1994) confirms the efficiency of the proposed method.

Subset models are often useful in time-series analysis, especially when the data exhibit some form of periodic behaviour with a range of different natural periods, for instance days, weeks, months and years. Thus, the method we propose in this article seems well suited for use with periodic models.

Acknowledgement

The authors thank an anonymous referee for constructive remarks and suggestions. This research is supported by national grants through the Fundação para a Ciência e a Tecnologia (FCT), Portugal, under the projects SFRH/BPD/63912/2009, PTDC/MAT/118335/2010 and Pest-OE/MAT/UI0006/2011.

REFERENCES

Akaike, H. (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–23.
Baragona, R. and Battaglia, F. (2009) Evolutionary computing in statistical data analysis. Foundations of Computational Intelligence 3, 347–86.
Baragona, R., Battaglia, F. and Cucina, D. (2004a) Estimating threshold subset autoregressive moving-average models by genetic algorithms. Metron 62(1), 39–61.
Baragona, R., Battaglia, F. and Cucina, D. (2004b) Fitting piecewise linear threshold autoregressive models by means of genetic algorithms. Computational Statistics & Data Analysis 47(2), 277–95.

Table 5. Fitted PAR model to the Fraser river data: the estimated parameters with their standard errors in parentheses, followed by the residual standard deviation in the last column. Parameters not listed are constrained to zero. Period 1 corresponds to January.

    m = 1:   φ1 = 0.664 (0.063);                                           σ̂ = 0.029
    m = 2:   φ1 = 0.733 (0.063), φ12 = 0.229 (0.065);                      σ̂ = 0.022
    m = 3:   φ1 = 0.813 (0.074);                                           σ̂ = 0.028
    m = 4:   φ1 = 0.765 (0.123);                                           σ̂ = 0.089
    m = 5:   φ1 = 0.177 (0.065), φ11 = 0.360 (0.137);                      σ̂ = 0.044
    m = 6:   φ1 = 0.278 (0.077), φ2 = −0.211 (0.049), φ7 = 0.181 (0.056);  σ̂ = 0.023
    m = 7:   φ1 = 0.715 (0.105);                                           σ̂ = 0.027
    m = 8:   φ1 = 0.751 (0.067);                                           σ̂ = 0.015
    m = 9:   φ1 = 0.751 (0.089);                                           σ̂ = 0.025
    m = 10:  φ1 = 1.159 (0.137), φ2 = −0.682 (0.210), φ3 = 0.364 (0.167);  σ̂ = 0.036
    m = 11:  φ1 = 0.754 (0.091);                                           σ̂ = 0.052
    m = 12:  φ1 = 0.745 (0.072);                                           σ̂ = 0.039



Basawa, I. V. and Lund, R. B. (2001) Large sample properties of parameter estimates for periodic ARMA models. Journal of Time Series Analysis 22, 651–63.
Boshnakov, G. N. (1996) Recursive computation of the parameters of periodic autoregressive moving-average processes. Journal of Time Series Analysis 17, 333–49.
Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods, 2nd edn. New York: Springer.
Chatterjee, S., Laudato, M. and Lynch, L. A. (1996) Genetic algorithms and their statistical applications: an introduction. Computational Statistics & Data Analysis 22, 633–51.
Davis, R. A., Lee, T. C. M. and Rodriguez-Yam, G. A. (2008) Break detection for a class of nonlinear time series models. Journal of Time Series Analysis 29(5), 834–67.
Franses, P. H. and Paap, R. (2004) Periodic Time Series Models. New York: Oxford University Press.
Gaetan, C. (2000) Subset ARMA model identification using genetic algorithms. Journal of Time Series Analysis 21(5), 559–70.
Gladyshev, E. G. (1961) Periodically correlated random sequences. Soviet Mathematics 2, 385–8.
Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization & Machine Learning. Reading, MA: Addison-Wesley.
Hannan, E. J. and Rissanen, J. (1982) Recursive estimation of mixed autoregressive-moving average order. Biometrika 69, 81–94.
Hipel, K. W. and McLeod, A. I. (1994) Time Series Modelling of Water Resources and Environmental Systems. Amsterdam: Elsevier.
Holland, J. H. (1975) Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press.
Jones, R. H. and Brelsford, W. (1967) Time series with periodic structure. Biometrika 54, 403–8.
Lund, R. B. and Basawa, I. V. (2000) Recursive prediction and likelihood evaluation for periodic ARMA models. Journal of Time Series Analysis 21, 75–93.
Lund, R. B., Shao, Q. and Basawa, I. V. (2006) Parsimonious periodic time series modeling. Australian & New Zealand Journal of Statistics 48, 33–47.
Lutkepohl, H. (2005) New Introduction to Multiple Time Series Analysis. Berlin: Springer.
McClave, J. T. (1975) Subset autoregression. Technometrics 17, 213–9.
McLeod, A. I. (1993) Parsimony, model adequacy, and periodic autocorrelation in time series forecasting. International Statistical Review 61, 387–93.
McLeod, A. I. (1994) Diagnostic checking periodic autoregression models with applications. Journal of Time Series Analysis 15, 221–33.
Mitchell, M. (1996) An Introduction to Genetic Algorithms. Cambridge: MIT Press.
Noakes, D. J., McLeod, A. I. and Hipel, K. W. (1985) Forecasting monthly riverflow time series. International Journal of Forecasting 1, 179–90.
Pagano, M. (1978) On periodic and multiple autoregressions. The Annals of Statistics 6, 1310–7.
Schwarz, G. (1978) Estimating the dimension of a model. Annals of Statistics 6(2), 461–4.
Sivanandam, S. N. and Deepa, S. N. (2008) Introduction to Genetic Algorithms. Berlin: Springer.
Tesfaye, Y. G., Meerschaert, M. M. and Anderson, P. L. (2006) Identification of periodic moving average models and their application to the modeling of river flows. Water Resources Research 42, W01419.
Troutman, B. M. (1979) Some results in periodic autoregression. Biometrika 66, 219–28.
Ula, A. T. (1990) Periodic covariance stationarity of multivariate periodic autoregressive moving average processes. Water Resources Research 26, 855–61.
Ursu, E. and Duchesne, P. (2009) On modelling and diagnostic checking of vector periodic autoregressive time series models. Journal of Time Series Analysis 30(1), 70–96.
