Air quality forecasting using a hybrid autoregressive and nonlinear model



Atmospheric Environment 40 (2006) 1774–1780

www.elsevier.com/locate/atmosenv

doi:10.1016/j.atmosenv.2005.11.019

Air quality forecasting using a hybrid autoregressive and nonlinear model

Asha B. Chelani*, S. Devotta

National Environmental Engineering Research Institute, Nagpur 440 020, India

Received 10 October 2005; received in revised form 27 October 2005; accepted 9 November 2005

*Corresponding author. E-mail address: [email protected] (A.B. Chelani).

Abstract

The usual practice in air quality time-series forecasting is to apply models that deal with either linear or nonlinear patterns. As the linear or nonlinear behavior of a time series is not known in advance, one applies a number of models and finally selects the one that provides the most accurate results. Air pollutant concentration time series contain patterns that are not purely linear or nonlinear, and applying either technique alone may give inadequate results. This study aims to develop a hybrid methodology that can deal with both the linear and nonlinear structure of the time series. The hybrid model combines an autoregressive integrated moving average (ARIMA) model, which deals with linear patterns, with a nonlinear dynamical model. To demonstrate the utility of the proposed technique, nitrogen dioxide concentrations observed at a site in Delhi during 1999–2003 were used. The individual linear and nonlinear models were also applied in order to examine the performance of the hybrid model. Performance is compared for one-step and multi-step ahead forecasts using error statistics such as the mean absolute percentage error and the relative error. The hybrid model is observed to outperform the individual linear and nonlinear models. Exploiting the unique features of both linear and nonlinear models makes it a powerful technique for predicting air pollutant concentrations.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Time-series forecasting; ARIMA; Nonlinear dynamics; Hybrid model

1. Introduction

In the air quality literature, time-series analysis is generally carried out to understand cause-and-effect relationships, which in turn helps in forecasting future concentrations. In this direction, a class of techniques including autoregressive integrated moving average (ARIMA) or Box–Jenkins models (Shi and Harrison, 1997; Milionis and Davies, 1994; Zennetti, 1990) and structural models (Schlink et al., 1997) have been applied to analyze air pollutant concentrations. These approaches are widely applied in the air-quality literature due to the lack of data on emissions of air pollutants. Although these models are quite flexible, as they can represent several different types of time series, their major limitation is the pre-assumed linear form of the model. The approximation of real-world problems by linear models is not always satisfactory. For example, air pollutant concentrations are influenced by several factors in the atmosphere, and prediction using linear models may not always give reasonable results (Benarie, 1987). As an alternative, nonlinear models have been proposed in the literature. Artificial neural networks are one example of nonlinear models that are applied to model and predict air pollutant concentrations (Gardner and Dorling, 1998). These models are generally developed using external inputs such as meteorology and emissions, and the output is the air pollutant concentration at a site (Gardner and Dorling, 1999; Chelani et al., 2002). The application of these models is, however, restricted to the particular cases where data on emissions and meteorological parameters are available.

Recently, time-series forecasting based on nonlinear dynamical theory, or chaos theory, has been extensively studied and used in forecasting air pollutant concentrations (Raga and LeMoyne, 1996; Li et al., 1994; Chen et al., 1998; Kocak et al., 2000). The basic assumption involved in applying these techniques is that a single air pollutant concentration time series contains the effect of all the influencing factors. The major advantage of these techniques over ARIMA is their ability to take into account the nonlinear dynamics involved in the time series. Information about the linearity or nonlinearity of the time series is, however, not available in advance. So one applies a number of linear and nonlinear models and finally selects the one that provides the most accurate results. Also, real time series contain patterns that are not purely linear or nonlinear, and applying either technique alone may give inadequate results. Hence models need to be developed that consider both the linearity and the nonlinearity involved in the time series, which helps in improving forecasting ability. Moreover, combining different models can increase the chance of capturing different patterns in the data and improve forecasting performance (Clemen, 1989; Newbold and Granger, 1974).

In this study, a hybrid methodology is therefore proposed to tackle the problem of modeling air pollutant time series with linear and nonlinear patterns. For this, concepts from the ARIMA model and nonlinear dynamical systems theory are utilized. The proposed technique is applied to the time series of nitrogen dioxide (NO2) concentrations in ambient air measured at a site in Delhi during 1999–2003. In order to assess the forecasting efficiency of the proposed hybrid model, ARIMA and nonlinear models are also developed individually and their results are compared with those of the hybrid model.

2. Box–Jenkins ARIMA models

ARIMA linear models have dominated many areas of time-series forecasting. As the application of these models is very common, they are described here only briefly. In general, a nonseasonal time series $x_t$, $t = 1, \ldots, n$ ($n$ being the number of observations), of air pollutant concentrations measured at equal time intervals can be modeled as a combination of past values and past errors as

$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_p x_{t-p} + e_t - b_1 e_{t-1} - b_2 e_{t-2} - \cdots - b_q e_{t-q}, \tag{1}$$

where $e_t$ is the error term at time $t$, $a$ and $b$ are the coefficients, and $p$ and $q$ are the orders of the autoregressive and moving average polynomials, respectively. Further details on estimating the parameters and the order of the model are given in Box and Jenkins (1970).
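As a hedged illustration of Eq. (1), the sketch below evaluates the recursion for given coefficients, with the unknown future error set to zero. The function name `arma_one_step`, the coefficients and the sample values are assumptions made for illustration; they are not taken from the study.

```python
def arma_one_step(x, e, a, b):
    """One-step forecast from Eq. (1) with the future error e_t set to zero.

    x: past observations, most recent last
    e: past errors (residuals), most recent last
    a: autoregressive coefficients a_1..a_p
    b: moving-average coefficients b_1..b_q
    """
    ar_part = sum(a[i] * x[-1 - i] for i in range(len(a)))
    ma_part = sum(b[j] * e[-1 - j] for j in range(len(b)))
    return ar_part - ma_part


# Hypothetical AR(2) case (q = 0): coefficients and data are made up.
history = [60.0, 64.0, 58.0]
forecast = arma_one_step(history, [], a=[0.5, 0.3], b=[])
print(forecast)
```

With `q = 0` the moving-average sum is empty, so the forecast reduces to the weighted sum of the two most recent observations.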

3. Nonlinear dynamical modeling

Nonlinear dynamical modeling involves reconstructing the phase space of the time series to describe the behavior of a nonlinear system. A phase space is an abstract construct whose coordinates are the components of the state (Cambel, 1993). In general, the phase space is simply the collection of all possible variables underlying the system. The phase space portrait can be analyzed mathematically to demonstrate the presence of an attractor and to estimate its dimension. An attractor characterizes the long-term behavior of the system in the phase space (Martelli, 1999). If an attractor exists, the minimum number of independent variables describing the system can be estimated by computing the dimension of the attractor.

The general nonlinear prediction method is to reconstruct the phase space from the data in a minimum embedding space and then predict the future using a local approximation function computed from the given data (Farmer and Sidorowich, 1987; Abarbanel et al., 1993; Takens, 1981). According to Takens' embedding theorem, predictions can be obtained from the set of previous data points using the functional relationship

$$X_{n+T} = f(X_n), \tag{2}$$

where $X_n$ is a vector of data points defined by

$$X_n = (x_n, x_{n-\tau}, x_{n-2\tau}, \ldots, x_{n-(m-1)\tau}), \tag{3}$$

where $n$ is the index of the data point in the time series, $T$ is the prediction lead time, $\tau$ is the time delay and $m$ is the embedding dimension. The time lag $\tau$ can be obtained by using the mutual information $I(\tau)$ (Fraser and Swinney, 1986), defined as

$$I(\tau) = \sum_{t,\,t+\tau} p(x_t, x_{t+\tau}) \ln \frac{p(x_t, x_{t+\tau})}{p(x_t)\, p(x_{t+\tau})}, \tag{4}$$

where $p(x_t)$ and $p(x_{t+\tau})$ are the probabilities of finding a time-series value in the $t$th and $(t+\tau)$th interval, respectively, and $p(x_t, x_{t+\tau})$ is the joint probability that an observation falls into the $t$th and $(t+\tau)$th interval. These probabilities can be obtained from the histogram of the data. The function is computed for various $\tau$, and the value at which it exhibits its first minimum is taken as the optimum $\tau$. The embedding dimension $m$ of a dynamical system is an integer that gives the number of coordinates necessary to unfold its dynamics. It can be obtained by using the false nearest-neighbor method (Kennel et al., 1992). In this method, for every vector $X_t$, its nearest neighbor $X'_t$ is obtained by computing the distance $\|X_t - X'_t\|$ between the two vectors. The neighbor is declared false if the ratio

$$F_t = \frac{\|X_t(m+1) - X'_t(m+1)\|}{\|X_t(m) - X'_t(m)\|} \tag{5}$$

exceeds a threshold value, say $r$. The details of this method are given in Kennel et al. (1992).
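The histogram estimate of Eq. (4) and the first-minimum rule for choosing the time delay can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the choice of 16 equal-width bins and the search range are hypothetical and are not specified in the text. The result is in nats because Eq. (4) uses the natural logarithm.

```python
import math
from collections import Counter


def mutual_information(series, lag, bins=16):
    """Histogram estimate of I(lag) from Eq. (4), in nats."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0          # guard against a constant series
    # Assign each value to a histogram interval (bin index).
    idx = [min(int((v - lo) / width), bins - 1) for v in series]
    pairs = list(zip(idx[:-lag], idx[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(i for i, _ in pairs)
    right = Counter(j for _, j in pairs)
    total = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j) written with raw counts
        total += p_ij * math.log(p_ij * n * n / (left[i] * right[j]))
    return total


def first_minimum_lag(series, max_lag, bins=16):
    """Smallest lag (from 2 upward) where I(lag) is below both neighbouring lags."""
    mi = [mutual_information(series, lag, bins) for lag in range(1, max_lag + 2)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] < mi[k + 1]:
            return k + 1                     # mi[0] corresponds to lag 1
    return None                              # no local minimum in the range
```

A call such as `first_minimum_lag(series, 30)` would scan lags up to 30, the range shown on the lag axis of Fig. 3; the number of bins affects the estimate and would need to be tuned for real data.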

After computing appropriate values of $\tau$ and $m$, the next step is to establish the functional relationship in Eq. (2) using local approximations. Given the vector $X_n$ from which predictions are to be made, one selects its nearest neighbors $X'_n$ using the Euclidean distance between the vectors. Local functions can then be built that take each point in the neighborhood to the next neighborhood, i.e., $X'_n$ to $X'_{n+T}$. The function $f$ can then be approximated in terms of local polynomial maps, which can be expressed as

$$X'_{n+T} = A + B X'_n + C (X'_n)^2, \tag{6}$$

where $A$, $B$ and $C$ are the coefficients to be determined from the learning set by least-squares estimation. The predicted point is then set as the new starting vector, and the above process can be repeated to predict further values.
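The neighbor search and the iterated prediction described above can be sketched as below. Note the simplification, which is an assumption of this sketch and not the method of the paper: the successors of the nearest neighbors are simply averaged (a local constant fit) instead of fitting the quadratic map of Eq. (6) by least squares. The function names and the default of three neighbors are hypothetical.

```python
def delay_vector(x, n, m, tau):
    """Eq. (3) with a 0-based index n: (x_n, x_{n-tau}, ..., x_{n-(m-1)tau})."""
    return [x[n - k * tau] for k in range(m)]


def local_predict(x, m, tau, k=3, T=1):
    """Predict the value T steps after the end of x from the k nearest delay vectors.

    Simplification: neighbours' successors are averaged rather than fitted
    with the local polynomial of Eq. (6).
    """
    last = len(x) - 1
    query = delay_vector(x, last, m, tau)
    first = (m - 1) * tau                     # earliest index with a full vector
    candidates = []
    for n in range(first, last - T + 1):      # successor x[n + T] must exist
        v = delay_vector(x, n, m, tau)
        dist = sum((a - b) ** 2 for a, b in zip(query, v)) ** 0.5
        candidates.append((dist, x[n + T]))
    if not candidates:
        raise ValueError("series too short for this m and tau")
    candidates.sort(key=lambda c: c[0])       # Euclidean distance, ascending
    nearest = candidates[:k]
    return sum(s for _, s in nearest) / len(nearest)


def iterate_forecast(x, steps, m, tau, k=3):
    """Multi-step forecast: each prediction is appended and used as new input."""
    series = list(x)
    for _ in range(steps):
        series.append(local_predict(series, m, tau, k))
    return series[len(x):]
```

Replacing the averaging step with a least-squares fit of $A$, $B$ and $C$ over the neighborhood would bring the sketch closer to Eq. (6), at the cost of solving a small regression problem for every prediction.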

4. The hybrid methodology

The ARIMA and nonlinear models are useful for modeling linear and nonlinear time series, respectively. ARIMA models, however, do not provide accurate results for nonlinear time series, and applying nonlinear models to linear problems is not a reasonable step. Hence hybridizing the linear and nonlinear models should give better performance than applying the individual models (Gooijer and Kumar, 1992). The hybrid methodology is based on the combination of a linear autocorrelation structure and a nonlinear part, which can be given as

$$y_t = l_t + n_t, \tag{7}$$

where $l_t$ and $n_t$ denote the linear and nonlinear components of the time series $y_t$ (written $x_t$ in Eq. (1)), which are to be estimated from the data. To compute these two components, the first step is to apply the ARIMA model to the time-series data using the procedure described above. The next step is to obtain the residuals of the ARIMA model, which now represent the nonlinear part of the data. Let the residual at time $t$ be denoted $r_t$:

$$r_t = y_t - \hat{l}_t, \tag{8}$$

where $\hat{l}_t$ is the forecast value for time $t$ from Eq. (1). These residuals can be used to evaluate the model performance. Accepting the model implies that linear correlations are not significant in the residuals. However, there may still be some nonlinear dependence among the residuals, which standard residual analysis would not be able to capture. Hence modeling the residuals using nonlinear techniques can provide insight into the nonlinear relationships in the data. The next step in the hybrid methodology is therefore to model the residuals using the nonlinear modeling technique described in Section 3. Let the forecast from the nonlinear model be denoted $\hat{n}_t$. The time-series forecast can then be obtained as

$$\hat{y}_t = \hat{l}_t + \hat{n}_t. \tag{9}$$
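The two-step construction of Eqs. (8) and (9) can be sketched end to end as follows. This is a minimal sketch under several assumptions that are not from the paper: the AR(2) coefficients are obtained by ordinary least squares on the mean-centred series, the residual model is a single nearest-neighbor lookup on windows of three consecutive residuals, and all function names are hypothetical.

```python
def fit_ar2(x):
    """Least-squares AR(2) coefficients (a1, a2) for a mean-centred series."""
    ts = range(2, len(x))
    s11 = sum(x[t - 1] * x[t - 1] for t in ts)
    s12 = sum(x[t - 1] * x[t - 2] for t in ts)
    s22 = sum(x[t - 2] * x[t - 2] for t in ts)
    r1 = sum(x[t] * x[t - 1] for t in ts)
    r2 = sum(x[t] * x[t - 2] for t in ts)
    det = s11 * s22 - s12 * s12               # 2x2 normal equations
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


def nearest_neighbour_next(r, m=3):
    """Predict the next residual as the successor of the closest past window."""
    query = r[-m:]
    best_dist, best_next = None, 0.0
    for i in range(len(r) - m):               # window r[i:i+m], successor r[i+m]
        dist = sum((a - b) ** 2 for a, b in zip(query, r[i:i + m]))
        if best_dist is None or dist < best_dist:
            best_dist, best_next = dist, r[i + m]
    return best_next


def hybrid_forecast(y):
    """One-step hybrid forecast: linear forecast plus predicted residual."""
    mean = sum(y) / len(y)
    x = [v - mean for v in y]
    a1, a2 = fit_ar2(x)
    # Residuals of the linear fit, Eq. (8).
    residuals = [x[t] - (a1 * x[t - 1] + a2 * x[t - 2]) for t in range(2, len(x))]
    linear_next = mean + a1 * x[-1] + a2 * x[-2]
    nonlinear_next = nearest_neighbour_next(residuals)
    return linear_next + nonlinear_next       # Eq. (9)
```

The design point this illustrates is that the nonlinear model never sees the raw series: it only has to explain what the linear model leaves behind.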

Briefly, the hybrid technique consists of two steps: first an ARIMA model is used to model the linear structure in the data, and then a nonlinear model is developed to model the residuals from the ARIMA model. The results from the nonlinear model can be used as predictions of the residual terms of the ARIMA model.

Fig. 1. NO2 time series observed at a site in Delhi during 1999–2003 (axes: year, 1999–2003; NO2 in µg m⁻³).

Fig. 2. Prediction performance of AR(2) model for NO2 concentration observed at a site in Delhi (axes: year, 1999–2003; NO2 in µg m⁻³; series: observed, and predicted using AR(2) model; learning set and prediction set marked).

To evaluate the performance of the proposed model, error statistics such as the correlation between observed and predicted concentrations, root mean square error (RMSE), mean absolute percentage error (MAPE) and relative error (RE) are utilized. These statistics can be obtained as

$$\mathrm{MAPE} = 100 \times \frac{1}{n} \sum_{t=1}^{n} \left| \frac{\mathrm{observed}_t - \mathrm{predicted}_t}{\mathrm{observed}_t} \right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (\mathrm{observed}_t - \mathrm{predicted}_t)^2},$$

$$\mathrm{RE} = \sqrt{\frac{\sum_{t=1}^{n} (\mathrm{observed}_t - \mathrm{predicted}_t)^2}{\sum_{t=1}^{n-1} (\mathrm{observed}_t - \mathrm{observed}_{t+1})^2}}, \tag{10}$$

where 'observed' denotes the observed values and 'predicted' denotes the values estimated by the proposed model.
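The three error statistics in Eq. (10) translate directly into code. The sketch below is illustrative only: the function names are hypothetical and the four sample values are made up, not measurements from the study.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error, Eq. (10)."""
    n = len(observed)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean square error, Eq. (10)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_error(observed, predicted):
    """Relative error, Eq. (10): model error scaled by successive differences."""
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((observed[t] - observed[t + 1]) ** 2
              for t in range(len(observed) - 1))
    return math.sqrt(num / den)


# Made-up sample values for illustration.
obs = [50.0, 60.0, 80.0, 70.0]
pred = [55.0, 57.0, 76.0, 73.0]
print(mape(obs, pred), rmse(obs, pred), relative_error(obs, pred))
```

The denominator of RE is the summed squared change between successive observations, which is the squared error of a forecast that simply repeats the last observed value; an RE below 1 therefore indicates a smaller total squared error than that no-change forecast.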

5. Study area and data used

To examine the effectiveness of the proposed hybrid model on real data, the time series of nitrogen dioxide concentrations observed in ambient air during 1999–2003 at a site in Delhi was used. Delhi, the capital city of India (latitude 28°35′N, longitude 77°15′E), with a population of around 13.8 million, is among the most polluted cities in the world. The region has a semi-arid climate that is often described as tropical plain, with extremely hot summers, heavy rainfall in the monsoon months (approximately 73 cm) and cold winters. The selected site is classified as 'commercial' based on the activities in and around the site. NO2 was sampled every 4 h round the clock, twice a week. Daily averages were taken; hence 104 measurements were available in each year (NEERI Report, 2001, 2002, 2003, 2004).

6. Results and discussions

The time series of NO2 concentrations observed during 1999–2003 is plotted in Fig. 1. The concentration of NO2 varies between 33 and 128 µg m⁻³, with an average and standard deviation of 61 and 17.8 µg m⁻³, respectively. Maximum concentrations are generally observed in the winter months from November to February, with minimum concentrations in the monsoon season from July to October. During the winter months, the levels exceeded the regulatory limits stipulated by the Central Pollution Control Board (CPCB). It can be observed from Fig. 1 that the NO2 time series is stationary with oscillating characteristics. For modeling purposes, the time series is divided into two parts: learning and prediction sets. The data observed during 1999–2002 are considered the learning set and the data observed during 2003 the prediction set.

Codes were written in MATLAB (Beale, 1997) for all the computations. For ARIMA modeling, the order of the model is selected by plotting the autocorrelation and partial autocorrelation functions. An autoregressive model of order 2, i.e. AR(2), is found to be appropriate. With this model order, the autoregressive model was fitted to the learning data. The model parameters were obtained by adopting the Box–Jenkins methodology. The one-step ahead predictions were then obtained and are plotted in Fig. 2. Residual analysis was carried out to examine the significance of autocorrelations in the residuals. No significant autocorrelations were observed in the residuals of the AR(2) model fitted to the NO2 time series. The RMSE, MAPE, RE and correlation between observed and predicted NO2 concentrations are given in Table 1.

Table 1
Forecasting performance of three models for the prediction of NO2 concentration

| Prediction technique | Relative error | Mean absolute percentage error | Root mean square error | Corr* |
|---|---|---|---|---|
| ARIMA | 0.24 | 17.3 | 58.78 | 0.90 |
| Nonlinear prediction | 0.23 | 11.6 | 55.37 | 0.91 |
| Hybrid model | 0.19 | 5.37 | 13.93 | 0.93 |

*Corr indicates the correlation between observed and predicted time series.

Fig. 3. Average mutual information function for NO2 time series (axes: time lag, 0–30; average mutual information in bits).

Fig. 4. Percentage of false nearest neighbours for NO2 time series (horizontal axis labelled "Time Lag", 0–15; vertical axis: false nearest neighbours in %).

Fig. 5. Prediction performance of (a) nonlinear and (b) hybrid model for NO2 time series observed at a site in Delhi (axes: year; NO2 in µg m⁻³; panel (a) spans 1999–2003 and panel (b) spans 1999–2005; series: observed and predicted; learning set and prediction set marked).

For nonlinear modeling, the phase space was reconstructed using Eq. (3). Here the choice of $m$ and $\tau$ is crucial for proper unfolding of the dynamics involved in the time series. The time delay $\tau$ is computed using the average mutual information function, which is plotted against various lags in Fig. 3. The lag at which the average mutual information function exhibits its first minimum is selected as the optimum $\tau$. The mutual information function $I(\tau)$ shows a clear first minimum at $\tau = 13$. Using this value of $\tau$, a false nearest-neighbor search was carried out to compute the embedding dimension $m$. The threshold $r$ is set equal to 15 following Abarbanel (1996), who found that for many nonlinear systems $r$ approaches 15. Fig. 4 shows the percentage of false nearest neighbors as a function of dimension. The minimum embedding dimension is the value for which the percentage of false nearest neighbors goes to zero. For the NO2 time series, the function decreases rapidly, reaching approximately zero at $m = 6$ and remaining constant thereafter. Hence an embedding dimension of 6 should be sufficient to represent the system.
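The false nearest-neighbor percentage plotted in Fig. 4 can be sketched by applying the ratio test of Eq. (5) to every delay vector. This is a brute-force illustration under stated assumptions: the function name is hypothetical, the ratio is implemented exactly as written in Eq. (5) and not as in the full procedure of Kennel et al. (1992), and points whose nearest neighbor lies at zero distance are skipped because the ratio is undefined there.

```python
def false_neighbour_percent(x, m, tau, r=15.0):
    """Percentage of nearest neighbours in dimension m that are false per Eq. (5)."""

    def vec(n, dim):
        return [x[n - k * tau] for k in range(dim)]

    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    start = m * tau                      # the (m+1)-dimensional vector must exist
    indices = range(start, len(x))
    false_count = total = 0
    for n in indices:
        best, best_d = None, None
        for j in indices:                # brute-force nearest neighbour in dim m
            if j == n:
                continue
            d = dist(vec(n, m), vec(j, m))
            if best_d is None or d < best_d:
                best, best_d = j, d
        if best is None or best_d == 0.0:
            continue                     # ratio undefined, skip this point
        total += 1
        ratio = dist(vec(n, m + 1), vec(best, m + 1)) / best_d
        if ratio > r:
            false_count += 1
    return 100.0 * false_count / total if total else 0.0
```

Evaluating this function for increasing `m` and taking the first dimension at which the percentage drops to about zero mirrors the selection rule described above; the default `r=15.0` follows the threshold quoted in the text.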

After selecting appropriate values for $\tau$ and $m$, nonlinear modeling was performed using the method described above. The learning set (1999–2002) was used to reconstruct the phase space with an embedding dimension of 6 and a time lag of 13. For example, the phase space vector $X_n$ for $n = 66$ is formed as $(x_{66}, x_{53}, x_{40}, x_{27}, x_{14}, x_1)$. The nearest neighbors $X'_n$ of $X_n$ ($X'_{66}$ denotes the nearest neighbor of $X_{66}$) were searched for in the learning set. The one-step predictions ($T = 1$ in Eqs. (2) and (6)) were then obtained by least-squares fitting. In this way, to predict each point in the phase space, the function $f$ is approximated using a local polynomial equation, and the coefficients $A$, $B$ and $C$ are learned from the learning set. These coefficients are not constants but depend on the learning samples. For each point, its nearest neighbors are obtained and then projected forward to obtain the next point. The prediction results are plotted in Fig. 5a. The performance statistics were also computed for this model and are presented in Table 1.
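The index arithmetic behind the example vector can be checked with a few lines. The sketch assumes a learning set of exactly 4 × 104 = 416 points with no missing values, which is inferred from the stated sampling scheme and is not given explicitly; the function name is hypothetical.

```python
def delay_indices(n, m, tau):
    """1-based indices of the components of X_n in Eq. (3)."""
    return [n - k * tau for k in range(m)]


m, tau = 6, 13
learning_points = 4 * 104             # 1999-2002, assuming 104 daily averages per year

print(delay_indices(66, m, tau))      # [66, 53, 40, 27, 14, 1]

first_n = 1 + (m - 1) * tau           # smallest n whose last component is x_1
n_vectors = learning_points - first_n + 1
print(first_n, n_vectors)             # 66 351
```

Under these assumptions, $n = 66$ is the first index for which a complete six-component vector exists, and the learning set yields 351 delay vectors.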

Table 2
Effect of prediction horizon on forecasting performance of three models

| Prediction technique | 8-period ahead RE | 8-period ahead MAPE | 24-period ahead RE | 24-period ahead MAPE |
|---|---|---|---|---|
| ARIMA | 0.33 | 22.7 | 0.37 | 26.3 |
| Nonlinear prediction | 0.29 | 19.3 | 0.34 | 29.8 |
| Hybrid model | 0.27 | 9.58 | 0.31 | 13.69 |

For hybrid modeling, an autoregressive model of order 2 was first applied to the NO2 time series and the residuals were obtained. The local approximation model was then used to model these residuals. For this purpose, a time delay $\tau = 1$ and an embedding dimension $m = 3$ were found appropriate for the residuals of the AR(2) model. Using these values, the nonlinear modeling technique given in Eqs. (2) and (6) was applied to the reconstructed phase space of the residuals, and the predictions were then combined to obtain the predictions of NO2 concentrations. The observed and predicted NO2 concentrations are plotted in Fig. 5b.

Evaluating the prediction performance quantitatively (Table 1), the correlation between the observed and predicted time series is lower for the AR(2) model than for the other two models. Considering the RMSE, MAPE and RE, it is again evident that the AR(2) model does not perform as well as the other two models for NO2 concentration prediction. The nonlinear, or local approximation, model provides forecasts with lower RE and MAPE than the AR(2) model. Comparing the nonlinear and hybrid models, the error statistics are lower for the hybrid model. The results thus indicate that the hybrid model outperforms both the AR(2) and nonlinear models. In order to assess the model performance for high concentrations, the peaks in December 2000 and January 2001 were selected. The NO2 concentrations exceeded the CPCB guideline of 80 µg m⁻³ for these peaks. The observed concentrations for these peaks were 150 and 155 µg m⁻³, whereas the fitted concentrations were 121 and 127 µg m⁻³ by the AR(2) model, 129 and 145 µg m⁻³ by the nonlinear model, and 137 and 148 µg m⁻³ by the hybrid model. For the high concentration of 136 µg m⁻³ in the prediction set during December 2003, the concentrations predicted by the AR(2), nonlinear and hybrid models were 115, 122 and 129 µg m⁻³, respectively. It can be noted that the AR(2) model does not perform well for high concentrations, as it underestimates the levels, whereas the nonlinear and hybrid models perform fairly well. As the hybrid model gives sensible results with less error than the individual AR(2) and nonlinear models for the peak NO2 concentrations, this type of modeling approach can be useful in cases where lower forecasting error is desired.

Predictions for 2004 and 2005 were also obtained to check the capability of the model to produce forecasts for subsequent years. Although this exercise was performed for all three models, only the results of the hybrid model are shown. As observed data were not available to validate the results, only the forecasts are plotted in Fig. 5b. It can be observed that the forecasts follow nearly the same variations as in previous years. As and when the latest data become available, the model predictions can be validated further.

6.1. Effect of prediction horizon on prediction performance

In order to examine the model sensitivity to multi-step forecasting, two forecast horizons of 8 periods (one month ahead) and 24 periods (3 months ahead) were used. The error statistics MAPE and RE were used to compare prediction performance. It can be observed from Table 2 that applying the nonlinear model alone improves the forecasting accuracy over the AR(2) model for the 8-period horizon, whereas the performance of the nonlinear model deteriorates as the horizon extends to 24 periods, where its MAPE exceeds that of the AR(2) model. This may suggest that neither the nonlinear model nor the AR(2) model captures all of the patterns in the data. On the other hand, the MAPE and RE for the hybrid model are lower than those of the individual models. The results of the hybrid model show that combining the two models can reduce the overall forecasting errors substantially. In terms of RE, the hybrid model improved over the AR(2) and nonlinear models by 26.31% and 21.05%, respectively, for the one-step forecast. For the 8-period forecast, the improvement of the hybrid model over the AR(2) and nonlinear models is 22.22% and 7.4%, respectively, and for the 24-period forecast it is 19.35% and 9.6%, respectively. These percentages correspond to the differences in RE expressed relative to the RE of the hybrid model (Tables 1 and 2). This shows that the hybrid model outperforms the individual models for multi-step forecasting.
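The improvement percentages quoted above can be reproduced from the RE columns of Tables 1 and 2. The sketch below rests on an inference, not on a formula stated in the text: the quoted figures match the difference in RE divided by the RE of the hybrid model. The function name is hypothetical.

```python
re_values = {
    "one-step": {"ARIMA": 0.24, "Nonlinear": 0.23, "Hybrid": 0.19},   # Table 1
    "8-period": {"ARIMA": 0.33, "Nonlinear": 0.29, "Hybrid": 0.27},   # Table 2
    "24-period": {"ARIMA": 0.37, "Nonlinear": 0.34, "Hybrid": 0.31},  # Table 2
}


def improvement(other, hybrid):
    """Difference in RE as a percentage of the hybrid model's RE."""
    return 100.0 * (other - hybrid) / hybrid


for horizon, vals in re_values.items():
    print(horizon,
          round(improvement(vals["ARIMA"], vals["Hybrid"]), 2),
          round(improvement(vals["Nonlinear"], vals["Hybrid"]), 2))
```

The computed values agree with the quoted figures up to rounding or truncation of the last digit. Dividing by the RE of the individual model instead would give smaller percentages (for example, about 20.8% for AR(2) at one step), so the choice of denominator matters when comparing with other studies.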


7. Conclusions

In this paper, a hybrid approach is proposed to forecast air pollutant concentrations. This is achieved by exploiting the unique features of the linear autoregressive model and a nonlinear model based on chaos theory. The hybrid model is applied to the time series of NO2 concentrations observed at a site in Delhi. The autoregressive and nonlinear models were also applied individually for comparison with the hybrid model. One-step, 8-period and 24-period ahead predictions were obtained to evaluate the models' forecasting capabilities over different time horizons. The univariate approaches applied here assess the characteristics of the time series and provide predictions without requiring an understanding of the mechanisms that govern the system. The prediction performance results show that hybrid modeling can be an effective tool for forecasting air pollutant concentrations compared with applying individual models. The study is useful for cases where data on other explanatory variables that influence air pollutant concentrations are not available. The developed model can also be applied to predict other pollutants, such as ozone.

References

Abarbanel, H.D.I., Brown, R., Sidorowich, J.J., Tsimring, L.Sh., 1993. The analysis of observed chaotic data in physical systems. Reviews of Modern Physics 65, 1331–1392.

Beale, M., 1997. MATLAB Users Guide (Version 5.1). The MathWorks Inc., Natick, MA.

Benarie, M., 1987. The limits of air pollution modeling. Atmospheric Environment 21 (1), 1–5.

Box, G.E.P., Jenkins, G.M., 1970. Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco, CA.

Cambel, A.B., 1993. Applied Chaos Theory: A Paradigm for Complexity. Academic Press Inc., San Diego, CA.

Chelani, A.B., Gajghate, D.G., Hasan, M.Z., 2002. Prediction of ambient PM10 and toxic metals using artificial neural networks. Journal of the Air and Waste Management Association 52, 805–810.

Chen, J.L., Islam, S., Biswas, P., 1998. Nonlinear dynamics of hourly ozone concentrations: nonparametric short-term prediction. Atmospheric Environment 32 (11), 1839–1848.

Clemen, R., 1989. Combining forecasts: a review and annotated bibliography with discussion. International Journal of Forecasting 5, 559–608.

Farmer, D.J., Sidorowich, J.J., 1987. Predicting chaotic time series. Physical Review Letters 59, 845–848.

Fraser, A.M., Swinney, H.L., 1986. Independent coordinates for strange attractors from mutual information. Physical Review A 33, 1134–1140.

Gardner, M.W., Dorling, S.R., 1998. Artificial neural networks (the multilayer perceptron): a review of applications in the atmospheric sciences. Atmospheric Environment 32 (14/15), 2627–2636.

Gardner, M.W., Dorling, S.R., 1999. Neural network modeling and prediction of hourly NOx and NO2 concentrations in urban air in London. Atmospheric Environment 33, 709–719.

Gooijer, J.G., Kumar, K., 1992. Some recent developments in non-linear time series modeling, testing, and forecasting. International Journal of Forecasting 8, 135–156.

Kennel, M.B., Brown, R., Abarbanel, H.D.I., 1992. Determining embedding dimension for phase space reconstruction using a geometric method. Physical Review A 45, 3403–3411.

Kocak, K., Saylan, L., Sen, O., 2000. Nonlinear time series prediction of O3 concentration in Istanbul. Atmospheric Environment 34, 1267–1271.

Li, I.F., Biswas, P., Islam, S., 1994. Estimation of dominant degrees of freedom for air pollutant concentration data: applications to ozone measurement. Atmospheric Environment 28, 1707–1714.

Martelli, M., 1999. Introduction to Discrete Dynamical Systems and Chaos. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York.

Milionis, A.E., Davies, T.D., 1994. Regression and stochastic models for air pollution: I, review, comments and suggestions. Atmospheric Environment 28 (17), 2801–2810.

NEERI Report, 2001. Air Quality Status for ten cities of India. NEERI Nagpur, India, No. 9–10.

NEERI Report, 2002. Air Quality Status for ten cities of India. NEERI Nagpur, India, No. 11.

NEERI Report, 2003. Air Quality Status for ten cities of India. NEERI Nagpur, India, No. 12.

NEERI Report, 2004. Air Quality Status for ten cities of India. NEERI Nagpur, India, No. 13.

Newbold, P., Granger, C.W.J., 1974. Experience with forecasting univariate time series and the combination of forecasts (with discussion). Journal of the Royal Statistical Society Series A 137, 131–164.

Raga, G.B., LeMoyne, L., 1996. On the nature of air pollution dynamics in Mexico city: I, non linear analysis. Atmospheric Environment 30 (23), 3987–3993.

Schlink, U., Herbarth, O., Tetzlaff, G., 1997. A component time-series model for SO2 data: forecasting, interpretation and modification. Atmospheric Environment 31, 1285–1295.

Shi, J.P., Harrison, R.M., 1997. Regression modeling of hourly NOx and NO2 concentrations in urban air in London. Atmospheric Environment 31 (24), 4081–4094.

Takens, F., 1981. Detecting strange attractors in turbulence. In: Rand, D.A., Young, L.S. (Eds.), Lecture Notes in Mathematics, vol. 898. Springer, New York.

Zennetti, P., 1990. Air Pollution Modeling: Theories, Computational Methods and Available Software. Computational Mechanics Publications.