linear stochastic models for forecasting daily maxima and hourly concentrations of air pollutants



Atmospheric Environment Vol. 9, pp. 417-423. Pergamon Press 1975. Printed in Great Britain.

LINEAR STOCHASTIC MODELS FOR FORECASTING DAILY MAXIMA AND HOURLY CONCENTRATIONS OF AIR POLLUTANTS

GEORGE M. MCCOLLISTER

Department of Mathematics, University of California, San Diego, La Jolla, California 92037, U.S.A.

and

KENT R. WILSON

Department of Chemistry and Energy Center, University of California, San Diego, La Jolla, California 92037, U.S.A.

(First received 3 June 1974 and in final form 8 October 1974)

Abstract. Two related time series models have been developed to forecast concentrations of various air pollutants and have been tested on carbon monoxide and oxidant data for the Los Angeles basin. One model forecasts daily maximum concentrations of a particular pollutant using only past daily maximum values of that pollutant as input. The other model forecasts 1-h average concentrations using only the past hourly average values. Both are found to be significantly more accurate than persistence, i.e. forecasting for tomorrow what occurred today (or yesterday). Model forecasts for 1972 of the daily instantaneous maxima for total oxidant, made using only past pollutant concentration data, were found to be somewhat more accurate than those made by the Los Angeles APCD using meteorological input as well as pollutant concentrations. Although none of these models forecasts as accurately as might be desired for a health warning system, the relative success of simple time series models, even though based solely on pollutant concentration, suggests that models incorporating meteorological data and using either multidimensional time series or pattern recognition techniques should be tested.

1. INTRODUCTION

Air pollution is a man-made meteorological phenomenon. In major metropolitan areas its daily variation is large enough and its importance to health and esthetics is serious enough to merit inclusion in the group of meteorological variables which are forecast daily. When concentrations are high, a resulting decrease in respiratory efficiency and an impaired ability to transport oxygen through the blood may be health hazards for individuals with preexisting respiratory and coronary artery disease (California, 1972). In addition, many individuals experience respiratory discomfort and eye irritation linked to pollutant concentrations, and curtailment of strenuous exercise is advised when pollutant levels (in particular total oxidant or ozone) reach specified levels (California, 1972; Chass et al., 1972). In polluted areas there is thus a need for warnings to the general public so that sensitive individuals can take necessary precautions. For an adequate health warning system, reasonably accurate forecasts of pollutant concentrations as a function of time and of location are necessary so that those persons with pollutant-affected health problems can plan their activities in advance and so that schools can cancel physical education classes. There is also the possibility that foreknowledge of high pollution potential could be used to reduce future atmospheric pollutant concentrations through timely reduction of emissions by traffic control or industrial shut-down (Elkus and Wilson, 1974).

In this paper we describe the application of some simple statistical methods which may be of use in the air pollution forecasting problem. We have used methods of time series analysis (Box and Jenkins, 1970) to develop two stochastic models for air pollution concentration. These models have been applied to the forecasting of oxidant and carbon monoxide in Los Angeles County.

2. DATA

Monitoring data from the Los Angeles County Air Pollution Control District (LAAPCD) were used to develop and evaluate the models, which have also been tested on data from other California locations supplied by the California Air Resources Board. The locations of the stations used in the work reported here are shown in Fig. 1. For this study, two pollutant measures, total oxidant (Ox) and carbon monoxide (CO) concentrations, have been chosen because of their relatively high levels in the basin and long-standing concern about their relationship to health effects. Total oxidant concentration, in large part photochemically produced ozone, is very low at night, then increases with sunlight, usually reaching a maximum during the early afternoon, before subsiding at dusk. In contrast to the usual single Ox peak, CO often shows a sharp morning traffic peak and a slower build-up in the stagnant night air. Thus the time of the CO maximum is more variable. Oxidant concentrations are generally greater further inland, whereas CO concentrations are usually greater near the downtown area.

Fig. 1. Air monitoring stations in the Los Angeles basin. [Map; labelled stations include Azusa, West LA, Downtown LA, Pomona, San Bernardino, Redlands and Riverside; the legend distinguishes Los Angeles County stations from other stations.]

Several related types of linear stochastic models for forecasting pollutant concentrations were fitted to Los Angeles County 1969-1971 Ox and CO data. The time series techniques used, which can also be applied to longer time periods (Merz et al., 1972; Tiao et al., 1973a,b) and to other pollutants (Saxena et al., 1971), have been admirably described in a treatise by Box and Jenkins (1970), and we will only sketch them very briefly here. Two models were finally chosen, with the aim of maximizing accuracy as well as simplicity. Each of the chosen models has two parameters, $\phi$ and $\theta$, which must be determined separately for each pollutant and for each station by a least squares fitting procedure. Once these two numbers are known, future values for that pollutant at that station can be easily and quickly forecast given past values. Of course, the further ahead one forecasts from a point where the past values are known, the less reliable the forecast. Model I uses past daily maximum values of a pollutant to forecast future daily maximum values for that pollutant. The maximum values can be either the maximum of the average hourly measurements taken for the day or the maximum concentration reached at any instant during the day, sometimes called the instantaneous maximum. Model II uses past hourly average concentrations of a pollutant to predict future hourly average values.

A time series is a sequence in time of observations, representing here a sequence of pollutant concentrations denoted by $z_1, z_2, \ldots, z_n$. Given $n$ values, we wish to forecast the next $l$ values. The general form of the linear stochastic model we consider is (Box and Jenkins, 1970)

$$w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p} = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \qquad (1)$$

in which $w_t = \nabla^d z_t$. $\nabla$ is the backward difference operator, $\nabla z_t = z_t - z_{t-1}$, which may be repeated $d$ times. For time series with a cycle or period of $s$ time intervals, we can also use $\nabla_s$, where $\nabla_s z_t = z_t - z_{t-s}$. CO and Ox, for example, have a 24 h daily cycle, so we let $s = 24$ for an hourly time series. Finding such a model for a time series involves first deciding how many times to difference with $\nabla$, choosing how many parameters $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ to include, and then adjusting them to obtain optimal forecasts. If this is done correctly, then the residuals $a_t, a_{t-1}, \ldots, a_{t-q}$ which relate $w_t$ to past values $w_{t-1}, \ldots, w_{t-p}$ are a sequence of normally distributed random values with mean zero and variance $\sigma_a^2$. They represent all the factors other than the past values $w_{t-1}, w_{t-2}, \ldots$ which actually determine the values of the time series. Since their expected value is zero, forecasts are made by setting to zero the unknown residuals yet to occur.
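The two differencing operators described above can be illustrated with a minimal Python sketch. The function names `difference` and `difference_d` are illustrative assumptions and do not come from the paper; the sketch only shows how a series would be differenced with lag 1 or with a seasonal lag $s$.

```python
def difference(z, lag=1):
    """Backward difference with the given lag: z[t] - z[t - lag].

    lag=1 corresponds to the ordinary operator, lag=s to the seasonal one.
    The result is shorter than the input by `lag` values.
    """
    return [z[t] - z[t - lag] for t in range(lag, len(z))]


def difference_d(z, d=1, lag=1):
    """Apply the lag-`lag` backward difference d times in succession."""
    w = list(z)
    for _ in range(d):
        w = difference(w, lag)
    return w
```

For an hourly series, `difference(z, lag=24)` would give the seasonal differences used by the hourly model.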

Model I, which is the simpler in that its time series has only one value per day, the maximum value of the pollutant, instead of 24 different hourly values, is

$$(z_t - z_{t-1}) - \phi (z_{t-1} - z_{t-2}) = a_t - \theta a_{t-1}. \qquad (2)$$

This model, which in Box and Jenkins (1970) notation would be (1,1,1), may also be more simply expressed as

$$w_t - \phi w_{t-1} = a_t - \theta a_{t-1}, \qquad (3)$$

if we let $w_t = \nabla z_t = z_t - z_{t-1}$. Given $\phi$ and $\theta$ for the particular pollutant and monitoring station from fitting the model to past pollutant values, future values of pollutant concentration, $\hat{z}_{n+1}, \hat{z}_{n+2}, \ldots$, may sequentially be forecast from past concentrations using equation (2) by setting unknown future residuals, $a_{n+1}, a_{n+2}, \ldots$, to zero.
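The forecasting recursion for Model I can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' program: the function name `model1_forecast` is hypothetical, and the residual belonging to the first available difference is assumed to be zero, a starting convention the paper does not specify.

```python
def model1_forecast(z, phi, theta, steps=1):
    """Forecast future daily maxima from past maxima z with equation (2).

    Assumption (not from the paper): the residual of the first available
    difference is taken as zero to start the recursion.
    """
    if len(z) < 2:
        raise ValueError("need at least two past daily maxima")
    # First differences w_t = z_t - z_{t-1}.
    w = [z[t] - z[t - 1] for t in range(1, len(z))]
    # Recover past residuals from a_t = w_t - phi*w_{t-1} + theta*a_{t-1}.
    a = 0.0
    for t in range(1, len(w)):
        a = w[t] - phi * w[t - 1] + theta * a
    # One step ahead the future residual is set to zero, so the forecast
    # difference is phi*w_n - theta*a_n; further ahead only the phi term stays.
    forecasts = []
    z_hat = z[-1]
    w_hat = phi * w[-1] - theta * a
    for _ in range(steps):
        z_hat = z_hat + w_hat
        forecasts.append(z_hat)
        w_hat = phi * w_hat
    return forecasts
```

With `phi = 0` and `theta = 0` the sketch reduces to persistence of the latest value, which is a convenient sanity check.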

Model II, for hourly time series with a 24 h cycle or seasonality, is

$$(z_t - z_{t-24}) - \phi (z_{t-1} - z_{t-25}) = a_t - \theta a_{t-24}, \qquad (4)$$

which, if we let $w_t = \nabla_{24} z_t = z_t - z_{t-24}$, may be written as

$$w_t - \phi w_{t-1} = a_t - \theta a_{t-24}. \qquad (5)$$

In Box and Jenkins (1970) notation, this is a $(1,0,0) \times (0,1,1)_{24}$ model. The $\phi$ and $\theta$ parameters fitted to Ox and CO data for nine LAAPCD stations for different years are shown in Table 1. As can be seen, the values are similar year by year, and do not differ greatly from station to station.
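A corresponding sketch for Model II follows equation (5). As with any such illustration, the function name `model2_forecast`, the generic period argument `s`, and the choice to treat residuals before the start of the record as zero are assumptions made here, not details given in the paper.

```python
def model2_forecast(z, phi, theta, steps=24, s=24):
    """Forecast hourly averages from past hourly values z with equation (5).

    Assumptions (not from the paper): residuals before the start of the
    record, and the residual of the first seasonal difference, are zero.
    """
    n = len(z)
    if n < s + 1:
        raise ValueError("need more than one full period of past values")
    # Seasonal differences w_t = z_t - z_{t-s}; w[i] belongs to time i + s.
    w = [z[t] - z[t - s] for t in range(s, n)]
    m = len(w)
    # Recover past residuals from a_t = w_t - phi*w_{t-1} + theta*a_{t-s}.
    a = [0.0] * m
    for i in range(1, m):
        a_season = a[i - s] if i >= s else 0.0
        a[i] = w[i] - phi * w[i - 1] + theta * a_season
    zs = list(z)          # observed values followed by forecasts
    w_prev = w[-1]
    forecasts = []
    for k in range(1, steps + 1):
        j = m - 1 + k - s  # index of the residual one period earlier
        # Residuals that have not yet occurred are set to zero.
        a_season = a[j] if 0 <= j < m else 0.0
        w_hat = phi * w_prev - theta * a_season
        z_hat = zs[len(zs) - s] + w_hat
        zs.append(z_hat)
        forecasts.append(z_hat)
        w_prev = w_hat
    return forecasts
```

Setting both parameters to zero makes the sketch repeat the values from one period earlier, which corresponds to the hourly persistence forecast.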

Twenty other more complex models of Box and Jenkins (1970) type were tried. Better $\chi^2$ statistics when compared to the total number of fitted parameters could be obtained, but the improvement in the variance of the residuals was too small to warrant their use. In other words, more complex models were found which were technically more faithful, but the accuracy of their predictions was only marginally better, and they were rejected on grounds of parsimony. Models I and II are simple enough that, once $\phi$ and $\theta$ are calculated (a calculation which needs updating only every year or so), pollutant forecasts can be made in a few minutes with a hand calculator.
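The least squares determination of $\phi$ and $\theta$ can be illustrated, for Model I, by a brute-force search over a grid of parameter values. This is only a sketch under assumptions: the paper does not state which fitting algorithm was used, the function names `sse_model1` and `fit_model1` are hypothetical, and the grid search stands in for whatever optimizer the authors employed.

```python
def sse_model1(z, phi, theta):
    """Sum of squared one-step residuals of Model I for given parameters.

    Assumption: the residual of the first available difference is zero.
    """
    w = [z[t] - z[t - 1] for t in range(1, len(z))]
    a, total = 0.0, 0.0
    for t in range(1, len(w)):
        a = w[t] - phi * w[t - 1] + theta * a
        total += a * a
    return total


def fit_model1(z, n=100):
    """Grid search for (phi, theta) in (-1, 1) with spacing 1/n.

    Returns the parameter pair with the smallest residual sum of squares,
    together with that sum.
    """
    best = None
    for i in range(-n + 1, n):
        phi = i / n
        for j in range(-n + 1, n):
            theta = j / n
            sse = sse_model1(z, phi, theta)
            if best is None or sse < best[2]:
                best = (phi, theta, sse)
    return best
```

A finer grid or a general-purpose optimizer would serve equally well; the grid is used here only because it keeps the illustration free of dependencies.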

4. MODEL ACCURACY

The accuracy of forecasts by the models may be judged both against the accuracy of forecasts by alternative models, and against levels of accuracy required by model applications.

Two alternative models are available: LAAPCD predictions and persistence. Predictions of tomorrow's instantaneous oxidant maxima expected at each monitoring station are regularly made by the LAAPCD at 10:00 h of each day, using trained personnel who evaluate pollutant concentration data as well as meteorological data (Wachtenheim and Keith, 1969; Chass et al., 1972; Davidson, 1974). The data input is thus richer than we use in the simple one-dimensional time series models described in this paper, which are based on pollutant concentrations alone. Persistence, on the other hand, assumes that the future exactly replicates the past. To compare with the LAAPCD forecasts, we take a persistence model which predicts that tomorrow's oxidant maximum is the same as yesterday's. (It is unlikely that today's oxidant maximum occurs before 10:00 h, so yesterday's must be used.)


Table 1. Model II (hourly concentrations) parameters $\phi$ and $\theta$ for Los Angeles County stations, fitted to data from different years

Station                           Oxidant               CO
No.   Location       Year    $\phi$    $\theta$    $\phi$    $\theta$

1     Downtown LA    1969    0.834     0.873       0.871     0.870
                     1970    0.848     0.911       0.860     0.908
                     1971    0.830     0.843       -         -
60    Azusa          1969    0.855     0.791       0.887     0.902
                     1970    0.859     0.89?       0.??2     0.938
                     1971    0.854     0.857       -         -
69    Burbank        1969    0.845     0.822       0.8??     0.811
                     1970    0.860     0.872       0.902     0.889
                     1971    0.836     0.818       -         -
71    West LA        1969    0.857     0.921       0.858     0.891
                     1970    0.8??     0.947       0.849     0.913
                     1971    0.829     0.901       -         -
72    Long Beach     1969    0.77?     0.93?       0.858     0.830
                     1970    0.748     0.95?       0.867     0.919
                     1971    0.688     0.929       -         -
74    Reseda         1969    0.864     0.869       ?         ?
                     1970    0.865     0.927       0.910     0.929
                     1971    0.856     0.?70       -         -
75    Pomona         1969    0.859     0.810       0.911     0.?23
                     1970    0.845     0.912       0.?17     0.896
                     1971    0.863     0.90?       -         -
76    Lennox         1969    0.784     0.940       0.914     0.864
                     1970    0.788     0.959       0.851     0.912
                     1971    0.785     0.952       -         -
80    Whittier       1969    (data not available)
                     1970    0.80?     0.940       0.894     ?
                     1971    0.803     0.842       -         -

A question mark denotes a digit or entry that is illegible in the source; a dash denotes that no value is listed.

Figure 2 compares Model I with the LAAPCD's forecasts and with persistence, using as a measure of accuracy the mean of the absolute value of the forecasting error divided by the mean of the actual concentrations. Comparison of the forecasting systems in terms of the variance, $\sigma^2$, gives similar relative results. (One must be careful to recognize the accuracy limitations in $\{z_t\}$ itself. For example, night-time values of oxidant are usually so low that they are within the probable error of standard instrumentation, and thus are often assigned rather arbitrary values.) In order to test Model I, independent data sets were used for fitting and for comparison. Years prior to 1972 were used to fit $\theta$ and $\phi$, and Model I forecasts based on previous days' concentrations were compared with 1972 actual concentrations.
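The accuracy measure just described, together with the persistence baseline, can be written down in a few lines of Python. The names `normalized_mae` and `persistence_pairs` are illustrative assumptions; the lag of two days reflects the statement that a forecast issued at 10:00 h for tomorrow must rely on yesterday's maximum.

```python
def normalized_mae(forecast, actual):
    """Mean absolute forecast error divided by the mean actual value."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and equal in length")
    n = len(actual)
    mean_abs_error = sum(abs(f - a) for f, a in zip(forecast, actual)) / n
    mean_actual = sum(actual) / n
    return mean_abs_error / mean_actual


def persistence_pairs(z, lag=2):
    """Pair each value z[t] with the persistence forecast z[t - lag].

    lag=2 forecasts tomorrow's value by yesterday's value.
    Returns (forecasts, actuals) as two lists of equal length.
    """
    forecasts = [z[t - lag] for t in range(lag, len(z))]
    actuals = [z[t] for t in range(lag, len(z))]
    return forecasts, actuals
```

A value of 0.5 for this measure would mean that the typical error is half the typical concentration.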

As can be seen, all three prediction systems lie within a relatively small range of accuracy, with, on the average, persistence the least accurate, the LAAPCD predictions in the middle, and Model I the most accurate. Note, however, that the mean error is almost half the mean value.

Fig. 2. A comparison of the accuracy of forecasts for 1972 of the daily instantaneous maximum concentration of total oxidant made by Model I, by the LAAPCD and by persistence. The forecasts were made using actual data that occurred previous to 10:00 h the day before the day being forecast. For each station the vertical scale used as a measure of forecast accuracy is the mean of the absolute value of the forecast errors divided by the mean of the actual instantaneous maximum concentrations. AVG indicates the average of this quantity over all the stations.

One can also easily predict more than just maximum values. Figure 3 shows the accuracy at nine Los Angeles County stations for forecasts of hourly average oxidant levels, using both persistence and Model II. The accuracy, by the measure used, is almost identical to that of the Model I maxima forecasts if the forecast is made at the same time (10:00 h the day before), and improves somewhat if the forecast is made at a time (06:00 h the same day) closer to the prediction period. Again, the simple time series model is somewhat more accurate than persistence.

Fig. 3. The accuracy of persistence and of Model II in forecasting hourly values of oxidant during 1972 is compared. In Fig. 3(a) hourly values for all 24 h are forecast using as input the actual oxidant values that occurred before 10:00 h the day preceding the day being forecast. Persistence uses as a forecast the latest 24 h values previous to 10:00 h. In Fig. 3(b) the actual hourly values of oxidant up to 06:00 h are used.

Figure 4 shows Model II applied to CO instead of oxidant. Again, fitting and comparison are with independent data sets. The parameters $\phi$ and $\theta$ (shown separately for 1969 and 1970 in Table 1) are fitted to data previous to 1972, and then the model is applied to each day in 1972. While it has previously been assumed that CO is harder to predict than oxidant (Chass et al., 1972), the time series model used, at least judged by the accuracy criterion given, is more accurate for CO than for oxidant, particularly so as the forecasting interval shortens. If the forecast is made at 06:00 h on the day to be forecast, the next 24 h are modelled reasonably well, as shown in Fig. 4(b).

Figure 5 illustrates pollutant patterns over the South Coast basin. From forecast or measured pollutant concentration values at the stations shown in Fig. 1, a concentration surface is interpolated passing through each station value. Computer animated films have been made comparing the measured and forecast temporal and spatial pollutant patterns in the basin. Such films allow the perception of whole-basin patterns, and could be the basis of communication of air quality predictions to the public, for example, as part of television weather forecasts. It is clear upon viewing the films that the time series forecasts are reasonably good predictions of the overall spatial pattern (values of one station relative to another) and temporal pattern (overall daily rise and fall of concentration), but are weak in anticipating the beginning and end of episodes of high or low air pollution. It is precisely in the neglected meteorological variables that one might expect to find the determinants of such episodes, and thus the next logical step would seem to be the inclusion of meteorological variables explicitly in model formation (Davidson, 1974). In addition, the development of such models might provide clues to a deeper understanding of the deterministic chemical and meteorological processes involved.

Fig. 4. The accuracy of persistence and of Model II in forecasting hourly values of CO during 1972 is compared. The conditions of the calculations are the same as for Fig. 3. [Panels: (a) 10:00 h day before; (b) 06:00 h same day.]

Fig. 5. Measured and forecast oxidant and CO pollutant patterns in the Los Angeles (South Coast) air basin, covering portions of Los Angeles (LA), San Bernardino (SB), Orange (OR) and Riverside (RIV) Counties. Measured and forecast values at each of the 17 monitoring stations in Fig. 1 are used as the basis of interpolated surfaces passing through each of the data points. Values outside the air basin, where data is insufficient, are arbitrarily set to zero. Shown here are the "basin hourly maximum" times, the hour out of 24 h when the hourly average concentration averaged over all 17 stations reached its maximum. The forecasts are made each midnight for the next 24 h. [Panel times: 13:00 h Saturday 29 July 1972; 12:00 h Sunday 30 July 1972; 13:00 h Monday 31 July 1972; 21:00 h Saturday 9 December 1972; 01:00 h Sunday 10 December 1972; 08:00 h Monday 11 December 1972.]

5. CONCLUSION

It has been shown that simple one-dimensional time series models based only on previous pollutant level data can be used to forecast daily maximum and hourly average concentrations of total oxidant and CO. The models may easily be fitted to previous pollutant data in order to derive the appropriate parameter values of $\phi$ and $\theta$ for each pollutant at each station. Once the $\phi$ and $\theta$ parameters are calculated, forecasts can be made from pollutant concentration data in a few minutes with a desk calculator or a few seconds with a computer. Relatively rapid availability of pollutant (and meteorological) data is desirable, for forecasting accuracy drops off with time.

The accuracy of the time series predictions appears better than that of other available forecasting techniques: persistence and the LAAPCD subjective method. On the other hand, the accuracy is still short of that desirable for a health warning system. Possible extensions include: (i) the direct inclusion of meteorological data, through multidimensional time series analysis and pattern recognition techniques; and (ii) the application of control theory techniques in an attempt to improve air quality through short term emissions limitations in response to pollution forecasts.

Acknowledgement. We thank the LAAPCD and the California Air Resources Board for supplying the data upon which this study is based.

REFERENCES

Box G. E. P. and Jenkins G. M. (1970) Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco.

California Air Resources Board (1972) An evaluation of a medical advisory notice to persons with respiratory disease or coronary artery disease. Technical Advisory Committee Report.

Chass R. L., Birakos J. N., Brunelle M. F. and Mosher J. (1972) The school and health smog warning system of Los Angeles county. Los Angeles Air Pollution Control District, Paper No. 72-59. Presented at the 65th Annual Meeting of the Air Pollution Control Association.

Davidson A. (1974) An objective ozone forecast system for July-October in the Los Angeles basin. Los Angeles Air Pollution Control District, Technical Services Division Report.

Elkus B. and Wilson K. R. (1974) Photochemical air pollution: Weekend-weekday differences. Submitted to Environ. Sci. Technol.

Merz P. H., Painter L. J. and Ryason P. R. (1972) Aerometric data analysis: Time series analysis and forecast and an atmospheric smog diagram. Atmospheric Environment 6, 319-342.

Saxena U. and Tsao K. C. (1971) New methods to forecast air pollutant concentrations. A.S.M.E., Paper No. 71WA/APC-I.

Tiao G. C., Box G. E. P. and Hamming W. J. (1973a) Analysis of Los Angeles photochemical smog data: A statistical overview. Technical Report No. 331, Department of Statistics, University of Wisconsin, Madison, Wisconsin.

Tiao G. C., Box G. E. P., Grupe M., Liu S. T., Hillmer S., Wei W. S. and Hamming W. J. (1973b) Los Angeles aerometric ozone data 1955-1972. Technical Report No. 346, Department of Statistics, University of Wisconsin, Madison, Wisconsin.

Wachtenheim A. and Keith R. W. (1969) Forecasting ozone maxima for Los Angeles county. Los Angeles Air Pollution Control District, Paper No. 69-78. Presented at the 62nd Annual Meeting of the Air Pollution Control Association.