

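0952-1976/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.

doi:10.1016/j.engappai.2006.10.008

*Corresponding author. Warsaw University of Technology, Warsaw, Poland.

E-mail address: [email protected] (S. Osowski).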

Engineering Applications of Artificial Intelligence 20 (2007) 745–755

www.elsevier.com/locate/engappai

Forecasting of the daily meteorological pollution using wavelets and support vector machine

Stanislaw Osowski a,b,*, Konrad Garanty a

a Warsaw University of Technology, Warsaw, Poland
b Military University of Technology, Warsaw, Poland

Received 14 July 2006; received in revised form 23 October 2006; accepted 27 October 2006

Available online 29 December 2006

Abstract

The paper presents a method of daily air pollution forecasting using the support vector machine (SVM) and wavelet decomposition. Based on the data of NO2, CO, SO2 and dust observed over the past years and on the actual meteorological parameters, such as wind, temperature, humidity and pressure, we propose a forecasting approach that applies a neural network of the SVM type working in the regression mode. To obtain an acceptable forecast accuracy, we decompose the measured time series into a wavelet representation and predict the wavelet coefficients. The final forecast is prepared on the basis of these predicted values. The paper presents the results of numerical experiments based on the measurements made by meteorological stations situated in the northern region of Poland.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Pollution forecasting; Support vector machine; Wavelet decomposition; Neural network predictors; Generalization ability

1. Introduction

Information on meteorological pollution, such as CO, NO2, SO2 and dust, is becoming more and more important because of the harmful effects of these pollutants on human health (Comrie and Diem, 1999). This is especially true in the urban environment of every country. Automatic measurements of the concentrations of these pollutants provide an instantaneous record, on the basis of which the averaged values are usually calculated. An important problem is the early prediction of harmful pollution in order to inform or warn the local inhabitants of the incoming danger.

The aim of this research is to construct a forecasting model for the averaged next-day pollution that would be applicable for use by the authority responsible for air pollution regulation in the appropriate region of the country. Artificial neural networks (Haykin, 1999) of the multilayer perceptron (MLP) type have frequently been used as the model of pollution in recent years (Comrie and Diem, 1999; Boznar et al., 2004; Hooyberghs et al., 2005; Kukkonen et al., 2003; Bianchini et al., 1989). We propose here a forecasting system based on the support vector machine (SVM) and the wavelet decomposition of the time series formed from the signals measured in the previous days. The prediction system makes use of such meteorological quantities as the speed and direction of the wind, temperature, humidity, air pressure, season of the year and type of the day, as well as the daily pollution measured in the previous days. We build SVM networks for the prediction of each considered pollutant (CO, NO2, SO2 and dust). A forecasting system based directly on these measurements was not able to provide an acceptable accuracy of prediction. To solve this problem we decompose the measured signals into wavelets (Mallat, 1989; Daubechies, 1988). The prediction is performed for the wavelet coefficients (the detailed coefficients up to some level and the approximated coarse signal corresponding to the last level) in the original resolution. On the basis of these predicted values, the real value of the forecasted pollution is reconstructed for all considered pollutants by simply summing up the predicted decomposition signals.


The resulting system, based on SVM networks and wavelet decomposition, shows very good generalization ability. Trained on one type of pollutant (NO2), it is able to predict the concentrations of the other pollutants (CO, SO2, dust) with satisfactory accuracy. The numerical forecasting experiments have been performed for the northern region of Poland (the meteorological stations near Gdansk). The results have confirmed good accuracy of daily prediction for all considered pollutants. These detailed results are presented and discussed in the paper.

2. Problem analysis

Monitoring and early warning of alarming pollution values in a region is an important task of any agency monitoring the quality of the environment. The typical quantities measured by meteorological stations include the concentrations of SO2, NO2, CO, dust, etc. The network of measuring stations is owned and maintained by the environmental agency of the state, and the measurements are performed according to the regulations of this agency. Typically the instantaneous values are registered and the appropriate averaged values are determined from them. The daily averaged values are of practical use, since they represent some long-term tendencies of the pollution development. For meteorologists they are a satisfactory tool for observing pollution trends; they make it possible to warn the local population of alarming values and to undertake preventive actions for the future.

Fig. 1 presents the daily averaged time series of NO2 concentration for the whole year 2003, measured by one of the meteorological stations in the northern region of Poland. The concentration of the pollutant varies strongly from day to day. At the mean value of 15.53 µg/m3 the standard deviation of the measured values was equal to 7.99 µg/m3. Such a large standard deviation, about half of the mean value, confirms the difficulty of the forecasting problem.

Fig. 1. The real time series of NO2 concentration for the year 2003, averaged daily, for one chosen station in the northern region of Poland.

The meteorological stations also measure some additional parameters, such as the temperature, direction and strength of the wind, humidity, pressure, etc. These parameters are associated with the pollution, although this association is of a rather complex nature. Fig. 2 presents exemplary relationships between the temperature and dust (Fig. 2a), and between the temperature and NO2 concentration (Fig. 2b). As can be seen, the distribution of the measured points is far from linear and represents a complex relation. Plots of the other pollutants versus the temperature, wind, humidity or pressure show no clear relationship either. This means that the prediction of these quantities requires a highly complicated nonlinear model of these dependencies.

Fig. 2. The measured dependence between the temperature and the concentration of (a) dust and (b) NO2 for one chosen station in the northern region of Poland.

A somewhat better correlation is observed among the concentrations of different pollutants. Fig. 3 presents the measured dependencies between the concentrations of SO2 and NO2 (Fig. 3a) and between NO2 and dust (Fig. 3b). These relationships are closer to linear, although even here the scatter of points is wide. The standard deviations of the distance between the real distribution of points and their linear approximation are equal to 1.92 µg/m3 for the relationship between NO2 and SO2 (Fig. 3a) and 1.98 µg/m3 for the relationship between NO2 and dust (Fig. 3b).

Fig. 3. The measured relationship between the concentrations of NO2 and SO2 (a) and NO2 and dust (b) for one chosen station in the northern region of Poland.

The important conclusion from these results is that there are some similarities among the mechanisms by which different pollutants spread. An observed increase of NO2 pollution is accompanied by an increase of the concentration of the other types of pollutants, although the observed relationship is not linear. The interesting challenge is to construct and train a model of one kind of pollution and to use this model to predict the data corresponding to the other pollutants.

3. The tools and methods of pollution prediction

In our work we apply the SVM with the Gaussian kernel, working in the regression mode (Vapnik, 1998; Scholkopf and Smola, 2002), as the model of pollution. This choice was made after trying other neural-type solutions, such as the MLP and the neuro-fuzzy structure (Haykin, 1999; Jang et al., 1997). The main advantage of the SVM over the MLP or neuro-fuzzy network is its good generalization ability, acquired with a relatively small number of learning data and a large number of input nodes (high-dimensional problem). Owing to the very specific problem formulation, the learning task is simplified to the solution of a quadratic problem with a single minimum point (the global minimum). To obtain better prediction accuracy we have also applied the wavelet decomposition (Mallat, 1989; Daubechies, 1988) of the time series of the measured pollutant concentrations corresponding to the whole year. Instead of predicting the total concentration of a particular pollutant, we build a few SVM networks responsible for predicting the wavelet coefficients on different levels, which form partial representations of the pollutant concentration. The final concentration of the pollutant is reconstructed from the predicted wavelet coefficients by simple summation.

3.1. Support vector machine for regression

The SVM, a solution of the universal feedforward network type, is known as an excellent tool for classification and regression problems, with good generalization ability (Vapnik, 1998; Scholkopf and Smola, 2002). In contrast to classical neural networks, the formulation of the SVM learning problem leads to quadratic programming with linear constraints.


The SVM is a linear machine with one output $y(\mathbf{x})$, working in a high-dimensional feature space formed by the nonlinear mapping of the N-dimensional input vector $\mathbf{x}$ into a K-dimensional feature space ($K > N$) through the use of the nonlinear function $\boldsymbol{\varphi}(\mathbf{x})$. The number of hidden units (K) is equal to the number of so-called support vectors, that is, the learning data points closest to the separating hyperplane. The learning task is transformed to the minimization of the error function while keeping the weights of the network at a minimum. The error function is defined through the so-called ε-insensitive loss function $L_\varepsilon(d, y(\mathbf{x}))$ (Vapnik, 1998):

$$L_\varepsilon(d, y(\mathbf{x})) = \begin{cases} |d - y(\mathbf{x})| - \varepsilon & \text{for } |d - y(\mathbf{x})| \ge \varepsilon, \\ 0 & \text{for } |d - y(\mathbf{x})| < \varepsilon, \end{cases} \tag{1}$$

where ε is the assumed accuracy, d is the destination, $\mathbf{x}$ the input vector and $y(\mathbf{x})$ the actual output of the network under excitation of $\mathbf{x}$. The actual output signal of the SVM network is defined by

$$y(\mathbf{x}) = \sum_{j=1}^{K} w_j \varphi_j(\mathbf{x}) + b = \mathbf{w}^T \boldsymbol{\varphi}(\mathbf{x}) + b, \tag{2}$$

where $\mathbf{w} = [w_1, \ldots, w_K]^T$ is the weight vector, b the bias and $\boldsymbol{\varphi}(\mathbf{x}) = [\varphi_1(\mathbf{x}), \ldots, \varphi_K(\mathbf{x})]^T$ the basis function vector.
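As a small illustration of Eqs. (1) and (2), the sketch below evaluates the linear output for an already mapped feature vector and then the ε-insensitive loss. The weights, features and targets are hypothetical values chosen only to show one error inside the tolerance and one outside it; the function names are not from the paper.

```python
def eps_insensitive_loss(d, y, eps=0.01):
    """Eq. (1): zero inside the eps-wide tube, linear outside it."""
    residual = abs(d - y)
    return residual - eps if residual >= eps else 0.0


def network_output(w, phi_x, b):
    """Eq. (2): y(x) = w^T phi(x) + b for an already mapped vector phi(x)."""
    return sum(w_j * phi_j for w_j, phi_j in zip(w, phi_x)) + b


# Hypothetical weights, features and targets, chosen only for illustration.
y = network_output(w=[0.5, -0.2], phi_x=[0.4, 0.1], b=0.1)  # 0.2 - 0.02 + 0.1
loss_inside = eps_insensitive_loss(d=0.285, y=y)   # error 0.005 is ignored
loss_outside = eps_insensitive_loss(d=0.40, y=y)   # error 0.12 costs 0.12 - 0.01
```

Errors smaller than ε contribute nothing to the loss, which is why a smaller ε forces the model to follow the learning data more closely.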

The optimization problem defined in this way is solved by introducing the Lagrangian function and the Lagrange multipliers $\alpha_i$, $\alpha'_i$ ($i = 1, 2, \ldots, p$) responsible for the functional constraints defined by (1). The minimization of the Lagrangian function is transformed to the so-called dual problem (Vapnik, 1998; Platt, 1998):

$$\max \left\{ \sum_{i=1}^{p} d_i(\alpha_i - \alpha'_i) - \varepsilon \sum_{i=1}^{p} (\alpha_i + \alpha'_i) - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} (\alpha_i - \alpha'_i)(\alpha_j - \alpha'_j) K(\mathbf{x}_i, \mathbf{x}_j) \right\} \tag{3}$$

at the constraints

$$\sum_{i=1}^{p} (\alpha_i - \alpha'_i) = 0, \qquad 0 \le \alpha_i \le C, \quad 0 \le \alpha'_i \le C, \tag{4}$$

where $K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\varphi}^T(\mathbf{x}_i)\boldsymbol{\varphi}(\mathbf{x}_j)$ is an inner-product kernel defined in accordance with Mercer's theorem (Vapnik, 1998) for the learning data set. After solving the dual problem, all weights are expressed through the $N_{sv}$ nonzero Lagrange multipliers $\alpha_i$, $\alpha'_i$ and the same number of learning vectors $\mathbf{x}_i$ associated with them. The network output signal $y(\mathbf{x})$ can then be expressed in the form

$$y(\mathbf{x}) = \sum_{i=1}^{N_{sv}} (\alpha_i - \alpha'_i) K(\mathbf{x}, \mathbf{x}_i) + b. \tag{5}$$
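A minimal sketch of Eq. (5) follows, assuming the Gaussian kernel in the form the paper gives in Section 4. The support vectors, coefficients and bias are hypothetical; in a trained network they would come from the solution of the dual problem.

```python
import math


def gaussian_kernel(x, x_i, sigma=1.0):
    """K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_output(x, support_vectors, coeffs, b, kernel=gaussian_kernel):
    """Eq. (5); coeffs[i] stands for the difference alpha_i - alpha'_i."""
    return sum(c * kernel(x, sv) for c, sv in zip(coeffs, support_vectors)) + b


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
coeffs = [0.5, -0.25]
y = svr_output([0.0, 0.0], support_vectors, coeffs, b=0.1)
```

Only the support vectors enter the sum, so the cost of one prediction grows with their number rather than with the size of the whole learning set.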

The best-known kernel functions used in practice are the radial (Gaussian), polynomial, spline and sigmoidal functions (Vapnik, 1998; Scholkopf and Smola, 2002).

The most important choice is that of the coefficients ε and C. The constant ε determines the margin within which the error is neglected. The smaller its value, the higher the required accuracy of learning and the more support vectors will be found by the algorithm. The regularization constant C is the weight determining the balance between the complexity of the network, characterized by the weight vector $\mathbf{w}$, and the error of approximation, measured by the slack variables and the value of ε. For normalized input signals the value of ε is usually adjusted in the range $10^{-3}$ to $10^{-2}$, and C is much bigger than 1.
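The roles of ε and C can be made concrete by evaluating the dual objective (3) and checking the constraints (4) for a given set of multipliers. The sketch below assumes the standard ε-SVR dual, in which the ε term multiplies the sum of the multipliers; the two-point problem and all names are hypothetical and are not taken from the paper's data.

```python
def dual_objective(alpha, alpha_p, d, K, eps):
    """Value of the dual objective for multipliers alpha_i and alpha'_i."""
    beta = [a - ap for a, ap in zip(alpha, alpha_p)]
    p = len(beta)
    data_term = sum(d_i * b_i for d_i, b_i in zip(d, beta))
    tube_term = eps * sum(a + ap for a, ap in zip(alpha, alpha_p))
    quad_term = 0.5 * sum(beta[i] * beta[j] * K[i][j]
                          for i in range(p) for j in range(p))
    return data_term - tube_term - quad_term


def is_feasible(alpha, alpha_p, C, tol=1e-12):
    """Constraints (4): balanced multipliers inside the box [0, C]."""
    balanced = abs(sum(a - ap for a, ap in zip(alpha, alpha_p))) <= tol
    boxed = all(0.0 <= v <= C for v in list(alpha) + list(alpha_p))
    return balanced and boxed


# Hypothetical two-point problem, not taken from the paper's data.
alpha, alpha_p = [1.0, 0.0], [0.0, 1.0]
d = [0.6, 0.2]
K = [[1.0, 0.5], [0.5, 1.0]]
value = dual_objective(alpha, alpha_p, d, K, eps=0.01)   # 0.4 - 0.02 - 0.5
feasible = is_feasible(alpha, alpha_p, C=100.0)
```

Here ε penalizes every nonzero multiplier, which is what limits the number of support vectors, while C only caps the size of each multiplier.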

3.2. Wavelet decomposition of signals

The discrete wavelet transform belongs to the multiresolution analysis (Mallat, 1989; Daubechies, 1988). It is a linear transformation with the special property of simultaneous time and frequency localization. It decomposes the given signal series onto a set of basis functions of different frequencies, shifted with respect to each other, called wavelets. Unlike the discrete Fourier transform, the discrete wavelet transform is not a single object; it is in fact a whole family of transformations. The individual members of the family are determined by the choice of the so-called mother wavelet function. The goal of the discrete wavelet transform is to decompose an arbitrary signal f(t) into a finite summation of wavelets at different scales (levels) according to the expansion

$$f(t) = \sum_{j} \sum_{k} c_{jk}\, \psi(2^j t - k), \tag{6}$$

where $c_{jk}$ is a new set of coefficients and $\psi(2^j t - k)$ is the wavelet of the jth level (scale) shifted by k samples. The set of wavelets of different scales and shifts can be generated from a single prototype wavelet, called the mother wavelet, by dilations and shifts. What makes wavelet bases interesting is their self-similarity: every function in a wavelet basis is a dilated and shifted version of one (or possibly a few) mother functions. In practice the most often used are orthogonal or bi-orthogonal wavelets, for which the set of wavelets forms an orthogonal or bi-orthogonal basis (Mallat, 1989; Daubechies, 1988).

Let us denote the discrete form of the original signal vector by $\mathbf{f}$ and by $A_j\mathbf{f}$ the operator that computes the approximation of $\mathbf{f}$ at resolution $2^j$. Let $D_j\mathbf{f}$ denote the detailed signal, $D_j\mathbf{f} = A_{j+1}\mathbf{f} - A_j\mathbf{f}$, at the resolution $2^j$. It was shown by Mallat (1989) that both operations $A_j\mathbf{f}$ and $D_j\mathbf{f}$ can be interpreted as the convolution of the signal of the previous resolution with the finite impulse response of the quadrature mirror filters, the high-pass filter $\tilde{G}$ with coefficients $\tilde{g}$ and the low-pass filter $\tilde{H}$ with coefficients $\tilde{h}$, followed by keeping every second sample:

$$A_j\mathbf{f}(n) = \sum_{k=-\infty}^{\infty} \tilde{h}(2n - k)\, A_{j+1}\mathbf{f}(k), \tag{7}$$

$$D_j\mathbf{f}(n) = \sum_{k=-\infty}^{\infty} \tilde{g}(2n - k)\, A_{j+1}\mathbf{f}(k). \tag{8}$$
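One analysis step of this kind can be sketched as follows, reading Eqs. (7) and (8) as a convolution followed by keeping every second sample. To keep the example short it uses the two-tap Haar filters instead of the Db8 filters applied in the paper, and the filter indexing and sign convention are assumptions made for this illustration.

```python
import math

S = 1.0 / math.sqrt(2.0)
H_TILDE = {0: S, -1: S}     # low-pass Haar filter (assumed indexing)
G_TILDE = {0: S, -1: -S}    # high-pass Haar filter (assumed sign convention)


def analysis_step(signal, filt):
    """out(n) = sum_k filt(2n - k) * signal(k), at half the input resolution."""
    out = []
    for n in range(len(signal) // 2):
        acc = 0.0
        for k, value in enumerate(signal):
            acc += filt.get(2 * n - k, 0.0) * value
        out.append(acc)
    return out


# Made-up finer-resolution signal with four samples.
signal = [4.0, 2.0, 5.0, 5.0]
approx = analysis_step(signal, H_TILDE)   # scaled pairwise sums
detail = analysis_step(signal, G_TILDE)   # scaled pairwise differences
```

For an orthogonal wavelet the step preserves the signal energy, so the squared samples of `approx` and `detail` together sum to those of `signal`.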


These operations, performed for the successive levels j from 1 to J, deliver the coefficients of the decomposition at different levels (scales) and different resolutions of the original vector $\mathbf{f}$ and form the analysis of the signal. The most often used discrete wavelet analysis scheme uses the Mallat pyramid algorithm (Mallat, 1989).

As a result of this transformation we get the set of coefficients representing the detailed signals $D_j\mathbf{f}$ at different levels j (j = 1, 2, ..., J), and the residue signal $A_J\mathbf{f}$ at the level J. All of them are of different resolutions, appropriate to the level. The coefficients of $D_j\mathbf{f}$ can be interpreted as the high-frequency details that distinguish the approximations of $\mathbf{f}$ at two subsequent levels of resolution. The signal $A_J\mathbf{f}$, on the other hand, represents the coarse approximation of the vector $\mathbf{f}$.

The next step is the transformation of the detailed signals $D_j\mathbf{f}$ (j = 1, 2, ..., J) and the coarse approximation signal $A_J\mathbf{f}$ into the original resolution. It is done by using special filters G and H associated with the analysis filters $\tilde{G}$ and $\tilde{H}$ by the quadrature and reflection relationships (Mallat, 1989). This is the so-called reverse Mallat pyramid algorithm, forming the reconstruction of the original signal. As a result we get the decomposed signals of each level in the original resolution. The recovery of the original signal f(n) at each time instant n is then performed by simply adding the appropriate wavelet coefficients and the coarse approximation. For a J-level decomposition we have

$$f(n) = D_1(n) + D_2(n) + \cdots + D_J(n) + A_J(n). \tag{9}$$
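The additive structure of Eq. (9) can be checked on a small example. The sketch below uses the Haar wavelet rather than Db8, because for Haar the level-j approximation in the original resolution is simply the mean over blocks of 2^j samples, which keeps the code short. The signal is made up and the function names are hypothetical.

```python
def block_means(signal, block):
    """Haar approximation in the original resolution: every sample is
    replaced by the mean of its block of `block` consecutive samples."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def haar_components(signal, levels):
    """Return ([D_1, ..., D_J], A_J), each as long as `signal`."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    details = []
    previous = list(signal)                      # level 0 is the signal itself
    for j in range(1, levels + 1):
        current = block_means(signal, 2 ** j)    # A_j in original resolution
        details.append([p - c for p, c in zip(previous, current)])
        previous = current
    return details, previous


# Made-up signal of eight samples, decomposed on two levels.
f = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 2.0, 6.0]
details, approximation = haar_components(f, levels=2)
reconstructed = [sum(parts) for parts in zip(*details, approximation)]
```

Because each detail signal is the difference of two successive approximations, the sum telescopes back to the original samples, which is the property the forecasting scheme relies on.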

Fig. 4 presents the results of the 5-level wavelet decomposition of the real NO2 concentration data (see Fig. 1) for the whole year 2003, obtained by using the Matlab (1997) platform. The Daubechies wavelets Db8 have been applied in the decomposition. All signals (the five levels of wavelet coefficients from D1 to D5 and the coarse approximation A5 on the fifth level) are shown in the original resolution. We observe a substantial difference in the variability of the signals at different levels. The higher the wavelet level, the lower the variation of the coefficients and the easier their prediction.

Our main idea is to replace the prediction of the original, highly variable time series by the prediction of its wavelet coefficients on different levels, which are of lower variability, and then to use Eq. (9) for the final forecast of the pollution at any time point n. Since most of the wavelet coefficients are of lower variability, we expect an increase of the total prediction accuracy.

An important point in designing the prediction system is the choice of the optimal value of J. At a higher J more of the predicted signals are of low variability, so their prediction is easier and hence the expected accuracy is higher. However, with too many levels the total error associated with the increased number of predicted terms begins to dominate and, as a result, the total accuracy deteriorates. In our solution we have determined the value of J on the basis of the standard deviation of the approximated signal $A_J\mathbf{f}$. We stop the decomposition on the level for which the standard deviation of the approximated signal is substantially smaller than that of the original signal. In practice, the stopping condition has been expressed in the empirical form

$$\frac{\mathrm{std}(A_J\mathbf{f})}{\mathrm{std}(\mathbf{f})} < 0.1. \tag{10}$$

For the data distribution presented in Fig. 4 the value J = 5 was appropriate, since the ratio $\mathrm{std}(A_5\mathbf{f})/\mathrm{std}(\mathbf{f}) = 0.067$ satisfies relation (10).
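The stopping rule (10) can be sketched as a loop over candidate levels. As a simplification, the approximation is computed here with Haar block means in the original resolution instead of Db8, and the test signal (a fast oscillation plus a slow trend) is made up, so the selected level says nothing about the paper's data.

```python
import statistics


def approximation(signal, level):
    """Haar approximation A_level in the original resolution (block means)."""
    block = 2 ** level
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def choose_level(signal, threshold=0.1, max_level=10):
    """Return (J, ratio) for the first level with std(A_J f)/std(f) < threshold."""
    base = statistics.pstdev(signal)
    for level in range(1, max_level + 1):
        if 2 ** level > len(signal):
            break
        ratio = statistics.pstdev(approximation(signal, level)) / base
        if ratio < threshold:
            return level, ratio
    return None


# Made-up signal: a period-4 oscillation of amplitude 5 on a slow linear trend.
pattern = [5.0, 5.0, -5.0, -5.0]
signal = [pattern[n % 4] + 0.1 * n for n in range(16)]
level, ratio = choose_level(signal)
```

In this example the oscillation survives the first level and is averaged out on the second, so the loop stops at level 2, where only the slow trend remains in the approximation.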

3.3. The prediction method

The forecast of the pollution is made for the next day on the basis of the pollution data from the last few days and the measured or predicted meteorological parameters, such as the speed and direction of the wind, the temperature, humidity and pressure, together with the current month (from 1 to 12) and the type of the day within the week (from 1 to 7).

Fig. 4. The wavelet decomposition of the measured time series corresponding to the NO2 concentration of the year 2003; D1 to D5 represent the detailed coefficients and A5 the coarse approximation of the time series on the fifth level.

The meteorological parameters listed above are well-known factors influencing the pollution (Comrie and Diem, 1999; Boznar et al., 2004; Hooyberghs et al., 2005; Kukkonen et al., 2003). All of them are available from the actual measurements performed by the meteorological stations. The type of the day influences the pollution in an evident way, since the industrial activity (an important source of pollution) is concentrated mainly in the working days. The season of the year is also closely connected with the state of the environment, owing to the increased demand for power in winter or the reduced activity of industry during the summer holidays.

Taking the past pollution into account is justified by the continuity of the process of pollution formation. The significant question is how many past days should be taken into account. To solve this problem we have applied the correlation analysis of the actual pollution with the pollution of a few past days. Table 1 presents exemplary results of the correlation analysis for CO, NO2, SO2 and dust over a period of one year, in the form of the correlation coefficient for the data measured by one chosen meteorological station situated in northern Poland. The actual day (d) presented in the table was Saturday and the previous days were Friday (d-1), Thursday (d-2), Wednesday (d-3) and Tuesday (d-4).

Table 1
The correlation coefficient values of the actual concentration of CO, NO2, SO2 and dust with the past four days for one chosen meteorological station

Previous days   d-1    d-2    d-3    d-4
CO              0.97   0.90   0.79   0.64
NO2             0.95   0.82   0.76   0.69
SO2             0.95   0.83   0.80   0.75
Dust            0.94   0.82   0.78   0.77

We have analyzed many examples of such dependencies for different days of the week and noticed significant correlations for at most two neighboring days. Beyond two days the correlations were usually either weak or changing significantly (from small to medium) for different days. Therefore only the data of the past two days have been used in the prediction model.

All data have been normalized linearly to the range from 0 to 1 by simply dividing the real value by the maximum value of the appropriate set. For the prediction of the wavelet coefficients on different levels for the next day we have applied the SVM approach. After analysis of the variability of the coarse approximation data at different decomposition levels (see Eq. (7)) we have decided to apply five levels (J = 5). Prediction of the wavelet coefficients on each level requires one SVM network. An additional one is needed for the prediction of the coarse approximation of the data. This means the application of six SVM networks altogether. Fig. 5 presents the general SVM structure used for the prediction of the wavelet coefficients $D_{pi}(d+1)$, i = 1, 2, ..., 5, of the particular pollutant p on the ith level for the next day (d+1). An identical structure is used for the prediction of the coarse approximation $A_{p5}$ of the pollutant p (CO, NO2, SO2 and dust).

Fig. 5. The structure of the SVM network used for prediction of any detailed coefficient $D_{pi}$ and the coarse approximation $A_{pi}$ for the pth pollutant.

The input data of the network are formed by five meteorological parameters (the speed of the wind, direction of the wind, temperature, humidity, pressure), the number of the month and of the week day, and also the two past values of the forecasted quantity corresponding to the last two days. To provide an appropriate representation of the wind, we have combined its strength and direction in the form of the x and y components (rectangular system) of its speed vector (two nodes in the representation of $\mathbf{x}$). Altogether this makes nine inputs, and the same number of normalized input signals. All data used in learning have been transformed to the original resolution on each level using reverse Mallat filtering. The learning vectors $\mathbf{x}$ were formed from the components of this database for the years 2001 and 2002 and the corresponding meteorological parameters for these days. The destination was associated with the value of the appropriate wavelet coefficient for the next day.

Fig. 6. The geographical location of the meteorological stations in the northern region of Poland.
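The assembly of the nine-element input vector can be sketched as follows. The angle convention of the wind direction, the scaling of the month and week day, and all numeric values are assumptions made for this illustration; the paper specifies only the list of inputs and the division by the maximum value.

```python
import math


def wind_components(speed, direction_deg):
    """Rectangular (x, y) components of the wind speed vector.

    Assumption: the angle is measured counterclockwise from the x axis;
    the paper does not state its angle convention.
    """
    angle = math.radians(direction_deg)
    return speed * math.cos(angle), speed * math.sin(angle)


def normalize_by_max(values):
    """Scale a non-negative series by dividing by its maximum."""
    peak = max(values)
    return [v / peak for v in values]


def build_input(wind_x, wind_y, temperature, humidity, pressure,
                month, weekday, prev_1, prev_2):
    """Nine-element input vector x; all arguments are assumed normalized."""
    return [wind_x, wind_y, temperature, humidity, pressure,
            month, weekday, prev_1, prev_2]


wx, wy = wind_components(speed=2.0, direction_deg=90.0)   # points along +y
scaled = normalize_by_max([10.0, 20.0, 40.0])             # [0.25, 0.5, 1.0]
x = build_input(0.0, 0.5, 0.4, 0.7, 0.98, 3 / 12, 6 / 7, 0.31, 0.28)
```

Replacing the direction angle by two components avoids the jump between 359 and 0 degrees, which a single angular input would present to the network as a large difference.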

Two kinds of experiments have been performed. In the first one, the SVM networks were specialized in predicting the wavelet coefficients on the appropriate decomposition levels for the next day for each pollutant separately. With five decomposition levels plus the coarse approximation (six networks per pollutant) and four pollutants, this means training 24 independent SVM networks. After the learning phase all parameters of the SVM networks were frozen and the networks were tested on data of the same kind of pollutant (not used in the learning phase).

In the second type of experiments, we trained the SVM networks on the data related to one pollutant only (for example NO2) and tested the trained networks on the data of all pollutants (NO2, CO, SO2 and dust). This means training only six SVM networks. Assuming that the mechanisms creating different types of pollution are the same, the generalization ability of the trained SVM networks should be sufficient for predicting the concentrations of the considered pollutants.

In the testing mode we supply the appropriate values forming the vector $\mathbf{x}$ to the trained SVM networks, which predict the wavelet coefficients of all five levels and the coarse approximation value (all in the original resolution) for the next day. On the basis of these predicted coefficients the actual prediction of the concentration of the particular pollutant for the next day is made by simply adding them, as shown by Eq. (9).
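The testing step can be sketched as a sum over six predictors. The trained SVM networks are replaced here by stand-in functions returning fixed, made-up values, and each network is given its own input vector, on the reading that the two past values in the input refer to that network's own coefficient series.

```python
def forecast_next_day(inputs, predictors):
    """Sum the six partial predictions D_1..D_5 and A_5, as in Eq. (9).

    inputs[i] is the input vector supplied to predictors[i].
    """
    if len(inputs) != len(predictors):
        raise ValueError("one input vector per predictor is required")
    return sum(predict(x) for predict, x in zip(predictors, inputs))


# Stand-ins for trained SVM networks: each returns a fixed, made-up value.
partial_values = [0.8, -0.5, 0.3, -0.1, 0.2, 15.0]     # D1..D5, then A5
predictors = [lambda x, v=v: v for v in partial_values]
inputs = [[0.0] * 9 for _ in predictors]
forecast = forecast_next_day(inputs, predictors)        # about 15.7
```

In the second type of experiments the same six predictors, trained on NO2, would simply be called with input vectors built from another pollutant's data.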

4. The results of numerical experiments

The numerical experiments on predicting the next-day average pollution have been performed for the network of seven meteorological stations in the northern region of Poland near Gdansk, belonging to the ARMAAG foundation. Fig. 6 presents the geographical location of these stations.

Some of the stations are situated at different points of the city of Gdansk and two lie outside the city center. The data taking part in learning and testing were collected over three years, from 2001 to 2003. Part of them (the years 2001 and 2002) has been used for learning and the other part (the year 2003) for testing only.

The results of learning and testing have been assessed on the basis of the mean absolute error and the average relative error between the real concentrations and their estimated values produced by our predicting system for the whole year 2003. We denote by $\mathbf{d}$ and $\mathbf{y}$ the 365-component vectors (one component per day of the year) of the average daily concentration of the particular pollutant (CO, NO2, SO2 or dust): the destination vector $\mathbf{d}$ (the real measurements) and the vector $\mathbf{y}$ actually generated by our SVM system. We have defined two kinds of errors.

• The mean absolute error

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |d_i - y_i|, \tag{11}$$

where N is the number of days under prediction.

• The relative (normalized) error

$$\varepsilon = \frac{\|\mathbf{d} - \mathbf{y}\|}{\|\mathbf{d}\|}. \tag{12}$$
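Both error measures are straightforward to compute. The sketch below assumes the Euclidean norm in Eq. (12), since the paper does not name the norm, and uses a made-up two-day example.

```python
import math


def mean_absolute_error(d, y):
    """Eq. (11): average of |d_i - y_i| over the N predicted days."""
    return sum(abs(d_i - y_i) for d_i, y_i in zip(d, y)) / len(d)


def relative_error(d, y):
    """Eq. (12) with the Euclidean norm (an assumption, see the lead-in)."""
    num = math.sqrt(sum((d_i - y_i) ** 2 for d_i, y_i in zip(d, y)))
    den = math.sqrt(sum(d_i ** 2 for d_i in d))
    return num / den


# Made-up two-day example: measured d and predicted y.
d = [3.0, 4.0]
y = [3.0, 1.0]
mae = mean_absolute_error(d, y)   # (0 + 3) / 2 = 1.5
rel = relative_error(d, y)        # 3 / 5 = 0.6
```

The MAE keeps the units of the concentration, while the relative error is dimensionless and therefore allows pollutants with very different concentration levels to be compared.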

The main experiments have been performed using the SVM predictor. For comparison, some experiments have been repeated with the classical MLP.

The introductory experiments were performed for the prediction of the whole pollutant concentration without any previous decomposition. However, the results were not encouraging. Although learning was possible with an acceptable error, the testing error was far too large (lack of generalization ability). This was the main reason why we proposed the additional step of decomposing the data into wavelets and used the SVM networks for predicting the wavelet coefficients of each decomposition level. Five-level decompositions with Daubechies wavelets Db8 have been applied. In the prediction step, instead of one SVM network predicting a particular pollutant concentration, we have to use six networks responsible for predicting the five levels of wavelet coefficients and the coarse approximation signal. On the basis of these predicted values the whole pollutant concentrations for the next day have been reconstructed by applying Eq. (9). In all experiments we have used the general scheme of the SVM network shown in Fig. 5 with the Gaussian kernel $K(\mathbf{x}, \mathbf{x}_i) = \exp\left(-\|\mathbf{x} - \mathbf{x}_i\|^2 / (2\sigma^2)\right)$. For the normalized data samples we have used σ = 1 and the tolerance value ε = 0.01. The regularization constant C was chosen after a series of experiments using the standard validation approach (C = 100).

The most important observation from these experiments is that we have obtained very good generalization abilities of the SVM networks trained on the data related to NO2 only. The networks trained on one type of pollution data were able to predict the concentrations of the other pollutants (SO2, CO, dust) with practically the same accuracy as the specialized networks trained for each pollutant separately. This confirms that the mechanisms creating different types of pollution are similar and that the neural networks are able to learn such a mechanism.

All presented numerical results relate to this generalization ability of the SVM networks. We limit the presentation of the results to the testing data only, which did not take part in learning (the data of the year 2003).

Fig. 7(a–d) depicts exemplary results of the prediction of NO2, CO, dust and SO2 concentrations corresponding to one chosen station, compared with the real measured values. The upper plots present both patterns (real and predicted), while the lower ones present the distribution of the prediction errors. The predictions coincide very well with the real measured data. The mean of the prediction errors is close to zero (0.19 µg/m3 for dust, 1.2 µg/m3 for CO, 0.094 µg/m3 for NO2 and 0.023 µg/m3 for SO2). The mean of the absolute errors is 4.88 µg/m3 for dust, 27.77 µg/m3 for CO, 1.59 µg/m3 for NO2 and 1.13 µg/m3 for SO2. The standard deviation of the errors is equal to 5.97 µg/m3 for dust, 49.7 µg/m3 for CO, 2.27 µg/m3 for NO2 and 1.97 µg/m3 for SO2. If we compare these values with the real concentrations of dust (average level of 70 µg/m3), CO (average level of 500 µg/m3), NO2 (average level of 16 µg/m3) and SO2 (average level of 18 µg/m3), it is seen that the relative errors for all pollutants are close to each other.

Fig. 7. The exemplary results of daily prediction of the time series of CO (a), dust (b), SO2 (c) and NO2 (d) for one chosen meteorological station (upper plots: the real and predicted concentrations; lower plots: the prediction errors).

Table 2 presents the mean absolute error of the prediction of CO, NO2, SO2 and dust for all seven stations. Comparing these values with the average levels of the real concentrations of the pollutants shows that their relative values are small and stay on a similar level for all pollutants.

Table 3 presents the details of the average relative prediction errors for all considered pollutants at the seven stations. All of them have been obtained using SVM networks trained on the data of NO2 only.
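The comparison of the mean absolute errors with the average concentration levels quoted above for the chosen station can be made explicit with a few lines of arithmetic. The ratio computed here is a rough indicator only and is not the relative error of Eq. (12) reported in Table 3.

```python
# Values quoted in the text for the chosen station (units as in the paper).
mean_abs_error = {"dust": 4.88, "CO": 27.77, "NO2": 1.59, "SO2": 1.13}
average_level = {"dust": 70.0, "CO": 500.0, "NO2": 16.0, "SO2": 18.0}

ratio_percent = {
    name: 100.0 * mean_abs_error[name] / average_level[name]
    for name in mean_abs_error
}
for name, value in ratio_percent.items():
    print(f"{name}: {value:.1f}%")
```

The four ratios fall between roughly 5.6% (CO) and 9.9% (NO2), which supports the statement that the errors are of similar relative size for all pollutants.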

Table 2
The mean absolute error MAE of the daily predictions of all pollutants for seven stations in the year 2003 obtained by using SVM

Station     CO (µg/m3)   NO2 (µg/m3)   SO2 (µg/m3)   Dust (µg/m3)
Station 1   31.26        2.11          0.94          5.86
Station 2   27.77        1.59          1.13          4.88
Station 3   22.94        2.06          0.99          3.17
Station 4   31.96        1.78          1.40          3.59
Station 5   36.54        3.41          0.98          5.96
Station 6   52.66        2.62          1.45          4.62
Station 7   30.56        2.16          1.41          3.83

Table 3
The average relative errors εSVM of the daily predictions of all pollutants for seven stations in the year 2003 obtained by using SVM

Station                    CO (%)   NO2 (%)   SO2 (%)   Dust (%)
Station 1                  13.37    15.22     18.19     21.53
Station 2                  10.10    14.10     18.14     18.66
Station 3                  7.15     15.18     18.16     13.71
Station 4                  11.37    13.00     16.55     13.70
Station 5                  10.47    13.97     16.34     13.99
Station 6                  14.03    17.41     17.79     13.11
Station 7                  9.76     14.39     15.83     14.16
Average for all stations   10.86    14.74     17.28     15.55

Table 4
The average relative errors of prediction of the daily NO2 concentration for seven different stations in the year 2003 obtained by SVM and MLP predictors

Station    1       2       3       4       5       6       7
εMLP (%)   21.03   22.06   22.64   18.55   19.80   27.01   20.97
εSVM (%)   15.22   14.10   15.18   13.00   13.97   17.41   14.39

Fig. 8. The details of tracing the destination values of NO2 concentration (solid line) by the SVM (dashed line) and MLP (dash-dot line) predictors.

The presented numerical results confirm that the accuracy of prediction for all pollutants stays on a similar level (the average relative error for all stations is below or close to 15%), although the testing data were related to pollutants not used in learning. This is a very interesting property of the proposed prediction system, following from the principle of operation of neural networks. In the learning stage the SVM predictor learned the mechanism of pollution creation and not the data points themselves. The satisfactory results of testing the trained network on different pollutants simply show that the mechanisms of formation of different pollutants are really similar. The good generalization ability acquired by the SVM in the learning stage was sufficient to obtain satisfactory forecasting results for all pollutants. The obtained accuracy makes this method practical for one-day-ahead prediction.

To assess the quality of the SVM predictor we have compared it with a solution based on MLP. Table 4 presents the comparison of the average relative errors of the daily NO2 forecasting in the year 2003 for seven stations obtained with the SVM and MLP approaches. The average error eSVM corresponds to the SVM and eMLP to the MLP approach. There is a significant difference in the results. The average error of the SVM approach (14.74%) is much lower than that of the MLP (21.71%). Evidently the MLP has much worse generalization ability. Observe that the same data have been used for learning and testing both networks. The optimal structure of the MLP was found to be 9-17-1 (188 weights, counting biases: (9+1)·17 + (17+1)·1). With only 730 training data pairs available (two years used in learning), the number of data points was insufficient for this number of weights; as a result the acquired generalization ability of the MLP was not good enough, and this was the main reason for its inferior testing performance. On the other hand, the relatively small number of learning pairs was not a problem for the SVM, since this network is relatively insensitive to a limited number of learning data.

Fig. 8 presents the details of the performance of both predictors of NO2 for 13 succeeding days compared to the real values (destination). It is evident that the results of the SVM are much closer to the destination (solid line) than those generated by the MLP.

It is also interesting to compare the proposed approach with other recent solutions presented in works devoted to the prediction of pollution. Unfortunately the data under prediction are different in each case, since no standard database is available on the internet. The paper (Peace et al., 2005) has presented the results of predicting the carbon concentration in the north-west of Italy using a site-optimized semi-empirical air pollution model. The MAE given for each season of the year ranged from 127 [μg/m3] to 510 [μg/m3], while the average level of pollution under observation was around 1000 [μg/m3] (the normalized relative error from 12.7% to 51%). In our case the average MAE of prediction of the CO concentration for all seven stations over the whole year was 33.38 [μg/m3] at an average level of pollution of approximately 500 [μg/m3] (the normalized relative error around 7%).
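The normalized relative errors quoted above follow from dividing the MAE by the average pollution level. The short sketch below reproduces this arithmetic from the CO column of Table 2; the helper name is ours, and the average levels (1000 and 500) are the approximate values stated in the text, so the percentages are approximate as well.

```python
# MAE of daily CO prediction for Stations 1-7, copied from Table 2.
co_mae = [31.26, 27.77, 22.94, 31.96, 36.54, 52.66, 30.56]


def normalized_error(mae, mean_level):
    """MAE expressed as a percentage of the average concentration level."""
    return 100.0 * mae / mean_level


# Average MAE over the seven stations.
mean_co_mae = sum(co_mae) / len(co_mae)

# Approximate mean CO level of 500 taken from the text.
ours = normalized_error(mean_co_mae, 500.0)

# Seasonal MAE range quoted for Peace et al. (2005), mean level about 1000.
peace_low = normalized_error(127.0, 1000.0)
peace_high = normalized_error(510.0, 1000.0)

print(f"mean CO MAE = {mean_co_mae:.2f}")
print(f"normalized error (this work) = {ours:.1f}%")
print(f"normalized error (Peace et al.) = {peace_low:.1f}% to {peace_high:.1f}%")
```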

The paper (Bianchini et al., 2006) has proposed and compared two methods of predicting NO2 concentration: the AutoRegressive eXogenous (ARX) model and a cyclostationary neural network (CNN) model applying the MLP network. The results have been obtained for the data gathered by the agency ARPA Lombardia in northern Italy. The MAE computed for 12 months was almost 5 [μg/m3] (the CNN model) and 20 [μg/m3] (the ARX model). The peak errors of prediction were 35 [μg/m3] (CNN) and 60 [μg/m3] (ARX), respectively. Our results of NO2 prediction were as follows: 2.24 [μg/m3] (MAE) and 8 [μg/m3] (the peak error). This comparison shows the significant advantage of applying the wavelet decomposition and the SVM predictors proposed in this paper.

5. Conclusions

The paper has presented a method of predicting daily atmospheric pollution by applying the support vector machine and wavelet decomposition. The important point of this approach is the decomposition of the daily data into wavelets and the individual prediction of the wavelet coefficients at different levels. The application of SVM instead of the classical MLP has yielded much better accuracy of prediction of the wavelet coefficients and, as a result, of the whole pollutant concentration.
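The decompose, predict and recombine idea summarized above can be sketched in a few lines. This is only an illustration under loudly stated assumptions: a Haar-type undecimated ("a trous") decomposition stands in for the wavelet family used in the paper, and a trivial persistence rule stands in for the trained SVM predictors. All function names are hypothetical.

```python
def additive_decomposition(signal, levels):
    """Split `signal` into `levels` detail series plus one coarse approximation.

    Assumption: a causal two-tap average with dyadic lags (Haar-type
    'a trous' scheme), so that the details plus the approximation sum
    back to the original signal sample by sample.
    """
    approx = list(signal)
    details = []
    for j in range(levels):
        lag = 2 ** j
        # Average each sample with the one `lag` steps back (clamped at the start).
        smoother = [0.5 * (approx[i] + approx[max(i - lag, 0)])
                    for i in range(len(approx))]
        details.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    return details, approx


def persistence_forecast(series):
    """Placeholder one-step predictor: the next value equals the last one."""
    return series[-1]


def forecast_next(signal, levels=3, predictor=persistence_forecast):
    """Predict each component separately and sum the component forecasts."""
    details, approx = additive_decomposition(signal, levels)
    return sum(predictor(d) for d in details) + predictor(approx)


# Hypothetical daily concentrations, for illustration only.
x = [3.0, 5.0, 4.0, 8.0, 7.0, 6.0, 9.0, 10.0]
details, approx = additive_decomposition(x, 3)
reconstructed = [sum(d[i] for d in details) + approx[i] for i in range(len(x))]
print(reconstructed)
print(forecast_next(x))
```

In a realistic setting each component would get its own trained regressor passed in place of `persistence_forecast`; the point of the sketch is only that the final forecast is the sum of the per-level forecasts.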

The proposed approach has been tested on the data of the meteorological stations situated in northern Poland. The obtained results of prediction are in good agreement with the actual measurements made at these stations, irrespective of the type of pollutant. The important observation from these experiments is the very good generalization ability of the applied predicting system. The SVM networks trained on the data related to only one pollutant (NO2), measured at one chosen station, were capable of producing good quality predictions for the other types of pollutants (CO, SO2, dust) at all stations.

References

Bianchini, M., Di Iorio, E., Maggini, M., Mocenni, C., Pucci, A., 2006. A cyclostationary neural network model for the prediction of the NO2 concentration. Proceedings of ESANN, Bruges, 2006.

Boznar, M., Mlakar, P., Grasic, B., 2004. Neural networks based ozone forecasting. Ninth Conference on Harmonisation within Atmospheric Dispersion Modelling for Regulatory Purposes, Garmisch-Partenkirchen, 2004, pp. 356–360.

Comrie, A.C., Diem, J.E., 1999. Climatology and forecast modelling of ambient carbon monoxide in Phoenix. Atmospheric Environment 33, 5023–5036.

Daubechies, I., 1988. Ten Lectures on Wavelets. SIAM Press, Philadelphia, PA.

Haykin, S., 1999. Neural Networks: A Comprehensive Foundation. Prentice-Hall, New Jersey.

Hooyberghs, J., Mensink, C., Dumont, D., Fierens, F., Brasseur, O., 2005. A neural network forecast for daily average PM10 concentrations in Belgium. Atmospheric Environment 39/18, 3279–3289.

Jang, J.S., Sun, C.T., Mizutani, E., 1997. Neuro-fuzzy and Soft Computing. Prentice-Hall, New York.

Kukkonen, J., Partanen, L., Karpinen, A., Ruuskanen, J., Junninen, H., Kolehmainen, M., Niska, H., Dorling, S., Chatterton, T., Foxall, R., Cawley, G., 2003. Extensive evaluation of neural networks models for the prediction of NO2 and PM10 concentrations, compared with a deterministic modeling system and measurements in central Helsinki. Atmospheric Environment 37, 4539–4550.

Mallat, S., 1989. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 674–693.

Peace, M., Dirks, K., Austin, G., 2005. The prediction of air pollution using a site optimised model and mesoscale model wind forecast. World Weather Research Program Symposium on Nowcasting and Very Short Range Forecasting, Toulouse, 2005.

Platt, L., 1998. Fast training of SVM using sequential optimization. In: Scholkopf, B., Burges, B., Smola, A. (Eds.), Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, pp. 185–208.

Scholkopf, B., Smola, A., 2002. Learning with Kernels. MIT Press, Cambridge, MA.

Vapnik, V., 1998. Statistical Learning Theory. Wiley, New York.

Wavelet Toolbox for Matlab, 1997. User manual, MathWorks, Natick, USA, 1997.
