Wavelets in Forecasting

Mak Kaboudan School of Business, University of Redlands

1200 East Colton Avenue, Redlands, CA 92373

Tel: (909) 748-6349

Email: [email protected]

July 5, 2004

Wavelet analysis is not a forecasting technique but may help improve our forecasting abilities.

Multiresolution analysis decomposes observed series to produce different levels of detail. For

sufficiently lengthy time series, a level of decomposition that transforms the observed series into a

smoothed representation and low scale details is selected with just enough observations to model

then forecast each. An inverse transformation process converts models’ fitted values back for

comparison with the originally observed sequence. In this paper, monthly sunspot numbers are

first transformed using the Haar wavelet providing input data to train using artificial neural

networks and genetic programming. Inverse transformations then compute fitted and forecasted values

of the monthly series. Each technique was used to produce three-year-ahead forecasts for a

relatively large number of years into the future. These results should therefore invite more

research to improve on the proposed method.

Keywords: Non-stationary time series; Haar wavelet; artificial neural networks; genetic

programming.

1. INTRODUCTION

The purpose of this paper is to investigate the potential role wavelet analysis can play in forecasting.

Most prior studies using wavelets focus on estimation. Lee (1998) provides an interesting review. Pan and

Wang (1998) introduced a new wavelet-based estimator that combined a state-space model with the wavelet

transform to explore stock market inefficiency. Jensen (1999) uses wavelets to obtain a consistent OLS

estimator of the long-memory parameter. Nason and Theofanis (2000) use the method to model

nonstationary time series. Ramsey et al. (1995) use wavelet analysis to detect self-similarity or

non-randomness in the U.S. stock market. The role of wavelet analysis in filtering was investigated in Gençay

et al. (2002). Aussem and Murtagh (1997) apply neural networks to ‘à-trous’ wavelet-transformed annual

sunspot numbers to obtain one-step-ahead forecasts for 59 sunspot values ranging from 1921 to 1979.

Thomason (1997) obtains one-step-ahead forecasts of the wavelet-filtered S&P 500 index using neural

networks as well. Pan and Wang (1998) introduced a new estimator that combines a state-space model

with wavelet transforms to forecast the S&P 500 as a function of the S&P dividend yield. Renaud et al.

(2002) experimented with noisy AR(4) data to provide one-step-ahead forecasts. It is therefore important

to investigate whether wavelet analysis can be used in forecasting and, if it can, whether it is possible to obtain

forecasts more than one step ahead.

Wavelet analysis, in and by itself, is not a forecasting technique. In its simplest form, it transforms

a suspected signal into different levels of resolution. Wavelets localize a process in time and frequency.

This is done utilizing a wavelet transform function (the mother wavelet). Its role is to capture low and

high frequency features in a time series at successive decomposition levels. At high frequencies the

wavelet transform captures short-lived, detailed events; at low frequencies it captures long-lasting

events, which makes it well suited to analyzing nonstationary time series. The strength of wavelet analysis is in

inverse transformation: the inverse transform perfectly reconstructs the original suspected signal. Each

level of decomposition corresponds to a different level of resolution. Each stage of decomposition produces two

series, each containing half the number of points in the prior level. One of the two captures the high-frequency

content while the other captures the low-frequency content. Increasing the level of decomposition ultimately provides

predictable series if the original data contains any signal. What is interesting here is that if one of the

high-frequency decompositions is predictable, the associated low-frequency series is also predictable.

(This characteristic was not known prior to obtaining the results reported later in this paper.) If this is true,

then an experiment to fit estimation models to higher frequencies at consecutive decomposition levels

until a predictable one is found may be warranted. One then estimates a model for the associated low-

level scale. Fitted values from solving models at all the higher and lower frequency levels can then be

used to obtain fitted values of the original signal. Using this system and by selecting the appropriate lag

structure to estimate the different models, multi-period-ahead forecasts become possible.

Proposing a method to use in forecasting is validated by its successful application. While the

objective here is not to prove that this method provides the best forecast, it is imperative to show that it

can be applied to real world phenomena. Monthly sunspot numbers were selected to test the method. The

number is an index representing daily appearances of sunspots. These are huge dark areas sometimes

exceeding the Earth’s size that appear on the Sun’s visible surface, the photosphere, then disappear in a

few hours, days, or even months. They occur in pairs at confined latitudes north and south of the Sun’s

equator with opposite magnetic polarity. A sunspot number is a daily measure r = A (10 G + I), where A

is an adjustment factor that accounts for differences between observatories and observers, G is the number

of groups of sunspots, and I is the actual count of visible spots. Wolf and Wolfer introduced this index in

1848. It was designed to estimate the level of solar activity because they found that neither the number of groups

nor the total count of individual spots alone provides an accurate representation. The series used in this

study is R_t, the monthly average of r in time period t. Most studies analyze and forecast annual data. (See,

for example, Gabr and Rao 1981, Tong 1990, and Lin and Pourahmadi 1998.) The relatively few

attempts to forecast monthly averages include those of Mundt et al. (1991) and Hathaway et al. (1999).
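As a small illustration of the index defined above, the following sketch computes r = A (10 G + I) for a few days and averages the results into a monthly value. The function names and the daily counts are hypothetical and chosen only for illustration; they are not observations from the data set used in this study.

```python
from typing import Sequence, Tuple


def relative_sunspot_number(groups: int, spots: int, a: float = 1.0) -> float:
    """Daily relative sunspot number r = A (10 G + I)."""
    return a * (10 * groups + spots)


def monthly_average(daily_values: Sequence[float]) -> float:
    """R_t: the mean of the daily values r observed in month t."""
    return sum(daily_values) / len(daily_values)


# Hypothetical (groups, spots) counts for three days; not real observations.
observations: Sequence[Tuple[int, int]] = [(3, 12), (4, 20), (2, 5)]
daily = [relative_sunspot_number(g, i) for g, i in observations]
print(daily)                   # daily r values with A = 1
print(monthly_average(daily))  # their average, a stand-in for R_t
```

The adjustment factor A defaults to 1 here only for convenience; in practice it differs by observatory and observer.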

Monthly sunspot numbers were selected for four reasons. First, wavelet analysis demands a

relatively large number of observations to transform. Using annual data would leave too few observations

to fit. Second, using real world data seems to be more convincing than artificially simulated data. Third,

timely and accurate forecasts of the numbers are especially important in decision making for satellite

orbits and space missions. They have significant economic implications for technologies such as high-

frequency radio communications and radars. Their accurate prediction is also essential for weather

forecasting. Fourth, sunspot numbers have been one of the most widely investigated series in the field of

statistics and forecasting.

There are three problems that may have hindered the use of wavelets in forecasting. (These

problems are discussed at greater length in Lee (1998) and Gençay et al. (2002, p. 143).) First, wavelet

analysis is assumed to apply only to time series of dyadic length (T = 2^J, where T is the length of the time

series and J is a positive integer). In the method proposed, it is possible to relax the requirement for a

dyadic-length vector of observations. The second problem pertains to selecting the wavelet basis function.

Since real world data are collected periodically, it is easy to assume that they are piecewise constant

functions (Gençay et al. 2002), and the Haar wavelet would be most appropriate, especially for

nonstationary signals (Swee and Elangovan 1999). Third, the application of wavelet transformation to

discrete finite-length time series is affected by the boundary. Transformation is based on filtering, and

sometimes solutions must be developed to meet boundary conditions needed to compute the transformed

values. This problem is resolved by the solution to the second problem: the Haar wavelet is exactly

reversible, thus eliminating the boundary effects that are a problem with other wavelet transforms.

Investigating the application of other filters such as the Daubechies wavelet (Daubechies 1992) is left for

future research. Given that the objective here is to demonstrate how wavelets can be used in forecasting,

the simplicity of the Haar wavelet becomes an appealing characteristic even though the Daubechies

wavelet may improve on the frequency-domain characteristics of the Haar wavelet. The Haar wavelet is

reviewed in the next Section. To forecast the low and high frequency transformations, two techniques are

used: artificial neural networks (ANN) and genetic programming (GP). They are briefly described in

Section 3. Their application is in Section 4. The forecast and its evaluation are in Section 5. Section 6

contains discussion and some suggestions for future research.

2. THE HAAR WAVELET

Like other types of wavelets, the Haar transform iteratively decomposes a sequence of observations

into father and mother wavelets. The inverse transform restores the series to its original sequence. At each

level of decomposition the two transforms are half the length they were in the prior one. In the Haar

transform, a series X_t is first decomposed to obtain averages (denoted by a_1) and differences (denoted by

d_1) of consecutive pairs of observations (as opposed to consecutive points) in that series. Averages

preserve its main signal while differences capture the series’ fluctuations. If X_t contains T_0 elements,

a_1 and d_1 each have length T_1 = T_0/2. The input for the next level of

decomposition is a_1, where for this second iteration, T_2 = T_1/2. Recursive iterations continue until a single

average and difference are calculated. (This explains the restriction T_0 = 2^J.) The computations of a_{j,t} and

d_{j,t} are as follows:

The averages:

$$ a_{j,t} = \frac{a_{j-1,2t} + a_{j-1,2t-1}}{2} $$

The differences:

$$ d_{j,t} = \frac{a_{j-1,2t} - a_{j-1,2t-1}}{2} $$

where for each level of decomposition t = 1, …, T_{j-1}/2, j = 1, …, J, and a_{0,t} = X_t. For the Haar transform to

preserve the energy of a signal, the wavelets are normalized by a factor of √2 (Walker 1999, pp. 3-7), and

The father wavelet:

$$ a_{j,t} = \frac{a_{j-1,2t} + a_{j-1,2t-1}}{\sqrt{2}} \qquad (1) $$

The mother wavelet:

$$ d_{j,t} = \frac{a_{j-1,2t} - a_{j-1,2t-1}}{\sqrt{2}} \qquad (2) $$

Energy is an important characteristic in wavelet analysis. It helps here in evaluating estimation outcomes.

The inverse wavelet is simply calculated by reversing the decomposition process. Starting from the highest

level reached and proceeding in reverse order, only the averages are reconstructed: each a_{j,t} and its

matching d_{j,t} yield a pair of consecutive values at level j - 1. The inverse

transforms are obtained by solving (1) and (2) for a_{j-1,2t} and a_{j-1,2t-1}. That is,

$$ a_{j-1,2t-1} = \frac{a_{j,t} - d_{j,t}}{\sqrt{2}} \qquad (3) $$

and

$$ a_{j-1,2t} = \frac{a_{j,t} + d_{j,t}}{\sqrt{2}} \qquad (4) $$

are computed until the original series is restored. Jensen and la Cour-Harbo (2000) offer a simple and

detailed explanation of the Haar transform and its inverse.
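A minimal sketch of one level of the normalized Haar transform in equations (1) and (2), and of its inverse in equations (3) and (4), follows. It assumes each pair consists of an odd-positioned observation and the even-positioned one that follows it; the function names are illustrative and are not taken from any software used in this study.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(a_prev):
    """One decomposition level: returns (averages, differences) per eqs. (1)-(2)."""
    if len(a_prev) % 2:
        raise ValueError("an even number of observations is required")
    first = a_prev[0::2]   # first element of each consecutive pair
    second = a_prev[1::2]  # second element of each consecutive pair
    a = [(s + f) / SQRT2 for f, s in zip(first, second)]
    d = [(s - f) / SQRT2 for f, s in zip(first, second)]
    return a, d


def haar_inverse_step(a, d):
    """Undo one level: rebuild each pair from (a, d) per eqs. (3)-(4)."""
    restored = []
    for a_t, d_t in zip(a, d):
        restored.append((a_t - d_t) / SQRT2)  # first element of the pair
        restored.append((a_t + d_t) / SQRT2)  # second element of the pair
    return restored


x = [4.0, 6.0, 10.0, 12.0]          # hypothetical data
a1, d1 = haar_step(x)
print(haar_inverse_step(a1, d1))    # recovers x up to rounding error
print(sum(v * v for v in x), sum(v * v for v in a1) + sum(v * v for v in d1))
```

The last line prints the energy of the input and the combined energy of the two outputs, which agree because of the √2 normalization.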

Wavelet transforms can be used to fit and forecast a_{K,t} and d_{j,t} (j = 1, …, K) for a level of decomposition K < J,

and inverse transforms are then used to assemble fitted and forecasted values of a_{0,t} = X_t. Decomposing a

series to a level K < J permits relaxing the restriction T = 2^J. To demonstrate, consider a series with t = 1,

…, 320 observations (where T ≠ 2^J) and assume that only K = 3 levels of decomposition are performed. After

the first level of decomposition, 160 observations are left, the second leaves 80, and the third leaves 40

observations. A model can now be fitted to these 40 observations. The restriction T = 2^J is thus

reduced to selecting a series of length T such that T/2^K is an integer. Clearly, limited decomposition can

be applied to transforms other than the Haar transform.
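The 320-observation example can be sketched as follows. This is an illustrative implementation of limited decomposition under the normalized Haar transform of equations (1) to (4); the function names and the synthetic input series are assumptions made for the example.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(x, levels):
    """Return (a_K, [d_1, ..., d_K]) after `levels` Haar steps."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")
    a = list(x)
    details = []
    for _ in range(levels):
        first, second = a[0::2], a[1::2]
        details.append([(s - f) / SQRT2 for f, s in zip(first, second)])
        a = [(s + f) / SQRT2 for f, s in zip(first, second)]
    return a, details


def haar_reconstruct(a, details):
    """Invert haar_decompose, starting from the highest level."""
    a = list(a)
    for d in reversed(details):
        restored = []
        for a_t, d_t in zip(a, d):
            restored.append((a_t - d_t) / SQRT2)
            restored.append((a_t + d_t) / SQRT2)
        a = restored
    return a


series = [math.sin(0.1 * t) for t in range(320)]  # hypothetical data, T = 320
a3, details = haar_decompose(series, 3)
print(len(a3), [len(d) for d in details])  # 40 [160, 80, 40]
```

Although 320 is not a power of two, the decomposition succeeds because 320 / 2^3 = 40 is an integer.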

Determining the level of K depends on the complexity level of the series one is attempting to

model and forecast. Larger K will be needed for higher levels of complexity. This suggests modeling

successive levels of paired differencing until a reasonable model (i.e., one with good predictive ability) is

reached. Thus, one would expect a poor fit of d1. Data from the second level of differencing (d2) of pairs

of a1 should be more predictable than the first if the original series contains any signal and fitness will

continue to improve at higher levels of differencing. Once an acceptable model (with high fitness) is

obtained, there is no gain from filtering the data any further and that level is selected as K. This means

that the restriction set on the length of the series to analyze should now be expanded to the following:

Select T such that T/2^K is an integer and the number of observations at level K is sufficiently large to estimate a

model. (The minimum sample to model depends on the technique selected.)
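One way to apply this rule to an arbitrary sample is sketched below. Dropping the oldest observations to reach a usable length, and the particular minimum number of observations, are assumptions made for illustration; the text does not prescribe either.

```python
def usable_length(n_obs: int, levels: int) -> int:
    """Largest T <= n_obs such that T / 2**levels is an integer."""
    block = 2 ** levels
    return (n_obs // block) * block


def trim_for_levels(x, levels, min_obs_at_top):
    """Drop the oldest observations so that len(x) is a multiple of 2**levels.

    min_obs_at_top is a hypothetical, technique-dependent minimum number of
    observations required at level K.
    """
    t = usable_length(len(x), levels)
    if t // (2 ** levels) < min_obs_at_top:
        raise ValueError("too few observations would remain at the top level")
    return x[len(x) - t:]


sample = list(range(1000))                 # hypothetical series of 1000 points
trimmed = trim_for_levels(sample, levels=4, min_obs_at_top=30)
print(len(trimmed), len(trimmed) // 2 ** 4)
```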

3. MODELING TECHNIQUES

This Section contains a brief review of the two techniques applied in forecasting and an outline of

the steps to follow in obtaining a forecast. In addition to possessing good predictive abilities, ANN and

GP are applicable in nonlinear modeling.

Artificial Neural Networks

ANN is an information-processing paradigm based on the way the densely interconnected parallel

structure of the human brain processes information. ANNs are a collection of mathematical models that

emulate the nervous system and draw on the analogies of adaptive learning. ANN can be used to detect

structure in time series. Input data are presented to the network, which learns to predict future outcomes.

Principe et al. (2000), among many others, provide a complete description of how ANN can be used in

forecasting. When training a model to forecast sunspot numbers, both multilayer perceptrons (MP) and

generalized feedforward (GF) networks were used. An MP is a layered feedforward network. It is

typically trained with static backpropagation. It is easy to use and produces good approximations. A GF

network is a generalization of MP such that its connections can jump over one or more layers. A GF network often

solves a given problem much more efficiently than an MP.

Genetic Programming

GP is a computerized optimization technique applicable in solving diverse problems in different

disciplines. The idea gained attention after work by Koza (1992). GP is utilized here to evolve model

specifications that can be used in forecasting. A description of how GP is used in forecasting and its

statistical properties are in Kaboudan (2001). The GP software used in this study is TSGP (Kaboudan

2003) written for a Windows environment in C++. It uses two types of input: data input files and a

configuration file. Data files consist of separate ones with values of the dependent and each of the

independent variables. The configuration file contains parameters that provide the computer with

execution information. Parameters provided in this file include: name of the dependent variable, number

of observations to use in fitting the model, number of observations to forecast, number of equation

specifications to evolve, and other GP-specific parameters. The computer code produces two output files.

One has a final model specification and the other contains actual and fitted values as well as performance

statistics such as R², historic MSE, and ex post MSE for each best-fit evolved model.

To find a model that would replicate history and forecast well, executing the program only once is

not sufficient. This is because the process is random and while searching for a minimum SSE, the

algorithm easily gets trapped in a local minimum rather than a more desirable global one. Thus, the

resulting best-fit equation in a single execution may not be very useful and executing the program a fairly

large number of times is necessary. For most observed phenomena with signal suspected, 100

independent executions produce a few reasonable specifications.

While a best-fit model may replicate historical values well, it is not uncommon that it does not

forecast well. This is especially true when most equations specified by GP are nonlinear and therefore

sensitive to minute changes in values of the input variables. It is therefore necessary to evaluate ex post

forecasts before accepting and using an evolved model. Because solutions of nonlinear equations are

sensitive to initial conditions, GP equations are evolved using the autoregressive specification X_t = f(X_{t-n},

X_{t-n-1}, …, X_{t-n-c}), where n is a constant integer defining the number of periods to forecast ahead, and c is a

constant integer defining the lag structure. By not using the first n - 1 lags, it is possible to forecast up to n

periods ahead using actual rather than forecasted values of the lagged dependent variable. The

ability to produce models with such long lag structures is one of the advantages of using GP. Further, when

using long lag structures, there is no loss in degrees of freedom because coefficients are not estimated;

they are randomly assigned numbers.
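The lag structure described above can be sketched as follows. The helper names are hypothetical; the code only arranges the data for a specification of the form X_t = f(X_{t-n}, …, X_{t-n-c}) and shows why forecasts up to n periods past the end of the sample need no forecasted inputs.

```python
def lagged_rows(x, n, c):
    """Training pairs (X_t, [X_{t-n}, X_{t-n-1}, ..., X_{t-n-c}])."""
    rows = []
    for t in range(n + c, len(x)):
        inputs = [x[t - n - k] for k in range(c + 1)]
        rows.append((x[t], inputs))
    return rows


def forecast_inputs(x, n, c, h):
    """Inputs for a forecast h periods past the end of x (1 <= h <= n).

    Every input is an observed value because the nearest lag used is n.
    """
    if not 1 <= h <= n:
        raise ValueError("h must lie between 1 and n to rely on observed values only")
    t = len(x) - 1 + h  # zero-based index of the period being forecast
    return [x[t - n - k] for k in range(c + 1)]


x = list(range(10))               # hypothetical series 0, 1, ..., 9
print(lagged_rows(x, n=3, c=2)[0])
print(forecast_inputs(x, n=3, c=2, h=3))
```

With n = 3 and c = 2, the first usable target is the sixth observation, and the forecast three periods past the sample draws on the last three observed values.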

Steps to follow

The following is a summary of the sequence of tasks that help obtain a forecast using the proposed

method:

i. Use the Haar transform to decompose the signal Xt into multiresolution layers. Iterative

applications of equations (1) and (2) accomplish this.

ii. Use a modeling technique (ANN and GP are used here) to fit or train a model on a subsample of the t observations at each

level of decomposition. At a_{3,t} with t = 1, …, 40, for example, it is sufficient to train on the first 30

observations only and forecast the remaining 10. This provides out-of-sample outcomes to

evaluate. The number of data sets to train depends on the level of decomposition performed. If

that level is 3, then four data sets are trained to obtain four forecasts of a_{3,t}, d_{3,t}, d_{2,t}, and d_{1,t}.

Cumulative energy for these three levels of decomposition of the original series is:

$$ \Gamma = \gamma_1 + \gamma_2 + \gamma_3 + \gamma_4 = 1. \qquad (5) $$

The energy of each component at the three levels (where Σ is over the applicable t) is defined as:

$$ \gamma_1 = \sum d_{1,t}^2 / E_t, \qquad (6) $$

$$ \gamma_2 = \sum d_{2,t}^2 / E_t, \qquad (7) $$

$$ \gamma_3 = \sum d_{3,t}^2 / E_t, \qquad (8) $$

$$ \gamma_4 = \sum a_{3,t}^2 / E_t, \qquad (9) $$

and where

$$ E_t = \sum X_t^2. \qquad (10) $$

The cumulative energy of the fitted data sets should be expected to yield:

$$ G = g_1 + g_2 + g_3 + g_4 \approx 1 \qquad (11) $$

if the proposed method is valid. (In (11), g is the sample estimate of γ.)

iii. Use the Haar inverse transform to construct the fitted values of the original series and obtain a

forecast. Iterative applications of equations (3) and (4) accomplish this.
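The energy bookkeeping in step ii can be sketched as follows for K = 3. The code is illustrative: the function names and the synthetic series are assumptions, and the same function applied to fitted components would give the estimates g in equation (11).

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(x, levels):
    """Return (a_K, [d_1, ..., d_K]) using the normalized Haar transform."""
    a = list(x)
    details = []
    for _ in range(levels):
        first, second = a[0::2], a[1::2]
        details.append([(s - f) / SQRT2 for f, s in zip(first, second)])
        a = [(s + f) / SQRT2 for f, s in zip(first, second)]
    return a, details


def energy(values):
    """Sum of squared values."""
    return sum(v * v for v in values)


def energy_shares(x, levels=3):
    """Shares of the total energy held by d_1, ..., d_K and a_K (eqs. 6-10)."""
    a, details = haar_decompose(x, levels)
    total = energy(x)
    return [energy(d) / total for d in details] + [energy(a) / total]


series = [50 + 40 * math.sin(0.05 * t) for t in range(320)]  # hypothetical data
shares = energy_shares(series, 3)
print(shares)
print(sum(shares))  # equals 1 up to rounding error, as in eq. (5)
```

For a smooth series such as this one, nearly all of the energy sits in the last share, the one belonging to a_3.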

4. APPLICATION

The Data

The method proposed is applied to monthly unsmoothed sunspot numbers obtained from the Solar

Influences Data Analysis Center division of the Royal Observatory of Belgium (SIDC 2003). The sample

selected starts January 1920 and ends August 2002 with a total of 992 observations. Identical data sets

were used to train ANN and GP. Both were designed to produce 36-month-ahead forecasts. The first 36

months were used as lags to avoid using model solutions to forecast future observations. This may seem a

lot in standard modeling techniques. However, with wavelet decomposition, this number became very

manageable quickly. By the fourth level of decomposition, the number of lags needed to forecast 36

periods ahead was down to only two. Table 1 shows how the data was used. Each column identifies the

level of differencing. Reasonable fits were found at d4. (K = 4 was possible because a large number of

observations was available.) The following hypothetical model specifications (used in training) may

clarify the information in Table 1:

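$$ a_{4,t} = f(a_{4,t-3}, a_{4,t-4}, \ldots, a_{4,t-8}) \qquad (12) $$

$$ d_{4,t} = f(d_{4,t-3}, d_{4,t-4}, \ldots, d_{4,t-8}) \qquad (13) $$

$$ d_{3,t} = f(d_{3,t-5}, d_{3,t-6}, \ldots, d_{3,t-10}) \qquad (14) $$

$$ d_{2,t} = f(d_{2,t-9}, d_{2,t-10}, \ldots, d_{2,t-14}) \qquad (15) $$

$$ d_{1,t} = f(d_{1,t-17}, d_{1,t-18}, \ldots, d_{1,t-22}) \qquad (16) $$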

Equations (12) and (13) provide two-step-ahead forecasts (32 months ahead after applying the inverse

transform) without using their own forecasts. Equation (14) provides four-step-ahead forecasts (also 32 months

ahead), and so on. All equations are specified similarly. Six lagged values of the dependent variable were assumed

sufficient to explain variations in each of the dependent variables. Under Lags in Table 1, NOBS = number

of observations used up in establishing the needed lags. Lost df = lost degrees of freedom due to lags.

Under Training and Forecast, NOBS = number of observations used to train and to forecast, respectively.
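The lag windows in equations (12) to (16) follow a simple pattern, sketched below. The rule that the first lag equals the number of steps ahead plus one is inferred from those equations and from the Steps-ahead and NOBS rows under Lags in Table 1; it is not stated explicitly in the text, and the function name is hypothetical.

```python
def lag_window(level: int, horizon_months: int = 32, n_lags: int = 6):
    """Return (steps_ahead, first_lag, last_lag) for a series at `level`.

    One observation at level j spans 2**j months, so a 32-month horizon
    corresponds to 32 / 2**j steps at that level.
    """
    months_per_obs = 2 ** level
    if horizon_months % months_per_obs:
        raise ValueError("horizon must be a multiple of 2**level months")
    steps = horizon_months // months_per_obs
    first_lag = steps + 1          # inferred from equations (12)-(16)
    last_lag = steps + n_lags      # six consecutive lags in total
    return steps, first_lag, last_lag


for level in (1, 2, 3, 4):
    print(level, lag_window(level))
```

The last lag at each level (22, 14, 10, and 8) matches the NOBS row under Lags in Table 1.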

ANN Results

Five networks specified using equations (12-16) were identified and used to train and forecast the

monthly sunspot numbers. Table 2 contains the details. Figures 1a-1f contain six plots comparing actual

with fitted values of monthly sunspot numbers and of each of the five decomposed variables. The Figures

clearly suggest that ANN was successful in capturing the dynamics of a4,t and d4,t. ANN’s performance

deteriorated at lower levels of resolution d3,t, d2,t, and d1,t. This is expected since d1,t may be just noise.

Figure 1a portrays overall performance of the system when fitting MSSNt (where MSSN = monthly

sunspot numbers).

Statistics on training results are summarized in Table 3. It contains three sets of statistics. The first

one helps analyze the residuals. The statistics provide hints about normality of the residuals and test for

autocorrelation. Residuals’ means are all statistically equal to zero. Residuals of d1 and a4 are right

skewed and all but d3 are leptokurtic. The first order autocorrelation null cannot be rejected for d1, and the

second order autocorrelation null cannot be rejected for a4. LM is the general Lagrange multiplier test

suggested by Godfrey (1978) and Breusch (1978). Its test statistics suggest the absence of serial

correlation of order 2 in either autoregressive or moving average form for all residuals. To complete this

test, residuals were regressed on their lagged values. The test statistic, T·R², has an asymptotic χ²(q)

distribution. The last statistic in this set is the Ljung-Box (1978) Q test. It was utilized to test against

higher order serial correlation. For all residuals, higher order serial correlation null hypotheses were

rejected.
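For readers who wish to reproduce the last statistic, a minimal sketch of the Ljung-Box Q computation follows, using the standard formula Q = T (T + 2) Σ r_k² / (T - k) summed over lags k = 1, …, m. The number of lags used in this study is not stated, so `max_lag` is left as a parameter, and the residuals below are hypothetical. Obtaining a significance level additionally requires the chi-square distribution, which is omitted here.

```python
def autocorrelation(e, k):
    """Sample autocorrelation of the residuals e at lag k."""
    n = len(e)
    mean = sum(e) / n
    dev = [v - mean for v in e]
    denom = sum(v * v for v in dev)
    return sum(dev[t] * dev[t - k] for t in range(k, n)) / denom


def ljung_box_q(e, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(e)
    total = sum(autocorrelation(e, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total


residuals = [1.0, -1.0] * 10      # hypothetical, strongly autocorrelated residuals
print(autocorrelation(residuals, 1))
print(ljung_box_q(residuals, 1))
```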

The second set of statistics in Table 3 provides hints about the amount of energy captured in the

simulated ANN data relative to the original set. Importantly, the energy statistic for the simulated a4

suggests that most energy in the series is preserved and successful prediction of this variable may help

forecast the series well. Simulations of the ANN estimations successfully captured 96% of that signal. It

was also successful in capturing a respectable portion of the energy in d4. Overall, the simulation captured

93.8% of total energy contained in the observed series. The third set reports an R² = 0.90 and root mean

square error (RMSE) = 17.03.
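The percentages quoted above appear to be ratios of the Simulated to the Original energy entries in Table 3. This reading is an interpretation rather than something stated explicitly, but it reproduces the reported figures for a4 and for the full series:

```latex
\frac{0.919}{0.953} \approx 0.964,
\qquad
\frac{0.940}{1.002} \approx 0.938
```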

GP Results

Using the same equations (12-16) GP model specifications were evolved. (Because they are of little

importance, they are included in an Appendix.) Table 4 contains configuration details. The information in

the Table helps in reproducing some of the results using TSGP. TSGP prompts the user with questions to

answer. Answers to those questions are listed in the Table in the order they are asked during execution.

Figures 2a-2f contain six plots comparing actual with fitted values of monthly sunspot numbers and of

each of the five variables. They suggest that GP was successful in capturing the dynamics of a4,t and d4,t.

For d3,t, d2,t, and d1,t, GP’s results were slightly better than ANN’s. Actual and fitted MSSNt are in Figure

2a.

Table 5 contains a summary of the computed statistics. All residuals’ means are statistically equal

to zero and are not skewed. Only d1 and d2 are leptokurtic. The null hypotheses testing for presence of

serial correlation are rejected for all data. The energy statistic for the GP simulated a4 data set suggests

that most energy in the series is preserved and that MSSN’s values were successfully fitted. Overall, the

simulation captured 96.3% of total energy contained in the observed series. Finally, R² = 0.90 and the

root mean square error (RMSE) = 17.52.

5. THE FORECASTS

Figures 3 and 4 contain comparisons of actual and ex post forecasts in addition to ex ante forecasts

using the respective techniques. Forecast statistics for both are in Table 6. The Theil’s U-statistics are

consistent with models’ performances.

Although neither forecast is perfect, both exceeded the expectations set before conducting

this experiment, given that all points are forecasted 36 months ahead and that training and fitting were

only over the period 1923-1974. The resulting models seem to predict well for a relatively long period

into the future. For ANN, the forecast seems almost perfect from 1981 through 1997. For GP, it seems almost

perfect from 1978 through 2000.

6. DISCUSSION AND FUTURE RESEARCH

In this paper, a method was proposed to use wavelets in forecasting. The method applies only to

high frequency data (hourly, daily, weekly, monthly, and possibly quarterly provided a very large number

of observations exists). Input data in this study were monthly sunspot numbers. They were decomposed to

a level less than the maximum possible to leave a sufficient number of observations to fit a model to.

Models were fit to the decomposed series. Fitted values were transformed back then compared to the

original series.

The results suggest that the method helps forecast a large number of steps ahead (36 months for

sunspot numbers) and for a large number of years into the future (more than 20). Encouraging results

were obtained in the experiment undertaken. This is probably because sunspot numbers have sufficient

signal to model. Applications to other series with higher data generating process complexity will probably

yield less encouraging results. Only future experiments can confirm that.

Future research may therefore involve further experimentation with data characterized by

different levels of complexity. Applying the same method using different types of wavelets should be of

interest as well. Research may also be extended to employ this method in approximating the

signal-to-noise ratio of an observed series.

APPENDIX

GP EVOLVED EQUATIONS

Equations evolved by TSGP are included here. To compute fitted or forecasted values using these

equations, the following protections apply:

(1) If y = 0 in (x / y), then (x / y) = 1.

(2) If y < 0 in y^{1/2}, then y^{1/2} = -|y|^{1/2}.

They are designed to prevent computational problems. Here are the fittest evolved models:

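$$
\begin{aligned}
d_{1,t} ={} & [ \{ d_{1,t-18} / \{ (\sin d_{1,t-18}) - ((\sin d_{1,t-18}) / 7)) + d_{1,t-18} + 2 d_{1,t-19} \} \} \\
& + \{ d_{1,t-18} / (d_{1,t-20}^{1/2} - 9 + d_{1,t-19}) \} + \{ (d_{1,t-18} + d_{1,t-20}) / (d_{1,t-18} + d_{1,t-19}) \} \\
& + \{ (2 d_{1,t-18} + d_{1,t-20}) / (d_{1,t-18} + d_{1,t-19} + d_{1,t-20}) \} \\
& + (d_{1,t-18} / d_{1,t-19}) + d_{1,t-20} \{ \sin (d_{1,t-19} - 4) \} ]^{1/2}. \qquad (17)
\end{aligned}
$$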

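$$
\begin{aligned}
d_{2,t} ={} & (d_{2,t-9} / d_{2,t-14}) - \{ (d_{2,t-13} - d_{2,t-12} - 2 d_{2,t-11} - 2 d_{2,t-14}) \}^{1/4} + d_{2,t-12}^{1/2} - 2 (d_{2,t-13} / d_{2,t-14})^{1/4} \\
& - (d_{2,t-13} - d_{2,t-12} - d_{2,t-11} - d_{2,t-13}^{1/32} - d_{2,t-13}^{1/2}) / \cos (d_{2,t-11} \cdot d_{2,t-14}). \qquad (18)
\end{aligned}
$$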

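$$
\begin{aligned}
d_{3,t} ={} & [ \{ \cos (d_{3,t-7} + d_{3,t-5} + 32) + \sin (d_{3,t-7} \cdot d_{3,t-6}) + \cos (d_{3,t-7} + \sin d_{3,t-6} + \sin (15 d_{3,t-7}) + d_{3,t-7}^{1/8} \} \\
& \cdot (\sin (d_{3,t-5} - (d_{3,t-5} + 46)^{1/2} ] + (19 / d_{3,t-5}) \\
& + [ \{ \sin d_{3,t-7} - (d_{3,t-7} + 8)^{1/2} - (d_{3,t-5} + 12)^{1/2} \} \cdot (\cos d_{3,t-9} + \sin (d_{3,t-6})^{2}) ]. \qquad (19)
\end{aligned}
$$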

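$$
\begin{aligned}
d_{4,t} ={} & d_{4,t-6} - 34 - \{ (2 d_{4,t-4} + d_{4,t-8}) / \cos (d_{4,t-5} \cdot 72^{1/2} \} - 3 d_{4,t-5} \}^{1/2}) - d_{4,t-5} / \{ d_{4,t-8} \cdot \cos (-24 / d_{4,t-6}) \}^{1/2} \\
& - (d_{4,t-4} + d_{4,t-5})^{1/2} + (-15 / d_{4,t-6}) + (d_{4,t-4} + 2 d_{4,t-5})^{1/2} + (-23 / d_{4,t-5}) + d_{4,t-8}^{1/2}. \qquad (20)
\end{aligned}
$$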

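$$
\begin{aligned}
a_{4,t} ={} & a_{4,t-8} + a_{4,t-5} + (a_{4,t-7} \cdot a_{4,t-8}) \\
& + [ \{ ((a_{4,t-7} \{ a_{4,t-8} (a_{4,t-8} - 117) \}^{1/2})^{1/2} \cdot \{ a_{4,t-6} (a_{4,t-8} - 204)^{1/2} \}^{1/2}) \\
& + a_{4,t-6} \{ (a_{4,t-7} (a_{4,t-8} - 128))^{1/2} + a_{4,t-8} - 88 \}^{1/2} + a_{4,t-5} \} \\
& \cdot \cos [ a_{4,t-6}^{1/2} \{ a_{4,t-8} + (a_{4,t-7} \cdot a_{4,t-8}) + a_{4,t-8} \}^{1/2} + a_{4,t-8} ] ]^{1/2}. \qquad (21)
\end{aligned}
$$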
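The two protections listed at the start of this Appendix can be sketched as follows. The function names are hypothetical and the code is not taken from TSGP; it only restates rules (1) and (2) in executable form for anyone evaluating the evolved equations by hand.

```python
import math


def protected_div(x: float, y: float) -> float:
    """Protection (1): x / y, defined as 1 when y = 0."""
    return 1.0 if y == 0 else x / y


def protected_sqrt(y: float) -> float:
    """Protection (2): square root, defined as -|y|**(1/2) when y < 0."""
    return -math.sqrt(-y) if y < 0 else math.sqrt(y)


print(protected_div(3.0, 0.0))   # 1.0
print(protected_sqrt(-4.0))      # -2.0
print(protected_sqrt(9.0))       # 3.0
```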

REFERENCES

Aussem, A., and Murtagh, F. (1997). Combining neural network forecasts on wavelet-transformed time

series. Connection Science, 9, 113-121.

Breusch, T. S. (1978). Testing for autocorrelation in dynamic linear models. Australian Economic Papers,

17, 334-355.

Daubechies, I. (1992), Ten Lectures on Wavelets, Vol. 61 of CBMS-NSF Regional Conference Series in

Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia.

Gabr, M. and Rao, S. (1981). The estimation and prediction of subset bilinear time series with

applications. Journal of Time Series Analysis, 2, 155-171.

Gençay, R., Selçuk, F., and Whitcher, B. (2002). An introduction to wavelets and other filtering methods

in finance and economics, San Diego, CA: Academic Press.

Godfrey, L. G. (1978). Testing against general autoregressive and moving average error models when the

regressors include lagged dependent variables. Econometrica, 46, 1293-1302.

Hathaway, D., Wilson, R., and Reichmann, E. (1999). A synthesis of solar cycle prediction techniques.

Journal of Geophysical Research, 104, 22375-22388.

Kaboudan, M. (2001). Statistical properties of fitted residuals from genetically evolved models. Journal

of Economic Dynamics and Control, 25, 1719-1749.

Kaboudan, M. (2003). TSGP: A time series genetic programming software.

http://newton.uor.edu/facultyfolder/mahmoud_kaboudan/tsgp.

Koza, J. (1992). Genetic Programming, Cambridge, MA: The MIT Press.

Jensen, A. and la Cour-Harbo, A. (2000). Ripples in Mathematics: The Wavelet Transform, Berlin,

Springer.

Jensen, M. (1999). Using wavelets to obtain a consistent ordinary least squares estimator of the long-

memory parameter. Journal of Forecasting, 18, 17-32.

Lee, G. (1998). Wavelets and wavelet estimation: A Review. Journal of Economic Theory and

Econometrics, 4, 123-157.

Lin, T., and Pourahmadi, M. (1998). Nonparametric and non-linear models and data mining in time series:

A case-study on the Canadian lynx data. Applied Statistics, 47, Part 2, 187-201.

Ljung, G. and Box, G. (1978). On a measure of lack of fit in time series models. Biometrika, 65, 297-303.

Mundt, M., Maguire II, B., and Chase, R. (1991). Chaos in the sunspot cycle: Analysis and prediction.

Journal of Geophysical Research, 96, 1705-1716.

Nason, G., Theofanis, S. (2000). Wavelet packet function modelling of nonstationary time series.

http://www.stats.bris.ac.uk/~guy/Research/papers/WPTransNonSta.pdf.

Pan, Z., and Wang, X. (1998). A stochastic nonlinear regression estimator using wavelets. Computational

Economics, 11, 89-102.

Ramsey, J., Usikov, D., and Zaslavsky, G. (1995). An analysis of U.S. stock price behavior using

wavelets. Fractals, 3, 377-389.

Renaud, O., Starck, J., and Murtagh, F. (2002). Wavelet-based forecasting of short and long memory time

series. Working Paper No. 2002.04, Department of Econometrics, University of Geneva,

http://www.unige.ch/ses/metri/.

Principe, J., Euliano, N., and Lefebvre, C. (2000). Neural and Adaptive Systems: Fundamentals Through

Simulations, New York: John Wiley & Sons, Inc.

SIDC (Solar Influences Data Analysis Center) division of the Royal Observatory of Belgium, (2003).

http://sidc.oma.be/DATA/monthssn.dat.

Swee, E. and Elangovan, S. (1999). Applications of symmlets for denoising and load forecasting. In

Proceedings of the IEEE Signal Processing Workshop on Higher-Order Statistics, 165-169.

Thomason, M. (1997). Financial forecasting with wavelet filters and neural networks. Journal of

Computational Intelligence in Finance, 2, 27-32.

Tong, H. (1990). Nonlinear Time Series Analysis: A Dynamical System Approach, Oxford: Oxford

University Press.

Walker, J. (1999). A Primer on Wavelets and Their Scientific Applications, Boca Raton, Chapman &

Hall/CRC.

Table 1. Detailed information on data used in training and forecasting. ANN and GP were given identical data.

                 d1        d2        d3        d4        a4
Lags:
  Start          1920:01   1920:04   1920:08   1921:04   1921:04
  End            1923:09   1924:08   1926:08   1930:08   1930:08
  NOBS           22        14        10        8         8
  Lost df        6         6         6         6         6
  Steps-ahead    16        8         4         2         2
Training:
  Start          1923:10   1924:12   1927:04   1931:12   1931:12
  End            1973:06   1973:08   1973:12   1974:03   1974:03
  NOBS           299       147       71        32        32
Forecast:
  Start          1973:08   1973:12   1974:08   1974:08   1974:08
  End            2005:05   2005:05   2005:05   2005:05   2005:05
  NOBS           192       95        47        24        24

Table 2. Neural networks architectures used.

                          d1        d2        d3        d4        a4
Network type              MP        MP        GF        MP        MP
Input PEs                 6         6         6         6         6
Hidden layers             2         2         1         1         2
Transfer function*        Sigmoid   Tan       Tan       Tan       Tan
Learning momentum         0.7       0.7       0.7       0.7       0.7
Maximum training epochs   5000      2000      2000      1000      2000

* Sigmoid = SigmoidAxon, and Tan = TanAxon.

Table 3. Neural networks training results.

                 d1        d2        d3        d4        a4        MSSN
Residuals:
  Mean           0.000     -0.340    -0.215    0.402     3.420     1.867
  p-value        1.000     0.783     0.935     0.738     0.556     0.013
  Skewness       0.395     0.142     -0.195    -0.745    0.949     0.380
  p-value        0.006     0.487     0.510     0.101     0.037     0.000
  Kurtosis       3.271     1.000     0.566     2.328     2.382     1.005
  p-value        0.000     0.016     0.355     0.016     0.014     0.000
  ρ1             -0.124    -0.066    -0.209    -0.216    -0.126    0.336
  p-value        0.034     0.430     0.088     0.259     0.472     0.000
  ρ2             -0.032    -0.098    0.141     -0.223    0.353     0.045
  p-value        0.581     0.246     0.248     0.245     0.041     0.317
  LM χ²          4.567     1.893     5.397     2.285     5.043     63.053
  Significance   1         1         1         1         1         1
  Q              21.377    19.957    14.020    2.781     5.829     38.152
  Significance   0.975     0.986     0.666     0.904     0.560     0.372
Energy:
  Original       0.011     0.008     0.012     0.017     0.953     1.002
  Simulated      0.000     0.001     0.003     0.016     0.919     0.940
Other Stats:
  RMSE           12.90     14.90     21.19     6.63      32.19     17.03
  R²             0.00      0.18      0.33      0.98      0.97      0.90

Table 4. TSGP prompted questions and answers used to search for fittest equations.

Question                                                           Answer
Please enter the dependent variable file name (* for *.txt):       a4, d4, …, d1
Please enter number of data points in Historical (Training) set:   Tj
Please enter total number of data points to Forecast:              (Discretionary)
Please enter number of data points for ex post Forecast:           (Discretionary)
Please enter population size:                                      1000
Please enter number of generations:                                200
Please enter '1' for trig function and '0' for no trig:            1
Please enter '1' for exp function and '0' for no exp:              0
Please enter the number of explanatory variables:                  6
Please enter number of searches desired:                           100

Table 5. GP fitting results.

                 d1        d2        d3        d4        a4        MSSN
Residuals:
  Mean           0.470     -0.450    2.121     -0.707    7.791     2.342
  p-value        0.487     0.689     0.292     0.772     0.344     0.002
  Skewness       -0.146    0.358     -0.385    -0.271    0.257     0.155
  p-value        0.305     0.079     0.195     0.551     0.572     0.154
  Kurtosis       1.718     0.913     0.263     -0.125    -0.166    0.147
  p-value        0.000     0.027     0.667     0.898     0.864     0.499
  ρ1             -0.070    -0.013    -0.020    -0.020    0.281     0.436
  p-value        0.227     0.872     0.865     0.915     0.180     0.000
  ρ2             -0.048    0.007     0.176     0.097     -0.060    0.163
  p-value        0.411     0.934     0.148     0.614     0.779     0.000
  LM χ²          2.019     0.034     2.190     0.293     1.948     146.500
  Significance   1         1         1         1         1         1
  Q              23.961    44.716    10.711    3.377     2.460     57.449
  Significance   0.938     0.151     0.871     0.848     0.930     0.013
Energy:
  Original       0.011     0.008     0.012     0.017     0.953     1.002
  Simulated      0.002     0.002     0.006     0.016     0.939     0.966
Other Stats:
  RMSE           11.68     13.97     17.34     13.51     45.80     17.52
  R²             0.18      0.28      0.60      0.92      0.95      0.90

Table 6. ANN and GP forecast results.

                 d1        d2        d3        d4        a4        MSSN
ANN:
  RMSE           14.42     19.30     26.04     52.91     115.63    36.36
  Theil's U      0.94      0.79      0.61      0.51      0.16      0.19
GP:
  RMSE           14.94     19.48     32.14     39.56     90.55     31.27
  Theil's U      0.81      0.72      0.71      0.44      0.13      0.17
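The two forecast statistics reported in Table 6 can be sketched in a few lines of Python. RMSE is standard. For the U statistic, the code below assumes the bounded U1 form, in which RMSE is divided by the sum of the root mean squares of the actual and forecast series, so that 0 indicates a perfect forecast and 1 the worst case. Which variant of the statistic underlies the table is not stated in this section, so the formula, the function names, and the toy numbers are assumptions for illustration only.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))


def theil_u1(actual, forecast):
    """U1 inequality coefficient (assumed variant): 0 is perfect, 1 is worst."""
    n = len(actual)
    error = rmse(actual, forecast)  # also validates the inputs
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    denominator = rms_actual + rms_forecast
    if denominator == 0.0:
        return 0.0  # both series are identically zero
    return error / denominator


# Toy values only, not taken from the sunspot data.
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 37.0]
print(rmse(actual, forecast), theil_u1(actual, forecast))
```

Because U1 scales the error by the magnitude of both series, it allows comparison across components with very different ranges, such as d1 and a4, where raw RMSE values are not directly comparable.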

Figure 1a. Monthly sunspot number ANN actual and fitted values. [Plot of MSSN against date, 1931:12 to 1973:08; series: Actual, Fitted.]

Figure 1b. Actual and fitted values of a4. [Plot of a4 against observation number; series: Actual, Fitted.]

Figure 1c. Actual and fitted values of d4. [Plot of d4 against observation number; series: Actual, Fitted.]

Figure 1d. Actual and fitted values of d3. [Plot of d3 against observation number; series: Actual, Fitted.]

Figure 1e. Actual and fitted values of d2. [Plot of d2 against observation number; series: Actual, Fitted.]

Figure 1f. Actual and fitted values of d1. [Plot of d1 against observation number; series: Actual, Fitted.]

Figure 2a. Monthly sunspot number GP actual and fitted values. [Plot of MSSN against date, 1931:12 to 1973:08; series: Actual, Fitted.]

Figure 2b. Actual and fitted values of a4. [Plot of a4 against observation number; series: Actual, Fitted.]

Figure 2c. Actual and fitted values of d4. [Plot of d4 against observation number; series: Actual, Fitted.]

Figure 2d. Actual and fitted values of d3. [Plot of d3 against observation number; series: Actual, Fitted.]

Figure 2e. Actual and fitted values of d2. [Plot of d2 against observation number; series: Actual, Fitted.]

Figure 2f. Actual and fitted values of d1. [Plot of d1 against observation number; series: Actual, Fitted.]

Figure 3. Actual and ANN forecasted MSSN. [Plot of MSSN against date, 1974:08 to 2004:09; series: Actual, Forecast.]

Figure 4. Actual and GP forecasted MSSN. [Plot of MSSN against date, 1974:08 to 2004:09; series: Actual, Forecast.]
