# Multistage RBF neural network ensemble learning for exchange rates forecasting


Lean Yu a,b,*, Kin Keung Lai b, Shouyang Wang a

a Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
b Department of Management Sciences, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong

ARTICLE IN PRESS

Neurocomputing 71 (2008) 3295–3302

Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/neucom

Article info

Available online 21 June 2008

Keywords:

RBF neural networks

Ensemble learning

Conditional generalized variance

Exchange rates prediction

Abstract

In this study, a multistage nonlinear radial basis function (RBF) neural network ensemble forecasting model is proposed for foreign exchange rates prediction. In the process of ensemble modeling, the first stage produces a great number of single RBF neural network models. In the second stage, a conditional generalized variance (CGV) minimization method is used to choose the appropriate ensemble members. In the final stage, another RBF network is used for neural network ensemble for prediction purposes. For testing purposes, we compare the new ensemble model's performance with some existing neural network ensemble approaches in terms of four exchange rates series. Experimental results reveal that the predictions using the proposed approach are consistently better than those obtained using the other methods presented in this study in terms of the same measurements.

© 2008 Elsevier B.V. All rights reserved.

0925-2312/$ - see front matter. doi:10.1016/j.neucom.2008.04.029

* Corresponding author at: Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China. Tel.: +8610 62565817; fax: +8610 62541823.

E-mail address: yulean@amss.ac.cn (L. Yu).

1. Introduction

Combining the outputs of multiple neural networks into an aggregate output often gives improved accuracy over any individual neural network output [13,23]. Initially, the generic motivation of the neural network ensemble procedure is based upon an intuitive idea: by combining the outputs of several individual neural networks, one might improve on the performance of a single generic one [10]. But this idea has been proved to be true only when the ensemble models are simultaneously accurate and diverse enough, which requires an adequate trade-off between these conflicting conditions. That is, performance improvement can result from training the individual neural networks to be decorrelated with each other [14] with respect to their errors. For this reason, ensemble learning and modeling have been a common research stream in the last few decades [3,6,8,10–14,17,22,23]. Over this time, the research stream has gained momentum with the advancement of computer technologies, which have made many elaborate computation methods available and practical [22].

Actually, neural networks provide a natural framework for ensemble learning. This is so because neural networks are unstable learning techniques, i.e., small changes in the training set and/or parameter selection can produce large changes in the predicted output. Many experiments have shown that the generalization of a single neural network is not unique, that is, neural network solutions are not stable. Even for some simple problems, different neural network structures (e.g., different numbers of hidden layers, different numbers of hidden nodes and different initial conditions) result in different patterns of network generalization. In addition, even the most powerful neural network model still cannot cope well with complex data sets containing random errors or insufficient training data. Thus, the performance on such data sets may not be as good as expected [6,11,12,22].

Instability of the single neural network has hampered the development of better neural network models. Why can the same training data applied to different neural network models, or to the same neural model with different initializations, lead to different performance? What are the major factors affecting this difference? Through the analysis of error distributions, it has been found that the ways neural networks reach the global minima vary, and some networks just settle into local minima instead of global minima. In any case, it is hard to justify which neural network's error reaches the global minimum if the error rate is not zero. Since the number of neural models and their potential initializations is unlimited, the possible number of results generated for any training data set applied to those models is theoretically infinite. The best performance we typically get is only the best one selected from a limited number of neural networks, i.e., the single model with the best generalization to a testing set. One interesting point is that, in a prediction case, other less-accurate predictors may generate a more accurate forecast than the most-accurate predictor. Thus, simply selecting the best predictors according to their performance is not the optimal choice. More and more researchers have realized that merely selecting the predictor that gives the best performance on the testing set will lose potentially valuable information contained in the other, less successful predictors. This limitation of individual neural networks suggested a different approach to solving these problems, one that treats the discarded, less accurate neural networks as potential candidates for a new network model: a neural network ensemble model [8].

In addition, another important motivation behind ensemble learning methods that integrate different neural network models is the fundamental assumption that one cannot identify the true process exactly, but different models may play a complementary role in approximating this process. In this sense, an ensemble learning model is generally better than a single model.

However, several main problems remain to be addressed. First of all, the existing neural network ensemble learning models basically rely on standard feed-forward neural network models [3,6,8,10–14,17,22] such as back-propagation neural networks (BPNN). But one main drawback of standard feed-forward neural networks is that their learning processes are time-consuming and tend to get stuck at local minima [7]. For this reason, this study adopts another neural network type, the radial basis function (RBF) neural network, for ensemble learning. As [2,19] indicated, the RBF neural network can overcome the above drawbacks and obtain good performance.

Second, the output of the existing neural network ensemble models is a weighted average of the outputs of each neural network, with the ensemble weights determined as a function of the relative error of each network determined in training [13,23]. Such methods include the simple averaging method [22,23], the simple mean squared error (MSE) method [1,23], the stacked regression method [4,23] and the variance-based weighting method [10,23]. However, numerous empirical experiments have demonstrated that each existing method has its own drawbacks, as discussed in Section 3.3. To address this problem, the study proposes a novel ensemble strategy, the RBF-based ensembling method: we use another RBF network for the neural network ensemble. In this sense, the RBF network ensemble learning model is actually an embedded learning model or a modularized learning model.

Third, most studies of existing neural network ensemble learning models focus on the ensemble strategy, while the ensemble learning process itself is often neglected. To address this issue, this study proposes a multistage process that formulates an overall ensemble learning procedure.

In view of the above three aspects, this study formulates a multistage RBF neural network ensemble learning model in an attempt to solve the three main issues mentioned above. For further illustration and verification, the proposed RBF neural network ensemble learning model is applied to exchange rates prediction. The main goal of the proposed RBF ensemble learning method is to overcome the shortcomings of the existing neural network ensemble methods and improve forecasting performance. The objectives of this study are three-fold: (1) to show how to construct a multistage RBF neural network ensemble learning model; (2) to show how to predict exchange rates using the proposed RBF neural network ensemble learning model; and (3) to compare the performance of various methods in predicting foreign exchange rates. In line with these three objectives, this study mainly describes the building process of the proposed nonlinear ensemble learning model and the application of the RBF neural network ensemble learning approach to forecasting the exchange rates between the US dollar and four other major currencies (British pounds, euros, German marks and Japanese yen), comparing forecasting performance under various evaluation criteria.

The rest of this study is organized as follows. Section 2 briefly describes the RBF neural network. Section 3 presents the building process of the multistage RBF neural network ensemble learning model in detail. For illustration purposes, the empirical results for the exchange rates of four main currencies are reported in Section 4. The conclusions are contained in Section 5.

2. An overview of the RBF neural network

The RBF neural network [2,19] is generally composed of three layers: an input layer, a hidden layer and an output layer. The input layer feeds the input data to each of the nodes of the hidden layer. The hidden layer differs greatly from that of other neural networks in that each node represents a data cluster centered at a particular point with a given radius. Each node in the hidden layer calculates the distance from the input vector to its own center. The calculated distance is transformed via some basis function, and the result is output from the node. The output from each node is multiplied by a constant or weighting value and fed into the output layer. The output layer consists of only one node, which sums the outputs of the previous layer to yield a final output value [23]. A generic architecture of an RBF network with k input and m hidden nodes is illustrated in Fig. 1.

The computation process of the RBF neural network proceeds as follows. When the network receives a k-dimensional input vector X, it computes a scalar value using the following formula:

$$Y = f(X) = w_0 + \sum_{i=1}^{m} w_i \varphi(D_i) \qquad (1)$$

where $w_0$ is the bias, $w_i$ is the weight parameter, $m$ is the number of nodes in the hidden layer of the RBF neural network, and $\varphi(D_i)$ is the RBF. In this study, the Gaussian function is used as the RBF, as shown below:

$$\varphi(D_i) = \exp\left(-D_i^2 / \sigma^2\right) \qquad (2)$$

where $\sigma$ is the radius of the cluster represented by the center node, and $D_i$ represents the distance between the input vector $X$ and the $i$th data center. Clearly, $\varphi(D_i)$ returns values between 0 and 1. Usually, the Euclidean norm is used to calculate the distance, but other metrics can also be used. The Euclidean norm is calculated by

$$D_i = \sqrt{\sum_{j=1}^{k} \left(x_j - c_{ji}\right)^2} \qquad (3)$$

where $c_{ji}$ is the $j$th component of the cluster center $c_i$ of the $i$th node in the hidden layer.

Fig. 1. The generic architecture of the RBF neural network.
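Eqs. (1)–(3) can be sketched in Python with NumPy as below. This is a minimal illustration, not the authors' implementation. The function names `hidden_activations`, `rbf_predict` and `fit_output_weights` are hypothetical. Fitting the bias and output weights by linear least squares once the centers and radius are fixed is an assumption made here for illustration.

```python
import numpy as np


def hidden_activations(X, centers, sigma):
    """Gaussian activations phi(D_i) of Eqs. (2)-(3) for every input row and hidden node.

    X: (n, k) inputs; centers: (m, k) cluster centers c_i;
    sigma: a scalar radius, or an (m,) array with one radius per hidden node.
    Returns an (n, m) matrix.
    """
    # Eq. (3): Euclidean distance D_i between each input vector and each center.
    diffs = X[:, None, :] - centers[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=2))
    # Eq. (2): phi(D_i) = exp(-D_i^2 / sigma^2), which lies in (0, 1].
    return np.exp(-(D ** 2) / np.asarray(sigma) ** 2)


def rbf_predict(X, centers, sigma, w0, w):
    """Eq. (1): Y = w0 + sum_i w_i * phi(D_i), computed for every row of X."""
    return w0 + hidden_activations(X, centers, sigma) @ w


def fit_output_weights(X, y, centers, sigma):
    """Assumed training step: with centers and radius fixed, (w0, w) solve a linear least-squares problem."""
    H = hidden_activations(X, centers, sigma)
    A = np.hstack([np.ones((len(X), 1)), H])  # leading column of ones carries the bias w0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


# Usage on synthetic data (hypothetical target function and sizes).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
centers = X[rng.choice(len(X), size=12, replace=False)]
w0, w = fit_output_weights(X, y, centers, sigma=0.5)
train_rmse = np.sqrt(np.mean((rbf_predict(X, centers, 0.5, w0, w) - y) ** 2))
```

Because only the output layer is fitted here, training reduces to a single call to `np.linalg.lstsq`. This is the property the text relies on when it contrasts RBF training with back-propagation.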

Complex nonlinear systems such as foreign exchange rates data are generally difficult to model using linear regression methodologies [19]. Unlike regression, neural networks are nonlinear, and their parameters are determined by learning techniques and search algorithms such as error back-propagation and the steepest gradient algorithm. The main shortcomings of the standard BPNN are that its learning process is slow, and thus time-consuming, and that it tends to get stuck at local minima [7]. RBF neural networks overcome these problems and obtain good performance, since the parameters that need to be trained are the ones in the hidden layer of the network. Finding their values is the solution of a linear problem and can be obtained through interpolation [2]. Therefore, their parameters are found much faster than in BPNN. Furthermore, the RBF neural network can usually reach near-perfect accuracy on the training data set without getting trapped in local minima [19].

3. Building process of the RBF neural network ensemble model

In this section, a three-stage RBF neural network ensemble learning model is proposed for exchange rates prediction. In the first stage, multiple single RBF neural network predictors are produced through diversification. In the second stage, an appropriate number of RBF neural network predictors are chosen from the considerable number of candidate predictors generated in the previous stage. That is, the first two stages adopt an underlying "overproduce and choose" paradigm. In the final stage, the selected RBF neural network predictors are combined into an aggregated output in an embedded learning way, using another RBF neural network model.

3.1. Producing multiple single RBF neural network predictors

According to the bias-variance trade-off principle [24], an ensemble model consisting of diverse models with much disagreement is more likely to perform well. Therefore, how to generate diverse models is a crucial factor. For the RBF neural network model, several methods have been investigated for generating ensemble members that make different errors. Such methods basically depend on varying the parameters of the RBF neural networks or utilizing different training sets. In particular, the main methods include the following four:

(1) Varying the RBF neural network architecture: by changing the number of nodes in the hidden layer, diverse RBF neural networks with much disagreement can be created.

(2) Utilizing different cluster centers: by varying the cluster center c of the RBF neural networks, different RBF neural networks can be produced.

(3) Adopting different cluster radii: by varying the cluster radius σ of the RBF neural networks, different RBF neural networks can be generated.

(4) Using different training data: by re-sampling and preprocessing data, we can obtain different training sets. Typical methods include bagging [5], cross-validation (CV) [10], stacking [20], and boosting [15]. With these different training datasets, diverse RBF neural network models can be produced.

Any one of the above four ways can be used. Of course, diverse RBF neural network ensemble candidates could also be created using a hybridization of two or more of the above methods, e.g., different RBF network architectures plus different training data [16]. In this study, we adopt a single one of these ways to create ensemble members. Once the individual RBF neural network predictors are created, we need to select some representative members for ensemble purposes, to save computational costs and speed up the computational process.
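Stage 1 can be sketched as below. The text does not say which of the four diversification ways the study uses, so this sketch makes an assumption: it loops over hypothetical grids of hidden-node counts (way 1) and radii (way 3), draws centers at random from the training inputs (way 2), and can optionally resample the training set bagging-style (way 4). All function names are hypothetical, and the output weights are again fitted by linear least squares as an assumption.

```python
import numpy as np


def design_matrix(X, centers, sigma):
    """Rows [1, phi(D_1), ..., phi(D_m)] with Gaussian phi as in Eq. (2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances D_i^2
    return np.hstack([np.ones((len(X), 1)), np.exp(-d2 / sigma ** 2)])


def train_member(X, y, n_hidden, sigma, rng, bootstrap=False):
    """Train one RBF member; diversity comes from n_hidden, sigma, random centers and optional resampling."""
    if bootstrap:  # way (4): bagging-style resample of the training set, with replacement
        idx = rng.integers(0, len(X), size=len(X))
        X, y = X[idx], y[idx]
    centers = X[rng.choice(len(X), size=n_hidden, replace=False)]  # way (2): random centers
    coef, *_ = np.linalg.lstsq(design_matrix(X, centers, sigma), y, rcond=None)
    return centers, sigma, coef


def member_predict(member, X):
    centers, sigma, coef = member
    return design_matrix(X, centers, sigma) @ coef


def overproduce(X, y, hidden_sizes, sigmas, seed=0, bootstrap=False):
    """Stage 1: one member per (hidden-node count, radius) pair, i.e. ways (1) and (3)."""
    rng = np.random.default_rng(seed)
    return [train_member(X, y, h, s, rng, bootstrap)
            for h in hidden_sizes for s in sigmas]


# Usage on synthetic data (hypothetical inputs and target).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 3))
y = X[:, 0] * X[:, 1] + np.cos(2 * X[:, 2])
members = overproduce(X, y, hidden_sizes=[5, 10, 20], sigmas=[0.5, 1.0])
P = np.column_stack([member_predict(m, X) for m in members])  # n x p matrix of member forecasts
errors = P - y[:, None]  # an n x p error matrix of the kind used in Section 3.2
```

The `errors` matrix produced at the end is the kind of input the member-selection stage of Section 3.2 operates on.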

3.2. Choosing appropriate ensemble members

Using the diverse RBF neural network models generated in the previous stage and the training data, each individual RBF neural predictor generates its own result when facing a new sample. However, if there are a great number of individual members (i.e., the previous stage overproduces a pool of ensemble members), we need to select a subset of representatives in order to improve ensemble efficiency and save computational cost. As Yu et al. [22] claimed, the rule "the more, the better" does not hold in all circumstances. Thus, it is necessary to use an appropriate method to choose or determine the number of individual neural network models (i.e., ensemble members) for ensemble forecasting. Generally, we choose ensemble members whose errors are weakly correlated. In the work of Yu et al. [22], a principal component analysis (PCA) technique was utilized to select the appropriate number of ensemble members, and good performance was obtained in the experimental analysis. However, PCA is a data-reduction technique that does not consider the internal correlations between different ensemble members. To overcome this problem, a conditional generalized variance (CGV) minimization method is proposed here.

Suppose that there are $p$ neural predictors, each with $n$ forecast values. Then the error matrix $(e_1, e_2, \ldots, e_p)$ of the $p$ predictors is represented as

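$$E = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1p} \\ e_{21} & e_{22} & \cdots & e_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ e_{n1} & e_{n2} & \cdots & e_{np} \end{bmatrix}_{n \times p} \qquad (4)$$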

From this matrix, the mean, variance and covariance of $E$ can be calculated as

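$$\text{Mean:} \quad \bar{e}_i = \frac{1}{n} \sum_{k=1}^{n} e_{ki}, \qquad i = 1, 2, \ldots, p \qquad (5)$$

$$\text{Variance:} \quad V_{ii} = \frac{1}{n} \sum_{k=1}^{n} \left(e_{ki} - \bar{e}_i\right)^2, \qquad i = 1, 2, \ldots, p \qquad (6)$$

$$\text{Covariance:} \quad V_{ij} = \frac{1}{n} \sum_{k=1}^{n} \left(e_{ki} - \bar{e}_i\right)\left(e_{kj} - \bar{e}_j\right), \qquad i, j = 1, 2, \ldots, p \qquad (7)$$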

Considering Eqs. (6) and (7), we obtain the variance-covariance matrix

$$V_{p \times p} = \left(V_{ij}\right) \qquad (8)$$

Here, we use the determinant of $V$, $|V|$, to represent the correlation among the $p$ predictors. When $p$ is equal to one, $|V| = |V_{11}|$ is the variance of $e_1$ (the first predictor). When $p$ is larger than one, $|V|$ can be considered a generalization of variance; therefore, we call $|V|$ the generalized variance. Clearly, when the errors of the $p$ predictors are linearly dependent (e.g., perfectly correlated), the generalized variance $|V|$ is equal to zero. On the other hand, when the $p$ predictors are independent, the generalized variance $|V|$ reaches its maximum for the given individual variances, namely $V_{11} V_{22} \cdots V_{pp}$. Therefore, when the $p$ predictors are neither independent nor perfectly correlated, the generalized variance $|V|$ reflects the correlation among the $p$ predictors.
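Eqs. (5)–(8), the generalized variance $|V|$ and the CGV-based selection procedure of this section can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' code. The function names and the threshold value are hypothetical. The CGV of the $i$th predictor is computed as the scalar Schur complement $t_i = V_{22} - V_{21} V_{11}^{-1} V_{12}$, where predictor $i$ plays the role of $e^{(2)}$ and the other $p-1$ predictors form $e^{(1)}$. A pseudo-inverse is used in place of $V_{11}^{-1}$ so that exactly collinear errors do not raise an exception; that substitution is also an assumption.

```python
import numpy as np


def error_covariance(E):
    """Eqs. (5)-(8): p x p matrix V with V_ij = (1/n) sum_k (e_ki - mean_i)(e_kj - mean_j)."""
    centered = E - E.mean(axis=0)  # subtract the column means of Eq. (5)
    return centered.T @ centered / E.shape[0]


def generalized_variance(E):
    """|V|: the determinant of the variance-covariance matrix."""
    return np.linalg.det(error_covariance(E))


def conditional_gv(V, i):
    """CGV t_i: V22 - V21 V11^{-1} V12 with e(2) = e_i and e(1) = all other predictors."""
    rest = [j for j in range(V.shape[0]) if j != i]
    V11 = V[np.ix_(rest, rest)]
    v12 = V[rest, i]  # V12 as a vector; V21 is its transpose because V is symmetric
    return float(V[i, i] - v12 @ np.linalg.pinv(V11) @ v12)


def select_by_cgv(E, theta, max_rounds=20):
    """Drop every predictor with t_i < theta, then repeat on the survivors."""
    keep = list(range(E.shape[1]))
    for _ in range(max_rounds):
        if len(keep) < 2:
            break
        V = error_covariance(E[:, keep])
        t = [conditional_gv(V, pos) for pos in range(len(keep))]
        survivors = [col for col, t_i in zip(keep, t) if t_i >= theta]
        if survivors == keep:  # nothing deleted in this round: stop iterating
            break
        keep = survivors
    return keep


# Usage: columns 0 and 2 are near-duplicates, column 1 is independent of both.
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 2))
E = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + 0.05 * rng.normal(size=200)])
kept = select_by_cgv(E, theta=0.01)
```

In this sketch, each pass deletes every predictor whose $t_i$ falls below θ at the same time. As a result, the two near-duplicate columns 0 and 2 are both removed and only column 1 survives. If that behavior is unwanted, one possible variant is to remove only the predictor with the smallest $t_i$ in each round.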

Now we introduce the concept of CGV. The matrix $V$ can be reformulated as a block matrix. The detailed process is as follows: $(e_1, e_2, \ldots, e_p)$ is divided into two parts, $(e_1, e_2, \ldots, e_{p_1})$ and $(e_{p_1+1}, e_{p_1+2}, \ldots, e_p)$, denoted as $e^{(1)}$ and $e^{(2)}$, i.e.,

$$E = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_p \end{pmatrix} = \begin{pmatrix} e^{(1)} \\ e^{(2)} \end{pmatrix}, \qquad p_1 + p_2 = p \qquad (9)$$

$$V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} \qquad (10)$$

where $V_{11}$ ($p_1 \times p_1$) and $V_{22}$ ($p_2 \times p_2$) represent the covariance matrices of $e^{(1)}$ and $e^{(2)}$, and $V_{12} = V_{21}^{T}$ is their cross-covariance. Given $e^{(1)}$, the CGV of $e^{(2)}$, $V(e^{(2)} \mid e^{(1)})$, can be expressed as

$$V\left(e^{(2)} \mid e^{(1)}\right) = V_{22} - V_{21} V_{11}^{-1} V_{12} \qquad (11)$$

The above equation shows the change of $e^{(2)}$ given that $e^{(1)}$ is known. If $e^{(2)}$ has only a small change given $e^{(1)}$, then the predictors in $e^{(2)}$ can be deleted, because the predictors in $e^{(1)}$ already carry all the information that the predictors in $e^{(2)}$ reflect. An algorithm for minimizing the CGV is given below:

(1) Considering the $p$ predictors, divide the errors into two parts: $(e_1, e_2, \ldots, e_{p-1})$ is seen as $e^{(1)}$ and $(e_p)$ is seen as $e^{(2)}$.

(2) Calculate the CGV $V(e^{(2)} \mid e^{(1)})$ according to Eq. (11). Since $e^{(2)}$ contains a single predictor here, $V(e^{(2)} \mid e^{(1)})$ is a scalar, denoted as $t_p$.

(3) Similarly, for the $i$th predictor ($i = 1, 2, \ldots, p$), use $(e_i)$ as $e^{(2)}$ and the other $p - 1$ predictors as $e^{(1)}$; then calculate the CGV of the $i$th predictor, $t_i$, with Eq. (11).

(4) For a pre-specified threshold $\theta$, if $t_i < \theta$, the $i$th predictor should be deleted from the $p$ predictors. Conversely, if $t_i > \theta$, the $i$th predictor should be retained.

(5) For the retained predictors, perform procedures (1)–(4) iteratively until satisfactory results are obtained.

3.3. Combining the selected members

Based on the work done in the previous stages, a collection of appropriate ensemble members is obtained. The subsequent task is to combine these selected members into an aggregated predictor with an appropriate ensemble strategy. To help the reader follow the notation, we first define the ensemble predictors. Suppose there are $n$ individual RBF neural networks trained on a data set $D = \{x_i, y_i\}$ ($i = 1, 2, \ldots, n$). After training, $n$ individual RBF neural network outputs, $f_1(x), f_2(x), \ldots, f_n(x)$, are generated. Through the selection procedure presented in Section 3.2, $m$ RBF ensemble members, $f_1(x), f_2(x), \ldots, f_m(x)$, are chosen. The question is then how to combine (ensemble) these selected members into an aggregate output $\hat{y} = \hat{f}(x)$, which is assumed to be more accurate. The general form of such an ensemble predictor can be defined as

$$\hat{f}(x) = \sum_{i=1}^{m} w_i \hat{f}_i(x) \qquad (12)$$

where $w_i$ denotes the weight assigned to $f_i(x)$ and, in general, the weights sum to one. In RBF neural network ensemble forecasting, how to determine the ensemble weights is a key issue. As mentioned earlier, there are a variety of methods for determining ensemble weights. When the ensemble members are diverse, averaging them can reduce the ensemble variance. Usually, the simple averaging method for ensemble forecasting is defined as

$$\hat{f}(x) = \sum_{i=1}^{m} w_i \hat{f}_i(x) = \frac{1}{m} \sum_{i=1}^{m} \hat{f}_i(x) \qquad (13)$$

where the weight of each individual network output is $w_{opt,i} = 1/m$. However, this approach treats each member equally, i.e., it does not emphasize the ensemble members that contribute more to the final generalization. That is, it does not take into account the fact that some networks may be more accurate than others. If the variances of the ensemble networks are very different, we cannot expect to obtain a better result using the simple averaging method [18]. In addition, since the weights in the combination are so unstable, a simple average may not be the best choice in practice [9].
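Eq. (13), and the variance-reduction argument behind it, can be checked numerically. The sketch below uses hypothetical names and synthetic forecasts, and averages $m$ member forecasts with equal weights $1/m$. Squared error is convex, so at every point the squared error of the average is at most the average of the members' squared errors. Consequently, the MSE of the equal-weight ensemble can never exceed the mean of the members' MSEs. It can, however, be worse than the single best member.

```python
import numpy as np


def simple_average(P):
    """Eq. (13): equal weights w_i = 1/m over the m columns of the n x m forecast matrix P."""
    return P.mean(axis=1)


def mse(forecast, target):
    return float(np.mean((forecast - target) ** 2))


# Three synthetic members whose independent errors have very different sizes.
rng = np.random.default_rng(3)
target = np.sin(np.linspace(0.0, 6.0, 100))
noise_sd = np.array([0.1, 0.3, 0.6])
P = target[:, None] + rng.normal(size=(100, 3)) * noise_sd
ensemble_mse = mse(simple_average(P), target)
member_mses = [mse(P[:, i], target) for i in range(P.shape[1])]
```

With these noise levels, the expected error variance of the average is (0.01 + 0.09 + 0.36)/9 ≈ 0.051, compared with about 0.01 for the most accurate member. This reproduces the situation described above, in which members with very different variances make simple averaging a poor choice.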

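The simple MSE approach estimates the linear weight parameters $w_i$ ($i = 1, 2, \ldots, m$) in Eq. (12) by minimizing the MSE [1]. Writing $w = (w_1, \ldots, w_m)^{T}$ and $\hat{f}(x) = (\hat{f}_1(x), \ldots, \hat{f}_m(x))^{T}$, this gives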

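$$w_{opt} = \arg\min_{w} \left\{ \sum_{j=1}^{k} \left( w^{T} \hat{f}(x_j) - d(x_j) \right)^{2} \right\} = \left( \sum_{j=1}^{k} \hat{f}(x_j) \hat{f}^{T}(x_j) \right)^{-1} \sum_{j=1}^{k} d(x_j) \hat{f}(x_j) \qquad (14)$$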

where $d(x_j)$ is the expected (target) value at $x_j$. The simple MSE solution seems reasonable, but, as Breiman [4] has pointed out, this approach has two serious problems in practice:

(1) The data are used both in the training of each predictor and in the estimation of $w_i$; and

(2) Individual predictors are often strongly correlated, since they try to predict the same task.

Due to these problems, this approach's generalization ability will be poor. To address these two issues, another ensemble method called stacked regression is proposed.
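The closed form in Eq. (14) amounts to solving the normal equations of a least-squares regression of the targets on the member forecasts. The sketch below uses hypothetical names and synthetic data. It computes those weights and also illustrates Breiman's second objection: when two members are strongly correlated, the matrix being inverted is badly conditioned, so the estimated weights become unstable. This unconstrained solution does not force the weights to sum to one.

```python
import numpy as np


def mse_weights(P, d):
    """Eq. (14): w = (sum_j f(x_j) f(x_j)^T)^{-1} sum_j d(x_j) f(x_j).

    P: (k, m) matrix whose row j is f(x_j) = (f_1(x_j), ..., f_m(x_j));
    d: (k,) target values d(x_j).
    """
    gram = P.T @ P  # sum over j of the outer products f(x_j) f(x_j)^T
    rhs = P.T @ d   # sum over j of d(x_j) f(x_j)
    return np.linalg.solve(gram, rhs)


rng = np.random.default_rng(4)
f1 = rng.normal(size=200)
# Two weakly related members versus two almost identical (strongly correlated) members.
P_diverse = np.column_stack([f1, rng.normal(size=200)])
P_similar = np.column_stack([f1, f1 + 1e-3 * rng.normal(size=200)])
cond_diverse = np.linalg.cond(P_diverse.T @ P_diverse)
cond_similar = np.linalg.cond(P_similar.T @ P_similar)
```

The condition number for the nearly identical pair is several orders of magnitude larger than for the diverse pair. Therefore, small perturbations of the data can move the fitted weights a long way.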

The stacked regression method was proposed by Breiman [4] in order to solve the problems associated with the simple MSE method described above. Thus, the stacked regression method is also called the modified MSE method. This approach utilizes CV data to modify the simple MSE solution, i.e.,

$$w_{opt} = \arg\min_{w} \left\{ \sum_{j=1}^{k} \left( w^{T} g(x_j) - d(x_j) \right)^{2} \right\}$$