

Improved extreme learning machine for multivariate time series online sequential prediction

Xinying Wang, Min Han (corresponding author)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China

This work was supported by the National Natural Science Foundation of China (61374154) and the National Basic Research Program of China (973 Program) (2013CB430403).


Article history: Received 11 July 2014; Received in revised form 22 December 2014; Accepted 22 December 2014

Keywords: Online prediction; Multivariate time series; Extreme learning machine; LM algorithm

Abstract

Multivariate time series has attracted increasing attention due to the rich dynamic information it carries about the underlying systems. This paper presents an improved extreme learning machine for online sequential prediction of multivariate time series. The multivariate time series is first phase-space reconstructed to form the input and output samples. An extreme learning machine, which has a simple structure and good performance, is used as the prediction model. On the basis of the specific network function of the extreme learning machine, an improved Levenberg–Marquardt algorithm, in which the Hessian matrix and gradient vector are calculated iteratively, is developed to implement online sequential prediction. Finally, simulation results on artificial and real-world multivariate time series are provided to substantiate the effectiveness of the proposed method.

© 2014 Elsevier Ltd. All rights reserved.

    1. Introduction

Time series prediction, which abounds in both scientific research and engineering applications, has attracted increasing attention for years (De Gooijer and Hyndman, 2006). Due to the complexity of the underlying systems, nonlinear or chaotic time series prediction has aroused growing concern (Zhao et al., 2009; Li et al., 2012). Furthermore, the time series observed from complex systems generally comprises multiple variables, and a multivariate time series contains more dynamic information about the underlying dynamic system than a univariate one (Cao et al., 1998; Chakraborty et al., 1992). As a consequence, multivariate time series prediction has become an increasingly important research direction (Popescu, 2011; von Bünau et al., 2009).

In the existing literature, support vector machines (Sapankevych and Sankar, 2009), neural networks (Shi and Han, 2007; Pino et al., 2008), and other machine learning methods (Bai and Li, 2012) have been investigated for time series prediction. Moreover, according to Takens' embedding theorem (Takens, 1981), a time series can be reconstructed into phase-space by a time-delayed coordinate map, which translates temporal correlation into spatial correlation in phase-space. Because of their universal approximation capability and distributed computing characteristics, neural networks have become one of the most influential prediction tools (Jaeger and Haas, 2004; Zemouri et al., 2003; Niska et al., 2004).

However, the traditional gradient-based learning algorithms for neural networks converge slowly and are easily trapped in local optima, which constrains the prediction performance of neural networks. To deal with these shortcomings, the extreme learning machine (ELM) has been developed (Huang et al., 2006b). In contrast to other neural networks with random weights (Huang, 2014), the input weights and the biases of the hidden nodes of ELM are generated randomly before learning, and optimal output weights can be obtained by a one-shot algorithm. Owing to its simple structure, fast learning speed and good generalization performance, ELM has been successfully applied to function approximation (Rong et al., 2009), time series prediction (Nizar et al., 2008; Lian et al., 2013), pattern classification (Man et al., 2012; Miche et al., 2010; Luo and Zhang, 2014) and other fields (Soria-Olivas et al., 2011; Ye et al., 2013). Although ELM has greatly improved neural network training speed and accuracy, there are still some shortcomings (Huang et al., 2011).

The ridge regression algorithm has been introduced to improve the stability and generalization performance of ELM (Deng et al., 2009; Huang et al., 2012), and a second-order Newton optimization algorithm has been applied to ELM training (Balasundaram, 2013). Besides, online learning variants of ELM have been proposed to satisfy real-time and online learning requirements. Online sequential ELM (OS-ELM) (Liang et al., 2006) provides a sequential implementation of the least squares solution of ELM. Successively, the ensemble of online sequential extreme learning machines (EOS-ELM) (Lan et al., 2009), the online sequential extreme learning machine with forgetting mechanism (FOS-ELM) (Zhao et al., 2012), the regularized online sequential learning algorithm (ReOS-ELM) (Huynh and Won, 2011), the low complexity adaptive forgetting factor OS-ELM (LAFF-COS-ELM) (Lim et al., 2013), online sequential ELM-TV (Ye et al., 2013) and the online sequential extreme learning machine with kernels (OS-ELMK) (Wang and Han, 2014a) have been proposed, and their superior performance has been verified.

Considering the advantages of ELM, some variants of ELM have been used to predict multivariate time series. In Wang and Han (2012), a model selection algorithm is applied to determine the optimal structure of ELM, and the resulting model is used to predict multivariate chaotic time series. In Wang and Han (2014), different kernels are used together to map the multivariate time series, yielding the multiple kernel extreme learning machine (MKELM). However, both of these methods belong to the batch or offline prediction framework. In order to solve the problem of online sequential prediction of multivariate time series, an improved ELM prediction model is presented in this paper. The multivariate time series is first reconstructed into the phase-space, where ELM is used to approximate the input–output mapping. An improved Levenberg–Marquardt (LM) algorithm is developed to optimize the output weights of the ELM prediction model online and sequentially. When new samples are observed, the Hessian matrix and the gradient vector are updated iteratively, and the corresponding output weights are tuned immediately. As a result, the ELM can learn from the latest observed time series in real time. The paper is organized as follows. In the second section, the problem definitions are given. Some preliminary work is briefly reviewed in the third section. Next, an improved online sequential LM algorithm is presented and incorporated into the ELM prediction model. Finally, three experiments on artificial and real-world multivariate time series are conducted to illustrate the effectiveness of the proposed method compared with other existing approaches.

2. Problem definitions

The variables and notations are defined in Table 1.

    3. Preliminaries

In this section, we give a brief review of multivariate time series reconstruction and the ELM model.

    3.1. Multivariate time series reconstruction

A time series is a sequence of value points measured at successive times, typically spaced at uniform time intervals. Time series prediction is the use of a model to predict future values based on previously observed values. In order to establish a prediction model for time series data generated by a nonlinear dynamic system, a time-delayed phase-space reconstruction is generally used as preprocessing. According to Takens' embedding theorem (Takens, 1981), provided enough delayed coordinates are used, a scalar time series is sufficient to reconstruct the dynamics of the underlying system. However, it is not certain whether a given scalar time series is sufficient to reconstruct the dynamics (Popescu, 2011). Additionally, a multivariate time series contains more dynamic information than a scalar time series, so using the available multivariate time series can improve prediction performance (Chakraborty et al., 1992).

Consider an M-dimensional time series $X_1, X_2, \ldots, X_N$, where $X_i = (x_{1,i}, x_{2,i}, \ldots, x_{M,i})$, $i = 1, 2, \ldots, N$. As in the case of a scalar time series (where $M = 1$), a time-delayed reconstruction can be made as follows:

$$V_n = \big[x_{1,n},\, x_{1,n-\tau_1},\, \ldots,\, x_{1,n-(d_1-1)\tau_1};\; x_{2,n},\, x_{2,n-\tau_2},\, \ldots,\, x_{2,n-(d_2-1)\tau_2};\; \ldots;\; x_{M,n},\, x_{M,n-\tau_M},\, \ldots,\, x_{M,n-(d_M-1)\tau_M}\big]^T \qquad (1)$$

where $\tau_i$ and $d_i$, $i = 1, \ldots, M$, are the time delays and the embedding dimensions, respectively. By Takens' embedding theorem (Takens, 1981), if $d$ or each $d_i$ is large enough $\left(d = \sum_{i=1}^{M} d_i\right)$, there generally exists a function $F : \mathbb{R}^d \to \mathbb{R}^d$ such that

$$V_{n+1} = F(V_n) \qquad (2)$$

The equivalent form of (2) can be written as

$$x_{1,n+1} = F_1(V_n),\quad x_{2,n+1} = F_2(V_n),\quad \ldots,\quad x_{M,n+1} = F_M(V_n) \qquad (3)$$

The remaining problems are how to choose the time delays $\tau_i$ and embedding dimensions $d_i$, $i = 1, \ldots, M$, so that (2) or (3) holds. There are several methods for choosing the time delay for a scalar time series, such as mutual information and autocorrelation (Sun et al., 2014).
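To make (1) concrete, a minimal NumPy sketch of the multivariate time-delayed reconstruction is given below; the function name and the column-per-variable array layout are our own choices, not from the paper.

```python
import numpy as np

def reconstruct(series, delays, dims):
    """Time-delayed phase-space reconstruction of an M-variate series, cf. Eq. (1).

    series : (N, M) array, one column per variable
    delays : list of M time delays tau_i
    dims   : list of M embedding dimensions d_i
    Returns the matrix of vectors V_n and the matching time indices n.
    """
    N, M = series.shape
    # first index n for which all delayed coordinates exist
    start = max((dims[i] - 1) * delays[i] for i in range(M))
    rows = []
    for n in range(start, N):
        coords = []
        for i in range(M):
            coords.extend(series[n - j * delays[i], i] for j in range(dims[i]))
        rows.append(coords)
    return np.asarray(rows), np.arange(start, N)

# Example: trivariate series with tau_i = 1, d_i = 10 (the Lorenz settings of Section 5.1)
X = np.random.randn(2501, 3)                 # placeholder data
V, idx = reconstruct(X, delays=[1, 1, 1], dims=[10, 10, 10])
inputs = V[:-1]                              # reconstructed vectors V_n
targets = X[idx[:-1] + 1, 0]                 # one-step-ahead targets x_{1,n+1}
```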

    3.2. Extreme learning machine prediction model

ELM has a simple three-layer structure: an input layer, an output layer, and a hidden layer which contains a large number of nonlinear processing nodes. The weights connecting the input layer to the hidden layer and the bias values within the hidden layer of ELM are randomly generated and kept fixed throughout the learning process; only the output weights need to be learned. ELM has both interpolation capability and universal approximation capability (Huang et al., 2006a), so it is a promising time series prediction tool.

Mathematically, ELM can be formulated as the following function:

$$\sum_{i=1}^{L} w_i\, g(W_{in_i}, b_i, x_j) = \sum_{i=1}^{L} w_i\, g(W_{in_i} \cdot x_j + b_i) = y_j, \quad j = 1, \ldots, N \qquad (4)$$

where $x_j \in \mathbb{R}^n$ is the input vector, $W_{in_i} \in \mathbb{R}^n$ is the weight vector connecting the input nodes to the $i$th hidden node, $W_{in_i} \cdot x_j$ denotes the inner product of $W_{in_i}$ and $x_j$, $b_i \in \mathbb{R}$ is the bias of the $i$th hidden node, $g$ is the sigmoid activation function, $w_i \in \mathbb{R}$ is the output weight connecting the $i$th hidden node to the output node, $y_j \in \mathbb{R}$ is the output of ELM, $L$ is the number of hidden nodes, and $N$ is the number of training samples. In the ELM learning framework, $W_{in_i}$ and $b_i$ are randomly chosen beforehand.

Function (4) can be further expressed in the following matrix–vector form:

$$Aw = Y \qquad (5)$$

where

$$A = \begin{pmatrix} g(W_{in_1} \cdot x_1 + b_1) & \cdots & g(W_{in_L} \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(W_{in_1} \cdot x_N + b_1) & \cdots & g(W_{in_L} \cdot x_N + b_L) \end{pmatrix}_{N \times L},$$

$Y = [y_1, \ldots, y_N]^T$, and $w = [w_1, w_2, \ldots, w_L]^T$. The matrix $A$ is called the hidden layer output matrix of ELM in Huang et al. (2011); the $i$th column of $A$ is the $i$th hidden node's output vector with respect to the inputs $x_1, x_2, \ldots, x_N$, and the $j$th row of $A$, denoted $a_j$, is the output vector of the hidden layer with respect to the input $x_j$.

If the ELM model with $L$ hidden nodes can learn these $N$ training samples with no residuals, there exists $w$ such that

$$\sum_{i=1}^{L} w_i\, g(W_{in_i} \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N \qquad (6)$$


where $t_j$ is the target value. Eq. (6) can be written compactly in matrix–vector form as

$$Aw = T \qquad (7)$$

where $T = [t_1, \ldots, t_N]^T$ is the target vector. As the input weights and the hidden layer biases have been randomly chosen at the beginning of learning, (7) becomes a linear parameter system, and the smallest-norm least squares solution of this system can be written as

$$w = A^{\dagger} T \qquad (8)$$

where $A^{\dagger}$ is the Moore–Penrose generalized inverse (Huang et al., 2011) of the hidden layer output matrix $A$. In practice, the Moore–Penrose generalized inverse of the hidden layer output matrix is calculated using the singular value decomposition (SVD), and the non-zero singular values are then used to construct the output weights. However, when the numbers of training samples and hidden nodes are large, the computational complexity of the SVD seriously restricts the learning speed of ELM.
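For reference, a minimal batch ELM following (4)–(8) can be written as below; the class name and interface are illustrative, not from the paper.

```python
import numpy as np

class BatchELM:
    """Basic ELM: random input weights/biases, least-squares output weights (Eq. (8))."""

    def __init__(self, n_in, n_hidden, seed=None):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in))  # random input weights
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)             # random hidden biases

    def hidden(self, X):
        # Hidden layer output matrix A (N x L), sigmoid activation g
        return 1.0 / (1.0 + np.exp(-(X @ self.W_in.T + self.b)))

    def fit(self, X, T):
        A = self.hidden(X)
        self.w = np.linalg.pinv(A) @ T   # Moore-Penrose pseudoinverse, computed via SVD
        return self

    def predict(self, X):
        return self.hidden(X) @ self.w
```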

4. Multivariate time series online sequential prediction based on improved ELM

The goal of ELM training is to determine the output weights that minimize the following cost function:

$$f(w) = \sum_{j=1}^{N} \left( \sum_{i=1}^{L} w_i\, g(W_{in_i} \cdot x_j + b_i) - t_j \right)^{2} \qquad (9)$$

Therefore, the training of the ELM can be transformed into an optimization problem. In this section, an improved online sequential LM algorithm is proposed to solve this optimization problem. Afterwards, based on the improved online sequential LM algorithm, the multivariate time series online sequential prediction model is proposed.

Table 1. Problem definitions.

X_i = (x_{1,i}, x_{2,i}, ..., x_{M,i}) : the M-dimensional time series
V_n : the time-delayed reconstruction vector
τ_i : the time delay of X_i
d_i : the embedding dimension of X_i
x_j ∈ R^n : the jth input vector with n features
W_in_i ∈ R^n : the weight vector connecting the input nodes to the ith hidden node
b_i ∈ R : the bias of the ith hidden node
g : the sigmoid activation function
w_i ∈ R : the output weight connecting the ith hidden node to the output node
y_j ∈ R : the output of ELM
t_j ∈ R : the target value
L : the number of hidden nodes
N : the number of samples, or the length of the time series
A : the hidden layer output matrix of ELM
w = [w_1, ..., w_L]^T : the output weight vector of ELM
Y = [y_1, ..., y_N]^T : the output vector of ELM
T = [t_1, ..., t_N]^T : the target vector
A† : the Moore–Penrose generalized inverse of A
H : the Hessian matrix of ELM
J : the Jacobian matrix of ELM
I : the unit matrix
Q : the quasi-Hessian matrix of ELM
g : the gradient vector of ELM
e : the error vector of ELM
a_k : the kth row of A
q_k : the sub-Hessian matrix of ELM
c_k : the sub-gradient vector of ELM
f(w) : the cost function with respect to w
E(w) : the sum squared error of ELM with respect to w
μ : the control parameter of the learning step of ILM
λ : the regularization coefficient
β : the update parameter
bs : the chunk size of the data

ELM : extreme learning machine
LM : Levenberg–Marquardt
ILM : improved Levenberg–Marquardt
LM-ELM : ELM trained by the LM algorithm
ILM-ELM : ELM trained by the ILM algorithm
OS-ELM : online sequential ELM
MS-ELM : ELM with a model selection algorithm
MKELM : multiple kernel extreme learning machine
RMSE : root mean squared error
TrainTime : the CPU time used to train a prediction model
TrainRMSE : the RMSE measured on the training time series
TrainSTD : the standard deviation of TrainRMSE
TestTime : the CPU time used to test a prediction model
TestRMSE : the RMSE measured on the testing time series
TestSTD : the standard deviation of TestRMSE



    4.1. Improved online sequential LM algorithm for ELM

In this section, an improved LM algorithm (Wilamowski and Yu, 2010b, 2010a) is extended to an online sequential framework for ELM.

The basic ELM aims to minimize the sum squared error (SSE) between the network output and the desired output:

$$E(w) = \frac{1}{2}\sum_{i=1}^{N} e_i^2 = \frac{1}{2}\sum_{i=1}^{N} (y_i - t_i)^2 \qquad (10)$$

The constant $\frac{1}{2}$ is added in order to simplify the calculation. The corresponding Hessian matrix of (10) is

$$H = \begin{bmatrix} \dfrac{\partial^2 E}{\partial w_1^2} & \dfrac{\partial^2 E}{\partial w_1 \partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_1 \partial w_L} \\ \dfrac{\partial^2 E}{\partial w_2 \partial w_1} & \dfrac{\partial^2 E}{\partial w_2^2} & \cdots & \dfrac{\partial^2 E}{\partial w_2 \partial w_L} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 E}{\partial w_L \partial w_1} & \dfrac{\partial^2 E}{\partial w_L \partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_L^2} \end{bmatrix} \qquad (11)$$

According to (10) and (11), the elements of $H$ can be written as

$$\frac{\partial^2 E}{\partial w_i \partial w_j} = \sum_{k=1}^{N} \left( \frac{\partial e_k}{\partial w_i} \frac{\partial e_k}{\partial w_j} + \frac{\partial^2 e_k}{\partial w_i \partial w_j}\, e_k \right) \qquad (12)$$

In the LM algorithm, (12) is approximated as follows:

$$\frac{\partial^2 E}{\partial w_i \partial w_j} \approx \sum_{k=1}^{N} \frac{\partial e_k}{\partial w_i} \frac{\partial e_k}{\partial w_j} \qquad (13)$$

The corresponding Jacobian matrix of (10) can be written as

$$J = \begin{bmatrix} \dfrac{\partial e_1}{\partial w_1} & \dfrac{\partial e_1}{\partial w_2} & \cdots & \dfrac{\partial e_1}{\partial w_L} \\ \dfrac{\partial e_2}{\partial w_1} & \dfrac{\partial e_2}{\partial w_2} & \cdots & \dfrac{\partial e_2}{\partial w_L} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial e_N}{\partial w_1} & \dfrac{\partial e_N}{\partial w_2} & \cdots & \dfrac{\partial e_N}{\partial w_L} \end{bmatrix} \qquad (14)$$

The gradient vector of (10) is

$$g = \left[ \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_L} \right]^T \qquad (15)$$

Substituting (10) into (15), we obtain the specific formula for its elements:

$$g_i = \frac{\partial E}{\partial w_i} = \sum_{k=1}^{N} \frac{\partial e_k}{\partial w_i}\, e_k \qquad (16)$$

With (14), we have

$$g = J^T e \qquad (17)$$

where $e = [e_1, e_2, \ldots, e_N]^T$ is the error vector.

The weight update equations of the LM algorithm can be written as

$$w \leftarrow w - \Delta w \qquad (18)$$

$$\Delta w = \left( J^T J + \mu I \right)^{-1} J^T e \qquad (19)$$

where $\mu$ is the control parameter of the learning step, $w$ is the output weight vector to be optimized, $I$ is the unit matrix, $\lambda$ is the regularization coefficient, and $e$ is the error vector.
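As a point of reference, one LM step per (18)–(19) can be coded directly from a Jacobian and an error vector. This is a minimal sketch of ours, not the paper's code; the sign convention follows the descent direction for minimizing (10):

```python
import numpy as np

def lm_step(w, J, e, mu):
    """One Levenberg-Marquardt update, Eqs. (18)-(19): solve (J^T J + mu*I) dw = J^T e."""
    dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
    return w - dw
```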

One disadvantage of the traditional LM algorithm is the need to compute and store the Jacobian matrix, whose size grows with the number of samples; moreover, the traditional LM algorithm is essentially an offline or batch optimization algorithm. Taking the ELM network equation (4) into account, the following improvement is made in order to overcome these defects and extend the LM algorithm to an online sequential variant.

Considering (4), the elements of (14) can be written as

$$\frac{\partial e_k}{\partial w_i} = \frac{\partial (y_k - t_k)}{\partial w_i} = \frac{\partial \left( \sum_{i=1}^{L} w_i\, g(W_{in_i} \cdot x_k + b_i) - t_k \right)}{\partial w_i} = g(W_{in_i} \cdot x_k + b_i), \quad i = 1, \ldots, L \qquad (20)$$

Due to the specific network function of ELM, the Jacobian matrix $J$ is just the hidden layer output matrix $A$. The $k$th row of the hidden layer output matrix $A$, denoted $a_k$, can be written as follows:

$$a_k = \left[ \frac{\partial e_k}{\partial w_1}, \frac{\partial e_k}{\partial w_2}, \ldots, \frac{\partial e_k}{\partial w_L} \right] \qquad (21)$$

The matrix $J^T J$ can be written as

$$J^T J = A^T A = \begin{bmatrix} a_1^T & \cdots & a_N^T \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_N \end{bmatrix} = \sum_{k=1}^{N} a_k^T a_k \qquad (22)$$

In order to simplify the analysis, the following sub-Hessian matrix is defined:

$$q_k = a_k^T a_k \qquad (23)$$

As a result, the matrix $J^T J$ is defined as a quasi-Hessian matrix $Q$:

$$Q = \sum_{k=1}^{N} q_k \qquad (24)$$

The above analysis shows that the matrix $q_k$ can be computed using only $a_k$; that is, the corresponding sub-matrix $q_k$ can be calculated for each training sample $k$, and the matrix $Q$ can be obtained by accumulating these sub-matrices.

When the samples arrive online and sequentially, the sub-matrix $q_k$ can be computed immediately by (23), and the matrix $Q$ is obtained by accumulating these sub-matrices. Therefore, only $L$ elements of storage (the current row $a_k$) are needed to calculate the matrix $Q$; there is no need to store the Jacobian matrix with its $N \times L$ elements. Additionally, it can be seen from (23) that the matrix $q_k$ is symmetric, so only its lower or upper triangular part needs to be stored, further reducing the computation of the quasi-Hessian matrix.
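A small NumPy check (ours, not from the paper) makes (22)–(24) concrete: accumulating the rank-one sub-Hessians q_k = a_k^T a_k row by row reproduces J^T J exactly, without ever materializing the N × L Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 1000, 60
A = rng.standard_normal((N, L))        # hidden layer output matrix = Jacobian J

# Offline: form J explicitly and compute J^T J
Q_batch = A.T @ A

# Online: accumulate sub-Hessians q_k = a_k^T a_k one row at a time, Eqs. (23)-(24)
Q_stream = np.zeros((L, L))
for a_k in A:                           # only the current L-element row is needed
    Q_stream += np.outer(a_k, a_k)

print(np.allclose(Q_batch, Q_stream))  # True
```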

Taking definition (21) into account, we introduce the following intermediate vector:

$$c_k = a_k^T e_k = \begin{bmatrix} \dfrac{\partial e_k}{\partial w_1} \\ \dfrac{\partial e_k}{\partial w_2} \\ \vdots \\ \dfrac{\partial e_k}{\partial w_L} \end{bmatrix} e_k = \begin{bmatrix} \dfrac{\partial e_k}{\partial w_1}\, e_k \\ \dfrac{\partial e_k}{\partial w_2}\, e_k \\ \vdots \\ \dfrac{\partial e_k}{\partial w_L}\, e_k \end{bmatrix} \qquad (25)$$

With (16) and (25), the gradient vector can be calculated as follows:

$$g = \sum_{k=1}^{N} c_k \qquad (26)$$

Similarly, the sub-gradient vector $c_k$ is calculated for each training sample, and the gradient vector is computed by accumulating these sub-gradient vectors. As $a_k$ is the $k$th row of the hidden layer output matrix $A$, only the scalar $e_k$ needs to be stored in addition.

Considering the specific form of the hidden layer output matrix $A$, when a data sample is observed, the vector $a_k$ can be calculated immediately. As $bs$ data samples are received, the quasi-Hessian matrix $Q$ and the gradient vector $g$ can be estimated by accumulating the sub-Hessian matrices $q_k$ and the sub-gradient vectors $c_k$, respectively. The output weights can then be updated using (18) and (19). When the next $bs$ data samples are received, the matrix $Q$ and the gradient vector $g$ are updated by accumulating the newly computed sub-matrices and sub-gradient vectors, and the output weights are updated further. As a result, the LM algorithm is extended to the online sequential framework.

    4.2. Prediction model based on improved ELM

Based on the above analysis, once appropriate parameters for the multivariate time series phase-space reconstruction have been selected and appropriate input–output sample pairs have been formed, the ELM prediction model based on the improved online sequential LM algorithm can be summarized as follows:

Algorithm 1. ELM based on the improved online sequential LM algorithm.

Initialization:
  Q ← 0, g ← 0, E(w) ← 0
  Randomly generate W_in, b and w

Iteration (when bs samples are observed):
  for k = 1 : bs do
    Calculate the network output y_k by (4)
    Calculate the network error e_k
    Accumulate the error: E(w) ← E(w) + (1/2) e_k^2
    Calculate a_k by (21)
    Calculate q_k by (23) and c_k by (25)
    Q ← Q + q_k
    g ← g + c_k
  end for
  while true do
    Calculate Δw by (19)
    Update the weights w using (18)
    Re-calculate E(w)
    if E(w) is reduced then
      decrease μ by the update parameter β; break
    else
      increase μ by the update parameter β
    end if
  end while

where bs is the chunk size, λ is the regularization coefficient and β is the update parameter. When bs = 1, the improved LM algorithm updates the output weights every time a single sample is observed; when bs > 1, it updates the output weights whenever bs new samples have been observed.
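To make the procedure concrete, the following Python/NumPy sketch implements one reading of Algorithm 1. The class name, the rule of multiplying or dividing μ by β, and the fixed ridge term λI added when solving (19) are our assumptions rather than details stated verbatim in the paper.

```python
import numpy as np

class ILMELM:
    """Sketch of an ELM trained by an improved online sequential LM algorithm."""

    def __init__(self, n_in, n_hidden, mu=0.01, beta=10.0, lam=1e-6, seed=None):
        r = np.random.default_rng(seed)
        self.W_in = r.uniform(-1.0, 1.0, (n_hidden, n_in))  # fixed random input weights
        self.b = r.uniform(-1.0, 1.0, n_hidden)             # fixed random hidden biases
        self.w = r.uniform(-1.0, 1.0, n_hidden)             # output weights to be tuned
        self.Q = np.zeros((n_hidden, n_hidden))             # quasi-Hessian Q, Eq. (24)
        self.g = np.zeros(n_hidden)                         # gradient vector g, Eq. (26)
        self.mu, self.beta, self.lam = mu, beta, lam

    def _hidden(self, X):
        # Rows of the hidden layer output matrix A, cf. Eqs. (20)-(21)
        return 1.0 / (1.0 + np.exp(-(np.atleast_2d(X) @ self.W_in.T + self.b)))

    def partial_fit(self, X_chunk, t_chunk, max_tries=20):
        A = self._hidden(X_chunk)
        t = np.atleast_1d(t_chunk)
        e = A @ self.w - t                  # errors e_k = y_k - t_k
        self.Q += A.T @ A                   # accumulate sum of q_k = a_k^T a_k
        self.g += A.T @ e                   # accumulate sum of c_k = a_k^T e_k
        E_old = 0.5 * float(e @ e)
        I = np.eye(self.w.size)
        for _ in range(max_tries):          # inner LM loop, Eqs. (18)-(19)
            dw = np.linalg.solve(self.Q + (self.mu + self.lam) * I, self.g)
            w_try = self.w - dw
            e_try = A @ w_try - t
            if 0.5 * float(e_try @ e_try) < E_old:
                self.w = w_try              # SSE reduced: accept step, relax damping
                self.mu /= self.beta
                break
            self.mu *= self.beta            # SSE not reduced: damp harder, retry
        return self

    def predict(self, X):
        return self._hidden(X) @ self.w
```

With the experimental settings of Section 5, one would call `partial_fit` once per chunk of bs = 20 samples as they arrive.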

    5. Simulation results

Three artificial and real-world examples are presented to test the performance of the proposed method. To illustrate its effectiveness, the proposed method is compared with ELM (Huang et al., 2006b), OS-ELM (Liang et al., 2006), ELM with model selection (MS-ELM) (Wang and Han, 2012), the multiple kernel extreme learning machine (MKELM) (Wang and Han, 2014) and ELM trained by the traditional LM algorithm (LM-ELM). In this section, the multivariate time series online sequential prediction method based on the improved ELM is denoted ILM-ELM for short.

The parameter settings of ELM, OS-ELM, LM-ELM, MS-ELM and ILM-ELM are as follows. The activation functions are chosen as the sigmoid function. The input weights are randomly generated from the uniform interval [−1, 1]. The inputs are normalized into [−1, 1] and the outputs are normalized into [0, 1]. Both the initialization sample number and the chunk size are set to 20 for OS-ELM and ILM-ELM. The kernels used in MKELM are Gaussian kernels with kernel widths {0.01, 0.06, 0.11, 0.16, 0.2, 0.5, 1, 2, 5, 7, 10, 12, 15, 17, 20} and polynomial kernels with degree p ∈ {1, 2, 3}. As in Wilamowski and Yu (2010a), the parameters β and μ of LM-ELM and ILM-ELM are initialized to 10 and 0.01, respectively. As in Huang et al. (2006b), in order to reduce randomness and test the stability of these methods, 50 trials are conducted for each multivariate time series; the average results and standard deviations are reported in this section.

The root mean squared error (RMSE) is used to characterize the accuracy of prediction:

$$\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (t_i - y_i)^2 } \qquad (27)$$

where $t_i$ denotes the $i$th sample of the desired output, $y_i$ denotes the $i$th sample of the predicted output, and $N$ is the number of samples.
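In code, (27) is a one-line helper (the function name is ours):

```python
import numpy as np

def rmse(t, y):
    """Root mean squared error between targets t and predictions y, Eq. (27)."""
    t, y = np.asarray(t), np.asarray(y)
    return float(np.sqrt(np.mean((t - y) ** 2)))
```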

    5.1. Lorenz chaotic time series

The equations of the Lorenz chaotic system are as follows:

$$\begin{cases} \dot{x} = \sigma (y - x) \\ \dot{y} = \rho x - xz - y \\ \dot{z} = xy - \beta z \end{cases} \qquad (28)$$

When $\sigma = 10$, $\beta = 8/3$, $\rho = 28$ and $x(0) = y(0) = z(0) = 1.0$, the system exhibits chaotic behavior. The fourth-order Runge–Kutta method is used to generate the trivariate time series, and 2501 samples are obtained. In this simulation, the x(t), y(t) and z(t) series are used together to predict x(t). The time delays and embedding dimensions are set as $\tau_1 = \tau_2 = \tau_3 = 1$ and $d_1 = d_2 = d_3 = 10$, respectively; the resulting reconstructed vector is used as the input, and x(t) serves as the target output. The first 2000 reconstructed samples are used to train the models and the remaining samples are used as the testing set. Furthermore, 10% Gaussian white noise (the ratio of the noise standard deviation to the signal standard deviation) is added to the training x(t) series, while the testing time series remains noise-free.
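For reproduction purposes, the trivariate series can be generated as below; the integration step size dt is our assumption, as the paper does not report it.

```python
import numpy as np

def lorenz_rk4(n_samples=2501, dt=0.01, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    """Integrate the Lorenz system, Eq. (28), with the classical fourth-order Runge-Kutta method."""
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), rho * x - x * z - y, x * y - beta * z])

    s = np.array([1.0, 1.0, 1.0])          # x(0) = y(0) = z(0) = 1.0
    out = np.empty((n_samples, 3))
    out[0] = s
    for i in range(1, n_samples):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i] = s
    return out                             # columns: x(t), y(t), z(t)
```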

In order to determine the number of hidden nodes for ELM, LM-ELM, OS-ELM and ILM-ELM, the first 1500 of the 2000 training samples are used to train the ELM with hidden node numbers in {10, 15, 20, ..., 300}, and the RMSE is calculated on the remaining 500 training samples. The results are summarized in Fig. 1. According to Fig. 1, the number of hidden nodes of ELM, LM-ELM, OS-ELM and ILM-ELM is set to 60. Similarly, with 1500 samples used to train the model and 500 samples as the validation set, the hidden node number of MS-ELM is determined by a model selection algorithm, and the regularization coefficient of MKELM is set to 2^{-8}, selected from {2^{-10}, 2^{-9}, ..., 2^{5}}.

The one-step-ahead prediction results for the Lorenz x(t) time series are summarized in Table 2. The training times (TrainTime) of the online sequential learning methods, i.e. OS-ELM and ILM-ELM, are larger than those of the offline ELM and LM-ELM, but much smaller than those of MS-ELM and MKELM. The testing times (TestTime) of the compared methods, except for MKELM, are of the same order of magnitude. In terms of training RMSE (TrainRMSE), the proposed ILM-ELM performs comparably with ELM, LM-ELM, MS-ELM and MKELM, and better than OS-ELM. Considering the standard deviation of the training RMSE (TrainSTD), ILM-ELM matches LM-ELM, is better than MS-ELM, and is much better than OS-ELM. Since MKELM is a kernel-based method, the TrainSTD and TestSTD (the standard deviations of TrainRMSE and TestRMSE) of MKELM are omitted. In the TestRMSE and TestSTD measures, the proposed ILM-ELM performs almost as well as the offline ELM, LM-ELM, MS-ELM and MKELM, while outperforming the online OS-ELM.

Fig. 2 shows the one-step-ahead prediction curves and the error curve of the proposed ILM-ELM method. It can be seen from Fig. 2 that the predicted curve and the actual curve are nearly indistinguishable. Fig. 3 shows the multiple-step-ahead prediction results. From Fig. 3 it can be observed that, as the prediction step increases, the TestRMSEs of ELM, LM-ELM, OS-ELM, MS-ELM, MKELM and ILM-ELM grow quickly. In general, the multiple-step-ahead results of ILM-ELM are similar to those of ELM and LM-ELM. Compared with OS-ELM, which is an online sequential learning method but is sensitive to noise, the proposed ILM-ELM has an adaptive regularization parameter; thus its performance is better than that of OS-ELM.

    5.2. The monthly temperature and rainfall of Dalian

In order to further verify the validity of the proposed ILM-ELM for multivariate time series prediction, the monthly rainfall and temperature time series of Dalian are predicted in this simulation experiment. The monthly rainfall and temperature time series of Dalian, China, from 1951 to 2001, totaling 612 records, are used in this paper. Since these two series have an obvious period of 12 months, we choose the embedding parameters a priori by fixing the time delays $\tau_1 = \tau_2 = 1$ and the embedding dimensions $d_1 = d_2 = 12$. After reconstruction, 600 samples are obtained; the first 480 samples are used as the training set, and the remaining 120 samples are used to test the prediction performance. Within the training set, the first 360 samples are used to train the model, and the remaining 120 samples are used as a validation set to select the hyperparameters of the prediction models.

Fig. 1. The relationship between RMSE and the number of hidden nodes (Lorenz x(t)).

Table 2. Comparison of prediction results (Lorenz x(t)).

Methods     ELM      LM-ELM   OS-ELM   MS-ELM    MKELM     ILM-ELM
TrainTime   0.0265   0.0493   0.0702   40.2420   29.5723   1.5756
TrainRMSE   0.7726   0.7727   0.9113   0.7750    0.7844    0.7870
TrainSTD    0.0026   0.0028   0.0368   0.0054    -         0.0028
TestTime    0.0031   0.0031   0.0012   0.0016    1.3205    0.0015
TestRMSE    0.0956   0.0972   0.3699   0.0945    0.0880    0.1062
TestSTD     0.0110   0.0102   0.0716   0.0133    -         0.0113

Fig. 2. The predicted and actual time series of Lorenz x(t) and the prediction errors.

Fig. 3. Multiple-step-ahead prediction of Lorenz x(t).

Fig. 4. The relationships between validation RMSEs and the number of hidden nodes (monthly temperature, left; rainfall, right).


The validation RMSEs versus the number of hidden nodes of ELM are shown in Fig. 4. According to Fig. 4, the numbers of hidden nodes of ELM, LM-ELM, OS-ELM and ILM-ELM for the monthly temperature and rainfall time series of Dalian are set to 60 and 20, respectively. The hidden node number of MS-ELM is selected by the model selection algorithm, while the regularization coefficient of MKELM is optimized as 2^{-8} for both time series.

The single-step-ahead prediction results for the monthly temperature and rainfall time series of Dalian are shown in Tables 3 and 4. In terms of training time (TrainTime), ILM-ELM takes longer than ELM, LM-ELM and OS-ELM, but less than MS-ELM and MKELM for both time series. MKELM, which uses many different kernels, attains the best training RMSE (TrainRMSE) among the compared methods, while MS-ELM attains the best TrainSTD for both the temperature and rainfall series. The proposed ILM-ELM is superior to OS-ELM in both TrainRMSE and TrainSTD. Considering the testing results, i.e. testing time (TestTime), testing RMSE (TestRMSE) and the standard deviation of the testing RMSE (TestSTD): for the monthly temperature time series, the proposed ILM-ELM attains the best TestRMSE among these methods and a smaller TestSTD than OS-ELM; for the monthly rainfall time series, the TestRMSE of ILM-ELM is smaller than those of ELM, LM-ELM and OS-ELM, while its TestSTD is smaller than those of ELM, OS-ELM and MS-ELM. The TestTime of ILM-ELM is moderate among these methods. Moreover, bearing in mind that ELM, LM-ELM, MS-ELM and MKELM are offline or batch learning methods while OS-ELM and ILM-ELM are online sequential methods, ILM-ELM is superior to OS-ELM in almost all measures while performing comparably with the offline or batch methods.

Table 3. Comparison of prediction results (monthly temperature of Dalian).

Methods     ELM      LM-ELM   OS-ELM   MS-ELM    MKELM    ILM-ELM
TrainTime   0.0131   0.0293   0.0549   26.9968   2.1216   0.2799
TrainRMSE   1.3109   1.3068   1.4477   1.3097    1.1840   1.3675
TrainSTD    0.0290   0.0246   0.0395   0.0216    -        0.0326
TestTime    0.0072   0.0022   0.0066   0.0012    0.1092   0.0025
TestRMSE    1.5011   1.4773   1.6356   1.4819    1.4776   1.4584
TestSTD     0.0702   0.0595   0.1336   0.0709    -        0.0719

Table 4. Comparison of prediction results (monthly rainfall of Dalian).

Methods     ELM       LM-ELM    OS-ELM    MS-ELM    MKELM     ILM-ELM
TrainTime   0.0081    0.0234    0.0203    27.4599   2.6208    0.2518
TrainRMSE   49.2248   49.1206   49.2144   46.5029   40.2193   48.9098
TrainSTD    0.9127    0.8479    0.9520    0.6728    -         0.7353
TestTime    0.0037    0.0006    0.0025    0.0034    0.0624    0.0031
TestRMSE    50.4664   50.4504   50.8610   49.8155   49.1533   49.8530
TestSTD     1.2157    1.1229    1.2451    1.1628    -         1.1597

Fig. 5. The predicted and actual time series of Dalian temperature and the prediction errors.

Fig. 6. The predicted and actual time series of Dalian rainfall and the prediction errors.

Fig. 7. The relationship between validation RMSEs and the number of hidden nodes (runoff of the Yellow River).

Table 5. Comparison of prediction results (yearly runoff of the Yellow River).

Methods     ELM       LM-ELM    OS-ELM    MS-ELM    MKELM     ILM-ELM
TrainTime   0.0175    0.0699    0.0321    9.3482    1.3884    0.2125
TrainRMSE   38.8204   38.4696   56.4881   41.1191   39.3834   41.7314
TrainSTD    1.5755    1.1766    4.0598    1.9406    -         1.5709
TestTime    0.0075    0.0034    0.0031    0.0012    0.0624    0.0037
TestRMSE    76.4108   75.6824   77.1470   75.2117   77.8863   75.3637
TestSTD     9.1037    9.2894    9.4222    10.2716   -         8.4686


Fig. 5 shows the single-step-ahead prediction results for the monthly temperature time series of Dalian. From Fig. 5 it can be seen that the predicted curve fits the actual curve very well. Fig. 6 shows the single-step-ahead prediction results for the monthly rainfall time series of Dalian. It can be observed from Fig. 6 that some large errors occur where the time series changes dramatically; in general, however, the predicted curve follows the trend of the actual curve.

5.3. The yearly runoff of the Yellow River and the yearly mean sunspot number

In this example, the proposed ILM-ELM is used to predict the multivariate series consisting of the yearly mean sunspot number and the yearly natural runoff of the Yellow River. The yearly runoff time series of the Yellow River is measured at the Sanmenxia gauge station over roughly 304 years, from 1700 to 2003. The yearly mean sunspot number series and the yearly runoff series of the Yellow River are used together to predict the runoff of the Yellow River in the next year. The time delays and embedding dimensions are set as $\tau_1 = \tau_2 = 1$ and $d_1 = d_2 = 12$, respectively. After the phase-space reconstruction, 292 data samples are obtained. The first 250 samples are used for training the model and the remaining 42 samples are used to test the prediction performance. Among the 250 training samples, the first 180 are used to train the model, and the remaining 70 serve as the validation set for optimizing the free parameters.

The RMSEs measured on the validation set versus the number of hidden nodes of ELM are shown in Fig. 7. As a result, the number of hidden nodes of ELM, LM-ELM, OS-ELM and ILM-ELM is set to 55. The hidden node number of MS-ELM is determined by the model selection algorithm, and the regularization coefficient of MKELM is optimized as 2^{-2}.

The simulation results are shown in Table 5. From Table 5 it can be seen that ELM attains the smallest TrainTime, LM-ELM the best TrainRMSE and TrainSTD, MS-ELM the best TestTime and TestRMSE, and the proposed ILM-ELM the smallest TestSTD. Considering only the online sequential learning methods, the proposed ILM-ELM is superior to OS-ELM on all measures except TrainTime. Among all the methods, the proposed ILM-ELM attains the second-best TrainSTD and TestRMSE. Fig. 8 shows the prediction results for the yearly runoff time series of the Yellow River; the predicted curve visibly follows the trend of the actual curve.

Fig. 8. The predicted and observed curves of the annual runoff time series of the Yellow River and their errors.

From Tables 3–5 it can be seen that the TrainRMSE of ILM-ELM is larger than those of ELM and LM-ELM, but the TestRMSE of ILM-ELM is consistently smaller. The reason is that ELM and LM-ELM are offline or batch learning methods, which use the whole training set to tune the output weights, whereas ILM-ELM is an online sequential method that uses the latest few samples to update the output weights. Owing to this online sequential learning essence, ILM-ELM can capture the latest dynamic information of the multivariate time series, and it is therefore well suited to real-time applications.

    6. Conclusion

In order to solve the online sequential prediction problem for multivariate time series, an extreme learning machine prediction model based on an improved online sequential LM algorithm has been proposed. The multivariate time series is first phase-space reconstructed. Then the extreme learning machine, a recently developed feedforward neural network, is used to model the input–output mapping in the phase-space. The improved online sequential LM algorithm, which extends the traditional LM algorithm to the online sequential framework and avoids the cost of forming the full Hessian matrix, is used to train the extreme learning machine. The effectiveness of the proposed method is verified in prediction simulations on artificial and real-world multivariate time series. The results show that the proposed method achieves almost the same performance as offline learning methods and much better performance than the online sequential method OS-ELM. Thus, the proposed method provides an effective way to predict multivariate time series in practical and online sequential applications. The type and the number of hidden nodes of ELM have important effects on the prediction performance; future work is underway to evaluate these effects.

    References

Bai, Y.M., Li, T.S., 2012. Robust fuzzy inference system for prediction of time series with outliers. In: 2012 International Conference on Fuzzy Theory and Its Applications (iFUZZY). IEEE, pp. 394–399.

Balasundaram, S., 2013. On extreme learning machine for epsilon-insensitive regression in the primal by Newton method. Neural Comput. Appl. 22, 559–567.

von Bünau, P., Meinecke, F.C., Király, F.C., Müller, K.-R., 2009. Finding stationary subspaces in multivariate time series. Phys. Rev. Lett. 103, 214101.

Cao, L.Y., Mees, A., Judd, K., 1998. Dynamics from multivariate time series. Phys. D: Nonlinear Phenom. 121, 75–88.

Chakraborty, K., Mehrotra, K., Mohan, C.K., Ranka, S., 1992. Forecasting the behavior of multivariate time series using neural networks. Neural Netw. 5, 961–970.

De Gooijer, J.G., Hyndman, R.J., 2006. 25 years of time series forecasting. Int. J. Forecast. 22, 443–473.

Deng, W.Y., Zheng, Q.H., Chen, L., 2009. Regularized extreme learning machine. In: IEEE Symposium on Computational Intelligence and Data Mining (CIDM '09). IEEE, pp. 389–395.

Huang, G.B., 2014. An insight into extreme learning machines: random neurons, random features and kernels. Cogn. Comput. 6, 376–390.

Huang, G.B., Chen, L., Siew, C.-K., 2006a. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17, 879–892.

Huang, G.B., Wang, D.H., Lan, Y., 2011. Extreme learning machines: a survey. Int. J. Mach. Learn. Cybern. 2, 107–122.

Huang, G.B., Zhou, H.M., Ding, X.J., Zhang, R., 2012. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 42, 513–529.

Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006b. Extreme learning machine: theory and applications. Neurocomputing 70, 489–501.

Huynh, H.T., Won, Y., 2011. Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks. Pattern Recognit. Lett. 32, 1930–1935.

Jaeger, H., Haas, H., 2004. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80.

Lan, Y., Soh, Y.C., Huang, G.-B., 2009. Ensemble of online sequential extreme learning machine. Neurocomputing 72, 3391–3395.

Li, D.C., Han, M., Wang, J., 2012. Chaotic time series prediction based on a novel robust echo state network. IEEE Trans. Neural Netw. Learn. Syst. 23, 787–799.


Lian, C., Zeng, Z.G., Yao, W., Tang, H.M., 2013. Ensemble of extreme learning machine for landslide displacement prediction based on time series analysis. Neural Comput. Appl. 24, 99–107.

Liang, N.Y., Huang, G.B., Saratchandran, P., Sundararajan, N., 2006. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 17, 1411–1423.

Lim, J.-S., Lee, S., Pang, H.-S., 2013. Low complexity adaptive forgetting factor for online sequential extreme learning machine (OS-ELM) for application to nonstationary system estimations. Neural Comput. Appl. 22, 569–576.

Luo, M.X., Zhang, K., 2014. A hybrid approach combining extreme learning machine and sparse representation for image classification. Eng. Appl. Artif. Intell. 27, 228–235.

Man, Z.H., Lee, K., Wang, D.H., Cao, Z.W., Khoo, S., 2012. Robust single-hidden layer feedforward network-based pattern classifier. IEEE Trans. Neural Netw. Learn. Syst. 23, 1974–1986.

Miche, Y., Sorjamaa, A., Bas, P., Simula, O., Jutten, C., Lendasse, A., 2010. OP-ELM: optimally pruned extreme learning machine. IEEE Trans. Neural Netw. 21, 158–162.

Niska, H., Hiltunen, T., Karppinen, A., Ruuskanen, J., Kolehmainen, M., 2004. Evolving the neural network model for forecasting air pollution time series. Eng. Appl. Artif. Intell. 17, 159–167.

Nizar, A.H., Dong, Z.Y., Wang, Y., 2008. Power utility nontechnical loss analysis with extreme learning machine method. IEEE Trans. Power Syst. 23, 946–955.

Pino, R., Parreno, J., Gomez, A., Priore, P., 2008. Forecasting next-day price of electricity in the Spanish energy market using artificial neural networks. Eng. Appl. Artif. Intell. 21, 53–62.

Popescu, F., 2011. Robust statistics for describing causality in multivariate time series. J. Mach. Learn. Res. 12, 30–64.

Rong, H.J., Huang, G.B., Sundararajan, N., Saratchandran, P., 2009. Online sequential fuzzy extreme learning machine for function approximation and classification problems. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39, 1067–1072.

Sapankevych, N., Sankar, R., 2009. Time series prediction using support vector machines: a survey. IEEE Comput. Intell. Mag. 4, 24–38.

Shi, Z.W., Han, M., 2007. Support vector echo-state machine for chaotic time-series prediction. IEEE Trans. Neural Netw. 18, 359–372.

Soria-Olivas, E., Gomez-Sanchis, J., Martin, J., Vila-Frances, J., Martinez, M., Magdalena, J., Serrano, A., 2011. BELM: Bayesian extreme learning machine. IEEE Trans. Neural Netw. 22, 505–509.

Sun, Y., Li, J., Liu, J., Chow, C., Sun, B., Wang, R., 2014. Using causal discovery for feature selection in multivariate numerical time series. Mach. Learn., 1–19.

Takens, F., 1981. Detecting strange attractors in turbulence. In: Dynamical Systems and Turbulence, Warwick 1980. Springer, pp. 366–381.

Wang, X.Y., Han, M., 2012. Multivariate chaotic time series prediction based on extreme learning machine. Acta Phys. Sin. 61, 97–105.

Wang, X.Y., Han, M., 2014. Multivariate time series prediction based on multiple kernel extreme learning machine. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 198–201.

Wang, X.Y., Han, M., 2014a. Online sequential extreme learning machine with kernels for nonstationary time series prediction. Neurocomputing 145, 90–97.

Wilamowski, B.M., Yu, H., 2010a. Improved computation for Levenberg–Marquardt training. IEEE Trans. Neural Netw. 21, 930–937.

Wilamowski, B.M., Yu, H., 2010b. Neural network learning without backpropagation. IEEE Trans. Neural Netw. 21, 1793–1803.

Ye, Y., Squartini, S., Piazza, F., 2013. Online sequential extreme learning machine in nonstationary environments. Neurocomputing 116, 94–101.

Zemouri, R., Racoceanu, D., Zerhouni, N., 2003. Recurrent radial basis function network for time-series prediction. Eng. Appl. Artif. Intell. 16, 453–463.

Zhao, J.W., Wang, Z.H., Park, D.S., 2012. Online sequential extreme learning machine with forgetting mechanism. Neurocomputing 87, 79–89.

Zhao, P., Xing, L., Yu, J., 2009. Chaotic time series prediction: from one to another. Phys. Lett. A 373, 2174–2177.

