

Improved extreme learning machine for multivariate time series online sequential prediction

Xinying Wang, Min Han (corresponding author)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China

This work was supported by the National Natural Science Foundation of China (61374154) and the National Basic Research Program of China (973 Program) (2013CB430403).


Article history: Received 11 July 2014; Received in revised form 22 December 2014; Accepted 22 December 2014

Keywords: Online prediction; Multivariate time series; Extreme learning machine; LM algorithm

Abstract

Multivariate time series has attracted increasing attention due to the rich dynamic information it carries about the underlying systems. This paper presents an improved extreme learning machine for online sequential prediction of multivariate time series. The multivariate time series is first phase-space reconstructed to form the input and output samples. An extreme learning machine, which has a simple structure and good performance, is used as the prediction model. On the basis of the specific network function of the extreme learning machine, an improved Levenberg–Marquardt algorithm, in which the Hessian matrix and gradient vector are calculated iteratively, is developed to implement online sequential prediction. Finally, simulation results on artificial and real-world multivariate time series are provided to substantiate the effectiveness of the proposed method.

© 2014 Elsevier Ltd. All rights reserved.

    1. Introduction

Time series prediction, which abounds in both scientific research and engineering applications, has attracted increasing attention for years (De Gooijer and Hyndman, 2006). Due to the complexity of the underlying systems, nonlinear or chaotic time series prediction has aroused growing concern (Zhao et al., 2009; Li et al., 2012). Furthermore, the time series observed from complex systems generally comprises multiple variables, and a multivariate time series contains more dynamic information about the underlying dynamic system than a univariate one (Cao et al., 1998; Chakraborty et al., 1992). As a consequence, multivariate time series prediction has become an increasingly important research direction (Popescu, 2011; von Bünau et al., 2009).

In the existing literature, support vector machines (Sapankevych and Sankar, 2009), neural networks (Shi and Han, 2007; Pino et al., 2008), and other machine learning methods (Bai and Li, 2012) have been investigated for time series prediction. Moreover, according to Takens' embedding theorem (Takens, 1981), a time series can be reconstructed into phase-space by a time-delayed coordinate map, which translates temporal correlation into spatial correlation in phase-space. Because of their universal approximation capability and distributed computing characteristics, neural networks have become one of the most influential prediction tools (Jaeger and Haas, 2004; Zemouri et al., 2003; Niska et al., 2004).

However, the traditional gradient-based learning algorithms for neural networks converge slowly and are easily trapped in local optima, which constrains the prediction performance of neural networks. To deal with these shortcomings, the extreme learning machine (ELM) has been developed (Huang et al., 2006b). In contrast to other neural networks with random weights (Huang, 2014), the input weights and the biases of the hidden nodes of ELM are generated randomly before learning, and optimal output weights can be obtained by a one-shot algorithm. Owing to its simple structure, fast learning speed and good generalization performance, ELM has been successfully applied to function approximation (Rong et al., 2009), time series prediction (Nizar et al., 2008; Lian et al., 2013), pattern classification (Man et al., 2012; Miche et al., 2010; Luo and Zhang, 2014) and other fields (Soria-Olivas et al., 2011; Ye et al., 2013). Although ELM has greatly improved neural network training speed and accuracy, there are still some shortcomings (Huang et al., 2011).

The ridge regression algorithm has been introduced to improve the stability and generalization performance of ELM (Deng et al., 2009; Huang et al., 2012), and a second-order Newton optimization algorithm has been applied to ELM training (Balasundaram, 2013). Besides, online learning variants of ELM have been proposed to satisfy real-time and online learning requirements. Online sequential ELM (OS-ELM) (Liang et al., 2006) provides a sequential implementation of the least squares solution of ELM. Successively, the ensemble of online sequential extreme learning machines (EOS-ELM) (Lan et al., 2009), the online sequential extreme learning machine with forgetting mechanism (FOS-ELM) (Zhao et al., 2012), the regularized online sequential learning algorithm (ReOS-ELM) (Huynh and Won, 2011), the low complexity adaptive forgetting factor OS-ELM (LAFF-COS-ELM) (Lim et al., 2013), online sequential ELM-TV (Ye et al., 2013) and the online sequential extreme learning machine with kernels (OS-ELMK) (Wang and Han, 2014a) have been proposed, and their superior performance has been verified.

Considering the advantages of ELM, some variants of ELM have been used to predict multivariate time series. In Wang and Han (2012), a model selection algorithm is applied to determine the optimal structure of ELM, and the resulting model is used to predict multivariate chaotic time series. In Wang and Han (2014), different kernels are used together to map the multivariate time series, yielding the multiple kernel extreme learning machine (MKELM). However, both of these methods belong to the batch or offline prediction framework. In order to solve the problem of online sequential prediction of multivariate time series, an improved ELM prediction model is presented in this paper. The multivariate time series is first reconstructed into the phase-space, where ELM is used to approximate the input–output mapping. An improved Levenberg–Marquardt (LM) algorithm is developed to optimize the output weights of the ELM prediction model online and sequentially. When new samples are observed, the Hessian matrix and the gradient vector are updated iteratively, and the corresponding output weights are tuned immediately. As a result, the ELM can learn from the latest observed time series in real time. The paper is organized as follows. In the second section, the problem definitions are given. Some preliminary work is briefly reviewed in the third section. Next, an improved online sequential LM algorithm is presented and incorporated into the ELM prediction model. Finally, three experiments on artificial and real-world multivariate time series are conducted to illustrate the effectiveness of the proposed method compared with other existing approaches.

2. Problem definitions

The variables and notations are defined in Table 1.

    3. Preliminaries

In this section, we give a brief review of multivariate time series reconstruction and the ELM model.

    3.1. Multivariate time series reconstruction

A time series is a sequence of value points measured at successive times, typically spaced at uniform time intervals. Time series prediction is the use of a model to predict future values based on previously observed values. In order to establish a prediction model for time series data generated by a nonlinear dynamic system, a time-delayed phase-space reconstruction is generally used as preprocessing. According to Takens' embedding theorem (Takens, 1981), provided enough delayed coordinates are used, a scalar time series is sufficient to reconstruct the dynamics of the underlying system. However, it is not certain whether a given scalar time series is sufficient to reconstruct the dynamics (Popescu, 2011). Additionally, a multivariate time series contains more dynamic information than a scalar time series, so using the available multivariate time series can improve prediction performance (Chakraborty et al., 1992).

Consider an M-dimensional time series $X_1, X_2, \ldots, X_N$, where $X_i = (x_{1,i}, x_{2,i}, \ldots, x_{M,i})$, $i = 1, 2, \ldots, N$. As in the case of a scalar time series (where $M = 1$), a time-delayed reconstruction can be made as follows:

$$V_n = \big[x_{1,n},\, x_{1,n-\tau_1},\, \ldots,\, x_{1,n-(d_1-1)\tau_1};\; x_{2,n},\, x_{2,n-\tau_2},\, \ldots,\, x_{2,n-(d_2-1)\tau_2};\; \ldots;\; x_{M,n},\, x_{M,n-\tau_M},\, \ldots,\, x_{M,n-(d_M-1)\tau_M}\big]^T \qquad (1)$$

where $\tau_i$ and $d_i$, $i = 1, \ldots, M$, are the time delays and the embedding dimensions, respectively. By Takens' embedding theorem (Takens, 1981), if $d$ or each $d_i$ is large enough $\left(d = \sum_{i=1}^{M} d_i\right)$, there generally exists a function $F : \mathbb{R}^d \to \mathbb{R}^d$ such that

$$V_{n+1} = F(V_n) \qquad (2)$$

The equivalent form of (2) can be written as

$$x_{1,n+1} = F_1(V_n),\quad x_{2,n+1} = F_2(V_n),\quad \ldots,\quad x_{M,n+1} = F_M(V_n) \qquad (3)$$

The remaining problems are how to choose the time delays $\tau_i$ and embedding dimensions $d_i$, $i = 1, \ldots, M$, so that (2) or (3) holds. There are several methods for choosing the time delay for a scalar time series, such as mutual information and autocorrelation (Sun et al., 2014).
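To make (1) concrete, a minimal NumPy sketch of the multivariate time-delayed reconstruction is given below; the function name and the column-per-variable array layout are our own choices, not from the paper.

```python
import numpy as np

def reconstruct(series, delays, dims):
    """Time-delayed phase-space reconstruction of an M-variate series, cf. Eq. (1).

    series : (N, M) array, one column per variable
    delays : list of M time delays tau_i
    dims   : list of M embedding dimensions d_i
    Returns the matrix of vectors V_n and the matching time indices n.
    """
    N, M = series.shape
    # first index n for which all delayed coordinates exist
    start = max((dims[i] - 1) * delays[i] for i in range(M))
    rows = []
    for n in range(start, N):
        coords = []
        for i in range(M):
            coords.extend(series[n - j * delays[i], i] for j in range(dims[i]))
        rows.append(coords)
    return np.asarray(rows), np.arange(start, N)

# Example: trivariate series with tau_i = 1, d_i = 10 (the Lorenz settings of Section 5.1)
X = np.random.randn(2501, 3)                 # placeholder data
V, idx = reconstruct(X, delays=[1, 1, 1], dims=[10, 10, 10])
inputs = V[:-1]                              # reconstructed vectors V_n
targets = X[idx[:-1] + 1, 0]                 # one-step-ahead targets x_{1,n+1}
```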

    3.2. Extreme learning machine prediction model

ELM has a simple three-layer structure: an input layer, an output layer, and a hidden layer which contains a large number of nonlinear processing nodes. The weights connecting the input layer to the hidden layer and the bias values within the hidden layer of ELM are randomly generated and kept fixed throughout the learning process; only the output weights need to be learned. ELM has both interpolation capability and universal approximation capability (Huang et al., 2006a), so it is a promising time series prediction tool.

Mathematically, ELM can be formulated as the following function:

$$\sum_{i=1}^{L} w_i\, g(W_{in_i}, b_i, x_j) = \sum_{i=1}^{L} w_i\, g(W_{in_i} \cdot x_j + b_i) = y_j, \quad j = 1, \ldots, N \qquad (4)$$

where $x_j \in \mathbb{R}^n$ is the input vector, $W_{in_i} \in \mathbb{R}^n$ is the weight vector connecting the input nodes to the $i$th hidden node, $W_{in_i} \cdot x_j$ denotes the inner product of $W_{in_i}$ and $x_j$, $b_i \in \mathbb{R}$ is the bias of the $i$th hidden node, $g$ is the sigmoid activation function, $w_i \in \mathbb{R}$ is the output weight connecting the $i$th hidden node to the output node, $y_j \in \mathbb{R}$ is the output of ELM, $L$ is the number of hidden nodes, and $N$ is the number of training samples. In the ELM learning framework, $W_{in_i}$ and $b_i$ are randomly chosen beforehand.

Function (4) can be further expressed in the following matrix–vector form:

$$Aw = Y \qquad (5)$$

where

$$A = \begin{pmatrix} g(W_{in_1} \cdot x_1 + b_1) & \cdots & g(W_{in_L} \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(W_{in_1} \cdot x_N + b_1) & \cdots & g(W_{in_L} \cdot x_N + b_L) \end{pmatrix}_{N \times L},$$

$Y = [y_1, \ldots, y_N]^T$, and $w = [w_1, w_2, \ldots, w_L]^T$. The matrix $A$ is called the hidden layer output matrix of ELM in Huang et al. (2011); the $i$th column of $A$ is the $i$th hidden node's output vector with respect to the inputs $x_1, x_2, \ldots, x_N$, and the $j$th row of $A$, denoted $a_j$, is the output vector of the hidden layer with respect to the input $x_j$.

If the ELM model with $L$ hidden nodes can learn these $N$ training samples with no residuals, there exists $w$ such that

$$\sum_{i=1}^{L} w_i\, g(W_{in_i} \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N \qquad (6)$$


where $t_j$ is the target value. Eq. (6) can be written compactly in matrix–vector form as

$$Aw = T \qquad (7)$$

where $T = [t_1, \ldots, t_N]^T$ is the target vector. As the input weights and the hidden layer biases have been randomly chosen at the beginning of learning, (7) becomes a linear parameter system, and the smallest-norm least squares solution of this system can be written as

$$w = A^{\dagger} T \qquad (8)$$

where $A^{\dagger}$ is the Moore–Penrose generalized inverse (Huang et al., 2011) of the hidden layer output matrix $A$. In practice, the Moore–Penrose generalized inverse of the hidden layer output matrix is calculated using the singular value decomposition (SVD), and the non-zero singular values are then used to construct the output weights. However, when the numbers of training samples and hidden nodes are large, the computational complexity of the SVD seriously restricts the learning speed of ELM.
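For reference, a minimal batch ELM following (4)–(8) can be written as below; the class name and interface are illustrative, not from the paper.

```python
import numpy as np

class BatchELM:
    """Basic ELM: random input weights/biases, least-squares output weights (Eq. (8))."""

    def __init__(self, n_in, n_hidden, seed=None):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in))  # random input weights
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)             # random hidden biases

    def hidden(self, X):
        # Hidden layer output matrix A (N x L), sigmoid activation g
        return 1.0 / (1.0 + np.exp(-(X @ self.W_in.T + self.b)))

    def fit(self, X, T):
        A = self.hidden(X)
        self.w = np.linalg.pinv(A) @ T   # Moore-Penrose pseudoinverse, computed via SVD
        return self

    def predict(self, X):
        return self.hidden(X) @ self.w
```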

4. Multivariate time series online sequential prediction based on improved ELM

The goal of ELM training is to determine the output weights that minimize the following cost function:

$$f(w) = \sum_{j=1}^{N} \left( \sum_{i=1}^{L} w_i\, g(W_{in_i} \cdot x_j + b_i) - t_j \right)^{2} \qquad (9)$$

Therefore, the training of the ELM can be transformed into an optimization problem. In this section, an improved online sequential LM algorithm is proposed to solve this optimization problem. Afterwards, based on the improved online sequential LM algorithm, the multivariate time series online sequential prediction model is proposed.

Table 1. Problem definitions.

X_i = (x_{1,i}, x_{2,i}, ..., x_{M,i}) : the M-dimensional time series
V_n : the time-delayed reconstruction vector
τ_i : the time delay of X_i
d_i : the embedding dimension of X_i
x_j ∈ R^n : the jth input vector with n features
W_in_i ∈ R^n : the weight vector connecting the input nodes to the ith hidden node
b_i ∈ R : the bias of the ith hidden node
g : the sigmoid activation function
w_i ∈ R : the output weight connecting the ith hidden node to the output node
y_j ∈ R : the output of ELM
t_j ∈ R : the target value
L : the number of hidden nodes
N : the number of samples, or the length of the time series
A : the hidden layer output matrix of ELM
w = [w_1, ..., w_L]^T : the output weight vector of ELM
Y = [y_1, ..., y_N]^T : the output vector of ELM
T = [t_1, ..., t_N]^T : the target vector
A† : the Moore–Penrose generalized inverse of A
H : the Hessian matrix of ELM
J : the Jacobian matrix of ELM
I : the unit matrix
Q : the quasi-Hessian matrix of ELM
g : the gradient vector of ELM
e : the error vector of ELM
a_k : the kth row of A
q_k : the sub-Hessian matrix of ELM
c_k : the sub-gradient vector of ELM
f(w) : the cost function with respect to w
E(w) : the sum squared error of ELM with respect to w
μ : the control parameter of the learning step of ILM
λ : the regularization coefficient
β : the update parameter
bs : the chunk size of the data

ELM : extreme learning machine
LM : Levenberg–Marquardt
ILM : improved Levenberg–Marquardt
LM-ELM : ELM trained by the LM algorithm
ILM-ELM : ELM trained by the ILM algorithm
OS-ELM : online sequential ELM
MS-ELM : ELM with a model selection algorithm
MKELM : multiple kernel extreme learning machine
RMSE : root mean squared error
TrainTime : the CPU time used to train a prediction model
TrainRMSE : the RMSE measured on the training time series
TrainSTD : the standard deviation of TrainRMSE
TestTime : the CPU time used to test a prediction model
TestRMSE : the RMSE measured on the testing time series
TestSTD : the standard deviation of TestRMSE



    4.1. Improved online sequential LM algorithm for ELM

In this section, an improved LM algorithm (Wilamowski and Yu, 2010b, 2010a) is extended to an online sequential framework for ELM.

The basic ELM aims to minimize the sum squared error (SSE) between the network output and the desired output:

$$E(w) = \frac{1}{2}\sum_{i=1}^{N} e_i^2 = \frac{1}{2}\sum_{i=1}^{N} (y_i - t_i)^2 \qquad (10)$$

The constant $\frac{1}{2}$ is added in order to simplify the calculation. The corresponding Hessian matrix of (10) is

$$H = \begin{bmatrix} \dfrac{\partial^2 E}{\partial w_1^2} & \dfrac{\partial^2 E}{\partial w_1 \partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_1 \partial w_L} \\ \dfrac{\partial^2 E}{\partial w_2 \partial w_1} & \dfrac{\partial^2 E}{\partial w_2^2} & \cdots & \dfrac{\partial^2 E}{\partial w_2 \partial w_L} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 E}{\partial w_L \partial w_1} & \dfrac{\partial^2 E}{\partial w_L \partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_L^2} \end{bmatrix} \qquad (11)$$

According to (10) and (11), the elements of $H$ can be written as

$$\frac{\partial^2 E}{\partial w_i \partial w_j} = \sum_{k=1}^{N} \left( \frac{\partial e_k}{\partial w_i} \frac{\partial e_k}{\partial w_j} + \frac{\partial^2 e_k}{\partial w_i \partial w_j}\, e_k \right) \qquad (12)$$

In the LM algorithm, (12) is approximated as follows:

$$\frac{\partial^2 E}{\partial w_i \partial w_j} \approx \sum_{k=1}^{N} \frac{\partial e_k}{\partial w_i} \frac{\partial e_k}{\partial w_j} \qquad (13)$$

The corresponding Jacobian matrix of (10) can be written as

$$J = \begin{bmatrix} \dfrac{\partial e_1}{\partial w_1} & \dfrac{\partial e_1}{\partial w_2} & \cdots & \dfrac{\partial e_1}{\partial w_L} \\ \dfrac{\partial e_2}{\partial w_1} & \dfrac{\partial e_2}{\partial w_2} & \cdots & \dfrac{\partial e_2}{\partial w_L} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial e_N}{\partial w_1} & \dfrac{\partial e_N}{\partial w_2} & \cdots & \dfrac{\partial e_N}{\partial w_L} \end{bmatrix} \qquad (14)$$

The gradient vector of (10) is

$$g = \left[ \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_L} \right]^T \qquad (15)$$

Substituting (10) into (15), we obtain the specific formula for its elements:

$$g_i = \frac{\partial E}{\partial w_i} = \sum_{k=1}^{N} \frac{\partial e_k}{\partial w_i}\, e_k \qquad (16)$$

With (14), we have

$$g = J^T e \qquad (17)$$

where $e = [e_1, e_2, \ldots, e_N]^T$ is the error vector.

The weight update equations of the LM algorithm can be written as

$$w \leftarrow w - \Delta w \qquad (18)$$

$$\Delta w = \left( J^T J + \mu I \right)^{-1} J^T e \qquad (19)$$

where $\mu$ is the control parameter of the learning step, $w$ is the output weight vector to be optimized, $I$ is the unit matrix, $\lambda$ is the regularization coefficient, and $e$ is the error vector.
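As a point of reference, one LM step per (18)–(19) can be coded directly from a Jacobian and an error vector. This is a minimal sketch of ours, not the paper's code; the sign convention follows the descent direction for minimizing (10):

```python
import numpy as np

def lm_step(w, J, e, mu):
    """One Levenberg-Marquardt update, Eqs. (18)-(19): solve (J^T J + mu*I) dw = J^T e."""
    dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
    return w - dw
```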

One disadvantage of the traditional LM algorithm is the need to compute and store the Jacobian matrix, whose size grows with the number of samples; moreover, the traditional LM algorithm is essentially an offline or batch optimization algorithm. Taking the ELM network equation (4) into account, the following improvement is made in order to overcome these defects and extend the LM algorithm to an online sequential variant.

Considering (4), the elements of (14) can be written as

$$\frac{\partial e_k}{\partial w_i} = \frac{\partial (y_k - t_k)}{\partial w_i} = \frac{\partial \left( \sum_{i=1}^{L} w_i\, g(W_{in_i} \cdot x_k + b_i) - t_k \right)}{\partial w_i} = g(W_{in_i} \cdot x_k + b_i), \quad i = 1, \ldots, L \qquad (20)$$

Due to the specific network function of ELM, the Jacobian matrix $J$ is just the hidden layer output matrix $A$. The $k$th row of the hidden layer output matrix $A$, denoted $a_k$, can be written as follows:

$$a_k = \left[ \frac{\partial e_k}{\partial w_1}, \frac{\partial e_k}{\partial w_2}, \ldots, \frac{\partial e_k}{\partial w_L} \right] \qquad (21)$$

The matrix $J^T J$ can be written as

$$J^T J = A^T A = \begin{bmatrix} a_1^T & \cdots & a_N^T \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_N \end{bmatrix} = \sum_{k=1}^{N} a_k^T a_k \qquad (22)$$

In order to simplify the analysis, the following sub-Hessian matrix is defined:

$$q_k = a_k^T a_k \qquad (23)$$

As a result, the matrix $J^T J$ is defined as a quasi-Hessian matrix $Q$:

$$Q = \sum_{k=1}^{N} q_k \qquad (24)$$

The above analysis shows that the matrix $q_k$ can be computed using only $a_k$; that is, the corresponding sub-matrix $q_k$ can be calculated for each training sample $k$, and the matrix $Q$ can be obtained by accumulating these sub-matrices.

When the samples arrive online and sequentially, the sub-matrix $q_k$ can be computed immediately by (23), and the matrix $Q$ is obtained by accumulating these sub-matrices. Therefore, only $L$ elements of storage (the current row $a_k$) are needed to calculate the matrix $Q$; there is no need to store the Jacobian matrix with its $N \times L$ elements. Additionally, it can be seen from (23) that the matrix $q_k$ is symmetric, so only its lower or upper triangular part needs to be stored, further reducing the computation of the quasi-Hessian matrix.
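A small NumPy check (ours, not from the paper) makes (22)–(24) concrete: accumulating the rank-one sub-Hessians q_k = a_k^T a_k row by row reproduces J^T J exactly, without ever materializing the N × L Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 1000, 60
A = rng.standard_normal((N, L))        # hidden layer output matrix = Jacobian J

# Offline: form J explicitly and compute J^T J
Q_batch = A.T @ A

# Online: accumulate sub-Hessians q_k = a_k^T a_k one row at a time, Eqs. (23)-(24)
Q_stream = np.zeros((L, L))
for a_k in A:                           # only the current L-element row is needed
    Q_stream += np.outer(a_k, a_k)

print(np.allclose(Q_batch, Q_stream))  # True
```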

Taking definition (21) into account, we introduce the following intermediate vector:

$$c_k = a_k^T e_k = \begin{bmatrix} \dfrac{\partial e_k}{\partial w_1} \\ \dfrac{\partial e_k}{\partial w_2} \\ \vdots \\ \dfrac{\partial e_k}{\partial w_L} \end{bmatrix} e_k = \begin{bmatrix} \dfrac{\partial e_k}{\partial w_1}\, e_k \\ \dfrac{\partial e_k}{\partial w_2}\, e_k \\ \vdots \\ \dfrac{\partial e_k}{\partial w_L}\, e_k \end{bmatrix} \qquad (25)$$

With (16) and (25), the gradient vector can be calculated as follows:

$$g = \sum_{k=1}^{N} c_k \qquad (26)$$

Similarly, the sub-gradient vector $c_k$ is calculated for each training sample, and the gradient vector is computed by accumulating these sub-gradient vectors. As $a_k$ is the $k$th row of the hidden layer output matrix $A$, only the scalar $e_k$ needs to be stored in addition.

Considering the specific form of the hidden layer output matrix $A$, when a data sample is observed, the vector $a_k$ can be calculated immediately. As $bs$ data samples are received, the quasi-Hessian matrix $Q$ and the gradient vector $g$ can be estimated by accumulating the sub-Hessian matrices $q_k$ and the sub-gradient vectors $c_k$, respectively. The output weights can then be updated using (18) and (19). When the next $bs$ data samples are received, the matrix $Q$ and the gradient vector $g$ are updated by accumulating the newly computed sub-matrices and sub-gradient vectors, and the output weights are updated further. As a result, the LM algorithm is extended to the online sequential framework.

    4.2. Prediction model based on improved ELM

Based on the above analysis, once appropriate parameters for the multivariate time series phase-space reconstruction have been selected and appropriate input–output sample pairs have been formed, the ELM prediction model based on the improved online sequential LM algorithm can be summarized as follows:

Algorithm 1. ELM based on the improved online sequential LM algorithm.

Initialization:
  Q ← 0, g ← 0, E(w) ← 0
  Randomly generate W_in, b and w

Iteration (when bs samples are observed):
  for k = 1 : bs do
    Calculate the network output y_k by (4)
    Calculate the network error e_k
    Accumulate the error: E(w) ← E(w) + (1/2) e_k^2
    Calculate a_k by (21)
    Calculate q_k by (23) and c_k by (25)
    Q ← Q + q_k
    g ← g + c_k
  end for
  while true do
    Calculate Δw by (19)
    Update the weights w using (18)
    Re-calculate E(w)
    if E(w) is reduced then
      decrease μ by the update parameter β; break
    else
      increase μ by the update parameter β
    end if
  end while

where bs is the chunk size, λ is the regularization coefficient and β is the update parameter. When bs = 1, the improved LM algorithm updates the output weights every time a single sample is observed; when bs > 1, it updates the output weights whenever bs new samples have been observed.
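To make the procedure concrete, the following Python/NumPy sketch implements one reading of Algorithm 1. The class name, the rule of multiplying or dividing μ by β, and the fixed ridge term λI added when solving (19) are our assumptions rather than details stated verbatim in the paper.

```python
import numpy as np

class ILMELM:
    """Sketch of an ELM trained by an improved online sequential LM algorithm."""

    def __init__(self, n_in, n_hidden, mu=0.01, beta=10.0, lam=1e-6, seed=None):
        r = np.random.default_rng(seed)
        self.W_in = r.uniform(-1.0, 1.0, (n_hidden, n_in))  # fixed random input weights
        self.b = r.uniform(-1.0, 1.0, n_hidden)             # fixed random hidden biases
        self.w = r.uniform(-1.0, 1.0, n_hidden)             # output weights to be tuned
        self.Q = np.zeros((n_hidden, n_hidden))             # quasi-Hessian Q, Eq. (24)
        self.g = np.zeros(n_hidden)                         # gradient vector g, Eq. (26)
        self.mu, self.beta, self.lam = mu, beta, lam

    def _hidden(self, X):
        # Rows of the hidden layer output matrix A, cf. Eqs. (20)-(21)
        return 1.0 / (1.0 + np.exp(-(np.atleast_2d(X) @ self.W_in.T + self.b)))

    def partial_fit(self, X_chunk, t_chunk, max_tries=20):
        A = self._hidden(X_chunk)
        t = np.atleast_1d(t_chunk)
        e = A @ self.w - t                  # errors e_k = y_k - t_k
        self.Q += A.T @ A                   # accumulate sum of q_k = a_k^T a_k
        self.g += A.T @ e                   # accumulate sum of c_k = a_k^T e_k
        E_old = 0.5 * float(e @ e)
        I = np.eye(self.w.size)
        for _ in range(max_tries):          # inner LM loop, Eqs. (18)-(19)
            dw = np.linalg.solve(self.Q + (self.mu + self.lam) * I, self.g)
            w_try = self.w - dw
            e_try = A @ w_try - t
            if 0.5 * float(e_try @ e_try) < E_old:
                self.w = w_try              # SSE reduced: accept step, relax damping
                self.mu /= self.beta
                break
            self.mu *= self.beta            # SSE not reduced: damp harder, retry
        return self

    def predict(self, X):
        return self._hidden(X) @ self.w
```

With the experimental settings of Section 5, one would call `partial_fit` once per chunk of bs = 20 samples as they arrive.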

    5. Simulation results

Three artificial and real-world examples are presented to test the performance of the proposed method. To illustrate its effectiveness, the proposed method is compared with ELM (Huang et al., 2006b), OS-ELM (Liang et al., 2006), ELM with model selection (MS-ELM) (Wang and Han, 2012), the multiple kernel extreme learning machine (MKELM) (Wang and Han, 2014) and ELM trained by the traditional LM algorithm (LM-ELM). In this section, the multivariate time series online sequential prediction method based on the improved ELM is denoted ILM-ELM for short.

The parameter settings of ELM, OS-ELM, LM-ELM, MS-ELM and ILM-ELM are as follows. The activation functions are chosen as the sigmoid function. The input weights are randomly generated from the uniform interval [−1, 1]. The inputs are normalized into [−1, 1] and the outputs are normalized into [0, 1]. Both the initialization sample number and the chunk size are set to 20 for OS-ELM and ILM-ELM. The kernels used in MKELM are Gaussian kernels with kernel widths {0.01, 0.06, 0.11, 0.16, 0.2, 0.5, 1, 2, 5, 7, 10, 12, 15, 17, 20} and polynomial kernels with degree p ∈ {1, 2, 3}. As in Wilamowski and Yu (2010a), the parameters β and μ of LM-ELM and ILM-ELM are initialized to 10 and 0.01, respectively. As in Huang et al. (2006b), in order to reduce randomness and test the stability of these methods, 50 trials are conducted for each multivariate time series; the average results and standard deviations are reported in this section.

The root mean squared error (RMSE) is used to characterize the accuracy of prediction:

$$\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (t_i - y_i)^2 } \qquad (27)$$

where $t_i$ denotes the $i$th sample of the desired output, $y_i$ denotes the $i$th sample of the predicted output, and $N$ is the number of samples.
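In code, (27) is a one-line helper (the function name is ours):

```python
import numpy as np

def rmse(t, y):
    """Root mean squared error between targets t and predictions y, Eq. (27)."""
    t, y = np.asarray(t), np.asarray(y)
    return float(np.sqrt(np.mean((t - y) ** 2)))
```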

    5.1. Lorenz chaotic time series

The equations of the Lorenz chaotic system are as follows:

$$\begin{cases} \dot{x} = \sigma (y - x) \\ \dot{y} = \rho x - xz - y \\ \dot{z} = xy - \beta z \end{cases} \qquad (28)$$

When $\sigma = 10$, $\beta = 8/3$, $\rho = 28$ and $x(0) = y(0) = z(0) = 1.0$, the system exhibits chaotic behavior. The fourth-order Runge–Kutta method is used to generate the trivariate time series, and 2501 samples are obtained. In this simulation, the x(t), y(t) and z(t) series are used together to predict x(t). The time delays and embedding dimensions are set as $\tau_1 = \tau_2 = \tau_3 = 1$ and $d_1 = d_2 = d_3 = 10$, respectively; the resulting reconstructed vector is used as the input, and x(t) serves as the target output. The first 2000 reconstructed samples are used to train the models and the remaining samples are used as the testing set. Furthermore, 10% Gaussian white noise (the ratio of the noise standard deviation to the signal standard deviation) is added to the training x(t) series, while the testing time series remains noise-free.
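For reproduction purposes, the trivariate series can be generated as below; the integration step size dt is our assumption, as the paper does not report it.

```python
import numpy as np

def lorenz_rk4(n_samples=2501, dt=0.01, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    """Integrate the Lorenz system, Eq. (28), with the classical fourth-order Runge-Kutta method."""
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), rho * x - x * z - y, x * y - beta * z])

    s = np.array([1.0, 1.0, 1.0])          # x(0) = y(0) = z(0) = 1.0
    out = np.empty((n_samples, 3))
    out[0] = s
    for i in range(1, n_samples):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i] = s
    return out                             # columns: x(t), y(t), z(t)
```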

In order to determine the number of hidden nodes for ELM, LM-ELM, OS-ELM and ILM-ELM, the first 1500 of the 2000 training samples are used to train the ELM with hidden node numbers in {10, 15, 20, ..., 300}, and the RMSE is calculated on the remaining 500 training samples. The results are summarized in Fig. 1. According to Fig. 1, the number of hidden nodes of ELM, LM-ELM, OS-ELM and ILM-ELM is set to 60. Similarly, with 1500 samples used to train the model and 500 samples as the validation set, the hidden node number of MS-ELM is determined by a model selection algorithm, and the regularization coefficient of MKELM is set to 2^{-8}, selected from {2^{-10}, 2^{-9}, ..., 2^{5}}.

The one-step-ahead prediction results for the Lorenz x(t) time series are summarized in Table 2. The training times (TrainTime) of the online sequential learning methods, i.e. OS-ELM and ILM-ELM, are larger than those of the offline ELM and LM-ELM, but much smaller than those of MS-ELM and MKELM. The testing times (TestTime) of the compared methods, except for MKELM, are of the same order of magnitude. In terms of training RMSE (TrainRMSE), the proposed ILM-ELM performs comparably with ELM, LM-ELM, MS-ELM and MKELM, and better than OS-ELM. Considering the standard deviation of the training RMSE (TrainSTD), ILM-ELM matches LM-ELM, is better than MS-ELM, and is much better than OS-ELM. Since MKELM is a kernel-based method, the TrainSTD and TestSTD (the standard deviations of TrainRMSE and TestRMSE) of MKELM are omitted. In the TestRMSE and TestSTD measures, the proposed ILM-ELM performs almost as well as the offline ELM, LM-ELM, MS-ELM and MKELM, while outperforming the online OS-ELM.

Fig. 2 shows the one-step-ahead prediction curves and the error curve of the proposed ILM-ELM method. It can be seen from Fig. 2 that the predicted curve and the actual curve are nearly indistinguishable. Fig. 3 shows the multiple-step-ahead prediction results. From Fig. 3 it can be observed that, as the prediction step increases, the TestRMSEs of ELM, LM-ELM, OS-ELM, MS-ELM, MKELM and ILM-ELM grow quickly. In general, the multiple-step-ahead results of ILM-ELM are similar to those of ELM and LM-ELM. Compared with OS-ELM, which is an online sequential learning method but is sensitive to noise, the proposed ILM-ELM has an adaptive regularization parameter; thus its performance is better than that of OS-ELM.

    5.2. The monthly temperature and rainfall of Dalian

In order to further verify the validity of the proposed ILM-ELM for multivariate time series prediction, the monthly rainfall and temperature time series of Dalian are predicted in this simulation experiment. The monthly rainfall and temperature time series of Dalian, China, from 1951 to 2001, totaling 612 records, are used in this paper. Since these two series have an obvious period of 12 months, we choose the embedding parameters a priori by fixing the time delays $\tau_1 = \tau_2 = 1$ and the embedding dimensions $d_1 = d_2 = 12$. After reconstruction, 600 samples are obtained; the first 480 samples are used as the training set, and the remaining 120 samples are used to test the prediction performance. Within the training set, the first 360 samples are used to train the model, and the remaining 120 samples are used as a validation set to select the hyperparameters of the prediction models.

Fig. 1. The relationship between RMSE and the number of hidden nodes (Lorenz x(t)).

Table 2. Comparison of prediction results (Lorenz x(t)).

Methods     ELM      LM-ELM   OS-ELM   MS-ELM    MKELM     ILM-ELM
TrainTime   0.0265   0.0493   0.0702   40.2420   29.5723   1.5756
TrainRMSE   0.7726   0.7727   0.9113   0.7750    0.7844    0.7870
TrainSTD    0.0026   0.0028   0.0368   0.0054    -         0.0028
TestTime    0.0031   0.0031   0.0012   0.0016    1.3205    0.0015
TestRMSE    0.0956   0.0972   0.3699   0.0945    0.0880    0.1062
TestSTD     0.0110   0.0102   0.0716   0.0133    -         0.0113

Fig. 2. The predicted and actual time series of Lorenz x(t) and the prediction errors.

Fig. 3. Multiple-step-ahead prediction of Lorenz x(t).

Fig. 4. The relationships between validation RMSEs and the number of hidden nodes (monthly temperature, left; rainfall, right).


The validation RMSEs versus the number of hidden nodes of ELM are shown in Fig. 4. According to Fig. 4, the numbers of hidden nodes of ELM, LM-ELM, OS-ELM and ILM-ELM for the monthly temperature and rainfall time series of Dalian are set to 60 and 20, respectively. The hidden node number of MS-ELM is selected by the model selection algorithm, while the regularization coefficient of MKELM is optimized as 2^{-8} for both time series.

The single-step-ahead prediction results for the monthly temperature and rainfall time series of Dalian are shown in Tables 3 and 4. In terms of training time (TrainTime), ILM-ELM takes longer than ELM, LM-ELM and OS-ELM, but less than MS-ELM and MKELM for both time series. MKELM, which uses many different kernels, attains the best training RMSE (TrainRMSE) among the compared methods, while MS-ELM attains the best TrainSTD for both the temperature and rainfall series. The proposed ILM-ELM is superior to OS-ELM in both TrainRMSE and TrainSTD. Considering the testing results, i.e. testing time (TestTime), testing RMSE (TestRMSE) and the standard deviation of the testing RMSE (TestSTD): for the monthly temperature time series, the proposed ILM-ELM attains the best TestRMSE among these methods and a smaller TestSTD than OS-ELM; for the monthly rainfall time series, the TestRMSE of ILM-ELM is smaller than those of ELM, LM-ELM and OS-ELM, while its TestSTD is smaller than those of ELM, OS-ELM and MS-ELM. The TestTime of ILM-ELM is moderate among these methods. Moreover, bearing in mind that ELM, LM-ELM, MS-ELM and MKELM are offline or batch learning methods while OS-ELM and ILM-ELM are online sequential methods, ILM-ELM is superior to OS-ELM in almost all measures while performing comparably with the offline or batch methods.

Table 3. Comparison of prediction results (monthly temperature of Dalian).

Methods     ELM      LM-ELM   OS-ELM   MS-ELM    MKELM    ILM-ELM
TrainTime   0.0131   0.0293   0.0549   26.9968   2.1216   0.2799
TrainRMSE   1.3109   1.3068   1.4477   1.3097    1.1840   1.3675
TrainSTD    0.0290   0.0246   0.0395   0.0216    -        0.0326
TestTime    0.0072   0.0022   0.0066   0.0012    0.1092   0.0025
TestRMSE    1.5011   1.4773   1.6356   1.4819    1.4776   1.4584
TestSTD     0.0702   0.0595   0.1336   0.0709    -        0.0719

Table 4. Comparison of prediction results (monthly rainfall of Dalian).

Methods     ELM       LM-ELM    OS-ELM    MS-ELM    MKELM     ILM-ELM
TrainTime   0.0081    0.0234    0.0203    27.4599   2.6208    0.2518
TrainRMSE   49.2248   49.1206   49.2144   46.5029   40.2193   48.9098
TrainSTD    0.9127    0.8479    0.9520    0.6728    -         0.7353
TestTime    0.0037    0.0006    0.0025    0.0034    0.0624    0.0031
TestRMSE    50.4664   50.4504   50.8610   49.8155   49.1533   49.8530
TestSTD     1.2157    1.1229    1.2451    1.1628    -         1.1597

Fig. 5. The predicted and actual time series of Dalian temperature and the prediction errors.

Fig. 6. The predicted and actual time series of Dalian rainfall and the prediction errors.

Fig. 7. The relationship between validation RMSEs and the number of hidden nodes (runoff of the Yellow River).

Table 5. Comparison of prediction results (yearly runoff of the Yellow River).

Methods     ELM       LM-ELM    OS-ELM    MS-ELM    MKELM     ILM-ELM
TrainTime   0.0175    0.0699    0.0321    9.3482    1.3884    0.2125
TrainRMSE   38.8204   38.4696   56.4881   41.1191   39.3834   41.7314
TrainSTD    1.5755    1.1766    4.0598    1.9406    -         1.5709
TestTime    0.0075    0.0034    0.0031    0.0012    0.0624    0.0037
TestRMSE    76.4108   75.6824   77.1470   75.2117   77.8863   75.3637
TestSTD     9.1037    9.2894    9.4222    10.2716   -         8.4686


Fig. 5 shows the single-step-ahead prediction results for the monthly temperature time series of Dalian. From Fig. 5 it can be seen that the predicted curve fits the actual curve very well. Fig. 6 shows the single-step-ahead prediction results for the monthly rainfall time series of Dalian. It can be observed from Fig. 6 that some large errors occur where the time series changes dramatically; in general, however, the predicted curve follows the trend of the actual curve.

5.3. The yearly runoff of the Yellow River and the yearly mean sunspot number

In this example, the proposed ILM-ELM is used to predict the multivariate series consisting of the yearly mean sunspot number and the yearly natural runoff of the Yellow River. The yearly runoff time series of the Yellow River is measured at the Sanmenxia gauge station over roughly 304 years, from 1700 to 2003. The yearly mean sunspot number series and the yearly runoff series of the Yellow River are used together to predict the runoff of the Yellow River in the next year. The time delays and embedding dimensions are set as $\tau_1 = \tau_2 = 1$ and $d_1 = d_2 = 12$, respectively. After the phase-space reconstruction, 292 data samples are obtained. The first 250 samples are used for training the model and the remaining 42 samples are used to test the prediction performance. Among the 250 training samples, the first 180 are used to train the model, and the remaining 70 serve as the validation set for optimizing the free parameters.

The RMSEs measured on the validation set versus the number of hidden nodes of ELM are shown in Fig. 7. As a result, the number of hidden nodes of ELM, LM-ELM, OS-ELM and ILM-ELM is set to 55. The hidden node number of MS-ELM is determined by the model selection algorithm, and the regularization coefficient of MKELM is optimized as 2^{-2}.

The simulation results are shown in Table 5. From Table 5 it can be seen that ELM attains the smallest TrainTime, LM-ELM the best TrainRMSE and TrainSTD, MS-ELM the best TestTime and TestRMSE, and the proposed ILM-ELM the smallest TestSTD. Considering only the online sequential learning methods, the proposed ILM-ELM is superior to OS-ELM on all measures except TrainTime. Among all the methods, the proposed ILM-ELM attains the second-best TrainSTD and TestRMSE. Fig. 8 shows the prediction results for the yearly runoff time series of the Yellow River; the predicted curve visibly follows the trend of the actual curve.

Fig. 8. The predicted and observed curves of the annual runoff time series of the Yellow River and their errors.

From Tables 3–5 it can be seen that the TrainRMSE of ILM-ELM is larger than those of ELM and LM-ELM, but the TestRMSE of ILM-ELM is consistently smaller. The reason is that ELM and LM-ELM are offline or batch learning methods, which use the whole training set to tune the output weights, whereas ILM-ELM is an online sequential method that uses the latest few samples to update the output weights. Owing to this online sequential learning essence, ILM-ELM can capture the latest dynamic information of the multivariate time series, and it is therefore well suited to real-time applications.

    6. Conclusion

In order to solve the online sequential prediction problem for multivariate time series, an extreme learning machine prediction model based on an improved online sequential LM algorithm has been proposed. The multivariate time series is first phase-space reconstructed. Then the extreme learning machine, a recently developed feedforward neural network, is used to model the input–output mapping in the phase-space. The improved online sequential LM algorithm, which extends the traditional LM algorithm to the online sequential framework and avoids the cost of forming the full Hessian matrix, is used to train the extreme learning machine. The effectiveness of the proposed method is verified in prediction simulations on artificial and real-world multivariate time series. The results show that the proposed method achieves almost the same performance as offline learning methods and much better performance than the online sequential method OS-ELM. Thus, the proposed method provides an effective way to predict multivariate time series in practical and online sequential applications. The type and the number of hidden nodes of ELM have important effects on the prediction performance; future work is underway to evaluate these effects.

    References

Bai, Y.M., Li, T.S., 2012. Robust fuzzy inference system for prediction of time series with outliers. In: 2012 International Conference on Fuzzy Theory and Its Applications (iFUZZY). IEEE, pp. 394–399.

Balasundaram, S., 2013. On extreme learning machine for epsilon-insensitive regression in the primal by Newton method. Neural Comput. Appl. 22, 559–567.

von Bünau, P., Meinecke, F.C., Király, F.C., Müller, K.-R., 2009. Finding stationary subspaces in multivariate time series. Phys. Rev. Lett. 103, 214101.

Cao, L.Y., Mees, A., Judd, K., 1998. Dynamics from multivariate time series. Phys. D: Nonlinear Phenom. 121, 75–88.

Chakraborty, K., Mehrotra, K., Mohan, C.K., Ranka, S., 1992. Forecasting the behavior of multivariate time series using neural networks. Neural Netw. 5, 961–970.

De Gooijer, J.G., Hyndman, R.J., 2006. 25 years of time series forecasting. Int. J. Forecast. 22, 443–473.

Deng, W.Y., Zheng, Q.H., Chen, L., 2009. Regularized extreme learning machine. In: IEEE Symposium on Computational Intelligence and Data Mining (CIDM '09). IEEE, pp. 389–395.

Huang, G.B., 2014. An insight into extreme learning machines: random neurons, random features and kernels. Cogn. Comput. 6, 376–390.

Huang, G.B., Chen, L., Siew, C.-K., 2006a. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17, 879–892.

Huang, G.B., Wang, D.H., Lan, Y., 2011. Extreme learning machines: a survey. Int. J. Mach. Learn. Cybern. 2, 107–122.

Huang, G.B., Zhou, H.M., Ding, X.J., Zhang, R., 2012. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 42, 513–529.

Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006b. Extreme learning machine: theory and applications. Neurocomputing 70, 489–501.

Huynh, H.T., Won, Y., 2011. Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks. Pattern Recognit. Lett. 32, 1930–1935.

Jaeger, H., Haas, H., 2004. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80.

Lan, Y., Soh, Y.C., Huang, G.-B., 2009. Ensemble of online sequential extreme learning machine. Neurocomputing 72, 3391–3395.

Li, D.C., Han, M., Wang, J., 2012. Chaotic time series prediction based on a novel robust echo state network. IEEE Trans. Neural Netw. Learn. Syst. 23, 787–799.


Lian, C., Zeng, Z.G., Yao, W., Tang, H.M., 2013. Ensemble of extreme learning machine for landslide displacement prediction based on time series analysis. Neural Comput. Appl. 24, 99–107.

Liang, N.Y., Huang, G.B., Saratchandran, P., Sundararajan, N., 2006. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 17, 1411–1423.

Lim, J.-S., Lee, S., Pang, H.-S., 2013. Low complexity adaptive forgetting factor for online sequential extreme learning machine (OS-ELM) for application to nonstationary system estimations. Neural Comput. Appl. 22, 569–576.

Luo, M.X., Zhang, K., 2014. A hybrid approach combining extreme learning machine and sparse representation for image classification. Eng. Appl. Artif. Intell. 27, 228–235.

Man, Z.H., Lee, K., Wang, D.H., Cao, Z.W., Khoo, S., 2012. Robust single-hidden layer feedforward network-based pattern classifier. IEEE Trans. Neural Netw. Learn. Syst. 23, 1974–1986.

Miche, Y., Sorjamaa, A., Bas, P., Simula, O., Jutten, C., Lendasse, A., 2010. OP-ELM: optimally pruned extreme learning machine. IEEE Trans. Neural Netw. 21, 158–162.

Niska, H., Hiltunen, T., Karppinen, A., Ruuskanen, J., Kolehmainen, M., 2004. Evolving the neural network model for forecasting air pollution time series. Eng. Appl. Artif. Intell. 17, 159–167.

Nizar, A.H., Dong, Z.Y., Wang, Y., 2008. Power utility nontechnical loss analysis with extreme learning machine method. IEEE Trans. Power Syst. 23, 946–955.

Pino, R., Parreno, J., Gomez, A., Priore, P., 2008. Forecasting next-day price of electricity in the Spanish energy market using artificial neural networks. Eng. Appl. Artif. Intell. 21, 53–62.

Popescu, F., 2011. Robust statistics for describing causality in multivariate time series. J. Mach. Learn. Res. 12, 30–64.

Rong, H.J., Huang, G.B., Sundararajan, N., Saratchandran, P., 2009. Online sequential fuzzy extreme learning machine for function approximation and classification problems. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39, 1067–1072.

Sapankevych, N., Sankar, R., 2009. Time series prediction using support vector machines: a survey. IEEE Comput. Intell. Mag. 4, 24–38.

Shi, Z.W., Han, M., 2007. Support vector echo-state machine for chaotic time-series prediction. IEEE Trans. Neural Netw. 18, 359–372.

Soria-Olivas, E., Gomez-Sanchis, J., Martin, J., Vila-Frances, J., Martinez, M., Magdalena, J., Serrano, A., 2011. BELM: Bayesian extreme learning machine. IEEE Trans. Neural Netw. 22, 505–509.

Sun, Y., Li, J., Liu, J., Chow, C., Sun, B., Wang, R., 2014. Using causal discovery for feature selection in multivariate numerical time series. Mach. Learn., 1–19.

Takens, F., 1981. Detecting strange attractors in turbulence. In: Dynamical Systems and Turbulence, Warwick 1980. Springer, pp. 366–381.

Wang, X.Y., Han, M., 2012. Multivariate chaotic time series prediction based on extreme learning machine. Acta Phys. Sin. 61, 97–105.

Wang, X.Y., Han, M., 2014. Multivariate time series prediction based on multiple kernel extreme learning machine. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 198–201.

Wang, X.Y., Han, M., 2014a. Online sequential extreme learning machine with kernels for nonstationary time series prediction. Neurocomputing 145, 90–97.

Wilamowski, B.M., Yu, H., 2010a. Improved computation for Levenberg–Marquardt training. IEEE Trans. Neural Netw. 21, 930–937.

Wilamowski, B.M., Yu, H., 2010b. Neural network learning without backpropagation. IEEE Trans. Neural Netw. 21, 1793–1803.

Ye, Y., Squartini, S., Piazza, F., 2013. Online sequential extreme learning machine in nonstationary environments. Neurocomputing 116, 94–101.

Zemouri, R., Racoceanu, D., Zerhouni, N., 2003. Recurrent radial basis function network for time-series prediction. Eng. Appl. Artif. Intell. 16, 453–463.

Zhao, J.W., Wang, Z.H., Park, D.S., 2012. Online sequential extreme learning machine with forgetting mechanism. Neurocomputing 87, 79–89.

Zhao, P., Xing, L., Yu, J., 2009. Chaotic time series prediction: from one to another. Phys. Lett. A 373, 2174–2177.

