
Research Article
A Cycle Deep Belief Network Model for Multivariate Time Series Classification

Shuqin Wang,1,2 Gang Hua,1 Guosheng Hao,2 and Chunli Xie2

1 School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
2 School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China

Correspondence should be addressed to Gang Hua; ghua@cumt.edu.cn

Received 27 May 2017; Revised 11 July 2017; Accepted 24 August 2017; Published 4 October 2017

Academic Editor: M. L. R. Varela

Copyright © 2017 Shuqin Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Mathematical Problems in Engineering, Volume 2017, Article ID 9549323, 7 pages. https://doi.org/10.1155/2017/9549323

Multivariate time series (MTS) data is an important class of temporal data objects and it can be easily obtained. However, MTS classification is a difficult task because of the complexity of the data type. In this paper, we propose a Cycle Deep Belief Network (Cycle_DBN) model to classify MTS and compare its performance with DBN and KNN. The model exploits both the representation learning ability of DBN and the temporal correlation within time series data. The experimental results show that this model outperforms four other algorithms: DBN, KNN_ED, KNN_DTW, and RNN.

1. Introduction

Time series data are sequences of real-valued signals measured at successive time intervals. They can be divided into two kinds: univariate time series and multivariate time series (MTS). A univariate time series contains one variable, while an MTS has two or more variables. MTS is the more important type of time series because it is widely used in many areas such as speech recognition, medicine and biology measurement, financial and market data analysis, telecommunication and telemetry, sensor networking, motion tracking, and meteorology.

As the availability of MTS data increases, the problem of MTS classification has recently attracted great interest in the literature [1]. MTS classification is a supervised learning procedure that labels a new multivariate series instance according to a classification function learned from the training set [2]. However, while the features in traditional classification problems are independent of their relative positions, the features in time series are highly correlated. This results in the loss of important information when traditional classification algorithms, which treat each feature as an independent attribute, are applied to MTS. Many techniques have been proposed for time series classification. A method based on boosting is presented for multivariate time series classification in [3]. In [4], the authors proposed a DTW-based decision tree to classify time series, achieving an error rate of 4.9%. In [5], the authors applied a multilayer perceptron neural network to the control chart problem; the best performance achieved is a 1.9% error rate. Hidden Markov Models are used on the PCV-ECG classification problem and achieve 98% accuracy [6]. A support vector machine combined with a Gaussian Elastic Metric Kernel is used for time series classification in [7]. The dynamics of recurrent neural networks (RNNs) for the classification of time series are presented in [8]. However, the simple combination of one-nearest-neighbor with DTW distance is claimed to be exceptionally difficult to beat [9].

A Deep Belief Network (DBN) is a type of deep neural network with multiple hidden layers, introduced by Hinton et al. [10] along with a greedy layer-wise learning algorithm. The Restricted Boltzmann Machine (RBM), a probabilistic model, is the building block of the DBN. DBN and RBM have received increasing attention from researchers. They have already been applied to many problems with excellent performance, such as classification [11], dimensionality reduction [12], and information retrieval [13]. Taylor et al. [14] proposed the conditional RBM, an extension of the RBM, and applied it to human motion sequences. Chao et al. [15] evaluated the DBN as a forecasting tool for predicting exchange rates. LĆ„ngkvist et al. [16] applied DBN to sleep stage classification and evaluated its performance. The results showed that the DBN, either with hand-engineered features (feat-DBN) or using the raw data (raw-DBN), performed better than the feat-GOHMM: the feat-DBN achieved 72.2% and the raw-DBN 67.4%, while the feat-GOHMM achieved only 63.9%.

Raw-DBN does not need to extract features before classifying the sleep data, and the algorithm is easy to implement. However, it neglects important information in time series data, and its performance is not satisfactory. This paper proposes a Cycle_DBN model for time series classification. The model possesses the feature learning ability of the DBN on which it is based, while also taking the characteristics of time series data into consideration.

The remainder of the paper is organized as follows. The next section reviews the background material. In Section 3, we detail the Cycle_DBN model for multivariate time series. Section 4 evaluates the performance of our Cycle_DBN on real data sets. Section 5 concludes the paper.

2. Background Material

A time series is a sequence of observations over a period of time. Formally, a univariate time series $x = \{x(i) \in \mathbb{R},\ i = 1, 2, \ldots, n\}$ is an ordered set of $n$ real-valued numbers, and $n$ is called the length of the time series $x$. Multivariate time series are more common in real life and more complex, since they have two or more variables. An MTS is defined as a finite sequence of univariate time series:

$$\mathbf{X} = (x_1, x_2, \ldots, x_m). \quad (1)$$

The MTS $\mathbf{X}$ has $m$ variables, and the component corresponding to the $j$th variable, $x_j$, is a univariate time series of length $n$:

$$x_j = \{x_j(i) \in \mathbb{R},\ i = 1, 2, \ldots, n\}, \quad j = 1, 2, \ldots, m. \quad (2)$$

In this paper, we use boldface characters for MTS and regular fonts for univariate time series.
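For concreteness, here is a minimal sketch (with invented values) of how an MTS with $m = 3$ variables of length $n = 5$ could be stored as an array, one row per univariate component:

```python
import numpy as np

# A hypothetical MTS X with m = 3 variables, each of length n = 5:
# row j is the univariate component x_j = {x_j(i), i = 1, ..., n}.
X = np.array([
    [0.1, 0.3, 0.2, 0.5, 0.4],   # x_1
    [1.2, 1.1, 0.9, 1.0, 1.3],   # x_2
    [5.0, 4.8, 5.1, 5.3, 5.2],   # x_3
])
m, n = X.shape  # m variables, n time steps
```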

The time series classification problem is a supervised learning procedure. First, we learn a function $f: \mathbf{X} \rightarrow y$ from a given training set $A = \{(\mathbf{X}(i), y(i)),\ i = 1, 2, \ldots, k\}$. The training set $A$ includes $k$ samples, and each sample consists of an input $\mathbf{X}(i)$ paired with its corresponding label $y(i)$. We can then assign a label to a new time series instance based on the function learned from the training set.

A Deep Belief Network (DBN) consists of an input layer, a number of hidden layers, and finally an output layer. The top two layers have undirected, symmetric connections between them. The lower layers receive top-down, directed connections from the layer above.

The process of training a DBN includes two phases. Each pair of consecutive layers in the DBN is treated as a Restricted Boltzmann Machine with visible units $v$ and hidden units $h$. There are full connections between the visible layer and the hidden layer, but no visible-to-visible or hidden-to-hidden connections (see Figure 1). The visible and hidden units are connected by a weight matrix $W$ and have a visible bias vector $b$ and a hidden bias vector $c$, respectively. In the first phase, we train each RBM independently, one after another, and then stack them on top of each other. This procedure is also called pretraining. In the second phase, a BP network is set up at the top level of the DBN and receives the output of the highest RBM as its input. We then perform supervised learning in this phase. This procedure is called fine-tuning, since the parameters of the DBN are tuned using the error backpropagation algorithm.

Figure 1: Graphical depiction of an RBM, with visible units $v_1, v_2, v_3$, hidden units $h_1, \ldots, h_4$, weight matrix $W$, visible bias $b$, and hidden bias $c$.

Figure 2: The flowchart of DBN training.

The procedure for training a DBN is shown in Algorithm 1, and the corresponding flowchart is given in Figure 2.

From the above analysis, we can conclude that the most important part of training a DBN is the training of each RBM.

Since there are no hidden-to-hidden or visible-to-visible connections in the RBM, the conditional activation probabilities factorize over the units.


Begin
  Initialize α (the learning rate), l (the number of DBN layers),
    m_k (the number of hidden units in layer k, k = 1, 2, ..., l),
    epochs, and DBN(k).parameters (the parameters of each layer, k = 1, 2, ..., l)
  Input data X
  V1 = X
  k = 1
  do
    trainRBM(V1, DBN(k).parameters)
    k++
    compute h according to Equation (3)
    V1 = h
  while (k < l)
End

Algorithm 1: The DBN training algorithm.
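To make the layer-wise loop in Algorithm 1 concrete, here is a minimal Python sketch under stated assumptions: `train_rbm` is a hypothetical helper implementing Algorithm 2 (a CD-based sketch of it is given after Algorithm 2 below), and data are propagated upward through Equation (3) to feed the next RBM.

```python
import numpy as np

def pretrain_dbn(X, layer_sizes, alpha=0.1, epochs=10):
    """Greedy layer-wise DBN pretraining (sketch of Algorithm 1).

    X: (num_samples, num_visible) training data.
    layer_sizes: hidden-unit counts m_1, ..., m_l, one per RBM.
    Returns a list of (W, b, c) parameter tuples, one per layer.
    """
    params, V = [], X
    for m_k in layer_sizes:
        W, b, c = train_rbm(V, m_k, alpha, epochs)  # assumed helper (see below)
        params.append((W, b, c))
        # Equation (3): hidden activation probabilities become the next input.
        V = 1.0 / (1.0 + np.exp(-(c + V @ W)))
    return params
```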

The probability that hidden unit $h_j$ is activated given a visible vector $v$, $P(h_j \mid v)$, and the probability that visible unit $v_i$ is activated given a hidden vector $h$, $P(v_i \mid h)$, are given by

$$P(h_j = 1 \mid v) = \frac{1}{1 + \exp(-c_j - \sum_i w_{ij} v_i)}, \quad (3)$$

$$P(v_i = 1 \mid h) = \frac{1}{1 + \exp(-b_i - \sum_j w_{ij} h_j)}. \quad (4)$$

The Contrastive Divergence (CD) approximation is used to train the parameters by minimizing the reconstruction error, and the learning rule is given by

$$\frac{\partial \log P(v)}{\partial W_{ij}} \approx \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}}, \quad (5)$$

where $\langle v_i h_j \rangle_{\text{data}}$ is the expectation over the training set and $\langle v_i h_j \rangle_{\text{recon}}$ is the expectation under the distribution of reconstructions.

The procedure for training an RBM is shown in Algorithm 2, and the corresponding flowchart is given in Figure 3.

3. Cycle_DBN for Time Series Classification

LĆ„ngkvist et al. [16] applied DBN to time series classification and obtained remarkable results. The standard DBN optimizes the posterior probability $p(y_t \mid x_t)$ of the class labels given the current input $x_t$. However, time series data differ from other kinds of data in that there are correlations between successive observations. It is unsuitable to apply DBN to time series classification without modification, because doing so neglects this important information.

Based on the above discussion, this paper proposes a Cycle_DBN model for time series classification, shown in Figure 4. The model inherits the powerful feature representation of DBN and exploits the temporal correlation of the time series. Thus, this model is well suited for time series classification.

In this model, $X_t$ is the input at time step $t$ and $O_t$ is the corresponding output of the DBN. Since our purpose is classification, we add a softmax function on the top layer, and $y_t$ is the corresponding label.

Begin
  m = 1
  while (m < epochs)
    for all hidden units j do
      P(h1_j | v1) = 1 / (1 + exp(-c_j - Σ_i W_ij v1_i))
      sample h1_j ∈ {0, 1} from P(h1_j | v1)
    end for
    for all visible units i do
      P(v2_i | h1) = 1 / (1 + exp(-b_i - Σ_j W_ij h1_j))
      sample v2_i ∈ {0, 1} from P(v2_i | h1)
    end for
    for all hidden units j do
      P(h2_j | v2) = 1 / (1 + exp(-c_j - Σ_i W_ij v2_i))
      h2_j = P(h2_j | v2)
    end for
    W = W + α (h1 v1' āˆ’ h2 v2')
    b = b + α (v1 āˆ’ v2)
    c = c + α (h1 āˆ’ h2)
    m++
  end while
End

Algorithm 2: The RBM training algorithm.
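A minimal NumPy rendering of the CD-1 update in Algorithm 2, assuming binary units, the sigmoid conditionals of Equations (3) and (4), and a mean-over-batch form of the update in Equation (5); this plays the role of the `train_rbm` helper assumed in the earlier DBN sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, num_hidden, alpha=0.1, epochs=10, seed=0):
    """One-step Contrastive Divergence (CD-1) training of an RBM (sketch).

    V: (N, num_visible) array with entries in {0, 1}.
    Returns the weight matrix W, visible bias b, and hidden bias c.
    """
    rng = np.random.default_rng(seed)
    N, num_visible = V.shape
    W = rng.normal(0.0, 0.01, size=(num_visible, num_hidden))
    b = np.zeros(num_visible)   # visible bias
    c = np.zeros(num_hidden)    # hidden bias
    for _ in range(epochs):
        v1 = V
        p_h1 = sigmoid(c + v1 @ W)                        # Equation (3)
        h1 = (rng.random(p_h1.shape) < p_h1).astype(float)
        p_v2 = sigmoid(b + h1 @ W.T)                      # Equation (4)
        v2 = (rng.random(p_v2.shape) < p_v2).astype(float)
        h2 = sigmoid(c + v2 @ W)                          # probabilities, not sampled
        # Equation (5): <v h>_data - <v h>_recon, averaged over the batch.
        W += alpha * (v1.T @ h1 - v2.T @ h2) / N
        b += alpha * (v1 - v2).mean(axis=0)
        c += alpha * (h1 - h2).mean(axis=0)
    return W, b, c
```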

After training the DBN and obtaining the label $y_t$, $y_t$ is treated as one input item of the DBN: at time $t$, the inputs of the DBN include not only $X_t$ but also $y_{t-1}$, the output of the DBN at time $t-1$.

The training procedure of this Cycle_DBN is similar to that of the traditional DBN and includes two phases. The only difference is that the output at time $t-1$ is fed back to the Cycle_DBN as one of the inputs at time $t$. The first phase is unsupervised training to initialize the parameters of the DBN. After unsupervised learning, we add a softmax function on the top layer and carry out a supervised training phase.
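As an illustration of the feedback loop, the sketch below shows how prediction might proceed at test time: the label predicted at $t-1$ is one-hot encoded and concatenated to $x_t$ before the forward pass. `dbn_forward` is a hypothetical trained DBN-plus-softmax returning class probabilities; the paper does not prescribe this exact encoding.

```python
import numpy as np

def cycle_dbn_predict(dbn_forward, X, num_classes):
    """Sequentially classify time steps, feeding each prediction back
    as part of the next input (Cycle_DBN sketch)."""
    y_prev = np.zeros(num_classes)           # no label before the first step
    labels = []
    for x_t in X:                            # X: (T, num_features)
        inp = np.concatenate([x_t, y_prev])  # inputs are x_t and y_{t-1}
        probs = dbn_forward(inp)             # assumed DBN + softmax
        y_t = int(np.argmax(probs))
        labels.append(y_t)
        y_prev = np.eye(num_classes)[y_t]    # one-hot feedback for time t + 1
    return labels
```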

4. Experimental Evaluation

In this section, we conduct extensive experiments to evaluate the classification performance of the proposed Cycle_DBN model and compare it against the traditional DBN, KNN_ED, KNN_DTW, and recurrent neural networks (RNN).

The $k$-NN is one of the most well-known classification algorithms; it is very simple to understand but performs well in practice. An object in the testing set is classified according to its distances to the objects in the training set, and the object is assigned to the class its $k$ nearest neighbors belong to. We choose $k = 1$ in our experiments, in which case the algorithm is simply called the nearest-neighbor algorithm. In KNN_ED, we use the Euclidean distance to measure the similarity between two instances.
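A minimal sketch of this KNN_ED baseline, assuming each series has been flattened to an equal-length feature vector:

```python
import numpy as np

def knn_ed_predict(train_X, train_y, test_X):
    """1-NN with Euclidean distance: each test instance receives the
    label of its closest training instance."""
    preds = []
    for q in test_X:
        dists = np.linalg.norm(train_X - q, axis=1)  # Euclidean distances
        preds.append(train_y[np.argmin(dists)])
    return np.array(preds)
```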

Dynamic Time Warping (DTW) [17] is another distance measure for time series; it was originally and typically designed for univariate time series. However, the time series handled in this paper are multidimensional, so a multidimensional version of DTW is needed. Fortunately, ten Holt et al. [18] proposed a multidimensional DTW that uses all dimensions to find the best synchronization. In standard DTW, the distance is usually calculated by taking the squared distance between the feature values of each combination of points, $d(q_i, c_j) = (q_i - c_j)^2$. But in multidimensional DTW, a distance measure for two $K$-dimensional points must be calculated: $d(q_i, c_j) = \sum_{k=1}^{K} (q_{ik} - c_{jk})^2$. In KNN_DTW, we use the multidimensional DTW distance to measure the similarity between two instances.

Figure 3: The flowchart of RBM training.

Figure 4: The architecture of Cycle_DBN: the inputs $x_t$ and $y_{t-1}$ feed hidden layers $h_1$ and $h_2$, producing output $o_t$, and a softmax layer yields the label $y_t$.
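A minimal dynamic-programming sketch of the multidimensional DTW distance just described, using the point cost $d(q_i, c_j) = \sum_{k=1}^{K}(q_{ik} - c_{jk})^2$ over all $K$ dimensions:

```python
import numpy as np

def md_dtw(Q, C):
    """Multidimensional DTW between Q (n x K) and C (m x K)."""
    n, m = len(Q), len(C)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((Q[i - 1] - C[j - 1]) ** 2)  # K-dimensional point cost
            D[i, j] = cost + min(D[i - 1, j],          # insertion
                                 D[i, j - 1],          # deletion
                                 D[i - 1, j - 1])      # match
    return D[n, m]
```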

RNN allows the identification of a dynamic system with an explicit model of time and memory, which makes it well suited for time series classification. In this paper, we choose Elman's architecture, which consists of a context layer, an input layer, one hidden layer, and an output layer.

To evaluate the performance of these methods, we test them on real-world time series datasets, including the sleep dataset, the PAMAP2 dataset, and the UCR Time Series Classification Archive.

The performance of the classifiers is reported using the error rate, defined as

$$\text{error rate} = \frac{\text{number of misclassified test samples}}{\text{total number of test samples}}. \quad (6)$$

4.1. Sleep Stage Classification. We first consider the problem of sleep stage classification. The data used in this paper is provided by St. Vincent's University Hospital and University College Dublin and can be downloaded from PhysioNet at http://www.physionet.org/pn3/ucddb/.

4.1.1. Dataset. The recordings of this data set were obtained from 25 adult subjects with suspected sleep-disordered breathing. Each recording consists of 2 EEG channels (C3-A2 and C4-A1), 2 EOG channels, and 1 EMG channel. We use only one of the EEG signals (C3-A2) in our study:

$$X_t = (\text{EEG}_t, \text{EOG1}_t, \text{EOG2}_t, \text{EMG}_t). \quad (7)$$

According to Rechtschaffen and Kales (R&K) [19], sleep recordings can be divided into the following five stages: awake, rapid eye movement (REM), stage 1, stage 2, and slow wave sleep (SWS). Our goal is to find a map function $f$ that correctly predicts the corresponding sleep stage from $X_t$: $y_t = f(X_t)$.

4.1.2. Experiment Setup. The raw signals of all subjects are slightly preprocessed by notch filtering at 50 Hz to cancel out power line disturbances and are then prefiltered with a band-pass filter of 0.3 to 32 Hz for EEG and EOG and 10 to 32 Hz for EMG. After that, they are downsampled to 64 Hz.

Since the sample rate is 64 samples per second and we set the window width $w$ to 1 second of data, our time series become

$$X_i = (\text{EEG}_{1+i}^{64+i}, \text{EOG1}_{1+i}^{64+i}, \text{EOG2}_{1+i}^{64+i}, \text{EMG}_{1+i}^{64+i}). \quad (8)$$
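A sketch of how these 1-second windows might be cut from the 64 Hz recordings; the (T, 4) channel layout and the flattening order are assumptions for illustration, and, as in the paper, each window takes the label of its last sample.

```python
import numpy as np

def make_windows(signals, labels, width=64):
    """signals: (T, 4) array of [EEG, EOG1, EOG2, EMG] sampled at 64 Hz.
    Returns one flattened window per start index, per Equation (8)."""
    X, y = [], []
    for i in range(len(signals) - width + 1):
        X.append(signals[i:i + width].T.ravel())  # samples i+1 ... i+64, channel by channel
        y.append(labels[i + width - 1])           # label of the last sample in the window
    return np.array(X), np.array(y)
```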


Table 1: Distribution of the five classes in the data set.

Subject         Total   Awake   REM     Stage 1  Stage 2  SWS
Train samples   25000   5017    5005    4993     4986     4999
Val. samples     5000    983     995    1007     1014     1001
ucddb009        20000   4230    3720    6420      740     4890
ucddb010        20000   1980   11040    2610     2480     1890
ucddb011        20000   3270    6170    2430      390     7740
ucddb012        20000   4380    7710     930     3660     3320
ucddb013        20000   2670    4040    3660     2010     7620
ucddb014        20000      0    6780    7380     1190     4650

Table 2: Classification error rate of five models on the sleep datasets.

Subject    Cycle_DBN  DBN      KNN_ED   KNN_DTW  RNN
ucddb009   0.0061     0.27619  0.3572   0.3359   0.432568
ucddb010   0.0112     0.19108  0.45832  0.3971   0.5609
ucddb011   0.0044     0.10805  0.58243  0.57314  0.54586
ucddb012   0.0098     0.04804  0.41244  0.3379   0.45141
ucddb013   0.0068     0.19709  0.4964   0.4857   0.65820
ucddb014   0.0077     0.26535  0.47192  0.36183  0.58422

Since the length of $X_t$ is 64, we have 64 corresponding labels. The last label is selected as the label of the time series $X_t$.

In our study, we use the recordings of five people as the training set. In order to balance the samples, we randomly select 6000 records from every category, giving 30000 recordings, which we divide into 25000 training samples and 5000 validation samples. The recordings of the other six people are used as test data. The distribution of the dataset is listed in Table 1.

4.1.3. Experiment Results. Our goal is to compare the performance of Cycle_DBN with the original DBN, KNN_ED, KNN_DTW, and RNN for time series classification. The error rate of each model is reported in Table 2, with the best results in boldface.

Compared with the other four algorithms, the proposed algorithm has the best performance. The classification accuracies of Cycle_DBN on all the test data are above 90%, and most of them exceed 99%. The standard DBN has a higher rate of correct classification than KNN_ED, KNN_DTW, and RNN. RNN shows quite poor performance, with an error rate of about 50%.

4.2. Activity Classification. Our second experiment is on the PAMAP2 dataset for activity classification. This dataset can be downloaded at http://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring.

4.2.1. Dataset. This data set records 18 activities performed by 9 subjects wearing 3 IMUs and a heart-rate monitor. Each row of data contains 54 columns: timestamp (1), activityID (2), heart rate (3), IMU hand (4–20), IMU chest (21–37), and IMU ankle (38–54). In our experiment, we select only 7 activities: "lying (1)", "sitting (2)", "standing (3)", "walking (4)", "running (5)", "cycling (6)", and "Nordic walking (7)". Since the records of subject103 and subject109 do not contain all seven of these activities, we discard these two subjects. That is to say, we select seven subjects (subject101–subject102 and subject104–subject108) to classify the seven activities. Furthermore, the heart-rate record is not used in our experiment.

4.2.2. Experiment Setup. To improve the performance of the proposed approach, we carry out a data preprocessing step at the beginning of the experiment. Each dimension of the time series is normalized through

$$x = \frac{x - \text{mean}(x)}{\text{std}(x)}, \quad (9)$$

where mean($x$) and std($x$) are the mean and standard deviation of the variable computed over the samples belonging to the same column, not over all samples.
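Equation (9) amounts to a column-wise z-score; a one-function NumPy sketch:

```python
import numpy as np

def normalize_columns(X):
    """Z-score each column (variable) of X, per Equation (9)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```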

For each of the seven subjects, we randomly select 1/2 of the data as the training set, 1/6 as the validation set, and the rest as the test set.

4.2.3. Experiment Results. We evaluate the classification accuracy of each model on these seven subjects. Table 3 shows the detailed error-rate comparison for each subject. From Table 3, we can see that the classification accuracies of the five models on the seven datasets are all above 90%. However, our Cycle_DBN model either has the lowest error rate or is very close to the lowest error rate for each subject. KNN_ED also shows excellent performance, but we should note that KNN is a lazy, instance-based model.

It is well known that feature-based models have an efficiency advantage over lazy classification models such as KNN. Although KNN has high classification accuracy, its prediction time increases dramatically as the training set grows. The prediction time of DBN and Cycle_DBN does not increase no matter how large the training data is. Therefore, Cycle_DBN shows excellent performance in terms of both classification accuracy and time consumption.


Table 3: Classification error rate of five models on PAMAP2.

Subject      Cycle_DBN  DBN       KNN_ED    KNN_DTW  RNN
Subject 101  5.70E-05   0.01208   0.000171  0.03     0.167081
Subject 102  1.82E-05   0.000675  0         0.034    0.036484
Subject 104  0          0.000197  3.93E-05  0.006    0.000995
Subject 105  3.38E-05   0.004746  0         0.024    0.101161
Subject 106  1.82E-05   0.001546  3.64E-05  0.018    0
Subject 107  0          0.000438  3.99E-05  0.024    0.062604
Subject 108  3.48E-05   0.000869  1.74E-05  0.018    0.0833

Table 4: Classification error rate of five models on ten UCR time series datasets.

Dataset                  Cycle_DBN  DBN       KNN_ED    KNN_DTW  RNN
uWaveGestureLibrary_Z    0.333333   0.330656  0.350363  0.35     0.419598
uWaveGestureLibraryAll   0.095047   0.07095   0.051926  0.052    0.100503
FordA                    0.489647   0.487211  0.341016  0.341    0.484865
Two_Patterns             0.044365   0.032374  0.09325   0.090    0.10775
ECG5000                  0.049161   0.051559  0.075111  0.075    0.072889
Wafer                    0.000838   0.002513  0.004543  0.005    0.006976
StarLightCurves          0.069481   0.070779  0.151166  0.151    0.326979
ElectricDevices          0.029931   0.338983  0.449228  0.376    0.9135
InsectWingbeatSound      0.39782    0.384196  0.438384  0.438    0.408081
Face (all)               0.101333   0.106667  0.286391  0.286    0.193491


4.3. UCR Time Series Classification. Besides the above two data sets, we also test our Cycle_DBN on ten distinct time series datasets from the UCR time series archive [20]. All the datasets are split into training and testing sets by default. The only preprocessing in our experiment is normalization and dividing the data into training, validation, and testing sets.

Table 4 shows the test error rates and a comprehensive comparison of KNN_ED, KNN_DTW, RNN, DBN, and Cycle_DBN.

Cycle_DBN outperforms the other four methods on five of the ten datasets. KNN_ED and KNN_DTW achieve the best performance on the same two datasets, and DBN achieves the best performance on two datasets. Although the performance of RNN is not prominent, its results are also acceptable.

5. Conclusion

Time series classification is becoming more and more important in a broad range of real-world applications. However, most existing methods have low classification accuracy or need domain knowledge to identify representative features in the data. In this paper, we proposed a Cycle_DBN for the classification of multivariate time series data in general. Like DBN, Cycle_DBN uses an unsupervised learning algorithm that can discover the structure hidden in the data and learn representations that are more suitable as input to a supervised machine than the raw input. Compared with DBN, the new Cycle_DBN model predicts the label $y_t$ at time $t$ based not only on the current input $x_t$ but also on the label $y_{t-1}$ of the previous time step. We evaluated our Cycle_DBN model on twelve real-world datasets, and the experimental results show that our model outperforms DBN, KNN_ED, KNN_DTW, and RNN on most datasets.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 51574232, 61379100, 61673196, and 61502212).

References

[1] L. Chen and M. S. Kamel, "Design of multiple classifier systems for time series data," Lecture Notes in Computer Science, pp. 216–225, 2005.

[2] I. Batal et al., "Multivariate time series classification with temporal abstractions," The Florida AI Research Society, vol. 22, p. 344, 2009.

[3] J. J. Rodriguez, C. J. Alonso, and H. Bostrom, "Boosting interval based literals," Intelligent Data Analysis, vol. 5, no. 2, pp. 245–262, 2001.

[4] J. J. Rodríguez and C. J. Alonso, Interval and Dynamic Time Warping-Based Decision Trees, p. 548, 2004.

[5] A. Nanopoulos, R. Alcock, and Y. Manolopoulos, "Feature-based classification of time-series data," in Information Processing and Technology, M. Nikos and D. N. Stavros, Eds., pp. 49–61, Nova Science Publishers, New York, USA, 2001.

[6] S. Kim, P. Smyth, and S. Luther, "Modeling waveform shapes with random effects segmental hidden Markov models," in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 309–316, AUAI Press, Banff, Canada, July 2004.

[7] D. Zhang, W. Zuo, D. Zhang, and H. Zhang, "Time series classification using support vector machine with Gaussian elastic metric kernel," in Proceedings of the 20th International Conference on Pattern Recognition (ICPR '10), IEEE, Istanbul, Turkey, August 2010.

[8] M. HĆ¼sken and P. Stagge, "Recurrent neural networks for time series classification," Neurocomputing, vol. 50, pp. 223–235, 2003.

[9] X. Xi, E. Keogh, C. Shelton, L. Wei, and C. A. Ratanamahatana, "Fast time series classification using numerosity reduction," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 1033–1040, ACM, Pittsburgh, Pennsylvania, USA, June 2006.

[10] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[11] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 609–616, ACM, Montreal, Quebec, Canada, June 2009.

[12] G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[13] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS '04), 2004.

[14] G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Modeling human motion using binary latent variables," in Proceedings of the International Conference on Neural Information Processing, 2006.

[15] J. Chao, F. Shen, and J. Zhao, "Forecasting exchange rate with deep belief networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN '11), IEEE, San Jose, CA, USA, August 2011.

[16] M. LĆ„ngkvist, L. Karlsson, and A. Loutfi, "Sleep stage classification using unsupervised feature learning," Advances in Artificial Neural Systems, vol. 2012, Article ID 107046, 9 pages, 2012.

[17] J. Kruskal and M. Liberman, "The symmetric time warping problem: from continuous to discrete," in Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, 1983.

[18] G. A. ten Holt, M. J. Reinders, and E. Hendriks, "Multi-dimensional dynamic time warping for gesture recognition," in Proceedings of the Thirteenth Annual Conference of the Advanced School for Computing and Imaging, 2007.

[19] A. Rechtschaffen and A. Kales, A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects, 1968.

[20] Y. Chen et al., The UCR Time Series Classification Archive, http://www.cs.ucr.edu/~eamonn/time_series_data/, 2015.


Page 2: A Cycle Deep Belief Network Model for Multivariate ā€¦downloads.hindawi.com/journals/mpe/2017/9549323.pdfDataset CycleDBN DBN š¾NNED š¾NNDTW RNN uWaveGestureLibrary Z 0.333333

2 Mathematical Problems in Engineering

data (raw-DBN) performed better than the feat-GOHMMThe feat-DBN achieved 722 and the raw-DBN achieved674 while the feat-GOHMM achieved only 639

Raw-DBN do not need to extract feature before classify-ing the sleep data and this algorithm is easy to implementHowever it neglects the important information in time seriesdata and its performance is not satisfactory This paperproposed a Cycle DBN model for time series classificationThis model possesses the ability of feature learning since it isdeveloped on the basis of DBN Meanwhile the characters oftime series data are taken into consideration in the model

The remainder of the paper is organized as follows Nextsection reviews the background material In Section 3 wedetail the Cycle DBN model for multivariate time seriesSection 4 evaluates the performance of ourCycleDBNon tworeal data sets Section 5 concludes the work of this paper

2 Background Material

A time series is a sequence of observations over a period oftime Formally a univariate time series 119909 = 119909(119894) isin 119877 119894 =1 2 119899 is an ordered set of 119899 real-valued numbers and 119899 iscalled the length of the time series 119909 Multivariate time seriesismore common in real life and it ismore complex since it hastwo or more variables A MTS is defined as a finite sequenceof univariate time series

119883 = (1199091 1199092 119909119898) (1)

The MTS 119883 has 119898 variables and the corresponding compo-nent of the 119895th variable 119909119895 is a univariate time series of length119899

119909 = 119909119895 (119894) isin 119877 119894 = 1 2 119899 (119895 = 1 2 119898) (2)

In this paper we use bold face characters forMTS and regularfonts for univariate time series

The time series classification problem is a supervisedlearning procedure First we should learn a function 119891 119883 rarr119910 according to the given training set 119860 = (119883(119894) 119910(119894)) 119894 =1 2 119896 The training set 119860 includes 119896 samples and eachsample consists of an input119883(119894) paired with its correspondinglabel 119910(119894) Then we can assign a label to a new time seriesinstance based on the function we learned from the trainingset

ADeep Belief Network (DBN) consists of an input layer anumber of hidden layers and finally an output layer The toptwo layers have undirected symmetric connections betweenthem The lower layers receive top-down directed con-nections from the layer above

The process of training DBNs includes two phases Eachtwo consecutive layers in DBN are treated as a RestrictedBoltzmann Machine with visible units V and hidden unitsā„Ž There are full connections between visible layer and hid-den layer but no visible-to-visible or hidden-to-hidden con-nections (see Figure 1) The visible and hidden units are con-nected with a weight matrix119882 and have a visible bias vector119887 and a hidden bias vector 119888 respectively We need to traineach RBM independently one after another and then stack

ā„Ž1 ā„Ž2 ā„Ž3 ā„Ž4

v1 v2 v3

c

W

b

Figure 1 Graphical depiction of RBM

Begin

Train a rbm

True

End

False

Initialize the number

parameter of each layer dbn(i)

Input data v

V = ā„Ž

i = 0

i lt 1

i++

of layer l and the

P(ā„Žj | ) =1

1 + ļ¼²ļ¼Ŗ(cj + sumiWiji)

Figure 2 The flowchart of DBN training

them on top of each other in the first phaseThis procedure isalso called pretraining In the second phase the BP networkis set up at the last level of the DBN and the output of thehighest RBM is received as its input Then we can performa supervised learning in this phase This procedure is calledfine-tuning since the parameters in the DBN are tuned usingerror back propagation algorithm in this phase

The procedure of training DBN is shown by Algorithm 1and the corresponding flowchart is given by Figure 2

From the above analysis we can conclude that the mostimportant of DBN is the training of each RBM

Since there are no hidden-hidden or visible-visible con-nections in the RBM the probability that hidden unit ā„Ž119895 isactivated by visible vector 119875(ā„Ž119895|V) and the probability that

Mathematical Problems in Engineering 3

BeginInitialize 120572 (the learning rate) 119897 (the number ofDBN layers)119898119896 (the number of hidden unites in 119896 layerand 119896 = 1 2 119897) epochs DBN(119896)parameters(the parameters of each layer and 119896 = 1 2 119897)

Input data1198831198811 = 119883119896 = 1dotrainRBM(1198811 DBN(119896)parameters)119896++compute ā„Ž according to Equation (3)1198811 = ā„Žwhile (119896 lt 119897)End

Algorithm 1 DBN train algorithm

visible unit V119894 is activated by given hidden vector 119875(V119894|ā„Ž) isgiven by

119875 (ā„Ž119895 | V) =1

1 + exp (119888119895 + sum119894 119908119894119895V119894) (3)

119875 (V119894 | ā„Ž) =1

1 + exp (119887119894 + sum119895 119908119895119894ā„Ž119895) (4)

Contrastive Divergence (CD) approximation is used to trainthe parameters by minimizing the reconstruction error andthe learning rule is given by

120597 log119875 (V)120597119882119894119895

asymp āŸØV119894ā„Ž119895āŸ©data minus āŸØV119894ā„Ž119895āŸ©recon (5)

āŸØV119894ā„Ž119895āŸ©data is expectation of the training set and āŸØV119894ā„Ž119895āŸ©recon rep-resents the expectation of the distribution of reconstructions

The procedure of training RBM is shown as Algorithm 2and the corresponding flowchart is given by Figure 3

3 Cycle_DBN for Time Series Classification

Langkvist et al [16] applied DBN in time series classificationand obtained a remarkable result The standard DBN opti-mizes the posterior probability 119901(119910119905 | 119909119905) of the class labelsgiven the current input 119909119905 However time series data aredifferent from other kinds of data and there are correlationsbetween time series data It is unsuitable to apply DBN fortime series classification without any modification because itneglects the important information in time series data

Based on the above discussion this paper proposed aCycle DBN model for time series classification just as Fig-ure 4Themodel inherits the powerful feature representationof DBN and utilizes the data correlation of the time seriesThus this model is quite suitable for time series classification

In this model 119883119905 is the input at time step 119905 and 119874119905 is thecorresponding output of DBN Since our purpose is classi-fication we add a softmax function on the top layer and 119910119905 isthe corresponding label After training DBN and getting the

Beginm = 1while (m lt epoch)for all hidden units 119895119875 (ā„Ž1119895 | V1) =

11 + exp (119888119895 + sum119894119882119894119895V1119894)

Sample ā„Ž1119894 isin 0 1 from 119875(ā„Ž1119895|V1)End forFor all visible units 119894 do119875 (V2119894 | ā„Ž1119895) =

11 + exp (119887119894 + sum119895119882119894119895ā„Ž1119895)

Sample V2119894 isin 0 1 from 119875(V2119895 | ā„Ž1)End forFor all hidden units 119895 do119875 (ā„Ž2119895 | V2) =

11 + exp (119888119895 + sum119894119882119894119895V2119894)

ā„Ž2119895 = 119875 (ā„Ž2119895 | V2)end for119908 = 119908 + 120572 lowast (ā„Ž1 lowast V11015840 minus ā„Ž2 lowast V21015840)119887 = 119887 + 120572 lowast (V1 minus V2)119888 = 119888 + 120572 lowast (ā„Ž1 minus ā„Ž2)End whileEnd

Algorithm 2 The algorithm for RBM train

label 119910119905 119910119905 is then treated as one item input of DBN At time 119905the inputs of DBN not only include 119883119905 but also include 119910119905minus1the output of DBN at time 119905 minus 1

The training procedure of this Cycle DBN which issimilar to the traditional DBN includes two procedures Theonly difference is that the output at time 119905 minus 1 is feedback toCycle DBN as one of the inputs at time 119905 The first procedureis unsupervised training to initiate the parameters of DBNAfter unsupervised learning we add a softmax function onthe top layer and do a supervised training procedure

4 Experimental Evaluation

In this section we conduct extensive experiments to evalu-ate the classification performance of the proposed modelCycle DBN and compare it against traditional DBN 119870NNED119870NN DTW and recurrent neural networks (RNN)

The 119896-NN is one of the most well-known classificationalgorithms that are very simple to understand but performswell in practice An object in the testing set is classifiedaccording to the distances of the object to the objects in thetraining set and the object is assigned to the class its 119896 nearestneighbors belongs to We will choose 119896 = 1 in our experi-ment and the algorithm is simply called the nearest neighboralgorithm In 119870NN ED we use Euclidean Distance to mea-sure the similarity between two instances

Dynamic Time Warping (DTW) [17] is another distancemeasure for time series and it was originally and typicallydesigned for univariate time series However the time serieshandled in this paper is multidimensional and a multi-dimensional version of DTW is needed Fortunately ten

4 Mathematical Problems in Engineering

Begin

according to Eqs (3-4)

according to Eq (5)

True

End

False

i lt ļ¼Ŗļ¼©ļ¼ļ¼¢

Update w b c

Compute ā„Ž1 2 ā„Ž2

V1 = x

i = 1

i++

Figure 3 The flowchart of RBM training

V

Softmax

yt

ot

xt ytminus1

ā„Ž2

ā„Ž1

Figure 4 The architecture of Cycle DBN

Holt et al [18] proposed a multidimensional DTW and itutilizes all dimensions to find the best synchronization InstandardDTW the distance is usually calculated by taking thesquared distance between the feature values of each combina-tion of points 119889(119902119894 119888119895) = (119902119894 minus 119888119895)

2 But in multidimensionalDTW a distancemeasure for two119870-dimensional pointsmustbe calculated 119889(119902119894 119888119895) = sum119870119896=1(119902119894119896 minus 119888119895119896)

2 In 119870NN DTWwe usemultidimensional DTWdistance tomeasure the simi-larity between two instances

RNN allows the identification of dynamic system withan explicit model of time and memory which makes it idealfor time series classification In this paper we choose Elmanrsquosarchitecture which consist of a context layer an input layerone hidden layer and an output layer

To evaluate the performance of these methods we testthem on real-world time series datasets including sleepingdataset PAMAP2 dataset and UCR Time Series Classifica-tion Archive

The performance of the classifier is reported using errorrate and the error rate of classifiers is defined as shown in

error rate

=total number of misclassification data

total number of testing data

(6)

41 Sleep Stage Classification We first consider the problemof sleep stage classificationThe data used in the paper is pro-vided by St Vincentrsquos UniversityHospital andUniversity Col-lege Dublin and can be downloaded from httpwwwphy-sionetorgpn3ucddb PhysioNet

411 Dataset The recordings of this data set have beenobtained from 25 adult subjects with suspected sleep-dis-ordered breathing Each recording consists of 2 EEG channels(C3-A2 andC4-A1) 2 EOG channels and 1 EMG channelWeonly use one of the EEG signals (C3-A2) in our study

119883119905 = (EEG119905 EOG1119905 EOG2119905 EMG119905) (7)

According to Rechtschaffen and Kales (RampK) [19] sleeprecordings can be divided into the following five stagesawake rapid eye movement (REM) stage 1 stage 2 and slowwave sleep (SWS) Our goal is to find a map function 119891 thatcorrectly predicts the corresponding sleep stage according tothe X119905 119910119905 = 119891(119883119905)

412 Experiment Setup The raw signals of all subjects areslightly preprocessed by notch filtering at 50Hz to cancel outpower line disturbances and then are prefiltered with a band-pass filter of 03 to 32Hz for EEG and EOG and 10 to 32Hzfor EMG After that they are downsampled to 64Hz

Since the sample rate is 64 samples per second and weset window width 119908 to be 1 second of data our time seriesbecome

119883119894 = (EEG1+11989464+119894EOG11+119894

64+119894EOG21+11989464+119894

EMG1+11989464+119894)

(8)

Mathematical Problems in Engineering 5

Table 1 Distribution of five classes in the data set

Subject Total Awake REM Stage 1 Stage 2 SWStrainSamples 25000 5017 5005 4993 4986 4999Val Samples 5000 983 995 1007 1014 1001Ucdbb009 20000 4230 3720 6420 740 4890Ucdbb010 20000 1980 11040 2610 2480 1890Ucdbb011 20000 3270 6170 2430 390 7740Ucdbb012 20000 4380 7710 930 3660 3320Ucdbb013 20000 2670 4040 3660 2010 7620Ucdbb014 20000 0 6780 7380 1190 4650

Table 2 Classification error rate of five models on sleep datasets

Subject Cycle DBN DBN 119870NN ED 119870NN DTW RNNUcdbb009 00061 027619 03572 03359 0432568Ucdbb010 00112 019108 045832 03971 05609Ucdbb011 00044 010805 058243 057314 054586Ucdbb012 00098 004804 041244 03379 045141Ucdbb013 00068 019709 04964 04857 065820Ucdbb014 00077 026535 047192 036183 058422

Since the length of X119905 is 64 we have corresponding 64 labelsThe last label is selected as the label of the time series X119905

In our study we use five people recordings as the trainingset In order to balance the samples we select 6000 recordsevery category random So we have 30000 recordings andwe divide 25000 into train samples and 5000 into validationsamplesTheother six people recordings are used for test dataThe distribution of dataset is listed in Table 1

413 Experiment Result Our goal is to compare the perfor-mance of IDBNs with original DBN 119896NN ED 119870NN DTWand RNN for time series classification We illustrate the errorrate of each model in Table 2The best results are recorded inboldface in Table 2

Compared with other four algorithms the proposedalgorithm has best performanceThe classification accuraciesof Cycle DBN on all the test data are up to 90 and especiallymost of them are more than 99 Standard DBN has a higherrate of correct classification than119870NN ED119870NN DTW andRNN RNN shows quite poor performance and the error rateis about 50

42 Activity Classification Our second experiment is on thePAMAP2 dataset for activity classification This dataset canbe downloaded at httparchiveicsuciedumldatasetsPAMAP2+Physical+Activity+Monitoring

421 Dataset This data set records 18 activities performedby 9 subjects wearing 3 IMUs and a HR-monitor Each ofdata contains 54 columns per row and the columns containthe following data timestamp (1) activityID (2) heart rate(3) IMU hand (4ndash20) IMU chest (21ndash37) and IMU ankle(38ndash54) In our experiment we only select 7 activities whichare ldquolying (1)rdquo ldquositting (2)rdquo ldquostanding (3)rdquo ldquowalking (4)rdquoldquorunning (5)rdquo ldquocycling (6)rdquo ldquoNordic walking (7)rdquo Since the

records of subject103 and subject109 do not have all the aboveseven activities and we discard these two subjects That is tosay we select subject 101simsubject 102 and subject 104simsubject108 seven subjects to classify seven activities Furthermorethe record of heart rate is not used in our experiment

422 Experiment Setup To improve the performance of theproposed approach we need to carry out a data preprocessingprocess at the beginning of the experiment Each dimensionof time series is normalized through

119909 = 119909 minusmean (119909)std (119909)

(9)

where mean(119909) and std(119909) are the mean and standard devia-tion of the variable for samples belonging to the same columnnot all samples

For each subject of seven subjects we randomly select 12as training set 16 as validation set and the rest as test set

423 Experiment Result We evaluate classification accura-cies of each model on these seven subjects Table 3 showsthe detailed error rates comparison of each subject FromTable 3 we can see that the classification accuracies of the fivemodels on the seven datasets are more than 90 Howeverour Cycle DBN model is either the lowest error rate oneor very close to the lowest error rate one for each subject119870NN ED also shows quite excellent performance and weshould note that119870NN is feature-based model

It is well known that feature-based models have anadvantage over lazy classification models such as 119870NN inefficiency Although 119870NN has high classification accuracythe prediction time of 119870NN will increase dramatically whenthe size of training data set grows The prediction time ofDBN and Cycle DBN will not increase no matter how largethe training data is Therefore Cycle DBN shows excellent

6 Mathematical Problems in Engineering

Table 3 Classification error rate of five models on PAMAP2

Subject Cycle DBN DBN 119870NN ED 119870NN DTW RNNSubject 101 570E minus 05 001208 0000171 003 0167081Subject 102 182119864 minus 05 0000675 0 0034 0036484Subject 104 0 0000197 393119864 minus 05 0006 0000995Subject 105 338119864 minus 05 0004746 0 0024 0101161Subject 106 182E minus 05 0001546 364119864 minus 05 0018 0Subject 107 0 0000438 399119864 minus 05 0024 0062604Subject 108 348119864 minus 05 0000869 174E minus 05 0018 00833

Table 4 Classification error rate of five models on ten UCR time series datasets

Dataset Cycle DBN DBN 119870NN ED 119870NN DTW RNNuWaveGestureLibrary Z 0333333 0330656 0350363 035 0419598UWaveGestureLibraryAll 0095047 007095 0051926 0052 0100503FordA 0489647 0487211 0341016 0341 0484865Two Patterns 0044365 0032374 009325 0090 010775ECG5000 0049161 0051559 0075111 0075 0072889Wafer 0000838 0002513 0004543 0005 0006976StarLightCurves 0069481 0070779 0151166 0151 0326979ElectricDevices 0029931 0338983 0449228 0376 09135InsectWingbeatSound 039782 0384196 0438384 0438 0408081Face (all) 0101333 0106667 0286391 0286 0193491

performance in terms of classification accuracy and timeconsuming

43 UCR Time Series Classification Besides the above twodata sets we also test our Cycle DBN on the ten distinct timeseries datasets from UCR time series [20] All the datasethas been split into training and testing by default The onlypreprocessing in our experiment is normalization and dividesthem into training validating and testing set

Table 4 shows the test error rate and a comprehensivecomparison with 119870NN ED 119870NN DTW RNN DBN andCycle DBN

Cycle DBN outperforms other four methods on fivedatasets of ten datasets 119870NN ED and 119870NN DTW achievebest performance on the same two datasets DBN achievesbest performance on two datasets Although the performanceof RNN is not prominent the effect is also acceptable

5 Conclusion

Time series classification is becomingmore andmore impor-tant in a broad range of real-world applications Howevermost existing methods have lower classification accuracy orneed domain knowledge to identify representative featuresin data In this paper we proposed a Cycle DBN for clas-sification of multivariate time series data in general LikeDBN Cycle DBN is an unsupervised learning algorithmwhich can discover the structure hidden in the data and learnrepresentations that aremore suitable as input to a supervisedmachine than the raw input Comparing with DBN the newmodel Cycle DBN predicts the label of time 119905 119910119905 not onlybased on the current input 119909119905 but also based on the label of

previous time 119910119905minus1 We evaluated our Cycle DBN model ontwelve real-world datasets and experimental results show thatour model outperforms DBN 119870NN ED 119870NN DTW andRNN on most datasets

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (nos 51574232 61379100 61673196 and61502212)

References

[1] L. Chen and M. S. Kamel, "Design of multiple classifier systems for time series data," Lecture Notes in Computer Science, pp. 216–225, 2005.

[2] I. Batal et al., "Multivariate time series classification with temporal abstractions," The Florida AI Research Society, vol. 22, 344 pages, 2009.

[3] J. J. Rodríguez, C. J. Alonso, and H. Boström, "Boosting interval based literals," Intelligent Data Analysis, vol. 5, no. 2, pp. 245–262, 2001.

[4] J. J. Rodríguez and C. J. Alonso, Interval and Dynamic Time Warping-Based Decision Trees, p. 548, 2004.

[5] A. Nanopoulos, R. Alcock, and Y. Manolopoulos, "Feature-based classification of time-series data," in Information Processing and Technology, M. Nikos and D. N. Stavros, Eds., pp. 49–61, Nova Science Publishers Inc., New York, USA, 2001.

[6] S. Kim, P. Smyth, and S. Luther, "Modeling waveform shapes with random effects segmental hidden Markov models," in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 309–316, AUAI Press, Banff, Canada, July 2004.

[7] D. Zhang, W. Zuo, D. Zhang, and H. Zhang, "Time series classification using support vector machine with Gaussian elastic metric kernel," in Proceedings of the 20th International Conference on Pattern Recognition (ICPR '10), IEEE, Istanbul, Turkey, August 2010.

[8] M. Hüsken and P. Stagge, "Recurrent neural networks for time series classification," Neurocomputing, vol. 50, pp. 223–235, 2003.

[9] X. Xi, E. Keogh, C. Shelton, L. Wei, and C. A. Ratanamahatana, "Fast time series classification using numerosity reduction," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 1033–1040, ACM, Pittsburgh, Pennsylvania, USA, June 2006.

[10] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[11] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 609–616, ACM, Montreal, Quebec, Canada, June 2009.

[12] G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[13] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS '04), 2004.

[14] G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Modeling human motion using binary latent variables," in Proceedings of the International Conference on Neural Information Processing, 2006.

[15] J. Chao, F. Shen, and J. Zhao, "Forecasting exchange rate with deep belief networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN '11), IEEE, San Jose, CA, USA, August 2011.

[16] M. Längkvist, L. Karlsson, and A. Loutfi, "Sleep stage classification using unsupervised feature learning," Advances in Artificial Neural Systems, vol. 2012, Article ID 107046, 9 pages, 2012.

[17] J. B. Kruskal and M. Liberman, "The symmetric time warping problem: from continuous to discrete," in Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, 1983.

[18] G. A. ten Holt, M. J. Reinders, and E. Hendriks, "Multi-dimensional dynamic time warping for gesture recognition," in Proceedings of the Thirteenth Annual Conference of the Advanced School for Computing and Imaging, 2007.

[19] A. Rechtschaffen and A. Kales, A Manual of Standardized Terminology, Techniques and Scoring Systems for Sleep Stages of Human Subjects, 1968.

[20] Y. Chen et al., The UCR Time Series Classification Archive, http://www.cs.ucr.edu/~eamonn/time_series_data/, 2015.


Page 6: A Cycle Deep Belief Network Model for Multivariate ā€¦downloads.hindawi.com/journals/mpe/2017/9549323.pdfDataset CycleDBN DBN š¾NNED š¾NNDTW RNN uWaveGestureLibrary Z 0.333333

6 Mathematical Problems in Engineering

Table 3 Classification error rate of five models on PAMAP2

Subject Cycle DBN DBN 119870NN ED 119870NN DTW RNNSubject 101 570E minus 05 001208 0000171 003 0167081Subject 102 182119864 minus 05 0000675 0 0034 0036484Subject 104 0 0000197 393119864 minus 05 0006 0000995Subject 105 338119864 minus 05 0004746 0 0024 0101161Subject 106 182E minus 05 0001546 364119864 minus 05 0018 0Subject 107 0 0000438 399119864 minus 05 0024 0062604Subject 108 348119864 minus 05 0000869 174E minus 05 0018 00833

Table 4 Classification error rate of five models on ten UCR time series datasets

Dataset Cycle DBN DBN 119870NN ED 119870NN DTW RNNuWaveGestureLibrary Z 0333333 0330656 0350363 035 0419598UWaveGestureLibraryAll 0095047 007095 0051926 0052 0100503FordA 0489647 0487211 0341016 0341 0484865Two Patterns 0044365 0032374 009325 0090 010775ECG5000 0049161 0051559 0075111 0075 0072889Wafer 0000838 0002513 0004543 0005 0006976StarLightCurves 0069481 0070779 0151166 0151 0326979ElectricDevices 0029931 0338983 0449228 0376 09135InsectWingbeatSound 039782 0384196 0438384 0438 0408081Face (all) 0101333 0106667 0286391 0286 0193491

performance in terms of classification accuracy and timeconsuming

43 UCR Time Series Classification Besides the above twodata sets we also test our Cycle DBN on the ten distinct timeseries datasets from UCR time series [20] All the datasethas been split into training and testing by default The onlypreprocessing in our experiment is normalization and dividesthem into training validating and testing set

Table 4 shows the test error rate and a comprehensivecomparison with 119870NN ED 119870NN DTW RNN DBN andCycle DBN

Cycle DBN outperforms other four methods on fivedatasets of ten datasets 119870NN ED and 119870NN DTW achievebest performance on the same two datasets DBN achievesbest performance on two datasets Although the performanceof RNN is not prominent the effect is also acceptable

5 Conclusion

Time series classification is becomingmore andmore impor-tant in a broad range of real-world applications Howevermost existing methods have lower classification accuracy orneed domain knowledge to identify representative featuresin data In this paper we proposed a Cycle DBN for clas-sification of multivariate time series data in general LikeDBN Cycle DBN is an unsupervised learning algorithmwhich can discover the structure hidden in the data and learnrepresentations that aremore suitable as input to a supervisedmachine than the raw input Comparing with DBN the newmodel Cycle DBN predicts the label of time 119905 119910119905 not onlybased on the current input 119909119905 but also based on the label of

previous time 119910119905minus1 We evaluated our Cycle DBN model ontwelve real-world datasets and experimental results show thatour model outperforms DBN 119870NN ED 119870NN DTW andRNN on most datasets

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 51574232, 61379100, 61673196, and 61502212).

References

[1] L. Chen and M. S. Kamel, "Design of multiple classifier systems for time series data," Lecture Notes in Computer Science, pp. 216–225, 2005.

[2] I. Batal et al., "Multivariate time series classification with temporal abstractions," The Florida AI Research Society, vol. 22, 344 pages, 2009.

[3] J. J. Rodriguez, C. J. Alonso, and H. Bostrom, "Boosting interval based literals," Intelligent Data Analysis, vol. 5, no. 2, pp. 245–262, 2001.

[4] J. J. Rodríguez and C. J. Alonso, Interval and Dynamic Time Warping-Based Decision Trees, p. 548, 2004.

[5] A. Nanopoulos, R. Alcock, and Y. Manolopoulos, "Feature-based classification of time-series data," in Information Processing and Technology, M. Nikos and D. N. Stavros, Eds., pp. 49–61, Nova Science Publishers Inc., New York, USA, 2001.


[6] S. Kim, P. Smyth, and S. Luther, "Modeling waveform shapes with random effects segmental hidden Markov models," in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 309–316, AUAI Press, Banff, Canada, July 2004.

[7] D. Zhang, W. Zuo, D. Zhang, and H. Zhang, "Time series classification using support vector machine with Gaussian elastic metric kernel," in Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR '10), IEEE, Istanbul, Turkey, August 2010.

[8] M. Hüsken and P. Stagge, "Recurrent neural networks for time series classification," Neurocomputing, vol. 50, pp. 223–235, 2003.

[9] X. Xi, E. Keogh, C. Shelton, L. Wei, and C. A. Ratanamahatana, "Fast time series classification using numerosity reduction," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 1033–1040, ACM, Pittsburgh, Pennsylvania, USA, June 2006.

[10] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[11] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 609–616, ACM, Montreal, Quebec, Canada, June 2009.

[12] G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[13] M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS '04), 2004.

[14] G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Modeling human motion using binary latent variables," in Proceedings of the International Conference on Neural Information Processing, 2006.

[15] J. Chao, F. Shen, and J. Zhao, "Forecasting exchange rate with deep belief networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN '11), IEEE, San Jose, CA, USA, August 2011.

[16] M. Längkvist, L. Karlsson, and A. Loutfi, "Sleep stage classification using unsupervised feature learning," Advances in Artificial Neural Systems, vol. 2012, Article ID 107046, 9 pages, 2012.

[17] J. Kruskal and M. Liberman, "The symmetric time warping problem: from continuous to discrete," in Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, 1983.

[18] G. A. ten Holt, M. J. Reinders, and E. Hendriks, "Multi-dimensional dynamic time warping for gesture recognition," in Proceedings of the Thirteenth Annual Conference of the Advanced School for Computing and Imaging, 2007.

[19] A. Rechtschaffen and A. Kales, A Manual of Standardized Terminology, Techniques and Scoring Systems for Sleep Stages of Human Subjects, 1968.

[20] Y. Chen et al., The UCR Time Series Classification Archive, http://www.cs.ucr.edu/~eamonn/time_series_data/, 2015.
