
Classification of chaotic time series with deep learning

Nicolas Boullé^{a,∗}, Vassilios Dallas^a, Yuji Nakatsukasa^a, D. Samaddar^b

a Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK
b United Kingdom Atomic Energy Authority, Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, Oxon, OX14 3DB, UK

Abstract

We use standard deep neural networks to classify univariate time series generated by discrete and continuous dynamical systems based on their chaotic or non-chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynamical system with a basic or low-dimensional phase space and then use these networks to classify univariate time series of a dynamical system with more intricate or high-dimensional phase space. We illustrate this generalisation approach using the logistic map, the sine-circle map, the Lorenz system, and the Kuramoto–Sivashinsky equation. We observe that a convolutional neural network without batch normalization layers outperforms state-of-the-art neural networks for time series classification and is able to generalise and classify time series as chaotic or not with high accuracy.

Keywords: Dynamical systems, Chaos, Deep learning, Time series, Classification

1. Introduction

Data, and in particular time series, are generated from numerous observations and experiments across different scientific fields, such as atmospheric and oceanic sciences for climate predictions, nuclear fusion for control and safety, and biology and medicine for diagnosis. Fourier transforms, radial basis function approximation, and standard numerical techniques have been extensively applied to perform short and long term predictions of chaotic time series [1, 2, 3, 4]. On the other hand, the spectacular success of machine learning and deep learning techniques in image classification [5, 6], which have recently surpassed human-level performance on the ImageNet data set [7], has inspired the development of neural network techniques for time series forecasting [8, 9] and classification [10]. Recently, deep learning approaches have been used to solve partial differential equations in high dimensions [11, 12, 13] and identify hidden physics models from experimental data [14, 15, 16, 17].

∗ Corresponding author.
Email addresses: [email protected] (Nicolas Boullé), [email protected] (Vassilios Dallas), [email protected] (Yuji Nakatsukasa), [email protected] (D. Samaddar)

The size of the data sets is often large, and analysing these time series nowadays represents a major computational challenge and interest. For some of the most challenging real-life applications, the precise dynamical system is unknown, which makes the identification of the different dynamical regimes impossible. In that spirit, machine learning has recently been employed by Pathak et al. [18, 19] to perform model-free predictions of chaotic dynamical systems. Moreover, deep learning requires a large data set to adequately train the artificial neural network, which might not be available in some cases due to the infinite dimensional phase space of the system or experimental constraints.

In this paper, we address the aforementioned challenges by considering the following classification problem: given a univariate time series generated by a discrete or continuous dynamical system, can we determine whether the time series has a chaotic or non-chaotic behaviour? We choose to train neural networks on a different set than the testing set of interest in order to assess the ability of the machine learning algorithms to generalise the classification of univariate time series of a dynamical system with a basic or low-dimensional phase space to a more intricate or high-dimensional one.


Our aim is to demonstrate the generalisation ability of neural networks in this classification problem using standard deep learning models. The main challenge here is to learn the chaotic features of a training set, whose chaotic behaviour can be determined a priori using measures from dynamical systems theory, without overfitting, and generalise on a testing data set, which comes from another dynamical system, whose behaviour is different.

The paper is organised as follows. We briefly describe five different neural network architectures for time series classification in Section 2. Then, in Section 3, we classify signals generated by discrete dynamical systems and compare the accuracy of the neural networks. Finally, Section 4 consists of the classification of time series generated by the Lorenz system and the Kuramoto–Sivashinsky equation.

2. Neural networks for time series classification

Time series classification is one of the most challenging problems in machine learning [20], with a wide range of applications in human activity recognition [21], acoustic scene classification [22], and cybersecurity [23]. In this section, we describe five different architectures that we have considered for classifying time series generated by discrete and continuous dynamical systems.

First, we consider a simple shallow neural network (see Section 2.1) in order to compare it with state-of-the-art classifiers. The multilayer perceptron [10], presented in Section 2.2, is chosen because it is a fully connected deep neural network, which does not rely on convolutional layers. Convolutional neural networks were first introduced to perform handwritten digit recognition [24] and have been successfully applied to images and time series [5, 6, 25]. The next two neural networks (see Sections 2.3 and 2.4) are then based on convolutional layers and were chosen on the basis that they achieve the best performance on standard time series data sets according to the recent review by Fawaz et al. [26]. Finally, in Section 2.5, we consider a convolutional neural network for time series classification with a large kernel size to decrease the computational expense.

In what follows, the input of the neural network is a univariate time series X of size T (in practice we take T to be one thousand). Each of the time series is assigned a class label that we want to recover using the different neural networks: Class NC corresponds to a non-chaotic time series, while Class C corresponds to a chaotic time series. The neural networks presented in this section end with a softmax layer, which outputs a vector of probabilities $[p_{NC}, p_C]$, where $p_{NC}$ is the probability that the input time series is non-chaotic and $p_C = 1 - p_{NC}$ is the probability that it is chaotic. A time series is classified as non-chaotic if $p_{NC} \geq 0.5$ and chaotic otherwise. The performance of the neural networks is then assessed using the classification accuracy defined as

$$\text{Accuracy} = \frac{T_{NC} + T_C}{T_{NC} + T_C + F_{NC} + F_C} \times 100,$$

where $T_{NC}$ denotes true non-chaotic predictions, $T_C$ true chaotic predictions, $F_{NC}$ false non-chaotic predictions, and $F_C$ false chaotic predictions.

Analysing our results with metrics other than accuracy can be important, particularly when the testing data sets are not balanced. For this reason, we also considered metrics such as precision, recall [28, Chapt. 9], and balanced accuracy [29]. However, our findings were not influenced by the choice of metric, even though the values change. Thus, we choose to report only the values of the classification accuracy, which are neither better nor worse than those of the other metrics.

2.1. Shallow neural network

The first type of network considered in this section is the shallow neural network (ShallowNet), a simple and efficient network for fitting functions and performing pattern recognition. These networks differ from deep neural networks since they usually contain only one or two hidden layers. We use MATLAB's patternnet command from the Deep Learning Toolbox to define a network with one hidden layer, containing one hundred neurons, with the sigmoid as activation function. The network is trained with the scaled conjugate gradient algorithm [30] using MATLAB's default settings.
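As the MATLAB code is not reproduced here, the following is a rough Keras sketch of an equivalent one-hidden-layer network. Keras ships no scaled conjugate gradient optimiser, so Adam stands in for it; all names are ours.

```python
import tensorflow as tf

T = 1000  # length of the input time series

# One hidden layer of 100 sigmoid units, analogous to patternnet(100).
shallow_net = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="sigmoid", input_shape=(T,)),
    tf.keras.layers.Dense(2, activation="softmax"),  # outputs [p_NC, p_C]
])
shallow_net.compile(optimizer="adam",
                    loss="categorical_crossentropy",
                    metrics=["accuracy"])
```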

2.2. Multilayer perceptrons

Multilayer perceptrons (MLP) are standard deep neural network architectures in the field of machine learning and essentially consist of fully connected layers separated by a nonlinear activation function. Wang, Yan, and Oates [10] use a structure of three hidden layers of five hundred neurons, each followed by a rectified linear unit (ReLU), to perform time series classification (a Python implementation using TensorFlow is available in [31]).


Moreover, dropout is applied at the input, hidden, and softmax layers with rates 0.1, 0.2, and 0.3, respectively. The network is trained for 10 epochs with a batch size of 32 using the Adam algorithm [27]. We use the same parameters and optimisation algorithm to train the following three convolutional neural networks.
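As a sketch, this MLP could be assembled in Keras as follows; the placement of the dropout rates follows our reading of the description above, and the reference implementation remains the one in [31].

```python
import tensorflow as tf

T = 1000  # length of the input time series

mlp = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.1, input_shape=(T,)),   # input dropout
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dropout(0.2),                     # hidden dropout
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dropout(0.3),                     # before softmax
    tf.keras.layers.Dense(2, activation="softmax"),
])
mlp.compile(optimizer=tf.keras.optimizers.Adam(),
            loss="categorical_crossentropy", metrics=["accuracy"])
# mlp.fit(x_train, y_train, batch_size=32, epochs=10)
```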

2.3. Fully convolutional neural network

The fully convolutional neural network (FCN) architecture considered in [10] is a succession of three convolutional blocks, followed by a global average pooling layer [32] and a softmax layer. The first (resp. second, third) convolutional block that we consider is composed of a convolution layer of kernel size 8 (resp. 5, 3) with 64 (resp. 128, 64) feature channels, a batch normalization layer [33], and a ReLU activation layer. The fully convolutional neural network studied by [10] is implemented in [31].
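A minimal Keras sketch of this architecture (the "same" padding is our assumption; see [31] for the reference implementation):

```python
import tensorflow as tf

def conv_block(x, filters, kernel_size):
    # convolution -> batch normalization -> ReLU, as described above
    x = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.Activation("relu")(x)

T = 1000
inputs = tf.keras.layers.Input(shape=(T, 1))  # univariate time series
x = conv_block(inputs, 64, 8)
x = conv_block(x, 128, 5)
x = conv_block(x, 64, 3)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
fcn = tf.keras.Model(inputs, outputs)
```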

2.4. Residual network

The last network considered by Wang, Yan, and Oates [10] in the context of time series classification is a residual network (ResNet). Residual networks are examples of very deep neural networks and are designed by stacking the convolutional blocks arising in the FCN (see Section 2.3): three such blocks are assembled to form a residual block. Three residual blocks, with 64, 128, and 128 feature channels respectively, are then stacked and followed by a global average pooling layer and a softmax layer to output the classification of the different input time series. Reference [31] provides a practical Python implementation.
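A hedged Keras sketch of one way to assemble such a ResNet; the shortcut handling (1×1 convolutions to match channel counts) is a standard choice and our assumption, not a detail given in the text.

```python
import tensorflow as tf

def residual_block(x, filters):
    # Three convolutional blocks (kernel sizes 8, 5, 3); the shortcut is
    # added before the final ReLU activation.
    y = x
    for k in (8, 5, 3):
        y = tf.keras.layers.Conv1D(filters, k, padding="same")(y)
        y = tf.keras.layers.BatchNormalization()(y)
        if k != 3:
            y = tf.keras.layers.Activation("relu")(y)
    # Match channel counts on the shortcut path if needed.
    if x.shape[-1] != filters:
        x = tf.keras.layers.Conv1D(filters, 1, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.Activation("relu")(tf.keras.layers.add([x, y]))

T = 1000
inputs = tf.keras.layers.Input(shape=(T, 1))
x = residual_block(inputs, 64)
x = residual_block(x, 128)
x = residual_block(x, 128)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
resnet = tf.keras.Model(inputs, outputs)
```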

2.5. Large kernel convolutional neural network

We consider a standard convolutional neural network with large kernel size (LKCNN). The network is composed of two convolutional layers with five feature channels and a kernel size of one hundred, followed by ReLU activations, a maximum pooling layer with a pool size of two, a flatten layer, and two fully connected layers of respective sizes one hundred and two. The ReLU activation function is chosen because it is easy to optimise due to its piecewise linearity [34, Chap. 6]. Moreover, a dropout unit with rate 0.5 is added after the maximum pooling layer to improve the generalisation ability of the network [35]. The architecture of the network is shown in Figure 1.
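A minimal Keras sketch of the LKCNN as described above; the padding and the activation of the first dense layer are our assumptions.

```python
import tensorflow as tf

T = 1000  # length of the input time series

lkcnn = tf.keras.Sequential([
    tf.keras.layers.Conv1D(5, 100, activation="relu",
                           input_shape=(T, 1)),      # 5 channels, kernel 100
    tf.keras.layers.Conv1D(5, 100, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Dropout(0.5),                    # after max pooling [35]
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # outputs [p_NC, p_C]
])
lkcnn.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```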

Standard implementations of convolutional neural networks usually consider a higher number of feature channels and a much smaller kernel size [10, 26]. However, we observed that the performance of this network on the data sets presented in the following sections was not affected by the kernel size or the number of feature channels. Therefore, we choose a large kernel size in order to reduce the number of trainable parameters and thus the computational expense of this network. Similarly, we verified that increasing the number of convolutional layers from two to three or four does not significantly improve the performance of LKCNN overall.

The main difference that determines the classification accuracy between LKCNN, FCN, and ResNet is that FCN and ResNet have batch normalization layers. Batch normalization was introduced to speed up the optimisation algorithm at the training phase by normalising the training input data at the internal layers [33].

In this work, we study the generalisation ability from a training set to a testing set whose time series span different ranges of values. For this reason, we normalize both the training set and the testing set (see details in the following sections) so that the values vary within the same range. Note that the scaling and shifting parameters of the batch normalisation layers are determined in the training phase [33]. Therefore, applying batch normalisation to a data set with different mean and variance would be wrong. In our problem this happens with the testing data, which can have a different mean and variance from the training data after the convolutional layers. This explains the lack of performance of the FCN and ResNet on the testing data sets, as we will show in the following sections.

3. Discrete dynamical systems

In this section we consider two discrete dynamical systems called the logistic map and the sine-circle map. The first one is the logistic map, popularized by Robert May [36], which is defined by the sequence

$$x_{n+1} = \mu x_n (1 - x_n), \qquad x_0 = 0.5, \qquad (1)$$

where µ is the bifurcation parameter varying between zero and four. This system exhibits periodic or chaotic behaviour depending on the value of µ.
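For illustration, the following Python sketch generates such signals (discarding an initial transient is optional and our choice):

```python
import numpy as np

def logistic_series(mu, n=1000, x0=0.5, transient=0):
    """Iterate x_{n+1} = mu * x_n * (1 - x_n) and return n samples."""
    x = x0
    out = np.empty(n)
    for i in range(transient + n):
        x = mu * x * (1.0 - x)
        if i >= transient:
            out[i - transient] = x
    return out

periodic = logistic_series(3.5)  # periodic regime (cf. Figure 2, top)
chaotic = logistic_series(3.8)   # chaotic regime (cf. Figure 2, bottom)
```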


Figure 1: The large kernel convolutional neural network (LKCNN) architecture for time series classification: an input time series passes through convolution layers, maximum pooling, and fully connected layers to the 2 output classes. Figure adapted from [26].

Figure 2: A periodic (top) and a chaotic (bottom) signal of the logistic map of size two hundred with µ = 3.5 and µ = 3.8, respectively.

Periodic and chaotic signals of the logistic map are plotted in Figure 2.

The bifurcation diagram showing the orbits of the logistic map is represented in Figure 3 (top). The behaviour of the attractors for different parameters µ has been extensively studied [37, Chap. 10], and a highlight is the period-doubling cascade happening for µ ∈ [0, 3.54409].

Figure 3: Bifurcation diagrams of the logistic map (top) and the sine-circle map (bottom).

The second dynamical system considered in this section is the sine-circle map [38, Chap. 6], which is sometimes referred to as the circle map. It takes the form of the following nonlinear map

$$\theta_{n+1} = \theta_n + \Omega - \frac{\mu}{2\pi} \sin(2\pi \theta_n) \mod 1, \qquad \theta_0 = 0.5, \qquad (2)$$

where Ω = 0.606661 and µ ∈ [0, 5] is the parameter that measures the strength of the nonlinearity. Similarly to the logistic map, iterating Equation (2) leads to periodic or chaotic signals depending on the bifurcation parameter µ chosen. Figure 4 illustrates two signals with different behaviours, generated using a bifurcation parameter of µ = 2.1 (top) and µ = 2.3 (bottom). The bottom panel of Figure 3 shows the bifurcation diagram of the sine-circle map.
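A companion Python sketch for generating sine-circle signals:

```python
import numpy as np

def sine_circle_series(mu, omega=0.606661, n=1000, theta0=0.5):
    """Iterate the sine-circle map, Equation (2), modulo 1."""
    theta = theta0
    out = np.empty(n)
    for i in range(n):
        theta = (theta + omega
                 - mu / (2.0 * np.pi) * np.sin(2.0 * np.pi * theta)) % 1.0
        out[i] = theta
    return out

periodic = sine_circle_series(2.1)  # cf. Figure 4, top
chaotic = sine_circle_series(2.3)   # cf. Figure 4, bottom
```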

Figure 4: A periodic (top) and a chaotic (bottom) signal of the sine-circle map of size two hundred with µ = 2.1 and µ = 2.3, respectively.

We now want to classify signals generated by the logistic and sine-circle maps according to their chaotic and non-chaotic behaviour. Our main goal, and challenge, is to find a neural network that is able to learn the features characterising chaotic signals of the logistic map and generalise on signals generated by the sine-circle map. To do this, we generate two data sets by computing signals of length one thousand of the logistic and sine-circle maps. This is done for five thousand different values of the parameter µ, uniformly distributed in the interval [0, 5]. The logistic (resp. sine-circle) data set is then composed of 24% (resp. 35%) chaotic time series. First, we randomize the logistic map data set across the bifurcation parameter and choose two thirds of the data to be the training set. Then, with the rest (one third) of the logistic map data set, we test the training of the five neural networks described in Section 2. Note that to classify the time series of the sine-circle data set we use these five neural networks, which have been trained on the logistic map training set.

Figure 5: Lyapunov exponents (top) and Shannon entropy (bottom) of the logistic map. A time series is chaotic if its Lyapunov exponent is greater than zero and its entropy greater than 0.75. The horizontal black lines in the plots indicate these thresholds.

The classification of the time series is done using two measures from dynamical systems theory. The first measure is the Lyapunov exponent, which is defined as

$$\lambda = \lim_{n \to +\infty} \frac{1}{n} \sum_{i=0}^{n-1} \log |f'(x_i)| \qquad (3)$$

for a discrete dynamical system $x_{n+1} = f(x_n)$ and expresses the exponential separation, viz. $d(t) = d_0 e^{\lambda t}$, of two nearby trajectories originally separated by a distance $d_0 = \varepsilon \ll 1$ at time $t = 0$.
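A minimal Python estimate of Equation (3) for the logistic map, where $f'(x) = \mu(1 - 2x)$; the number of iterates is our choice, and the orbit is advanced before the first logarithm is accumulated since $f'(x_0) = 0$ for $x_0 = 0.5$:

```python
import numpy as np

def lyapunov_logistic(mu, n=100000, x0=0.5):
    """Estimate the Lyapunov exponent of the logistic map, Equation (3)."""
    x = x0
    total = 0.0
    for _ in range(n):
        x = mu * x * (1.0 - x)                 # advance the orbit first
        total += np.log(abs(mu * (1.0 - 2.0 * x)))  # log |f'(x)|
    return total / n

print(lyapunov_logistic(3.5))  # negative: periodic regime
print(lyapunov_logistic(3.8))  # positive: chaotic regime
```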


The second measure is the Shannon entropy, which uses the probability distribution function of a trajectory to quantify the range of accessible states for a dynamical system and relates to the potential topological transitivity of the system [38, Chap. 9]. Hence, we expect a chaotic system to have well distributed trajectories in space compared to a periodic one, and the aim is then to count the number of accessible states for the system. Thus, we define the Shannon entropy of a time series $x_n$ to be

$$S_N = -\frac{1}{\log(N)} \sum_r p_r \log(p_r), \qquad (4)$$

where $p_r = \frac{1}{N} \#\{x_i = r \mid 1 \le i \le N\}$ is the probability for the system to be in the state r and N is the number of sample points. Here, the entropy $S_N$ has been normalized to lie in [0, 1], so that the entropy of a constant signal $S_N \to 0$, while the entropy of a chaotic time series $S_N \to 1$.
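A Python sketch of Equation (4); identifying the exact states of the orbit is approximated here by rounding the values, which is our choice:

```python
import numpy as np

def shannon_entropy(x, decimals=6):
    """Normalized Shannon entropy, Equation (4). The states r are taken
    to be the distinct values of the series (rounded for robustness)."""
    _, counts = np.unique(np.round(x, decimals), return_counts=True)
    p = counts / counts.sum()
    # A chaotic orbit visits ~N distinct states, so S_N -> 1; a periodic
    # orbit of period p gives S_N = log(p)/log(N), which is small.
    return -np.sum(p * np.log(p)) / np.log(len(x))
```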

We classify a given signal as chaotic when its Lyapunov exponent is strictly positive and its entropy is greater than a given threshold, experimentally set at 0.75 (see also Figure 5), and non-chaotic otherwise. It is crucial to classify the training data set accurately in order to reduce misclassifications on the testing set. On that note, by using the Shannon entropy in addition to the Lyapunov exponent as a measure of chaos we gained an incremental improvement in accuracy, because the Lyapunov exponent alone was misclassifying some quasi-periodic signals as chaotic.

The Lyapunov exponent and the Shannon entropy of the logistic map as a function of the bifurcation parameter µ are illustrated in Figure 5. In real applications, computing these quantities over the whole range of parameters, and in some cases without knowing the expression of the underlying dynamical system, can be unfeasible or computationally expensive, which justifies the approach of using a machine learning algorithm to perform the classification automatically.

The average classification accuracy of the neural networks ShallowNet, MLP, FCN, ResNet, and LKCNN is reported in Table 1. The ShallowNet, MLP, FCN, and ResNet architectures classify signals from the sine-circle map with an accuracy of less than 65%. The LKCNN network, however, seems to overcome overfitting issues on the training set by capturing the main features of chaotic and periodic signals, and reaches an average classification accuracy of 89.8%. It is of interest to notice that the shallow neural network reaches an accuracy greater than state-of-the-art time series classification networks on the sine-circle data set despite its simplicity. Improving the accuracy of LKCNN on the sine-circle map might be challenging because this dynamical system leads to signals with behaviour that is absent in the training set of the logistic map (see e.g. the regime µ ∈ [1, 1.3] in Figure 3 (bottom)).

4. Continuous dynamical systems

We now consider continuous dynamical systems of ordinary and partial differential equations that exhibit temporal and spatiotemporal chaos, respectively. The aim here is to determine whether a neural network trained on a low-dimensional dynamical system is able to generalise and classify univariate time series generated by a higher dimensional dynamical system. We first consider the Lorenz system, since it is one of the most typical continuous dynamical systems with chaotic behaviour and has been widely studied in the twentieth century [39].

4.1. Lorenz system

The Lorenz system [40] consists of the following three ordinary differential equations:

$$\dot{x} = \sigma(y - x), \qquad (5a)$$
$$\dot{y} = x(\rho - z) - y, \qquad (5b)$$
$$\dot{z} = xy - \beta z. \qquad (5c)$$

Taking the parameters σ = 10, β = 8/3, and varying ρ in [0, 250] yields convergent, periodic, and chaotic solutions. We numerically solve Equation (5) using MATLAB's function ode45 with [x, y, z] = [1, 1, 1] as initial condition. Integrating the equations for t ∈ [0, 100], we obtain time series for x(t), y(t), and z(t) of length one thousand, and we carry out this operation for five thousand values of the bifurcation parameter ρ in the range [0, 250].

The time series x(t), y(t), and z(t) are normalized by the linear transformation $x(t) \mapsto (x(t) - m)/(M - m)$, where M and m are respectively the maximum and minimum of the time series, such that their range lies in the interval [0, 1] (see the time series in Figure 6 for ρ = 70). Note that normalising the time series is crucial to obtain good generalisation performance because the x, y, and z time series do not have the same range of values. Figure 7 depicts four time series of the variable x(t) generated by numerically solving Equation (5) for ρ = 15, 28, 160, and 180.
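A Python sketch of this pipeline, using scipy's solve_ivp in place of MATLAB's ode45; the solver tolerances are left at their defaults, which is our simplification:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz_series(rho, sigma=10.0, beta=8.0 / 3.0, n=1000, t_end=100.0):
    """Integrate Equation (5) and return normalized x, y, z time series."""
    def rhs(t, u):
        x, y, z = u
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

    t_eval = np.linspace(0.0, t_end, n)
    sol = solve_ivp(rhs, (0.0, t_end), [1.0, 1.0, 1.0], t_eval=t_eval)

    def normalize(s):
        # linear transformation s -> (s - m)/(M - m) onto [0, 1]
        return (s - s.min()) / (s.max() - s.min())

    return [normalize(s) for s in sol.y]

x, y, z = lorenz_series(rho=70.0)  # cf. Figure 6
```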

Table 1: Classification accuracy on the logistic and sine-circle map data sets. The neural networks are trained on logistic signals and the accuracy is averaged over five training cycles.

Networks     ShallowNet  MLP   FCN   ResNet  LKCNN
Logistic     99.5        83.4  95.3  96.7    98.8
Sine-circle  64.9        60.2  54.0  44.8    89.8

Figure 6: Normalized time series of the x, y, and z components of the Lorenz system with bifurcation parameter ρ = 70.

We classify the time series of the Lorenz system as chaotic or non-chaotic according to the sign of the Lyapunov exponent at the corresponding regimes of the bifurcation parameter ρ in order to generate training and testing data sets for the neural networks. Here, we compute the Lyapunov exponents for the time series of the variable x(t) starting from some initial condition x(0) as follows:

$$\lambda = \lim_{t \to +\infty} \lim_{\varepsilon \to 0} \frac{1}{t} \log\left(\frac{|x(t) - x_\varepsilon(t)|}{\varepsilon}\right), \qquad (6)$$

where $|x(0) - x_\varepsilon(0)| < \varepsilon \ll 1$. Figure 8 shows the Lyapunov exponents of the variable x(t), which determine the classification of the testing set of time series given to the neural networks described in Section 2. For example, the chaotic time series plotted in Figure 7 (d) corresponds to a bifurcation parameter of ρ = 180 and has a strictly positive Lyapunov exponent, as shown in Figure 8. For continuous dynamical systems the Shannon entropy did not appear to be a precise measure of chaotic behaviour. In fact, the Shannon entropy requires a threshold to determine if a given time series is chaotic or not. Setting a good threshold might not always be possible and requires a lot of experimentation. Moreover, in this case, the distribution of values of the time series is continuous, giving a broad probability distribution function, which makes the task of choosing a threshold troublesome. Therefore, we do not consider it to classify the time series of the Lorenz system.
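A crude single-pass Python sketch of Equation (6); a careful computation would repeatedly renormalise the separation (e.g. the Benettin algorithm), so this version only aims to recover the sign of λ:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lyapunov_lorenz(rho, eps=1e-8, t_end=50.0, sigma=10.0, beta=8.0 / 3.0):
    """Rough estimate of Equation (6): integrate two trajectories whose
    initial x components differ by eps and measure their separation.
    For chaotic orbits the separation saturates at the attractor size,
    so the magnitude is biased, but the sign is informative."""
    def rhs(t, u):
        x, y, z = u
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

    sol_a = solve_ivp(rhs, (0.0, t_end), [1.0, 1.0, 1.0],
                      dense_output=True)
    sol_b = solve_ivp(rhs, (0.0, t_end), [1.0 + eps, 1.0, 1.0],
                      dense_output=True)
    d = abs(sol_a.sol(t_end)[0] - sol_b.sol(t_end)[0])
    return np.log(d / eps) / t_end
```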

The different neural networks are trained on time series of the x component of the Lorenz system and tested on the y and z components. These three data sets are composed of two thirds chaotic time series. Similarly to the logistic map data set (see Section 3), the x component set is divided in the following way: two thirds for training and one third for testing. We then compare the classification accuracy of the networks described in Section 2 on the two data sets. The results are presented in Table 2.

Figure 7: Normalized time series of the x component of the Lorenz system with bifurcation parameter ρ = 15 (a), 28 (b), 160 (c), and 180 (d).

Figure 8: Lyapunov exponents of the x(t) component of the Lorenz system for σ = 10, β = 8/3, and ρ ∈ [0, 250]. A positive Lyapunov exponent (points above the horizontal black line) indicates a chaotic solution to the Lorenz equations.

The convolutional neural network LKCNN (see Section 2.5) outperforms the other networks on all the testing sets composed of time series of the x, y, and z components of the Lorenz system. In particular, it is able to generalise well on the z component, correctly determining whether a given time series is chaotic or not with an average accuracy of 78.2%. The other neural networks seem to overfit the training set and fail to classify time series of the z component correctly. Note that the y component of the Lorenz system is highly correlated with the x component, unlike the z component (see Figure 6), which explains the relatively good classification accuracy (around 75%) of all the neural networks on the y component.

4.2. Kuramoto–Sivashinsky equation

In this section, we consider the Kuramoto–Sivashinsky (KS) equation, a fourth-order nonlinear partial differential equation which exhibits spatiotemporal chaos. This equation was originally derived by Kuramoto [41, 42, 43] and Sivashinsky [44, 45, 46] to model instabilities in laminar flame fronts, and it arises in a wide range of physical problems such as plasma physics [43], flame propagation [44], and free surface film flows [47, 48, 49]. In particular, we study the Kuramoto–Sivashinsky system normalized to the interval [0, 2π]:

$$u_t + 4u_{xxxx} + \alpha \left[ u_{xx} + \frac{1}{2}(u_x)^2 \right] = 0, \qquad u(x, 0) = u_0(x), \qquad u(x + 2\pi, t) = u(x, t), \qquad (7)$$

where x ∈ [0, 2π], t ∈ ℝ+, and α is the bifurcation parameter.

Table 2: Classification accuracy on the Lorenz system. The networks are trained on the x component of the Lorenz system and the accuracy is averaged over five training cycles.

Networks  ShallowNet  MLP   FCN   ResNet  LKCNN
Lorenz X  98.5        90.2  80.3  88.8    98.6
Lorenz Y  75.9        75.5  74.6  73.6    95.4
Lorenz Z  58.2        54.9  65.5  45.8    78.2

We refer to the study of the attractors by Hyman and Nicolaenko [50] and follow the approach of Papageorgiou and Smyrlis [51, 52] by considering the initial condition $u_0(x) = -\sin(x)$ to ensure that the integral of the solution over the spatial domain vanishes. Varying the bifurcation parameter α in Equation (7) yields a wide range of attracting solutions, such as periodic, bimodal, travelling wave, or chaotic, numerically studied in [50].

We discretise Equation (7) spatially using the Fourier spectral method with the 2/3 dealiasing rule [53] and temporally using the ETDRK4 scheme of Cox and Matthews [54]. We use the stiff partial differential equation integrator [55] in the Chebfun software [56] with a spectral resolution of 512 and a time step of $2.5 \times 10^{-4}$ to numerically solve Equation (7) for t ∈ [0, 10]. The regimes we considered are listed below, based on the values of the bifurcation parameter α:

1. One hundred values of α are uniformly distributed in each of the following intervals: [18, 22], [23, 33], [43, 45], [56, 65], [95, 115]. These intervals are chosen to cover a wide range of behaviours according to [50].

2. Five hundred values of α are uniformly distributed in [120, 130].

This leads to a data set of one thousand realisations, equally divided between chaotic and non-chaotic behaviour.

Figure 9 shows oscillatory solutions to the KS equation for α = 20 (a) and 44 (b), and a quadrimodal solution (c). A chaotic solution to the KS equation is depicted in Figure 9 (d). This spatiotemporal chaotic behaviour is hard to analyse because of the high dimensionality of the system, i.e. the large number of Fourier modes of the solutions. Thus, we analyse the behaviour of the solutions by considering the energy time series

$$E(t) = \int_0^{2\pi} u(x, t)^2 \, dx, \qquad (8)$$

normalized by the transformation $E(t) \mapsto (E(t) - m)/(M - m)$, where

$$m = \min_{t \in [0, 10]} \int_0^{2\pi} u(x, t)^2 \, dx, \qquad M = \max_{t \in [0, 10]} \int_0^{2\pi} u(x, t)^2 \, dx,$$

such that it lies in the interval [0, 1]. The normalized energy time series of the solutions to the KS equation for α = 20, 44, 100, and 125 are plotted in Figure 9 (e) to (h), respectively. These figures illustrate the relation between E(t) and the behaviour of the solution u(x, t).
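Given an array of solution snapshots, the normalized energy time series of Equation (8) can be sketched in Python as:

```python
import numpy as np
from scipy.integrate import trapezoid

def normalized_energy(u, x):
    """Energy time series E(t) of Equation (8), from snapshots
    u[k, :] = u(x, t_k) on the grid x, approximated by the trapezoidal
    rule and rescaled onto [0, 1]."""
    E = trapezoid(u**2, x, axis=1)
    return (E - E.min()) / (E.max() - E.min())
```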

Similarly to Section 4.1, we train the large kernel convolutional neural network described in Section 2.5 on the x component of the Lorenz system and test it on the time series from the data set of the KS equation described above. The global accuracy that we obtain in classifying the time series between chaotic and non-chaotic is 94.4%. The accuracy for the different classes of attracting solutions in the testing set is reported in Table 3.

Table 3: Classification results for the energy time series of the KS equation for various α. The neural network LKCNN is trained on the x component of the Lorenz system. The classification accuracy is reported for the different intervals of α composing the data set and averaged over five training cycles.

Range of α  Solution behaviour  Accuracy
[18, 22]    periodic            54.8
[23, 33]    bimodal             96.4
[43, 45]    periodic            99.2
[56, 65]    trimodal            95.4
[95, 115]   quadrimodal         99.6
[120, 130]  chaotic             99.8

Figure 9: Solutions to the KS equation with α = 20 (a), 44 (b), 100 (c), and 125 (d). The right panels show the corresponding normalized energy E(t). The chaotic solution depicted in (d) has been zoomed to t ∈ [0.9, 1]. (c) and (g) illustrate the transient regime of the solution for t ∈ [0, 0.5] before convergence to the global quadrimodal attractor.


We observe that the LKCNN correctly classifies time series of bimodal, highly oscillatory, trimodal, and quadrimodal solutions, corresponding to α ∈ [23, 33], [43, 45], [56, 65], and [95, 115], as non-chaotic with accuracies of 96.4%, 99.2%, 95.4%, and 99.6%, respectively. Moreover, the network achieves 99.8% accuracy on the set of chaotic time series. However, the energy time series of low-frequency periodic solutions to the Kuramoto–Sivashinsky equation for α ∈ [18, 22] are misclassified by the neural network, since only 54.8% of them are identified as non-chaotic. We expect this misclassification to be due to qualitative differences between the corresponding energy time series of the KS equation and the periodic time series of the Lorenz system. In particular, the KS data set contains periodic time series with low frequency oscillations in this regime (see Figure 9 (b)), while the Lorenz system generates periodic time series with high frequency oscillations (see Figure 7 (c)). The neural network is then unable to classify features that are not present in the training set and hence fails to generalise to the low frequency periodic time series of the Kuramoto–Sivashinsky equation.

However, we identified two regimes (i.e. ρ ∈ [140, 160] and ρ ∈ [200, 220]) for which the z component time series of the Lorenz system are low frequency. Therefore, we trained LKCNN on the z component of the Lorenz system over the whole range of parameters ρ ∈ [0, 250] and tested it on the KS system. We then find that the low frequency time series of the KS system (corresponding to α ∈ [18, 22]) are classified correctly with an accuracy of 95.6% (instead of 54.8% when the network is trained on the x component of the Lorenz system). Moreover, the overall accuracy improves to 98.7% from 94.4%.

4.3. Accuracy dependence on the training data set size and the time series length

We test the robustness of the neural network LKCNN on the classification problem of the KS equation by studying how the accuracy depends on the size of the training data set and the length of the time series. Remember that the network is trained on time series of the x component of the Lorenz system of the same length, whose chaotic classification is obtained using the Lyapunov exponent.

Figure 10 shows the effect of the size of the training data set on the classification accuracy of LKCNN. We observe that the neural network achieves a median accuracy greater than 90% when the amount of training data available is above 10%. Note that this 10% corresponds to five hundred time series from the whole data set of the Lorenz system. This high accuracy is due to the accurate classification of the neural network on most regimes of the time series of the data set (see Table 3). It is of interest to notice that, above 10% of the size of the training set, the accuracy is affected only slightly, and only for the first quartile. The fluctuations of the accuracy in Figure 10 are explained by the difficulty of classifying the low frequency time series of the KS data set.

Figure 10: Classification accuracy on the KS data set versus the amount of training data (percentage of the five thousand realisations). We show the first quartile, the median, and the third quartile of the accuracy obtained over 20 training cycles of the LKCNN neural network on time series of the x component of the Lorenz system.

In Figure 11, we analyse the classification ability of LKCNN on shorter time series. It is interesting that, for all the lengths of the time series we considered, LKCNN reaches a median accuracy greater than 90% on the KS problem. This indicates that the sign of the Lyapunov exponent is determined as accurately for short time series of length three hundred of the training set as for time series of length one thousand. Consequently, this result suggests that the Lyapunov exponent is a good measure of chaotic behaviour in the Lorenz system. The fact that we train and test the network on time series of the same length might also be a justification of this high accuracy. On the other hand, the fluctuations of the first quartile show the inability of the network to classify the time series of the KS data set in a few cases. Note that we have not considered time series of length less than three hundred (or 30% of the time series length), because LKCNN includes two convolutional layers of kernel size one hundred.

Overall, our results show the robustness of LKCNN on this problem. In particular, this network is able to generalise well on time series generated by the KS equation and achieves a median classification accuracy greater than 90%, independently of the size of the training data set or the length of the time series.


Figure 11: Classification accuracy on the KS time series with respect to the length of the time series (percentage of the one thousand timesteps). We show the first quartile, the median, and the third quartile of the accuracy obtained over 20 training cycles of the LKCNN neural network on time series of the x component of the Lorenz system.

Conclusions

For some of the most challenging real-life applications, the expression of a precise underlying dynamical system is unknown, or the phase space of the system is infinite dimensional, which makes the identification of the different dynamical regimes unfeasible or, in the best case scenario, computationally expensive. For this reason, in this study we have introduced a deep learning approach for classifying univariate time series generated by discrete and continuous dynamical systems. Our approach is to train standard neural networks on a given dynamical system with a basic or low-dimensional phase space and generalise by using this network to classify univariate time series of a dynamical system with a more intricate or high-dimensional phase space.

The convolutional neural network with large kernel size (LKCNN) is able to learn the chaotic features of these dynamical systems and classify their time series with high accuracy. On the other hand, state-of-the-art neural networks tend to overfit the training data set and do not perform as well. In detail, our generalisation approach has been applied to classify univariate time series generated by the sine-circle map and the Kuramoto–Sivashinsky equation, using the logistic map and the Lorenz system as training data sets, respectively. We observed a classification accuracy greater than 80% on the sine-circle map and more than 90% on the KS equation. This good performance can be justified by the fact that there are no batch normalisation layers in this network, so that, in contrast with FCN and ResNet, we avoid rescaling the testing data with the wrong normalisation parameters. Moreover, we observed that the classification accuracy on the KS time series increased further when low frequency time series were involved in the training phase. We also found that LKCNN is able to generalise with high classification accuracy even when the size of the training data set or the length of the time series is very small. This demonstrates the robustness of this neural network on this problem.

Finally, our study suggests that deep learning techniques, which can generalise the knowledge acquired from a training set to a different testing set, can be valuable for classifying time series obtained from real-life applications into chaotic and non-chaotic. There are many directions in which the present results can be pursued further. First of all, attempting to classify time series obtained from real-life applications is crucial. In that respect, the effect of noise in the training and testing data sets is an important aspect to consider, in particular its influence on the accuracy with which the networks classify the time series. We hope to address such an analysis in our future work.

Acknowledgments

The authors would like to thank Samuel Cohen for useful discussions and Nicos Pavlidis for providing useful comments on an early draft of this manuscript. We thank the two anonymous reviewers for their valuable comments and suggestions that improved our manuscript. This work was supported by the EPSRC Centre For Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with Culham Centre for Fusion Energy. This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 and 2019-2020 under grant agreement No 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission.

References

[1] M. Casdagli, Nonlinear prediction of chaotic time series, Physica D 35 (3) (1989) 335–356.
[2] Y.-C. Lai, N. Ye, Recent developments in chaotic time series analysis, Int. J. Bifurcat. Chaos 13 (6) (2003) 1383–1422.
[3] S. Mukherjee, E. Osuna, F. Girosi, Nonlinear prediction of chaotic time series using support vector machines, in: Proceedings of the 1997 IEEE Signal Processing Society Workshop, IEEE, 1997, pp. 511–520.
[4] G. Sugihara, Nonlinear forecasting for the classification of natural time series, Philos. T. R. Soc. A 348 (1688) (1994) 477–495.
[5] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, NIPS, 2012, pp. 1097–1105.
[6] Y. LeCun, Y. Bengio, The handbook of brain theory and neural networks, MIT Press, 1998.
[7] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, in: Proceedings of the IEEE International Conference on Computer Vision, IEEE, 2015, pp. 1026–1034.
[8] J. B. Elsner, A. A. Tsonis, Nonlinear prediction, chaos, and noise, B. Am. Meteorol. Soc. 73 (1) (1992) 49–60.
[9] T. Kuremoto, S. Kimura, K. Kobayashi, M. Obayashi, Time series forecasting using a deep belief network with restricted Boltzmann machines, Neurocomputing 137 (2014) 47–56.
[10] Z. Wang, W. Yan, T. Oates, Time series classification from scratch with deep neural networks: A strong baseline, in: International Joint Conference on Neural Networks, IEEE, 2017, pp. 1578–1585.
[11] J. Han, A. Jentzen, E. Weinan, Solving high-dimensional partial differential equations using deep learning, Proc. Nat. Acad. Sci. 115 (34) (2018) 8505–8510.
[12] J. Sirignano, K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, J. Comput. Phys. 375 (2018) 1339–1364.
[13] E. Weinan, B. Yu, The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat. 6 (1) (2018) 1–12.
[14] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, J. Mach. Learn. Res. 19 (1) (2018) 932–955.
[15] M. Raissi, G. E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations, J. Comput. Phys. 357 (2018) 125–141.
[16] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys. 378 (2019) 686–707.
[17] S. H. Rudy, S. L. Brunton, J. L. Proctor, J. N. Kutz, Data-driven discovery of partial differential equations, Sci. Adv. 3 (4) (2017) e1602614.
[18] J. Pathak, B. Hunt, M. Girvan, Z. Lu, E. Ott, Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach, Phys. Rev. Lett. 120 (2) (2018) 024102.
[19] J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, E. Ott, Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data, Chaos 27 (12) (2017) 121102.
[20] P. Esling, C. Agon, Time-series data mining, ACM Comput. Surv. 45 (1) (2012) 12.
[21] H. F. Nweke, Y. W. Teh, M. A. Al-Garadi, U. R. Alo, Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: State of the art and research challenges, Expert Syst. Appl. 105 (2018) 233–261.
[22] T. L. Nwe, T. H. Dat, B. Ma, Convolutional neural network with multi-task learning scheme for acoustic scene classification, in: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, IEEE, 2017, pp. 1347–1350.
[23] G. A. Susto, A. Cenedese, M. Terzi, Time-series classification methods: review and applications to power systems data, in: Big Data Application in Power Systems, Elsevier, 2018, pp. 179–220.
[24] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Comput. 1 (4) (1989) 541–551.
[25] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, P. IEEE 86 (11) (1998) 2278–2324.
[26] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, P.-A. Muller, Deep learning for time series classification: a review, Data Min. Knowl. Disc. (2019) 1–47.
[27] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
[28] D. L. Olson, D. Delen, Advanced data mining techniques, Springer Science & Business Media, 2008.
[29] K. H. Brodersen, C. S. Ong, K. E. Stephan, J. M. Buhmann, The balanced accuracy and its posterior distribution, in: 20th International Conference on Pattern Recognition, IEEE, 2010, pp. 3121–3124.
[30] M. F. Møller, A scaled conjugate gradient algorithm for fast supervised learning, Neural Networks 6 (4) (1993) 525–533.
[31] Z. Wang, GitHub repository, https://github.com/cauchyturing/UCR_Time_Series_Classification_Deep_Learning_Baseline (2019).
[32] M. Lin, Q. Chen, S. Yan, Network in network, arXiv preprint arXiv:1312.4400.
[33] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Proceedings of the 32nd International Conference on Machine Learning, PMLR, 2015, pp. 448–456.
[34] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[35] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (1) (2014) 1929–1958.
[36] R. M. May, Simple mathematical models with very complicated dynamics, Nature 261 (1976) 459–467.
[37] S. H. Strogatz, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Westview Press, 2014.
[38] R. C. Hilborn, Chaos and Nonlinear Dynamics: An Introduction for Scientists and Engineers, Oxford University Press, 2000.
[39] C. Sparrow, The Lorenz Equations: Bifurcations, Chaos, and Strange Attractors, Vol. 41, Springer Science & Business Media, 2012.
[40] E. N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci. 20 (2) (1963) 130–141.
[41] Y. Kuramoto, Diffusion-induced chaos in reaction systems, Prog. Theor. Phys. Supp. 64 (1978) 346–367.
[42] Y. Kuramoto, T. Tsuzuki, On the formation of dissipative structures in reaction-diffusion systems: Reductive perturbation approach, Prog. Theor. Phys. 54 (3) (1975) 687–699.
[43] Y. Kuramoto, T. Tsuzuki, Persistent propagation of concentration waves in dissipative media far from thermal equilibrium, Prog. Theor. Phys. 55 (2) (1976) 356–369.
[44] G. I. Sivashinsky, Nonlinear analysis of hydrodynamic instability in laminar flames–I. Derivation of basic equations, Acta Astronaut. 4 (11-12) (1977) 1177–1206.
[45] G. I. Sivashinsky, On flame propagation under conditions of stoichiometry, SIAM J. Appl. Math. 39 (1) (1980) 67–82.
[46] G. I. Sivashinsky, Instabilities, pattern formation, and turbulence in flames, Annu. Rev. Fluid Mech. 15 (1) (1983) 179–199.
[47] D. J. Benney, Long waves on liquid films, J. Math. Phys. 45 (1-4) (1966) 150–155.
[48] A. P. Hooper, R. Grimshaw, Nonlinear instability at the interface between two viscous fluids, Phys. Fluids 28 (1) (1985) 37–45.
[49] G. I. Sivashinsky, D. M. Michelson, On irregular wavy flow of a liquid down a vertical plane, Prog. Theor. Phys. 63 (1980) 2112–2114.
[50] J. M. Hyman, B. Nicolaenko, The Kuramoto–Sivashinsky equation: a bridge between PDE's and dynamical systems, Physica D 18 (1-3) (1986) 113–126.
[51] D. T. Papageorgiou, Y. S. Smyrlis, The route to chaos for the Kuramoto–Sivashinsky equation, Theor. Comp. Fluid Dyn. 3 (1) (1991) 15–42.
[52] Y. S. Smyrlis, D. T. Papageorgiou, Predicting chaos for infinite dimensional dynamical systems: the Kuramoto–Sivashinsky equation, a case study, Proc. Nat. Acad. Sci. 88 (24) (1991) 11129–11132.
[53] S. A. Orszag, On the elimination of aliasing in finite-difference schemes by filtering high-wavenumber components, J. Atmos. Sci. 28 (6) (1971) 1074–1074.
[54] S. M. Cox, P. C. Matthews, Exponential time differencing for stiff systems, J. Comput. Phys. 176 (2) (2002) 430–455.
[55] H. Montanelli, N. Bootland, Solving periodic semilinear stiff PDEs in 1D, 2D and 3D with exponential integrators, arXiv preprint arXiv:1604.08900.
[56] T. A. Driscoll, N. Hale, L. N. Trefethen, Chebfun Guide (2014).
[57] J. P. Eckmann, Roads to turbulence in dissipative dynamical systems, Rev. Mod. Phys. 53 (1981) 643–654.
