
Page 1:

Chapter 9: Machine Learning in Time Series

1

Page 2:

Time series processing

• Given: time-dependent observables $x_t,\ t = 0, 1, 2, \dots$

• Scalar: univariate; vector: multivariate

• Typical tasks:

- Forecasting

- Noise modeling

- Pattern recognition

- Modeling

- Filtering

- Source separation

Time series (minutes to days)

Signals (milliseconds to seconds)

Page 3:

Examples

3

[Figures: Standard & Poor's index and sunspot numbers]

Preprocessed (returns): $r_t = x_t - x_{t-1}$ · Preprocessed (de-seasoned): $s_t = x_t - x_{t-11}$
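A minimal numpy sketch of the two preprocessing steps above (first differences for returns, an 11-step seasonal difference matching the roughly 11-year sunspot cycle); the synthetic series is only a placeholder for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for an observed series (e.g. an index level or sunspot count)
x = 100 + np.cumsum(rng.normal(size=200))

# Returns: first differences r_t = x_t - x_{t-1}
r = x[1:] - x[:-1]

# De-seasoned series: s_t = x_t - x_{t-11} (11-step seasonal difference)
season = 11
s = x[season:] - x[:-season]

print(r[:5], s[:5])
```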

Page 4:

Autoregressive models

• Forecasting: making use of past information to predict (estimate) the future

• AR: Past information = past observations

$x_t = F(x_{t-1}, x_{t-2}, \dots, x_{t-p}) + \epsilon_t$

$X_{t,p} = (x_{t-1}, \dots, x_{t-p})$: past observations; $\hat{x}_t = F(X_{t,p})$: expected value; $\epsilon_t$: noise ("random shock")

• Best forecast: expected value

Page 5:

Linear AR models

• Most common case: $x_t = \sum_{i=1}^{p} a_i\, x_{t-i} + \epsilon_t$

• Simplest form: random walk $x_t = x_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0,1)$

• Nontrivial forecast impossible

5
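A minimal sketch (numpy only, synthetic data) of fitting the linear AR(p) model above by least squares and producing the one-step forecast $\hat{x}_t$; it also simulates the random-walk special case, for which the best forecast is simply the last observation. All coefficients and series are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(2) series: x_t = 0.6 x_{t-1} - 0.2 x_{t-2} + eps_t
n, p = 500, 2
x = np.zeros(n)
for t in range(p, n):
    x[t] = 0.6 * x[t-1] - 0.2 * x[t-2] + rng.normal()

# Lag matrix X_{t,p} = (x_{t-1}, ..., x_{t-p}) and targets x_t
X = np.column_stack([x[p-i-1:n-i-1] for i in range(p)])
y = x[p:]

# Least-squares estimate of the AR coefficients a_1 ... a_p
a, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients:", a)

# One-step forecast: expected value x̂ = sum_i a_i x_{t-i}
x_hat_next = np.array([x[-1], x[-2]]) @ a

# Random walk: x_t = x_{t-1} + eps_t; the conditional expectation is x_{t-1},
# so no nontrivial forecast beyond the last value is possible
rw = np.cumsum(rng.normal(size=n))
rw_forecast = rw[-1]
```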

Page 6:

MLP as NAR

• Neural network can approximate nonlinear AR model

• „time window“ or „time delay“

6
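A minimal sketch of the "time window" idea: the same lag matrix as above is fed to an MLP, giving a nonlinear AR (NAR) model. It assumes scikit-learn is available; the synthetic series and network size are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Synthetic nonlinear series as a stand-in for real data
n, p = 600, 3
x = np.zeros(n)
for t in range(p, n):
    x[t] = np.sin(0.8 * x[t-1]) - 0.3 * x[t-2] + 0.1 * rng.normal()

# Time-window (time-delay) inputs: X_{t,p} = (x_{t-1}, ..., x_{t-p})
X = np.column_stack([x[p-i-1:n-i-1] for i in range(p)])
y = x[p:]

# MLP approximating the nonlinear AR function F
nar = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
nar.fit(X[:-100], y[:-100])          # train on all but the last 100 points
forecast = nar.predict(X[-100:])     # one-step-ahead forecasts on the held-out part
```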

Page 7:

Complex noise models

• Assumption: arbitrary distribution $\epsilon_t \sim D$

• Parameters are time dependent (dependent on past): $\theta_t = g(X_{t,p})$

• Likelihood: $L = \prod_{t=1}^{N} d\big(x_t;\, g(X_{t,p})\big)$, where $d$ is the probability density function for $D$

Page 8:

Heteroskedastic time series

• Assumption: Noise is Gaussian with time-dependent variance

$L = \prod_{t=1}^{N} \frac{1}{\sqrt{2\pi\,\sigma^2(X_{t,p})}}\, \exp\!\left(-\frac{\big(x_t - \mu(X_{t,p})\big)^2}{2\,\sigma^2(X_{t,p})}\right)$

• ARCH model: $\sigma_t^2 = a_0 + \sum_{i=1}^{p} a_i\, r_{t-i}^2$

• MLP is nonlinear ARCH (when applied to returns/residuals):

$r_t = F(r_{t-1}, r_{t-2}, \dots, r_{t-p}) + \epsilon_t, \qquad \sigma_t^2 = F'(r_{t-1}, r_{t-2}, \dots, r_{t-p})$
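A minimal numpy sketch of the two ingredients above: the ARCH(p) variance equation and the negative log-likelihood of a Gaussian with time-dependent variance, which is what a heteroskedastic network would be trained to minimize. The returns and coefficients are illustrative placeholders, and the mean is fixed to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
r = rng.standard_t(df=5, size=500) * 0.01        # synthetic returns

p = 2
a0, a = 1e-5, np.array([0.2, 0.1])               # illustrative ARCH(2) coefficients

# ARCH variance: sigma_t^2 = a0 + sum_i a_i * r_{t-i}^2
r2_lags = np.column_stack([r[p-i-1:len(r)-i-1] ** 2 for i in range(p)])
sigma2 = a0 + r2_lags @ a                        # one value per t = p .. N-1

# Gaussian negative log-likelihood with time-dependent variance
# (mean fixed to 0 here; an MLP would output mu_t and sigma_t^2 from X_{t,p})
mu = 0.0
x = r[p:]
nll = 0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)
print("NLL:", nll)
```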

Page 9:

Non-Gaussian noise

• Other parametric pdfs (e.g. t-distribution)

• Mixture of Gaussians (Mixture density network, Bishop 1994)

• Network with 3k outputs (or separate networks)

$d(x_t \mid X_{t,p}) = \sum_{i=1}^{k} \alpha_i(X_{t,p})\; \frac{1}{\sqrt{2\pi\,\sigma_i^2(X_{t,p})}}\, \exp\!\left(-\frac{\big(x_t - \mu_i(X_{t,p})\big)^2}{2\,\sigma_i^2(X_{t,p})}\right)$
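A minimal numpy sketch of the mixture-of-Gaussians density above, evaluated for one target value $x_t$ given the $3k$ network outputs ($k$ mixing weights, $k$ means, $k$ variances). The output values here are hard-coded placeholders rather than a trained mixture density network.

```python
import numpy as np

# Hypothetical outputs of a mixture density network for one input window X_{t,p}
alpha = np.array([0.5, 0.3, 0.2])        # mixing weights (softmax outputs, sum to 1)
mu = np.array([-0.1, 0.0, 0.2])          # component means
sigma2 = np.array([0.05, 0.02, 0.10])    # component variances (kept positive)

def mixture_density(x_t, alpha, mu, sigma2):
    """d(x_t | X_{t,p}) = sum_i alpha_i * N(x_t; mu_i, sigma2_i)"""
    comp = np.exp(-(x_t - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.sum(alpha * comp)

print(mixture_density(0.05, alpha, mu, sigma2))
```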

Page 10:

Identifiability problem

• Mixture models (like neural networks) are not identifiable (parameters cannot be interpreted)

• No distinction between model and noise, e.g. sunspot data

• Models have to be treated with care

10

Page 11:

Recurrent Perceptrons

• Recurrent connection = feedback loop

• From hidden layer („Elman“) or output layer („Jordan“)

11

Learning: "backpropagation through time"

[Figure: input layer and state/context layer; the context layer holds a copy of the previous hidden (Elman) or output (Jordan) layer and feeds back into the network]
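A minimal numpy sketch of the Elman idea: the hidden layer is copied into a context layer and fed back as additional input at the next time step. Weights are random placeholders; training would use backpropagation through time.

```python
import numpy as np

rng = np.random.default_rng(4)

n_in, n_hidden, n_out = 1, 8, 1
W_in = rng.normal(scale=0.3, size=(n_hidden, n_in))        # input -> hidden
W_ctx = rng.normal(scale=0.3, size=(n_hidden, n_hidden))   # context -> hidden (recurrence)
W_out = rng.normal(scale=0.3, size=(n_out, n_hidden))      # hidden -> output

def elman_forward(xs):
    """Run the recurrent perceptron over a sequence xs, one scalar per step."""
    context = np.zeros(n_hidden)           # context layer = copy of previous hidden state
    outputs = []
    for x in xs:
        h = np.tanh(W_in @ np.atleast_1d(x) + W_ctx @ context)
        outputs.append(W_out @ h)
        context = h                        # the "copy" step shown in the figure
    return np.array(outputs)

print(elman_forward(rng.normal(size=10)).shape)
```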

Page 12:

Recurrent networks: Moving Average

• Second model class: Moving Average models

• Past information: random shocks

• Recurrent (Jordan) network: Nonlinear MA

• However, convergence not guaranteed

$x_t = \sum_{i=0}^{q} b_i\, \epsilon_{t-i}, \qquad \epsilon_t = x_t - \hat{x}_t$
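A minimal numpy sketch of how a Jordan-style network acts as a nonlinear MA model: the estimated shock $\hat{\epsilon}_t = x_t - \hat{x}_t$ is fed back as input for the next prediction. Weights are random placeholders and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(5)

q = 2                                      # number of past shocks fed back
W1 = rng.normal(scale=0.3, size=(6, q))    # shocks -> hidden
W2 = rng.normal(scale=0.3, size=(1, 6))    # hidden -> output

def jordan_ma_predict(xs):
    """One pass over the series: predict x̂_t from the last q estimated shocks."""
    shocks = np.zeros(q)                   # ε̂_{t-1}, ..., ε̂_{t-q}
    preds = []
    for x in xs:
        x_hat = float(W2 @ np.tanh(W1 @ shocks))   # nonlinear function of past shocks
        preds.append(x_hat)
        eps_hat = x - x_hat                        # ε̂_t = x_t - x̂_t (fed back)
        shocks = np.concatenate(([eps_hat], shocks[:-1]))
    return np.array(preds)

print(jordan_ma_predict(rng.normal(size=20))[:5])
```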

Page 13:

State space models

• Observables depend on (hidden) time-variant state

• Strong relationship to recurrent (Elman) networks

• Nonlinear version only with additional hidden layers

$s_t = A\, s_{t-1} + B\, \epsilon_t$

$x_t = C\, s_t + \eta_t$

13
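A minimal numpy sketch simulating the linear state space model above: a hidden state $s_t$ evolves with matrices A and B, and only $x_t = C s_t + \eta_t$ is observed. The matrices and noise scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])       # state transition
B = np.eye(2) * 0.1              # how process noise enters the state
C = np.array([[1.0, 0.5]])       # observation matrix (state -> observable)

n = 200
s = np.zeros(2)
xs = []
for _ in range(n):
    s = A @ s + B @ rng.normal(size=2)        # s_t = A s_{t-1} + B eps_t
    x = C @ s + 0.05 * rng.normal(size=1)     # x_t = C s_t + eta_t
    xs.append(x.item())

xs = np.array(xs)   # only the observations are available to the modeller
```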

Page 14:

Practical considerations

• Stationarity is an important issue

• Preprocessing (trends, seasonalities)

• N-fold cross-validation done time-wise (the validation set must come after the training set)

• Mean and standard deviation across folds → model selection

14

[Figure: time-ordered split into train | validation | test]
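A minimal sketch (plain index arithmetic) of time-wise N-fold cross-validation: each validation block lies strictly after its training block, and the mean and standard deviation of the validation error across folds are used for model selection. The fold sizes and the model fitting step are placeholders.

```python
import numpy as np

def timewise_folds(n_samples, n_folds, val_size):
    """Yield (train_idx, val_idx) pairs where validation always follows training."""
    for k in range(n_folds):
        train_end = n_samples - (n_folds - k) * val_size
        train_idx = np.arange(0, train_end)
        val_idx = np.arange(train_end, train_end + val_size)
        yield train_idx, val_idx

# Example: 1000 points, 5 folds, 100-point validation blocks
errors = []
for train_idx, val_idx in timewise_folds(1000, 5, 100):
    # fit the model on train_idx, evaluate on val_idx (placeholder error here)
    errors.append(np.random.rand())

print("validation error: mean %.3f, std %.3f" % (np.mean(errors), np.std(errors)))
```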

Page 15:

Unfolding recurrent networks

• Events long in the past can have an influence on the present

https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/

15

Page 16:

Vanishing (or exploding) gradients

16

• Gradient quickly goes to 0 (or infinity)

• Long term dependencies cannot be learned

https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
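A tiny numeric illustration of the effect: the backpropagated gradient contains a product of per-step factors, so repeated factors below 1 shrink it towards 0 and factors above 1 blow it up. The factor values 0.9 and 1.1 are arbitrary examples.

```python
import numpy as np

print(np.prod(np.full(100, 0.9)))   # ~2.7e-5: gradient vanishes over 100 steps
print(np.prod(np.full(100, 1.1)))   # ~1.4e+4: gradient explodes over 100 steps
```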

Page 17:

Long short term memory (LSTM)

17

No vanishing gradient if f >> 0

https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
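A minimal numpy sketch of a single LSTM cell step, highlighting the forget gate f: the cell state is carried forward multiplicatively by f, so with f close to 1 the gradient along the cell state does not vanish. Weights and the positive forget-gate bias are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)
n_in, n_h = 1, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, acting on the concatenation [x_t, h_{t-1}]
Wf, Wi, Wo, Wc = (rng.normal(scale=0.3, size=(n_h, n_in + n_h)) for _ in range(4))
bf = np.ones(n_h)        # positive forget-gate bias pushes f towards 1 initially

def lstm_step(x, h, c):
    z = np.concatenate([np.atleast_1d(x), h])
    f = sigmoid(Wf @ z + bf)          # forget gate
    i = sigmoid(Wi @ z)               # input gate
    o = sigmoid(Wo @ z)               # output gate
    c_tilde = np.tanh(Wc @ z)         # candidate cell state
    c = f * c + i * c_tilde           # cell state carried by f (no vanishing if f ~ 1)
    h = o * np.tanh(c)                # hidden state / output
    return h, c

h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=10):
    h, c = lstm_step(x, h, c)
```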

Page 18:

Medical example

• Sleep staging

18

Iber et al. 2007 (AASM scoring manual)

Stephansen et al., Nature Communications, 2018

Page 19:

Medical example 2

• Predicting mortality in ICU

19

Xia et al., Comp Math Meth Med, 2018

Page 20:

Text/Speech processing

• Text/language is sequential in nature

• Long term dependencies:

– The girl, who played in the team last week, took her sister to school and congratulated her

• Medical applications:

– Text mining in abstracts

– Physician reports

20

„forget“

Page 21:

Symbolic time series

• Examples:

– DNA

– Text

– Quantised time series (e.g. „up“ and „down“)

• Past information: past p symbols → probability distribution over the next symbol

• Markov chains

• Problem: long substrings are rare

$x_t = s_i$, with $s_i$ from a finite alphabet

$p(x_t \mid x_{t-1}, x_{t-2}, \dots, x_{t-p})$
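A minimal Python sketch of the Markov-chain idea for symbolic series: count how often each length-p context is followed by each symbol and normalise to get $p(x_t \mid x_{t-1}, \dots, x_{t-p})$. With long contexts most counts stay zero, which is the sparsity problem mentioned above; the toy sequence is a placeholder.

```python
from collections import Counter, defaultdict

sequence = "ACGTACGGACGTACGAACGT"   # toy symbolic series (alphabet {A, C, G, T})
p = 2                               # context length (order of the Markov chain)

counts = defaultdict(Counter)
for t in range(p, len(sequence)):
    context = sequence[t - p:t]
    counts[context][sequence[t]] += 1

# Conditional distributions p(x_t | context), normalised per context
probs = {ctx: {sym: c / sum(cnt.values()) for sym, c in cnt.items()}
         for ctx, cnt in counts.items()}

print(probs.get("AC"))    # e.g. distribution over the symbol following "AC"
```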

Page 22:

Fractal prediction machines

• Similar subsequences are mapped to points close in space

• Clustering = extraction of stochastic automaton

• Variable length Markov model

22
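A rough sketch of the mapping step, under the assumption that it follows the usual chaos-game style construction: each symbol pulls the current point towards its own corner of the unit square, so similar recent subsequences end up at nearby points. The resulting points could then be clustered (e.g. with k-means, not shown) to extract the stochastic automaton / variable length Markov model; corner assignment and contraction factor are illustrative.

```python
import numpy as np

# Corners of the unit square assigned to a 4-symbol alphabet (assumption for illustration)
corners = {"A": np.array([0., 0.]), "C": np.array([0., 1.]),
           "G": np.array([1., 0.]), "T": np.array([1., 1.])}
k = 0.5   # contraction factor

def subsequence_points(sequence):
    """Map every prefix of the sequence to a point; nearby points ~ similar recent history."""
    point = np.full(2, 0.5)
    points = []
    for sym in sequence:
        point = k * point + (1 - k) * corners[sym]
        points.append(point.copy())
    return np.array(points)

pts = subsequence_points("ACGTACGGACGTACGA")
# Next step (not shown): cluster pts and estimate next-symbol statistics per cluster.
```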

Page 23:

Relationship to recurrent network

• Network of 2nd order

23

Page 24:

Distinguishing coding/noncoding DNA

24

Tino P., Dorffner G., Machine Learning 2001

- DNA: sequence with alphabet size 4

Page 25:

Summary

• Neural networks are powerful semi-parametric models for nonlinear dependencies

• Can be considered as nonlinear extensions of classical time series and signal processing techniques

• Applying semi-parametric models to noise modeling adds another interesting facet

• Models must be treated with care; much data is necessary

• Recurrent networks

• Latest development (deep learning): LSTM

25