Time Series
Karanveer Mohan, Keegan Go, Stephen Boyd
EE103, Stanford University
November 12, 2015
Outline
Introduction
Linear operations
Least-squares
Prediction
Time series data
represent time series x1, . . . , xT as T-vector x
xt is value of some quantity at time (period, epoch) t, t = 1, . . . , T
examples:
– average temperature at some location on day t
– closing price of some stock on (trading) day t
– hourly number of users on a website
– altitude of an airplane every 10 seconds
– enrollment in a class every quarter
vector time series: xt is an n-vector; can represent as T × n matrix
Melbourne temperature
zoomed to one year
[plot]
Apple stock price
log10 of Apple daily share price, over 30 years, 250 trading days/year
you can see (not steady) growth
[plot]
Log price of Apple
zoomed to one year
[plot]
Electricity usage in (one region of) Texas
total in 15 minute intervals, over 1 year
you can see variation over year
[plot]
Electricity usage in (one region of) Texas
zoomed to 1 month
you can see daily periodicity and weekend/weekday variation
[plot]
Outline
Introduction
Linear operations
Least-squares
Prediction
Down-sampling
k× down-sampled time series selects every kth entry of x
can be written as y = Ax
for 2× down-sampling, T even,
A = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 0
\end{bmatrix}
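As a concrete illustration, a minimal NumPy sketch of this selection matrix (the helper name and toy data are mine, not from the slides):

```python
import numpy as np

def downsample_matrix(T, k):
    """Selection matrix A with y = A @ x the k-times down-sampled series."""
    idx = np.arange(0, T, k)             # keep entries 1, k+1, 2k+1, ... (0-based: 0, k, 2k, ...)
    A = np.zeros((len(idx), T))
    A[np.arange(len(idx)), idx] = 1.0
    return A

x = np.arange(8.0)                       # toy series, T = 8
assert np.allclose(downsample_matrix(8, 2) @ x, x[::2])
```

In practice one would just slice x[::k]; the matrix form is what exhibits down-sampling as a linear operation.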
Down-sampling on Apple log price
4× down-sample
[plot]
Up-sampling
k× (linear) up-sampling interpolates between entries of x
can be written as y = Ax
for 2× up-sampling
A = \begin{bmatrix}
1 & & & \\
1/2 & 1/2 & & \\
 & 1 & & \\
 & 1/2 & 1/2 & \\
 & & \ddots & \\
 & & 1/2 & 1/2 \\
 & & & 1
\end{bmatrix}
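A sketch of the 2× case in NumPy (the function name is mine; the output length 2T − 1 is an assumption consistent with the matrix above):

```python
import numpy as np

def upsample2_matrix(T):
    """(2T - 1) x T matrix A; y = A @ x linearly interpolates between entries of x."""
    A = np.zeros((2 * T - 1, T))
    A[::2] = np.eye(T)                       # even output rows copy the entries of x
    A[1::2, :-1] += 0.5 * np.eye(T - 1)      # odd output rows average adjacent entries
    A[1::2, 1:] += 0.5 * np.eye(T - 1)
    return A

x = np.array([0.0, 2.0, 4.0])
print(upsample2_matrix(3) @ x)               # [0. 1. 2. 3. 4.]
```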
Up-sampling on Apple log price
4× up-sample
[plot]
Smoothing
k-long moving average y of x is given by
yi = (1/k)(xi + xi+1 + · · · + xi+k−1), i = 1, . . . , T − k + 1
can express as y = Ax, e.g., for k = 3,
A = \begin{bmatrix}
1/3 & 1/3 & 1/3 & & & \\
 & 1/3 & 1/3 & 1/3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1/3 & 1/3 & 1/3
\end{bmatrix}
can also have trailing or centered smoothing
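A one-line NumPy sketch; np.convolve with mode="valid" produces exactly the T − k + 1 averages defined above:

```python
import numpy as np

def moving_average(x, k):
    """k-long moving average of x; output has length T - k + 1."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

x = np.array([1.0, 2.0, 6.0, 4.0, 5.0])
print(moving_average(x, 3))                  # [3. 4. 5.]
```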
Melbourne daily temperature smoothed
centered smoothing with window size 41
[plot]
First-order differences
(first-order) difference between adjacent entries
discrete analog of derivative
express as y = Dx, D is the (T − 1) × T difference matrix
D = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{bmatrix}

‖Dx‖² is a measure of the wiggliness of x:

‖Dx‖² = (x2 − x1)² + · · · + (xT − xT−1)²
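A sketch in NumPy (the helper is mine; np.diff applies D without forming it):

```python
import numpy as np

def diff_matrix(T):
    """(T - 1) x T first-order difference matrix D."""
    return np.eye(T - 1, T, k=1) - np.eye(T - 1, T)

x = np.array([1.0, 3.0, 2.0, 5.0])
D = diff_matrix(len(x))
assert np.allclose(D @ x, np.diff(x))
print(np.sum((D @ x) ** 2))                  # ||Dx||^2, the wiggliness of x
```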
Outline
Introduction
Linear operations
Least-squares
Prediction
De-meaning
de-meaning a time series means subtracting its mean: x̃ = x − avg(x)
rms(x̃) = std(x)
this is the least-squares fit with a constant
Straight-line fit and de-trending
fit data (1, x1), . . . , (T, xT) with affine model xt ≈ a + bt (also called straight-line fit)
b is called the trend
a + bt is called the trend line
de-trending a time series means subtracting its straight-line fit
de-trended time series shows variations above and below the straight-line fit
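A minimal least-squares sketch of the straight-line fit and de-trending (names are mine, not from the slides):

```python
import numpy as np

def detrend(x):
    """Fit x_t ≈ a + b t by least squares; return a, b, and the de-trended series."""
    T = len(x)
    t = np.arange(1, T + 1)
    A = np.column_stack([np.ones(T), t])     # columns correspond to a and b
    (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)
    return a, b, x - (a + b * t)
```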
Straight-line fit on Apple log price
[plot: Trend]
[plot: Residual]
Periodic time series
let P-vector z be one period of periodic time series
xper = (z, z, . . . , z)
(we assume T is a multiple of P)
express as xper = Az, where A stacks T/P copies of the P × P identity matrix:

A = \begin{bmatrix} I_P \\ \vdots \\ I_P \end{bmatrix}
Extracting a periodic component
given (non-periodic) time series x, choose z to minimize ‖x − Az‖²
gives best least-squares fit with periodic time series
simple solution: average periods of original:
ẑ = (1/k)Aᵀx, k = T/P
e.g., to get ẑ for January 9, average all xi’s with date January 9
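Since A simply stacks identity matrices, ẑ = (1/k)Aᵀx is the average of the k periods; a sketch (the helper name is mine, assuming T is a multiple of P):

```python
import numpy as np

def periodic_fit(x, P):
    """Least-squares periodic component: average the k = T/P periods of x."""
    k = len(x) // P
    return x[: k * P].reshape(k, P).mean(axis=0)   # the P-vector z-hat
```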
Periodic component of Melbourne temperature
[plot]
Extracting a periodic component with smoothing
can add smoothing to periodic fit by minimizing
‖x − Az‖² + λ‖Dz‖²
λ > 0 is smoothing parameter
D is P × P circular difference matrix
D = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1 \\
1 & & & & -1
\end{bmatrix}
λ is chosen visually or by validation
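The two terms can be stacked into a single least-squares problem; a sketch (the helper name is mine, assuming T is a multiple of P; the sign convention in D does not affect ‖Dz‖²):

```python
import numpy as np

def smoothed_periodic_fit(x, P, lam):
    """Minimize ||x - A z||^2 + lam ||D z||^2 as one stacked least-squares problem."""
    k = len(x) // P
    A = np.tile(np.eye(P), (k, 1))                    # k stacked copies of I_P
    D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)     # circular difference matrix
    S = np.vstack([A, np.sqrt(lam) * D])
    b = np.concatenate([x[: k * P], np.zeros(P)])
    z, *_ = np.linalg.lstsq(S, b, rcond=None)
    return z
```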
Choosing smoothing via validation
split data into train and test sets, e.g., test set is last period (P entries)
train model on train set, and test on the test set
choose λ to (approximately) minimize error on the test set
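A hypothetical sweep, reusing smoothed_periodic_fit from the previous sketch on synthetic stand-in data (the slides use the Melbourne series):

```python
import numpy as np

rng = np.random.default_rng(0)
P = 365
x = 10 * np.sin(2 * np.pi * np.arange(10 * P) / P) + rng.standard_normal(10 * P)
x_train, x_test = x[: 8 * P], x[8 * P :]          # train on 8 years, test on the last 2

lambdas = np.logspace(-2, 2, 20)
errors = []
for lam in lambdas:
    z = smoothed_periodic_fit(x_train, P, lam)    # from the previous sketch
    pred = np.tile(z, len(x_test) // P)
    errors.append(np.sqrt(np.mean((x_test - pred) ** 2)))
best_lam = lambdas[int(np.argmin(errors))]        # lambda minimizing test RMS error
```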
Validation of smoothing for Melbourne temperature
trained on first 8 years; tested on last two years
[plot: test RMS error versus λ]
Periodic component of temperature with smoothing
zoomed on test set, using λ = 30
[plot]
Outline
Introduction
Linear operations
Least-squares
Prediction
Prediction
goal: predict or guess xt+K given x1, . . . , xt
K = 1 is one-step-ahead prediction
prediction is often denoted x̂t+K, or more explicitly x̂(t+K|t) (estimate of xt+K at time t)
x̂t+K − xt+K is prediction error
applications: predict
– asset price
– product demand
– electricity usage
– economic activity
– position of vehicle
Some simple predictors
constant: x̂t+K = a
current value: x̂t+K = xt
linear (affine) extrapolation from last two values: x̂t+K = xt + K(xt − xt−1)
average to date: x̂t+K = avg(x1:t)
(M + 1)-period rolling average: x̂t+K = avg(x(t−M):t)
straight-line fit to date (i.e., based on x1:t)
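A sketch of a few of these (the helper, its 0-based indexing, and the default M are mine):

```python
import numpy as np

def simple_predictions(x, t, K, M=7):
    """K-step-ahead guesses of x_{t+K} from x_1, ..., x_t (stored in x[:t], 0-based)."""
    w = x[:t]
    return {
        "current value": w[-1],
        "extrapolation": w[-1] + K * (w[-1] - w[-2]),
        "average to date": w.mean(),
        "rolling average": w[-(M + 1):].mean(),   # (M+1)-period rolling average
    }
```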
Auto-regressive predictor
auto-regressive predictor:
x̂t+K = (xt, xt−1, . . . , xt−M)ᵀβ + v
– M is memory length
– (M + 1)-vector β gives predictor weights; v is offset
prediction x̂t+K is affine function of past window x(t−M):t
(which of the simple predictors above have this form?)
Least-squares fitting of auto-regressive models
choose coefficients β, offset v via least-squares (regression)
regressors are (M + 1)-vectors x1:(M+1), . . . , x(N−M):N
outcomes are numbers xM+K+1, . . . , xN+K
can add regularization on β
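A sketch of the regression setup (1-based slide indexing translated to 0-based NumPy; names are mine):

```python
import numpy as np

def fit_ar(x, M, K=1):
    """Least-squares fit of x_hat_{t+K} = (x_t, ..., x_{t-M})^T beta + v."""
    T = len(x)
    windows = [x[t - M - 1 : t][::-1] for t in range(M + 1, T - K + 1)]  # newest entry first
    X = np.column_stack([np.array(windows), np.ones(len(windows))])     # last column: offset v
    y = x[M + K : T]                                                    # outcomes x_{M+K+1}, ..., x_T
    beta_v, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_v[:-1], beta_v[-1]                                      # beta, v
```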
Evaluating predictions with validation
for simple methods: evaluate RMS prediction error
for more sophisticated methods:
– split data into a training set and a test set (usually sequential)
– train predictor on training data
– test on test data
Example
predict Texas energy usage one step ahead (K = 1)
train on first 10 months, test on last 2
Coefficients

using M = 100
0 is the coefficient for today
[plot: the coefficients, indexed 0 to 100]
Auto-regressive prediction results
[plot]
Auto-regressive prediction results
showing the residual
[plot]
Auto-regressive prediction results

predictor                              RMS error
average (constant)                     1.20
current value                          0.119
auto-regressive (M = 10)               0.073
auto-regressive (M = 100)              0.051
Autoregressive model on residuals
fit a model to the time series, e.g., linear or periodic
subtract this model from the original signal to compute residuals
apply auto-regressive model to predict residuals
can add predicted residuals back to model to obtain predictions
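A sketch combining the earlier pieces (periodic_fit and fit_ar from the previous sketches; x, P, and M are assumed defined, with T a multiple of P, and K = 1):

```python
import numpy as np

# baseline model: periodic component, repeated over the whole record
z = periodic_fit(x, P)                       # from the earlier sketch
model = np.tile(z, len(x) // P)
r = x - model                                # residual time series

beta, v = fit_ar(r, M, K=1)                  # AR model fit on the residuals

# one-step-ahead prediction for time T + 1:
T = len(x)
r_hat = r[T - M - 1 : T][::-1] @ beta + v    # predicted next residual
x_hat = z[T % P] + r_hat                     # periodic value at T + 1 plus predicted residual
```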
Example

Melbourne temperature data residuals
zoomed on 100 days in test set
[plot]
Auto-regressive prediction of residuals
[plot]
Prediction results for Melbourne temperature

tested on last two years

predictor                              RMS error
average                                4.12
current value                          2.57
periodic (no smoothing)                2.71
periodic (smoothing, λ = 30)           2.62
auto-regressive (M = 3)                2.44
auto-regressive (M = 20)               2.27
auto-regressive on residual (M = 20)   2.22