
    Time Series

    Karanveer Mohan, Keegan Go, Stephen Boyd

    EE103, Stanford University

    November 12, 2015


    Outline

    Introduction

    Linear operations

    Least-squares

    Prediction


    Time series data

      represent time series x_1, . . . , x_T as T-vector x

      x_t is value of some quantity at time (period, epoch) t, t = 1, . . . , T

      examples:

    –  average temperature at some location on day t
    –  closing price of some stock on (trading) day t
    –  hourly number of users on a website
    –  altitude of an airplane every 10 seconds
    –  enrollment in a class every quarter

     vector time series: x_t is an n-vector; can represent as T × n matrix
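
    (As a concrete illustration, not from the slides: in NumPy a scalar time series is naturally a length-T array, and a vector time series a T × n array. The values below are made up.)

        import numpy as np

        # scalar time series: one value per period (made-up temperatures); a T-vector with T = 5
        x = np.array([18.2, 17.5, 19.0, 21.3, 20.1])

        # vector time series: n = 2 quantities per period, stored as a T x n matrix (row t is x_t)
        X = np.array([[18.2, 1.2],
                      [17.5, 0.9],
                      [19.0, 1.4],
                      [21.3, 1.1],
                      [20.1, 1.0]])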


    Melbourne temperature

     zoomed to one year


    Apple stock price

      log10  of Apple daily share price, over 30 years, 250 trading days/year

     you can see (not steady) growth


    Log price of Apple

     zoomed to one year


    Electricity usage in (one region of) Texas

     total usage in 15-minute intervals, over 1 year

     you can see variation over the year


    Electricity usage in (one region of) Texas

     zoomed to 1 month

     you can see daily periodicity and weekend/weekday variation


    Outline

    Introduction

    Linear operations

    Least-squares

    Prediction


    Down-sampling

      k× down-sampled time series selects every kth entry of x

      can be written as y = Ax

      for 2× down-sampling, T even,

    A = \begin{bmatrix}
          1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
          0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
          \vdots & & & & & & & & \vdots \\
          0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & 0 \\
          0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 0
        \end{bmatrix}
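
    (A small sketch, not from the slides, of 2× down-sampling in NumPy: build the selection matrix A, or equivalently slice directly.)

        import numpy as np

        def downsample_matrix(T, k=2):
            """Selection matrix A with A @ x = every k-th entry of x (assumes T divisible by k)."""
            A = np.zeros((T // k, T))
            A[np.arange(T // k), k * np.arange(T // k)] = 1.0
            return A

        x = np.arange(10, dtype=float)          # a toy time series x_1, ..., x_10
        y = downsample_matrix(x.size, k=2) @ x  # 2x down-sampled series
        assert np.allclose(y, x[::2])           # same result as direct slicing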


    Down-sampling on Apple log price

    4×  down-sample


    Up-sampling

      k× (linear) up-sampling interpolates between entries of x

      can be written as y = Ax

      for 2× up-sampling,

    A = \begin{bmatrix}
          1 & & & & \\
          1/2 & 1/2 & & & \\
          & 1 & & & \\
          & 1/2 & 1/2 & & \\
          & & \ddots & & \\
          & & & 1 & \\
          & & & 1/2 & 1/2 \\
          & & & & 1
        \end{bmatrix}
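
    (A rough sketch of 2× linear up-sampling, assuming the up-sampled series has length 2T − 1: each original entry, then the midpoint of each adjacent pair, as in the matrix above.)

        import numpy as np

        def upsample_matrix(T):
            """(2T-1) x T matrix for 2x linear up-sampling."""
            A = np.zeros((2 * T - 1, T))
            A[::2, :] = np.eye(T)                   # even output rows copy the original entries
            A[1::2, :-1] += 0.5 * np.eye(T - 1)     # odd rows average each pair of neighbours
            A[1::2, 1:] += 0.5 * np.eye(T - 1)
            return A

        x = np.array([1.0, 3.0, 2.0])
        y = upsample_matrix(x.size) @ x             # [1.0, 2.0, 3.0, 2.5, 2.0]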


    Up-sampling on Apple log price

    4×  up-sample


    Smoothing

      k-long moving average y of x is given by

    y_i = (1/k)(x_i + x_{i+1} + · · · + x_{i+k−1}),    i = 1, . . . , T − k + 1

      can express as y = Ax, e.g., for k = 3,

    A = \begin{bmatrix}
          1/3 & 1/3 & 1/3 & & & \\
          & 1/3 & 1/3 & 1/3 & & \\
          & & \ddots & \ddots & \ddots & \\
          & & & 1/3 & 1/3 & 1/3
        \end{bmatrix}

      can also have trailing or centered smoothing
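
    (A small sketch of the k-long moving average, via the matrix A and, equivalently, a convolution; my own illustration.)

        import numpy as np

        def moving_average_matrix(T, k):
            """(T-k+1) x T matrix whose i-th row averages x_i, ..., x_{i+k-1}."""
            A = np.zeros((T - k + 1, T))
            for i in range(T - k + 1):
                A[i, i:i + k] = 1.0 / k
            return A

        x = np.array([1.0, 2.0, 6.0, 2.0, 1.0])
        y = moving_average_matrix(x.size, k=3) @ x                            # [3.0, 3.33..., 3.0]
        assert np.allclose(y, np.convolve(x, np.ones(3) / 3, mode="valid"))   # same as a moving-average convolution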


    Melbourne daily temperature smoothed

     centered smoothing with window size  41


    First-order differences

     (first-order) difference between adjacent entries

     discrete analog of derivative

      express as y = Dx,  D is the (T − 1) × T difference matrix

    D = \begin{bmatrix}
          -1 & 1 & & & \\
          & -1 & 1 & & \\
          & & \ddots & \ddots & \\
          & & & -1 & 1
        \end{bmatrix}

      ‖Dx‖² is a measure of the wiggliness of x:

    ‖Dx‖² = (x_2 − x_1)² + · · · + (x_T − x_{T−1})²
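
    (A short sketch of building D and computing the wiggliness ‖Dx‖².)

        import numpy as np

        def difference_matrix(T):
            """(T-1) x T first-order difference matrix: (Dx)_i = x_{i+1} - x_i."""
            return np.eye(T - 1, T, k=1) - np.eye(T - 1, T)

        x = np.array([1.0, 3.0, 2.0, 2.0])
        Dx = difference_matrix(x.size) @ x   # [ 2., -1.,  0.]
        wiggliness = np.sum(Dx ** 2)         # ||Dx||^2 = 5.0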


    Outline

    Introduction

    Linear operations

    Least-squares

    Prediction


    De-meaning

      de-meaning a time series means subtracting its mean: x̃ = x − avg(x)

      rms(x̃) = std(x)

     this is the least-squares fit with a constant


    Straight-line fit and de-trending

     fit data (1, x_1), . . . , (T, x_T) with affine model x_t ≈ a + bt (also called straight-line fit)

      b is called the trend

      a + bt is called the trend line

      de-trending a time series means subtracting its straight-line fit

     de-trended time series shows variations above and below the straight-line fit
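
    (A minimal sketch of the straight-line fit by least squares and de-trending; the data are made up.)

        import numpy as np

        x = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])   # time series to de-trend
        T = x.size
        t = np.arange(1, T + 1)

        # least-squares fit of x_t ~ a + b t
        (a, b), *_ = np.linalg.lstsq(np.column_stack([np.ones(T), t]), x, rcond=None)

        trend_line = a + b * t        # a + bt, with b the trend
        detrended = x - trend_line    # variation above/below the straight-line fit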


    Straight-line fit on Apple log price

    (two plots: Trend and Residual)


    Periodic time series

      let P-vector z be one period of periodic time series

    x^per = (z, z, . . . , z)

    (we assume T is a multiple of P)

      express as x^per = Az with

    A = \begin{bmatrix} I_P \\ \vdots \\ I_P \end{bmatrix}

    (T/P copies of the P × P identity matrix I_P)


    Extracting a periodic component

     given (non-periodic) time series x, choose z to minimize ‖x − Az‖²

     gives best least-squares fit with periodic time series

     simple solution: average periods of original:

    ẑ = (1/k) Aᵀx,    k = T/P

      e.g., to get ẑ for January 9, average all x_i's with date January 9
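
    (A small sketch of ẑ = (1/k)Aᵀx: it simply averages corresponding entries across the k periods.)

        import numpy as np

        x = np.array([3.0, 7.0, 5.0,     # period 1
                      4.0, 6.0, 5.0,     # period 2
                      5.0, 8.0, 2.0])    # period 3   (T = 9, P = 3, k = 3)
        P = 3
        k = x.size // P

        z_hat = x.reshape(k, P).mean(axis=0)   # [4.0, 7.0, 4.0], the same as (1/k) A^T x
        x_per = np.tile(z_hat, k)              # best periodic least-squares fit to x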


    Periodic component of Melbourne temperature


    Extracting a periodic component with smoothing

      can add smoothing to periodic fit by minimizing

    ‖x − Az‖² + λ‖Dz‖²

      λ > 0 is smoothing parameter

      D is P × P circular difference matrix

    D = \begin{bmatrix}
          -1 & 1 & & & \\
          & -1 & 1 & & \\
          & & \ddots & \ddots & \\
          & & & -1 & 1 \\
          1 & & & & -1
        \end{bmatrix}

      λ   is chosen visually or by validation
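
    (A sketch of the smoothed periodic fit: since the objective is a sum of two least-squares terms, stacking A on top of √λ·D gives an ordinary least-squares problem in z. The function name is mine, not from the slides.)

        import numpy as np

        def smoothed_periodic_fit(x, P, lam):
            """Choose one period z (length P) by minimizing ||x - Az||^2 + lam * ||Dz||^2."""
            k = x.size // P
            A = np.tile(np.eye(P), (k, 1))                   # stacked I_P blocks
            D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)    # P x P circular difference matrix
            M = np.vstack([A, np.sqrt(lam) * D])             # stacked least-squares problem
            b = np.concatenate([x, np.zeros(P)])
            z, *_ = np.linalg.lstsq(M, b, rcond=None)
            return z

    (With lam = 0 this reduces to the period-averaging solution above; larger lam trades fit for a smoother z, with λ chosen visually or by validation as described next.)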


    Choosing smoothing via validation

     split data into train and test sets,  e.g., test set is last period (P entries)

     train model on train set, and test on the test set

      choose  λ  to (approximately) minimize error on the test set


    Validation of smoothing for Melbourne temperature

    trained on first 8 years; tested on last two years


    Periodic component of temperature with smoothing

     zoomed on test set, using  λ = 30


    Outline

    Introduction

    Linear operations

    Least-squares

    Prediction


    Prediction

      goal: predict or guess x_{t+K} given x_1, . . . , x_t

      K = 1 is one-step-ahead prediction

     prediction is often denoted x̂_{t+K}, or more explicitly x̂(t+K | t) (estimate of x_{t+K} at time t)

      x̂_{t+K} − x_{t+K} is prediction error

      applications: predict

    –  asset price
    –  product demand
    –  electricity usage
    –  economic activity
    –  position of vehicle


    Some simple predictors

      constant: x̂_{t+K} = a

      current value: x̂_{t+K} = x_t

      linear (affine) extrapolation from last two values:

    x̂_{t+K} = x_t + K(x_t − x_{t−1})

      average to date: x̂_{t+K} = avg(x_{1:t})

      (M + 1)-period rolling average: x̂_{t+K} = avg(x_{(t−M):t})

     straight-line fit to date (i.e., based on x_{1:t})
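
    (A few of these simple predictors written as functions of the observed prefix x_{1:t}; my own illustration.)

        import numpy as np

        def predict_current_value(x_1t, K):
            return x_1t[-1]                               # x-hat_{t+K} = x_t

        def predict_extrapolate(x_1t, K):
            return x_1t[-1] + K * (x_1t[-1] - x_1t[-2])   # affine extrapolation from last two values

        def predict_rolling_average(x_1t, K, M):
            return np.mean(x_1t[-(M + 1):])               # (M+1)-period rolling average

        x_1t = np.array([1.0, 2.0, 4.0])
        print(predict_current_value(x_1t, K=1))           # 4.0
        print(predict_extrapolate(x_1t, K=1))             # 6.0
        print(predict_rolling_average(x_1t, K=1, M=1))    # 3.0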


    Auto-regressive predictor

      auto-regressive  predictor:

    x̂_{t+K} = (x_t, x_{t−1}, . . . , x_{t−M})ᵀ β + v

    –  M is memory length
    –  (M + 1)-vector β gives predictor weights; v is offset

      prediction x̂_{t+K} is affine function of past window x_{(t−M):t} (which of the simple predictors above have this form?)


    Least-squares fitting of auto-regressive models

      choose coefficients β, offset v via least-squares (regression)

      regressors are (M + 1)-vectors

    x_{1:(M+1)}, . . . , x_{(N−M):N}

     outcomes are numbers

    x_{M+K+1}, . . . , x_{N+K}

     can add regularization on  β 
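
    (A rough sketch of the least-squares fit on a training series of length N, with the offset v handled by appending a column of ones; the helper names are mine. A ridge penalty on β could be added in the same stacked least-squares style as the smoothing example earlier.)

        import numpy as np

        def fit_ar(x, M, K=1):
            """Least-squares fit of x-hat_{t+K} = (x_t, ..., x_{t-M})^T beta + v on training data x."""
            N = x.size
            windows = np.array([x[t - M:t + 1][::-1] for t in range(M, N - K)])   # past windows, newest entry first
            targets = x[M + K:]                                                   # value K steps after each window
            Z = np.column_stack([windows, np.ones(len(windows))])                 # column of 1s for the offset v
            coef, *_ = np.linalg.lstsq(Z, targets, rcond=None)
            return coef[:-1], coef[-1]                                            # beta, v

        x_train = np.sin(0.3 * np.arange(200)) + 0.005 * np.arange(200)           # synthetic training series
        beta, v = fit_ar(x_train, M=5, K=1)
        x_hat_next = x_train[-1:-7:-1] @ beta + v                                 # one-step-ahead prediction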


    Evaluating predictions with validation

      for simple methods: evaluate RMS prediction error

      for more sophisticated methods:

    –  split data into a training set and a test set (usually sequential)
    –  train the predictor on the training data
    –  test on the test data


    Example

     predict Texas energy usage one step ahead (K  = 1)

     train on first 10 months, test on last 2


    Coefficients

      using M = 100

      0 is the coefficient for today


    Auto-regressive prediction results


    Auto-regressive prediction results

    showing the residual


    Auto-regressive prediction results

    predictor                      RMS error
    average (constant)             1.20
    current value                  0.119
    auto-regressive (M = 10)       0.073
    auto-regressive (M = 100)      0.051


    Autoregressive model on residuals

     fit a model to the time series,  e.g., linear or periodic

     subtract this model from the original signal to compute residuals

     apply auto-regressive model to predict residuals

     can add predicted residuals back to model to obtain predictions
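
    (A self-contained sketch of this recipe on synthetic data, using a periodic model as the baseline; my own illustration, not the slides' Melbourne pipeline.)

        import numpy as np

        # synthetic series: periodic component (P = 12) plus slowly drifting noise
        rng = np.random.default_rng(0)
        P, k = 12, 20
        x = np.tile(10 + 5 * np.sin(2 * np.pi * np.arange(P) / P), k) \
            + np.cumsum(0.3 * rng.standard_normal(P * k))

        # 1. fit a periodic model (average of periods) and subtract it to get residuals
        z_hat = x.reshape(k, P).mean(axis=0)
        residual = x - np.tile(z_hat, k)

        # 2. fit a one-step-ahead auto-regressive model (memory M) to the residuals
        M = 3
        windows = np.array([residual[t - M:t + 1][::-1] for t in range(M, residual.size - 1)])
        Z = np.column_stack([windows, np.ones(len(windows))])
        coef, *_ = np.linalg.lstsq(Z, residual[M + 1:], rcond=None)
        beta, v = coef[:-1], coef[-1]

        # 3. predict the next residual and add back the periodic model's value
        r_hat_next = residual[-1:-M - 2:-1] @ beta + v
        x_hat_next = z_hat[x.size % P] + r_hat_next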

    Example

     Melbourne temperature data residuals

     zoomed on 100 days in test set


    Auto-regressive prediction of residuals


    Prediction results for Melbourne temperature

     tested on last two years

    predictor                                  RMS error
    average                                    4.12
    current value                              2.57
    periodic (no smoothing)                    2.71
    periodic (smoothing, λ = 30)               2.62
    auto-regressive (M = 3)                    2.44
    auto-regressive (M = 20)                   2.27
    auto-regressive on residual (M = 20)       2.22
