Time Series
Karanveer Mohan, Keegan Go, Stephen Boyd
EE103, Stanford University
November 12, 2015
Outline
Introduction
Linear operations
Least-squares
Prediction
Time series data
represent time series x1, . . . , xT as T-vector x
xt is value of some quantity at time (period, epoch) t, t = 1, . . . , T
examples:
– average temperature at some location on day t
– closing price of some stock on (trading) day t
– hourly number of users on a website
– altitude of an airplane every 10 seconds
– enrollment in a class every quarter
vector time series: xt is an n-vector; can represent as T × n matrix
Melbourne temperature
zoomed to one year
[plot]
Apple stock price
log10 of Apple daily share price, over 30 years, 250 trading days/year
you can see (not steady) growth
[plot]
Log price of Apple
zoomed to one year
[plot]
Electricity usage in (one region of) Texas
total in 15 minute intervals, over 1 year
you can see variation over year
[plot]
Electricity usage in (one region of) Texas
zoomed to 1 month
you can see daily periodicity and weekend/weekday variation
[plot]
Outline
Introduction
Linear operations
Least-squares
Prediction
Down-sampling
k× down-sampled time series selects every kth entry of x
can be written as y = Ax
for 2× down-sampling, T even,
A = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 0
\end{bmatrix}
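As a concrete illustration, a minimal NumPy sketch of this selection matrix (the helper name and toy data are mine, not from the slides):

```python
import numpy as np

def downsample_matrix(T, k):
    """Selection matrix A with y = A @ x the k-times down-sampled series."""
    idx = np.arange(0, T, k)             # keep entries 1, k+1, 2k+1, ... (0-based: 0, k, 2k, ...)
    A = np.zeros((len(idx), T))
    A[np.arange(len(idx)), idx] = 1.0
    return A

x = np.arange(8.0)                       # toy series, T = 8
assert np.allclose(downsample_matrix(8, 2) @ x, x[::2])
```

In practice one would just slice x[::k]; the matrix form is what exhibits down-sampling as a linear operation.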
Down-sampling on Apple log price
4× down-sample
[plot]
Up-sampling
k× (linear) up-sampling interpolates between entries of x
can be written as y = Ax
for 2× up-sampling
A = \begin{bmatrix}
1 & & & \\
1/2 & 1/2 & & \\
 & 1 & & \\
 & 1/2 & 1/2 & \\
 & & \ddots & \\
 & & 1/2 & 1/2 \\
 & & & 1
\end{bmatrix}
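A sketch of the 2× case in NumPy (the function name is mine; the output length 2T − 1 is an assumption consistent with the matrix above):

```python
import numpy as np

def upsample2_matrix(T):
    """(2T - 1) x T matrix A; y = A @ x linearly interpolates between entries of x."""
    A = np.zeros((2 * T - 1, T))
    A[::2] = np.eye(T)                       # even output rows copy the entries of x
    A[1::2, :-1] += 0.5 * np.eye(T - 1)      # odd output rows average adjacent entries
    A[1::2, 1:] += 0.5 * np.eye(T - 1)
    return A

x = np.array([0.0, 2.0, 4.0])
print(upsample2_matrix(3) @ x)               # [0. 1. 2. 3. 4.]
```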
Up-sampling on Apple log price
4× up-sample
[plot]
Smoothing
k-long moving average y of x is given by
yi = (1/k)(xi + xi+1 + · · · + xi+k−1), i = 1, . . . , T − k + 1
can express as y = Ax, e.g., for k = 3,
A = \begin{bmatrix}
1/3 & 1/3 & 1/3 & & & \\
 & 1/3 & 1/3 & 1/3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1/3 & 1/3 & 1/3
\end{bmatrix}
can also have trailing or centered smoothing
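A one-line NumPy sketch; np.convolve with mode="valid" produces exactly the T − k + 1 averages defined above:

```python
import numpy as np

def moving_average(x, k):
    """k-long moving average of x; output has length T - k + 1."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

x = np.array([1.0, 2.0, 6.0, 4.0, 5.0])
print(moving_average(x, 3))                  # [3. 4. 5.]
```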
Melbourne daily temperature smoothed
centered smoothing with window size 41
[plot]
First-order differences
(first-order) difference between adjacent entries
discrete analog of derivative
express as y = Dx, D is the (T − 1) × T difference matrix
D = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{bmatrix}

‖Dx‖² is a measure of the wiggliness of x:

‖Dx‖² = (x2 − x1)² + · · · + (xT − xT−1)²
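A sketch in NumPy (the helper is mine; np.diff applies D without forming it):

```python
import numpy as np

def diff_matrix(T):
    """(T - 1) x T first-order difference matrix D."""
    return np.eye(T - 1, T, k=1) - np.eye(T - 1, T)

x = np.array([1.0, 3.0, 2.0, 5.0])
D = diff_matrix(len(x))
assert np.allclose(D @ x, np.diff(x))
print(np.sum((D @ x) ** 2))                  # ||Dx||^2, the wiggliness of x
```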
Outline
Introduction
Linear operations
Least-squares
Prediction
De-meaning
de-meaning a time series means subtracting its mean: x̃ = x − avg(x)
rms(x̃) = std(x)
this is the least-squares fit with a constant
Straight-line fit and de-trending
fit data (1, x1), . . . , (T, xT) with affine model xt ≈ a + bt (also called straight-line fit)
b is called the trend
a + bt is called the trend line
de-trending a time series means subtracting its straight-line fit
de-trended time series shows variations above and below the straight-line fit
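A minimal least-squares sketch of the straight-line fit and de-trending (names are mine, not from the slides):

```python
import numpy as np

def detrend(x):
    """Fit x_t ≈ a + b t by least squares; return a, b, and the de-trended series."""
    T = len(x)
    t = np.arange(1, T + 1)
    A = np.column_stack([np.ones(T), t])     # columns correspond to a and b
    (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)
    return a, b, x - (a + b * t)
```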
Straight-line fit on Apple log price
[plot: Trend]
[plot: Residual]
Periodic time series
let P-vector z be one period of periodic time series
xper = (z, z, . . . , z)
(we assume T is a multiple of P)
express as xper = Az, where A stacks T/P copies of the P × P identity matrix:

A = \begin{bmatrix} I_P \\ \vdots \\ I_P \end{bmatrix}
Extracting a periodic component
given (non-periodic) time series x, choose z to minimize ‖x − Az‖²
gives best least-squares fit with periodic time series
simple solution: average periods of original:
ẑ = (1/k)Aᵀx, k = T/P
e.g., to get ẑ for January 9, average all xi’s with date January 9
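Since A simply stacks identity matrices, ẑ = (1/k)Aᵀx is the average of the k periods; a sketch (the helper name is mine, assuming T is a multiple of P):

```python
import numpy as np

def periodic_fit(x, P):
    """Least-squares periodic component: average the k = T/P periods of x."""
    k = len(x) // P
    return x[: k * P].reshape(k, P).mean(axis=0)   # the P-vector z-hat
```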
Periodic component of Melbourne temperature
[plot]
Extracting a periodic component with smoothing
can add smoothing to periodic fit by minimizing
‖x − Az‖² + λ‖Dz‖²
λ > 0 is smoothing parameter
D is P × P circular difference matrix
D = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1 \\
1 & & & & -1
\end{bmatrix}
λ is chosen visually or by validation
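The two terms can be stacked into a single least-squares problem; a sketch (the helper name is mine, assuming T is a multiple of P; the sign convention in D does not affect ‖Dz‖²):

```python
import numpy as np

def smoothed_periodic_fit(x, P, lam):
    """Minimize ||x - A z||^2 + lam ||D z||^2 as one stacked least-squares problem."""
    k = len(x) // P
    A = np.tile(np.eye(P), (k, 1))                    # k stacked copies of I_P
    D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)     # circular difference matrix
    S = np.vstack([A, np.sqrt(lam) * D])
    b = np.concatenate([x[: k * P], np.zeros(P)])
    z, *_ = np.linalg.lstsq(S, b, rcond=None)
    return z
```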
Choosing smoothing via validation
split data into train and test sets, e.g., test set is last period (P entries)
train model on train set, and test on the test set
choose λ to (approximately) minimize error on the test set
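A hypothetical sweep, reusing smoothed_periodic_fit from the previous sketch on synthetic stand-in data (the slides use the Melbourne series):

```python
import numpy as np

rng = np.random.default_rng(0)
P = 365
x = 10 * np.sin(2 * np.pi * np.arange(10 * P) / P) + rng.standard_normal(10 * P)
x_train, x_test = x[: 8 * P], x[8 * P :]          # train on 8 years, test on the last 2

lambdas = np.logspace(-2, 2, 20)
errors = []
for lam in lambdas:
    z = smoothed_periodic_fit(x_train, P, lam)    # from the previous sketch
    pred = np.tile(z, len(x_test) // P)
    errors.append(np.sqrt(np.mean((x_test - pred) ** 2)))
best_lam = lambdas[int(np.argmin(errors))]        # lambda minimizing test RMS error
```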
Validation of smoothing for Melbourne temperature
trained on first 8 years; tested on last two years
[plot: test RMS error versus λ]
Periodic component of temperature with smoothing
zoomed on test set, using λ = 30
[plot]
Outline
Introduction
Linear operations
Least-squares
Prediction
Prediction
goal: predict or guess xt+K given x1, . . . , xt
K = 1 is one-step-ahead prediction
prediction is often denoted x̂t+K, or more explicitly x̂(t+K|t) (estimate of xt+K at time t)
x̂t+K − xt+K is prediction error
applications: predict
– asset price
– product demand
– electricity usage
– economic activity
– position of vehicle
Some simple predictors
constant: x̂t+K = a
current value: x̂t+K = xt
linear (affine) extrapolation from last two values: x̂t+K = xt + K(xt − xt−1)
average to date: x̂t+K = avg(x1:t)
(M + 1)-period rolling average: x̂t+K = avg(x(t−M):t)
straight-line fit to date (i.e., based on x1:t)
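A sketch of a few of these (the helper, its 0-based indexing, and the default M are mine):

```python
import numpy as np

def simple_predictions(x, t, K, M=7):
    """K-step-ahead guesses of x_{t+K} from x_1, ..., x_t (stored in x[:t], 0-based)."""
    w = x[:t]
    return {
        "current value": w[-1],
        "extrapolation": w[-1] + K * (w[-1] - w[-2]),
        "average to date": w.mean(),
        "rolling average": w[-(M + 1):].mean(),   # (M+1)-period rolling average
    }
```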
Auto-regressive predictor
auto-regressive predictor:
x̂t+K = (xt, xt−1, . . . , xt−M)ᵀβ + v
– M is memory length
– (M + 1)-vector β gives predictor weights; v is offset
prediction x̂t+K is affine function of past window x(t−M):t
(which of the simple predictors above have this form?)
Least-squares fitting of auto-regressive models
choose coefficients β, offset v via least-squares (regression)
regressors are (M + 1)-vectors x1:(M+1), . . . , x(N−M):N
outcomes are numbers xM+K+1, . . . , xN+K
can add regularization on β
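A sketch of the regression setup (1-based slide indexing translated to 0-based NumPy; names are mine):

```python
import numpy as np

def fit_ar(x, M, K=1):
    """Least-squares fit of x_hat_{t+K} = (x_t, ..., x_{t-M})^T beta + v."""
    T = len(x)
    windows = [x[t - M - 1 : t][::-1] for t in range(M + 1, T - K + 1)]  # newest entry first
    X = np.column_stack([np.array(windows), np.ones(len(windows))])     # last column: offset v
    y = x[M + K : T]                                                    # outcomes x_{M+K+1}, ..., x_T
    beta_v, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_v[:-1], beta_v[-1]                                      # beta, v
```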
Evaluating predictions with validation
for simple methods: evaluate RMS prediction error
for more sophisticated methods:
– split data into a training set and a test set (usually sequential)
– train predictor on training data
– test on test data
Example
predict Texas energy usage one step ahead (K = 1)
train on first 10 months, test on last 2
Coefficients

using M = 100
0 is the coefficient for today
[plot: the coefficients, indexed 0 to 100]
Auto-regressive prediction results
[plot]
Auto-regressive prediction results
showing the residual
[plot]
Auto-regressive prediction results

predictor                              RMS error
average (constant)                     1.20
current value                          0.119
auto-regressive (M = 10)               0.073
auto-regressive (M = 100)              0.051
Autoregressive model on residuals
fit a model to the time series, e.g., linear or periodic
subtract this model from the original signal to compute residuals
apply auto-regressive model to predict residuals
can add predicted residuals back to model to obtain predictions
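A sketch combining the earlier pieces (periodic_fit and fit_ar from the previous sketches; x, P, and M are assumed defined, with T a multiple of P, and K = 1):

```python
import numpy as np

# baseline model: periodic component, repeated over the whole record
z = periodic_fit(x, P)                       # from the earlier sketch
model = np.tile(z, len(x) // P)
r = x - model                                # residual time series

beta, v = fit_ar(r, M, K=1)                  # AR model fit on the residuals

# one-step-ahead prediction for time T + 1:
T = len(x)
r_hat = r[T - M - 1 : T][::-1] @ beta + v    # predicted next residual
x_hat = z[T % P] + r_hat                     # periodic value at T + 1 plus predicted residual
```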
Example

Melbourne temperature data residuals
zoomed on 100 days in test set
[plot]
Auto-regressive prediction of residuals
[plot]
Prediction results for Melbourne temperature

tested on last two years

predictor                              RMS error
average                                4.12
current value                          2.57
periodic (no smoothing)                2.71
periodic (smoothing, λ = 30)           2.62
auto-regressive (M = 3)                2.44
auto-regressive (M = 20)               2.27
auto-regressive on residual (M = 20)   2.22