Time Series Analysis
Topics in Machine Learning, Fall 2011
School of Electrical Engineering and Computer Science
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
Why Time Series Analysis?
• Sometimes the concept we want to learn is the relationship between points in time
What is a time series?
Time series: a sequence of measurements over time
A sequence of random variables x1, x2, x3, …
Time Series Examples
Definition: A sequence of measurements over time
Finance, social science, epidemiology, medicine, meteorology, speech, geophysics, seismology, robotics
Three Approaches
• Time domain approach
  – Analyze dependence of current value on past values
• Frequency domain approach
  – Analyze periodic sinusoidal variation
• State space models
  – Represent state as a collection of variable values
  – Model transitions between states
Sample Time Series Data
Johnson & Johnson quarterly earnings/share, 1960-1980
Sample Time Series Data
Yearly average global temperature deviations
Sample Time Series Data
Speech recording of “aaa…hhh”, 10,000 points per second
Sample Time Series Data
NYSE daily weighted market returns
Not all time data will exhibit strong patterns…
LA annual rainfall
…while in others the patterns are apparent
Canadian hare counts
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
Definitions
• Mean: x̄ = (1/N) Σ_i x_i
• Variance: σ²_x = (1/N) Σ_i (x_i − x̄)²
Definitions
• Covariance: Cov(X, Y) = (1/N) Σ_i (x_i − x̄)(y_i − ȳ)
• Correlation: Cor(X, Y) = r = Cov(X, Y) / (σ_X σ_Y)
[Scatterplots of Y vs. X illustrating r = −1, r = −.6, r = 0, r = +.3, r = +1, and a nonlinear relationship with r = 0]
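A minimal NumPy sketch (not from the slides) of the covariance and correlation definitions above, using made-up data:

```python
import numpy as np

# Covariance and correlation per the slide's definitions
# (population form: divide by N, as np.std does by default).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])   # roughly 2x: expect r near +1

N = len(x)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / N
r = cov_xy / (x.std() * y.std())

print(cov_xy, r)
```

Because y is almost exactly a linear function of x, r comes out just below 1, matching the r = +1 scatterplot case.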
Correlation Redefined for Time
• Mean function: μ_X(t) = E(X_t), for t = 0, ±1, ±2, … (Ergodic?)
• Autocovariance: γ_X(h) = Cov(X_{t+h}, X_t), where h is the lag
• Autocorrelation: ρ_X(h) = γ_X(h) / γ_X(0) = Cor(X_{t+h}, X_t)
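The sample versions of these definitions can be sketched in a few lines of NumPy (an illustration, not the slides' code):

```python
import numpy as np

# Sample autocovariance gamma(h) and autocorrelation rho(h) at lag h.
def autocovariance(x, h):
    x = np.asarray(x, dtype=float)
    n, mu = len(x), x.mean()
    return np.sum((x[h:] - mu) * (x[:n - h] - mu)) / n

def autocorrelation(x, h):
    return autocovariance(x, h) / autocovariance(x, 0)

# White noise: rho(0) = 1 by definition, rho(h) near 0 for h > 0.
rng = np.random.default_rng(0)
noise = rng.normal(size=5000)
print(autocorrelation(noise, 0), autocorrelation(noise, 1))
```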
Autocorrelation Examples
[Plots: a series with positive autocorrelation at its lag, and a series with negative autocorrelation at its lag]
Stationarity – When there is no dependence on time
• {X_t} is stationary if
  – μ_X(t) is independent of t
  – γ_X(t+h, t) is independent of t for each h
• In other words, properties of each section are the same
• Special case: white noise
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
Linear Regression
• Fit a line to the data: y = βx + α
• Ordinary least squares
  – Minimize sum of squared distances between points and line
• Try this out at http://hspm.sph.sc.edu/courses/J716/demos/LeastSquares/LeastSquaresDemo.html
R²: Evaluating Goodness of Fit
• Least squares minimizes the combined residual: RSS = Σ_i (y_i − ŷ_i)²
• Explained sum of squares is the difference between line and mean: ESS = Σ_i (ŷ_i − ȳ)²
• Total sum of squares is the total of these two: TSS = ESS + RSS = Σ_i (y_i − ȳ)²
R²: Evaluating Goodness of Fit
• R², the coefficient of determination: R² = ESS/TSS = 1 − RSS/TSS
• 0 ≤ R² ≤ 1
• Regression minimizes RSS and so maximizes R²
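A short NumPy sketch of the fit and the three sums of squares, with invented data (not from the slides):

```python
import numpy as np

# Ordinary least squares fit y = beta*x + alpha, then R^2 = 1 - RSS/TSS.
x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.2])   # roughly 2x + 1

beta, alpha = np.polyfit(x, y, 1)      # slope, intercept
y_hat = beta * x + alpha

rss = np.sum((y - y_hat) ** 2)         # residual sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total: equals ESS + RSS for OLS

r2 = 1 - rss / tss
print(beta, alpha, r2)
```

The identity TSS = ESS + RSS holds exactly for a least-squares fit with an intercept, which is why the two expressions for R² agree.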
Linear Regression
• Can report:
  – Direction of trend (>0, <0, 0)
  – Steepness of trend (slope)
  – Goodness of fit to trend (R²)
Examples
What if a linear trend does not fit my data well?
• Could be no relationship
• Could be too much local variation
  – Want to look at longer-term trend
  – Smooth the data
• Could have periodic or seasonality effects
  – Add seasonal components: X_t = a + b1·t + b2·Q1 + b3·Q2 + b4·Q3 + b5·Q4
• Could be a nonlinear relationship
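The trend-plus-quarterly-dummies model can be fit by ordinary least squares. A sketch with synthetic quarterly data (the coefficients and data here are made up for illustration):

```python
import numpy as np

# Fit X_t = b1*t + b2*Q1 + b3*Q2 + b4*Q3 + b5*Q4 by least squares,
# where Qk is 1 in quarter k and 0 otherwise. With no separate
# intercept, the dummies absorb the overall level a.
n = 40                                        # 10 years, quarterly
t = np.arange(n, dtype=float)
quarter = t.astype(int) % 4
seasonal = np.array([2.0, -1.0, 0.5, -1.5])   # true quarterly effects
x = 0.3 * t + seasonal[quarter]               # trend + seasonality

# Design matrix: trend column plus one dummy column per quarter.
D = np.column_stack([t] + [(quarter == k).astype(float) for k in range(4)])
coef, *_ = np.linalg.lstsq(D, x, rcond=None)
print(coef)   # slope, then the four quarterly levels
```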
Moving Average
• Compute an average of the last m consecutive data points
• 4-point moving average: x_t^(MA4) = (x_t + x_{t−1} + x_{t−2} + x_{t−3}) / 4
• In general: m_t = Σ_{j=−k}^{k} a_j x_{t−j}
• Smooths white noise
• Can apply higher-order MA
• Exponential smoothing
• Kernel smoothing
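The moving average above is a one-liner with convolution; a sketch (not the slides' code):

```python
import numpy as np

# m-point moving average, matching the slide's 4-point example.
def moving_average(x, m):
    # each output is the mean of m consecutive points; 'valid' drops edges
    return np.convolve(x, np.ones(m) / m, mode="valid")

print(moving_average([1, 2, 3, 4, 5], 4))   # [2.5 3.5]

# Smoothing white noise: an m-point average cuts the variance to about 1/m.
rng = np.random.default_rng(1)
noise = rng.normal(size=10_000)
print(noise.var(), moving_average(noise, 4).var())
```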
Power Load Data
[Plots: power load data smoothed with 5-week and 53-week moving averages]
Piecewise Aggregate Approximation
• Segment the data into linear pieces
Interesting paper
Nonlinear Trend Examples
Nonlinear Regression
Fit Known Distributions
ARIMA: Putting the pieces together
• Autoregressive model of order p: AR(p)
  x_t = φ1·x_{t−1} + φ2·x_{t−2} + … + φp·x_{t−p} + w_t
• Moving average model of order q: MA(q)
  x_t = w_t + θ1·w_{t−1} + θ2·w_{t−2} + … + θq·w_{t−q}
• ARMA(p,q)
  – A time series is ARMA(p,q) if it is stationary and
    x_t = φ1·x_{t−1} + … + φp·x_{t−p} + w_t + θ1·w_{t−1} + … + θq·w_{t−q}
[Plots: simulated AR(1) paths, t = 0..100, with φ = 0.9 (smooth wandering) and φ = −0.9 (rapid oscillation)]
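The AR(1) behavior in the φ = 0.9 vs. φ = −0.9 plots is easy to reproduce; a minimal simulation sketch (not the slides' code):

```python
import numpy as np

# Simulate AR(1): x_t = phi * x_{t-1} + w_t, with w_t white noise.
def simulate_ar1(phi, n, rng):
    w = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + w[t]
    return x

rng = np.random.default_rng(42)
smooth = simulate_ar1(0.9, 100, rng)    # positive phi: slow wandering
jagged = simulate_ar1(-0.9, 100, rng)   # negative phi: rapid sign flips

# Lag-1 sample autocorrelations should have opposite signs.
def rho1(x):
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc * xc)

print(rho1(smooth), rho1(jagged))
```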
ARIMA (AutoRegressive Integrated Moving Average)
• ARMA only applies to stationary process• Apply differencing to obtain stationarity
– Replace its value by incremental change from last value
• A process xt is ARIMA(p,d,q) if– AR(p)– MA(q)– Differenced d times
• Also known as Box-Jenkins

Differenced | x1 | x2    | x3         | x4
1 time      |    | x2−x1 | x3−x2      | x4−x3
2 times     |    |       | x3−2x2+x1  | x4−2x3+x2
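The differencing table corresponds directly to repeated `np.diff`; a quick sketch:

```python
import numpy as np

# Differencing to obtain stationarity before fitting ARMA.
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # quadratic trend t^2

d1 = np.diff(x)         # 1 time:  x2-x1, x3-x2, ...
d2 = np.diff(x, n=2)    # 2 times: x3-2*x2+x1, ...

print(d1)   # [3. 5. 7. 9.]
print(d2)   # [2. 2. 2.]  -- a quadratic trend vanishes after d = 2
```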
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
Express Data as Fourier Frequencies
• Time domain
  – Express present as function of the past
• Frequency domain
  – Express present as function of oscillations, or sinusoids
Time Series Definitions
• Frequency, ω, measured in cycles per time point
• J&J data
  – 1 cycle each year
  – 4 data points (time points) each cycle
  – ω = 0.25 cycles per data point
• Period of a time series, T = 1/ω
  – J&J: T = 1/0.25 = 4
  – 4 data points per cycle
  – Note: need at least 2 data points per cycle
Fourier Series
• Time series is a mixture of oscillations
  – Can describe each by amplitude, frequency, and phase
  – Can also describe as a sum of amplitudes at all time points
  – (or magnitudes at all frequencies)
  – If we allow for mixtures of periodic series, then
    x_t = U1·cos(2πωt) + U2·sin(2πωt)
Take a look
x_t = Σ_{i=1}^{q} [U_{i1}·cos(2πω_i t) + U_{i2}·sin(2πω_i t)]
Example
x_t1 = 2·cos(2πt·6/100) + 3·sin(2πt·6/100)
x_t2 = 4·cos(2πt·10/100) + 5·sin(2πt·10/100)
x_t3 = 6·cos(2πt·40/100) + 7·sin(2πt·40/100)
x_t = x_t1 + x_t2 + x_t3
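Building this three-sinusoid mixture takes a few lines of NumPy (a sketch, not the slides' code):

```python
import numpy as np

# The example series: three sinusoids at frequencies 6/100, 10/100, 40/100.
t = np.arange(1, 101)
x1 = 2 * np.cos(2 * np.pi * t * 6 / 100) + 3 * np.sin(2 * np.pi * t * 6 / 100)
x2 = 4 * np.cos(2 * np.pi * t * 10 / 100) + 5 * np.sin(2 * np.pi * t * 10 / 100)
x3 = 6 * np.cos(2 * np.pi * t * 40 / 100) + 7 * np.sin(2 * np.pi * t * 40 / 100)
x = x1 + x2 + x3

# Over t = 1..100 each component completes whole cycles, so the mean is 0.
print(len(x), x.mean())
```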
How Compute Parameters?
• Regression: fit x_t = Σ_{j=1}^{n/2} [a(j/n)·cos(2πtj/n) + b(j/n)·sin(2πtj/n)]
• Discrete Fourier Transform: d(j/n) = n^(−1/2) Σ_{t=1}^{n} x_t e^(−2πitj/n)
• DFTs represent amplitude and phase of series components
• Can use redundancies to speed it up (FFT)
Breaking down a DFT
• Amplitude: A(j/n) = |d(j/n)| = sqrt(Re(d(j/n))² + Im(d(j/n))²)
• Phase: φ(j/n) = tan^(−1)(Im(d(j/n)) / Re(d(j/n)))
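Amplitude and phase drop out of `np.fft.fft` directly; a sketch (note that NumPy's FFT is unnormalized, so we divide by √n to match the d(j/n) defined above):

```python
import numpy as np

# Amplitude and phase of the DFT for a single cosine at frequency 10/100.
t = np.arange(1, 101)
x = 4 * np.cos(2 * np.pi * t * 10 / 100)

n = len(x)
d = np.fft.fft(x) / np.sqrt(n)        # d(j/n) for j = 0..n-1
amplitude = np.abs(d)                  # A(j/n) = |d(j/n)|
phase = np.arctan2(d.imag, d.real)     # phase(j/n)

# A cosine of amplitude 4 at an exact Fourier frequency gives
# |d(10/100)| = sqrt(n) * 4 / 2 = 20; other bins are (numerically) zero.
print(amplitude[10], amplitude[5])
```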
Example
[Plots: GBP series reconstructed from its 1, 2, 3, 5, 10, and 20 largest frequency components]
Periodogram
• Measure of squared correlation between
  – Data and
  – Sinusoids oscillating at frequency j/n
  – Compute quickly using FFT
• P(j/n) = ((2/n) Σ_{t=1}^{n} x_t cos(2πtj/n))² + ((2/n) Σ_{t=1}^{n} x_t sin(2πtj/n))²
Example
P(6/100) = 13, P(10/100) = 41, P(40/100) = 85
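These peak values are exactly the squared amplitudes of the three components (2²+3² = 13, 4²+5² = 41, 6²+7² = 85), which a direct computation of the scaled periodogram confirms (a sketch, not the slides' code):

```python
import numpy as np

# Scaled periodogram of the three-sinusoid example series.
n = 100
t = np.arange(1, n + 1)
x = (2 * np.cos(2 * np.pi * t * 6 / n) + 3 * np.sin(2 * np.pi * t * 6 / n)
     + 4 * np.cos(2 * np.pi * t * 10 / n) + 5 * np.sin(2 * np.pi * t * 10 / n)
     + 6 * np.cos(2 * np.pi * t * 40 / n) + 7 * np.sin(2 * np.pi * t * 40 / n))

def periodogram(x, j):
    # P(j/n) per the slide: squared cosine and sine projections.
    c = (2 / n) * np.sum(x * np.cos(2 * np.pi * t * j / n))
    s = (2 / n) * np.sum(x * np.sin(2 * np.pi * t * j / n))
    return c ** 2 + s ** 2

print(periodogram(x, 6), periodogram(x, 10), periodogram(x, 40))
# -> 13.0 41.0 85.0 (up to floating point)
```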
Wavelets
• Can break series up into segments
  – Called wavelets
  – Analyze a window of time separately
  – Variable-sized windows
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
State Space Models
• Current situation represented as a state
  – Estimate state variables from noisy observations over time
  – Estimate transitions between states
• Kalman Filters
  – Similar to HMMs
• HMMs model discrete variables
• Kalman filters model continuous variables
Conceptual Overview
• Lost on a 1-dimensional line
• Receive occasional sextant position readings
  – Possibly incorrect
• Position x(t), velocity x′(t)
Conceptual Overview
[Plot: Gaussian distribution over position]
• Current location distribution is Gaussian
• Transition model is linear Gaussian
• The sensor model is linear Gaussian
• Sextant measurement at t_i: mean = μ_i and variance = σ²_i
• Measured velocity at t_i: mean = μ′_i and variance = σ′²_i
Noisy information
Kalman Filter Algorithm
• Start with current location (Gaussian)
• Predict next location
  – Use current location
  – Use transition function (linear Gaussian)
  – Result is Gaussian
• Get next sensor measurement (Gaussian)
• Correct prediction
  – Weighted mean of previous prediction and measurement
Conceptual Overview
• We generate the prediction for time i+1; the prediction is Gaussian
• GPS measurement: mean = μ_{i+1} and variance = σ²_{i+1}
• They do not match
[Plot: prediction vs. measurement at i+1]
Conceptual Overview
• Corrected mean is the new optimal estimate of position
• New variance is smaller than either of the previous two variances
[Plot: prediction, measurement at i+1, and the narrower corrected estimate]
Updating Gaussian Distributions
• One-step predicted distribution is Gaussian
  P(X_{t+1} | e_{1:t}) = ∫ P(X_{t+1} | x_t) P(x_t | e_{1:t}) dx_t
                          (transition × previous step)
• After new (linear Gaussian) evidence, updated distribution is Gaussian
  P(X_{t+1} | e_{1:t+1}) ∝ P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})
                           (new measurement × prior)
Why Is Kalman Great?
• The method, that is…
• Representing state-based series with general continuous variables grows without bound; with linear Gaussian models the posterior stays Gaussian
Why Is Time Series Important?
• Time is an important component of many processes
• Do not ignore time in learning problems
• ML can benefit from, and in turn benefit, these techniques
  – Dimensionality reduction of series
  – Rule discovery
  – Cluster series
  – Classify series
  – Forecast data points
  – Anomaly detection