Introduction to Time Series Forecasting

Page 1: Introduction to Time Series Forecasting

Introduction to Time Series Forecasting: With Case Studies in NLP

A Tutorial at ICON 2019

Sandhya Singh & Kevin Patel

CFILT: Center for Indian Language Technology

December 18, 2019

Sandhya and Kevin Time Series Forecasting 1

Page 2: Introduction to Time Series Forecasting

Overview

We will highlight how NLP people are also well suited to work on Time Series problems.

We will provide background information on Time Series Forecasting.

We will discuss some statistical approaches, some classical machine learning approaches, and some deep learning approaches for time series forecasting.

We will mention some commonalities between NLP and Time Series, and how one can assist the other.


Page 3: Introduction to Time Series Forecasting

Outline

1 Introduction: Time Series; Time Series Forecasting

2 Background: Time Series Components; Time Series Categorization; Time Series Forecasting Terminology

3 Statistical Methods: Simple Models; Auto Regressive Models; Evaluation Metrics

4 Classical ML Models: Preparing Data; ML Models


Page 4: Introduction to Time Series Forecasting

Outline

5 Deep Learning Models

6 Connection with NLP: Problem Level; Tooling Level

7 Demos: Statsmodels Library; Prophet Library

8 Conclusion


Page 5: Introduction to Time Series Forecasting

Introduction · Background · Statistical Methods · Classical ML Models · Deep Learning Models · Connection with NLP · Demos · Conclusion · References

Acknowledgement

We thank

The ICON 2019 committee, for accepting our proposal

LGSoft, for their joint project with CFILT. Our investigations into the same germinated the idea of this tutorial

Page 6: Introduction to Time Series Forecasting

Some Context Regarding the Tutorial

Why are we (people working in NLP) talking about Time Series?

Text and Time Series - both are Sequential Data

Commonality - Exploit structure we know about the problem in advance

The sequential nature in this case

Similar tools, different terminology

Knowing the terminology will enable us to apply our knowledge of tools in this area too

Page 7: Introduction to Time Series Forecasting

Target Audience

People who have not worked with Time Series data

Subsumes

Those who are completely new to ML

Those who have basic knowledge of how to apply ML

Those who are proficient in ML and/or are working in Natural Language Processing


Page 8: Introduction to Time Series Forecasting

Introduction


Page 9: Introduction to Time Series Forecasting

Introduction: Time Series


Page 10: Introduction to Time Series Forecasting


What is a Time Series?

Definition

A time series is a sequence of observations ordered in time.

Xt ; t = 0, 1, 2, 3, . . .

The observations are collected over a fixed time interval

The time dimension adds structure and constraint to the data

[Figure: Nifty 50 Index as of 13/10/2019 15:31 IST]

Page 11: Introduction to Time Series Forecasting

Where (and When) does One Encounter Time Series?

As a time series of our life!

Economy and Finance: Exchange rates, Interest rates, Employment rate, Financial indices

Meteorology: Properties of weather like temperature, humidity, wind speed, etc.

Medicine: Physiological signals (EEG), heart rate, patient temperature, etc.

Other venues:

Industry: Electric load, power consumption, resource consumption

Web: Clicks, Logs

Page 12: Introduction to Time Series Forecasting

What is Time Series Analysis?

Applying statistical approaches to time series data

Will enable one to

Predict the future based on the past

Understand the underlying mechanism which generates the data

Control the mechanism

Describe salient features of the data


Page 13: Introduction to Time Series Forecasting

Introduction: Time Series Forecasting


Page 14: Introduction to Time Series Forecasting


Time Series Forecasting

Predict future based on past

Extrapolation in classical statistical terminology

Month | No. of Passengers (in thousands)
1949-02 | 118
1949-03 | 132
1949-04 | 129
1949-05 | 121
1949-06 | 135
1949-07 | ?
1949-08 | ?
1949-09 | ?

Page 15: Introduction to Time Series Forecasting

Forecasting - Yes or No?

Determining whether tomorrow a stock will go up/down or stay put?

Given a voice recording, who is the speaker?

Given a voice recording, who is speaking after the current speaker?

Given an ECG plot, is the heart functioning normally or abnormally?

Given an ECG plot, predict whether the person will have a heart related issue in the next month.

Page 16: Introduction to Time Series Forecasting

Forecasting - Yes or No?

Determining whether tomorrow a stock will go up/down or stay put? - YES

Given a voice recording, who is the speaker? - NO

Given a voice recording, who is speaking after the current speaker? - NO

Given an ECG plot, is the heart functioning normally or abnormally? - NO

Given an ECG plot, predict whether the person will have a heart related issue in the next month. - YES

Page 17: Introduction to Time Series Forecasting

Advantages of Time Series Forecasting

Reliability:

Given the forecast of power surges in your area, you can check whether your home's wiring is reliable or not.

Preparing for Seasons:

Looking at the patterns from previous Christmas events, stock your warehouse for the upcoming Christmas accordingly

Given that the south east coast of India experiences cyclones during monsoons, pre-allocate rescue and relief resources

Estimating trends:

Given the trend of a particular stock, should I invest in it?

Page 18: Introduction to Time Series Forecasting

Time Series Forecasting and Machine Learning

Forecasting - predicting the future from the past

Given observed values Y1, . . . ,Yt , predict Yt+1

In other words, learn f such that

Yt+1 = f (Y1, . . . ,Yt) (1)

Machine Learning practitioners should easily be able to relate this expression to

Y = f (X ) (2)

Are ML skills applicable? - Yes
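The jump from Yt+1 = f(Y1, . . . , Yt) to the familiar Y = f(X) can be made concrete by reframing the series as a supervised learning table of lagged inputs and targets. A minimal sketch (the function name and the window size of 3 are illustrative choices, not from the tutorial):

```python
# Frame a univariate series as a supervised learning problem:
# each row of X holds the previous `window` values, y holds the next value.
def series_to_supervised(series, window=3):
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])  # lagged inputs Y_{t-window} .. Y_{t-1}
        y.append(series[i])             # target Y_t
    return X, y

series = [112, 118, 132, 129, 121, 135]
X, y = series_to_supervised(series, window=3)
print(X)  # [[112, 118, 132], [118, 132, 129], [132, 129, 121]]
print(y)  # [129, 121, 135]
```

Any regressor that learns Y = f(X) can then be trained on these (X, y) pairs.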

Page 19: Introduction to Time Series Forecasting

AI/ML : NOT a Silver Bullet

AI/ML are multipliers, and not a silver bullet.

Consider the example:

EMNLP - Empirical Methods in Natural Language Processing - a top tier NLP conference

EMNLP 2015 was informally called Embedding Methods in Natural Language Processing

This is due to the sheer number of papers about word embeddings

More or less implying that word embeddings are the silver bullet

If that were the case, shouldn't all problems be solved by now?

Shouldn't ACL, etc. close shop?

That is not the case

Domain knowledge is still needed for proper utilization of ML

So let's discuss some background to gain time series domain knowledge


Page 20: Introduction to Time Series Forecasting

Background


Page 21: Introduction to Time Series Forecasting

Background: Time Series Components


Page 22: Introduction to Time Series Forecasting


Time Series Components

Level

The average value of a time series

Trend

A long term pattern present in the time series

Can be positive, negative, linear or nonlinear

If there is no increasing or decreasing trend, the time series is stationary

i.e. the data has constant mean and variance over time

Page 23: Introduction to Time Series Forecasting

Time Series Components (contd.)

Seasonality

Regular and predictable changes that recur in regular short intervals

Largely due to involvement of periodically occurring factors

Cyclicality

Changes that recur in irregular intervals

As opposed to fixed period intervals in seasonality

Noise / Irregularity / Residual

Random variations that do not repeat in the pattern

Page 24: Introduction to Time Series Forecasting

Time Series Components for Airline Passenger Data


Page 25: Introduction to Time Series Forecasting

Background: Time Series Categorization


Page 26: Introduction to Time Series Forecasting


Categorization of Time Series Problem Formulation

Based on the number of inputs

Univariate vs. Multivariate

Based on the number of time steps predicted in the output

One step forecasting vs. Multi step forecasting

Based on the modeling of interactions between different components

Additive vs. Multiplicative models

Page 27: Introduction to Time Series Forecasting

Univariate Time Series

Single Time Dependent Variable

Examples:

Monthly Airline Passenger Data

Month | No. of Passengers (in thousands)
1949-02 | 118
1949-03 | 132
1949-04 | 129
1949-05 | 121
1949-06 | 135

Page 28: Introduction to Time Series Forecasting

Multivariate Time Series

Multiple time dependent variables

Can be considered as multiple univariate time series that need to be analyzed jointly

Example: Rainfall Forecast

Date | Humidity | Temperature | Rainfall
01/01/18 | 36.81 | 16.222 | 30.25
02/01/18 | 34.438 | 18.146 | 29.26
03/01/18 | 29.291 | 19.002 | 28.26
04/01/18 | 30.712 | 19.279 | 29.54
05/01/18 | 32.352 | 19.494 | 30.12
06/01/18 | 31.952 | 20.894 | 27.63

Page 29: Introduction to Time Series Forecasting

Categorization of Time Series Problem Formulation

Based on the number of inputs

Univariate vs. Multivariate

Based on the number of time steps predicted in the output

One step forecasting vs. Multi step forecasting

Based on the modeling of interactions between different components

Additive vs. Multiplicative models

Page 30: Introduction to Time Series Forecasting

One Step vs. Multi Step Forecasting

One Step Forecasting

Given data up to time t, predict the value only for the next one step i.e. at t + 1

Multi Step Forecasting

Given data up to time t, predict values for two or more steps i.e. at t + 1, t + 2, t + 3, . . .
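The two settings can be sketched with a toy recursive forecaster. Here the one-step model is a naive stand-in that predicts the last value (any one-step model could be plugged in), and multi-step forecasting feeds each prediction back in as if it were observed:

```python
# Recursive multi-step forecasting: each prediction is appended to the
# history and used as input for the next step. `one_step` is a naive
# stand-in forecaster (predicts the last observed value).
def one_step(history):
    return history[-1]

def multi_step(history, steps):
    history = list(history)
    forecasts = []
    for _ in range(steps):
        yhat = one_step(history)   # predict the next value
        forecasts.append(yhat)
        history.append(yhat)       # pretend the forecast was observed
    return forecasts

print(multi_step([118, 132, 129], steps=3))  # [129, 129, 129]
```

Because later steps consume earlier predictions rather than true values, errors compound as the horizon grows.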

Page 31: Introduction to Time Series Forecasting

One Step vs. Multi Step Forecasting (contd.)

[Figures: One Step Prediction for 3 steps; Multi Step Prediction for 3 steps]

Page 32: Introduction to Time Series Forecasting

One Step vs. Multi Step Forecasting (contd.)

[Figures: One Step Prediction for 8 steps; Multi Step Prediction for 8 steps]

Note how close the prediction is to the true value in the case of one step prediction

Page 33: Introduction to Time Series Forecasting

One Step vs. Multi Step Forecasting (contd.)

This guy should NOT use multi step forecasting

Img Src: https://xkcd.com/605/

Page 34: Introduction to Time Series Forecasting

Categorization of Time Series Problem Formulation

Based on the number of inputs

Univariate vs. Multivariate

Based on the number of time steps predicted in the output

One step forecasting vs. Multi step forecasting

Based on the modeling of interactions between different components

Additive vs. Multiplicative models

Page 35: Introduction to Time Series Forecasting

Additive vs. Multiplicative Models

Additive models:

The series is additively dependent on the different components

Y = Level + Trend + Seasonality + Noise (3)

Multiplicative models:

The series is multiplicatively dependent on the different components

Y = Level × Trend × Seasonality × Noise (4)
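A synthetic illustration of the two compositions (all numbers are made up; the multiplicative series here scales the seasonal swing with the level, a common simplification of the pure product form in equation (4)):

```python
import numpy as np

# Synthetic illustration: the same level, trend and seasonality combined
# additively vs. multiplicatively (all parameter values are made up).
t = np.arange(48)
level = 100.0
trend = 2.0 * t
seasonality = 10.0 * np.sin(2 * np.pi * t / 12)   # period of 12 steps

# Additive: the seasonal swing has constant amplitude.
additive = level + trend + seasonality

# Multiplicative (simplified form): the seasonal swing grows with the level.
multiplicative = (level + trend) * (1 + seasonality / 100)
```

Plotting both makes the visual difference from the slides reproducible: the additive series oscillates within a band of constant width, while the multiplicative one fans out over time.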

Page 36: Introduction to Time Series Forecasting

Additive vs. Multiplicative Models (contd.)

Comparison of Additive and Multiplicative Seasonality

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 37: Introduction to Time Series Forecasting

Additive or Multiplicative?

Additive

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 39: Introduction to Time Series Forecasting

Additive or Multiplicative?

Multiplicative

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 41: Introduction to Time Series Forecasting

Additive or Multiplicative?

Multiplicative

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 43: Introduction to Time Series Forecasting

Additive or Multiplicative?

Additive

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 45: Introduction to Time Series Forecasting

Additive or Multiplicative?

Multiplicative

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 47: Introduction to Time Series Forecasting

Additive or Multiplicative?

Additive

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 49: Introduction to Time Series Forecasting

Dealing with Multiplicative Models

[Figures: Passenger Data is Multiplicative; Log(Passenger) Data is Additive]
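The log trick works because log(Level × Trend × Seasonality × Noise) = log Level + log Trend + log Seasonality + log Noise, so a multiplicative series becomes additive in log space. A sketch on a synthetic multiplicative series (the numbers are illustrative):

```python
import numpy as np

# A synthetic multiplicative series: the seasonal swing grows with the level.
t = np.arange(48)
series = (100 + 2 * t) * (1 + 0.1 * np.sin(2 * np.pi * t / 12))

# In log space the components combine additively.
logged = np.log(series)

# Subtracting the log of the trend leaves a seasonal component whose
# amplitude no longer grows over time.
seasonal_in_log = logged - np.log(100 + 2 * t)
```

After this transform, additive-model machinery (decomposition, additive forecasting) can be applied, and forecasts are mapped back with exp.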


Page 50: Introduction to Time Series Forecasting

Background: Time Series Forecasting Terminology


Page 51: Introduction to Time Series Forecasting


Correlations

Captures the relation between two series

r = Corr(X ,Y ) = Cov(X ,Y ) / (σX σY ) = E[(X − µX)(Y − µY)] / (σX σY )

Img Src: https://en.wikipedia.org/wiki/Correlation_and_dependence
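A quick numerical check of the formula (the data values are made up): the covariance-based expression and numpy's built-in corrcoef agree.

```python
import numpy as np

# Pearson correlation two ways: directly from the covariance formula,
# and via numpy's built-in corrcoef.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])  # roughly 2*x, so r is near 1

r_manual = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())
r_numpy = np.corrcoef(x, y)[0, 1]
```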

Page 52: Introduction to Time Series Forecasting

Spurious Correlations

Img Src: https://www.tylervigen.com/spurious-correlations

Page 53: Introduction to Time Series Forecasting

Autocorrelation

Capturing the relation between a series and a lagged version of the same

[Figures: Passenger data; Autocorrelation on Passenger data]
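With pandas, Series.autocorr(lag=k) computes exactly this: the Pearson correlation of the series with a copy of itself shifted by k steps. A sketch on a synthetic seasonal series (period 12, chosen to mimic monthly data):

```python
import numpy as np
import pandas as pd

# A purely seasonal series with period 12.
t = np.arange(60)
s = pd.Series(np.sin(2 * np.pi * t / 12))

# Correlation with itself one full season back: same phase, near +1.
lag12 = s.autocorr(lag=12)

# Correlation with itself half a season back: opposite phase, near -1.
lag6 = s.autocorr(lag=6)
```

On real data such as the passenger series, a spike in the autocorrelation at lag 12 is what reveals yearly seasonality.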

Page 54: Introduction to Time Series Forecasting

White Noise

White noise is a time series that is purely random in nature

Let's denote it by εt

Mean of white noise i.e. E [εt ] = 0, and variance is always constant

εt , εk are uncorrelated for t ≠ k

If the data is white noise, then intelligent forecasting is not possible

The best one can do is to just return the mean as the prediction

https://en.wikipedia.org/wiki/White_noise
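These properties can be sketched with numpy-generated Gaussian white noise (the seed and sample size are arbitrary choices):

```python
import numpy as np

# White noise: zero mean, constant variance, no correlation across time.
rng = np.random.default_rng(0)
noise = rng.normal(loc=0.0, scale=1.0, size=10_000)

mean = noise.mean()  # close to 0
# Correlation between the series and itself shifted by one step: close to 0,
# i.e. the past carries no information about the next value.
lag1_corr = np.corrcoef(noise[:-1], noise[1:])[0, 1]
```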

Page 55: Introduction to Time Series Forecasting

Stationarity

A time series is stationary if it does not exhibit any trend or seasonality

[Figures: Stationary Time Series; Non-Stationary Time Series]

Page 56: Introduction to Time Series Forecasting

Stationarity (contd.)

Strict stationarity

P(Yt) = P(Yt+k), and P(Yt ,Yt+k) is independent of t

Mean and variance are time invariant

Weak Stationarity

In this case, mean and variance are constant

Cov(Y1,Y1+k) = Cov(Y2,Y2+k) = Cov(Y3,Y3+k) = γk

i.e. the covariance only depends on the lag value k


Page 57: Introduction to Time Series Forecasting

Statistical Methods


Page 58: Introduction to Time Series Forecasting

Statistical Methods: Simple Models


Page 59: Introduction to Time Series Forecasting


Naive Forecasting

A dumb forecasting approach

Predict Yt+1 = Yt

i.e. forecast that the next value is going to be the same as the current value
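A minimal sketch (the function name is an illustrative choice). Despite being dumb, this is a useful baseline: any real model should beat it.

```python
# Naive forecast: predict the next value to be the last observed value.
def naive_forecast(history):
    return history[-1]

series = [118, 132, 129, 121, 135]
# Forecast each value from the data observed before it.
preds = [naive_forecast(series[:t]) for t in range(1, len(series))]
print(preds)  # [118, 132, 129, 121] - each prediction is the previous value
```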

Page 60: Introduction to Time Series Forecasting

Simple Moving Average (SMA)

Prediction is the mean of a rolling window over previous data

Yt = (1/n) · (Xt−1 + Xt−2 + · · ·+ Xt−n)

where n is the rolling window size

Month | Thousands of Passengers | 6-month-SMA | 12-month-SMA
1949-01-01 | 112 | NaN | NaN
1949-02-01 | 118 | NaN | NaN
1949-03-01 | 132 | NaN | NaN
1949-04-01 | 129 | NaN | NaN
1949-05-01 | 121 | NaN | NaN
1949-06-01 | 135 | 124.500000 | NaN
1949-07-01 | 148 | 130.500000 | NaN
1949-08-01 | 148 | 135.500000 | NaN
1949-09-01 | 136 | 136.166667 | NaN
1949-10-01 | 119 | 134.500000 | NaN
1949-11-01 | 104 | 131.666667 | NaN
1949-12-01 | 118 | 128.833333 | 126.666667
1950-01-01 | 115 | 123.333333 | 126.916667
1950-02-01 | 126 | 119.666667 | 127.583333
1950-03-01 | 141 | 120.500000 | 128.333333
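The SMA columns of the table can be reproduced with pandas. Note that .rolling(6).mean() averages the current and previous five months, which is what the table shows; to use it as a forecast of the next value, one would shift the result by one step.

```python
import pandas as pd

# Airline passenger values from the table; rolling(6).mean() is NaN
# until the first full window of 6 observations is available.
passengers = pd.Series([112, 118, 132, 129, 121, 135,
                        148, 148, 136, 119, 104, 118])
sma6 = passengers.rolling(window=6).mean()

print(sma6.iloc[5])  # 124.5 - mean of the first six months
print(sma6.iloc[6])  # 130.5 - window slides forward by one month
```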

Page 61: Introduction to Time Series Forecasting

Simple Moving Average (SMA) (contd.)

Shortcomings of SMA:

Smaller windows lead to more noise, rather than signal

Will lag by window size

Cannot predict extreme values (due to averaging)

Captures trend, but poor at capturing other components; poor at forecasting

Page 62: Introduction to Time Series Forecasting

Exponential Weighted Moving Average (EWMA)

Gives exponentially higher weights to nearby values and lower weights to far off values while performing weighted averaging

Y0 = X0

Yt = (1− α)Yt−1 + αXt

where α is a smoothing factor such that 0 < α ≤ 1
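With pandas, ewm(alpha=..., adjust=False) implements exactly the recursion above; adjust=False is what makes it match, since the default adjust=True uses a different weighting scheme.

```python
import pandas as pd

# EWMA via pandas. With adjust=False this is exactly the recursion
# Y_0 = X_0, Y_t = (1 - alpha) * Y_{t-1} + alpha * X_t.
x = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0])
ewma = x.ewm(alpha=0.5, adjust=False).mean()
print(list(ewma))  # [112.0, 115.0, 123.5, 126.25, 123.625]
```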

Page 63: Introduction to Time Series Forecasting

Comparison between SMA and EWMA

One can see seasonality better captured in EWMA as compared to SMA


Page 64: Introduction to Time Series Forecasting

Statistical Methods: Auto Regressive Models


Page 65: Introduction to Time Series Forecasting


Auto Regressive (AR) Models

If the series is not white noise, then the forecasting can be modeled as

Yt = f (Y1, . . . ,Yt−1, εt) (5)

Practically not feasible to consider all time steps

Approximation time!

Yt = β0 + β1Yt−1 + εt (6)

Since we used 1 step, this is called AR(1) model

Extending to AR(p), we get

Yt = β0 + β1Yt−1 + β2Yt−2 + · · ·+ βpYt−p + εt (7)
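The AR coefficients can be estimated by ordinary least squares on lagged values. A sketch for AR(1) on data simulated from known coefficients, so the fit can be sanity-checked (the simulation parameters are arbitrary; statistical packages use more refined estimators):

```python
import numpy as np

# Simulate an AR(1) process Y_t = 1.0 + 0.6 * Y_{t-1} + eps with small noise,
# then recover (b0, b1) by least squares on (previous value, current value) pairs.
rng = np.random.default_rng(42)
n = 2000
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = 1.0 + 0.6 * y[t - 1] + rng.normal(scale=0.1)

A = np.column_stack([np.ones(n - 1), y[:-1]])     # design matrix [1, Y_{t-1}]
(b0, b1), *_ = np.linalg.lstsq(A, y[1:], rcond=None)
```

With enough data the estimates land close to the true values b0 = 1.0 and b1 = 0.6; AR(p) works the same way with p lag columns in the design matrix.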

Page 66: Introduction to Time Series Forecasting

Moving Average (MA) Models

Consider the modeling in AR

Yt = f (Y1, . . . ,Yt−1, εt) (8)

Prediction based on previous values

In MA models, we model upon the white noise observations

Yt = f (ε1, . . . , εt−1, εt) (9)

Using the previous analogy, an MA(q) model learns

Yt = γ0 + εt + γ1εt−1 + γ2εt−2 + · · ·+ γqεt−q (10)

Page 67: Introduction to Time Series Forecasting

ARMA Models

ARMA models combine both AR and MA models

An ARMA(p,q) model predicts Yt using p previous values and q previous noise components

Yt = β0 + β1Yt−1 + β2Yt−2 + · · ·+ βpYt−p + εt + γ1εt−1 + γ2εt−2 + · · ·+ γqεt−q (11)

Page 68: Introduction to Time Series Forecasting

Differencing: Converting Non-stationary to Stationary

A time series which is non-stationary can be converted to a stationary time series by differencing

Y ′t = Yt − Yt−1

If still not stationary, do second order differencing

Y ′′t = Y ′t − Y ′t−1 = Yt − 2Yt−1 + Yt−2
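Both orders of differencing, and the identity above, in numpy (the data values are the airline passenger numbers used earlier):

```python
import numpy as np

# First- and second-order differencing, plus a check of the identity
# Y''_t = Y_t - 2*Y_{t-1} + Y_{t-2}.
y = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])

d1 = np.diff(y)        # Y'_t  = Y_t - Y_{t-1}
d2 = np.diff(y, n=2)   # Y''_t = Y'_t - Y'_{t-1}

identity = y[2:] - 2 * y[1:-1] + y[:-2]
print(list(d1))  # [6.0, 14.0, -3.0, -8.0, 14.0]
```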

Page 69: Introduction to Time Series Forecasting

ARIMA Models

Stands for Auto Regressive Integrated Moving Average

In ARIMA, the AR and MA parts are the same as in ARMA

However, I indicates the order of differencing applied

If differencing is done once, it is called I(1)

Thus an ARIMA(p,d,q) model is a combination of AR(p) andMA(q) with I(d)
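To make the role of the I(d) part concrete, here is a toy sketch, not the estimation procedure a statistical package would use: an ARIMA(1,1,0)-style forecast built by differencing once, fitting AR(1) on the differences by least squares, and integrating the forecasts back to the original scale. The function name and fitting approach are illustrative assumptions.

```python
import numpy as np

# Toy ARIMA(1,1,0)-style forecast:
# (1) difference once, (2) fit AR(1) on the differences,
# (3) undo the differencing by adding forecast differences onto the last level.
def arima_110_forecast(y, steps):
    d = np.diff(y)                                        # the I(1) step
    A = np.column_stack([np.ones(len(d) - 1), d[:-1]])    # [1, d_{t-1}]
    (b0, b1), *_ = np.linalg.lstsq(A, d[1:], rcond=None)  # AR(1) on diffs
    level, last_d, out = y[-1], d[-1], []
    for _ in range(steps):
        last_d = b0 + b1 * last_d    # forecast the next difference
        level = level + last_d       # integrate back to the original scale
        out.append(level)
    return out
```

Applied to a series with a steady upward drift, the forecasts continue the drift instead of flattening out, which is exactly what the differencing buys.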

Page 70: Introduction to Time Series Forecasting

How to Decide p, d, q?

Difficult for a human - one will have to look at various plots, run some tests, etc.

Another approach - Auto ARIMA

Selects p, d, and q automatically


Page 71: Introduction to Time Series Forecasting

Statistical Methods: Evaluation Metrics


Page 72: Introduction to Time Series Forecasting


Evaluation Metrics

Standard evaluation metrics for time series forecasting are:

Mean Absolute Error (MAE)

Mean Absolute Percentage Error (MAPE)

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

Normalized Root Mean Squared Error (NRMSE)

Mean Absolute Error (MAE)

MAE = (1/n) Σ_{j=1}^{n} |y_j - ŷ_j|    (13)

Measures the average magnitude of the errors

If MAE = 0, then no error

Unable to properly alert when the forecast is very off for a few points
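Equation (13) can be written directly in plain Python (the `mae` function name is our own; `y_hat` denotes the forecast ŷ):

```python
def mae(actual, forecast):
    """Mean Absolute Error: average magnitude of the errors."""
    errors = [abs(y - y_hat) for y, y_hat in zip(actual, forecast)]
    return sum(errors) / len(errors)

print(mae([3, 5, 8], [2, 5, 10]))  # (1 + 0 + 2) / 3 = 1.0
```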

Mean Absolute Percentage Error (MAPE)

MAPE = (100%/n) Σ_{j=1}^{n} |(y_j - ŷ_j) / y_j|    (14)

Percentage equivalent of MAE

Not defined for zero values
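A sketch of equation (14) in plain Python, making the zero-value caveat explicit (the `mape` helper and its ValueError are our own choices):

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error; undefined if any actual value is 0."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is not defined for zero actual values")
    terms = [abs((y - y_hat) / y) for y, y_hat in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

print(mape([100, 200], [110, 180]))  # (10% + 10%) / 2 = 10.0
```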

Mean Squared Error (MSE)

MSE = (1/n) Σ_{j=1}^{n} (y_j - ŷ_j)²    (15)

Measures the mean of the squared error

Those forecast values which are very off are penalized more

Squared values make it more difficult to interpret the errors

Root Mean Squared Error (RMSE)

RMSE = √( (1/n) Σ_{j=1}^{n} (y_j - ŷ_j)² )    (16)

Value of the loss is of similar magnitude as that of the prediction

Thereby making it more interpretable

Also punishes large prediction errors
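Equations (15) and (16) differ only by the square root; a minimal sketch in plain Python (function names `mse`/`rmse` are our own):

```python
import math

def mse(actual, forecast):
    """Mean Squared Error: penalizes large errors quadratically."""
    sq = [(y - y_hat) ** 2 for y, y_hat in zip(actual, forecast)]
    return sum(sq) / len(sq)

def rmse(actual, forecast):
    """Root Mean Squared Error: back on the scale of the data."""
    return math.sqrt(mse(actual, forecast))

print(mse([1, 2, 3], [1, 2, 6]))   # 9 / 3 = 3.0
print(rmse([1, 2, 3], [1, 2, 6]))  # sqrt(3.0) ≈ 1.732
```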

Normalized Root Mean Squared Error (NRMSE)

NRMSE = √( (1/n) Σ_{j=1}^{n} (y_j - ŷ_j)² ) / Z    (17)

where Z is the normalization factor

NRMSE allows for comparison between models across different datasets

Common normalization factors:

Mean: preferred when the same preprocessing and predicted feature are used
Range: sensitive to sample size
Standard Deviation: suitable across datasets as well as predicted features
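Equation (17) with the three normalization factors can be sketched as follows (the `nrmse` helper and its `normalizer` keyword are our own; we normalize by statistics of the actual series):

```python
import math
import statistics

def nrmse(actual, forecast, normalizer="mean"):
    """RMSE divided by a normalization factor Z: mean, range, or std of actual."""
    sq = [(y - y_hat) ** 2 for y, y_hat in zip(actual, forecast)]
    rmse = math.sqrt(sum(sq) / len(sq))
    if normalizer == "mean":
        z = statistics.mean(actual)
    elif normalizer == "range":
        z = max(actual) - min(actual)
    else:  # "std"
        z = statistics.stdev(actual)
    return rmse / z

print(nrmse([10, 20, 30], [12, 20, 28], normalizer="range"))
```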

Classical ML Models

Classical ML Models: Preparing Data

Preparing Time Series Data for Machine Learning

Time   Extra Feature   Feature of Interest
t1     e1              x1
t2     e2              x2
t3     e3              x3
t4     e4              x4
t5     e5              x5
t6     e6              x6
t7     e7              x7
t8     e8              x8
t9     e9              x9
t10    e10             x10
t11    e11             x11
t12    e12             x12

One Step Forecasting Setup

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7
t7     e7              x7                    x8
t8     e8              x8                    x9
t9     e9              x9                    x10
t10    e10             x10                   x11
t11    e11             x11                   x12
t12    e12             x12                   NaN
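Building the forecast column amounts to shifting the feature of interest back by one timestep; a minimal sketch in plain Python (the row layout and the `add_one_step_target` name are our own):

```python
NaN = float("nan")

def add_one_step_target(rows):
    """Append the next timestep's feature of interest as the forecast target.
    Each row is (time, extra_feature, feature_of_interest)."""
    out = []
    for i, (t, e, x) in enumerate(rows):
        target = rows[i + 1][2] if i + 1 < len(rows) else NaN  # last row has no target
        out.append((t, e, x, target))
    return out

data = [("t1", "e1", 1.0), ("t2", "e2", 2.0), ("t3", "e3", 3.0)]
for row in add_one_step_target(data):
    print(row)  # last row ends in nan, like the NaN in the table above
```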

Random Split

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7
t7     e7              x7                    x8
t8     e8              x8                    x9
t9     e9              x9                    x10
t10    e10             x10                   x11
t11    e11             x11                   x12

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t4     e4              x4                    x5
t6     e6              x6                    x7
t7     e7              x7                    x8
t9     e9              x9                    x10
t10    e10             x10                   x11

Table: Train Set

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t3     e3              x3                    x4
t5     e5              x5                    x6
t8     e8              x8                    x9
t11    e11             x11                   x12

Table: Test Set

Sequential Split

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7
t7     e7              x7                    x8
t8     e8              x8                    x9
t9     e9              x9                    x10
t10    e10             x10                   x11
t11    e11             x11                   x12

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7
t7     e7              x7                    x8
t8     e8              x8                    x9

Table: Train Set

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t9     e9              x9                    x10
t10    e10             x10                   x11
t11    e11             x11                   x12

Table: Test Set
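A sequential split keeps temporal order: the earliest rows train, the remainder tests. A minimal sketch (the `sequential_split` helper and its `train_fraction` parameter are our own):

```python
def sequential_split(rows, train_fraction=0.7):
    """Split a time-ordered list of rows into train and test without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

rows = list(range(1, 12))  # stand-in for rows t1..t11
train, test = sequential_split(rows)
print(train)  # [1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9, 10, 11]
```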

Multiple Train-Test Split

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4

Table: Train Set 1

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7

Table: Test Set 1

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7

Table: Train Set 2

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t7     e7              x7                    x8
t8     e8              x8                    x9
t9     e9              x9                    x10

Table: Test Set 2

Multiple Train-Test Split (contd.)

Train Size Test Size

3 timesteps (t1 - t3) 3 timesteps (t4 - t6)

6 timesteps (t1 - t6) 3 timesteps (t7 - t9)

9 timesteps (t1 - t9) 3 timesteps (t10 - t12)

12 timesteps (t1 - t12) 3 timesteps (t13 - t15)

Expanding Window Multiple Sets

Train Size                 Test Size
10 timesteps (t1 - t10)    t11
11 timesteps (t1 - t11)    t12
12 timesteps (t1 - t12)    t13
13 timesteps (t1 - t13)    t14
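The expanding-window (walk-forward) scheme above can be sketched as a generator in plain Python (the `walk_forward` name and `initial_train` parameter are our own):

```python
def walk_forward(rows, initial_train=10):
    """Expanding-window validation: each step trains on all data so far
    and tests on the single next timestep."""
    for end in range(initial_train, len(rows)):
        yield rows[:end], rows[end]

rows = list(range(1, 15))  # stand-in for timesteps t1..t14
for train, test in walk_forward(rows):
    print(len(train), "timesteps ->", test)
# 10 timesteps -> 11, 11 timesteps -> 12, 12 timesteps -> 13, 13 timesteps -> 14
```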

Fixed Window Sequential Data

Sequence No.   Input Data     Output Data
1              t1, t2, t3     t4
2              t2, t3, t4     t5
3              t3, t4, t5     t6
4              t4, t5, t6     t7
5              t5, t6, t7     t8
6              t6, t7, t8     t9
7              t7, t8, t9     t10
8              t8, t9, t10    t11
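The fixed-window pairs above come from sliding a window of constant size over the series; a minimal sketch (the `fixed_windows` helper is our own):

```python
def fixed_windows(series, window=3):
    """Slide a fixed-size window over the series; the next value is the output."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs

series = ["t1", "t2", "t3", "t4", "t5"]
for inp, out in fixed_windows(series):
    print(inp, "->", out)
# ['t1', 't2', 't3'] -> t4
# ['t2', 't3', 't4'] -> t5
```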

Comparison of Different Dataset Preparation

Split Approach                   Comments
Random Split                     Not advisable, as the temporal information is lost
Sequential Split                 Mostly preferred on large datasets
Multiple Splits                  Leads to leakage of data
Expanding Window Multiple Sets   Also known as Walk Forward validation

Classical ML Models: ML Models

Linear Regression

Models change in one variable through change in other variables

Method for finding the linear relationship between independent and dependent variables

Assuming a linear relationship exists!

Also known as line of best fit, ordinary least squares regression, etc.

Estimating Linear Regression

Simple univariate linear regression

Given training data of the form (x, y), learn w and b such that

(y - (wx + b))²    (18)

is minimized

Simple multivariate linear regression

Given training data of the form (X, y) where X is n-dimensional, learn w_1, ..., w_n and b such that

(y - (Σ_{i=1}^{n} w_i x_i + b))²    (19)

is minimized
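For the univariate case, equation (18) has a well-known closed-form minimizer; a minimal sketch in plain Python (the `fit_simple_ols` name is our own):

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for y ≈ w*x + b (univariate case):
    w = cov(x, y) / var(x), b = mean(y) - w * mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

w, b = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])  # data lies exactly on y = 2x + 1
print(w, b)  # 2.0 1.0
```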

Forecasting with Linear Regression

Simple univariate linear regression

Given new data x, forecast using learned parameters w and b as

ŷ = wx + b    (20)

Simple multivariate linear regression

Given new data X where X is n-dimensional, forecast using learned parameters w_1, ..., w_n and b as

ŷ = Σ_{i=1}^{n} w_i x_i + b    (21)

Support Vector Regression

This model uses the concept of support vectors for regression

Performs linear regression in high dimensional feature space

Aim is to fit the error within a threshold range

A hyperplane is obtained such that the loss is minimized

Loss is considered to be zero within a small deviation ε from the hyperplane

Estimating SVR

Multivariate scenario

Given training data of the form (X, y) where X is n-dimensional, learn w_1, ..., w_n and b such that

Loss = 0                     if |f(x_i) - y_i| < ε
       |f(x_i) - y_i| - ε    otherwise            (22)

is minimized, where f(x) = Σ_{i=1}^{n} w_i x_i + b
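The ε-insensitive loss in equation (22) is simple to state in plain Python (the `epsilon_insensitive_loss` name and the default ε are our own):

```python
def epsilon_insensitive_loss(predicted, actual, eps=0.1):
    """SVR loss: zero inside the ε-tube around the hyperplane, linear outside."""
    err = abs(predicted - actual)
    return 0.0 if err < eps else err - eps

print(epsilon_insensitive_loss(1.05, 1.0))  # 0.0 (inside the tube)
print(epsilon_insensitive_loss(1.5, 1.0))   # ≈ 0.4 (linear penalty outside)
```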

Deep Learning Models

Feedforward Neural Networks

[Figure: a fully connected feedforward network with an input layer, a hidden layer, and an output layer]

Feedforward Neural Network: Forward Propagation

Let X = (x_1, ..., x_n) be the set of input features

Hidden layer activations: a_j = f(Σ_{i=1}^{n} W_{ji} x_i), ∀ j ∈ 1, ..., h

Feedforward Neural Network: Forward Propagation

Let a = (a_1, ..., a_h) be the set of hidden layer features

Output neurons: o_k = g(Σ_{j=1}^{h} U_{kj} a_j), ∀ k ∈ 1, ..., K
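The two forward-propagation steps above can be combined into a small sketch in plain Python; the list-of-lists weight matrices `W` and `U`, the tanh hidden activation, and the identity output are our illustrative choices, not the slides':

```python
import math

def forward(x, W, U, f=math.tanh, g=lambda v: v):
    """Forward pass: a_j = f(Σ_i W[j][i] x_i), then o_k = g(Σ_j U[k][j] a_j)."""
    a = [f(sum(w_ji * x_i for w_ji, x_i in zip(row, x))) for row in W]
    o = [g(sum(u_kj * a_j for u_kj, a_j in zip(row, a))) for row in U]
    return o

W = [[0.5, -0.5], [1.0, 1.0]]  # 2 hidden units, 2 inputs
U = [[1.0, 1.0]]               # 1 output unit
print(forward([1.0, 2.0], W, U))
```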

Feedforward Neural Network: Learning Algorithm

Adjust weights W and U to minimize the error on training set

Define the error to be the squared loss between predictions and true output

E = (1/2)(y - o)²    (23)

Gradient with respect to the output is

∂E/∂o_k = -(y_k - o_k) = (o_k - y_k)    (24)

Recurrent Neural Networks

Feed forward networks cannot handle sequences

If sequential data is flattened, then it can be learned by FFN

However, weights will not be shared across timesteps

Recurrent networks to the rescue!

An Unrolled RNN

NOTE: The hidden state h_t gives a summary of the sequence till time t

Forward pass:

h_t = tanh(W h_{t-1} + U x_t + b_h)
z_t = softmax(V h_t + b_z)
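The recurrence for h_t can be sketched with a scalar toy cell in plain Python (scalar weights `W`, `U`, `b_h` are our simplification of the matrix form above; the softmax output is omitted):

```python
import math

def rnn_step(h_prev, x_t, W, U, b_h):
    """One step of a scalar RNN cell: h_t = tanh(W*h_{t-1} + U*x_t + b_h)."""
    return math.tanh(W * h_prev + U * x_t + b_h)

h = 0.0
for x in [1.0, 0.5, -0.5]:              # a short input sequence
    h = rnn_step(h, x, W=0.8, U=0.5, b_h=0.0)
print(h)  # hidden state summarizing the sequence seen so far
```

The same weights are reused at every timestep, which is exactly the sharing that a flattened feedforward network lacks.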

Long Short Term Memory (LSTM) Network

Forward pass¹:

f_t = σ(W_f [h_{t-1}; x_t] + b_f)
i_t = σ(W_i [h_{t-1}; x_t] + b_i)
a_t = tanh(W_a [h_{t-1}; x_t] + b_a)
o_t = σ(W_o [h_{t-1}; x_t] + b_o)

C_t = f_t * C_{t-1} + i_t * a_t
h_t = o_t * tanh(C_t)

¹For a more detailed treatment of neural networks, refer to the ICON 2018 slides at http://www.cfilt.iitb.ac.in/documents/ICON_Tutorial_2018.pdf by Kevin Patel and Himanshu Singh
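The data flow of the LSTM equations can be traced with a scalar toy cell in plain Python; fixing all weights to 1 and biases to 0 is our simplification purely to keep the sketch short:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(h_prev, c_prev, x_t):
    """Scalar LSTM cell with all weights 1 and biases 0, to show the data flow."""
    s = h_prev + x_t                 # toy stand-in for W[h_{t-1}; x_t] + b
    f_t = sigmoid(s)                 # forget gate
    i_t = sigmoid(s)                 # input gate
    a_t = math.tanh(s)               # candidate cell state
    o_t = sigmoid(s)                 # output gate
    c_t = f_t * c_prev + i_t * a_t   # new cell state
    h_t = o_t * math.tanh(c_t)       # new hidden state
    return h_t, c_t

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(h, c, x)
print(h, c)
```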

Connection with NLP

Connection with NLP: Problem Level

Problem Level Connections

Discuss connections and similarities among problems, and how one solution can impact another

Time Series Forecasting benefiting from NLP

NLP benefiting from Time Series Forecasting

On the Importance of Text Analysis for Stock Price Prediction

By Lee et al. (2014)

Forecast companies' stock price changes (UP, DOWN, STAY) in response to financial events reported by them in 8-K documents

Baseline: Using recent stock price movement and earnings surprise

Contribution: Using textual information from 8-K documents along with recent stock price movement and earnings surprise

Observation: Proposed system outperforms baseline by 10%

Resource Contribution: Annotated 8-K documents for use in further research

Semantic Frames to Predict Stock Price Movement

Xie et al. (2013)

Uses FrameNet information to generalize specific sentences to scenarios

Semantic Frames to Predict Stock Price Movement (contd.)

Predict 1) change in stock price or 2) polarity of change (up/down)

Baseline: BOW features and LDA

Contribution: FWD (Frames, BOW, and part-of-speech specific DAL scores) features and SemTree data representations

Model: SVM with tree kernels

Observation: Proposed features assist significantly in the polarity task, and show promise in the change task.

Stock Movement Prediction from Tweets and Historical Prices

Xu and Cohen (2018)

Present a novel deep generative model jointly exploiting text and price signals for this task

Introduce recurrent, continuous latent variables for better treatment of stochasticity, and use neural variational inference

Resource Contribution: A new stock movement prediction dataset²

²https://github.com/yumoxu/stocknet-dataset

Identifying and Following Expert Investors in Stock Microblogs

Bar-Haim et al. (2011)

Task: Identify expert investors from the information published in online stock investment message boards / tweets

Indirect evaluation by considering the advice of detected experts in stock prediction

Baseline: Assume all users are experts

Contribution: A probabilistic expert finding framework

Observation: Information from tweets of identified experts allowed forecasting stock price movement with higher precision.

Connection with NLP: Tooling Level

Attention

Anyone familiar with technical indicators in stock market predictions will probably have an epiphany at this point

Attention can be used by a neural network to attend to arbitrary portions of the time signal for forecasting

Qin et al. (2017) use two attentions in their paper to improve time series prediction

An input attention which adaptively extracts relevant input features (more interpretable)
A temporal attention over the encoder states (better performance)

Outperforms state-of-the-art on two time series prediction datasets

Transfer Learning

From AlexNet and ResNet in computer vision to BERT and ELMo in NLP, transfer learning has proved its effectiveness

The idea of learning in one scenario and applying the same in another with little tuning is fascinating

What could be the equivalent in time series forecasting?

Ye and Dai (2018) mix transfer learning with online sequential extreme learning machine and ensemble learning

Does not discard long-ago data

Instead, the authors claim that their model is able to transfer knowledge from long-ago data

Showed effectiveness on multiple synthetic and real world datasets

Demos

Demos: Statsmodels Library

Statsmodels Library

Statsmodels is a Python package that allows users to explore data, estimate statistical models, and perform statistical tests

Has an extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics

For different data types and estimators

Built on top of NumPy and SciPy

Integrates with Pandas

Demos: Prophet Library

Prophet Library

An open source library released by Facebook

Originally designed to forecast business data internally at Facebook

For more details, refer to Taylor and Letham (2018)

Prophet

It is an additive regression model with 4 components

Y_t = g_t + s_t + h_t + ε_t    (25)

where g_t is trend, s_t is seasonality, h_t is holidays, and ε_t is the error term.

It automatically detects the change points in data

It is robust to missing data and shifts in the trend, and handles outliers well.

Other Options

Google AutoML
  Provides a web interface
  Models being used are hidden
  Provides a final metric of the model being used
  Has a trial period

Microsoft Azure
  Provides a web interface
  Exposes the list of models
  Provides metrics for each model being tested
  Has a trial period

Amazon Forecast
  Provides a web interface
  Exposes the list of models
  Provides a final metric of the model being used
  No trial period

Demo Content

The code for the demo on Statsmodels and ARIMA can be found at https://github.com/Sandhya2207/ICON-2019-TSF-demo

The code for the demo on how statistical techniques help deep learning techniques can be found at https://github.com/KevinNPatel/icon2019_demo

Conclusion

Time Series Forecasting Competitions

Santa Fe Time Series Prediction and Analysis Competition (1994)

International Workshop on Advanced Black-box Techniques for Nonlinear Modeling competition (1998)

NN3 and NN5 competitions

Kaggle challenges

Makridakis challenges (M1, M2, M3 and M4)

Time Series Forecasting Conferences

Makridakis conferences: https://mofc.unic.ac.cy/m-conferences/

International conference on Time Series and Forecasting (ITISE)

ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD)

IEEE International Conference on Data Mining (ICDM)

Society for Industrial and Applied Mathematics (SIAM)

The usual ACL, EMNLP, etc. for the interplay between text and time series

Time Series Forecasting Datasets

UCR data: https://www.cs.ucr.edu/~eamonn/UCRsuite.html

Makridakis challenge data

Simple data for testing models using generative models like ARIMA

Simple real world datasets³:
  Airline Passenger dataset
  Shampoo Sales dataset
  Minimum Daily Temperatures dataset
  Monthly Sunspot dataset
  Daily Female Births dataset
  EEG Eye State dataset
  Occupancy Detection dataset
  Ozone Level Detection dataset

³https://machinelearningmastery.com/time-series-datasets-for-machine-learning/

Conclusion

Time Series Forecasting - an interesting challenge

Provided background and clarified terminology of time series

Discussed different approaches:
  Statistical Approaches
  Classical ML
  Deep Learning

Discussed a few papers showcasing the interplay between NLP and Time Series, and potential future directions worth exploring

Thank You

References I

Bar-Haim, R., Dinur, E., Feldman, R., Fresko, M., and Goldstein, G. (2011). Identifying and following expert investors in stock microblogs. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1310-1319, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Lee, H., Surdeanu, M., MacCartney, B., and Jurafsky, D. (2014). On the importance of text analysis for stock price prediction. In LREC, pages 1170-1175.

Qin, Y., Song, D., Cheng, H., Cheng, W., Jiang, G., and Cottrell, G. W. (2017). A dual-stage attention-based recurrent neural network for time series prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 2627-2633. AAAI Press.

References II

Taylor, S. J. and Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1):37-45.

Xie, B., Passonneau, R. J., Wu, L., and Creamer, G. G. (2013). Semantic frames to predict stock price movement. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 873-883, Sofia, Bulgaria. Association for Computational Linguistics.

Xu, Y. and Cohen, S. B. (2018). Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1970-1979.

Ye, R. and Dai, Q. (2018). A novel transfer learning framework for time series forecasting. Knowledge-Based Systems, 156:74-99.
