Introduction to Time Series Forecasting

Page 1: Introduction to Time Series Forecasting

Introduction to Time Series Forecasting: With Case Studies in NLP

A Tutorial at ICON 2019

Sandhya Singh & Kevin Patel

CFILT: Center for Indian Language Technology

December 18, 2019

Sandhya and Kevin Time Series Forecasting 1

Page 2: Introduction to Time Series Forecasting

Overview

We will highlight how NLP people are also well suited to work on Time Series problems.

We will provide background information on Time Series Forecasting.

We will discuss some statistical approaches, some classical machine learning approaches, and some deep learning approaches for time series forecasting.

We will mention some commonalities between NLP and Time Series, and how one can assist the other.


Page 3: Introduction to Time Series Forecasting

Outline

1 Introduction: Time Series; Time Series Forecasting

2 Background: Time Series Components; Time Series Categorization; Time Series Forecasting Terminology

3 Statistical Methods: Simple Models; Auto Regressive Models; Evaluation Metrics

4 Classical ML Models: Preparing Data; ML Models


Page 4: Introduction to Time Series Forecasting

Outline

5 Deep Learning Models

6 Connection with NLP: Problem Level; Tooling Level

7 Demos: Statsmodels Library; Prophet Library

8 Conclusion


Page 5: Introduction to Time Series Forecasting

Introduction · Background · Statistical Methods · Classical ML Models · Deep Learning Models · Connection with NLP · Demos · Conclusion · References

Acknowledgement

We thank

The ICON 2019 committee, for accepting our proposal

LGSoft, for their joint project with CFILT. Our investigations into the same germinated the idea of this tutorial

Page 6: Introduction to Time Series Forecasting

Some Context Regarding the Tutorial

Why are we (people working in NLP) talking about Time Series?

Text and Time Series - both are Sequential Data

Commonality - Exploit structure we know about the problem in advance

The sequential nature in this case

Similar tools, different terminology

Knowing the terminology will enable us to apply our knowledge of tools in this area too

Page 7: Introduction to Time Series Forecasting

Target Audience

People who have not worked with Time Series data

Subsumes

Those who are completely new to ML

Those who have basic knowledge of how to apply ML

Those who are proficient in ML and/or are working in Natural Language Processing


Page 8: Introduction to Time Series Forecasting

Introduction


Page 9: Introduction to Time Series Forecasting

Introduction: Time Series


Page 10: Introduction to Time Series Forecasting


What is a Time Series?

Definition

A time series is a sequence of observations ordered in time.

Xt ; t = 0, 1, 2, 3, . . .

The observations are collected over a fixed time interval

The time dimension adds structure and constraint to the data

[Figure: Nifty 50 Index as of 13/10/2019 15:31 IST]

Page 11: Introduction to Time Series Forecasting

Where (and When) does One Encounter Time Series?

As a time series of our life!

Economy and Finance: Exchange rates, Interest rates, Employment rate, Financial indices

Meteorology: Properties of weather like temperature, humidity, wind speed, etc.

Medicine: Physiological signals (EEG), heart rate, patient temperature, etc.

Other venues:

Industry: Electric load, power consumption, resource consumption

Web: Clicks, Logs

Page 12: Introduction to Time Series Forecasting

What is Time Series Analysis?

Applying statistical approaches to time series data

Will enable one to

Predict the future based on the past

Understand the underlying mechanism which generates the data

Control the mechanism

Describe salient features of the data


Page 13: Introduction to Time Series Forecasting

Introduction: Time Series Forecasting


Page 14: Introduction to Time Series Forecasting


Time Series Forecasting

Predict future based on past

Extrapolation in classical statistical terminology

Month | No. of Passengers (in thousands)
1949-02 | 118
1949-03 | 132
1949-04 | 129
1949-05 | 121
1949-06 | 135
1949-07 | ?
1949-08 | ?
1949-09 | ?

Page 15: Introduction to Time Series Forecasting

Forecasting - Yes or No?

Determining whether tomorrow a stock will go up/down or stay put?

Given a voice recording, who is the speaker?

Given a voice recording, who is speaking after the current speaker?

Given an ECG plot, is the heart functioning normally or abnormally?

Given an ECG plot, predict whether the person will have a heart related issue in the next month.

Page 16: Introduction to Time Series Forecasting

Forecasting - Yes or No?

Determining whether tomorrow a stock will go up/down or stay put? - YES

Given a voice recording, who is the speaker? - NO

Given a voice recording, who is speaking after the current speaker? - NO

Given an ECG plot, is the heart functioning normally or abnormally? - NO

Given an ECG plot, predict whether the person will have a heart related issue in the next month. - YES

Page 17: Introduction to Time Series Forecasting

Advantages of Time Series Forecasting

Reliability:

Given the forecast of power surges in your area, you can check whether your home's wiring is reliable or not.

Preparing for Seasons:

Looking at the patterns from previous Christmas events, stock your warehouse for the upcoming Christmas accordingly

Given that the south east coast of India experiences cyclones during monsoons, pre-allocate rescue and relief resources

Estimating trends:

Given the trend of a particular stock, should I invest in it?

Page 18: Introduction to Time Series Forecasting

Time Series Forecasting and Machine Learning

Forecasting - predicting the future from the past

Given observed values Y1, . . . ,Yt , predict Yt+1

In other words, learn f such that

Yt+1 = f (Y1, . . . ,Yt) (1)

Machine Learning practitioners should easily be able to relate this expression to

Y = f (X ) (2)

Are ML skills applicable? - Yes
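The jump from Yt+1 = f(Y1, . . . , Yt) to the familiar Y = f(X) can be made concrete by reframing the series as a supervised learning table of lagged inputs and targets. A minimal sketch (the function name and the window size of 3 are illustrative choices, not from the tutorial):

```python
# Frame a univariate series as a supervised learning problem:
# each row of X holds the previous `window` values, y holds the next value.
def series_to_supervised(series, window=3):
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])  # lagged inputs Y_{t-window} .. Y_{t-1}
        y.append(series[i])             # target Y_t
    return X, y

series = [112, 118, 132, 129, 121, 135]
X, y = series_to_supervised(series, window=3)
print(X)  # [[112, 118, 132], [118, 132, 129], [132, 129, 121]]
print(y)  # [129, 121, 135]
```

Any regressor that learns Y = f(X) can then be trained on these (X, y) pairs.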

Page 19: Introduction to Time Series Forecasting

AI/ML : NOT a Silver Bullet

AI/ML are multipliers, and not a silver bullet.

Consider the example:

EMNLP - Empirical Methods in Natural Language Processing - a top tier NLP conference

EMNLP 2015 was informally called Embedding Methods in Natural Language Processing

This is due to the sheer number of papers about word embeddings

More or less implying that word embeddings are the silver bullet

If that were the case, shouldn't all problems be solved by now?

Shouldn't ACL, etc. close shop?

That is not the case

Domain knowledge is still needed for proper utilization of ML

So let's discuss some background to gain time series domain knowledge


Page 20: Introduction to Time Series Forecasting

Background


Page 21: Introduction to Time Series Forecasting

Background: Time Series Components


Page 22: Introduction to Time Series Forecasting


Time Series Components

Level

The average value of a time series

Trend

A long term pattern present in the time series

Can be positive, negative, linear or nonlinear

If there is no increasing or decreasing trend, the time series is stationary

i.e. the data has constant mean and variance over time

Page 23: Introduction to Time Series Forecasting

Time Series Components (contd.)

Seasonality

Regular and predictable changes that recur in regular short intervals

Largely due to involvement of periodically occurring factors

Cyclicality

Changes that recur in irregular intervals

As opposed to fixed period intervals in seasonality

Noise / Irregularity / Residual

Random variations that do not repeat in the pattern

Page 24: Introduction to Time Series Forecasting

Time Series Components for Airline Passenger Data


Page 25: Introduction to Time Series Forecasting

Background: Time Series Categorization


Page 26: Introduction to Time Series Forecasting


Categorization of Time Series Problem Formulation

Based on the number of inputs

Univariate vs. Multivariate

Based on the number of time steps predicted in the output

One step forecasting vs. Multi step forecasting

Based on the modeling of interactions between different components

Additive vs. Multiplicative models

Page 27: Introduction to Time Series Forecasting

Univariate Time Series

Single Time Dependent Variable

Examples:

Monthly Airline Passenger Data

Month | No. of Passengers (in thousands)
1949-02 | 118
1949-03 | 132
1949-04 | 129
1949-05 | 121
1949-06 | 135

Page 28: Introduction to Time Series Forecasting

Multivariate Time Series

Multiple time dependent variables

Can be considered as multiple univariate time series that need to be analyzed jointly

Example: Rainfall Forecast

Date | Humidity | Temperature | Rainfall
01/01/18 | 36.81 | 16.222 | 30.25
02/01/18 | 34.438 | 18.146 | 29.26
03/01/18 | 29.291 | 19.002 | 28.26
04/01/18 | 30.712 | 19.279 | 29.54
05/01/18 | 32.352 | 19.494 | 30.12
06/01/18 | 31.952 | 20.894 | 27.63

Page 29: Introduction to Time Series Forecasting

Categorization of Time Series Problem Formulation

Based on the number of inputs

Univariate vs. Multivariate

Based on the number of time steps predicted in the output

One step forecasting vs. Multi step forecasting

Based on the modeling of interactions between different components

Additive vs. Multiplicative models

Page 30: Introduction to Time Series Forecasting

One Step vs. Multi Step Forecasting

One Step Forecasting

Given data up to time t, predict the value only for the next one step i.e. at t + 1

Multi Step Forecasting

Given data up to time t, predict values for two or more steps i.e. at t + 1, t + 2, t + 3, . . .
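The two settings can be sketched with a toy recursive forecaster. Here the one-step model is a naive stand-in that predicts the last value (any one-step model could be plugged in), and multi-step forecasting feeds each prediction back in as if it were observed:

```python
# Recursive multi-step forecasting: each prediction is appended to the
# history and used as input for the next step. `one_step` is a naive
# stand-in forecaster (predicts the last observed value).
def one_step(history):
    return history[-1]

def multi_step(history, steps):
    history = list(history)
    forecasts = []
    for _ in range(steps):
        yhat = one_step(history)   # predict the next value
        forecasts.append(yhat)
        history.append(yhat)       # pretend the forecast was observed
    return forecasts

print(multi_step([118, 132, 129], steps=3))  # [129, 129, 129]
```

Because later steps consume earlier predictions rather than true values, errors compound as the horizon grows.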

Page 31: Introduction to Time Series Forecasting

One Step vs. Multi Step Forecasting (contd.)

[Figures: One Step Prediction for 3 steps; Multi Step Prediction for 3 steps]

Page 32: Introduction to Time Series Forecasting

One Step vs. Multi Step Forecasting (contd.)

[Figures: One Step Prediction for 8 steps; Multi Step Prediction for 8 steps]

Note how close the prediction is to the true value in the case of one step prediction

Page 33: Introduction to Time Series Forecasting

One Step vs. Multi Step Forecasting (contd.)

This guy should NOT use multi step forecasting

Img Src: https://xkcd.com/605/

Page 34: Introduction to Time Series Forecasting

Categorization of Time Series Problem Formulation

Based on the number of inputs

Univariate vs. Multivariate

Based on the number of time steps predicted in the output

One step forecasting vs. Multi step forecasting

Based on the modeling of interactions between different components

Additive vs. Multiplicative models

Page 35: Introduction to Time Series Forecasting

Additive vs. Multiplicative Models

Additive models:

The series is additively dependent on the different components

Y = Level + Trend + Seasonality + Noise (3)

Multiplicative models:

The series is multiplicatively dependent on the different components

Y = Level × Trend × Seasonality × Noise (4)
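A synthetic illustration of the two compositions (all numbers are made up; the multiplicative series here scales the seasonal swing with the level, a common simplification of the pure product form in equation (4)):

```python
import numpy as np

# Synthetic illustration: the same level, trend and seasonality combined
# additively vs. multiplicatively (all parameter values are made up).
t = np.arange(48)
level = 100.0
trend = 2.0 * t
seasonality = 10.0 * np.sin(2 * np.pi * t / 12)   # period of 12 steps

# Additive: the seasonal swing has constant amplitude.
additive = level + trend + seasonality

# Multiplicative (simplified form): the seasonal swing grows with the level.
multiplicative = (level + trend) * (1 + seasonality / 100)
```

Plotting both makes the visual difference from the slides reproducible: the additive series oscillates within a band of constant width, while the multiplicative one fans out over time.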

Page 36: Introduction to Time Series Forecasting

Additive vs. Multiplicative Models (contd.)

Comparison of Additive and Multiplicative Seasonality

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 37: Introduction to Time Series Forecasting

Additive or Multiplicative?

Additive

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 39: Introduction to Time Series Forecasting

Additive or Multiplicative?

Multiplicative

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 41: Introduction to Time Series Forecasting

Additive or Multiplicative?

Multiplicative

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 43: Introduction to Time Series Forecasting

Additive or Multiplicative?

Additive

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 45: Introduction to Time Series Forecasting

Additive or Multiplicative?

Multiplicative

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 47: Introduction to Time Series Forecasting

Additive or Multiplicative?

Additive

Img Src: https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Page 49: Introduction to Time Series Forecasting

Dealing with Multiplicative Models

[Figures: Passenger Data is Multiplicative; Log(Passenger) Data is Additive]
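The log trick works because log(Level × Trend × Seasonality × Noise) = log Level + log Trend + log Seasonality + log Noise, so a multiplicative series becomes additive in log space. A sketch on a synthetic multiplicative series (the numbers are illustrative):

```python
import numpy as np

# A synthetic multiplicative series: the seasonal swing grows with the level.
t = np.arange(48)
series = (100 + 2 * t) * (1 + 0.1 * np.sin(2 * np.pi * t / 12))

# In log space the components combine additively.
logged = np.log(series)

# Subtracting the log of the trend leaves a seasonal component whose
# amplitude no longer grows over time.
seasonal_in_log = logged - np.log(100 + 2 * t)
```

After this transform, additive-model machinery (decomposition, additive forecasting) can be applied, and forecasts are mapped back with exp.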


Page 50: Introduction to Time Series Forecasting

Background: Time Series Forecasting Terminology


Page 51: Introduction to Time Series Forecasting


Correlations

Captures the relation between two series

r = Corr(X ,Y ) = Cov(X ,Y ) / (σX σY ) = E[(X − µX)(Y − µY)] / (σX σY )

Img Src: https://en.wikipedia.org/wiki/Correlation_and_dependence
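A quick numerical check of the formula (the data values are made up): the covariance-based expression and numpy's built-in corrcoef agree.

```python
import numpy as np

# Pearson correlation two ways: directly from the covariance formula,
# and via numpy's built-in corrcoef.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])  # roughly 2*x, so r is near 1

r_manual = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())
r_numpy = np.corrcoef(x, y)[0, 1]
```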

Page 52: Introduction to Time Series Forecasting

Spurious Correlations

Img Src: https://www.tylervigen.com/spurious-correlations

Page 53: Introduction to Time Series Forecasting

Autocorrelation

Capturing the relation between a series and a lagged version of the same

[Figures: Passenger data; Autocorrelation on Passenger data]
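With pandas, Series.autocorr(lag=k) computes exactly this: the Pearson correlation of the series with a copy of itself shifted by k steps. A sketch on a synthetic seasonal series (period 12, chosen to mimic monthly data):

```python
import numpy as np
import pandas as pd

# A purely seasonal series with period 12.
t = np.arange(60)
s = pd.Series(np.sin(2 * np.pi * t / 12))

# Correlation with itself one full season back: same phase, near +1.
lag12 = s.autocorr(lag=12)

# Correlation with itself half a season back: opposite phase, near -1.
lag6 = s.autocorr(lag=6)
```

On real data such as the passenger series, a spike in the autocorrelation at lag 12 is what reveals yearly seasonality.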

Page 54: Introduction to Time Series Forecasting

White Noise

White noise is a time series that is purely random in nature

Let's denote it by εt

Mean of white noise i.e. E [εt ] = 0, and variance is always constant

εt , εk are uncorrelated for t ≠ k

If the data is white noise, then intelligent forecasting is not possible

The best one can do is to just return the mean as the prediction

https://en.wikipedia.org/wiki/White_noise
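These properties can be sketched with numpy-generated Gaussian white noise (the seed and sample size are arbitrary choices):

```python
import numpy as np

# White noise: zero mean, constant variance, no correlation across time.
rng = np.random.default_rng(0)
noise = rng.normal(loc=0.0, scale=1.0, size=10_000)

mean = noise.mean()  # close to 0
# Correlation between the series and itself shifted by one step: close to 0,
# i.e. the past carries no information about the next value.
lag1_corr = np.corrcoef(noise[:-1], noise[1:])[0, 1]
```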

Page 55: Introduction to Time Series Forecasting

Stationarity

A time series is stationary if it does not exhibit any trend or seasonality

[Figures: Stationary Time Series; Non-Stationary Time Series]

Page 56: Introduction to Time Series Forecasting

Stationarity (contd.)

Strict stationarity

P(Yt) = P(Yt+k), and P(Yt ,Yt+k) is independent of t

Mean and variance are time invariant

Weak Stationarity

In this case, mean and variance are constant

Cov(Y1,Y1+k) = Cov(Y2,Y2+k) = Cov(Y3,Y3+k) = γk

i.e. the covariance only depends on the lag value k


Page 57: Introduction to Time Series Forecasting

Statistical Methods


Page 58: Introduction to Time Series Forecasting

Statistical Methods: Simple Models


Page 59: Introduction to Time Series Forecasting


Naive Forecasting

A dumb forecasting approach

Predict Yt+1 = Yt

i.e. forecast that the next value is going to be the same as the current value
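A minimal sketch (the function name is an illustrative choice). Despite being dumb, this is a useful baseline: any real model should beat it.

```python
# Naive forecast: predict the next value to be the last observed value.
def naive_forecast(history):
    return history[-1]

series = [118, 132, 129, 121, 135]
# Forecast each value from the data observed before it.
preds = [naive_forecast(series[:t]) for t in range(1, len(series))]
print(preds)  # [118, 132, 129, 121] - each prediction is the previous value
```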

Page 60: Introduction to Time Series Forecasting

Simple Moving Average (SMA)

Prediction is the mean of a rolling window over previous data

Yt = (1/n) · (Xt−1 + Xt−2 + · · ·+ Xt−n)

where n is the rolling window size

Month | Thousands of Passengers | 6-month-SMA | 12-month-SMA
1949-01-01 | 112 | NaN | NaN
1949-02-01 | 118 | NaN | NaN
1949-03-01 | 132 | NaN | NaN
1949-04-01 | 129 | NaN | NaN
1949-05-01 | 121 | NaN | NaN
1949-06-01 | 135 | 124.500000 | NaN
1949-07-01 | 148 | 130.500000 | NaN
1949-08-01 | 148 | 135.500000 | NaN
1949-09-01 | 136 | 136.166667 | NaN
1949-10-01 | 119 | 134.500000 | NaN
1949-11-01 | 104 | 131.666667 | NaN
1949-12-01 | 118 | 128.833333 | 126.666667
1950-01-01 | 115 | 123.333333 | 126.916667
1950-02-01 | 126 | 119.666667 | 127.583333
1950-03-01 | 141 | 120.500000 | 128.333333
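The SMA columns of the table can be reproduced with pandas. Note that .rolling(6).mean() averages the current and previous five months, which is what the table shows; to use it as a forecast of the next value, one would shift the result by one step.

```python
import pandas as pd

# Airline passenger values from the table; rolling(6).mean() is NaN
# until the first full window of 6 observations is available.
passengers = pd.Series([112, 118, 132, 129, 121, 135,
                        148, 148, 136, 119, 104, 118])
sma6 = passengers.rolling(window=6).mean()

print(sma6.iloc[5])  # 124.5 - mean of the first six months
print(sma6.iloc[6])  # 130.5 - window slides forward by one month
```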

Page 61: Introduction to Time Series Forecasting

Simple Moving Average (SMA) (contd.)

Shortcomings of SMA:

Smaller windows lead to more noise, rather than signal

Will lag by window size

Cannot predict extreme values (due to averaging)

Captures trend, but poor at capturing other components; poor at forecasting

Page 62: Introduction to Time Series Forecasting

Exponential Weighted Moving Average (EWMA)

Gives exponentially higher weights to nearby values and lower weights to far off values while performing weighted averaging

Y0 = X0

Yt = (1− α)Yt−1 + αXt

where α is a smoothing factor such that 0 < α ≤ 1
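With pandas, ewm(alpha=..., adjust=False) implements exactly the recursion above; adjust=False is what makes it match, since the default adjust=True uses a different weighting scheme.

```python
import pandas as pd

# EWMA via pandas. With adjust=False this is exactly the recursion
# Y_0 = X_0, Y_t = (1 - alpha) * Y_{t-1} + alpha * X_t.
x = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0])
ewma = x.ewm(alpha=0.5, adjust=False).mean()
print(list(ewma))  # [112.0, 115.0, 123.5, 126.25, 123.625]
```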

Page 63: Introduction to Time Series Forecasting

Comparison between SMA and EWMA

One can see seasonality better captured in EWMA as compared to SMA


Page 64: Introduction to Time Series Forecasting

Statistical Methods: Auto Regressive Models


Page 65: Introduction to Time Series Forecasting


Auto Regressive (AR) Models

If the series is not white noise, then the forecasting can be modeled as

Yt = f (Y1, . . . ,Yt−1, εt) (5)

Practically not feasible to consider all time steps

Approximation time!

Yt = β0 + β1Yt−1 + εt (6)

Since we used 1 step, this is called AR(1) model

Extending to AR(p), we get

Yt = β0 + β1Yt−1 + β2Yt−2 + · · ·+ βpYt−p + εt (7)
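The AR coefficients can be estimated by ordinary least squares on lagged values. A sketch for AR(1) on data simulated from known coefficients, so the fit can be sanity-checked (the simulation parameters are arbitrary; statistical packages use more refined estimators):

```python
import numpy as np

# Simulate an AR(1) process Y_t = 1.0 + 0.6 * Y_{t-1} + eps with small noise,
# then recover (b0, b1) by least squares on (previous value, current value) pairs.
rng = np.random.default_rng(42)
n = 2000
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = 1.0 + 0.6 * y[t - 1] + rng.normal(scale=0.1)

A = np.column_stack([np.ones(n - 1), y[:-1]])     # design matrix [1, Y_{t-1}]
(b0, b1), *_ = np.linalg.lstsq(A, y[1:], rcond=None)
```

With enough data the estimates land close to the true values b0 = 1.0 and b1 = 0.6; AR(p) works the same way with p lag columns in the design matrix.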

Page 66: Introduction to Time Series Forecasting

Moving Average (MA) Models

Consider the modeling in AR

Yt = f (Y1, . . . ,Yt−1, εt) (8)

Prediction based on previous values

In MA models, we model upon the white noise observations

Yt = f (ε1, . . . , εt−1, εt) (9)

Using the previous analogy, an MA(q) model learns

Yt = γ0 + εt + γ1εt−1 + γ2εt−2 + · · ·+ γqεt−q (10)

Page 67: Introduction to Time Series Forecasting

ARMA Models

ARMA models combine both AR and MA models

An ARMA(p,q) model predicts Yt using p previous values and q previous noise components

Yt = β0 + β1Yt−1 + β2Yt−2 + · · ·+ βpYt−p + εt + γ1εt−1 + γ2εt−2 + · · ·+ γqεt−q (11)

Page 68: Introduction to Time Series Forecasting

Differencing: Converting Non-stationary to Stationary

A time series which is non-stationary can be converted to a stationary time series by differencing

Y ′t = Yt − Yt−1

If still not stationary, do second order differencing

Y ′′t = Y ′t − Y ′t−1 = Yt − 2Yt−1 + Yt−2
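Both orders of differencing, and the identity above, in numpy (the data values are the airline passenger numbers used earlier):

```python
import numpy as np

# First- and second-order differencing, plus a check of the identity
# Y''_t = Y_t - 2*Y_{t-1} + Y_{t-2}.
y = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])

d1 = np.diff(y)        # Y'_t  = Y_t - Y_{t-1}
d2 = np.diff(y, n=2)   # Y''_t = Y'_t - Y'_{t-1}

identity = y[2:] - 2 * y[1:-1] + y[:-2]
print(list(d1))  # [6.0, 14.0, -3.0, -8.0, 14.0]
```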

Page 69: Introduction to Time Series Forecasting

ARIMA Models

Stands for Auto Regressive Integrated Moving Average

In ARIMA, the AR and MA parts are the same as in ARMA

However, I indicates the order of differencing applied

If differencing is done once, it is called I(1)

Thus an ARIMA(p,d,q) model is a combination of AR(p) andMA(q) with I(d)
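To make the role of the I(d) part concrete, here is a toy sketch, not the estimation procedure a statistical package would use: an ARIMA(1,1,0)-style forecast built by differencing once, fitting AR(1) on the differences by least squares, and integrating the forecasts back to the original scale. The function name and fitting approach are illustrative assumptions.

```python
import numpy as np

# Toy ARIMA(1,1,0)-style forecast:
# (1) difference once, (2) fit AR(1) on the differences,
# (3) undo the differencing by adding forecast differences onto the last level.
def arima_110_forecast(y, steps):
    d = np.diff(y)                                        # the I(1) step
    A = np.column_stack([np.ones(len(d) - 1), d[:-1]])    # [1, d_{t-1}]
    (b0, b1), *_ = np.linalg.lstsq(A, d[1:], rcond=None)  # AR(1) on diffs
    level, last_d, out = y[-1], d[-1], []
    for _ in range(steps):
        last_d = b0 + b1 * last_d    # forecast the next difference
        level = level + last_d       # integrate back to the original scale
        out.append(level)
    return out
```

Applied to a series with a steady upward drift, the forecasts continue the drift instead of flattening out, which is exactly what the differencing buys.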

Page 70: Introduction to Time Series Forecasting

How to Decide p, d, q?

Difficult for a human - one will have to look at various plots, run some tests, etc.

Another approach - Auto ARIMA

Selects p, d, and q automatically


Page 71: Introduction to Time Series Forecasting

Statistical Methods: Evaluation Metrics


Page 72: Introduction to Time Series Forecasting


Evaluation Metrics

Standard evaluation metrics for time series forecasting are:

Mean Absolute Error (MAE)

Mean Absolute Percentage Error (MAPE)

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

Normalized Root Mean Squared Error (NRMSE)

Mean Absolute Error (MAE)

MAE = (1/n) Σ_{j=1}^{n} |y_j - ŷ_j|    (13)

Measures the average magnitude of the errors

If MAE = 0, then no error

Unable to properly alert when the forecast is very off for a few points
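Equation (13) can be written directly in plain Python (the `mae` function name is our own; `y_hat` denotes the forecast ŷ):

```python
def mae(actual, forecast):
    """Mean Absolute Error: average magnitude of the errors."""
    errors = [abs(y - y_hat) for y, y_hat in zip(actual, forecast)]
    return sum(errors) / len(errors)

print(mae([3, 5, 8], [2, 5, 10]))  # (1 + 0 + 2) / 3 = 1.0
```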

Mean Absolute Percentage Error (MAPE)

MAPE = (100%/n) Σ_{j=1}^{n} |(y_j - ŷ_j) / y_j|    (14)

Percentage equivalent of MAE

Not defined for zero values
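A sketch of equation (14) in plain Python, making the zero-value caveat explicit (the `mape` helper and its ValueError are our own choices):

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error; undefined if any actual value is 0."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is not defined for zero actual values")
    terms = [abs((y - y_hat) / y) for y, y_hat in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

print(mape([100, 200], [110, 180]))  # (10% + 10%) / 2 = 10.0
```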

Mean Squared Error (MSE)

MSE = (1/n) Σ_{j=1}^{n} (y_j - ŷ_j)²    (15)

Measures the mean of the squared error

Those forecast values which are very off are penalized more

Squared values make it more difficult to interpret the errors

Root Mean Squared Error (RMSE)

RMSE = √( (1/n) Σ_{j=1}^{n} (y_j - ŷ_j)² )    (16)

Value of the loss is of similar magnitude as that of the prediction

Thereby making it more interpretable

Also punishes large prediction errors
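Equations (15) and (16) differ only by the square root; a minimal sketch in plain Python (function names `mse`/`rmse` are our own):

```python
import math

def mse(actual, forecast):
    """Mean Squared Error: penalizes large errors quadratically."""
    sq = [(y - y_hat) ** 2 for y, y_hat in zip(actual, forecast)]
    return sum(sq) / len(sq)

def rmse(actual, forecast):
    """Root Mean Squared Error: back on the scale of the data."""
    return math.sqrt(mse(actual, forecast))

print(mse([1, 2, 3], [1, 2, 6]))   # 9 / 3 = 3.0
print(rmse([1, 2, 3], [1, 2, 6]))  # sqrt(3.0) ≈ 1.732
```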

Normalized Root Mean Squared Error (NRMSE)

NRMSE = √( (1/n) Σ_{j=1}^{n} (y_j - ŷ_j)² ) / Z    (17)

where Z is the normalization factor

NRMSE allows for comparison between models across different datasets

Common normalization factors:

Mean: preferred when the same preprocessing and predicted feature are used
Range: sensitive to sample size
Standard Deviation: suitable across datasets as well as predicted features
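Equation (17) with the three normalization factors can be sketched as follows (the `nrmse` helper and its `normalizer` keyword are our own; we normalize by statistics of the actual series):

```python
import math
import statistics

def nrmse(actual, forecast, normalizer="mean"):
    """RMSE divided by a normalization factor Z: mean, range, or std of actual."""
    sq = [(y - y_hat) ** 2 for y, y_hat in zip(actual, forecast)]
    rmse = math.sqrt(sum(sq) / len(sq))
    if normalizer == "mean":
        z = statistics.mean(actual)
    elif normalizer == "range":
        z = max(actual) - min(actual)
    else:  # "std"
        z = statistics.stdev(actual)
    return rmse / z

print(nrmse([10, 20, 30], [12, 20, 28], normalizer="range"))
```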

Classical ML Models

Classical ML Models: Preparing Data

Preparing Time Series Data for Machine Learning

Time   Extra Feature   Feature of Interest
t1     e1              x1
t2     e2              x2
t3     e3              x3
t4     e4              x4
t5     e5              x5
t6     e6              x6
t7     e7              x7
t8     e8              x8
t9     e9              x9
t10    e10             x10
t11    e11             x11
t12    e12             x12

One Step Forecasting Setup

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7
t7     e7              x7                    x8
t8     e8              x8                    x9
t9     e9              x9                    x10
t10    e10             x10                   x11
t11    e11             x11                   x12
t12    e12             x12                   NaN
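Building the forecast column amounts to shifting the feature of interest back by one timestep; a minimal sketch in plain Python (the row layout and the `add_one_step_target` name are our own):

```python
NaN = float("nan")

def add_one_step_target(rows):
    """Append the next timestep's feature of interest as the forecast target.
    Each row is (time, extra_feature, feature_of_interest)."""
    out = []
    for i, (t, e, x) in enumerate(rows):
        target = rows[i + 1][2] if i + 1 < len(rows) else NaN  # last row has no target
        out.append((t, e, x, target))
    return out

data = [("t1", "e1", 1.0), ("t2", "e2", 2.0), ("t3", "e3", 3.0)]
for row in add_one_step_target(data):
    print(row)  # last row ends in nan, like the NaN in the table above
```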

Random Split

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7
t7     e7              x7                    x8
t8     e8              x8                    x9
t9     e9              x9                    x10
t10    e10             x10                   x11
t11    e11             x11                   x12

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t4     e4              x4                    x5
t6     e6              x6                    x7
t7     e7              x7                    x8
t9     e9              x9                    x10
t10    e10             x10                   x11

Table: Train Set

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t3     e3              x3                    x4
t5     e5              x5                    x6
t8     e8              x8                    x9
t11    e11             x11                   x12

Table: Test Set

Sequential Split

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7
t7     e7              x7                    x8
t8     e8              x8                    x9
t9     e9              x9                    x10
t10    e10             x10                   x11
t11    e11             x11                   x12

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7
t7     e7              x7                    x8
t8     e8              x8                    x9

Table: Train Set

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t9     e9              x9                    x10
t10    e10             x10                   x11
t11    e11             x11                   x12

Table: Test Set
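A sequential split keeps temporal order: the earliest rows train, the remainder tests. A minimal sketch (the `sequential_split` helper and its `train_fraction` parameter are our own):

```python
def sequential_split(rows, train_fraction=0.7):
    """Split a time-ordered list of rows into train and test without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

rows = list(range(1, 12))  # stand-in for rows t1..t11
train, test = sequential_split(rows)
print(train)  # [1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9, 10, 11]
```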

Multiple Train-Test Split

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4

Table: Train Set 1

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7

Table: Test Set 1

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t1     e1              x1                    x2
t2     e2              x2                    x3
t3     e3              x3                    x4
t4     e4              x4                    x5
t5     e5              x5                    x6
t6     e6              x6                    x7

Table: Train Set 2

Time   Extra Feature   Feature of Interest   Forecast Feature of Interest
t7     e7              x7                    x8
t8     e8              x8                    x9
t9     e9              x9                    x10

Table: Test Set 2

Multiple Train-Test Split (contd.)

Train Size Test Size

3 timesteps (t1 - t3) 3 timesteps (t4 - t6)

6 timesteps (t1 - t6) 3 timesteps (t7 - t9)

9 timesteps (t1 - t9) 3 timesteps (t10 - t12)

12 timesteps (t1 - t12) 3 timesteps (t13 - t15)

Expanding Window Multiple Sets

Train Size                 Test Size
10 timesteps (t1 - t10)    t11
11 timesteps (t1 - t11)    t12
12 timesteps (t1 - t12)    t13
13 timesteps (t1 - t13)    t14
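The expanding-window (walk-forward) scheme above can be sketched as a generator in plain Python (the `walk_forward` name and `initial_train` parameter are our own):

```python
def walk_forward(rows, initial_train=10):
    """Expanding-window validation: each step trains on all data so far
    and tests on the single next timestep."""
    for end in range(initial_train, len(rows)):
        yield rows[:end], rows[end]

rows = list(range(1, 15))  # stand-in for timesteps t1..t14
for train, test in walk_forward(rows):
    print(len(train), "timesteps ->", test)
# 10 timesteps -> 11, 11 timesteps -> 12, 12 timesteps -> 13, 13 timesteps -> 14
```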

Fixed Window Sequential Data

Sequence No.   Input Data     Output Data
1              t1, t2, t3     t4
2              t2, t3, t4     t5
3              t3, t4, t5     t6
4              t4, t5, t6     t7
5              t5, t6, t7     t8
6              t6, t7, t8     t9
7              t7, t8, t9     t10
8              t8, t9, t10    t11
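The fixed-window pairs above come from sliding a window of constant size over the series; a minimal sketch (the `fixed_windows` helper is our own):

```python
def fixed_windows(series, window=3):
    """Slide a fixed-size window over the series; the next value is the output."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs

series = ["t1", "t2", "t3", "t4", "t5"]
for inp, out in fixed_windows(series):
    print(inp, "->", out)
# ['t1', 't2', 't3'] -> t4
# ['t2', 't3', 't4'] -> t5
```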

Comparison of Different Dataset Preparation

Split Approach                   Comments
Random Split                     Not advisable, as the temporal information is lost
Sequential Split                 Mostly preferred on large datasets
Multiple Splits                  Leads to leakage of data
Expanding Window Multiple Sets   Also known as Walk Forward validation

Classical ML Models: ML Models

Linear Regression

Models change in one variable through change in other variables

Method for finding the linear relationship between independent and dependent variables

Assuming a linear relationship exists!

Also known as line of best fit, ordinary least squares regression, etc.

Estimating Linear Regression

Simple univariate linear regression

Given training data of the form (x, y), learn w and b such that

(y - (wx + b))²    (18)

is minimized

Simple multivariate linear regression

Given training data of the form (X, y) where X is n-dimensional, learn w_1, ..., w_n and b such that

(y - (Σ_{i=1}^{n} w_i x_i + b))²    (19)

is minimized
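For the univariate case, equation (18) has a well-known closed-form minimizer; a minimal sketch in plain Python (the `fit_simple_ols` name is our own):

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for y ≈ w*x + b (univariate case):
    w = cov(x, y) / var(x), b = mean(y) - w * mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

w, b = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])  # data lies exactly on y = 2x + 1
print(w, b)  # 2.0 1.0
```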

Forecasting with Linear Regression

Simple univariate linear regression

Given new data x, forecast using learned parameters w and b as

ŷ = wx + b    (20)

Simple multivariate linear regression

Given new data X where X is n-dimensional, forecast using learned parameters w_1, ..., w_n and b as

ŷ = Σ_{i=1}^{n} w_i x_i + b    (21)

Support Vector Regression

This model uses the concept of support vectors for regression

Performs linear regression in high dimensional feature space

Aim is to fit the error within a threshold range

A hyperplane is obtained such that the loss is minimized

Loss is considered to be zero within a small deviation ε from the hyperplane

Estimating SVR

Multivariate scenario

Given training data of the form (X, y) where X is n-dimensional, learn w_1, ..., w_n and b such that

Loss = 0                     if |f(x_i) - y_i| < ε
       |f(x_i) - y_i| - ε    otherwise            (22)

is minimized, where f(x) = Σ_{i=1}^{n} w_i x_i + b
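The ε-insensitive loss in equation (22) is simple to state in plain Python (the `epsilon_insensitive_loss` name and the default ε are our own):

```python
def epsilon_insensitive_loss(predicted, actual, eps=0.1):
    """SVR loss: zero inside the ε-tube around the hyperplane, linear outside."""
    err = abs(predicted - actual)
    return 0.0 if err < eps else err - eps

print(epsilon_insensitive_loss(1.05, 1.0))  # 0.0 (inside the tube)
print(epsilon_insensitive_loss(1.5, 1.0))   # ≈ 0.4 (linear penalty outside)
```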

Deep Learning Models

Feedforward Neural Networks

[Figure: a fully connected feedforward network with an input layer, a hidden layer, and an output layer]

Feedforward Neural Network: Forward Propagation

Let X = (x_1, ..., x_n) be the set of input features

Hidden layer activations: a_j = f(Σ_{i=1}^{n} W_{ji} x_i), ∀ j ∈ 1, ..., h

Feedforward Neural Network: Forward Propagation

Let a = (a_1, ..., a_h) be the set of hidden layer features

Output neurons: o_k = g(Σ_{j=1}^{h} U_{kj} a_j), ∀ k ∈ 1, ..., K
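The two forward-propagation steps above can be combined into a small sketch in plain Python; the list-of-lists weight matrices `W` and `U`, the tanh hidden activation, and the identity output are our illustrative choices, not the slides':

```python
import math

def forward(x, W, U, f=math.tanh, g=lambda v: v):
    """Forward pass: a_j = f(Σ_i W[j][i] x_i), then o_k = g(Σ_j U[k][j] a_j)."""
    a = [f(sum(w_ji * x_i for w_ji, x_i in zip(row, x))) for row in W]
    o = [g(sum(u_kj * a_j for u_kj, a_j in zip(row, a))) for row in U]
    return o

W = [[0.5, -0.5], [1.0, 1.0]]  # 2 hidden units, 2 inputs
U = [[1.0, 1.0]]               # 1 output unit
print(forward([1.0, 2.0], W, U))
```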

Feedforward Neural Network: Learning Algorithm

Adjust weights W and U to minimize the error on training set

Define the error to be the squared loss between predictions and true output

E = (1/2)(y - o)²    (23)

Gradient with respect to the output is

∂E/∂o_k = -(y_k - o_k) = (o_k - y_k)    (24)

Recurrent Neural Networks

Feed forward networks cannot handle sequences

If sequential data is flattened, then it can be learned by FFN

However, weights will not be shared across timesteps

Recurrent networks to the rescue!

An Unrolled RNN

NOTE: The hidden state h_t gives a summary of the sequence till time t

Forward pass:

h_t = tanh(W h_{t-1} + U x_t + b_h)
z_t = softmax(V h_t + b_z)
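The recurrence for h_t can be sketched with a scalar toy cell in plain Python (scalar weights `W`, `U`, `b_h` are our simplification of the matrix form above; the softmax output is omitted):

```python
import math

def rnn_step(h_prev, x_t, W, U, b_h):
    """One step of a scalar RNN cell: h_t = tanh(W*h_{t-1} + U*x_t + b_h)."""
    return math.tanh(W * h_prev + U * x_t + b_h)

h = 0.0
for x in [1.0, 0.5, -0.5]:              # a short input sequence
    h = rnn_step(h, x, W=0.8, U=0.5, b_h=0.0)
print(h)  # hidden state summarizing the sequence seen so far
```

The same weights are reused at every timestep, which is exactly the sharing that a flattened feedforward network lacks.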

Long Short Term Memory (LSTM) Network

Forward pass¹:

f_t = σ(W_f [h_{t-1}; x_t] + b_f)
i_t = σ(W_i [h_{t-1}; x_t] + b_i)
a_t = tanh(W_a [h_{t-1}; x_t] + b_a)
o_t = σ(W_o [h_{t-1}; x_t] + b_o)

C_t = f_t * C_{t-1} + i_t * a_t
h_t = o_t * tanh(C_t)

¹For a more detailed treatment of neural networks, refer to the ICON 2018 slides at http://www.cfilt.iitb.ac.in/documents/ICON_Tutorial_2018.pdf by Kevin Patel and Himanshu Singh
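The data flow of the LSTM equations can be traced with a scalar toy cell in plain Python; fixing all weights to 1 and biases to 0 is our simplification purely to keep the sketch short:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(h_prev, c_prev, x_t):
    """Scalar LSTM cell with all weights 1 and biases 0, to show the data flow."""
    s = h_prev + x_t                 # toy stand-in for W[h_{t-1}; x_t] + b
    f_t = sigmoid(s)                 # forget gate
    i_t = sigmoid(s)                 # input gate
    a_t = math.tanh(s)               # candidate cell state
    o_t = sigmoid(s)                 # output gate
    c_t = f_t * c_prev + i_t * a_t   # new cell state
    h_t = o_t * math.tanh(c_t)       # new hidden state
    return h_t, c_t

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(h, c, x)
print(h, c)
```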

Connection with NLP

Connection with NLP: Problem Level

Problem Level Connections

Discuss connections and similarities among problems, and how one solution can impact another

Time Series Forecasting benefiting from NLP

NLP benefiting from Time Series Forecasting

On the Importance of Text Analysis for Stock Price Prediction

By Lee et al. (2014)

Forecast companies' stock price changes (UP, DOWN, STAY) in response to financial events reported by them in 8-K documents

Baseline: Using recent stock price movement and earnings surprise

Contribution: Using textual information from 8-K documents along with recent stock price movement and earnings surprise

Observation: Proposed system outperforms baseline by 10%

Resource Contribution: Annotated 8-K documents for use in further research

Semantic Frames to Predict Stock Price Movement

Xie et al. (2013)

Uses FrameNet information to generalize specific sentences to scenarios

Semantic Frames to Predict Stock Price Movement (contd.)

Predict 1) change in stock price or 2) polarity of change (up/down)

Baseline: BOW features and LDA

Contribution: FWD (Frames, BOW, and part-of-speech specific DAL scores) features and SemTree data representations

Model: SVM with tree kernels

Observation: Proposed features assist significantly in the polarity task, and show promise in the change task.

Stock Movement Prediction from Tweets and Historical Prices

Xu and Cohen (2018)

Present a novel deep generative model jointly exploiting text and price signals for this task

Introduce recurrent, continuous latent variables for better treatment of stochasticity, and use neural variational inference

Resource Contribution: A new stock movement prediction dataset²

²https://github.com/yumoxu/stocknet-dataset

Identifying and Following Expert Investors in Stock Microblogs

Bar-Haim et al. (2011)

Task: Identify expert investors from the information published in online stock investment message boards / tweets

Indirect evaluation by considering the advice of detected experts in stock prediction

Baseline: Assume all users are experts

Contribution: A probabilistic expert finding framework

Observation: Information from tweets of identified experts allowed forecasting stock price movement with higher precision.

Connection with NLP: Tooling Level

Attention

Anyone familiar with technical indicators in stock market predictions will probably have an epiphany at this point

Attention can be used by a neural network to attend to arbitrary portions of the time signal for forecasting

Qin et al. (2017) use two attentions in their paper to improve time series prediction

An input attention which adaptively extracts relevant input features (more interpretable)
A temporal attention over the encoder states (better performance)

Outperforms state-of-the-art on two time series prediction datasets

Transfer Learning

From AlexNet and ResNet in computer vision to BERT and ELMo in NLP, transfer learning has proved its effectiveness

The idea of learning in one scenario and applying the same in another with little tuning is fascinating

What could be the equivalent in time series forecasting?

Ye and Dai (2018) mix transfer learning with online sequential extreme learning machine and ensemble learning

Does not discard long-ago data

Instead, the authors claim that their model is able to transfer knowledge from long-ago data

Showed effectiveness on multiple synthetic and real world datasets

Demos

Demos: Statsmodels Library

Statsmodels Library

Statsmodels is a Python package that allows users to explore data, estimate statistical models, and perform statistical tests

Has an extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics

For different data types and estimators

Built on top of NumPy and SciPy

Integrates with Pandas

Demos: Prophet Library

Prophet Library

An open source library released by Facebook

Originally designed to forecast business data internally at Facebook

For more details, refer to Taylor and Letham (2018)

Prophet

It is an additive regression model with 4 components

Y_t = g_t + s_t + h_t + ε_t    (25)

where g_t is trend, s_t is seasonality, h_t is holidays, and ε_t is the error term.

It automatically detects the change points in data

It is robust to missing data and shifts in the trend, and handles outliers well.

Other Options

Google AutoML
  Provides a web interface
  Models being used are hidden
  Provides a final metric of the model being used
  Has a trial period

Microsoft Azure
  Provides a web interface
  Exposes the list of models
  Provides metrics for each model being tested
  Has a trial period

Amazon Forecast
  Provides a web interface
  Exposes the list of models
  Provides a final metric of the model being used
  No trial period

Demo Content

The code for the demo on Statsmodels and ARIMA can be found at https://github.com/Sandhya2207/ICON-2019-TSF-demo

The code for the demo on how statistical techniques help deep learning techniques can be found at https://github.com/KevinNPatel/icon2019_demo

Conclusion

Time Series Forecasting Competitions

Santa Fe Time Series Prediction and Analysis Competition (1994)

International Workshop on Advanced Black-box Techniques for Nonlinear Modeling competition (1998)

NN3 and NN5 competitions

Kaggle challenges

Makridakis challenges (M1, M2, M3 and M4)

Time Series Forecasting Conferences

Makridakis conferences: https://mofc.unic.ac.cy/m-conferences/

International conference on Time Series and Forecasting (ITISE)

ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD)

IEEE International Conference on Data Mining (ICDM)

Society for Industrial and Applied Mathematics (SIAM)

The usual ACL, EMNLP, etc. for the interplay between text and time series

Time Series Forecasting Datasets

UCR data: https://www.cs.ucr.edu/~eamonn/UCRsuite.html

Makridakis challenge data

Simple data for testing models using generative models like ARIMA

Simple real world datasets³:
  Airline Passenger dataset
  Shampoo Sales dataset
  Minimum Daily Temperatures dataset
  Monthly Sunspot dataset
  Daily Female Births dataset
  EEG Eye State dataset
  Occupancy Detection dataset
  Ozone Level Detection dataset

³https://machinelearningmastery.com/time-series-datasets-for-machine-learning/

Conclusion

Time Series Forecasting - an interesting challenge

Provided background and clarified terminology of time series

Discussed different approaches:
  Statistical Approaches
  Classical ML
  Deep Learning

Discussed a few papers showcasing the interplay between NLP and Time Series, and potential future directions worth exploring

Thank You

References I

Bar-Haim, R., Dinur, E., Feldman, R., Fresko, M., and Goldstein, G. (2011). Identifying and following expert investors in stock microblogs. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1310-1319, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Lee, H., Surdeanu, M., MacCartney, B., and Jurafsky, D. (2014). On the importance of text analysis for stock price prediction. In LREC, pages 1170-1175.

Qin, Y., Song, D., Cheng, H., Cheng, W., Jiang, G., and Cottrell, G. W. (2017). A dual-stage attention-based recurrent neural network for time series prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 2627-2633. AAAI Press.

References II

Taylor, S. J. and Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1):37-45.

Xie, B., Passonneau, R. J., Wu, L., and Creamer, G. G. (2013). Semantic frames to predict stock price movement. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 873-883, Sofia, Bulgaria. Association for Computational Linguistics.

Xu, Y. and Cohen, S. B. (2018). Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1970-1979.

Ye, R. and Dai, Q. (2018). A novel transfer learning framework for time series forecasting. Knowledge-Based Systems, 156:74-99.
