causality detection

Causality Detection in Time Series Tushar Mehndiratta IDD CSE ( V year) 10211026

Upload: tushar-mehndiratta

Post on 10-Jul-2015


TRANSCRIPT

Page 1: Causality detection

Causality Detection in Time Series

Tushar Mehndiratta

IDD CSE (V year)

10211026

Page 2: Causality detection

Overview

◦ Introduction

◦ Detecting Causality
  Control Experimentation
  Granger Causality

◦ Building Causal Relationship Graphs
  Exhaustive Granger Method
  LASSO-Granger Method
  Forward-Backward Granger Method

◦ Applications of Causal Modelling
  Brain Imaging
  Topic Mining
  Anomaly Detection

◦ Conclusion

Page 3: Causality detection

Why?

Why did the apple fall down instead of going up?

Why does average temperature rise?

Why did the stock market fall?

Why did a post go viral on Facebook?

CAUSE EFFECT RELATIONSHIPS

Page 4: Causality detection

Cause-Effect Relationships

◦ Causality is the relation between two events, a cause and an effect, where the effect occurs as a consequence of the cause.

◦ The effect is “What happened?” and the cause is “Why did it happen?”

◦ e.g. In the case of global warming, the increase in greenhouse gases is the cause and the increase in average temperature is the effect.[1]

Page 5: Causality detection

Characteristics of Causal Relationships

◦ Temporal precedence: the cause occurs before the effect. e.g. A person must smoke first and only then develops lung cancer.

◦ Co-occurrence: whenever the cause happens, the effect must also happen; the cause cannot be isolated from the effect. e.g. Whenever there is a net force on a body, it accelerates.

Is causality the same as association, then?

Page 6: Causality detection

Correlation vs. Causation

◦ Correlation does not imply causation.

◦ Correlation only means that two events occur together more often than chance alone would explain.[2]
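The point can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: a hidden variable Z drives both X and Y, the noise levels and sample size are arbitrary, and the helper name pearson is hypothetical. X and Y are strongly correlated, yet setting X by hand leaves Y unaffected.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)
n = 5000

# Hidden common cause Z drives both X and Y; X has no effect on Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
corr_observed = pearson(x, y)

# Intervention: X is set independently of Z, while Y is generated as before.
x_forced = [rng.gauss(0, 1) for _ in range(n)]
corr_intervened = pearson(x_forced, y)

print(f"observed corr(X, Y)      = {corr_observed:.2f}")
print(f"corr(X, Y) after forcing = {corr_intervened:.2f}")
```

The observed correlation is high only because of Z; once X is forced, the correlation with Y drops to roughly zero.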

Page 7: Causality detection

Fields of Study

Physics

Econometrics
Types of data: web metrics, stock prices, sales (all time series)

Medicine
Types of data: experimental results, gene sequences (sequential data), brain signals (time series)

Climate Science
Types of data: weather conditions (spatio-temporal or temporal data)

HOW TO DETECT CAUSALITY?

Page 8: Causality detection

Detecting Causality

To test if X causes Y

Page 9: Causality detection

Control Experimentation

Aim: To find out what happens to a system when you interfere with it.

Divide subjects randomly into two groups: Test and Control.

Introduce X only in the test group and observe Y in both.

If X causes Y:

$P(Y = y \mid do(X)) > P(Y = y \mid \neg do(X))$

Observing this inequality implies causality.
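A minimal simulation of such an experiment, assuming a made-up mechanism in which X raises the probability of Y from 0.2 to 0.5. The group sizes, probabilities and the function name outcome are all illustrative, not from the slides.

```python
import random

rng = random.Random(1)

# Random assignment: shuffle the subjects, then split them in half.
subjects = list(range(2000))
rng.shuffle(subjects)
test_group, control_group = subjects[:1000], subjects[1000:]

def outcome(has_x):
    """Hypothetical mechanism: X raises the chance of Y from 0.2 to 0.5."""
    return rng.random() < (0.5 if has_x else 0.2)

# X is introduced only in the test group; Y is observed in both groups.
y_test = [outcome(True) for _ in test_group]
y_control = [outcome(False) for _ in control_group]

p_do = sum(y_test) / len(y_test)               # estimate of P(Y | do(X))
p_not_do = sum(y_control) / len(y_control)     # estimate of P(Y | not do(X))
print(f"P(Y | do(X)) ~ {p_do:.2f}, P(Y | not do(X)) ~ {p_not_do:.2f}")
```

Because assignment is random, a clear gap between the two estimated probabilities can be attributed to X rather than to a hidden difference between the groups.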

Page 10: Causality detection

Disadvantages of Control Experimentation

◦ It is not always possible to carry out the experiment.

◦ Most time series data cannot be manipulated, e.g. climate and stock data.

◦ We have to resort to statistical methods to determine causality.

HOW TO DO IT IN TIME SERIES?

Page 11: Causality detection

Time Series

◦ A time series is a sequence of data points, typically measured at successive points in time spaced at uniform intervals.

Page 12: Causality detection

Granger Causality

◦ Also known as predictive causality.

◦ Granger proposed that causality can be reflected by how well the future values of one time series can be predicted from the past values of another time series.

◦ Two main principles:

The cause must occur before the effect.

The cause can be used to predict the effect, i.e. the cause carries unique information about the future values of the effect.

Page 13: Causality detection

Granger Causality

Suppose X and Y are two time series. For X to cause Y:

$P[Y(t+1) \mid \Gamma(t)] \neq P[Y(t+1) \mid \Gamma_{-X}(t)]$

$\Gamma(t)$ and $\Gamma_{-X}(t)$ denote the “information in the universe up to time t” and the “information in an alternate universe up to time t in which X is excluded”.

Page 14: Causality detection

Performing the Granger Causality Test

◦ Model 1: Regress $Y_t$ on the past values of both X and Y, $E(Y \mid Y_{t-k}, X_{t-k})$:

$Y_t = \sum_{j=1}^{m} \alpha_j Y_{t-j} + \sum_{i=1}^{n} \beta_i X_{t-i} + D_t + \varepsilon_t$

◦ Model 2: Regress $Y_t$ on the past values of Y only, $E(Y \mid Y_{t-k})$:

$Y_t = \sum_{j=1}^{m} \alpha_j Y_{t-j} + D_t + \varepsilon_t$

◦ Check whether Model 1 predicts significantly better than Model 2 by performing an F-test.[11]
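The two-model comparison can be sketched with NumPy as follows. This is an illustration, not the author's implementation. It assumes the same number of lags for X and Y, uses a constant intercept as a stand-in for the D_t term (which the slides do not define), and runs on synthetic data in which X drives Y with a lag of 1. The function names are hypothetical.

```python
import numpy as np

def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags .. len-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])

def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(x, y, lags=2):
    """F statistic for 'X Granger-causes Y' (Model 1 vs. Model 2)."""
    target = y[lags:]
    const = np.ones((len(target), 1))            # assumed stand-in for D_t
    restricted = np.hstack([const, lagged(y, lags)])    # Model 2: past Y only
    full = np.hstack([restricted, lagged(x, lags)])     # Model 1: past Y and past X
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - full.shape[1]
    return ((rss_r - rss_f) / lags) / (rss_f / dof)

# Synthetic example: Y depends on the previous value of X, but not vice versa.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.5 * rng.normal(size=499)

f_xy = granger_f(x, y)   # large F expected: past X improves the prediction of Y
f_yx = granger_f(y, x)   # small F expected: past Y does not help predict X
print(f"F(X -> Y) = {f_xy:.1f}, F(Y -> X) = {f_yx:.1f}")
```

The statistic is then compared against the critical value of an F distribution with (lags, dof) degrees of freedom at the chosen significance level.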

Page 15: Causality detection

Granger Causality

• CONS

It does not take into account the effect of hidden common causes (confounders).

It assumes that all relationships are linear and does not account for non-linear dependencies.

HOW TO DEAL WITH MULTIPLE TIME SERIES?

Page 16: Causality detection

Relationship Graphs in Time Series

Extending the concept of Granger Causality to multiple time series

Page 17: Causality detection

Relationship Graphs

◦ A relationship graph has all the time series as nodes; an edge between two nodes denotes the direction of the relationship between them.

◦ Input:

Matrix X of time series

Xlag, the lagged version of the time series matrix X

◦ Output:

Relationship graph over the time series with nodes xi, and an edge from xi to xj if xi causes xj.

[Figure: xi → xj]
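One possible layout for Xlag is sketched below. The slides do not specify a layout, so the column ordering and the function name build_lagged_matrix are assumptions made for illustration.

```python
def build_lagged_matrix(X, max_lag):
    """Return Xlag with one row per time step t >= max_lag.

    X is a list of N series, each a list of P values. Row t holds
    x_i[t - k] for every series i and every lag k = 1..max_lag, ordered
    series by series: (x_0 lag 1, x_0 lag 2, ..., x_1 lag 1, ...).
    """
    length = len(X[0])
    rows = []
    for t in range(max_lag, length):
        rows.append([series[t - k] for series in X for k in range(1, max_lag + 1)])
    return rows

X = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
Xlag = build_lagged_matrix(X, max_lag=2)
for row in Xlag:
    print(row)
```

The first max_lag time steps are dropped because they lack a complete history.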

Page 18: Causality detection

Exhaustive Graphical Granger Method

◦ Algorithm:

◦ For every pair of nodes (xi, xj), perform the following:

Insert an edge xi → xj if Granger(xi, xj, Xlag) = ‘yes’ and Granger(xj, xi, Xlag) = ‘no’

Insert an edge xi ← xj if Granger(xi, xj, Xlag) = ‘no’ and Granger(xj, xi, Xlag) = ‘yes’

Insert an edge xi ↔ xj if Granger(xi, xj, Xlag) = ‘yes’ and Granger(xj, xi, Xlag) = ‘yes’
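The pairwise loop can be sketched as below. The predicate granger(i, j) is a hypothetical placeholder for the pairwise Granger test. Here it is stubbed with a fixed table so that the edge logic can be read in isolation.

```python
from itertools import combinations

def exhaustive_granger_graph(n_series, granger):
    """Build the edge set from pairwise tests.

    granger(i, j) is a caller-supplied predicate (hypothetical here) that
    returns True when series i Granger-causes series j. Edges are stored as
    directed pairs (cause, effect); a bidirectional relationship appears as
    both (i, j) and (j, i).
    """
    edges = set()
    for i, j in combinations(range(n_series), 2):
        if granger(i, j):
            edges.add((i, j))
        if granger(j, i):
            edges.add((j, i))
    return edges

# Stubbed test outcomes: x0 -> x1, and x1 <-> x2.
known = {(0, 1), (1, 2), (2, 1)}
edges = exhaustive_granger_graph(3, lambda i, j: (i, j) in known)
print(sorted(edges))
```

Testing both directions independently covers the three cases on the slide: one-way in either direction, or both ways.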

Page 19: Causality detection

Exhaustive Graphical Granger Method

◦ Complexity

With N time series, T lags each, and P time stamps (sample size), the complexity is $O(N^2 P^2 T^2)$.

◦ Shortcomings

Each pairwise test ignores the effect of the other time series.

Computationally expensive.

Page 20: Causality detection

The LASSO-Granger Method

LASSO: Least Absolute Shrinkage and Selection Operator

◦ Uses variable selection for causality detection.

◦ The aim is to identify the subset of time series on which xi is conditionally dependent, and the lags at which it depends on them.

◦ This is achieved by applying variable selection to the set of time series and their lags.

◦ Variable selection is done by LASSO.

Page 21: Causality detection

LASSO

◦ A variable selection method for linear regression.

◦ Selects a subset of variables by solving

$\hat{w} = \arg\min_{w} \frac{1}{n} \sum_{i=1}^{n} (w \cdot x_i - y_i)^2 + \lambda \lVert w \rVert_1$

Here w is the vector of coefficients, y is the variable to be predicted, and λ controls the strength of the penalty.

◦ The aim is to minimize the OLS error together with the sum of the absolute values of the coefficients, which prevents overfitting.

◦ LARS (Least Angle Regression): an efficient method for computing the LASSO solution.
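To see why the penalty performs selection, consider the single-variable case of the objective (1/n)·Σ(w·x − y)² + λ|w|. It has a closed-form solution by soft-thresholding. The sketch below uses that closed form, not LARS, and the function names and data are illustrative.

```python
def soft_threshold(value, amount):
    """Shrink value toward zero by amount, clipping at zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def lasso_1d(x, y, lam):
    """Minimise (1/n) * sum((w*x_i - y_i)^2) + lam * |w| over a scalar w."""
    n = len(x)
    a = sum(xi * xi for xi in x) / n               # (1/n) * sum of x_i^2
    b = sum(xi * yi for xi, yi in zip(x, y)) / n   # (1/n) * sum of x_i * y_i
    return soft_threshold(b, lam / 2) / a

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]                 # exactly y = 2x

w_ols = lasso_1d(x, y, lam=0.0)          # no penalty: the OLS slope
w_small = lasso_1d(x, y, lam=3.0)        # mild penalty: slope shrunk toward 0
w_large = lasso_1d(x, y, lam=40.0)       # strong penalty: slope exactly 0
print(w_ols, w_small, w_large)
```

A sufficiently large λ sets the coefficient exactly to zero, which is how variables are dropped from the model.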

Page 22: Causality detection

LARS (Least Angle Regression)

Step 1: Start with û0 = 0

Step 2: The residual ŷ2 − û0 has a greater correlation with x1 than with x2

Page 23: Causality detection

LARS (Least Angle Regression)

Step 3: Move in the direction of x1

Page 24: Causality detection

LARS (Least Angle Regression)

Step 4: First LARS estimate: û1 = û0 + ƛx1, where the residual ŷ2 − û1 has equal correlation with both x1 and x2

Page 25: Causality detection

LARS (Least Angle Regression)

Step 5: Move in the direction of the angular bisector of x1 and x2

Page 26: Causality detection

The LASSO-Granger Method

◦ Algorithm

Obtain Xlag (the lagged version of the time series matrix X).

For each xi in X:

y = xi

Perform LASSO(y, Xlag)

Wi: the set of time series for which the coefficients returned by LASSO are non-zero

Add edge (xj, xi) to the graph if xj is in Wi
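The algorithm above can be sketched end to end with NumPy. Several choices here are assumptions rather than part of the slides. The LASSO problem is solved by plain coordinate descent instead of LARS, self-loops (xi → xi) are skipped, and the value of λ, the lag count and the synthetic data are illustrative. The function names are hypothetical.

```python
import numpy as np

def lasso_cd(A, y, lam, n_iter=200):
    """Coordinate descent for (1/n) * ||A w - y||^2 + lam * ||w||_1."""
    n, d = A.shape
    w = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            partial = y - A @ w + A[:, j] * w[j]    # residual ignoring feature j
            rho = A[:, j] @ partial / n
            # Soft-threshold update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / col_sq[j]
    return w

def lasso_granger(X, max_lag, lam):
    """X has shape (P, N): P time stamps, N series. Returns edges (cause, effect)."""
    P, N = X.shape
    X = X - X.mean(axis=0)
    # Xlag: for each series j, the columns for lags 1..max_lag sit side by side.
    Xlag = np.column_stack([X[max_lag - k : P - k, j]
                            for j in range(N) for k in range(1, max_lag + 1)])
    edges = set()
    for i in range(N):
        w = lasso_cd(Xlag, X[max_lag:, i], lam)
        for j in range(N):
            # x_j is in W_i if any of its lagged copies has a non-zero coefficient.
            if j != i and np.any(w[j * max_lag : (j + 1) * max_lag] != 0.0):
                edges.add((j, i))
    return edges

# Synthetic data: x0 drives x1 with lag 1; x2 is independent noise.
rng = np.random.default_rng(0)
P = 400
x0 = rng.normal(size=P)
x1 = np.zeros(P)
x1[1:] = 0.9 * x0[:-1] + 0.3 * rng.normal(size=P - 1)
x2 = rng.normal(size=P)

edges = lasso_granger(np.column_stack([x0, x1, x2]), max_lag=2, lam=0.5)
print(sorted(edges))
```

Each series is regressed once on all lagged series, so the number of LASSO fits grows linearly with N rather than with the number of pairs.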

Page 27: Causality detection

The LASSO-Granger Method

◦ Complexity

Using LARS to solve the LASSO problem: $O(P N^2 T^2)$.

◦ Pros

Computationally less expensive.

Can be used when the number of series is large compared to the number of data points.

Consistency: the probability of LASSO falsely including a non-neighboring feature in its neighborhood is very small, even when the number of features is very large.

Page 28: Causality detection

Forward-Backward Granger Causality

◦ An improvement on the LASSO-Granger algorithm.

◦ Inspired by physics.

◦ Principle: if time is reversed, all relationships must remain the same except for a change in direction, i.e. if xi causes xj with a time lag of k, then on reversing time xj causes xi with a time lag of k.

◦ Apply LASSO-Granger to both the forward and the backward (time-reversed) series and combine the two results.
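The time-reversal principle can be sketched as follows. Two simplifications are assumptions made here, not taken from the slides. A toy lagged-correlation learner stands in for LASSO-Granger, and the two results are combined by keeping only the edges found in both passes, since the slides do not state the combination rule.

```python
import random

def lagged_corr(a, b, lag):
    """Pearson correlation between a[t - lag] and b[t]."""
    u, v = a[:-lag], b[lag:]
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    sd_u = sum((p - mean_u) ** 2 for p in u) ** 0.5
    sd_v = sum((q - mean_v) ** 2 for q in v) ** 0.5
    return cov / (sd_u * sd_v)

def learn_edges(series, lag=1, cutoff=0.5):
    """Toy stand-in for LASSO-Granger: edge (i, j) if x_i[t-lag] predicts x_j[t]."""
    n = len(series)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and abs(lagged_corr(series[i], series[j], lag)) > cutoff}

def forward_backward(series):
    forward = learn_edges(series)
    backward = learn_edges([s[::-1] for s in series])
    # Reversing time flips every arrow, so flip them back before combining.
    backward_flipped = {(j, i) for (i, j) in backward}
    return forward & backward_flipped     # assumed combination rule

rng = random.Random(0)
x0 = [rng.gauss(0, 1) for _ in range(500)]
x1 = [0.0] + [0.9 * v + 0.2 * rng.gauss(0, 1) for v in x0[:-1]]  # x0 drives x1, lag 1

edges = forward_backward([x0, x1])
print(sorted(edges))
```

On the reversed series the learner sees x1 leading x0, so the backward pass reports the opposite arrow, and flipping it back makes the two passes agree.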

Page 29: Causality detection

Applications of Causal Modelling

BRAIN IMAGING

TOPIC MINING

ANOMALY DETECTION

Page 30: Causality detection

Brain Imaging

◦ Goal: determine how different portions of the brain affect one another.

Identify the direction and order of influence.

◦ Apply Granger Causality to obtain the relationships between different components of the brain.

Obtain fMRI data from the brain in response to a stimulus and divide it into independent components corresponding to different sections of the brain.

Each independent component corresponds to a time series.

Apply the Exhaustive Granger test to obtain the relationships between the different time series.

Page 31: Causality detection

Brain Imaging

◦ Advantages:

No prior assumption about the nodes and their inter-connections.

Measures not only the connections but also the time lags between interactions.

Can work with a large number of regions.

Page 32: Causality detection

Mining Topics Based on Causality

◦ Iteratively identifies topics that are causally related to the non-textual data.

◦ InCaToMi (Integrative Causal Topic Miner)

◦ Architecture:

[Figure: architecture diagram with Text Data, Topic Modelling Module, Causality Module, and Feedback]

Page 33: Causality detection

InCaToMi: Integrative Causal Topic Miner

◦ Topic Modelling Module:

Takes the text and the number of topics as input.

Creates topics based on word probabilities, and estimates the likelihood of each topic in each document, using the PLSA algorithm.

The time series of a topic is formed by summing the likelihoods of each word in the topic for each day.

Page 34: Causality detection

InCaToMi: Integrative Causal Topic Miner

◦ Causality Module:

Perform the Granger Causality test on the time series of each topic and of each word in the topic.

Form new candidate topics by selecting the words that are most causally related to the non-textual series.

Use these as the prior for the next round of topic modelling.

Page 35: Causality detection

Anomaly Detection

◦ Types of anomalies:

Univariate anomalies

Dependency anomalies

◦ Given two sets of data sequences, A (training) and B (test), each containing p time series, we have to find the data points in B that deviate significantly from the normal pattern of the data sequence.

◦ Algorithm for finding dependency anomalies:

Page 36: Causality detection

Anomaly Detection

Step 1: Learning temporal causal graphs by regularization

Step 2: Finding the anomaly score using Kullback-Leibler (KL) divergence

Step 3: Determining anomalies by specifying a threshold and finding the underlying causes

Hypothesis: the causal graphs of A and B remain the same.

Page 37: Causality detection

Anomaly Detection

Learning temporal causal graphs by regularization

Calculate the graph for A by the LASSO-Granger method.

When finding the causal graph for B, we need to apply additional constraints. This can be done using two methods:

a) Neighborhood similarity: impose the additional constraint that β(a) and β(b) have the same pattern of zero and non-zero entries. Here β(a) and β(b) are the coefficients obtained by running LASSO-Granger on set A and set B respectively.

b) Coefficient similarity: the constraint is that the coefficients β(a) and β(b) should be similar.

Page 38: Causality detection

Anomaly Detection

Finding the anomaly score using Kullback-Leibler (KL) divergence

KL divergence is a measure of how much one distribution differs from another.

Obtain the distributions for the two time series and calculate the anomaly score using the KL divergence formula.
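For discrete distributions, the KL divergence is D(p || q) = Σ p_i · log(p_i / q_i). A minimal sketch follows. The slides do not say how the distributions are estimated or which of the two is the reference, so the example values and the argument order are assumptions.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Assumes q_i > 0 wherever p_i > 0; terms with p_i == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

reference = [0.5, 0.5]    # hypothetical distribution learned from A
test = [0.9, 0.1]         # hypothetical distribution observed in B

score_same = kl_divergence(reference, reference)
score_diff = kl_divergence(reference, test)
print(f"score for identical distributions: {score_same:.4f}")
print(f"score for differing distributions: {score_diff:.4f}")
```

The score is zero when the two distributions coincide and grows as they diverge. It is not symmetric, so the choice of reference matters.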

Page 39: Causality detection

Anomaly Detection

Determining anomalies by specifying a threshold and finding the underlying causes

• To set a threshold, we calculate how a normal time series would score on the anomaly score.

• We slide a window through the reference data and calculate the anomaly score for each window.

• We then use these scores to approximate the distribution of anomaly scores that a normal time series should have.

• Given a significance level α, we set the α quantile of the distribution as the threshold cutoff.
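The thresholding steps can be sketched as below. It assumes that higher scores mean more anomalous, so the cutoff is taken as the empirical (1 − α) quantile, leaving about a fraction α of normal windows above it. The scoring function, window length and data are placeholders, and the function name anomaly_threshold is hypothetical.

```python
import math

def anomaly_threshold(reference, window, score_fn, alpha):
    """Threshold from the empirical distribution of scores on normal data."""
    scores = sorted(score_fn(reference[start:start + window])
                    for start in range(len(reference) - window + 1))
    # Nearest-rank (1 - alpha) quantile of the scores seen on normal windows.
    rank = math.ceil((1 - alpha) * len(scores))
    return scores[max(rank, 1) - 1]

# Placeholder reference data, with the window maximum as a toy "anomaly score".
reference = [float(v) for v in range(100)]
threshold = anomaly_threshold(reference, window=10, score_fn=max, alpha=0.1)
print(threshold)
```

A window from the test data would then be flagged when its score exceeds this threshold.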

Page 40: Causality detection

Conclusion

◦ The widespread application of causal relationships motivates the study.

◦ The approach is completely data-driven, so it offers a new outlook in every field without prior assumptions about the underlying relationships.

◦ Further scope:

Applying the model to different domains, e.g. climate and social media.

Predicting anomalous behavior.

Page 41: Causality detection

References

[1] Lashof, Daniel A., and Dilip R. Ahuja. "Relative contributions of greenhouse gas emissions to global warming." (1990): 529-531.

[2] Perry, Ronen. "Correlation versus Causality: Further Thoughts on the Law Review/Law School Liaison." Conn. L. Rev. 39 (2006): 77.

[3] Diks, Cees, and Valentyn Panchenko. Modified hiemstra-jones test for Granger non-causality. No. 192. Society for Computational Economics, 2004.

[4] Granger, Clive WJ. "Investigating causal relations by econometric models and cross-spectral methods." Econometrica: Journal of the Econometric Society (1969): 424-438.

[5] Arnold, Andrew, Yan Liu, and Naoki Abe. "Temporal causal modeling with graphical granger methods." Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2007.

[6] Tibshirani, Robert. "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society. Series B (Methodological) (1996): 267-288.

[7] Cheng, Dehua, Mohammad Taha Bahadori, and Yan Liu. "FBLG: a simple and effective approach for temporal dependence discovery from time series data." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.

[8] Smith, Delmas, Iwabuchi, Kirk. “Demonstrating causal links between fMRI time series using time-lagged correlation”.

[9] Kim, Hyun Duk, et al. "Incatomi: Integrative causal topic miner between textual and non-textual time series data." Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012.

[10] Qiu, Liu, Subrahmanya, et al. "Granger Causality for Time-Series Anomaly Detection." Proceedings of the 12th IEEE international conference on data mining, 2012.

[11] Lomax, Richard G. (2007) Statistical Concepts: A Second Course, p. 10

Page 42: Causality detection

Thanks!! Any Questions?