TRANSCRIPT
Theory of Big Data 2 Conference
Big Data Institute, University College London
Causal Inference from Multivariate Time Series: Principles and Problems
Michael Eichler
Department of Quantitative Economics
Maastricht University
http://researchers-sbe.unimaas.nl/michaeleichler
6 January 2016
Outline
• Causality concepts
• Graphical representation
  • Definition
  • Markov properties
  • Extension: systems with latent variables
• Causal learning
  • Basic principles
  • Identification from empirical relationships
• Non-Markovian constraints
  • Trek separation in graphs
  • Tetrad representation theorem
  • Testing for tetrad constraints
• Open problems and conclusions
Concepts of causality for time series
We consider two variables X and Y measured at discrete times t ∈ ℤ:
X = (X_t)_{t∈ℤ},  Y = (Y_t)_{t∈ℤ}.
Question: When is it justified to say that X causes Y?
Various approaches:
• intervention causality (Pearl 1993; Eichler & Didelez 2007, 2010)
• structural causality (White & Lu 2010)
• Granger causality (Granger 1969, 1980, 1988)
• Sims causality (Sims 1972)
Granger causality
Two fundamental principles:
• The cause precedes its effect in time.
• The causal series contains special information about the series being caused that is not available otherwise.
This leads us to consider two information sets:
• F*(t) — all information in the universe up to time t
• F*_{−X}(t) — this information except the values of X

Granger's definition of causality (Granger 1969, 1980)
We say that X causes Y if the probability distributions of
• Y_{t+1} given F*(t) and
• Y_{t+1} given F*_{−X}(t)
are different.
Granger causality
Problem: This definition cannot be used with actual data.
Suppose the data consist of a multivariate time series V = (X, Y, Z), and let
• X^t denote the information given by X up to time t,
• and similarly Y^t and Z^t for Y and Z.

Definition: Granger non-causality
• X is Granger-noncausal for Y with respect to V if
  Y_{t+1} ⊥⊥ X^t | Y^t, Z^t.
• Otherwise we say that X Granger-causes Y with respect to V.

Additionally:
• X and Y are said to be contemporaneously independent with respect to V if
  X_{t+1} ⊥⊥ Y_{t+1} | V^t.
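In the linear (VAR) case this definition is typically checked by comparing nested autoregressions. Below is a minimal sketch assuming a fixed lag order p; the function granger_test and the simulated chain X → Z → Y are illustrative assumptions, not material from the talk.

```python
# Sketch of a conditional Granger non-causality test: regress Y_{t+1} on
# lags of (Y, Z) with and without lags of X and compare residual sums of
# squares via an F-test.
import numpy as np
from scipy import stats

def granger_test(x, y, z, p=2):
    """F-test of 'X does not Granger-cause Y with respect to V = (X, Y, Z)'."""
    T = len(y)
    # lagged regressor matrix: columns are the series at lags 1..p
    lags = lambda s: np.column_stack([s[p - k:T - k] for k in range(1, p + 1)])
    target = y[p:]
    restricted = np.column_stack([np.ones(T - p), lags(y), lags(z)])
    full = np.column_stack([restricted, lags(x)])
    rss = lambda M: np.sum((target - M @ np.linalg.lstsq(M, target, rcond=None)[0]) ** 2)
    rss0, rss1 = rss(restricted), rss(full)
    df1, df2 = p, (T - p) - full.shape[1]
    F = ((rss0 - rss1) / df1) / (rss1 / df2)
    return F, stats.f.sf(F, df1, df2)

rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
z = np.zeros(T); y = np.zeros(T)
for t in range(1, T):           # X drives Z, Z drives Y: X affects Y only indirectly
    z[t] = 0.8 * x[t - 1] + rng.standard_normal()
    y[t] = 0.8 * z[t - 1] + rng.standard_normal()
F, pval = granger_test(x, y, z)
print(f"F = {F:.2f}, p = {pval:.3f}")   # typically large p: no *direct* Granger causality
```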
Sims causality
Definition: Sims non-causality
X does not Sims-cause Y with respect to V = (X, Y, Z) if
{Y_{t′} | t′ > t} ⊥⊥ X_t | X^{t−1}, Y^t, Z^t.

Note:
• Granger causality is a concept of direct causality.
• Sims causality is a concept of total causality (direct and indirect pathways).
The following statistics are measures of Sims causality:
• impulse response function (time and frequency domain)
• directed transfer function (DTF)
Vector autoregressive processes
Let X be a multivariate stationary Gaussian time series with vector autoregressive representation
X_t = ∑_{k=1}^{∞} A_k X_{t−k} + ε_t = ∑_{k=0}^{∞} B_k ε_{t−k}.

Granger non-causality in VAR models:
The following are equivalent:
• X_b does not Granger-cause X_a with respect to X;
• A_{ab,k} = 0 for all k ∈ ℕ.

Sims non-causality in VAR models:
The following are equivalent:
• X_b does not Sims-cause X_a with respect to X;
• B_{ab,k} = 0 for all k ∈ ℕ.
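For a finite-order VAR the two characterizations are linked by the recursion B_0 = I, B_k = ∑_{j=1}^{min(k,p)} A_j B_{k−j}. A sketch with an assumed three-variable chain X_1 → X_2 → X_3 (the coefficient values are illustrative):

```python
# Zero entries in the A_k encode Granger non-causality; zero entries in
# the moving-average coefficients B_k encode Sims non-causality, which
# aggregates direct and indirect pathways.
import numpy as np

# X1 -> X2 -> X3 with no direct X1 -> X3 link; A[0] is the lag-1 matrix,
# entry (a, b) being the effect of X_b on X_a.
A = [np.array([[0.5, 0.0, 0.0],
               [0.4, 0.5, 0.0],
               [0.0, 0.4, 0.5]])]
p, d = len(A), A[0].shape[0]

B = [np.eye(d)]                       # B_0 = I
for k in range(1, 10):
    B.append(sum(A[j - 1] @ B[k - j] for j in range(1, min(k, p) + 1)))

print("A_1[3,1] =", A[0][2, 0])   # 0: X1 is not a direct Granger cause of X3
print("B_2[3,1] =", B[2][2, 0])   # nonzero: X1 Sims-causes X3 via X2
```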
Graphical models for time series
Basic idea: use graphs to encode conditional independences among variables:
• nodes/vertices represent variables;
• a missing edge between two nodes implies conditional independence of the two variables.
Application to time series:
• treat each variable at each time point separately (time series chain graphs), or
• treat each series as one variable (only one node per series in the graph).
Graphical models for time series
Granger causality graphs (Eichler 2007)
Idea: represent the Granger-causal relations in X by a mixed graph G:
• vertices v ∈ V represent the variables (time series) X_v;
• directed edges between the vertices indicate Granger-causal relationships;
• additionally, undirected (dashed) edges indicate contemporaneous associations.
Graphical models for time series
Granger causality graphs (Eichler 2007)
Example: consider the five-dimensional autoregressive process X_V
X_t = f(X_{t−1}) + ε_t
[Figure: mixed graph G on vertices 1–5]
with
• X_{1,t} = f_1(X_{3,t−1}) + ε_{1,t}
• X_{2,t} = f_2(X_{4,t−1}) + ε_{2,t}
• X_{3,t} = f_3(X_{1,t−1}, X_{2,t−1}) + ε_{3,t}
• X_{4,t} = f_4(X_{3,t−1}, X_{5,t−1}) + ε_{4,t}
• X_{5,t} = f_5(X_{3,t−1}) + ε_{5,t}
• (ε_{1,t}, ε_{2,t}, ε_{3,t}) ⊥⊥ (ε_{4,t}, ε_{5,t}) and ε_{4,t} ⊥⊥ ε_{5,t}
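A simulation of this system is sketched below. The linear choice f_v(x) = 0.4 · (sum of parents) and the shared noise component among ε_1–ε_3 are assumptions chosen to satisfy the stated independence structure, not values from the talk.

```python
# Illustrative simulation of the five-dimensional example; the parent
# sets match the equations on the slide.
import numpy as np

rng = np.random.default_rng(1)
T = 1000
X = np.zeros((T, 5))
for t in range(1, T):
    e = rng.standard_normal(5)
    e[:3] += 0.7 * rng.standard_normal()   # eps_1..eps_3 mutually dependent,
                                           # jointly independent of (eps_4, eps_5)
    X[t, 0] = 0.4 * X[t - 1, 2] + e[0]                   # X1 <- X3
    X[t, 1] = 0.4 * X[t - 1, 3] + e[1]                   # X2 <- X4
    X[t, 2] = 0.4 * (X[t - 1, 0] + X[t - 1, 1]) + e[2]   # X3 <- X1, X2
    X[t, 3] = 0.4 * (X[t - 1, 2] + X[t - 1, 4]) + e[3]   # X4 <- X3, X5
    X[t, 4] = 0.4 * X[t - 1, 2] + e[4]                   # X5 <- X3
# Each parent set contributes a directed edge of G; the correlated
# innovations among series 1-3 contribute dashed (contemporaneous) edges.
```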
Markov properties
Objective: derive Granger-causal relationships for X_S, S ⊆ V
Idea: characterize the pathways that induce associations
Tool: concepts of separation in graphs
• DAGs: d-separation (Pearl 1988)
• mixed graphs: d-separation (Spirtes et al. 1998, Koster 1999) or m-separation (Richardson 2003)
Markov properties
Chain 1 → 2 → 3:
p(x) = p(x_3|x_2) p(x_2|x_1) p(x_1)  ⇒  X_3 ⊥⊥ X_1 | X_2
Fork 1 ← 2 → 3:
p(x) = p(x_1|x_2) p(x_3|x_2) p(x_2)  ⇒  X_3 ⊥⊥ X_1 | X_2
Collider 1 → 2 ← 3:
p(x) = p(x_2|x_1, x_3) p(x_3) p(x_1)  ⇏  X_3 ⊥⊥ X_1 | X_2
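In the Gaussian case these conditional independences correspond to zero partial correlations, which the simulation below reproduces for all three structures; the unit coefficients are arbitrary assumptions for the demo.

```python
# Estimate the partial correlation of X1 and X3 given X2 for the chain,
# fork, and collider factorizations.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def partial_corr(x1, x2, x3):
    """Correlation of X1 and X3 after linearly regressing out X2."""
    r1 = x1 - np.polyval(np.polyfit(x2, x1, 1), x2)
    r3 = x3 - np.polyval(np.polyfit(x2, x3, 1), x2)
    return np.corrcoef(r1, r3)[0, 1]

# chain 1 -> 2 -> 3
x1 = rng.standard_normal(n); x2 = x1 + rng.standard_normal(n); x3 = x2 + rng.standard_normal(n)
print("chain   :", round(partial_corr(x1, x2, x3), 3))   # ~ 0
# fork 1 <- 2 -> 3
x2 = rng.standard_normal(n); x1 = x2 + rng.standard_normal(n); x3 = x2 + rng.standard_normal(n)
print("fork    :", round(partial_corr(x1, x2, x3), 3))   # ~ 0
# collider 1 -> 2 <- 3: conditioning on X2 *induces* dependence
x1 = rng.standard_normal(n); x3 = rng.standard_normal(n); x2 = x1 + x3 + rng.standard_normal(n)
print("collider:", round(partial_corr(x1, x2, x3), 3))   # clearly nonzero
```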
Global Granger-causal Markov property
Separation in mixed graphs
Question: What type of paths induce Granger-causal relations between variables?
Note: Granger (non)causality is not symmetric.
Idea: consider only paths ending with a directed edge →.
Example: the path 1 → 2 ← 3 → 4 entails
• X_1 does not Granger-cause X_4 with respect to X_1, X_4
• X_1 does not Granger-cause X_4 with respect to X_1, X_3, X_4
• X_1 does not Granger-cause X_4 with respect to X_1, X_2, X_3, X_4
but not
• X_1 does not Granger-cause X_4 with respect to X_1, X_2, X_4
(conditioning on the collider 2 without 3 activates the path).
Principles of causal inference
Objective: identify the causal structure of the process X
Question: What should be used in practice?
• Granger causality or Sims causality?
• bivariate or fully multivariate analysis?
Answer: For causal inference . . . all and more.
Principles of identification
An example of indirect causality: the graph
1 → 2 → 3
implies for the bivariate submodel
1 → 3
Principles of identification
An example of spurious causality:
[Figure: graph on vertices 1, 2, 3 together with a latent variable L]
implies for the trivariate and bivariate submodels
[Figure: induced graphs on {1, 2, 3} and on {1, 3}]
Principles of identification
Inverse problem:
What can we say about the full system based on the Granger-noncausal relations observed for the (sub)process?
Suppose
• X_a → X_c [X_S] for all {a, c} ⊆ S ⊆ V,
• X_c → X_b [X_S] for all {c, b} ⊆ S ⊆ V.
Rules of causal inference
• Indirect causality rule: X_a truly causes X_b if
  X_a ↛ X_b [X_S] for some S ⊆ V with c ∈ S.
• Spurious causality rule: X_a is a spurious cause of X_b if
  X_a ↛ X_b [X_S] for some S ⊆ V with c ∉ S.
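An illustrative encoding of the two rules' decision logic (not an algorithm from the talk): given the conditioning sets S for which the observed non-causality X_a ↛ X_b [X_S] holds, the position of the mediator c relative to S separates the two cases.

```python
# Toy sketch: classify an a -> b relation from observed non-causality sets.
def indirect_cause(noncausal_sets, c):
    # non-causality appears once c is conditioned on -> a acts on b *through* c
    return any(c in S for S in noncausal_sets)

def spurious_cause(noncausal_sets, c):
    # non-causality appears in a set *excluding* c -> the a -> b link is spurious
    return any(c not in S for S in noncausal_sets)

# Example: X_a -/-> X_b [X_S] is observed only for S = {a, b, c}
observed = [{"a", "b", "c"}]
print(indirect_cause(observed, "c"), spurious_cause(observed, "c"))   # True False
```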
Principles of causal inference
[Figure: causal graph over the variables Y, Z, X, U; panels show estimates of A_{YX}(h) against lag h from bivariate and trivariate Granger analyses, and of B_{YX}(h) from a trivariate Sims analysis]
Principles of causal inference
[Figure: causal graph over the variables Y, Z, X, U, V; panels show estimates of A_{YX}(h) against lag h from bivariate and trivariate Granger analyses, and of B_{YX}(h) from a trivariate Sims analysis]
Identification of causal structure
Algorithm: identification of adjacencies
• insert a dashed edge a --- b whenever X_a and X_b are not contemporaneously independent
• insert a → b whenever
  • X_a → X_b [X_S] for all S ⊆ V with a, b ∈ S;
  • X_a(t−k) is not independent of X_b(t+1) given F_{S_1}(t) ∨ F_{S_2}(t−k) ∨ F_a(t−k−1)
    for all k ∈ ℕ, t ∈ ℤ, and all disjoint S_1, S_2 ⊆ V with b ∈ S_1 and a ∉ S_1 ∪ S_2.
Identification of causal structure
Algorithm: identification of tails
• colliders: a — c — b in G and X_a ↛ X_b [X_S] for some S with c ∉ S
  ⇒ the edge between c and b receives an arrowhead at c (c ← b)
• non-colliders: a — c — b in G and X_a ↛ X_b [X_S] for some S with c ∈ S
  ⇒ the edge between c and b receives a tail at c (c → b)
• ancestors: a → · · · → b in G ⇒ the edge between a and b is oriented as a → b
• discriminating paths: e.g. Ali et al. (2004)
Identification of causal structure
Example: application to neural spike train data
[Figure: spike trains of neurons 1–10 over 8 seconds; estimated partial directed coherence pdc(i → j) against lag (−60 to 60) for the pairs 1→2, 1→3, 1→4, 2→3, 2→4, 3→4]
Identification of causal structure
Example:
[Figure: orientation steps (a)–(k) for the mixed graph on vertices 1, 2, 3, 4]
Result:
[Figure: identified graph on vertices 1, 2, 3, 4]
Problem
Example:
[Figure: latent variable L with directed edges to each of the observed vertices 1, 2, 3, 4]
• X_1, X_2, X_3, X_4 are conditionally independent given L
• no conditional independences among X_1, . . . , X_4
Trek separation
Problem:
• conditional independences are not sufficient to describe processes that involve latent variables
• identification of such structures relies on sparsity that is often not given
Approach: Sullivant et al. (2011) for multivariate Gaussian distributions
• a new concept of separation in graphs
• encodes rank constraints on minors of the covariance matrix
• generalizes other concepts of separation
• special case: conditional independences
Trek separation
A trek between nodes i and j is a path π = (π_L, π_M, π_R) such that
• π_L is a directed path from some node k_L to i;
• π_R is a directed path from some node k_R to j;
• π_M is an undirected edge k_L --- k_R or a path of length zero (k_L = k_R).
Examples: i ← k_L --- k_R → j,   i ← v ← k → j,   i ← v → j,   i --- j

Definition (trek separation)
(C_L, C_R) t-separates sets A and B if every trek (π_L, π_M, π_R) between A and B satisfies
• π_L contains a vertex in C_L, or
• π_R contains a vertex in C_R.
Trek separation
Let X be a stationary Gaussian process with spectral matrix Σ(ω) satisfying
Σ(ω) = (1/2π) ∑_{u=−∞}^{∞} cov(X_t, X_{t−u}) e^{−iuω}.

Theorem
Let X be G-Markov. Then the following are equivalent:
• rank(Σ_{AB}(ω)) ≤ r for all ω ∈ [−π, π];
• A and B are t-separated by some (C_L, C_R) with |C_L| + |C_R| ≤ r.
Trek separation
Corollaries:
Let X be a stationary Gaussian process. Then
X_A ⊥⊥ X_B | X_C  ⇔  rank(Σ_{A∪C, B∪C}) = |C|.
Furthermore, the following are equivalent:
• X_A ⊥⊥ X_B | X_C for all G-Markov processes X;
• (C_A, C_B) t-separates A ∪ C and B ∪ C for some partition C = C_A ∪ C_B.
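The rank characterization has a simple numerical illustration in the static Gaussian case; the construction below, where A and B depend on each other only through C, is an assumption for the demo.

```python
# If X_A and X_B are independent given X_C, the covariance block
# Sigma_{A u C, B u C} has rank |C| (here |C| = 1).
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
c = rng.standard_normal(n)
a = c + 0.5 * rng.standard_normal(n)     # A depends on B only through C
b = -c + 0.5 * rng.standard_normal(n)
data = np.vstack([a, c, b, c])           # rows: (A u C), then (B u C)
Sigma = np.cov(data)
block = Sigma[:2, 2:]                    # Sigma_{A u C, B u C}
print(np.linalg.svd(block, compute_uv=False))  # one large, one ~0 singular value: rank 1
```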
Tetrad representation theorem
Consider the class M(G) of all G-Markov stationary Gaussian processes.

Proposition
The following are equivalent:
• the spectral matrices Σ(·) of processes in M(G) satisfy
  Σ_{ik}(ω) Σ_{jl}(ω) − Σ_{il}(ω) Σ_{jk}(ω) = 0;
• {i, j} and {k, l} are t-separated by (c, ∅) or (∅, c) for some node c in G.
Tetrad representation theorem
If the spectral matrix Σ(ω) satisfies the tetrad constraints
Σ_{ik}(ω) Σ_{jl}(ω) − Σ_{il}(ω) Σ_{jk}(ω) = 0
Σ_{ij}(ω) Σ_{kl}(ω) − Σ_{il}(ω) Σ_{kj}(ω) = 0
Σ_{ik}(ω) Σ_{lj}(ω) − Σ_{ij}(ω) Σ_{lk}(ω) = 0
then there exists a node P such that X_i, X_j, X_k, and X_l are mutually conditionally independent given X_P.
[Figure: node P with directed edges to each of 1, 2, 3, 4]
Note: if no such X_P is among the observed variables, X_P must be a latent factor.
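To see why a single common factor forces the tetrad constraints, a static analogue is easy to check numerically; the loadings and noise variances below are arbitrary assumptions.

```python
# One-factor model: Sigma = lam lam' + diag(psi), so every off-diagonal
# entry factorizes as lam_i * lam_j and all tetrad differences vanish.
import numpy as np

lam = np.array([0.9, 0.7, 0.8, 0.6])       # loadings on a single factor P
Sigma = np.outer(lam, lam) + np.diag([0.5, 0.4, 0.3, 0.6])

i, j, k, l = 0, 1, 2, 3
t1 = Sigma[i, k] * Sigma[j, l] - Sigma[i, l] * Sigma[j, k]
t2 = Sigma[i, j] * Sigma[k, l] - Sigma[i, l] * Sigma[k, j]
t3 = Sigma[i, k] * Sigma[l, j] - Sigma[i, j] * Sigma[l, k]
print(t1, t2, t3)   # all (numerically) zero: the three tetrad constraints hold
```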
Testing tetrad constraints
Approach: nonparametric test (Eichler 2008)
Null hypothesis: ψ(Σ(ω)) ≡ 0, where ψ(Z) = z_{ik} z_{jl} − z_{il} z_{jk}
Test statistic:
S_T = ∫ |ψ(Σ̂(ω))|² dω,
where Σ̂(ω) is a kernel spectral estimator with bandwidth b_T.

Theorem
Under the null hypothesis,
b_T^{1/2} T S_T − b_T^{−1/2} μ  →_D  N(0, σ²),
where
μ = C_h C_{w,2} ∫ tr[ ∇ψ(Σ(ω))′ Σ(ω) ∇ψ(Σ(−ω)) Σ(ω) ] dω,
σ² = 4π C_h² C_{w,4} ∫ | tr[ ∇ψ(Σ(ω))′ Σ_{AA}(ω) ∇ψ(Σ(−ω)) Σ_{BB}(ω) ] |² dω.
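As a rough illustration of how S_T might be computed, the sketch below estimates Σ̂(ω) by a smoothed periodogram and integrates |ψ|² over frequency. The rectangular smoothing window, the bandwidth, and the omission of the centering term b_T^{−1/2} μ are simplifications, not Eichler's (2008) implementation.

```python
import numpy as np

def spectral_estimate(X, half_width=10):
    """Smoothed periodogram: X is (T, d); returns (T, d, d) over FFT frequencies."""
    T, d = X.shape
    F = np.fft.fft(X - X.mean(0), axis=0) / np.sqrt(2 * np.pi * T)
    I = np.einsum('ta,tb->tab', F, F.conj())       # periodogram matrices I(omega)
    kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    S = np.empty_like(I)
    for a in range(d):                             # smooth each entry, wrapping
        for b in range(d):                         # around by periodicity
            padded = np.r_[I[-half_width:, a, b], I[:, a, b], I[:half_width, a, b]]
            S[:, a, b] = np.convolve(padded, kernel, 'valid')
    return S

def tetrad_statistic(S, i, j, k, l):
    psi = S[:, i, k] * S[:, j, l] - S[:, i, l] * S[:, j, k]
    return (2 * np.pi / len(S)) * np.sum(np.abs(psi) ** 2)   # Riemann sum of the integral

rng = np.random.default_rng(4)
f = rng.standard_normal(500)                       # one common factor
X = np.outer(f, [0.9, 0.7, 0.8, 0.6]) + 0.5 * rng.standard_normal((500, 4))
print(tetrad_statistic(spectral_estimate(X), 0, 1, 2, 3))   # small under the one-factor null
```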
Latent variable models
Common identifiability constraint for factor models: factors are uncorrelated/independent.
But: in many applications (e.g. in neuroscience), we think of latent variables that are causally connected:
• EEG recordings measure neural activity in nearby cortical regions;
• fMRI recordings measure hemodynamic responses, which depend on the underlying neural activity.
Objective: recover the latent processes and the interrelations among them.
Latent variable models
Suppose that Y(t) can be partitioned into Y_{I_1}(t), . . . , Y_{I_r}(t) such that
Y_{I_j}(t) = Λ_j X_j(t) + ε_{I_j}(t)
and X(t) is a VAR(p) process.
Then the model can be fitted by the following steps (a code sketch follows below):
• identify clusters of variables depending on one latent variable (based on tetrad rules)
• use PCA to determine the latent variable processes X_j(t)
• fit a VAR model to all latent variable processes jointly
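A minimal sketch of steps 2 and 3, assuming the clusters from step 1 are already identified; the use of statsmodels' VAR fit and all simulated loadings are illustrative assumptions. Note that PCA recovers each latent series only up to sign and scale, so the fitted VAR coefficients are determined up to that indeterminacy.

```python
import numpy as np
from statsmodels.tsa.api import VAR

def fit_latent_var(Y, clusters, p=2):
    """Y: (T, d) observations; clusters: list of column-index lists."""
    factors = []
    for idx in clusters:
        block = Y[:, idx] - Y[:, idx].mean(0)
        # leading principal component of the cluster as a proxy for X_j(t)
        _, _, Vt = np.linalg.svd(block, full_matrices=False)
        factors.append(block @ Vt[0])
    X = np.column_stack(factors)
    return VAR(X).fit(p)                 # joint VAR over the latent proxies

# usage with simulated two-cluster data (loadings are assumptions):
rng = np.random.default_rng(5)
T = 600
X = np.zeros((T, 2))
for t in range(1, T):                    # latent VAR(1): X1 drives X2
    X[t] = [0.6 * X[t-1, 0], 0.4 * X[t-1, 0] + 0.5 * X[t-1, 1]]
    X[t] += rng.standard_normal(2)
Y = np.column_stack([np.outer(X[:, 0], [1.0, 0.8, 0.9]),
                     np.outer(X[:, 1], [1.0, 0.7, 0.8])]) + 0.3 * rng.standard_normal((T, 6))
res = fit_latent_var(Y, clusters=[[0, 1, 2], [3, 4, 5]], p=1)
print(res.coefs[0].round(2))             # lag-1 matrix: X1 -> X2 entry nonzero (up to sign)
```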
Latent variable models
Example:
[Figure: five simulated time series X(1)–X(5), each of length 1000]
Latent variable models
Example:
Set {1,2} with:
• {3,4}: S = −0.98
• {3,5}: S = −0.31
• {4,5}: S = −1.4
[Figure: absolute residuals |Res| against observation index for each pairing]
Latent variable models
Example:
Set {1,3} with:
• {2,4}: S = −1.37
• {2,5}: S = 0.76
• {4,5}: S = −0.44
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
Set {1,4} with:
• {2,3}: S = −1.19
• {2,5}: S = 6.54
• {3,5}: S = 6.55
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
Set {1,5} with:
• {2,3}: S = −1.22
• {2,4}: S = 5.43
• {3,4}: S = 5.77
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
Set {2,3} with:
• {1,4}: S = −1.18
• {1,5}: S = −1.21
• {4,5}: S = −1.58
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
Set {2,4} with:
• {1,3}: S = −1.36
• {1,5}: S = 5.43
• {3,5}: S = 5.66
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
Set {2,5} with:
• {1,3}: S = 0.76
• {1,4}: S = 6.55
• {3,4}: S = 5.73
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
Set {3,4} with:
• {1,2}: S = −0.98
• {1,5}: S = 5.77
• {2,5}: S = 5.73
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
Set {3,5} with:
• {1,2}: S = −0.31
• {1,4}: S = 6.54
• {2,4}: S = 5.66
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
Set {4,5} with:
• {1,2}: S = −1.41
• {1,3}: S = −0.44
• {2,3}: S = −1.58
[Figure: |Res| against index for each pairing]
Latent variable models
Example:
[Figure: identified structure with latent variables P and Q over the observed series 1–5]
Latent variable models
Example:
[Figure: structure with latent variables L1, L2, L3 over the observed series 1–6]
Conclusion
Causal inference is a complex task:
• requires modelling at all levels (bivariate to fully multivariate)
• requires Granger causality as well as other measures (e.g. Sims causality)
• definite results may be sparse without further assumptions
• latent variables induce further (non-Markovian) constraints on the distribution

Open problems:
• merging of information about latent variables; development of algorithms for latent variables
• uncertainty in the identification of Granger-causal relationships
• instantaneous causality
• aggregation over time (distortion of identification; identification only possible up to Markov equivalence)
• non-stationarity and non-linearity
References
• Eichler (2007). Granger causality and path diagrams for multivariate time series. Journal of Econometrics 137, 334–353.
• Eichler (2008). Testing nonparametric and semiparametric hypotheses in vector stationary processes. Journal of Multivariate Analysis 99, 968–1009.
• Eichler (2009). Causal inference from time series: What can be learned from Granger causality? In: C. Glymour, W. Wang, D. Westerståhl (eds), Proceedings of the 13th International Congress of Logic, Methodology and Philosophy of Science. College Publications, London.
• Eichler (2010). Graphical modelling of multivariate time series with latent variables. Journal of Machine Learning Research W&CP 9.
• Eichler (2012). Graphical modelling of multivariate time series. Probability Theory and Related Fields 153, 233–268.
• Eichler (2012). Causal inference in time series analysis. In: C. Berzuini, A.P. Dawid, L. Bernardinelli (eds), Causality: Statistical Perspectives and Applications. Wiley, Chichester.
• Eichler (2013). Causal inference with multiple time series: principles and problems. Philosophical Transactions of the Royal Society A 371, 20110613.