Exploratory Causal Analysis in Bivariate Time Series Data
Abstract
Many scientific disciplines rely on observational data of systems for which it is difficult
(or impossible) to implement controlled experiments, and data analysis techniques are
required for identifying causal information and relationships directly from observational
data. This need has led to the development of many different time series causality
approaches and tools, including transfer entropy, convergent cross-mapping (CCM), and
Granger causality statistics.
A practicing analyst can explore the literature to find many proposals for identifying
drivers and causal connections in time series data sets, but little research exists on how
these tools compare to each other in practice. This work introduces and defines
exploratory causal analysis (ECA) to address this issue. The motivation is to provide a
framework for exploring potential causal structures in time series data sets.
J. M. McCracken
Defense talk for PhD in Physics, Department of Physics and Astronomy
10:00 AM November 20, 2015; Exploratory Hall, 3301
Advisor: Dr. Robert Weigel; Committee: Dr. Paul So, Dr. Tim Sauer
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 1 / 50
Exploratory Causal Analysis in Bivariate Time Series Data
J. M. McCracken
Department of Physics and Astronomy
George Mason University, Fairfax, VA
November 20, 2015
Outline
1. Motivation
2. Causality studies
3. Data causality
4. Exploratory causal analysis
5. Making an ECA summary
   - Transfer entropy difference
   - Granger causality statistic
   - Pairwise asymmetric inference
   - Weighted mean observed leaning
   - Lagged cross-correlation difference
6. Computational tools for the ECA summary
7. Empirical examples
   - Cooling/Heating System Data
   - Snowfall Data
8. Time series causality as data analysis
Motivation
Data
Consider two sets of time series measurements, X and Y.
Question
Is there evidence that X “drives” Y?
We were looking for a data analysis approach, i.e., we were looking for analysis tools that
- worked with time series data,
- had straightforward, preferably well-established, interpretations,
- were reliable,
- and did not require studying the (vast) philosophical causality literature.
Essentially, we were looking for a “plug-and-play” analysis tool.
This work stems from our search for such a tool.
Causality studies
The study of causality is as old as science itself.
- Modern historians credit Aristotle with both the first theory of causality (“four causes”) and an early version of the scientific method.
- The modern study of causality is broadly interdisciplinary; far too broad to review in a short talk.
Illari and Russo’s textbook¹ provides an overview of causality studies.
¹ Illari, P., & Russo, F. (2014). Causality: Philosophical theory meets scientific practice. Oxford University Press.
Towards a taxonomy of causal studies
Paul Holland identified four types of causal questions²:
- the ultimate meaningfulness of the notion of causality
- the details of causal mechanisms
- the causes of a given effect
- the effects of a given cause
Foundational causality: “Is a cause required to precede an effect?” or “How are causes and effects related in space-time?”
Data causality: “Does smoking cause lung cancer?” or “Are traffic accidents caused by rain storms?”
² Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.
Data causality
Data causality is data analysis used to draw causal inferences.
Approaches to data causality studies include
- design of experiments (e.g., Fisher randomization)
- potential outcomes (Rubin’s counterfactuals)
- directed acyclic graphs (DAGs) with structural equation models (SEMs); popularized by Pearl as “structural causal models” (SCMs)
- time series causality
There is no consensus on the best approach to data causality; many authors consider their favored approach to be the exclusively correct one.
Time series causality
Time series causality is data causality with time series data.
Approaches to time series causality can be roughly divided into five categories:
- Granger (model-based approaches)
- Information-theoretic
- State space reconstruction (SSR)
- Correlation
- Penchant
Exploratory causal analysis: Language
Exploring causal structures in data sets is distinct from confirming causal structures in data sets.
Causal language used in ECA should not be conflated with other typical uses; i.e., “cause”, “effect”, “drive”, etc. are used as technical terms with definitions unrelated to their common, everyday definitions.
→ and ← will be used as shorthand for causal statements, e.g., “A drives B” will be written as A → B.
Exploratory causal analysis: Assumptions
A cause always precedes an effect.
This assumption is required for the operational definitions of causality.
A driver may be present in the data being analyzed.
This assumption may lead to issues of confounding.
Exploratory causal analysis: ECA summary vector approach
We will not favor a specific operational definition of causality ⇒ we do not favor any particular tool.
Consider a time series pair (X, Y).
ECA summary vector
Define a vector g where each element g_i is either 0 if X → Y, 1 if X ← Y, or 2 if no causal inference can be made. The value of each g_i comes from a specific time series causality tool.
ECA summary
The ECA summary is either X → Y, Y → X, or undefined, with g_i = 0 ∀ g_i ∈ g ⇒ X → Y and g_i = 1 ∀ g_i ∈ g ⇒ Y → X.
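The summary rule above can be sketched in a few lines of Python; this is a hypothetical illustration (the function name `eca_summary` is assumed, not code from the talk):

```python
# A minimal sketch of the ECA summary rule, assuming each tool has already
# reported its inference g_i (0: X -> Y, 1: X <- Y, 2: no inference).

def eca_summary(g):
    """Return the ECA summary for a vector g of per-tool causal inferences."""
    if all(gi == 0 for gi in g):
        return "X -> Y"   # every tool infers X drives Y
    if all(gi == 1 for gi in g):
        return "Y -> X"   # every tool infers Y drives X
    return "undefined"    # any disagreement or non-inference

print(eca_summary([0, 0, 0, 0, 0]))  # X -> Y
print(eca_summary([0, 1, 2, 0, 1]))  # undefined
```

Note that under this rule a single dissenting or non-committal tool makes the summary undefined, which matches the conservative, exploratory intent.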
Making an ECA summary
Our focus is on time series, so each causal inference g_i ∈ g will be drawn from a tool in each of the five time series causality categories:
- g_1: transfer entropy difference (information-theoretic)
- g_2: Granger log-likelihood statistics (Granger)
- g_3: pairwise asymmetric inference (PAI) (SSR)
- g_4: average weighted mean observed leaning (penchant)
- g_5: lagged cross-correlation difference (correlation)
Transfer entropy (g1): Shannon entropy
The uncertainty that a random variable X takes some specific value X_n is given by the Shannon (or information) entropy,

H_X = − Σ_{n=1}^{N_X} P(X = X_n) log₂ P(X = X_n)

- P(X = X_n) is the probability that X takes the specific value X_n
- The sum is over all possible values of X_n; n = 1, 2, . . . , N_X
- The base of the logarithm sets the entropy units, which is “bits” here
Transfer entropy (g1): Shannon entropy example
Binary example (to help with intuition)
Consider a coin C that takes the value H with probability p_H and T with probability p_T. The Shannon entropy is

H_C = − (p_H log₂ p_H + p_T log₂ p_T)

- Fair coin (completely uncertain of outcome) ⇒ p_H = p_T = 0.5 ⇒ H_C = 1
- Always heads (or tails) (completely certain of outcome) ⇒ p_{H(T)} = 0, p_{T(H)} = 1 ⇒ H_C = 0
(Entropy calculations almost always assume 0 log₂ 0 := 0.)
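The coin example is easy to check numerically; a minimal sketch (not code from the talk) using only the standard library:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits; terms with p = 0 are dropped (0 log2 0 := 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.0 (fair coin: completely uncertain)
print(shannon_entropy([1.0, 0.0]))  # 0.0 (always heads: completely certain)
```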
Transfer entropy (g1): Mutual information
A pair of random variables (X, Y) have some mutual information given by

I_{X;Y} = H_X + H_Y − H_{X,Y}
        = Σ_{n=1}^{N_X} Σ_{m=1}^{N_Y} P(X = X_n, Y = Y_m) log₂ [ P(X = X_n, Y = Y_m) / ( P(X = X_n) P(Y = Y_m) ) ]

- P(X = X_n, Y = Y_m) is the probability that X takes the specific value X_n and Y takes the specific value Y_m
- If X and Y are independent, then P(X = X_n, Y = Y_m) = P(X = X_n) P(Y = Y_m) ⇒ I_{X;Y} = 0
- The mutual information is symmetric; i.e., I_{X;Y} = I_{Y;X}
Schreiber proposed an extension of the mutual information to measure “information flow” by making it conditional and including assumptions about the temporal behavior of X and Y.
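The two limiting cases above (independence ⇒ I = 0; perfect dependence of binary variables ⇒ I = 1 bit) can be verified from a joint probability table. This is an illustrative sketch, not code from the talk:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint table joint[n][m] = P(X = Xn, Y = Ym)."""
    px = [sum(row) for row in joint]            # marginal P(X = Xn)
    py = [sum(col) for col in zip(*joint)]      # marginal P(Y = Ym)
    mi = 0.0
    for n, row in enumerate(joint):
        for m, pxy in enumerate(row):
            if pxy > 0:                         # 0 log2(...) := 0
                mi += pxy * math.log2(pxy / (px[n] * py[m]))
    return mi

# Independent binary variables: the joint factorizes, so I(X;Y) = 0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Perfectly dependent binary variables: I(X;Y) = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```

Symmetry, I_{X;Y} = I_{Y;X}, follows from the table being transposable without changing the sum.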
Transfer entropy (g1): Information flow
Suppose X and Y are both Markov processes. The directed flow of information from Y to X is given by the transfer entropy,

T_{Y→X} = Σ_{n=1}^{N_X} Σ_{m=1}^{N_Y} p_{n+1,n,m} log₂ ( p_{n+1|n,m} / p_{n+1|n} )

with
- p_{n+1,n,m} = P(X(t+1) = X_{n+1}, X(t) = X_n, Y(τ) = Y_m)
- p_{n+1|n,m} = P(X(t+1) = X_{n+1} | X(t) = X_n, Y(τ) = Y_m)
- p_{n+1|n} = P(X(t+1) = X_{n+1} | X(t) = X_n)
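For discrete series, the probabilities in the definition can be estimated by simple counting. The sketch below is a naive plug-in estimator with history length 1 under that assumption (it is not the estimator used in this work), applied to a toy pair in which X is a one-step-delayed copy of Y:

```python
import math
from collections import Counter

def transfer_entropy(source, target):
    """Naive plug-in estimate of T_{source -> target}, history length 1."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs = Counter(zip(target[1:], target[:-1]))   # (x_{t+1}, x_t) counts
    cond = Counter(zip(target[:-1], source[:-1]))   # (x_t, y_t) counts
    hist = Counter(target[:-1])                     # x_t counts
    n = len(target) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p = c / n                                   # p_{n+1,n,m}
        p_full = c / cond[(x0, y0)]                 # p_{n+1|n,m}
        p_marg = pairs[(x1, x0)] / hist[x0]         # p_{n+1|n}
        te += p * math.log2(p_full / p_marg)
    return te

# X copies Y with a one-step delay: strong flow Y -> X, weaker X -> Y
y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
x = [0] + y[:-1]
print(transfer_entropy(y, x) > transfer_entropy(x, y))  # True
```

Because x_{t+1} is fully determined by y_t here, conditioning on the source removes all uncertainty in the target's next step, so T_{Y→X} is large while T_{X→Y} is not.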
Transfer entropy (g1): Information flow
There is no directed information flow from Y to X if X is conditionally independent of Y; i.e.,

p_{n+1|n,m} = p_{n+1|n} ⇒ T_{Y→X} = 0

Operational causality (information-theoretic)
X causes Y if the directed information flow from X to Y is higher than the directed information flow from Y to X; i.e.,
T_{X→Y} − T_{Y→X} > 0 ⇒ X → Y
T_{X→Y} − T_{Y→X} < 0 ⇒ Y → X
T_{X→Y} − T_{Y→X} = 0 ⇒ no causal inference
Granger causality (g2): Granger’s axioms
Consider a discrete universe with two time series X = {X_t | t = 1, . . . , n} and Y = {Y_t | t = 1, . . . , n}, where t = n is considered the present time. All knowledge available in the universe at all times t ≤ n is denoted as Ω_n.
Axiom 1
The past and present may cause the future, but the future cannot cause the past.
Axiom 2
Ω_n contains no redundant information, so that if some variable Z is functionally related to one or more other variables, in a deterministic fashion, then Z should be excluded from Ω_n.
Granger’s definition of causality
Given some set A, Y causes X if

P(X_{n+1} ∈ A | Ω_n) ≠ P(X_{n+1} ∈ A | Ω_n − Y)

Granger’s original goal was to make this notion of causality “operational”.
Granger causality (g2): VAR models
Consider a time series pair (X, Y). Suppose there is a vector autoregressive (VAR) model that describes the pair,

(X_t, Y_t)ᵀ = Σ_{i=1}^{n} A⁽ⁱ⁾ (X_{t−i}, Y_{t−i})ᵀ + (ε_{1,t}, ε_{2,t})ᵀ,  with  A⁽ⁱ⁾ = [ A^i_{11}  A^i_{12} ; A^i_{21}  A^i_{22} ]

The current time step t of X and Y is modeled as a sum of n past steps of X and Y, plus uncorrelated noise terms.
Granger causality (g2): Comparison of VAR models
Consider two different VAR models for the pair (X, Y): the full model,

(X_t, Y_t)ᵀ = Σ_{i=1}^{n} [ A_{xx,i}  A_{xy,i} ; A_{yx,i}  A_{yy,i} ] (X_{t−i}, Y_{t−i})ᵀ + (ε_{x,t}, ε_{y,t})ᵀ

and the restricted model, in which each series depends only on its own past,

(X_t, Y_t)ᵀ = Σ_{i=1}^{n} [ A′_{xx,i}  0 ; 0  A′_{yy,i} ] (X_{t−i}, Y_{t−i})ᵀ + (ε′_{x,t}, ε′_{y,t})ᵀ

The G-causality log-likelihood statistic is defined as

F_{Y→X} = ln ( |Σ′_{xx}| / |Σ_{xx}| )

- |Σ′_{xx}|: covariance of the X model residuals given no dependence on Y (restricted model)
- |Σ_{xx}|: covariance of the X model residuals given a possible dependence on Y (full model)
Granger causality (g2): G-causality log-likelihood statistic

If both VAR models fit (or "forecast") the data equally well, then there is no G-causality; i.e.,

|Σ′_xx| = |Σ_xx| ⇒ F_{Y→X} = 0

Operational causality (Granger)

X causes Y if the X-dependent forecast of Y decreases the Y model residual covariance (as compared to the X-independent forecast) more than the Y-dependent forecast of X decreases the X model residual covariance (as compared to the Y-independent forecast); i.e.,

F_{X→Y} − F_{Y→X} > 0 ⇒ X → Y
F_{X→Y} − F_{Y→X} < 0 ⇒ Y → X
F_{X→Y} − F_{Y→X} = 0 ⇒ no causal inference
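The statistic can be sketched concretely for the lag-1 case. Below is a hedged Python illustration (not the MVGC toolbox used in this work; the function name and synthetic data are assumptions): the restricted model regresses x_t on x_{t−1} alone, the unrestricted model adds y_{t−1}, and F_{Y→X} is the log of the residual-variance ratio.

```python
import math
import random

def granger_F(x, y):
    """Lag-1 G-causality log-likelihood statistic F_{Y->X}: compare the
    residual variance of x_t regressed on x_{t-1} alone (restricted model)
    against x_t regressed on both x_{t-1} and y_{t-1} (unrestricted)."""
    z, u, v = x[1:], x[:-1], y[:-1]
    n = len(z)
    # restricted model x_t = a' x_{t-1}: single normal equation
    a_r = sum(ui * zi for ui, zi in zip(u, z)) / sum(ui * ui for ui in u)
    var_r = sum((zi - a_r * ui) ** 2 for ui, zi in zip(u, z)) / n
    # unrestricted model x_t = a x_{t-1} + b y_{t-1}: 2x2 normal equations
    suu = sum(ui * ui for ui in u)
    svv = sum(vi * vi for vi in v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    suz = sum(ui * zi for ui, zi in zip(u, z))
    svz = sum(vi * zi for vi, zi in zip(v, z))
    det = suu * svv - suv * suv
    a = (suz * svv - svz * suv) / det
    b = (svz * suu - suz * suv) / det
    var_u = sum((zi - a * ui - b * vi) ** 2
                for ui, vi, zi in zip(u, v, z)) / n
    return math.log(var_r / var_u)

# synthetic pair in which Y drives X
rng = random.Random(1)
x, y = [0.0], [0.0]
for _ in range(499):
    x.append(0.5 * x[-1] + 0.4 * y[-1] + rng.gauss(0.0, 0.1))
    y.append(0.7 * y[-1] + rng.gauss(0.0, 0.1))

F_yx = granger_F(x, y)   # Y -> X direction
F_xy = granger_F(y, x)   # X -> Y direction
# F_xy - F_yx < 0, matching the operational inference Y -> X
```

Because the restricted model is nested in the unrestricted one, the in-sample statistic is always non-negative; significance testing (as in MVGC) is what separates genuine G-causality from overfitting.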
Pairwise asymmetric inference (g3): State space reconstruction

Consider an embedding of the time series X = {x_t | t = 0, 1, …, L}, constructed from delayed time steps as

X = {x_t | t = 1 + (E − 1)τ, …, L}

with

x_t = ( x_t, x_{t−τ}, x_{t−2τ}, …, x_{t−(E−1)τ} )

- τ is the delay time step
- E is the embedding dimension
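The delay-vector construction above can be sketched directly (a minimal Python illustration; the function name and 0-based indexing convention are assumptions, not from the source):

```python
def delay_embed(x, E, tau):
    """Delay vectors x_t = (x_t, x_{t-tau}, ..., x_{t-(E-1)tau}) for every
    index t with a full history available (0-based indexing)."""
    start = (E - 1) * tau
    return [tuple(x[t - k * tau] for k in range(E)) for t in range(start, len(x))]

pts = delay_embed(list(range(10)), E=3, tau=2)
# the first delay vector collects x_4, x_2, x_0
```

Only indices t ≥ (E − 1)τ have a full delayed history, so the embedded series is shorter than the original by (E − 1)τ points.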
Pairwise asymmetric inference (g3): Cross-mapping

Consider a time series pair (X, Y). The shadow manifold of X (labeled X) is constructed from the points

x_t = ( x_t, x_{t−τ}, x_{t−2τ}, …, x_{t−(E−1)τ}, y_t )

1. Find the n nearest neighbors to x_t (in X), where "nearest" means smallest Euclidean distance, d; i.e., d_1 < d_2 < … < d_n
2. Create weights, w, from the nearest neighbors as

   w_i = e^{−d_i/d_1} / Σ_{j=1}^{n} e^{−d_j/d_1}

3. Construct the cross-mapped estimate of Y using the weights as

   Y|X = { Y_t|X = Σ_{i=1}^{n} w_i Y_{t_i} | t = 1 + (E − 1)τ, …, L }

Each cross-mapped point in the estimate of Y, i.e.,

Y_t|X = Σ_{i=1}^{n} ( e^{−d_i/d_1} / Σ_{j=1}^{n} e^{−d_j/d_1} ) Y_{t_i}

depends on comparisons of the pasts of X and the presents of X and Y.
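The three steps above can be sketched as follows. This is a hedged Python illustration (the dissertation's PAI code is C++; the function name, the self-exclusion of each point from its own neighbor list, and the guard against a zero nearest-neighbor distance are implementation choices of this sketch):

```python
import math

def cross_map(x, y, E=2, tau=1, n_nbrs=3):
    """PAI-style cross-mapped estimate of Y from the shadow manifold of X:
    delay vectors of x augmented with the simultaneous y_t, with the n
    nearest neighbors weighted by exp(-d_i/d_1)."""
    start = (E - 1) * tau
    pts = [tuple(x[t - k * tau] for k in range(E)) + (y[t],)
           for t in range(start, len(x))]
    est = []
    for i, p in enumerate(pts):
        nbrs = sorted((math.dist(p, q), j)
                      for j, q in enumerate(pts) if j != i)[:n_nbrs]
        d1 = nbrs[0][0] or 1e-12  # guard against a zero nearest distance
        w = [math.exp(-d / d1) for d, _ in nbrs]
        est.append(sum(wi * y[start + j] for wi, (_, j) in zip(w, nbrs)) / sum(w))
    return est

t = [0.1 * k for k in range(200)]
x = [math.sin(v) for v in t]
y = [math.cos(v) for v in t]
y_est = cross_map(x, y)
```

Each estimate is a convex combination of observed Y values, so it stays within the range of Y.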
Pairwise asymmetric inference (g3): Cross-mapped correlation

A good cross-mapped estimate is defined as one that is strongly correlated with the original time series. The cross-mapped correlation is

C_YX = [ ρ(Y, Y|X) ]²

where ρ(·) is Pearson's correlation coefficient.

Cross-mapping interpretation

If similar histories of X (i.e., nearest neighbors in the shadow manifold) capably estimate Y (i.e., lead to C_YX ≈ 1, or at least C_YX ≠ 0), then the presence (or action) of Y in the system has been recorded in X.
Pairwise asymmetric inference (g3): Cross-mapping interpretation of causality

A time series pair (X, Y) will have two cross-mapped correlations, C_YX and C_XY.

Operational causality (SSR)

X causes Y if similar histories of Y estimate X better than similar histories of X estimate Y, where the "similar histories" of one time series are used to estimate another time series through shadow manifold nearest neighbor weighting (cross-mapping); i.e.,

C_YX − C_XY < 0 ⇒ X → Y
C_YX − C_XY > 0 ⇒ Y → X
C_YX − C_XY = 0 ⇒ no causal inference
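The cross-mapped correlation itself is just a squared Pearson coefficient between a series and its cross-mapped estimate; a minimal Python sketch (helper names are assumptions):

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

def cross_map_corr(y_true, y_est):
    """C_YX = [rho(Y, Y|X)]^2: squared correlation between Y and its
    cross-mapped estimate."""
    return pearson(y_true, y_est) ** 2

# a perfectly (linearly) faithful estimate gives C_YX = 1; the PAI
# inference then compares the sign of C_YX - C_XY
c = cross_map_corr([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Because the coefficient is squared, C_YX rewards any strong linear relationship between Y and its estimate, regardless of sign.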
Weighted mean observed leaning (g4): Causal penchant

The causal penchant ρ_EC ∈ [−1, 1] is

ρ_EC = P(E|C) − P(E|C̄)

- P(E|C) is the probability of some effect E given some cause C
- P(E|C̄) is the probability of some effect E given no cause C

So, the penchant is the probability of an effect E given a cause C minus the probability of that effect without the cause. In the psychology/medical literature, the causal penchant is known as the Eells measure of causal strength or probability contrast. If C drives E, then it is expected that ρ_EC > 0.

The second term, P(E|C̄), can be eliminated from the penchant formula using Bayes' theorem:

ρ_EC = P(E|C) ( 1 + P(C) / (1 − P(C)) ) − P(E) / (1 − P(C))

If E and C are independent, then P(E|C) = P(E), which implies

ρ_EC = P(E) + ( P(E)P(C) − P(E) ) / (1 − P(C)) = P(E) − P(E) = 0

Example (to help with intuition)

Consider C and E to be two fair coins, c₁ and c₂, being "heads"; i.e., P(c₁ = "heads") = 0.5 and P(c₂ = "heads") = 0.5. If the coins are independent, then

P(c₂ = "heads" | c₁ = "heads") = P(c₂ = "heads") = 0.5 ⇒ ρ_EC = 0

If they are completely dependent, then

P(c₂ = "heads" | c₁ = "heads") = 1 or 0 ⇒ ρ_EC = 1 or −1

This formula has the additional benefit of only needing to estimate one conditional probability from the data.
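The Bayes-eliminated form and the coin example can be checked in a few lines of Python (the function name is an assumption; the probabilities are those of the fair-coin example above):

```python
def penchant(p_e_given_c, p_c, p_e):
    """Causal penchant with P(E|not-C) eliminated via Bayes' theorem:
    rho_EC = P(E|C) * (1 + P(C)/(1 - P(C))) - P(E)/(1 - P(C))."""
    return p_e_given_c * (1 + p_c / (1 - p_c)) - p_e / (1 - p_c)

# independent fair coins: P(E|C) = P(E) = 0.5 gives a zero penchant
rho_indep = penchant(0.5, 0.5, 0.5)
# fully dependent coins: P(E|C) = 1 gives a penchant of 1
rho_dep = penchant(1.0, 0.5, 0.5)
```

Only P(E|C), P(C), and P(E) are needed, which is the practical advantage noted above.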
Weighted mean observed leaning (g4): Causal leaning

A difference of penchants can be used to compare different cause-effect assignments (i.e., different assumptions of what should be considered a cause and what should be considered an effect). The leaning is

λ_EC = ρ_EC − ρ_CE

Leaning interpretation

If λ_EC > 0, then C drives E more than E drives C.
Weighted mean observed leaning (g4): Usefulness of the leaning

The usefulness of the leaning depends on two things:

1. Operational definitions of C and E (called the cause-effect assignment)
2. Estimations of P(C), P(E), P(C|E), and P(E|C) from the data

The primary cause-effect assignment will be the l-standard assignment.

l-standard assignment

Consider a time series pair (X, Y). The l-standard assignment initially assumes the cause is the l-lagged time step of X and the effect is the current time step of Y; i.e., {C, E} = {x_{t−l}, y_t}.

Probabilities will be estimated using data frequency counts.
Weighted mean observed leaning (g4): Leaning from the data

The cause-effect assignment must be specific if the probabilities are to be estimated with frequency counts, and it needs to include tolerance domains to account for noise in the measurements.

Consider the time series pair (X, Y). The penchant calculation depends on the conditional P(y_t = a | x_{t−l} = b), where a ∈ Y and b ∈ X. This conditional will be estimated as

P( y_t ∈ [a − δ^L_y, a + δ^R_y] | x_{t−l} ∈ [b − δ^L_x, b + δ^R_x] ) = n_{a∩b} / n_b

- n_{a∩b} is the number of times y_t ∈ [a − δ^L_y, a + δ^R_y] and x_{t−l} ∈ [b − δ^L_x, b + δ^R_x] in (X, Y)
- n_b is the number of times x_{t−l} ∈ [b − δ^L_x, b + δ^R_x] in X

The tolerance domains are usually considered symmetric; i.e., δ^L_x = δ^R_x and δ^L_y = δ^R_y.

The causal inferences implied by the leaning calculations depend on both the cause-effect assignment and the tolerance domains.
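The frequency-count estimate of the conditional can be sketched directly (a minimal Python illustration with symmetric tolerance domains; the function name and toy data are assumptions):

```python
def conditional_prob(x, y, lag, b, a, dx, dy):
    """Estimate P(y_t in [a-dy, a+dy] | x_{t-lag} in [b-dx, b+dx]) by
    frequency counts n_{a∩b}/n_b, with symmetric tolerance half-widths
    dx and dy."""
    n_b = n_ab = 0
    for t in range(lag, len(y)):
        if abs(x[t - lag] - b) <= dx:
            n_b += 1
            if abs(y[t] - a) <= dy:
                n_ab += 1
    return n_ab / n_b if n_b else None

# toy series: every time x_{t-1} is near 1, y_t is near 10
p = conditional_prob([1.0, 2.0, 1.0, 2.0], [9.0, 10.0, 20.0, 10.0],
                     lag=1, b=1.0, a=10.0, dx=0.1, dy=0.5)
```

Returning `None` when n_b = 0 mirrors the distinction drawn later between penchants and *observed* penchants: only conditions that actually occur in the data contribute.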
Weighted mean observed leaning (g4): Weighted mean

Any time series pair (X, Y) will have many leanings; e.g., an l-standard assignment of {C, E} = {x_{t−l} = b ± δ_x, y_t = a ± δ_y} will have a different leaning calculation for each x_{t−l} ∈ [b − δ_x, b + δ_x] and y_t ∈ [a − δ_y, a + δ_y].

Consider a time series pair (X, Y) and some cause-effect assignment {C, E} for which reasonable tolerance domains have been defined.

- Any penchant calculation for which the (estimated) conditional P(E|C) ≠ 0 (or P(C|E) ≠ 0) is called an observed penchant.
- The weighted mean observed penchant, ⟨ρ_EC⟩_w, is the weighted algebraic mean of the observed penchants.
- The weighted mean observed leaning, ⟨λ_EC⟩_w, is the difference of the weighted mean observed penchants; i.e., ⟨λ_EC⟩_w = ⟨ρ_EC⟩_w − ⟨ρ_CE⟩_w.
Weighted mean observed leaning (g4): Causal inference

Operational causality (penchant)

X causes Y if the weighted mean observed leaning is positive given a cause-effect assignment (and reasonable tolerance domains) in which the assumed cause X precedes the assumed effect Y; i.e.,

⟨λ_EC⟩_w > 0 ⇒ X → Y
⟨λ_EC⟩_w < 0 ⇒ Y → X
⟨λ_EC⟩_w = 0 ⇒ no causal inference

given C ∈ X, E ∈ Y, and C precedes E.
Lagged cross-correlation difference (g5): Cross-correlation

The cross-correlation between two time series X and Y is

ρ_xy = E[ (x_t − μ_X)(y_t − μ_Y) ] / √(σ²_X σ²_Y)

- Every point in X is compared to the mean of X
- Every point in Y is compared to the mean of Y
- The product of the individual variances of X and Y is used as a normalization

Example (to help with intuition)

X = Y ⇒ ρ_xy = E[ (x_t − μ_X)(y_t − μ_Y) ] / √(σ²_X σ²_Y) = E[ (x_t − μ_X)² ] / σ²_X = σ²_X / σ²_X = 1
Lagged cross-correlation difference (g5): Lagged cross-correlation

Consider a time series pair (X, Y). The past of Y may be compared to the present of X by introducing a lag l into the cross-correlation calculation,

ρ^l_xy = E[ (x_t − μ_X)(y_{t−l} − μ_Y) ] / √(σ²_X σ²_Y)

Operational causality (correlation)

X causes Y (at lag l) if the past of X (i.e., X lagged by l time steps) is more strongly correlated with the present of Y than the past of Y (i.e., Y lagged by l time steps) is with the present of X; i.e.,

|ρ^l_xy| − |ρ^l_yx| < 0 ⇒ X → Y
|ρ^l_xy| − |ρ^l_yx| > 0 ⇒ Y → X
|ρ^l_xy| − |ρ^l_yx| = 0 ⇒ no causal inference
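The lagged statistic and its sign-based inference can be sketched as follows (a Python illustration; the helper names and the synthetic lead-lag pair are assumptions of this sketch):

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

def lagged_xcorr(x, y, lag):
    """rho^l_{xy}: correlation of the present of X with Y lagged by l steps."""
    return pearson(x[lag:], y[:len(y) - lag])

# X is a copy of Y from two steps in the past, so |rho^2_xy| - |rho^2_yx| > 0
# and the operational inference is Y -> X
y = [math.sin(0.3 * t) for t in range(200)]
x = [math.sin(0.3 * (t - 2)) for t in range(200)]
delta = abs(lagged_xcorr(x, y, 2)) - abs(lagged_xcorr(y, x, 2))
```

The sign of `delta` implements the three-way rule above: positive infers Y → X, negative infers X → Y.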
Computational tools for the ECA summary

Open source packages are available for some of the mentioned time series causality tools; others required code to be developed from scratch.

- g1 (transfer entropy): Java Information Dynamics Toolkit (JIDT)
- g2 (Granger): Multivariate Granger Causality (MVGC) MATLAB toolbox
- g3 (PAI): developed from scratch (C++)
- g4 (leaning): developed from scratch (MATLAB)
- g5 (lagged cross-correlation): developed from scratch (MATLAB)

All the code is available at https://github.com/jmmccracken
Cooling/Heating System Data: Time series data

Consider a time series pair (X, Y) where X are indoor temperature measurements (in degrees Celsius) in a house with "experimental" environmental controls and Y is the temperature outside of that house, measured at the same time intervals (168 measurements in each series)³

[Plots of x_t (roughly 20–27 °C) and y_t (roughly 0–30 °C) against t for the two series]

The intuitive causal inference is Y → X.

³This data was originally presented at a time series conference. The abstract is available here, http://www.osti.gov/scitech/biblio/5231321. The data is also available as part of the UCI Machine Learning Repository.
Cooling/Heating System Data: ECA summary preliminaries

An ECA summary requires several parameters be set from the data, including

- embedding dimension and time delay for g3 (PAI)
- cause-effect assignment and tolerance domains for g4 (leaning)
- lags for g5 (cross-correlation)

The embedding dimension will be set (somewhat arbitrarily) to E = 10 and the time delay will be τ = 1.

The tolerance domains will be the f-width tolerance domains; i.e., ±δ_x = f (max(X) − min(X)) and ±δ_y = f (max(Y) − min(Y)). For this example, f = 1/4.

The cause-effect assignment will be the l-standard assignment, but there is still the problem of determining relevant lags l.
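The f-width tolerance domain is a one-line computation; a minimal Python sketch (function name and toy values are assumptions, with f = 1/4 matching this example):

```python
def f_width_tolerance(series, f=0.25):
    """Symmetric f-width tolerance half-width: ±δ = f * (max - min) of the
    series; f = 1/4 is the value used in this example."""
    return f * (max(series) - min(series))

delta_x = f_width_tolerance([20.0, 27.0, 24.0])  # range 7 -> δ = 1.75
```

Tying δ to the observed range makes the tolerance scale with each series rather than requiring a hand-tuned absolute width.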
Cooling/Heating System Data: Autocorrelations

There are autocorrelations in both time series (only 50 lags are shown),

[Plots of the squared autocorrelations |r(x_{t−l}, x_t)|² and |r(y_{t−l}, y_t)|² against lag l for X and Y]

The autocorrelations appear cyclic and initially drop to zero around l = 7 for both time series. This observation will be used to justify using lags of l = 1, 2, …, 7 for both g4 (leaning) and g5 (cross-correlation).
Cooling/Heating System Data: Lagged cross-correlations and leanings
The lagged cross-correlations and leanings (using the l-standard assignment) can be plotted for each tested lag:
[Figure: leanings 〈λl〉 and lagged cross-correlation differences ∆l plotted against the tested lags l = 1, . . . , 7]
There are 7 different causal inferences in this plot, all of which agree except l = 7. A single causal inference (for each tool) will be found with the arithmetic mean across all the tested lags.
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 38 / 50
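The mean lagged cross-correlation difference over the tested lags can be sketched as follows; this is a hypothetical pure-Python illustration (function names are not from the talk), with the sign convention read off the slides (a positive mean difference points to Y → X):

```python
def lagged_corr(a, b, lag):
    """Pearson correlation between a[t-lag] and b[t]."""
    u, v = a[:-lag], b[lag:]
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = sum((x - mu) ** 2 for x in u) ** 0.5
    sv = sum((y - mv) ** 2 for y in v) ** 0.5
    return cov / (su * sv)

def mean_cc_difference(x, y, lags):
    """Mean of |rho_xy(l)| - |rho_yx(l)| over the tested lags."""
    diffs = [abs(lagged_corr(x, y, l)) - abs(lagged_corr(y, x, l))
             for l in lags]
    return sum(diffs) / len(diffs)
```

By construction the statistic is antisymmetric: swapping X and Y flips its sign, which is what makes it usable as a directional inference.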
Cooling/Heating System Data: Making an ECA summary
Each of the five time series tools leads to a causal inference in the ECA summary vector:
TX→Y − TY→X = −0.14 ⇒ Y → X ⇒ g1 = 1
FX→Y − FY→X = −0.35 ⇒ Y → X ⇒ g2 = 1
CYX − CXY = 3.1× 10−4 ⇒ Y → X ⇒ g3 = 1
〈〈λEC 〉w 〉 = −0.20 ⇒ Y → X ⇒ g4 = 1
〈|ρxyl | − |ρyxl |〉 = 0.40 ⇒ Y → X ⇒ g5 = 1
∴ the ECA summary is Y → X, which agrees with intuition
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 39 / 50
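The mapping from the five signed tool statistics to the g-vector can be sketched in Python. The sign conventions below are read off the slides (positive transfer entropy, Granger, and leaning differences point to X → Y; positive PAI and cross-correlation differences point to Y → X), and the function name is illustrative:

```python
def eca_summary(d_te, d_gc, d_pai, d_lean, d_cc):
    """Map the five signed statistics to the ECA summary vector.
    g_i = 0 means the tool infers X -> Y; g_i = 1 means Y -> X.
    d_te  : T_{X->Y} - T_{Y->X}   (transfer entropy difference)
    d_gc  : F_{X->Y} - F_{Y->X}   (Granger statistic difference)
    d_pai : C_YX - C_XY           (PAI difference)
    d_lean: <<lambda_EC>_w>       (weighted mean observed leaning)
    d_cc  : <|rho_xy| - |rho_yx|> (lagged cross-correlation difference)"""
    g = [int(d_te < 0), int(d_gc < 0), int(d_pai > 0),
         int(d_lean < 0), int(d_cc > 0)]
    if all(v == g[0] for v in g):
        inference = "X -> Y" if g[0] == 0 else "Y -> X"
    else:
        inference = "undefined"  # the tools disagree
    return g, inference
```

Applied to the cooling/heating values it reproduces the unanimous Y → X summary; applied to the snowfall values (next slides) it returns "undefined".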
Snowfall Data: Time series data
Consider a time series pair (X, Y) where X is the mean daily temperature (in degrees Celsius) at Whistler, BC, Canada, and Y is the total snowfall (in centimeters) (7,753 measurements in each series)3
[Figure: xt (mean daily temperature, −30 to 30 °C, left panel, series X) and yt (total snowfall, 0 to 120 cm, right panel, series Y) plotted against t over the full record]
The intuitive causal inference is X→ Y.
3 This data is available as part of the UCI Machine Learning Repository. The data was recorded from July 1, 1972 to December 31, 2009.
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 40 / 50
Snowfall Data: ECA summary preliminaries
The ECA summary can be made with parameters similar to those of the previous example:
I The embedding dimension will be E = 100 with a time delay of τ = 1
I The cause-effect assignment will be the l-standard assignment
I The tolerance domains will be the 1/4-width domains
I The tested lags will be l = 1, 2, . . . , 20
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 41 / 50
Snowfall Data: Making an ECA summary
Each of the five time series tools leads to a causal inference in the ECA summary vector:
TX→Y − TY→X = 2.1× 10−2 ⇒ X→ Y ⇒ g1 = 0
FX→Y − FY→X = −2.6× 10−3 ⇒ Y → X ⇒ g2 = 1
CYX − CXY = −3.4× 10−2 ⇒ X→ Y ⇒ g3 = 0
〈〈λEC 〉w 〉 = 3.7× 10−2 ⇒ X→ Y ⇒ g4 = 0
〈|ρxyl | − |ρyxl |〉 = 2.3× 10−2 ⇒ Y → X ⇒ g5 = 1
∴ the ECA summary is undefined
The majority of the causal inferences agree with intuition.
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 42 / 50
Time series causality as data analysis: Objections to causal studies
Data analysis often ignores causality.
Two primary objections to time series causality:
1. Correlation is not causation
2. Confounding cannot be controlled
To the first: many different tools have been developed that go beyond correlation, and ignoring such tools means ignoring potentially useful inferences that can be drawn from the data.
To the second: true, but this is an issue of defining “causality”. Exploring potential causal relationships within data sets can be done with operational definitions of causality. These different causalities may provide deeper insight into the system dynamics.
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 43 / 50
ECA summaries as practical tools: Exploratory causal inference as a practical part of data analysis
Consider a recent result presented in Unraveling the cause-effect relation between time series [Phys. Rev. E 90, 052150]:
Liang; Section V, Ibid.
“. . . El Niño and IOD [Indian Ocean Dipole] are mutually causal, and the causality is asymmetric, with the one from the latter to the former larger than its counterpart . . .” (In the language of ECA: Given the time series pair (E, I), the dominant potential driver is I; i.e., I → E)
This conclusion is drawn by deriving the “Liang information flow” from the transfer entropy and then applying this new formula to E and I, but the same conclusion can be drawn from the ECA summary vectors of these time series pairs, using the code presented previously with naive algorithm parameters (specifically, the parameters used in the snowfall example).
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 44 / 50
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 45 / 50
BACK-UP
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 46 / 50
Impulse with linear response
Consider X, Y = {xt}, {yt} where t = 0, 1, . . . , L,
xt = 2 for t = 1,
xt = Aηt for all t such that t ≠ 1 and t mod 5 ≠ 0,
xt = 2 for all t such that t mod 5 = 0,
and yt = xt−1 + Bηt with y0 = 0, A, B ∈ ℝ ≥ 0, and ηt ∼ N(0, 1). Specifically, consider L = 500, A = 0.1, and B = 0.4.
TX→Y − TY→X = 5.3 × 10−1 ⇒ X → Y ⇒ g1 = 0
FX→Y − FY→X = 4.5 × 10−1 ⇒ X → Y ⇒ g2 = 0
CYX − CXY = −8.3 × 10−3 ⇒ X → Y ⇒ g3 = 0
〈〈λEC 〉w 〉 = 6.6 × 10−3 ⇒ X → Y ⇒ g4 = 0
〈|ρxyl | − |ρyxl |〉 = −2.8 × 10−3 ⇒ X → Y ⇒ g5 = 0
ECA summary is X→ Y, which agrees with intuition.
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 47 / 50
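The impulse-with-linear-response pair defined above can be generated in a few lines; this is a hypothetical sketch (function name, seeding, and use of Python's `random` module are assumptions, not from the talk):

```python
import random

def impulse_linear_response(L=500, A=0.1, B=0.4, seed=0):
    """Generate the backup-slide example: X is a periodic impulse
    (x_t = 2 when t = 1 or t mod 5 == 0, small noise otherwise) and
    Y is the lagged linear response y_t = x_{t-1} + B*eta_t."""
    rng = random.Random(seed)
    x = []
    for t in range(L + 1):
        if t == 1 or t % 5 == 0:
            x.append(2.0)
        else:
            x.append(A * rng.gauss(0, 1))
    y = [0.0]  # y_0 = 0 by definition
    for t in range(1, L + 1):
        y.append(x[t - 1] + B * rng.gauss(0, 1))
    return x, y
```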
Cyclic driving with linear response
Consider X, Y = {xt}, {yt} where t = 0, 1, . . . , L,
xt = a sin(bt + c) + Aηt
and
yt = xt−1 + Bηt
with y0 = 0, A ∈ [0, 1], B ∈ [0, 1], ηt ∼ N(0, 1), and with the amplitude a, the frequency b, and the phase c all in the appropriate units. Specifically, consider L = 500, A = 0.1, B = 0.4, a = b = 1, and c = 0.
TX→Y − TY→X = 1.9 × 10−1 ⇒ X → Y ⇒ g1 = 0
FX→Y − FY→X = 2.1 × 10−1 ⇒ X → Y ⇒ g2 = 0
CYX − CXY = −9.8 × 10−3 ⇒ X → Y ⇒ g3 = 0
〈〈λEC 〉w 〉 = 3.9 × 10−3 ⇒ X → Y ⇒ g4 = 0
〈|ρxyl | − |ρyxl |〉 = −2.9 × 10−2 ⇒ X → Y ⇒ g5 = 0
ECA summary is X→ Y, which agrees with intuition.
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 48 / 50
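The cyclic driver with linear response can be generated the same way; again a hypothetical sketch (function name and seeding are assumptions):

```python
import math
import random

def cyclic_linear_response(L=500, A=0.1, B=0.4, a=1.0, b=1.0, c=0.0,
                           seed=0):
    """Generate the cyclic-driving example: X is a noisy sinusoid
    x_t = a*sin(b*t + c) + A*eta_t and Y is the lagged linear
    response y_t = x_{t-1} + B*eta_t with y_0 = 0."""
    rng = random.Random(seed)
    x = [a * math.sin(b * t + c) + A * rng.gauss(0, 1)
         for t in range(L + 1)]
    y = [0.0]
    for t in range(1, L + 1):
        y.append(x[t - 1] + B * rng.gauss(0, 1))
    return x, y
```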
Cyclic driving with non-linear response
Consider X, Y = {xt}, {yt} where t = 0, 1, . . . , L,
xt = a sin(bt + c) + Aηt
and
yt = Bxt−1 (1 − Cxt−1) + Dηt,
with y0 = 0, A, B, C, D ∈ [0, 1], ηt ∼ N(0, 1), and with the amplitude a, the frequency b, and the phase c all in the appropriate units, given t = 0, fπ, 2fπ, 3fπ, . . . , 6π with f = 1/30, which implies L = 181. Specifically, consider A = 0.1, B = 0.3, C = 0.4, D = 0.5, a = b = 1, and c = 0.
TX→Y − TY→X = 2.7 × 10−1 ⇒ X → Y ⇒ g1 = 0
FX→Y − FY→X = 2.6 × 10−1 ⇒ X → Y ⇒ g2 = 0
CYX − CXY = −1.8 × 10−3 ⇒ X → Y ⇒ g3 = 0
〈〈λEC 〉w 〉 = 8.4 × 10−3 ⇒ X → Y ⇒ g4 = 0
〈|ρxyl | − |ρyxl |〉 = −6.8 × 10−2 ⇒ X → Y ⇒ g5 = 0
ECA summary is X → Y, which agrees with intuition.
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 49 / 50
Coupled logistic map
Consider X, Y = {xt}, {yt} where t = 0, 1, . . . , L,
xt = xt−1 (rx − rxxt−1 − βxyyt−1)
and
yt = yt−1 (ry − ryyt−1 − βyxxt−1)
where the parameters rx, ry, βxy, βyx ∈ ℝ ≥ 0. Specifically, consider L = 500, βxy = 0.5, βyx = 1.5, rx = 3.8, and ry = 3.2 with initial conditions x0 = y0 = 0.4.
TX→Y − TY→X = 4.9 × 10−1 ⇒ X → Y ⇒ g1 = 0
FX→Y − FY→X = 5.4 × 10−1 ⇒ X → Y ⇒ g2 = 0
CYX − CXY = −3.9 × 10−3 ⇒ X → Y ⇒ g3 = 0
〈〈λEC 〉w 〉 = 2.7 × 10−1 ⇒ X → Y ⇒ g4 = 0
〈|ρxyl | − |ρyxl |〉 = −2.6 × 10−1 ⇒ X → Y ⇒ g5 = 0
ECA summary is X→ Y, which agrees with intuition.
J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 50 / 50
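The coupled logistic map above is fully deterministic and iterates directly from the slide's update equations; a minimal sketch (the function name is illustrative):

```python
def coupled_logistic(L=500, rx=3.8, ry=3.2, bxy=0.5, byx=1.5,
                     x0=0.4, y0=0.4):
    """Iterate the coupled logistic map from the backup slide:
    x_t = x_{t-1} (rx - rx*x_{t-1} - bxy*y_{t-1})
    y_t = y_{t-1} (ry - ry*y_{t-1} - byx*x_{t-1})"""
    x, y = [x0], [y0]
    for _ in range(L):
        xp, yp = x[-1], y[-1]
        x.append(xp * (rx - rx * xp - bxy * yp))
        y.append(yp * (ry - ry * yp - byx * xp))
    return x, y
```

With βyx > βxy the X series drives Y more strongly than the reverse, which is why the intuitive inference here is X → Y.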