TRANSCRIPT
-
Discovering Causal Structure with the PC Algorithm
CRG and MSDS Lab Meeting
Discussant: Oisín Ryan
February 23, 2018
-
What is a DAG?
• DAG = Directed Acyclic Graph
• Nodes or vertices = {Author, Bar, V5, ...}
• Directed edges: →
• No cycles: we cannot have A → B → C → A
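The "no cycles" condition can be checked mechanically: a directed graph is acyclic exactly when a topological sort reaches every node. A minimal sketch (the helper `is_dag` and the example graphs are illustrative, not from the slides):

```python
from collections import defaultdict, deque

def is_dag(edges, nodes):
    """Kahn's algorithm: a topological sort covers every node iff there is no cycle."""
    indegree = {v: 0 for v in nodes}
    children = defaultdict(list)
    for a, b in edges:
        children[a].append(b)
        indegree[b] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    sorted_count = 0
    while queue:
        v = queue.popleft()
        sorted_count += 1
        for w in children[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return sorted_count == len(nodes)

print(is_dag([("A", "B"), ("B", "C")], ["A", "B", "C"]))              # True
print(is_dag([("A", "B"), ("B", "C"), ("C", "A")], ["A", "B", "C"]))  # False: A → B → C → A
```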
-
Why DAGs?
1. DAGs can be used to represent joint probability distributions
   • Often called Bayesian networks
   • Nodes represent variables
   • Edges represent dependencies between pairs of variables
   • A → B means A ⊥̸⊥ B
   • Read off conditional dependency relationships
2. DAGs + a probability distribution can be used for causal inference
   • Edges represent direct causal links
   • A → B means A causes B
   • Counterfactual causality (Pearl, Rubin, Spirtes & Glymour)
   • Accounts for typical ideas about causality
   • Forward in time: acyclical
   • Explains "paradoxes": Simpson's, Lord's
-
Some graph terminology
• Parents(B) = {A, D}
• Children(B) = {C}
• A is an ancestor of C
• C is a descendant of A
• Path = a sequence of edges
-
DAGs and Probability distributions
DAGs can be used to represent a joint density function using the Markov Condition:

    P(V) = ∏_{V ∈ V} P(V | Parents(V))        (1)

where V is the set of all variables. For example, for the DAG A → C ← B, C → D:

    P(A, B, C, D) = P(A) P(B) P(C | A, B) P(D | C)

We can also read off conditional (in)dependence relationships not directly implied by the Markov Condition.
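To see that the Markov factorization really defines a joint distribution, we can pick conditional probability tables for binary A, B, C, D (the numbers below are made up for illustration) and check that the product sums to 1 over all value combinations:

```python
import itertools

# Hypothetical CPTs matching the factorization P(A,B,C,D) = P(A) P(B) P(C|A,B) P(D|C)
pA = {0: 0.6, 1: 0.4}
pB = {0: 0.7, 1: 0.3}
pC = {(a, b): {0: 0.9 - 0.2 * a - 0.3 * b, 1: 0.1 + 0.2 * a + 0.3 * b}
      for a in (0, 1) for b in (0, 1)}
pD = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}

def joint(a, b, c, d):
    """The Markov factorization for the DAG A → C ← B, C → D."""
    return pA[a] * pB[b] * pC[(a, b)][c] * pD[c][d]

total = sum(joint(*v) for v in itertools.product((0, 1), repeat=4))
print(total)  # 1.0: a valid joint distribution
```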
-
DAGs and conditional dependencies
[Figure: three DAGs over X, Y, Z — the chain X → Z → Y, the chain X ← Z ← Y, and the fork X ← Z → Y]

In all three graphs: X ⊥̸⊥ Y, but X ⊥⊥ Y | Z.
-
DAGs and conditional dependencies
[Figure: the collider X → Z ← Y]

Here the pattern reverses: X ⊥⊥ Y, but X ⊥̸⊥ Y | Z.

The general rules for reading conditional (in)dependencies off a DAG are known as the d-separation rules.
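These patterns are easy to verify by simulation. A sketch assuming linear-Gaussian data, with Z generated as a collider of independent X and Y (the coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=n)
Y = rng.normal(size=n)
Z = X + Y + 0.5 * rng.normal(size=n)   # collider: X → Z ← Y

def partial_corr(x, y, z):
    """Correlation of x and y after regressing each on z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(X, Y)[0, 1])   # close to 0: X ⊥⊥ Y marginally
print(partial_corr(X, Y, Z))     # clearly negative: X ⊥̸⊥ Y given Z
```

Conditioning on the common effect Z induces the dependence, exactly as the collider rule predicts.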
-
What use is having a DAG?
• A representation of the causal relations among variables
• Estimation of causal effects
  • Identify a sufficient conditioning set to control for confounding
  • I may not need to condition on all possible ancestors
  • I shouldn't condition on colliders
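A regression sketch of the conditioning-set point, using a hypothetical linear DAG with a confounder C (C → A, C → Y, and A → Y with true effect 1.0) and a collider K (A → K ← Y); all structural coefficients are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
C = rng.normal(size=n)                   # confounder
A = C + rng.normal(size=n)               # treatment
Y = 1.0 * A + C + rng.normal(size=n)     # outcome; true effect of A is 1.0
K = A + Y + rng.normal(size=n)           # collider of A and Y

def ols_slope(y, predictors):
    """Coefficient of the first predictor in a multiple regression."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(ols_slope(Y, [A]))         # ≈ 1.5: confounded by C
print(ols_slope(Y, [A, C]))      # ≈ 1.0: {C} is a sufficient conditioning set
print(ols_slope(Y, [A, C, K]))   # biased again: conditioning on the collider K
```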
-
Discovering DAGs from Data
PC algorithm (Peter Spirtes & Clark Glymour)

Assuming:
• The set of observed variables is causally sufficient
  • No common causes missing from the dataset
  • Extensions that account for latent variables do exist!
• The distribution of the observed variables is faithful to a DAG
-
PC algorithm
Two simple principles

1. There is an edge X − Y if and only if X and Y are dependent conditional on every possible subset of the other variables.
   • For a graph G with vertex set V: X − Y iff X ⊥̸⊥ Y | S for all S ⊆ V \ {X, Y}
2. If X − Y − Z (with X and Z non-adjacent), we can orient the arrows as X → Y ← Z if and only if X and Z are dependent conditional on every set containing Y.
   • We only orient arrows where we can identify a collider
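Principle 1 presupposes a conditional independence test. For multivariate Gaussian data the standard choice is a partial-correlation test with Fisher's z transform; a minimal sketch (the function name and the simulated chain X → Z → Y are illustrative, not from the slides):

```python
import numpy as np
from scipy import stats

def fisher_z_ci_test(data, i, j, S, alpha=0.05):
    """Test column i ⊥⊥ column j given columns S, assuming Gaussian data."""
    precision = np.linalg.inv(np.corrcoef(data[:, [i, j] + list(S)], rowvar=False))
    r = -precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1])  # partial correlation
    z = np.sqrt(data.shape[0] - len(S) - 3) * np.arctanh(r)            # Fisher z statistic
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return p > alpha   # True: treat as independent (delete the edge)

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=n)
Z = X + rng.normal(size=n)
Y = Z + rng.normal(size=n)        # chain X → Z → Y
D = np.column_stack([X, Y, Z])
print(fisher_z_ci_test(D, 0, 1, []))    # False: X and Y are marginally dependent
print(fisher_z_ci_test(D, 0, 1, [2]))   # typically True: Z separates X and Y
```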
-
PC algorithm example
[Figure: the true DAG over the variables X, Y, Z, Q, W]
-
PC algorithm example
[Figure: fully connected undirected graph over X, Y, Z, Q, W]

Step 0
• Start with a fully connected undirected graph
-
PC algorithm example
[Figure: current working graph over X, Y, Z, Q, W]

Step 1a
• Test the marginal dependencies for each pair
• E.g. the correlation between X and Z
• If it is not significant, delete the edge
• Repeat for all pairs
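Step 1a as code. The slides don't give the data-generating model, so a hypothetical linear-Gaussian DAG over X, Y, Z, Q, W is assumed below (X → Z ← Y, Z → Q → W):

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(0)
n = 2_000
X = rng.normal(size=n)
Y = rng.normal(size=n)
Z = X + Y + rng.normal(size=n)   # assumed DAG: X → Z ← Y, Z → Q → W
Q = Z + rng.normal(size=n)
W = Q + rng.normal(size=n)
data = {"X": X, "Y": Y, "Z": Z, "Q": Q, "W": W}

# Step 0: fully connected undirected graph over the five variables
edges = set(combinations(data, 2))

# Step 1a: test each marginal correlation; delete non-significant edges
for a, b in sorted(edges):
    r, p = stats.pearsonr(data[a], data[b])
    if p > 0.05:
        edges.discard((a, b))   # e.g. X − Y should typically be deleted here
print(sorted(edges))
```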
-
PC algorithm example
[Figure: current working graph over X, Y, Z, Q, W]

Step 1b
• Take the result of Step 1a as input
• For each adjacent pair in this graph (e.g., A, B):
  • Form the adjacency set of A and B, Adj(A, B): the set of variables connected to either A or B
  • Test the conditional independence of A and B given each size-1 subset of Adj(A, B)
  • Record the variables that separate A from B
-
PC algorithm example
[Figure: current working graph over X, Y, Z, Q, W]

Step 1c
• Take the result of Step 1b as input
• Repeat the previous step, but for subsets of size 2
• In general, keep increasing the number of variables you condition on until this is larger than the size of Adj(A, B)
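Steps 1a–1c together form the skeleton phase. A compact sketch that follows the slides' convention of conditioning on subsets of Adj(A, B), using a Fisher-z partial-correlation test (appropriate for Gaussian data); the simulated DAG is invented for illustration:

```python
import numpy as np
from itertools import combinations
from scipy import stats

def ci_pvalue(D, i, j, S):
    """Fisher-z p-value for the partial correlation of columns i, j given S."""
    P = np.linalg.inv(np.corrcoef(D[:, [i, j] + list(S)], rowvar=False))
    r = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
    z = np.sqrt(D.shape[0] - len(S) - 3) * np.arctanh(r)
    return 2 * (1 - stats.norm.cdf(abs(z)))

def estimate_skeleton(D, alpha=0.05):
    p = D.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}   # fully connected start
    sepset = {}
    for size in range(p - 1):                          # size of the conditioning set
        for i, j in combinations(range(p), 2):
            if j not in adj[i]:
                continue
            candidates = sorted((adj[i] | adj[j]) - {i, j})   # Adj(i, j)
            for S in combinations(candidates, size):
                if ci_pvalue(D, i, j, S) > alpha:      # independent: delete edge
                    adj[i].discard(j); adj[j].discard(i)
                    sepset[(i, j)] = sepset[(j, i)] = set(S)
                    break
    return adj, sepset

rng = np.random.default_rng(0)
n = 2_000
X = rng.normal(size=n)                  # hypothetical true DAG:
Y = rng.normal(size=n)                  # X → Z ← Y, Z → Q → W
Z = X + Y + rng.normal(size=n)
Q = Z + rng.normal(size=n)
W = Q + rng.normal(size=n)
adj, sepset = estimate_skeleton(np.column_stack([X, Y, Z, Q, W]))
print({i: sorted(adj[i]) for i in adj})
```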
-
PC algorithm example
[Figure: estimated skeleton over X, Y, Z, Q, W]

• This is an estimate of the skeleton of the DAG
• An estimate of the undirected version of the DAG
-
PC algorithm example
[Figure: partially oriented graph over X, Y, Z, Q, W]

Step 2
• For each open triple A − B − C (A and C non-adjacent)
• Orient A → B ← C if we did not condition on B to separate A and C
• We now have a Completed Partially Directed Acyclic Graph (CPDAG)
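Step 2 as code: given a skeleton (adjacency sets) and the separating sets recorded in Step 1, orient every open triple whose middle node was not used to separate the endpoints. The tiny skeleton and sepsets below are hypothetical:

```python
from itertools import combinations

def orient_colliders(adj, sepset):
    """Turn A − B − C (A, C non-adjacent) into A → B ← C when B is not in sepset(A, C)."""
    directed = set()
    for b in adj:
        for a, c in combinations(sorted(adj[b]), 2):
            if c not in adj[a] and b not in sepset.get((a, c), set()):
                directed.add((a, b))   # a → b
                directed.add((c, b))   # c → b
    return directed

# Hypothetical skeleton X − Z − Y, Z − Q; X and Y were separated by the empty set,
# while Q was separated from X and from Y by {Z}
adj = {"X": {"Z"}, "Y": {"Z"}, "Z": {"X", "Y", "Q"}, "Q": {"Z"}}
sepset = {("X", "Y"): set(), ("Y", "X"): set(),
          ("Q", "X"): {"Z"}, ("X", "Q"): {"Z"},
          ("Q", "Y"): {"Z"}, ("Y", "Q"): {"Z"}}
print(sorted(orient_colliders(adj, sepset)))
# [('X', 'Z'), ('Y', 'Z')]: the collider X → Z ← Y; the edge Z − Q stays undirected
```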
-
Let’s test the PC algorithm out