penn state - march 23, 20071 the tetrad project: computational aids to causal discovery peter...

Penn State - March 23, 2007 1

The TETRAD Project: Computational Aids to

Causal Discovery

Peter Spirtes, Clark Glymour, Richard Scheines

and many others

Department of Philosophy

Carnegie Mellon

Agenda

1. Morning I: Theoretical Overview

Representation, Axioms, Search

2. Morning II: Research Problems

3. Afternoon: TETRAD Demo - Workshop

Part I: Agenda

1. Motivation

2. Representation

3. Connecting Causation to Probability (Independence)

4. Searching for Causal Models

5. Improving on Regression for Causal Inference

1. Motivation

Non-experimental Evidence

Typical Predictive Questions

• Can we predict aggressiveness from the amount of violent TV watched

Causal Questions:

• Does watching violent TV cause Aggression?

• I.e., if we intervene to change TV watching, will the level of Aggression

change?

Day Care Aggressiveness

A little

Causal Estimation

Manipulated Probability P(Y | X set= x, Z=z)

Unmanipulated Probability P(Y,X,Z)

When and how can we use non-experimental data to tell us about the effect of an intervention?

Spartina in the Cape Fear Estuary

What FactorsDirectly Influence Spartina Growth in the Cape Fear Estuary?

pH, salinity, sodium, phosphorus, magnesium, ammonia, zinc, potassium…, what?

14 variables for 45 samples of Spartina from Cape Fear Estuary.

Biologist concluded salinity must be a factor.

Bayes net analysis says only pH directly affects Spartina biomass

Biologist’s subsequent greenhouse experiment says: if pH is controlled for, variations in salinity do not affect growth; but if salinity is controlled for, variations in pH do affect growth.

2. Representation

1. Association & causal structure -

qualitatively

2. Interventions

3. Statistical Causal Models

1. Bayes Networks

2. Structural Equation Models

Causation & Association

X is a cause of Y iff x1 x2 P(Y | X set= x1) P(Y | X set= x2)

Causation is asymmetric: X Y Y X

X and Y are associated (X _||_ Y) iff

x1 x2 P(Y | X = x1) P(Y | X = x2)

Association is symmetric: X _||_ Y Y _||_ X

Direct Causation

X is a direct cause of Y relative to S, iff z,x1 x2 P(Y | X set= x1 , Z set= z)

P(Y | X set= x2 , Z set= z)

where Z = S - {X,Y} X Y

Causal Graphs

Causal Graph G = {V,E} Each edge X Y represents a direct causal claim:

X is a direct cause of Y relative to V

Exposure Rash

Exposure Infection Rash

Chicken Pox

Causal Graphs

Not Cause Complete

Common Cause Complete

Exposure Infection Symptoms

Omitted Causes

Exposure Infection Symptoms

Omitted

Common Causes

Sweaters

Room Temperature

Pre-experimental SystemPost

Modeling Ideal Interventions

Interventions on the Effect

Modeling Ideal Interventions

Sweaters

OnRoom

Temperature

Pre-experimental SystemPost

Interventions on the Cause

Ideal Interventions & Causal Graphs

• Model an ideal intervention by adding an “intervention” variable outside the original system

• Erase all arrows pointing into the variable intervened upon

Exp Inf

Intervene to change Inf

Post-intervention graph?Pre-intervention graph

Exp Inf Rash

Conditioning vs. Intervening

P(Y | X = x1) vs. P(Y | X set= x1)

Teeth Slides

Causal Bayes Networks

P(S = 0) = .7P(S = 1) = .3

Smoking [0,1]

Lung Cancer[0,1]

Yellow Fingers[0,1]

P(S,YF, L) = P(S) P(YF | S) P(LC | S)

The Joint Distribution Factors

According to the Causal Graph,

i.e., for all X in V

P(V) = P(X|Immediate Causes of(X))

Structural Equation Models

1. Structural Equations2. Statistical Constraints

Education

LongevityIncome

Statistical Model

Causal Graph

Structural Equations: One Equation for each variable V in the graph:

V = f(parents(V), errorV)for SEM (linear regression) f is a linear function

Statistical Constraints: Joint Distribution over the Error terms

Education

LongevityIncome

Causal Graph

Equations: Education = ed

Income =Educationincome

Longevity =EducationLongevity

Statistical Constraints: (ed, Income,Income ) ~N(0,2)

2diagonal - no variance is zero

Education

LongevityIncome

Causal Graph

Education

Income Longevity

LongevityIncome

SEM Graph

(path diagram)

3. Connecting

Causation to Probability

Causal Structure

Statistical Predictions

The Markov Condition

Causal Graphs

Independence

X _||_ Z | Y

P(X | Y) = P(X | Y, Z)

Causal Markov Axiom

If G is a causal graph, and P a probability distribution over the variables in G, then in P:

every variable V is independent of its non-effects, conditional on its immediate causes.

Causal Markov Condition

Two Intuitions: 1) Immediate causes make effects independent

of remote causes (Markov).

2) Common causes make their effects independent (Salmon).

1) Immediate causes make effects independent of remote causes (Markov).

E || S | I

E = Exposure to Chicken Pox

I = Infected

S = Symptoms

Markov Cond.

2) Effects are independent conditional on their common causes.

YF || LC | S

Smoking (S)

Yellow Fingers (YF)

Lung Cancer (LC)

Markov Cond.

Causal Structure Statistical Data

X3 | X2 X1

X2 X3 X1

Causal Markov Axiom (D-separation)

Independence Relations

Acyclic Causal Graph

Causal Markov Axiom

In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram -

i.e., assuming that the model is common cause complete.

Causal Markov and D-Separation

• In acyclic graphs: equivalent

• Cyclic Linear SEMs with uncorrelated errors:• D-separation correct

• Markov condition incorrect

• Cyclic Discrete Variable Bayes Nets:• If equilibrium --> d-separation correct

• Markov incorrect

D-separation: Conditioning vs. Intervening

P(X3 | X2) P(X3 | X2, X1)

X3 _||_ X1 | X2

P(X3 | X2 set= ) = P(X3 | X2 set=, X1)

X3 _||_ X1 | X2 set=

4. Search

From Statistical Data

to Probability to Causation

Causal Discovery

Statistical Data Causal Structure

Background Knowledge

- X2 before X3

- no unmeasured common causes

X3 | X2 X1

Independence Relations

Statistical Inference

X2 X3 X1

Equivalence Class of Causal Graphs

X2 X3 X1

Discovery Algorithm

Causal Markov Axiom (D-separation)

Faithfulness

X3 | X2 X1

X2 X3 X1Causal Markov Axiom

(D-separation)

IndependenceRelations

Causal Markov Axiom(D-separation)

Special ParameterValues

No Independence Relations

Faithfulness Assumption

Statistical Constraints arise from Causal

Structure, not Coincidence

All independence relations holding in a

probability distribution P generated by a

causal structure G are entailed by d-

separation applied to G.

Faithfulness Assumption

Revenues = aRate + cEconomy + Rev.

Economy = bRate + Econ.

a -bcTax Revenues

Economyc

Tax Rate

Representations ofD-separation Equivalence Classes

We want the representations to:

• Characterize the Independence Relations Entailed by the Equivalence Class

• Represent causal features that are shared by every member of the equivalence class

Patterns & PAGs

• Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence - no latent variables.

• PAGs: (Richardson 1994) graphical representation of an equivalence class including latent variable models and sample selection bias that are d-separation equivalent over a set of measured variables X

Patterns

Possible Edges Example

Patterns: What the Edges Mean

X2 X1X1 X2 in some members of theequivalence class, and X2 X1 inothers.

X1 X2 (X1 is a cause of X2) inevery member of the equivalenceclass.

X2 X1 X1 and X2 are not adjacent in anymember of the equivalence class

Patterns

Represents

Pattern

PAGs: Partial Ancestral Graphs

X2 There is a latent commoncause of X1 and X2

No set d-separates X2 and X1

X1 is a cause of X2

X2 is not an ancestor of X1

X2 X1 X1 and X2 are not adjacent

What PAG edges mean.

PAGs: Partial Ancestral Graphs

Represents

Overview of Search Methods

• Constraint Based Searches• TETRAD

• Scoring Searches• Scores: BIC, AIC, etc.• Search: Hill Climb, Genetic Alg., Simulated

Annealing• Difficult to extend to latent variable models

Heckerman, Meek and Cooper (1999). “A Bayesian Approach to Causal Discovery” chp. 4 in Computation, Causation, and Discovery, ed. by Glymour and Cooper, MIT Press, pp. 141-166

Search - Illustration

X1 Independencies entailed

X1 _||_ X2

X1_||_ X4 | X3

X2_||_ X4 | X3

CausalGraph

Independcies

Begin with:

X1 X4 {X3}

X2 X4 {X3}

Search: Adjacency

Causal Graph

Independcies

Begin with:

X1 X4 {X3}

X2 X4 {X3}

X1 X4 {X3}

X2 X4 {X3}

Search: Orientation in Patterns

X Z | YX Z | Y

Before OrientationY Unshielded

Collider Non-collider

Search: Orientation

PAG Pattern

X1 || X2

X1 || X4 | X3

X2 || X4 | X3

After Orientation Phase

The theory of interventions, simplified

Start with an graphical causal model, without feedback.

Simplest Problem: To predict the probability distribution of other represented variables resulting from an intervention that forces a value x on a variable X, (e.g., everybody has to smoke) but does not otherwise alter the causal structure.

First Thing

Remember: The probability distribution for values of Y conditional on X = x is not in general the same as the probability distribution for values of Y on an intervention that sets X = x.

Recent work by Waldemann gives evidence that adults are sensitive to the difference.

Example

Y Z WBecause X influences Y, the value of X gives information about the value of Y, and vice versa. X and Y are dependent in probability.

But: An intervention that forces a value Y = y on Y, and otherwise does not disturb the system should not change the probability distribution for values of X.

It should, necessarily, make the value of Y independent of X—informally, the value of Y should give no information about the value of X, and vice-versa.

Representing a Simple Manipulation

Observed Structure:

Structure uponManipulating Yellow Fingers:

Smoking [0,1]

Lung Cancer[0,1]

Yellow Fingers[0,1]

Smoking [0,1]

Lung Cancer [0,1]

Yellow Fingers = 0

Intervention Calculations

1. Set Y = y

2. Do “surgery” on the graph: eliminate edges into Y

3. Use the Markov factorization of the resulting graph and probability distribution to compute the probability distribution for X, Z, W—various effective rules incorporated in what Pearl calls the “Do” calculus.

Intervention Calculations

Y= y Z W

Original Markov Factorization

Pr(X, Y, Z, W) = Pr(W | X,Z) Pr(Z | Y) Pr(Y | X) Pr(X)

The Factorization After Intervention

Pr(X, Y, W | Do(Y = y) = Pr(W | X,Z) Pr(Z | Y = y) Pr(X)

What’s The Point?

Pr(X, Y, W | Do(Y = y) = Pr(W | X,Z) Pr(Z | Y = y) Pr(X)

The probability distribution on the left hand side is a prediction of the effects of an intervention.

The probabilities on the right are all known before the intervention.

So causal structure plus probabilities => prediction of intervention effects; provide a basis for planning.

Surprising Results

Y Z WSuppose we know:

the causal structure

the joint probability for Y, Z, W only—not for X

We CAN predict the effect on Z of an intervention on Y—even though Y and W are confounded by unobserved X.

Surprising Result

Y Z WThe effect of Z on W is confounded by the the probabilistic

effect of Y on Z, X on Y and X on W.

But the probabilistic effect on W of an intervention on Z CAN be computed from the probabilities before the intervention.

How? By conditioning on Y.

Surprising Result

Y Z WPr(W, Y, X | Do(Z = z )) = Pr(W | Z =z, X) Pr(Y | X) Pr(X) (by surgery)

Pr(W, X | Do(Z = z), Y) = Pr(W | Z =z, X) Pr(X | Y) (condition on Y)

Pr(W | Do(Z = z), Y) = x Pr(W | Z = z, X, Y) Pr(X | Y) (marginalize out X)

Pr(W | Do(Z = z), Y) = Pr(W | Z = z, Y = y) (obscure probability theorem)

Pr(W | Do(Z = z)) = y Pr(W | Z = z, Y) Pr(Y) (marginalize out Y)

The right hand side is composed entirely of observed probabilities.

Pearl’s “Do” Calculus

Provides rules that permit one to avoid the probability calculations we just went through—graphical properties determine whether effects of an intervention can be predicted.

Applications

• Spartina Grass

• Parenting among Single, Black Mothers

• Pneumonia

• Photosynthesis

• Lead - IQ

• College Retention

• Corn Exports

• Rock Classification

• College Plans

• Political Exclusion

• Satellite Calibration

• Naval Readiness

References

• Causation, Prediction, and Search, 2nd Edition, (2000), by P. Spirtes, C. Glymour, and R. Scheines ( MIT Press)

• Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press

• Causality in Crisis?, (1997) V. McKim and S. Turner (eds.), Univ. of Notre Dame Press.

• TETRAD IV: www.phil.cmu.edu/projects/tetrad

• Web Course on Causal and Statistical Reasoning : www.phil.cmu.edu/projects/csr/

• Causality Lab: www.phil.cmu.edu/projects/causality-lab

penn state - march 23, 20071 the tetrad project: computational aids to causal discovery peter...

Documents

1 searching for causal models richard scheines philosophy,...

richard scheines peter spirtes, clark glymour, dept. of...

1 identifying predictors of cognitive change when the...

peter spirtes bayes net perspectives on causation and causal...

regionalwährungen in Österreich im geschichtlichen...

1 gr2002 peter spirtes carnegie mellon university

theoretical equivalence and invariant structure - home |...

1 causal inference and ambiguous manipulations richard...

1 feature discovery in the context of educational data...

1 tetrad: machine learning and graphcial causal models...

1 automatic causal discovery richard scheines peter spirtes,...

hans reichenbach’s probability...

1 richard scheines carnegie mellon university causal...

joe ramsey with enormous help from clark glymour as well as...

1 causal data mining richard scheines dept. of philosophy,...

annie blomberg, sam krumholz, emily scheines & … ·...

analisis del alumnado de la universidad de´ almer´ia...

causal inference and graphical models peter spirtes carnegie...

why glymour is a...

1 learning causal structure from observational and...