penn state - march 23, 20071 the tetrad project: computational aids to causal discovery peter...

Post on 05-Jan-2016

217 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Penn State - March 23, 2007 1

The TETRAD Project: Computational Aids to

Causal Discovery

Peter Spirtes, Clark Glymour, Richard Scheines

and many others

Department of Philosophy

Carnegie Mellon

Penn State - March 23, 2007 2

Agenda

1. Morning I: Theoretical Overview

Representation, Axioms, Search

2. Morning II: Research Problems

3. Afternoon: TETRAD Demo - Workshop

Penn State - March 23, 2007 3

Part I: Agenda

1. Motivation

2. Representation

3. Connecting Causation to Probability (Independence)

4. Searching for Causal Models

5. Improving on Regression for Causal Inference

Penn State - March 23, 2007 4

1. Motivation

Non-experimental Evidence

Typical Predictive Questions

• Can we predict aggressiveness from the amount of violent TV watched

Causal Questions:

• Does watching violent TV cause Aggression?

• I.e., if we intervene to change TV watching, will the level of Aggression

change?

Day Care Aggressiveness

John

Mary

A lot

None

A lot

A little

Penn State - March 23, 2007 5

Causal Estimation

Manipulated Probability P(Y | X set= x, Z=z)

from

Unmanipulated Probability P(Y,X,Z)

When and how can we use non-experimental data to tell us about the effect of an intervention?

Penn State - March 23, 2007 6

Spartina in the Cape Fear Estuary

Penn State - March 23, 2007 7

What FactorsDirectly Influence Spartina Growth in the Cape Fear Estuary?

pH, salinity, sodium, phosphorus, magnesium, ammonia, zinc, potassium…, what?

14 variables for 45 samples of Spartina from Cape Fear Estuary.

Biologist concluded salinity must be a factor.

Bayes net analysis says only pH directly affects Spartina biomass

Biologist’s subsequent greenhouse experiment says: if pH is controlled for, variations in salinity do not affect growth; but if salinity is controlled for, variations in pH do affect growth.

Penn State - March 23, 2007 8

2. Representation

1. Association & causal structure -

qualitatively

2. Interventions

3. Statistical Causal Models

1. Bayes Networks

2. Structural Equation Models

Penn State - March 23, 2007 9

Causation & Association

X is a cause of Y iff x1 x2 P(Y | X set= x1) P(Y | X set= x2)

Causation is asymmetric: X Y Y X

X and Y are associated (X _||_ Y) iff

x1 x2 P(Y | X = x1) P(Y | X = x2)

Association is symmetric: X _||_ Y Y _||_ X

Penn State - March 23, 2007 10

Direct Causation

X is a direct cause of Y relative to S, iff z,x1 x2 P(Y | X set= x1 , Z set= z)

P(Y | X set= x2 , Z set= z)

where Z = S - {X,Y} X Y

Penn State - March 23, 2007 11

Causal Graphs

Causal Graph G = {V,E} Each edge X Y represents a direct causal claim:

X is a direct cause of Y relative to V

Exposure Rash

Exposure Infection Rash

Chicken Pox

Penn State - March 23, 2007 12

Causal Graphs

Not Cause Complete

Common Cause Complete

Exposure Infection Symptoms

Omitted Causes

Exposure Infection Symptoms

Omitted

Common Causes

Penn State - March 23, 2007 13

Sweaters

On

Room Temperature

Pre-experimental SystemPost

Modeling Ideal Interventions

Interventions on the Effect

Penn State - March 23, 2007 14

Modeling Ideal Interventions

Sweaters

OnRoom

Temperature

Pre-experimental SystemPost

Interventions on the Cause

Penn State - March 23, 2007 15

Ideal Interventions & Causal Graphs

• Model an ideal intervention by adding an “intervention” variable outside the original system

• Erase all arrows pointing into the variable intervened upon

Exp Inf

Rash

Intervene to change Inf

Post-intervention graph?Pre-intervention graph

Exp Inf Rash

I

Penn State - March 23, 2007 16

Conditioning vs. Intervening

P(Y | X = x1) vs. P(Y | X set= x1)

Teeth Slides

Penn State - March 23, 2007 17

Causal Bayes Networks

P(S = 0) = .7P(S = 1) = .3

P(YF = 0 | S = 0) = .99 P(LC = 0 | S = 0) = .95P(YF = 1 | S = 0) = .01 P(LC = 1 | S = 0) = .05P(YF = 0 | S = 1) = .20 P(LC = 0 | S = 1) = .80P(YF = 1 | S = 1) = .80 P(LC = 1 | S = 1) = .20

Smoking [0,1]

Lung Cancer[0,1]

Yellow Fingers[0,1]

P(S,YF, L) = P(S) P(YF | S) P(LC | S)

The Joint Distribution Factors

According to the Causal Graph,

i.e., for all X in V

P(V) = P(X|Immediate Causes of(X))

Penn State - March 23, 2007 18

Structural Equation Models

1. Structural Equations2. Statistical Constraints

Education

LongevityIncome

Statistical Model

Causal Graph

Penn State - March 23, 2007 19

Structural Equation Models

Structural Equations: One Equation for each variable V in the graph:

V = f(parents(V), errorV)for SEM (linear regression) f is a linear function

Statistical Constraints: Joint Distribution over the Error terms

Education

LongevityIncome

Causal Graph

Penn State - March 23, 2007 20

Structural Equation Models

Equations: Education = ed

Income =Educationincome

Longevity =EducationLongevity

Statistical Constraints: (ed, Income,Income ) ~N(0,2)

2diagonal - no variance is zero

Education

LongevityIncome

Causal Graph

Education

Income Longevity

1 2

LongevityIncome

SEM Graph

(path diagram)

Penn State - March 23, 2007 21

3. Connecting

Causation to Probability

Penn State - March 23, 2007 22

Causal Structure

Statistical Predictions

The Markov Condition

Causal Graphs

Z Y X

Independence

X _||_ Z | Y

i.e.,

P(X | Y) = P(X | Y, Z)

Causal Markov Axiom

Penn State - March 23, 2007 23

Causal Markov Axiom

If G is a causal graph, and P a probability distribution over the variables in G, then in P:

every variable V is independent of its non-effects, conditional on its immediate causes.

Penn State - March 23, 2007 24

Causal Markov Condition

Two Intuitions: 1) Immediate causes make effects independent

of remote causes (Markov).

2) Common causes make their effects independent (Salmon).

Penn State - March 23, 2007 25

Causal Markov Condition

1) Immediate causes make effects independent of remote causes (Markov).

E || S | I

E = Exposure to Chicken Pox

I = Infected

S = Symptoms

S I E

Markov Cond.

Penn State - March 23, 2007 26

Causal Markov Condition

2) Effects are independent conditional on their common causes.

YF || LC | S

Smoking (S)

Yellow Fingers (YF)

Lung Cancer (LC)

Markov Cond.

Penn State - March 23, 2007 27

Causal Structure Statistical Data

X3 | X2 X1

X2 X3 X1

Causal Markov Axiom (D-separation)

Independence Relations

Acyclic Causal Graph

Penn State - March 23, 2007 28

Causal Markov Axiom

In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram -

i.e., assuming that the model is common cause complete.

Penn State - March 23, 2007 29

Causal Markov and D-Separation

• In acyclic graphs: equivalent

• Cyclic Linear SEMs with uncorrelated errors:• D-separation correct

• Markov condition incorrect

• Cyclic Discrete Variable Bayes Nets:• If equilibrium --> d-separation correct

• Markov incorrect

Penn State - March 23, 2007 30

D-separation: Conditioning vs. Intervening

X3

T

X2 X1

X3

T

X2 X1

I

P(X3 | X2) P(X3 | X2, X1)

X3 _||_ X1 | X2

P(X3 | X2 set= ) = P(X3 | X2 set=, X1)

X3 _||_ X1 | X2 set=

Penn State - March 23, 2007 31

4. Search

From Statistical Data

to Probability to Causation

Penn State - March 23, 2007 32

Causal Discovery

Statistical Data Causal Structure

Background Knowledge

- X2 before X3

- no unmeasured common causes

X3 | X2 X1

Independence Relations

Data

Statistical Inference

X2 X3 X1

Equivalence Class of Causal Graphs

X2 X3 X1

X2 X3 X1

Discovery Algorithm

Causal Markov Axiom (D-separation)

Penn State - March 23, 2007 33

Faithfulness

X3 | X2 X1

X2 X3 X1Causal Markov Axiom

(D-separation)

IndependenceRelations

X2 X3

X1

Causal Markov Axiom(D-separation)

Special ParameterValues

No Independence Relations

Penn State - March 23, 2007 34

Faithfulness Assumption

Statistical Constraints arise from Causal

Structure, not Coincidence

All independence relations holding in a

probability distribution P generated by a

causal structure G are entailed by d-

separation applied to G.

Penn State - March 23, 2007 35

Faithfulness Assumption

Revenues = aRate + cEconomy + Rev.

Economy = bRate + Econ.

a -bcTax Revenues

Economyc

ba

Tax Rate

Penn State - March 23, 2007 36

Representations ofD-separation Equivalence Classes

We want the representations to:

• Characterize the Independence Relations Entailed by the Equivalence Class

• Represent causal features that are shared by every member of the equivalence class

Penn State - March 23, 2007 37

Patterns & PAGs

• Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence - no latent variables.

• PAGs: (Richardson 1994) graphical representation of an equivalence class including latent variable models and sample selection bias that are d-separation equivalent over a set of measured variables X

Penn State - March 23, 2007 38

Patterns

X2 X1

X2 X1

X2 X1

X4 X3

X2 X1

Possible Edges Example

Penn State - March 23, 2007 39

Patterns: What the Edges Mean

X2 X1

X2 X1X1 X2 in some members of theequivalence class, and X2 X1 inothers.

X1 X2 (X1 is a cause of X2) inevery member of the equivalenceclass.

X2 X1 X1 and X2 are not adjacent in anymember of the equivalence class

Penn State - March 23, 2007 40

Patterns

X2

X4 X3

X1

X2

X4 X3

Represents

Pattern

X1 X2

X4 X3

X1

Penn State - March 23, 2007 41

PAGs: Partial Ancestral Graphs

X2 X1

X2 X1

X2 X1

X2 There is a latent commoncause of X1 and X2

No set d-separates X2 and X1

X1 is a cause of X2

X2 is not an ancestor of X1

X1

X2 X1 X1 and X2 are not adjacent

What PAG edges mean.

Penn State - March 23, 2007 42

PAGs: Partial Ancestral Graphs

X2

X3

X1

X2

X3

Represents

PAG

X1 X2

X3

X1

X2

X3

T1

X1

X2

X3

X1

etc.

T1

T1 T2

Penn State - March 23, 2007 43

Overview of Search Methods

• Constraint Based Searches• TETRAD

• Scoring Searches• Scores: BIC, AIC, etc.• Search: Hill Climb, Genetic Alg., Simulated

Annealing• Difficult to extend to latent variable models

Heckerman, Meek and Cooper (1999). “A Bayesian Approach to Causal Discovery” chp. 4 in Computation, Causation, and Discovery, ed. by Glymour and Cooper, MIT Press, pp. 141-166

Penn State - March 23, 2007 44

Search - Illustration

X4 X3

X2

X1 Independencies entailed

X1 _||_ X2

X1_||_ X4 | X3

X2_||_ X4 | X3

Penn State - March 23, 2007 45

X1

X2

X3 X4

CausalGraph

Independcies

Begin with:

X1

X2

X3 X4

X1 X2

X1 X4 {X3}

X2 X4 {X3}

Search: Adjacency

Penn State - March 23, 2007 46

X1

X2

X3 X4

Causal Graph

Independcies

Begin with:

From

X1

X2

X3 X4

X1 X2

X1 X4 {X3}

X2 X4 {X3}

X1

X2

X3 X4

X1

X2

X3 X4

X1

X2

X3 X4

From

From

X1 X2

X1 X4 {X3}

X2 X4 {X3}

Penn State - March 23, 2007 47

Search: Orientation in Patterns

X Y Z

X Z | YX Z | Y

Before OrientationY Unshielded

Collider Non-collider

X Y Z

X Y Z

X Y Z

X Y Z

X Y Z

Penn State - March 23, 2007 48

Search: Orientation

X4 X3

X2

X1

X4 X3

X2

X1

X4 X3

X2

X1

X4 X3

X2

X1

X4 X3

X2

X1

PAG Pattern

X4 X3

X2

X1

X1 || X2

X1 || X4 | X3

X2 || X4 | X3

After Orientation Phase

Penn State - March 23, 2007 49

The theory of interventions, simplified

Start with an graphical causal model, without feedback.

Simplest Problem: To predict the probability distribution of other represented variables resulting from an intervention that forces a value x on a variable X, (e.g., everybody has to smoke) but does not otherwise alter the causal structure.

Penn State - March 23, 2007 50

First Thing

Remember: The probability distribution for values of Y conditional on X = x is not in general the same as the probability distribution for values of Y on an intervention that sets X = x.

Recent work by Waldemann gives evidence that adults are sensitive to the difference.

Penn State - March 23, 2007 51

Example

X

Y Z WBecause X influences Y, the value of X gives information about the value of Y, and vice versa. X and Y are dependent in probability.

But: An intervention that forces a value Y = y on Y, and otherwise does not disturb the system should not change the probability distribution for values of X.

It should, necessarily, make the value of Y independent of X—informally, the value of Y should give no information about the value of X, and vice-versa.

Penn State - March 23, 2007 52

Representing a Simple Manipulation

Observed Structure:

Structure uponManipulating Yellow Fingers:

Smoking [0,1]

Lung Cancer[0,1]

Yellow Fingers[0,1]

Smoking [0,1]

Lung Cancer [0,1]

Yellow Fingers = 0

Penn State - March 23, 2007 53

Intervention Calculations

X

Y Z W

1. Set Y = y

2. Do “surgery” on the graph: eliminate edges into Y

3. Use the Markov factorization of the resulting graph and probability distribution to compute the probability distribution for X, Z, W—various effective rules incorporated in what Pearl calls the “Do” calculus.

Penn State - March 23, 2007 54

Intervention Calculations

X

Y= y Z W

Original Markov Factorization

Pr(X, Y, Z, W) = Pr(W | X,Z) Pr(Z | Y) Pr(Y | X) Pr(X)

The Factorization After Intervention

Pr(X, Y, W | Do(Y = y) = Pr(W | X,Z) Pr(Z | Y = y) Pr(X)

Penn State - March 23, 2007 55

What’s The Point?

Pr(X, Y, W | Do(Y = y) = Pr(W | X,Z) Pr(Z | Y = y) Pr(X)

The probability distribution on the left hand side is a prediction of the effects of an intervention.

The probabilities on the right are all known before the intervention.

So causal structure plus probabilities => prediction of intervention effects; provide a basis for planning.

Penn State - March 23, 2007 56

Surprising Results

X

Y Z WSuppose we know:

the causal structure

the joint probability for Y, Z, W only—not for X

We CAN predict the effect on Z of an intervention on Y—even though Y and W are confounded by unobserved X.

Penn State - March 23, 2007 57

Surprising Result

X

Y Z WThe effect of Z on W is confounded by the the probabilistic

effect of Y on Z, X on Y and X on W.

But the probabilistic effect on W of an intervention on Z CAN be computed from the probabilities before the intervention.

How? By conditioning on Y.

Penn State - March 23, 2007 58

Surprising Result

X

Y Z WPr(W, Y, X | Do(Z = z )) = Pr(W | Z =z, X) Pr(Y | X) Pr(X) (by surgery)

Pr(W, X | Do(Z = z), Y) = Pr(W | Z =z, X) Pr(X | Y) (condition on Y)

Pr(W | Do(Z = z), Y) = x Pr(W | Z = z, X, Y) Pr(X | Y) (marginalize out X)

Pr(W | Do(Z = z), Y) = Pr(W | Z = z, Y = y) (obscure probability theorem)

Pr(W | Do(Z = z)) = y Pr(W | Z = z, Y) Pr(Y) (marginalize out Y)

The right hand side is composed entirely of observed probabilities.

Penn State - March 23, 2007 59

Pearl’s “Do” Calculus

Provides rules that permit one to avoid the probability calculations we just went through—graphical properties determine whether effects of an intervention can be predicted.

Penn State - March 23, 2007 60

Applications

• Spartina Grass

• Parenting among Single, Black Mothers

• Pneumonia

• Photosynthesis

• Lead - IQ

• College Retention

• Corn Exports

• Rock Classification

• College Plans

• Political Exclusion

• Satellite Calibration

• Naval Readiness

Penn State - March 23, 2007 61

References

• Causation, Prediction, and Search, 2nd Edition, (2000), by P. Spirtes, C. Glymour, and R. Scheines ( MIT Press)

• Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press

• Causality in Crisis?, (1997) V. McKim and S. Turner (eds.), Univ. of Notre Dame Press.

• TETRAD IV: www.phil.cmu.edu/projects/tetrad

• Web Course on Causal and Statistical Reasoning : www.phil.cmu.edu/projects/csr/

• Causality Lab: www.phil.cmu.edu/projects/causality-lab

top related