1 learning causal structure from observational and experimental data richard scheines carnegie...

Post on 31-Mar-2015


1

Learning Causal Structure from Observational and Experimental Data

Richard Scheines

Carnegie Mellon University

Causation, Statistics, and Experiments

2

Francis Bacon

Galileo Galilei

Sewall Wright

Trygve Haavelmo

Charles Spearman

Udny Yule

Sir Ronald A. Fisher

Jerzy Neyman

Timeline: 1500, 1600, ..., 1900, 1930, 1960, 1990

Graphical Causal Models

Potential Outcomes

3

Causal Graph G = {V,E}

Each edge X → Y represents a direct causal claim:

X is a direct cause of Y relative to V

Causal Graphs

Years of Education → Income

Years of Education → Skills and Knowledge → Income

4

Causal Markov Axiom, Acyclicity

d-separation criterion

Causal Graph: Z → X → Y1, X → Y2

Independence Oracle:

Z _||_ Y1 | X

Z _||_ Y2 | X

Z _||_ Y1 | X,Y2

Z _||_ Y2 | X,Y1

Y1 _||_ Y2 | X

Y1 _||_ Y2 | X,Z

Bridge Principles: Causal Graph over V ⇒ Constraints on P(V)
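
The d-separation criterion can be checked mechanically. The sketch below is one standard way to do it (moralizing the ancestral subgraph), not necessarily the procedure used in the talk. It assumes the graph on this slide is Z → X → Y1 with X → Y2, and the names `ancestors_of`, `d_separated`, and the parent-map encoding are illustrative choices.

```python
from itertools import combinations


def ancestors_of(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG `parents`."""
    given = set(given)
    keep = ancestors_of(parents, {x, y} | given)
    # Moralize the ancestral subgraph: link each child to its parents and
    # "marry" every pair of parents that share a child, ignoring directions.
    adjacent = {node: set() for node in keep}
    for child in keep:
        child_parents = list(parents.get(child, ()))
        for parent in child_parents:
            adjacent[child].add(parent)
            adjacent[parent].add(child)
        for a, b in combinations(child_parents, 2):
            adjacent[a].add(b)
            adjacent[b].add(a)
    # x and y are d-separated iff no path joins them once `given` is removed.
    stack, seen = [x], {x}
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for neighbour in adjacent[node]:
            if neighbour not in seen and neighbour not in given:
                seen.add(neighbour)
                stack.append(neighbour)
    return True


# Assumed graph for this slide, encoded as child -> list of parents.
GRAPH = {"X": ["Z"], "Y1": ["X"], "Y2": ["X"]}

if __name__ == "__main__":
    for x, y, given in [("Z", "Y1", ["X"]), ("Y1", "Y2", ["X", "Z"]), ("Z", "Y1", [])]:
        print(x, y, given, d_separated(GRAPH, x, y, given))
```

Under the Causal Markov Axiom, every d-separation the function reports corresponds to a conditional independence in P(V).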

5

Faithfulness

Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G.

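Revenues = a·Rate + c·Economy + e_Rev

Economy = b·Rate + e_Econ

Causal graph: Tax Rate → Tax Revenues (coefficient a), Tax Rate → Economy (coefficient b), Economy → Tax Revenues (coefficient c)

Faithfulness: a ≠ -bc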
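
To see why a = -bc matters, substitute the second equation into the first: Revenues = (a + bc)·Rate + c·e_Econ + e_Rev, so the total effect of Rate on Revenues is a + bc and it vanishes exactly when a = -bc. The simulation below is a minimal sketch of that cancellation. The standard-normal noise terms, the coefficient values, and the helper names are assumptions made for illustration only.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rate_revenue_correlation(a, b, c, n=20000, seed=0):
    """Simulate the two structural equations and correlate Rate with Revenues."""
    rng = random.Random(seed)
    rates, revenues = [], []
    for _ in range(n):
        rate = rng.gauss(0, 1)
        economy = b * rate + rng.gauss(0, 1)                 # Economy = b*Rate + e_Econ
        revenue = a * rate + c * economy + rng.gauss(0, 1)   # Revenues = a*Rate + c*Economy + e_Rev
        rates.append(rate)
        revenues.append(revenue)
    return pearson(rates, revenues)


# Faithful parameters: a + b*c = 0.5, so Rate and Revenues are associated.
faithful = rate_revenue_correlation(a=1.0, b=-0.5, c=1.0)
# Unfaithful parameters: a = -b*c, so the two causal paths cancel.
unfaithful = rate_revenue_correlation(a=1.0, b=-0.5, c=2.0)

if __name__ == "__main__":
    print(f"faithful:   {faithful:.3f}")
    print(f"unfaithful: {unfaithful:.3f}")
```

In the unfaithful case Rate is a direct cause of Revenues, yet the two are (nearly) uncorrelated in the sample, which is the independence that the graph alone does not predict.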

6

Faithfulness

[Figure: causal graph over Gene A, Gene B, and Protein 24, with two edges marked + and one marked -]

By evolutionary design: Gene A _||_ Protein 24

[Figure: causal graph over Air Temp, Core Body Temp, and Homeostatic Regulator]

By evolutionary design: Air Temp _||_ Core Body Temp

Sampling rate vs. equilibration rate

7

Causal Structure Association

[Figure: three causal structures over TV and Obesity, the third adding a variable C, each paired with a statement about whether TV _||_ Obesity]

8

Modeling Ideal Interventions

Interventions on the Effect

[Figure: Room Temperature and Sweaters On, shown in the pre-experimental system and post-intervention]

9

Modeling Ideal Interventions

Interventions on the Cause

[Figure: Room Temperature and Sweaters On, shown in the pre-experimental system and post-intervention]

10

Interventions & Causal Graphs

Model an ideal intervention by adding an “intervention” variable outside the original system as a direct cause of its target.

Pre-intervention graph: Education → Income → Taxes

Intervene on Income

“Soft” Intervention: Education → Income → Taxes, with I → Income

“Hard” Intervention: Income → Taxes, with I → Income (the edge Education → Income is removed)
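
The difference between the two kinds of intervention can be expressed as a small graph edit. The following is a minimal sketch, assuming the pre-intervention graph is Education → Income → Taxes and encoding it as a child-to-parents map. The function name `intervene` and the encoding are hypothetical, chosen for illustration.

```python
def intervene(parents, target, hard, name="I"):
    """Return a new parent map with intervention variable `name` added.

    A soft intervention adds `name` as an extra direct cause of `target`.
    A hard intervention makes `name` the only direct cause of `target`,
    removing every edge that previously pointed into it.
    """
    new_parents = {node: list(ps) for node, ps in parents.items()}
    new_parents.setdefault(name, [])  # I sits outside the original system
    if hard:
        new_parents[target] = [name]
    else:
        new_parents[target] = new_parents.get(target, []) + [name]
    return new_parents


# Assumed pre-intervention graph, encoded as child -> list of parents.
PRE = {"Education": [], "Income": ["Education"], "Taxes": ["Income"]}

soft = intervene(PRE, "Income", hard=False)
hard = intervene(PRE, "Income", hard=True)

if __name__ == "__main__":
    print("soft:", soft)
    print("hard:", hard)
```

Edges out of the target (here Income → Taxes) are untouched in both cases, since an intervention only changes how the target itself is produced.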

11

Association underdetermines Causal Structure

[Figure: three causal structures over TV and Obesity, the third adding a variable C, each paired with a statement about whether TV _||_ Obesity]

Spurious Association

12

Randomization Association = Causation

[Figure: three causal structures over TV and Obesity, the third adding a variable C, each with a Randomizer and each paired with a statement about whether TV _||_ Obesity]
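
A small simulation can illustrate the point for the common-cause case. It is a sketch under stated assumptions: C is a direct cause of both TV and Obesity, TV has no effect on Obesity, all relations are linear with standard-normal noise, and the randomizer fully determines TV. None of these specifics come from the slide.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def tv_obesity_correlation(randomized, n=20000, seed=1):
    """Correlate TV with Obesity when C causes both and TV does not cause Obesity."""
    rng = random.Random(seed)
    tv, obesity = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        if randomized:
            t = rng.gauss(0, 1)      # the randomizer sets TV, cutting C -> TV
        else:
            t = c + rng.gauss(0, 1)  # passive observation: C -> TV
        o = c + rng.gauss(0, 1)      # C -> Obesity; no TV -> Obesity edge
        tv.append(t)
        obesity.append(o)
    return pearson(tv, obesity)


observational = tv_obesity_correlation(randomized=False)
experimental = tv_obesity_correlation(randomized=True)

if __name__ == "__main__":
    print(f"observational: {observational:.3f}")
    print(f"randomized:    {experimental:.3f}")
```

The spurious association seen under passive observation disappears once TV is randomized, so any association that survives randomization must come from TV's effect on Obesity.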

13

Randomization Association = Causation

Treatment _||_ Response

[Figure: causal graph over Randomizer, Treatment Assignment, Treatment, Response, and U]

Treatment _||_ Response | Dropout = no

[Figure: causal graph over Randomizer, Treatment, Response, U, and Dropout]

14

Randomization Association = Causation

Treatment _||_ Response

[Figure: causal graph over Randomizer, Treatment Assignment, Treatment, Belief, and Response]

15

Experimental Control & Statistical Control

Statistically control for C: X3 _||_ X1 | C

Experimentally control for C: X3 _||_ X1 | C(set)

[Figure: causal graph over X1, X3, and C, with a Randomizer]

Statistically control for M: X3 _||_ X1 | M

Experimentally control for M: X3 _||_ X1 | M(set)

[Figure: causal graph over X1, X3, and M, with a Randomizer]

16

Experimental Control ≠ Statistical Control

Statistically control for M: X3 _||_ X1 | M

Experimentally control for M: X3 _||_ X1 | M(set)

[Figure: causal graph over X1, X3, M, and U, with a Randomizer]

Statistically control for M: X3 _||_ X1 | M

Experimentally control for M: X3 _||_ X1 | M(set)

[Figure: causal graph over X1, X3, M, U1, and U2, with a Randomizer]

17

Causal Model(V)

• Graph over X, Y, Z

• Structural Eqs.(V) or CPT(V)

Experimental Setup(V)

• V = {O, M}

• P(M)

Manipulated Causal Model_M(V)

• Graph over X, Y, Z with intervention variable I

• Structural Eqs._M(V) or CPT_M(V)

P_M(V) → Sampling → Data

P(V) = f(Causal Model(V), Experimental Setup(V))

18

Causal Discovery

Experimental Setup(V)

• V = {O, M}

• P(M)

P_M(V) → Data → Statistical Inference → Discovery Algorithm → Equivalence Class of Causal Structures

General Assumptions

- Markov, Faithfulness

- Linearity

- Gaussianity

- Acyclicity

- Etc.

19

Causal Discovery from Passive Observation

• PC, GES → Patterns (Markov equivalence class - no latent confounding)

• FCI → PAGs (Markov equivalence - including confounders and selection bias)

• CCD → Linear cyclic models (no confounding)

• BPC → Linear latent variable models

• Lingam → unique DAG (no confounding, linear non-Gaussian, faithfulness not needed)

• LVLingam → set of DAGs (confounders allowed)

• CyclicLingam → set of DGs (cyclic models, no confounding)

• Non-linear additive noise models → unique DAG
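
Several of the algorithms above return a Markov equivalence class rather than a single DAG. For DAGs without latent variables, two graphs over the same variables are Markov equivalent exactly when they share the same skeleton and the same unshielded colliders. The sketch below checks that condition. It is an illustration of the equivalence criterion only, not an implementation of PC or GES, and the function names and the three-variable example graphs are assumptions.

```python
from itertools import combinations


def skeleton(parents):
    """Undirected adjacencies of a DAG given as child -> list of parents."""
    return {frozenset((child, p)) for child, ps in parents.items() for p in ps}


def unshielded_colliders(parents):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(parents)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(g1, g2):
    """Same skeleton and same unshielded colliders (graphs over the same variables)."""
    return (skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))


chain = {"X": [], "Y": ["X"], "Z": ["Y"]}          # X -> Y -> Z
reverse_chain = {"Z": [], "Y": ["Z"], "X": ["Y"]}  # X <- Y <- Z
fork = {"Y": [], "X": ["Y"], "Z": ["Y"]}           # X <- Y -> Z
collider = {"X": [], "Z": [], "Y": ["X", "Z"]}     # X -> Y <- Z

if __name__ == "__main__":
    print(markov_equivalent(chain, reverse_chain))
    print(markov_equivalent(chain, fork))
    print(markov_equivalent(chain, collider))
```

The chain, reversed chain, and fork all imply X _||_ Z | Y and nothing else, so passive observation cannot tell them apart. The collider implies X _||_ Z instead and forms its own class.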

20

Causal Discovery from Manipulations/Interventions

What sorts of manipulation/interventions have been studied?

• Do(X=x): replace P(X | parents(X)) with P(X=x) = 1.0

• Randomize(X): replace P(X | parents(X)) with P_M(X), e.g., uniform

• Soft interventions: replace P(X | parents(X)) with P_M(X | parents(X), I), P_M(I)

• Simultaneous interventions

• Sequential interventions

• Sequential, conditional interventions

• Time sensitive interventions

  • Shock and run: set X at time t, and then let the system run

  • Clamp: set X at time t, and hold it fixed until time t + D
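
The first two items in the list amount to swapping one factor of the joint distribution. The sketch below works this through on a hypothetical three-variable binary model (Z → X, Z → Y, X → Y) that is not from the talk. Every probability in it is made up for illustration, and the function and variable names are assumptions.

```python
from itertools import product

# Hypothetical conditional probability tables for Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}
P_X_GIVEN_Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}  # P_X_GIVEN_Z[z][x]
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7}  # (x, z) -> P(Y=1)


def joint(x_mechanism):
    """Joint over (z, x, y), with P(X | parents(X)) replaced by x_mechanism(x, z)."""
    table = {}
    for z, x, y in product((0, 1), repeat=3):
        p_y1 = P_Y1_GIVEN_XZ[(x, z)]
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        table[(z, x, y)] = P_Z[z] * x_mechanism(x, z) * p_y
    return table


def prob_y1_given_x1(table):
    """P(Y=1 | X=1) computed from a joint table."""
    numerator = sum(p for (z, x, y), p in table.items() if x == 1 and y == 1)
    denominator = sum(p for (z, x, y), p in table.items() if x == 1)
    return numerator / denominator


def observe(x, z):
    return P_X_GIVEN_Z[z][x]         # original mechanism P(X | Z)


def do_x1(x, z):
    return 1.0 if x == 1 else 0.0    # Do(X=1): P(X=1) = 1.0


def randomize(x, z):
    return 0.5                       # Randomize(X): uniform P_M(X)


p_observed = prob_y1_given_x1(joint(observe))
p_do = prob_y1_given_x1(joint(do_x1))
p_randomized = prob_y1_given_x1(joint(randomize))

if __name__ == "__main__":
    print(p_observed, p_do, p_randomized)
```

In this made-up model the observational P(Y=1 | X=1) differs from the value under Do(X=1) because Z confounds X and Y. Do(X=1) and Randomize(X) agree with each other, since both cut the Z → X dependence.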

21

Causal Discovery from Manipulations/Interventions

Simultaneous Interventions Destroy Information

Experimental Setup: Randomize(X,Y) independently

P_M(V): X _||_ Y

Equivalence Class

[Figure: the candidate causal structures over X and Y that make up the equivalence class]

22

Causal Discovery from Manipulations/Interventions

Simultaneous Interventions Destroy Information, but:

Sequence of single interventions over N variables: N-1 experiments are needed to guarantee causal identification

Sequence of simultaneous interventions: log2(N) + 1
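
The gap between the two counts grows quickly with N. The short sketch below tabulates both, assuming the second bound is read as log2(N) + 1 (base-2 logarithm) and restricting N to powers of two so that no rounding convention has to be assumed. The function names are illustrative.

```python
def single_intervention_experiments(n):
    """Experiments needed when each experiment intervenes on one variable: N - 1."""
    return n - 1


def simultaneous_intervention_experiments(n):
    """Experiments needed with simultaneous interventions: log2(N) + 1.

    Assumption: the logarithm is base 2. Only powers of two are accepted,
    so the result is an exact integer.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch only handles N that is a power of two, N >= 2")
    return (n.bit_length() - 1) + 1  # bit_length() - 1 is exactly log2(n) here


if __name__ == "__main__":
    for n in (2, 8, 64, 1024):
        print(n, single_intervention_experiments(n), simultaneous_intervention_experiments(n))
```

For N = 1024 variables, for example, this gives 1023 single-intervention experiments against 11 experiments with simultaneous interventions.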

23

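Causal Discovery from Manipulations/Interventions

Equivalence class oddities

True Model

[Figure: causal graph over X and Y]

Experimental Setup: Randomize(Y)

[Figure: manipulated graph over X and Y with intervention variable I]

P_M(V): X _||_ Y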

24

Causal Discovery from Manipulations/Interventions

Equivalence class oddities

Experimental Setup: Randomize(Y)

P_M(V): X _||_ Y

Equivalence Class

[Figure: the candidate causal structures over X and Y that make up the equivalence class]

25

Causal Discovery from Manipulations/Interventions

Equivalence class oddities

Experimental Setup: Randomize(X,Y) independently

P_M(V): X _||_ Z

Equivalence Class

• X is an ancestor of Z

• X has a path to Z not through Y

26

Issues

• Efficiently representing a wider array of information relevant to causal structure discovery, and then efficiently combining it to maximally constrain the possible explanations of data

• Rate of reaching equilibrium vs. rate of sampling

• Transportability

• Constructing appropriate variables from raw measurements

• High dimensionality
