
Can causal models be evaluated?

Isabelle Guyon

ClopiNet / ChaLearn

http://clopinet.com/causality

Acknowledgements and references

1) Feature Extraction, Foundations and Applications. I. Guyon, S. Gunn, et al. Springer, 2006. http://clopinet.com/fextract-book

2) Causation and Prediction Challenge. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, and A. Statnikov, Eds. CiML, volume 2, Microtome, 2010. http://www.mtome.com/Publications/CiML/ciml.html

http://gesture.chalearn.org

Co-founders:

Constantin Aliferis Alexander Statnikov

André Elisseeff Jean-Philippe Pellet

Gregory F. Cooper Peter Spirtes

ChaLearn directors and advisors:

Alexander Statnikov Ioannis Tsamardinos

Richard Scheines Frederick Eberhardt

Florin Popescu

Preparation of ExpDeCo: Experimental design in causal discovery

• Motivations
• Quiz
• What we want to do (next challenge)
• What we already set up (virtual lab)
• What we could improve
• Your input…

Note: Experiment = manipulation = action

Causal discovery motivations (1): Interesting problems

What affects…

… your health?

… climate changes?

… the economy?

and… which actions will have beneficial effects?

Predict the consequences of (new) actions

• Predict the outcome of actions
– What if we ate only raw foods?
– What if we required all cars to be painted white?
– What if we broke up the Euro?

• Find the best action to get a desired outcome
– Determine treatment (medicine)
– Determine policies (economics)

• Predict counterfactuals
– A guy not wearing his seatbelt died in a car accident. Would he have died had he worn it?

Causal discovery motivations (2): Lots of data available

http://data.gov
http://data.uk.gov
http://www.who.int/research/en/
http://www.ncdc.noaa.gov/oa/ncdc.html
http://neurodatabase.org/
http://www.ncbi.nlm.nih.gov/Entrez/
http://www.internationaleconomics.net/data.html
http://www-personal.umich.edu/~mejn/netdata/
http://www.eea.europa.eu/data-and-maps/

Causal discovery motivations (3): Classical ML helpless

Predict the consequences of actions:

Under “manipulations” by an external agent, only causes are predictive; consequences and confounders are not.

• If manipulated, a cause influences the outcome…
• … a consequence does not …
• … neither does a confounder (consequence of a common cause).

[Figure: diagrams with nodes X and Y, one per case.]
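The claim that only causes stay predictive under manipulation can be illustrated with a small simulation. This is a sketch under assumed conditions, not material from the talk: the toy system (C causes Y, Y causes E, and a hidden H causes both Y and Z), the noise levels, and all names are hypothetical.

```python
import numpy as np

def simulate(n, rng, manipulate=None):
    """Draw n samples from an assumed toy system: C -> Y -> E, H -> Y, H -> Z.

    `manipulate` names the variable that an external agent sets at random,
    which cuts that variable off from its natural causes.
    """
    h = rng.normal(size=n)                   # hidden common cause
    c = rng.normal(size=n)                   # cause of Y; it has no parents, so
                                             # setting it at random changes nothing
    y = c + h + 0.5 * rng.normal(size=n)     # outcome
    e = y + 0.5 * rng.normal(size=n)         # consequence of Y
    z = h + 0.5 * rng.normal(size=n)         # confounded with Y through H
    if manipulate == "E":
        e = rng.normal(size=n)               # agent overrides the consequence
    elif manipulate == "Z":
        z = rng.normal(size=n)               # agent overrides the confounded variable
    return {"C": c, "E": e, "Z": z, "Y": y}

def corr_with_target(data, name):
    """Pearson correlation between one variable and the outcome Y."""
    return float(np.corrcoef(data[name], data["Y"])[0, 1])

rng = np.random.default_rng(0)
natural = simulate(20000, rng)
natural_corr = {v: corr_with_target(natural, v) for v in "CEZ"}
manipulated_corr = {
    v: corr_with_target(simulate(20000, rng, manipulate=v), v) for v in "CEZ"
}

for v in "CEZ":
    print(v, round(natural_corr[v], 2), round(manipulated_corr[v], 2))
```

In the natural distribution all three variables correlate with Y, so a classical feature selector would keep them all. Once each variable is manipulated in turn, only C keeps its correlation with Y.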


Classical ML helpless

• Special case: stationary or cross-sectional data (no time series).
• Superficially, the problem resembles a classical feature selection problem.

[Figure: data matrix X, with dimension labels n, m and n’.]

What could be the causal graph?

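Could it be that?

[Figure: candidate causal graph over Y, X1 and X2, next to a scatter plot with axes x1 and x2.]

Let’s try

Simpson’s paradox

X1 || X2 | Y

[Figure: graph over Y, X1 and X2, with the data plotted against x1.]

Could it be that?

[Figure: another candidate causal graph over Y, X1 and X2, next to a scatter plot with axes x1 and x2.]

Let’s try

[Figure: graph over Y, X1 and X2.]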

Plausible explanation

[Figure: causal graph over baseline (X2), health (Y) and peak (X1), next to a scatter plot with axes peak (x1) and baseline (x2); points are labeled by Y (normal / disease).]

X2 || Y

X2 || Y | X1
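The baseline/peak example can be reproduced numerically. The sketch below assumes one reading of the slide, namely that peak (X1) is the sum of baseline (X2) and an effect of health (Y), so that X1 is a common consequence of X2 and Y. The numeric values and helper names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Assumed generative model (not taken from the slides): Y -> X1 <- X2
y = rng.integers(0, 2, size=n).astype(float)        # health: 0 = normal, 1 = disease
x2 = rng.normal(60.0, 20.0, size=n)                 # baseline, unrelated to health
x1 = x2 + 30.0 * y + rng.normal(0.0, 5.0, size=n)   # peak = baseline + disease effect

def residual(a, b):
    """Residual of a after a least-squares linear fit on b (with intercept)."""
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

# Marginal dependence between baseline and health
marginal = float(np.corrcoef(x2, y)[0, 1])

# Partial correlation between baseline and health, given peak
partial = float(np.corrcoef(residual(x2, x1), residual(y, x1))[0, 1])

print("corr(X2, Y)      =", round(marginal, 3))
print("corr(X2, Y | X1) =", round(partial, 3))
```

Under this assumed model, baseline alone carries no information about health, yet it becomes strongly informative once peak is known. A purely predictive feature selector would therefore keep X2 even though it is not a cause of Y.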

What we would like

[Figure: graph over Y, X1 and X2 with scatter plots (axes x1 and x2), shown for each of the following cases.]

Manipulate X1

Manipulate X2

What we want to do

Causal data mining

How are we going to do it?

Obstacle 1: Practical

Many statements of the "causality problem"

Obstacle 2: Fundamental

It is very hard to assess solutions

Evaluation

• Experiments are often:
– Costly
– Unethical
– Infeasible

• Non-experimental “observational” data is abundant and costs less.

New challenge: ExpDeCo

Experimental design in causal discovery

- Goal: Find variables that strongly influence an outcome

- Method:

- Learn from a “natural” distribution (observational data)

- Predict the consequences of given actions (checked against a test set of “real” experimental data)

- Iteratively refine the model with experiments (using on-line learning from experimental data)

What we have already done

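[Figure: virtual lab. Queries are sent to a database and answers are returned. The example model has the variables Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day and Fatigue.]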

Models of systems

http://clopinet.com/causality

February 2007: Project starts. Pascal2 funding.
August 2007: Two-year NSF grant.
Dec. 2007: Workbench alive. 1st causality challenge.
Sept. 2008: 2nd causality challenge (Pot luck).
Fall 2009: Virtual lab alive.
Dec. 2009: Active Learning Challenge (Pascal2).
December 2010: Unsupervised and Transfer Learning Challenge (DARPA).
Fall 2012: ExpDeCo (Pascal2).
Planned: CoMSiCo.

What remains to be done

ExpDeCo (new challenge)

Setup:
• Several paired datasets (preferably of real data):
– “Natural” distribution
– “Manipulated” distribution

• Problems
– Learn a causal model from the natural distribution
– Assessment 1: test with natural distribution
– Assessment 2: test with manipulated distribution
– Assessment 3: on-line learning from manipulated distribution (sequential design of experiments)
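The three assessments can be sketched on a toy problem. Everything here is assumed for illustration: the system (C causes Y, Y causes E), the linear least-squares model, the mean squared error as the score, and a simple one-sample-at-a-time gradient update standing in for on-line learning. None of it is the challenge protocol itself.

```python
import numpy as np

def draw(n, rng, manipulated=False):
    """Features [C, E] and target Y from an assumed system C -> Y -> E.

    In the manipulated distribution an external agent sets E at random.
    """
    c = rng.normal(size=n)
    y = c + 0.5 * rng.normal(size=n)
    e = rng.normal(size=n) if manipulated else y + 0.5 * rng.normal(size=n)
    return np.column_stack([c, e]), y

def mse(w, X, y):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
X_nat, y_nat = draw(5000, rng)
X_nat_test, y_nat_test = draw(5000, rng)
X_man, y_man = draw(5000, rng, manipulated=True)
X_man_test, y_man_test = draw(5000, rng, manipulated=True)

# Learn from the natural distribution only
w, *_ = np.linalg.lstsq(X_nat, y_nat, rcond=None)

# Assessment 1: test with natural distribution
score_natural = mse(w, X_nat_test, y_nat_test)

# Assessment 2: test with manipulated distribution, without re-training
score_manipulated = mse(w, X_man_test, y_man_test)

# Assessment 3: on-line learning, one manipulated example at a time
w_online = w.copy()
for x_t, y_t in zip(X_man, y_man):
    w_online += 0.01 * (y_t - x_t @ w_online) * x_t
score_online = mse(w_online, X_man_test, y_man_test)

print(score_natural, score_manipulated, score_online)
```

The model trained on natural data leans on the consequence E, so its error rises sharply on the manipulated test set and falls again once it has been updated with manipulated examples.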

Challenge design constraints

- Largely not relying on “ground truth”, which is difficult or impossible to get (in real data)

- Not biased towards particular methods

- Realistic setting as close as possible to actual use

- Statistically significant, not involving “chance”

- Reproducible on other similar data

- Not specific to very particular settings

- No cheating possible

- Capitalize on classical experimental design

Lessons learned from the Causation & Prediction Challenge

Causation and Prediction challenge

Toy datasets

Challenge datasets

Assessment w. manipulations (artificial data)

Causality assessment with manipulations

[Figure: causal graph over Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Yellow Fingers, Car Accident, Born an Even Day and Fatigue, shown for three datasets.]

• LUCAS0: natural
• LUCAS1: manipulated
• LUCAS2: manipulated

Assessment w. ground truth

• We define: V = variables of interest (theoretical minimal set of predictive variables, e.g. MB, direct causes, ...)
• Participants score feature relevance: S = ordered list of features
• We assess causal relevance with AUC = f(V, S)

[Figure: graph with nodes numbered 0 to 11, and the list of variables 4, 11, 2, 3, 1.]
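One way to compute an AUC from an ordered list S and a set V is to count the fraction of (variable of interest, other variable) pairs that S ranks in the right order. The sketch below is an assumption about how f(V, S) could be implemented, not the challenge's scoring code. The function name and the example lists are hypothetical, with feature indices borrowed from the figure.

```python
def ranking_auc(S, V):
    """AUC of an ordered feature list S (most relevant first) against a set V.

    Returns the fraction of (member of V, non-member of V) pairs in which
    the member of V is ranked ahead of the non-member.
    """
    V = set(V)
    n_pos = sum(1 for f in S if f in V)
    n_neg = len(S) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("S must contain members and non-members of V")
    correct = 0
    negatives_below = 0
    # Walk from the least relevant end, counting non-members seen so far
    for f in reversed(S):
        if f in V:
            correct += negatives_below
        else:
            negatives_below += 1
    return correct / (n_pos * n_neg)

V = {4, 11, 2, 3, 1}                                   # illustrative choice of V
S_perfect = [4, 11, 2, 3, 1, 0, 9, 6, 10, 7, 5, 8]     # all of V ranked first
S_one_slip = [4, 0, 11, 2, 3, 1, 9, 6, 10, 7, 5, 8]    # feature 0 ranked too high

print(ranking_auc(S_perfect, V))
print(ranking_auc(S_one_slip, V))
```

A ranking that places every member of V first scores 1, and each misplaced non-member costs one pair for every member of V it overtakes.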

Assessment without manip. (real data)

Using artificial “probes”

[Figure: the causal graph (Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Yellow Fingers, Car Accident, Born an Even Day, Fatigue) extended with probe variables P1, P2, P3, PT.]

• LUCAP0: natural
• LUCAP1&2: manipulated

Scoring using “probes”

• What we can compute (Fscore):

– Negative class = probes (here, all “non-causes”, all manipulated).

– Positive class = other variables (may include causes and non-causes).

• What we want (Rscore):

– Positive class = causes.

– Negative class = non-causes.

• What we get (asymptotically):

Fscore = (NTruePos/NReal) Rscore + 0.5 (NTrueNeg/NReal)
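The relation between Fscore and Rscore can be checked numerically. The sketch assumes that both scores are AUCs, that NTruePos and NTrueNeg count the causes and non-causes among the NReal real variables, and that probes receive relevance scores distributed like those of real non-causes. The score distributions and sample sizes are invented for the check.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

rng = np.random.default_rng(0)
n_causes, n_noncauses, n_probes = 200, 800, 4000
n_real = n_causes + n_noncauses

# Hypothetical relevance scores assigned by a participant
causes = rng.normal(1.5, 1.0, size=n_causes)
noncauses = rng.normal(0.0, 1.0, size=n_noncauses)
probes = rng.normal(0.0, 1.0, size=n_probes)      # assumed to look like non-causes

# What we can compute: real variables (positive) against probes (negative)
fscore = auc(np.concatenate([causes, noncauses]), probes)

# What we want: causes (positive) against non-causes (negative)
rscore = auc(causes, noncauses)

# Asymptotic relation from the slide
predicted = (n_causes / n_real) * rscore + 0.5 * (n_noncauses / n_real)

# Exact finite-sample decomposition of the Fscore into its two groups
exact = (n_causes * auc(causes, probes) + n_noncauses * auc(noncauses, probes)) / n_real

print(round(fscore, 3), round(predicted, 3), round(exact, 3))
```

The decomposition in `exact` always holds. The slide's formula follows from it when real non-causes cannot be told apart from probes (AUC 0.5) and causes separate from probes as well as they do from non-causes.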

Pairwise comparisons

[Figure: pairwise comparison of challenge entries. Participants: Gavin Cawley, Yin-Wen Chang, Mehreen Saeed, Alexander Borisov, E. Mwebaze & J. Quinn, H. Jair Escalante, J.G. Castellano, Chen Chu An, Louis Duclos-Gosselin, Cristian Grozea, H.A. Jen, J. Yin & Z. Geng Gr., Jinzhu Jia, Jianming Jin, L.E.B & Y.T., M.B., Vladimir Nikulin, Alexey Polovinkin, Marius Popescu, Ching-Wei Wang, Wu Zhili, Florin Popescu, CaMML Team, Nistor Grozavu.]

Causal vs. non-causal

Jianxin Yin: causal

Vladimir Nikulin: non-causal

Insensitivity to irrelevant features

Simple univariate predictive model, binary target and features. All relevant features correlate perfectly with the target; all irrelevant features are randomly drawn. With 98% confidence, abs(feat_weight) < w and Σi wi xi < v for the irrelevant features.

ng: number of “good” (relevant) features

nb: number of “bad” (irrelevant) features

m: number of training examples.
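This setting can be simulated to see how little the irrelevant features disturb the prediction. The sketch is one possible reading of the slide, not its derivation: it assumes targets and features in {-1, +1}, a weight per feature equal to its average product with the target over the training set, and illustrative values for ng, nb and m. It does not reproduce the 98% bound.

```python
import numpy as np

rng = np.random.default_rng(0)
ng, nb, m = 5, 1000, 500      # good features, bad features, training examples
n_test = 2000

def draw(n):
    """Binary target, ng perfectly correlated features, nb random features."""
    y = rng.choice([-1.0, 1.0], size=n)
    good = np.repeat(y[:, None], ng, axis=1)        # copies of the target
    bad = rng.choice([-1.0, 1.0], size=(n, nb))     # independent coin flips
    return np.hstack([good, bad]), y

X_train, y_train = draw(m)
X_test, y_test = draw(n_test)

# Univariate model: one weight per feature, estimated independently
weights = X_train.T @ y_train / m

# Size of the weights given to irrelevant features
bad_weight_max = float(np.max(np.abs(weights[ng:])))

# Total contribution of irrelevant features to each test prediction
bad_contribution = X_test[:, ng:] @ weights[ng:]

accuracy = float(np.mean(np.sign(X_test @ weights) == y_test))

print(bad_weight_max, float(np.std(bad_contribution)), accuracy)
```

With these values the summed contribution of 1000 irrelevant features stays well below the contribution of 5 relevant ones, so the prediction barely changes. A prediction score alone therefore says little about whether irrelevant features were selected.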

How to overcome this problem?

• Learning curve in terms of number of features revealed
– Without re-training on manipulated data
– With on-line learning with manipulated data

• Give pre-manipulation variable values and the value of the manipulation

• Other metrics: stability, residuals, instrumental variables, missing features by design

Conclusion (more: http://clopinet.com/causality)

• We want causal discovery to become “mainstream” data mining
• We believe we need to start with “simple” standard procedures of evaluation
• Our design is close enough to a typical prediction problem, but
– Training on natural distribution
– Test on manipulated distribution

• We want to avoid pitfalls of previous challenge designs:
– Reveal only pre-manipulated variable values
– Reveal variables progressively “on demand”
