Challenges in causality: Results of the WCCI 2008 challenge. Isabelle Guyon, Clopinet; Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.; André Elisseeff and Jean-Philippe Pellet, IBM Zürich; Gregory F. Cooper, University of Pittsburgh; Peter Spirtes, Carnegie Mellon


Page 1

Challenges in causality: Results of the WCCI 2008 challenge

Isabelle Guyon, Clopinet

Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.

André Elisseeff and Jean-Philippe Pellet, IBM Zürich

Gregory F. Cooper, University of Pittsburgh

Peter Spirtes, Carnegie Mellon

Page 2

Causal discovery

What affects…

…your health?

…climate changes?

…the economy?

Which actions will have beneficial effects?

Page 3

What is causality?

• Many definitions:

– Science

– Philosophy

– Law

– Psychology

– History

– Religion

– Engineering

• “Cause is the effect concealed, effect is the cause revealed” (Hindu philosophy)

Page 4

The system

Systemic causality

External agent

Page 5

Difficulty

• A lot of “observational” data.

Correlation ≠ Causality!

• Experiments are often needed, but:

– Costly

– Unethical

– Infeasible

Page 6

Causality workbench

http://clopinet.com/causality

Page 7

Our approach

What is the causal question?

Why should we care?

What is hard about it?

Is this solvable?

Is this a good benchmark?

Page 8

Four tasks

Toy datasets

Challenge datasets

Page 9

On-line feed-back

Page 10

Toy Examples

Page 11

Causality assessment with manipulations

LUCAS0: natural

[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day.]

Page 12

Causality assessment with manipulations

LUCAS1: manipulated

[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day.]

Page 13

Causality assessment with manipulations

LUCAS2: manipulated

[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day.]

Page 14

Goal driven causality

[Figure: graph over variables numbered 0 to 11, with the example subset 4 11 2 3 1 shown next to it.]

• We define: V = variables of interest (e.g. MB, direct causes, ...).

• Participants return: S = selected subset (ordered or not).

• We assess causal relevance: Fscore = f(V, S).

Page 15

Causality assessment without manipulation?

Page 16

Using artificial “probes”

LUCAP0: natural

[Figure: the LUCAS causal graph (Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day) with added probe variables P1, P2, P3, PT.]

Page 17

Using artificial “probes”

LUCAP0: natural

[Figure: the LUCAS causal graph (Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day) with added probe variables P1, P2, P3, PT.]

Page 18

Using artificial “probes”

LUCAP1&2: manipulated

[Figure: the LUCAS causal graph (Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day) with probe variables P1, P2, P3, PT.]

Page 19

Scoring using “probes”

• What we can compute (Fscore):

– Negative class = probes (here, all “non-causes”, all manipulated).

– Positive class = other variables (may include causes and non-causes).

• What we want (Rscore):

– Positive class = causes.

– Negative class = non-causes.

• What we get (asymptotically):

Fscore = (NTruePos/NReal) Rscore + 0.5 (NTrueNeg/NReal)
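The asymptotic relation can be checked numerically. The sketch below assumes that Fscore and Rscore are both AUC-type scores (the probability that a positive is ranked above a negative) and that probes receive scores distributed like those of real non-causes. The Gaussian score distributions, the sample sizes, and the names `auc` and `simulate` are illustrative assumptions, not part of the challenge.

```python
import bisect
import random


def auc(pos, neg):
    """Probability that a random positive scores above a random negative.

    Ties count for one half.
    """
    neg_sorted = sorted(neg)
    total = 0.0
    for s in pos:
        below = bisect.bisect_left(neg_sorted, s)
        ties = bisect.bisect_right(neg_sorted, s) - below
        total += below + 0.5 * ties
    return total / (len(pos) * len(neg))


def simulate(n_causes=300, n_real_noncauses=700, n_probes=1000, seed=0):
    """Draw hypothetical relevance scores and compare both sides of the relation."""
    rng = random.Random(seed)
    # Assumption: causes tend to get higher scores than non-causes.
    causes = [rng.gauss(1.0, 1.0) for _ in range(n_causes)]
    # Assumption: real non-causes and probes share the same score distribution.
    real_noncauses = [rng.gauss(0.0, 1.0) for _ in range(n_real_noncauses)]
    probes = [rng.gauss(0.0, 1.0) for _ in range(n_probes)]

    n_real = n_causes + n_real_noncauses
    # What we can compute: real variables (positive) versus probes (negative).
    fscore = auc(causes + real_noncauses, probes)
    # What we want: causes (positive) versus all non-causes (negative).
    rscore = auc(causes, real_noncauses + probes)
    # Right-hand side of the relation on the slide.
    predicted = (n_causes / n_real) * rscore + 0.5 * (n_real_noncauses / n_real)
    return fscore, rscore, predicted


fscore, rscore, predicted = simulate()
print(f"Fscore={fscore:.3f}  Rscore={rscore:.3f}  predicted Fscore={predicted:.3f}")
```

With finite samples the two values only agree approximately; the gap should shrink as the number of variables and probes grows.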

Page 20

Results

Page 21

AUC distribution

Page 22

Methods employed

• Causal: Methods employing causal discovery techniques to unravel cause-effect relationships in the neighborhood of the target.

• Markov blanket: Methods for extracting the Markov blanket, without attempting to unravel cause-effect relationships.

• Feature selection: Methods for selecting predictive features making no explicit attempt to uncover the Markov blanket or perform causal discovery.

Page 23

Formalism: Causal Bayesian networks

• Bayesian network:

– Graph with random variables X1, X2, …, Xn as nodes.

– Dependencies represented by edges.

– Allows us to compute P(X1, X2, …, Xn) as ∏_i P(Xi | Parents(Xi)).

– Edge directions have no meaning.

• Causal Bayesian network: edge directions indicate causality.
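The factorization of the joint distribution into one conditional term per node can be illustrated on a tiny network. The sketch below assumes a hypothetical three-node chain borrowed from the LUCAS variable names (Smoking → LungCancer → Coughing); the structure and all probability values are made up for illustration.

```python
from itertools import product

# Hypothetical chain: Smoking -> LungCancer -> Coughing.
PARENTS = {
    "Smoking": (),
    "LungCancer": ("Smoking",),
    "Coughing": ("LungCancer",),
}

# P(X = 1 | parent values); all numbers are invented for this example.
P_TRUE = {
    "Smoking": {(): 0.3},
    "LungCancer": {(0,): 0.05, (1,): 0.4},
    "Coughing": {(0,): 0.2, (1,): 0.8},
}


def joint_probability(assignment):
    """Return P(X1, ..., Xn) as the product over i of P(Xi | Parents(Xi))."""
    prob = 1.0
    for var, parents in PARENTS.items():
        parent_values = tuple(assignment[p] for p in parents)
        p_true = P_TRUE[var][parent_values]
        prob *= p_true if assignment[var] == 1 else 1.0 - p_true
    return prob


# Enumerate all binary assignments; the joint must sum to one.
all_assignments = [
    dict(zip(PARENTS, values)) for values in product((0, 1), repeat=len(PARENTS))
]
total = sum(joint_probability(a) for a in all_assignments)
print(total)
```

Only three small tables are stored instead of a full table over all 2^3 joint configurations, which is the practical benefit of the factorization.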

Page 24

Causal discovery from “observational data”

Example algorithm: PC (Peter Spirtes and Clark Glymour, 1999)

Let A, B, C ∈ X and V ⊆ X. Initialize with a fully connected un-oriented graph.

1. Conditional independence. Cut the connection between A and B if ∃ V s.t. (A ⊥ B | V).

2. Colliders. In triplets A – C – B (with A and B not adjacent), if there is no subset V containing C s.t. A ⊥ B | V, orient edges as: A → C ← B.

3. Constraint-propagation. Orient edges until no change:

(i) If A → B → … → C, and A – C, then A → C.

(ii) If A → B – C, then B → C.
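The three steps can be sketched on a four-variable example. This is a simplified illustration, not the reference PC implementation: the statistical test is replaced by a lookup table (`INDEPENDENCIES`, a hypothetical oracle for the graph A → C ← B, C → D), step 1 searches all subsets rather than only neighbors, step 2 uses the recorded separating set as the usual shortcut for "no separating subset contains C", and step 3 applies only rule (ii), restricted to non-adjacent A and C.

```python
from itertools import combinations

# Conditional independencies implied by the hypothetical graph A -> C <- B, C -> D.
# Each entry is (pair of variables, conditioning set).
INDEPENDENCIES = {
    (frozenset("AB"), frozenset()),
    (frozenset("AD"), frozenset("C")),
    (frozenset("AD"), frozenset("BC")),
    (frozenset("BD"), frozenset("C")),
    (frozenset("BD"), frozenset("AC")),
}


def indep(a, b, cond):
    """Oracle standing in for a statistical test of (a independent of b given cond)."""
    return (frozenset((a, b)), frozenset(cond)) in INDEPENDENCIES


def find_separating_set(a, b, variables, indep):
    """Return some V with (a independent of b given V), or None if none exists."""
    others = [v for v in variables if v not in (a, b)]
    for size in range(len(others) + 1):
        for cond in combinations(others, size):
            if indep(a, b, cond):
                return set(cond)
    return None


def pc_sketch(variables, indep):
    # Start from a fully connected un-oriented graph.
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}

    # Step 1: cut the connection when a separating set exists.
    for a, b in combinations(variables, 2):
        found = find_separating_set(a, b, variables, indep)
        if found is not None:
            adj[a].discard(b)
            adj[b].discard(a)
            sepset[frozenset((a, b))] = found

    # Step 2: orient colliders a -> c <- b; (x, y) in `directed` means x -> y.
    directed = set()
    for c in variables:
        for a, b in combinations(sorted(adj[c]), 2):
            if b not in adj[a] and c not in sepset[frozenset((a, b))]:
                directed.add((a, c))
                directed.add((b, c))

    # Step 3, rule (ii): if a -> b - c with a and c not adjacent, then b -> c.
    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for c in adj[b] - {a}:
                unoriented = (b, c) not in directed and (c, b) not in directed
                if c not in adj[a] and unoriented:
                    directed.add((b, c))
                    changed = True
    return adj, directed


skeleton, arrows = pc_sketch(list("ABCD"), indep)
print(sorted(arrows))
```

In practice the oracle would be a conditional independence test on data, which is where the statistical difficulties arise.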

Page 25

Computational and statistical complexity

Computing the full causal graph poses:

• Computational challenges (intractable for large numbers of variables).

• Statistical challenges (difficulty of estimating conditional probabilities for many variables with few samples).

Compromise:

• Develop algorithms with good average-case performance, tractable for many real-life datasets.

• Abandon learning the full causal graph and instead develop methods that learn a local neighborhood.

• Abandon learning the fully oriented causal graph and instead develop methods that learn unoriented graphs.

Page 26

Target Y

A prototypical MB algo: HITON

Aliferis-Tsamardinos-Statnikov, 2003

Page 27

Target Y

1 – Identify variables with direct edges to the target (parent/children)

Aliferis-Tsamardinos-Statnikov, 2003

Page 28

Target Y

Aliferis-Tsamardinos-Statnikov, 2003

1 – Identify variables with direct edges to the target (parent/children)

Iteration 1: add A

Iteration 2: add B

Iteration 3: remove A because A ⊥ Y | B

etc.

[Figure: target Y with candidate variables A and B.]
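The add-then-remove iterations can be sketched as follows. This is a simplified illustration in the spirit of the parent/children step, not the published HITON code: the independence test is a hypothetical lookup (`FACTS`) for the chain A → B → Y, and candidates are assumed to arrive already ordered by association with the target.

```python
from itertools import combinations

# Hypothetical chain A -> B -> Y: A is associated with Y, but A is independent
# of Y once B is known. Entries are (variable, conditioning set).
FACTS = {("A", frozenset("B"))}


def indep_of_target(x, cond):
    """Oracle standing in for a statistical test of (x independent of Y given cond)."""
    return (x, frozenset(cond)) in FACTS


def parents_children_sketch(candidates, indep_of_target):
    """Grow a tentative parent/children set, pruning members after each addition."""
    pc = []
    for x in candidates:  # assumed sorted by decreasing association with Y
        pc.append(x)
        for member in list(pc):
            others = [v for v in pc if v != member]
            # Remove `member` if some subset of the other members separates it from Y.
            separated = any(
                indep_of_target(member, subset)
                for size in range(len(others) + 1)
                for subset in combinations(others, size)
            )
            if separated:
                pc.remove(member)
    return pc


result = parents_children_sketch(["A", "B"], indep_of_target)
print(result)
```

On this example the sketch adds A, adds B, then removes A because A is independent of Y given B, matching the iterations listed on the slide.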

Page 29

Target Y

Aliferis-Tsamardinos-Statnikov, 2003

2 – Repeat algorithm for parents and children of Y (get depth two relatives)

Page 30

Target Y

Aliferis-Tsamardinos-Statnikov, 2003

3 – Remove non-members of the MB

A member A of PCPC that is not in PC belongs to the Markov blanket if there is some member B of PC such that A becomes conditionally dependent on Y when conditioning on any subset of the remaining variables together with B.

[Figure: target Y with variables A and B.]
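The membership test for depth-two relatives can be sketched with an independence oracle. The example assumes a hypothetical graph Y → B ← A, B → D, so that PC(Y) = {B} and the depth-two candidates are A (a spouse) and D (a grandchild); the lookup table `FACTS` and the function names are illustrative only.

```python
from itertools import combinations

# Independence facts involving Y implied by the hypothetical graph
# Y -> B <- A, B -> D. Entries are (variable, conditioning set).
FACTS = {
    ("A", frozenset()),      # A independent of Y marginally
    ("D", frozenset("B")),   # D independent of Y given B
    ("D", frozenset("AB")),  # D independent of Y given A and B
}


def indep_of_target(x, cond):
    """Oracle standing in for a statistical test of (x independent of Y given cond)."""
    return (x, frozenset(cond)) in FACTS


def is_spouse(a, pc, variables):
    """True if some b in PC makes a dependent on Y for every conditioning set containing b."""
    for b in pc:
        rest = [v for v in variables if v not in (a, b)]
        subsets = (
            set(s) for size in range(len(rest) + 1) for s in combinations(rest, size)
        )
        if all(not indep_of_target(a, s | {b}) for s in subsets):
            return True
    return False


variables = ["A", "B", "D"]  # all variables except the target Y
pc = ["B"]
depth_two = ["A", "D"]       # members of PCPC that are not in PC
markov_blanket = pc + [v for v in depth_two if is_spouse(v, pc, variables)]
print(markov_blanket)
```

Here A is kept because conditioning on the common child B makes it dependent on Y, while D is dropped because B separates it from Y.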

Page 31

Target Y

Aliferis-Tsamardinos-Statnikov, 2003

4 – Orient edges

1. Colliders:

• The presence of a spouse determines a collider.

• The target may also be a collider (B and C become dependent given Y).

[Figure: target Y with its spouses and colliders marked, and variables B and C.]

Page 32

Target Y

Aliferis-Tsamardinos-Statnikov, 2003

4 – Orient edges

1. Colliders:

• The presence of a spouse determines a collider.

• The target may also be a collider (B and C become dependent given Y).

2. Orient remaining edges.

[Figure: target Y with its spouses and colliders marked, and variables B and C.]

Page 33

Additional Bells and Whistles

• The basic algorithms make simplifying assumptions:

– Faithfulness (any conditional independence between two variables results in an absence of direct edge.)

– Causal sufficiency (there are no unobserved common causes of the observed variables.)

• Laura E. Brown & Ioannis Tsamardinos:

– Violations of “faithfulness”: select product of features.

– Violation of “causal sufficiency”: use Y structures.

Page 34

Discussion

Page 35

Top ranking methods

• According to the rules of the challenge:

– Yin-Wen Chang: SVM => best prediction accuracy on REGED and CINA.

– Gavin Cawley: Causal explorer + linear ridge regression ensembles => best prediction accuracy on SIDO and MARTI.

• According to pairwise comparisons:

– Jianxin Yin and Prof. Zhi Geng’s group: Partial Orientation and Local Structural Learning => best on Pareto front, new original causal discovery algorithm.

Page 36

Pairwise comparisons

[Figure: pairwise comparison of participants. Labels:]

Gavin Cawley

Yin-Wen Chang

Mehreen Saeed

Alexander Borisov

E. Mwebaze & J. Quinn

H. Jair Escalante

J.G. Castellano

Chen Chu An

Louis Duclos-Gosselin

Cristian Grozea

H.A. Jen

J. Yin & Z. Geng Gr.

Jinzhu Jia

Jianming Jin

L.E.B & Y.T.

M.B.

Vladimir Nikulin

Alexey Polovinkin

Marius Popescu

Ching-Wei Wang

Wu Zhili

Florin Popescu

CaMML Team

Nistor Grozavu

Page 37

Causal vs. non-causal

Jianxin Yin: causal

Vladimir Nikulin: non-causal

Page 38

Using manip-MB as feature set using a causal model

Unmanipulated (training)

Manipulation #1 (test)

Heuristic: (1) Use the post-manipulation MB as feature set; (2) train a classifier to predict Y on training data (from the unmanipulated distribution).

Manipulation #2 (test)

Problem: Manipulated children of the target may remain in the post-manipulation MB (if they are also spouses) but with a different dependency to the target.

Page 39

MB is not the best feature set?

Some features outside the MB may enhance predictivity if:

a. Some MB features go undetected (e.g. the direct causes are children of a common ancestor).

b. The predictor is too “weak” (e.g. the relationship to the target is non-linear but the predictor is linear).

[Figure: panels (a) and (b); panel (b) shows variables X, Y, Z with y = a x² and z = x².]
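Case (b) can be made concrete with the relations y = a x² and z = x² from the figure. The sketch below assumes that X is the only Markov blanket member of Y and that Z is another function of X outside the blanket (an assumption about the figure, which is not recoverable from the transcript); the value of `a` and the sample points are arbitrary.

```python
def pearson(u, v):
    """Pearson correlation, i.e. how well a linear fit of v on u can do."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = sum((a - mean_u) ** 2 for a in u) ** 0.5
    std_v = sum((b - mean_v) ** 2 for b in v) ** 0.5
    return cov / (std_u * std_v)


a = 3.0                                 # arbitrary coefficient
x = [-2.0, -1.0, 0.0, 1.0, 2.0]         # symmetric sample of X
y = [a * xi ** 2 for xi in x]           # target: y = a x^2
z = [xi ** 2 for xi in x]               # feature outside the MB: z = x^2

corr_xy = pearson(x, y)  # a linear predictor on X alone sees no signal
corr_zy = pearson(z, y)  # a linear predictor on Z fits the target exactly
print(corr_xy, corr_zy)
```

A linear model restricted to the Markov blanket {X} cannot predict Y here, while adding Z makes the problem linear, which is the sense in which a "weak" predictor benefits from features outside the MB.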

Page 40

Insensitivity to irrelevant features

Simple univariate predictive model, binary target and features, all relevant features correlate perfectly with the target, all irrelevant features randomly drawn. With 98% confidence, abs(feat_weight) < w and Σ_i w_i x_i < v.

n_g: number of “good” (relevant) features

n_b: number of “bad” (irrelevant) features

m: number of training examples.

Page 41

Conclusion

• Causal discovery from observational data is not an impossible task, but a very hard one.

• This points to the need for further research and benchmarks.

• Don’t miss the “pot-luck challenge”

http://clopinet.com/causality

Page 42

1) Causal Feature Selection. I. Guyon, C. Aliferis, A. Elisseeff. In “Computational Methods of Feature Selection”, Huan Liu and Hiroshi Motoda Eds., Chapman and Hall/CRC Press, 2007.

2) Design and Analysis of the Causation and Prediction Challenge. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, A. Statnikov. JMLR workshop proceedings, in press.

http://clopinet.com/causality