Causality challenge #2: Pot-Luck

Isabelle Guyon, Clopinet
Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.
André Elisseeff and Jean-Philippe Pellet, IBM Zürich
Gregory F. Cooper, Pittsburgh University
Peter Spirtes, Carnegie Mellon

Motivations

• Motivations
• Learning causal structure
  – Cross-sectional studies: … from experiments; … without experiments; Equivalent MB
  – Longitudinal studies
• Bring your own problem(s)

Causality Workbench

• February 2007: Project starts. Initial funding of the EU Pascal network.

• August 15, 2007: Two-year grant from the US National Science Foundation.

• December 15, 2007: Workbench goes live. First causality challenge: causation and prediction.

• June 3-4, 2008: WCCI 2008, workshop to discuss the results of the first challenge.

• September 15, 2008: Start pot-luck challenge. Target: NIPS 2008.

• Fall, 2008: Start developing an interactive workbench.

Why a new challenge?

• Causality challenge #1: favor “depth”
  – Single well-defined task
  – Rigor of performance assessment

• Causality challenge #2: favor “breadth”
  – Many different tasks
  – Encourage creativity

http://clopinet.com/causality

Pot-Luck challenge

• CYTO: Causal Protein-Signaling Networks in human T cells. Learn a protein signaling network from multicolor flow cytometry data. N=11 proteins, P~800 samples per experimental condition. E=9 conditions.

• LOCANET: LOcal CAusal NETwork. Find the local causal structure around a given target variable (depth 3 network) in REGED, CINA, SIDO, MARTI.

• PROMO: Simulated marketing task. Time series of 1000 promotion variables and 100 product sales. Predict a 1000x100 boolean influence matrix, indicating for each (i,j) element whether the ith promotion has a causal influence on the sales of the jth product. Data is provided as time series, with a daily value for each variable for three years.

• SIGNET: Abscisic Acid Signaling Network. Determine the set of 43 boolean rules that describe the interactions of the nodes within a plant signaling network. 300 separate Boolean pseudodynamic simulations of the true rules. Model inspired by a true biological system.

• TIED: Target Information Equivalent Dataset. Illustrates a case in which there are many equivalent Markov boundaries. Find them all.

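[Slide annotations: each task is tagged “real” or “artif” (artificial data), and two tasks are tagged “self eval”. The transcript does not preserve which tag belongs to which task.]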

Learning causal structure

What is causality?

• Many definitions.
• Pragmatic (engineering) view: predicting the consequences of ACTIONS.
• Distinct from making predictions in a stationary environment.
• Canonical methodology: designed experiments.
• Causal discovery from observational data.

The “language” of causal Bayesian networks

• Bayesian network:
  – Graph with random variables X1, X2, …, Xn as nodes.
  – Dependencies represented by edges.
  – Allows us to compute P(X1, X2, …, Xn) as ∏i P(Xi | Parents(Xi)).
  – Edge directions have no meaning.

• Causal Bayesian network: edge directions indicate causality.
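The factorization P(X1, …, Xn) = ∏i P(Xi | Parents(Xi)) can be illustrated with a minimal sketch. It assumes a three-node fragment Smoking → Lung Cancer ← Genetics; all probability values are invented for illustration and are not taken from the challenge data.

```python
from itertools import product

# Hypothetical conditional probability tables (values invented for illustration).
p_smoking = {1: 0.3, 0: 0.7}      # P(S)
p_genetics = {1: 0.1, 0: 0.9}     # P(G)
p_lc_given = {(1, 1): 0.6, (1, 0): 0.3, (0, 1): 0.2, (0, 0): 0.05}  # P(LC=1 | S, G)

def joint(s, g, lc):
    """P(S, G, LC) = P(S) * P(G) * P(LC | S, G): one factor per node given its parents."""
    p_lc1 = p_lc_given[(s, g)]
    return p_smoking[s] * p_genetics[g] * (p_lc1 if lc == 1 else 1.0 - p_lc1)

# The factors define a proper distribution: the joint sums to 1 over all states.
total = sum(joint(s, g, lc) for s, g, lc in product((0, 1), repeat=3))
# Marginal probability of lung cancer, obtained by summing out the parents.
p_lc = sum(joint(s, g, 1) for s, g in product((0, 1), repeat=2))
print(f"sum over all states = {total:.6f}")
print(f"P(LC=1) = {p_lc:.4f}")
```

Only the local tables need to be stored; any joint or marginal probability follows from multiplying and summing them.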

Small example: LUCAS0 (natural)

[Figure: causal graph over the variables Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, and Fatigue, with the Markov boundary marked.]

Arrows indicate “mechanisms”

If Lung Cancer (LC) is determined by Smoking (S) and Genetics (G),

• In the language of BN, use the data table:
  P(LC=1 | S=1, G=1) = … , P(LC=0 | S=1, G=1) = …
  P(LC=1 | S=1, G=0) = … , P(LC=0 | S=1, G=0) = …
  P(LC=1 | S=0, G=1) = … , P(LC=0 | S=0, G=1) = …
  P(LC=1 | S=0, G=0) = … , P(LC=0 | S=0, G=0) = …

• In the language of Structural Equation Models (SEM), use:
  LC = f(S, G) + noise
  where usually f is a linear function.
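The SEM form LC = f(S, G) + noise can be sketched as a sampling routine. This is a minimal illustration assuming a linear f with Gaussian noise; the function name, weights, and noise level are hypothetical choices, not values from the slides.

```python
import random

def sample_lung_cancer_score(smoking, genetics, rng, w_s=0.8, w_g=0.5, noise_sd=0.1):
    """Linear SEM: LC = w_s * S + w_g * G + noise (weights are hypothetical)."""
    return w_s * smoking + w_g * genetics + rng.gauss(0.0, noise_sd)

rng = random.Random(0)
for s, g in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    draws = [sample_lung_cancer_score(s, g, rng) for _ in range(2000)]
    print(f"S={s} G={g}: mean LC score = {sum(draws) / len(draws):.2f}")
```

Where the table form stores one probability per parent configuration, the equation form stores one weight per parent plus a noise model.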

Common simplifications

– Assume a Markov process
– Assume a DAG
– Assume causal sufficiency (no hidden common cause)
– Assume stability or faithfulness (no particular parameterization implying dependencies not reflected by the structure)
– Assume linearity of relationships
– Assume Gaussianity of PDFs
– Discard relationships of low statistical significance
– Focus on a local neighborhood of a target variable
– Learn unoriented or partially oriented graphs
– Assume uniqueness of the Markov boundary

How about time?

[Figure: cross-sectional study, shown as a graph over nodes numbered 0 to 11.]

How about time?

[Figure: the cross-sectional study graph over nodes 0 to 11, next to a longitudinal study in which the node sequence 0 to 11 is repeated four times.]

Learning causal structure from “cross-sectional” studies: CYTO, LOCANET, TIED

Causal models as particular “generative models”

• Imagine we have “prior knowledge” about a few alternative plausible “causal models” (we basically know the architecture).

• Fit the parameters of the model to data.

• Select the model based on goodness of fit (score), perhaps penalizing higher complexity models.

• Could two models have identical scores?
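A small sketch suggests the answer can be yes. It fits two alternative architectures, X → Y and Y → X, to the same binary data by maximum likelihood and compares the log-likelihood scores. The data counts are hypothetical, and log-likelihood is only one possible choice of score.

```python
import math
from collections import Counter

# Hypothetical paired binary observations (x, y).
data = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 15 + [(1, 1)] * 35

def loglik_parent_child(pairs):
    """Maximum log-likelihood of the model parent -> child; pairs are (parent, child)."""
    n = len(pairs)
    parent_counts = Counter(p for p, _ in pairs)
    joint_counts = Counter(pairs)
    ll = 0.0
    for (p, c), n_pc in joint_counts.items():
        # P(parent) * P(child | parent), both estimated by relative frequencies.
        ll += n_pc * (math.log(parent_counts[p] / n) + math.log(n_pc / parent_counts[p]))
    return ll

ll_x_to_y = loglik_parent_child(data)
ll_y_to_x = loglik_parent_child([(y, x) for x, y in data])
print(f"score of X -> Y: {ll_x_to_y:.6f}")
print(f"score of Y -> X: {ll_y_to_x:.6f}")
```

Both models have the same number of free parameters (three), so a complexity penalty based on parameter count does not break the tie either.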

Key types of causal relationships 1: direct cause

[Figure: LUCAS graph with Lung Cancer and Smoking highlighted and labeled “Direct cause”.]

Key types of causal relationships 2: indirect cause (chain)

AN ⊥ LC | S

[Figure: LUCAS graph with Anxiety and Lung Cancer highlighted.]

Key types of causal relationships 3: confounder (fork)

YF ⊥ LC | S

[Figure: LUCAS graph with Yellow Fingers and Lung Cancer highlighted.]

How this might look in data

[Figure: plot of Lung cancer against Yellow Fingers.]

Simpson’s paradox

YF ⊥ LC | S

How this might look in data

[Figure: plot of Lung cancer against Yellow Fingers, with the non-smoking and smoking groups shown separately.]
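The confounder pattern can be reproduced with a short simulation. The sketch below assumes a fork in which Smoking causes both Yellow Fingers and Lung Cancer; the rates are hypothetical. Yellow Fingers looks predictive of Lung Cancer overall, but the association largely vanishes within the smoking and non-smoking groups.

```python
import random

rng = random.Random(1)

def bernoulli(p):
    """Return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

samples = []
for _ in range(20000):
    s = bernoulli(0.4)
    yf = bernoulli(0.8 if s else 0.1)   # Smoking -> Yellow Fingers (hypothetical rates)
    lc = bernoulli(0.5 if s else 0.1)   # Smoking -> Lung Cancer (hypothetical rates)
    samples.append((s, yf, lc))

def p_lc_given_yf(rows, yf_value):
    """Empirical P(LC=1 | YF=yf_value) within the given rows."""
    sel = [lc for _, yf, lc in rows if yf == yf_value]
    return sum(sel) / len(sel)

def risk_difference(rows):
    """P(LC=1 | YF=1) - P(LC=1 | YF=0): zero when YF carries no information on LC."""
    return p_lc_given_yf(rows, 1) - p_lc_given_yf(rows, 0)

overall = risk_difference(samples)
smokers = risk_difference([r for r in samples if r[0] == 1])
non_smokers = risk_difference([r for r in samples if r[0] == 0])
print(f"overall: {overall:+.3f}  smokers: {smokers:+.3f}  non-smokers: {non_smokers:+.3f}")
```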

Markov equivalence

X1 ⊥ Y | X2

X1 ← X2 → Y: P(X1, X2, Y) = P(X1 | X2, Y) P(Y | X2) P(X2)

X1 → X2 → Y: P(X1, X2, Y) = P(Y | X2, X1) P(X2 | X1) P(X1)

X1 ← X2 ← Y: P(X1, X2, Y) = P(X1 | X2, Y) P(X2 | Y) P(Y)
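As a worked step, the conditional independence of X1 and Y given X2 means P(X1 | X2, Y) = P(X1 | X2) and P(Y | X2, X1) = P(Y | X2). Starting from the fork and applying Bayes' rule to one pair of factors at a time gives the two chains, so the three structures describe the same family of joint distributions:

```latex
\begin{align*}
P(X_1, X_2, Y)
  &= P(X_1 \mid X_2)\, P(Y \mid X_2)\, P(X_2)
     && \text{fork } X_1 \leftarrow X_2 \rightarrow Y \\
  &= P(X_1 \mid X_2)\, P(X_2 \mid Y)\, P(Y)
     && \text{since } P(Y \mid X_2)\, P(X_2) = P(X_2 \mid Y)\, P(Y) \\
  &= P(Y \mid X_2)\, P(X_2 \mid X_1)\, P(X_1)
     && \text{since } P(X_1 \mid X_2)\, P(X_2) = P(X_2 \mid X_1)\, P(X_1)
\end{align*}
```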

Key types of causal relationships 4: collider (V-structure)

AL ⊥̸ LC | C (AL and LC are dependent given C)

[Figure: LUCAS graph with Allergy and Lung Cancer highlighted.]

How this might look in data

[Figure: plot of Lung cancer against Allergy.]

How this might look in data

[Figure: plot of Lung cancer against Allergy, with the groups Coughing=1 and Coughing=0 shown separately.]

No Markov equivalence

Colliders (V-structures): X1 ⊥̸ Y | X2 (X1 and Y are dependent given X2)

X1 → X2 ← Y

P(X1, X2, Y) = P(X2 | X1, Y) P(X1) P(Y)
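A simulation sketch of the collider Allergy → Coughing ← Lung Cancer shows the signature that sets it apart from chains and forks: the two causes are independent overall but become dependent once the common effect is observed. The noisy-OR form and all rates below are assumptions made for illustration.

```python
import random

rng = random.Random(2)

rows = []
for _ in range(20000):
    allergy = 1 if rng.random() < 0.3 else 0
    cancer = 1 if rng.random() < 0.2 else 0
    # Coughing is a common effect of both causes (noisy OR, hypothetical rates).
    p_cough = 1.0 - (1.0 - 0.05) * (1.0 - 0.7 * allergy) * (1.0 - 0.8 * cancer)
    cough = 1 if rng.random() < p_cough else 0
    rows.append((allergy, cancer, cough))

def cancer_rate_gap(subset):
    """P(LC=1 | AL=1) - P(LC=1 | AL=0) within the given rows."""
    with_al = [lc for al, lc, _ in subset if al == 1]
    without_al = [lc for al, lc, _ in subset if al == 0]
    return sum(with_al) / len(with_al) - sum(without_al) / len(without_al)

marginal_gap = cancer_rate_gap(rows)                              # close to zero
coughing_gap = cancer_rate_gap([r for r in rows if r[2] == 1])    # clearly negative
print(f"marginal gap: {marginal_gap:+.3f}")
print(f"gap among coughing patients: {coughing_gap:+.3f}")
```

Among coughing patients, knowing about an allergy makes lung cancer less likely, because the allergy already explains the cough.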

Structural methods

1. Build unoriented graph (using conditional independencies).
2. Orient colliders.
3. Add more arrows by constraint propagation without creating new colliders.

[Figure: the example graph over nodes 0 to 11, shown twice.]
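Steps 1 and 2 can be sketched as follows. This is a simplified illustration, not a specific published algorithm: it searches all conditioning sets exhaustively, relies on an assumed independence oracle `independent(a, b, cond)` (in practice a statistical test), and omits step 3. The four-node example S → LC → C ← AL and its list of independencies are hypothetical inputs.

```python
from itertools import combinations

def learn_structure(nodes, independent):
    """Skeleton search followed by collider orientation."""
    adjacent = {frozenset(pair) for pair in combinations(nodes, 2)}
    sepset = {}
    # Step 1: drop the edge a-b as soon as some conditioning set separates a and b.
    for a, b in combinations(nodes, 2):
        others = [n for n in nodes if n not in (a, b)]
        for size in range(len(others) + 1):
            found = next((set(c) for c in combinations(others, size)
                          if independent(a, b, set(c))), None)
            if found is not None:
                adjacent.discard(frozenset((a, b)))
                sepset[frozenset((a, b))] = found
                break
    # Step 2: orient a -> m <- b for every unshielded triple a - m - b
    # whose middle node is not in the separating set of a and b.
    arrows = set()
    for a, b in combinations(nodes, 2):
        if frozenset((a, b)) in adjacent:
            continue
        for m in nodes:
            if m in (a, b):
                continue
            if (frozenset((a, m)) in adjacent and frozenset((b, m)) in adjacent
                    and m not in sepset[frozenset((a, b))]):
                arrows.add((a, m))
                arrows.add((b, m))
    return adjacent, arrows

# Hypothetical oracle listing the independencies implied by S -> LC -> C <- AL.
facts = {
    (frozenset(("S", "AL")), frozenset()),
    (frozenset(("S", "AL")), frozenset({"LC"})),
    (frozenset(("S", "AL")), frozenset({"LC", "C"})),
    (frozenset(("S", "C")), frozenset({"LC"})),
    (frozenset(("S", "C")), frozenset({"LC", "AL"})),
    (frozenset(("LC", "AL")), frozenset()),
    (frozenset(("LC", "AL")), frozenset({"S"})),
}

def oracle(a, b, cond):
    return (frozenset((a, b)), frozenset(cond)) in facts

nodes = ["S", "LC", "AL", "C"]
skeleton, arrows = learn_structure(nodes, oracle)
print(sorted(tuple(sorted(edge)) for edge in skeleton))
print(sorted(arrows))
```

In this example the collider at C is oriented, while the edge between S and LC stays unoriented: the independencies alone cannot distinguish S → LC from LC → S.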

… towards CYTO: using experiments to learn the causal structure

Manipulating a single variable 1: direct cause

Smoking manipulated (disconnected from its direct causes): remains predictive of LC.

[Figure: LUCAS graph.]

Manipulating a single variable 2: indirect cause

Anxiety manipulated: remains predictive of Lung Cancer.

[Figure: LUCAS graph.]

Manipulating a single variable 3: consequence of common cause (correlated, but not cause)

Yellow Fingers manipulated: no longer predictive of LC.

[Figure: LUCAS graph.]

Manipulating a single variable 4: direct cause

Genetics manipulated: remains predictive of LC and AD.

[Figure: LUCAS graph, annotated with a question mark.]

Manipulating a single variable 5 (slide label: “Direct cause”)

Attention disorder manipulated: no longer predictive of Genetics.

[Figure: LUCAS graph.]
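The effect of manipulating a single variable can be sketched by simulation. The code assumes a hypothetical fragment Anxiety → Smoking → {Yellow Fingers, Lung Cancer} with invented rates, and models a manipulation as a coin flip that disconnects the variable from its direct causes.

```python
import random

def simulate(n, rng, do_yellow_fingers=False, do_smoking=False):
    """Draw n samples; a manipulated variable is set by a fair coin flip."""
    rows = []
    for _ in range(n):
        anxiety = rng.random() < 0.5
        if do_smoking:
            smoking = rng.random() < 0.5
        else:
            smoking = rng.random() < (0.7 if anxiety else 0.2)
        if do_yellow_fingers:
            yellow = rng.random() < 0.5
        else:
            yellow = rng.random() < (0.8 if smoking else 0.1)
        cancer = rng.random() < (0.5 if smoking else 0.1)
        rows.append({"S": smoking, "YF": yellow, "LC": cancer})
    return rows

def gap(rows, var):
    """P(LC | var=1) - P(LC | var=0): a crude measure of how predictive var is of LC."""
    on = [r["LC"] for r in rows if r[var]]
    off = [r["LC"] for r in rows if not r[var]]
    return sum(on) / len(on) - sum(off) / len(off)

rng = random.Random(5)
print(f"YF, natural:     {gap(simulate(20000, rng), 'YF'):+.3f}")
print(f"YF, manipulated: {gap(simulate(20000, rng, do_yellow_fingers=True), 'YF'):+.3f}")
print(f"S,  manipulated: {gap(simulate(20000, rng, do_smoking=True), 'S'):+.3f}")
```

A manipulated cause (Smoking) keeps its predictive power, while a manipulated consequence of a common cause (Yellow Fingers) loses it.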

The CYTO problem (Karen Sachs et al.)

[Figure: T-cell signaling network with nodes MEK3/6, MAPKKK, PLC, Erk1/2, Mek1/2, Raf, PKC, p38, Akt, MEK4/7, JNK, LAT, Lck, VAV, SLP-76, RAS, PKA, CD28, CD3, PI3K, LFA-1, Cytohesin, Zap70, PIP3, PIP2, and JAB-1. The numbers 1 to 10 on the graph refer to the perturbations listed below.]

Activators
1. α-CD3
2. α-CD28
3. ICAM-2
4. PMA
5. β2cAMP

Inhibitors
6. G06976
7. AKT inh
8. Psitect
9. U0126
10. LY294002

… towards LOCANET: learning the causal structure without experiments to predict the consequences of future actions.

What if we cannot experiment?

• Experiments may be infeasible, costly or unethical

• Using only observations we may want to predict the effect of new policies.

• Policies may consist of manipulating several variables.

Manipulating a few variables: LUCAS1 (manipulated)

[Figure: LUCAS graph with a few variables manipulated and the Markov boundary marked.]

Manipulating all variables: LUCAS2 (manipulated)

[Figure: LUCAS graph with all variables manipulated and the Markov boundary marked.]

Causality challenge #1: causation and prediction

• Task: Predict the target (e.g., Lung cancer) in “unmanipulated” or “manipulated” test data.

• Goals:
  – Introduce ML people to causal discovery problems.
  – Investigate ties between causation and prediction.

• Findings:
  – Participants used either causal or non-causal feature selection.
  – Good causal discovery (feature set containing the “manipulated” MB) correlated with good predictions.
  – However, some participants using non-causal feature selection obtained good prediction results.

Causality challenge #2: the LOCANET problem

• Task: Find the local causal structure around a given target variable (depth 3 network) in REGED, CINA, SIDO, MARTI.

• Goal: Analyze in finer detail the extent to which causal discovery methods recover the causal structure, and how this affects the prediction of the target values.

TIED: equivalent Markov boundaries

Equivalent Markov boundaries

[Figure: target Y with its Markov boundary.]

Many almost identical measurements of the same (hidden) variable can lead to many statistically indistinguishable Markov boundaries.

Target Information Equivalence (TIE)

Two disjoint subsets of variables V1 and V2 are Target Information Equivalent (TIE) with respect to target Y iff:

• V1 ⊥̸ Y (V1 and Y are dependent)

• V2 ⊥̸ Y (V2 and Y are dependent)

• V1 ⊥ Y | V2

• V2 ⊥ Y | V1

Alexander Statnikov & Constantin Aliferis
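The four conditions can be checked directly on a small discrete joint distribution. The sketch below is restricted to single variables rather than subsets, and the distribution is a hypothetical construction (not the TIED data) in which V1 and V2 are not functions of each other yet carry the same information about Y.

```python
from itertools import product

# Hypothetical joint distribution over (V1, V2, Y), built from P(V1, V2) and P(Y=1 | V1).
p_y1_given_v1 = {0: 0.2, 1: 0.2, 2: 0.8}
p_v1_v2 = {(0, 0): 0.3, (1, 0): 0.2, (2, 1): 0.25, (2, 2): 0.25}
joint = {}
for (v1, v2), p in p_v1_v2.items():
    joint[(v1, v2, 1)] = p * p_y1_given_v1[v1]
    joint[(v1, v2, 0)] = p * (1.0 - p_y1_given_v1[v1])

def marginal(dist, keep):
    """Sum out every coordinate not listed in `keep` (a tuple of positions)."""
    out = {}
    for state, p in dist.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def independent(dist, a, b, cond=(), tol=1e-9):
    """Test a independent of b given cond via P(a,b,c) P(c) == P(a,c) P(b,c) for all states."""
    p_abc = marginal(dist, (a, b) + cond)
    p_ac = marginal(dist, (a,) + cond)
    p_bc = marginal(dist, (b,) + cond)
    p_c = marginal(dist, cond)
    a_vals = {k[0] for k in p_ac}
    b_vals = {k[0] for k in p_bc}
    for av, bv, c in product(a_vals, b_vals, p_c):
        lhs = p_abc.get((av, bv) + c, 0.0) * p_c[c]
        rhs = p_ac.get((av,) + c, 0.0) * p_bc.get((bv,) + c, 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True

def is_tie(dist, v1, v2, y):
    """The four TIE conditions for single variables at positions v1, v2, y."""
    return (not independent(dist, v1, y) and not independent(dist, v2, y)
            and independent(dist, v1, y, (v2,)) and independent(dist, v2, y, (v1,)))

print(is_tie(joint, 0, 1, 2))
```

Each variable alone predicts Y, and once one of them is known the other adds nothing, so either could serve in a Markov boundary of Y.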

TIE Data (TIED): exact equivalence

[Figure: small example of the type of relationships implemented in TIED, a diagram relating the values (0 to 3) of the variables X1, X2, X3, X11, and Y.]

The following TIE relations hold in the data:

TIE_Y(X1, X2), TIE_Y(X1, X3), TIE_Y(X1, X11), TIE_Y(X2, X3), TIE_Y(X2, X11), TIE_Y(X3, X11)

TIE_X11(X1, X2), TIE_X11(X1, X3), TIE_X11(X2, X3)

Notice that variables X1, X2, X3, X11, and Y are not deterministically related.

Alexander Statnikov & Constantin Aliferis

Learning causal structure from “longitudinal” studies: SIGNET, PROMO

SIGNET: a plant signaling network

• Plants lose water and take in carbon dioxide through microscopic pores.

• During drought, the plant hormone abscisic acid (ABA) inhibits pore opening (important for the genetic engineering of new drought-resistant plants).

• Unraveling the ABA signal transduction network took years of research. A recent dynamic model synthesizes many findings (Li, Assmann, Albert, PLOS, 2006).

• The model is used by Jenkins and Soni to generate artificial data. The problem is to reconstruct the network from the data.

Abscisic Acid Signaling Network

Li, Assmann, Albert, PLOS, 2006

SIGNET: sample data

[Figure: sample data, shown as rows of binary node states.]

- Boolean model; asynchronous updates
- 43 nodes
- 300 simulations

[Figure: example of asynchronous updates for a 4-node network, plotted against time.]
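Asynchronous updating of a Boolean model can be sketched on a 4-node network. The rules below are hypothetical and are not the SIGNET rules. The scheme shown, updating every node once per time step in a fresh random order, is one common asynchronous scheme and is assumed here; the slides do not specify the one used to generate the data.

```python
import random

# Hypothetical update rules for a 4-node Boolean network (not the SIGNET rules).
rules = {
    "A": lambda s: s["A"],                  # input node keeps its value
    "B": lambda s: s["A"] and not s["D"],
    "C": lambda s: s["B"] or s["A"],
    "D": lambda s: s["C"],
}

def encode(state):
    """Render a state as a string of 0/1 digits in alphabetical node order."""
    return "".join(str(int(state[n])) for n in sorted(state))

def asynchronous_run(initial, steps, rng):
    """At each time step, update every node once in a fresh random order.

    Each node sees the most recent values of the others, including nodes
    already updated earlier in the same step.
    """
    state = dict(initial)
    trajectory = [encode(state)]
    for _ in range(steps):
        order = list(rules)
        rng.shuffle(order)
        for node in order:
            state[node] = bool(rules[node](state))
        trajectory.append(encode(state))
    return trajectory

start = {"A": True, "B": False, "C": False, "D": False}
for row in asynchronous_run(start, 5, random.Random(0)):
    print(row)
```

Because the update order changes from run to run, repeated simulations from the same initial state can follow different trajectories, which is what makes reconstructing the rules from data harder than in the synchronous case.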

PROMO: simulated marketing task

• 100 products
• 1000 promotions
• 3 years of daily data
• Goal: quantify the effect of promotions on sales

[Figure: diagram linking promotions to products.]

Jean-Philippe Pellet

PROMO: schematically…

The difficulties include:

- non-iid samples
- seasonal effects
- promotions are binary, sales are continuous
- the problem is more about quantifying the relationships than determining the causal skeleton

[Figure: schematic with the labels “1000” (promotions), “100” (products), and “other”.]
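A toy version of the task shows how a seasonal effect can be handled when quantifying a promotion's influence. The sketch simulates one product and two promotions over three years of daily data, then compares sales with and without each promotion within the same weekday. The generating model, the weekday pattern, and the threshold are all hypothetical; lagged effects and dependence between days are ignored.

```python
import random

rng = random.Random(3)
days = 3 * 365
weekday_effect = [0.0, 1.0, 2.0, 3.0, 4.0, 8.0, 6.0]    # hypothetical seasonal pattern
promo_a = [rng.random() < 0.2 for _ in range(days)]      # influences sales
promo_b = [rng.random() < 0.2 for _ in range(days)]      # has no influence
sales = [50.0 + weekday_effect[t % 7] + 5.0 * promo_a[t] + rng.gauss(0.0, 1.0)
         for t in range(days)]

def seasonally_adjusted_effect(promo, series):
    """Average, over weekdays, of mean sales with the promotion minus without it."""
    gaps = []
    for wd in range(7):
        on = [s for t, s in enumerate(series) if t % 7 == wd and promo[t]]
        off = [s for t, s in enumerate(series) if t % 7 == wd and not promo[t]]
        if on and off:
            gaps.append(sum(on) / len(on) - sum(off) / len(off))
    return sum(gaps) / len(gaps)

effect_a = seasonally_adjusted_effect(promo_a, sales)
effect_b = seasonally_adjusted_effect(promo_b, sales)
# One row of a Boolean influence matrix, using an arbitrary threshold.
influences = [abs(e) > 2.5 for e in (effect_a, effect_b)]
print(f"estimated effect of promotion A: {effect_a:+.2f}")
print(f"estimated effect of promotion B: {effect_b:+.2f}")
print(influences)
```

Comparing within weekdays removes the weekly pattern from the estimate; the same stratification idea would extend to monthly or yearly seasonality.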

Pot-luck challenge: bring your own problem

From NIPS 2006 workshop…

1. Predict the consequences of a manipulation (similar to a usual predictive modeling task, but the test data is no longer distributed in the same way as the training data; the system undergoes a manipulation to produce the test data).
2. Determine what manipulations are needed to reach a desired system state with maximum probability (e.g., select variables and propose values to achieve a certain value of a response/target variable, with perhaps a cost per variable).
3. Propose system queries to acquire more training data, i.e. design experiments, with perhaps an associated cost per variable and per sample, and perhaps with constraints on variables that cannot be controlled.
4. Determine all causal relationships between variables.
5. Determine a local causal region around a response/target variable (causal adjacency).
6. Determine the source cause(s) for a response/target variable.
7. Determine for all variables whether they are, with respect to a response/target variable: cause, effect, consequence of a common cause, cause of a common effect, or unrelated.
8. Predict the existence of unmeasured variables (not part of the set of variables provided in the data), which are potential confounders (are common causes of an observed variable and the target).
9. Predict which variables called “relevant” by feature selection algorithms are potentially causally irrelevant because their correlation to the target is the result of an experimental artifact (e.g., sampling bias or systematic error).
10. Determine a causal order of all variables.
11. Determine a causal direction in time series data in which one variable is causing the other.
12. Determine the direction of time in a time series (mostly of fundamental rather than practical interest).
13. Incorporate prior knowledge in causal discovery.
14. Predict counterfactuals.

http://clopinet.com/causality

• September 15, 2008: challenge start.
• October 15, 2008: deadline for (optional) submission of milestone challenge results.
• October 24, 2008: workshop abstracts due.
• November 12, 2008: challenge ends (last day to submit challenge results).
• November 21, 2008: JMLR proceedings paper submission deadline.
• December 12, 2008: challenge results publicly released; workshop.

Prizes

• Four prizes (free NIPS workshop entrance or $200).
  – Best solution to one or more problems: 3 prizes.
  – Best problem: 1 prize.

• All competitors must submit a 6-page paper.
• Criteria: performance/usefulness, novelty/originality, sanity, insight, reproducibility, clarity.