Weakening the Causal Faithfulness Assumption

Jiji Zhang, Lingnan University

Based on joint work with Peter Spirtes

Uploaded by: clem

Posted on 08-Jan-2016




Markov and Faithfulness Assumptions

Suppose the set of observed variables V is causally sufficient and its causal structure can be properly represented by a DAG over V.

A statement of conditional independence is said to be entailed by a DAG if it is entailed by the Markov property of the DAG.

Causal Markov Assumption: Every conditional independence statement entailed by the causal DAG over V is satisfied by the joint distribution over V.

Causal Faithfulness Assumption: Every conditional independence statement satisfied by the joint distribution over V is entailed by the causal DAG over V.
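A standard way to make "entailed by the Markov property" operational is d-separation: a DAG's Markov property entails exactly the conditional independences that correspond to d-separations in it. The Python sketch below assumes a DAG encoded as a dict from each variable to its set of parents; `d_separated` and `entailed_independencies` are hypothetical helper names, and the exhaustive enumeration is only meant for a handful of variables.

```python
from itertools import combinations


def ancestors(dag, nodes):
    """Return `nodes` together with all of their ancestors in `dag`.

    `dag` maps each variable to the set of its parents (an assumed encoding).
    """
    result, stack = set(nodes), list(nodes)
    while stack:
        for parent in dag[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def d_separated(dag, x, z, s):
    """True iff x and z are d-separated by the set s in dag.

    Moralized ancestral graph criterion: keep only ancestors of {x, z} | s,
    link ("marry") co-parents, drop directions, remove s, and test whether
    x can still reach z.
    """
    s = set(s)
    keep = ancestors(dag, {x, z} | s)
    nbrs = {v: set() for v in keep}
    for child in keep:
        parents = list(dag[child])  # all in `keep`: the set is ancestrally closed
        for p in parents:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for p, q in combinations(parents, 2):  # moralization
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {x}, [x]
    while stack:
        for w in nbrs[stack.pop()]:
            if w == z:
                return False
            if w not in seen and w not in s:
                seen.add(w)
                stack.append(w)
    return True


def entailed_independencies(dag):
    """All (x, z, S) with x < z such that the Markov property entails x _||_ z | S."""
    variables = sorted(dag)
    entailed = set()
    for x, z in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, z)]
        for k in range(len(rest) + 1):
            for s in combinations(rest, k):
                if d_separated(dag, x, z, s):
                    entailed.add((x, z, frozenset(s)))
    return entailed


chain = {"X": set(), "Y": {"X"}, "Z": {"Y"}}            # X -> Y -> Z
collider = {"X": set(), "Z": set(), "Y": {"X", "Z"}}    # X -> Y <- Z
print(entailed_independencies(chain))     # {('X', 'Z', frozenset({'Y'}))}
print(entailed_independencies(collider))  # {('X', 'Z', frozenset())}
```

In these terms, the Causal Markov Assumption says the returned set is contained in the set of independences the distribution satisfies, and the Causal Faithfulness Assumption says the reverse containment holds.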


Simple Examples of Unfaithfulness

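• X → Y → Z together with X → Z, where two edges carry a positive (+) effect and one a negative (−) effect, so that the influences of X on Z along the two paths cancel. Entailed: none; Extra: X ⊥ Z.

• X → Y → Z, where X ∈ {0, 1}, Y ∈ {0, 1, 2} and Z ∈ {0, 1}. Entailed: X ⊥ Z | Y; Extra: X ⊥ Z.

• X → Z ← Y. Entailed: X ⊥ Y; Extra: X ⊥ Z; Y ⊥ Z.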
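The extra independences in the three examples above can be verified exactly. The Python sketch below reads the figures as a cancelling triangle, a chain with a three-valued middle variable, and a collider, and fills them with made-up parameters (hypothetical, not from the slides): linear effects with unit-variance X for the triangle, Z = X XOR Y with fair coins for the collider, and hand-picked conditional probabilities for the chain. `independent` is a hypothetical helper that tests conditional independence exactly in a finite joint distribution.

```python
from fractions import Fraction as F


def independent(joint, i, j, given=()):
    """Exact test of coordinate i _||_ coordinate j | coordinates `given`.

    `joint` maps outcome tuples to positive probabilities.
    """
    def marginal(idx):
        out = {}
        for outcome, p in joint.items():
            key = tuple(outcome[k] for k in idx)
            out[key] = out.get(key, 0) + p
        return out

    g = tuple(given)
    p_ijg, p_ig, p_jg, p_g = (marginal((i, j) + g), marginal((i,) + g),
                              marginal((j,) + g), marginal(g))
    # Check P(i, j, g) * P(g) == P(i, g) * P(j, g) for every value combination.
    for (vi, *gi), pi in p_ig.items():
        for (vj, *gj), pj in p_jg.items():
            if gi == gj:
                vg = tuple(gi)
                if p_ijg.get((vi, vj) + vg, 0) * p_g[vg] != pi * pj:
                    return False
    return True


# Cancelling triangle X -> Y -> Z, X -> Z (linear, hypothetical coefficients).
a, b = 2, 3          # positive edges X -> Y and Y -> Z
c = -a * b           # negative edge X -> Z chosen to cancel exactly
cov_xz = a * b + c   # Cov(X, Z) with Var(X) = 1: path via Y plus direct edge
print(cov_xz)        # 0, so X _||_ Z in the Gaussian case

# Collider X -> Z <- Y with Z = X XOR Y (coordinates: 0 = X, 1 = Y, 2 = Z).
xor = {(x, y, x ^ y): F(1, 4) for x in (0, 1) for y in (0, 1)}
print(independent(xor, 0, 2), independent(xor, 1, 2))  # True True (extra)
print(independent(xor, 0, 2, given=(1,)))              # False

# Chain X -> Y -> Z, X in {0,1}, Y in {0,1,2}, Z in {0,1}: X only shifts mass
# between Y = 1 and Y = 2, while Z only depends on whether Y = 0.
p_y = {0: {0: F(1, 2), 1: F(1, 2)}, 1: {0: F(1, 2), 2: F(1, 2)}}
p_z1 = {0: F(9, 10), 1: F(1, 5), 2: F(1, 5)}
chain = {}
for x in (0, 1):
    for y, py in p_y[x].items():
        for z in (0, 1):
            pz = p_z1[y] if z == 1 else 1 - p_z1[y]
            chain[(x, y, z)] = F(1, 2) * py * pz
print(independent(chain, 0, 2))  # True (extra: X _||_ Z)
print(independent(chain, 0, 1))  # False
```

Exact fractions are used so that "independent" means equality of probabilities, not closeness; with sampled data the same question becomes a statistical test.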


Testing Faithfulness?

• Without knowing the true causal DAG, the Faithfulness assumption is not fully testable.

• But given the Markov assumption, the Faithfulness assumption has a testable consequence: the distribution of V is (Markov and) faithful to some DAG.

• Unfaithfulness is in principle detectable if the distribution is not faithful to any DAG. It is undetectable if the distribution is faithful to some (false) DAG.


SGS Algorithm

S1. Form the complete undirected graph H over V.

S2. For each pair of variables X and Y, search for S ⊆ V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H iff such a set is found.

S3. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent),

(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.

(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).

S4. More orientation rules …
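Steps S1 and S2 are easy to sketch given an oracle for conditional independence (in practice, a statistical test). In the Python sketch below, the oracle signature `indep(x, y, S)` and the helper `oracle_from` are assumptions introduced for illustration; the orientation steps S3 and S4 are omitted. The search ranges over all subsets of V\{X, Y}, so its cost grows exponentially with the number of variables.

```python
from itertools import combinations


def all_subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for s in combinations(items, k):
            yield frozenset(s)


def sgs_adjacency_step(variables, indep):
    """Steps S1 and S2 of SGS; `indep(x, y, S)` is an independence oracle."""
    variables = sorted(variables)
    # S1: the complete undirected graph over V (edges as frozenset pairs).
    edges = {frozenset(pair) for pair in combinations(variables, 2)}
    # S2: drop X - Y iff X _||_ Y | S for some S subset of V \ {X, Y}.
    for x, y in combinations(variables, 2):
        rest = set(variables) - {x, y}
        if any(indep(x, y, s) for s in all_subsets(rest)):
            edges.discard(frozenset((x, y)))
    return edges


def oracle_from(facts):
    """Hypothetical helper: an oracle that answers from a list of (x, y, S) facts."""
    known = {(frozenset((x, y)), frozenset(s)) for x, y, s in facts}
    return lambda x, y, s: (frozenset((x, y)), frozenset(s)) in known


# A faithful distribution for X -> Y <- Z: the only independence is X _||_ Z.
indep = oracle_from([("X", "Z", ())])
edges = sgs_adjacency_step("XYZ", indep)
print(sorted(sorted(e) for e in edges))  # [['X', 'Y'], ['Y', 'Z']]
```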


Justification of S2

S2. For each pair of variables X and Y, search for S ⊆ V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H iff such a set is found.

• Inference of adjacencies is justified by the Markov assumption: if X and Y were not adjacent, the Markov property would entail their independence conditional on some subset of V\{X, Y}.

• Inference of non-adjacencies is justified by a consequence of the Faithfulness assumption:

Adjacency-Faithfulness: For every X, Y ∈ V, if X and Y are adjacent in the true causal DAG, then they are not independent conditional on any subset of V\{X, Y}.
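Adjacency-Faithfulness can only be checked directly when the true DAG is known, as in a simulation. Assuming that, together with a hypothetical independence oracle `indep(x, y, S)`, the Python sketch below lists every adjacent pair along with a conditioning set that witnesses a violation.

```python
from itertools import combinations


def all_subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for s in combinations(items, k):
            yield frozenset(s)


def adjacency_faithfulness_violations(variables, adjacencies, indep):
    """Adjacent pairs (in the assumed-known true DAG) that some S separates.

    Returns (x, y, S) triples witnessing each violation; an empty list means
    Adjacency-Faithfulness holds for the given oracle.
    """
    violations = []
    for x, y in adjacencies:
        rest = set(variables) - {x, y}
        for s in all_subsets(rest):
            if indep(x, y, s):
                violations.append((x, y, tuple(sorted(s))))
    return violations


# Cancelling triangle X -> Y -> Z, X -> Z with the extra independence X _||_ Z.
facts = {(frozenset("XZ"), frozenset())}
indep = lambda x, y, s: (frozenset((x, y)), frozenset(s)) in facts
print(adjacency_faithfulness_violations(
    "XYZ", [("X", "Y"), ("Y", "Z"), ("X", "Z")], indep))  # [('X', 'Z', ())]
```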


Justification of S3

S3. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent),

(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.

(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).

• (1) and (2) are both justified by the Markov assumption. (If Y is a non-collider on an unshielded triple, the Markov property entails that X and Z are independent conditional on some set containing Y; if Y is a collider, it entails this for some set not containing Y.)

• What about the Faithfulness assumption?


Justification of S3 (cont’d)

• The antecedents of clauses (1) and (2) do not exhaust the logical possibilities: X and Z may be independent conditional on some subset of V\{X, Z} that contains Y and also on some subset that does not.

• The remaining logical possibility is ruled out by the following consequence of the Faithfulness assumption:

Orientation-Faithfulness: For every unshielded triple <X, Y, Z> in the true causal DAG,

– If X → Y ← Z, then X and Z are not independent conditional on any subset of V\{X, Z} that contains Y.

– Otherwise, X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y.

Example: X → Y → Z. Entailed: X ⊥ Z | Y; Extra: X ⊥ Z.


First Weakening of Faithfulness

• It follows that given the Markov and Adjacency-Faithfulness assumptions, violations of Orientation-Faithfulness are detectable, and there is a straightforward test:

S3*. For each unshielded triple <X, Y, Z>,

(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.

(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).

(3) Otherwise, mark the triple as ambiguous or unfaithful.
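Step S3* differs from S3 only in adding the third, "ambiguous" label. The Python sketch below assumes the skeleton produced by step S2 is given as a set of frozenset pairs and that `indep(x, z, S)` is a hypothetical independence oracle; like S3, it considers every separating set of a non-adjacent pair rather than a single stored one. The helper `oracle_from` is also hypothetical.

```python
from itertools import combinations


def all_subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for s in combinations(items, k):
            yield frozenset(s)


def classify_unshielded_triples(variables, edges, indep):
    """Step S3*: label each unshielded triple <X, Y, Z> of the graph `edges`.

    `edges` is a set of frozenset pairs (e.g. the result of step S2) and
    `indep(x, z, S)` is an independence oracle.
    """
    adjacent = lambda a, b: frozenset((a, b)) in edges
    labels = {}
    for y in sorted(variables):
        nbrs = sorted(v for v in variables if v != y and adjacent(v, y))
        for x, z in combinations(nbrs, 2):
            if adjacent(x, z):
                continue  # shielded: S3* does not apply
            rest = set(variables) - {x, z}
            seps = [s for s in all_subsets(rest) if indep(x, z, s)]
            if not any(y in s for s in seps):
                labels[(x, y, z)] = "collider"      # clause (1)
            elif all(y in s for s in seps):
                labels[(x, y, z)] = "non-collider"  # clause (2)
            else:
                labels[(x, y, z)] = "ambiguous"     # clause (3)
    return labels


def oracle_from(facts):
    """Hypothetical helper: an oracle that answers from a list of (x, z, S) facts."""
    known = {(frozenset((x, z)), frozenset(s)) for x, z, s in facts}
    return lambda x, z, s: (frozenset((x, z)), frozenset(s)) in known


skeleton = {frozenset("XY"), frozenset("YZ")}
# X _||_ Z | Y is entailed by X -> Y -> Z; X _||_ Z is the extra, unfaithful one.
unfaithful = oracle_from([("X", "Z", ("Y",)), ("X", "Z", ())])
print(classify_unshielded_triples("XYZ", skeleton, unfaithful))
# {('X', 'Y', 'Z'): 'ambiguous'}
```

With only X ⊥ Z in the oracle the same triple is labelled a collider, and with only X ⊥ Z | Y a non-collider, matching what SGS would output in those faithful cases.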


Conservative SGS

• Replace S3 with S3*, and we get what we call the Conservative SGS (CSGS) algorithm.

• The CSGS algorithm is correct under the causal Markov and Adjacency-Faithfulness assumptions.

• When Orientation-Faithfulness happens to hold, the output of CSGS is the same as that of SGS.


E-pattern

• We call the (supposed) output of CSGS an extended pattern (e-pattern), which represents a set of patterns (each of which represents a Markov equivalence class of DAGs).

[Figure: four example graphs over X, Y, Z, U, W]


Violations of Adjacency-Faithfulness

• Some violations of Adjacency-Faithfulness are also detectable.

• Compare to an undetectable violation:

[Figure: three example graphs: one over X, Y, Z with Extra: X ⊥ Z; one over X, Y, Z with Extra: X ⊥ Z; Y ⊥ Z; and one over X, Y, Z, W with Extra: X ⊥ Z]


Triangle-Faithfulness

Triangle-Faithfulness: For every triangle <X, Y, Z> (i.e., X, Y and Z are pairwise adjacent) in the true causal DAG,

(1) If Y is a non-collider on the path <X, Y, Z>, then X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y.

(2) If Y is a collider on the path <X, Y, Z>, then X and Z are not independent conditional on any subset of V\{X, Z} that contains Y.

• Triangle-Faithfulness is weaker than Adjacency-Faithfulness: since X and Z are adjacent in a triangle, Adjacency-Faithfulness already requires them to be dependent conditional on every subset of V\{X, Z}, whereas Triangle-Faithfulness requires this only for some of those subsets.

[Figure: three triangles over X, Y, Z]
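Like Adjacency-Faithfulness, Triangle-Faithfulness can only be checked directly when the true DAG is known. The Python sketch below assumes that DAG is given as a dict from each variable to its set of parents, and that `indep(x, z, S)` is a hypothetical independence oracle built by the hypothetical helper `oracle_from`. It returns witnesses (X, Y, Z, S) of violations of clause (1) or (2).

```python
from itertools import combinations, permutations


def all_subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for s in combinations(items, k):
            yield frozenset(s)


def triangle_faithfulness_violations(dag, indep):
    """Witnesses (X, Y, Z, S) of Triangle-Faithfulness failures in `dag`.

    `dag` maps each variable to its set of parents (the assumed-known true
    DAG); `indep(x, z, S)` is an independence oracle.
    """
    variables = sorted(dag)
    adjacent = lambda a, b: a in dag[b] or b in dag[a]
    violations = []
    for x, y, z in permutations(variables, 3):
        if x > z or not (adjacent(x, y) and adjacent(y, z) and adjacent(x, z)):
            continue  # visit each triangle once per choice of middle vertex Y
        collider = x in dag[y] and z in dag[y]
        for s in all_subsets(set(variables) - {x, z}):
            # Clause (2) constrains sets containing Y, clause (1) those without Y.
            if (y in s) == collider and indep(x, z, s):
                violations.append((x, y, z, tuple(sorted(s))))
    return violations


def oracle_from(facts):
    """Hypothetical helper: an oracle that answers from a list of (x, z, S) facts."""
    known = {(frozenset((x, z)), frozenset(s)) for x, z, s in facts}
    return lambda x, z, s: (frozenset((x, z)), frozenset(s)) in known


# Cancelling triangle X -> Y -> Z, X -> Z with the extra X _||_ Z.
triangle = {"X": set(), "Y": {"X"}, "Z": {"X", "Y"}}
print(triangle_faithfulness_violations(triangle, oracle_from([("X", "Z", ())])))
# [('X', 'Y', 'Z', ())]

# X -> Y with an extra X _||_ Y violates Adjacency- but not Triangle-Faithfulness.
edge = {"X": set(), "Y": {"X"}}
print(triangle_faithfulness_violations(edge, oracle_from([("X", "Y", ())])))  # []
```

The second example shows concretely why Triangle-Faithfulness is weaker: an unfaithful edge that belongs to no triangle is simply outside its scope.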


Further Weakening of Faithfulness

• Another weak condition entailed by the Adjacency-Faithfulness assumption is known as the causal Minimality condition: no proper subgraph of the true causal DAG satisfies the Markov condition with the joint distribution.

• Theorem: Given the causal Markov, Minimality and Triangle-Faithfulness assumptions, any violation of the Faithfulness assumption is detectable.

• What if we only make the Markov, Minimality and Triangle-Faithfulness assumptions?


CSGS under the Weaker Assumptions

• Given the Markov assumption, the adjacencies inferred in the adjacency step S2 are still correct.

• The inferred non-adjacencies, however, are not necessarily correct, since Adjacency-Faithfulness is not assumed. (Mark the non-adjacencies as ‘apparent’).

• Given the Markov and Triangle-Faithfulness assumptions, the orientation step S3* is still correct!

(For an ‘apparently’ unshielded triple <X, Y, Z>, either it is really unshielded or it is a triangle. In the former case, S3* is correct by the Markov assumption; in the latter case, S3* is correct by the Triangle-Faithfulness assumption.)


Testing Adjacency-Faithfulness?

• Therefore, given only the Markov and Triangle-Faithfulness assumptions, CSGS is still correct, provided that we treat the non-adjacencies in the output as uninformative.

• Can we somehow test Adjacency-Faithfulness, and confirm the non-adjacencies if the test is passed?

• What we have for now: take the output of CSGS and check the Markov condition for each pattern represented by the output. If every pattern satisfies the Markov condition, then the non-adjacencies are correct (assuming Minimality in addition to Markov and Triangle-Faithfulness).


Conjecture

• The condition above (every pattern represented by the CSGS output satisfies the Markov condition) should be improvable. In particular, it is sufficient but not necessary for Adjacency-Faithfulness.

• A necessary condition for Adjacency-Faithfulness is: some pattern represented by the CSGS output satisfies the Markov condition.

• Conjecture: The necessary condition is also sufficient.

That is, assuming Markov, Minimality, and Triangle-Faithfulness, Adjacency-Faithfulness holds iff some pattern represented by the CSGS output satisfies the Markov condition.


Still Further Weakening

• Let G and H be DAGs over V. H is an I-structure of G if every conditional independence entailed by G is also entailed by H. H is a proper I-structure of G if H is an I-structure of G but G is not an I-structure of H.

P-minimality assumption: No proper I-structure of the true causal DAG satisfies the Markov condition with the joint distribution.

• The causal Faithfulness assumption is equivalent to the conjunction of (1) the P-minimality assumption and (2) the assumption that the joint distribution is faithful to some DAG.
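Once each DAG's entailed independences are listed, the I-structure relation is a subset check. The Python sketch below writes the entailed sets out by hand for three-variable DAGs; `ci`, `is_i_structure`, `is_proper_i_structure` and `p_minimality_counterexamples` are hypothetical names. It can only exhibit violations of P-minimality among the listed candidates; confirming P-minimality would require checking every DAG over V.

```python
def ci(x, z, *given):
    """One independence statement x _||_ z | given, as a hashable pair."""
    return (frozenset((x, z)), frozenset(given))


def is_i_structure(entailed_h, entailed_g):
    """H is an I-structure of G: everything G entails, H entails too."""
    return entailed_g <= entailed_h


def is_proper_i_structure(entailed_h, entailed_g):
    """H is an I-structure of G, but G is not an I-structure of H."""
    return (is_i_structure(entailed_h, entailed_g)
            and not is_i_structure(entailed_g, entailed_h))


def p_minimality_counterexamples(entailed_true, satisfied, candidates):
    """Candidate DAGs (name -> entailed set) refuting P-minimality.

    A candidate refutes it if it is a proper I-structure of the true DAG and
    all of its entailed independences hold in the distribution (Markov).
    """
    return [name for name, ent in candidates.items()
            if is_proper_i_structure(ent, entailed_true) and ent <= satisfied]


# True DAG: X -> Y -> Z plus X -> Z (complete, so it entails nothing); the
# distribution has the extra X _||_ Z through cancellation.
entailed_true = set()
satisfied = {ci("X", "Z")}
candidates = {
    "X -> Y <- Z": {ci("X", "Z")},
    "X -> Y -> Z": {ci("X", "Z", "Y")},
}
print(p_minimality_counterexamples(entailed_true, satisfied, candidates))
# ['X -> Y <- Z']
```

Here the distribution is faithful to the (false) collider, so this P-minimality violation is of the undetectable kind.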


Still Further Weakening (cont’d)

• The causal Faithfulness assumption is often regarded as a methodological assumption of simplicity; but simplicity accounts for only part of its content, namely, the P-minimality assumption.

• Violations of the P-minimality assumption are not detectable; given the P-minimality assumption, violations of (the rest of) the Faithfulness assumption are detectable.

• The causal (SGS-)minimality assumption plus the Triangle-Faithfulness assumption entail the P-minimality assumption.

• Conversely, the P-minimality assumption entails the causal (SGS-)minimality assumption, but does not entail Triangle-Faithfulness.


Example

• Triangle-Faithfulness is violated, but P-minimality is not.

• Assuming Markov and P-minimality, the violation of Triangle-Faithfulness is detectable.

[Figure: the causal DAG over X, Y, Z, W (Entailed: Y ⊥ W | {X, Z}; Extra: X ⊥ Z | {Y, W}), followed by six further graphs over X, Y, Z, W]


Example (cont’d)

• I suspect that VCSGS (i.e., CSGS in which non-adjacencies are regarded as ambiguous unless a final check of the Markov condition confirms them) is also correct under the causal Markov and P-minimality assumptions.

[Figure: the causal DAG over X, Y, Z, W (Entailed: Y ⊥ W | {X, Z}; Extra: X ⊥ Z | {Y, W}) and the output of CSGS]


Further Questions

• Are there feasible versions (or approximations)?

• How about causal inference without causal sufficiency?


PC and CPC

• The PC algorithm is a much more efficient version of SGS.

• The key efficiency-improving ideas are also applicable to CSGS (when Adjacency-Faithfulness is assumed to hold). The resulting algorithm is called Conservative PC (CPC).

• Joe Ramsey did simulations and found that even when the Faithfulness assumption is true, CPC (1) produces significantly fewer errors than PC at moderate sample sizes; (2) outputs about as much correct information as PC does; and (3) runs almost as fast.


Almost Unfaithfulness

• The reason, we think, is that CPC guards not only against strict failures of Orientation-Faithfulness but also against almost-violations.

• Intuitively, CPC suspends judgment when it detects “almost unfaithfulness” at a given sample size, just as it suspends judgment when it detects unfaithfulness in the large sample limit.
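To make "almost unfaithfulness at a given sample size" concrete, the Python sketch below uses the Fisher z test of zero (partial) correlation, a common choice for linear-Gaussian data, though the slides do not say which test was used. The correlation 0.01 and the critical value 1.96 (two-sided, level 0.05) are assumptions for illustration; `fisher_z` is a hypothetical helper name.

```python
import math


def fisher_z(r, n, k=0):
    """Fisher z statistic for a sample (partial) correlation r.

    n is the sample size and k the number of conditioning variables.
    """
    return math.sqrt(n - k - 3) * abs(math.atanh(r))


CRITICAL = 1.96  # two-sided test at level 0.05 (an assumed choice)
r = 0.01         # hypothetical, nearly cancelling dependence
for n in (100, 1_000, 100_000):
    z = fisher_z(r, n)
    verdict = "dependent" if z > CRITICAL else "judged independent"
    print(f"n={n}: z={z:.3f} -> {verdict}")
```

Only the largest sample size rejects independence here, so at moderate sample sizes such a nearly cancelling dependence looks exactly like the unfaithful independences discussed earlier.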


Uniform Consistency

• A negative result due to Robins et al. (2003) is that, under the causal Markov and Faithfulness assumptions, causal inference procedures can be pointwise consistent but not uniformly consistent.

• The basis of their proof is related to almost unfaithfulness.


Uniform Consistency of Inferring Causal Direction

• Suppose that we have the right adjacencies, and use procedures like PC to infer causal directions.

• Robins et al.’s results do not apply here.

• But we can still show that the PC procedure is not uniformly consistent in the inference of causal direction given the right adjacencies.


Uniform Consistency of Inferring Causal Direction (con’t)

• Our argument is based on a theorem that no procedure can be uniformly consistent in, for example, deciding between an unshielded collider (X → Y ← Z) and an unshielded non-collider without sometimes suspending judgment.

• This argument does not apply to CPC, and we can show that CPC can be made uniformly consistent in its inference of causal directions (given the right adjacencies).


References

P. Spirtes and J. Zhang (forthcoming) “A uniformly consistent estimator of causal effects under the k-triangle-faithfulness assumption”, Statistical Science.

J. Zhang (2013) “A comparison of three Occam’s razors for Markovian causal models”, British Journal for the Philosophy of Science, 64(2): 423-448.

J. Zhang (2008) “Error probabilities for the inference of causal direction”, Synthese 163: 409-418.

J. Zhang and P. Spirtes (2008) “Detection of unfaithfulness and robust causal inference”, Minds and Machines 18(2): 239-271.

J. Ramsey, P. Spirtes, and J. Zhang (2006) “Adjacency-faithfulness and conservative causal inference”, UAI proceedings: 401-408.