Searching for Causal Models with Latent Variables

Peter Spirtes, Richard Scheines, Joe Ramsey, Erich Kummerfeld, Renjie Yang

Page 1: 1 Peter Spirtes, Richard Scheines, Joe Ramsey, Erich Kummerfeld, Renjie Yang


Searching for Causal Models with Latent

Variables

Peter Spirtes, Richard Scheines, Joe Ramsey, Erich

Kummerfeld, Renjie Yang

Page 2:

What is the Causal Relation Between Economic Stability and Political Stability?

[Diagram: four candidate causal structures relating Economic stability and Political stability, one including a common latent cause L; each alternative is marked with a '?'.]

Page 3:

Measure Latents with Indicators

Country XYZ

1. GNP per capita: _____
2. Energy consumption per capita: _____
3. Labor force in industry: _____
4. Ratings on freedom of press: _____
5. Freedom of political opposition: _____
6. Fairness of elections: _____
7. Effectiveness of legislature: _____

Task: learn causal model

Page 4:

Multiple Indicator Models

To draw causal conclusions about the unmeasured Economic stability (Es) and Political stability (Ps) variables we are interested in, use:
- hypothesized causal relations between the X's, Es, and Ps
- statistics gathered on the X's (correlation matrix)

[Pure measurement model diagram: Es and Ps, connected by '?', measured by indicators X1 X2 X3 X4 X5 X6 X7.]

Page 5:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

Structural Model – Two Factor Model

Page 6:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

Measurement Model

Page 7:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

Impurities

Page 8:

A pure n-factor measurement model for an observed set of variables O is such that:
- Each observed variable has exactly n latent parents.
- No observed variable is an ancestor of any other observed variable or of any latent variable.

A set of observed variables O in a pure n-factor measurement model is a pure cluster if each member of the cluster has the same set of n parents.

Pure Measurement Models
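The definition can be sketched as a direct graph check (a hypothetical helper, assuming the DAG is given as a parent map; all names and the toy graph are illustrative, not from the slides):

```python
# Sketch: check whether a DAG gives a pure n-factor measurement model
# for an observed set of variables, with the DAG given as a parent map
# {node: set of parents}.

def ancestors(node, parents):
    """All ancestors of `node` under the parent map."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

def is_pure_n_factor(observed, latents, parents, n):
    for x in observed:
        pa = parents.get(x, set())
        # each observed variable has exactly n latent parents
        if len(pa) != n or not pa <= latents:
            return False
        # no observed variable is an ancestor of any other observed
        # variable or of any latent variable
        if any(x in ancestors(v, parents)
               for v in (observed | latents) if v != x):
            return False
    return True

# Toy 2-factor example: L1, L2 are the latent parents of every indicator.
latents = {"L1", "L2"}
parents = {f"X{i}": {"L1", "L2"} for i in range(1, 6)}
observed = set(parents)
print(is_pure_n_factor(observed, latents, parents, n=2))  # True
```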

Page 9:

Alternative Models

L1

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 (Bifactor)

L2 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 (Higher-Order)

L1 L3 L2

Higher-Order ⊂ Bifactor ⊂ Connected Bifactor ⊂ Connected Two-Factor

Page 10:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

1. Estimate and test pure Higher-order model. 2. Estimate and test pure Two-Factor model. 3. Choose whichever one fits best.

Common Strategy

Page 11:

If a measurement model is impure, and you assume it is pure, this will hinder the inference of the correct structural model.

If a higher-order model has impurities, it will fit a more inclusive pure model such as a pure two-factor model better than a pure higher-order model.

Two Problems

Page 12:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

Generating Model

Finding the Structural Model

Page 13:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

Data fits model with black edges + pure measurement model better than model without black edges + pure measurement model.

Finding the Structural Model

Page 14:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

Generating Model

Finding the Right Kind of Measurement Model

Page 15:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

Worse Fit

Finding the Right Kind of Measurement Model

Page 16:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

Better Fit

Finding the Right Kind of Measurement Model

Page 17:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

Generating Model

1. Identify pure submodel {1,2,3,4,5,8,9,10,11,12,13}. 2. See if it fits Higher-order. 3. If it does, select Higher-order; otherwise see if it fits the Two-Factor model.

Finding the Right Kind of Measurement Model

Page 18:

L1 L3

X1 X2 X3 X4 X5 X8 X9 X10 X11 X12 X13

Pure submodel fits Higher-order model, so select Higher-order.

Alternative Strategy?

Page 19:

L1 L3

X1 X2 X3 X4 X5 X8 X9 X10 X11 X12 X13

L2 L4

Data will also fit Two-Factor model (slightly lower chi-squared), but when adjusted for degrees of freedom, p-value will be lower.

Alternative Strategy?

?

Page 20:

Rank Constraints

Page 21:

An algebraic constraint is linearly entailed by a DAG if it is true of the implied covariance for every value of the free parameters (the linear coefficients and the variances of the noise terms)

Entailed Algebraic Constraints
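A concrete numerical sketch of an entailed constraint (the one-factor model and all parameter draws below are illustrative choices, not from the slides): in the one-factor DAG L -> X1,...,X4, the tetrad difference s12*s34 - s13*s24 vanishes for every value of the free parameters, which we can spot-check by drawing parameters at random.

```python
import numpy as np

# Sketch: the vanishing tetrad s12*s34 - s13*s24 = 0 is linearly
# entailed by the one-factor DAG L -> X1..X4: it holds for *every*
# choice of loadings and noise variances, illustrated here with
# random draws.
rng = np.random.default_rng(1)
for _ in range(5):
    lam = rng.normal(size=4)                      # loadings on L
    noise = rng.uniform(0.1, 2.0, 4)              # noise variances
    sigma = np.outer(lam, lam) + np.diag(noise)   # implied covariance
    tetrad = sigma[0, 1] * sigma[2, 3] - sigma[0, 2] * sigma[1, 3]
    print(tetrad)  # ~0 (up to floating-point error) every time
```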

Page 22:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

Trek and Sides of Treks

Page 23:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

(CA:CB) trek-separates A from B iff every trek between A and B intersects CA on the A side or CB on the B side.

Trek-Separation

Page 24:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

<{L1,L2}, ∅> Trek-Separates {1,2,3}:{8,9,10}

Page 25:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

<∅, {L3,L4}> Trek-Separates {1,2,3}:{8,9,10}

Page 26:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

If (CA:CB) trek-separates A from B, and the model is an acyclic linear Gaussian model, then rank(cov(A,B)) ≤ #CA + #CB.

Theorem (Sullivant, Talaska, Draisma)

Page 27:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

<{L1,L2}, ∅> Trek-Separates {1,2,3}:{8,9,10}

Page 28:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

If #CA + #CB ≤ #C’A + #C’B for all (C’A:C’B) that trek-separate A from B, then for generic linear acyclic Gaussian models, rank(cov(A,B)) = #CA + #CB.

Theorem (Sullivant, Talaska, Draisma)

Page 29:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

If #CA + #CB > r for all (CA:CB) that trek-separate A from B in DAG G, then for some linear Gaussian parameterization, rank(cov(A,B)) > r.

Theorem (Sullivant, Talaska, Draisma)

Page 30:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

{1,2,3}:{10,11,12} linear acyclic below <{L1,L2}, ∅>

Linear Acyclic Below the Choke Sets

L3 = f(L1, εL3), L4 = g(L2, εL4)

Page 31:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

{1,2,3}:{10,11,12} not linear acyclic below < ∅, {L1,L2}>

Linear Acyclic Below the Choke Sets

L3 = f(L1, εL3), L4 = g(L2, εL4)

Page 32:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

If (CA:CB) trek-separates A from B, and model is linear acyclic below (CA:CB) for A, B, then rank(cov(A,B)) ≤ #CA + #CB.

Theorem (Spirtes)

Page 33:

Proof

[Diagram: every trek between A and B passes through the choke sets CA and CB, so cov(A,B) factors through a full-rank block over the choke sets.]

Page 34:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

If #CA + #CB > r for all (CA:CB) that trek-separate A from B in DAG G, then for some parameterization that is linear acyclic below (CA:CB) for A, B, rank(cov(A,B)) > r.

Theorem (Spirtes)

Page 35:

If a rank constraint is not entailed by the graphical structure, then the rank constraint does not hold.

If the constraints do not hold for the whole space of parameters (i.e. they are not entailed), but are the roots of rational equations in the parameters, then the set of parameter values where they hold has Lebesgue measure 0.

Faithfulness Assumption

Page 36:

This says nothing about the measure of constraints that are not entailed but "almost" hold (i.e. cannot be distinguished from 0 reliably given the power of the statistical tests).

However, the performance of the algorithm will not depend upon the extent to which individual non-entailed constraints "almost" hold, but upon the extent to which sets of non-entailed constraints "almost" hold.

This depends upon which sets of constraints affect the performance of the algorithm, and upon the joint distribution of the constraints, which we do not know.

Faithfulness Assumption

Page 37:

Advantages:
- No need for estimation of the model
- No iterative algorithm
- No local maxima
- No problems with identifiability
- Fast to compute

Disadvantages:
- Does not contain information about inequalities
- Power and accuracy of tests?
- Difficulty in determining implications among constraints

Advantages and Disadvantages of Algebraic Constraints

Page 38:

1. Find a list of pure pentads of variables.
2. Merge pentads on the list that overlap.
3. Select which merged subsets to output.

Find Two Factor Clusters (FTFC) Algorithm
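The three steps can be sketched as an outer loop (a simplified sketch, not the authors' implementation: `is_pure_pentad` is a caller-supplied placeholder for the rank-constraint purity test described on the following slides, and the merge/select rules are condensed versions of the ones detailed later):

```python
from itertools import combinations

def ftfc(variables, is_pure_pentad):
    """Simplified sketch of the FTFC outer loop."""
    # 1. list all pure fivesomes
    pure = [set(c) for c in combinations(variables, 5)
            if is_pure_pentad(c)]
    # 2. greedily merge overlapping items when every 5-subset of the
    #    union is itself on the pure list
    merged, pool = [], [set(s) for s in pure]
    while pool:
        cur = pool.pop()
        changed = True
        while changed:
            changed = False
            for other in list(pool):
                union = cur | other
                if cur & other and all(
                        set(c) in pure
                        for c in combinations(sorted(union), 5)):
                    cur = union
                    pool.remove(other)
                    changed = True
        merged.append(cur)
    # 3. output the largest clusters first, keeping them disjoint
    out, used = [], set()
    for c in sorted(merged, key=len, reverse=True):
        if not (c & used):
            out.append(c)
            used |= c
    return out
```

With a toy purity oracle that accepts exactly the 5-subsets of {1,...,5} and of {8,...,13}, this reproduces the slides' example output {8,...,13} and {1,...,5}.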

Page 39:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

For each subset of size 5, if it is Pure, add to PureList.

{1,2,3,4,5}; {9,10,11,12,13}; {8,10,11,12,13}; {8,9,11,12,13};{8,9,10,12,13}; {8,9,10,11,12}

1. Construct a List of Pure Fivesomes

Page 40:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

<{L1,L2}, ∅> Trek-Separates All Partitions of {1,2,3,4,5,x}

Test for Purity of {1,2,3,4,5}
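At the population level, this purity test can be sketched via vanishing sextad determinants (an illustrative sketch with a hypothetical tolerance and a toy model; a finite-sample version would use the statistical tests described on later slides): a pentad passes if, for every extra variable x, every 3:3 partition of pentad + {x} has a singular 3x3 cross-covariance, as entailed when a choke-set pair of total size 2 trek-separates the partition.

```python
import numpy as np
from itertools import combinations

def sextads_vanish(cov, six, tol=1e-8):
    """Do all 3:3 partitions of `six` give det(cov cross-block) ~ 0?"""
    six = list(six)
    for A in combinations(six, 3):
        B = [v for v in six if v not in A]
        if abs(np.linalg.det(cov[np.ix_(list(A), B)])) > tol:
            return False
    return True

def pentad_is_pure(cov, pentad, all_vars, tol=1e-8):
    """Pentad passes if sextads vanish for pentad + {x}, all extra x."""
    return all(sextads_vanish(cov, list(pentad) + [x], tol)
               for x in all_vars if x not in pentad)

# Toy population covariance: X0..X4 load on latents (0,1); X5, X6 on
# (2,3); the latents are correlated across blocks.
Phi = np.array([[1.0, 0.3, 0.4, 0.2],
                [0.3, 1.0, 0.2, 0.5],
                [0.4, 0.2, 1.0, 0.3],
                [0.2, 0.5, 0.3, 1.0]])
lam = np.zeros((7, 4))
lam[:5, :2] = [[1.0, 0.5], [0.8, 0.7], [0.6, 1.0], [0.9, 0.4], [0.7, 0.9]]
lam[5:, 2:] = [[1.0, 0.6], [0.5, 0.9]]
cov = lam @ Phi @ lam.T + np.eye(7)

print(pentad_is_pure(cov, [0, 1, 2, 3, 4], range(7)))  # True
print(pentad_is_pure(cov, [0, 1, 2, 3, 5], range(7)))  # False
```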

Page 41:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

No pair trek-separates all partitions of {1,2,3,4,8,x}, e.g. {1,2,8}:{3,4,9}

Test for Purity of {1,2,3,4,8}

Page 42:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

No Pair Trek-Separates All Partitions of {1,2,3,4,6,x}, e.g. {1,2,6}:{3,4,7}

Test for Purity of {1,2,3,4,6}

Page 43:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

No Pair Trek-Separates {1,2,3}:{7,8,9}

Page 44:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

{1,2,3,4,5}; {9,10,11,12,13}; {8,10,11,12,13}; {8,9,11,12,13};{8,9,10,12,13}; {8,9,10,11,12} → {1,2,3,4,5}; {8,9,10,11,12,13}

2. Merge Overlapping Items - Theory

Page 45:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

{1,2,3,4,5}; {9,10,11,12,13}; {8,10,11,12,13}; {8,9,11,12,13};{8,9,10,12,13}; {8,9,10,11,12}; {1,2,3,8,9} (false positive)

2. Merge Overlapping Items - Practice

Page 46:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

{9,10,11,12,13}; {8,10,11,12,13} → {8,9,10,11,12,13};

All subsets of size 5 of {8,9,10,11,12,13} are in PureList so accept merger, and remove both from PureList.

2. Merge Overlapping Items - Practice

Page 47:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

{1,2,3,4,5}; {1,2,3,8,9} → {1,2,3,4,5,8,9}

All subsets of size 5 except {1,2,3,8,9} and {1,2,3,4,5} are not on PureList, so reject the merger.

2. Merge Overlapping Items - Practice

Page 48:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

{1,2,3,4,5}; {8,9,10,11,12,13}; {1,2,3,8,9}

2. Final List

Page 49:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

{1,2,3,4,5}; {8,9,10,11,12,13}; {1,2,3,8,9}. Output {8,9,10,11,12,13} because it is the largest. Then output {1,2,3,4,5} because it is the next largest that is disjoint from it.

3. Select Which Ones to Output

Page 50:

If:
- the causal graph contains as a subgraph a pure 2-factor measurement model with at least six indicators and at least 5 variables in each cluster;
- the model is linear acyclic below the latent variables;
- whenever there is no trek between two variables, they are independent;
- there are no correlations equal to zero or one;
- the distribution is LA faithful to the causal graph;

then the population FTFC algorithm outputs a clustering in which any two variables in the same output cluster have the same pair of latent parents.

Theorem

Page 51:

L1 L3

L5

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L6

L2 L4

Undetectable Impurities

Page 52:

X1 X2 X3 X4 X5 X6

Spider Model (Sullivant, Talaska, Draisma)

Alternative Models with Same Constraints

L

L1 L2

L3 L4

L5 L6

Page 53:

However, the spider model (and the collider model) do not receive the same chi-squared score when estimated, so in principle they can be distinguished from a 2-factor model.
- Expensive
- Requires multiple restarts
- Need to test only pure clusters
- If non-Gaussian, may be able to detect additional impurities

Checking with Estimated Model

Page 54:

In the case of linear pure single-factor models (with at least 3 indicators per cluster), all of the latent-latent edges are guaranteed to be identifiable.

One can then apply a causal search algorithm using the estimated covariance matrix among the latents as input.

Inferring Structural Model

Page 55:

L1 L3

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

L2 L4

Non-identified edges in Two-Factor Model

Page 56:

For sextads, the first step is to check 10 · (n choose 6) sextads.

However, in a large proportion of social science contexts there are at most 100 observed variables, and 15 or 16 latents. If based on questionnaires, one generally can't get people to answer more questions than that.

Simulation studies by Kummerfeld indicate that, given the vanishing sextads, the rest of the algorithm is subexponential in the number of clusters, but exponential in the size of the clusters.

Complexity
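The first-step count can be checked directly: each six-variable subset admits C(6,3)/2 = 10 distinct 3:3 partitions, hence 10 · C(n,6) sextads (a small sketch; the function name is illustrative):

```python
from math import comb

def sextad_count(n):
    """Number of sextad constraints tested in the first step:
    10 distinct 3:3 partitions per six-variable subset."""
    return 10 * comb(n, 6)

# n = 13 matches the running example; n = 61 the depression data set.
for n in (13, 30, 61, 100):
    print(n, sextad_count(n))
```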

Page 57:

ΣIJ is the I × J submatrix of Σ, and ΣIJ×IJ is the (I ∪ J) × (I ∪ J) submatrix of Σ. This can be turned into a statistical test by substituting the maximum likelihood estimate of Σ for the population values of Σ.

Drton Test – Assuming Normality

Page 58:

τ is a column vector of independent population sextad differences implied by a model to vanish.

t is a column vector of the corresponding sample sextad differences.

σ is a column vector of the covariances that appear in one or more vanishing sextad differences in t.

Σss is the covariance matrix of the limiting distribution of the sample covariances appearing in t; σefgh is the fourth-order moment matrix.

Delta Test – Asymptotically Normal

Page 59:

Problems in Testing Constraints

- Tests require (algebraic) independence among constraints.
- An additional complication: when some correlations or partial correlations are non-zero, additional dependencies among constraints arise.
- Some models entail that neither of a pair of sextad constraints vanishes, but that they are equal to each other.

Page 60:

3 hypothesized latent variables: Stress, Depression, and (religious) Coping.

21 indicators for Stress, 20 each for Depression and Coping

n = 127

Application to Depression Data

Page 61:

Lee model: p(χ2) = 0

Application to Depression Data

Page 62:

Silva et al. model: p(χ2) = .28

Application to Depression Data

Page 63:

Silva et al. model: p(χ2) = .28

Application to Depression Data

Page 64:

The current version of the FTFC algorithm cannot be applied to all 61 measured indicators in the Lee data set in a feasible amount of time.

We ran the FTFC algorithm at several different significance levels to look for 2-pure sub-models of the 3 originally given subsets of measured indicators. Using the output of FTFC as a starting point, we searched for a model that had the highest p-value using a chi-squared test.

The best model that we found contained a cluster of 9 coping variables, 8 stress variables, and 8 depression variables (all latent variables directly connected).

p(χ2) = 0.27

Application to Depression Data

Page 65:

L1 L3 L5

X1 X2 X3 … X10 X11 … X20 X21 … X30

L2 L4 L6

Generated from the model and from its pure submodel. 3 sample sizes: n = 100 (alpha = .1), 500 (alpha = .1), 1000 (alpha = .4).

Non-linear functions are convex combinations of linear and cubic functions.

Simulation Studies

Page 66:

Purity

P/I – generated from pure/impure submodel
L/N – generated from linear/non-linear latent-latent functions
L/N – generated from linear/non-linear latent-measured connections
Purity – percentage of the output cluster that comes from the same pure subcluster

Page 67:

The average number of clusters output ranged between 2.7 and 3.1 for each kind of model and sample size, except for PNN (pure submodel, non-linear latent-latent and latent-measured functions).

For PNN at sample sizes 100, 500, and 1000, the average numbers of clusters were 1.05, 1.38, and 1.54 respectively. This is expected, because non-linear latent-measured connections violate the assumptions under which the algorithm is correct.

Number of Clusters

Page 68:

The percentage of each pure subcluster that was in the output cluster.

Fraction of Possible Output

Page 69:

Larger clusters are more stably produced and more likely to be (almost) correct.

Informal Observation

Page 70:

Described an algorithm that relies on weakened assumptions:
- Weakened the linearity assumption to linearity below the latents.
- Weakened the assumption of the existence of pure submodels to the existence of n-pure submodels.
- Conjectured correct if we add the assumptions of no star or collider models, and faithfulness of constraints.
- Is there reason to believe in faithfulness of constraints when there are non-linear relationships among the latents?

Summary

Summary

Page 71:

- Give a complete list of assumptions for the output of the algorithm to be pure.
- Speed up the algorithm.
- Modify the algorithm to deal with almost-unfaithful constraints as much as possible.
- Add a structure-learning component to the output of the algorithm.
- Silva: Gaussian process model among the latents, linearity below the latents.
- Identifiability questions for structural models with pure measurement models.

Open Problems

Open Problems

Page 72:

Silva, R. (2010). Gaussian Process Structure Models with Latent Variables. Proceedings from Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-10).

Silva, R., Scheines, R., Glymour, C., & Spirtes, P. (2006a). Learning the structure of linear latent variable models. J Mach Learn Res, 7, 191-246.

Sullivant, S., Talaska, K., & Draisma, J. (2010). Trek Separation for Gaussian Graphical Models. Ann Stat, 38(3), 1665-1685.

References

Page 73:

Drton, M., Massam, H., & Olkin, I. (2008). Moments of minors of Wishart matrices. Annals of Statistics, 36(5), 2261-2283.

Drton, M., Sturmfels, B., & Sullivant, S. (2007). Algebraic factor analysis: tetrads, pentads and beyond. Probability Theory and Related Fields, 138(3-4), 463-493.

Harman, H. (1976). Modern Factor Analysis. University of Chicago Press.

References