generalized linear latent variable models with flexible ...presentation at swiss statistics meeting,...

18
Generalized Linear Latent Variable Models with Flexible Distribution of Latent Variables Irina Irincheeva University of Geneva, Switzerland Presentation at Swiss Statistics Meeting, October 29, 2009 Joint work with Eva Cantoni and Marc G. Genton Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 1 / 17

Upload: others

Post on 10-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Generalized Linear Latent Variable Models with Flexible

Distribution of Latent Variables

Irina Irincheeva

University of Geneva, Switzerland

Presentation at Swiss Statistics Meeting, October 29, 2009Joint work with Eva Cantoni and Marc G. Genton

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 1 / 17

Page 2: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Outline

1 Latent variables

2 Generalized Linear Latent Variable Models

3 Semi-Nonparametric distribution

4 SNP GLLVM

5 Simulations

6 Discussion

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 2 / 17

Page 3: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Latent variables

Latent variables are important in modern science, even everyday life, butcannot be measured directly. Usually we assess them via a linearcombination of observable variables.

What? How to measure?

quality of life physical and mental health,wealth, employment, education ...

physical health cholesterol & hemoglobin rates,BMI, chronic disease, eyesight, hearing ...

wealth expenditures for food, clothes, leisure,owner of car, dishwasher, real estate ...

solvency of a customer age, permanent employment, revenue,prosecution for debts, color of the car ...

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 3 / 17

Page 4: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Generalized Linear Latent Variable Models

How to construct an optimal linear combination explaining the most of thedata?Generalized Linear Latent Variable Model (GLLVM):

h(f) is the multivariate density of q latent variables, f = (f1, . . . , fq)T ;

g(xj |µ + Γf), j = 1, . . . , p is the conditional density of j-th manifestvariable given the latent ones q < p, a p × 1 vector µ and a p × q

matrix Γ are parameters of interest;

Assume that latent variables f explain all the systematic variability ofdata i.e. xj |µ + Γf, j = 1, . . . , p are mutually independent. Then thedensity of random vector x is

g(x) =

Rq

p∏

j=1

g(xij |µ + Γf)

h(f)df

and we can use Maximum likelihood method to estimate parameters.

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 4 / 17

Page 5: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Generalized Linear Latent Variable Models

Example 1: Manifest given latent and latent are normally distributed

h(f) =1

(2π)q2

exp{

−12 f

T f}

, i.e. Nq(0, I)

g(x|f) =1

|2πΨ|exp

{

−12(x − µ − Γf)TΨ−1(x − µ − Γf)

}

,

i.e. Np(µ + Γf,Ψ)

Result: factor analysis model i.e.

x = µ + Γf + u, where f ∼ Nq(0, I), u ∼ Np(0,Ψ),

f and u are independent.

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 5 / 17

Page 6: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Generalized Linear Latent Variable Models

Example 2:

x ∈ R, f ∼ N(0, 1), x | f ∼ Bernoulli(θ)

so x | f takes values 0, 1. A possible link function between the expectationof x (which is θ) and µ+ γf is logit:

θ(f ) =exp {µ+ γf }

1 + exp {µ+ γf }

and the probability mass function of x given f is

g(x |f ) = θ(f )x (1 − θ(f ))1−x .

Finally the marginal density of x is

g(x) =

R

g(x |f )h(f )df = 1√2π

R

exp {xµ+ xγf − f 2/2}

1 + exp {µ+ γf }df

The integral does not exist in closed form.Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 6 / 17

Page 7: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Semi-Nonparametric distribution

Latent variables were traditionally supposed to be normally distributed duemostly to the Central Limit Theorem. It is not always appropriate.In our work we propose an approach based on a family of”Semi-NonParametric” (SNP) distributions :

h(f) = P2K (f − τ )φ(f | τ , σ2Iq), with f, τ ∈ R

q,

PK (f) =∑

0≤i1+...+iq≤K

ai1...iq fi11 . . . f

iqq ,

ai1...iq are such that h(f) is a valid density function,

φq(f | τ , σ2Iq) is the density of Nq(τ , σ2Iq).

This densities can approximate any smooth enough density arbitrarily close(Gallant and Nychka (1987)). This approach was already used by Chen,Zhang, and Davidian (2002) for random effects in Generalized LinearMixed Models.

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 7 / 17

Page 8: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Semi-Nonparametric distribution

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

0.5

0.6

h(x)=φ(x)

h(x)=(0.34x2+0.47x+0.57)2 φ(x)

h(x)=(0.17x2+0.87)2 φ(x)

h(x)=( x2

2 2 + x

2 − 3

2 2)2φ(x)

Figure: Some particular cases for K = 1 (SNP1) and 2 (SNP2) of univariatedensities of the form h(f ) = P2

K (f )φ(f ).

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 8 / 17

Page 9: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

SNP GLLVM

Ifh(f) = P2

K (f)φ(f | 0, I),

andg(x|f) = φ(x | µ + Γf,Ψ)

then g(x;µ,Γ,Ψ) has closed form for any K !We implemented the estimation by

SNP1 ML, i.e. assuming that latent density is (a0 + a1f )2φ(f )

SNP2 ML, i.e. assuming that latent density is (a0 + a1f + a2f2)2φ(f ).

Analytical gradient and hessian are used to boost the optimization which isvery sensitive to the initial values.

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 9 / 17

Page 10: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Simulations

We explore the properties of proposed estimators on 200 simulated

samples of size 200 from the model:

x1

x2

x3

=

γ1

γ2

γ3

f +

u1

u2

u3

where

ui ∼ N(0, ψi ), ψ1 = ψ2 = ψ3 = 1,

γ1 = 1.4, γ2 = 1.8, γ3 = −1,

and we generate f from 3 different distributions:

standard normal distribution N(0, 1);

mixture of normals with distribution 0.7N(2, 1) + 0.3N(−2, 0.25)

SNP2 distribution with density(

− 1√2

cos 0.7 + f sin 0.7 + f 2 1√2cos 0.7

)2φ(f )

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 10 / 17

Page 11: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Simulations

Standard normal latent variable

Nor SNP1 SNP2

−0

.50

.00

.51

.01

.52

.02

.5

E(x1)

Nor SNP1 SNP2

−0

.50

.00

.51

.01

.52

.02

.5

γ1

Nor SNP1 SNP2−

0.5

0.0

0.5

1.0

1.5

2.0

2.5

ψ11

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 11 / 17

Page 12: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Simulations

Mixture of normals latent variable

Latent variable is generated from the mixture of normals0.7N(2, 1) + 0.3N(−2, 0.25)

Nor SNP1 SNP2

−0

.50

.00

.51

.01

.52

.02

.5

E(x1)

Nor SNP1 SNP2

−0

.50

.00

.51

.01

.52

.02

.5

γ1

Nor SNP1 SNP2−

0.5

0.0

0.5

1.0

1.5

2.0

2.5

ψ11

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 12 / 17

Page 13: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Simulations

SNP2 latent variable

Latent variable is generated from the SNP2 density with distribution(

− 1√2cos 0.7 + f sin 0.7 + f 2 1√

2cos 0.7

)2φ(f )

Nor SNP1 SNP2

−0

.50

.00

.51

.01

.52

.02

.5

E(x1)

Nor SNP1 SNP2

−0

.50

.00

.51

.01

.52

.02

.5

γ1

Nor SNP1 SNP2

−0

.50

.00

.51

.01

.52

.02

.5

ψ11

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 13 / 17

Page 14: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Simulations

Surprisingly, the FA MLE of loadings and uniquenesses are not sensitive atall to the wrong specification of the latent variable distribution. But ourapproach allows to visualize the latent variable distribution.

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 14 / 17

Page 15: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Simulations

Estimated densities 1

Estimated latent densities when the true is normal and mixture of normals.(a), (b) Solid line is the average of estimated densities for fits preferred byHQ, shaded area is the point wise estimated confidence envelope, dashedred line is the true density.

−10 −5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(a)

z

De

nsi

ties

−10 −5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(b)

z

De

nsi

ties

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 15 / 17

Page 16: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Simulations

Estimated densities 2

Simulations results based on 200 data sets with SNP2 latent: (a)Densities estimated by SNP2. (b) Solid curve is the average of SNP2estimated densities, dashed red curve is the true density.

−10 −5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

(a)

−10 −5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

(b)

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 16 / 17

Page 17: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Discussion

Discussion

Remarkable ”robustness“ of FA to the wrong specification of the latentvariable distribution. Similar studies in mixed linear models confirmthis result.

This ”robustness“ does not hold in cases when distribution of manifestgiven the latent is discontinuous (Bernoulli for example). We areinvestigating this case.

Our approach offer a deeper insight on the behavior of latentvariables. Indeed the non-normality of the estimated latent densitycan indicate presence of outliers, non-linearity, heterogeneity ofpopulation or just inconvenience of normally distributed latent.

Different fits (Nor, SNP1, SNP2) can be compared using informationcriterions such as AIC, BIC or HQ.

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 17 / 17

Page 18: Generalized Linear Latent Variable Models with Flexible ...Presentation at Swiss Statistics Meeting, October 29, 2009 ... is a valid density function, ... This ”robustness“ does

Discussion

Chen, J., Zhang, D., & Davidian, M. (2002). A Monte-Carlo EMAlgorithm for generalized mixed models with flexible random effectsdistribution. Biostatistics, 3(3), 347-360.

Gallant, A. R., & Nychka, D. W. (1987). Semi-nonparametric maximumlikelihood estimation. Econometrica, 55(2), 363-390.

Irincheeva, Cantoni, Genton (UniGe) SNP GLLVM SSM 17 / 17