SUGI 31 - Contributed paper 201-31

Performing Latent Class Analysis Using the CATMOD Procedure

David M. Thompson

Department of Biostatistics and Epidemiology

College of Public Health, OUHSC

Latent class analysis (LCA)

• LCA validates classification in the absence of a gold standard for decision-making.

• LCA is unavailable in SAS.

LCA and Patient Classification

Patient classification is part of many clinical decisions.

• Diagnosis

• Prognosis

Patient classification in the absence of a gold standard

Diagnosis

• Diagnostic categories may be emerging or unclear.

Prognosis

• predicting rehabilitation outcomes

• counseling patients and families regarding expectations

Latent class analysis (LCA)

• LCA is a parallel to factor analysis, but for categorical responses.

• Like factor analysis, LCA addresses the complex pattern of association that appears among observations…

[Diagram: observed variables A, B, C, D]

… and attributes the pattern to a set of latent (underlying, unobserved) factors or classes.

[Diagram: observed variables A, B, C, D with latent Class 1 and Class 2]

What if no gold standard existed in cardiology to assess a pattern of “yes/no” signs and symptoms?

Rindskopf, D., & Rindskopf, W. (1986). The value of latent class analysis in medical diagnosis. Statistics in Medicine, 5, 21-27.

[Diagram: four observed indicators: Q-wave in EKG, abnormal LDH pattern, history of angina, elevated CPK]

LCA predicts latent class membership such that the observed variables are independent within each latent class.

[Diagram: Q-wave in EKG, history of angina, abnormal LDH pattern, and elevated CPK, with the latent classes MI and No MI]

LCA estimates

• Latent class prevalences

• Conditional probabilities: probabilities of a specific response, given class membership

[Diagram: Q-wave in EKG, history of angina, abnormal LDH pattern, and elevated CPK, with latent class prevalences P(MI) and P(No MI) and conditional probabilities such as P(Q-wave | MI) and P(CPK | No MI)]

Conditional probabilities are analogous to sensitivities and specificities, but are calculated in the absence of a gold standard.

[Diagram: Q-wave in EKG, history of angina, abnormal LDH pattern, and elevated CPK, with latent class prevalences P(MI) and P(No MI) and conditional probabilities such as P(Q-wave | MI) and P(CPK | No MI)]
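To make the analogy concrete, the following minimal Python sketch reads a sensitivity and a specificity off a set of conditional probabilities. The numeric values and the names `cond_prob`, `sensitivity`, and `specificity` are illustrative assumptions, not estimates reported in this talk.

```python
# Hypothetical conditional probabilities P(indicator = 1 | class); the values are made up.
cond_prob = {
    "MI":    {"q_wave": 0.77, "angina": 0.80, "ldh": 0.83, "cpk": 0.98},
    "No MI": {"q_wave": 0.01, "angina": 0.20, "ldh": 0.03, "cpk": 0.20},
}

def sensitivity(indicator):
    """P(indicator positive | MI): the analogue of a test's sensitivity."""
    return cond_prob["MI"][indicator]

def specificity(indicator):
    """P(indicator negative | No MI): the analogue of a test's specificity."""
    return 1.0 - cond_prob["No MI"][indicator]

for name in cond_prob["MI"]:
    print(f"{name}: sensitivity={sensitivity(name):.2f}, "
          f"specificity={specificity(name):.2f}")
```

Unlike a conventional sensitivity, these quantities are conditioned on an estimated latent class rather than on a verified diagnosis.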

LCA works on unconditional contingency table (no information on latent class membership)

Q-wave   Hx of angina   “Flipped” LDH   High CPK   n_ijkl
0        0              0               0          15
0        0              0               1          14
0        0              1               0          11
0        0              1               1          8
0        1              0               0          23
.        .              .               .          .
1        1              1               1          9

LCA’s goal is to produce a complete (conditional) table that assigns counts for each latent class:

Q-wave   Hx of angina   “Flipped” LDH   High CPK   Latent class X=t   n_ijklt
0        0              0               0          1                  9
0        0              0               1          2                  6
0        0              1               0          1                  3
0        0              1               1          2                  11
.        .              .               .          .                  .
1        1              1               1          2                  9

Estimating LC parameters

• Maximum likelihood approach

• Because LC membership is unobserved, the likelihood function, and the likelihood surface, are complex.

EM algorithm maximizes the likelihood L when some data (X) are unobserved

• “E” step uses parameter estimates to update expected values for cell counts n_ijklt in the complete contingency table

• “M” step produces ML estimates from the complete table

EM algorithm requires initial estimates

[Diagram: loop between the “E” step and the “M” step]

1st “E” step: Provide initial estimates to “fill in” missing information on LC membership

EM algorithm in SAS

“E” step: SAS DATA step

“M” step: PROC CATMOD

1st “E” step: SAS DATA step that randomly assigns each response profile to one latent class
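The random first “E” step can be sketched outside SAS as follows. This is a Python illustration under stated assumptions, not the author's DATA step: the function name `random_complete_table`, the seed, and the example counts are all hypothetical, and the whole count of each response profile is placed in a single randomly chosen class.

```python
import itertools
import random

def random_complete_table(observed, n_classes=2, seed=0):
    """Assign each response profile's whole count to one randomly chosen latent class.

    observed: dict mapping (a, b, c, d) -> observed count n_ijkl
    returns:  dict mapping (a, b, c, d, x) -> provisional count n_ijklt
    """
    rng = random.Random(seed)
    complete = {}
    for profile, count in observed.items():
        chosen = rng.randint(1, n_classes)      # latent class label in 1..n_classes
        for x in range(1, n_classes + 1):
            complete[profile + (x,)] = count if x == chosen else 0
    return complete

# Illustrative (hypothetical) counts, one per 0/1 response profile.
profiles = list(itertools.product((0, 1), repeat=4))
observed = {p: 10 + i for i, p in enumerate(profiles)}
complete = random_complete_table(observed)
```

Summing the provisional table over the class label returns the observed counts, so the starting table is consistent with the data.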

“M” step

    /* Capture CATMOD's parameter estimates in the data set MU */
    ods output estimates=mu;
    proc catmod order=data;
      weight count;                  /* cell counts from the complete table */
      model a*b*c*d*x=_response_
            / wls addcell=.1;        /* ADDCELL adds 0.1 to every cell count */
      loglin a b c d x
             a*x b*x c*x d*x;        /* main effects plus indicator-by-class terms */
    run;
    quit;
    ods output close;

“E” step

• data step that uses loglinear ML estimates from CATMOD

• converts loglinear estimates into LC prevalences and conditional probabilities

• calculates joint response probabilities within and summed across latent classes

• calculates “posterior probabilities”, i.e. P(X=1|abcd)

• constructs a new complete (conditional) contingency table
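The last three bullets can be sketched in Python as below. This is an illustration rather than the author's DATA step: it starts from prevalences and conditional probabilities (skipping the conversion from loglinear estimates), and the function name `e_step`, the equal observed counts, and the probabilities for the second and fourth indicators are assumptions.

```python
import itertools

def e_step(observed, prevalence, cond):
    """One E step for a latent class model with binary indicators.

    observed:   dict (a, b, c, d) -> observed count n_ijkl
    prevalence: dict class t -> P(X = t)
    cond:       dict class t -> list of P(indicator = 1 | X = t), one per indicator
    returns:    (posterior, complete), where posterior[(profile, t)] = P(X = t | profile)
                and complete[profile + (t,)] is the expected count n_ijklt
    """
    posterior, complete = {}, {}
    for profile, count in observed.items():
        joint = {}
        for t, prev in prevalence.items():
            p = prev
            for response, p1 in zip(profile, cond[t]):
                p *= p1 if response == 1 else 1.0 - p1   # local independence
            joint[t] = p                                  # P(profile, X = t)
        total = sum(joint.values())                       # P(profile), summed across classes
        for t in prevalence:
            posterior[(profile, t)] = joint[t] / total
            complete[profile + (t,)] = count * joint[t] / total
    return posterior, complete

prevalence = {1: 0.5, 2: 0.5}
cond = {1: [0.9, 0.8, 0.1, 0.2], 2: [0.2, 0.3, 0.8, 0.7]}   # illustrative values
observed = {p: 10 for p in itertools.product((0, 1), repeat=4)}
posterior, complete = e_step(observed, prevalence, cond)
print(round(posterior[((1, 1, 0, 0), 1)], 4))
```

Because each observed count is split in proportion to the posterior probabilities, the new complete table sums back to the observed counts for every response profile.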

Results of a simulation study

• simulate responses to four binary (yes-no) observed variables with known but unobservable (latent) group membership

• evaluate whether an LCA approach using CATMOD accurately detects true parameters
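One way to generate such data is sketched below in Python. The talk does not show its simulation code, so the function name `simulate_sample`, the seed, and the conditional probabilities for the second and fourth indicators are assumptions; the sample size of 200 and the prevalence of 0.5 follow the simulation settings reported in this talk.

```python
import random
from collections import Counter

def simulate_sample(n, prevalence_1, cond, seed=None):
    """Simulate n subjects: draw a latent class, then four independent binary responses.

    prevalence_1: P(X = 1); class 2 has probability 1 - prevalence_1
    cond:         dict class -> list of P(indicator = 1 | class)
    returns:      Counter mapping (a, b, c, d) -> observed count (class labels discarded)
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        x = 1 if rng.random() < prevalence_1 else 2
        profile = tuple(1 if rng.random() < p else 0 for p in cond[x])
        counts[profile] += 1
    return counts

cond = {1: [0.9, 0.8, 0.1, 0.2], 2: [0.2, 0.3, 0.8, 0.7]}
table = simulate_sample(200, 0.5, cond, seed=31)
print(sum(table.values()))
```

Discarding the class label after sampling mimics the analyst's position: only the unconditional table of response profiles is available to the estimation step.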

[Figure: Distribution of true LC prevalences from 1000 simulated samples where n=200 and E[P(X=1)] = 0.5]

[Figure: Parameter estimates from 406 successful runs using CATMOD]

[Figure: Distribution of conditional probabilities from 1000 simulated samples, E[P(A=1|X=1)] = 0.9, E[P(A=1|X=2)] = 0.2]

[Figure: Parameter estimates from CATMOD]

[Figure: Distribution of conditional probabilities from 1000 simulated samples, E[P(C=1|X=1)] = 0.1, E[P(C=1|X=2)] = 0.8]

[Figure: Parameter estimates from CATMOD]

Concluding remarks

• LCA is a potentially valuable tool in clinical epidemiology for clarifying ill-defined diagnostic and prognostic classifications.

• An approach using CATMOD brings LCA closer to SAS’ analytic framework.

• In any approach to LCA, sensitivity to initial estimates requires caution.

• E-M loop should iterate between 3 and 40 times.

• Initial estimates for LC prevalences should be at least 0.3.

• Approach should employ replicate estimates using different starting values.

[Figure: Parameter estimates from CATMOD, E[P(C=1|X=1)] = 0.1, E[P(C=1|X=2)] = 0.8]

[Figure: Replicated parameter estimates from CATMOD]

Thank you!

Acknowledgements

Barbara R. Neas, Ph.D.

Willis Owen, Ph.D.

Dept. of Biostatistics and Epidemiology, OUHSC

Gary Raskob, Ph.D.

Dean, College of Public Health, OUHSC

Assumptions of LCA

• Exhaustiveness

$$\pi^{ABCD}_{ijkl} = \sum_{t} \pi^{ABCDX}_{ijklt}$$

• Local Independence

$$\pi^{ABCDX}_{ijklt} = \pi^{ABCD|X}_{ijkl|t}\,\pi^{X}_{t} = \pi^{A|X}_{i|t}\,\pi^{B|X}_{j|t}\,\pi^{C|X}_{k|t}\,\pi^{D|X}_{l|t}\,\pi^{X}_{t}$$

(Goodman’s probabilistic parameterization of a latent class model with four manifest indicators)
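The two assumptions can be checked numerically with a short Python sketch. The parameter values and the helper names `joint_prob` and `manifest_prob` are illustrative assumptions. The example shows that indicators A and B are independent within class 1 yet associated in the unconditional table.

```python
import itertools

prevalence = {1: 0.5, 2: 0.5}                                  # pi^X_t (illustrative)
cond = {1: [0.9, 0.8, 0.1, 0.2], 2: [0.2, 0.3, 0.8, 0.7]}      # P(indicator = 1 | X = t)

def joint_prob(profile, t):
    """pi_ijklt under local independence: product of conditionals times prevalence."""
    p = prevalence[t]
    for response, p1 in zip(profile, cond[t]):
        p *= p1 if response == 1 else 1.0 - p1
    return p

def manifest_prob(profile):
    """pi_ijkl under exhaustiveness: sum of pi_ijklt over all latent classes."""
    return sum(joint_prob(profile, t) for t in prevalence)

profiles = list(itertools.product((0, 1), repeat=4))
p_a = sum(manifest_prob(p) for p in profiles if p[0] == 1)
p_b = sum(manifest_prob(p) for p in profiles if p[1] == 1)
p_ab = sum(manifest_prob(p) for p in profiles if p[0] == 1 and p[1] == 1)
# Within class 1 the joint probability factorizes into the two conditionals.
p_ab_given_1 = sum(joint_prob(p, 1) for p in profiles
                   if p[0] == 1 and p[1] == 1) / prevalence[1]

print(f"marginal: P(A=1,B=1)={p_ab:.4f} vs P(A=1)P(B=1)={p_a * p_b:.4f}")
print(f"class 1:  P(A=1,B=1|X=1)={p_ab_given_1:.4f} vs {cond[1][0] * cond[1][1]:.4f}")
```

The association seen in the unconditional table is produced entirely by mixing the two classes, which is the pattern LCA tries to explain.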

• Local Independence (2)

$$\pi^{ABCDX}_{ijklt} = \pi^{A|X}_{i|t}\,\pi^{B|X}_{j|t}\,\pi^{C|X}_{k|t}\,\pi^{D|X}_{l|t}\,\pi^{X}_{t}$$

$$\ln \pi^{ABCDX}_{ijklt} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l} + \lambda^{X}_{t} + \lambda^{AX}_{it} + \lambda^{BX}_{jt} + \lambda^{CX}_{kt} + \lambda^{DX}_{lt}$$

(Haberman’s loglinear parameterization of a latent class model with four manifest indicators)
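The link between the two parameterizations can be sketched in Python. The sketch assumes sum-to-zero (effect) coding for binary variables, so each term has one free parameter and the second level takes the opposite sign. The function name `probabilities_from_loglinear` and the example lambda values are hypothetical, and the mapping of coded levels to actual response values depends on how the data are ordered.

```python
import itertools
import math

def probabilities_from_loglinear(main, inter):
    """Turn effect-coded loglinear parameters into LC prevalences and conditionals.

    main:  dict with keys 'a','b','c','d','x' -> lambda for the first level
    inter: dict with keys 'a','b','c','d' -> lambda for the indicator-by-class term
    Levels are coded +1 (first level) and -1 (second level).
    """
    cells = {}
    for levels in itertools.product((1, -1), repeat=5):
        x = levels[4]
        coded = dict(zip("abcd", levels[:4]))
        eta = main["x"] * x
        for v in "abcd":
            eta += main[v] * coded[v] + inter[v] * coded[v] * x
        cells[levels] = math.exp(eta)            # the intercept cancels on normalizing
    total = sum(cells.values())
    cells = {k: w / total for k, w in cells.items()}          # pi_ijklt
    prevalence = {x: sum(p for k, p in cells.items() if k[4] == x) for x in (1, -1)}
    cond = {}
    for pos, v in enumerate("abcd"):
        for x in (1, -1):
            first = sum(p for k, p in cells.items() if k[pos] == 1 and k[4] == x)
            cond[(v, x)] = first / prevalence[x]  # P(indicator at first level | class)
    return prevalence, cond

# Hypothetical parameter values, for illustration only.
main = {"a": 0.5, "b": 0.0, "c": 0.0, "d": 0.0, "x": 0.0}
inter = {"a": 0.3, "b": 0.0, "c": 0.0, "d": 0.0}
prevalence, cond = probabilities_from_loglinear(main, inter)
print(round(cond[("a", 1)], 4), round(cond[("a", -1)], 4))
```

Normalizing the exponentiated linear predictor over all 32 cells yields the joint probabilities, from which prevalences and conditional probabilities follow as marginal ratios.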

EM algorithm

• A way around the difficulty inherent in calculating L when some data (X) are unobserved.

• The first “E” (expectation) step requires initial estimates, which essentially “fill in” missing information on LC membership.

• “M” step maximizes likelihood for complete but provisional data, then passes the associated parameter estimates to next “E” step.

• Each later “E” step, given updated parameter estimates, revises the expected values for cell counts n_ijklt in the complete contingency table while preserving observed marginal counts n_ijkl.

• Each “M” step then finds new parameter estimates that maximize L.
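The loop described above can be sketched end to end in Python. This is not the SAS implementation from the talk: the “M” step here uses closed-form proportions from the complete table instead of a loglinear fit in PROC CATMOD, and the function names, starting values, and default iteration count are assumptions. The example feeds the loop noise-free expected counts so the known parameters can be compared with the estimates.

```python
import itertools

def joint(profile, t, prevalence, cond):
    """P(profile, X = t) under local independence."""
    p = prevalence[t]
    for r, p1 in zip(profile, cond[t]):
        p *= p1 if r == 1 else 1.0 - p1
    return p

def em_lca(observed, prevalence, cond, n_iter=40):
    """EM for a latent class model with binary indicators (closed-form M step)."""
    n = sum(observed.values())
    for _ in range(n_iter):
        # E step: expected complete-table counts n_ijklt, preserving each n_ijkl.
        complete = {}
        for profile, count in observed.items():
            parts = {t: joint(profile, t, prevalence, cond) for t in prevalence}
            total = sum(parts.values())
            for t in prevalence:
                complete[profile + (t,)] = count * parts[t] / total
        # M step: ML estimates from the complete but provisional table.
        new_prev, new_cond = {}, {}
        for t in prevalence:
            n_t = sum(v for k, v in complete.items() if k[-1] == t)
            new_prev[t] = n_t / n
            new_cond[t] = [
                sum(v for k, v in complete.items() if k[-1] == t and k[pos] == 1) / n_t
                for pos in range(len(cond[t]))
            ]
        prevalence, cond = new_prev, new_cond
    return prevalence, cond

# Expected counts under known (illustrative) parameters, so there is no sampling noise.
true_prev = {1: 0.5, 2: 0.5}
true_cond = {1: [0.9, 0.8, 0.1, 0.2], 2: [0.2, 0.3, 0.8, 0.7]}
observed = {
    p: 200 * sum(joint(p, t, true_prev, true_cond) for t in true_prev)
    for p in itertools.product((0, 1), repeat=4)
}
start_prev = {1: 0.6, 2: 0.4}
start_cond = {1: [0.7, 0.7, 0.3, 0.3], 2: [0.3, 0.3, 0.7, 0.7]}
est_prev, est_cond = em_lca(observed, start_prev, start_cond, n_iter=200)
print({t: round(v, 3) for t, v in est_prev.items()})
```

With real samples the estimates depend on the starting values, so a practical run would repeat the loop from several different starts.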

Prognostic classification

• Professionals must classify patients even when information is limited and available only in ‘yes/no’ form.

• Example of a challenge to prognostic classification:

  • able to ascend/descend flight of 3 stairs?

  • positive screening test for depression?

  • spouse living at home?

  • independent in using toilet and bath?

Maximum likelihood approach to estimating LC parameters

• The probability of obtaining count $n_{ijklt}$ for response profile {i,j,k,l} in latent class t is proportional to $(\pi^{ABCDX}_{ijklt})^{n_{ijklt}}$.

• The likelihood of obtaining a set of observed counts for all response profiles is

$$L = \prod_{i} \prod_{j} \prod_{k} \prod_{l} \prod_{t} \left(\pi^{ABCDX}_{ijklt}\right)^{n_{ijklt}}$$

$$\log L = \sum_{i} \sum_{j} \sum_{k} \sum_{l} \sum_{t} n_{ijklt} \ln\left(\pi^{ABCDX}_{ijklt}\right)$$

• Because LC membership (X=t) is unobserved, the likelihood function is complicated.
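The contrast between the complete-table log-likelihood and its unobserved-class counterpart can be sketched in Python. The function names and parameter values are illustrative assumptions, and both functions omit the multinomial constant. The complete-table version is a plain weighted sum of logs. Without the class labels, each term becomes the log of a sum over classes, which is what makes direct maximization awkward.

```python
import math

prevalence = {1: 0.5, 2: 0.5}                                  # illustrative values
cond = {1: [0.9, 0.8, 0.1, 0.2], 2: [0.2, 0.3, 0.8, 0.7]}

def joint_prob(profile, t, prevalence, cond):
    """pi_ijklt: class prevalence times the product of conditional probabilities."""
    p = prevalence[t]
    for response, p1 in zip(profile, cond[t]):
        p *= p1 if response == 1 else 1.0 - p1
    return p

def complete_loglik(complete, prevalence, cond):
    """log L = sum over i,j,k,l,t of n_ijklt * ln(pi_ijklt) for a complete table."""
    return sum(
        count * math.log(joint_prob(key[:-1], key[-1], prevalence, cond))
        for key, count in complete.items() if count > 0
    )

def observed_loglik(observed, prevalence, cond):
    """Log-likelihood with X unobserved: each term is the log of a sum over classes."""
    return sum(
        count * math.log(sum(joint_prob(profile, t, prevalence, cond)
                             for t in prevalence))
        for profile, count in observed.items() if count > 0
    )

observed = {(1, 1, 0, 0): 10}
complete = {(1, 1, 0, 0, 1): 10, (1, 1, 0, 0, 2): 0}
print(round(complete_loglik(complete, prevalence, cond), 4))
print(round(observed_loglik(observed, prevalence, cond), 4))
```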