Improving latent trait analysis using Mokken scaling analysis

Roger Watson, Lisa Kirke

Faculty of Health & Social Care
23 June 2015

03/05/2023 © The University of Sheffield / Department of Marketing and Communications

ROGER WATSON

PRESENTS

FUN WITH MOKKEN SCALING


Improving latent trait analysis using Mokken scaling analysis

• Revise IRT
• Local stochastic independence
• New developments
  – Invariant item ordering
  – Confidence intervals
  – Person-item fit
  – Sample size
• Example of recent work


Revise IRT

• Measurement in social sciences
• Advantages of IRT
• Guttman and Mokken


Methods of measurement in social research

Range of methods

• Classical test theory
• Item response theory
• Latent class analysis


Item response theory

• Rasch analysis
• Partial credit model
• Mokken scaling


Advantages of item response theory

• only a specific set of items produces a given score on the latent variable

• therefore, you know what the score means


Item response theory (IRT)

• The unit of analysis in IRT:
  – The item characteristic curve (ICC)

• Also known as:
  – The item response curve (IRC)
  – The item response function (IRF)


Mokken Scaling

• Stochastic version of Guttman scaling


Guttman scales

Louis Guttman 1916-1987


Guttman scalogram


Guttman item responses are ‘deterministic’

[Figure: item response curves for Item i and Item j; vertical axis P(θ), maximum 1; horizontal axis θ]


Deterministic versus stochastic: ‘league versus cup’


Mokken Scaling

• Stochastic version of Guttman scaling
• Adheres to the assumptions of item response theory


Assumptions of IRT

• Unidimensionality
• Local stochastic independence
• Monotone homogeneity
• Double monotonicity (non-intersection) for dichotomous items†

† e.g. “yes/no”


Robert Mokken 1929-

Mokken scaling

Mokken proposed a non-parametric item response theory in which item characteristic curves (ICCs) only have to be monotone and non-intersecting


Item characteristic curve

[Figure: a single item characteristic curve; vertical axis P(θ), maximum 1; horizontal axis θ]

θ = latent variable


Item characteristic curves

[Figure: item characteristic curves for Item 1 and Item 2; vertical axis P(θ), maximum 1; horizontal axis θ]

• item 2 is more ‘difficult’ than item 1

• it represents more of the latent variable

• more difficult items will have lower mean scores on the latent variable
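
The relationship between item difficulty and mean item score can be illustrated with a minimal sketch. The logistic curve shape, the difficulty values 0.0 and 1.0, and the standard-normal latent variable are all assumptions made purely for illustration; Mokken scaling itself only requires the curves to be monotone and non-intersecting, not logistic.

```python
import math

def icc(theta, difficulty):
    """Illustrative monotone ICC: P(positive response | theta).
    The logistic form is an assumption, not part of Mokken scaling."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def expected_item_mean(difficulty, lo=-4.0, hi=4.0, steps=801):
    """Mean item score when theta is standard normal (grid approximation)."""
    thetas = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    weights = [math.exp(-0.5 * t * t) for t in thetas]  # unnormalised normal density
    total = sum(weights)
    return sum(w * icc(t, difficulty) for t, w in zip(thetas, weights)) / total

item1_mean = expected_item_mean(0.0)  # hypothetical easier item
item2_mean = expected_item_mean(1.0)  # hypothetical more difficult item

# The two curves never cross: item 2 lies at or below item 1 at every theta.
non_intersecting = all(icc(t / 10, 1.0) <= icc(t / 10, 0.0) for t in range(-40, 41))

print(round(item1_mean, 2), round(item2_mean, 2), non_intersecting)
```

Under these assumptions the more difficult item has the lower mean item score, which is why item means can be used to order items by difficulty.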


Mokken Scaling

• Stochastic version of Guttman scaling
• Adheres to the assumptions of item response theory
• Hierarchical cumulative scales


Concept of a cumulative scale

• Items are ordered reproducibly
• Items are ordered meaningfully
• A score on an item indicates the extent to which the latent trait is present
• The sum of item scores orders respondents on the latent trait


Cumulative scale: example
(Also known as an ‘implicational’ scale)

5) I would have no objections to my son or daughter marrying a Scottish person

4) At a party I would not hesitate to dance with a Scottish person

3) I would have no objections to having a Scottish person dine in my house

2) I would not object to having a Scottish family live next door

1) I would not object to sitting next to a Scottish person on a bus

Response format: “yes” = 0 / “no” = 1

[Arrow label: DIFFICULTY]


Local stochastic independence


New developments in Mokken scaling

• Invariant item ordering

Invariant item ordering

[Figure: item response curves for the items “Cut toe nails” and “Tie a knot”; vertical axis P; horizontal axis Disability]

Fieo R, Watson R, Deary IJ, Starr JM (2010) A revised activities of daily living/instrumental activities of daily living instrument increases interpretive power: theoretical application for functional tasks. Gerontology 56, 483-490


Polytomous item response functions (IRFs) of the Hostility (HOS) items

HT = 0.67


Polytomous item response functions (IRFs) of the Depression (DEP) items

HT = 0.47


Polytomous item response functions (IRFs) of the Physical Functioning (PF) items

HT = 0.53



• Invariant item ordering
• Confidence intervals


95% CI for Hij should not include 0

95% CI for Hi should not include 0.30
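
To make the coefficients in these criteria concrete, here is a minimal sketch of Loevinger's scalability coefficients (Hij, Hi and H) for dichotomous items, computed as observed covariance divided by the maximum covariance possible given the item means. The function names, the toy data and the interval values passed to `interval_excludes` are hypothetical; the sketch does not estimate confidence intervals, which in practice would come from dedicated software.

```python
from itertools import combinations

def scalability(data):
    """Return (Hij dict, Hi list, H) for 0/1 data; rows = persons, columns = items.
    Assumes every item has at least one 0 and one 1 (otherwise a division by zero occurs)."""
    n, k = len(data), len(data[0])
    p = [sum(row[i] for row in data) / n for i in range(k)]  # item means
    cov, cov_max = {}, {}
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in data) / n
        cov[i, j] = p_ij - p[i] * p[j]                 # observed covariance
        cov_max[i, j] = min(p[i], p[j]) - p[i] * p[j]  # maximum given the item means
    h_ij = {pair: cov[pair] / cov_max[pair] for pair in cov}
    h_i = []
    for i in range(k):
        pairs = [pair for pair in cov if i in pair]
        h_i.append(sum(cov[q] for q in pairs) / sum(cov_max[q] for q in pairs))
    h = sum(cov.values()) / sum(cov_max.values())
    return h_ij, h_i, h

def interval_excludes(lower, upper, value):
    """True if the interval [lower, upper] does not contain value."""
    return not (lower <= value <= upper)

# Toy data (hypothetical): a perfect Guttman pattern, then the same data plus one misfitting person.
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
with_error = perfect + [[0, 1, 0]]

_, _, h_perfect = scalability(perfect)
_, _, h_error = scalability(with_error)
print(round(h_perfect, 2), round(h_error, 2))
```

With a perfect Guttman pattern every coefficient equals 1; adding a single misfitting response pattern lowers H, which is the behaviour the thresholds of 0 and 0.30 are meant to screen.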



• Invariant item ordering
• Confidence intervals
• Person-item fit


…most studies are difficult to understand for nonspecialists.


[Figure: two response patterns, labelled “FITS” and “DON’T FIT”; the legend marks Guttman errors]
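
A Guttman error can be counted directly from a response pattern. The sketch below assumes dichotomous (0/1) responses already ordered from the easiest to the most difficult item, and counts every pair in which an easier item is scored 0 while a more difficult item is scored 1. The function name and the two example patterns are hypothetical, and this is not necessarily the statistic reported by any particular software package.

```python
def guttman_errors(responses):
    """Count Guttman errors in one 0/1 response pattern,
    ordered from the easiest to the most difficult item."""
    errors = 0
    zeros_seen = 0  # easier items scored 0 so far
    for x in responses:
        if x == 0:
            zeros_seen += 1
        else:
            errors += zeros_seen  # each earlier 0 paired with this 1 is one error
    return errors

fits = [1, 1, 1, 0, 0]          # hypothetical person who fits the item ordering
does_not_fit = [0, 1, 0, 1, 0]  # hypothetical person who does not

print(guttman_errors(fits), guttman_errors(does_not_fit))  # 0 3
```

A person with many Guttman errors relative to others is a candidate for being flagged as misfitting.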


Italian EdFED scale PIF analysis

• Data for EdFED (Edinburgh Feeding Evaluation in Dementia scale) from an intervention study (The Nutricare Project) in Italy with baseline and 6-monthly follow-up

• Analysed using the PerFit package in R


Italian EdFED scale PIF analysis

Person-item fit scores for selected individuals across the study

Person ID   Month 0   Month 1   Month 2   Month 3   Month 4   Month 5   Month 6
148         0.29      0.28      0.28      0.28      0.28      0.28      0.27
269         0.30      0.22      0.22      0.22      0.25      0.24      0.24
290         0.29      0.25      0.25      0.25      0.25      0.23      0.23
291         0.29      0.28      0.28      0.28      0.25      0.23      0.30
330         NF        0.23      0.23      0.23      0.22      0.23      0.21

NF = not flagged



• Invariant item ordering
• Confidence intervals
• Person-item fit
• Sample size


The effect of sample size on Mokken scales in the Warwick-Edinburgh Mental Well-Being scale

• Increasing sample size
• 50/250/500/600/750
• Sampling with replacement
• Study effects on:
  – Scalability
  – Confidence intervals
  – Per element accuracy
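
The resampling design listed above can be sketched as follows. The data set here is entirely hypothetical (1000 simulated respondents on 14 items with placeholder scores), the fixed seed and the function name are assumptions, and ten samples per size is taken from the tables of results; the sketch shows only the drawing of samples, not the Mokken analysis run on each one.

```python
import random

SAMPLE_SIZES = [50, 250, 500, 600, 750]  # sizes listed on the slide

def draw_samples(rows, size, n_samples=10, seed=0):
    """Draw n_samples samples of the given size from rows, with replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [rng.choices(rows, k=size) for _ in range(n_samples)]

# Hypothetical data: 1000 respondents x 14 items, placeholder scores 0-4.
population = [[(person + item) % 5 for item in range(14)] for person in range(1000)]

samples = {size: draw_samples(population, size) for size in SAMPLE_SIZES}
print({size: len(drawn) for size, drawn in samples.items()})
```

Each drawn sample would then be analysed separately, so that scalability, confidence intervals and per element accuracy can be compared across sample sizes.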

Sample      Range of Hi   H     Hi with 95% CI < .3 (n)   Hij with 95% CI < 0 (n)   PEA (AISP)   PEA (GA)
sample 1    .18 - .46     .32   12                        47                        .64          .64
sample 2    .10 - .47     .31   12                        43                        .64          .64
sample 3    .16 - .51     .35   11                        39                        .79          .79
sample 4    .10 - .41     .26   14                        60                        .71          .71
sample 5    .27 - .58     .43   6                         25                        .86          .86
sample 6    .08 - .47     .35   9                         41                        .79          .79
sample 7    .02 - .40     .27   14                        53                        .64          .64
sample 8    .17 - .46     .29   12                        57                        .57          .57
sample 9    .14 - .56     .37   12                        46                        .79          .79
sample 10   .09 - .57     .38   9                         35                        .79          .79

Range of Hi values, H, CIs for Hi and Hij, and PEA for ten samples with n = 50
Note. Hi values are based on TEST for all 14 items. The values of PEA (AISP) and PEA (GA) are the same. However, the same items are not always selected in the same scale.
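
As a rough illustration of the PEA columns, the sketch below treats per element accuracy as the proportion of items assigned to the same scale as in a reference assignment. That definition, the function name and both assignments are assumptions made for illustration; the slides do not define PEA or state which reference was used.

```python
def per_element_accuracy(reference, observed):
    """Proportion of items whose scale assignment matches the reference assignment."""
    if len(reference) != len(observed):
        raise ValueError("assignments must cover the same items")
    matches = sum(r == o for r, o in zip(reference, observed))
    return matches / len(reference)

# Hypothetical assignments for 14 items: 1 = in the scale, 0 = left unscaled.
reference = [1] * 14
observed = [1] * 9 + [0] * 5  # 9 of the 14 items assigned as in the reference

pea = per_element_accuracy(reference, observed)
print(round(pea, 2))  # 0.64
```

With 14 items, 9 matching assignments give .64 and 13 give .93, which matches the kind of values that appear in the PEA columns.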



Sample      Range of Hi   H     Hi with 95% CI < .3 (n)   Hij with 95% CI < 0 (n)   PEA (AISP)   PEA (GA)
sample 1    .36 - .58     .48   2                         0                         1.00         1.00
sample 2    .31 - .52     .41   5                         3                         1.00         1.00
sample 3    .32 - .62     .48   2                         0                         1.00         1.00
sample 4    .26 - .55     .47   2                         0                         .93          .93
sample 5    .18 - .50     .39   6                         10                        .86          .86
sample 6    .37 - .61     .49   1                         0                         1.00         1.00
sample 7    .32 - .56     .46   2                         0                         1.00         1.00
sample 8    .40 - .60     .49   0                         0                         1.00         1.00
sample 9    .40 - .60     .49   0                         0                         .93          .93
sample 10   .24 - .55     .44   2                         2                         .93          .93

Range of Hi values, H, CIs for Hi and Hij, and PEA for ten samples with n = 250
Note. Hi values are based on TEST for all 14 items. The values of PEA (AISP) and PEA (GA) are the same. Furthermore, the same items are always selected in the same scale (the only exception is shown in sample 9).



Sample      Range of Hi   H     Hi with 95% CI < .3 (n)   Hij with 95% CI < 0 (n)   PEA (AISP)   PEA (GA)
sample 1    .30 - .60     .49   1                         0                         1.00         1.00
sample 2    .32 - .58     .47   1                         0                         1.00         1.00
sample 3    .36 - .57     .47   1                         0                         1.00         1.00
sample 4    .33 - .58     .48   1                         0                         1.00         1.00
sample 5    .34 - .59     .49   1                         0                         1.00         1.00
sample 6    .35 - .61     .50   1                         0                         1.00         1.00
sample 7    .33 - .59     .47   1                         0                         1.00         1.00
sample 8    .37 - .60     .50   0                         0                         1.00         1.00
sample 9    .25 - .55     .43   2                         0                         .93          .93
sample 10   .33 - .63     .53   1                         0                         1.00         1.00

Range of Hi values, H, CIs for Hi and Hij, and PEA for ten samples with n = 500
Note. Hi values are based on TEST for all 14 items. The values of PEA (AISP) and PEA (GA) are the same. Furthermore, the same items are always selected in the same scale.



Sample      Range of Hi   H     Hi with 95% CI < .3 (n)   Hij with 95% CI < 0 (n)   PEA (AISP)   PEA (GA)
sample 1    .31 - .62     .50   0                         0                         1.00         1.00
sample 2    .33 - .57     .46   1                         0                         1.00         1.00
sample 3    .37 - .57     .48   0                         0                         1.00         1.00
sample 4    .33 - .59     .49   0                         0                         1.00         1.00
sample 5    .35 - .59     .49   1                         0                         1.00         1.00
sample 6    .34 - .61     .50   1                         0                         1.00         1.00
sample 7    .31 - .59     .47   1                         0                         1.00         1.00
sample 8    .36 - .59     .49   1                         0                         1.00         1.00
sample 9    .27 - .55     .44   1                         0                         .93          .93
sample 10   .33 - .61     .51   1                         0                         1.00         1.00

Range of Hi values, H, CIs for Hi and Hij, and PEA for ten samples with n = 600
Note. Hi values are based on TEST for all 14 items. The values of PEA (AISP) and PEA (GA) are the same. Furthermore, the same items are always selected in the same scale.



Sample      Range of Hi   H     Hi with 95% CI < .3 (n)   Hij with 95% CI < 0 (n)   PEA (AISP)   PEA (GA)
sample 1    .30 - .57     .47   1                         0                         1.00         1.00
sample 2    .38 - .59     .50   0                         0                         1.00         1.00
sample 3    .33 - .61     .52   1                         0                         1.00         1.00
sample 4    .37 - .64     .53   0                         0                         1.00         1.00
sample 5    .35 - .59     .49   1                         0                         1.00         1.00
sample 6    .33 - .60     .49   1                         0                         1.00         1.00
sample 7    .38 - .60     .50   0                         0                         1.00         1.00
sample 8    .37 - .60     .50   0                         0                         1.00         1.00
sample 9    .37 - .59     .49   0                         0                         1.00         1.00
sample 10   .40 - .61     .51   0                         0                         1.00         1.00

Range of Hi values, H, CIs for Hi and Hij, and PEA for ten samples with n = 750
Note. Hi values are based on TEST for all 14 items. The values of PEA (AISP) and PEA (GA) are the same.

The Autism-Spectrum Quotient (AQ) formed reliable Mokken scales. There was a large overlap between the scales from the university student sample and the sample with autism spectrum conditions (ASC); the first scale, relating to social interaction, was almost identical in both. The present study confirms the utility of the AQ as a single instrument that can dimensionalize autistic traits in both university student and clinical samples of ASC, and confirms that items of the AQ are consistently ordered relative to one another.


Conclusion & prospects

• Mokken scaling is useful

BUT:

• Sample sizes need to be quite large

• User-friendly methods of assessing local stochastic independence (LSI) need to be developed

• We need some formal criteria to decide what to do with person-item misfits

0000-0001-8040-7625

@rwatson1955
