How does health psychology measure up?

A critical look at measurement in health psychology

Matthew Hankins
16th September 2011

The empirical basis of Health Psychology

• Why do Health Psychologists collect data?

– Theory generation, esp. identifying constructs

– Theory corroboration

– Measuring outcomes (trials etc.)

• The value of such activities is therefore critically dependent on the quality of the data

Questionnaire measures

• Majority of data collected by Health Psychologists is generated by questionnaire measures (‘scales’)

• Questionnaires vary in the quality of data that they generate

– Validity: extent to which the questionnaire measures what is intended

– Reliability: extent to which variance in data reflects variance in construct measured

• Index of measurement error

Pragmatic approach

• Validity

– Unidimensionality (factor analysis)

– Associations between measures

– Discrimination between known groups

• Reliability

– Estimated by Cronbach’s Alpha

– Or test-retest correlation
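As a reminder of what the reliability coefficient named above actually computes, here is a minimal sketch of Cronbach's Alpha using the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total score). The function name and the response data are hypothetical and only for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`, one list of item scores per respondent."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: five people, three items scored 0-3.
responses = [
    [0, 1, 0],
    [1, 1, 1],
    [2, 2, 1],
    [2, 3, 3],
    [3, 3, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Note that the calculation only uses variances of items and of their sum; nothing in it checks whether the items order people consistently.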

Scale development

• Combination of these approaches is derived from ‘Classical Test Theory’ (CTT)

– Originated with Spearman (1904)

– Landmark text: Guilford 2nd ed. (1954)

– Fully developed by Lord & Novick (1968)

• Further developments: ‘item-response theory’ (IRT)

– E.g. Rasch model (1960)

• CTT implicit in most empirical Health Psychology research

What is a scale?

• A scale orders people on the construct of interest

• Both CTT & IRT agree that a person’s position on the dimension can be estimated from the item scores

• Strength of IRT is that it does not assume that a set of correlated items forms a scale

• Implicit in CTT: if items load on same factor, we automatically assume that they form a scale

[Figure: Persons A, B, C and D ordered from low to high along the construct]

Scaling problem

• Whether a set of items forms a scale is a hypothesis (Guttman 1950)

– Formally tested whether items formed ‘Guttman scales’

• “In contemporary psychometric practice, it is the rule rather than the exception that two people having the same score on a test will have [endorsed] different items… Such scores are crude empirical devices known to have some predictive efficiency, but they cannot be called measurements in any strict sense” (Loevinger 1948)

• Additionally, there is no rational basis for adding up a set of ordinal Likert scores unless they have been shown to scale

Example: PHQ-9

• Feeling tired + Little interest in doing things + Poor appetite several days in last 2 weeks

– Scale score = +3

• Thoughts of hurting yourself in some way nearly every day in last 2 weeks

– Scale score = +3

• Are these responses really equivalent?

Implications

• If a set of items is merely assumed to form a scale, then we cannot be sure that the scale score accurately ranks people on the construct of interest

– People with different positions may be assigned the same score

– People with the same position may be assigned different scores

• Unless we test this hypothesis, assessing reliability & validity is pointless

Rejecting the hypothesis of a scale

• Scales are very rarely ‘rejected’ in health psychology

• Reliability is usually reported as ‘acceptable’ or ‘good’

– Based on arbitrary cut-off around 0.7 (0.6, 0.5…)

– “Test-retest reliability was acceptable (r=0.43)”

• Criteria for validity are usually not specified in advance

– Any factor structure can be accommodated

– Any association can be cited as ‘validating’ scale

• Formal testing of ‘scalability’ of items rare

What we would like: interval scales

What we might have: ordinal scales

What we probably have: disordered categories

A scale that cannot rank-order people is not a scale

Disordered categories

Item ‘difficulty’ (intensity)

• The problem arises because CTT does not account for item difficulty or intensity

• Some items are endorsed at low levels of the construct

– ‘Low intensity item’

– Endorsement may indicate low or high level of construct

• Some items are endorsed at high levels of the construct

– ‘High intensity item’

– Endorsement indicates high level of construct

Example: PHQ-9

• Feeling tired on several days is a low intensity item

– Endorsed at low level of depression

– But may also be endorsed at higher levels of depression

[Figure: depression continuum from low to high, with the item endorsed (“Yes”) at all four marked positions]

Example: PHQ-9

• Thoughts of hurting yourself in some way nearly every day in last 2 weeks is a high intensity item

– Endorsed at high level of depression

– But not endorsed at lower levels of depression

[Figure: depression continuum from low to high, with the item endorsed (“Yes”) only at the highest of the four marked positions]

How CTT fails to deal with item intensity

Factor analysis groups items of similar intensity

• Factor analysis of a unidimensional construct will produce more than one ‘factor’

• These ‘factors’ are simply sets of items with similar intensities
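The claim can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the talk: a single normally distributed attribute, six hypothetical yes/no items with steep logistic response curves, three of low intensity and three of high intensity. It does not run a factor analysis; it only inspects the ordinary (phi) correlations that a factor analysis of binary items would start from.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def simulate_item(thetas, difficulty, slope=4.0):
    """Simulate yes/no answers to one item driven by a single attribute."""
    answers = []
    for theta in thetas:
        p = 1.0 / (1.0 + math.exp(-slope * (theta - difficulty)))
        answers.append(1 if random.random() < p else 0)
    return answers


def pearson(x, y):
    """Pearson correlation (the phi coefficient for 0/1 data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# One underlying attribute; hypothetical item intensities in two clusters.
thetas = [random.gauss(0, 1) for _ in range(4000)]
difficulties = [-1.5, -1.5, -1.5, 1.5, 1.5, 1.5]
items = [simulate_item(thetas, b) for b in difficulties]

within, between = [], []
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        r = pearson(items[i], items[j])
        if difficulties[i] == difficulties[j]:
            within.append(r)   # pair of items with similar intensity
        else:
            between.append(r)  # pair of items with different intensity

mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)
print(round(mean_within, 2), round(mean_between, 2))
```

Although every item measures the same attribute, items of similar intensity correlate more strongly with each other than with items of different intensity, which is the pattern a factor analysis would pick up as separate ‘factors’.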

Example: GHQ-12

• Many studies report 2- or 3-factor solutions

• ‘Factors’ simply group items by intensity (Hankins 2008)

[Figure: the 12 GHQ-12 items placed along the psychiatric morbidity continuum from low to high]

How CTT fails to deal with item intensity

Selecting items on basis of factor analysis exacerbates problem, but simultaneously conceals it

• Items are selected on basis of similar intensities, creating scales with limited range but high reliability

[Figure: two psychiatric morbidity continua from low to high; the first shows all 12 GHQ-12 items, the second only a selected subset (items 7, 4, 1, 12, 8 and 3)]

Why Rasch modelling is not the answer

• Rasch modelling (RM) explicitly takes into account item intensities

– Stochastic Guttman scale

• Tests the hypothesis that items form a scale

• Additionally claims to produce interval scaling & ‘objective’ measurement

• Increasingly popular in Health Psychology

CTT vs. IRT

• Argument tends to be that IRT is superior to CTT & IRT is ‘objective’ measurement

• Differences more apparent than real:

– Large correlations between CTT data & IRT data

– If data treated as ordinal, perfect correlation between CTT & Rasch data

From Embretson & Reise (2000)
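The perfect ordinal agreement follows from a standard property of the Rasch model: with item difficulties fixed, the maximum-likelihood person estimate depends only on the raw summed score and rises with it. The sketch below illustrates this, assuming hypothetical item difficulties and solving the likelihood equation (expected score = raw score) by bisection; the function names are illustrative only.

```python
import math


def rasch_prob(theta, b):
    """Rasch probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta, difficulties):
    """Expected summed score at attribute level theta."""
    return sum(rasch_prob(theta, b) for b in difficulties)


def theta_for_raw_score(raw, difficulties, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximum-likelihood theta for a raw score, found by bisection.

    The estimate solves expected_score(theta) == raw; the expected score
    is strictly increasing in theta, so bisection is sufficient.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical item difficulties, for illustration only.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Raw scores of 0 and 5 have no finite estimate, so they are left out.
raw_scores = list(range(1, len(difficulties)))
estimates = [theta_for_raw_score(r, difficulties) for r in raw_scores]

for raw, theta in zip(raw_scores, estimates):
    print(raw, round(theta, 2))
```

Because each raw score maps to exactly one estimate and the mapping is increasing, ranking people by summed score and ranking them by Rasch estimate give the same order.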

GHQ-12: CTT scoring vs. RM scoring

Problems

• Rasch models require very large samples to allow estimation of person and item parameters

• Very strong assumptions, e.g. logistic item-response curve

– Why should all items have the same form of response?

• The data must fit the model, not the other way round

– Discards potentially useful data to fit arbitrary assumptions

• Interval scaling is questionable gain if psychological constructs are not quantitative in the first place

Ontological diversion

• In general, psychologists seem to believe that attributes are either categorical or quantitative

– A ‘cat’ is different from a ‘tree’: different categories, difference is qualitative

– 30cm is different from 60cm: different quantities, difference is quantitative

• Having made this distinction, quantitative attributes may be measured as categorical, ordinal, interval

• Ordinal attributes cannot exist in their own right

– Just a way of collecting data on a quantitative attribute

Ontological diversion

• Russell (1896): the difference between two quantities is itself a quantity

– The difference between two lengths is itself a length

• For psychological attributes to be quantitative, the difference between two ‘levels’ of that attribute must itself be a ‘level’ of that attribute

– Is the difference between two pleasures itself a pleasure?

– Is the difference between two levels of depression itself a level of depression?

• If not, are psychological states then merely categorical?

– But what then do we mean by ‘severity’ of depression?

Ontological diversion

• Is it possible for psychological attributes to be ordinal?

– Can something exist in degree but not quantity?

• Michell (2009) argues that we cannot assume quantity from degree

– shows that they are logically separable: “It is possible that an ordered attribute is non-quantitative”

• Collingwood (1933) argues that some concepts exist only in degree

Ontological diversion

• Are we comfortable talking about degree, rather than quantity?

• Implicit in our descriptions and experiences of psychological attributes

– But does not require the assumption that the attributes are quantitative

The degrees of the lie

• JAQUES

– Can you nominate in order now the degrees of the lie?

• TOUCHSTONE

– O sir, we quarrel in print, by the book; as you have books for good manners: I will name you the degrees. The first, the Retort Courteous; the second, the Quip Modest; the third, the Reply Churlish; the fourth, the Reproof Valiant; the fifth, the Countercheque Quarrelsome; the sixth, the Lie with Circumstance; the seventh, the Lie Direct.

• As You Like It, Act 5 Scene 4

Summary

• Measurement methods in health psychology are suboptimal

• In particular, the fundamental assumption that correlated items form a scale is not routinely tested

• IRT models such as the Rasch model assume that interval scaling is meaningful

• Psychological attributes may not exist as quantities

• Is there a method for constructing purely ordinal scales?

Non-parametric IRT (NPIRT)

• E.g. Mokken (1971)

• Takes into account item intensities

– Stochastic Guttman scale

• Claims only to rank order people

• Very weak assumptions

– Retains data

• Complements CTT

– Uses simple scale score

Examples of NPIRT analysis

• Mokken (1971) proposed two models

– Monotone homogeneity model (MH)

– Doubly monotone model (DM)

• Scales fitting the MH model rank order people on the attribute of interest

• Corollary is that scales not fitting the MH model do not rank order people on the attribute of interest

• Select items for the scale based on homogeneity

• Assess whether the resulting scale fits the MH model

• Scaling procedure and the MH model based on the following minimal assumptions:

– For all items, if person A has a higher degree of X than person B, A’s probability of endorsing an item will be equal to or higher than B’s

– Local independence: item scores are uncorrelated for the same degree of attribute

• If the purpose of the scale is to rank order people on a given attribute then the scale must be monotone homogenous

• Probability of item being endorsed must be monotone nondecreasing against attribute

• i.e. probability of item endorsement does not decrease with an increase in the measured attribute

* - as estimated from the remaining items of the scale
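A minimal sketch of this check, assuming dichotomous (0/1) item scores: people are grouped by their rest score (the sum of the remaining items), and the proportion endorsing the item of interest is compared across groups. The function names and response data are hypothetical; dedicated software additionally merges small groups and tests whether any decreases are statistically significant, which is not attempted here.

```python
def rest_score_proportions(data, item):
    """Proportion endorsing `item` at each observed rest score.

    The rest score is the summed score on all the other items and stands in
    for the person's level on the attribute.
    """
    groups = {}
    for row in data:
        rest = sum(row) - row[item]
        groups.setdefault(rest, []).append(row[item])
    return {rest: sum(vals) / len(vals) for rest, vals in sorted(groups.items())}


def is_monotone_nondecreasing(values):
    """True if the sequence never decreases."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical responses: rows are people, columns are items ordered from
# least to most intense.
data = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

curve = rest_score_proportions(data, item=2)
print(curve)
print(is_monotone_nondecreasing(list(curve.values())))
```

For these data the proportion endorsing the third item never falls as the rest score rises, so the item shows no violation of monotonicity.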

For this GHQ-12 item the probability of endorsement reaches 50% at a low level of psychological distress.

It is therefore a low intensity item: people endorsing this item are signalling a low level of distress.

For this GHQ-12 item the probability of endorsement reaches 50% at a high level of psychological distress.

It is therefore a high intensity item: people endorsing this item are signalling a high level of distress.

• If two items belong to a unidimensional scale, then:

– Endorsing the more intense item entails that the less intense item also be endorsed

– Endorsing the less intense item does not entail that the more intense item be endorsed

• For a Guttman scale, these are deterministic statements

• For a Mokken scale, these are probabilistic statements

• A Guttman error occurs when the more intense item is endorsed but not the less intense item

• Too many Guttman errors imply that items are not measuring the same attribute

[Figure: cross-classification of responses to the more intense item and the less intense item]

• This asymmetrical relationship between item pairs can be summarised with Loevinger’s H

– H is the coefficient of homogeneity between two items i and j

• Ranges from 0.0 to 1.0

– 0.0 indicates no association between items

– 1.0 indicates perfect association, given the differences in item intensity

– 1.0 also indicates no Guttman errors

• Mokken (1971) developed H for scale development

– Hij : Homogeneity of pair of items

– Hi : Homogeneity of item i with all items

– H : Homogeneity of scale
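The three coefficients can be sketched for dichotomous items using the usual definition of H as one minus the ratio of observed Guttman errors to the number expected if the items were independent. This is an illustrative sketch, not the author's software: the function name and data are hypothetical, and it assumes no item is endorsed by everyone or by no one (otherwise the expected error count is zero).

```python
from itertools import combinations


def loevinger_h(data):
    """Return (Hij per item pair, Hi per item, scale H) for 0/1 data."""
    n = len(data)
    k = len(data[0])
    # Proportion endorsing each item; a higher proportion means lower intensity.
    p = [sum(row[i] for row in data) / n for i in range(k)]

    observed, expected = {}, {}
    for i, j in combinations(range(k), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: more intense item endorsed, less intense item not.
        observed[(i, j)] = sum(
            1 for row in data if row[hard] == 1 and row[easy] == 0
        )
        # Errors expected if the two items were statistically independent.
        expected[(i, j)] = n * p[hard] * (1 - p[easy])

    h_pair = {pair: 1 - observed[pair] / expected[pair] for pair in observed}
    h_item = []
    for i in range(k):
        pairs = [pair for pair in observed if i in pair]
        h_item.append(
            1 - sum(observed[q] for q in pairs) / sum(expected[q] for q in pairs)
        )
    h_scale = 1 - sum(observed.values()) / sum(expected.values())
    return h_pair, h_item, h_scale


# Hypothetical data forming a perfect Guttman pattern.
perfect = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
# The same data plus one person who makes a Guttman error.
flawed = perfect + [[0, 1, 0, 0]]

_, _, h_perfect = loevinger_h(perfect)
_, _, h_flawed = loevinger_h(flawed)
print(h_perfect, round(h_flawed, 2))
```

With no Guttman errors the scale coefficient equals 1.0; a single inconsistent response pattern is enough to pull it below 1.0.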

• All Hij > 0

• Start with item pair with highest Hij

• Select third item to maximise scale H

• Proceed until H reaches threshold value c

• Produces a unidimensional scale

– c = 0.3; weak scale

– c = 0.4; medium scale

– c = 0.5; strong scale

– c = 1.0; perfect Guttman scale
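The thresholds can be applied mechanically, as in the short sketch below. The function name is hypothetical, and the label "unscalable" for values below 0.3 is an assumption added here for completeness; the slide lists only the weak, medium, strong and perfect categories.

```python
def classify_scale(h):
    """Label a scale by its homogeneity coefficient H."""
    if h >= 0.5:
        return "strong"
    if h >= 0.4:
        return "medium"
    if h >= 0.3:
        return "weak"
    return "unscalable"  # assumed label for H below the lowest threshold


for h in (0.25, 0.35, 0.45, 0.50):
    print(h, classify_scale(h))
```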

Results for GHQ-12

Step  Item  Scale H
1     p6d   0.79
1     n4d   0.79
2     n6d   0.73
3     n5d   0.68
4     n2d   0.64
5     n3d   0.61
6     p5d   0.59
7     p3d   0.57
8     p4d   0.55
9     n1d   0.53
10    p2d   0.51
11    p1d   0.50

• => the items of the GHQ-12 form a strong unidimensional scale

Monotone homogeneity model: GHQ-12

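Item  H     #vi  maxvi  zmax  #zsig
p1d   0.44  0    0.00   0.00  0
n1d   0.45  0    0.00   0.00  0
p2d   0.43  1    0.06   0.99  0
p3d   0.50  0    0.00   0.00  0
n2d   0.55  0    0.00   0.00  0
n3d   0.51  0    0.00   0.00  0
p4d   0.47  0    0.00   0.00  0
p5d   0.50  1    0.05   0.90  0
n4d   0.56  0    0.00   0.00  0
n5d   0.50  0    0.00   0.00  0
n6d   0.56  1    0.05   0.93  0
p6d   0.53  1    0.04   0.68  0

Key: H = item homogeneity; #vi = number of violations of monotonicity; maxvi = largest violation; zmax = largest test statistic for a violation; #zsig = number of significant violations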

• Small deviations from MH model but none significant

Conclusion

• The GHQ-12 is a strongly homogenous unidimensional scale

• Small deviations from monotone homogeneity, none significant

• The GHQ-12 summed score can rank order people by the measured attribute

• i.e. it can serve as an ordinal measure of severity of psychiatric impairment

• Compare to results of EFA/CFA studies

Example: Northwick Park dependency scale

• Item selection from pool of 16 items

Item  Scale H
Q8    0.93
Q5    0.93
Q9    0.93
Q2    0.91
Q1    0.88
Q13   0.87
Q7    0.84
Q12   0.82
Q6    0.79
Q14   0.76
Q4    0.74
Q3    0.70
Q11   0.67
Q15   0.62

• 14 items form unidimensional scale

• Two items with serious violations of monotone homogeneity

Item  H     #vi  maxvi  zmax  #zsig
Q3    0.45  6    0.25   2.88  4
Q11   0.32  5    0.28   3.43  2

Q3: help required using toilet (urination)

Q11: help required with drinking

• Some items decrease in probability as attribute increases

• With extreme dependency, patients require less help with drinking and emptying bladder

– Because at this extreme, they are more likely to be tube-fed and catheterised

• Hence, for these items, probability of endorsement decreases as dependency increases

– Scale is not monotone homogenous

• The summed score will not rank order people on the measured attribute

Summary

• The credibility of Health Psychology research & practice rests on its empirical evidence base

• This evidence base relies on the quality of questionnaire data

• The quality of questionnaire data may be compromised by the use of inappropriate methods

• We should stop relying on factor analysis & reliability coefficients & test the hypothesis that a set of items constitutes a scale