![Page 1: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/1.jpg)
Differential Item Functioning
Laura Gibbons, PhD
Thank you, Rich Jones, Frances Yang, and Paul Crane.
![Page 2: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/2.jpg)
Thank you, NIA
• Funding for this conference was made possible, in part, by Grant R13AG030995-01A1 from the National Institute on Aging. The views expressed in written conference materials or publications and by speakers and moderators do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention by trade names, commercial practices, or organizations imply endorsement by the U.S. Government.
![Page 3: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/3.jpg)
Thank you, Rich
Many (most?) of these slides were adapted or copied directly from his presentations.
Check out his 3-day workshop: http://www.hebrewseniorlife.org/latent-variable-methods-workshop
![Page 4: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/4.jpg)
Outline
1. Why do we care about DIF?
2. A few notes about Item Response Theory
3. What is DIF?
4. How do we find DIF?
5. What do we do when we find DIF?
6. Does DIF matter?
![Page 5: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/5.jpg)
We want unbiased tests
• We want a test score to mean the same thing in all subgroups of people.
• Test bias has been recognized as an issue for at least a century.
– Missing a question based on a reference to a regatta may indicate race and/or SES, not intelligence.
![Page 6: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/6.jpg)
• Test bias came to the forefront in the 1960s, particularly with respect to race.
– Many similar assumptions of a uniform culture turned out to be invalid.
– Educational testing, intelligence testing, insurance licensing, firefighting
– There hasn’t been a big political struggle over bias in cognitive aging or measures of affect, but it is an important research concern nonetheless.
![Page 7: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/7.jpg)
My favorite example of potential bias
• Does endorsing the item “I cry easily” mean the same thing in women as in men?
![Page 8: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/8.jpg)
Cognitive Tests
• Cognitive test scores should represent cognition, not sex, race, test language, age, SES, etc.
• True differences between groups in cognition exist.
• However, the difference should not affect the relationship between a person’s cognitive test score and their true cognitive ability.
![Page 9: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/9.jpg)
2. A few notes about Item Response Theory
![Page 10: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/10.jpg)
Key Ideas of IRT
• Persons have a certain ability or trait
• Items have characteristics
– difficulty (how hard the item is)
– discrimination (how well the item measures the ability)
– (I won’t talk about guessing)
• Person ability and item characteristics are estimated simultaneously and expressed on a unified metric
• Interval-level measure of ability or trait.
– This means that no matter what your ability level, a change of one point in the score represents an equivalent amount of change in ability. (NOT true for MMSE and most cognitive tests.)
![Page 11: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/11.jpg)
Some Things Rich (and others) Can Do with IRT
1. Refine measures
2. Identify ‘biased’ test items
3. Adaptive testing
4. Handle missing data at the item level
5. Equate measures
![Page 12: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/12.jpg)
Latent Ability / Trait
• Symbolized with θ_i (or η_i)
• Assumed to be continuously, and often normally, distributed in the population
• The more of the trait a person has, the more likely they are to ... whatever ... (endorse the symptom, get the answer right, etc.)
• The latent trait is that unobservable, hypothetical construct presumed to be measured by the test (assumed to “cause” item responses)
![Page 13: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/13.jpg)
Dimensionality
• It matters whether or not the latent trait is unidimensional.
– Knowing a person’s level on a single underlying latent trait is sufficient to predict their likelihood of success on an item.
– Item responses are dependent upon a person’s ability (and item characteristics) only.
– Secondary factors are trivial.
• There are methods that allow for departures from unidimensionality, but I won’t talk about them today.
![Page 14: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/14.jpg)
Item Characteristic Curve
• The fundamental conceptual unit of IRT
• Relates item responses to the ability presumed to cause them
• Represented with cumulative logistic or cumulative normal distributions
• Here we illustrate with dichotomous items, for simplicity
![Page 15: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/15.jpg)
Item Response Function
$P(y_{ij}=1 \mid \theta_i) = F[a_j(\theta_i - b_j)]$
[Figure: Example of an Item Characteristic Curve. x-axis: Latent Ability Distribution, -3.0 to 3.0; y-axis: Probability of Correct Response, 0.00 to 1.00]
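The item response function above can be sketched in a few lines of Python. This is only an illustration: it assumes F is the standard logistic function, and the parameter values (a = 1.5, b = 0.5) are hypothetical, not taken from any real item.

```python
import math

def item_response_function(theta, a, b):
    """P(y = 1 | theta) = F[a(theta - b)], with F assumed to be the logistic CDF."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: discrimination a = 1.5, difficulty b = 0.5
a_example, b_example = 1.5, 0.5
ability_grid = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
curve = [item_response_function(t, a_example, b_example) for t in ability_grid]

for t, p in zip(ability_grid, curve):
    print(f"theta = {t:+.1f}  P(correct) = {p:.3f}")
```

Evaluating the function at theta equal to b gives exactly 0.5, which is the definition of item difficulty used later in these slides.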
![Page 16: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/16.jpg)
Example of an Item Characteristic Curve: High Ability
[Figure: Example of an Item Characteristic Curve. x-axis: Latent Ability Distribution, -3.0 to 3.0; y-axis: Probability of Correct Response, 0.00 to 1.00]
A person with high ability has a high probability of responding correctly.
![Page 17: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/17.jpg)
Example of an Item Characteristic Curve: Low Ability
[Figure: Example of an Item Characteristic Curve. x-axis: Latent Ability Distribution, -3.0 to 3.0; y-axis: Probability of Correct Response, 0.00 to 1.00]
A person with low ability has a low probability of responding correctly.
![Page 18: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/18.jpg)
Item Difficulty
[Figure: Example of an Item Characteristic Curve. x-axis: Latent Ability Distribution, -3.0 to 3.0; y-axis: Probability of Correct Response, 0.00 to 1.00]
Item Difficulty: The level of ability at which a person has a 50% probability of responding correctly.
![Page 19: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/19.jpg)
Item and Person Ability are on the Same Metric
[Figure: an ICC overlaid on the latent trait density. x-axis: Latent Trait Level, -3 to 3; y-axes: Probability of a Correct Response, 0.00 to 1.00, and Latent Trait Density]
![Page 20: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/20.jpg)
Example of Two ICCs that Differ in Difficulty
[Figure: two ICCs that differ in difficulty. x-axis: Latent Ability Distribution, -3.0 to 3.0; y-axis: Probability of Correct Response, 0.00 to 1.00]
![Page 21: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/21.jpg)
Item Discrimination
[Figure: Example of an Item Characteristic Curve. x-axis: Latent Ability Distribution, -3.0 to 3.0; y-axis: Probability of Correct Response, 0.00 to 1.00]
Item Discrimination: How well the item separates persons of high and low ability; proportional to the slope of the ICC at the point of inflection.
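The "proportional to the slope" statement can be checked with a short derivation. This sketch assumes a logistic ICC with no scaling constant; with a scaling constant D in the exponent, a is simply replaced by Da.

$$
P(\theta) = \frac{1}{1 + e^{-a(\theta - b)}}, \qquad
\frac{dP}{d\theta} = a\,P(\theta)\,\bigl[1 - P(\theta)\bigr]
$$

$$
\text{At the inflection point } \theta = b:\quad P(b) = \tfrac{1}{2}
\;\Rightarrow\;
\left.\frac{dP}{d\theta}\right|_{\theta = b} = \frac{a}{4}
$$

So under this assumption, doubling the discrimination doubles the steepness of the curve at the item's difficulty.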
![Page 22: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/22.jpg)
The Steeper Curve Has Greater Discrimination
[Figure: Example of Two ICCs that Differ in Discrimination. x-axis: Latent Ability Distribution, -3.0 to 3.0; y-axis: Probability of Correct Response, 0.00 to 1.00]
![Page 23: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/23.jpg)
3. What is DIF?
![Page 24: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/24.jpg)
Identify Biased Test Items
Differential Item Functioning (DIF)
• Differences in endorsing a given item may be due to
– group differences in ability
– item bias
– both
• IRT can parse this out
• Item Bias = Differential Item Functioning + Rationale
• Most IRT users identify DIF when two groups do not have the same ICC
![Page 25: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/25.jpg)
• DIF: When a demographic characteristic interferes with the relationship expected between a person’s ability level and responses to an item.
• This is a conditional definition; we have to control for ability level, or else we can’t differentiate between DIF and differential test impact.
![Page 26: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/26.jpg)
[Figure: Example of group heterogeneity but no DIF. x-axis: Latent Trait Level, -3 to 3; y-axes: Probability of a Correct Response, 0.00 to 1.00, and Latent Trait Density]
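A small numerical sketch may help separate group heterogeneity from DIF. It assumes a logistic ICC, normally distributed ability, and hypothetical values throughout (one shared item with a = 1.2, b = 0; group means 0 and -0.5). Both groups share the same ICC, so there is no DIF, yet the raw proportion answering correctly still differs.

```python
import math

def icc(theta, a, b):
    """Logistic item characteristic curve (an assumed form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def normal_pdf(x, mean, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def marginal_p_correct(a, b, mean, sd=1.0, lo=-8.0, hi=8.0, steps=1600):
    """Average the ICC over a normal ability distribution (midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * width
        total += icc(t, a, b) * normal_pdf(t, mean, sd) * width
    return total

# Same hypothetical item for both groups; only the ability distributions differ.
a_item, b_item = 1.2, 0.0
p_group_0 = marginal_p_correct(a_item, b_item, mean=0.0)
p_group_1 = marginal_p_correct(a_item, b_item, mean=-0.5)

print(f"Proportion correct, group 0: {p_group_0:.3f}")
print(f"Proportion correct, group 1: {p_group_1:.3f}")
print("P(correct | theta = 0.3) is identical in both groups:", icc(0.3, a_item, b_item))
```

The unconditional gap reflects a true ability difference. Conditional on theta, the two groups are indistinguishable, which is what "no DIF" means.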
![Page 27: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/27.jpg)
Here the overall levels differ, and there is also Uniform DIF
[Figure: Example of group heterogeneity and uniform DIF. x-axis: Latent Trait Level, -3 to 3; y-axes: Probability of a Correct Response, 0.00 to 1.00, and Latent Trait Density]
![Page 28: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/28.jpg)
Non-Uniform (and uniform) DIF
[Figure: Example of group heterogeneity and non-uniform DIF. x-axis: Latent Trait Level, -3 to 3; y-axes: Probability of a Correct Response, 0.00 to 1.00, and Latent Trait Density]
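The difference between the two kinds of DIF can be sketched numerically. The code below assumes logistic ICCs and uses hypothetical parameter values. It treats uniform DIF as a shift in difficulty (b) with equal discrimination, and non-uniform DIF as a difference in discrimination (a).

```python
import math

def logit_gap(theta, ref, focal):
    """Difference in log-odds of a correct response, reference minus focal."""
    a_r, b_r = ref
    a_f, b_f = focal
    return a_r * (theta - b_r) - a_f * (theta - b_f)

def icc(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Uniform DIF: same a, different b. The log-odds gap is the same at every theta.
uniform_gaps = [logit_gap(t, (1.2, 0.0), (1.2, 0.5)) for t in grid]

# Non-uniform DIF: different a. The gap changes with theta and the curves cross.
ref, focal = (1.5, 0.0), (0.8, 0.2)
nonuniform_gaps = [logit_gap(t, ref, focal) for t in grid]
crossing = (ref[0] * ref[1] - focal[0] * focal[1]) / (ref[0] - focal[0])

print("Uniform DIF gaps:    ", [round(g, 3) for g in uniform_gaps])
print("Non-uniform DIF gaps:", [round(g, 3) for g in nonuniform_gaps])
print("Curves cross at theta =", round(crossing, 3))
```

With non-uniform DIF the item favors one group below the crossing point and the other group above it, so a single "direction" of bias does not exist.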
![Page 29: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/29.jpg)
4. How do we find DIF?
![Page 30: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/30.jpg)
Chi-square
• Educational testing still uses 2x2 tables and chi-squared tests.
• Pros: conceptually and computationally easy
• Cons:
– Needs huge samples with adequate discordance.
– Need to estimate ability and DIF in separate steps, potentially introducing bias.
– Assumes ability is unidimensional.
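One widely used procedure of this kind is the Mantel-Haenszel test, which stacks one 2x2 table (group by correct/incorrect) per ability stratum. The slide does not name a specific test, so treating it as Mantel-Haenszel is an assumption. The counts below are made up, the strata stand in for total-score bands, and the statistic is computed without a continuity correction.

```python
def mantel_haenszel(strata):
    """Common odds ratio and chi-square (1 df, no continuity correction).

    Each stratum is (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    num = den = observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        n_ref, n_focal = a + b, c + d
        n_correct, n_incorrect = a + c, b + d
        num += a * d / n
        den += b * c / n
        observed += a
        expected += n_ref * n_correct / n
        variance += n_ref * n_focal * n_correct * n_incorrect / (n * n * (n - 1))
    odds_ratio = num / den  # assumes at least one stratum has b * c > 0
    chi_square = (observed - expected) ** 2 / variance
    return odds_ratio, chi_square

# Hypothetical counts in three score strata (low, middle, high)
example_strata = [(20, 30, 10, 40), (30, 20, 20, 30), (40, 10, 30, 20)]
mh_odds_ratio, mh_chi_square = mantel_haenszel(example_strata)
print(f"MH common odds ratio: {mh_odds_ratio:.2f}")
print(f"MH chi-square (1 df): {mh_chi_square:.2f}")
```

Because the strata are built from an observed score first and DIF is tested second, this sketch also illustrates the two-step limitation listed among the cons.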
![Page 31: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/31.jpg)
Logistic Regression
• Logistic regression, or ordinal logistic regression for ordinal items.
• Uses the logistic link for the ICC equation:
$P(y_{ij}=1 \mid \theta_i) = F[a_j(\theta_i - b_j)]$
![Page 32: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/32.jpg)
The 2 Parameter Logistic model
• Logit P(Y=1|a,b,θ) = Da(θ-b)
– Models probability that a person correctly responds to an item given the item parameters (a,b) and their person ability level θ
– b is the item difficulty
• When θ=b, 50% probability of getting the item correct
– a is item discrimination
• a determines slope around the point where θ=b
– D is a constant
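The slide does not say what value D takes. A common convention, assumed here, is D = 1.7, chosen so that the logistic curve closely tracks the cumulative normal (normal ogive) curve. The sketch below checks how close the two are for a hypothetical item with a = 1 and b = 0.

```python
import math

def logistic_2pl(theta, a, b, D=1.0):
    """2PL model: logit P(Y = 1) = D * a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def normal_ogive(theta, a, b):
    """Cumulative normal version of the same curve."""
    return 0.5 * math.erfc(-a * (theta - b) / math.sqrt(2.0))

grid = [i / 100.0 for i in range(-400, 401)]
max_gap_scaled = max(abs(logistic_2pl(t, 1.0, 0.0, D=1.7) - normal_ogive(t, 1.0, 0.0)) for t in grid)
max_gap_unscaled = max(abs(logistic_2pl(t, 1.0, 0.0, D=1.0) - normal_ogive(t, 1.0, 0.0)) for t in grid)

print(f"Largest gap with D = 1.7: {max_gap_scaled:.4f}")
print(f"Largest gap with D = 1.0: {max_gap_unscaled:.4f}")
```

Whatever D is, the probability at θ = b stays at 50%, so D rescales discrimination without moving the difficulty.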
![Page 33: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/33.jpg)
Logistic Regression
1. P(Y=1| θ) = f(β1 θ)
2. P(Y=1| θ, group) = f(β1 θ + β2*group)
3. P(Y=1| θ, group) = f(β1 θ + β2*group + β3*θ*group)
– Uniform DIF: Compare models 1 and 2.
– Non-Uniform DIF: Compare models 2 and 3.
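The nested-model comparison can be sketched with a small, dependency-free logistic regression. Several things here are assumptions rather than the presenter's procedure: the data are simulated with uniform DIF only, the true θ is used in place of an estimated ability, an intercept is added to each model, and models are compared with a 1-df likelihood ratio test.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

# Simulated item with uniform DIF only (hypothetical effect sizes)
rng = random.Random(0)
n = 2000
theta = [rng.gauss(0.0, 1.0) for _ in range(n)]
group = [i % 2 for i in range(n)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(1.2 * t + 1.0 * g))) else 0
     for t, g in zip(theta, group)]

_, ll1 = fit_logistic([[1.0, t] for t in theta], y)                            # model 1
_, ll2 = fit_logistic([[1.0, t, g] for t, g in zip(theta, group)], y)          # model 2
_, ll3 = fit_logistic([[1.0, t, g, t * g] for t, g in zip(theta, group)], y)   # model 3

lr_uniform = max(2.0 * (ll2 - ll1), 0.0)
lr_nonuniform = max(2.0 * (ll3 - ll2), 0.0)
# For 1 df, the chi-square upper tail is erfc(sqrt(x / 2))
p_uniform = math.erfc(math.sqrt(lr_uniform / 2.0))
p_nonuniform = math.erfc(math.sqrt(lr_nonuniform / 2.0))
print(f"Uniform DIF:     LR = {lr_uniform:.1f}, p = {p_uniform:.3g}")
print(f"Non-uniform DIF: LR = {lr_nonuniform:.1f}, p = {p_nonuniform:.3g}")
```

In practice θ would come from a separate IRT run, and other criteria (such as a change in β1 between models) are sometimes used alongside or instead of the likelihood ratio test.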
![Page 34: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/34.jpg)
Logistic Regression
• Pros:
– Handles fairly small samples.
– Quick and easy if you’ve got Stata and Parscale, or R
• Cons:
– Need to estimate ability and DIF in separate steps, potentially introducing bias.
– Assumes ability is unidimensional.
– Need specific software.
![Page 35: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/35.jpg)
Latent Variable Modeling
• Single and 2-group MIMIC* models.
• “We” use Mplus for this.
• Compare the loadings and intercepts of the test items.
* Multiple Indicators Multiple Causes
![Page 36: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/36.jpg)
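Factor Analysis with Covariates
[Figure: MIMIC path diagram with a covariate x, a latent factor η, and four indicators y1 to y4 with latent responses y1* to y4*; path labels not recoverable]
MIMIC Model: Multiple Indicators, Multiple Causes
• Each item response is modeled as a loading times η, plus a direct effect of x, plus a residual, assuming VAR(η) = 1 and η centered at 0.
• The IRT parameters a and b can be computed from the factor-analytic estimates.
• The direct effect of x on an item is sufficient to describe uniform DIF.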
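For reference, the standard textbook conversion between factor-analytic and IRT parameters is sketched below. The symbols are assumptions, since the slide's own notation is not recoverable: λ_j is the standardized loading and τ_j the threshold of item j, the latent response y* has unit variance, VAR(η) = 1, and the result is on the normal-ogive metric for a single factor with no covariate effect on the item.

$$
y_j^{*} = \lambda_j \eta + \varepsilon_j, \qquad
\operatorname{Var}(\varepsilon_j) = 1 - \lambda_j^{2}
$$

$$
a_j = \frac{\lambda_j}{\sqrt{1 - \lambda_j^{2}}}, \qquad
b_j = \frac{\tau_j}{\lambda_j}
$$

For example, a loading of 0.6 with a threshold of 0.3 corresponds to a = 0.6 / 0.8 = 0.75 and b = 0.3 / 0.6 = 0.5 under these assumptions.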
![Page 37: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/37.jpg)
Multiple Group (MG) MIMIC
[Figure: the MIMIC path diagram (covariate x, latent factor η, indicators y1 to y4 with latent responses y1* to y4*) drawn separately for group = 0 and group = 1]
![Page 38: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/38.jpg)
Latent Variable Modeling
• Pros:
– Simultaneous modeling of differences in ability and item-level performance
– Capable of handling multidimensional constructs
– Can use continuous variables for Uniform DIF
• Cons:
– Not precisely the IRT model
– Modeling Non-Uniform DIF a challenge (Multiple Group models required)
– Need specialized software.
![Page 39: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/39.jpg)
5. What do we do when we find DIF?
![Page 40: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/40.jpg)
Discard the item?
• In educational settings, often items with DIF are discarded.
• Unattractive option for us
– Tests are too short as it is.
– Lose variation and precision.
– DIF doesn’t mean that the item doesn’t measure the underlying construct at all, just that it does so differently in different groups.
![Page 41: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/41.jpg)
Better to account for the DIF
• In logistic regression:
– Constrain parameters for DIF-free items to be identical across groups
– Estimate parameters for items found with DIF separately in appropriate groups
• In latent variable modeling, it’s all one big model.
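One way the "separate parameters in appropriate groups" idea is sometimes set up is to split each DIF item into group-specific copies, leaving the copy blank for people outside that group, and then recalibrate. The sketch below shows only that data restructuring step. The function name, column naming scheme, and example data are hypothetical, and this is not necessarily how the presenter's software does it.

```python
def split_dif_items(responses, groups, dif_items):
    """Replace each DIF item with one column per group; other items stay shared.

    responses: list of dicts mapping item name -> score
    groups:    list of group labels, one per person
    dif_items: set of item names flagged with DIF
    """
    group_levels = sorted(set(groups))
    rows = []
    for person, g in zip(responses, groups):
        row = {}
        for item, score in person.items():
            if item in dif_items:
                for level in group_levels:
                    # Only the person's own group column holds the response
                    row[f"{item}_g{level}"] = score if level == g else None
            else:
                row[item] = score
        rows.append(row)
    return rows

# Hypothetical data: item "q2" was flagged with DIF
example_responses = [{"q1": 1, "q2": 0, "q3": 1}, {"q1": 0, "q2": 1, "q3": 1}]
example_groups = [0, 1]
restructured = split_dif_items(example_responses, example_groups, {"q2"})
for row in restructured:
    print(row)
```

After this step, the DIF-free items anchor the scale for everyone, while each group-specific copy of a flagged item gets its own parameters when the model is refit.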
![Page 42: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/42.jpg)
If we account for DIF, is the test unbiased?
• We can only adjust for measured covariates.
• Confounders such as education level may mean different things for different groups.
• We may lack power or the data may be too sparse to account for all the DIF.
• If most of the items on a test are biased, it’s hard to get anywhere.
![Page 43: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/43.jpg)
6. Does DIF matter?
![Page 44: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/44.jpg)
DIF Impact
• We find DIF in a lot of cognitive tests.
• It’s important to assess the impact of items with DIF on the final score.
• Often DIF in individual items favors one group in some items and the other group in others, the net result being a score that has little bias.
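The cancellation described above can be illustrated by comparing expected total scores for two groups. Everything in this sketch is hypothetical: a four-item test with logistic ICCs, where one item is harder for the focal group by 0.3 and another is easier by 0.3.

```python
import math

def icc(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Expected number correct at a given ability (test characteristic curve)."""
    return sum(icc(theta, a, b) for a, b in items)

# (a, b) per item; the first two items have DIF in opposite directions
reference_items = [(1.0, 0.0), (1.0, 0.0), (1.0, -1.0), (1.0, 1.0)]
focal_items = [(1.0, 0.3), (1.0, -0.3), (1.0, -1.0), (1.0, 1.0)]

grid = [i / 10.0 for i in range(-30, 31)]
item_gap = max(abs(icc(t, *reference_items[0]) - icc(t, *focal_items[0])) for t in grid)
score_gap = max(abs(expected_score(t, reference_items) - expected_score(t, focal_items))
                for t in grid)

print(f"Largest gap for the first DIF item alone: {item_gap:.3f}")
print(f"Largest gap in expected total score:      {score_gap:.3f}")
```

The item-level gap is several times larger than the score-level gap here, which is the pattern of offsetting DIF the slide describes. If the DIF items all favored the same group, the gaps would add up instead.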
![Page 45: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/45.jpg)
Good for the field, bad for my job security
• So far, in my experience, cognitive scores accounting for DIF correlate very highly with the original IRT scores.
– Even for DIF with respect to test language.
![Page 46: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/46.jpg)
Here at Friday Harbor
• How about depression scales? My workgroup will look.
• Alden’s calibrated scores? Fascinating missing data question.
![Page 47: Differential Item Functioning](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f50550346895dce2e9f/html5/thumbnails/47.jpg)
Despite what I said about usually finding minimal impact,
DIF should be assessed as part of any test validation.