Copyright © 2011 Educational Testing Service. All rights reserved.
Generating Automated Text Complexity Classifications that are
Aligned with the Common Core Text Complexity Standard
Kathleen M. Sheehan, Educational Testing Service
CCSSO, National Conference on Student Assessment, June 2011, Orlando, FL
Why Text Complexity is Important
• Form Comparability
– Need to support the claim that the passages included on each new form of an assessment are comparable to those included on previous forms
• Compliance with Standards
– Need to select texts that are consistent with the text complexity model outlined in the CC State Standards
– Designed to ensure that students are exposed to texts at steadily increasing complexity levels as they progress through school so that all students are prepared for the advanced reading demands of college and workforce training programs
• Valid Feedback about Skill Mastery
– Need to measure skill mastery within a grade-appropriate range of text complexity since “Even experienced readers may fail to make inferences that they would ordinarily make without a problem when text materials are very challenging” (van den Broeck, 2005, p. 116)
Key Differences Among Existing Approaches for Modeling Text Complexity
• Construct Representation
– How closely does the theory underlying the approach line up with the view of Text Complexity outlined in the Standards?
• Genre Effects
– Are genre effects accounted for? How?
• Heaps' Law
– Is Heaps' Law accounted for? How?
Construct Representation
Table columns: System; Syntactic Complexity; Vocab. Difficulty; Degree of Narr.; Cohesion (Ref & SM)
Systems compared: Lexile (MM), Flesch-Kincaid, ATOS (RL), REAP (CMU), Coh-Metrix (UM), SourceRater (ETS)
Narr. = Narrativity, Ref = Referential Cohesion, SM = Situation Model Cohesion
Shallow vs. Deep NLP
Table columns: System; Syntactic Complexity; Vocab. Difficulty; Degree of Narr.; Cohesion (Ref & SM)
Systems compared: Lexile (MM), Flesch-Kincaid, ATOS (RL), REAP (CMU), Coh-Metrix (UM), SourceRater (ETS)
Genre Effects: Many Useful Features Function Differently Within Inf. & Literary Texts
[Figure: Average Grade Level (y-axis, 4 to 10) against Average ETS Word Frequency (x-axis, 52 to 66, from Less Familiar to More Familiar), shown separately for Literary and Informational texts]
Result:
• Literary Predictions are Too Low
• Informational Predictions are Too High
What Happens When Genre Effects are Ignored
[Figure, two panels: Flesch-Kincaid and Lexile Framework. Each plots the mean GL score (Mean Flesch-Kincaid GL Score; Mean Lexile GL Score; y-axis, 2 to 12) against Common Core Grade Band (x-axis, 2 to 12), shown separately for Literary and Informational texts]
Analysis of 102 texts from CCSS, Appendix B
Genre Bias is Not Present When Distinct Models are Estimated for Informational & Literary Texts
[Figure: Mean SourceRater GL Score (y-axis, 2 to 12) against Common Core Grade Band (x-axis, 2 to 12), shown separately for Literary and Informational texts]
SourceRater Analysis of 102 texts from CCSS, Appendix B
Heaps' Law
• The probability that a word collection of size s includes a repeated content word increases with s.
• This is a fundamental characteristic of language that has been validated in a wide array of applications (Heaps, 1978)
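The first point can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that each content word is drawn independently and uniformly from a fixed vocabulary; real text does not behave this way, and the function name `prob_repeat` and the vocabulary size of 1000 are hypothetical. Under that assumption the probability of at least one repeat can be computed exactly, and it rises with the collection size s.

```python
def prob_repeat(s: int, vocab_size: int) -> float:
    """P(at least one repeated word) among s words drawn uniformly at random
    from vocab_size distinct content words (a toy assumption, not real text)."""
    if s > vocab_size:
        return 1.0  # pigeonhole: a repeat is certain
    p_all_distinct = 1.0
    for i in range(s):
        # The (i+1)-th word must differ from the i distinct words already drawn.
        p_all_distinct *= (vocab_size - i) / vocab_size
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Print the repeat probability for collections of increasing size.
    for s in (5, 10, 20, 40, 80):
        print(s, round(prob_repeat(s, vocab_size=1000), 3))
```

Even in this simplified setting, longer word collections are more likely to contain a repeat regardless of how cohesive the text is, which is the property the following slides build on.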
Heaps' Law and Cohesion Theory Lead to Opposite Conclusions About a Popular Measure of Cohesion: The Prop. of Sentences with Backward References
• Cohesion Theory: Backward References (i.e., repeated content words) help the reader follow the thread of an argument or story, so a high proportion of Backward References is an indicator of comprehension ease
• Heaps' Law: Repeated content words are more likely to appear in longer sentences, but longer sentences are more difficult to parse, so a high proportion of Backward References is an indicator of comp. difficulty
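To make the measure concrete, here is a minimal sketch of how the proportion of sentences with backward references might be computed. Several details are assumptions rather than anything stated in the slides: the tiny stop-word list, exact lowercase matching in place of stemming, and the choice to count only sentences that have a predecessor. The names `content_words` and `prop_backward_refs` are hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real system would use a
# fuller list and a stemmer so that "runs" and "running" count as a match).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "was",
              "it", "he", "she", "they", "that", "this", "for", "with", "at"}


def content_words(sentence: str) -> set[str]:
    """Lowercased alphabetic tokens with stop words removed."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def prop_backward_refs(sentences: list[str]) -> float:
    """Share of sentences (from the second onward) that repeat at least one
    content word from the sentence immediately before them."""
    if len(sentences) < 2:
        return 0.0
    hits = 0
    for prev, cur in zip(sentences, sentences[1:]):
        if content_words(prev) & content_words(cur):
            hits += 1
    return hits / (len(sentences) - 1)


if __name__ == "__main__":
    passage = [
        "The dog chased the ball.",
        "The ball rolled into the pond.",
        "A duck watched quietly.",
    ]
    print(prop_backward_refs(passage))
```

In the sample passage, only the second sentence repeats a content word ("ball") from its predecessor, so one of the two adjacent pairs counts as a backward reference.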
Empirical Results: 774 Informational & Literary Passages from High-Stakes Reading Assessments
[Figure: Average Grade Level (y-axis, 4 to 10) against Stem Overlap Adjacent (x-axis, 0.0 to 1.0), shown separately for Literary and Informational passages]
Prop. of Sentences with Backward References (Informational Passages Only)
Grade    N     Mean    SD
3        46    0.42    0.19
4-5      89    0.46    0.17
6-7      76    0.42    0.17
8-9      87    0.46    0.18
10-11    44    0.47    0.17
• Knowing the Proportion of Sentences with Backward References Does Not Reduce Uncertainty about Text GL
• Differences in Mean Scores are Not Significant
A New Cohesion Measure that Accounts for Effects Due to Heaps' Law
• For each possible sentence length, estimate p_k, the expected probability of a backward reference
• Calculate y_i = standardized difference between the observed and expected proportion of backward references in a passage
– y_i > 0: More backward references than expected, an indication of comprehension ease
– y_i < 0: Fewer backward references than expected, an indication of comprehension difficulty
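The two steps above can be sketched as follows. The slides do not spell out how the difference is standardized, so this sketch assumes one plausible choice: each adjacent sentence pair is treated as an independent Bernoulli trial with success probability p_k, and the observed minus expected count is divided by the resulting standard deviation (dividing both counts by the number of pairs to get proportions yields the same standardized value). Taking k to be the combined length in words of an adjacent sentence pair is also an assumption, and the names `estimate_pk` and `standardized_difference` are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Each pair is (pair_length_in_words, has_backward_reference).
Pair = tuple[int, bool]


def estimate_pk(corpus_pairs: list[Pair]) -> dict[int, float]:
    """p_k = observed share of corpus pairs of length k with a backward reference."""
    totals: dict[int, int] = defaultdict(int)
    hits: dict[int, int] = defaultdict(int)
    for length, has_ref in corpus_pairs:
        totals[length] += 1
        hits[length] += int(has_ref)
    return {k: hits[k] / totals[k] for k in totals}


def standardized_difference(passage_pairs: list[Pair], pk: dict[int, float]) -> float:
    """y = (observed - expected) / sd, treating each pair as an independent
    Bernoulli(p_k) trial (an assumed standardization, not taken from the slides).
    Every pair length in the passage must already have an estimate in pk; a
    fuller system would smooth or interpolate p_k across lengths."""
    observed = sum(int(has_ref) for _, has_ref in passage_pairs)
    expected = sum(pk[length] for length, _ in passage_pairs)
    variance = sum(pk[length] * (1 - pk[length]) for length, _ in passage_pairs)
    if variance == 0:
        return 0.0
    return (observed - expected) / sqrt(variance)


if __name__ == "__main__":
    # Made-up illustrative data, not results from the study.
    corpus = [(10, True), (10, False), (20, True), (20, True), (20, False), (20, True)]
    pk = estimate_pk(corpus)
    print(standardized_difference([(10, True), (20, True)], pk))    # positive
    print(standardized_difference([(10, False), (20, False)], pk))  # negative
```

Because the expectation is conditioned on length, a passage made up of long sentences is no longer credited with cohesion simply because long sentences are more likely to contain repeats.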
[Figure: p_k = P(Backward Ref | Length = k), the probability of a backward reference given pair length and genre (y-axis, 0.0 to 1.0), against Pair Length in words (x-axis, 0 to 120), shown separately for Literary and Informational texts]
Estimated from 31,885 pairs of adjacent sentences.
This New Measure is Very Useful for Distinguishing Texts at Low and High GLs
[Figure, two panels, each plotting Average Grade Level (y-axis, 4 to 10) for Literary and Informational texts. First panel: Prop. of Sentences with Backward References (Stem Overlap Adjacent, x-axis, 0.0 to 1.0). Second panel: Std. Difference Between Observed & Expected Number of Backward References (Standardized Stem Overlap Adjacent, x-axis, -4 to 4)]
Summary
• Construct Representation
– Systems that only measure vocabulary difficulty and syntactic complexity may not adequately represent the complexity construct defined in the new Standards
• Genre Effects
– Systems that assume equivalent genre effects run the risk of providing complexity classifications that are too high for informational texts, and too low for literary texts
• Heaps' Law
– Systems that do not account for Heaps' Law may not provide valid information about key dimensions of text variation, e.g., referential cohesion