
Copyright © 2011 Educational Testing Service. All rights reserved.

Generating Automated Text Complexity Classifications that are Aligned with the Common Core Text Complexity Standard

Kathleen M. Sheehan Educational Testing Service

[email protected]

CCSSO, National Conference on Student Assessment, June 2011, Orlando, FL


Why Text Complexity is Important

• Form Comparability
– Need to support the claim that the passages included on each new form of an assessment are comparable to those included on previous forms

• Compliance with Standards
– Need to select texts that are consistent with the text complexity model outlined in the CC State Standards
– Designed to ensure that students are exposed to texts at steadily increasing complexity levels as they progress through school so that all students are prepared for the advanced reading demands of college and workforce training programs

• Valid Feedback about Skill Mastery
– Need to measure skill mastery within a grade-appropriate range of text complexity since “Even experienced readers may fail to make inferences that they would ordinarily make without a problem when text materials are very challenging” (van den Broeck, 2005, p.116)


Key Differences Among Existing Approaches for Modeling Text Complexity

• Construct Representation
– How closely does the theory underlying the approach line up with the view of Text Complexity outlined in the Standards?

• Genre Effects
– Are genre effects accounted for? How?

• Heaps' Law
– Is Heaps' Law accounted for? How?


Construct Representation

[Table: systems compared (rows) against the features each one measures (columns)]

Columns: System; Syntactic Complexity; Vocab. Difficulty; Degree of Narr.; Cohesion (Ref & SM)

Rows: Lexile (MM); Flesch-Kincaid; ATOS (RL); REAP (CMU); Coh-Metrix (UM); SourceRater (ETS)

Narr. = Narrativity, Ref = Referential Cohesion, SM = Situation Model Cohesion


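Shallow vs. Deep NLP

[Table: systems compared (rows) against the features each one measures (columns)]

Columns: System; Syntactic Complexity; Vocab. Difficulty; Degree of Narr.; Cohesion (Ref & SM)

Rows: Lexile (MM); Flesch-Kincaid; ATOS (RL); REAP (CMU); Coh-Metrix (UM); SourceRater (ETS)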


Genre Effects: Many Useful Features Function Differently Within Inf. & Literary Texts

[Figure: Average Grade Level (y-axis, 4 to 10) vs. Average ETS Word Frequency (x-axis, 52 to 66, running from Less Familiar to More Familiar), plotted separately for Literary and Informational texts]

Result:

Literary Predictions are Too Low

Informational Predictions are Too High


What Happens When Genre Effects are Ignored

[Figure, two panels, each plotted separately for Literary and Informational texts. Panel 1 (Flesch-Kincaid): Mean Flesch-Kincaid GL Score (y-axis, 2 to 12) vs. Common Core Grade Band (x-axis, 2 to 12). Panel 2 (Lexile Framework): Mean Lexile GL Score (y-axis, 2 to 12) vs. Common Core Grade Band (x-axis, 2 to 12)]

Analysis of 102 texts from CCSS, Appendix B


Genre Bias is Not Present When Distinct Models are Estimated for Informational & Literary Texts

[Figure: Mean SourceRater GL Score (y-axis, 2 to 12), with x-axis ticks running from 2 to 12]

SourceRater Analysis of 102 texts from CCSS, Appendix B


Heaps' Law

• The probability that a word collection of size s includes a repeated content word increases with s.

• This is a fundamental characteristic of language that has been validated in a wide array of applications (Heaps, 1978)
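The first point can be illustrated with a small calculation. The sketch below is not taken from the talk: it assumes, purely for illustration, that words are drawn independently and uniformly from a vocabulary of fixed size, which real content-word frequencies do not satisfy. The function name `prob_repeat` and the vocabulary size of 1000 are hypothetical.

```python
def prob_repeat(s, vocab_size):
    """P(at least one repeated word) among s independent, uniform draws.

    Assumption (not from the slides): every word in the vocabulary is
    equally likely, so this is the classic birthday-problem calculation.
    """
    p_all_distinct = 1.0
    for i in range(s):
        # The (i+1)-th word must avoid the i distinct words already drawn.
        p_all_distinct *= max(0.0, 1.0 - i / vocab_size)
    return 1.0 - p_all_distinct


# The probability of a repeat grows steadily with the collection size s.
for s in (5, 10, 20, 40, 80):
    print(s, round(prob_repeat(s, vocab_size=1000), 3))
```

Even under this simplified model, the chance of a repeat rises with s alone, with no change in how cohesive the text is.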


Heaps' Law and Cohesion Theory Lead to Opposite Conclusions About a Popular Measure of Cohesion: The Prop. of Sentences with Backward References

Cohesion Theory: Backward References (i.e., repeated content words) help the reader follow the thread of an argument or story, so a high proportion of Backward References is an indicator of comprehension ease

Heaps' Law: Repeated content words are more likely to appear in longer sentences, but longer sentences are more difficult to parse, so a high proportion of Backward References is an indicator of comp. difficulty
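A minimal sketch of how the proportion of sentences with backward references might be computed is given here. It is an illustration, not the SourceRater implementation: it matches lowercased words exactly rather than stems, uses a tiny hypothetical stop list to approximate "content words", and takes the number of adjacent sentence pairs as the denominator. All names are hypothetical.

```python
import re

# Hypothetical, deliberately small stop list; a real system would use a
# fuller list and compare word stems rather than surface forms.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "is",
    "are", "was", "were", "it", "its", "that", "this", "he", "she",
    "they", "so", "by", "for", "with", "as", "at",
}


def content_words(sentence):
    """Lowercased word tokens of a sentence, minus stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def prop_backward_references(sentences):
    """Share of adjacent sentence pairs in which the second sentence
    repeats at least one content word from the first."""
    if len(sentences) < 2:
        return 0.0
    hits = 0
    for prev, curr in zip(sentences, sentences[1:]):
        if content_words(prev) & content_words(curr):
            hits += 1
    return hits / (len(sentences) - 1)


passage = [
    "The river flooded the town.",
    "The town was evacuated.",
    "Rain fell for days.",
]
print(prop_backward_references(passage))
```

In the sample passage only the first pair shares a content word ("town"), so one of the two adjacent pairs counts as a backward reference.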


Empirical Results: 774 Informational & Literary Passages from High-Stakes Reading Assessments

[Figure: Average Grade Level (y-axis, 4 to 10) vs. Stem Overlap Adjacent (x-axis, 0.0 to 1.0)]

Prop. of Sentences with Backward References (Informational Passages Only)

Grade    N    Mean   SD
3        46   0.42   0.19
4-5      89   0.46   0.17
6-7      76   0.42   0.17
8-9      87   0.46   0.18
10-11    44   0.47   0.17

Knowing the Proportion of Sentences with Backward References Does Not Reduce Uncertainty about Text GL

Differences in Mean Scores are Not Significant
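The slides do not say which significance test was used. As a rough sanity check only, a one-way ANOVA F statistic can be reconstructed from the N, Mean, and SD values in the table; the choice of ANOVA, the function name, and the use of rounded summary values are all assumptions of this sketch.

```python
# (grade band, N, mean, SD) as reported for informational passages.
SUMMARY = [
    ("3", 46, 0.42, 0.19),
    ("4-5", 89, 0.46, 0.17),
    ("6-7", 76, 0.42, 0.17),
    ("8-9", 87, 0.46, 0.18),
    ("10-11", 44, 0.47, 0.17),
]


def anova_f_from_summary(groups):
    """One-way ANOVA F statistic computed from group sizes, means, and SDs.

    Returns (F, df_between, df_within). Assumes the SDs are sample
    standard deviations; results are approximate because the inputs
    are rounded to two decimals.
    """
    n_total = sum(n for _, n, _, _ in groups)
    grand_mean = sum(n * m for _, n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for _, n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = anova_f_from_summary(SUMMARY)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Under these assumptions F comes out only a little above 1, well short of the 5% critical value of approximately 2.4 for these degrees of freedom, which is consistent with the statement that the differences in mean scores are not significant.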


A New Cohesion Measure that Accounts for Effects Due to Heaps' Law

• For each possible sentence length, estimate p_k, the expected probability of a backward reference

• Calculate y_i = standardized difference between the observed and expected proportion of backward references in a passage

y_i > 0: More backward references than expected, an indication of comprehension ease

y_i < 0: Fewer backward references than expected, an indication of comprehension difficulty
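The two steps above can be sketched as follows. The slides do not give the exact standardization, so this sketch assumes one plausible form: the observed count of backward references minus the expected count, divided by the standard deviation implied by treating each sentence pair as an independent Bernoulli trial. It also uses raw per-length proportions with no smoothing and ignores genre; a separate table could be estimated per genre. All function names are hypothetical.

```python
import math
from collections import defaultdict


def estimate_p_by_length(pairs):
    """Estimate the probability of a backward reference for each length k.

    pairs: iterable of (pair_length, has_backward_ref) tuples taken from
    a reference corpus of adjacent sentence pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # length -> [hits, total]
    for length, flag in pairs:
        counts[length][0] += int(flag)
        counts[length][1] += 1
    return {k: hits / total for k, (hits, total) in counts.items()}


def standardized_difference(pair_lengths, has_backward_ref, p_by_length):
    """Assumed form of the score: (observed - expected) / sqrt(variance),
    with the variance taken from independent Bernoulli trials."""
    observed = sum(int(flag) for flag in has_backward_ref)
    expected = sum(p_by_length[k] for k in pair_lengths)
    variance = sum(p_by_length[k] * (1 - p_by_length[k]) for k in pair_lengths)
    if variance == 0:
        return 0.0
    return (observed - expected) / math.sqrt(variance)


# Toy reference corpus: longer pairs repeat content words more often.
corpus = [(10, True), (10, False), (10, False), (10, False),
          (40, True), (40, True), (40, True), (40, False)]
p_by_length = estimate_p_by_length(corpus)

# A passage with one short pair and one long pair, both with a repeat.
score = standardized_difference([10, 40], [True, True], p_by_length)
print(round(score, 2))
```

A positive score means the passage contains more backward references than its sentence lengths alone would predict; a negative score means fewer.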


[Figure: P(Backward Reference | Length, Genre) (y-axis, 0.0 to 1.0) vs. Pair Length in words (x-axis, 0 to 120), plotted separately for Literary and Informational texts]

p_k = P(Backward Ref | Length = k)

Estimated from 31,885 pairs of adjacent sentences.


This New Measure is Very Useful for Distinguishing Texts at Low and High GLs

[Figure, two panels. Panel 1 (Prop. of Sentences with Backward References): Average Grade Level (y-axis, 4 to 10) vs. Stem Overlap Adjacent (x-axis, 0.0 to 1.0). Panel 2 (Std. Difference Between Observed & Expected Number of Backward References): Average Grade Level (y-axis, 4 to 10) vs. Standardized Stem Overlap Adjacent (x-axis, -4 to 4), plotted separately for Literary and Informational texts]


Summary

• Construct Representation
– Systems that only measure vocabulary difficulty and syntactic complexity may not adequately represent the complexity construct defined in the new Standards

• Genre Effects
– Systems that assume equivalent genre effects run the risk of providing complexity classifications that are too high for informational texts, and too low for literary texts

• Heaps' Law
– Systems that do not account for Heaps' Law may not provide valid information about key dimensions of text variation, e.g., referential cohesion
