Copyright © 2011 Educational Testing Service. All rights reserved.

Generating Automated Text Complexity Classifications that are Aligned with the Common Core Text Complexity Standard

Kathleen M. Sheehan
Educational Testing Service
[email protected]

CCSSO, National Conference on Student Assessment, June 2011, Orlando, FL




Why Text Complexity is Important

• Form Comparability
– Need to support the claim that the passages included on each new form of an assessment are comparable to those included on previous forms

• Compliance with Standards
– Need to select texts that are consistent with the text complexity model outlined in the CC State Standards
– Designed to ensure that students are exposed to texts at steadily increasing complexity levels as they progress through school so that all students are prepared for the advanced reading demands of college and workforce training programs

• Valid Feedback about Skill Mastery
– Need to measure skill mastery within a grade-appropriate range of text complexity since "Even experienced readers may fail to make inferences that they would ordinarily make without a problem when text materials are very challenging" (van den Broeck, 2005, p.116)


Key Differences Among Existing Approaches for Modeling Text Complexity

• Construct Representation
– How closely does the theory underlying the approach line up with the view of Text Complexity outlined in the Standards?

• Genre Effects
– Are genre effects accounted for? How?

• Heaps Law
– Is Heaps Law accounted for? How?


Construct-Representation

Columns: System | Syntactic Complexity | Vocab. Difficulty | Degree of Narr. | Cohesion (Ref & SM)

Rows (systems): Lexile (MM); Flesch-Kincaid; ATOS (RL); REAP (CMU); Coh-Metrix (UM); SourceRater (ETS)

Narr. = Narrativity, Ref = Referential Cohesion, SM = Situation Model Cohesion


Shallow vs. Deep NLP

Columns: System | Syntactic Complexity | Vocab. Difficulty | Degree of Narr. | Cohesion (Ref & SM)

Rows (systems): Lexile (MM); Flesch-Kincaid; ATOS (RL); REAP (CMU); Coh-Metrix (UM); SourceRater (ETS)
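For reference, the Flesch-Kincaid grade level is a well-known example of a score built only from surface counts: average words per sentence and average syllables per word. The sketch below applies the published formula; the vowel-group syllable counter, the regex tokenization, and the function names are simplifying assumptions of this sketch, not part of any system listed above.

```python
import re

def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per vowel group, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from sentence, word, and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# Two toy inputs (hypothetical examples, not from the talk).
simple = "The cat sat. The dog ran."
harder = "Automated classification systems estimate complexity."
fk_simple = flesch_kincaid_grade(simple)
fk_harder = flesch_kincaid_grade(harder)
print(f"simple: {fk_simple:.2f}  harder: {fk_harder:.2f}")
```

Nothing in this computation looks at narrativity or cohesion, which is the construct-representation concern raised in the talk.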


Genre Effects: Many Useful Features Function Differently Within Inf. & Literary Texts

[Figure: Average Grade Level (y-axis) vs. Average ETS Word Frequency (x-axis, from Less Familiar to More Familiar), plotted separately for Literary and Informational texts]

Result:

Literary Predictions are Too Low

Informational Predictions are Too High


What Happens When Genre Effects are Ignored

[Figure, two panels: mean GL score by Common Core Grade Band, plotted separately for Literary and Informational texts. Panel "Flesch-Kincaid": Mean Flesch-Kincaid GL Score. Panel "Lexile Framework": Mean Lexile GL Score.]

Analysis of 102 texts from CCSS, Appendix B


Genre Bias is Not Present When Distinct Models are Estimated for Informational & Literary Texts

[Figure: Mean SourceRater GL Score by Common Core Grade Band, plotted separately for Literary and Informational texts]

SourceRater Analysis of 102 texts from CCSS, Appendix B
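The contrast between a pooled model and genre-specific models can be illustrated with a toy regression. This is only a sketch on synthetic, noise-free data invented for illustration (the numbers, variable names, and the single word-frequency feature are all assumptions; this is not SourceRater's model). The two genres share a slope but differ in intercept, so a pooled fit under-predicts literary texts and over-predicts informational ones, while separate fits leave no systematic residual.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_residual(xs, ys, intercept, slope):
    """Average of (actual - predicted); positive means predictions are too low."""
    return sum(y - (intercept + slope * x) for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (hypothetical): same slope, intercepts one grade level apart.
word_freq = [54.0, 58.0, 62.0, 66.0]
literary_gl = [34.0 - 0.45 * x + 1.0 for x in word_freq]
informational_gl = [34.0 - 0.45 * x - 1.0 for x in word_freq]

# One model for all texts: genre differences end up in the residuals.
pooled = fit_line(word_freq * 2, literary_gl + informational_gl)
bias_lit = mean_residual(word_freq, literary_gl, *pooled)
bias_inf = mean_residual(word_freq, informational_gl, *pooled)

# Distinct models per genre: no systematic residual remains.
sep_lit = mean_residual(word_freq, literary_gl, *fit_line(word_freq, literary_gl))
sep_inf = mean_residual(word_freq, informational_gl, *fit_line(word_freq, informational_gl))

print(f"pooled bias:   literary {bias_lit:+.2f}, informational {bias_inf:+.2f}")
print(f"separate bias: literary {sep_lit:+.2f}, informational {sep_inf:+.2f}")
```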


Heaps Law

• The probability that a word collection of size s includes a repeated content word increases with s.

• This is a fundamental characteristic of language that has been validated in a wide array of applications (Heaps, 1978)
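The first bullet can be checked with a small simulation. The sketch below is illustrative only: it assumes words are drawn independently from a Zipf-like frequency distribution over a hypothetical 5,000-word vocabulary (the function name, vocabulary size, trial count, and seed are all assumptions, not from the talk) and estimates how often a collection of s words contains at least one repeat.

```python
import itertools
import random

def repeat_probability(size, vocab_size=5000, trials=2000, seed=0):
    """Estimate P(at least one repeated word) in a random collection of `size` words."""
    rng = random.Random(seed)
    # Assumed Zipf-like frequencies: weight of the word at rank r is 1 / r.
    cum_weights = list(itertools.accumulate(1.0 / r for r in range(1, vocab_size + 1)))
    vocab = range(vocab_size)
    hits = 0
    for _ in range(trials):
        sample = rng.choices(vocab, cum_weights=cum_weights, k=size)
        if len(set(sample)) < size:  # fewer distinct words than draws means a repeat
            hits += 1
    return hits / trials

probs = {s: repeat_probability(s) for s in (5, 20, 80)}
for s, p in probs.items():
    print(f"size={s:3d}  estimated P(repeat) = {p:.2f}")
```

Real text is not an independent draw from a fixed distribution, so this only shows the direction of the effect: larger collections are more likely to contain a repeat.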


Heaps Law and Cohesion Theory Lead to Opposite Conclusions About a Popular Measure of Cohesion: The Prop. of Sentences with Backward References

Cohesion Theory: Backward References (i.e., repeated content words) help the reader follow the thread of an argument or story, so a high proportion of Backward References is an indicator of comprehension ease

Heaps Law: Repeated content words are more likely to appear in longer sentences, but longer sentences are more difficult to parse, so a high proportion of Backward References is an indicator of comprehension difficulty


Empirical Results: 774 Informational & Literary Passages from High-Stakes Reading Assessments

[Figure: Average Grade Level (y-axis) vs. Stem Overlap Adjacent (x-axis, 0.0 to 1.0), plotted separately for Literary and Informational passages]

Prop. of Sentences with Backward References (Informational Passages Only)

Grade    N     Mean   SD
3        46    0.42   0.19
4-5      89    0.46   0.17
6-7      76    0.42   0.17
8-9      87    0.46   0.18
10-11    44    0.47   0.17

Knowing the Proportion of Sentences with Backward References Does Not Reduce Uncertainty about Text GL

Differences in Mean Scores are Not Significant
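The talk does not say which significance test was used. As a rough plausibility check only, the sketch below computes a one-way ANOVA F statistic directly from the rounded N, mean, and SD values reported for informational passages (grades 3, 4-5, 6-7, 8-9, 10-11). The choice of ANOVA and the function name are assumptions of this sketch.

```python
# Summary statistics as reported on the slide (informational passages only).
groups = [
    # (grade band, N, mean, SD)
    ("3", 46, 0.42, 0.19),
    ("4-5", 89, 0.46, 0.17),
    ("6-7", 76, 0.42, 0.17),
    ("8-9", 87, 0.46, 0.18),
    ("10-11", 44, 0.47, 0.17),
]

def anova_f_from_summary(groups):
    """One-way ANOVA F statistic computed from per-group N, mean, and SD."""
    total_n = sum(n for _, n, _, _ in groups)
    grand_mean = sum(n * m for _, n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for _, n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df1, df2 = anova_f_from_summary(groups)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

With these rounded inputs the F statistic comes out close to 1, well below the conventional 5% critical value (roughly 2.4 for these degrees of freedom), which is consistent with the slide's statement that the mean differences are not significant.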


A New Cohesion Measure that Accounts for Effects Due to Heaps Law

• For each possible sentence length, estimate p_k, the expected probability of a backward reference

• Calculate y_i = standardized difference between the observed and expected proportion of backward references in a passage

y_i > 0: More backward references than expected, an indication of comprehension ease

y_i < 0: Fewer backward references than expected, an indication of comprehension difficulty
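The two steps above can be sketched as follows. The talk does not give the standardization formula, so this sketch assumes one plausible choice: treat each adjacent-sentence pair as an independent Bernoulli trial with success probability p_k, so the expected count is the sum of the p_k values and the variance is the sum of p_k(1 - p_k). The function name, the lookup-table format, and the p_k values are hypothetical. Standardizing the count gives the same value as standardizing the proportion, since both differ only by a factor of the number of pairs.

```python
import math

def standardized_backward_ref_score(pair_lengths, has_backward_ref, p_by_length):
    """Return (observed - expected) / SD for the backward references in one passage.

    pair_lengths:     length k of each adjacent-sentence pair in the passage
    has_backward_ref: one bool per pair, True if the pair shares a content word
    p_by_length:      dict mapping length k to the estimated probability p_k
    """
    observed = sum(has_backward_ref)
    probs = [p_by_length[k] for k in pair_lengths]
    expected = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)  # assumes independent pairs
    if variance == 0:
        return 0.0
    return (observed - expected) / math.sqrt(variance)

# Hypothetical p_k values, for illustration only.
p_by_length = {10: 0.2, 20: 0.5, 40: 0.8}
lengths = [10, 20, 20, 40]

y_cohesive = standardized_backward_ref_score(lengths, [True, True, True, True], p_by_length)
y_sparse = standardized_backward_ref_score(lengths, [False, False, False, False], p_by_length)
print(f"all pairs linked: y = {y_cohesive:+.2f}")
print(f"no pairs linked:  y = {y_sparse:+.2f}")
```

Because the expectation depends on each pair's length, a passage of long sentences is not credited for repeats it would be expected to contain by length alone.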


[Figure: P(Backward Reference | Length, Genre) (y-axis, 0.0 to 1.0) vs. Pair Length in words (x-axis, 0 to 120), with separate curves for Literary and Informational texts]

p_k = P(Backward Ref | Length = k)

Estimated from 31,885 pairs of adjacent sentences.
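One simple way such a curve could be estimated is as an empirical proportion within each genre and length bin. The talk does not describe the estimation procedure, so the sketch below is an assumption throughout: it presumes each adjacent-sentence pair has already been reduced to a (genre, pair length, has backward reference) tuple, and the 10-word bin width, function name, and toy pairs are hypothetical.

```python
from collections import defaultdict

def estimate_p_by_length(pairs, bin_width=10):
    """Estimate P(backward reference | genre, length bin) as an empirical proportion.

    pairs: iterable of (genre, pair_length_in_words, has_backward_ref) tuples
    Returns a dict keyed by (genre, lower edge of the length bin).
    """
    counts = defaultdict(lambda: [0, 0])  # (genre, bin) -> [pairs with a reference, all pairs]
    for genre, length, has_ref in pairs:
        key = (genre, (length // bin_width) * bin_width)
        counts[key][0] += int(has_ref)
        counts[key][1] += 1
    return {key: refs / total for key, (refs, total) in counts.items()}

# Toy pairs, for illustration only.
toy_pairs = [
    ("Literary", 12, True),
    ("Literary", 15, False),
    ("Literary", 47, True),
    ("Informational", 12, True),
    ("Informational", 18, True),
]
p_hat = estimate_p_by_length(toy_pairs)
for key in sorted(p_hat):
    print(key, round(p_hat[key], 2))
```

With a large corpus, a smoothed curve per genre would likely be preferable to raw bin proportions, especially at long pair lengths where few pairs are observed.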


This New Measure is Very Useful for Distinguishing Texts at Low and High GLs

[Figure, two panels: Average Grade Level (y-axis), plotted separately for Literary and Informational texts. Panel "Prop. of Sentences with Backward References": x-axis is Stem Overlap Adjacent (0.0 to 1.0). Panel "Std. Difference Between Observed & Expected Number of Backward References": x-axis is Standardized Stem Overlap Adjacent (-4 to 4).]


Summary

• Construct Representation
– Systems that only measure vocabulary difficulty and syntactic complexity may not adequately represent the complexity construct defined in the new Standards

• Genre Effects
– Systems that assume equivalent genre effects run the risk of providing complexity classifications that are too high for informational texts, and too low for literary texts

• Heaps Law
– Systems that do not account for Heaps Law may not provide valid information about key dimensions of text variation, e.g., referential cohesion