Art Graesser
Professor, Psychology & Institute for Intelligent Systems, University of Memphis
Honorary Research Fellow, University of Oxford
The Impact of Automated Measurement of Text Characteristics
Bill & Melinda
Gates Foundation
Overview
• A snapshot of conversational agents
in assessments of reading, writing,
listening, and speaking
• Models of reading that emphasize
discourse
• Automatic scoring of text and writing
with Coh-Metrix and other automated
systems
Foundational Claims
• There have been major advances in
computational linguistics and automated
discourse analyses during the last two
decades.
• Accuracy is impressive in computer analyses
of reading, writing, listening, and speaking.
• Conversational agents in social scenarios will
play an increasing role in these assessments.
A snapshot of agents in assessments of
Reading
Writing
Speaking
Listening
Conversational Agents
BEAT Leonardo PKD Android Casey
iSTART TLTS MRE
AutoTutor Adele STEVE
iMAP
SI Agent
Guru
Writing-Pal
Memphis Agent Environments
PKD Android (Andrew Olney)
DeepTutor (Vasile Rus)
AutoTutor (Art Graesser)
iMAP (Max Louwerse)
Guru (Andrew Olney)
Meta-Tutor (Roger Azevedo)
AutoTutor-LITE (Xiangen Hu)
iSTART (Danielle McNamara)
iDRIVE (Barry Gholson)
Writing-Pal (Danielle McNamara)
HURA Advisor (Xiangen Hu)
Trialogs
Expert Fellow Student
Human
Vicarious
Trialogs in Learning
• Low ability: Vicarious learning
• Medium ability: Tutorial dialogue
• High ability: Teachable agent
Trialogs in Assessment
• Low ability: Short responses to prompts; inaccurate or irrelevant; violation of social norms
• Medium ability
• High ability: Lengthier turns; accurate contributions; social appropriateness
Confidential and Proprietary. Copyright © 2011 Educational Testing Service. All rights reserved.
Trialog (English Language Skills)
Agent Utterance
Lisa: Hey, Ron, you need to leave your water outside. I'm going
to go talk to my friends. I'll see you guys inside.
Ron: Why did she tell me I have to leave my water outside, Tim?
Human (Tim): I do not know.
Ron: Tim, why can't I drink water?
Human (Tim): The books may get wet.
Lisa: Why do you still have your water bottle, Ron? Look at rule
number 2. We cannot get in the library with food or drink.
2005 ETS Invitational Conference, New York City, October 10-11, 2005
Tactical Language and Culture
Training System (Lewis Johnson)
iSTART: Interactive Strategy Training for Active Reading and Thinking (Danielle McNamara)
Writing Pal (Danielle McNamara)
Self-Regulated Learning with
MetaTutor (Roger Azevedo)
Operation ARIES!
• Scientific reasoning
• Electronic textbook
• Periodic questions – answer and ask
• Critique experiments and newspaper clips
• Game context with aliens
• Trialogs with tutor and student agents
People need to learn how to critically evaluate descriptions of research . . .
[Figure: core concepts (e.g., correlational design, causal statement) are presented across the game's phases (interactive text, case studies, interrogation), with story elements running throughout.]
Daphne Greenberg
Lee Branum-Martin
Robin Morris
Chris Oshima
Maureen Lovett
Jan Frijters
Art Graesser
Xiangen Hu
Mark Conley
Andrew Olney
Intervention with AutoTutor and Trialogues (Graesser, D’Mello, Hu, Cai, Olney, & Morgan, 2012)
Conversational agents
Intelligent Tutoring System
Online through browser
Media include texts, diagrams, videos, quizzes, games, and social media
Adults communicate by typing, speaking, or pointing/clicking
Text Selection and Repository
• Critical to good instruction
• Related to both cognition and motivation
• Interesting
• Relevant to adult lives
• Multiple purposes
• Not too easy or too difficult
• Different genres, media, and technologies
Models of reading
that emphasize
discourse
Models or Frameworks
in Discourse Processes Field
• Construction-integration model (Kintsch)
• Structure building framework (Gernsbacher)
• Causal structure (Trabasso, Van den Broek)
• Landscape model (Van den Broek)
• Constructionist theory (Graesser, Singer, Trabasso)
• Event indexing model (Zwaan, Magliano, Graesser)
• Memory-based resonance model (Myers, O’Brien)
• Embodied cognition (Glenberg, Zwaan)
Multilevel framework of discourse comprehension
1. Words
2. Syntax
3. Textbase
– Explicit ideas (propositions)
– Referential cohesion
4. Situation model
– Causal, intentional, temporal, spatial, logical relationships
– Connectives
5. Genre and rhetorical structure
6. Pragmatic communication
Graesser & McNamara (2011). Topics in Cognitive Science.
Language Learning is Multidimensional and Changes over Time
(Scarborough, 2003)
Reading Framework Proposed by
Perfetti (1999)
Goldman, Brown, Britt, Magliano,
Greenleaf, Lee, Griffin, Hastings,
Lawless, Pellegrino, Radinsky,
Raphael, Shanahan, Wiley
Potential Principles of Processing
• Bottom-up processing until reaching a level where the reader is not proficient.
– Attempt to achieve deepest, most global level.
• Intermediate levels (3 & 4) have the highest information novelty.
– Textbase and situation model demand resources.
• Top-down higher levels can circumvent the need to process lower levels.
– A cost to comprehension at lower levels.
• Levels can compensate for deficits at other levels.
– Particular compensation mechanisms require more research.
Scenario 1
A child has trouble recognizing letters in
the alphabet so there is an obstacle in
lexical decoding at the word level (level 1).
The word deficit blocks him from
understanding any of the text at levels 2-6.
Scenario 2
• Parents take their children to a new
Disney movie that has some adult
themes. The children notice the
parents laughing at different points in
the movie than they do.
• The children are making it successfully
through discourse levels 1-4, but not
levels 5 and 6.
Scenario 3
• An adult reads a health insurance document.
There are lengthy sentences with embedded
clauses, complex syntax, numerous quantifiers
(all, many, rarely), and many logical operators
(and, or, not, if).
• The adult signs the contract because she
understands its purpose and trusts the insurance
agency. Levels 5 and 6 circumvent the need to
understand levels 2-4 completely.
Scenario 4
• Laboratory partners in an engineering course
read the directions to assemble a new
computer. They argue about how to hook up
the cables on the dual monitors.
• They have no problems with levels 1, 2, 3, 5,
and 6, but they do have a deficit at the
situation model level (level 4).
Conclusion
A mature assessment of reading
and writing needs to be sensitive
to the different levels of language
and discourse.
Automatic scoring of text and writing with Coh-Metrix and other automated systems
Graesser, McNamara, & Kulikowich (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
Graesser, A. C., & McNamara, D. S. (2012). Automated analysis of essays and open-ended verbal responses. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA Handbook of Research Methods in Psychology, Vol 1: Foundations, Planning, Measures, and Psychometrics (pp. 307-325). Washington, DC: American Psychological Association.
Text Difficulty Measures
• Popular measures (correlate r = .89 to .94)
– Flesch-Kincaid (Klare, 1976)
– Degrees of Reading Power (Koslin, Zeno, Koslin, 1987)
– Lexile scores (Stenner, 2006, MetaMetrics)
– SourceRater (Kosten, 2011, Educational Testing Service)
– Reading Maturity Metric (Landauer, Foltz, 2011, Pearson)
• Typical factors
– Word familiarity: word frequency, # of letters, # of syllables
– Sentence length
• Measures are more accurate if they also take other levels into account
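As an illustration of how the two typical factors (word length in syllables and sentence length) combine into a single score, here is a minimal Python sketch of the standard Flesch-Kincaid grade-level formula. The syllable counter is a crude vowel-group heuristic assumed for this example only; production readability tools use pronunciation dictionaries or better rules, and none of the helper names come from the talk.

```python
import re

def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

easy = "The cat sat on a hat. The hat was black."
hard = "Automated discourse analyses evaluate referential cohesion across consecutive sentences."
print(round(flesch_kincaid_grade(easy), 2))
print(round(flesch_kincaid_grade(hard), 2))
```

Short sentences of one-syllable words score very low (even below zero), while a single long sentence of polysyllabic words scores far higher. The formula sees only the word and sentence levels, which is the limitation the last bullet points at.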
Automated Computer Assessments of Writing
• Essay graders (accuracy = human experts)
– Intelligent Essay Assessor (Pearson Knowledge Technology)
– E-Rater, Criterion, CBAL (Educational Testing Service)
– Writing-Pal (McNamara, Crossley, similar to Coh-Metrix)
• Answers to questions
– C-Rater (Educational Testing Service)
• Think aloud & self-explanations during reading
– iSTART (McNamara)
– Reading Strategy Assessment Tool (Magliano, Millis)
• Contributions during conversation
– AutoTutor dialogs and trialogs (Graesser)
– Operation ARA (Pearson Education)
Language & Discourse Analysis Tools
• Analysis of Words
– WordNet, Framenet, MRC Database, Celex, LIWC
• Syntax
– Penn Treebank, Charniak Parser
• Propositions
– Propbank, logical form, entailment
• Coreference and cohesion
– Coh-Metrix, SourceRater
• Essay Graders
– Intelligent Essay Assessor, E-rater
• Genre analyzers
– Biber, Coh-Metrix
• Dialogue
– AutoTutor, iSTART, Operation ARA
Tools to Represent World Knowledge
• Linguistic Inquiry and Word Count (LIWC) (Pennebaker, Booth, & Francis, 2007)
• Latent semantic analysis (Landauer, McNamara, Dennis, & Kintsch, 2007)
• N-grams (Jurafsky & Martin, 2008)
• Topic models and MEM (Griffiths, Steyvers, & Tenenbaum, 2007; Chung & Pennebaker, 2010)
Figurative Language Remains
a Challenge
• Metaphor, Simile
• Personification
• Metonymy
• Hyperbole, Understatement
• Irony, Sarcasm
• Jokes, Wit
• Indirect speech acts
Cohesion
– Repeated nouns and concepts across sentences
    That cat sat on a hat. The hat was black.
– Less anaphor
    The hat was black. vs. It was black.
    That is short. vs. That sentence is short.
– More connectives between sentences
    because, although, however, first, then
– More headers and topic sentences
– Genre structure
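The first two points (repeated nouns, less anaphor) can be approximated with a simple overlap count between adjacent sentences. The Python sketch below is an assumption-laden illustration, not the Coh-Metrix implementation: it has no part-of-speech tagger, so it treats every non-stopword as a content word, and the stopword list is a tiny hypothetical one chosen for the slide's examples.

```python
import re

# Tiny illustrative stopword list (assumption); a real system would use a POS tagger for nouns.
STOPWORDS = {"the", "a", "an", "that", "was", "is", "on", "it", "of", "and", "to", "in"}

def content_words(sentence):
    """Lowercased words of a sentence with stopwords (including the pronoun 'it') removed."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}

def adjacent_overlap(text):
    """Proportion of adjacent sentence pairs that share at least one content word."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return shared / len(pairs)

high = "That cat sat on a hat. The hat was black."
low = "That cat sat on a hat. It was black."
print(adjacent_overlap(high), adjacent_overlap(low))
```

Replacing the repeated noun with a pronoun drops the overlap for that sentence pair to zero, which is why heavier use of anaphora lowers this kind of cohesion score.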
Plants (low and high cohesion)
What Are the Needs of Plants?
Like all living things, plants have certain needs. Plants need sunlight, water, and air to live. Plants also need minerals (MIN·uhr·uhlz). A mineral is a naturally occurring substance that is neither plant nor animal.
The parts of plants help them to get or make what they need. All plants get water and minerals from the soil. The root is the part of the plant that grows underground. Roots help hold the plant in the ground. Roots also help take in water and minerals that the plant needs.
The stem is the part that supports the plant. It helps the plant stand upright. It carries minerals and water from the roots. It also carries food from the leaves to other parts of the plant.
[…]
What Plants Need
Plants have certain needs, just like all living
things have needs. For example, plants need
sunlight, water, and air to live. Plants also need
minerals (pronounced as MIN·uhr·uhlz). A
mineral is not a plant or an animal. Instead, a
mineral is a substance in the ground that occurs
naturally. There are three parts of plants that
help plants get what they need or help plants
make what they need.
The Three Parts of a Plant
The three parts of the plant are the roots, stems,
and leaves.
1. The Root
The root is the part of the plant that grows
underground. All plants get water and minerals
from the ground, which is sometimes called soil.
Roots help the plant take in water and minerals
that the plant needs from the soil. Roots also
help hold the plant in the ground.
2. The Stem
The stem is the part that supports the plant. The
stem helps the plant stand upright. It carries
minerals and water from the roots of the plant to
other parts of the plant. The stem also carries
food from the leaves to other parts of the plant.
Coh-Metrix Goals
• Automatic analysis of texts on multiple
levels of the multilevel, multicomponent
frameworks
• A tool that can be used by researchers,
teachers, students, educational leaders,
and the public
Coh-Metrix: A Natural Language Processing Tool
Analyzes texts on over 100 measures of
cohesion and language
Google Cohmetrix
Example Coh-Metrix Measures
Word Measures
• Number of syllables
• Part of speech (noun, verb…)
• Word frequency
• Concreteness, imagery
• Multiple meanings
• WordNet
Co-reference Cohesion
• Noun and argument overlap
• Stem overlap
– (lemmas: run, runs, runner)
• Latent semantic analysis (LSA)
• Lexical diversity (type-token ratio)
• Pronouns
Situation Model Cohesion
• Connectives & discourse markers
• Causal and intentional verbs
• Causal and intentional cohesion
• Repetition in tense and aspect
• Logical operators
– and, or, therefore, if, then, not
Syntax
• Structural complexity
• Modifiers per noun-phrase
• Words before main verb of
main clause
• Syntactic similarity between
sentences
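Two of the listed measures are simple enough to sketch directly: lexical diversity as a type-token ratio, and the density of connectives. The Python below is a rough illustration under stated assumptions: the connective list is a small hypothetical sample drawn from words mentioned on these slides, and normalizing counts per 1,000 words is a convention assumed here for comparability across texts of different lengths.

```python
import re

# Small illustrative sample of connectives and logical operators (assumption).
CONNECTIVES = {"because", "although", "however", "first", "then", "therefore", "if"}

def tokenize(text):
    """Lowercased word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Lexical diversity: number of unique words divided by total words."""
    toks = tokenize(text)
    return len(set(toks)) / len(toks) if toks else 0.0

def connective_incidence(text, per=1000):
    """Connectives per `per` words of text."""
    toks = tokenize(text)
    if not toks:
        return 0.0
    hits = sum(1 for t in toks if t in CONNECTIVES)
    return per * hits / len(toks)

print(type_token_ratio("the hat the hat"))
print(connective_incidence("we left because it rained"))
```

Note that the type-token ratio is sensitive to text length, so comparisons are most meaningful between texts of similar size.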
[Figure: Coh-Metrix architecture. Processing stages: Preprocessing (filters), Syntax Analysis (tagger, parser), Lexical Analysis (lemmatizer, stemmer). Resources: LSA, WordNet, MRC, CELEX, word lists, Coh-Metrix database. Outputs: syntax features, sentence complexity, word difficulty, lexical features, spatial cohesion, temporal cohesion, causal cohesion, referential cohesion, text complexity components.]
Copy and Paste text
Click on Analyze
Graph appears with verbal translation
Tea.cohmetrix.com
Coh-Metrix Easability Components
• Narrativity. Narrative text tells a story, with characters, events, places, and
things that are familiar to the reader.
• Syntactic simplicity. Sentences with more complex syntax are more
difficult to process, whereas those with few words and simple, familiar structures are easier to process and understand.
• Word concreteness. Concrete words evoke mental images and are
more meaningful to the reader than abstract words.
• Referential cohesion. High cohesion text contains words and ideas
that overlap across sentences and the entire text, forming threads that connect the textbase together for the reader.
• Deep cohesion. Causal, intentional, and temporal connectives help the
reader to form a more coherent and deeper understanding of the text.
[Figure: Coh-Metrix easability percentiles (0 to 100) on Narrativity, Syntactic Simplicity, Word Concreteness, Referential Cohesion, and Deep Cohesion for two texts, "Maps and Globes" and "How the Camel Got His Hump Back". Excerpts from the two texts follow.]
Thousands of years ago, our
ancestors invented the map.
Ancient maps were crude but
very useful tools. They helped
people find food, clean water,
and the way back home--even
when home was a cave.
As civilizations grew, better
maps were needed.
Now this is the next tale, and it
tells how the Camel got his big
hump. In the beginning of
years, when the world was so
new and all, and the Animals
were just beginning to work for
Man, there was a Camel, and
he lived in the middle of a
Howling Desert…
How did we arrive at our 5 major measures? (Graesser, McNamara, & Kulikowich, 2011, Educational Researcher)
• TASA - Touchstone Applied Science Associates
• 37,520 texts
• Texts had mean of 288.6 words (SD = 25.4)
• Most of the texts are in language arts, science, and social studies
• Represents texts a student would experience throughout K-12.
• Degrees of Reading Power (DRP) scores and
Genre classification
• Conducted a Principal Components Analysis
with 53 Coh-Metrix measures
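To make the principal components step concrete, here is a minimal pure-Python sketch that extracts only the first component from a correlation matrix by power iteration. Everything about the data is an assumption for illustration: it uses 300 synthetic "texts" and 4 hypothetical measures driven by one latent factor, not the TASA corpus or the 53 Coh-Metrix measures, and a real analysis would use a statistics package and retain several components.

```python
import math
import random

def standardize(column):
    """Convert one measure (a list of values) to z-scores."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def correlation_matrix(columns):
    """Correlation matrix of already standardized columns."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]

def first_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v

# Synthetic data (assumption): each measure = weight * latent factor + noise.
random.seed(0)
weights = [0.9, 0.8, -0.7, 0.6]
latent = [random.gauss(0, 1) for _ in range(300)]
columns = [standardize([w * z + random.gauss(0, 0.5) for z in latent]) for w in weights]

corr = correlation_matrix(columns)
eigenvalue, loadings = first_component(corr)
# The trace of a correlation matrix equals the number of measures,
# so eigenvalue / number of measures is the share of variance explained.
explained = eigenvalue / len(corr)
print(f"first component explains {explained:.0%} of the variance")
```

Measures that load in the same direction on a component move together across texts, which is the sense in which many individual measures can be summarized by a handful of components.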
[Figures: five line charts of component scores (axis from -1.5 to 2) by grade band (Grades < 2, 2-3, 4-5, 6-8, 9-10, 11-CCR), with grade level approximated by DRP. Red = Narrative, Green = Social Studies, Blue = Science.]
• Narrativity: Large differences between narrative and informational texts
• Syntactic Simplicity: Syntax is simpler for informational texts
• Referential Cohesion: Referential cohesion may compensate for difficult content
• Deep Cohesion: Science is a bit lower and slight increase over grades
• Word Concreteness: An intriguing curvilinear trend
Measures of Text Difficulty Report (Nelson, Perfetti, Liben, & Liben, 2012)
Formality of Language
Formal
• Expository
• High cohesion
• Complex syntax
• Abstract words
Informal
• Narrative
• Low cohesion
• Simple syntax
• Concrete words
Policies of Text Assignment
• Pushing the envelope
• Building self-efficacy
• Balanced diet
• The role of topic interest
Text Library
Search
Coh-Metrix Profile
Analysis & Recommendations
Sort Texts
Easy Hard
Text Difficulty Heat Maps
There was once a popular television show about an exceptional talking
horse named Mr. Ed.
You might surmise that some people actually believed, or, at the very
least, wanted to believe in the alleged talking horse.
Centuries ago, there were vast majorities of people who believed that
there was a horse that could answer questions; that it could even spell
words and do complex arithmetic!
This is the factual story about a horse known as Clever Hans, and the
"clever" part of his name came from his lofty intelligence.
Around the turn of the 20th century, a German school instructor named
Mr. von Osten exhibited his keenly intelligent horse who could not
convey information verbally, so instead, he did so by tapping his hoofs
on the ground in order to give his responses to questions.
There was once a popular television show about a talking horse named
Mr. Ed.
You might think that some people actually believed there really were
talking horses.
But no, even back then they knew that horses could not talk.
Over 100 years ago there were a lot of people who believed that there
was a horse that could answer questions.
The horse could even spell words and do arithmetic!
This is the true story about a horse known as Clever Hans.
The horse lived with his owner in Germany; Hans is a common name
(even for a horse) in Germany.
The “clever” part of his name came from his high intelligence.
Here is the story: Around the turn of the 20th century, a German school
teacher named Mr. von Osten showed off his very smart horse.
Of course, not even this smart horse could talk, so he communicated
by tapping his hoofs.
[Heat map legend: Easy to Hard. Panels: Easy Text Version, Difficult Text Version.]
Corresponding sentences differ
up to 3 levels between Text A and
Text B
Definition of Engagement
Ztd = Z-score of text segment on difficulty
Zrt = Z-score of reading time
| Zrt – Ztd | = Discrepancy score
Disengagement increases with the
discrepancy score.
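The definition above translates almost directly into code. The Python sketch below computes the discrepancy score for each text segment; the function names are hypothetical, and the use of the sample standard deviation for the z-scores is an assumption, since the slide does not say how the scores were standardized.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample standard deviation (assumption)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("values must not all be identical")
    return [(v - m) / s for v in values]

def discrepancy_scores(difficulty, reading_time):
    """Per-segment |Zrt - Ztd|; larger values suggest more disengagement."""
    if len(difficulty) != len(reading_time):
        raise ValueError("need one reading time per text segment")
    ztd = z_scores(difficulty)
    zrt = z_scores(reading_time)
    return [abs(rt - td) for rt, td in zip(zrt, ztd)]

difficulty = [1, 2, 3, 4]                 # hypothetical segment difficulty scores
coupled = [10, 20, 30, 40]                # reading time tracks difficulty
decoupled = [40, 30, 20, 10]              # reading time runs against difficulty
print(discrepancy_scores(difficulty, coupled))
print(discrepancy_scores(difficulty, decoupled))
```

When reading time rises and falls with difficulty, the scores stay near zero; when the reader speeds through hard segments or lingers on easy ones, the scores grow.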
What is the half-life of engagement for a
text?
[Figure: Decoupling as a function of Flesch-Kincaid Grade (FKG) score of sentence triplets]
Conclusions about Reading Time
and Decoupling
• Readers are most engaged in the zone of language they can handle
• We are currently analyzing this decoupling at different levels of the multilevel framework
• An intelligent tutor could provide feedback and activities at periodic points as students read.
Foundational Claims
• There have been major advances in computational
linguistics and automated discourse analyses during the last two decades.
• Accuracy is impressive in computer analyses of reading, writing, listening, and speaking.
• Conversational agents in social scenarios will play an increasing role in these assessments.
Who is at the Table?
• AI and computational
linguistics
• Cognitive and learning
sciences
• Language and
discourse processes
• Measurement and
assessment