TRANSCRIPT
AutoTutor: An Intelligent Tutoring System with
Mixed Initiative Dialog
Art Graesser, University of Memphis
Department of Psychology & the Institute for Intelligent Systems
Supported by grants from the NSF, ONR, ARI, IDA, IES, US Census Bureau, and CHI Systems
Interdisciplinary Approach
Computer Science
Psychology
Computational Linguistics
Education
Overview
Brief comments on my research on question asking and answering
Primary focus is on AutoTutor -- a collaborative reasoning and question answering system
Overview of my Research on Questions
Psychological Models
• Question asking (PREG; ONR, NSF, ARI)
• Question answering (QUEST; ONR)
Computer Artifacts
• Tutors (AutoTutor, Why/AutoTutor, Think Like a Commander; NSF, ONR, ARI, CHI Systems)
• Survey question critiquer (QUAID; US Census, NSF)
• Point & Query software (P&Q; ONR)
• Query-based information retrieval (HURA Advisor; IDA)
AutoTutor: collaborative reasoning and question answering in tutorial dialog
Think Like a Commander Vignettes
1 Trouble in McLouth
2 Save the Shrine
3 The Recon Fight
4 A Shift In Forces
5 The Attack Begins
6 The Bigger Picture
7 Looking Deep
8 Before the Attack
9 Meanwhile Back at the Ranch
• Keep focus on mission and higher's intent?
• Model a thinking enemy?
• Consider effects of terrain?
• Use all assets available?
• Consider timing?
• See the bigger picture?
• Visualize the battlefield accurately (realistic space-time forecast), dynamically (entities change over time), and proactively (what can I make the enemy do)?
• Consider contingencies and remain flexible?
What does AutoTutor do?
Asks questions and presents problems: Why? How? What-if? What is the difference?
Evaluates the meaning and correctness of the learner's answers (LSA and computational linguistics)
Gives feedback on answers
Face displays emotions plus some gestures
Gives hints
Prompts for specific information
Adds information that is missed
Corrects some bugs and misconceptions
Answers student questions
Holds mixed-initiative dialog in natural language
Pedagogical Design Goals
Simulate normal human tutors and ideal tutors
Active construction of student knowledge rather than information delivery system
Collaborative answering of deep reasoning questions
Approximate evaluation of student knowledge rather than detailed student modeling
A discourse prosthesis
Feasibility of Natural Language Dialog in Tutoring
Learners are forgiving when the tutor’s dialog acts are imperfect.
They are even more forgiving when the bar is set low during instructions.
There are learning gains.
Learning is not correlated with liking.
Feasibility depends on expected precision and common ground:

                        Low Expected Precision    High Expected Precision
Low Common Ground       YES                       MAYBE
High Common Ground      MAYBE                     NO
DEMO
Human Tutors
Analyzed hundreds of hours of human tutoring
• Research methods in college students
• Basic algebra in 7th grade
• Typical unskilled cross-age tutors
Studies from the Memphis labs
• Graesser & Person studies
Studies from other labs
• Chi, Evens, McArthur, …
Characteristics of students that we wish were better
Student question asking
Comprehension calibration
Self-regulated learning, monitoring, and error correction
Precise, symbolic articulation of knowledge
Global integration of knowledge
• Distant anaphoric reference
• Analogical reasoning
• Application of principles to a practical problem
Pedagogical strategies not used by unskilled tutors
Socratic method (Collins, Stevens)
Modeling-scaffolding-fading (Rogoff)
Reciprocal teaching (Brown, Palincsar)
Anchored learning (Bransford, Vye, CTGV)
Error diagnosis & repair (Anderson, VanLehn, Lesgold)
Building on prerequisites (Gagne)
Cascade techniques (VanLehn, Schank)
Sophisticated motivational techniques (Lepper)
What can AutoTutor (and most human tutors) handle?
Tutor expects and student expresses it: Correct information: Yes. Errors and misconceptions: Yes (tutor corrects).
Tutor expects but student does not express it: Correct information: Yes (tutor helps to fill it in). Errors and misconceptions: Yes (not manifested).
Tutor does not expect but student expresses it: Correct information: No, or rarely (tutor ignores). Errors and misconceptions: No, or rarely (tutor ignores).
AutoTutor architecture (diagram components):
• Language extraction
• Speech act classifier
• Latent Semantic Analysis
• Curriculum script
• Dialog management
• Problem selection
• Talking head with gestures
Managing One AutoTutor Turn
Short feedback on the student’s previous turn
Advance the dialog by one or more dialog moves that are connected by discourse markers
End turn with a signal that transfers the floor to the student
• Question
• Prompting hand gesture
• Head/gaze signal
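As a concrete illustration of this turn structure, here is a minimal sketch in Python. It is not AutoTutor's actual code; the feedback phrases, discourse-marker list, and the compose_turn helper are illustrative assumptions.

```python
# Minimal sketch of one AutoTutor turn: short feedback, then dialog moves
# joined by discourse markers, then a floor-transfer signal.  Illustrative
# only; phrases and names are assumptions, not AutoTutor's actual code.
import random

FEEDBACK = {"positive": "Right!", "neutral": "Okay.", "negative": "Not quite."}
DISCOURSE_MARKERS = ["Also,", "And", "Moving on:"]

def compose_turn(feedback_polarity, dialog_moves,
                 floor_signal="[prompting hand gesture]"):
    parts = [FEEDBACK[feedback_polarity], dialog_moves[0]]
    for move in dialog_moves[1:]:
        parts.append(f"{random.choice(DISCOURSE_MARKERS)} {move}")
    parts.append(floor_signal)          # hands the floor back to the student
    return " ".join(parts)

print(compose_turn("positive",
                   ["Air resistance is negligible here.",
                    "How does tossing the pumpkin affect its horizontal velocity?"]))
```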
Dialog Moves
Positive immediate feedback: "Yeah," "Right!"
Neutral immediate feedback: "Okay," "Uh huh"
Negative immediate feedback: "No," "Not quite"
Pump for more information: "What else?"
Hint: "How does tossing the pumpkin affect horizontal velocity?"
Prompt for specific information: "Vertical acceleration does not affect horizontal _______."
Assert: "Vertical acceleration does not affect horizontal velocity."
Correct: "Air resistance is negligible."
Repeat: "So, once again, how does tossing the pumpkin affect horizontal velocity?"
Summarize: "So to recap, [succinct summary]."
Discourse Marker Relations
[Figure: cluster diagram of discourse markers and short feedback expressions, e.g., "yeah," "right," "exactly," "that's right," "of course," "mhm," "sure," "and," "because," "since," "in fact," "definitely."]
Sample dialog

Tutor: This might help. If the runner speeds up, will the pumpkin land behind the runner, next to the runner, or in front of the runner?
Student: The pumpkin will land behind the runner.
Tutor: Right. What is the horizontal force applied to the pumpkin after it leaves the runner's hands?
Student: There is no horizontal force applied to the pumpkin after it leaves the runner's hands.
Tutor: Right.
[Coverage scores at this point: expectation(1) = 0.836, expectation(2) = 0.851, expectation(3) = 0.584; misconception(1) = 0.371, misconception(2) = 0.605]

Matching student answers against the expectations:

Expectation-1: The pumpkin has the same horizontal velocity as the runner.
Expectation-2: Zero force is needed to keep an object going with constant velocity.
Expectation-3: Vertical forces on the pumpkin do not affect its horizontal velocity.

Student: "As zero force is acting on it, it will have constant velocity, hence it will land on his head."
LSA cosines with Expectations 1-3: .51, .66, .44

Student: "The horizontal velocity of the pumpkin is the same as the runner."
LSA cosines with Expectations 1-3: .99, .66, .87
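The coverage scores above come from LSA cosine matches between the student's contributions and each expectation. The sketch below shows the cosine computation with plain word-count vectors standing in for LSA vectors, so it illustrates the mechanism but will not reproduce the slide's numbers.

```python
# Cosine match between a student contribution and an expectation.  Word-count
# vectors are a stand-in for vectors from a trained LSA space (assumption).
from collections import Counter
import math

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

expectation = "The pumpkin has the same horizontal velocity as the runner"
student = "The horizontal velocity of the pumpkin is the same as the runner"
print(round(cosine(vec(student), vec(expectation)), 2))
```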
How does Why/AutoTutor select the next expectation?
Don't select expectations that the student has already covered: cosine(student answers, expectation) > threshold
Frontier learning, zone of proximal development: select the highest sub-threshold expectation
Coherence: select the next expectation that has the highest overlap with the previously covered expectation
Pivotal expectations
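One way to combine these heuristics is sketched below; the data structures and the exact tie-breaking rule are assumptions, not the actual Why/AutoTutor implementation.

```python
# Pick the next expectation: skip covered ones, prefer the highest sub-threshold
# coverage (frontier learning), break ties by overlap with the last covered
# expectation (coherence).  Hypothetical combination rule, for illustration.
def select_next_expectation(expectations, coverage, overlap_with_previous,
                            threshold=0.70):
    uncovered = [e for e in expectations if coverage[e] <= threshold]
    if not uncovered:
        return None                      # all expectations are covered
    return max(uncovered,
               key=lambda e: (coverage[e], overlap_with_previous[e]))

print(select_next_expectation(
    ["E1", "E2", "E3"],
    coverage={"E1": 0.99, "E2": 0.66, "E3": 0.44},
    overlap_with_previous={"E1": 0.3, "E2": 0.5, "E3": 0.2}))   # -> E2
```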
How does AutoTutor know which dialog move to deliver?
Dialog Advancer Network (DAN) for mixed-initiative dialog
15 fuzzy production rules based on:
• Quality of the student's assertion(s) in the preceding turn
• Student ability level
• Topic coverage
• Student verbosity (initiative)
Hint-Prompt-Assertion cycles for expected good answers
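The flavor of such a rule set is sketched below with crisp thresholds; the real DAN uses 15 fuzzy rules, and the thresholds and rule bodies here are invented for illustration only.

```python
# Crisp caricature of a DAN-style rule set for choosing the next dialog move.
# Inputs are degrees in [0, 1]; thresholds and rules are illustrative, not
# the actual fuzzy production rules in AutoTutor.
def pick_dialog_move(assertion_quality, student_ability, topic_coverage, verbosity):
    if assertion_quality > 0.7:
        return "pump" if verbosity < 0.5 else "positive feedback"
    if student_ability > 0.6 and topic_coverage < 0.5:
        return "hint"        # capable student, topic not yet fleshed out
    if student_ability <= 0.6:
        return "prompt"      # elicit one specific piece of information
    return "assertion"       # tutor supplies the missing information

print(pick_dialog_move(0.3, 0.4, 0.2, 0.6))   # -> prompt
```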
Dialog Advancer Network
Hint-Prompt-Assertion Cycles to Cover Good Expectations
Cycle fleshes out one expectation at a time
Exit cycle when:
cos(S, E ) > T
S = student input
E = expectation
T = threshold
[Figure: example cycles such as Hint → Prompt → Assertion and Hint → Assertion → Prompt.]
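A minimal sketch of that cycle, assuming hypothetical helpers lsa_cosine() and get_student_input() rather than actual AutoTutor functions:

```python
# Hint-prompt-assertion cycle for one expectation E: deliver increasingly
# direct moves until cos(S, E) > T.  lsa_cosine and get_student_input are
# hypothetical hooks supplied by the caller.
def cover_expectation(expectation, hint, prompt, assertion,
                      lsa_cosine, get_student_input, threshold=0.70):
    student_so_far = ""
    for move in (hint, prompt, assertion):
        print("TUTOR:", move)
        student_so_far += " " + get_student_input()
        if lsa_cosine(student_so_far, expectation) > threshold:
            return True      # expectation covered by the student; exit cycle
    return False             # cycle exhausted; the assertion stated it anyway
```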
Who is delivering the answer?
Continuum from student-provided to tutor-provided information:
STUDENT PROVIDES INFORMATION → Pump → Hint → Prompt → Assertion → TUTOR PROVIDES INFORMATION
Correlations between dialog moves and student ability
[Figure: correlations (approximately -0.4 to 0.5) between student ability and each dialog move category: Pumps, Hints, Prompts, Assertions, Positive Feedback, Negative Feedback, Corrections.]
Question Taxonomy (question category: generic question frames and examples)
1. Verification: Is X true or false? Did an event occur? Does a state exist?
2. Disjunctive: Is X, Y, or Z the case?
3. Concept completion: Who? What? When? Where?
4. Feature specification: What qualitative properties does entity X have?
5. Quantification: What is the value of a quantitative variable? How much? How many?
6. Definition questions: What does X mean?
7. Example questions: What is an example or instance of a category?
8. Comparison: How is X similar to Y? How is X different from Y?
9. Interpretation: What concept/claim can be inferred from a static or active data pattern?
10. Causal antecedent: What state or event causally led to an event or state? Why did an event occur? Why does a state exist? How did an event occur? How did a state come to exist?
11. Causal consequence: What are the consequences of an event or state? What if X occurred? What if X did not occur?
12. Goal orientation: What are the motives or goals behind an agent's action? Why did an agent do some action?
13. Instrumental/procedural: What plan or instrument allows an agent to accomplish a goal? How did an agent do some action?
14. Enablement: What object or resource allows an agent to accomplish a goal?
15. Expectation: Why did some expected event not occur? Why does some expected state not exist?
16. Judgmental: What value does the answerer place on an idea or advice? What do you think of X? How would you rate X?
Speech Act Classifier
• Assertions
• Questions (16 categories)
• Directives
• Metacognitive expressions ("I'm lost")
• Metacommunicative expressions ("Could you say that again?")
• Short responses
95% accuracy on tutee contributions
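A toy surface-cue classifier in the same spirit is sketched below; the actual classifier uses a much richer cue inventory (syntactic parser, lexicons, frozen expressions) and distinguishes 16 question categories, and the regular expressions here are illustrative assumptions.

```python
# Toy speech act classifier driven by surface cues.  Patterns are illustrative
# assumptions, not the cue set used by the actual AutoTutor classifier.
import re

def classify_speech_act(utterance):
    u = utterance.strip().lower()
    if re.search(r"i'?m (lost|confused)|i don'?t (know|understand)", u):
        return "metacognitive"
    if re.search(r"say that again|repeat that|what did you say", u):
        return "metacommunicative"
    if u.endswith("?") or re.match(r"(who|what|when|where|why|how|is|are|does|do|can)\b", u):
        return "question"
    if re.match(r"(please|show me|go to|give me)\b", u):
        return "directive"
    if len(u.split()) <= 3:
        return "short response"
    return "assertion"

for s in ["I'm lost", "Could you say that again?", "What is net force?",
          "okay", "The pumpkin keeps the runner's horizontal velocity."]:
    print(s, "->", classify_speech_act(s))
```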
A New Query-based Information Retrieval System (Louwerse, Olney, Mathews, Marineau, Hite-Mitchell, & Graesser, 2003)
Input context: text and screen
Pipeline:
1. Input speech act
2. Classify the speech act (QUEST's 16 question categories, assertion, directive, other) using a syntactic parser, lexicons, surface cues, and frozen expressions
3. Augment retrieval cues with word particles of the question category
4. Search documents via LSA
5. Select the highest-matching document
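The final two steps (LSA search and best-document selection) reduce to a cosine argmax, sketched here with NumPy; the vectors are assumed to come from a precomputed LSA space, and the example values are made up.

```python
# Select the highest-matching document by cosine similarity in LSA space.
# doc_vecs: one LSA vector per row; query_vec: vector of the augmented query.
import numpy as np

def best_document(query_vec, doc_vecs, doc_ids):
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return doc_ids[int(np.argmax(sims))]

# Example with made-up 3-dimensional "LSA" vectors.
docs = np.array([[0.1, 0.9, 0.2], [0.8, 0.1, 0.3], [0.4, 0.4, 0.4]])
print(best_document(np.array([0.7, 0.2, 0.4]), docs,
                    ["doc_a", "doc_b", "doc_c"]))   # -> doc_b
```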
Evaluations of AutoTutor

LEARNING GAINS (effect sizes)
.42  Unskilled human tutors (Cohen, Kulik, & Kulik, 1982)
.75  AutoTutor, 7 experiments (Graesser, Hu, Person)
1.00 Intelligent tutoring systems: PACT (Anderson, Corbett, Koedinger); Andes, Atlas (VanLehn)
2.00 (?) Skilled human tutors
Learning Gains (Effect Sizes)
[Figure: effect sizes (0 to 1.0) across AutoTutor versions 1, 1.1, 2, and WHY2.]
[Figure: bar chart (0 to 0.7) by measure type: Shallow, Deep, Cloze/Expectation.]
Spring 2002 Evaluations: Conceptual Physics (VanLehn & Graesser, 2002)
Four conditions:
1. Human tutors
2. Why/Atlas
3. Why/AutoTutor
4. Read control
86 college students
Measures in Spring Evaluation
Multiple Choice Test
• Pretest and posttest (40 multiple choice questions in each)
Essays graded by 6 physics experts
• 4 pretest and 4 posttest essays
• Expectations versus misconceptions
• Holistic grades
Generic principles and misconceptions (fine-grained)
Learner perceptions
Time on Tasks
Effect Sizes on Learning Gains (pretest to posttest; no differences among tutoring conditions)
[Figure: effect sizes (0 to 1.5) for Expert Grades, Misconception removal, Expectations-lenient, Expectations-stringent, and Multiple Choice measures.]
Fall 2002 Evaluations: Conceptual Physics (Graesser, Moreno, et al., 2003)
Three tutoring conditions
1. Why/AutoTutor
2. Read textbook control
3. Read nothing
63 subjects
Multiple Choice Scores
[Figure: multiple choice test scores (0.4 to 0.8) at pretest, posttest, and adjusted posttest for the Why/AutoTutor, Read Textbook, and Read Nothing conditions.]
2002-3 Evaluations: Computer Literacy (Graesser, Hu, et al., 2003)
2 tutoring conditions: 1. AutoTutor, 2. Read nothing
4 media conditions: 1. Print, 2. Speech, 3. Speech + Head, 4. Speech + Head + Print
96 subjects
Deep Reasoning Questions
[Figure: multiple choice test scores (0.2 to 0.6) for Why/AutoTutor versus Read Nothing, broken down by media condition: Print, Speech, Head + Speech, Head + Speech + Print.]
LATENT SEMANTIC ANALYSIS
Signal Detection Analyses
[Figure: hit rate, false alarm rate, and d' score (0.0 to 0.8) as a function of LSA threshold (0.30, 0.40, 0.50, 0.60, 0.65).]
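For readers unfamiliar with the d' measure used in these analyses, the standard signal detection formula is shown below; the rates in the example call are invented for illustration, not values from the slide.

```python
# d' = z(hit rate) - z(false alarm rate), the standard signal detection index.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

print(round(d_prime(0.80, 0.30), 2))  # -> 1.37 (illustrative rates)
```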
Recall, Precision, and F-measure
[Figure: recall, precision, and F-measure (0.0 to 0.9) as a function of LSA threshold (0.30, 0.40, 0.50, 0.60, 0.65).]
What Expectations are LSA-worthy?
Compute the correlation between:
(a) experts' ratings of whether essay answers have expectation E, and
(b) the maximum LSA cosine between E and all possible combinations of sentences in the essay.
A high correlation means the expectation is LSA-worthy.
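A sketch of that computation follows; lsa_vector() and cosine() are hypothetical hooks into a trained LSA space, and statistics.correlation requires Python 3.10+.

```python
# LSA-worthiness check: for each essay, take the maximum cosine between
# expectation E and every combination of the essay's sentences, then correlate
# those maxima with expert ratings.  Exponential in essay length, so suitable
# only for short essays; lsa_vector and cosine are assumed helpers.
from itertools import combinations
from statistics import correlation   # Python 3.10+

def max_cosine_with_expectation(sentences, expectation, lsa_vector, cosine):
    best = 0.0
    for r in range(1, len(sentences) + 1):
        for combo in combinations(sentences, r):
            best = max(best, cosine(lsa_vector(" ".join(combo)),
                                    lsa_vector(expectation)))
    return best

def lsa_worthiness(essays, expert_ratings, expectation, lsa_vector, cosine):
    """High correlation => the expectation is LSA-worthy."""
    maxima = [max_cosine_with_expectation(s, expectation, lsa_vector, cosine)
              for s in essays]
    return correlation(maxima, expert_ratings)
```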
Expectations and Correlations (expert ratings, LSA)
After the release, the only force on the balls is the force of the moon’s gravity (r = .71)
A larger object will experience a smaller acceleration for the same force (r = .12)
Force equals mass times acceleration (r = .67)
The boxes are in free fall (r = .21)
OTHER EMPIRICAL EVALUATIONS
Assessment of Dialogue Management
Bystander Turing test
Participants rate whether particular dialog moves in conversations were generated by AutoTutor or by skilled human tutors.
Bystander judgments versus reality (proportions):

                             Bystander says computer    Bystander says human
Reality: computer said it    Hit = .51                  Miss = .49
Reality: human said it       False alarm = .53          Correct rejection = .47
ASL Model 501 Eye Tracker
Percentage of time allocated to interface components:
• Talking head: 40%
• Display: 29%
• Off screen (mainly keyboard): 20%
• Answer: 7%
• Question: 4%
What Conversational Agents Facilitate Learning?
Correlation matrix for dependent variables (DVs):

            Like    Comp    Cred    Quality   Sync
Comp        .50**
Cred        .51**   .33**
Quality     .54**   .59**   .49**
Sync        .56**   .54**   .31**   .53**
Learning    .03     .07     .02     .04       .03
AutoTutor Collaborations
University of Pittsburgh (VanLehn): ONR, physics intelligent tutoring systems, Why2
University of Illinois, Chicago (Wiley, Goldman): NSF/ROLE, plate tectonics, eye tracking, critical stance
Old Dominion and Northern Illinois University (McNamara, Magliano, Millis, Wiemer-Hastings): IERI, science text comprehension
MIT Media Lab: NSF/ROLE, Learning Companion, emotion sensors (Picard, Reilly); BEAT gesture, emotion, and speech generator (Cassell, Bickmore)
CHI Systems (Zachary, Ryder): Army SBIR, Think Like a Commander
Institute for Defense Analyses (Fletcher, Toth, Foster): ONR/OSD, Human Use Regulatory Affairs Advisor, research ethics, web site with agent
Collaboration with MIT Media Lab
Affective Computing Lab
• Frustration
• Anger
• Confusion
• Eureka highs
• Contemplation (flow experience)
Inferring emotions from sensors
• Blue Eyes
• Mouse-glove pressure and sweat
• Seat ("butt") sensor
Dialog moves sensitive to emotions
Forthcoming AutoTutor developments
Language and discourse enhancements
• Weave in deeper semantic processing components
• Natural language generation for prompts
Improved animated conversational agent
3-d simulation for enhancing the articulation of explanations
Improve authoring tools
Evolution of the content of curriculum scripts through tutoring experience
The Long-term Vision
Future human-computer interfaces will be conversational: Just like people talking face to face.
Avatars will tutor and mentor learners on the web: students, soldiers, citizens, customers, elderly, special populations, low and high literacy, low and high motivation…
Learning modules will be accessed and available throughout the globe: a 24 by 7 virtual university.
Learning tailored to learner’s abilities, talents, interests, motivation, unique histories.
Proposed NSF Project will Augment AutoTutor
AutoTutor enhanced with more problem solving and complex decision making with Franklin’s Intelligent Distribution Agent
Courseware from MIT, Carnegie Mellon, Pittsburgh, Wisconsin, Illinois, military, and corporations (Merlot, Concord Consortium)
SCORM learning software standards established in the military
University of Colorado speech recognition
FedEx Institute as ADL Co-Lab with DoD and Bureau of Labor, for expansion to business and industry
"AutoTutor, Atlas, and Why2 are perhaps the most sophisticated tutorial dialogue projects in the intelligent tutoring systems community." (James Lester, AI Magazine, 2001)
Are there properties of the expectations that correlate with LSA-worthiness?
• Number of words: .07
• Number of content words: .01
• Vector length of expectation: .05
• Number of glossary terms: -.03
• Number of infrequent words: .23
• Number of negations: -.29*
• Number of relative terms, symbols, quantifiers, deictic expressions: .06
Challenges in use of LSA in AutoTutor
Widely acknowledged limitations of LSA:
• Negation
• Word order
• Structural composition
• Size of description
• Coverage imperfection ("I thought I said that already!")
Need for a larger corpus of misconceptions:
• Expectation d' of LSA with experts = .79
• Misconception d' of LSA with experts = .57
Coordinating LSA with symbolic systems