TRANSCRIPT
AutoTutor: An Intelligent Tutoring System with
Mixed Initiative Dialog
Art Graesser, University of Memphis
Department of Psychology & the Institute for Intelligent Systems
Supported by grants from the NSF, ONR, ARI, IDA, IES, US Census Bureau, and CHI Systems
Interdisciplinary Approach
Computer Science
Psychology
Computational Linguistics
Education
Overview
Brief comments on my research on question asking and answering
Primary focus is on AutoTutor -- a collaborative reasoning and question answering system
Overview of my Research on Questions
Psychological Models
• Question asking (PREG; ONR, NSF, ARI)
• Question answering (QUEST; ONR)
Computer Artifacts
• Tutors (AutoTutor, Why/AutoTutor, Think Like a Commander; NSF, ONR, ARI, CHI Systems)
• Survey question critiquer (QUAID; US Census, NSF)
• Point & Query software (P&Q; ONR)
• Query-based information retrieval (HURA Advisor; IDA)
AutoTutor: collaborative reasoning and question answering in tutorial dialog
Think Like a Commander Vignettes
1 Trouble in McLouth
2 Save the Shrine
3 The Recon Fight
4 A Shift In Forces
5 The Attack Begins
6 The Bigger Picture
7 Looking Deep
8 Before the Attack
9 Meanwhile Back at the Ranch
• Keep focus on mission and higher's intent?
• Model a thinking enemy?
• Consider effects of terrain?
• Use all assets available?
• Consider timing?
• See the bigger picture?
• Visualize the battlefield accurately (realistic space-time forecast), dynamically (entities change over time), and proactively (what can I make the enemy do)?
• Consider contingencies and remain flexible?
What does AutoTutor do?
Asks questions and presents problems: Why? How? What-if? What is the difference?
Evaluates the meaning and correctness of the learner's answers (LSA and computational linguistics)
Gives feedback on answers
Face displays emotions plus some gestures
Gives hints
Prompts for specific information
Adds information that is missed
Corrects some bugs and misconceptions
Answers student questions
Holds mixed-initiative dialog in natural language
Pedagogical Design Goals
Simulate normal human tutors and ideal tutors
Active construction of student knowledge rather than information delivery system
Collaborative answering of deep reasoning questions
Approximate evaluation of student knowledge rather than detailed student modeling
A discourse prosthesis
Feasibility of Natural Language Dialog in Tutoring
Learners are forgiving when the tutor’s dialog acts are imperfect.
They are even more forgiving when the bar is set low during instructions.
There are learning gains.
Learning is not correlated with liking.
Feasibility depends on expected precision and common ground:

                        Low Expected Precision    High Expected Precision
Low Common Ground       YES                       MAYBE
High Common Ground      MAYBE                     NO
DEMO
Human Tutors
Analyzed hundreds of hours of human tutoring
• Research methods in college students
• Basic algebra in 7th grade
• Typical unskilled cross-age tutors
Studies from the Memphis labs
• Graesser & Person studies
Studies from other labs
• Chi, Evens, McArthur, …
Characteristics of students that we wish were better
Student question asking
Comprehension calibration
Self-regulated learning, monitoring, and error correction
Precise, symbolic articulation of knowledge
Global integration of knowledge
• Distant anaphoric reference
• Analogical reasoning
• Application of principles to a practical problem
Pedagogical strategies not used by unskilled tutors
Socratic method (Collins, Stevens)
Modeling-scaffolding-fading (Rogoff)
Reciprocal teaching (Brown, Palincsar)
Anchored learning (Bransford, Vye, CTGV)
Error diagnosis & repair (Anderson, VanLehn, Lesgold)
Building on prerequisites (Gagne)
Cascade techniques (VanLehn, Schank)
Sophisticated motivational techniques (Lepper)
What can AutoTutor (and most human tutors) handle?
Tutor expects and student expresses it: Correct information: Yes. Errors and misconceptions: Yes (tutor corrects).
Tutor expects but student does not express it: Correct information: Yes (tutor helps to fill it in). Errors and misconceptions: Yes (not manifested).
Tutor does not expect but student expresses it: Correct information: No, or rarely (tutor ignores). Errors and misconceptions: No, or rarely (tutor ignores).
AutoTutor architecture (diagram components):
• Language extraction
• Speech act classifier
• Latent Semantic Analysis
• Curriculum script
• Dialog management
• Problem selection
• Talking head with gestures
Managing One AutoTutor Turn
Short feedback on the student’s previous turn
Advance the dialog by one or more dialog moves that are connected by discourse markers
End turn with a signal that transfers the floor to the student
• Question
• Prompting hand gesture
• Head/gaze signal
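As a concrete illustration of this turn structure, here is a minimal sketch in Python. It is not AutoTutor's actual code; the feedback phrases, discourse-marker list, and the compose_turn helper are illustrative assumptions.

```python
# Minimal sketch of one AutoTutor turn: short feedback, then dialog moves
# joined by discourse markers, then a floor-transfer signal.  Illustrative
# only; phrases and names are assumptions, not AutoTutor's actual code.
import random

FEEDBACK = {"positive": "Right!", "neutral": "Okay.", "negative": "Not quite."}
DISCOURSE_MARKERS = ["Also,", "And", "Moving on:"]

def compose_turn(feedback_polarity, dialog_moves,
                 floor_signal="[prompting hand gesture]"):
    parts = [FEEDBACK[feedback_polarity], dialog_moves[0]]
    for move in dialog_moves[1:]:
        parts.append(f"{random.choice(DISCOURSE_MARKERS)} {move}")
    parts.append(floor_signal)          # hands the floor back to the student
    return " ".join(parts)

print(compose_turn("positive",
                   ["Air resistance is negligible here.",
                    "How does tossing the pumpkin affect its horizontal velocity?"]))
```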
Dialog Moves
Positive immediate feedback: "Yeah," "Right!"
Neutral immediate feedback: "Okay," "Uh huh"
Negative immediate feedback: "No," "Not quite"
Pump for more information: "What else?"
Hint: "How does tossing the pumpkin affect horizontal velocity?"
Prompt for specific information: "Vertical acceleration does not affect horizontal _______."
Assert: "Vertical acceleration does not affect horizontal velocity."
Correct: "Air resistance is negligible."
Repeat: "So, once again, how does tossing the pumpkin affect horizontal velocity?"
Summarize: "So to recap, [succinct summary]."
Discourse Marker Relations
[Figure: cluster diagram of discourse markers and short feedback expressions, e.g., "yeah," "right," "exactly," "that's right," "of course," "mhm," "sure," "and," "because," "since," "in fact," "definitely."]
Sample dialog

Tutor: This might help. If the runner speeds up, will the pumpkin land behind the runner, next to the runner, or in front of the runner?
Student: The pumpkin will land behind the runner.
Tutor: Right. What is the horizontal force applied to the pumpkin after it leaves the runner's hands?
Student: There is no horizontal force applied to the pumpkin after it leaves the runner's hands.
Tutor: Right.
[Coverage scores at this point: expectation(1) = 0.836, expectation(2) = 0.851, expectation(3) = 0.584; misconception(1) = 0.371, misconception(2) = 0.605]

Matching student answers against the expectations:

Expectation-1: The pumpkin has the same horizontal velocity as the runner.
Expectation-2: Zero force is needed to keep an object going with constant velocity.
Expectation-3: Vertical forces on the pumpkin do not affect its horizontal velocity.

Student: "As zero force is acting on it, it will have constant velocity, hence it will land on his head."
LSA cosines with Expectations 1-3: .51, .66, .44

Student: "The horizontal velocity of the pumpkin is the same as the runner."
LSA cosines with Expectations 1-3: .99, .66, .87
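The coverage scores above come from LSA cosine matches between the student's contributions and each expectation. The sketch below shows the cosine computation with plain word-count vectors standing in for LSA vectors, so it illustrates the mechanism but will not reproduce the slide's numbers.

```python
# Cosine match between a student contribution and an expectation.  Word-count
# vectors are a stand-in for vectors from a trained LSA space (assumption).
from collections import Counter
import math

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

expectation = "The pumpkin has the same horizontal velocity as the runner"
student = "The horizontal velocity of the pumpkin is the same as the runner"
print(round(cosine(vec(student), vec(expectation)), 2))
```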
How does Why/AutoTutor select the next expectation?
Don't select expectations that the student has already covered: cosine(student answers, expectation) > threshold
Frontier learning, zone of proximal development: select the highest sub-threshold expectation
Coherence: select the next expectation that has the highest overlap with the previously covered expectation
Pivotal expectations
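One way to combine these heuristics is sketched below; the data structures and the exact tie-breaking rule are assumptions, not the actual Why/AutoTutor implementation.

```python
# Pick the next expectation: skip covered ones, prefer the highest sub-threshold
# coverage (frontier learning), break ties by overlap with the last covered
# expectation (coherence).  Hypothetical combination rule, for illustration.
def select_next_expectation(expectations, coverage, overlap_with_previous,
                            threshold=0.70):
    uncovered = [e for e in expectations if coverage[e] <= threshold]
    if not uncovered:
        return None                      # all expectations are covered
    return max(uncovered,
               key=lambda e: (coverage[e], overlap_with_previous[e]))

print(select_next_expectation(
    ["E1", "E2", "E3"],
    coverage={"E1": 0.99, "E2": 0.66, "E3": 0.44},
    overlap_with_previous={"E1": 0.3, "E2": 0.5, "E3": 0.2}))   # -> E2
```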
How does AutoTutor know which dialog move to deliver?
Dialog Advancer Network (DAN) for mixed-initiative dialog
15 fuzzy production rules based on:
• Quality of the student's assertion(s) in the preceding turn
• Student ability level
• Topic coverage
• Student verbosity (initiative)
Hint-Prompt-Assertion cycles for expected good answers
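The flavor of such a rule set is sketched below with crisp thresholds; the real DAN uses 15 fuzzy rules, and the thresholds and rule bodies here are invented for illustration only.

```python
# Crisp caricature of a DAN-style rule set for choosing the next dialog move.
# Inputs are degrees in [0, 1]; thresholds and rules are illustrative, not
# the actual fuzzy production rules in AutoTutor.
def pick_dialog_move(assertion_quality, student_ability, topic_coverage, verbosity):
    if assertion_quality > 0.7:
        return "pump" if verbosity < 0.5 else "positive feedback"
    if student_ability > 0.6 and topic_coverage < 0.5:
        return "hint"        # capable student, topic not yet fleshed out
    if student_ability <= 0.6:
        return "prompt"      # elicit one specific piece of information
    return "assertion"       # tutor supplies the missing information

print(pick_dialog_move(0.3, 0.4, 0.2, 0.6))   # -> prompt
```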
Dialog Advancer Network
Hint-Prompt-Assertion Cycles to Cover Good Expectations
Cycle fleshes out one expectation at a time
Exit cycle when:
cos(S, E ) > T
S = student input
E = expectation
T = threshold
[Figure: example cycles such as Hint → Prompt → Assertion and Hint → Assertion → Prompt.]
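A minimal sketch of that cycle, assuming hypothetical helpers lsa_cosine() and get_student_input() rather than actual AutoTutor functions:

```python
# Hint-prompt-assertion cycle for one expectation E: deliver increasingly
# direct moves until cos(S, E) > T.  lsa_cosine and get_student_input are
# hypothetical hooks supplied by the caller.
def cover_expectation(expectation, hint, prompt, assertion,
                      lsa_cosine, get_student_input, threshold=0.70):
    student_so_far = ""
    for move in (hint, prompt, assertion):
        print("TUTOR:", move)
        student_so_far += " " + get_student_input()
        if lsa_cosine(student_so_far, expectation) > threshold:
            return True      # expectation covered by the student; exit cycle
    return False             # cycle exhausted; the assertion stated it anyway
```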
Who is delivering the answer?
Continuum from student-provided to tutor-provided information:
STUDENT PROVIDES INFORMATION → Pump → Hint → Prompt → Assertion → TUTOR PROVIDES INFORMATION
Correlations between dialog moves and student ability
[Figure: correlations (approximately -0.4 to 0.5) between student ability and each dialog move category: Pumps, Hints, Prompts, Assertions, Positive Feedback, Negative Feedback, Corrections.]
Question Taxonomy (question category: generic question frames and examples)
1. Verification: Is X true or false? Did an event occur? Does a state exist?
2. Disjunctive: Is X, Y, or Z the case?
3. Concept completion: Who? What? When? Where?
4. Feature specification: What qualitative properties does entity X have?
5. Quantification: What is the value of a quantitative variable? How much? How many?
6. Definition questions: What does X mean?
7. Example questions: What is an example or instance of a category?
8. Comparison: How is X similar to Y? How is X different from Y?
9. Interpretation: What concept/claim can be inferred from a static or active data pattern?
10. Causal antecedent: What state or event causally led to an event or state? Why did an event occur? Why does a state exist? How did an event occur? How did a state come to exist?
11. Causal consequence: What are the consequences of an event or state? What if X occurred? What if X did not occur?
12. Goal orientation: What are the motives or goals behind an agent's action? Why did an agent do some action?
13. Instrumental/procedural: What plan or instrument allows an agent to accomplish a goal? How did an agent do some action?
14. Enablement: What object or resource allows an agent to accomplish a goal?
15. Expectation: Why did some expected event not occur? Why does some expected state not exist?
16. Judgmental: What value does the answerer place on an idea or advice? What do you think of X? How would you rate X?
Speech Act Classifier
• Assertions
• Questions (16 categories)
• Directives
• Metacognitive expressions ("I'm lost")
• Metacommunicative expressions ("Could you say that again?")
• Short responses
95% accuracy on tutee contributions
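A toy surface-cue classifier in the same spirit is sketched below; the actual classifier uses a much richer cue inventory (syntactic parser, lexicons, frozen expressions) and distinguishes 16 question categories, and the regular expressions here are illustrative assumptions.

```python
# Toy speech act classifier driven by surface cues.  Patterns are illustrative
# assumptions, not the cue set used by the actual AutoTutor classifier.
import re

def classify_speech_act(utterance):
    u = utterance.strip().lower()
    if re.search(r"i'?m (lost|confused)|i don'?t (know|understand)", u):
        return "metacognitive"
    if re.search(r"say that again|repeat that|what did you say", u):
        return "metacommunicative"
    if u.endswith("?") or re.match(r"(who|what|when|where|why|how|is|are|does|do|can)\b", u):
        return "question"
    if re.match(r"(please|show me|go to|give me)\b", u):
        return "directive"
    if len(u.split()) <= 3:
        return "short response"
    return "assertion"

for s in ["I'm lost", "Could you say that again?", "What is net force?",
          "okay", "The pumpkin keeps the runner's horizontal velocity."]:
    print(s, "->", classify_speech_act(s))
```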
A New Query-based Information Retrieval System (Louwerse, Olney, Mathews, Marineau, Hite-Mitchell, & Graesser, 2003)
Input context: text and screen
Pipeline:
1. Input speech act
2. Classify the speech act (QUEST's 16 question categories, assertion, directive, other) using a syntactic parser, lexicons, surface cues, and frozen expressions
3. Augment retrieval cues with word particles of the question category
4. Search documents via LSA
5. Select the highest-matching document
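The final two steps (LSA search and best-document selection) reduce to a cosine argmax, sketched here with NumPy; the vectors are assumed to come from a precomputed LSA space, and the example values are made up.

```python
# Select the highest-matching document by cosine similarity in LSA space.
# doc_vecs: one LSA vector per row; query_vec: vector of the augmented query.
import numpy as np

def best_document(query_vec, doc_vecs, doc_ids):
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return doc_ids[int(np.argmax(sims))]

# Example with made-up 3-dimensional "LSA" vectors.
docs = np.array([[0.1, 0.9, 0.2], [0.8, 0.1, 0.3], [0.4, 0.4, 0.4]])
print(best_document(np.array([0.7, 0.2, 0.4]), docs,
                    ["doc_a", "doc_b", "doc_c"]))   # -> doc_b
```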
Evaluations of AutoTutor

LEARNING GAINS (effect sizes)
.42  Unskilled human tutors (Cohen, Kulik, & Kulik, 1982)
.75  AutoTutor, 7 experiments (Graesser, Hu, Person)
1.00 Intelligent tutoring systems: PACT (Anderson, Corbett, Koedinger); Andes, Atlas (VanLehn)
2.00 (?) Skilled human tutors
Learning Gains (Effect Sizes)
[Figure: effect sizes (0 to 1.0) across AutoTutor versions 1, 1.1, 2, and WHY2.]
[Figure: bar chart (0 to 0.7) by measure type: Shallow, Deep, Cloze/Expectation.]
Spring 2002 Evaluations: Conceptual Physics (VanLehn & Graesser, 2002)
Four conditions:
1. Human tutors
2. Why/Atlas
3. Why/AutoTutor
4. Read control
86 college students
Measures in Spring Evaluation
Multiple Choice Test
• Pretest and posttest (40 multiple choice questions in each)
Essays graded by 6 physics experts
• 4 pretest and 4 posttest essays
• Expectations versus misconceptions
• Holistic grades
Generic principles and misconceptions (fine-grained)
Learner perceptions
Time on Tasks
Effect Sizes on Learning Gains (pretest to posttest; no differences among tutoring conditions)
[Figure: effect sizes (0 to 1.5) for Expert Grades, Misconception removal, Expectations-lenient, Expectations-stringent, and Multiple Choice measures.]
Fall 2002 Evaluations: Conceptual Physics (Graesser, Moreno, et al., 2003)
Three tutoring conditions
1. Why/AutoTutor
2. Read textbook control
3. Read nothing
63 subjects
Multiple Choice Scores
[Figure: multiple choice test scores (0.4 to 0.8) at pretest, posttest, and adjusted posttest for the Why/AutoTutor, Read Textbook, and Read Nothing conditions.]
2002-3 Evaluations: Computer Literacy (Graesser, Hu, et al., 2003)
2 tutoring conditions: 1. AutoTutor, 2. Read nothing
4 media conditions: 1. Print, 2. Speech, 3. Speech + Head, 4. Speech + Head + Print
96 subjects
Deep Reasoning Questions
[Figure: multiple choice test scores (0.2 to 0.6) for Why/AutoTutor versus Read Nothing, broken down by media condition: Print, Speech, Head + Speech, Head + Speech + Print.]
LATENT SEMANTIC ANALYSIS
Signal Detection Analyses
[Figure: hit rate, false alarm rate, and d' score (0.0 to 0.8) as a function of LSA threshold (0.30, 0.40, 0.50, 0.60, 0.65).]
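For readers unfamiliar with the d' measure used in these analyses, the standard signal detection formula is shown below; the rates in the example call are invented for illustration, not values from the slide.

```python
# d' = z(hit rate) - z(false alarm rate), the standard signal detection index.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

print(round(d_prime(0.80, 0.30), 2))  # -> 1.37 (illustrative rates)
```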
Recall, Precision, and F-measure
[Figure: recall, precision, and F-measure (0.0 to 0.9) as a function of LSA threshold (0.30, 0.40, 0.50, 0.60, 0.65).]
What Expectations are LSA-worthy?
Compute the correlation between:
(a) experts' ratings of whether essay answers have expectation E, and
(b) the maximum LSA cosine between E and all possible combinations of sentences in the essay.
A high correlation means the expectation is LSA-worthy.
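A sketch of that computation follows; lsa_vector() and cosine() are hypothetical hooks into a trained LSA space, and statistics.correlation requires Python 3.10+.

```python
# LSA-worthiness check: for each essay, take the maximum cosine between
# expectation E and every combination of the essay's sentences, then correlate
# those maxima with expert ratings.  Exponential in essay length, so suitable
# only for short essays; lsa_vector and cosine are assumed helpers.
from itertools import combinations
from statistics import correlation   # Python 3.10+

def max_cosine_with_expectation(sentences, expectation, lsa_vector, cosine):
    best = 0.0
    for r in range(1, len(sentences) + 1):
        for combo in combinations(sentences, r):
            best = max(best, cosine(lsa_vector(" ".join(combo)),
                                    lsa_vector(expectation)))
    return best

def lsa_worthiness(essays, expert_ratings, expectation, lsa_vector, cosine):
    """High correlation => the expectation is LSA-worthy."""
    maxima = [max_cosine_with_expectation(s, expectation, lsa_vector, cosine)
              for s in essays]
    return correlation(maxima, expert_ratings)
```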
Expectations and Correlations (expert ratings, LSA)
After the release, the only force on the balls is the force of the moon’s gravity (r = .71)
A larger object will experience a smaller acceleration for the same force (r = .12)
Force equals mass times acceleration (r = .67)
The boxes are in free fall (r = .21)
OTHER EMPIRICAL EVALUATIONS
Assessment of Dialogue Management
Bystander Turing test
Participants rate whether particular dialog moves in conversations were generated by AutoTutor or by skilled human tutors.
Bystander judgments versus reality (proportions):

                             Bystander says computer    Bystander says human
Reality: computer said it    Hit = .51                  Miss = .49
Reality: human said it       False alarm = .53          Correct rejection = .47
ASL Model 501 Eye Tracker
Percentage of time allocated to interface components:
• Talking head: 40%
• Display: 29%
• Off screen (mainly keyboard): 20%
• Answer: 7%
• Question: 4%
What Conversational Agents Facilitate Learning?
Correlation matrix for dependent variables (DVs):

            Like    Comp    Cred    Quality   Sync
Comp        .50**
Cred        .51**   .33**
Quality     .54**   .59**   .49**
Sync        .56**   .54**   .31**   .53**
Learning    .03     .07     .02     .04       .03
AutoTutor Collaborations
University of Pittsburgh (VanLehn): ONR, physics intelligent tutoring systems, Why2
University of Illinois, Chicago (Wiley, Goldman): NSF/ROLE, plate tectonics, eye tracking, critical stance
Old Dominion and Northern Illinois University (McNamara, Magliano, Millis, Wiemer-Hastings): IERI, science text comprehension
MIT Media Lab: NSF/ROLE, Learning Companion, emotion sensors (Picard, Reilly); BEAT gesture, emotion, and speech generator (Cassell, Bickmore)
CHI Systems (Zachary, Ryder): Army SBIR, Think Like a Commander
Institute for Defense Analyses (Fletcher, Toth, Foster): ONR/OSD, Human Use Regulatory Affairs Advisor, research ethics, web site with agent
Collaboration with MIT Media Lab
Affective Computing Lab
• Frustration
• Anger
• Confusion
• Eureka highs
• Contemplation (flow experience)
Inferring emotions from sensors
• Blue Eyes
• Mouse-glove pressure and sweat
• Seat ("butt") sensor
Dialog moves sensitive to emotions
Forthcoming AutoTutor developments
Language and discourse enhancements
• Weave in deeper semantic processing components
• Natural language generation for prompts
Improved animated conversational agent
3-d simulation for enhancing the articulation of explanations
Improve authoring tools
Evolution of the content of curriculum scripts through tutoring experience
The Long-term Vision
Future human-computer interfaces will be conversational: Just like people talking face to face.
Avatars will tutor and mentor learners on the web: students, soldiers, citizens, customers, elderly, special populations, low and high literacy, low and high motivation…
Learning modules will be accessed and available throughout the globe: a 24 by 7 virtual university.
Learning tailored to learner’s abilities, talents, interests, motivation, unique histories.
Proposed NSF Project will Augment AutoTutor
AutoTutor enhanced with more problem solving and complex decision making with Franklin’s Intelligent Distribution Agent
Courseware from MIT, Carnegie Mellon, Pittsburgh, Wisconsin, Illinois, military, and corporations (Merlot, Concord Consortium)
SCORM learning software standards established in the military
University of Colorado speech recognition
FedEx Institute as ADL Co-Lab with DoD and Bureau of Labor, for expansion to business and industry
"AutoTutor, Atlas, and Why2 are perhaps the most sophisticated tutorial dialogue projects in the intelligent tutoring systems community." (James Lester, AI Magazine, 2001)
Are there properties of the expectations that correlate with LSA-worthiness?
• Number of words: .07
• Number of content words: .01
• Vector length of expectation: .05
• Number of glossary terms: -.03
• Number of infrequent words: .23
• Number of negations: -.29*
• Number of relative terms, symbols, quantifiers, deictic expressions: .06
Challenges in use of LSA in AutoTutor
Widely acknowledged limitations of LSA:
• Negation
• Word order
• Structural composition
• Size of description
• Coverage imperfection ("I thought I said that already!")
Need for a larger corpus of misconceptions:
• Expectation d' of LSA with experts = .79
• Misconception d' of LSA with experts = .57
Coordinating LSA with symbolic systems