Student simulation and evaluation

DOD meeting
Hua Ai
03/03/2006




Page 2

Outline
- Motivations
- Backgrounds
- Corpus
- Student Simulation Model
- Comparisons
- Conclusions & Future Work

Page 3

Motivations
For a larger corpus:
- Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically
- The best strategy may not even be present in a small dataset
For a cheaper corpus:
- Human subjects are expensive

Page 4

[Figure: Strategy learning using a simulated user (Schatzmann et al., 2005). Components: Dialog Corpus, Simulation models, Simulated User, Dialog Manager, Strategy, Reinforcement Learning]
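The loop in the figure can be sketched in Python as a toy, not as the setup of Schatzmann et al. Everything in it is a hypothetical assumption: the two tutor actions (`ask`, `hint`), the simulated student's success probabilities, the hint cost, and the five-turn horizon. Tabular Q-learning stands in for whatever RL algorithm a real system would use. The simulated user replaces human subjects, so the dialog manager can explore strategies for thousands of dialogues at no cost.

```python
import random

# Hypothetical toy setup: the tutor (dialog manager) picks one action per turn
# and a simulated student answers. None of these numbers come from the slides.
ACTIONS = ["ask", "hint"]
P_CORRECT = {"ask": 0.3, "hint": 0.9}    # simulated student's chance of a correct answer
ACTION_COST = {"ask": 0.0, "hint": 0.1}  # small penalty for giving information away
HORIZON = 5                              # tutor turns per simulated dialogue


def simulated_student(action, rng):
    """Return True if the simulated student answers correctly."""
    return rng.random() < P_CORRECT[action]


def learn_strategy(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning over (turn index, action) pairs, without discounting."""
    rng = random.Random(seed)
    q = {(t, a): 0.0 for t in range(HORIZON) for a in ACTIONS}
    for _ in range(episodes):
        for t in range(HORIZON):
            # Epsilon-greedy: explore occasionally, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(t, a)])
            reward = (1.0 if simulated_student(action, rng) else 0.0) - ACTION_COST[action]
            # Value of the best action at the next turn (0 after the last turn).
            future = max(q[(t + 1, a)] for a in ACTIONS) if t + 1 < HORIZON else 0.0
            q[(t, action)] += alpha * (reward + future - q[(t, action)])
    # The learned strategy: the greedy action at each turn.
    return [max(ACTIONS, key=lambda a: q[(t, a)]) for t in range(HORIZON)]


print(learn_strategy())
```

With these made-up numbers a hint is worth more per turn (0.9 - 0.1 > 0.3), so the learned strategy should settle on `hint` at every turn.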

Page 5

Backgrounds (1)
Education community
- Focuses on changes in the student's internal knowledge representation
- Usually not dialogue based
- Simulated students (VanLehn et al., 1994) for:
  - tutor training
  - collaborative learning

Page 6

Backgrounds (2)
Dialogue community
- Focuses on interactions and dialogue behaviors
- Simulated users have a limited set of actions (Schatzmann et al., 2005)
- Simulation at the dialogue act (DA) level

Page 7

Corpus (1)
Spoken dialogue physics tutor (ITSPOKE)

Page 8

Corpus (2)
Tutoring procedure
[Figure: the procedure repeats over 5 problems. Each problem: (T) Question → (S) Answer → Dialogue ((T) Q / (S) A turns) → Essay revision → Dialogue → …]

Page 9

Corpus (3)
Tutor's behaviors
- Defined in KCDs (Knowledge Construction Dialogues)
- [Figure: the tutor's next move branches on the student's answer: Correct vs. Incorrect/Partially Correct]
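A KCD can be pictured as a tree of tutor questions whose branches depend on how the student's answer is classified. The sketch below uses a hypothetical `KCDNode` structure with one branch for correct answers and one shared branch for incorrect or partially correct answers. The question names are placeholders, and this is not the actual KCD format used by ITSPOKE.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KCDNode:
    """Hypothetical KCD step: a tutor question plus two follow-up branches."""
    question: str
    if_correct: Optional["KCDNode"] = None
    if_incorrect: Optional["KCDNode"] = None  # also taken for partially correct answers


def run_kcd(root: KCDNode, answer_labels: List[str]) -> List[str]:
    """Walk the KCD with answer labels 'c', 'ic', or 'pc'; return the questions asked."""
    asked = []
    node = root
    labels = iter(answer_labels)
    while node is not None:
        asked.append(node.question)
        label = next(labels, None)
        if label is None:  # no more student answers
            break
        node = node.if_correct if label == "c" else node.if_incorrect
    return asked


# Placeholder dialogue: a non-correct first answer triggers a remediation question.
q2 = KCDNode("Q2")
remediation = KCDNode("Q1-remediation", if_correct=q2, if_incorrect=q2)
q1 = KCDNode("Q1", if_correct=q2, if_incorrect=remediation)

print(run_kcd(q1, ["c", "c"]))        # ['Q1', 'Q2']
print(run_kcd(q1, ["pc", "c", "c"]))  # ['Q1', 'Q1-remediation', 'Q2']
```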

Page 10

Corpus (4)

Corpus                #dialogues  stat   stuWord    stuTurn    tutorWord  tutorTurn
f03 (synthesized)     100         avg    57.16      23.35      1256.92    29.64
                                  stdev  45.57638   17.44334   849.8195   19.76351
05syn (synthesized)   136         avg    91.0963    30.78519   1655.467   38.06667
                                  stdev  53.82931   14.42551   757.8744   16.32469
05pre (pre-recorded)  135         avg    87.34559   30.11765   1597.206   37.33088
                                  stdev  55.48004   16.96972   832.9845   18.20096

stuWord/stuTurn: student words/turns; tutorWord/tutorTurn: tutor words/turns.
f03 vs. s05: different groups of subjects
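One quantity that can be derived from the table is the average number of words per turn. The sketch below computes it as a ratio of the reported averages. This is an illustrative calculation not stated on the slides: a ratio of means can differ from the mean of per-dialogue ratios, and computing the latter would require the raw logs.

```python
# Average counts copied from the Corpus (4) table (avg rows only).
TABLE = {
    "f03":   {"stuWord": 57.16,    "stuTurn": 23.35,    "tutorWord": 1256.92,  "tutorTurn": 29.64},
    "05syn": {"stuWord": 91.0963,  "stuTurn": 30.78519, "tutorWord": 1655.467, "tutorTurn": 38.06667},
    "05pre": {"stuWord": 87.34559, "stuTurn": 30.11765, "tutorWord": 1597.206, "tutorTurn": 37.33088},
}


def words_per_turn(row):
    """Ratio of average words to average turns (a ratio of means)."""
    return {
        "student": row["stuWord"] / row["stuTurn"],
        "tutor": row["tutorWord"] / row["tutorTurn"],
    }


for corpus, row in TABLE.items():
    rates = words_per_turn(row)
    print(f"{corpus}: {rates['student']:.2f} student words/turn, "
          f"{rates['tutor']:.2f} tutor words/turn")
```

Roughly, students say about 2.4 to 3 words per turn and the tutor about 42 to 43 words per turn in all three corpora.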

Page 11

Simulation Models (1)
Simulating at the word level
- Students have more complex behaviors; DA information alone isn't enough for the system
- Two models trained on two corpora:
             ProbCorrect     Random
  f03        03ProbCorrect   03Random
  s05        05ProbCorrect   05Random

Page 12

Simulation Models (2)
ProbCorrect Model
- Simulates the average knowledge level of real students
- Simulates meaningful dialogue behaviors
Random Model
- Nonsense
- Serves as a contrast

Page 13

ProbCorrect Model

Real corpus:
- Question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic)
- Question2: Answer2_1 (c), Answer2_2 (ic)
Candidate answers (c = correct, ic = incorrect):
- For Question1: c:ic = 1:2; c: Answer1_1; ic: Answer1_2, Answer1_3
- For Question2: c:ic = 1:1; c: Answer2_1; ic: Answer2_2
ProbCorrect model, answering Question1:
1) Choose to give a c or ic answer with the same average probability as real students
2) Randomly choose one answer from the corresponding answer set
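The two steps can be sketched in Python. The corpus format (a dict from question to labelled answers) and the function names are hypothetical. The probability of a correct answer is estimated per question, following the c:ic ratios above, and any label other than `c` is treated as incorrect, which is also an assumption.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical corpus format: question -> list of (answer text, label).
RealCorpus = Dict[str, List[Tuple[str, str]]]


def train_prob_correct(corpus: RealCorpus):
    """Per question: split real answers into c / ic sets and record P(c)."""
    model = {}
    for question, answers in corpus.items():
        correct = [text for text, label in answers if label == "c"]
        # Assumption: anything not labelled 'c' goes into the ic set.
        incorrect = [text for text, label in answers if label != "c"]
        model[question] = (len(correct) / len(answers), correct, incorrect)
    return model


def prob_correct_answer(model, question, rng=random):
    p_correct, correct, incorrect = model[question]
    # Step 1: give a c or ic answer with the real students' average probability.
    pool = correct if rng.random() < p_correct else incorrect
    # Step 2: pick one answer uniformly from the chosen set.
    return rng.choice(pool)


corpus = {
    "question1": [("Answer1_1", "c"), ("Answer1_2", "ic"), ("Answer1_3", "ic")],
    "question2": [("Answer2_1", "c"), ("Answer2_2", "ic")],
}
model = train_prob_correct(corpus)
print([prob_correct_answer(model, "question1", random.Random(0)) for _ in range(3)])
```

For Question1 this gives P(c) = 1/3, so Answer1_1 should appear in about a third of simulated answers.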

Page 14

Random Model
HC03&05:
- Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4
- Question2: Answer2_1, Answer2_2
Candidate answers: 1) Answer1_1 2) Answer1_2 3) Answer1_3 4) Answer1_4 5) Answer2_1 6) Answer2_2
Big random model, for question i:
- Answer: any of the 6 answers with equal probability (regardless of the question!)
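A minimal sketch of the big random model, assuming a hypothetical corpus format (question to list of answer strings) and hypothetical function names. Answers from all questions are pooled into one candidate set, and the question being asked is ignored when sampling.

```python
import random
from typing import Dict, List

# Hypothetical corpus format: question -> list of real student answers.
RealCorpus = Dict[str, List[str]]


def train_big_random(corpus: RealCorpus) -> List[str]:
    """Pool every distinct answer in the corpus, ignoring which question it answered."""
    pooled = [answer for answers in corpus.values() for answer in answers]
    return list(dict.fromkeys(pooled))  # de-duplicate, keeping first-seen order


def big_random_answer(candidates: List[str], question: str, rng=random) -> str:
    # The question is deliberately ignored: every candidate is equally likely.
    return rng.choice(candidates)


corpus = {
    "question1": ["Answer1_1", "Answer1_2", "Answer1_3", "Answer1_4"],
    "question2": ["Answer2_1", "Answer2_2"],
}
candidates = train_big_random(corpus)
print(len(candidates), big_random_answer(candidates, "question1", random.Random(0)))
```

Because the question is ignored, an answer such as Answer2_1 can be given to Question1, which is what makes this model a nonsense baseline.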

Page 15

Experiments
- Comparisons between real corpora
- Comparisons between real & simulated corpora
- Comparisons between simulated corpora

Page 16

Real corpora comparisons (1)
Evaluation metrics
- High-level dialog features
- Dialog style and cooperativeness
- Dialog success rate and efficiency
- Learning gains
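The high-level dialog features that appear in the charts later in this deck (tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate, correctRate) can be computed from dialogue logs roughly as follows. Several parts of this are assumptions not defined on the slides: the log format, the whitespace tokenization, the reading of tWordRate/sWordRate as words per turn, and the reading of correctRate as the share of student turns labelled correct. The example dialogue is made up.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical log format: (speaker, text, label); speaker is "T" (tutor) or "S" (student),
# label is "c"/"ic" for student turns and None for tutor turns.
Turn = Tuple[str, str, Optional[str]]


def dialogue_features(turns: List[Turn]) -> dict:
    tutor = [text for speaker, text, _ in turns if speaker == "T"]
    student = [(text, label) for speaker, text, label in turns if speaker == "S"]
    tutor_words = sum(len(text.split()) for text in tutor)
    stu_words = sum(len(text.split()) for text, _ in student)
    n_correct = sum(1 for _, label in student if label == "c")
    return {
        "tutorTurn": len(tutor),
        "tutorWord": tutor_words,
        "tWordRate": tutor_words / len(tutor) if tutor else 0.0,      # assumed: words per tutor turn
        "stuTurn": len(student),
        "stuWord": stu_words,
        "sWordRate": stu_words / len(student) if student else 0.0,    # assumed: words per student turn
        "correctRate": n_correct / len(student) if student else 0.0,  # assumed definition
    }


def corpus_features(dialogues: List[List[Turn]]) -> dict:
    """Average each feature over all dialogues in a corpus."""
    per_dialogue = [dialogue_features(d) for d in dialogues]
    return {name: mean(f[name] for f in per_dialogue) for name in per_dialogue[0]}


example = [
    ("T", "What is the net force on the keys?", None),
    ("S", "gravity", "c"),
    ("T", "Good. What is the direction of the acceleration?", None),
    ("S", "up", "ic"),
]
print(corpus_features([example]))
```

The same function can be run on a real corpus and on a simulated one, which gives directly comparable feature vectors.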

Page 17

Real corpora comparisons (2)
High-level dialog features

Page 18

Real corpora comparisons (3)
Dialogue style features

Page 19

Real corpora comparisons (4)
Dialogue success rate

Page 20

Real corpora comparisons (5)
Learning gains features

Page 21

Results
- Differences captured by these simple metrics can't tell us whether a corpus is real or not (Schatzmann et al., 2005)
- Differences could be due to different user populations
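As a rough illustration of how simple metrics pick up population differences, the sketch below compares the average rows of the Corpus (4) table using a symmetric relative difference averaged over the four features. This distance is a hypothetical choice made for illustration, not the measure used in this work.

```python
# Average rows copied from the Corpus (4) table.
# Feature order: stuWord, stuTurn, tutorWord, tutorTurn.
AVG = {
    "f03":   [57.16, 23.35, 1256.92, 29.64],
    "05syn": [91.0963, 30.78519, 1655.467, 38.06667],
    "05pre": [87.34559, 30.11765, 1597.206, 37.33088],
}


def relative_distance(a, b):
    """Mean over features of |x - y| / ((x + y) / 2)."""
    return sum(abs(x - y) / ((x + y) / 2) for x, y in zip(a, b)) / len(a)


for first, second in [("f03", "05syn"), ("f03", "05pre"), ("05syn", "05pre")]:
    print(first, second, round(relative_distance(AVG[first], AVG[second]), 3))
```

The two 2005 conditions, which the table groups together as s05, come out much closer to each other (about 0.03) than either is to f03 (about 0.28 to 0.31). This is consistent with a population effect, but as the slide notes, such a difference alone does not say whether a corpus is real.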

Page 22

Real vs. Simulated Corpora Comparisons
[Figure: bar chart (y-axis 0 to 2) over the features tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate, correctRate; series: f03, 03smooth, 03random, s05, 05smooth]

Page 23

Results (1)
- Most of the measurements can distinguish between the Random and ProbCorrect models
- The ProbCorrect model generates more realistic behaviors
- We can't draw conclusions about the power of these metrics, since the two simulated corpora are very different

Page 24

Results (2)
- Differences between the real corpora and the Random models are captured clearly, but differences between the real corpora and the ProbCorrect models are not
- We didn't expect this simple model to produce a very realistic corpus, so it is surprising that the differences are small

Page 25

Results (3)
- s05 variety > f03 variety
- 05ProbCorrect variety > 03ProbCorrect variety
- However, the simulated corpora do not show significantly more variety than the real ones. Possible reasons:
  - The computer tutor is simple (c/ic)
  - We're using the same candidate answer set
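The second explanation can be made concrete. A model that only re-samples a fixed candidate answer set can never produce more distinct answers than that set contains, however many dialogues it generates. The variety measure below (count of distinct normalized answers) and the sample answers are hypothetical.

```python
import random


def variety(answers):
    """Hypothetical variety measure: number of distinct answers after normalization."""
    return len({a.strip().lower() for a in answers})


real_answers = ["down", "Down", "toward the earth", "zero", "down"]  # hypothetical real answers
candidates = sorted({a.strip().lower() for a in real_answers})     # the candidate answer set

rng = random.Random(0)
# Resampling only, as both the ProbCorrect and Random models do.
simulated = [rng.choice(candidates) for _ in range(1000)]

print(variety(real_answers), variety(simulated), len(candidates))
```

Generating 1000 simulated answers still yields at most 3 distinct ones here, the size of the candidate set.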

Page 26

Results (4)
- ProbCorrect models trained on different real corpora are quite different
- A ProbCorrect model is more similar to the real corpus it is trained on than to the other real corpus

Page 27

Comparisons between simulated dialogues with different dialogue structures
[Figure: two bar charts, f03problem34 (y-axis 0 to 1.4) and f03problem7 (y-axis 0 to 1.6), over the features tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate, correctRate; series: 03prob, 03smoothed, 03random]

Page 28

Results
- The differences between the two simulated corpora are larger in prob7 than in prob34
- The dialogue structure of prob34 is more restricted
- The power of these simple metrics is limited by the dialogue structure

Page 29

Conclusions
The simple measurements can distinguish between:
- real corpora (different populations)
- simulated and real corpora (to different extents)
- simulated corpora (different models; trained on different corpora; limited to different dialog structures)

Page 30

Future work
- Explore “deep” evaluation metrics
- Test the simulated corpus on policy learning
- More simulation models:
  - More human features: emotion, learning
  - Special cases: quick learners, slow learners