A MACHINE LEARNING APPROACH FOR AUTOMATIC STUDENT MODEL DISCOVERY Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger Computer Science Department Carnegie Mellon University

Upload: hilary-parks

Post on 18-Jan-2018



TRANSCRIPT

Page 1:

A MACHINE LEARNING APPROACH FOR AUTOMATIC STUDENT MODEL DISCOVERY
Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger
Computer Science Department
Carnegie Mellon University

Page 2:

STUDENT MODEL
• A set of knowledge components (KCs)
• Encoded in intelligent tutors to model how students solve problems
  • Example: What to do next on problems like 3x=12
• A key factor behind instructional decisions in automated tutoring systems

Page 3:

STUDENT MODEL CONSTRUCTION
• Traditional Methods (require expert input; highly subjective)
  • Structured interviews
  • Think-aloud protocols
  • Rational analysis
• Previous Automated Methods (within the search space of human-provided factors)
  • Learning factor analysis (LFA)
• Proposed Approach (independent of human-provided factors)
  • Use a machine-learning agent, SimStudent, to acquire knowledge
  • 1 production rule acquired => 1 KC in student model (Q matrix)

Page 4:

A BRIEF REVIEW OF SIMSTUDENT
• A machine-learning agent that
  • acquires production rules
  • from examples & problem solving experience
  • given a set of feature predicates & functions

Page 5:

PRODUCTION RULES
Skill divide (e.g. -3x = 6)
What:
  Left side (-3x)
  Right side (6)
When:
  Left side (-3x) does not have constant term
=>
How:
  Get-coefficient (-3) of left side (-3x)
  Divide both sides with the coefficient

• Each production rule is associated with one KC
• Each step (-3x = 6) is labeled with one KC, decided by the production applied to that step
• Original model required strong domain-specific operators, like Get-coefficient
  • Does not differentiate important distinctions in learning (e.g., -x=3 vs -3x = 6)
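The What / When / How structure of the divide rule can be sketched in Python. This is only an illustration under stated assumptions, not SimStudent's actual representation: the function names `get_coefficient`, `has_constant_term`, and `divide_rule` are hypothetical, and equations are assumed to be simple strings such as "-3x = 6" with an integer right side.

```python
import re
from fractions import Fraction


def get_coefficient(term):
    """Strong, domain-specific operator (hypothetical): coefficient of a term like '-3x'."""
    m = re.fullmatch(r"([+-]?\d*)([a-z])", term)
    if m is None:
        raise ValueError(f"not a single variable term: {term!r}")
    digits = m.group(1)
    if digits in ("", "+"):
        return 1
    if digits == "-":
        return -1
    return int(digits)


def has_constant_term(side):
    """Feature predicate (hypothetical): True if the side has a standalone number, as in '2x+5'."""
    terms = re.findall(r"[+-]?[^+-]+", side)
    return any(re.fullmatch(r"[+-]?\d+", t) for t in terms)


def divide_rule(equation):
    """Apply the 'divide' production; return (KC label, next state) or None if it does not fire."""
    # What: the left side and right side of the equation.
    left, right = [s.strip() for s in equation.split("=")]
    # When: the left side does not have a constant term.
    if has_constant_term(left):
        return None
    # How: get the coefficient, then divide both sides with it.
    coefficient = get_coefficient(left)
    variable = left[-1]
    return "divide", f"{variable} = {Fraction(int(right), coefficient)}"


print(divide_rule("-3x = 6"))   # ('divide', 'x = -2')
print(divide_rule("-x = 3"))    # ('divide', 'x = -3')
print(divide_rule("2x+5 = 9"))  # None
```

In this sketch both -3x = 6 and -x = 3 receive the same KC label, which mirrors the limitation noted on the slide: a rule built on a strong operator does not separate the two problem forms.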

Page 6:

DEEP FEATURE LEARNING
• Expert vs Novice (Chi et al., 1981)
  • Example: What's the coefficient of -3x?
    • Expert uses deep functional features to reply -3
    • Novice may use shallow perceptual features to reply 3
• Model deep feature learning using machine learning techniques
• Integrate acquired knowledge into SimStudent learning
• Remove dependence on strong operators & split KCs into finer grain sizes

Page 7:

FEATURE RECOGNITION AS PCFG INDUCTION
• Underlying structure in the problem => Grammar
• Feature => Non-terminal symbol in a grammar rule
• Feature learning task => Grammar induction
• Student errors => Incorrect parsing

Page 8:

LEARNING PROBLEM
• Input is a set of feature recognition records consisting of
  • An original problem (e.g. -3x)
  • The feature to be recognized (e.g. -3 in -3x)
• Output
  • A probabilistic context free grammar (PCFG)
  • A non-terminal symbol in a grammar rule that represents target feature

Page 9:

A TWO-STEP PCFG LEARNING ALGORITHM
• Greedy Structure Hypothesizer:
  • Hypothesizes grammar rules in a bottom-up fashion
  • Creates non-terminal symbols for frequently occurring sequences
    • E.g. - and 3, SignedNumber and Variable
• Viterbi Training Phase:
  • Refines rule probabilities
    • Occur more frequently => Higher probabilities
  • Generalizes Inside-Outside Algorithm (Lari & Young, 1990)
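The bottom-up idea behind the structure step can be illustrated with a much simplified sketch: repeatedly find the most frequent adjacent pair of symbols and replace it with a new non-terminal. This is an assumption-laden stand-in, not the Greedy Structure Hypothesizer itself; the function names and the generated symbol names (N1, N2) are hypothetical, and the sketch ignores rule probabilities entirely.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return the adjacent symbol pair that occurs most often, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def replace_pair(seq, pair, symbol):
    """Rewrite seq left to right, replacing each occurrence of pair with symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def hypothesize_rules(sequences):
    """Greedily create one non-terminal per frequent pair until no pairs remain."""
    sequences = [list(s) for s in sequences]
    rules = {}
    while True:
        pair = most_frequent_pair(sequences)
        if pair is None:
            return rules, sequences
        symbol = f"N{len(rules) + 1}"  # hypothetical name for the new non-terminal
        rules[symbol] = pair
        sequences = [replace_pair(s, pair, symbol) for s in sequences]


rules, reduced = hypothesize_rules([["-", "3", "x"], ["-", "3", "x"], ["-", "3"]])
print(rules)    # {'N1': ('-', '3'), 'N2': ('N1', 'x')}
print(reduced)  # [['N2'], ['N2'], ['N1']]
```

Here N1 plays the role of SignedNumber from the slide (it covers - and 3), and N2 combines that signed number with the variable.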

Page 10:

EXAMPLE OF PRODUCTION RULES BEFORE AND AFTER INTEGRATION
• Extend the "What" Part in Production Rule

Original:
Skill divide (e.g. -3x = 6)
What:
  Left side (-3x)
  Right side (6)
When:
  Left side (-3x) does not have constant term
=>
How:
  Get coefficient (-3) of left side (-3x)
  Divide both sides with the coefficient (-3)

Extended:
Skill divide (e.g. -3x = 6)
What:
  Left side (-3, -3x)
  Right side (6)
When:
  Left side (-3x) does not have constant term
=>
How:
  Get coefficient (-3) of left side (-3x)
  Divide both sides with the coefficient (-3)

• Fewer operators
• Eliminate need for domain-specific operators


Page 12:

EXPERIMENT METHOD
• SimStudent vs. Human-generated model
• Code real student data
  • 71 students used a Carnegie Learning Algebra I Tutor on equation solving
• SimStudent:
  • Tutored by a Carnegie Learning Algebra I Tutor
  • Coded each step by the applicable production rule
  • Used human-generated coding in case of no applicable production
• Human-generated model:
  • Coded manually based on expertise

Page 13:

HUMAN-GENERATED VS SIMSTUDENT KCS

                                                                    Human-generated Model   SimStudent   Comment
Total # of KCs                                                      12                      21
# of Basic Arithmetic Operation KCs                                 4                       13           Split into finer grain sizes based on different problem forms
# of Typein KCs                                                     4                       4            Approximately the same
# of Other Transformation Operation KCs (e.g. combine like terms)   4                       4            Approximately the same

Page 14:

HOW WELL TWO MODELS FIT WITH REAL STUDENT DATA
• Used Additive Factor Model (AFM)
  • An instance of logistic regression that
    • Uses each student, each KC and KC by opportunity interaction as independent variables
    • To predict probabilities of a student making an error on a specific step
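A minimal sketch of an AFM-style prediction is given below. It assumes the commonly used formulation in which the log-odds of a correct response is a student intercept plus, for every KC on the step, a KC intercept and a KC slope multiplied by the number of prior practice opportunities; the error probability is then one minus the predicted success probability. The function name, parameter names, and all numeric values are illustrative assumptions, not fitted values from the study.

```python
import math


def afm_error_probability(theta, beta, gamma, q_row, opportunities):
    """Predicted probability that a student makes an error on one step.

    theta         -- student proficiency (intercept)
    beta[kc]      -- easiness of each KC
    gamma[kc]     -- learning rate of each KC
    q_row[kc]     -- 1 if the step is labeled with the KC, else 0 (one Q-matrix row)
    opportunities -- prior practice opportunities the student had on each KC
    """
    logit = theta + sum(
        q * (beta[kc] + gamma[kc] * opportunities[kc]) for kc, q in q_row.items()
    )
    p_correct = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 - p_correct


# Made-up parameters for a single KC, only to show the shape of the computation.
beta = {"simSt-divide": 0.0}
gamma = {"simSt-divide": 0.2}
q_row = {"simSt-divide": 1}

first_try = afm_error_probability(0.0, beta, gamma, q_row, {"simSt-divide": 0})
sixth_try = afm_error_probability(0.0, beta, gamma, q_row, {"simSt-divide": 5})
print(first_try, sixth_try)  # the predicted error rate drops with practice
```

Because the KC labels enter only through the Q-matrix row, swapping the human-generated labeling for the SimStudent labeling changes which intercepts and slopes are fitted while the rest of the model stays the same.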

Page 15:

AN EXAMPLE OF SPLIT IN DIVISION
• Human-generated Model
  • divide: Ax=B & -x=A
• SimStudent
  • simSt-divide: Ax=B
  • simSt-divide-1: -x=A

                 Ax=B            -x=A
divide           1 1 1 1 1 1 1   1 1 1
simSt-divide     1 1 1 1 1 1 1   0 0 0
simSt-divide-1   0 0 0 0 0 0 0   1 1 1
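The labeling above can be reproduced with a few lines of Python. This is a sketch assuming ten division steps, seven of the form Ax=B followed by three of the form -x=A, as the rows of ones and zeros suggest; the dictionary layout and the helper name `q_matrix` are hypothetical.

```python
# Ten division steps, identified only by their problem form.
steps = ["Ax=B"] * 7 + ["-x=A"] * 3

# Each model maps a KC name to the problem forms it covers.
human_model = {"divide": {"Ax=B", "-x=A"}}
simstudent_model = {"simSt-divide": {"Ax=B"}, "simSt-divide-1": {"-x=A"}}


def q_matrix(model, steps):
    """Return one 0/1 row per KC, with one column per step."""
    return {
        kc: [1 if form in forms else 0 for form in steps]
        for kc, forms in model.items()
    }


for kc, row in {**q_matrix(human_model, steps), **q_matrix(simstudent_model, steps)}.items():
    print(f"{kc:15s}", *row)
```

Every column of the SimStudent rows sums to one, which matches the statement that each step is labeled with exactly one KC.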

Page 16:

PRODUCTION RULES FOR DIVISION

Skill simSt-divide (e.g. -3x = 6)
What:
  Left side (-3, -3x)
  Right side (6)
When:
  Left side (-3x) does not have constant term
How:
  Divide both sides with the coefficient (-3)

Skill simSt-divide-1 (e.g. -x = 3)
What:
  Left side (-x)
  Right side (3)
When:
  Left side (-x) is of the form -v
How:
  Generate one (1)
  Divide both sides with -1

Page 17:

AN EXAMPLE WITHOUT SPLIT IN DIVIDE TYPEIN
• Human-generated Model
  • divide-typein
• SimStudent
  • simSt-divide-typein

divide-typein         1 1 1 1 1 1 1 1 1
simSt-divide-typein   1 1 1 1 1 1 1 1 1

Page 18:

SIMSTUDENT VS SIMSTUDENT + FEATURE LEARNING
• SimStudent
  • Needs strong operators
  • Constructs student models similar to human-generated model
• Extended SimStudent
  • Only requires weak operators
  • Split KCs into finer grain sizes based on different parse trees
• Does Extended SimStudent produce a KC model that better fits student learning data?

Page 19:

RESULTS

                               Human-generated Model   SimStudent
AIC                            6529                    6448
3-Fold Cross Validation RMSE   0.4034                  0.3997

• Significance Test
  • SimStudent outperforms the human-generated model in 4260 out of 6494 steps
    • p < 0.001
  • SimStudent outperforms the human-generated model across 20 runs of cross validation
    • p < 0.001
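The slide does not name the test behind the step-level comparison. One plausible reading is a sign test, in which each step counts as a win for whichever model predicted it better and ties are dropped. Under that assumption, the reported counts can be checked with an exact two-sided binomial computation; the function name is hypothetical.

```python
from math import comb


def sign_test_p_value(wins, n):
    """Exact two-sided sign test p-value under the null hypothesis P(win) = 0.5."""
    k = max(wins, n - wins)
    # Number of outcomes at least as extreme as k wins, out of 2**n equally likely ones.
    tail = sum(comb(n, i) for i in range(k, n + 1))
    # Exact integer arithmetic first, then one division; cap at 1 for the symmetric case.
    return min(1.0, 2 * tail / 2 ** n)


p = sign_test_p_value(4260, 6494)
print(p < 0.001)  # True
```

With 4260 wins out of 6494 steps, the count sits roughly 25 standard deviations above the 3247 expected by chance, so the result is far below the 0.001 threshold reported on the slide.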

Page 20:

SUMMARY
• Presented an innovative application of a machine-learning agent, SimStudent, for automatic discovery of student models.
• Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.

Page 21:

FUTURE STUDIES
• Test generality in other datasets in DataShop
• Apply this proposed approach in other domains
  • Stoichiometry
  • Fraction addition


Page 23:

AN EXAMPLE IN ALGEBRA


Page 26:

A COMPUTATIONAL MODEL OF DEEP FEATURE LEARNING
• Extended a PCFG Learning Algorithm (Li et al., 2009)
  • Feature Learning
  • Stronger Prior Knowledge:
    • Transfer Learning Using Prior Knowledge


Page 28:

FEATURE LEARNING
• Build most probable parse trees
  • For all observation sequences
• Select a non-terminal symbol that
  • Matches the most training records as the target feature
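The selection step can be sketched as a vote over parse trees. The example assumes the most probable parse trees are already available as nested tuples of the form (symbol, child, ...), with plain strings as leaves; this tree encoding, the function names, and the symbol names other than SignedNumber and Variable are hypothetical.

```python
from collections import Counter


def tree_yield(node):
    """Concatenate the leaves under a node to recover the text it covers."""
    if isinstance(node, str):
        return node
    _symbol, *children = node
    return "".join(tree_yield(child) for child in children)


def spans(node):
    """Yield (non-terminal, covered text) for every internal node of the tree."""
    if isinstance(node, str):
        return
    yield node[0], tree_yield(node)
    for child in node[1:]:
        yield from spans(child)


def select_target_feature(records):
    """Pick the non-terminal whose covered text equals the target in the most records."""
    votes = Counter()
    for tree, target in records:
        votes.update({symbol for symbol, text in spans(tree) if text == target})
    return votes.most_common(1)[0][0] if votes else None


records = [
    (("Expression",
      ("SignedNumber", ("MinusSign", "-"), ("Number", "3")),
      ("Variable", "x")), "-3"),
    (("Expression",
      ("SignedNumber", ("Number", "5")),
      ("Variable", "y")), "5"),
]
print(select_target_feature(records))  # SignedNumber
```

In the second record both Number and SignedNumber cover the target 5, but only SignedNumber also matches the first record, so it wins the vote.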

Page 29:

TRANSFER LEARNING USING PRIOR KNOWLEDGE
• GSH Phase:
  • Build parse trees based on some previously acquired grammar rules
  • Then call the original GSH
• Viterbi Training:
  • Add rule frequency in previous task to the current task

[Figure: example rule probabilities 0.66, 0.33, 0.5, 0.5]
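The effect of carrying rule frequencies over from a previous task can be sketched as follows. The example assumes that rule probabilities are relative frequencies among rules sharing the same left-hand side; the rule encoding, the function name, and the counts are made up for illustration and are not taken from the figure.

```python
from collections import Counter, defaultdict


def rule_probabilities(current_counts, previous_counts=None):
    """Normalize rule counts per left-hand side, optionally adding counts from a prior task."""
    total = Counter(current_counts)
    if previous_counts:
        total.update(previous_counts)  # add rule frequency from the previous task
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in total.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in total.items()}


# Hypothetical counts: a rule is a (left-hand side, right-hand side) pair.
signed = ("SignedNumber", ("MinusSign", "Number"))
unsigned = ("SignedNumber", ("Number",))
current = {signed: 1, unsigned: 1}
previous = {signed: 2}

print(rule_probabilities(current))            # both rules at 0.5
print(rule_probabilities(current, previous))  # signed rule 0.75, unsigned rule 0.25
```

Rules that were already frequent in the earlier task start the new task with a higher probability, which is the sense in which prior knowledge transfers.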