
A MACHINE LEARNING APPROACH FOR AUTOMATIC STUDENT MODEL DISCOVERY
Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger
Computer Science Department
Carnegie Mellon University

2

STUDENT MODEL
• A set of knowledge components (KCs)
• Encoded in intelligent tutors to model how students solve problems
  • Example: what to do next on problems like 3x=12
• A key factor behind instructional decisions in automated tutoring systems

3

STUDENT MODEL CONSTRUCTION
• Traditional methods
  • Structured interviews
  • Think-aloud protocols
  • Rational analysis
  • Require expert input. Highly subjective.
• Previous automated methods
  • Learning Factors Analysis (LFA)
  • Within the search space of human-provided factors.
• Proposed approach
  • Use a machine-learning agent, SimStudent, to acquire knowledge
  • 1 production rule acquired => 1 KC in student model (Q matrix)
  • Independent of human-provided factors.
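The "1 production rule => 1 KC" mapping can be pictured as a Q matrix with one row per step and one column per KC. The following is a minimal sketch under stated assumptions: the step labels and the helper `build_q_matrix` are hypothetical and are not part of SimStudent.

```python
from typing import Dict, List, Tuple


def build_q_matrix(step_labels: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (kc_names, q) where q[i][j] is 1 if step i is labeled with KC j."""
    # One KC per distinct production rule, kept in first-seen order.
    kc_names: List[str] = []
    for label in step_labels:
        if label not in kc_names:
            kc_names.append(label)
    column: Dict[str, int] = {name: j for j, name in enumerate(kc_names)}
    q = [[0] * len(kc_names) for _ in step_labels]
    for i, label in enumerate(step_labels):
        q[i][column[label]] = 1
    return kc_names, q


# Hypothetical labels: the production rule that applied to each step.
labels = ["divide", "subtract", "divide", "add"]
kc_names, q = build_q_matrix(labels)
for label, row in zip(labels, q):
    print(label, row)
```

Each row contains exactly one 1 because, in this sketch, every step is labeled with a single KC.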

4

A BRIEF REVIEW OF SIMSTUDENT
• A machine-learning agent that
  • acquires production rules from
  • examples & problem solving experience
  • given a set of feature predicates & functions

5

PRODUCTION RULES
Skill divide (e.g. -3x = 6)
  What:
    Left side (-3x)
    Right side (6)
  When:
    Left side (-3x) does not have constant term
  =>
  How:
    Get-coefficient (-3) of left side (-3x)
    Divide both sides with the coefficient

• Each production rule is associated with one KC
• Each step (-3x = 6) is labeled with one KC, decided by the production applied to that step
• Original model required strong domain-specific operators, like Get-coefficient
• Does not differentiate important distinctions in learning (e.g., -x=3 vs -3x = 6)
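The what/when/how structure of a production rule can be sketched in a few lines of Python. This is an illustration only, not SimStudent's rule representation: the `ProductionRule` class, the `TERM` pattern, and the string encoding of equation sides are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

TERM = re.compile(r"^(-?\d*)x$")  # a single variable term such as -3x, x, or -x


def get_coefficient(side: str) -> int:
    """Domain-specific operator: read the coefficient of a single variable term."""
    match = TERM.match(side)
    if match is None:
        raise ValueError(f"not a single variable term: {side!r}")
    digits = match.group(1)
    if digits in ("", "-"):
        return -1 if digits == "-" else 1
    return int(digits)


@dataclass
class ProductionRule:
    name: str                         # also used as the KC label
    when: Callable[[str, str], bool]  # precondition on (left, right)
    how: Callable[[str, str], str]    # action to take on (left, right)

    def apply(self, left: str, right: str) -> Optional[str]:
        return self.how(left, right) if self.when(left, right) else None


divide = ProductionRule(
    name="divide",
    when=lambda left, right: TERM.match(left) is not None,  # no constant term
    how=lambda left, right: f"divide {get_coefficient(left)}",
)

print(divide.apply("-3x", "6"))   # divide -3
print(divide.apply("-x", "3"))    # divide -1
print(divide.apply("2x+1", "5"))  # None
```

Because the strong operator `get_coefficient` handles both -3x and -x, a single rule (and therefore a single KC) covers both problem forms, which is the lack of differentiation noted on the slide.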

6

DEEP FEATURE LEARNING
• Expert vs Novice (Chi et al., 1981)
  • Example: What’s the coefficient of -3x?
    • Expert uses deep functional features to reply -3
    • Novice may use shallow perceptual features to reply 3
• Model deep feature learning using machine learning techniques
• Integrate acquired knowledge into SimStudent learning
• Remove dependence on strong operators & split KCs into finer grain sizes

7

FEATURE RECOGNITION AS PCFG INDUCTION
• Underlying structure in the problem -> Grammar
• Feature -> Non-terminal symbol in a grammar rule
• Feature learning task -> Grammar induction
• Student errors -> Incorrect parsing

8

LEARNING PROBLEM
• Input is a set of feature recognition records consisting of
  • An original problem (e.g. -3x)
  • The feature to be recognized (e.g. -3 in -3x)
• Output
  • A probabilistic context free grammar (PCFG)
  • A non-terminal symbol in a grammar rule that represents the target feature

9

A TWO-STEP PCFG LEARNING ALGORITHM
• Greedy Structure Hypothesizer:
  • Hypothesizes grammar rules in a bottom-up fashion
  • Creates non-terminal symbols for frequently occurring sequences
  • E.g. - and 3, SignedNumber and Variable
• Viterbi Training Phase:
  • Refines rule probabilities
  • Occur more frequently => higher probabilities
• Generalizes Inside-Outside Algorithm (Lari & Young, 1990)
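The bottom-up step of the structure hypothesizer can be illustrated with a simplified greedy pair-merging loop. This is a sketch of the general idea only, not the authors' algorithm: the merging criterion, the `min_count` threshold, the generated names N1, N2, and the pre-abstracted tokens ("Number", "Variable") are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def most_frequent_pair(sequences: List[List[str]]) -> Tuple[Pair, int]:
    """Count adjacent symbol pairs over all sequences and return the top one."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return ("", ""), 0
    return counts.most_common(1)[0]


def replace_pair(seq: List[str], pair: Pair, symbol: str) -> List[str]:
    """Rewrite every non-overlapping occurrence of pair as the new symbol."""
    out: List[str] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def hypothesize_structure(sequences: List[List[str]], min_count: int = 2):
    """Greedily create a non-terminal for the most frequent adjacent pair."""
    rules: Dict[str, Pair] = {}
    sequences = [list(seq) for seq in sequences]
    while True:
        pair, count = most_frequent_pair(sequences)
        if count < min_count:
            return rules, sequences
        symbol = f"N{len(rules) + 1}"  # hypothetical generated name
        rules[symbol] = pair
        sequences = [replace_pair(seq, pair, symbol) for seq in sequences]


# Token sequences for -3x, -5x and -4, with digits and letters pre-abstracted.
data = [["-", "Number", "Variable"], ["-", "Number", "Variable"], ["-", "Number"]]
rules, reduced = hypothesize_structure(data)
print(rules)
print(reduced)
```

On this toy input the first non-terminal groups "-" with "Number" (playing the role of SignedNumber) and the second groups that symbol with "Variable", mirroring the example on the slide.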

10

EXAMPLE OF PRODUCTION RULES BEFORE AND AFTER INTEGRATION
• Extend the “What” part in production rule

Original:
Skill divide (e.g. -3x = 6)
  What:
    Left side (-3x)
    Right side (6)
  When:
    Left side (-3x) does not have constant term
  =>
  How:
    Get coefficient (-3) of left side (-3x)
    Divide both sides with the coefficient (-3)

Extended:
Skill divide (e.g. -3x = 6)
  What:
    Left side (-3, -3x)
    Right side (6)
  When:
    Left side (-3x) does not have constant term
  =>
  How:
    Divide both sides with the coefficient (-3)

• Fewer operators
• Eliminate need for domain-specific operators


12

EXPERIMENT METHOD
• SimStudent vs. human-generated model
• Code real student data
  • 71 students used a Carnegie Learning Algebra I Tutor on equation solving
• SimStudent:
  • Tutored by a Carnegie Learning Algebra I Tutor
  • Coded each step by the applicable production rule
  • Used human-generated coding in case of no applicable production
• Human-generated model:
  • Coded manually based on expertise

13

HUMAN-GENERATED VS SIMSTUDENT KCS

                                          Human-generated Model   SimStudent   Comment
Total # of KCs                            12                      21
# of Basic Arithmetic Operation KCs       4                       13           Split into finer grain sizes based on different problem forms
# of Typein KCs                           4                       4            Approximately the same
# of Other Transformation Operation KCs   4                       4            Approximately the same
  (e.g. combine like terms)

14

HOW WELL TWO MODELS FIT WITH REAL STUDENT DATA
• Used Additive Factor Model (AFM)
  • An instance of logistic regression that
    • Uses each student, each KC and KC by opportunity interaction as independent variables
    • To predict probabilities of a student making an error on a specific step
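For reference, AFM is commonly written as the logistic regression below. The symbols do not appear on the slide and are introduced here for illustration; this form models the probability of a correct response, so the error probability on a step is its complement.

```latex
\ln\frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_{k} q_{jk}\,\bigl(\beta_k + \gamma_k T_{ik}\bigr),
\qquad P(\text{error on step } j \text{ by student } i) = 1 - p_{ij}
```

Here p_{ij} is the probability that student i completes step j correctly, \theta_i is the student's proficiency, q_{jk} is the Q-matrix entry (1 if step j uses KC k), \beta_k is the easiness of KC k, \gamma_k is its learning rate, and T_{ik} is the number of prior opportunities student i has had to practice KC k.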

AN EXAMPLE OF SPLIT IN DIVISION
• Human-generated model
  • divide: Ax=B & -x=A
• SimStudent
  • simSt-divide: Ax=B
  • simSt-divide-1: -x=A

                 Ax=B            -x=A
divide           1 1 1 1 1 1 1   1 1 1
simSt-divide     1 1 1 1 1 1 1   0 0 0
simSt-divide-1   0 0 0 0 0 0 0   1 1 1

16

PRODUCTION RULES FOR DIVISION
Skill simSt-divide (e.g. -3x = 6)
  What:
    Left side (-3, -3x)
    Right side (6)
  When:
    Left side (-3x) does not have constant term
  How:
    Divide both sides with the coefficient (-3)

Skill simSt-divide-1 (e.g. -x = 3)
  What:
    Left side (-x)
    Right side (3)
  When:
    Left side (-x) is of the form -v
  How:
    Generate one (1)
    Divide both sides with -1

17

AN EXAMPLE WITHOUT SPLIT IN DIVIDE TYPEIN
• Human-generated model
  • divide-typein
• SimStudent
  • simSt-divide-typein

divide-typein         1 1 1 1 1 1 1 1 1
simSt-divide-typein   1 1 1 1 1 1 1 1 1

18

SIMSTUDENT VS SIMSTUDENT + FEATURE LEARNING
• SimStudent
  • Needs strong operators
  • Constructs student models similar to human-generated model
• Extended SimStudent
  • Only requires weak operators
  • Splits KCs into finer grain sizes based on different parse trees
• Does Extended SimStudent produce a KC model that better fits student learning data?

19

RESULTS

                               Human-generated Model   SimStudent
AIC                            6529                    6448
3-Fold Cross Validation RMSE   0.4034                  0.3997

• Significance test
  • SimStudent outperforms the human-generated model in 4260 out of 6494 steps
    • p < 0.001
  • SimStudent outperforms the human-generated model across 20 runs of cross validation
    • p < 0.001
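For readers unfamiliar with the two fit measures, here is a minimal sketch of how AIC and RMSE are usually computed. The function names and the sample numbers are made up for illustration and are not the study's data or code.

```python
import math
from typing import Sequence


def aic(log_likelihood: float, num_parameters: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * num_parameters - 2 * log_likelihood


def rmse(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Root mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared = [(p - y) ** 2 for p, y in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up numbers for illustration only.
predicted = [0.9, 0.2, 0.6, 0.4]
observed = [1, 0, 1, 0]
print(round(rmse(predicted, observed), 4))
print(aic(log_likelihood=-100.0, num_parameters=10))
```

Lower values are better on both measures, which is the sense in which the SimStudent column (AIC 6448, RMSE 0.3997) improves on the human-generated model (AIC 6529, RMSE 0.4034).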

20

SUMMARY
• Presented an innovative application of a machine-learning agent, SimStudent, for automatic discovery of student models.
• Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.

21

FUTURE STUDIES
• Test generality in other datasets in DataShop
• Apply this proposed approach in other domains
  • Stoichiometry
  • Fraction addition

23

AN EXAMPLE IN ALGEBRA


26

A COMPUTATIONAL MODEL OF DEEP FEATURE LEARNING
• Extended a PCFG Learning Algorithm (Li et al., 2009)
  • Feature Learning
  • Stronger Prior Knowledge: Transfer Learning Using Prior Knowledge


28

FEATURE LEARNING
• Build most probable parse trees
  • For all observation sequences
• Select a non-terminal symbol that
  • Matches the most training records as the target feature
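The selection step can be sketched as a vote over parse trees. This is a minimal illustration that assumes the most probable parse trees are already available; the nested-tuple tree encoding, the symbol names, and the helper functions are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple, Union

Tree = Union[str, Tuple]  # a leaf token, or (symbol, child, child, ...)


def tree_yield(tree: Tree) -> str:
    """Concatenate the leaf tokens under a node."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1:])


def matching_symbols(tree: Tree, target: str) -> Set[str]:
    """Non-terminals in the tree whose subtree spells out the target feature."""
    if isinstance(tree, str):
        return set()
    found = {tree[0]} if tree_yield(tree) == target else set()
    for child in tree[1:]:
        found |= matching_symbols(child, target)
    return found


def select_target_feature(records: List[Tuple[Tree, str]]) -> str:
    """Pick the non-terminal that matches the target in the most records."""
    votes: Counter = Counter()
    for tree, target in records:
        votes.update(matching_symbols(tree, target))
    return votes.most_common(1)[0][0]


# Hypothetical parse trees for -3x, 5x and -7y, each paired with its coefficient.
records = [
    (("Expression", ("SignedNumber", ("MinusSign", "-"), ("Number", "3")),
      ("Variable", "x")), "-3"),
    (("Expression", ("SignedNumber", ("Number", "5")), ("Variable", "x")), "5"),
    (("Expression", ("SignedNumber", ("MinusSign", "-"), ("Number", "7")),
      ("Variable", "y")), "-7"),
]
print(select_target_feature(records))  # SignedNumber
```

In this toy data `Number` matches only the record for 5x, while `SignedNumber` matches all three, so it is chosen as the coefficient feature.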

29

TRANSFER LEARNING USING PRIOR KNOWLEDGE
• GSH (Greedy Structure Hypothesizer) Phase:
  • Build parse trees based on some previously acquired grammar rules
  • Then call the original GSH
• Viterbi Training:
  • Add rule frequency in previous task to the current task

[Figure: example grammar with rule probabilities 0.66, 0.33, 0.5, 0.5]
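Adding rule frequencies from a previous task before normalizing can be sketched as follows. The rule counts are hypothetical and the `rule_probabilities` helper is an assumption for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side symbols)


def rule_probabilities(counts: Counter) -> Dict[Rule, float]:
    """Normalize rule counts so rules sharing a left-hand side sum to 1."""
    totals: Dict[str, int] = defaultdict(int)
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}


# Hypothetical rule frequencies; not taken from the slides.
previous_task = Counter({
    ("SignedNumber", ("-", "Number")): 3,
    ("SignedNumber", ("Number",)): 1,
})
current_task = Counter({
    ("SignedNumber", ("-", "Number")): 1,
    ("SignedNumber", ("Number",)): 1,
    ("Expression", ("SignedNumber", "Variable")): 2,
})

# Transfer: frequencies from the previous task are added to the current task.
combined = previous_task + current_task
probabilities = rule_probabilities(combined)
for rule, p in probabilities.items():
    print(rule, round(p, 2))
```

With the current task alone the two `SignedNumber` rules would be equally likely; after adding the earlier counts, the rule seen more often in the previous task receives the higher probability.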
