Educational Data Mining

Ryan S.J.d. Baker, PSLC/HCII, Carnegie Mellon University
Richard Scheines, Professor of Statistics, Machine Learning, and Human-Computer Interaction, Carnegie Mellon University
Ken Koedinger, CMU Director of PSLC, Professor of Human-Computer Interaction & Psychology, Carnegie Mellon University


TRANSCRIPT

Page 1:

Educational Data Mining

Ryan S.J.d. Baker, PSLC/HCII

Carnegie Mellon University

Richard Scheines

Professor of Statistics, Machine Learning, and Human-Computer Interaction

Carnegie Mellon University

Ken Koedinger, CMU Director of PSLC

Professor of Human-Computer Interaction & Psychology

Carnegie Mellon University

Page 2:

In this segment…

We will give a brief overview of classes of Educational Data Mining methods

Discussing in detail:

Causal Data Mining, an important Educational Data Mining method

Bayesian Knowledge Tracing, one of the key building blocks of many Educational Data Mining analyses

Page 3:

EDM Methods (Baker, under review)

Prediction
Clustering
Relationship Mining
Discovery with Models
Distillation of Data for Human Judgment

Page 4:

Coverage at EDM2008 (of 31 papers; not mutually exclusive):

Prediction – 45%
Clustering – 6%
Relationship Mining – 19%
Discovery with Models – 13%
Distillation of Data for Human Judgment – 16%

None of the Above – 6%

Page 5:

We will talk about three approaches now

2 types of Prediction
1 type of Relationship Mining

Tomorrow, 9:30am: Discovery with Models

Yesterday: Some examples of Distillation of Data for Human Judgment

Page 6:

Prediction

Pretty much what it says

A student is using a tutor right now. Is he gaming the system or not? (“attempting to succeed in an interactive learning environment by exploiting properties of the system rather than by learning the material”)

A student has used the tutor for the last half hour. How likely is it that she knows the knowledge component in the next step?

A student has completed three years of high school. What will be her score on the SAT-Math exam?

Page 7:

Two Key Types of Prediction

This slide adapted from slide by Andrew W. Moore, Google http://www.cs.cmu.edu/~awm/tutorials

Page 8:

Classification

There is something you want to predict (“the label”)

The thing you want to predict is categorical: the answer is one of a set of categories, not a number

CORRECT/WRONG (sometimes expressed as 0,1)
HELP REQUEST/WORKED EXAMPLE REQUEST/ATTEMPT TO SOLVE
WILL DROP OUT/WON’T DROP OUT
WILL SELECT PROBLEM A,B,C,D,E,F, or G

Page 9:

Classification

Associated with each label is a set of “features”, which you may be able to use to predict the label

KnowledgeComp   pknow   time   totalactions   right
ENTERINGGIVEN   0.704     9        1          WRONG
ENTERINGGIVEN   0.502    10        2          RIGHT
USEDIFFNUM      0.049     6        1          WRONG
ENTERINGGIVEN   0.967     7        3          RIGHT
REMOVECOEFF     0.792    16        1          WRONG
REMOVECOEFF     0.792    13        2          RIGHT
USEDIFFNUM      0.073     5        2          RIGHT
…

Page 10:

Classification

The basic idea of a classifier is to determine which features, in which combination, can predict the label

KnowledgeComp   pknow   time   totalactions   right
ENTERINGGIVEN   0.704     9        1          WRONG
ENTERINGGIVEN   0.502    10        2          RIGHT
USEDIFFNUM      0.049     6        1          WRONG
ENTERINGGIVEN   0.967     7        3          RIGHT
REMOVECOEFF     0.792    16        1          WRONG
REMOVECOEFF     0.792    13        2          RIGHT
USEDIFFNUM      0.073     5        2          RIGHT
…
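For concreteness, here is a minimal sketch of that idea: training a decision tree classifier on the pknow, time, and totalactions features to predict the RIGHT/WRONG label. It uses scikit-learn purely for brevity (any of the packages named on the next slide would work similarly), and the handful of rows from the table as toy training data.

```python
# A minimal sketch: train a decision tree to predict RIGHT/WRONG
# from the pknow, time, and totalactions features shown above.
from sklearn.tree import DecisionTreeClassifier

# Toy training data: the rows from the table (features only, label kept separate)
X = [
    [0.704,  9, 1],
    [0.502, 10, 2],
    [0.049,  6, 1],
    [0.967,  7, 3],
    [0.792, 16, 1],
    [0.792, 13, 2],
    [0.073,  5, 2],
]
y = ["WRONG", "RIGHT", "WRONG", "RIGHT", "WRONG", "RIGHT", "RIGHT"]

clf = DecisionTreeClassifier(max_depth=2)   # keep the tree small for tiny data
clf.fit(X, y)

# Predict the label for a new, unseen action
print(clf.predict([[0.60, 8, 1]]))
```

In practice you would train on far more rows and hold out data to check that the learned combination of features generalizes.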

Page 11:

Many algorithms you can use

Decision Trees (e.g. C4.5, J48, etc.)
Logistic Regression
Etc., etc.

In your favorite Machine Learning package: WEKA, RapidMiner, KEEL

Page 12:

Regression

There is something you want to predict (“the label”)

The thing you want to predict is numerical

Number of hints student requests (0, 1, 2, 3...)

How long student takes to answer (4.7 s., 8.9 s., 88.2 s., 0.3 s.)

What will the student’s test score be (95%, 84%, 33%, 100%)

Page 13:

Regression

Associated with each label is a set of “features”, which you may be able to use to predict the label

KnowledgeComp   pknow   time   totalactions   numhints
ENTERINGGIVEN   0.704     9        1          0
ENTERINGGIVEN   0.502    10        2          0
USEDIFFNUM      0.049     6        1          3
ENTERINGGIVEN   0.967     7        3          0
REMOVECOEFF     0.792    16        1          1
REMOVECOEFF     0.792    13        2          0
USEDIFFNUM      0.073     5        2          0
…

Page 14:

Regression

The basic idea of regression is to determine which features, in which combination, can predict the label’s value

KnowledgeComp   pknow   time   totalactions   numhints
ENTERINGGIVEN   0.704     9        1          0
ENTERINGGIVEN   0.502    10        2          0
USEDIFFNUM      0.049     6        1          3
ENTERINGGIVEN   0.967     7        3          0
REMOVECOEFF     0.792    16        1          1
REMOVECOEFF     0.792    13        2          0
USEDIFFNUM      0.073     5        2          0
…

Page 15:

Linear Regression

The most classic form of regression is linear regression

Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions
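In practice the weights in such an equation are estimated from data rather than written down by hand. A minimal sketch with numpy, using the toy rows from the regression table above (the resulting weights will not match the illustrative equation on the slide):

```python
# A minimal sketch: estimate linear regression weights with ordinary least squares,
# predicting numhints from pknow, time, and totalactions.
import numpy as np

X = np.array([
    [0.704,  9, 1],
    [0.502, 10, 2],
    [0.049,  6, 1],
    [0.967,  7, 3],
    [0.792, 16, 1],
    [0.792, 13, 2],
    [0.073,  5, 2],
], dtype=float)
y = np.array([0, 0, 3, 0, 1, 0, 0], dtype=float)   # the numhints column

# Append an intercept column and solve the least-squares problem
X1 = np.hstack([X, np.ones((len(X), 1))])
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("weights for pknow, time, totalactions, intercept:", weights)

# Predicted number of hints for the first row
print("prediction:", X1[0] @ weights)
```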

Page 16:

Many more complex algorithms…

Neural Networks
Support Vector Machines

Surprisingly, Linear Regression performs quite well in many cases despite being overly simple

Particularly when you have a lot of data

Which increasingly is not a problem in EDM…

Page 17:

Relationship Mining

Richard Scheines will now talk about one type of relationship mining, Causal Data Mining

Page 18:

Bayesian Knowledge-Tracing

The algorithm behind the skill bars …

Being improved by Educational Data Mining

Key in many EDM analyses and models

Page 19:

Goal: For each knowledge component (KC), infer the student’s knowledge state from performance.

Suppose a student has six opportunities to apply a KC and makes the following sequence of correct (1) and incorrect (0) responses. Has the student learned the rule?

Bayesian Knowledge Tracing

0 0 1 0 1 1

Page 20:

Model Learning Assumptions

Two-state learning model: each skill is either learned or unlearned

In problem-solving, the student can learn a skill at each opportunity to apply the skill

A student does not forget a skill, once he or she knows it

Only one skill per action

Page 21:

Model Performance Assumptions

If the student knows a skill, there is still some chance the student will slip and make a mistake.

If the student does not know a skill, there is still some chance the student will guess correctly.

Page 22:

Corbett and Anderson’s Model


Two Learning Parameters

p(L0) Probability the skill is already known before the first opportunity to use the skill in problem solving.

p(T) Probability the skill will be learned at each opportunity to use the skill.

Two Performance Parameters

p(G) Probability the student will guess correctly if the skill is not known.

p(S) Probability the student will slip (make a mistake) if the skill is known.

[Diagram: two-state model. The student starts in the Learned state with probability p(L0), moves from Not learned to Learned with probability p(T) at each opportunity, and answers correctly with probability p(G) from the Not learned state and 1-p(S) from the Learned state.]
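To make the performance parameters concrete with purely hypothetical values: if p(L0) = 0.3, p(G) = 0.2, and p(S) = 0.1, the probability of a correct response at the first opportunity is p(L0)*(1 - p(S)) + (1 - p(L0))*p(G) = 0.3*0.9 + 0.7*0.2 = 0.41.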

Page 23:

Bayesian Knowledge Tracing

Whenever the student has an opportunity to use a skill, the probability that the student knows the skill is updated using formulas derived from Bayes’ Theorem.

Page 24:

Formulas
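For reference, the standard Bayesian Knowledge Tracing equations (Corbett & Anderson, 1995), written in terms of the parameters defined above:

```latex
\begin{align*}
P(\text{correct}_n) &= P(L_n)\,\bigl(1 - P(S)\bigr) + \bigl(1 - P(L_n)\bigr)\,P(G) \\
P(L_n \mid \text{correct}_n) &=
  \frac{P(L_n)\,\bigl(1 - P(S)\bigr)}{P(L_n)\,\bigl(1 - P(S)\bigr) + \bigl(1 - P(L_n)\bigr)\,P(G)} \\
P(L_n \mid \text{incorrect}_n) &=
  \frac{P(L_n)\,P(S)}{P(L_n)\,P(S) + \bigl(1 - P(L_n)\bigr)\,\bigl(1 - P(G)\bigr)} \\
P(L_{n+1}) &= P(L_n \mid \text{obs}_n) + \bigl(1 - P(L_n \mid \text{obs}_n)\bigr)\,P(T)
\end{align*}
```

The first line gives the predicted probability of a correct response; the middle two apply Bayes’ Theorem to the observed response; the last accounts for the chance that the skill was learned at that opportunity.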

Page 25:

Knowledge Tracing

How do we know if a knowledge tracing model is any good?

Our primary goal is to predict knowledge

Page 26:

Knowledge Tracing

How do we know if a knowledge tracing model is any good?

Our primary goal is to predict knowledge

But knowledge is a latent trait

Page 27:

Knowledge Tracing

How do we know if a knowledge tracing model is any good?

Our primary goal is to predict knowledge

But knowledge is a latent trait

But we can check those knowledge predictions by checking how well the model predicts performance

Page 28:

Fitting a Knowledge-Tracing Model

In principle, any set of four parameter values can be used in knowledge tracing

But parameters that predict student performance better are preferred

Page 29:

Knowledge Tracing

So, we pick the knowledge tracing parameters that best predict performance

Defined as whether a student’s action will be correct or wrong at a given time

Effectively a classifier
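A minimal sketch of this fitting process, using a brute-force grid search against squared prediction error (the original work uses curve-fitting across many students per skill; the single short sequence from the earlier slide is used here purely for illustration):

```python
# A minimal sketch: choose the four BKT parameters that best predict
# whether each observed response is correct (here, by squared error).
import itertools

def predict_sequence(responses, L0, T, G, S):
    """Return per-opportunity P(correct), updating P(L) after each response."""
    L = L0
    preds = []
    for correct in responses:
        preds.append(L * (1 - S) + (1 - L) * G)      # predicted P(correct) now
        if correct:                                   # Bayes update on the evidence
            L_obs = L * (1 - S) / (L * (1 - S) + (1 - L) * G)
        else:
            L_obs = L * S / (L * S + (1 - L) * (1 - G))
        L = L_obs + (1 - L_obs) * T                   # chance of learning at this step
    return preds

def fit_bkt(responses):
    """Grid-search (L0, T, G, S) over a coarse grid, minimizing squared error."""
    grid = [i / 20 for i in range(1, 20)]             # 0.05, 0.10, ..., 0.95
    best, best_err = None, float("inf")
    for params in itertools.product(grid, repeat=4):
        preds = predict_sequence(responses, *params)
        err = sum((p - c) ** 2 for p, c in zip(preds, responses))
        if err < best_err:
            best, best_err = params, err
    return best

responses = [0, 0, 1, 0, 1, 1]    # the example sequence from the earlier slide
print("best (L0, T, G, S):", fit_bkt(responses))
```

In practice, parameters are fit per knowledge component across many students, and the guess and slip parameters are commonly capped to keep the model plausible.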

Page 30:

Recent Advances

Recently, there has been work towards contextualizing the guess and slip parameters (Baker, Corbett, & Aleven, 2008a, 2008b)

The intuition: Do we really think the chance that an incorrect response was a slip is equal when

Student has never gotten action right; spends 78 seconds thinking; answers; gets it wrong

Student has gotten action right 3 times in a row; spends 1.2 seconds thinking; answers; gets it wrong

Page 31:

Recent Advances

In this work, P(G) and P(S) are determined by a model that looks at time, previous history, the type of action, etc.

Significantly improves predictive power of method

Probability of distinguishing correct from incorrect increases by about 15% of potential gain

To 71%, so still room for improvement
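A sketch of how the update changes under this idea (the helper names below are hypothetical stand-ins, not the published models): instead of fixed constants, each action gets its own guess and slip estimates from a model over action-level features such as time taken and previous history.

```python
# Sketch of contextual guess/slip (hypothetical helpers, not the published model):
# each action gets its own P(G) and P(S), predicted from features of that action.

def update_L(L, correct, G, S, T):
    """One Bayesian Knowledge Tracing update, with guess/slip supplied per action."""
    if correct:
        L_obs = L * (1 - S) / (L * (1 - S) + (1 - L) * G)
    else:
        L_obs = L * S / (L * S + (1 - L) * (1 - G))
    return L_obs + (1 - L_obs) * T

def trace_contextual(actions, L0, T, guess_model, slip_model):
    """guess_model and slip_model stand in for models trained on action features
    (time taken, previous history, type of action, ...)."""
    L = L0
    for action in actions:
        G = guess_model(action)      # contextual P(G) for this specific action
        S = slip_model(action)       # contextual P(S) for this specific action
        L = update_L(L, action["correct"], G, S, T)
    return L

# Toy usage with constant stand-ins for the learned feature models:
actions = [{"correct": 0, "secs": 78}, {"correct": 1, "secs": 5}]
print(trace_contextual(actions, L0=0.3, T=0.1,
                       guess_model=lambda a: 0.2, slip_model=lambda a: 0.1))
```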

Page 32:

Uses

Outside of EDM, can be used to drive tutorial decisions

Within educational data mining, there are several things you can do with these models

Page 33:

Uses of Knowledge Tracing

Often key components in models of other constructs

Help-Seeking and Metacognition (Aleven et al., 2004, 2008)
Gaming the System (Baker et al., 2004, in press)
Off-Task Behavior (Baker, 2007)

Page 34:

Uses of Knowledge Tracing

If you want to understand a student’s strategic/meta-cognitive choices, it is helpful to know whether the student knew the skill

Gaming the system means something different if a student already knows the step, versus if the student doesn’t know it

A student who doesn’t know a skill should ask for help; a student who does, shouldn’t

Page 35:

Uses of Knowledge Tracing

Can be interpreted to learn about skills

Page 36:

Skills from the Algebra Tutor

skill L0 T

AddSubtractTypeinSkillIsolatepositiveIso 0.01 0.01

ApplyExponentExpandExponentsevalradicalE 0.333 0.497

CalculateEliminateParensTypeinSkillElimi 0.979 0.001

CalculatenegativecoefficientTypeinSkillM 0.953 0.001

Changingaxisbounds 0.01 0.01

Changingaxisintervals 0.01 0.01

ChooseGraphicala 0.001 0.306

combineliketermssp 0.943 0.001

Page 37:

Which skills could probably be removed from the tutor?

skill L0 T

AddSubtractTypeinSkillIsolatepositiveIso 0.01 0.01

ApplyExponentExpandExponentsevalradicalE 0.333 0.497

CalculateEliminateParensTypeinSkillElimi 0.979 0.001

CalculatenegativecoefficientTypeinSkillM 0.953 0.001

Changingaxisbounds 0.01 0.01

Changingaxisintervals 0.01 0.01

ChooseGraphicala 0.001 0.306

combineliketermssp 0.943 0.001

Page 38:

Which skills could use better instruction?

skill L0 T

AddSubtractTypeinSkillIsolatepositiveIso 0.01 0.01

ApplyExponentExpandExponentsevalradicalE 0.333 0.497

CalculateEliminateParensTypeinSkillElimi 0.979 0.001

CalculatenegativecoefficientTypeinSkillM 0.953 0.001

Changingaxisbounds 0.01 0.01

Changingaxisintervals 0.01 0.01

ChooseGraphicala 0.001 0.306

combineliketermssp 0.943 0.001

Page 39:

END

This last example is a simple example of Discovery with Models

Tomorrow at 9:30am, we’ll discuss some more complex examples