Educational Data Mining
Ryan S.J.d. Baker, PSLC/HCII
Carnegie Mellon University
Richard Scheines
Professor of Statistics, Machine Learning, and Human-Computer Interaction
Carnegie Mellon University
Ken Koedinger CMU Director of PSLC
Professor of Human-Computer Interaction & Psychology
Carnegie Mellon University
In this segment…
We will give a brief overview of classes of Educational Data Mining methods
Discussing in detail
Causal Data Mining: an important Educational Data Mining method
Bayesian Knowledge Tracing: one of the key building blocks of many Educational Data Mining analyses
EDM Methods (Baker, under review)
Prediction
Clustering
Relationship Mining
Discovery with Models
Distillation of Data for Human Judgment
Coverage at EDM2008 (of 31 papers; not mutually exclusive)
Prediction – 45%
Clustering – 6%
Relationship Mining – 19%
Discovery with Models – 13%
Distillation of Data for Human Judgment – 16%
None of the Above – 6%
We will talk about three approaches now
2 types of Prediction
1 type of Relationship Mining
Tomorrow, 9:30am: Discovery with Models
Yesterday: Some examples of Distillation of Data for Human Judgment
Prediction
Pretty much what it says
A student is using a tutor right now. Is he gaming the system or not? (“attempting to succeed in an interactive learning environment by exploiting properties of the system rather than by learning the material”)
A student has used the tutor for the last half hour. How likely is it that she knows the knowledge component in the next step?
A student has completed three years of high school. What will be her score on the SAT-Math exam?
Two Key Types of Prediction
This slide adapted from slide by Andrew W. Moore, Google http://www.cs.cmu.edu/~awm/tutorials
Classification
There is something you want to predict (“the label”)
The thing you want to predict is categorical
The answer is one of a set of categories, not a number
CORRECT/WRONG (sometimes expressed as 0,1)
HELP REQUEST/WORKED EXAMPLE REQUEST/ATTEMPT TO SOLVE
WILL DROP OUT/WON’T DROP OUT
WILL SELECT PROBLEM A,B,C,D,E,F, or G
Classification
Associated with each label are a set of “features”, which maybe you can use to predict the label
KnowledgeComp   pknow   time   totalactions   right
ENTERINGGIVEN   0.704   9      1              WRONG
ENTERINGGIVEN   0.502   10     2              RIGHT
USEDIFFNUM      0.049   6      1              WRONG
ENTERINGGIVEN   0.967   7      3              RIGHT
REMOVECOEFF     0.792   16     1              WRONG
REMOVECOEFF     0.792   13     2              RIGHT
USEDIFFNUM      0.073   5      2              RIGHT
…
Classification
The basic idea of a classifier is to determine which features, in which combination, can predict the label
Many algorithms you can use
Decision Trees (e.g. C4.5, J48, etc.)
Logistic Regression
Etc, etc
In your favorite Machine Learning package
WEKA
RapidMiner
KEEL
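To make the idea of "which features, in which combination, can predict the label" concrete, here is a minimal sketch of the simplest possible tree: a one-split decision stump fit to the seven example rows shown earlier. This is not C4.5 or J48, and the function name `fit_stump` is a hypothetical helper written for this illustration only.

```python
# Rows from the example table: (KnowledgeComp, pknow, time, totalactions, label)
ROWS = [
    ("ENTERINGGIVEN", 0.704, 9, 1, "WRONG"),
    ("ENTERINGGIVEN", 0.502, 10, 2, "RIGHT"),
    ("USEDIFFNUM", 0.049, 6, 1, "WRONG"),
    ("ENTERINGGIVEN", 0.967, 7, 3, "RIGHT"),
    ("REMOVECOEFF", 0.792, 16, 1, "WRONG"),
    ("REMOVECOEFF", 0.792, 13, 2, "RIGHT"),
    ("USEDIFFNUM", 0.073, 5, 2, "RIGHT"),
]

# Numeric feature name -> column index in each row
FEATURES = {"pknow": 1, "time": 2, "totalactions": 3}


def fit_stump(rows):
    """Return (accuracy, feature, threshold, label_if_at_or_above) for the best single split."""
    best = None
    for name, col in FEATURES.items():
        # Try every observed value of the feature as a candidate threshold.
        for threshold in sorted({row[col] for row in rows}):
            for above in ("RIGHT", "WRONG"):
                below = "WRONG" if above == "RIGHT" else "RIGHT"
                hits = sum(
                    (above if row[col] >= threshold else below) == row[4]
                    for row in rows
                )
                candidate = (hits / len(rows), name, threshold, above)
                # Keep the first rule that reaches the highest training accuracy.
                if best is None or candidate[0] > best[0]:
                    best = candidate
    return best


best_rule = fit_stump(ROWS)
print(best_rule)
```

On these seven rows the stump picks "totalactions >= 2 means RIGHT", which fits all seven examples. That is training accuracy on a toy sample only; a real analysis would check the rule on held-out students.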
Regression
There is something you want to predict (“the label”)
The thing you want to predict is numerical
Number of hints student requests (0, 1, 2, 3...)
How long student takes to answer (4.7 s., 8.9 s., 88.2 s., 0.3 s.)
What will the student’s test score be (95%, 84%, 33%, 100%)
Regression
Associated with each label are a set of “features”, which maybe you can use to predict the label
KnowledgeComp   pknow   time   totalactions   numhints
ENTERINGGIVEN   0.704   9      1              0
ENTERINGGIVEN   0.502   10     2              0
USEDIFFNUM      0.049   6      1              3
ENTERINGGIVEN   0.967   7      3              0
REMOVECOEFF     0.792   16     1              1
REMOVECOEFF     0.792   13     2              0
USEDIFFNUM      0.073   5      2              0
…
Regression
The basic idea of regression is to determine which features, in which combination, can predict the label’s value
Linear Regression
The most classic form of regression is linear regression
Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions
Many more complex algorithms…
Neural Networks
Support Vector Machines
Surprisingly, Linear Regression performs quite well in many cases despite being overly simple
Particularly when you have a lot of data
Which increasingly is not a problem in EDM…
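The example equation above is just a weighted sum of the features, which can be evaluated directly. The sketch below applies it to the first row of the regression example table; the function name `predict_numhints` is a hypothetical helper, and the coefficients are taken verbatim from the slide.

```python
def predict_numhints(pknow, time, totalactions):
    """Evaluate the slide's example linear equation for one row of features."""
    return 0.12 * pknow + 0.932 * time - 0.11 * totalactions


# First row of the example table: pknow=0.704, time=9, totalactions=1
prediction = predict_numhints(0.704, 9, 1)
print(round(prediction, 5))
```

For this row the equation gives roughly 8.36 hints, while the table lists 0 hints. The coefficients on the slide therefore look illustrative rather than fitted to the example rows; a fitted model would choose the coefficients that minimize the prediction error over the data.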
Relationship Mining
Richard Scheines will now talk about one type of relationship mining, Causal Data Mining
Bayesian Knowledge-Tracing
The algorithm behind the skill bars …
Being improved by Educational Data Mining
Key in many EDM analyses and models
Goal: For each knowledge component (KC), infer the student’s knowledge state from performance.
Suppose a student has six opportunities to apply a KC and makes the following sequence of correct (1) and incorrect (0) responses. Has the student learned the rule?
Bayesian Knowledge Tracing
0 0 1 0 1 1
Model Learning Assumptions
Two-state learning model
Each skill is either learned or unlearned
In problem-solving, the student can learn a skill at each opportunity to apply the skill
A student does not forget a skill, once he or she knows it
Only one skill per action
Model Performance Assumptions
If the student knows a skill, there is still some chance the student will slip and make a mistake.
If the student does not know a skill, there is still some chance the student will guess correctly.
Corbett and Anderson’s Model
Two Learning Parameters
p(L0) Probability the skill is already known before the first opportunity to use the skill in problem solving.
p(T) Probability the skill will be learned at each opportunity to use the skill.
Two Performance Parameters
p(G) Probability the student will guess correctly if the skill is not known.
p(S) Probability the student will slip (make a mistake) if the skill is known.
[Diagram: two states, Not learned and Learned. The student starts in Learned with probability p(L0) and moves from Not learned to Learned with probability p(T). A response is correct with probability p(G) in Not learned and 1-p(S) in Learned.]
Bayesian Knowledge Tracing
Whenever the student has an opportunity to use a skill, the probability that the student knows the skill is updated using formulas derived from Bayes’ Theorem.
Formulas
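The formulas themselves did not survive in this transcript. The sketch below uses the standard Bayesian Knowledge Tracing update as it is usually written: condition the probability of knowing the skill on the observed response, then apply the learning transition p(T). The function names `bkt_update` and `trace`, and the parameter values in the example call, are assumptions made for illustration; they are not taken from the slides.

```python
def bkt_update(p_know, correct, p_t, p_g, p_s):
    """Update P(skill known) after one observed response.

    Assumes all probabilities are strictly between 0 and 1.
    """
    if correct:
        # Bayes' Theorem: known and did not slip, versus unknown and guessed
        evidence = p_know * (1 - p_s)
        posterior = evidence / (evidence + (1 - p_know) * p_g)
    else:
        # Bayes' Theorem: known and slipped, versus unknown and did not guess
        evidence = p_know * p_s
        posterior = evidence / (evidence + (1 - p_know) * (1 - p_g))
    # Chance to learn the skill at this opportunity (no forgetting)
    return posterior + (1 - posterior) * p_t


def trace(responses, p_l0, p_t, p_g, p_s):
    """Return P(skill known) after each response in the sequence."""
    p_know = p_l0
    history = []
    for correct in responses:
        p_know = bkt_update(p_know, correct, p_t, p_g, p_s)
        history.append(p_know)
    return history


# The six-response sequence from the earlier slide; parameter values are illustrative only.
history = trace([0, 0, 1, 0, 1, 1], p_l0=0.3, p_t=0.2, p_g=0.2, p_s=0.1)
print([round(p, 3) for p in history])
```

Because the model assumes no forgetting and one skill per action, a single number per skill is enough to carry the student's state from one opportunity to the next.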
Knowledge Tracing
How do we know if a knowledge tracing model is any good?
Our primary goal is to predict knowledge
But knowledge is a latent trait
But we can check those knowledge predictions by checking how well the model predicts performance
Fitting a Knowledge-Tracing Model
In principle, any set of four parameters can be used by knowledge-tracing
But parameters that predict student performance better are preferred
Knowledge Tracing
So, we pick the knowledge tracing parameters that best predict performance
Defined as whether a student’s action will be correct or wrong at a given time
Effectively a classifier
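One simple way to "pick the parameters that best predict performance" is an exhaustive search over a coarse grid, scoring each parameter set by the squared error between the predicted probability of a correct response and the observed 0/1 outcome. The sketch below is an assumption about how such a fit could look, not the procedure used by the authors; the names `predict_sequence`, `squared_error`, and `fit_grid`, the grid values, and the error measure are all hypothetical choices.

```python
from itertools import product


def predict_sequence(responses, p_l0, p_t, p_g, p_s):
    """Return the predicted P(correct) made before each response is observed."""
    p_know = p_l0
    predictions = []
    for correct in responses:
        # Correct either by knowing and not slipping, or by not knowing and guessing
        predictions.append(p_know * (1 - p_s) + (1 - p_know) * p_g)
        if correct:
            num = p_know * (1 - p_s)
            den = num + (1 - p_know) * p_g
        else:
            num = p_know * p_s
            den = num + (1 - p_know) * (1 - p_g)
        posterior = num / den
        p_know = posterior + (1 - posterior) * p_t
    return predictions


def squared_error(sequences, params):
    """Sum of squared differences between predicted P(correct) and observed 0/1."""
    return sum(
        (pred - actual) ** 2
        for seq in sequences
        for pred, actual in zip(predict_sequence(seq, *params), seq)
    )


def fit_grid(sequences, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return the (p_l0, p_t, p_g, p_s) tuple on the grid with the lowest squared error."""
    return min(
        product(grid, repeat=4),
        key=lambda params: squared_error(sequences, params),
    )


# A single six-response sequence is far too little data; it only shows the mechanics.
sequences = [[0, 0, 1, 0, 1, 1]]
best_params = fit_grid(sequences)
print(best_params, round(squared_error(sequences, best_params), 3))
```

In practice the fit would use many students' response sequences for each skill and a finer search, and the resulting predictions could then be scored like those of any other classifier.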
Recent Advances
Recently, there has been work towards contextualizing the guess and slip parameters (Baker, Corbett, & Aleven, 2008a, 2008b)
The intuition: Do we really think the chance that an incorrect response was a slip is equal when
Student has never gotten action right; spends 78 seconds thinking; answers; gets it wrong
Student has gotten action right 3 times in a row; spends 1.2 seconds thinking; answers; gets it wrong
Recent Advances
In this work, P(G) and P(S) are determined by a model that looks at time, previous history, the type of action, etc.
Significantly improves predictive power of method
Probability of distinguishing correct from incorrect increases by about 15% of potential gain
To 71%, so still room for improvement
Uses
Outside of EDM, can be used to drive tutorial decisions
Within educational data mining, there are several things you can do with these models
Uses of Knowledge Tracing
Often key components in models of other constructs
Help-Seeking and Metacognition (Aleven et al., 2004, 2008)
Gaming the System (Baker et al., 2004, in press)
Off-Task Behavior (Baker, 2007)
Uses of Knowledge Tracing
If you want to understand a student’s strategic/meta-cognitive choices, it is helpful to know whether the student knew the skill
Gaming the system means something different if a student already knows the step, versus if the student doesn’t know it
A student who doesn’t know a skill should ask for help; a student who does, shouldn’t
Uses of Knowledge Tracing
Can be interpreted to learn about skills
Skills from the Algebra Tutor
skill                                      L0      T
AddSubtractTypeinSkillIsolatepositiveIso   0.01    0.01
ApplyExponentExpandExponentsevalradicalE   0.333   0.497
CalculateEliminateParensTypeinSkillElimi   0.979   0.001
CalculatenegativecoefficientTypeinSkillM   0.953   0.001
Changingaxisbounds                         0.01    0.01
Changingaxisintervals                      0.01    0.01
ChooseGraphicala                           0.001   0.306
combineliketermssp                         0.943   0.001
Which skills could probably be removed from the tutor?
Which skills could use better instruction?
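The slides leave both questions open. One plausible reading, sketched below, is that a skill with very high L0 is already known before the tutor teaches it (a candidate for removal), while a skill with low L0 and low T is neither known at the start nor learned during practice (a candidate for better instruction). The thresholds and the function names `already_known` and `not_being_learned` are assumptions chosen for illustration.

```python
# Skill name (as truncated in the slide) -> (L0, T)
SKILLS = {
    "AddSubtractTypeinSkillIsolatepositiveIso": (0.01, 0.01),
    "ApplyExponentExpandExponentsevalradicalE": (0.333, 0.497),
    "CalculateEliminateParensTypeinSkillElimi": (0.979, 0.001),
    "CalculatenegativecoefficientTypeinSkillM": (0.953, 0.001),
    "Changingaxisbounds": (0.01, 0.01),
    "Changingaxisintervals": (0.01, 0.01),
    "ChooseGraphicala": (0.001, 0.306),
    "combineliketermssp": (0.943, 0.001),
}


def already_known(skills, l0_min=0.9):
    """Skills most students know before the first opportunity (assumed threshold)."""
    return [name for name, (l0, t) in skills.items() if l0 >= l0_min]


def not_being_learned(skills, l0_max=0.1, t_max=0.05):
    """Skills that start unknown and are rarely learned per opportunity (assumed thresholds)."""
    return [
        name for name, (l0, t) in skills.items() if l0 <= l0_max and t <= t_max
    ]


print("Could probably be removed:", already_known(SKILLS))
print("Could use better instruction:", not_being_learned(SKILLS))
```

Under these thresholds the three skills with L0 above 0.94 are flagged as removable, and the three skills with L0 = T = 0.01 are flagged as needing better instruction. ChooseGraphicala starts unknown but has T = 0.306, so it is not flagged.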
END
This last example is a simple example of Discovery with Models
Tomorrow at 9:30am, we’ll discuss some more complex examples