Machine learning models for some learning analytics issues in massive open online courses
Fei MI
Dept. of Computer Science and Engineering, Hong Kong University of Science and Technology
Thesis supervised by Dit-Yan Yeung
27/05/2015
Fei MI MOOC Learning Analytics CSE, HKUST
Outline
1 Background and Motivation
2 Peer Grading Problem Formulation and Related Work
3 Cardinal Peer Grading Model Extensions
4 Combine Cardinal & Ordinal Peer Grading
5 Dropout Prediction Related Work and Problem Formulation
6 Temporal Models
7 Experiments for Temporal Models
8 Conclusion
MOOC Platform
MOOC Platform in China
Learning Analytics Issues
Current MOOC environment
1 Popularity and rapid development of MOOC platforms
2 Massive, open, online nature (introduces a new era of education)
3 Access anywhere, any time (extends the boundary of education)
Peer Grading
1 Addresses the student assessment issue in MOOCs
2 Subjective, open-ended assignments
3 Students benefit from the grading process
Dropout Prediction
1 High dropout rate
2 Helps instructors intervene and draw students back to class
3 Helps understand student engagement patterns
Problem Formulation
Cardinal vs. Ordinal
Peer Grading Data
1 “Science, Technology, and Society in China I” on Coursera
2 Three assignments in total
3 Three pieces of work assigned to each grader, graded with cardinal rubrics
4 Default score aggregation takes the median of the peer grades
                      Assignment 1   Assignment 2   Assignment 3
# finished students   1202           845            724
# peer grades         3201           2261           2084
# staff grades        23             19             23
Full score            21             25             25
Mean score            14.8 (70%)     17.2 (69%)     16.5 (58%)
Summary statistics of assignments for peer grading
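The default aggregation rule (median of the peer grades a submission receives) can be sketched as follows. This is only an illustration: the submission IDs and grades are made-up values, not course data, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median

def aggregate_by_median(peer_grades):
    """Map each submission to the median of the peer grades it received."""
    grades_per_submission = defaultdict(list)
    for submission_id, grade in peer_grades:
        grades_per_submission[submission_id].append(grade)
    return {s: median(g) for s, g in grades_per_submission.items()}

# Hypothetical (submission, grade) pairs; each submission has three peer grades.
peer_grades = [("s1", 14), ("s1", 18), ("s1", 15),
               ("s2", 20), ("s2", 9), ("s2", 12)]
scores = aggregate_by_median(peer_grades)
```

The median ignores a single outlying grade (such as the 20 given to "s2"), which is presumably why it serves as the default baseline.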
Cardinal Peer Grading Model [Piech et al. 2013]
PG1
PG3
Model Extensions
Still relate grader reliability to the grader's own score
Model the relationship in a probabilistic form rather than a linear/deterministic form
PG4 & PG5
Results for Cardinal Models
[Figure: three panels (Assignment 1, Assignment 2, Assignment 3) plotting predicted score against ground truth submissions for the instructor grade, PG3, PG4 and PG5]
Predicted scores on ground truth set.
Average case and worst case analysis:
Average Case: RMSE
         Assignment 1        Assignment 2        Assignment 3
         Mean         Std    Mean         Std    Mean         Std
Median   4.94                5.54                4.12
PG1      3.77 (23%)   0.02   4.93 (11%)   0.03   3.66 (11%)   0.01
PG3      3.22 (35%)   0.02   5.24 (5%)    0.04   3.15 (23%)   0.02
PG4      3.35 (32%)   0.05   4.75 (14%)   0.06   2.83 (31%)   0.09
PG5      3.31 (33%)   0.05   4.69 (15%)   0.05   2.76 (33%)   0.09
Worst Case: Maximum prediction deviation (fairness issue)
      Assignment 1   Assignment 2   Assignment 3
PG3   6.52           11.10          6.77
PG4   5.84           9.86           6.70
PG5   5.81           9.85           5.79
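The two evaluation measures above can be sketched as below. The percentages in the RMSE table appear to be the relative RMSE reduction over the Median baseline; that reading is an inference from the numbers, not something the slides state. The staff and predicted scores are made-up values.

```python
import math

def rmse(predicted, truth):
    """Average case: root mean squared error against staff grades."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))

def max_deviation(predicted, truth):
    """Worst case: largest absolute error on any single submission."""
    return max(abs(p - t) for p, t in zip(predicted, truth))

def relative_improvement(model_rmse, baseline_rmse):
    """Fraction by which a model reduces the baseline RMSE (assumed reading)."""
    return (baseline_rmse - model_rmse) / baseline_rmse

# Reported Assignment 1 values: Median 4.94, PG3 3.22.
gain = relative_improvement(3.22, 4.94)

# Hypothetical staff grades and model predictions for three submissions.
staff = [15.0, 18.0, 10.0]
predicted = [14.0, 20.0, 10.0]
average_case = rmse(predicted, staff)
worst_case = max_deviation(predicted, staff)
```

Under this reading `gain` comes out close to the 35% reported for PG3 on Assignment 1.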
Ordinal Peer Grading
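Problem Formulation:
1 Rank aggregation problem (Dwork et al. 2001)
2 Preference learning problem (Chu and Ghahramani 2005; Fürnkranz and Hüllermeier 2010)
Popular Model:
1 Bradley-Terry model (Bradley and Terry 1952)
2 Recently applied to peer grading (Shah et al. 2013; Raman and Joachims 2014)
$$\text{hypothesis} = P(u_i \succ_{\rho(v)} u_j) = \frac{1}{1 + \exp(-(s_{u_i} - s_{u_j}))}$$
$$L = \frac{\lambda}{2\sigma^2} \sum_{u \in U} (s_u - \mu)^2 - \sum_{v \in V} \sum_{u_i \succ_{\rho(v)} u_j} \log(\text{hypothesis})$$
Combine cardinal and ordinal models: the shared prior mean $\mu$ becomes an individual mean $\mu_u$ for each $u$
$$L = \frac{\lambda}{2\sigma^2} \sum_{u \in U} (s_u - \mu_u)^2 - \sum_{v \in V} \sum_{u_i \succ_{\rho(v)} u_j} \log(\text{hypothesis})$$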
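A minimal sketch of the Bradley-Terry hypothesis and the regularized loss L with a shared prior mean. The function names, the dictionary layout and the pooling of all graders' preferences into one list are assumptions made for illustration.

```python
import math

def bt_probability(s_i, s_j):
    """P(u_i is preferred over u_j) for latent scores s_i and s_j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def bt_loss(scores, preferences, mu, lam=1.0, sigma=1.0):
    """Gaussian prior penalty minus the log-likelihood of the preferences.

    scores: dict mapping each submission to its latent score s_u
    preferences: list of (winner, loser) pairs pooled over all graders
    """
    prior = lam / (2 * sigma ** 2) * sum((s - mu) ** 2 for s in scores.values())
    log_lik = sum(math.log(bt_probability(scores[i], scores[j]))
                  for i, j in preferences)
    return prior - log_lik

# Two submissions with equal scores: either order is equally likely.
even = bt_probability(2.0, 2.0)
loss = bt_loss({"a": 3.0, "b": 3.0}, [("a", "b")], mu=3.0)
```

With equal scores the preference probability is 0.5, so a single observed preference contributes log 2 to the loss and the prior term vanishes.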
Combining Cardinal and Ordinal Evaluations
$$L = \frac{\lambda}{2\sigma^2} \sum_{u \in U} (s_u - \mu_u)^2 - \sum_{v \in V} \sum_{u_i \succ_{\rho(v)} u_j} \log(\text{hypothesis})$$
1 Augment ordinal models with the cardinal prediction as a prior
2 Tune the predictions of the cardinal model with the ordinal peer preferences
3 Principled approach to combining both cardinal and ordinal peer evaluations
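One way to minimize the combined loss is plain gradient descent started from the cardinal predictions. This is a sketch under that assumption; the slides do not say which optimizer the thesis uses, and the step size, iteration count and example values are hypothetical.

```python
import math

def fit_combined(mu, preferences, lam=1.0, sigma=1.0, lr=0.1, steps=500):
    """Minimize L by gradient descent.

    mu: cardinal predictions mu_u, used as per-submission prior means
    preferences: list of (winner_index, loser_index) pairs from graders
    """
    s = list(mu)  # start from the cardinal predictions
    for _ in range(steps):
        # Gradient of the prior term: (lambda / sigma^2) * (s_u - mu_u)
        grad = [lam / sigma ** 2 * (s_u - mu_u) for s_u, mu_u in zip(s, mu)]
        for i, j in preferences:
            p = 1.0 / (1.0 + math.exp(-(s[i] - s[j])))
            grad[i] -= 1.0 - p  # raise the winner's score
            grad[j] += 1.0 - p  # lower the loser's score
        s = [s_u - lr * g for s_u, g in zip(s, grad)]
    return s

mu = [10.0, 10.5, 12.0]                 # hypothetical cardinal predictions
preferences = [(0, 1), (0, 1), (2, 1)]  # hypothetical ordinal judgments
scores = fit_combined(mu, preferences)
```

In this toy example the cardinal model ranks submission 1 above submission 0, but two graders prefer 0, so the fitted scores reverse that order while staying anchored to the cardinal predictions.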
Results for Cardinal + Ordinal Models
Ordinal evaluation: Percentage of correctly evaluated pairs (reported as fractions)
                      Assignment 1   Assignment 2   Assignment 3
Cardinal Models
PG3                   0.7526         0.6155         0.7775
PG4                   0.6928         0.6552         0.7854
PG5                   0.6979         0.6616         0.7889
“Cardinal + Ordinal” Models
PG3+BT                0.7577         0.6110         0.7892
PG4+BT                0.7221         0.6484         0.7931
PG5+BT                0.7191         0.6646         0.8000
PG3+BT+G              0.7645         0.6587         0.7879
PG4+BT+G              0.7145         0.7032         0.7896
PG5+BT+G              0.7170         0.7065         0.8013
PG3+RBTL              0.7660         0.6494         0.7979
PG4+RBTL              0.7064         0.6745         0.7835
PG5+RBTL              0.7201         0.6845         0.8009
Pure Ordinal Models
BT (or BTL)           0.6536         0.6329         0.6896
RBTL                  0.6583         0.6432         0.6996
BT+G                  0.6547         0.6535         0.7009
BT Same Initial       0.6387         0.6194         0.6407
BT Random Initial     0.6381         0.6416         0.6667
Baseline Method
Median                0.6043         0.6610         0.6753
Cardinal evaluation: RMSE
           Assignment 1   Assignment 2   Assignment 3
PG3        3.22           5.24           3.15
PG3+BT     3.04           5.30           3.18
PG3+BT+G   3.01           4.95           3.10
PG3+RBTL   3.00           5.04           3.15
PG4        3.35           4.75           2.83
PG4+BT     3.47           4.87           3.03
PG4+BT+G   3.31           4.52           2.91
PG4+RBTL   3.44           4.70           2.77
PG5        3.31           4.69           2.76
PG5+BT     3.30           4.77           2.93
PG5+BT+G   3.35           4.50           2.74
PG5+RBTL   3.24           4.62           2.70
1 Cardinal models perform better than pure ordinal models
2 Combined model further boosts performance
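The ordinal measure (share of correctly ordered pairs) can be sketched as below. Skipping pairs that are tied in the ground truth is an assumption; the slides do not say how ties are handled. The scores are made-up values.

```python
from itertools import combinations

def pairwise_accuracy(predicted, truth):
    """Fraction of submission pairs whose predicted order matches the true order."""
    correct = total = 0
    for i, j in combinations(range(len(truth)), 2):
        if truth[i] == truth[j]:
            continue  # assumption: ties in the ground truth are not counted
        total += 1
        if (predicted[i] - predicted[j]) * (truth[i] - truth[j]) > 0:
            correct += 1
    return correct / total if total else float("nan")

truth = [20, 15, 10, 18]      # hypothetical staff grades
predicted = [19, 16, 12, 15]  # hypothetical model predictions
accuracy = pairwise_accuracy(predicted, truth)
```

Here five of the six pairs are ordered correctly; only the pair (second, fourth submission) is inverted.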
Background
Motivations:
1 High dropout rate
2 Help instructors intervene and draw students back to class
3 Understand student engagement patterns
Challenges:
1 Diverse engagement patterns (data noise)
2 Low-intensity participation (data sparsity)
3 High dropout rate (data imbalance)
Related Work
Attention from:
1 Individual research groups
2 Conference workshop (EMNLP 2014)
3 KDD Cup 2015
Machine learning models
1 SVM, Decision Tree (EMNLP 2014 Workshop)
2 Logistic Regression (AAAI 2015)
3 Probabilistic Soft Logic (AAAI 2014)
4 Survival Model (NIPS 2013)
5 HMM (Technical report 2013)
6 NLP (ISWSM 2014)
Dropout Prediction Problem Formulation
Sequence labeling task:
1 A MOOC spans a period of time, usually no more than 10 weeks
             Week 1   Week 2   Week 3   Week 4   Week t
Activities   x_1      x_2      x_3      x_4      x_t
Labels       y_1      y_2      y_3      y_4      y_t
2 Input activity feature sequence: (x_1, ..., x_t)
3 Dropout label sequence: (y_1, ..., y_t)
4 Inputs are dependent (temporal relationship)
5 Build incremental models and make predictions
Datasets for Dropout Prediction (Coursera)
1 “Science of Gastronomy”, six-week course.
2 85394 → 39877
Feature                  Explanation (feature aggregated on a weekly basis)
Lecture view (Lv)        Number of lecture videos viewed by a student
Lecture download (Ld)    Number of lecture videos downloaded by a student
Quiz attempt (Qa)        Number of quizzes attempted by a student
Forum view (Fv)          Number of times forum contents viewed by a student
Forum thread (Ft)        Number of forum threads created by a student
Forum post (Fp)          Number of forum posts submitted by a student
Forum comment (Fc)       Number of forum comments submitted by a student
Feature set of Coursera course
Feature   Lv      Ld      Qa      Fv      Ft    Fp     Fc
Week 1    26017   17991   15772   10694   581   1568   746
Week 2    17991   4959    9752    5105    198   785    459
Week 3    10924   3420    7384    3158    187   646    304
Week 4    9634    3279    6553    2624    74    320    182
Week 5    8045    3017    5827    2046    70    246    143
Week 6    7749    2939    5150    1847    56    238    132
Aggregate feature statistics of Coursera course
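Weekly aggregation of the seven Coursera features could look like the sketch below. The event-log layout (student, week, feature code) is a hypothetical format chosen for illustration; the actual Coursera export is not described in the slides.

```python
from collections import defaultdict

FEATURES = ["Lv", "Ld", "Qa", "Fv", "Ft", "Fp", "Fc"]

def weekly_features(events, num_weeks):
    """Count events per student, week and feature.

    events: iterable of (student_id, week, feature_code), week in 1..num_weeks
    Returns a dict: student -> list of num_weeks count vectors in FEATURES order.
    """
    index = {name: k for k, name in enumerate(FEATURES)}
    table = defaultdict(lambda: [[0] * len(FEATURES) for _ in range(num_weeks)])
    for student, week, feature in events:
        table[student][week - 1][index[feature]] += 1
    return dict(table)

# Hypothetical log entries.
events = [("alice", 1, "Lv"), ("alice", 1, "Lv"), ("alice", 1, "Qa"),
          ("alice", 3, "Fp"), ("bob", 2, "Ld")]
x = weekly_features(events, num_weeks=6)
```

Each student ends up with one 7-dimensional vector per week, and weeks without activity stay all zeros.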
Datasets for Dropout Prediction (edX)
1 “Introduction to Java Programming”, ten-week course.
2 46972 → 27629
Feature    Explanation (feature aggregated on a weekly basis)
Navigate   Number of times a student navigates through the course page
Forum      Number of times a student interacts with course forum
Video      Number of course video activities (click-stream) by a student
Problem    Number of course problem activities by a student
Access     Number of activities with other course objects (besides above)
Feature set of edX course
Time      Navigate   Forum   Video     Problem   Access
Week 1    385293     50105   1324469   559344    230300
Week 2    384858     73390   1561386   534947    235758
Week 3    317237     68738   1324338   482988    194007
Week 4    240251     41803   1061124   353932    153791
Week 5    195758     37656   809665    685558    118400
Week 6    219658     44366   731733    259522    115039
Week 7    156255     30893   624088    474377    83297
Week 8    158369     34424   550557    213088    77454
Week 9    144963     34754   466213    161164    74577
Week 10   115369     9505    290103    411429    57210
Aggregate feature statistics of edX course
Dropout Definitions
1 No universally accepted definition
2 Three definitions capture different contexts of the student's status in a course
DEF1   Participation in the final week: whether a student will stay to the end of the course [Yang et al. 2013, Ramesh et al. 2014, He et al. 2015]
DEF2   Last week of engagement: whether the current week is the last week in which the student has activities [Amnueypornsakul et al. 2014, Kloft et al. 2014, Sinha et al. 2014, Sharkey and Sanders 2014, Taylor et al. 2014]
DEF3   Participation in the next week: whether a student has activities in the coming week
Three dropout definitions
Time       Week 1             Week 2   Week 3             Week 4   Week 5
Features   [7,34,9,2,0,7,5]   Zeros    [6,3,12,4,1,8,3]   Zeros    Zeros
DEF1       1                  1        1                  1        1
DEF2       0                  0        1                  1        null
DEF3       1                  0        1                  1        null
An illustrative example for DEF1-DEF3
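The labels in the illustrative example can be reproduced with the sketch below, where 1 means dropout. The exact labeling rules are inferred from that example (for instance, that DEF2 stays 1 in every week after the last active one), so treat them as an assumed reading rather than the thesis's formal definition.

```python
def dropout_labels(active):
    """Weekly dropout labels (1 = dropout) under the three definitions.

    active: list of booleans, True if the student has any activity that week.
    """
    weeks = len(active)
    # DEF1: the student is absent in the final week (same label every week).
    def1 = [0 if active[-1] else 1] * weeks
    # DEF2: no activity in any later week; undefined for the final week.
    def2 = [0 if any(active[t + 1:]) else 1 for t in range(weeks - 1)] + [None]
    # DEF3: no activity in the next week; undefined for the final week.
    def3 = [0 if active[t + 1] else 1 for t in range(weeks - 1)] + [None]
    return def1, def2, def3

# The example student is active in weeks 1 and 3 only.
active = [True, False, True, False, False]
def1, def2, def3 = dropout_labels(active)
```

`None` plays the role of the "null" entries in the table, since DEF2 and DEF3 look ahead and cannot be evaluated in the last week.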
How to capture temporal information?
Sliding window structures (NLP tasks):
1 Features aggregated using a sliding window structure
2 Time-delay neural networks (TDNN) augment the current input with delayed copies
3 Temporal span is fixed by the sliding window
Temporal models:
1 Markov assumption
2 Learn and represent the temporal relationships directly from data
3 State space models: two variants of IOHMM with continuous state space
4 Recurrent neural networks: vanilla RNN and RNN with LSTM cells as hidden units
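The sliding window idea (augmenting the current input with delayed copies) can be sketched as below. Zero-padding before the first week and putting the oldest week first are assumptions made for illustration.

```python
def sliding_window(sequence, window):
    """Concatenate each week's features with those of the previous window-1 weeks."""
    dim = len(sequence[0])
    # Assumption: weeks before the course start are represented by zero vectors.
    padded = [[0] * dim for _ in range(window - 1)] + list(sequence)
    frames = []
    for t in range(len(sequence)):
        frame = []
        for vec in padded[t:t + window]:
            frame.extend(vec)
        frames.append(frame)
    return frames

x = [[1, 2], [3, 4], [5, 6]]  # three weeks of hypothetical 2-dimensional features
windows = sliding_window(x, window=2)
```

Every frame has a fixed length of `window * dim`, which makes the limitation in item 3 concrete: the classifier never sees further back than the window.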
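Input-Output Hidden Markov Models
IOHMM 1:
$$\begin{aligned} h_t &= A h_{t-1} + B x_t + \mathcal{N}(0, Q) \\ y_t &= C h_t + \mathcal{N}(0, R) \end{aligned} \tag{1}$$
[Figure: IOHMM 1 graphical model with feature inputs x_{t-1}, x_t, x_{t+1}, hidden states h_{t-1}, h_t, h_{t+1}, and dropout labels y_{t-1}, y_t, y_{t+1}]
IOHMM 1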
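Input-Output Hidden Markov Models
IOHMM 2:
$$\begin{aligned} h_t &= A h_{t-1} + B x_t + \mathcal{N}(0, Q) \\ y_t &= C h_t + D x_t + \mathcal{N}(0, R) \end{aligned} \tag{2}$$
[Figure: IOHMM 2 graphical model with feature inputs x_{t-1}, x_t, x_{t+1}, hidden states h_{t-1}, h_t, h_{t+1}, and dropout labels y_{t-1}, y_t, y_{t+1}]
IOHMM 2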
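Equations (1) and (2) differ only in the direct input term D x_t, which the generative sketch below makes explicit. The zero initial state, the dimensions and the random parameter values are assumptions for illustration; fitting A, B, C, D, Q and R from data is not shown.

```python
import numpy as np

def simulate_iohmm(x, A, B, C, Q, R, D=None, seed=0):
    """Sample outputs y_1..y_T for inputs x_1..x_T.

    D=None gives IOHMM 1; passing a matrix D gives IOHMM 2.
    """
    rng = np.random.default_rng(seed)
    h = np.zeros(A.shape[0])  # assumption: h_0 = 0
    outputs = []
    for x_t in x:
        h = A @ h + B @ x_t + rng.multivariate_normal(np.zeros(len(h)), Q)
        y = C @ h + rng.multivariate_normal(np.zeros(R.shape[0]), R)
        if D is not None:
            y = y + D @ x_t  # direct input-to-output connection
        outputs.append(y)
    return np.array(outputs)

# Hypothetical sizes: 3 input features, 2 hidden dimensions, 1 output.
rng = np.random.default_rng(1)
x = rng.normal(size=(6, 3))
A = 0.5 * np.eye(2)
B = rng.normal(size=(2, 3))
C = rng.normal(size=(1, 2))
D = rng.normal(size=(1, 3))
Q = 0.01 * np.eye(2)
R = 0.01 * np.eye(1)
y1 = simulate_iohmm(x, A, B, C, Q, R)       # IOHMM 1
y2 = simulate_iohmm(x, A, B, C, Q, R, D=D)  # IOHMM 2
```

Because both calls share the same seed and draw the same noise, the two output sequences differ exactly by D x_t at every step.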
Recurrent Neural Network
Vanilla RNN:
Left: Vanilla RNN structure; Right: Vanilla RNN unfolded
$$\begin{aligned} h_t &= \mathcal{H}(W_1 x_t + W_2 h_{t-1} + b_h) \\ y_t &= \mathcal{F}(W_3 h_t + b_y) \end{aligned} \tag{3}$$
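A forward pass through equation (3) can be sketched as below. Choosing tanh for the hidden activation H and the logistic sigmoid for the output function F is an assumption; the slides leave both unspecified. The sizes and weights are hypothetical.

```python
import numpy as np

def rnn_forward(x, W1, W2, W3, b_h, b_y):
    """Run a vanilla RNN over the sequence x and return one output per step."""
    h = np.zeros(W2.shape[0])  # assumption: initial hidden state is zero
    outputs = []
    for x_t in x:
        h = np.tanh(W1 @ x_t + W2 @ h + b_h)       # H = tanh (assumed)
        y = 1.0 / (1.0 + np.exp(-(W3 @ h + b_y)))  # F = sigmoid (assumed)
        outputs.append(y)
    return np.array(outputs)

# Hypothetical sizes: 7 weekly features, 4 hidden units, 1 dropout probability.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 7))
W1 = rng.normal(size=(4, 7))
W2 = rng.normal(size=(4, 4))
W3 = rng.normal(size=(1, 4))
probabilities = rnn_forward(x, W1, W2, W3, np.zeros(4), np.zeros(1))
```

With a sigmoid output, each weekly y_t can be read as a dropout probability for that week.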
Properties of RNN
Pros:
1 Uses contextual or sequential information through the recurrent connection
2 Nonlinear model
Cons:
1 The influence of an input either decays or blows up as it cycles through the recurrent connection
2 The back-propagation learning algorithm, based on gradient descent, requires computing a product of a large number of Jacobians
3 Vanishing gradient problem
4 The range of temporal context that can be accessed in practice is usually quite limited
5 The dynamic state of a regular RNN is a short-term memory
Long Short-Term Memory Cell (LSTM)
1 Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time.
2 They designed a memory cell with logistic and linear units with multiplicative interactions
  1 Information gets into the cell whenever the “input” gate is on
  2 Information stays in the cell so long as the “forget” gate does not erase it
  3 Information can be read from the cell by turning the “output” gate on
Long Short-Term Memory Cell (LSTM)
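\[
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \\
c_t &= f_t \otimes c_{t-1} + i_t \otimes \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o) \\
h_t &= o_t \otimes \tanh(c_t)
\end{aligned}
\tag{4}
\]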
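As a reading aid for Eq. (4), the following sketch computes one LSTM step in plain Python. It is an illustration under assumptions, not the implementation used in the thesis: the parameter dictionary `p` and its key names are hypothetical, ⊗ is taken to be element-wise multiplication, and the peephole weights W_ci, W_cf, W_co are stored as vectors (that is, assumed diagonal). The output gate uses c_{t-1}, following Eq. (4) as written.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of Eq. (4); returns (h_t, c_t).

    p is a hypothetical dict with keys W_x*, W_h*, b_* for * in {i, f, c, o}
    and diagonal peephole vectors W_ci, W_cf, W_co (an assumption).
    """
    def pre_activation(gate, peephole=True):
        a = matvec(p["W_x" + gate], x_t)
        b = matvec(p["W_h" + gate], h_prev)
        z = [ai + bi + bias for ai, bi, bias in zip(a, b, p["b_" + gate])]
        if peephole:
            # Peephole term W_c* c_{t-1}, element-wise for a diagonal matrix.
            z = [zi + w * c for zi, w, c in zip(z, p["W_c" + gate], c_prev)]
        return z

    i_t = [sigmoid(z) for z in pre_activation("i")]                    # input gate
    f_t = [sigmoid(z) for z in pre_activation("f")]                    # forget gate
    g_t = [math.tanh(z) for z in pre_activation("c", peephole=False)]  # candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    o_t = [sigmoid(z) for z in pre_activation("o")]                    # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate is 0, so the step simply halves the previous cell state. This makes the roles of f_t (how much of c_{t-1} is kept) and i_t (how much new input is written) easy to check by hand.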
Preservation of Gradient Information
1 Input gate remains closed → the activation of the cell will not be overwritten by the new inputs arriving in the network
2 Output gate opens → the stored information can be retrieved much later in the sequence
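Why the gates preserve gradient information can be made explicit with a short derivation. This is a sketch under a simplifying assumption that is not on the slides: the gate values are treated as constants, so the dependence of i_t and f_t on c_{t-1} and h_{t-1} is ignored.

```latex
\begin{aligned}
c_t &= f_t \otimes c_{t-1} + i_t \otimes \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
\frac{\partial c_t}{\partial c_{t-1}} &\approx \operatorname{diag}(f_t) \\
\frac{\partial c_T}{\partial c_t} &\approx \prod_{k=t+1}^{T} \operatorname{diag}(f_k)
\end{aligned}
```

If i_k ≈ 0 and f_k ≈ 1 for t < k ≤ T, then c_T ≈ c_t and the product above stays close to the identity. The error signal along the cell state therefore neither vanishes nor blows up over that span, in contrast to the product of Jacobians in a regular RNN.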
Hybrid of LSTM Memory Cells and RNN (LSTM Network)
Figure: left, hybrid of LSTM and RNN (LSTM network); right, LSTM network unfolded.
Nonlinear Models Help
Figure: AUC scores of nonlinear and linear SVMs for Coursera course. Panels: SVM (DEF1), SVM (DEF2), SVM (DEF3); x-axis: week 1 to 5; y-axis: AUC, 0.5 to 1; curves: Nonlinear SVM (Stacked), Linear SVM (Stacked), Nonlinear SVM (Non-stacked), Linear SVM (Non-stacked).
Figure: AUC scores of nonlinear and linear SVMs for edX course. Same panels and curves; x-axis: week 1 to 9.
Figure: AUC scores of vanilla RNN and IOHMMs for Coursera course. Panels: Vanilla RNN, IOHMM (DEF1), (DEF2), (DEF3); x-axis: week 1 to 5; y-axis: AUC, 0.5 to 1; curves: Vanilla RNN, IOHMM 1, IOHMM 2.
Figure: AUC scores of vanilla RNN and IOHMMs for edX course. Same panels and curves; x-axis: week 1 to 9.
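The figures above report AUC per week. For readers unfamiliar with the metric, here is a minimal sketch of AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The slides do not say how AUC was computed; the function name, the label convention (1 = dropout) and the toy scores are assumptions for illustration only.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: list of 0/1 (1 = dropout is an assumed convention).
    scores: list of predicted scores, higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not from the experiments.
    labels = [1, 0, 1, 0, 1]
    scores = [0.9, 0.4, 0.6, 0.7, 0.8]
    print(auc_score(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect ranking, which is why the y-axis of each panel runs from 0.5 to 1. The pairwise loop is quadratic in the number of examples; it is chosen here for clarity rather than speed.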
Model Performance Comparison
Figure: AUC scores of all models for Coursera course. Panels: Model Performance Comparison (DEF1), (DEF2), (DEF3); x-axis: week 1 to 5; y-axis: AUC, 0.5 to 1; curves: LSTM Network, Vanilla RNN, IOHMM 1, IOHMM 2, Nonlinear SVM, Logistic Regression.
Figure: AUC scores of all models for edX course. Same panels and curves; x-axis: week 1 to 9.
1 LSTM network performs consistently best, showing that the long-term memory retained by the LSTM block is very effective
2 Vanilla RNN < LSTM network; still among the top 3 methods
3 IOHMMs perform worst; IOHMM 2 > IOHMM 1
4 Baselines ≃ vanilla RNN; not consistent on the two datasets
Conclusion
Contributions:
1 Two learning analytics issues; pioneering research in MOOCs
2 Viewpoints on both research issues are novel
3 The experimental results obtained are promising and significant
Take-home Message:
Peer grading:
1 Propose new probabilistic models for cardinal peer grading
2 Novel mechanism for combining cardinal and ordinal models in a common framework
Dropout prediction:
1 View this task as a sequence classification problem
2 Apply various temporal models; the RNN model with LSTM cells achieves a promising performance boost
Current Limitations and Future Work
Peer grading:
1 Limited ground truth set
2 Semi-supervised learning techniques
Dropout prediction:
1 Try more network structures: max-pooling layer
2 Feature engineering: detailed features
3 Cross-course information
Q & A