Machine learning models for some learning analytics issues in massive open online courses

Fei MI

Dept. of Computer Science and Engineering
Hong Kong University of Science and Technology

Thesis supervised by Dit-Yan Yeung
27/05/2015

Fei MI MOOC Learning Analytics CSE, HKUST

Page 2: Machine learning models for some learning analytics issues ...mi/upload/doc/publication/2015/Mi.pdf · Learning Analytics Issues Current MOOC environment 1 Popularity and rapid development

Outline

1 Background and Motivation

2 Peer Grading Problem Formulation and Related Work

3 Cardinal Peer Grading Model Extensions

4 Combine Cardinal & Ordinal Peer Grading

5 Dropout Prediction Related Work and Problem Formulation

6 Temporal Models

7 Experiments for Temporal Models

8 Conclusion


MOOC Platform

MOOC Platform in China

Learning Analytics Issues

Current MOOC environment

1 Popularity and rapid development of MOOC platforms
2 Massive, open, online nature (introducing a new era of education)
3 Access anywhere, any time (extending the boundary of education)

Peer Grading

1 Addresses the student assessment issue in MOOCs
2 Handles subjective, open-ended assignments
3 Students benefit from the grading process

Dropout Prediction

1 High dropout rate
2 Helps instructors intervene and bring students back to class
3 Helps understand student engagement patterns


Problem Formulation

Cardinal vs. Ordinal


Peer Grading Data

1 “Science, Technology, and Society in China I” on Coursera
2 Three assignments in total
3 Each grader is assigned three submissions and grades them with cardinal rubrics
4 By default, scores are aggregated by taking the median of the peer grades

                      Assignment 1   Assignment 2   Assignment 3
# finished students   1202           845            724
# peer grades         3201           2261           2084
# staff grades        23             19             23
Full score            21             25             25
Mean score            14.8 (70%)     17.2 (69%)     16.5 (58%)

Summary statistics of assignments for peer grading
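The default aggregation mentioned on this slide (the median of the peer grades a submission receives) can be sketched as follows. The data layout, the submission ids, and the function name `aggregate_by_median` are assumptions made for illustration; this is not the platform's actual implementation.

```python
from statistics import median
from typing import Dict, List


def aggregate_by_median(peer_grades: Dict[str, List[float]]) -> Dict[str, float]:
    """Map each submission id to the median of the peer grades it received."""
    # Submissions without any peer grade are skipped rather than scored.
    return {sub: median(grades) for sub, grades in peer_grades.items() if grades}


# Hypothetical grades: two submissions, each scored by three peers.
grades = {"s1": [14, 18, 15], "s2": [20, 9, 21]}
scores = aggregate_by_median(grades)
```

The median is robust to a single outlying grader (the 9 given to `s2` does not pull the score down), which is one plausible reason it serves as the baseline.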


Cardinal Peer Grading Model [Piech et al. 2013]

PG1

PG3

Model Extensions

Still relate grader reliability to the grader's own score
Model the relationship in a probabilistic form rather than a linear/deterministic form

PG4 & PG5


Results for Cardinal Models

[Figure: three panels (Assignment 1, Assignment 2, Assignment 3); x-axis: ground truth submissions; y-axis: predicted score; series: instructor grade, PG3, PG4, PG5.]

Predicted scores on the ground truth set.

Average case and worst case analysis:

Average Case: RMSE

          Assignment 1        Assignment 2        Assignment 3
          Mean         Std    Mean         Std    Mean         Std
Median    4.94                5.54                4.12
PG1       3.77 (23%)   0.02   4.93 (11%)   0.03   3.66 (11%)   0.01
PG3       3.22 (35%)   0.02   5.24 (5%)    0.04   3.15 (23%)   0.02
PG4       3.35 (32%)   0.05   4.75 (14%)   0.06   2.83 (31%)   0.09
PG5       3.31 (33%)   0.05   4.69 (15%)   0.05   2.76 (33%)   0.09

Percentages in parentheses give the reduction in RMSE relative to the Median baseline.

Worst Case: Maximum prediction deviation (fairness issue)

          Assignment 1   Assignment 2   Assignment 3
PG3       6.52           11.10          6.77
PG4       5.84           9.86           6.70
PG5       5.81           9.85           5.79
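As a minimal sketch of the two evaluation criteria named here, the functions below compute the RMSE (average case) and the maximum absolute deviation (worst case) between predicted scores and staff grades. The function names and the sample numbers are hypothetical; `relative_improvement` reproduces the kind of percentage reported next to the RMSE values, on the assumption that it is measured against the median baseline.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean squared error between predictions and ground-truth grades."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))


def max_deviation(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Largest absolute prediction error over all ground-truth submissions."""
    return max(abs(p - t) for p, t in zip(predicted, truth))


def relative_improvement(baseline: float, model: float) -> float:
    """Fractional error reduction of a model with respect to a baseline."""
    return (baseline - model) / baseline


# Hypothetical predictions and staff grades for four submissions.
predicted = [14.0, 18.0, 10.0, 20.0]
staff = [15.0, 16.0, 13.0, 20.0]
avg_case = rmse(predicted, staff)
worst_case = max_deviation(predicted, staff)
```

For instance, `relative_improvement(4.94, 3.22)` rounds to 35%, matching the PG3 entry for Assignment 1.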


Ordinal Peer Grading

Problem Formulation:

1 Rank aggregation problem (Dwork et al. 2001)
2 Preference learning problem (Chu and Ghahramani 2005; Fürnkranz and Hüllermeier 2010)

Popular Model:

1 Bradley-Terry model (Bradley and Terry 1952)
2 Recently applied to peer grading (Shah et al. 2013; Raman and Joachims 2014)

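$$
\text{hypothesis} = P(u_i \succ_{\rho(v)} u_j) = \frac{1}{1 + \exp(-(s_{u_i} - s_{u_j}))}
$$

$$
L = \frac{\lambda}{2\sigma^2} \sum_{u \in U} (s_u - \mu)^2 - \sum_{v \in V} \sum_{u_i \succ_{\rho(v)} u_j} \log(\text{hypothesis})
$$

Combine cardinal and ordinal models (the shared prior mean $\mu$ becomes a per-submission mean $\mu_u$):

$$
L = \frac{\lambda}{2\sigma^2} \sum_{u \in U} (s_u - \mu_u)^2 - \sum_{v \in V} \sum_{u_i \succ_{\rho(v)} u_j} \log(\text{hypothesis})
$$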


Combining Cardinal and Ordinal Evaluations

$$
L = \frac{\lambda}{2\sigma^2} \sum_{u \in U} (s_u - \mu_u)^2 - \sum_{v \in V} \sum_{u_i \succ_{\rho(v)} u_j} \log(\text{hypothesis})
$$

1 Augment ordinal models with the cardinal prediction as a prior
2 Tune the predictions of the cardinal model with the ordinal peer preferences
3 Principled approach to combining both cardinal and ordinal peer evaluations
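The combined objective on this slide can be sketched in a few lines: a Gaussian prior term that ties each score to its cardinal prediction, plus the negative Bradley-Terry log-likelihood of the observed pairwise preferences. The function names, the flat list of `(preferred, other)` pairs pooled over graders, and the default values of `lam` and `sigma` are assumptions for illustration, not the thesis implementation.

```python
import math
from typing import Dict, List, Tuple


def bt_probability(s_i: float, s_j: float) -> float:
    """Bradley-Terry probability that the submission scored s_i beats s_j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))


def combined_loss(
    scores: Dict[str, float],
    prior_means: Dict[str, float],
    preferences: List[Tuple[str, str]],
    lam: float = 1.0,
    sigma: float = 1.0,
) -> float:
    """Prior term around the cardinal predictions minus the pairwise log-likelihood."""
    prior = lam / (2.0 * sigma ** 2) * sum(
        (scores[u] - prior_means[u]) ** 2 for u in scores
    )
    # Each pair (w, l) states that some grader ranked w above l.
    nll = -sum(math.log(bt_probability(scores[w], scores[l])) for w, l in preferences)
    return prior + nll


# Hypothetical cardinal predictions used as prior means.
mu = {"a": 2.0, "b": 1.0}
agree = combined_loss(dict(mu), mu, [("a", "b")])
disagree = combined_loss(dict(mu), mu, [("b", "a")])
```

When the ordinal preference agrees with the cardinal prior the loss is small; when it contradicts the prior, minimizing the loss pulls the two scores toward each other, which is the "tuning" role of the ordinal term.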


Results for Cardinal + Ordinal Models

Ordinal evaluation: Percentage of correctly evaluated pairs

                     Assignment 1   Assignment 2   Assignment 3
Cardinal Models
PG3                  0.7526         0.6155         0.7775
PG4                  0.6928         0.6552         0.7854
PG5                  0.6979         0.6616         0.7889
“Cardinal + Ordinal” Models
PG3+BT               0.7577         0.6110         0.7892
PG4+BT               0.7221         0.6484         0.7931
PG5+BT               0.7191         0.6646         0.8000
PG3+BT+G             0.7645         0.6587         0.7879
PG4+BT+G             0.7145         0.7032         0.7896
PG5+BT+G             0.7170         0.7065         0.8013
PG3+RBTL             0.7660         0.6494         0.7979
PG4+RBTL             0.7064         0.6745         0.7835
PG5+RBTL             0.7201         0.6845         0.8009
Pure Ordinal Models
BT (or BTL)          0.6536         0.6329         0.6896
RBTL                 0.6583         0.6432         0.6996
BT+G                 0.6547         0.6535         0.7009
BT Same Initial      0.6387         0.6194         0.6407
BT Random Initial    0.6381         0.6416         0.6667
Baseline Method
Median               0.6043         0.6610         0.6753

Cardinal evaluation: RMSE

            Assignment 1   Assignment 2   Assignment 3
PG3         3.22           5.24           3.15
PG3+BT      3.04           5.30           3.18
PG3+BT+G    3.01           4.95           3.10
PG3+RBTL    3.00           5.04           3.15
PG4         3.35           4.75           2.83
PG4+BT      3.47           4.87           3.03
PG4+BT+G    3.31           4.52           2.91
PG4+RBTL    3.44           4.70           2.77
PG5         3.31           4.69           2.76
PG5+BT      3.30           4.77           2.93
PG5+BT+G    3.35           4.50           2.74
PG5+RBTL    3.24           4.62           2.70

1 Cardinal models perform better than pure ordinal models
2 Combined models further improve performance in most settings
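The ordinal criterion used on this slide, the share of correctly ordered pairs, can be sketched as below. How ties are treated is not stated in the slides, so this sketch assumes that pairs tied in the ground truth are skipped and that a predicted tie counts as incorrect; the function name and the sample numbers are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_accuracy(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Fraction of submission pairs whose predicted order matches the true order."""
    correct = total = 0
    for i, j in combinations(range(len(truth)), 2):
        if truth[i] == truth[j]:
            continue  # assumption: pairs tied in the ground truth are ignored
        total += 1
        # Same sign of the two differences means the pair is ordered correctly.
        if (predicted[i] - predicted[j]) * (truth[i] - truth[j]) > 0:
            correct += 1
    return correct / total if total else float("nan")


# Hypothetical predictions and staff grades for four submissions.
accuracy = pairwise_accuracy([14.0, 12.0, 10.0, 20.0], [15.0, 16.0, 13.0, 20.0])
```

Only the first two submissions are ordered incorrectly here, so five of the six pairs count as correct.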


Background

Motivations:

Challenges:

1 Diverse engagement patterns (Data noise)
2 Low-intensity participation (Data sparsity)
3 High dropout rate (Data imbalance)


Related Work

Attention from:

1 Individual research groups
2 Conference workshop (EMNLP 2014)
3 KDD Cup 2015

Machine learning models

1 SVM, Decision Tree (EMNLP 2014 Workshop)
2 Logistic Regression (AAAI 2015)
3 Probabilistic Soft Logic (AAAI 2014)
4 Survival Model (NIPS 2013)
5 HMM (Technical report 2013)
6 NLP (ISWSM 2014)


Dropout Prediction Problem Formulation

Sequence labeling task:

1 A MOOC spans a period of time, usually no more than 10 weeks

[Figure: weekly timeline from Week 1 to Week t, with activity features $x_1, \ldots, x_t$ and dropout labels $y_1, \ldots, y_t$.]

2 Input activity feature sequence: $(x_1, \ldots, x_t)$
3 Dropout label sequence: $(y_1, \ldots, y_t)$
4 Inputs are dependent (temporal relationship)
5 Build incremental models and make predictions


Datasets for Dropout Prediction (Coursera)

1 “Science of Gastronomy”, six-week course
2 85394 → 39877

Feature                  Explanation (features aggregated on a weekly basis)
Lecture view (Lv)        Number of lecture videos viewed by a student
Lecture download (Ld)    Number of lecture videos downloaded by a student
Quiz attempt (Qa)        Number of quizzes attempted by a student
Forum view (Fv)          Number of times forum contents are viewed by a student
Forum thread (Ft)        Number of forum threads created by a student
Forum post (Fp)          Number of forum posts submitted by a student
Forum comment (Fc)       Number of forum comments submitted by a student

Feature set of Coursera course

Feature   Lv      Ld      Qa      Fv      Ft    Fp     Fc
Week 1    26017   17991   15772   10694   581   1568   746
Week 2    17991   4959    9752    5105    198   785    459
Week 3    10924   3420    7384    3158    187   646    304
Week 4    9634    3279    6553    2624    74    320    182
Week 5    8045    3017    5827    2046    70    246    143
Week 6    7749    2939    5150    1847    56    238    132

Aggregate feature statistics of Coursera course


Datasets for Dropout Prediction (edX)

1 “Introduction to Java Programming”, ten-week course
2 46972 → 27629

Feature    Explanation (features aggregated on a weekly basis)
Navigate   Number of times a student navigates through the course page
Forum      Number of times a student interacts with the course forum
Video      Number of course video activities (click-stream) by a student
Problem    Number of course problem activities by a student
Access     Number of activities with other course objects (besides the above)

Feature set of edX course

Time      Navigate   Forum   Video     Problem   Access
Week 1    385293     50105   1324469   559344    230300
Week 2    384858     73390   1561386   534947    235758
Week 3    317237     68738   1324338   482988    194007
Week 4    240251     41803   1061124   353932    153791
Week 5    195758     37656   809665    685558    118400
Week 6    219658     44366   731733    259522    115039
Week 7    156255     30893   624088    474377    83297
Week 8    158369     34424   550557    213088    77454
Week 9    144963     34754   466213    161164    74577
Week 10   115369     9505    290103    411429    57210

Aggregate feature statistics of edX course


Dropout Definitions

1 No universally accepted definition
2 Three definitions capture different contexts of the student status in a course

DEF1   Participation in the final week: whether a student will stay to the end of the course [Yang et al. 2013, Ramesh et al. 2014, He et al. 2015]
DEF2   Last week of engagement: whether the current week is the last week in which the student has activities [Amnueypornsakul et al. 2014, Kloft et al. 2014, Sinha et al. 2014, Sharkey and Sanders 2014, Taylor et al. 2014]
DEF3   Participation in the next week: whether a student has activities in the coming week

Three dropout definitions

Time       Week 1             Week 2   Week 3             Week 4   Week 5
Features   [7,34,9,2,0,7,5]   Zeros    [6,3,12,4,1,8,3]   Zeros    Zeros
DEF1       1                  1        1                  1        1
DEF2       0                  0        1                  1        null
DEF3       1                  0        1                  1        null

An illustrative example for DEF1-DEF3
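The labeling rules can be made concrete with a short sketch that reproduces the illustrative example. The exact rules below are inferred from that example rather than stated in the slides: a label of 1 is taken to mean dropout, a week counts as active if any feature is non-zero, and DEF2 is read as "no activity in any later week". The function name `dropout_labels` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Labels = List[Optional[int]]


def dropout_labels(weekly_features: Sequence[Sequence[int]]) -> Tuple[Labels, Labels, Labels]:
    """Return (DEF1, DEF2, DEF3) label sequences; 1 = dropout, None = undefined."""
    active = [any(v != 0 for v in week) for week in weekly_features]
    last = len(active) - 1
    # DEF1: the same label for every week, decided by activity in the final week.
    def1: Labels = [0 if active[last] else 1] * len(active)
    # DEF2: 1 once no activity remains in any later week (assumed reading).
    def2: Labels = [0 if any(active[t + 1:]) else 1 for t in range(last)] + [None]
    # DEF3: 1 if the student has no activity in the following week.
    def3: Labels = [0 if active[t + 1] else 1 for t in range(last)] + [None]
    return def1, def2, def3


zeros = [0] * 7
weeks = [[7, 34, 9, 2, 0, 7, 5], zeros, [6, 3, 12, 4, 1, 8, 3], zeros, zeros]
def1, def2, def3 = dropout_labels(weeks)
```

DEF2 and DEF3 are undefined for the final week because they both look ahead to weeks that do not exist.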


How to capture temporal information?

Sliding window structures (NLP tasks):

1 Features aggregated using a sliding window structure
2 Time-delay neural networks (TDNN) augment the current input with delayed copies
3 Temporal span is fixed by the sliding window

Temporal models:

1 Markov assumption
2 Learn and represent the temporal relationships from data directly
3 State space models: two variants of IOHMM with continuous state space
4 Recurrent neural networks: vanilla RNN and RNN with LSTM cells as hidden units
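The sliding window idea (augmenting the current input with delayed copies) can be sketched as a feature-stacking step applied before a non-temporal classifier. The function name `stack_with_delays` and the zero-padding of weeks before the course start are assumptions for illustration.

```python
from typing import List, Sequence


def stack_with_delays(sequence: Sequence[Sequence[float]], delays: int) -> List[List[float]]:
    """For each week t, concatenate x_t with x_{t-1}, ..., x_{t-delays}."""
    dim = len(sequence[0])
    stacked = []
    for t in range(len(sequence)):
        row: List[float] = []
        for d in range(delays + 1):
            # Weeks before the start of the course are zero-padded (assumption).
            row.extend(sequence[t - d] if t - d >= 0 else [0.0] * dim)
        stacked.append(row)
    return stacked


# Three weeks of hypothetical two-dimensional activity features.
weekly = [[1, 2], [3, 4], [5, 6]]
windowed = stack_with_delays(weekly, delays=1)
```

The window length (`delays`) fixes how far back the classifier can look, which is the limitation the temporal models in the second list are meant to remove.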


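Input-Output Hidden Markov Models

IOHMM 1:

$$
\begin{aligned}
h_t &= A h_{t-1} + B x_t + \mathcal{N}(0, Q) \\
y_t &= C h_t + \mathcal{N}(0, R)
\end{aligned}
\tag{1}
$$

[Figure: IOHMM 1 graphical model with feature inputs $x_{t-1}, x_t, x_{t+1}$, hidden states $h_{t-1}, h_t, h_{t+1}$, and dropout labels $y_{t-1}, y_t, y_{t+1}$.]

IOHMM 1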

IOHMM 2:

$$
\begin{aligned}
h_t &= A h_{t-1} + B x_t + \mathcal{N}(0, Q) \\
y_t &= C h_t + D x_t + \mathcal{N}(0, R)
\end{aligned}
\tag{2}
$$

[Figure: IOHMM 2 graphical model with feature inputs $x_{t-1}, x_t, x_{t+1}$, hidden states $h_{t-1}, h_t, h_{t+1}$, and dropout labels $y_{t-1}, y_t, y_{t+1}$.]

IOHMM 2
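A minimal sketch of the generative recursion in Eq. (2), written with plain Python lists. The helper names, the initial state `h0`, and the use of independent per-dimension Gaussian noise (diagonal Q and R with standard deviations `q_std` and `r_std`) are assumptions for illustration. Parameter learning is not shown.

```python
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vectors: Sequence[float]) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def iohmm2_rollout(A: Matrix, B: Matrix, C: Matrix, D: Matrix,
                   xs: Sequence[Vector], h0: Vector,
                   q_std: float = 0.0, r_std: float = 0.0, seed: int = 0) -> List[Vector]:
    """Roll the linear state-space model forward and return the outputs y_1..y_T."""
    rng = random.Random(seed)
    h, ys = h0, []
    for x in xs:
        # State update: h_t = A h_{t-1} + B x_t + state noise.
        h = add(matvec(A, h), matvec(B, x), [rng.gauss(0.0, q_std) for _ in h])
        # Output: y_t = C h_t + D x_t + observation noise.
        y = add(matvec(C, h), matvec(D, x))
        ys.append([v + rng.gauss(0.0, r_std) for v in y])
    return ys


# Hypothetical parameters: 1-D hidden state, 2-D input, noise switched off.
ys = iohmm2_rollout(A=[[0.5]], B=[[1.0, 0.0]], C=[[2.0]], D=[[0.0, 1.0]],
                    xs=[[1.0, 3.0], [0.0, 0.0]], h0=[0.0])
```

Setting `D` to a zero matrix removes the direct input-to-output path and gives the IOHMM 1 recursion.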


Recurrent Neural Network

Vanilla RNN:

Left: Vanilla RNN structure; Right: Vanilla RNN unfolded

$$
\begin{aligned}
h_t &= \mathcal{H}(W_1 x_t + W_2 h_{t-1} + b_h) \\
y_t &= \mathcal{F}(W_3 h_t + b_y)
\end{aligned}
\tag{3}
$$
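The recursion in Eq. (3) can be sketched as a forward pass over a weekly feature sequence. The choice of tanh for the hidden activation, a logistic sigmoid for the output, and a zero initial hidden state are assumptions; the slides do not specify them. Training is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(W1: Matrix, W2: Matrix, W3: Matrix, bh: Vector, by: Vector,
                xs: Sequence[Vector]) -> List[Vector]:
    """Return the outputs y_1..y_T of a vanilla RNN for the inputs x_1..x_T."""
    h = [0.0] * len(bh)  # assumed initial hidden state
    ys = []
    for x in xs:
        pre = [a + b + c for a, b, c in zip(matvec(W1, x), matvec(W2, h), bh)]
        h = [math.tanh(p) for p in pre]                      # hidden activation (assumed tanh)
        out = [a + b for a, b in zip(matvec(W3, h), by)]
        ys.append([1.0 / (1.0 + math.exp(-o)) for o in out])  # output (assumed sigmoid)
    return ys


# Hypothetical weights: one hidden unit, two input features, one output.
outputs = rnn_forward(W1=[[1.0, 0.0]], W2=[[0.0]], W3=[[1.0]], bh=[0.0], by=[0.0],
                      xs=[[1.0, 0.0], [0.0, 5.0]])
```

With a sigmoid output, each `y_t` can be read as a dropout probability for week t, which fits the sequence labeling formulation.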


Properties of RNN

Pros:

1 Use contextual or sequential information through the recurrent connection
2 Nonlinear model

Cons:

1 Influence of an input either decays or blows up as it cycles through the recurrent connection
2 Back-propagation learning based on gradient descent requires computing a product of a large number of Jacobians
3 Vanishing gradient problem
4 The range of temporal context that can be accessed in practice is usually quite limited
5 The dynamic state of a regular RNN is a short-term memory
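The decay or blow-up described in the list of cons can be seen in a scalar toy RNN, where the product of Jacobians reduces to a product of numbers. The model $h_t = \tanh(w h_{t-1} + x_t)$ and the function name are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def gradient_through_time(w: float, xs: Sequence[float], h0: float = 0.0) -> float:
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # one-step Jacobian d h_t / d h_{t-1}
    return grad


# Twenty time steps with zero input, so the hidden state stays at 0.
vanishing = gradient_through_time(0.5, [0.0] * 20)
exploding = gradient_through_time(2.0, [0.0] * 20)
```

With a recurrent weight below 1 the gradient shrinks geometrically with the number of steps, and with a weight above 1 it grows geometrically, which is why the usable temporal range of a plain RNN is limited.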


Long Short-Term Memory Cell (LSTM)

1 Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time.
2 They designed a memory cell with logistic and linear units with multiplicative interactions
  1 Information gets into a cell whenever the “input” gate is on
  2 Information stays in the cell as long as the “forget” gate keeps it (forget gate activation close to 1)
  3 Information can be read from the cell by turning the “output” gate on


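$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \\
c_t &= f_t \otimes c_{t-1} + i_t \otimes \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o) \\
h_t &= o_t \otimes \tanh(c_t)
\end{aligned}
\tag{4}
$$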
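A single step of Eq. (4) can be sketched for one scalar memory cell, so that every weight is a number rather than a matrix. The function names and the parameter dictionary keyed by the subscripts of Eq. (4) are assumptions for illustration; the output gate uses $c_{t-1}$ exactly as the equation is written here.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, p: Dict[str, float]) -> Tuple[float, float]:
    """One LSTM step for a single cell; returns (h_t, c_t)."""
    i = sigmoid(p["Wxi"] * x + p["Whi"] * h_prev + p["Wci"] * c_prev + p["bi"])  # input gate
    f = sigmoid(p["Wxf"] * x + p["Whf"] * h_prev + p["Wcf"] * c_prev + p["bf"])  # forget gate
    c = f * c_prev + i * math.tanh(p["Wxc"] * x + p["Whc"] * h_prev + p["bc"])   # cell state
    o = sigmoid(p["Wxo"] * x + p["Who"] * h_prev + p["Wco"] * c_prev + p["bo"])  # output gate
    return o * math.tanh(c), c


names = ["Wxi", "Whi", "Wci", "bi", "Wxf", "Whf", "Wcf", "bf",
         "Wxc", "Whc", "bc", "Wxo", "Who", "Wco", "bo"]
zero_params = dict.fromkeys(names, 0.0)

# All gates half open: the old cell content is halved.
h_half, c_half = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=zero_params)

# Forget gate saturated at 1 and input gate near 0: the cell content is preserved.
keep_params = dict(zero_params, bf=50.0, bi=-50.0)
h_keep, c_keep = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=keep_params)
```

The second call illustrates the memory behavior: with the input gate shut and the forget gate activation at 1, the cell state passes through the step unchanged.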


Preservation of Gradient Information

1 Input gate remains closed → the activation of the cell will not be overwritten by the new inputs arriving in the network
2 Open the output gate → the stored activation can be retrieved much later in the sequence

Hybrid of LSTM Memory Cells and RNN (LSTM Network)

Left: Hybrid of LSTM and RNN (LSTM network); Right: LSTM network unfolded

Outline

7 Experiments for Temporal Models

Nonlinear Models Help

[Figure: three panels, SVM (DEF1), SVM (DEF2), SVM (DEF3); x-axis: Week 1 to 5; y-axis: AUC, 0.5 to 1; curves: Nonlinear SVM (Stacked), Linear SVM (Stacked), Nonlinear SVM (Non-stacked), Linear SVM (Non-stacked)]

AUC scores of nonlinear and linear SVMs for Coursera course

[Figure: three panels, SVM (DEF1), SVM (DEF2), SVM (DEF3); x-axis: Week 1 to 9; y-axis: AUC, 0.5 to 1]

AUC scores of nonlinear and linear SVMs for edX course

[Figure: three panels, the first titled Vanilla RNN, IOHMM (DEF1); x-axis: Week 1 to 5; y-axis: AUC, 0.5 to 1; curves: Vanilla RNN, IOHMM 1, IOHMM 2]

AUC scores of vanilla RNN and IOHMMs for Coursera course

[Figure: three panels, the first titled Vanilla RNN, IOHMM (DEF1); x-axis: Week 1 to 9; y-axis: AUC, 0.5 to 1; curves: IOHMM 1, IOHMM 2, Vanilla RNN]

AUC scores of vanilla RNN and IOHMMs for edX course

Model Performance Comparison

[Figure: three panels, the first titled Model Performance Comparison (DEF1); x-axis: Week 1 to 5; y-axis: AUC, 0.5 to 1; curves: LSTM Network, Vanilla RNN, IOHMM 1, IOHMM 2, Nonlinear SVM, Logistic Regression]

AUC scores of all models for Coursera course

[Figure: three panels; x-axis: Week 1 to 9; y-axis: AUC]

AUC scores of all models for edX course

1 LSTM network performs consistently best, showing that the long-term memory retained by the LSTM block is very effective

2 Vanilla RNN < LSTM network; still among the top 3 methods

3 IOHMMs perform worst; IOHMM 2 > IOHMM 1

4 Baselines ≃ vanilla RNN; not consistent across the two datasets
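All of the comparisons above are reported as AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the fraction of (dropout, non-dropout) pairs ranked correctly by the predicted score, with ties counted as half. The function name `auc` and the labels and scores in the usage note are made up for illustration; this is not the evaluation code behind the reported results.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count 0.5).

    labels: 1 for the positive class (e.g. dropout), 0 otherwise.
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, `auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])` ranks three of the four positive/negative pairs correctly and returns 0.75. The quadratic pair loop keeps the definition visible; a sort-based version would scale better for a full course cohort.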

Outline

8 Conclusion

Conclusion

Contributions:

1 Pioneering research on two learning analytics issues in MOOCs

2 Novel viewpoints on both research issues

3 The experimental results obtained are promising and significant

Take-home Message:

Peer grading:

1 Propose new probabilistic models for cardinal peer grading

2 Novel mechanism for combining cardinal and ordinal models in a common framework

Dropout prediction:

1 View this task as a sequence classification problem

2 Apply various temporal models; the RNN model with LSTM cells achieves a promising performance boost