data archeology - a theory- and context-informed approach to analyzing data traces


Upload: alywise

Post on 06-Aug-2015


DATA ARCHEOLOGY

Image Credit: Pedro Szekely via Flickr (CC BY 2.0), adapted

A THEORY-INFORMED APPROACH TO ANALYZING DATA TRACES OF SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS

ALYSSA WISE

SIMON FRASER UNIVERSITY

OVERVIEW

DATA ARCHEOLOGY – THE BIG IDEA

APPLICATION

– THE ROLE OF “LISTENING” IN ONLINE DISCUSSIONS

– SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS

CONCLUSION

DATA ARCHEOLOGY - THE BIG IDEA

DATA MINING

Image Credit: Scott Clark via Flickr (CC BY 2.0), adapted

DATA GEOLOGY

Image Credit: APS Museum via Flickr (CC BY 2.0), adapted

(SHAFFER, 2013)

DATA ARCHEOLOGY

Image Credit: U.S. Army Corps of Engineers Europe District via Flickr (CC BY 2.0), adapted

(WISE, 2013, 2014)

DATA ARCHEOLOGY

Image Credit: Pedro Szekely via Flickr (CC BY 2.0), adapted

(WISE, 2013, 2014)

DATA ARCHEOLOGY

THEORETICALLY-INFORMED EFFORTS TO MAKE SENSE OF THE DIGITAL ARTIFACTS LEFT BEHIND BY A PRIOR LEARNING “CIVILIZATION”

THEORETICALLY-INFORMED

MOVING BEYOND “MORE IS BETTER” AS A LEARNING MODEL TO PROBE WHAT KINDS OF THINGS ARE BETTER FOR WHAT PURPOSES AND WHY

LEARNING “CIVILIZATION”

ATTENDING TO THE PEDAGOGICAL CONTEXT AS A CRITICAL FRAME FOR INTERPRETING THE PAST ACTIVITY THAT OCCURRED

TEMPORALITY & TRAJECTORIES

THE COMPLETE AGGREGATED DATA RECORD AVAILABLE AT THE END DOESN’T REFLECT THE DYNAMIC ENVIRONMENT IN WHICH THE ACTIVITY OCCURRED

(RE)FRAMING QUESTIONS

FROM

• WHAT TOOLS ARE THERE IN THE ONLINE ENVIRONMENT?

• HOW MUCH DO STUDENTS USE THEM?

TO

• WHAT IS THE PURPOSE OF THE LEARNING ACTIVITIES CONDUCTED IN THE TOOLS?

• WHAT ARE THEORETICALLY DESIRABLE PATTERNS OF PARTICIPATION?

• HOW CAN THESE BEST BE PROXIED BY THE AVAILABLE DATA?

FOR MORE ON CONNECTING LEARNING ANALYTICS + LEARNING DESIGN SEE LOCKYER, HEATHCOTE & DAWSON [2013]

THE ROLE OF “LISTENING” IN ONLINE DISCUSSIONS

AN ONLINE DISCUSSION FORUM IS A TOOL

ITS EDUCATIONAL PURPOSE CAN CHANGE

• Q & A

• Peer Review

• Dialogue

• Reading Response

• Team Decision Making

• Argumentation

DIFFERENT PURPOSES FOR A DISCUSSION FORUM IMPLY DIFFERENT EXPECTATIONS FOR DESIRED PATTERNS OF USE

ONLINE DISCUSSION LEARNING PURPOSE

Externalizing one’s ideas by contributing posts to an online discussion

Taking in the externalizations of others by accessing existing posts

• Social constructivist perspective - online discussions as a forum for learning through dialogue

• Learning occurs as students articulate their ideas, are exposed to the ideas of others, and negotiate differences in perspective

• Focus on how students contribute comments (“speak”), attend to others’ messages (“listen”), and the connections between them

UNDERLYING THEORY OF ONLINE “LISTENING”

Listening not Lurking

Lurker

• Specific person who participates passively

• Accesses existing comments but does not contribute

• Negative connotation

Listening

• Active process conducted by anyone in online discussion

• Activity interrelated with contributing

• Productive element of discussion participation

Listening not Reading

Listening

• Specific term (online discussions)

• Dynamic text, distinct sub-units

• Multi-authored

• Generating a response often involved

Reading

• Generic term (all written text)

• Static, cohesive text

• Single author

• Does not require response

ONLINE DISCUSSION LEARNING MODEL

Speaking

Mechanism for sharing ideas

Value in speaking that is

• Relevant to the topic at hand

• Rationaled with evidence

• Recurring and distributed

• Moderately portioned

• Responsive to the conversation

Listening

Mechanism for becoming aware of ideas

Value in listening that is

• Broad (to consider a diversity of ideas)

• Deep (to consider ideas in earnest)

• Recursive (to provide context for discussion flow)

• Integrated (attending to connected rather than scattered comments)

ONLINE DISCUSSION PEDAGOGICAL CONTEXT

• Group and Timing

– Small group discussions (~8-12 students)

– Random assignment (would be better with differing perspectives)

– Discussions run on a weekly schedule with course

• Task

– Contested real-world challenges (business, edu psychology)

– Given two viable contrasting perspectives, come to consensus

– Share decision with rationale with whole class

• Expectations

– Given criteria / guidelines for speaking and listening

– Assessment varies (individual/group, student/instructor driven)

ONLINE DISCUSSION TECHNOLOGICAL CONTEXT


ONLINE DISCUSSION LISTENING ANALYTICS

| Criteria | Metric | Definition |
|---|---|---|
| Breadth | % posts viewed | Number of unique posts that a student viewed divided by the total number of posts in the discussion |
| Breadth | % posts read | Number of unique posts that a student read divided by the total number of posts in the discussion |
| Depth | % (real) reads | Number of times a student viewed others’ posts at < 6.5 words per second (wps), divided by the total number of views |
| Depth | Av length of real reads (min) | Total time spent reading posts, divided by the number of reads (after scans removed) |
| Recursiveness | # of reviews of others’ posts | Number of times a student revisited posts that they had viewed previously in the discussion |
| Integration | Posts read connected, not scattered | Concentration of posts viewed by a student in the discussion space* [thread-density, network metrics..] |
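As a rough illustration of how the breadth, depth and recursiveness metrics could be computed from view records, here is a minimal Python sketch. The `View` record, the function names and the `total_posts` value are assumptions made for illustration; only the 6.5 wps cutoff and the metric definitions come from the slide, and the four sample views reuse the Read/Scan rows of the sample log shown in this deck. Integration is left out because the slide does not fix a formula for it.

```python
from dataclasses import dataclass

SCAN_SPEED_WPS = 6.5  # cutoff from the slide: slower than this counts as a real read


@dataclass
class View:
    post_id: int
    words: int      # length of the viewed post
    minutes: float  # time the post stayed open


def is_real_read(view: View) -> bool:
    """A view is a real read when its speed is below 6.5 words per second."""
    seconds = view.minutes * 60
    return seconds > 0 and view.words / seconds < SCAN_SPEED_WPS


def listening_metrics(views: list[View], total_posts: int) -> dict:
    """Compute the slide's listening metrics for one student.

    `views` is assumed to be in chronological order and to contain
    only views of other students' posts.
    """
    reads = [v for v in views if is_real_read(v)]
    seen: set[int] = set()
    reviews = 0
    for v in views:
        if v.post_id in seen:  # revisit of a previously viewed post
            reviews += 1
        seen.add(v.post_id)
    return {
        "pct_posts_viewed": len(seen) / total_posts,
        "pct_posts_read": len({v.post_id for v in reads}) / total_posts,
        "pct_real_reads": len(reads) / len(views) if views else 0.0,
        "avg_real_read_min": sum(v.minutes for v in reads) / len(reads) if reads else 0.0,
        "num_reviews": reviews,
    }


# Four views taken from the deck's sample log; total_posts=10 is hypothetical.
sample_views = [
    View(447, 413, 44.43),
    View(455, 60, 1.73),
    View(459, 117, 0.23),  # about 8.5 wps, so this one is a scan
    View(460, 413, 12.51),
]
metrics = listening_metrics(sample_views, total_posts=10)
print(metrics)
```

With these inputs three of the four views qualify as real reads, which matches the Read/Scan labels in the sample log.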


ONLINE DISCUSSION SPEAKING ANALYTICS

| Criteria | Metric | Definition |
|---|---|---|
| Recurring | Number of posts | Total number of posts a student contributed to the discussion |
| Recurring | Percent of sessions with posts | Number of sessions in which a student made a post, divided by their total number of sessions |
| Moderately Portioned | Average post length | Total number of words posted by a student divided by the number of posts they made to the discussion |
| Responsive | Depth of response to existing conversation | 0 None; 1 Acknowledging; 2 Responding to an idea; 3 Responding to multiple ideas |
| Rationaled | Degree of argumentation | 0 No argumentation; 1 Unsupported argumentation (Position only); 2 Simple argumentation (Position + Reasoning/Evidence); 3 Complex argumentation (Position + Reasoning/Evidence + Qualifier/Rebuttal) |
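The three count-based speaking metrics can be sketched in a few lines of Python. The function name, the `(session_id, word_count)` input shape and the sample values are hypothetical; the Responsive and Rationaled scores are 0-3 codes assigned by human raters according to the slide, so they are not computed here.

```python
def speaking_metrics(posts: list[tuple[int, int]], total_sessions: int) -> dict:
    """Count-based speaking metrics for one student.

    `posts` holds one (session_id, word_count) pair per post the student made;
    `total_sessions` is the student's total number of sessions.
    """
    num_posts = len(posts)
    sessions_with_posts = len({session_id for session_id, _ in posts})
    total_words = sum(words for _, words in posts)
    return {
        "num_posts": num_posts,
        "pct_sessions_with_posts": (
            sessions_with_posts / total_sessions if total_sessions else 0.0
        ),
        "avg_post_length": total_words / num_posts if num_posts else 0.0,
    }


# Hypothetical student: three posts across two of five sessions.
example = speaking_metrics([(2, 120), (2, 80), (4, 100)], total_sessions=5)
print(example)
```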

SOME RESULTS

[WISE, HSIAO ET AL. 2012] [WISE, PERERA ET AL., 2012] [WISE, SPEER ET AL. 2013]

| Depth (% of real reads) | Breadth (% of posts viewed): Low | Breadth (% of posts viewed): High |
|---|---|---|
| Low | Disregardful | Coverage |
| High | Focused | Thorough |

Un-engaged / Engaged
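The two-by-two typology can be read as a simple lookup on breadth and depth. The sketch below is only an illustration of that reading: the function name and the 0.5 cut points are hypothetical, since the slides do not say how "low" and "high" were separated.

```python
def listening_pattern(
    pct_posts_viewed: float,
    pct_real_reads: float,
    breadth_cut: float = 0.5,  # hypothetical cut point, not from the slides
    depth_cut: float = 0.5,    # hypothetical cut point, not from the slides
) -> str:
    """Map breadth and depth onto the four listening patterns."""
    high_breadth = pct_posts_viewed >= breadth_cut
    high_depth = pct_real_reads >= depth_cut
    if high_depth:
        return "Thorough" if high_breadth else "Focused"
    return "Coverage" if high_breadth else "Disregardful"


print(listening_pattern(0.9, 0.2))  # many posts viewed, mostly scanned
```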

SOME MORE RESULTS

[WISE, HAUSKNECHT & ZHAO, 2014]

Greater

• Listening Depth (% of real reads)

• Listening Recursiveness (# reviews of others’ posts)

Associated with

• More Rationaled Speaking

• More Responsive Speaking

Listening Breadth not associated with any speaking qualities in the study. Less important for current pedagogical design?

FLESHING OUT TYPOLOGIES

| Pattern | Characteristic Behaviors |
|---|---|
| Disregardful | Minimal attention to others’ posts (few posts viewed; short time viewing). Brief and relatively infrequent sessions of activity in discussions. |
| Coverage | Views a large proportion of others’ posts, but spends little time attending to them (often only scanning the contents). Short but frequent sessions of activity, focusing primarily on new posts. *May be socially-oriented or content-driven. |
| Focused | Views a limited number of others’ posts, but spends substantial time attending to them. Few extended sessions of activity in discussions. |
| Thorough | Views a large proportion of others’ posts; spends substantial time attending to many of them. Long overall time spent listening; considerable revisitation of posts already read. |

GROUND TRUTHING VIA TEMPORAL MICROANALYTIC CASE STUDIES

| Date | Time | Session | Action | Duration (min) | Length (words) | Message # |
|---|---|---|---|---|---|---|
| 6/3/2011 | 23:46 | 1 | Read | 44.43 | 413 | 447 |
| 6/3/2011 | 23:52 | 1 | Read | 1.73 | 60 | 455 |
| 6/4/2011 | 00:08 | 1 | Scan | 0.23 | 117 | 459 |
| 6/4/2011 | 00:09 | 1 | Read | 12.51 | 413 | 460 |
| 6/4/2011 | 23:49 | 2 | Post | 3.18 | 120 | 477 |

TAKEAWAY

ATTENDING TO THE PEDAGOGICAL CONTEXT OF DISCUSSION FORUM USE AND CRAFTING THEORETICALLY INFORMED METRICS LET US EXTRACT EXPLANATORY AND ACTIONABLE INFORMATION FROM THE CLICKSTREAM DATA

SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS

WITH THANKS TO MY RESEARCH ASSISTANT YI CUI

CHALLENGES WE SET OURSELVES FOR LOOKING AT THE MOOC DATA

• LOOK AT SOCIAL INTERACTION

• ADDRESS ISSUES OF SCALE

• EMPLOY NATURAL LANGUAGE PROCESSING*

• ATTEND TO PEDAGOGICAL CONTEXT

• WORK IN A THEORY-INFORMED WAY

SOCIAL INTERACTION IN MOOC FORUMS

• STRONG PREDICTOR OF PERSISTENCE, BUT THIS MAY BE BECAUSE IT INDEXES (RATHER THAN CAUSES) ENGAGEMENT – WHAT ABOUT LEARNING?

• CLAIMED TO PROVIDE CRITICAL SOCIAL LEARNING SUPPORT, BUT WITHOUT TIES TO THE ACADEMIC CONTENT, SOCIABILITY MAY NOT IMPACT LEARNING [KUH, 2002; WISE, DEL VALLE, CHANG & DUFFY, 2004]

• ONLY A SMALL % PARTICIPATE, BUT THIS IS NOT SURPRISING IF IT IS NOT DESIGNED INTO A COURSE. HOW PEOPLE PARTICIPATE IS AS IMPORTANT AS IF THEY DO SO.

WHY FOCUS ON LEARNING NOT ATTRITION?

• STRONG NEED TO RECONCEPTUALISE PERSISTENCE AND ATTRITION IN MOOCS GIVEN THE NUMBER OF PEOPLE WHO REGISTER W/O “AN INFORMED COMMITMENT TO COMPLETE THE COURSE” [DEBOER, HO, STUMP & BRESLOW, 2014]

• GREAT VARIETY IN INTENTIONS, WORKING PATTERNS, RESOURCES USED, SEQUENCE AND FREQUENCY OF USE [DEBOER ET AL., 2014; KIZILCEC, PIECH & SCHNEIDER, 2013]

• JUST INDEXING LEVEL OF ENGAGEMENT TO PREDICT WHO WILL STOP PARTICIPATING DOESN’T TELL US WHY OR HOW TO INTERVENE

• LEARNING MAY BE OCCURRING EVEN FOR THOSE WHO DON’T EVENTUALLY COMPLETE


“I'm very happy to be in this course. I [couldn’t] finish it on time, but I think I have learnt a lot. Thank you Prof X, you are a great teacher, very [professional], excellent in many ways. I will miss you!”


FRAMING QUESTIONS

• WHAT WAS THE PEDAGOGICAL PURPOSE / DESIGN OF THE DISCUSSION FORUMS IN THE PSYCH MOOC?

• BASED ON THIS, WHAT WERE THEORETICALLY DESIRABLE PATTERNS OF PARTICIPATION?

• HOW CAN THESE BEST BE PROXIED BY THE AVAILABLE DATA?

• HOW COULD THE DESIRED PATTERNS BE BETTER SUPPORTED?

MOOC PEDAGOGICAL CONTEXT

• Course Topic

– Introductory Psychology

• Level and Expected Background

– Designed for college freshmen

– Equivalent of high school education expected

– No specific prior knowledge indicated

• Course Design

– Video lectures (8-15 min long)

– Readings from OLI (Open Learning Initiative) online textbook

– Weekly timed multiple choice quiz

– Final exam at the end of the course

WHAT ABOUT THE DISCUSSION FORUMS?

• Optional part of the course; main pedagogical design was a Q&A forum to ask and answer questions about course material

Communication

“There will be a Q&A forum where you can post your questions about the course. Students will have the opportunity to "vote up" questions they want answered, and the questions with the most votes will be answered either in a forum post or a video.”

….

Expectations

“Participants are expected to seek help if needed from your fellow students by using the forums”

RECREATED STUDENT FORUM VIEW

Forums

Welcome to the course discussion forums.

| Sub-forum | Description | Activity |
|---|---|---|
| General Discussion | Discuss general aspects of the course. | |
| Q&A | Ask and answer questions about course material. | |
| Assignments | Discuss details of the course assignments. | |
| Technical Issues | Post any issues with, or questions about, technical aspects of the course website (trouble with video playback, broken links, etc.). | |
| OLI Textbook Questions | Post any issues with, or questions about, technical aspects of the OLI Textbook. | |
| Student Bios | Introduce yourself and learn about other students. | |

RECREATED STUDENT FORUM VIEW

Forums

Welcome to the course discussion forums.

| Sub-forum | Description | Activity: Threads (Posts + Comments) |
|---|---|---|
| General Discussion | Discuss general aspects of the course. | 289 (1341+804) |
| Q&A | Ask and answer questions about course material. | 158 (525+204) |
| Assignments | Discuss details of the course assignments. | 147 (827+775) |
| Technical Issues | Post any issues with, or questions about, technical aspects of the course website (trouble with video playback, broken links, etc.). | 108 (318+79) |
| OLI Textbook Questions | Post any issues with, or questions about, technical aspects of the OLI Textbook. | 99 (347+106) |
| Student Bios | Introduce yourself and learn about other students. | 662 (1614+354) |

Notes: [1] Counts only include non-deleted posts/threads [2] Counts taken prior to data cleaning, may include duplicate or nonsense posts

PROCESS CHECK

DOES IT MAKE SENSE TO USE LISTENING AND SPEAKING THEORY IN THIS CONTEXT?

• PEDAGOGICAL CHALLENGE

– MOST OF THE DISCUSSION ISN’T CONTENT RELATED, LISTENING ISN’T EXPECTED TO RELATE TO LEARNING

• TECHNICAL CHALLENGE

– LOW GRANULARITY DATA (“VIEW.FORUM” & “VIEW.THREAD” VS. “VIEW.POST”, THOUGH “VOTE.UP” NOW AVAILABLE)

• PRACTICAL CHALLENGE

– MANY THREADS IN Q&A FORUM NOT ACTUALLY CONTENT

CHANGING TRACKS

A WHOLE BUNCH OF QUESTIONS WE THOUGHT WE WERE GOING TO ASK WENT OUT THE WINDOW…

• NEW (VERY BASIC) FOCUS ON IF THE PATTERNS OF FORUM USE MATCHED THOSE DESIRED FOR THE INTENDED PURPOSE

– DID STUDENTS USE THE Q&A FORUM TO ASK QUESTIONS ABOUT THE COURSE MATERIAL?

– DID THE INSTRUCTORS REPLY TO (THE HIGHEST VOTED) QUESTIONS ABOUT THE COURSE MATERIAL?

DID STUDENTS USE THE Q&A FORUM TO ASK QUESTIONS ABOUT THE COURSE MATERIAL?

• After preliminary inspection we decided to code both the Q&A and General Discussion (GD) forums because no clear functional difference was seen between them

• Two raters coded the starting post in each thread as either

– Content [C] (asking questions about course material; expanding on course content; discussing a resource shared)

– Non-Content [X] (including logistics, social, study group formation and link sharing)

• 439 of 447 total threads coded

– 8 removed for foreign language or complete nonsense contents

– 92% agreement (k=0.81); all differences reconciled, rule of leniency
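For readers unfamiliar with the agreement statistic, the sketch below shows how percent agreement and Cohen's kappa are computed for two raters assigning [C] / [X] codes. The function name and the ten sample codes are hypothetical and do not reproduce the study's data; they only illustrate why kappa is lower than raw agreement once chance agreement is removed.

```python
from collections import Counter


def agreement_and_kappa(rater_a: list[str], rater_b: list[str]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two equal-length code lists."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same code independently.
    expected = sum(
        counts_a[code] * counts_b[code] for code in set(counts_a) | set(counts_b)
    ) / (n * n)
    # Note: undefined (division by zero) if both raters use a single identical code.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical codes for ten thread-starting posts.
codes_a = ["C", "C", "C", "X", "X", "X", "X", "X", "X", "X"]
codes_b = ["C", "C", "X", "X", "X", "X", "X", "X", "X", "C"]
observed, kappa = agreement_and_kappa(codes_a, codes_b)
print(round(observed, 2), round(kappa, 2))
```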

Image: So Many MOOCs by mksmith23, CC by 2.0 license

DID STUDENTS USE THE Q&A FORUM TO ASK QUESTIONS ABOUT THE COURSE MATERIAL?

| Forum | Content Threads | Non-Content Threads |
|---|---|---|
| General Discussion | 55 | 226 |
| Q&A | 68 | 90 |
| Total | 123 (28%) | 316 (72%) |


DID THE INSTRUCTORS REPLY TO (THE HIGHEST VOTED) QUESTIONS ABOUT THE COURSE MATERIAL?

• First approach: “Instructor Replied” label [problematic]

• 2 “official” Instructor IDs (threads automatically labelled)

– Course Professor [2XXXXX4]

– Course TA [5XXXX1]

• 1 “unofficial” Instructor ID (threads not automatically labelled)

– Course Professor [2XXXXX0]

“[Yes,] I really am NAME2XXXXX0 (XXXX is my first name) and am the instructor for the course. I've been at UNIVERSITY for 43 years and love teaching. This course was a challenge because there was no feedback from students when the modules were being taped. The lack of student interaction is the real challenge of a MOOC. Just looking at a camera is a very different context than looking at a classroom of bright students. NAME2XXXXX0, Instructor”


DID THE INSTRUCTORS REPLY TO (THE HIGHEST VOTED) QUESTIONS ABOUT THE COURSE MATERIAL?

| Forum | Threads | Instructor Replied (All 3 IDs) | Content Threads | Content Threads Replied by Instructor | % of Instructor Replies Directed at Content |
|---|---|---|---|---|---|
| General Discussion | 289 | 31 (11%) | 55 | 5 (9%) | 16% |
| Q&A | 158 | 31 (20%) | 68 | 17 (25%) | 55% |
| Total | 447 | 62 (14%) | 123 | 22 (18%) | 35% |

DID THE INSTRUCTORS REPLY TO (THE HIGHEST VOTED) QUESTIONS ABOUT THE COURSE MATERIAL?

| | Instructor Replied (62 threads) | Non-Replied (377 threads) |
|---|---|---|
| Average # (range) of votes | 2.6 (0 to 30) | 1.7 (-13 to 45) |
| Av # (range) of posts+comments | 8.6 (2-110) | 6.2 (1-92) |
| Av # (range) views | 110 (25-1185) | 75 (5-1143) |

| | Content Threads | Non-Content Threads |
|---|---|---|
| Av # votes | 1.2 | 2.1 |
| Av # posts+comments | 4.1 | 7.5 |
| Av # views | 51 | 91 |

A CORE CHALLENGE FOR SOCIAL INTERACTION AT SCALE

• Too much quantity, not enough quality

• Students get lost / overwhelmed in the abundance of communication

• Instructors too, challenging to find where their input is needed

• A need to separate “the wheat from the chaff”


CAN NATURAL LANGUAGE PROCESSING HELP?

• Goal: support the instructor in finding content threads more efficiently in the forums, to be able to respond and facilitate learning

• A modest attempt to build a proof-of-concept model

– Feature extraction performed with basic bag-of-words feature set (inc. bigrams, trigrams and parts-of-speech tagging), rare threshold of 5

– Unigrams and bigrams alone most useful to characterize and model posts

– Total of 1573 features extracted
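A bag-of-words extraction of this kind can be sketched with the Python standard library alone. Everything named here is an assumption for illustration: the tokenizer, the function names, the reading of "rare threshold of 5" as "keep features that occur in at least 5 posts", and the `BOL` (beginning-of-line) marker, which is inferred from feature names such as `BOL_hi` in the deck's feature table. The slides do not name the tool that was actually used.

```python
import re
from collections import Counter


def extract_features(text: str) -> set[str]:
    """Binary unigram and bigram features, with a BOL marker before the first token."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    features = set(tokens)
    padded = ["BOL"] + tokens
    features.update(f"{a}_{b}" for a, b in zip(padded, padded[1:]))
    return features


def build_vocabulary(posts: list[str], rare_threshold: int = 5) -> list[str]:
    """Keep features seen in at least `rare_threshold` posts (assumed meaning)."""
    doc_freq: Counter = Counter()
    for post in posts:
        doc_freq.update(extract_features(post))
    return sorted(f for f, n in doc_freq.items() if n >= rare_threshold)


def vectorize(text: str, vocabulary: list[str]) -> list[int]:
    """Binary presence vector for one post over the vocabulary."""
    features = extract_features(text)
    return [1 if f in features else 0 for f in vocabulary]


# Hypothetical mini-corpus of thread-starting posts.
posts = ["Hi everyone, when is the final exam?"] * 5 + ["Why does age affect memory?"] * 2
vocab = build_vocabulary(posts)
vec = vectorize("Hi, is the final exam hard?", vocab)
print(len(vocab), sum(vec))
```

In this toy corpus the second post appears only twice, so none of its features survive the threshold; a vector like `vec` is what a linear classifier would then take as input.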


CHARACTERISTIC FEATURES

Content Threads

| Feature | Kappa | Feature | Kappa |
|---|---|---|---|
| but | 0.25 | and_the | 0.16 |
| more | 0.24 | question | 0.16 |
| by | 0.24 | between | 0.16 |
| in_the | 0.24 | age | 0.16 |
| why | 0.24 | correct | 0.16 |
| as | 0.23 | than | 0.16 |
| what | 0.22 | were | 0.15 |
| is | 0.22 | by_the | 0.15 |
| that | 0.21 | an | 0.15 |
| or | 0.20 | answer | 0.15 |
| then | 0.20 | does | 0.14 |
| in | 0.19 | mental | 0.14 |
| when | 0.18 | research | 0.14 |
| of_the | 0.17 | to_the | 0.14 |
| of | 0.17 | their | 0.14 |

Non-Content Threads

| Feature | Kappa | Feature | Kappa |
|---|---|---|---|
| course | 0.12 | BOL_hello | 0.04 |
| i | 0.09 | hello | 0.04 |
| my | 0.08 | everyone | 0.04 |
| this_course | 0.07 | will | 0.04 |
| final | 0.06 | final_exam | 0.04 |
| BOL_i | 0.06 | i_just | 0.04 |
| quiz | 0.06 | hi | 0.04 |
| the_course | 0.06 | i_can | 0.04 |
| exam | 0.05 | find | 0.04 |
| thanks | 0.05 | coursera | 0.03 |
| i_am | 0.05 | courses | 0.03 |
| videos | 0.04 | i_have | 0.03 |
| grade | 0.04 | grades | 0.03 |
| certificate | 0.04 | material | 0.03 |
| BOL_hi | 0.04 | quizzes | 0.03 |

PREDICTING CONTENT POSTS

• Procedure

– Algorithm: Support Vector Machines

– Setting for Nominal Class Values: LibLINEAR

– Cross-validation, 10 randomly generated folds

• Results

– Best Model Accuracy/Kappa = 0.86/0.64

– Recall = 0.71 (False Neg. rate = 0.29)

– Precision = 0.76
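To make the relationship between these four figures concrete, the sketch below derives them from a confusion matrix. The slides do not report the matrix itself; the counts used here (87 true positives, 27 false positives, 36 false negatives, 289 true negatives over the 439 coded threads, 123 of them content) are an assumed reconstruction chosen only because they reproduce the rounded values, and the function name is hypothetical.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, Cohen's kappa, recall and precision for the positive (content) class."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    # Chance agreement between the true labels and the predicted labels.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "recall": recall,
        "precision": precision,
        "false_negative_rate": 1 - recall,
    }


# Assumed counts, not taken from the slides.
m = binary_metrics(tp=87, fp=27, fn=36, tn=289)
print({name: round(value, 2) for name, value in m.items()})
```

Under this assumed matrix the model flags 87 + 27 = 114 threads as content, so an instructor reading only flagged threads would miss 36 of the 123 content threads.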


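STANDARD FORUM INDICATORS DON’T HELP IDENTIFY CONTENT

| Model | Accuracy | Kappa | Recall | Precision |
|---|---|---|---|---|
| Base Model | 0.86 | 0.64 | 0.71 | 0.76 |
| Base Model + # votes | 0.84 | 0.60 | 0.68 | 0.74 |
| Base Model + # posts | 0.85 | 0.62 | 0.69 | 0.75 |
| Base Model + # views | 0.85 | 0.62 | 0.69 | 0.76 |

The last three rows show the addition of standard forum indicators to the base model.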


INSTRUCTOR PERSPECTIVE

| | W/o Content Model (Default) | With Content Model |
|---|---|---|
| Total Number of Potential Content Threads to Read | 439 (37/wk on av) | 114 (10/wk on av) |
| Percent of Threads Actually About Course Content | 28% | 76% |
| Percent of Content Threads With Instructor Replies | 18% | >18%? |
| Percent of Instructor Replies Addressing Content | 35% | >35%? |


STRENGTHS, LIMITATIONS & FUTURE OPPORTUNITIES

• Design of forums can improve, but unexpected use will still happen. For instructors to facilitate learning, the first step is to locate where learning opportunities are happening – content modeling can help.

– Aligns well with Coursera’s development of content / logistics TAs.

• Model is simple but useful; more sophisticated modelling can improve these results.

• Model built with only 439 starting posts; including all the posts could lead to both better prediction of whether a post is content-related and more nuanced assessment of threads (e.g. “This thread is estimated to have 87% content-related posts”)

• Model seemed not to draw heavily on domain-specific vocabulary but may rely on domain-specific discourse types (extensibility to other social sciences but perhaps not humanities / hard sciences)


TAKEAWAY

ATTENDING TO THE PEDAGOGICAL CONTEXT OF DISCUSSION FORUM USE AND GETTING CLOSE TO THE DATA LET US DEVELOP A SIMPLE YET APPROPRIATE AND USEFUL MODEL - SUPPORTING CONTENT RELATED LEARNING DISCUSSION MAY BE PRE-REQUISITE TO STUDYING MORE COMPLEX FACETS OF INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS

CONCLUSION

A DATA ARCHEOLOGY APPROACH THAT PAYS ATTENTION TO THE LEARNING “CIVILIZATION” THAT CREATED THE DATA AND POSITS THEORY-INFORMED PATTERNS OF BEHAVIOR CAN HELP US BETTER UNDERSTAND AND SUPPORT SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS
