Data Mining on Social Networks for Students’ Learning Experiences


Upload: biplab-debnath

Post on 11-Feb-2017


© Biplab C. Debnath ICT 6522: Data Warehousing and Mining 10th August, 2016

IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, VOL. 7, NO. 3, JULY-SEPTEMBER 2014

Mining Social Media Data for Understanding Students’ Learning Experiences

Xin Chen, Student Member, IEEE, Mihaela Vorvoreanu, and Krishna Madhavan

Presented By

Biplab Chandra Debnath

ID: 1015312004

Institute of Information and Communication Technology (IICT)

Bangladesh University of Engineering and Technology (BUET)


Contents

Objectives

Introduction

Related Work

Data Collection

Inductive Content Analysis

Naïve Bayes Multi-Label Classifier

Comparison Experiment

Detect Students’ Problems From Purdue Data Set

Limitations and Future Work

Conclusion


Objectives

Demonstrating a workflow of social media data sense-making for educational data mining.

Integrating qualitative analysis with large-scale data mining techniques.

Exploring engineering students’ informal conversations on Twitter.

Understanding the issues and problems students encounter in their learning experiences.


Introduction


Related Work

Public Discourse on the Web

Goffman’s theory (the notion of front-stage and back-stage in people’s social performances)

Mining Twitter Data

Analysis of tweets with the hashtag #iranElection

Popular classification models (decision tree, logistic regression, maximum entropy, boosting, SVM)

Learning Analytics and Educational Data Mining

CMS, VLE, EDM (blackboard.com)

Identifying students’ academic performance


Data Collection

Radian6 (http://www.salesforce.com/)

Twitter APIs

Keywords: engineer, students, campus, class, homework, professor, and lab.

The Twitter hashtag #engineeringProblems occurred most frequently.

25,284 tweets with the hashtag #engineeringProblems, posted from 10,239 unique Twitter accounts.

Only 2,785 tweets were considered.

39,095 tweets with the hashtag #engineeringProblems, posted from 5,592 unique Twitter accounts (Purdue University).
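The counting behind figures like these can be sketched as follows. This is only an illustration, assuming the tweets are already available as (account, text) pairs; the sample records and the function name `hashtag_stats` are hypothetical and do not come from the paper or from Radian6.

```python
import re

def hashtag_stats(tweets, hashtag):
    """Return (number of tweets containing the hashtag, number of unique accounts)."""
    # Match the hashtag case-insensitively and only as a whole tag,
    # so that a longer tag sharing the same prefix is not counted.
    pattern = re.compile(r"(?<!\w)" + re.escape(hashtag) + r"(?!\w)", re.IGNORECASE)
    matching = [(account, text) for account, text in tweets if pattern.search(text)]
    accounts = {account for account, _ in matching}
    return len(matching), len(accounts)

# Hypothetical sample records, for illustration only.
sample = [
    ("user_a", "Three labs due tomorrow #engineeringProblems"),
    ("user_b", "Great weather on campus today"),
    ("user_a", "No sleep again #EngineeringProblems"),
    ("user_c", "#engineeringProblemsAreFun is a different tag"),
]

count, unique_accounts = hashtag_stats(sample, "#engineeringProblems")
print(count, unique_accounts)  # 2 1
```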


Development of Categories

Non-mutually exclusive categories


Naïve Bayes Multi-Label Classification

The Naïve Bayes classifier is effective on this data set compared with other multi-label classifiers.

Text Pre-Processing

Naïve Bayes multi-label classifier

Evaluation Measures for Multi-Label Classifier

Classification Result


Text Pre-Processing

Remove all hashtags, negative emotions, and repeated letters (e.g., huuungryyy).

Apply the Krovetz stemmer from the Lemur information retrieval toolkit.

Remove the common stop words (much, more, all, always, still, only).
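A minimal sketch of part of this pre-processing is given below. It covers only hashtag removal, repeated-letter collapsing, and stop-word filtering. Stemming is left out because the Krovetz stemmer is not in the Python standard library, and the handling of negative emotions is left out because the slide gives no detail on it. The rule that three or more identical letters collapse to one, and the choice to drop the whole hashtag rather than only the `#` sign, are assumptions.

```python
import re

# Stop words exactly as listed on the slide; a real list would be longer.
STOP_WORDS = {"much", "more", "all", "always", "still", "only"}

def preprocess(tweet):
    """Return a list of cleaned lowercase tokens for one tweet."""
    text = tweet.lower()
    # Drop hashtags entirely (assumption: the tag text goes too, not just '#').
    text = re.sub(r"#\w+", " ", text)
    # Collapse runs of three or more identical letters to a single letter
    # (assumption), so "huuungryyy" becomes "hungry".
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    # Keep alphabetic tokens only, then filter out the stop words.
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Still sooo huuungryyy after all this homework #engineeringProblems"))
# ['so', 'hungry', 'after', 'this', 'homework']
```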


Naïve Bayes Multi-Label Classifier
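The formulas on these slides did not survive in the transcript, so the following is only a sketch of one common way to build a Naïve Bayes classifier for non-mutually exclusive categories: train one binary classifier per category and assign every category whose posterior probability reaches a threshold. The add-one smoothing, the 0.5 threshold, the class and function names, and the toy tweets are all assumptions. The category names are taken from the conclusion slide.

```python
import math
from collections import Counter

class BinaryNaiveBayes:
    """Multinomial Naive Bayes for one category versus the rest."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: list of booleans (in category or not).
        self.vocab = {w for d in docs for w in d}
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
            self.n_docs[y] += 1
        return self

    def _log_joint(self, tokens, y):
        # log P(y) + sum of log P(w | y), with add-one (Laplace) smoothing.
        total = sum(self.n_docs.values())
        log_p = math.log((self.n_docs[y] + 1) / (total + 2))
        denom = sum(self.counts[y].values()) + len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # ignore words never seen in training
                log_p += math.log((self.counts[y][w] + 1) / denom)
        return log_p

    def prob(self, tokens):
        # Posterior P(category | tokens), normalised over the two classes.
        pos = self._log_joint(tokens, True)
        neg = self._log_joint(tokens, False)
        m = max(pos, neg)
        return math.exp(pos - m) / (math.exp(pos - m) + math.exp(neg - m))

def train_multilabel(docs, label_sets, categories):
    # One independent binary classifier per category.
    return {c: BinaryNaiveBayes().fit(docs, [c in ls for ls in label_sets])
            for c in categories}

def predict_multilabel(models, tokens, threshold=0.5):
    # A tweet may receive zero, one, or several categories.
    return {c for c, m in models.items() if m.prob(tokens) >= threshold}

# Hypothetical toy data, for illustration only.
docs = [["no", "sleep", "homework"], ["exam", "homework", "stress"],
        ["no", "friends", "weekend"], ["sleep", "late", "lab"]]
label_sets = [{"sleep problems", "heavy study load"}, {"heavy study load"},
              {"lack of social engagement"}, {"sleep problems"}]
categories = ["heavy study load", "lack of social engagement", "sleep problems"]

models = train_multilabel(docs, label_sets, categories)
predicted = predict_multilabel(models, ["sleep", "homework"])
print(sorted(predicted))  # ['heavy study load', 'sleep problems']
```

Treating each category as its own yes/no decision is what lets a single tweet carry several labels at once.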


Example Based Classification Measures
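The formulas on these slides are missing from the transcript. The sketch below uses the commonly cited example-based definitions, which may differ in detail from the ones the presenter showed: for each tweet, compare the true label set Y with the predicted set Z, then average over all tweets. The function name, the convention for empty sets, and the sample labels are assumptions.

```python
def example_based_measures(true_sets, pred_sets):
    """Average per-tweet accuracy, precision, recall and F1 over all tweets."""
    n = len(true_sets)
    acc = prec = rec = f1 = 0.0
    for y, z in zip(true_sets, pred_sets):
        inter = len(y & z)
        union = len(y | z)
        # Convention (assumption): a ratio with an empty denominator counts
        # as 1 when both sets are empty, otherwise as 0.
        acc += inter / union if union else 1.0            # |Y ∩ Z| / |Y ∪ Z|
        prec += inter / len(z) if z else (1.0 if not y else 0.0)  # |Y ∩ Z| / |Z|
        rec += inter / len(y) if y else (1.0 if not z else 0.0)   # |Y ∩ Z| / |Y|
        f1 += 2 * inter / (len(y) + len(z)) if (y or z) else 1.0
    return {"accuracy": acc / n, "precision": prec / n,
            "recall": rec / n, "f1": f1 / n}

# Hypothetical labels for two tweets.
true_sets = [{"heavy study load", "sleep problems"}, {"heavy study load"}]
pred_sets = [{"heavy study load"}, {"heavy study load", "sleep problems"}]

scores = example_based_measures(true_sets, pred_sets)
print(scores["accuracy"], scores["precision"], scores["recall"])  # 0.5 0.75 0.75
```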


Label-Based Evaluation Measures

Macro-averaged F1 is higher for classifiers that work better on smaller categories.

Label-based accuracy is not a very effective measure because it does not account for label imbalance.
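The difference between the two averages can be seen in a small sketch. It uses the standard definitions of micro- and macro-averaged F1 (pool the counts over all categories versus average the per-category scores), which are assumed here because the slide formulas are missing. The function name and the sample labels are hypothetical.

```python
def label_based_f1(true_sets, pred_sets, categories):
    """Return (micro-averaged F1, macro-averaged F1) over the given categories."""
    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    totals = [0, 0, 0]   # pooled true positives, false positives, false negatives
    per_label = []
    pairs = list(zip(true_sets, pred_sets))
    for c in categories:
        tp = sum(1 for y, z in pairs if c in y and c in z)
        fp = sum(1 for y, z in pairs if c not in y and c in z)
        fn = sum(1 for y, z in pairs if c in y and c not in z)
        per_label.append(f1(tp, fp, fn))
        totals = [totals[0] + tp, totals[1] + fp, totals[2] + fn]
    micro = f1(*totals)                      # large categories dominate
    macro = sum(per_label) / len(per_label)  # every category weighs the same
    return micro, macro

# Hypothetical case: the large category is always right, the small one is missed.
true_sets = [{"heavy study load"}] * 4 + [{"sleep problems"}]
pred_sets = [{"heavy study load"}] * 4 + [set()]

micro, macro = label_based_f1(true_sets, pred_sets,
                              ["heavy study load", "sleep problems"])
print(round(micro, 3), macro)  # 0.889 0.5
```

Missing the small category barely moves the micro average but halves the macro average, which is why macro-averaged F1 rewards classifiers that do well on smaller categories.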

Comparison Experiment: SVM and M3L


Same training and testing data sets.

The one-versus-all SVM multi-label classifier assigned every tweet to “not in the category” for all categories.

The Max-Margin Multi-Label (M3L) classifier takes label correlation into account.

Its performance is better than that of the simplistic one-versus-all SVM classifier.

It is still not as good as the Naïve Bayes classifier, because SVM is not a probabilistic model.


Detect Students’ Problems From Purdue Data Set


Limitations

First, not all students are active on Twitter.

Second, only the negative aspects of learning experiences were considered, not the positive ones.

Third, only the prominent themes with a relatively large number of tweets in the data were identified.

Fourth, the qualitative analysis reveals that there are correlations among the themes.


Future Work

First, the “manipulation” of personal image online may need to be taken into consideration in future work.

Second, future work can compare both the good and bad things to investigate the tradeoffs with which students struggle.

Third, future work can design more sophisticated algorithms to reveal the hidden information in the “long tail”.

Fourth, future work could specifically address the correlations among these student problems.


Conclusion

Through a qualitative content analysis, we found that engineering students are largely struggling with a heavy study load and are not able to manage it successfully.

A heavy study load leads to many consequences, including lack of social engagement, sleep problems, and other psychological and physical health problems.

This detector can be applied as a monitoring mechanism to identify at-risk students.