Data Mining on Social Networks for Students' Learning Experiences
© Biplab C. Debnath ICT 6522: Data Warehousing and Mining 10th August, 2016
IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, VOL. 7, NO. 3, JULY-SEPTEMBER 2014
Mining Social Media Data for Understanding
Students’ Learning Experiences
Xin Chen, Student Member, IEEE, Mihaela Vorvoreanu, and Krishna Madha
Presented By
Biplab Chandra Debnath
ID: 1015312004
Institute of Information and Communication Technology (IICT)
Bangladesh University of Engineering and Technology (BUET)
Contents
Objectives
Introduction
Related Work
Data Collection
Inductive Content Analysis
Naïve Bayes Multi-Label Classifier
Comparison Experiment
Detect Students Problems From Purdue Data Set
Limitations and Future Work
Conclusion
Objectives
Demonstrating a workflow of social media data sense-making for educational data mining.
Integrating both qualitative analysis and large-scale data mining techniques.
Exploring engineering students' informal conversations on Twitter.
Understanding issues and problems students encounter in their learning experiences.
Related Work
Public Discourse on the Web
Goffman’s theory (notion of front-stage and back-stage of people’s
social performances)
Mining Twitter Data
Analyze tweets with the hashtag #iranElection
Popular classification models (decision tree, logistic regression, maximum entropy, boosting, SVM)
Learning Analytics and Educational Data Mining
CMS, VLE, EDM (blackboard.com)
Identify students' academic performance
Data Collection
Radian6 (http://www.salesforce.com/)
Twitter APIs
Keywords: engineer, students, campus, class, homework, professor, and lab.
Twitter hashtag #engineeringProblems occurring most frequently
25,284 tweets with the hashtag #engineeringProblems posted from 10,239 unique Twitter accounts.
Considering only 2,785 tweets
39,095 tweets with the hashtag #engineeringProblems posted from 5,592 unique Twitter accounts (Purdue University)
Development of Categories
Non-mutually exclusive categories
Naïve Bayes Multi-Label Classification
The Naïve Bayes classifier is more effective on this data set than other multi-label classifiers.
Text Pre-Processing
Naïve Bayes multi-label classifier
Evaluation Measures for Multi-Label Classifier
Classification Result
Text Pre-Processing
Remove all hashtags, negative emotions, and repeating letters (e.g., "huuungryyy")
Stem words with the Krovetz stemmer in the Lemur information retrieval toolkit
Remove the common stop words (much, more, all, always, still, only)
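The cleaning steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name preprocess, the tiny STOP_WORDS list, and the rule "collapse runs of three or more identical characters" are all assumptions made for the example, and stemming is left out because the Krovetz stemmer is not part of the Python standard library.

```python
import re

# Hypothetical, very small stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and", "so"}


def preprocess(tweet):
    """Return a list of cleaned tokens for one tweet (illustrative only)."""
    text = tweet.lower()
    # Drop hashtags such as #engineeringProblems.
    text = re.sub(r"#\w+", " ", text)
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of 3+ identical characters: "huuungryyy" becomes "hungry".
    # Assumption: runs of exactly two (as in "class") are left alone.
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Tokenize on whitespace and drop stop words.
    return [w for w in text.split() if w not in STOP_WORDS]


tokens = preprocess("So huuungryyy in the lab #engineeringProblems")
```

With the assumed stop-word list, the sample tweet reduces to the tokens "hungry" and "lab".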
Naïve Bayes Multi-Label Classifier
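The formulas for this slide did not survive in the transcript. As a rough sketch of the general idea, a multi-label Naïve Bayes classifier can be built by training one binary Naïve Bayes model per category (category vs. not in the category) and returning every category whose model fires. The code below assumes a multinomial model with add-one smoothing; the class and function names and the toy tweets are hypothetical, and the category names are borrowed from the conclusion of this talk.

```python
import math
from collections import Counter


class BinaryNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing: one category vs. the rest."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: list of bools (in the category or not).
        self.vocab = {w for d in docs for w in d}
        self.prior, self.counts, self.totals = {}, {}, {}
        for c in (True, False):
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Smoothed prior, so an empty class never yields log(0).
            self.prior[c] = (len(class_docs) + 1) / (len(docs) + 2)
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_score(self, doc, c):
        v = len(self.vocab)
        score = math.log(self.prior[c])
        for w in doc:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc):
        return self.log_score(doc, True) > self.log_score(doc, False)


def train_multilabel(docs, label_sets, categories):
    """Train one binary model per category."""
    return {c: BinaryNaiveBayes().fit(docs, [c in s for s in label_sets])
            for c in categories}


def predict_multilabel(models, doc):
    """Return the set of categories whose binary model says 'in the category'."""
    return {c for c, m in models.items() if m.predict(doc)}


# Toy data (made up for illustration; not from the paper's data set).
docs = [["exam", "homework", "all", "night"], ["no", "sleep", "tired"],
        ["homework", "no", "sleep"], ["party", "friends", "fun"]]
label_sets = [{"heavy study load"}, {"sleep problems"},
              {"heavy study load", "sleep problems"}, set()]
models = train_multilabel(docs, label_sets, ["heavy study load", "sleep problems"])
predicted = predict_multilabel(models, ["homework", "exam"])
```

Because each category has its own model, a tweet can receive zero, one, or several labels, which matches the non-mutually exclusive categories used in the study.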
Example Based Classification Measures
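The slide's formulas are missing from the transcript. The sketch below uses the definitions of example-based accuracy, precision, recall, and F1 that are common in the multi-label literature, where Y is the true label set of a tweet and Z the predicted one; whether the talk used exactly these forms is an assumption. The helper names and the convention of returning 0 for an empty denominator are also illustrative choices.

```python
def safe_div(n, d):
    # Assumed convention: a ratio with an empty denominator counts as 0.
    return n / d if d else 0.0


def example_based_measures(true_sets, pred_sets):
    """Average per-tweet overlap between true label sets Y and predicted sets Z."""
    n = len(true_sets)
    acc = prec = rec = f1 = 0.0
    for y, z in zip(true_sets, pred_sets):
        inter = len(y & z)
        acc += safe_div(inter, len(y | z))           # |Y ∩ Z| / |Y ∪ Z|
        prec += safe_div(inter, len(z))              # |Y ∩ Z| / |Z|
        rec += safe_div(inter, len(y))               # |Y ∩ Z| / |Y|
        f1 += safe_div(2 * inter, len(y) + len(z))   # 2|Y ∩ Z| / (|Y| + |Z|)
    return {"accuracy": acc / n, "precision": prec / n,
            "recall": rec / n, "f1": f1 / n}


# Two made-up tweets with their true and predicted label sets.
true_sets = [{"heavy study load", "sleep problems"}, {"lack of social engagement"}]
pred_sets = [{"heavy study load"}, {"lack of social engagement", "sleep problems"}]
scores = example_based_measures(true_sets, pred_sets)
```

For these two tweets the accuracy is 0.5, precision and recall are both 0.75, and F1 is 2/3.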
Label-Based Evaluation Measures
Macro-averaged F1 is higher for classifiers that work better on smaller categories.
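A small worked example shows why macro-averaging rewards good performance on small categories. Macro-averaged F1 is the plain mean of the per-category F1 scores, so every category counts equally; micro-averaged F1 pools the true-positive, false-positive, and false-negative counts first, so large categories dominate. The counts below are invented for illustration and the function names are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    # Mean of per-category F1: each category has equal weight.
    return sum(f1_from_counts(*c) for c in counts.values()) / len(counts)


def micro_f1(counts):
    # F1 of the pooled counts: categories with many tweets dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


# (tp, fp, fn) per category; made-up numbers: one large category handled well,
# one small category handled poorly.
counts = {"large category": (90, 10, 10), "small category": (1, 4, 4)}
macro = macro_f1(counts)  # (0.9 + 0.2) / 2 = 0.55
micro = micro_f1(counts)  # 182 / 210, roughly 0.867
```

The weak small category pulls the macro average down to 0.55 while the micro average stays near 0.87, so a classifier has to do well on small categories to score a high macro-averaged F1.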
Label-based accuracy is not a very effective measure because it does not account for label imbalance.
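The point about label imbalance can be checked with a short calculation. The sketch below assumes a single category that only 5 of 100 tweets belong to (made-up numbers) and a degenerate classifier that answers "not in the category" for every tweet; the function names are hypothetical.

```python
def label_accuracy(y_true, y_pred):
    # Fraction of tweets whose in/out decision for this category is correct.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def label_f1(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Imbalanced category: 5 positive tweets, 95 negative tweets (illustrative).
y_true = [True] * 5 + [False] * 95
# Degenerate classifier: every tweet is "not in the category".
y_pred = [False] * 100

accuracy = label_accuracy(y_true, y_pred)  # 0.95
f1 = label_f1(y_true, y_pred)              # 0.0
```

The classifier finds none of the relevant tweets, yet its accuracy is 0.95; F1 exposes the failure with a score of 0.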
Comparison Experiment: SVM and M3L
Same training and testing data sets
The one-versus-all SVM multi-label classifier classified every tweet as "not in the category" for all categories.
The Max-Margin Multi-Label (M3L) classifier takes label correlation into account.
Its performance is better than that of the simplistic one-versus-all SVM classifier, but still not as good as that of the Naïve Bayes classifier, because SVM is not a probabilistic model.
Detect Students Problems From Purdue Data Set
Limitations
First, not all students are active on Twitter.
Second, the study considers only the negative aspects of students' learning experiences, not the positive ones.
Third, only the prominent themes with a relatively large number of tweets in the data were identified.
Fourth, the qualitative analysis reveals that there are correlations among the themes.
Future Work
First, the “manipulation” of personal image online may need to be taken into consideration in future work.
Second, future work can compare both the good and bad things to investigate the tradeoffs with which students struggle.
Third, future work can design more sophisticated algorithms to reveal the hidden information in the “long tail”.
Fourth, future work could specifically address the correlations among these student problems.
Conclusion
Through a qualitative content analysis, we found that engineering students are largely struggling with the heavy study load, and are not able to manage it successfully.
Heavy study load leads to many consequences including lack of social engagement, sleep problems, and other psychological and physical health problems.
This detector can be applied as a monitoring mechanism to identify at-risk students.