Big Data Mining in Twitter using Python Clients



Page 1: Big Data Mining in Twitter using Python Clients

University of New Orleans

ScholarWorks@UNO

Innovate UNO, Fall 2017

Big Data Mining in Twitter using Python Clients

Sanjiv Pradhanang, University of New Orleans

Follow this and additional works at: https://scholarworks.uno.edu/innovate

Pradhanang, Sanjiv, "Big Data Mining in Twitter using Python Clients" (2017). Innovate UNO. 5. https://scholarworks.uno.edu/innovate/Fall2017/oral/5

This Oral Presentation is brought to you for free and open access by the Undergraduate Showcase at ScholarWorks@UNO. It has been accepted for inclusion in Innovate UNO by an authorized administrator of ScholarWorks@UNO. For more information, please contact [email protected].

Page 2: Big Data Mining in Twitter using Python Clients

Big Data Mining in Twitter using Python Clients

SANJIV PRADHANANG

MENTOR: DR. SHAIKH ARIFUZZAMAN

Page 3: Big Data Mining in Twitter using Python Clients

GOALS:

Assist in making better decisions

Discover patterns in data sets

Evaluation by natural language processing

Image courtesy of CountryLiving

Page 4: Big Data Mining in Twitter using Python Clients

METHODOLOGY

Extraction

Classification

Comparison & Inference

Page 5: Big Data Mining in Twitter using Python Clients
Page 6: Big Data Mining in Twitter using Python Clients

EXTRACTION

Data first extracted in JSON format

Page 7: Big Data Mining in Twitter using Python Clients

EXTRACTION

Streaming APIs for perpetual data

Page 8: Big Data Mining in Twitter using Python Clients

EXTRACTION

Appropriate filters as required

Staying within the limits of the API
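The extraction slides can be illustrated with a minimal sketch that reads tweets stored as one JSON object per line and keeps only those mentioning a tracked handle. It assumes the records carry the `entities.user_mentions[].screen_name` field of Twitter's tweet JSON; the handle set `TRACKED_HANDLES`, the sample records, and the function names are hypothetical. Connecting to the streaming API and handling its rate limits are not shown.

```python
import json

# Hypothetical handles to track; the actual list is not given in the slides.
TRACKED_HANDLES = {"uofno", "lsu"}

def mentioned_handles(tweet):
    """Return the lowercase screen names mentioned in a tweet dictionary."""
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    return {m["screen_name"].lower() for m in mentions if "screen_name" in m}

def filter_tweets(lines, tracked=TRACKED_HANDLES):
    """Yield tweets (as dicts) that mention at least one tracked handle."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the stored stream
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        if isinstance(tweet, dict) and mentioned_handles(tweet) & tracked:
            yield tweet

# Made-up sample records, for illustration only.
sample = [
    '{"text": "@lsu great game", "entities": {"user_mentions": [{"screen_name": "LSU"}]}}',
    '',
    '{"text": "no mention here", "entities": {"user_mentions": []}}',
    '{"text": "broken',
]
kept = list(filter_tweets(sample))
print(len(kept), kept[0]["text"])
```

Filtering on the parsed mention list rather than on the raw text avoids false matches when a handle appears inside a longer word.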

Page 9: Big Data Mining in Twitter using Python Clients
Page 10: Big Data Mining in Twitter using Python Clients

METHODOLOGY

Extraction

Classification

Comparison & Inference

Page 11: Big Data Mining in Twitter using Python Clients

CLASSIFICATION

Labeling of tweets:

Web services:

Amazon Mechanical Turk (AMT)

Set specific conditions

Page 12: Big Data Mining in Twitter using Python Clients

CLASSIFICATION

Classifiers:

Naïve Bayes

Support Vector Machine (SVM)

Maximum Entropy Classifier

Page 13: Big Data Mining in Twitter using Python Clients

NAÏVE BAYES CLASSIFIER
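As a rough illustration of how a Naïve Bayes text classifier works, the sketch below trains a multinomial model with add-one (Laplace) smoothing on a few made-up sentences. The training data, the whitespace tokenizer, and the function names `train_nb` and `predict_nb` are assumptions for illustration, not the presenter's implementation.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log posterior under add-one smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training sentences, for illustration only.
train = [
    ("please cancel school the storm is coming", "harvey"),
    ("campus closed due to hurricane flooding", "harvey"),
    ("ranked in top 20 for liberal arts", "school"),
    ("great lecture on campus today", "school"),
]
model = train_nb(train)
print(predict_nb(model, "hurricane closed the school"))
print(predict_nb(model, "top liberal arts lecture"))
```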

Page 14: Big Data Mining in Twitter using Python Clients

SUPPORT VECTOR MACHINE (SVM)

Trains from a set of labeled data

Builds a model from the training data

Uses it to categorize new test cases

Works best as a binary classifier
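The train-then-categorize cycle can be sketched with a small linear SVM trained by subgradient descent on the regularized hinge loss. This is a simplification chosen for brevity: the results later in the deck report an SVM with an RBF kernel, which would normally come from a library. The learning rate, regularization strength, epoch count, and the two-dimensional toy points are all assumed values.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    labels must be +1 or -1; lam is the L2 regularization strength.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin: move the boundary toward classifying it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: apply only the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Made-up, linearly separable toy data: +1 in the upper right, -1 in the lower left.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(predict(w, b, (2.5, 2.5)), predict(w, b, (-2.5, -2.5)))
```

The decision rule has two outcomes, which reflects the slide's point that an SVM is most natural as a binary classifier; multi-class problems are usually split into several binary ones.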

Page 15: Big Data Mining in Twitter using Python Clients

MAXIMUM ENTROPY CLASSIFIER

Features of data sets are weighted

Uniform probability distribution favored

Better when prior data is unknown

Model that converts the contextual info into class prediction
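A minimal sketch of the prediction step of a maximum entropy model: each (feature, class) pair has a weight, and class probabilities are the normalized exponentials of the summed weights of the active features. With no weights learned, the result is the uniform distribution the slide mentions. The feature names, weight values, and the function name `maxent_probabilities` are hypothetical, and training the weights is not shown.

```python
import math

def maxent_probabilities(features, weights, classes):
    """Return p(class | features) as a softmax over summed feature weights."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in features) for c in classes}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

classes = ["harvey", "school", "irrelevant"]

# With no learned weights, every class is equally likely.
uniform = maxent_probabilities({"storm"}, {}, classes)

# Hypothetical weights: "storm" is evidence for the Harvey class.
weights = {("storm", "harvey"): 2.0, ("lecture", "school"): 1.5}
probs = maxent_probabilities({"storm"}, weights, classes)
print(uniform)
print(probs)
```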

Page 16: Big Data Mining in Twitter using Python Clients

METHODOLOGY

Extraction

Classification

Comparison & Inference

Page 17: Big Data Mining in Twitter using Python Clients

COMPARISON & INFERENCE

Precision, recall, accuracy, F1 score, interrater agreement

Sentiment scores indicate which factors are significant
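The evaluation measures named above can be computed from two lists of labels, as in the sketch below. The slides do not say which interrater agreement statistic was used; Cohen's kappa is assumed here as one common choice. The label lists are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return precision, recall, F1 and accuracy, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0

# Made-up labels, for illustration only.
y_true = ["rel", "rel", "rel", "irr", "irr", "rel"]
y_pred = ["rel", "rel", "irr", "irr", "rel", "rel"]
metrics = binary_metrics(y_true, y_pred, positive="rel")
kappa = cohen_kappa(y_true, y_pred)
print(metrics, kappa)
```

Precision and recall are computed per category, which is why a results table can report them separately for each class while accuracy is a single number per task.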

Page 18: Big Data Mining in Twitter using Python Clients

Analysis of tweets towards colleges during Hurricane Harvey

Page 19: Big Data Mining in Twitter using Python Clients

EXTRACTION

Tweets collected between August 27 and August 31

Tweets containing the Twitter handle(s)/ID(s) of colleges in Louisiana, from users all over the world

Majority of tweets coming from UNO, Tulane, Loyola, UL at Lafayette, UL at Monroe & LSU

Storage of data stream provided by Louisiana Optical Network Initiative (LONI)

Page 20: Big Data Mining in Twitter using Python Clients

CLASSIFICATION

Categorized into three categories:

1. Hurricane Harvey
2. School Experience
3. Irrelevant

4105 tweets collected, 3656 tweets classified

1160 for Harvey, 2048 for school experiences and 448 for irrelevant

Manually inspected and labelled
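The counts on this slide can be turned into class shares with a few lines of arithmetic. The sketch uses only the numbers stated above; the variable names are illustrative. The share of the largest class is a useful reference point, since a classifier that always predicted that class would reach that accuracy on data with the same distribution.

```python
# Counts taken from the slide.
collected = 4105
counts = {"Hurricane Harvey": 1160, "School Experience": 2048, "Irrelevant": 448}

classified = sum(counts.values())             # 3656, matching the slide
classified_share = classified / collected     # fraction of collected tweets that were labelled
shares = {name: n / classified for name, n in counts.items()}
majority_share = max(counts.values()) / classified

for name, share in shares.items():
    print(f"{name}: {share:.3f}")
print(f"classified: {classified_share:.3f}, majority class: {majority_share:.3f}")
```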

Page 21: Big Data Mining in Twitter using Python Clients
Page 22: Big Data Mining in Twitter using Python Clients

REPRESENTATIONAL TWEETS

Harvey: “Yo @oursoutheastern you want to cancel school please.”

School Experience: “RT @HipHopPrez: .@du1869 in top 20% for national liberal arts”

Irrelevant: “@lsu i just finished gossip girl and i think i'm in tears #bestshowever”

Page 23: Big Data Mining in Twitter using Python Clients

COMPARISON OF CLASSIFIERS

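Accuracy is reported once per task and applies to both categories of that task.

Classifier       Task  Category    Precision  Recall  F1 score  Accuracy
Naïve Bayes      1     Relevant    0.905      0.798   0.847     0.749
Naïve Bayes      1     Irrelevant  0.215      0.397   0.279     0.749
Naïve Bayes      2     Harvey      0.688      0.803   0.741     0.797
Naïve Bayes      2     School      0.877      0.794   0.833     0.797
SVM (rbf)        1     Relevant    0.891      0.987   0.936     0.882
SVM (rbf)        1     Irrelevant  0.587      0.136   0.221     0.882
SVM (rbf)        2     Harvey      0.497      0.9     0.641     0.635
SVM (rbf)        2     School      0.895      0.484   0.629     0.635
Maximum Entropy  1     Relevant    0.767      0.38    0.509     0.355
Maximum Entropy  1     Irrelevant  0.038      0.174   0.062     0.355
Maximum Entropy  2     Harvey      0.359      0.916   0.517     0.38
Maximum Entropy  2     School      0.617      0.077   0.137     0.38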

Page 24: Big Data Mining in Twitter using Python Clients

SENTIMENT ANALYSIS (AS OF 10.05.2017):

Metric                Group          Average sentiment score  Sentiment score range  Proportion of colleges
Population            >8000          0.130                    -1 to 1                0.74
Population            <              0.153                    -1 to 1                0.26
Rank (in LA)          Inside top 10  0.129                    -1 to 1                0.80
Rank (in LA)          Outside        0.165                    -0.8 to 1              0.20
Region                Western        0.107                    -0.75 to 1             0.17
Region                Eastern        0.142                    -1 to 1                0.83
Followers on Twitter  >10000         0.109                    -1 to 1                0.65
Followers on Twitter  <              0.185                    -1 to 1                0.35

Page 25: Big Data Mining in Twitter using Python Clients

OTHER STATS:

1. UNO
2. LSU
3. UL at Lafayette
4. UL at Monroe
5. Dillard U.
6. Southeastern U. LA
7. Southern U. A&M
8. Louisiana College
9. Nicholls State U.
10. Loyola U.
11. Tulane U.

Series 1: Number of tweets in thousands

Series 2: Average sentiment score

Page 26: Big Data Mining in Twitter using Python Clients

SUMMARY

Tweets before a storm are usually negative

Responsiveness from school officials is important for positive feedback

Equitable features are important for Maximum Entropy classifiers

Get an overall comparison of colleges