Who do customers like more? A sentiment analysis of Southwest Air & Delta Air tweets collected...
The Airlines Dataset
• 10,000 Tweets containing @delta or @southwestair
• From September 24, 2014 to October 2, 2014
• CSV file contains ID, date, and tweet text (max 140 characters)
• Set is unlabeled
The Challenges
• Abbreviations
• Typ0s
• Slang
• Hashtags
• Sarcasm
• Tweets out of Context
• TGIFlyday, pita
• “@Delta airlines is the sh*t”
• #SuckWestAir, #thanksnothanks
The Method
• Remove duplicate records, but keep retweets (RTs)
• Remove non-ASCII characters
• Remove URLs
• Clean up typos using regular expressions, e.g. “won\’t”
• Expand abbreviations, e.g. “w/” to “with”
• Convert all characters to lower case
• Use TextBlob to generate a polarity
• Polarity is converted to a positive, neutral or negative label
• Calculate sentiment fractions for each airline
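The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the slides: the function names, the ID-based duplicate check, the one-entry abbreviation table, and the polarity threshold of 0.0 are all hypothetical, and the typo-fixing regular expressions are left out because the slides do not spell them out.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
# Hypothetical abbreviation table; only "w/" appears in the slides.
ABBREVIATIONS = {"w/": "with"}


def dedupe(rows):
    """Drop records whose ID was already seen (assumed duplicate test).

    Retweets carry their own IDs, so they are kept.
    """
    seen = set()
    unique = []
    for tweet_id, text in rows:
        if tweet_id not in seen:
            seen.add(tweet_id)
            unique.append((tweet_id, text))
    return unique


def clean_tweet(text):
    """Strip non-ASCII characters and URLs, expand abbreviations, lower-case."""
    text = text.encode("ascii", "ignore").decode("ascii")
    text = URL_RE.sub("", text)
    for abbr, full in ABBREVIATIONS.items():
        # Match the abbreviation only as a whitespace-delimited token.
        pattern = r"(?<!\S)" + re.escape(abbr) + r"(?!\S)"
        text = re.sub(pattern, full, text, flags=re.IGNORECASE)
    text = text.lower()
    return " ".join(text.split())  # collapse leftover whitespace


def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity in [-1, 1] to a label; the threshold is an assumption."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def sentiment_fractions(labels):
    """Return the share of each label among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    names = ("positive", "negative", "neutral")
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: counts[name] / total for name in names}
```

With TextBlob installed, the polarity passed to label_from_polarity would come from TextBlob(clean_tweet(text)).sentiment.polarity, computed separately for each airline's tweets.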
Algorithm Evaluation
• Took a 1% sample
• Accuracy in the low 70s (percent)
• Tried NaiveBayesAnalyzer, very sloooooow
• Experimented with spell check; it produced unexpected results:
>>> from textblob import TextBlob
>>> TextBlob("idina menzel").correct()
TextBlob("irina mental")
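A rough sketch of how such an evaluation could be run: draw a 1% random sample, label it by hand, and compare those labels with the predicted ones. The slides do not say how the reference labels were produced, so the hand-labeling step, the function names, and the fixed seed are assumptions.

```python
import random


def sample_rows(rows, fraction=0.01, seed=0):
    """Draw a reproducible random sample (1% by default) of the rows."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def accuracy(predicted, actual):
    """Fraction of predicted labels that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For 10,000 tweets, sample_rows returns 100 rows to label by hand.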
Results: Aggregated
            Count   Percent   Count   Percent
positive     2399    44.91%    2001    41.45%
negative      802    15.01%     763    15.80%
neutral      2141    40.08%    2064    42.75%
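As a check on the arithmetic, each percentage is the label count divided by that column's total. The helper below is a hypothetical sketch that recomputes the percentages from the counts in the table above.

```python
def percent_table(counts):
    """Convert label counts to percentages of the column total."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts copied from the two column pairs of the table.
first = percent_table({"positive": 2399, "negative": 802, "neutral": 2141})
second = percent_table({"positive": 2001, "negative": 763, "neutral": 2064})

print(first)   # positive 44.91, negative 15.01, neutral 40.08
print(second)  # positive 41.45, negative 15.8, neutral 42.75
```

The first column pair totals 5,342 tweets and the second 4,828.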
Analyze This
>>> from textblob import TextBlob
>>> from textblob.sentiments import NaiveBayesAnalyzer
>>> tweet1 = "My sister wear hijab. She was just assaulted on @Delta flight 1576. Shes with her husband and 4 kids. Airline is not helpful. #Islamophobia"
>>> tweet2 = "What does @Delta and @DeltaAssist do when you are harassed? They put you and your kids on the back of the plane."
>>> TextBlob(tweet1).sentiment
Sentiment(polarity=0.0, subjectivity=0.0)
>>> TextBlob(tweet2).sentiment
Sentiment(polarity=0.0, subjectivity=0.0)
>>> TextBlob(tweet1, analyzer=NaiveBayesAnalyzer()).sentiment
Sentiment(classification='pos', p_pos=0.9287915431294463, p_neg=0.07120845687055152)
>>> TextBlob(tweet2, analyzer=NaiveBayesAnalyzer()).sentiment
Sentiment(classification='pos', p_pos=0.9859238351630609, p_neg=0.014076164836940958)
Results: Aggregated II
            Count   Percent   Count   Percent
positive     7755    34.32%   12427    41.85%
negative     5688    25.17%    4790    16.13%
neutral      9156    40.52%   12475    42.01%
External Survey Results
Source: http://www.theacsi.org/?option=com_content&view=article&id=147&catid=14&Itemid=212&i=Airlines