who do customers like more? a sentiment analysis of southwest air & delta air tweets collected...

Stirling Analytics Presents: Who do Customers Like More?

Upload: mmadave

Post on 19-Jul-2015



TRANSCRIPT

Stirling Analytics Presents: Who do Customers Like More?

The Airlines Dataset

• 10,000 Tweets containing @delta or @southwestair

• From September 24, 2014 to October 2, 2014

• CSV file contains ID, date & max 140 characters of text

• Set is unlabeled
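A minimal sketch of reading a file in this layout and tagging each tweet with the airline handles it mentions. The header names (`id`, `date`, `text`), the sample rows, and the helper names are assumptions for illustration; the slides do not show the actual file or loading code.

```python
import csv
import io

# Hypothetical sample in the layout the slide describes: ID, date, text.
# The header names and rows below are made up for illustration.
SAMPLE_CSV = """id,date,text
1,2014-09-24,@delta thanks for the upgrade
2,2014-09-25,@SouthwestAir my bag is lost again
3,2014-09-25,@delta @southwestair who is cheaper to ATL?
"""

AIRLINE_HANDLES = ("@delta", "@southwestair")


def airlines_mentioned(text):
    """Return the airline handles found in a tweet, ignoring case.

    This is plain substring matching, so "@DeltaAssist" also counts as "@delta".
    """
    lowered = text.lower()
    return [handle for handle in AIRLINE_HANDLES if handle in lowered]


def load_tweets(file_obj):
    """Read rows from a CSV file object and tag each with the airlines it mentions."""
    rows = []
    for row in csv.DictReader(file_obj):
        row["airlines"] = airlines_mentioned(row["text"])
        rows.append(row)
    return rows


tweets = load_tweets(io.StringIO(SAMPLE_CSV))
```

A tweet that mentions both handles is tagged with both, so it would count toward each airline's totals under this sketch.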

The Challenges

• Abbreviations

• Typ0s

• Slang

• Hashtags

• Sarcasm

• Tweets out of Context

• TGIFlyday, pita

• “@Delta airlines is the sh*t”

• #SuckWestAir, #thanksnothanks

The Method

• Remove duplicate records, but keep retweets (RTs)

• Remove non-ASCII characters

• Remove URLs

• Clean up typos using regular expressions, e.g. “won\’t”

• Expand abbreviations, e.g. “w/” to “with”

• Convert all characters to lower case

• Use TextBlob to generate a polarity score

• Polarity is converted to a positive, neutral or negative label

• Calculate sentiment fractions for each airline
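The steps above can be sketched roughly as follows, using only the standard library. The regular expressions, the reading of “won\’t” as an escaped apostrophe, the abbreviation table beyond "w/", and the zero cutoff for the labels are all assumptions; the slides do not give the exact patterns or thresholds. In the deck's pipeline the polarity value would come from `TextBlob(text).sentiment.polarity`; here it is passed in as a plain float so the sketch runs without TextBlob installed.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")
# Hypothetical abbreviation table; only "w/" -> "with" comes from the slide.
ABBREVIATIONS = {"w/": "with"}


def clean_tweet(text):
    """Apply the cleaning steps from the slide, in the order listed."""
    text = URL_RE.sub("", text)            # remove URLs
    text = NON_ASCII_RE.sub("", text)      # remove non-ASCII characters
    text = re.sub(r"\\'", "'", text)       # assumed typo fix: won\'t -> won't
    for short, full in ABBREVIATIONS.items():
        # Expand only when the abbreviation stands alone as a token.
        pattern = r"(?<!\S)" + re.escape(short) + r"(?!\S)"
        text = re.sub(pattern, full, text, flags=re.IGNORECASE)
    text = text.lower()                    # convert to lower case
    return " ".join(text.split())          # collapse leftover whitespace


def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity in [-1, 1] to a label. The cutoff is an assumption."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def sentiment_fractions(labels):
    """Return the share of each label among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    names = ("positive", "negative", "neutral")
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: counts[name] / total for name in names}
```

Calling `sentiment_fractions` once per airline on that airline's labels gives the per-airline fractions in the last step.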

Algorithm Evaluation

• Took a 1% sample

• Accuracy in the low 70s

• Tried NaiveBayesAnalyzer, very sloooooow

• Experimented with spell check; it produced unexpected results

>>> from textblob import TextBlob

>>> TextBlob("idina menzel").correct()

TextBlob("irina mental")
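The accuracy check on a 1% sample could look something like the sketch below. It assumes the sampled tweets were labeled by hand and then compared with the labels the pipeline produced; the slides do not say how the sample was drawn or scored, and the function names here are hypothetical.

```python
import random


def draw_sample(records, fraction=0.01, seed=0):
    """Draw a reproducible random sample (1% by default) of the records."""
    k = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, k)


def accuracy(predicted, actual):
    """Fraction of predicted labels that match the hand-assigned labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty sample")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)
```

With 10,000 tweets, `draw_sample` returns 100 records, so each misclassified tweet moves the accuracy by one percentage point.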

Results: Sentiment Over Time


Results: Aggregated

          Count  Percent   Count  Percent
positive   2399   44.91%    2001   41.45%
negative    802   15.01%     763   15.80%
neutral    2141   40.08%    2064   42.75%
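Each percentage in the table is that label's count divided by the column total. A short sketch that reproduces the figures from the counts; the names `first_column` and `second_column` are placeholders, since the transcript does not preserve which airline each column belongs to.

```python
def percent_shares(counts):
    """Convert label counts to percentages of the column total, rounded to 2 places."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts copied from the table; the column-to-airline mapping is not given.
first_column = {"positive": 2399, "negative": 802, "neutral": 2141}
second_column = {"positive": 2001, "negative": 763, "neutral": 2064}

first_shares = percent_shares(first_column)
second_shares = percent_shares(second_column)
```

The two columns total 5,342 and 4,828 tweets respectively.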

Results: Sentiment Over Time II

The Incident

Analyze This

>>> from textblob import TextBlob

>>> from textblob.sentiments import NaiveBayesAnalyzer

>>> tweet1 = "My sister wear hijab. She was just assaulted on @Delta flight 1576. Shes with her husband and 4 kids. Airline is not helpful. #Islamophobia"

>>> tweet2 = "What does @Delta and @DeltaAssist do when you are harassed? They put you and your kids on the back of the plane."

>>> TextBlob(tweet1).sentiment

Sentiment(polarity=0.0, subjectivity=0.0)

>>> TextBlob(tweet2).sentiment

Sentiment(polarity=0.0, subjectivity=0.0)

>>> TextBlob(tweet1, analyzer=NaiveBayesAnalyzer()).sentiment

Sentiment(classification='pos', p_pos=0.9287915431294463, p_neg=0.07120845687055152)

>>> TextBlob(tweet2, analyzer=NaiveBayesAnalyzer()).sentiment

Sentiment(classification='pos', p_pos=0.9859238351630609, p_neg=0.014076164836940958)

Results: Sentiment Over Time II

Results: Aggregated II

          Count  Percent   Count  Percent
positive   7755   34.32%   12427   41.85%
negative   5688   25.17%    4790   16.13%
neutral    9156   40.52%   12475   42.01%

External Survey Results

Source: http://www.theacsi.org/?option=com_content&view=article&id=147&catid=14&Itemid=212&i=Airlines

Literal Interpretation

The Verdict

YOU WIN!

Questions?!