Twitter Sentiment Analysis

Posted on 12-Apr-2017

Category: Data & Analytics

The University of Texas at Dallas utdallas.edu

Airline Twitter Analysis

What did we want to do?

• Kaggle: Twitter Airlines Sentiments

• Exploratory Analysis
  i. When do people tweet?
  ii. Which airlines get the most tweets?
  iii. Which sentiments are dominant?
  iv. How are these sentiments distributed?

• Text Analytics
  i. Most frequently used words
  ii. Most frequently used words when the sentiment is negative
  iii. Most frequently used words when the sentiment is positive
  iv. Tweet length vs. sentiment

Cleansing of data

• Every tweet began with “@airline name”

• 4 columns with hardly any data

• Null and missing values

• Coordinates required: geocoding
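The first three cleansing steps could be sketched roughly as follows. This is only an illustration: the slides do not show the code that was used, and the sample rows, the column names, and the 50% fill threshold are all assumptions made for the example.

```python
import re

# Hypothetical sample rows; column names are assumptions, not taken from the slides.
rows = [
    {"airline": "United", "text": "@united flight delayed again",
     "airline_sentiment_gold": "", "negativereason_gold": ""},
    {"airline": "American", "text": "@AmericanAir thanks for the upgrade!",
     "airline_sentiment_gold": "positive", "negativereason_gold": ""},
    {"airline": "US Airways", "text": "@USAirways @united lost my bag",
     "airline_sentiment_gold": "", "negativereason_gold": ""},
]

# One or more "@handle" tokens at the very start of the tweet.
MENTION_AT_START = re.compile(r"^(?:@\w+\s*)+")


def strip_leading_mentions(text):
    """Remove the leading '@handle' tokens from a tweet."""
    return MENTION_AT_START.sub("", text).strip()


def sparse_columns(rows, max_fill=0.5):
    """Return columns whose share of non-empty values is below max_fill."""
    sparse = []
    for col in rows[0].keys():
        filled = sum(1 for r in rows if r.get(col) not in ("", None))
        if filled / len(rows) < max_fill:
            sparse.append(col)
    return sparse


def clean(rows, max_fill=0.5):
    """Drop sparse columns and strip leading mentions from the text."""
    drop = set(sparse_columns(rows, max_fill))
    cleaned = []
    for r in rows:
        kept = {k: v for k, v in r.items() if k not in drop}
        kept["text"] = strip_leading_mentions(kept["text"])
        cleaned.append(kept)
    return cleaned


cleaned_rows = clean(rows)
```

Geocoding of coordinates is left out of the sketch because it depends on an external service that the slides do not name.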

When do people tweet?

• Most tweets arrive during the morning rush hours, peaking at 9 am.
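A count of tweets per hour of day can be sketched as below. The timestamps and their format string are assumptions for illustration; the real data may use a different format or include a time-zone offset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps; the format is an assumption about the data.
timestamps = [
    "2015-02-24 09:15:00",
    "2015-02-24 09:47:12",
    "2015-02-24 14:03:55",
    "2015-02-23 09:05:30",
    "2015-02-23 22:40:01",
]


def tweets_per_hour(stamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count tweets by hour of day (0 to 23)."""
    hours = (datetime.strptime(s, fmt).hour for s in stamps)
    return Counter(hours)


hourly = tweets_per_hour(timestamps)
# The busiest hour and how many tweets fell into it.
peak_hour, peak_count = hourly.most_common(1)[0]
```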

[Figure: Number of Tweets every hour. x-axis: Hour; y-axis: Number of Tweets]

How are the tweets & sentiments distributed?

• United Airlines, American and US Airways receive most of the tweets.

• Most of the tweets are negative, as expected.

• 63% of the tweets are negative.
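Both distributions, tweets per airline and the share of each sentiment, reduce to simple counting. The sketch below uses a handful of made-up (airline, sentiment) pairs, so its numbers are illustrative and do not reproduce the figures on the slides.

```python
from collections import Counter

# Hypothetical (airline, sentiment) pairs, not the real dataset.
records = [
    ("United", "negative"), ("United", "negative"), ("United", "neutral"),
    ("American", "negative"), ("American", "positive"),
    ("Delta", "positive"), ("Delta", "neutral"),
    ("US Airways", "negative"),
]


def tweets_per_airline(records):
    """Count how many tweets each airline received."""
    return Counter(airline for airline, _ in records)


def sentiment_shares(records):
    """Return the fraction of tweets carrying each sentiment label."""
    counts = Counter(sentiment for _, sentiment in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


per_airline = tweets_per_airline(records)
shares = sentiment_shares(records)
```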

Distribution of sentiments for all the airlines

Sentiment   Frequency
Positive    0.1706621
Neutral     0.2295947
Negative    0.5997432

• The three airlines with the most tweets are also the ones with the most negative tweets. Why?

Why so many negative tweets?

Word clouds to show frequency of words used in negative tweets
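A word cloud is driven by word frequencies, so the underlying counts can be sketched as follows. The sample tweets, the tokenizing regex, and the small stop-word list are assumptions for illustration; the slides do not say which text-mining tooling produced the clouds.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "my", "is", "was", "and", "to", "for", "on", "i", "you"}

# Hypothetical (text, sentiment) pairs.
tweets = [
    ("flight cancelled and no help", "negative"),
    ("my flight was delayed two hours", "negative"),
    ("cancelled flight, lost bag", "negative"),
    ("great crew, thanks for the smooth flight", "positive"),
]


def top_words(tweets, sentiment, n=3):
    """Return the n most frequent non-stop-words in tweets with the given sentiment."""
    counts = Counter()
    for text, label in tweets:
        if label != sentiment:
            continue
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


negative_top = top_words(tweets, "negative", n=2)
```

Calling `top_words(tweets, "positive")` gives the corresponding counts for positive tweets.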

[Word clouds, one per airline: US Airways, United Airlines, American Airlines]

An outlier in the case of Delta Airlines

Word cloud for all the positive tweets

From which time zones are people tweeting?

• Flights travel everywhere throughout the world.

• However, we observed that most of the tweets originate from the Eastern Time zone (US & Canada).

Association Analysis

• Association Analysis on words used in the tweet.
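The slides do not state which association measure was used. One common choice is the support and confidence of word pairs, sketched below on made-up token sets; the function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tweets, each already reduced to a set of distinct words.
tweets_tokens = [
    {"flight", "cancelled", "hold"},
    {"flight", "cancelled", "refund"},
    {"flight", "delayed"},
    {"bag", "lost"},
]


def pair_support(token_sets):
    """Support of each word pair: the fraction of tweets containing both words."""
    n = len(token_sets)
    pair_counts = Counter()
    for tokens in token_sets:
        # Sorting makes each pair appear in one canonical (alphabetical) order.
        pair_counts.update(combinations(sorted(tokens), 2))
    return {pair: count / n for pair, count in pair_counts.items()}


def confidence(token_sets, antecedent, consequent):
    """Of the tweets containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [t for t in token_sets if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(1 for t in with_antecedent if consequent in t) / len(with_antecedent)


support = pair_support(tweets_tokens)
conf_cancelled_to_flight = confidence(tweets_tokens, "cancelled", "flight")
```

Confidence is directional: in this sample every tweet with "cancelled" also contains "flight", but only two of the three tweets with "flight" contain "cancelled".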

Hierarchical clustering to determine association between words

Cont’d

K-means clustering

Cont’d

Association between Tweet length and sentiment

• We observed that the longer a tweet is, the more likely it is to be negative in sentiment.
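One simple way to check this kind of relationship is to compare the average tweet length per sentiment label. The sketch below measures length in characters, which is an assumption (the slides do not say whether characters or words were counted), and the sample tweets are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (text, sentiment) pairs.
tweets = [
    ("Two hour delay, no gate agent, and nobody answers the phone", "negative"),
    ("Lost my bag again and the rebooking line is endless", "negative"),
    ("Thanks!", "positive"),
    ("Great flight", "positive"),
    ("What time does boarding start?", "neutral"),
]


def mean_length_by_sentiment(tweets):
    """Average tweet length in characters for each sentiment label."""
    lengths = defaultdict(list)
    for text, label in tweets:
        lengths[label].append(len(text))
    return {label: mean(values) for label, values in lengths.items()}


avg_length = mean_length_by_sentiment(tweets)
```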

Cont’d

What else did we try?

• A predictive model

• Setbacks we faced during the process

• Work on SPSS

• Categorization

Why this Analysis? Will it help in some way?

• The airline industry lives on its customers.

• We get to know where we are doing well and where we are doing poorly.

• The association between tweet length and sentiment can be a basis for a predictive model.

• Companies can get to know their competition.

• Improve the flight journey overall.

References

• Wikipedia.com

• Kaggle.com

• www.clarabridge.com/text-analytics/

• https://sites.google.com/site/manabusakamoto/home/r.../r-tutorial-3
