Twitter Sentiment Analysis Project Report

Upload: bharat-khanna

Post on 14-Apr-2017

11/18/2015

Analyze Twitter Data with Hortonworks Hadoop

Intermediate Project Report

Bharat Khanna

University at Buffalo

Sentiment Analysis of Mr. Narendra Modi’s Brand Image using Twitter Data

Summary: - I am performing a sentiment analysis of Mr. Narendra Modi’s brand image across different nations using data from Twitter. To fetch the Twitter data, I am using Apache Flume, which is open source and comes installed by default in the Hortonworks Sandbox platform 1.3.

The fetched data is loaded directly into HDFS (Hadoop Distributed File System). This avoids the extra overhead of first saving the data on the local system and then transferring it to HDFS.

The data loaded into HDFS is still unstructured and not suitable for ad-hoc analysis, so I will convert the JSON data to a tabular format and store it in Hive. I will also provide a graphical user interface so that end users can run their own ad-hoc analyses.

The next step uses a dictionary file to score the sentiment of each tweet: the number of positive words is compared with the number of negative words, and the tweet is assigned a positive, negative, or neutral sentiment value accordingly. I downloaded the dictionary file from the link below.

Click here for Dictionary
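
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Hive implementation used in the project: the two tiny word sets and the function name score_tweet are hypothetical stand-ins for the downloaded dictionary file, and ties (including tweets with no dictionary words) are assumed to count as neutral.

```python
import re

# Hypothetical miniature word lists; the project uses a downloaded
# dictionary file instead of these hand-picked examples.
POSITIVE_WORDS = {"good", "great", "love", "support"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "fail"}


def score_tweet(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Label a tweet by comparing its positive and negative word counts."""
    # Lowercase the text and split it into simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for w in words if w in positive)
    neg = sum(1 for w in words if w in negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Equal counts (including zero matches) are treated as neutral.
    return "neutral"
```

For example, score_tweet("Great speech, love it") counts two positive words and no negative ones, so the tweet is labeled positive.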

The last part of the project presents the results of the sentiment analysis as visualizations, for which I will use Tableau. I will connect Tableau to Hive using the Hortonworks ODBC driver, which I downloaded from the Hortonworks website (link in the References section). The results will be shown as graphs and maps using Tableau’s built-in VizQL server.

Data sets and Software:

Sentiment Data: - Sentiment data is unstructured data that represents the opinions, emotions, and attitudes contained in sources such as social media posts, online blogs, and product reviews.

Why use sentiment data: - Organizations use sentiment data to learn how people feel about their product and what they can do to market it effectively.

How I fetched Twitter data: - I created a Twitter app, configured flume.conf with the app credentials, and ran Flume. All the steps for fetching data from Twitter using Apache Flume are covered in a YouTube video and a slide deck, both linked below. I have also uploaded the video to the UBlearns discussion forum of DC.

YouTube: - https://youtu.be/E1w5SkE7Cco

SlideShare: - http://www.slideshare.net/bharat3khanna/extracting-twitter-data-using-apache-flume

Source code for Flume-Snapshot.jar: - I downloaded the source code of Flume-snapshot.jar from GitHub and built the JAR with Maven (mvn package) on the Hadoop cluster.

Click here for Flume Source Code

Size of Data: - Although there is no limit on the amount of data I can collect from Twitter, for this project I will run my analysis on approximately 100 MB of data.

Algorithms Used: - I am not using a MapReduce algorithm here, because I want to analyze the complete data set rather than aggregated measures. Had I used MapReduce, much of my data would have been aggregated by the reducer. My source data is in JSON format, and I am using Hive-serde.jar (SerDe stands for serializer and deserializer), which helps parse the JSON data into Hive tables.

Source code for Hive-serde.jar: - I downloaded the source code of Hive-serde.jar from GitHub and built the JAR with Maven (mvn package) on the Hadoop cluster.

Click here for Hive-serde.jar source code
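
To make the JSON-to-table step concrete, here is a minimal Python sketch of what a JSON SerDe does conceptually: it reads one JSON tweet per line and exposes selected fields as columns. It is not the Hive-serde.jar code. The function name tweet_to_row and the choice of columns are assumptions made for illustration; the field names (id, created_at, text, retweet_count, user.screen_name, user.location) follow the standard tweet JSON layout.

```python
import json


def tweet_to_row(line):
    """Flatten one JSON-encoded tweet into a flat dict of column values."""
    tweet = json.loads(line)
    # The nested "user" object may be missing; fall back to an empty dict.
    user = tweet.get("user") or {}
    return {
        "id": tweet.get("id"),
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text"),
        "retweet_count": tweet.get("retweet_count", 0),
        "screen_name": user.get("screen_name"),
        "location": user.get("location"),
    }


# Made-up sample record, not real Twitter data.
sample = (
    '{"id": 1, "text": "hello", "retweet_count": 3, '
    '"user": {"screen_name": "alice", "location": "Delhi"}}'
)
row = tweet_to_row(sample)
```

Missing fields come back as None (or 0 for retweet_count), similar to how absent JSON keys would surface as NULL columns in a table.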

Analysis to be done on Twitter data: - I am going to perform the following analyses using Hive and Tableau:

a) Maximum tweet count per user.

b) Count of retweets.

c) Geographic mapping of people’s sentiments towards Mr. Modi.
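
The three analyses listed above can be illustrated with a small Python sketch over already-flattened tweet rows. In the project itself these would be Hive queries and Tableau views; the rows here are invented, and the column names screen_name, is_retweet, country, and sentiment, as well as the function name summarize, are hypothetical.

```python
from collections import Counter, defaultdict

# Invented example rows; real rows would come from the Hive tables.
rows = [
    {"screen_name": "alice", "is_retweet": False, "country": "India", "sentiment": "positive"},
    {"screen_name": "alice", "is_retweet": True, "country": "India", "sentiment": "positive"},
    {"screen_name": "bob", "is_retweet": False, "country": "USA", "sentiment": "negative"},
    {"screen_name": "alice", "is_retweet": False, "country": "USA", "sentiment": "neutral"},
    {"screen_name": "carol", "is_retweet": True, "country": "India", "sentiment": "negative"},
]


def summarize(rows):
    """Compute the top tweeter, the retweet count, and sentiment per country."""
    # a) Tweets per user; most_common(1) yields the user with the most tweets.
    tweets_per_user = Counter(r["screen_name"] for r in rows)
    top_user, top_count = tweets_per_user.most_common(1)[0]

    # b) Number of rows flagged as retweets.
    retweets = sum(1 for r in rows if r["is_retweet"])

    # c) Sentiment label counts grouped by country, ready to plot on a map.
    by_country = defaultdict(Counter)
    for r in rows:
        by_country[r["country"]][r["sentiment"]] += 1

    return top_user, top_count, retweets, by_country


top_user, top_count, retweets, by_country = summarize(rows)
```

The per-country counters are the kind of result that a mapping tool can color by dominant sentiment.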

References: -

http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop

https://github.com/cloudera/cdh-twitter-example

https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon

http://hortonworks.com/products/releases/hdp-1-3/#add_ons
