sentiment analysis

Post on 07-Aug-2015

TRANSCRIPT

  1. Sentiment Analysis for Twitter
     - Priyanka Bajaj (priyanka.bajaj@students.iiit.ac.in)
     - Kamal Gurala (kamal.gurala@students.iiit.ac.in)
     - Faraz Alam (faraz.alam@students.iiit.ac.in)
     - Ritesh Kumar Gupta (ritesh.kumar.gupta@in.ibm.com)
     - Guided by: Satarupa Guha (satarupaguha11@gmail.com)
  2. Agenda
     1. Introduction to Sentiment Analysis
     2. About Twitter and Our Goal
     3. Glossary
     4. Challenges
     5. Approach
     6. Results and Conclusion
     7. Tools and Technologies
  3. What is Sentiment Analysis?
     - A mechanism to extract opinions, emotions and sentiments from text
     - Enables us to track attitudes and feelings on the web, based on blog posts, comments, reviews and tweets on different topics
     - Enables us to track products, brands and people, and to determine whether they are viewed positively or negatively on the web
     - Facts: "The painting was more expensive than a Monet"
     - Opinions: "I honestly don't like Monet, Pollock is better"
  4. About Twitter and Our Goal
     - An online social networking and microblogging service
     - Enables users to send and read "tweets", which are text messages limited to 140 characters, hence unambiguous
     - 500 million tweets are sent daily by 240+ million active users
     - The audience ranges from the common man to celebrities
     - Users discuss current affairs and share personal views
     - Our goal: to determine whether the opinion expressed in a tweet is positive, negative or neutral. For tweets conveying both a positive and a negative sentiment, choose the stronger sentiment
  5. Glossary
     - Natural Language Processing (NLP): the attempt to use programming to read and understand the meaning of text
     - Sentiment Analysis: the use of NLP to derive "sentiment", or subjective information, from text
     - Artificial Intelligence: using information provided by NLP, together with mathematics, to determine whether something is negative or positive
  6. Challenges
     - Tweets are highly unstructured and often ungrammatical
     - Out-of-vocabulary words
     - Lexical variation
     - Extensive use of acronyms such as asap, lol, afaik
  7. Our System
  8. System Details
     - Tweet Downloader
       - Download the tweets using the Twitter API
     - Tokenisation
       - Twitter-specific POS tagger and tokenizer developed by ARK Social Media Search
     - Preprocessing
       - Replace emoticons with their polarity and assign scores
       - Remove URLs and target mentions
       - Replace #text with text, since hashtags may contribute to the sentiment
       - Replace sequences of repeated characters, e.g. "cooooool" becomes "cool", and assign a higher score
       - Remove Twitter-specific stop words
       - Expand acronyms
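The preprocessing steps on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the emoticon scores, acronym table, stop-word set and the rule of squeezing a run of three or more letters down to two are all assumptions made for the example, and the names `preprocess`, `EMOTICON_POLARITY`, `ACRONYMS` and `STOP_WORDS` are hypothetical.

```python
import re

# Hypothetical lookup tables for illustration only; the slides do not list
# the emoticon scores, acronyms or stop words the authors actually used.
EMOTICON_POLARITY = {":)": 1.0, ":-)": 1.0, ":D": 1.0, ":(": -1.0, ":-(": -1.0}
ACRONYMS = {
    "asap": "as soon as possible",
    "lol": "laughing out loud",
    "afaik": "as far as i know",
}
STOP_WORDS = {"rt", "via"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"([a-z])\1{2,}")  # a letter repeated 3 or more times


def preprocess(tweet):
    """Return (tokens, emoticon_scores, elongated_word_count) for one tweet."""
    text = URL_RE.sub(" ", tweet)        # remove URLs
    text = MENTION_RE.sub(" ", text)     # remove target mentions (@user)
    text = HASHTAG_RE.sub(r"\1", text)   # "#text" becomes "text"

    tokens, emoticon_scores, elongated = [], [], 0
    for raw in text.split():
        if raw in EMOTICON_POLARITY:
            # Replace the emoticon by its polarity score.
            emoticon_scores.append(EMOTICON_POLARITY[raw])
            continue
        word = raw.lower().strip(".,!?;:\"'")
        # Squeeze runs of repeated letters: "cooooool" becomes "cool".
        squeezed = REPEAT_RE.sub(r"\1\1", word)
        if squeezed != word:
            elongated += 1  # a later stage can use this to boost the score
        if not squeezed or squeezed in STOP_WORDS:
            continue
        # Expand acronyms into their component words.
        tokens.extend(ACRONYMS.get(squeezed, squeezed).split())
    return tokens, emoticon_scores, elongated


tokens, emoticon_scores, elongated = preprocess(
    "RT @bob this is cooooool :) #happy http://t.co/x lol"
)
print(tokens)           # ['this', 'is', 'cool', 'happy', 'laughing', 'out', 'loud']
print(emoticon_scores)  # [1.0]
print(elongated)        # 1
```

Splitting on whitespace stands in for the Twitter-specific tokenizer named on the slide; a real pipeline would tokenize with that tool before applying these rules.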
  9. System Details (contd.)
     - Feature Extractor
       - Unigrams and bigrams
       - Polarity score of the tweet (f1)
       - Count of positive/negative words (f2, f3)
       - Maximum positive/negative score for words (f4, f5)
       - Count of positive/negative emoticons, with assigned scores (contributes to all of f1, f2, f3, f4, f5)
       - Positive/negative special POS tags polarity score
     - Classifier and Prediction
       - The extracted features are fed into an SVM classifier
       - The resulting model is used to predict the sentiment of new tweets
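The lexicon features f1 to f5 and the n-gram features from slide 9 might be computed roughly as below. This is a sketch under stated assumptions: the tiny `LEXICON` and its scores are invented for the example, the function names are hypothetical, and "maximum negative score" is read here as the most negative score. The slides do not say which polarity lexicon was used or how the scores were scaled.

```python
# Hypothetical prior-polarity lexicon; values are illustrative only.
LEXICON = {
    "good": 0.5, "great": 1.0, "love": 1.0,
    "bad": -0.5, "terrible": -1.0, "hate": -1.0,
}


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexicon_features(tokens, emoticon_scores=()):
    """Compute f1..f5 from word polarities plus emoticon polarities."""
    # Emoticon scores are pooled with word scores so that, as on the slide,
    # they contribute to every one of f1..f5.
    scores = [LEXICON[t] for t in tokens if t in LEXICON] + list(emoticon_scores)
    positives = [s for s in scores if s > 0]
    negatives = [s for s in scores if s < 0]
    return {
        "f1_polarity": sum(scores),                  # overall polarity score
        "f2_pos_count": len(positives),              # number of positive items
        "f3_neg_count": len(negatives),              # number of negative items
        "f4_max_pos": max(positives, default=0.0),   # strongest positive score
        "f5_max_neg": min(negatives, default=0.0),   # strongest negative score
    }


tokens = ["i", "love", "this", "but", "bad", "service"]
print(ngrams(tokens, 2))
# ['i love', 'love this', 'this but', 'but bad', 'bad service']
print(lexicon_features(tokens, emoticon_scores=[1.0]))
# {'f1_polarity': 1.5, 'f2_pos_count': 2, 'f3_neg_count': 1,
#  'f4_max_pos': 1.0, 'f5_max_neg': -0.5}
```

A feature vector built this way, combined with unigram and bigram indicators, is the kind of input that would then be passed to the SVM classifier named on the slide.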
  10. Results and Conclusion
      - We investigated two kinds of models: a baseline model and a feature-based model
      - The baseline model uses unigrams only; it is compared with a model that adds bigrams and lexicon features
      - Accuracy (sentence-based sub-task): baseline model 49.81%, feature-based model 57.85%
      - F1 score (F-measure, sentence-based sub-task): baseline model 55.56, feature-based model 61.17
      - For our feature-based approach, feature analysis reveals that the most important features are bigrams and those that combine the prior polarity of words with their part-of-speech tags
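For readers unfamiliar with the two metrics reported on slide 10, the sketch below shows how accuracy and a per-class F1 score are computed from gold and predicted labels. The slides do not say how the F-measure was averaged across classes, so the macro average shown here is an assumption, and the labels and helper names are hypothetical. The toy data is made up and unrelated to the reported results.

```python
def accuracy(gold, predicted):
    """Fraction of tweets whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def f1_for_class(gold, predicted, label):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted, labels):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(f1_for_class(gold, predicted, l) for l in labels) / len(labels)


# Toy labels, invented for the example.
gold = ["pos", "pos", "neg", "neu"]
predicted = ["pos", "neg", "neg", "neu"]

print(accuracy(gold, predicted))  # 0.75
print(round(macro_f1(gold, predicted, ["pos", "neg", "neu"]), 4))  # 0.7778
```

Because F1 is computed per class, it can come out higher or lower than accuracy on the same predictions, which is why the deck reports the two figures separately.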
  11. Tools and Technology Used
      1. Concepts of Data Mining and Information Retrieval
      2. Python Language
      3. Java, Eclipse
      4. Support Vector Machine (SVM) Theory
      5. LIBSVM package for accuracy and F-measure
      6. Twitter API for the training set
      7. NLTK
      8. Shell script for integration
  12. Thank You