applying word vectors sentiment analysis

Applying Word Vectors for Sentiment Analysis & Text Analysis while Browsing Abdullah Khan Zehady Department Of Computer Science, Purdue University


Post on 04-Aug-2015


Page 1: Applying word vectors sentiment analysis

Applying Word Vectors for Sentiment Analysis & Text Analysis while Browsing

Abdullah Khan Zehady
Department of Computer Science, Purdue University

Page 2: Applying word vectors sentiment analysis

Movie Review - Sentiment Analysis

● Collected from a Kaggle ML competition.
● Data
o “Review Index” “Review” “Sentiment (0/1)”
1. LabeledTrainData
● 25000 movie reviews
2. TestData
● 25000 movie reviews
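
As a rough illustration of the three-field layout listed on this slide, the sketch below reads tab-separated rows into parallel lists. The column names (`id`, `review`, `sentiment`), the tab delimiter, and the two sample rows are assumptions made for the example; the slides do not show the actual file.

```python
import csv
import io

# Hypothetical two-row sample in the three-field layout from the slide.
# The ids and review texts are made up for illustration.
SAMPLE_TSV = (
    "id\treview\tsentiment\n"
    "r1\tA moving, beautifully shot film.\t1\n"
    "r2\tDull plot and wooden acting.\t0\n"
)


def load_reviews(handle):
    """Return (ids, reviews, labels) from a tab-separated file object."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    ids, reviews, labels = [], [], []
    for row in reader:
        ids.append(row["id"])
        reviews.append(row["review"])
        labels.append(int(row["sentiment"]))  # 0 = negative, 1 = positive
    return ids, reviews, labels


ids, reviews, labels = load_reviews(io.StringIO(SAMPLE_TSV))
print(ids, labels)
```

Passing an open file object instead of `io.StringIO` would read a real file in the same way, provided it uses the assumed column names.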

Page 3: Applying word vectors sentiment analysis

Approach 1: Bag Of Words - Baseline

● Data Preprocessing
o Removal of HTML, non-letters, stopwords, and extra spaces + lowercase conversion
● Creating Features from Bag Of Words
o 5000 most frequent words (25000 x 5000)
o { the, cat, sat, on, hat, dog, ate, and } ---> { 2, 1, 1, 1, 1, 0, 0, 0 }
o { the, cat, sat, on, hat, dog, ate, and } ---> { 3, 1, 0, 0, 1, 1, 1, 1 }
● Supervised Learning
o Random Forest Classifier with 100 trees
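
The preprocessing and counting steps on this slide can be sketched in plain Python as follows. The two example sentences are assumptions chosen because they reproduce the two count vectors shown on the slide; the stop word list is a tiny illustrative one, and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; a real run would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "on", "of", "is", "it", "this"}


def clean_review(raw, remove_stopwords=True):
    """Strip HTML tags and non-letters, lowercase, and split on whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)      # drop HTML tags such as <br />
    text = re.sub(r"[^a-zA-Z]", " ", text)   # keep letters only
    words = text.lower().split()             # lowercase and collapse spaces
    if remove_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return words


def bag_of_words(words, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]


vocabulary = ["the", "cat", "sat", "on", "hat", "dog", "ate", "and"]
# Stop words are kept here so the counts match the slide's example vectors.
s1 = clean_review("The cat sat on the hat", remove_stopwords=False)
s2 = clean_review("The dog ate the cat and the hat", remove_stopwords=False)
v1 = bag_of_words(s1, vocabulary)
v2 = bag_of_words(s2, vocabulary)
print(v1)  # [2, 1, 1, 1, 1, 0, 0, 0]
print(v2)  # [3, 1, 0, 0, 1, 1, 1, 1]
```

With a 5000-word vocabulary and 25000 reviews, stacking these vectors gives the 25000 x 5000 feature matrix. The slides do not name the classifier library; one option, stated here as an assumption, is scikit-learn's `RandomForestClassifier(n_estimators=100)` fitted on that matrix and the sentiment labels.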

Page 4: Applying word vectors sentiment analysis

Approach 2: TF-IDF Word Weight

● Review Vector ← TF-IDF word weight

Approach 3: Vector Averaging

● Word2Vec word vectors (Dim = 300)
o Review Vector ← Element-wise Average
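
Approaches 2 and 3 both map a tokenized review to a fixed-length vector. The sketch below shows one way each could look. The TF-IDF variant used (raw term count times log(N / document frequency)) is an assumption, since the slides do not say which weighting was applied, and the 2-dimensional word vectors are made-up stand-ins for the 300-dimensional Word2Vec vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """One vector per tokenized doc, with tf = raw count and idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([
            tf[w] * math.log(n_docs / df[w]) if df[w] else 0.0
            for w in vocabulary
        ])
    return vectors


def average_vector(words, word_vectors, dim):
    """Element-wise mean of the vectors of the words that have one."""
    total = [0.0] * dim
    known = 0
    for w in words:
        vec = word_vectors.get(w)
        if vec is None:
            continue  # skip words missing from the vector model
        known += 1
        total = [t + v for t, v in zip(total, vec)]
    return [t / known for t in total] if known else total


docs = [["good", "movie"], ["bad", "movie"]]
vocab = ["good", "bad", "movie"]
tfidf = tfidf_vectors(docs, vocab)
print(tfidf)  # "movie" occurs in every doc, so its weight is 0

toy_vectors = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}  # made-up 2-d vectors
avg = average_vector(["good", "movie", "unseen"], toy_vectors, dim=2)
print(avg)  # [0.5, 0.5]
```

Averaging discards word order and gives every word equal weight, which may be one reason this approach scores lowest in the results reported later in the slides.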

Approach 4: Bag Of Centroids

● K-Means Clustering to find word clusters
● Number of Features = Number of Clusters
● Review Feature Vector
o Find which cluster each word belongs to and increment that cluster's count.
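
A minimal sketch of the feature-building step, assuming K-Means has already been run on the word vectors and produced a word-to-cluster mapping. The mapping below is hypothetical and only illustrates the counting.

```python
def bag_of_centroids(words, word_to_cluster, num_clusters):
    """Count, per cluster, how many words of the review fall into it."""
    features = [0] * num_clusters
    for w in words:
        cluster = word_to_cluster.get(w)
        if cluster is not None:  # words without a cluster are ignored
            features[cluster] += 1
    return features


# Hypothetical cluster assignment (in practice this comes from K-Means on word vectors).
word_to_cluster = {"great": 0, "excellent": 0, "awful": 1, "boring": 1, "actor": 2}

review = ["great", "actor", "excellent", "plot", "boring"]
features = bag_of_centroids(review, word_to_cluster, num_clusters=3)
print(features)  # [2, 1, 1]
```

The result has one entry per cluster, so the feature count equals the number of clusters, as stated on the slide.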

Page 5: Applying word vectors sentiment analysis

Approach 5: Clustering + Pretrained Vector + External Sentiment Dict.

● Pre-trained Data (using word2vec)
o Entity vectors trained on 100B words from various news articles: freebase-vectors-skipgram1000.bin.gz
o Pre-trained vectors trained on part of the Google News dataset (about 100 billion words)
● Word2Vec “distance” and “most_similar” to look up close words and find review tones
● Incorporating “SentiWordNet” information
o Positive and negative score for each word
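
The slide does not spell out how nearest-word lookups and lexicon scores were combined, so the following is only one plausible reading. It ranks neighbours by cosine similarity, which is the measure behind word2vec-style "most similar" lookups, and sums positive minus negative lexicon scores over a review. The toy vectors, the lexicon values, and the function names are all made up for illustration and do not come from the pre-trained models or from SentiWordNet.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=2):
    """Return the topn (word, similarity) pairs closest to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


def review_tone(words, lexicon):
    """Sum of (positive - negative) scores over words found in the lexicon."""
    return sum(lexicon[w][0] - lexicon[w][1] for w in words if w in lexicon)


# Made-up 2-d vectors standing in for pre-trained word vectors.
vectors = {"good": [0.9, 0.1], "great": [0.8, 0.2],
           "bad": [-0.9, 0.1], "film": [0.0, 1.0]}
# Made-up (positive, negative) scores in the style of a sentiment lexicon.
lexicon = {"good": (0.75, 0.0), "great": (0.75, 0.0), "bad": (0.0, 0.625)}

neighbours = most_similar("good", vectors, topn=1)
tone = review_tone(["good", "film", "bad", "great"], lexicon)
print(neighbours[0][0])  # great
print(tone)              # 0.875
```

A positive tone suggests a positive review under this scheme. Words missing from the lexicon could be scored through their nearest neighbours, though the slides do not say whether that was done.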

Page 6: Applying word vectors sentiment analysis

Result

Method                       Accuracy
Bag Of Words                 0.84
TF-IDF                       0.74
Vector Averaging             0.63
Bag Of Centroids             0.81
PreTrain + Ext. Knowledge    0.87
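
Assuming the Accuracy column is the fraction of reviews whose predicted label matches the true label (the slides do not say which split it was measured on), it can be computed as in this short sketch. The labels below are made up.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels: 3 of 4 predictions match.
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(acc)  # 0.75
```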

Page 7: Applying word vectors sentiment analysis

Page Analysis Chrome Extension

● Important Word List
● Important Named Entities
● Tag Distribution
● Summarization of Text
● Sentiment Analysis
○ Comment Analysis

A useful tool everybody will be able to use to extract meaningful information from a webpage.

Page 8: Applying word vectors sentiment analysis

Future Work

● Implementation of RNN, LSTM-RNN, Paragraph Vector
o Y Bengio, R Ducharme, P Vincent… - The Journal of Machine …, 2003 - dl.acm.org
o P Le, W Zuidema - COLING, 2012
o QV Le, T Mikolov, 2014
● Relational inference for wikification
o Disambiguation to Wikipedia: Pr(title|surface)
o Candidate title ← Compositional Semantics for candidate wiki page

● Extension: Reranking Google Search result using information visualization.