fyp ca2

MINING USERS' OPINIONS ON HOTELS

Upload: haha-teh

Post on 21-Jan-2015


Page 1: Fyp ca2

MINING USERS' OPINIONS ON HOTELS

Page 2: Fyp ca2

BRIEF RECAP ON CA1

Page 3: Fyp ca2

Literature Review / Background

The Web is a huge database of opinions on hotels

Commercial Possibilities / Business Intelligence

“What others think” is an important element in decision making

Opinion Mining / Sentiment Analysis

Page 4: Fyp ca2

Far From a Solved Problem

It is impossible for humans to read every single opinion, but machines can be trained to do this

People often express more than one opinion in a single review

Use of sarcasm and negation

Sentiment expressions differ by topic and domain, e.g. "big": positive when the swimming pool is big enough to swim in, negative when the queue is long

Page 5: Fyp ca2

How to train a machine to analyze sentiments

Natural Language Processing (NLP): transforms opinions into a format the machine understands

Artificial Intelligence: the machine uses the information given by NLP, and a lot of math, to analyze sentiments

Make the machine distinguish facts from opinions, as a human reader would

Page 6: Fyp ca2

Problems for the Machine

Subjectivity and Sentiment

Analyze polarity

Opinion rating

Sentiment intensity

Different domains / topic context

Facts Vs Opinion

Page 7: Fyp ca2

Examples of ambiguity for the machine

“The swimming pool is better than the tennis court”: comparisons are hard to classify

“This hotel is very boleh lah”: use of slang and cultural expressions

“This breakfast is as good as none”: the negativity is not obvious to the machine

“The weather is hot”: the same statement has different polarity in different contexts

Page 8: Fyp ca2

WHAT IS DONE IN CA1

Page 9: Fyp ca2

EXTRACTION – Preparing machine to analyze data

Page 10: Fyp ca2

Review and aspects extraction process

Extract important datasets from review websites

Word handling to refine datasets

Use part-of-speech tagging to label the text and extract aspects, which are nouns

Determine the aspects / features that people are concerned about in these reviews, by occurrence and context

Page 11: Fyp ca2

Part of Speech Tagging

Assign a part-of-speech label (noun, verb, adjective, ...) to every word in the text so that the machine can process it
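A minimal sketch of tagging followed by noun-based aspect counting. The tag lexicon `TOY_TAGS` and the sample reviews are made up for illustration; a real system would use a trained tagger rather than a lookup table.

```python
from collections import Counter

# Hypothetical hand-made tag lexicon (assumption, not from the slides).
TOY_TAGS = {
    "the": "DET", "is": "VERB", "was": "VERB", "and": "CONJ",
    "pool": "NOUN", "breakfast": "NOUN", "room": "NOUN", "staff": "NOUN",
    "big": "ADJ", "clean": "ADJ", "friendly": "ADJ", "cold": "ADJ",
}

def tag(sentence):
    """Label every word; words missing from the lexicon get the tag 'UNK'."""
    return [(w, TOY_TAGS.get(w, "UNK")) for w in sentence.lower().split()]

def aspect_counts(reviews):
    """Count how often each noun (candidate aspect) occurs across reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(w for w, t in tag(review) if t == "NOUN")
    return counts

reviews = [
    "The pool is big",
    "The room was clean and the staff was friendly",
    "The breakfast was cold and the room was cold",
]
print(tag(reviews[0]))
print(aspect_counts(reviews).most_common(1))
```

Nouns that occur most often are the candidate aspects; context filtering is left out of this sketch.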

Page 12: Fyp ca2

Word Handling

Dictionary / Spelling Correction

Slang Check

Foreign language check

Singular / Plural conversion

Duplicate check
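A rough sketch of three of these steps (slang check, singular / plural conversion, duplicate check). The slang table and the suffix rules are assumptions for illustration only; spelling correction and the foreign language check would need a dictionary or language model and are not shown.

```python
# Hypothetical slang table (assumption, not from the slides).
SLANG = {"gr8": "great", "v": "very", "rm": "room"}
IRREGULAR = {"children": "child", "staff": "staff"}

def singular(noun):
    """Naive plural-to-singular rules; meant for words already tagged as nouns."""
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    if noun.endswith("ies") and len(noun) > 4:
        return noun[:-3] + "y"
    if noun.endswith("s") and not noun.endswith("ss") and len(noun) > 3:
        return noun[:-1]
    return noun

def normalise(review):
    """Lower-case, strip punctuation and expand known slang."""
    words = [w.strip(".,!?").lower() for w in review.split()]
    return [SLANG.get(w, w) for w in words if w]

def dedupe(reviews):
    """Keep the first of any reviews that are identical after normalisation."""
    seen, unique = set(), []
    for review in reviews:
        key = tuple(normalise(review))
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique

print(normalise("Gr8 rm!"))
print(dedupe(["Gr8 rm!", "great room", "Nice pool"]))
```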

Page 13: Fyp ca2

END OF CA1

Page 14: Fyp ca2

CA2 : Data Processing

Page 15: Fyp ca2

Classifying Sentiments using some existing methods

Naïve Bayes: determines the polarity of sentiments

Maximum Entropy: uses probability distributions estimated on the basis of partial knowledge

Support Vector Machine: analyzes patterns and classifies sentiments

Page 16: Fyp ca2

Naïve Bayes Classifier

To determine polarity of sentiments

P(X | Y) = P(X)P(Y | X) / P(Y)

Probability that a sentiment is positive or negative, given its contents

Probability of a word occurring, given a positive or negative sentiment

Assumption: words are independent of each other, given the sentiment

P(sentiment | sentence) = P(sentiment)P(sentence | sentiment) / P(sentence)
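The formula can be sketched as a small word-count classifier. This is an illustrative version, not the project's implementation: the training sentences are invented, add-one smoothing is an added assumption so that unseen words do not zero out a score, and P(sentence) is dropped because it is the same for every sentiment.

```python
import math
from collections import Counter, defaultdict

def train(labelled):
    """labelled: list of (sentence, label) pairs. Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in labelled:
        label_counts[label] += 1
        for w in sentence.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict(model, sentence):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)      # log P(sentiment)
        n = sum(word_counts[label].values())
        for w in sentence.lower().split():
            # log P(word | sentiment), with add-one smoothing
            score += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented toy training data (assumption).
model = train([
    ("clean room friendly staff", "pos"),
    ("great pool great breakfast", "pos"),
    ("dirty room rude staff", "neg"),
    ("cold breakfast long queue", "neg"),
])
print(predict(model, "friendly staff great pool"))
print(predict(model, "rude staff dirty room"))
```

Summing log-probabilities word by word is exactly where the independence assumption enters: each word contributes on its own, regardless of its neighbours.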

Page 17: Fyp ca2

Problem with Naïve Bayes

Polarity is assumed not to change with domain

Words within a sentiment are treated as having no relationship with each other

Words not found in the lexicon may be missed by Naïve Bayes, making the polarity inaccurate

No opinion rating to determine which sentiment is more strongly polar

Page 18: Fyp ca2

Solutions to the Naïve Bayes Problems

Establish domain-sentiment relations

Establish domain-aspect relations

Establish aspect-sentiment relations

Estimate polarity for unseeded sentiments

Estimate the strength of polarity of sentiments

Page 19: Fyp ca2

Establishing relations

Establish domains by categorizing the aspects found into domains such as food, location and security

Find the occurrences of aspects / sentiments within sentences for a particular domain

Find the polarity of sentences, aspects and sentiments, and establish the relations between them

[Diagram: relations between Domain, Aspects and Sentiments]
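One way to picture these relations is as co-occurrence counts between domains, aspects and sentiment words that appear in the same sentence. The sketch below assumes that reading; the `DOMAINS` and `SENTIMENTS` tables and the sample sentences are hypothetical.

```python
from collections import Counter

# Hypothetical domain and sentiment word lists (assumption).
DOMAINS = {"food": {"breakfast", "buffet"}, "facilities": {"pool", "gym"}}
SENTIMENTS = {"great", "cold", "big", "crowded"}

def build_relations(sentences):
    """Count domain-aspect, aspect-sentiment and domain-sentiment co-occurrences."""
    edges = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        found_sentiments = words & SENTIMENTS
        for domain, aspects in DOMAINS.items():
            found_aspects = words & aspects
            if not found_aspects:
                continue
            for aspect in found_aspects:
                edges[(domain, aspect)] += 1
                for s in found_sentiments:
                    edges[(aspect, s)] += 1
            for s in found_sentiments:
                edges[(domain, s)] += 1
    return edges

edges = build_relations([
    "The breakfast was cold",
    "Great pool and big gym",
    "The pool is crowded",
])
print(edges[("facilities", "pool")], edges[("breakfast", "cold")])
```

Sentence-level co-occurrence is crude: in the second sentence "great" is linked to "gym" as well as "pool". A finer window or a dependency parse would reduce such false links.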

Page 20: Fyp ca2

Finding polarity for unseeded sentiments

After establishing relations, we have a graph of nodes (Sentiments / Aspects)

Some nodes have no polarity after Naïve Bayes, but the nodes connected to them might

Determine the probability that the node is positive or negative given its surrounding nodes
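A simple way to estimate this, shown here only as an assumption about how it could be done, is the weighted share of positive neighbours. The graph, edge weights and seeded polarities below are invented.

```python
def p_positive(node, graph, polarity):
    """Weighted fraction of a node's seeded neighbours that are positive.

    graph: {node: {neighbour: edge_weight}}
    polarity: {node: "pos" or "neg"} for seeded nodes only
    Returns None when no neighbour has a polarity.
    """
    pos = neg = 0.0
    for neighbour, weight in graph[node].items():
        if polarity.get(neighbour) == "pos":
            pos += weight
        elif polarity.get(neighbour) == "neg":
            neg += weight
    if pos + neg == 0:
        return None
    return pos / (pos + neg)

# Invented example: "cosy" is unseeded, its neighbours are seeded.
graph = {"cosy": {"room": 3, "bed": 2, "queue": 1}}
polarity = {"room": "pos", "bed": "pos", "queue": "neg"}
print(p_positive("cosy", graph, polarity))
```

Here the positive neighbours carry weight 5 out of 6, so "cosy" would be estimated as positive.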

Page 21: Fyp ca2

Estimating the strength of polarity

Determine the strength of polarity of an unseeded node from the amount of traversal needed to reach it from surrounding nodes that have polarity

Find the shortest path to each unseeded node; together these paths form a spanning tree

The length of the path determines the strength of polarity

Page 22: Fyp ca2

Implementation

Using Dijkstra's algorithm to find the spanning tree

Page 23: Fyp ca2

Implementation

Find the cost to get from surrounding nodes to an unseeded node
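A sketch of Dijkstra's algorithm started from the seeded nodes. The graph and edge costs are invented, and the mapping from cost to strength, `1 / (1 + cost)`, is a hypothetical choice that simply makes strength fall as the path gets longer.

```python
import heapq

def dijkstra(graph, sources):
    """Cheapest cost from any source to each reachable node, plus the tree.

    graph: {node: {neighbour: cost}} with non-negative costs
    Returns (dist, parent); parent encodes the shortest-path tree.
    """
    dist = {s: 0.0 for s in sources}
    parent = {s: None for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

# Invented graph: "great" is seeded, "spacious" is unseeded.
graph = {
    "great": {"pool": 1, "breakfast": 2},
    "pool": {"great": 1, "spacious": 1},
    "breakfast": {"great": 2, "spacious": 4},
    "spacious": {"pool": 1, "breakfast": 4},
}
dist, parent = dijkstra(graph, ["great"])
strength = 1 / (1 + dist["spacious"])   # hypothetical strength formula
print(dist["spacious"], parent["spacious"], strength)
```

The cheapest route to "spacious" goes through "pool" at a total cost of 2, rather than through "breakfast" at a cost of 6. With several seeded nodes as sources, the result is a forest with one tree per seed.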

Page 24: Fyp ca2

END OF CA2

Page 25: Fyp ca2

What is going to happen in CA3?

Page 26: Fyp ca2

Prototyping

Refining parameters to come up with a prototype, mainly to solve the following problems:

Analyze polarity

Opinion rating

Sentiment intensity

Different domains / topic context

Manually analyze reviews and compare them with the prototype's output to check its effectiveness and improve accuracy

Page 27: Fyp ca2

Prototype testing

Enlarging the dataset from various hotel review sites

Merging results to find correlations between sentiment expressions on different sites

Testing on different domains, such as food, to get domain-dependent results