Exploiting Social Context for Review Quality Prediction. Yue Lu (University of Illinois at Urbana-Champaign), Panayiotis Tsaparas (Microsoft Research), Alexandros Ntoulas (Microsoft Research), Livia Polanyi (Microsoft). April 28, WWW’2010, Raleigh, NC.


Page 1: Exploiting Social Context for Review Quality Prediction

Exploiting Social Context for Review Quality Prediction

Yue Lu, University of Illinois at Urbana-Champaign
Panayiotis Tsaparas, Microsoft Research
Alexandros Ntoulas, Microsoft Research
Livia Polanyi, Microsoft

April 28, WWW’2010, Raleigh, NC

Page 2

Why Do We Care About Predicting Review Quality?

• User reviews (1764)
• User “helpfulness” votes help prioritize reading
• But not all reviews have votes:
  1. New reviews
  2. Reviews aggregated from multiple sources

Page 3

What has been done?

• Treated as a classification or regression problem [Zhang&Varadarajan`06] [Kim et al. `06] [Liu et al. `08] [Ghose&Ipeirotis `10]
• Using textual features and meta-data features

[Figure: labeled reviews (quality marked √ or ×) and unlabeled reviews (quality unknown, marked ?)]

Page 4

Reviews are NOT Stand-Alone Documents

We also observe…

Reviewer Identity + Social Network = Social Context

Our Work: Exploiting Social Context for Review Quality Prediction

Page 5

Roadmap

• Motivation
• Review Quality Prediction Algorithms
• Experimental Evaluation
• Conclusions

Page 6

Text-only Baseline

FeatureVector(review) = Textual Features:

• Text Statistics: NumSent, NumTokens, SentLen, CapRatio, UniqWordRatio
• Syntactic: POS:RB, POS:PP, POS:V, POS:CD, POS:JJ, POS:NN, POS:SYM, POS:COM, POS:FW
• Conformity: KLDiv
• Sentiment: SentiPositive, SentiNegative
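The slide lists feature names only. As a rough illustration, the sketch below computes the text-statistics features and the KLDiv conformity feature in plain Python. The tokenization, the reading of CapRatio as the fraction of capitalized tokens, comparing a review against the other reviews of the same product, and the smoothing constant `eps` are all assumptions, as are the hypothetical function names `text_statistics` and `kl_conformity`. The POS and sentiment features would additionally need a tagger and a sentiment lexicon, so they are left out.

```python
import math
import re
from collections import Counter


def text_statistics(review: str) -> dict:
    """Illustrative 'Text Statistics' features; exact definitions are assumed."""
    # Sentences: split on runs of '.', '!' or '?' and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    tokens = re.findall(r"[A-Za-z0-9']+", review)
    n_tok = len(tokens)
    return {
        "NumSent": len(sentences),
        "NumTokens": n_tok,
        # Average sentence length in tokens.
        "SentLen": n_tok / len(sentences) if sentences else 0.0,
        # Assumed: fraction of tokens that start with a capital letter.
        "CapRatio": sum(t[0].isupper() for t in tokens) / n_tok if n_tok else 0.0,
        # Distinct (lower-cased) tokens over all tokens.
        "UniqWordRatio": len({t.lower() for t in tokens}) / n_tok if n_tok else 0.0,
    }


def kl_conformity(review_tokens, product_tokens, eps=1e-3) -> float:
    """Assumed KLDiv feature: KL(P_review || P_product) over unigrams.

    The product distribution gets additive smoothing (eps) so that words
    unseen in the product's reviews do not cause a division by zero.
    """
    p = Counter(review_tokens)
    q = Counter(product_tokens)
    vocab = set(p) | set(q)
    p_total = sum(p.values())
    q_total = sum(q.values()) + eps * len(vocab)
    kl = 0.0
    for word, count in p.items():
        p_w = count / p_total
        q_w = (q[word] + eps) / q_total
        kl += p_w * math.log(p_w / q_w)
    return kl
```

A review whose word distribution matches its product's reviews scores near 0; the further it drifts from what other reviewers write, the larger the KLDiv value.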

Page 7

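Base Model: Linear Regression

Quality(review i) = Weights × FeatureVector(review i), i.e. $\mathrm{Quality}(r_i) = w^\top x_i$

$w^* = \arg\min_w \sum_{i \in L} (w^\top x_i - y_i)^2 = \arg\min_w \lVert X_L w - y \rVert^2$

Closed-form: $w^* = (X_L^\top X_L)^{-1} X_L^\top y$

where L is the set of labeled reviews, y holds their gold-standard quality, and X_L is the labeled review-feature matrix (one row per review).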
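A minimal sketch of the closed-form baseline, written in dependency-free Python so it runs anywhere: it forms the normal equations X_L^T X_L w = X_L^T y and solves them by Gaussian elimination. The helper names (`solve`, `fit_linear_regression`, `predict_quality`) are hypothetical; a practical implementation would typically use a numerical library and possibly a small ridge term for numerical stability.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b_i)] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(x_labeled, y):
    """w* = (X_L^T X_L)^{-1} X_L^T y, computed via the normal equations."""
    d = len(x_labeled[0])
    xtx = [[sum(row[i] * row[j] for row in x_labeled) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y_k for row, y_k in zip(x_labeled, y)) for i in range(d)]
    return solve(xtx, xty)


def predict_quality(w, x):
    """Quality(review) = w . FeatureVector(review)."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))
```

With an intercept column of ones as the first feature, three labeled points on the line y = 1 + 2x recover w = [1, 2], and the model then predicts 7 for a review whose second feature is 3.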

Page 8

Straightforward Approach: Adding Social Context as Features

FeatureVector(review) = Textual Features + Social Context Features

Social Context Features:

• Reviewer History: NumReview, AvgRating
• Social Network: InDegree, OutDegree, PageRank

Disadvantages:

• Social context features are not always available (anonymous reviews? a new reviewer?)
• Needs more training data
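To make the five social context features concrete, a sketch under stated assumptions: reviews arrive as hypothetical (reviewer_id, rating) pairs, AvgRating is taken to be the mean of those per-review ratings (the slide does not say which rating is averaged), and PageRank is the standard power iteration with damping 0.85, where the rank of reviewers who trust nobody is spread uniformly. InDegree counts the reviewers who trust a given reviewer; OutDegree counts the reviewers that reviewer trusts. Function names are illustrative.

```python
from collections import defaultdict


def reviewer_history(reviews):
    """reviews: list of (reviewer_id, rating) pairs (hypothetical input format)."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for reviewer, rating in reviews:
        counts[reviewer] += 1
        totals[reviewer] += rating
    return {r: {"NumReview": counts[r], "AvgRating": totals[r] / counts[r]} for r in counts}


def network_features(nodes, trust_edges, damping=0.85, iters=100):
    """trust_edges: (u, v) means u trusts v; every endpoint must appear in nodes."""
    nodes = list(nodes)
    n = len(nodes)
    out_links = {u: [] for u in nodes}
    in_degree = {u: 0 for u in nodes}
    for u, v in trust_edges:
        out_links[u].append(v)
        in_degree[v] += 1
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by reviewers with no out-links is redistributed uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return {
        u: {"InDegree": in_degree[u], "OutDegree": len(out_links[u]), "PageRank": rank[u]}
        for u in nodes
    }
```

For example, if reviewers a and b both trust c, then c gets InDegree 2 and the highest PageRank, while the ranks still sum to 1. Note that none of these features exist for an anonymous or brand-new reviewer, which is exactly the first disadvantage listed above.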

Page 9

Our Approach: Social Context as Constraints

Our Intuitions:

• Reviewer Identity: Quality(review) is related to Quality(other reviews by the same reviewer)
• Social Network: Quality(review) is related to its reviewer's social network

How to combine such intuitions with textual information?

Page 10

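Formally: Graph-based Regularizers

$w^* = \arg\min_w \left\{ \sum_{i \in L} (w^\top x_i - y_i)^2 + \beta \times \mathrm{GraphRegularizer}(w) \right\}$

• First term: the baseline loss function, computed on the labeled reviews L
• Graph Regularizer: designed to “favor” our intuitions; computed on both labeled and unlabeled reviews
• β: trade-off parameter

Advantages:

• Semi-supervised: makes use of unlabeled data
• Applicable to reviews without social context

We will define four regularizers based on four hypotheses.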

Page 11

1. Reviewer Consistency Hypothesis

[Figure: reviews 1 and 2 written by one reviewer, reviews 3 and 4 by another]

Quality(review 1) ~ Quality(review 2)
Quality(review 3) ~ Quality(review 4)

Reviewers are consistent!

Page 12

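Regularizer for Reviewer Consistency

Reviewer Regularizer = $\sum_{(i,j) \in A} [\mathrm{Quality}(r_i) - \mathrm{Quality}(r_j)]^2 = \sum_{(i,j) \in A} (w^\top x_i - w^\top x_j)^2 = w^\top X^\top \Delta_A X w$

Sum over all data (train + test), for all pairs of reviews in the same-author graph A.

[Figure: same-author graph A, linking reviews 1 and 2 and linking reviews 3 and 4]

Closed-form solution! [Zhou et al. 03] [Zhu et al. 03] [Belkin et al 06]

$w^* = (X_L^\top X_L + \beta X^\top \Delta_A X)^{-1} X_L^\top y$

where X is the review-feature matrix over all reviews (labeled and unlabeled), X_L and y are the labeled reviews' features and gold quality, and Δ_A is the graph Laplacian of A.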
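A sketch of the reviewer-consistency closed form, assuming the same-author graph links every pair of reviews written by the same reviewer with unit weight. Here X holds the feature vectors of all reviews (labeled and unlabeled), X_L only the labeled ones, y their gold quality, and the Laplacian is D − A for the same-author adjacency A. The code builds that Laplacian and solves (X_L^T X_L + β X^T Laplacian X) w = X_L^T y. The unit edge weights and all function names are illustrative assumptions.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b_i)] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def same_author_laplacian(authors):
    """Laplacian D - A of the same-author graph; authors[i] wrote review i."""
    n = len(authors)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if authors[i] == authors[j]:  # unit-weight edge between reviews i and j
                lap[i][i] += 1.0
                lap[j][j] += 1.0
                lap[i][j] -= 1.0
                lap[j][i] -= 1.0
    return lap


def fit_reviewer_regularized(x_all, authors, labeled_idx, y, beta):
    """Solve (X_L^T X_L + beta X^T Lap X) w = X_L^T y over all reviews."""
    x_l = [x_all[i] for i in labeled_idx]
    x_lt = transpose(x_l)
    reg = matmul(matmul(transpose(x_all), same_author_laplacian(authors)), x_all)
    gram = matmul(x_lt, x_l)
    d = len(gram)
    lhs = [[gram[i][j] + beta * reg[i][j] for j in range(d)] for i in range(d)]
    rhs = [sum(x_lt[i][k] * y[k] for k in range(len(y))) for i in range(d)]
    return solve(lhs, rhs)
```

Setting β = 0 recovers the text-only baseline. In a toy setting with four reviews (two per author, only three labeled), β = 1 shrinks the weight that separates reviews of the same author, pulling each author's predictions closer together; the unlabeled fourth review still contributes through the regularizer, which is what makes the method semi-supervised.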

Page 13

2. Trust Consistency Hypothesis

If reviewer A trusts reviewer B: Quality(A) - Quality(B) ≤ 0

I trust people with quality at least as good as mine!

Quality(reviewer) is defined as AVG(Quality(review)) over the reviewer's reviews.

Page 14

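Regularizer for Trust Consistency

Trust Regularizer = $\sum_{(u,v) \in T} \max[0,\ \mathrm{Quality}(u) - \mathrm{Quality}(v)]^2$

Sum over all data (train + test), for all pairs of reviewers connected in the trust graph T, where reviewer u trusts reviewer v and Quality(u) is the average predicted quality of u's reviews.

No closed-form solution… but the objective is still convex, so it is minimized by gradient descent.

[Figure: trust graph]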
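Because the hinge term max[0, ·]² has no closed-form minimizer, here is a sketch of plain gradient descent on the full objective: squared loss on the labeled reviews plus β times the trust regularizer. Quality(u) is computed as w · x̄_u, where x̄_u is the mean feature vector of reviewer u's reviews; for a linear model this equals the average predicted quality of u's reviews. The step size, iteration count, zero initialization, and all function names are assumptions, not the authors' settings.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def reviewer_mean_features(x_all, authors):
    """Mean feature vector per reviewer, so Quality(u) = w . xbar[u] for a linear model."""
    sums, counts = {}, {}
    for x, author in zip(x_all, authors):
        acc = sums.setdefault(author, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[author] = counts.get(author, 0) + 1
    return {a: [v / counts[a] for v in acc] for a, acc in sums.items()}


def trust_gradient(w, x_l, y, xbar, trust_edges, beta):
    """Gradient of sum (w.x_i - y_i)^2 + beta * sum max(0, Q(u) - Q(v))^2."""
    grad = [0.0] * len(w)
    for x, target in zip(x_l, y):
        r = dot(w, x) - target
        for k in range(len(w)):
            grad[k] += 2.0 * r * x[k]
    for u, v in trust_edges:  # u trusts v
        gap = dot(w, xbar[u]) - dot(w, xbar[v])
        if gap > 0:  # hypothesis violated: truster predicted above trustee
            for k in range(len(w)):
                grad[k] += 2.0 * beta * gap * (xbar[u][k] - xbar[v][k])
    return grad


def fit_trust_regularized(x_all, authors, labeled_idx, y, trust_edges, beta,
                          lr=0.01, steps=5000):
    xbar = reviewer_mean_features(x_all, authors)
    x_l = [x_all[i] for i in labeled_idx]
    w = [0.0] * len(x_all[0])
    for _ in range(steps):
        grad = trust_gradient(w, x_l, y, xbar, trust_edges, beta)
        w = [w_k - lr * g_k for w_k, g_k in zip(w, grad)]
    return w
```

When no trust edge is violated the penalty stays inactive and the result matches the text-only baseline; reversing an edge so that a higher-quality reviewer trusts a lower-quality one activates the penalty and pulls the two reviewers' predicted qualities toward each other.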

Page 15

3. Co-Citation Consistency Hypothesis

If reviewers A and B are trusted by the same reviewer (co-cited): Quality(A) - Quality(B) → 0

[Figure: trust graph and the derived co-citation graph]

I am consistent with my “trust standard”!

Page 16

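Regularizer for Co-citation Consistency

Co-citation Regularizer = $\sum_{(u,v) \in C} [\mathrm{Quality}(u) - \mathrm{Quality}(v)]^2$

Sum over all data (train + test), for all pairs of reviewers connected in the co-citation graph C.

Closed-form solution!

$w^* = (X_L^\top X_L + \beta X^\top M^\top \Delta_C M X)^{-1} X_L^\top y$

where M is the review-reviewer matrix ($M_{ui} = 1/|R_u|$ if reviewer u wrote review i and 0 otherwise, with R_u the set of u's reviews, so M X w gives each reviewer's average predicted quality), Δ_C is the graph Laplacian of C, X is the review-feature matrix over all reviews, and X_L, y are the labeled reviews' features and gold quality.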
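A sketch of the co-citation regularizer under two assumptions: two reviewers are co-cited when at least one reviewer trusts both of them (edges are unweighted, however many trusters they share), and a reviewer's quality is the mean prediction over their reviews, so each reviewer is represented by a mean feature vector x̄_u (one row of M X, with M the review-reviewer averaging matrix). Summing the outer products (x̄_u − x̄_v)(x̄_u − x̄_v)^T over co-citation edges yields X^T M^T Δ_C M X without forming the Laplacian explicitly. Function names are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b_i)] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cocitation_edges(trust_edges):
    """Reviewers u, v are co-cited if some reviewer trusts both of them."""
    trusted_by = {}
    for truster, trustee in trust_edges:
        trusted_by.setdefault(truster, set()).add(trustee)
    edges = set()
    for trustees in trusted_by.values():
        for u, v in combinations(sorted(trustees), 2):
            edges.add((u, v))
    return sorted(edges)


def reviewer_means(x_all, authors):
    """Rows of M X: the mean feature vector of each reviewer's reviews."""
    groups = {}
    for x, author in zip(x_all, authors):
        groups.setdefault(author, []).append(x)
    return {a: [sum(col) / len(rows) for col in zip(*rows)] for a, rows in groups.items()}


def fit_cocitation_regularized(x_all, authors, labeled_idx, y, trust_edges, beta):
    d = len(x_all[0])
    xbar = reviewer_means(x_all, authors)
    x_l = [x_all[i] for i in labeled_idx]
    lhs = [[sum(r[i] * r[j] for r in x_l) for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(x_l, y)) for i in range(d)]
    for u, v in cocitation_edges(trust_edges):
        if u in xbar and v in xbar:  # skip reviewers with no reviews in the data
            diff = [p - q for p, q in zip(xbar[u], xbar[v])]
            for i in range(d):
                for j in range(d):
                    lhs[i][j] += beta * diff[i] * diff[j]
    return solve(lhs, rhs)
```

For instance, if reviewer a trusts both b and c, then b and c become neighbors in C even though neither trusts the other, and the fit is nudged toward giving their reviews similar average quality.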

Page 17

4. Link Consistency Hypothesis

If reviewers A and B are linked in the trust graph: Quality(A) - Quality(B) → 0

[Figure: trust graph and the derived link graph]

I trust people whose quality is similar to mine!

Page 18

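Regularizer for Link Consistency

Link Regularizer = $\sum_{(u,v) \in \text{Link Graph}} [\mathrm{Quality}(u) - \mathrm{Quality}(v)]^2$

Closed-form solution! (Same form as for the co-citation regularizer, with the Laplacian of the link graph in place of that of the co-citation graph.)

Sum over all data (train + test), for all pairs of reviewers connected in the link graph.

[Figure: link graph]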
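The closed forms for the graph regularizers rest on the identity Σ over edges (u, v) of (q_u − q_v)² = q^T Δ q for an unweighted graph with Laplacian Δ. Below is a small check of that identity on a link graph, assuming (for illustration only) that u and v are linked when either trusts the other and that mutual trust yields a single undirected edge. The reviewer qualities q and all function names are hypothetical.

```python
def link_edges(trust_edges):
    """Assumed undirected link graph: u and v are linked if either trusts the other."""
    return sorted({tuple(sorted((u, v))) for u, v in trust_edges if u != v})


def laplacian(nodes, edges):
    """Unweighted graph Laplacian D - A, indexed in the order of nodes."""
    index = {u: k for k, u in enumerate(nodes)}
    lap = [[0.0] * len(nodes) for _ in nodes]
    for u, v in edges:
        i, j = index[u], index[v]
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def link_regularizer_by_edges(q, edges):
    """Sum of squared quality differences over linked reviewer pairs."""
    return sum((q[u] - q[v]) ** 2 for u, v in edges)


def link_regularizer_by_laplacian(q, nodes, edges):
    """The same quantity written as the quadratic form q^T Lap q."""
    vec = [q[u] for u in nodes]
    lap = laplacian(nodes, edges)
    n = len(vec)
    return sum(vec[i] * lap[i][j] * vec[j] for i in range(n) for j in range(n))
```

Because q = M X w is linear in w, the quadratic form becomes w^T X^T M^T Δ M X w, which is why adding it to the squared loss still leaves a linear system to solve.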

Page 19

Roadmap

• Motivation
• Review Quality Prediction Algorithms
• Experimental Evaluation
• Conclusions

Page 20

Data from Ciao UK

Statistics                      Cellphone   Beauty   Digital Camera
# Reviews                       1943        4849     3697
Reviews/Reviewer ratio          2.21        2.84     1.06
Trust Graph Density             0.0075      0.014    0.0006

Summary                         Cellphone   Beauty   Digital Camera
Social Context                  rich        rich     sparse
Gold-std Quality Distribution   balanced    skewed   balanced

Page 21

Hypothesis Testing: Reviewer Consistency

[Figure (Cellphone): density of the difference in gold-standard review quality, Qg(review 1) - Qg(review 2) for reviews from the same reviewer vs. Qg(review 1) - Qg(review 3) for reviews from different reviewers]

The Reviewer Consistency Hypothesis is supported by the data.
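The density plot compares gold-quality differences for review pairs by the same reviewer against pairs by different reviewers. A minimal sketch of how those two samples could be collected; the function names and toy numbers are hypothetical, and while the slide plots full densities, the sketch only compares mean absolute differences.

```python
from itertools import combinations
from statistics import mean


def pairwise_quality_differences(gold, authors):
    """Split Qg(i) - Qg(j) over all review pairs by whether the authors match."""
    same, different = [], []
    for i, j in combinations(range(len(gold)), 2):
        target = same if authors[i] == authors[j] else different
        target.append(gold[i] - gold[j])
    return same, different


def mean_abs(values):
    return mean(abs(v) for v in values)
```

If the hypothesis holds, the same-reviewer differences should concentrate around zero, giving a smaller mean absolute difference than the different-reviewer sample.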

Page 22

Hypothesis Testing: Social Network-based Consistencies

[Figure (Cellphone): density of the difference in gold-standard reviewer quality, Qg(B) - Qg(A), for four cases: B is not linked to A; B trusts A; B is co-cited with A; B is linked to A]

The Social Network-based Consistencies are supported by the data.

Page 23

Prediction Performance: Exploiting Social Context

[Figure (Cellphone): % of MSE difference (y-axis, -15% to 15%; lower is better) vs. percentage of training data (x-axis: 10%, 25%, 50%, 100%) for AddFeatures, Reg:Reviewer, Reg:Trust, Reg:Cocitation, and Reg:Link]

• AddFeatures is most effective given sufficient training data
• With limited training data, the Reg methods work best
• Reg:Reviewer > Reg:Trust > Reg:Cocitation > Reg:Link
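The y-axis reports a percentage MSE difference. One plausible reading, assumed here rather than stated on the slide, is the relative change in MSE with respect to the text-only baseline, so negative values mean lower error. The function names are hypothetical.

```python
def mse(pred, gold):
    """Mean squared error between predicted and gold-standard quality."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)


def pct_mse_difference(pred_method, pred_baseline, gold):
    """Assumed metric: relative MSE change vs. the baseline, in percent (negative = better)."""
    base = mse(pred_baseline, gold)
    return 100.0 * (mse(pred_method, gold) - base) / base
```

For gold qualities [1, 2, 3], a baseline predicting [2, 2, 2] has MSE 2/3; a method predicting [1, 2, 2] has MSE 1/3, i.e. a difference of -50%.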

Page 24

Prediction Performance: Comparing Three Categories

[Figure: % of MSE difference (y-axis, -15% to -1%; lower is better) of Reg:Link, Reg:Reviewer, Reg:Cocitation, and Reg:Trust on Cellphone, Beauty, and Digital Camera]

The improvement on Digital Camera is smaller due to its sparse social context (Reviews/Reviewer ratio = 1.06).

Page 25

Parameter Sensitivity

[Figure (Cellphone, Beauty): mean squared error vs. regularization parameter, with the text-only baseline for reference; lower is better]

Consistently better than the baseline when the parameter < 0.1

Page 26

Conclusions

• Improve Review Quality Prediction using Social Context

• Formalize into a Semi-supervised Graph Regularization framework
  – Utilize both labeled and unlabeled data
  – Applicable to data with no social context

• Promising results on real-world data
  – Especially with limited labels and rich social context

Page 27

Future Work

• Combine multiple regularizers
• Optimize by nDCG instead of MSE
• Infer trust network
• Spam detection

Page 28

Thank you! & Questions?