Exploiting Social Context for Review Quality Prediction
Yue Lu, University of Illinois at Urbana-Champaign
Panayiotis Tsaparas, Microsoft Research
Alexandros Ntoulas, Microsoft Research
Livia Polanyi, Microsoft
April 28, WWW’2010, Raleigh, NC
2
Why do we care about Predicting Review Quality?
User reviews (1764)
User “helpfulness” votes help prioritize reading
But not all reviews have votes:
1. New reviews
2. Reviews aggregated from multiple sources
3
What has been done?
• As classification or regression problem
[Zhang&Varadarajan `06] [Kim et al. `06] [Liu et al. `08] [Ghose&Ipeirotis `10]
• Textual features
• Meta-data features
[Figure: labeled reviews (√ or ×) and unlabeled reviews (?)]
4
Reviews are NOT Stand-Alone Documents
We also observe…
Reviewer Identity + Social Network = Social Context
Our Work: Exploiting Social Context for Review Quality Prediction
5
Roadmap
• Motivation
• Review Quality Prediction Algorithms
• Experimental Evaluation
• Conclusions
6
Text-only Baseline
Textual Features
Text Statistics
• NumSent
• NumTokens
• SentLen
• CapRatio
• UniqWordRatio
Syntactic
• POS:RB
• POS:PP
• POS:V
• POS:CD
• POS:JJ
• POS:NN
• POS:SYM
• POS:COM
• POS:FW
Conformity
• KLDiv
Sentiment
• SentiPositive
• SentiNegative
FeatureVector(review) = [Text Statistics, Syntactic, Conformity, Sentiment]
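A minimal sketch of how the Text Statistics group could be computed for one review. The slide gives only the feature names, so the definitions here are assumptions: sentences are split on ., ! and ?, tokens are alphanumeric runs, SentLen is tokens per sentence, CapRatio is the share of tokens starting with a capital letter, and UniqWordRatio is distinct lowercased tokens over all tokens. The function name `text_statistics` is hypothetical.

```python
import re

def text_statistics(review_text):
    """Return text-statistics features for one review (definitions are assumptions)."""
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", review_text) if s.strip()]
    # Assumption: a token is a run of letters, digits, or apostrophes.
    tokens = re.findall(r"[A-Za-z0-9']+", review_text)
    num_sent = len(sentences)
    num_tokens = len(tokens)
    capitalized = [t for t in tokens if t[0].isupper()]
    unique = {t.lower() for t in tokens}
    return {
        "NumSent": num_sent,
        "NumTokens": num_tokens,
        "SentLen": num_tokens / num_sent if num_sent else 0.0,
        "CapRatio": len(capitalized) / num_tokens if num_tokens else 0.0,
        "UniqWordRatio": len(unique) / num_tokens if num_tokens else 0.0,
    }

features = text_statistics("Great phone. The battery lasts long! The screen is great.")
```

The syntactic, conformity, and sentiment groups would need a POS tagger, a background language model, and a sentiment lexicon, which are outside this sketch.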
7
Base Model: Linear Regression
Quality(review i) = Weights × FeatureVector(review i)
w = argmin { Loss function on labeled reviews }
Closed-form solution for w
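The formulas on this slide did not survive in the transcript. As a sketch under the assumption that the loss is ordinary squared error, the textbook closed form is w = (XᵀX + αI)⁻¹ Xᵀq, where X holds one feature vector per labeled review and q the gold quality scores. The L2 penalty `alpha` and the function names are hypothetical, not taken from the slides.

```python
import numpy as np

def fit_linear_regression(X, q, alpha=0.0):
    """Closed-form least squares: w = (X^T X + alpha*I)^{-1} X^T q.

    X: (n_reviews, n_features) feature matrix of labeled reviews.
    q: (n_reviews,) gold-standard quality scores.
    alpha: hypothetical L2 penalty (assumption); 0 gives plain least squares.
    """
    n_features = X.shape[1]
    lhs = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(lhs, X.T @ q)

def predict_quality(X, w):
    """Quality(review i) = Weights x FeatureVector(review i)."""
    return X @ w

# Toy data generated by quality = 2*x1 + 3*x2.
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
q_train = np.array([2.0, 3.0, 5.0])
w = fit_linear_regression(X_train, q_train)
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit matrix inverse.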
8
Straightforward Approach: Adding Social Context as Features
Textual Features
Social Context Features
Reviewer History
• NumReview
• AvgRating
Social Network
• InDegree
• OutDegree
• PageRank
FeatureVector(review) = [Textual Features, Social Context Features]
Disadvantages:
• Social context features not always available
  • Anonymous reviews?
  • A new reviewer?
• Need more training data
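A small sketch of two of the Social Network features named on this slide, InDegree and OutDegree. It assumes the trust graph is given as (truster, trusted) pairs; that representation and the name `degree_features` are hypothetical. PageRank is left out to keep the example short.

```python
from collections import Counter

def degree_features(trust_edges):
    """Map each reviewer to InDegree (times trusted) and OutDegree (people trusted)."""
    out_degree = Counter(truster for truster, _ in trust_edges)
    in_degree = Counter(trusted for _, trusted in trust_edges)
    reviewers = set(out_degree) | set(in_degree)
    return {
        r: {"InDegree": in_degree[r], "OutDegree": out_degree[r]}
        for r in reviewers
    }

# Assumption: ("ann", "bob") means ann trusts bob.
trust_edges = [("ann", "bob"), ("cat", "bob"), ("bob", "ann")]
social = degree_features(trust_edges)
```

A reviewer who appears in no edge gets no entry here, which mirrors the disadvantage listed on the slide: anonymous or new reviewers have no social context features.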
9
Our Approach: Social Context as Constraints
Our Intuitions:
Reviewer Identity: Quality(review) is related to Quality(reviewer)
Social Network: Quality(reviewer) is related to its Social Network
How to combine such intuitions with Textual info?
10
Formally: Graph-based Regularizers
w = argmin { Baseline Loss function + β × Graph Regularizer }
• Baseline Loss function: computed on Labeled data
• Graph Regularizer: designed to “favor” our intuitions; computed on Labeled and Unlabeled data
• β: Trade-off parameter
Advantages:
• Semi-supervised: make use of unlabeled data
• Applicable to reviews without social context
We will define four regularizers based on four hypotheses.
11
1. Reviewer Consistency Hypothesis
Quality(review 1) ~ Quality(review 2)
Quality(review 3) ~ Quality(review 4)
(reviews 1 and 2 share a reviewer, as do reviews 3 and 4)
Reviewers are consistent!
12
Regularizer for Reviewer Consistency
Reviewer Regularizer = ∑ [ Quality(review 1) - Quality(review 2) ]²
Sum over all data (train + test) for all pairs of reviews in the same-author graph
Same-Author Graph (A): edges 1–2 and 3–4
Closed-form solution! w is expressed with the Graph Laplacian of A and the Review-Feature Matrix
[Zhou et al. 03] [Zhu et al. 03] [Belkin et al 06]
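A sketch of the closed form this slide refers to, under stated assumptions. With f = Xw the predicted quality of every review and L = D − A the Laplacian of the same-author graph, the sum of squared differences over same-author pairs equals fᵀLf. Assuming the objective ½ Σ_labeled (xᵢ·w − qᵢ)² + ½ β fᵀLf + ½ α‖w‖², setting the gradient to zero gives w = (X_lᵀX_l + β XᵀLX + αI)⁻¹ X_lᵀq. The small ridge term `alpha`, the scaling constants, and the function names are hypothetical.

```python
import numpy as np

def same_author_laplacian(authors):
    """Graph Laplacian L = D - A of the same-author graph over reviews."""
    authors = np.asarray(authors)
    # A[i, j] = 1 when reviews i and j share an author.
    A = (authors[:, None] == authors[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return np.diag(A.sum(axis=1)) - A

def fit_with_reviewer_regularizer(X_all, authors, labeled_idx, q_labeled,
                                  beta, alpha=1e-6):
    """Closed-form weights using labeled and unlabeled reviews.

    X_all: (n_reviews, n_features) Review-Feature matrix, train + test.
    alpha: hypothetical ridge term that keeps the system invertible.
    """
    L = same_author_laplacian(authors)
    X_l = X_all[labeled_idx]
    n_features = X_all.shape[1]
    lhs = X_l.T @ X_l + beta * (X_all.T @ L @ X_all) + alpha * np.eye(n_features)
    return np.linalg.solve(lhs, X_l.T @ q_labeled)

# Four reviews; reviews 0 and 1 share an author, as do reviews 2 and 3.
X_all = np.array([[1.0, 0.0], [1.0, 0.5], [0.0, 1.0], [0.5, 1.0]])
authors = ["u1", "u1", "u2", "u2"]
labeled_idx = [0, 2]                 # only reviews 0 and 2 have votes
q_labeled = np.array([4.0, 2.0])
w = fit_with_reviewer_regularizer(X_all, authors, labeled_idx, q_labeled, beta=0.5)
```

Unlabeled reviews 1 and 3 contribute no loss term, yet they still shape w through the regularizer, which is the semi-supervised part.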
13
2. Trust Consistency Hypothesis
Quality(B) - Quality(A) ≤ 0, where reviewer B trusts reviewer A
I trust people with quality at least as good as mine!
Quality(reviewer) is defined as AVG( Quality(review) ) over the reviewer's reviews
14
Regularizer for Trust Consistency
Trust Regularizer = ∑ max[0, Quality(B) - Quality(A)]², where reviewer B trusts reviewer A
Sum over all data (train + test) for all pairs of reviewers connected in the trust graph
No closed-form solution… Still convex → Gradient Descent
[Figure: Trust Graph]
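A gradient-descent sketch for the trust regularizer, under stated assumptions. Reviewer quality is the average predicted quality of the reviewer's reviews, so Quality(u) = m_u·w with m_u the mean feature vector of reviewer u. The assumed objective is ½ Σ_labeled (xᵢ·w − qᵢ)² + ½ β Σ max(0, (m_u − m_v)·w)² over trust edges "u trusts v". The step size, iteration count, scaling constants, and function name are hypothetical choices.

```python
import numpy as np

def fit_with_trust_regularizer(X_l, q_l, M, trust_edges, beta,
                               lr=0.05, steps=2000):
    """Plain gradient descent on squared loss plus the hinge-squared trust term.

    X_l: (n_labeled, n_features) features of labeled reviews.
    M: (n_reviewers, n_features) mean feature vector per reviewer.
    trust_edges: (u, v) index pairs meaning reviewer u trusts reviewer v.
    """
    n_features = X_l.shape[1]
    # D[e] = m_u - m_v, so D @ w = Quality(u) - Quality(v) for each edge.
    D = np.array([M[u] - M[v] for u, v in trust_edges]).reshape(-1, n_features)
    w = np.zeros(n_features)
    for _ in range(steps):
        hinge = np.maximum(0.0, D @ w)  # only violated edges are penalized
        grad = X_l.T @ (X_l @ w - q_l) + beta * (D.T @ hinge)
        w -= lr * grad
    return w

X_l = np.array([[1.0, 0.0], [0.0, 1.0]])
q_l = np.array([3.0, 1.0])
M = np.array([[1.0, 0.0], [0.0, 1.0]])  # reviewer 0 and reviewer 1
edges = [(0, 1)]                         # reviewer 0 trusts reviewer 1
w_plain = fit_with_trust_regularizer(X_l, q_l, M, edges, beta=0.0)
w_trust = fit_with_trust_regularizer(X_l, q_l, M, edges, beta=1.0)
```

In this toy case the labels alone give the truster a higher quality than the reviewer they trust (3 vs. 1); the regularizer pulls the two estimates toward each other without forcing them equal.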
15
3. Co-Citation Consistency Hypothesis
Quality(B) - Quality(A) → 0, where reviewers A and B are trusted by the same reviewer
Trust Graph → Co-citation Graph
I am consistent with my “trust standard”!
16
Regularizer for Co-citation Consistency
Co-citation Regularizer = ∑ [ Quality(B) - Quality(A) ]²
Sum over all data (train + test) for all pairs of reviewers connected in the co-citation graph
Co-citation Graph (C)
Closed-form solution! w is expressed with the Review-Reviewer Matrix
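A sketch of how a co-citation graph could be derived from the trust graph, assuming two reviewers are co-cited when at least one common reviewer trusts both. The edge representation and the name `cocitation_edges` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cocitation_edges(trust_edges):
    """Return undirected pairs of reviewers trusted by a common reviewer."""
    trusts = defaultdict(set)
    for truster, trusted in trust_edges:
        trusts[truster].add(trusted)
    edges = set()
    for trusted_set in trusts.values():
        # Every pair trusted by the same person becomes one co-citation edge.
        for a, b in combinations(sorted(trusted_set), 2):
            edges.add((a, b))
    return edges

# Assumption: ("ann", "bob") means ann trusts bob.
trust_edges = [("ann", "bob"), ("ann", "cat"), ("dan", "cat"), ("dan", "eve")]
cocited = cocitation_edges(trust_edges)
```

Because the resulting graph is undirected and the penalty is a plain squared difference, the regularizer has the same quadratic form as the reviewer regularizer, which is why a closed-form solution exists.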
17
4. Link Consistency Hypothesis
Quality(B) - Quality(A) → 0, where reviewers A and B are linked in the trust graph in either direction
Trust Graph → Link Graph
I trust people with similar quality as mine!
18
Regularizer for Link Consistency
Link Regularizer = ∑ [ Quality(B) - Quality(A) ]²
Sum over all data (train + test) for all pairs of reviewers connected in the link graph
Closed-form solution!
[Figure: Link Graph]
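A sketch of the link graph and the value of the link regularizer, assuming the link graph is the trust graph with edge direction dropped. The reviewer quality scores in the example are made up for illustration, and both function names are hypothetical.

```python
def link_edges(trust_edges):
    """Drop direction: one undirected edge per linked pair of reviewers."""
    return {tuple(sorted((a, b))) for a, b in trust_edges if a != b}

def link_regularizer(quality, edges):
    """Sum of squared reviewer-quality differences over linked pairs."""
    return sum((quality[a] - quality[b]) ** 2 for a, b in edges)

# Assumption: ("ann", "bob") means ann trusts bob.
trust_edges = [("ann", "bob"), ("bob", "ann"), ("cat", "ann")]
links = link_edges(trust_edges)
quality = {"ann": 4.0, "bob": 3.0, "cat": 1.0}  # illustrative scores only
penalty = link_regularizer(quality, links)
```

Mutual trust between ann and bob collapses to a single undirected edge, so that pair is counted once.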
19
Roadmap
• Motivation
• Review Quality Prediction Algorithms
• Experimental Evaluation
• Conclusions
20
Data from Ciao UK

Statistics                      Cellphone   Beauty   Digital Camera
# Reviews                       1943        4849     3697
Reviews/Reviewer ratio          2.21        2.84     1.06
Trust Graph Density             0.0075      0.014    0.0006

Summary                         Cellphone   Beauty   Digital Camera
Social Context                  rich        rich     sparse
Gold-std Quality Distribution   balanced    skewed   balanced
21
Hypotheses Testing: Reviewer Consistency
[Figure (Cellphone): density of the difference in review quality, Qg(review 1) - Qg(review 2) from the same reviewer vs. Qg(review 1) - Qg(review 3) from different reviewers]
Reviewer Consistency Hypothesis supported by data
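A sketch of the comparison behind this test: collect quality differences for review pairs from the same reviewer and for pairs from different reviewers, then compare the two groups. The slide plots densities of the difference; using absolute differences and their means here is a simplifying assumption, and the data and the name `quality_gaps` are made up for illustration.

```python
from itertools import combinations
from statistics import mean

def quality_gaps(reviews):
    """reviews: list of (reviewer_id, gold_quality).

    Returns absolute quality differences for same-reviewer pairs
    and for different-reviewer pairs.
    """
    same, different = [], []
    for (r1, q1), (r2, q2) in combinations(reviews, 2):
        (same if r1 == r2 else different).append(abs(q1 - q2))
    return same, different

# Illustrative data only.
reviews = [("u1", 4.0), ("u1", 5.0), ("u2", 1.0), ("u2", 2.0)]
same, different = quality_gaps(reviews)
same_mean, different_mean = mean(same), mean(different)
```

A noticeably smaller gap for same-reviewer pairs is the pattern the hypothesis predicts.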
22
Hypotheses Testing: Social Network-based Consistencies
[Figure (Cellphone): density of the difference in reviewer quality, Qg(B) - Qg(A), for four cases: B is not linked to A; B trusts A; B is co-cited with A; B is linked to A]
Social Network-based Consistencies supported by data
23
Prediction Performance: Exploiting Social Context
[Figure (Cellphone): % of MSE Difference (axis from -15% to 15%) vs. Percentage of Training Data (10%, 25%, 50%, 100%) for AddFeatures, Reg:Reviewer, Reg:Trust, Reg:Cocitation, Reg:Link]
AddFeatures is most effective given sufficient training data
With limited training data, Reg methods work best
Reg:Reviewer > Reg:Trust > Reg:Cocitation > Reg:Link
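A sketch of the metric on the vertical axis, assuming "% of MSE Difference" means the relative change in mean squared error against the text-only baseline, so negative values mean lower error. That definition, the toy predictions, and the function names are assumptions.

```python
def mse(predicted, gold):
    """Mean squared error between predicted and gold quality scores."""
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold)

def pct_mse_difference(method_mse, baseline_mse):
    """Relative MSE change vs. the baseline, in percent (negative = better)."""
    return 100.0 * (method_mse - baseline_mse) / baseline_mse

# Illustrative numbers only.
gold = [3.0, 4.0, 5.0]
baseline_mse = mse([2.0, 4.0, 7.0], gold)
method_mse = mse([3.0, 4.0, 6.0], gold)
change = pct_mse_difference(method_mse, baseline_mse)
```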
24
Prediction Performance: Compare Three Categories
[Figure: % of MSE Difference (axis from -15% to -1%) for Reg:Reviewer, Reg:Trust, Reg:Cocitation, Reg:Link on Cellphone, Beauty, Digital Camera]
Improvement on Digital Camera is smaller due to sparse social context
Reviews/Reviewer ratio = 1.06
25
Parameter Sensitivity
[Figure (Cellphone, Beauty): Mean Squared Error vs. Regularization Parameter, compared with the Text-only Baseline]
Consistently better than Baseline when parameter < 0.1
26
Conclusions
• Improve Review Quality Prediction using Social Context
• Formalize into a Semi-supervised Graph Regularization framework
• Utilize both labeled and unlabeled data
• Applicable to data with no social context
• Promising results on real world data
  – Esp. limited labels, rich social context
27
Future Work
• Combine multiple regularizers
• Optimize by nDCG instead of MSE
• Infer trust network
• Spam detection
Thank you! & Questions?