artificial intelligence

Authors:

Cristian Danescu-Niculescu-Mizil (Cornell University, NY, USA)

Gueorgi Kossinets (Google Inc., Mountain View, CA, USA)

Jon Kleinberg (Cornell University, NY, USA)

Lillian Lee (Cornell University, NY, USA)

Abstract

Developing a framework for analyzing and modeling opinion evaluation, using a large-scale collection of Amazon book reviews as a dataset.

The perceived helpfulness of a review depends not only on the content of the review, but also on how its expressed evaluation relates to other evaluations of the same product.

Introduction

Understanding how people’s opinions are received and evaluated is a fundamental problem.

The problem arises in many domains, such as:

Marketing: studies of the impact of reviews on product sales.

Political science: models of how support for a candidate depends on the views he or she expresses on different topics.

Data

The dataset (after various filters) comprises 674,018 books.

99.3% of books had 100 or fewer reviews.

For the remaining 4,664 books, only the 100 earliest reviews for each product were retrieved.

Total reviews: 4,043,103. The analysis focuses on the 1,008,466 reviews that had at least 10 helpfulness votes each.

Important Terms

Star rating: the number of stars a review assigns to the product.

Helpfulness votes: the information that a out of b people found the review helpful, in answer to the question of whether the review helps in making a purchase decision. These votes make it possible to understand the way in which Amazon users evaluate each other’s reviews in practice.

Average star rating: the average of the ratings given by all reviews of that product, as displayed by Amazon.

Computed product-average star rating: the average star rating as computed over all reviews of that product in our dataset.
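
The terms above can be illustrated with a minimal Python sketch. It assumes that the helpfulness ratio referred to on later slides is the fraction a/b taken from the helpfulness votes. The review tuples and function names are hypothetical and are not taken from the dataset.

```python
from statistics import mean

# Hypothetical toy reviews of one product: (star_rating, helpful_votes a, total_votes b).
reviews = [(5, 8, 10), (4, 15, 20), (1, 3, 12)]

def helpfulness_ratio(a, b):
    """Fraction a/b of voters who found the review helpful (assumed definition)."""
    return a / b

def computed_average(product_reviews):
    """Average star rating over the reviews we actually hold for the product."""
    return mean(stars for stars, _, _ in product_reviews)

ratios = [helpfulness_ratio(a, b) for _, a, b in reviews]
average = computed_average(reviews)
```

Here `ratios` holds one value per review, and `average` plays the role of the computed product-average star rating.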

[Figure labels: Helpfulness Votes; Star Rating of Reviews; Average Star Rating by Amazon]

Difference between Average and Computed Average Rating:

1. Amazon rounds the displayed star average to the nearest half-star (e.g., 3.5 or 4.0), but for the experiments it was preferable to have a greater degree of resolution.

2. For a very small fraction (0.7%) of the books considered, the displayed star average is partially based on reviews that came later than the first 100 and are therefore not in our dataset.

However, the mean absolute difference between the computed star average, when rounded to the nearest half-star (0.5 increment), and the displayed star average is only 0.02.
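
The comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the tie rule (halves round up) is a guess, since the slides do not say how Amazon breaks ties, and the numbers are invented toy values rather than data from the study.

```python
import math

def round_to_half_star(value):
    """Round to the nearest 0.5; ties round up (an assumed tie rule)."""
    return math.floor(value * 2 + 0.5) / 2

def mean_absolute_difference(computed, displayed):
    """Mean |rounded computed average - displayed average| over products."""
    diffs = [abs(round_to_half_star(c) - d) for c, d in zip(computed, displayed)]
    return sum(diffs) / len(diffs)

# Hypothetical computed averages and the averages displayed for the same books.
computed = [3.62, 4.24, 2.80]
displayed = [3.5, 4.5, 3.0]
mad = mean_absolute_difference(computed, displayed)
```

In this toy input only the second book disagrees after rounding, so the mean absolute difference is one half-star spread over three books.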

Variance: describes how far the star ratings of a product's reviews lie from their mean.

If the variance is very low, the reviews with the highest helpfulness ratios are those with the average star rating.

For moderate values of the variance, the most helpful reviews are those slightly above the average star rating.

If the variance becomes large, reviews with star ratings both above and below the average are evaluated as more helpful than those with the average star rating (with the positive reviews deemed somewhat more helpful).
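
A minimal sketch of the per-product variance follows. It assumes the population variance (the slides do not say whether population or sample variance was used) and buckets values to the nearest 0.5, as mentioned in the caption of Figure 3. The product names and ratings are hypothetical.

```python
import math
from statistics import pvariance

def rating_variance(star_ratings):
    """Population variance of a product's star ratings (assumed variant)."""
    return pvariance(star_ratings)

def variance_bucket(variance):
    """Round a variance to the nearest 0.5 increment; ties round up (assumption)."""
    return math.floor(variance * 2 + 0.5) / 2

# Hypothetical products: one with broad agreement, one with polarized ratings.
ratings_by_product = {
    "book_a": [5, 5, 4, 5],
    "book_b": [1, 5, 1, 5],
}
buckets = {
    name: variance_bucket(rating_variance(ratings))
    for name, ratings in ratings_by_product.items()
}
```

The polarized product lands in a high-variance bucket, which is the regime in which reviews both above and below the average are reported to be rated as more helpful.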

Theories underlying Helpfulness Evaluation

Conformity Hypothesis: a review is evaluated as more helpful when its star rating is closer to the average star rating.

Individual-bias Hypothesis: a review is considered more helpful when its opinion is close to the evaluator’s own opinion.

Brilliant-but-cruel Hypothesis: negative reviews are considered more insightful, and hence more helpful, than positive ones.

Quality-only straw-man Hypothesis: helpfulness votes depend purely on the textual content of a review; non-textual factors such as the star rating are merely correlates of text quality.

Conformity Hypothesis

The helpfulness ratio should be lower for reviews with star ratings either above or below the product average.

More precisely, it should vary inversely with the absolute value of the difference between the review’s star rating and the computed product-average star rating; this difference is the review’s deviation.

Figure 1 shows a very strong inverse correlation between the median helpfulness ratio and the absolute deviation, as predicted by the conformity hypothesis.
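
The kind of aggregation behind Figure 1 can be sketched as below. The bin width of 0.5, the row layout, and the toy rows are assumptions made for illustration; the slides do not specify how deviations were binned.

```python
from collections import defaultdict
from statistics import median

def median_ratio_by_abs_deviation(rows, bin_width=0.5):
    """rows: (star_rating, product_average, helpful_votes, total_votes).

    Groups reviews by absolute deviation (rounded to bin_width, an assumed
    binning) and returns the median helpfulness ratio of each group.
    """
    bins = defaultdict(list)
    for stars, product_average, helpful, total in rows:
        deviation = abs(stars - product_average)
        key = round(deviation / bin_width) * bin_width
        bins[key].append(helpful / total)
    return {key: median(values) for key, values in sorted(bins.items())}

# Hypothetical reviews, chosen so the ratio falls as the deviation grows.
rows = [
    (4, 4.0, 9, 10),
    (4, 4.1, 8, 10),
    (5, 4.0, 7, 10),
    (1, 4.0, 2, 10),
]
medians = median_ratio_by_abs_deviation(rows)
```

Under the conformity hypothesis, the values of `medians` should decrease as the keys (absolute deviations) increase, which is what these toy rows were built to show.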

Fig 1: Helpfulness ratio declines with the absolute value of a review’s deviation from the computed star average; this behavior is predicted by the conformity hypothesis. The line segments within the bars (connected by the descending line) indicate the median helpfulness ratio; the bars depict the helpfulness ratio.

Fig 2: The dependence of helpfulness ratio on a review’s signed deviation from average is inconsistent with both the brilliant-but-cruel and, because of the asymmetry, the conformity hypothesis.

Figure 2 presents counter-evidence for the conformity hypothesis.

The conformity hypothesis incorrectly predicts that the lines connecting the (-|x|, |x|) pairs of data points would be horizontal.

To account for Figure 2, we could simply impose an extra “tendency towards positivity” factor upon the conformity hypothesis, but this would be quite unsatisfactory: it would not suggest any underlying mechanism for this factor.

Brilliant-but-cruel Hypothesis

Figure 1 cannot be used to evaluate this hypothesis, since the absolute deviation |x| does not distinguish between positive and negative reviews.

This hypothesis clearly does not hold for our data:

Among reviews with the same absolute deviation |x| > 0, the relatively positive ones (signed deviation +|x|) have a higher median helpfulness ratio than the relatively negative ones (signed deviation -|x|), as the lines connecting the (-|x|, |x|) pairs of data points in Figure 2 have positive slopes.

This contradicts the assumption that negative reviews are more helpful, which would produce negative slopes.
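
The slope comparison described above can be sketched as a check on signed deviations. The binning, the row layout, and the toy rows are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median

def signed_medians(rows, bin_width=0.5):
    """Median helpfulness ratio per signed deviation bin.

    rows: (star_rating, product_average, helpful_votes, total_votes).
    """
    bins = defaultdict(list)
    for stars, product_average, helpful, total in rows:
        key = round((stars - product_average) / bin_width) * bin_width
        bins[key].append(helpful / total)
    return {key: median(values) for key, values in bins.items()}

def positive_side_wins(medians):
    """For each |x| > 0 present on both sides: is median(+|x|) > median(-|x|)?"""
    return {x: medians[x] > medians[-x] for x in medians if x > 0 and -x in medians}

# Hypothetical reviews of a product whose computed average is 4.0 stars.
rows = [(5, 4.0, 7, 10), (3, 4.0, 5, 10), (4, 4.0, 9, 10)]
slopes_positive = positive_side_wins(signed_medians(rows))
```

A value of `True` for a given |x| corresponds to a positive slope between the (-|x|, |x|) pair, the pattern that the brilliant-but-cruel hypothesis does not predict.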

Individual-Bias Hypothesis

The predictions of this hypothesis look similar to those of the conformity hypothesis when tested on data with low variance: the helpfulness ratio forms a single-peaked distribution.

For high variance, we observe two peaks in the distribution.

Also, in the previous figures the reviews with signed deviation zero had the highest helpfulness ratios, but once the variance is 3.0 or greater, the highest helpfulness ratios are achieved at signed deviations close to, but still above, zero.

Thus, variance is the key factor that any hypothesis needs to incorporate.

Figure 3: As the variance increases, the median helpfulness ratio curve becomes two-humped and the helpfulness ratio at signed deviation 0 (indicated in red) no longer represents the unique global maximum. There are non-zero signed deviations in the plot for variance 0 because we rounded variance values to the nearest .5 increment.

Controlling for Text Quality

The analysis so far does not explicitly consider the actual text of reviews.

Helpfulness ratios depend on:

> Deviation from the computed average star rating

> Star rating variance within the reviews for a given product.

Key Insights

Instead of re-evaluating all reviews for helpfulness, the analysis focuses on reviews that are guaranteed to have very similar textual quality.

Identical reviews must necessarily exhibit the same level of text quality.

Identifying Plagiarism

Only pairs of reviews that were posted to different books were considered.

Code by Sorokina was used to identify pairs of reviews with highly similar text.

A similarity threshold was determined, above which a review pair was deemed plagiarized.
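
The pairing step can be illustrated with a small stand-in. This is not the code referred to above: it swaps in Python's standard `difflib.SequenceMatcher` as a simple similarity measure, and the threshold of 0.9, the review tuples, and the function name are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicate_pairs(reviews, threshold=0.9):
    """reviews: list of (review_id, book_id, text).

    Returns id pairs whose texts are highly similar and whose books differ.
    The similarity measure and threshold are illustrative assumptions.
    """
    pairs = []
    for (id1, book1, text1), (id2, book2, text2) in combinations(reviews, 2):
        if book1 == book2:
            continue  # only pairs posted to different books are considered
        similarity = SequenceMatcher(None, text1.lower(), text2.lower()).ratio()
        if similarity >= threshold:
            pairs.append((id1, id2))
    return pairs

# Hypothetical reviews: r2 is a near copy of r1 on another book; r3 repeats r1 on the same book.
reviews = [
    ("r1", "book_a", "A gripping story with vivid characters and a great ending."),
    ("r2", "book_b", "A gripping story with vivid characters and a great ending!"),
    ("r3", "book_a", "A gripping story with vivid characters and a great ending."),
]
pairs = near_duplicate_pairs(reviews)
```

A pairwise comparison like this is quadratic in the number of reviews, so it only suits small samples; it is meant to show the filtering logic, not a scalable detector.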

Conclusion for Text Quality

Text quality is not the (only) explanatory factor.

A statistically significant difference in helpfulness ratios exists between the members of such plagiarized pairs.
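
The slides do not name the statistical test that was used. Purely as an assumption, one simple way to test such paired data is a two-sided sign test, sketched here with hypothetical counts: `wins` is the number of pairs in which one designated copy (for example, the one whose star rating is closer to the product average) has the higher helpfulness ratio, and `losses` is the number of pairs in which it has the lower one.

```python
from math import comb

def sign_test_p_value(wins, losses):
    """Two-sided sign test p-value under the null that each outcome has probability 0.5.

    Ties are assumed to have been dropped before counting.
    """
    n = wins + losses
    k = max(wins, losses)
    # Probability of a split at least as lopsided as the observed one, in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts, not results from the study.
p_lopsided = sign_test_p_value(9, 1)
p_balanced = sign_test_p_value(5, 5)
```

A small p-value for a lopsided split would indicate that the difference between copies is unlikely to be due to chance, even though the copies share the same text.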

Certain copies are practically indistinguishable by their contexts.

Model based on Individual bias and Mixtures of Distributions

Assumptions:

Some review evaluators are positively disposed towards the product.

Other review evaluators are negatively disposed towards the product.

No specific distributional assumption is made about the evaluators.
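
The following sketch shows how a two-group mixture can produce one peak or two. It goes beyond the assumptions listed above in two labeled ways: evaluator opinions in each group are drawn from a Gaussian (chosen only for convenience, since the model itself makes no specific distributional assumption), and an evaluator is assumed to find a review helpful when its star rating lies within a tolerance `tol` of the evaluator's own opinion. All parameter values are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def helpful_fraction(rating, p_pos, mu_pos, mu_neg, sigma=0.5, tol=0.5):
    """Expected share of evaluators whose opinion is within tol of the rating."""
    def within(mu):
        return normal_cdf(rating + tol, mu, sigma) - normal_cdf(rating - tol, mu, sigma)
    # Mixture: a fraction p_pos of positive evaluators, the rest negative.
    return p_pos * within(mu_pos) + (1.0 - p_pos) * within(mu_neg)

def curve(p_pos, mu_pos, mu_neg):
    """Expected helpfulness ratio for each star rating from 1 to 5."""
    return {r: helpful_fraction(r, p_pos, mu_pos, mu_neg) for r in range(1, 6)}

# Groups that nearly agree (low variance) versus groups far apart (high variance).
low_variance = curve(0.6, mu_pos=4.2, mu_neg=3.8)
high_variance = curve(0.6, mu_pos=5.0, mu_neg=1.0)
```

With nearly agreeing groups the curve has a single peak near the average opinion; with groups far apart it has two humps, and the positive hump is higher because the positive group is assumed to be larger.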

Consistency among Countries

Noticeable differences occur between reviews collected from different regional Amazon sites.

In Japan, helpfulness evaluators rated negative reviews more favourably.

Conclusion

Sites like Amazon.com support the evaluation of opinions for and by their online member communities.

A review's helpfulness depends not only on its content, but also on the relation of its score to other scores.

THANK YOU !
