Disconfirmation Effect on Online Rating Behavior: A Dynamic Analysis
Abstract
This research studies the effect of “disconfirmation”—the discrepancy between the expected and
experienced quality of the same product—on the behavior of consumers leaving product reviews in an
online setting. Specifically, we posit that an individual’s pre-purchase expectation is driven by the product
ratings she observes and the credibility of the review system she perceives at the time of purchase. Upon
product consumption, the individual obtains post-purchase evaluation and uses this information to
confirm/disconfirm with her expectation. We extend the Bayesian learning framework to model the
individual perception of the system credibility as how closely the ex-ante expectation matches the ex-post
evaluation. Incorporating such dynamic perception in raters’ decisions allows us to better understand the
chance of consumer engagement in sharing product experience over time. We estimate the proposed
model using a rich data set comprising complete purchasing and rating activities made by 361 individuals
on an e-commerce website. Our results show that consumer participation in leaving product reviews is
subject to the disconfirmation effect. The further an individual’s post-purchase evaluation deviates from
her pre-purchase expectation, the more likely she is to express her own opinions about the product. Our
results also show that online consumers tend to be less active with respect to posting reviews as they
perceive the system to be more precise over time. From simulations, we highlight 1) the dissimilar effects
of disconfirmation and experienced quality on consumer rating behavior; 2) various evolutionary patterns
of product ratings over time; and 3) the effect of fake, inflated ratings on subsequent rating entries.
Keywords: product reviews, disconfirmation, Bayesian learning, word of mouth, user-generated content
1. Introduction
It has been well recognized that online customer reviews and ratings1 have a substantial impact on
consumers’ purchasing decisions. According to surveys, 82% of the respondents agree that online reviews
directly influence their decisions on what product to buy (Deloitte 2007) and over 75% of them consider
recommendations from those who have experienced the product the most credible information source
(Nielsen 2007). By observing the usage experience shared by peer consumers, prospective buyers can
reduce the product uncertainty and make more informed purchases, leading to higher satisfaction and
lower merchandise return rates (PowerReviews 2010). As a result, the prevalence of online reviews
enhances the information transparency and improves the efficiency of the digital marketplace.
Recently, there has been growing evidence that companies attempt to manipulate online customer reviews for
targeted products and services (Gormley 2013). Some boost the image of their own brands or
merchandise by posting deceptively positive evaluations, whereas others undermine that of their competitors
by fabricating negative usage experiences, or do both. Such unethical practices cast serious doubt on the
trustworthiness of customer-reported ratings and jeopardize the credibility of the review system perceived
by consumers (Johnston 2014).
In this research, we empirically study how the behavior of consumers leaving online product reviews
is impacted by disconfirmation, the discrepancy between the pre-purchase expectation and the post-
purchase evaluation about the same product. Our proposed model is developed from two novel aspects.
First, we examine the online rating behavior by looking at the activities involved throughout the entire
purchase process. An online review platform is an information system that facilitates information
exchange among its users. System users can be categorized into two groups based on how they interact
with the system. The first user type is the review reader (information receiver) who gathers information
about others’ usage experience stored in various formats, such as numeric ratings, textual contents, multi-
media files, etc. After consuming the product herself, a review reader has an opportunity to become a
1 Following literature, by reviews we mean a general format of consumer evaluation, whereas by ratings we
specifically refer to numeric values indicating overall satisfaction.
review poster (information provider), the second role a user can play, by sharing her own assessment on
the products. It has been shown that when making rating decisions, an individual tends to first observe the
expressed opinions by others and then adjusts her own feedback accordingly (Moe and Schweidel 2012;
Schlosser 2005). In terms of timing, prior research assumes that the previously posted ratings would
impact a focal individual only when she faces a rating decision in the post-purchase stage, without
considering an important fact that review posters are often review readers. Motivated by the dual roles a
consumer can play, we posit that the influence of existing ratings may take place when the focal
individual accesses the review information in the pre-purchase stage as well.
The second novelty of this research is that we take into account that individuals may not perceive the
review system to be fully trustworthy. Specifically, we consider that an individual holds the perception of the
system credibility, a subjective attitude underlying how credible the system is perceived by her. In the pre-
purchase stage, a consumer formulates the pre-purchase expectation of the product based on the posted
ratings, taking into account her own perceived system credibility. Upon product consumption, she
obtains the post-purchase evaluation of the product and encounters disconfirmation. The realizations of
disconfirmation are then used by the consumer to update her perception of the system credibility over
time. Such a consumer response is especially realistic, after the outbreak of firms’ strategic manipulation
in online rating environments.
Unlike extant work that studies the review incidence from a static point of view, this research
particularly examines the evolution of individual rating behavior from a dynamic perspective. We
extend the Bayesian learning framework, flexibly allowing the perception of system credibility to change
over time and dynamically incorporating this time-variant perception into raters’ decision-making process.
The proposed model is applied to a unique dataset consisting of complete purchase history and review
entries that took place on an e-commerce website. Utilizing these complete activity logs, we can exactly
observe whether a consumer leaves a review for the product she has purchased earlier on. In addition, the
length of our panel enables us to calibrate how consumer rating behavior changes over time. To the best
of our knowledge, this is the first research that looks at the evolution of consumer participation in giving
online reviews at the micro level.
Our estimation results show that consumer review-posting behavior is, at least partially, driven by
the quality disconfirmation, perhaps due to an intention to correct the overall tone established by peer
consumers. This finding provides an additional explanation to the common J-shaped distribution of online
ratings across different platforms. Moreover, the disconfirmation effect on review contribution is further
moderated by the rating environments to which a focal individual is exposed. The magnitude of the effect is
weakened by the level of dissension among previous posters; however, it is intensified by the total
number of submitted reviews. Our results also reveal that consumers tend to be less active in sharing their
opinions as they perceive the review system to be more credible over time. This finding raises a potential
free-riding issue of which platform owners should be aware.
To highlight the significance of the disconfirmation effect identified in this paper, we perform a
series of simulations to better understand consumer rating behavior and the evolution of online product
ratings. First, we recover the missing ratings for purchase occasions that do not have a follow-up review
submission from the buyer. Our simulation results show that, at the population level, disappointed
consumers (i.e. those who encounter negative disconfirmation) are more vocal. The second simulation
presents various evolutionary patterns of online product ratings, all of which indicate that the average of
posted ratings will converge to the true quality in the long run. While other researchers have explained
why average product ratings decline over time (Li and Hitt 2008; Moe and Schweidel 2012), we believe
that the disconfirmation effect provides an alternative explanation since it can also explain other common
patterns observed in the real life. Our third simulation shows that fake, inflated reviews may strongly
induce low ratings from customers who suffer from negative disconfirmation.
The rest of this paper is organized as follows. In Section 2, we briefly review the relevant literature
and differentiate our work from others. Section 3 describes the data and provides model-free evidence
regarding consumer rating behavior. Our empirical model is specified in detail in Section 4. Section 5
presents our estimation strategy, identification strategy, model fit and model comparison. In Section 6, we
present our estimation results and discuss the associated insights. We also provide robustness checks and
conduct a series of simulations to highlight the significance of our main findings. Concluding remarks
and future research directions are provided in Section 7.
2. Literature Review
This work is related to the stream of literature studying why consumers engage in post-purchase word of
mouth (WOM). Using survey data, researchers investigate this question from a motivational point of view.
In an offline setting, consumers’ desire for altruism, product involvement and self-enhancement are main
factors leading to positive WOM, whereas consumers spread negative WOM usually for altruism, anxiety
reduction, and vengeance purposes (Sundaram et al. 1998). With a similar approach, Hennig-Thurau et al.
(2004) conclude that social benefits, economic incentives, concern for others, and extraversion are the
primary motivations for consumers to share their product experiences on the Internet.
An extensive literature using quantitative methods has also been developed to identify what drives
consumers to share their product or service experiences in the absence of monetary reward mechanisms.
Dellarocas et al. (2004) examine the rating behavior on an electronic trading platform. They find that such
voluntary behavior is driven by the expectation of reciprocal behavior, meaning that a trader evaluates her
trading partner in order to solicit feedback from the other party. Shen et al. (2013) look at review posting
behavior from a strategic perspective. Using the book review data from online book sellers, they argue
that online reviewers are more prone to rate popular but less crowded books in order to gain attention and
to reduce competition for attention at the same time. They also conclude that reviewers with high
reputation costs are more likely to adopt an imitation strategy by posting ratings conforming to the
consensus in a community. Factors affecting the level of WOM have also been studied at the population
level. Rather than taking the common conception of the level of WOM activity as a monotonic function of
customer satisfaction, Anderson (1998) discovers that the relationship between them exhibits a U-shaped
pattern: customers are more likely to engage in WOM when they are either extremely satisfied or
extremely dissatisfied with the products. Using the data from a movie website, Dellarocas and Narayan
(2006) also identify a similar association between observed rating density and perceived movie quality.
Along this line, Dellarocas et al. (2010) further suggest that moviegoers are more prone to post reviews
for the most and least popular movies, as measured by box office revenues.
There is an emerging literature stream examining how existing ratings affect subsequent ones. In a
lab setting, Schlosser (2005) identifies the “self-presentational” phenomenon in which a review poster
strategically adjusts her product ratings downwards after observing negative opinions by others, in order
to present herself as intelligent. She also finds that a consumer would make her opinions more balanced if
the opinions from the crowd exhibit a high level of dissension. Li and Hitt (2008) argue that the predominant
declining trend of book ratings can be attributed to consumers’ “self-selection” bias, meaning that early
buyers have higher perceived quality and therefore tend to give higher ratings than do later buyers. Using
reviews posted on an online retailer of bath, fragrance and home products, Moe and Schweidel (2012)
show that consumers’ rating behavior is influenced by the rating environments to which they are exposed. In
particular, they find that a consumer is more inclined to share her experience when the existing ratings
have high valence and high volume. In addition, active posters are found to be more negative and exhibit
differentiation behavior, which is consistent with the self-presentational strategy suggested by Schlosser
(2005). Lee et al. (2014) distinguish prior ratings by friends from those by strangers and
investigate whether these two different sets of ratings have distinctive impact on a focal individual’s
opinions. They find that friends’ opinions always induce herding and the presence of social networking
dampens the impact of opinions from the crowd.
Research studying consumer rating behavior naturally leads to another stream of literature which
examines whether customer reviews can represent true product quality. This particular literature stream
can be further classified into two categories, depending on whether publicly available reviews are entirely
generated by consumers or partially manipulated by firms. Following the notion that online ratings are
entirely truthful, Hu et al. (2006) discuss whether the mean of posted ratings can represent true product
quality. In particular, they develop an analytical model assuming that a consumer would post a review
only when the level of her satisfaction is either above or below a “brag-and-moan” interval; she would
otherwise be silent. Based on this assumption, they show that the average rating can serve as an unbiased
signal if and only if two bounds of the interval are equal or symmetrically deviate from the true quality.
On the other hand, Dellarocas (2006) assumes that the observed ratings may not be fully trustworthy and
could be strategically boosted by firms. He demonstrates that inflated reviews are still informative since
the firm producing high-quality products benefits the most through such manipulation.
In terms of research context and methodology, our work is most relevant to Moe and Schweidel
(2012) (henceforth MS). We develop our model based on MS which, in turn, is a generalization of Ying
et al. (2006), who first propose that whether a product is rated should affect an analyst’s prediction of that
rating. Despite several similarities, this paper differs from MS in many aspects. First, our focus is to study
how quality disconfirmation and perceived system credibility affect the individual rating decisions, while
that of MS is to examine whether online raters would adjust their opinions according to expressed
opinions by others. Second, we extend a dynamic learning framework, flexibly allowing an individual’s
perception of system credibility to change over time and dynamically linking this perception to her rating
decisions. On the contrary, MS use a static model in which posting decisions are treated as independent and
uncorrelated with each other. Finally, we are able to directly observe from our data set whether a
consumer leaves a review for a product she has previously purchased, whereas the purchase data is missing
in MS. Our lengthy panel also allows us to construct a richer model to investigate why consumer rating
behavior changes over time.
This paper attempts to close the gap in online review literature by studying the relationship between
posted reviews and subsequent ones from a different perspective. Specifically, we postulate that the
impact of existing ratings may occur when an individual accesses review information for purchasing
decisions (in the pre-purchase stage) even before she faces the rating decisions (in the post-purchase
stage). We also consider that individuals may not perceive the review system to be fully trustworthy, by
modeling how the individual perception of system credibility evolves over time and how such perception
affects one’s rating decisions in a dynamic fashion. While credibility has been examined in different
applications such as how it determines the persuasiveness of communication (Chaiken 1980), the way in
which credibility impacts consumers’ interaction with information systems (as review readers and posters)
still remains unstudied.
3. Data and Model Free Evidence
The data for this study is provided by an e-commerce website that sells a variety of electronics
and household items. The data set contains complete purchase history and review entries made by 1,000
heavy users, each of whom purchases at least 10 products.2 The purchase record set consists of detailed
order information such as the product name, price, handling time and order placement date. The rating
data set records customer-reported reviews in a typical format, including a review title, a review body,
submission date, and an overall product rating on a discrete 5-star scale, with 5 being the best. The data
spans from January 2007 to November 2011. During this period of time, the site had not gone through
any policy or system design change that could influence consumers’ rating behavior, nor did it implement
any monetary or reviewer-reputation mechanisms that could encourage posting. Contribution of
product reviews is thus considered voluntary and self-driven.
Our data set is unique in two aspects. First, the complete logs of purchasing and rating activities
inform us whether a consumer leaves a review for the product she has purchased earlier on. With this
information, we can precisely discern posting behavior from lurking for each purchase occasion. Second,
by leveraging the time stamp of all purchasing and rating activities, we are able to recover the rating
information (characterized by the valence, volume and variance of posted ratings) available at any given
time. It is also worth noting that our data was collected at the micro (individual) level. This
distinguishes our work from others that commonly use review data at the aggregate (product) level.
To identify the experienced quality and consumer learning dynamics, we restrict our attention to
consumers who post at least one review, resulting in the final data set comprising 361 panelists who made
2 We study heavy users mainly because they are overwhelmingly more impactful to the e-commerce website in
terms of both revenue generation and review contribution.
37,209 purchase transactions and 2,257 review entries.3 We also collected the mean of all posted ratings
for each product in December 2013. We will utilize this long-run mean rating to proxy the baseline
product quality.4 Before introducing our empirical model, we briefly present some observations and
address potential data issues. Table 1 reports the descriptive statistics of variables from our final data set.
At the population level, the review posting rate is nearly 6% and the mean of all observed ratings is 3.83.
Some variables, such as price and handling time, exhibit a long-tail property (Figure 1a). To deal with the
over-dispersion issue, we apply a logarithmic transformation to them. The log-transformed variables appear
to be normally distributed (Figure 1b).
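The effect of this transformation can be sketched numerically. The snippet below applies the same log(x + 1) mapping to a synthetic long-tailed sample; the distributional parameters are invented for illustration and are not the paper's actual price data:

```python
import numpy as np

def sample_skewness(x):
    """Third standardized moment of a sample."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

rng = np.random.default_rng(0)
# Synthetic "prices" with a long right tail, mimicking Figure 1(a).
prices = rng.lognormal(mean=3.0, sigma=1.2, size=10_000)

# log(price + 1) compresses the tail; the +1 keeps zero-valued observations
# (e.g., same-day handling times, min = 0 in Table 1) finite.
log_prices = np.log(prices + 1)

# Skewness drops from strongly positive to near zero after the transform.
raw_skew, log_skew = sample_skewness(prices), sample_skewness(log_prices)
```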
Table 1. Descriptive Statistics of Variables

Notation   Description                                                     Mean     St. Dev.   Min     Max
y_ijt      Dummy indicating if a purchase occasion leads to a review entry 0.061    0.239      0       1
z_ijt      Observed rating scores                                          3.826    1.200      1       5
p_ijt      Product prices                                                  50.208   200.170    0.440   13674
h_ijt      Order handling time (days)                                      2.328    3.674      0       142
R̄_ijt      Valence of existing ratings                                     4.015    0.621      1       5
Vol_ijt    Volume of existing ratings                                      58.357   186.776    0       3587
Var_ijt    Variance of existing ratings                                    0.817    0.652      0       4
Figure 1. Over-dispersion vs. Logarithm Transformation: (a) distribution of price; (b) distribution of log(price + 1)
3 The 361 selected individuals (36.1% of the entire sample size) account for 36.5% of all revenue generated and
33.4% of all items purchased by the sampled population. Given these statistics, we believe the selected individuals are representative of the sampled consumers with respect to purchase habits.
4 We test the validity of this proxy in one of robustness checks.
The consumer rating behavior on the e-commerce website is shown in Figure 2 to Figure 6. Figure 2
plots the frequency of 5 discrete ratings. The distribution roughly follows a J shape, which is commonly
found across various rating platforms (McGlohon et al. 2010). Such a unique shape is consistent with
Anderson (1998) who argues that consumers are more prone to express opinions about products when
they are either very satisfied (represented by a high score of 4 or 5) or very dissatisfied (represented by a
low score of 1 or 2). To better understand the association between the rating left by a focal consumer (a
focal rating) and the mean of previously posted ratings by others, we calculate the observed “rating
discrepancy” ($z_{ijt} - \bar{R}_{ijt}$) for all rated occasions ($y_{ijt} = 1$). Figure 3 shows that most focal ratings do not
deviate too much from the overall tone established by the crowd. But, based on this distribution, can we
conclude that online raters are more vocal when they concur with the consensus from their peers? This is
the first question we attempt to answer in this paper.
Figure 2. Distribution of Posted Ratings
Figure 3. Distribution of Rating Discrepancy
As discussed earlier, during the time span of our data, the e-commerce website did not implement any
marketing campaign or system design change that could possibly alter consumer engagement in online WOM
(eWOM). We should therefore expect the rating behavior to be consistent over the time horizon. Indeed, Figure 4(a)
shows that the monthly posting rate fluctuated in the early stage and stabilized later on.5 From Figure 4(b),
we also see that the monthly average rating discrepancy did not systematically move upward or
downward either. Interestingly, if we plot the average posting rate evaluated at the level of individual
5 We present individuals’ rating behavior since January 2008 because the number of samples is limited prior to this
point of time. Also note that we exclude data points for 5 months in which no product review was posted.
[Figure 2 bar chart: frequency (%) of ratings 1–5: 7.2%, 6.4%, 19.9%, 29.6%, 36.9%]
purchase occasion,6 it is evident that consumer participation in leaving reviews exhibits a declining trend
(Figure 5). In other words, consumers become less active in expressing their own opinions as they
engage in more purchase occasions on the site. To find out potential explanations for this downward
pattern, we also look at other posted outcomes evaluated at the same (by-occasion) level. Despite some
random disturbance, however, the average rating discrepancy across individuals did not exhibit any
noticeable trend (Figure 6a), nor did the average posted ratings (Figure 6b). In this paper, we also
intend to explore factors that potentially explain the decline in review incidence observed from the data.
Figure 4. Monthly Review Posting Rate and Average Rating Discrepancy
Figure 5. Review Posting Rate by Purchase Occasion
Figure 6. Individuals' Rating Behavior by Purchase Occasion
6 We present the posting rate for the first 50 occasions since data points after the 50th occasion reveal a similar
pattern.
4. Model
We develop a dynamic model to study the evolution of consumer rating behavior in an online setting. The
general modeling context is that, at a given time, consumers hold the idiosyncratic (prior) perception of
the credibility of the review system. When a purchase occasion begins, the consumer first formulates a
pre-purchase expectation about the product based on the rating signal she observes and the perceived
system credibility she holds at that time. Upon product consumption she obtains the post-purchase
evaluation. Using one attitude to confirm/disconfirm with the other, she encounters a certain degree of
disconfirmation and then uses this private information to update her own (posterior) perception of the
system credibility. Driven by the disconfirmation and other factors, the consumer faces two rating
decisions, with the first decision being whether to leave a product review; if she decides to leave a review,
she also chooses what rating to submit. The proposed model is presented in the following order: 1)
formulation of the disconfirmation; 2) updating of perceived system credibility; 3) consumer decisions of
leaving product ratings; and 4) interdependence between two rating decisions.
4.1. Formulation of Quality Disconfirmation
Pre-purchase Evaluation. Consider an individual i who is about to purchase product j during occasion t.
In our research context, the e-commerce website displays the aggregate statistics of customer self-
reported ratings (e.g., arithmetic mean and distribution of discrete ratings) at the top of a product landing
page. We assume that the prospective buyer would (at least) access the aggregate rating information
available to her before committing the purchase. While posted ratings seem to provide objective
information about the product quality, how credible the review system is perceived could be subjective
and heterogeneous across individuals. As a result, we assume individual i’s pre-purchase expectation,
$\hat{Q}_{ijt}$, to follow a normal distribution:

$\hat{Q}_{ijt} \sim N\left(\bar{R}_{jt} + \delta_{i,t-1},\ (\tau_{jt}\tau_{i,t-1})^{-1}\right)$,  (1)
where $\bar{R}_{jt}$ and $\tau_{jt}$ respectively denote the mean and the precision of all ratings for j posted prior to the
time of purchase.7, 8 These two components together characterize the rating signal available to individual i.
We model that the individual perception of system credibility is composed of two components: system
biasedness ($\delta$) and system precision ($\tau$). In our model, the updating of the perceived system credibility
occurs when the disconfirmation is realized after product consumption. It should be clear that one’s ex-ante
expectation, formulated before the purchase, is influenced by the perception she updated during the previous
occasion (hence the t-1 subscript). The system biasedness $\delta_i$ measures how biased the review
system is perceived to be by individual i. When $\delta_i = 0$, individual i perceives the system to be neutral, and
therefore believes that $\bar{R}_{jt}$ provides an unbiased signal about the product quality. In this case, individual i
formulates her expectation centered on $\bar{R}_{jt}$. When $\delta_i > 0$ ($\delta_i < 0$), she believes that $\bar{R}_{jt}$ underrates
(overrates) the product quality, and therefore she would adjust the mean of her expectation upwards
(downwards) to offset the system biasedness she perceives. The sign of $\delta_i$ can be thought of as the direction
in which the mean of individual i’s expectation deviates from the mean of posted ratings. The system
precision $\tau_i$ measures how precise the system is perceived to be by i at a given occasion. When it is large
(small), individual i would perceive the review system to be precise (noisy), such that her expectation
would be tightly (loosely) centered on its mean.
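As a minimal numerical sketch of Eq. (1), the snippet below forms one consumer's pre-purchase expectation; every parameter value here is hypothetical rather than an estimate from our data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs for one purchase occasion.
R_bar_jt = 4.0     # mean of ratings posted for product j before purchase
tau_jt = 2.0       # precision of those posted ratings
delta_prev = -0.3  # perceived biasedness: ratings believed to overrate quality
tau_prev = 1.5     # perceived system precision carried over from occasion t-1

# Eq. (1): the expectation is centered on R_bar + delta, with
# precision tau_jt * tau_prev (so the variance is its reciprocal).
exp_mean = R_bar_jt + delta_prev
exp_sd = (tau_jt * tau_prev) ** -0.5
Q_hat = rng.normal(exp_mean, exp_sd)  # one draw of the expectation
```

With a negative biasedness, this consumer discounts the posted mean of 4.0 down to 3.7; a larger perceived system precision would tighten the draw around that center.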
Post-purchase Evaluation. One of the most challenging parts of our modeling task is to model the
baseline quality of products. One approach is to assume that the latent quality of all products follows a
zero-mean random effect (Moe and Schweidel 2012). To best utilize the publicly available information, we use
7 According to a survey conducted by Lightspeed Research (Leggatt 2011), 72% of online shoppers expect to find
customer reviews available on the website they are shopping at, while 47% seek them out on company websites and 43% on price comparison sites. Therefore, we assume that one’s expectation is primarily influenced by the mean ratings observed on the same site. The precision characterizes the level of disagreement among others. We use precision instead of variance because the former gives us a neater expression of belief updating.
8 Consumers’ ex-ante expectation is also subject to textual product reviews (Ghose et al. 2012). However, we cannot observe how many textual reviews each prospective buyer reads before making a purchase decision. Therefore, we use the mean of posted ratings to measure the overall opinions shared by others.
the long-term average rating, $\bar{R}_j$, as a proxy for the baseline product quality.9 We assume the post-
purchase evaluation individual i acquires upon the consumption of j to be:

$Q_{ij} = \bar{R}_j + \lambda_{i0}$,  (2)

where the parameter $\lambda_{i0}$ allows for variation in product evaluation across individuals.
Disconfirmation. Having developed the expectation and evaluation of the product, we are now able
to formally define the disconfirmation. We model disconfirmation as how far individual i’s ex-post
evaluation deviates from her ex-ante expectation of the same product:

$\Delta Q_{ijt} = Q_{ij} - \hat{Q}_{ijt}$.  (3)

Plugging (1) and (2) into (3), we have:

$\Delta Q_{ijt} \sim N\left(\Delta\bar{Q}_{ijt},\ (\tau_{jt}\tau_{i,t-1})^{-1}\right)$,  (4)

where $\Delta\bar{Q}_{ijt} = (\bar{R}_j + \lambda_{i0}) - (\bar{R}_{jt} + \delta_{i,t-1})$.  (5)

A positive $\Delta Q_{ijt}$ indicates that the experienced product quality is higher than the expectation, and we say
that the consumer encounters a positive disconfirmation.
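A small worked example of Eqs. (2)-(5), with every number invented for illustration only:

```python
# Hypothetical values for one purchase occasion.
R_bar_j = 3.6      # long-term average rating: baseline quality proxy, Eq. (2)
lam_i0 = 0.0       # individual-specific evaluation shift
R_bar_jt = 4.0     # mean of ratings posted before this purchase
delta_prev = -0.3  # perceived system biasedness

Q_post = R_bar_j + lam_i0         # Eq. (2): post-purchase evaluation
exp_mean = R_bar_jt + delta_prev  # mean of the pre-purchase expectation
dQ_mean = Q_post - exp_mean       # Eq. (5): expected disconfirmation

# dQ_mean is about -0.1: on average this consumer's experienced quality
# falls slightly short of her expectation (a negative disconfirmation).
```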
4.2. Updating of the Perception of System Credibility
Next, we explain how the individual perception of system credibility evolves over time. We assume that,
before experiencing the product, individual i holds prior beliefs about the system biasedness $\delta_i$ (conditional
on $\tau_i$) and the system precision $\tau_i$, which jointly follow a normal-gamma distribution10:

$\delta_i \mid \tau_i \sim N\left(\delta_{i,t-1},\ (\gamma_{i,t-1}\tau_i)^{-1}\right)$,  (6)

$\tau_i \sim \Gamma\left(a_{i,t-1},\ b_{i,t-1}\right)$,  (7)
9 The use of long-term average ratings as proxies for product baseline quality can be justified in two aspects. From
the perspective of the volume of ratings, the mean and minimal values of the total number of posted ratings at product level are 67 and 11, respectively. From the perspective of time horizon, the time we collected our long-term rating data set (December 2013) is 2 years after the latest purchase occasion observed in our final data set (November 2011). It is reasonable to argue that these long-term average ratings are representative of opinions from a sufficiently large consumer base. We also perform a robustness check in Section 7.1 and the results provide supportive evidence that the long-term average ratings well proxy the true product quality.
10 In the context of Bayesian learning, disconfirmation defined here follows a normal model with unknown mean and unknown variance. The conjugate prior for this model is a normal-gamma joint distribution.
where $\tau_i$ is the precision parameter of the normal distribution, $\gamma_{i,t-1}$ measures the strength of the prior
belief in system biasedness, and $a_{i,t-1}$ and $b_{i,t-1}$ are the shape and inverse scale parameters of the gamma
distribution. After experiencing the product, the individual encounters a certain level of disconfirmation
and uses this private information to update her beliefs in system credibility. As a result, the perceived
system credibility will change over time as the consumer is involved in more purchase-consumption
occasions. According to Bayes’ rule, individual i’s posterior beliefs after receiving one disconfirmation
signal are given by (DeGroot 1970):
$\delta_i \mid \tau_i \sim N\left(\delta_{it},\ (\gamma_{it}\tau_i)^{-1}\right)$,  (8)

$\tau_i \sim \Gamma\left(a_{it},\ b_{it}\right)$,  (9)
where

δit = ( γi,t-1 δi,t-1 + Tit Djt Q̃ijt ) / ( γi,t-1 + Tit Djt ),  (10)
γit = γi,t-1 + Tit Djt,  (11)
ait = ai,t-1 + Djt / 2,  (12)
bit = bi,t-1 + Djt Tit γi,t-1 ( Q̃ijt - δi,t-1 )² / [ 2( Tit + γi,t-1 ) ],  (13)

and Djt is a dummy variable indicating whether there is at least one rating posted for j at the time the t-th purchase is placed.
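To make the updating mechanism in (10)-(13) concrete, the following minimal Python sketch implements one updating step. The function name and the scalar representation of the belief state are our own illustration, not the estimation code used in the paper; the numeric inputs in the usage line are hypothetical except for the initial values δ0 = 0 and γ0 = 0.1 noted below.

```python
def update_beliefs(delta, gamma, a, b, q_tilde, T, D):
    """One normal-gamma updating step for beliefs about system credibility.

    delta, gamma: prior mean of the biasedness belief and its precision weight.
    a, b: shape and inverse scale of the gamma belief on system precision.
    q_tilde: realized disconfirmation signal; T: precision of that signal.
    D: dummy, 1 if at least one rating was posted at purchase time, else 0.
    """
    if D == 0:
        # No rating signal observed: beliefs remain unchanged.
        return delta, gamma, a, b
    delta_new = (gamma * delta + T * q_tilde) / (gamma + T)              # eq. (10)
    gamma_new = gamma + T                                                # eq. (11)
    a_new = a + 0.5                                                      # eq. (12)
    b_new = b + T * gamma * (q_tilde - delta) ** 2 / (2 * (T + gamma))   # eq. (13)
    return delta_new, gamma_new, a_new, b_new

# Illustration: start from delta_0 = 0, gamma_0 = 0.1 (as assumed in the text)
# with hypothetical a_0, b_0, and process one disconfirmation signal.
state = (0.0, 0.1, 5.0, 8.9)
state = update_beliefs(*state, q_tilde=1.0, T=1.0, D=1)
```

A large signal precision T pulls the posterior mean toward the realized disconfirmation, mirroring the weight Tit / (Tit + γi,t-1) discussed below.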
It is important at this point to explain how the perception of system credibility is updated. Consider
a scenario where individual i has prior beliefs of δi,t-1 | τi,t-1 and τi,t-1. Suppose that she observes the rating
signal for j and purchases j during purchase occasion t. Upon product experience, she receives one signal
of disconfirmation, Q̃ijt, and uses this private information to update her own beliefs. If there is no
product rating available at the time of purchase (i.e., Djt = 0), no belief updating occurs and the
beliefs remain unchanged. If the rating signal is available (i.e., Djt = 1), she jointly updates her
beliefs in system biasedness and precision. According to (10), the posterior mean δit is the sum of the
prior mean and the realized disconfirmation weighted by a fraction Tit / (Tit + γi,t-1). The signal precision Tit
and the prior precision γi,t-1 can be interpreted as the strength of the disconfirmation signal she encounters
and the strength of the prior belief in system biasedness she holds, respectively. When Tit is large,
individual i perceives the posted ratings to be more precise and therefore updates δit by a relatively large
amount, ceteris paribus. Similarly, the extent of updating of the scale parameter bit is increasing in the
magnitude of the realized disconfirmation and the precision of the rating signal. When Tit is small,
individual i anticipates the review signal to be noisy with a higher probability and therefore updates bit by
a relatively small amount, ceteris paribus.
What remains unspecified is the initial beliefs of each individual. The updating rule of the shape
parameter indicates that the value of ai0 measures the richness of individual i's initial learning experience.
Since we observe the complete purchase history for all individuals in our data set, we fix ai0 at a small
number11 (provided ai0 > 1) because consumers have a very limited amount of learning experience with
respect to system credibility until they receive the first disconfirmation signal. To allow for heterogeneity
across individuals in their initial belief of system precision, we assume bi0 ~ N( b̄0, σb² ), where b̄0 and
σb² measure the mean effect and dispersion of bi0 across individuals, respectively. We further assume
δi0 = 0. This is reasonable because, prior to the receipt of any disconfirmation signal, consumers may
naturally perceive the review system to be unbiased, owing to their lack of purchase-consumption experience.
As they receive more disconfirmation signals, the perception of system biasedness is updated
depending on whether the perceived rating information inflates or understates one's own product
assessment. Finally, we fix γi0 = 0.1 to reflect consumers having an uninformative initial belief of δi0. The
belief updating mechanism specified in our model is used to measure consumers' subjective perception of
system credibility. In terms of interpretation of the learning process, it differs somewhat from the
traditional learning framework in which the unobserved knowledge (system biasedness and precision in
our context) is fixed at a certain level and consumers learn more about it over time.
11 The choice of ai0 is subjective. We estimate the proposed model with ai0 fixed at different values (2.5, 5, 10, and 20) and observe no noticeable changes in the estimated parameters. Given this result, we report the estimation results with ai0 = 5, which gives the best model fit.
4.3. Consumer Decisions on Whether to Post and What to Rate
So far, we have presented a general model of how disconfirmation is derived, how the individual
perception of system credibility is updated, and how these two constructs are linked to each other. In this
section, we discuss how we model consumers’ rating decisions.
Propensity Model. We posit that whether a consumer will post a review for the product she has
previously purchased is mainly affected by three components: the post-purchase evaluation she has,
the disconfirmation she encounters, and the updated belief in the system precision she holds. Given that
beliefs are modeled in the form of distributions, we use the mean of the distribution to measure the
individual perception. Since the belief in system precision follows a gamma distribution, individual i's
perception of the updated system precision, Preit, is given by the inverse of the expected noise variance:

Preit = [ E( σit² ) ]^(-1) = ( ait - 1 ) / bit .  (14)
According to the belief updating rule provided by (12) and (13), the latent variable Preit will become
larger if the magnitude of disconfirmation is relatively small, meaning that the individual perceives the
review system to be more precise. To investigate how the individual perception dynamically affects the
online rating behavior, we incorporate the latent construct Preit in the raters’ decision-making process.
We model a rater's decision of whether to leave a product review as governed by a latent posting
propensity. Specifically, individual i's posting propensity for product j associated with occasion t, Prop*ijt,
is expressed as:

Prop*ijt = Propijt + εp,ijt
         = βi0 + β1 Q̃ijt + β2 Q̃²ijt + β3 Preit + β4 Qijt + β5 Q²ijt + β6:9 Xjt + εp,ijt ,  (15)
where βi0 allows for heterogeneity in baseline propensity across individuals and can be interpreted as the
individual-specific net utility derived from writing an online review. The covariate Q̃ijt is the mean of
disconfirmation given in (5), Preit is the time-variant precision construct specified in (14), and Qijt is the
post-purchase evaluation given in (2). We also add the quadratic terms Q̃²ijt and Q²ijt to capture possible
nonlinear relationships associated with Propijt. To control for other effects that could potentially impact
individual posting decisions, we include four variables in Xjt: product price; shipping and handling time,
which serves as a proxy for the service level received from the e-commerce site; and the volume and the
statistical variance of posted ratings.12 Finally, εp,ijt is an idiosyncratic error with a mean of 0. Since the
outcome of the posting decision is binary (posting or lurking), we assume εp,ijt ~ N(0, 1) such that the
resulting propensity model has a binary probit specification:

Pr( yijt = 1 ) = Pr( Prop*ijt ≥ 0 ) = Φ( Propijt ),  (16)
where yijt is an occasion-specific dummy indicating whether a purchase leads to a review entry.
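As an illustration of the probit specification in (15)-(16), the sketch below computes the posting probability from the deterministic part of the propensity. The function and all coefficient values in the usage line are hypothetical; only the structure of the linear index follows (15).

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posting_probability(q_tilde, pre, q, x, beta0, beta):
    """Pr(y_ijt = 1) = Phi(Prop_ijt), with Prop_ijt built per eq. (15).

    beta = (b1, ..., b5, b6, ..., b9): disconfirmation (linear/quadratic),
    perceived precision, experienced quality (linear/quadratic), then the
    coefficients on the four controls in x (the X_jt variables).
    """
    b1, b2, b3, b4, b5, *bx = beta
    prop = (beta0
            + b1 * q_tilde + b2 * q_tilde ** 2        # disconfirmation terms
            + b3 * pre                                # perceived system precision
            + b4 * q + b5 * q ** 2                    # experienced quality terms
            + sum(c * v for c, v in zip(bx, x)))      # controls beta_{6:9} X_jt
    return norm_cdf(prop)

# Hypothetical illustration: mean-centered covariates at zero, so the
# probability is driven by the baseline propensity alone.
p = posting_probability(0.0, 0.0, 0.0, (0.0, 0.0, 0.0, 0.0), -0.8,
                        (-0.011, 0.025, -0.508, -0.268, 0.065,
                         0.271, -0.041, -0.009, 0.009))
```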
The parameters of main interest are β1, β2, and β3. The estimates of β1 and β2 together will inform
us of the effect of disconfirmation on consumer participation in posting online reviews. The estimated
parameter β3 will indicate whether the perceived system precision encourages (if β3 > 0) or deters (if
β3 < 0) a consumer from expressing her own opinions about the product. A significant β3 would
suggest that the declining trend of the posting rate observed in our data (see Figure 5) may be partially
explained by the change in perceived system precision over time. Moreover, it has been shown that online
opinions are subject to a polar effect: consumers with extreme opinions tend to be more vocal (Anderson
1998; Dellarocas and Narayan 2006). We will be able to confirm the existence of this effect if the signs of
β4 and β5 are negative and positive, respectively.
Evaluation Model. Now, suppose that individual i has decided to leave a product rating for j. We
assume that i's decision of what score to give is governed by a latent rating evaluation, which is mainly
driven by her product evaluation:

Eval*ijt = Evalijt + εe,ijt = λi0 Qijt + λ1:4 Xjt + εe,ijt ,  (17)
where εe,ijt is a zero-mean random shock. While consumers are commonly advised to provide product-
related feedback only, we often observe cases in which customers complain about the poor service they
receive from the merchant in product reviews as well. The parameter λ2 will validate whether this
12 We observe from our data that a typical consumer either posts a product rating within two weeks after the order has been placed or does not express her opinion at all. The rating environment (such as the volume, valence, and variance of ratings) does not change noticeably during such a short period of time. Therefore, we use the review statistics observed at occasion t to characterize the rating environment to which individual i is exposed.
anecdotal observation exists among consumers. Since the rating evaluation is continuous whereas the
submitted ratings are discrete (1 to 5 in our context), we assume the relationship between them to be:

zijt = 5, if κ4 < Eval*ijt ≤ κ5,
zijt = 4, if κ3 < Eval*ijt ≤ κ4,
zijt = 3, if κ2 < Eval*ijt ≤ κ3,
zijt = 2, if κ1 < Eval*ijt ≤ κ2,
zijt = 1, if κ0 < Eval*ijt ≤ κ1,  (18)
where zijt denotes the submitted rating score and the κ's specify the evaluation-ratings translating cutpoints.
For identification purposes we set κ0 = -∞, κ1 = 0, and κ5 = ∞ (Koop et al. 2007), leaving three cutpoints
κ2, κ3, and κ4 to be estimated.
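The translation in (18) can be sketched in a few lines. The default cutpoints below use κ1 = 0 together with the values later estimated in Table 4, purely for illustration; the function name is our own.

```python
def rating_from_evaluation(eval_star, cutpoints=(0.0, 0.486, 1.406, 2.459)):
    """Map a continuous latent evaluation to a discrete 1-5 rating, eq. (18).

    cutpoints = (kappa1, kappa2, kappa3, kappa4); kappa0 = -inf and
    kappa5 = +inf are implicit, so every evaluation maps to some score.
    """
    score = 1
    for kappa in cutpoints:
        if eval_star > kappa:
            score += 1          # the evaluation clears one more cutpoint
    return score
```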
4.4. Interdependence between Two Rating Decisions
So far we have developed two separate models governing individual decisions of whether to post and
what to rate. However, the covariance matrix of the equation system has not yet been clearly specified.
Since the post-purchase evaluation, Qijt, enters both equations simultaneously, parameter estimates could
be biased if the interdependence between the two decisions is not properly specified. As a result, we assume
the two sets of error terms follow a bivariate normal distribution:

( εp , εe )′ ~ BVN( ( 0, 0 )′ , [ 1  ρ ; ρ  1 ] ).  (19)
For identification purposes, we fix the standard deviations of εp and εe at 1 in order to obtain the binary probit
and ordered probit specifications, respectively. The parameter ρ is the correlation coefficient to be
estimated.13 Given this covariance structure and the translating cutpoints defined in (18), the probability
of observing the joint event yijt = 1 and zijt = s is given by:
Pr( yijt = 1, zijt = s ) =
  Φ( Propijt ) - Φ2( κ4 - Evalijt , Propijt ; -ρ ),                            if s = 5,
  Φ2( κ4 - Evalijt , Propijt ; -ρ ) - Φ2( κ3 - Evalijt , Propijt ; -ρ ),       if s = 4,
  Φ2( κ3 - Evalijt , Propijt ; -ρ ) - Φ2( κ2 - Evalijt , Propijt ; -ρ ),       if s = 3,
  Φ2( κ2 - Evalijt , Propijt ; -ρ ) - Φ2( κ1 - Evalijt , Propijt ; -ρ ),       if s = 2,
  Φ2( κ1 - Evalijt , Propijt ; -ρ ),                                           if s = 1,  (20)
13 Alternatively, one can compute the inverse Mills ratio (IMR) from the propensity model and plug ρ·IMR into (17).
where Φ2 denotes the standard bivariate normal cumulative distribution function (CDF). Our proposed
model thus relaxes the independence assumption between the two sets of observed outcomes (y's and z's).
The probability of observing yijt = 0 (i.e., lurking) can be expressed as:14

Pr( yijt = 0 ) = 1 - Φ( Propijt ).  (21)
Based on (20) and (21), the joint likelihood of observing individual i, who makes m purchase occasions
and posts n product ratings, is given by

Li( yi , zi ) = ∏{t: yijt = 0} Pr( yijt = 0 ) × ∏{t: yijt = 1} Pr( yijt = 1, zijt = s ),  (22)

where the first product runs over the (m - n) lurking occasions and the second over the n posting occasions.
The likelihood of observing the entire decision set made by N individuals is L = ∏i=1..N Li( yi , zi ).
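The joint probabilities in (20)-(21) can be sketched numerically. The bivariate normal CDF below is evaluated by simple quadrature; the function names, the quadrature scheme, and the sign convention for ρ (flipped because y = 1 truncates εp from below) are our own illustration, not the paper's estimation code.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(h, k, rho, n=2000):
    """Phi_2(h, k; rho) by trapezoidal quadrature over the first argument."""
    lo_bound = -8.0
    if h <= lo_bound:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    step = (h - lo_bound) / n
    total = 0.0
    for i in range(n + 1):
        x = lo_bound + i * step
        w = 0.5 if i in (0, n) else 1.0
        total += w * norm_pdf(x) * norm_cdf((k - rho * x) / s)
    return total * step

def joint_prob(prop, eval_mean, s, rho, kappas=(0.0, 0.486, 1.406, 2.459)):
    """Pr(y = 1, z = s) per eq. (20); kappas = (k1..k4), k0 = -inf, k5 = +inf."""
    uppers = list(kappas) + [math.inf]
    lowers = [-math.inf] + list(kappas)
    hi = uppers[s - 1] - eval_mean
    lo = lowers[s - 1] - eval_mean
    top = norm_cdf(prop) if math.isinf(hi) else bvn_cdf(hi, prop, -rho)
    bot = 0.0 if math.isinf(lo) else bvn_cdf(lo, prop, -rho)
    return top - bot
```

Summing joint_prob over s = 1, ..., 5 recovers Φ(Prop), the marginal posting probability, which is a quick consistency check between (20) and (21).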
5. Estimation
For parameters that are subject to certain constraints, we apply the following transformation strategy. First,
since the correlation coefficient ρ takes values in [-1, 1] only, we estimate the inverse arctangent
transformation of it, which maps the support [-1, 1] to the real line (Ying et al. 2006). Second, we estimate
log( b̄0 ) instead of b̄0 because the inverse scale parameter of a gamma distribution takes positive values
only. Finally, to ensure that the three cutoffs obey the desired order (i.e., κ2 < κ3 < κ4), we
estimate the log of the difference between adjacent cutoffs. For variables with highly dispersed distributions,
such as product price pijt and order handling time hijt, we apply a logarithmic transformation. We also
mean-center all variables using the grand means to reduce the correlation between the estimated intercepts
and slopes and to avoid potential multicollinearity in the propensity model (especially for variables that
enter the propensity model in both linear and quadratic forms).15
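A minimal sketch of these reparameterizations is given below. The helper names are our own, and the tangent map is one way to realize the stated transformation of ρ onto the real line; it is an illustration under those assumptions rather than the paper's actual code.

```python
import math

def to_unconstrained(rho, b0, kappas):
    """Map constrained parameters to the real line for unrestricted sampling."""
    z_rho = math.tan(rho * math.pi / 2.0)                  # (-1, 1) -> R
    z_b0 = math.log(b0)                                    # (0, inf) -> R
    k2, k3, k4 = kappas                                    # require 0 < k2 < k3 < k4
    z_k = (math.log(k2), math.log(k3 - k2), math.log(k4 - k3))
    return z_rho, z_b0, z_k

def to_constrained(z_rho, z_b0, z_k):
    """Inverse map: recover the constrained parameters from real-line values."""
    rho = (2.0 / math.pi) * math.atan(z_rho)
    b0 = math.exp(z_b0)
    k2 = math.exp(z_k[0])
    k3 = k2 + math.exp(z_k[1])                             # enforces k2 < k3
    k4 = k3 + math.exp(z_k[2])                             # enforces k3 < k4
    return rho, b0, (k2, k3, k4)
```

The log-difference parameterization guarantees the ordering of the cutpoints for any real-valued draw, so the sampler never proposes an invalid configuration.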
To estimate the proposed model, we use a hierarchical Bayes approach, which is convenient for the
estimation of individual-specific parameters. Given the nature of the parameter hierarchy, the parameters in
our model can be divided into two groups (Netzer et al. 2008): (1) "random-effect" parameters that vary
across individuals (denoted by θi); and (2) "fixed" parameters that do not vary across individuals (denoted
by ψ). We allow the individual-specific parameters governing the propensity intercept, product evaluation,
and initial learning to be correlated by assuming:

θi ~ MVN( θ̄, Σ ),  (23)

where θ̄ denotes the mean effects that persist across individuals and Σ denotes the variance-covariance
matrix of θi.
14 The marginal distribution of one component of a bivariate normal vector is simply a normal density.
15 We calculate the variance inflation factor (VIF) for the covariates included in the propensity model and the results show no evidence of multicollinearity (VIF < 2).
As we do not have much prior knowledge about the model parameters, we use a diffuse multivariate
normal prior for the fixed parameters ψ. There are 19 elements in the vector ψ, including β, capturing the
coefficients of the 9 covariates in the propensity model; λ, measuring the effects of the 4 covariates in the
evaluation model; the 3 cutpoints (κ2, κ3, κ4), which map the continuous rating evaluation into discrete
posted scores; and θ̄, governing the mean effects of the 3 individual-specific parameters. Let ξi denote the
individual-specific deviation from θ̄. Following (23), we can directly estimate those deviations using
ξi ~ MVN( 0, Σ ).
We develop a Markov chain Monte Carlo (MCMC) procedure to recursively draw parameters from
the corresponding full conditional distributions using the following steps:

Step 1: draw each individual-specific deviation ξi, conditional on ψ, Σ, and the data (Y, Z);
Step 2: draw Σ, conditional on the ξi's;
Step 3: draw the propensity-model elements of ψ, conditional on the ξi's, the remaining elements of ψ, and (Y, Z);
Step 4: draw the evaluation-model elements of ψ, conditional on the ξi's, the remaining elements of ψ, and (Y, Z).  (24)
We adopt a random-walk Metropolis-Hastings algorithm for steps where the conditional posterior
distributions do not have a closed form (Steps 1, 3, and 4 in (24)). To improve the efficiency of the
sampler, we use a two-step strategy. In the first step, we run the MCMC for 50,000 iterations and discard
the first 25,000 draws as "burn-in" samples.16 We calculate posterior means and empirical variances based
on the remaining 25,000 draws. In the second step, we run a separate MCMC using the posterior means
calculated in the previous step as the initial values and the empirical variances as the diagonal elements of
the covariance matrix of the random-walk proposal distribution.17 We run the MCMC sampler for
100,000 iterations and record only every 10th draw to mitigate the autocorrelation that is an
inevitable consequence of MCMC simulation (Hoff 2009). The adaptive chain converges immediately
and explores the parameter space efficiently. To test convergence, we perform the Gelman-Rubin
diagnostic (Gelman and Rubin 1992) by running 3 parallel chains with different initial values and random
seeds. The potential scale reduction factors (PSRF) are roughly 1.02 for all parameters, suggesting that
the chains have converged to the target distributions.18 We combine the draws from the three parallel chains,
resulting in an effective sample size of at least 500 for all parameters.
16 The choice of the burn-in period is based on visual inspection of the trace plots of the MCMC draws. In fact, 25,000 is a conservative number, since the chain appears to converge after the initial 5,000 iterations.
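The PSRF reported above can be computed as follows for a scalar parameter. This is the standard Gelman-Rubin statistic, written here as a stand-alone sketch rather than the authors' actual diagnostic code.

```python
def psrf(chains):
    """Potential scale reduction factor (Gelman and Rubin 1992).

    chains: m lists of n post-burn-in draws each, for one scalar parameter.
    Values near 1 indicate the parallel chains have mixed over the same
    target distribution; values above roughly 1.2 signal non-convergence.
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)    # between-chain
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)             # within-chain
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n                           # pooled variance estimate
    return (var_hat / W) ** 0.5
```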
5.1. Identification
We start our discussion of the identification strategy with the evaluation model. There are at least two ways
to identify the evaluation model. The first approach is to estimate all 4 evaluation-rating translating cutpoints
with no intercept. To identify the mean effect of the intercept, we instead adopt the alternative approach of
holding one cutpoint constant (i.e., κ1 = 0) and estimating the remaining three. All else
equal, the difference in mean ratings among individuals helps us identify λi0. Similarly, we achieve
identification of the baseline propensity βi0 through the difference in posting rates among
individuals. In our framework, consumers first update their perception of system credibility and then
make their rating decisions. Denoting the change of posting behavior over time by a series S, we can express
the relationship between β3, ā0, and b̄0 using an analogous equation: S = f( β3, ( ā0 - 1 ) / b̄0 ), where f(·)
represents the belief updating mechanism. Notice that once the initial values of the belief parameters are
identified, the mechanism of belief updating becomes deterministic and identification of β3 can
be achieved. As a result, the relationship among them is identified from the joint likelihood of the change
in posting behavior over time. The difficulty is that we cannot identify ā0 and b̄0 simultaneously, given
17 The scale parameter of the proposal distribution is adaptively chosen such that the acceptance rate is around 23%, as suggested for high-dimensional parameter vectors (Gelman et al. 2003).
18 A series of MCMC draws is considered to have converged as long as PSRF < 1.2.
their relationship described in (14). Since all consumers have limited knowledge about the credibility of
the review system until they receive the first disconfirmation signal, we fix the richness of learning
experience ā0 at a small number (provided ā0 > 1) such that b̄0 is identifiable. All else
equal, the variation in the change of posting behavior over time helps us identify the initial inverse scale
parameter bi0, which determines the rate at which each individual learns about the system credibility.
Once b̄0 is identified, the value of the biasedness parameter δit is determined and can be thought of as a
constant (denoted by c) when it enters the propensity model. We can then express the mean of the
disconfirmation signal as Q̃ijt = Q̆ijt + c, where Q̆ijt denotes its observable component. Expanding
β1 Q̃ijt + β2 Q̃²ijt + β4 Qijt + β5 Q²ijt, substituting, and collecting terms, we obtain
β2 Q̆²ijt + ( β1 + 2 β2 c ) Q̆ijt + β4 Qijt + β5 Q²ijt (plus a constant absorbed into the intercept),
with 4 variables and 4 identifiable parameters.
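The collecting-terms step in this identification argument can be verified numerically. The coefficient and shift values below are arbitrary illustrations, not estimates.

```python
# With the biasedness fixed at a constant c, the disconfirmation terms
# beta1*Qt + beta2*Qt^2 (where Qt = Qb + c and Qb is the observable part)
# collapse to a polynomial in Qb whose coefficients absorb c, leaving
# 4 identifiable parameters for the 4 variables in the propensity model.
b1, b2, c = -0.011, 0.025, 0.3    # hypothetical values for illustration

def original(qb):
    qt = qb + c                    # disconfirmation with the constant shift
    return b1 * qt + b2 * qt ** 2

def collected(qb):
    const = b1 * c + b2 * c ** 2   # absorbed into the baseline propensity
    return (b1 + 2.0 * b2 * c) * qb + b2 * qb ** 2 + const

for qb in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(original(qb) - collected(qb)) < 1e-12
```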
5.2. Model Fit and Model Comparison
To demonstrate the fit of our proposed model, we compare our full (dynamic) model with its three nested
versions:
Model 1– A static model (no dynamic learning).
Model 2– A static model without disconfirmation entering propensity model.
Model 3– A static model without disconfirmation and product evaluation entering propensity model.
Table 2. Model Fit and Comparison

Model     Dynamic Belief Updating   Quality Disconfirmation   Experienced Quality   DIC w.r.t. Full Model*
Full      Yes                       Yes                       Yes                   ---
Model 1   No                        Yes                       Yes                   349.7
Model 2   No                        No                        Yes                   427.2
Model 3   No                        No                        No                    483.9

* Lower DIC is preferred.
Table 2 reports the deviance information criterion (DIC) proposed by Spiegelhalter et al. (2002) for
the four models. The full model outperforms its three nested versions, and we therefore report and interpret
the estimation results from the full model in the next section. Our full model fits better than the
three static models because the belief updating mechanism provides a general way to explain the change
in consumer rating behavior over time. In addition, the model fit becomes worse (higher DIC) as we
exclude, one by one, the disconfirmation effect (Model 2) and the effect of product evaluation (Model 3) on
posting propensity. This pattern provides supportive evidence that it is desirable to incorporate all
three components when modeling the behavior of consumers leaving online reviews.
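For reference, the DIC used in Table 2 combines the posterior mean deviance with a complexity penalty. The minimal sketch below (our own function, taking precomputed log-likelihoods) shows the standard construction.

```python
def dic(log_liks, log_lik_at_post_mean):
    """Deviance information criterion (Spiegelhalter et al. 2002).

    log_liks: log-likelihood of the data at each retained posterior draw.
    log_lik_at_post_mean: log-likelihood at the posterior mean parameters.
    Lower values indicate a better fit after penalizing model complexity.
    """
    d_bar = -2.0 * sum(log_liks) / len(log_liks)   # posterior mean deviance
    d_hat = -2.0 * log_lik_at_post_mean            # deviance at posterior mean
    p_d = d_bar - d_hat                            # effective number of parameters
    return d_bar + p_d                             # equivalently d_hat + 2 * p_d
```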
6. Results
Table 3 reports the posterior means of the parameters specified in the propensity model. Since we apply a
Bayesian approach to estimate the model, we evaluate the significance of the parameter estimates based
on the highest posterior density (HPD) intervals. An HPD interval describes the information we have about
the location of the true parameter after we have observed the data (Hoff 2009). Parameter estimates are
considered significant if the corresponding HPD intervals do not contain zero.
Table 3. Estimation Results of Propensity Model

Notation   Description                          Coeff.    Signif.
βi0        Mean effect of baseline propensity    0.831    **
β1         Disconfirmation – Linear             -0.011    **
β2         Disconfirmation – Quadratic           0.025    **
β3         Perceived system precision           -0.508    **
β4         Experienced quality – Linear         -0.268    **
β5         Experienced quality – Quadratic       0.065    **
β6         Price                                 0.271    **
β7         Shipping and handling time           -0.041    **
β8         Volume of posted ratings             -0.009
β9         Variance of posted ratings            0.009    *

** (*) Indicates 95% (90%) HPD interval does not contain 0.
The parameters β1 and β2 together suggest that an individual's participation in leaving online product
reviews is, at least partially, driven by the realization of disconfirmation. That is, the further the post-purchase
evaluation deviates from the pre-purchase expectation, the more likely an individual is to
express her opinions about the product. If the expectation more or less reflects her own assessment, the
consumer is more inclined to lurk. Our estimation results thus tell a very different story from what the
distribution of rating discrepancies appears to show (see Figure 3). Loosely, the disconfirmation
effect on posting behavior could be attributed to many motives, such as altruism. An altruist may praise
the product by submitting a score higher than the current mean level if her usage experience is better
than what the rating signal represents. On the contrary, if her usage experience with the product is worse
than what the rating signal indicates, she may want to warn peer consumers to be vigilant about the product
by leaving a below-average rating. Moreover, the negative β1 suggests that negative disconfirmation
induces posting behavior to a larger extent, ceteris paribus. While existing work has
documented several motives for consumer engagement in WOM from a normative point of view (Hennig-Thurau
et al. 2004; Sundaram et al. 1998), this research provides a positive validation and contributes to the
literature by demonstrating that disconfirmation is one of the underlying forces driving consumers to
voluntarily engage in online WOM.
A unique feature of this research is that we tie the change in consumer participation in generating
reviews to the evolving perception of system precision. From an economic perspective, it should
be clear that disconfirmation has an effect distinct from that of perceived system precision. The former
captures the short-term impact of a disconfirmation shock (represented by β1 and β2), whereas the latter
measures a cumulative attitude based on the series of disconfirmations realized in the past and can be
thought of as having a long-term influence on rating behavior (represented by β3). Our results show that
consumers tend to be more silent in rating activities as they perceive the review system to be more precise
over time (β3 < 0). Put differently, consumers are more vocal when the review system is
perceived to be noisy.
As for the effect of product evaluation on posting propensity, the estimated coefficient is
negative for the linear term (β4) but positive for the quadratic term (β5). Combined, these two parameter
estimates indicate that consumers are more likely to share opinions when the experienced product quality
is either very high or very low. This finding echoes the polar effect observed by Anderson (1998) in an
offline setting and by several researchers in online settings (Dellarocas et al. 2010; Dellarocas and
Narayan 2006; Moe and Schweidel 2012). Furthermore, we find that consumers are more interested
in rating products with higher prices (β6 > 0) and products handled and shipped out in a timely manner (β7 < 0).
The rating environment, characterized by the volume and variance of posted ratings, has dissimilar
impacts on a focal consumer's posting decision. In particular, online raters' posting decisions are subject to
a crowding-out effect, meaning that a focal individual is inclined to lurk if a large crowd has already
shared its opinions (β8 < 0). This finding is analogous to the political phenomenon in which voters tend to
abstain once public polls have declared clear winners (Sudman 1986). On the other hand, dissension among
posted opinions encourages the focal individual to contribute to eWOM (β9 > 0).
The estimation results for the other parameters are reported in Table 4. The coefficient for product price
is positive but insignificant (λ1 > 0), implying that product price is, in general, a weak proxy for quality. As
expected, the order handling time has a negative impact on the rating evaluation (λ2 < 0). This provides
evidence that online consumers more or less reflect the service level they receive from the e-commerce
site in their product ratings. Unlike in the propensity model, the dissension of public opinions has a
negative impact on the latent rating evaluation. The posterior mean of our estimate of log(b̄0) is significant,
suggesting that the updating mechanism captures well the evolution of the system precision perceived by
consumers.
Table 4. Estimation Results of Evaluation Model and Other Parameters

Model                  Notation   Description                                                       Mean     Signif.
Evaluation             λi0        Mean effect of experienced quality                                1.997    **
                       λ1         Price                                                             0.033
                       λ2         Shipping and handling time                                       -0.104    **
                       λ3         Volume of the rating environment                                 -0.015
                       λ4         Variance of the rating environment                               -0.033    *
Cutoffs                κ1         Cutpoint for s = 1 and 2 (fixed at 0)                             0.000    ---
                       κ2         Cutpoint for s = 2 and 3                                          0.486    **
                       κ3         Cutpoint for s = 3 and 4                                          1.406    **
                       κ4         Cutpoint for s = 4 and 5                                          2.459    **
Credibility Updating   log(bi0)   Mean effect of initial inverse scale parameter (log-transformed)  2.189    **
                       ai0        Initial shape parameter (fixed at 5)                              5.000    ---
Correlation            ρ          Interdependency between two rating stages                         0.082

** (*) Indicates 95% (90%) HPD interval does not contain 0.
Table 5 reports the means and standard deviations of the individual-specific parameters. The large
standard deviations for βi0 and λi0 suggest that online reviewers are substantially heterogeneous in baseline
posting propensity and product evaluation (also see Figure 7). To explore the interdependence among the
baseline latent parameters at the individual level, we provide their pair-wise correlation matrix
in Table 6. The positive correlation between βi0 and λi0 suggests that, after controlling for other factors,
individuals with a higher baseline posting propensity appear to be more lenient with respect to giving
high numeric ratings. This finding is the opposite of Moe and Schweidel (2012), who find that active raters
are more negative. Finally, our estimation results indicate less heterogeneity across individuals in their
initial perception of system precision, as indicated by the small standard deviation for bi0.
Table 5. Individual-specific Parameter Estimates

Model                     Notation   Mean among individuals   Std. dev. among individuals
Baseline propensity       βi0        0.831                    0.598
Baseline evaluation       λi0        1.997                    0.520
Initial learning status   bi0        8.931                    0.208
(a) Distribution of βi0 (b) Distribution of λi0 (c) Distribution of bi0
Figure 7. Distributions of Individual-Level Parameters
Table 6. Pair-wise Correlations of Individual-specific Parameters

       βi0     λi0     bi0
βi0    1.000   0.142   0.387
λi0            1.000   0.876
bi0                    1.000
6.1. Extension and Robustness Checks
In this section, we relax some of our model assumptions by taking into account other factors that could
potentially bias our findings. The estimation results remain consistent across the following three
robustness checks.
6.1.1. The Effect of Disconfirmation on Product Evaluation
In our proposed model, we assume that disconfirmation enters only the propensity model and does not
have any impact on the post-purchase evaluation of the product. That is, consumers do not factor their
prior expectation about quality into the product evaluation. Prior research has shown that, in offline settings,
a consumer's overall satisfaction can be affected by quality disconfirmation (Anderson and Sullivan
1993), and this relationship has been shown empirically to be positive (Rust et al. 1999). Based on this theory,
we now allow a consumer's rating evaluation to be affected by her overall product satisfaction Sijt, which is
a linear combination of her experienced product quality and the quality disconfirmation:
Sijt = Qijt + ν Q̃ijt, where ν is a new parameter to be estimated. With this specification, the evaluation
model can be rewritten as:

Eval*ijt = λi0 Sijt + λ1:4 Xjt + εe,ijt .  (25)

The propensity model becomes:

Prop*ijt = βi0 + β1 Q̃ijt + β2 Q̃²ijt + β3 Preit + β4 Sijt + β5 S²ijt + β6:9 Xjt + εp,ijt .  (26)

The mean of disconfirmation becomes:

Q̃ijt = [ R̄j0 - ( Rjt - δit ) ] / ( 1 + ν ).  (27)
Comparing (27) with (5), we can see that the inclusion of ν shrinks the mean of the disconfirmation signal
towards zero, provided ν > 0.
The estimation results (with respect to both parameter estimates and DIC) of the new model deviate only
marginally from those reported in Table 3, suggesting that the main results of this paper are robust
even when we consider the impact of disconfirmation on overall product satisfaction. The posterior mean of ν
is 0.117 and is significant at the 95% level, suggesting that online consumers do indeed factor their ex-ante
expectations into their product ratings.
6.1.2. The Interaction between Disconfirmation and Rating Environments
We have examined how a focal individual's posting decision is influenced by the disconfirmation she
encounters as well as by the rating environment to which she is exposed. An interesting question to ask is: Is
the disconfirmation effect itself also moderated by the opinions expressed by the crowd? To answer this, we
allow the effect of disconfirmation on posting propensity19 to interact with the volume and variance of
posted ratings, and incorporate these two interaction terms in our propensity model. The parameters of
interest are reported in Table 7.20
Table 7. Effects of Rating Environments on Posting Decision

Effect                                 Coeff.    Signif.
Volume on propensity                  -0.008
Variance on propensity                 0.023    **
Volume on disconfirmation effect       0.026
Variance on disconfirmation effect    -0.008    *

** (*) Indicates 95% (90%) HPD interval does not contain 0.
Consistent with the results shown in Table 3, we still find evidence that the level of disagreement
among others encourages posting behavior. Interestingly, the coefficient for the "Disconfirmation × Variance"
interaction term is negative and significant. This indicates that the disconfirmation effect is moderated by
the degree of dissension among posted opinions, perhaps because a consumer can anticipate the review
information to be noisy and hence becomes less sensitive to the realized disconfirmation. Moreover, the
coefficient of the "Disconfirmation × Volume" interaction has a positive sign. This also makes sense: the
higher the volume of posted ratings, the more trustworthy consumers perceive the review information
to be, and therefore the more sensitive they are to the disconfirmation effect.
6.1.3. The Use of the Long-term Average Rating as a Proxy for Product Quality
19 The disconfirmation effect can be computed as the product of the disconfirmation variables and the corresponding coefficients (for both the linear and quadratic terms).
20 Since the parameter estimates do not change substantially, we report here only the coefficients for variables related to the rating environments.
The long-term product-level average ratings are used as proxies for the baseline product quality. Although
the validity of this proxy is justified (Footnote 9), critical readers may argue that this proxy may not
necessarily reflect the overall evaluation among all consumers. Hu et al. (2006) argue that the mean rating
would over-report (under-report) the true quality if consumers are more inclined to brag (moan) about the
product when they are highly satisfied (disgruntled). To address this issue, we add a product-level random
effect in the formulation of the post-purchase evaluation. As a result, (2) can be rewritten as:
Qij = λ0(R̄j + ωj) + εi,  (28)
where ωj ~ N(0, σω²). The parameter ωj models the product-level randomness in the sense that some
products are overrated and others underrated by the long-term mean ratings. If the estimated σω
is small, we can argue that the long-term average ratings more or less reflect the true product quality.
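As a quick numerical illustration of this specification (a sketch only — λ0, the error scale, and the sample sizes below are hypothetical placeholders, not our estimates), a small σω implies that the long-term mean rating of each product tracks its baseline quality almost exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

n_products, n_consumers = 50, 200
lam0 = 1.0           # placeholder scale coefficient (not an estimate)
sigma_omega = 0.011  # small product-level spread, as the posterior suggests
sigma_eps = 0.5      # placeholder idiosyncratic noise

R_bar = rng.uniform(1, 5, n_products)           # long-term mean ratings
omega = rng.normal(0, sigma_omega, n_products)  # product-level random effect
eps = rng.normal(0, sigma_eps, (n_consumers, n_products))

# Specification (28): Q_ij = lam0 * (R_bar_j + omega_j) + eps_i
Q = lam0 * (R_bar + omega) + eps

# With sigma_omega this small, the average evaluation of each product is
# essentially lam0 * R_bar_j, i.e., the mean rating is a faithful proxy.
bias = Q.mean(axis=0) - lam0 * R_bar
print(np.abs(bias).max())
```

The residual gap is driven almost entirely by the idiosyncratic noise averaging out, not by the product-level effect, which is the sense in which a small σω validates the proxy.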
The estimation results obtained with this new specification show no noticeable differences from
those obtained with our original model. The posterior mean (standard deviation) for ωj is 0.011 (6×10⁻⁵),
providing supportive evidence that the long-term average ratings serve as a good proxy for the baseline
quality. What makes our finding different from the extant work is that, in addition to the post-purchase
evaluation, we also consider the disconfirmation effect. The richer model allows consumers to make
posting decisions in response to the overall tone on the product quality set by posted ratings. We should
expect that, in the presence of the disconfirmation effect, the average rating will gradually converge to the
true quality in the long run. In contrast, the results of Hu et al. (2006) depend on the assumption that
the mechanism underlying consumer posting behavior is unresponsive to opinions previously expressed by
others.
6.2. Simulations and Analyses on Consumer Rating Behavior
In this subsection, we perform a series of counterfactual analyses to further highlight the significance of
our findings and better understand consumer rating behavior in an online setting.
6.2.1. Recovering unobserved rating scores
In the first analysis, we recover the rating outcomes (yijt's and zijt's) for purchase occasions that have a
follow-up review entry (yijt = 1). These recovered (simulated) outcomes are compared to the observed
ones to gauge the performance of our model. We also “uncover” the outcomes for lurking occasions
(yijt = 0) to demonstrate the mechanism underlying online rating behavior. The simulation procedure is
summarized as follows:
1. We compute the latent posting propensity per (15) and evaluation per (17) for all purchase occasions
based on the posterior estimates.
2. Given the computed posting propensity, evaluation and estimated cutpoints, we simulate individual
decisions of whether to rate per (16) and what to rate per (18).
3. We repeat Step 2 for 1,000 iterations.
4. We compute the averages of the quantities of interest across iterations.
To make sure the number of iterations is sufficient, we repeat Steps 2–4 in three parallel processes
with different random seeds. We compare the simulated results and find no inconsistencies across
processes.
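In stylized form, the procedure above amounts to repeatedly drawing binary posting decisions and ordered ratings from the latent quantities. The sketch below illustrates the mechanics; all numbers, thresholds, and cutpoints are hypothetical placeholders standing in for equations (15)–(18) and our posterior estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent quantities standing in for equations (15) and (17)
n_occasions = 5000
propensity = rng.normal(0.0, 1.0, n_occasions)  # latent posting propensity
evaluation = rng.normal(3.3, 1.2, n_occasions)  # latent post-purchase evaluation
cutpoints = np.array([1.5, 2.5, 3.5, 4.5])      # placeholder ordered cutpoints

def simulate_once(rng):
    # Eq. (16)-style decision: post if propensity plus a shock crosses a threshold
    post = (propensity + rng.normal(0, 1, n_occasions)) > 1.5
    # Eq. (18)-style outcome: map the noisy evaluation into 1-5 stars via cutpoints
    stars = 1 + np.searchsorted(cutpoints, evaluation + rng.normal(0, 0.3, n_occasions))
    return post, stars

# Repeat the draw for 1,000 iterations and average the quantity of interest
post_rate = np.mean([simulate_once(rng)[0].mean() for _ in range(1000)])
print(round(post_rate, 3))
```

Averaging over many draws, rather than taking a single realization, is what smooths out the simulation noise in the recovered posting rates and rating frequencies.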
Figure 8. Simulated vs. Actual Ratings
We begin our analysis by assessing the performance of our model in explaining the behavior of
consumers leaving ratings. Figure 8 plots the simulated ratings (light grey bars) for the
rated occasions against the actual ratings observed in the data set (dark grey bars). The two
series are very close to each other, indicating that the proposed model captures well the
mechanism governing consumers’ rating decisions.21
21 We also run the same analysis for three alternative models defined in Section 6.2. The results show that our
proposed model outperforms its three nested versions in terms of recovering consumers’ rating decisions.
[Figure 8 data — Y-axis: Frequency (%); X-axis: Rating]
Rating:      1      2      3      4      5
Simulated:  6.9%   6.1%  19.6%  30.6%  36.8%
Actual:     7.2%   6.4%  19.9%  29.6%  36.9%
For interpretation purposes, we compute the posting rates along two dimensions and plot them
against different levels of post-purchase evaluation (Figure 9a) and different levels of rating discrepancy
(Figure 9b). We can see that the relationship between the posting rate and the discretized product
evaluation is best characterized by a left-skewed U shape. Negative experiences (rated as 1 or 2) have a
higher chance of being reported (7.44% and 6.10%, respectively), relative to neutral (5.88%) and positive
evaluations (5.54% and 5.70%). It is evident that disappointed or disgruntled consumers (those who
encounter a strong, negative rating discrepancy) are more vocal in expressing opinions about the product.
The simulation results provide a sharp insight into what drives consumers to leave online reviews; an
insight that cannot be discovered by directly observing the data pattern, as shown in Figure 2 and Figure 3.
(a) Posting Rate (%) vs. Post-purchase Evaluation (b) Posting Rate (%) vs. Rating Discrepancy
Figure 9. Dissimilar Effects of Different Measurements on Posting Behavior
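The binning behind Figure 9a is straightforward to reproduce. The sketch below works on synthetic occasions; the per-level posting probabilities are seeded with the rates reported above purely for illustration, not drawn from our estimated model:

```python
import numpy as np

rng = np.random.default_rng(4)

n = 100_000
# Synthetic occasions: a discretized 1-5 post-purchase evaluation per purchase
evaluations = rng.integers(1, 6, n)
# Posting probability per evaluation level (the rates reported for Figure 9a)
rate_by_level = np.array([0.0744, 0.0610, 0.0588, 0.0554, 0.0570])
posted = rng.random(n) < rate_by_level[evaluations - 1]

# Posting rate per evaluation level, as plotted in Figure 9a
rates = np.array([posted[evaluations == k].mean() for k in range(1, 6)])
print(np.round(rates, 3))
```

Grouping the simulated occasions by evaluation level and averaging the binary posting indicator recovers the left-skewed U shape described above.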
6.2.2. Convergence of product ratings
To highlight how the disconfirmation effect influences the evolution of product ratings at the product level,
we perform our second simulation by following the procedure below.
1. We sample 3,000 individuals from the posterior estimates of the population-level parameters (Σ). We
assume all individuals purchase and consume a product whose baseline quality is equivalent to 3.5
stars. The sequence of individuals entering the market is random.
2. For each individual, we first simulate her posting decision; for an individual who decides to post
a rating, we simulate her decision of what rating to give. When a new rating is posted, we update
the running mean rating as well as the other variables that serve as controls in subsequent rating
occasions.
3. We repeat Step 2 until 150 ratings are posted.
4. We repeat Steps 1-3 for multiple iterations.
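A stripped-down version of this procedure already reproduces the convergence logic. The sketch below is illustrative only — the logistic posting rule and all constants are hypothetical placeholders for the estimated model. It starts from an inflated early mean and lets disconfirmation-driven posting pull the average back to the true quality:

```python
import numpy as np

TRUE_QUALITY = 3.5

def simulate_path(initial_mean, n_ratings=150, seed=2):
    """Each arriving consumer forms an expectation from the running mean,
    experiences the true quality plus noise, and is more likely to post
    the larger the (absolute) disconfirmation."""
    rng = np.random.default_rng(seed)
    ratings = [initial_mean]  # seed the board with an early mean rating
    while len(ratings) < n_ratings:
        expectation = np.mean(ratings)
        experience = TRUE_QUALITY + rng.normal(0, 0.8)
        disconfirmation = experience - expectation
        # Posting probability rises with |disconfirmation| (logistic sketch)
        p_post = 1.0 / (1.0 + np.exp(2.0 - 1.5 * abs(disconfirmation)))
        if rng.random() < p_post:
            ratings.append(float(np.clip(round(experience), 1, 5)))
    return np.array(ratings)

path = simulate_path(initial_mean=4.5)  # initially overrated, as in Figure 10a
running_mean = np.cumsum(path) / np.arange(1, len(path) + 1)
print(round(running_mean[-1], 2))       # settles near the true quality
```

When the board starts high, negative disconfirmations dominate the early posting decisions, dragging the mean down; once the mean reaches the true quality, posted ratings become symmetric and the path stabilizes.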
Figure 10 depicts four major evolutionary patterns of online product ratings observed in our
simulation results. It is evident that, in the long run, the mean rating converges to the true product
quality (3.5). If the mean rating in the early stage is high relative to the true quality, individuals with
neutral or negative evaluations would perceive the rating signal to be inflated. The disconfirmation effect
would take place and induce low ratings from those dissatisfied consumers. As a result, we shall expect
the mean rating to gradually drop to the level of 3.5 and then stabilize (see Figure 10a). While such a
declining trend has been attributed to different effects such as self-selection biases (Li and Hitt 2008)
and environmental factors (Moe and Schweidel 2012), the effect of disconfirmation on posting decisions
provides an alternative explanation for this data pattern. Perhaps more importantly, the
disconfirmation effect can also explain other evolutionary patterns that cannot be explained by those
established theories. For example, if the product is initially underrated, subsequent consumers who
encounter positive disconfirmation are more likely to lift the mean rating by sharing their pleasant
usage experiences (Figure 10b). Interestingly, we also observe an undershooting property, where the
average rating first exhibits a steep decline and then bounces back to a steady state (Figure 10d).
Based on our estimation and simulation results, we believe that the disconfirmation effect is one of the
most important forces driving the voluntary behavior of consumers posting product reviews.
6.2.3. The effect of fake, inflated ratings on subsequent rating entries
Our simulation can also be used to understand how review manipulation in the early stage influences
subsequent rating submissions. Similar to the previous setting, we consider a fictitious scenario in which
a firm, selling a product with true quality equivalent to 3.5 stars, is able to “set” the mean rating at 4 (an
inflation of 0.5 stars) when the number of reviews reaches 15. We repeat Steps 1–4 listed in the previous
subsection for 10 iterations and average the posted ratings across iterations. As Figure 11 illustrates, the
mean rating plummets after the 15th entry and stabilizes at the level of the true quality. Although fake,
inflated ratings may elevate buyers’ pre-purchase expectations and hence reap additional revenue, we
argue that such deceptive information may strongly induce very negative feedback from disgruntled
consumers who post reviews for anxiety reduction or vengeance purposes. Similarly, we should expect
that the damage caused by a competitor’s malicious review distortion would be alleviated by subsequent
positive ratings reported by genuine, satisfied customers.
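This counterfactual can be sketched in the same stylized setting (again, the posting rule and every constant below are illustrative placeholders, not the estimated model): fake 5-star entries push the displayed mean to 4.0 once 15 reviews exist, after which disconfirmed genuine buyers pull it back down:

```python
import numpy as np

TRUE_QUALITY = 3.5

def manipulated_path(n_ratings=150, boost_at=15, target_mean=4.0, seed=3):
    """Genuine consumers post as in the convergence sketch; once `boost_at`
    reviews exist, the firm injects fake 5-star entries until the mean
    rating reads `target_mean` (a 0.5-star inflation)."""
    rng = np.random.default_rng(seed)
    ratings = []
    injected = False
    while len(ratings) < n_ratings:
        if not injected and len(ratings) >= boost_at:
            while np.mean(ratings) < target_mean:
                ratings.append(5.0)  # fake 5-star reviews
            injected = True
        expectation = np.mean(ratings) if ratings else TRUE_QUALITY
        experience = TRUE_QUALITY + rng.normal(0, 0.8)
        # Posting probability rises with |disconfirmation| (logistic sketch)
        p_post = 1.0 / (1.0 + np.exp(2.0 - 1.5 * abs(experience - expectation)))
        if rng.random() < p_post:
            ratings.append(float(np.clip(round(experience), 1, 5)))
    return np.array(ratings)

path = manipulated_path()
running_mean = np.cumsum(path) / np.arange(1, len(path) + 1)
# The mean jumps at the injection, then drifts back toward the true quality
print(round(running_mean[15], 2), round(running_mean[-1], 2))
```

The injection widens the gap between expectation and experience for the typical buyer, which under the disconfirmation mechanism raises the posting probability of exactly the consumers whose ratings undo the manipulation.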
Figure 10. Various Evolutionary Patterns of Ratings (X-axis: Rating Index)
Figure 11. Effect of Inflated Ratings on Subsequent Rating Entries (X-axis: Rating Index)
7. Conclusion
The primary objective of this paper has been to study the effect disconfirmation has on consumer rating
behavior in a digital setting. Early research on online product reviews studied how user-generated
ratings relate to market performance, whereas more recent work focuses on the impact of existing
ratings on subsequent ones from a social-dynamics standpoint (Lee et al. 2014; Moe and Schweidel 2012;
Shen et al. 2013). Our work contributes to the latter literature stream by proposing a novel framework in
which a focal consumer’s posting decisions are influenced by others’ opinions even before the
product is purchased and consumed. In addition, we model how consumers’ perception of the system
credibility evolves over time as a result of a series of realized disconfirmations. By integrating these
two features into the individual decision-making process, we are able to provide a comprehensive
understanding of what drives consumers to voluntarily contribute online product ratings.
Using a rich data set containing complete purchasing and rating activities at the individual level, we
empirically show that online consumers’ rating behavior is driven by disconfirmation, the extent to which
the post-purchase evaluation deviates from the pre-purchase expectation of the same product. The
magnitude of the disconfirmation effect is decreasing in the variance of posted ratings while increasing in
the volume of submitted reviews. We also find that online raters tend to become silent as they perceive
the review system to be more precise over time.
The main finding of this paper echoes prior research in some respects but also provides different
insights in others. On one hand, the disconfirmation effect serves as one of the underlying drivers of why
people engage in WOM, alongside concerns for others, anxiety reduction, vengeance, etc. (Anderson 1998;
Hennig-Thurau et al. 2004). The impact of disconfirmation on posting behavior can also (at least partially)
explain 1) the commonly observed U-shaped distribution of online product ratings; and 2) the declining
trend of average ratings at the product level. On the other hand, we believe that the long-term average
rating can more or less represent the true product quality. Through simulations, we demonstrate that the
identified disconfirmation effect will “ensure” that the average rating converges to the true value in the
long run. Our proposition to some extent disagrees with Hu et al. (2006), who argue that the mean score
may provide misleading recommendations, a conclusion contingent on the analytical assumption that
consumers’ posting decisions are solely triggered by their perceived quality.
Our empirical results shed light on the economic value of online product ratings in the following
respects. Manufacturers and service providers should be aware that online reviewers are more prone to
provide feedback when expectations do not match their own perceived quality. While manipulating
product reviews by inflating numeric ratings can temporarily boost sales revenue, inflated ratings could
turn out to be detrimental in the long run as disappointed customers engage in negative eWOM.22
Furthermore, online reviewers’ perception of system precision has a negative impact on their
propensity to contribute reviews. The existing literature also shows that rating environments with smaller
volume or lower valence discourage posting incidence (Moe and Schweidel 2012). To keep the online
rating environment healthy and prosperous, policy makers and marketers in charge of online
rating campaigns should design various incentives for different consumers accordingly.
This study has some limitations and can be improved in the following directions. First, our data contains
limited information, which prevents us from examining online rating behavior in more detail. For
example, although we have utilized aggregate review information, such as valence and variance, in
formulating the quality variables, we do not consider the possibility that some consumers may value
positive and negative ratings differently. If we were able to observe the distribution of posted ratings at
the time of purchase, we might discover more interesting findings on the way consumers interpret
review signals in the pre-purchase stage. Second, a promising direction for future research is to apply text-
mining techniques to extract the sentiments stored in the textual data and incorporate them into our
econometric model. The synergy created by integrating different research methods may allow us to
provide a sharper insight into consumers’ online rating behavior. Third, consumers’ browsing history
before or after purchase could greatly help us build an even more sophisticated model. For example, if
we knew how many pages of reviews each prospective buyer reads and how long she stays on each page,
we would be able to better calibrate how previously posted reviews shape the formation of the expected
quality. If we further knew what similar items each prospective buyer has looked at, we could directly
model the effect of existing ratings on a focal consumer’s purchase decision.
References
Anderson, E. W. 1998. Customer Satisfaction and Word of Mouth. Journal of Service Research 1 (1):5-
17.
22 In our empirical model, we can only show that inflated ratings would drive disappointed buyers to leave negative
numeric ratings. However, we cannot gauge the harm those negative textual reviews would do to the firm’s reputation.
Anderson, E. W., and M. W. Sullivan. 1993. The Antecedents and Consequences of Customer
Satisfaction for Firms. Marketing Science 12 (2):125-143.
Chaiken, S. 1980. Heuristic versus systematic information processing and the use of source versus
message cues in persuasion. Journal of personality and social psychology 39 (5):752.
DeGroot, M. H. 1970. Optimal statistical decisions. Vol. 82: Wiley-Interscience.
Dellarocas, C. 2006. Strategic Manipulation of Internet Opinion Forums: Implications for Consumers and
Firms. Management Science 52 (10):1577-1593.
Dellarocas, C., M. Fan, and C. Wood. 2004. Self-interest, reciprocity, and participation in online
reputation systems.
Dellarocas, C., G. Gao, and R. Narayan. 2010. Are consumers more likely to contribute online reviews for
hit or niche products? Journal of Management Information Systems 27 (2):127-158.
Dellarocas, C., and R. Narayan. 2006. A statistical measure of a population’s propensity to engage in
post-purchase online word-of-mouth. Statistical science 21 (2):277-285.
Deloitte. 2007. Most consumers read and rely on online reviews; companies must adjust. USA.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 2003. Bayesian data analysis: CRC press.
Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation using multiple sequences.
Statistical Science 7 (4):457-472.
Ghose, A., P. G. Ipeirotis, and B. Li. 2012. Designing ranking systems for hotels on travel search engines
by mining user-generated and crowdsourced content. Marketing Science 31 (3):493-520.
Gormley, M. 2013. NY attorney general cracks down on fake online reviews. NBC News. Available
from http://www.nbcnews.com/tech/internet/ny-attorney-general-cracks-down-fake-online-reviews-
f4B11235875.
Hennig-Thurau, T., K. P. Gwinner, G. Walsh, and D. D. Gremler. 2004. Electronic word-of-mouth via
consumer-opinion platforms: what motivates consumers to articulate themselves on the internet?
Journal of interactive marketing 18 (1):38-52.
Hoff, P. D. 2009. A first course in Bayesian statistical methods: Springer.
Hu, N., P. A. Pavlou, and J. Zhang. 2006. Can online reviews reveal a product's true quality?: empirical
findings and analytical modeling of Online word-of-mouth communication. In Proceedings of the
7th ACM conference on Electronic commerce. Ann Arbor, Michigan, USA: ACM, 324-330.
Johnston, S. Don't Fall for Fake Online Reviews. US News 2014. Available at
http://money.usnews.com/money/personal-finance/articles/2014/07/08/dont-fall-for-fake-online-
reviews.
Koop, G., D. J. Poirier, and L. Tobias. 2007. Bayesian Econometric Methods (Econometric Exercises):
Cambridge University Press.
Lee, Y., K. Hosanagar, and Y. Tan. 2014. Do I follow my friends or the crowd? Information cascades in
online movie ratings. Management Science Forthcoming.
Leggatt, H. 2011. 24% of consumers turned off after two negative online reviews 2011 [cited April 13
2011]. Available from http://www.bizreport.com/2011/04/27-of-consumers-turned-off-by-just-two-
negative-online-revie.html.
Li, X., and L. M. Hitt. 2008. Self-Selection and Information Role of Online Product Reviews. Information
Systems Research 19 (4):456-474.
McGlohon, M., N. Glance, and Z. Reiter. 2010. Star quality: Aggregating reviews to rank products and
merchants. Paper read at International Conference on Weblogs and Social Media.
Moe, W. W., and D. A. Schweidel. 2012. Online Product Opinions: Incidence, Evaluation, and Evolution.
Marketing Science 31 (3):372-386.
Netzer, O., J. M. Lattin, and V. Srinivasan. 2008. A Hidden Markov Model of Customer Relationship
Dynamics. Marketing Science 27 (2):185-204.
Nielsen. 2007. Word-of-Mouth the Most Powerful Selling Tool. Available from
http://nz.nielsen.com/news/Advertising_Oct07.shtml.
PowerReviews. 2010. Available from http://www.powerreviews.com/products/social-content-
engine/customer-reviews.
Rust, R. T., J. J. Inman, J. Jia, and A. Zahorik. 1999. What You Don't Know About Customer-Perceived
Quality: The Role of Customer Expectation Distributions. Marketing Science 18 (1):77-92.
Schlosser, A. E. 2005. Posting versus lurking: Communicating in a multiple audience context. Journal of
Consumer Research 32 (2):260-265.
Shen, W., Y. J. Hu, and J. Rees. 2013. Competing for Attention: An Empirical Study of Online Reviewers’
Strategic Behavior. Working paper.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. Van Der Linde. 2002. Bayesian measures of model
complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64
(4):583-639.
Sudman, S. 1986. Do Exit Polls Influence Voting Behavior? The Public Opinion Quarterly 50 (3):331-
339.
Sundaram, D. S., K. Mitra, and C. Webster. 1998. Word-of-mouth communications: a motivational
analysis. Advances in consumer research 25 (1):527-531.
Ying, Y., F. Feinberg, and M. Wedel. 2006. Leveraging Missing Ratings to Improve Online
Recommendation Systems. Journal of Marketing Research 43 (3):355-365.