Disconfirmation Effect on Online Rating Behavior: A Dynamic Analysis
Abstract
This research studies the effect of “disconfirmation”—the discrepancy between the expected and
experienced quality of the same product—on the behavior of consumers leaving product reviews in an
online setting. Specifically, we posit that an individual’s pre-purchase expectation is driven by the product
ratings she observes and the credibility of the review system she perceives at the time of purchase. Upon
product consumption, the individual obtains post-purchase evaluation and uses this information to
confirm/disconfirm with her expectation. We extend the Bayesian learning framework to model the
individual perception of the system credibility as how closely the ex-ante expectation matches the ex-post
evaluation. Incorporating such dynamic perception in raters’ decisions allows us to better understand the
chance of consumer engagement in sharing product experience over time. We estimate the proposed
model using a rich data set comprising complete purchasing and rating activities made by 361 individuals
on an e-commerce website. Our results show that consumer participation in leaving product reviews is
subject to the disconfirmation effect. The further an individual’s post-purchase evaluation deviates from
her pre-purchase expectation, the more likely she is to express her own opinions about the product. Our
results also show that online consumers tend to be less active with respect to posting reviews as they
perceive the system to be more precise over time. From simulations, we highlight 1) the dissimilar effects
of disconfirmation and experienced quality on consumer rating behavior; 2) various evolutionary patterns
of product ratings over time; and 3) the effect of fake, inflated ratings on subsequent rating entries.
Keywords: product reviews, disconfirmation, Bayesian learning, word of mouth, user-generated content
1. Introduction
It has been well recognized that online customer reviews and ratings1 have a substantial impact on
consumers’ purchasing decisions. According to surveys, 82% of the respondents agree that online reviews
directly influence their decisions on what product to buy (Deloitte 2007) and over 75% of them consider
recommendations from those who have experienced the product the most credible information source
(Nielsen 2007). By observing the usage experience shared by peer consumers, prospective buyers can
reduce the product uncertainty and make more informed purchases, leading to higher satisfaction and
lower merchandise return rates (PowerReviews 2010). As a result, the prevalence of online reviews
enhances the information transparency and improves the efficiency of the digital marketplace.
Recently, there has been growing evidence that companies attempt to manipulate online customer reviews for
targeted products and services (Gormley 2013). Some boost the image of their own brands or
merchandise by posting deceptively positive evaluations, whereas others undermine that of their competitors
by fabricating negative usage experiences, or do both. Such unethical practices cast serious doubt on the
trustworthiness of customer-reported ratings and jeopardize the credibility of the review system perceived
by consumers (Johnston 2014).
In this research, we empirically study how the behavior of consumers leaving online product reviews
is impacted by disconfirmation, the discrepancy between the pre-purchase expectation and the post-
purchase evaluation about the same product. Our proposed model is developed from two novel aspects.
First, we examine the online rating behavior by looking at the activities involved throughout the entire
purchase process. An online review platform is an information system that facilitates information
exchange among its users. System users can be categorized into two groups based on how they interact
with the system. The first user type is the review reader (information receiver) who gathers information
about others’ usage experience stored in various formats, such as numeric ratings, textual contents, multi-
media files, etc. After consuming the product herself, a review reader has an opportunity to become a
1 Following literature, by reviews we mean a general format of consumer evaluation, whereas by ratings we
specifically refer to numeric values indicating overall satisfaction.
review poster (information provider), the second role a user can play, by sharing her own assessment on
the products. It has been shown that when making rating decisions, an individual tends to first observe the
expressed opinions by others and then adjusts her own feedback accordingly (Moe and Schweidel 2012;
Schlosser 2005). In terms of timing, prior research assumes that the previously posted ratings would
impact a focal individual only when she faces a rating decision in the post-purchase stage, without
considering an important fact that review posters are often review readers. Motivated by the dual roles a
consumer can play, we posit that the influence of existing ratings may take place when the focal
individual accesses the review information in the pre-purchase stage as well.
The second novelty of this research is that we take into account that individuals may not perceive the
review system to be fully trustworthy. Specifically, we consider that an individual holds the perception of the
system credibility, a subjective attitude underlying how credible the system is perceived by her. In the pre-
purchase stage, a consumer formulates the pre-purchase expectation of the product based on the posted
ratings, taking into account her own perceived system credibility. Upon product consumption, she
obtains the post-purchase evaluation of the product and encounters disconfirmation. The realizations of
disconfirmation are then used by the consumer to update her perception of the system credibility over
time. Such a consumer response is especially realistic, after the outbreak of firms’ strategic manipulation
in online rating environments.
Unlike extant work that studies the review incidence from a static point of view, this research
particularly examines the evolution of individual rating behavior from a dynamic perspective. We
extend the Bayesian learning framework, flexibly allowing the perception of system credibility to change
over time and dynamically incorporating this time-variant perception into raters’ decision-making process.
The proposed model is applied to a unique dataset consisting of complete purchase history and review
entries that took place on an e-commerce website. Utilizing these complete activity logs, we can exactly
observe whether a consumer leaves a review for the product she has purchased earlier on. In addition, the
length of our panel enables us to calibrate how consumer rating behavior changes over time. To the best
of our knowledge, this is the first research that looks at the evolution of consumer participation in giving
online reviews at the micro level.
Our estimation results show that consumer review-posting behavior is, at least partially, driven by
the quality disconfirmation, perhaps due to an intention to correct the overall tone established by peer
consumers. This finding provides an additional explanation to the common J-shaped distribution of online
ratings across different platforms. Moreover, the disconfirmation effect on review contribution is further
moderated by the rating environments to which a focal individual is exposed. The magnitude of the effect is
weakened by the level of dissension among previous posters; however, it is intensified by the total
number of submitted reviews. Our results also reveal that consumers tend to be less active in sharing their
opinions as they perceive the review system to be more credible over time. This finding raises a potential
free-riding issue of which platform owners should be aware.
To highlight the significance of the disconfirmation effect identified in this paper, we perform a
series of simulations to better understand consumer rating behavior and the evolution of online product
ratings. First, we recover the missing ratings for purchase occasions that do not have a follow-up review
submission from the buyer. Our simulation results show that, at the population level, disappointed
consumers (i.e. those who encounter negative disconfirmation) are more vocal. The second simulation
presents various evolutionary patterns of online product ratings, all of which indicate that the average of
posted ratings will converge to the true quality in the long run. While other researchers have explained
why average product ratings decline over time (Li and Hitt 2008; Moe and Schweidel 2012), we believe
that the disconfirmation effect provides an alternative explanation since it can also explain other common
patterns observed in the real life. Our third simulation shows that fake, inflated reviews may strongly
induce low ratings from customers who suffer from negative disconfirmation.
The rest of this paper is organized as follows. In Section 2, we briefly review the relevant literature
and differentiate our work from others. Section 3 describes the data and provides model-free evidence
regarding consumer rating behavior. Our empirical model is specified in detail in Section 4. Section 5
presents our estimation strategy, identification strategy, model fit and model comparison. In Section 6, we
present our estimation results and discuss the associated insights. We also provide robustness checks and
conduct a series of simulations to highlight the significance of our main findings. Concluding remarks
and future research directions are provided in Section 7.
2. Literature Review
This work is related to the stream of literature studying why consumers engage in post-purchase word of
mouth (WOM). Using survey data, researchers investigate this question from a motivational point of view.
In an offline setting, consumers’ desire for altruism, product involvement and self-enhancement are main
factors leading to positive WOM, whereas consumers spread negative WOM usually for altruism, anxiety
reduction, and vengeance purposes (Sundaram et al. 1998). With a similar approach, Hennig-Thurau et al.
(2004) conclude that social benefits, economic incentives, concern for others, and extraversion are the
primary motivations for consumers to share their product experiences on the Internet.
An extensive literature using quantitative methods has also been developed to identify what drives
consumers to share their product or service experiences in the absence of monetary reward mechanisms.
Dellarocas et al. (2004) examine the rating behavior on an electronic trading platform. They find that such
voluntary behavior is driven by the expectation of reciprocal behavior, meaning that a trader evaluates her
trading partner in order to solicit feedback from the other party. Shen et al. (2013) look at review posting
behavior from a strategic perspective. Using the book review data from online book sellers, they argue
that online reviewers are more prone to rate popular but less crowded books in order to gain attention and
to reduce competition for attention at the same time. They also conclude that reviewers with high
reputation costs are more likely to adopt an imitation strategy by posting ratings conforming to the
consensus in a community. Factors affecting the level of WOM have also been studied at the population
level. Rather than taking the common conception of the level of WOM activity as a monotonic function of
customer satisfaction, Anderson (1998) discovers that the relationship between them exhibits a U-shaped
pattern: customers are more likely to engage in WOM when they are either extremely satisfied or
extremely dissatisfied with the products. Using the data from a movie website, Dellarocas and Narayan
(2006) also identify a similar association between observed rating density and perceived movie quality.
Along this line, Dellarocas et al. (2010) further suggest that moviegoers are more prone to post reviews
for the most and least popular movies, as measured by box office revenues.
There is an emerging literature stream examining how existing ratings affect subsequent ones. In a
lab setting, Schlosser (2005) identifies the “self-presentational” phenomenon in which a review poster
strategically adjusts her product ratings downwards after observing negative opinions by others, in order
to present herself as intelligent. She also finds that a consumer would make her opinions more balanced if
the opinions from the crowd exhibit a high level of dissension. Li and Hitt (2008) argue that the predominant
declining trend of book ratings can be attributed to consumers’ “self-selection” bias, meaning that early
buyers have higher perceived quality and therefore tend to give higher ratings than do later buyers. Using
reviews posted on an online retailer of bath, fragrance and home products, Moe and Schweidel (2012)
show that consumers’ rating behavior is influenced by the rating environments to which they are exposed. In
particular, they find that a consumer is more inclined to share her experience when the existing ratings
have high valence and high volume. In addition, active posters are found to be more negative and exhibit
differentiation behavior, which is consistent with the self-presentational strategy suggested by Schlosser
(2005). Lee et al. (2014) distinguish prior ratings by friends from those by strangers and
investigate whether these two different sets of ratings have distinctive impact on a focal individual’s
opinions. They find that friends’ opinions always induce herding and the presence of social networking
dampens the impact of opinions from the crowd.
Research studying consumer rating behavior naturally leads to another stream of literature which
examines whether customer reviews can represent true product quality. This particular literature stream
can be further classified into two categories, depending on whether publicly available reviews are entirely
generated by consumers or partially manipulated by firms. Following the notion that online ratings are
entirely truthful, Hu et al. (2006) discuss whether the mean of posted ratings can represent true product
quality. In particular, they develop an analytical model assuming that a consumer would post a review
only when the level of her satisfaction is either above or below a “brag-and-moan” interval; she would
otherwise be silent. Based on this assumption, they show that the average rating can serve as an unbiased
signal if and only if two bounds of the interval are equal or symmetrically deviate from the true quality.
On the other hand, Dellarocas (2006) assumes that the observed ratings may not be fully trustworthy and
could be strategically boosted by firms. He demonstrates that inflated reviews are still informative since
the firm producing high-quality products benefits the most through such manipulation.
In terms of research context and methodology, our work is most relevant to Moe and Schweidel
(2012) (henceforth MS). We develop our model based on MS which, in turn, is a generalization of Ying
et al. (2006), who first propose that whether a product is rated should affect an analyst’s prediction of that
rating. Despite several similarities, this paper differs from MS in many aspects. First, our focus is to study
how quality disconfirmation and perceived system credibility affect the individual rating decisions, while
that of MS is to examine whether online raters would adjust their opinions according to expressed
opinions by others. Second, we extend a dynamic learning framework, flexibly allowing an individual’s
perception of system credibility to change over time and dynamically linking this perception to her rating
decisions. On the contrary, MS use a static model in which posting decisions are treated as independent and
uncorrelated with each other. Finally, we are able to directly observe from our data set whether a
consumer leaves a review for a product she has previously purchased, whereas the purchase data is missing
in MS. Our lengthy panel also allows us to construct a richer model to investigate why consumer rating
behavior changes over time.
This paper attempts to close the gap in online review literature by studying the relationship between
posted reviews and subsequent ones from a different perspective. Specifically, we postulate that the
impact of existing ratings may occur when an individual accesses review information for purchasing
decisions (in the pre-purchase stage) even before she faces the rating decisions (in the post-purchase
stage). We also consider that individuals may not perceive the review system to be fully trustworthy, by
modeling how the individual perception of system credibility evolves over time and how such perception
affects one’s rating decisions in a dynamic fashion. While credibility has been examined in different
applications such as how it determines the persuasiveness of communication (Chaiken 1980), the way in
which credibility impacts consumers’ interaction with information systems (as review readers and posters)
still remains unstudied.
3. Data and Model Free Evidence
The data for this study is provided by an e-commerce website that sells a variety of electronics
and household items. The data set contains complete purchase history and review entries made by 1,000
heavy users, each of whom purchases at least 10 products.2 The purchase record set consists of detailed
order information such as the product name, price, handling time and order placement date. The rating
data set records customer-reported reviews in a typical format, including a review title, a review body,
submission date, and an overall product rating on a discrete 5-star scale, with 5 being the best. The data
spans from January 2007 to November 2011. During this period of time, the site had not gone through
any policy or system design change that could influence consumers’ rating behavior, nor did it implement
any monetary or reviewer-reputation mechanisms that could encourage posting. Contribution of
product reviews is thus considered voluntary and self-driven.
Our data set is unique in two aspects. First, the complete logs of purchasing and rating activities
inform us whether a consumer leaves a review for the product she has purchased earlier on. With this
information, we can precisely discern posting behavior from lurking for each purchase occasion. Second,
by leveraging the time stamp of all purchasing and rating activities, we are able to recover the rating
information (characterized by the valence, volume and variance of posted ratings) available at any given
time. It is also worth noting that our data was collected at the micro (individual) level. This
distinguishes our work from others that commonly use review data at the aggregate (product) level.
To identify the experienced quality and consumer learning dynamics, we restrict our attention to
consumers who post at least one review, resulting in the final data set comprising 361 panelists who made
2 We study heavy users mainly because they are overwhelmingly more impactful to the e-commerce website in
terms of both revenue generation and review contribution.
37,209 purchase transactions and 2,257 review entries.3 We also collected the mean of all posted ratings
for each product in December 2013. We will utilize this long-run mean rating to proxy the baseline
product quality.4 Before introducing our empirical model, we briefly present some observations and
address potential data issues. Table 1 reports the descriptive statistics of variables from our final data set.
At the population level, the review posting rate is nearly 6% and the mean of all observed ratings is 3.83.
Some variables, such as price and handling time, exhibit a long-tail property (Figure 1a). To deal with the
over-dispersion issue, we apply a logarithmic transformation to them. The log-transformed variables appear
to be normally distributed (Figure 1b).
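The effect of this transformation can be sketched numerically. The snippet below applies the same log(x + 1) mapping to a synthetic long-tailed sample; the distributional parameters are invented for illustration and are not the paper's actual price data:

```python
import numpy as np

def sample_skewness(x):
    """Third standardized moment of a sample."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

rng = np.random.default_rng(0)
# Synthetic "prices" with a long right tail, mimicking Figure 1(a).
prices = rng.lognormal(mean=3.0, sigma=1.2, size=10_000)

# log(price + 1) compresses the tail; the +1 keeps zero-valued observations
# (e.g., same-day handling times, min = 0 in Table 1) finite.
log_prices = np.log(prices + 1)

# Skewness drops from strongly positive to near zero after the transform.
raw_skew, log_skew = sample_skewness(prices), sample_skewness(log_prices)
```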
Table 1. Descriptive Statistics of Variables

Notation   Description                                                     Mean     St. Dev.   Min     Max
y_ijt      Dummy indicating if a purchase occasion leads to a review entry 0.061    0.239      0       1
z_ijt      Observed rating scores                                          3.826    1.200      1       5
p_ijt      Product prices                                                  50.208   200.170    0.440   13674
h_ijt      Order handling time (days)                                      2.328    3.674      0       142
R̄_ijt      Valence of existing ratings                                     4.015    0.621      1       5
Vol_ijt    Volume of existing ratings                                      58.357   186.776    0       3587
Var_ijt    Variance of existing ratings                                    0.817    0.652      0       4
Figure 1. Over-dispersion vs. Logarithm Transformation: (a) distribution of price; (b) distribution of log(price + 1)
3 The 361 selected individuals (36.1% of the entire sample size) account for 36.5% of all revenue generated and
33.4% of all items purchased by the sampled population. Given these statistics, we believe the selected individuals are representative of the sampled consumers with respect to purchase habits.
4 We test the validity of this proxy in one of robustness checks.
The consumer rating behavior on the e-commerce website is shown in Figure 2 to Figure 6. Figure 2
plots the frequency of 5 discrete ratings. The distribution roughly follows a J shape, which is commonly
found across various rating platforms (McGlohon et al. 2010). Such a unique shape is consistent with
Anderson (1998) who argues that consumers are more prone to express opinions about products when
they are either very satisfied (represented by a high score of 4 or 5) or very dissatisfied (represented by a
low score of 1 or 2). To better understand the association between the rating left by a focal consumer (a
focal rating) and the mean of previously posted ratings by others, we calculate the observed “rating
discrepancy” ($z_{ijt} - \bar{R}_{ijt}$) for all rated occasions ($y_{ijt} = 1$). Figure 3 shows that most focal ratings do not
deviate too much from the overall tone established by the crowd. But, based on this distribution, can we
conclude that online raters are more vocal when they concur with the consensus from their peers? This is
the first question we attempt to answer in this paper.
Figure 2. Distribution of Posted Ratings
Figure 3. Distribution of Rating Discrepancy
As discussed earlier, during the time span of our data, the e-commerce website did not implement any
marketing campaign or system design change that could possibly alter consumer engagement in online WOM
(eWOM). We should therefore expect the rating behavior to be consistent over the time horizon. Indeed, Figure 4(a)
shows that the monthly posting rate fluctuated in the early stage and stabilized later on.5 From Figure 4(b),
we also see that the monthly average rating discrepancy did not systematically move upward or
downward either. Interestingly, if we plot the average posting rate evaluated at the level of individual
5 We present individuals’ rating behavior since January 2008 because the number of samples is limited prior to this
point of time. Also note that we exclude data points for 5 months in which no product review was posted.
[Figure 2 bar chart: frequency (%) of ratings 1–5: 7.2%, 6.4%, 19.9%, 29.6%, 36.9%]
purchase occasion,6 it is evident that consumer participation in leaving reviews exhibits a declining trend
(Figure 5). In other words, consumers become less active in expressing their own opinions as they
engage in more purchase occasions on the site. To find out potential explanations for this downward
pattern, we also look at other posted outcomes evaluated at the same (by-occasion) level. Despite some
random disturbance, however, the average rating discrepancy across individuals did not exhibit any
noticeable trend (Figure 6a), nor did the average posted ratings (Figure 6b). In this paper, we also
intend to explore factors that potentially explain the decline in review incidence observed from the data.
Figure 4. Monthly Review Posting Rate and Average Rating Discrepancy
Figure 5. Review Posting Rate by Purchase Occasion
Figure 6. Individuals' Rating Behavior by Purchase Occasion
6 We present the posting rate for the first 50 occasions since data points after the 50th occasion reveal a similar
pattern.
4. Model
We develop a dynamic model to study the evolution of consumer rating behavior in an online setting. The
general modeling context is that, at a given time, consumers hold the idiosyncratic (prior) perception of
the credibility of the review system. When a purchase occasion begins, the consumer first formulates a
pre-purchase expectation about the product based on the rating signal she observes and the perceived
system credibility she holds at that time. Upon product consumption she obtains the post-purchase
evaluation. Using one attitude to confirm/disconfirm with the other, she encounters a certain degree of
disconfirmation and then uses this private information to update her own (posterior) perception of the
system credibility. Driven by the disconfirmation and other factors, the consumer faces two rating
decisions, with the first decision being whether to leave a product review; if she decides to leave a review,
she also chooses what rating to submit. The proposed model is presented in the following order: 1)
formulation of the disconfirmation; 2) updating of perceived system credibility; 3) consumer decisions of
leaving product ratings; and 4) interdependence between two rating decisions.
4.1. Formulation of Quality Disconfirmation
Pre-purchase Evaluation. Consider an individual i who is about to purchase product j during occasion t.
In our research context, the e-commerce website displays the aggregate statistics of customer self-
reported ratings (e.g., arithmetic mean and distribution of discrete ratings) at the top of a product landing
page. We assume that the prospective buyer would (at least) access the aggregate rating information
available to her before committing the purchase. While posted ratings seem to provide objective
information about the product quality, how credible the review system is perceived could be subjective
and heterogeneous across individuals. As a result, we assume individual i’s pre-purchase expectation,
$\hat{Q}_{ijt}$, to follow a normal distribution:

$\hat{Q}_{ijt} \sim N\left(\bar{R}_{jt} + \delta_{i,t-1},\ (\tau_{jt}\tau_{i,t-1})^{-1}\right)$,  (1)
where $\bar{R}_{jt}$ and $\tau_{jt}$ respectively denote the mean and the precision of all ratings for j posted prior to the
time of purchase.7, 8 These two components together characterize the rating signal available to individual i.
We model that the individual perception of system credibility is composed of two components: system
biasedness ($\delta$) and system precision ($\tau$). In our model, the updating of the perceived system credibility
occurs when the disconfirmation is realized after product consumption. It should be clear that one’s ex-ante
expectation, formulated before the purchase, is influenced by the perception she updated during the previous
occasion (hence the t-1 subscript). The system biasedness $\delta_i$ measures how biased the review
system is perceived to be by individual i. When $\delta_i = 0$, individual i perceives the system to be neutral, and
therefore believes that $\bar{R}_{jt}$ provides an unbiased signal about the product quality. In this case, individual i
formulates her expectation centered on $\bar{R}_{jt}$. When $\delta_i > 0$ ($\delta_i < 0$), she believes that $\bar{R}_{jt}$ underrates
(overrates) the product quality, and therefore she would adjust the mean of her expectation upwards
(downwards) to offset the system biasedness she perceives. The sign of $\delta_i$ can be thought of as the direction
in which the mean of individual i’s expectation deviates from the mean of posted ratings. The system
precision $\tau_i$ measures how precise the system is perceived to be by i at a given occasion. When it is large
(small), individual i would perceive the review system to be precise (noisy), such that her expectation
would be tightly (loosely) centered on its mean.
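As a minimal numerical sketch of Eq. (1), the snippet below forms one consumer's pre-purchase expectation; every parameter value here is hypothetical rather than an estimate from our data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs for one purchase occasion.
R_bar_jt = 4.0     # mean of ratings posted for product j before purchase
tau_jt = 2.0       # precision of those posted ratings
delta_prev = -0.3  # perceived biasedness: ratings believed to overrate quality
tau_prev = 1.5     # perceived system precision carried over from occasion t-1

# Eq. (1): the expectation is centered on R_bar + delta, with
# precision tau_jt * tau_prev (so the variance is its reciprocal).
exp_mean = R_bar_jt + delta_prev
exp_sd = (tau_jt * tau_prev) ** -0.5
Q_hat = rng.normal(exp_mean, exp_sd)  # one draw of the expectation
```

With a negative biasedness, this consumer discounts the posted mean of 4.0 down to 3.7; a larger perceived system precision would tighten the draw around that center.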
Post-purchase Evaluation. One of the most challenging parts of our modeling task is to model the
baseline quality of products. One approach is to assume that the latent quality of all products follows a
zero-mean random effect (Moe and Schweidel 2012). To best utilize the publicly available information, we use
7 According to a survey conducted by Lightspeed Research (Leggatt 2011), 72% of online shoppers expect to find
customer reviews available on the website they are shopping at, while 47% seek them out on company websites and 43% on price comparison sites. Therefore, we assume that one’s expectation is primarily influenced by the mean ratings observed on the same site. The precision characterizes the level of disagreement among others. We use precision instead of variance because the former gives us a neater expression of belief updating.
8 Consumers’ ex-ante expectation is also subject to textual product reviews (Ghose et al. 2012). However, we cannot observe how many textual reviews each prospective buyer reads before making a purchase decision. Therefore, we use the mean of posted ratings to measure the overall opinions shared by others.
the long-term average rating, $\bar{R}_j$, as a proxy for the baseline product quality.9 We assume the post-
purchase evaluation individual i acquires upon the consumption of j to be:

$Q_{ij} = \bar{R}_j + \lambda_{i0}$,  (2)

where the parameter $\lambda_{i0}$ allows for variation in product evaluation across individuals.
Disconfirmation. Having developed the expectation and evaluation of the product, we are now able
to formally define the disconfirmation. We model disconfirmation as how far individual i’s ex-post
evaluation deviates from her ex-ante expectation of the same product:

$\Delta Q_{ijt} = Q_{ij} - \hat{Q}_{ijt}$.  (3)

Plugging (1) and (2) into (3), we have:

$\Delta Q_{ijt} \sim N\left(\Delta\bar{Q}_{ijt},\ (\tau_{jt}\tau_{i,t-1})^{-1}\right)$,  (4)

where $\Delta\bar{Q}_{ijt} = (\bar{R}_j + \lambda_{i0}) - (\bar{R}_{jt} + \delta_{i,t-1})$.  (5)

A positive $\Delta Q_{ijt}$ indicates that the experienced product quality is higher than the expectation, and we say
that the consumer encounters a positive disconfirmation.
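A small worked example of Eqs. (2)-(5), with every number invented for illustration only:

```python
# Hypothetical values for one purchase occasion.
R_bar_j = 3.6      # long-term average rating: baseline quality proxy, Eq. (2)
lam_i0 = 0.0       # individual-specific evaluation shift
R_bar_jt = 4.0     # mean of ratings posted before this purchase
delta_prev = -0.3  # perceived system biasedness

Q_post = R_bar_j + lam_i0         # Eq. (2): post-purchase evaluation
exp_mean = R_bar_jt + delta_prev  # mean of the pre-purchase expectation
dQ_mean = Q_post - exp_mean       # Eq. (5): expected disconfirmation

# dQ_mean is about -0.1: on average this consumer's experienced quality
# falls slightly short of her expectation (a negative disconfirmation).
```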
4.2. Updating of the Perception of System Credibility
Next, we explain how the individual perception of system credibility evolves over time. We assume that,
before experiencing the product, individual i holds prior beliefs about the system biasedness $\delta_i$ (conditional
on $\tau_i$) and the system precision $\tau_i$, which jointly follow a normal-gamma distribution10:

$\delta_i \mid \tau_i \sim N\left(\delta_{i,t-1},\ (\gamma_{i,t-1}\tau_i)^{-1}\right)$,  (6)

$\tau_i \sim \Gamma\left(a_{i,t-1},\ b_{i,t-1}\right)$,  (7)
9 The use of long-term average ratings as proxies for product baseline quality can be justified in two aspects. From
the perspective of the volume of ratings, the mean and minimal values of the total number of posted ratings at product level are 67 and 11, respectively. From the perspective of time horizon, the time we collected our long-term rating data set (December 2013) is 2 years after the latest purchase occasion observed in our final data set (November 2011). It is reasonable to argue that these long-term average ratings are representative of opinions from a sufficiently large consumer base. We also perform a robustness check in Section 7.1 and the results provide supportive evidence that the long-term average ratings well proxy the true product quality.
10 In the context of Bayesian learning, disconfirmation defined here follows a normal model with unknown mean and unknown variance. The conjugate prior for this model is a normal-gamma joint distribution.
where $\tau_i$ is the precision parameter of the normal distribution, $\gamma_{i,t-1}$ measures the strength of the prior
belief in system biasedness, and $a_{i,t-1}$ and $b_{i,t-1}$ are the shape and inverse scale parameters of the gamma
distribution. After experiencing the product, the individual encounters a certain level of disconfirmation
and uses this private information to update her beliefs in system credibility. As a result, the perceived
system credibility will change over time as the consumer is involved in more purchase-consumption
occasions. According to Bayes’ rule, individual i’s posterior beliefs after receiving one disconfirmation
signal are given by (DeGroot 1970):
$\delta_i \mid \tau_i \sim N\left(\delta_{it},\ (\gamma_{it}\tau_i)^{-1}\right)$,  (8)

$\tau_i \sim \Gamma\left(a_{it},\ b_{it}\right)$,  (9)
where

δit = ( γi,t-1 δi,t-1 + Tit Djt Q̃ijt ) / ( γi,t-1 + Tit Djt ),  (10)
γit = γi,t-1 + Tit Djt,  (11)
ait = ai,t-1 + Djt / 2,  (12)
bit = bi,t-1 + Djt Tit γi,t-1 ( Q̃ijt - δi,t-1 )² / [ 2( Tit + γi,t-1 ) ],  (13)

and Djt is a dummy variable indicating whether there is at least one rating posted for j at the time the t-th purchase is placed.
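To make the updating mechanism in (10)-(13) concrete, the following minimal Python sketch implements one updating step. The function name and the scalar representation of the belief state are our own illustration, not the estimation code used in the paper; the numeric inputs in the usage line are hypothetical except for the initial values δ0 = 0 and γ0 = 0.1 noted below.

```python
def update_beliefs(delta, gamma, a, b, q_tilde, T, D):
    """One normal-gamma updating step for beliefs about system credibility.

    delta, gamma: prior mean of the biasedness belief and its precision weight.
    a, b: shape and inverse scale of the gamma belief on system precision.
    q_tilde: realized disconfirmation signal; T: precision of that signal.
    D: dummy, 1 if at least one rating was posted at purchase time, else 0.
    """
    if D == 0:
        # No rating signal observed: beliefs remain unchanged.
        return delta, gamma, a, b
    delta_new = (gamma * delta + T * q_tilde) / (gamma + T)              # eq. (10)
    gamma_new = gamma + T                                                # eq. (11)
    a_new = a + 0.5                                                      # eq. (12)
    b_new = b + T * gamma * (q_tilde - delta) ** 2 / (2 * (T + gamma))   # eq. (13)
    return delta_new, gamma_new, a_new, b_new

# Illustration: start from delta_0 = 0, gamma_0 = 0.1 (as assumed in the text)
# with hypothetical a_0, b_0, and process one disconfirmation signal.
state = (0.0, 0.1, 5.0, 8.9)
state = update_beliefs(*state, q_tilde=1.0, T=1.0, D=1)
```

A large signal precision T pulls the posterior mean toward the realized disconfirmation, mirroring the weight Tit / (Tit + γi,t-1) discussed below.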
It is important at this point to explain how the perception of system credibility is updated. Consider
a scenario where individual i has prior beliefs of δi,t-1 | τi,t-1 and τi,t-1. Suppose that she observes the rating
signal for j and purchases j during purchase occasion t. Upon product experience, she receives one signal
of disconfirmation, Q̃ijt, and uses this private information to update her own beliefs. If there is no
product rating available at the time of purchase (i.e., Djt = 0), no belief updating occurs and the
beliefs remain unchanged. If the rating signal is available (i.e., Djt = 1), she jointly updates her
beliefs in system biasedness and precision. According to (10), the posterior mean δit is the sum of the
prior mean and the realized disconfirmation weighted by a fraction Tit / (Tit + γi,t-1). The signal precision Tit
and the prior precision γi,t-1 can be interpreted as the strength of the disconfirmation signal she encounters
and the strength of the prior belief in system biasedness she holds, respectively. When Tit is large,
individual i perceives the posted ratings to be more precise and therefore updates δit by a relatively large
amount, ceteris paribus. Similarly, the extent of updating of the scale parameter bit is increasing in the
magnitude of the realized disconfirmation and the precision of the rating signal. When Tit is small,
individual i anticipates the review signal to be noisy with a higher probability and therefore updates bit by
a relatively small amount, ceteris paribus.
What remains unspecified is the initial beliefs of each individual. The updating rule of the shape
parameter indicates that the value of ai0 measures the richness of individual i's initial learning experience.
Since we observe the complete purchase history for all individuals in our data set, we fix ai0 at a small
number11 (provided ai0 > 1) because consumers have a very limited amount of learning experience with
respect to system credibility until they receive the first disconfirmation signal. To allow for heterogeneity
across individuals in their initial belief of system precision, we assume bi0 ~ N( b̄0, σb² ), where b̄0 and
σb² measure the mean effect and dispersion of bi0 across individuals, respectively. We further assume
δi0 = 0. This is reasonable because, prior to the receipt of any disconfirmation signal, consumers may
naturally perceive the review system to be unbiased, owing to their lack of purchase-consumption experience.
As they receive more disconfirmation signals, the perception of system biasedness is updated
depending on whether the perceived rating information inflates or understates one's own product
assessment. Finally, we fix γi0 = 0.1 to reflect consumers having an uninformative initial belief of δi0. The
belief updating mechanism specified in our model is used to measure consumers' subjective perception of
system credibility. In terms of interpretation of the learning process, it differs somewhat from the
traditional learning framework in which the unobserved knowledge (system biasedness and precision in
our context) is fixed at a certain level and consumers learn more about it over time.
11 The choice of ai0 is subjective. We estimate the proposed model with ai0 fixed at different values (2.5, 5, 10, and 20) and observe no noticeable changes in the estimated parameters. Given this result, we report the estimation results with ai0 = 5, which gives the best model fit.
4.3. Consumer Decisions on Whether to Post and What to Rate
So far, we have presented a general model of how disconfirmation is derived, how the individual
perception of system credibility is updated, and how these two constructs are linked to each other. In this
section, we discuss how we model consumers’ rating decisions.
Propensity Model. We posit that whether a consumer will post a review for the product she has
previously purchased is mainly affected by three components: the post-purchase evaluation she has,
the disconfirmation she encounters, and the updated belief in the system precision she holds. Given that
beliefs are modeled in the form of distributions, we use the mean of the distribution to measure the
individual perception. Since the belief in system precision follows a gamma distribution, individual i's
perception of the updated system precision, Preit, is given by the inverse of the expected noise variance:

Preit = [ E( σit² ) ]^(-1) = ( ait - 1 ) / bit .  (14)
According to the belief updating rule provided by (12) and (13), the latent variable Preit will become
larger if the magnitude of disconfirmation is relatively small, meaning that the individual perceives the
review system to be more precise. To investigate how the individual perception dynamically affects the
online rating behavior, we incorporate the latent construct Preit in the raters’ decision-making process.
We model a rater's decision of whether to leave a product review as governed by a latent posting
propensity. Specifically, individual i's posting propensity for product j associated with occasion t, Prop*ijt,
is expressed as:

Prop*ijt = Propijt + εp,ijt
         = βi0 + β1 Q̃ijt + β2 Q̃²ijt + β3 Preit + β4 Qijt + β5 Q²ijt + β6:9 Xjt + εp,ijt ,  (15)
where βi0 allows for heterogeneity in baseline propensity across individuals and can be interpreted as the
individual-specific net utility derived from writing an online review. The covariate Q̃ijt is the mean of
disconfirmation given in (5), Preit is the time-variant precision construct specified in (14), and Qijt is the
post-purchase evaluation given in (2). We also add the quadratic terms Q̃²ijt and Q²ijt to capture possible
nonlinear relationships associated with Propijt. To control for other effects that could potentially impact
individual posting decisions, we include four variables in Xjt: product price; shipping and handling time,
which serves as a proxy for the service level received from the e-commerce site; and the volume and the
statistical variance of posted ratings.12 Finally, εp,ijt is an idiosyncratic error with a mean of 0. Since the
outcome of the posting decision is binary (posting or lurking), we assume εp,ijt ~ N(0, 1) such that the
resulting propensity model has a binary probit specification:

Pr( yijt = 1 ) = Pr( Prop*ijt ≥ 0 ) = Φ( Propijt ),  (16)
where yijt is an occasion-specific dummy indicating whether a purchase leads to a review entry.
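As an illustration of the probit specification in (15)-(16), the sketch below computes the posting probability from the deterministic part of the propensity. The function and all coefficient values in the usage line are hypothetical; only the structure of the linear index follows (15).

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posting_probability(q_tilde, pre, q, x, beta0, beta):
    """Pr(y_ijt = 1) = Phi(Prop_ijt), with Prop_ijt built per eq. (15).

    beta = (b1, ..., b5, b6, ..., b9): disconfirmation (linear/quadratic),
    perceived precision, experienced quality (linear/quadratic), then the
    coefficients on the four controls in x (the X_jt variables).
    """
    b1, b2, b3, b4, b5, *bx = beta
    prop = (beta0
            + b1 * q_tilde + b2 * q_tilde ** 2        # disconfirmation terms
            + b3 * pre                                # perceived system precision
            + b4 * q + b5 * q ** 2                    # experienced quality terms
            + sum(c * v for c, v in zip(bx, x)))      # controls beta_{6:9} X_jt
    return norm_cdf(prop)

# Hypothetical illustration: mean-centered covariates at zero, so the
# probability is driven by the baseline propensity alone.
p = posting_probability(0.0, 0.0, 0.0, (0.0, 0.0, 0.0, 0.0), -0.8,
                        (-0.011, 0.025, -0.508, -0.268, 0.065,
                         0.271, -0.041, -0.009, 0.009))
```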
The parameters of main interest are β1, β2, and β3. The estimates of β1 and β2 together will inform
us of the effect of disconfirmation on consumer participation in posting online reviews. The estimated
parameter β3 will indicate whether the perceived system precision encourages (if β3 > 0) or deters (if
β3 < 0) a consumer from expressing her own opinions about the product. A significant β3 would
suggest that the declining trend of the posting rate observed in our data (see Figure 5) may be partially
explained by the change in perceived system precision over time. Moreover, it has been shown that online
opinions are subject to a polar effect: consumers with extreme opinions tend to be more vocal (Anderson
1998; Dellarocas and Narayan 2006). We will be able to confirm the existence of this effect if the signs of
β4 and β5 are negative and positive, respectively.
Evaluation Model. Now, suppose that individual i has decided to leave a product rating for j. We
assume that i's decision of what score to give is governed by a latent rating evaluation, which is mainly
driven by her product evaluation:

Eval*ijt = Evalijt + εe,ijt = λi0 Qijt + λ1:4 Xjt + εe,ijt ,  (17)
where εe,ijt is a zero-mean random shock. While consumers are commonly advised to provide product-
related feedback only, we often observe cases in which customers complain about the poor service they
receive from the merchant in product reviews as well. The parameter λ2 will validate whether this
12 We observe from our data that a typical consumer either posts a product rating within two weeks after the order has been placed or does not express her opinion at all. The rating environment (such as the volume, valence, and variance of ratings) does not change noticeably during such a short period of time. Therefore, we use the review statistics observed at occasion t to characterize the rating environment to which individual i is exposed.
anecdotal observation exists among consumers. Since the rating evaluation is continuous whereas the
submitted ratings are discrete (1 to 5 in our context), we assume the relationship between them to be:

zijt = 5, if κ4 < Eval*ijt ≤ κ5,
zijt = 4, if κ3 < Eval*ijt ≤ κ4,
zijt = 3, if κ2 < Eval*ijt ≤ κ3,
zijt = 2, if κ1 < Eval*ijt ≤ κ2,
zijt = 1, if κ0 < Eval*ijt ≤ κ1,  (18)
where zijt denotes the submitted rating score and the κ's specify the evaluation-ratings translating cutpoints.
For identification purposes we set κ0 = -∞, κ1 = 0, and κ5 = ∞ (Koop et al. 2007), leaving three cutpoints
κ2, κ3, and κ4 to be estimated.
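The translation in (18) can be sketched in a few lines. The default cutpoints below use κ1 = 0 together with the values later estimated in Table 4, purely for illustration; the function name is our own.

```python
def rating_from_evaluation(eval_star, cutpoints=(0.0, 0.486, 1.406, 2.459)):
    """Map a continuous latent evaluation to a discrete 1-5 rating, eq. (18).

    cutpoints = (kappa1, kappa2, kappa3, kappa4); kappa0 = -inf and
    kappa5 = +inf are implicit, so every evaluation maps to some score.
    """
    score = 1
    for kappa in cutpoints:
        if eval_star > kappa:
            score += 1          # the evaluation clears one more cutpoint
    return score
```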
4.4. Interdependence between Two Rating Decisions
So far we have developed two separate models governing individual decisions of whether to post and
what to rate. However, the covariance matrix of the equation system has not yet been clearly specified.
Since the post-purchase evaluation, Qijt, enters both equations simultaneously, parameter estimates could
be biased if the interdependence between the two decisions is not properly specified. As a result, we assume
the two sets of error terms follow a bivariate normal distribution:

( εp , εe )′ ~ BVN( ( 0, 0 )′ , [ 1  ρ ; ρ  1 ] ).  (19)
For identification purposes, we fix the standard deviations of εp and εe at 1 in order to obtain the binary probit
and ordered probit specifications, respectively. The parameter ρ is the correlation coefficient to be
estimated.13 Given this covariance structure and the translating cutpoints defined in (18), the probability
of observing the joint event yijt = 1 and zijt = s is given by:
Pr( yijt = 1, zijt = s ) =
  Φ( Propijt ) - Φ2( κ4 - Evalijt , Propijt ; -ρ ),                            if s = 5,
  Φ2( κ4 - Evalijt , Propijt ; -ρ ) - Φ2( κ3 - Evalijt , Propijt ; -ρ ),       if s = 4,
  Φ2( κ3 - Evalijt , Propijt ; -ρ ) - Φ2( κ2 - Evalijt , Propijt ; -ρ ),       if s = 3,
  Φ2( κ2 - Evalijt , Propijt ; -ρ ) - Φ2( κ1 - Evalijt , Propijt ; -ρ ),       if s = 2,
  Φ2( κ1 - Evalijt , Propijt ; -ρ ),                                           if s = 1,  (20)
13 Alternatively, one can compute the inverse Mills ratio (IMR) from the propensity model and plug ρ·IMR into (17).
where Φ2 denotes the standard bivariate normal cumulative distribution function (CDF). Our proposed
model thus relaxes the independence assumption between the two sets of observed outcomes (y's and z's).
The probability of observing yijt = 0 (i.e., lurking) can be expressed as:14

Pr( yijt = 0 ) = 1 - Φ( Propijt ).  (21)
Based on (20) and (21), the joint likelihood of observing individual i, who makes m purchase occasions
and posts n product ratings, is given by

Li( yi , zi ) = ∏{t: yijt = 0} Pr( yijt = 0 ) × ∏{t: yijt = 1} Pr( yijt = 1, zijt = s ),  (22)

where the first product runs over the (m - n) lurking occasions and the second over the n posting occasions.
The likelihood of observing the entire decision set made by N individuals is L = ∏i=1..N Li( yi , zi ).
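The joint probabilities in (20)-(21) can be sketched numerically. The bivariate normal CDF below is evaluated by simple quadrature; the function names, the quadrature scheme, and the sign convention for ρ (flipped because y = 1 truncates εp from below) are our own illustration, not the paper's estimation code.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(h, k, rho, n=2000):
    """Phi_2(h, k; rho) by trapezoidal quadrature over the first argument."""
    lo_bound = -8.0
    if h <= lo_bound:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    step = (h - lo_bound) / n
    total = 0.0
    for i in range(n + 1):
        x = lo_bound + i * step
        w = 0.5 if i in (0, n) else 1.0
        total += w * norm_pdf(x) * norm_cdf((k - rho * x) / s)
    return total * step

def joint_prob(prop, eval_mean, s, rho, kappas=(0.0, 0.486, 1.406, 2.459)):
    """Pr(y = 1, z = s) per eq. (20); kappas = (k1..k4), k0 = -inf, k5 = +inf."""
    uppers = list(kappas) + [math.inf]
    lowers = [-math.inf] + list(kappas)
    hi = uppers[s - 1] - eval_mean
    lo = lowers[s - 1] - eval_mean
    top = norm_cdf(prop) if math.isinf(hi) else bvn_cdf(hi, prop, -rho)
    bot = 0.0 if math.isinf(lo) else bvn_cdf(lo, prop, -rho)
    return top - bot
```

Summing joint_prob over s = 1, ..., 5 recovers Φ(Prop), the marginal posting probability, which is a quick consistency check between (20) and (21).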
5. Estimation
For parameters that are subject to certain constraints, we apply the following transformation strategy. First,
since the correlation coefficient ρ takes values in [-1, 1] only, we estimate the inverse arctangent
transformation of it, which maps the support [-1, 1] to the real line (Ying et al. 2006). Second, we estimate
log( b̄0 ) instead of b̄0 because the inverse scale parameter of a gamma distribution takes positive values
only. Finally, to ensure that the three cutoffs obey the desired order (i.e., κ2 < κ3 < κ4), we
estimate the log of the difference between adjacent cutoffs. For variables with highly dispersed distributions,
such as product price pijt and order handling time hijt, we apply a logarithmic transformation. We also
mean-center all variables using the grand means to reduce the correlation between the estimated intercepts
and slopes and to avoid potential multicollinearity in the propensity model (especially for variables that
enter the propensity model in both linear and quadratic forms).15
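A minimal sketch of these reparameterizations is given below. The helper names are our own, and the tangent map is one way to realize the stated transformation of ρ onto the real line; it is an illustration under those assumptions rather than the paper's actual code.

```python
import math

def to_unconstrained(rho, b0, kappas):
    """Map constrained parameters to the real line for unrestricted sampling."""
    z_rho = math.tan(rho * math.pi / 2.0)                  # (-1, 1) -> R
    z_b0 = math.log(b0)                                    # (0, inf) -> R
    k2, k3, k4 = kappas                                    # require 0 < k2 < k3 < k4
    z_k = (math.log(k2), math.log(k3 - k2), math.log(k4 - k3))
    return z_rho, z_b0, z_k

def to_constrained(z_rho, z_b0, z_k):
    """Inverse map: recover the constrained parameters from real-line values."""
    rho = (2.0 / math.pi) * math.atan(z_rho)
    b0 = math.exp(z_b0)
    k2 = math.exp(z_k[0])
    k3 = k2 + math.exp(z_k[1])                             # enforces k2 < k3
    k4 = k3 + math.exp(z_k[2])                             # enforces k3 < k4
    return rho, b0, (k2, k3, k4)
```

The log-difference parameterization guarantees the ordering of the cutpoints for any real-valued draw, so the sampler never proposes an invalid configuration.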
To estimate the proposed model, we use a hierarchical Bayes approach, which is convenient for the
estimation of individual-specific parameters. Given the nature of the parameter hierarchy, the parameters in
our model can be divided into two groups (Netzer et al. 2008): (1) "random-effect" parameters that vary
across individuals (denoted by θi); and (2) "fixed" parameters that do not vary across individuals (denoted
by ψ). We allow the individual-specific parameters governing the propensity intercept, product evaluation,
and initial learning to be correlated by assuming:

θi ~ MVN( θ̄, Σ ),  (23)

where θ̄ denotes the mean effects that persist across individuals and Σ denotes the variance-covariance
matrix of θi.
14 The marginal distribution of one component of a bivariate normal vector is simply a normal density.
15 We calculate the variance inflation factor (VIF) for the covariates included in the propensity model and the results show no evidence of multicollinearity (VIF < 2).
As we do not have much prior knowledge about the model parameters, we use a diffuse multivariate
normal prior for the fixed parameters ψ. There are 19 elements in the vector ψ, including β, capturing the
coefficients of the 9 covariates in the propensity model; λ, measuring the effects of the 4 covariates in the
evaluation model; the 3 cutpoints (κ2, κ3, κ4), which map the continuous rating evaluation into discrete
posted scores; and θ̄, governing the mean effects of the 3 individual-specific parameters. Let ξi denote the
individual-specific deviation from θ̄. Following (23), we can directly estimate those deviations using
ξi ~ MVN( 0, Σ ).
We develop a Markov chain Monte Carlo (MCMC) procedure to recursively draw parameters from
the corresponding full conditional distributions using the following steps:

Step 1: draw each individual-specific deviation ξi, conditional on ψ, Σ, and the data (Y, Z);
Step 2: draw Σ, conditional on the ξi's;
Step 3: draw the propensity-model elements of ψ, conditional on the ξi's, the remaining elements of ψ, and (Y, Z);
Step 4: draw the evaluation-model elements of ψ, conditional on the ξi's, the remaining elements of ψ, and (Y, Z).  (24)
We adopt a random-walk Metropolis-Hastings algorithm for steps where the conditional posterior
distributions do not have a closed form (Steps 1, 3, and 4 in (24)). To improve the efficiency of the
sampler, we use a two-step strategy. In the first step, we run the MCMC for 50,000 iterations and discard
the first 25,000 draws as "burn-in" samples.16 We calculate posterior means and empirical variances based
on the remaining 25,000 draws. In the second step, we run a separate MCMC using the posterior means
calculated in the previous step as the initial values and the empirical variances as the diagonal elements of
the covariance matrix of the random-walk proposal distribution.17 We run the MCMC sampler for
100,000 iterations and record only every 10th draw to mitigate the autocorrelation that is an
inevitable consequence of MCMC simulation (Hoff 2009). The adaptive chain converges immediately
and explores the parameter space efficiently. To test convergence, we perform the Gelman-Rubin
diagnostic (Gelman and Rubin 1992) by running 3 parallel chains with different initial values and random
seeds. The potential scale reduction factors (PSRF) are roughly 1.02 for all parameters, suggesting that
the chains have converged to the target distributions.18 We combine the draws from the three parallel chains,
resulting in an effective sample size of at least 500 for all parameters.
16 The choice of the burn-in period is based on visual inspection of the trace plots of the MCMC draws. In fact, 25,000 is a conservative number, since the chain appears to converge after the initial 5,000 iterations.
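The PSRF reported above can be computed as follows for a scalar parameter. This is the standard Gelman-Rubin statistic, written here as a stand-alone sketch rather than the authors' actual diagnostic code.

```python
def psrf(chains):
    """Potential scale reduction factor (Gelman and Rubin 1992).

    chains: m lists of n post-burn-in draws each, for one scalar parameter.
    Values near 1 indicate the parallel chains have mixed over the same
    target distribution; values above roughly 1.2 signal non-convergence.
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)    # between-chain
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)             # within-chain
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n                           # pooled variance estimate
    return (var_hat / W) ** 0.5
```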
5.1. Identification
We start our discussion of the identification strategy with the evaluation model. There are at least two ways
to identify the evaluation model. The first approach is to estimate all 4 evaluation-rating translating cutpoints
with no intercept. To identify the mean effect of the intercept, we instead adopt the alternative approach of
holding one cutpoint constant (i.e., κ1 = 0) and estimating the remaining three. All else
equal, the difference in mean ratings among individuals helps us identify λi0. Similarly, we achieve
identification of the baseline propensity βi0 through the difference in posting rates among
individuals. In our framework, consumers first update their perception of system credibility and then
make their rating decisions. Denoting the change of posting behavior over time by a series S, we can express
the relationship between β3, ā0, and b̄0 using an analogous equation: S = f( β3, ( ā0 - 1 ) / b̄0 ), where f(·)
represents the belief updating mechanism. Notice that once the initial values of the belief parameters are
identified, the mechanism of belief updating becomes deterministic and identification of β3 can
be achieved. As a result, the relationship among them is identified from the joint likelihood of the change
in posting behavior over time. The difficulty is that we cannot identify ā0 and b̄0 simultaneously, given
17 The scale parameter of the proposal distribution is adaptively chosen such that the acceptance rate is around 23%, as suggested for high-dimensional parameter vectors (Gelman et al. 2003).
18 A series of MCMC draws is considered to have converged as long as PSRF < 1.2.
their relationship described in (14). Since all consumers have limited knowledge about the credibility of
the review system until they receive the first disconfirmation signal, we fix the richness of learning
experience ā0 at a small number (provided ā0 > 1) such that b̄0 is identifiable. All else
equal, the variation in the change of posting behavior over time helps us identify the initial inverse scale
parameter bi0, which determines the rate at which each individual learns about the system credibility.
Once b̄0 is identified, the value of the biasedness parameter δit is determined and can be thought of as a
constant (denoted by c) when it enters the propensity model. We can then express the mean of the
disconfirmation signal as Q̃ijt = Q̆ijt + c, where Q̆ijt denotes its observable component. Expanding
β1 Q̃ijt + β2 Q̃²ijt + β4 Qijt + β5 Q²ijt, substituting, and collecting terms, we obtain
β2 Q̆²ijt + ( β1 + 2 β2 c ) Q̆ijt + β4 Qijt + β5 Q²ijt (plus a constant absorbed into the intercept),
with 4 variables and 4 identifiable parameters.
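The collecting-terms step in this identification argument can be verified numerically. The coefficient and shift values below are arbitrary illustrations, not estimates.

```python
# With the biasedness fixed at a constant c, the disconfirmation terms
# beta1*Qt + beta2*Qt^2 (where Qt = Qb + c and Qb is the observable part)
# collapse to a polynomial in Qb whose coefficients absorb c, leaving
# 4 identifiable parameters for the 4 variables in the propensity model.
b1, b2, c = -0.011, 0.025, 0.3    # hypothetical values for illustration

def original(qb):
    qt = qb + c                    # disconfirmation with the constant shift
    return b1 * qt + b2 * qt ** 2

def collected(qb):
    const = b1 * c + b2 * c ** 2   # absorbed into the baseline propensity
    return (b1 + 2.0 * b2 * c) * qb + b2 * qb ** 2 + const

for qb in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(original(qb) - collected(qb)) < 1e-12
```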
5.2. Model Fit and Model Comparison
To demonstrate the fit of our proposed model, we compare our full (dynamic) model with its three nested
versions:
Model 1– A static model (no dynamic learning).
Model 2– A static model without disconfirmation entering propensity model.
Model 3– A static model without disconfirmation and product evaluation entering propensity model.
Table 2. Model Fit and Comparison

Model     Dynamic Belief Updating   Quality Disconfirmation   Experienced Quality   DIC w.r.t. Full Model*
Full      Yes                       Yes                       Yes                   ---
Model 1   No                        Yes                       Yes                   349.7
Model 2   No                        No                        Yes                   427.2
Model 3   No                        No                        No                    483.9

* Lower DIC is preferred.
Table 2 reports the deviance information criterion (DIC) proposed by Spiegelhalter et al. (2002) for
the four models. The full model outperforms its three nested versions, and we therefore report and interpret
the estimation results from the full model in the next section. Our full model fits better than the
three static models because the belief updating mechanism provides a general way to explain the change
in consumer rating behavior over time. In addition, the model fit becomes worse (higher DIC) as we
exclude, one by one, the disconfirmation effect (Model 2) and the effect of product evaluation (Model 3) on
posting propensity. This pattern provides supportive evidence that it is desirable to incorporate all
three components when modeling the behavior of consumers leaving online reviews.
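For reference, the DIC used in Table 2 combines the posterior mean deviance with a complexity penalty. The minimal sketch below (our own function, taking precomputed log-likelihoods) shows the standard construction.

```python
def dic(log_liks, log_lik_at_post_mean):
    """Deviance information criterion (Spiegelhalter et al. 2002).

    log_liks: log-likelihood of the data at each retained posterior draw.
    log_lik_at_post_mean: log-likelihood at the posterior mean parameters.
    Lower values indicate a better fit after penalizing model complexity.
    """
    d_bar = -2.0 * sum(log_liks) / len(log_liks)   # posterior mean deviance
    d_hat = -2.0 * log_lik_at_post_mean            # deviance at posterior mean
    p_d = d_bar - d_hat                            # effective number of parameters
    return d_bar + p_d                             # equivalently d_hat + 2 * p_d
```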
6. Results
Table 3 reports the posterior means of the parameters specified in the propensity model. Since we apply a
Bayesian approach to estimate the model, we evaluate the significance of the parameter estimates based
on the highest posterior density (HPD) intervals. An HPD interval describes the information we have about
the location of the true parameter after we have observed the data (Hoff 2009). Parameter estimates are
considered significant if the corresponding HPD intervals do not contain zero.
Table 3. Estimation Results of Propensity Model

Notation   Description                          Coeff.    Signif.
βi0        Mean effect of baseline propensity    0.831    **
β1         Disconfirmation – Linear             -0.011    **
β2         Disconfirmation – Quadratic           0.025    **
β3         Perceived system precision           -0.508    **
β4         Experienced quality – Linear         -0.268    **
β5         Experienced quality – Quadratic       0.065    **
β6         Price                                 0.271    **
β7         Shipping and handling time           -0.041    **
β8         Volume of posted ratings             -0.009
β9         Variance of posted ratings            0.009    *

** (*) Indicates 95% (90%) HPD interval does not contain 0.
The parameters β1 and β2 together suggest that an individual's participation in leaving online product
reviews is, at least partially, driven by the realization of disconfirmation. That is, the further the post-purchase
evaluation deviates from the pre-purchase expectation, the more likely an individual is to
express her opinions about the product. If the expectation more or less reflects her own assessment, the
consumer is more inclined to lurk. Our estimation results thus tell a very different story from what the
distribution of rating discrepancies appears to show (see Figure 3). Loosely, the disconfirmation
effect on posting behavior could be attributed to many motives, such as altruism. An altruist may praise
the product by submitting a score higher than the current mean level if her usage experience is better
than what the rating signal represents. On the contrary, if her usage experience with the product is worse
than what the rating signal indicates, she may want to warn peer consumers to be vigilant about the product
by leaving a below-average rating. Moreover, the negative β1 suggests that negative disconfirmation
induces posting behavior to a larger extent, ceteris paribus. While existing work has
documented several motives for consumer engagement in WOM from a normative point of view (Hennig-Thurau
et al. 2004; Sundaram et al. 1998), this research provides a positive validation and contributes to the
literature by demonstrating that disconfirmation is one of the underlying forces driving consumers to
voluntarily engage in online WOM.
A unique feature of this research is that we tie the change in consumer participation in generating
reviews to the evolving perception of system precision. From an economic perspective, it should
be clear that disconfirmation has an effect distinct from that of perceived system precision. The former
captures the short-term impact of a disconfirmation shock (represented by β1 and β2), whereas the latter
measures a cumulative attitude based on the series of disconfirmations realized in the past and can be
thought of as having a long-term influence on rating behavior (represented by β3). Our results show that
consumers tend to be more silent in rating activities as they perceive the review system to be more precise
over time (β3 < 0). Put differently, consumers are more vocal when the review system is
perceived to be noisy.
As for the effect of product evaluation on posting propensity, the estimated coefficient is
negative for the linear term (β4) but positive for the quadratic term (β5). Combined, these two parameter
estimates indicate that consumers are more likely to share opinions when the experienced product quality
is either very high or very low. This finding echoes the polar effect observed by Anderson (1998) in an
offline setting and by several researchers in online settings (Dellarocas et al. 2010; Dellarocas and
Narayan 2006; Moe and Schweidel 2012). Furthermore, we find that consumers are more interested
in rating products with higher prices (β6 > 0) and products handled and shipped out in a timely manner (β7 < 0).
The rating environment, characterized by the volume and variance of posted ratings, has dissimilar
impacts on a focal consumer's posting decision. In particular, online raters' posting decisions are subject to
a crowding-out effect, meaning that a focal individual is inclined to lurk if a large crowd has already
shared its opinions (β8 < 0). This finding is analogous to the political phenomenon in which voters tend to
abstain once public polls have declared clear winners (Sudman 1986). On the other hand, dissension among
posted opinions encourages the focal individual to contribute to eWOM (β9 > 0).
The estimation results for the other parameters are reported in Table 4. The coefficient for product price
is positive but insignificant (λ1 > 0), implying that product price is, in general, a weak proxy for quality. As
expected, the order handling time has a negative impact on the rating evaluation (λ2 < 0). This provides
evidence that online consumers more or less reflect the service level they receive from the e-commerce
site in their product ratings. Unlike in the propensity model, the dissension of public opinions has a
negative impact on the latent rating evaluation. The posterior mean of our estimate of log(b̄0) is significant,
suggesting that the updating mechanism captures well the evolution of the system precision perceived by
consumers.
Table 4. Estimation Results of Evaluation Model and Other Parameters

Model                  Notation   Description                                                       Mean     Signif.
Evaluation             λi0        Mean effect of experienced quality                                1.997    **
                       λ1         Price                                                             0.033
                       λ2         Shipping and handling time                                       -0.104    **
                       λ3         Volume of the rating environment                                 -0.015
                       λ4         Variance of the rating environment                               -0.033    *
Cutoffs                κ1         Cutpoint for s = 1 and 2 (fixed at 0)                             0.000    ---
                       κ2         Cutpoint for s = 2 and 3                                          0.486    **
                       κ3         Cutpoint for s = 3 and 4                                          1.406    **
                       κ4         Cutpoint for s = 4 and 5                                          2.459    **
Credibility Updating   log(bi0)   Mean effect of initial inverse scale parameter (log-transformed)  2.189    **
                       ai0        Initial shape parameter (fixed at 5)                              5.000    ---
Correlation            ρ          Interdependency between two rating stages                         0.082

** (*) Indicates 95% (90%) HPD interval does not contain 0.
Table 5 reports the means and standard deviations of the individual-specific parameters. The large
standard deviations for βi0 and λi0 suggest that online reviewers are substantially heterogeneous in baseline
posting propensity and product evaluation (also see Figure 7). To explore the interdependence among the
baseline latent parameters at the individual level, we provide their pair-wise correlation matrix
in Table 6. The positive correlation between βi0 and λi0 suggests that, after controlling for other factors,
individuals with a higher baseline posting propensity appear to be more lenient with respect to giving
high numeric ratings. This finding is the opposite of Moe and Schweidel (2012), who find that active raters
are more negative. Finally, our estimation results indicate less heterogeneity across individuals in their
initial perception of system precision, as indicated by the small standard deviation for bi0.
Table 5. Individual-specific Parameter Estimates

Model                     Notation   Mean among individuals   Std. dev. among individuals
Baseline propensity       βi0        0.831                    0.598
Baseline evaluation       λi0        1.997                    0.520
Initial learning status   bi0        8.931                    0.208
(a) Distribution of βi0 (b) Distribution of λi0 (c) Distribution of bi0
Figure 7. Distributions of Individual-Level Parameters
Table 6. Pair-wise Correlations of Individual-specific Parameters

       βi0     λi0     bi0
βi0    1.000   0.142   0.387
λi0            1.000   0.876
bi0                    1.000
6.1. Extension and Robustness Checks
In this section, we relax some of our model assumptions by taking into account other factors that could
potentially bias our findings. The estimation results remain consistent across the following three
robustness checks.
6.1.1. The Effect of Disconfirmation on Product Evaluation
In our proposed model, we assume that disconfirmation enters only the propensity model and does not
have any impact on the post-purchase evaluation of the product. That is, consumers do not factor their
prior expectation about quality into the product evaluation. Prior research has shown that, in offline settings,
a consumer's overall satisfaction can be affected by quality disconfirmation (Anderson and Sullivan
1993), and this relationship has been shown empirically to be positive (Rust et al. 1999). Based on this theory,
we now allow a consumer's rating evaluation to be affected by her overall product satisfaction Sijt, which is
a linear combination of her experienced product quality and the quality disconfirmation:
Sijt = Qijt + ν Q̃ijt, where ν is a new parameter to be estimated. With this specification, the evaluation
model can be rewritten as:

Eval*ijt = λi0 Sijt + λ1:4 Xjt + εe,ijt .  (25)

The propensity model becomes:

Prop*ijt = βi0 + β1 Q̃ijt + β2 Q̃²ijt + β3 Preit + β4 Sijt + β5 S²ijt + β6:9 Xjt + εp,ijt .  (26)

The mean of disconfirmation becomes:

Q̃ijt = [ R̄j0 - ( Rjt - δit ) ] / ( 1 + ν ).  (27)
Comparing (27) with (5), we can see that the inclusion of ν shrinks the mean of the disconfirmation signal
towards zero, provided ν > 0.
The estimation results (with respect to both parameter estimates and DIC) of the new model deviate only
marginally from those reported in Table 3, suggesting that the main results of this paper are robust
even when we consider the impact of disconfirmation on overall product satisfaction. The posterior mean of ν
is 0.117 and is significant at the 95% level, suggesting that online consumers do indeed factor their ex-ante
expectations into their product ratings.
6.1.2. The Interaction between Disconfirmation and Rating Environments
We have examined how a focal individual's posting decision is influenced by the disconfirmation she
encounters as well as by the rating environment to which she is exposed. An interesting question to ask is: Is
the disconfirmation effect itself also moderated by the opinions expressed by the crowd? To answer this, we
allow the effect of disconfirmation on posting propensity19 to interact with the volume and variance of
posted ratings, and incorporate these two interaction terms in our propensity model. The parameters of
interest are reported in Table 7.20
Table 7. Effects of Rating Environments on Posting Decision

Effect                                 Coeff.    Signif.
Volume on propensity                  -0.008
Variance on propensity                 0.023    **
Volume on disconfirmation effect       0.026
Variance on disconfirmation effect    -0.008    *

** (*) Indicates 95% (90%) HPD interval does not contain 0.
Consistent with the results shown in Table 3, we still find evidence that the level of disagreement
among others encourages posting behavior. Interestingly, the coefficient for the "Disconfirmation × Variance"
interaction term is negative and significant. This indicates that the disconfirmation effect is moderated by
the degree of dissension among posted opinions, perhaps because a consumer can anticipate the review
information to be noisy and hence becomes less sensitive to the realized disconfirmation. Moreover, the
coefficient of the "Disconfirmation × Volume" interaction has a positive sign. This also makes sense: the
higher the volume of posted ratings, the more trustworthy consumers perceive the review information
to be, and therefore the more sensitive they are to the disconfirmation effect.
6.1.3. The Use of the Long-term Average Rating as a Proxy for Product Quality
19 The disconfirmation effect can be computed as the product of the disconfirmation variables and the corresponding coefficients (for both the linear and quadratic terms).
20 Since the parameter estimates do not change substantially, we report here only the coefficients for variables related to the rating environments.
The long-term product-level average ratings are used as proxies for the baseline product quality. Although
the validity of this proxy is justified (Footnote 9), critical readers may argue that this proxy may not
necessarily reflect the overall evaluation among all consumers. Hu et al. (2006) argue that the mean rating
would over-report (under-report) the true quality if consumers are more inclined to brag (moan) about the
product when they are highly satisfied (disgruntled). To address this issue, we add a product-level random
effect in the formulation of the post-purchase evaluation. As a result, (2) can be rewritten as:
Qij = λ0(R̄j + ωj) + εi,  (28)
where ωj ~ N(0, σω²). The parameter ωj models the product-level randomness in the sense that some
products are overrated and others underrated by the long-term mean ratings. If the estimated σω
is small, we can argue that the long-term average ratings more or less reflect the true product quality.
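As a quick numerical illustration of this specification (a sketch only — λ0, the error scale, and the sample sizes below are hypothetical placeholders, not our estimates), a small σω implies that the long-term mean rating of each product tracks its baseline quality almost exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

n_products, n_consumers = 50, 200
lam0 = 1.0           # placeholder scale coefficient (not an estimate)
sigma_omega = 0.011  # small product-level spread, as the posterior suggests
sigma_eps = 0.5      # placeholder idiosyncratic noise

R_bar = rng.uniform(1, 5, n_products)           # long-term mean ratings
omega = rng.normal(0, sigma_omega, n_products)  # product-level random effect
eps = rng.normal(0, sigma_eps, (n_consumers, n_products))

# Specification (28): Q_ij = lam0 * (R_bar_j + omega_j) + eps_i
Q = lam0 * (R_bar + omega) + eps

# With sigma_omega this small, the average evaluation of each product is
# essentially lam0 * R_bar_j, i.e., the mean rating is a faithful proxy.
bias = Q.mean(axis=0) - lam0 * R_bar
print(np.abs(bias).max())
```

The residual gap is driven almost entirely by the idiosyncratic noise averaging out, not by the product-level effect, which is the sense in which a small σω validates the proxy.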
The estimation results obtained with this new specification show no noticeable differences from
those obtained with our original model. The posterior mean (standard deviation) for ωj is 0.011 (6×10⁻⁵),
providing supportive evidence that the long-term average ratings serve as a good proxy for the baseline
quality. What makes our finding different from the extant work is that, in addition to the post-purchase
evaluation, we also consider the disconfirmation effect. The richer model allows consumers to make
posting decisions in response to the overall tone on the product quality set by posted ratings. We should
expect that, in the presence of the disconfirmation effect, the average rating will gradually converge to the
true quality in the long run. In contrast, the results of Hu et al. (2006) depend on the assumption that
the mechanism underlying consumer posting behavior is unresponsive to opinions previously expressed by
others.
6.2. Simulations and Analyses on Consumer Rating Behavior
In this subsection, we perform a series of counterfactual analyses to further highlight the significance of
our findings and better understand consumer rating behavior in an online setting.
6.2.1. Recovering unobserved rating scores
In the first analysis, we recover the rating outcomes (yijt's and zijt's) for purchase occasions that have a
follow-up review entry (yijt = 1). These recovered (simulated) outcomes are compared to the observed
ones to gauge the performance of our model. We also “uncover” the outcomes for lurking occasions
(yijt = 0) to demonstrate the mechanism underlying online rating behavior. The simulation procedure is
summarized as follows:
1. We compute the latent posting propensity per (15) and evaluation per (17) for all purchase occasions
based on the posterior estimates.
2. Given the computed posting propensity, evaluation and estimated cutpoints, we simulate individual
decisions of whether to rate per (16) and what to rate per (18).
3. We repeat Step 2 for 1,000 iterations.
4. We compute the averages of the quantities of interest across iterations.
To make sure the number of iterations is sufficient, we repeat Steps 2–4 in three parallel processes
with different random seeds. We compare the simulated results and find no inconsistencies across
processes.
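In stylized form, the procedure above amounts to repeatedly drawing binary posting decisions and ordered ratings from the latent quantities. The sketch below illustrates the mechanics; all numbers, thresholds, and cutpoints are hypothetical placeholders standing in for equations (15)–(18) and our posterior estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent quantities standing in for equations (15) and (17)
n_occasions = 5000
propensity = rng.normal(0.0, 1.0, n_occasions)  # latent posting propensity
evaluation = rng.normal(3.3, 1.2, n_occasions)  # latent post-purchase evaluation
cutpoints = np.array([1.5, 2.5, 3.5, 4.5])      # placeholder ordered cutpoints

def simulate_once(rng):
    # Eq. (16)-style decision: post if propensity plus a shock crosses a threshold
    post = (propensity + rng.normal(0, 1, n_occasions)) > 1.5
    # Eq. (18)-style outcome: map the noisy evaluation into 1-5 stars via cutpoints
    stars = 1 + np.searchsorted(cutpoints, evaluation + rng.normal(0, 0.3, n_occasions))
    return post, stars

# Repeat the draw for 1,000 iterations and average the quantity of interest
post_rate = np.mean([simulate_once(rng)[0].mean() for _ in range(1000)])
print(round(post_rate, 3))
```

Averaging over many draws, rather than taking a single realization, is what smooths out the simulation noise in the recovered posting rates and rating frequencies.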
Figure 8. Simulated vs. Actual Ratings
We begin our analysis by assessing the performance of our model in explaining the behavior of
consumers leaving ratings. Figure 8 plots the simulated ratings (light grey bars) for the
rated occasions against the actual ratings observed in the data set (dark grey bars). The two
series are very close to each other, indicating that the proposed model captures well the
mechanism governing consumers’ rating decisions.21
21 We also run the same analysis for three alternative models defined in Section 6.2. The results show that our
proposed model outperforms its three nested versions in terms of recovering consumers’ rating decisions.
[Figure 8 data — Y-axis: Frequency (%); X-axis: Rating]
Rating:      1      2      3      4      5
Simulated:  6.9%   6.1%  19.6%  30.6%  36.8%
Actual:     7.2%   6.4%  19.9%  29.6%  36.9%
For interpretation purposes, we compute the posting rates along two dimensions and plot them
against different levels of post-purchase evaluation (Figure 9a) and different levels of rating discrepancy
(Figure 9b). We can see that the relationship between the posting rate and the discretized product
evaluation is best characterized by a left-skewed U shape. Negative experiences (rated as 1 or 2) have a
higher chance of being reported (7.44% and 6.10%, respectively), relative to neutral (5.88%) and positive
evaluations (5.54% and 5.70%). It is evident that disappointed or disgruntled consumers (those who
encounter a strong, negative rating discrepancy) are more vocal in expressing opinions about the product.
The simulation results provide a sharp insight into what drives consumers to leave online reviews; an
insight that cannot be discovered by directly observing the data pattern, as shown in Figure 2 and Figure 3.
(a) Posting Rate (%) vs. Post-purchase Evaluation (b) Posting Rate (%) vs. Rating Discrepancy
Figure 9. Dissimilar Effects of Different Measurements on Posting Behavior
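The binning behind Figure 9a is straightforward to reproduce. The sketch below works on synthetic occasions; the per-level posting probabilities are seeded with the rates reported above purely for illustration, not drawn from our estimated model:

```python
import numpy as np

rng = np.random.default_rng(4)

n = 100_000
# Synthetic occasions: a discretized 1-5 post-purchase evaluation per purchase
evaluations = rng.integers(1, 6, n)
# Posting probability per evaluation level (the rates reported for Figure 9a)
rate_by_level = np.array([0.0744, 0.0610, 0.0588, 0.0554, 0.0570])
posted = rng.random(n) < rate_by_level[evaluations - 1]

# Posting rate per evaluation level, as plotted in Figure 9a
rates = np.array([posted[evaluations == k].mean() for k in range(1, 6)])
print(np.round(rates, 3))
```

Grouping the simulated occasions by evaluation level and averaging the binary posting indicator recovers the left-skewed U shape described above.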
6.2.2. Convergence of product ratings
To highlight how the disconfirmation effect influences the evolution of product ratings at the product level,
we perform our second simulation by following the procedure below.
1. We sample 3,000 individuals from the posterior estimates of the population-level parameters (Σ). We
assume all individuals purchase and consume a product whose baseline quality is equivalent to 3.5
stars. The sequence of individuals entering the market is random.
2. For each individual, we first simulate her posting decision; for an individual who decides to post
a rating, we simulate her decision of what rating to give. When a new rating is posted, we update
the running mean rating as well as the other variables that serve as controls in subsequent rating
occasions.
3. We repeat Step 2 until 150 ratings are posted.
4. We repeat Steps 1-3 for multiple iterations.
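A stripped-down version of this procedure already reproduces the convergence logic. The sketch below is illustrative only — the logistic posting rule and all constants are hypothetical placeholders for the estimated model. It starts from an inflated early mean and lets disconfirmation-driven posting pull the average back to the true quality:

```python
import numpy as np

TRUE_QUALITY = 3.5

def simulate_path(initial_mean, n_ratings=150, seed=2):
    """Each arriving consumer forms an expectation from the running mean,
    experiences the true quality plus noise, and is more likely to post
    the larger the (absolute) disconfirmation."""
    rng = np.random.default_rng(seed)
    ratings = [initial_mean]  # seed the board with an early mean rating
    while len(ratings) < n_ratings:
        expectation = np.mean(ratings)
        experience = TRUE_QUALITY + rng.normal(0, 0.8)
        disconfirmation = experience - expectation
        # Posting probability rises with |disconfirmation| (logistic sketch)
        p_post = 1.0 / (1.0 + np.exp(2.0 - 1.5 * abs(disconfirmation)))
        if rng.random() < p_post:
            ratings.append(float(np.clip(round(experience), 1, 5)))
    return np.array(ratings)

path = simulate_path(initial_mean=4.5)  # initially overrated, as in Figure 10a
running_mean = np.cumsum(path) / np.arange(1, len(path) + 1)
print(round(running_mean[-1], 2))       # settles near the true quality
```

When the board starts high, negative disconfirmations dominate the early posting decisions, dragging the mean down; once the mean reaches the true quality, posted ratings become symmetric and the path stabilizes.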
Figure 10 depicts four major evolutionary patterns of online product ratings observed in our
simulation results. It is evident that, in the long run, the mean rating converges to the true product
quality (3.5). If the mean rating in the early stage is high relative to the true quality, individuals with
neutral or negative evaluations would perceive the rating signal to be inflated. The disconfirmation effect
would take place and induce low ratings from those dissatisfied consumers. As a result, we shall expect
the mean rating to gradually drop to the level of 3.5 and then stabilize (see Figure 10a). While such a
declining trend has been attributed to different effects such as self-selection biases (Li and Hitt 2008)
and environmental factors (Moe and Schweidel 2012), the effect of disconfirmation on posting decisions
provides an alternative explanation for this data pattern. Perhaps more importantly, the
disconfirmation effect can also explain other evolutionary patterns that cannot be explained by those
established theories. For example, if the product is initially underrated, subsequent consumers who
encounter positive disconfirmation are more likely to lift the mean rating by sharing their pleasant
usage experiences (Figure 10b). Interestingly, we also observe an undershooting property, where the
average rating first exhibits a steep decline and then bounces back to a steady state (Figure 10d).
Based on our estimation and simulation results, we believe that the disconfirmation effect is one of the
most important forces driving the voluntary behavior of consumers posting product reviews.
6.2.3. The effect of fake, inflated ratings on subsequent rating entries
Our simulation can also be used to understand how review manipulation in the early stage influences
subsequent rating submissions. Similar to the previous setting, we consider a fictitious scenario in which
a firm, selling a product with true quality equivalent to 3.5 stars, is able to “set” the mean rating at 4 (an
inflation of 0.5 stars) when the number of reviews reaches 15. We repeat Steps 1–4 listed in the previous
subsection for 10 iterations and average the posted ratings across iterations. As Figure 11 illustrates, the
mean rating plummets after the 15th entry and stabilizes at the level of the true quality. Although fake,
inflated ratings may elevate buyers’ pre-purchase expectations and hence reap additional revenue, we
argue that such deceptive information may strongly induce very negative feedback from disgruntled
consumers who post reviews for anxiety reduction or vengeance purposes. Similarly, we should expect
that the damage caused by a competitor’s malicious review distortion would be alleviated by subsequent
positive ratings reported by genuine, satisfied customers.
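This counterfactual can be sketched in the same stylized setting (again, the posting rule and every constant below are illustrative placeholders, not the estimated model): fake 5-star entries push the displayed mean to 4.0 once 15 reviews exist, after which disconfirmed genuine buyers pull it back down:

```python
import numpy as np

TRUE_QUALITY = 3.5

def manipulated_path(n_ratings=150, boost_at=15, target_mean=4.0, seed=3):
    """Genuine consumers post as in the convergence sketch; once `boost_at`
    reviews exist, the firm injects fake 5-star entries until the mean
    rating reads `target_mean` (a 0.5-star inflation)."""
    rng = np.random.default_rng(seed)
    ratings = []
    injected = False
    while len(ratings) < n_ratings:
        if not injected and len(ratings) >= boost_at:
            while np.mean(ratings) < target_mean:
                ratings.append(5.0)  # fake 5-star reviews
            injected = True
        expectation = np.mean(ratings) if ratings else TRUE_QUALITY
        experience = TRUE_QUALITY + rng.normal(0, 0.8)
        # Posting probability rises with |disconfirmation| (logistic sketch)
        p_post = 1.0 / (1.0 + np.exp(2.0 - 1.5 * abs(experience - expectation)))
        if rng.random() < p_post:
            ratings.append(float(np.clip(round(experience), 1, 5)))
    return np.array(ratings)

path = manipulated_path()
running_mean = np.cumsum(path) / np.arange(1, len(path) + 1)
# The mean jumps at the injection, then drifts back toward the true quality
print(round(running_mean[15], 2), round(running_mean[-1], 2))
```

The injection widens the gap between expectation and experience for the typical buyer, which under the disconfirmation mechanism raises the posting probability of exactly the consumers whose ratings undo the manipulation.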
Figure 10. Various Evolutionary Patterns of Ratings (X-axis: Rating Index)
Figure 11. Effect of Inflated Ratings on Subsequent Rating Entries (X-axis: Rating Index)
7. Conclusion
The primary objective of this paper has been to study the effect disconfirmation has on consumer rating
behavior in a digital setting. Early research on online product reviews studied how user-generated
ratings relate to market performance, whereas more recent work focuses on the impact of existing
ratings on subsequent ones from a social-dynamics standpoint (Lee et al. 2014; Moe and Schweidel 2012;
Shen et al. 2013). Our work contributes to the latter literature stream by proposing a novel framework in
which a focal consumer’s posting decisions are influenced by others’ opinions even before the
product is purchased and consumed. In addition, we model how consumers’ perception of the system
credibility evolves over time as a result of a series of realized disconfirmations. By integrating these
two features into the individual decision-making process, we are able to provide a comprehensive
understanding of what drives consumers to voluntarily contribute online product ratings.
Using a rich data set containing complete purchasing and rating activities at the individual level, we
empirically show that online consumers’ rating behavior is driven by disconfirmation, the extent to which
the post-purchase evaluation deviates from the pre-purchase expectation of the same product. The
magnitude of the disconfirmation effect is decreasing in the variance of posted ratings while increasing in
the volume of submitted reviews. We also find that online raters tend to become silent as they perceive
the review system to be more precise over time.
The main finding of this paper echoes prior research in some respects but also provides different
insights in others. On one hand, the disconfirmation effect serves as one of the underlying drivers of why
people engage in WOM, alongside concerns for others, anxiety reduction, vengeance, etc. (Anderson 1998;
Hennig-Thurau et al. 2004). The impact of disconfirmation on posting behavior can also (at least partially)
explain 1) the commonly observed U-shaped distribution of online product ratings; and 2) the declining
trend of average ratings at the product level. On the other hand, we believe that the long-term average
rating can more or less represent the true product quality. Through simulations, we demonstrate that the
identified disconfirmation effect will “ensure” that the average rating converges to the true value in the
long run. Our proposition to some extent disagrees with Hu et al. (2006), who argue that the mean score
may provide misleading recommendations, a conclusion contingent on the analytical assumption that
consumers’ posting decisions are solely triggered by their perceived quality.
Our empirical results shed light on the economic value of online product ratings in the following
respects. Manufacturers and service providers should be aware that online reviewers are more prone to
provide feedback when expectations do not match their own perceived quality. While manipulating
product reviews by inflating numeric ratings can temporarily boost sales revenue, inflated ratings could
turn out to be detrimental in the long run as disappointed customers engage in negative eWOM.22
Furthermore, online reviewers’ perception of system precision has a negative impact on their
propensity to contribute reviews. The existing literature also shows that rating environments with smaller
volume or lower valence discourage posting incidence (Moe and Schweidel 2012). To keep the online
rating environment healthy and prosperous, policy makers and marketers in charge of online
rating campaigns should design various incentives for different consumers accordingly.
This study has some limitations and can be improved in the following directions. First, our data contains
limited information, which prevents us from examining online rating behavior in more detail. For
example, although we have utilized aggregate review information, such as valence and variance, in
formulating the quality variables, we do not consider the possibility that some consumers may value
positive and negative ratings differently. If we were able to observe the distribution of posted ratings at
the time of purchase, we might discover more interesting findings on the way consumers interpret
review signals in the pre-purchase stage. Second, a promising direction for future research is to apply text-
mining techniques to extract the sentiments stored in the textual data and incorporate them into our
econometric model. The synergy created by integrating different research methods may allow us to
provide a sharper insight into consumers’ online rating behavior. Third, consumers’ browsing history
before or after purchase could greatly help us build an even more sophisticated model. For example, if
we knew how many pages of reviews each prospective buyer reads and how long she stays on each page,
we would be able to better calibrate how previously posted reviews shape the formation of the expected
quality. If we further knew what similar items each prospective buyer has looked at, we could directly
model the effect of existing ratings on a focal consumer’s purchase decision.
References
Anderson, E. W. 1998. Customer Satisfaction and Word of Mouth. Journal of Service Research 1 (1):5-
17.
22 In our empirical model, we can only show that inflated ratings would drive disappointed buyers to leave negative
numeric ratings. However, we cannot gauge the harm those negative textual reviews would do to the firm’s reputation.
Anderson, E. W., and M. W. Sullivan. 1993. The Antecedents and Consequences of Customer
Satisfaction for Firms. Marketing Science 12 (2):125-143.
Chaiken, S. 1980. Heuristic versus systematic information processing and the use of source versus
message cues in persuasion. Journal of personality and social psychology 39 (5):752.
DeGroot, M. H. 1970. Optimal statistical decisions. Vol. 82: Wiley-Interscience.
Dellarocas, C. 2006. Strategic Manipulation of Internet Opinion Forums: Implications for Consumers and
Firms. Management Science 52 (10):1577-1593.
Dellarocas, C., M. Fan, and C. Wood. 2004. Self-interest, reciprocity, and participation in online
reputation systems.
Dellarocas, C., G. Gao, and R. Narayan. 2010. Are consumers more likely to contribute online reviews for
hit or niche products? Journal of Management Information Systems 27 (2):127-158.
Dellarocas, C., and R. Narayan. 2006. A statistical measure of a population’s propensity to engage in
post-purchase online word-of-mouth. Statistical science 21 (2):277-285.
Deloitte. 2007. Most consumers read and rely on online reviews; companies must adjust. USA.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 2003. Bayesian data analysis: CRC press.
Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation using multiple sequences.
Statistical Science 7 (4):457-472.
Ghose, A., P. G. Ipeirotis, and B. Li. 2012. Designing ranking systems for hotels on travel search engines
by mining user-generated and crowdsourced content. Marketing Science 31 (3):493-520.
Gormley, M. 2013. NY attorney general cracks down on fake online reviews. NBC News. Available
from http://www.nbcnews.com/tech/internet/ny-attorney-general-cracks-down-fake-online-reviews-
f4B11235875.
Hennig-Thurau, T., K. P. Gwinner, G. Walsh, and D. D. Gremler. 2004. Electronic word-of-mouth via
consumer-opinion platforms: what motivates consumers to articulate themselves on the internet?
Journal of interactive marketing 18 (1):38-52.
Hoff, P. D. 2009. A first course in Bayesian statistical methods: Springer.
Hu, N., P. A. Pavlou, and J. Zhang. 2006. Can online reviews reveal a product's true quality?: empirical
findings and analytical modeling of Online word-of-mouth communication. In Proceedings of the
7th ACM conference on Electronic commerce. Ann Arbor, Michigan, USA: ACM, 324-330.
Johnston, S. Don't Fall for Fake Online Reviews. US News 2014. Available at
http://money.usnews.com/money/personal-finance/articles/2014/07/08/dont-fall-for-fake-online-
reviews.
Koop, G., D. J. Poirier, and L. Tobias. 2007. Bayesian Econometric Methods (Econometric Exercises):
Cambridge University Press.
Lee, Y., K. Hosanagar, and Y. Tan. 2014. Do I follow my friends or the crowd? Information cascades in
online movie ratings. Management Science Forthcoming.
Leggatt, H. 2011. 24% of consumers turned off after two negative online reviews 2011 [cited April 13
2011]. Available from http://www.bizreport.com/2011/04/27-of-consumers-turned-off-by-just-two-
negative-online-revie.html.
Li, X., and L. M. Hitt. 2008. Self-Selection and Information Role of Online Product Reviews. Information
Systems Research 19 (4):456-474.
McGlohon, M., N. Glance, and Z. Reiter. 2010. Star quality: Aggregating reviews to rank products and
merchants. Paper read at International Conference on Weblogs and Social Media.
Moe, W. W., and D. A. Schweidel. 2012. Online Product Opinions: Incidence, Evaluation, and Evolution.
Marketing Science 31 (3):372-386.
Netzer, O., J. M. Lattin, and V. Srinivasan. 2008. A Hidden Markov Model of Customer Relationship
Dynamics. Marketing Science 27 (2):185-204.
Nielsen. 2007. Word-of-Mouth the Most Powerful Selling Tool. Available from
http://nz.nielsen.com/news/Advertising_Oct07.shtml.
PowerReviews. 2010. Available from http://www.powerreviews.com/products/social-content-
engine/customer-reviews.
Rust, R. T., J. J. Inman, J. Jia, and A. Zahorik. 1999. What You Don't Know About Customer-Perceived
Quality: The Role of Customer Expectation Distributions. Marketing Science 18 (1):77-92.
Schlosser, A. E. 2005. Posting versus lurking: Communicating in a multiple audience context. Journal of
Consumer Research 32 (2):260-265.
Shen, W., Y. J. Hu, and J. Rees. 2013. Competing for Attention: An Empirical Study of Online Reviewers’
Strategic Behavior. Working paper.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. Van Der Linde. 2002. Bayesian measures of model
complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64
(4):583-639.
Sudman, S. 1986. Do Exit Polls Influence Voting Behavior? The Public Opinion Quarterly 50 (3):331-
339.
Sundaram, D. S., K. Mitra, and C. Webster. 1998. Word-of-mouth communications: a motivational
analysis. Advances in consumer research 25 (1):527-531.
Ying, Y., F. Feinberg, and M. Wedel. 2006. Leveraging Missing Ratings to Improve Online
Recommendation Systems. Journal of Marketing Research 43 (3):355-365.