In-Store Experiments to Determine
the Impact of Price on Sales∗
Vishal Gaur†, Marshall L. Fisher‡
December 2003
(Revised July 2004)
Abstract
This paper describes an experimentation methodology to measure how demand varies with price and the results of its application at a toy retailer. The same product is assigned different price-points in different store panels and the resulting sales are used to estimate a demand curve. We use a variant of the k-median problem to form store panels that control for differences between stores and produce results that are representative of the entire chain. We use the estimated demand curve to find a price that maximizes profit. Our experiment yielded the unexpected result that demand increases with price in some cases. We present likely reasons for this finding from our discussions with retail managers. Our methodology can be used to analyze the effect of several marketing and promotional levers employed in a retail store besides pricing.
∗This paper is based on a pricing experiment conducted at Zany Brainy, Inc., a toys retailer based in King of
Prussia, Pa., and owned by F.A.O. We would like to thank Gene Rosadino for facilitating the experiment, and
Young-Hoon Park (Cornell University) and Robert Shoemaker (New York University) for helpful comments.
†Department of Information, Operations and Management Science, Leonard N. Stern School of Business, New
York University, 8-160, 44 West 4th St., New York, NY 10012. E-mail: [email protected].
‡The Wharton School, University of Pennsylvania, Jon M. Huntsman Hall, 3730 Walnut St., Philadelphia, PA
19104-6366. E-mail: [email protected].
1 Introduction
An in-store experiment is a useful scientific tool for retailers to learn about consumer response to
the use of various marketing levers, such as pricing, promotions, store layout, presentation, etc.
However, successful design and execution of experiments involves many challenges. A retailer must
select stores for the experiment that are representative of its entire chain; otherwise, the results
of the experiment may be idiosyncratic and may not generalize to the entire chain. Further,
the retailer faces the choice whether to try all experimental treatments within a single store over
time, or to try different treatments across a cross-section of comparable stores. This choice depends
on the length of product lifecycle and the seasonality of demand. Finally, the retailer must execute
the experiment in a controlled environment, keeping the effect of non-experimental factors to a
minimum. These challenges can be daunting. For example, in a recent study of 32 large U.S.
retailers, we found that 90% of the retailers conduct price experiments. However, the retailers
themselves rated the effectiveness of their experiments as a median score of 6 out of 10.
This paper describes a controlled pricing experiment conducted at a toy retailer to measure
how its demand varies with price and to determine the price at which profit is maximized. The
products considered in the experiment are three different types of toys involving different buyer
behaviors (a branded toy, an unbranded but technologically complex toy, and an unbranded but
simple toy). None of the three products is a repeat purchase item. A set of stores is selected for the
experiment and the products are offered at different prices in different stores. The original prices
of the products are changed so that the customers see only the offered price and cannot tell if it
has been marked up or down. The experiment is carefully designed to ensure control and minimize
the chance that a customer may visit two stores and find the same product at different prices. We
present the experiment design methodology and then analyze the results from the experiment.
The key methodological questions addressed in the paper are: what stores to select for the
experiment, and how to map different price-points to each of these stores. Let n denote the number
of stores in the chain and suppose that each product in the experiment is to be tested at p price-
points in m stores (mp ≤ n). Thus, at each price-point, we require m stores such that they are
similar to the stores at the other price-points and, moreover, are representative of the entire chain.
We select such stores by using a clustering technique based on a measure of distance or degree of
dissimilarity between stores. Then we design a randomized block layout for the experiment wherein
p stores in each cluster are assigned one-to-one to different price-points.
Our paper contributes to the literature on pricing and field experiments in several ways. First,
while designing experiments has a long history in marketing and consumer research, in-store field
experiments are not that common. Instead, most existing research is based on interviews and
laboratory experiments involving the use of overt intervention to elicit consumer response; research
on pricing additionally uses historical time-series and panel data. An in-store experiment offers
some methodological advantages. It enables us to observe the relationship between price and unit
sales directly by changing prices systematically in a controlled environment. Thus, it allows us
to distinguish between consumer response to changes in regular price and consumer response to
markdowns. Furthermore, while prior research on pricing largely considers frequently purchased
items, we consider short lifecycle and one-time purchase items. The buyer behavior for such items
is more complex than that for frequently purchased items since consumers typically are less well
informed about such items and rely on price and other cues to make their purchase decisions (see
Roberts and Lilien (1993) for a survey of consumer behavior models). A field experiment enables
us to study buyer behavior for such items without having to assume a stylized pricing model.
Second, our experiment yields counterintuitive findings and shows that the relationship between
price and demand is not straightforward. While two of the three products used in the experiment
had downward-sloping demand curves, the third product had demand increasing significantly with
price in a part of the tested price range. Researchers in marketing have studied the role of price
in the consumer choice process as a budget constraint (allocative role) and as a signal of quality
(informational role). Our finding provides evidence to support existing theory and further shows
that the informational role of price can be so strong as to dominate the allocative role and result
in demand increasing with price. This finding is unique because we show an increase in demand
with price for the same product whereas existing empirical research typically compares price and
demand across competing products with different quality levels. We present possible explanations
for this finding from our discussions with managers at several retailers, and discuss managerial
implications on assortment selection and signals of quality other than price.
Third, we address the store selection problem in experiment design. This problem is particularly
relevant to seasonal or short-lifecycle products because there is a short time-span available for such
products, and it is not possible to change prices in the same subject store over time and observe the
corresponding change in sales.1 Instead, a matched subset of stores must be identified and different
prices must be used simultaneously in different stores. Several criteria must be met in selecting
the stores in order that the results of the experiment are accurately representative of the retailer’s
customer segment, and the cost and execution complexity of the experiment are minimized.
Our paper shows that in-store experimentation can be a powerful tool for retail managers to
learn about their customers. Once designed, the experiment can be conducted regularly with
different product groups. Further, it can be applied not only to price but also to other marketing
levers such as ‘item of the week’ promotion, shelf-space allocation, presentation, salesperson push,
etc.
1 It is also not prudent to change prices in the same subject store unless the goal is to study the effect of price
promotions. This is so because customers who revisit the store may discover the changes in price.
This paper is organized as follows. Section 2 describes the relevant literature; §3 presents our
store selection model; §4 discusses the application of the methodology to the retailer; §5 presents
the results of the experiment; §6 discusses the insights obtained for the pricing decision; and §7
concludes the paper with directions for future research.
2 Literature Review
There has been a vast amount of research on pricing in the marketing literature. We focus on the
main ideas related to our paper without providing a comprehensive survey.
Tellis (1988) conducts a meta-analysis study of models estimating the price elasticity of demand
for various products. He reports 367 estimates of elasticity for 220 brands/markets obtained from
42 studies published during the period 1960-85 and yielding 424 sales models. These studies use
time-series or panel sales data for frequently purchased products, and estimate various market
share models. They differ by product category, brand, lifecycle, estimation method, functional
form, region and demographic groups. Tellis finds that the mean price elasticity of demand is
significantly negative and that it differs significantly across product categories and even across
brands in the same product category. More significantly, he finds 50 items with estimated price
elasticities greater than zero and another 40 items with price elasticities between 0 and -1. See
also Brodie and Kluyver (1984), Ghosh, Neslin and Shoemaker (1984) and Naert and Weverbergh
(1981) for price elasticity studies using different datasets and pricing models.
It is recognized that price plays two distinct roles in the consumer choice process: an allocative
role (as a budget constraint) and an informational role (as a signal of quality); see Nagle (1984),
Rao (1984, 1993), Rao and Sattler (2000) for reviews. Empirical measurement of price elasticity of
demand is complicated by the fact that only the net effect of price can be observed in practice, and
the two roles cannot be distinguished. For example, in the aforementioned studies on price elasticity
of demand, it is likely that the items with elasticity greater than -1 had a significant informational
role of price, and the ones with positive elasticity had the informational role dominating the
allocative role. While there is a lack of research that integrates the two roles of price into a single
model, several papers use experiments to identify these roles.
For example, several researchers have examined the effect of price on consumers’ quality percep-
tions and on objective quality. Objective quality is defined as an unbiased measurement of quality
based on characteristics such as design, durability, performance and safety, and is often obtained
from independent consumer reports published by the Consumers Union. Research evidence suggests
that while price is used as an indicator of quality, there is a lack of uniformity across products as
well as across individuals in both the price-quality relationship and the perceived quality-objective
quality relationship (see Etgar and Malhotra 1981, Gerstner 1985, Zeithaml 1988). Several expla-
nations have been offered for this lack of uniformity. For example, Lichtenstein and Burton (1989)
find that consumers perceive a stronger association between price and quality for durable products
than for non-durable products, even though the objective quality of products may be unrelated to
or negatively correlated with price. They offer the following explanations for this finding: (i) less
knowledge about durable goods because the consumer makes fewer and more infrequent purchases
in a durable goods category, and (ii) greater difficulty in evaluating the quality of durable goods as
they are more complex products.
Dodd, et al. (1991) investigate how the price-perceived quality relationship influences buyers’
perception of value or their purchase intentions. They conduct an experiment with two short
lifecycle products (calculators and personal stereo) to identify the effects of price, brand and store
name on the consumers’ quality perceptions, value perceptions and willingness to buy. They find
that while price is positively correlated with quality perceptions, it is negatively correlated with
willingness to buy for both products. Their methodology differs from ours in that they use a
laboratory experiment rather than in-store experiment.
The lack of uniformity in the relationship between price and sales underscores the need for
further empirical research to validate existing theory. It also implies that retailers must use pro-
prietary data to learn about their own consumers. The field experiment presented in this paper
fits these purposes. It differs in focus from the existing research in three important ways. First,
we directly measure the effect of changes in regular price on sales rather than eliciting consumer
attitudes and preferences. Second, existing empirical research focuses on estimating market share
models for competing products or identifying the drivers of buyer behavior process, while we focus
on a retailer’s pricing decision. Finally, we consider short lifecycle one-time purchase items which
have not been commonly studied in prior research using actual sales data.
We note that other researchers have used controlled experiments at retail chains to estimate the
price elasticity of demand, see for example, Nevin (1974), Curhan (1974) and Neslin and Shoemaker
(1983). Experiments have also been used to study the impact of store environmental variables such
as music, lighting, behavior of store employees and store design on consumer behavior. See, for
example, Baker, Levy and Grewal (1992) and Gagnon and Osterhaus (1985). These studies differ
from this paper in that they use aggregated data or data from a pre-selected set of stores. Thus,
they do not address the problem of store selection for a retail chain selling short lifecycle products.
Our paper is related to Fisher and Rajaram (2000), who present an experimental methodology
for testing new merchandise at a subset of stores prior to launch and demonstrate results from
application to a women’s apparel retail chain. Their paper differs from ours since they do not
consider price as a decision variable. Further, their experiment design does not involve multiple
treatment effects, and therefore, does not require a matched subset of stores to be selected.
3 Experiment Design: Store Selection Model
Let n be the number of stores in the retail chain, p the number of price-points to be tested in the
experiment, and m the number of stores at which each price-point is to be repeated (mp ≤ n). In
the terminology of experiment design, each price-point is called a ‘treatment’, so that we have p
treatments and m repetitions of each treatment. We construct a randomized block design for the
experiment. We first partition all the stores into m disjoint blocks such that each block has at least
p stores and the stores in each block are as ‘alike’ as possible. We then select p stores in each block
that are geographically far from each other and randomly assign them to the p price-points. Thus,
each price-point is tested once in each of the m blocks.
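As a minimal sketch of the assignment step, assuming the blocks and the p candidate stores within each block have already been chosen, the following Python code randomly maps the stores of each block one-to-one to the price-points. The store identifiers, price labels, function name and seed are hypothetical and serve only as an illustration.

```python
import random

def assign_treatments(blocks, price_points, seed=0):
    """Randomly assign the stores of each block one-to-one to the price-points."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    layout = {}
    for block_id, stores in blocks.items():
        if len(stores) != len(price_points):
            raise ValueError("each block needs exactly one store per price-point")
        shuffled = list(stores)
        rng.shuffle(shuffled)  # randomization within the block
        layout[block_id] = dict(zip(price_points, shuffled))
    return layout

# Hypothetical example: m = 2 blocks, p = 3 price-points.
blocks = {1: ["s1", "s2", "s3"], 2: ["s4", "s5", "s6"]}
layout = assign_treatments(blocks, ["low", "middle", "high"])
```

Each block then contributes exactly one store to every price-point, so every treatment is replicated once per block.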
The randomized block design is used because it satisfies three principles of experiment design:
replication, randomization, and local control. The replication of each treatment m times gives a
basis for the estimation of the experiment error. Randomization within each block is a necessary
condition for obtaining a valid estimate of the effects of the treatments on the experiment results
since it controls for unknown differences between stores that may be sources of error in the experi-
ment. Local control implies that the stores assigned to each block be chosen as alike as possible for
the comparison of treatment effects within each block. These three principles were formulated by R.
A. Fisher (1923, 1926). See Montgomery (1991), and Mason, Gunst, and Hess (1989) for an intro-
duction to experiment design, and Ghosh and Rao (1996) for survey articles on the mathematical
properties of experiment design methods.2
Our experiment has one other requirement in addition to these principles. We need to ensure
that the results of our experiment are representative, i.e., they provide an accurate forecast of the
sales in the entire chain at each treatment. Therefore, we require that the stores assigned to each
price-point should represent the sales characteristics of the entire chain.
2 When the number of experimental units per block is smaller than the number of treatments, then an incomplete
block design is used. While we use a complete block design, the method of store selection we present may be used in
conjunction with an incomplete block design as well.
To measure the degree of (dis)similarity between stores, we use a metric given by Fisher and
Rajaram (2000). They define the ‘distance’ or the degree of dissimilarity between two stores as
the difference between their sales distributions across product categories. Let l = 1, . . . , q index
the product categories sold by the retailer, and fsl be the fraction of sales of store s realized from
product category l. The distance between the sales distributions of stores s and t, denoted dst, is
computed using the Euclidean norm,
$$d_{st} = \sqrt{\sum_{l=1}^{q} (f_{sl} - f_{tl})^2}. \tag{1}$$
Fisher and Rajaram compare this distance metric with several alternatives. They find that the
partition of stores obtained using (1) provides a more accurate forecast of total chain sales than
distance measures defined solely on store size or geographical location. Our methodology remains
unchanged if an alternative measure of dissimilarity between stores is used.
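The distance in (1) can be illustrated with a short Python sketch. The category sales figures and function names below are hypothetical; only the formula itself comes from the text.

```python
from math import dist

def sales_fractions(category_sales):
    """Convert a store's dollar sales per category into fractions of its total sales."""
    total = sum(category_sales)
    return [v / total for v in category_sales]

def store_distance(sales_s, sales_t):
    """Euclidean distance between the sales distributions of two stores, as in (1)."""
    return dist(sales_fractions(sales_s), sales_fractions(sales_t))

# Hypothetical dollar sales of two stores across q = 3 product categories.
store_a = [500.0, 300.0, 200.0]
store_b = [200.0, 300.0, 500.0]
d_ab = store_distance(store_a, store_b)
```

Because the metric is computed on fractions rather than dollar amounts, two stores with the same category mix are at distance zero regardless of their size.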
We now assign stores to clusters (henceforth, ‘cluster’ is used synonymously with ‘block’). We use
a variant of the k-median problem (see Nemhauser and Wolsey 1988). Each cluster is represented
by its median store. The degree of dissimilarity within each cluster is defined as the sum of
the distances of all stores in that cluster from the median store. The objective of the problem
formulation is to minimize the total sum of dissimilarities within each cluster.
Let yk be 1 if store k is chosen as the median of a cluster and 0 otherwise. Also let xsk be
1 if store s is assigned to the cluster with store k as its median and 0 otherwise. The problem
formulation with decision variables yk and xsk is as follows:
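$$\text{minimize} \quad \sum_{s,k} d_{sk} x_{sk} \tag{2}$$
subject to
$$\sum_{k} x_{sk} = 1, \quad s = 1, \ldots, n \tag{3}$$
$$x_{sk} \le y_k, \quad k = 1, \ldots, n,\; s = 1, \ldots, n \tag{4}$$
$$\sum_{k} y_k = m \tag{5}$$
$$\sum_{s} x_{sk} \ge p\, y_k, \quad k = 1, \ldots, n \tag{6}$$
$$x_{sk}, y_k \in \{0, 1\}, \quad k = 1, \ldots, n,\; s = 1, \ldots, n. \tag{7}$$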
The objective function (2) is the sum of the distance of each store from the median of its cluster.
Constraint (3) ensures that each store is assigned to exactly one cluster; (4) ensures that stores are
assigned only to the median stores of their respective clusters; (5) restricts the number of clusters
to m; finally (6) defines the additional constraint that each cluster has at least p stores. We solve
this problem using the standard branch-and-bound algorithm for integer programming using the
CPLEX solver in GAMS.
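To make the formulation concrete, the sketch below solves a tiny instance of (2)-(7) by exhaustive enumeration in Python. This is not the branch-and-bound approach used in the paper; brute force is substituted here only because it is easy to verify on a handful of stores, and it does not scale. The sales fractions and the function name are hypothetical.

```python
from itertools import combinations, product
from math import dist, inf

def k_median_min_size(d, m, p):
    """Brute-force solution of (2)-(7): choose m medians, assign every store to
    exactly one median, and require at least p stores per cluster."""
    n = len(d)
    best_cost, best_assignment = inf, None
    for medians in combinations(range(n), m):          # constraint (5)
        for assignment in product(medians, repeat=n):  # constraints (3), (4)
            sizes = [assignment.count(k) for k in medians]
            if min(sizes) < p:                         # constraint (6)
                continue
            cost = sum(d[s][k] for s, k in enumerate(assignment))  # objective (2)
            if cost < best_cost:
                best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

# Hypothetical sales fractions for n = 6 stores over q = 2 categories.
fractions = [(0.50, 0.50), (0.52, 0.48), (0.48, 0.52),
             (0.80, 0.20), (0.82, 0.18), (0.78, 0.22)]
d = [[dist(fs, ft) for ft in fractions] for fs in fractions]
cost, assignment = k_median_min_size(d, m=2, p=3)
```

In the returned tuple, entry s is the index of the median store to which store s is assigned. For realistic chain sizes an integer programming solver, as used in the paper, is the appropriate tool.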
We note that the k-median problem formulation is a non-hierarchical clustering technique.
It gives an optimal solution by solving an integer program for a fixed k, but the problem is NP-hard and
thus computationally intensive, so the approach is suited to smaller datasets. Additionally, it has the nice property that
the clusters formed are convex in the Euclidean space. In contrast, hierarchical cluster analysis
uses a greedy algorithm that iteratively combines together clusters with the minimum distance
between them in each step, but does not allow reallocation of stores that may have been poorly
classified in an early iteration. Thus, it need not give the optimal solution, but runs in polynomial
time and is particularly suited to large datasets. Further, it is useful when k is not known a priori.
Hierarchical cluster analysis requires a measure of distance not only between individual stores but
also between clusters of stores. Several distance metrics have been proposed, such as the distance
between the centroids of the clusters, or as the average distance between all pairs of stores across
the two clusters. For computing distance between singleton clusters, these methods are identical to
the Euclidean norm used by us. See Everitt, et al. (2001) for a comparison of alternative clustering
techniques.
4 Application
Zany Brainy, Inc.3 is a specialty retailer of high quality educational toys for children less than 12
years old. It has 53 retail stores in the U.S. with total sales of about $200 million. It faces a pricing
problem for its products every season when it launches new products and determines their prices.
The main characteristics of Zany Brainy’s business are as follows:
1. It sells products across eleven categories, such as games and puzzles, arts and crafts, sport-
theme toys, building toys and trains, infant development toys, electronic learning aids and
science-related toys, etc. The retailer has a unique image and does not sell toys that reinforce
gender stereotypes or encourage violence.
2. Most products sold at Zany Brainy are exclusive, with only 30% of the products being common
with discounters and mass-market retailers.
3. A typical store is 10,600 square feet and carries 15,000 skus. Stores are located in suburban
shopping centers, sharing retail space with destination and lifestyle-oriented retailers.
4. Zany Brainy seeks to set prices that give value to the customer, without trying to be the
discount leader in its market. Therefore, it does not use cost-plus pricing. Instead, merchan-
disers determine the markup for each item individually using their experience and judgement.
Products vary in price from less than $1 up to $200, with the average price for a single product
being less than $10. The average gross margin across all products is close to 50%.
3 All data for Zany Brainy, Inc. are based on the year 2000.
Zany Brainy was interested in adopting a more scientific approach to pricing. As a first step
towards this, it decided to conduct an in-store experiment with a subset of products in its chain
to observe how their demand varies with price. It was also interested in exploring the long-term
usefulness of in-store experimentation to learn about consumer response to various marketing levers
employed in its stores, such as “item of the week” promotion, salesperson push, advertisements,
store layout, assortment planning, etc.
Store Selection Model: We are given data for 53 stores, and each item is to be tested at three
price-points in six stores each. Thus, n = 53, m = 6 and p = 3. To measure the distances between
stores, we classify the annual dollar sales of each store into eleven product categories used by the
firm and compute the fraction of sales in each category. The average distance between stores thus
computed is 0.5347. After solving the k-median problem, the average distance of a store from the
median of its cluster is 0.2228, a 58% reduction. Since clustering per se does not provide tests of
statistical significance, we use an analysis of variance test to determine whether the clusters formed
are representative of the chain, i.e., whether they explain a statistically significant proportion of
the variation in sales distribution of all stores. The test is statistically significant, with a p-value below 0.001.
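As a quick check of the reported reduction, using only the two average distances quoted above:

```latex
\frac{0.5347 - 0.2228}{0.5347} = \frac{0.3119}{0.5347} \approx 0.583
```

That is, clustering reduces the average distance by roughly 58%.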
Now, from each cluster, three stores are selected for the experiment using additional criteria
to further control for dissimilarities within clusters. The selected stores are similar in age and size
(measured by total dollar sales), and their geographical locations are relatively isolated from other
stores belonging to the chain. Table 1 lists the stores selected for the experiment, their opening
years, their total year-to-date sales and the percentage of sales coming from the five largest product
categories.
Ideally, we would like to conduct the experiment in as many stores as possible to obtain a
large dataset for analysis. However, the choice of the number of stores was limited due to the
opportunity cost of lost sales in the stores under experimentation, and the complexity of managing
the controlled experiment such that there are no execution errors. The management of the firm was
also concerned about interference between stores located close to each other. If a large number of
stores were used in the experiment, then a customer visiting two neighboring stores would discover
the difference in their prices. This would not only introduce an error in the experiment but also
result in a loss of goodwill for the retailer.
Description of Items: The experiment was conducted for the following three products: a family
game center, ‘phonics traveler’, and a headset walkie-talkie. The family game center is an unbranded
board game. It is a simple toy that customers can try out in the store. The phonics traveler is a
branded toy, produced by Leapfrog Enterprises, Inc. It is a complex electronic product to teach
spelling and reading to children through interaction. The headset walkie-talkie is also a complex
electronic toy, but is unbranded. The family game center and the headset walkie-talkie are not
carried by the competition. Each item is unique to avoid comparison with other brands in the
same category.
Table 2 lists the items, their price-points and their purchase costs. The middle price-point in
each case is the existing list price and the high and low price-points are five dollars above and
below the list price. These price-points were chosen by the firm’s merchandising managers to be
sufficiently far from each other in order that they cause observable changes in demand.
Experiment Layout: Table 3 shows the layout of the experiment. In each cluster, one store is
designated the control store and assigned each item at the middle price-point. The remaining two
stores are randomly assigned the high and low price-points for each item. For example, store 103 is
the control store in cluster 1, store 102 has the Family Game Center at its highest price-point, and
the Phonics Traveler and the Headset Walkie-Talkie at their lowest price-points, and store 204 is
assigned the remaining price-points. We note that, as an alternative to this layout, all three prices
could be randomized across all 18 stores. Doing so would eliminate a potential confound if there is
interaction in demand across the products under experiment. We did not consider this possibility
since the products in our experiment belong to different categories, and are physically located in
different parts of the store.
Data Collection: The experiment was conducted for a period of six weeks. The length of the
time period was fixed as a compromise between our desire to have a long time period to collect
data, and the managers’ keenness to avoid the experiment encroaching on their peak selling season.
The data collected at each store for each week were the unit sales of the three items, the number
of returns, the beginning-of-week inventories of the three items, and the total number of sales
transactions in the store across all items. The total number of sales transactions in the store was
used to measure customer traffic in order to control for differences in store size.
Since store managers were kept unaware of the experiment, inventory control and data quality
were managed centrally. An inventory planner in the corporate office was responsible for monitoring
beginning-of-week inventories to ensure that there was no stock-out. Various studies on retail
supply chains have demonstrated that data quality tends to be poor and that much work needs
to be done on upgrading data quality so that data could be useful in analysis (see, for example,
Raman, DeHoratius and Ton 2001). Thus, the inventory planner monitored sales data for the
experiment daily to safeguard against data quality problems.
Some precautions were observed during the experiment: (1) The price labels in each case were
changed to reflect the new prices. The labels did not show the original list price, so that customers
would not perceive that a product was marked up or marked down. (2) Sufficient inventory was kept
in the experimental stores to avoid stock-outs. (3) The store managers were not informed about
the experiment to avoid any execution differences that may arise because of managers treating the
experimental items differently, or trying to promote them based on their price-points.
5 Results
Table 4 summarizes the results of the experiment. For each item-store combination, it shows the list
price and the total number of units sold over six weeks. There were no stock-outs in the experiment
stores as sufficient inventory was provided. The last column in the table gives the average number
of sales transactions per week recorded in each store. Figure 1 shows a plot of the total sales of
each product at each price-point.
Note that the total sales of the family game center and the phonics traveler are downward sloping
in price. The family game center recorded total sales of 7 units, 5 units and 3 units at prices of
$19.99, $24.99 and $29.99, respectively. The phonics traveler recorded total sales of 33 units, 26
units and 15 units at prices of $24.99, $29.99 and $34.99, respectively. However, the headset walkie-
talkie shows a different pattern. Its sales of 74 units at the middle price-point are much higher than
the sales of 47 units at the lowest price-point and the sales of 36 units at the highest price-point.
This finding was unexpected to us as well as to the managers at the retail chain. To ascertain
whether this finding is statistically significant, we fit a demand model to the experimental data
expressing demand as a function of categorical variables for the three price-points.
Since all three products are slow-moving items, we represent weekly demand in each store with
a Poisson distribution, and thus, use a Poisson regression model for statistical analysis (see Greene
1997: Chapter 19 for details). We assume that mean weekly demand follows a multiplicative model
and is given by a product of cluster-specific, price-specific, and store size specific variables. Let
ykit denote random demand in the store in cluster k at price-point i in week t, and λkit denote the
mean of ykit. Here, k = 1, . . . ,m and i ∈ {L,M,H}, denoting low price, middle price and high
price, respectively. We write λkit as
$$\lambda_{kit} = a_k b_i (x_{kit})^{c}, \tag{8}$$
where ak is a cluster-specific constant to control for differences between clusters, bi is a price-specific
constant, xkit is the number of transactions in the store in cluster k at price-point i in week t, and
c is a coefficient representing the increase in mean demand with store size.
The regression model is used to estimate ak, bi and c. We represent the indices k and i by
dummy variables, so that ak and bi are the coefficients of their respective dummy variables. Since
the coefficients’ matrix must be non-singular, it is not possible to estimate all coefficients separately.
Therefore, we set the value of bM to be 1 and use two dummy variables to estimate bL and bH
relative to bM . The model (8) is estimated by maximum likelihood estimation (MLE). The log
likelihood function for Poisson distribution is:
$$\log L = \sum_{k,i,t} \left[ -\lambda_{kit} + y_{kit} \left( \log a_k + \log b_i + c \log x_{kit} \right) - \log (y_{kit}!) \right]. \tag{9}$$
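The Poisson log-likelihood under the multiplicative model (8) can be sketched in Python as follows. The parameter values and observations are hypothetical and are not estimates from the experiment; the function names are likewise illustrative.

```python
from math import log, lgamma

def mean_demand(a_k, b_i, x, c):
    """Mean weekly demand from the multiplicative model (8)."""
    return a_k * b_i * x ** c

def poisson_log_likelihood(observations, a, b, c):
    """Sum the Poisson log-likelihood over (cluster, price-point, transactions, units) tuples."""
    total = 0.0
    for k, i, x, y in observations:
        lam = mean_demand(a[k], b[i], x, c)
        # -lambda + y * log(lambda) - log(y!), where lgamma(y + 1) equals log(y!)
        total += -lam + y * log(lam) - lgamma(y + 1)
    return total

# Hypothetical parameters: one cluster, three price-points, b_M fixed at 1.
a = {1: 0.01}
b = {"L": 1.2, "M": 1.0, "H": 0.8}
obs = [(1, "M", 100.0, 2), (1, "H", 100.0, 0)]
ll = poisson_log_likelihood(obs, a, b, c=1.0)
```

Maximum likelihood estimation amounts to choosing the values of a, b and c that maximize this function over the observed weekly data.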
We test whether demand is downward sloping with price for the first two products using the
following hypothesis:
Hypothesis 1. For the phonics traveler and the family game center, bL ≥ bM ≥ bH .
For the headset walkie-talkie, we test whether demand at the middle price-point is higher than
that at lower and higher price-points using the following hypothesis:
Hypothesis 2. For the headset walkie-talkie, bL ≤ bM and bM ≥ bH .
Since bM is set to 1, Hypothesis 1 is equivalent to testing that log bL ≥ 0 and log bH ≤ 0.
Hypothesis 2 is equivalent to testing that log bL ≤ 0 and log bH ≤ 0. The Student’s t test is used
in each case because the estimates of log bi are asymptotically normally distributed.
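
A minimal sketch of such a test, assuming a one-sided alternative and the standard normal as the reference distribution (the large-sample limit of the t distribution). The function name `one_sided_p_value` is hypothetical. The inputs are the estimate and standard error of log bH for the phonics traveler reported in Table 5.

```python
import math

def one_sided_p_value(estimate: float, std_error: float) -> tuple:
    """Return (test statistic, one-sided p-value) under a normal approximation."""
    t_stat = estimate / std_error
    # Probability of a standard normal value at least as extreme as t_stat
    # in the direction of its sign.
    p_value = 0.5 * math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value

# log b_H for the phonics traveler (Table 5): estimate and standard error.
t_stat, p_value = one_sided_p_value(-0.6323, 0.3248)
print(round(t_stat, 2), round(p_value, 4))
```

The statistic computed this way differs from the tabulated one in the last digits because the table entries are rounded.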
Table 5 gives the parameter estimates obtained from MLE. For each item, the first two rows of
the table give the estimates of log bL and log bH setting a value of zero for the middle price-point.
The next six coefficient estimates give the values of log ak for the six clusters. The last estimate
gives the value of c, the exponent of xkit.
We observe that bi decreases with price for the family game center and the phonics traveler.
For example, for the family game center, the estimate of log bi is 0.3277 at the lower price, 0 at
the middle price, and -0.5249 at the higher price. Further, computing t-statistics and p-values, we
observe that the values of bi are statistically significant (p<0.05) at the higher price for the phonics
traveler and at both price levels for the headset walkie-talkie. Thus, Hypothesis 1 is supported
for the phonics traveler at the 95% confidence level but not for the family game center. The lack of
significance for the family game center may be because the quantity of sales registered at each
price-point for this product is too small for statistical analysis.
For the headset walkie-talkie, Hypothesis 2 is supported at the 99% confidence level. Thus,
the finding that the sales at the middle price-point exceed those at the low and high price-points
is statistically validated.
6 Discussion
6.1 Reasons for upward sloping demand
To understand the reasons for the demand curve observed for the headset Walkie-Talkie, we dis-
cussed the results with the managers of the subject firm and several other retailers. The following
explanations emerged from our discussions:
1. Price as an indicator of quality: The headset walkie-talkie is a complex electronic item. The
consumers find it difficult to judge its quality, and therefore, use price as an indicator of quality.
The managers used wine as another example where consumers might be expected to use price as an
indicator of quality. This argument did not apply to the family game center because it is a board
game, and easily understood by the customer, so that price need not be used as an indicator of
quality. It also did not apply to the phonics traveler because it is a branded item.
This explanation is consistent with those given by Dodds et al. (1991), Gerstner (1985), Lichtenstein
and Burton (1989), Tellis and Wernerfelt (1987) and Zeithaml (1988) for consumers using
price as an indicator of quality. Our finding extends the insights from these articles because it
is based on actual sales. Further, some of these papers are based on comparing price and quality
across products in a category, while we document the increase in sales with price for the same prod-
uct due to a likely increase in quality perception. For example, if a reputed and a not so reputed
brand sell the same toy for $30 and $25, respectively, then it is possible that the more reputed
brand has a higher quality and thus a higher demand in spite of the higher price. However, our
finding implies that the demand of the more reputed brand might increase if its price is increased
from $30 to $35.
2. Sweet spot of pricing: The price-point $19.99 is more popular for gift purchases than $14.99.
Consumers may like the headset walkie-talkie as a gift item, so that the unit sales at $19.99 exceed
those at $14.99.
6.2 Implications of Informational Role of Price
We first note that if price plays an informational role, then it leads to a reduction in the demand
and the expected profit for the subject item. To see this, consider the headset Walkie-Talkie. If
price did not play an informational role and consumers had full information about the quality of
this item from other sources, then the demand for this item would be downward sloping in price. A
conservative estimate of the price elasticity of demand is obtained from the two higher price points
of $19.99 and $24.99.⁴ We find that the price elasticity is -3.16 (see Appendix for the computations).
With this price elasticity, the price that maximizes the profit is $16.09 and the maximum profit
is $743.50. Compared to the profit of $665.26 realized at the middle price-point ($19.99), this
represents an increase of 12%. In such a case, it might be advantageous for the retailer to provide
additional signals of quality to increase demand and profits.
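
The profit comparison can be checked with a few lines of arithmetic. The unit cost comes from Table 2, the unit sales at the middle price-point from Table 4, and the maximum profit is the figure reported above. The variable names are illustrative.

```python
# Figures taken from Table 2, Table 4 and the text of this section.
unit_cost = 11.00             # purchase cost of the headset walkie-talkie ($)
middle_price = 19.99          # existing list price ($)
units_at_middle = 74          # total units sold at the middle price-point
reported_max_profit = 743.50  # profit at the profit-maximizing price, as reported

# Gross profit realized at the middle price-point.
realized_profit = units_at_middle * (middle_price - unit_cost)

# Relative gain from moving to the profit-maximizing price.
relative_gain = reported_max_profit / realized_profit - 1.0

print(f"{realized_profit:.2f}", f"{relative_gain:.1%}")
```

The relative gain is a little under 12%, which agrees with the rounded figure quoted in the text.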
However, the informational role of price can be beneficial to the retailer due to product substi-
tution. Suppose that the retailer offers an assortment of items at different price-points and that the
consumers are willing to substitute between these items. Then using price as a signal for quality
could enable the retailer to capture a larger consumer surplus by promoting substitution.
7 Conclusions
We have presented a methodology for conducting experiments in a retail store that can help retail-
ing managers learn more about their consumers. This methodology is useful not just for finding
consumer reactions to different price-points, but also to test the effects of other marketing levers
on sales. The critical aspect of the methodology is the selection of stores for the experiment. We
have shown how the differences between stores may be defined using their sales characteristics and
used to partition stores into a randomized block design. This technique is advantageous because
an experiment with a small number of experimental stores can yield accurate results applicable to
the entire chain.
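
The within-block randomization can be sketched as follows, using the store clusters listed in Table 1. This is an illustration only: the function name `assign_price_points`, the price-point labels, and the seed are assumptions, and the output will not reproduce the particular assignment shown in Table 3.

```python
import random

PRICE_POINTS = ("low", "middle", "high")

def assign_price_points(clusters: dict, seed=None) -> dict:
    """Randomly assign the stores of each cluster (block) to the price-points."""
    rng = random.Random(seed)
    assignment = {}
    for cluster, stores in clusters.items():
        if len(stores) != len(PRICE_POINTS):
            raise ValueError(f"cluster {cluster} needs {len(PRICE_POINTS)} stores")
        shuffled = list(stores)
        rng.shuffle(shuffled)  # random order within the block
        for store, price in zip(shuffled, PRICE_POINTS):
            assignment[store] = price
    return assignment

# Store clusters from Table 1.
clusters = {
    1: [102, 204, 103],
    2: [108, 402, 105],
    3: [302, 303, 205],
    4: [325, 526, 401],
    5: [107, 527, 326],
    6: [110, 504, 503],
}

assignment = assign_price_points(clusters, seed=0)
for cluster, stores in clusters.items():
    print(cluster, {store: assignment[store] for store in stores})
```

Each price-point is tested once in every cluster, so differences between clusters do not bias the comparison across price-points.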
While our paper shows the usefulness of in-store experimentation, it has some limitations that
may be addressed in future research. First, our paper provides a single statistically significant
piece of evidence that demand can be upward-sloping in price. This finding needs to be replicated
through further experimentation to demonstrate its validity. A wider range of products and price
points should be considered to systematically study how consumers react to prices and extend our
results. Second, the experiment may be conducted with fast-moving items or for a larger number
of weeks to collect more data for statistical analysis. Finally, the informational role of price may be
quantified in future research to investigate the conditions under which such a role is advantageous
or disadvantageous to the retailer. Such research would be useful to retailers that use different
pricing strategies and provide different levels of quality signals for their products.
⁴ This estimate is conservative since the observed sales are net of the informational and allocative roles of price.
References
Baker, J., M. Levy, D. Grewal. 1992. An experimental approach to making retail store environ-
mental decisions. Journal of Retailing 68 445-460.
Brodie, R., C. Kluyver. 1984. Attraction Versus Linear and Multiplicative Market Share Models:
An Empirical Evaluation. Journal of Marketing Research 21 194-201.
Curhan, R. 1974. The Effects of Merchandising and Temporary Promotional Activities on the
Sales of Fresh Fruits and Vegetables in Supermarkets. Journal of Marketing Research 11
286-294.
Dodds, W. B., K. B. Monroe, D. Grewal. 1991. Effects of Price, Brand and Store Information on
Buyers’ Product Evaluations. Journal of Marketing Research 28 307-319.
Etgar, M., N. Malhotra. 1981. Determinants of Price Dependency: Personal and Perceptual
Factors. Journal of Consumer Research 8 217-222.
Everitt, B. S., S. Landau, M. Leese. 2001. Cluster Analysis. 4th ed. Edward Arnold.
Fisher, M. L., K. Rajaram. 2000. Accurate Retail Testing of Fashion Merchandise: Methodology
and Application. Marketing Science 19 266-278.
Fisher, R. A. 1923. Studies on crop variation II: The manurial response of different potato varieties.
Journal of Agricultural Science 13 311-320.
Fisher, R. A. 1926. The arrangement of field trials. Journal of the Ministry of Agriculture 33 503-513.
Gagnon, J. P., J. T. Osterhaus. 1985. Effectiveness of Floor Displays on the Sales of Retail
Products. Journal of Retailing 61 104-116.
Gerstner, E. 1985. Do Higher Prices Signal Higher Quality? Journal of Marketing Research 22
209-215.
Ghosh, A., S. Neslin, R. Shoemaker. 1984. A Comparison of Market Share Models and Estimation
Procedures. Journal of Marketing Research 21 202-210.
Ghosh, S., C. R. Rao (eds.) 1996. Design and Analysis of Experiments. Handbook of Statistics,
Vol 13. Elsevier Science, Amsterdam.
Greene, W. H. 1997. Econometric Analysis. 3rd ed., Prentice Hall, New Jersey.
Lichtenstein, D. R., S. Burton. 1989. The Relationship Between Perceived Quality and Objective
Price-Quality. Journal of Marketing Research 26 429-443.
Mason, R., R. Gunst, J. Hess. 1989. Statistical Design and Analysis of Experiments. John Wiley
& Sons, New York.
Montgomery, D. 1991. Design and Analysis of Experiments. 3rd ed., John Wiley & Sons, New
York.
Naert, Ph., M. Weverbergh. 1981. On the Prediction Power of Market Share Attraction Models.
Journal of Marketing Research 21 202-210.
Nagle, T. 1984. Economic Foundations for Pricing. Journal of Business 57 S3-S26.
Nemhauser, G., L. Wolsey. 1988. Integer and Combinatorial Optimization. John Wiley & Sons,
New York.
Neslin, S., R. Shoemaker. 1983. Using a Natural Experiment to Estimate Price Elasticity: The
1974 Sugar Shortage and the Ready-to-Eat Cereal Market. Journal of Marketing 47 44-57.
Nevin, J. 1974. Laboratory Experiments for Estimating Consumer Demand: A Validation Study.
Journal of Marketing Research 9 261-268.
Raman, A., N. DeHoratius, Z. Ton. 2001. Execution: The Missing Link in Retail Operations.
California Management Review 43 136-152.
Rao, V. R. 1984. Pricing Research in Marketing: The State of the Art. Journal of Business 57
S39-S60.
Rao, V. R. 1993. Pricing Models in Marketing. Chapter 11 in Handbooks in OR & MS. Vol. 5.
Marketing. J. Eliashberg and G. L. Lilien, eds., North-Holland, Amsterdam. 517-552.
Rao, V. R., H. Sattler. 2000. Measurement of Price Effects with Conjoint Analysis: Separating
Informational and Allocative Effects of Price. Chapter 2 in Conjoint Measurement: Methods
and Applications. Springer-Verlag, Berlin, Germany.
Roberts, J. H., G. L. Lilien. 1993. Explanatory and Predictive Models of Consumer Behavior.
Chapter 2 in Handbooks in OR & MS. Vol. 5. Marketing. J. Eliashberg and G. L. Lilien,
eds., North-Holland, Amsterdam. 27-82.
Tellis, G. J. 1988. Price Elasticity of Selective Demand: A Meta-Analysis of Econometric Models
of Sales. Journal of Marketing Research 25 331-341.
Tellis, G. J., B. Wernerfelt. 1987. Competitive Price and Quality under Asymmetric Information.
Marketing Science 6 240-253.
Zeithaml, V. 1988. Consumer Perceptions of Price, Quality and Value: A Means-End Model and
Synthesis of Evidence. Journal of Marketing 52 2-22.
Appendix: Estimation of Price Elasticity of Demand
Assume that the mean demand depends on price according to a constant elasticity demand curve.
Then, (8) in the Poisson regression model of §5 is modified to
$$\lambda_{kit} = a_k \, (P_{ki})^{\eta} \, (x_{kit})^{c},$$
where Pki is the i-th price-point in cluster k, η is the price elasticity of demand, and the other
variables have the same definition as before. The log likelihood function is thus given by:
$$\log L = \sum_{k,i,t} \left[ -\lambda_{kit} + y_{kit} \left( \log a_k + \eta \log P_{ki} + c \log x_{kit} \right) - \log (y_{kit}!) \right].$$
We compute the maximum likelihood estimates of ak, η and c. The price elasticity of demand for
the phonics traveler is found to be -2.02 (standard error = 0.90, p-value = 0.01). For the family
game center, it is found to be -1.95 (standard error = 1.64, p-value = 0.12). The lower significance
level for the family game center may be because it is a very slow-moving item with a few units sold
at each price. For the headset Walkie-Talkie, the price elasticity of demand estimated from the two
higher price points of $19.99 and $24.99 is -3.16 (standard error = 1.18, p-value < 0.01).
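
As a rough cross-check on the headset walkie-talkie estimate, a two-point log-log elasticity can be computed from the pooled unit sales in Table 4 (74 units at $19.99 and 36 units at $24.99). This is a simplification, not the maximum likelihood estimate above: it ignores the cluster and store-size controls, and the function name `two_point_elasticity` is hypothetical.

```python
import math

def two_point_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Constant-elasticity estimate from two (price, quantity) points."""
    return math.log(q2 / q1) / math.log(p2 / p1)

# Pooled unit sales of the headset walkie-talkie at the two higher prices (Table 4).
eta = two_point_elasticity(19.99, 74, 24.99, 36)
print(round(eta, 2))
```

The pooled figure is close to the reported estimate of -3.16, which suggests that the controls change the estimate only modestly for this product.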
The optimal price is found by maximizing the standard expected profit function,
$$E[\pi] = P_{ki}\lambda_{kit} - C\lambda_{kit} = \text{constant} \times \left( P_{ki}^{1+\eta} - C P_{ki}^{\eta} \right),$$
where C denotes unit cost. This gives an optimal price of Pki = ηC/(1 + η). Its value is $35.65 for
the phonics traveler for a cost of $18 and a profit margin of $17.65. For the family game center, it
is equal to $22.58 for a cost of $11 and a profit margin of $11.58. Thus, the price for the phonics
traveler is found to be higher than the existing price of $29.99 and the price for the family game
center is found to be lower than the existing price of $24.99 at this retail chain. The increase in
expected gross profit from moving to the optimal price is 3.8% for the phonics traveler and 0.9%
for the family game center. The pricing problem for the headset Walkie-Talkie is discussed in §6.2.
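
The optimal prices follow from setting the derivative of the profit function to zero, which gives (1 + η)P^η = ηCP^(η−1) and hence P = ηC/(1 + η). The formula is meaningful only for η < −1. A short sketch that reproduces the prices quoted in this paper from the estimated elasticities and the purchase costs in Table 2 (the function name `optimal_price` is an assumption):

```python
def optimal_price(elasticity: float, unit_cost: float) -> float:
    """Profit-maximizing price under constant-elasticity demand."""
    if elasticity >= -1.0:
        raise ValueError("a finite optimal price requires elasticity < -1")
    return elasticity * unit_cost / (1.0 + elasticity)

# (product, estimated elasticity, purchase cost in $)
products = [
    ("Phonics Traveler", -2.02, 18.0),
    ("Family Game Center", -1.95, 11.0),
    ("Headset Walkie-Talkie", -3.16, 11.0),
]

for name, eta, cost in products:
    price = optimal_price(eta, cost)
    print(f"{name}: price = {price:.2f}, margin = {price - cost:.2f}")
```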
Table 1: Summary of the stores used in the experiment, classified into clusters

                  Opening   Year-to-date      % Sales from each category
Cluster   Store   year      sales ($ '000)    1       2       3       4       5
1         102     92        680.8             12.1    5.7     8.4     8.0     8.3
1         204     94        514.1             12.4    4.6     7.8     9.4     8.9
1         103     93        610.5             11.2    5.2     8.8     8.5     8.2
2         108     96        481.7             9.7     12.0    7.3     9.9     10.6
2         402     94        477.9             8.7     15.2    6.7     9.6     9.8
2         105     94        492.2             8.1     12.3    6.1     9.9     9.6
3         302     94        432.6             9.6     6.0     8.4     10.2    8.6
3         303     95        434.1             10.7    6.3     7.8     11.4    8.2
3         205     95        466.2             10.6    5.7     7.7     10.3    9.7
4         325     96        561.8             10.7    6.0     7.7     8.6     9.2
4         526     96        556.1             10.9    6.5     8.3     8.6     9.9
4         401     94        644.1             11.1    6.7     7.3     7.6     8.6
5         107     94        523.5             10.2    4.3     6.1     11.2    12.0
5         527     96        441.8             11.1    5.8     6.1     11.1    13.2
5         326     96        438.1             10.3    5.3     7.3     10.5    12.6
6         110     95        595.8             11.5    9.0     7.5     8.2     7.3
6         504     96        553.3             12.3    9.4     8.3     8.9     6.3
6         503     96        547               12.9    9.9     7.8     8.6     6.3

Table 2: Summary of products and price-points used in the experiment

                            Prices ($)
Product                     Low      Medium (existing list price)   High     Purchase cost ($)
A: Family Game Center       19.99    24.99                          29.99    11
B: Phonics Traveler         24.99    29.99                          34.99    18
C: Headset Walkie-Talkie    14.99    19.99                          24.99    11

Table 3: Experiment layout showing the random assignment of stores in each cluster to price-points for each product

(a) Family Game Center
           Prices ($)
Cluster    19.99    24.99    29.99
1          204      103      102
2          108      105      402
3          302      205      303
4          325      401      526
5          107      326      527
6          504      503      110

(b) Phonics Traveler
           Prices ($)
Cluster    24.99    29.99    34.99
1          102      103      204
2          108      105      402
3          303      205      302
4          526      401      325
5          107      326      527
6          504      503      110

(c) Headset Walkie-Talkie
           Prices ($)
Cluster    14.99    19.99    24.99
1          102      103      204
2          108      105      402
3          303      205      302
4          325      401      526
5          527      326      107
6          504      503      110

Table 4: Total sales recorded for each product in each store

                  Family Game Center     Phonics Traveler       Headset Walkie-Talkie   Average number of
                           Total unit             Total unit             Total unit      transactions
Cluster   Store   Price    sales         Price    sales         Price    sales           per week
1         102     29.99    0             24.99    6             14.99    15              1954.83
1         204     19.99    2             34.99    0             24.99    5               1780.67
1         103     24.99    1             29.99    3             19.99    16              1741.50
2         108     19.99    2             24.99    3             14.99    8               1485.33
2         402     29.99    0             34.99    6             24.99    6               1355.50
2         105     24.99    1             29.99    2             19.99    18              1714.00
3         302     19.99    0             34.99    2             24.99    12              1294.17
3         303     29.99    0             24.99    6             14.99    4               1263.33
3         205     24.99    0             29.99    2             19.99    10              1343.00
4         325     19.99    2             34.99    0             14.99    9               1911.33
4         526     29.99    0             24.99    12            24.99    6               1872.17
4         401     24.99    3             29.99    10            19.99    9               1916.67
5         107     19.99    1             24.99    5             24.99    5               1795.00
5         527     29.99    1             34.99    1             14.99    6               1266.17
5         326     24.99    0             29.99    5             19.99    8               1328.33
6         110     29.99    2             34.99    6             24.99    2               1694.33
6         504     19.99    0             24.99    1             14.99    5               1424.67
6         503     24.99    0             29.99    4             19.99    13              2019.17

Total sales       19.99    7             24.99    33            14.99    47
                  24.99    5             29.99    26            19.99    74
                  29.99    3             34.99    15            24.99    36

Table 5: Maximum likelihood estimates of the effects of price, cluster, and store size on unit sales

ML Estimates for Poisson Regression Model

Coefficients     Estimate     Standard Error    t-Statistic    p-value

Family Game Center
Low price          0.3277       0.5858            0.5594       0.2880
High price        -0.5249       0.7306           -0.7184       0.2363
Cluster 1         -0.2940       2.6467           -0.1111       0.4558
Cluster 2         -0.3317       2.5820           -0.1285       0.4489
Cluster 3        -14.1577     444.5660           -0.0318       0.4873
Cluster 4          0.2391       2.6643            0.0897       0.4643
Cluster 5         -0.7304       2.6288           -0.2778       0.3906
Cluster 6         -0.7086       2.6677           -0.2656       0.3953
Scale             -0.2049       0.3525           -0.5812       0.2805

Phonics Traveler
Low price          0.1858       0.2629            0.7065       0.2399
High price        -0.6323       0.3248           -1.9465       0.0258
Cluster 1          5.4954       0.9818            5.5972       0.0000
Cluster 2          5.4885       0.9360            5.8635       0.0000
Cluster 3          5.2852       0.9293            5.6873       0.0000
Cluster 4          6.5071       0.9775            6.6569       0.0000
Cluster 5          5.5630       0.9570            5.8130       0.0000
Cluster 6          5.6776       0.9794            5.7969       0.0000
Scale             -0.8609       0.1337           -6.4409       0.0000

Headset Walkie-Talkie
Low price         -0.5088       0.1868           -2.7238       0.0032
High price        -0.7484       0.2017           -3.7098       0.0001
Cluster 1          6.9388       0.6362           10.9069       0.0000
Cluster 2          6.6475       0.6172           10.7704       0.0000
Cluster 3          6.3334       0.6152           10.2956       0.0000
Cluster 4          6.6708       0.6690            9.9713       0.0000
Cluster 5          6.1491       0.6454            9.5271       0.0000
Cluster 6          6.4219       0.6589            9.7464       0.0000
Scale             -0.8306       0.0894           -9.2852       0.0000