Fast, “Robust”, and Approximately Correct:
Estimating Mixed Demand Systems∗
Bernard Salanie† Frank A. Wolak‡
March 8, 2019
Abstract
Many econometric models used in applied work integrate over unobserved
heterogeneity. We show that a class of these models that includes many random
coefficients demand systems can be approximated by a “small-σ” expansion that
yields a linear two-stage least squares estimator. We study in detail the models
of product market shares and prices popular in empirical IO. Our estimator is
only approximately correct, but it performs very well in practice. It is extremely
fast and easy to implement, and it is “robust” to changes in the higher moments
of the distribution of the random coefficients. At the very least, it provides
excellent starting values for more commonly used estimators of these models.
∗We are grateful to Dan Ackerberg, John Asker, Steve Berry, Xiaohong Chen, Chris Conlon,
Pierre Dubois, Jeremy Fox, Han Hong, Guy Laroque, Simon Lee, Arthur Lewbel, Thierry Magnac,
Lars Nesheim, Ariel Pakes, Mathias Reynaert, Tobias Salz, Richard Smith, Pedro Souza, Frank Ver-
boven, Martin Weidner, and Ken Wolpin for their useful comments, as well as to seminar audiences
at NYU, Rice, UCL, and the Stanford Institute for Theoretical Economics (SITE). We also thank
Zeyu Wang for excellent research assistance.†Department of Economics, Columbia University, 1022 International Affairs Building, 420 West
118th Street, New York, NY 10027, [email protected].‡Department of Economics and Program on Energy and Sustainable Development, Stanford
University, Stanford CA 94305-6072, [email protected].
1
Introduction
Many econometric models are estimated from conditional moment conditions that
express the mean independence of random unobservable terms η and instruments Z:
E pη|Zq “ 0.
In structural models, the unobservable term is usually obtained by solving a set of
equations—often a set of first-order conditions—that define the observed endogenous
variables as functions of the observed exogenous variables and unobservables. That
is, we start from
Gpy, η, θ0q “ 0 (1)
where y is the vector of observed endogenous variables and θ0 is the true value of the
vector of unknown parameters. The parametric function G is assumed to be known
and can depend on a vector of observed exogenous variables. Then (assuming that
the solution exists and is unique) we invert this system into
η “ F py, θ0q
and we seek an estimator of θ0 by minimizing an empirical analog of a norm
‖E pF py, θqmpZqq‖
where mpZq is a vector of measurable functions of Z.
Unless F py, θ0q exists in closed form, inversion often is a step fraught with diffi-
culties. Even when a simple inversion algorithm exists, it is still costly and must be
done with a high degree of numerical precision, as errors may jeopardize the “outer”
minimization problem. One alternative is to minimize an empirical analog of the
norm
‖E pηmpZqq‖
subject to the structural constraints (1). This “MPEC approach” has met with
some success in dynamic programming and empirical industrial organization (Su and
Judd 2012, Dube et al 2012). It still requires solving a nonlinearly constrained,
nonlinear objective function minimization problem; convergence to a solution can
be a challenging task in the absence of very good initial values. This is especially
2
galling when the model has be estimated many times, for instance within models of
bargaining like those of Crawford and Yurokoglu (2012) or Ho and Lee (2017).
We propose an alternative that derives a linear model from a very simple series
expansion. To fix ideas, suppose that θ0 can be decomposed into a pair pβ0, σ0q, where
σ0 is a scalar that we have reasons to think is not too far from zero. We rewrite (1)
as
Gpy, F py, β0, σ0q, β0, σ0q “ 0.
We expand σ Ñ F py, β0, σq in a Taylor series around 0 and re-write F py, β0, σ0q as:
F py, β0, σ0q “ F py, β0, 0q ` Fσpy, β0, 0qσ0 ` . . .` Fσσ...σpy, β0, 0qσL0L!`OpσL`10 q,
where the subscript σ denotes a partial derivative with respect to the argument σ.
This suggests a sequence of “approximate estimators” that minimize the empirical
analogs of the following norms
‖E pF py, β, 0qmpZqq‖‖E ppF py, β, 0q ` Fσpy, β, 0qσqmpZqq‖›
›
›E
ˆˆ
F py, β, 0q ` Fσpy, β, 0qσ ` Fσσpy, β, 0qσ2
2
˙
mpZq
˙
›
›
›
. . .
If the true value σ0 is not too large, one may hope to obtain a satisfactory estimator
with the third of these “approximate estimators.” In general, this still requires solving
a nonlinear minimization problem. However, suppose that the function F satisfies
the following three conditions:
C1: Fσpy, β0, 0q ” 0
C2: F py, β, 0q ” f0pyq ´ f1pyqβ is affine in β for known functions f0p¨q and f1p¨q.
C3: the second derivative Fσσpy, β, 0q does not depend on β.
Denote f2pyq ” ´Fσσpy, β, 0q. Under C1–C3, we would minimize
›
›
›E
ˆˆ
f0pyq ´ f1pyqβ ´ f2pyqσ2
2
˙
mpZq
˙
›
›
›.
3
Taking the parameters of interest to be pβ0, σ20q, this is simply a two-stage least
squares regression of f0pyq on f1pyq and f2pyq with instruments mpZq. As this is
a linear problem, the optimal1 instruments mpZq associated with the conditional
moment restrictions Epη|Zq “ 0 are simply
mpZq “ Z˚ “ pE pf1pyq|Zq , E pf2pyq|Zqq .
These optimal instuments could be estimated directly from the data using nonpara-
metric regressions. Or more simply, we can include flexible functions of the columns
of Z in the instruments used to compute the 2SLS estimates.
The resulting estimators of β0 and σ20 are only approximately correct, because
they consistently estimate an approximation of the original model. On the other
hand, they can be estimated in closed form using linear 2SLS. Moreover, because
they only rely on limited features of the data generating process, they are “robust”
in ways that we will explore later.
Conditions C1–C3 extend directly to a multivariate parameter σ0. They may
seem very demanding. Yet as we will show, under very weak conditions the Berry,
Levinsohn, and Pakes (1995) (macro-BLP) model that is the workhorse of empirical
IO satisfies all three. In this application, σ0 is taken to be the square root of the
variance–covariance matrix Σ of the random coefficients in the mixed demand model.
More generally, we will characterize in Section 6.1 a general class of models with
unobserved heterogeneity to which conditions C1–C3 apply.
Our approach builds on “small-Σ” approximations to construct successive approx-
imations to the inverse mapping (from market shares to product effects). Kadane
(1971) pioneered the “small-σ” method. He applied it to a linear, normal simulta-
neous equation system and studied the properties of k-class estimators2 when the
number of observations n is fixed and σ goes to zero. He showed that when the num-
ber of observations is large, under these “small-σ asymptotics” the k-class estimators
have biases in σ2, and that their mean-squared errors differ by terms of order σ4.
Kadane argued that small σ, fixed n asymptotics are often a good approximation to
finite-sample distributions when the estimation sample is large enough.
The small-σ approach was used by Chesher (1991) in models with measurement er-
1In the sense of Amemiya (1975).2Which include OLS and 2SLS.
4
ror. Most directly related to us, Chesher and Santos-Silva (2002) used a second-order
approximation argument to reduce a mixed multinomial logit model to a “heterogene-
ity adjusted” unmixed multinomial logit model in which mean utilities have additional
terms3. They suggested estimating the unmixed logit and using a score statistic based
on these additional covariates to test for the null of no random variation in preferences.
Like them, we introduce additional covariates. Unlike them, we develop a method to
estimate jointly the mean preference coefficients and parameters characterizing their
random variation; and we only use linear instrumental variables estimators. To some
degree, our method is also related to that of Harding and Hausman 2007, who use a
Laplace approximation of the integral over the random coefficients in a mixed logit
model without choice-specific random effects. Unlike them, we allow for endogeneous
prices; our approach is also much simpler to implement.
Section 1 presents the model popularized by Berry–Levinsohn–Pakes (1995) and
discusses some of the difficulties that practitioners have encountered when taking it
to data. We give a detailed description of our algorithm in section 2; readers not in-
terested in the derivation of our formulæ in fact can jump directly to our Monte Carlo
simulations in section 7. The rest of the paper justifies our algorithm (sections 3 and
4); studies its properties (section 5); and discusses a variety of extensions (section 6).
1 The macro-BLP model
Our leading example is taken from empirical IO. Much work in this area is based on
market share and price data. It has followed Berry et al (1995—hereafter BLP) in
specifying a mixed multinomial logit model with product-level random effects. To
deal with the endogeneity of prices implied by these product-level random effects,
BLP use a Generalized Method Moments (GMM) estimator that relies on the mean
independence of the product-level random effects and a set of instruments.
To fix ideas, we define “the standard model” as follows4. Let J products be
available on each of T markets. Each market contains an infinity of consumers who
3Ketz (2018) builds on a quadratic expansion in σ0 “ 0 to derive asymptotic distributions when
the true σ0 is on the boundary.4While some of our exposition relies on it for simplicity, our methods apply to a more general
model— see section 6.1.
5
choose one of J products. Consumer i in market t derives a conditional indirect utility
from consuming product j equal to
Xjt pβ0 ` εiq ` ξjt ` uijt.
There is also a good 0, the “outside good”, whose utility for consumer i is typically
normalized to equal ui0t. The random variables ε represent individual variation in
tastes for observed product characteristics, while the u stand for unobserved hetero-
geneity observed by the individual, but not by the econometrician. The vector ε and
u are independent of each other, and of the covariates X and product random effects
ξ. Berry et al. (1995) assume that the vector uit “ pui0t, ui1t, . . . , uiJtq is indepen-
dently and identically distributed (iid) as standard type-I Extreme Value (EV); the
product effects ξjt are unknown mean zero random variables conditional on a set of
instruments; and the random variation in preferences εi has a mean-zero distribution
which is known up to its variance-covariance matrix Σ0. The εi are often taken to be
independent, identically distributed Np0,Σ0q random vectors with a diagonal Σ0.
Some of the covariates in Xjt may be correlated with the product-specific random
effects. The usual example is a model of imperfect price competition where the prices
firms set in market t depend on the value of the vector of unobservable product
characteristics, ξt, some of which the firms observe.
The parameters to be estimated are the mean coefficients β0 and the variance-
covariance matrix of the random coefficients Σ0. We collect them in θ0 “ pβ0,Σ0q.
The data available consists of the market shares ps1t, . . . , sJtq and prices pp1t, . . . , pJtq1
of the J varieties of the good, of the covariates Xt, and of additional instruments
Zt, all for market t. Note that the market shares do not include information on
the proportion S0t of consumers who choose to buy good 0. Typically the analyst
estimates this from other sources. Let us assume that this is done, so that we can deal
with the augmented vector of market shares pS0t, S1t, . . . , SJtq, with Sjt “ p1´S0tqsjt
for j P J “ t1, . . . , Ju.
The market shares for market t are obtained by integration over the variation in
preferences ε: for good j P J ,
Sjt “ Eε
„
exp pXjt pβ ` εq ` ξjtq
1` ΣJk“1 exp pXkt pβ ` εq ` ξktq
(2)
and S0t “ 1´řJj“1 Sjt.
6
Berry et al. (1995) assume that
E pξjt|Zjtq “ 0
for all j P J and t. The instruments Zjt may for instance be the characteristics
of competing products, or cost-side variables. The procedure is operationalized by
showing that for given values of θ, the system (2) defines an invertible mapping5 in
IRJ . Call ΞpSt,Xt,θq its inverse; a GMM estimator obtains by choosing functions
Z˚jt of the instruments and minimizing a well-chosen quadratic norm of the sample
analogue of:
E`
ΞpSt,Xt,θqZ˚jt
˘
over θ.
These models have proved very popular; but their implementation has faced a
number of problems. Much recent literature has focused on the sensitivity of the
estimates to the instruments used in GMM estimation of the mixed multinomial
logit model. Reynaert–Verboven (2014) showed that using linear combinations of
the instruments can lead to unreliable estimates of the parameters of interest. They
recommend using the optimal instruments given by the Amemiya (1975) formula:
Z˚jt “ E
ˆ
BΞ
BθpSt,Xt,θ0q|Zjt
˙
.
As implementing the Amemiya formula relies on a consistent first-step estimate of θ0,
this is still problematic. Gandhi and Houde (2016) propose “differentiation IVs” to
approximate the optimal instruments for the parameters Σ of the distribution of the
random preferences ε. They also suggest a simple regression to detect weak instru-
ments. An alternative is to use the Continuously Updating Estimator to build up the
optimal instruments as minimization progresses. Armstrong (2016) points out that
instruments based on the characteristics of competing products achieve identification
through correlation with markups. But when the number of products is large, many
models of the cost-side of the market yield markups that just do not have enough
variation, relative to sampling error. This can give inconsistent or just uninformative
estimates6.5See Berry (1994).6Instruments that shift marginal cost directly (if available) do not need variation in the markup
to shift prices, and therefore do not suffer from these issues. Variation in the number of products
per market may also be used to restore identification, data permitting.
7
Computation has also been a serious issue. The original BLP approach used a
“nested fixed point” (NFP) approach: every time the objective function to be mini-
mized was evaluated for the current parameter values, a contraction mapping/fixed-
point algorithm must be employed to compute the implied product effects ξt from the
observed market shares St and for the current value of θ. This was both very costly
and prone to numerical errors that propagate from the nested fixed point algorithm
to the minimization algorithm. Dube et al (2012) proposed a nonlinearly-constrained,
nonlinear optimization problem to estimate θ. Their simulations suggest that this
“MPEC” approach often outperforms the NFP method, sometimes by a large factor.
Lee and Seo (2015) proposed an “approximate BLP” method that inverts a linearized
approximation of the mapping from ξt to St. They argue that this can be even faster
than the MPEC approach to estimation. Petrin and Train (2010) have proposed a
maximum likelihood estimator that replaces endogeneous regressors with a control
function. This circumvents the need to compute the implied value of ξ for each
value of θ, but still requires solving a nonlinear optimization problem to compute an
estimate of θ0.
Solving a nonlinear optimization problem for a potentially large set of parame-
ters is time-consuming. It typically requires starting values in the neighborhood of
the optimal solution; closed-form gradients; and careful monitoring of the optimiza-
tion algorithm by the analyst, as the objective function is not globally concave. The
method we propose in this paper completely circumvents the need to solve a non-
linear optimization problem. Our estimator relies on an approximate model that is
exactly valid when there is no random variation in preferences, and becomes a coarser
approximation as the amplitude of the random variation in elements of β` εi grows.
As such, our estimator is not a consistent estimator of the parameters of the BLP
model. On the other hand, it has some very real advantages that may tip the scale
in its favor. First, it requires a single linear 2SLS regression that can be computed
in microseconds with off-the-shelf software7. Second, our estimator needs to assume
very little about the form of the distribution of the random variation in preferences ε
(beyond its small scale), justifying the “robust” in our title—where the quotes reflect
7Fox et al. (2011) discretize the distribution of the random coefficients on a grid and estimate the
corresponding probability masses. This also results in a least-squares estimator—but it is constrained
by linear inequalities and may be sensitive to the choice of the grid points.
8
our awareness that we are taking some liberties with the definition of robustness. Fi-
nally, because our estimating equation is linear, computing the “optimal” instruments
for our estimator is also straightforward.
Some readers may find the “approximate correctness” of our estimator unsatis-
fying. It at least yields “nearly consistent” starting values for the classical nested-
fixed point and MPEC nonlinear optimization procedures at a minimal cost. This
can address a major challenge associated with successfully implementing the MPEC
estimation procedure. It also provides useful diagnoses about how well different pa-
rameters can be identified with a particular model and dataset; and a simple way to
select between models, as we discuss below.
2 2SLS Estimation in the Standard BLP Model
For the reader primarily interested in applying our method, this section provides a
step-by-step guide to implementing the estimator in the standard macro-BLP model.
This requires some notation. The dimensions of the vectors and matrices are as
follows:
• for each j P J and t, Xjt is a row vector with nX components
• β is a column vector with nX components
• for each i, εi is a row vector with ne components; in the standard model,
ne ď nX .
We denote I as the set of pairs of ordered indices pm,n ď mq such that the variance-
covariance element Σmn “ covpεim, εinq is not restricted to be zero8. For notational
simplicity, we also assume that we use all J ˆ T conditional moment restrictions:
E pξjt|Zjtq “ 0.
Adapting our procedure to subsets of moment restrictions is straightforward.
Our procedure runs as follows:
8E.g. if ne “ nX and Σ is assumed to be diagonal, I “ tp1, 1q, . . . , pnX , nXqu.
9
Algorithm 1. Fast Robust and Approximately Correct (FRAC) estima-
tion of the standard BLP model
1. For every market t, augment the market shares from ps1t, . . . , sJtq to pS0t, S1t, . . . , SJtq
2. For every product-market pair pj P J , tq :
(a) compute the market-share weighted covariate vector et “řJk“1 SktXkt;
(b) for every pm,nq in I, compute the “artificial regressor” Kjtmn as
• if n “ m: Kjtmm “
´
Xjtm
2´ etm
¯
Xjtm;
• if n ă m: Kjtmn “ XjtmXjtn ´ etmXjtn ´ etnXjtm.
(c) for every j “ 1, . . . , J , define yjt “ logpSjt{S0tq
3. Run a two-stage least squares regression of y on X and K, taking as instru-
ments a flexible set of functions of the columns of Z. Define β to be the es-
timated coefficients associated with X and (the nonzero part of) Σ to be the
estimated coefficients associated with K.
4. (optional9) Run a three-stage least squares (3SLS) regression across the T mar-
kets stacking the J equations for each product with a weighting matrix equal to
the inverse of the sample variance of the residuals from step 3.
Consistent estimates of the covariance matrix of the asymptotic distribution of?TJpθ ´ plimpθqq can be obtained from the expressions for the heteroskedasticity
consistent covariance matrix for the 2SLS estimator given in White (1982).
Ideally, the “flexible set of functions of the columns of Z” in step 3 should be
able to span the space of the optimal instruments EpX|Zq and EpK|Zq for our
approximate model. Alternatively, these optimal instruments can be estimated by a
nonparametric regressions of each the column of X on the columns of Z.
As is well-known, misspecification of one equation of the model can lead to incon-
sistency in 3SLS parameter estimates of all equations of the model. It is therefore
unclear whether Step 4 is worth the additional effort. We intend to explore this
question in future work.
9This step should only be considered when T " J .
10
It is important to note here that e is not a simple weighted average, as the weights
do not sum to one, but only to p1´S0tq. To illustrate, if Xjtm ” 1 is the constant, then
etm is p1 ´ S0tq and the artificial regressor that identifies the corresponding variance
parameter is
Kjtmm “ S0t ´
1
2.
More generally, if Xjtn “ 11pj P J0q is a dummy that reflects whether variety j belongs
to group J0 Ă J , then it is easy to see that the corresponding variance parameter is
the coefficient of the artificial regressor
Kjtnn “ 11pj P J0q
ˆ
1
2´ SJ0t
˙
where SJ0t is the market share of group J0 on market t.
3 Second-order Expansions
The rest of the paper justifies algorithm 1 and discusses extensions. We first derive
the small-σ expansions of the introduction.
We start from a specification of the conditional indirect utility of variety j for
consumer i on market t as
Xjtβ ` g pXjt, εiq ` ξjt ` uijt (3)
for j P J ; and Ui0t “ ui0t. Define the vectors uit “ pui0t, ui1t, . . . , uiJtq; Xt “
pX1t, . . . ,XJtq; and ξt “ pξ1t, . . . , ξJtq. We assume that
1. the random terms εi are i.i.d. across i with finite variance;
2. they are distributed independently of pXt, ξtq;
3. EgpXjt, εiq “ 0 for all Xjt;
4. the random vectors uit are i.i.d. across i and t; and they are distributed inde-
pendently of pεi,Xt, ξtq.
11
These assumptions are all standard, except for the third one which is only a mild
extension of the usual normalization Eεi “ 0. They allow for any type of codepen-
dence between the product effects ξt and the covariates Xt. Note that the additive
separability between β and ε is not as strict as it seems. If for instance we start from
a multiplicative model with utilities
nXÿ
k“1
Xjtkβkζki ` ξjt ` uijt
we can always redefine εki “ βkpζki ´ 1q to recover (3).
Our crucial assumption, which we maintain throughout, is that the utilities are
affine in β and additive in the product effects ξ and in the idiosyncratic terms u. On
the other hand, we allow for any kind of distribution for εi and uit. This encompasses
most empirical specifications used, as well as many more. We will refer to three special
cases for illustrative purposes:
1. The standard model, also known as the mixed multinomial logit model, has
g pX, εq “Xε; and the vector uit is distributed as iid standard type-I Extreme
Value random variables.
2. The standard binary model (or mixed logit model) further imposes J “ 1.
3. The standard symmetric model is a standard model with ε distributed symmet-
rically around 0;
4. The standard Gaussian model is a standard model with ε normal with a diag-
noral covariance matrix. It is probably the most commonly used in applications
of the macro-BLP method.
5. Finally, the standard Gaussian binary model imposes both 2 and 4.
In order to do small-σ expansions, we need to introduce a scale parameter σ.
We do this with Assumption 1, which fits the usual understanding of what a scale
parameter is10 and also imposes the restriction that all moments of ε are finite-valued.
The most common specification of the “macro-BLP” model has a Gaussian ε and
satisfies Assumption 1.
10In principle it should be possible to use several scale parameters, say σ1 for one part of the
variance-covariance matrix and σ2 for another one.
12
Assumption 1 (Finite moments). For some integer L ě 2, all moments of order
1 ď l ď L`1 of the vector ε are finite; they are of order l in some non-negative scalar
σ. The first moment is zero: Eε “ 0. We denote Σ “ Eεε1 the variance-covariance
matrix of ε, and µl (for l ě 3) its (uncentered) higher order moments.
It will be convenient to write ε ” σBv with v a random vector of mean zero
and covariance matrix equal to the identity matrix, so that V arpεq ” Σ “ σ2BB1.
We only use this decomposition for intermediate results. Note that B is an ne ˆ nv
matrix pne ě nxq, where v is a row vector with nv components. This allows for Σ
to have a factor structure. Our final expansions do not depend on how σ and B are
normalized, and we won’t need to specify it.
We drop the index t from the notation in most of this section as we will only need
to deal with one market at a time.
3.1 Second-order Expansions in the Standard Model
Much of the rest of the remainder of the paper focuses on the standard model, where
the idiosyncratic error terms u have iid Type I extreme value distributions. We will
show in section 6.4 how to extend our results to more general distributions.
Recall that in the standard model, market shares are given by (2). If the scale
parameter σ was zero, inverting (2) would simply give us
ξj “ logSjS0
´Xjβ for j P J , (4)
which is standard multinomial logit model without random coefficients. This is the
starting point of the contraction algorithm described in Berry (1994).
Now let σ be positive. With ε “ σBv, a Taylor expansion of (4) at σ “ 0 would
give (assuming that the expansion is valid11)
ξj “ logSjS0
´Xjβ ` ΣLl“1aljpS,X,βq
σl
l!`OpσL`1q, (5)
where the aljpS,X,βq are defined below. In this equation, X collects the covariates
of all products and S is the vector of market shares. Market-share weighted sums
will play a crucial role in what follows:
11We return to this point in section 5.1.
13
Definition 1 (Market-share weighting). For any J-dimensional vector T of J com-
ponents, we define the scalar
eST “Jÿ
k“1
SkTk.
By extension, if m is a pJ ˆ Jq matrix with J columns pm1, . . . ,mJq, we define the
vector
eSm “
Jÿ
k“1
Skmk.
Finally, we denote Tj “ Tj ´ eST and mj “mj ´ eSm.
Note that we are using the observed market shares of the J goods, so that these
weighted sums are very easy to compute from the data. It is important to emphasize
that the operator eS is not an average, as the augmented market shares Sk for k P Jdo not sum to one but to p1 ´ S0q. Similarly, the Tj terms are not residuals, and
eST ‰ 0 in general.
Our first goal is to find explicit formulæ for the coefficients alj in (5). While this
can be done at a high level of generality, let us start with a result that covers a large
majority of applications.
In the standard model, g pXj, εq is simply Xjε. Using the identity ε ” σBv,
denote xj “ pXjBq1, a vector of nv components; and x the matrix whose J columns
are px1, . . . ,xJq. Then
g pXj, εq “ σx1jv.
We now use this notation to derive the second-order expansion in σ in the standard
model.
Theorem 1 (Intermediate expansion in the standard model). In the standard model,
(i) the alj coefficients only depend on S and on x;
(ii) the first-order coefficients are zero: a1j ” 0 for all j;
(iii) the second-order coefficients are given by
a2j “ 2xj ¨ eSx´ ‖xj‖2 “ ´xj ¨
˜
xj ´ 2Jÿ
k“1
Skxk
¸
; (6)
14
(iv) in the standard symmetric model, alj “ 0 for all j and odd l ď L. Therefore if
L ě 3,
ξj “ logSjS0
´Xjβ `a2j2σ2`Opσ4
q. (7)
Proof. See Appendix A.
3.2 The Artificial Regressors in the Standard Model
When truncated of its remainder term, equation (7) becomes linear in the parame-
ters pβ, σ2q. The coefficients a2j, however, are quadratic combinations of the vectors
xj, which are themselves linear in the unknown coefficients of the matrix B. Fortu-
nately, the formula that gives a2j can be transformed so that it becomes linear in the
coefficients of the variance-covariance matrix Σ of ε.
To see this, note that since xk “ B1X 1
k,
x1kxl “XkBB1X 1
l .
But because Σ “ σ2BB1, we have
σ2x1kxl “nXÿ
m,n“1
ΣmnXkmXln “ Tr pΣXlX1kq
where Trp¨q is the trace operator.
Plugging this into (6) gives
σ2a2j2“ Tr
ˆ
Σ
ˆ
eSX ´Xj
2
˙
X 1j
˙
.
Define the nX ˆ nX matrices Kj by
Kj“
ˆ
Xj
2´ eSX
˙
X 1j
so that we can also write σ2 a2j2“ ´TrrΣKj
s. The matrices Kj can be constructed
straightforwardly from the covariates X and the market shares S. Given that Σ is
symmetric,
TrrΣKjs “
nXÿ
m“1
ΣmmKjmm `ÿ
m,năm
Σmn
`
Kjmn `Kjnm˘
. (8)
Define the lower-triangular matrices Kj by
15
• on the diagonal: Kjmm “ Kj
mm
• off the diagonal (for n ă m): Kjmn “ Kj
mn `Kjnm.
We call their elements the “artificial regressors”, for reasons that will soon become
clear.
Additional a priori restrictions can be accommodated very easily. For instance,
it is common to restrict Σ to be diagonal, as is the case in Berry et al. (1995). Then
only nX terms enter in the sum (8); moreover,
Kjmm “
˜
Xjm
2´
Jÿ
k“1
SkXkm
¸
Xjm.
If Σ is not diagonal, then we need to also use additional terms
Kjmn “ XjmXjn ´
˜
Jÿ
k“1
SjXkm
¸
Xjn ´
˜
Jÿ
k“1
SjXkn
¸
Xjm.
To summarize, we have:
Theorem 2 (Final expansion in the standard model). In the standard model,
ξj “ logSjS0
´Xjβ ´nXÿ
m“1
ΣmmKjmm ´
ÿ
năm
ΣmnKjmn `Op‖Σ‖k{2q, (9)
where the integer k ě 3; and k “ 4 if the model is symmetric.
The artificial regressors are given by
Kjmm “
˜
Xjm
2´
Jÿ
k“1
SkXkm
¸
Xjm
Kjmn “ XjmXjn ´
˜
Jÿ
k“1
SjXkm
¸
Xjn ´
˜
Jÿ
k“1
SjXkn
¸
Xjm.
4 2SLS Estimation
Equation (9) is linear in the parameters of interest θ “ pβ,Σq, up to the remainder
term. This immediately suggests neglecting the remainder term and estimating the
approximate model ξj “ logSj
S0´Xjβ ´ TrrΣKjs.
16
More precisely, assume we are given a sample of T markets, and instruments
Zjt such that E pξjt|Zjtq for all j and t. Then our proposed estimator θ fits the
approximate linear set of conditional moment restrictions:
E
ˆ
log
„
SjtS0t
´`
Xjtβ ` TrrΣKjts˘
|Zjt
˙
“ 0 (10)
which only differs from the original model by a term of order σ3 (or σ4 if the distri-
bution of ε is symmetric)12. This can simply be done by choosing vector functions
Z˚jt of the instruments and running two-stage least squares: for each j “ 1, . . . , J
and t “ 1, . . . , T , we run a 2SLS regression of logpSjt{S0tq on Xjt and the relevant13
variables Kjt, using the instruments Z˚jt.
5 Pros and Cons of the 2SLS Estimation Approach
The drawback of our method is obvious: because this is only an approximate model,
the resulting estimator θ will not converge to θ0 as the number of markets T goes
to infinity. We discuss this in much more detail in section 5.1. For now, let us
note that this drawback is tempered by several considerations. First, the number of
markets available in empirical IO is typically small; finite-sample performance of the
estimator is what matters, and we will examine that in Section 7. More importantly,
our estimator has several useful features. Let us list six of them:
1. Because the estimator employs linear 2SLS, computing it is extremely fast and
can be done in microseconds with any of-the-shelf software.
2. We do not have to assume any distributional form for the random variation
in preferences ε. This is a notable advantage over other methods: while they
yield inconsistent estimates if the distribution of ε is misspecified, our estimator
remains consistent for the parameters of the approximate model.
12Note that because our model is only an approximation, there may not exist a value of θ that
satisfies the conditional moment restrictions (10) in the population. Nevertheless, our 2SLS estima-
tion procedure will still yield an estimate of the value of θ that minimizes the probability limit of
the 2SLS objective function. We explore this further in our Monte Carlo study in section 7.13E.g. only the nX variables Kjt
mm if Σ is restricted to be diagonal, or even a subset if some
coefficients are non-random.
17
3. Computing the optimal instruments does not require any first-step estimate
because the estimating equation is linear. We can just use a flexible set of
functions of the columns of Z that span the space of the optimal instruments
EpX|Zq and EpK|Zq.
4. Even if the econometrician decides to go for a different estimation method,
our proposed 2SLS estimates obtained should provide a set of very good initial
parameter values for a nonlinear optimization algorithm.
5. The confidence regions on the estimates will give useful diagnoses about the
strength of identification of the parameters, both mean coefficients β and their
random variation Σ. This would be very hard to obtain otherwise, except by
trying different specifications.
6. There has been much interest in systematic specification searches in recent
years; see e.g. Horowitz–Nesheim 2018 for a Lasso-based selection approach in
discrete choice models. With our method any number of variants can be tried
in seconds, and model selection is drastically simplified.
5.1 The Quality of the Approximation
Ideally, we would be able to bound the approximation error in the expansion of ξj,
and use this bound to majorize the error in our estimator in the manner described in
Kristensen and Salanie (2017). While we have not gone that far, we can justify the
local-to-zero validity of the expansion in the usual way. We are taking a mapping
S “ G pξ,X, σq
that is differentiable in both ξ and σ; inverting it to ξ “ Ξ pS,X, σq; and taking
an expansion to the right of σ “ 0 for fixed market shares S and covariates X. The
validity of the expansion for small σ and fixed pX,Sq depends on the invertibility of
the Jacobian Gξ.
First consider the standard model. It follows from Berry 1994 that Gξ is invertible
if no observed market share hits zero or one. Applying the Implicit Function Theorem
repeatedly shows that in fact the Taylor series of ξ converges over some interval r0, σs
18
if all moments of ε are finite; and that the expansion is valid at order L if the moments
of ε are bounded to order pL` 1q.
Characterizing this range of validity is trickier. Figure 1 uses formulæ derived
in Appendix B to plot the first four coefficients of the expansion in Σ11X21 for the
standard Gaussian binary model (that is, the Gaussian mixed logit) with one covariate
X1:
ξ1 “ logS1
S0
´ βX1 `
4ÿ
k“1
tkpS1q`
Σ11X21
˘k`Opσ10
q.
Each curve plots the function tk as market shares vary between zero and one. The
visual impression is clear: the coefficients damp quickly. Beyond the first term (which
corresponds to our 2SLS method), the coefficients are always smaller than 0.05 in
absolute value. Of course, the approximation error also depends on the values taken
by the covariates.
0.2 0.4 0.6 0.8 1.0S1
-0.15
-0.10
-0.05
0.05
0.10
0.15
t1(S1)
t2(S1)
t3(S1)
t4(S1)
Figure 1: Coefficients t1,2,3,4pS1q
While this simple example can only be illustrative, we find the figure encouraging
as to the practical range of validity of the approximation. We have not determined
the extent to which these results generalize to the multinomial logit model.
19
5.2 “Robustness”
Our expansions only rely on the properties of the derivatives of the logistic cdf Lptq “1
1`expp´tqand on the first two moments of ε. This has a distinct advantage over
competing methods: the lower-order moments of ε can be estimated by 2SLS, and
nothing more needs to be known about its distribution.
Suppose for instance that the analyst does not want to assume that ε has a
symmetric distribution. Then the coefficients a1j are still zero, and the coefficients
a2j are unchanged. In the absence of symmetry, the approximate model is only valid
up to Opσ3q; but running Algorithm 1 may still provide very useful estimators of the
elements of Σ0.
6 Extensions
Our technique can be extended to other random coefficient models as long as the
conditional indirect utility remains additive in the product-specific random effects ξ.
This is shown in section 6.1. We follow with a modification of our multinomial logit
random coefficients modeling framework to account for the third and fourth moments
of ε. We then turn to methods for improving the quality of our 2SLS estimates14.
Finally, section 6.4 presents a nonlinear 2SLS estimation procedure for the nested
logit model.
6.1 Quasi-linear Random Coefficients Models
Consider the following class of models, whose defining characteristic is that the error
term η and the mean coefficients β only enter via a linear combination η ´ f1pyqβ:
Gpy,η,β, σq ” G˚py, EvA
˚py,η ´ f1pyqβ, σBvqq . (11)
where v is unobserved heterogeneity distributed independently of y and η and nor-
malized by Ev “ 0 and V v “ I; and both functions G˚ and A˚ are assumed to be
known.
14We explore the small sample properties of several of these corrections in our Monte Carlo study.
20
Note that the macro–BLP model takes this form, with y “ pS,Xq; f1pyq “ ´X;
η “ ξ; and
A˚j “ Pr
ˆ
j “ arg maxJ“0,1,...,J
pXkβ ` ξk ` σXkBvq |X, ξ,v
˙
so that, denoting aj ”Xj and bj “Xjβ ` ξj,
A˚pa, b, cq ”exp pbj ` ajcq
1`řJk“1 exp pbk ` akcq
;
and G˚j ” Sj ´ EvA˚j .
We continue to assume that E pη|Zq “ 0. The quasi-linear structure in (11)
allows this class of models to be approximately estimated by 2SLS. Denoting partial
derivatives with subscripts, we have:
Theorem 3 (Expansions for quasi-linear random coefficients models). Consider a
model of the class defined by (11) and assume that
• G˚ is twice differentiable with respect to its second argument
• A˚ is twice differentiable with respect to its last two arguments
• the matrices G˚2py,A
˚py,η´f1pyqβ,0qq and A˚2py,η´f1pyqβ,0q are invertible
for all py,η,βq.
Any such model satisfies the conditions C1–C3 in the introduction. Moreover,
• f1pyq appears directly in (11)
• the variables f0pyq are defined by the system of equations
G˚py,A˚py,f0pyq,0qq “ 0
• and the variables f2pyq solve the linear system
A˚33py,f0pyq,0qq f2pyq “ ´A˚2py,f0pyq,0qq.
21
Proof: See Appendix E.
As explained in the introduction, these models can be estimated by regressing
f0pyq on f1pyq and f2pyq with a set of flexible functions of Z as instruments. As
the macro–BLP model belongs to this class, this confirms that conditions C1–C3
hold in the BLP model; we had shown it implicitly in section 3 by deriving the
expansions. Note also that we did not use any distributional assumption on the
random coefficients and the idiosyncratic shocks—although of course the terms in
the expansions do depend on these distributions. We give an illustration for a one-
covariate mixed binary model without any distributional assumption in Appendix B.3.
6.1.1 Examples
It is easy to generate models in the quasi-linear class (11). Starting from any Genes-
ralized Linear Model gpyq “ Xβ ` η, we can for instance transform the right-hand
side by adding additive unobserved heterogeneity and another link function:
gpyq “ Eεh pXβ ` η, σεq .
When the link functions g and h are both assumed to be known, all such models obey
conditions C1–C3 and can therefore be studied with our method. Note that in these
models f1pyq ” ´X and f2pyq “ ´ph1{h22qpf0pyq, 0q where
f0pyq “ hp¨, 0q´1pgpyqq
(assuming the inverse is well-defined.)
The nested logit of section 6.4 shows that our method remains useful beyond
the class of quasi-linear models, at the cost of breaking conditions C2 and C3 and
requiring (simple) numerical optimization.
6.2 Higher-order terms
In Appendix B, we study in more detail the standard binary model. For this simpler
case, calculations are easily done by hand for lower orders of approximation, or using
symbolic software for higher orders.
22
More generally, return to the standard model and assume (as is often done in
practice) that the εm are independent across the covariates m “ 1, . . . , nX . We
denote as before Σmm “ Epε2mq, and µlm the expected value of εlm for l ě 3. Tedious
calculations15 show that the second- to fourth-order terms of the expansion in σ are
ξj “ logSjS0
´Xjβ `4ÿ
l“2
Alj `Opσ5q
with
A2j “ÿ
m
Xjm peSXm ´Xjm{2qΣmm;
A3j “ÿ
m
Xjm
ˆ
XjmeSXm
2`eS pX
2mq
2´X2jm
6´ peSXmq
2
˙
µ3m;
and
A4j “ÿ
m
µ4mXjm
ˆ
peSXmq3´ peSXmqpeSpX
2mqq ´Xjm
peSXmq2
2´X3jm
24
`eSpX
3mq
6`Xjm
eSpX2mq
4`X2
jm
eSXm
6
˙
`A2
2j
2`ÿ
m
ΣmmXjm
ˆ
eSpA2Xmq ` peSA2q
ˆ
Xjm
2´ 2peSXmq
˙˙
.
First consider the third-order term A3j. It is a linear function of the unknown skew-
nesses µ3m; in fact it can be rewritten as
´ÿ
m
T jmµ3m
where we introduced new artificial regressors
T jm ” Xjm
ˆ
X2jm
6` peSXmq
2´Xjm
eSXm
2´eSpX
2mq
2
˙
.
Algorithm 1 can be adapted in the obvious way to take possible skewness of ε into
account. Note that the procedure remains linear in the parameters pβ,Σ,µ3q, for
which it generates approximate estimates by 2SLS.
15Available from the authors.
23
The fourth-order term, on the other hand, contains terms that are linear in the
µ4m (the first two lines of the formula) as well as terms that are quadratic in Σ (the
last line). The first group suggests introducing more artificial regressors
Qjm ” Xjm
ˆ
peSXmqpeSpX2mqq ´ peSXmq
3`Xjm
peSXmq2
2`X3
jm{24
´eSpX
3mq
6´Xjm
eSpX2mq
4´X2
jm
eSXm
6
˙
,
whose coefficients are the µ4m. The second group yields
´ÿ
m,n“1
ΣmmΣnnWjmn
where new artificial regressors W are assigned products of the elements of Σ. Esti-
mating the resulting regression requires nonlinear optimization (albeit a very simple
one).
6.3 Correcting the 2SLS estimates
If the analyst is willing to make more distributional assumptions, she can resort to
bootstrap or asymptotic corrections to improve the accuracy of our 2SLS estimators.
6.3.1 Bootstrapping
Once we have approximate estimators β and Σ, we can use them to solve the market
shares equations for estimates of the product effects ξ and bootstrap them, provided
that we are willing to impose a distribution for v (beyond the normalization of its
first two moments.)
We use Berry inversion to solve for ξt in the system
Sjt “ Evexp
´
Xjt
´
β ` Σ1{2v¯
` ξjt
¯
1` ΣJk“1 exp
´
Xkt
´
β ` Σ1{2v¯
` ξkt
¯ ,
where Evp¨q denotes the expectation taken with respect to the assumed distribution
of v. For any resample ξ˚ of the ξ, we simulate the market shares from
S˚jt “ Evexp
´
Xjt
´
β ` Σ1{2v¯
` ξ˚jt
¯
1` ΣJk“1 exp
´
Xkt
´
β ` Σ1{2v¯
` ξ˚kt
¯
24
and we use our 2SLS method to get new estimates β˚,Σ˚. Finally, we compute
bias-corrected estimates by e.g.
βC “ 2β ´1
B
Bÿ
b“1
β˚b .
More generally, the resampled estimates can be used to estimate the distribution of
β and Σ in the usual manner.
6.3.2 Asymptotic Correction
Another way to use the third- and fourth-order terms in our expansion is as a cor-
rective term: that is, we run 2SLS on the second-order expansion and we use the
formulæ for the higher-order terms to correct for the effects of the approximation.
Denote θ “ pΣ,βq, and θ0 its true value. Let θ2 be our 2SLS estimator based on
a second-order expansion. That is, we estimate the approximate model Epξ2Zq “ 0
with instruments Z and weighting matrix W , where
ξ2j “ logSjS0
´Xjβ ´ Tr ΣKj . (12)
As the number of markets T gets large, θ2 converges to the solution θ2 of Ef2pθ2q “ 0,
with
f2pθq ”Bξ2Bθpθ,X,Sq1ZWZ 1ξ2pθ,X,Sq.
Alternatively, we could have estimated the model using inversion or MPEC, with an
“exact” ξ8. Let λ0 denote additional parameters of the model (such as higher-order
moments of the distribution of ε) that are identified using the exact ξ8 but not16
with our approximate ξ2.
Since by assumption E pξ8pθ0,λ0,X,SqZq “ 0, a fortiori Ef8pθ0;λ0q “ 0 with
f8pθ;λ0q ”Bξ8Bθpθ,λ0,X,Sq1ZWZ 1ξ8pθ,λ0,X,Sq.
The dominant term in the asymptotic bias is given by expanding Ef8pθ;λ0q around
θ “ θ2, keeping λ0 fixed. It is
θ2 ´ θ0 »
ˆ
EBf8Bθpθ2;λ0q
˙´1
Ef8pθ2;λ0q.
16If the only free parameters of the distribution of ε are the elements of Σ, then λ0 will be empty.
25
Denote X the matrix with terms Xjm and K the matrix whose row j “ 1, . . . , J
contains the artificial regressors Kjmn. We define e2pθ;λ0q “ ξ8pθ;λ0q ´ ξ2pθq, the
approximation error on ξ. Under any assumption about the parameters in λ0, we
can compute the higher-order terms ξ3, ξ4, . . . to approximate e2. If for instance we
maintain the assumption that the model is symmetric, we can approximate e2 »
ξ4 ´ ξ2. Results in Robinson (1988) and in Kristensen and Salanie (2017) could be
adapted to show that the resulting corrected estimator not only has a smaller bias;
it is in fact asymptotically equivalent to the estimator θ4 based on a fourth-order
expansion.
Let us suppose then that we have a reliable estimator e2pθ;λ0q of e2pθ;λ0q. Define
V by the Cholesky decomposition ZWZ 1 “ V V 1, so that V is a pJˆJq matrix. We
prove in Appendix D that this asymptotic correction yields the following formula:
θ0 » θ2 `
˜
E pX 1V V 1Xq E pX 1V V 1Kq
E pK 1V V 1Xq E pK 1V V 1Kq
¸´1˜
E pX 1V V 1e2q
E pK 1V V 1e2q ´ E`
Be2BΣV V 1ξ2
˘
¸
.
To interpret this formula, note that if e2 did not depend on Σ the corrective term
on the right-hand-side would simply be the 2SLS estimate of the regression of e2
on pX,Kq with instruments V . In fact, if we are only interested in correcting the
estimators of the mean coefficients β2, we can simply keep the corresponding part
of the 2SLS estimate. The correction on Σ2 has an additional term as higher order
terms in the expansion of ξ typically depend on Σ. (Recall from Theorem 1.(i) that
they do not depend on β.)
6.3.3 Two-Step Estimator Based on the Asymptotic Correction
In practice, we use the estimate of ξ8pθ2q computed using our 2SLS estimator and
the Berry inversion as well as the residual from our 2SLS estimate
ξ2jpθ2q “ logSjS0
´Xjβ2 ´ Tr Σ2Kj
to correct our initial 2SLS estimate. The difference between ξ2jpθ0q and ξ8pθ0q is
equal to the higher-order terms in equation (5). Computing a new dependent variable
yj “ log
„
SjS0
´ pξ2jpθ0q ´ ξ8pθ0qq
26
and applying 2SLS would yield a consistent estimate of θ0. Using our 2SLS estimate
θ2 and computing
y˚j “ log
„
SjS0
´ pξ2jpθ2q ´ ξ8pθ2qq
and applying 2SLS with this dependent variable is a feasible version of this correc-
tion17. In our Monte Carlo study we will explore the small sample properties of this
procedure.
6.4 The Two-level Nested Logit
Campioni (2018) applies a nonparametric approach to the choice among a very large
set of products. He shows that the mixed logit specification forces the price elasticity
to become “too small” at high price levels. This raises the question of the appropriate
choice of a distribution for the idiosyncratic terms uijt.
For the mixed logit (J “ 1), it is very easy to compute the artificial regressors
for any distribution of the idiosyncratic terms; we give the formulæ in Appendix B.3.
When J ą 1, the space of possible distributions increases dramatically. The compu-
tations also become more complicated. Finally, estimating the additional parameters
of the distribution of u requires (simple) nonlinear optimization.
For illustrative purposes, we give the estimating equation for the two-level nested
logit model. Assume that there is a nest for good 0, and K nests N1, . . . , NK for the
varieties of the good. For k “ 1, . . . , K, we denote λk the corresponding distribution
parameter—with the usual interpretation that p1 ´ λkq proxies for the correlation
between choices within nest k, and that the multinomial logit model obtains when all
λk “ 1.
We denote the market share of nest k by SNk“
ř
jPNkSj. Take any variable
T “ pT0, T1, . . . , TJq. We define the within-nest-k share-weighted average as
Tk “ÿ
jPNk
SjSNk
Tj.
Note in particular that eST “řKk“1 SNk
Tk.
17Note that this procedure could be iterated.
27
Appendix C derives the equivalent of (6): for j P Nk,
a2j “ xj ¨
ˆ
2eSx´xjλk` 2
1´ λkλk
xk
˙
´1´ λkλk‖xk‖2.
Reintroducing the market index t, the corresponding artificial regressors are
Kjtmm “
ˆ
Xjt,m
2´
1´ S0tλk1´ S0t
etm
˙
Xjt,m
λk`
1´ λkλk
Xkt,m
ˆ
Xkt,m ´2Xjt,m
λk
˙
and for any off-diagonal term n ă m,
Kjtmn “ Xjt,mXjt,n´
1´ S0tλk1´ S0t
etmXjt,n ` etnXjt,m
λk`2
1´ λkλk
ˆ
Xkt,mXkt,n ´Xkt,mXjt,n ` Xkt,nXjt,m
λk
˙
where as in section 3, etm “řJj“1 SjtXjtm.
If the λk parameters are known, then our procedure becomes:
Algorithm 2. FRAC estimation of the two-level nested logit BLP model
1. on every market t, augment the market shares from ps1t, . . . , sJtq to pS0t, S1t, . . . , SJtq
2. for every product-market pair pj P J , tq :
(a) compute the market-share weighted covariate vector et “řJl“1 SltXlt and
the within-nest weighted average covariate vector
Xkpjq,t “ÿ
lPNkpjq
SltSNkpjq,t
Xlt
where kpjq is the nest that variety j belongs to.
(b) for every pm,nq in I, compute the “artificial regressor”
Kjtmm “
ˆ
Xjt,m
2´
1´ S0tλkpjq1´ S0t
etm
˙
Xjt,m
λkpjq`
1´ λkpjqλkpjq
Xkpjq,t,m
ˆ
Xkpjq,t,m ´2Xjt,m
λkpjq
˙
and for any off-diagonal term n ă m,
Kjtmn “ Xjt,mXjt,n ´
1´ S0tλkpjq1´ S0t
etmXjt,n ` etnXjt,m
λkpjq
` 21´ λkpjqλkpjq
ˆ
Xkpjq,t,mXkpjq,t,n ´Xkpjq,t,mXjt,n ` Xkpjq,t,nXjt,m
λkpjq
˙
.
28
(c) define
yjt “ logSNkpjq,t
S0t
` λkpjq logSjt
SNkpjq,t
3. run a two-stage least squares regression of y on X and K, taking as instruments
a flexible set of functions of Z
4. (optional) run a three-stage least squares (3SLS) regression across the T markets
stacking the J equations for each product with a weighting matrix equal to the
inverse of the sample variance of the residuals from step 43.
If the parameters λ are not known, then things are slightly more complicated:
the formulæ cannot be made linear in λ, and there are no corresponding artificial
regressors. Estimation of pβ,Σ,λq requires numerical minimization over the λ.
More general distributions in the GEV family could also be accommodated. As
the nested logit example illustrates, there is a cost to it: the approximate model
becomes nonlinear in some parameters18. Note however that if there is reason to
believe that the true distribution is close to the multinomial logit (say λ » 1 in the
example above), then one can take expansions in the same way we did for the random
coefficients and use a 2SLS estimate again.
7 Monte Carlo Analysis of the Small-Sample Per-
formance of our Estimator
This section presents the results of a Monte Carlo study of an aggregate discrete choice
demand system with random coefficients. It compares the finite sample performance
of our estimator of the parameters to estimators computed using the mathematical
programming with equilibrium constraints (MPEC) approach recommended by Dube,
Fox and Su (2012). We also show results demonstrating some of the “robustness”
of our estimation procedure to assumptions about the distribution of the random
coefficients. Specifically, we find that even if the distribution of random coefficients is
misspecified, our procedure still yields very good estimates of the means and variances
of the random coefficients.
18Technically, condition C1 in the introduction still holds, but conditions C2 and C3 do not.
29
The basic set-up of our Monte Carlo study follows that in Dube, Fox and Su
(2012). It is a standard static aggregate discrete choice random coefficients demand
system with T “ 50 markets and J “ 25 products in each market, and K “ 3
observed product characteristics. Following Dube, Fox, and Su (2012), let Mt denote
the mass of consumers in market t “ 1, 2, . . . , T . Each product is characterized by the
vector pX 1jt, ξjt, pjtq
1, where Xjt is a K ˆ 1 vector of observable attributes of product
j “ 1, 2, . . . , J in market t, ξjt is the vertical product characteristic of product j
in market t that is observed by producers and consumers, but unobserved by the
econometrician, and pjt is the price of product j in market t. Collect these variables
for each product into the following market-specific variables: Xt “ pX11t, . . . ,X
1Jtq
1,
ξt “ pξ1t, ξ2t, . . . , ξJtq1, and pt “ pp1t, p2t, . . . , pJtq
1.
The conditional indirect utility of consumer i in market t from purchasing product
j is
uijt “ β0 `X1jtβ
xi ´ β
pi pjt ` ξjt ` εijt
The utility of the j “ 0 good, the “outside” good, is equal to u0jt “ εi0t. Each
element of βxi “ pβxi1, . . . , β
xiKq
1 is assumed to be drawn independently from Npβxk , σ2kq
distributions, and each βpi is assumed to be drawn independently from Npβp, σ2pq. We
denote βi “ pβxi1, βpi q
1.
We collect all parameters into
θ “ pβ0, βx1 , . . . , βxK , βp, σ
21, . . . , σ
2K , σ
2pq1.
Our simulations all have mean coefficients
pβ0, βx1 , βx2 , β
x3 , βpq “ p´1, 1.5, 1.5, 0.5,´1q.
and varying variances pσ21, σ
22, σ
23, σ
2pq. We also experiment with varying βp.
To compute the market shares for the J products, we assume that the εijt are
independently and identically distributed Type I extreme value random variables, so
that the probability that consumer i with random preferences βi purchases good j in
market t is equal to
sijtpXt,pt, ξt|βiq “exppβ0 `X 1
jtβxi ´ β
pi pjt ` ξjtq
1`řJk“1 exppβ0 `X 1
ktβxi ´ β
pi pkt ` ξktq
30
We compute the observed market shares for all goods in market t by drawing ns “
1, 000 draws pζiktq from four Np0, 1q random variables and constructing 1, 000 draws
from βi|θ as follows:
βxikt “ βxk ` σkζikt and βpit “ βp ` σpζipt.
We then use these draws to compute the observed market share of good j in market
t as:
sjtpXt,pt, ξt|θq “1
ns
nsÿ
i“1
sijtpXt,pt, ξt|βiq
given the vectors Xt, pt, and ξt for each market t.
Consistent with the experimental design in Dube, Fox and Su (2012), we generate
the values of Xt, pt, ξt and a vector of 6 instruments Zjt as follows. First we draw
Xt for all markets t “ 1, 2, . . . , T from a multivariate normal distribution:»
—
–
x1j
x2j
x3j
fi
ffi
fl
„ N
¨
˚
˝
»
—
–
0
0
0
fi
ffi
fl
,
»
—
–
1 ´0.8 0.3
´0.8 1 0.3
0.3 0.3 1
fi
ffi
fl
˛
‹
‚
The price of good j in market t is equal to
pjt “ |0.5ξjt ` ejt ` 1.1px1j ` x2j ` x3jq|,
where ejt „ Np0, 1q, distributed independently across products and markets. The
ξjt are Np0, σ2ξ q random variables drawn independently across products and markets
for different values of σ2ξ described below. The data generating process for the vector
of instruments is:
zjtd „ Up0, 1q ` 0.25pejt ` 1.1px1j ` x2j ` x3jqq
where d “ 1, . . . , 6.
For a specified value of the parameter vector θ, following this process for T “ 50
markets yields the dataset for one Monte Carlo draw.
7.1 MPEC Approach
The MPEC approach solves a nonlinear minimization problem subject to nonlin-
ear equilibrium constraints. The first step of the estimation process constructs the
31
following instrumental variables for all the products in all the markets. There are
42 instruments in total; they are constructed from product characteristics xj and
excluded instruments zjt:
1, xkj, x2kj, x
3kj, x1jx2jx3j, zjtd, z
2jtd, z
3jtd, zjtdx1j, zjtdx2j,
6ź
d“1
zjtd
Let W denote this pJ ˆ T q ˆ 42 matrix of instruments. In our case J ˆ T “ 1, 250
since J “ 25 and T “ 50.
The MPEC approach solves for θ by minimizing
η1W pW 1W q´1W 1η
subject to the “equilibrium constraints”
spη,θq “ S
where S is the vector of observed market shares computed as described above given
the values of xt, pt and ξt and η is a pJ ˆ T q ˆ 1 vector defined by the following
equation:
sjtpη,θq “1
Ns
Nsÿ
i“1
exppθ1 ` x1jβx1i ` x2jβ
x2i ` x3jβ
x3i ` pjtβ
pi ` ηjtq
1`řJk“1 exppθ1 ` x1kβx1i ` x2kβ
x2i ` x3kβ
x3i ` pktβ
pi ` ηktq
where each pβxi , βpi q is a random draw from the following normal distribution:
N
¨
˚
˚
˚
˝
»
—
—
—
–
θ2
θ3
θ4
θ5
fi
ffi
ffi
ffi
fl
,
»
—
—
—
–
θ6 0 0 0
0 θ7 0 0
0 0 θ8 0
0 0 0 θ9
fi
ffi
ffi
ffi
fl
˛
‹
‹
‹
‚
Note that θ1 (like β0) is not allowed to be random. For purposes of estimation, we
set Ns “ 1, 000. For each Monte Carlo simulation, we start the optimization with
the following initial point: true values for θ, and a vector of zeros for the η vector.
Clearly, these starting values are not feasible for empirical researchers; we use them
to maximize the chances that the MPEC estimation will converge to a solution.
32
7.2 Our 2SLS Approach
Our 2SLS approach resorts to a slight modification of the standard linear 2SLS esti-
mator to account for the fact that the estimates of the σ2k and σ2
p cannot be negative.
First, we construct the instrumental variables as the MPEC approach. We then con-
struct the artificial regressors K1, K2, K3, Kp of Theorem 2 for each product in each
market by applying
Xit “
Jÿ
k“1
xikSkt
Kjti “ xijpxij{2´ Xitq
for i “ 1, 2, 3, p.
The next step performs an instrumental variable regression of yjt “ logpSjt
S0tq on
1, x1, x2, x3, x4, K1, K2, K3, Kp using all 42 instruments. If any coefficient for the
last four variables is negative, we set that coefficient to 0 and rerun the regression
without that variable. We iterate this process until all the coefficients are positive,
or all four variables are excluded from the instrumental variables regression.
In addition to this standard 2SLS estimator, we compute a corrected estimator as
explained in section 6.3.3. To evaluate it, we replace yjt, the dependent variables for
2SLS estimates, with yjt ´ pξ2,jt ´ ξ8,jtq, where
• ξ2,jt is the residual from our initial 2SLS estimation procedure
• ξ8,jt is the value of ξjt that results from solving the equation stpξt, θq “ St,
where θ is the initial 2SLS estimate of θ.
We found that it worked as well as and often better than the bootstrap, at a con-
siderably lower computational cost. We also experimented with using the opti-
mal instruments, obtained by a kernel regression of X and of K on the variables
x1, x2, x3, z1, . . . , z6.
7.3 Pseudo True Values for the 2SLS Approach
As explained earlier, the 2SLS estimator is not consistent for the true parameter
values, as it estimates an approximate model. We constructed estimates of the pseudo
33
true values to which our 2SLS estimators converge by simulating their probability
limit. A first approach increases the number of markets and computes our 2SLS
estimates for this large number of markets. The second approach computes estimates
of the population values of the moments of our 2SLS estimator.
7.3.1 Increasing-number-of-markets Approach
For each simulation, we keep the size and distribution of product characteristics for
each market fixed, but increase the number of markets. For each scenario, we cal-
culate the pseudo true value (and its standard error) by 20 simulations of 100,000
markets. Note that across different simulations, we generate different product char-
acteristics. Also, when calculating market shares, we use different random draws of
βi across different simulations, but the same random draws of βi within a simulation.
Estimates are calculated by the sample mean of the 20 simulations. Standard errors
are calculated by the sample standard errors of the 20 simulations.
7.3.2 Moment-based Approach
We can also calculate the pseudo true values in a different way. We first run the first
stage projection: Π “ pW 1W q´1W 1X for each simulation, where W is our matrix of
instruments and X is our matrix of regressors. We then take the average across all
the simulations to get our estimate of the population value of Π. Then in the second
stage, we calculate pWΠq1X and pWΠq1Y for each simulation, and then take averages
across all the simulations to get two matrices A and B. The final estimate is then
A´1B. In short, we have
Π “ Eall simulationsrpW1W q´1W 1Xs
A “ Eall simulationsrpWΠq1Xs
B “ Eall simulationsrpWΠq1Y s
Estimate “ A´1B
With this method, we only have the estimates but cannot get the standard errors.
We used 1000 simulations of 10,000 markets.
34
7.4 Monte Carlo Simulation Results
We used the SNOPT optimization package available from the Stanford Systems Op-
timization Laboratory to solve the nonlinear optimization problems for the MPEC
estimator. The software employs a sparse sequential quadratic programming (SQP)
algorithm with limited-memory quasi-Newton approximations to the Hessian of the
Lagrangian.
We run simulations for 9 scenarios obtained by setting three values for the variance
of the product random effects: σ2ξ “ Varpξq “ 0.1, 0.5, 1 and three values for the vector
of variances of the coefficients βi “ pβ0, βx1i, β
x2i, β
x3i, β
pi q1:
Varpβiq “ p0, 0.1, 0.1, 0.1, 0.05q, p0, 0.2, 0.2, 0.2, 0.1q, p0, 0.5, 0.5, 0.5, 0.2q.
Note that the square roots of the elements of Varpβjq represent the relative values of
the scale parameter σ of models 1, 2, and 5.
It is worth noting here that we explored other scenarii in which MPEC often failed
to converge, even though we are starting it from the true values of the parameters. In
particular, larger variances of ξ are problematic. It is also the reason why we reduced
the highest value of σ2p from 0.25 to 0.2.
All the other parameter specifications are as described above.
7.4.1 Distribution of the Estimates
We summarize the estimation results in Tables 1 to 9, where density plots are grouped
by parameter for all scenarii. These plots suggest that if the researcher is interested
in a precise estimate of the mean of the random coefficients, then using our 2SLS
approach does not imply any significant bias or loss in efficiency relative to the MPEC
approach.
The MPEC approach appears to dominate the 2SLS approach for the variance of
the random coefficients. The 2SLS estimators of the variances have a downward bias
that increases with the variance of the random coefficients. However, larger values of
the variance of ξjt do seem to improve the performance of the 2SLS estimator of the
variance of the random coefficients.
We also implemented the Petrin and Train (2010) control function approach. To
35
do this, we first run a linear regression of the price pjt on all 42 instruments. We
denote the residuals from this regression by εjt and we include them as an additional
covariate in the mean utility of product j. The middle panel of Tables 1 to 9 presents
the distribution of estimated parameters for the case that Varpξq “ 0.5. The middle
panels of Tables 8 and 9 shows the results of this experiment for the mean and variance
of the price coefficients in the central case Varpξq “ 0.5. The control function approach
exhibits substantial bias in the estimates of both means and variances of the random
price coefficients19.
7.4.2 Starting Values
We are giving a big advantage to MPEC in our comparisons, since we allow the
algorithm to start from the true values of the parameters. This is of course infeasible
in practice. With this initial boost, MPEC converges 100% of the time, after 1,030
iterations on average; the minimization takes 110 seconds on average. Our 2SLS
approach provides a more realistic alternative, in which we start MPEC from the
results of our 2SLS regression. This appears to work very well: MPEC converges
after an average 125 seconds and 1,280 iterations, again with a 100% success rate.
The resulting estimates are very close to those obtained when staring from the true
values: the difference is between 10´6 and 10´7.
These results are very encouraging for the use of our approach to find very good
starting values for the MPEC and nested-fixed point estimation procedures. Given
that 2SLS takes no time at all, we would strongly recommend running it before a
more sophisticated algorithm.
7.4.3 Pseudo-true Values
Tables 10 and 11 demonstrate that for most scenarii and coefficients, the pseudo true
values implied by our 2SLS procedure are not substantially different from the true
values. Based on these results, it is difficult to argue that a researcher would draw
conclusions from 2SLS estimates that differ in an economically or even statistically
meaningful way from those obtained with MPEC estimates.
19Other experiments (not reported here) show that this bias increases with the variance of ξjt, as
one would expect since the control function is inexact.
36
7.4.4 Price Elasticities
Based on the parameter estimates, we can estimate the own price elasticity of the
demand for each product. The graphs in table 12 plot the distribution of the difference
between the true price elasticity and the estimated price elasticity for the MPEC
approach, our standard 2SLS approach, and our corrected 2SLS estimator. For space
reasons, we only presents the results for five products: numbers 5, 10, 15, 20, 25.
Our simulations have variances pσ21, σ
22, σ
23, σ
2pq “ p0, 0.4, 0.4, 0.4, 0.2q with Varpξq “ 1.
Table 12 demonstrates that our procedure recovers nearly identical mean own-price
elasticities for products as the MPEC approach, although the spread for our estimates
is slightly larger than in the MPEC approach.
We also performed a set of simulations (with the same variances) to determine
if changing the true value of the price coefficient βp changes the performance of the
estimators. The results in tables 13 and 14 reinforce our previous conclusions about
the 2SLS approach. For a range of values of the mean value of the price coefficient,
our approach introduces minimal bias in the estimates of the means of the random
coefficients. The estimates of the variances of the random coefficients for our 2SLS
estimate continue to be downwards biased in general, but the bias is smaller for larger
price coefficients.
7.4.5 Variable Selection Tests
Researchers in empirical IO have little guidance on the list of characteristics X they
should include, or how to specify the matrix Σ. Experimenting with different speci-
fications is costly with the usual estimators. Our 2SLS approach, on the other hand,
makes variable selection very easy. We can decide whether a characteristic simply
by testing whether the corresponding covariate can be dropped from the estimat-
ing equation; and to decide whether we should allow for a random coefficient, we
only need to test whether the associated artificial regressor can be dropped from the
equation. We experimented with this approach to detecting random coefficients by
setting βx1 “ σx1 “ 0 in the data generating process and applying standard tests that
that the covariate x1 and/or the artificial regressor K1 has a zero coefficient in the
2SLS regression. We also performed this test using our corrected 2SLS estimates.
Tables 15 to 20 give the probability that the null hypothesis is not rejected, where
37
the null hypothesis is
• βx1 “ 0 (Tables 15 and 16)
• σx1 “ 0 (Tables 17 and 18)
• βx1 “ σx1 “ 0 (Tables 19 and 20).
The row labelled “2SLS with heteroskedasticity-robust standard error” is our 2SLS
estimate, using a standard heteroskedasticity-robust covariance matrix to compute
standard errors. The row labelled “GLS estimator and standard errors” uses Cragg’s
(1983) generalized least squares estimator and his recommended standard error es-
timates. The row labelled “2SLS with clustered standard errors” uses our 2SLS
estimates with standard errors clustered at market level.
Since the null hypothesis is true, each row in Tables 15-20 would ideally contain
0.99, 0.95, and 0.90. Clearly, our test rejects the null too often. In this particu-
lar application, this is probably better than the alternative: better to include more
variables and lose some efficiency than to incur bias by leaving them out. The size
distortion is smaller for tests on the means (Tables 15 and 16); it is also smaller when
we use corrected estimates. The clustered standard error estimates appear to have
the largest size distortions. On the whole, we take this to suggest that demonstrate
that our estimator can be used to good effect in order to decide which coefficients
should be modelled as random.
7.5 Lognormal Distribution for β
As explained in section 5.2, our estimating equation is the same whether the dis-
tribution of the random coefficients is normal or not. To illustrate this, we modify
the data-generating process so that the consumer preference parameters βi have a
lognormal distribution:
βi “ βiεi
βi “ p1, 1.5, 1.5, 0.5, 1q
lnpεiq „ Np´0.5σ2, σ2q.
38
We study several cases, with σ “ 0.3, 0.4, 0.5 and ξjt „ Np0, 0.1q. The rest of the
specification is as before. Lognormality induces significant skewness and kurtosis into
the distribution of the random coefficients. The standard 2SLS approach gives us
estimates of the first and second moments. We can also introduce the additional
artificial regressors T of section 6.2, either to control for skewness or to estimate it.
We experimented with both possibilities. Each plot in Tables 21, 22, and 23 shows
the distributions when we use only X and K (“only include 2nd moment”) and when
we add T (“include third moment”). These three tables report the distributions of
the estimates of the first, second, and third moments of β1, β2, β3, and βp. Table 24
provides the corresponding summary statistics.
For a variety of values for the parameter σ of our lognormal distribution, the
2SLS estimates are just as good as they were in the normal setup. The additional
information in the third moment of the random coefficients does not appreciably
increase the precision in our estimates of the means and variances of the random
coefficients. In fact, for some of the coefficients, including the third-order artificial
regressors T leads to significantly less efficient estimates. This is likely due to the fact
that our procedure has a difficult time estimating the third moment of the random
coefficients, as Table 23 shows.
7.5.1 Correction and Kernel
Our paper suggested two potential improvements to the standard 2SLS regression:
our two-step asymptotic correction described in section 6.3.3, and using a kernel
regression to estmate the optimal instruments EpX|Zq and EpK|Zq. We compare
both methods, when coefficients are normal with variances p0, 0.5, 0.5, 0.5, 0.2q and
when they are lognormally distributed with σ “ 0.4 for lnpεiq „ Np´0.5σ2, σ2q. In
both cases we took V arpξq “ 0.1.
Tables 25 and 26 plot the distributions of the estimators. They suggest that our
correction helps reduce the bias, both for the means and the variances. This holds
whether the random coefficients are normally or log-normally distributed. Using
kernel regressions to approximate the optimal instruments appears to slightly reduce
both the bias and the variance of some of the estimates.
39
Concluding Comments
Our FRAC estimation procedure applies directly to the random coefficients demand
models commonly used in empirical IO. For the most part, our Monte Carlo results
confirm the findings from the expansions. The 2SLS approach yields reliable estimates
of the parameters of the model and of economically meaningful quantities such as price
elasticities; and it does so at a very minimal cost. It is “robust” to variations on the
distribution of the random coefficients. In addition, it provides straightforward tests
that help in variable selection, especially as a guide to determine which coefficients
in the demand system should be modeled as random.
Some of our simulation results point to directions for future research. We hope to
report more general analytical results that illuminate these findings.
References
Amemiya, T. (1975), “The nonlinear limited-information maximum-likelihood
estimator and the modified nonlinear two-stage least-squares estimator,” Journal of
Econometrics, 3, 375–386.
Armstrong, T. (2016), “Large Market Asymptotics for Differentiated Product
Demand Estimators With Economic Models of Supply”, Econometrica, 84, 1961-1980.
Berry, S. (1994), “Estimating Discrete Choice Models of Product Differentia-
tion”, Rand Journal of Economics, 23, 242-262.
Berry, S., Levinsohn, J., and A. Pakes (1995), “Automobile Prices in Market
Equilibrium”, Econometrica, 60, 889-917.
Campioni, G. (2018), “Nonparametric Demand Estimation in Differentiated
Products Markets”, mimeo Yale.
Chesher, A. (1991), “The Effect of Measurement Error”, Biometrika, 78, 451–
462.
40
Chesher, A. and J. Santos-Silva (2002), “Taste Variation in Discrete Choice
Models”, Review of Economic Studies, 69, 147–168.
Cragg, J.G. (1983), “More Efficient Estimation in the Presence of Heteroscedas-
ticity of Unknown Form”, Econometrica, 51, 3, 751–763.
Crawford, G., and A. Yurukoglu (2012), “The Welfare Effects of Bundling
in Multichannel Television Markets,” American Economic Review, 102, 643-685.
Dube, J.-P., Fox, J., and C.-L. Su (2012), “Improving the Numerical Per-
formance of BLP Static and Dynamic Discrete Choice Random Coefficients Demand
Estimation”, Econometrica, 80, 2231-2267.
Fox, J., Kim, K., Ryan, S.. and P. Bajari (2011), “A simple estimator for
the distribution of random coefficients”, Quantitative Economics, 2, 381–418.
Gandhi, A. and J.-F. Houde (2016), “Measuring Substitution Patterns in
Differentiated Products Industries”, mimeo.
Harding, M. and J. Hausman (2007), “Using a Laplace Approximation to
Estimate the Random Coefficients Logit Model by Nonlinear Least Squares”, Inter-
national Economic Review, 48, 1311–1328.
Ho, K. and R. Lee (2017), “Insurer Competition in Health Care Markets”,
Econometrica, 85, 379-417.
Horowitz, J. and L. Nesheim (2018), “Using Penalized Likelihood to Select
Parameters in a Random Coefficients Multinomial Logit Model”, mimeo UCL.
Kadane, J. (1971), “Comparison of k-class Estimators when the Variance is
Small”, Econometrica, 39, 723–737.
Ketz, P. (2018), “On Asymptotic Size Distortions in the Random Coefficients
Logit Model”, mimeo Paris School of Economics.
Kristensen, D. and B. Salanie (2017), “Higher-order Properties of Approxi-
mate Estimators”, Journal of Econometrics, 189–208.
Lee, J. and K. Seo (2015), “A computationally fast estimator for random
coefficients logit demand models using aggregate data”, Rand Journal of Economics,
46, 86-102.
Reynaert, M. and F. Verboven (2014), “Improving the Performance of Ran-
dom Coefficients Demand Models: The Role of Optimal Instruments”, Journal of
41
Econometrics, 179, 83-98.
Robinson, P. (1988), “The Stochastic Difference between Econometric Statis-
tics”, Econometrica, 56, 531–548.
Su, C.-L. and K. Judd (2012), “Constrained Optimization Approaches to
Estimation of Structural Models”, Econometrica, 80, 2213-2230.
White, H. (1982), “Instrumental Variables Regression with Independent Ob-
servations”, Econometrica, 50(2), 483-499.
42
A Proof of Theorem 1
Proof. We start from (2); we drop the market index t and the bold letters. Since we
now denote Xε “ σx ¨ v, we can rewrite (2) in the standard model as
Sj “ EvexppXjβ ` σxj ¨ v ` ξjq
1`řJk“1 Sk exppXkβ ` σxk ¨ v ` ξkq
.
Given that
ξj “ logSjS0
´Xjβ ` a1jσ ` a2jσ2
2`Opσ3
q,
we get
Sj “ EvSj exp
´
σ pxj ¨ v ` a1jq ` a2jσ2
2`Opσ3q
¯
S0 `řJk“1 Sk exp
`
σ pxk ¨ v ` a1kq ` a2kσ2
2`Opσ3q
˘
Eliminating Sj gives
1 “ Evexp
´
σ`
x1jv ` a1j˘
` σ2
2a2j `Opσ
3q
¯
S0 `řJk“1 Sk exp
`
σ px1kv ` a1kq `σ2
2a2k `Opσ3q
˘ . (13)
In this form, Theorem 1.(i) is obvious since only the vectors xk and market shares Sk
enter the system of equations.
Now use the notation eSZ ”řJk“1 SkZk and Zj “ Zj ´ eSZ to rewrite (13) as
0 “ Ev
˜
Vj1` eSV
¸
(14)
where Vj ” fj ` αj ` fjαj, with
fj ” exppσxj ¨ vq ´ 1 “ σxj ¨ v `σ2
2pxj ¨ vq
2`OP pσ
3q
and
αj “ exppa1jσ ` a2jσ2{2`Opσ3
qq ´ 1 “ a1jσ ` pa2j ` a21jqσ2
2`Opσ3
q.
We note that fj is OP pσq and αj is Opσq, so that Vj is also OP pσq.
Now expanding (14) gives
Opσ3q “ EvVj ´ Ev
´
VjpeSV q¯
(15)
“ Evfj ` αj (16)
` {αjEvfj ´ peSαqEvfj ´ αjEvpeSfq ´ αjpeSαq ´ EvfjpeSfq. (17)
43
Only the terms on line (16) can be of order 1 in σ. But using Evv “ 0 and Evpxj ¨
vqpxk ¨ vq “ xj ¨xk gives us Evfj “σ2
2‖xj‖2`Opσ3q. Therefore the only term of order
1 is in αj “ pa1jσ`Opσ2q, and we must have pa1j “ 0. We note that the “hat” operator
is linear and invertible:
Lemma 1. If Zj “ Wj for all j and S0 ă 1, then Zj “ Wj.
Proof. Zj ´ eSZ “ Wj ´ eSW implies Zj “ Wj `λ, where λ “ eSZ ´ eSW . But then
eSZ “ eSW ` eSλ “ p1´ S0qλ, so that λ “ p1´ S0qλ “ 0.
Applying the lemma gives a1j “ 0. As a consequence, αj “ a2jσ2{2`Opσ3q; and
all terms on line (17) except the last one are of order at least 3 in σ. Since
Evfj ` αj “σ2
2z‖xj‖2 `
σ2
2pa2j `Opσ
3q
and
EvfjpeSfq “ σ2Evpxj ¨ vqppeSxq ¨ vq `Opσ3q “ σ2
pxj ¨ peSxqq `Opσ3q
applying the lemma again gives us p‖xj‖2 ` a2jq{2´ xj ¨ peSxq “ 0.
Finally, if the distribution of v is symmetric around 0 changing σ to ´σ in (2)
must leave all market shares unchanged; therefore all expansions can only contain
even-degree terms in σ.
B Detailed Examination of the Mixed Logit
The standard binary model is simply a mixed logit. Applying Theorem 1 with J “ 1
and using S0 ` S1 “ 1, we obtain
a21 “ p2S1 ´ 1q‖x1‖2
and K1 “ p1{2´ S1qX1X11. Therefore
ξ1 “ logS1
S0
´X1β ´
ˆ
1
2´ S1
˙
Tr ΣX1X11 `Opσ
kq
where k “ 3 in general, and k “ 4 if the distribution of ε is symmetric around zero.
44
The presence of the term p1{2´ S1q in this formula is a consequence of the sym-
metry of the distribution of v around 0 and of the logistic distribution around 0.
Taken together, this implies that market shares around one half vary very little with
σ. The random variation in tastes can only be identified from nonlinearities in the
market shares; but since the cdf of the logistic has an inflexion point when its value
is one half, market shares are essentially linear around that point. It is easy to check
that this is specific to the one-product case; when J ą 1, the mixed multinomial logit
does not face any such difficulty.
Let us focus for simplicity on the case when random variation in preferences is
uncorrelated across covariates: Σ is the nXˆnX diagonal matrix with elements Σmm.
Then given instruments such that E pξ1|Zq “ 0, the approximate model is
E
˜
logS1
S0
´X1β ´
ˆ
1
2´ S1
˙ nXÿ
m“1
ΣmmX21m
ˇ
ˇ
ˇZ
¸
“ 0. (18)
B.1 Identification
The form of the estimating equation holds interesting insights about identification.
First note that the optimal instruments are
fpZq “ E pX1|Zq , E
ˆˆ
1
2´ S1
˙
X21 |Z
˙
where X21 is the vector with components X2
1m. The asymptotic variance-covariance
matrix of our estimator θ is given by the usual formula:
T Vasθ » J´1V pξ1fpZqqJ
´1,
where
J “ E
ˆˆ
X1,
ˆ
1
2´ S1
˙
X21
˙
fpZq
˙
.
The identifying power of the (approximate) model relies on the full-rank of the ma-
trix J . Suppose for instance that after projecting (via nonparametric regression)
the regressors on the instruments, the residual variation in the artificial regressor
p1{2 ´ S1qX21m is very well explained in a linear regression on the other covariates.
Then the estimate of Σmm will be very imprecise, and random taste variation on the
characteristic X1m is probably best left out of the model. Of course, this can be
diagnosed immediately by looking at the precision of the 2SLS estimates.
45
B.2 Higher-order terms
It is easy to program a symbolic algebra system to compute higher-order terms alj for
l ą 2. We show here how to compute the fourth-order term in the mixed logit model
by hand. This will also illustrate the “robustness” of our method to distributional
assumptions.
Assume that ε has a distribution that is symmetric around zero, and that its com-
ponents are independent of each other with variances Σmm and fourth-order moments
km. As before, we assume that Σmm is of order σ2 and km is of order σ4. We also
assume that we can take expansions to order L ě 5.
Since the distribution is symmetric, we already know that
ξ1 “ logS1
S0
´X1β `a212σ2`a4124σ4`Opσ6
q.
Define Lptq “ 1{p1 ` expp´tqq the cdf of the logistic distribution. Note that L1 “
Lp1´ Lq, and that higher-order derivatives follow easily:
L2 “ Lp1´ Lqp1´ 2Lq
Lp3q “ Lp1´ Lqp1´ 6L` 6L2q
Lp4q “ Lp1´ Lqp1´ 2Lqp1´ 12L` 12L2q.
Since the market share of good 1 is
S1 “ EεL pX1pβ ` εq ` ξ1q
we obtain, much as in Appendix A,
S1 “ EεL
ˆ
logS1
S0
`X1ε` α2σ2` α4σ
4`Opσ6
q
˙
where we defined αl “ al1{l! for l “ 2, 4.
Let a 0 subscript indicate that we take the value and derivatives of Lptq at t “
logpS1{S0q. Defining upεq “X1ε` α2σ2 ` α4σ
4 `Opσ6q and expanding gives
L
ˆ
logS1
S0
` u
˙
“ L0 ` L10u`
L202u2 `
Lp3q0
6u3 `
Lp4q0
24u4 `Opu5q.
46
Incorporating L0 “ S1, L10 “ S1p1´ S1q, up to L
p4q0 gives
S1 “ Eε
ˆ
S1 ` S1p1´ S1qupεq ` S1p1´ S1qp1´ 2S1qupεq2
2
` S1p1´ S1qp1´ 6S1 ` 6S21qupεq3
6` S1p1´ S1qp1´ 2S1qp1´ 12S1 ` 12S2
1qupεq4
24`Opupεq5q
˙
;
dividing by S1p1´ S1q yields
Eεu`p1´2S1qEεu2{2`p1´6S1`6S2
1qEεu3{6`p1´2S1qp1´12S1`12S2
1qEεu4{24 “ EεOpu
5q.
(19)
Finally, up to order 6 in σ:
Eεu “ α2σ2` α4σ
4
Eεu2“ σ2Eε pX1εq
2` α2
2σ4“
nXÿ
m“1
Σmmx21m ` α
22σ
4
Eεu3“ 3α2σ
4Eε pX1εq2“ 3α2
nXÿ
m“1
Σmmx21m
Eεu4“ σ4Eε pX1εq
4“
nXÿ
m“1
kmx41m.
Regrouping terms in σ2 in (19) confirms that
α2σ2“ pS1 ´ 1{2q
nXÿ
m“1
Σmmx21m,
which we knew from Theorem 1. The terms in σ4 give us
α4σ4“ α2
2σ4pS1 ´ 1{2q ´ α2σ
2p1´ 6S1 ` 6S2
1q
nXÿ
m“1
Σmmx21m{2
´ p1´ 2S1qp1´ 12S1 ` 12S21q
nXÿ
m“1
kmx41m{24.
This simplifies to
α4σ4“
ˆ
1
2´ S1
˙
ˆ
¨
˝
ˆ
1
4´ 2S1p1´ S1q
˙
˜
nXÿ
m“1
Σmmx21m
¸2
´
ˆ
1
12´ S1p1´ S1q
˙ nXÿ
m“1
kmx41m
˛
‚.
47
This formula may not seem especially enlightening, but it shows several impor-
tant points. First, terms of higher orders can be computed without much difficulty.
Second, each additional term adds information on lower-order moments (here σ2m), as
well as on the moments of higher order (here km). The model remains linear in the
highest order moments; here for km we have new artificial regressorsˆ
1
2´ S1
˙ˆ
1
12´ S1p1´ S1q
˙
x41m.
On the other hand, the higher-order expansions introduce nonlinear functions of the
lower-order moments, here represented by
ˆ
1
2´ S1
˙ˆ
1
4´ 2S1p1´ S1q
˙
˜
nXÿ
m“1
Σmmx21m
¸2
,
and the model is not linear in these parameters any more. This could be dealt with
in several ways: by nonlinear optimization (of a very simple kind), or by iterative
methods. In any case, we will see in our simulations that stopping with the second-
order expansion often gives results that are already very reliable.
Finally, while the estimator based on the second-order expansion is “robust” to
any (well-behaved) distribution, the estimator based on this fourth-order expansion
also assumes symmetry: a skewed distribution would generate terms in σ3. Making
more assumptions changes the form of the artificial regressors. To illustrate this,
consider a mixed logit with one covariate only (nX “ 1). The expansion to order 2L
can be written
ξ1 “ logS1
S0
´ βX1 `
Lÿ
k“1
tkpS1q`
Σ11X21
˘k`Opσ2L`2
q.
Assume that ε has normal kurtosis. Then k1 “ 3Σ211 and we find the simpler formula
t2 “ α4 “
ˆ
1
2´ S1
˙
S1p1´ S1q.
Specializing further, Figure 1 in the main text plots the terms tkpS1q for k “
1, 2, 3, 4 as the market share goes from zero to one when ε is Gaussian.
If the components of ε are independently distributed and have third moments
ps1, . . . , snXq, then it is easy to see that an additional term
ˆ
S1p1´ S1q ´1
6
˙ nXÿ
m“1
smX31m
48
enters the expansion. To test for skewness on covariate m, one could simply test that
the regressor`
S1p1´ S1q ´16
˘
X31m can be omitted.
B.3 Beyond Logit and Gaussian
The properties of the logistic function may seem to have been more central to our
calculations; but in fact they are quite ancillary. Suppose that ui1t ´ ui0t has some
distribution with cdf Q instead of L. While the derivatives of Q may not obey the
nice polynomial formulæ we used for L, it is still true that if Q is invertible and
smooth then we can define functions Fk by
Qpkqptq “ FkpQptqq.
This is all we need to carry out the expansions. One can show for instance that the
factor pS1 ´12q that appears in (18) just needs to be replaced with
´F2pS1q
2F1pS1q.
Take for instance a mixed binary model with such a general distribution for u1 ´
u0, and a distribution of the random coefficient on the single covariate X1 that has
successive moments 0,Σ, µ3, µ4. Then it is easy to derive the following fourth-order
expansion, which could perhaps serve as the basis for a semiparametric estimator:
ξ2 “ logS1
S0
´ βX1 ´F2pS1q
F1pS1qX2
1Σ
´F3pS1q
F1pS1qX3
1µ3
`F2pS1q
F1pS1q
˜
3F3pS1q
F1pS1q´
ˆ
F2pS1q
F1pS1q
˙2¸
X41Σ2
´F4pS1q
F1pS1qµ4X
41 `Opσ
5q.
This can be extended in the obvious way to make v heteroskedastic (just replace Σ
with Epε2|X1q and µm with Epεm|X1q in the above formula.)
49
C The Two-level Nested Logit
In the unmixed model (σ “ 0) the mean utility of alternative j is Uj “ Ik`λk logSj|Nk
if j P Nk, with Ik ” logpSNk{S0q and Sj|Nk
” Sj{SNk.This gives
ξ0j “ ´Xjβ ` logpSNk{S0q ` λk logSj|Nk
.
We write (imposing a1j “ 0 from the start as this is a general property of models
with Ev “ 0)
Ujpvq “ logpSNk{S0q ` λk logSj|Nk
` σxj ¨ v `σ2
2a2j
and
exppIkpvq{λkq “ÿ
jPNk
exppUjpvq{λkq “ pSNk{S0q
1{λk fkpvq
where we denote Xk “ř
jPNkSj|Nk
Xj and
fjpvq “ exp
ˆ
σ
λk
´
xj ¨ v ` σa2j2
¯
˙
» 1`σ
λkpxj ¨ vq `
σ2
2λ2k
`
λka2j ` pxj ¨ vq2˘
so that
fkpvq » 1`σ
λksxk ¨ v `
σ2
2λ2k
`
λka2k ` Ğpx ¨ vq2k˘
.
Now using
Sj “ Ev expppUjpvq ´ Ikpvqq{λkqexppIkpvqq
1`řKl“1 exppIlpvqq
we get
1 “ Ev
˜
fjpvq
fkpvq
`
fkpvq˘λk
S0 `řKl“1 SNl
`
flpvq˘λl
¸
.
We note that
1` aσ ` bσ2
1` cσ ` dσ2“ 1` pa´ cqσ ` pb´ d´ cpa´ cqqσ2
`Opσ3q. (20)
Denote Aj|k “ Aj ´ Ak. Applying (20) gives
fjpvq
fkpvq» 1`
σ
λkCjpvq `
σ2
2λ2kDjpvq.
50
with
Cjpvq “ xj|k ¨ v
and
Djpvq “ λk pa2j|k `{px ¨ vq2j|k ´ 2pxk ¨ vqpxj|k ¨ vq.
Moreover,
`
flpvq˘λl» 1` σxl ¨ v `
σ2
2
˜
λl ´ 1
λlpxl ¨ vq
2` a2l `
Ğpx ¨ vq2lλl
¸
and
`
fkpvq˘λk
S0 `řKl“1 SNl
`
flpvq˘λl
»
1` σxk ¨ v `σ2
2
´
a2k `λk´1λkpxk ¨ vq
2 `Ğpx¨vq2kλk
¯
1` σeSx ¨ v `σ2
2
´
eSa2 `řKl“1 SNl
´
λl´1λlpxl ¨ vq2 `
Ğpx¨vq2lλl
¯¯
where as usual eST “řJj“1 SjTj “
řKk“1 SNk
Tk.
Then, using (20) again,
`
fkpvq˘λk
S0 `řKl“1 SNl
`
flpvq˘λl
» 1` σEkpvq `σ2
2Fkpvq
with
Ekpvq “ pxk ´ eSxq ¨ v
and
Fkpvq “ a2k ´ eSa2
`λk ´ 1
λkpxk ¨ vq
2´
Kÿ
l“1
SNl
λl ´ 1
λlpxl ¨ vq
2
`Ğpx ¨ vq2kλk
´
Kÿ
l“1
SNl
Ğpx ¨ vq2lλl
´ 2peSx ¨ vqppxk ´ eSxq ¨ vq.
This allows us to write
1 » Ev
ˆ
1`σ
λkCj `
σ2
2λ2kDj
˙ˆ
1` σEk `σ2
2Fk
˙
» Ev
ˆ
1` σ
ˆ
Cjλk` Ek
˙
`σ2
2λ2k
`
Dj ` λ2kFk ` 2λkCjEk
˘
˙
.
51
We have EvCj “ EvEk “ 0; also,
EDj “ λk pa2j|k ` ‖xj‖2 ´Ę‖x‖2k ´ 2xk ¨ xj|k
EFk “ a2k ´ eSa2
`λk ´ 1
λk‖xk‖2 ´
Kÿ
l“1
SNl
λl ´ 1
λl‖xl‖2
`Ę‖x‖2kλk
´
Kÿ
l“1
SNl
Ę‖x‖2lλl
´ 2peSxq ¨ pxk ´ eSxq
EpCjEkq “ xj|k ¨ pxk ´ eSxq.
Writing EpDj ` λ2kFk ` 2λkCjEkq “ 0 gives us an equation of the form
λkpa2j ´ a2kq ` λ2kpa2k ´ eSa2q “ λ2kM ` νk ` µj
where
M “
Kÿ
l“1
SNl
λl ´ 1
λl‖xl‖2 `
Kÿ
l“1
SNl
Ę‖x‖2lλl
´ 2‖eSx‖2
νk “ Ę‖x‖2k ´ 2‖xk‖2 ´ λkpλk ´ 1q‖xk‖2 ´ λkĘ‖x‖2k ` 2λ2keSx ¨ xk ` 2λk‖xk‖2 ´ 2λkxk ¨ eSx
“ p1´ λkq`
Ę‖x‖2k ´ p2´ λkq‖xk‖2´ 2λkxk ¨ eSx
˘
(21)
µj “ ´‖xj‖2 ` 2xj ¨ xk ´ 2λkxj ¨ pxk ´ eSxq
“ xj ¨ p2λkeSx´ xj ` 2p1´ λkqxkq . (22)
It is easy to aggregate from a2j “ p1´ λkqa2k ` λkeSa2 ` λkM ` pνk ` µjq{λk to
a2k “ eSa2 `M `νk ` µkλ2k
and then to
S0eSa2 “ p1´ S0qM `
Kÿ
k“1
SNk
νk ` µkλ2k
,
52
which gives
a2j “ eSa2 `M ` p1´ λkqνk ` µkλ2k
`νk ` µjλk
“M
S0
`1
S0
Kÿ
l“1
SNl
νl ` µlλ2l
` p1´ λkqνk ` µkλ2k
`νk ` µjλk
“M
S0
`1
S0
Kÿ
l“1
SNl
νl ` µlλ2l
`νk ` p1´ λkqµk
λ2k`µjλk.
Finally, using equations (21) and (22) we aggregate
µk “ 2λkxk ¨ eSx` 2p1´ λkq‖xk‖2 ´Ę‖x‖2k,
which gives
νk ` µk “ 2λ2kxk ¨ eSx` λkp1´ λkq‖xk‖2 ´ λkĘ‖x‖2kand
νk ` p1´ λkqµk “ ´λkp1´ λkq‖xk‖2.
Putting everything together, we get
a2j “M
S0
`1
S0
Kÿ
l“1
SNl
νl ` µlλ2l
`νk ` p1´ λkqµk
λ2k`µjλk
“1
S0
˜
Kÿ
l“1
SNl
λl ´ 1
λl‖xl‖2 `
Kÿ
l“1
SNl
Ę‖x‖2lλl
´ 2‖eSx‖2¸
`2
S0
‖eSx‖2 `1
S0
Kÿ
l“1
SNl
´Ę‖x‖2l ` p1´ λlq‖xl‖2
λl
“ xj ¨
ˆ
2eSx´xjλk` 2
1´ λkλk
xk
˙
´1´ λkλk‖xk‖2.
D Our Correction Formula
Remember from section 6.3.2 that
θ0 » θ2 ´
ˆ
EBf8Bθpθ2;λ0q
˙´1
f8pθ2;λ0q. (23)
53
The term in the inverse is easily proxied:
EBf8Bθpθ2;λ0q » E
Bf2Bθpθ2q “ E
ˆ
Bξ2Bθpθ2q
˙1
V V 1Bξ2Bθpθ2q,
since ξ2 is linear in θ. Note that this is EX 1X , where
X ” V 1Bξ2Bθpθ2q “ ´V
1pX,Kq
and row j “ 1, . . . , J of pX,Kq lists the covariates and artificial regressors for this
product. It follows that
EBf8Bθpθ2;λ0q »
˜
E pX 1V V 1Xq E pX 1V V 1Kq
E pK 1V V 1Xq E pK 1V V 1Kq
¸
.
To the second-order in e2, Ef8pθ2;λ0q equals
E
ˆ
Bξ2Bθpθ2q
1V V 1e2pθ2;λ0q
˙
` E
ˆ
Be2Bθpθ2;λ0q
1V V 1ξ2pθ2q
˙
. (24)
The first term in (24) is simply E pX 1V 1e2q. Going back to (23), we get
θ0 » θ2 ´
˜
E pX 1V V 1Xq E pX 1V V 1Kq
E pK 1V V 1Xq E pK 1V V 1Kq
¸´1ˆ
E pX 1V 1e2q ` E
ˆ
Be2Bθ
1
V V 1ξ2
˙˙
.
Finally, using Theorem 1(i), we know that Be2Bβ“ 0. Therefore
E pX 1V 1e2q ` E
ˆ
Be2Bθ
1
V V 1ξ2
˙
“
˜
´E pX 1V V 1e2q
´E pK 1V V 1e2q ` E´
Be2BΣ
1V V 1ξ2
¯
¸
.
E Proof of Theorem 3
We drop the bold letters in this proof to alleviate the notation, and without loss of
generality we normalize B “ 1.
Remember thatGpy, F py, β, σq, β, σq “ 0, so thatGpy, F py, β, 0q, β, 0q “ 0. Given (11),
this gives G˚py, A˚py, F py, β, 0q ´ f1pyqβ, 0q “ 0 for all β. This can only hold if
F py, β, 0q ´ f1pyqβ does not depend on β, which implies condition C2. Denoting
f0pyq “ F py, β, 0q ´ f1pyqβ, we obtain
G˚py, A˚py, f0pyq, 0qq “ 0.
54
Now writing G˚py, EvA˚py, F py, β, σq´f1pyqβ, σvqq “ 0 as an identity in σ and taking
derivatives with respect to σ, we get
G˚2Ev pA˚2Fσ ` A
˚3vq “ 0
G˚22rEv pA˚2Fσ ` A
˚3vqsrEv pA
˚2Fσ ` A
˚3vqs
`G˚2Ev pA˚2Fσσ ` A
˚22rFσ, Fσs ` 2A˚23rFσ, vs ` A
˚33rv, vsq “ 0.
Fortunately, this simplifies greatly at σ “ 0. The first equation gives
G˚2Ev pA˚2Fσpy, β, 0q ` A
˚3vq “ 0,
where the derivatives A˚2 and A˚3 do not depend on v since σ “ 0. It follows that
G˚2A˚2Fσpy, β0, 0q “ 0 since Ev “ 0. Given our invertibility assumption, condition C1
also holds. Using the second equation at σ “ 0, and given that Fσpy, β0, 0q “ 0, we
get
G˚2Ev pA˚22rFσ, Fσs ` 2A˚23rFσ, vsq “ 0
so that
G˚2 pA˚2Fσσ ` A
˚33q “ 0.
Given that G˚2 is invertible, this gives (reintroducing the arguments)
A˚2py, f0pyq, 0qFσσpy, β, 0q ` A˚33py, f0pyq, 0q “ 0.
Therefore Fσσpy, β, 0q is independent of β and condition C3 holds. Noting that
f2pyq “ ´Fσσpy, β, 0q completes the proof.
55
List of Tables
1 Distribution of the Estimates of β0 . . . . . . . . . . . . . . . . . . . 57
2 Distribution of the Estimates of βx1 . . . . . . . . . . . . . . . . . . . 58
3 Distribution of the Estimates of Varpβx1 q . . . . . . . . . . . . . . . . 59
4 Distribution of the Estimates of βx2 . . . . . . . . . . . . . . . . . . . 60
5 Distribution of the Estimates of Varpβx2 q . . . . . . . . . . . . . . . . 61
6 Distribution of the Estimates of βx3 . . . . . . . . . . . . . . . . . . . 62
7 Distribution of the Estimates of Varpβx3 q . . . . . . . . . . . . . . . . 63
8 Distribution of the Estimates of βp . . . . . . . . . . . . . . . . . . . 64
9 Distribution of the Estimates of Varpβpq . . . . . . . . . . . . . . . . 65
10 Pseudo True Value: Increasing-number-of-markets Approach . . . . . 66
11 Pseudo True Value: Moment-based Approach . . . . . . . . . . . . . 67
12 Distribution of the Difference between True and Estimated Elasticity 68
13 Distribution of the Estimates of the Means — Different βp . . . . . . 69
14 Distribution of the Estimates of the Variances — Different βp . . . . 70
15 Testing for Zero Means — Standard 2SLS . . . . . . . . . . . . . . . 71
16 Testing for Zero Means — Corrected 2SLS . . . . . . . . . . . . . . . 72
17 Testing for Zero Variances — Standard 2SLS . . . . . . . . . . . . . . 73
18 Testing for Zero Variances — Corrected 2SLS . . . . . . . . . . . . . 74
19 Joint Test of Zero Means and Variances — Standard 2SLS . . . . . . 75
20 Joint Test of Zero Means and Variances — Corrected 2SLS . . . . . . 76
21 Distribution of the Estimates of the Means (Lognormal Case) . . . . 77
22 Distribution of the Estimates of the Variances (Lognormal Case) . . . 78
23 Distribution of the Estimates of the Third-order Moments (LognormalCase) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
24 Summary Statistics for the Lognormal Case . . . . . . . . . . . . . . 80
25 Distribution of Three Estimates of the Means — Normal and Lognormal 81
26 Distribution of Three Estimates of the Variances — Normal and Log-normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
56
Table 1: Distribution of the Estimates of β0
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
57
Table 2: Distribution of the Estimates of βx1
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
58
Table 3: Distribution of the Estimates of Varpβx1 q
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
59
Table 4: Distribution of the Estimates of βx2
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
60
Table 5: Distribution of the Estimates of Varpβx2 q
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
61
Table 6: Distribution of the Estimates of βx3
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
62
Table 7: Distribution of the Estimates of Varpβx3 q
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
63
Table 8: Distribution of the Estimates of βp
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
64
Table 9: Distribution of the Estimates of Varpβpq
Var(ξ)=0.1 Var(ξ)=0.1 Var(ξ)=0.1Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=0.5 Var(ξ)=0.5 Var(ξ)=0.5Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
Var(ξ)=1.0 Var(ξ)=1.0 Var(ξ)=1.0Var(β)=(0,0.1,0.1,0.1,0.05) Var(β)=(0,0.2,0.2,0.2,0.1) Var(β)=(0,0.5,0.5,0.5,0.2)
65
Table 10: Pseudo True Value: Increasing-number-of-markets Approach
Parameter ScenariosTrue Varpβq : p0, 0.1, 0.1, 0.1, 0.05q p0, 0.2, 0.2, 0.2, 0.1q p0, 0.5, 0.5, 0.5, 0.2qTrue Varpξq : 0.1 0.5 1 0.1 0.5 1 0.1 0.5 1
β0 “ ´1-1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.02 -1.03 -1.03
(.0043) (.0050) (.0058) (.011) (.012) (.013) (.032) (.035) (.038)
βx1 “ 1.51.51 1.51 1.51 1.53 1.53 1.53 1.56 1.57 1.57
(.022) (.023) (.024) (.050) (.050) (.050) (.13) (.13) (.13)
βx2 “ 1.51.51 1.51 1.51 1.52 1.52 1.52 1.55 1.56 1.56
(.023) (.024) (.025) (.048) (.049) (.049) (.12) (.12) (.12)
βx3 q “ 0.50.487 0.487 0.487 0.465 0.465 0.464 0.403 0.400 0.398(.022) (.022) (.022) (.048) (.047) (.047) (.12) (.12) (.11)
βp “ ´1-0.999 -0.999 -0.999 -0.990 -0.990 -0.990 -0.954 -0.955 -0.956(.0086) (.0088) (.0090) (.0184) (.0186) (.0188) (.043) (.044) (.045)
Varpβx1 q0.0857 0.0856 0.0856 0.152 0.152 0.152 0.288 0.290 0.291(.011) (.011) (.011) (.028) (.027) (.027) (.078) (.076) (.075)
Varpβx2 q0.0863 0.0865 0.0866 0.152 0.152 0.153 0.284 0.286 0.288(.0086) (.0086) (.0087) (.0205) (.020) (.020) (.059) (.057) (.056)
Varpβx3 q0.0952 0.0949 0.0946 0.182 0.181 0.181 0.400 0.399 0.397(.0097) (.010) (.010) (.024) (.023) (.023) (.063) (.063) (.062)
Varpβpq0.0480 0.0479 0.0478 0.0888 0.088 0.088 0.148 0.147 0.147(.0056) (.0057) (.0059) (.013) (.013) (.014) (.031) (.032) (.033)
66
Table 11: Pseudo True Value: Moment-based Approach
Parameter ScenariosTrue Varpβq : p0, 0.1, 0.1, 0.1, 0.05q p0, 0.2, 0.2, 0.2, 0.1q p0, 0.5, 0.5, 0.5, 0.2qTrue Varpξq : 0.1 0.5 1 0.1 0.5 1 0.1 0.5 1β0 “ ´1 -1.01 -1.01 -1.01 -1.04 -1.04 -1.04 -1.11 -1.11 -1.12βx1 “ 1.5 1.49 1.49 1.49 1.48 1.48 1.48 1.43 1.43 1.43βx2 “ 1.5 1.49 1.49 1.49 1.48 1.48 1.48 1.43 1.43 1.43βx3 “ 0.5 0.496 0.496 0.496 0.486 0.486 0.486 0.455 0.455 0.455βp “ ´1 -0.989 -0.988 -0.988 -0.958 -0.957 -0.955 -0.873 -0.869 -0.864Varpβx1 q 0.0854 0.0855 0.0855 0.149 0.149 0.149 0.275 0.275 0.276Varpβx2 q 0.0855 0.0855 0.0856 0.149 0.149 0.149 0.273 0.274 0.274Varpβx3 q 0.0938 0.0938 0.0937 0.176 0.175 0.175 0.369 0.368 0.366Varpβpq 0.0421 0.0421 0.0419 0.0685 0.0681 0.0676 0.0920 0.0906 0.0888
67
Table 12: Distribution of the Difference between True and Estimated Elasticity
Product 5 Product 10 Product 15
Product 20 Product 25 Legend
68
Table 13: Distribution of the Estimates of the Means — Different βp
βp “ ´1 βp “ ´2 βp “ ´3Distribution of β0 Distribution of β0 Distribution of β0
Distribution of βx1 Distribution of βx1 Distribution of βx1
Distribution of βx2 Distribution of βx2 Distribution of βx2
Distribution of βx3 Distribution of βx3 Distribution of βx3
Distribution of βp Distribution of βp Distribution of βp
69
Table 14: Distribution of the Estimates of the Variances — Different βp
βp “ ´1 βp “ ´2 βp “ ´3Distribution of Varpβx1 q Distribution of Varpβx1 q Distribution of Varpβx1 q
Distribution of Varpβx2 q Distribution of Varpβx2 q Distribution of Varpβx2 q
Distribution of Varpβx3 q Distribution of Varpβx3 q Distribution of Varpβx3 q
Distribution of Varpβpq Distribution of Varpβpq Distribution of Varpβpq
70
Table 15: Testing for Zero Means — Standard 2SLS
Significance level 1% 5% 10%2SLS with heteroskedasticity-robust standard error 0.904 0.793 0.711GLS estimator and standard errors 0.889 0.765 0.6782SLS with clustered standard error 0.904 0.780 0.702
71
Table 16: Testing for Zero Means — Corrected 2SLS
Significance level 1% 5% 10%2SLS with heteroskedasticity-robust standard error 0.915 0.819 0.725GLS estimator and standard errors 0.882 0.767 0.6692SLS with clustered standard error 0.906 0.809 0.731
72
Table 17: Testing for Zero Variances — Standard 2SLS
Significance level 1% 5% 10%2SLS with heteroskedasticity-robust standard error 0.746 0.625 0.556GLS estimator and standard errors 0.738 0.618 0.5422SLS with clustered standard error 0.740 0.617 0.547
73
Table 18: Testing for Zero Variances — Corrected 2SLS
Significance level 1% 5% 10%2SLS with heteroskedasticity-robust standard error 0.792 0.688 0.626GLS estimator and standard errors 0.773 0.680 0.6132SLS with clustered standard error 0.775 0.679 0.620
74
Table 19: Joint Test of Zero Means and Variances — Standard 2SLS
Significance level 1% 5% 10%2SLS with heteroskedasticity-robust standard error 0.731 0.578 0.507GLS estimator and standard errors 0.699 0.545 0.4832SLS with clustered standard error 0.702 0.565 0.498
75
Table 20: Joint Test of Zero Means and Variances — Corrected 2SLS
Significant level 1% 5% 10%2SLS with heteroskedasticity-robust standard error 0.756 0.642 0.568GLS estimator and standard errors 0.738 0.631 0.5482SLS with clustered standard error 0.749 0.617 0.546
76
Table 21: Distribution of the Estimates of the Means (Lognormal Case)
σ “ 0.3 σ “ 0.4 σ “ 0.5Distribution of β0 Distribution of β0 Distribution of β0
Distribution of βx1 Distribution of βx1 Distribution of βx1
Distribution of βx2 Distribution of βx2 Distribution of βx2
Distribution of βx3 Distribution of βx3 Distribution of βx3
Distribution of βp Distribution of βp Distribution of βp
77
Table 22: Distribution of the Estimates of the Variances (Lognormal Case)
σ “ 0.3 σ “ 0.4 σ “ 0.5Distribution of Varpβx1 q Distribution of Varpβ1q Distribution of Varpβ1q
Distribution of Varpβx2 q Distribution of Varpβ2q Distribution of Varpβ2q
Distribution of Varpβx3 q Distribution of Varpβ3q Distribution of Varpβ3q
Distribution of Varpβpq Distribution of Varpβpq Distribution of Varpβpq
78
Table 23: Distribution of the Estimates of the Third-order Moments (LognormalCase)
σ “ 0.3 σ “ 0.4 σ “ 0.5Distribution of βx1 3rdM Distribution of βx1 3rdM Distribution of βx1 3rdM
Distribution of βx2 3rdM Distribution of βx2 3rdM Distribution of βx2 3rdM
Distribution of βx3 3rdM Distribution of βx3 3rdM Distribution of βx3 3rdM
Distribution of βp 3rdM Distribution of βp 3rdM Distribution of βp 3rdM
79
Table 24: Summary Statistics for the Lognormal Case
Parameter Scenariosσ σ “ 0.3 σ “ 0.4 σ “ 0.5
Moments included: 2 3 2 3 2 3
β0 “ ´1-1.03 -1.01 -1.06 -1.03 -1.10 -1.06
(0.013) (0.080) (0.021) (0.092) (0.031) (0.107)
βx1 “ 1.51.49 1.49 1.47 1.48 1.44 1.45
(0.032) (0.057) (0.056) (0.072) (0.086) (0.096)
βx2 “ 1.51.49 1.49 1.47 1.47 1.44 1.45
(0.033) (0.054) (0.057) (0.070) (0.088) (0.093)
βx3 “ 0.50.499 0.502 0.497 0.502 0.496 0.501
(0.020) (0.039) (0.036) (0.047) (0.056) (0.060)
βp “ ´1-0.976 -0.991 -0.946 -0.972 -0.906 -0.940(0.014) (0.076) (0.020) (0.084) (0.027) (0.095)
Varpβx1 q “ 0.212{0.390{0.6390.153 0.159 0.229 0.241 0.295 0.316
(0.020) (0.030) (0.039) (0.048) (0.062) (0.072)
Varpβx2 q “ 0.212{0.390{0.6390.153 0.160 0.228 0.244 0.294 0.319
(0.020) (0.028) (0.038) (0.045) (0.060) (0.067)
Varpβx3 q “ 0.04{0.043{0.0710.025 0.030 0.045 0.036 0.068 0.056
(0.014) (0.033) (0.025) (0.041) (0.038) (0.059)
Varpβpq “ 0.094{0.174{0.2840.0579 0.077 0.081 0.115 0.099 0.145
(0.007 ) (0.095) (0.011) (0.11) (0.016) (0.12)
3rdMpβx1 q “ 0.093{0.322{0.8940.044 0.096 0.160
(0.067) (0.091) (0.135)
3rdMpβx2 q “ 0.093{0.322{0.8940.043 0.097 0.161
(0.082) (0.101) (0.130)
3rdMpβx3 q “ 0.003{0.012{0.033-0.0073 -0.012 -0.015(0.070) (0.084) (0.120)
3rdMpβpq “ ´0.027{ ´ 0.096{ ´ 0.265-0.0095 -0.018 -0.024(0.050) (0.058) (0.066)
80
Table 25: Distribution of Three Estimates of the Means — Normal and Lognormal
Normal LognormalDistribution of β0 Distribution of β0
Distribution of βx1 Distribution of βx1
Distribution of βx2 Distribution of βx2
Distribution of βx3 Distribution of βx3
Distribution of βp Distribution of βp
81
Table 26: Distribution of Three Estimates of the Variances — Normal and Lognormal
Normal LognormalDistribution of Varpβx1 q Distribution of Varpβx1 q
Distribution of Varpβx2 q Distribution of Varpβx2 q
Distribution of Varpβx3 q Distribution of Varpβx3 q
Distribution of Varpβpq Distribution of Varpβpq
82