15. Bayesian Methods
© A. Colin Cameron & Pravin K. Trivedi 2006
These transparencies were prepared in 2003.
They can be used as an adjunct to
Chapter 13 of our subsequent book
Microeconometrics: Methods and Applications
Cambridge University Press, 2005.
Original version of slides: May 2003
Outline
1. Introduction
2. Bayesian Approach
3. Bayesian Analysis of Linear Regression
4. Monte Carlo Integration
5. Markov Chain Monte Carlo Simulation
6. MCMC Example: Gibbs Sampler for SUR
7. Data Augmentation
8. Bayesian Model Selection
9. Practical Considerations
1 Introduction
• Bayesian regression has grown greatly since the books by Arnold Zellner (1971) and Leamer (1978).
• Controversial. Requires specifying a probabilistic model of prior beliefs about the unknown parameters. [Though the role of the prior is negligible in large samples, and relatively uninformative priors can be specified.]
• Growth due to computational advances.
• In particular, even when the posterior is analytically intractable, simulation (Monte Carlo) methods can be used to
- estimate posterior moments
- make draws from the posterior.
2 Bayesian Approach
1. Prior π(θ): Uncertainty about the parameters θ is explicitly modelled by the density π(θ).
e.g. θ is an income elasticity and on the basis of an economic model or previous studies it is felt that Pr[0.8 ≤ θ ≤ 1.2] = 0.95. A possible prior is θ ~ N[1, 0.1²].
2. Sample joint density or likelihood f(y|θ): Similar to the ML framework. In the single-equation case y is an N×1 vector and dependence on the regressors X is suppressed.
3. Posterior p(θ|y): Obtained by combining prior and sample.
2.1 Bayes Theorem
• Bayes' inverse law of probability gives the posterior

p(θ|y) = f(y|θ)π(θ) / f(y),   (1)

where f(y) is the marginal (with respect to θ) probability distribution of y,

f(y) = ∫ f(y|θ)π(θ) dθ.   (2)

• Proof: Use Pr[A|B] = Pr[A∩B] / Pr[B] = Pr[B|A]Pr[A] / Pr[B].

• f(y) in (1) is free of θ, so p(θ|y) can be written as proportional to the product of the sample density and the prior,

p(θ|y) ∝ f(y|θ)π(θ).   (3)
• Big difference:
- Frequentist: θ₀ is a constant and θ̂ is random.
- Bayesian: θ is random.
2.2 Normal-Normal iid Example
1. Sample density f(y|θ): Assume yᵢ|θ ~ N[θ, σ²] with θ unknown and σ² given.

f(y|θ) = (2πσ²)^(−N/2) exp{ −Σᵢ₌₁ᴺ (yᵢ − θ)² / 2σ² }
       ∝ exp{ −(N/2σ²)(ȳ − θ)² },
2. Prior π(θ): Suppose θ ~ N[μ, τ²] where μ and τ² are given.

π(θ) = (2πτ²)^(−1/2) exp{ −(θ − μ)² / 2τ² }
     ∝ exp{ −(1/2τ²)(θ − μ)² },
3. Posterior density p(θ|y):

p(θ|y) ∝ exp{ −(N/2σ²)(ȳ − θ)² } × exp{ −(1/2τ²)(θ − μ)² }.
• After some algebra (completing the square)

p(θ|y) ∝ exp{ −½ [ (θ − μ₁)²/σ₁² + (ȳ − μ)²/(N⁻¹σ² + τ²) ] }
       ∝ exp{ −½ (θ − μ₁)²/σ₁² },

where

μ₁ = σ₁² (Nȳ/σ² + μ/τ²)
σ₁² = (N/σ² + 1/τ²)⁻¹.
• Properties of the posterior:
- Posterior density is θ|y ~ N[μ₁, σ₁²].
- Posterior mean μ₁ is a weighted average of the prior mean μ and the sample average ȳ.
- Posterior precision σ₁⁻² is the sum of the sample precision of ȳ, N/σ², and the prior precision 1/τ². [Precision is the reciprocal of the variance.]
- As N → ∞, θ|y ≈ N[ȳ, σ²/N].
Normal-Normal example with σ² = 100, μ = 5, τ² = 3, N = 50 and ȳ = 10.
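The posterior mean and variance formulas above can be checked numerically. A minimal sketch in Python (the function name and layout are ours) for the illustrative values σ² = 100, μ = 5, τ² = 3, N = 50, ȳ = 10:

```python
# Conjugate normal-normal update: y_i|theta ~ N[theta, sigma2] with sigma2 known,
# prior theta ~ N[mu, tau2]. Posterior is N[mu1, sig21], where the posterior
# precision is the sum of the sample and prior precisions.
def normal_normal_posterior(ybar, N, sigma2, mu, tau2):
    prec = N / sigma2 + 1.0 / tau2                  # posterior precision 1/sig21
    sig21 = 1.0 / prec                              # posterior variance
    mu1 = sig21 * (N * ybar / sigma2 + mu / tau2)   # precision-weighted mean
    return mu1, sig21

mu1, sig21 = normal_normal_posterior(ybar=10.0, N=50, sigma2=100.0, mu=5.0, tau2=3.0)
print(mu1, sig21)  # posterior mean ≈ 8.0, posterior variance ≈ 1.2
```

For these values the posterior mean 8.0 lies between the prior mean 5 and the sample mean 10, closer to the data because the sample precision (0.5) exceeds the prior precision (1/3).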
2.3 Specification of the Prior
• Tricky. Not the focus of this talk.
• A prior can be improper yet yield a proper posterior.
• A noninformative prior has little impact on the resulting posterior distribution. Use the Jeffreys prior, not a uniform prior, as it is invariant to reparametrization.
• For an informative prior, prefer a natural conjugate prior as it yields an analytical posterior. ⇒ Exponential family prior, density and posterior, e.g. normal-normal, Poisson-gamma.
• Hierarchical priors are popular for multilevel models.
2.4 Measures Related to Posterior
• Marginal posterior: p(θₖ|y) = ∫ p(θ₁, ..., θ_d|y) dθ₁ ... dθ_{k−1} dθ_{k+1} ... dθ_d.
• Posterior moments: mean/median; standard deviation.
• Point estimation: no unknown θ₀ to estimate. Instead find the value of θ that minimizes a loss function.
• Posterior intervals (95%): Pr[θ_{k,.025} ≤ θₖ ≤ θ_{k,.975} | y] = 0.95.
• Hypothesis testing: Not relevant. Bayes factors instead.
• Conditional posterior density: p(θₖ|θⱼ, θⱼ ∈ θ₋ₖ, y) = p(θ|y) / p(θⱼ ∈ θ₋ₖ|y).
2.5 Large Sample Behavior of Posterior
• Asymptotically the role of the prior disappears.
• If there is a true θ₀ then the posterior mode θ̂ (the maximum of the posterior) is consistent for it.
• Posterior is asymptotically normal,

θ|y ~a N[θ̂, I(θ̂)⁻¹],   (4)

centered around the posterior mode, where

I(θ̂) = − [∂² ln p(θ|y) / ∂θ∂θ′] |_{θ=θ̂}.

• Called a Bayesian central limit theorem.
3 Bayesian Linear Regression
• Linear regression model

y|X, β, σ² ~ N[Xβ, σ²I_N].

• Different results arise with noninformative and informative priors; even within these, results differ according to the setup.
3.1 Noninformative Priors
• Jeffreys' priors: π(βⱼ) ∝ c and π(σ²) ∝ 1/σ². All values of βⱼ are equally likely; smaller values of σ² are viewed as more likely.

π(β, σ²) ∝ 1/σ².

• Posterior density after some algebra:

p(β, σ²|y, X) ∝ (1/σ²)^{K/2} exp{ −½ (β − β̂)′ (1/σ²)(X′X)(β − β̂) }
             × (1/σ²)^{(N−K)/2+1} exp{ −(N − K)s² / 2σ² }.
• Conditional posterior p(β|σ², y, X) is N[β̂_OLS, σ²(X′X)⁻¹].
• Marginal posterior p(β|y, X) (integrate out σ²) is multivariate t, centered at β̂ with N − K degrees of freedom and variance s²(N − K)(X′X)⁻¹/(N − K − 2).
• Marginal posterior p(σ²|y, X) is inverse gamma.
• Qualitatively similar to frequentist analysis in finite samples.
• Interpretation is quite different. E.g. the Bayesian 95 percent posterior interval for βⱼ is β̂ⱼ ± t_{.025,N−K} × se[β̂ⱼ]. This
- means that βⱼ lies in this interval with posterior probability 0.95,
- not that if we had many samples and constructed many such intervals, 95 percent of them would contain the true βⱼ₀.
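Because the posterior factors as p(σ²|y,X) × p(β|σ²,y,X), exact draws under the Jeffreys prior need no MCMC: draw σ² from its scaled inverse-chi-square marginal, then β from the conditional normal. A sketch with NumPy on simulated data (the simulated design and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = X beta + noise, with an intercept and one regressor
N, K = 200, 2
X = np.column_stack([np.ones(N), rng.standard_normal(N)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(N)

# OLS quantities that index the posterior
XtX = X.T @ X
b_ols = np.linalg.solve(XtX, X.T @ y)
resid = y - X @ b_ols
s2 = resid @ resid / (N - K)

# Under pi(beta, sigma2) ∝ 1/sigma2:
#   (N-K) s2 / sigma2 | y ~ chi-square(N-K)   (inverse-gamma marginal for sigma2)
#   beta | sigma2, y ~ N[b_ols, sigma2 (X'X)^{-1}]
S = 5000
sigma2_draws = (N - K) * s2 / rng.chisquare(N - K, size=S)
XtX_inv = np.linalg.inv(XtX)
beta_draws = np.array([rng.multivariate_normal(b_ols, v * XtX_inv)
                       for v in sigma2_draws])

print(beta_draws.mean(axis=0))  # posterior mean of beta: equals b_ols up to MC error
```

Averaging the normal draws over the σ² draws reproduces the marginal multivariate-t posterior for β described above.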
3.2 Informative Priors
• Use conjugate priors:
- Prior for β|σ² is N[β₀, σ²Λ₀⁻¹].
- Prior for σ² is inverse gamma.
• Posterior after much algebra is

p(β, 1/σ²|y, X) ∝ (σ²)^{−(ν₀+N)/2−1} exp{ −s₁/2σ² } × (σ²)^{−K/2} exp{ −(1/2σ²)(β − β̄)′Λ₁(β − β̄) },

where

β̄ = (Λ₀ + X′X)⁻¹(Λ₀β₀ + X′Xβ̂)
Λ₁ = Λ₀ + X′X
s₁ = s₀ + û′û + (β̂ − β₀)′[Λ₀⁻¹ + (X′X)⁻¹]⁻¹(β̂ − β₀).

• Conditional posterior p(β|σ², y, X) is N[β̄, σ²Λ₁⁻¹].
• Marginal posterior p(β|y, X) (integrate out σ²) is multivariate t centered at β̄.
• Here β̄ is a matrix-weighted average of β̂_OLS and the prior mean β₀, and the posterior precision is the sum of the prior and sample precisions.
4 Monte Carlo Integration
• Compute key posterior moments without first obtaining the posterior distribution.
• Want E[m(θ)], where the expectation is with respect to the posterior density p(θ|y). For notational convenience suppress y.
• So wish to compute

E[m(θ)] = ∫ m(θ) p(θ) dθ.   (5)

• Need a numerical estimate of an integral:
- Numerical quadrature is too hard.
- Direct Monte Carlo with draws from p(θ) is not possible.
- Instead use importance sampling.
4.1 Importance Sampling
• Rewrite

E[m(θ)] = ∫ m(θ) p(θ) dθ = ∫ [ m(θ) p(θ)/g(θ) ] g(θ) dθ,

where g(θ) > 0 is a known density with the same support as p(θ).

• The corresponding Monte Carlo integral estimate is

Ê[m(θ)] = (1/S) Σ_{s=1}^{S} m(θ^s) p(θ^s)/g(θ^s),   (6)

where θ^s, s = 1, ..., S, are S draws of θ from g(θ), not p(θ).
• To apply this to the posterior we also need to account for the constant of integration in the denominator of (1).

• Let p^ker(θ) = f(y|θ)π(θ) be the posterior kernel.

• Then the posterior density is

p(θ) = p^ker(θ) / ∫ p^ker(θ) dθ,

with posterior moment

E[m(θ)] = ∫ m(θ) [ p^ker(θ) / ∫ p^ker(θ) dθ ] dθ
        = ∫ m(θ) p^ker(θ) dθ / ∫ p^ker(θ) dθ
        = ∫ [ m(θ) p^ker(θ)/g(θ) ] g(θ) dθ / ∫ [ p^ker(θ)/g(θ) ] g(θ) dθ.

• The importance-sampling-based estimate is then

Ê[m(θ)] = [ (1/S) Σ_{s=1}^{S} m(θ^s) p^ker(θ^s)/g(θ^s) ] / [ (1/S) Σ_{s=1}^{S} p^ker(θ^s)/g(θ^s) ],   (7)

where θ^s, s = 1, ..., S, are S draws of θ from the importance sampling density g(θ).
• The method was proposed by Kloek and van Dijk (1978).
• Geweke (1989) established consistency and asymptotic normality as S → ∞ if
- E[m(θ)] < ∞, so the posterior moment exists;
- ∫ p(θ) dθ = 1, so the posterior density is proper. This may require ∫ π(θ) dθ < ∞;
- g(θ) > 0 over the support of p(θ);
- g(θ) has thicker tails than p(θ), to ensure that the importance weight w(θ) = p(θ)/g(θ) remains bounded, e.g. use a multivariate t.
• The importance sampling method can be used to estimate many quantities, including the mean, standard deviation and percentiles of the posterior.
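A sketch of the self-normalized estimator (7) in Python. The posterior kernel is the normal-normal kernel from section 2.2 (so the exact answer, mean 8, is known), and the importance density g is a Student-t with thicker tails than the posterior, as required; the t location and scale are our illustrative choices.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Posterior kernel p_ker(theta) = f(y|theta) pi(theta) for the normal-normal
# example: ybar = 10, N = 50, sigma2 = 100, prior N[5, 3]; posterior is N[8, 1.2].
def log_pker(theta):
    return -50 * (10.0 - theta) ** 2 / 200.0 - (theta - 5.0) ** 2 / 6.0

# Importance density g: Student-t (5 dof), located near the posterior and
# scaled up so its tails dominate the posterior's.
df, loc, scale = 5, 8.0, 2.0

def log_g(theta):
    z = (theta - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * np.log1p(z ** 2 / df))

S = 100_000
draws = loc + scale * rng.standard_t(df, size=S)
log_w = log_pker(draws) - log_g(draws)     # log importance weights
w = np.exp(log_w - log_w.max())            # rescale before exponentiating

post_mean = np.sum(w * draws) / np.sum(w)  # self-normalized estimate (7)
print(post_mean)  # close to the exact posterior mean 8.0
```

Working with log weights and subtracting the maximum before exponentiating avoids underflow; the rescaling constant cancels between the numerator and denominator of (7).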
5 Markov Chain Monte Carlo Simulation
• If we can make S draws from the posterior, E[m(θ)] can be estimated by S⁻¹ Σₛ m(θ^s).
• But it is hard to make draws if there is no tractable closed-form expression for the posterior density.
• Instead make sequential draws that, if the sequence is run long enough, converge to a stationary distribution that coincides with the posterior density p(θ).
• Called Markov chain Monte Carlo, as it involves simulation (Monte Carlo) and the sequence is that of a Markov chain.
• Note that the draws are correlated.
5.1 Markov Chains
• A Markov chain is a sequence of random variables xₙ (n = 0, 1, 2, ...) with

Pr[x_{n+1} = x | xₙ, x_{n−1}, ..., x₀] = Pr[x_{n+1} = x | xₙ],

so that the distribution of x_{n+1} given the past is completely determined by the preceding value xₙ alone.

• Transition probabilities are

t_{xy} = Pr[x_{n+1} = y | xₙ = x].

• For a finite-state Markov chain with m states, form an m×m transition matrix T.

• Then for a transition from x to y in n steps (stages) the transition probability is given by Tⁿ, the n-fold matrix product of T.

• The rows t_j^(n) of the matrix Tⁿ give the marginal distribution across the m states at the nth stage.

• The chain is said to yield a stationary distribution or invariant distribution t(x) if

Σ_{x∈A} t(x) T_{x,y} = t(y)  for all y ∈ A.

• For the Bayesian application the chain is θ^(n), not xₙ.

• We want the chain θ^(n) (1) to converge to a stationary distribution, and (2) this stationary distribution to be the desired posterior.
5.2 Gibbs Sampler
• Easy to describe and implement.
• Let θ = [θ₁′ θ₂′]′ have posterior density p(θ) = p(θ₁, θ₂).
• Suppose p(θ₁|θ₂) and p(θ₂|θ₁) are known.
• Then alternating sequential draws from p(θ₁|θ₂) and p(θ₂|θ₁) in the limit converge to draws from p(θ₁, θ₂).
5.2.1 Gibbs Sampler Example
• Let y = (y₁, y₂) ~ N[μ, Σ], where μ = (μ₁, μ₂)′ and Σ has diagonal entries 1 and off-diagonals ρ.

• Then given a uniform prior for μ the posterior is

μ|y ~ N[ȳ, N⁻¹Σ].

• So the conditional posterior distributions are

μ₁|μ₂, y ~ N[ ȳ₁ + ρ(μ₂ − ȳ₂), (1 − ρ²)/N ]
μ₂|μ₁, y ~ N[ ȳ₂ + ρ(μ₁ − ȳ₁), (1 − ρ²)/N ].

• Can iteratively sample from each conditional normal distribution using the updated values of μ₁ and μ₂.

• If the chain is run long enough then it will converge to the bivariate normal.
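The two conditional draws above are enough to code the sampler. A sketch in Python with illustrative values (ρ, N, the sample means and chain length are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

# Gibbs sampler for mu|y ~ N[ybar, Sigma/N], using only the two full
# conditionals: each is univariate normal with variance (1 - rho^2)/N.
N, rho = 50, 0.5
ybar1, ybar2 = 1.0, 2.0
sd = np.sqrt((1 - rho ** 2) / N)   # conditional standard deviation

S, burn = 20_000, 1_000
mu1, mu2 = 0.0, 0.0                # arbitrary starting values
draws = np.empty((S, 2))
for s in range(S):
    # alternate: mu1 | mu2, y then mu2 | mu1, y (using the just-updated mu1)
    mu1 = rng.normal(ybar1 + rho * (mu2 - ybar2), sd)
    mu2 = rng.normal(ybar2 + rho * (mu1 - ybar1), sd)
    draws[s] = mu1, mu2

print(draws[burn:].mean(axis=0))   # close to (ybar1, ybar2) = (1, 2)
```

After discarding the burn-in, the retained pairs behave as (correlated) draws from the bivariate normal posterior, so their sample moments estimate the posterior moments.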
5.2.2 Gibbs Sampler
• More generally, suppose θ is partitioned into d blocks, e.g. θ = [β′ σ²]′ in a linear regression example.

• Let θₖ be the kth block and θ₋ₖ denote all components of θ aside from θₖ.

• Assume the full conditional distributions p(θₖ|θ₋ₖ), k = 1, ..., d, are known.

• Then sequential sampling from the full conditionals can be set up as follows.

1. Let the initial values of θ be θ^(0) = (θ₁^(0), ..., θ_d^(0)).

2. The next iteration sequentially revises all components of θ to yield θ^(1) = (θ₁^(1), ..., θ_d^(1)), generated using d draws from the d conditional distributions as follows:

p(θ₁^(1) | θ₂^(0), ..., θ_d^(0))
p(θ₂^(1) | θ₁^(1), θ₃^(0), ..., θ_d^(0))
...
p(θ_d^(1) | θ₁^(1), θ₂^(1), ..., θ_{d−1}^(1))

3. Return to step 1, reinitialize the vector θ at θ^(1) and cycle through step 2 again to obtain the new draw θ^(2). Repeat the steps until convergence is achieved.
• Geman and Geman (1984) showed that the stochastic sequence {θ^(n)} is a Markov chain with the correct stationary distribution. See also Tanner and Wong (1987) and Gelfand and Smith (1990).
• These results do not tell us how many cycles are needed for convergence, which is model dependent.
• It is very important to ensure that a sufficient number of cycles is executed for the chain to converge. Discard the earliest results from the chain, the so-called "burn-in" phase. Diagnostic tests are available.
5.3 Metropolis Algorithm
• The Gibbs sampler is the best-known MCMC algorithm.
• It has limited applicability, as it requires direct sampling from the full conditional distributions, which may not be known.
• Two extensions that allow MCMC to be applied more generally are the Metropolis algorithm and the Metropolis-Hastings algorithm.
• In applying MCMC we use a sequence of approximating posterior distributions, called transition distributions, transition kernels or proposal densities.
• Use the notation Jₙ(θ^(n)|θ^(n−1)), which emphasizes that the transition distribution varies with n.
1. Draw a starting point θ^(0) from an initial approximation to the posterior, for which p(θ^(0)) > 0. e.g. draw from a multivariate t-distribution centered on the posterior mode.

2. Set n = 1. Draw θ* from a symmetric jumping distribution J₁(θ^(1)|θ^(0)), i.e. for any arbitrary pair (θₐ, θ_b), Jₙ(θₐ|θ_b) = Jₙ(θ_b|θₐ). e.g. θ^(1)|θ^(0) ~ N[θ^(0), V] for some fixed V.

3. Calculate the ratio of densities r = p(θ*)/p(θ^(0)).

4. Set

θ^(1) = θ*     with probability min(r, 1)
      = θ^(0)  with probability 1 − min(r, 1).

5. Return to step 2, increase the counter, and repeat.
• Can view this as an iterative method to maximize p(θ): if θ* increases p(θ) then θ^(n) = θ* always; if θ* decreases p(θ) then θ^(n) = θ* with probability r.
• Similar in spirit to accept-reject sampling, but with no requirement that a fixed multiple of the jumping distribution always covers the posterior.
• Metropolis generates a Markov chain with properties of reversibility, irreducibility and Harris recurrence that ensure convergence to a stationary distribution.
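Steps 1-5 can be sketched as a random-walk Metropolis sampler. Here the target is the normal-normal posterior of section 2.2 (exactly N[8, 1.2], so the output can be checked); the step size and chain length are our illustrative choices, and we work with the log kernel for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(3)

# Log posterior kernel for the normal-normal example of section 2.2:
# ybar = 10, N = 50, sigma2 = 100, prior N[5, 3].
def log_pker(theta):
    return -50 * (10.0 - theta) ** 2 / 200.0 - (theta - 5.0) ** 2 / 6.0

S, step = 50_000, 2.0
theta = 0.0                        # step 1: arbitrary starting point
draws = np.empty(S)
for s in range(S):
    prop = theta + step * rng.standard_normal()  # step 2: symmetric N[theta, step^2] jump
    log_r = log_pker(prop) - log_pker(theta)     # step 3: log of the ratio r
    if np.log(rng.uniform()) < log_r:            # step 4: accept with prob min(r, 1)
        theta = prop
    draws[s] = theta                             # on rejection the old value is repeated

burned = draws[5_000:]             # discard the burn-in phase
print(burned.mean(), burned.var()) # close to 8.0 and 1.2
```

Note that the normalizing constant of the posterior never appears: only the kernel is evaluated, which is what makes the method usable when f(y) is intractable.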
• To see that the Metropolis stationary distribution is the desired posterior p(θ), proceed as follows.

• Let θₐ and θ_b be points such that p(θ_b) ≥ p(θₐ).

• If θ^(n−1) = θₐ and θ* = θ_b then θ^(n) = θ_b with certainty, and

Pr[θ^(n) = θ_b, θ^(n−1) = θₐ] = Jₙ(θ_b|θₐ) p(θₐ).

• If the order is reversed, so θ^(n−1) = θ_b and θ* = θₐ, then θ^(n) = θₐ with probability r = p(θₐ)/p(θ_b), and

Pr[θ^(n) = θₐ, θ^(n−1) = θ_b] = Jₙ(θₐ|θ_b) p(θ_b) × p(θₐ)/p(θ_b)
                              = Jₙ(θₐ|θ_b) p(θₐ)
                              = Jₙ(θ_b|θₐ) p(θₐ),

as the jumping distribution is symmetric.

• Symmetric joint distribution ⇒ the marginal distributions of θ^(n) and θ^(n−1) are the same ⇒ p(θ) is the stationary distribution.
5.4 Metropolis-Hastings (M-H) Algorithm
• The Metropolis-Hastings (M-H) algorithm is the same as the Metropolis algorithm, except that in step 2 the jumping distribution need not be symmetric.

• Then in step 3 the acceptance probability is

rₙ = [ p(θ*)/Jₙ(θ*|θ^(n−1)) ] / [ p(θ^(n−1))/Jₙ(θ^(n−1)|θ*) ]
   = p(θ*) Jₙ(θ^(n−1)|θ*) / [ p(θ^(n−1)) Jₙ(θ*|θ^(n−1)) ].

• Any normalizing constants present in either p(θ) or Jₙ(θ) cancel in rₙ, so both the posterior and the jump probabilities need only be computed up to such a constant.
5.5 M-H Examples
• Different jumping distributions lead to different M-H algorithms.

• The Gibbs sampler is a special case of M-H. If θ is partitioned into d blocks, then there are d Metropolis steps at the nth step of the algorithm. The jumping distribution is the conditional distribution given in subsection 5.2 and the acceptance probability is always 1. Gibbs sampling is also called alternating conditional sampling.

• Mixed strategies can be used, e.g. an M-H step combined with a Gibbs sampler.

• The independence chain makes all draws from a fixed density g(θ).

• A random walk chain sets the draw θ* = θ^(n−1) + ε, where ε is a draw from g(ε).

• Gelman et al. (1995, p. 334) consider θ ~ N[μ, Σ]. For Metropolis with

θ*|θ^(n−1) ~ N[θ^(n−1), c²Σ],

c ≈ 2.4/√q leads to the greatest efficiency relative to direct draws from the q-variate normal. The efficiency is about 0.3, compared to 1/q for the Gibbs sampler for Σ = σ²I_q.
6 Gibbs Sampler for SUR
• Two-equation example with ith observation

y₁ᵢ = β₁₁ + β₁₂x₁ᵢ + ε₁ᵢ
y₂ᵢ = β₂₁ + β₂₂x₂ᵢ + ε₂ᵢ,

[ε₁ᵢ ε₂ᵢ]′ ~ N[ [0 0]′, Σ ],  Σ = [ σ₁₁ σ₁₂
                                    σ₂₁ σ₂₂ ].

• Assume independent informative priors, with

β ~ N[β₀, B₀⁻¹]
Σ⁻¹ ~ Wishart[ν₀, D₀].

• Some algebra yields the conditional posteriors

β|Σ, y, X ~ N[ C₀(B₀β₀ + Σᵢ₌₁ᴺ xᵢ′Σ⁻¹yᵢ), C₀ ]
Σ⁻¹|β, y, X ~ Wishart[ ν₀ + N, (D₀⁻¹ + Σᵢ₌₁ᴺ εᵢεᵢ′)⁻¹ ],

where C₀ = (B₀ + Σᵢ₌₁ᴺ xᵢ′Σ⁻¹xᵢ)⁻¹.

• The Gibbs sampler can be used since the conditionals are known.
• Simulation: N = 1000 or N = 10000. x₁ᵢ ~ N[0, 1] and x₂ᵢ ~ N[0, 1]; β₁₁ = β₁₂ = β₂₁ = β₂₂ = 1; σ₁₁ = σ₂₂ = 1, σ₁₂ = −0.5. Priors: β₀ = 0, B₀⁻¹ = λI (with λ = 10, 1, 0.1), D₀ = I and ν₀ = 5.
• The Gibbs sampler samples recursively from the conditional posteriors. Reject the first 5000 replications ("burn-in"). Use the subsequent 50000 and 100000 replications.
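The two conditional posteriors above translate directly into code. A scaled-down sketch in Python/SciPy (smaller N and fewer replications than in the slides' experiment; variable names and the seed are ours):

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(4)

# Simulate the two-equation SUR model with beta = (1, 1, 1, 1),
# sigma11 = sigma22 = 1, sigma12 = -0.5.
N = 500
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)
eps = rng.multivariate_normal([0, 0], [[1.0, -0.5], [-0.5, 1.0]], size=N)
y = np.column_stack([1 + x1 + eps[:, 0], 1 + x2 + eps[:, 1]])

# Stack regressors so y_i = X_i beta + eps_i, beta = (b11, b12, b21, b22)
X = np.zeros((N, 2, 4))
X[:, 0, 0], X[:, 0, 1] = 1.0, x1
X[:, 1, 2], X[:, 1, 3] = 1.0, x2

# Priors: beta ~ N[beta0, B0^{-1}] and Sigma^{-1} ~ Wishart[nu0, D0]
beta0, B0 = np.zeros(4), 0.1 * np.eye(4)   # B0^{-1} = 10 I, i.e. lambda = 10
nu0, D0 = 5, np.eye(2)

reps, burn = 2_000, 500
Sigma_inv = np.eye(2)
keep = []
for r in range(reps):
    # beta | Sigma, y ~ N[C0 (B0 beta0 + sum_i X_i' Sigma^{-1} y_i), C0]
    C0 = np.linalg.inv(B0 + np.einsum('nji,jk,nkl->il', X, Sigma_inv, X))
    m = C0 @ (B0 @ beta0 + np.einsum('nji,jk,nk->i', X, Sigma_inv, y))
    beta = rng.multivariate_normal(m, C0)
    # Sigma^{-1} | beta, y ~ Wishart[nu0 + N, (D0^{-1} + sum_i eps_i eps_i')^{-1}]
    resid = y - np.einsum('nij,j->ni', X, beta)
    scale = np.linalg.inv(np.linalg.inv(D0) + resid.T @ resid)
    Sigma_inv = wishart.rvs(df=nu0 + N, scale=scale, random_state=rng)
    if r >= burn:
        keep.append(beta)

print(np.mean(keep, axis=0))  # posterior means of beta, near the true (1, 1, 1, 1)
```

Each cycle alternates a multivariate normal draw for β with a Wishart draw for Σ⁻¹, exactly the recursion described above.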
• Table 1 reports the mean and standard deviation of the marginal posterior distribution of the 7 parameters.
• First three columns: results are not sensitive to different values of λ.
• Fourth column vs. first: doubling the number of replications has very little effect.
• Fifth column vs. first: increasing the sample size ten-fold to 10000 has relatively small impact on the point estimates, though precision is much higher.
• When the number of replications is small (≈ 1000) the autocorrelation coefficients of the parameters are found to be as high as 0.06. When the number of replications is ≈ 50000 the serial correlation is much lower, below 0.01.
7 Data Augmentation
• The Gibbs sampler can sometimes be applied to a wider range of models by the introduction of auxiliary variables.
• In particular, this is the case for models involving latent variables, such as discrete choice, truncated and censored models.
• Observe only y = g(y*) for given g(·) and latent dependent variable y*. e.g. probit/logit have y = 1(y* > 0).
• Data augmentation replaces y* by imputed values and treats these as observed data.
• The essential insight, due to Tanner and Wong (1987), is that the posterior based only on the observed data is intractable, but the posterior obtained after data augmentation is often tractable using the Gibbs sampler.
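For the probit case y = 1(y* > 0) the augmented Gibbs sampler alternates two tractable draws: y*ᵢ given β is truncated normal, and β given y* is normal. A sketch in Python/SciPy (a flat prior on β is assumed here; all names and tuning values are ours):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)

# Simulated probit data: y_i = 1(x_i' beta + e_i > 0), e_i ~ N[0, 1]
N = 1_000
X = np.column_stack([np.ones(N), rng.standard_normal(N)])
beta_true = np.array([0.5, 1.0])
y = (X @ beta_true + rng.standard_normal(N) > 0).astype(float)

XtX_inv = np.linalg.inv(X.T @ X)
reps, burn = 1_000, 200
beta = np.zeros(2)
keep = []
for r in range(reps):
    mu = X @ beta
    # y*_i | beta, y_i: N[mu_i, 1] truncated to (0, inf) if y_i = 1, (-inf, 0) if y_i = 0
    lo = np.where(y == 1, -mu, -np.inf)   # truncnorm bounds are standardized
    hi = np.where(y == 1, np.inf, -mu)
    ystar = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
    # beta | y*: N[(X'X)^{-1} X'y*, (X'X)^{-1}] under the flat prior
    beta = rng.multivariate_normal(XtX_inv @ (X.T @ ystar), XtX_inv)
    if r >= burn:
        keep.append(beta)

print(np.mean(keep, axis=0))  # posterior means, near beta_true = (0.5, 1.0)
```

Once the latent y* is imputed, the β step is just the Bayesian linear regression of section 3 with known error variance 1, which is what makes the augmented sampler tractable.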
8 Bayesian Model Selection
• Method uses Bayes factors.
• Two hypotheses under consideration:
- H₁ and H₂, possibly non-nested.
- Prior probabilities Pr[H₁] and Pr[H₂].
- Sample dgp's Pr[y|H₁] and Pr[y|H₂].

• Posterior probabilities by Bayes' theorem:

Pr[Hₖ|y] = Pr[y|Hₖ]Pr[Hₖ] / ( Pr[y|H₁]Pr[H₁] + Pr[y|H₂]Pr[H₂] ).

• The posterior odds ratio is

Pr[H₁|y] / Pr[H₂|y] = ( Pr[y|H₁]Pr[H₁] ) / ( Pr[y|H₂]Pr[H₂] ) ≡ B₁₂ × Pr[H₁]/Pr[H₂],

where B₁₂ = Pr[y|H₁] / Pr[y|H₂] is called the Bayes factor.

• Hypothesis 1 is preferred if the posterior odds ratio exceeds 1.

• Bayes factor = posterior odds in favor of H₁ if Pr[H₁] = Pr[H₂].

• The Bayes factor has the form of a likelihood ratio, but it depends on unknown parameters θₖ that are eliminated by integrating over the parameter space with respect to the prior, so

Pr[y|Hₖ] = ∫ Pr[y|θₖ, Hₖ] π(θₖ|Hₖ) dθₖ.

• This expression depends upon all the constants that appear in the likelihood. These constants can be neglected when evaluating the posterior, but are required for the computation of the Bayes factor.
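For the normal sampling model of section 2.2 the integral Pr[y|Hₖ] is available in closed form: the part that varies across hypotheses reduces to the density of ȳ, since ȳ|Hₖ ~ N[μₖ, σ²/N + τₖ²], while the remaining constants are common to both hypotheses and cancel in B₁₂. A sketch comparing two illustrative priors (the numbers are ours):

```python
import math

# Two hypotheses about theta in y_i|theta ~ N[theta, sigma2], sigma2 known:
#   H1: theta ~ N[5, 3]    H2: theta ~ N[0, 3]
# Marginal likelihood ratio = ratio of N[mu_k, sigma2/N + tau2_k] densities at ybar.
def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

sigma2, N, ybar = 100.0, 50, 10.0
m1 = normal_pdf(ybar, 5.0, sigma2 / N + 3.0)  # Pr[y|H1] up to a common constant
m2 = normal_pdf(ybar, 0.0, sigma2 / N + 3.0)  # Pr[y|H2] up to the same constant
B12 = m1 / m2                                 # Bayes factor: exp(7.5), about 1808
print(B12)
```

Here B₁₂ ≈ 1808, so with equal prior probabilities the posterior odds strongly favor H₁, whose prior is centered nearer the sample mean.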
9 Practical Considerations
• The WinBUGS package (Bayesian inference Using Gibbs Sampling) is especially useful for hierarchical models and missing-data problems.
• For more complicated models use Matlab or Gauss.
• Practical issue: how long to run the chain. Diagnostic checks for convergence are available, but often lack universal applicability. Graphing the output for scalar parameters from the Markov chain is a visually attractive way of confirming convergence, but more formal approaches are available (Geweke, 1992). Gelman and Rubin (1992) use multiple (parallel) Gibbs samplers, each beginning with different starting values, to see whether different chains converge to the same posterior distribution. Zellner and Min (1995) propose several convergence criteria that can be used if the posterior can be written explicitly.
10 Bibliography
• Useful books include Gamerman (1997), Gelman, Carlin, Stern and Rubin (1995), Gill (2002) and Koop (2003), plus older texts by Zellner (1971) and Leamer (1978).
• Numerous papers by Chib and his collaborators, and Geweke and his collaborators, cover many topics of interest in microeconometrics. See Chib and Greenberg (1996), Chib (2000) and Geweke and Keane (2000).
Albert, J.H. (1988), "Computational Methods for Using a Bayesian Hierarchical Generalized Linear Model", Journal of the American Statistical Association, 83, 1037-1045.
Casella, G. and E. George (1992), "Explaining the Gibbs Sampler", The American Statistician, 46, 167-174.
Chib, S. (2000), "Markov Chain Monte Carlo Methods: Computation and Inference", chapter 57 in J.J. Heckman and E.E. Leamer, Editors, Handbook of Econometrics, Volume 5, 3570-3649.
Chib, S., and E. Greenberg (1995), "Understanding the Metropolis-Hastings Algorithm", The American Statistician, 49, 4, 327-335.
Chib, S., and E. Greenberg (1996), "Markov Chain Monte Carlo Simulation Method in Econometrics", Econometric Theory, 12, 409-431.
Gamerman, D. (1997), Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, London: Chapman and Hall.
Gelfand, A.E. and A.F.M. Smith (1990), "Sampling Based Approaches to Calculating Marginal Densities", Journal of the American Statistical Association, 85, 398-409.
Gelman, A., J.B. Carlin, H.S. Stern and D.B. Rubin (1995), Bayesian Data Analysis, London: Chapman and Hall.
Gelman, A., and D.B. Rubin (1992), "Inference from Iterative Simulations Using Multiple Sequences", Statistical Science, 7, 457-511.
Geman, S. and D. Geman (1984), "Stochastic Relaxation, Gibbs Distributions and Bayesian Restoration of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
Geweke, J. (1989), "Bayesian Inference in Econometric Models Using Monte Carlo Integration", Econometrica, 57, 1317-1339.
Geweke, J. (1992), "Evaluating the Accuracy of Sampling-based Approaches to the Calculation of Posterior Moments (with discussion)", in J. Bernardo, J. Berger, A.P. Dawid, and A.F.M. Smith, Editors, Bayesian Statistics 4, 169-193, Oxford: Oxford University Press.
Geweke, J. and M. Keane (2000), "Computationally Intensive Methods for Integration in Econometrics", chapter 56 in J.J. Heckman and E.E. Leamer, Editors, Handbook of Econometrics, Volume 5, 3463-3567.
Gill, J. (2002), Bayesian Methods: A Social and Behavioral Sciences Approach, Boca Raton (FL): Chapman and Hall.
Hastings, W.K. (1970), "Monte Carlo Sampling Methods Using Markov Chains and Their Applications", Biometrika, 57, 97-109.
Kass, R.E. and A.E. Raftery (1995), "Bayes Factors", Journal of the American Statistical Association, 90, 773-795.
Kloek, T. and H.K. van Dijk (1978), "Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo", Econometrica, 46, 1-19.
Koop, G. (2003), Bayesian Econometrics, Wiley.
Leamer, E.E. (1978), Specification Searches: Ad Hoc Inference with Nonexperimental Data, New York: John Wiley.
Robert, C.P., and G. Casella (1999), Monte Carlo Methods, New York: Springer-Verlag.
Tanner, M.A., and W.H. Wong (1987), "The Calculation of Posterior Distributions by Data Augmentation", Journal of the American Statistical Association, 82, 528-549.
Zellner, A. (1971), An Introduction to Bayesian Inference in Econometrics, New York: John Wiley.
Zellner, A. (1978), "Jeffreys-Bayes Posterior Odds Ratio and the Akaike Information Criterion for Discriminating Between Models", Economics Letters, 1, 337-342.
Zellner, A., and C-k. Min (1995), "Gibbs Sampler Convergence Criteria", Journal of the American Statistical Association, 90, 921-927.
Table 1: Mean and standard deviation of the posterior distribution of a two-equation SUR model calculated by Gibbs sampling.

          λ = 10     λ = 1      λ = 1/10   λ = 10     λ = 10
N         1000       1000       1000       1000       10000
reps      50000      50000      50000      100000     100000
β₁₁       0.971      1.013      0.983      1.020      1.010
          (0.0310)   (0.0312)   (0.0316)   (0.0324)   (0.0100)
β₁₂       1.026      0.9835     1.006      1.006      1.015
          (0.0265)   (0.0271)   (0.0265)   (0.0268)   (0.0086)
β₂₁       1.016      0.972      0.993      1.017      0.991
          (0.0309)   (0.0325)   (0.0322)   (0.0326)   (0.0100)
β₂₂       0.983      0.992      0.979      1.005      1.007
          (0.0256)   (0.0285)   (0.0272)   (0.0277)   (0.0085)
σ₁₁       0.960      0.969      1.012      1.043      1.010
          (0.0429)   (0.0434)   (0.0453)   (0.0466)   (0.0143)
σ₁₂       -0.499     -0.507     -0.519     -0.576     -0.515
          (0.0340)   (0.0358)   (0.0368)   (0.0379)   (0.0113)
σ₂₂       0.950      1.066      1.049      1.062      1.002
          (0.425)    (0.0476)   (0.0467)   (0.0472)   (0.0141)