Hierarchical Selection Models with Applications in Meta-Analysis

This article was downloaded by: [Florida Atlantic University] on 11 November 2014, at 12:37. Publisher: Taylor & Francis. Journal of the American Statistical Association, publication details: http://www.tandfonline.com/loi/uasa20

To cite this article: Nancy Paul Silliman (1997) Hierarchical Selection Models with Applications in Meta-Analysis, Journal of the American Statistical Association, 92:439, 926-936, DOI: 10.1080/01621459.1997.10474047. Link: http://dx.doi.org/10.1080/01621459.1997.10474047



Hierarchical Selection Models With Applications in Meta-Analysis

Nancy Paul SILLIMAN

Hierarchical selection models are introduced and shown to be useful in meta-analysis. These models combine the use of hierarchical models, allowing investigation of variability both within and between studies, and weight functions, allowing modeling of nonrandomly selected studies. Markov chain Monte Carlo (MCMC) methods are used to estimate the hierarchical selection model. This is first illustrated for known weight functions, and then extended to allow for estimation of unknown weight functions. To investigate sensitivity of results to unobserved studies directly, which is shown to be different from modeling bias in the selection of observed studies, the hierarchical selection model is used in conjunction with data augmentation. Again, MCMC methods may be used to estimate the model. This is illustrated for an unknown weight function.

KEY WORDS: Data augmentation; Hierarchical model; Monte Carlo; Selection bias; Weight function.

1. INTRODUCTION

This article addresses three aspects of combining information, with special attention to meta-analysis. The first aspect is the use of hierarchical models to model study effects (e.g., differences in experimental and control group means) that are heterogeneous (Carlin 1992; DuMouchel 1990; Morris and Normand 1992). The second aspect deals with selection bias (Bayarri and DeGroot 1993; Hedges and Olkin 1985; Larose and Dey 1996). Sometimes it is impossible to observe a random sample from the population of studies of interest, because there is selection inherent in the way studies are observed (e.g., only studies with significant results are observed). Weight functions, defined later, can be used to account for such selection. The third aspect deals with sensitivity of results to any unobserved study effects. Data augmentation is used to investigate such sensitivity by modeling unobserved study effects directly (West 1994). An approach to meta-analysis is illustrated that addresses all three of these issues. First, hierarchical selection models that model both heterogeneity and selection bias, accounting for the first two aspects just mentioned, are introduced (Paul 1995). These models are then extended to include modeling unobserved studies directly using data augmentation (Paul 1995).

When study effects are homogeneous, fixed-effects models may be used for meta-analysis (Glass 1976). But when study effects are not homogeneous, Hedges (1983) showed that it is more appropriate to use random-effects (or hierarchical) models. Hierarchical models allow one to model both properties about the individual study effects (usually done in the first stage of the model) and how these properties vary among studies (done in the second stage, third stage, etc.). Section 2 presents a two-stage Bayesian hierarchical model for effect sizes in a meta-analysis, similar to that of Carlin (1992), DuMouchel (1990), and Morris and Normand (1992), who assumed that the observed studies are a random sample from the population of interest, and introduces a Jeffreys prior for the final stage of the model. This model is used as a building block for the hierarchical selection model introduced in Section 3. An example from the field of dentistry is given that is used throughout the article.

Nancy Paul Silliman is mathematical statistician, Division of Biometrics, Center for Drug Evaluation and Research, Food and Drug Administration, Rockville, MD, 20850. This research was conducted while the author was a Howard Hughes Medical Institute Predoctoral Fellow in Biostatistics in the Department of Statistics, Carnegie Mellon University, Pittsburgh, PA. The author would like to thank her Ph.D. advisor, Larry Wasserman, for his guidance and many helpful suggestions. The author would also like to thank the editor, the associate editor, and two referees for comments that greatly improved the manuscript.

The second issue concerns complications that arise when one is not able to observe a random sample from the underlying population of interest. For example, as large studies with significant results are more easily published than small studies with nonsignificant results (Greenwald 1975), this "publication bias" inherent in meta-analyses can give misleading results unless accounted for. When this is the case, or when there is any sort of selection inherent in the way units are observed, weight functions allow one to incorporate this information into the modeling process. (For a more complete list of references on meta-analysis and publication bias, see Cooper and Hedges 1994.)

The concept of a weight function originated with Fisher (1934), who demonstrated a need for adjustment in the way models are specified depending on how the data are obtained. A good survey on the topic was provided by Rao (1985). Weight functions may be defined as follows. Suppose that a random variable X is distributed according to the density f(x|θ), but a random sample cannot be drawn from this distribution; in fact, the probability that an observation x enters the sample gets multiplied by some nonnegative weight function w(x). Then the observed sample is actually a random sample from the weighted distribution,

\[
f_w(x \mid \theta) = \frac{w(x)\, f(x \mid \theta)}{\int w(x)\, f(x \mid \theta)\, dx}.
\]
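As a quick numerical illustration of the weighted-distribution formula (a sketch written for this summary, not from the article; the standard normal f and the size-biased weight w(x) = |x| are arbitrary choices), the following evaluates f_w on a grid and confirms that it integrates to 1 and shifts mass away from 0:

```python
import math

def f(x):
    # standard normal density, an arbitrary choice of f(x | theta)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def w(x):
    # size-biased weight w(x) = |x|
    return abs(x)

# evaluate the weighted density f_w on a grid
xs = [-8 + 16 * i / 4000 for i in range(4001)]
h = xs[1] - xs[0]
norm = sum(w(x) * f(x) for x in xs) * h        # integral of w(x) f(x) dx
fw = [w(x) * f(x) / norm for x in xs]

total = sum(fw) * h                            # should be ~1
mean_abs = sum(abs(x) * d for x, d in zip(xs, fw)) * h
print(total, mean_abs)
```

Under f, E|X| is about .80; under the weighted density the same quantity is larger, reflecting that large observations are oversampled.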

The weight function might also depend on an unknown parameter, γ, and the form of the weighted distribution then

© 1997 American Statistical Association
Journal of the American Statistical Association
September 1997, Vol. 92, No. 439, Theory and Methods


generalizes to

\[
f_w(x \mid \theta, \gamma) = \frac{w(x \mid \gamma)\, f(x \mid \theta)}{\int w(x \mid \gamma)\, f(x \mid \theta)\, dx}.
\]

Section 3 discusses how known weight functions can be incorporated into the hierarchical model for meta-analysis given in Section 2 using Markov chain Monte Carlo (MCMC) methods such as Gibbs sampling (Gelfand and Smith 1990) and the Metropolis algorithm (Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller 1953). Because weight functions will not be known in practice, Section 4 considers Bayesian estimation of parametric and nonparametric weight functions, including shape constraints, in the context of the hierarchical model discussed here.

Weight functions allow one to investigate sensitivity of results to bias in the way studies are obtained. However, this is different from examining sensitivity to unobserved studies directly. To investigate sensitivity of results to unobserved studies, while still accounting for between-study variability and bias in the collection of the observed studies, the hierarchical selection model approach is combined with data augmentation to allow for direct modeling of unobserved studies. Again, MCMC methods may be used to estimate the model, as Section 5 illustrates for an unknown (i.e., estimated) weight function. Section 6 provides some conclusions.

2. HIERARCHICAL MODELS FOR META-ANALYSIS

For the case in which the observed studies are a random sample from the population of interest, the following hierarchical model for a meta-analysis is considered:

\[
y_i \mid \alpha_i \sim N(\alpha_i, \sigma_i^2), \quad i = 1, \ldots, I,
\]
\[
\alpha_i \mid \mu, \sigma_\alpha^2 \sim N(\mu, \sigma_\alpha^2), \quad i = 1, \ldots, I,
\]
\[
\mu \mid a, b \sim N(a, b), \qquad \sigma_\alpha^2 \mid c, d \sim IG(c, d), \tag{1}
\]

where N represents the normal distribution, IG represents the inverse gamma distribution, y_i is the observed study effect (e.g., a difference in treatment means), α_i is the true study effect, σ_i² is the within-study variance, μ is the average study effect, and σ_α² is the between-study variance. Because σ_i is usually the standard error of an estimate, for simplicity we treat it as fixed. All posterior distributions are easily estimated using MCMC (Gelfand and Smith 1990; Hastings 1970). Note that one could also specify a nonnormal distribution for effect sizes (such as the noncentral t) in the first and/or second stage of the model, if desired. Again, MCMC methods could be used to estimate the model. For meta-analyses where there seems to be either disagreement or skewness among studies, this would provide a more robust approach.

As an alternative to specifying proper prior distributions for μ and σ_α², one could specify a reference (or noninformative) prior distribution (Berger 1980, p. 82). Following the suggestion of Jeffreys (1961) and Kass and Wasserman (1994), one could assume that μ and σ_α² are independent and take μ to have a "flat" prior distribution (i.e., uniform on ℝ). The distribution for σ_α² is obtained as

\[
p_1(\sigma_\alpha^2) = [I(\sigma_\alpha^2)]^{1/2},
\]

where I(σ_α²) is the expected Fisher information, given by

\[
I(\sigma_\alpha^2) = -E\left[\frac{\partial^2 \log f(y \mid \mu, \sigma_\alpha^2)}{\partial(\sigma_\alpha^2)^2}\right],
\]

where

\[
f(y \mid \mu, \sigma_\alpha^2) = (2\pi)^{-I/2} \prod_{i=1}^{I} (\sigma_i^2 + \sigma_\alpha^2)^{-1/2} \times \exp\left\{ \sum_{i=1}^{I} \frac{-(y_i - \mu)^2}{2(\sigma_i^2 + \sigma_\alpha^2)} \right\}
\]

is the marginal distribution of the data y. In this model, the reference prior distribution for σ_α² using Jeffreys's rule is

\[
p_1(\sigma_\alpha^2) \propto \left[ \frac{1}{2} \sum_{i=1}^{I} (\sigma_i^2 + \sigma_\alpha^2)^{-2} \right]^{1/2}. \tag{2}
\]

For the model where the Jeffreys prior distributions are specified in the third stage, the full conditional distributions used in the Gibbs sampling algorithm are

\[
\alpha_i \mid y, \mu, \sigma_\alpha^2 \sim N\!\left( \frac{\sigma_\alpha^2 y_i + \sigma_i^2 \mu}{\sigma_\alpha^2 + \sigma_i^2}, \frac{\sigma_\alpha^2 \sigma_i^2}{\sigma_\alpha^2 + \sigma_i^2} \right), \quad i = 1, \ldots, I,
\]
\[
\mu \mid y, \alpha, \sigma_\alpha^2 \sim N\!\left( \bar{\alpha}, \frac{\sigma_\alpha^2}{I} \right), \tag{3}
\]

and

\[
\sigma_\alpha^2 \mid y, \alpha, \mu \sim IG\!\left( \frac{I}{2} - 1, \frac{1}{2} \sum_{i=1}^{I} (\alpha_i - \mu)^2 \right) \cdot p_1(\sigma_\alpha^2).
\]

Because one cannot sample from the full conditional distribution for σ_α² directly, a Metropolis step (Metropolis et al. 1953; Tierney 1991) is used. If the Markov chain is at σ_α²(N) = θ, then a candidate value θ* is generated from IG(I/2 − 1, 0.5 Σ_{i=1}^I (α_i − μ)²). With probability β(θ, θ*) = min{ [0.5 Σ_{i=1}^I (σ_i² + θ*)^{−2}]^{1/2} / [0.5 Σ_{i=1}^I (σ_i² + θ)^{−2}]^{1/2}, 1 }, the chain moves to σ_α²(N+1) = θ*. Otherwise, the chain moves to σ_α²(N+1) = θ.

Note that if σ_1² = σ_2² = ⋯ = σ_I² = σ² were assumed, then the Jeffreys prior for σ_α² would simplify to

\[
p_1(\sigma_\alpha^2) \propto (\sigma^2 + \sigma_\alpha^2)^{-1},
\]

which is similar to the Jeffreys prior described by Box and Tiao (1973, p. 251) for the case in which both σ² and σ_α²


are unknown:

\[
p(\sigma^2, \sigma_\alpha^2) \propto \sigma^{-2} (\sigma^2 + \sigma_\alpha^2)^{-1}.
\]
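The Gibbs sampler with a Metropolis step for σ_α² under the Jeffreys prior can be sketched as follows (an illustration written for this summary, not the article's code, using the 12 fluoride studies analyzed later in Table 1; the starting values are arbitrary):

```python
import math
import random

random.seed(1)

# Table 1 data: observed effects y_i and within-study standard errors s_i
y = [0.86, 0.33, 0.47, 0.50, -0.28, 0.04, 0.80, 0.19, 0.49, 0.49, 0.01, 0.67]
s = [0.57, 0.56, 0.35, 0.25, 0.54, 0.28, 0.78, 0.13, 0.28, 0.24, 0.08, 0.17]
I = len(y)

def jeffreys(v):
    # p1(sigma_alpha^2), Jeffreys prior of eq. (2), up to proportionality
    return math.sqrt(0.5 * sum((si**2 + v) ** -2 for si in s))

mu, va = 0.0, 0.04          # starting values for mu and sigma_alpha^2
mu_draws = []
for it in range(4000):
    # alpha_i | y, mu, sigma_alpha^2: conjugate normal update
    alpha = [random.gauss((va * yi + si**2 * mu) / (va + si**2),
                          math.sqrt(va * si**2 / (va + si**2)))
             for yi, si in zip(y, s)]
    # mu | alpha, sigma_alpha^2 ~ N(alpha_bar, sigma_alpha^2 / I)
    mu = random.gauss(sum(alpha) / I, math.sqrt(va / I))
    # Metropolis step: propose from IG(I/2 - 1, 0.5 * sum (alpha_i - mu)^2);
    # the acceptance ratio reduces to p1(candidate) / p1(current)
    ss = 0.5 * sum((a - mu) ** 2 for a in alpha)
    cand = 1.0 / random.gammavariate(I / 2 - 1, 1.0 / ss)
    if random.random() < min(jeffreys(cand) / jeffreys(va), 1.0):
        va = cand
    if it >= 1000:                       # discard burn-in
        mu_draws.append(mu)

post_mean_mu = sum(mu_draws) / len(mu_draws)
print(round(post_mean_mu, 2))
```

The resulting posterior mean of μ can be compared with the entries in Table 2; exact values depend on the run length and seed.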

2.1 An Example

Johnson (1993) reviewed 12 studies comparing the effectiveness of two different types of fluoride, sodium fluoride (NaF) and sodium monofluorophosphate (SMFP), in preventing cavities. For each study, the observed average difference in effect y_i (given by the average increment in decayed, missing, and/or filled surfaces for teeth (D(M)FS) for patients using SMFP minus the average increment in D(M)FS for patients using NaF), the corresponding standard error σ_i, and the sample size N are reported (Table 1). As one hopes to reduce the average increment in D(M)FS with fluoride treatment, a positive difference suggests that NaF is more effective than SMFP. Eleven of the 12 studies report a positive effect, and the estimated overall difference in effect (measured by weighting each study difference according to sample size and variability and averaging) is .32, with a 95% confidence interval of (.13, .52), supporting the hypothesis that NaF is better.
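As a rough check on the direction of that estimate (not a reproduction of Johnson's weighting scheme, which also uses the sample sizes), a simple inverse-variance weighted mean of the Table 1 effects can be computed; it is positive, though smaller than the .32 reported:

```python
# Table 1 data: observed effects y_i and standard errors s_i
y = [0.86, 0.33, 0.47, 0.50, -0.28, 0.04, 0.80, 0.19, 0.49, 0.49, 0.01, 0.67]
s = [0.57, 0.56, 0.35, 0.25, 0.54, 0.28, 0.78, 0.13, 0.28, 0.24, 0.08, 0.17]

w = [1.0 / si**2 for si in s]                  # inverse-variance weights
wmean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
print(round(wmean, 3))                         # positive, favoring NaF
```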

It is useful to further quantify the statement that NaF is more effective by looking at the posterior probability that μ (the average treatment difference) is positive. This is done using the hierarchical model described earlier. First, it is assumed that (1) μ follows a normal distribution with mean 0 and variance .04, corresponding to the belief a priori that there is no difference in treatment, and that (2) σ_α² follows an inverse gamma distribution with mean .04 and variance 1.0, corresponding to the noniterative estimate for σ_α² of .04 developed by DerSimonian and Laird (1986). This is referred to as the "clinical informative prior."
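The DerSimonian and Laird (1986) noniterative estimate of σ_α² mentioned above is a method-of-moments quantity that can be computed directly from the Table 1 data; the sketch below (written for this summary) recovers a value near the .04 used for the prior mean:

```python
# Table 1 data
y = [0.86, 0.33, 0.47, 0.50, -0.28, 0.04, 0.80, 0.19, 0.49, 0.49, 0.01, 0.67]
s = [0.57, 0.56, 0.35, 0.25, 0.54, 0.28, 0.78, 0.13, 0.28, 0.24, 0.08, 0.17]

w = [1.0 / si**2 for si in s]                    # fixed-effect weights
ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
Q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
k = len(y)
# DerSimonian-Laird method-of-moments estimator, truncated at 0
tau2 = max(0.0, (Q - (k - 1)) / (sum(w) - sum(wi**2 for wi in w) / sum(w)))
print(round(tau2, 3))
```

The result is roughly .04, consistent with the value used to center the clinical informative prior.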

For the clinical informative prior, the estimate of the average effect μ is .25, and the estimate of the between-study variance σ_α² is .034 (see Table 2, line 1). Note that the estimate of μ is a compromise between the observed data with weighted mean .32 (given by Johnson) and the prior mean of 0. The estimated posterior density for μ gives P(μ > 0|y) = .9985, providing very strong evidence that NaF is more effective in combating cavities than SMFP.

Results are fairly consistent when the Jeffreys prior distributions are specified. The estimate of the overall effect μ is somewhat larger (.32, equal to the observed weighted mean specified by Johnson, compared to .25), as is the estimate of the between-study variance (.067, compared to .034). In this case P(μ > 0|y) = .9981, which again is strong evidence for the superiority of NaF.


3. HIERARCHICAL SELECTION MODELS

The hierarchical model described in Section 2 assumes that the studies included in a meta-analysis are a random sample from all available studies. This is often not the case, however, when studies are selected through a literature search. This can be a problem in that published research tends to be biased toward statistical significance (Rosenthal 1979). One approach to modeling any nonrandomness or bias in study selection is to use weight functions.

Hedges and Olkin (1985) incorporated known weight functions into a fixed-effects model for meta-analysis using maximum likelihood techniques. Iyengar and Greenhouse (1988) extended this approach using a similar model by allowing for maximum likelihood estimation of parametric families of weight functions. For a nonhierarchical model, Cleary (1993) and Larose and Dey (1996) considered Bayesian estimation of the weight function and computing the posterior distribution(s).

Fixed-effects models allow one to learn about the observed study effects, but not about the population of study effects of interest. Thus one desirable extension of the work of Hedges and Olkin (1985) and Iyengar and Greenhouse (1988) would be to allow for random effects in a meta-analysis. This is accomplished here by constructing a "hierarchical selection model," which incorporates weight functions into the hierarchical model discussed in Section 2. These models allow one to simultaneously model heterogeneity of study effects and publication bias.

Assume that the model for a meta-analysis involving a random sample of studies is given by (1), and that the probability of observing any specific study effect y_i is multiplied by some known nonnegative weight function w(y_i). Then the random variable actually observed is a weighted version of y_i, y_i^w ≡ x_i, from the weighted density

\[
f_w(x_i \mid \alpha_i, \sigma_i^2) = \frac{w(x_i)\, f(x_i \mid \alpha_i, \sigma_i^2)}{c_{w,\alpha_i,\sigma_i^2}}, \quad i = 1, \ldots, I, \tag{4}
\]

where f(x_i | α_i, σ_i²) = (2π)^{−1/2} σ_i^{−1} exp(−.5 σ_i^{−2}(x_i − α_i)²) and c_{w,α_i,σ_i²} = ∫ w(x) f(x | α_i, σ_i²) dx.

For any specified weight function, the full conditional distributions used in the MCMC algorithm for estimating the hierarchical selection model with stage III proper prior distributions are

\[
\alpha_i \mid x, \alpha_{\ne i}, \mu, \sigma_\alpha^2 \propto (c_{w,\alpha_i,\sigma_i^2})^{-1}\, N\!\left( \frac{\sigma_\alpha^2 x_i + \sigma_i^2 \mu}{\sigma_\alpha^2 + \sigma_i^2}, \frac{\sigma_\alpha^2 \sigma_i^2}{\sigma_\alpha^2 + \sigma_i^2} \right), \quad i = 1, \ldots, I,
\]
\[
\mu \mid x, \alpha, \sigma_\alpha^2 \sim N\!\left( \frac{bI\bar{\alpha} + a\sigma_\alpha^2}{bI + \sigma_\alpha^2}, \frac{b\sigma_\alpha^2}{bI + \sigma_\alpha^2} \right),
\]

and

\[
\sigma_\alpha^2 \mid x, \alpha, \mu \sim IG\!\left( c + \frac{I}{2},\; d + \frac{1}{2}\sum_{i=1}^{I}(\alpha_i - \mu)^2 \right), \tag{5}
\]

where x = (x_1, …, x_I)' and ᾱ = (Σ_{i=1}^I α_i)/I.
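The normalizing constant c_{w,α_i,σ_i²} generally has no simple closed form and must be evaluated numerically inside the sampler. For the size-biased weight w(x) = |x| used later, however, it equals the mean absolute value of a N(α, σ²) variable, which has a known closed form; that gives a convenient correctness check for a quadrature routine (a sketch written for this summary, with arbitrary α and σ):

```python
import math

def normal_pdf(x, m, sd):
    return math.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def c_w(alpha, sd, w, lo=-10.0, hi=10.0, n=20000):
    # c_{w,alpha,sigma^2} = integral of w(x) f(x | alpha, sigma^2) dx,
    # evaluated by the trapezoid rule
    h = (hi - lo) / n
    ys = [w(lo + i * h) * normal_pdf(lo + i * h, alpha, sd) for i in range(n + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# check against the folded-normal mean for w(x) = |x|:
# E|X| = sd*sqrt(2/pi)*exp(-alpha^2/(2 sd^2)) + alpha*erf(alpha/(sd*sqrt(2)))
alpha, sd = 0.3, 0.25
closed = (sd * math.sqrt(2 / math.pi) * math.exp(-alpha**2 / (2 * sd**2))
          + alpha * math.erf(alpha / (sd * math.sqrt(2))))
numeric = c_w(alpha, sd, abs)
print(numeric, closed)
```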


Table 1. Observed Effect, Standard Error, and Sample Size for the 12 Studies From the Original Johnson Meta-Analysis

Study    1      2      3      4      5      6      7      8      9      10     11     12
y_i     .86    .33    .47    .50   -.28    .04    .80    .19    .49    .49    .01    .67
σ_i     .57    .56    .35    .25    .54    .28    .78    .13    .28    .24    .08    .17
N       247    326    277    363    343  1,490    418  2,273  1,352  2,762  2,222  2,126

For the stage III Jeffreys prior distributions, the full conditional distributions used in the MCMC algorithm are

\[
\alpha_i \mid x, \alpha_{\ne i}, \mu, \sigma_\alpha^2 \propto (c_{w,\alpha_i,\sigma_i^2})^{-1}\, N\!\left( \frac{\sigma_\alpha^2 x_i + \sigma_i^2 \mu}{\sigma_\alpha^2 + \sigma_i^2}, \frac{\sigma_\alpha^2 \sigma_i^2}{\sigma_\alpha^2 + \sigma_i^2} \right), \quad i = 1, \ldots, I,
\]
\[
\mu \mid x, \alpha, \sigma_\alpha^2 \sim N\!\left( \bar{\alpha}, \frac{\sigma_\alpha^2}{I} \right),
\]

and

\[
\sigma_\alpha^2 \mid x, \alpha, \mu \sim IG\!\left( \frac{I}{2} - 1, \frac{1}{2}\sum_{i=1}^{I}(\alpha_i - \mu)^2 \right) \cdot p_1(\sigma_\alpha^2), \tag{6}
\]

where p_1 is given by (2).

Depending on the weight function w(x) specified at the first stage, it may or may not be possible to sample directly from the weighted full conditional distributions for the α_i. If it is possible, then analysis proceeds as before. If it is not possible, then Gibbs sampling in the case of stage III proper prior distributions, and Gibbs sampling with a Metropolis step for σ_α² in the case of stage III Jeffreys prior distributions, is used to obtain estimates of all posterior distributions as before, with the following modification. As one can still evaluate the weighted full conditional distributions for the α_i, either analytically or numerically depending on the weight function, a Metropolis step for sampling from these distributions is added to the algorithm. Estimation of hierarchical selection models is investigated for various weight functions w(·).

3.1 Size-Biased Sampling

Recall the dental example given in Section 2.1, where 12 studies compare the effectiveness of two different fluorides, NaF and SMFP, in preventing cavities. Johnson (1993) stated that the 12 studies that she reviewed were selected through a world-wide literature search. (She actually found 13 studies, but used only 12 because the thirteenth failed to provide an estimate of the variance of the study effect.) Because published research has been shown to be biased toward statistical significance (Rosenthal 1979), this is an interesting example for investigating the possible effect of publication bias.

To account for the fact that studies with larger effects (here, the difference in mean effect of NaF and SMFP) have a higher probability of being published, a weight function is specified. One plausible weight function is w(x) = |x|^γ, where γ ≥ 0. This is referred to as "size-biased sampling." For this weight function, it is not possible to sample directly from the weighted full conditional distribution for α_i, so a Metropolis step is used. For each α_i, if the Markov chain is at α_i^(n) = θ, then a candidate value θ* is sampled from the corresponding unweighted full conditional distribution and accepted with probability

\[
\beta(\theta, \theta^*) = \min\left\{ \left[ \int |u|^\gamma \phi\!\left(\frac{u - \theta^*}{\sigma_i}\right) du \right]^{-1} \left[ \int |u|^\gamma \phi\!\left(\frac{u - \theta}{\sigma_i}\right) du \right], 1 \right\}.
\]

Table 2. Estimates of the Average Study Effect and Between-Study Variance for Various Hierarchical Selection Models

                                       Clinical informative prior          Jeffreys's prior
Weight function                        μ̂      σ̂_α²    γ̂                  μ̂      σ̂_α²    γ̂
No weight function                    .25     .034    N/A                 .32     .067    N/A
Size-biased sampling (γ = 1.0)        .19     .026    N/A                 .22     .042    N/A
Size-biased sampling (γ = 2.0)        .15     .021    N/A                 .15     .025    N/A
Size-biased sampling (E(γ) = .5)      .23     .030    .34                 .28     .054    .29
Size-biased sampling (E(γ) = 2.0)     .17     .024    1.36                .20     .035    1.33
Truncated sampling (E(γ) = .001)      .25     .034    .001                .32     .065    .001
Truncated sampling (E(γ) = 1.0)       .23     .030    .57                 .29     .057    .44
Step weight function (γ1, γ2, γ3)     .22     .030    (.75, .57, .36)     .28     .056    (.74, .57, .40)

NOTE: In models specifying publication bias, estimates of μ and σ_α² are smaller, to account for the fact that observed study effects are more likely to be drawn from the tails of the underlying distribution than the middle (near 0). The greater the publication bias, the more both estimates shrink. Because the clinical informative prior for μ is centered around 0, whereas the Jeffreys prior is "noninformative" and hence influenced by the data (weighted average study effect of .32), estimates of μ are smaller under the clinical informative prior.

Note that the choice of γ = 0 indicates no selection bias. In this case the hierarchical selection model simplifies to the hierarchical model discussed in Section 2. The results for the choice of γ = 1 using the clinical informative prior now follow (see also Table 2, line 2). The chosen weight function, w(x) = |x|, indicates that larger study effects are more likely to be observed. Thus the estimates of both μ and the α_i shrink toward 0. The estimate of μ is .19, compared


to .25 under the unweighted hierarchical model. But the posterior probability that μ > 0 is .990, suggesting that the evidence in favor of NaF is robust. The weight function also specifies that the weighted sample will be more spread out than a random sample. Correspondingly, the estimate of σ_α² is smaller (.026, compared to .034 for the unweighted model).

To examine robustness of the model to the choice of the prior distribution, results are also considered for the stage III Jeffreys prior. The estimate of μ in this case is again somewhat larger than the estimate of μ under the clinical informative prior (.22, compared to .19), as is the estimate of σ_α² (.042 versus .026). These estimates are smaller than those under the Jeffreys prior when no publication bias is assumed, for the same reasons stated in the previous paragraph. However, the posterior probability that μ is positive is .9918, so the conclusion that NaF is more effective remains the same.

Now suppose that γ = 2, specifying even more extreme publication bias. The results for this choice are similar to those for γ = 1, but there is even more shrinkage toward 0 (see Table 2, line 3). For the clinical informative prior, the estimate of μ is .15. The estimate of σ_α² is also smaller (.021). The posterior probability that μ > 0 is .980, again suggesting that the evidence in favor of NaF is robust to the choice of the weight function. For the Jeffreys prior, the estimates of μ and σ_α² are .15 and .025. In addition, P(μ > 0|x) = .9887.

Although the overall conclusion remains the same in both weighted models regardless of the prior specified, the individual parameter estimates differ. Results are sensitive not only to the choice of prior distribution, but also to the choice of the weight function. Thus it will be more satisfactory to estimate the weight function, as is done in the next section.
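The Metropolis step for α_i under w(x) = |x|^γ described in Section 3.1 can be sketched as follows (an illustration written for this summary; the single-study values x_i, σ_i, μ, and σ_α² are placeholders, and the normalizing integrals are evaluated by a simple quadrature):

```python
import math
import random

random.seed(2)
GAMMA = 1.0          # size-biased weight w(x) = |x|**GAMMA

def norm_const(theta, si, n=4000, lo=-10.0, hi=10.0):
    # integral of |u|^gamma * phi((u - theta)/si) du, up to a 1/si factor
    # that cancels in the acceptance ratio
    h = (hi - lo) / n
    return sum(abs(lo + i * h) ** GAMMA *
               math.exp(-0.5 * ((lo + i * h - theta) / si) ** 2)
               for i in range(n + 1)) * h

def metropolis_alpha(theta, x_i, si, mu, va):
    # propose from the unweighted full conditional N(m, v)
    m = (va * x_i + si**2 * mu) / (va + si**2)
    v = va * si**2 / (va + si**2)
    cand = random.gauss(m, math.sqrt(v))
    # accept with probability min{ c(current) / c(candidate), 1 }
    accept = min(norm_const(theta, si) / norm_const(cand, si), 1.0)
    return cand if random.random() < accept else theta

# placeholder single-study update: x_i = .49, sigma_i = .28, mu = .2, va = .03
chain = [0.49]
for _ in range(200):
    chain.append(metropolis_alpha(chain[-1], 0.49, 0.28, 0.2, 0.03))
print(sum(chain) / len(chain))
```

The chain centers slightly below the unweighted conditional mean, reflecting the shrinkage toward 0 induced by dividing by the α-dependent normalizing constant.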

4. ESTIMATION OF THE WEIGHT FUNCTION

As the weight function must be specified by the user, much as a prior distribution for a parameter is specified, one question of interest is how robust results are to the choice of the weight function. One approach to dealing with this uncertainty is to use a parametric model for the weight function w(x|γ) and estimate the parameter(s) γ. Iyengar and Greenhouse (1988) did this using two different parametric weight functions and a "no effects" model that assumes that all study effects are the same. The parameters in the model and weight function are estimated using maximum likelihood. Rather than specify a parametric weight function, Hedges (1992) approximated the weight function by a step function in the context of a two-stage random-effects model. The parameters in the random-effects model and the heights of the step function are estimated using the EM algorithm (Dempster, Laird, and Rubin 1977). Hedges' model is similar to the one given here in (1), but without stage III. Thus, using Hedges' model, one does not obtain posterior distributions for the parameters of interest that describe the population of study effects, μ and σ_α².

A natural extension of such work is to estimate the weight function in the context of the more general three-stage hierarchical model given here. How this may be done is now illustrated for both parametric and step weight functions.

4.1 Parametric Weight Functions

Suppose that one wishes to use a weight function to model publication bias, but is unsure of the amount of selection present. A natural approach is to specify a family of parametric weight functions, then estimate the parameter(s). Here a Bayesian approach is described for estimating the parameters in the weight function.

For the hierarchical selection model described in Section 3, a Gibbs/Metropolis algorithm can again be used to estimate both the parameters in the model and the weight function. For simplicity, assume that the weight function depends on a single parameter. Denote this parameter by γ, and denote its prior distribution by p(γ). Then the weighted full conditional distribution for γ is

\[
p(\gamma \mid x, \alpha, \mu, \sigma_\alpha^2) \propto p(\gamma) \prod_{i=1}^{I} \frac{w(x_i \mid \gamma)}{c_{w,\gamma,\alpha_i,\sigma_i^2}}.
\]

The full conditional distributions for the other parameters in the model remain as in Equation (5) or (6), depending on whether the stage III proper or the Jeffreys prior distributions are specified. But because γ is now unknown, its value is updated after every Gibbs iteration. Results generalize to the multiparameter case in the obvious way. Examples are given herein for both cases.

To sample candidate values for γ, a normal distribution centered around the current value is used, and work is done on the log(γ) scale (recall that γ must be nonnegative). If the chain for y ≡ log(γ) is currently at y^(n) = y, then a candidate value y* is sampled from N(y, τ²). The value of the variance τ² is chosen initially, and then changed accordingly if the rejection rate is either too small or too large (Gilks and Wild 1992). On the log scale, the probability of accepting a candidate value y* is given by

\[
\beta(y, y^*) = \min\left\{ \frac{p(e^{y^*} \mid x, \alpha, \mu, \sigma_\alpha^2)\, e^{y^*}}{p(e^{y} \mid x, \alpha, \mu, \sigma_\alpha^2)\, e^{y}}, 1 \right\},
\]

where the factors e^{y*} and e^{y} arise from the change of variables from γ to log(γ).
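This log-scale random-walk update can be sketched in code (written for this summary, with the weighted conditional for γ evaluated by quadrature; the current α values, the Exponential(1) prior standing in for p(γ), and τ are all placeholder assumptions):

```python
import math
import random

random.seed(3)

x = [0.86, 0.33, 0.47, 0.50, -0.28, 0.04, 0.80, 0.19, 0.49, 0.49, 0.01, 0.67]
s = [0.57, 0.56, 0.35, 0.25, 0.54, 0.28, 0.78, 0.13, 0.28, 0.24, 0.08, 0.17]
alpha = x[:]                  # placeholder current values of alpha_i
N_GRID = 1000

def log_target(g):
    # log of p(gamma) * prod_i w(x_i | gamma) / c_i(gamma), with w(x) = |x|^gamma
    # and an Exponential(1) prior standing in for p(gamma)
    lp = -g
    h = 20.0 / N_GRID
    for xi, ai, si in zip(x, alpha, s):
        c = sum(abs(-10.0 + j * h) ** g *
                math.exp(-0.5 * ((-10.0 + j * h - ai) / si) ** 2)
                for j in range(N_GRID + 1)) * h / (si * math.sqrt(2 * math.pi))
        lp += g * math.log(abs(xi)) - math.log(c)
    return lp

tau = 0.5
ylog = math.log(0.5)          # start the chain at gamma = .5
cur = log_target(math.exp(ylog))
draws = []
for _ in range(100):
    ystar = random.gauss(ylog, tau)
    cand = log_target(math.exp(ystar))
    # the +ystar and -ylog terms are the Jacobian from the log-scale change of variables
    if math.log(random.random()) < cand + ystar - cur - ylog:
        ylog, cur = ystar, cand
    draws.append(math.exp(ylog))

gamma_mean = sum(draws) / len(draws)
print(round(gamma_mean, 2))
```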

4.1.1 Size-Biased Sampling. To see how this procedure works, recall again the dental example. The parametric family of weight functions of the form w(x) = |x|^γ is used to specify size-biased sampling. Recall that γ = 0 indicates no publication bias. Extreme publication bias might be represented by γ = 2, as this would specify that a difference equal to 2 is four times as likely to be observed as a difference of 1, a difference of 3 is nine times as likely to be observed as a difference of 1, and so on.

Suppose that one believes a small amount of publication bias is present. A reasonable prior to specify for γ would be IG(3, 1), which corresponds to a prior mean and standard deviation of ½. The middle 95% of this distribution falls in the interval (.14, 1.62), allowing for a wide range of plausible amounts of publication bias. When the clinical informative prior is specified for stage III of the hierarchical


selection model, the posterior mean of γ is .34, suggesting that the data support less publication bias than that specified by the prior distribution (see Table 2, line 4). The data do support the presence of some publication bias, however, and estimates of μ and σ_α² shrink somewhat toward 0, as was noted in Section 3. The estimate of μ is .23, and the estimate of σ_α² is .030. Note that these estimates shrink less toward 0 than when γ was specified as 1.0 or 2.0, both of which indicate more publication bias than the estimate of γ in this case (.34). The evidence in favor of NaF, given by P(μ > 0|x) = .997, is still very strong.

To examine robustness of the model to the choice of the prior distribution, results are now investigated for the stage III Jeffreys prior distributions. The estimate of μ (.28) is somewhat larger, as is the estimate of σ_α² (.054). The posterior probability that μ is positive is .997. In addition, the estimate of γ is a little smaller (.29), suggesting less publication bias than when the clinical informative prior is specified.

Suppose one believed that a great deal of publication bias was present in the dental meta-analysis. A reasonable prior for γ would then be IG(18, 34), corresponding to a prior mean of 2 and a prior standard deviation of ½. Again, the data do not support the amount of publication bias specified by the prior. For the stage III clinical informative prior, the posterior mean of γ is 1.36. For the stage III Jeffreys prior distribution, the posterior mean is 1.33 (see Table 2, line 5). However, in both cases, this still suggests that quite a large amount of selection bias is present in the results. Estimates of the other parameters shrink accordingly, and more so than previously. The overall conclusion remains the same, though, as P(μ > 0|x) = .988 for the clinical informative prior and P(μ > 0|x) = .991 for the Jeffreys prior.

These results should be more satisfactory than those in Section 3, as γ is estimated rather than specified, and it is doubtful that one would ever know γ. But this example suggests that results are going to be very sensitive to the choice of the prior specification for γ. This makes sense, as there is little information in the data about the amount of publication bias. Thus, several priors and weight functions should be specified when performing this type of analysis, and the range of results given. Silliman (1998) explores the use of nonparametric classes of weight functions for modeling publication bias.

4.1.2 Truncated Sampling. To address the aforementioned issue, a different family of weight functions is now specified. Suppose that significant study effects (say, at the α = .05 level for a two-sided test of whether the standardized difference is 0) are always observed, but that nonsignificant study effects are seen with a constant probability e^{−γ}, where γ is nonnegative. The following family of weight functions is then specified as

    w(x|γ) = e^{−γ}  if |x/σ| ≤ 1.96,
             1       if |x/σ| > 1.96,


where x is the observed difference in average increment of D(M)FS for patients using SMFP minus that of patients using NaF, as before, and σ is the corresponding standard deviation. This type of sampling is termed "truncated sampling." Note that γ = 0 indicates no selection bias, and γ = ∞ specifies that only results that are significant at the .05 level are observed. Thus this family allows for a wide range of publication bias. The first prior specified for γ is IG(2.000001, .001), corresponding to a prior mean of .001 and a prior standard deviation of 1.0. This prior suggests that almost no selection bias is present. For the clinical informative prior, the posterior mean of γ is .001, which agrees exactly with the prior mean (see Table 2, line 6). In fact, the posterior means of μ and σα² are the same as when no publication bias is specified: .25 and .034. For the Jeffreys prior, the posterior mean of γ is also .001, and the estimates of μ and σα² are approximately the same as for this model with no publication bias: .32 and .065.
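As an illustrative sketch (the function name and example values below are mine, not from the article), the truncated-sampling weight function can be coded directly:

```python
import math

def truncated_weight(x, sigma, gamma):
    """Probability of observing a study under truncated sampling:
    significant standardized effects (|x/sigma| > 1.96) are always seen;
    nonsignificant ones are seen with probability exp(-gamma)."""
    return 1.0 if abs(x / sigma) > 1.96 else math.exp(-gamma)

print(truncated_weight(2.5, 1.0, 0.57))            # significant -> 1.0
print(round(truncated_weight(1.0, 1.0, 0.57), 2))  # nonsignificant -> 0.57
```

With gamma = 0 every study is observed, and as gamma grows only significant results survive, matching the range of bias the family is meant to cover.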

The second prior specified for the new weighted model is IG(2.01, 1.01), corresponding to a prior mean for γ of 1 and a prior standard deviation of 10. (Note that e^{−1} ≈ .37, indicating that the probability of observing a nonsignificant result is only slightly more than 1/3 the probability of observing a significant result.) On the e^{−γ} scale, the middle 95% of this distribution falls in the range (.02, .84), allowing for a large range of plausible publication bias. For the clinical informative prior, the estimate of γ is .57, suggesting that nonsignificant studies are almost two-thirds as likely to be observed as significant studies (e^{−.57} = .57). Hence the estimate of the grand mean μ (.23) is larger than in the size-biased sampling models (see Table 2, line 7). For the Jeffreys prior, the estimate of γ is somewhat smaller (.44). As this corresponds to less publication bias, the estimate of μ is larger (.29).
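Numbers like the (.02, .84) interval can be checked by simulating the implied prior on e^{−γ}. A rough standard-library sketch (my own; the seed and draw count are arbitrary):

```python
import math
import random

random.seed(1)
# If X ~ Gamma(shape = 2.01, scale = 1), then 1.01 / X ~ IG(2.01, 1.01).
gammas = [1.01 / random.gammavariate(2.01, 1.0) for _ in range(200_000)]
probs = sorted(math.exp(-g) for g in gammas)  # implied observation probability e^{-gamma}
lo = probs[int(0.025 * len(probs))]
hi = probs[int(0.975 * len(probs))]
mean_gamma = sum(gammas) / len(gammas)
print(round(lo, 2), round(hi, 2))  # middle 95% of e^{-gamma}, roughly (.02, .84)
print(round(mean_gamma, 2))        # prior mean of gamma, roughly 1
```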

4.2 Step Weight Functions

The family of weight functions discussed in the previous section can be thought of as a simple family of step functions. This section explores the use of step weight functions in more detail.

It seems reasonable that one would observe results not significant at the .05 level, results significant at the .05 level, and results significant at the .01 level with different probabilities. Hence the weight function is approximated here by the following step weight function:

    w(x|γ) = γ1 if x/σ ∈ A1 = (−∞, −2.58),
             γ2 if x/σ ∈ A2 = [−2.58, −1.96),
             γ3 if x/σ ∈ A3 = [−1.96, 1.96],
             γ4 if x/σ ∈ A4 = (1.96, 2.58],
             γ5 if x/σ ∈ A5 = (2.58, ∞),

where x is the observed difference in effect, σ is the corresponding standard error, γ = (γ1, γ2, γ3, γ4, γ5)′, and 0 ≤ γj ≤ 1 for j = 1, 2, 3, 4, 5. One may interpret w(x|γ) as the probability of observing the standardized study effect x/σ.
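The five-band step weight function above can be sketched in a few lines of code (my own illustrative naming and values, not the author's implementation):

```python
def step_weight(x, sigma, gamma):
    """Probability of observing a study with standardized effect x/sigma,
    as a five-band step function with cutoffs at the two-sided .05 and
    .01 normal critical values (1.96 and 2.58)."""
    z = x / sigma
    if z < -2.58:
        return gamma[0]   # gamma1, band A1
    elif z < -1.96:
        return gamma[1]   # gamma2, band A2
    elif z <= 1.96:
        return gamma[2]   # gamma3, band A3
    elif z <= 2.58:
        return gamma[3]   # gamma4, band A4
    return gamma[4]       # gamma5, band A5

# A symmetric, "cup-shaped" choice (gamma3 <= gamma2 <= gamma1, with
# gamma5 = gamma1 and gamma4 = gamma2); the values are illustrative only:
g = (1.0, 0.6, 0.3, 0.6, 1.0)
print(step_weight(0.5, 1.0, g))  # nonsignificant -> 0.3
print(step_weight(2.2, 1.0, g))  # significant at .05 -> 0.6
print(step_weight(3.0, 1.0, g))  # significant at .01 -> 1.0
```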


Denote the prior on γj by pj for j = 1, 2, 3, 4, 5. The full conditional distributions for the true study effects αi, the grand mean μ, and the between-study variability σα² remain as before. The weighted full conditional distributions for the γj are now

    γj | x, α, μ, σα², γ_{k≠j} ∝ pj(γj) γj^{Σi I_{Aj}(xi/σi)} / ∏_{i=1}^{I} c_{w,αi,σi²,γ},   j = 1, ..., 5,

where c_{w,αi,σi²,γ} = ∫ w(x|γ) f(x|αi, σi²) dx and f(x|αi, σi²) is the density function for a normal distribution with mean αi and variance σi².
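For a step weight function the normalizing constant c_{w,αi,σi²,γ} has a closed form: it is just a γ-weighted sum of normal band probabilities. A sketch (function name and example values are mine) using `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: Z.cdf is the distribution function Phi

def c_w(alpha, sigma, gammas):
    """c_{w,alpha,sigma^2,gamma} = integral of w(x|gamma) f(x|alpha, sigma^2) dx
    for the five-band step weight function (bands defined on x/sigma)."""
    cuts = [float('-inf'), -2.58, -1.96, 1.96, 2.58, float('inf')]
    r = alpha / sigma
    total = 0.0
    for g, lo, hi in zip(gammas, cuts[:-1], cuts[1:]):
        p_lo = 0.0 if lo == float('-inf') else Z.cdf(lo - r)
        p_hi = 1.0 if hi == float('inf') else Z.cdf(hi - r)
        total += g * (p_hi - p_lo)
    return total

# With every gamma equal to 1, nothing is suppressed and c = 1:
print(round(c_w(0.3, 0.2, (1, 1, 1, 1, 1)), 6))
```

When some γj < 1 the constant drops below 1, which is what inflates the weighted likelihood contributions of the observed studies.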

Note that if symmetry were desired, then one could specify γ1 = γ5 and γ2 = γ4. In the dental meta-analysis, because there is no reason a priori to believe that positive effects are more likely to be observed than negative effects, this assumption is made. It also seems reasonable that the larger the standardized study effect x/σ (i.e., the more "significant"), the more likely one is to observe it. To account for this, the following restrictions are added so that the resulting step weight function will be "cup-shaped," that is, the reverse of unimodal: γ3 ≤ γ2 ≤ γ1.

Assume that γ1, γ2, and γ3 are uniformly distributed on the unit cube, with the restriction that γ3 ≤ γ2 ≤ γ1. A Metropolis step is used to sample from the weighted full conditional distributions for the γj. Rather than sample candidate values from a normal distribution as was done with size-biased and truncated sampling, these values are now generated from the joint prior distribution, as this method was found to be more efficient. Candidate values are obtained as follows. Three uniform(0, 1) random variables are generated and then ordered from smallest to largest. The smallest is a sampled value of γ3, the middle is a sampled value of γ2, and the largest is a sampled value of γ1. The probability that the candidate value γ* = (γ1*, γ2*, γ3*) is accepted, given that the chain is at γ(n) = γ = (γ1, γ2, γ3), is

    β(γ, γ*) = min{ [∏_{j=1}^{5} (γj*)^{Σi I_{Aj}(xi/σi)} / ∏_{i=1}^{I} c_{w,αi,σi²,γ*}] / [∏_{j=1}^{5} γj^{Σi I_{Aj}(xi/σi)} / ∏_{i=1}^{I} c_{w,αi,σi²,γ}], 1 },

with γ5* = γ1* and γ4* = γ2* under the symmetry assumption.
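The candidate-generation step just described (order statistics of three uniforms) is easy to sketch; this is my own illustrative coding, not the author's:

```python
import random

def propose_gammas(rng=random):
    """Draw a candidate (gamma1, gamma2, gamma3) uniformly on the unit cube
    subject to gamma3 <= gamma2 <= gamma1, via ordered uniform(0,1) draws."""
    smallest, middle, largest = sorted(rng.random() for _ in range(3))
    return largest, middle, smallest  # (gamma1, gamma2, gamma3)

random.seed(0)
g1, g2, g3 = propose_gammas()
assert 0.0 <= g3 <= g2 <= g1 <= 1.0
```

Because the candidates come directly from the joint prior, the prior and proposal densities cancel in the Metropolis ratio, leaving only the weighted-likelihood ratio.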

Using the clinical informative prior, the posterior means of γ1, γ2, and γ3 are .75, .57, and .36 (see Table 2, line 8). The evidence that NaF is the better fluoride is very strong. In fact, P(μ > 0|x) = .9978. Results are similar for the Jeffreys prior.

5. MODELING UNOBSERVED STUDIES

As the discussion in Sections 3 and 4 suggests, when combining information in a meta-analysis to make more general inference, it is important to consider how studies are selected for inclusion in the meta-analysis and to model any bias that might occur in this procedure. But when studies are not missing at random, it is also important to consider these unobserved studies directly. In this way, one can investigate what sort of influence they might have on final

Journal of the American Statistical Association, September 1997

conclusions. Meta-analysis is one important application in which unobserved units (here, studies) are often not missing at random due to publication bias. Hence in a meta-analysis, it usually is advisable to model such unobserved study effects directly to assess sensitivity of conclusions.

One approach that has been taken in this area is to calculate the fail-safe sample size (FSSS) (Iyengar and Greenhouse 1988; Rosenthal 1979). For a meta-analysis based on published studies, the FSSS is the number of unpublished studies with nonsignificant results that would be needed to overturn the conclusion of the meta-analysis. Usually this number is calculated, and then one must judge (subjectively) whether it is probable that there are so many unreported studies. The literature does not provide much guidance on how to interpret an observed FSSS; however, Rosenthal (1984, p. 110) suggested that if there are I observed studies and the FSSS is larger than 5I + 10 (i.e., for every observed study there would have to be approximately 5 unobserved studies with nonsignificant results; the "+10" is to correct for small I), then the results should be considered robust.
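For concreteness, Rosenthal's (1979) original computation combines the studies' one-tailed z statistics: with k observed studies, the fail-safe number is (Σ zi)² / z²_{.05} − k, where z_{.05} = 1.645. A sketch with made-up z values (not the dental data):

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe sample size: how many unpublished zero-effect
    studies would push the combined one-tailed p-value above alpha."""
    k = len(z_scores)
    s = sum(z_scores)
    return (s * s) / (z_alpha ** 2) - k

z = [2.1, 1.7, 2.4, 1.9, 2.2]      # hypothetical z statistics, k = 5
fsss = fail_safe_n(z)
print(int(fsss))                    # about 34 hidden null studies
print(fsss > 5 * len(z) + 10)       # Rosenthal's 5k + 10 rule: not robust here
```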

A more general formulation of this problem is possible by incorporating data augmentation techniques (Tanner and Wong 1987) into the hierarchical selection model introduced in Section 3 to estimate a fixed number, N, of unobserved studies. This approach simultaneously accounts for heterogeneous study effects and bias involved in the collection of studies to be included in the meta-analysis, while also directly modeling unobserved studies to assess sensitivity of final conclusions. Using this approach, unobserved studies are no longer restricted to be nonsignificant at some chosen level α. They are simply studies (either significant or not) generated and then not selected according to the selection mechanism defined by the weight function w(·).

Two methods are used to examine sensitivity of inference to N. The first computes a credible interval for μ, the average study effect, as a function of N. In the spirit of the FSSS method, the first N for which the interval crosses 0 is taken to be the FSSS. This FSSS determines how many unobserved studies would have to exist before one began to doubt the "significance" of the results. As before, one must decide subjectively whether it is plausible that so many unreported, nonsignificant studies exist. The second method used to examine sensitivity of inference to N examines posterior distributions of interest (i.e., those of μ and σα²) as a function of N.

In Section 5.1, a Gibbs/Metropolis/data augmentation approach is used to estimate the hierarchical selection model with unobserved studies. The weight function is assumed unknown and estimated. Paul (1995) described the case where the weight function is known.

5.1 Unknown Step Weight Function

Assume that data are generated according to the hierarchical selection model described in Section 3 and that the weight function w(·|γ) that describes the probability of observing a study effect is unknown and needs to be estimated. More specifically, model the bias involved in observing a


study effect by the following step weight function:

    w(x|γ) = γ1 if |x/σ| > 2.58,
             γ2 if 1.96 < |x/σ| ≤ 2.58,
             γ3 if |x/σ| ≤ 1.96,

where x is the observed difference in effect, σ is the corresponding standard error, γ = (γ1, γ2, γ3)′, and 0 ≤ γ3 ≤ γ2 ≤ γ1 ≤ 1.

Following the hierarchical selection model, study effects Y = (Y1, ..., Y_{I+N}) are generated according to the normal densities f(·|αi, σi²) and observed with probability w(Yi|γ). Denote the observed study effects by X = (X1, ..., X_I) and the unobserved study effects by U = (U1, ..., U_N), where Yi ≡ Xi for i = 1, ..., I and Yi ≡ U_{i−I} for i = I + 1, ..., I + N. Introduce the latent indicator variable δi, which equals 1 if Yi is observed and 0 if it is not observed, for i = 1, ..., I + N. Note that δi given Yi = yi is a Bernoulli random variable, with parameter p ≡ w(yi|γ). Conditioning on δ = (δ1, ..., δ_{I+N}), the full conditional densities for the unobserved studies Ui and γ needed for the Gibbs/Metropolis/data augmentation algorithm are

    Ui | x, u_{j≠i}, α, μ, σα², γ, δ ∝ [1 − w(ui|γ)] N(α_{I+i}, σ²_{I+i}),   i = 1, ..., N,

and

    γj | y, α, μ, σα², γ_{k≠j}, δ ∝ pj(γj) γj^{Σi I_{Aj}(xi/σi)} (1 − γj)^{Σi I_{Aj}(ui/σi)} / [∏_{l=1}^{I} c_{w,αl,σl²,γ} ∏_{l=I+1}^{I+N} (1 − c_{w,αl,σl²,γ})],   j = 1, 2, 3,

where I_{Aj}(·) represents the usual indicator function, pj denotes the prior on γj for j = 1, 2, 3, and A1 = (−∞, −2.58) ∪ (2.58, ∞), A2 = [−2.58, −1.96) ∪ (1.96, 2.58], A3 = [−1.96, 1.96].

As in Section 4.2, γ1, γ2, and γ3 are assumed to have a uniform prior distribution on the unit cube, with the restriction that γ3 ≤ γ2 ≤ γ1, and a Metropolis step is used to sample from the full conditional densities. Here a candidate value γ* = (γ1*, γ2*, γ3*) is accepted with probability

    β(γ, γ*) = min{ [∏_{j=1}^{3} (γj*)^{Σi I_{Aj}(xi/σi)} (1 − γj*)^{Σi I_{Aj}(ui/σi)} / (∏_{l=1}^{I} c_{w,αl,σl²,γ*} ∏_{l=I+1}^{I+N} (1 − c_{w,αl,σl²,γ*}))] / [∏_{j=1}^{3} γj^{Σi I_{Aj}(xi/σi)} (1 − γj)^{Σi I_{Aj}(ui/σi)} / (∏_{l=1}^{I} c_{w,αl,σl²,γ} ∏_{l=I+1}^{I+N} (1 − c_{w,αl,σl²,γ}))], 1 }.

The probability integral transformation is used to sample unobserved studies Ui, as explained in the Appendix.

Assuming that the prior distributions specified in the third stage of the hierarchical selection model are proper and given by (1), the remaining full conditional densities needed for the Gibbs/Metropolis/data augmentation algorithm are the full conditionals for the true study effects αi, i = 1, ..., I + N, which are as before,

    μ | y, α, σα², γ, δ ~ N( ᾱ(I + N)b / (σα² + (I + N)b), σα²b / (σα² + (I + N)b) ),

and

    σα² | y, α, μ, γ, δ ~ IG( (I + N)/2 + c, (1/2) Σ_{i=1}^{I+N} (αi − μ)² + d ),   (7)

where α = (α1, ..., α_{I+N})′ and ᾱ = (Σ_{i=1}^{I+N} αi)/(I + N). Note that sampling from each of these densities is straightforward.

Figure 1. 95% Credible Interval for μ as a Function of N. The upper and lower dashed lines are the 97.5th and 2.5th quantiles of the estimated posterior distribution of μ for the hierarchical selection model where unobserved studies are modeled directly. The step weight function in this model is estimated.

Table 3. Evidence for Superiority of NaF as a Function of the Number of Unobserved Studies

    N               1       25      50      75
    P(μ > 0|y, N)   .9966   .9963   .9943   .9942

If the Jeffreys prior distributions are specified in the third stage of the hierarchical selection model [i.e., a flat prior distribution for μ and the Jeffreys prior distribution p1 for σα² given in (2)], the remaining full conditional densities needed for the Gibbs/Metropolis/data augmentation algorithm are as follows.



Figure 2. Posterior Distribution of μ as a Function of N. This is for the hierarchical selection model where model parameters, the step weight function, and unobserved studies are all estimated. (a) The case of 1 unobserved study; (b) the case of 25 unobserved studies; (c) the case of 50 unobserved studies; (d) the case of 75 unobserved studies.

For the Jeffreys prior specification, the full conditional densities for the true study effects αi, i = 1, ..., I + N, are as before,

    μ | y, α, σα², γ, δ ~ N( ᾱ, σα²/(I + N) ),

and the full conditional density for σα² is given in (8). A Metropolis step is used to sample from the full conditional density for σα².

The study variances σi², i = 1, ..., I + N, are assumed known. For i = 1, ..., I, the observed study variances are used. For the remaining unobserved studies, each study variance is taken to be the geometric mean of the observed study variances. More specifically, for i = I + 1, ..., I + N, σi² = (σ1²σ2²···σI²)^{1/I}. The geometric mean of the 12 observed studies in the Johnson meta-analysis used as an example throughout this article is .0855.
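The geometric-mean imputation for the unobserved-study variances is straightforward to compute; a sketch with hypothetical variances (not the Johnson data):

```python
import math

def geometric_mean(variances):
    """Geometric mean (v1 * v2 * ... * vI)^(1/I), computed on the log
    scale for numerical stability."""
    return math.exp(sum(math.log(v) for v in variances) / len(variances))

v = [0.04, 0.09, 0.16]              # hypothetical observed study variances
print(round(geometric_mean(v), 4))  # -> 0.0832
```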

The results for the Johnson meta-analysis using the Jeffreys prior suggest that conclusions about the superiority of NaF versus SMFP are robust even if some studies have not been observed. Recall that Rosenthal (1984) suggested that if the FSSS is larger than 5I + 10, where I is the total number of observed studies, then results should be considered robust. For the Johnson meta-analysis, this suggested cutoff is 70. Figure 1 plots the 95% credible interval for μ as a function of N, the number of unobserved studies. For the range of N considered (i.e., 1-75), the credible interval is entirely above 0, so the FSSS is in fact greater than 70. In addition, the estimate of the probability that NaF is more effective than SMFP remains high regardless of N, as Table 3 shows.

Figures 2 and 3 plot the posterior distributions of μ and σα² for N = 1, 25, 50, 75. As N increases, the posterior distributions become more peaked. This makes sense, in that the more "data," the more peaked the likelihood and hence the posterior. The posterior distributions of μ and σα² also shift toward 0 as N increases, which is to be expected. As more unobserved studies that tend to be nonsignificant (i.e., close to 0) are added, the estimate of μ, the overall study effect, decreases. In addition, as there is not much information about the unobserved study effects, they tend to be indistinguishable. Thus as more of these similar unobserved study effects are added, the estimate of σα², the between-study variance, decreases.

Figure 3. Posterior Distribution of σα² as a Function of N. These are from the hierarchical selection model where unobserved studies are modeled directly and the step weight function is estimated. (a) The case of 1 unobserved study; (b) the case of 25 unobserved studies; (c) the case of 50 unobserved studies; (d) the case of 75 unobserved studies.


6. CONCLUSIONS

This article has introduced hierarchical selection models and illustrated how they can be used in meta-analysis. Hierarchical selection models simultaneously account for heterogeneity of study effects and selection bias involved in the collection of those study effects, two problems common in meta-analysis. They are attractive models to use in that they are very general and their implementation is straightforward with the advent of MCMC methods.

Hierarchical selection models are introduced first for known weight functions. As results are sensitive to the choice of weight function, this approach is then extended to allow for estimation of the weight function, a more robust procedure. Estimation of both parametric families of weight functions and the more general step weight function has been presented.

All of the weight functions specify that smaller study effects are observed with too low a frequency, whereas larger study effects are observed with too high a frequency. Under such publication bias, the estimate of the overall effect μ becomes smaller, as does the estimate of the between-study variance σα². Of the weight functions considered herein, size-biased sampling seems to represent a more severe type of publication bias than either truncated sampling or sampling represented by the step weight function. This is supported by the fact that the estimates of the overall effect μ and the between-study variance σα² are consistently smaller for size-biased sampling. In turn, the estimates under truncated sampling and the step weight function are smaller than those under the hierarchical model with no publication bias. However, for the example presented in this article, substantive results remain the same regardless of the weight function chosen.

Results are also somewhat sensitive to the choice of the stage III prior distribution. The clinical informative prior specifies a priori that there is no difference between the fluorides, whereas the Jeffreys prior tends to let the data drive conclusions. As the observed data suggest that NaF is the better fluoride, results under the Jeffreys prior tend to favor the superiority of NaF more than results under the clinical informative prior. Again, though, substantive conclusions remain the same for the chosen example.

The hierarchical selection model provides an approach for dealing with bias involved in collecting studies for a meta-analysis. It is also desirable to model unobserved studies directly. This has been done here using data augmentation to estimate a fixed number, N, of unobserved studies in the context of the hierarchical selection model. Sensitivity of conclusions was then considered as a function of N. This extends the hierarchical selection model, so that now one is able to simultaneously model heterogeneous study effects, account for any publication bias present, and investigate sensitivity of conclusions to unobserved study effects directly.

Because specification of the weight function is a concern in both the hierarchical selection model and the hierarchical selection model that incorporates unobserved studies, this article has presented a method for estimating the weight function. For an even more robust approach, albeit in the context of a nonhierarchical model, Silliman (1998) describes how to use nonparametric classes of weight functions.

One interesting extension to the methods presented herein would be to incorporate covariates. These could be incorporated directly into the hierarchical model, or one could imagine modeling the parameters in the weight function as a linear combination of covariates. Another interesting extension would be to allow for multivariate response data from each study in the meta-analysis. For example, if there are three treatments A, B, and C, one might wish to simultaneously consider the differences between treatments A and B, A and C, and B and C. Finally, although the methods presented here are applied specifically to the area of meta-analysis, they are more general and could be applied to other areas as well.

APPENDIX: PROBABILITY INTEGRAL TRANSFORMATION TO SAMPLE UNOBSERVED STUDY Ui

To sample an unobserved study Ui, the probability integral transformation is used. Define

    a1 = (1 − γ1) Φ(−2.58 − αi/σi),
    a2 = (1 − γ2) [Φ(−1.96 − αi/σi) − Φ(−2.58 − αi/σi)],
    a3 = (1 − γ3) [Φ(1.96 − αi/σi) − Φ(−1.96 − αi/σi)],
    a4 = (1 − γ2) [Φ(2.58 − αi/σi) − Φ(1.96 − αi/σi)],
    a5 = (1 − γ1) [1 − Φ(2.58 − αi/σi)],

and A = a1 + a2 + a3 + a4 + a5, where Φ(·) represents the distribution function of the standard normal distribution. The full conditional density for Ui is proportional to [1 − w(ui|γ)] φ((ui − αi)/σi); normalized, it is

    f_{1−w}(ui | y_{−i}, α, μ, σα², γ, δ) = A⁻¹ (1 − γ1) φ((ui − αi)/σi)  if |ui/σi| > 2.58,
                                            A⁻¹ (1 − γ2) φ((ui − αi)/σi)  if 1.96 < |ui/σi| ≤ 2.58,
                                            A⁻¹ (1 − γ3) φ((ui − αi)/σi)  if |ui/σi| ≤ 1.96,

where φ(·) represents the probability density function of the standard normal distribution. The distribution function corresponding to the density f_{1−w}(ui | y_{−i}, α, μ, σα², γ, δ) is


    F(ui) = A⁻¹ (1 − γ1) Φ((ui − αi)/σi)
                if ui/σi < −2.58,
            A⁻¹ [a1 + (1 − γ2)(Φ((ui − αi)/σi) − Φ(−2.58 − αi/σi))]
                if −2.58 ≤ ui/σi < −1.96,
            A⁻¹ [a1 + a2 + (1 − γ3)(Φ((ui − αi)/σi) − Φ(−1.96 − αi/σi))]
                if −1.96 ≤ ui/σi ≤ 1.96,
            A⁻¹ [a1 + a2 + a3 + (1 − γ2)(Φ((ui − αi)/σi) − Φ(1.96 − αi/σi))]
                if 1.96 < ui/σi ≤ 2.58,
            A⁻¹ [a1 + a2 + a3 + a4 + (1 − γ1)(Φ((ui − αi)/σi) − Φ(2.58 − αi/σi))]
                if ui/σi > 2.58,

which has a uniform(0, 1) distribution. To sample an unobserved study Ui = ui, sample Yi = yi from a uniform(0, 1) distribution. It follows that

    ui = σi Φ⁻¹( A yi / (1 − γ1) ) + αi
             if 0 ≤ yi ≤ A⁻¹ a1,
         σi Φ⁻¹( (A yi − a1)/(1 − γ2) + Φ(−2.58 − αi/σi) ) + αi
             if A⁻¹ a1 < yi ≤ A⁻¹(a1 + a2),
         σi Φ⁻¹( (A yi − a1 − a2)/(1 − γ3) + Φ(−1.96 − αi/σi) ) + αi
             if A⁻¹(a1 + a2) < yi < A⁻¹(a1 + a2 + a3),
         σi Φ⁻¹( (A yi − a1 − a2 − a3)/(1 − γ2) + Φ(1.96 − αi/σi) ) + αi
             if A⁻¹(a1 + a2 + a3) ≤ yi < A⁻¹(a1 + a2 + a3 + a4),
         σi Φ⁻¹( (A yi − a1 − a2 − a3 − a4)/(1 − γ1) + Φ(2.58 − αi/σi) ) + αi
             if A⁻¹(a1 + a2 + a3 + a4) ≤ yi ≤ 1

is a sample from the full conditional density f_{1−w}(ui | y_{−i}, α, μ, σα², γ, δ).
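The Appendix's inverse-CDF sampler can be sketched in code. This is my own illustrative implementation (names and example parameters invented, and it assumes each γj < 1), using `statistics.NormalDist` for Φ and Φ⁻¹:

```python
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal: Z.cdf is Phi, Z.inv_cdf is Phi^{-1}

def sample_unobserved(alpha, sigma, g1, g2, g3, rng=random):
    """Inverse-CDF draw from the density proportional to
    [1 - w(u|gamma)] * phi((u - alpha)/sigma), where w is the symmetric
    step weight function with cutoffs 1.96 and 2.58 on |u/sigma|.
    Assumes g1, g2, g3 < 1 so every band has positive mass."""
    r = alpha / sigma
    cuts = [-2.58, -1.96, 1.96, 2.58]
    weights = [1 - g1, 1 - g2, 1 - g3, 1 - g2, 1 - g1]  # 1 - w on each band
    # Cumulative normal probability at each band's lower edge, and band masses:
    edges = [0.0] + [Z.cdf(c - r) for c in cuts] + [1.0]
    a = [w * (edges[k + 1] - edges[k]) for k, w in enumerate(weights)]
    y = rng.random() * sum(a)  # uniform on (0, A): pick a band, invert within it
    acc = 0.0
    for k in range(5):
        if y <= acc + a[k] or k == 4:
            return sigma * Z.inv_cdf((y - acc) / weights[k] + edges[k]) + alpha
        acc += a[k]

random.seed(2)
u = sample_unobserved(alpha=0.2, sigma=0.3, g1=0.9, g2=0.6, g3=0.3)
print(round(u, 2))  # one sampled unobserved study effect
```

Within each significance band the density is a truncated normal scaled by (1 − γband), so locating the band by cumulative mass and inverting Φ inside it reproduces the piecewise inversion formula above.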

[Received April 1995. Revised November 1996.]

REFERENCES

Bayarri, M. J., and DeGroot, M. H. (1993), "The Analysis of Published Significant Results," in Rassegna di Metodi Statistici ed Applicazioni, ed. W. Racugno, Bologna: Pitagora, pp. 19-41.

Berger, J. O. (1980), Statistical Decision Theory and Bayesian Analysis, New York: Springer-Verlag.

Box, G. E. P., and Tiao, G. C. (1973), Bayesian Inference in Statistical Analysis, New York: Wiley.

Carlin, B. (1992), Comment on "Hierarchical Models for Combining Information and for Meta-Analyses," by C. N. Morris and S. L. Normand, in Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford, U.K.: Oxford University Press, pp. 321-344.

Cleary, R. J. (1993), "Models for Selection Bias in Meta-Analysis," unpublished Ph.D. dissertation, Cornell University, Biometrics Unit.

Cooper, H., and Hedges, L. V. (Eds.) (1994), The Handbook of Research Synthesis, New York: Russell Sage Foundation.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), "Maximum Likelihood From Incomplete Data via the EM Algorithm" (with discussion), Journal of the Royal Statistical Society, Ser. B, 39, 1-38.

DerSimonian, R., and Laird, N. (1986), "Meta-Analysis in Clinical Trials," Controlled Clinical Trials, 7, 177-188.

DuMouchel, W. (1990), "Bayesian Meta-Analysis," in Statistical Methodology in the Pharmaceutical Sciences, ed. D. A. Berry, New York: Marcel Dekker, pp. 509-529.

Fisher, R. A. (1934), "The Effect of Methods of Ascertainment Upon the Estimation of Frequencies," Annals of Eugenics, 6, 13-25.

Gelfand, A. E., and Smith, A. F. M. (1990), "Sampling-Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398-409.

Gilks, W. R., and Wild, P. (1992), "Adaptive Rejection Sampling for Gibbs Sampling," Applied Statistics, 41, 337-348.

Glass, G. V. (1976), "Primary, Secondary, and Meta-Analysis of Research," Educational Researcher, 5, 3-8.

Greenwald, A. (1975), "Consequences of Prejudice Against the Null Hypothesis," Psychological Bulletin, 82, 1-20.

Hastings, W. K. (1970), "Monte Carlo Sampling Methods Using Markov Chains and Their Applications," Biometrika, 57, 97-109.

Hedges, L. V. (1983), "A Random Effects Model for Effect Sizes," Psychological Bulletin, 93, 388-395.

--- (1992), "Modeling Publication Selection Effects in Meta-Analysis," Statistical Science, 7, 246-255.

Hedges, L. V., and Olkin, I. (1985), Statistical Methods for Meta-Analysis, Orlando: Academic Press.

Iyengar, S., and Greenhouse, J. B. (1988), "Selection Models and the File Drawer Problem" (with discussion), Statistical Science, 3, 109-135.

Jeffreys, H. (1961), Theory of Probability (3rd ed.), London: Oxford University Press.

Johnson, M. F. (1993), "Comparative Efficacy of NaF and SMFP Dentifrices in Caries Prevention: A Meta-Analytic Overview," Journal of the European Organization for Caries Research (ORCA), 27, 328-336.

Kass, R. E., and Wasserman, L. (1994), "Formal Rules for Selecting Prior Distributions: A Review and Annotated Bibliography," Technical Report 583, Carnegie Mellon University, Dept. of Statistics.

Larose, D. T., and Dey, D. K. (1996), "Weighted Distributions Viewed in the Context of Model Selection: A Bayesian Perspective," Test, 5, 227-246.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), "Equations of State Calculations by Fast Computing Machines," Journal of Chemical Physics, 21, 1087-1091.

Morris, C. N., and Normand, S. L. (1992), "Hierarchical Models for Combining Information and for Meta-Analyses," in Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford, U.K.: Oxford University Press, pp. 321-344.

Paul, N. L. (1995), "Issues in Combining Information: Hierarchical Models, Selection Models, and Unobserved Data," unpublished Ph.D. dissertation, Carnegie Mellon University, Dept. of Statistics.

Rao, C. R. (1985), "Weighted Distributions Arising out of Methods of Ascertainment: What Population Does a Sample Represent?," in A Celebration of Statistics: The ISI Centenary Volume, eds. A. C. Atkinson and S. E. Fienberg, New York: Springer-Verlag, pp. 543-569.

Rosenthal, R. (1979), "The 'File Drawer Problem' and Tolerance for Null Results," Psychological Bulletin, 86, 638-641.

--- (1984), Meta-Analytic Procedures for Social Research, Beverly Hills, CA: Sage Publications.

Silliman, N. P. (in press), "Nonparametric Classes of Weight Functions to Model Publication Bias," submitted to Biometrika.

Tanner, M. A., and Wong, W. H. (1987), "The Calculation of Posterior Distributions by Data Augmentation," Journal of the American Statistical Association, 82, 528-550.

Tierney, L. (1991), "Markov Chains for Exploring Posterior Distributions," Technical Report 560, University of Minnesota, School of Statistics.

West, M. (1994), "Discovery Sampling and Selection Models," in Statistical Decision Theory and Related Topics V, eds. S. S. Gupta and J. O. Berger, New York: Springer-Verlag, pp. 221-235.
