
CHAPTER 16

VOTE COUNTING PROCEDURES

We now describe the method of vote counting, which is used when we have scanty or incomplete information from the studies to be combined for statistical meta-analysis. This chapter, which contains standard material on this topic, is mostly based on Hedges and Olkin (1985) and Cooper and Hedges (1994). Some new results are also added at the end. The data available to a meta-analyst from primary research sources generally fall into three broad categories: (i) complete information (e.g., raw data, summary statistics) that can be used to calculate relevant effect size estimates such as means, proportions, correlations, and test statistic values; (ii) results of hypothesis tests for population effect sizes, that is, reports of statistically significant or nonsignificant relations; and (iii) information about the direction of relevant outcomes (i.e., the conclusions of significance tests) without their actual values (i.e., without the actual values of the test statistics).

Vote counting procedures are useful for the second and third types of data, that is, when complete information about the results of the primary studies is not available in the sense that effect size estimates cannot be calculated. In such situations the information from a primary source is often in the form of a report of the decision obtained from a significance test (i.e., significant positive relation or nonsignificant positive relation) or in the form of the direction (positive or negative) of the effect without regard


to its statistical significance. In other words, all that is known is whether a test statistic exceeds a certain critical value at a given significance level (such as $\alpha^* = 0.05$) or whether an estimated effect size is positive or negative, which amounts to the observation that the test statistic exceeds the special critical value at significance level $\alpha^* = 0.5$. Actual values of the test statistics are not available.

To fix ideas, recall that often a meta-analyst is interested in determining whether a relation exists between an independent variable and a dependent variable for each study, that is, whether the effect size is zero for each study. Let $T_1, \ldots, T_k$ be independent estimates from $k$ studies of the corresponding population effect sizes $\theta_1, \ldots, \theta_k$ (i.e., differences of two means, differences of two proportions, differences of two correlations, or $z$ values). Under the assumption that the population effect sizes are equal, that is, $\theta_1 = \cdots = \theta_k = \theta$, the appropriate null and alternative hypotheses are $H_0: \theta = 0$ (no relation) against $H_1: \theta > 0$ (relation exists). The test rejects $H_0$ if an estimate $T$ of the common effect size $\theta$, when standardized, exceeds the one-sided critical value $\varphi_\alpha$. Typically, in large samples, one invokes the large-sample approximation of the distribution of $T$, resulting in the normal distribution of $T$, and we can then use $\varphi_\alpha = z_\alpha$, the cutoff point from a standard normal distribution. On the other hand, if a $100(1-\alpha)\%$ level confidence interval for $\theta$ is desired, it is usually provided by
$$ T - \varphi_{\alpha/2}\,\hat{\sigma}(T) \;\le\; \theta \;\le\; T + \varphi_{\alpha/2}\,\hat{\sigma}(T), $$
where $\hat{\sigma}(T)$ is the (estimated, if necessary) standard error of $T$. Quite generally, the standard error $\sigma(\theta)$ of $T$ will be a function of $\theta$ and can be estimated by $\hat{\sigma}(T)$, and a normal approximation can be used in large samples. We refer to Chapter 4 for details.

When the individual estimates $T_1, \ldots, T_k$ as well as their (estimated) standard errors $\hat{\sigma}(T_1), \ldots, \hat{\sigma}(T_k)$ are available, the solutions to these testing and confidence interval problems are trivial (as discussed in previous chapters). However, the essential feature of a vote counting procedure is that the values of $T_1, \ldots, T_k$ are not observed, and hence none of the estimated standard errors of the $T_i$'s are available. What is known to us is not the exact values of the $T_i$'s, but just the number of them which are positive, or how many of them exceed the one-sided critical value $\varphi_{\alpha^*}$. The question then arises whether we can test $H_0: \theta = 0$ or estimate the common effect size $\theta$ based on just this very incomplete information.

The sign test, which is the oldest of all nonparametric tests, can be used to test the hypothesis that the effect sizes from a collection of k independent studies are all zero when only the signs of estimated effect sizes from the primary sources are known. If the population effect sizes are all zero, the probability of getting a positive result for the estimated effect size is 0.5. If, on the other hand, the treatment has an effect, the probability of getting a positive result for the estimated effect size is greater than 0.5. Hence, the appropriate null and alternative hypotheses can be described as

$$ H_0: \pi = 0.5 \quad \text{versus} \quad H_1: \pi > 0.5, \tag{16.1} $$

where $\pi$ is the probability of a positive effect size in the population. The test can be carried out in the usual fashion based on a binomial distribution: $H_0$ is rejected when $X$, the number of studies out of a total of $k$ studies with positive estimated effect sizes, is so large that the corresponding binomial tail probability falls below the desired level of significance.



Example 16.1. Suppose that a meta-analyst finds exactly 10 positive results in 15 independent studies. The estimate of $\pi$ is $p = 10/15 = 0.67$, and the corresponding tail area from the binomial table is 0.1509. Thus, we would fail to reject $H_0$ at the 0.05 overall significance level or even at the 0.10 overall significance level. On the other hand, if exactly 12 of the 15 studies had positive results, the tail area would become 0.0176, and we would reject $H_0$ at the 0.05 overall level of significance.
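As an illustrative aside (not part of the original text), these binomial tail areas are easy to verify numerically; the following minimal Python sketch, assuming scipy is available, reproduces the values 0.1509 and 0.0176.

```python
from scipy.stats import binom

# Sign test for H0: pi = 0.5 vs H1: pi > 0.5, based on X positive results out of k studies.
# The p-value is the upper-tail binomial probability P(X >= x | k, 0.5).

def sign_test_pvalue(x, k, pi0=0.5):
    """Upper-tail p-value of the sign test: P(X >= x) under Binomial(k, pi0)."""
    return binom.sf(x - 1, k, pi0)

print(sign_test_pvalue(10, 15))  # 0.1509 -> do not reject H0 at the 0.05 or 0.10 level
print(sign_test_pvalue(12, 15))  # 0.0176 -> reject H0 at the 0.05 level
```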

The main criticism against the sign test is that it does not take into account the sample sizes of the different studies, which are likely to be unequal, and also that it does not provide an estimate of the underlying common effect size $\theta$ or a confidence interval for it. Under the simplifying assumption that each study in a collection of $k$ independent studies has an identical sample size $n$, we now describe a procedure to obtain a point estimate as well as a confidence interval for the common effect size $\theta$ based on a knowledge of the number of positive results. If a study involves an experimental (E) as well as a control (C) group, we assume that the sample sizes for the two groups are the same, that is, $n_i^E = n_i^C = n$ for all $k$ studies. In case the $k$ studies have different sample sizes, we may use an average value, namely,

$$ \tilde{n} \;=\; \left(\frac{1}{k}\sum_{i=1}^{k}\sqrt{n_i}\,\right)^{2}, \tag{16.2} $$
where $n_i$ denotes the sample size of the $i$th study.

Based on a knowledge of the signs of the $T_i$'s, an unbiased estimate of $\pi$ is given by $p = X/k$, where $X$ is the number of positive $T_i$'s. It is also well known that a $100(1-\alpha)\%$ level approximate confidence interval for $\pi$ (based on the normal approximation) is given by
$$ p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{k}} \;\le\; \pi \;\le\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{k}}, \tag{16.3} $$
where $z_{\alpha/2}$ is the two-sided critical value of the standard normal distribution. A second method uses the fact that
$$ \frac{k\,(p - \pi)^2}{\pi(1-\pi)} $$
has an approximate chi-square distribution with 1 df, which leads to the two-sided interval
$$ [\pi_L, \pi_U] \;=\; \left[\frac{(2p + b) - \sqrt{b^2 + 4bp(1-p)}}{2(1+b)},\;\; \frac{(2p + b) + \sqrt{b^2 + 4bp(1-p)}}{2(1+b)}\right], \tag{16.4} $$
where $b = \chi^2_{1,\alpha}/k$ and $\chi^2_{1,\alpha}$ is the upper $100\alpha\%$ point of the chi-square distribution with 1 df.
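As a computational aside (an editorial addition, assuming Python with scipy), Eqs. (16.3) and (16.4) are straightforward to evaluate; with $p = 11/19$ and $k = 19$ the sketch below reproduces the intervals $[0.357, 0.801]$ and $[0.363, 0.769]$ used in Example 16.2 below.

```python
import numpy as np
from scipy.stats import norm, chi2

def prop_ci_normal(p, k, alpha=0.05):
    """Eq. (16.3): normal-approximation confidence interval for pi."""
    z = norm.ppf(1 - alpha / 2)
    half = z * np.sqrt(p * (1 - p) / k)
    return p - half, p + half

def prop_ci_chisq(p, k, alpha=0.05):
    """Eq. (16.4): interval obtained by inverting the statistic k(p - pi)^2 / (pi(1 - pi))."""
    b = chi2.ppf(1 - alpha, df=1) / k
    root = np.sqrt(b ** 2 + 4 * b * p * (1 - p))
    return (2 * p + b - root) / (2 * (1 + b)), (2 * p + b + root) / (2 * (1 + b))

p, k = 11 / 19, 19                  # Example 16.2: 11 positive results in 19 studies
print(prop_ci_normal(p, k))         # approx. (0.357, 0.801)
print(prop_ci_chisq(p, k))          # approx. (0.363, 0.769)
```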


Once a two-sided confidence interval $[\pi_L, \pi_U]$ has been obtained for $\pi$, a two-sided confidence interval for $\theta$ can be constructed by using the relation
$$ \pi \;=\; \Pr\!\left[\frac{T}{\sigma(\theta)} > \varphi_\alpha\right] \;=\; 1 - \Phi\!\left(\varphi_\alpha - \frac{\theta}{\sigma(\theta)}\right), $$
where $\Phi(\cdot)$ is the standard normal cdf. Solving the above equation yields
$$ \theta \;=\; \sigma(\theta)\left[\varphi_\alpha - \Phi^{-1}(1 - \pi)\right], \tag{16.5} $$
which provides a relation between the effect size $\theta$ and the population proportion $\pi$ of a positive effect size. A point estimate of $\theta$ is then obtained by replacing $\pi$ by $p = X/k$ in the above equation and solving for $\theta$. To obtain a two-sided confidence interval for $\theta$, we substitute $\pi_L$ and $\pi_U$ for $\pi$ and solve for the two bounds of $\theta$.

Example 16.2. Let us consider the case when an effect size is measured by the standardized mean difference given by

$$ \theta_i \;=\; \frac{\mu_i^E - \mu_i^C}{\sigma_i}, \qquad i = 1, \ldots, k, $$
where $\mu_i^E$ is the population mean for the experimental group in the $i$th study, $\mu_i^C$ is the population mean for the control group in the $i$th study, and $\sigma_i$ is the population standard deviation in the $i$th study, which is assumed to be the same for the experimental and the control groups. The corresponding estimates $T_i$ are given by (Hedges's $g$)
$$ T_i \;=\; \frac{\bar{Y}_i^E - \bar{Y}_i^C}{S_i}, \qquad i = 1, \ldots, k, \tag{16.6} $$

where $\bar{Y}_i^E$ is the sample mean for the experimental group in the $i$th study, $\bar{Y}_i^C$ is the sample mean for the control group in the $i$th study, and $S_i$ is the pooled within-group sample standard deviation in the $i$th study. In large samples, the approximate variance $\sigma_i^2(\theta_i)$ of $T_i$ is given by
$$ \sigma_i^2(\theta_i) \;\approx\; \frac{2}{n} + \frac{\theta_i^2}{4n}, $$
where $n$ denotes the common (per group) sample size for all the studies. Equation (16.5) in this case then reduces to
$$ \theta \;=\; \sqrt{\frac{2}{n} + \frac{\theta^2}{4n}}\,\Bigl[\varphi_\alpha - \Phi^{-1}(1 - \pi)\Bigr], \tag{16.7} $$
which is an implicit equation in $\theta$ and can be solved numerically.
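As a computational aside (an editorial sketch, not part of the original text), Eq. (16.7) can be solved by elementary root finding once $\pi$, $n$, and $\varphi_\alpha$ are specified. The example call uses the values of the teacher expectancy data discussed next, and the result (about 0.031) is close to, though because of rounding not identical with, the estimate 0.032 quoted there.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def theta_from_pi(pi, n, phi_alpha=0.0):
    """Solve Eq. (16.7): theta = sqrt(2/n + theta^2/(4n)) * (phi_alpha - Phi^{-1}(1 - pi))."""
    c = phi_alpha - norm.ppf(1 - pi)

    def f(theta):
        sigma = np.sqrt(2.0 / n + theta ** 2 / (4.0 * n))
        return theta - sigma * c   # a root of f is a solution of (16.7)

    return brentq(f, -5.0, 5.0)

# Direction-only counting (phi_alpha = 0) with p = 11/19 positive results and n = 84,
# as in the teacher expectancy example below; confidence bounds for theta follow by
# plugging the confidence limits for pi in place of p.
print(theta_from_pi(11 / 19, n=84, phi_alpha=0.0))
```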

Let us consider a real data example from Raudenbush and Bryk (1985). The data are given in Table 16.1.


Table 16.1 Studies of the effects of teacher expectancy on pupil IQ

Study   n_E    n_C    ñ        d
1        77    339    208.0     0.03
2        60    198    129.0     0.12
3        72     72     72.0    -0.14
4        11     22     16.5     1.18
5        11     22     16.5     0.26
6       129    348    238.5    -0.06
7       110    636    373.0    -0.02
8        26     99     62.5    -0.32
9        75     74     74.5     0.27
10       32     32     32.0     0.80
11       22     22     22.0     0.54
12       43     38     40.5     0.18
13       24     24     24.0    -0.02
14       19     32     25.5     0.23
15       80     79     79.5    -0.18
16       72     72     72.0    -0.06
17       65    255    160.0     0.30
18      233    224    228.5     0.07
19       65     67     66.0    -0.07

n_E = experimental group sample size, n_C = control group sample size, ñ = (n_E + n_C)/2 = mean group sample size.

For this data set, $n$ is approximated as 84, using the mean group sample sizes $\tilde{n}_i$ in Eq. (16.2), and the estimate of $\pi$ based on the proportion of positive results is $p = 11/19 = 0.579$. Solving for $\theta$, using $\varphi_\alpha = 0$, we obtain $\hat{\theta} = 0.032$, which is the proposed point estimate of the population effect size. To obtain a 95% confidence interval for $\theta$, we note that the confidence interval for $\pi$ based on Eq. (16.3) is $[0.357, 0.801]$ and that based on Eq. (16.4) is $[0.363, 0.769]$, which is slightly narrower. Using these latter values in Eq. (16.7), we find that the 95% confidence interval for $\theta$ is given by $[-0.056, 0.121]$. Since this confidence interval contains the value 0, we fail to reject the null hypothesis that the population effect size is 0 for all the studies.

For the same data set, we can also obtain a point estimate and a confidence interval for $\theta$ based on the proportion of significant positive results. Since 3 of the 19 studies result in statistically significant values at $\alpha = 0.05$, with the corresponding critical value $\varphi_\alpha = 1.64$, our estimate of $\pi$ is $p = 3/19 = 0.158$, and this results in the point estimate of $\theta$ as $\hat{\theta} = 0.013$. Again, the confidence interval for $\pi$ based on the normal theory is obtained as $[-0.006, 0.322]$ and that based on the chi-square distribution is given by $[0.055, 0.376]$. Using the latter bounds and Eq. (16.7), we obtain the 95%


confidence bounds for $\theta$ as $[0.032, 0.212]$. Since this interval does not contain 0, we can conclude that the common effect size $\theta$ is significantly greater than 0.

Example 16.3. We next consider the situation when both the variables $X$ and $Y$ are continuous, and a measure of effect size is provided by the correlation coefficient $\rho$. Typically, the population correlation coefficients $\rho_1, \ldots, \rho_k$ of the $k$ studies are estimated by the sample correlation coefficients $r_1, \ldots, r_k$, which represent the $\theta_i$'s and the $T_i$'s, respectively.

It is well known that, in large samples, $\operatorname{Var}(r_i) \approx (1 - \rho_i^2)^2/(n - 1)$, where $n$ is the sample size. We thus have all the ingredients to apply formula (16.5) to any specific problem.

As an example, we consider the data set from Cohen (1983) on validity studies correlating student ratings of the instructor with student achievement. The relevant data are given in Table 16.2.

Table 16.2 Validity studies correlating student ratings of the instructor with student achievement

Study    n      r
1        10     0.68
2        20     0.56
3        13     0.23
4        22     0.64
5        28     0.49
6        12    -0.04
7        12     0.49
8        36     0.33
9        19     0.58
10       12     0.18
11       36    -0.11
12       75     0.27
13       33     0.26
14      121     0.40
15       37     0.49
16       14     0.51
17       40     0.40
18       16     0.34
19       14     0.42
20       20     0.16

Suppose we wish to obtain a point estimate and a confidence interval for $\rho$, the assumed common population correlation, based on the proportion of positive results. Obviously, here $p = 18/20 = 0.9$, and, using Eq. (16.2), $\tilde{n} = 26$. Taking $\varphi_\alpha = 0$, we then get $\hat{\rho} = 0.264$ as the point estimate of $\rho$. The 95% approximate two-sided confidence interval for $\pi$ based on the normal theory is given by $[0.769, 1.031]$, while that based on the chi-square theory is obtained as $[0.699, 0.972]$. Using the latter, the confidence bounds for $\rho$ turn out to be $[0.107, 0.381]$. Because this interval does not contain the value 0, we can conclude that there is a positive correlation between student ratings of the instructor and student achievement.

For the same data set, we can proceed to obtain a point estimate and confidence bounds for $\rho$ based on only the significantly positive results. Taking $\alpha = 0.05$, so that $\varphi_\alpha = 1.64$, and noting that $p = 12/20 = 0.6$, we obtain the point estimate of $\rho$ as $\hat{\rho} = 0.372$. Similarly, using the chi-square-based confidence interval for $\pi$, namely $[0.387, 0.781]$, the bounds for $\rho$ are obtained as $[0.271, 0.464]$, leading to the same conclusion.


We now present a new and exact result pertaining to the estimation of the effect size $\theta$ when the sample sizes are unequal and are not replaced by an average sample size. As before, we assume that the effect size estimates $T_1, T_2, \ldots, T_k$ are not observed, and instead we observe $Z_1, Z_2, \ldots, Z_k$, where

$$ Z_i \;=\; \begin{cases} 1, & T_i > 0, \\ 0, & T_i \le 0. \end{cases} $$

Our goal then is to draw inferences about $\theta$ using the data $Z_1, Z_2, \ldots, Z_k$. Let $\sigma_i^2(\theta, n_i)$ denote the variance of $T_i$. Note that even though $T_1, T_2, \ldots, T_k$ are not observed, we typically know how they were computed (i.e., for each $i$ we know whether $T_i$ is a sample correlation, Cohen's $d$, Hedges's $g$, Glass's $\Delta$, etc.), and so we at least have an approximate expression for $\sigma_i^2(\theta, n_i)$ for each $i$. Note that $\sigma_i^2(\theta, n_i)$ depends on $\theta$ and on $n_i$. Now $Z_1, Z_2, \ldots, Z_k$ are independent random variables such that $Z_i \sim \mathrm{Bernoulli}(\pi_i)$, where
$$ \pi_i \;=\; \Pr[Z_i = 1] \;=\; \Pr[T_i > 0] \;=\; 1 - \Phi\!\left(\frac{-\theta}{\sigma_i(\theta, n_i)}\right), $$
and where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function. Hence
$$ \pi_i \;=\; \Phi\!\left(\frac{\theta}{\sigma_i(\theta, n_i)}\right), \qquad i = 1, \ldots, k. \tag{16.8} $$
When the $n_i$'s are assumed to be equal (or approximated by an average sample size), the $\pi_i$'s coincide and the likelihood simplifies accordingly. Looking at Eq. (16.8), we easily note how $\pi_i$ depends on the sample size $n_i$ through the standard deviation $\sigma_i(\theta, n_i)$. Thus if the $n_i$'s are unequal, then the $\pi_i$'s will also be unequal. Now let $z_1, z_2, \ldots, z_k$ denote the observed values of the random variables $Z_1, Z_2, \ldots, Z_k$.

Then the likelihood function for $\theta$ can be written as
$$ L(\theta) \;=\; \prod_{i=1}^{k} \pi_i^{z_i}\,(1 - \pi_i)^{1 - z_i}. $$


Using the approximate likelihood $L(\theta)$ defined as
$$ L(\theta) \;=\; \prod_{i=1}^{k} \left[\Phi\!\left(\frac{\theta}{\sigma_i(\theta, n_i)}\right)\right]^{z_i} \left[1 - \Phi\!\left(\frac{\theta}{\sigma_i(\theta, n_i)}\right)\right]^{1 - z_i}, \tag{16.9} $$
we can then obtain the maximum likelihood estimate of $\theta$.

Example 16.4. To illustrate this procedure we revisit the data of Example 16.3, displayed in Table 16.2. Let us pretend that $r_1, r_2, \ldots, r_k$ are not observed; instead we observe $Z_1, Z_2, \ldots, Z_k$, where

$$ Z_i \;=\; \begin{cases} 1, & r_i > 0, \\ 0, & r_i \le 0. \end{cases} $$

We then use the likelihood function defined in Eq. (16.9) to draw inferences about $\rho$ based on $Z_1, Z_2, \ldots, Z_k$. Using the expression for $L(\theta)$ given in Eq. (16.9), the likelihood function for $\rho$ can be written as
$$ L(\rho) \;=\; \prod_{i=1}^{k} \pi_i^{z_i}\,(1 - \pi_i)^{1 - z_i}, \tag{16.10} $$
where
$$ \pi_i \;=\; \Phi\!\left(\frac{\rho\,\sqrt{n_i - 1}}{1 - \rho^2}\right), \qquad i = 1, \ldots, k. \tag{16.11} $$

Using a numeric optimization routine, we find that for this example $L(\rho)$ is maximized at $\hat{\rho} = 0.2064$, which is the maximum likelihood estimate of $\rho$. It is interesting to note that this estimate of $\rho$ differs considerably from the earlier value 0.264.
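The following Python sketch (an editorial illustration, not code from the book) carries out this numerical maximization, using the likelihood (16.9) with $\pi_i$ as in Eq. (16.11) and the sample sizes and signs from Table 16.2.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Sample sizes and signs of the correlations from Table 16.2
# (z_i = 1 if r_i > 0, else 0; studies 6 and 11 have negative r_i).
n = np.array([10, 20, 13, 22, 28, 12, 12, 36, 19, 12,
              36, 75, 33, 121, 37, 14, 40, 16, 14, 20])
z = np.array([1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
              0, 1, 1, 1, 1, 1, 1, 1, 1, 1])

def neg_log_lik(rho):
    """Negative log of the likelihood (16.9) with pi_i = Phi(rho*sqrt(n_i - 1)/(1 - rho^2))."""
    arg = rho * np.sqrt(n - 1) / (1 - rho ** 2)
    # log(pi_i) = logcdf(arg), log(1 - pi_i) = logsf(arg); avoids underflow for large |arg|
    return -np.sum(z * norm.logcdf(arg) + (1 - z) * norm.logsf(arg))

res = minimize_scalar(neg_log_lik, bounds=(-0.99, 0.99), method="bounded")
print(res.x)   # maximum likelihood estimate of rho; the text reports 0.2064
```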

An alternative way to perform the analysis in this example is, instead of taking $\theta = \rho$, to take $\theta = \rho^* = \tfrac{1}{2}\ln\{(1+\rho)/(1-\rho)\}$. Then we let $T_i = r_i^* = \tfrac{1}{2}\ln\{(1+r_i)/(1-r_i)\}$. Note that $\tfrac{1}{2}\ln\{(1+r_i)/(1-r_i)\}$ is the well-known variance-stabilizing transformation of $r_i$. Since $r_i^* > 0 \Leftrightarrow r_i > 0$, we observe the same $Z_i$'s as before.

Following the general expression (16.9), we can then write down the following likelihood function for $\rho^*$:
$$ L(\rho^*) \;=\; \prod_{i=1}^{k} \pi_i^{z_i}\,(1 - \pi_i)^{1 - z_i}, $$
where, using the large-sample approximation $\operatorname{Var}(r_i^*) \approx 1/(n_i - 3)$,
$$ \pi_i \;=\; \Phi\!\left(\rho^{*}\sqrt{n_i - 3}\,\right), \qquad i = 1, \ldots, k, $$
and where $z_1, z_2, \ldots, z_k$ denote the observed values of $Z_1, Z_2, \ldots, Z_k$. Using a numeric optimization routine we find that for this example $L(\rho^*)$ is maximized at


$\hat{\rho}^* = 0.2718$, implying $\hat{\rho} = \tanh(0.2718) = 0.2653$. It is rather interesting to observe the stark dissimilarity between the two estimates of $\rho$ (namely, 0.2064 and 0.2653), obtained by using the two approaches based on $r$ and $r^*$, and the rather strange similarity between the two values (0.264 and 0.265), obtained by using the exact likelihood based on $r^*$ and the approximate likelihood based on $r$ and an average sample size! One wonders whether this remarkable difference vanishes when the sample sizes are large, and here is our finding. Keeping the observed values of the $Z_i$'s the same, and just changing the sample sizes of the available studies, we have considered three scenarios and in each case applied the exact method based on $r$ and $r^*$.

Case (i): $n_1 = 55, n_2 = 60, n_3 = 65, \ldots, n_{20} = 150$. Here we find that
$$ \hat{\rho}_{\mathrm{MLE}}(r) = 0.1158 \quad \text{and} \quad \hat{\rho}_{\mathrm{MLE}}(r^*) = 0.1315. $$

Case (ii): $n_1 = 505, n_2 = 510, n_3 = 515, \ldots, n_{20} = 600$. Here we find that
$$ \hat{\rho}_{\mathrm{MLE}}(r) = 0.0520 \quad \text{and} \quad \hat{\rho}_{\mathrm{MLE}}(r^*) = 0.0549. $$

Case (iii): $n_1 = 1005, n_2 = 1010, n_3 = 1015, \ldots, n_{20} = 1100$. In this case of extremely large sample sizes, we find that
$$ \hat{\rho}_{\mathrm{MLE}}(r) = 0.0381 \quad \text{and} \quad \hat{\rho}_{\mathrm{MLE}}(r^*) = 0.0397. $$

Looking at the results above it appears that for large sample sizes the methods of analysis based on T and r* give nearly the same results.
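For completeness, here is a corresponding editorial sketch for the $r^*$-based (Fisher $z$) analysis described above, again assuming the large-sample variance $1/(n_i - 3)$; rerunning it (and its $r$-based counterpart from the earlier sketch) with the sample-size sequences of Cases (i)-(iii) provides the kind of comparison reported above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def rho_mle_fisher(n, z):
    """MLE based only on the signs z_i, using the Fisher z parameterization:
    pi_i = Phi(rho_star * sqrt(n_i - 3)), from Var(r_i^*) ~ 1/(n_i - 3).
    Returns (rho_star_hat, rho_hat) with rho_hat = tanh(rho_star_hat)."""
    def neg_log_lik(rho_star):
        arg = rho_star * np.sqrt(n - 3)
        return -np.sum(z * norm.logcdf(arg) + (1 - z) * norm.logsf(arg))
    rho_star_hat = minimize_scalar(neg_log_lik, bounds=(-3.0, 3.0), method="bounded").x
    return rho_star_hat, np.tanh(rho_star_hat)

# Signs from Table 16.2 (studies 6 and 11 negative); the text reports
# rho_star_hat = 0.2718 (rho_hat = 0.2653) for the original sample sizes.
z = np.array([1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
n_table = np.array([10, 20, 13, 22, 28, 12, 12, 36, 19, 12,
                    36, 75, 33, 121, 37, 14, 40, 16, 14, 20])
n_case_i = np.arange(55, 151, 5)   # Case (i): 55, 60, ..., 150
print(rho_mle_fisher(n_table, z))
print(rho_mle_fisher(n_case_i, z))
```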

We conclude this chapter with the observation that although the procedures described above provide quick estimates of an overall effect size along with its estimated standard error, their use is quite limited due to the requirement of large sample sizes. For details, we refer to Hedges and Olkin (1985).