A tale of randomization: randomization versus mixed model analysis for single and chain randomizations
Chris Brien
Phenomics & Bioinformatics Research Centre, University of South Australia.
The Australian Centre for Plant Functional Genomics, University of Adelaide.
This work was supported by the Australian Research Council.
A tale of randomization: outline
1. Once upon a time.
2. Randomization model for a single randomization.
3. Randomization analysis for a single randomization.
4. Randomization model for a chain of randomizations.
5. Randomization analysis for a chain of randomizations.
6. Some issues.
7. Conclusions.
1. Once upon a time
In the 70s I was a true believer:
We are talking randomization inference.
Purism
These books demonstrate that, for CRDs and RCBDs, the p-value from a randomization analysis is approximated by the p-value from an analysis assuming normality;
Welch (1937) and Atiqullah (1963) show that this is true, provided the observed data actually conform to the variance structure of the assumed normal model (e.g. homogeneity between blocks).
Kempthorne (1975):
Randomization analysis: what is it?
A randomization model is formulated.
It specifies the distribution of the response over all randomized layouts possible for the design.
A test statistic is identified. I will use test statistics from parametric analyses (e.g. F-statistics).
The value of the test statistic is computed from the data for:
o all possible randomized layouts, or a random sample (with replacement) of them, giving the randomization distribution of the test statistic, or an estimate of it;
o the randomized layout used in the experiment, giving the observed test statistic.
The p-value is the proportion of all possible values that are as extreme as, or more extreme than, the observed test statistic.
This differs from a permutation test in that it is based on the randomization actually employed in the experiment.
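As a minimal sketch (Python with NumPy; the talk itself gives no code), the p-value definition above amounts to counting the values in the randomization distribution that are at least as large as the observed statistic, treating large F-values as extreme. The function name and the relative tolerance used for ties are illustrative choices, not part of the method as presented.

```python
import numpy as np

def randomization_p_value(observed, distribution, rtol=1e-9):
    """Proportion of randomization-distribution values that are as extreme as,
    or more extreme than, the observed statistic (large values = extreme, as for F)."""
    dist = np.asarray(distribution, dtype=float)
    # Treat values within a small relative tolerance of the observed value as ties,
    # so floating-point noise does not drop layouts equivalent to the observed one.
    tol = rtol * max(1.0, abs(observed))
    return float(np.mean(dist >= observed - tol))

# Hypothetical F-values for 8 randomized layouts; the observed value is 4.0
print(randomization_p_value(4.0, [1.0, 2.0, 4.0, 5.0, 0.5, 4.0, 3.0, 6.0]))  # 0.5
```

When all layouts are enumerated, the observed layout is one of them, so the p-value can never fall below one over the number of layouts.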
Sex created difficulties … and time
Preece (1982, section 6.2): is Sex a block or a treatment factor?
Semantic problem: what is a block factor?
Often Sex is unrandomized, but is of interest – I believe this to be the root of the dilemma.
If it is unrandomized, it cannot be tested using a randomization test (at all?).
In longitudinal studies, Time is similar; so are Sites.
What about incomplete block designs with recombination of information? Missing values?
It seems that not all inference is possible with randomization analysis.
Fisher (1935, Section 21) first proposed randomization tests:
It seems clear that Fisher intended randomization tests to be only a check on normal theory tests.
Fisher (1960, 7th edition) added Section 21.1 that includes:
Less intelligible test nonparametric test.
Conversion
I became a modeller, BUT I did not completely reject randomization inference.
I have advocated randomization-based mixed models: mixed models that start with the terms that would be in a randomization model (Brien & Bailey, 2006; Brien & Demétrio, 2009).
This allowed me to test for block effects and block-treatment interactions, and to model longitudinal data.
I comforted myself that, when a model-based test has an equivalent randomization test, the former is an approximation to the latter and so is robust.
More recently …
Cox, Hinkelmann and Gilmour pointed out, in the discussion of Brien and Bailey (2006), that no one had so far indicated how a model for a multitiered experiment might be justified by the randomizations employed.
Rosemary Bailey and I have been working for some time on the analysis of experiments with multiple randomizations, using randomization-based (mixed) models; Brien and Bailey (201?) detail estimation and testing.
I decided to investigate randomization inference for such experiments, but first for single randomizations.
2. Randomization model for a single randomization
Additive model of constants: $\mathbf{y} = \mathbf{w} + \mathbf{X}_h\boldsymbol{\tau}$,
where $\mathbf{y}$ is the vector of observed responses; $\mathbf{w}$ is the vector of constants representing the contribution of each unit to the response; $\boldsymbol{\tau}$ is the vector of treatment constants; and $\mathbf{X}_h$ is the design matrix showing the assignment of treatments to units.
Under randomization, i.e. over all allowable unit permutations applied to $\mathbf{w}$, each element of $\mathbf{w}$ becomes a random variable, as does each element of $\mathbf{y}$. Let $\mathbf{W}$ and $\mathbf{Y}$ be the corresponding vectors of random variables, so that
$\mathbf{Y} = \mathbf{W} + \mathbf{X}_h\boldsymbol{\tau}$.
The distribution of $\mathbf{Y}$ over all allowable permutations forms the multivariate randomization distribution: our randomization model. Now we assume $E_R[\mathbf{W}] = \mathbf{0}$, so that $E_R[\mathbf{Y}] = \mathbf{X}_h\boldsymbol{\tau}$.
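Randomization model (cont’d)
Further,
$$\mathrm{var}_R[\mathbf{Y}] = \mathbf{V}_R = \sum_{H\in\mathcal{H}}\zeta_H\mathbf{B}_H = \sum_{H\in\mathcal{H}}\psi_H\mathbf{S}_H = \sum_{H\in\mathcal{H}}\eta_H\mathbf{Q}_H,$$
where
$\mathcal{H}$ is the set of generalized factors (terms) derived from a poset of factors on the units;
$\zeta_H$ is the covariance between variables with the same levels of generalized factor $H$;
$\psi_H$ is the canonical component of excess covariance for $H$;
$\eta_H$ is the eigenvalue of $\mathbf{V}_R$ for $H$ and is its contribution to E[MSq];
$\mathbf{B}_H$, $\mathbf{S}_H$ and $\mathbf{Q}_H$ are known matrices.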
This model has the same terms as a randomization-based mixed model (Brien & Bailey, 2006; Brien & Demétrio, 2009)
However, the distributions differ.
Randomization by permutation of units & unit factors
Unit  Blocks  Units  Treatments
1     1       1      1
2     1       2      2
3     2       1      1
4     2       2      2
Permutations for an RCBD with b = 2 and k = v = 2. The allowable permutations are those that permute the blocks as a whole and those that permute the units within a block, together with their combinations; there are $b!\,(k!)^b = 2!\,(2!)^2 = 8$ of them.
Unit  Blocks  Units  Treatments  Permutation  Permuted Blocks  Permuted Units
1     1       1      1           4            2                2
2     1       2      2           3            2                1
3     2       1      1           1            1                1
4     2       2      2           2            1                2
This is equivalent to the treatment randomization 1, 2, 2, 1 (reading the units in block and unit order after permutation).
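The enumeration above can be checked with a short Python sketch (an illustration, not the author's software; the function names are hypothetical). Units are numbered from 0 in block order, so unit (i, j) is i*k + j, and an allowable permutation sends unit (i, j) to (beta(i), pi_i(j)) for a block permutation beta and a separate within-block permutation pi_i for each block. Treatments stay with the units, so the position whose labels a unit takes over receives that unit's treatment.

```python
from itertools import permutations, product
from math import factorial

def rcbd_permutations(b, k):
    """All allowable unit permutations for an RCBD with b blocks of k units.

    perm[u] is the position (block-major, 0-based) whose block and unit
    labels unit u takes: unit (i, j) -> (beta[i], pi_i[j]).
    """
    perms = []
    for beta in permutations(range(b)):                    # blocks as a whole
        for pis in product(permutations(range(k)), repeat=b):  # within each block
            perms.append(tuple(beta[i] * k + pis[i][j]
                               for i in range(b) for j in range(k)))
    return perms

def rerandomized_treatments(treatments, perm):
    """Treatment at each position after unit u moves to position perm[u]."""
    new = [None] * len(treatments)
    for u, target in enumerate(perm):
        new[target] = treatments[u]      # the treatment travels with the unit
    return new

b, k = 2, 2
perms = rcbd_permutations(b, k)
print(len(perms), factorial(b) * factorial(k) ** b)    # 8 8

treatments = [1, 2, 1, 2]
slide_perm = (3, 2, 0, 1)    # units 1, 2, 3, 4 -> 4, 3, 1, 2, written 0-based
print(slide_perm in perms)                              # True
print(rerandomized_treatments(treatments, slide_perm))  # [1, 2, 2, 1]
```

Several of the 8 permutations induce the same treatment layout; what the randomization distribution needs is the full set of allowable permutations, each equally likely.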
Null randomization distribution: RCBD
Under the assumption of no treatment effects, $\mathbf{Y}^* = \mathbf{W} + \mu^*\mathbf{1}$. In this case, the randomization distribution of $\mathbf{Y}^*$ is termed the null randomization distribution.
The actual distribution is obtained by applying each unit permutation to $\mathbf{y}$:
Permutation Y*11 Y*12 Y*21 Y*22
1 y11 y12 y21 y22
2 y12 y11 y21 y22
3 y11 y12 y22 y21
4 y12 y11 y22 y21
5 y21 y22 y11 y12
6 y21 y22 y12 y11
7 y22 y21 y11 y12
8 y22 y21 y12 y11
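It can be shown that the first- and second-order parameters of this distribution, $\mu^*$, $\zeta^*_G$, $\zeta^*_B$ and $\zeta^*_{BU}$, are equal to sample statistics. For example, for all $Y^*_{ij}$, the variable for Unit $j$ in Block $i$,
$$\mu^* = \bar{y}_{..} \quad\text{and}\quad \zeta^*_{BU} = s^2.$$
The distribution of $\mathbf{Y}^* - \bar{y}_{..}\mathbf{1}$ gives the distribution of $\mathbf{W}$.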
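A minimal NumPy sketch of the null randomization distribution for the RCBD example, with hypothetical data and function names: each allowable permutation moves the observations, and over the 8 permutations each $Y^*_{ij}$ has mean equal to the sample mean of $\mathbf{y}$ and variance equal to the variance of $\mathbf{y}$ computed with divisor n. The divisor n arises because, in this example, each $Y^*_{ij}$ takes each observed value equally often over the permutations.

```python
import numpy as np
from itertools import permutations, product

def rcbd_permutations(b, k):
    """Allowable RCBD permutations: unit i*k + j -> beta[i]*k + pi_i[j]."""
    out = []
    for beta in permutations(range(b)):
        for pis in product(permutations(range(k)), repeat=b):
            out.append([beta[i] * k + pis[i][j] for i in range(b) for j in range(k)])
    return out

def null_distribution(y, b, k):
    """Rows are the values of Y* under each allowable permutation:
    the observation y[u] moves to position p[u]."""
    y = np.asarray(y, dtype=float)
    rows = []
    for p in rcbd_permutations(b, k):
        ystar = np.empty_like(y)
        ystar[p] = y
        rows.append(ystar)
    return np.array(rows)

y = np.array([10.0, 12.0, 7.0, 11.0])   # hypothetical y11, y12, y21, y22
Ystar = null_distribution(y, b=2, k=2)
print(Ystar.shape)           # (8, 4)
print(Ystar.mean(axis=0))    # every column equals y.mean(), here 10
print(Ystar.var(axis=0))     # every column equals y.var() (divisor n), here 3.25
```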
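$\mathbf{V}^*_R$ for the RCBD example
The matrices in the expressions for $\mathbf{V}^*_R$ are known:
$$\mathbf{V}^*_R = \zeta^*_G\,(\mathbf{J}_2-\mathbf{I}_2)\otimes\mathbf{J}_2 + \zeta^*_B\,\mathbf{I}_2\otimes(\mathbf{J}_2-\mathbf{I}_2) + \zeta^*_{BU}\,\mathbf{I}_2\otimes\mathbf{I}_2 = \begin{bmatrix} \zeta^*_{BU} & \zeta^*_B & \zeta^*_G & \zeta^*_G \\ \zeta^*_B & \zeta^*_{BU} & \zeta^*_G & \zeta^*_G \\ \zeta^*_G & \zeta^*_G & \zeta^*_{BU} & \zeta^*_B \\ \zeta^*_G & \zeta^*_G & \zeta^*_B & \zeta^*_{BU} \end{bmatrix}$$
$$\mathbf{V}^*_R = \psi^*_G\,\mathbf{J}_2\otimes\mathbf{J}_2 + \psi^*_B\,\mathbf{I}_2\otimes\mathbf{J}_2 + \psi^*_{BU}\,\mathbf{I}_2\otimes\mathbf{I}_2 = \begin{bmatrix} \psi^*_G+\psi^*_B+\psi^*_{BU} & \psi^*_G+\psi^*_B & \psi^*_G & \psi^*_G \\ \psi^*_G+\psi^*_B & \psi^*_G+\psi^*_B+\psi^*_{BU} & \psi^*_G & \psi^*_G \\ \psi^*_G & \psi^*_G & \psi^*_G+\psi^*_B+\psi^*_{BU} & \psi^*_G+\psi^*_B \\ \psi^*_G & \psi^*_G & \psi^*_G+\psi^*_B & \psi^*_G+\psi^*_B+\psi^*_{BU} \end{bmatrix}$$
$$\mathbf{V}^*_R = \eta^*_G\,\tfrac{1}{2}\mathbf{J}_2\otimes\tfrac{1}{2}\mathbf{J}_2 + \eta^*_B\,(\mathbf{I}_2-\tfrac{1}{2}\mathbf{J}_2)\otimes\tfrac{1}{2}\mathbf{J}_2 + \eta^*_{BU}\,\mathbf{I}_2\otimes(\mathbf{I}_2-\tfrac{1}{2}\mathbf{J}_2)$$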
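As a numerical check that these three forms describe the same matrix, the NumPy sketch below builds them with Kronecker products (the ζ* values are hypothetical). The conversions it uses, ψ*_G = ζ*_G, ψ*_B = ζ*_B − ζ*_G, ψ*_BU = ζ*_BU − ζ*_B and η*_BU = ψ*_BU, η*_B = ψ*_BU + 2ψ*_B, η*_G = ψ*_BU + 2ψ*_B + 4ψ*_G, are derived here for this b = k = 2 case by matching entries and eigenvalues; they are not quoted from the talk.

```python
import numpy as np

I2, J2 = np.eye(2), np.ones((2, 2))

def v_from_zeta(zG, zB, zBU):
    # Covariance form: the zeta terms with the B_H matrices
    return zG * np.kron(J2 - I2, J2) + zB * np.kron(I2, J2 - I2) + zBU * np.kron(I2, I2)

def v_from_psi(pG, pB, pBU):
    # Canonical-component form: the psi terms with the S_H matrices
    return pG * np.kron(J2, J2) + pB * np.kron(I2, J2) + pBU * np.kron(I2, I2)

def v_from_eta(eG, eB, eBU):
    # Spectral form: the eta terms with the projectors Q_H
    QG = np.kron(J2 / 2, J2 / 2)
    QB = np.kron(I2 - J2 / 2, J2 / 2)
    QBU = np.kron(I2, I2 - J2 / 2)
    return eG * QG + eB * QB + eBU * QBU

zG, zB, zBU = 0.5, 2.0, 5.0               # hypothetical covariances
pG, pB, pBU = zG, zB - zG, zBU - zB       # psi from zeta
eBU = pBU                                 # eta from psi (derived for b = k = 2)
eB = pBU + 2 * pB
eG = pBU + 2 * pB + 4 * pG

V1 = v_from_zeta(zG, zB, zBU)
V2 = v_from_psi(pG, pB, pBU)
V3 = v_from_eta(eG, eB, eBU)
print(np.allclose(V1, V2), np.allclose(V2, V3))   # True True
print(np.linalg.eigvalsh(V1))   # ascending: eta_BU twice, then eta_B, eta_G (3, 3, 6, 8)
```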
3. Randomization analysis for a single randomization
Estimation and hypothesis testing are based on the randomization distribution; I will focus on hypothesis testing.
I propose to use I-MINQUE to estimate the ψs and to use these estimates to estimate τ via EGLS.
I-MINQUE yields the same estimates as REML, but without the need to assume a distributional form for the response.
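Test statistics
We have a set $\mathcal{R}$ of idempotents specifying a treatment decomposition. For an $\mathbf{R}\in\mathcal{R}$, to test $H_0\colon \mathbf{R}\mathbf{X}_h\boldsymbol{\tau} = \mathbf{0}$, use a Wald F, that is, a Wald test statistic divided by its numerator df:
$$F_{Wald} = \frac{(\mathbf{R}\mathbf{X}_h\hat{\boldsymbol{\tau}})'\{\mathbf{R}\mathbf{X}_h(\mathbf{X}_h'\mathbf{V}^{-1}\mathbf{X}_h)^{-1}\mathbf{X}_h'\mathbf{R}\}^{-}(\mathbf{R}\mathbf{X}_h\hat{\boldsymbol{\tau}})}{\mathrm{trace}(\mathbf{R})},$$
where $^{-}$ denotes a generalized inverse. The numerator is a quadratic form: (est)' (var(est))$^{-}$ (est). For an orthogonal design, $F_{Wald}$ is the same as the F from an ANOVA; otherwise, it is a combined F test statistic.
For nonorthogonal designs, an alternative test statistic is an intrablock F-statistic. For a single randomization, let $\mathbf{Q}_H$ be the matrix for $H$ that projects on the eigenspace of $\mathbf{V}$ from which $\mathbf{R}\mathbf{X}_h\boldsymbol{\tau}$ is to be estimated. Then
$$\widehat{\mathbf{R}\mathbf{X}_h\boldsymbol{\tau}} = (\mathbf{R}\mathbf{Q}_H\mathbf{R})^{-}\mathbf{R}\mathbf{Q}_H\mathbf{Y} \quad\text{and}\quad \mathrm{var}(\widehat{\mathbf{R}\mathbf{X}_h\boldsymbol{\tau}}) = \eta_H(\mathbf{R}\mathbf{Q}_H\mathbf{R})^{-}.$$
The intrablock F-statistic is
$$F_H = \frac{\mathbf{Y}'\mathbf{Q}_H\mathbf{R}(\mathbf{R}\mathbf{Q}_H\mathbf{R})^{-}\mathbf{R}\mathbf{Q}_H\mathbf{Y}}{\mathrm{trace}\{(\mathbf{R}\mathbf{Q}_H\mathbf{R})^{-}\mathbf{R}\mathbf{Q}_H\mathbf{R}\}\,\hat{\eta}_H}.$$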
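The Wald F can be computed directly from its definition. The NumPy sketch below is illustrative only. It takes $\mathbf{V}$ as known, whereas in the analysis described here $\mathbf{V}$ would be estimated, e.g. via I-MINQUE. It uses a Moore-Penrose inverse for the generalized inverse, and its data are hypothetical. For a completely randomized design with $\mathbf{V} = s^2\mathbf{I}$, where $s^2$ is the residual mean square, it reproduces the one-way ANOVA F, in line with the statement that $F_{Wald}$ equals the ANOVA F for orthogonal designs.

```python
import numpy as np

def wald_f(y, X, V, R):
    """F_Wald = (R X tau_hat)' {var(R X tau_hat)}^- (R X tau_hat) / trace(R).

    y: (n,) responses; X: (n, v) treatment design matrix X_h;
    V: (n, n) variance matrix, taken as known here;
    R: (n, n) symmetric idempotent for the treatment term being tested.
    """
    Vinv = np.linalg.inv(V)
    info = X.T @ Vinv @ X                              # X_h' V^-1 X_h
    tau_hat = np.linalg.solve(info, X.T @ Vinv @ y)    # GLS estimate of tau
    est = R @ X @ tau_hat                              # R X_h tau_hat
    var_est = R @ X @ np.linalg.inv(info) @ X.T @ R    # its variance matrix
    return est @ np.linalg.pinv(var_est) @ est / np.trace(R)

# Hypothetical CRD: 3 treatments, 2 replicates each
trt = np.array([0, 0, 1, 1, 2, 2])
y = np.array([4.0, 6.0, 9.0, 11.0, 5.0, 7.0])
X = np.eye(3)[trt]                                     # treatment indicators
n = len(y)
P_X = X @ np.linalg.inv(X.T @ X) @ X.T                 # projector on treatment space
R = P_X - np.ones((n, n)) / n                          # treatment differences
s2 = ((y - P_X @ y) ** 2).sum() / (n - 3)              # residual mean square
print(round(wald_f(y, X, s2 * np.eye(n), R), 6))       # 7.0, the one-way ANOVA F
```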
Randomization distribution of the test statistic
To obtain it:
o apply all allowable unit permutations for the design employed to the unit factors and $\mathbf{y}$, but not to the treatment factors; this effects a rerandomization of the treatments;
o compute the test statistic for each allowable permutation; this set of values is the required distribution.
Number of allowable permutations: for our RCBD there are 8 permutations, so computing the 8 test statistics is easy. For b = 10 and k = 3 there are $10!\,(3!)^{10} \approx 2.2\times10^{14}$, which is not so easy. An alternative is random data permutation (Edgington, 1995): take a Monte Carlo sample of the permutations.
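A minimal sketch of random data permutation for an RCBD, in Python with NumPy. The data and function names are hypothetical. The test statistic is the treatment F from the usual additive blocks-plus-treatments ANOVA, and each treatment is assumed to occur once per block. Each draw is a uniformly chosen allowable permutation: a random block permutation plus an independent random within-block permutation for each block. Applying it with $\mathbf{y}$ and the block labels held in place rerandomizes the treatments. The estimated p-value is the proportion of sampled layouts whose F is at least the observed F.

```python
import numpy as np

def rcbd_treatment_f(y, block, trt):
    """Treatment F from the additive RCBD ANOVA (each treatment once per block)."""
    gm = y.mean()
    b_means = np.array([y[block == i].mean() for i in np.unique(block)])
    t_means = np.array([y[trt == t].mean() for t in np.unique(trt)])
    b, v = len(b_means), len(t_means)
    ss_b = v * ((b_means - gm) ** 2).sum()
    ss_t = b * ((t_means - gm) ** 2).sum()
    ss_res = ((y - gm) ** 2).sum() - ss_b - ss_t
    return (ss_t / (v - 1)) / (ss_res / ((b - 1) * (v - 1)))

def random_allowable_permutation(b, k, rng):
    """Uniform draw: unit i*k + j goes to position beta[i]*k + pi_i[j]."""
    beta = rng.permutation(b)
    return np.concatenate([beta[i] * k + rng.permutation(k) for i in range(b)])

def mc_randomization_test(y, trt, b, k, n_perm=5000, seed=2024):
    y, trt = np.asarray(y, dtype=float), np.asarray(trt)
    block = np.repeat(np.arange(b), k)
    f_obs = rcbd_treatment_f(y, block, trt)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        p = random_allowable_permutation(b, k, rng)
        new_trt = np.empty_like(trt)
        new_trt[p] = trt                    # position p[u] receives unit u's treatment
        f = rcbd_treatment_f(y, block, new_trt)
        hits += int(f >= f_obs * (1 - 1e-9))   # ties count as "as extreme"
    return f_obs, hits / n_perm

# Hypothetical RCBD: 4 blocks of 3 units, treatments 0, 1, 2 in each block
trt = np.tile([0, 1, 2], 4)
y = 10.0 * trt + np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.5, 0.1, 0.0, -0.3, 0.2, -0.1, 0.4])
f_obs, p = mc_randomization_test(y, trt, b=4, k=3)
print(f_obs, p)
```

With this strong treatment effect, only the 6 layouts that relabel the treatments identically in every block tie with the observed layout, so the estimate should be near the exact value 6/1296 ≈ 0.005.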
Null distribution of the test statistic under normality
Under normality of the response, the null distribution of $F_{Wald}$ is:
o for orthogonal designs, an exact F-distribution;
o for nonorthogonal designs, asymptotically an F-distribution.
Under normality of the response, the null distribution of an intrablock F-statistic is an exact F-distribution.
Wheat experiment in a BIBD (Joshi, 1987)
Six varieties of wheat are assigned to plots arranged in 10 blocks of 3 plots.
The efficiency factor for this design is 0.80. The ANOVA, with the intrablock F-statistics and p-values:
plots tier treatments tier
source d.f. source d.f. MS F p-value
Blocks 9 Varieties 5 39.32 0.58 0.718
Residual 4 67.59 1.17
Plots[B] 20 Varieties 5 231.29 4.02 0.016
Residual 15 57.53
$F_{Wald}$ = 3.05 with p = 0.035 ($\nu_1$ = 5, $\nu_2$ = 19.1).
Estimates: $\psi_B$ = 14.60 (p = 0.403); $\psi_{BP}$ = 58.28.
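The efficiency factor quoted for this design follows from the standard formula for a balanced incomplete block design with v treatments in blocks of size k. Here v = 6, k = 3 and b = 10, so each variety is replicated r = bk/v = 5 times and each pair of varieties occurs together in λ = r(k − 1)/(v − 1) = 2 blocks:
$$E = \frac{\lambda v}{rk} = \frac{v(k-1)}{k(v-1)} = \frac{6\times 2}{3\times 5} = 0.80.$$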
Test statistic distributions
Based on 50,000 randomly selected permutations of blocks and of plots within blocks.
[Figure: randomization distributions of the intrablock F-statistic and the combined F-statistic; the peak on the right-hand side comprises all values ≥ 10.]
Combined F-statistic
Part of the discrepancy between the F- and the randomization distributions is that the combined F-statistic is only asymptotically distributed as an F. This differs from Kenward & Roger (1997) and Schaalje et al. (2002) for nonorthogonal designs.
[Figure panels: Randomization distribution; Parametric bootstrap.]
Two other examples
Rabbit experiment using the same BIBD (Hinkelmann & Kempthorne, 2008): 6 Diets assigned to 10 Litters, each with 3 Rabbits. Estimates: $\psi_L$ = 21.70 (p = 0.002); $\psi_{LR}$ = 10.08.
Casuarina experiment in a latinized row-column design (Williams et al., 2002): 4 Blocks of 60 provenances, each arranged in 6 rows by 10 columns. Provenances are grouped according to 18 Countries of origin. 2 Inoculation dates, each applied to 2 of the blocks. Estimates: $\psi_C$ = 0.2710; $\psi_B$, $\psi_{BR}$, $\psi_{BC}$ < 0.06; $\psi_{BRC}$ = 0.2711.
ANOVA for Casuarina experiment
Provenance represents provenance differences within countries.
plots tier           treatments tier
source    d.f.       source        d.f.   Eff.    MS      F       p-value
Blocks    3          Inoculation   1              11.54   11.46   0.077
                     Residual      2              1.01    1.17
Columns   9          Country       9              7.25
Rows[B]   20         Country       17             0.90
                     Provenance    3              0.43
B#C       27         Country       17             0.69
                     Provenance    10             0.48
R#C[B]    176        Country       17     0.761   2.46    10.25   <0.001
                     Provenance    41     0.685   0.29    1.22    0.235
                     I#C           17     0.681   0.13    0.54    0.917
                     I#P           41     0.522   0.15    0.63    0.938
                     Residual      60             0.24
Comparison of p-values
For intrablock F, p-values from F and randomization distributions generally agree.
For $F_{Wald}$, p-values from the F-distribution generally underestimate those from the randomization distribution (Rabbit Diets is an exception, with little interblock contribution).
                           Intrablock F                                $F_{Wald}$ (combined)
Example   Source           $\nu_2$   F-distribution   Randomization    $\nu_2$   F-distribution   Randomization
Wheat Varieties 15 0.016 0.012 19.1 0.035 0.096
Rabbit Diets 15 0.038 0.038 16.0 0.032 0.034
Tree Country 60 <0.001 <0.001 79.3 <0.001 0.008
Provenance 60 0.235 0.238 79.0 0.338 0.454
Innoc#C 60 0.917 0.918 84.8 0.963 0.976
Innoc#P 60 0.938 0.938 81.1 0.943 0.966
4. Randomization model for a chain of randomizations
A chain of two randomizations consists of:
o the randomization of treatments to the first set of units;
o the randomization of the first set of units to a second set of units.
For example, a two-phase sensory experiment (Brien & Payne, 1999; Brien & Bailey, 2006, Example 15) involves two randomizations:
o Field phase: 8 treatments randomized to 48 halfplots using a split-plot design with 2 Youden squares for the main plots.
o Sensory phase: 48 halfplots randomized to 576 evaluations, using Latin squares and an extended Youden square.
[Figure: factors in the three tiers]
576 evaluations: 2 Occasions; 3 Intervals in O; 6 Judges; 4 Sittings in O, I; 4 Positions in O, I, S, J.
48 halfplots: 2 Squares (Q); 3 Rows; 4 Columns in Q; 2 Halfplots in Q, R, C.
8 treatments: 4 Trellis; 2 Methods.
Three sets of objects: treatments (Γ), halfplots (Υ) and evaluations (Ω).
Randomization model
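Additive model of constants: $\mathbf{y} = \mathbf{z} + \mathbf{X}_f\mathbf{w} + \mathbf{X}_f\mathbf{X}_h\boldsymbol{\tau}$,
where $\mathbf{y}$ is the vector of observed responses; $\mathbf{z}$ is the vector of constants representing the contribution of each unit in the 2nd randomization (ω ∈ Ω) to the response; $\mathbf{w}$ is the vector of constants representing the contribution of each unit in the 1st randomization (υ ∈ Υ) to the response; $\boldsymbol{\tau}$ is the vector of treatment constants; and $\mathbf{X}_f$ and $\mathbf{X}_h$ are design matrices showing the randomization assignments.
Under the two randomizations, each element of $\mathbf{z}$ and of $\mathbf{w}$ becomes a random variable, as does each element of $\mathbf{y}$:
$\mathbf{Y} = \mathbf{Z} + \mathbf{X}_f\mathbf{W} + \mathbf{X}_f\mathbf{X}_h\boldsymbol{\tau}$,
where $\mathbf{Y}$, $\mathbf{Z}$ and $\mathbf{W}$ are the vectors of random variables. Now we assume $E_R[\mathbf{Z}] = E_R[\mathbf{W}] = \mathbf{0}$, so that $E_R[\mathbf{Y}] = \mathbf{X}_f\mathbf{X}_h\boldsymbol{\tau}$.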
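Randomization model (cont’d)
Further,
$$\begin{aligned}\mathrm{var}_R[\mathbf{Y}] = \mathbf{V}_R &= \mathbf{C}_\Omega + \mathbf{C}_\Upsilon \\ &= \sum_{H\in\mathcal{H}_\Omega}\zeta^\Omega_H\mathbf{A}_H + \sum_{H\in\mathcal{H}_\Upsilon}\zeta^\Upsilon_H\mathbf{B}_H \\ &= \sum_{H\in\mathcal{H}_\Omega}\psi^\Omega_H\mathbf{T}_H + \sum_{H\in\mathcal{H}_\Upsilon}\psi^\Upsilon_H\mathbf{S}_H \\ &= \sum_{H\in\mathcal{H}_\Omega}\eta^\Omega_H\mathbf{P}_H + \sum_{H\in\mathcal{H}_\Upsilon}\eta^\Upsilon_H\mathbf{Q}_H,\end{aligned}$$
where
$\mathbf{C}_\Omega$ and $\mathbf{C}_\Upsilon$ are the contributions to the variance arising from Ω and Υ, respectively;
$\mathcal{H}_\Omega$ and $\mathcal{H}_\Upsilon$ are the sets of generalized factors (terms) derived from the posets of factors on Ω and Υ;
$\zeta^\Omega_H$, $\zeta^\Upsilon_H$ are the covariances;
$\psi^\Omega_H$, $\psi^\Upsilon_H$ are the canonical components of excess covariance;
$\eta^\Omega_H$, $\eta^\Upsilon_H$ are the eigenvalues of $\mathbf{C}_\Omega$ and $\mathbf{C}_\Upsilon$, respectively;
$\mathbf{A}_H$, $\mathbf{B}_H$, $\mathbf{T}_H$, $\mathbf{S}_H$, $\mathbf{P}_H$, $\mathbf{Q}_H$ are known matrices.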
Forming the null randomization distribution of the response
Under the assumption of no treatment effects, $\mathbf{Y}^* = \mathbf{Z} + \mathbf{X}_f\mathbf{W} + \mu^*\mathbf{1}$.
There are two randomizations, Γ to Υ and Υ to Ω:
o to effect Γ to Υ, Υ and $\mathcal{H}_\Upsilon$ are permuted;
o to effect Υ to Ω, Ω and $\mathcal{H}_\Omega$ are permuted.
However, in this model $\mathbf{X}_f$ is fixed and reflects the actual randomization employed in the experiment. Hence, we do not apply the second randomization, and we consider the null randomization distribution conditional on the observed randomization of Υ to Ω:
1) Apply the permutations of Υ to $\mathcal{H}_\Upsilon$, $\mathcal{H}_\Omega$ and $\mathbf{y}$, to effect a rerandomization of Γ to Υ.
o The permutations must also be applied to $\mathcal{H}_\Omega$ so that they do not effect a rerandomization of Υ to Ω.
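A small sketch of this conditioning, with hypothetical names and a toy layout rather than the sensory experiment. A permutation of the halfplots rerandomizes the treatments to halfplots, while the observed assignment of halfplots to evaluations (the role of $\mathbf{X}_f$) is held fixed, so evaluations of the same halfplot always share a treatment. Which halfplot permutations are allowable depends on the field-phase design; that is assumed here rather than modelled.

```python
import numpy as np

def rerandomize_first_stage(trt_of_halfplot, halfplot_of_eval, perm):
    """Treatment seen by each evaluation after a permutation of halfplots.

    trt_of_halfplot[u]: treatment on halfplot u (observed first randomization);
    halfplot_of_eval[e]: halfplot assessed in evaluation e (held fixed, i.e. X_f);
    perm[u]: halfplot that receives the treatment of halfplot u (assumed allowable).
    """
    trt_of_halfplot = np.asarray(trt_of_halfplot)
    new_trt = np.empty_like(trt_of_halfplot)
    new_trt[perm] = trt_of_halfplot                # rerandomize treatments to halfplots
    return new_trt[np.asarray(halfplot_of_eval)]   # map to evaluations via the fixed X_f

# Toy example: 4 halfplots, 2 treatments, 8 evaluations (2 per halfplot)
trt_of_halfplot = [0, 1, 0, 1]
halfplot_of_eval = [0, 1, 2, 3, 3, 2, 1, 0]
print(rerandomize_first_stage(trt_of_halfplot, halfplot_of_eval, [1, 0, 2, 3]))
# [1 0 0 1 1 0 0 1]
```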
5. Randomization analysis for a chain of randomizations
Again, this is based on the randomization distribution of the response.
Use the same test statistics as for a single randomization: $F_{Wald}$ and intrablock F-statistics.
Obtain, or estimate, the randomization distributions of these test statistics; they are based on the randomization of Γ to Υ and are conditional on the observed randomization of Υ to Ω.
A two-phase sensory experiment (Brien & Payne, 1999; Brien & Bailey, 2006, Example 15)
This involves the two randomizations described in Section 4: treatments to halfplots, and halfplots to evaluations.
The randomization distribution will be based on the randomization of treatments to halfplots and is conditional on the actual randomization of halfplots to evaluations. Permuting evaluations and y will almost certainly result in unobserved
combinations of halfplots and evaluations, so that the randomization model is no longer valid.
ANOVA table for the sensory experiment
evaluations tier          halfplots tier                        treatments tier
source            df      eff    source            df           eff    source      df
Occasions         1              Squares           1
Judges            5
O#J               5
Intervals[O]      4
I#J[O]            20             Rows              2
                                 Q#R               2
                                 Residual          16
Sittings[OI]      18      1/3    Columns[Q]        6            1/27   Trellis     3
                                                                       Residual    3
                                 Residual          12
S#J[OI]           90      2/3    Columns[Q]        6            2/27   Trellis     3
                                                                       Residual    3
                                 R#C[Q]            12           8/9    Trellis     3
                                                                       Residual    9
                                 Residual          72
Positions[OISJ]   432            Halfplots[RCQ]    24                  Method      1
                                                                       T#M         3
                                                                       Residual    20
                                 Residual          408
The 8/9 Trellis line gives the intrablock Trellis test; Method and T#M are orthogonal sources.
Comparison of p-values
Note the difference in denominator df for Trellis.
                   Intrablock F                                $F_{Wald}$ (combined)
Source             $\nu_2$   F-distribution   Randomization    $\nu_2$   F-distribution   Randomization
Trellis 9 0.001 0.004 14.9 <0.001 0.004
Method 20 0.627 0.626
Trellis#Method 20 0.009 0.005
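Comparison of distributions
[Figure panels, each with the observed statistic, the p-value from the F-distribution (pF) and the p-value from the randomization distribution (pR):
Trellis (intrablock): F_intra = 13.47, pF = 0.001, pR = 0.004;
Trellis (combined): F_comb = 25.59, pF < 0.001, pR = 0.004;
Method: F = 0.24, pF = 0.627, pR = 0.626;
Trellis#Method: F = 5.10, pF = 0.009, pR = 0.005.]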
6. Some issues
o Size of permutations sample
o A controversy: sometimes-pooling
o Unit-treatment additivity
Size of permutations sample
A study of subsamples of the 50,000 randomly selected permutations revealed that:
o the estimates of p-values from samples of 25,000 or more randomized layouts have a range < 0.005;
o samples of 5,000 randomized layouts will often be sufficiently accurate – the estimates of p-values
  o around 0.01 or less exhibit a range < 0.005;
  o in excess of 0.20 show a range of about 0.03;
  o around 0.05 display a range of 0.01.
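These empirical ranges can be set beside the usual binomial approximation to the Monte Carlo error of an estimated p-value, $\sqrt{p(1-p)/N}$ for N sampled layouts. The sketch below (Python, standard library only) implements only that textbook approximation; it is not the subsampling study described above.

```python
import math

def mc_standard_error(p, n_layouts):
    """Binomial approximation to the standard error of a Monte Carlo p-value."""
    return math.sqrt(p * (1 - p) / n_layouts)

for p in (0.01, 0.05, 0.20):
    for n in (5_000, 25_000):
        print(f"p = {p:.2f}, N = {n:6d}: SE = {mc_standard_error(p, n):.4f}")
```

For N = 5,000 this gives standard errors of roughly 0.003 at p = 0.05 and 0.006 at p = 0.20. The range of several subsample estimates spans a few standard errors, so these figures are broadly in line with the ranges reported.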
Unit-treatment additivity
Cox and Reid (2000) allow random unit-treatment interaction; they test the hypothesis that treatment effects are greater than the unit-treatment interaction. Nelder (1977) suggests that the random form is questionable.
The Iowa school allows arbitrary (fixed) unit-treatment interactions; they test the difference between the average treatment effects over all units, which is biased in the presence of unit-treatment interaction. Such a test ignores marginality/hierarchy.
Questions: Which form applies? How can unit-treatment interaction be detected? Often this is impossible but, when it is possible, it cannot be part of a randomization analysis.
Randomization analysis requires unit-treatment additivity. If this is not appropriate, use a randomization-based mixed model.
A controversy
Should nonsignificant (??) unit sources of variation be removed and hence pooled with other unit sources?
The point is that effects hypothesized at the planning stage have not eventuated.
A modeller would remove them; indeed, in mixed-model fitting using REML, one will have no option if the fitting process does not converge.
Some argue that, because they are in the randomization model, they must stay; this seems reasonable if doing randomization inference.
Sometimes-pooling may disrupt the power and coverage properties of the analysis (Janky, 2000).
7. Conclusions
Fisher was right:
o one should employ meaningful models;
o randomization analyses provide a check on parametric analyses.
I am still a modeller, with the randomization-based mixed model as my starting point.
I am happy that, for single-stratum tests, the normal theory test approximates an equivalent randomization test, when one exists.
However, the p-values for combined test statistics from the F-distribution are questionable:
o novel: the discrepancy depends on the ‘interblock’ components;
o one needs to do a bootstrap or randomization analysis for $F_{Wald}$ when the denominator df for the intrablock F and $F_{Wald}$ differ markedly;
o this has the advantage of avoiding the need to pool nonsignificant (??) unit sources of variation, although fitting can be challenging.
References
Atiqullah, M. (1963) On the randomization distribution and power of the variance ratio test. J. Roy. Statist. Soc., Ser. B (Methodological), 25: 334-347.
Brien, C.J. & Bailey, R.A. (2006) Multiple randomizations (with discussion). J. Roy. Statist. Soc., Ser. B (Statistical Methodology), 68: 571-609.
Brien, C.J. & Demétrio, C.G.B. (2009) Formulating mixed models for experiments, including longitudinal experiments. J. Agric. Biol. Environ. Statist., 14: 253-280.
Cox, D.R. & Reid, N. (2000) The theory of the design of experiments. Boca Raton, Chapman & Hall/CRC.
Edgington, E.S. (1995) Randomization tests. New York, Marcel Dekker.
Fisher, R.A. (1935, 1960) The design of experiments. Edinburgh, Oliver and Boyd.
Hinkelmann, K. & Kempthorne, O. (2008) Design and analysis of experiments. Vol. I. Hoboken, N.J., Wiley-Interscience.
Janky, D.G. (2000) Sometimes pooling for analysis of variance hypothesis tests: a review and study of a split-plot model. The Amer. Statist., 54: 269-279.
Joshi, D.D. (1987) Linear estimation and design of experiments. Delhi, New Age Publishers.
References (cont’d)
Kempthorne, O. (1975) Inference from experiments and randomization. In: A Survey of Statistical Design and Linear Models (ed. J.N. Srivastava). Amsterdam, North-Holland.
Mead, R., Gilmour, S.G. & Mead, A. (2012) Statistical principles for the design of experiments. Cambridge, Cambridge University Press.
Nelder, J.A. (1965) The analysis of randomized experiments with orthogonal block structure. I. Block structure and the null analysis of variance. Proc. Roy. Soc. Lon., Series A, 283: 147-162.
Nelder, J.A. (1977) A reformulation of linear models (with discussion). J. Roy. Statist. Soc., Ser. A (General), 140: 48-77.
Preece, D.A. (1982) The design and analysis of experiments: what has gone wrong? Util. Math., 21A: 201-244.
Schaalje, B.G., McBride, J.B., et al. (2002) Adequacy of approximations to distributions of test statistics in complex mixed linear models. J. Agric. Biol. Environ. Statist., 7: 512-524.
Welch, B.L. (1937) On the z-test in randomized blocks and Latin squares. Biometrika, 29: 21-52.
Williams, E.R., Matheson, A.C. & Harwood, C.E. (2002) Experimental design and analysis for tree improvement. Collingwood, Vic., CSIRO Publishing.