The Theory of the Design of Experiments


CHAPMAN & HALL/CRC

The Theory of the

Design of Experiments

D.R. COX
Honorary Fellow
Nuffield College

Oxford, UK

AND

N. REID
Professor of Statistics

University of Toronto, Canada

Boca Raton London New York Washington, D.C.


This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

© 2000 by Chapman & Hall/CRC

No claim to original U.S. Government works
International Standard Book Number 1-58488-195-X
Library of Congress Card Number 00-029529
Printed in the United States of America 2 3 4 5 6 7 8 9 0

Printed on acid-free paper

Library of Congress Cataloging-in-Publication Data

Cox, D. R. (David Roxbee)
The theory of the design of experiments / D. R. Cox, N. Reid.
p. cm. — (Monographs on statistics and applied probability; 86)
Includes bibliographical references and index.
ISBN 1-58488-195-X (alk. paper)
1. Experimental design. I. Reid, N. II. Title. III. Series.
QA279 .C73 2000
001.4'34—dc21 00-029529 CIP


Visit the CRC Press Web site at www.crcpress.com


Contents

Preface

1 Some general concepts
  1.1 Types of investigation
  1.2 Observational studies
  1.3 Some key terms
  1.4 Requirements in design
  1.5 Interplay between design and analysis
  1.6 Key steps in design
  1.7 A simplified model
  1.8 A broader view
  1.9 Bibliographic notes
  1.10 Further results and exercises

2 Avoidance of bias
  2.1 General remarks
  2.2 Randomization
  2.3 Retrospective adjustment for bias
  2.4 Some more on randomization
  2.5 More on causality
  2.6 Bibliographic notes
  2.7 Further results and exercises

3 Control of haphazard variation
  3.1 General remarks
  3.2 Precision improvement by blocking
  3.3 Matched pairs
  3.4 Randomized block design
  3.5 Partitioning sums of squares
  3.6 Retrospective adjustment for improving precision
  3.7 Special models of error variation
  3.8 Bibliographic notes
  3.9 Further results and exercises

4 Specialized blocking techniques
  4.1 Latin squares
  4.2 Incomplete block designs
  4.3 Cross-over designs
  4.4 Bibliographic notes
  4.5 Further results and exercises

5 Factorial designs: basic ideas
  5.1 General remarks
  5.2 Example
  5.3 Main effects and interactions
  5.4 Example: continued
  5.5 Two level factorial systems
  5.6 Fractional factorials
  5.7 Example
  5.8 Bibliographic notes
  5.9 Further results and exercises

6 Factorial designs: further topics
  6.1 General remarks
  6.2 Confounding in 2^k designs
  6.3 Other factorial systems
  6.4 Split plot designs
  6.5 Nonspecific factors
  6.6 Designs for quantitative factors
  6.7 Taguchi methods
  6.8 Conclusion
  6.9 Bibliographic notes
  6.10 Further results and exercises

7 Optimal design
  7.1 General remarks
  7.2 Some simple examples
  7.3 Some general theory
  7.4 Other optimality criteria
  7.5 Algorithms for design construction
  7.6 Nonlinear design
  7.7 Space-filling designs
  7.8 Bayesian design
  7.9 Optimality of traditional designs
  7.10 Bibliographic notes
  7.11 Further results and exercises

8 Some additional topics
  8.1 Scale of effort
  8.2 Adaptive designs
  8.3 Sequential regression design
  8.4 Designs for one-dimensional error structure
  8.5 Spatial designs
  8.6 Bibliographic notes
  8.7 Further results and exercises

A Statistical analysis
  A.1 Introduction
  A.2 Linear model
  A.3 Analysis of variance
  A.4 More general models; maximum likelihood
  A.5 Bibliographic notes
  A.6 Further results and exercises

B Some algebra
  B.1 Introduction
  B.2 Group theory
  B.3 Galois fields
  B.4 Finite geometries
  B.5 Difference sets
  B.6 Hadamard matrices
  B.7 Orthogonal arrays
  B.8 Coding theory
  B.9 Bibliographic notes
  B.10 Further results and exercises

C Computational issues
  C.1 Introduction
  C.2 Overview
  C.3 Randomized block experiment from Chapter 3
  C.4 Analysis of block designs in Chapter 4
  C.5 Examples from Chapter 5
  C.6 Examples from Chapter 6
  C.7 Bibliographic notes

References

List of tables


Preface

This book is an account of the major topics in the design of experiments, with particular emphasis on the key concepts involved and on the statistical structure associated with these concepts. While design of experiments is in many ways a very well developed area of statistics, it often receives less emphasis than methods of analysis in a programme of study in statistics.

We have written for a general audience concerned with statistics in experimental fields and with some knowledge of and interest in theoretical issues. The mathematical level is mostly elementary; occasional passages using more advanced ideas can be skipped or omitted without inhibiting understanding of later passages. Some specialized parts of the subject have extensive and specialized literatures, a few examples being incomplete block designs, mixture designs, designs for large variety trials, designs based on spatial stochastic models and designs constructed from explicit optimality requirements. We have aimed to give relatively brief introductions to these subjects eschewing technical detail.

To motivate the discussion we give outline Illustrations taken from a range of areas of application. In addition we give a limited number of Examples, mostly taken from the literature, used for the different purpose of showing detailed methods of analysis without much emphasis on specific subject-matter interpretation.

We have written a book about design not about analysis, although, as has often been pointed out, the two phases are inexorably interrelated. Therefore it is, in particular, not a book on the linear statistical model or that related but distinct art form the analysis of variance table. Nevertheless these topics enter and there is a dilemma in presentation. What do we assume the reader knows about these matters? We have solved this problem uneasily by somewhat downplaying analysis in the text, by assuming whatever is necessary for the section in question and by giving a review as an Appendix. Anyone using the book as a basis for a course of lectures will need to consider carefully what the prospective students are likely to understand about the linear model and to supplement the text appropriately. While the arrangement of chapters represents a logical progression of ideas, if interest is focused on a particular field of application it will be reasonable to omit certain parts or to take the material in a different order.

If defence of a book on the theory of the subject is needed it is this. Successful application of these ideas hinges on adapting general principles to the special constraints of individual applications. Thus experience suggests that while it is useful to know about special designs, balanced incomplete block designs for instance, it is rather rare that they can be used directly. More commonly they need some adaptation to accommodate special circumstances and to do this effectively demands a solid theoretical base.

This book has been developed from lectures given at Cambridge, Birkbeck College London, Vancouver, Toronto and Oxford. We are grateful to Amy Berrington, Mario Cortina Borja, Christl Donnelly, Peter Kupchak, Rahul Mukerjee, John Nelder, Rob Tibshirani and especially Grace Yun Yi for helpful comments on a preliminary version.

D.R. Cox and N. Reid
Oxford and Toronto
January 2000


List of tables

3.1 Strength index of cotton
3.2 Analysis of variance for strength index

4.1 5 × 5 Graeco-Latin square
4.2 Complete orthogonal set of 5 × 5 Latin squares
4.3 Balanced incomplete block designs
4.4 Two special incomplete block designs
4.5 Youden square
4.6 Intrablock analysis of variance
4.7 Interblock analysis of variance
4.8 Analysis of variance, general incomplete block design
4.9 Log dry weight of chick bones
4.10 Treatment means, log dry weight
4.11 Analysis of variance, log dry weight
4.12 Estimates of treatment effects
4.13 Expansion index of pastry dough
4.14 Unadjusted and adjusted treatment means
4.15 Example of intrablock analysis

5.1 Weights of chicks
5.2 Mean weights of chicks
5.3 Two factor analysis of variance
5.4 Analysis of variance, weights of chicks
5.5 Decomposition of treatment sum of squares
5.6 Analysis of variance, 2^2 factorial
5.7 Analysis of variance, 2^k factorial
5.8 Treatment factors, nutrition and cancer
5.9 Data, nutrition and cancer
5.10 Estimated effects, nutrition and cancer
5.11 Data, Exercise 5.6
5.12 Contrasts, Exercise 5.6


6.1 Example, double confounding
6.2 Degrees of freedom, double confounding
6.3 Two orthogonal Latin squares
6.4 Estimation of the main effect
6.5 1/3 fraction, degrees of freedom
6.6 Asymmetric orthogonal array
6.7 Supersaturated design
6.8 Example, split-plot analysis
6.9 Data, tensile strength of paper
6.10 Analysis, tensile strength of paper
6.11 Table of means, tensile strength of paper
6.12 Analysis of variance for a replicated factorial
6.13 Analysis of variance with a random effect
6.14 Analysis of variance, quality/quantity interaction
6.15 Design, electronics example
6.16 Data, electronics example
6.17 Analysis of variance, electronics example
6.18 Example, factorial treatment structure for incomplete block design

8.1 Treatment allocation: biased coin design
8.2 3 × 3 lattice squares for nine treatments
8.3 4 × 4 Latin square

A.1 Stagewise analysis of variance table
A.2 Analysis of variance, nested and crossed
A.3 Analysis of variance for Y_{abc;j}
A.4 Analysis of variance for Y_{(a;j)bc}

B.1 Construction of two orthogonal 5 × 5 Latin squares


CHAPTER 1

Some general concepts

1.1 Types of investigation

This book is about the design of experiments. The word experiment is used in a quite precise sense to mean an investigation where the system under study is under the control of the investigator. This means that the individuals or material investigated, the nature of the treatments or manipulations under study and the measurement procedures used are all settled, in their important features at least, by the investigator.

By contrast in an observational study some of these features, and in particular the allocation of individuals to treatment groups, are outside the investigator’s control.

Illustration. In a randomized clinical trial patients meeting clearly defined eligibility criteria and giving informed consent are assigned by the investigator by an impersonal method to one of two or more treatment arms, in the simplest case either to a new treatment, T, or to a standard treatment or control, C, which might be the best current therapy or possibly a placebo treatment. The patients are followed for a specified period and one or more measures of response recorded.

In a comparable observational study, data on the response variables might be recorded on two groups of patients, some of whom had received T and some C; the data might, for example, be extracted from a database set up during the normal running of a hospital clinic. In such a study, however, it would be unknown why each particular patient had received the treatment he or she had.

The form of data might be similar or even almost identical in the two contexts; the distinction lies in the firmness of the interpretation that can be given to the apparent differences in response between the two groups of patients.
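The "impersonal method" of assignment in the clinical trial illustration is, in modern practice, randomization, a topic taken up in Chapter 2. As a minimal sketch, assuming balanced allocation to two arms and purely hypothetical patient labels:

```python
import random

def randomize(patients, treatments=("T", "C"), seed=None):
    """Assign each eligible patient to a treatment arm by an
    impersonal method: build a list of arm labels as balanced as
    the sample size allows, then randomly permute it."""
    rng = random.Random(seed)
    n = len(patients)
    arms = [treatments[i % len(treatments)] for i in range(n)]
    rng.shuffle(arms)
    return dict(zip(patients, arms))

# Six hypothetical patients allocated between T and C.
allocation = randomize(["p1", "p2", "p3", "p4", "p5", "p6"], seed=1)
```

Because the allocation depends on nothing about the patients themselves, systematic differences between the arms can arise only by chance.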

Illustration. In an agricultural field trial, an experimental field is divided into plots of a size and shape determined by the investigator, subject to technological constraints. To each plot is assigned one of a number of fertiliser treatments, often combinations of various amounts of the basic constituents, and yield of product is measured.

In a comparable observational study of fertiliser practice a survey of farms or fields would give data on quantities of fertiliser used and yield, but the issue of why each particular fertiliser combination had been used in each case would be unclear and certainly not under the investigator’s control.

A common feature of these and other similar studies is that the objective is a comparison, of two medical treatments in the first example, and of various fertiliser combinations in the second. Many investigations in science and technology have this form. In very broad terms, in technological experiments the treatments under comparison have a very direct interest, whereas in scientific experiments the treatments serve to elucidate the nature of some phenomenon of interest or to test some research hypothesis. We do not, however, wish to emphasize distinctions between science and technology.

We translate the objective into that of comparing the responses among the different treatments. An experiment and an observational study may have identical objectives; the distinction between them lies in the confidence to be put in the interpretation.

Investigations done wholly or primarily in the laboratory are usually experimental. Studies of social science issues in the context in which they occur in the real world are usually inevitably observational, although sometimes elements of an experimental approach may be possible. Industrial studies at pilot plant level will typically be experimental whereas at a production level, while experimental approaches are of proved fruitfulness especially in the process industries, practical constraints may force some deviation from what is ideal for clarity of interpretation.

Illustration. In a survey of social attitudes a panel of individuals might be interviewed say every year. This would be an observational study designed to study and if possible explain changes of attitude over time. In such studies panel attrition, i.e. loss of respondents for one reason or another, is a major concern. One way of reducing attrition may be to offer small monetary payments a few days before the interview is due. An experiment on the effectiveness of this could take the form of randomizing individuals between one of two treatments, a monetary payment or no monetary payment. The response would be the successful completion or not of an interview.

1.2 Observational studies

While in principle the distinction between experiments and observational studies is clear cut and we wish strongly to emphasize its importance, nevertheless in practice the distinction can sometimes become blurred. Therefore we comment briefly on various forms of observational study and on their closeness to experiments.

It is helpful to distinguish between a prospective longitudinal study (cohort study), a retrospective longitudinal study, a cross-sectional study, and the secondary analysis of data collected for some other, for example, administrative purpose.

In a prospective study observations are made on individuals at entry into the study, the individuals are followed forward in time, and possible response variables recorded for each individual. In a retrospective study the response is recorded at entry and an attempt is made to look backwards in time for possible explanatory features. In a cross-sectional study each individual is observed at just one time point. In all these studies the investigator may have substantial control not only over which individuals are included but also over the measuring processes used. In a secondary analysis the investigator has control only over the inclusion or exclusion of the individuals for analysis.

In a general way the four possibilities are in decreasing order of effectiveness, the prospective study being closest to an experiment; they are also in decreasing order of cost.

Thus retrospective studies are subject to biases of recall but may often yield results much more quickly than corresponding prospective studies. In principle at least, observations taken at just one time point are likely to be less enlightening than those taken over time. Finally secondary analysis, especially of some of the large databases now becoming so common, may appear attractive. The quality of such data may, however, be low and there may be major difficulties in disentangling effects of different explanatory features, so that often such analyses are best regarded primarily as ways of generating ideas for more detailed study later.

In epidemiological applications, a retrospective study is often designed as a case-control study, whereby groups of patients with a disease or condition (cases) are compared to a hopefully similar group of disease-free patients on their exposure to one or more risk factors.

1.3 Some key terms

We shall return later to a more detailed description of the types of experiment to be considered but for the moment it is enough to consider three important elements to an experiment, namely the experimental units, the treatments and the response. A schematic version of an experiment is that there are a number of different treatments under study, the investigator assigns one treatment to each experimental unit and observes the resulting response.

Experimental units are essentially the patients, plots, animals, raw material, etc. of the investigation. More formally they correspond to the smallest subdivision of the experimental material such that any two different experimental units might receive different treatments.

Illustration. In some experiments in ophthalmology it might be sensible to apply different treatments to the left and to the right eye of each patient. Then an experimental unit would be an eye; that is, each patient would contribute two experimental units.

The treatments are clearly defined procedures one of which is to be applied to each experimental unit. In some cases the treatments are an unstructured set of two or more qualitatively different procedures. In others, including many investigations in the physical sciences, the treatments are defined by the levels of one or more quantitative variables, such as the amounts per square metre of the constituents nitrogen, potash and potassium, in the illustration in Section 1.1.

The response measurement specifies the criterion in terms of which the comparison of treatments is to be effected. In many applications there will be several such measures.

This simple formulation can be amplified in various ways. The same physical material can be used as an experimental unit more than once. If the treatment structure is complicated the experimental unit may be different for different components of treatment. The response measured may be supplemented by measurements on other properties, called baseline variables, made before allocation to treatment, and on intermediate variables between the baseline variables and the ultimate response.

Illustrations. In clinical trials there will typically be available numerous baseline variables such as age at entry, gender, and specific properties relevant to the disease, such as blood pressure, etc., all to be measured before assignment to treatment. If the key response is time to death, or more generally time to some critical event in the progression of the disease, intermediate variables might be properties measured during the study which monitor or explain the progression to the final response.

In an agricultural field trial possible baseline variables are chemical analyses of the soil in each plot and the yield on the plot in the previous growing season, although, so far as we are aware, the effectiveness of such variables as an aid to experimentation is limited. Possible intermediate variables are plant density, the number of plants per square metre, and assessments of growth at various intermediate points in the growing season. These would be included to attempt explanation of the reasons for the effect of fertiliser on yield of final product.

1.4 Requirements in design

The objective in the type of experiment studied here is the comparison of the effect of treatments on response. This will typically be assessed by estimates and confidence limits for the magnitude of treatment differences. Requirements on such estimates are essentially as follows. First systematic errors, or biases, are to be avoided. Next the effect of random errors should so far as feasible be minimized. Further it should be possible to make reasonable assessment of the magnitude of random errors, typically via confidence limits for the comparisons of interest. The scale of the investigation should be such as to achieve a useful but not unnecessarily high level of precision. Finally advantage should be taken of any special structure in the treatments, for example when these are specified by combinations of factors.

The relative importance of these aspects is different in different fields of study. For example in large clinical trials to assess relatively small differences in treatment efficacy, avoidance of systematic error is a primary issue. In agricultural field trials, and probably more generally in studies that do not involve human subjects, avoidance of bias, while still important, is not usually the aspect of main concern.

These objectives have to be secured subject to the practical constraints of the situation under study. The designs and considerations developed in this book have often to be adapted or modified to meet such constraints.

1.5 Interplay between design and analysis

There is a close connection between design and analysis in that an objective of design is to make both analysis and interpretation as simple and clear as possible. Equally, while some defects in design may be corrected by more elaborate analysis, there is nearly always some loss of security in the interpretation, i.e. in the underlying subject-matter meaning of the outcomes.

The choice of detailed model for analysis and interpretation will often involve subject-matter considerations that cannot readily be discussed in a general book such as this. Partly but not entirely for this reason we concentrate here on the analysis of continuously distributed responses via models that are usually linear, leading to analyses quite closely connected with the least-squares analysis of the normal theory linear model. One intention is to show that such default analyses follow from a single set of assumptions common to the majority of the designs we shall consider. In this rather special sense, the model for analysis is determined by the design employed. Of course we do not preclude the incorporation of special subject-matter knowledge and models where appropriate and indeed this may be essential for interpretation.

There is a wider issue involved especially when a number of different response variables are measured and underlying interpretation is the objective rather than the direct estimation of treatment differences. It is sensible to try to imagine the main patterns of response that are likely to arise and to consider whether the information will have been collected to allow the interpretation of these. This is a broader issue than that of reviewing the main scheme of analysis to be used. Such consideration must always be desirable; it is, however, considerably less than a prior commitment to a very detailed approach to analysis.

Two terms quite widely used in discussions of the design of experiments are balance and orthogonality. Their definition depends a bit on context but broadly balance refers to some strong symmetry in the combinatorial structure of the design, whereas orthogonality refers to special simplifications of analysis and achievement of efficiency consequent on such balance.

For example, in Chapter 3 we deal with designs for a number of treatments in which the experimental units are arranged in blocks. The design is balanced if each treatment occurs in each block the same number of times, typically once. If a treatment occurs once in some blocks and twice or not at all in others the design is considered unbalanced. On the other hand, in the context of balanced incomplete block designs studied in Section 4.2 the word balance refers to an extended form of symmetry.
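This first notion of balance can be made concrete. A sketch of laying out a complete randomized block design, with hypothetical treatment labels, might run as follows:

```python
import random

def randomized_blocks(treatments, n_blocks, seed=None):
    """Complete randomized block design: every treatment occurs
    exactly once in every block, with the order of treatments
    randomized independently within each block."""
    rng = random.Random(seed)
    design = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # fresh randomization for each block
        design.append(block)
    return design

# Four treatments in three blocks: balanced in the first sense above.
design = randomized_blocks(["A", "B", "C", "D"], n_blocks=3, seed=7)
```

Each block here contains every treatment once; dropping a treatment from some blocks, or doubling it in others, would make the design unbalanced in the sense of the text.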

In analyses involving a linear model, and most of our discussion centres on these, two types of effect are orthogonal if the relevant columns of the matrix defining the linear model are orthogonal in the usual algebraic sense. One consequence is that the least squares estimates of one of the effects are unchanged if the other type of effect is omitted from the model. For orthogonality some kinds of balance are sufficient but not necessary. In general statistical theory there is an extended notion of orthogonality based on the Fisher information matrix and this is relevant when maximum likelihood analysis of more complicated models is considered.
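This consequence for least squares can be checked directly. The sketch below uses a hypothetical replicated 2^2 factorial with plus/minus one coding (not an example from the book): the two effect columns are orthogonal, and the estimate of one effect is unchanged when the other is dropped from the model.

```python
import numpy as np

# A 2x2 factorial, replicated twice, with +/-1 coding: the
# columns for the two factor effects are orthogonal.
A = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
B = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
assert A @ B == 0  # orthogonal columns

rng = np.random.default_rng(0)
y = 2.0 + 1.5 * A - 0.5 * B + rng.normal(0, 0.1, 8)

X_full = np.column_stack([np.ones(8), A, B])
X_red = np.column_stack([np.ones(8), A])  # effect of B omitted

beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
beta_red, *_ = np.linalg.lstsq(X_red, y, rcond=None)

# Orthogonality: the least squares estimate of the A effect is
# the same whether or not the B effect is in the model.
assert np.isclose(beta_full[1], beta_red[1])
```

With non-orthogonal columns, say from an unbalanced layout, the two fits would generally give different estimates of the A effect.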

1.6 Key steps in design

1.6.1 General remarks

Clearly the single most important aspect of design is a purely substantive, i.e. subject-matter, one. The issues addressed should be interesting and fruitful. Usually this means examining one or more well formulated questions or research hypotheses, for example a speculation about the process underlying some phenomenon, or the clarification and explanation of earlier findings. Some investigations may have a less focused objective. For example, the initial phases of a study of an industrial process under production conditions may have the objective of identifying which few of a large number of potential influences are most important. The methods of Section 5.6 are aimed at such situations, although they are probably atypical and in most cases the more specific the research question the better.

In principle therefore the general objectives lead to the following more specific issues. First the experimental units must be defined and chosen. Then the treatments must be clearly defined. The variables to be measured on each unit must be specified and finally the size of the experiment, in particular the number of experimental units, has to be decided.

1.6.2 Experimental units

Issues concerning experimental units are to some extent very specific to each field of application. Some points that arise fairly generally and which influence the discussion in this book include the following.

Sometimes, especially in experiments with a technological focus, it is useful to consider the population of ultimate interest and the population of accessible individuals and to aim at conclusions that will bridge the inevitable gap between these. This is linked to the question of whether units should be chosen to be as uniform as possible or to span a range of circumstances. Where the latter is sensible it will be important to impose a clear structure on the experimental units; this is connected with the issue of the choice of baseline measurements.

Illustration. In agricultural experimentation with an immediate objective of making recommendations to farmers it will be important to experiment in a range of soil and weather conditions; a very precise conclusion in one set of conditions may be of limited value. Interpretation will be much simplified if the same basic design is used at each site. There are somewhat similar considerations in some clinical trials, pointing to the desirability of multi-centre trials even if a trial in one centre would in principle be possible.

By contrast in experiments aimed at elucidating the nature of certain processes or mechanisms it will usually be best to choose units likely to show the effect in question in as striking a form as possible and to aim for a high degree of uniformity across units.

In some contexts the same individual animal, person or material may be used several times as an experimental unit; for example in a psychological experiment it would be common to expose the same subject to various conditions (treatments) in one session.

It is important in much of the following discussion and in applications to distinguish between experimental units and observations. The key notion is that different experimental units must in principle be capable of receiving different treatments.


Illustration. In an industrial experiment on a batch process each separate batch of material might form an experimental unit to be processed in a uniform way, separate batches being processed possibly differently. On the product of each batch many samples may be taken to measure, say, purity of the product. The number of observations of purity would then be many times the number of experimental units. Variation between repeat observations within a batch measures sampling variability and internal variability of the process. Precision of the comparison of treatments is, however, largely determined by, and must be estimated from, variation between batches receiving the same treatment. In our theoretical treatment that follows the number of batches is thus the relevant total “sample” size.
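With hypothetical numbers, the distinction drawn in this illustration between within-batch sampling variability and between-batch variability can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical purity data: 4 batches given the same treatment,
# each with its own batch-level offset, and 5 repeat samples per
# batch subject only to small sampling noise.
batch_effects = np.array([-1.5, -0.5, 0.5, 1.5])
purity = 90.0 + batch_effects[:, None] + rng.normal(0, 0.3, (4, 5))

batch_means = purity.mean(axis=1)

# Mean of the within-batch sample variances: sampling variability
# plus internal variability of the process.
within_ms = purity.var(axis=1, ddof=1).mean()
# Variance of the batch means: the batch is the experimental unit,
# so this is what governs the precision of treatment comparisons.
between_var = batch_means.var(ddof=1)
```

Taking the 20 purity readings as if they were 20 independent units would overstate the precision; the relevant "sample" size here is the 4 batches.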

1.6.3 Treatments

The simplest form of experiment compares a new treatment or manipulation, T, with a control, C. Even here care is needed in applications. In principle T has to be specified with considerable precision, including many details of its mode of implementation. The choice of control, C, may also be critical. In some contexts several different control treatments may be desirable. Ideally the control should be such as to isolate precisely that aspect of T which it is the objective to examine.

Illustration. In a clinical trial to assess a new drug, the choice of control may depend heavily on the context. Possible choices of control might be no treatment, a placebo treatment, i.e. a substance superficially indistinguishable from T but known to be pharmacologically inactive, or the best currently available therapy. The choice between placebo and best available treatment may in some clinical trials involve difficult ethical decisions.

In more complex situations there may be a collection of qualitatively different treatments T1, . . . , Tv. More commonly the treatments may have factorial structure, i.e. be formed from combinations of levels of subtreatments, called factors. We defer detailed study of the different kinds of factor and the design of factorial experiments until Chapter 5, noting that sensible use of the principle of examining several factors together in one study is one of the most powerful ideas in this subject.


1.6.4 Measurements

The choice of appropriate variables for measurement is a key aspect of design in the broad sense. The nature of measurement processes and their associated potentiality for error, and the different kinds of variable that can be measured and their purposes are central issues. Nevertheless these issues fall outside the scope of the present book and we merely note three broad types of variable, namely baseline variables describing the experimental units before application of treatments, intermediate variables and response variables, in a medical context often called end-points.

Intermediate variables may serve different roles. Usually the more important is to provide some provisional explanation of the process that leads from treatment to response. Other roles are to check on the absence of untoward interventions and, sometimes, to serve as surrogate response variables when the primary response takes a long time to obtain.

Sometimes the response on an experimental unit is in effect a time trace, for example of the concentrations of one or more substances as transient functions of time after some intervention. For our purposes we suppose such responses replaced by one or more summary measures, such as the peak response or the area under the response-time curve.
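Such summary measures are straightforward to compute from a recorded trace. A minimal sketch (the times and concentrations below are invented purely for illustration), using the trapezoidal rule for the area under the response-time curve:

```python
def auc_trapezoid(times, values):
    """Area under the response-time curve by the trapezoidal rule."""
    return sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))

# Hypothetical trace: concentration measured at 0, 1, 2, 3 hours.
times = [0.0, 1.0, 2.0, 3.0]
conc = [0.0, 4.0, 2.0, 1.0]
peak = max(conc)                  # peak response: 4.0
auc = auc_trapezoid(times, conc)  # area under curve: 6.5
```

In practice the reduction chosen should reflect the scientific question; the peak and the area summarize quite different aspects of the trace.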

Clear decisions about the variables to be measured, especially the response variables, are crucial.

1.6.5 Size of experiment

Some consideration virtually always has to be given to the number of experimental units to be used and, where subsampling of units is employed, to the number of repeat observations per unit. A balance has to be struck between the marginal cost per experimental unit and the increase in precision achieved per additional unit. Except in rare instances where these costs can both be quantified, a decision on the size of experiment is bound to be largely a matter of judgement and some of the more formal approaches to determining the size of the experiment have spurious precision. It is, however, very desirable to make an advance approximate calculation of the precision likely to be achieved. This gives some protection against wasting resources on unnecessary precision or, more commonly, against undertaking investigations which will be


of such low precision that useful conclusions are very unlikely. The same calculations are advisable when, as is quite common in some fields, the maximum size of the experiment is set by constraints outside the control of the investigator. The issue is then most commonly to decide whether the resources are sufficient to yield enough precision to justify proceeding at all.
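The advance calculation of precision mentioned above can be sketched numerically. For the simple comparison of two treatments with r units each, developed in Chapter 2, the variance of the estimated treatment difference is 2σ²/r, where σ is the between-unit standard deviation; the numerical inputs below are invented for illustration:

```python
import math

def difference_se(sigma, r):
    """Standard error of the estimated difference between two treatment
    means, with r units per treatment and between-unit standard
    deviation sigma: sqrt(2 * sigma^2 / r)."""
    return math.sqrt(2 * sigma**2 / r)

def units_needed(sigma, target_se):
    """Smallest r per treatment giving a standard error <= target_se."""
    return math.ceil(2 * sigma**2 / target_se**2)

# Illustrative guesses only: sigma = 4 response units, target s.e. = 1.
r = units_needed(4.0, 1.0)   # 32 units per treatment
se = difference_se(4.0, r)   # 1.0
```

A rough prior guess of σ is all that is needed; the point of the exercise is an order-of-magnitude check, not spurious precision.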

1.7 A simplified model

The formulation of experimental design that will largely be used in this book is as follows. There are given n experimental units, U1, . . . , Un, and v treatments, T1, . . . , Tv; one treatment is applied to each unit as specified by the investigator, and one response Y measured on each unit. The objective is to specify procedures for allocating treatments to units and for the estimation of the differences between treatments in their effect on response.

This is a very limited version of the broader view of design sketched above. The justification for it is that many of the valuable specific designs are accommodated in this framework, whereas the wider considerations sketched above are often so subject-specific that it is difficult to give a general theoretical discussion.

It is, however, very important to recall throughout that the path between the choice of a unit and the measurement of final response may be a long one in time and in other respects and that random and systematic error may arise at many points. Controlling for random error and aiming to eliminate systematic error is thus not a single step matter as might appear in our idealized model.

1.8 A broader view

The discussion above and in the remainder of the book concentrates on the integrity of individual experiments. Yet investigations are rarely if ever conducted in isolation; one investigation almost inevitably suggests further issues for study and there is commonly the need to establish links with work related to the current problems, even if only rather distantly. These are important matters but again are difficult to incorporate into formal theoretical discussion.

If a given collection of investigations estimate formally the same contrasts, the statistical techniques for examining mutual consistency of the different estimates and, subject to such consistency, of combining the information are straightforward. Difficulties come


more from the choice of investigations for inclusion, issues of genuine comparability and of the resolution of apparent inconsistencies.

While we take the usual objective of the investigation to be the comparison of responses from different treatments, sometimes there is a more specific objective which has an impact on the design to be employed.

Illustrations. In some kinds of investigation in the chemical process industries, the treatments correspond to differing concentrations of various reactants and to variables such as pressure, temperature, etc. For some purposes it may be fruitful to regard the objective as the determination of conditions that will optimize some criterion such as yield of product or yield of product per unit cost. Such an explicitly formulated purpose, if adopted as the sole objective, will change the approach to design.

In selection programmes for, say, varieties of wheat, the investigation may start with a very large number of varieties, possibly several hundred, for comparison. A certain fraction of these are chosen for further study and in a third phase a small number of varieties are subject to intensive study. The initial stage has inevitably very low precision for individual comparisons and analysis of the design strategy to be followed best concentrates on such issues as the proportion of varieties to be chosen at each phase, the relative effort to be devoted to each phase and in general on the properties of the whole process and the properties of the varieties ultimately selected rather than on the estimation of individual differences.

In the pharmaceutical industry clinical trials are commonly defined as Phase I, II or III, each of which has quite well-defined objectives. Phase I trials aim to establish relevant dose levels and toxicities, Phase II trials focus on a narrowly selected group of patients expected to show the most dramatic response, and Phase III trials are a full investigation of the treatment effects on patients broadly representative of the clinical population.

In investigations with some technological relevance, even if there is not an immediate focus on a decision to be made, questions will arise as to the practical implications of the conclusions. Is a difference established big enough to be of public health relevance in an epidemiological context, of relevance to farmers in an agricultural context or of engineering relevance in an industrial context? Do the conditions of the investigation justify extrapolation to the working context? To some extent such questions can be anticipated by appropriate design.

In both scientific and technological studies estimation of effects is likely to lead on to the further crucial question: what is the underlying process explaining what has been observed? Sometimes this is expressed via a search for causality. So far as possible these questions should be anticipated in design, especially in the definition of treatments and observations, but it is relatively rare for such explanations to be other than tentative and indeed they typically raise fresh issues for investigation.

It is sometimes argued that quite firm conclusions about causality are justified from experiments in which treatment allocation is made by objective randomization but not otherwise, it being particularly hazardous to draw causal conclusions from observational studies.

These issues are somewhat outside the scope of the present book but will be touched on in Section 2.5 after the discussion of the role of randomization. In the meantime some of the potential implications for design can be seen from the following Illustration.

Illustration. In an agricultural field trial a number of treatments are randomly assigned to plots, the response variable being the yield of product. One treatment, S, say, produces a spectacular growth of product, much higher than that from other treatments. The growth attracts birds from many kilometres around, the birds eat most of the product and as a result the final yield for S is very low. Has S caused a depression in yield?

The point of this illustration, which can be paralleled from other areas of application, is that the yield on the plots receiving S is indeed lower than the yield would have been on those plots had they been allocated to other treatments. In that sense, which meets one of the standard definitions of causality, allocation to S has thus caused a lowered yield. Yet in terms of understanding, and indeed practical application, that conclusion on its own is quite misleading. To understand the process leading to the final responses it is essential to observe and take account of the unanticipated intervention, the birds, which was supplementary to and dependent on the primary treatments. Preferably also intermediate variables should be recorded, for example, number of plants per square metre and measures of growth at various time points in the growing cycle. These will enable at least a tentative account to be developed of


the process leading to the treatment differences in final yield which are the ultimate objective of study. In this way not only are treatment differences estimated but some partial understanding is built of the interpretation of such differences. This is a potentially causal explanation at a deeper level.

Such considerations may arise especially in situations in which a fairly long process intervenes between treatment allocation and the measurement of response.

These issues are quite pressing in some kinds of clinical trial, especially those in which patients are to be followed for an appreciable time. In the simplest case of randomization between two treatments, T and C, there is the possibility that some patients, called noncompliers, do not follow the regime to which they have been allocated. Even those who do comply may take supplementary medication and the tendency to do this may well be different in the two treatment groups. One approach to analysis, the so-called intention-to-treat principle, can be summarized in the slogan "ever randomized always analysed": one simply compares outcomes in the two treatment arms regardless of compliance or noncompliance. The argument, parallel to the argument in the agricultural example, is that if, say, patients receiving T do well, even if few of them comply with the treatment regimen, then the consequences of allocation to T are indeed beneficial, even if not necessarily because of the direct consequences of the treatment regimen.

Unless noncompliance is severe, the intention-to-treat analysis will be one important analysis but a further analysis taking account of any appreciable noncompliance seems very desirable. Such an analysis will, however, have some of the features of an observational study and the relatively clearcut conclusions of the analysis of a fully compliant study will be lost to some extent at least.

1.9 Bibliographic notes

While many of the ideas of experimental design have a long history, the first major systematic discussion was by R. A. Fisher (1926) in the context of agricultural field trials, subsequently developed into his magisterial book (Fisher, 1935 and subsequent editions). Yates in a series of major papers developed the subject much further; see especially Yates (1935, 1936, 1937). Applications were initially largely in agriculture and the biological sciences and then subsequently in industry. The paper by Box and Wilson (1951) was


particularly influential in an industrial context. Recent industrial applications have been particularly associated with the name of the Japanese engineer, G. Taguchi. General books on scientific research that include some discussion of experimental design include Wilson (1952) and Beveridge (1952).

Of books on the subject, Cox (1958) emphasizes general principles in a qualitative discussion, Box, Hunter and Hunter (1978) emphasize industrial experiments and Hinkelman and Kempthorne (1994), a development of Kempthorne (1952), is closer to the originating agricultural applications. Piantadosi (1997) gives a thorough account of the design and analysis of clinical trials.

Vajda (1967a, 1967b) and Street and Street (1987) emphasize the combinatorial problems of design construction. Many general books on statistical methods have some discussion of design but tend to put their main emphasis on analysis; see especially Montgomery (1997). For very careful and systematic expositions with some emphasis respectively on industrial and biometric applications, see Dean and Voss (1999) and Clarke and Kempson (1997).

An annotated bibliography of papers up to the late 1960's is given by Herzberg and Cox (1969).

The notion of causality has a very long history although traditionally from a nonprobabilistic viewpoint. For accounts with a statistical focus, see Rubin (1974), Holland (1986), Cox (1992) and Cox and Wermuth (1996; section 8.7). Rather different views of causality are given by Dawid (2000), Lauritzen (2000) and Pearl (2000). For a discussion of compliance in clinical trials, see the papers edited by Goetghebeur and van Houwelingen (1998).

New mathematical developments in the design of experiments may be found in the main theoretical journals. More applied papers may also contain ideas of broad interest. For work with a primarily industrial focus, see Technometrics, for general biometric material, see Biometrics, for agricultural issues see the Journal of Agricultural Science and for specialized discussion connected with clinical trials see Controlled Clinical Trials, Biostatistics and Statistics in Medicine. Applied Statistics contains papers with a wide range of applications.

1.10 Further results and exercises

1. A study of the association between car telephone usage and accidents was reported by Redelmeier and Tibshirani (1997a)


and a further careful account discussed in detail the study design (Redelmeier and Tibshirani, 1997b). A randomized trial was infeasible on ethical grounds, and the investigators decided to conduct a case-control study. The cases were those individuals who had been in an automobile collision involving property damage (but not personal injury), who owned car phones, and who consented to having their car phone usage records reviewed.

(a) What considerations would be involved in finding a suitable control for each case?

(b) The investigators decided to use each case as his own control, in a specialized version of a case-control study called a case-crossover study. A "case driving period" was defined to be the ten minutes immediately preceding the collision. What considerations would be involved in determining the control period?

(c) An earlier study compared the accident rates of a group of drivers who owned cellular telephones to a group of drivers who did not, and found lower accident rates in the first group. What potential biases could affect this comparison?

2. A prospective case-crossover experiment to investigate the effect of alcohol on blood œstradiol levels was reported by Ginsberg et al. (1996). Two groups of twelve healthy postmenopausal women were investigated. One group was regularly taking œstrogen replacement therapy and the second was not. On the first day half the women in each group drank an alcoholic cocktail, and the remaining women had a similar juice drink without alcohol. On the second day the women who first had alcohol were given the plain juice drink and vice versa. In this manner it was intended that each woman serve as her own control.

(a) What precautions might well have been advisable in such a context to avoid bias?

(b) What features of an observational study does this study have?

(c) What features of an experiment does this study have?

3. Find out details of one or more medical studies the conclusions from which have been reported in the press recently. Were they experiments or observational studies? Is the design (or analysis) open to serious criticism?


4. In an experiment to compare a number of alternative ways of treating back pain, pain levels are to be assessed before and after a period of intensive treatment. Think of a number of ways in which pain levels might be measured and discuss their relative merits. What measurements other than pain levels might be advisable?

5. As part of a study of the accuracy and precision of laboratory chemical assays, laboratories are provided with a number of nominally identical specimens for analysis. They are asked to divide each specimen into two parts and to report the separate analyses. Would this provide an adequate measure of reproducibility? If not recommend a better procedure.

6. Some years ago there was intense interest in the possibility that cloud-seeding by aircraft depositing silver iodide crystals on suitable cloud would induce rain. Discuss some of the issues likely to arise in studying the effect of cloud-seeding.

7. Preece et al. (1999) simulated the effect of mobile phone signals on cognitive function as follows. Subjects wore a headset and were subject to (i) no signal, (ii) a 915 MHz sine wave analogue signal, (iii) a 915 MHz sine wave modulated by a 217 Hz square wave. There were 18 subjects, and each of the six possible orders of the three conditions was used three times. After two practice sessions the three experimental conditions were used for each subject with 48 hours between tests. During each session a variety of computerized tests of mental efficiency were administered. The main result was that a particular reaction time was shorter under condition (iii) than under (i) and (ii) but that for 14 other types of measurement there were no clear differences. Discuss the appropriateness of the control treatments and the extent to which stability of treatment differences across sessions might be examined.

8. Consider the circumstances under which the use of two different control groups might be valuable. For discussion of this for observational studies, where the idea is more commonly used, see Rosenbaum (1987).


CHAPTER 2

Avoidance of bias

2.1 General remarks

In Section 1.4 we stated a primary objective in the design of experiments to be the avoidance of bias, or systematic error. There are essentially two ways to reduce the possibility of bias. One is the use of randomization and the other the use in analysis of retrospective adjustments for perceived sources of bias. In this chapter we discuss randomization and retrospective adjustment in detail, concentrating to begin with on a simple experiment to compare two treatments T and C. Although bias removal is a primary objective of randomization, we discuss also its important role in giving estimates of errors of estimation.

2.2 Randomization

2.2.1 Allocation of treatments

Given n = 2r experimental units we have to determine which are to receive T and which C. In most contexts it will be reasonable to require that the same numbers of units receive each treatment, so that the issue is how to allocate r units to T. Initially we suppose that there is no further information available about the units. Empirical evidence suggests that methods of allocation that involve ill-specified personal choices by the investigator are often subject to bias. This and, in particular, the need to establish publicly independence from such biases suggest that a wholly impersonal method of allocation is desirable. Randomization is a very important way to achieve this: we choose r units at random out of the 2r. It is of the essence that randomization means the use of an objective physical device; it does not mean that allocation is vaguely haphazard or even that it is done in a way that looks effectively random to the investigator.


Illustrations. One aspect of randomization is its use to conceal the treatment status of individuals. Thus in an examination of the reliability of laboratory measurements specimens could be sent for analysis of which some are from different individuals and others duplicate specimens from the same individual. Realistic assessment of precision would demand concealment of which were the duplicates and hidden randomization would achieve this.

The terminology "double-blind" is often used in published accounts of clinical trials. This usually means that the treatment status of each patient is concealed both from the patient and from the treating physician. In a triple-blind trial it would be aimed to conceal the treatment status as well from the individual assessing the end-point response.

There are a number of ways that randomization can be achieved in a simple experiment with just two treatments. Suppose initially that the units available are numbered U1, . . . , Un. Then all (2r)!/(r!r!) possible samples of size r may in principle be listed and one chosen to receive T, giving each such sample equal chance of selection. Another possibility is that one unit may be chosen at random out of U1, . . . , Un to receive T, then a second out of the remainder and so on until r have been chosen, the remainder receiving C. Finally, the units may be numbered 1, . . . , n, a random permutation applied, and the first r units allocated, say, to T.

It is not hard to show that these three procedures are equivalent. Usually randomization is subject to certain balance constraints aimed for example to improve precision or interpretability, but the essential features are those illustrated here. This discussion assumes that the randomization is done in one step. If units are accrued into the experiment in sequence over time different procedures will be needed to achieve the same objective; see Section 2.4.
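The three procedures can be sketched in code (a hypothetical illustration using Python's random module; each function returns the set of unit indices allocated to T):

```python
import random
from itertools import combinations

def allocate_by_sampling(n, r):
    """List all (2r)!/(r! r!) subsets of size r and pick one at random."""
    subsets = list(combinations(range(n), r))
    return set(random.choice(subsets))

def allocate_sequentially(n, r):
    """Draw units one at a time without replacement until r are chosen;
    the remainder receive C."""
    remaining = list(range(n))
    chosen = set()
    for _ in range(r):
        u = random.choice(remaining)
        remaining.remove(u)
        chosen.add(u)
    return chosen

def allocate_by_permutation(n, r):
    """Apply a random permutation to the units and give T to the first r."""
    units = list(range(n))
    random.shuffle(units)
    return set(units[:r])

# Each scheme gives every subset of r units the same probability
# 1 / C(2r, r) of receiving T, which is the sense in which the three
# procedures are equivalent.
treated = allocate_by_permutation(8, 4)
```

In practice a trustworthy pseudo-random generator, or a physical device, plays the role of the "objective physical device" insisted on in the text.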

2.2.2 The assumption of unit-treatment additivity

We base our initial discussion on the following assumption that can be regarded as underlying also many of the more complex designs and analyses developed later. We state the assumption for a general problem with v treatments, T1, . . . , Tv, using the simpler notation T, C for the special case v = 2.


Assumption of unit-treatment additivity. There exist constants, ξs for s = 1, . . . , n, one for each unit, and constants τj, j = 1, . . . , v, one for each treatment, such that if Tj is allocated to Us the resulting response is

ξs + τj, (2.1)

regardless of the allocation of treatments to other units.

The assumption is based on a full specification of responses corresponding to any possible treatment allocated to any unit, i.e. for each unit v possible responses are postulated. Now only one of these can be observed, namely that for the treatment actually implemented on that unit. The other v − 1 responses are counterfactuals. Thus the assumption can be tested at most indirectly by examining some of its observable consequences.

An apparently serious limitation of the assumption is its deterministic character. That is, it asserts that the difference between the responses for any two treatments is exactly the same for all units. In fact all the consequences that we shall use follow from the more plausible extended version in which some variation in treatment effect is allowed.

Extended assumption of unit-treatment additivity. With otherwise the same notation and assumptions as before, we assume that the response if Tj is applied to Us is

ξs + τj + ηjs, (2.2)

where the ηjs are independent and identically distributed random variables which may, without loss of generality, be taken to have zero mean.

Thus the treatment difference between any two treatments on any particular unit s is modified by addition of a random term which is the difference of the two random terms in the original specification.

The random terms ηjs represent two sources of variation. The first is technical error and represents an error of measurement or sampling. To this extent its variance can be estimated if independent duplicate measurements or samples are taken. The second is real variation in treatment effect from unit to unit, or what will later be called treatment by unit interaction, and this cannot be estimated separately from variation among the units, i.e. the variation of the ξs.

For simplicity all the subsequent calculations using the assumption of unit-treatment additivity will be based on the simple version of the assumption, but it can be shown that the conclusions all hold under the extended form.

The assumption of unit-treatment additivity is not directly testable, as only one outcome can be observed on each experimental unit. However, it can be indirectly tested by examining some of its consequences. For example, it may be possible to group units according to some property, using supplementary information about the units. Then if the treatment effect is estimated separately for different groups the results should differ only by random errors of estimation.

The assumption of unit-treatment additivity depends on the particular form of response used, and is not invariant under nonlinear transformations of the response. For example the effect of treatments on a necessarily positive response might plausibly be multiplicative, suggesting, for some purposes, a log transformation. Unit-treatment additivity implies also that the variance of response is the same for all treatments, thus allowing some test of the assumption without further information about the units and, in some cases at least, allowing a suitable scale for response to be estimated from the data on the basis of achieving constant variance.

Under the assumption of unit-treatment additivity it is sometimes reasonable to call the difference between τj1 and τj2 the causal effect of Tj1 compared with Tj2. It measures the difference, under the general conditions of the experiment, between the response under Tj1 and the response that would have been obtained under Tj2.

In general, the formulation as ξs + τj is overparameterized and a constraint such as Στj = 0 can be imposed without loss of generality. For two treatments it is more symmetrical to write the responses under T and C as respectively

ξs + δ, ξs − δ, (2.3)

so that the treatment difference of interest is ∆ = 2δ.

The assumption that the response on one unit is unaffected by which treatment is applied to another unit needs especial consideration when physically the same material is used as an experimental unit more than once. We return to further discussion of this point in Section 4.3.


2.2.3 Equivalent linear model

The simplest model for the comparison of two treatments, T and C, in which all variation other than the difference between treatments is regarded as totally random, is a linear model of the following form. Represent the observations on T and on C by random variables

YT1, . . . , YTr; YC1, . . . , YCr (2.4)

and suppose that

E(YTj) = µ + δ, E(YCj) = µ − δ. (2.5)

This is merely a convenient reparameterization of a model assigning the two groups arbitrary expected values. Equivalently we write

YTj = µ + δ + εTj, YCj = µ − δ + εCj, (2.6)

where the random variables ε have by definition zero expectation.

To complete the specification more has to be set out about the distribution of the ε. We identify two possibilities.

Second moment assumption. The εj are mutually uncorrelated and all have the same variance, σ2.

Normal theory assumption. The εj are independently normally distributed with constant variance.

The least squares estimate of ∆, the difference of the two means, is

∆̂ = ȲT. − ȲC., (2.7)

where ȲT. is the mean response on the units receiving T, and ȲC. is the mean response on the units receiving C. Here and throughout we denote summation over a subscript by a full stop. The residual mean square is

s2 = {Σ(YTj − ȲT.)2 + Σ(YCj − ȲC.)2}/(2r − 2). (2.8)

Defining the estimated variance of ∆̂ by

evar(∆̂) = 2s2/r, (2.9)

we have under (2.6) and the second moment assumptions that

E(∆̂) = ∆, (2.10)
E{evar(∆̂)} = var(∆̂). (2.11)
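The quantities in (2.7)-(2.9) are elementary to compute; a minimal sketch with invented responses:

```python
def two_sample_estimates(y_t, y_c):
    """For equal group sizes r, return the estimated treatment
    difference (mean of T minus mean of C), the residual mean square
    s^2 pooled over both groups with 2r - 2 degrees of freedom, and
    the estimated variance 2 s^2 / r of the difference."""
    r = len(y_t)
    assert len(y_c) == r, "equal group sizes assumed"
    mt = sum(y_t) / r
    mc = sum(y_c) / r
    delta_hat = mt - mc
    ss = sum((y - mt)**2 for y in y_t) + sum((y - mc)**2 for y in y_c)
    s2 = ss / (2 * r - 2)
    return delta_hat, s2, 2 * s2 / r

# Invented data, r = 4 units per treatment.
y_t = [11.2, 9.8, 10.5, 10.9]
y_c = [8.9, 9.4, 8.1, 9.0]
delta_hat, s2, evar = two_sample_estimates(y_t, y_c)
```

This is just the standard two-sample analysis; its justification under randomization rather than under a sampling model is the subject of Section 2.2.4.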

The optimality properties of the estimates of ∆ and var(∆̂) under both the second moment assumption and the normal theory


assumption follow from the same results in the general linear model and are detailed in Appendix A. For example, under the second moment assumption ∆̂ is the minimum variance unbiased estimate that is linear in the observations, and under normality is the minimum variance estimate among all unbiased estimates. Of course, such optimality considerations, while reassuring, take no account of special concerns such as the presence of individual defective observations.

2.2.4 Randomization-based analysis

We now develop a conceptually different approach to the analysis assuming unit-treatment additivity and regarding probability as entering only via the randomization used in allocating treatments to the experimental units.

We again write random variables representing the observations on T and C respectively

YT1, . . . , YTr, (2.12)
YC1, . . . , YCr, (2.13)

where the order is that obtained by, say, the second scheme of randomization specified in Section 2.2.1. Thus YT1, for example, is equally likely to arise from any of the n = 2r experimental units. With PR denoting the probability measure induced over the experimental units by the randomization, we have that, for example,

PR(YTj ∈ Us) = (2r)−1,
PR(YTj ∈ Us, YCk ∈ Ut, s ≠ t) = {2r(2r − 1)}−1, (2.14)

where unit Us is the jth to receive T.

Suppose now that we estimate both ∆ = 2δ and its standard error by the linear model formulae for the comparison of two independent samples, given in equations (2.7) and (2.9). The properties of these estimates under the probability distribution induced by the randomization can be obtained, and the central results are that in parallel to (2.10) and (2.11),

ER(∆̂) = ∆, (2.15)
ER{evar(∆̂)} = varR(∆̂), (2.16)

where ER and varR denote expectation and variance calculated under the randomization distribution.


We may call these second moment properties of the randomization distribution. They are best understood by examining a simple special case, for instance n = 2r = 4, when the 4! = 24 distinct permutations lead to six effectively different treatment arrangements.
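This special case is small enough to enumerate exhaustively. A sketch under unit-treatment additivity (the unit constants ξs and the value of δ below are invented) that averages the estimated difference over the six equally likely arrangements and recovers ∆ = 2δ, illustrating (2.15):

```python
from itertools import combinations

def randomization_mean(xi, delta):
    """Enumerate every allocation of r = n/2 units to T; under
    unit-treatment additivity unit s responds xi[s] + delta under T
    and xi[s] - delta under C. Return the average of the estimated
    difference over all allocations, and the number of allocations."""
    n = len(xi)
    r = n // 2
    total, count = 0.0, 0
    for treated in combinations(range(n), r):
        t_mean = sum(xi[s] + delta for s in treated) / r
        c_mean = sum(xi[s] - delta for s in range(n) if s not in treated) / r
        total += t_mean - c_mean
        count += 1
    return total / count, count

# Invented unit constants; Delta = 2 * delta = 3.0.
mean_est, n_arrangements = randomization_mean([2.0, 5.0, 1.0, 7.0], 1.5)
# n_arrangements is 6, and mean_est equals Delta: the unit constants
# cancel on averaging because each subset and its complement are
# equally likely.
```

The cancellation in the comment is exactly the symmetry argument used in the more general proof below.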

The simplest proof of (2.15) and (2.16) is obtained by introducing indicator random variables with, for the sth unit, Is taking values 1 or 0 according as T or C is allocated to that unit.

The contribution of the sth unit to the sample total for YT. is thus

Is(ξs + δ), (2.17)

whereas the contribution for C is

(1 − Is)(ξs − δ). (2.18)

Thus
∆̂ = Σ{Is(ξs + δ) − (1 − Is)(ξs − δ)}/r (2.19)

and the probability properties follow from those of Is. A more elegant and general argument is in outline as follows:

YT. − YC. = ∆ + L(ξ), (2.20)

where L(ξ) is a linear combination of the ξ's depending on the particular allocation. Now ER(L) is a symmetric linear function, i.e. is invariant under permutation of the units. Therefore

ER(L) = aΣξs, (2.21)

say. But if ξs = ξ is constant for all s, then L = 0, which implies a = 0.

Similarly both varR(∆̂) and ER{evar(∆̂)} do not depend on ∆ and are symmetric second degree functions of ξ1, . . . , ξ2r vanishing if all ξs are equal. Hence

varR(∆̂) = b1Σ(ξs − ξ.)²,
ER{evar(∆̂)} = b2Σ(ξs − ξ.)²,

where b1, b2 are constants depending only on n. To find the b's we may choose any special ξ's, such as ξ1 = 1, ξs = 0 (s ≠ 1), or suppose that ξ1, . . . , ξn are independent and identically distributed random variables with mean zero and variance ψ². This is a technical mathematical trick, not a physical assumption about the variability.

Let E denote expectation with respect to that distribution and apply E to both sides of the last two equations. The expectations on the left are known and

EΣ(ξs − ξ.)² = (2r − 1)ψ²; (2.22)

it follows that
b1 = b2 = 2/{r(2r − 1)}. (2.23)

Thus standard two-sample analysis based on an assumption of independent and identically distributed errors has a second moment justification under randomization theory via unit-treatment additivity. The same holds very generally for designs considered in later chapters.
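As a check on this result, the second moment identities (2.15) and (2.16) can be verified by brute-force enumeration. The following sketch (ours, not from the text, with arbitrary invented unit constants ξs) enumerates all allocations for n = 2r = 4:

```python
# Verify E_R(Delta-hat) = Delta and E_R{evar(Delta-hat)} = var_R(Delta-hat)
# by enumerating every allocation of r units to T (hypothetical xi values).
from itertools import combinations
from statistics import mean, variance

xi = [1.3, -0.2, 0.7, 2.1]   # unit constants xi_s (invented for illustration)
delta = 0.5                   # treatment difference is Delta = 2*delta
r = len(xi) // 2
units = range(len(xi))

est, evar_vals = [], []
for T in combinations(units, r):
    yT = [xi[s] + delta for s in T]                 # responses under T, eq. (2.17)
    yC = [xi[s] - delta for s in units if s not in T]   # responses under C, eq. (2.18)
    est.append(mean(yT) - mean(yC))                  # Delta-hat for this allocation
    evar_vals.append(variance(yT) / r + variance(yC) / r)   # pooled evar, eq. (2.9)

pop_var = sum((e - mean(est)) ** 2 for e in est) / len(est)  # var_R(Delta-hat)
print(mean(est))                 # equals Delta = 2*delta, up to rounding
print(mean(evar_vals), pop_var)  # the two agree, illustrating (2.16)
```

The agreement holds exactly (to rounding error) for any choice of the ξs, as the symmetry argument above implies.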

The second moment optimality of these procedures follows under randomization theory in essentially the same way as under a physical model. There is no obvious stronger optimality property solely in a randomization-based framework.

2.2.5 Randomization test and confidence limits

Of more direct interest than ∆̂ and evar(∆̂) is the pivotal statistic

(∆̂ − ∆)/√{evar(∆̂)} (2.24)

that would generate confidence limits for ∆, but more complicated arguments are needed for direct analytical examination of its randomization distribution.

Although we do not in this book put much emphasis on tests of significance we note briefly that randomization generates a formally exact test of significance and confidence limits for ∆. To see whether ∆0 is in the confidence region at a given level we subtract ∆0 from all values in T and test ∆ = 0.

This null hypothesis asserts that the observations are totally unaffected by treatment allocation. We may thus write down the observations that would have been obtained under all possible allocations of treatments to units. Each such arrangement has equal probability under the null hypothesis. The distribution of any test statistic then follows. Using the constraints of the randomization formulation, simplification of the test statistic is often possible.

We illustrate these points briefly on the comparison of two treatments T and C, with equal numbers of units for each treatment and randomization by one of the methods of Section 2.2.2. Suppose that the observations are

P = yT1, . . . , yTr; yC1, . . . , yCr, (2.25)


which can be regarded as forming a finite population P. Write mP, wP for the mean and effective variance of this finite population defined as

mP = Σyu/(2r), (2.26)
wP = Σ(yu − mP)²/(2r − 1), (2.27)

where the sum is over all members of the finite population. To test the null hypothesis a test statistic has to be chosen that is defined for every possible treatment allocation. One natural choice is the two-sample Student t statistic. It is easily shown that this is a function of the constants mP, wP and of YT., the mean response of the units receiving T. Only YT. is a random variable over the various treatment allocations and therefore we can treat it as the test statistic.

It is possible to find the exact distribution of YT. under the null hypothesis by enumerating all distinct samples of size r from P under sampling without replacement. Then the probability of a value as or more extreme than the observed value yT. can be found. Alternatively we may use the theory of sampling without replacement from a finite population to show that

ER(YT.) = mP, (2.28)
varR(YT.) = wP/(2r). (2.29)

Higher moments are available but in many contexts a strong central limit effect operates and a test based on a normal approximation for the null distribution of YT. will be adequate.

A totally artificial illustration of these formulae is as follows. Suppose that r = 2 and that the observations on T and C are respectively 3, 1 and −1, −3. Under the null hypothesis the possible values of observations on T corresponding to the six choices of units to be allocated to T are

(−1,−3); (−1, 1); (−1, 3); (−3, 1); (−3, 3); (1, 3) (2.30)

so that the induced randomization distribution of YT. has mass 1/6 at −2, −1, 1, 2 and mass 1/3 at 0. The one-sided level of significance of the data is 1/6. The mean and variance of the distribution are respectively 0 and 5/3; note that the normal approximation to the significance level is Φ(−2√3/√5) ≈ 0.06, which, considering the extreme discreteness of the permutation distribution, is not too far from the exact value.


2.2.6 More than two treatments

The previous discussion has concentrated for simplicity on the comparison of two treatments T and C. Suppose now that there are v treatments T1, . . . , Tv. In many ways the previous discussion carries through with little change.

The first new point of design concerns whether the same number of units should be assigned to each treatment. If there is no obvious structure to the treatments, so that for instance all comparisons of pairs of treatments are of equal interest, then equal replication will be natural and optimal, for example in the sense of minimizing the average variance over all comparisons of pairs of treatments. Unequal interest in different comparisons may suggest unequal replication.

For example, suppose that there is a special treatment T0, possibly a control, and v ordinary treatments and that particular interest focuses on the comparisons of the ordinary treatments with T0. Suppose that each ordinary treatment is replicated r times and that T0 occurs cr times. Then the variance of a difference of interest is proportional to

1/r + 1/(cr) (2.31)

and we aim to minimize this subject to a given total number of observations n = r(v + c). We eliminate r and obtain a simple approximation by regarding c as a continuous variable; the minimum is at c = √v. With three or four ordinary treatments there is an appreciable gain in efficiency, by this criterion, by replicating T0 up to twice as often as the other treatments.

The assumption of unit-treatment additivity is as given at (2.1).
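A numerical sketch of the replication calculation above (the function name and the fixed total n are ours, chosen purely for illustration) confirms that c = √v minimizes the variance and quantifies the gain over equal replication:

```python
# Variance of a (T0, Tj) difference is proportional to 1/r + 1/(c*r),
# eq. (2.31), with r = n/(v + c) for a fixed total n (assumed here).

def relative_variance(v, c, n=840.0):
    """Variance 1/r + 1/(cr) of a difference with T0, with r = n/(v + c)."""
    r = n / (v + c)
    return 1.0 / r + 1.0 / (c * r)

for v in (3, 4):
    c_opt = v ** 0.5                      # the continuous-c minimum derived above
    gain = relative_variance(v, 1.0) / relative_variance(v, c_opt)
    print(v, round(c_opt, 2), round(gain, 3))   # gain roughly 7% (v=3), 11% (v=4)
```

The gain is computed relative to equal replication (c = 1); note it does not depend on the assumed n, since n cancels in the ratio.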

The equivalent linear model is

Yjs = µ + τj + εjs (2.32)

with j = 1, . . . , v and s = 1, . . . , r. An important aspect of having more than two treatments is that we may be interested in more complicated comparisons than simple differences. We shall call Σljτj, where Σlj = 0, a treatment contrast. The special case of a difference τj1 − τj2 is called a simple contrast and examples of more general contrasts are

(τ1 + τ2 + τ3)/3 − τ5, (τ1 + τ2 + τ3)/3 − (τ4 + τ6)/2. (2.33)

We defer more detailed discussion of contrasts to Section 3.5.2 but in the meantime note that the general contrast Σljτj is estimated by ΣljYj. with, in the simple case of equal replication, variance

Σlj²σ²/r, (2.34)

estimated by replacing σ² by the mean square within treatment groups. Under complete randomization and the assumption of unit-treatment additivity the correspondence between the properties found under the physical model and under randomization theory discussed in Section 2.2.4 carries through.
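As a worked illustration (with invented data, not from the text), the following sketch estimates a general contrast and its variance via (2.34), replacing σ² by the within-treatment mean square:

```python
# Estimate a contrast sum(l_j * tau_j) from v = 3 treatment groups of r = 4
# units each (all numbers invented for illustration).
from statistics import mean

groups = {
    1: [5.1, 4.8, 5.5, 5.0],
    2: [6.2, 5.9, 6.4, 6.1],
    3: [5.6, 5.3, 5.8, 5.7],
}
l = {1: 0.5, 2: 0.5, 3: -1.0}   # compares the average of T1, T2 with T3; sums to 0

r, v = 4, len(groups)
estimate = sum(l[j] * mean(groups[j]) for j in groups)

# within-treatment mean square, on v(r - 1) degrees of freedom
within_ss = 0.0
for ys in groups.values():
    m = mean(ys)
    within_ss += sum((y - m) ** 2 for y in ys)
s2 = within_ss / (v * (r - 1))

est_var = sum(c ** 2 for c in l.values()) * s2 / r   # eq. (2.34) with sigma^2 -> s2
print(estimate, est_var)
```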

2.3 Retrospective adjustment for bias

Even with carefully designed experiments there may be a need in the analysis to make some adjustment for bias. In some situations where randomization has been used, there may be some suggestion from the data that either by accident effective balance of important features has not been achieved or that possibly the implementation of the randomization has been ineffective. Alternatively it may not be practicable to use randomization to eliminate systematic error.

Sometimes, especially with well-standardized physical measurements, such corrections are done on an a priori basis.

Illustration. It may not be feasible precisely to control the temperature at which the measurement on each unit is made but the temperature dependence of the property in question, for example electrical resistance, may be known with sufficient precision for an a priori correction to be made.

For the remainder of the discussion, we assume that any bias correction has to be estimated internally from the data.

In general we suppose that on the sth experimental unit there is available a q × 1 vector zs of baseline explanatory variables, measured in principle before randomization. For simplicity we discuss mostly q = 1 and two treatment groups.

If the relevance of z is recognized at the design stage then the completely random assignment of treatments to units discussed in this chapter is inappropriate unless there are practical constraints that prevent the randomization scheme being modified to take account of z. We discuss such randomization schemes in Chapters 3 and 4. If, however, the relevance of z is only recognized retrospectively, it will be important to check that it is indeed properly regarded as a baseline variable and, if there is a serious lack of balance between the treatment groups with respect to z, to consider whether there is a reasonable explanation as to why this difference occurred in a context where randomization should, with high probability, have removed any substantial bias.

Suppose, however, that an apparent bias does exist. Figure 2.1 shows three of the various possibilities that can arise. In Fig. 2.1a the clear lack of parallelism means that a single estimate of treatment difference is at best incomplete and will depend on the particular value of z used for comparison. Alternatively a transformation of the response scale inducing parallelism may be found. In Fig. 2.1b the crossing over in the range of the data accentuates the dangers of a single comparison; the qualitative consequences may well be stronger than those in the situation of Fig. 2.1a. Finally in Fig. 2.1c the effective parallelism of the two relations suggests a correction for bias equivalent to using the vertical difference between the two lines as an estimated treatment difference, preferable to a direct comparison of unadjusted means which in the particular instance shown underestimates the difference between T and C.

A formulation based on a linear model is to write

E(YTj) = µ + δ + β(zTj − z..), (2.35)
E(YCj) = µ − δ + β(zCj − z..), (2.36)

where zTj and zCj are the values of z on the jth units to receive T and C, respectively, and z.. the overall average z. The inclusion of z.. is not essential but preserves an interpretation for µ as the expected value of Y...

We make the normal theory or second moment assumptions about the error terms as in Section 2.2.3. Note that ∆ = 2δ measures the difference between T and C in the expected response at any fixed value of z. Provided that z is a genuine baseline variable, and the assumption of parallelism is satisfied, ∆ remains a measure of the effect of changing the treatment from C to T.

If z is a q × 1 vector the only change is to replace βz by βᵀz, β becoming a q × 1 vector of parameters; see also Section 3.6.

A least squares analysis of the model gives, for scalar z,

( 2r        0                0        ) ( µ̂ )   (            r(YT. + YC.)            )
( 0         2r         r(zT. − zC.)   ) ( δ̂ ) = (            r(YT. − YC.)            )
( 0   r(zT. − zC.)         Szz        ) ( β̂ )   ( Σ{YTj(zTj − z..) + YCj(zCj − z..)} )   (2.37)

© 2000 by Chapman & Hall/CRC

Page 40: The Theory Of The Design Of Experiments [DOE]

Figure 2.1 Two treatments: ×, T and ◦, C. Response variable, Y; baseline variable, z, measured before randomization and therefore unaffected by treatments. In (a) nonparallelism means that the unadjusted estimate of treatment effect is biased, and adjustment depends on particular values of z. In (b) crossing over of relations means that even the qualitative interpretation of treatment effect is different at different values of z. In (c) essential parallelism means that treatment effect can be estimated from the vertical displacement of the lines.

where
Szz = Σ{(zTj − z..)² + (zCj − z..)²}. (2.38)

The least squares equations yield in particular

∆̂ = 2δ̂ = YT. − YC. − β̂(zT. − zC.), (2.39)

where β̂ is the least squares estimated slope, agreeing precisely with the informal geometric argument given above. It follows also either by a direct calculation or by inverting the matrix of the least squares equations that

var(∆̂) = σ²{2/r + (zT. − zC.)²/Rzz}, (2.40)

where

Rzz = Σ{(zTj − zT.)² + (zCj − zC.)²}. (2.41)


For a given value of σ² the variance is inflated as compared with that for the difference between two means because of sampling error in β̂.

This procedure is known in the older literature as analysis of covariance. If a standard least squares regression package is used to do the calculations it may be simpler to use more direct parameterizations than that used here, although the advantages of choosing parameters which have a direct interpretation should not be overlooked. In extreme cases the subtraction of a suitable constant, not necessarily the overall mean, from z, and possibly rescaling by a power of 10, may be needed to avoid numerical instability and also to aid interpretability.
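The adjusted estimate (2.39) and the variance factor in (2.40) can be computed directly from the pooled within-group slope. The following sketch uses invented data (all numbers are ours, for illustration only):

```python
# Covariance adjustment: Delta-hat = YbarT - YbarC - beta-hat*(zbarT - zbarC),
# eq. (2.39), with beta-hat the pooled within-group least squares slope.
from statistics import mean

zT, yT = [2.0, 3.0, 5.0, 6.0], [7.1, 7.9, 9.2, 9.8]   # group T (invented)
zC, yC = [1.0, 2.0, 4.0, 5.0], [5.0, 5.8, 7.1, 7.7]   # group C (invented)

def centred(xs):
    m = mean(xs)
    return [x - m for x in xs]

dzT, dzC = centred(zT), centred(zC)
Rzz = sum(d * d for d in dzT) + sum(d * d for d in dzC)        # eq. (2.41)
beta_hat = (sum(d * y for d, y in zip(dzT, yT)) +
            sum(d * y for d, y in zip(dzC, yC))) / Rzz         # pooled slope
delta_hat = mean(yT) - mean(yC) - beta_hat * (mean(zT) - mean(zC))  # eq. (2.39)

r = len(yT)
variance_factor = 2 / r + (mean(zT) - mean(zC)) ** 2 / Rzz   # bracket in eq. (2.40)
print(beta_hat, delta_hat, variance_factor)
```

The second term of the variance factor shows the inflation, relative to an unadjusted comparison of means, caused by the sampling error in β̂.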

We have presented the bias correction via an assumed linear model. The relation with randomization theory does, however, need discussion: have we totally abandoned a randomization viewpoint? First, as we have stressed, if the relevance of the bias inducing variable z had been clear from the start then normally this bias would have been avoided by using a different form of randomization, for example a randomized block design; see Chapter 3. When complete randomization has, however, been used and the role of z is considered retrospectively then the quantity zT. − zC., which is a random variable under randomization, becomes an ancillary statistic. That is, to be relevant to the inference under discussion the ensemble of hypothetical repetitions should hold zT. − zC. fixed, either exactly or approximately. It is possible to hold this ancillary exactly fixed only in special cases, notably when z corresponds to qualitative groupings of the units. Otherwise it can be shown that an appropriate notion of approximate conditioning induces the appropriate randomization properties for the analysis of covariance estimate of ∆, and in that sense there is no conflict between randomization theory and that based on an assumed linear model. Put differently, randomization approximately justifies the assumed linear model.

2.4 Some more on randomization

In theory randomization is a powerful notion with a number of important features.

First it removes bias.

Secondly it allows what can fairly reasonably be called causal inference, in that if a clear difference between two treatment groups arises it can only be either an accident of the randomization or a consequence of the treatment.

Thirdly it allows the calculation of an estimated variance for a treatment contrast in the above and many other situations based only on a single assumption of unit-treatment additivity and without the need to supply an ad hoc model for each new design.

Finally it allows the calculation of confidence limits for treatment differences, in principle based on unit-treatment additivity alone.

In practice the role of randomization ranges from being crucial in some contexts to being of relatively minor importance in others; we do stress, however, the general desirability of impersonal allocation schemes.

There are, moreover, some conceptual difficulties when we consider more realistic situations. The discussion so far, except for Section 2.3, has supposed both that there is no baseline information available on the experimental units and that the randomization is in effect done in one operation rather than sequentially in time.

The absence of baseline information means that all arrangements of treatments can be regarded on an equal footing in the randomization-induced probability calculations. In practice potentially relevant information on the units is nearly always available. The broad strategy of the subsequent chapters is that such information, when judged important, is taken account of in the design, in particular to improve precision, and randomization used to safeguard against other sources of variation. In the last analysis some set of designs is regarded as on an equal footing to provide a relevant reference set for inference. Additional features can in principle be covered by the adjustments of the type discussed in Section 2.3, but some frugality in the use of this idea is needed, especially where there are many baseline variables.

A need to perform the randomization one unit at a time, such as in clinical trials in which patients are accrued in sequence over an appreciable time, raises different issues unless decisions about different patients are quite separate, for example by virtually always being in different centres. For example, if a single group of 2r patients is to be allocated equally to two treatments and this is done sequentially, a point will almost always be reached where all individuals must be allocated to a particular treatment in order to force the necessary balance, and the advantages of concealment associated with the randomization are lost. On the other hand if all patients were independently randomized, there would be some chance, even if only a fairly small one, of extreme imbalance in numbers. The most reasonable compromise is to group the patients into successive blocks and to randomize ensuring balance of numbers within each block. A suitable number of patients per block is often between 6 and 12, thus in the first case ensuring that each block has three occurrences of each of two treatments. In line with the discussion in the next chapter it would often be reasonable to stratify by one or two important features; for example there might be separate blocks of men and women. Randomization schemes that adapt to the information available in earlier stages of the experiment are discussed in Section 8.2.
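The blocked compromise described above can be sketched as follows (a minimal illustration; the function name and defaults are ours, not from the text):

```python
# Permuted-block randomization: shuffle within successive blocks so that
# numbers stay balanced while individual allocations remain concealed.
import random

def blocked_allocation(n_patients, block_size=6, treatments=("T", "C"), seed=None):
    """Allocate patients in blocks holding equal numbers of each treatment."""
    assert block_size % len(treatments) == 0
    rng = random.Random(seed)
    per_block = block_size // len(treatments)
    allocation = []
    while len(allocation) < n_patients:
        block = list(treatments) * per_block   # e.g. three of each for block size 6
        rng.shuffle(block)                     # random order within the block
        allocation.extend(block)
    return allocation[:n_patients]

alloc = blocked_allocation(24, block_size=6, seed=1)
for i in range(0, len(alloc), 6):
    block = alloc[i:i + 6]
    print(block.count("T"), block.count("C"))   # 3 3 for every complete block
```

Stratification, for instance separate blocks of men and women, would simply run this scheme independently within each stratum.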

2.5 More on causality

We return to the issue of causality introduced in Section 1.8. For ease of exposition we again suppose there to be just two treatments, T and C, possibly a new treatment and a control. The counterfactual definition of causality introduced in Section 1.8, the notion that an individual receiving, say, T gives a response systematically different from the response that would have resulted had the individual received C, other things being equal, is encapsulated in the assumption of unit-treatment additivity in either its simple or in its extended form. Indeed the general notion may be regarded as an extension of unit-treatment additivity to possibly observational contexts.

In the above sense, causality can be inferred from a randomized experiment with uncertainty expressed via a significance test, for example via the randomization-based test of Section 2.2.5. The argument is direct. Suppose a set of experimental units is randomized between T and C, a response is observed, and a significance test shows very strong evidence against the null hypothesis of treatment identity and evidence, say, that the parameter ∆ is positive. Then either an extreme chance fluctuation has occurred, to which the significance level of the test refers, or units receiving T have a higher response than they would have yielded had they received C, and this is precisely the definition of causality under discussion.

The situation is represented graphically in Fig. 2.2. Randomization breaks the possible edge between the unobserved confounder U and treatment.

In a comparable observational study the possibility of an unobserved confounder affecting both treatment and response in a


Figure 2.2 Unobserved confounder, U; treatment, T, C; response, Y. No treatment difference, no edge between T, C and Y. In an observational study, (a), there are edges between U and the other nodes. Marginalization over U can be shown to induce dependency between T, C and Y. In a randomized experiment, (b), randomization of T, C ensures there is no edge to it from U. Marginalization over U does not induce an edge between T, C and Y.

systematic way would be an additional source of uncertainty, sometimes a very serious one, that would make any causal interpretation much more tentative.

This conclusion highlighting the advantage of randomized experiments over observational studies is very important. Nevertheless there are some qualifications to it.

First we have assumed the issues of noncompliance discussed in Section 1.8 are unimportant: the treatments as implemented are assumed to be genuinely those that it is required to study. An implication for design concerns the importance of measuring any features arising throughout the implementation of the experiment that might produce an unanticipated distortion of the treatments from those that were originally specified.

Next it is assumed that randomization has addressed all sources of potential systematic error, including any associated directly with the measurement of response.

The most important assumption, however, is that the treatment effect, ∆, is essentially constant and in particular does not have systematic sign reversals between different units. That is, in the terminology to be introduced later there is no major interaction between the treatment effect and intrinsic features of the experimental units.

In an extension of model (2.3) in which each experimental unit has its own treatment parameter, the difference estimated in a randomized experiment is the average treatment effect over the full set of units used in the experiment. If, moreover, these were a random sample from a population of units then the average treatment effect over that population is estimated. Such conclusions have much less force whenever the units used in the experiment are unrepresentative of some target population or if substantial and interpretable interactions occur with features of the experimental units.

There is a connection of these matters with the apparently antithetical notions of generalizability and specificity. Suppose for example that a randomized experiment shows a clear superiority in some sense of T over C. Under what circumstances may we reasonably expect the superiority to be reproduced over a new somewhat different set of units perhaps in different external conditions? This is a matter of generalizability. On the other hand the question of whether T will give an improved response for a particular new experimental unit is one of specificity. Key aids in both aspects are understanding of underlying process and of the nature of any interactions of the treatment effect with features of the experimental units. Both these, and especially the latter, may help clarify the conditions under which the superiority of T may not be achieved. The main implication in the context of the present book concerns the importance of factorial experiments, to be discussed in Chapters 5 and 6, and in particular factorial experiments in which one or more of the factors correspond to properties of the experimental units.

2.6 Bibliographic notes

Formal randomization was introduced into the design of experiments by R. A. Fisher. The developments for agricultural experiments, especially by Yates, as for example in Yates (1937), put central importance on achieving meaningful estimates of error via the randomization rather than via physical assumptions about the error structure. In some countries, however, this view has not gained wide acceptance. Yates (1951a,b) discussed randomization more systematically. For a general mathematical discussion of the basis of randomization theory, see Bailey and Rowley (1987). For a combinatorial nonprobabilistic formulation of the notion of randomization, see Singer and Pincus (1998).

Models based on unit-treatment additivity stem from Neyman (1923).

The relation between tests based on randomization and those stemming from normal theory assumptions was discussed in detail in early work by Welch (1937) and Pitman (1937). See Hinkelmann and Kempthorne (1994) and Kempthorne (1952) for an account regarding the randomization analysis as primary. Manly (1997) emphasizes the direct role of randomization analyses in applications. For a discussion of the Central Limit Theorem, and Edgeworth and saddle-point expansions connected with sampling without replacement from a finite population, see Thompson (1997).

A priori corrections for bias are widely used, for example in the physical sciences for adjustments to standard temperature, etc. Corrections based explicitly on least squares analysis were the motivation for the development of analysis of covariance. For a review of analysis of covariance, see Cox and McCullagh (1982). Similar adjustments are central to the careful analysis of observational data to attempt to adjust for unwanted lack of comparability of groups. See, for example, Rosenbaum (1999) and references therein.

For references on causality, see the Bibliographic notes to Chapter 1.

2.7 Further results and exercises

1. Suppose that in the comparison of two treatments with r units for each treatment the observations are completely separated, for example that all the observations on T exceed all those on C. Show that the one-sided significance level under the randomization distribution is (r!)²/(2r)!. Comment on the reasonableness or otherwise of the property that it does not depend on the numerical values and in particular on the distance apart of the two sets of observations.
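As a numerical aside (not a solution to the discussion part of the exercise), the stated level is easily evaluated: with complete separation exactly one of the C(2r, r) equally likely allocations puts all r largest observations on T, and 1/C(2r, r) = (r!)²/(2r)!.

```python
# Evaluate the one-sided level (r!)^2 / (2r)! for a few values of r.
from math import comb, factorial

for r in (2, 3, 4):
    print(r, factorial(r) ** 2 / factorial(2 * r))   # equals 1 / C(2r, r)
```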


2. In the comparison of v equally replicated treatments in a completely randomized design show that under a null hypothesis of no treatment effects the randomization expectation of the mean squares between and within treatments, defined in the standard way, are the same. What further calculations would be desirable to examine the distribution under randomization of the standard F statistic?

3. Suppose that on each unit a property, for example blood pressure, is measured before randomization and then the same property measured as a response after treatment. Discuss the relative merits of taking as response on each individual the difference between the values after and before randomization versus taking as response the measure after randomization and adjusting for regression on the value before. See Cox (1957, 1958, Chapter 4) and Cox and Snell (1981, Example D).

4. Develop the analysis first for two treatments and then for v treatments for testing the parallelism of the regression lines involved in a regression adjustment. Sketch some possible approaches to interpretation if nonparallelism is found.

5. Show that in the randomization analysis of the comparison of two treatments with a binary response, the randomization test of a null hypothesis of no effect is the exact most powerful conditional test of the equality of binomial parameters, usually called Fisher's exact test (Pearson, 1947; Cox and Hinkley, 1974, Chapter 5). If the responses are individually binomial, corresponding to the numbers of successes in, say, t trials, show that a randomization test is essentially the standard Mantel-Haenszel test with a sandwich estimate of variance (McCullagh and Nelder, 1989, Chapter 14).

6. Discuss a randomization formulation of the situation of Exercise 5 in the nonnull case. See Copas (1973).

7. Suppose that in an experiment to compare two treatments, T and C, the response Y of interest is very expensive to measure. It is, however, relatively inexpensive to measure a surrogate response variable, X, thought to be quite highly correlated with Y. It is therefore proposed to measure X on all units and both X and Y on a subsample. Discuss some of the issues of design and analysis that this raises.


8. Individual potential experimental units are grouped into clusters each of k individuals. A number of treatments are then randomized to clusters, i.e. all individuals in the same cluster receive the same treatment. What would be the likely consequences of analysing such an experiment as if the treatments had been randomized to individuals? Cornfield (1978) in the context of clinical trials called such an analysis “an exercise in self-deception”. Was he justified?

9. Show that in a large completely randomized experiment under the model of unit-treatment additivity the sample cumulative distribution functions of response to the different treatments differ only by translations. How could such a hypothesis be tested nonparametrically? Discuss why in practice examination of homogeneity of variance would often be preferable. First for two treatments and then for more than two treatments suggest parametric and nonparametric methods for finding a monotone transformation inducing translation structure and for testing whether such a transformation exists. Nonparametric analysis of completely randomized and randomized block designs is discussed in Lehmann (1975).

10. Studies in various medical fields, for example psychiatry (Johnson, 1998), have shown that where the same treatment contrasts have been estimated both via randomized clinical trials and via observational studies, the former tend to show smaller advantages of new procedures than the latter. Why might this be?

11. When the sequential blocked randomization scheme of Section 2.4 is used in clinical trials it is relatively common to disregard the blocking in the statistical analysis. How might some justification be given of the disregard of the principle that constraints used in design should be reflected in the statistical model?


CHAPTER 3

Control of haphazard variation

3.1 General remarks

In the previous chapter the primary emphasis was on the elimination of systematic error. We now turn to the control of haphazard error, which may enter at any of the phases of an investigation. Sources of haphazard error include intrinsic variation in the experimental units, variation introduced in the intermediate phases of an investigation and measurement or sampling error in recording the response.

It is important that measures to control the effect of such variation cover all the main sources of variation, and some knowledge, even if rather qualitative, of the relative importance of the different sources is needed.

The ways in which the effect of haphazard variability can be reduced include the following approaches.

1. It may be possible to use more uniform material, improved measuring techniques and more internal replication, i.e. repeat observations on each unit.

2. It may be possible to use more experimental units.

3. The technique of blocking, discussed in detail below, is a widely applicable technique for improving precision.

4. Adjustment for baseline features by the techniques for bias removal discussed in Section 2.3 can be used.

5. Special models of error structure may be constructed, for example based on a time series or spatial model.

On the first two points we make here only incidental comments. There will usually be limits to the increase in precision achievable by use of more uniform material and in technological experiments the wide applicability of the conclusions may be prejudiced if artificial uniformity is forced.


Illustration. In some contexts it may be possible to use pairs of homozygotic twins as experimental units in the way set out in detail in Section 3.3. There may, however, be some doubt as to whether conclusions apply to a wider population of individuals. More broadly, in a study to elucidate some new phenomenon or suspected effect it will usually be best to begin with the circumstances under which that effect occurs in its most clear-cut form. In a study in which practical application is of fairly direct concern the representativeness of the experimental conditions merits more emphasis, especially if it is suspected that the treatment effects have different signs in different individuals.

In principle precision can always be improved by increasing the number of experimental units. The standard error of treatment comparisons is inversely proportional to the square root of the number of units, provided the residual standard deviation remains constant. In practice the investigator's control may be weaker in large investigations than in small, so that the theoretical increase in the number of units needed to shorten the resulting confidence limits for treatment effects is often an underestimate.

3.2 Precision improvement by blocking

The central idea behind blocking is an entirely commonsense one of aiming to compare like with like. Using whatever prior knowledge is available about which baseline features of the units and other aspects of the experimental set-up are strongly associated with potential response, we group the units into blocks such that all the units in any one block are likely to give similar responses in the absence of treatment differences. Then, in the simplest case, by allocating one unit in each block to each treatment, treatments are compared on units within the same block.

The formation of blocks is usually, however, quite constrained in addition by the way in which the experiment is conducted. For example, in a laboratory experiment a block might correspond to the work that can be done in a day. In our initial discussion we regard the different blocks as merely convenient groupings without individual interpretation. Thus it makes no sense to try to interpret differences between blocks, except possibly as a guide for future experimentation to see whether the blocking has been effective in error control. Sometimes, however, some aspects of blocking do have a clear interpretation, and then the issues of Chapter 5 concerned with factorial experiments apply. In such cases it is preferable to use the term stratification rather than blocking.

Illustrations. Typical ways of forming blocks are to group together neighbouring plots of ground, responses from one subject in one session of a psychological experiment under different conditions, batches of material produced on one machine, where several similar machines are producing nominally the same product, groups of genetically similar animals of the same gender and initial body weight, pairs of homozygotic twins, the two eyes of the same subject in an ophthalmological experiment, and so on. Note, however, that if gender were a defining variable for blocks, i.e. strata, we would likely want not only to compare treatments but also to examine whether treatment differences are the same for males and females, and this brings in aspects that we ignore in the present chapter.

3.3 Matched pairs

3.3.1 Model and analysis

Suppose that we have just two treatments, T and C, for comparison and that we can group the experimental units into pairs, so that in the absence of treatment differences similar responses are to be expected in the two units within the same pair or block.

It is now reasonable from many viewpoints to assign one member of the pair to T and one to C and, moreover, in the absence of additional structure, to randomize the allocation within each pair independently from pair to pair. This yields what we call the matched pair design.

Thus if we label the units

U11, U21; U12, U22; . . . ; U1r, U2r (3.1)

a possible design would be

T, C; C, T; . . . ; T, C. (3.2)

As in Chapter 2, a linear model that directly corresponds with randomization theory can be constructed. The broad principle in setting up such a physical linear model is that randomization constraints forced by the design are represented by parameters in the linear model. Writing YTs, YCs for the observations on treatment and control for the sth pair, we have the model

YTs = µ + βs + δ + εTs, YCs = µ + βs − δ + εCs, (3.3)

where the ε are random variables of mean zero. As in Section 2.2, either the normal theory or the second moment assumption about the errors may be made; the normal theory assumption leads to distributional results and strong optimality properties.

Model (3.3) is overparameterized, but this is often convenient to achieve a symmetrical formulation. The redundancy could be avoided here by, for example, setting µ to any arbitrary known value, such as zero.

A least squares analysis of this model can be done in several ways. The simplest, for this very special case, is to transform the YTs, YCs to sums Bs = YTs + YCs and differences Ds = YTs − YCs. Because this is proportional to an orthogonal transformation, the transformed observations are also uncorrelated and have constant variance. Further in the linear model for the new variables we have

E(Bs) = 2(µ + βs), E(Ds) = 2δ = ∆. (3.4)

It follows that, so long as the βs are regarded as unknown parameters unconnected with ∆, the least squares estimate of ∆ depends only on the differences Ds and is in fact the mean of the differences,

∆ = D. = YT. − YC., (3.5)

with var(∆) = var(Ds)/r = 2σ²/r, (3.6)

where σ² is the variance of ε. Finally σ² is estimated as

s² = Σ(Ds − D.)²/{2(r − 1)}, (3.7)

so that evar(∆) = 2s²/r. (3.8)
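As a sketch of these calculations (Python with purely illustrative data; the book's own code examples, in S-PLUS, are in Appendix C):

```python
import math

# Illustrative matched-pair responses (hypothetical data): in each
# pair one unit received T and the other C.
y_t = [5.1, 4.8, 5.5]
y_c = [4.5, 4.4, 5.0]
r = len(y_t)

# The differences D_s carry all the information about Delta; the
# estimate (3.5) is their mean.
d = [t - c for t, c in zip(y_t, y_c)]
delta_hat = sum(d) / r

# s^2 estimates sigma^2 via (3.7); the divisor is 2(r - 1) because
# var(D_s) = 2 sigma^2.
s2 = sum((ds - delta_hat) ** 2 for ds in d) / (2 * (r - 1))

# Estimated variance of the treatment effect estimate, (3.8).
evar = 2 * s2 / r
print(delta_hat, s2, math.sqrt(evar))
```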

In line with the discussion in Section 2.2.4 we now show that the properties just established under the linear model and the second moment assumption also follow from the randomization used in allocating treatments to units, under the unit-treatment additivity assumption. This assumption specifies the response on the sth pair to be (ξ1s + δ, ξ2s − δ) if the first unit in that pair is randomized to treatment and (ξ1s − δ, ξ2s + δ) if it is randomized to control. We then have

ER(∆) = ∆, ER evar(∆) = varR(∆). (3.9)

To prove the second result we note that both sides of the equation do not depend on ∆ and are quadratic functions of the ξjs. They are invariant under permutations of the numbering of the pairs 1, . . . , r, and under permutations of the two units in any pair. Both sides are zero if ξ1s = ξ2s, s = 1, . . . , r. It follows that both sides of the equation are constant multiples of

Σ(ξ1s − ξ2s)² (3.10)

and consistency with the least squares analysis requires that the constants of proportionality are equal. In fact, for example,

ER(s²) = Σ(ξ1s − ξ2s)²/(2r). (3.11)

Although not necessary for the discussion of the matched pair design, it is helpful for later discussion to set out the relation with analysis of variance. In terms of the original responses Y the estimation of µ, βs is orthogonal to the estimation of ∆ and the analysis of variance arises from the following decompositions.

First there is a representation of the originating random observations in the form

YTs = Y.. + (YT. − Y..) + (Y.s − Y..) + (YTs − YT. − Y.s + Y..), (3.12)

YCs = Y.. + (YC. − Y..) + (Y.s − Y..) + (YCs − YC. − Y.s + Y..). (3.13)

Regarded as a decomposition of the full vector of observations, this has orthogonal components.

Secondly because of that orthogonality the squared norms of the components add to give

ΣYjs² = ΣY..² + Σ(Yj. − Y..)² + Σ(Y.s − Y..)² + Σ(Yjs − Yj. − Y.s + Y..)²; (3.14)

note that Σ represents a sum over all observations so that, for example, ΣY..² = 2rY..² . In this particular case the sums of squares can be expressed in simpler forms. For example the last term is Σ(Ds − D.)²/2. The squared norms on the right-hand side are conventionally called respectively sums of squares for general mean, for treatments, for pairs and for residual or error.


Thirdly the dimensions of the spaces spanned by the component vectors, as the vector of observations lies in the full space of dimension 2r, also are additive:

2r = 1 + 1 + (r − 1) + (r − 1). (3.15)

These are conventionally called degrees of freedom and mean squares are defined for each term as the sum of squares divided by the degrees of freedom. Finally, under the physical linear model (3.3) the residual mean square has expectation σ².

3.3.2 A modified matched pair design

In some matched pairs experiments we might wish to include some pairs of units both of which receive the same treatment. Cost considerations might sometimes suggest this as a preferable design, although in that case redefinition of an experimental unit as a pair of original units would be called for and the use of a mixture of designs would not be entirely natural. If, however, there is some suspicion that the two units in a pair do not react independently, i.e. there is doubt about one of the fundamental assumptions of unit-treatment additivity, then a mixture of matched pairs and pairs both treated the same might be appropriate.

Illustration. An ophthalmological use of matched pairs might involve using left and right eyes as distinct units, assigning different treatments to the two eyes. This would not be a good design unless there were firm a priori grounds for considering that the treatment applied to one eye had negligible influence on the response in the other eye. Nevertheless as a check it might be decided for some patients to assign the same treatment to both eyes, in effect to see whether the treatment difference is the same in both environments. Such checks are, however, often of low sensitivity.

Consider a design in which the r matched pairs are augmented by m pairs in which both units receive the same treatment, mT pairs receiving T and mC receiving C, with mT + mC = m. So long as the parameters βs in the matched pairs model describing inter-pair differences are arbitrary the additional observations give no information about the treatment effect. In particular a comparison of the means of the mT and the mC complete pairs estimates ∆ plus a contrast of totally unknown β's.

Suppose, however, that the pairs are randomized between complete and incomplete assignments. Then under randomization analysis the β's can be regarded in effect as random variables. In terms of a corresponding physical model we write for each observation

Yjs = µ ± δ + βs + εjs, (3.16)

where the sign of δ depends on the treatment involved, the βs are now zero mean random variables of variance σB², and the εjs are, as before, zero mean random variables of variance now denoted by σW². All random variables are mutually uncorrelated or, in the normal theory version, independently normally distributed.

It is again convenient to replace the individual observations by sums and differences. An outline of the analysis is as follows. Let ∆MP and ∆UM denote treatment effects in the matched pairs and the unmatched data respectively. These are estimated by the previous estimate, now denoted by YMPT − YMPC, with variance 2σW²/r, and by YUMT − YUMC with variance

(σB² + σW²/2)(1/mT + 1/mC). (3.17)

If, as might quite often be the case, σB² is large compared with σW², the between block comparison may be of such low precision as to be virtually useless.

If the variance components are known we can thus test the hypothesis that the treatment effect is, as anticipated a priori, the same in the two parts of the experiment and, subject to homogeneity, find a weighted mean as an estimate of the common ∆. Estimation of the two variance components is based on the sum of squares within pairs adjusting for treatment differences in the matched pair portion and on the sum of squares between pair totals adjusting for treatment differences in the unmatched pair portion.
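With the variance components treated as known, the homogeneity test and the weighted combination described above can be sketched as follows (Python; all the numbers are hypothetical):

```python
import math

# Hypothetical estimates of Delta from the matched-pair and the
# unmatched portions, with their variances assumed known:
#   var_mp = 2*sigma_W^2/r
#   var_um = (sigma_B^2 + sigma_W^2/2)(1/m_T + 1/m_C)
delta_mp, var_mp = 1.9, 0.04
delta_um, var_um = 2.6, 0.50

# Homogeneity of the two treatment effects: under a common Delta the
# standardized difference is approximately standard normal.
z = (delta_mp - delta_um) / math.sqrt(var_mp + var_um)

# Inverse-variance weighted mean, the efficient combination when the
# variances are known, together with its variance.
w_mp, w_um = 1 / var_mp, 1 / var_um
delta_comb = (w_mp * delta_mp + w_um * delta_um) / (w_mp + w_um)
var_comb = 1 / (w_mp + w_um)
print(round(z, 3), round(delta_comb, 3), round(var_comb, 4))
```

As the text observes, when σB² is large the between-pair comparison gets little weight: here the combined estimate stays close to the within-pair estimate.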

Under normal theory assumptions a preferable analysis for a common ∆ is summarized in Exercise 3.3. There are five sufficient statistics, two sums of squares and three means, and four unknown parameters. The log likelihood of these statistics can be found and a profile log likelihood for ∆ calculated.

The procedure of combining information from within and between pair comparisons can be regarded as the simplest special case of the recovery of between-block information. More general cases are discussed in Section 4.2.


3.4 Randomized block design

3.4.1 Model and analysis

Suppose now that we have more than two treatments and that they are regarded as unstructured and on an equal footing and therefore to be equally replicated. The discussion extends in a fairly direct way when some treatments receive additional replication. With v treatments, or varieties in the plant breeding context, we aim to produce blocks of v units. As with matched pairs we try, subject to administrative constraints on the experiment, to arrange that in the absence of treatment effects, very similar responses are to be anticipated on the units within any one block. We allocate treatments independently from block to block and at random within each block, subject to the constraint that each treatment occurs once in each block.

Illustration. Typical ways of forming blocks include compact arrangements of plots in a field chosen in the light of any knowledge about fertility gradients, batches of material that can be produced in one day or production period, and animals grouped on the basis of gender and initial body weight.

Let Yjs denote the observation on treatment Tj in block s. Note that because of the randomization this observation may be on any one of the units in block s in their original listing. In accordance with the general principle that constraints on the randomization are represented by parameters in the associated linear model, we represent Yjs in the form

Yjs = µ + τj + βs + εjs, (3.18)

where j = 1, . . . , v; s = 1, . . . , r and εjs are zero mean random variables satisfying the second moment or normal theory assumptions. The least squares estimates of the parameters are determined by the row and column means and in particular under the summation constraints Στj = 0, Σβs = 0, we have τj = Yj. − Y.. and βs = Y.s − Y.. . The contrast Lτ = Σlj τj is estimated by Lτ = Σlj Yj. .

The decomposition of the observations, the sums of squares and the degrees of freedom are as follows:

1. For the observations we write

Yjs = Y.. + (Yj. − Y..) + (Y.s − Y..) + (Yjs − Yj. − Y.s + Y..), (3.19)


a decomposition into orthogonal components.

2. For the sums of squares we therefore have

ΣYjs² = ΣY..² + Σ(Yj. − Y..)² + Σ(Y.s − Y..)² + Σ(Yjs − Yj. − Y.s + Y..)², (3.20)

where the summation is always over both suffices.

3. For the degrees of freedom we have

rv = 1 + (v − 1) + (r − 1) + (r − 1)(v − 1). (3.21)

The residual mean square provides an unbiased estimate of the variance. Let

s² = Σ(Yjs − Yj. − Y.s + Y..)²/{(r − 1)(v − 1)}. (3.22)

We now indicate how to establish the result E(s²) = σ² under the second moment assumptions. In the linear model the residual sum of squares depends only on the εjs, and not on the fixed parameters µ, τj and βs. Thus for the purpose of computing the expected value of (3.22) we can set these parameters to zero. All sums of squares in (3.20) other than the residual have simple expectations: for example

EΣj,s(Yj. − Y..)² = rEΣj(εj. − ε..)² (3.23)
= r(v − 1)var(εj.) = (v − 1)σ². (3.24)

Similarly EΣj,s(Y.s − Y..)² = (r − 1)σ², E(Σj,s Y..²) = σ², and that for the residual sum of squares follows by subtraction. Thus the unbiased estimate of the variance of Lτ is

evar(Lτ) = Σj lj² s²/r. (3.25)

The partition of the sums of squares given by (3.20) is often set out in an analysis of variance table, as for example Table 3.2 below. This table has one line for each component of the sum of squares, with the usual convention that the sum of squares due to the overall mean, nY..², is not displayed, and the total sum of squares is thus a corrected total Σ(Yjs − Y..)².
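The orthogonality of this decomposition can be checked numerically (a Python sketch with arbitrary hypothetical data for v = 3 treatments in r = 4 blocks):

```python
# Rows index treatments j, columns index blocks s (hypothetical data).
y = [[12.1, 13.4, 11.8, 12.9],
     [10.2, 11.5, 10.9, 11.1],
     [14.0, 14.8, 13.6, 14.4]]
v, r = len(y), len(y[0])
n = v * r

mean = sum(sum(row) for row in y) / n
tmean = [sum(row) / r for row in y]                              # Y_j.
bmean = [sum(y[j][s] for j in range(v)) / v for s in range(r)]   # Y_.s

# The four squared norms on the right-hand side of (3.20).
ss_mean = n * mean ** 2
ss_treat = r * sum((m - mean) ** 2 for m in tmean)
ss_block = v * sum((m - mean) ** 2 for m in bmean)
ss_resid = sum((y[j][s] - tmean[j] - bmean[s] + mean) ** 2
               for j in range(v) for s in range(r))

# They add exactly to the uncorrected total sum of squares, and the
# degrees of freedom add as in (3.21).
ss_total = sum(x * x for row in y for x in row)
assert abs(ss_total - (ss_mean + ss_treat + ss_block + ss_resid)) < 1e-9
assert n == 1 + (v - 1) + (r - 1) + (v - 1) * (r - 1)
print(round(ss_treat, 4), round(ss_block, 4), round(ss_resid, 4))
```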

The simple decomposition of the data vector and sums of squares depends crucially on the balance of the design. If, for example, some treatments were missing in some blocks not merely would the orthogonality of the component vectors be lost but the contrasts of treatment means would not be independent of differences between blocks and vice versa. To extend the discussion to such cases more elaborate methods based on a least squares analysis are needed. It becomes crucial to distinguish, for example, between the sum of squares for treatments ignoring blocks and the sum of squares for treatments adjusting for blocks, the latter measuring the effect of introducing treatment effects after first allowing for block differences.

The randomization model for the randomized block design uses the assumption of unit-treatment additivity, as in the matched pairs design. We label the units

U11, . . . , Uv1; U12, . . . , Uv2; . . . ; U1r, . . . , Uvr. (3.26)

The response on the unit in the sth block that is randomized to treatment Tj is

ξTjs + τj, (3.27)

where ξTjs is the response of that unit in block s in the absence of treatment.

Under randomization theory properties such as

ER evar(Lτ) = varR(Lτ) (3.28)

are established by first showing that both sides are multiples of

Σ(ξjs − ξ.s)². (3.29)

3.4.2 Example

This example is taken from Cochran and Cox (1958, Chapter 3), and is based on an agricultural field trial. In such trials blocks are naturally formed from large sections of field, sometimes roughly square; the shape of individual plots and their arrangement into blocks is usually settled by a mixture of technological convenience, for example ease of harvesting, and special knowledge of the particular area.

This experiment tested the effects of five levels of application of potash on the strength of cotton fibres. A single sample of cotton was taken from each plot, and four measurements of strength were made on each sample. The data in Table 3.1 are the means of these four measurements.

The marginal means are given in Table 3.1, and seem to indicate decreasing strength with increasing amount of potash, with perhaps some curvature in the response, since the mean strength at 36 pounds is less than that at 54 pounds, where the maximum is reached.

Table 3.1 Strength index of cotton, from Cochran and Cox (1958), with marginal means.

                 Pounds of potash per acre
            36     54     72     108    144    Mean

Block I     7.62   8.14   7.76   7.17   7.46   7.63
      II    8.00   8.15   7.73   7.57   7.68   7.83
      III   7.93   7.87   7.74   7.80   7.21   7.71

Mean        7.85   8.05   7.74   7.51   7.45   7.72

Table 3.2 Analysis of variance for strength index of cotton.

Source       Sums of    Degrees of   Mean
             squares    freedom      square

Treatment    0.7324     4            0.1831
Blocks       0.0971     2            0.0486
Residual     0.3495     8            0.0437

The analysis of variance outlined in Section 3.4.1 is given in Table 3.2. The main use of the analysis of variance table is to provide an estimate of the standard error for assessing the precision of contrasts of the treatment means. The mean square residual is an unbiased estimate of the variance of an individual observation, so the standard error for example for comparing two treatment means is √(2 × 0.0437/3) = 0.17, which suggests that the observed decrease in strength over the levels of potash used is a real effect, but the observed initial increase is not.
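The analysis can be reproduced directly from the data of Table 3.1 (a Python sketch; the book's own code, in S-PLUS, is in Appendix C):

```python
import math

# Strength index of cotton (Table 3.1): rows are blocks I-III,
# columns are the five potash levels.
y = [[7.62, 8.14, 7.76, 7.17, 7.46],
     [8.00, 8.15, 7.73, 7.57, 7.68],
     [7.93, 7.87, 7.74, 7.80, 7.21]]
r, v = len(y), len(y[0])          # r = 3 blocks, v = 5 treatments

grand = sum(sum(row) for row in y) / (r * v)
tmean = [sum(y[s][j] for s in range(r)) / r for j in range(v)]
bmean = [sum(row) / v for row in y]

ss_treat = r * sum((m - grand) ** 2 for m in tmean)
ss_block = v * sum((m - grand) ** 2 for m in bmean)
ss_resid = sum((y[s][j] - tmean[j] - bmean[s] + grand) ** 2
               for s in range(r) for j in range(v))
ms_resid = ss_resid / ((r - 1) * (v - 1))

# Standard error for comparing two treatment means: sqrt(2 s^2 / r).
se_diff = math.sqrt(2 * ms_resid / r)
print(round(ss_treat, 4), round(ss_block, 4), round(ss_resid, 4))
print(round(ms_resid, 4), round(se_diff, 2))
```

The printed sums of squares and residual mean square agree with Table 3.2, and the final value is the standard error 0.17 quoted above.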

It is possible to construct more formal tests for the shape of the response, by partitioning the sums of squares for treatments, and this is considered further in Section 3.5 below.

The S-PLUS code for carrying out the analysis of variance in this and the following examples is given in Appendix C. As with many other statistical packages, the emphasis in the basic commands is on the analysis of variance table and the associated F-tests, which in nearly all cases are not the most useful summary information.

3.4.3 Efficiency of blocking

As noted above the differences between blocks are regarded as of no intrinsic interest, so long as no relevant baseline information is available about them. Sometimes, however, it may be useful to ask how much gain in efficiency there has been as compared with complete randomization. The randomization model provides a means of assessing how effective the blocking has been in improving precision. In terms of randomization theory the variance of the difference between two treatment means in a completely randomized experiment is determined by

(2/r)Σ(ξjs − ξ..)²/(vr − 1), (3.30)

whereas in the randomized block experiment it is

(2/r)Σ(ξjs − ξ.s)²/{r(v − 1)}. (3.31)

Also in the randomization model the mean square between blocks is constant with value

vΣ(ξ.s − ξ..)²/(r − 1). (3.32)

As a result the relative efficiency for comparing two treatment means in the two designs is estimated by the ratio of (2/r){SSB + r(v − 1)MSR}/(vr − 1), the estimated variance in the completely randomized design, to (2/r)MSR, that is by

{SSB + r(v − 1)MSR}/{(vr − 1)MSR}. (3.33)

Here SSB and MSR are respectively the sum of squares for blocks and the residual mean square in the original randomized block analysis.

To produce from the original analysis of variance table for the randomized block design an estimate of the effective residual variance for the completely randomized design we may therefore produce a new formal analysis of variance table as follows. Replace the treatment mean square by the residual mean square, add the sums of squares for modified treatments, blocks and residual and divide by the degrees of freedom, namely vr − 1. The ratio of the two residual mean squares, the one in the analysis of the randomized block experiment to the notional one just reconstructed, measures the reduction in effective variance induced by blocking.
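As a numerical sketch (Python), the calculation for the cotton experiment of Section 3.4.2, with SSB and MSR taken from Table 3.2:

```python
# Estimated relative efficiency (3.33) of complete randomization
# relative to the randomized block design.
ss_b, ms_r = 0.0971, 0.0437   # blocks SS and residual MS, Table 3.2
v, r = 5, 3

efficiency = (ss_b + r * (v - 1) * ms_r) / ((v * r - 1) * ms_r)
print(round(efficiency, 3))
```

The value is very close to one, suggesting that in this particular experiment blocking produced little gain in precision, consistent with the block mean square in Table 3.2 being close to the residual mean square.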

There is a further aspect, however; if confidence limits for ∆ are found from normal theory using the Student t distribution, the degrees of freedom are (v − 1)(r − 1) and v(r − 1) respectively in the randomized block and completely randomized designs, showing some advantage to the latter if the error variances remain the same. Except in very small experiments, however, this aspect is relatively minor.

3.5 Partitioning sums of squares

3.5.1 General remarks

We have in this chapter emphasized that the objective of the analysis is the estimation of comparisons between the treatments. In the context of analysis of variance the sum of squares for treatments is a summary measure of the variation between treatments and could be the basis of a test of the overall null hypothesis that all treatments have identical effect, i.e. that the response obtained on any unit is unaffected by the particular treatment assigned to it. Such a null hypothesis is, however, very rarely of concern and therefore the sum of squares for treatments is of importance primarily in connection with the computation of the residual sum of squares, the basis for estimating the error variance.

It is, however, important to note that the treatment sum of squares can be decomposed into components corresponding to comparisons of the individual effects and this we now develop.

3.5.2 Contrasts

Recall from Section 2.2.6 that if the treatment parameters are denoted by τ1, . . . , τv a linear combination Lτ = Σlj τj is called a treatment contrast if Σlj = 0. The contrast Lτ is estimated in the randomized block design by

Lτ = Σj lj Yj., (3.34)

where Yj. is the mean response on the jth treatment, averaged over blocks. Equivalently we can write

Lτ = Σj,s lj Yjs/r, (3.35)


where the sum is over individual observations and r is the number of replications of each treatment.

Under the linear model (3.18) and the second moment assumption,

E(Lτ) = Lτ, var(Lτ) = σ²Σj lj²/r. (3.36)

We now define the sum of squares with one degree of freedom associated with Lτ to be

SSL = rLτ²/Σlj². (3.37)

This definition is in some ways most easily recalled by noting that Lτ is a linear combination of responses, and hence SSL is the squared length of the orthogonal projection of the observation vector onto the vector whose components are determined by l.

The following properties are derived directly from the definitions:

1. E(Lτ) = Lτ and is zero if and only if the population contrast is zero.

2. E(SSL) = σ² + rLτ²/Σlj².

3. Under the normal theory assumption SSL is proportional to a noncentral chi-squared random variable with one degree of freedom, reducing to the central chi-squared form if and only if Lτ = 0.

4. The square of the Student t statistic for testing the null hypothesis Lτ = 0 is the analysis of variance F statistic for comparing SSL with the residual mean square.

In applications the Student t form is to be preferred to its square, partly because it preserves the information in the sign and more importantly because it leads to the determination of confidence limits.
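The identity in property 4 can be illustrated numerically (a Python sketch with hypothetical treatment means and residual mean square):

```python
import math

# Hypothetical treatment means from r = 4 replicates, a contrast l
# with coefficients summing to zero, and a residual mean square s^2.
tmean = [10.2, 11.8, 12.1]
l = [-1, 0, 1]
r = 4
s2 = 0.36

l_hat = sum(lj * m for lj, m in zip(l, tmean))
ss_l = r * l_hat ** 2 / sum(lj ** 2 for lj in l)       # equation (3.37)

# Student t statistic for the hypothesis L_tau = 0, and the
# corresponding analysis of variance F statistic.
t = l_hat / math.sqrt(s2 * sum(lj ** 2 for lj in l) / r)
f = ss_l / s2
assert abs(t ** 2 - f) < 1e-9   # t^2 = F, identically
print(round(l_hat, 3), round(ss_l, 3), round(t, 3))
```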

3.5.3 Mutually orthogonal contrasts

Several contrasts Lτ(1), Lτ(2), . . . are called mutually orthogonal if for all p ≠ q

Σj lj(p) lj(q) = 0. (3.38)

Note that under the normal theory assumption the estimates of orthogonal contrasts are independent. The corresponding Student t statistics are not quite independent because of the use of a common estimate of σ², although this is a minor effect unless the residual degrees of freedom are very small.


Now suppose that there is a complete set of v − 1 mutually orthogonal contrasts. Then by forming an orthogonal transformation of Y1., . . . , Yv. from (1/√v, . . . , 1/√v) and the normalized contrast vectors, it follows that

rΣj(Yj. − Y..)² = SSLτ(1) + . . . + SSLτ(v−1), (3.39)

that is the treatment sum of squares has been decomposed into single degrees of freedom.

Further if there is a smaller set of v1 < v − 1 mutually orthogonal contrasts, then the treatment sum of squares can be decomposed into

Selected individual contrasts     v1
Remainder                         v − 1 − v1
Total for treatments              v − 1

In this analysis comparison of the mean square for the remainder term with the residual mean square tests the hypothesis that all treatment effects are accounted for within the space of the v1 identified contrasts. Thus with six treatments and the single degree of freedom contrasts identified by

Lτ(1) = (τ1 + τ2)/2 − τ3, (3.40)

Lτ(2) = (τ1 + τ2 + τ3)/3 − (τ4 + τ5 + τ6)/3, (3.41)

we have the partition

Lτ(1)                    1
Lτ(2)                    1
Remainder                3
Total for treatments     5

The remainder term could be divided further, perhaps most naturally initially into a contrast of τ1 with τ2 and a comparison with two degrees of freedom among the last three treatments.

The orthogonality of the contrasts is required for the simple decomposition of the sum of squares. Subject-matter relevance of the comparisons of course overrides mathematical simplicity and it may be unavoidable to look at nonorthogonal comparisons.

We have in this section used notation appropriate to partitioning the treatment sums of squares in a randomized block design, but the same ideas apply directly to more general settings, with Yj. above replaced by the average of all observations on the jth treatment, and r replaced by the number of replications of each treatment. When in Chapter 5 we consider more complex treatments defined by factors exactly the same analysis can be applied to interactions.

3.5.4 Equally spaced treatment levels

A particularly important special case arises when treatments are defined by levels of a quantitative variable, often indeed by equally spaced values of that variable. For example a dose might be set at four levels defined by log dose = 0, 1, 2, 3 on some suitable scale, or a temperature might have three levels defined by temperatures of 30, 40, 50 degrees Celsius, and so on.

We now discuss the partitioning of the sums of squares for such a quantitative treatment into orthogonal components, corresponding to regression on that variable. It is usual, and sensible, with quantitative factors at equally spaced levels, to use contrasts representing linear, quadratic, cubic, . . . dependence of the response on the underlying variable determining the factor levels. Tables of these contrasts are widely available and are easily constructed from first principles via orthogonal polynomials, i.e. via Gram-Schmidt orthogonalization of 1, x, x², . . . . For a factor with three equally spaced levels, the linear and quadratic contrasts are

−1  0  1
 1 −2  1

and for one with four equally spaced levels, the linear, quadratic and cubic contrasts are

−3 −1  1  3
 1 −1 −1  1
−1  3 −3  1

The sums of squares associated with these can be compared with the appropriate residual sum of squares. In this way some notion of the shape of the dependence of the response on the variable defining the factor can be obtained.
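The Gram-Schmidt construction can be sketched as follows (Python, with exact rational arithmetic; each contrast is rescaled to smallest integer entries):

```python
import math
from fractions import Fraction

def orth_poly_contrasts(levels, degree):
    """Gram-Schmidt orthogonalization of 1, x, x^2, ... over the
    given factor levels, dropping the constant term."""
    xs = [Fraction(x) for x in levels]
    done = []
    for k in range(degree + 1):
        vec = [x ** k for x in xs]
        for u in done:   # subtract projections onto earlier vectors
            coef = sum(a * b for a, b in zip(vec, u)) / sum(b * b for b in u)
            vec = [a - coef * b for a, b in zip(vec, u)]
        done.append(vec)
    out = []
    for vec in done[1:]:   # rescale each contrast to integer entries
        lcm = 1
        for f in vec:
            lcm = lcm * f.denominator // math.gcd(lcm, f.denominator)
        ints = [int(f * lcm) for f in vec]
        g = 0
        for i in ints:
            g = math.gcd(g, abs(i))
        out.append([i // g for i in ints])
    return out

print(orth_poly_contrasts([1, 2, 3], 2))     # [[-1, 0, 1], [1, -2, 1]]
print(orth_poly_contrasts([1, 2, 3, 4], 3))  # recovers the table above
```

Applied to the unequally spaced potash levels of Section 3.5.5 the same construction gives a linear contrast proportional to the exact contrast (−2, −1.23, −0.46, 1.08, 2.61) quoted there.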


3.5.5 Example 3.4 continued

In this example the treatments were defined by increasing levels of potash, in pounds per acre. The levels used were 36, 54, 72, 108 and 144. Of interest is the shape of the dependence of strength on level of potash; there is some indication in Table 3.1 of a levelling off or decrease of response at the highest level of potash.

These levels are not equally spaced, so the orthogonal polynomials of the previous subsection are not exactly correct for extracting linear, quadratic, and other components. The most accurate way of partitioning the sums of squares for treatments is to use regression methods or equivalently to construct the appropriate orthogonal polynomials from first principles. We will illustrate here the use of the usual contrasts, as the results are much the same.

The coefficients for the linear contrast with five treatment levels are (−2, −1, 0, 1, 2), and the sum of squares associated with this contrast is SSlin = 3(−1.34)²/10 = 0.5387. The nonlinear contribution to the treatment sum of squares is thus just 0.1938 on three degrees of freedom, which indicates that the suggestion of nonlinearity in the response is not significant. The quadratic component, defined by the contrast (2, −1, −2, −1, 2), has an associated sum of squares of 0.0440.
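The calculation can be checked from the treatment means (a Python sketch; carrying the means to full accuracy reproduces the quoted values):

```python
# Treatment (column) means of Table 3.1 to full accuracy, r = 3 blocks.
tmean = [23.55 / 3, 24.16 / 3, 23.23 / 3, 22.54 / 3, 22.35 / 3]
r = 3

def contrast_ss(l, means, r):
    """Contrast estimate and its sum of squares, equation (3.37)."""
    l_hat = sum(lj * m for lj, m in zip(l, means))
    return l_hat, r * l_hat ** 2 / sum(lj ** 2 for lj in l)

l_lin, ss_lin = contrast_ss([-2, -1, 0, 1, 2], tmean, r)
l_quad, ss_quad = contrast_ss([2, -1, -2, -1, 2], tmean, r)
print(round(l_lin, 2), round(ss_lin, 4), round(ss_quad, 4))
# -1.34 0.5387 0.044
```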

If we use the contrast exactly appropriate for a linear regression, which has entries proportional to

(−2,−1.23,−0.46, 1.08, 2.61),

we obtain the same conclusion.

With more extensive similar data, or with various sets of similar data, it would probably be best to fit a nonlinear model consistent with general subject-matter knowledge, for example an exponential model rising to an asymptote. Fitting such a model across various sets of data should be helpful for the comparison and synthesis of different studies.

3.6 Retrospective adjustment for improving precision

In Section 3.1 we reviewed various ways of improving precision and in Sections 3.2 and 3.3 developed the theme of comparing like with like via blocking the experimental units into relatively homogeneous sets, using baseline information. We now turn to a second use of baseline information. Suppose that on each experimental unit there is a vector z of variables, either quantitative or indicators of qualitative groupings, and that this information has either not been used in forming blocks or at least has been only partly used.

There are three rather different situations. The importance of z may have been realized only retrospectively, for example by an investigator different from the one involved in design. It may have been more important to block on features other than z; this is especially relevant when a large number of baseline features is available. Thirdly, any use of z to form blocks is qualitative, and it may be that quantitative use of z, instead of or as well as its use to form blocks, may add sensitivity.

Illustrations. In many clinical trials there will be a large number of baseline features available at the start of a trial and the practicalities of randomization may restrict blocking to one or two key features such as gender and age or gender and initial severity. In an animal experiment comparing diets, blocks could be formed from animals of the same gender and roughly the same initial body weight but, especially in small experiments, appreciable variation in initial body weight might remain within blocks.

Values of z can be used to test aspects of unit-treatment additivity, in effect via tests of parallelism, but here we concentrate on precision improvement. The formal statistical procedures of introducing regression on z into a model have appeared in slightly different guise in Section 2.3 as techniques for retrospective bias removal and will not be repeated. In fact what from a design perspective is random error can become bias at the stage of analysis, when conditioning on relevant baseline features is appropriate. It is therefore not surprising that the same statistical technique reappears.

Illustration. A group of animals with roughly equal numbers of males and females is randomized between two treatments T and C regardless of gender. It is then realized that there are substantially more males than females in T. From an initial design perspective this is a random fluctuation: it would not persist in a similar large study. On the other hand once the imbalance is observed, unless it can be dismissed as irrelevant or unimportant, it is a potential source of bias and is to be removed by rerandomizing or, if it is too late for that, by appropriate analysis. This aspect is connected with some difficult conceptual issues about randomization; see Section 2.4.

This discussion raises at least two theoretical issues. The first concerns the possible gains from using a single quantitative baseline variable both as a basis for blocking and after that also as a basis for an adjustment. It can be shown that only when the correlation between baseline feature and response is very high is this double use of it likely to lead to a substantial improvement in final precision.

Suppose now that there are baseline features that cannot reasonably be controlled by blocking and that they are controlled by a regression adjustment. Is there any penalty associated with adjusting unnecessarily?

To study this consider first an experiment to compare two treatments, with r replicates of each. After adjustment for the q × 1 vector of baseline variables, z, the variance of the estimated difference between the treatments is

$$\mathrm{var}(\hat\tau_T - \hat\tau_C) = \sigma^2\{2/r + (\bar z_{T.} - \bar z_{C.})^T R_{zz}^{-1}(\bar z_{T.} - \bar z_{C.})\}, \qquad (3.42)$$

where σ² is the variance per observation residual to regression on z and to any blocking system used, $\bar z_{T.}$, $\bar z_{C.}$ are the treatment mean vectors and $R_{zz}$ is the matrix of sums of squares and cross-products of z within treatments, again eliminating any block effects.

Now if treatment assignment is randomized

$$E_R(R_{zz}/d_w) = \Omega_{zz}, \qquad (3.43)$$

where $d_w$ is the degrees of freedom of the residual sum of squares in the analysis of variance table, and $\Omega_{zz}$ is a finite population covariance matrix of the unit constants within blocks. With v = 2 we have

$$E_R(\bar z_T - \bar z_C) = 0, \qquad E_R\{(\bar z_T - \bar z_C)(\bar z_T - \bar z_C)^T\} = 2\Omega_{zz}/r. \qquad (3.44)$$

Now

$$\tfrac{1}{2}r(\bar z_T - \bar z_C)^T \Omega_{zz}^{-1}(\bar z_T - \bar z_C) = \tfrac{1}{2}r\|\bar z_T - \bar z_C\|^2_{\Omega_{zz}} = W_q, \qquad (3.45)$$

say, has expectation q and approximately a chi-squared distribution with q degrees of freedom.

That is, approximately

$$\mathrm{var}(\hat\tau_T - \hat\tau_C) = \frac{2\sigma^2}{r}(1 + W_q/d_w), \qquad (3.46)$$

where $W_q$ denotes a random variable depending on the outcome of the randomization and having approximately a chi-squared distribution with q degrees of freedom.

More generally if there are v treatments each replicated r times

$$\mathrm{ave}_{j \neq l}\,\mathrm{var}(\hat\tau_j - \hat\tau_l) = \sigma^2[2/r + 2/\{r(v-1)\}\,\mathrm{tr}(B_{zz}R_{zz}^{-1})], \qquad (3.47)$$

where $B_{zz}$ is the matrix of sums of squares and products between treatments and tr(A) denotes the trace of the matrix A, i.e. the sum of the diagonal elements.

The simplest interpretation of this is obtained by replacing $W_q$ by its expectation, and by supposing that the number of units n is large compared with the number of treatments and blocks, so that $d_w \sim n$. Then the variance of an estimated treatment difference is approximately

$$\frac{2\sigma^2}{r}\Bigl(1 + \frac{q}{n}\Bigr). \qquad (3.48)$$

The inflation factor relative to the randomized block design is approximately n/(n − q), leading to the conclusion that every unnecessary parameter fitted, i.e. adjustment made without reduction in the effective error variance per unit, σ², is equivalent to the loss of one experimental unit.

This conclusion is in some ways oversimplified, however, not only because of the various approximations in its derivation. First, in a situation such as a clinical trial with a potentially large value of q, adjustments would be made selectively in a way depending on the apparent reduction of error variance achieved. This makes assessment more difficult but the inflation would probably be rather more than that based on q₀, the dimension of the z actually used, this being potentially much less than q, the number of baseline features available.

The second point is that the variance inflation, which arises because of the nonorthogonality of treatments and regression analyses in the least squares formulation, is a random variable depending on the degree of imbalance in the configuration actually used. Now if this imbalance can be controlled by design, for example by rerandomizing until the value of $W_q$ is appreciably smaller than its expectation, the consequences for variance inflation are reduced and possibly, but not necessarily, the need to adjust obviated. If, however, such control at the design stage is not possible, the average inflation may be a poor guide. It is unlikely though that the inflation will be more, for small ε, than

$$(1 + W_{q,\epsilon}/n), \qquad (3.49)$$

where $W_{q,\epsilon}$ is the upper ε point of the randomization distribution of $W_q$, approximately a chi-squared distribution with q degrees of freedom.

For example, with ε = 0.01 and q = 10 it will be unlikely that there is more than a 10 per cent inflation if n > 230, as compared with n > 100 suggested by the analysis based on properties averaged over the randomization distribution. Note that when the unadjusted and adjusted effects differ immaterially, simplicity of presentation may favour the former.
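The n > 230 versus n > 100 comparison can be reproduced with a short sketch. This is not from the book: it uses the Wilson-Hilferty approximation to the chi-squared upper point (via the standard normal quantile in the Python standard library) rather than exact tables, so the resulting threshold is approximate.

```python
from statistics import NormalDist

def chi2_upper(q, eps):
    """Approximate upper-eps point of chi-squared with q degrees of
    freedom (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1 - eps)
    return q * (1 - 2 / (9 * q) + z * (2 / (9 * q)) ** 0.5) ** 3

q, eps, inflation = 10, 0.01, 0.10
# Sample size needed so that W_{q,eps}/n stays below the inflation bound,
# versus the bound based on the expectation E(W_q) = q.
n_worst = chi2_upper(q, eps) / inflation
n_avg = q / inflation
print(round(n_worst), n_avg)
```

The worst-case bound comes out at roughly 232 here, consistent with the "n > 230" figure in the text, against n > 100 from the averaged analysis.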

A final point concerns the possible justification of the adjusted analysis based on randomization and the assumption of unit-treatment additivity. Such a justification is usually only approximate but can be based on an approximate conditional distribution regarding, in the simplest case of just two treatments, $\bar z_T - \bar z_C$ as fixed.

3.7 Special models of error variation

In this chapter we have emphasized methods of error control by blocking which, combined with randomization, aim to increase the precision of estimated treatment contrasts without strong special assumptions about error structure. That is, while the effectiveness of the methods in improving precision depends on the way in which the blocks are formed, and hence on prior knowledge, the validity of the designs and the associated standard errors does not do so.

Sometimes, however, especially in relatively small experiments in which the experimental units are ordered in time or systematically arrayed in space, a special stochastic model may reasonably be used to represent the error variation. Then there is the possibility of using a design that exploits that model structure. However, usually the associated method of analysis based on that model will not have a randomization justification and we will have to rely more strongly on the assumed model than for the designs discussed in this chapter.

When the experimental units are arranged in time the two main types of variation are a trend in time supplemented by totally random variation, and a stationary time series representation. The latter is most simply formulated via a low order autoregression.


For spatial problems there are similar, rather more complex representations. Because the methods of design and analysis associated with these models are more specialized we defer their discussion to Chapter 8.

3.8 Bibliographic notes

The central notions of blocking and of adjustment for baseline variables are part of the pioneering contributions of Fisher (1935), although the qualitative ideas, especially of the former, have a long history. The relation between the adjustment process and randomization theory was discussed by Cox (1982). See also the Bibliographic notes to Chapter 2. For the relative advantages of blocking and adjustment via a baseline variable, see Cox (1957).

The example in Section 3.4 is from Cochran and Cox (1958, Chapter 3), and the partitioning of the treatment sum of squares follows closely their discussion. The analysis of matched pairs and randomized blocks from the linear model is given in most books on design and analysis; see, for example, Montgomery (1997, Chapters 2 and 5) and Dean and Voss (1999, Chapter 10). The randomization analysis is given in detail in Hinkelmann and Kempthorne (1994, Chapter 9), as is the estimation of the efficiency of the randomized block design, following an argument attributed to Yates (1937).

3.9 Further results and exercises

1. Under what circumstances would it be reasonable to have a randomized block experiment in which each treatment occurred more than once, say, for example, twice, in each block, i.e. in which the number of units per block is twice the number of treatments? Set out the analysis of variance table for such a design and discuss what information is available that cannot be examined in a standard randomized block design.

2. Suppose in a matched pair design the responses are binary. Construct the randomization test for the null hypothesis of no treatment difference. Compare this with the test based on that for the binomial model, where ∆ is the log odds-ratio. Carry out a similar comparison for responses which are counts of numbers of occurrences of point events modelled by the Poisson distribution.


3. Consider the likelihood analysis under the normal theory assumptions of the modified matched pair design of Section 3.3.2. There are r matched pairs, $m_T$ pairs in which both units receive T and $m_C$ pairs in which both units receive C; we assume a common treatment difference applies throughout. We transform the original pairs of responses to sums and differences as in Section 3.3.1.

(a) Show that r of the differences have mean ∆, and that $m_T + m_C$ of them have mean zero, all differences being independently normally distributed with variance $\tau_D$, say.

(b) Show that independently of the differences the sums are independently normally distributed with variance $\tau_S$, say, with r having mean ν, say, $m_T$ having mean ν + δ and $m_C$ having mean ν − δ, where ∆ = 2δ.

(c) Hence show that minimal sufficient statistics are (i) the least squares estimate of ν from the sums; (ii) the least squares estimate $\Delta_S$ of ∆ from the unmatched pairs, i.e. the difference of the means of the $m_T$ and $m_C$ pairs; (iii) the estimate $\Delta_D$ from the matched pairs; (iv) a mean square $\mathrm{MS}_D$ with $d_D = r - 1 + m_T + m_C$ degrees of freedom estimating $\tau_D$ and (v) a mean square $\mathrm{MS}_S$ with $d_S = r - 2 + m_T + m_C$ degrees of freedom estimating $\tau_S$. This shows that the system is a (5, 4) curved exponential family.

(d) Without developing a formal connection with randomization theory, note that complete randomization of pairs to the three groups would give some justification to the strong homogeneity assumptions involved in the above. How would such homogeneity be examined from the data?

(e) Show that a log likelihood function obtained by ignoring (i) and using the known densities of the four remaining statistics is

$$-\tfrac{1}{2}\log\tau_S - \frac{m(\Delta_S - \Delta)^2}{2\tau_S} - \tfrac{1}{2}\log\tau_D - \frac{r(\Delta_D - \Delta)^2}{2\tau_D} - \tfrac{1}{2}d_D\log\tau_D - \frac{d_D\,\mathrm{MS}_D}{2\tau_D} - \tfrac{1}{2}d_S\log\tau_S - \frac{d_S\,\mathrm{MS}_S}{2\tau_S},$$

where $1/m = 1/m_T + 1/m_C$.

(f) Hence show, possibly via some simulated data, that only in quite small samples will the profile likelihood for ∆ differ appreciably from that corresponding to a weighted combination of the two estimates of ∆, replacing the variances and theoretically optimal weights by sample estimates and calculating confidence limits via the Student t distribution with effective degrees of freedom

$$d = (r\,\mathrm{MS}_S + m\,\mathrm{MS}_D)^2(r^2\mathrm{MS}_S^2/d_D + m^2\mathrm{MS}_D^2/d_S)^{-1}.$$

For somewhat related calculations, see Cox (1984b).

4. Suppose that n experimental units are arranged in sequence in time and that there is prior evidence that the errors are likely to be independent and identically distributed initially with mean zero, except that at some as yet unknown point there is likely to be a shift in mean error. What design would be appropriate for the comparison of v treatments? After the experiment is completed and the responses obtained it is found that the discontinuity has indeed occurred. Under the usual linear assumptions what analysis would be suitable if

(a) the position of the discontinuity can be determined without error from supplementary information,

(b) the position of the discontinuity is regarded as an unknown parameter.


CHAPTER 4

Specialized blocking techniques

4.1 Latin squares

4.1.1 Main ideas

In some fields of application it is quite common to have two different qualitative criteria for grouping units into blocks, the two criteria being cross-classified. That is, instead of the units being conceptually grouped into sets or blocks, arbitrarily numbered in each block, it may be reasonable to regard the units as arrayed in a two-dimensional way.

Illustrations. Plots in an agricultural trial may be arranged in a square or rectangular array in the field, with both rows and columns likely to represent systematic sources of variation. The shapes of individual plots will be determined by technological considerations such as ease of harvesting.

In experimental psychology it is common to expose each subject to a number of different conditions (treatments) in sequence in each experimental session. Thus with v conditions used per subject per session, we may group the subjects into sets of v. For each set of v subjects the experimental units, i.e. subject-period combinations, form a v × v array, with potentially important sources of variation both between subjects and between periods. In such experiments where the same individual is used as a unit more than once, the assumption that the response on one unit depends only on the treatment applied to that unit may need close examination.

In an industrial process with similar machines in parallel it may be sensible to regard machines and periods as the defining features of a two-way classification of the units.

The simplest situation arises with v treatments and two blocking criteria, each with v levels. The experimental units are arranged in one or more v × v squares. Then the principles of comparing like with like and of randomization suggest using a design with each treatment once in each row and once in each column, and choosing a design at random subject to those two constraints. Such a design is called a v × v Latin square.

An example of a 4 × 4 Latin square after randomization is

T4 T2 T3 T1
T2 T4 T1 T3
T1 T3 T4 T2
T3 T1 T2 T4          (4.1)

In an application in experimental psychology the rows might correspond to subjects and the columns to periods within an experimental session. The arrangement ensures that constant differences between subjects or between periods affect all treatments equally and thus do not induce error in estimated treatment contrasts.

Randomization is most simply achieved by starting with a square in some standardized form, for example corresponding to cyclic permutation:

A B C D
B C D A
C D A B
D A B C          (4.2)

and then permuting the rows at random, permuting the columns at random, and finally assigning the letters A, . . . , D to treatments at random. The last step is unnecessary to achieve agreement between the randomization model and the standard linear model, although it probably ensures a better match between normal theory and randomization-based confidence limits; we do not, however, know of specific results on this issue. It would, for the smaller squares at least, be feasible to choose at random from all Latin squares of the given size, yielding an even richer randomization set.
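The randomization procedure just described can be sketched directly; the following is an illustrative implementation, not from the book, starting from the cyclic standardized square and applying random row, column and label permutations.

```python
import random

def randomized_latin_square(v, seed=None):
    """Randomize a cyclic v x v Latin square by permuting rows and
    columns at random and relabelling the v treatments at random."""
    rng = random.Random(seed)
    rows = rng.sample(range(v), v)    # random row permutation
    cols = rng.sample(range(v), v)    # random column permutation
    labels = rng.sample(range(v), v)  # random treatment relabelling
    # Cell (s, t) of the cyclic square holds symbol (s + t) mod v.
    return [[labels[(rows[s] + cols[t]) % v] for t in range(v)]
            for s in range(v)]

square = randomized_latin_square(4, seed=1)
# The Latin property is preserved: each treatment appears exactly
# once in every row and every column.
for line in square:
    assert sorted(line) == list(range(4))
for t in range(4):
    assert sorted(row[t] for row in square) == list(range(4))
```

Permuting rows and columns of a Latin square and relabelling its symbols always yields another Latin square, which is why the checks at the end pass for any seed.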

On the basis again that constraints on the design are to be reflected by parameters in the physical linear model, the default model for the analysis of a Latin square is as follows. In row s and column t let the treatment be $T_{j_{st}}$. Then

$$Y_{j_{st}st} = \mu + \tau_j + \beta_s + \gamma_t + \epsilon_{jst}, \qquad (4.3)$$

where on the right-hand side we have abbreviated $j_{st}$ to j and where the assumptions about $\epsilon_{jst}$ are as usual either second moment assumptions or normal theory assumptions. Note especially that on the right-hand side of (4.3) the suffices j, s, t do not range freely. The least-squares estimate of the contrast $\sum l_j \tau_j$ is

$$\sum l_j \bar Y_{j..}, \qquad (4.4)$$

where $\bar Y_{j..}$ is the mean of the responses on $T_j$. Further

$$\widehat{\mathrm{var}}(\sum l_j \bar Y_{j..}) = \sum l_j^2 s^2/v, \qquad (4.5)$$

where

$$s^2 = \sum(Y_{jst} - \bar Y_{j..} - \bar Y_{.s.} - \bar Y_{..t} + 2\bar Y_{...})^2/\{(v-1)(v-2)\}. \qquad (4.6)$$

The justification can be seen most simply from the following decompositions:

1. For the observations

$$Y_{jst} = \bar Y_{...} + (\bar Y_{j..} - \bar Y_{...}) + (\bar Y_{.s.} - \bar Y_{...}) + (\bar Y_{..t} - \bar Y_{...}) + (Y_{jst} - \bar Y_{j..} - \bar Y_{.s.} - \bar Y_{..t} + 2\bar Y_{...}). \qquad (4.7)$$

2. For the sums of squares

$$\sum Y_{jst}^2 = \sum \bar Y_{...}^2 + \sum(\bar Y_{j..} - \bar Y_{...})^2 + \sum(\bar Y_{.s.} - \bar Y_{...})^2 + \sum(\bar Y_{..t} - \bar Y_{...})^2 + \sum(Y_{jst} - \bar Y_{j..} - \bar Y_{.s.} - \bar Y_{..t} + 2\bar Y_{...})^2. \qquad (4.8)$$

3. For the degrees of freedom

$$v^2 = 1 + (v-1) + (v-1) + (v-1) + (v-1)(v-2). \qquad (4.9)$$
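The exactness of the sum of squares decomposition (4.8) for a Latin square rests on the mutual orthogonality of rows, columns and treatments, and is easy to confirm numerically. A minimal check, not from the book, on arbitrary simulated responses for a cyclic square:

```python
import random

v = 5
rng = random.Random(0)

def j_of(s, t):
    """Treatment in row s, column t of a cyclic v x v Latin square."""
    return (s + t) % v

# Arbitrary simulated responses.
Y = [[rng.gauss(0, 1) for t in range(v)] for s in range(v)]

grand = sum(sum(row) for row in Y) / v**2
row_m = [sum(Y[s]) / v for s in range(v)]
col_m = [sum(Y[s][t] for s in range(v)) / v for t in range(v)]
trt_m = [sum(Y[s][t] for s in range(v) for t in range(v)
             if j_of(s, t) == j) / v for j in range(v)]

ss_total = sum(Y[s][t] ** 2 for s in range(v) for t in range(v))
ss_mean = v**2 * grand**2
ss_rows = v * sum((m - grand) ** 2 for m in row_m)
ss_cols = v * sum((m - grand) ** 2 for m in col_m)
ss_trt = v * sum((m - grand) ** 2 for m in trt_m)
ss_res = sum((Y[s][t] - trt_m[j_of(s, t)] - row_m[s] - col_m[t]
              + 2 * grand) ** 2 for s in range(v) for t in range(v))

# Decomposition (4.8): total = mean + treatments + rows + columns + residual.
assert abs(ss_total - (ss_mean + ss_trt + ss_rows + ss_cols + ss_res)) < 1e-9
# s^2 of (4.6) is the residual mean square on (v-1)(v-2) degrees of freedom.
s2 = ss_res / ((v - 1) * (v - 2))
```

The residual degrees of freedom, (v − 1)(v − 2) = 12 here, agree with the count in (4.9).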

The second moment properties under the randomization distribution are established as before. From this point of view the assumption underlying the Latin square design, namely that of unit-treatment additivity, is identical with that for completely randomized and randomized block designs. In Chapter 6 we shall see other interpretations of a Latin square as a fractional factorial experiment and there strong additional assumptions will be involved.

Depending on the precision to be achieved it may be necessary to form the design from several Latin squares. Typically they would be independently randomized and the separate squares kept separate in doing the experiment, the simplest illustration of what is called a resolvable design. This allows the elimination of row and column effects separately within each square and also permits the analysis of each square separately, as well as an analysis of the full set of squares together. Another advantage is that if there is a defect in the execution of the experiment within one square, that square can be omitted from the analysis.


4.1.2 Graeco-Latin squares

The Latin square design introduces a cross-classification of experimental units, represented by the rows and columns in the design arrangement, and a single set of treatments. From a combinatorial viewpoint there are three classifications on an equal footing, and we could, for example, write out the design labelling the new rows by the original treatments and inserting in the body of the square the original row numbering. In Chapter 6 we shall use the Latin square in this more symmetrical way.

We now discuss a development which is sometimes of direct use in applications but which is also important in connection with other designs. We introduce a further classification of the experimental units, also with v levels; it is convenient initially to denote the levels by letters of the Greek alphabet, using the Latin alphabet for the treatments. We require that the Greek letters also form a Latin square and further that each combination of a Latin and a Greek letter occurs together in the same cell just once. The resulting configuration is called a Graeco-Latin square. Combinatorially the four features, rows, columns and the two alphabets, are on the same footing.

Illustration. In an industrial experiment comparing five treatments, suppose that the experiment is run for five days with five runs per day in sequence, so that a 5 × 5 Latin square is a suitable design. Suppose now that five different but nominally similar machines are available for further processing of the material. It would then be reasonable to use the five further machines in a Latin square configuration and to require that each further machine is used once in combination with each treatment.

Table 4.1 shows an example of a Graeco-Latin square design before randomization, with treatments labelled A, . . . , E and machines α, . . . , ε.

If the design is randomized as before, the analysis can be based on an assumed linear model or on randomization together with unit-treatment additivity. If $Y_{l_{st}g_{st}st}$ denotes the observation on row s, column t, Latin letter $l_{st}$ and Greek letter $g_{st}$, then the model that generates the results corresponding to randomization theory has

$$E(Y_{lgst}) = \mu + \tau_l + \nu_g + \beta_s + \gamma_t, \qquad (4.10)$$

where for simplicity we have abandoned the suffices on l and g.


Table 4.1 A 5 × 5 Graeco-Latin square.

Aα Bβ Cγ Dδ Eε
Bγ Cδ Dε Eα Aβ
Cε Dα Eβ Aγ Bδ
Dβ Eγ Aδ Bε Cα
Eδ Aε Bα Cβ Dγ

Again the model is formed in accordance with the principle that effects balanced out by design are represented by parameters in the model, even though the effects may be of no intrinsic interest.

In the decomposition of the data vector the residual terms have the form

$$Y_{lgst} - \bar Y_{l...} - \bar Y_{.g..} - \bar Y_{..s.} - \bar Y_{...t} + 3\bar Y_{....}, \qquad (4.11)$$

and the sum of squares of these forms the residual sum of squares from which the error variance is estimated.
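The defining property of Table 4.1 — two superimposed Latin squares with every Latin-Greek pair occurring exactly once — can be checked mechanically. A minimal sketch, transcribing the table directly:

```python
from itertools import product

# Transcription of Table 4.1: each cell is a Latin-Greek pair.
table = ["Aα Bβ Cγ Dδ Eε",
         "Bγ Cδ Dε Eα Aβ",
         "Cε Dα Eβ Aγ Bδ",
         "Dβ Eγ Aδ Bε Cα",
         "Eδ Aε Bα Cβ Dγ"]
cells = [row.split() for row in table]
latin = [[c[0] for c in row] for row in cells]
greek = [[c[1] for c in row] for row in cells]

# Each alphabet separately forms a Latin square ...
for sq in (latin, greek):
    for row in sq:
        assert len(set(row)) == 5
    for t in range(5):
        assert len({row[t] for row in sq}) == 5

# ... and each Latin-Greek combination occurs in exactly one cell.
pairs = {(latin[s][t], greek[s][t])
         for s, t in product(range(5), range(5))}
assert len(pairs) == 25
```

Since there are 25 cells and 25 possible pairs, the final set-size check is equivalent to each pair occurring exactly once.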

4.1.3 Orthogonal Latin squares

Occasionally it may be useful to add yet further alphabets to the above system. Moreover the system of arrangements that corresponds to such addition is of independent interest in connection with the generation of further designs.

It is not hard to show that for a v × v system there can be at most (v − 1) alphabets such that any pair of letters from two alphabets occur together just once. When v is a prime power, $v = p^m$, a complete set of (v − 1) orthogonal squares can be constructed; this result follows from some rather elegant Galois field theory briefly described in Appendix B. Table 4.2 shows as an example such a system for v = 5. It is convenient to abandon the use of letters and to label rows, columns and the letters of the various alphabets 0, . . . , 4. If such an arrangement, or parts of it, are used directly then rows, columns and the names of the letters of the alphabets would be randomized.

For the values of v likely to arise in applications, say 3 ≤ v ≤ 12, a complete orthogonal set exists for the primes and prime powers, namely all numbers except 6, 10 and 12. Remarkably, for v = 6 there does not exist even a Graeco-Latin square. For v = 10 and 12 such squares do exist: a pair of 10 × 10 orthogonal Latin squares was first constructed by Bose, Shrikhande and Parker (1960). At the time of writing it is not known whether a 10 × 10 square exists with more than two alphabets.

Table 4.2 Complete orthogonal set of 5 × 5 Latin squares.

0000 1234 2413 3142 4321
1111 2340 3024 4203 0432
2222 3401 4130 0314 1043
3333 4012 0241 1420 2104
4444 0123 1302 2031 3210
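For prime v the complete set can be generated without Galois field machinery: the squares $L_m[s][t] = (s + mt) \bmod v$ for m = 1, . . . , v − 1 are mutually orthogonal. The sketch below (an illustration, not the book's construction, which covers prime powers as well) builds and checks the set for v = 5.

```python
def mols(v):
    """For prime v, the squares L_m[s][t] = (s + m*t) mod v,
    m = 1, ..., v-1, form a complete set of v-1 mutually
    orthogonal Latin squares."""
    return [[[(s + m * t) % v for t in range(v)] for s in range(v)]
            for m in range(1, v)]

def orthogonal(a, b):
    """Two squares are orthogonal if superimposing them gives
    every ordered symbol pair exactly once."""
    v = len(a)
    pairs = {(a[s][t], b[s][t]) for s in range(v) for t in range(v)}
    return len(pairs) == v * v

squares = mols(5)
assert len(squares) == 4  # attains the (v - 1) upper bound
assert all(orthogonal(squares[i], squares[j])
           for i in range(4) for j in range(i + 1, 4))
```

Orthogonality holds because for m ≠ m′ the map (s, t) → (s + mt, s + m′t) mod v is a bijection when v is prime; for v a prime power but not prime, field rather than modular arithmetic is required.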

4.2 Incomplete block designs

4.2.1 General remarks

In the discussion so far the number of units per block has been assumed equal to the number of treatments, with an obvious extension if some treatments, for example a control, are to receive additional replication. Occasionally there may be some advantage in having the number of units per block equal to, say, twice the number of treatments, so that each treatment occurs twice in each block. A more common possibility, however, is that the number v of distinct treatments exceeds the most suitable choice for k, the number of units per block. This may happen because there is a firm constraint on the number of units per block, for example to k = 2 in a study involving twins. If blocks are formed on the basis of one day's work there will be some flexibility over the value of k, although ultimately an upper bound. In other cases there may be no firm restriction on k, but the larger k the more heterogeneous the blocks and hence the greater the effective value of the error variance σ².

We therefore consider how a blocking system can be implemented when the number of units per block is less than v, the number of treatments. For simplicity we suppose that all treatments are replicated the same number r of times. The total number of experimental units is thus n = rv and because this is also bk, where b is the number of blocks, we have

rv = bk. (4.12)

For given r, v, k it is necessary that b defined by this equation is an integer in order for a design to exist.

In this discussion we ignore any structure in the treatments, considering in particular all pairwise contrasts between $T_1, \ldots, T_v$ to be of equal interest.

One possible design would be to randomize the allocation subject only to equal replication, and to adjust for the resulting imbalance between blocks by fitting a linear model including block effects. This has, however, the danger of being quite inefficient, especially if subsets of treatments are particularly clustered together in blocks.

A better procedure is to arrange the treatments in as close to a balanced configuration as is achievable, and it turns out that in a reasonable sense highest precision is achieved by arranging that each pair of treatments occurs together in the same block the same number, λ, say, of times. Since the number of units appearing in the same block as a given treatment, say T₁, can be calculated in two ways we have the identity

λ(v − 1) = r(k − 1). (4.13)

Another necessary condition for the existence of a design is thus that this equation have an integer solution for λ. A design satisfying these conditions with k < v is called a balanced incomplete block design.

A further general relation between the defining features is the inequality

b ≥ v. (4.14)

To see this, let N denote the v × b incidence matrix, which has entries $n_{js}$ equal to 1 if the jth treatment appears in the sth block, and zero otherwise. Then

$$NN^T = (r - \lambda)I + \lambda \mathbf{1}\mathbf{1}^T, \qquad (4.15)$$

where I is the identity matrix and 1 is a vector of unit elements, and it follows that $NN^T$, and hence also N, has rank v; but rank N ≤ min(b, v), thus establishing (4.14).
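The three necessary conditions (4.12)-(4.14) are easy to bundle into a single admissibility check. This sketch is an illustration, not from the book; as the last example shows, the conditions are necessary but not sufficient.

```python
def bibd_admissible(v, b, r, k, lam):
    """Necessary conditions (4.12)-(4.14) for a balanced incomplete
    block design: rv = bk, lam(v-1) = r(k-1), and b >= v.
    Passing them does not guarantee that a design exists."""
    return (r * v == b * k
            and lam * (v - 1) == r * (k - 1)
            and b >= v)

assert bibd_admissible(7, 7, 3, 3, 1)      # the cyclic design of Section 4.2.2
assert not bibd_admissible(8, 8, 3, 3, 1)  # fails lam(v-1) = r(k-1)
assert bibd_admissible(22, 22, 7, 7, 2)    # admissible, yet no such design exists
```

The last line illustrates the remark below that there is no general existence theorem: the symmetric parameter set (v = b = 22, r = k = 7, λ = 2) satisfies all three conditions but is known not to correspond to any design.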

Given values of r, v, b, k and λ satisfying the above conditions there is no general theorem to determine whether a corresponding balanced incomplete block design exists. The cases of practical interest have, however, been enumerated; see Table 4.3. Designs for two special cases are shown in Table 4.4, before randomization.

Table 4.3 Existence of some balanced incomplete block designs.

Units per   No. of          No. of      No. of          Total no. of
block, k    treatments, v   blocks, b   replicates, r   units, bk = rv
2           3               3           2               6
2           4               6           3               12
2           5               10          4               20
2           any v           v(v-1)/2    v-1             v(v-1)
3           4               4           3               12
3           5               10          6               30
3           6               10          5               30
3           6               20          10              60
3           7               7           3               21
3           9               12          4               36
3           10              30          9               90
3           13              26          6               78
3           15              35          7               105
4           5               5           4               20
4           6               15          10              60
4           7               7           4               28
4           8               14          7               56
4           9               18          8               72
4           10              15          6               60
4           13              13          4               52
4           16              20          5               80

4.2.2 Construction

There is no general method of construction for balanced incomplete block designs even when they do exist. There are, however, some important classes of such design and we now describe just three.


Table 4.4 Two special incomplete block designs. The first is resolvable into replicates I through VII.

k = 3, v = 15, b = 35, r = 7:
I    (1 2 3)   (4 8 12)   (5 10 15)  (6 11 13)  (7 9 14)
II   (1 4 5)   (2 8 10)   (3 13 14)  (6 9 15)   (7 11 12)
III  (1 6 7)   (2 9 11)   (3 12 15)  (4 10 14)  (5 8 13)
IV   (1 8 9)   (2 13 15)  (3 4 7)    (5 11 14)  (6 10 12)
V    (1 10 11) (2 12 14)  (3 5 6)    (4 9 13)   (7 8 15)
VI   (1 12 13) (2 5 7)    (3 9 10)   (4 11 15)  (6 8 14)
VII  (1 14 15) (2 4 6)    (3 8 11)   (5 9 12)   (7 10 13)

k = 4, v = 9, b = 18, r = 8:
(1 2 3 4) (1 2 5 6) (1 2 7 8) (1 3 5 7) (1 4 6 8) (1 3 6 9)
(1 4 8 9) (1 5 7 9) (2 3 8 9) (2 4 5 9) (2 6 7 9) (2 3 4 7)
(2 5 6 8) (3 5 8 9) (4 6 7 9) (3 4 5 6) (3 6 7 8) (4 5 7 8)
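The balance conditions of a tabulated design can be verified mechanically. A minimal check, not from the book, of the k = 4, v = 9 design of Table 4.4, for which λ = r(k − 1)/(v − 1) = 8 · 3/8 = 3:

```python
from itertools import combinations

# The k = 4, v = 9, b = 18, r = 8 design of Table 4.4, as transcribed.
blocks = [(1,2,3,4), (1,2,5,6), (1,2,7,8), (1,3,5,7), (1,4,6,8), (1,3,6,9),
          (1,4,8,9), (1,5,7,9), (2,3,8,9), (2,4,5,9), (2,6,7,9), (2,3,4,7),
          (2,5,6,8), (3,5,8,9), (4,6,7,9), (3,4,5,6), (3,6,7,8), (4,5,7,8)]

assert len(blocks) == 18                      # b = 18
for t in range(1, 10):                        # every treatment replicated r = 8 times
    assert sum(t in blk for blk in blocks) == 8
for pair in combinations(range(1, 10), 2):    # every pair in exactly lam = 3 blocks
    assert sum(set(pair) <= set(blk) for blk in blocks) == 3
```

The same three checks (block count, replication, pair balance) apply to any candidate balanced incomplete block design.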

The first are the so-called unreduced designs, consisting of all combinations of the v treatments taken k at a time. The whole design can be replicated if necessary. The design has

$$b = \binom{v}{k}, \qquad r = \binom{v-1}{k-1}. \qquad (4.16)$$

Its usefulness is restricted to fairly small values of k, v, such as in the paired design k = 2, v = 5 in Table 4.3.
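Unreduced designs are simply enumerations of k-subsets, so the counts in (4.16) can be confirmed directly. A short illustration, not from the book, for the paired design k = 2, v = 5:

```python
from itertools import combinations
from math import comb

def unreduced_design(v, k):
    """All k-subsets of v treatments, one block each:
    b = C(v, k) blocks, r = C(v-1, k-1) replicates."""
    return list(combinations(range(v), k))

blocks = unreduced_design(5, 2)
assert len(blocks) == comb(5, 2) == 10              # b = 10, as in Table 4.3
assert sum(0 in blk for blk in blocks) == comb(4, 1) == 4   # r = 4
# Each pair of treatments occurs together in lam = C(v-2, k-2) blocks.
assert sum({0, 1} <= set(blk) for blk in blocks) == comb(3, 0) == 1
```

The identity λ(v − 1) = r(k − 1) of (4.13) holds automatically here, since λ = C(v − 2, k − 2) by the same counting argument.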

A second family of designs is formed when the number of treatments is a perfect square, v = k², where k is the number of units per block, and a complete set of orthogonal k × k Latin squares is available, i.e. k is a prime power. We use the 5 × 5 squares set out in Table 4.2 as an illustration. We suppose that the treatments are set out in a key pattern in the form of a 5 × 5 square, namely

1  2  3  4  5
6  7  8  9 10

...

We now form blocks of size 5 by the following rules:

1. produce 5 blocks each of size 5 via the rows of the key design;

2. produce 5 more blocks each of size 5 via the columns of the key design;

3. produce four more sets each of 5 blocks of size 5 via the four alphabets of the associated complete set of orthogonal Latin squares.

In general this construction produces the design with

$$v = k^2, \quad r = k+1, \quad b = k(k+1), \quad \lambda = 1, \quad n = k^2(k+1). \qquad (4.17)$$

It has the special feature of resolvability, not possessed in general by balanced incomplete block designs: the blocks fall naturally into sets, each set containing each treatment just once. This feature is helpful if it is convenient to run each replicate separately, possibly even in different centres. Further it is possible, with minor extra complications, to analyze the design replicate by replicate, or omitting certain replicates. This can be a useful feature in protecting against mishaps occurring in some portions of the experiment, for example.
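The key-square construction can be sketched for prime k, using the modular form of the orthogonal squares in place of the general Galois field version. This is an illustration, not from the book, shown for k = 3 (v = 9, r = 4, b = 12, λ = 1, matching the corresponding row of Table 4.3):

```python
from itertools import combinations

def key_square_design(k):
    """Blocks of size k for v = k^2 treatments, k prime: rows and columns
    of the key square, plus one replicate per alphabet of the orthogonal
    squares L_m[s][t] = (s + m*t) mod k. Returns a list of replicates."""
    key = [[s * k + t for t in range(k)] for s in range(k)]
    replicates = [key,                                           # rows
                  [[key[s][t] for s in range(k)] for t in range(k)]]  # columns
    for m in range(1, k):  # the k - 1 alphabets
        replicates.append([[key[s][t] for s in range(k) for t in range(k)
                            if (s + m * t) % k == lab] for lab in range(k)])
    return replicates

reps = key_square_design(3)
assert len(reps) == 4                          # r = k + 1 replicates
assert sum(len(rep) for rep in reps) == 12     # b = k(k + 1) blocks
for rep in reps:                               # resolvable: each replicate
    assert sorted(x for blk in rep for x in blk) == list(range(9))  # is a partition
pairs = [p for rep in reps for blk in rep
         for p in combinations(sorted(blk), 2)]
assert len(pairs) == len(set(pairs)) == 36     # lam = 1: all 36 pairs, once each
```

The final check uses the count 12 · C(3, 2) = 36 = C(9, 2): since no pair repeats and all pairs are covered, λ = 1.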

The third special class of balanced incomplete block designs are the symmetric designs in which

b = v, r = k. (4.18)

Many of these can be constructed by numbering the treatments 0, . . . , v − 1, finding a suitable initial block and generating the subsequent blocks by addition of 1 mod v. Thus with v = 7, k = 3 we may start with (0, 1, 3) and generate subsequent blocks as (1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 0), (5, 6, 1), (6, 0, 2), a design with λ = 1; see also Appendix B. Before use the design is to be randomized.
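The cyclic development just described is easy to mechanize; the following sketch (function name ours) reproduces the v = 7, k = 3 design and confirms λ = 1.

```python
# Sketch: develop a symmetric BIBD from an initial block by repeated
# addition of 1 mod v, as described in the text.
def cyclic_design(initial, v):
    return [tuple((x + t) % v for x in initial) for t in range(v)]

blocks = cyclic_design((0, 1, 3), 7)
# blocks[1] is (1, 2, 4) and blocks[6] is (6, 0, 2), as in the text
```

The initial block (0, 1, 3) works because its pairwise differences mod 7 cover every nonzero residue exactly once, which is what makes λ = 1.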

4.2.3 Youden squares

We introduced the v × v Latin square as a design for v treatments accommodating a cross-classification of experimental units, so that variation is eliminated from error in two directions simultaneously. If now the number v of treatments exceeds the natural block size there are two types of incomplete analogue of the Latin square. In one it is possible that the full number v of rows is available, but there are only k < v columns. It is then sensible to look for a design in which each treatment occurs just once in each column but the rows form a balanced incomplete block design. Such designs are


Table 4.5 Youden square before randomization.

0 1 2 3 4 5 6
1 2 3 4 5 6 0
3 4 5 6 0 1 2

called Youden squares; note though that as laid out the design is essentially rectangular, not square.

Illustrations. Youden squares were introduced originally in connection with experiments in which an experimental unit is a leaf of a plant and there is systematic variation between different plants and between the positions of the leaves on the plant. Thus with 7 treatments and 3 leaves per plant, taken at different positions down the plant, a design with each treatment occurring once in each position and every pair of treatments occurring together on the same plant the same number of times would be sensible. Another application is to experiments in which 7 objects are presented for scoring by ranking, it being practicable to look at 3 objects at each session, the order of presentation within the session being relevant.

The incomplete block design formed by the rows has b = v, i.e. is symmetric, and the construction at the end of the last subsection in fact in general yields a Youden square; see Table 4.5. In randomizing Table 4.5 as a Youden square the rows are permuted at random, then the columns are permuted at random holding the columns intact, and finally the treatments are assigned at random to (0, . . . , v − 1). If the design is randomized as a balanced incomplete block design the blocks are independently randomized, i.e. the column structure of the original construction is destroyed and the enforced balance which is the objective of the design disappears.

The second type of two-way incomplete structure is of less common practical interest and has fewer than v rows and columns. The most important of these arrangements are the lattice squares which have v = k^2 treatments laid out in k × k squares. For some details see Section 8.5.


4.2.4 Analysis of balanced incomplete block designs

We now consider the analysis of responses from a balanced incomplete block design; the extension to a Youden square is quite direct.

First it would be possible, and in a certain narrow sense valid, to ignore the balanced incomplete structure and to analyse the experiment as a completely randomized design or, in the case of a resolvable design, as a complete randomized block design regarding replicates as blocks. This follows from the assumption of unit-treatment additivity.

Nevertheless in nearly all contexts this would be a poor analysis, all the advantages of the blocking having been sacrificed and the effective error now including a component from variation between blocks. The exception might be if there were reasons, after the experiment had been completed, for thinking that the grouping into blocks had in fact been quite ineffective.

To exploit the special structure of the design, which is intended to allow elimination from the treatment contrasts of systematic variation between blocks, we follow the general principle that effects eliminated by design are to be represented by parameters in the associated linear model. We suppose therefore that if treatment T_j occurs in block s the response is Y_js, where

E(Y_js) = µ + τ_j + β_s, (4.19)

and if it is necessary to resolve nonuniqueness of the parameterization we require by convention that

Στ_j = Σβ_s = 0. (4.20)

Further we suppose that the Y_js are uncorrelated random variables of variance σ^2_k, the notation emphasizing that the variance depends on the number k of units per block.

Because of the incomplete character of the design only those combinations of (j, s) specified by the v × b incidence matrix N of the design are observed. A discussion of the least squares estimation of the τ_j for general incomplete block designs is given in the next section; we outline here an argument from first principles. An alternative approach uses the method of fitting parameters in stages; see Appendix A2.6.

The right-hand side of the least squares equations consists of the total of all observations, Y.., and the treatment and block totals

S_j = Σ_{s=1}^b Y_js n_js,   B_s = Σ_{j=1}^v Y_js n_js. (4.21)

In a randomized block design the τ_j can be estimated directly from the S_j, but this is not so for an incomplete block design. We look for a linear combination of the (S_j, B_s) that is an unbiased estimate of, say, τ_j; this must be the least squares estimate required. The qualitative idea is that we have to “correct” S_j for the special features of the blocks in which T_j happens to occur.

Consider therefore the adjusted treatment total

Q_j = S_j − k^{−1} Σ_s n_js B_s; (4.22)

note that the use of the incidence matrix ensures that the sum is over all blocks containing treatment T_j. A direct calculation shows that

E(Q_j) = rτ_j + (r − λ) k^{−1} Σ_{l≠j} τ_l (4.23)
       = {rv(k − 1)/(k(v − 1))} τ_j, (4.24)

where we have used the constraint Στ_j = 0 and the identity defining λ. The least squares estimate of τ_j is therefore

τ̂_j = {k(v − 1)/(rv(k − 1))} Q_j = {k/(λv)} Q_j. (4.25)
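The chain (4.22)-(4.25) can be checked numerically: feeding the noise-free model means µ + τ_j + β_s through the adjusted totals recovers the τ_j exactly. This is an illustrative sketch; the parameter values are arbitrary, not from the text.

```python
# Numerical check of E(Q_j) = (lambda v / k) tau_j using the v = 7, k = 3,
# lambda = 1 cyclic design: plug the model means mu + tau_j + beta_s into
# the adjusted totals (4.22) and apply (4.25).
blocks = [tuple((x + t) % 7 for x in (0, 1, 3)) for t in range(7)]
v, k, lam = 7, 3, 1
mu = 10.0
tau = [0.6, -0.2, 0.1, -0.5, 0.3, -0.4, 0.1]     # sums to zero (illustrative)
beta = [1.0, -2.0, 0.5, 0.0, 1.5, -0.5, -0.5]    # arbitrary block effects

S = [sum(mu + tau[j] + beta[s] for s, blk in enumerate(blocks) if j in blk)
     for j in range(v)]                           # treatment totals
B = [sum(mu + tau[j] + beta[s] for j in blk) for s, blk in enumerate(blocks)]
Q = [S[j] - sum(B[s] for s, blk in enumerate(blocks) if j in blk) / k
     for j in range(v)]                           # adjusted totals (4.22)
tau_hat = [k * q / (lam * v) for q in Q]          # equation (4.25)
assert all(abs(th - t) < 1e-9 for th, t in zip(tau_hat, tau))
```

With noise-free data the block effects β_s cancel exactly, which is the point of the adjustment.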

As usual we obtain an unbiased estimate of σ^2_k from the analysis of variance table. Some care is needed in calculating this, because the treatment and block effects are not orthogonal. In the so-called intrablock or within block analysis, the sum of squares for treatments is adjusted for blocks, as in the computation of the least squares estimates above.

The sum of squares for treatments adjusted for blocks can most easily be computed by comparing the residual sum of squares after fitting the full model with that for the restricted model fitting only blocks. We can verify that the sum of squares due to treatment is Σ_j τ̂_j^2 (λv)/k = Σ_j Q_j^2 k/(λv), giving the analysis of variance of Table 4.6.

For inference on a treatment contrast Σ l_j τ_j, we use the estimate Σ l_j τ̂_j, which has variance

var(Σ l_j τ̂_j) = {kσ^2_k/(λv)} Σ l_j^2, (4.26)


Table 4.6 Intrablock analysis of variance for balanced incomplete block design.

Source                   Sum of squares             Degrees of freedom

Blocks (ignoring         Σ_{j,s} (Y.s − Y..)^2      b − 1
  treatments)
Treatments               k Σ_j Q_j^2/(λv)           v − 1
  (adj. for blocks)
Residual                                            bk − v − b + 1
Total                                               bk − 1

which is obtained by noting that var(Q_j) = r(k − 1)σ^2_k/k and that, for j ≠ j′, cov(Q_j, Q_j′) = −λσ^2_k/k. For example, if l_1 = 1, l_2 = −1, and l_j = 0 otherwise, so that we are comparing treatments 1 and 2, we have var(τ̂_1 − τ̂_2) = 2kσ^2_k/(λv). In the randomized block design with the same value of error variance, σ^2_k, and with r observations per treatment, we have var(τ̂_1 − τ̂_2) = 2σ^2_k/r. The ratio of these,

E = v(k − 1)/{(v − 1)k}, (4.27)

is called the efficiency factor of the design. Note that E < 1. It essentially represents the loss of information incurred by having to unscramble the nonorthogonality of treatments and blocks.
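In code the efficiency factor (4.27) is immediate; this sketch (function name ours) also shows numerically how k = 2 approaches the lower limit discussed below.

```python
# Sketch of the efficiency factor E = v(k-1) / ((v-1)k) of (4.27).
def efficiency_factor(v, k):
    return v * (k - 1) / ((v - 1) * k)

# for v = 6 treatments in blocks of k = 2 units, E = 0.6; with k = 2 and
# v large, E approaches its smallest value 1/2
```

For any proper incomplete design (k < v) the value lies strictly between 0 and 1, consistent with the remark that E < 1.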

To assess properly the efficiency of the design we must take account of the fact that the error variance in an ordinary randomized block design with blocks of v units is likely to be unequal to σ^2_k; instead the variance of the contrast comparing two treatments is

2σ^2_v/r, (4.28)

where σ^2_v is the error variance when there are v units per block. The efficiency of the balanced incomplete block design is thus

E σ^2_v/σ^2_k. (4.29)

The smallest efficiency factor is obtained when k = 2 and v is large, in which case E = 1/2. To justify the use of an incomplete block design we would therefore need σ^2_k ≤ σ^2_v/2.

The above analysis is based on regarding the block effects as arbitrary unknown parameters. The resulting intrablock least squares analysis eliminates any effect of such block differences. The most unfavourable situation likely to arise in such an analysis is that in fact the blocking is ineffective, σ^2_k = σ^2_v, and there is a loss of information represented by an inflation of variance by a factor 1/E.

Now while it would be unwise to use a balanced incomplete block

design of low efficiency factor unless a substantial reduction in error variance is expected, the question arises as to whether one can recover the loss of information that would result in cases where the hoped-for reduction in error variance is in fact not achieved. Note in particular that if it could be recognized with confidence that the blocking was ineffective, the direct analysis ignoring the incomplete block structure would restore the variance of an estimated simple contrast to 2σ^2_v/r.

A key to the recovery of information in general lies in the randomization applied to the ordering in blocks. This means that it is reasonable, in the absence of further information or structure in the blocks, to treat the block parameters β_s in the linear model as uncorrelated random variables of mean zero and variance, say, σ^2_B. The resulting formulae can be justified by further appeal to randomization theory and unit-treatment additivity.

For a model-based approach we suppose that

Y_js = µ + τ_j + β_s + ε_js, (4.30)

as before, but in addition to assuming the ε_js to be independently normally distributed with zero mean and variance σ^2_k, we add the assumption that the β_s are likewise independently normal with zero mean and variance σ^2_B. This can be justified by the randomization of the blocks within the whole experiment, or within replicates in the case of a resolvable design.

This model implies

B_s = Y.s = kµ + Σ_j n_js τ_j + (kβ_s + Σ_j n_js ε_js), (4.31)

where we might represent the error term in parentheses as η_s, which has mean zero and variance k(kσ^2_B + σ^2_k). The least squares equations based on this model for the totals B_s give

µ̃ = B. = Y.., (4.32)

τ̃_j = (Σ_s n_js Y.s − rkµ̃)/(r − λ), (4.33)


Table 4.7 Interblock analysis of variance corresponding to Table 4.6.

Source                     Sum of squares              Degrees of freedom

Treatments                 Σ_j (Yj. − Y..)^2           v − 1
  (ignoring blocks)
Blocks                     by subtraction              b − 1
  (adj. for treatments)
Residual                   as in intrablock analysis   bk − b − v + 1

and it is easily verified that

var(τ̃_j) = {k(v − 1)/(v(r − λ))} (kσ^2_B + σ^2_k). (4.34)

The expected value of the mean square due to blocks (adjusted for treatments) is σ^2_k + v(r − 1)σ^2_B/(b − 1), and unbiased estimates of the components of variance are available from the analysis of variance given in Table 4.7.

We have two sets of estimates of the treatment effects, τ̂_j from the within block analysis and τ̃_j from the between block analysis. These two sets of estimates are by construction uncorrelated, so an estimate with smaller variance can be constructed as a weighted average of the two estimates, w τ̂_j + (1 − w) τ̃_j, where

w = var(τ̂_j)^{−1}/[var(τ̂_j)^{−1} + var(τ̃_j)^{−1}] (4.35)

is estimated using the estimates of σ^2_k and σ^2_B. This approach is almost equivalent to the use of a modified profile likelihood function for the contrasts under a normal theory formulation.

If σ^2_B is large relative to σ^2_k, then the weight on τ̂_j will be close to one, and there will be little gain in information over that from the intrablock analysis.

If the incomplete block design is resolvable, then we can remove a sum of squares due to replicates from the sum of squares due to blocks, so that σ^2_B is now a component of variance between blocks within replicates.


4.2.5 More general incomplete block designs

There is a very extensive literature on incomplete block arrangements more general than balanced incomplete block designs. Some are simple modifications of the designs studied so far. We might, for example, wish to replicate some treatments more heavily than others, or, in a more extreme case, have one or more units in every block devoted to a control treatment. These and similar adaptations of the balanced incomplete block form are easily set up and analysed. In this sense balanced incomplete block designs are at least as important as starting points for designs adapted to the special needs of particular situations as they are as designs in their own right.

Another type of application arises when no balanced incomplete block design or ad hoc modification is available and minor changes in the defining constants v, r, b, k are unsatisfactory. Then various types of partially balanced designs are possible. Next in importance to the balanced incomplete block designs are the group divisible designs. In these the v treatments are divided into a number of equal-sized disjoint sets. Each treatment is replicated the same number of times and appears usually either once or not at all in each block. Slightly more generally there is the possibility that each treatment occurs either [v/k] or [v/k] + 1 times in each block, where there are k units per block. The association between treatments is determined by two numbers λ_1 and λ_2. Any two treatments in the same set occur together in the same block λ_1 times whereas any two treatments in different sets occur together λ_2 times. We discuss the role of balance a little more in Chapter 7 but in essence balance forces good properties on the eigenvalues of the matrix C, defined at (4.39) below, and hence on the covariance matrix of the estimated contrasts.

Sometimes, for example in plant breeding trials, there is a strong practical argument for considering only resolvable designs. It is therefore useful to have a general flexible family of such designs capable of adjusting to a range of requirements. Such a family is defined by the so-called α designs; the family is sufficiently rich that computer search within it is often needed to determine an optimum or close-to-optimum design. See Exercise 4.6.

While quite general incomplete block designs can be readily analysed as a linear regression model of the type discussed in Appendix A, the form of the intrablock estimates and analysis of variance is


Table 4.8 Analysis of variance for a general incomplete block design.

Source                   Sum of squares              Degrees of freedom

Blocks (ignoring         B^T K^{−1} B − Y...^2/n     b − 1
  treatments)
Treatments               Q^T τ̂                       v − 1
  (adj. for blocks)
Residual                 by subtraction              n − b − v + 1
Total                    Σ Y_jsm^2 − Y...^2/n        n − 1

still relatively simple and we briefly outline it now. We suppose that treatment j is replicated r_j times, that block s has k_s experimental units, and that the jth treatment appears in the sth block n_js times. The model for the mth observation of treatment j in block s is

Y_jsm = µ + τ_j + β_s + ε_jsm. (4.36)

The adjusted treatment totals are

Q_j = S_j − Σ_s n_js B_s/k_s, (4.37)

where, as before, S_j is the total response on the jth treatment and B_s is the total for the sth block. We have again

E(Q_j) = r_j τ_j − Σ_l (Σ_s n_js n_ls/k_s) τ_l, (4.38)

or, defining Q = (Q_1, . . . , Q_v)^T, R = diag(r_1, . . . , r_v), K = diag(k_1, . . . , k_b) and N = ((n_js)),

E(Q) = (R − N K^{−1} N^T) τ = Cτ, (4.39)

say, and hence

Q = C τ̂. (4.40)

The v × v matrix C is not of full rank, but if every pair of contrasts τ_j − τ_l is to be estimable, the matrix must have rank v − 1. Such a design is called connected. We may impose the constraint Στ_j = 0 or Σ r_j τ_j = 0, leading to different least squares estimates of τ but the same estimates of contrasts and the same analysis of variance table. The analysis of variance is outlined in Table 4.8, in which we write B for the vector of block totals.

The general results for the linear model outlined in Appendix A


can be used to show that the covariance matrix of the adjusted treatment totals is given by

cov(Q) = (R − N K^{−1} N^T) σ^2, (4.41)

where σ^2 is the variance of a single response. This leads directly to the variance of the estimate of any linear contrast Σ l_i τ_i as σ^2 l^T C^− l, where a specific form for the generalized inverse C^− can be obtained by invoking either of the constraints 1^T τ = Στ_j = 0 or 1^T Rτ = Σ r_j τ_j = 0.

These formulae can be used not only in the direct analysis of data but also to assess the properties of nonstandard designs constructed from ad hoc considerations. For example if no balanced incomplete block design exists but another design can be shown to have variance properties close to those that a balanced incomplete block design would have had, then the new design is likely to be close to optimal. Further, the relative efficiencies can be compared using the appropriate C^− for each design.
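The intrablock equations (4.37)-(4.40) for a general connected design can be sketched as follows (function name ours). We use the Moore-Penrose inverse, which for a connected design automatically yields the solution with Στ̂_j = 0, since the null space of C is spanned by the vector of ones.

```python
# Sketch: intrablock estimation for a general incomplete block design,
# C = R - N K^{-1} N^T, solving Q = C tau-hat under the constraint
# 1^T tau = 0 (names are illustrative).
import numpy as np

def intrablock_solve(N, S, B):
    """N: v x b incidence matrix; S: treatment totals; B: block totals."""
    r = N.sum(axis=1)                              # replications r_j
    ks = N.sum(axis=0)                             # block sizes k_s
    C = np.diag(r) - N @ np.diag(1.0 / ks) @ N.T   # matrix C of (4.39)
    Q = S - N @ (B / ks)                           # adjusted totals (4.37)
    return np.linalg.pinv(C) @ Q                   # minimum-norm solution
```

For equal block sizes and a balanced design this reduces to the earlier τ̂_j = kQ_j/(λv), and the same generalized inverse supplies the contrast variances σ^2 l^T C^− l of the previous paragraph.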

4.2.6 Examples

We first discuss a simple balanced incomplete block design. Example I of Cox and Snell (1981) is based on an experiment reported by Biggers and Heyner (1961) on the growth of bones from chick embryos after cultivation over a nutrient medium. There were two bones available from each embryo, and each embryo formed a block. There were six treatments, representing a complete medium and five other media obtained by omitting a single amino acid. The design and data are given in Table 4.9. The treatment assignment was randomized, but the data are reported in systematic order. In the notation of the previous subsection, k = 2, λ = 1, v = 6 and r = 5.

The raw treatment means, and the means adjusted for blocks, are given in Table 4.10, and these form the basis for the intrablock analysis. The intrablock analysis of variance table is given in Table 4.11, from which an estimate of σ is 0.0811.

The treatment effect estimates τ̂_j, under the constraint Στ_j = 0, are given by 2Q_j/6, and the standard error for the difference between any two τ̂_j is √(4/6) × 0.0811 = 0.066.
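The intrablock estimates of this example can be reproduced directly from the data of Table 4.9; the following sketch keys the data by the treatment labels of the text.

```python
# Sketch: intrablock estimates tau-hat_j = 2 Q_j / 6 for the chick-bone
# data (k = 2, lambda = 1, v = 6, r = 5), one (treatment, response) pair
# per bone, one block per embryo.
data = [
    [("C", 2.51), ("His-", 2.15)], [("C", 2.49), ("Arg-", 2.23)],
    [("C", 2.54), ("Thr-", 2.26)], [("C", 2.58), ("Val-", 2.15)],
    [("C", 2.65), ("Lys-", 2.41)], [("His-", 2.11), ("Arg-", 1.90)],
    [("His-", 2.28), ("Thr-", 2.11)], [("His-", 2.15), ("Val-", 1.70)],
    [("His-", 2.32), ("Lys-", 2.53)], [("Arg-", 2.15), ("Thr-", 2.23)],
    [("Arg-", 2.34), ("Val-", 2.15)], [("Arg-", 2.30), ("Lys-", 2.49)],
    [("Thr-", 2.20), ("Val-", 2.18)], [("Thr-", 2.26), ("Lys-", 2.43)],
    [("Val-", 2.28), ("Lys-", 2.56)],
]
k, lam, v = 2, 1, 6
treatments = ["C", "His-", "Arg-", "Thr-", "Val-", "Lys-"]
S = {t: sum(y for blk in data for (tt, y) in blk if tt == t) for t in treatments}
Bsum = {t: sum(sum(y for (_, y) in blk) for blk in data
               if t in [tt for (tt, _) in blk]) for t in treatments}
tau_hat = {t: k * (S[t] - Bsum[t] / k) / (lam * v) for t in treatments}
# tau_hat["C"] rounds to 0.262 and tau_hat["Val-"] to -0.228,
# agreeing with the first row of Table 4.12
```

The estimates sum to zero, as the constraint requires.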

These estimates can be combined with the interblock estimates, which are obtained by fitting a regression model to the block totals. The intrablock and interblock effect estimates are shown in


Table 4.9 Log dry weight (µg) of chick bones for 15 embryos.

 1  C     2.51   His-  2.15       9  His-  2.32   Lys-  2.53
 2  C     2.49   Arg-  2.23      10  Arg-  2.15   Thr-  2.23
 3  C     2.54   Thr-  2.26      11  Arg-  2.34   Val-  2.15
 4  C     2.58   Val-  2.15      12  Arg-  2.30   Lys-  2.49
 5  C     2.65   Lys-  2.41      13  Thr-  2.20   Val-  2.18
 6  His-  2.11   Arg-  1.90      14  Thr-  2.26   Lys-  2.43
 7  His-  2.28   Thr-  2.11      15  Val-  2.28   Lys-  2.56
 8  His-  2.15   Val-  1.70

Table 4.10 Adjusted and unadjusted treatment means of log dry weight.

                 C      His-    Arg-    Thr-    Val-    Lys-

Yj.             2.554   2.202   2.184   2.212   2.092   2.484
(Unadj. mean)
τ̂_j + Y..       2.550   2.331   2.196   2.201   2.060   2.390
(Adj. mean)

Table 4.11 Analysis of variance of log dry weight.

Source           Sum of sq.   D.f.   Mean sq.

Days               0.7529      14     0.0538
Treatments         0.4462       5     0.0892
  (adj.)
Residual           0.0658      10     0.0066


Table 4.12 Within and between block estimates of treatment effects.

         C       His-     Arg-     Thr-     Val-     Lys-

τ̂_j     0.262    0.043   −0.092   −0.087   −0.228    0.102
τ̃_j     0.272   −0.280   −0.122   −0.060   −0.148    0.338
τ*_j    0.264   −0.017   −0.097   −0.082   −0.213    0.146

Table 4.12, along with the pooled estimate τ*_j, which is a linear combination weighted by the inverse variances. Because there is relatively large variation between blocks, the pooled estimates are not very different from the within block estimates. The standard error for the difference between two τ*_j's is 0.059.

As a second example we take an incomplete block design that was used in a study of the effects of process variables on the properties of pastry dough. There were 15 different treatments, and the experiment took seven days. It was possible to do just four runs on each day. The data are given in Table 4.13. The response given here is the cross-sectional expansion index for the pastry dough, in cm per g. The treatments have a factorial structure, which is ignored in the present analysis. The treatment means and adjusted treatment means are given in Table 4.14.

Note that all the treatments used on Day 6, the day with the highest block mean, were also used on other days, whereas three of the treatments used on Day 5 were not replicated. Thus there is considerable intermixing of block effects with treatment effects.

The analysis of variance, with treatments adjusted for blocks, is given in Table 4.15. It is mainly useful for providing an estimate of variance for comparing adjusted treatment means. As the detailed treatment structure has been ignored in this analysis, we defer the comparison of treatment means to Exercise 6.9.

4.3 Cross-over designs

4.3.1 General remarks

One of the key assumptions underlying all the previous discussionis that the response on any unit depends on the treatment appliedto that unit independently of the allocation of treatments to the


Table 4.13 An unbalanced incomplete block design. From Gilmour and Ringrose (1999). 15 treatments; response is expansion index of pastry dough (cm per g).

                                             Mean
            1       8       9       9
Day 1     15.0    14.8    13.0    11.7     13.625
            9       5       4       9
Day 2     12.2    14.1    11.2    11.6     12.275
            2       3       8       5
Day 3     15.9    10.8    15.8    15.6     14.525
           12       6      14      10
Day 4     12.7    18.6    11.4    11.2     13.475
           11      15       3      13
Day 5     13.0    11.1    10.1    11.7     11.475
            1       6       4       7
Day 6     14.6    17.8    12.8    15.4     15.15
            2       9       7       9
Day 7     15.0    10.7    10.9     9.6     11.55

Table 4.14 Treatment means of expansion index: unadjusted and adjusted for days.

Trt           1     2     3     4     5     6     7     8

Yj.         14.8  15.4  10.4  12.0  14.9  18.2  13.1  15.3
τ̂_j + Y..   14.5  16.7  10.8  12.1  15.3  17.1  14.0  15.4

Trt           9    10    11    12    13    14    15

Yj.         11.5  11.2  13.0  12.7  11.7  11.4  11.1
τ̂_j + Y..   12.6   9.7  13.7  11.2  12.4   9.9  11.8


Table 4.15 Within-block analysis of variance for pastry example.

Source                Sum of sq.   D.f.   Mean sq.

Days (ign. trt)         49.41        6     8.235
Treatment               96.22       14     6.873
  (adj. for days)
Residual                 5.15        7     0.736

other units. When the experimental units are physically different entities this assumption will, with suitable precautions, be entirely reasonable.

Illustration. In an agricultural fertiliser trial in which the experimental units are distinct plots of ground, the provision of suitable guard rows between the plots will be adequate security against diffusion of fertiliser from one plot to another affecting the results.

When, however, the same physical object or individual is used as an experimental unit several times, the assumption needs more critical attention and may indeed be violated.

Illustrations. In a typical investigation in experimental psychology, subjects are exposed to a series of conditions (treatments) and appropriate responses to each observed; a Latin square design may be very suitable. It is possible, however, in some contexts that the response in one period is influenced not only by the condition in that period but by that in the previous period, and perhaps even by the whole sequence of conditions encountered up to that point. This possibility would distort and possibly totally vitiate an interpretation based on the standard Latin square analysis. We shall see that if the effect of previous conditions is confined to a simple effect one period back, then suitable designs are available. If, however, there is the possibility of more complex forms of dependence on previous conditions, it will be preferable to reconsider the whole basis of the design, perhaps exposing each subject to only one experimental condition or perhaps regarding a whole sequence of conditions as defining a treatment.

An illustration of a rather different kind concerns so-called community intervention trials. Here a whole village, community or school is an experimental unit, for example to compare the effects of


health education campaigns. Each school, say, is randomized to one of a number of campaigns. Here there can be interference between units in two different senses. First, unless the schools are far apart, it may be difficult to prevent children in one school learning what is happening in other schools. Secondly there may be migration of children from one school to another in the middle of the programme under study. To which regime should such children be assigned? If the latter problem happens only on a small scale it can be dealt with in analysis, for example by omitting the children in question. Thirdly there is the delicate issue of what information can be learned from the variation between children within schools.

In experiments on the effect of diet on the milk yield of cows a commonly used design is again a Latin square in which over the period of lactation each cow receives each of a number of diets. Similar remarks about carry-over of treatment effects apply as in the psychological application.

In an experiment on processing in the wool textile industries one set of treatments may correspond to different amounts of oil applied to the raw material. It is possible that some of the oil is retained on the machinery and that a batch processed following a batch with a high oil allocation in effect receives an additional supplement of oil.

Finally a quite common application is in clinical trials of drugs. In its simplest form there are two treatments, say T and C; some patients receive the drug T in the first period and C in the second period, whereas other patients have the order C, T. This, the two-treatment, two-period design, can be generalized in various obvious ways by extending the number of treatments or the number of periods or both; see Exercise 1.2.

We call any effect on one experimental unit arising from treatments applied to another unit a carry-over or residual effect. It is unlikely although not impossible that the carry-over of a treatment effect from one period to another is of intrinsic interest. For the remainder of the discussion we shall, however, take the more usual view that any such effects are an encumbrance.

Two important general points are that even in the absence of carry-over effects it is possible that treatment effects estimated in an environment of change are not the same as those in a more stable context. For example, cows subject to frequently changing diets might react differently to a specific diet from how they would


under a stable environment. If that seems a serious possibility, the use of physically the same material as a unit several times is suspect and alternative designs should be explored.

The second point is that it will often be possible to eliminate or at least substantially reduce carry-over effects by wash-out periods restoring the material so far as feasible to its initial state. For example, in the textile experiment mentioned above the processing of each experimental batch could be preceded by a standard batch with a fixed level of oil returning the machinery to a standard state. Similar possibilities are available for the dairy and pharmaceutical illustrations.

Such wash-out periods are important, their duration being determined from subject-matter knowledge, although unless they can be made to yield further useful information the effort attached to them may detract from the appeal of this type of design.

In some agricultural and ecological applications, interference originates from spatial rather than temporal contiguity. The design principles remain the same although the details are different. The simplest possibility is that the response on one unit depends not only on the treatment applied to that unit but also on the treatment applied to spatially adjacent units; sometimes the dependence is directly on the responses from the other units. These considerations suggest the use of designs in which each treatment is a neighbour of each other treatment approximately the same number of times. A second possibility arises when, for example, tall growing crops shield nearby plots from sunlight; in such a case the treatments might be varieties growing to different heights. To some extent this can be dealt with by sorting the varieties into a small number of height groups on the basis of a priori information and aiming by design to keep varieties in the same group nearby as far as feasible.

In summary, repeated use of the same material as an experimental unit is probably wise only when there are solid subject-matter arguments for supposing that any carry-over effects are either absent or, if present, are small and of simple form. Then it is often a powerful technique.

4.3.2 Two-treatment two-period design

We begin with a fairly thorough discussion of the simplest design with two treatments and two periods, in effect an extension of the discussion of the matched pair design in Section 3.3. We regard the


two periods asymmetrically, supposing that all individuals start the first period in the same state whereas that is manifestly not true for the second period. In the first period let the treatment parameter be ±δ depending on whether T or C is used and in the second period let the corresponding parameter be ±(δ + γ), where γ measures a treatment by period interaction. Next suppose that in the second period there is added to the treatment effect a carry-over or residual effect of ±ρ following T or C, respectively, in the first period. Finally let π denote a systematic difference between observations in the second period as compared with the first.

If an individual additive parameter is inserted for each pair of units, i.e. for each patient in the clinical trial context, an analysis free of these parameters must be based on differences of the observation in the second period minus that in the first. If, however, as would typically be the case, individuals have been randomly allocated to a treatment order, a second analysis is possible based on the pair totals. The error for this will include a contribution from the variation between individuals and so the analysis of the totals will have a precision lower than, and possibly very much lower than, that of the analysis of differences. This is analogous to the situation arising in interblock and intrablock analyses of the balanced incomplete block design.

For individuals receiving treatments in the order C, T the expected values of the first and second observations are, omitting individual unit effects,

µ − δ, µ + δ + γ − ρ + π, (4.42)

whereas for the complementary sequence the values are

µ + δ, µ − δ − γ + ρ + π. (4.43)

Thus the means of the differences within individuals estimate respectively

2δ + π + γ − ρ, −2δ + π − γ + ρ, (4.44)

and the difference of these estimates leads to the estimation of

2∆ + 2(γ − ρ), (4.45)

where ∆ = 2δ. On the other hand a similar comparison based on the sums of pairs of observations leads to the estimation, typically with low precision, of 2(γ − ρ).
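The algebra of (4.42)-(4.45) can be verified numerically; the parameter values below are arbitrary illustrations, not from the text.

```python
# Sketch: check the two-treatment, two-period expectations (4.42)-(4.45).
mu, delta, gamma, rho, pi_ = 5.0, 0.8, 0.3, 0.2, 0.5

ct = (mu - delta, mu + delta + gamma - rho + pi_)   # order C, T: (4.42)
tc = (mu + delta, mu - delta - gamma + rho + pi_)   # order T, C: (4.43)

diff_ct = ct[1] - ct[0]        # estimates  2*delta + pi + gamma - rho
diff_tc = tc[1] - tc[0]        # estimates -2*delta + pi - gamma + rho
Delta = 2 * delta
# difference of the within-individual differences: (4.45)
assert abs((diff_ct - diff_tc) - (2 * Delta + 2 * (gamma - rho))) < 1e-12
# the contrast of pair totals estimates only the aliased 2*(gamma - rho)
sum_contrast = (ct[0] + ct[1]) - (tc[0] + tc[1])
assert abs(sum_contrast - 2 * (gamma - rho)) < 1e-12
```

The second assertion makes concrete the aliasing noted next: γ and ρ enter only through γ − ρ, so treatment by period interaction and carry-over cannot be separated in this design.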

From this we can see that treatment by period interaction and


residual effect are indistinguishable. In addition, assuming that thefirst period treatment effect ∆ is a suitable target parameter, itcan be estimated from the first period observations alone, but onlywith low precision because the error includes a between individualcomponent. If this component of variance were small there wouldbe little advantage to the cross-over design anyway.

This analysis confirms the general conclusion that the design is suitable only when there are strong subject-matter arguments for supposing that any carry-over effect or treatment by period interaction is small and that there are substantial systematic variations between subjects.

To a limited extent the situation can be clarified by including some individuals receiving the same treatment in both periods, i.e. T, T or C, C. Comparison of the differences then estimates (γ + ρ), allowing the separation of treatment by period interaction from residual effect. Comparison of sums of pairs leads to the estimation of 2∆ + (γ + ρ), typically with low precision. Note that this assumes the absence of a direct effect by carry-over effect interaction, i.e. that the carry-over effect when there is no treatment change is the same as when there is a change. The reasonableness of this assumption depends entirely on the context.

Another possibility even with just two treatments is to have a third period, randomizing individuals between the six sequences

T, T, C; T, C, T; T, C, C; C, C, T; C, T, C; C, T, T. (4.46)

It might often be reasonable to assume the absence of a treatment by period interaction and to assume that the carry-over parameter from period 1 to period 2 is the same as, say, between period 2 and period 3.
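The six sequences in (4.46) are exactly the 2³ = 8 treatment sequences of length three with the two constant sequences removed; a quick enumeration (a sketch, not from the text) confirms the count:

```python
from itertools import product

# All length-three sequences over {T, C} that use both treatments,
# i.e. every sequence except T,T,T and C,C,C
sequences = [s for s in product("TC", repeat=3) if len(set(s)) == 2]

print(len(sequences))  # 6
```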

4.3.3 Special Latin squares

With a modest number of treatments the Latin square design with rows representing individuals and columns representing periods is a natural starting point and indeed many of the illustrations of the Latin square mentioned in Chapter 4 are of this form, ignoring, however, carry-over effects and treatment by period interactions.

The general comments made above about the importance of wash-out periods and the desirability of restricting use of the designs to situations where any carry-over effects are of simple form and small continue to hold. If, however, the possibility of treatment by period interaction is ignored and any carry-over effect is assumed to be an additive effect extending one period only, it is appealing to look for special Latin square designs in which each treatment follows each other treatment the same number of times, either over each square, or, if this is not possible, over the set of Latin squares forming the whole design.

We first show how to construct such special squares. For a v × v square it is convenient to label the treatments 0, . . . , v − 1 without any implication that the treatments correspond to levels of a quantitative variable.

If v is even the key to design construction is that the sequence

0, 1, v − 1, 2, v − 2, . . . (4.47)

has all possible differences mod v occurring just once. Thus if we generate a Latin square from the above first row by successive addition of 1 followed by reduction mod v it will have the required properties. For example, with v = 6 we get the following design in which each treatment follows each other treatment just once:

0 1 5 2 4 3
1 2 0 3 5 4
2 3 1 4 0 5
3 4 2 5 1 0
4 5 3 0 2 1
5 0 4 1 3 2

The whole design might consist of several such squares independently randomized by rows and by naming the treatments but not, of course, randomized by columns.

If v is odd the corresponding construction requires pairs of squares generated by the first row as above and by that row reversed.
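The construction just described is easy to automate. The sketch below (not code from the text) builds the first row 0, 1, v − 1, 2, v − 2, . . ., develops the square by successive addition of 1 mod v, and counts how often each treatment immediately follows each other treatment:

```python
from collections import Counter

def base_row(v):
    # 0, 1, v-1, 2, v-2, ...: successive differences mod v are all distinct
    row, lo, hi = [0], 1, v - 1
    while len(row) < v:
        row.append(lo)
        lo += 1
        if len(row) < v:
            row.append(hi)
            hi -= 1
    return row

def develop(first_row, v):
    # generate the square by successive addition of 1, reduced mod v
    return [[(x + s) % v for x in first_row] for s in range(v)]

def successor_counts(squares):
    # count ordered pairs (j, i): treatment i immediately follows treatment j in a row
    counts = Counter()
    for square in squares:
        for row in square:
            for j, i in zip(row, row[1:]):
                counts[(j, i)] += 1
    return counts

# even v: a single square in which each treatment follows each other exactly once
even = successor_counts([develop(base_row(6), 6)])

# odd v: a pair of squares (first row and its reverse); balance holds over the pair
row5 = base_row(5)
odd = successor_counts([develop(row5, 5), develop(row5[::-1], 5)])
```

For v = 6 this reproduces the square displayed above; for v = 5 each ordered pair of distinct treatments occurs exactly twice over the pair of squares.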

Note that in these squares no treatment follows itself so that in the presence of simple carry-over effects the standard Latin square analysis ignoring carry-over effects yields biased estimates of the direct treatment effects. This could be obviated if every row is preceded by a period with no observation but in which the same treatment is used as in the first period of the standard square. It is unlikely that this will often be a very practicable possibility.

For continuous observations the analysis of such a design would usually be via an assumed linear model in which if Ysuij is the observation in row s, column or period u receiving treatment i and previous treatment j then

E(Ysuij) = µ + αs + βu + τi + ρj, (4.48)

with the usual assumptions about error and conventions to define the parameters in the overparameterized model, such as

Σαs = Σβu = Στi = Σρj = 0. (4.49)

Note also that any carry-over effect associated with the state of the individual before the first treatment period can be absorbed into β1, the first period effect.

In practice data from such a design are usually most simply analysed via some general purpose procedure for linear models. Nevertheless it is helpful to sketch an approach from first principles, partly to show what is essentially involved, and partly to show how to tackle similar problems in order to assess the precision likely to be achieved.

To obtain the least squares estimates, say from a single square with an even value of v, we argue as follows. By convention number the rows so that row s ends with treatment s. Then the least squares equations associated with general mean, row, column, direct and carry-over treatment effects are respectively

v²µ = Y...., (4.50)
vµ + vαs − ρs = Ys..., (4.51)
vµ + vβu = Y.u.., (4.52)
vµ + vτi − ρi = Y..i., (4.53)
(v − 1)µ − τj + (v − 1)ρj − β1 = Y...j. (4.54)

It follows that

(v² − v − 1)τi = (v − 1)Y..i. + Y...i, (4.55)

although to satisfy the standard constraint Στi = 0 a constant −Y.... + Y.1../v has to be added to the right-hand side, not affecting contrasts.

A direct calculation now shows that the variance of a simple contrast is

var(τ1 − τ2) = 2σ²(v − 1)/(v² − v − 1), (4.56)

showing that as compared with the variance 2σ²/v for a simple comparison of means the loss of efficiency from nonorthogonality is small.
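The variance inflation implied by (4.56) relative to the orthogonal value 2σ²/v is v(v − 1)/(v² − v − 1), which is only just above one; a quick arithmetic check (illustrative only, using exact rational arithmetic):

```python
from fractions import Fraction

def variance_inflation(v):
    # ratio of 2(v - 1)/(v^2 - v - 1), the sigma^2 coefficient in (4.56),
    # to 2/v, the coefficient for a simple comparison of means
    return Fraction(2 * (v - 1), v * v - v - 1) / Fraction(2, v)

for v in (4, 6, 8, 10):
    print(v, variance_inflation(v), float(variance_inflation(v)))
```

For v = 6 the inflation is 30/29, about 3.4 per cent, and it decreases as v grows.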


This analysis assumes that the parameters of primary interest are contrasts of the direct treatment effects, i.e. contrasts of the τi. If, however, it is reasonable to assume that the carry-over effect of a treatment following itself is the same as following any other treatment then the total treatment effect τi + ρi would be the focus of attention.

In some contexts also a more sensitive analysis would be obtained if it were reasonable to make the working assumption that ρi = κτi for some unknown constant of proportionality κ, leading to a nonlinear least squares analysis. There are many extensions of these ideas that are theoretically possible.

4.3.4 Single sequence

In the above discussion it has been assumed that there are a number of individuals on each of which observation continues for a relatively modest number of periods. Occasionally, however, there is a single individual under study and then the requirement is for a single sequence, possibly quite long, in which each treatment occurs the same number of times and, preferably, in which some balance of carry-over effects is achieved.

Illustration. Most clinical trials involve quite large and sometimes very large numbers of patients and are looking for relatively small effects in a context in which treatment by patient interaction is likely to be small, i.e. any treatment effect is provisionally assumed uniform across patients. By contrast it may happen that the search is for the treatment appropriate for a particular patient in a situation in which it is possible to try several treatments until, hopefully, a satisfactory solution is achieved. Such designs are called in the literature n of one designs. If the trial is over an extended period we may need special sequences of the type mentioned above.

The need for longish sequences of a small number of letters in which each letter occurs the same number of times and the same number of times either following all other letters or all letters, i.e. including itself, will be discussed in the context of serially correlated errors in Section 8.4. In the context of n of one trials it may be that the objective is best formulated not as estimating treatment contrasts but rather in decision-theoretic terms as that of maintaining the patient in a satisfactory condition for as high a proportion of time as possible. In that case quite different strategies may be appropriate, broadly of the play-the-winner kind, in which successful treatments are continued until failure when they are replaced either from a pool of as-yet-untried treatments or by the apparently best of the previously used and temporarily abandoned treatments.

4.4 Bibliographic notes

Latin squares, studied by Euler for their combinatorial interest, were occasionally used in experimentation in the 19th century. Their systematic study and the introduction of randomization are due to R. A. Fisher. For a very detailed study of Latin squares, including an account of the chequered history of the 6 × 6 Graeco-Latin square, see Denes and Keedwell (1974). For an account of special forms of Latin squares and of various extensions, such as to Latin cubes, see Preece (1983, 1988).

Balanced incomplete block designs were introduced by Yates (1936). The extremely extensive literature on their properties and extensions is best approached via P. W. M. John (1971); see also J. A. John and E. R. Williams (1995) and Raghavarao (1971). Hartley and Smith (1948) showed by direct arguments that every symmetric balanced incomplete block design, i.e. one having b = v, can be converted into a Youden square by suitable reordering. For a careful discussion of efficiency in incomplete block designs, see Pearce (1970) and related papers.

Cross-over designs are discussed in books by Jones and Kenward (1989) and Senn (1993), the latter largely in a clinical trial context. The special Latin squares balanced for residual effects were introduced by E. J. Williams (1949, 1950). For some generalizations of Williams's squares using just three squares in all, see Newcombe (1996). An analysis similar to that in Section 4.3.3 can be found in Cochran and Cox (1958, Section 4.6a). For some of the problems encountered with missing data in cross-over trials, see Chao and Shao (1997).

For n of one designs, see Guyatt et al. (1986).

4.5 Further results and exercises

1. Show that randomizing a Latin square by (a) permuting rows and columns at random, (b) permuting rows, columns and treatment names at random, (c) choosing at random from the set of all possible Latin squares of the appropriate size all generate first and second moment randomization distributions with the usual properties. What considerations might be used to choose between (a), (b) and (c)? Explore a low order case numerically.

2. Set out an orthogonal partition of 6 × 6 Latin squares in which a second alphabet is superimposed on the first with two letters each occurring three times in each row and column and three times coincident with each letter of the Latin square. Find also an arrangement with three letters each occurring twice in each row, etc. Give the form of the analysis of variance in each case. These partitions in the 6 × 6 case were given by Finney (1945a) as in one sense the nearest one could get to a 6 × 6 Graeco-Latin square.

3. In 1850 the Reverend T. P. Kirkman posed the following problem. A schoolmistress has a class of 15 girls and wishes to take them on a walk each day for a week. The girls should walk in threes with no two girls together in a triplet more than once in the week. Show that this is equivalent to finding a balanced incomplete block design and give an explicit solution.
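As a hint at the equivalence, the balanced incomplete block parameters are forced by the standard counting identities; a sketch of the arithmetic, assuming seven walking days (the classical form of the puzzle):

```python
# Kirkman's schoolgirl problem as a balanced incomplete block design:
# v girls, blocks (triplets) of size k, each pair together lam times
v, k, lam = 15, 3, 1
r = lam * (v - 1) // (k - 1)    # walks per girl: 14 companions met 2 at a time
b = v * r // k                  # total number of triplets required

print(r, b)  # 7 35
```

The seven resolution classes of five triplets each are the seven days' walks.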

4. An alternative to a balanced incomplete block design is to use a control treatment within each block and to base the analysis on differences from the control. Let there be v treatments to be compared in b blocks each of k units, of which c units receive the control C. Suppose for simplicity that v = b(k − c), and that the design consists of r replicates each of b blocks in each of which every treatment occurs once and the control bc times. Discuss the advantages of basing the analysis on (a) the difference between each treated observation and the mean of the corresponding controls and (b) on an analysis of covariance, treating the mean of the control observations as an explanatory variable. For (a) and for (b) calculate the efficiency of such a design relative to a balanced incomplete block design without recovery of interblock information and using no observations on control and with respect to a completely randomized design ignoring block structure. How would the conclusions be affected if C were of intrinsic interest and not merely a device for improving precision? The approach was discussed by Yates (1936) who considered it inefficient. His conclusions were confirmed by Atiqullah and Cox (1962) whose paper has detailed theoretical calculations.

5. Develop the intra-block analysis of an incomplete block design for v ordinary treatments, and one extra treatment, a control, in which the ordinary treatments are to be set out in a balanced incomplete block design and the control is to occur on kc extra units in each block. Find the efficiency factors for the comparison of two ordinary treatments and for the comparison of an ordinary treatment with control. Show how to choose kc to optimize the comparison of the v ordinary treatments individually with the control.

6. The family of resolvable designs called α(h1, . . . , hq) designs for v = wk treatments, r replicates of each treatment, k units per block and with w blocks in each replicate are generated as follows. There are r resolution classes of w blocks each. Any pair of treatments occurs together in h1 or . . . or in hq blocks. In many ways the most important cases are the α(0, 1) designs followed by the α(0, 1, 2) designs; that is, in the former case each pair of treatments occurs together in a block at most once.

The method of generation starts with an initial block within each resolution class, and generates the blocks in that resolution class by cyclic permutation.

These designs were introduced by Patterson and Williams (1976) and have been extensively used in plant breeding trials in which large numbers of varieties may be involved with a fairly small number of replicates of each variety and in which resolvability is important on practical grounds. See John and Williams (1995) for a careful discussion within a much larger setting and Street and Street (1987) for an account with more emphasis on the purely combinatorial aspects. For resolvable designs with unequal block sizes, see John et al. (1999).
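The defining α(0, 1) property is simple to verify for any proposed layout. The checker below is a sketch only (it does not implement the Patterson–Williams generator itself), and the toy design with v = 4, k = 2, w = 2, r = 3 is a hand-made resolvable example used purely to exercise the check:

```python
from collections import Counter
from itertools import combinations

def pair_concurrences(resolution_classes):
    """Count, for every pair of treatments, how many blocks contain both."""
    counts = Counter()
    for blocks in resolution_classes:
        for block in blocks:
            for pair in combinations(sorted(block), 2):
                counts[pair] += 1
    return counts

# toy resolvable design: each resolution class partitions the 4 treatments
classes = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(0, 3), (1, 2)],
]

counts = pair_concurrences(classes)
resolvable = all(sorted(t for blk in cls for t in blk) == [0, 1, 2, 3] for cls in classes)
alpha01 = max(counts.values()) <= 1   # alpha(0,1): each pair together at most once
```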

7. Raghavarao and Zhou (1997, 1998) have studied incomplete block designs in which each triple of treatments occurs together in a block the same number of times. One application is to marketing studies in which v versions of a commodity are to be compared and each shop is to hold only k < v versions. The design ensures that each version is seen in comparison with each pair of other possibilities the same number of times. Give such a design for v = 6, k = 4 and suggest how the responses might be analysed.


CHAPTER 5

Factorial designs: basic ideas

5.1 General remarks

In previous chapters we have emphasized experiments with a single unstructured set of treatments. It is very often required to investigate the effect of several different sets of treatments, or more generally several different explanatory factors, on a response of interest. Examples include studying the effect of temperature, concentration, and pressure on the hardness of a manufactured product, or the effects of three different types of fertiliser, say nitrogen, phosphate and potash, on the yield of a crop. The different aspects defining treatments are conventionally called factors, and there are typically a specified, usually small, number of levels for each factor. An individual treatment is a particular combination of levels of the factors.

A complete factorial experiment consists of an equal number of replicates of all possible combinations of the levels of the factors. For example, if there are three levels of temperature, and two each of concentration and pressure, then there are 3 × 2 × 2 = 12 treatments, so that we will need at least 12 experimental units in order to study each treatment only once, and at least 24 in order to get an independent estimate of error from a complete replicate of the experiment.
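The treatment count can be reproduced by direct enumeration; an illustrative sketch in which the level names are invented:

```python
from itertools import product

temperature = ("low", "medium", "high")   # 3 levels
concentration = ("low", "high")           # 2 levels
pressure = ("low", "high")                # 2 levels

# a complete factorial: every combination of factor levels is a treatment
treatments = list(product(temperature, concentration, pressure))

print(len(treatments))  # 12
```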

There are several reasons for designing complete factorial experiments, rather than, for example, using a series of experiments investigating one factor at a time. The first is that factorial experiments are much more efficient for estimating main effects, which are the averaged effects of a single factor over all units. The second, and very important, reason is that interaction among factors can be assessed in a factorial experiment but not from a series of one-at-a-time experiments.

Interaction effects are important in determining how the conclusions of the experiment might apply more generally. For example, knowing that nitrogen only improves yield in the presence of potash would be crucial information for general recommendations on fertiliser usage. A main basis for empirical extrapolation of conclusions is demonstration of the absence of important interactions. In other contexts interaction may give insight into how the treatments “work”. In many medical contexts, such as recently developed treatments for AIDS, combinations of drugs are effective when treatment with individual drugs is not.

Complete factorial systems are often large, especially if an appreciable number of factors is to be tested. Often an initial experiment will set each factor at just two levels, so that important main effects and interactions can be quickly identified and explored further. More generally a balanced portion or fraction of the complete factorial can often be used to get information on the main effects and interactions of most interest.

The choice of factors and the choice of levels for each factor are crucial aspects of the design of any factorial experiment, and will be dictated by subject matter knowledge and constraints of time or cost on the experiment.

The levels of factors can be qualitative or quantitative. Quantitative factors are usually constructed from underlying continuous variables, such as temperature, concentration, or dose, and there may well be interest in the shape of the response curve or response surface. Factorial experiments are an important ingredient in response surface methods discussed further in Section 6.5. Qualitative factors typically have no numerical ordering, although occasionally factors will have a notion of rank that is not strictly quantitative.

Factors are initially thought of as aspects of treatments: the assignment of a factor level to a particular experimental unit is under the investigator's control and in principle any unit might receive any of the various factor combinations under consideration.

For some purposes of design and analysis, although certainly not for final interpretation, it is helpful to extend the definition of a factor to include characteristics of the experimental units. These may be either important intrinsic features, such as sex or initial body mass, or nonspecific aspects, such as sets of apparatus, centres in a clinical trial, etc., stratifying the experimental units.

Illustrations. In a laboratory experiment using mice it might often be reasonable to treat sex as a formal factor and to ensure that each treatment factor occurs equally often with males and females. In an agricultural field trial it will often be important to replicate the experiment, preferably in virtually identical form, in a number of farms. This gives sex in the first case and farms in the second some of the features of a factor. The objective is not to compare male and female mice or to compare farms but rather to see whether the conclusions about treatments differ for male and for female mice or whether the conclusions have a broad range of validity across different farms.

As noted in Section 3.2, for analysis and interpretation it is often desirable to distinguish between specific characteristics of the experimental units and nonspecific groupings of the units, for example defining blocks in a randomized block experiment.

For most of the subsequent discussion we take the factors as defining treatments. Regarding each factor combination as a treatment, the discussion of Chapters 3 and 4 on control of haphazard error applies, and we may, for example, choose a completely randomized experiment, a randomized block design, a Latin square, and so on. Sometimes replication of the experiment will be associated with a blocking factor such as days, laboratories, etc.

5.2 Example

This example is adapted from Example K of Cox and Snell (1981), taken in turn from John and Quenouille (1977). Table 5.1 shows the total weights of 24 six-week-old chicks. The treatments, twelve different methods of feeding, consisted of all combinations of three factors: level of protein at three levels, type of protein at two levels, and level of fish solubles at two levels. The resulting 3 × 2 × 2 factorial experiment was independently replicated in two different houses, which we treat as blocks in a randomized block experiment.

Table 5.2 shows mean responses cross-classified by factors. Tables of means are important both for a preliminary assessment of the data, and for summarizing the results. The average response on groundnut is 6763 g, and on soybean is 7012 g, which suggests that soybean is the more effective diet. However, the two-way table of type of protein by level is indicative of what will be called an interaction: the superiority of soybean appears to be reversed at the higher level of protein.
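The two quoted averages can be recovered directly from the cell means in the final column of Table 5.1; a check in a few lines (illustrative only):

```python
# Cell means (average over the two houses) from the last column of Table 5.1,
# listed by level of protein and then level of fish solubles
groundnut = [6425.5, 6927.0, 6593.0, 7192.0, 6591.0, 6847.0]
soybean = [7073.5, 7831.0, 6596.0, 7325.5, 6585.0, 6662.0]

mean_groundnut = sum(groundnut) / len(groundnut)
mean_soybean = sum(soybean) / len(soybean)

print(round(mean_groundnut), round(mean_soybean))  # 6763 7012
```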

A plot for detecting interactions is often a helpful visual summary of the tables of means; see Figure 5.1, which is derived from Figure K.1 of Cox and Snell (1981). The interaction of type of protein with level of protein noted above shows in the lack of parallelism of the two lines corresponding to each type of protein. It is now necessary to check the strength of evidence for the effects just summarized. We develop definitions and methods for doing this in the next section.

Table 5.1 Total weights (g) of six-week-old chicks.

Protein     Level of   Level of fish   House I   House II   Mean
            protein    solubles
Groundnut      0          0             6559      6292      6425.5
               0          1             7075      6779      6927.0
               1          0             6564      6622      6593.0
               1          1             7528      6856      7192.0
               2          0             6738      6444      6591.0
               2          1             7333      6361      6847.0
Soybean        0          0             7094      7053      7073.5
               0          1             8005      7657      7831.0
               1          0             6943      6249      6596.0
               1          1             7359      7292      7325.5
               2          0             6748      6422      6585.0
               2          1             6764      6560      6662.0

5.3 Main effects and interactions

5.3.1 Assessing interaction

Consider two factors A and B at a, b levels, each combination replicated r times. Denote the response in the sth replicate at level i of A and j of B by Yijs. There are ab treatments, so the sum of squares for treatment has ab − 1 degrees of freedom and can be computed from the two-way table of means Yij., averaging over s.

The first and primary analysis of the data consists of forming the two-way table of Yij. and the associated one-way marginal means Yi.., Y.j., as we did in the example above. It is then important to determine if all the essential information is contained in comparison of the one-way marginal means, and if so, how the precision of associated contrasts is to be assessed. Alternatively, if more than the one-way marginal means are important then appropriate more detailed interpretation will be needed.

Table 5.2 Two-way tables of mean weights (g).

              Groundnut   Soybean   Mean
Level of   0     6676       7452    7064
protein    1     6893       6961    6927
           2     6719       6624    6671
Mean             6763       7012    6887

              G-nut    Soy     Level of protein      Mean
                                0      1      2
Level of   0   6537    6752    6750   6595   6588    6644
fish       1   6989    7273    7379   7259   6755    7131
Mean           6763    7012    7064   6927   6671    6887

We start the more formal analysis by assuming a linear model for Yijs that includes block effects whenever dictated by the design, and we define τij to be the treatment effect for the treatment combination i, j. Using a dot here to indicate averaging over a subscript, we can write

τij = τ.. + (τi. − τ..) + (τ.j − τ..) + (τij − τi. − τ.j + τ..), (5.1)

and in slightly different notation

τij = τ.. + τ^A_i + τ^B_j + τ^AB_ij. (5.2)

If the last term is zero for all i, j, then in the model the following statements are equivalent:

1. There is defined to be no interaction between A and B.
2. The effects of A and B are additive.
3. The difference between any two levels of A is the same at all levels of B.
4. The difference between any two levels of B is the same at all levels of A.

Of course, in the data there will virtually always be nonzero estimates of the above quantities. We define the sum of squares for


[Figure 5.1 appears here: mean weight (g), on a scale from 6400 to 7800, plotted against level of protein (0, 1, 2).]

Figure 5.1 Plot of mean weights (g) to show possible interaction. Soybean, Lev.f = 0 (———); Soybean, Lev.f = 1 (– – –); G-nut, Lev.f = 0 (· · · · · ·); G-nut, Lev.f = 1 (— — —)

interaction via the marginal means corresponding to the last term in (5.1):

Σi,j,s (Yij. − Yi.. − Y.j. + Y...)². (5.3)

Note that this is r times the corresponding sum over only i, j. One problem is to assess the significance of this, usually using an error term derived via the variation of effects between replicates, as in any randomized block experiment. If there were to be just one unit receiving each treatment combination, r = 1, then some other approach to estimating the variance is required.
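The decomposition in (5.1) can be verified numerically on the type-by-level table of means from Table 5.2; this is a from-first-principles sketch, not a library ANOVA routine:

```python
# two-way table of mean weights: rows = type of protein, columns = level of protein
table = [[6676.0, 6893.0, 6719.0],   # groundnut
         [7452.0, 6961.0, 6624.0]]   # soybean
a, b = len(table), len(table[0])

grand = sum(sum(row) for row in table) / (a * b)
row_mean = [sum(row) / b for row in table]
col_mean = [sum(table[i][j] for i in range(a)) / a for j in range(b)]

# interaction terms: tau_ij - tau_i. - tau_.j + tau_.. as in (5.1)
interaction = [[table[i][j] - row_mean[i] - col_mean[j] + grand
                for j in range(b)] for i in range(a)]

# the four components of (5.1) reassemble each cell mean exactly
rebuilt = [[grand + (row_mean[i] - grand) + (col_mean[j] - grand) + interaction[i][j]
            for j in range(b)] for i in range(a)]
```

The interaction terms sum to zero over each index, as required of the last term in (5.1).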

An important issue of interpretation that arises repeatedly also in more complicated situations concerns the role of main effects in the presence of interaction.

Consider the interpretation of, say, τ^A_2 − τ^A_1. When interaction is present the difference between levels 2 and 1 of A at level j of factor B, namely

τ2j − τ1j (5.4)


in the notation of (5.1), depends on j. If these individual differences have different signs then we say there is qualitative interaction. In the absence of qualitative interaction, the main effect of A, which is the average of the individual differences over j, retains some weak interpretation as indicating the general direction of the effect of A at all levels of B used in the experiment. However, generally in the presence of interaction, and especially qualitative interaction, the main effects do not provide a useful summary of the data and interpretation is primarily via the detailed pattern of individual effects.
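On the chick data of Section 5.2 the interaction is in fact qualitative: the soybean minus groundnut differences from Table 5.2 change sign across the levels of protein. A two-line check, for illustration:

```python
# differences (soybean - groundnut) at protein levels 0, 1, 2, from Table 5.2
differences = [7452 - 6676, 6961 - 6893, 6624 - 6719]

# qualitative interaction: the individual differences do not all share one sign
qualitative = min(differences) < 0 < max(differences)
print(differences, qualitative)  # [776, 68, -95] True
```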

In the definitions of main effects and interactions used above the parameters automatically satisfy the constraints

Σi τ^A_i = Σj τ^B_j = Σi τ^AB_ij = Σj τ^AB_ij = 0. (5.5)

The parameters are of three kinds and it is formally possible to produce submodels in which all the parameters of one, or even two, types are zero. The model with all τ^AB_ij zero, the model of main effects, has a clear interpretation and comparison with it is the basis of the test for interaction. The model with, say, all τ^B_j zero, i.e. with main effect of A and interaction terms in the model is, however, artificial and in almost all contexts totally implausible as a basis for interpretation. It would allow effects of B at individual levels of A but these effects would average exactly to zero over the levels of A that happened to be used in the experiment under analysis. Therefore, with rare exceptions, a hierarchical principle should be followed in which if an interaction term is included in a model so too should both the associated main effects. The principle extends when there are more than two factors.

A rare exception is when the averaging over the particular levels, say of factor B, used in the study has a direct physical significance.

Illustration. In an animal feeding trial suppose that A represents the diets under study and that B is not a treatment but an intrinsic factor, sex. Then interaction means that the difference between diets is not the same for males and females and qualitative interaction means that there are some reversals of effect, for example that diet 2 is better than diet 1 for males and inferior for females. Inspection only of the main effect of diets would conceal this. Suppose, however, that on the basis of the experiment a recommendation is to be made as to the choice of diet and this choice must for practical reasons be the same for male as for female animals. Suppose also that the target population has an equal mix of males and females.


Then regardless of interaction the main effect of diets should be the basis of choice. We stress that such a justification will rarely be available.

5.3.2 Higher order interaction

When there are more than two factors the above argument extends by induction. As an example, if there are three factors we could assess the three-way interaction A × B × C by examining the two-way tables of A × B means at each level of C. If there is no three-factor interaction, then

τijk − τi.k − τ.jk + τ..k (5.6)

is independent of k for all i, j, k and therefore is equivalent to

τij. − τi.. − τ.j. + τ... (5.7)

We can use this argument to conclude that the three-way interaction, which is symmetric in i, j and k, should be defined in the model by

τ^ABC_ijk = τijk − τij. − τi.k − τ.jk + τi.. + τ.j. + τ..k − τ... (5.8)

and the corresponding sum of squares of the observations can be used to assess the significance of an interaction in data.
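Definition (5.8) annihilates any table built from main effects and two-factor interactions only; a numerical check with made-up effect values (purely illustrative, not data from the text):

```python
from itertools import product

A, B, C = 2, 3, 2
# invented main effects and a two-way AxB interaction; no three-way term
a_eff = [1.0, -1.0]
b_eff = [0.5, 0.0, -0.5]
c_eff = [2.0, -2.0]
ab_eff = [[0.3, -0.1, -0.2], [-0.3, 0.1, 0.2]]
tau = [[[a_eff[i] + b_eff[j] + c_eff[k] + ab_eff[i][j] for k in range(C)]
        for j in range(B)] for i in range(A)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def tau_ABC(i, j, k):
    # the three-way interaction term exactly as defined in (5.8)
    return (tau[i][j][k]
            - mean(tau[i][j][k2] for k2 in range(C))
            - mean(tau[i][j2][k] for j2 in range(B))
            - mean(tau[i2][j][k] for i2 in range(A))
            + mean(tau[i][j2][k2] for j2 in range(B) for k2 in range(C))
            + mean(tau[i2][j][k2] for i2 in range(A) for k2 in range(C))
            + mean(tau[i2][j2][k] for i2 in range(A) for j2 in range(B))
            - mean(tau[i2][j2][k2] for i2 in range(A) for j2 in range(B) for k2 in range(C)))

max_abs = max(abs(tau_ABC(i, j, k)) for i, j, k in product(range(A), range(B), range(C)))
print(max_abs < 1e-9)  # True: no three-factor interaction is detected
```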

Note that these formal definitions apply also when one or more of the factors refer to properties of experimental units rather than to treatments. Testing the significance of interactions, especially higher order interactions, can be an important part of analysis, whereas for the main effects of treatment factors estimation is likely to be the primary focus of analysis.

5.3.3 Interpretation of interaction

Clearly lack of interaction greatly simplifies the conclusions, and in particular means that reporting the average response for each factor level is meaningful.

If there is clear evidence of interaction, then the following points will be relevant to the interpretation of the analysis. First, summary tables of means for factor A, say, averaged over factor B, or for A × B averaged over C, will not be generally useful in the presence of interaction. As emphasized in the previous subsection the significance (or otherwise) of main effects is virtually always irrelevant in the presence of appreciable interaction.

Secondly, some particular types of interaction can be removed by transformation of the response. This indicates that a scale inappropriate for the interpretation of response may have been used. Note, however, that if the response variable analysed is physically additive, i.e. is extensive, transformation back to the original scale is likely to be needed for subject matter interpretation.

Thirdly, if there are many interactions involving a particular factor, separate analyses at the different levels of that factor may lead to the most incisive interpretation. This may especially be the case if the factor concerns intrinsic properties of the experimental units: it may be scientifically more relevant to do separate analyses for men and for women, for example.

Fourthly, if there are many interactions of very high order, there may be individual factor combinations showing anomalous response, in which case a factorial formulation may well not be appropriate.

Finally, if the levels of some or all of the factors are defined by quantitative variables we may postulate an underlying relationship E Y(x_1, x_2) = η(x_1, x_2), in which a lack of interaction indicates η(x_1, x_2) = η_1(x_1) + η_2(x_2), and appreciable interaction suggests a model such as

η(x_1, x_2) = η_1(x_1) + η_2(x_2) + η_{12}(x_1, x_2),   (5.9)

where η_{12}(x_1, x_2) is not additive in its arguments, for example depending on x_1 x_2. An important special case is where η(x_1, x_2) is a quadratic function of its arguments. Response surface methods for problems of this sort will be considered separately in Section 6.6.

Our attitude to interaction depends considerably on context and indeed is often rather ambivalent. Interaction between two treatment factors, especially if it is not removable by a meaningful nonlinear transformation of response, is in one sense rather a nuisance in that it complicates simple description of effects and may lead to serious errors of interpretation in some of the more complex fractionated designs to be considered later. On the other hand such interactions may have important implications pointing to underlying mechanisms. Interactions between treatments and specific features of the experimental units, the latter in this context sometimes being called effect modifiers, may be central to interpretation and, in more applied contexts, to action. Of course interactions expected


Table 5.3 Analysis of variance for a two factor experiment in a completely randomized design.

Source      Sum of squares                                      Degrees of freedom
A           Σ_{i,j,s} (Y_{i..} − Y_{...})²                      a − 1
B           Σ_{i,j,s} (Y_{.j.} − Y_{...})²                      b − 1
A×B         Σ_{i,j,s} (Y_{ij.} − Y_{i..} − Y_{.j.} + Y_{...})²  (a − 1)(b − 1)
Residual    Σ_{i,j,s} (Y_{ijs} − Y_{ij.})²                      ab(r − 1)

on a priori subject matter grounds deserve more attention than those found retrospectively.

5.3.4 Analysis of two factor experiments

Suppose we have two treatment factors, A and B, with a and b levels respectively, and we have r replications of a completely randomized design in these treatments. The associated linear model can be written as

Y_{ijs} = µ + τ^A_i + τ^B_j + τ^{AB}_{ij} + ε_{ijs}.   (5.10)

The analysis centres on the interpretation of the table of treatment means, i.e. on the Y_{ij.}, and calculation and inspection of this array is a crucial first step.

The analysis of variance table is constructed from the identity

Y_{ijs} = Y_{...} + (Y_{i..} − Y_{...}) + (Y_{.j.} − Y_{...}) + (Y_{ij.} − Y_{i..} − Y_{.j.} + Y_{...}) + (Y_{ijs} − Y_{ij.}).   (5.11)

If the ab possible treatments have been randomized to the rab experimental units then the discussion of Chapter 4 justifies the use of the residual mean square, i.e. the variation between units within each treatment combination, as an estimate of error. If the experiment were arranged in randomized blocks or Latin squares or other similar design there would be a parallel analysis incorporating block, or row and column, or other relevant effects.
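Identity (5.11) leads directly to the sums of squares of Table 5.3. The sketch below (plain Python with NumPy, simulated data; the values of a, b and r are arbitrary) computes each term and checks that the decomposition is exact.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, r = 3, 4, 2                   # levels of A, levels of B, replicates
y = rng.normal(size=(a, b, r))      # Y_{ijs}, completely randomized

m = y.mean()                        # Y_{...}
mi = y.mean(axis=(1, 2))            # Y_{i..}
mj = y.mean(axis=(0, 2))            # Y_{.j.}
mij = y.mean(axis=2)                # Y_{ij.}

# Sums of squares from identity (5.11); each summand is constant in s
ss_a = b * r * ((mi - m) ** 2).sum()                            # df a-1
ss_b = a * r * ((mj - m) ** 2).sum()                            # df b-1
ss_ab = r * ((mij - mi[:, None] - mj[None, :] + m) ** 2).sum()  # df (a-1)(b-1)
ss_res = ((y - mij[:, :, None]) ** 2).sum()                     # df ab(r-1)
ss_tot = ((y - m) ** 2).sum()                                   # df abr-1

# The five terms of (5.11) are orthogonal, so the sums of squares add up
assert np.isclose(ss_a + ss_b + ss_ab + ss_res, ss_tot)
```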

Thus in the case of a randomized block design in r blocks, we would have r − 1 degrees of freedom for blocks and a residual with (ab − 1)(r − 1) degrees of freedom used to estimate error, as in a simple randomized block design. The residual sum of squares,


Table 5.4 Analysis of variance for factorial experiment on the effect of diets on weights of chicks.

Source                        Sum of sq.   D.f.   Mean sq.

House                           708297      1      708297
p-type                          373751      1      373751
p-level                         636283      2      318141
f-level                        1421553      1     1421553
p-type × p-level                858158      2      429079
p-type × f-level                  7176      1        7176
p-level × f-level               308888      2      154444
p-type × p-level × f-level       50128      2       25064
Residual                        492640     11       44785

which is formally an interaction between treatments and blocks, can be partitioned into A×blocks, B×blocks and A×B×blocks, giving separate error terms for the three components of the treatment effect. This would normally only be done if there were expected to be some departure from unit-treatment additivity likely to induce heterogeneity in the random variability. Alternatively the homogeneity of these three sums of squares provides a test of unit-treatment additivity, albeit one of low sensitivity.

In Section 6.5 we consider the interpretation when one or more of the factors represent nonspecific classification of the experimental units, for example referring to replication of an experiment over time, space, etc.

5.4 Example: continued

In constructing the analysis of variance table we treat House as a blocking factor, and assess the size of treatment effects relative to the interaction of treatments with House, the latter providing an appropriate estimate of variance for comparing treatment means, as discussed in Section 5.3. Table 5.4 shows the analysis of variance. As noted in Section 5.3 the residual sum of squares can be partitioned into components to check that no one effect is unusually large.

Since level of protein is a factor with three levels, it is possible


Table 5.5 Decomposition of treatment sum of squares into linear and quadratic contrasts.

Source                        Sum of sq.   D.f.

House                           708297      1
p-type                          373751      1
p-level
  linear                        617796      1
  quadratic                      18487      1
f-level                        1421553      1
p-type × p-level
  linear                        759510      1
  quadratic                      98640      1
p-type × f-level                  7176      1
p-level × f-level
  linear                        214370      1
  quadratic                      94520      1
p-type × p-level × f-level
  linear                         47310      1
  quadratic                       2820      1
Residual                        492640     11

to partition the two degrees of freedom associated with its main effects and its interactions into components corresponding to linear and quadratic contrasts, as outlined in Section 3.5. Table 5.5 shows this partition. From the three-way table of means included in Table 5.1 we see that the best treatment combination is a soybean diet at its lowest level, combined with the high level of fish solubles: the average weight gain on this diet is 7831 g, and the next best diet leads to an average weight gain of 7326 g. The estimated variance of the difference between two treatment means is σ̂²/2 + σ̂²/2 = 211.6², where σ̂² is the residual mean square in Table 5.4.

5.5 Two level factorial systems

5.5.1 General remarks

Experiments with large numbers of factors are often used as a screening device to assess quickly important main effects and interactions. For this it is common to set each factor at just two levels, aiming to keep the size of the experiment manageable. The levels of each factor are conventionally called low and high, or absent and present.

We denote the factors by A, B, . . . and a general treatment combination by a^i b^j · · ·, where i, j, . . . take the value zero when the corresponding factor is at its low level and one when the corresponding factor is at its high level. For example in a 2^5 design, the treatment combination bde has factors A and C at their low level, and B, D and E at their high level. The treatment combination of all factors at their low level is (1).

We denote the treatment means in the population, i.e. the expected responses under each treatment combination, by µ_{(1)}, µ_a, and so on. The observed response for each treatment combination is denoted by Y_{(1)}, Y_a, and so on. These latter will be averages over replicates if there is more than one observation on each treatment.

The simplest case is a 2^2 experiment, with factors A and B, and four treatment combinations (1), a, b, and ab. There are thus four identifiable parameters, the general mean, two main effects and an interaction. In line with the previous notation we denote these by µ, τ_A, τ_B and τ_{AB}. The population treatment means µ_{(1)}, µ_a, µ_b, µ_{ab}

are simple linear combinations of these parameters:

τ_A = (µ_{ab} + µ_a − µ_b − µ_{(1)})/4,

τ_B = (µ_{ab} − µ_a + µ_b − µ_{(1)})/4,

τ_{AB} = (µ_{ab} − µ_a − µ_b + µ_{(1)})/4.

The corresponding least squares estimate of, for example, τ_A, under the summation constraints, is

τ̂_A = (Y_{ab} + Y_a − Y_b − Y_{(1)})/4 = (Y_{2..} − Y_{1..})/2 = Y_{2..} − Y_{...},   (5.12)

where, for example, Y_{2..} is the mean over all replicates and over both levels of factor B of the observations taken at the higher level of A. Similarly

τ̂_{AB} = (Y_{ab} − Y_a − Y_b + Y_{(1)})/4 = (Y_{11.} − Y_{21.} − Y_{12.} + Y_{22.})/4,   (5.13)

where Y_{11.} is the mean of the r observations with A and B at their lower levels.

The A contrast, also called the A main effect, is estimated by the difference between the average response among units receiving


high A and the average response among units receiving low A, and is equal to 2τ_A as defined above. In the notation of (5.2), τ_A = τ^A_2 = −τ^A_1, and the estimated A effect is defined to be τ̂^A_2 − τ̂^A_1. The interaction is estimated via the difference between Y_{ab} − Y_b and Y_a − Y_{(1)}, i.e. the difference between the effect of A at the high level of B and the effect of A at the low level of B.

Thus the estimates of the effects are specified by three orthogonal linear contrasts in the response totals. This leads directly to an analysis of variance table of the form shown in Table 5.6.
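The estimates (5.12) and (5.13) and their orthogonality can be checked numerically. A small sketch (plain Python with NumPy, invented treatment means):

```python
import numpy as np

# Treatment means for a 2^2 experiment in standard order (1), a, b, ab
# (hypothetical numbers, replicates already averaged)
y1, ya, yb, yab = 10.0, 14.0, 11.0, 19.0

tau_a = (yab + ya - yb - y1) / 4     # estimate (5.12)
tau_b = (yab - ya + yb - y1) / 4
tau_ab = (yab - ya - yb + y1) / 4    # estimate (5.13)

# The same estimates written as contrasts in the vector of means
c_a = np.array([-1, 1, -1, 1])
c_b = np.array([-1, -1, 1, 1])
c_ab = np.array([1, -1, -1, 1])
ymeans = np.array([y1, ya, yb, yab])

assert np.isclose(c_a @ ymeans / 4, tau_a)
assert np.isclose(c_b @ ymeans / 4, tau_b)
assert np.isclose(c_ab @ ymeans / 4, tau_ab)
assert c_a @ c_b == c_a @ c_ab == c_b @ c_ab == 0   # mutually orthogonal
```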

By defining

I = (1/4)(µ_{(1)} + µ_a + µ_b + µ_{ab})   (5.14)

we can write

I  = (1/4)(  µ_{(1)} + µ_a + µ_b + µ_{ab} ),
A  = (1/2)( −µ_{(1)} + µ_a − µ_b + µ_{ab} ),
B  = (1/2)( −µ_{(1)} − µ_a + µ_b + µ_{ab} ),
AB = (1/2)(  µ_{(1)} − µ_a − µ_b + µ_{ab} ),   (5.15)

and this pattern is readily generalized to k greater than 2; for example

8 I    = (  1   1   1   1   1   1   1   1 )
4 A    = ( −1   1  −1   1  −1   1  −1   1 )
4 B    = ( −1  −1   1   1  −1  −1   1   1 )
4 AB   = (  1  −1  −1   1   1  −1  −1   1 )
4 C    = ( −1  −1  −1  −1   1   1   1   1 )
4 AC   = (  1  −1   1  −1  −1   1  −1   1 )
4 BC   = (  1   1  −1  −1  −1  −1   1   1 )
4 ABC  = ( −1   1   1  −1   1  −1  −1   1 )

applied to the column vector (µ_{(1)}, µ_a, µ_b, µ_{ab}, µ_c, µ_{ac}, µ_{bc}, µ_{abc})^T.   (5.16)

Note that the effect of AB, say, is the contrast of the a^i b^j c^k for which i + j = 0 mod 2 with those for which i + j = 1 mod 2. Also the product of the coefficients for C and ABC gives the coefficients for AB, etc. All the contrasts are orthogonal.

The matrix in (5.16) is constructed row by row, the first row consisting of all 1's. The rows for A and B have entries −1, +1 in the order determined by that in the set of population treatment means: in (5.16) they are written in standard order to make construction of the matrix straightforward. The row for AB is the product of those for A and B, and so on. Matrices for up to a 2^6 design can be quickly tabulated in a table of signs.
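The row-by-row construction is equivalent to repeated Kronecker multiplication of a single 2 × 2 block (a connection developed further in Exercise 1 of Section 5.9). A sketch in Python with NumPy:

```python
import numpy as np

# One-factor block: rows are (I, main effect) over levels (low, high)
M = np.array([[1, 1],
              [-1, 1]])

def contrast_matrix(k):
    """Sign matrix for the 2^k factorial, treatments in standard order."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.kron(M, H)   # each new factor varies slowest, as in (5.16)
    return H

# Rows I, A, B, AB over columns (1), a, b, ab -- the matrix in (5.15)
H2 = contrast_matrix(2)
expected = np.array([[1, 1, 1, 1],
                     [-1, 1, -1, 1],
                     [-1, -1, 1, 1],
                     [1, -1, -1, 1]])
assert (H2 == expected).all()

# All rows of the 2^3 matrix are mutually orthogonal
H3 = contrast_matrix(3)
assert (H3 @ H3.T == 8 * np.eye(8)).all()
```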


Table 5.6 Analysis of variance for r replicates of a 2^2 factorial.

Source        Sum sq.    D.f.
Factor A      SS_A       1
Factor B      SS_B       1
Interaction   SS_{AB}    1
Residual                 4(r − 1)
Total                    4r − 1

5.5.2 General definitions

The matrix approach outlined above becomes increasingly cumbersome as the number of factors increases. It is convenient for describing the general 2^k factorial to use some group theory: Appendix B provides the basic definitions. The treatment combinations in a 2^k factorial form a prime power commutative group; see Section B.2.2. The set of contrasts also forms a group, dual to the treatment group.

In the 2^3 factorial the treatment group is

{(1), a, b, ab, c, ac, bc, abc}

and the contrast group is

{I, A, B, AB, C, AC, BC, ABC}.

As in (5.16) above, each contrast is the difference of the population means for two sets of treatments, and the two sets of treatments are determined by an element of the contrast group. For example the element A partitions the treatments into the sets {(1), b, c, bc} and {a, ab, ac, abc}, and the A effect is thus defined to be (µ_a + µ_{ab} + µ_{ac} + µ_{abc} − µ_{(1)} − µ_b − µ_c − µ_{bc})/4.

In a 2^k factorial we define a contrast group {I, A, B, AB, . . .} consisting of symbols A^α B^β C^γ · · ·, where α, β, γ, . . . take values 0 and 1. An arbitrary nonidentity element A^α B^β · · · of the contrast group divides the treatments into two sets, with a^i b^j c^k · · · in one set or the other according as

αi + βj + γk + · · · = 0 mod 2,   (5.17)
αi + βj + γk + · · · = 1 mod 2.   (5.18)


Then

A^α B^β C^γ · · · = (1/2^{k−1}) {(sum of µ's in the set containing a^α b^β c^γ · · ·) − (sum of µ's in the other set)},   (5.19)

I = (1/2^k)(sum of all µ's).   (5.20)

The two sets of treatments defined by any contrast form a subgroup and its coset; see Section B.2.2.

More generally, we can divide the treatments into 2^l subsets using a contrast subgroup of order 2^l. Let S_C be a subgroup of order 2^l of the contrast group defined by l generators

G_1 = A^{α_1} B^{β_1} · · ·
G_2 = A^{α_2} B^{β_2} · · ·
  ...   (5.21)
G_l = A^{α_l} B^{β_l} · · · .

Divide the treatment group into 2^l subsets containing

(i) all symbols with an (even, even, . . . , even) number of letters in common with G_1, . . . , G_l;
(ii) all symbols with an (odd, even, . . . , even) number of letters in common with G_1, . . . , G_l;
  ...
(2^l) all symbols with an (odd, odd, . . . , odd) number of letters in common with G_1, . . . , G_l.

Then (i) is a subgroup of order 2^{k−l} of the treatment group, and the sets (ii), . . . , (2^l) each contain 2^{k−l} elements and are cosets of (i). In particular, therefore, there are the same number of treatments in each of these sets.

For example, in a 2^4 design, the contrasts ABC, BCD (or the contrast subgroup {I, ABC, BCD, AD}) divide the treatments into the four sets

(i) {(1), bc, abd, acd}   (ii) {a, abc, bd, cd}   (iii) {d, bcd, ab, ac}   (iv) {ad, abcd, b, c}.

The treatment subgroup in (i) is dual to the contrast subgroup.
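This partition is easy to reproduce: group the 16 treatments of the 2^4 factorial by their pair of parities (5.17) under ABC and BCD. A sketch (Python, treatments coded as 0/1 exponent vectors):

```python
from itertools import product

letters = "abcd"

def name(bits):
    """Name a treatment from its exponent vector, e.g. (0,1,1,0) -> 'bc'."""
    s = "".join(l for l, b in zip(letters, bits) if b)
    return s if s else "(1)"

# Contrasts ABC and BCD as exponent vectors (alpha, beta, gamma, delta)
ABC = (1, 1, 1, 0)
BCD = (0, 1, 1, 1)

def parity(contrast, t):
    return sum(c * i for c, i in zip(contrast, t)) % 2    # as in (5.17)

# Group the 2^4 treatments by their pair of parities
sets = {}
for t in product((0, 1), repeat=4):
    sets.setdefault((parity(ABC, t), parity(BCD, t)), []).append(name(t))

# Four equally sized sets: the subgroup (even, even) and its three cosets
assert all(len(v) == 4 for v in sets.values())
assert set(sets[(0, 0)]) == {"(1)", "bc", "abd", "acd"}
assert set(sets[(1, 0)]) == {"a", "abc", "bd", "cd"}
```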


Any two contrasts are orthogonal, in the sense that the defining contrasts divide the treatments into four equally sized sets, a subgroup and three cosets.

5.5.3 Estimation of contrasts

In a departure from our usual practice, we use the same notation for the population contrast and for its estimate. Consider a design in which each of the 2^k treatments occurs r times, the design being arranged in completely randomized form, or in randomized blocks with 2^k units per block, or in 2^k × 2^k Latin squares. The least squares estimates of the population contrasts are simply obtained by replacing population means by sample means: for example,

A^α B^β C^γ · · · = (1/(2^{k−1} r)) {(sum of y's in the set containing a^α b^β c^γ · · ·) − (sum of y's in the other set)},   (5.22)

I = (1/(2^k r))(sum of all y's).   (5.23)

Each contrast is estimated by the difference of two means, each of r 2^{k−1} = n/2 observations, which has variance 2σ²/(r 2^{k−1}) = 4σ²/n. The analysis of variance table, for example for the randomized blocks design, is given in Table 5.7. The single degree of freedom sums of squares are equal to r 2^{k−2} times the square of the corresponding estimated effect, a special case of the formula for a linear contrast given in Section 3.5. If meaningful, the residual sum of squares can be partitioned into sets of r − 1 degrees of freedom. A table of estimated effects and their standard errors will usually be a more useful summary than the analysis of variance table.

Typically for moderately large values of k the experiment will not be replicated, so there is no residual sum of squares to provide a direct estimate of the variance. A common technique is to pool the estimated effects of the higher order interactions, the assumption being that these interactions are likely to be negligible, in which case each of their contrasts has mean zero and variance 4σ²/n. If we pool l such estimated effects, we have l degrees of freedom to estimate σ². For example, in a 2^5 experiment there are five main effects and 10 two factor interactions, leaving 16 residual degrees of freedom if all the third and higher order interactions are pooled.

A useful graphical aid is a normal probability plot of the estimated effects. The estimated effects are ordered from smallest


Table 5.7 Analysis of variance for a 2^k design.

Source                              Degrees of freedom
Blocks                              r − 1
Treatments                          2^k − 1
  (individual contrasts, each)      1
Residual                            (r − 1)(2^k − 1)

to largest, and the ith effect in this list of size 2^k − 1 is plotted against the expected value of the ith largest of 2^k − 1 order statistics from the standard normal distribution. Such plots typically have a number of nearly zero effect estimates falling on a straight line, and a small number of highly significant effects which readily stand out. Since all effects have the same estimated variance, this is an easy way to identify important main effects and interactions, and to suggest which effects to pool for the estimation of σ². Sometimes further plots may be made in which either all main effects are omitted or all manifestly significant contrasts omitted. The expected value of the ith of n order statistics from the standard normal can be approximated by Φ^{−1}{i/(n + 1)}, where Φ(·) is the cumulative distribution function of the standard normal distribution. A variation on this graphical aid is the half normal plot, which ranks the estimated effects according to their absolute values, which are plotted against the corresponding expected value of the absolute value of a standard normal variate. The full normal plot is to be preferred if, for example, the factor levels are defined in such a way that the signs of the estimated main effects have a reasonably coherent interpretation, for example that positive effects are a priori more likely than negative effects.
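The plotting positions described above are easy to compute with the standard library alone. The sketch below uses the Φ^{−1}{i/(n + 1)} approximation from the text; the half normal variant uses Φ^{−1}{(n + i)/(2n + 1)}, a common approximation for the expected absolute normal order statistics, which is an assumption here rather than a formula from the book.

```python
from statistics import NormalDist

def normal_plot_positions(effects):
    """Pair each ordered effect with its approximate normal score
    Phi^{-1}(i/(n+1)), i = 1..n, as described in the text."""
    n = len(effects)
    scores = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(scores, sorted(effects)))

def half_normal_plot_positions(effects):
    """Half normal variant: ordered |effect| against approximate scores
    Phi^{-1}((n + i)/(2n + 1)) for an absolute standard normal
    (a common approximation, not taken from the text)."""
    n = len(effects)
    scores = [NormalDist().inv_cdf((n + i) / (2 * n + 1))
              for i in range(1, n + 1)]
    return list(zip(scores, sorted(abs(e) for e in effects)))

# Invented effect estimates: most near zero, two clearly standing out
pts = normal_plot_positions([-0.3, 0.1, 2.4, -0.2, 0.0, 0.2, -4.1])
assert len(pts) == 7 and pts[0][1] == -4.1   # most negative effect first
```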

5.6 Fractional factorials

In some situations quite sharply focused research questions are formulated involving a small number of key factors. Other factors may be involved either for technical reasons, or to explore interactions,


but the contrasts of main concern are clear. In other applications of a more exploratory nature, there may be a large number of factors of potential interest and the working assumption is often that only main effects and a small number of low order interactions are important. Other possibilities are that only a small group of factors and their interactions influence response or that response may be the same except when all factors are simultaneously at their high levels.

Illustration. Modern techniques allow the modification of single genes to find the gene or genes determining a particular feature in experimental animals. For some features it is likely that only a small number of genes are involved.

We turn now to methods particularly suited for the second situation mentioned above, namely when main effects and low order interactions are of primary concern.

A complete factorial experiment with a large number of factors requires a very large number of observations, and it is of interest to investigate what can be estimated from only part of the full factorial experiment. For example, a 2^7 factorial requires 128 experimental units, and from these responses there are to be estimated 7 main effects and 21 two factor interactions, leaving 99 degrees of freedom to estimate error and/or higher order interactions. It seems feasible that quite good estimates of main effects and two factor interactions could often be obtained from a much smaller experiment.

As a simple example, suppose in a 2^3 experiment we obtain observations only from treatments (1), ab, ac and bc. The linear combination (y_{ab} + y_{ac} − y_{bc} − y_{(1)})/2 provides an estimate of the A contrast, as it compares all observations at the high level of A with those at the low level. However, this linear combination is also the estimate that would be obtained for the interaction BC, using the argument outlined in the previous section. The main effect of A is said to be aliased with that of BC. Similarly the main effect of B is aliased with AC and that of C aliased with AB. The experiment that consists in obtaining observations only on the four treatments (1), ab, ac and bc is called a half-fraction or half-replicate of a 2^3 factorial.

The general discussion in Section 5.5 is directly useful for defining a 2^{−l} fraction of a 2^k factorial. These designs are called 2^{k−l} fractional factorials. Consider first a 2^{k−1} fractional factorial. As


we saw in Section 5.5.2, any element of the contrast group partitions the treatments into two sets. A half-fraction of the 2^k factorial consists of the experiment taking observations on one of these two sets. The contrast that is used to define the sets cannot be estimated from the experiment, but every other contrast can be, as all the contrasts are orthogonal. For example, in a 2^5 factorial we might use the contrast ABCDE to define the two sets. The set of treatments a^i b^j c^k d^l e^m for which i + j + k + l + m = 0 mod 2 forms the first half fraction.
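The aliasing in the 2^3 half-fraction discussed above can be checked mechanically: restricted to the fraction {(1), ab, ac, bc}, each main effect has exactly the same ±1 pattern as the complementary two factor interaction. A sketch (Python, treatments coded as 0/1 exponent triples; signs are assigned directly from the parity in (5.17), so with the sign convention of (5.19) the alias pairs would instead differ by a factor −1 here, since I = −ABC on this fraction):

```python
from itertools import product

def signs(contrast, treatments):
    """+1/-1 according to the parity of alpha*i + beta*j + gamma*k, (5.17)."""
    return [1 if sum(c * t for c, t in zip(contrast, trt)) % 2 else -1
            for trt in treatments]

# The half fraction (1), ab, ac, bc: all treatments with i+j+k even
fraction = [t for t in product((0, 1), repeat=3) if sum(t) % 2 == 0]

A, B, C = (1, 0, 0), (0, 1, 0), (0, 0, 1)
BC, AC, AB = (0, 1, 1), (1, 0, 1), (1, 1, 0)

for main, inter in [(A, BC), (B, AC), (C, AB)]:
    # On the fraction the two contrasts are indistinguishable
    assert signs(main, fraction) == signs(inter, fraction)
```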

More generally, any subgroup of order 2^l of the contrast group, defined by l generators, divides the treatments into a subgroup and its cosets. A 2^{k−l} fractional factorial design takes observations on just one of these sets of treatments, say the subgroup, set (i). Now consider the estimation of an arbitrary contrast A^α B^β . . .. This compares treatments for which

αi + βj + . . . = 0 mod 2   (5.24)

with those for which

αi + βj + . . . = 1 mod 2.   (5.25)

However, by construction of the treatment subgroup all treatments satisfy α_r i + β_r j + . . . = 0 mod 2 for r = 1, . . . , 2^l − 1, see (5.21), so that we are equally comparing

(α + α_r)i + (β + β_r)j + . . . = 0   (5.26)

with

(α + α_r)i + (β + β_r)j + . . . = 1.   (5.27)

Thus any estimated contrast has 2^l − 1 alternative interpretations, i.e. aliases, obtained by multiplying the contrast into the elements of the alias subgroup. The general theory is best understood by working through an example in detail: see Exercise 5.2.

In general we aim to choose the alias subgroup so that so far as possible main effects and two factor interactions are aliased with three factor and higher order interactions. Such a design is called a design of Resolution V; designs in which two factor interactions are aliased with each other are Resolution IV, and designs in which two factor interactions are aliased with main effects are Resolution III. The resolution of a fractional factorial is equal to the length of the shortest member of the alias subgroup.

For example, suppose that we wanted a 1/4 replicate of a 2^6 factorial, i.e. a 2^{6−2} design investigating six factors in 16 observations. At first sight it might be tempting to take five factor interactions to define the aliasing subgroup, for example taking ABCDE, BCDEF as generators leading to the contrast subgroup

{I, ABCDE, BCDEF, AF},   (5.28)

clearly a poor choice for nearly all purposes because the main effects of A and F are aliased. A better choice is the Resolution IV design with contrast subgroup

{I, ABCD, CDEF, ABEF}   (5.29)

leaving each main effect aliased with two three-factor interactions. Some two factor interactions are aliased in triples, e.g. AB, CD and EF, and others in pairs, e.g. AC and BD, and occasionally some use could be made of this distinction in naming the treatments. To find the 16 treatment combinations forming the design we have to find four independent generators of the appropriate subgroup and form the full set of treatments by repeated multiplication. The choice of particular generators is arbitrary but might be ab, cd, ef, ace, yielding

(1)  ab  cd  abcd  ef  abef  cdef  abcdef
ace  bce  ade  bde  acf  bcf  adf  bdf   (5.30)
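A sketch of this construction (Python): starting from the generators ab, cd, ef, ace, repeated multiplication, i.e. addition of exponent vectors over GF(2), yields the 16 treatments of (5.30), each of which is even with respect to the defining contrasts ABCD and CDEF.

```python
from itertools import product

letters = "abcdef"

def name(bits):
    s = "".join(l for l, b in zip(letters, bits) if b)
    return s if s else "(1)"

# Generators ab, cd, ef, ace as GF(2) exponent vectors
gens = [(1, 1, 0, 0, 0, 0), (0, 0, 1, 1, 0, 0),
        (0, 0, 0, 0, 1, 1), (1, 0, 1, 0, 1, 0)]

# Repeated multiplication = all GF(2) linear combinations of the generators
treatments = set()
for coeffs in product((0, 1), repeat=4):
    t = tuple(sum(c * g[p] for c, g in zip(coeffs, gens)) % 2
              for p in range(6))
    treatments.add(t)

assert len(treatments) == 16
# Every treatment is even with respect to both defining contrasts
ABCD, CDEF = (1, 1, 1, 1, 0, 0), (0, 0, 1, 1, 1, 1)
for t in treatments:
    assert sum(a * i for a, i in zip(ABCD, t)) % 2 == 0
    assert sum(a * i for a, i in zip(CDEF, t)) % 2 == 0
assert {"(1)", "abcdef", "ace", "bdf"} <= {name(t) for t in treatments}
```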

A coset of these could be used instead.

Note that if, after completing this 1/4 replicate, it were decided

that another 1/4 replicate is needed, replication of the same set of treatments would not usually be the most suitable procedure. If, for example, it were of special interest to clarify the status of the interaction AB, it would be sensible in effect to reduce the aliasing subgroup to

{I, CDEF}   (5.31)

by forming a coset by multiplication by, for example, acd, which is not in the above subgroup but which is even with respect to CDEF.

There are in general rich possibilities for the formation of series of experiments, clarifying at each stage ambiguities in the earlier results and perhaps removing uninteresting-seeming factors and adding new ones.


5.7 Example

Blot et al. (1993) report a large nutritional intervention trial in Linxian county in China. The goal was to investigate the role of dietary supplementation with specific vitamins and minerals on the incidence of and mortality from esophageal and stomach cancers, a leading cause of mortality in Linxian county. There were nine specific nutrients of interest, but a 2^9 factorial experiment was not considered feasible. The supplements were instead administered in combination, and each of four factors was identified by a particular set of nutrients, as displayed in Table 5.8.

The trial recruited nearly 30 000 residents, who were randomly assigned to receive one of eight vitamin/mineral supplement combinations within blocks defined by commune, sex and age. The treatment set formed a one-half fraction of the full 2^4 design with the contrast ABCD defining the fraction. Table 5.9 shows the data, the number of cancer deaths and person-years of observation in each of the eight treatment groups. Estimates of the main effects and the two factor interactions are presented in Table 5.10. The two factor interactions are all aliased in pairs.

Table 5.8 Treatment factors: combinations of micronutrients. From Blot et al. (1993).

Factor   Micronutrients   Dose per day
A        Retinol          5000 IU
         Zinc             22.5 mg
B        Riboflavin       3.2 mg
         Niacin           40 mg
C        Vitamin C        120 mg
         Molybdenum       30 µg
D        Beta carotene    15 mg
         Selenium         50 µg
         Vitamin E        30 mg

We estimate the treatment effects using a log-linear model for the rates of cancer deaths in the eight groups. If we regard the counts as being approximately Poisson distributed, the variance of the log of a single response is approximately 1/µ, where µ is the Poisson


Table 5.9 Cancer mortality in the Linxian study. From Blot et al. (1993).

Treatment   Person-years of     Number of            Deaths from
            observation, n_c    cancer deaths, d_c   all causes
(1)         18626               107                  280
ab          18736                94                  265
ac          13701               121                  296
bc          18686               101                  268
ad          18745                81                  250
bd          18729               103                  263
cd          18758                90                  249
abcd        18792                95                  256

Table 5.10 Estimated effects based on analysis of log(d_c/n_c).

A        B        C        D        AB,CD    AC,BD    AD,BC
−0.036   −0.005   0.053    −0.140   −0.043   0.152    −0.058

mean (Exercise 8.3), so the average of these across the eight groups is estimated by (1/8)(1/107 + . . . + 1/95) = 0.010. Since each contrast is the difference between averages of four totals, the standard error of the estimated effects is approximately 0.072. From this we see that the main effect of D is substantial, although the interpretation of this is somewhat confounded by the large increase in mortality rate associated with the interaction AC = BD. This is consistent with the conclusion reached in the analysis of Blot et al. (1993), that dietary supplementation with beta carotene, selenium and vitamin E is potentially effective in reducing the mortality from stomach cancers. There is a similar effect on total mortality, which is analysed in Appendix C.

To some extent this analysis sets aside one of the general principles of Chapters 2 and 3. By treating the random variation as having a Poisson distribution we are in effect treating individual subjects as the experimental units rather than the groups of subjects which are the basis of the randomization. It is thus assumed that so far as the contrasts of treatments are concerned the blocking has essentially accounted for all the overdispersion relative to the Poisson distribution that is likely to be present. The more careful analysis of Blot et al. (1993), which used the more detailed data in which the randomization group is the basis of the analysis, essentially confirms that.

5.8 Bibliographic notes

The importance of a factorial approach to the design of experiments was a key element in Fisher's (1926, 1935) systematic approach to the subject. Many of the crucial details were developed by Yates (1935, 1937). A review of the statistical aspects of interaction is given in Cox (1984a); see also Cox and Snell (1981, Section 4.13). For discussion of qualitative interaction, see Azzalini and Cox (1984), Gail and Simon (1985) and Ciminera et al. (1993).

Factorial experiments are quite widely used in many fields. For a review of the fairly limited number of clinical trials that are factorial, see Piantadosi (1997, Chapter 15). The systematic exploration of factorial designs in an industrial context is described by Box, Hunter and Hunter (1978); see also the Bibliographic notes for Chapter 6. Daniel (1959) introduced the graphical method of the half-normal plot; see Olguin and Fearn (1997) for the calculation of guard rails as an aid to interpretation.

Fractional replication was first discussed by Finney (1945b) and, independently, for designs primarily concerned with main effects, by Plackett and Burman (1945). For an introductory account of the mathematical connection between fractional replication and coding theory, see Hill (1986) and also the Bibliographic notes for Appendix B.

A formal mathematical definition of the term factor is provided in McCullagh (2000) in relation to category theory. This provides a mathematical interpretation to the notion that main effects are not normally meaningful in the presence of interaction. McCullagh also uses the formal definition of a factor to emphasize that associated models should preserve their form under extension and contraction of the set of levels. This is particularly relevant when some of the factors are homologous, i.e. have levels with identical meanings.


5.9 Further results and exercises

1. For a single replicate of the 2^2 system, write the observations as a column vector in the standard order 1, a, b, ab. Form a new 4 × 1 vector by adding successive pairs and then subtracting successive pairs, i.e. to give Y_1 + Y_a, Y_b + Y_{ab}, Y_a − Y_1, Y_{ab} − Y_b. Repeat this operation on the new vector and check that there results, except for constant multipliers, estimates in standard order of the contrast group.

Show by induction that for the 2^k system k repetitions of the above procedure yield the set of estimated contrasts.

Observe that the central operation is repeated multiplication by the 2 × 2 matrix

M = ( 1   1 )
    ( 1  −1 )   (5.32)

and that the kth Kronecker product of this matrix with itself generates the matrix defining the full set of contrasts.

Show further that by working with a matrix proportional to M^{−1} we may generate the original observations starting from the set of contrasts, and suggest how this could be used to smooth a set of observations in the light of an assumption that certain contrasts are null.

The algorithm was given by Yates (1937), after whom it is commonly named. An extension covering three level factors is due to Box and reported in the book edited by Davies (1956). The elegant connection to Kronecker products and the fast Fourier transform was discussed by Good (1958).
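The procedure of this exercise can be sketched as follows (Python; the pairwise step takes second-minus-first differences, matching the ordering Y_a − Y_1, Y_{ab} − Y_b given above, which differs from (5.32) only in sign conventions):

```python
def yates(y):
    """Yates's algorithm: y in standard order (1), a, b, ab, c, ...;
    returns totals proportional to I, A, B, AB, C, ... contrasts."""
    n = len(y)
    k = n.bit_length() - 1
    assert 1 << k == n, "length must be a power of two"
    for _ in range(k):
        sums = [y[i] + y[i + 1] for i in range(0, n, 2)]
        diffs = [y[i + 1] - y[i] for i in range(0, n, 2)]
        y = sums + diffs
    return y

# 2^2 check against the direct contrasts
y1, ya, yb, yab = 1.0, 2.0, 3.0, 4.0
out = yates([y1, ya, yb, yab])
assert out == [y1 + ya + yb + yab,        # 4I
               ya + yab - y1 - yb,        # A contrast
               yb + yab - y1 - ya,        # B contrast
               yab - yb - ya + y1]        # AB contrast
```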

2. Construct a 1/4 fraction of a 2^5 factorial using the generators G_1 = ABCD and G_2 = CDE. Write out the sets of aliased effects.

3. Using the construction outlined in Section 5.5.2 in a 2^4 factorial, verify that any contrast does define two sets of treatments, with 2^3 treatments in each set, and that any pair of contrasts divides the treatments into four sets each of 2^2 treatments.

4. In the notation of Section 5.5.2, verify that the 2^l subsets of the treatment group constructed there are equally determined by the conditions:

(i)    α1 i + β1 j + . . . = 0,   α2 i + β2 j + . . . = 0,   . . . ,   αl i + βl j + . . . = 0,
(ii)   α1 i + β1 j + . . . = 1,   α2 i + β2 j + . . . = 0,   . . . ,   αl i + βl j + . . . = 0,
. . .
(2^l)  α1 i + β1 j + . . . = 1,   α2 i + β2 j + . . . = 1,   . . . ,   αl i + βl j + . . . = 1.

5. Table 5.11 shows the design and responses for four replicates of a 1/4 fraction of a 2^6 factorial design. The generators used to determine the set of treatments were ABCD and BCEF. Describe the alias structure of the design and discuss its advantages and disadvantages. The factors represent the amounts of various minor ingredients added to flour during milling, and the response variable is the average volume in ml/g of three loaves of bread baked from dough using the various flours (Tuck, Lewis and Cottrell, 1993). Table 5.12 gives the estimates of the main effects and estimable two factor interactions. The standard error of the estimates can be obtained by pooling small effects or via the treatment-block interaction, treating day as a blocking factor. The details are outlined in Appendix C.

6. Factorial experiments are normally preferable to those in which successive treatment combinations are defined by changing only one factor at a time, as they permit estimation of interactions as well as of main effects. However, there may be cases, for example when it is very difficult to vary factor levels, where one-factor-at-a-time designs are needed. Show that for a 2^3 factorial, the design which has the sequence of treatments (1), a, ab, abc, bc, c, (1) permits estimation of the three main effects and of the three interaction sums AB + AC, −AB + BC and AC + BC. This design also has the property that the main effects are not confounded by any linear drift in the process over the sequence of the seven observations. Extend the discussion to the 2^4 experiment (Daniel, 1994).
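The drift-resistance claim in Exercise 6 can be checked numerically. In the sketch below (Python; the estimator built from the two successive differences at which each factor changes level is our reading of the design, not spelled out in the text), a response consisting of pure linear drift yields zero estimated main effects:

```python
import numpy as np

# The seven runs of the one-factor-at-a-time sequence, with the
# +1/-1 levels of factors A, B, C at each run.
seq = ['(1)', 'a', 'ab', 'abc', 'bc', 'c', '(1)']
levels = {'(1)': (-1, -1, -1), 'a': (1, -1, -1), 'ab': (1, 1, -1),
          'abc': (1, 1, 1), 'bc': (-1, 1, 1), 'c': (-1, -1, 1)}
X = np.array([levels[t] for t in seq], dtype=float)  # 7 x 3: A, B, C

def main_effects(y):
    """Each factor switches level exactly twice, once up and once down;
    averaging the two signed successive differences estimates its main
    effect, and a linear drift cancels between the two switches."""
    d = np.diff(np.asarray(y, dtype=float))
    est = {}
    for j, f in enumerate('ABC'):
        change = np.diff(X[:, j])   # +2 at the up-switch, -2 at the down-switch
        sel = change != 0
        est[f] = (d[sel] * np.sign(change[sel])).sum() / 2
    return est

# Pure linear drift over the seven runs: all three estimates are zero.
drift_effects = main_effects(np.arange(7.0))
```

A response carrying a genuine A effect, by contrast, is recovered: with y = X @ (2, 0, 0) the estimate of A is 4, the difference between the high and low means.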

7. In a fractional factorial with a largish number of factors, there may be several designs of the same resolution. One means of choosing between them rests on the combinatorial concept of minimum aberration (Fries and Hunter, 1980). For example a fractional factorial of resolution three has no two factor interactions aliased with main effects, but they may be aliased with


Table 5.11 Exercise 5.6: Volumes of bread (ml/g) from Tuck, Lewis and Cottrell (1993).

Factor levels (factors are coded         Average specific volume
ingredient amounts)                      for the following days:

 A   B   C   D   E   F                  1      2      3      4
−1  −1  −1  −1  −1  −1                519    446    337    415
−1  −1  −1  −1   1   1                503    468    343    418
−1  −1   1   1  −1   1                567    471    355    424
−1  −1   1   1   1  −1                552    489    361    425
−1   1  −1   1  −1   1                534    466    356    431
−1   1  −1   1   1  −1                549    461    354    427
−1   1   1  −1  −1  −1                560    480    345    437
−1   1   1  −1   1   1                535    477    363    418
 1  −1  −1   1  −1  −1                558    483    376    418
 1  −1  −1   1   1   1                551    472    349    426
 1  −1   1  −1  −1   1                576    487    358    434
 1  −1   1  −1   1  −1                569    494    357    444
 1   1  −1  −1  −1   1                562    474    358    404
 1   1  −1  −1   1  −1                569    494    348    400
 1   1   1   1  −1  −1                568    478    367    463
 1   1   1   1   1   1                551    500    373    462

Table 5.12 Estimates of contrasts for data in Table 5.11.

  A      B      C      D      E      F     AB
13.66   3.72  14.72   7.03  −0.16  −2.41  −2.53

 AC     BC     AE     BE     CE     DE    ABE    ACE
 0.22  −2.84  −0.10   0.03   0.16  −0.66   3.16   2.53

each other. In this setting the design of minimum aberration equalizes as far as possible the number of two factor interactions in each alias set. See Dey and Mukerjee (1999, Chapter 8) and Cheng and Mukerjee (1998) for a more detailed discussion. Another method for choosing among fractional factorial designs is to minimize (or conceivably maximize) the number of level changes required during the execution of the experiment. See


Cheng, Martin and Tang (1998) for a mathematical discussion. Mesenbrink et al. (1994) present an interesting case study in which it was very expensive to change factor levels.


CHAPTER 6

Factorial designs: further topics

6.1 General remarks

In the previous chapter we discussed the key ideas involved in factorial experiments and in particular the notions of interaction and of the possibility of extracting useful information from fractions of the full factorial system. We begin the present more specialized chapter with a discussion of confounding, mathematically closely connected with fractional replication but conceptually quite different. We continue with various more specialized topics related to factorial designs, including factors at more than two levels, orthogonal arrays, split unit designs, and response surface methods.

6.2 Confounding in 2^k designs

6.2.1 Simple confounding

Factorial and fractional factorial experiments may require a large number of experimental units, and it may thus be advisable to use one or more of the methods described in Chapters 3 and 4 for controlling haphazard variation. For example, it may be feasible to try only eight treatment combinations in a given day, or four treatment combinations on a given batch of raw material. The treatment sets defined in Section 5.5.2 may then be used to arrange the 2^k experimental units in blocks of size 2^(k−p) in such a way that block differences can be eliminated without losing information about specified contrasts, usually main effects and low order interactions.

For example, in a 2^3 experiment to be run in two blocks, we can use the ABC effect to define the blocks, by simply putting into one block the treatment subgroup obtained by the contrast subgroup I, ABC, and into the second block its coset:

Block 1: (1) ab ac bc
Block 2: a b c abc


Note that the second block can be obtained by multiplication mod 2 of any one element of that block with those in the first block. The ABC effect is now confounded with blocks, i.e. it is not possible to estimate it separately from the block effect. The analysis of variance table has one degree of freedom for blocks, and six degrees of freedom for the remaining effects A, B, C, AB, AC, BC. The whole experiment could be replicated r times.

Experiments with larger numbers of factors can be divided into a larger number of blocks by identifying the subgroup and cosets of the treatment group associated with particular contrast subgroups. For example, in a 2^5 experiment the two contrasts ABC, CDE form the contrast subgroup I, ABC, CDE, ABDE, and this divides the treatment group into the following sets:

Block 1: (1) ab acd bcd ace bce de abde
Block 2: a b cd abcd ce abce ade bde
Block 3: c abc ad bd ae be cde abcde
Block 4: ac bc d abd e abe acde bcde

following the discussion of Section 5.5.2. The defining contrasts ABC, CDE, and their product mod 2, namely ABDE, are confounded with blocks. If there were prior information to indicate that particular interactions are of less interest than others, they would, of course, be chosen as the ones to be confounded.
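The division of the treatment group into a principal block and its cosets can be carried out mechanically. The following Python fragment (our own illustration; the function and variable names are not from the text) reproduces the four blocks of the 2^5 example by sorting treatments on the mod-2 parities of the defining contrasts:

```python
from itertools import product

def blocks(k, words):
    """Split the 2**k treatments into 2**len(words) blocks, confounding
    the contrasts named in `words` (e.g. ['ABC', 'CDE']) with blocks.
    A treatment (i, j, ...) is placed according to the mod-2 values of
    the defining contrasts; the block keyed (0, 0, ...) is the
    principal block."""
    letters = 'abcdefghij'[:k]
    out = {}
    for t in product((0, 1), repeat=k):
        key = tuple(sum(t[letters.index(c.lower())] for c in w) % 2
                    for w in words)
        name = ''.join(l for l, e in zip(letters, t) if e) or '(1)'
        out.setdefault(key, []).append(name)
    return out

b = blocks(5, ['ABC', 'CDE'])
# b[(0, 0)] is the principal block: (1), ab, acd, bcd, ace, bce, de, abde.
```

The remaining three keys give the cosets listed as Blocks 2 to 4 above.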

With larger experiments and larger blocks the general discussion of Section 5.5.2 applies directly. The block that receives treatment (1) is called the principal block. In the analysis of a 2^k experiment run in 2^p blocks, we have 2^p − 1 degrees of freedom for blocks, and 2^k − 2^p estimated effects that are not confounded with blocks. Each of the unconfounded effects is estimated in the usual way, as a difference between two equal sized sets of treatments, divided by r 2^(k−1) if there are r replicates, and estimated with variance σ^2/(r 2^(k−2)). A summary is given in a table of estimated effects and their standard errors, or for some purposes in an analysis of variance table. If, as would often be the case, there is just one replicate of the experiment (r = 1), the error can be estimated by pooling higher order unconfounded interactions, as discussed in Section 5.5.3.

If there are several replicates of the blocked experiment, and the same contrasts are confounded in all replicates, they are said to be


totally confounded. Using the formulae from Section 5.5.3, the unconfounded contrasts are estimated with standard error 2σ_m/√n, where n is the total number of units and σ_m^2 is the variance among responses in a single block of m units, where here m = 2^(k−p). If the experiment were not blocked, the corresponding standard error would be 2σ_n/√n, where σ_n^2 is the variance among all n units, and would often be appreciably larger than σ_m^2.

6.2.2 Partial confounding

If we can replicate the experiment, then it may be fruitful to confound different contrasts with blocks in different replicates, in which case we can recover an estimate of the confounded interactions, although with reduced precision. For example, if we have four replicates of a 2^3 design run in two blocks of size 4, we could confound ABC in the first replicate, AB in the second, AC in the third, and BC in the fourth, giving:

Replicate I:    Block 1: (1) ab ac bc
                Block 2: a b c abc

Replicate II:   Block 1: (1) ab c abc
                Block 2: a b bc ac

Replicate III:  Block 1: (1) b ac abc
                Block 2: a c bc ab

Replicate IV:   Block 1: (1) a bc abc
                Block 2: b c ab ac

Estimates of a contrast and its standard error are formed from the replicates in which that contrast is not confounded. In the above example we have three replicates from which to estimate each of the four interactions, and four replicates from which to estimate the main effects. Thus if σ_m^2 denotes the error variance corresponding to blocks of size m, all contrasts are estimated with higher precision after confounding provided σ_4^2/σ_8^2 < 3/4.

A further fairly direct development is to combine fractional replication with confounding. This is illustrated in Section 6.3 below.

6.2.3 Double confounding

In special cases it may be possible to construct orthogonal confounding patterns using different sets of contrasts, and then to


Table 6.1 An example of a doubly confounded design.

(1)    abcd   bce    ade    acf    bdf    abef   cdef
abd    c      acde   be     bcdf   af     def    abcef
abce   de     a      bcd    bef    acdef  cf     abdf
cde    abe    bd     ac     adef   bcef   abcdf  f
bcf    adf    ef     abcdef ab     cd     ace    bde
acdf   bf     abdef  cef    d      abc    bcde   ae
aef    bcdef  abcf   df     ce     abde   b      acd
bdef   acef   cdf    abf    abcde  e      ad     bc

associate the sets of treatments so defined with two (or more) different blocking factors, for example the rows and columns of a Latin square style design. The following example illustrates the main ideas.

In a 2^6 experiment suppose we choose to confound ACE, ADF, and BDE with blocks. This divides the treatments into blocks of size 8, and the interactions CDEF, ABCD, ABEF and BCF are also confounded with blocks. The alternative choice ABF, ADE, and BCD also determines blocks of size 8, with a distinct set of confounded interactions (BDEF, ACDF, ABCE and CEF). Thus we can use both sets of generators to set out the treatments in a 2^3 × 2^3 square. The design before randomization is shown in Table 6.1. The principal block for the first confounding pattern gives the first row, the principal block for the second confounding pattern gives the first column, and the remaining treatment combinations are determined by multiplication (mod 2) of these two sets of treatments, achieving coset structure both by rows and by columns.

The form of the analysis is summarized in Table 6.2, where the last three rows would usually be pooled to give an estimate of error with 22 degrees of freedom.

Fractional factorial designs may also be laid out in blocks, in which case the effects defining the blocks and all their aliases are confounded with blocks.


Table 6.2 Degrees of freedom for the doubly confounded Latin square design in Table 6.1.

Rows                        7
Columns                     7
Main effects                6
2-factor interactions      15
3-factor interactions      20 − 8 = 12
4-factor interactions      15 − 6 = 9
5-factor interactions       6
6-factor interaction        1

6.3 Other factorial systems

6.3.1 General remarks

It is often necessary to consider factorial designs with factors at more than two levels. Setting a factor at three levels allows, when the levels are quantitative, estimation of slope and curvature, and thus, in particular, a check of linearity of response. A factor with four levels can formally be regarded as the product of two factors at two levels each, and the design and analysis outlined in Chapter 5 can be adapted fairly directly.

For example, a 3^2 design has factors A and B at each of three levels, say 0, 1 and 2. The nine treatment combinations are (1), a, a^2, b, b^2, ab, a^2b, ab^2 and a^2b^2. The main effect for A has two degrees of freedom and is estimated from two contrasts, preferably but not necessarily orthogonal, between the total response at the three levels of A. If the factor is quantitative it is natural to use the linear and quadratic contrasts with coefficients (−1, 0, 1) and (1, −2, 1) respectively (cf. Section 3.5). The A × B interaction has four degrees of freedom, which might be decomposed into single degrees of freedom using the direct product of the same pair of contrast coefficients. The four components of interaction are denoted A_L B_L, A_L B_Q, A_Q B_L, A_Q B_Q, in an obvious notation. If the levels of the two factors were indexed by x1 and x2 respectively, then these four effects are coefficients of the products x1 x2, x1 x2^2, x1^2 x2, and (x1^2 − 1)(x2^2 − 1). The first effect is essentially the interaction component of the quadratic term in the response, to which the cubic and quartic effects are to be compared.
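The direct-product decomposition of the A × B interaction can be written down explicitly. A small sketch (Python; the array names are ours):

```python
import numpy as np

L = np.array([-1, 0, 1])    # linear contrast over levels 0, 1, 2
Q = np.array([1, -2, 1])    # quadratic contrast

# 3 x 3 tables of coefficients for the four single-degree-of-freedom
# interaction components; entry (i, j) multiplies the total response
# at level i of A and level j of B.
ALBL = np.outer(L, L)
ALBQ = np.outer(L, Q)
AQBL = np.outer(Q, L)
AQBQ = np.outer(Q, Q)
```

The four component tables are mutually orthogonal because L and Q are orthogonal; for instance the elementwise product of ALBQ and AQBL sums to (L·Q)^2 = 0.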

© 2000 by Chapman & Hall/CRC

Page 139: The Theory Of The Design Of Experiments [DOE]

Table 6.3 Two orthogonal Latin squares used to partition the A × B interaction.

Q R S        Q R S
R S Q        S Q R
S Q R        R S Q

A different partition of the interaction term is suggested by considering the two orthogonal 3 × 3 Latin squares shown in Table 6.3. If we associate the levels of A and B with respectively the rows and the columns of the squares, the letters essentially identify the treatment combinations a^i b^j. Each square gives two degrees of freedom for (Q, R, S), so that the two factor interaction has been partitioned into two components, written formally as AB and AB^2. These components have no direct statistical interpretation, but can be used to define a confounding scheme if it is necessary to carry out the experiment in three blocks of size three, or to define a 3^(2−1) fraction.

6.3.2 Factors at a prime number of levels

Consider experiments in which all factors occur at a prime number p of levels, where p = 3 is the most important case. The mathematical theory for p = 2 generalizes very neatly, although it is not too satisfactory statistically.

The treatment combinations a^i b^j . . . , where i and j run from 0 to p − 1, form a group G_p(a, b, . . .) with the convention a^p b^p . . . = 1; see Appendix B. If we form a table of totals of observations as indicated in Table 6.4, we define the main effect of A, denoted by the symbols A, . . . , A^(p−1), to be the p − 1 degrees of freedom involved in the contrasts among the p totals. This set of degrees of freedom is defined by contrasting the p sets of a^i b^j . . . for i = 0, 1, . . . , p − 1.

To develop the general case we assume familiarity with the Galois field of order p, GF(p), as sketched in Appendix B.3. In general let α, β, γ, . . . ∈ GF(p) and define

φ = αi+ βj + · · · . (6.1)

This sorts the treatments into sets defined by φ = 0, 1, . . . , p − 1. The sets can be shown to be equal in size. Hence φ determines a


Table 6.4 Estimation of the main effect of A.

                (Sum over b, c, . . .)
a^0          a^1          . . .          a^(p−1)
Y_0...       Y_1...       . . .          Y_(p−1)...

contrast with p − 1 degrees of freedom. Clearly cφ determines the same contrast. We denote it by A^α B^β · · · or equally A^(cα) B^(cβ) · · · , where c = 1, . . . , p − 1. By convention we arrange that the first non-zero coefficient is a one. For example, with p = 5, B^3C^2, BC^4, B^4C and B^2C^3 all represent the same contrast. The conventional form is BC^4.

We now suppose in (6.1) that α ≠ 0. Consider another contrast defined by φ′ = α′i + β′j + · · · , and suppose α′ ≠ 0. Among all treatments satisfying φ = c, and for fixed j, k, . . . , we have i = (c − βj − γk − · · ·)/α and then eliminating i from φ′ gives

φ′ = (α′/α) c + (β′ − (α′/α) β) j + · · ·        (6.2)

with not all the coefficients zero. As j, k, . . . run through all values, with φ fixed, so does φ′. Hence the contrasts defined by φ′ are orthogonal to those defined by φ.

We have the following special cases:

1. For the main effect of A, φ = i.

2. In the table of AB totals there are p^2 − 1 degrees of freedom. The main effects account for 2(p − 1). The remaining (p − 1)^2 form the interaction A × B. They are the contrasts AB, AB^2, . . . , AB^(p−1), each with p − 1 degrees of freedom.

3. Similarly ABC, ABC^2, . . . , AB^(p−1)C^(p−1) are (p − 1)^2 sets of (p − 1) degrees of freedom each, forming the A × B × C interaction with (p − 1)^3 degrees of freedom.

The limitation of this approach is that the subdivision of, say, the A × B interaction into separate sets of degrees of freedom usually has no statistical interpretation. For example, if the factor levels were determined by equal spacing of a quantitative factor, this subdivision would not correspond to a partition by orthogonal polynomials, which is more natural.


In the 3^2 experiment discussed above, the main effect A compares a^i b^j for i = 0, 1, 2 mod 3, the interaction AB compares a^i b^j for i + j = 0, 1, 2 mod 3, and the interaction AB^2 compares a^i b^j for i + 2j = 0, 1, 2 mod 3. We can set this out in two orthogonal 3 × 3 Latin squares as was done above in Table 6.3.
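The sets compared by AB and AB^2 are simple to enumerate. A sketch (Python; `component_sets` is our own name):

```python
from itertools import product

def component_sets(coefs, p=3):
    """Partition the p**r treatments (i, j, ...) into p equal sets by
    the value of alpha*i + beta*j + ... mod p.  With p = 3,
    coefs = (1, 1) gives the AB component and coefs = (1, 2) the AB^2
    component."""
    sets = {c: [] for c in range(p)}
    for t in product(range(p), repeat=len(coefs)):
        phi = sum(a * x for a, x in zip(coefs, t)) % p
        sets[phi].append(t)
    return sets

ab = component_sets((1, 1))   # i + j  = 0, 1, 2 mod 3
ab2 = component_sets((1, 2))  # i + 2j = 0, 1, 2 mod 3
```

Each of the three sets has three treatments, and the two partitions are orthogonal: any set from `ab` meets any set from `ab2` in exactly one treatment, which is the Latin square property displayed in Table 6.3.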

In a 3^3 experiment the two factor interactions such as B × C are split into pairs of degrees of freedom as above. Now consider the A × B × C interaction. This is split into:

ABC      : i + j + k   = 0, 1, 2 mod 3
ABC^2    : i + j + 2k  = 0, 1, 2 mod 3
AB^2C    : i + 2j + k  = 0, 1, 2 mod 3
AB^2C^2  : i + 2j + 2k = 0, 1, 2 mod 3

We may consider the ABC term, for example, as determined from a Latin cube, laid out as follows:

Q R S      R S Q      S Q R
R S Q      S Q R      Q R S
S Q R      Q R S      R S Q

where in the first layer Q corresponds to the treatment combinations with i + j + k = 0, R with i + j + k = 1, and S with i + j + k = 2. There are three further Latin cubes, orthogonal to the above cube, corresponding to the three remaining components of interaction ABC^2, AB^2C, and AB^2C^2. In general with r letters we have (p − 1)^(r−1) r-dimensional orthogonal p × p Latin hypercubes.

Each contrast divides the treatments into three equal sets and can therefore be a basis for confounding. Thus ABC divides the 3^3 experiment into three blocks of nine, and with four replicates we can confound in turn ABC, ABC^2, AB^2C, and AB^2C^2. Similarly taking I, ABC as defining an alias subgroup, the nine treatments Q above form a 1/3 replicate with

I = ABC = A^2B^2C^2,
A = A^2BC = B^2C^2 (= BC),
B = AB^2C = A^2C^2 (= AC),
C = ABC^2 = A^2B^2 (= AB),
AB^2 = A^2C (= AC^2) = BC^2, etc.


Factors at p^m levels can be regarded as the product of m factors at p levels or dealt with directly by GF(p^m); see Appendix B. The case of four levels is sketched in Exercise 6.4.

For example, the 3^5 experiment has a 1/3 replicate such that aliases of all main effects and two factor interactions are higher order interactions; for example we can take as the alias subgroup I, ABCDE, A^2B^2C^2D^2E^2. This can be confounded into three blocks of 27 units each using ABC^2 as the effect to be confounded with blocks, giving

ABC^2 = A^2B^2DE = CD^2E^2,
A^2B^2C = C^2DE = ABD^2E^2.

The contents of the first block must satisfy i + j + k + l + m = 0 mod 3 and i + j + 2k = 0 mod 3, which gives three generators. The treatments in this block are

(1)   de^2   d^2e   ab^2   a^2b   ab^2de^2   ab^2d^2e   a^2bde^2   a^2bd^2e
acd   acd^2e^2   ace   bcd   a^2b^2cd   a^2b^2cd^2e^2   a^2b^2ce   bcd^2e^2
bce   a^2c^2d^2   a^2c^2e^2   a^2c^2de   abc^2d^2   b^2c^2d^2   b^2c^2e^2   b^2c^2de
abc^2e^2   abc^2de

The second and third blocks are found by multiplying by treatments that satisfy i + j + k + l + m = 0 but not i + j + 2k = 0. Thus ad^2 and a^2d will achieve this. The analysis of variance is set out in Table 6.5.

Table 6.5 Degrees of freedom for the estimable effects in a confounded 1/3 replicate of a 3^5 design.

Source                                                     D.f.

Blocks                                                       2
Main effects                                                10
Two factor interactions
  (= three factor interactions)      (5 × 4)/(1 × 2) × 2 = 20
Two factor interactions
  (= four factor interactions)                              20
Three factor interactions
  (≠ two factor interactions)
      (5 × 4 × 3)/(1 × 2 × 3) × 3 × 2 × 1/2 − 2 =           28

Total                                                       80


The Latin square has entered the discussion at various points. The design was introduced in Chapter 4 as a design for a single set of treatments with the experimental units cross-classified by rows and by columns. No special assumptions were involved in its analysis. By contrast if we have three treatment factors all with the same number, k, of levels we can regard a k × k Latin square as a one-kth replicate of the k^3 system in which main effects can be estimated separately, assuming there to be no interactions between the treatments. Yet another role of a k × k Latin square is as a one-kth replicate of a k^2 system in k randomized blocks. These should be thought of as three quite distinct designs with a common combinatorial base.

6.3.3 Orthogonal arrays

We consider now the structure of factorial or fractional factorial designs from a slightly different point of view. We define for a factorial experiment an orthogonal array, which is simply a matrix with runs or experimental units indexing the rows, and factors indexing the columns. The elements of the array indicate the level of the factors for each run. For example, the orthogonal array associated with a single replicate of a 2^3 factorial may be written out as

−1 −1 −1
 1 −1 −1
−1  1 −1
 1  1 −1
−1 −1  1
 1 −1  1
−1  1  1
 1  1  1                    (6.3)

We could as well use the symbols (0, 1) as elements of the array, or (“high”, “low”), etc. The structure of the design is such that the columns are mutually orthogonal, and in any pair of columns each possible treatment combination occurs the same number of times. The columns of the array (6.3) are the three rows in the matrix of contrast coefficients (5.16) corresponding to the three main effects of factors A, B, and C.

The full array of contrast coefficients is obtained by pairwise


multiplication of columns of (6.3):

1 −1 −1  1 −1  1  1 −1
1  1 −1 −1 −1 −1  1  1
1 −1  1 −1 −1  1 −1  1
1  1  1  1 −1 −1 −1 −1
1 −1 −1  1  1 −1 −1  1
1  1 −1 −1  1  1 −1 −1
1 −1  1 −1  1 −1  1 −1
1  1  1  1  1  1  1  1

   A  B  C  D  E  F  G                    (6.4)

As indicated by the letters across the bottom, we can associate a main effect with each column except the first, in which case (6.4) defines a 2^(7−4) factorial with, for example, C = AB = EF, etc. Array (6.4) is an 8 × 8 Hadamard matrix; see Appendix B. Each row indexes one run or one experimental unit. For example, the first run has factors A, B, D, G at their low level and the others at their high level. The main effects of factors A up to G are independently estimable by the indicated contrasts in the eight observations: for example the main effect of E is estimated by (Y1 − Y2 + Y3 − Y4 − Y5 + Y6 − Y7 + Y8)/4. This design is called saturated for main effects; once the main effects have been estimated there are no degrees of freedom remaining to estimate interactions or error.

An orthogonal array of size n × (n − 1) with two symbols in each column specifies a design saturated for main effects. The designs with symbols ±1 are called Plackett-Burman designs, and Hadamard matrices defining them have been shown to exist for all multiples of four up to 424; see Appendix B.

More generally, an n × k array with m_i symbols in the i-th column is an orthogonal array of strength r if all possible combinations of symbols appear equally often in any r columns. The symbols correspond to levels of a factor. The array in (6.3) has 2 levels in each column, and has strength 2, as each of (−1, −1), (−1, +1), (+1, −1), (+1, +1) appears the same number of times in every set of two columns. An orthogonal array with all m_i equal is called symmetric. The strength of the array is a generalization of the notion of resolution of a fractional factorial, and determines the number of independent estimable effects.
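The defining property of strength r is directly checkable by enumeration. A sketch (Python; the helper name is ours) verifying the claim about array (6.3):

```python
from itertools import combinations

def has_strength(array, r):
    """True when, in every choice of r columns of the n x k array, each
    combination of the column symbols occurs equally often."""
    k = len(array[0])
    for cols in combinations(range(k), r):
        counts = {}
        for row in array:
            key = tuple(row[c] for c in cols)
            counts[key] = counts.get(key, 0) + 1
        n_combos = 1
        for c in cols:
            n_combos *= len({row[c] for row in array})
        if len(counts) != n_combos or len(set(counts.values())) != 1:
            return False
    return True

# Array (6.3): the full 2^3 replicate written as an 8 x 3 array.
full_2_3 = [(-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
            (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1)]
# Every pair of columns shows each of the four sign combinations
# exactly twice, so the array has strength 2.
```

Being a full replicate, this particular array in fact also has strength 3, each triple of signs occurring exactly once.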

Table 6.6 gives an asymmetric orthogonal array of strength 2 with m_1 = 3 and m_2 = · · · = m_5 = 2. Each level of each factor occurs


Table 6.6 An asymmetric orthogonal array.

−1 −1 −1 −1 −1
−1 −1  1 −1  1
−1  1 −1  1  1
−1  1  1  1 −1
 0 −1 −1  1  1
 0 −1  1  1 −1
 0  1 −1 −1  1
 0  1  1 −1 −1
 1 −1 −1  1 −1
 1 −1  1 −1  1
 1  1 −1 −1 −1
 1  1  1  1  1

 A  B  C  D  E

the same number of times with each level of the remaining factors. Thus, for example, linear and quadratic effects of A and B can be estimated, as well as the linear effects used in specifying the design.

There is a large literature on the existence and construction of orthogonal arrays; see the Bibliographic notes. Methods of construction include ones based on orthogonal Latin squares, on difference matrices, and on finite projective geometries. Orthogonal arrays of strength 2 are often associated with Taguchi methods, and are widely used in industrial experimentation; see Section 6.7.

6.3.4 Supersaturated systems

In an experiment with n experimental units and k two-level factors it may, if n = k + 1, be possible to find a design in which all main effects can be estimated separately, for example by a fractional factorial design with main effects aliased only with interactions. Indeed this is possible, for example when k = 2^m − 1, using the orthogonal arrays described in the previous subsection. Such designs are saturated with main effects.


Table 6.7 Supersaturated design for 16 factors in 12 trials.

+ + + + + + + + + + + − − − − −
+ − + + + − − − + − − − − − − −
− + + + − − − + − − + + + − + +
+ + + − − − + − − + − − + + + +
+ + − − − + − − + − + + + − + −
+ − − − + − − + − + + + + + − +
− − − + − − + − + + + + − + + +
− − + − − + − + + + − + − + − +
− + − − + − + + + − − + + + + −
+ − − + − + + + − − − − + + − −
− − + − + + + − − − + − − − + +
− + − + + + − − − + − − − − − −

Suppose now that n < k + 1, i.e. that there are fewer experimental units than parameters in a main effects model. A design for such situations is called supersaturated. For example we might want to study 16 factors in 12 units. Clearly all main effects cannot be separately estimated in such situations. If, however, to take an extreme case, it could plausibly be supposed that at most one factor has a nonzero effect, it will be possible with suitable design to isolate that factor. If we specify the design by an n × k matrix of 1’s and −1’s it is reasonable to make the columns as nearly mutually orthogonal as possible. Such designs may be found by computer search or by building on the theory of fractional replication.

These designs are not merely sensitive to the presence of interactions aliased with main effects but, more seriously still, if more than a rather small number of effects are present very misleading conclusions may be drawn.

Table 6.7 shows a design for 16 factors in 12 trials. It was formed by adding to a main effect design for 11 factors five additional columns obtained by computer search. First the maximum scalar product of two columns was minimized. Then, within all designs with the same minimum, the number of pairs of columns with that value was minimized.
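The near-orthogonality criterion used in that search is easy to compute. A sketch (Python with NumPy; the function name is ours, and the small 4-run matrix below is an illustrative example, not the design of Table 6.7):

```python
import numpy as np

def max_column_product(X):
    """Largest absolute scalar product between distinct columns of an
    n x k (+1/-1) design matrix: zero for mutually orthogonal columns,
    and the quantity minimized first in the computer search."""
    G = X.T @ X
    np.fill_diagonal(G, 0)
    return int(np.abs(G).max())

# A 4-run design whose three columns (A, B, AB) are orthogonal:
X = np.array([[-1, -1,  1],
              [ 1, -1, -1],
              [-1,  1, -1],
              [ 1,  1,  1]])
# max_column_product(X) == 0
```

For a supersaturated design the value cannot be zero for every pair, and the search trades off the size of the worst product against the number of pairs attaining it.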


While especially in preliminary industrial investigations it is entirely possible that the number of factors of potential interest is more than the number of experimental units available for an initial experiment, it is questionable whether the use of supersaturated designs is ever the most sensible approach. Two alternatives are abstinence, cutting down the number of factors in the initial study, and the use of judicious factor amalgamation. For the latter suppose that two factors A and B are such that their upper and lower levels can be defined in such a way that if either has an effect it is likely to be that the main effect is positive. We can then define a new two-level quasi-factor (AB) with levels (1), (ab) in the usual notation. If a positive effect is found for (AB) then it is established that at least one of A and B has an effect. In this way the main effects of factors of particular interest and which are not amalgamated are estimated free of main effect aliasing, whereas other main effects have a clear aliasing structure. Without the assumption about the direction of any effect there is the possibility of effect cancellation. Thus in examining 16 factors in 12 trials we would aim to amalgamate 10 factors in pairs and to investigate the remaining 6 factors singly in a design for 11 new factors in 12 trials.

6.4 Split plot designs

6.4.1 General remarks

Formally a split plot, or split unit, experiment is a factorial experiment in which a main effect is confounded with blocks. There is, however, a difference of emphasis from the previous discussion of confounding. Instead of regarding the confounded main effects as lost, we now suppose there is sufficient replication for them to be estimated, although with lower, and maybe much lower, precision. In this setting blocks are called whole units, and what were previously called units are now called subunits. The replicates of the design applied to the whole units and subunits typically correspond to our usual notion of blocks, such as days, operators, and so on.

As an example suppose in a factorial experiment with two factors A and B, where A has four levels and B has three, we assign the following treatments to each of four blocks:

(1)     b      b^2
a       ab     ab^2
a^2     a^2b   a^2b^2
a^3     a^3b   a^3b^2


in an obvious notation. The main effect of A is clearly confounded with blocks. Equivalently, we may assign the level of A at random to blocks or whole units, each of which consists of three subunits. The levels of B are assigned at random to the units in each block.

Now consider an experiment with, say, kr whole units arranged in r blocks of size k. Let each whole unit be divided into s equal subunits. Let there be two sets of treatments (the simplest case being when there are two factors) and suppose that:

1. whole-unit treatments, A1, . . . , Ak, say, are applied at random in randomized block form to the whole units;

2. subunit treatments, B1, . . . , Bs, are applied at random to the subunits, each subunit treatment occurring once in each whole unit.

An example of one block with k = 4 and s = 5 is:

A1   A2   A3   A4
B4   B2   B1   B5
B3   B1   B2   B4
B5   B5   B3   B3
B1   B3   B4   B2
B2   B4   B5   B1

All the units in the same column receive the same level of A. There will be a similar arrangement, independently randomized, in each of the r blocks.

We can first do an analysis of the whole unit treatments, represented schematically by:

Source                      D.f.

Blocks                      r − 1
Whole unit treatment A      k − 1
Error (a)                   (k − 1)(r − 1)

Between whole units         kr − 1

The error is determined by the variation between whole units within blocks and the analysis is that of a randomized block design. We can now analyse the subunit observations as:


Between whole units         kr − 1
Subunit treatment B         s − 1
A × B                       (s − 1)(k − 1)
Error (b)                   k(r − 1)(s − 1)

Total                       krs − 1

The error (b) measures the variation between subunits within whole units. Usually this error is appreciably smaller than the whole unit error (a).

There are two reasons for using split unit designs. One is practical convenience, particularly in industrial experiments on two (or more) stage processes, where the first stage represents the whole unit treatments carried out on large batches, which are then split into smaller sections for the second stage of processing. This is the situation in the example discussed in Section 6.4.2. The second is to obtain higher precision for estimating B and the interaction A × B at the cost of lower precision for estimating A. As an example of this, A might represent varieties of wheat, and B fertilisers: if the focus is on the fertilisers, two or more very different varieties may be included primarily to examine the A × B interaction, thereby, hopefully, obtaining some basis for extending the conclusions about B to other varieties.

There are many variants of the split unit idea, such as the use of split-split unit experiments, subunits arranged in Latin squares, and so on. When we have a number of factors at two levels each we can apply the theory of Chapter 5 to develop more complicated forms of split unit design.

6.4.2 Examples

We first consider two examples of factorial split-unit designs. For the first example, let there be four two-level factors, and let it be required to treat one, A, as a whole unit treatment, the main effects of B, C, and D being required among the subunit treatments. Suppose that each replicate is to consist of four whole units, each containing four subunits. Take as the confounding subgroup I, A, BCD, ABCD. Then the design is, before randomization,


(1) bc cd bd

a abc acd abd

ab ac abcd ad

b c bcd d
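The grouping above can be generated mechanically. The following sketch (an illustration, not from the text; the helper names are ours) partitions the 16 treatment combinations of the 2^4 factorial by the signs of the A and BCD contrasts, the nontrivial members of the confounding subgroup:

```python
from itertools import product

factors = "abcd"

def label(levels):
    # conventional 2^k labelling: letters of the factors at their high level
    return "".join(f for f, x in zip(factors, levels) if x == 1) or "(1)"

def sign(levels, word):
    # sign of the contrast named by `word` (e.g. "bcd") at a treatment combination
    s = 1
    for f in word:
        s *= levels[factors.index(f)]
    return s

whole_units = {}
for levels in product([-1, 1], repeat=4):
    key = (sign(levels, "a"), sign(levels, "bcd"))
    whole_units.setdefault(key, []).append(label(levels))
```

The four values of the key (±1, ±1) recover exactly the four whole units displayed above; randomization would then assign whole units and, within them, subunit treatments at random.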

As a second example, suppose we have five factors and that it is required to have a 1/2 replicate consisting of four whole units each of four subunits, with factor A having its main effect in the whole unit part. In the language of 2^k factorials we want a 1/2 replicate of a 2^5 in 2^2 blocks of 2^2 units each with A confounded. The alias subgroup is I, ABCDE with confounding subgroups

A = BCDE, BC = ADE, ABC = DE. (6.5)

This leaves two two factor interactions in the whole unit part and we choose them to be those of least potential interest. The design is

(1) bc de bcde

ab ac abde acde

cd bd ce be

ae abce ad abcd

The analysis of variance table has the form outlined in Table 6.8. A prior estimate of variance will be necessary for this design.

Table 6.8 Analysis of variance for the 5 factor example.

Source                       D.f.

Between whole plots  A        1
                     BC       1
                     DE       1
Main effects B, C, D, E       4
Two factor interactions       8 (= 10 − 2)

Total                        15

Our third example illustrates the analysis of a split unit experiment, and is adapted from Montgomery (1997, Section 12.4). The experiment investigated two factors, pulp preparation method and temperature, on the tensile strength of paper. Temperature was to be set at four levels, and there were three preparation methods. It was desired to run three replicates, but only 12 runs could be made per day. One replicate was run on each of the three days, and replicates (or days) is the blocking factor.

Table 6.9 Tensile strength of paper. From Montgomery (1997).

              Day 1        Day 2        Day 3
Prep. method  1   2   3    1   2   3    1   2   3

Temp 1       30  34  29   28  31  31   31  35  32
     2       35  41  26   32  36  30   37  40  34
     3       37  38  33   40  42  32   41  39  39
     4       36  42  36   41  40  40   40  44  45

On each day, three batches of pulp were prepared by the three different methods; thus the level of this factor determines the whole unit treatment. Each of the three batches was subdivided into four equal parts, and processed at a different temperature, which is thus the subunit treatment. The data are given in Table 6.9.

The analysis of variance table is given in Table 6.10. If F-tests are of interest, the appropriate test for the main effect of preparation method is 64.20/9.07, referred to an F2,4 distribution, whereas for the main effect of temperature and the temperature × preparation interaction the relevant denominator mean square is 3.97. Similarly, the standard error of the estimated preparation effect is larger than that for the temperature and temperature × preparation effects. Estimates and their standard errors are summarized in Table 6.11.
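The sums of squares of Table 6.10 can be reproduced directly from the data of Table 6.9 by the standard split-plot decomposition. The following sketch (numpy; an illustration, not part of the original text) carries out the computation:

```python
import numpy as np

# y[day, prep, temp]: tensile strengths from Table 6.9
y = np.array([
    [[30, 35, 37, 36], [34, 41, 38, 42], [29, 26, 33, 36]],  # day 1
    [[28, 32, 40, 41], [31, 36, 42, 40], [31, 30, 32, 40]],  # day 2
    [[31, 37, 41, 40], [35, 40, 39, 44], [32, 34, 39, 45]],  # day 3
], dtype=float)
r, a, b = y.shape            # replicates (days), prep methods, temperatures
cf = y.sum() ** 2 / y.size   # correction factor

ss_total = (y ** 2).sum() - cf
ss_blocks = (y.sum(axis=(1, 2)) ** 2).sum() / (a * b) - cf
ss_prep = (y.sum(axis=(0, 2)) ** 2).sum() / (r * b) - cf
# error (a): variation between whole units (day x prep batches) within days
ss_error_a = (y.sum(axis=2) ** 2).sum() / b - cf - ss_blocks - ss_prep
ss_temp = (y.sum(axis=(0, 1)) ** 2).sum() / (r * a) - cf
ss_inter = (y.sum(axis=0) ** 2).sum() / r - cf - ss_prep - ss_temp
ss_error_b = ss_total - (ss_blocks + ss_prep + ss_error_a + ss_temp + ss_inter)

# F ratios: prep is tested against error (a), temp and the interaction
# against error (b)
f_prep = (ss_prep / (a - 1)) / (ss_error_a / ((a - 1) * (r - 1)))
f_temp = (ss_temp / (b - 1)) / (ss_error_b / (a * (r - 1) * (b - 1)))
```

Here `f_prep` is the 64.20/9.07 ratio mentioned in the text, illustrating that the whole unit treatment is judged against the whole unit error.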

Table 6.10 Analysis of variance table for split unit example.

Source                     Sum of sq.   D.f.   Mean sq.

Blocks                        77.55       2     38.78
Prep. method                 128.39       2     64.20
Blk × Prep. (error (a))       36.28       4      9.07
Temp                         434.08       3    144.69
Prep × Temp                   75.17       6     12.53
Error (b)                     71.49      18      3.97

Table 6.11 Means and estimated standard errors for split unit experiment.

               Prep
          1      2      3    Mean

Temp 1  29.67  33.33  30.67  31.22
     2  34.67  39.00  30.00  34.56
     3  39.33  39.67  34.67  37.89
     4  39.00  42.00  40.33  40.44

Mean    35.67  38.50  33.92  36.03

Standard error for difference of temperature means 0.94; of preparation means 1.23.

6.5 Nonspecific factors

We have already considered the incorporation of block effects into the analysis of a factorial experiment set out in randomized blocks. This follows the arguments based on randomization theory and developed in Chapters 3 and 4. Formally a simple randomized block experiment with a single set of treatments can be regarded as one replicate of a factorial experiment with one treatment factor and one factor, namely blocks, referring to the experimental units. We call such a factor nonspecific because it will in general not be determined by a single aspect, such as sex, of the experimental units. In view of the assumption of unit-treatment additivity we may use the formal interaction, previously called residual, as a base for estimating the effective error variance. From another point of view we are imposing a linear model with an assumed zero interaction between treatments and blocks and using the associated residual mean square to estimate variance. In the absence of an external estimate of variance there is little effective alternative, unless some especially meaningful components of interaction can be identified and removed from the error estimate. But so long as the initial assumption of unit-treatment additivity is reasonable we need no special further assumption.


Now suppose that an experiment, possibly a factorial experiment, is repeated in a number of centres, for example a number of laboratories or farms or over a number of time points some appreciable way apart. The assumption of unit-treatment additivity across a wide range of conditions is now less appealing and considerable care in interpretation is needed.

Illustrations. Some agricultural field trials are intended as a basis for practical recommendations to a broad target population. There is then a strong case for replication over a number of farms and over time. The latter gives a spread of meteorological conditions and the former aims to cover soil types, farm management practices and so on. Clinical trials, especially of relatively rare conditions, often need replication across centres, possibly in different countries, both to achieve some broad representation of conditions, but also in order to accrue the number of patients needed to achieve reasonable precision.

To see the issues involved in fairly simple form suppose that we start with an experiment with just one factor A with r replicates of each treatment, i.e. in fact a simple nonfactorial experiment. Now suppose that this design is repeated independently at k centres; these may be different places, laboratories or times, for example. Formally this is now a two factor experiment with replication. We assume the effects of factors A and B on the expected response are of the form

τ_ij = τ^A_i + τ^B_j + τ^AB_ij,   (6.6)

using the notation of Section 5.3, and we compute the analysis of variance table by the obvious extension to the decomposition of the observations Y_ijs for the randomized block design used in Section 3.4:

Y_ijs = Y_... + (Y_i.. − Y_...) + (Y_.j. − Y_...) + (Y_ij. − Y_i.. − Y_.j. + Y_...) + (Y_ijs − Y_ij.).   (6.7)
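As a numerical illustration (not in the original text), the decomposition (6.7) can be checked directly: the terms on the right recompose Y_ijs exactly, and because the components are mutually orthogonal their sums of squares add to the total sum of squares about the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
v, k, r = 4, 3, 5                        # levels of A, centres, replicates
y = rng.normal(size=(v, k, r))           # any array will do: the identity is algebraic

m = y.mean()                             # Y...
mi = y.mean(axis=(1, 2), keepdims=True)  # Yi..
mj = y.mean(axis=(0, 2), keepdims=True)  # Y.j.
mij = y.mean(axis=2, keepdims=True)      # Yij.

# the four non-constant terms of (6.7): A, B, interaction, within-cell
terms = [mi - m, mj - m, mij - mi - mj + m, y - mij]
recomposed = m + sum(terms)
ss = [float((np.broadcast_to(t, y.shape) ** 2).sum()) for t in terms]
```

The four entries of `ss` are exactly the sums of squares of the analysis of variance table that follows.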

We can compute the expected mean squares from first principles under the summation restrictions Στ^A_i = 0, Στ^B_j = 0, Σ_i τ^AB_ij = 0, and Σ_j τ^AB_ij = 0. Then, for example, E(MS_AB) is equal to

E{r Σ_ij (Y_ij. − Y_i.. − Y_.j. + Y_...)^2}/{(v − 1)(k − 1)}
  = r E Σ_ij {τ^AB_ij + (ε_ij. − ε_i.. − ε_.j. + ε_...)}^2 / {(v − 1)(k − 1)}
  = {r Σ_ij (τ^AB_ij)^2 + r Σ_ij E(ε_ij. − ε_i.. − ε_.j. + ε_...)^2} / {(v − 1)(k − 1)}.


The last expectation is that of a quadratic form in the ε_ij. of rank (v − 1)(k − 1) and hence equal to σ^2(v − 1)(k − 1)/r.

The analysis of variance table associated with this system has the form outlined in Table 6.12. From this we see that the design permits testing of A × B against the residual within centres. If unit-treatment additivity held across the entire investigation the interaction mean square and the residual mean square would both be estimates of error and would be of similar size; indeed if such unit-treatment additivity were specified the two terms would be pooled. In many contexts, however, it would be expected a priori and found empirically that the interaction mean square is greater than the mean square within centres, establishing that the treatment effects are not identical in the different centres.

If such an interaction is found, it should be given a rational interpretation if possible, either qualitatively or, for example, by finding an explicit property of the centres whose introduction into a formal model would account for the variation in treatment effect. In the absence of such an explanation there is little quantitative alternative to regarding the interaction as a haphazard effect represented by a random variable in an assumed linear model. Note that we would not do this if centres represented a specific property of the experimental material, and certainly not if centres had been a treatment factor.

Table 6.12 Analysis of variance for a replicated two factor experiment.

Source          D.f.              Expected mean squares

A, trtms        v − 1             σ^2 + rk Σ(τ^A_i)^2/(v − 1)
B, centres      k − 1             σ^2 + rv Σ(τ^B_j)^2/(k − 1)
A × B           (v − 1)(k − 1)    σ^2 + r Σ(τ^AB_ij)^2/{(v − 1)(k − 1)}
Within centres  vk(r − 1)         σ^2

A modification to the usual main effect and interaction model is now essential. We write instead of (6.6)

τ_ij = τ^A_πi + τ^B_j + η^AB_ij,   (6.8)

where the η^AB_ij are assumed to be random variables with zero mean, uncorrelated and with constant variance σ^2_AB, representing the haphazard variation in treatment effect from centre to centre. Note the crucial point that it would hardly ever make sense to force these haphazard effects to sum to zero over the particular centres used. There are, moreover, strong homogeneity assumptions embedded in this specification: in addition to assuming constant variance we are also excluding the possibility that there may be some contrasts that are null across all centres, and at the same time some large treatment effects that are quite different in different centres. If that were the case, the null effects would in fact be estimated with much higher precision than the non-null treatment effects and the treatment times centres interaction effect would need to be subdivided.

In (6.8) τ^A_π2 − τ^A_π1 specifies the contrast of two levels averaged out not only over the differences between the experimental units employed but also over the distribution of the η^AB_ij, i.e. over a hypothetical ensemble π of repetitions of the centres.

A commonly employed, but in some contexts rather unfortunate, terminology is to call centres a random factor and to add the usually irrelevant assumption that the τ^B_j also are random variables. The objection to that terminology is that farms, laboratories, hospitals, etc. are rarely a random sample in any meaningful sense and, more particularly, if this factor represents time it is not often meaningful to regard time variation as totally random and free of trends, serial correlations, etc. On the other hand the approximation that the way treatment effects vary across centres is represented by uncorrelated random variables is weaker and more plausible.

The table of expected mean squares for model (6.8) is given in Table 6.13. The central result is that when interest focuses on treatment effects averaged over the additional random variation the appropriate error term is the mean square for interaction of treatments with centres. The arguments against study of the treatment main effect averaged over the particular centres in the study have already been rehearsed; if that was required we would, however, revert to the original specification and use the typically smaller mean square within centres to estimate the error variance associated with the estimation of the parameters τ^A_πi.

Table 6.13 Analysis of variance for a two factor experiment with a random effect.

Source     D.f.              Expected mean squares

A          v − 1             σ^2 + r σ^2_AB + rk Σ(τ^A_πi)^2/(v − 1)
B          k − 1             σ^2 + rv Σ(τ^B_j)^2/(k − 1)
A × B      (v − 1)(k − 1)    σ^2 + r σ^2_AB
Residual   vk(r − 1)         σ^2

6.6 Designs for quantitative factors

6.6.1 General remarks

When there is a single factor whose levels are defined by a quantitative variable, x, there is always the possibility of using a transformation of x to simplify interpretation, for example by achieving effective linearity of the dependence of the response on x or on powers of x. If a special type of nonlinear response is indicated, for example by theoretical considerations, then fitting by maximum likelihood, often equivalent to nonlinear least squares, will be needed and the methods of nonlinear design sketched in Section 7.6 may be used. An alternative is first to fit a polynomial response and then to use the methods of approximation theory to convert that into the desired form. In all cases, however, good choice of the centre of the design and the spacing of the levels is important for a successful experiment.

When there are two or more factors with quantitative levels it may be very fruitful not merely to transform the component variables, but to define a linear transformation to new coordinates in the space of the factor variables. If, for instance, the response surface is approximately elliptical, new coordinates close to the principal axes of the ellipse will usually be helpful: a long thin ridge at an angle to the original coordinate axes would be poorly explored by a simple design without such a transformation of the x's. Of course to achieve a suitable transformation previous experimentation or theoretical analysis is needed. We shall suppose throughout the following discussion that any such transformation has already been used.

In many applications of factorial experiments the levels of the factors are defined by quantitative variables. In the discussion of Chapter 5 this information was not explicitly used, although the possibility was mentioned in Section 5.3.3.

We now suppose that all the factors of interest are quantitative, although it is straightforward to accommodate qualitative factors as well. In many cases, in the absence of a subject-matter basis for a specific nonlinear model, it would be reasonable to expect the response y to vary smoothly with the variables defining the factors; for example with two such factors we might assume

E(Y) = η(x1, x2) = β00 + β10 x1 + β01 x2 + ½(β20 x1^2 + 2β11 x1 x2 + β02 x2^2)   (6.9)

with block and other effects added as appropriate. One interpretation of (6.9) is as two terms of a Taylor series expansion of η(x1, x2) about some convenient origin.

In general, with k factors, the quadratic model for a response is

E(Y) = η(x1, . . . , xk) = β00... + β10... x1 + . . . + β0...1 xk + ½(β20... x1^2 + 2β11... x1 x2 + . . . + β0...2 xk^2).   (6.10)

A 2^k design has each treatment factor set at two levels, xi = ±1, say. In Section 5.5 we used the values 0 and 1, but it is more convenient in the present discussion if the treatment levels are centred on zero. This design does not permit estimation of all the parameters in (6.10), as xi^2 ≡ 1, so the coefficients of pure quadratic terms are confounded with the general mean. Indeed from observations at two levels it can hardly be possible to assess nonlinearity! However, the parameters β10..., β01... and so on are readily identified with what in Section 5.5 were called main effects, i.e.

2β10... = average response at high level of factor 1 − average response at low level of factor 1,

for example. Further, the cross-product parameters are identified with the interaction effects, β11..., for example, measuring the rate of change with x2 of the linear regression of y on x1.

In a fractional replicate of the full 2^k design, we can estimate linear terms β10..., β01... and so on, as long as main effects are not aliased with each other. Similarly we can estimate cross-product parameters β11..., etc., if two factor interactions are not aliased with any main effects.

[Figure 6.1 (a) Design space for three factor experiment. Full 2^3 indicated by vertices of cube. Closed and open circles, points of one-half replicates with alias I = ABC. (b) Axial points added to form central composite design.]

To estimate the pure quadratic terms in the response, it is necessary to add design points at more levels of xi. One possibility is to add the centre point (0, . . . , 0); this permits estimation of the sum of all the pure quadratic terms and may be useful when the goal is to determine the point of maximum or minimum response or to check whether a linear approximation is adequate against strongly convex or strongly concave alternatives.

Figure 6.1a displays the design space for the case of three factors; the points on the vertices of the cube are those used in a full 2^3 factorial. Two half fractions of the factorial are indicated by the use of closed or open circles. Either of these half fractions permits estimation of the main effects, β100, β010 and β001. Addition of one or more points at (0, 0, 0) permits estimation of β200 + β020 + β002; replicate centre points can provide an internal estimate of error, which should be compared to any error estimates available from external sources.

In order to estimate the pure quadratic terms separately, we must include points for at least three levels of xi. One possibility is to use a complete or fractional 3^k factorial design. An alternative design quite widely used in industrial applications is the central composite design, in which a 2^k design or fraction thereof is augmented by one or more central points and by design points along the coordinate axes at (α, 0, . . . , 0), (−α, 0, . . . , 0) and so on. These axial points are added to the 2^3 design in Fig. 6.1b. One approach to choosing the coded value for α is to require that the estimated variance of the predicted response depends only on the distance from the centre point of the design space. Such designs are called rotatable. The criterion is, however, dependent on the scaling of the levels of the different factors; see Exercise 6.8.
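The construction can be sketched as follows (an illustration, not from the text; the helper name is ours). For k = 3 the design consists of the 2^3 factorial points, 2k axial points at ±α, and centre replicates; a standard choice for rotatability with a full factorial portion is α = (2^k)^{1/4}.

```python
from itertools import product

def central_composite(k, alpha, n_centre=3):
    corners = [list(p) for p in product([-1.0, 1.0], repeat=k)]  # 2^k factorial points
    axial = []
    for i in range(k):
        for s in (-alpha, alpha):                                # axial pair on axis i
            pt = [0.0] * k
            pt[i] = s
            axial.append(pt)
    centre = [[0.0] * k for _ in range(n_centre)]                # centre replicates
    return corners + axial + centre

design = central_composite(3, alpha=8 ** 0.25)  # alpha = (2^3)^(1/4)
```

For k = 3 this gives 8 + 6 + 3 = 17 runs, the configuration shown in Fig. 6.1b with added centre points.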

6.6.2 Search for optima

Response surface designs are used, as their name implies, to investigate the shape of the dependence of the response on quantitative factors, and sometimes to determine the estimated position of maximum or minimum response, or more realistically a region in which close to optimal response is achieved. As at (6.10), this shape is often approximated by a quadratic, and once the coefficients are estimated the point of stationarity is readily identified. However if the response surface appears to be essentially linear in the range of x considered, and indeed whenever the formal stationary point lies well outside the region of investigation, further work will be needed to identify a stationary point at all satisfactorily. Extrapolation is not reliable as it is very sensitive to the quadratic or other model used.

In typical applications a sequence of experiments is used, first to identify important factors and then to find the region of maximum response. The method of steepest ascents can be used to suggest regions of the design space to be next explored, although scale dependence of the procedure is a major limitation. Typically the first experiment will not cover the region of optimality and a linear model will provide an adequate fit. The steepest ascent direction can be estimated from this linear model as the vector orthogonal to the fitted plane, although as noted above this depends on the relative units in which the x's are measured and this will usually be rather arbitrary.
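The calculation can be sketched as follows (hypothetical data, not from the book): fit a first order model by least squares to a 2^2 design with centre runs, and take the vector of fitted slopes, suitably scaled, as the direction in coded units along which to place the next experiment.

```python
import numpy as np

# 2^2 factorial plus two centre runs, in coded units; hypothetical responses
x1 = np.array([-1.0, 1.0, -1.0, 1.0, 0.0, 0.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0, 0.0, 0.0])
y = np.array([54.3, 60.1, 57.2, 63.4, 58.8, 59.0])

# first order model E(Y) = b0 + b1 x1 + b2 x2 fitted by least squares
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

grad = beta[1:]                   # fitted slopes (b1, b2)
step = grad / np.abs(grad).max()  # steepest ascent path, scaled so the
                                  # largest coded step is one unit
```

The scaling of `step` is arbitrary, which is precisely the scale dependence noted in the text: rescaling a factor's coded units changes the direction of "steepest" ascent.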


6.6.3 Quality and quantity interaction

In most contexts the simple additive model provides a natural basis for the assessment of interaction. In special circumstances, however, there may be other possibilities, especially if one of the factors has quantitative levels. Suppose, for instance, that in a two factor experiment a level, i, of the first factor is labelled by a quantitative variable xi, corresponding to the dose or quantity of some treatment, measured on the same scale for all levels j of the second factor which is regarded as qualitative.

One possible simple structure would arise if the difference in effect between two levels of j is proportional to the known level xi, so that if Yij is the response in combination (i, j), then

E(Yij) = αj + βj xi,   (6.11)

with the usual assumption about errors; that is, we have separate linear regressions on xi for each level of the qualitative factor.

A special case, sometimes referred to as the interaction of quality and quantity, arises when at xi = 0 we have that all factorial combinations are equivalent. Then αj = α and the model becomes

E(Yij) = α + βj xi.   (6.12)

Illustration. The application of a particular active agent, for example nitrogenous fertiliser, may be possible in various forms: the amount of fertiliser is the quantitative factor, and the variant of application the qualitative factor. If the amount is zero then the treatment is no additional fertiliser whatever the variant, so that all factorial combinations with xi = 0 are identical.

In such situations it might be questioned whether the full factorial design, leading to multiple applications of the same treatment, is appropriate, although it is natural if a main effect of dose averaged over variants is required. With three levels of xi, say 0, 1 and 2, and k levels of the second factor arranged in r blocks with 3k units per block the analysis of variance table will have the form outlined in Table 6.14.

Here there are two error lines, the usual one for a randomized block experiment and an additional one, shown last, from the variation within blocks between units receiving the identical zero treatment.

To interpret the treatment effect it would often be helpful to fit by least squares some or all of the following models:


E(Yij) = α,
E(Yij) = α + β xi,
E(Yij) = α + βj xi,
E(Yij) = α + β xi + γ xi^2,
E(Yij) = α + βj xi + γ xi^2,
E(Yij) = α + βj xi + γj xi^2.

The last is a saturated model accounting for the full sum of squares for treatments. The others have fairly clear interpretations. Note that the conventional main effects model is not included in this list.

6.6.4 Mixture experiments

A special kind of experiment with quantitative levels arises when the factor levels xj, j = 1, . . . , k, represent the proportions of k components in a mixture. For all points in the design space

Σxj = 1, (6.13)

so that the design region is all or part of the unit simplex. A number of different situations can arise and we outline here only a few key ideas, concentrating for ease of exposition on small values of k.

First, one or more components may represent amounts of trace elements. For example, with k = 3, only very small values of x1 may be of interest. Then (6.13) implies that x2 + x3 is effectively constant and in this particular case we could take x1 and the proportion x2/(x2 + x3) as independent coordinates specifying treatments. More generally the dimension of the design space affected by the constraint (6.13) is k − 1 minus the number of trace elements.

Table 6.14 Analysis of variance for a blocked design with treatment effects as in (6.11).

Source                 D.f.

Treatments             2k
Blocks                 r − 1
Treatments × Blocks    2k(r − 1)
Error within Blocks    r(k − 1)

Next it will often happen that only treatments with all components present are of interest and indeed there may be quite strong restrictions on the combinations of components that are of concern. This means that the effective design space may be quite complicated; the algorithms of optimal design theory sketched in Section 7.4 may then be very valuable, especially in finding an initial design for more detailed study.

It is usually convenient to use simplex coordinates. In the case k = 3 these are triangular coordinates: the possible mixtures are represented by points in an equilateral triangle with the vertices corresponding to the pure mixtures (1, 0, 0), (0, 1, 0) and (0, 0, 1). For a general point (x1, x2, x3), the coordinate x1, say, is the area of the triangle formed by the point and the complementary vertices (0, 1, 0) and (0, 0, 1). The following discussion applies when the design space is the full triangle or, with minor modification, if it is a triangle contained within the full space.

At a relatively descriptive level there are two basic designs that in a sense are analogues of standard factorial designs. In the simplex centroid design, there are 2^k − 1 distinct points, the k pure components such as (1, 0, . . . , 0), the k(k − 1)/2 simple mixtures such as (1/2, 1/2, 0, . . . , 0) and so on up to the complete mixture (1/k, . . . , 1/k). Note that all components present are present in equal proportions. This may be contrasted with the simplex lattice designs of order (k, d) which are intended to support the fitting of a polynomial of degree d. Here the possible values of each xj are 0, 1/d, 2/d, . . . , 1 and the design consists of all combinations of these values that satisfy the constraint Σxj = 1.
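For concreteness (an illustration, not from the text; the helper name is ours), the simplex lattice of order (k, d) can be enumerated directly:

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(k, d):
    """All k-component mixtures with proportions in {0, 1/d, ..., 1} summing to 1."""
    return [tuple(Fraction(c, d) for c in combo)
            for combo in product(range(d + 1), repeat=k)
            if sum(combo) == d]

design = simplex_lattice(3, 2)
```

The (3, 2) lattice has six points: the three pure components and the three binary 1/2-1/2 mixtures; in general the (k, d) lattice has C(d + k − 1, k − 1) points.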

As already noted, if the object is to study the behaviour of mixtures when one or more of the components are at very low proportions, or if singular behaviour is expected as one component becomes absent, these designs are not directly suitable, although they may be useful as the basis of a design for the other components of the mixture. Fitting of a polynomial response surface is unlikely to be adequate.

If polynomial fitting is likely to be sensible, there are two broad approaches to model parameterization, affecting analysis rather than design. In the first there is no attempt to give individual parameters specific interpretation, the polynomial being regarded as essentially a smoothing device for describing the whole surface.


The defining constraint Σxj = 1 can be used in various slightly different ways to define a unique parameterization of the model. One is to produce homogeneous forms. For example to produce a homogeneous expression of degree two we start with an ordinary second degree representation, multiply the constant by (Σxj)^2 and the linear terms by Σxj, leading to the general form

Σ_{i≤j} δij xi xj,   (6.14)

with k(k + 1)/2 independent parameters to be fitted by least squares in the usual way. Interpretation of single parameters on their own is not possible.

Other parameterizations are possible which do allow interpretation in terms of responses to pure mixtures, for example vertices of the simplex, simple binary mixtures, and so on.

A further possibility, which is essentially just a reparameterization of the first, is to aim for interpretable parameters in terms of contrasts, and for this additional information must be inserted. One possibility is to consider a reference or standard mixture (s1, . . . , sk). The general idea is that to isolate the effect of, say, the first component we imagine x1 increased to x1 + ∆. The other components must change and we suppose that they do so in accordance with the standard mixture, i.e. for j ≠ 1, the change in xj is to xj − ∆sj/(1 − s1). Thus if we start from the usual linear model β0 + Σβj xj imposition of the constraint

Σβjsj = 0

will lead to a form in which a change ∆ in x1 changes the expected response by β1∆/(1 − s1). This leads finally to writing the linear response model in the form

β0 + Σ βj xj/(1 − sj)   (6.15)

with the constraint noted above. A similar argument applies to higher degree polynomials. The general issue is that of defining component-wise directional derivatives on a surface for which the simplex coordinate system is mathematically the most natural, but for reasons of physical interpretation not appropriate.


6.7 Taguchi methods

6.7.1 General remarks

Many of the ideas discussed in this book were first formulated in connection with agricultural field trials and were then applied in other areas of what may be broadly called biometry. Industrial applications soon followed and by the late 1930's factorial experiments, randomized blocks and Latin squares were quite widely used, in particular in the textile industries where control of product variability is of central importance. A further major development came in the 1950's in particular by the work of Box and associates on design with quantitative factors and with the search for optimum operating conditions in the process industries. Although first developed partly in a biometric context, fractional replication was first widely used in this industrial setting. The next major development came in the late 1970's with the introduction via Japan of what have been called Taguchi methods. Indeed in some discussions the term Taguchi design is misleadingly used as being virtually synonymous with industrial factorial experimentation.

There are several somewhat separate aspects to the so-called Taguchi method, which can broadly be divided into philosophical, design, and analysis. The philosophical aspects relate to the creation of working conditions conducive to the continuous emphasis on ensuring quality in production, and are related to the similarly motivated but more broad ranging ideas of Deming and to the notion of evolutionary operation.

We discuss here briefly the novel design aspects of Taguchi's contributions. One is the emphasis on the study and control of product variability, especially in contexts where achievement of a target mean value of some feature is relatively easy and where high quality hinges on low variability. Factors which cannot be controlled in a production environment but which can be controlled in a research setting are deliberately varied as so-called noise factors, often in split-unit designs. Another is the systematic use of orthogonal arrays to investigate main effects and sometimes two factor interactions.

The designs most closely associated with the Taguchi method are orthogonal arrays as described in Section 6.3, often Plackett-Burman two and three level arrays. There tends to be an emphasis in Taguchi's writing on designs for the estimation only of main effects; it is argued that in each experiment the factor levels can or should be chosen to eliminate or minimize the size of interactions among the controllable factors.

We shall not discuss some special methods of analysis introduced by Taguchi which are less widely accepted. Where product variability is of concern the analysis of log sample variances will often be effective.

The popularization of the use of fractional factorials and related designs and the emphasis on designing for reduction in variability and explicit accommodation of uncontrollable variability, although all having a long history, have given Taguchi's approach considerable appeal.

6.7.2 Example

This example is a case study from the electronics industry, as described by Logothetis (1990). The purpose of the experiment was to investigate the effect of six factors on the etch rate (in Å/min) of the aluminium-silicon layer placed on the surface of an integrated circuit. The six factors, labelled here A to F, control various conditions of manufacture, and three levels of each factor were chosen for the experiment. A seventh factor of interest, the over-etch time, was controllable under experimental conditions but not under manufacturing conditions. In this experiment it was set at two levels. Finally, the etch rate was measured at five fixed locations on each experimental unit, called a wafer: four corners and a centre point.

The design used for the six controllable factors is given in Table 6.15: it is an orthogonal array which in compilations of orthogonal array designs is denoted by L18(3^6) to indicate eighteen runs, and six factors with three levels each.
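The strength-2 (orthogonality) property of this array can be checked directly; a short sketch (our code, with the rows of Table 6.15 transcribed):

```python
import itertools
import numpy as np

# The 18-run, six-factor, three-level orthogonal array of Table 6.15.
L18 = np.array([
    [-1, -1, -1, -1, -1, -1], [-1, 0, 0, 0, 0, 0], [-1, 1, 1, 1, 1, 1],
    [0, -1, -1, 0, 0, 1], [0, 0, 0, 1, 1, -1], [0, 1, 1, -1, -1, 0],
    [1, -1, 0, -1, 1, 0], [1, 0, 1, 0, -1, 1], [1, 1, -1, 1, 0, -1],
    [-1, -1, 1, 1, 0, 0], [-1, 0, -1, -1, 1, 1], [-1, 1, 0, 0, -1, -1],
    [0, -1, 0, 1, -1, 1], [0, 0, 1, -1, 0, -1], [0, 1, -1, 0, 1, 0],
    [1, -1, 1, 0, 1, -1], [1, 0, -1, 1, -1, 0], [1, 1, 0, -1, 0, 1],
])

# Strength 2: every pair of columns contains each of the nine level
# combinations equally often (here 18/9 = 2 times).
def is_strength_two(arr):
    for i, j in itertools.combinations(range(arr.shape[1]), 2):
        pairs = list(zip(arr[:, i], arr[:, j]))
        if any(pairs.count(p) != 2
               for p in itertools.product((-1, 0, 1), repeat=2)):
            return False
    return True

print(is_strength_two(L18))  # True
```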

Table 6.16 shows the mean etch rate across the five locations on each wafer. The individual observations are given by Logothetis (1990). The two mean values for each factor combination correspond to the two levels of the “uncontrollable” factor, the over-etch time. This factor has been combined with the orthogonal array in a split-unit design. The factor settings A up to F are assigned to whole units, and the two wafers assigned to different values of OE are the sub-units.

The design permits estimation of the linear and quadratic main effects of the six factors, and five further effects. All these effects are of course highly aliased with interactions. These five further


Table 6.15 Design for the electronics example.

 A   B   C   D   E   F
−1  −1  −1  −1  −1  −1
−1   0   0   0   0   0
−1   1   1   1   1   1
 0  −1  −1   0   0   1
 0   0   0   1   1  −1
 0   1   1  −1  −1   0
 1  −1   0  −1   1   0
 1   0   1   0  −1   1
 1   1  −1   1   0  −1
−1  −1   1   1   0   0
−1   0  −1  −1   1   1
−1   1   0   0  −1  −1
 0  −1   0   1  −1   1
 0   0   1  −1   0  −1
 0   1  −1   0   1   0
 1  −1   1   0   1  −1
 1   0  −1   1  −1   0
 1   1   0  −1   0   1

effects are pooled to form an estimate of error for the main effects, and the analysis of variance table is as indicated in Table 6.17.

From this we see that the main effects of factors A, E and F are important, and partitioning of the main effects into linear and quadratic components shows that the linear effects of these factors predominate. This partitioning also indicates a suggestive quadratic effect of B. The AE linear by linear interaction is aliased with the linear effect of F and the quadratic effect of B, so the interpretation of the results is not completely straightforward. The simplest explanation is that the linear effects of A, E and AE are the most important influences on the etch rate.

The analysis of the subunits shows that the over-etch time does have a significant effect on the response, and there are suggestive interactions of this with A, B, D, and E. These interaction effects are much smaller than the main effects of the controllable factors. Note from Table 6.17 that the subunit variation between wafers is much smaller than the whole unit variation, as is often the case.


Table 6.16 Mean etch rate (Å/min) for silicon wafers under various conditions.

run   OE, 30s   OE, 90s    mean
  1      4750      5050    4900
  2      5444      5884    5664
  3      5802      6152    5977
  4      6088      6216    6152
  5      9000      9390    9195
  6      5236      5902    5569
  7     12960     12660   12810
  8      5306      5476    5391
  9      9370      9812    9591
 10      4942      5206    5074
 11      5516      5614    5565
 12      5108      5322    5210
 13      4890      5108    4999
 14      8334      8744    8539
 15     10750     10750   10750
 16     12508     11778   12143
 17      5762      6286    6024
 18      8692      8920    8806

6.8 Conclusion

In these six chapters we have followed a largely traditional path through the main issues of experimental design. In the following two chapters we introduce some more specialized topics. Throughout there is some danger that the key concepts become obscured in the details.

The main elements of good design may in our view be summarized as follows.

Experimental units are chosen; these are defined by the smallest subdivision of the material such that any two units may receive different treatments. A structure across different units is characterized, typically by some mixture of cross-classification and nesting and possibly baseline variables. The cross-classification is determined both by blocks (rows, columns, etc.) of no intrinsic interest and by strata determined by intrinsic features of the units (for


Table 6.17 Analysis of variance for mean etch rate.

Source                Sum of sq.   D.f.   Mean sq.
                        (×10⁶)              (×10⁶)

Whole unit:
  A                      84083       2      42041
  B                       6997       2       3498
  C                       3290       2       1645
  D                       5436       2       2718
  E                      98895       2      49448
  F                      28374       2      14187
  Whole unit error        4405       5        881

Subunit:
  OE                       408       1        408
  OE × A                   112       2         56
  OE × B                   245       2        122
  OE × C                     5.9     2          3.0
  OE × D                   159       2         79.5
  OE × E                   272       2        136
  OE × F                    13.3     2          6.6
  Subunit error             55.4     5         11.1

example, gender). Blocks are used for error control and strata to investigate possible interaction with treatments. Interaction of the treatment effects with blocks and variation among nested units is used to estimate error.

Treatments are chosen and possible structure in them identified, typically via a factorial structure of qualitative and quantitative factors.

Appropriate design consists in matching the treatment and unit structures to ensure that bias is eliminated, notably by randomization, that random error is controlled, usually by blocking, and that analysis appropriate to the design is achieved, in the simplest case via a linear model implicitly determined by the design, the randomization and a common assumption of unit-treatment additivity.

Broadly, in agricultural field trials structure of the units (plots) is a central focus, in industrial experiments structure of the treatments is of prime concern, whereas in most clinical trials a key


issue is the avoidance of bias and the accrual of sufficient units (patients) to achieve adequate estimation of the relatively modest treatment differences commonly encountered. More generally each new field of application has its own special features; nevertheless common principles apply.

6.9 Bibliographic notes

The material in Sections 6.2, 6.3 and 6.4 stems largely from Yates (1935, 1937). It is described, for example, by Kempthorne (1952) and by Cochran and Cox (1958). Some of the more mathematical considerations are developed from Bose (1938).

Orthogonal arrays of strength 2, defined via Hadamard matrices, were introduced by Plackett and Burman (1945); the definition used in Section 6.3 is due to Rao (1947). Bose and Bush (1952) derived a number of upper bounds for the maximum possible number of columns for orthogonal arrays of strength 2 and 3, and introduced several methods of construction of orthogonal arrays that have since been generalized. Dey and Mukerjee (1999) survey the current known bounds and illustrate the various methods of construction, with an emphasis on orthogonal arrays relevant to fractional factorial designs. Hedayat, Sloane and Stufken (1999) provide an encyclopedic survey of the existence and construction of orthogonal arrays, their connections to Galois fields, error-correcting codes, difference schemes and Hadamard matrices, and their uses in statistics. The array illustrated in Table 6.6 is constructed in Wang and Wu (1991).

Supersaturated designs with the factor levels randomized, so-called random balance designs, were popular in industrial experimentation for a period in the 1950s but following critical discussion of the first paper on the subject (Satterthwaite, 1958) their use declined. Booth and Cox (1962) constructed systematic designs by computer enumeration. See Hurrion and Birgil (1999) for an empirical study.

Box and Wilson (1951) introduced designs for finding optimum operating conditions and the subsequent body of work by Box and his associates is described by Box, Hunter and Hunter (1978). Chapter 15 in particular provides a detailed example of sequential experimentation towards the region of the maximum, followed by the fitting of a central composite design in the region of the maximum. The general idea is that only main effects and perhaps a few


two factor interactions are likely to be important. The detailed study of follow-up designs by Meyer, Steinberg and Box (1996) hinges rather on the notion that only a small number of factors, main effects and their interactions, are likely to play a major role.

The first systematic study of mixture designs and associated polynomial representations was done by Scheffé (1958), at the suggestion of Cuthbert Daniel, motivated by industrial applications. Earlier suggestions of designs by Quenouille (1953) and Claringbold (1955) were biologically motivated. A thorough account of the topic is in the book by Cornell (1981). The representation via a reference mixture is discussed in more detail by Cox (1971).

The statistical aspects of Taguchi's methods are best approached via the wide-ranging panel discussion edited by Nair (1992) and the book of Logothetis and Wynn (1989). For evolutionary operation, see Box and Draper (1969).

The example in Section 6.7 is discussed by Logothetis (1990), Fearn (1992) and Tsai et al. (1996). Fearn (1992) pointed out that the aliasing structure complicates interpretation of the results. The split plot analysis follows Tsai et al. (1996). The three papers give many more details and a variety of approaches to the problem. There are also some informative interaction plots presented in the two latter papers. For an extended form of Taguchi-type designs for studying noise factors, see Rosenbaum (1999a).

Nelder (1965a, b) gives a systematic account of an approach to design and analysis that emphasizes treatment and unit structures as basic principles. For a recent elaboration, see Brien and Payne (1999).

6.10 Further results and exercises

1. A 2⁴ experiment is to be run in 4 blocks with 4 units per block. Take as the generators ABC and BCD, thus confounding also the two factor interaction AD with blocks, and display the treatments to be applied in each block. Now show that if it is possible to replicate the experiment 6 times, it is possible to confound each two factor interaction exactly once. Then show that 5/6 of the units give information about, say, AB, and that if the ratio σc/σ is small enough, it is possible to estimate the two factor interactions more precisely after confounding, where σc² is the variance of responses within the same block and σ² is the variance of all responses.
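The block construction in this exercise can be sketched in a few lines (our code, not part of the exercise): each of the 16 treatments is assigned to a block by the signs of the two confounded interactions ABC and BCD, whose product AD is then automatically confounded as well.

```python
import itertools

# Confound a 2^4 factorial in 4 blocks of 4 using generators ABC and
# BCD; their generalized interaction ABC*BCD = AD is also confounded.
factors = "ABCD"

def sign(treatment, word):
    # +1/-1 level of the interaction `word` at a treatment given as a
    # tuple of 0/1 levels of A, B, C, D.
    return (-1) ** sum(treatment[factors.index(f)] for f in word)

blocks = {}
for t in itertools.product((0, 1), repeat=4):
    key = (sign(t, "ABC"), sign(t, "BCD"))
    label = "".join(f for f, level in zip(factors, t) if level) or "(1)"
    blocks.setdefault(key, []).append(label)

for key, treatments in sorted(blocks.items()):
    print(key, treatments)
```

The block with key `(1, 1)` is the principal block containing the treatment (1).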


2. Show that the 2^k experiment can be confounded in 2^(k−1) blocks of two units per block, allowing the estimation of main effects from within block comparisons. Suggest a scheme of partial confounding appropriate if two factor interactions are also required.

3. Double confounding in 2^k: Let u, v, ... and x, y, ... be r + c = k independent elements of the treatments group. Write out the 2^r × 2^c array

   1    x    y    xy   z   ...
   u    ux   uy   uxy  uz  ...
   v    vx   ...
   uv
   w
   ⋮

The first column is a subgroup and the other columns are cosets, i.e. there is a subgroup of contrasts confounded with columns, defined by generators X, Y, .... Likewise there are generators U, V, ... defining the contrasts confounded with rows. Show that X, Y, ...; U, V, ... are a complete set of generators of the contrasts group.

4. We can formally regard a factor at four levels, 1, a, a², a³, as the product of two factors at two levels, by writing, for example, 1, X, Y, and XY for the four levels. The three contrasts X, Y, and XY are three degrees of freedom representing the main effect of A. Often XY is of equal importance with X and Y and would be preserved in a system of confounding.

(a) Show how to arrange a 4 × 2² in blocks of eight with three replicates in a balanced design, partially confounding XBC, YBC and therefore also XYBC.

(b) If the four levels of the factor are equally spaced, express the linear, quadratic and cubic components of regression in terms of X, Y, and XY. Show that Y equals the quadratic component and that if XY is confounded and the cubic regression is negligible, then X gives the linear component.

Yates (1937) showed how to confound the 3 × 2² in blocks of six, and the 4 × 2^n in blocks of 4 × 2^(n−1) and 4 × 2^(n−2). He also constructed the 3^n × 2 in blocks of 3^(n−1) × 2 and 3^(n−2) × 2. These designs are reproduced in many textbooks.

5. Discuss the connection between supersaturated designs and the solution of the following problem. Given 2m coins, all but one of equal mass and one with larger mass, and a balance with two pans thus capable of discriminating larger from smaller total masses, how many weighings are needed to find the anomalous coin?

By simulation or theoretical analysis examine the consequences in analysing data from the design of Table 6.7 of the presence of one, two, three or more main effects.

6. Explore the possibilities, including the form of the analysis of variance table, for designs of Latin square form in which in addition to the treatments constituting the Latin square further treatments are applied to whole rows and/or whole columns of the square. These will typically give contrasts for these further treatments of low precision; note that the experiment is essentially of split plot form with two sets of whole unit treatments, one for rows and one for columns. The designs are variously called plaid designs or criss-cross designs. See Yates (1937) and, for a discussion of somewhat related designs applied to an experiment on medical training for pain assessment, Farewell and Herzberg (2000).

7. Suppose that in a split unit experiment it is required to compare two treatments with different levels of both whole unit and subunit treatments. Show how to estimate the standard error of the difference via a combination of the two residual mean squares. How would approximate confidence limits for the difference be found either by use of the Student t distribution with an approximate number of degrees of freedom or by a likelihood-based method?
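One standard route to the approximate degrees of freedom asked for here is Satterthwaite's approximation; a sketch (function name and numbers ours, not from the text): for a variance estimate c₁MS₁ + c₂MS₂ built from independent mean squares with f₁ and f₂ degrees of freedom,

```python
def satterthwaite_df(c1, ms1, f1, c2, ms2, f2):
    """Effective degrees of freedom for the variance estimate
    c1*ms1 + c2*ms2, where ms1 and ms2 are independent mean squares
    with f1 and f2 degrees of freedom (Satterthwaite's approximation):
    f = (c1*ms1 + c2*ms2)^2 / ((c1*ms1)^2/f1 + (c2*ms2)^2/f2)."""
    v = c1 * ms1 + c2 * ms2
    return v**2 / ((c1 * ms1) ** 2 / f1 + (c2 * ms2) ** 2 / f2)

# Illustrative: combine a whole-unit and a subunit residual mean
# square, each on 5 degrees of freedom (numbers invented).
f = satterthwaite_df(0.5, 881.0, 5, 0.5, 11.1, 5)
print(round(f, 2))  # 5.13
```

Approximate limits then use the Student t quantile with `f` degrees of freedom; note that the larger mean square dominates, so `f` stays close to its degrees of freedom.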

8. In a response surface design with levels determined by variables x₁, ..., x_k the variance of the estimated response at position x under a given model, for example a polynomial of degree d, can be regarded as a function of x. If the contours of constant variance are spherical centred on the origin the design is called rotatable; see Section 6.6.1. Note that the definition depends not merely on the choice of origin for x but more critically on the relative units in which the different x's are measured. For a quadratic model the condition for rotatability, taking the centroid of the design points as the origin, requires all variables to have the same second and fourth moments and Σ_u x⁴_iu = 3 Σ_u x²_iu x²_ju for all i ≠ j.

Show that for a quadratic model with 2^k factorial design points (±1, ..., ±1) and 2k axial points (±a, 0, ...), ..., (0, ..., ±a), the design is rotatable if and only if a = 2^(k/4). For comparative purposes it is more interesting to examine differences between estimated responses at two points x′, x″, say. It can be shown that in important special cases rotatability implies that the variance depends only on the distances of the points from the origin and the angle between the corresponding vectors. Rotatability was introduced by Box and Hunter (1957) and the discussion of differences is due to Herzberg (1967).
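The moment condition can be checked numerically for this central composite design; a sketch (our code), using the axial distance a = 2^(k/4), for which a⁴ equals the number 2^k of factorial points:

```python
import itertools
import numpy as np

def ccd_points(k, a):
    # 2^k factorial points plus 2k axial points at distance a.
    factorial = list(itertools.product((-1.0, 1.0), repeat=k))
    axial = []
    for i in range(k):
        for s in (-a, a):
            pt = [0.0] * k
            pt[i] = s
            axial.append(tuple(pt))
    return np.array(factorial + axial)

k = 3
a = 2 ** (k / 4)              # candidate rotatable axial distance
X = ccd_points(k, a)
pure4 = (X[:, 0] ** 4).sum()                 # sum of x_1^4 over the design
mixed = (X[:, 0] ** 2 * X[:, 1] ** 2).sum()  # sum of x_1^2 x_2^2
print(np.isclose(pure4, 3 * mixed))  # True: the rotatability condition holds
```

For k = 3 the factorial points contribute 8 and the two axial points ±a contribute 2a⁴ = 16 to the pure fourth moment, giving 24 = 3 × 8 as required.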

9. The treatment structure for the example discussed in Section 4.2.6 was factorial, with three controllable factors expected to affect the properties of the response. These three factors were quantitative, and set at three equally spaced levels, here shown in coded values, following a central composite design. Each of the eight factorial points (±1, ±1, ±1) was used twice, the centre point (0, 0, 0) was replicated six times, and the six axial points (±1, 0, 0), (0, ±1, 0), and (0, 0, ±1) were used once. The data and treatment assignment to blocks are shown in Table 4.13; Table 6.18 shows the factorial points corresponding to each of the treatments.

A quadratic model in xA, xB, and xC has nine parameters in addition to the overall mean. Fit this model, adjusted for blocks, and discuss how the linear and quadratic effects of the three factors may be estimated. What additional effects may be estimated from the five remaining treatment degrees of freedom? Discuss how the replication of the centre point in three different blocks may be used as an adjunct to the estimate of error obtained from Table 4.15.

Gilmour and Ringrose (1999) discuss the data in the light of fitting response surface models. Blocking of central composite designs is discussed in Box and Hunter (1957); see also Dean and Voss (1999, Chapter 16).

10. What would be the interpretation in the quality-quantity example of Section 6.6.3 if the upper of the two error mean squares


Table 6.18 Factorial treatment structure for the incomplete block design of Gilmour and Ringrose (1999); data and preliminary analysis are given in Table 4.13.

Trtm    1     2     3     4     5     6     7     8
xA     −1    −1    −1    −1     1     1     1     1
xB     −1    −1     1     1    −1    −1     1     1
xC     −1     1    −1     1    −1     1    −1     1
Day   1, 6  3, 7  3, 5  2, 6  2, 3  4, 6  6, 7  1, 3

Trtm      9      10    11    12    13    14    15
xA        0      −1     0     0     1     0     0
xB        0       0    −1     0     0     1     0
xC        0       0     0    −1     0     0     1
Day   1, 2, 7     4     5     4     5     4     5

were to be much larger (or smaller) than the lower? Compare the discussion in the text with that of Fisher (1935, Chapter 8).

11. Show that for a second degree polynomial for a mixture experiment a canonical form different from the one in the text results if we eliminate the constant term by multiplying by Σxj and eliminate the squared terms such as xj² by writing them in the form xj(1 − Σ_{k≠j} xk). Examine the extension of this and other forms to higher degree polynomials.

12. In one form of analysis of Taguchi-type designs a variance is calculated for each combination of fixed factors as between the observations at different levels of the noise factors. Under what exceptional special conditions would these variances have a direct interpretation as variances to be empirically realized in applications? Note that the distribution of these variances under normal-theory assumptions has a noncentral chi-squared form. A standard method of analyzing sets of normal-theory estimates of variance with d degrees of freedom uses the theoretical variance of approximately 2/d for the log variances and a multiplicative systematic structure for the variances. Show that this would tend to underestimate the precision of the conclusions.

13. Pistone and Wynn (1996) suggested a systematic approach to the fitting of polynomial and some other models to essentially arbitrary designs. A key aspect is that a design is specified via polynomials that vanish at the design points. For example, the 2² design with observations at (±1, ±1) is specified by the simultaneous equations x₁² − 1 = 0, x₂² − 1 = 0. A general polynomial in (x₁, x₂) can then be written as

k₁(x₁, x₂)(x₁² − 1) + k₂(x₁, x₂)(x₂² − 1) + r(x₁, x₂),

where r(x₁, x₂) is a linear combination of 1, x₁, x₂, x₁x₂ and these terms specify a saturated model for this design. More generally a design with n distinct points together with an ordering of the monomial expressions x₁^a₁ ··· x_k^a_k, in the above example 1 ≺ x₁ ≺ x₂ ≺ x₁x₂, determines a Gröbner basis, which is a set of polynomials g₁, ..., g_m such that the design points satisfy the simultaneous equations g₁ = 0, ..., g_m = 0. Moreover when an arbitrary polynomial is written

Σ k_s(x) g_s(x) + r(x),

the remainder r(x) specifies a saturated model for the design respecting the monomial ordering. Computer algorithms for finding Gröbner bases are available. Once constructed, the terms in the saturated model are found via monomials not divisible by the leading terms of the bases. For a full account, see Pistone, Riccomagno and Wynn (2000).
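The 2² example can be reproduced with a computer algebra system; a sketch using SymPy (our choice of tool, not the authors'):

```python
from sympy import symbols, groebner

x1, x2 = symbols("x1 x2")

# Polynomials vanishing exactly at the four design points (±1, ±1).
design_ideal = [x1**2 - 1, x2**2 - 1]

# A Grobner basis for the design ideal under lexicographic order; the
# standard monomials not divisible by the leading terms x1^2, x2^2
# are 1, x1, x2, x1*x2: the saturated model.
G = groebner(design_ideal, x1, x2, order="lex")
print(list(G.exprs))

# Remainder of an arbitrary polynomial on division by the basis lies
# in the span of those standard monomials.
p = 3 * x1**3 + x1 * x2**2 - 2
r = G.reduce(p)[1]
print(r)  # 4*x1 - 2
```

Here x₁³ reduces to x₁ and x₁x₂² reduces to x₁ modulo the ideal, so the remainder 4x₁ − 2 is the representative of p in the saturated model.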


CHAPTER 7

Optimal design

7.1 General remarks

Most of the previous discussion of the choice of designs has been on a relatively informal basis, emphasizing the desirability of generally plausible requirements of balance and the closely associated notion of orthogonality; see Section 1.7. We now consider the extent to which the design process can be formalized and optimality criteria used to deduce a design. We give only an outline of what is a quite extensive theoretical development.

This theory serves two rather different purposes. One is to clarify the properties of established designs whose good properties have no doubt always been understood at a less formal level. The other is to give a basis for suggesting designs in nonstandard situations.

We begin with a very simple situation that will then serve to illustrate the general discussion and in particular a key result, the General Equivalence Theorem of Kiefer and Wolfowitz.

7.2 Some simple examples

7.2.1 Straight line through the origin

Suppose that it is possible to make n separate observations on a response variable Y whose distribution depends on a single explanatory variable x and that for each observation the investigator may choose a value of x in the closed interval [−1, 1]. We call the interval the design region D for the problem. Suppose further that values of Y for different individuals are uncorrelated, of constant variance σ², and have

E(Y ) = βx, (7.1)

where β is an unknown parameter.

We thus have a very explicit formulation both of a model and a design restriction; the latter is supposed to come either from a


practical constraint on the region of “safe” or accessible experimentation or from the consideration that the region is the largest over which the model can plausibly be used.

We specify the design used, i.e. the set of values x1, ..., xn employed, by a measure ξ(·) over the design region attaching design mass 1/n to the relevant x for each observation; note that the same point in D may be used several times and then receives mass 1/n from each occurrence.

It is convenient to write

n⁻¹ Σ xi² = ∫ x² ξ(dx) = M(ξ),   (7.2)

say. We call M(ξ) the design second moment; in general it is proportional to the Fisher information. If we analyse the responses by least squares

var(β̂) = (σ²/n) M(ξ)⁻¹.   (7.3)

To estimate the expected value of Y at an arbitrary value of x ∈ D, say x0, we take

Ŷ0 = β̂x0,   (7.4)

with

var(Ŷ0) = (σ²/n) M(ξ)⁻¹ x0² = (σ²/n) d(x0, ξ),   (7.5)

say. There are two types of optimality requirement that might now be imposed. One is to minimize var(β̂) and this requires the maximization of M(ξ). Any design which attaches all the design mass to the points ±1 achieves this. Alternatively we may minimize the maximum over x0 of var(Ŷ0). We have that

d(ξ) = sup_{x0 ∈ D} d(x0, ξ) = M(ξ)⁻¹   (7.6)

and this is minimized as before by maximizing M(ξ). Further when this is done

inf_ξ d(ξ) = 1.   (7.7)
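These calculations are easy to confirm numerically; a small sketch (names ours) for the line through the origin:

```python
import numpy as np

def M(points, mass):
    # Design second moment M(xi) = sum m_i x_i^2 for the model E(Y) = beta*x.
    return float(np.sum(np.asarray(mass) * np.asarray(points) ** 2))

def d(x0, points, mass):
    # Standardized variance of the estimated mean response at x0.
    return x0**2 / M(points, mass)

# Design with mass 1/2 at each of -1 and +1: M is maximal (= 1) and
# the maximum of d over the design region is 1.
opt = ([-1.0, 1.0], [0.5, 0.5])
grid = np.linspace(-1, 1, 201)
sup_d = max(d(x, *opt) for x in grid)
print(M(*opt), sup_d)  # 1.0 1.0
```

Any design putting mass strictly inside (−1, 1) has smaller M and hence a larger supremum of d.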

7.2.2 Straight line with intercept

Now consider the more general straight line model with an intercept,

E(Y ) = β0 + β1x. (7.8)


We generalize the design moment to the design moment matrix

M(ξ) = [ 1      x̄        ]
       [ x̄   n⁻¹ Σ xi²  ],   (7.9)

where x̄ = n⁻¹ Σ xi, for example, can be written as

x̄ = ∫ x ξ(dx).   (7.10)

The determinant and inverse of M(ξ) are

det M(ξ) = n⁻¹ Σ (xi − x̄)²,   (7.11)

M(ξ)⁻¹ = {n⁻¹ Σ (xi − x̄)²}⁻¹ [ n⁻¹ Σ xi²   −x̄ ]
                              [ −x̄           1 ].   (7.12)

Now the covariance matrix of the estimated regression coefficients is M(ξ)⁻¹σ²/n. It follows that, at least for even values of n, the determinant of M(ξ)⁻¹ and the value of var(β̂₁) are minimized by putting design mass of 1/2 at the points ±1, i.e. spacing the points as far apart as allowable. There is a minor complication if n is odd.

Also the variance of Ŷ0 = β̂0 + β̂1x0, the estimated mean response at the point x0, is again d(x0, ξ)σ²/n, where now

d(x, ξ) = (1  x) M(ξ)⁻¹ (1  x)ᵀ.

It is convenient sometimes to regard this as defining the variance of prediction, although for predicting a single outcome σ² would have to be added to the variance of the estimated mean response.

A direct calculation shows that for the symmetrical two-point design

inf_ξ d(ξ) = inf_ξ sup_{x ∈ D} d(x, ξ) = 2.   (7.13)

Again in this case, although not in general, different optimality requirements can be met simultaneously. In summary, any design with x̄ = 0 minimizes var(β̂0), and the design with equal mass 1/2 at ±1 minimizes var(β̂1) and also can be shown to minimize d(ξ).
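A numerical confirmation of (7.13) (our code): build the moment matrix for the two-point design with f(x) = (1, x) and maximize d(x, ξ) over a grid.

```python
import numpy as np

# Design measure: mass 1/2 at x = -1 and x = +1.
points = np.array([-1.0, 1.0])
mass = np.array([0.5, 0.5])

# Moment matrix M(xi) = sum m_i f(x_i) f(x_i)^T with f(x) = (1, x).
F = np.column_stack([np.ones_like(points), points])
Mmat = F.T @ (mass[:, None] * F)
Minv = np.linalg.inv(Mmat)

def d(x):
    f = np.array([1.0, x])
    return float(f @ Minv @ f)

sup_d = max(d(x) for x in np.linspace(-1, 1, 201))
print(sup_d)  # 2.0, attained at x = ±1: the number of parameters
```

Here M(ξ) is the identity, so d(x, ξ) = 1 + x², with supremum 2 at the design points.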

7.2.3 Critique

These results depend heavily on the precise specification of the model and of the design region. In many situations it would be a fatal criticism of the above design that it offers no possibility


of checking the assumed linearity of the regression function; such checking does not feature in the optimality criteria used above, so that it is no surprise that it does not feature in the optimal design.

We now investigate informally two approaches to the inclusion of some check of linearity. One is to use the optimal design for a proportion (1 − w) of the observations and to take the remaining proportion w at some third point, most naturally at zero in the absence of special considerations otherwise; see Section 6.6. The variance of the estimated slope is increased by a factor (1 − w)⁻¹ and nonlinearity can be studied via the statistic comparing the sample mean Ȳ at x = 0 with the mean of the remaining observations. That analysis is closely associated with the fitting of the quadratic model

E(Y) = β0 + β1x + β2x².   (7.14)

A direct calculation shows that the optimal design for estimating β1 has w = 0, the optimal design for estimating β2 has w = 1/2, and that for minimizing both the determinant of the covariance matrix of β̂ and the prediction based criterion d(ξ) has w = 1/3, leading also to

inf_ξ d(ξ) = inf_ξ sup_{x ∈ D} d(x, ξ) = 3.   (7.15)

If a suitable criterion balancing the relative importance of estimating β1 and testing the adequacy of linearity were to be formulated, then and only then could a suitable w be deduced within the present formulation.
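The stated values of w can be confirmed numerically (our code): for the design with mass (1 − w)/2 at ±1 and w at 0 under the quadratic model, evaluate sup_x d(x, ξ).

```python
import numpy as np

def sup_d(w, grid=np.linspace(-1, 1, 401)):
    # Design: mass (1-w)/2 at -1 and +1, mass w at 0; model terms 1, x, x^2.
    pts = np.array([-1.0, 0.0, 1.0])
    mass = np.array([(1 - w) / 2, w, (1 - w) / 2])
    F = np.column_stack([np.ones_like(pts), pts, pts**2])
    M = F.T @ (mass[:, None] * F)
    Minv = np.linalg.inv(M)
    fx = np.column_stack([np.ones_like(grid), grid, grid**2])
    # d(x, xi) = f(x)^T M^{-1} f(x) evaluated over the whole grid.
    return float(np.max(np.einsum("ij,jk,ik->i", fx, Minv, fx)))

print(round(sup_d(1 / 3), 2),
      sup_d(1 / 3) < sup_d(0.2),
      sup_d(1 / 3) < sup_d(0.5))  # 3.0 True True
```

At w = 1/3 the variances at x = 0 (namely 1/w) and at x = ±1 (namely 2/(1 − w)) are equal, which is what balances the design.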

7.2.4 Space-filling designs

There is another more extreme approach to design in this context. If a primary aspect were the exploration of an unknown function rather than the estimation of a slope then a reasonable strategy would be to spread the design points over the design region, for example approximately uniformly. It is a convenient approximation to allow the measure ξ(·) to be continuous and in particular to consider the uniform distribution on (−1, 1). For the simple two-parameter linear model this would lead to the moment matrix

M(ξ) = diag(1, 1/3),   M(ξ)⁻¹ = diag(1, 3),

representing a three-fold increase in variance of the estimated slope as compared with the optimal design. Hardly surprisingly, the objectives of estimating a parameter in a tightly specified model and of exploring an essentially unknown function lead to quite different designs. See Section 7.7 for an introduction to space-filling designs.

7.3 Some general theory

7.3.1 Formulation

We now sketch a more general formulation. The previous section provides motivation and exemplification of most of the ideas involved.

We consider a design region D, typically a closed region in some Euclidean space R^d, and a linear model specifying for a particular x ∈ D that

E(Y) = βᵀ f(x),   (7.16)

where β is a p × 1 vector of unknown parameters and f(x) is a p × 1 vector of known functions of x, for example the powers 1, x, x², ..., x^(p−1) or orthogonalized versions thereof, or the first few sinusoidal functions as the start of an empirical Fourier series.

We make the usual second moment error assumptions leading to the use of least squares estimates. Under some circumstances this might be justified by randomization.

A design is specified by an initially arbitrary measure ξ(·) assigning unit mass to the design region D. If this is formed from atoms of size equal to 1/n, where n is the specified number of observations, the design can be exactly realized in n observations, but otherwise the specification has to be regarded as an approximation valid in some sense for large n.

We define the moment matrix by

M(ξ) = ∫ f(x) f(x)ᵀ ξ(dx),   (7.17)

so that the covariance matrix of the least squares estimate of β from n observations with variance σ² is

(σ²/n) M(ξ)⁻¹   (7.18)

and the variance of the estimated mean response at point x is (σ²/n) d(x, ξ), where

d(x, ξ) = f(x)ᵀ M(ξ)⁻¹ f(x).   (7.19)


We define a design ξ∗ to be D–optimal if it maximizes det M(ξ) or, of course, equivalently minimizes det{M(ξ)⁻¹}, the latter determining the generalized variance of the least squares estimate of β.

We define a design to be G–optimal if it minimizes

d(ξ) = sup_{x ∈ D} d(x, ξ).   (7.20)

7.3.2 General equivalence theorem

A central result in the theory of optimal design, the General Equivalence Theorem, asserts that the design ξ∗ that is D–optimal is also G–optimal and that

d(ξ∗) = p,   (7.21)

the number of parameters. The specific example of linear regression in Section 7.2 illustrates both parts of this.

We shall not give a full proof, which requires showing overall optimality and uses some arguments in convex analysis. We will outline a proof of local optimality. For this we perturb the measure ξ to (1 − ε)ξ + εδx, where δx is a unit atom at x. For a scalar, vector or matrix valued function H(ξ) determined by the measure ξ, we can define a derivative

der{H(ξ), x} = lim_{ε→0+} ε⁻¹ [H{(1 − ε)ξ + εδx} − H(ξ)],   (7.22)

where the limit is taken as ε tends to zero through positive values. This is a generalization of the notion of partial or directional derivatives and a special case of Gateaux derivatives, the last having a similar definition with more general perturbing measures. A necessary condition for ξ∗ to produce a local stationary point in H is that der{H(ξ∗), x} = 0 for all x ∈ D.

To apply this to D–optimality we take H to be log det M(ξ). For any nonsingular matrix A and any matrix B, we have that as ε tends to zero

det(A + εB) = det(A) det(I + εA⁻¹B)
            = det(A){1 + ε tr(A⁻¹B) + O(ε²)},   (7.23)

where tr denotes the trace of a square matrix. That is, at ε = 0, we have that

(d/dε) log det(A + εB) = tr(A⁻¹B).   (7.24)


The moment matrix M(ξ) is linear in the measure ξ and if all the mass is concentrated at x the moment matrix would be f(x)f(x)ᵀ. Thus

M{(1 − ε)ξ + εδx} = M(ξ) + ε{f(x)f(x)ᵀ − M(ξ)}.   (7.25)

Therefore

der{log det M(ξ), x} = tr{M⁻¹(ξ) f(x)f(x)ᵀ − I}.   (7.26)

Note that the derivative is meaningful only as a right-hand derivative unless there is an atom at x in the measure ξ, in which case the subtraction of a small atom is possible and negative ε allowable.

Now the trace of the first matrix on the right-hand side is equal to tr{f(x)ᵀ M⁻¹(ξ) f(x)} and that of the second matrix is p, the dimension of the parameter vector:

der{log det M(ξ), x} = d(x, ξ) − p.   (7.27)

Next suppose that we have a design measure ξ∗ such that at all sets of points in D which have design measure zero, d(x, ξ∗) < p, and at all points with positive measure, including especially points with atomic design mass, d(x, ξ∗) = p. Then the design is locally D–optimal. For a perturbation that adds a small design mass where there was none before decreases the determinant, whereas with respect to changes at other points there is a stationary point, in fact a local maximum. Global D–optimality and the second property of G–optimality hinge on convexity, G–optimality in particular on the generalized derivative being nonpositive, so that indeed the “worst” points for prediction are the points of positive mass where d(ξ∗) = p.

The most important point to emerge from the above outline is the mathematical basis for the connection between the covariance matrix of the least squares estimates and the variance of estimated mean response, and the origin of the, at first sight mysterious, identity between the maximum variance of prediction and the number of parameters.

7.3.3 Some special cases

If the D–optimal design has support on p distinct points with design masses δ_1, . . . , δ_p it follows that the moment matrix has the form

M(ξ) = C diag(δ_1, . . . , δ_p) C^T, (7.28)


where the p × p matrix C depends only on the positions of the points. It follows that

det M(ξ) = {det(C)}^2 Π δ_i. (7.29)

The condition for D–optimality thus requires that Π δ_i is maximized subject to Σ δ_i = 1, and this is easily shown to imply that all points have equal design mass 1/p. Not all D–optimal designs are supported on as few as p points, however.
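The step from maximizing Π δ_i to equal masses 1/p is the usual Lagrange multiplier (equivalently, arithmetic–geometric mean) argument; a brief sketch:

```latex
% maximize sum_i log(delta_i) subject to sum_i delta_i = 1
\frac{\partial}{\partial \delta_i}
  \Bigl(\sum_{j=1}^{p}\log\delta_j \;-\; \lambda\sum_{j=1}^{p}\delta_j\Bigr)
  \;=\; \frac{1}{\delta_i}-\lambda \;=\; 0
  \quad\Longrightarrow\quad
  \delta_i = \lambda^{-1}\ \text{for all } i,
  \qquad
  \sum_{i=1}^{p}\delta_i = 1 \;\Longrightarrow\; \delta_i = \frac{1}{p}.
```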

As an example we take the quadratic model of (7.14), and for convenience reparameterize in orthogonalized form as

E(Y_x) = γ_0 + γ_1 x + γ_2(3x^2 − 2). (7.30)

With equal design mass 1/3 at −1, 0, 1 we have that

M(ξ*) = diag(1, 2/3, 2), M^{-1}(ξ*) = diag(1, 3/2, 1/2), (7.31)

so that on calculating var(Ŷ_{x_0}), we have that

d(x_0, ξ*) = 3 − 9(x_0^2 − x_0^4)/2 (7.32)

and all the properties listed above can be directly verified. That is, in the design region D the generalized derivative is negative at all points except the three points with positive support, where it is zero and where the maximum standardized variance of prediction of 3 is achieved.
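These calculations are easy to reproduce numerically; a minimal sketch (the function names are ours, not the book's) that builds M(ξ*) from (7.30) and checks (7.31) and (7.32):

```python
import numpy as np

# Regressor vector for the orthogonalized quadratic model (7.30):
# f(x) = (1, x, 3x^2 - 2).
def f(x):
    return np.array([1.0, x, 3.0 * x**2 - 2.0])

# Moment matrix for equal design mass 1/3 at -1, 0, 1.
support = [-1.0, 0.0, 1.0]
M = sum(np.outer(f(x), f(x)) for x in support) / 3.0
Minv = np.linalg.inv(M)                 # diag(1, 3/2, 1/2), as in (7.31)

# Standardized variance of prediction d(x, xi*) = f(x)^T M^{-1} f(x).
def d(x):
    return f(x) @ Minv @ f(x)

# d attains its maximum p = 3 exactly at the three support points,
print([round(d(x), 6) for x in support])       # [3.0, 3.0, 3.0]
# and agrees with the closed form 3 - 9(x^2 - x^4)/2 in between.
print(round(d(0.5), 6), round(3 - 9 * (0.25 - 0.0625) / 2, 6))  # 2.15625 2.15625
```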

7.4 Other optimality criteria

The discussion in the previous section hinges on a link between a global criterion about the precision of the vector estimate β̂ and the variance of prediction. A number of criteria other than D–optimality may be more appropriate. One important possibility is based on partitioning β into two components β_1 and β_2 and focusing on β_1 as the component of interest; in a special case β_1 consists of a single parameter.

If the information matrix of the full parameter vector is partitioned conformally as

M(ξ) = ( M_{11}  M_{12}
         M_{21}  M_{22} ), (7.33)

the covariance matrix of the estimate of interest is proportional to


the inverse of the matrix

M_{11.2}(ξ) = M_{11} − M_{12} M_{22}^{-1} M_{21} (7.34)

and we call a design D_s–optimal if the determinant of (7.34) is maximized.

An essentially equivalent but superficially more general formulation concerns the estimation of the parameter Aβ, where A is a q × p matrix with q < p; the corresponding notion may be called D_A–optimality.

Other notions that can be appropriate in special contexts include A–optimality, in which the criterion to be minimized is tr(M^{-1}), and E–optimality, which aims to minimize the variance of the least well-determined of a set of contrasts.

7.5 Algorithms for design construction

The use of the theory developed so far in this chapter is partly to verify that designs suggested from more qualitative considerations have good properties and partly to aid in the construction of designs for nonstandard situations. For example an unusual model may be fitted or a nonstandard design region may be involved.

There are two slightly different settings requiring use of optimal design algorithms. In one, some observations may already be available and it is required to supplement these so that the full design is as effective as possible. In the second situation only a design region and a model are available. If an informative prior distribution were available, and if this were to be used in the analysis as well as in the choice of a design, the two situations are essentially equivalent.

We shall concentrate on D–optimality. There are various algorithms for finding an optimal design; all are variants on the following idea. Start with an initial design, ξ_0, in the first problem above with that used in the first part of the experiment, and in the second problem often with some initially plausible atomic arrangement with atoms of amount 1/N, where N is large compared with the number n of points to be used. Cover the design region with a suitable network, N, of points and compute the function d(x, ξ_0) for all x in N. The network N should be rich enough to contain close approximations to the points likely to have positive mass in the ultimate design. The idea is to add design mass where d is large until the D–optimality criterion is reasonably closely satisfied. Then it will be necessary to look at the design realizable


with n observations and, especially importantly, to check that the design has no features undesirable in the light of some aspect not covered in the formal optimality criterion used.

For the construction of a design without previous observations one appealing formulation is as follows: at the kth step let the design measure be ξ_k. Remove an atom from the point in N with smallest d(x, ξ_k) and attach it to the point with largest d(x, ξ_k). There are many ways of accelerating such an algorithm.

In some ways the simplest algorithm, and one for which convergence can be proved, is at the kth step to find the point x_k at which d(x, ξ_k) is maximized and to define

ξ_{k+1} = kξ_k/(k + 1) + δ_{x_k}/(k + 1). (7.35)

If other optimality requirements are used a similar algorithm canbe used based on the relevant directional derivative.

These algorithms give the optimizing ξ. The construction of theoptimum n point design for given n is a combinatorial optimizationproblem and in general is much more difficult.
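As an illustration, a sketch of the simple convergent algorithm (7.35), applied here to ordinary quadratic regression on D = [−1, 1] with a grid as the network N (the grid size and iteration count are arbitrary choices of ours, not from the book):

```python
import numpy as np

# Network N: a grid on [-1, 1]; model E(Y_x) = beta0 + beta1*x + beta2*x^2.
grid = np.linspace(-1.0, 1.0, 201)
F = np.column_stack([np.ones_like(grid), grid, grid**2])   # rows are f(x)^T

# Start from the uniform measure on the grid (xi_0).
weights = np.full(len(grid), 1.0 / len(grid))
M = F.T @ (weights[:, None] * F)                           # M(xi_0)

for k in range(1, 20001):
    d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)   # d(x, xi_k) on the grid
    best = int(np.argmax(d))
    # xi_{k+1} = k*xi_k/(k+1) + delta_{x_k}/(k+1), with the matching update of M.
    weights *= k / (k + 1.0)
    weights[best] += 1.0 / (k + 1.0)
    M = (k * M + np.outer(F[best], F[best])) / (k + 1.0)

# The mass accumulates near -1, 0, 1 with weight close to 1/3 each,
# the D-optimal design for quadratic regression on [-1, 1].
for x0 in (-1.0, 0.0, 1.0):
    print(round(weights[np.abs(grid - x0) < 0.025].sum(), 2))
```

The convergence is slow (of order 1/k), which is why accelerated variants, such as the exchange formulation described above, are used in practice.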

7.6 Nonlinear design

The above discussion concerns designs for problems in which analysis by least squares applied to linear models is appropriate. While quite strong specification is needed to deduce an optimal design, the solution does not depend on the unknown parameter under study. When nonlinear models are appropriate for analysis, much of the previous theory applies, replacing least squares estimation by maximum likelihood estimation, but the optimal design will typically depend on the unknown parameter under study.

This means that either one must be content with optimality atthe best prior estimate of the unknown parameter, checking forsensitivity to errors in this prior estimate, or, preferably, that asequential approach is used.

As an illustration we discuss what was historically one of the first problems of optimal design to be analysed, the dilution series. Suppose that organisms are distributed at random at a rate µ per unit volume. If a unit volume is sampled the number of organisms will have a Poisson distribution of mean µ. If the unit volume is diluted by a factor k the number will have a Poisson distribution of mean µ/k. In some contexts it is much easier to check for presence or absence of the organism than it is to count numbers, leading at


dilution k to consideration of a binary variable Y_k, where

P(Y_k = 0; k, µ) = e^{−µ/k}, P(Y_k = 1; k, µ) = 1 − e^{−µ/k}, (7.36)

corresponding respectively to the occurrence of no organism or one or more organisms.

A commonly used procedure is to examine r samples at each of a series of dilutions, for example taking k = 1, 2, 4, . . .. The number of positive samples at a given k will have a binomial distribution and hence a log likelihood function can be found and µ estimated by maximum likelihood. Samples at both large and very small values of µ/k provide little quantitative information about µ; it is the samples with a value of k approximately equal to µ that are most informative. There is one unknown parameter and the design region is the set of allowable k, essentially all values greater than or equal to one. The design criterion is to minimize the asymptotic variance of the maximum likelihood estimate µ̂ or equivalently to maximize the Fisher information about µ. It is plausible, and can be formally proved, that this is done by choosing a single value of k.

The log likelihood for one observation on Y_k is

−µk^{-1}(1 − Y_k) + Y_k log(1 − e^{−µ/k}), (7.37)

so that the expected information about µ is

ν^2 e^{−ν}(1 − e^{−ν})^{-1} µ^{-2}, (7.38)

where ν = µ/k, the function having its maximum for given µ at ν = 1.594, i.e. at k_opt = 0.627µ, the corresponding expected information per observation about µ being 0.648/µ^2. If we made one direct count of the number of organisms at dilution k* yielding a Poisson distributed observation of mean µ/k*, the expected information about µ is 1/(k*µ), and if it so happened that the above optimal dilution had been chosen, so that k* = k_opt, the information about µ would then be 1.594/µ^2. Some of the loss of information involved in the simple dilution series method could be recovered if it were possible to record the response variable in the extended form 0, 1, more than 1. Then, of course, the optimality calculation would need revision. If particular interest was focused on whether µ is larger or smaller than some special value µ_0 it would be reasonable to use the design locally optimal at µ_0.

The key point is that the optimal choice of design depends on the unknown parameter µ and this is typical of nonlinear problems in general. To examine sensitivity to the choice of the dilution


constant we write k = c k_opt, so that c is the ratio of the dilution used to its optimal value. The ratio of the resulting information to its optimal value can be found from (7.38). The dependence on errors in assessing k is asymmetric, the ratio being 0.804 at c = 2 and 0.675 at c = 1/2, and being 0.622 at c = 3 and 0.298 at c = 1/3. It would thus be better to dilute by too much than by too little if a single dilution strategy were to be adopted.
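The numbers in the last two paragraphs can be checked directly; a sketch (a crude grid search, with function names of our own) that maximizes (7.38) and evaluates the sensitivity ratios:

```python
import numpy as np

# mu^2 times the expected information (7.38), as a function of nu = mu/k.
def g(nu):
    return nu**2 * np.exp(-nu) / (1.0 - np.exp(-nu))

nu = np.linspace(0.01, 10.0, 1000000)
nu_opt = nu[np.argmax(g(nu))]
print(round(nu_opt, 3), round(g(nu_opt), 3))      # 1.594 0.648

# Sensitivity: k = c * k_opt corresponds to nu = nu_opt / c.
for c in (2.0, 0.5, 3.0, 1.0/3.0):
    print(c, round(g(nu_opt / c) / g(nu_opt), 3))
```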

In normal theory regression problems there would be a similar dependence if the error variance depended on the unknown parameter controlling the expected value.

If the dependence of intrinsic precision on the unknown parameter is slight then designs virtually independent of the unknown parameter are achieved. For example, suppose that a reasonable model for the observed responses is of the generalized linear regression form in which Y_1, . . . , Y_n are independently distributed in the exponential family distribution with the density of Y_j being

exp{φ_j y_j + a(y_j) − k(φ_j)},

where φ_j is the canonical parameter, a(y_j) a normalizing function and k(φ_j) the cumulant function. Suppose also that for some known function h(·) the vector h(φ) with components h(φ_j) obeys the linear model

h(φ) = β^T f(x), (7.39)

where β and f(x) have the same interpretation as in the linear model (7.16). The natural analogue to the moment matrix is the information matrix for β,

Σ m{β^T f(x_j)} f(x_j) f(x_j)^T, (7.40)

where the function m(·) depends on the functions h(·) and k(·). If the dependence of the response on the covariates is weak, that is if we can work locally near β = 0, Taylor expansion shows that the information matrix is proportional to the moment matrix M(ξ), so that, for example, the D–optimal design is the same as in the linear least squares case. This conclusion is qualitatively obvious when h(φ) = φ on remarking that the canonical statistics are the same as in the normal-theory case and the dependence of their asymptotic distribution on β is by assumption weak. For example, locally at least near a full null hypothesis, optimal designs for linear logistic regression are the same as in the least squares case.


7.7 Space-filling designs

The optimal designs described in Sections 7.2–7.4 typically sample at the extreme points of the design space, as they are in effect targeted at minimizing the variance of prediction for a given model. In problems where the model is not well specified a preferable strategy is to take observations throughout the range of the design space, at least until more information about the shape of the response surface is obtained. Such designs are called space-filling designs.

In Section 6.6 we discussed two and three level factorial systems for exploration of response surfaces that were well approximated by quadratic polynomials. If the dependence of Y on x_1, . . . , x_k is highly nonlinear, either because the system is very complex, or the range of values of X is relatively large, then this approximation may be quite poor. A space-filling design is then useful for exploring the nature of the response surface.

Illustrations. In an engineering context, an experiment may involve running a large simulation of a complex deterministic system with given values for a number of tuning parameters. For example, in a fluid dynamics experiment, the system may be determined via the solution of a number of partial differential equations. As the system is deterministic, random error is unimportant. Of interest is the choice of tuning parameters needed to ensure that the simulator reproduces observations consistent with data from the physical system. Since each run of the simulator may be very expensive, efficient choice of the test values of the tuning parameters is important.

In modelling a stochastic epidemic there may be a number of parameters introduced to describe rates of infectivity, transmission under various mechanisms, and so on. At least in the early stages of the epidemic there will not be enough data to permit estimation of these parameters. Some progress can be made by simulating the model over the full range of parameter values, and accepting as plausible those parameter combinations which give results consistent with the available data. A space-filling design may be used to choose the parameter values for simulation of the model.

A commonly used space-filling design, called a Latin hypercube, is constructed as follows. The range for each of m design variables (tuning parameters in the illustrations above) is divided into n subintervals, equally spaced on the appropriate scale for each variable. An array of n rows and m columns is constructed by assigning to each column a random permutation of 1, . . . , n, independently of the other columns. Each row of the resulting array defines a design point for the n-run experiment, either at a fixed point in each subinterval, such as the endpoint or midpoint, or randomly sampled within the subinterval. These designs are generalizations of the lattice square, which is constructed from sets of orthogonal Latin squares in Section 8.5.

For example, with n = 10 and m = 3, we might obtain the array

( 10  8  9  7  6  5  3  1  4  2
   9  2  3  8  5  1  4 10  6  7
   5  2  1  7  6 10  9  8  4  3 )^T. (7.41)

Thus if each of the three design variables takes values equally spaced on (0, 1), and we sample midpoints, the design points on (0, 1)^3 are (0.45, 0.85, 0.95), (0.15, 0.15, 0.75), and so on. With n = 9 we could use the design given in Table 8.2.
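A midpoint Latin hypercube of this kind is simple to generate; a sketch (the function name and seed are ours):

```python
import numpy as np

def latin_hypercube_midpoints(n, m, rng):
    # One independent random permutation of 1..n per column; row i of the
    # result is a design point, the jth coordinate being the midpoint
    # (pi_ij - 0.5)/n of the pi_ij-th subinterval of (0, 1).
    perms = np.column_stack([rng.permutation(n) + 1 for _ in range(m)])
    return (perms - 0.5) / n

rng = np.random.default_rng(1)
design = latin_hypercube_midpoints(10, 3, rng)

# Each column visits each of the 10 subintervals exactly once.
print(np.sort(design[:, 0]))   # 0.05, 0.15, ..., 0.95
```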

Latin hypercube designs are balanced on any individual factor, but are not balanced across pairs or larger sets of factors. An improvement of balance may be obtained by using as the basic design an orthogonal array of strength two, Latin hypercubes being orthogonal arrays of strength 1; see Section 6.3.

In addition to exploring the shape of an unknown and possibly highly nonlinear response surface, y = f(x_1, . . . , x_k), space-filling designs can be used to evaluate the integral

∫ f(x) dx, (7.42)

where f(·) may represent a density for a k-dimensional variable x or may be the expected value of a given function f(X) with respect to the uniform density. The resultant approximation to (7.42) often has smaller variance than that obtained by Monte Carlo sampling.

7.8 Bayesian design

We have in the main part of the book deliberately emphasized the essentially qualitative good properties of designs. While the methods of analysis have been predominantly based on an appropriate linear model estimated by the method of least squares, or implicitly by an analogous generalized linear model, were some other


type of analysis to be used or a special ad hoc model developed, the designs would in most cases retain considerable appeal. In the present chapter more formal optimum properties, largely based on least squares analyses, have been given more emphasis. In many aspects of design as yet unknown features of the system under study are relevant. If the uncertainty about these features can be captured in a prior probability distribution, Bayesian considerations can be invoked.

In fact there are several rather different ways in which a Bayesian view of design might be formulated and there is indeed a quite extensive literature. We outline several possibilities.

First, choosing a design is clearly a decision. In an inferential process we may conclude that there are several equally successful interpretations. In a terminal decision process, even if there are several essentially equally appealing possibilities, just one has to be chosen. This raises the possibility of a decision-analytic formulation of design choice as optimizing an expected utility. In particular a full Bayesian analysis would require a utility function for the various designs as a function of unknown features of the system and a prior probability distribution for those features.

Secondly there is the possibility that the whole of the objective of the study, not just the choice of design, can be formulated as a decision problem. While of course the objectives of a study and the possible consequences of various possible outcomes have always to be considered, in most of the applications we have in mind in this book a full decision analysis is not likely to be feasible and we shall not address this aspect.

Thirdly there may be prior information about the uncontrolled variation. Use of this at a qualitative level has been the primary theme of Chapters 3 and 4. The special spatial and temporal models of uncontrolled variation to be discussed in Sections 8.4 and 8.5 are also of this type, although a fully Bayesian interpretation of them would require a hyperprior over the defining parameters.

Finally there may be prior information about the contrasts ofprimary interest, i.e. a prior distribution for the parameter of in-terest.

We stress that the issue is not whether prior information of these various kinds should be used, but rather whether it can be used quantitatively via a prior probability distribution and possibly a utility function.

If the prior distribution is based on explicit empirical data or


theoretical calculation it will usually be sensible to use that same information in the analysis of the data. In the other, and perhaps more common, situation where the prior is rather more impressionistic and specific to the experience of the individuals designing the study, the analysis may best not use that prior. Indeed one of the main themes of the earlier chapters is the construction of designs that will lead to broadly acceptable conclusions. If one insisted on a Bayesian formulation, which we do not, then the conclusions should be convincing to individuals with a broad range of priors, not only about the parameters of interest but also about the structure of the uncontrolled variation.

Consider an experiment to be analysed via a parametric model defined by unknown parameters θ. Denote a possible design by ξ and the resulting vector of responses by Y. Let the prior density of θ to be used in designing the experiment be p_{0d}(θ) and the prior density to be used in analysis be p_{0a}(θ). One multipurpose utility function that might be used is the Shannon information in the posterior distribution. A more specifically statistical version would be some measure of the size of the posterior covariance matrix of θ.

Now for linear models the posterior covariance matrix under normality of both model and prior is proportional to

{nM(ξ) + P_0}^{-1}, (7.43)

where P_0 is a contribution proportional to the prior concentration matrix of θ. Because the prior distribution in this formulation does not depend on either the responses Y or the value of θ, all the non-Bayesian criteria can be used with relatively minor modification. Thus maximization of

log det{M(ξ) + P_0/n}

gives Bayesian D–optimality. Unless the prior information is strong the Bayesian formulation does not make a radical difference. Note that P_0 would typically be given by the prior to be used for analysis, should that be known. That is to say, even if hypothetically the individuals designing the experiment knew the correct value of θ but were not allowed to use that information in analysis, the choice of design would be unaffected.

The situation is quite different in nonlinear problems, where the optimal design typically depends on θ. Here, at least in principle, the preposterior expected utility is the basis for design choice. Thus


with a scalar unknown parameter and squared error loss as the criterion, the objective would be to minimize the expectation, taken over the prior distribution p_{0d}(θ) and over Y, of the posterior variance of θ evaluated under p_{0a}(θ). In the extreme case where p_{0d}(θ) is highly concentrated around θ_0 but the prior p_{0a}(θ) is very dispersed, this amounts to using the design locally optimum at θ_0 with the non-Bayesian analysis. This design would also be suitable if interest is focused on the sign of θ − θ_0. In the notionally complementary case where the prior for analysis is highly concentrated but not to be used in design, it may be a waste of resources to do an experiment at all!

It can be shown theoretically, and is clear on qualitative grounds, that, especially if the design prior is quite dispersed, a design with many points of support will be required.

Implementation of the above procedure to deduce an optimal design will often require extensive computation even in simple problems. One much simpler approach is to find the Fisher information I(θ, ξ) for a given design, i.e. the expected information averaged over the responses at a fixed θ, and to maximize that averaged over the prior distribution p_{0d}(θ) of θ, or slightly more generally to maximize

∫ φ{I(θ, ξ)} p_{0d}(θ) dθ; (7.44)

for some suitable φ. This would not preclude the use of p_{0a}(θ) in analysis, it being assumed that this would have only a second-order effect on the choice of design.

Thus in the dilution series a single observation at dilution k contributes Fisher information

I(µ, k) = k^{-2} e^{−µ/k}(1 − e^{−µ/k})^{-1},

so that the objective is to maximize

∫∫ I(µ, k) p_{0d}(µ) dµ ξ(dk) (7.45)

with respect to the design measure ξ. Note that the prior density p_{0d} must be proper. For an improper prior, evaluated as a limit of a uniform distribution of µ over (0, A) as A tends to infinity, the average information at any k is zero.

This optimization problem must be done numerically, for example by computing (7.45) for discrete designs ξ with m design points and weight w_i assigned to dilution k_i, i = 1, . . . , m. Given a prior


range for µ of (µ_L, µ_U), say, we will have k_i ∈ (0.627µ_L, 0.627µ_U) as in Section 7.6. Numerical calculation based on a uniform prior for log µ indicates that for a sufficiently narrow prior the optimal design has just one support point, but a more diffuse prior permits a larger number of support points; see the Bibliographic notes.
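As a sketch of the simplest version of this computation (the discretization, prior bounds and function names are our own illustrative choices, not from the book), the average information for a one-point design at dilution k under a uniform prior for log µ can be evaluated on a grid:

```python
import numpy as np

def info(mu, k):
    # Fisher information I(mu, k) for one binary observation, Section 7.6.
    nu = mu / k
    return nu**2 * np.exp(-nu) / ((1.0 - np.exp(-nu)) * mu**2)

def avg_info(k, mu_low, mu_high, n_grid=2001):
    # Average of I(mu, k) under a uniform prior for log mu on (log mu_low, log mu_high).
    mu = np.exp(np.linspace(np.log(mu_low), np.log(mu_high), n_grid))
    return info(mu, k).mean()

# With a narrow prior around mu = 2 the best single dilution is close to
# the locally optimal value k_opt = 0.627 * 2.
ks = np.linspace(0.1, 10.0, 2000)
best_k = ks[np.argmax([avg_info(k, 1.9, 2.1) for k in ks])]
print(round(best_k, 2))
```

Widening the prior flattens the criterion in k, which is the qualitative reason a diffuse prior favours spreading mass over several support points.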

7.9 Optimality of traditional designs

In the simpler highly balanced designs strong optimality properties can be deduced from the following considerations.

Suppose first that interest is focused on a particular treatment contrast, not necessarily a main effect. Now if it could be assumed that all other effects are absent, the problem is essentially that of comparing two or more groups. Provided that the error variance is constant and the errors uncorrelated, this is most effectively achieved by equal replication. In the case of comparison of two treatment groups this means that the variance of the comparison is that of the difference between two means of independent samples of equal size.

Next note that this variance is achieved in the standard factorial or fractional factorial systems, provided in the latter case the contrast can be estimated free of damaging aliasing.

Finally, the inclusion of additional terms into a linear model cannot decrease the variance of the estimates of the parameters initially present. Indeed it will increase the variance unless orthogonality holds; see Appendix A.2.

Proof of the optimality of more complex designs, such as balanced incomplete block designs, is in most cases more difficult; see the Bibliographic notes.

7.10 Bibliographic notes

Optimal design for fitting polynomials was considered in great detail by Smith (1918). The discussion of nonlinear design for the dilution series is due to Fisher (1935). While there are other isolated investigations, the first general discussions are due to Elfving (1952, 1959), Chernoff (1953) and Box and Lucas (1959). An explosion of the subject followed the discovery of the General Equivalence Theorem by Kiefer and Wolfowitz (1959). Key papers are by Kiefer (1958, 1959, 1975); see also his collected works (Kiefer, 1985). Fedorov's book (Fedorov, 1972) emphasizes algorithms for design


construction, whereas the books by Silvey (1980) and Pukelsheim (1993) stress, respectively briefly and in detail, the underlying general theory in the linear case. Atkinson and Donev (1992) in their less mathematical discussion give many examples of optimal designs for specific situations. Important results on the convergence of algorithms like the simple one of Section 7.5 are due to Wynn (1970); see also Fedorov and Hackl (1997). There is a large literature on the use of so-called "alphabet" optimality criteria for a number of more specialized applications appearing in the theoretical statistical journals. Flournoy, Rosenberger and Wong (1998) presents recent work on nonlinear optimal design and designs achieving multiple objectives.

A lucid account of exact optimality is given by Shah and Sinha (1989). A key technique is that many of the optimality criteria are expressed in terms of the eigenvalues of the matrix C of Section 4.2. It can then be shown that unnecessary lack of balance in the design leads to sub-optimal designs by all these criteria. In that way the optimality of balanced incomplete designs and group divisible designs can be established. For the optimality of orthogonal arrays and fractional factorial designs, see Dey and Mukerjee (1999, Chapter 2).

A systematic review of Bayesian work on design of experiments is given by Chaloner and Verdinelli (1995). For the use of previous data, see Covey-Crump and Silvey (1970). A general treatment in terms of Shannon information is due to Lindley (1956). Atkinson and Donev (1992, Chapter 19) give some interesting examples. Dawid and Sebastiani (1999) have given a general discussion treating design as part of a fully formulated decision problem.

Bayesian design for the dilution series is discussed in Mehrabi and Matthews (1998), where in particular they illustrate the use of diffuse priors to obtain a four-point design and more concentrated priors to obtain a one-point design. General results on the relation of the prior to the number of support points are described in Chaloner (1993).

Box and Draper (1959) emphasized the importance of space-filling designs when the goal is prediction of response at a new design point, and when random variability is of much less importance than systematic variability. Latin hypercube designs were proposed in McKay, Beckman and Conover (1979). Sacks, Welch, Mitchell and Wynn (1989) survey the use of optimal design theory for computer experiments, i.e. simulation experiments in which


runs are expensive and the output is deterministic. Aslett et al. (1998) give a detailed case study of a sequential approach to the design of a circuit simulator, where a Latin hypercube design is used in the initial stages. For applications to models of BSE and vCJD, see Donnelly and Ferguson (1999, Chapters 9, 10). Owen (1993) proves a central limit theorem for Latin hypercube sampling, and indicates how these samples may be used to explore the shape of the response function, for example in finding the maximum of f over the design space. He considers the more general case of evaluating the expected value of f with respect to a known density g.

Another approach to space-filling design, using methods from number theory, is briefly described in Exercise 7.7. This approach is reviewed by Fang, Wang and Bentler (1994) and its application in design of experiments discussed in Chapter 5 of Fang and Wang (1993). In the computer science literature the method is often called quasi-Monte Carlo sampling; see Niederreiter (1992).

7.11 Further results and exercises

1. Suppose that an optimal design for a model with p parameters has support on more than p(p + 1)/2 distinct points. Then because an arbitrary information matrix can be formed from a convex combination of those for p(p + 1)/2 points, there must be an optimal design with only that number of points of support. Often fewer points, indeed often only p points, are needed.

2. Show that a number of types of optimality criterion can be encapsulated into a single form via the eigenvalues γ_1, . . . , γ_p of the matrix M(ξ), on noting that the eigenvalues of M^{-1}(ξ) are the reciprocals of the γs, and defining

Π_k(ξ) = (p^{-1} Σ γ_s^{-k})^{1/k}.

Examine the special cases k = ∞, 1, 0. See Kiefer (1975).

3. Optimal designs for estimating logistic regression allowing for

robustness to parameter choice and considering sequential possibilities were studied by Abdelbasit and Plackett (1983). Heise and Myers (1996) introduced a bivariate logistic model, i.e. with two responses, for instance efficacy and toxicity. They defined Q–optimality to be the minimization of an average over the design region of a variance of prediction and developed designs for estimating the probability of efficacy without toxicity. See several


papers in the conference volume edited by Flournoy et al. (1998) for further extensions.

4. Physical angular correlation studies involve placing pairs of detectors to record the simultaneous emission of pairs of particles whose paths subtend an angle θ at a source. There are theoretical grounds for expecting the rate of emission to be given by a few terms of an expansion in Legendre polynomials, namely to be of the form

β_0 + β_2 P_2(cos θ) + β_4 P_4(cos θ) = β_0 + β_2(3cos^2 θ − 1)/2 + β_4(35cos^4 θ − 30cos^2 θ + 3)/8.

Show, assuming that the estimated counting rate at each angle is approximately normal with constant variance, that the optimal design is to choose three equally spaced values of cos^2 θ, namely values of θ of 90, 135 and 180 degrees. If the resulting counts have Poisson distributions, with largish means, what are the implications for the observational times at the three angles?
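Since the model depends on θ only through u = cos^2 θ and is quadratic in u, the claimed design can be checked against the equivalence theorem numerically; a sketch (variable names are ours):

```python
import numpy as np

# Regressors in u = cos^2(theta): 1, P2 and P4 written as polynomials in u.
def f(u):
    return np.array([1.0, (3.0*u - 1.0) / 2.0, (35.0*u**2 - 30.0*u + 3.0) / 8.0])

# Candidate design: equal mass 1/3 at u = 0, 1/2, 1,
# i.e. theta = 90, 135, 180 degrees.
support = [0.0, 0.5, 1.0]
M = sum(np.outer(f(u), f(u)) for u in support) / 3.0
Minv = np.linalg.inv(M)
d = lambda u: f(u) @ Minv @ f(u)

# d(u, xi) <= 3 throughout [0, 1], with equality at the support points:
grid = np.linspace(0.0, 1.0, 1001)
print(round(max(d(u) for u in grid), 6))   # 3.0
```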

5. In an experiment on the properties of materials cast from high purity metals it was possible to cast four 2 kg ingots from an 8 kg melt. After the first ingot from a melt had been cast it was possible to change the composition of the remaining molten metal, but only by the addition of one of the alloying metals, no technique being available for selectively reducing the concentration of an alloying metal. Thus within any one melt, and with one alloying metal at two levels (1, a), there are five possible sequences, namely

(1, 1, 1, 1); (1, 1, 1, a); (1, 1, a, a); (1, a, a, a); (a, a, a, a),

with corresponding restrictions on each factor separately if a factorial experiment is considered. This is rather an extreme example of practical constraints on the sequence of experimentation.

By symmetry an optimal design is likely to have

n_1, n_2, n_3, n_2, n_1

melts of the five types. Show that under the usual model allowing for melt (block) and period effects the information about the treatment effect is proportional to

12n1n2 + 8n1n3 + 6n2n3 + 4n2².


By maximizing this subject to the constraint that the total number of melts 2n1 + 2n2 + n3 is fixed, show that the simplest realizable optimal design has 8 melts with n1 = 1, n2 = n3 = 2. Show that this is an advantageous design as compared with holding the treatment fixed within a melt, i.e. taking a whole melt as an experimental unit, if and only if there is a substantial component of variance between melts. For more details, including the factorial case, see Cox (1954).
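
The stated optimum can be confirmed by brute force; the following sketch (not part of the original text) enumerates every design with 8 melts, using the information expression from the exercise:

```python
# Enumerate nonnegative (n1, n2, n3) with 2*n1 + 2*n2 + n3 = 8 melts and
# maximize the information 12*n1*n2 + 8*n1*n3 + 6*n2*n3 + 4*n2**2.
def information(n1, n2, n3):
    return 12 * n1 * n2 + 8 * n1 * n3 + 6 * n2 * n3 + 4 * n2 ** 2

def best_design(total_melts=8):
    candidates = [(n1, n2, total_melts - 2 * n1 - 2 * n2)
                  for n1 in range(total_melts // 2 + 1)
                  for n2 in range(total_melts // 2 + 1)
                  if total_melts - 2 * n1 - 2 * n2 >= 0]
    return max(candidates, key=lambda d: information(*d))

print(best_design())  # the text's optimum: n1 = 1, n2 = n3 = 2
```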

6. In Latin hypercube sampling, if we sample randomly for each subinterval, we can represent the resulting sample (X1, . . . , Xn), or in component form (X11, . . . , X1m; . . . ; Xn1, . . . , Xnm), by

Xij = (πij − Uij)/n,  i = 1, . . . , n,  j = 1, . . . , m,  (7.46)

where π1j, . . . , πnj is a random permutation of 1, . . . , n and Uij is a random variable distributed uniformly on (0, 1). Suppose our goal is to evaluate Y = f(X), X ∈ Rm and Y ∈ R, where f is a known function but very expensive to compute. Show that the variance of Ȳ = n⁻¹Σi f(Xi) under the sampling scheme (7.46) is

n⁻¹var{f(X)} − n⁻¹ Σ_{j=1}^{m} var{fj(Xj)} + o(n⁻¹),  (7.47)

where fj(Xj) = E{f(X) | Xj} − E{f(X)} is the “main effect” of f(X) for the jth component of X. For details see Stein (1987) and Owen (1992). Tang (1993) shows how to reduce the variance further by orthogonal array based sampling.

7. Another type of space-filling design specifies points in the design space using methods from number theory. The resulting design is called a uniform, or uniformly scattered, design. In one dimension a uniformly scattered design of n points is simply

(2i− 1)/(2n), i = 1, . . . , n. (7.48)

This design has the property that it minimizes, among all n-point designs, the Kolmogorov-Smirnov distance between the design measure and the uniform measure on (0, 1):

Dn(ξ) = sup_{x∈(0,1)} |Fn(x) − x|,  (7.49)

where Fn(x) = n⁻¹ Σ 1{ξi ≤ x} is the empirical distribution function for the design ξ = (ξ1, . . . , ξn).

In k ≥ 2 dimensions the determination of a uniformly scattered design, i.e. a design that minimizes the Kolmogorov-Smirnov


distance between the design measure and the uniform measure on [0, 1]^k, is rather difficult and is often simplified by seeking designs that achieve this property asymptotically. Detailed results and a variety of applications are given in Fang and Wang (1993, Chapters 1, 5).
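
A quick computation (an illustrative sketch, not part of the text) confirms that the one-dimensional design (7.48) attains the minimum possible value 1/(2n) of the distance (7.49):

```python
def ks_to_uniform(points):
    # D_n(xi) = sup_x |F_n(x) - x| for the empirical CDF of the design;
    # for a sorted design the sup is attained just before or at a point.
    pts = sorted(points)
    n = len(pts)
    return max(max(abs(i / n - x), abs((i - 1) / n - x))
               for i, x in enumerate(pts, start=1))

n = 10
design = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]  # design (7.48)
print(ks_to_uniform(design))  # attains the minimum possible value 1/(2n)
```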


CHAPTER 8

Some additional topics

8.1 Scale of effort

8.1.1 General remarks

An important part of the design of any investigation involves the appropriate scale of effort. In the terminology used in this book the total number of experimental units must be chosen. Also, commonly, a decision must be made about the number of repeat observations to be made on important variables within each unit, especially when sampling of material within each unit is needed.

Illustration. In measuring yield of product per plot in an agricultural field trial the whole plot might be harvested and the total yield measured; in some contexts sample areas within the plot would be used. In measuring other aspects concerned with the quality of product, sampling within each plot would be essential. Similar remarks apply to the product of, for example, chemical reactions, when chemical analyses would be carried out on small subsamples.

It is, of course, crucial to distinguish between the number of experimental units and the total number of observations, which will be much greater if there is intensive sampling within units. Precision of treatment contrasts is usually determined primarily by the number of units and only secondarily by the number of repeat observations per unit.

Often the investigator has appreciable control over the amount of sampling per unit. As regards the number of units, there are two broad situations.

In the first the number of units is largely outside the investigator’s control. The question at issue is then usually whether the resources are adequate to produce an answer of sufficient precision to be useful and hence to justify proceeding with the investigation. Just occasionally it may be that the resources are unnecessarily great and that it may be sensible to begin with a smaller investigation. The second possibility is that there is a substantial degree of control over the number of units to use and in that case some calculation of the number needed to achieve reasonable precision is required. In both cases some initial consideration of the number of units is very desirable although, as we shall see, there is substantial arbitrariness involved and elaborate calculations of high apparent precision are very rarely if ever justified.

If the investigation is of a new or especially complex kind it will be important to do a pilot study of as many aspects of the procedure as feasible. The resources devoted to this will, however, typically be small compared with the total available and this aspect will not be considered further.

A further general aspect concerns the time scale of the investigation. If results are obtained quickly it may be sensible to proceed in relatively small steps, calculating confidence limits from time to time and stopping when adequate precision has been achieved. To avoid possible biases it will, however, be desirable to set a target precision in advance.

A decision-theoretic formulation is outlined in a later subsection, but we deal first with some rather informal procedures that are commonly used. The importance of these arguments in fields such as clinical trials, where they are widely applied, is probably in ensuring some uniformity of procedure. If experience indicates that a certain level of precision produces effective results the calculations provide some check that in each new situation broadly appropriate procedures are being followed; this, of course, is by no means the same as using the same number of experimental units in all contexts.

We deal first with the number of experimental units and subsequently with the issue of sampling within units. In much of the discussion we consider an experiment to compare two treatments, T and C, the extension to more than two treatments and to simple factorial systems being immediate.

8.1.2 Precision and power

We have in the main discussion in this book emphasized the objective of estimating treatment contrasts, in the simplest case differences between pairs of treatments and in more complex cases factorial contrasts. In the simplest cases these are estimated with a standard error of the form σ√(2/m), where m is related to the total number n of experimental units. For example, if comparison of pairs of treatments is involved, m = n/v, when v equally replicated treatments are used. The standard deviation σ is residual to any blocking system used. Approximate confidence limits are directly calculated from the standard error.

A very direct and appealing formulation is to aim that the standard error of contrasts of interest is near to some target level, d, say. This could also be formulated in terms of the width of confidence intervals at some chosen confidence levels. Direct use of the standard error leads to the choice

m = 2σ²/d²  (8.1)

and hence to an appropriate n. If the formulation is in terms of a standard error required to be a given fraction of an overall mean, for example 0.05 times the overall mean, σ is to be interpreted as a coefficient of variation. If the comparison is of the means of Poisson variables or of the probabilities of binary events, essentially minor changes are needed.
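
As a sketch of the calculation (8.1) (the numerical values are illustrative, not from the text):

```python
import math

def units_per_treatment(sigma, d):
    # m = 2*sigma^2 / d^2 from (8.1), rounded up to an achievable integer.
    return math.ceil(2 * sigma ** 2 / d ** 2)

# Illustrative numbers: residual sd 4.0, target standard error 1.0 for the
# difference of two treatment means.
print(units_per_treatment(4.0, 1.0))  # 32 units per treatment
```

Note that halving the target standard error d quadruples m, the point made below about the cost of substantial gains in precision.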

Now while we have put some emphasis in the book on the estimation of precision internally from the experiment itself, usually via an appropriate residual sum of squares, it will be rare that there is not some rough idea from previous experience about the value of σ². Indeed if there is no such value it will probably be particularly unwise to proceed much further without a pilot study which, in particular, will give an approximate value for σ². Primarily, therefore, we regard the above simple formula as the one to use in discussions of appropriate sample size. Note that to produce substantial improvements in precision via increased replication large changes in m and n are needed, a four-fold increase to halve the standard error. We return to this point below.

A conceptually more complicated but essentially equivalent procedure is based on the power of a test of the null hypothesis that a treatment contrast, for example a simple difference, is zero. If ∆0 represents a difference which we wish to be reasonably confident of detecting if present, we may require power (1 − β) in a test at significance level α to be achieved at ∆ = ∆0. If we consider a one-sided test for simplicity, this leads under normality to

∆0 = (kα + kβ)σ√(2/m),  (8.2)

where Φ(−kα) = α and, correspondingly, Φ(−kβ) = β.


This is equivalent to requiring the standard error to be ∆0/(kα+kβ).
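
The two formulations can be checked numerically. The sketch below solves (8.2) for m using the standard normal quantile function; the default α and β are illustrative choices:

```python
from math import ceil
from statistics import NormalDist

def m_for_power(delta0, sigma, alpha=0.05, beta=0.10):
    # Solve (8.2) for m: Delta0 = (k_alpha + k_beta) * sigma * sqrt(2/m),
    # a one-sided test of size alpha with power 1 - beta at Delta = Delta0.
    k_alpha = NormalDist().inv_cdf(1 - alpha)
    k_beta = NormalDist().inv_cdf(1 - beta)
    return ceil(2 * (sigma * (k_alpha + k_beta) / delta0) ** 2)

# Illustrative numbers: detect a difference equal to one residual sd.
print(m_for_power(delta0=1.0, sigma=1.0))
```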

Quite apart from the general undesirability of focusing the objectives of experimentation on hypothesis testing, note that the interpretation in terms of power requires the specification of three somewhat arbitrary quantities, many combinations of which lead to the same choice of m. For that reason a formulation directly in terms of a target standard error is to be preferred. A further general comment concerns the dependence of standard error on m or n. The inverse square-root dependence holds so long as the effective standard deviation, σ, does not depend on n. In practice, for a variety of reasons, if n varies over a very wide range it is quite likely that σ increases slowly with n, for example because it is more difficult to maintain control over large studies than over small studies. For that reason the gains in precision achieved by massive increases in n are likely to be less than those predicted above.

Finally, if observations can be obtained and analysed quickly it may be feasible simply to continue an investigation until the required standard error, d, has been achieved rather than having a prior commitment to a particular n.

8.1.3 Sampling within units

We now suppose that on each of n experimental units r repeat observations are taken, typically representing a random sample of material forming the unit. Ignoring the possibility that the sampled material is an appreciable proportion of the whole, so that a finite population correction is unnecessary, the effective variance per unit is

σb² + σw²/r,  (8.3)

where σb² and σw² are respectively components of variance between and within units: see Appendix A.3. The precision of treatment comparisons is determined by

(σb² + σw²/r)/n  (8.4)

and we assume that the cost of experimentation is proportional to

κn + rn,  (8.5)

where κ is the ratio of the cost of a unit to the cost of a sampled observation within a unit.


We may now either minimize the variance for a given cost or minimize the cost for a given variance, which essentially leads to minimizing the objective function

(σb² + σw²/r)(κ + r).  (8.6)

The optimum value of r for the estimation of treatment contrasts is κ^{1/2}σw/σb. Also, because the third derivative of the objective function is negative, the function falls relatively steeply to and rises relatively slowly from its minimum, so that it will often be best to take the integer larger than the in general non-integer value given by the formula. The arguments for this are accentuated if either estimation of the components of variance is of intrinsic interest or if it is required to take special precautions against occasional bad values.
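
The shape of the objective function (8.6), and the argument for rounding the non-integer optimum upwards, can be seen numerically; the variance components and cost ratio below are illustrative values only:

```python
import math

def objective(r, sigma_b, sigma_w, kappa):
    # (sigma_b^2 + sigma_w^2/r) * (kappa + r), the quantity (8.6)
    return (sigma_b ** 2 + sigma_w ** 2 / r) * (kappa + r)

sigma_b, sigma_w, kappa = 1.0, 2.0, 2.0
r_opt = math.sqrt(kappa) * sigma_w / sigma_b  # continuous optimum, ~2.83
costs = {r: objective(r, sigma_b, sigma_w, kappa) for r in range(1, 11)}
r_best = min(costs, key=costs.get)
print(r_opt, r_best)
# The objective rises more slowly above the optimum than it falls below,
# so rounding the continuous optimum upwards is the safer choice.
```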

There will usually be strong arguments for taking the same value of r for all units. A possible exception is when the formal optimum given above is only slightly greater than one, when a balanced subsample of units should be selected, preferably with an element of randomization, to be sampled twice.

A final general point is that observations on repeat samples within the same unit should be measured blind to their identity. Otherwise substantial underestimation of error may arise and at least some of the advantages of internal replication may be lost.

8.1.4 Quasi-economic analysis

In principle the choice of number of units involves a balance between the cost of experimental units and the losses arising from imprecision in the conclusions. If these can be expressed in common units, for example of time or money, an optimum can be determined. This is a decision-oriented formulation and involves formalizing the objective of the investigation in decision-theoretic terms.

While, even then, it may be rather rare to be able to attach meaningful numbers to the costs involved, it is instructive to examine the resulting formulae. There are two rather different cases depending on whether an essentially continuous choice or a discrete one is involved in the experiment.

As probably the simplest example of the first situation suppose


that a regression relation of the form

E(Y; x) = β0 + β1x + β2x² = m(x),  (8.7)

is investigated, for x ∈ [−1, 1], with a design putting n/3 observations at each of x = −1, 0, 1. Assume that β2 > 0 so that the minimum response is at θ = −β1/(2β2) and that this is estimated via the least squares estimates of β1, β2, leading to θ̂ = −β̂1/(2β̂2).

Now suppose that Y represents the cost per unit yield, and that the objective is to achieve minimum response in future applications. The loss compared with complete knowledge of the parameters can thus be measured via

m(θ̂) − m(θ) = β1(θ̂ − θ) + β2(θ̂² − θ²),  (8.8)

with expected value

β2E(θ̂²) + β1E(θ̂) + β1²/(4β2).  (8.9)

Now make the optimistic assumption that via preliminary investigation the design has been correctly centred so that, while β1 has still to be estimated, its true value is small and curvature is the predominant effect in the regression. Then approximately

var(θ̂) = var(β̂1)/(4β2²)  (8.10)
       = 3σ²/(8nβ2²)  (8.11)

under the usual assumptions on error. If now the conclusions are applied to a target population equivalent to N experimental units and if cy is the cost of unit increase in Y per amount of material equivalent to one experimental unit, then the expected loss arising from errors in estimating θ is

3Ncyσ²/(8nβ2),  (8.12)

whereas the cost of experimentation, ignoring set-up cost and assuming the cost cn per unit is constant, is ncn, leading to an optimum value of the number of units of

{3Ncyσ²/(8cnβ2)}^{1/2}.  (8.13)

This has the general dependence on the defining parameters that might have been anticipated; note, however, that the approximations involved preclude the use of the formula in the “flat” case when both β1, β2 are very small.
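
The closed form (8.13) can be checked against direct minimization of expected loss (8.12) plus experimental cost; the parameter values below are illustrative only:

```python
import math

def total_cost(n, N, c_y, c_n, sigma, beta2):
    # expected loss (8.12) plus experimental cost n*c_n
    return 3 * N * c_y * sigma ** 2 / (8 * n * beta2) + n * c_n

def n_opt(N, c_y, c_n, sigma, beta2):
    # closed form (8.13)
    return math.sqrt(3 * N * c_y * sigma ** 2 / (8 * c_n * beta2))

params = dict(N=10000, c_y=1.0, c_n=5.0, sigma=2.0, beta2=0.5)
n_star = n_opt(**params)
grid = min(range(1, 5000), key=lambda n: total_cost(n, **params))
print(n_star, grid)  # the grid minimum sits next to the closed form
```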

Now suppose that a choice between just two treatments T, C is involved. We continue to suppose that the response variable Y is such that its value is to be minimized and assume that for application to a target population equivalent to N units the difference in costs is Ncy times the difference of expected values under the two treatments, i.e. is Ncy∆, say. On the basis of an experiment involving n experimental units we estimate ∆ by ∆̂ with variance 4σ²/n. For simplicity suppose that there is no a priori cost difference between the treatments so that the treatment with smaller estimated mean is chosen. Then the loss arising from errors of estimation is zero if ∆ and ∆̂ have the same sign and is Ncy|∆| otherwise. For given ∆ the expected loss is thus

Ncy|∆|P(∆̂∆ < 0; ∆) = Ncy|∆|Φ{−|∆|√n/(2σ)}.  (8.14)

The total expected cost is this plus the cost of experimentation, taken to be cnn.

The most satisfactory approach is now to explore the total cost as a function of n for a range of plausible values of ∆, the dependence on Ncy/cn being straightforward. A formally more appealing, although often ultimately less insightful, approach is to average over a prior distribution of ∆. For simplicity we take the prior to be normal of zero mean and variance τ²: except for a constant independent of n the expected cost becomes

−Ncyτ√n/√{2π(n + 4κ⁻²)} + ncn,  (8.15)

where κ = τ/σ. The optimum n can now be found numerically as a function of κ and of Ncyτ/cn. If κ is very large, a situation in which a large value of n will be required, the optimum n is approximately

Ncyτ/{√(2π)cn}.  (8.16)

The dependence on target population size and cost ratio is more sensitive than in the optimization problem discussed above, presumably because of the relatively greater sensitivity to small errors of estimation.

When the response is essentially binary, for example satisfactory and unsatisfactory, an explicit assignment of cost ratios may sometimes be evaded by supposing that the whole target population is available for experimentation, that after an initial phase in which n units are randomized between T and C the apparently superior treatment is chosen and applied to the remaining (N − n) units. The objective is to maximize the number of individuals giving satisfactory response, or, equivalently, to maximize the number of individuals receiving the superior treatment. Note that n/2 individuals receive the inferior treatment in the experimental phase. The discussion assumes absence of a qualitative treatment by individual interaction.

We argue very approximately as follows. Suppose that the proportions satisfactory are between, say, 0.2 and 0.8 for both treatments. Then the difference ∆ between the proportions satisfactory is estimated with a standard error of approximately 0.9/√n; this is because the binomial variance ranges from 0.25 to 0.16 over the proportions in question and 0.2 is a reasonable approximation for exploratory purposes. The probability of a wrong choice is thus Φ(−|∆|√n/0.9) and the expected number of wrongly treated individuals is

n/2 + (N − n)Φ(−|∆|√n/0.9).  (8.17)

Again we may either explore this as a function of (n, ∆) or average over a prior distribution of ∆. With a normal prior of zero mean and variance τ² we obtain an expected number of wrongly treated individuals of

n/2 + {(N − n)/π} tan⁻¹{0.9/(τ√n)}.  (8.18)

For fixed N and τ the value of n minimizing (8.18), nopt, say, is readily computed. The optimum proportion, nopt/N, is shown as a function of N and τ in Figure 8.1. This proportion depends more strongly on τ than on N, especially for τ between about 0.5 and 3. As τ approaches 0 the proportion wrongly treated becomes 0.5, and as τ approaches ∞ the number wrongly treated is n/2.
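
The behaviour described here can be reproduced by minimizing (8.18) directly; in this illustrative sketch nopt is found by grid search over n:

```python
import math

def expected_wrong(n, N, tau):
    # n/2 + ((N - n)/pi) * arctan(0.9/(tau*sqrt(n))), equation (8.18)
    return n / 2 + (N - n) / math.pi * math.atan(0.9 / (tau * math.sqrt(n)))

def n_opt(N, tau):
    return min(range(1, N + 1), key=lambda n: expected_wrong(n, N, tau))

# Larger prior spread tau means the better treatment is easier to spot,
# so a smaller experimental phase is optimal.
for tau in (0.5, 1.0, 3.0):
    n = n_opt(100, tau)
    print(tau, n, n / 100)
```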

While the kinds of calculation outlined in this section throw some light on the considerations entering choice of scale of effort, their immediate use in applications is limited by two rather different considerations. First, the explicit formulation of the objective of experimentation in a decision-making framework may be inapplicable. Thus, even though the formulation of maximizing the number of correctly treated individuals is motivated by clinical trial applications, it is nearly always a considerable oversimplification


Figure 8.1 Optimum proportion to be entered into experiment, nopt/N, as a function of target population, N, and prior variance τ of the difference of proportions: N = 100 (———), 200 (– – –), 50 (· · · · · ·). [Plot: τ on the horizontal axis, 0 to 5; optimum proportion on the vertical axis, 0.05 to 0.30.]

to suppose that strategies for treating whole groups of patients are determined by the outcome of one study. Secondly, even if the decision-theoretic formulation of objectives is reasonably appropriate, it may be extremely hard to attach meaningful numbers to the quantities determining n.

For that reason the procedures commonly used in considerations of appropriate n are the more informal ones outlined in the earlier subsections.

8.2 Adaptive designs

8.2.1 General remarks

In the previous discussion it has been assumed implicitly that each experiment is designed and implemented in one step or at least in


substantial sections dealt with in turn. If, however, experimental units are used in sequence and the response on each unit is obtained quite quickly, much greater flexibility in design is possible. We consider mostly an extreme case in which the response of each unit becomes available before the next unit has to be randomized to its treatment. Similar considerations apply in the less extreme case where experimental units are dealt with in smallish groups at a time or there is a delay before each response is obtained, during which a modest number of further experimental units can be entered. The greater flexibility can be used in various ways. First, we could choose the size of the experiment in the light of the observations obtained, using so-called sequential stopping rules. Secondly, we might modify the choice of experimental units, treatments and response variables in the light of experience. Finally, we could allocate a treatment to the next experimental unit in the light of the responses so far obtained.

Illustrations. An agricultural field trial will typically take a growing season to produce responses, a rotation experiment much longer and an experiment on pruning fruit trees many years. Adaptive treatment allocation is thus of little or no relevance. By contrast industrial experiments, especially on batch processes, and some laboratory experiments may generate responses very quickly once the necessary techniques are well established. Clinical trials on long-term conditions such as hypertension may involve follow-up of patients for several years; on the other hand where the immediate relief of symptoms of, for instance, asthma is involved, adaptive allocation may become appropriate.

We concentrate here largely on adaptive treatment allocation, with first some brief comments on modifications of the primary features of the experiment.

8.2.2 Modifications of primary features

Considered on a sufficiently long time-scale the great majority of experimentation is intrinsically sequential; the analysis and interpretation of one experiment virtually always raises further questions for investigation and in the meantime other relevant work may well have appeared. Thus the possibility of frequent or nearly continuous updating of design and objectives is, at first sight, very appealing, but there are dangers. Especially in fields where new


ideas arise at quite high speed, there can be drawbacks to changing, especially, the focus of investigations until reasonably unambiguous results have been achieved.

The broad principles underlying possible changes in key aspects of the design in the course of an experiment are that it should be possible to check for a possible systematic shift in response after the change and that a unified analysis of the whole experiment should be available. For example, changes in the definition of an experimental unit may be considered.

Illustration. Suppose that in a clinical trial patients are, in particular, required to be in the age range 55–65 years, but that after some time it is found that patient accrual is much slower than expected. Then it may be reasonable to relax the trial protocol to allow a wider age range.

In general it may be proposed to change the acceptable values of one or more baseline variables, z. Analysis of the likely effects of this change is, of course, important before a decision is reached. One possibility is that the treatment effects under study interact with z. This might take the form of a linear interaction with z, or, perhaps more realistically in the illustration sketched above, the approximation that a treatment difference ∆ under the original protocol becomes a treatment difference κ∆ for individuals who pass the new, but not the old, entry requirements; here κ is an unknown constant, with probably 0 < κ < 1, especially if the original protocol was formulated to include only individuals likely to show a treatment effect in the most sensitive form. Another possibility is that while the treatment effect ∆ is the same for the new individuals the variance is increased.

A rather crude analysis of this situation is as follows. Suppose that an analysis is considered in which, if the protocol is extended, the null hypothesis ∆ = 0 is tested via the mean difference over all individuals. Let there be m units on each treatment, a proportion p of which meet the original protocol. The sensitivity of a test using only the individuals meeting the original protocol will be determined by the quantity

∆√(pm)/σ,  (8.19)

where σ is the standard deviation of an individual difference. For the mean difference over all individuals the corresponding measure is

{p∆ + (1 − p)κ∆}√m/[σ√{p + γ(1 − p)}],  (8.20)

where γ ≥ 1 is the ratio of the variance of the individuals in the extended protocol to those in the original.

There is thus a formal gain from including the new individuals if and only if

p + (1 − p)κ > {p² + p(1 − p)γ}^{1/2}.  (8.21)

The formal advantages of extending the protocol would be greater if a weighted analysis were used to account for differences of variance, possibly together with an ordered alternative test to account for the possible deflation of the treatment effect.
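
Condition (8.21) is easy to explore numerically; in the sketch below the parameter values are illustrative, not from the text:

```python
import math

def gain_from_extension(p, kappa, gamma):
    # Formal gain iff p + (1-p)*kappa > sqrt(p**2 + p*(1-p)*gamma), i.e.
    # (8.21), comparing sensitivities (8.20) (all individuals) and (8.19)
    # (original-protocol individuals only).
    return p + (1 - p) * kappa > math.sqrt(p ** 2 + p * (1 - p) * gamma)

# Illustrative: half the entrants meet the original protocol; new entrants
# show 80% of the effect with 20% more variance.
print(gain_from_extension(p=0.5, kappa=0.8, gamma=1.2))
# With a strongly diluted effect the extension no longer pays:
print(gain_from_extension(p=0.5, kappa=0.1, gamma=1.2))
```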

Note that amending the requirements for entry into an experiment is not the same as so-called enrichment entry. In this all units are first tested on one of the proposed treatments and only those giving appropriate responses are randomized into the main experiment. There are various possible biases inherent in this procedure.

Modification of the treatments used in the light of intermediate results is most likely by:

1. omitting treatments that appear to give uninteresting results;

2. introducing new and omitting current factors in fractional factorial experiments (see Section 5.6);

3. changing the range of levels used for factors with quantitative levels; see Section 8.3 below.

If the nature of the response variable is changed, it will be important to ensure that the change does not bias treatment contrasts but also, unless the change is very minor, to aim to collect both new and old response variables on a nontrivial number of experimental units; see Exercise 8.1.

8.2.3 Modification of treatment allocation: two treatments

We now consider for simplicity the comparison of two treatments T, C; the arguments extend fairly directly to the comparison of any small number of treatments. If the primary objective is the estimation of the treatment difference, then there will be no case for abandoning equal replication unless either the variance of the response or the cost of implementation is different for the two treatments. For example, if T involves extensive modifications or novel


substances it may be appreciably more expensive than C and this may only become apparent as the work progresses. If, however, an objective is, as in Section 8.1, to treat as many individuals as possible successfully, there are theoretically many possible ways of implementing adaptive allocation, either to achieve economy or to address ethical considerations. We consider in more detail the situation of Section 8.1 with binary responses.

The formulation used earlier involved a target population of N individuals, randomized with equal representation of T and C, until some point after which all individuals received the same treatment. Clearly there are many possibilities for a smoother transition between the two regimes which, under some circumstances, may be preferable. The question of which is in some sense optimum depends rather delicately on the balance between the somewhat decision-like formulation of treating individuals optimally and the objective of estimating a treatment difference, and may also involve ethical considerations.

The play-the-winner rule in its simplest form specifies that if and only if the treatment applied to the rth unit yields a success, the same treatment is used for the (r + 1)st unit. Then if this continues for a long time the proportion of individuals allocated to T tends to θC/(θT + θC), where θT and θC are respectively the probabilities of failure under T and C. Unless the ratio of these probabilities is appreciable, the concentration of resources on the superior treatment is thus relatively modest. Also, in some contexts, the property that the treatment to be assigned to a new unit is known as soon as the response on the previous unit is available would be a source of serious potential bias. There are various modifications of the procedure incorporating some randomization that would overcome this to some extent.
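
The limiting allocation proportion under the play-the-winner rule can be checked by simulation (an illustrative sketch; the failure probabilities are invented for the example):

```python
import random

def play_the_winner(theta_T, theta_C, n_units, rng):
    # Simplest rule: keep the current treatment after a success, switch
    # after a failure; theta_T, theta_C are the failure probabilities.
    current, n_T = "T", 0
    for _ in range(n_units):
        if current == "T":
            n_T += 1
        fail_p = theta_T if current == "T" else theta_C
        if rng.random() < fail_p:
            current = "C" if current == "T" else "T"
    return n_T / n_units

rng = random.Random(42)
prop_T = play_the_winner(theta_T=0.2, theta_C=0.4, n_units=200_000, rng=rng)
print(prop_T)  # close to theta_C/(theta_T + theta_C) = 2/3
```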

A further point is that if after n units have been used the numbers of units and numbers of successes are respectively nT, nC and rT, rC under T and C, then these are sufficient statistics for the defining parameters provided these parameters are stable in time. A concept of what might be called design sufficiency suggests that if a data-dependent allocation rule is to be used it should be via the sufficient statistics. The play-the-winner rule is a local one and is not in accord with this. This suggests that it can be improved unless the success probabilities vary in time, in which case a more local rule might indeed be appealing.

There are many possibilities for an allocation rule in which the


probability that the (r + 1)st unit is allocated to T depends on the current sufficient statistic. The simplest is the biased coin randomization scheme in which the probability of allocation to T is 1/2 + c if the majority of past results favour T, 1/2 − c if the majority favour C and 1/2 if there is balance; in some versions the probability of 1/2 is maintained for the first n0 trials. Here c is a positive constant, equal say to 1/6, so that the allocation probabilities are 2/3, 1/3 and 1/2, respectively.

Much more generally, in any experiment in which units enter and are allocated to treatments in order, biased coin randomization can be used to steer the design in any desired direction, for example towards balance with respect to a set of baseline variables, while retaining a strong element of randomization which might be lost if exact balance were enforced.

The simplest illustration of this idea is to move towards equal replication. Define an indicator variable

∆r = +1 if the rth unit receives T, −1 if the rth unit receives C,  (8.22)

and write Sr = ∆1 + . . . + ∆r. Then biased coin randomization can be used to move towards balance by making the probability that ∆r+1 = 1 equal to 1/2 − c, 1/2, 1/2 + c according as Sr >, =, < 0.

While it is by no means essential to base the analysis of such a design on randomization theory, some broad correspondence with that theory is a good thing. In particular the design forces fairly close balance in the treatment allocation over medium-sized time periods. Thus if there were a long term trend in the properties of the experimental units that trend would largely be eliminated and thus an analysis treating the data as two independent samples would overestimate error, possibly seriously.

Some of the further arguments involved can be seen from a sim-ple special case. This is illustrated in Table 8.1, the second line ofwhich, for example, has probability

1/2 × 1/3 × 1/3 × 2/3.   (8.23)

To test the null hypothesis of treatment equivalence, i.e. the null hypothesis that the observation on any unit is unaffected by treatment allocation, a direct approach is first to choose a suitable test statistic, for example the difference in mean responses between the two treatments. Then under the null hypothesis the value of the statistic can be computed for all possible treatment configurations,


the probability of each evaluated and hence the null hypothesis distribution of the test statistic found. Unfortunately this argument is inadequate. For example, in two of the arrangements of Table 8.1, all units receive the same treatment and hence provide no information about the hypothesis. More generally the more balanced the arrangement the more informative the data. This suggests that the randomization should be conditioned on a measure of the balance of the design, for example on the terminal value of Sr. If a time trend in response is suspected the randomization could be conditioned also on further properties concerned with any time trend in treatment balance; of course this requires a trial of sufficient length.

This makes the randomization analysis less simple than it might appear.
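To make the conditioning concrete, the sequence probabilities of Table 8.1 and the conditional null distribution of a test statistic can be enumerated exactly for n = 4; the response values y below are invented purely for illustration:

```python
from itertools import product
from fractions import Fraction
from collections import defaultdict

def seq_prob(deltas, c=Fraction(1, 6)):
    """Probability of a +/-1 allocation sequence under the biased coin rule."""
    half, prob, s = Fraction(1, 2), Fraction(1), 0
    for d in deltas:
        p_T = half - c if s > 0 else (half + c if s < 0 else half)
        prob *= p_T if d == 1 else 1 - p_T
        s += d
    return prob

all_seqs = list(product((1, -1), repeat=4))
assert sum(seq_prob(d) for d in all_seqs) == 1
print(seq_prob((1, 1, 1, -1)))       # 1/27 = 2/54, the second line of Table 8.1

# conditional null distribution of T = sum y_r Delta_r given balance S_4 = 0
y = [5.1, 4.8, 6.0, 5.5]             # illustrative responses
balanced = [d for d in all_seqs if sum(d) == 0]
total = sum(seq_prob(d) for d in balanced)
dist = defaultdict(Fraction)
for d in balanced:
    t = round(sum(yi * di for yi, di in zip(y, d)), 6)
    dist[t] += seq_prob(d) / total
assert sum(dist.values()) == 1
```

Conditioning simply restricts the enumeration to the balanced sequences and renormalizes their probabilities.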

To study the consequences analytically we amend the randomization scheme to one in which for some suitable small positive value of ε we arrange that

E(∆r+1 | Sr = s) = −εs. (8.24)

For small ε the Sr form a first order autoregressive process. If we define the test statistic Tn to be the difference Σ yr∆r, where yr is the response on the rth unit, it can be shown that asymptotically (Tn, Sn) has a bivariate normal distribution of zero mean and

Table 8.1 Treatment allocation for four units; biased coin design with probabilities 1/3, 1/2, 2/3. There are 8 symmetrical arrangements starting with −1.

Treatment allocation      S4    Prob.

 1  1  1  1                4    1/54
 1  1  1 −1                2    2/54
 1  1 −1  1                2    2/54
 1 −1  1  1                2    3/54
 1  1 −1 −1                0    4/54
 1 −1  1 −1                0    6/54
 1 −1 −1  1                0    6/54
 1 −1 −1 −1               −2    3/54


covariance matrix specified by

var(Tn) = nc0 − nε Σ cr(1 − ε)^r,   (8.25)

cov(Sn, Tn) = −ε Σ_{r≠s} yr(1 − ε)^{|r−s|}/2,   (8.26)

var(Sn) = (2ε)^{−1},   (8.27)

where

ck = n^{−1} Σ Yr Yr+k   (8.28)

and the sum in the first formula is from r = 1, . . . , n − 1. This suggests that for a formal asymptotic theory one should take ε to be proportional to 1/n.

An approximate test can now be constructed from the asymptotically normal conditional distribution of Tn given Sn = s, which is easily calculated from the covariance matrix. More detailed calculation shows that if there is long- or medium-term variation in the yr then the variance of the test statistic is much less than that corresponding to random sampling.

In this way some protection against bias and sometimes a substantial improvement over total randomization are obtained. The main simpler competitor is a design in which exact balance is enforced within each block of 2k units for some suitable fairly small k. The disadvantage of this is that after some point within each block the allocation of the next unit may be predetermined, with a consequent possibility of selection bias.

8.3 Sequential regression design

We now consider briefly some of the possibilities of sequential treatment allocation when each treatment corresponds to the values of one or more quantitative variables. Optimal design for various criteria within a linear model setting typically leads to unique designs, there being no special advantage in sequential development unless the model to be fitted or the design region change. Interesting possibilities for sequential design arise either with nonlinear models, when a formally optimal design depends on the unknown true parameter value, or with unusual objectives.

As an example of the second, we consider a simple regression problem in which the response Y depends on a single explanatory variable x, so that E(Y ; x) = ψ(x), where ψ(x) may either be an unknown but typically monotonic function or in some cases may be parameterized. Suppose that we wish to estimate the value of


xp, say, assumed unique, such that

ψ(xp) = p. (8.29)

A special case concerns binary responses when xp is the factor level at which a proportion p of successful responses is achieved. If p = 1/2 this is the so-called ED50 point when the factor level is dose of a drug.

There are two broad classes of procedure which can now be used for sequential allocation. In one there is a preset collection of levels of x, often equally spaced, and a rule is specified for moving between them. In the more elaborate versions the set of levels of x varies, typically shrinking as the target value is neared.

Note that if the function ψ(x) is parameterized, optimal design for the appropriate parametric function can be used and this will typically require dispersed values of x. The methods outlined here are primarily suitable when ψ(x) is to be regarded nonparametrically.

For a fixed set of levels, in the so-called up and down or staircase method a rule is specified for moving to the next higher or next lower level depending usually only on the current observation. Note that this will conflict with design sufficiency. The rule is chosen so that the levels of x used cluster around the target; estimation is typically based on an average of the levels used, eliminating a transient effect arising from the initial conditions. Thus with a continuous response the simplest rule, when ψ(x) is increasing, is to move up if the response is less than p, down if it is greater than p. If the rule depends only on the last response observed the system forms a Markov chain.
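A minimal simulation of the staircase rule for a binary response; the logistic curve, the grid of levels, the seed and the run length are illustrative assumptions, not taken from the text:

```python
import math
import random

def up_and_down(levels, psi, n, start, seed=2):
    """Staircase method for binary responses: move one level down after a
    success, one level up after a failure; the levels cluster near the ED50."""
    rng = random.Random(seed)
    i, used = start, []
    for _ in range(n):
        used.append(levels[i])
        success = rng.random() < psi(levels[i])
        i = max(i - 1, 0) if success else min(i + 1, len(levels) - 1)
    return used

levels = [k / 10 for k in range(-20, 21)]    # preset, equally spaced levels
psi = lambda x: 1 / (1 + math.exp(-2 * x))   # assumed true curve; ED50 at x = 0
used = up_and_down(levels, psi, 500, start=0)
estimate = sum(used[100:]) / len(used[100:]) # average after the transient
```

Discarding the first 100 levels removes the transient arising from the low starting dose; the average of the remaining levels estimates the ED50, here the value x = 0.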

The most widely studied version in which the levels of x vary as the experiment proceeds is the Robbins-Monro procedure in which

xr+1 = xr − a r⁻¹ (Yr − p),   (8.30)

where Yr is the response observed at xr and a > 0 is a constant to be chosen. This is in effect an up and down method in which the step length decreases at a rate proportional to 1/n; see Exercise 8.2.
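A sketch of the procedure for a continuous response; the response curve, noise level, starting point and constants are illustrative assumptions:

```python
import math
import random

def robbins_monro(psi, p, x0, a, n, noise_sd=0.2, seed=3):
    """x_{r+1} = x_r - (a/r)(Y_r - p), with Y_r the noisy response observed
    at x_r (cf. (8.30)); the step length shrinks like 1/r."""
    rng = random.Random(seed)
    x = x0
    for r in range(1, n + 1):
        y = psi(x) + rng.gauss(0, noise_sd)
        x -= (a / r) * (y - p)
    return x

psi = lambda x: 1 / (1 + math.exp(-x))       # assumed monotone response curve
x_hat = robbins_monro(psi, p=0.5, x0=3.0, a=4.0, n=5000)
print(round(x_hat, 2))                       # close to 0, where psi = 1/2
```

With a = 4 and local slope η′(0) = 1/4 here, 2aη′(p) − 1 = 1 > 0, so the asymptotic variance formula of Exercise 8.2 is finite and the iterates settle near the target.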

8.4 Designs for one-dimensional error structure

We now turn to some designs based on quite strong assumptions about the form of the uncontrolled variation; for some preliminary


remarks see Section 3.7. We deal with experimental units arranged in sequence in time or along a line in space.

Suppose then that the experimental units are arranged at equally spaced intervals in one dimension. The uncontrolled variation thus in effect defines a time series and two broad possibilities are first that there is a trend, or possibly systematic seasonal variation with known wavelength, and secondly that the uncontrolled variation shows serial correlation corresponding to a stationary time series.

In both cases, especially with fairly small numbers of treatments, grouping of adjacent units into blocks followed by use of a randomized block design or possibly a balanced incomplete block design will often supply effective error control without undue assumptions about the nature of the error process. Occasionally, however, especially in a very small experiment, it may be preferable to base the design and analysis on an assumed stochastic model for the uncontrolled variation. This is particularly likely to be useful if there is some intrinsic interest in the structure of the uncontrolled variation.

Illustration. In a textile experiment nine batches of material were to be processed in sequence comparing three treatments, T1, T2, T3, equally replicated. The batches were formed by thoroughly mixing a large consignment of raw material and dividing it at random into nine equal sections, numbered at random 1, ..., 9 and to be processed in that order. The whole experiment took appreciable time and there was some possibility that the oil on the material would slowly oxidize and induce a time trend in the responses on top of random variation; there was some intrinsic interest in such a trend. It is thus plausible to assume that in addition to any treatment effect there is a time trend plus random variation.

Motivated by such situations suppose that the response on the sth unit has the form

Ys = µ + τ[s] + β1φ1(s) + . . . + βpφp(s) + εs,   (8.31)

where τ[s] is the treatment effect for the treatment applied to unit s, φq(s) is the value at point s of the orthogonal polynomial of degree q defined on the design points and the ε's are uncorrelated errors of zero mean and variance σ². Quite often one would take p = 1 or 2 corresponding to a linear or quadratic trend.

For example with n = 8 the values of φ1(s), φ2(s) are respectively


−7  −5  −3  −1  +1  +3  +5  +7
+7  +1  −3  −5  −5  −3  +1  +7

and for n = 9

 −4   −3   −2   −1    0   +1   +2   +3   +4
+28   +7   −8  −17  −20  −17   −8   +7  +28.
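Tables of φq values like these can be reproduced by Gram-Schmidt orthogonalization of the powers of s over the design points; a minimal sketch:

```python
def orthonormal_polys(n, p):
    """phi*_0..phi*_p over design points s = 1..n, by Gram-Schmidt on
    1, s, s^2, ...; each returned vector has unit sum of squares."""
    s = [float(i) for i in range(1, n + 1)]
    basis = []
    for k in range(p + 1):
        v = [x ** k for x in s]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis

phi = orthonormal_polys(9, 2)
# rescale the quadratic so its first entry is 28, as in the table for n = 9
quad = [round(v * 28 / phi[2][0]) for v in phi[2]]
print(quad)  # [28, 7, -8, -17, -20, -17, -8, 7, 28]
```

The tabulated integer values are just the orthonormal polynomials rescaled to the smallest integer entries.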

Now for v = 3, n = 9 division into three blocks followed by randomization and the fitting of (8.31) would have treatment assignments some of which are quite inefficient; moreover no convincing randomization justification would be available. The same would apply a fortiori to v = 4, n = 8 and with rather less force to v = 2, n = 8. This suggests that, subject to reasonableness in the specific context, it is sensible in such situations to base both design and analysis directly on the model (8.31).

Then a design such that the least squares analysis of (8.31) is associated with a diagonal matrix, i.e. such that the coefficient vectors associated with treatments are orthogonal to the vector of orthogonal polynomials, will generate estimates of minimum variance.

This suggests that we consider the v × p matrix W∗ with elements

w∗iq = ΣTiφ∗q(s), (8.32)

where the sum is over all units s assigned to Ti and the asterisks refer to orthogonal polynomials normalized to have unit sum of squares over the data points. The objective is to make W∗ = 0 and where that is not combinatorially possible to make W∗ small in some sense. Some optimality requirements that can be used to guide that choice are described in Chapter 7. In some contexts some particular treatment contrasts merit special emphasis. Often, however, there results a design with all the elements w∗iq small and the choice of specific optimality criterion is not critical.

The form of the optimal solution can be seen by examining first the special case of two treatments each replicated r times, n = 2r. We take the associated linear model in the slightly modified form

E(Ys) = µ ± δ + β1φ∗1(s) + . . . + βpφ∗p(s),   (8.33)

where the treatment effect is 2δ and the normalized form of the orthogonal polynomials is used. The matrix determining the least


squares estimates and their precision is

⎛ n     0     0     0   . . .   0  ⎞
⎜ 0     n    d∗1   d∗2  . . .  d∗p ⎟
⎜ 0    d∗1    1     0   . . .   0  ⎟
⎜ .     .     .     .           .  ⎟
⎝ 0    d∗p    0     0   . . .   1  ⎠ ,   (8.34)

where

d∗q = ΣT2 φ∗q(i) − ΣT1 φ∗q(i).   (8.35)

The inverse of the bordered matrix is easily calculated and from it the variance of 2δ̂, the estimated treatment effect, is

(2σ²/r)(1 − Σ d∗t²/(2r))⁻¹.   (8.36)

Some of the problems with the approach are illustrated by the case n = 8. For p = 2 the design

T2 T1 T1 T2 T1 T2 T2 T1

achieves exact orthogonality, d∗1 = d∗2 = 0, so that quadratic trend elimination and estimation is achieved without loss of efficiency. On the other hand the design is sensitive to cubic trends and if a cubic trend is inserted into the model by introducing φ3(s) the variance of the estimated treatment effect is almost doubled, i.e. the design is only 50% efficient. A design specifically chosen to minimize variance for p = 3 has about 95% efficiency.
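These efficiency statements can be checked numerically. The sketch below recomputes the d∗q for the design above and the variance inflation factor of (8.36); the Gram-Schmidt construction of the φ∗q is an assumption about how the tabulated polynomials are normalized:

```python
def orthonormal_polys(n, p):
    """phi*_0..phi*_p over s = 1..n by Gram-Schmidt; unit sums of squares."""
    s = [float(i) for i in range(1, n + 1)]
    basis = []
    for k in range(p + 1):
        v = [x ** k for x in s]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis

def variance_factor(alloc, p):
    """(1 - sum_q d*_q^2/(2r))^{-1}: the inflation of var(2 delta-hat) in
    (8.36) relative to an exactly orthogonal design; alloc uses +/-1."""
    n, r = len(alloc), len(alloc) // 2
    phi = orthonormal_polys(n, p)
    d2 = sum(sum(a * f for a, f in zip(alloc, phi[q])) ** 2
             for q in range(1, p + 1))
    return 1 / (1 - d2 / (2 * r))

design = [+1, -1, -1, +1, -1, +1, +1, -1]   # T2 T1 T1 T2 T1 T2 T2 T1
print(variance_factor(design, 2))           # 1.0: orthogonal to linear and quadratic
print(variance_factor(design, 3))           # about 1.94 once a cubic trend is fitted
```

The inflation of about 1.94 under a cubic trend corresponds to the roughly 50% efficiency quoted in the text.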

For general values of the number, v, of treatments similar arguments hold. The technically optimal design depends on the specific criterion and for example on whether some treatment contrasts are of more concern than others, but usually the choice for given p is not critical.

Suppose next that the error structure forms a stationary time series. We deal only with a first-order autoregressive form in which the Gaussian log likelihood is

−log σ − (n − 1) log ω − (y1 − µ − τ[1])²/(2σ²)
  − Σ {ys − µ − τ[s] − ρ(ys−1 − µ − τ[s−1])}²/(2ω²),   (8.37)

where τ[s] is the treatment effect for the treatment applied to unit s, where ρ is the correlation parameter of the autoregressive process and where σ² and ω² are respectively the marginal and innovation variances of the process, so that σ² = ω²/(1 − ρ²). The formally


anomalous role of the first error component arises because it is assumed to have the stationary distribution of the process.

Maximum likelihood can now be applied to estimate the parameters. It is plausible, and can be confirmed by detailed calculation, that at least for ρ ≥ 0, the most common case, a suitable design strategy is to aim for neighbourhood balance. That is, every pair of different treatments should occur next to one another the same number of times. Note, however, that for ρ < 0 allocating the same treatment to some pairs of adjacent units enhances precision.

It is instructive and of intrinsic interest to consider the limiting case ρ = 1 corresponding to a random walk error structure. In this the error component for the sth experimental unit is

ζ1 + . . .+ ζs,

where the ζs are uncorrelated random variables of zero mean and variance σ²ζ, say. Suppose that Tu is adjacent to Ts λsu times, for s < u. We may replace the full set of responses by the first response and the set of differences between adjacent responses. Under the proposed error structure the resulting errors are uncorrelated. Moreover the initial response is the only one to have an expectation depending on the overall mean µ and hence is uninformative about treatment contrasts. Thus the design is equivalent to an incomplete block design with two units per block and with no possibility of recovery of interblock information. When the treatments are regarded symmetrically it is known that a balanced incomplete block design is optimal, i.e. that neighbourhood balance is the appropriate design criterion.

Suppose that this condition is exactly or nearly satisfied, so that there are λ occurrences of each pair of treatments. Then with v treatments there are n = λv(v − 1)/2 units in the effective incomplete block design and thus r = λ(v − 1)/2 replicates of each treatment. Note, however, that in the notional balanced incomplete design associated with the system each treatment is replicated 2r times, because each unit contributes to two differences. The efficiency factor for the balanced incomplete block design is v/{2(v − 1)}, so that the variance of the difference between two treatments is, from Section 4.2.4,

σ²ζ (v − 1)/(vr).   (8.38)

Suppose now that the system is investigated by a standard randomized block design with its associated randomization analysis.


Within one block the unit terms are of the form

ζ1, ζ1 + ζ2, . . . , ζ1 + . . .+ ζv (8.39)

and the effective error variance for the randomized block analysis is the expected mean square within this set, namely

σ²ζ (v + 1)/6.   (8.40)

Thus the variance of the estimated difference between two treatments is

σ²ζ (v + 1)/(3r),

so that the asymptotic efficiency of the standard randomized block design and analysis is

3(v − 1)/{v(v + 1)}.   (8.41)

Thus at v = 2 the efficiency is 1/2, as is clear from the consideration that half the information contained in a unit is lost in interblock contrasts. The efficiency is the same at v = 3 and thereafter decreases slowly with v. The increase in effective variance and decrease in efficiency factor as v increases are partially offset by the decreasing loss from interblock comparisons.
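A quick tabulation of (8.41) using exact fractions confirms this pattern:

```python
from fractions import Fraction

def rb_efficiency(v):
    """Asymptotic efficiency (8.41) of the randomized block design under
    random walk errors, relative to the neighbour-balanced alternative."""
    return Fraction(3 * (v - 1), v * (v + 1))

for v in range(2, 9):
    print(v, rb_efficiency(v))  # 1/2 at v = 2 and v = 3, then slowly decreasing
```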

A possible compromise for larger v, should it be desired to use a standard analysis with a clear randomization justification, is to employ a balanced incomplete block design with the number of units per block two or three, i.e. chosen to optimize efficiency if the error structure is indeed of the random walk type. Another possibility in a very large experiment is to divide the experiment into a number of independent sections with a systematic neighbour balance design within each section and to randomize the names of the treatments separately in the different sections.

Finally we need to give sequences that have neighbour balance. See Appendix B for some of the underlying algebra. For two, three and four treatments suitable designs iterate the initial sequences

T1, T2

T1, T2, T3; T2, T1, T3;

T1, T2, T3, T4; T2, T3, T1, T4; T3, T1, T4, T2.

There is an anomaly arising from end effects which prevent the balance condition being exactly satisfied unless the designs are regarded as circular, i.e. with the last unit adjacent to the first, but


this is almost always not meaningful. In a large design with many repetitions of the above the end effect is negligible.
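The balance of these sequences, and the end-effect anomaly, are easy to verify by counting adjacent unordered pairs in the iterated sequence; a sketch:

```python
from collections import Counter

def neighbour_counts(blocks, repeats):
    """Counts of adjacent unordered treatment pairs when the given blocks
    are iterated 'repeats' times in a single line of units."""
    seq = [t for _ in range(repeats) for block in blocks for t in block]
    return Counter(frozenset(p) for p in zip(seq, seq[1:]))

three = [("T1", "T2", "T3"), ("T2", "T1", "T3")]
four = [("T1", "T2", "T3", "T4"), ("T2", "T3", "T1", "T4"),
        ("T3", "T1", "T4", "T2")]
for blocks in (three, four):
    counts = neighbour_counts(blocks, repeats=10)
    # pair counts are equal up to 1; the discrepancy of 1 is the end effect
    print(sorted(counts.values()))
```

Regarded circularly the counts are exactly equal; in a line the single missing wrap-around adjacency produces the discrepancy of one noted in the text.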

8.5 Spatial designs

We now turn briefly to situations in which an important feature is the arrangement of experimental units in space. In so far as special models are concerned the discussion largely parallels that of the previous section on designs with a one-dimensional array of units. A further extension, unexplored so far as we are aware, would be to spatial-temporal arrangements of experimental units.

There are two rather different situations. In the first one or more compact areas of ground are available and the issue is to divide them into subareas whose size and shape are determined by the investigator, usually constrained by technological considerations of ease of processing and often also by the need for guard areas to isolate the distinct units. Except for the guard areas the whole of the available area or areas is used for the experiment. The other possibility is that a very large area is available within which relatively small subareas are chosen to form the experimental units.

Illustration. In an agricultural field trial one or more areas of land are divided into plots, the area and size of these being determined in part by ease of sowing and harvesting. By contrast in an ecological experiment a large area of, say, forest is available. Within selected areas different treatments for controlling disease are to be compared. With k treatments for comparison, a version of a randomized block design will require the definition of a number of sets of k areas. The areas within a set should be close together to achieve homogeneity, although sufficiently separated to ensure no leakage of treatment from one area to another and no direct transmission of infection from one area to another.

For example with k = 3 the areas might be taken as circles of radius r centred at the corners of an equilateral triangle of side d, d > 2r. The orientation of the triangle might not be critical; the centroids of different triangles would be chosen to sample a range of terrains and perhaps to ensure good representation of regions of potentially high infection, in order to study treatment differences in a context where most sensitive estimation of effects is possible.

In a recent experiment on the possible role of badgers in bovine tuberculosis the experimental areas were clusters of three approximately circular areas of radius about 10 km with separation zones.


The treatments were proactive culling of badgers, reactive culling following a breakdown, and no culling. The regions were chosen initially as having high expected breakdown rates on the basis of past data. Before randomization the regions were modified to avoid major rivers, motorways, etc., except as boundaries. The whole investigation consists of ten such triplets, thus forming a randomized block design of ten blocks with three units per block.

When the units are formed from a set of essentially contiguous plots a key traditional design is the Latin square or some generalization thereof. It is assumed that the plots are oriented so that any predominant patterns of variability lie along the rows and columns and not diagonally across the square.

There are many variants of the design when the number of treatments is too large to fit into the simple Latin square form or when, possibly, even one full Latin square would involve too much replication of each treatment. Youden squares provide one extension in which the rows, say, form a balanced incomplete block design and each treatment continues to fall once in each column. We shall describe only one further possibility, the lattice square designs.

In these designs the whole design consists of a set of q × q squares, where the number of treatments is q². The design is such that each treatment occurs once in each square and each pair of treatments occurs together in the same row or column of a square the same number of times. Such designs exist when q is a prime power, p^m, say. The construction is based on the existence of a complete set of (q − 1) mutually orthogonal Latin squares.

We first set out the treatments in a q × q square called a key pattern. Each treatment in effect has attached to it (q + 1) aspects: row number, column number and the (q − 1) symbols in the Galois field GF(p^m) in the various mutually orthogonal Latin squares. Notionally we may use the Galois field symbols to label also the rows and columns. We then form squares by taking the labelling characteristics in pairs, choosing each equally often. Thus if q is even, we need to take each twice and if q is odd, so that the number of labelling characteristics is even, then each can be used once.

Table 8.2 shows the design for nine treatments in two 3 × 3 squares. The first part of the table gives the key pattern and two orthogonal 3 × 3 Latin squares. Imagine the rows and columns labelled (0, 1, 2). The design itself, before randomization, is formed by taking (rows, columns) and (first alphabet, second alphabet) as


Table 8.2 3 × 3 lattice squares for 9 treatments: (a) key pattern and orthogonal Latin squares; (b) the design.

(a)
1 2 3    00 11 22
4 5 6    12 20 01
7 8 9    21 02 10

(b)
1 2 3    1 6 8
4 5 6    9 2 4
7 8 9    5 7 3

determining via the key pattern the treatments to be assigned to any particular cell. For instance, the entry in row 1 and column 0 of the second square is the element in the key pattern corresponding to (1, 0) in the two orthogonal squares. The way that the design is constructed ensures that each pair of treatments occurs together in either a row or a column just once. The design is resolvable.

For 16 treatments the rows and columns of the key pattern and the three alphabets of the orthogonal Latin squares give five criteria. In this case it needs five 4 × 4 squares to achieve balance, for example via (row, column); (column, alphabet 1); (alphabet 1, alphabet 2); (alphabet 2, alphabet 3); (alphabet 3, row).
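A sketch of the construction for q = 3, writing the two orthogonal Latin squares of Table 8.2 as (i + j) mod 3 and (2i + j) mod 3 (that algebraic form is an assumption consistent with the table):

```python
from itertools import combinations
from collections import Counter

def lattice_squares_q3():
    """Two 3x3 lattice squares for 9 treatments: square 1 pairs (row, column)
    and square 2 pairs (alphabet 1, alphabet 2), both via the key pattern."""
    key = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    A = [[(i + j) % 3 for j in range(3)] for i in range(3)]       # alphabet 1
    B = [[(2 * i + j) % 3 for j in range(3)] for i in range(3)]   # alphabet 2
    square2 = [[0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            square2[A[i][j]][B[i][j]] = key[i][j]
    return key, square2

s1, s2 = lattice_squares_q3()
print(s2)  # [[1, 6, 8], [9, 2, 4], [5, 7, 3]], as in Table 8.2(b)

# each pair of treatments shares a row or column of some square exactly once
together = Counter()
for sq in (s1, s2):
    for line in sq + [list(col) for col in zip(*sq)]:
        for pair in combinations(line, 2):
            together[frozenset(pair)] += 1
assert len(together) == 36 and set(together.values()) == {1}
```

The final check confirms the defining property: all 36 pairs of the 9 treatments occur together in a row or column just once.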

The above designs all have what might be called a traditional justification. That is, for continuous responses approximately normally distributed there is a naturally associated linear model with a justification via randomization theory. For other kinds of response, for example binary responses, it will often be reasonable to start with the corresponding exponential family generalization.

While the Latin square and similar designs retain some validity whatever the pattern of uncontrolled variability, they are sensible designs when any systematic effects are essentially along the rows and columns. Occasionally more specific assumptions may be suitable. Thus suppose that we have a spatial coordinate system (η, ζ) corresponding to the rows and columns and that the uncontrolled component of variation associated with the unit centred at (η, ζ) has the generalized additive form

a(η) + b(ζ) + ε(η, ζ), (8.42)


Table 8.3 4 × 4 Latin square. Formal cross-product values of linear by linear components of variation and an optimal treatment assignment.

+9, T1 +3, T2 −3, T3 −9, T4

+3, T4 +1, T3 −1, T2 −3, T1

−3, T3 −1, T4 +1, T1 +3, T2

−9, T2 −3, T1 +3, T4 +9, T3

where the ε's are independent and identically distributed and a(η) and b(ζ) are arbitrary functions. Then clearly under unit-treatment additivity the precision of estimated treatment contrasts is determined by var(ε).

Now suppose instead that the variation is, except for random error, a polynomial in (η, ζ). We consider a second degree polynomial. Because of the control over row and column effects, a Latin square design balances out all terms except the cross-product term ηζ. Balance of the pure linear and quadratic terms does not require the strong balance of the Latin square but for simplicity we restrict ourselves to Latin squares and look for that particular square which is most nearly balanced with respect to the cross-product term.

The procedure is illustrated in Table 8.3. The rows and columns are identified by the standardized linear polynomial with equally spaced levels, for example by −3, −1, 1, 3 for a 4 × 4 square. With the units of the square are then associated the formal products of the row and column identifiers. By trial and error we find the Latin square most nearly balanced with respect to the cross-product term. This is shown for a 4 × 4 square in Table 8.3. Especially if the square is to be replicated, the names of the treatments should be randomized within each square, but additional randomization would destroy the imposed balance.

The subtotals of the cross-product terms for the four treatments are respectively 4, −4, 4, −4. These would be zero if exact orthogonality between the cross-product spatial term and treatments could be achieved. In fact the loss of efficiency from the nonorthogonality is negligible, much less than if a Latin square had been randomized. Note that if the four treatments were those in a 2² factorial


system it would be possible to concentrate the loss of efficiency on the interaction term.
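The quoted subtotals can be checked directly; the layout below transcribes the treatment assignment of Table 8.3:

```python
def crossproduct_subtotals(square, scores=(-3, -1, 1, 3)):
    """Sum of the formal eta*zeta cross-product values, with standardized
    row and column scores, for each treatment of a Latin square."""
    totals = {}
    for i, row in enumerate(square):
        for j, t in enumerate(row):
            totals[t] = totals.get(t, 0) + scores[i] * scores[j]
    return totals

table_8_3 = [["T1", "T2", "T3", "T4"],
             ["T4", "T3", "T2", "T1"],
             ["T3", "T4", "T1", "T2"],
             ["T2", "T1", "T4", "T3"]]
print(crossproduct_subtotals(table_8_3))  # {'T1': 4, 'T2': -4, 'T3': 4, 'T4': -4}
```

The same function could be used in the trial-and-error search over candidate Latin squares for the one minimizing these subtotals.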

If the design were analysed on the basis of the quadratic model of uncontrolled variation, two degrees of freedom would be removed from each of the between rows and between columns sums of squares and reallocated to the residual.

Often, for example in agricultural field trials, a much more realistic model of spatial variation can be based on a stochastic model of neighbourhood dependence. The simplest models of such type regard the units centred at (η − 1, ζ), (η + 1, ζ), (η, ζ − 1), (η, ζ + 1) as the neighbours N(η, ζ) of the unit centred at (η, ζ). If ξ(η, ζ) is the corresponding component of uncontrolled variation one representation of spatially correlated variation has

ξ(η, ζ) − α Σ_{j∈N(η,ζ)} ξ(j) = ε(η, ζ),   (8.43)

where the ε's are independently and identically distributed. A different assumption is that the ξ's are generated by a two-dimensional Brownian motion, i.e. that

ξ(η, ζ) = Σ_{η′≤η, ζ′≤ζ} ε∗(η′, ζ′),   (8.44)

where again the ε∗'s are independent and identically distributed. It is in both cases then very appealing to look for designs in which every pair of treatments are neighbours of one another the same number of times.

Sometimes, especially perhaps when a rather inappropriate design has been used, it may, whatever the design, be reasonable to fit a realistic spatial model, for example with a long-tailed distribution of ε or ε∗. This may partly recover the efficiency probably achievable more simply by more appropriate design. Typically quite extensive calculations, for example by Markov chain Monte Carlo methods, will be needed. Also the conclusions will often be relatively sensitive to the assumptions about the uncontrolled variation.

8.6 Bibliographic notes

There is a very extensive literature on the choice of sample size via considerations of power. A thorough account is given by Desu and Raghavarao (1990). For an early decision-oriented analysis, see Yates (1952).

Optimal stopping in a sequential decision-making formulation


is connected with general sequential decision making; for formulations aimed at clinical trials see, for example, Carlin, Kadane and Gelfand (1998) and Wang and Leung (1998).

For a critique of enrichment entry, see Leber and Davis (1998).

Most designs in which the treatment is adapted sequentially trial by trial have little or no element of randomization. For discussion of a design in which randomization is needed and an application in psychophysics, see Rosenberger and Grill (1997).

There is a very extensive literature on sequential stopping, stemming originally from industrial inspection (Wald, 1947) and more recently motivated largely by clinical trials (Armitage, 1975; Whitehead, 1997; Jennison and Turnbull, 2000).

Early work (Neyman, 1923; Hald, 1948) on designs in the presence of polynomial trends presupposed a systematic treatment arrangement with a number of replicates of the same sequence. Cox (1951) discussed the choice of arrangements with various optimum properties and gave formulae for the increase in variance consequent on nonorthogonality. Atkinson and Donev (1996) have reviewed subsequent developments and given extensions. Williams (1952) discussed design in the presence of autocorrelated error structure and Kiefer (1958) proved the optimality of Williams's designs. Similar combinatorial arrangements needed for long sequences on a single subject are mentioned in the notes for Chapter 4.

Methods for using local spatial structure to improve precision in field trials stem from Papadakis (1937); see also Bartlett (1938). Subsequent more recent work, for example Bartlett (1978), makes some explicit use of spatial stochastic models leading to some notion of neighbourhood balance as a design criterion. At the time of writing the extensive literature on the possible advantages of neighbourhood balanced spatial designs over randomized blocks and similar techniques is best approached via the paper of Azaïs, Monod and Bailey (1998) showing how a careful assessment of relative advantage is to be made and the theoretical treatment of randomization theory under a special method of analysis (Monod, Azaïs and Bailey, 1996). Besag and Higdon (1999) describe a very detailed analysis of some spatial designs based on a Markov chain Monte Carlo technique using long-tailed distributions and a specific spatial model. For a general review of the applications to agricultural field trials, see Gilmour, Cullis and Verbyla (1997).


8.7 Further results and exercises

1. Suppose that in comparing two treatments T and C, a variable Y1 is measured on r1 units for each treatment with an error variance after allowing for blocking of σ²1. It is then decided that a different response variable Y2 is to be preferred, in terms of which the final comparison of treatments is to be made. Therefore a further r12 units are assigned to each treatment on which both Y1 and Y2 are to be measured, followed by a further r2 units for each treatment on which only Y2 is measured.

Under normal theory assumptions in which the regression of Y2 on Y1 is the same for both treatments, obtain a likelihood based method for estimating the required treatment effect. What considerations would guide the choice of r12 and r2?

2. The Robbins-Monro procedure is an adaptive treatment assignment rule, in effect of the up-and-down type with shrinking step sizes. If we observe a response variable Y with expectation depending in a monotone increasing way on a treatment (dose) variable x via E(Y ; x) = η(x), where η(·) is unknown, the objective is assumed to be the estimation for given p of x(p), where

η(x(p)) = p.

Thus for a binary response and p = 1/2, estimation is required of the 50% point of the response, the ED50. The procedure is to define treatment levels recursively, depending on whether the current response is above or below the target, via

x_{t+1} = x_t − a_t(Y_t − p),

where the preassigned sequence a_t defines the procedure. If the procedure is stopped after n steps the estimate of x(p) is either x_n or x_{n+1}.

Give informal motivation for the formal conditions for convergence of the procedure, that Σa_t is divergent and Σa_t² is convergent, leading to the common choice a_n = a/n for some constant a. Note, however, that the limiting behaviour as t increases is relevant only as a guide to behaviour, in that the procedure will always have to be combined with a stopping rule.

Assume that the procedure has reached a locally linear region near x(p) in which the response function has slope η′(p) and the variance of Y is constant, σ², say. Show, by rewriting the defining


equation in terms of the response Y_t at x_t and assuming appropriate dependence on sample size, that the asymptotic variance is

a²σ²/(2aη′(p) − 1).

How might this be used to choose a?

The formal properties were given in generality by Robbins and Monro (1951); for a more informal discussion see Wetherill and Glazebrook (1986, Chapters 9 and 10).
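The recursion is easy to simulate directly. The following sketch assumes a hypothetical logistic response curve with ED50 at x = 2 and the common choice a_t = a/t; the curve, the constants and the function names are illustrative, not part of the exercise.

```python
import math
import random

def robbins_monro(eta, p, x0, n, a, seed=0):
    """Run n steps of the recursion x_{t+1} = x_t - a_t (Y_t - p), a_t = a/t,
    with a binary response Y_t ~ Bernoulli(eta(x_t))."""
    rng = random.Random(seed)
    x = x0
    for t in range(1, n + 1):
        y = 1.0 if rng.random() < eta(x) else 0.0
        x -= (a / t) * (y - p)
    return x

# Hypothetical logistic response curve with ED50 at x = 2.
eta = lambda x: 1.0 / (1.0 + math.exp(-(x - 2.0)))

est = robbins_monro(eta, p=0.5, x0=0.0, n=20000, a=4.0)
```

With slope η′ = 1/4 at the ED50 and a = 4, 2aη′ − 1 = 1, which is the kind of calculation the asymptotic variance expression suggests when choosing a.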

3. If point events occur in a Poisson process of rate ρ, the number N_{t0} of points in an interval of length t0 has a Poisson distribution with mean and variance equal to ρt0. Show that for large t0, log N_{t0} − log t0 is asymptotically normal with mean log ρ and variance 1/(ρt0), estimated by 1/N_{t0}.

Show that if, on the other hand, sampling proceeds until a preassigned number n0 of points have been observed, then the corresponding time period T0 is such that 2ρT0 has a chi-squared distribution with 2n0 degrees of freedom, and that the asymptotic variance of the estimate of ρ is again the reciprocal of the number of points counted.

Suppose now that two treatments are to be compared with corresponding Poisson rates ρ1, ρ2. Show that the variance of the estimate of log ρ2 − log ρ1 is approximately 1/N2 + 1/N1, which if the numbers are not very different is approximately 4/N.; here N1 and N2 are the numbers of points counted in the two groups and N. = N1 + N2. Hence show that to achieve a certain preassigned fractional standard error, d, in estimating the ratio, sampling should proceed until about 4/d² points have been counted in total, distributing the sampling between the two groups to achieve about the same number of points from each.

What would be the corresponding conclusion if both processes had to be corrected for a background noise process of known rate ρ0? What would be the consequences of overdispersion in the Poisson processes?
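The 4/d² rule can be checked by simulation. This is a rough sketch under assumed illustrative rates; Poisson counts are generated by accumulating exponential inter-arrival times.

```python
import math
import random

def poisson_count(rate, horizon, rng):
    # Count events of a Poisson process of the given rate in (0, horizon]
    # by accumulating exponential inter-arrival times.
    n, t = 0, rng.expovariate(rate)
    while t <= horizon:
        n += 1
        t += rng.expovariate(rate)
    return n

rng = random.Random(1)
rho1, rho2, d = 3.0, 5.0, 0.1        # illustrative rates; target 10% fractional s.e.
half = 2.0 / d**2                    # about 4/d^2 points in total, half from each group
t1, t2 = half / rho1, half / rho2    # horizons chosen to yield about `half` points each

reps = [math.log(poisson_count(rho2, t2, rng) / t2)
        - math.log(poisson_count(rho1, t1, rng) / t1)
        for _ in range(2000)]
mean = sum(reps) / len(reps)
sd = math.sqrt(sum((r - mean) ** 2 for r in reps) / len(reps))
# sd should come out close to the target fractional standard error d
```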

4. In a randomized trial to compare two treatments, T and C, with equal replication over 2r experimental units, suppose that treatments are allocated to the units in sequence, that at each stage the outcomes of the previous allocations are known, and moreover that the strategy of allocation is known. Some aspects of the avoidance of selection bias can be represented via a two-person zero-sum game in which player I chooses the treatment to be assigned to each individual and player II "guesses" the outcome of the choice. Player II receives from or pays out to player I one unit depending on whether the guess is correct or false. Blackwell and Hodges (1957) show that the design in which the treatments are allocated independently with equal probabilities until one treatment has been allocated r times is the optimal strategy for player I, with the obvious associated rule for player II, and that the expected number of correct guesses by player II exceeds r by

r C(2r, r)/2^{2r} ∼ √(r/π),

whereas if all treatment allocations are equally likely and player II acts appropriately the corresponding excess is

2^{2r−1}/C(2r, r) ∼ √(πr/4),

where C(2r, r) denotes the central binomial coefficient.

Note that the number of excess correct guesses could be reduced to zero by independently randomizing each unit, but this would carry a penalty in terms of possibly serious imbalance in the two treatment arms.
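The two excesses and their asymptotic approximations can be compared numerically; the following small check of the formulas quoted above uses r = 50 as an arbitrary illustration.

```python
from math import comb, pi, sqrt

def excess_truncated_binomial(r):
    # Expected excess correct guesses for the Blackwell-Hodges design:
    # r * C(2r, r) / 2^(2r), asymptotically sqrt(r / pi).
    return r * comb(2 * r, r) / 4**r

def excess_random_allocation(r):
    # Excess when all allocations are equally likely and player II acts
    # appropriately: 2^(2r - 1) / C(2r, r), asymptotically sqrt(pi * r / 4).
    return 2 ** (2 * r - 1) / comb(2 * r, r)

r = 50
a = excess_truncated_binomial(r)   # compare with sqrt(r / pi)
b = excess_random_allocation(r)    # compare with sqrt(pi * r / 4)
```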

5. The following adaptive randomization scheme has been used in clinical trials to compare two treatments T and C when a binary response, success or failure, can be observed on each experimental unit before the next experimental unit is to be randomized. The initial randomization is represented by an urn containing two balls marked T and C respectively. Each time a success is observed, a ball marked by the successful treatment is added to the urn.

In a trial on newborn infants with respiratory failure, the new treatment T was highly invasive, extracorporeal membrane oxygenation (ECMO), and C was conventional medical management. The first patient was randomized to T and survived, the second was randomized to C and died, and the next ten patients were randomized to T, all surviving, at which time the trial was terminated (Wei, 1988; Bartlett et al., 1985).

Compare the efficiency of the adaptive urn scheme to that of balanced allocation to T and C, for an experiment with a total sample of size 12, first under the assumption that p1, the probability of success under T, is 0.80 and p2, the probability of success under C, is 0.20, and then under the assumption that p1 = p2. See Begg (1990) for a discussion of inference under the adaptive randomization scheme.

The results of this study were considered sufficiently inconclusive that another trial was conducted in Boston in 1986, using a sequential allocation scheme in which patients were randomized equally to T and C in blocks of size four, until four deaths occurred on either T or C. The rationale for this design and the choice of stopping rule is given in Ware (1989); analysis of the resulting data (9 units randomized to T with no failures, 10 units randomized to C with 4 failures) indicates substantial but not overwhelming evidence in favour of ECMO. The discussion of Ware (1989) highlights several interesting ethical and statistical issues.

Subsequent studies have not completely clarified the issue, although the UK Collaborative ECMO Trial (1996) estimated the risk of death for ECMO relative to conventional therapy to be 0.55.
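The urn scheme is straightforward to simulate. The sketch below uses the sample size 12 and the values p1 = 0.8, p2 = 0.2 of the exercise and records how allocation skews toward the better arm; the function names are ad hoc.

```python
import random

def urn_trial(p_T, p_C, n, rng):
    # Success-driven urn: start with one ball for each treatment; each
    # observed success adds a ball marked with the successful treatment.
    urn = ["T", "C"]
    n_T = 0
    for _ in range(n):
        arm = rng.choice(urn)
        if arm == "T":
            n_T += 1
        if rng.random() < (p_T if arm == "T" else p_C):
            urn.append(arm)
    return n_T

rng = random.Random(2)
sims = [urn_trial(0.8, 0.2, 12, rng) for _ in range(5000)]
mean_T = sum(sims) / len(sims)   # balanced allocation would give exactly 6
```

Under p1 = p2 the same simulation, with equal success probabilities, gives a mean allocation near 6, the balanced value.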


APPENDIX A

Statistical analysis

A.1 Introduction

Design and statistical analysis are inextricably linked, but in the main part of the book we have aimed primarily to discuss design with relatively minimal discussion of analysis. Use of results connected with the linear model and analysis of variance is, however, unavoidable, and we have assumed some familiarity with these. In this Appendix we describe the essential results required in a compact, but so far as feasible self-contained, way.

To the extent that we concern ourselves with analysis, we represent the response recorded on a particular unit as the value of a random variable and take the objective to be inference about aspects of the underlying probability distributions, in particular parameters describing differences between treatments. Such models are an essential part of the more formal part of statistical analysis, i.e. that part that goes beyond graphical and tabular display, important though these latter are.

One of the themes of the earlier chapters of this book is an interplay between two different kinds of probabilistic model. One is the usual one in discussions of statistical inference, where such models are idealized representations of physical random variability as it arises when repeat observations are made under nominally similar conditions. The second model is one in which the randomness enters only via the randomization procedure used by the investigator in allocating treatments to experimental units. This leads to the notion that a standard set of assumptions, plus consideration of the design used, implies a particular form of default analysis without special assumptions about the physical form of the random variability encountered. These considerations are intended to remove some of the arbitrariness that may seem to be involved in constructing models and analyses for special designs.


A.2 Linear model

A.2.1 Formulation and assumptions

We write the linear model in the equivalent forms

E(Y ) = Xθ, Y = Xθ + ε, (A.1)

where by definition E(ε) = 0. Here Y is an n × 1 vector of random variables representing responses to be observed, one per experimental unit, θ is a q × 1 vector of unknown parameters representing variously treatment contrasts, including main effects and interactions, block and similar effects, effects of baseline variables, etc., and X is an n × q matrix of constants determined by the design and other structure of the system. Typically some components of θ are parameters of interest and others are nuisance parameters.

It is frequently helpful to write q = p + 1 and to take the first column of X to consist of ones, concentrating then on the estimation of the last p components of θ. Initially we suppose that X is of full rank q < n. That is, there are fewer parameters than observations, so that the model is not saturated with parameters, and moreover there is not a redundant parameterization.

For the primary discussion we alternate between two possible assumptions about the error vector ε; it is always clear from context which is being used.

Second moment assumption. The components of ε are uncorrelated and of constant variance σ², i.e.

E(εεᵀ) = σ²I, (A.2)

where I is the n × n identity matrix.

Normal theory assumption. The components of ε are independently normally distributed with zero mean and variance σ².

Unless explicitly stated otherwise we regard σ² as unknown. The normal theory assumption implies the second moment assumption. The reasonableness of the assumptions needs consideration in each applied context.

A.2.2 Key results

The strongest theoretical motivation of the following definitions is provided under the normal theory assumption by examining the likelihood function, checking that it is the likelihood for an exponential family with q + 1 parameters and a (q + 1)-dimensional canonical statistic, and that hence analysis under the model is to be based on the statistics now to be introduced. We discuss optimality further in Section A.2.4, but for the moment simply consider the following statistics.

We define the least squares estimate θ̂ of θ by the equation

XᵀXθ̂ = XᵀY, (A.3)

the residual vector to be

Y_res = Y − Xθ̂ (A.4)

and the residual sum of squares and mean square as

SS_res = Y_resᵀ Y_res, MS_res = SS_res/(n − q). (A.5)

Occasionally we use the extended notation Y_res.X, or even Y.X, to show the vector and model involved in the definition of the residual.

Because X has full rank so too does XᵀX, enabling us to write

θ̂ = (XᵀX)⁻¹XᵀY. (A.6)

Under the second moment assumption θ̂ is an unbiased estimate of θ with covariance matrix (XᵀX)⁻¹σ², and MS_res is an unbiased estimate of σ². Thus the covariance matrix of θ̂ can be estimated and approximate confidence limits found for any parametric function of θ. One strong justification of the least squares estimates is that they are functions of the sufficient statistics under the normal theory assumption. Another is that among unbiased estimators linear in Y, θ̂ has the "smallest" covariance matrix, i.e. for any matrix C for which E(CY) = θ, cov(CY) − cov(θ̂) is positive semi-definite. Stronger results are available under the normal theory assumption; for example θ̂ has smallest covariance among all unbiased estimators of θ.

Under the second moment assumption, on substituting Y = Xθ + ε into (A.6), we have

θ̂ = (XᵀX)⁻¹XᵀXθ + (XᵀX)⁻¹Xᵀε
  = θ + (XᵀX)⁻¹Xᵀε. (A.7)

The unbiasedness follows immediately and the covariance matrix of θ̂ is

E{(θ̂ − θ)(θ̂ − θ)ᵀ} = (XᵀX)⁻¹XᵀE(εεᵀ)X(XᵀX)⁻¹
  = (XᵀX)⁻¹σ². (A.8)


Further

Y_res = {I − X(XᵀX)⁻¹Xᵀ}ε. (A.9)

Now the residual sum of squares is Y_resᵀ Y_res, so that the expected value of the residual sum of squares is

σ²tr[{I − X(XᵀX)⁻¹Xᵀ}ᵀ{I − X(XᵀX)⁻¹Xᵀ}].

Direct multiplication shows that

{I − X(XᵀX)⁻¹Xᵀ}ᵀ{I − X(XᵀX)⁻¹Xᵀ} = I − X(XᵀX)⁻¹Xᵀ

and its trace is

tr{I_n − X(XᵀX)⁻¹Xᵀ} = tr(I_n) − tr(I_q) = n − q, (A.10)

where temporarily we show explicitly the dimensions of the identity matrices involved.
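These expectations can be checked numerically in the simplest non-trivial case, straight-line regression (q = 2), where the normal equations solve in closed form. The following pure-Python simulation sketch uses arbitrary illustrative constants; it is not a general-purpose least squares routine.

```python
import random

def fit_line(x, y):
    # Least squares for E(Y_i) = b0 + b1 x_i: solve the 2x2 normal
    # equations X^T X theta = X^T Y directly; return (b0, b1, MS_res).
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx                      # det(X^T X)
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, ss_res / (n - 2)              # MS_res = SS_res / (n - q)

rng = random.Random(3)
x = [i / 10 for i in range(30)]                  # fixed design, n = 30
ms = []
for _ in range(2000):
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]   # sigma = 0.5
    ms.append(fit_line(x, y)[2])
avg_ms = sum(ms) / len(ms)                       # should be near sigma^2 = 0.25
```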

A.2.3 Some properties

There is a large literature associated with the results just sketched and their generalizations. Here we give only a few points.

First, under the normal theory assumption the log likelihood is, except for a constant,

−n log σ − (Y − Xθ)ᵀ(Y − Xθ)/(2σ²). (A.11)

The identity

(Y − Xθ)ᵀ(Y − Xθ)
  = {(Y − Xθ̂) + X(θ̂ − θ)}ᵀ{(Y − Xθ̂) + X(θ̂ − θ)}
  = SS_res + (θ̂ − θ)ᵀ(XᵀX)(θ̂ − θ), (A.12)

the cross-product term vanishing, justifies the statement about sufficiency at the beginning of Section A.2.2.

Next we define the vector of fitted values

Ŷ = Xθ̂, (A.13)

the values that would have arisen had the data exactly fitted the model with the estimated parameter value. Then we have the analysis of variance or, more literally, the analysis of sum of squares,

YᵀY = (Y − Xθ̂ + Xθ̂)ᵀ(Y − Xθ̂ + Xθ̂)
  = SS_res + Ŷᵀ Ŷ. (A.14)

We call the second term the sum of squares for fitting X and sometimes denote it by SS_X. It follows on direct substitution that

E(SS_X) = (Xθ)ᵀ(Xθ) + qσ². (A.15)

A property that is often useful in analysing simple designs is that, because XᵀX is of full rank, a component θ̂_s of the least squares estimate is the unique linear combination of XᵀY, the right-hand side of the least squares equations, that is unbiased for θ_s. For such a linear combination l_sᵀXᵀY to be unbiased we need

E(l_sᵀXᵀY) = l_sᵀXᵀXθ = e_sᵀθ, (A.16)

where e_s is a vector with one in row s and zero elsewhere. This implies that l_s is the sth column of (XᵀX)⁻¹.

Finally, and most importantly, consider confidence limits for a component parameter. Write C = XᵀX and denote the elements of C⁻¹ by c^{rs}. Then

var(θ̂_s) = c^{ss}σ²

is estimated by

c^{ss}MS_res,

suggesting the use of the pivot

(θ̂_s − θ_s)/√(c^{ss}MS_res) (A.17)

to calculate confidence limits for, and test hypotheses about, θ_s. Under the normal theory assumption the pivot has a Student t distribution with n − q degrees of freedom. Under the second moment assumption it will have, for large n, asymptotically a standard normal distribution under the extra assumptions that

1. n − q also is large, which can be shown to imply that MS_res converges in probability to σ²;

2. the matrix X and error structure are such that the central limit theorem applies to θ̂_s.

Over the second point note that if we assumed the errors independent, and not merely uncorrelated, it is a question of verifying say the Lindeberg conditions. A simple sufficient but not necessary condition for asymptotic normality of the least squares estimates is then that, in a notional series of problems in which the number of parameters is fixed and the number of observations tends to infinity, the squared norms of all columns of (XᵀX)⁻¹Xᵀ tend to zero.


A.2.4 Geometry of least squares

For the study of special problems that are not entirely balanced we need, implicitly or explicitly, either algebraic manipulation and simplification of the above matrix equations or, perhaps more commonly, their numerical evaluation and solution. For some aspects of the general theory, however, it is helpful to adopt a more abstract approach, and this we now sketch.

We shall regard the vector Y and the columns of X as elements of a linear vector space V. That is, we can add vectors and multiply them by scalars, and there is a zero vector in V.

We equip the space with a norm and a scalar product and for most purposes use the Euclidean norm, i.e. we define for a vector Z ∈ V, specified momentarily in coordinate form,

‖Z‖² = ZᵀZ = ΣZ_i², (A.18)

and the scalar product by

(Z₁, Z₂) = Z₁ᵀZ₂ = (Z₂, Z₁). (A.19)

Two vectors are orthogonal if their scalar product is zero.

Given a set of vectors, the collection of all linear combinations of them defines a subspace, S, say. Its dimension d_S is the maximal number of linearly independent components in S, i.e. the maximal number such that no linear combination is identically zero. In particular the q columns of the n × q matrix X define a subspace C_X called the column space of X. If and only if X is of full rank, q = d_{CX}.

In the following discussion we abandon the requirement that X is of full rank.

Given a subspace S of dimension d_S:

1. the set of all vectors orthogonal to all vectors in S forms another vector space S⊥, called the orthogonal complement of S;

2. S⊥ has dimension n − d_S;

3. an arbitrary vector Z in V is uniquely resolved into two components, its projection in S and its projection in the orthogonal complement,

Z = Z_S + Z_{S⊥}; (A.20)

4. the components are by construction orthogonal and

‖Z‖² = ‖Z_S‖² + ‖Z_{S⊥}‖². (A.21)


We now regard the linear model as specifying that E(Y) lies in the column space of X, C_X. Resolve Y into a component in C_X and a component in its orthogonal complement. The first component, in matrix notation Xθ̃, say, is such that the second component Y − Xθ̃ is orthogonal to every column of X, i.e.

Xᵀ(Y − Xθ̃) = 0, (A.22)

and these are the least squares equations (A.3), so that θ̃ = θ̂. Further, the components are the vectors of fitted values and residuals, and the analysis of variance in (A.14) is the Pythagorean identity for their squared norms.

From this representation we have the following results.

In a redundant parameterization, the vector of fitted values and the residual vector are uniquely defined by the column space of X, even though some at least of the estimates of individual components of θ are not uniquely defined.

The estimate of a component of θ based on a linear combination of the components of Y is a scalar product (l, Y) of an estimating vector l with Y. We can resolve l into components l_{CX}, l⊥_{CX}, in and orthogonal to C_X. Every scalar product (l⊥, Y), in a slightly condensed notation, has zero mean and, because of orthogonality, var(l, Y) is the sum of the variances of the components. It follows that for a given expectation the variance is minimized by taking only estimating vectors in C_X, i.e. by linear combinations of XᵀY, justifying under the second moment assumption the use of least squares estimates. This property may be called the linear sufficiency of XᵀY.

We now sketch the distribution theory underlying confidence limits and tests under the normal theory assumption. It is helpful, although not essential, to set up in C_X and its orthogonal complement a set of orthogonal unit vectors as a basis for each space, in terms of which any vector may be expressed. By an orthogonal transformation, the scalar product of Y with any of these vectors is normally distributed with variance σ², and scalar products with different basis vectors are independent. It follows that

1. the residual sum of squares SS_res has the distribution of σ² times a chi-squared variable with degrees of freedom n − d_{CX};

2. the residual sum of squares is independent of the least squares estimates, and therefore of any function of them;


3. the least squares estimates, when uniquely defined, are normally distributed;

4. the sum of squares of fitted values SS_X has the form of σ² times a noncentral chi-squared variable with d_{CX} degrees of freedom, reducing to central chi-squared if and only if θ = 0, i.e. the true parameter value is at the origin of the vector space.

These cover the distribution theory for standard so-called exact tests and confidence intervals.

In the next subsection we give a further development using the coordinate-free vector space approach.

A.2.5 Stage by stage fitting

In virtually all the applications we consider in this book the parameters, and therefore the columns of the matrix X, are divided into sections corresponding to parameters of different types; in particular the first parameter is usually associated with a column of ones, i.e. is a constant for all observations.

Suppose then that

E(Y ) = X0θ0.1 +X1θ1.0; (A.23)

no essentially new ideas are involved with more than two sections. We suppose that the column spaces of X1 and of X0 do not coincide and that in general each new set of parameters genuinely constrains the previous model.

It is then sometimes helpful to argue as follows.

Set θ1.0 = 0. Estimate θ0, the coefficient of X0 in the model ignoring X1, by least squares. Note that the notation specifies what parameters are included in the model. We call the resulting estimate θ̂0 the least squares estimate of θ0 ignoring X1, and the associated sum of squares of fitted values, SS_{X0}, the sum of squares for X0 ignoring X1.

Now project the whole problem into the orthogonal complement of C_{X0}, the column space of X0. That is, we replace Y by what we now denote by Y.0, the residual vector with respect to X0, we replace X1 by X1.0, and we replace the linear model formally by

E(Y.0) = X1.0 θ1.0, (A.24)

a linear model in a space of dimension n − d0, where d0 is the dimension of C_{X0}.


We again obtain a least squares estimate θ̂1.0 by orthogonal projection. We obtain also a residual vector Y.01 and a sum of squares of fitted values, which we call the sum of squares adjusted for X0, SS_{X1.0}.

We continue this process for as many terms as there are in the original model.

For example, if there were three sections of the matrix X, namely X0, X1, X2, the successive sums of squares generated would be for fitting first X0 ignoring (X1, X2), then X1 adjusting for X0 and ignoring X2, and finally X2 adjusting for (X0, X1), leaving a sum of squares residual to the whole model. These four sums of squares, being squared norms in orthogonal subspaces, are under the normal theory assumption independently distributed in chi-squared distributions, central for the residual sum of squares and in general noncentral for the others. Although it is an aspect we have not emphasized in the book, if a test is required of consistency with θ2.01 = 0 in the presence of arbitrary θ0, θ1, this can be achieved via the ratio of the mean square for X2 adjusting for (X0, X1) to the residual mean square. The null distribution, corresponding to the appropriately scaled ratio of independent chi-squared variables, is the standard variance ratio or F distribution with degrees of freedom the dimensions of the corresponding spaces.

The simplest special case of this procedure is the fitting of the linear regression

E(Yi) = θ0.1 + θ1.0zi, (A.25)

so that X0, X1 are both n × 1. We estimate θ0 ignoring X1 by the sample mean Ȳ., and projection orthogonal to the unit vector X0 leads to the formal model

E(Y_i − Ȳ.) = θ1.0(z_i − z̄.), (A.26)

from which familiar formulae for a least squares slope, and associated sum of squares, follow immediately.
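The two-stage computation in (A.25)-(A.26) can be verified numerically: sweeping out the mean and regressing the residuals on the centred z gives exactly the slope from the full two-parameter fit. The data values below are made up for illustration.

```python
def centre(v):
    # Project out the constant column X0 = 1: subtract the mean.
    m = sum(v) / len(v)
    return [vi - m for vi in v]

z = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.1, 3.9, 8.2, 13.8, 22.5]

# Stage-by-stage: fit X0 (the mean), then regress Y.0 on X1.0 = z - zbar.
y0, z0 = centre(y), centre(z)
b1_stagewise = sum(u * v for u, v in zip(z0, y0)) / sum(v * v for v in z0)

# Full two-parameter fit via the normal equations gives the same slope.
n, sz, sy = len(z), sum(z), sum(y)
szz = sum(v * v for v in z)
szy = sum(u * v for u, v in zip(z, y))
b1_full = (n * szy - sz * sy) / (n * szz - sz * sz)
```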

In balanced situations, such as a randomized block design, in which the three sections correspond to terms representing general mean, block and treatment effects, the spaces X1.0, X2.0 are orthogonal and X2.0 is the same as X2.01. Then, and only then, the distinction between, say, a sum of squares for blocks ignoring treatments and a sum of squares for blocks adjusting for treatments can be ignored.

Because of the insight provided by fitting parameters in stages, and of its connection with the procedure known in the older literature as analysis of covariance, we now sketch an algebraic discussion, taking for simplicity a model in two stages and assuming formulations of full rank.

That is we start again from

E(Y ) = X0θ0.1 +X1θ1.0, (A.27)

where X_j is n × d_j and θ_{j.k} is d_j × 1. The matrix

R0 = I − X0(X0ᵀX0)⁻¹X0ᵀ (A.28)

is symmetric and idempotent, i.e. R0ᵀR0 = R0² = R0, and for any vector u of length n, R0u is the vector of residuals after regressing u on X0. The associated residual sum of squares is (R0u)ᵀ(R0u) = uᵀR0u and the associated residual sum of products for two vectors u and v is uᵀR0v. Further, for any vector u, we have X0ᵀR0u = 0. In the notation of (A.24), R0Y = Y.0 and R0X1 = X1.0.

We can rewrite model (A.27) in the form

E(Y ) = X0θ0 +R0X1θ1.0 (A.29)

where

θ0 = θ0.1 + (X0ᵀX0)⁻¹X0ᵀX1θ1.0. (A.30)

The least squares equations for the model (A.29) have the form

( X0ᵀX0      0       ) ( θ̂0   )   ( X0ᵀY   )
( 0         X1ᵀR0X1  ) ( θ̂1.0 ) = ( X1ᵀR0Y ), (A.31)

from which the following results can be deduced.

First, the parameters θ0 and θ1.0 are orthogonal with respect to the expected Fisher information in the model, and the associated vectors X0 and X1.0 are orthogonal in the usual sense.

Secondly, θ1.0 is estimated from a set of least squares equations formed from the matrix of sums of squares and products of the columns of X1 residual to their regression on X0, and the covariance matrix of θ̂1.0 is similarly determined.

Thirdly, the least squares estimate θ̂1.0 is obtained by adding to θ̂0 a function uncorrelated with it.

Continuing with the analysis of (A.29), we see that the residual sum of squares from the full model is obtained by reducing the residual sum of squares from the model ignoring θ1, i.e. YᵀR0Y, by the sum of squares of fitted values in the regression of R0Y on R0X1:

(Y − X0θ̂0 − R0X1θ̂1.0)ᵀ(Y − X0θ̂0 − R0X1θ̂1.0)
  = YᵀR0Y − θ̂1.0ᵀX1ᵀR0X1θ̂1.0. (A.32)

Under the normal theory assumption, an F test of the hypothesis that θ1.0 = 0 is obtained via the calculation of residual sums of squares from the full model and from one in which the term in θ1.0 is omitted.

It is sometimes convenient to display the results from stage by stage fitting in an analysis of variance table; an example with just two stages is outlined in Table A.1. In the models used in this book, the first section of the design matrix is always a column of 1's corresponding to fitting an overall mean. This is so rarely of interest that it is usually omitted from the analysis of variance table, so that the total sum of squares is (Y − Ȳ1)ᵀ(Y − Ȳ1) instead of YᵀY.

A.2.6 Redundant parameters

In the main text we often use formulations with redundant parameters, such as for the completely randomized and randomized block

Table A.1 Analysis of variance table emphasizing the fitting of model (A.27) in two stages.

Source                 D.f.          Sums of squares           Expected mean square
Regr. on X0            rank X0 = q   θ̂0ᵀX0ᵀX0θ̂0               σ² + θ0ᵀX0ᵀX0θ0/q
Regr. on X1,           p − q         θ̂1.0ᵀX1.0ᵀX1.0θ̂1.0       σ² + θ1.0ᵀX1.0ᵀX1.0θ1.0/(p − q)
  adj. for X0
Residual               n − p         (Y − Ŷ)ᵀ(Y − Ŷ)           σ²
Total                  n             YᵀY


designs

E(Y_js) = μ + τ_j, (A.33)
E(Y_js) = μ + τ_j + β_s. (A.34)

With v treatments and b blocks these models have respectively v + 1 and v + b + 1 parameters, although the corresponding X matrices in the standard formulation have ranks v and v + b − 1.

The least squares equations are algebraically consistent but do not have a unique solution. The geometry of least squares shows, however, that the vector of fitted values and the sums of squares for residual and for fitting X are unique. In fact any parameter of the form aᵀXθ has a unique least squares estimate.

From a theoretical viewpoint the most satisfactory approach is via the use of generalized inverses and the avoidance of explicit specification of constraints. The issue can, however, always be bypassed by redefining the model so that the column space of X remains the same but the number of parameters is reduced to eliminate the redundancy, i.e. to restore X to full rank. When the parameters involved are parameters of interest this is indeed desirable, in that the primary objective of the analysis is the estimation of, for example, treatment contrasts, so that formulation in terms of them has appeal despite a commonly occurring loss of symmetry.

While, as noted above, many aspects of the estimation problem do not depend on the particular reparameterization chosen, some care is needed. First, and most importantly, the constraint must not introduce an additional rank reduction in X; thus in (A.33) the constraint

μ + Στ_j = a (A.35)

for a given constant a would not serve for resolution of parameter redundancy.

In general suppose that the n × q matrix X is of rank q − r and impose constraints Aθ = 0, where A is r × q, chosen so that the equations

Xθ = k, Aθ = 0

uniquely determine θ for all constant vectors k. This requires that (Xᵀ, Aᵀ) is of full rank. The least squares projection equations


supplemented by the constraints give

( XᵀX   Aᵀ ) ( θ̂ )   ( XᵀY )
( A     0  ) ( λ ) = (  0  ), (A.36)

where λ is a vector of Lagrange multipliers used to introduce the constraint.

The matrix on the left-hand side is singular but is converted into a nonsingular matrix by changing XᵀX to XᵀX + AᵀA, which does not change the solution. Equations for the constrained estimates and their covariance matrix follow.
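The XᵀX + AᵀA device can be illustrated on the one-way layout (A.33) with v = 2 treatments and the constraint τ1 + τ2 = 0; the tiny solver and the data below are ad hoc illustrations, not part of the text.

```python
def solve(M, b):
    # Tiny Gaussian elimination with partial pivoting for small systems.
    n = len(b)
    A = [row[:] + [bv] for row, bv in zip(M, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    th = [0.0] * n
    for i in reversed(range(n)):
        th[i] = (A[i][n] - sum(A[i][j] * th[j] for j in range(i + 1, n))) / A[i][i]
    return th

# One-way layout, v = 2: E(Y) = mu + tau_j; X is 4 x 3 of rank 2.
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
Y = [3.0, 5.0, 10.0, 12.0]
A = [0, 1, 1]                      # constraint tau_1 + tau_2 = 0

XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
M = [[XtX[i][j] + A[i] * A[j] for j in range(3)] for i in range(3)]  # X^T X + A^T A
XtY = [sum(r[i] * yi for r, yi in zip(X, Y)) for i in range(3)]
theta = solve(M, XtY)              # (mu, tau1, tau2) = (7.5, -3.5, 3.5)
```

The fitted values μ̂ + τ̂_j recover the group means 4 and 11, as the geometry of least squares requires regardless of the constraint chosen.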

In many of the applications discussed in the book some components of the parameter vector correspond to qualitative levels of a factor, i.e. each level has a distinct component. As discussed above, constraints are needed if a model of full rank is to be achieved. Denote the unconstrained set of relevant components by ψ1, . . . , ψk; typical constraints are ψ1 = 0 and Σψ_j = 0, although others are possible. The objective is typically the estimation, with standard errors, of a set C of contrasts Σc_jψ_j with Σc_j = 0. If C is the set of simple contrasts with baseline, i.e. the estimation of ψ_j − ψ1 for j = 2, . . . , k, the resulting estimates of the constrained parameters and their standard errors are all that is needed. In general, however, the full covariance matrix of the constrained estimates will be required; for example if the constrained parameters are denoted by φ2, . . . , φk, estimation of ψ3 − ψ2 via φ̂3 − φ̂2 is direct, but the standard error requires cov(φ̂3, φ̂2) as well as the separate variances.

In the presentation of conclusions, and especially for moderately large k, the recording of a full covariance matrix may be inconvenient. This may often be avoided by the following device. We attach pseudo-variances ω1, . . . , ωk to estimates of so-called floating parameters a + ψ1, . . . , a + ψk, where a is an unspecified constant, in such a way that exactly or approximately

var(Σc_jψ_j) = var Σc_j(a + ψ_j) = Σω_jc_j² (A.37)

for all contrasts in the set C. We then treat the floating parameters as if independently estimated. In simple cases this reduces to specifying marginal means rather than contrasts.


A.2.7 Residual analysis and model adequacy

There is an extensive literature on the direct use of the residual vector, Y_res.X, for assessing model adequacy and detecting possible data anomalies. The simplest versions of these methods hinge on the notion that if the model is reasonably adequate the residual vector is approximately a set of independent and identically distributed random variables, so that structure detected in them is evidence of model inadequacy. This is broadly reasonable when the number of parameters fitted is small compared with the number of independent observations. In fact the covariance matrix of the residual vector is

σ²{I − X(XᵀX)⁻¹Xᵀ}, (A.38)

from which more refined calculations can be made.For many of the designs considered in this book, however, q is

not small compared with n and naive use of the residuals will bepotentially misleading.
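The content of (A.38) is easy to check numerically. The following sketch (Python with NumPy; the design matrix and dimensions are arbitrary illustrations, not from the book) forms the hat matrix and verifies that the residual covariance, taken with σ² = 1, is idempotent with trace n − q:

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 20, 3
X = rng.normal(size=(n, q))          # arbitrary full-rank design matrix

# Covariance matrix (A.38) of the residual vector, with sigma^2 = 1:
# I - X (X^T X)^{-1} X^T.
H = X @ np.linalg.inv(X.T @ X) @ X.T
cov_resid = np.eye(n) - H

# The residuals are only approximately i.i.d.: variances 1 - h_j differ
# and off-diagonal covariances are nonzero, but the trace is n - q.
assert np.isclose(np.trace(cov_resid), n - q)
assert np.allclose(cov_resid @ cov_resid, cov_resid)   # projection matrix
```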

Possible departures from the models can be classified roughly as systematic changes in structure, on the whole best detected and analysed by fitting extended models, and data anomalies, such as defective observations, best studied via direct inspection of the data, where these are not too extensive, and by appropriate functions of residuals. See the Bibliographic notes and Further results and exercises.

A.3 Analysis of variance

A.3.1 Preliminaries

In the previous section analysis of variance was introduced in the context of the linear model as a schematic way first of calculating the residual sum of squares as a basis for estimating residual variance and then as a device for testing a null hypothesis constraining the parameter vector of the linear model to a subspace. There are, however, other ways of thinking about analysis of variance.

The first corresponds in a sense to the most literal meaning of analysis. Suppose that an observed random variable is in fact the sum of two unobserved (latent) variables, so that in the simplest case, in which systematic effects are absent, we can write

Y = µ+ ξ + η, (A.39)


where µ is the unknown mean and ξ and η are uncorrelated random variables of zero mean and variances respectively σ²ξ, σ²η; these are called components of variance. Then

var(Y) = σ²ξ + σ²η,    (A.40)

with obvious generalization to more than two components of variability.

If now we observe a number of different random variables with this general structure it may be possible to estimate the components of variance σ²ξ, σ²η separately and in this sense to have achieved a splitting up, i.e. analysis, of variance. The two simplest cases are where we have repeat observations on a number of groups of observations Yaj, say, with

Yaj = µ + ξa + ηaj    (A.41)

for a = 1, . . . , A; j = 1, . . . , J, where it is convenient to depart briefly from our general convention of restricting upper case letters to random variables.

The formulation presupposes that the variation between groups, as well as that within groups, is regarded as best represented by random variables; it would thus not be appropriate if the groups represented different treatments.

A second possibility is that the observations are cross-classified by rows and columns in a rectangular array and where the observations Yab can be written

Yab = µ + ξa + ηb + εab,    (A.42)

where the component random variables are now interpreted as random row and column effects and as error, respectively.

It can be shown that in these and similar situations, with appropriate definitions, the components of variance can be estimated via a suitable analysis of variance table. Moreover we can then, via the complementary technique of synthesis of variance, estimate the properties of systems in which either the variance components have been modified in some way, or in which the structure of the data is different, the variance components having remained the same.

For example, under model (A.41) the variance of the overall mean of the data, considered as an estimate of µ, is easily seen to be

σ²ξ/A + σ²η/(AJ).

We can thus estimate the effect on the precision of the estimate of µ of, say, increasing or decreasing J, or of improvements in measurement technique leading to a reduction in σ²η.
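The estimation of the two variance components from a one-way analysis of variance can be illustrated by a short method-of-moments sketch; the simulation below is hypothetical, with group counts and component values chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
A, J = 8, 5                      # groups and repeats per group
sigma_xi, sigma_eta = 2.0, 1.0   # illustrative component values

# Simulate Y_aj = mu + xi_a + eta_aj under model (A.41).
mu = 10.0
xi = rng.normal(0, sigma_xi, size=A)
Y = mu + xi[:, None] + rng.normal(0, sigma_eta, size=(A, J))

group_means = Y.mean(axis=1)
# Within-group mean square estimates sigma_eta^2.
ms_within = ((Y - group_means[:, None]) ** 2).sum() / (A * (J - 1))
# Between-group mean square has expectation sigma_eta^2 + J * sigma_xi^2.
ms_between = J * ((group_means - Y.mean()) ** 2).sum() / (A - 1)

sigma_eta2_hat = ms_within
sigma_xi2_hat = (ms_between - ms_within) / J
# Estimated variance of the overall mean: sigma_xi^2/A + sigma_eta^2/(A J).
var_mean_hat = sigma_xi2_hat / A + sigma_eta2_hat / (A * J)
```

The moment estimate of σ²ξ can occasionally be negative in small samples, one motivation for the REML approach mentioned later in this appendix.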

A.3.2 Cross-classification and nesting

We now give some general remarks on the formal role of analysis of variance in describing relatively complicated structures of data. For this we consider data classified in general in a multi-dimensional array. Later it will be crucial to distinguish classification based on a treatment applied to the units from that arising from the intrinsic structure of the units, but for the moment we do not make that distinction.

A fundamental distinction is between cross-classification and nesting. Thus in the simplest case we may have an array of observations, which we shall now denote by Ya;j, in which the labelling of the repeat observations for each a is essentially arbitrary, i.e. j = 1 at a = 1 has no meaningful connection with j = 1 at a = 2. We say that the second suffix is nested within the first. By contrast we may have an arrangement Yab, which can be thought of as a row by column A × B array in which, say, the column labelling retains the same meaning for each row a and vice versa. We say that rows are crossed with columns.

Corresponding to these structures we may decompose the data vector into components. First, for the nested arrangement, we have that

Ya;j = Y.. + (Ya. − Y..) + (Ya;j − Ya.). (A.43)

This is to be contrasted with, for the cross-classification, the decomposition

Yab = Y.. + (Ya. − Y..) + (Y.b − Y..) + (Yab − Ya. − Y.b + Y..).    (A.44)

As usual averaging over a suffix is denoted by a full-stop. Now in these decompositions the terms on the right-hand sides, considered as defining vectors, are mutually orthogonal, leading to familiar decompositions of the sums of squares. Moreover there is a corresponding decomposition of the dimensionalities, or degrees of freedom, of the spaces in which these vectors lie, namely for the nested case

AJ = 1 + (A − 1) + A(J − 1)


and in the cross-classified case

AB = 1 + (A− 1) + (B − 1) + (A− 1)(B − 1).
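The orthogonality of the components in (A.44), and hence the additive decomposition of the total sum of squares, can be checked numerically; the following sketch uses an arbitrary 4 × 3 array:

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = 4, 3
Y = rng.normal(size=(A, B))

mean = Y.mean()
row = Y.mean(axis=1, keepdims=True) - mean
col = Y.mean(axis=0, keepdims=True) - mean
resid = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + mean

# The four components of (A.44), each broadcast to the full A x B array.
parts = [np.full((A, B), mean), np.broadcast_to(row, (A, B)),
         np.broadcast_to(col, (A, B)), resid]

# Mutual orthogonality of the component vectors ...
for i in range(4):
    for j in range(i + 1, 4):
        assert abs((parts[i] * parts[j]).sum()) < 1e-10

# ... gives the familiar decomposition of the total sum of squares.
ss_total = (Y ** 2).sum()
assert abs(ss_total - sum((p ** 2).sum() for p in parts)) < 1e-10
```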

Note that if we were mistakenly to treat the suffix j as if it were a meaningful basis for cross-classification we would be decomposing the third term in the analysis of the data vector, the sum of squares and the degrees of freedom, into the third and fourth components of the crossed analysis. In general, if a nested suffix is converted into a crossed suffix, variation within nested levels is typically converted into a main effect and one or more interaction terms.

The skeleton analysis of variance tables corresponding to these structures, with degrees of freedom but without sums of squares, are given in Table A.2.

There are many possibilities for extension, still keeping to balanced arrangements. For example, Yabc;j denotes an arrangement in which observations, perhaps corresponding to replicate determinations, are nested within each cell of an A × B × C cross-classification, whereas observations Y(a;j)bc have within each level of the first classification a number of sublevels which are all then crossed with the levels of the other classifications. The skeleton analysis of variance tables for these two settings are given in Tables A.3 and A.4. Note that in the second analysis the final residual could be further decomposed.

These decompositions may initially be regarded as concise descriptions of the data structure. Note that no probabilistic considerations have been explicitly involved. In thinking about relatively

Table A.2 Skeleton analysis of variance table for nested and crossed structures.

Nested                            Crossed
Source            D.f.            Source            D.f.
Mean              1               Mean              1
A-class (groups)  A − 1           A-class (rows)    A − 1
Within groups     A(J − 1)        B-class (cols)    B − 1
                                  A × B             (A − 1)(B − 1)
Total             AJ              Total             AB


Table A.3 Skeleton analysis of variance for Yabc;j.

Source        D.f.
Mean          1
A             A − 1
B             B − 1
C             C − 1
B × C         (B − 1)(C − 1)
C × A         (C − 1)(A − 1)
A × B         (A − 1)(B − 1)
A × B × C     (A − 1)(B − 1)(C − 1)
Within cells  ABC(J − 1)
Total         ABCJ

Table A.4 Skeleton analysis of variance for Y(a;j)bc.

Source        D.f.
Mean          1
A             A − 1
Within A      A(J − 1)
B             B − 1
C             C − 1
B × C         (B − 1)(C − 1)
C × A         (C − 1)(A − 1)
A × B         (A − 1)(B − 1)
A × B × C     (A − 1)(B − 1)(C − 1)
Residual      A(BC − 1)(J − 1)
Total         ABCJ

complex arrangements it is often essential to establish which features are to be regarded as crossed and which as nested.

In terms of modelling it may then be useful to convert the data decomposition into a model in which the parameters associated with nested suffixes correspond to random effects, differing levels of nesting corresponding to different variance components. Also


it will sometimes be necessary to regard the highest order interactions as corresponding to random variables, in particular when one of the interacting factors represents grouping of the units made without specific subject-matter interpretation. Further, all or most purely cross-classified suffixes correspond to parameters describing the systematic structure, either parameters of direct interest or characterizations of block and similar effects. For example, the model corresponding to the skeleton analysis of variance in Table A.4 is

Y(a;j)bc = τAa + ηj(a) + τBb + τCc + τBCbc + τABab + τACac + τABCabc + εj(abc).

For normal theory linear models with balanced data and a single level of nesting the resulting model is a standard linear model, and the decomposition of the data and the sums of squares have a direct use in terms of standard least squares estimation and testing. With balanced data and several levels of nesting, hierarchical error structures are involved.

Although we have restricted attention to balanced arrangements and normal error, the procedure outlined here suggests a systematic approach in more complex problems. We may have data unbalanced because of missing combinations or unequal replication. Further, the simplest appropriate model may not be a normal theory linear model but, for example, a linear logistic or linear probit model for binary response data. The following procedure may nevertheless be helpful.

First write down the formal analysis of variance table for the nearest balanced structure.

Next consider the corresponding normal theory linear model. If it has a single error term, the corresponding linear model for unbalanced data can be analysed by least squares, and the corresponding generalized linear model, for example for binary data, analysed by the method of maximum likelihood. Of course, even in the normal theory case the lack of balance will mean that the sums of squares in the analysis of variance decomposition must be interpreted in light of the other terms in the model. If the model has multiple error terms the usual normal theory analysis uses so-called residual maximum likelihood, or REML, for inference on the variance components. This involves constructing the marginal likelihood for the residuals after estimating the parameters in the mean. The corresponding analysis for generalized linear models will involve some special approximations.


The importance of these remarks lies in the need to have a systematic approach to developing models for complex data and for techniques of analysis when normal theory linear models are largely inapplicable.

A.4 More general models; maximum likelihood

We have, partly for simplicity of exposition and partly because of their immediate practical relevance, emphasized analyses for continuous responses based directly or indirectly on the normal theory linear model. Other types of response, in particular binary, ordinal or nominal, arise in many fields. Broadly speaking, the use of standard designs, for example of the factorial type, will usually be sensible for such situations, although formal optimality considerations will depend on the particular model appropriate for analysis, except perhaps locally near a null hypothesis; see Section 7.6.

For models other than the normal theory linear model, formal methods of analysis based on the log likelihood function or some modification thereof provide analyses serving essentially the same role as those available in the simpler situation. Thus confidence intervals based on the profile likelihood or one of its modifications are the preferred analogue of intervals based on the Student t distribution, and likelihood ratio tests the analogue of tests for sets of parameters using the F distribution. We shall not review these further here.

A.5 Bibliographic notes

The method of least squares as applied to a linear model has a long history and an enormous literature. For the history, see Stigler (1986) and Hald (1998). Nearly but not quite all books on design of experiments, virtually all those on regression and analysis of variance, and most books on general statistical theory have discussions of least squares and the associated distribution theory, the mathematical level and style of treatment varying substantially between books. The geometrical treatment of the distribution theory is perhaps implicit in comments of R. A. Fisher, and is certainly a natural development from his treatment of distributional problems. It was explicitly formulated by Bartlett (1933) and by Kruskal (1961), and in some detail in University of North Carolina lecture notes, unpublished so far as we know, by R. C. Bose.


Analysis of variance, first introduced, in fact in the context of a nonlinear model, by Fisher and Mackenzie (1923), is often presented as an outgrowth of linear model analysis and in particular either essentially as an algorithm for computing residual sums of squares or as a way of testing (usually uninteresting) null hypotheses. This is only one aspect of analysis of variance. An older and in our view more important role is in clarifying the structure of sets of data, especially relatively complicated mixtures of crossed and nested data. This indicates what contrasts can be estimated and the relevant basis for estimating error. From this viewpoint the analysis of variance table comes first, then the linear model, not vice versa. See Nelder (1965a, b) for a systematic formulation and, from a very much more mathematical viewpoint, Speed (1987). Brien and Payne (1999) describe some further developments. The main systematic treatment of the theory of analysis of variance remains the book of Scheffé (1959).

The method of fitting in stages is implicit in Gauss's treatment; the discussion in Draper and Smith (1998, Chapter 2) is helpful. The connection with analysis of covariance goes back to the introduction of that technique; for a broad review of analysis of covariance, see Cox and McCullagh (1982). The relation between least squares theory and asymptotic theory is discussed by van der Vaart (1998).

The approach to fitting models of complex structure to generalized linear models is discussed in detail in McCullagh and Nelder (1989).

For techniques for assessing the adequacy of linear models, see Atkinson (1985) and Cook and Weisberg (1982).

For discussion of floating parameters and pseudo-variances, see Ridout (1989), Easton et al. (1991), Reeves (1991) and Firth and Menezes (2000).

A.6 Further results and exercises

1. Show that the matrices X(XTX)−1XT and I − X(XTX)−1XT are idempotent, i.e. equal to their own squares, and give the geometrical interpretation. The first is sometimes called the hat matrix because of its role in forming the vector of fitted values Ŷ.
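A quick numerical check of the idempotency claimed in the exercise; the design matrix and its dimensions are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))          # an arbitrary full-rank design matrix

# Hat matrix H = X (X^T X)^{-1} X^T and its complement I - H.
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(10) - H

# Both are idempotent: projecting twice is the same as projecting once.
assert np.allclose(H @ H, H)
assert np.allclose(M @ M, M)

# H projects onto the column space of X: fitted values are H @ y.
y = rng.normal(size=10)
assert np.allclose(H @ y, X @ np.linalg.lstsq(X, y, rcond=None)[0])
```

Geometrically, H and I − H are orthogonal projections onto the column space of X and its orthogonal complement.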

2. If the covariance matrix of Y is V σ², where V is a known positive definite matrix, show that the geometrical interpretation of Section A.2.3 is preserved when norm and scalar product are defined by

‖Y‖² = YTV−1Y,   (Y1, Y2) = Y1TV−1Y2.

Show that this leads to the generalized least squares estimating equation

XTV−1(Y − Xθ) = 0.

Explain why these are appropriate definitions statistically and geometrically.
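A minimal numerical sketch of the generalized least squares solution; the AR(1)-type V, the dimensions and the parameter values are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 12, 2
X = rng.normal(size=(n, q))
theta = np.array([1.5, -0.5])

# A known positive definite covariance structure V (AR(1)-like, illustrative).
V = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(V)
Y = X @ theta + L @ rng.normal(size=n)

# Generalized least squares: solve X^T V^{-1} (Y - X theta) = 0 for theta.
Vinv = np.linalg.inv(V)
theta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)

# The estimating equation is satisfied at the solution.
assert np.allclose(X.T @ Vinv @ (Y - X @ theta_gls), 0)
```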

3. Verify by direct calculation that in the least squares analysis of a completely randomized design essentially equivalent answers are obtained whatever admissible constraint is imposed on the treatment parameters.

4. Denote the jth diagonal element of X(XTX)−1XT by hj, called the leverage of the corresponding response value. Show that 0 < hj < 1 and that Σhj = q, where q is the rank of XTX. Show that the variance of the corresponding component of the residual vector, Yres,j, is σ²(1 − hj), leading to the definition of the jth standardized residual as rj = Yres,j/{s√(1 − hj)}, where s² is the residual mean square. Show that the difference between Yj and the predicted mean for Yj after omitting the jth value from the analysis, xTj β̂(j), divided by the estimated standard error of the difference (again omitting Yj from the analysis) is

rj(n − q − 1)1/2/(n − q − rj²)1/2

and may be helpful for examining for possible outliers. For some purposes it is more relevant to consider the influence of specific observations on particular parameters of interest. See Atkinson (1985) and Cook and Weisberg (1982, Ch. 2).
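The leverages and standardized residuals of the exercise can be computed directly; the design below is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 15, 3
X = rng.normal(size=(n, q))
Y = X @ np.ones(q) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                       # leverages h_j

# Properties from the exercise: 0 < h_j < 1 and sum h_j = q.
assert np.all((h > 0) & (h < 1))
assert np.isclose(h.sum(), q)

resid = Y - H @ Y
s2 = (resid ** 2).sum() / (n - q)    # residual mean square
r = resid / np.sqrt(s2 * (1 - h))    # standardized residuals

# Externally studentized version, r_j (n-q-1)^{1/2} / (n-q-r_j^2)^{1/2}.
t = r * np.sqrt((n - q - 1) / (n - q - r ** 2))
```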

5. Suggest a procedure for detecting a single defective observation in a single Latin square design. Test the procedure by simulation on a 4 × 4 and an 8 × 8 square.

6. Develop the intra-block analysis of a balanced incomplete block design via the method of fitting parameters in stages.

7. Consider a v × v Latin square design with a single baseline variable z. It is required to fit the standard Latin square model augmented by linear regression on z. Show by the method of fitting parameters in stages that this is achieved by the following


construction. Write down the standard Latin square analysis of variance table for the response Y and the analogous forms for the sum of products of Y with z and the sum of squares of z. Let RY Y, RzY, Rzz and TY Y, TzY, Tzz denote respectively the residual and treatment lines for these three analysis of variance tables. Then

(a) the estimated regression coefficient of Y on z is RzY/Rzz;

(b) the residual mean square of Y in the full analysis is (RY Y − R²zY/Rzz)/(v² − 3v + 1);

(c) the treatment effects are estimated by applying an adjustment proportional to RzY/Rzz to the simple treatment effects estimated from Y in an analysis ignoring z;

(d) the adjustment is uncorrelated with the simple unadjusted effects so that the standard error of an adjusted estimate is easily calculated;

(e) an F test for the nullity of all treatment effects is obtained by comparing with the above residual mean square the sum of squares with v − 1 degrees of freedom

TY Y + RY Y − (TzY + RzY)²/(Tzz + Rzz).

How would treatment by z interaction be tested?

8. Recall the definition of a sum of squares with one degree of

freedom associated with a contrast lTY = ΣljYj, with Σlj = 0, as (lTY)²/(lTl). Show that in the context of the linear model E(Y) = Xθ, the contrast is a function only of the residual vector if and only if lTX = 0. In this case show that under the normal theory assumption l can be taken to be any function of the fitted values Ŷ and the distribution of the contrast will remain σ² times chi-squared with one degree of freedom.

For a randomized block design the contrast ΣljsYjs = Σljs(Yjs − Ŷjs) with ljs = (Yj. − Y..)(Y.s − Y..) was suggested by Tukey (1949) as a means of checking deviations from additivity of the form

E(Yjs) = µ + τj + βs + γ(τjβs).


APPENDIX B

Some algebra

B.1 Introduction

Some specialized aspects of the design of experiments, especially the construction of arrangements with special properties, have links with problems of general combinatorial interest. This is not a topic we have emphasized in the book, but in this Appendix we review in outline some of the algebra involved. The discussion is in a number of sections which can be read largely independently. One objective of this Appendix is to introduce some key algebraic ideas needed to approach some of the more specialized literature.

B.2 Group theory

B.2.1 Definition

A group is a set G of elements a, b, . . . and a rule of combination such that

G1 for each ordered pair a, b ∈ G, there is defined a unique element ab ∈ G;
G2 (ab)c = a(bc);
G3 there exists e ∈ G such that ea = a for all a ∈ G;
G4 for each a ∈ G, there exists a−1 ∈ G such that a−1a = e.

Many properties follow directly from G1–G4. Thus ae = ea, aa−1 = a−1a = e, and e, a−1 are unique. Also ax = ay implies x = y.

If ab = ba for all a, b ∈ G, then G is called commutative (or Abelian). If it has a finite number n of elements we call it a finite group of order n. All the groups considered in the present context are finite.

A subset of elements of G forming a group under the law of combination of G is called a subgroup of G.

Starting with a subgroup S of order r we can generate the group G by multiplying the elements of S, say on the left, by new elements


to form what are called the cosets of S. This construction is used repeatedly in the theory of fractional replication and confounding.

We could equally denote the law of combination by +, but by convention we restrict + to commutative groups and then denote the identity by 0.

Examples

1. all integers under addition (infinite group)

2. the cyclic group of order n, Cn(a). This is the set

1, a, a², . . . , an−1    (B.1)

with the rule aras = at, where t = (r + s) mod n. Alternatively this can be written as the additive group of least positive residues mod n, G+n = {0, 1, . . . , n − 1}, with the rule r + s = t, where t = (r + s) mod n. Clearly Cn and G+n are essentially the same group. They are said to be isomorphic; the elements of the two groups can be placed in 1-1 correspondence in a way preserving the group operation.
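The realisation of Cn as residues mod n can be made concrete by checking the axioms G1–G4 by enumeration; a small sketch for n = 6:

```python
# The cyclic group C_n realised as the residues {0, ..., n-1} under
# addition mod n, with the axioms G1-G4 checked by brute force for n = 6.
n = 6
elements = list(range(n))
op = lambda r, s: (r + s) % n

# G1: closure.  G2: associativity.
for a in elements:
    for b in elements:
        assert op(a, b) in elements
        for c in elements:
            assert op(op(a, b), c) == op(a, op(b, c))

# G3: identity 0.  G4: the inverse of a is (n - a) mod n.
for a in elements:
    assert op(0, a) == a
    assert op((n - a) % n, a) == 0
```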

B.2.2 Prime power commutative groups

Let p be prime. Build up groups from Cp(a), Cp(b) as follows:

1       a       a²       . . .   ap−1
b       ab      a²b      . . .   ap−1b
. . .
bp−1    abp−1   a²bp−1   . . .   ap−1bp−1        (B.2)

This set of p² symbols forms a commutative group if we define (aibj)(akbl) = ai+kbj+l, reducing mod p where necessary. We call this group Cp(a, b). Similarly, with the symbols a, b, . . . , d we define Cp(a, b, . . . , d) to be the set of all powers aibj . . . dk, with indices between 0, . . . , p − 1. The group is called a prime power commutative group of order pm and a, b, . . . , d are called the generators.

Properties

1. A group is generated equally by any set of m independent elements.

2. Any subgroup is of order p, p², . . . and is generated by a suitable set of elements.


In the group Cp(a, b) enumerated above, the first line is a subgroup. The remaining lines are obtained by repeated multiplication by fixed elements and are thus the cosets of the subgroup.

B.2.3 Permutation groups

We now consider a different kind of finite group. Let each element of the group denote a permutation of the positions 1, 2, . . . , n. For example, with n = 4 the elements a and b might produce

a : 2, 4, 3, 1;
b : 4, 3, 2, 1.

We define ab by composition, i.e. by applying first b then a, to give in the above example

1, 3, 4, 2;

note that ba here gives

3, 1, 2, 4,

showing that in general composition of permutations is not commutative. A group of permutations is a set of permutations such that if a and b are in the set so too is ab; the unit element leaves all positions unchanged, and we require also the inclusion, with every a, of its inverse a−1 defined by aa−1 = e, i.e. by restoring the original positions.
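The composition rule and its non-commutativity can be checked directly with the n = 4 example from the text:

```python
# A permutation is stored as a tuple p with p[i] the image of position i+1.
a = (2, 4, 3, 1)
b = (4, 3, 2, 1)

def compose(p, q):
    """Return pq, i.e. apply q first and then p."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

assert compose(a, b) == (1, 3, 4, 2)   # ab, as in the text
assert compose(b, a) == (3, 1, 2, 4)   # ba differs: not commutative
```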

The simplest such group is the set of all possible permutations, called the symmetric group of order n, denoted by Sn; it has n! elements.

A group of transformations is called transitive if it contains at least one permutation sending position i into position j for all i, j. It is called doubly transitive if it contains a permutation sending any ordered pair i, j; i ≠ j, into any other ordered pair k, l; k ≠ l. It can be shown that if i ≠ j and k ≠ l then the number of permutations with the required property is the same for all i, j, k, l.

An important construction of a group of permutations is as follows. Divide the positions into b blocks each of k positions. Consider a group formed as follows. Take a permutation from the symmetric group Sb to permute the blocks. Then take permutations from b separate symmetric groups Sk to permute positions within blocks. The group formed by composing these permutations is called a wreath product.


B.2.4 Application to randomization theory

The most direct way of thinking about the randomization of a design is to consider the experimental units as given, labelled 1, . . . , n, say, and then a particular pattern of treatment allocation to be chosen at random out of some suitable set of arrangements. In this sense the units are fixed and the treatments randomized to the units. It is equivalent, however, to suppose that a treatment pattern is fixed and then the units allocated at random to that pattern.

For example, consider an experiment with two treatments, T and C, and six units 1, . . . , 6, a completely randomized design with equal replication being used. We may start with the design

T, T, T, C, C, C.

Next apply the symmetric group S6 of 6! permutations to the initial order 1, 2, 3, 4, 5, 6 of the six experimental units. This generates the set

1, 2, 3, 4, 5, 6
1, 2, 3, 4, 6, 5
...
6, 5, 4, 3, 2, 1

Then we choose one permutation at random out of that set as the specification of the design. In some respects this is a clumsy construction, but it has the advantage of making it clear that because the set of possible designs is invariant under any permutation in S6, so too must be the properties of the randomization distribution.

If instead we had used the matched pair design based on the pairs (1, 2), (3, 4), (5, 6), the initial design would have been

T, C, T, C, T, C.

The permutations would either have interchanged units within a pair, or interchanged units within a pair and pairs as a whole.

The first possibility gives

1, 2, 3, 4, 5, 6
1, 2, 3, 4, 6, 5
1, 2, 4, 3, 5, 6
1, 2, 4, 3, 6, 5
2, 1, 3, 4, 5, 6
2, 1, 3, 4, 6, 5
2, 1, 4, 3, 5, 6
2, 1, 4, 3, 6, 5

The second possibility, which would become especially relevant if a second set of treatments was to be imposed at the pair level, would involve also interchanging pairs to give, for example

3, 4, 1, 2, 5, 6
3, 4, 2, 1, 5, 6

etc. In the first case, the set of designs is invariant under the permutation group consisting of all possible transpositions of pairs. In the second a larger group is involved, in fact the wreath product as defined above. Again it follows that the randomization distributions involved are invariant under the appropriate group.
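The equivalence between randomizing treatments to units and permuting units against a fixed treatment pattern can be sketched as follows for the completely randomized design above (the seed is an arbitrary illustration):

```python
import random

random.seed(0)
pattern = ["T", "T", "T", "C", "C", "C"]   # fixed treatment pattern
units = [1, 2, 3, 4, 5, 6]

# Choosing one permutation of the units uniformly from S6 and pairing it
# with the fixed pattern is equivalent to randomizing treatments to units.
random.shuffle(units)
allocation = dict(zip(units, pattern))     # unit -> treatment

assert sorted(allocation) == [1, 2, 3, 4, 5, 6]
assert sorted(allocation.values()) == ["C", "C", "C", "T", "T", "T"]
```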

When the second moment theory of randomization is considered we are concerned only with the properties of linear and quadratic functions of the data. The arguments used in Sections 3.3 and 3.4 use invariance to simplify the randomization expectations involved. If in the present formulation the set of designs is invariant under a group G of permutations, so too are all expectations involving the unit constants ξ in the notation of Chapters 2 and 3.

There are essentially two uses of this formulation. One is in connection with complex designs where randomization has been used but it is not clear to what extent a randomization-based analysis is valid. Then clarification of the group of permutations that leaves the design invariant can resolve the issue.

Secondly, even in simple designs where second moment properties are considered, the maximal group of permutations may not be required; the key property is some version of double transitivity, because the focus of interest is the randomization expectation of quadratic forms. This leads to the notion of restricted randomization; it may be possible to label certain arrangements generated by the “full” group as objectionable and to find a restricted group of permutations having the right double transitivity properties to justify the standard analysis but excluding the objectionable arrangements. The explicit use of permutation groups is unnecessary for the relatively simple designs considered in this book but is essential for more complex possibilities.


B.3 Galois fields

B.3.1 Definition

The most important algebraic systems involving two operations, by convention represented as addition and multiplication, are known as fields, and we give a brief introduction to their properties.

A field is a set F of elements a, b, . . . such that for any pair a, b ∈ F there are defined unique a + b, a · b ∈ F such that

F1 under +, F is an additive group with identity 0;
F2 under ·, all elements of F except 0 form a commutative group;
F3 a · (b + c) = a · b + a · c.

Various properties follow from the axioms. In particular we have cancellation laws: a + b = a + c implies b = c; a · b = a · c and a ≠ 0 imply b = c. F2 implies the existence of a unit element.

If F contains a finite number n of elements it is called a Galois field of order n.

The key facts about Galois fields are:

1. Galois fields exist if and only if n is a prime power pm;
2. any two fields of order n are isomorphic, so that there exists essentially only one Galois field of order pm, which we denote GF(pm).

Fields of prime order, GF(p), may be defined as consisting of 0, 1, . . . , p − 1, defining addition and multiplication mod p. This construction satisfies F1 for all p, but F2 only for prime p.

We use GF(p) to construct finite fields of prime power order, GF(pm). These consist of all polynomials

a0 + a1x + . . . + am−1xm−1,

with ai ∈ GF(p). Obviously there are pm such expressions. Addition is defined as ordinary addition with reduction of the coefficients mod p. To define multiplication, we use an irreducible polynomial P(x) of degree m with coefficients in GF(p). That is, we take

P(x) = α0 + α1x + . . . + αmxm   (αm ≠ 0),    (B.3)

where P(x) is not a product, reducing mod p, of polynomials of lower degree. Such a P(x) always exists. The product of two elements in GF(pm) is the remainder of their ordinary product after division by P(x) and reduction of the coefficients mod p. It can be shown that this defines a field.


Example. For GF(2²) the elements are 0, 1, x, x + 1. An irreducible polynomial is P(x) = x² + x + 1; note that P(x) is not x² or x · (x + 1) or (x + 1)² = x² + 1. Then for example x · x is

x² = (x² + x + 1) − (x + 1) = −(x + 1) = x + 1,    (B.4)

after division by P(x) and because −1 = 1.

The field GF(pm) can alternatively be constructed using a power cycle from a primitive element, in which the powers x, x², . . . , xpm−1 are identified with the nonzero elements of the field. In the example above x¹ = x, x² = x + 1, x³ = 1. The powers x, x², . . . , xpm−1 contain each nonzero element of the field just once. The power cycle can be used to work back to the multiplication table; e.g. x · (x + 1) = x · x² = x³ = 1.

We define a nonzero member a of the field to be a quadratic residue if it is the square of another member of the field. It can be seen that quadratic residues are even powers of a primitive element of the field and that therefore the number of nonzero non-quadratic residues is the same as the number of quadratic residues. That is, if we define

χ(a) = 1 if a is a quadratic residue,
χ(a) = −1 if a ≠ 0 and a is not a quadratic residue,    (B.5)
χ(a) = 0 if a = 0,

then

Σχ(a) = 0,   χ(a)χ(b) = χ(ab),    (B.6)
Σjχ(j − i1)χ(j − i2) = −1   (i1 ≠ i2),    (B.7)

the last, a quasi-orthogonality relation, being useful in connection with the construction of Hadamard matrices.

B.3.2 Orthogonal sets of Latin squares

We now sketch the application of Galois fields to orthogonal Latin squares, adopting a rather more formal approach than that sketched in Section 4.1.3.

A set of n × n Latin squares such that any pair are orthogonal is called an orthogonal set, and an orthogonal set of n − 1 n × n Latin squares is called a complete orthogonal set.

The central result is that whenever n is a prime power p^m, a complete orthogonal set exists. To see this, number the rows, columns and letters by the elements of GF(p^m): u0 = 0, u1 = 1, u2, . . . , un−1. For each λ = 1, . . . , n − 1 define a Latin square Lλ by the rule: in row ux, column uy, put letter uλux + uy. Symbolically

Lλ : ux, uy : uλux + uy.  (B.8)

Then these are a complete orthogonal set. For

1. Lλ is a Latin square;

2. if λ ≠ λ′, Lλ and Lλ′ are orthogonal.

To prove 1., note that if the same letter occurs in row ux and columns uy and uy′, then

uλux + uy = uλux + uy′.  (B.9)

This implies that uy = uy′, because of the cancellation law in the additive group of GF(p^m).

Similarly if the same letter occurs in rows ux, ux′, and in column uy, then

uλux + uy = uλux′ + uy,
uλux = uλux′,
ux = ux′,

using both the addition and multiplication cancellation laws.

To prove 2., suppose that row ux, column uy contain the same pair of letters as row ux′, column uy′. Then

uλux + uy = uλux′ + uy′,
uλ′ux + uy = uλ′ux′ + uy′.

Therefore (uλ − uλ′)ux = (uλ − uλ′)ux′. Thus ux = ux′, since uλ − uλ′ ≠ 0. Similarly uy = uy′.

From this it follows that any square of the set can be derived from any other, in particular from L1, by a permutation of rows. For the ux row of Lλ is identical with the ux′ row of Lλ′ if and only if uλux = uλ′ux′. Further, this last equation has a unique solution for ux′, so that each row of Lλ occurs just once in Lλ′. (This result is not true for all complete orthogonal sets.)

A second consequence is that L1 is the addition table of GF(p^m). For it is given by the rule ux, uy : ux + uy. An example of two orthogonal 5 × 5 Latin squares is given in Table B.1.
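For prime n the construction (B.8) uses only arithmetic mod n, since GF(n) is then just the integers mod n; the complete orthogonal set is easy to generate and check. A sketch for n = 5 (function names are our own):

```python
# Build L_1, ..., L_{n-1} by rule (B.8): cell (x, y) of L_lam is lam*x + y mod n.
n = 5
L = {lam: [[(lam * x + y) % n for y in range(n)] for x in range(n)]
     for lam in range(1, n)}

def is_latin(sq):
    # Every symbol appears once in each row and once in each column.
    ok_rows = all(len(set(row)) == n for row in sq)
    ok_cols = all(len({sq[x][y] for x in range(n)}) == n for y in range(n))
    return ok_rows and ok_cols

def orthogonal(s1, s2):
    # Superimposing the squares gives every ordered pair exactly once.
    pairs = {(s1[x][y], s2[x][y]) for x in range(n) for y in range(n)}
    return len(pairs) == n * n

assert all(is_latin(L[lam]) for lam in L)
assert all(orthogonal(L[a], L[b]) for a in L for b in L if a < b)
```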

Let N(n) be the maximum possible number of squares in an orthogonal set of n × n squares. We have shown above that if n = p^m, then N(n) = n − 1. This can be extended to show that if

© 2000 by Chapman & Hall/CRC

Page 262: The Theory Of The Design Of Experiments [DOE]

Table B.1 The construction of two orthogonal 5 × 5 Latin squares.

               Column
          u0  u1  u2  u3  u4
Row u0=0   0   1   2   3   4
    u1=1   1   2   3   4   0
    u2=2   2   3   4   0   1
    u3=3   3   4   0   1   2
    u4=4   4   0   1   2   3

               L1

          u0  u1  u2  u3  u4
Row u0=0   0   1   2   3   4
    u1=1   2   3   4   0   1
    u2=2   4   0   1   2   3
    u3=3   1   2   3   4   0
    u4=4   3   4   0   1   2

               L2

n = p1^m1 p2^m2 . . . (p1 ≠ p2 ≠ . . .) and if rn = min(p1^m1, p2^m2, . . .), then N(n) ≥ rn − 1. Thus if n = 12, rn = min(2^2, 3) = 3 and there exist at least 3 − 1 = 2 orthogonal 12 × 12 squares. Fisher and Yates, and others, have shown that N(6) = 1, i.e. there is not even a Graeco-Latin square of size 6.

A longstanding conjecture of Euler was that N(n) = rn − 1, which would have implied that no Graeco-Latin square exists when n ≡ 2 (mod 4), and in particular that no 10 × 10 Graeco-Latin square exists. A pair of orthogonal 10 × 10 Latin squares was constructed in 1960, and it is now known that N(n) > 1 for n > 6, so that Graeco-Latin squares exist except when n = 6. Some bounds for N(n) are known, and N(n) → ∞ as n → ∞.

Notes

1. The full axioms for a field are not used in the above construction. It would be enough to have a linear associative algebra. This fact does lead to systems of squares essentially different from L1, . . . , Ln−1, but not to a solution when n ≠ p^m.

© 2000 by Chapman & Hall/CRC

Page 263: The Theory Of The Design Of Experiments [DOE]

2. Orthogonal partitions of Latin squares can be constructed: the following is an example derived from a 4 × 4 Graeco-Latin square.

   Bγ Dα Cβ Aδ      BII DI  CI  AII
   Cδ Aβ Bα Dγ      CII AI  BI  DII
   Aα Cγ Dδ Bβ      AI  CII DII BI
   Dβ Bδ Aγ Cα      DI  BII AII CI

The symbols I, II each occur twice in each row and column and twice in common with each letter. Orthogonal partitions exist for 6 × 6 squares.

3. Combinatorial properties of Latin squares are unaffected by interchanging the roles of rows, columns and letters.

B.4 Finite geometries

Closely associated with Galois fields are systems of finite numbers of "points" that with suitable definitions satisfy axioms of either Euclidean or projective geometry and are therefore reasonably called finite geometries. In an abstract approach we start with a system consisting of a finite number of points and a collection of lines, each line consisting of a set of points said to be collinear. Such a system is called a finite projective geometry PG(k, p^m) if it obeys the following axioms:

1. there is just one line through any pair of points;

2. if points A, B, C are not collinear and if a line l contains a point D on the line AB and a point E on the line BC, then it contains a point F on the line CA;

3. if points are called 0-spaces and lines 1-spaces, and if q-spaces are defined inductively, for example by defining 2-spaces as the set of points collinear with points on two given distinct lines, then for q < k not all points lie in the same q-space and there is no (k + 1)-space;

4. there are (p^m)^k + (p^m)^(k−1) + · · · + p^m + 1 distinct points.

It can be shown that such a system is isomorphic with the following construction. Let aj denote elements of GF(p^m). Then a point is identified by a set of homogeneous coordinates

(a0, a1, . . . , ak),

© 2000 by Chapman & Hall/CRC

Page 264: The Theory Of The Design Of Experiments [DOE]

not all zero. By the term homogeneous coordinates is meant that all sets of coordinates (aa0, aa1, . . . , aak), for a any nonzero element of the field, denote the same point. The line joining two points

(a0, a1, . . . , ak), (b0, b1, . . . , bk)

contains all points with coordinates

(λa0 + µb0, λa1 + µb1, . . . , λak + µbk), (B.10)

where λ, µ are elements of the field.

The axioms defining a field can be used to show that the requirements of a finite projective geometry are satisfied.

Now in "ordinary" geometry, Euclidean geometry is obtained from a corresponding projective geometry by deleting points at infinity. A similar construction is possible here. Take all those points with a0 ≠ 0; without loss of generality we can then set a0 to the unit element of the field and take the remaining points as defined by a unique set of k coordinates, a1, . . . , ak, say. This system is called a finite Euclidean geometry, EG(k, p^m). In effect the set of points with a0 = 0 plays the role of a point at infinity.

Many of the features of "ordinary" geometry, for example a duality principle in which (k − 1)-subspaces correspond to points and (k − 2)-subspaces to lines, can be mimicked in these systems.

In particular if m1 divides m, so that GF(p^m1) is a subfield of GF(p^m), we can derive a proper subgeometry PG(k, p^m1) within PG(k, p^m) by using only elements of the subfield.

As an example we consider PG(2, 2) contained within PG(2, 2^2). We start with the elements of the Galois field labelled 0, 1, x, x + 1 as above. The full system has 21 points with homogeneous coordinates as follows:

A: 0, 0, 1    B: 0, 1, 0    C: 0, 1, 1    D: 0, 1, x      E: 0, 1, x+1
F: 1, 0, 0    G: 1, 0, 1    H: 1, 0, x    I: 1, 0, x+1    J: 1, 1, 0
K: 1, 1, 1    L: 1, 1, x    M: 1, 1, x+1  N: 1, x, 0      O: 1, x, 1
P: 1, x, x    Q: 1, x, x+1  R: 1, x+1, 0  S: 1, x+1, 1    T: 1, x+1, x
U: 1, x+1, x+1

© 2000 by Chapman & Hall/CRC

Page 265: The Theory Of The Design Of Experiments [DOE]

The lines are formed from linear combinations of coordinates. Thus on the line AB lie also the points λA + µB for all choices of λ, µ from the nonzero elements of GF(2^2). This leads to the line containing just the points A, B, C, D, E.

The subgeometry PG(2, 2) is formed from the points whose coordinates involve only the elements 0, 1 of GF(2). These are the points A, B, C, F, G, J, K in the above specification, and when these are arranged in a rectangle with associated lines as columns we obtain the balanced incomplete block design with seven points (treatments) arranged in seven lines (blocks), with three points on each line, each pair of treatments occurring in the same line just once:

A B F C J K G
B F C J K G A
C J K G A B F
                (B.11)

A Euclidean geometry is formed from the points F, . . . , U, specifying each point by the second and third coordinates, for example M as (1, x + 1).
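The case PG(2, 2) can also be generated directly by a small search: the points are the nonzero binary triples, and three distinct points are collinear exactly when they sum to zero coordinate-wise mod 2. A sketch (the encoding is our own, illustrative only):

```python
from itertools import combinations

# Points of PG(2, 2): the 7 nonzero vectors over GF(2) (homogeneous coordinates).
points = [p for p in ((a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
          if p != (0, 0, 0)]

# Three distinct points are collinear iff they sum to zero coordinate-wise mod 2.
lines = [trio for trio in combinations(points, 3)
         if all(sum(col) % 2 == 0 for col in zip(*trio))]

assert len(points) == 7 and len(lines) == 7   # 7 points, 7 lines
assert all(len(l) == 3 for l in lines)        # 3 points per line
# Each pair of points lies on exactly one line: the BIBD of (B.11) with lambda = 1.
for p, q in combinations(points, 2):
    assert sum(p in l and q in l for l in lines) == 1
```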

B.5 Difference sets

A very convenient way of generating block, and more generally row by column, designs is by development from an initial block by repeated addition of 1. That is, if there are v treatments labelled 0, 1, . . . , v − 1, we define an initial block and then produce more blocks by successive addition of 1 and reduction mod v. For example, if v = 7 and we start with the initial block 1, 2, 4, then with the successive blocks, namely 2, 3, 5; 3, 4, 6; 4, 5, 0; 5, 6, 1; 6, 0, 2; 0, 1, 3, we have a balanced incomplete block design with v = b = 7, r = k = 3, λ = 1, different from that in (B.11).

The key to this construction is that in the initial block the differences between pairs of entries are 2 − 1 = 1, 1 − 2 = −1 = 6, 4 − 1 = 3, 1 − 4 = 4, 4 − 2 = 2, 2 − 4 = 5, so that each possible difference occurs just once. This implies that in the whole design each pair of treatments occurs together just once.
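Developing the initial block and checking the balance property can be sketched as follows (illustrative code, not from the text):

```python
from itertools import combinations

# Develop the initial block mod 7 and confirm the BIBD properties
# v = b = 7, r = k = 3, lambda = 1.
v, initial = 7, (1, 2, 4)
blocks = [tuple((t + shift) % v for t in initial) for shift in range(v)]

assert len(blocks) == 7
# Every unordered pair of treatments appears together in exactly one block.
for pair in combinations(range(v), 2):
    assert sum(set(pair) <= set(b) for b in blocks) == 1
# Each treatment is replicated r = 3 times.
assert all(sum(t in b for b in blocks) == 3 for t in range(v))
```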

There are connections between difference sets and Abelian groups and also with Galois fields. Thus it can be shown that for v = p^m = 4q + 1 there are two starting blocks with the desired properties, namely the set of nonzero quadratic residues of GF(p^m) and the set of nonzero nonquadratic residues. If v = p^m = 4q − 1 we take the nonzero quadratic residues.

B.6 Hadamard matrices

An n× n square matrix L is orthogonal by definition if

LTL = I, (B.12)

where I is the n × n identity matrix. The matrix L may be called orthogonal in the extended sense if

LTL = D, (B.13)

where D is a diagonal matrix with strictly positive elements. Such a matrix is formed from mutually orthogonal columns which are, however, not in general scaled to have unit norm. The columns of such a matrix can always be rescaled to produce an orthogonal matrix.

An n × n matrix H is called a Hadamard matrix if its first column consists only of elements +1, if its remaining elements are +1 or −1, and if it is orthogonal in the extended sense.

For such a matrix to exist, apart from the trivial cases n = 1 and n = 2, n must be a multiple of 4. It has been shown that such matrices indeed exist for all multiples of 4 up to and including 424.

If for a prime p, 4t = p + 1, we may define a matrix by

hi0 = h0j = 1, hii = −1, hij = χ(j − i),  (B.14)

and the required orthogonality property follows from those of the quadratic residues. If p^m = 4t − 1 we proceed similarly, labelling the rows and columns by the elements of GF(p^m). The size of a Hadamard matrix H can always be doubled by the construction

( H   H )
( H  −H ).  (B.15)

The following is an 8 × 8 Hadamard matrix:

© 2000 by Chapman & Hall/CRC

Page 267: The Theory Of The Design Of Experiments [DOE]

 1  1  1  1  1  1  1  1
 1  1  1  1 −1 −1 −1 −1
 1  1 −1 −1  1  1 −1 −1
 1  1 −1 −1 −1 −1  1  1
 1 −1  1 −1  1 −1  1 −1
 1 −1  1 −1 −1  1 −1  1
 1 −1 −1  1  1 −1 −1  1
 1 −1 −1  1 −1  1  1 −1
                          (B.16)

This is used to define the treatment contrasts in a 2^3 factorial in Section 5.5 and a saturated main effect plan for a 2^7 factorial in Section 6.3.
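The doubling construction (B.15), started from the trivial 1 × 1 matrix H = (1), yields an 8 × 8 Hadamard matrix of the type displayed in (B.16); a sketch (function name is ours):

```python
# Sylvester doubling (B.15): replace H by the block matrix (H, H; H, -H).
def double(H):
    top = [row + row for row in H]
    bottom = [row + [-x for x in row] for row in H]
    return top + bottom

H = [[1]]
for _ in range(3):       # three doublings: 1 -> 2 -> 4 -> 8
    H = double(H)

n = len(H)
assert n == 8 and all(H[i][0] == 1 for i in range(n))   # first column all +1
# Extended-sense orthogonality: distinct columns have zero inner product.
for j in range(n):
    for k in range(j + 1, n):
        assert sum(H[i][j] * H[i][k] for i in range(n)) == 0
```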

B.7 Orthogonal arrays

In Section 6.3 we briefly described orthogonal arrays, which from one point of view are generalizations of fractional factorial designs. The construction of orthogonal arrays is based on the algebra of finite fields. Orthogonal arrays can be constructed from Hadamard matrices, as illustrated in Section 6.3, and can also be constructed from Galois fields, from difference schemes, and from sets of orthogonal Latin squares.

A symmetric orthogonal array of size n with k columns has s symbols in each column, and has strength r if every n × r subarray contains each r-tuple of symbols the same number of times. Suppose s = p^m is a prime power. Then an orthogonal array with n = s^l rows and (s^l − 1)/(s − 1) columns that has strength 2 can be constructed as follows: form an l × (s^l − 1)/(s − 1) matrix whose columns are all nonzero l-tuples from GF(s) in which the first nonzero element is 1. All linear combinations of the rows of this generator matrix form an orthogonal array of the required size. This is known as the Rao-Hamming construction.

For example, with s = 2 and l = 3 and generator matrix

1 0 0 1 1 0 1
0 1 0 1 0 1 1
0 0 1 0 1 1 1

© 2000 by Chapman & Hall/CRC

Page 268: The Theory Of The Design Of Experiments [DOE]

we obtain the 8 × 7 orthogonal array of strength 2:

0 0 0 0 0 0 0
1 0 0 1 1 0 1
0 1 0 1 0 1 1
0 0 1 0 1 1 1
1 1 0 0 1 1 0
1 0 1 1 0 1 0
0 1 1 1 1 0 0
1 1 1 0 0 0 1
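The Rao-Hamming construction for s = 2, l = 3 and the strength-2 property can be checked directly; an illustrative sketch:

```python
from itertools import product

# Rows of the array are all GF(2) linear combinations of the generator rows.
G = [[1, 0, 0, 1, 1, 0, 1],
     [0, 1, 0, 1, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 1]]

rows = []
for c in product((0, 1), repeat=3):
    rows.append([sum(c[i] * G[i][j] for i in range(3)) % 2 for j in range(7)])

assert len(rows) == 8
# Strength 2: in every pair of distinct columns each of the 4 ordered pairs
# of symbols occurs equally often, here exactly twice.
for j in range(7):
    for k in range(7):
        if j == k:
            continue
        counts = {}
        for r in rows:
            counts[(r[j], r[k])] = counts.get((r[j], r[k]), 0) + 1
        assert all(counts.get(p, 0) == 2 for p in product((0, 1), repeat=2))
```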

Orthogonal arrays can also be constructed from error-correcting codes, by associating to each codeword a row of an orthogonal array. In the next section we illustrate the construction of codes from some of the designs considered in this book.

B.8 Coding theory

The combinatorial considerations involved in the design of experiments, in particular those associated with orthogonal Latin squares and balanced incomplete block designs, have other applications, notably to the theory of error-detecting and error-correcting codes.

For example, suppose that q ≥ 3, q ≠ 6, so that a q × q Graeco-Latin square exists. Then we may use an alphabet of q letters 0, 1, . . . , q − 1 to assign each of q^2 codewords a code of four symbols, by labelling the codewords (i, j) for i, j = 0, . . . , q − 1 and then assigning codeword (i, j) the code

i j αij βij,  (B.17)

where αij and βij refer to the Latin and Greek letters in row i and column j, translated onto 0, . . . , q − 1 in the obvious way.

For example with q = 3 we obtain the following:

Codeword  00    01    02    10    11    12    20    21    22
Code      0000  0111  0222  1012  1120  1201  2021  2102  2210

It can be checked in this example, and indeed in general from the properties of the Graeco-Latin square, that the codes for any two codewords differ in at least three symbols. This implies that two errors in coding can be detected and one error corrected, the last by moving to the codeword nearest to the transmitted word. In the same way, if q = p^m, we can, by labelling the codewords via the elements of GF(p^m) and using the complete set of mutually orthogonal q × q Latin squares, obtain a coding of q^2 codewords with q + 1 symbols per codeword and with very strong error-detecting and error-correcting properties.
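For q = 3 the Latin and Greek squares can be taken from rule (B.8) with λ = 1 and λ = 2, which reproduces the table above; a sketch checking the minimum distance:

```python
from itertools import product

# Codes (B.17) from a 3 x 3 Graeco-Latin pair: alpha_ij = i + j mod 3 (Latin),
# beta_ij = 2i + j mod 3 (Greek).
q = 3
codes = {(i, j): (i, j, (i + j) % q, (2 * i + j) % q)
         for i, j in product(range(q), repeat=2)}

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

assert codes[(1, 0)] == (1, 0, 1, 2)   # matches the entry 1012 in the table
words = list(codes.values())
assert len(words) == q * q
# Any two codewords differ in at least three of the four symbols.
assert min(hamming(u, v) for u in words for v in words if u != v) == 3
```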

Symmetrical balanced incomplete block designs with b = v can be used to derive binary codes in a rather different way. Add to the incidence matrix of the design a row of 0's and below this the matrix with 0's and 1's interchanged, thus producing a (2v + 2) × v matrix coding 2v + 2 codewords with v symbols per codeword. Thus with b = v = 7, r = k = 3, λ = 1, sixteen codewords are each assigned seven binary symbols. Again any two codewords differ in at least three symbols and the error-detecting and error-correcting properties are as before.
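A sketch of this construction for the design with v = b = 7 developed from the initial block 1, 2, 4 of Section B.5 (our own illustrative implementation):

```python
# Binary code from a symmetric BIBD (b = v = 7, k = 3, lambda = 1):
# the all-0 word, the 7 incidence rows, and their 0/1-interchanged versions
# (which include the all-1 word), 16 codewords in all.
v, initial = 7, (1, 2, 4)
blocks = [{(t + s) % v for t in initial} for s in range(v)]
incidence = [[int(t in b) for t in range(v)] for b in blocks]

code = ([[0] * v] + incidence
        + [[1 - x for x in row] for row in incidence] + [[1] * v])
assert len(code) == 16                     # 2v + 2 codewords of length v

def hamming(u, w):
    return sum(a != b for a, b in zip(u, w))

# Any two codewords differ in at least three symbols.
assert min(hamming(u, w) for u in code for w in code if u != w) == 3
```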

B.9 Bibliographic notes

Restricted randomization was introduced by Yates (1951a, b) to deal with a particular practical problem arising with a quasi-Latin square, i.e. a factorial experiment with double confounding in square form. The group theory justification is due to Grundy and Healy (1950). The method was rediscovered in a much simpler context by Youden (1956). See the general discussion by Bailey and Rowley (1987).

Galois fields were introduced into the study of sets of Latin squares by Bose (1938).

A beautiful account of finite groups and fields is by Carmichael (1937). Street and Street (1981) give a wide-ranging account of combinatorial problems connected with experimental design.

John and Williams (1995) give an extensive discussion of designs formed by cyclic generation.

For the first account of a 10 × 10 Graeco-Latin square, see Bose, Shrikhande and Parker (1960). For orthogonal partitions of Latin squares, see Finney (1945a).

For an introduction to coding theory establishing connections with experimental design, see Hill (1986). The existence and construction of orthogonal arrays with a view to their statistical applications is given by Hedayat, Sloane and Stufken (1999). The use of error-correcting codes to construct orthogonal arrays is the subject of their Chapter 5, and the Rao-Hamming construction outlined in Section B.7 is given in their Chapter 3.

There is an extensive specialized literature on all the topics in this Appendix.

B.10 Further results and exercises

1. For the group C2(a, b, c) = {1, a, b, ab, c, ac, bc, abc} write out the multiplication table and verify that the group is equally generated by (ab, c, bc). Enumerate all subgroups of C2(a, b, c).

2. Write out the multiplication table for GF(2^2).

3. Construct the addition and multiplication tables for GF(3^2), taking x^2 + x + 2 as the irreducible polynomial. (Verify that it is irreducible.) Verify the power cycle

x = x, x^2 = 2x + 1, x^3 = 2x + 2, x^4 = 2,
x^5 = 2x, x^6 = x + 2, x^7 = x + 1, x^8 = 1,

and conversely use the power cycle to derive the multiplication table.

4. Use the addition and multiplication tables for GF(3^2) to write down L1, L2 for the 9 × 9 set. Check that L2 can be obtained by permuting the rows of L1.

5. Construct a theory of orthogonal Latin cubes.

6. Count the number of lines and points in finite Euclidean and projective geometries.

© 2000 by Chapman & Hall/CRC

Page 271: The Theory Of The Design Of Experiments [DOE]

APPENDIX C

Computational issues

C.1 Introduction

There is a wide selection of statistical computing packages, and most of these provide the facility for analysis of variance and estimation of treatment contrasts in one form or another. With small data sets it is often straightforward, and very informative, to compute the contrasts of interest by hand. In 2^k factorial designs this is easily done using Yates's algorithm (Exercise 5.1).

The package GENSTAT is particularly well suited to analysis of complex balanced designs arising in agricultural applications. SAS is widely used in North America, partly for its capabilities in handling large databases. GLIM is very well suited to empirical model building by the successive addition or deletion of terms, and for analysis of non-normal models of exponential family form.

Because S-PLUS is probably the most flexible and powerful of the packages, we give here a very brief overview of the analysis of the more standard designs using S-PLUS, by providing code sufficient for the analysis of the main examples in the text. The reader needing an introduction to S-PLUS, or wishing to exploit its full capabilities, will need to consult one of the several specialized books on the topic. As with many packaged programs, the output from S-PLUS is typically not in a form suitable for the presentation of conclusions, an important aspect of analysis that we do not discuss.

We assume the reader is familiar with running S-PLUS on the system being used and with the basic structure of S-PLUS, including the use of .Data and CHAPTER directories (the latter introduced in S-PLUS 5.1 for Unix), and the use of objects and methods for objects. A dataset, a fitted regression model, and a residual plot are all examples of objects. Examples of methods for these objects are summary, plot and residuals. Many objects have several specific methods for them as well. The illustrations below use a command line version of S-PLUS such as is often used in a Unix environment.

© 2000 by Chapman & Hall/CRC

Page 272: The Theory Of The Design Of Experiments [DOE]

Most PC-based installations of S-PLUS also offer a menu-driven version.

C.2 Overview

C.2.1 Data entry

The typical data from the types of experiments we describe in this book take a single response or dependent variable at a time, several classification variables such as blocks, treatments, factors and so on, and possibly one or more continuous explanatory variables, such as baseline measurements. The dependent and explanatory variables will typically be entered from a terminal or file, using a version of the scan or read.table command. It will rarely be the case that the data set will contain fields corresponding to the various classification factors. These can usually be constructed using the rep command. All classification or factor variables must be explicitly declared to be so using the factor command.

Classification variables for several standard designs can be created by the fac.design command. It is usually convenient to collect these classification variables in a design object, which essentially contains all the information needed to construct the matrix for the associated linear model.

The collection of explanatory, baseline, and classification variables can be referred to in a variety of ways. The simplest, though in the long run most cumbersome, is to note that variables are automatically saved in the current working directory by the names they are assigned as they are read or created. In this case the data variables relevant to a particular analysis will nearly always be vectors with length equal to the number of responses. Alternatively, when the data file has a spreadsheet format with one row per case and one column per variable, it is often easy to store the dependent and explanatory variables as a matrix. The most flexible and ultimately powerful way to store the data is as a data.frame, which is essentially a matrix with rows corresponding to observations and columns corresponding to variables, and a provision for assigning names to the individual columns and rows.

In the first example below we illustrate these three methods of defining and referring to variables: as vectors, as a matrix, and as a data frame. In subsequent examples we always combine the variables in a data frame, usually using a design object for the explanatory variables.

As will be clear from the first example, one disadvantage of a data frame is that individual columns must be accessed by the slightly cumbersome form data.frame.name$variable.name. By using the command attach(data.frame,1), the data frame takes the place of the working directory on the search path, and any use of variable.name refers to that variable in the attached data frame. More details on the search path for variables are provided by Spector (1994).

C.2.2 Treatment means

The first step in an analysis is usually the construction of a table of treatment means. These can be obtained using the tapply command, illustrated in Section C.3 below. To obtain the mean response of y at each of several levels of x use tapply(y, x, mean). In most of our applications x will be a factor variable, but in any case the elements of x are used to define categories for the calculation of the mean. If x is a list then cross-classified means are computed; we use this in Section C.5. In Section C.3 we illustrate the use of tapply on a variable, on a matrix, and on a data frame.
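What tapply(y, x, mean) computes is simply a group-wise average; the following Python sketch (the function name and data values are ours, chosen for illustration) mirrors that behaviour for readers without S-PLUS:

```python
from collections import defaultdict

# Group the responses y by the level of the factor x and average within groups,
# the operation performed by tapply(y, x, mean) in S-PLUS.
def tapply_mean(y, x):
    groups = defaultdict(list)
    for value, level in zip(y, x):
        groups[level].append(value)
    return {level: sum(vals) / len(vals) for level, vals in sorted(groups.items())}

# Illustrative responses with a three-level factor.
y = [7.62, 8.14, 7.76, 8.00, 8.15, 7.73, 7.93, 7.87, 7.74]
x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
means = tapply_mean(y, x)
assert set(means) == {1, 2, 3}   # one mean per factor level
```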

A data frame that contains a design object or a number of factor variables has several specialized plotting methods, the most useful of which is interaction.plot. Curiously, a summary of means of a design object does not seem to be available, although these means are used by the plotting methods for design objects.

An analysis of variance will normally be used to provide estimated standard errors for the treatment means, using the aov command described in the next subsection. If the design is completely balanced, the model.tables command can be used on the result of an aov command to construct a table of means after an analysis of variance, and this, while in principle not a good idea, will sometimes be more convenient than constructing the table of means before fitting the analysis of variance. For unbalanced or incomplete designs, model.tables will give estimated effects, but they are not always properly adjusted for lack of orthogonality.

© 2000 by Chapman & Hall/CRC

Page 274: The Theory Of The Design Of Experiments [DOE]

C.2.3 Analysis of variance

Analysis of variance is carried out using the aov command, which is a specialization of the lm command used to fit a linear model. The summary and plot methods for aov are designed to provide the information most often needed when analysing these kinds of data.

The input to the aov command is a response variable and a model formula. S-PLUS has a powerful and flexible modelling language which we will not discuss in any detail. The model formulae for most analyses of variance for balanced designs are relatively straightforward. The model formula takes the form y ~ model, where y is the response or dependent variable. Covariates enter model by their names only, and an overall mean term (denoted 1) is always assumed to be present unless explicitly deleted from the model formula. If A and B are factors, A + B represents an additive model with the main effects of A and B, A:B represents their interaction, and A*B is shorthand for A + B + A:B. Thus the linear model

E(Yjs) = µ + β xjs + τ^A_j + τ^B_s + τ^AB_js

can be written

y ~ x + A*B

while

E(Yjs) = µ + βj xjs + τ^A_j + τ^B_s + τ^AB_js

can be written

y ~ x + x:A + A*B.

There is also a facility for specifying nested effects; for example the model E(Ya;j) = µ + τa + ηaj is specified as y ~ A/B, shorthand for A + B %in% A.

Model formulae are discussed in detail by Chambers and Hastie(1992, Chapter 2).

The analysis of variance table is printed by the summary function, which takes as its argument the name of the aov object. This will show sums of squares corresponding to individual terms in the model. The summary command does not show whether or not the sums of squares are adjusted for other terms in the model. In balanced cases the sums of squares are not affected by other terms in the model, but in unbalanced cases, or in more general models where the effects are not orthogonal, the interpretation of individual sums of squares depends crucially on the other terms in the model.

© 2000 by Chapman & Hall/CRC

Page 275: The Theory Of The Design Of Experiments [DOE]

S-PLUS computes the sums of squares much in the manner of stagewise fitting described in Appendix A, and it is also possible to update a fitted model using special notation described in Chambers and Hastie (1992, Chapter 2). The convention is that terms are entered into the model in the order in which they appear on the right hand side of the model statement, so that each term is adjusted for those appearing above it in the summary of the aov object. For example,

unbalanced.aov <- aov(y ~ x1 + x2 + x3)
summary(unbalanced.aov)

will fit the models

y = µ + β1 x1
y = µ + β1 x1 + β2 x2
y = µ + β1 x1 + β2 x2 + β3 x3

and in the partitioning of the regression sum of squares the sum of squares attributed to x1 will be unadjusted, that for x2 will be adjusted for x1, and that for x3 adjusted for x1 and x2. Be warned that this is not flagged in the output except by the order of the terms:

> summary(unbalanced.aov)
                       Df  Sum of Sq  Mean Sq  F Value  Pr(F)
x1 (unadj.)
x2 (adj. for x1)
x3 (adj. for x1, x2)
residuals

C.2.4 Contrasts and partitioning sums of squares

As outlined in Section 3.5, it is often of interest to partition the sums of squares due to treatments using linear contrasts. In S-PLUS each factor variable has an associated set of linear contrasts, which are used as parameterization constraints in the fitting of the model specified in the aov command. These linear contrasts determine the estimated values of the unknown parameters. They can also be used to partition the associated sum of squares in the analysis of variance table, using the split option to summary(aov).

This dual use of contrasts for factor variables is very powerful, although somewhat confusing. We will first indicate the use of contrasts in estimation, before using them to partition the sums of squares.

The default contrasts for an unordered factor, which is created by factor(x), are Helmert contrasts, which compare the second level with the first, the third level with the average of the first two, and so on. The default contrasts for an ordered factor are those determined by the appropriate orthogonal polynomials. The contrasts used in fitting can be changed before an analysis of variance is constructed, using the options command. The summation constraint Στj = 0 for an unordered factor is imposed by specifying contr.sum, and the constraint τ1 = 0 is imposed by specifying contr.treatment:

> options(contrasts=c("contr.sum", "contr.poly"))
> options(contrasts=c("contr.treatment", "contr.poly"))

It is possible to specify a set of contrasts for ordered factors other than the polynomial contrasts, but this will rarely be needed. In Section C.3.3 below we estimate the treatment parameters under each of the three constraints: Helmert, summation and τ1 = 0. If individual estimates of the τj are to be used for any purpose, and this should be avoided as far as feasible, it is essential to note the constraints under which these estimates were obtained.

The contrasts used in fitting the model can also be used to partition the sums of squares. The summation contrasts will rarely be of interest in this context, but the orthogonal polynomial contrasts will be useful for quantitative factors. Prespecified contrasts may also be specified, using the function contrasts or C. Use of the contrast matrix C is outlined in detail by Venables and Ripley (1999, Chapter 6.2).

C.2.5 Plotting

There are some associated plotting methods that are often useful. The function interaction.plot plots the mean response by levels of two cross-classified factors, and is illustrated in Section C.5 below. An optional argument allows some other function of the response, such as the median or standard error, to be plotted instead.

The function qqnorm, when applied to an analysis of variance object created by the aov command, constructs a half-normal plot of the estimated effects (see Section 5.5). Two optional arguments are very useful: qqnorm(aov.example, label=6) will label the six largest effects on the plot, and qqnorm(aov.example, full=T) will construct a full normal plot of the estimated effects.

C.2.6 Specialized commands for standard designs

There are a number of commands for constructing designs, including fac.design, oa.design, and expand.grid. Fractional factorials can be constructed with an optional argument to fac.design. Details on the use of these functions are given by Chambers and Hastie (1992, Chapter 5.2); see also Venables and Ripley (1999, Chapter 6.7).

C.2.7 Missing values

Missing values are generally assigned the special value NA. S-PLUS functions differ in their handling of missing values. Many of the plotting functions, for example, will plot missing values as zeroes; the documentation for, for example, interaction.plot includes under the description of the response variable the information "Missing values (NA) are allowed". On the other hand, aov handles missing values in the same way lm does, through the optional argument na.action. The default value for na.action is na.fail, which halts further computation. Two alternatives are na.omit, which will omit any rows of the data frame that have missing values, and na.include, which will treat NA as a valid factor level among all factor variables; see Spector (1994, Ch. 10).

In some design and analysis textbooks there are formulae for computing (by hand) treatment contrasts, standard errors, and analysis of variance tables in the presence of a small number of missing responses in randomized block designs; Cochran and Cox (1958) provide details for a number of other more complex designs. In general, procedures for arbitrarily unbalanced data may have to be used.

C.3 Randomized block experiment from Chapter 3

C.3.1 Data entry

This is the randomized block experiment taken from Cochran andCox (1958), to compare five quantities of potash fertiliser on thestrength of cotton fiber. The data and analysis of variance are given

© 2000 by Chapman & Hall/CRC

Page 278: The Theory Of The Design Of Experiments [DOE]

in Tables 3.1 and 3.2. The dependent variable is strength, and there are two classification variables, treatment (amount of potash) and block. The simplest way to enter the data is within S-PLUS:

> potash.strength<-scan()
1: 762 814 776 717 746 800 815 773 757 768 793 787 774 780 721
16:
> potash.strength<-potash.strength/100
> potash.tmt<-factor(rep(1:5,3))
> potash.blk<-factor(rep(1:3,rep(5,3)))
> potash.tmt
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> potash.blk
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
> is.factor(potash.tmt)
[1] T

We could also construct a 15 × 3 matrix to hold the response variable and the explanatory variables, although the columns of this matrix are all considered numeric, even if the variable entered is a factor.

> potash.matrix<-matrix(c(potash.strength, potash.tmt,
+ potash.blk),15,3)
> potash.matrix

     [,1] [,2] [,3]
[1,] 7.62    1    1
[2,] 8.14    2    1
[3,] 7.76    3    1
[4,] 7.17    4    1
...
> is.factor(potash.tmt)
[1] T
> is.factor(potash.matrix[,2])
[1] F

Finally we can construct the factor levels using fac.design, store them in the design object potash.design, and combine this with the dependent variable in a data frame potash.df. In the illustration below we add 'names' for the factor levels, an option that is available (but not required) in the fac.design command.

> fnames<-list(tmt=c("36", "54", "72", "108", "144"),
+ blk=c("I","II","III"))
> potash.design<-fac.design(c(5,3),fnames)
> potash.design

  tmt blk
1  36   I
2  54   I
3  72   I
4 108   I


...
> strength<-potash.strength  # this is simply to use a shorter
                             # name in what follows
> potash.df<-data.frame(strength,potash.design)
> rm(strength, fnames, potash.design)  # remove un-needed objects
> potash.df
   strength tmt blk
1      7.62  36   I
2      8.14  54   I
3      7.76  72   I
4      7.17 108   I
...

C.3.2 Table of treatment and block means

The simplest way to compute the treatment means is using the tapply command. When used with an optional factor argument as tapply(y,factor,mean) the calculation of the mean is stratified by the level of the factor. This can be used on any of the data structures outlined in the previous subsection:

> tapply(potash.strength,potash.tmt,mean)
    1      2      3      4    5
 7.85 8.0533 7.7433 7.5133 7.45

> tapply(potash.matrix[,1],potash.matrix[,2],mean)
    1      2      3      4    5
 7.85 8.0533 7.7433 7.5133 7.45

> tapply(potash.df$strength, potash.df$tmt, mean)
   36     54     72    108  144
 7.85 8.0533 7.7433 7.5133 7.45

As is apparent above, the tapply command is not terribly convenient when used on a data matrix or a data frame. There are special plotting methods for data frames with factors that allow easy plotting of the treatment means, but curiously there does not seem to be a ready way to print the treatment means without first constructing an analysis of variance.
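The stratified mean that tapply computes has a direct analogue in pandas via groupby. As a cross-check outside S-PLUS, the sketch below recomputes the potash treatment means from the data entered earlier in this section.

```python
import pandas as pd

# Potash data from the session above: 15 strengths, 5 treatments x 3 blocks
strength = [7.62, 8.14, 7.76, 7.17, 7.46, 8.00, 8.15, 7.73, 7.57, 7.68,
            7.93, 7.87, 7.74, 7.80, 7.21]
tmt = ["36", "54", "72", "108", "144"] * 3

df = pd.DataFrame({"strength": strength, "tmt": tmt})

# Analogue of tapply(potash.strength, potash.tmt, mean)
means = df.groupby("tmt")["strength"].mean()
print(means.round(4))  # 7.85, 8.0533, 7.7433, 7.5133, 7.45 as in the text
```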


C.3.3 Analysis of variance

We first form a two way analysis of variance using aov. Note that the summary method for the analysis of variance object gives more useful output than printing the object itself.

In this example we illustrate the estimates τj in the model yjs = µ + τj + βs + εjs under the default constraint specified by the Helmert contrasts, under the summation constraint ∑τj = 0, and under the constraint often used in generalized linear models, τ1 = 0. If individual estimates of the τj are to be used for any purpose, it is essential to note the constraints under which these estimates were obtained. The analysis of variance table and estimated residual sum of squares are of course invariant to the choice of parametrization constraint.

> potash.aov<-aov(strength~tmt+blk,potash.df)
> potash.aov
Call:
aov(formula = strength ~ tmt + blk, data = potash.df)

Terms:
                    tmt     blk Residuals
Sum of Squares  0.73244 0.09712   0.34948
Deg. of Freedom       4       2         8

Residual standard error: 0.20901
Estimated effects are balanced
> summary(potash.aov)
          Df Sum of Sq Mean Sq F Value   Pr(F)
tmt        4   0.73244 0.18311  4.1916 0.04037
blk        2   0.09712 0.04856  1.1116 0.37499
Residuals  8   0.34948 0.04369

> coef(potash.aov)
(Intercept)    tmt1      tmt2      tmt3   tmt4  blk1   blk2
      7.722 0.10167 -0.069444 -0.092222 -0.068 0.098 -0.006

> options(contrasts=c("contr.sum","contr.poly"))
> potash.aov<-aov(strength~tmt+blk,potash.df)
> coef(potash.aov)
(Intercept)  tmt1    tmt2     tmt3     tmt4   blk1  blk2
      7.722 0.128 0.33133 0.021333 -0.20867 -0.092 0.104

> options(contrasts=c("contr.treatment","contr.poly"))
> potash.aov<-aov(strength~tmt+blk,potash.df)
> coef(potash.aov)
(Intercept)   tmt54    tmt72   tmt108 tmt144 blkII blkIII
      7.758 0.20333 -0.10667 -0.33667   -0.4 0.196   0.08

The estimated treatment effects under the summation constraint can also be obtained using model.tables or dummy.coef, so it


is not necessary to change the default fitting constraint with the options command, although it is probably advisable. Below we illustrate this, assuming that the default (Helmert) contrasts were used in the aov command. We also illustrate how model.tables can be used to obtain treatment means and their standard errors.

> options("contrasts")
$contrasts:
[1] "contr.helmert" "contr.poly"

> dummy.coef(potash.aov)
$"(Intercept)":
[1] 7.722

$tmt:
   36      54       72      108    144
0.128 0.33133 0.021333 -0.20867 -0.272

$blk:
     I    II    III
-0.092 0.104 -0.012

> model.tables(potash.aov)
Tables of effects

tmt
     36      54      72      108      144
0.12800 0.33133 0.02133 -0.20867 -0.27200

blk
     I    II    III
-0.092 0.104 -0.012
Warning messages:
  Model was refitted to allow projection in: model.tables(potash.aov)

> model.tables(potash.aov,type="means",se=T)
Tables of means
Grand mean

7.722

tmt
    36     54     72    108    144
7.8500 8.0533 7.7433 7.5133 7.4500

blk
    I    II   III
7.630 7.826 7.710

Standard errors for differences of means


            tmt     blk
        0.17066 0.13219
replic. 3.00000 5.00000

C.3.4 Partitioning sums of squares

For the potash experiment, the treatment was a quantitative factor, and in Section 3.5.5 we discussed partitioning the treatment sums of squares using the linear and quadratic polynomial contrasts for a factor with five levels, (−2, −1, 0, 1, 2) and (2, −1, −2, −1, 2). Since orthogonal polynomials are the default for an ordered factor, the simplest way to partition the sums of squares in S-PLUS is to define tmt as an ordered factor.
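These polynomial contrasts can be reconstructed outside S-PLUS by orthogonalizing the powers of the level index, for instance with a QR decomposition; for five equally spaced levels this recovers the textbook linear and quadratic contrasts up to scale.

```python
import numpy as np

x = np.arange(1.0, 6.0)               # 5 equally spaced levels
V = np.vander(x, 5, increasing=True)  # columns 1, x, x^2, x^3, x^4
Q, _ = np.linalg.qr(V)                # orthogonalize the power basis

lin, quad = Q[:, 1], Q[:, 2]
# Rescale so the entries are the familiar integer contrasts
lin = lin / lin[4] * 2
quad = quad / quad[4] * 2

print(np.round(lin, 6))   # [-2. -1.  0.  1.  2.]
print(np.round(quad, 6))  # [ 2. -1. -2. -1.  2.]
```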

> otmt<-ordered(potash.df$tmt)
> is.ordered(otmt)
[1] T
> is.factor(otmt)
[1] T
> contrasts(otmt)
             .L       .Q           .C       ^4
36  -6.3246e-01  0.53452  -3.1623e-01  0.11952
54  -3.1623e-01 -0.26726   6.3246e-01 -0.47809
72  -6.9389e-18 -0.53452   4.9960e-16  0.71714
108  3.1623e-01 -0.26726  -6.3246e-01 -0.47809
144  6.3246e-01  0.53452   3.1623e-01  0.11952
> potash.df<-data.frame(potash.df,otmt)
> rm(otmt)
> potash.aov<-aov(strength~otmt+blk,potash.df)
> summary(potash.aov)

          Df Sum of Sq Mean Sq F Value   Pr(F)
otmt       4   0.73244 0.18311  4.1916 0.04037
blk        2   0.09712 0.04856  1.1116 0.37499
Residuals  8   0.34948 0.04369
> summary(potash.aov,split=list(otmt=list(L=1,Q=2)))
          Df Sum of Sq Mean Sq F Value   Pr(F)
otmt       4   0.73244 0.18311   4.192 0.04037
 otmt: L   1   0.53868 0.53868  12.331 0.00794
 otmt: Q   1   0.04404 0.04404   1.008 0.34476
blk        2   0.09712 0.04856   1.112 0.37499
Residuals  8   0.34948 0.04369
> summary(potash.aov,split=list(otmt=list(L=1,Q=2,C=3,QQ=4)))
          Df Sum of Sq Mean Sq F Value   Pr(F)
otmt       4   0.73244 0.18311   4.192 0.04037
 otmt: L   1   0.53868 0.53868  12.331 0.00794
 otmt: Q   1   0.04404 0.04404   1.008 0.34476
 otmt: C   1   0.13872 0.13872   3.175 0.11261
 otmt: QQ  1   0.01100 0.01100   0.252 0.62930
blk        2   0.09712 0.04856   1.112 0.37499
Residuals  8   0.34948 0.04369

It is possible to specify just one contrast of interest, and a set of


contrasts orthogonal to the first will be constructed automatically. This set will not necessarily correspond to orthogonal polynomials, however.
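The automatic completion can be mimicked with a QR decomposition: start from the single contrast of interest and extend it to a full set of mutually orthogonal contrasts, all orthogonal to the constant. As with S-PLUS, the particular completion obtained is not unique.

```python
import numpy as np

c1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # the contrast of interest

# Orthogonalize (constant, c1, identity columns); columns 1..4 of Q are
# then 4 mutually orthogonal contrasts, the first proportional to c1
M = np.column_stack([np.ones(5), c1, np.eye(5)])
Q, _ = np.linalg.qr(M)
contrasts = Q[:, 1:5]

# Check: orthonormal among themselves, and orthogonal to the constant
print(np.allclose(contrasts.T @ contrasts, np.eye(4)))
print(np.allclose(contrasts.sum(axis=0), 0.0))
```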

> contrasts(tmt)<-c(-2,-1,0,1,2)
> contrasts(tmt)  #these contrasts are orthogonal
                  #but not the usual polynomial contrasts
    [,1]     [,2]    [,3]    [,4]
36    -2 -0.41491 -0.3626 -0.3104
54    -1  0.06722  0.3996  0.7320
72     0  0.83771 -0.2013 -0.2403
108    1 -0.21744  0.6543 -0.4739
144    2 -0.27258 -0.4900  0.2925
> potash.aov<-aov(strength~tmt+blk,potash.df)
> summary(potash.aov,split=list(tmt=list(1)))
             Df Sum of Sq Mean Sq F Value  Pr(F)
tmt           4    0.7324  0.1831    4.19 0.0404
 tmt: Ctst 1  1    0.5387  0.5387   12.33 0.0079
blk           2    0.0971  0.0486    1.11 0.3750
Residuals     8    0.3495  0.0437

Finally, in this example recall that the treatment levels are not in fact equally spaced, so that the exact linear contrast is as given in Section 3.5: (−2, −1.23, −0.46, 1.08, 2.6). This can be specified using contrasts, as illustrated here.

> contrasts(potash.tmt)<-c(-2,-1.23,-0.46,1.08,2.6)
> contrasts(potash.tmt)
   [,1]     [,2]    [,3]    [,4]
1 -2.00 -0.44375 -0.4103 -0.3773
2 -1.23 -0.09398  0.3332  0.7548
3 -0.46  0.86128 -0.1438 -0.1488
4  1.08 -0.15416  0.6917 -0.4605
5  2.60 -0.16939 -0.4707  0.2318
> potash.aov<-aov(potash.strength~potash.tmt+potash.blk)
> summary(potash.aov,split=list(potash.tmt=list(1,2,3,4)))
                    Df Sum of Sq Mean Sq F Value  Pr(F)
potash.tmt           4    0.7324  0.1831    4.19 0.0404
 potash.tmt: Ctst 1  1    0.5668  0.5668   12.97 0.0070
 potash.tmt: Ctst 2  1    0.0002  0.0002    0.01 0.9444
 potash.tmt: Ctst 3  1    0.0045  0.0045    0.10 0.7577
 potash.tmt: Ctst 4  1    0.1610  0.1610    3.69 0.0912
potash.blk           2    0.0971  0.0486    1.11 0.3750
Residuals            8    0.3495  0.0437


C.4 Analysis of block designs in Chapter 4

C.4.1 Balanced incomplete block design

The first example in Section 4.2.6 is a balanced incomplete block design with two treatments per block in each of 15 blocks. The data are entered as follows:

> weight<-scan()
1: 251 215 249 223 254 226 258 215 265 241
11: 211 190 228 211 215 170 232 253 215 223
21: 234 215 230 249 220 218 226 243 228 256
31:
> weight<-weight/100
> blk<-factor(rep(1:15,rep(2,15)))
> blk
[1] 1 1 2 2 3 3 4 4 ...
> tmt <- 0
> for (i in 1:5) for (j in (i+1):6) tmt <- c(tmt,i,j)
> tmt <- tmt[-1]
> tmt <- factor(tmt)
> tmt
[1] 1 2 1 3 1 4 1 5 1 6 2 3 2 4 2 5 2 6 3 4 3 5 3 6 4 5 4 6 5 6
> fnames<-list(tmt=c("C","His-","Arg-","Thr-","Val-","Lys-"),
+ blk=c(1:15))
> chick.design<-design(tmt,blk,factor.names=fnames)
> chick.design

   tmt blk
1    C   1
2 His-   1
3    C   2
4 Arg-   2
...

> chick.df<-data.frame(weight,chick.design)
> rm(chick.design, fnames, blk)
> chick.df
  weight  tmt blk
1   2.51    C   1
2   2.15 His-   1
3   2.49    C   2
4   2.23 Arg-   2
5   2.54    C   3
6   2.26 Thr-   3
...

We now compute treatment means, both adjusted and unadjusted, and the analysis of variance table for their comparison. This is our first example of an unbalanced design, in which for example the sums of squares for treatments ignoring blocks is different from


the sums of squares adjusted for blocks. The convention in S-PLUS is that terms are added to the model in the order they are listed in the model statement. Thus to construct the intrablock analysis of variance, in which treatments are adjusted for blocks, we use the model statement y ~ block + treatment.

We used tapply to obtain the unadjusted treatment means, and obtained the adjusted means by adding τj to the overall mean Y... The τj were obtained under the summation constraint. It is possible to derive both Qj and the adjusted treatment means using model.tables, although this returns an incorrect estimate of the standard error and is not recommended. The least squares estimates of τj under the summation constraint are also returned by dummy.coef, even if the summation constraint option was not specified in fitting the model.

> tapply(weight,tmt,mean)
    1     2     3     4     5     6
2.554 2.202 2.184 2.212 2.092 2.484
> options(contrasts=c("contr.sum","contr.poly"))
> chick.aov<-aov(weight~blk+tmt,chick.df)
> summary(chick.aov)
          Df Sum of Sq  Mean Sq F Value     Pr(F)
blk       14   0.75288 0.053777   8.173 0.0010245
tmt        5   0.44620 0.089240  13.562 0.0003470
Residuals 10   0.06580 0.006580

> coef(chick.aov)
(Intercept)    blk1   blk2   blk3     blk4     blk5     blk6
      2.288 -0.1105 -0.013 0.0245 0.060333 0.060333 -0.25883
     blk7    blk8   blk9      blk10 blk11 blk12 blk13 blk14
-0.071333 -0.2705 0.0645 -0.0088333 0.117 0.102 0.0595 0.0495
   tmt1     tmt2      tmt3      tmt4     tmt5
0.26167 0.043333 -0.091667 -0.086667 -0.22833

> dummy.coef(chick.aov)
$"(Intercept)":
[1] 2.288
...
$tmt:
      C     His-      Arg-      Thr-     Val-    Lys-
0.26167 0.043333 -0.091667 -0.086667 -0.22833 0.10167

> tauhat<-.Last.value$tmt
> tauhat+mean(weight)
     C   His-   Arg-   Thr-   Val-   Lys-
2.5497 2.3313 2.1963 2.2013 2.0597 2.3897

> model.tables(chick.aov,type="adj.means")

> model.tables(chick.aov,type="adj.means")


Tables of adjusted means
Grand mean

2.28800
se 0.01481

...

tmt
        C   His-   Arg-   Thr-   Val-   Lys-
   2.5497 2.3313 2.1963 2.2013 2.0597 2.3897
se 0.0452 0.0452 0.0452 0.0452 0.0452 0.0452

We will now compute the interblock analysis of variance using regression on the block totals. The most straightforward approach is to compute the estimates directly from equations (4.32) and (4.33); the estimated variance is obtained from the analysis of variance table with blocks adjusted for treatments. To obtain this analysis of variance table we specify treatment first in the right hand side of the model statement that is the argument of the aov command.

> N <- matrix(0, nrow=6, ncol=15)
> ind <- 0
> for (i in 1:5) for (j in (i+1):6) ind <- c(ind,i,j)
> ind <- ind[-1]
> ind <- matrix(ind, ncol=2,byrow=T)
> for (i in 1:15) N[ind[i,1],i] <- N[ind[i,2],i] <- 1
> B<-tapply(weight,blk,sum)
> B

   1    2   3    4    5    6    7    8    9   10   11   12
4.66 4.72 4.8 4.73 5.06 4.01 4.39 3.85 4.85 4.38 4.49 4.79
  13   14   15
4.38 4.69 4.84

> tau<-(N%*%B-5*2*mean(weight))/4
> tau<-as.vector(tau)
> tau
[1] 0.2725 -0.2800 -0.1225 -0.0600 -0.1475 0.3375
> summary(aov(weight~tmt+blk,chick.df))
          Df Sum of Sq Mean Sq F Value    Pr(F)
tmt        5   0.85788 0.17158  26.075 0.000020
blk       14   0.34120 0.02437   3.704 0.021648
Residuals 10   0.06580 0.00658
> sigmasq<-0.00658
> sigmaBsq<-((0.34120/14-0.00658)*14)/(6*4)
> sigmaBsq
[1] 0.010378
> vartau1<-sigmasq*2*5/(6*6)
> vartau2<-(2*5*(sigmasq+2*sigmaBsq))/(6*4)
> (1/vartau1)+(1/vartau2)


[1] 634.91
> (1/vartau1)/.Last.value
[1] 0.86172
> dummy.coef(chick.aov)$tmt
      C     His-      Arg-      Thr-     Val-    Lys-
0.26167 0.043333 -0.091667 -0.086667 -0.22833 0.10167
> tauhat<-.Last.value
> taustar<-.86172*tauhat+(1-.86172)*tau
> taustar
      C       His-     Arg-      Thr-     Val-    Lys-
0.26316 -0.0013772 -0.09593 -0.082979 -0.21716 0.13428
> sqrt(1/( (1/vartau1)+(1/vartau2)))
[1] 0.039687
> setaustar<-.Last.value
> sqrt(2)*setaustar
[1] 0.056125

C.4.2 Unbalanced incomplete block experiment

The second example from Section 4.2.6 has all treatment effects highly aliased with blocks. The data are given in Table 4.13 and the analysis summarized in Tables 4.14 and 4.15. The within block analysis is computed using the aov command, with blocks (days) entered into the model before treatments. The adjusted treatment means are computed by adding Y.. to the estimated coefficients. We also indicate the computation of the least squares estimates under the summation constraint using the matrix formulae of Section 4.2. The contrasts between pairs of treatment means do not have equal precision; the estimated standard error is computed for each mean using var(Yj.) = σ2/rj, although for comparing pairs of means it may be more useful to use the result that cov(τ) = C−.

> day<-rep(1:7,rep(4,7))
> tmt<-scan()
1: 1 8 9 9 9 5 4 9 2 3 8 5 ...
29:
> expansion<-scan()
1: 150 148 130 117 122 141 112 ...
29:
> day<-factor(day)
> tmt<-factor(tmt)
> expansion<-expansion/10
> dough.design<-design(tmt,day)
> dough.df<-data.frame(expansion,dough.design)
> dough.df
  expansion tmt day
1      15.0   1   1
2      14.8   8   1
3      13.0   9   1


4      11.7   9   1
5      12.2   9   2
6      14.1   5   2
...

> tapply(expansion,day,mean)
     1      2      3      4      5     6     7
13.625 12.275 14.525 13.475 11.475 15.15 11.55
> tapply(expansion,tmt,mean)
   1     2     3  4     5    6     7    8        9    10 11
14.8 15.45 10.45 12 14.85 18.2 13.15 15.3 11.46667 11.2 13
  12   13   14   15
12.7 11.7 11.4 11.1

> dough.aov<-aov(expansion~day+tmt,dough.df)
> summary(dough.aov)
          Df Sum of Sq Mean Sq F Value   Pr(F)
day        6     49.41   8.235  11.188 0.00275
tmt       14     96.22   6.873   9.337 0.00315
Residuals  7      5.15   0.736

> dummy.coef(.Last.value)$tmt
     1      2       3       4      5      6       7      8
1.3706 3.5372 -2.3156 -1.0711 2.1622 3.9178 0.85389 2.2539
       9      10      11      12       13      14      15
-0.51556 -3.4822 0.58444 -1.9822 -0.71556 -3.2822 -1.3156

> replications(dough.design)
$tmt:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 2 1 2 2 1 1 6 1  1  2  2  1  2  2

$day:
[1] 4
> R<-matrix(0,nrow=15,ncol=15)
> diag(R)<-replications(dough.design)$tmt
> K<-matrix(0,nrow=7,ncol=7)
> diag(K)<-rep(4,7)
> N<-matrix(0,nrow=15,ncol=7)
> N[,1]<-c(1,0,0,0,0,0,0,1,2,0,0,0,0,0,0)
> N[,2]<-c(0,0,0,1,1,0,0,0,2,0,0,0,0,0,0)
> N[,3]<-c(0,1,1,0,1,0,0,1,0,0,0,0,0,0,0)
> N[,4]<-c(0,0,0,0,0,1,0,0,0,1,0,1,0,1,0)
> N[,5]<-c(0,0,1,0,0,0,0,0,0,0,1,0,1,0,1)
> N[,6]<-c(1,0,0,1,0,1,1,0,0,0,0,0,0,0,0)
> N[,7]<-c(0,1,0,0,0,0,1,0,2,0,0,0,0,0,0)
> Q<-S-N%*%solve(K)%*%B
> C<-R-N%*%solve(K)%*%t(N)
> Q%*%ginverse(C)
     [,1]   [,2]    [,3]    [,4]   [,5]   [,6]    [,7]   [,8]


[1,] 1.3706 3.5372 -2.3156 -1.0711 2.1622 3.9178 0.85389 2.2539
         [,9]   [,10]   [,11]   [,12]    [,13]   [,14]   [,15]
[1,] -0.51556 -3.4822 0.58444 -1.9822 -0.71556 -3.2822 -1.3156
> tauhat<-.Last.value
> as.vector(tauhat+mean(expansion))
 [1] 14.5241 16.6908 10.8380 12.0825 15.3158 17.0713 14.0075
 [8] 15.4075 12.6380  9.6713 13.7380 11.1713 12.4380  9.8713
[15] 11.8380
> se<-0.7361/sqrt(diag(R))
> se
 [1] 0.52050 0.52050 0.52050 0.52050 0.52050 0.52050 0.52050
 [8] 0.52050 0.30051 0.73610 0.73610 0.73610 0.73610 0.73610
[15] 0.73610
> setauhat<-sqrt(diag(ginverse(C)))
> setauhat
 [1] 0.92376 0.92376 1.04243 0.92376 0.92376 1.04243 0.92376
 [8] 0.92376 0.76594 1.59792 1.59792 1.59792 1.59792 1.59792
[15] 1.59792
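The matrix computation τ = C−Q used above can be reproduced with a Moore–Penrose pseudoinverse. The sketch below uses a tiny hypothetical balanced incomplete block design (3 treatments in 3 blocks of size 2, noise-free responses with treatment effects (1, 0, −1)); the pseudoinverse picks out the solution satisfying the summation constraint.

```python
import numpy as np

# Hypothetical toy BIBD: 3 treatments, blocks {1,2}, {1,3}, {2,3}, k=2, r=2
N = np.array([[1, 1, 0],    # incidence matrix: treatments x blocks
              [1, 0, 1],
              [0, 1, 1]], dtype=float)
R = 2 * np.eye(3)           # diagonal matrix of replications
K = 2 * np.eye(3)           # diagonal matrix of block sizes

# Noise-free responses with treatment effects (1, 0, -1), no block effects
S = np.array([2.0, 0.0, -2.0])   # treatment totals
B = np.array([1.0, 0.0, -1.0])   # block totals

Q = S - N @ np.linalg.inv(K) @ B     # adjusted treatment totals
C = R - N @ np.linalg.inv(K) @ N.T   # reduced normal-equations matrix

tauhat = np.linalg.pinv(C) @ Q       # analogue of Q %*% ginverse(C)
print(np.round(tauhat, 6))           # recovers (1, 0, -1), summing to zero
```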

C.5 Examples from Chapter 5

C.5.1 Factorial experiment, Section 5.2

The treatments in this experiment form a complete 3 × 2 × 2 factorial. The data are given in Table 5.1 and the analysis summarized in Tables 5.2 and 5.4. The code below illustrates the use of fac.design to construct the levels of the factors. For this purpose we treat house as a factor, although in line with the discussion of Section 5.1 it is not an aspect of treatment. These factors are then used to stratify the response in the tapply command, producing tables of marginal means. Figure 5.1 was obtained using interaction.plot, after constructing a four-level factor indexing the four combinations of type of protein crossed with level of fish solubles.

> weight<-scan()
1: 6559 6292 7075 6779 6564 6622 7528 6856 6738 6444 7333 6361
13: 7094 7053 8005 7657 6943 6249 7359 7292 6748 6422 6764 6560
25:

> exk.design<-fac.design(c(2,2,3,2),factor.names=
+ list(House=c("I","II"), Lev.f=c("0","1"),
+ Lev.pro=c("0","1","2"),Type=c("gnut","soy")))
> exk.design
  House Lev.f Lev.pro Type
1     I     0       0 gnut
2    II     0       0 gnut
3     I     1       0 gnut


4    II     1       0 gnut
5     I     0       1 gnut
...

> exk.df<-data.frame(weight,exk.design)
> rm(exk.design)
> tapply(weight,list(exk.df$Lev.pro,exk.df$Type),mean)
    gnut    soy
0 6676.2 7452.2
1 6892.5 6960.8
2 6719.0 6623.5

> tapply(weight,list(exk.df$Lev.f,exk.df$Type),mean)
    gnut    soy
0 6536.5 6751.5
1 6988.7 7272.8

> tapply(weight,list(exk.df$Lev.f,exk.df$Lev.pro),mean)
       0      1      2
0 6749.5 6594.5 6588.0
1 7379.0 7258.7 6754.5

> tapply(weight, list(exk.df$Lev.pro,exk.df$Lev.f,exk.df$Type),
+ mean)
, , gnut
       0    1
0 6425.5 6927
1 6593.0 7192
2 6591.0 6847

, , soy
       0      1
0 7073.5 7831.0
1 6596.0 7325.5
2 6585.0 6662.0

> Type.Lev.f<-factor(c(1,1,2,2,1,1,2,2,1,1,2,2,
+ 3,3,4,4,3,3,4,4,3,3,4,4))
> postscript(file="Fig5.1.ps",horizontal=F)
> interaction.plot(exk.df$Lev.pro,Type.Lev.f,weight,
+ xlab="Level of protein")
> dev.off()

Table 5.3 shows the analysis of variance, using interactions with houses as the estimate of error variance. As usual, the summary table for the analysis of variance includes calculation of F statistics and associated p-values, whether or not these make sense in light of the design. For example, the F statistic for the main effect of houses does not have a justification under the randomization, which was limited to the assignment of chicks to treatments. Individual assessment of main effects and interactions via F-tests is also usually not relevant; the main interest is in comparing treatment means. As the design is fully balanced, model.tables provides a set of cross-classified means, as well as the standard errors for their comparison. The linear and quadratic contrasts for the three-level factor level of protein are obtained first by defining protein as an ordered factor, and then by using the split option to the analysis of variance summary.

> exk.aov<-aov(weight~Lev.f*Lev.pro*Type+House,exk.df)
> summary(exk.aov)
                   Df Sum of Sq Mean Sq F Value   Pr(F)
Lev.f               1   1421553 1421553  31.741 0.00015
Lev.pro             2    636283  318141   7.104 0.01045
Type                1    373751  373751   8.345 0.01474
House               1    708297  708297  15.815 0.00217
Lev.f:Lev.pro       2    308888  154444   3.449 0.06876
Lev.f:Type          1      7176    7176   0.160 0.69661
Lev.pro:Type        2    858158  429079   9.581 0.00390
Lev.f:Lev.pro:Type  2     50128   25064   0.560 0.58686
Residuals          11    492640   44785

> model.tables(exk.aov,type="mean",se=T)
Tables of means
Grand mean

6887.4

Lev.f
   0      1
6644 7130.8

...

Standard errors for differences of means
         Lev.f Lev.pro   Type  House Lev.f:Lev.pro Lev.f:Type
        86.396  105.81 86.396 86.396        149.64     122.18
replic. 12.000    8.00 12.000 12.000          4.00       6.00
        Lev.pro:Type Lev.f:Lev.pro:Type
              149.64             211.63
replic.         4.00               2.00
Warning messages:
  Model was refit to allow projection in: model.tables(exk.aov,
  type = "means", se = T)

> options(contrasts=c("contr.poly","contr.poly"))
> exk.aov2<-aov(weight~Lev.f*Lev.pro*Type + House, data=exk.df)
> summary(exk.aov2,split=list(Lev.pro=list(1,2)))
                             Df Sum of Sq Mean Sq F Value  Pr(F)
Lev.f                         1   1421553 1421553   31.74 0.0001
Lev.pro                       2    636283  318141    7.10 0.0104
 Lev.pro: Ctst 1              1    617796  617796   13.79 0.0034
 Lev.pro: Ctst 2              1     18487   18487    0.41 0.5337
Type                          1    373751  373751    8.34 0.0147
House                         1    708297  708297   15.81 0.0022
Lev.f:Lev.pro                 2    308888  154444    3.45 0.0689
 Lev.f:Lev.pro: Ctst 1        1    214369  214369    4.79 0.0512
 Lev.f:Lev.pro: Ctst 2        1     94519   94519    2.11 0.1742
Lev.f:Type                    1      7176    7176    0.16 0.6966
Lev.pro:Type                  2    858158  429079    9.58 0.0039
 Lev.pro:Type: Ctst 1         1    759512  759512   16.96 0.0017
 Lev.pro:Type: Ctst 2         1     98645   98645    2.20 0.1658
Lev.f:Lev.pro:Type            2     50128   25064    0.56 0.5869
 Lev.f:Lev.pro:Type: Ctst 1   1     47306   47306    1.06 0.3261
 Lev.f:Lev.pro:Type: Ctst 2   1      2821    2821    0.06 0.8064
Residuals                    11    492640   44785

C.5.2 2^(4−1) fractional factorial; Section 5.7

The data for the nutrition trial of Blot et al. (1993) are given in Table 5.9. Below we illustrate the analysis of the log of the death rate from cancer, and of the numbers of cancer deaths. The second analysis is a reasonable approximation to the first, as the numbers at risk are nearly equal across treatment groups. Both these analyses ignore the blocking information on sex, age and commune. Blot et al. (1993) report the results in terms of the relative risk, adjusting for the blocking factors; the conclusions are broadly similar. The fraction option to fac.design defaults to the highest order interaction for defining the fraction. In the model formula the shorthand .^2 denotes all main effects and two-factor interactions. We illustrate the use of qqnorm for constructing a half-normal plot of the estimated effects from an aov object. The command qqnorm.aov is identical to qqnorm. The command qqnorm(aov.object, full=T) will produce a full-normal plot of the estimated effects, and effects other than the grand mean can be omitted from the plot with the option omit=. Here we omitted the plotting of the aliased effects, otherwise they are plotted as 0.

> lohi<-c("0","1")
> cancer.design<- fac.design(levels=c(2,2,2,2),
+ factor=list(A=lohi,B=lohi,C=lohi,D=lohi),fraction=1/2)
> death.c<-scan()
1: 107 94 121 101 81 103 90 95
9:
> years<-scan()
1: 18626 18736 18701 18686 18745 18729 18758 18792


9:
> log.rates<-log(death.c/years)

# Below we analyse the number of deaths from cancer and
# the log death rate; the latter is discussed in Section 5.7.

> logcancer.df<-data.frame(log.rates,cancer.design)
> cancer.df<-data.frame(death.c,cancer.design)
> rm(lohi,death.c,log.rates)

> logcancer.aov<-aov(log.rates~.^2,logcancer.df)
> model.tables(logcancer.aov,type="feffects")
Table of factorial effects
        A         B        C        D       A:B     A:C
-0.036108 -0.005475 0.053475 -0.13972 -0.043246 0.15217
      A:D
-0.058331
> cancer.aov<-aov(death.c~.^2,cancer.df)
> model.tables(cancer.aov,type="feffects")
Table of factorial effects
   A    B   C     D A:B A:C A:D
-2.5 -1.5 5.5 -13.5  -5  15  -6

> postscript(file="FigC.1.ps",horizontal=F)
> qqnorm(logcancer.aov,omit=c(8,9,10),label=7)
> dev.off()

> mean(1/death.c)
[1] 0.0102

C.5.3 Exercise 5.6: flour milling

This example is adapted from Tuck, Lewis and Cottrell (1993); that article provides a detailed case study of the use of response surface methods in a quality improvement study in the flour milling industry. A subset of the full data from the article's experiment I is given in Table 5.8. There are six factors of interest, all quantitative, labelled XA through XF and coded −1 and 1. (The variable name "F" is reserved in S-PLUS for "False".) The experiment forms a one-quarter fraction of a 2^6 factorial. The complete data included a further 13 runs taken at coded values for the factors arranged in what is called in response surface methodology a central composite design. Below we construct the fractional factorial by specifying the defining relations as an optional argument to fac.design. As the S-PLUS default is to vary the first factor most quickly, which is the


[Figure C.1 appears here: a half-normal plot of the estimated effects (x-axis: half-normal quantiles, 0.5 to 1.5; y-axis: effects, 0.0 to 0.20), with A:C, D and C standing well apart from A:B, A, B and C:D.]

Figure C.1 Half normal plot of estimated effects: cancer mortality in the Linxian nutrition trial.

opposite of the design given in Table 5.8, we name the factors inreverse order.

> flour.y <- scan()
1: 519 446 337 415 503 468 343 418 ...
61: 551 500 373 462
65:
> flour.tmt <- rep(1:16,rep(4,16))
> flour.tmt
[1] 1 1 1 1 2 2 2 2 3 3 3 3 ...
> flour.tmt <- factor(flour.tmt)
> flour.day <- rep(1:4,16)
> tapply(flour.y,flour.tmt,mean)
     1   2      3      4      5      6     7      8      9
429.25 433 454.25 456.75 446.75 447.75 455.5 448.25 458.75
   10     11  12    13     14  15    16
449.5 463.75 386 449.5 452.75 469 471.5
> flour.ybar<-.Last.value


> flour.design<-fac.design(rep(2,6),
+ factor.names=c("XF","XE","XD","XC","XB","XA"),
+ fraction = ~ XA:XB:XC:XD + XB:XC:XE:XF)
> flour.design
    XF  XE  XD  XC  XB  XA
1  XF1 XE1 XD1 XC1 XB1 XA1
2  XF2 XE2 XD1 XC1 XB1 XA1
3  XF2 XE1 XD2 XC2 XB1 XA1
4  XF1 XE2 XD2 XC2 XB1 XA1
5  XF2 XE1 XD2 XC1 XB2 XA1
6  XF1 XE2 XD2 XC1 XB2 XA1
7  XF1 XE1 XD1 XC2 XB2 XA1
8  XF2 XE2 XD1 XC2 XB2 XA1
9  XF1 XE1 XD2 XC1 XB1 XA2
10 XF2 XE2 XD2 XC1 XB1 XA2
11 XF2 XE1 XD1 XC2 XB1 XA2
12 XF1 XE2 XD1 XC2 XB1 XA2
13 XF2 XE1 XD1 XC1 XB2 XA2
14 XF1 XE2 XD1 XC1 XB2 XA2
15 XF1 XE1 XD2 XC2 XB2 XA2
16 XF2 XE2 XD2 XC2 XB2 XA2

Fraction: ~ XA:XB:XC:XD + XB:XC:XE:XF

> flour.df <- data.frame(flour.ybar, flour.design)
> flour.aov<-aov(flour.ybar~XA*XB*XC*XD*XE*XF,flour.df)
> summary(flour.aov)
         Df Sum of Sq  Mean Sq
XA        1    53.473   53.473
XB        1   752.816  752.816
XC        1    89.066   89.066
XD        1  1160.254 1160.254
XE        1   412.598  412.598
XF        1   230.660  230.660
XA:XB     1   223.129  223.129
XA:XC     1   382.691  382.691
XB:XC     1   204.848  204.848
XA:XE     1   412.598  412.598
XB:XE     1   402.504  402.504
XC:XE     1   387.598  387.598
XD:XE     1   349.223  349.223
XA:XB:XE  1   692.348  692.348
XA:XC:XE  1   223.129  223.129

> flour.aov2<-aov(flour.y~flour.tmt+flour.day)
> summary(flour.aov2)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
flour.tmt 15   23907.7   1593.8  0.82706 0.6436536
flour.day  3  391397.8 130465.9 67.69952 0.0000000
Residuals 45   86721.0   1927.1

> model.tables(flour.aov,type="feffects")


Table of factorial effects
    XA     XB     XC     XD      XE     XF  XA:XB   XA:XC
5.1707 19.401 6.6733 24.086 -14.363 10.739 14.937 -19.563
 XB:XC  XD:XE XA:XB:XE XA:XC:XE XB:XC:XE XA:XE:XF XB:XE:XF
14.312 18.687   26.313  -14.937        0        0        0

C.6 Examples from Chapter 6

C.6.1 Split unit

The data for a split unit experiment are given in Table 6.8. The structure of this example is identical to the split unit example involving varieties of oats, originally given by Yates (1935) and used as an illustration by Venables and Ripley (1999, Chapter 6.11). Their discussion of split unit experiments emphasizes their formal similarity to designs with more than one component of variance, such as those discussed briefly in Section 6.5. From this point of view the subunits are nested within the whole units, and there is a special modelling operator A/B to represent factor B nested within factor A. Thus the result of

aov(y ~ temp * prep + Error(reps/prep))

is a list of aov objects, one of which is the whole unit analysis of variance and another is the subunit analysis of variance. The subunit analysis is implied by the model formula because the finest level analysis, in our case "within reps", is automatically computed. As with unbalanced data, model.tables cannot be used to obtain estimated standard errors, although it will work if the model statement is changed to omit the interaction term between preparation and temperature. Venables and Ripley (1999, Chapter 6.11) discuss the calculation of residuals and fitted values in models with more than one source of variation.

> y<-scan()
1: 30 34 29 35 41 26 37 38 33 36 42 36
13: 28 31 31 32 36 30 40 42 32 41 40 40
25: 31 35 32 37 40 34 41 39 39 40 44 45
37:
> prep<-factor(rep(1:3,12))
> temp<-factor(rep(rep(1:4,rep(3,4)),3))
> days<-factor(rep(1:3,rep(12,3)))
> split.design<-design(days,temp,prep)
> split.df<-data.frame(split.design,y)
> rm(y, prep, temp, days, split.design)
> split.df
  days temp prep  y
1    1    1    1 30


2    1    1    2 34
3    1    1    3 29
4    1    2    1 35
...

> split.aov<-aov(y~temp*prep+Error(days/prep),split.df)
> summary(split.aov)

Error: days
          Df Sum of Sq Mean Sq F Value Pr(F)
Residuals  2    77.556  38.778

Error: prep %in% days
          Df Sum of Sq Mean Sq F Value    Pr(F)
prep       2    128.39  64.194  7.0781 0.048537
Residuals  4     36.28   9.069

Error: Within
          Df Sum of Sq Mean Sq F Value    Pr(F)
temp       3    434.08  144.69  36.427 0.000000
temp:prep  6     75.17   12.53   3.154 0.027109
Residuals 18     71.50    3.97

> model.tables(split.aov,type="mean")
Refitting model to allow projection

Tables of means
Grand mean

36.028

temp
    [,1]
1 31.222
2 34.556
3 37.889
4 40.444

prep
    [,1]
1 35.667
2 38.500
3 33.917

temp:prep
Dim 1 : temp
Dim 2 : prep
       1      2      3
1 29.667 33.333 30.667
2 34.667 39.000 30.000
3 39.333 39.667 34.667
4 39.000 42.000 40.333

#calculate errors by hand

#use whole plot error for prep;
#prep means are averaged over 12 obsns
> sqrt(2*9.06944/12)
[1] 1.229461

#use subplot error for temp;
#temp means are averaged over 9 obsns
> sqrt(2*3.9722/9)
[1] 0.9395271

#use subplot error for temp:prep;
#these means are averaged over 3 obsns
> sqrt(2*3.9722/3)
[1] 1.6273
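All three by-hand standard errors use the same formula, se = sqrt(2·MS/r), with the mean square taken from the error stratum in which the comparison lives and r the number of observations averaged per mean. Replaying the arithmetic:

```python
import math

def se_diff(ms, r):
    """se of a difference of two means, each averaged over r observations,
    using the error mean square ms from the appropriate stratum."""
    return math.sqrt(2 * ms / r)

print(round(se_diff(9.06944, 12), 6))  # 1.229461 (prep: whole plot error)
print(round(se_diff(3.9722, 9), 7))    # 0.9395271 (temp: subplot error)
print(round(se_diff(3.9722, 3), 4))    # 1.6273 (temp:prep: subplot error)
```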

C.6.2 Wafer experiment; Section 6.7.2

There are six controllable factors and one noise factor. The design is a split plot with the noise factor, over-etch time, as the whole plot treatment. Each subplot is an orthogonal array of 18 runs with six factors each at three levels. Tables of such arrays are available within S-PLUS, using the command oa.design.

The F-value and p-value have been deleted from the output, as the main effects of the factors should be compared using the whole plot error, and the interactions of the factors with OE should be compared using the subplot error. These two error components are not provided using the split plot formula, as there is no replication of the whole plot treatment. One way to extract them is to specify the model with all estimable interactions, and pool the appropriate (higher order) ones to give an estimate of the residual mean square.

> elect1.design<-oa.design(rep(3,6))

> elect1.design
   A  B  C  D  E  G
1  A1 B1 C1 D1 E1 G1
2  A1 B2 C2 D2 E2 G2
3  A1 B3 C3 D3 E3 G3
4  A2 B1 C1 D2 E2 G3
5  A2 B2 C2 D3 E3 G1
6  A2 B3 C3 D1 E1 G2
7  A3 B1 C2 D1 E3 G2
8  A3 B2 C3 D2 E1 G3
9  A3 B3 C1 D3 E2 G1
10 A1 B1 C3 D3 E2 G2
11 A1 B2 C1 D1 E3 G3
12 A1 B3 C2 D2 E1 G1
13 A2 B1 C2 D3 E1 G3
14 A2 B2 C3 D1 E2 G1
15 A2 B3 C1 D2 E3 G2
16 A3 B1 C3 D2 E3 G1
17 A3 B2 C1 D3 E1 G2
18 A3 B3 C2 D1 E2 G3

Orthogonal array design with 5 residual df.
Using columns 2, 3, 4, 5, 6, 7 from design oa.18.2p1x3p7
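The defining property of this 18-run array can be checked directly: it has strength two, so every pair of columns contains each of the 9 level combinations exactly 18/9 = 2 times. A Python sketch (not from the book), with the runs transcribed from the listing above using levels 1 to 3:

```python
from itertools import combinations
from collections import Counter

# The 18 runs printed above, factors A, B, C, D, E, G coded 1..3
runs = [
    (1,1,1,1,1,1), (1,2,2,2,2,2), (1,3,3,3,3,3),
    (2,1,1,2,2,3), (2,2,2,3,3,1), (2,3,3,1,1,2),
    (3,1,2,1,3,2), (3,2,3,2,1,3), (3,3,1,3,2,1),
    (1,1,3,3,2,2), (1,2,1,1,3,3), (1,3,2,2,1,1),
    (2,1,2,3,1,3), (2,2,3,1,2,1), (2,3,1,2,3,2),
    (3,1,3,2,3,1), (3,2,1,3,1,2), (3,3,2,1,2,3),
]

# Strength two: each ordered pair of levels occurs exactly twice
# in every pair of columns
for i, j in combinations(range(6), 2):
    counts = Counter((r[i], r[j]) for r in runs)
    assert all(counts[(a, b)] == 2 for a in (1, 2, 3) for b in (1, 2, 3))

print("all 15 column pairs balanced")
```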

> OE <- factor(c(rep(1,18), rep(2,18)))

> elect.design <- design(elect1.design, OE)
Warning messages:
1: argument(s) 1 have 18 rows, will be replicated to 36 rows to
   match other arguments in: data.frame(elect1.design, OE)
2: Row names were wrong length, using default names in: data.f\
rame(elect1.design, OE)

> elect.design
   A  B  C  D  E  G OE
1  A1 B1 C1 D1 E1 G1  1
2  A1 B2 C2 D2 E2 G2  1
3  A1 B3 C3 D3 E3 G3  1
4  A2 B1 C1 D2 E2 G3  1
...
33 A2 B3 C1 D2 E3 G2  2
34 A3 B1 C3 D2 E3 G1  2
35 A3 B2 C1 D3 E1 G2  2
36 A3 B3 C2 D1 E2 G3  2

> y <- scan()
1: 4750 5444 5802 6088 9000 5236 12960 5306 9370 4942
11: 5516 5084 4890 8334 10750 12508 5762 8692 5050 5884
21: 6152 6216 9390 5902 12660 5476 9812 5206 5614 5322
31: 5108 8744 10750 11778 6286 8920
37:

> elect.df <- data.frame(y, elect.design)
> rm(y, elect.design)
> elect.aov <- aov(y ~ (A+B+C+D+E+G) + OE + OE*(A+B+C+D+E+G), elect.df)
> summary(elect.aov)

          Df Sum of Sq  Mean Sq
A          2  84082743 42041371
B          2   6996828  3498414
C          2   3289867  1644933
D          2   5435943  2717971
E          2  98895324 49447662
G          2  28374240 14187120
OE         1    408747   408747
OE:A       2    112170    56085
OE:B       2    245020   122510
OE:C       2      5983     2991
OE:D       2    159042    79521
OE:E       2    272092   136046
OE:G       2     13270     6635
Residuals 10   4461690   446169

> summary(elect.aov, split=list(A=list(1,2), B=list(1,2),
+ C=list(1,2), D=list(1,2),
+ E=list(1,2), G=list(1,2)))

             Df Sum of Sq  Mean Sq
A             2  84082743 42041371
A: Ctst 1     1  27396340 27396340
A: Ctst 2     1  56686403 56686403
B             2   6996828  3498414
B: Ctst 1     1   5415000  5415000
B: Ctst 2     1   1581828  1581828
C             2   3289867  1644933
C: Ctst 1     1   2275504  2275504
C: Ctst 2     1   1014363  1014363
D             2   5435943  2717971
D: Ctst 1     1    130833   130833
D: Ctst 2     1   5305110  5305110
E             2  98895324 49447662
E: Ctst 1     1  22971267 22971267
E: Ctst 2     1  75924057 75924057
G             2  28374240 14187120
G: Ctst 1     1   2257067  2257067
G: Ctst 2     1  26117174 26117174
OE            1    408747   408747
OE:A          2    112170    56085
OE:A: Ctst 1  1       620      620
OE:A: Ctst 2  1    111549   111549
OE:B          2    245020   122510
OE:B: Ctst 1  1    192963   192963
OE:B: Ctst 2  1     52057    52057
OE:C          2      5983     2991
OE:C: Ctst 1  1      3220     3220
OE:C: Ctst 2  1      2763     2763
OE:D          2    159042    79521
OE:D: Ctst 1  1     55681    55681
OE:D: Ctst 2  1    103361   103361
OE:E          2    272092   136046
OE:E: Ctst 1  1      1734     1734
OE:E: Ctst 2  1    270358   270358
OE:G          2     13270     6635
OE:G: Ctst 1  1     12331    12331
OE:G: Ctst 2  1       939      939
Residuals    10   4461690   446169
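The single-degree-of-freedom lines can be reproduced by hand. Assuming the 36 responses are listed in the row order of elect.df and that S-PLUS's default Helmert contrasts are in force, "A: Ctst 1" compares level 2 with level 1 and "A: Ctst 2" compares level 3 with the average of levels 1 and 2; for a balanced factor with r observations per level, SS = r (sum c_i ybar_i)^2 / sum c_i^2. A Python sketch for factor A:

```python
# The 36 responses in the row order of elect.df (see the scan() above)
y = [4750, 5444, 5802, 6088, 9000, 5236, 12960, 5306, 9370, 4942,
     5516, 5084, 4890, 8334, 10750, 12508, 5762, 8692, 5050, 5884,
     6152, 6216, 9390, 5902, 12660, 5476, 9812, 5206, 5614, 5322,
     5108, 8744, 10750, 11778, 6286, 8920]

# A runs through levels 1,2,3 in blocks of three, repeating every 9 runs
A = [(k % 9) // 3 + 1 for k in range(36)]

m = {a: sum(yi for yi, ai in zip(y, A) if ai == a) / 12 for a in (1, 2, 3)}

# Helmert contrasts c1 = (-1, 1, 0) and c2 = (-1, -1, 2), 12 obsns/level
ss_ctst1 = (m[2] - m[1]) ** 2 * 12 / 2
ss_ctst2 = (2 * m[3] - m[1] - m[2]) ** 2 * 12 / 6

print(round(ss_ctst1), round(ss_ctst2))  # compare with A: Ctst 1 and 2
```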

> summary(aov(y ~ A*B*C*D*E*G*OE, elect.df))

          Df Sum of Sq  Mean Sq
A          2  84082743 42041371
B          2   6996828  3498414
C          2   3289867  1644933
D          2   5435943  2717971
E          2  98895324 49447662
G          2  28374240 14187120
OE         1    408747   408747
A:B        2    229714   114857
B:C        2   3001526  1500763
B:E        1   1175056  1175056
A:OE       2    112170    56085
B:OE       2    245020   122510
C:OE       2      5983     2991
D:OE       2    159042    79521
E:OE       2    272092   136046
G:OE       2     13270     6635
A:B:OE     2      2616     1308
B:C:OE     2     49258    24629
B:E:OE     1      3520     3520

> (229714+3001526+1175056)/5
[1] 881259.2
> (2616+49258+3520)/5
[1] 11078.8
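The pooling just carried out can be written out explicitly (a Python sketch of the same arithmetic, not from the book): the error for the factor main effects pools the A:B, B:C and B:E sums of squares on 5 degrees of freedom, and the error for the interactions with OE pools A:B:OE, B:C:OE and B:E:OE, also on 5 degrees of freedom.

```python
# (sum of squares, degrees of freedom) for the pooled interaction terms
wholeplot_ss = {"A:B": (229714, 2), "B:C": (3001526, 2), "B:E": (1175056, 1)}
subplot_ss = {"A:B:OE": (2616, 2), "B:C:OE": (49258, 2), "B:E:OE": (3520, 1)}

def pooled_ms(table):
    """Pooled mean square: total SS divided by total df."""
    ss = sum(s for s, _ in table.values())
    df = sum(d for _, d in table.values())
    return ss / df

ms_main = pooled_ms(wholeplot_ss)  # error MS for factor main effects
ms_oe = pooled_ms(subplot_ss)      # error MS for interactions with OE
print(ms_main, ms_oe)
```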

C.7 Bibliographic notes

The definitive guide to statistical analysis with S-PLUS is Venables and Ripley (1999), now in its third edition. A detailed discussion of contrasts for fitting and partitioning sums of squares is given in Chapter 6.2, and analysis of structured designs is outlined in Chapters 6.7 and 6.8. Models with several components of variation are discussed in Chapter 6.11, and the latest release of S-PLUS includes a quite powerful method for fitting mixed effects models, lme. The software and sets of data for Venables and Ripley (1999) are available on the World Wide Web; a current list of sites is maintained at http://www.stats.ox.ac.uk/pub/MASS3/sites.html.

Chambers and Hastie (1992) is a useful reference for detailed understanding of the structure of data and models in S and has many examples of analysis of structured designs in Chapter 5. Their book refers to the S language, which is included in S-PLUS. Spector (1994) gives a readable introduction to both languages, with a number of useful programming tips. The manuals distributed with S-PLUS are useful for problems with the same structure as one of their examples: designed experiments are discussed in Chapters 13 through 15 of the S-PLUS 2000 Guide to Statistics, Vol. I.

There are a number of S and S-PLUS functions available through the statlib site at Carnegie-Mellon University: http://www.stat.cmu.edu.


Of particular interest is the Designs archive at that site, which includes several programs for computing optimal designs, and the library of functions provided by F. Harrell (Harrell/hmisc in the S archive).


References

Abdelbasit, K.M. and Plackett, R.L. (1983). Experimental design for binary data. J. Amer. Statist. Assoc., 78, 90–98.

Armitage, P. (1975). Sequential medical trials. Oxford: Blackwell.

Aslett, R., Buck, R.J., Duvall, S.G., Sacks, J. and Welch, W.J. (1998). Circuit optimization via sequential computer experiments: design of an output buffer. Appl. Statist., 47, 31–48.

Atiqullah, M. and Cox, D.R. (1962). The use of control observations as an alternative to incomplete block designs. J. R. Statist. Soc. B, 24, 464–471.

Atkinson, A.C. (1985). Plots, transformation and regression. Oxford University Press.

Atkinson, A.C. and Donev, A.N. (1992). Optimal experimental designs. Oxford University Press.

Atkinson, A.C. and Donev, A.N. (1996). Experimental designs optimally balanced against trend. Technometrics, 38, 333–341.

Azais, J.-M., Monod, H. and Bailey, R.A. (1998). The influence of design on validity and efficiency of neighbour methods. Biometrics, 54, 1374–1387.

Azzalini, A. and Cox, D.R. (1984). Two new tests associated with analysis of variance. J. R. Statist. Soc. B, 46, 335–343.

Bailey, R.A. and Rowley, C.A. (1987). Valid randomization. Proc. Roy. Soc. London, A, 410, 105–124.

Bartlett, M.S. (1933). The vector representation of a sample. Proc. Camb. Phil. Soc., 30, 327–340.

Bartlett, M.S. (1938). The approximate recovery of information from field experiments with large blocks. J. Agric. Sci., 28, 418–427.

Bartlett, M.S. (1978). Nearest neighbour models in the analysis of field experiments (with discussion). J. R. Statist. Soc. B, 40, 147–170.

Bartlett, R.H., Roloff, D.W., Cornell, R.G., Andrews, A.F., Dillon, P.W. and Zwischenberger, J.B. (1985). Extracorporeal circulation in neonatal respiratory failure: A prospective randomized study. Pediatrics, 76, 479–487.

Begg, C.B. (1990). On inferences from Wei's biased coin design for clinical trials (with discussion). Biometrika, 77, 467–485.

Besag, J. and Higdon, D. (1999). Bayesian analysis of agricultural field experiments (with discussion). J. R. Statist. Soc. B, 61, 691–746.

Beveridge, W.V.I. (1952). The art of scientific investigation. London: Heinemann.

Biggers, J.D. and Heyner, R.R. (1961). Studies on the amino acid requirements of cartilaginous long bone rudiments in vitro. J. Experimental Zoology, 147, 95–112.

Blackwell, D. and Hodges, J.L. (1957). Design for the control of selection bias. Ann. Math. Statist., 28, 449–460.

Blot, W.J. and 17 others (1993). Nutritional intervention trials in Linxian, China: supplementation with specific vitamin-mineral combinations, cancer incidence, and disease-specific mortality in the general population. J. Nat. Cancer Inst., 85, 1483–1492.

Booth, K.H.V. and Cox, D.R. (1962). Some systematic supersaturated designs. Technometrics, 4, 489–495.

Bose, R.C. (1938). On the application of Galois fields to the problem of the construction of Hyper-Graeco Latin squares. Sankhya, 3, 323–338.

Bose, R.C. and Bush, K.A. (1952). Orthogonal arrays of strength two and three. Ann. Math. Statist., 23, 508–524.

Bose, R.C., Shrikhande, S.S. and Parker, E.T. (1960). Further results on the construction of mutually orthogonal Latin squares and the falsity of Euler's conjecture. Canad. J. Math., 12, 189–203.

Box, G.E.P. and Draper, N.R. (1959). A basis for the selection of a response surface design. J. Amer. Statist. Assoc., 54, 622–654.

Box, G.E.P. and Draper, N.R. (1969). Evolutionary operation. New York: Wiley.

Box, G.E.P. and Hunter, J.S. (1957). Multi-factor experimental designs for exploring response surfaces. Ann. Math. Statist., 28, 195–241.

Box, G.E.P. and Lucas, H.L. (1959). Design of experiments in nonlinear situations. Biometrika, 46, 77–90.

Box, G.E.P. and Wilson, K.B. (1951). On the experimental attainment of optimum conditions (with discussion). J. R. Statist. Soc., B, 13, 1–45.

Box, G.E.P., Hunter, W.G. and Hunter, J.S. (1978). Statistics for experimenters. New York: Wiley.

Brien, C.J. and Payne, R.W. (1999). Tiers, structure formulae and the analysis of complicated experiments. J. R. Statist. Soc. D, 48, 41–52.

Carlin, B.P., Kadane, J. and Gelfand, A.E. (1998). Approaches for optimal sequential decision analysis in clinical trials. Biometrics, 54, 964–975.

Carmichael, R.D. (1937). Introduction to the theory of groups of finite order. New York: Dover, 1956 reprint.

Chaloner, K. (1993). A note on optimal Bayesian design for nonlinear problems. J. Statist. Plann. Inf., 37, 229–235.

Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: a review. Statist. Sci., 10, 273–304.

Chambers, J.M. and Hastie, T.J. (editors) (1992). Statistical models in S. Pacific Grove: Wadsworth & Brooks/Cole.

Chao, S.-C. and Shao, J. (1997). Statistical methods for two-sequence three-period cross-over trials with incomplete data. Statistics in Medicine, 16, 1071–1039.

Cheng, C.-S., Martin, R.J. and Tang, B. (1998). Two-level factorial designs with extreme numbers of level changes. Ann. Statist., 26, 1522–1539.

Cheng, C.-S. and Mukerjee, R. (1998). Regular fractional factorial designs with minimum aberration and maximum estimation capacity. Ann. Statist., 26, 2289–2300.

Chernoff, H. (1953). Locally optimal designs for estimating parameters. Ann. Math. Statist., 24, 586–602.

Ciminera, J.L., Heyse, J.F., Nguyen, H. and Tukey, J.W. (1993). Tests for qualitative treatment by center interaction using a push-back procedure. Statistics in Medicine, 12, 1033–1045.

Claringbold, P.J. (1955). Use of the simplex design in the study of the joint reaction of related hormones. Biometrics, 11, 174–185.

Clarke, G.M. and Kempson, R.E. (1997). Introduction to the design and analysis of experiments. London: Arnold.

Cochran, W.G. and Cox, G.M. (1958). Experimental designs. Second edition. New York: Wiley.

Cook, R.D. and Weisberg, S. (1982). Residuals and inference in regression. London: Chapman & Hall.

Copas, J.B. (1973). Randomization models for the matched and unmatched 2 × 2 tables. Biometrika, 60, 467–476.

Cornell, J.A. (1981). Experiments with mixtures. New York: Wiley.

Cornfield, J. (1978). Randomization by group: a formal analysis. American J. Epidemiology, 108, 100–102.

Covey-Crump, P.A.K. and Silvey, S.D. (1970). Optimal regression designs with previous observations. Biometrika, 57, 551–566.

Cox, D.R. (1951). Some systematic experimental designs. Biometrika, 38, 310–315.

Cox, D.R. (1954). The design of an experiment in which certain treatment arrangements are inadmissible. Biometrika, 41, 287–295.

Cox, D.R. (1957). The use of a concomitant variable in selecting an experimental design. Biometrika, 44, 150–158.

Cox, D.R. (1958). Planning of experiments. New York: Wiley.

Cox, D.R. (1971). A note on polynomial response functions for mixtures. Biometrika, 58, 155–159.

Cox, D.R. (1982). Randomization and concomitant variables in the design of experiments. In Statistics and Probability: Essays in honor of C.R. Rao. Editors G. Kallianpur, P.R. Krishnaiah and J.K. Ghosh. Amsterdam: North Holland, pp. 197–202.

Cox, D.R. (1984a). Interaction (with discussion). Int. Statist. Rev., 52, 1–31.

Cox, D.R. (1984b). Effective degrees of freedom and the likelihood ratio test. Biometrika, 71, 487–493.

Cox, D.R. (1992). Causality: some statistical aspects. J. R. Statist. Soc., A, 155, 291–301.

Cox, D.R. and Hinkley, D.V. (1974). Theoretical statistics. London: Chapman & Hall.

Cox, D.R. and McCullagh, P. (1982). Some aspects of analysis of covariance (with discussion). Biometrics, 38, 541–561.

Cox, D.R. and Snell, E.J. (1981). Applied statistics. London: Chapman & Hall.

Cox, D.R. and Wermuth, N. (1996). Multivariate dependencies. London: Chapman & Hall.

Daniel, C. (1959). Use of half normal plot in interpreting factorial two-level experiments. Technometrics, 1, 311–341.

Daniel, C. (1994). Factorial one-factor-at-a-time experiments. American Statistician, 48, 132–135.

Davies, O.L. (editor) (1956). Design and analysis of industrial experiments. 2nd ed. Edinburgh: Oliver & Boyd.

Dawid, A.P. (2000). Causality without counterfactuals (with discussion). J. Amer. Statist. Assoc., 95, to appear.

Dawid, A.P. and Sebastiani, P. (1999). Coherent dispersion criteria for optimal experimental design. Ann. Statist., 27, 65–81.

Dean, A. and Voss, D. (1999). Design and analysis of experiments. New York: Springer.

Denes, J. and Keedwell, A.D. (1974). Latin squares and their applications. London: English Universities Press.

Desu, M.M. and Raghavarao, D. (1990). Sample size methodology. New York: Academic Press.

Dey, A. and Mukerjee, R. (1999). Fractional factorial plans. New York: Wiley.

Donnelly, C.A. and Ferguson, N.M. (1999). Statistical aspects of BSE and vCJD. London: Chapman & Hall.

Draper, N. and Smith, H. (1998). Applied regression analysis. 3rd edition. New York: Wiley.

Easton, D.F., Peto, J. and Babiker, A.G. (1991). Floating absolute risk: alternative to relative risk in survival and case-control analysis avoiding an arbitrary reference group. Statistics in Medicine, 10, 1025–1035.

Elfving, G. (1952). Optimum allocation in linear regression theory. Ann. Math. Statist., 23, 255–262.

Elfving, G. (1959). Design of linear experiments. In Cramer Festschrift volume, ed. U. Grenander, pp. 58–74. New York: Wiley.

Fang, K.-T. and Wang, Y. (1993). Number-theoretic methods in statistics. London: Chapman & Hall.

Fang, K.-T., Wang, Y. and Bentler, P.M. (1994). Some applications of number-theoretic methods in statistics. Statist. Sci., 9, 416–428.

Farewell, V.T. and Herzberg, A.M. (2000). Plaid designs for the evaluation of training for medical practitioners. To appear.

Fearn, T. (1992). Box-Cox transformations and the Taguchi method: an alternative analysis of a Taguchi case study. Appl. Statist., 41, 553–559.

Fedorov, V.V. (1972). Theory of optimal experiments. (English translation from earlier Russian edition). New York: Academic Press.

Fedorov, V.V. and Hackl, P. (1997). Model oriented design of experiments. New York: Springer.

Finney, D.J. (1945a). Some orthogonal properties of the 4 × 4 and 6 × 6 Latin squares. Ann. Eugenics, 12, 213–217.

Finney, D.J. (1945b). The fractional replication of factorial arrangements. Ann. Eugenics, 12, 283–290.

Firth, D. and Menezes, R. (2000). Quasi-variances for comparing groups: control relative not absolute error. To appear.

Fisher, R.A. (1926). The arrangement of field experiments. J. Ministry of Agric., 33, 503–513.

Fisher, R.A. (1935). Design of experiments. Edinburgh: Oliver & Boyd.

Fisher, R.A. and Mackenzie, W.A. (1923). Studies in crop variation II. The manurial response of different potato varieties. J. Agric. Sci., 13, 311–320.

Flournoy, N., Rosenberger, W.F. and Wong, W.K. (1998) (eds). New developments and applications in experimental design. Hayward: Institute of Mathematical Statistics.

Fries, A. and Hunter, W.G. (1980). Minimum aberration in 2^(k-p) designs. Technometrics, 22, 601–608.

Gail, M. and Simon, R. (1985). Testing for qualitative interaction between treatment effects and patient subsets. Biometrics, 41, 361–372.

Gilmour, A.R., Cullis, B.R. and Verbyla, A.P. (1997). Accounting for natural and extraneous variation in the analysis of field experiments. J. Agric. Bio. Environ. Statist., 2, 269–273.

Gilmour, S.G. and Ringrose, T.J. (1999). Controlling processes in food technology by simplifying the canonical form of fitted response surfaces. Appl. Statist., 48, 91–102.

Ginsberg, E.S., Mello, N.K., Mendelson, J.H., Barbieri, R.L., Teoh, S.K., Rothman, M., Goa, X. and Sholar, J.W. (1996). Effects of alcohol ingestion on œstrogens in postmenopausal women. J. Amer. Med. Assoc., 276, 1747–1751.

Goetghebeur, E. and Houwelingen, H.C. van (1998) (eds). Special issue on noncompliance in clinical trials. Statistics in Medicine, 17, 247–390.

Good, I.J. (1958). The interaction algorithm and practical Fourier analysis. J. R. Statist. Soc., B, 20, 361–372.

Grundy, P.M. and Healy, M.J.R. (1950). Restricted randomization and quasi-Latin squares. J. R. Statist. Soc., B, 12, 286–291.

Guyatt, G., Sackett, D., Taylor, D.W., Chong, J., Roberts, R. and Pugsley, S. (1986). Determining optimal therapy-randomized trials in individual patients. New England J. Medicine, 314, 889–892.

Hald, A. (1948). The decomposition of a series of observations. Copenhagen: Gads Forlag.

Hald, A. (1998). A history of mathematical statistics. New York: Wiley.

Hartley, H.O. and Smith, C.A.B. (1948). The construction of Youden squares. J. R. Statist. Soc., B, 10, 262–263.

Hedayat, A.S., Sloane, N.J.A. and Stufken, J. (1999). Orthogonal arrays: theory and applications. New York: Springer.

Heise, M.A. and Myers, R.H. (1996). Optimal designs for bivariate logistic regression. Biometrics, 52, 613–624.

Herzberg, A.M. (1967). The behaviour of the variance function of the difference between two estimated responses. J. R. Statist. Soc., B, 29, 174–179.

Herzberg, A.M. and Cox, D.R. (1969). Recent work on the design of experiments: a bibliography and a review. J. R. Statist. Soc., A, 132, 29–67.

Hill, R. (1986). A first course in coding theory. Oxford University Press.

Hinkelman, K. and Kempthorne, O. (1994). Design and analysis of experiments. New York: Wiley.

Holland, P.W. (1986). Statistics and causal inference (with discussion). J. Amer. Statist. Assoc., 81, 945–970.

Huang, Y.-C. and Wong, W.-K. (1998). Sequential considerations of multiple-objective optimal designs. Biometrics, 54, 1388–1397.

Hurrion, R.D. and Birgil, S. (1999). A comparison of factorial and random experimental design methods for the development of regression and neural simulation metamodels. J. Operat. Res. Soc., 50, 1018–1033.

Jennison, C. and Turnbull, B.W. (2000). Group sequential methods with applications to clinical trials. London: Chapman & Hall.

John, J.A. and Quenouille, M.H. (1977). Experiments: design and analysis. London: Griffin.

John, J.A., Russell, K.G., Williams, E.R. and Whitaker, D. (1999). Resolvable designs with unequal block sizes. Austr. and NZ J. Statist., 41, 111–116.

John, J.A. and Williams, E.R. (1995). Cyclic and computer generated designs. 2nd edition. London: Chapman & Hall.


John, P.W.M. (1971). Statistical design and analysis of experiments. New York: Macmillan.

Johnson, T. (1998). Clinical trials in psychiatry: background and statistical perspective. Statist. Methods in Medical Res., 7, 209–234.

Jones, B. and Kenward, M.G. (1989). Design and analysis of crossover trials. London: Chapman & Hall.

Kempthorne, O. (1952). Design of experiments. New York: Wiley.

Kiefer, J. (1958). On the nonrandomized optimality and randomized nonoptimality of symmetrical designs. Ann. Math. Statist., 29, 675–699.

Kiefer, J. (1959). Optimum experimental design (with discussion). J. R. Statist. Soc., B, 21, 272–319.

Kiefer, J. (1975). Optimal design: variation in structure and performance under change of criterion. Biometrika, 62, 277–288.

Kiefer, J. (1985). Collected papers. eds. L. Brown, I. Olkin, J. Sacks and H.P. Wynn. New York: Springer.

Kiefer, J. and Wolfowitz, J. (1959). Optimal designs in regression problems. Ann. Math. Statist., 30, 271–294.

Kruskal, W.H. (1961). The coordinate-free approach to Gauss-Markov estimation, and its application to missing and extra observations. Proc. 4th Berkeley Symposium, 1, 435–451.

Lauritzen, S.L. (2000). Causal inference from graphical models. In Complex stochastic systems. C. Kluppelberg, O.E. Barndorff-Nielsen and D.R. Cox, editors. London: Chapman & Hall/CRC.

Leber, P.D. and Davis, C.S. (1998). Threats to the validity of clinical trials employing enrichment strategies for sample selection. Controlled Clinical Trials, 19, 178–187.

Lehmann, E.L. (1975). Nonparametrics: statistical methods based on ranks. San Francisco: Holden-Day.

Lindley, D.V. (1956). On the measure of information provided by an experiment. Ann. Math. Statist., 27, 986–1005.

Logothetis, N. (1990). Box-Cox transformations and the Taguchi method. Appl. Statist., 39, 31–48.

Logothetis, N. and Wynn, H.P. (1989). Quality through design. Oxford University Press.

McCullagh, P. (2000). Invariance and factorial models (with discussion). J. R. Statist. Soc., B, 62, 209–256.

McCullagh, P. and Nelder, J.A. (1989). Generalized linear models. 2nd edition. London: Chapman & Hall.

McKay, M.D., Beckman, R.J. and Conover, W.J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21, 239–245.

Manly, B.J.F. (1997). Randomization, bootstrap and Monte Carlo methods in biology. London: Chapman & Hall.


Mehrabi, Y. and Matthews, J.N.S. (1998). Implementable Bayesian designs for limiting dilution assays. Biometrics, 54, 1398–1406.

Mesenbrink, P., Lu, J-C., McKenzie, R. and Taheri, J. (1994). Characterization and optimization of a wave-soldering process. J. Amer. Statist. Assoc., 89, 1209–1217.

Meyer, R.D., Steinberg, D.M. and Box, G.E.P. (1996). Follow up designs to resolve confounding in multifactorial experiments. Technometrics, 38, 303–318.

Monod, H., Azais, J.-M. and Bailey, R.A. (1996). Valid randomization for the first difference analysis. Austr. J. Statist., 38, 91–106.

Montgomery, D.C. (1997). Design and analysis of experiments. 4th edition. New York: Wiley.

Montgomery, D.C. (1999). Experimental design for product and process design and development (with comments). J. R. Statist. Soc., D, 48, 159–177.

Nair, V.J. (editor) (1992). Taguchi's parameter design: a panel discussion. Technometrics, 34, 127–161.

Neiderreiter, H. (1992). Random number generation and quasi-Monte Carlo methods. Philadelphia: SIAM.

Nelder, J.A. (1965a). The analysis of experiments with orthogonal block structure. I Block structure and the null analysis of variance. Proc. Roy. Soc. London, A, 283, 147–162.

Nelder, J.A. (1965b). The analysis of experiments with orthogonal block structure. II Treatment structure and the general analysis of variance. Proc. Roy. Soc. London, A, 283, 163–178.

Newcombe, R.G. (1996). Sequentially balanced three-squares cross-over designs. Statistics in Medicine, 15, 2143–2147.

Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Roczniki Nauk Rolniczych, 10, 1–51 (in Polish). English translation of Section 9 by D.M. Dabrowska and T.P. Speed (1990), Statist. Sci., 9, 465–480.

Olguin, J. and Fearn, T. (1997). A new look at half-normal plots for assessing the significance of contrasts for unreplicated factorials. Appl. Statist., 46, 449–462.

Owen, A. (1992). Orthogonal arrays for computer experiments, integration, and visualization. Statist. Sinica, 2, 439–452.

Owen, A. (1993). A central limit theorem for Latin hypercube sampling. J. R. Statist. Soc., B, 54, 541–551.

Papadakis, J.S. (1937). Méthode statistique pour des expériences sur champ. Bull. Inst. Amer. Plantes a Salonique, No. 23.

Patterson, H.D. and Williams, E.R. (1976). A new class of resolvable incomplete block designs. Biometrika, 63, 83–92.

Pearce, S.C. (1970). The efficiency of block designs in general. Biometrika, 57, 339–346.


Pearl, J. (2000). Causality: models, reasoning and inference. Cambridge: Cambridge University Press.

Pearson, E.S. (1947). The choice of statistical tests illustrated on the interpretation of data classed in a 2 × 2 table. Biometrika, 34, 139–167.

Piantadosi, S. (1997). Clinical trials. New York: Wiley.

Pistone, G. and Wynn, H.P. (1996). Generalised confounding with Gröbner bases. Biometrika, 83, 653–666.

Pistone, G., Riccomagno, E. and Wynn, H.P. (2000). Algebraic statistics. London: Chapman & Hall/CRC.

Pitman, E.J.G. (1937). Significance tests which may be applied to samples from any populations: III The analysis of variance test. Biometrika, 29, 322–335.

Plackett, R.L. and Burman, J.P. (1945). The design of optimum multifactorial experiments. Biometrika, 33, 305–325.

Preece, A.W., Iwi, G., Davies-Smith, A., Wesnes, K., Butler, S., Lim, E. and Varney, A. (1999). Effect of 915-MHz simulated mobile phone signal on cognitive function in man. Int. J. Radiation Biology, 75, 447–456.

Preece, D.A. (1983). Latin squares, Latin cubes, Latin rectangles, etc. In Encyclopedia of statistical sciences, Vol. 4. S. Kotz and N.L. Johnson, eds, 504–510.

Preece, D.A. (1988). Semi-Latin squares. In Encyclopedia of statistical sciences, Vol. 8. S. Kotz and N.L. Johnson, eds, 359–361.

Pukelsheim, F. (1993). Optimal design of experiments. New York: Wiley.

Quenouille, M.H. (1953). The design and analysis of experiments. London: Griffin.

Raghavarao, D. (1971). Construction and combinatorial problems in design of experiments. New York: Wiley.

Raghavarao, D. and Zhou, B. (1997). A method of constructing 3-designs. Utilitas Mathematica, 52, 91–96.

Raghavarao, D. and Zhou, B. (1998). Universal optimality of UE 3-designs for a competing effects model. Comm. Statist.–Theory Meth., 27, 153–164.

Rao, C. R. (1947). Factorial experiments derivable from combinatorial arrangements of arrays. Suppl. J. R. Statist. Soc., 9, 128–139.

Redelmeier, D. and Tibshirani, R. (1997a). Association between cellular phones and car collisions. N. England J. Med., 336, 453–458.

Redelmeier, D. and Tibshirani, R.J. (1997b). Is using a cell phone like driving drunk? Chance, 10, 5–9.

Reeves, G.K. (1991). Estimation of contrast variances in linear models. Biometrika, 78, 7–14.

Ridout, M.S. (1989). Summarizing the results of fitting generalized linear models from designed experiments. In Statistical modelling: Proceedings of GLIM89. A. Decarli et al. editors, pp. 262–269. New York: Springer.

Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist., 22, 400–407.

Rosenbaum, P.R. (1987). The role of a second control group in an observational study (with discussion). Statist. Sci., 2, 292–316.

Rosenbaum, P.R. (1999). Blocking in compound dispersion experiments. Technometrics, 41, 125–134.

Rosenberger, W.F. and Grill, S.E. (1997). A sequential design for psychophysical experiments: an application to estimating timing of sensory events. Statistics in Medicine, 16, 2245–2260.

Rubin, D.B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol., 66, 688–701.

Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989). Design and analysis of computer experiments (with discussion). Statist. Sci., 4, 409–436.

Sattherthwaite, F. (1958). Random balanced experimentation. Technometrics, 1, 111–137.

Scheffe, H. (1958). Experiments with mixtures (with discussion). J. R. Statist. Soc., B, 20, 344–360.

Scheffe, H. (1959). Analysis of variance. New York: Wiley.

Senn, S.J. (1993). Cross-over trials in clinical research. Chichester: Wiley.

Shah, K.R. and Sinha, B.K. (1989). Theory of optimal designs. Berlin: Springer.

Silvey, S.D. (1980). Optimal design. London: Chapman & Hall.

Singer, B.H. and Pincus, S. (1998). Irregular arrays and randomization. Proc. Nat. Acad. Sci. USA, 95, 1363–1368.

Smith, K. (1918). On the standard deviation of adjusted and interpolated values of an observed polynomial function and its constants, and the guidance they give towards a proper choice of the distribution of observations. Biometrika, 12, 1–85.

Spector, P. (1994). An introduction to S and S-Plus. Belmont: Duxbury.

Speed, T.P. (1987). What is an analysis of variance? (with discussion). Ann. Statist., 15, 885–941.

Stein, M. (1987). Large sample properties of simulations using Latin hypercube sampling. Technometrics, 29, 143–151.

Stigler, S.M. (1986). The history of statistics. Cambridge, Mass: Harvard University Press.

Street, A.P. and Street, D.J. (1987). Combinatorics of experimental design. Oxford University Press.

Tang, B. (1993). Orthogonal array-based Latin hypercubes. J. Amer. Statist. Assoc., 88, 1392–1397.

Thompson, M. E. (1997). Theory of sample surveys. London: Chapman & Hall.

Tsai, P.W., Gilmour, S.G., and Mead, R. (1996). An alternative analysis of Logothetis's plasma etching data. Letter to the editor. Appl. Statist., 45, 498–503.

Tuck, M.G., Lewis, S.M. and Cottrell, J.I.L. (1993). Response surface methodology and Taguchi: a quality improvement study from the milling industry. Appl. Statist., 42, 671–681.

Tukey, J.W. (1949). One degree of freedom for non-additivity. Biometrics, 5, 232–242.

UK Collaborative ECMO Trial Group. (1986). UK collaborative randomised trial of neonatal extracorporeal membrane oxygenation. Lancet, 249, 1213–1217.

Vaart, A.W. van der (1998). Asymptotic statistics. Cambridge: Cambridge University Press.

Vajda, S. (1967a). Patterns and configurations in finite spaces. London: Griffin.

Vajda, S. (1967b). The mathematics of experimental design; incomplete block designs and Latin squares. London: Griffin.

Venables, W.M. and Ripley, B.D. (1999). Modern applied statistics with S-PLUS. 3rd ed. Berlin: Springer.

Wald, A. (1947). Sequential analysis. New York: Wiley.

Wang, J.C. and Wu, C.F.J. (1991). An approach to the construction of asymmetric orthogonal arrays. J. Amer. Statist. Assoc., 86, 450–456.

Wang, Y.-G. and Leung, D. H.-Y. (1998). An optimal design for screening trials. Biometrics, 54, 243–250.

Ware, J.H. (1989). Investigating therapies of potentially great benefit: ECMO (with discussion). Statist. Sci., 4, 298–340.

Wei, L.J. (1988). Exact two-sample permutation tests based on the randomized play-the-winner rule. Biometrika, 75, 603–606.

Welch, B.L. (1937). On the z test in randomized blocks and Latin squares. Biometrika, 29, 21–52.

Wetherill, G.B. and Glazebrook, K.D. (1986). Sequential methods in statistics. 3rd edition. London: Chapman & Hall.

Whitehead, J. (1997). The design and analysis of sequential medical trials. 2nd edition. Chichester: Wiley.

Williams, E.J. (1949). Experimental designs balanced for the estimation of residual effects of treatments. Australian J. Sci. Res., A, 2, 149–168.

Williams, E.J. (1950). Experimental designs balanced for pairs of residual effects. Australian J. Sci. Res., A, 3, 351–363.

Williams, R.M. (1952). Experimental designs for serially correlated observations. Biometrika, 39, 151–167.

Wilson, E.B. (1952). Introduction to scientific research. New York: McGraw Hill.


Wynn, H.P. (1970). The sequential generation of D-optimum experimental designs. Ann. Math. Statist., 41, 1655–1664.

Yates, F. (1935). Complex experiments (with discussion). Suppl. J. R. Statist. Soc., 2, 181–247.

Yates, F. (1936). A new method of arranging variety trials involving a large number of varieties. J. Agric. Sci., 26, 424–455.

Yates, F. (1937). The design and analysis of factorial experiments. Technical communication 35. Harpenden: Imperial Bureau of Soil Science.

Yates, F. (1951a). Bases logiques de la planification des expériences. Ann. Inst. H. Poincaré, 12, 97–112.

Yates, F. (1951b). Quelques développements modernes dans la planification des expériences. Ann. Inst. H. Poincaré, 12, 113–130.

Yates, F. (1952). Principles governing the amount of experimentation in developmental work. Nature, 170, 138–140.

Youden, W.J. (1956). Randomization and experimentation (abstract). Ann. Math. Statist., 27, 1185–1186.
