Control of Experimental Error

Blocking

– A block is a group of homogeneous experimental units

– Maximize the variation among blocks in order to minimize the variation within blocks

Reasons for blocking

– To remove block-to-block variation from the experimental error (increase precision)

– Treatment comparisons are more uniform

– Increase the information by allowing the researcher to sample a wider range of conditions

Blocking

At least one replication is grouped in a homogeneous area

[Figure: two field layouts of treatments A, B, C, and D, one labeled "Blocking" and one labeled "Just replication"]

Criteria for blocking

Proximity or known patterns of variation in the field

– gradients due to fertility, soil type

– animals (experimental units) in a pen (block)

Time

– planting, harvesting

Management of experimental tasks

– individuals collecting data

– runs in the laboratory

Physical characteristics

– height, maturity

Natural groupings

– branches (experimental units) on a tree (block)

Randomized Block Design

Experimental units are first classified into groups (or blocks) of plots that are as nearly alike as possible

Linear Model: $Y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij}$

– $\mu$ = mean effect

– $\beta_i$ = ith block effect

– $\tau_j$ = jth treatment effect

– $\varepsilon_{ij}$ = treatment x block interaction, treated as error

Each treatment occurs in each block, the same number of times (usually once)

– Also known as the Randomized Complete Block Design

– RBD = RCB = RCBD

Minimize the variation within blocks - Maximize the variation between blocks

Pretty doesn’t count here

Randomized Block Design

Other ways to minimize variation within blocks:

Field operations should be completed in one block before moving to another

If plot management or data collection is handled by more than one person, assign each to a different block

Advantages of the RBD

Can remove site variation from experimental error and thus increase precision

When an operation cannot be completed on all plots at one time, can be used to remove variation between runs

By placing blocks under different conditions, it can broaden the scope of the trial

Can accommodate any number of treatments and any number of blocks, but each treatment must be replicated the same number of times in each block

Statistical analysis is fairly simple

Disadvantages of the RBD

Missing data can cause some difficulty in the analysis

Assignment of treatments by mistake to the wrong block can lead to problems in the analysis

If there is more than one source of unwanted variation, the design is less efficient

If the plots are uniform, then RBD is less efficient than CRD

As treatment or entry numbers increase, more heterogeneous area is introduced and effective blocking becomes more difficult. Split plot or lattice designs may be better suited.

Uses of the RBD

When you have one source of unwanted variation

Estimates the amount of variation due to the blocking factor

Randomization in an RBD

Each treatment occurs once in each block

Assign treatments at random to plots within each block

Use a different randomization for each block
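The randomization rules above can be sketched in a few lines of Python. This is only an illustration: the function name `randomize_rbd`, the treatment labels, and the seed are hypothetical choices, and any sound random number source would serve equally well.

```python
import random


def randomize_rbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # every treatment appears once in the block
        rng.shuffle(order)        # a fresh randomization for this block
        layout.append(order)
    return layout


# Hypothetical trial: four treatments in three blocks
layout = randomize_rbd(["A", "B", "C", "D"], n_blocks=3, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {' '.join(order)}")
```

Shuffling a fresh copy of the treatment list inside the loop is what gives each block its own randomization while still guaranteeing that every treatment occurs exactly once per block.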

Analysis of the RBD

Construct a two-way table of the means and deviations for each block and each treatment level

Compute the ANOVA table

Conduct significance tests

Calculate means and standard errors

Compute additional statistics if appropriate:

– Confidence intervals

– Comparisons of means

– CV

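The RBD ANOVA

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Total | rt-1 | SSTot = $\sum_i \sum_j (Y_{ij} - \bar{Y}_{..})^2$ | | |
| Block | r-1 | SSB = $t \sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$ | MSB = SSB/(r-1) | MSB/MSE |
| Treatment | t-1 | SST = $r \sum_j (\bar{Y}_{.j} - \bar{Y}_{..})^2$ | MST = SST/(t-1) | MST/MSE |
| Error | (r-1)(t-1) | SSE = SSTot - SSB - SST | MSE = SSE/[(r-1)(t-1)] | |

MSE is the divisor for all F ratios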
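The sums of squares, mean squares, and F ratios in the ANOVA table above can be computed directly from a blocks-by-treatments table of observations. The following is a minimal sketch rather than a substitute for a statistics package: the function name `rbd_anova` and the small 3 x 3 data set are hypothetical, and the code assumes one observation per cell with no missing values.

```python
def rbd_anova(data):
    """ANOVA quantities for a randomized block design.

    data[i][j] is the observation for block i and treatment j.
    """
    r = len(data)      # number of blocks
    t = len(data[0])   # number of treatments
    grand_mean = sum(sum(row) for row in data) / (r * t)
    block_means = [sum(row) / t for row in data]
    trt_means = [sum(data[i][j] for i in range(r)) / r for j in range(t)]

    ss_total = sum((y - grand_mean) ** 2 for row in data for y in row)
    ss_block = t * sum((m - grand_mean) ** 2 for m in block_means)
    ss_trt = r * sum((m - grand_mean) ** 2 for m in trt_means)
    ss_error = ss_total - ss_block - ss_trt

    df_block, df_trt = r - 1, t - 1
    df_error = df_block * df_trt
    ms_block = ss_block / df_block
    ms_trt = ss_trt / df_trt
    ms_error = ss_error / df_error

    return {
        "ss_total": ss_total, "ss_block": ss_block,
        "ss_trt": ss_trt, "ss_error": ss_error,
        "ms_block": ms_block, "ms_trt": ms_trt, "ms_error": ms_error,
        "f_block": ms_block / ms_error,  # MSE is the divisor for both F ratios
        "f_trt": ms_trt / ms_error,
    }


# Hypothetical data: rows are blocks, columns are treatments
example = [
    [10, 12, 15],
    [11, 14, 15],
    [14, 15, 20],
]
result = rbd_anova(example)
for name, value in result.items():
    print(f"{name:9s} {value:8.3f}")
```

The error sum of squares is obtained by subtraction, as in the table, so any rounding in the other terms carries into it.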

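Means and Standard Errors

Standard error of a treatment mean: $s_{\bar{Y}} = \sqrt{MSE/r}$

Confidence interval estimate: $L_i = \bar{Y}_i \pm t\sqrt{MSE/r}$

Standard error of a difference: $s_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{2MSE/r}$

Confidence interval estimate on a difference: $L_{1-2} = (\bar{Y}_1 - \bar{Y}_2) \pm t\sqrt{2MSE/r}$

t to test difference between two means: $t = (\bar{Y}_1 - \bar{Y}_2)/\sqrt{2MSE/r}$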

Numerical Example

Test the effect of different sources of nitrogen on the yield of barley:

– 5 sources and a control

Wanted to apply the results over a wide range of conditions, so the trial was conducted on four types of soil

– Soil type is the blocking factor

Located six plots at random on each of the four soil types

ANOVA

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Total | 23 | 492.36 | | |
| Soils (Block) | 3 | 192.56 | 64.19 | 21.61** |
| Fertilizer (Trt) | 5 | 255.28 | 51.06 | 17.19** |
| Error | 15 | 44.52 | 2.97 | |

| Source | (NH4)2SO4 | NH4NO3 | CO(NH2)2 | Ca(NO3)2 | NaNO3 | Control |
|---|---|---|---|---|---|---|
| Mean | 36.25 | 32.38 | 29.42 | 31.02 | 30.70 | 25.35 |

Standard error of a treatment mean = 0.86

CV = 5.6%

Standard error of a difference between two treatment means = 1.22
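The derived figures in this example can be checked from the sums of squares and treatment means alone. The sketch below is an arithmetic check only; the variable names are illustrative, and it assumes (as the reported values suggest) that the mean squares were rounded to two decimals before the F ratios and standard errors were calculated.

```python
import math

r, t = 4, 6  # blocks (soil types) and treatments (nitrogen sources + control)
ss_block, ss_trt, ss_error = 192.56, 255.28, 44.52
means = [36.25, 32.38, 29.42, 31.02, 30.70, 25.35]

# Mean squares, rounded to two decimals as in the ANOVA table (assumption)
ms_block = round(ss_block / (r - 1), 2)
ms_trt = round(ss_trt / (t - 1), 2)
ms_error = round(ss_error / ((r - 1) * (t - 1)), 2)

f_block = ms_block / ms_error
f_trt = ms_trt / ms_error

se_mean = math.sqrt(ms_error / r)      # standard error of a treatment mean
se_diff = math.sqrt(2 * ms_error / r)  # standard error of a difference
grand_mean = sum(means) / t            # balanced design: mean of the means
cv = 100 * math.sqrt(ms_error) / grand_mean

print(f"MSB = {ms_block:.2f}, MST = {ms_trt:.2f}, MSE = {ms_error:.2f}")
print(f"F (blocks) = {f_block:.2f}, F (treatments) = {f_trt:.2f}")
print(f"SE mean = {se_mean:.2f}, SE difference = {se_diff:.2f}, CV = {cv:.1f}%")
```

With these inputs the script reproduces the reported values: F ratios of 21.61 and 17.19, standard errors of 0.86 and 1.22, and a CV of 5.6%.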

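Confidence Interval Estimates

[Figure: treatment means with confidence limits for each nitrogen source, yield axis from 22 to 40]

| Source | Lower limit | Mean | Upper limit |
|---|---|---|---|
| (NH4)2SO4 | 34.41 | 36.25 | 38.09 |
| NH4NO3 | 30.54 | 32.38 | 34.21 |
| Ca(NO3)2 | 29.19 | 31.02 | 32.86 |
| NaNO3 | 28.86 | 30.70 | 32.54 |
| CO(NH2)2 | 27.59 | 29.42 | 31.26 |
| Control | 23.51 | 25.35 | 27.19 |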
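The confidence limits shown for each nitrogen source follow from the treatment means and the error mean square. The sketch below recomputes them under one labeled assumption: the critical value 2.131 is the tabulated two-sided 5% t value for 15 error degrees of freedom, which the source does not state explicitly but which matches the reported limits. Because the code starts from means already rounded to two decimals, a few limits differ from the reported ones by 0.01.

```python
import math

mse, r = 2.97, 4  # error mean square and number of blocks from the ANOVA
t_crit = 2.131    # ASSUMPTION: two-sided 5% t value for 15 error df

means = {
    "(NH4)2SO4": 36.25,
    "NH4NO3": 32.38,
    "Ca(NO3)2": 31.02,
    "NaNO3": 30.70,
    "CO(NH2)2": 29.42,
    "Control": 25.35,
}

se_mean = math.sqrt(mse / r)
se_diff = math.sqrt(2 * mse / r)
half_width = t_crit * se_mean

# Lower and upper confidence limits for every treatment mean
limits = {name: (m - half_width, m + half_width) for name, m in means.items()}
for name, (low, high) in limits.items():
    print(f"{name:10s} {low:6.2f} {means[name]:6.2f} {high:6.2f}")

# t statistic for the difference between ammonium sulfate and the control
t_stat = (means["(NH4)2SO4"] - means["Control"]) / se_diff
print(f"t = {t_stat:.2f}")
```

The same half-width applies to every mean because the design is balanced: each treatment mean is based on r = 4 observations and shares the pooled MSE.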

Report of Analysis

Differences among sources of nitrogen were highly significant

Ammonium sulfate, (NH4)2SO4, produced the highest mean yield; among the nitrogen sources, urea, CO(NH2)2, produced the lowest

When no nitrogen was added, the yield was only 25.35 kg/plot

Blocking on soil type was effective as evidenced by:

– large F for Soils (Blocks)

– small coefficient of variation (5.6%) for the trial

Is This Experiment Valid?

[Figure: field layout with areas labeled "No Irrigation", "Irrigated Pre-Plant", and "Full Irrigation"]

Missing Plots

If only one plot is missing, you can use the following formula:

$\hat{Y}_{ij} = (rB_i + tT_j - G)/[(r-1)(t-1)]$

Where:

• $B_i$ = sum of remaining observations in the ith block

• $T_j$ = sum of remaining observations in the jth treatment

• G = grand total of the available observations

• t, r = number of treatments, blocks, respectively

Total and error df must be reduced by 1

Used only to obtain a valid ANOVA

– No change in Error SS

– SS for treatments may be biased upwards
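As an illustration of the single missing plot formula, the sketch below estimates one missing cell in a blocks-by-treatments table. The function name `estimate_missing`, the use of `None` to mark the missing plot, and the data are all hypothetical.

```python
def estimate_missing(data):
    """Estimate the single missing value (marked None) in an RBD table.

    data[i][j] is the observation for block i and treatment j.
    """
    r = len(data)      # number of blocks
    t = len(data[0])   # number of treatments
    missing = [(i, j) for i in range(r) for j in range(t) if data[i][j] is None]
    if len(missing) != 1:
        raise ValueError("this formula applies to exactly one missing plot")
    i, j = missing[0]

    # B_i: sum of the remaining observations in the block with the gap
    block_sum = sum(y for y in data[i] if y is not None)
    # T_j: sum of the remaining observations for the affected treatment
    trt_sum = sum(data[k][j] for k in range(r) if data[k][j] is not None)
    # G: grand total of all available observations
    grand = sum(y for row in data for y in row if y is not None)

    return (r * block_sum + t * trt_sum - grand) / ((r - 1) * (t - 1))


# Hypothetical data: block 2, treatment 2 was lost
plots = [
    [10, 12, 15],
    [11, None, 15],
    [14, 15, 20],
]
estimate = estimate_missing(plots)
print(f"Estimated missing value: {estimate:.2f}")
```

Here B = 26, T = 27, and G = 112, so the estimate is (3 x 26 + 3 x 27 - 112)/4 = 11.75. After substituting it, the total and error df are each reduced by 1, as noted above.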

Two or Three Missing Plots

Estimate all but one of the missing values and use the formula

$\hat{Y}_{ij} = (rB_i + tT_j - G)/[(r-1)(t-1)]$

Use this value and all but one of the remaining guessed values and calculate again; continue in this manner until you have resolved all missing plots

You lose one error degree of freedom for each substituted value

Better approach: Let SAS account for missing values

– Use a procedure that can accommodate missing values (PROC GLM, PROC MIXED)

– Use adjusted means (LSMEANS) rather than MEANS

– Degrees of freedom are subtracted automatically for each missing observation
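The guess-and-recalculate procedure for several missing plots can be sketched as a loop that reapplies the single-plot formula to each missing cell in turn until the estimates stop changing. This is one possible reading of the steps above, not a definitive implementation: the function name `fill_missing`, the starting guess (the mean of the available observations), the tolerance, and the data are all assumptions made for illustration.

```python
def fill_missing(data, tol=1e-6, max_iter=100):
    """Iteratively estimate several missing values (marked None)."""
    r = len(data)      # number of blocks
    t = len(data[0])   # number of treatments
    filled = [row[:] for row in data]
    missing = [(i, j) for i in range(r) for j in range(t) if data[i][j] is None]

    # Starting guess for every gap: mean of the available observations
    available = [y for row in data for y in row if y is not None]
    start = sum(available) / len(available)
    for i, j in missing:
        filled[i][j] = start

    for _ in range(max_iter):
        biggest_change = 0.0
        for i, j in missing:
            current = filled[i][j]
            # Treat this cell as the only missing one; other gaps keep
            # their latest estimates
            block_sum = sum(filled[i]) - current
            trt_sum = sum(filled[k][j] for k in range(r)) - current
            grand = sum(sum(row) for row in filled) - current
            new = (r * block_sum + t * trt_sum - grand) / ((r - 1) * (t - 1))
            biggest_change = max(biggest_change, abs(new - current))
            filled[i][j] = new
        if biggest_change < tol:
            break
    return filled


# Hypothetical data with two lost plots
plots = [
    [None, 12, 15],
    [11, None, 15],
    [14, 15, 20],
]
completed = fill_missing(plots)
print(f"Estimates: {completed[0][0]:.2f}, {completed[1][1]:.2f}")
```

For this small table the estimates settle near 10.6 and 11.6, which are the values that satisfy the formula for both cells simultaneously. One error degree of freedom is still lost for each substituted value.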

Relative Efficiency

A way to measure the efficiency of RBD vs CRD

$RE = \dfrac{(r-1)MSB + r(t-1)MSE}{(rt-1)MSE}$

– r, t = number of blocks, treatments in the RBD

– MSB, MSE = block, error mean squares from the RBD

– If RE > 1, RBD was more efficient

– (RE - 1)100 = % increase in efficiency

– r(RE) = number of replications that would be required in the CRD to obtain the same level of precision

$RE = MSE_{CRD} / MSE_{RBD}$, where $MSE_{CRD}$ is the estimated error for a CRD and $MSE_{RBD}$ is the observed error for the RBD
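As a worked illustration, the relative efficiency formula can be applied to the barley example earlier in this document (r = 4, t = 6, MSB = 64.19, MSE = 2.97). The function name `relative_efficiency` is a hypothetical choice, and the result is a calculation made here rather than a figure reported in the source.

```python
def relative_efficiency(r, t, msb, mse):
    """Relative efficiency of an RBD compared with a CRD."""
    estimated_crd_error = ((r - 1) * msb + r * (t - 1) * mse) / (r * t - 1)
    return estimated_crd_error / mse  # MSE(CRD) / MSE(RBD)


# Values taken from the barley ANOVA table
r, t = 4, 6
re = relative_efficiency(r, t, msb=64.19, mse=2.97)

print(f"RE = {re:.2f}")
print(f"Increase in efficiency = {(re - 1) * 100:.0f}%")
print(f"Replications needed in a CRD = {r * re:.1f}")
```

With these inputs RE is about 3.7, so a CRD would have needed roughly 15 replications to match the precision obtained with 4 blocks, consistent with the large F for soils.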
