
Page 1: Two Factor Designs - Single-sized Experimental units - CR ...people.stat.sfu.ca/~cschwarz/CourseNotes/PDFbigbook-ALL/SAS-chapter-11.pdf · CHAPTER 10. TWO FACTOR DESIGNS - SINGLE-SIZED

Chapter 10

Two Factor Designs - Single-sized Experimental units - CR and RCB designs

Contents

10.1 Introduction
    10.1.1 Treatment structure
    10.1.2 Experimental unit structure
    10.1.3 Randomization structure
    10.1.4 Putting the three structures together
    10.1.5 Balance
    10.1.6 Fixed or random effects
    10.1.7 Assumptions
    10.1.8 General comments
10.2 Example - Effect of photo-period and temperature on gonadosomatic index - CRD
    10.2.1 Design issues
    10.2.2 Preliminary summary statistics
    10.2.3 The statistical model
    10.2.4 Fitting the model
    10.2.5 Hypothesis testing and estimation
    10.2.6 Model assessment
    10.2.7 Unbalanced data
10.3 Example - Effect of sex and species upon chemical uptake - CRD
    10.3.1 Design issues
    10.3.2 Preliminary summary statistics
    10.3.3 The statistical model
    10.3.4 Fitting the model
10.4 Power and sample size for two-factor CRD
10.5 Unbalanced data - Introduction
10.6 Example - Stream residence time - Unbalanced data in a CRD
    10.6.1 Preliminary summary statistics
    10.6.2 The Statistical Model
    10.6.3 Hypothesis testing and estimation
    10.6.4 Power and sample size
10.7 Example - Energy consumption in pocket mice - Unbalanced data in a CRD
    10.7.1 Design issues
    10.7.2 Preliminary summary statistics
    10.7.3 The statistical model
    10.7.4 Fitting the model
    10.7.5 Hypothesis testing and estimation
    10.7.6 Adjusting for unequal variances?
10.8 Example: Use-Dependent Inactivation in Sodium Channel Beta Subunit Mutation - BPK
    10.8.1 Introduction
    10.8.2 Experimental protocol
    10.8.3 Analysis
10.9 Blocking in two-factor CRD designs
10.10 FAQ
    10.10.1 How to determine sample size in two-factor designs
    10.10.2 What is the difference between a ‘block’ and a ‘factor’?
    10.10.3 If there is evidence of an interaction, does the analysis stop there?
    10.10.4 When should you use raw means or emmeans?

The suggested citation for this chapter of notes is:

Schwarz, C. J. (2019). Two Factor Designs - Single-sized Experimental units - CR and RCB designs. In Course Notes for Beginning and Intermediate Statistics. Available at http://www.stat.sfu.ca/~cschwarz/CourseNotes. Retrieved 2019-11-03.

10.1 Introduction

So far we’ve looked at two different experimental designs, the single-factor completely randomized design (1-factor CRD), and the single-factor randomized complete block design (1-factor RCB).

Both designs investigated if differences in the mean response could be attributed to different levels of a single factor. However, in many experiments, interest lies not only in the effect of a single factor, but in the joint effects of 2 or more factors.

For example:

• Yield of wheat. The yield of wheat depends upon many factors - two of which may be the variety and the amount of fertilizer applied. This has two factors - (1) variety which may have three levels representing three popular types of seeds, and (2) the amount of fertilizer which may be set at two levels.

• Pesticide levels. The pesticide levels may be measured in birds which may depend upon sex (two levels) and distance of the wintering grounds from agricultural fields (three levels).

• Performance of a product. The strength of paper may depend upon the amount of water added (two levels) and the type of wood fiber used in the mix (three levels).

c©2019 Carl James Schwarz 493 2019-11-03


There are many ways to design experiments with multiple factors - we will examine three of the most common designs used in ecological research - the completely randomized design (this chapter), the randomized block design (this chapter), and the split-plot design (next chapter).

As noted many times in this course, it is important to match the analysis of the data with the way the data was collected. Before attempting to analyze any experiment, the features of the experiment should be examined carefully. In particular, care must be taken to examine

• the treatment structure;

• the experimental unit structure;

• the randomization structures;

• the presence or absence of balance;

• if the levels of factors are fixed or random effects; and

• the assumptions implicitly made for the design.

If these features are not identified properly, then an incorrect design and analysis of an experiment will be made.

10.1.1 Treatment structure

The treatment structure refers to how the various levels of the factors are combined in the experiment.

The first step in any design or analysis is to start by identifying the factors in the experiment, their associated levels, and the treatments in the experiment. Treatments are the combinations of factor levels that are ‘applied’ [1] to experimental units.

The two-factor design has, as the name implies, two factors. We generically call these Factor A and Factor B with a and b levels respectively. We will examine only factorial treatment structures, i.e. every treatment combination appears somewhere in the experiment. For example, if Factor A has 2 levels, and Factor B has 3 levels, then all 6 treatment combinations appear in the experiment.
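The factorial structure can be enumerated mechanically. The sketch below, in Python, lists every treatment combination for a hypothetical Factor A with 2 levels and Factor B with 3 levels; the level labels are placeholders and are not taken from any experiment in these notes.

```python
from itertools import product

# Hypothetical level labels, for illustration only.
factor_a = ["a1", "a2"]          # Factor A: a = 2 levels
factor_b = ["b1", "b2", "b3"]    # Factor B: b = 3 levels

# A factorial treatment structure contains every (A level, B level) pair.
treatments = list(product(factor_a, factor_b))

for number, (a_level, b_level) in enumerate(treatments, start=1):
    print(number, a_level, b_level)

print("number of treatments:", len(treatments))  # a * b
```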

Why factorial designs?

Why do we insist on factorial treatment structures? There is a temptation to investigate multi-factor effects using a ‘change-one-at-a-time’ structure. For example, suppose you are investigating the effects of process temperature (at two levels, H & L), fiber type (at two levels - deciduous and coniferous) and initial pulping method (at two levels - mechanical or chemical) upon the strength of paper. In the ‘change-one-at-a-time’ treatment structure, the following treatment combinations would be tested:

1. L deciduous mechanical
2. H deciduous mechanical
3. L coniferous mechanical
4. L deciduous chemical

[1] Recall that in analytical surveys, the factor levels cannot be assigned to units (e.g. you can’t assign sex to an animal) and so the key point is that units are randomly selected from the relevant population.


The researcher then argues that the effect of fiber type could be found by examining the difference in strength between treatments (1) and (3); the effect of pulping method could be found by examining the difference in strength between treatments (4) and (1); and the effect of process temperature could be found by examining the difference in strength between treatments (1) and (2).

This is valid provided that the researcher is willing to assume the treatment effects are additive, i.e., that the effect of process temperature is the same at all levels of the other factors; that the effect of fiber type is the same at all levels of the other factors; and that the effect of initial pulping method is the same at all levels of the other factors. Unfortunately, there is no method available to test this assumption with the set of treatments listed above.

It is usually not a good idea to make this very strong assumption - what happens if the assumption is not true? In the previous example, it means that your ‘effects’ are only valid for the particular levels of the other factors that happened to be present in the comparison. For example, the process temperature effect would only be valid for deciduous fiber sources that are mechanically pulped.

A superior treatment structure is the factorial treatment structure. In the factorial treatment structure, every combination of levels appears in the experiment. For example, referring back to the previous experiment, all of the following treatments would appear in the experiment:

1. L deciduous mechanical
2. H deciduous mechanical
3. L coniferous mechanical
4. H coniferous mechanical
5. L deciduous chemical
6. H deciduous chemical
7. L coniferous chemical
8. H coniferous chemical

Now, the main effects of each factor are found as:

• main effect of temperature - treatments 1, 3, 5, 7 vs. 2, 4, 6, 8

• main effect of source - treatments 1, 2, 5, 6 vs. 3, 4, 7 and 8

• main effect of method - treatments 1, 2, 3, 4 vs. 5, 6, 7, 8

Each main effect would be interpreted as the ‘average change’ over the levels of the other factors.
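As a sketch of this arithmetic, the Python fragment below computes each main effect as the difference between the average of the four treatment means at one level and the average of the four treatment means at the other level. The mean strengths are invented purely for illustration; only the treatment numbering comes from the factorial list above.

```python
from statistics import mean

# Treatment number -> (temperature, fiber source, pulping method),
# numbered as in the factorial list above.
treatments = {
    1: ("L", "deciduous", "mechanical"),
    2: ("H", "deciduous", "mechanical"),
    3: ("L", "coniferous", "mechanical"),
    4: ("H", "coniferous", "mechanical"),
    5: ("L", "deciduous", "chemical"),
    6: ("H", "deciduous", "chemical"),
    7: ("L", "coniferous", "chemical"),
    8: ("H", "coniferous", "chemical"),
}

# Hypothetical mean paper strengths (made-up numbers, not real data).
strength = {1: 10.0, 2: 12.0, 3: 11.0, 4: 13.0,
            5: 14.0, 6: 16.0, 7: 15.0, 8: 17.0}


def main_effect(position, low, high):
    """Average strength at level `high` minus average strength at level `low`."""
    at_high = [strength[t] for t, levels in treatments.items() if levels[position] == high]
    at_low = [strength[t] for t, levels in treatments.items() if levels[position] == low]
    return mean(at_high) - mean(at_low)


temperature_effect = main_effect(0, "L", "H")              # 2, 4, 6, 8 vs. 1, 3, 5, 7
source_effect = main_effect(1, "deciduous", "coniferous")  # 3, 4, 7, 8 vs. 1, 2, 5, 6
method_effect = main_effect(2, "mechanical", "chemical")   # 5, 6, 7, 8 vs. 1, 2, 3, 4

print(temperature_effect, source_effect, method_effect)
```

Every one of the eight treatment means contributes to every main effect, which is why the factorial structure uses the data more fully than the change-one-at-a-time structure.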

In addition, it is possible to investigate if interactions exist between the various factors. For example, is the effect of process temperature the same for mechanical and chemical pulping methods? This would be examined by comparing the change in (1)+(3) vs. (2)+(4) [representing the effect of temperature for mechanically pulped wood] and the change in (5)+(7) vs. (6)+(8) [representing the effect of temperature for chemically pulped wood]. Can you specify how you would investigate the interaction between temperature and source? What about between source and method of pulping? All of these are known as two factor interactions.

The concept of a two-factor interaction can also be generalized to three-factor and higher interaction terms in much the same way.
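The temperature-by-method comparison described above can be sketched as a difference of differences. The mean strengths below are hypothetical; the treatment numbers follow the factorial list of eight treatments.

```python
# Hypothetical mean strengths for treatments 1-8 (invented for illustration).
strength = {1: 10.0, 2: 12.0, 3: 11.0, 4: 13.0,
            5: 14.0, 6: 19.0, 7: 15.0, 8: 20.0}

# Effect of temperature (H minus L) for mechanically pulped wood:
# treatments (2)+(4) vs. (1)+(3), averaged over fiber source.
temp_effect_mechanical = (strength[2] + strength[4]) / 2 - (strength[1] + strength[3]) / 2

# Effect of temperature (H minus L) for chemically pulped wood:
# treatments (6)+(8) vs. (5)+(7), averaged over fiber source.
temp_effect_chemical = (strength[6] + strength[8]) / 2 - (strength[5] + strength[7]) / 2

# If the two effects differ, temperature and pulping method interact.
interaction = temp_effect_chemical - temp_effect_mechanical

print(temp_effect_mechanical, temp_effect_chemical, interaction)
```

With these made-up means the temperature effect is larger for chemically pulped wood, so the contrast is non-zero; in a real experiment the contrast would be judged against its standard error.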


Why not factorial designs?

While a factorial treatment structure provides the maximal amount of information about the effects of factors and their interactions, there are some disadvantages. In general, the number of treatments that will appear in the experiment is equal to the product of the levels from all of the factors. In an experiment with many factors, this can be enormous. For example, in a 10 factor design, with each factor at 2 levels, there are 1024 treatment combinations. It turns out that in such large experiments, there are better ways to proceed that are beyond the scope of this course - an example of which is a fractional factorial design which selects a subset of the possible treatments to run with the understanding that the subset chosen loses information on some of the higher order interactions. If you are contemplating such an experiment, please seek competent help.
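The growth in the number of treatments is easy to check. This short Python sketch multiplies the number of levels of each factor; the function name is a hypothetical helper, not part of any package.

```python
from math import prod


def number_of_treatments(levels_per_factor):
    """Number of treatment combinations in a full factorial design."""
    return prod(levels_per_factor)


print(number_of_treatments([2, 3]))     # two factors with 2 and 3 levels
print(number_of_treatments([2] * 10))   # ten factors, each at 2 levels
```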

As well, in some cases, interest lies in estimating a response surface, e.g. factors are continuous variables (such as temperature) and the experimenter is interested in finding the optimal conditions. This gives rise to a class of designs called response surface designs which are beyond the scope of this course. Again, seek competent help.

Displaying and interpreting treatment effects - profile plots

An important part of the design and analysis of an experiment lies in predicting the type of response expected - in particular, what do you expect for the size of the main effects and do you expect to see an interaction.

During the design phase, these predictions are useful for determining the power and needed sample sizes for an experiment. During the analysis phase, these values and plots help in interpreting the results of the statistical analysis.

With two factors (A and B) each at two levels, you can construct a profile plot. These profile plots show the approximate effect of both factors simultaneously.

The key thing to look for is the ‘parallelism’ of the two lines.

Profile plots with no interaction between factors

For example, consider the theoretical [it is theoretical because it shows the population means which are never known exactly] profile plot of the mean responses below:


In this plot, the vertical distance between the two parallel line segments is the effect of Factor B, i.e., what happens to the mean response when you change the level of Factor B, but keep the level of Factor A constant. The main effect of Factor B is the AVERAGE vertical distance between the two lines when averaged over all levels of Factor A. Notice that if the lines are parallel, the vertical distance between the two lines is constant - this implies that the effect of Factor B (the vertical distance between the two lines) is the same regardless of the level of Factor A and the effect of Factor B and the main effect of Factor B are synonymous. In this case, we say that there is NO INTERACTION between Factor A and Factor B. Similarly, the effect of Factor A is the change in the line between the two levels of Factor A at a particular value of Factor B, i.e., the vertical change in each line segment. The main effect of Factor A is the AVERAGE change when averaged over all levels of Factor B. Notice that if the lines are parallel, the vertical change is the same for both lines - this implies that the effect of Factor A is the same regardless of the level of Factor B and that the effect of Factor A is synonymous with the main effect of Factor A. Once again, there is no interaction between A and B.

Profile plots with interaction between factors

Now consider the following theoretical profile plot:


In this plot, the vertical distance between the line segments CHANGES depending on where you are in Factor A. This implies that the effect of Factor B changes depending upon the level of A, i.e., there is INTERACTION between Factor A and B. The main effect of Factor A is the average effect when averaged over levels of B. In this case the main effect is not very interpretable (as will be seen in the plots below). Similarly, the vertical change for each line segment is different for each segment - again the effect of Factor A changes depending upon the level of Factor B - once again there is interaction between A and B.

The plots from an actual experiment must be interpreted with a grain of salt because even if there was no interaction, the lines may not be exactly parallel because of sampling variations in the sample means. The key thing to look for is the degree of parallelism. And it doesn’t matter which factor is plotted along the bottom - the plots may look different, but you will come to the same conclusions.
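A small simulation can illustrate why sample profile plots are rarely exactly parallel. The sketch below assumes hypothetical population means with no interaction, normally distributed responses with a made-up standard deviation, and 5 replicates per treatment; none of these values come from the notes.

```python
import random
from statistics import mean

random.seed(2019)  # arbitrary seed so the run is reproducible

# Hypothetical population means, keyed by (level of A, level of B).
# The lines are exactly parallel: B raises the mean by 3 at both levels of A.
pop_mean = {("a1", "b1"): 10.0, ("a2", "b1"): 14.0,
            ("a1", "b2"): 13.0, ("a2", "b2"): 17.0}
SD = 2.0          # assumed standard deviation of individual responses
REPLICATES = 5    # assumed number of experimental units per treatment


def interaction_contrast(m):
    """(Effect of B at a2) minus (effect of B at a1); zero means parallel lines."""
    return (m[("a2", "b2")] - m[("a2", "b1")]) - (m[("a1", "b2")] - m[("a1", "b1")])


# Simulate one experiment and compute the sample mean for each treatment.
sample_mean = {
    cell: mean(random.gauss(mu, SD) for _ in range(REPLICATES))
    for cell, mu in pop_mean.items()
}

print("population contrast:", interaction_contrast(pop_mean))
print("sample contrast:    ", interaction_contrast(sample_mean))
```

The population contrast is exactly zero, while the sample contrast will generally differ from zero purely because of sampling variation.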

If there is interaction, the line segments may even cross rather than remaining separate.

Illustrations of various theoretical profile plots


• No main effect of Factor A (average of lines is flat); small main effect of Factor B (if there was no main effect of Factor B the lines would coincide); and no interaction of Factors A and B.

• Large main effect of Factor A; small main effect of Factor B (average difference between lines is small); and no interaction between Factors A and B.

• No main effect of Factor A; large main effect of Factor B; and no interaction between Factors A and B.


• Large main effect of Factor A; large main effect of Factor B; and no interaction between Factors A and B.

• No main effect of Factor A; no main effect of Factor B; but large interaction between Factors A and B. This illustrates the dangers of investigating ‘main effects’ in the presence of interaction (why? - a good exam question!).

• Large main effect of Factor A; no main effect of Factor B; slight interaction. Again, this diagram illustrates the folly of discussing main effects in the presence of an interaction (why?).


• No main effect of Factor A; large main effect of Factor B; large interaction between Factor A and B. As before, there may be problems in interpreting main effects in the presence of an interaction (why?).


• Small main effect of Factor A; large main effect of Factor B; large interaction between Factors A and B. See previous notes about interpreting main effects in the presence of an interaction.
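The case of no main effects but a large interaction can be made concrete with a few lines of Python. The four population means below are hypothetical, chosen so that the two lines of the profile plot cross; they are not read off any plot in these notes.

```python
# Hypothetical population means keyed by (level of A, level of B).
mu = {("a1", "b1"): 10.0, ("a2", "b1"): 20.0,
      ("a1", "b2"): 20.0, ("a2", "b2"): 10.0}

# Effect of A (change from a1 to a2) at each level of B.
effect_a_at_b1 = mu[("a2", "b1")] - mu[("a1", "b1")]
effect_a_at_b2 = mu[("a2", "b2")] - mu[("a1", "b2")]

# Effect of B (change from b1 to b2) at each level of A.
effect_b_at_a1 = mu[("a1", "b2")] - mu[("a1", "b1")]
effect_b_at_a2 = mu[("a2", "b2")] - mu[("a2", "b1")]

# Main effects: the effects averaged over the levels of the other factor.
main_effect_a = (effect_a_at_b1 + effect_a_at_b2) / 2
main_effect_b = (effect_b_at_a1 + effect_b_at_a2) / 2

# Interaction: how much the effect of A changes between the levels of B.
interaction = effect_a_at_b2 - effect_a_at_b1

print("effect of A at b1, b2:", effect_a_at_b1, effect_a_at_b2)
print("main effects of A, B: ", main_effect_a, main_effect_b)
print("interaction:          ", interaction)
```

In this made-up example both main effects average out to zero even though each factor changes the mean response by 10 units at every level of the other factor.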

Further examples of profile plots

• Discuss the example of an environmental impact study. Here interaction indicates that there was an impact.

• Discuss profile plots for three factors.

• Here is the profile plot for an experiment to investigate the effect of wing depth and wing width upon the flight of paper airplanes. Based upon the profile plot below, what do you conclude?

The MOF publication Displaying factor relationships, Pamphlet 57 also has a discussion on profile plots and is available at the MOF library at http://www.stat.sfu.ca/~cschwarz/Stat-650/MOF/index.html.

10.1.2 Experimental unit structure

The experimental unit structure is the key element to the design of the experiment and the recognition of an existing experimental design.


In single-factor experiments, there is usually only a single size of experimental unit and the only design choice is blocking or not. However, in multifactor designs, the choices are much greater and the potential problems in design and analysis multiply!

For example, consider the following experimental designs to investigate the effect of light level (at two levels - High and Low), and the amount of water (also at two levels - Dry or Wet) upon the growth of pine tree seedlings in a greenhouse. There are a total of 8 seedlings.

• Design 1. Each seedling is put into its own pot. One pot is placed in each of 8 separate growth chambers. Two growth chambers are assigned to each combination of light level and water amount.

• Design 2. Two seedlings are placed in each pot. One pot is placed in each of 4 separate growth chambers. Each growth chamber is given one combination of light level and water amount.

• Design 3. Each seedling is put into its own pot. Two pots are placed into each of 4 separate growth chambers. Each growth chamber is assigned one combination of light level and water amount.

• Design 4. Each seedling is put into its own pot. Two pots are placed in each of 4 separate growth chambers. Two of the growth chambers are assigned the high light level; two are assigned the low light level. Within each chamber, one pot receives the wet water level; one pot the dry water level.

• Design 5. Two seedlings are placed in each pot. Two pots are placed in each of 4 separate growth chambers. Two of the growth chambers are assigned the high light level; two are assigned the low light level. Within each chamber, one pot receives the wet water level; one pot the dry water level. The growth of each seedling is measured.

Without much difficulty, more ways could be found to run this experiment! Each different design requires a different analysis!

Which experimental unit structure is better? Design 1 requires the most growth chambers, but is easiest to run. Design 2 requires fewer growth chambers, but suppose that the particular pot had an effect on growth (e.g. the previous researcher used a herbicide in a previous experiment and didn’t clean the pot properly). Design 3 requires that two pots be placed in each chamber - is the chamber big enough? There is no best design that fits all problems!

In all cases, the data will consist of 8 measures of growth along with the light level and water level received. The data will not tell you about the experimental unit structure! Consequently, it is imperative that you think very carefully about the experimental unit structure and give explicit instructions on how to perform the experiment so that there is no ambiguity. You will see later in the course that every one of the previous designs would be analyzed in a different way!

In this course, we will look at two popular experimental design choices:

1. the Single-size of experimental unit (with and without blocking)

2. the Split-plot Design (with and without blocking). The Split-plot design (which takes its name from its agricultural heritage) is the most common ‘complicated’ design and, unfortunately, the design that is most often analyzed incorrectly. It is discussed in a different chapter.

The simplest designs have a single size of experimental unit and the observation unit is the same as the experimental unit, i.e. only one measurement is taken on each experimental unit. The greatest advantage of using a single-sized unit is that loss of that unit only entails the loss of one data point. If you are conducting multiple measurements on the same unit (e.g. following a unit over time), then the loss of that unit entails the potential loss of much more information. The greatest disadvantage of using a single-sized unit is that variation in responses may give poor power. However, the simple strategy of blocking is often sufficient to improve power without making the design too complicated.

A common problem is pseudo-replication where the observational unit is not the same as the experimental unit. Hurlbert (1984) should be reread at this point.

In some designs, multiple measurements are taken on the SAME experimental units - typically repeated measurements over time [2]. The most common reason for multiple measurements on the same unit is to have each unit serve as its own control and thereby have greater power to detect changes over the repeated measurements.

10.1.3 Randomization structure

We have already seen two randomization methods:

• Complete randomization where treatments are assigned completely at random to experimental units

• Complete-Block randomization where experimental units are first grouped into blocks, every block has every treatment, and treatments are randomized to units within blocks.

The actual randomization in practice is a straightforward generalization of that done before for a single-factor design and I won’t spend too much time on it but it will be discussed in class.
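One way to carry out these two randomizations is sketched below in Python. The unit numbering, the number of replicates, and the number of blocks are hypothetical; the treatment labels are placeholders for the four treatment combinations of a 2 x 2 factorial.

```python
import random

random.seed(650)  # arbitrary seed so the layout can be reproduced

# Four treatment combinations from a 2 x 2 factorial (placeholder labels).
treatments = ["a1b1", "a1b2", "a2b1", "a2b2"]


def complete_randomization(treatments, replicates):
    """Assign treatments completely at random to units 1..n."""
    assignment = treatments * replicates
    random.shuffle(assignment)
    return {unit: trt for unit, trt in enumerate(assignment, start=1)}


def complete_block_randomization(treatments, n_blocks):
    """Within each block, randomize every treatment to one unit."""
    layout = {}
    for block in range(1, n_blocks + 1):
        # A fresh random ordering of all treatments for this block.
        order = random.sample(treatments, k=len(treatments))
        layout[block] = {unit: trt for unit, trt in enumerate(order, start=1)}
    return layout


crd = complete_randomization(treatments, replicates=2)      # 8 units in total
rcb = complete_block_randomization(treatments, n_blocks=2)  # 2 blocks of 4 units

print(crd)
print(rcb)
```

In the completely randomized layout any unit can receive any treatment, whereas in the blocked layout the randomization is restricted so that every block contains each treatment exactly once.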

These will be the only two randomization structures considered in this course. In both cases, there is complete randomization over all units or over all units within a block. This makes TIME a particularly difficult factor - there is often no randomization to new units at different time points. The problem is that non-randomization often introduces more complex covariance structures among the responses. For example, in repeated measurements over time, measurements that are close together in time would be expected to be more highly correlated than measurements that are far apart in time. In a complete-randomization scheme, the correlation would not be expected to change as a function of time separation.

Here are some examples of experiments that are NOT completely randomized designs.

• Measuring plankton levels at various locations and distances from the shore. At each location, samples are taken at 1, 5, and 10 m from the shore. Here the distances from the shore are not randomized to different locations - each location has all three distances from shore.

• The concentration of a chemical in the blood stream is measured on each rat at 1, 5, and 10 minutes after injection. The time of measurement is not randomized to individual rats - each rat is measured three times.

How could these experiments be redesigned to be CRD’s?

[2] Many experiments have TIME as one of their factors. If the same units are measured repeatedly over time, this is definitely NOT a completely randomized design. A more appropriate analysis would be a repeated measures design or a split-plot-in-time design. The former is beyond the scope of this course; the latter will be covered in a later section.


10.1.4 Putting the three structures together

We will examine the four most popular experimental designs based upon the above three structures. You may wish to draw a picture of the experimental layouts.

For example, consider a plant-growth experiment. There are two factors - light level (High or Low), and water level (Wet or Dry). Some possible designs that we will demonstrate how to analyze in this course include:

• Completely randomized design. Each seedling is randomly placed in its own pot. Each pot is randomly placed in its own greenhouse. Each greenhouse is randomly assigned one of the four treatment combinations. The experiments are all run at the same time or run in random order.

• Randomized complete block design. Each seedling is randomly placed in its own pot. Four pots are placed in each greenhouse. Within each greenhouse, each of the four pots is randomly assigned to one of the four treatment combinations. [This may require some modification to the greenhouse so that the two light levels can be applied within each greenhouse.]

• Split-plot - variant A. It may be too difficult to modify the greenhouses to have both light levels in each greenhouse. Therefore, two greenhouses are randomly assigned to each light level. Within each greenhouse, two pots are used. These are randomly assigned to the two watering levels.

• Split-plot - variant B. Four greenhouses are not available at one site, but we have two sites available, each with two greenhouses. Therefore, one greenhouse at each site is randomly assigned to each light level. Within each greenhouse, two pots are used. These are randomly assigned to the two watering levels.
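The difference between the first two randomization schemes in the list above can be sketched in a few lines of Python. This is only an illustration and not part of the SAS workflow in these notes: the function names, the number of greenhouses (8 for the CRD, 3 for the RCB), and the seed are all assumptions chosen for the example.

```python
import random

# The four treatment combinations of the plant-growth example.
TREATMENTS = ["High-Wet", "High-Dry", "Low-Wet", "Low-Dry"]


def randomize_crd(units, treatments, reps, seed=None):
    """CRD: one shuffle of all treatment labels over all experimental units."""
    rng = random.Random(seed)
    labels = [t for t in treatments for _ in range(reps)]
    if len(labels) != len(units):
        raise ValueError("need exactly one unit per treatment replicate")
    rng.shuffle(labels)
    return dict(zip(units, labels))


def randomize_rcb(blocks, treatments, seed=None):
    """RCB: a separate shuffle of the treatments within every block."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order  # position i in the list is pot i+1 in that block
    return layout


# Hypothetical layouts: 8 greenhouses (CRD, 2 replicates per treatment)
# and 3 greenhouses with 4 pots each (RCB).
crd = randomize_crd([f"greenhouse-{i}" for i in range(1, 9)], TREATMENTS, reps=2, seed=1)
rcb = randomize_rcb([f"greenhouse-{i}" for i in range(1, 4)], TREATMENTS, seed=1)

for unit, trt in crd.items():
    print("CRD", unit, trt)
for block, pots in rcb.items():
    print("RCB", block, pots)
```

In the CRD the greenhouse is the experimental unit and receives a single treatment; in the RCB every greenhouse (block) contains all four treatments, so the restriction on the randomization is visible in the layout itself.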

Further reading. Refer to the MOF publication What is the design?, Pamphlet 17 from the MOF library, available at http://www.stat.sfu.ca/~cschwarz/Stat-650/MOF/index.html, for more details.

10.1.5 Balance

Balance is a statistical property of a design. Balance used to be much more important in the days of hand computations, when the computations for balanced designs were particularly easy to do. This is less important today in the age of computers, but the unwary traveler may hit a few potholes, as will be seen in later sections.

The very simplest balanced design has an equal number of replicates assigned to each treatment combination. Balance in the experiment will give the greatest power to detect differences among the various treatments. As well, it makes the analysis particularly simple and most computer packages will do a good job of the analysis.

However, because of deliberate decision or demonic interference, unbalanced designs (unequal numbers of replicates for each treatment) can occur. Fortunately, the analysis of such designs in the simple case of a two-factor completely randomized design is straightforward, but there are a few subtle problems that will be pointed out by example. Some computer packages will give incorrect answers in the case of unbalanced data.

As you will see in a future section, the greatest danger in unbalanced designs is the lack of a complete factorial treatment structure. These types of experiments are extremely difficult to analyze properly.


10.1.6 Fixed or random effects

In some cases, the choice of levels for a factor is also of concern. If the experiment were to be repeated, would the same levels be chosen (fixed effects) or would a new set of levels be chosen (random effects)? Or, is interest limited to the effects of the levels that actually occurred in the experiment (fixed effects), or do you wish to generalize to a large population of levels from which you happened to choose a few for this experiment (random effects)?

As an illustration, consider an experiment on the effects of soil compaction on subsequent tree growth. Suppose that the experimenter obtained seedlings from several different seed sources. This experiment could be viewed as having two factors - the level of soil compaction, and the seed source.

Presumably, if the experiment were to be repeated, the same levels of compaction would be of interest. As well, these levels of compaction are of interest in their own right. Hence, compaction would be treated as a fixed effect.

However, what about the factor seed source? If the experiment were to be repeated, would the same sources of seeds be used? Are these the only sources of seeds available, or are there many other sources, of which only a few were chosen to be in this experiment? Do you want to extend your inference to other seed sources, or are these the only ones that you are really interested in?

Usually, you will want to argue that your conclusions should extend to other seed sources. If this is the case, then you must be able to argue that the sources you used are in some sense “typical of the ones to which you want to extend your inference”. The simplest way to do this is to argue that the sources you selected were essentially a random sample of all possible seed sources to which you wish to extend your inference. This factor is then a random effect.

The crucial first step in this model building is deciding which factors are fixed effects and which factors are random effects.

A factor is a fixed effect if:

• the same levels would be used if the experiment were to be repeated;

• inference will be limited ONLY to the levels used in the experiment.

A factor is a random effect if:

• the levels were chosen at random from a larger set of levels;

• new levels would be chosen if the experiment were to be repeated;

• inference is about the entire set of potential levels - not just the levels chosen in the experiment.

Typical fixed effects are factors such as sex, species, dose, and chemical. Typical random effects are subjects, locations, sites, and animals.

Other examples of random effects would be batches of material or experimental animals. Rarely are experiments run with an interest in the effects of those particular subjects, batches, or experimental animals - it is quite common to try to extrapolate results to future, different, subjects, batches, or animals.


We will start with a demonstration of the analysis of experiments where ALL EFFECTS are fixed effects. You can still proceed to analyze experiments with random effects the same way as before - up to a point. It turns out that this seemingly innocuous change to a factor has dramatic implications for the analysis of the experiment! As well, MANY poorly written packages will give WRONG RESULTS! In addition, contrary to the impression that statistics is a static science, the whole area of the analysis of models with fixed and random effects is undergoing a revolution in the statistical world. Many of the newer techniques are not discussed in textbooks and certainly not in the published literature. Even experienced statisticians have difficulty in keeping up with advances in this area.

For all but the simplest cases, seek help with models containing combinations of fixed and random effects (often called mixed models). Please contact me for additional help.

10.1.7 Assumptions

Each and every statistical procedure makes a number of assumptions about the data that should be verified as the analysis proceeds. Some of these assumptions can be examined using the data at hand; others, often the most important, can only be assessed using the meta-data about the experiment. Fortunately, many of the assumptions are identical to those seen in previous chapters. Please consult previous chapters for details on how to verify the assumptions.

The most important assumptions to examine are:

• The analysis matches the design! Enough said in past chapters.

• Equal variation within treatment groups. All the populations corresponding to treatments have equal variances. This can be checked by looking at the sample standard deviations for each group (where each group is formed by one of the treatment combinations). Unless the ratio between the standard deviations is larger than about 5:1, this is not likely a problem. Procedures are available for cases where the variances are not equal in all groups. Fortunately, ANOVA is fairly robust to unequal variances if the design is balanced.

Often you can anticipate an increase in the amount of chance variation with an increase in the mean. For example, traps with an ineffective bait will typically catch very few insects. The numbers caught may typically range from 0 to under 10. By contrast, a highly effective bait will tend to pull in more insects, but also with a greater range. Both the mean and the standard deviation will tend to be larger. A transformation may be called for (e.g. take logarithms of the response).

• No outliers. There are no outliers or unusual points. Look at the side-by-side dot plots formed by the treatment groups. Examine the residual plots after the model is fit.

• Normality within each treatment group. If the sample sizes are small in each group, then you must further assume that each population has a normal distribution. If the sample sizes are large in all groups, you are saved by the ‘central limit theorem’. Normal probability plots within each treatment group, or of the residuals found after the model fitting procedure, can be examined. However, these likely have poor power when the sample sizes are small and will detect minute differences when sample sizes are large. Hence, they are often not very informative.

• Are the errors independent? Another key assumption is that experimental units are independent of each other. For example, the response of one experimental animal does not affect the response of another experimental animal.


10.1.8 General comments

The key to a proper analysis of any experiment is recognizing the design that was used and then specifying a statistical model that incorporates the sources of variation in the design.

As you will see, this statistical model will have terms representing the main effects and interactions of the factors and terms for every size of experimental unit in the experiment. [The latter will become important when we analyze a split-plot design.]

Once the model is specified, the analysis of variance method (ANOVA) partitions the total variation in the observed responses into sources - one for each component representing a main effect or an interaction, and one for every size of experimental unit. [Again, the latter will become more important in split-plot designs.] For example, in the single factor CRD, the ANOVA table consisted of a line for total variation which is then split into sources representing the contribution from the single factor (the treatment sum of squares) and a contribution for experimental unit effects (the error sum of squares). In the single factor RCB design, the ANOVA table introduced yet another entry for the contribution from blocks (the block sum of squares).

As you will see later in this chapter, a two-factor design will have lines in the ANOVA table corresponding to the interaction of the two factors and their respective main effects.
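For a balanced two-factor CRD, the partition described above can be written compactly. The symbols are assumptions introduced here for illustration only (factors A and B with a and b levels, and n replicates per treatment combination); they are not notation used elsewhere in these notes.

```latex
\begin{align*}
SS_{\text{Total}} &= SS_{A} + SS_{B} + SS_{AB} + SS_{\text{Error}} \\
abn - 1 &= (a-1) + (b-1) + (a-1)(b-1) + ab(n-1)
\end{align*}
```

The second line is the matching split of the degrees of freedom. For example, with a = b = 2 and n = 5 it reads 19 = 1 + 1 + 1 + 16.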

Then, starting with the interaction, you successively test the hypothesis of ‘no effect’ from that source, i.e., you first test the hypothesis of no interaction effects, and then, depending upon the results of the test, you may or may not wish to test the main effects.

Rarely, if ever, are tests performed on experimental unit effects - it would be quite rare to expect that the experimental units are exactly identical!

Again, the hypothesis tests only tell you that some effect exists - they don’t tell you where the effect lies. You may need to explore the responses using multiple comparison procedures and/or confidence intervals for the marginal means or contrasts among means.

As before, you should always assess whether your model adequately fits the data and, before performing the experiment, determine if the sample size is adequate to detect biologically important effects.

When reporting results in a paper or thesis, try not to overburden the reader - no one is interested in the minute details - they want a broad picture - everything else can likely go into an appendix.

Be sure you carefully describe the experimental design so that someone can verify how the experiment was done.

I would recommend that you ALWAYS show a profile plot of the means of the various treatment combinations along with approximate 95% confidence intervals - this often tells the entire story.

In terms of the actual statistical computations, usually the F-statistics are reported along with the p-values, but rarely are all the ANOVA tables shown except in appendices or simple tables. In this day of the WWW, the raw data are often made available on a Web site in case someone else wishes to verify your analysis.


10.2 Example - Effect of photo-period and temperature on gonadosomatic index - CRD

A completely randomized design (CRD) is the simplest of the two-factor designs and serves as an introduction to the analysis of more complex designs. As noted many times in this course, it is important to match the analysis of the data with the way the data were collected. Before attempting to analyze any experiment, the features of the experiment should be examined carefully. In particular, the treatment, experimental unit, and randomization structures should be determined; is the design balanced or unbalanced; are the levels of factors fixed or random effects; and what are the assumptions implicitly made for the design.

Mirogrex terrae-sanctae is a commercial sardine-like fish found in the Sea of Galilee. A study was conducted to determine the effect of light and temperature on the gonadosomatic index (GSI), which is a measure of the growth of the ovary. [It is the ratio of the gonad weight to the non-gonad weight.] Two photo-periods (14 hours of light with 10 hours of dark, and 9 hours of light with 15 hours of dark) and two temperature levels (16°C and 27°C) were used. In this way, the experimenter can simulate both winter and summer conditions in the region.

Twenty females were collected in June. This group was randomly divided into four subgroups - each of size 5. Each fish was placed in an individual tank (this is extremely important for the analysis of this experiment, as you will see later) and received one of the four possible treatment combinations. At the end of 3 months, the GSI was measured.

Here are the raw data:

Temperature    Photo-period
               9 hours    14 hours
27°C           0.90       0.83
               1.06       0.67
               0.98       0.57
               1.29       0.47
               1.12       0.66
16°C           2.31       1.01
               2.88       1.52
               2.42       1.02
               2.66       1.32
               2.94       1.63

10.2.1 Design issues

There are two factors in this experiment - photo-period with 2 levels; and temperature also with 2 levels.

What is the treatment structure?

All of the 4 possible treatment combinations (which are?) appear in this study - hence it has a factorial treatment structure.


Now the purpose of this experiment was to simulate summer and winter conditions - however, two of the treatment combinations seem unnatural. Why were these treatment combinations used? How could you run this experiment if you really were interested only in the summer and winter conditions? Is any confounding taking place?

What is the experimental unit structure?
The experimental units were individual tanks and the observational units were the individual fish within a tank. There is only one observational unit per experimental unit.

There are a total of 20 fish, each of which was placed in an individual tank. At first glance, this seems wasteful - 20 tanks are needed, with five separate tanks needed for each treatment to get the same photo-period and temperature treatment combination. What is the problem if you used only 4 tanks with 5 fish in each tank? [Hint - what are then the experimental and observational units - and is this pseudo-replication?]

What is the randomization structure?
The article on which this section of the notes was based is not very clear, but the treatments appeared to be completely randomly assigned to the tanks, etc.

Balance
The design is balanced as an equal number of replicates was performed for each treatment combination.

Fixed or random factors?
Both factors are considered fixed effects. In this case, you would use exactly the same levels of both factors if the experiment were repeated and you are not interested in extrapolating to a new set of factor levels in a new experiment - therefore both of the factors are treated as fixed effects (see footnote 3).

Completely Randomized Design
Because there is a single size of experimental unit (the fish in a tank) that has been completely randomized to the treatment combinations (the combination of temperature and photo-period), this is a Two-Factor Completely Randomized Design and the analysis is fairly simple.

10.2.2 Preliminary summary statistics

Before doing any formal analyses, it is always advisable to do some preliminary plots and compute some simple summary statistics - even if these don’t tell the whole story. Here are some simple plots and summary statistics. The above data must first be converted to a data file in standard format.

The data have to be converted to columnar format with one column being the response (the GSI) and two columns representing the two factors. It is advantageous to use alphanumeric codes for factor levels as then there is no ambiguity about whether these values represent categories or a regression effect. Here is the data in proper columnar format. The order of the columns is not important, nor is the ordering of the rows.

3 Usually random effects are restricted to factors such as subjects (where you would choose a new set of subjects for a new experiment), or experimental units (you would choose a new set of fish for another experiment).


Temperature   Photo-period   GSI
27C           09h            0.90
27C           09h            1.06
27C           09h            0.98
27C           09h            1.29
27C           09h            1.12
16C           09h            2.31
16C           09h            2.88
16C           09h            2.42
16C           09h            2.66
16C           09h            2.94
27C           14h            0.83
27C           14h            0.67
27C           14h            0.57
27C           14h            0.47
27C           14h            0.66
16C           14h            1.01
16C           14h            1.52
16C           14h            1.02
16C           14h            1.32
16C           14h            1.63
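The conversion from the two-way layout to columnar format can be sketched as follows. This is a Python illustration rather than part of the SAS workflow; the dictionary layout and the header names printed on the first line are assumptions made for the example.

```python
# Two-way layout copied from the raw-data table: wide[temperature][photo-period].
wide = {
    "27C": {"09h": [0.90, 1.06, 0.98, 1.29, 1.12],
            "14h": [0.83, 0.67, 0.57, 0.47, 0.66]},
    "16C": {"09h": [2.31, 2.88, 2.42, 2.66, 2.94],
            "14h": [1.01, 1.52, 1.02, 1.32, 1.63]},
}

# One row per fish: (temperature, photo-period, GSI).
# The alphanumeric codes keep both factors unambiguously categorical.
rows = [
    (temp, photo, gsi)
    for temp, by_photo in wide.items()
    for photo, values in by_photo.items()
    for gsi in values
]

# Print in comma-separated form (header names are hypothetical).
print("temp,photo,gsi")
for temp, photo, gsi in rows:
    print(f"{temp},{photo},{gsi:.2f}")
```

Each of the 20 fish becomes one row, and neither the order of the rows nor the order of the columns matters for the analysis.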

The raw data are available in a datafile called gsi.csv in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-Ecology-Datasets. The data are imported into SAS in the usual way:

    data gsi;   /* read in the raw data */
       infile 'gsi.csv' dlm=',' dsd missover firstobs=2;
       length temp $3. photo $3.;
       input temp $ photo $ gsi;
       length trt $15.;
       trt = temp || '-' || photo;   /* create a trt variable */
       /* create some missing values for an unbalanced analysis */
       gsi2 = gsi;
       if temp = '27C' & Photo = '09h' & gsi < 1   then gsi2 = .;
       if temp = '16C' & Photo = '14h' & gsi > 1.3 then gsi2 = .;
    run;

Part of the raw data are shown below:

Obs   temp   photo   gsi    trt       gsi2
  1   16C    09h     2.31   16C-09h   2.31
  2   16C    09h     2.88   16C-09h   2.88
  3   16C    09h     2.42   16C-09h   2.42
  4   16C    09h     2.66   16C-09h   2.66
  5   16C    09h     2.94   16C-09h   2.94
  6   16C    14h     1.01   16C-14h   1.01
  7   16C    14h     1.52   16C-14h   .
  8   16C    14h     1.02   16C-14h   1.02
  9   16C    14h     1.32   16C-14h   .
 10   16C    14h     1.63   16C-14h   .

The gsi2 variable is used for analysis of the same experiment with unbalanced data.

Sometimes it is easier to create a ‘pseudo-factor’ consisting of the actual treatment levels to make simple plots and to find simple summary statistics. A new variable was created by concatenating the values of the two factor levels together.

Because this is a completely randomized design, there is no conceptual difference between a two factor design (each with 2 levels) and a single factor design with 4 levels. In more complex designs where there are multiple sizes of experimental units (blocked or strip- or split-plot designs), this is not true.
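As a cross-check on the data step, the pseudo-factor and the two response variables can be mimicked in Python. This sketch is not part of the SAS analysis; the names `gsi_rows`, `make_gsi2`, `n_gsi`, and `n_gsi2` are hypothetical, and `None` stands in for the SAS missing value. It counts the replicates per treatment for the balanced response (gsi) and for the deliberately unbalanced one (gsi2).

```python
from collections import Counter

# (temp, photo, gsi) rows from the columnar data listing.
gsi_rows = [
    ("27C", "09h", 0.90), ("27C", "09h", 1.06), ("27C", "09h", 0.98),
    ("27C", "09h", 1.29), ("27C", "09h", 1.12),
    ("16C", "09h", 2.31), ("16C", "09h", 2.88), ("16C", "09h", 2.42),
    ("16C", "09h", 2.66), ("16C", "09h", 2.94),
    ("27C", "14h", 0.83), ("27C", "14h", 0.67), ("27C", "14h", 0.57),
    ("27C", "14h", 0.47), ("27C", "14h", 0.66),
    ("16C", "14h", 1.01), ("16C", "14h", 1.52), ("16C", "14h", 1.02),
    ("16C", "14h", 1.32), ("16C", "14h", 1.63),
]


def make_gsi2(temp, photo, gsi):
    """Same two rules as the data step: blank out selected values."""
    if temp == "27C" and photo == "09h" and gsi < 1:
        return None
    if temp == "16C" and photo == "14h" and gsi > 1.3:
        return None
    return gsi


n_gsi = Counter()   # replicates per treatment, full data
n_gsi2 = Counter()  # replicates per treatment after creating missing values
for temp, photo, gsi in gsi_rows:
    trt = f"{temp}-{photo}"  # pseudo-factor, like trt = temp || '-' || photo
    n_gsi[trt] += 1
    if make_gsi2(temp, photo, gsi) is not None:
        n_gsi2[trt] += 1

for trt in sorted(n_gsi):
    print(trt, n_gsi[trt], n_gsi2[trt])
```

Every treatment has 5 replicates for gsi, whereas gsi2 is left with 5, 2, 3, and 5 replicates, which is what makes the second analysis unbalanced.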

We begin by creating side-by-side dot plots:

    proc sgplot data=gsi;
       title2 'Side-by-side dot plots';
       yaxis label='GSI' offsetmin=.05 offsetmax=.05;
       xaxis label='Treatment' offsetmin=.05 offsetmax=.05;
       scatter x=trt y=GSI / markerattrs=(symbol=circlefilled);
    run;

which gives


There is no evidence of any outliers or unusual points.

We also compute the means and standard deviations for each treatment group using the Tabulate procedure:

    proc tabulate data=gsi;   /* proc tabulate is not for the faint of heart */
       title2 'Summary table of means, std devs';
       class temp photo;
       var gsi;
       table temp*photo, gsi*(n*f=5.0 mean*f=5.2 std*f=5.2 stderr*f=5.2) / rts=15;
    run;

giving

                      gsi
temp   photo     N   Mean    Std   StdErr
16C    09h       5   2.64   0.28     0.12
       14h       5   1.30   0.28     0.13
27C    09h       5   1.07   0.15     0.07
       14h       5   0.64   0.13     0.06

Because the overall design is a CRD, the standard errors are easily computed using the raw data as $se = s/\sqrt{n}$, where the sample standard deviation, $s$, and the sample size, $n$, are found for EACH treatment.

If a blocking factor was available, the various packages could also have computed proper standard errors after adjusting for the blocking variable, e.g. by block centering. However, in more complex experimental designs, the standard errors computed using the above formula would not be correct and should not be relied upon. The simple confidence intervals are also only valid under the CRD.

Hmmm... the standard deviations suggest that the variability at 27°C is about 1/2 of that at 16°C. This is an interesting effect in its own right - however, the change in standard deviation is small enough that it shouldn’t be too much of a concern for this problem. [As a rough rule of thumb, unless the ratio of standard deviations from small samples is on the order of at least 3 to 5 times different, there is likely nothing to worry about.]
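The summary table and the rule of thumb can be checked with a short computation. The sketch below is a Python illustration using only the standard library, not the SAS code of these notes; the names `cells`, `summary`, and `sd_ratio` are hypothetical. It recomputes n, mean, standard deviation, and standard error (s divided by the square root of n) for each treatment, then the ratio of the largest to the smallest standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

# GSI values for each (temperature, photo-period) treatment combination.
cells = {
    ("16C", "09h"): [2.31, 2.88, 2.42, 2.66, 2.94],
    ("16C", "14h"): [1.01, 1.52, 1.02, 1.32, 1.63],
    ("27C", "09h"): [0.90, 1.06, 0.98, 1.29, 1.12],
    ("27C", "14h"): [0.83, 0.67, 0.57, 0.47, 0.66],
}

summary = {}
for cell, values in cells.items():
    n = len(values)
    s = stdev(values)  # sample standard deviation (divisor n - 1)
    summary[cell] = {"n": n, "mean": mean(values), "sd": s, "se": s / sqrt(n)}

for (temp, photo), row in summary.items():
    print(f"{temp} {photo} n={row['n']} mean={row['mean']:.2f} "
          f"sd={row['sd']:.2f} se={row['se']:.2f}")

# Rule of thumb: compare the largest and smallest standard deviations.
sds = [row["sd"] for row in summary.values()]
sd_ratio = max(sds) / min(sds)
print(f"largest sd / smallest sd = {sd_ratio:.2f}")
```

The ratio comes out at a little over 2, well below the 3 to 5 range at which unequal variances would start to be a concern.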

It is quite common to plot the log(sd) vs. log(mean) to see if there is a consistent relationship between the standard deviation and the mean. If there is a consistent relationship, Taylor’s Power Law often provides a suitable transformation to make the standard deviation roughly equal in all treatment groups. This plot (not shown) does not suggest any simple transformation is needed.

The design is balanced - every treatment has the same number of replications - which makes the analysis easier. Also, every treatment combination has some data - missing cells, where some cells have no data, are a REALLY MESSY PROBLEM. Even most statisticians have difficulty in analyzing such experiments - beware!

You should also draw a preliminary profile plot to get a sense of the level of interaction, if any. We can find the sample means and confidence limits for each population mean quite simply because this is a CRD.


    proc sort data=gsi;
       by temp photo;
    run;

    proc means data=gsi noprint;   /* find simple summary statistics */
       by temp photo;
       var gsi;
       output out=means n=n mean=mean std=stddev stderr=stderr lclm=lclm uclm=uclm;
    run;

    proc sgplot data=means;
       title2 'Profile plot with 95% ci on each mean';
       series  y=mean x=temp / group=photo;
       highlow x=temp high=uclm low=lclm / group=photo;
       yaxis label='GSI' offsetmin=.05 offsetmax=.05;
       xaxis label='Temperature' offsetmin=.05 offsetmax=.05;
    run;

and then get the profile plot:

It appears that there may be a bit of interaction between the two factors - the lines are not parallel. Interaction is fairly easy to assess when approximate 95% confidence intervals are drawn for each mean.

Looking at the profile plot above, what is the effect of photo-period at 16°C? at 27°C? What is the effect of temperature at 9 h? at 14 h?
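One way to put numbers on these questions is to compute the simple effects directly from the treatment means. The Python sketch below is an illustration only (the names `photo_effect`, `temp_effect`, and `interaction_contrast` are hypothetical); it takes each effect as a difference of two cell means and then the difference of those differences, which would be zero if the profile lines were exactly parallel.

```python
from statistics import mean

# GSI values for each (temperature, photo-period) treatment combination.
cells = {
    ("16C", "09h"): [2.31, 2.88, 2.42, 2.66, 2.94],
    ("16C", "14h"): [1.01, 1.52, 1.02, 1.32, 1.63],
    ("27C", "09h"): [0.90, 1.06, 0.98, 1.29, 1.12],
    ("27C", "14h"): [0.83, 0.67, 0.57, 0.47, 0.66],
}
cell_mean = {cell: mean(values) for cell, values in cells.items()}

# Effect of photo-period (14h minus 09h) at each temperature.
photo_effect = {
    temp: cell_mean[(temp, "14h")] - cell_mean[(temp, "09h")]
    for temp in ("16C", "27C")
}

# Effect of temperature (27C minus 16C) at each photo-period.
temp_effect = {
    photo: cell_mean[("27C", photo)] - cell_mean[("16C", photo)]
    for photo in ("09h", "14h")
}

# Difference of differences: zero when the two profile lines are parallel.
interaction_contrast = photo_effect["27C"] - photo_effect["16C"]

print("photo-period effect by temperature:", photo_effect)
print("temperature effect by photo-period:", temp_effect)
print("interaction contrast:", interaction_contrast)
```

Moving from 9 h to 14 h of light lowers the mean GSI by about 1.34 at 16°C but only by about 0.43 at 27°C; warming from 16°C to 27°C lowers it by about 1.57 at 9 h and about 0.66 at 14 h. The difference of differences (about 0.91) is the numerical counterpart of the non-parallel lines.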

10.2.3 The statistical model

The statistical model for any design has terms corresponding to the treatment, experimental unit, and randomization structure. Fortunately, in simple designs, the latter two are often implicit and do NOT have to be specified by the analyst.


Any factorial treatment structure will have terms corresponding to interactions and main effects.

In cases where there is only one size of experimental unit, no subsampling or pseudo-replication, and no blocking, it is not necessary to specify any term for experimental unit effects (this corresponds to the MSE line in the ANOVA table).

In cases of complete randomization, there is no need to specify anything further in the model. In cases of non-randomization (e.g. repeated measurements over time), you might specify the covariance structure of the observations.

This gives a model often written as:

GSI = temp photo temp*photo

What the statistical model says is that we recognize that the observed GSI response values (left of the equals sign) are not all the same. What are the various sources of variation in the observed responses? These appear to the right of the equals sign. Well, we expect some differences due to the main effects of temperature, some differences due to the main effects of photo-period, and some differences possibly caused by an interaction between photo-period and temperature. Note that the ‘*’ does NOT imply multiplication, but rather an interaction between two factors. The terms can be written in any order.

There are NO terms representing experimental units (this implies there is a single size of experimental unit), nor any terms representing randomization effects (complete randomization is assumed).
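For readers who prefer a mathematical form, the same model is often written as an effects model. The symbols below are assumptions introduced here for illustration and are not notation used elsewhere in these notes: $i$ indexes temperature, $j$ indexes photo-period, and $k$ indexes the fish within a treatment combination.

```latex
\[
Y_{ijk} = \mu + \tau_i + \pi_j + (\tau\pi)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2)
\]
```

Here $\tau_i$ and $\pi_j$ play the role of the temp and photo terms, $(\tau\pi)_{ij}$ is the temp*photo interaction, and the single error term $\varepsilon_{ijk}$ reflects the single size of experimental unit and the complete randomization.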

10.2.4 Fitting the model

The model can be fit using least squares. More complex designs may require more complex fitting procedures.

There are two main procedures in SAS for fitting ANOVA and regression models (collectively called linear models). First is Proc GLM, which performs the traditional sums-of-squares decomposition. Second is Proc Mixed, which uses restricted maximum likelihood (REML) to fit models. In models with only fixed effects (e.g. those in this chapter), this gives the same results as sums-of-squares decompositions. In unbalanced data with additional random effects, the results from Proc Mixed may differ from those of Proc GLM - both are correct, and are just different ways to deal with the approximations necessary for unbalanced data. Proc Mixed also has the advantage of being able to fit more complex models with more than one size of experimental unit. There is no clear advantage to using one procedure or the other - it comes down to personal preference. Both procedures will be demonstrated in this chapter - I personally prefer to use Proc Mixed.

Here is the code for Proc GLM:

    ods graphics on;
    proc glm data=gsi plots=all;
       title2 'Anova';
       class photo temp;   /* class statement identifies factors */
       model gsi = photo temp photo*temp;
       /* because interaction is significant, only get trt means and do multiple comp */
       lsmeans photo*temp / cl stderr pdiff adjust=tukey lines slice=photo slice=temp;
       lsmeans photo      / cl stderr pdiff adjust=tukey;
       lsmeans temp       / cl stderr pdiff adjust=tukey;
    run;
    ods graphics off;

Here is the code for Proc Mixed:

    ods graphics on;
    proc mixed data=gsi plots=all;
       title2 'ANOVA using Mixed';
       class photo temp;
       model gsi = photo temp photo*temp / ddfm=kr;
       lsmeans photo*temp / adjust=tukey diff cl slice=photo slice=temp;
       lsmeans photo      / adjust=tukey diff cl;
       lsmeans temp       / adjust=tukey diff cl;
       ods output tests3  = MixedTest;   /* needed for the pdmix800 */
       ods output lsmeans = MixedLsmeans;
       ods output diffs   = MixedDiffs;
       ods output slices  = MixedSlices;
    run;
    ods graphics off;

    /* Get a joined lines plot */
    %include '../pdmix800.sas';
    %pdmix800(MixedDiffs,MixedLsmeans,alpha=0.05,sort=yes);

In both procedures, the Class statement specifies the categorical factors for the model. Notice how you specify the statistical model in the Model statement - it is very similar to the statistical model seen earlier. The extra code at the end of Proc Mixed is to generate the joined-line plots that we’ve seen earlier based on the output from the lsmeans statements (see below).

10.2.5 Hypothesis testing and estimation

The output from the fitting procedure is often divided into sections corresponding to the whole model and then sections corresponding to each individual term in the model.

The first output below is not very useful - it is a Whole Model test which simply examines if there is evidence of an effect for any term in the model. It is rarely useful.

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      11.19194000    3.73064667     76.07   <.0001
Error             16       0.78468000    0.04904250
Corrected Total   19      11.97662000

No such table is produced by Proc Mixed.

The Effects Table breaks down the Model line in the whole model test into the components for every term in the model. Some packages give you a choice of effect tests. For example, SAS will print out Type I, II, and III tests - in balanced data these will always be the same and any can be used. In unbalanced data, these tests will have different results - which test is the ‘correct’ test is still an item of controversy among statisticians and can (and does) result in fist-fights among the various camps (and you thought statistics was dull!).

Here is the table for the Effect Tests computed by Proc GLM:

Source       DF   Type III SS   Mean Square   F Value   Pr > F
photo         1    3.92498000    3.92498000     80.03   <.0001
temp          1    6.22728000    6.22728000    126.98   <.0001
photo*temp    1    1.03968000    1.03968000     21.20   0.0003

Here is the table for the Effect Tests computed by Proc Mixed. Because the model only consists of fixed effects, the results are identical to those from Proc GLM:

Type 3 Tests of Fixed Effects

Effect       Num DF   Den DF   F Value   Pr > F
photo             1       16     80.03   <.0001
temp              1       16    126.98   <.0001
photo*temp        1       16     21.20   0.0003
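Because the design is balanced, the sums of squares and F-statistics in the tables above can be reproduced by hand. The Python sketch below is an independent check and not how SAS computes them internally; all variable names are hypothetical. The interaction sum of squares is obtained by subtraction, a shortcut that is only valid for balanced data.

```python
from statistics import mean

# GSI values for each (temperature, photo-period) treatment combination.
cells = {
    ("16C", "09h"): [2.31, 2.88, 2.42, 2.66, 2.94],
    ("16C", "14h"): [1.01, 1.52, 1.02, 1.32, 1.63],
    ("27C", "09h"): [0.90, 1.06, 0.98, 1.29, 1.12],
    ("27C", "14h"): [0.83, 0.67, 0.57, 0.47, 0.66],
}

all_values = [y for values in cells.values() for y in values]
grand_mean = mean(all_values)


def main_effect_ss(index):
    """Sum of squares for the factor in position `index` of the cell key."""
    ss = 0.0
    for level in sorted({cell[index] for cell in cells}):
        ys = [y for cell, values in cells.items() if cell[index] == level for y in values]
        ss += len(ys) * (mean(ys) - grand_mean) ** 2
    return ss


ss_temp = main_effect_ss(0)
ss_photo = main_effect_ss(1)
# Variation among the four cell means (the 'Model' line, 3 df).
ss_model = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in cells.values())
# Balanced data only: the interaction is what is left after both main effects.
ss_inter = ss_model - ss_temp - ss_photo
# Variation of the fish about their own cell mean (the 'Error' line).
ss_error = sum((y - mean(v)) ** 2 for v in cells.values() for y in v)

df_error = len(all_values) - len(cells)  # 20 - 4 = 16
mse = ss_error / df_error

# Each effect has 1 df here, so its mean square equals its sum of squares.
f_photo = ss_photo / mse
f_temp = ss_temp / mse
f_inter = ss_inter / mse

print(f"photo      SS={ss_photo:.5f}  F={f_photo:.2f}")
print(f"temp       SS={ss_temp:.5f}  F={f_temp:.2f}")
print(f"photo*temp SS={ss_inter:.5f}  F={f_inter:.2f}")
print(f"error      SS={ss_error:.5f}  MSE={mse:.7f}")
```

The p-values are not reproduced here because the Python standard library has no F distribution; they would need a statistics package.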

Start the hypothesis testing with the most complicated effects (usually interactions) and work towards simpler terms (main effects).

In this case, we start with the test for no interaction effects.

Our hypotheses are:
H: no interaction between photo-period and temperature in their effects on the mean GSI level
A: some interaction between photo-period and temperature in their effects on the mean GSI level.

Our test statistic is F=21.2 and the p-value (0.0003) is very small. There is very strong evidence of an interaction between the two factors in their effects on the mean GSI level. This is not surprising: the profile plot showed that the lines did not appear to be parallel.

What does a statistically significant interaction mean? It implies that the effect of temperature upon the mean GSI is different at the various photo-period levels. Similarly, the effect of photo-period upon the mean GSI is different at the two temperature levels.

If you detect an interaction, it usually doesn’t make much sense to continue along to test main effects because, by definition, these are not consistent - e.g. the effect of temperature is different at the two photo-periods.

What to do if an interaction is present?

There is no single way to proceed after this point. There are two common approaches. First is to use a multiple comparison procedure on the combination of factor levels to examine which means of the individual treatments may differ from each other. Because there was strong evidence of an interaction (non-parallelism), you should not examine the main effects unless the non-parallelism is small. Rather, you should now find estimates of the population means for each combination of the factor levels (along with the standard errors), and then do a multiple comparison procedure (e.g. Tukey’s) to examine which pairs of means could differ. SAS also provides a mechanism to examine slices of the data, e.g. for each factor level separately - please contact me for more details on this.

Here are estimates of the marginal means (and standard errors), with all of the pairwise differences compared using Tukey’s procedure. These are requested using the lsmeans statement in the GLM procedure. The lsmeans estimates are equal to the raw sample means - this will be true ONLY in balanced data. In the case of unbalanced data (see later), the lsmeans seem like a sensible way to estimate marginal means.

photo temp gsi LSMEAN Standard Error Pr > |t| LSMEAN Number

09h 16C 2.64200000 0.09903787 <.0001 1

09h 27C 1.07000000 0.09903787 <.0001 2

14h 16C 1.30000000 0.09903787 <.0001 3

14h 27C 0.64000000 0.09903787 <.0001 4

photo temp gsi LSMEAN 95% Confidence Limits

09h 16C 2.642000 2.432049 2.851951

09h 27C 1.070000 0.860049 1.279951

14h 16C 1.300000 1.090049 1.509951

14h 27C 0.640000 0.430049 0.849951

Least Squares Means for effect photo*temp
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: gsi

i/j        1         2         3         4
1                 <.0001    <.0001    <.0001
2     <.0001                0.3845    0.0333
3     <.0001      0.3845              0.0012
4     <.0001      0.0333    0.0012

Least Squares Means for Effect photo*temp

i j Difference Between Means Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)

1 2 1.572000 1.171287 1.972713

1 3 1.342000 0.941287 1.742713

1 4 2.002000 1.601287 2.402713

2 3 -0.230000 -0.630713 0.170713

2 4 0.430000 0.029287 0.830713

3 4 0.660000 0.259287 1.060713

Proc GLM also provides a difference plot.


In this plot, follow the light grey lines from a pair of treatments to where they intersect on either a solid blue line or a dashed red line. The confidence interval runs out from this point. If the confidence interval for the difference in the means does NOT intersect the dashed grey line (X = Y), then there is evidence that this difference in the means is not equal to zero (the solid blue lines). If the confidence interval for the difference in the means does intersect the dashed grey line (the dashed red lines), then there is no evidence of a difference in the means.

Finally, Proc GLM can also produce the joined-line plot to make the interpretation easier as seen in the previous chapters.

Tukey Comparison Lines for Least Squares Means of photo*temp

LS-means with the same letter are not significantly different.

gsi LSMEAN photo temp LSMEAN Number

A 2.642 09h 16C 1

B 1.300 14h 16C 3

B

B 1.070 09h 27C 2

C 0.640 14h 27C 4

The output from Proc Mixed is not quite as extensive but, in my opinion, is organized in a much more logical fashion than in Proc GLM. First here are the estimates of the marginal means:

Least Squares Means

Effect photo temp Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper

photo*temp 09h 16C 2.6420 0.09904 16 26.68 <.0001 0.05 2.4320 2.8520

photo*temp 09h 27C 1.0700 0.09904 16 10.80 <.0001 0.05 0.8600 1.2800

photo*temp 14h 16C 1.3000 0.09904 16 13.13 <.0001 0.05 1.0900 1.5100

photo*temp 14h 27C 0.6400 0.09904 16 6.46 <.0001 0.05 0.4300 0.8500

photo 09h 1.8560 0.07003 16 26.50 <.0001 0.05 1.7075 2.0045

photo 14h 0.9700 0.07003 16 13.85 <.0001 0.05 0.8215 1.1185

temp 16C 1.9710 0.07003 16 28.14 <.0001 0.05 1.8225 2.1195

temp 27C 0.8550 0.07003 16 12.21 <.0001 0.05 0.7065 1.0035

and then the pairwise differences:

photo  temp  _photo  _temp  Estimate  Standard Error  Adjustment  AdjP     AdjLow    AdjUpp
09h    16C   09h     27C      1.5720  0.1401          Tukey       <.0001    1.1713   1.9727
09h    16C   14h     16C      1.3420  0.1401          Tukey       <.0001    0.9413   1.7427
09h    16C   14h     27C      2.0020  0.1401          Tukey       <.0001    1.6013   2.4027
09h    27C   14h     16C     -0.2300  0.1401          Tukey       0.3845   -0.6307   0.1707
09h    27C   14h     27C      0.4300  0.1401          Tukey       0.0333    0.02929  0.8307
14h    16C   14h     27C      0.6600  0.1401          Tukey       0.0012    0.2593   1.0607
09h          14h              0.8860  0.09904         Tukey       <.0001    0.6761   1.0959
       16C           27C      1.1160  0.09904         Tukey       <.0001    0.9061   1.3259

The pdmix800 macro is used to produce the joined-line plots:

Effect=photo*temp Method=Tukey(P<0.05) Set=1

Obs photo temp Estimate Standard Error Alpha Lower Upper Letter Group

1 09h 16C 2.6420 0.09904 0.05 2.4320 2.8520 A

2 14h 16C 1.3000 0.09904 0.05 1.0900 1.5100 B

3 09h 27C 1.0700 0.09904 0.05 0.8600 1.2800 B

4 14h 27C 0.6400 0.09904 0.05 0.4300 0.8500 C

The above output can be used to see which treatment means appear to differ from each other.

This can also be accomplished by redoing the analysis using the pseudo-factors that you defined earlier. This converts the experiment from a two-factor CRD to a single factor CRD with 4 levels. This approach is only valid because the original design is CRD.

Below is the result of such an analysis (using JMP only, but the other packages give similar results).


The conclusions are the same as with the approach above.

The ANOVA table is identical to the Overall Model ANOVA table earlier. Here are the estimated means for each treatment and the estimated standard errors - identical to the above analysis.

We perform a Tukey multiple comparison procedure which gives the same results as found previously.

What can you conclude from these analyses? In particular, can you see why interaction was detected? Which means appear to be different from the others?


Secondly, some authors suggest that you now break up the data into two mini-experiments and analyze each separately. For example, analyze each photo-period separately and analyze each temperature level separately to estimate the effects at each of the various levels. As these mini-experiments are now simply single-factor CRDs (in this case, a two-sample t-test), all the machinery that we had before can be brought to bear. The disadvantage of this approach is that you forgo pooling of the error variances from all four groups.

It is better to look at slices of the data. This simply does a comparison of the means between the photo-periods for each temperature, or a comparison of the means between the temperatures for each photo-period as above, but does this after the combined analysis. The advantage of using a slice analysis is that the standard deviation (from the residuals) is computed once from ALL of the data, rather than for each slice. This gives a much better estimate of the (common) standard deviation used for each slice.

The slice option on the lsmeans statement gives the appropriate slice test of temperature for each photo-period:

Effect      photo  temp  Num DF  Den DF  F Value  Pr > F
photo*temp  09h               1      16   125.97  <.0001
photo*temp  14h               1      16    22.21  0.0002

In both cases, there is evidence that the mean GSI differs between the two temperature levels, but the estimated difference in the means is quite different at the two photo-periods (as expected because that is the definition of an interaction!).
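The slice tests can be checked by hand from the cell means and the pooled error mean square. The Python sketch below is only an illustration of the arithmetic, not the SAS computation itself; the group size of 5 is inferred from the degrees of freedom (20 observations in 4 balanced groups), and all names are made up for this example. It also shows that the interaction test is a test of the difference between the two slice effects.

```python
# Cell means (LSMEANS) and the Error line from the output above.
cell_mean = {
    ("09h", "16C"): 2.642, ("09h", "27C"): 1.070,
    ("14h", "16C"): 1.300, ("14h", "27C"): 0.640,
}
mse = 0.78468 / 16   # pooled over ALL four groups (16 df)
n_per_cell = 5       # inferred: 20 observations / 4 balanced groups

# Variance of a difference between two cell means.
var_diff = 2 * mse / n_per_cell

# Temperature effect within each photo-period (the "slices").
temp_effect = {}
slice_f = {}
for photo in ("09h", "14h"):
    temp_effect[photo] = cell_mean[(photo, "16C")] - cell_mean[(photo, "27C")]
    slice_f[photo] = temp_effect[photo] ** 2 / var_diff
    print(f"{photo}: effect = {temp_effect[photo]:.3f}, F = {slice_f[photo]:.2f}")

# Interaction = difference between the two temperature effects.
# It involves four cell means, hence the variance 4 * mse / n.
interaction = temp_effect["09h"] - temp_effect["14h"]
interaction_f = interaction ** 2 / (4 * mse / n_per_cell)
print(f"interaction = {interaction:.3f}, F = {interaction_f:.2f}")
```

Both slice tests use the same pooled denominator with 16 df, which is the advantage over analyzing each photo-period as a separate mini-experiment.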

The slice analysis in the other direction is done in a similar fashion and not shown here - consult the package code and output for details.

10.2.6 Model assessment

Don’t forget to examine the residual and other diagnostic plots. Here is the diagnostic plot from Proc GLM:


Here is the diagnostic plot from Proc Mixed:

There is no evidence of any gross problems in the analysis. There is some evidence that the residuals are not equally variable (the top left plot, and as was suggested by the different standard deviations between the levels of temperature seen earlier), but the difference in standard deviations is not large.


10.2.7 Unbalanced data

If the design is unbalanced, the analysis proceeds basically the same as before except some care must be taken with some statistical packages (e.g. R) that do not provide the appropriate ANOVA table by default. This is covered in detail later in this chapter. An analysis where some observations were deleted is available in the computer output on the web.

10.3 Example - Effect of sex and species upon chemical uptake - CRD

Several persistent chemicals accumulate up the food chain. Different species may differ in the amount of chemicals accumulated because of different prey availability or other factors. Because of different behavior, the accumulated amount may also vary by sex.

A survey was conducted to investigate how the amount of PCBs varied among three different species of fish in Nunavut (the new Canadian territory just to the east of the Restofit and just north of Ulofit). Samples were taken from four fish of each sex and species and liver PCB levels (ppm) were measured.

Here are the raw data:


PCB sex Species

21.5 m sp1

19.6 m sp1

20.9 m sp1

22.8 m sp1

14.5 m sp2

17.4 m sp2

15.0 m sp2

17.8 m sp2

16.0 m sp3

20.3 m sp3

18.5 m sp3

19.3 m sp3

14.8 f sp1

15.6 f sp1

13.5 f sp1

16.4 f sp1

12.1 f sp2

11.4 f sp2

12.7 f sp2

14.5 f sp2

14.4 f sp3

14.7 f sp3

13.8 f sp3

12.0 f sp3

The raw data are available in a datafile called pcb.csv at the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-Ecology-Datasets. The data are imported into SAS in the usual way:

data pcb; /* read in the raw data */
   infile 'pcb.csv' dlm=',' dsd missover firstobs=2;
   length sex $10. species $10. trt $15.;
   input pcb sex $ species $;
   trt = compbl(sex || "-" || species); /* create a pseudo factor */
run;

The raw data are shown below:

Obs sex species trt pcb

1 f sp1 f -sp1 14.8

2 f sp1 f -sp1 15.6

3 f sp1 f -sp1 13.5

4 f sp1 f -sp1 16.4


5 f sp2 f -sp2 12.1

6 f sp2 f -sp2 11.4

7 f sp2 f -sp2 12.7

8 f sp2 f -sp2 14.5

9 f sp3 f -sp3 13.8

10 f sp3 f -sp3 12.0
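The treatment labels in the listing read "f -sp1" with a blank before the hyphen. A plausible explanation is that sex is a fixed-length character variable (length 10), so it is padded with trailing blanks before the concatenation, and compbl reduces a run of blanks to a single blank rather than removing it. The Python sketch below only mimics that behaviour to show where the blank comes from; the helper functions pad and compbl are stand-ins written for this illustration, not SAS code.

```python
import re

def pad(value, width):
    """Mimic a fixed-length SAS character variable (blank padded)."""
    return value.ljust(width)

def compbl(text):
    """Mimic SAS compbl(): collapse runs of blanks into one blank."""
    return re.sub(r" {2,}", " ", text)

sex, species = pad("f", 10), pad("sp1", 10)

# Concatenate the padded values, as in the data step above.
as_in_listing = compbl(sex + "-" + species).rstrip()

# Remove the padding before concatenating.
stripped_first = sex.strip() + "-" + species.strip()

print(repr(as_in_listing))    # 'f -sp1'
print(repr(stripped_first))   # 'f-sp1'
```

The extra blank is harmless for the analysis because the label is only used to identify the six groups. If a cleaner label is wanted in SAS, removing the trailing blanks before concatenating (for example with the strip or catx functions) should give "f-sp1".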

10.3.1 Design issues

There are two factors in this experiment - sex with 2 levels and species with 3 levels.

What is the treatment structure? All of the 6 possible treatment combinations (which are?) appear in this study - hence it has a factorial treatment structure.

What is the experimental unit structure? Hmmm . . . an interesting question. In observational studies it is often not clear what are the experimental and observational units. For example, is this like a ‘fish tank’ study where all the fish in a particular location are subjected to the same treatments (i.e., deposited PCBs)? Or is each fish subjected to its own experience?

This is a very common problem in observational studies and you should be very careful about the dangers of pseudo-replication that we explored earlier.

For now, let’s treat the experimental units as individual fish and the observational units as the individual fish. There is only one observational unit per experimental unit.

Finally, what is the randomization structure? Again, this is often not clear in observational studies. First, it is quite impossible to randomly assign sex or species to fish. You must view the randomization as arising from the selection process. Are these fish randomly selected from the entire population of fish of each species and sex? Or is the sample a convenience sample - i.e., the fish closest to the research station that are easiest to catch?

In any observational study, you must be careful that the units measured are a proper random sample from the relevant populations.

Are the factors to be considered fixed effects? Does it seem reasonable that if you were to repeat the survey, you would select the same sexes and species? In this case, you would use exactly the same levels of both factors - they are fixed effects.

Hence, this experiment appears to satisfy the requirements for a two-factor fixed-effects CRD. In particular, the “randomization” was to individual experimental units and the observational unit is the same as the experimental unit.

10.3.2 Preliminary summary statistics

Again, create some simple summary statistics. We will create a ‘pseudo-factor’ consisting of the combination of the factor levels from the two factors so that summary statistics can be computed on each group. A new variable was created by concatenating the values of the two factor levels together.

We begin by creating side-by-side dot plots:

proc sgplot data=pcb;
   title2 'Side-by-side dot plots';
   yaxis label='PCB' offsetmin=.05 offsetmax=.05;
   xaxis label='Treatment' offsetmin=.05 offsetmax=.05;
   scatter x=trt y=PCB / markerattrs=(symbol=circlefilled);
run;

which gives

We also compute the means and standard deviations for each treatment group:

proc tabulate data=pcb; /* proc tabulate is not for the faint of heart */
   title2 'Summary table of means, std devs';
   class sex species;
   var pcb;
   table sex*species, pcb*(n*f=5.0 mean*f=5.2 std*f=5.2 stderr*f=5.2) / rts=15;
run;

giving

                     pcb
sex  species     N    Mean     Std  StdErr
f    sp1         4   15.08    1.24    0.62
f    sp2         4   12.68    1.33    0.66
f    sp3         4   13.73    1.21    0.60
m    sp1         4   21.20    1.33    0.66
m    sp2         4   16.18    1.67    0.83
m    sp3         4   18.53    1.84    0.92
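The entries in this table can be reproduced directly from the raw data listed earlier. The following Python sketch is an illustrative cross-check (not the SAS computation; the dictionary layout is just one convenient way to hold the data) that computes n, the mean, the standard deviation, and the standard error for each sex-species group.

```python
from math import sqrt
from statistics import mean, stdev

# Raw liver PCB levels (ppm) from the data listing above.
pcb = {
    ("f", "sp1"): [14.8, 15.6, 13.5, 16.4],
    ("f", "sp2"): [12.1, 11.4, 12.7, 14.5],
    ("f", "sp3"): [14.4, 14.7, 13.8, 12.0],
    ("m", "sp1"): [21.5, 19.6, 20.9, 22.8],
    ("m", "sp2"): [14.5, 17.4, 15.0, 17.8],
    ("m", "sp3"): [16.0, 20.3, 18.5, 19.3],
}

summary = {}
for (sex, species), values in pcb.items():
    n = len(values)
    sd = stdev(values)             # sample standard deviation (n - 1 divisor)
    se = sd / sqrt(n)              # standard error of the group mean
    summary[(sex, species)] = (n, mean(values), sd, se)
    print(f"{sex} {species}  n={n}  mean={mean(values):.3f}  "
          f"std={sd:.2f}  stderr={se:.2f}")
```

Because this is a CRD, each group's standard error depends only on that group's own standard deviation and sample size.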

The standard deviations are approximately equal in all the groups. There don’t appear to be any outliers or unusual points. In general the males seem to have higher levels of PCBs than the females, but there doesn’t seem to be much of a difference among the mean PCB levels in the species.

The design is balanced - every treatment has the same number of replications - this makes life easier. Every treatment combination has some data - again it makes our analysis task easier.

We draw the profile plots. This can be done by hand or using Excel or your package of choice. It is good practice to add approximate 95% confidence intervals to the values so that the parallelism can be more readily judged. We can find the sample means and confidence limits for each population mean quite simply because this is a CRD.

proc sort data=pcb;
   by sex species;
run;

proc means data=pcb noprint; /* find simple summary statistics */
   by sex species;
   var pcb;
   output out=means n=n mean=mean std=stddev stderr=stderr lclm=lclm uclm=uclm;
run;

proc sgplot data=means;
   title2 'Profile plot with 95% ci on each mean';
   series y=mean x=species / group=sex;
   highlow x=species high=uclm low=lclm / group=sex;
   yaxis label='pcb' offsetmin=.05 offsetmax=.05;
   xaxis label='Species' offsetmin=.05 offsetmax=.05;
run;

and then get the profile plot:


The lines appear to be roughly parallel - so we expect that there may not be an interaction between the two factors.

Looking at the profile plot - what is the effect of sex? What is the effect of species?

10.3.3 The statistical model

The statistical model is written as:

PCB = sex species sex ∗ species

What does this statistical model tell us about the sources of variation in the observed data?

10.3.4 Fitting the model

There are two main procedures in SAS for fitting ANOVA and regression models (collectively called linear models). First is Proc GLM which performs the traditional sums-of-squares decomposition. Second is Proc Mixed which uses restricted maximum likelihood (REML) to fit models. In models with only fixed effects (e.g. those in this chapter), this gives the same results as sums-of-squares decompositions. In unbalanced data with additional random effects, the results from Proc Mixed may differ from those of Proc GLM - both are correct, and are just different ways to deal with the approximations necessary for unbalanced data. Proc Mixed also has the advantage of being able to fit more complex models with more than one size of experimental unit. For the models in this chapter, there is no clear advantage to using one procedure or the other - it comes down to personal preference. Both procedures will be demonstrated in this chapter - I personally prefer to use Proc Mixed.

Here is the code for Proc GLM:

ods graphics on;
proc glm data=pcb plots=all;
   title2 'Anova - balanced';
   class sex species; /* class statement identifies factors */
   model pcb = sex species sex*species;
   lsmeans sex*species / cl stderr pdiff adjust=tukey lines;
   lsmeans sex / cl stderr pdiff adjust=tukey lines;
   lsmeans species / cl stderr pdiff adjust=tukey lines;
run;
ods graphics off;

Here is the code for Proc Mixed:

ods graphics on;
proc mixed data=pcb plots=all;
   title2 'Mixed - balanced';
   class sex species; /* class statement identifies factors */
   model pcb = sex species sex*species / ddfm=kr;
   lsmeans sex*species / cl adjust=tukey;
   lsmeans sex / cl adjust=tukey;
   lsmeans species / cl adjust=tukey;
   ods output tests3 =MixedTest; /* needed for the pdmix800 */
   ods output lsmeans=MixedLsmeans;
   ods output diffs =MixedDiffs;
run;
ods graphics off;

/* Get a joined lines plot */
%include '../pdmix800.sas';
%pdmix800(MixedDiffs,MixedLsmeans,alpha=0.05,sort=yes);

In both procedures, the Class statement specifies the categorical factors for the model. Notice how you specify the statistical model in the Model statement - it is very similar to the statistical model seen earlier. The extra code at the end of Proc Mixed is to generate the joined-line plots that we’ve seen earlier based on the output from the lsmeans statements (see below).

First is a Whole Model test which simply examines if there is evidence of an effect for any term in the model. It is rarely useful. Here is the table for the overall ANOVA computed by GLM:

Source DF Sum of Squares Mean Square F Value Pr > F

Model 5 200.8720833 40.1744167 19.02 <.0001

Error 18 38.0175000 2.1120833

Corrected Total 23 238.8895833

No such table is produced by Proc Mixed.

Of more interest are the individual Effect Tests.

Here is the table for the Effect Tests computed by Proc GLM:

Source DF Type III SS Mean Square F Value Pr > F

sex 1 138.7204167 138.7204167 65.68 <.0001

species 2 55.2608333 27.6304167 13.08 0.0003

sex*species 2 6.8908333 3.4454167 1.63 0.2233

Here is the table for the Effect Tests computed by Proc Mixed. Because the model only consists of fixed effects, the results are identical to those from Proc GLM:

Type 3 Tests of Fixed Effects

Effect Num DF Den DF F Value Pr > F

sex 1 18 65.68 <.0001

species 2 18 13.08 0.0003

sex*species 2 18 1.63 0.2233

The above table breaks down the Model line in the whole model test into the components for every term in the model. Some packages give you a choice of effect tests. For example, SAS will print out Type I, II, and III tests - in balanced data these will always be the same and any can be used. In unbalanced data, these three types of tests will have different results - ALWAYS use the Type III (also known as the marginal) tests. As noted earlier, if you are using R, the default method gives what are known as Type I (incremental) tests, which can be misleading (i.e. testing the wrong hypothesis) in cases of unbalanced data.
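Because the design is balanced, the sums of squares in the Effect Tests can be computed by hand from the cell, marginal, and grand means. The Python sketch below does this from the raw data as an illustrative cross-check of the SAS output. It is a sketch under the assumption of a balanced two-factor CRD (these simple formulas are NOT valid for unbalanced data), and the names used are made up for this example.

```python
from statistics import mean

# Raw liver PCB levels (ppm) from the data listing above.
pcb = {
    ("f", "sp1"): [14.8, 15.6, 13.5, 16.4],
    ("f", "sp2"): [12.1, 11.4, 12.7, 14.5],
    ("f", "sp3"): [14.4, 14.7, 13.8, 12.0],
    ("m", "sp1"): [21.5, 19.6, 20.9, 22.8],
    ("m", "sp2"): [14.5, 17.4, 15.0, 17.8],
    ("m", "sp3"): [16.0, 20.3, 18.5, 19.3],
}
sexes = ["f", "m"]
species_levels = ["sp1", "sp2", "sp3"]
a, b, n = len(sexes), len(species_levels), 4   # n = fish per cell

# Cell, marginal, and grand means (equal weights: balanced design).
cell = {key: mean(values) for key, values in pcb.items()}
sex_mean = {s: mean(cell[(s, sp)] for sp in species_levels) for s in sexes}
sp_mean = {sp: mean(cell[(s, sp)] for s in sexes) for sp in species_levels}
grand = mean(cell.values())

# Balanced-data sums of squares.
ss = {
    "sex": n * b * sum((m - grand) ** 2 for m in sex_mean.values()),
    "species": n * a * sum((m - grand) ** 2 for m in sp_mean.values()),
    "sex*species": n * sum(
        (cell[(s, sp)] - sex_mean[s] - sp_mean[sp] + grand) ** 2
        for s in sexes for sp in species_levels
    ),
}
df = {"sex": a - 1, "species": b - 1, "sex*species": (a - 1) * (b - 1)}

ss_error = sum((y - cell[key]) ** 2 for key, ys in pcb.items() for y in ys)
df_error = a * b * (n - 1)
mse = ss_error / df_error

f_values = {term: (ss[term] / df[term]) / mse for term in ss}
for term in ss:
    print(f"{term:12s} df={df[term]}  SS={ss[term]:.4f}  F={f_values[term]:.2f}")
print(f"{'error':12s} df={df_error}  SS={ss_error:.4f}  MS={mse:.4f}")
```

The three effect sums of squares add up to the Model sum of squares in the whole model test, which is the sense in which the Effect Tests break down the Model line.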

Start the hypothesis testing with the most complicated terms and work towards simpler terms.

The first null hypothesis is
H: no interaction between sex and species in their effect on the mean PCB levels
A: some interaction between sex and species in their effect on the mean PCB levels.

The test statistic is F = 1.6313 and the p-value (.2233) is not very small. Hence there is no evidence of an interaction in the effects of sex and species upon the mean PCB levels. This is not too surprising as the lines are fairly parallel. What does this mean in terms of the original responses, i.e., what does no interaction say about the differences in the mean PCB levels between the sexes or among the species?

If the interaction is not statistically significant, then the analysis continues along to examine the main effects. These can be examined in any order.

Examining main effects - sex.

What are the null and alternate hypotheses? The ANOVA table gives F = 65.7 and the p-value < 0.0001 - very small. There is very strong evidence of a difference in the mean PCB levels between the two sexes.

Because there are only two levels of sex, no multiple comparison procedure is needed. We would like estimates of the marginal means, i.e., estimates of the mean PCB levels for each sex averaged over species, and, if possible, estimates of the mean difference in PCB levels between the two sexes averaged over species.

We obtain the marginal means, se, and 95% confidence intervals of these values:


Here are estimates of the marginal means (and standard errors) and the single pairwise difference (it is not necessary to use Tukey’s procedure - why?). These are requested using the lsmeans statement in the GLM procedure. The lsmeans estimates are equal to the raw sample means - this will be true ONLY in balanced data. In the case of unbalanced data (see later), the lsmeans seem like a sensible way to estimate marginal means.

sex  pcb LSMEAN   Standard Error  H0:LSMEAN=0 Pr > |t|  H0:LSMean1=LSMean2 Pr > |t|
f    13.8250000   0.4195318       <.0001                <.0001
m    18.6333333   0.4195318       <.0001

sex pcb LSMEAN 95% Confidence Limits

f 13.825000 12.943596 14.706404

m 18.633333 17.751930 19.514737

Least Squares Means for Effect sex

i j Difference Between Means Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)

1 2 -4.808333 -6.054783 -3.561884
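The standard errors and the confidence limits in these tables follow from the pooled error mean square and the number of observations that go into each marginal mean. The Python sketch below illustrates the arithmetic for the balanced case only; the t critical value is typed in from a t-table (an assumption of this sketch, since the standard library has no t quantile function), and the names are made up for this example.

```python
from math import sqrt

mse, df_error = 2.1120833, 18   # Error line of the ANOVA table
n_per_sex = 12                  # 3 species x 4 fish
n_per_species = 8               # 2 sexes x 4 fish

# Standard error of a marginal mean = sqrt(MSE / number of observations).
se_sex_mean = sqrt(mse / n_per_sex)
se_species_mean = sqrt(mse / n_per_species)  # reported for species later on

# Standard error of the difference between the two sex means.
se_sex_diff = sqrt(2 * mse / n_per_sex)

# 95% confidence interval for the difference (female - male).
t_crit = 2.1009                 # t quantile (0.975, 18 df) from a t-table
diff = 13.8250000 - 18.6333333
lower = diff - t_crit * se_sex_diff
upper = diff + t_crit * se_sex_diff

print(f"se(sex mean)     = {se_sex_mean:.4f}")
print(f"se(species mean) = {se_species_mean:.4f}")
print(f"se(difference)   = {se_sex_diff:.4f}")
print(f"95% ci for difference: {lower:.3f} to {upper:.3f}")
```

In unbalanced data the marginal means are no longer simple averages over equal-sized groups, so these shortcut formulas should not be used there.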

Proc GLM also provides a difference plot.

In this plot, follow the light grey lines from a pair of treatments to where they intersect on either a solid blue line or a dashed red line. The confidence interval runs out from this point. If the confidence interval for the difference in the means does NOT intersect the dashed grey line (X = Y), then there is evidence that this difference in the means is not equal to zero (the solid blue lines). If the confidence interval for the difference in the means does intersect the dashed grey line (the dashed red lines), then there is no evidence of a difference in the means.

Finally, Proc GLM can also produce the joined-line plot to make the interpretation easier as seen in the previous chapters.

Tukey Comparison Lines for Least Squares Means of sex

LS-means with the same letter are not significantly different.

pcb LSMEAN sex LSMEAN Number

A 18.63333 m 2

B 13.82500 f 1

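The output from Proc Mixed is not quite as extensive but, in my opinion, is organized in a much more logical fashion than in Proc GLM. First here are the estimates of the marginal means:

Effect  sex  species  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha  Lower    Upper
sex     f             13.5500   0.4944          14  27.41    <.0001    0.05   12.4897  14.6103
sex     m             18.9500   0.4944          14  38.33    <.0001    0.05   17.8897  20.0103

and then the pairwise differences:

sex  species  _sex  _species  Estimate  Standard Error  Adjustment  AdjP    AdjLow   AdjUpp
f             m               -5.4000   0.6991          Tukey       <.0001  -6.8995  -3.9005

Note that these two Proc Mixed tables show 14 denominator df and estimates that differ from the Proc GLM results above (18 df; means of 13.825 and 18.633), even though the two procedures should agree for this balanced fixed-effects model. They appear to come from a run on a different version of the data (possibly the one with some observations deleted); the pdmix800 table below matches the balanced analysis.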

The pdmix800 macro is used to produce the joined-line plots:

Effect=sex Method=Tukey(P<0.05) Set=2

Obs sex species Estimate Standard Error Alpha Lower Upper Letter Group

7 m 18.6333 0.4195 0.05 17.7519 19.5147 A

8 f 13.8250 0.4195 0.05 12.9436 14.7064 B

In multifactor designs, it may not be very useful to estimate the marginal means. Does it make sense to take an average over the three species with equal weight given to each species? If one species is more abundant than another species, perhaps it should be given a greater weight?
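To see how much the choice of weights can matter, the Python sketch below compares the equal-weight marginal mean for females with a weighted version. The abundance weights are HYPOTHETICAL values invented purely for illustration (nothing in the study reports species abundance), and the female cell means are those computed from the raw data (15.08, 12.68, and 13.73 after rounding).

```python
# Female cell means (ppm) computed from the raw data.
female_cell_mean = {"sp1": 15.075, "sp2": 12.675, "sp3": 13.725}

def weighted_mean(means, weights):
    """Weighted average of the cell means; weights need not sum to 1."""
    total = sum(weights.values())
    return sum(means[k] * weights[k] for k in means) / total

# Equal weights: what the marginal (least squares) mean uses.
equal = {"sp1": 1, "sp2": 1, "sp3": 1}

# HYPOTHETICAL relative abundances of the three species (not from the study).
abundance = {"sp1": 0.5, "sp2": 0.3, "sp3": 0.2}

equal_weight = weighted_mean(female_cell_mean, equal)
abundance_weight = weighted_mean(female_cell_mean, abundance)

print(f"equal weights:     {equal_weight:.3f}")
print(f"abundance weights: {abundance_weight:.3f}")
```

With equal weights the result is the marginal mean reported by the packages; with other weights the answer shifts toward the more heavily weighted species, so the choice of weights should reflect the question being asked.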

The estimated difference is -4.8 ppm (i.e., females have lower mean PCB levels on average than males) with a se of 0.5933. A 95% confidence interval for the difference is also given.

Main effects should ONLY be examined if the interaction effects are not statistically significant or if the non-parallelism is not very large.

Examining main effects - species

What are the null and alternate hypotheses?

The ANOVA table gives F = 13.1 and a p-value of 0.0003 - very small. There is very strong evidence of a difference in the mean PCB levels among the three species.

Once again, the test just tells us that there is evidence of a difference in the means, but doesn’t tell us which mean appears to be different. First examine the estimates of the marginal means along with approximate 95% confidence intervals and standard error for each marginal mean and do a multiple-comparison procedure as before.

Here are estimates of the marginal means (and standard errors), with all of the pairwise differences compared using Tukey’s procedure (needed here because there are three species). These are requested using the lsmeans statement in the GLM procedure. The lsmeans estimates are equal to the raw sample means - this will be true ONLY in balanced data. In the case of unbalanced data (see later), the lsmeans seem like a sensible way to estimate marginal means.

species pcb LSMEAN Standard Error Pr > |t| LSMEAN Number

sp1 18.1375000 0.5138194 <.0001 1

sp2 14.4250000 0.5138194 <.0001 2

sp3 16.1250000 0.5138194 <.0001 3

species pcb LSMEAN 95% Confidence Limits

sp1 18.137500 17.058005 19.216995

sp2 14.425000 13.345505 15.504495

sp3 16.125000 15.045505 17.204495

Least Squares Means for effect species
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: pcb

i/j        1         2         3
1                 0.0002    0.0323
2     0.0002                0.0756
3     0.0323      0.0756

Least Squares Means for Effect species

i j Difference Between Means Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)

1 2 3.712500 1.857969 5.567031

1 3 2.012500 0.157969 3.867031

2 3 -1.700000 -3.554531 0.154531
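The simultaneous confidence limits can be read mechanically: a pair of species means is declared different when the interval for their difference excludes zero. The short Python sketch below (illustrative only; the dictionary simply holds the limits printed above) applies that rule to the three pairs.

```python
# Tukey simultaneous 95% limits for species differences, from the table above.
tukey_ci = {
    ("sp1", "sp2"): (1.857969, 5.567031),
    ("sp1", "sp3"): (0.157969, 3.867031),
    ("sp2", "sp3"): (-3.554531, 0.154531),
}

# A pair differs when its interval does not contain zero.
differs = {
    pair: not (low <= 0 <= high)
    for pair, (low, high) in tukey_ci.items()
}

for (first, second), is_different in differs.items():
    verdict = "evidence of a difference" if is_different else "no evidence of a difference"
    print(f"{first} vs {second}: {verdict}")
```

Pairs with no evidence of a difference are the ones that share a letter in a joined-line display.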


Proc GLM also provides a difference plot.

In this plot, follow the light grey lines beside a pair of species to where they intersect on either a solid blue line or a dashed red line. The confidence interval runs out from this point. If the confidence interval for the difference in the means does NOT intersect the dashed grey line (X = Y), then there is evidence that this difference in the means is not equal to zero (the solid blue lines). If the confidence interval for the difference in the means does intersect the dashed grey line (the dashed red lines), then there is no evidence of a difference in the means.

Finally, Proc GLM can also produce the joined-line plot to make the interpretation easier, as seen in the previous chapters.

Tukey Comparison Lines for Least Squares Means of species

LS-means with the same letter are not significantly different.

pcb LSMEAN species LSMEAN Number

A 18.1375 sp1 1

B 16.1250 sp3 3

B

B 14.4250 sp2 2

The output from Proc Mixed is not quite as extensive and, in my opinion, is organized in a much more logical fashion than that from Proc GLM. First, here are the estimates of the marginal means:

Effect    sex  species  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha  Lower    Upper
species        sp1      18.6125   0.6422          14  28.98    <.0001    0.05   17.2351  19.9899
species        sp2      14.4250   0.5244          14  27.51    <.0001    0.05   13.3004  15.5496
species        sp3      15.7125   0.6422          14  24.47    <.0001    0.05   14.3351  17.0899

and then the pairwise differences:

sex  species  _sex  _species  Estimate  Standard Error  Adjustment    Adj P   Adj Low  Adj Upp
     sp1            sp2        4.1875   0.8291          Tukey-Kramer  0.0005   2.0176   6.3574
     sp1            sp3        2.9000   0.9082          Tukey-Kramer  0.0168   0.5230   5.2770
     sp2            sp3       -1.2875   0.8291          Tukey-Kramer  0.2976  -3.4574   0.8824

The pdmix800 macro is used to produce the joined-line plots:

Effect=species Method=Tukey(P<0.05) Set=3

Obs sex species Estimate Standard Error Alpha Lower Upper Letter Group

9 sp1 18.1375 0.5138 0.05 17.0580 19.2170 A

10 sp3 16.1250 0.5138 0.05 15.0455 17.2045 B

11 sp2 14.4250 0.5138 0.05 13.3455 15.5045 B

What does this table appear to show us? Again, is it sensible to “average” over the two sexes? As most species have an equal sex ratio, this is likely a sensible thing to do.

How about the estimate of the differences in the mean PCB levels among the species?

Don’t forget to examine the residual and other diagnostic plots (often produced automatically by packages). Here is the diagnostic plot from Proc GLM:

Here is the diagnostic plot from Proc Mixed:

There is no evidence of any gross problems in the analysis. There is some evidence that the residuals are not equally variable, but the difference in variance is not large.

10.4 Power and sample size for two-factor CRD

As before, the first step in a power analysis is to estimate the biologically important difference to detect. THIS IS HARD as it is sometimes not clear what size of difference will make an impact. You will need an effect size for each factor and possibly for the interaction (if that is of interest).

I find it easiest to start by making a reasonable guess as to the MEAN for each treatment, i.e. what is the estimated mean for each combination of factor levels. These means determine the effect sizes for the various main effects and interactions in the model. Then look at the estimates of the main effects and adjust the individual means until the effect sizes match those of biological importance above. This is readily done in a spreadsheet program.

Next you will need an estimate of the residual STANDARD DEVIATION (not the se), which represents the (common) variation among observations within each treatment group.

These two pieces of information allow various packages to estimate the sample sizes needed to detect the various effect sizes.

For example, consider a two-factor design with factor 1 (sex) at two levels (male and female) and factor 2 (species) at three levels, where the PCB concentration is measured in fish selected from each combination of sex and species.

From past studies we believe that a difference of about 4 ppm between the sexes and about 6 ppm among species is biologically important. There may be a slight interaction between the two factors.

Here is my first “guess” as to the appropriate means:

Species

Sex sp1 sp2 sp3

M 14 16 20

F 10 12 16

The actual values are not that important; what is important are the effect sizes. Here the main effect of sex is found as

$$\frac{(14 - 10) + (16 - 12) + (20 - 16)}{3} = 4$$

i.e. the difference between the means of the two sexes averaged over the three species.

The main effect of the third vs. the first species is:

$$\frac{(20 - 14) + (16 - 10)}{2} = 6$$

i.e. the difference in the means averaged over the two sexes.

This combination of means gives the targeted main effects for both factors.

There is currently no interaction in the effects of the factors because the effect of sex (a difference of 4 units) is the same for all species. We can add a little interaction by jittering the means slightly, but keeping the main effects the same:

Species

Sex sp1 sp2 sp3

M 15 15 20

F 9 13 16

Now the main effect of sex is found as

$$\frac{(15 - 9) + (15 - 13) + (20 - 16)}{3} = 4$$

i.e. no change in the main effect of sex.

Similarly, the revised main effect of the third vs. the first species (i.e. the largest difference in the means among species) is:

$$\frac{(20 - 15) + (16 - 9)}{2} = 6$$

i.e. no change in the main effect of species.

However, now there is a slight interaction as the effect of sex is not consistent across the species.
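The spreadsheet arithmetic above can also be sketched in a few lines of code. The following Python fragment (Python is used only as a calculator here; the function names are illustrative, not part of any package) recomputes the two main effects and the within-species sex differences for both tables of guessed means.

```python
# Guessed cell means: rows are sexes, columns are species sp1, sp2, sp3.
first_guess = {"M": [14, 16, 20], "F": [10, 12, 16]}
revised = {"M": [15, 15, 20], "F": [9, 13, 16]}


def sex_differences(cells):
    """Male minus female difference within each species."""
    return [m - f for m, f in zip(cells["M"], cells["F"])]


def sex_effect(cells):
    """Main effect of sex: the M - F difference averaged over species."""
    diffs = sex_differences(cells)
    return sum(diffs) / len(diffs)


def species_effect(cells, hi=2, lo=0):
    """Difference between two species (default sp3 - sp1), averaged over sexes."""
    diffs = [row[hi] - row[lo] for row in cells.values()]
    return sum(diffs) / len(diffs)


for label, cells in (("first guess", first_guess), ("revised", revised)):
    print(label, sex_effect(cells), species_effect(cells), sex_differences(cells))
```

Both tables give a sex effect of 4 and a species effect of 6. The within-species sex differences are all equal in the first table (no interaction) and unequal in the revised table (a slight interaction).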

We also believe that the standard deviation is 5.

Power computations for a two-factor CRD are done using Proc GLMPower.[4] This procedure requires you to read in the means for the factor-level combinations. This is done in the usual fashion.

data means;
   input sex $ species $ pcb;
   datalines;
m s1 15
m s2 15
m s3 20
f s1 9
f s2 13
f s3 16
;;;;

Proc GLMPower is then called; most of the statements are self-explanatory:

ods graphics on;
proc glmpower data=means;
   title2 'GLMPower';
   class sex species;
   model pcb = sex species sex*species;
   power
      stddev = 5      /* residual standard deviation */
      alpha  = .05
      ntotal = .      /* solve for the total sample size */
      power  = .80;   /* target power */
   plot y=power yopts=(ref=.80 crossref=yes) min=.05 max=.95;
   footnote 'NOTE: You require different sample sizes for each effect';
run;
ods graphics off;

This gives the following output:

[4] Proc Power cannot be used for two-factor or more complex designs. There is a third, more flexible way, using the methods developed by Stroup, that is illustrated in the SAS code at http://www.stat.sfu.ca/~cschwarz/Stat-Ecology-Datasets.

Fixed Scenario Elements

Dependent Variable pcb

Alpha 0.05

Error Standard Deviation 5

Nominal Power 0.8

NOTE: You require different sample sizes for each effect

Computed N Total

Index Source Test DF Error DF Actual Power N Total

1 sex 1 48 0.821 54

2 species 2 42 0.856 48

3 sex*species 2 360 0.802 366

NOTE: You require different sample sizes for each effect

and a plot of the tradeoff between power and sample size:

You would need a TOTAL sample size of about 54 (i.e. 9 per treatment group) to detect a 4 ppm difference in the mean concentration between the two sexes with 80% power. A similar TOTAL sample size of about 48 (i.e. 8 per treatment group) would be needed to detect a 6 ppm difference in the mean concentration among the three species with 80% power. We are fortunate that the sample sizes for the two effects agree so well; this is often not the case and you would be forced to use the larger sample size. For example, the total sample size to detect the effects of Factor A may be 60, while the total sample size to detect the effects of Factor B may be 90. If possible, use the larger sample size to ensure adequate power for all factors.

The sample size requirements to detect the interaction are very large (over 350 in total). It may not be feasible to conduct an experiment large enough to detect this small interaction.
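One way to see why the interaction needs so many more fish is to compare noncentrality parameters. The sketch below is written in Python, is not a reproduction of what Proc GLMPower does internally, and assumes the usual definition for a balanced fixed-effects design: the sum over cells of the replicate count times the squared effect, divided by the error variance. The function name is illustrative. It uses the revised cell means and the standard deviation of 5 from above.

```python
def noncentrality(cells, n_per_cell, sd):
    """Noncentrality parameters (row factor, column factor, interaction)
    for a balanced two-factor CRD with the given cell means."""
    a, b = len(cells), len(cells[0])
    grand = sum(sum(row) for row in cells) / (a * b)
    row_eff = [sum(row) / b - grand for row in cells]
    col_eff = [sum(row[j] for row in cells) / a - grand for j in range(b)]
    inter = [
        [cells[i][j] - grand - row_eff[i] - col_eff[j] for j in range(b)]
        for i in range(a)
    ]
    var = sd ** 2
    lam_row = n_per_cell * b * sum(e ** 2 for e in row_eff) / var
    lam_col = n_per_cell * a * sum(e ** 2 for e in col_eff) / var
    lam_int = n_per_cell * sum(e ** 2 for row in inter for e in row) / var
    return lam_row, lam_col, lam_int


# Revised cell means: rows are M and F, columns are sp1, sp2, sp3.
revised_means = [[15, 15, 20], [9, 13, 16]]

# Contribution of one replicate per cell to each noncentrality parameter.
lam_sex, lam_species, lam_interaction = noncentrality(revised_means, 1, 5)
print(lam_sex, lam_species, lam_interaction)
```

Each replicate per cell contributes far less to the interaction noncentrality than to either main effect, which is consistent with the much larger total sample size reported for the sex*species row.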

It is not necessary to have the sample sizes equal in all treatment groups, but it can be shown that the ‘power’ of the test is maximized when the sample sizes are equal for all treatment combinations.

10.5 Unbalanced data - Introduction

Unbalanced data can take many forms. Some of the forms are easy to analyze, some are difficult.

Here are some illustrations of the common replication patterns that you will run into. In all cases there are two levels of Factor A and three levels of Factor B, and an ‘x’ represents a replicate.

• Equal replications per cell

                  Factor B
               b1    b2    b3
            +-----+-----+-----+
Factor A a1 | xx  | xx  | xx  |
            +-----+-----+-----+
         a2 | xx  | xx  | xx  |
            +-----+-----+-----+

This is the easiest to deal with and two examples were given earlier in the notes.

• Unequal replications per cell, but replicates in every cell

                  Factor B
               b1    b2    b3
            +-----+-----+-----+
Factor A a1 | xxx | xx  | xxx |
            +-----+-----+-----+
         a2 | xx  | xx  | xxxx|
            +-----+-----+-----+

In this case, all cells have some data, but the number of replicates differs among cells and every cell has 2 or more replicates. The multiple replicates within a cell are needed to estimate the MSE row in the ANOVA table. An example of an analysis of this type of data will be given below. Because each cell has replicates, it is possible to check that the variation is roughly equal in all treatment groups. This type of unbalance can be analyzed “easily” if the computer package has been programmed correctly. BEWARE: some packages (e.g. Excel) will give WRONG answers! If you use R, you will find that R fits what are known as incremental sums-of-squares, which may not test hypotheses that are of interest; see the examples below for more details. A key assumption is that missing values occur completely at random (MCAR). This should be assessed before analyzing any unbalanced design where the original design was balanced, but some experimental units were lost.

• Unequal replications per cell, with some cells having only a single observation

                  Factor B
               b1    b2    b3
            +-----+-----+-----+
Factor A a1 | x   | xx  | xxx |
            +-----+-----+-----+
         a2 | xx  | x   | xx  |
            +-----+-----+-----+

In this case, all cells have some data, but the number of replicates differs among cells and some cells have only a single observation. Theoretically, there is no difference in the analysis of this experiment from the previous example. However, in this experiment, you must assume that the variability in the cells with replicates is an accurate representation of that in cells with only a single observation.

• One observation per cell

                  Factor B
               b1    b2    b3
            +-----+-----+-----+
Factor A a1 | x   | x   | x   |
            +-----+-----+-----+
         a2 | x   | x   | x   |
            +-----+-----+-----+

If you only have a single observation per cell, it is impossible to test for interaction effects. [Technically, the design has insufficient degrees of freedom for error.] We won’t discuss how to analyze this type of data in this course, but basically you MUST ASSUME that no interaction exists, and fit a model without any interaction terms. Only main effects can be tested.

• One or more cells completely empty

                  Factor B
               b1    b2    b3
            +-----+-----+-----+
Factor A a1 | xx  | xx  |     |
            +-----+-----+-----+
         a2 | xx  | xx  | xx  |
            +-----+-----+-----+

SEEK HELP! Most computer packages will give you completely WRONG results! This is a tough problem. Now having said this, one simple solution to this problem (if it only occurs in one cell of the design) is to drop the column and analyze the remaining data as a two-factor design with each factor at two levels. In the above example, level b3 would be dropped from the experiment. However, if there are many missing cells, you may find that you are dropping most of your data!
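The five patterns above depend only on the number of replicates in each cell, so a table of cell counts is enough to tell which case applies. The following Python sketch is one way to express that check; the function name and the wording of the labels are illustrative only.

```python
def classify_replication(counts):
    """Classify a two-factor layout from its per-cell replicate counts.

    counts is a list of rows, one row per level of Factor A, holding the
    number of replicates in each Factor B cell.
    """
    flat = [n for row in counts for n in row]
    if min(flat) == 0:
        return "one or more cells completely empty"
    if all(n == 1 for n in flat):
        return "one observation per cell"
    if len(set(flat)) == 1:
        return "equal replications per cell"
    if min(flat) == 1:
        return "unequal replications, some cells with a single observation"
    return "unequal replications, replicates in every cell"


# The five layouts illustrated above, as counts of 'x' per cell.
layouts = [
    [[2, 2, 2], [2, 2, 2]],
    [[3, 2, 3], [2, 2, 4]],
    [[1, 2, 3], [2, 1, 2]],
    [[1, 1, 1], [1, 1, 1]],
    [[2, 2, 0], [2, 2, 2]],
]
for layout in layouts:
    print(layout, "->", classify_replication(layout))
```

Running a check like this on the table of cell counts before fitting any model makes it clear which of the cautions above apply.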

The analysis of an unbalanced design with all cells having at least one observation and some cells having at least two replicates is discussed in:

Shaw, R.G. and Mitchell-Olds, T. (1993). ANOVA for unbalanced data: an overview. Ecology, 74, 1638-1645. http://dx.doi.org/10.2307/1939922.

10.6 Example - Stream residence time - Unbalanced data in a CRD

This example is taken from:

Trouton, Nicole (2004). An investigation into the factors influencing escapement estimation for chinook salmon (Oncorhynchus tshawytscha) on the Lower Shuswap River, British Columbia. M.Sc. Thesis, Simon Fraser University.

Spawning residence time is an important value in managing Pacific salmon. A standard procedure to estimate the total number of fish that have returned to spawn (the escapement) is to estimate the total number of fish-days spent by salmon in a stream by aerial flights and then divide this number by the spawning residence time. This procedure is called the Area-under-the-curve (AUC) method of estimating escapement.
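As a rough illustration of the AUC calculation, the Python sketch below uses entirely HYPOTHETICAL flight days, counts, and residence time; none of these numbers come from the Trouton study. Using the trapezoid rule to approximate the fish-days is also an assumption made for this sketch, since the text above only says that fish-days are estimated from aerial flights.

```python
def fish_days(days, counts):
    """Area under the count-versus-day curve, by the trapezoid rule."""
    area = 0.0
    for k in range(1, len(days)):
        area += (days[k] - days[k - 1]) * (counts[k] + counts[k - 1]) / 2
    return area


def auc_escapement(days, counts, residence_time):
    """Escapement estimate: total fish-days divided by residence time (days)."""
    return fish_days(days, counts) / residence_time


# HYPOTHETICAL survey: day of each flight and the number of fish counted.
survey_days = [0, 7, 14, 21, 28]
survey_counts = [0, 400, 900, 300, 0]
hypothetical_residence_time = 7.0  # days; an invented value for illustration

total_fish_days = fish_days(survey_days, survey_counts)
escapement = auc_escapement(survey_days, survey_counts, hypothetical_residence_time)
print(total_fish_days, escapement)
```

Because the residence time sits in the denominator, any error in it carries straight through to the escapement estimate, which is why the study of residence time matters.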

Trouton captured and inserted radio transmitters into a sample of chinook salmon on the Lower Shuswap River in British Columbia, Canada. Then she rowed along the stream on a daily basis with a radio receiver to see how long each fish spent on the spawning grounds.

Approximately 60, 70, and 150 fish were radio tagged in 2000, 2001, and 2002 respectively, approximately equally split between males and females in each year. Not all fish survived the radio insertion and not all fish spawned in the reaches of the river surveyed by the radio receiver, so the number of data points actually measured each year is less than this number.

The raw data are available in a datafile called residence.csv in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-Ecology-Datasets. The data are imported into SAS in the usual way:

data residence;
   infile 'residence.csv' dlm=',' dsd missover firstobs=2;
   length trt $8.;
   input time sex $ year $;
   /* build a combined treatment label such as f-2000 */
   substr(trt,1,1) = sex;
   substr(trt,2,1) = '-';
   substr(trt,3,4) = year;
run;

Part of the raw data are shown below:

Obs trt time sex year

1 f-2000 4.5 f 2000

2 f-2000 5.0 f 2000

3 f-2001 4.0 f 2001

4 f-2001 6.0 f 2001

5 f-2001 1.0 f 2001

6 f-2001 4.0 f 2001

7 f-2001 5.0 f 2001

8 f-2001 6.0 f 2001

9 f-2001 4.0 f 2001

10 f-2001 3.0 f 2001

Design issues

What are the factors in this experiment? What are their levels? What is the response variable?

Is this an “experiment” or an analytical survey? What is the role of randomization in this study?

What is the treatment structure? Is it a factorial treatment structure?

What are the experimental units? What are the observational units? Are they the same? Why?

The design is unbalanced, in that there are unequal numbers of males and females in each year, and unequal numbers of fish radio tagged in each year. How did the unbalance occur? Was this a planned feature of the survey?

Not all fish tagged were measured. Are the missing values missing completely at random (MCAR), missing at random (MAR) or informative? What important assumption needs to be made about the missing values in order that these results may be extrapolated to the relevant populations?

10.6.1 Preliminary summary statistics

Examine the table of simple summary statistics. We compute the means and standard deviations for each treatment group using Proc Tabulate in the usual way:

proc tabulate data=residence; /* proc tabulate is not for the faint of heart */
   title2 'Summary table of means, std devs';
   class sex year;
   var time;
   table sex*year, time*(n*f=5.0 mean*f=5.2 std*f=5.2 stderr*f=5.2) / rts=15;
run;

giving

                     time
sex   year      N    Mean     Std   StdErr
f     2000      2    4.75    0.35     0.25
      2001      9    4.11    1.54     0.51
      2002      6    3.00    0.00     0.00
m     2000     10    6.15    1.90     0.60
      2001     15    8.10    2.58     0.67
      2002     26    6.29    1.33     0.26

The unbalance within years and across years is quite clear. It is interesting that all six observations on females in 2002 had exactly the same value, leading to a standard deviation of 0 for that particular cell.

The other standard deviations are all approximately equal, although it is quite difficult to tell very much for females in 2000 with only two observations.

Next, create and examine the side-by-side dot plots:

proc sgplot data=residence;
   title2 'Side-by-side dot plots';
   yaxis label='Residence Time' offsetmin=.05 offsetmax=.05;
   xaxis label='Sex and Year' offsetmin=.05 offsetmax=.05;
   scatter x=trt y=time / markerattrs=(symbol=circlefilled);
run;

which gives

There is no evidence of any outliers or unusual points.

The capture of fish took place over an extended period of time and there was little evidence that schooling or any other non-independent behavior occurred among the fish.

10.6.2 The Statistical Model

This appears to be a simple two-factor completely randomized design with year and sex being the two factors. The model for this experiment will then be:

ResidenceTime = Year Sex Year*Sex

It is clear that Sex is a fixed effect. It is less clear that Year is a fixed effect. Do you want to restrict inference only to these three particular years in the survey or do you wish to extrapolate to all possible years? If the latter, then you run into the problem that the years selected in the study were not “randomly” selected from all possible years of interest. In cases such as these, many scientists elect to treat Year as a fixed effect, but the consequences of this need to be understood.[5] Notice how the Year variable was read as a character variable (year $) when the data were imported into SAS, so that it is treated as a categorical factor.

[5] The more serious consequence is that standard errors for marginal means (e.g. the mean residence time for males or females) will be underestimated as the extra variation induced from the potential selection of a new set of years (if Years were a random effect) has not been accounted for.

Next, construct profile plots. We can find the sample means and confidence limits for each population mean quite simply because this is a CRD.

proc sort data=residence;
   by sex year;
run;

proc means data=residence noprint; /* find simple summary statistics */
   by sex year;
   var time;
   output out=means n=n mean=mean std=stddev stderr=stderr lclm=lclm uclm=uclm;
run;

proc sgplot data=means;
   title2 'Profile plot with 95% ci on each mean';
   series y=mean x=year / group=sex;
   highlow x=year high=uclm low=lclm / group=sex;
   yaxis label='Residence time' offsetmin=.05 offsetmax=.05;
   xaxis label='Year' offsetmin=.05 offsetmax=.05;
run;

and then get the profile plot:

Note that all observations in 2002 for females had the same value, giving a standard deviation of 0 and a confidence interval of width 0; obviously this is silly.

There is no evidence of any interaction between the two factors upon the mean response because the lines are roughly parallel.

10.6.3 Hypothesis testing and estimation

There are two main procedures in SAS for fitting ANOVA and regression models (collectively called linear models). First is Proc GLM, which performs the traditional sums-of-squares decomposition. Second is Proc Mixed, which uses restricted maximum likelihood (REML) to fit models. In models with only fixed effects (e.g. those in this chapter), this gives the same results as sums-of-squares decompositions. In unbalanced data with additional random effects, the results from Proc Mixed may differ from those of Proc GLM; both are correct, and are just different ways to deal with the approximations necessary for unbalanced data. Proc Mixed also has the advantage of being able to fit more complex models with more than one size of experimental unit. There is no clear advantage to using one procedure or the other; it comes down to personal preference. Both procedures will be demonstrated in this chapter; I personally prefer to use Proc Mixed.

Here is the code for Proc GLM:

ods graphics on;
proc glm data=residence plots=all;
   title2 'analysis';
   class sex year;
   model time = sex year sex*year;
   lsmeans sex*year / cl pdiff adjust=tukey lines;
   lsmeans sex      / cl pdiff adjust=tukey lines;
   lsmeans year     / cl pdiff adjust=tukey lines;
run;
ods graphics off;

Here is the code for Proc Mixed:

ods graphics on;
proc mixed data=residence plots=all;
   title2 'Mixed model balanced';
   class sex year; /* class statement identifies factors */
   model time = sex year sex*year / ddfm=kr;
   lsmeans sex*year / cl adjust=tukey;
   lsmeans sex      / cl adjust=tukey;
   lsmeans year     / cl adjust=tukey;
   ods output tests3 =MixedTest; /* needed for the pdmix800 */
   ods output lsmeans=MixedLsmeans;
   ods output diffs  =MixedDiffs;
run;
ods graphics off;

/* Get a joined lines plot */
%include '../pdmix800.sas';
%pdmix800(MixedDiffs,MixedLsmeans,alpha=0.05,sort=yes);

In both procedures, the Class statement specifies the categorical factors for the model. Notice how you specify the statistical model in the Model statement; it is very similar to the statistical model seen earlier. The extra code at the end of Proc Mixed generates the joined-line plots that we’ve seen earlier, based on the output from the lsmeans statements (see below).

The Whole Model test is not very interesting as it tests if there is an effect of Sex, or Year, or an interaction upon the mean response. It is rarely useful. Here is the table for the overall ANOVA computed by Proc GLM:

Source DF Sum of Squares Mean Square F Value Pr > F

Model 5 157.6422197 31.5284439 10.36 <.0001

Error 62 188.7254274 3.0439585

Corrected Total 67 346.3676471

No such table is produced by Proc Mixed.
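The Error line of the ANOVA table can be cross-checked against the summary statistics computed earlier, because in a CRD the error sum of squares is the pooled within-cell variation. The Python sketch below uses the cell sizes and standard deviations from the Proc Tabulate output; because those standard deviations are rounded to two decimals, the result is expected to agree only approximately with the table.

```python
# (n, standard deviation) for each sex-year cell, from the summary table.
cell_stats = {
    ("f", 2000): (2, 0.35),
    ("f", 2001): (9, 1.54),
    ("f", 2002): (6, 0.00),
    ("m", 2000): (10, 1.90),
    ("m", 2001): (15, 2.58),
    ("m", 2002): (26, 1.33),
}

# Error df: one degree of freedom is lost for each cell mean.
error_df = sum(n - 1 for n, _ in cell_stats.values())

# Error SS: pooled within-cell sums of squares, (n - 1) * sd^2 per cell.
error_ss = sum((n - 1) * sd ** 2 for n, sd in cell_stats.values())

mse = error_ss / error_df
print(error_df, round(error_ss, 2), round(mse, 3))
```

The degrees of freedom match the table exactly, and the error sum of squares and mean square agree to within the rounding of the standard deviations.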

The Effect Test section of the output is much more informative. Here is the table for the Effect Tests computed by Proc GLM:

Source DF Type III SS Mean Square F Value Pr > F

sex 1 76.60591323 76.60591323 25.17 <.0001

year 2 22.31105774 11.15552887 3.66 0.0313

sex*year 2 8.65181116 4.32590558 1.42 0.2492

Here is the table for the Effect Tests computed by Proc Mixed. Because the model consists only of fixed effects, the results are identical to those from Proc GLM:

Type 3 Tests of Fixed Effects

Effect Num DF Den DF F Value Pr > F

sex 1 62 25.17 <.0001

year 2 62 3.66 0.0313

sex*year 2 62 1.42 0.2492

The Effect Tests table breaks down the Model line of the whole model test into components for each term in the model. Some packages give you a choice of effect tests. For example, SAS will print out Type I, II, and III tests; in balanced data these will always be the same and any can be used. In unbalanced data, these three types of tests will give different results: ALWAYS use the Type III (also known as the marginal) tests. As noted earlier, if you are using R, the default tests are what are known as Type I (incremental) tests and can be misleading (i.e. testing the wrong hypothesis) in cases of unbalanced data.

As in the balanced case, start with the test for interaction between the two factors upon the mean response. The hypotheses are:

H: no interaction between the two factors upon the mean response.
A: some interaction between the two factors upon the mean response.

The test statistic is F = 1.42 with a p-value of 0.25. There is no evidence of an interaction between the two factors upon the mean response. This is not surprising given the apparent parallelism of the lines in the factor profile plots.

Because there was no evidence of an interaction, it is sensible to examine the main effects of each factor. In both cases, the null hypothesis is no effect of the factor upon the mean response when averaged over the levels of the other factor. Note that in multi-factor designs, the hypotheses of main effects are always in terms of ‘averages over the other factors’ in the model. This is one reason why it is important to test for a possible interaction between factors before testing for main effects.

There is strong evidence of a Sex effect upon the mean residence time (F = 25.2, p < .0001), and there is evidence of a Year effect upon the mean residence time (F = 3.7, p = .031).

It is of interest to estimate the various effects as well. First start with an examination of the sex effect.

Here are estimates of the marginal means (and standard errors) and the single pairwise difference (it is not necessary to use Tukey’s procedure - why?). These are requested using the lsmeans statement in the GLM procedure. In balanced data the LS means are equal to the raw sample means; because these data are unbalanced, the two differ here, and the LS means seem like a sensible way to estimate marginal means.

sex   time LSMEAN   H0:LSMean1=LSMean2  Pr > |t|
f     3.95370370    <.0001
m     6.84615385

sex time LSMEAN 95% Confidence Limits

f 3.953704 2.928447 4.978960

m 6.846154 6.319631 7.372677

Least Squares Means for Effect sex

i j Difference Between Means Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)

1 2 -2.892450 -4.045003 -1.739898

Proc GLM also provides a difference plot.

In this plot, follow the light grey lines beside the pair of sexes to where they intersect on either a solid blue line or a dashed red line. The confidence interval runs out from this point. If the confidence interval for the difference in the means does NOT intersect the dashed grey line (X = Y), then there is evidence that this difference in the means is not equal to zero (the solid blue lines). If the confidence interval for the difference in the means does intersect the dashed grey line (the dashed red lines), then there is no evidence of a difference in the means.

Finally, Proc GLM can also produce the joined-line plot to make the interpretation easier, as seen in the previous chapters.

Tukey-Kramer Comparison Lines for Least Squares Means of sex

LS-means with the same letter are not significantly different.

time LSMEAN sex LSMEAN Number

A 6.8461538 m 2

B 3.9537037 f 1

The output from Proc Mixed is not quite as extensive and, in my opinion, is organized in a much more logical fashion than that from Proc GLM. First, here are the estimates of the marginal means:

Effect  sex  year  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha  Lower   Upper
sex     f          3.9537    0.5129          62   7.71    <.0001    0.05   2.9284  4.9790
sex     m          6.8462    0.2634          62  25.99    <.0001    0.05   6.3196  7.3727

and then the pairwise differences:

sex  year  _sex  _year  Estimate  Standard Error  Adjustment    Adj P   Adj Low  Adj Upp
f          m            -2.8925   0.5766          Tukey-Kramer  <.0001  -4.0450  -1.7399

The pdmix800 macro is used to produce the joined-line plots:

Effect=sex Method=Tukey-Kramer(P<0.05) Set=2

Obs sex year Estimate Standard Error Alpha Lower Upper Letter Group

7 m 6.8462 0.2634 0.05 6.3196 7.3727 A

8 f 3.9537 0.5129 0.05 2.9284 4.9790 B

The least-squares mean for females is 3.95 days, which differs from the raw mean of 3.79 days. This is an artifact of the unbalance in the experiment.

If you look at the raw means for females presented earlier, you find that they were

Year n Raw data Mean

2000 2 4.5, 5.0 4.75

2001 9 4.0, 6.0, 1.0, 4.0, 5.0, 6.0, 4.0, 3.0, 4.0 4.11

2002 6 3.0, 3.0, 3.0, 3.0, 3.0, 3.0 3.00

Overall 17 3.79

The overall raw mean of 3.79 days is found by adding up all of the 2 + 9 + 6 = 17 observations and taking a simple average. The least-squares mean of 3.95 days is found as the “average of the averages”, i.e.

$$3.95 = \frac{4.75 + 4.11 + 3.00}{3}$$

Which is “better”? There is no simple way to determine which, if any, of these two estimates is “better”. The simple raw mean suffers from the fact that a year with more observations (e.g. 2001) is given more weight in determining the average than a year with fewer observations (e.g. 2000). The least-squares means give each year’s data equal weight.
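The two estimates can be reproduced directly from the female residence times in the table above. The following Python sketch is not part of the original SAS analysis and its variable names are illustrative only. It computes the pooled raw mean and the equal-weight “average of the averages”.

```python
# Female residence times (days) by year, copied from the table above.
female_times = {
    2000: [4.5, 5.0],
    2001: [4.0, 6.0, 1.0, 4.0, 5.0, 6.0, 4.0, 3.0, 4.0],
    2002: [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
}

# Raw mean: pool all observations, so years with more fish get more weight.
all_obs = [t for times in female_times.values() for t in times]
raw_mean = sum(all_obs) / len(all_obs)

# Equal-weight mean: average the yearly means, so each year counts the same
# regardless of its sample size.
year_means = {yr: sum(v) / len(v) for yr, v in female_times.items()}
ls_mean = sum(year_means.values()) / len(year_means)

print(f"raw mean            = {raw_mean:.2f}")
print(f"average of averages = {ls_mean:.2f}")
```

The second value agrees with the least-squares mean for females reported by SAS (3.9537) because, in this model, the marginal mean for a sex averages the three year-by-sex cell means with equal weights.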

If the different sample sizes are simple artifacts of the experiment and have no intrinsic meaning, then the least-squares means may be preferable. If, however, the different sample sizes are related to something that is of biological importance (for example, suppose that the spawning population in 2001 was almost five times as large as that in 2000), then the raw means may be more suitable. Many scientists unthinkingly use the least-squares means because that is what the computer package automatically spits out.

The estimated difference in mean residence time (averaged across all three years) is about 2.89 days with a standard error of 0.58 days.

Similarly, one can look at the Year effects in more detail.

Here are estimates of the marginal means (with confidence limits) and the pairwise differences among the three years, adjusted using Tukey’s procedure. These are requested using the LSMeans statement in the GLM procedure. The least-squares means equal the raw sample means ONLY in balanced data. In the case of unbalanced data, as here, the least-squares means seem like a sensible way to estimate marginal means.

year time LSMEAN LSMEAN Number

2000 5.45000000 1

2001 6.10555556 2

2002 4.64423077 3

year time LSMEAN 95% Confidence Limits

2000 5.450000 4.099261 6.800739

2001 6.105556 5.370306 6.840805

2002 4.644231 3.854446 5.434015


Least Squares Means for effect species
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: pcb

i/j   1        2        3
1              0.0002   0.0323
2     0.0002            0.0756
3     0.0323   0.0756

Least Squares Means for Effect year

i j Difference Between Means Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)

1 2 -0.655556 -2.502883 1.191772

1 3 0.805769 -1.073758 2.685297

2 3 1.461325 0.165154 2.757496

Proc GLM also provides a difference plot.

In this plot, follow the light grey lines from a pair of years to where they intersect on either a solid blue line or a dashed red line. The confidence interval runs out from this point. If the confidence interval for the difference in the means does NOT intersect the dashed grey line (X = Y), then there is evidence that this difference in the means is not equal to zero (the solid blue lines). If the confidence interval for the difference in the means does intersect the dashed grey line (the dashed red lines), then there is no evidence of a difference in the means.

Finally, Proc GLM can also produce the joined-line plot to make the interpretation easier, as seen in the previous chapters.


Tukey-Kramer Comparison Lines for Least Squares Means of year

LS-means with the same letter are not significantly different.

time LSMEAN year LSMEAN Number

A 6.1055556 2001 2

A

B A 5.4500000 2000 1

B

B 4.6442308 2002 3

The output from Proc Mixed is not quite as extensive and, in my opinion, is organized in a much more logical fashion than in Proc GLM. First, here are the estimates of the marginal means:

Effect  sex  year  Estimate  Standard Error  DF  t Value  Pr>|t|  Alpha  Lower   Upper
year         2000  5.4500    0.6757          62  8.07     <.0001  0.05   4.0993  6.8007
year         2001  6.1056    0.3678          62  16.60    <.0001  0.05   5.3703  6.8408
year         2002  4.6442    0.3951          62  11.75    <.0001  0.05   3.8544  5.4340

and then the pairwise differences:

sex  year  _sex  _year  Estimate  Standard Error  Adjustment    Adj P   Adj Low  Adj Upp
     2000        2001   -0.6556   0.7693          Tukey-Kramer  0.6722  -2.5029  1.1918
     2000        2002   0.8058    0.7827          Tukey-Kramer  0.5613  -1.0738  2.6853
     2001        2002   1.4613    0.5398          Tukey-Kramer  0.0235  0.1652   2.7575

The pdmix800 macro is used to produce the joined-line plots:

Effect=year Method=Tukey-Kramer(P<0.05) Set=3

Obs sex year Estimate Standard Error Alpha Lower Upper Letter Group

9 2001 6.1056 0.3678 0.05 5.3703 6.8408 A

10 2000 5.4500 0.6757 0.05 4.0993 6.8007 AB

11 2002 4.6442 0.3951 0.05 3.8544 5.4340 B

Again, notice that the least-squares means differ from the raw means because of the imbalance in the data. Here it makes sense to weight the sexes equally because of the (presumed) 50:50 sex ratio within each year.

Estimates of differences in the mean residence time between pairs of years can be readily determined.

Don’t forget to examine the residual and other diagnostic plots (often produced automatically by packages). Here is the diagnostic plot from Proc GLM:

Here is the diagnostic plot from Proc Mixed:

There is no evidence of any gross problems in the analysis. There is some evidence that the residuals are not equally variable, but the difference in variance is not large.


10.6.4 Power and sample size

As before, an estimate of the standard deviation is needed. This can be chosen as somewhere in the range of the standard deviations presented in the summary table, or the $\sqrt{MSE} = RMSE$ value from the ANOVA table can be used. In this case, the estimated standard deviation is $\sqrt{3.044} = 1.74$.
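As a quick check of the arithmetic, the following Python sketch (illustrative only, not part of the SAS workflow) converts the mean square error from the ANOVA table into the standard deviation used for planning.

```python
import math

mse = 3.044             # mean square error reported in the ANOVA table
rmse = math.sqrt(mse)   # estimated standard deviation for power calculations

print(f"RMSE = {rmse:.2f}")
```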

10.7 Example - Energy consumption in pocket mice - Unbalanced data in a CRD

Here is an example showing some of the problems that you may run into when analyzing unbalanced data.

This data was taken from:

French, A.R. 1976. Selection of high temperature for hibernation by the pocket mouse: Ecological advantages and energetic consequences. Ecology, 57, 185-191. http://dx.doi.org/10.2307/1936410

He collected the following data on the energy utilization of the pocket mouse (Perognathus longimembris) during hibernation at different temperatures:

   Restricted food          Ad libitum food
   8°C       18°C           8°C       18°C
   62.69     72.60          95.73     101.19
   54.07     70.97          63.95      76.88
   65.73     74.32         144.30      74.08
   62.98     53.02         144.30      81.40
             46.22                     66.58
             59.10                     84.38
             61.79                    118.95
             61.89                    118.95
             62.50

All readings are in kcal/g.

The raw data are available in a datafile called mouse.csv in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-Ecology-Datasets. The data are imported into SAS in the usual way:

data energy;
   infile 'mouse.csv' dlm=',' dsd missover firstobs=2;
   length diet $10 temp $3 trt $15;
   input temp diet energy;
   /* build a pseudo-factor label such as 18C-ad from temp and diet */
   substr(trt,1,3) = temp;
   substr(trt,4,1) = '-';
   substr(trt,5 ) = diet;
run;

Part of the raw data are shown below:

Obs diet temp trt energy

1 ad 18C 18C-ad 101.19

2 ad 18C 18C-ad 76.88

3 ad 18C 18C-ad 74.08

4 ad 18C 18C-ad 81.40

5 ad 18C 18C-ad 66.58

6 ad 18C 18C-ad 84.38

7 ad 18C 18C-ad 118.95

8 ad 18C 18C-ad 118.95

9 res 18C 18C-res 72.60

10 res 18C 18C-res 70.97

10.7.1 Design issues

What are the factors in this experiment? Their levels? What is the response variable? Is this an “experiment” or an observational study? If the latter, what is the role of randomization in this study? What is the treatment structure in this experiment? What is the experimental unit structure? What are the experimental and observation units? Why is the design unbalanced? What unbalanced the design, or did it occur “by chance”? What is the randomization structure?

Are the factors to be considered fixed effects? Hence, does this experiment appear to satisfy the requirements for a two-factor fixed-effects CRD?

10.7.2 Preliminary summary statistics

Create some simple summary statistics. Create a ‘pseudo-factor’ representing the different combinations of the levels of the two factors, construct side-by-side dot plots, and find the means and standard deviations of each treatment group in the usual way.

We compute the means and standard deviations for each treatment group using Proc Tabulate in the usual way:

proc tabulate data=energy; /* proc tabulate is not for the faint of heart */
   title2 'Summary table of means, std devs';
   class temp diet;
   var energy;
   table temp*diet, energy*(n*f=5.0 mean*f=5.2 std*f=5.2 stderr*f=5.2) / rts=15;
run;

giving

              energy
temp  diet    N   Mean    Std     StdErr
18C   ad      8   90.30   20.28   7.17
18C   res     9   62.49   9.23    3.08
8C    ad      4   112.1   39.41   19.71
8C    res     4   61.37   5.05    2.53
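The summary table can be cross-checked outside SAS. The sketch below uses only the Python standard library and the raw values from the data table in this section. The dictionary layout and names are illustrative assumptions, not part of the original program.

```python
import math
import statistics

# Energy use (kcal/g) for each (temperature, diet) group, from the data table.
energy = {
    ("18C", "ad"):  [101.19, 76.88, 74.08, 81.40, 66.58, 84.38, 118.95, 118.95],
    ("18C", "res"): [72.60, 70.97, 74.32, 53.02, 46.22, 59.10, 61.79, 61.89, 62.50],
    ("8C", "ad"):   [95.73, 63.95, 144.30, 144.30],
    ("8C", "res"):  [62.69, 54.07, 65.73, 62.98],
}

summary = {}
for group, values in energy.items():
    n = len(values)
    mean = statistics.mean(values)
    std = statistics.stdev(values)          # sample standard deviation (n - 1 divisor)
    summary[group] = (n, mean, std, std / math.sqrt(n))

for (temp, diet), (n, mean, std, se) in summary.items():
    print(f"{temp:4s}{diet:4s} n={n}  mean={mean:6.2f}  std={std:5.2f}  stderr={se:5.2f}")
```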

The side-by-side dot plots are created:

proc sgplot data=energy;
   title2 'Side-by-side dot plots';
   yaxis label='Energy' offsetmin=.05 offsetmax=.05;
   xaxis label='Treatment' offsetmin=.05 offsetmax=.05;
   scatter x=trt y=energy / markerattrs=(symbol=circlefilled);
run;

which gives

There doesn’t appear to be any difference in the mean energy usage between the two temperature levels under the restricted food diet, but both appear to be less than the mean energy usage from the ad libitum groups.

The standard deviations appear to be quite different! This is very worrisome. The ANOVA method is fairly robust to unequal variances provided the sample sizes are equal in all groups, but here they are not. I would proceed with caution in the subsequent analysis!

The design is unbalanced (the sample sizes are not equal in all groups).

Next, we construct the profile plot:

We can find the sample means and confidence limits for each population mean quite simply because this is a CRD.

proc sort data=energy;
   by temp diet;
run;

proc means data=energy noprint; /* find simple summary statistics */
   by temp diet;
   var energy;
   output out=means n=n mean=mean std=stddev stderr=stderr lclm=lclm uclm=uclm;
run;

proc sgplot data=means;
   title2 'Profile plot with 95% ci on each mean';
   series y=mean x=diet / group=temp;
   highlow x=diet high=uclm low=lclm / group=temp;
   yaxis label='Energy' offsetmin=.05 offsetmax=.05;
   xaxis label='Diet' offsetmin=.05 offsetmax=.05;
run;

and then get the profile plot:

The profile plot shows that some interaction may be present but it appears to be swamped by sampling uncertainty (i.e. the wide confidence intervals imply that the lines could be parallel). Indeed, this is not unexpected given the earlier plot that showed no difference in the means between the two temperatures under the restricted diet but some apparent differences under the ad libitum diet. Because of the strong evidence of differential standard deviations among the groups, we may not detect this effect.

10.7.3 The statistical model

The model for this study is:
   energy = food temp food*temp

What does this statistical model tell us about the sources of variation in the observed data?

10.7.4 Fitting the model

There are two main procedures in SAS for fitting ANOVA and regression models (collectively called linear models). First is Proc GLM, which performs the traditional sums-of-squares decomposition. Second is Proc Mixed, which uses restricted maximum likelihood (REML) to fit models. In models with only fixed effects (e.g. those in this chapter), this gives the same results as sums-of-squares decompositions. In unbalanced data with additional random effects, the results from Proc Mixed may differ from those of Proc GLM; both are correct, and are just different ways to deal with the approximations necessary for unbalanced data. Proc Mixed also has the advantage of being able to fit more complex models with more than one size of experimental unit. There is no clear advantage to using one procedure or the other; it comes down to personal preference. Both procedures will be demonstrated in this chapter. I personally prefer to use Proc Mixed.

Here is the code for Proc GLM:

ods graphics on;
proc glm data=energy plots=all;
   title2 'Anova using GLM';
   class temp diet; /* class statement identifies factors */
   model energy = temp diet temp*diet;
   /* Note that the type III ss do not add to model ss because
      of imbalance in the design */
   lsmeans temp*diet / cl stderr pdiff adjust=tukey lines;
   lsmeans temp / cl stderr pdiff adjust=tukey lines;
   lsmeans diet / cl stderr pdiff adjust=tukey lines;
run;
ods graphics off;

Here is the code for Proc Mixed:

ods graphics on;
proc mixed data=energy plots=all;
   title2 'Mixed analysis';
   class temp diet; /* class statement identifies factors */
   model energy = temp diet temp*diet / ddfm=kr;
   lsmeans temp*diet / cl adjust=tukey ;
   lsmeans temp / cl adjust=tukey ;
   lsmeans diet / cl adjust=tukey ;
   ods output tests3 =MixedTest; /* needed for the pdmix800 */
   ods output lsmeans=MixedLsmeans;
   ods output diffs =MixedDiffs;
run;
ods graphics off;

/* Get a joined lines plot */
%include '../pdmix800.sas';
%pdmix800(MixedDiffs,MixedLsmeans,alpha=0.05,sort=yes);

In both procedures, the Class statement specifies the categorical factors for the model. Notice how you specify the statistical model in the Model statement; it is very similar to the statistical model seen earlier. The extra code at the end of Proc Mixed generates the joined-line plots that we’ve seen earlier, based on the output from the LSMeans statements (see below).

10.7.5 Hypothesis testing and estimation

There can be several ANOVA tables produced from the model fit.

The Whole Model test simply examines whether there is evidence of an effect for any term in the model. It is rarely useful but does tell us that we should proceed further and investigate the various effects. Here is the table for the overall ANOVA computed by GLM:

Source DF Sum of Squares Mean Square F Value Pr > F

Model 3 9092.57694 3030.85898 7.67 0.0012

Error 21 8297.83396 395.13495

Corrected Total 24 17390.41090
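The entries of the whole-model table can be reproduced from the raw data by treating the four temperature-diet combinations as four groups. The Python sketch below is illustrative only (the names are not from the SAS program). It computes the total, error, and model sums of squares and the F-ratio, but not the p-value, which would need an F distribution function.

```python
# Energy use (kcal/g) for each (temperature, diet) group, from the data table.
energy = {
    ("18C", "ad"):  [101.19, 76.88, 74.08, 81.40, 66.58, 84.38, 118.95, 118.95],
    ("18C", "res"): [72.60, 70.97, 74.32, 53.02, 46.22, 59.10, 61.79, 61.89, 62.50],
    ("8C", "ad"):   [95.73, 63.95, 144.30, 144.30],
    ("8C", "res"):  [62.69, 54.07, 65.73, 62.98],
}

groups = list(energy.values())
all_obs = [y for g in groups for y in g]
grand_mean = sum(all_obs) / len(all_obs)

# Corrected total SS: variation of every observation about the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
# Error SS: variation of each observation about its own group mean.
ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
# Model SS: whatever the four group means explain.
ss_model = ss_total - ss_error

df_model = len(groups) - 1
df_error = len(all_obs) - len(groups)
f_value = (ss_model / df_model) / (ss_error / df_error)

print(f"Model SS = {ss_model:.2f} on {df_model} df")
print(f"Error SS = {ss_error:.2f} on {df_error} df")
print(f"F = {f_value:.2f}")
```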

No such table is produced by Proc Mixed.

The effect tests table breaks down the whole model test according to the various terms in the model.

Here is the table for the Effect Tests computed by Proc GLM:

Source DF Type III SS Mean Square F Value Pr > F

temp 1 579.080566 579.080566 1.47 0.2395

diet 1 8374.291389 8374.291389 21.19 0.0002

temp*diet 1 711.861727 711.861727 1.80 0.1939

Here is the table for the Effect Tests computed by Proc Mixed. Because the model consists only of fixed effects, the results are identical to those from Proc GLM:

Type 3 Tests of Fixed Effects

Effect     Num DF  Den DF  F Value  Pr > F
temp       1       21      1.47     0.2395
diet       1       21      21.19    0.0002
temp*diet  1       21      1.80     0.1939

The Effect Tests table breaks down the Model line in the whole model test into the components for every term in the model. Some packages give you a choice of effect tests. For example, SAS will print out Type I, II, and III tests. In balanced data these will always be the same and any can be used. In unbalanced data, these three types of tests will give different results; ALWAYS use the Type III (also known as the marginal) tests. As noted earlier, if you are using R, the default tests are what are known as Type I (incremental) tests, and these can be misleading (i.e. they test the wrong hypothesis) in cases of unbalanced data.
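For this particular 2 x 2 layout, the Type III sums of squares can be reproduced by hand, which helps show what hypothesis they test. Each effect has one degree of freedom and corresponds to a contrast with coefficients of +1 or -1 on the four cell means, so every cell mean gets equal weight regardless of its sample size. The Python sketch below is an illustration under that assumption (no empty cells, two levels per factor) and is not the general algorithm used by SAS. The MSE is copied from the Proc GLM table.

```python
# Energy use (kcal/g) for each (temperature, diet) cell, from the data table.
energy = {
    ("18C", "ad"):  [101.19, 76.88, 74.08, 81.40, 66.58, 84.38, 118.95, 118.95],
    ("18C", "res"): [72.60, 70.97, 74.32, 53.02, 46.22, 59.10, 61.79, 61.89, 62.50],
    ("8C", "ad"):   [95.73, 63.95, 144.30, 144.30],
    ("8C", "res"):  [62.69, 54.07, 65.73, 62.98],
}
mse = 395.13495  # error mean square taken from the Proc GLM ANOVA table

cell_mean = {k: sum(v) / len(v) for k, v in energy.items()}
sum_inv_n = sum(1.0 / len(v) for v in energy.values())

# +1/-1 coding of the two levels of each factor.
temp_sign = {"18C": 1, "8C": -1}
diet_sign = {"ad": 1, "res": -1}

contrasts = {
    "temp":      {k: temp_sign[k[0]] for k in energy},
    "diet":      {k: diet_sign[k[1]] for k in energy},
    "temp*diet": {k: temp_sign[k[0]] * diet_sign[k[1]] for k in energy},
}

type3_ss = {}
for effect, coef in contrasts.items():
    estimate = sum(coef[k] * cell_mean[k] for k in energy)
    # SS for a 1-df contrast is L^2 / sum(c^2 / n); with c = +/-1 this is L^2 / sum(1 / n).
    type3_ss[effect] = estimate ** 2 / sum_inv_n

for effect, ss in type3_ss.items():
    print(f"{effect:10s} SS = {ss:9.3f}  F = {ss / mse:5.2f}")
```

The three sums of squares do not add up to the Model sum of squares in the whole-model table because the contrasts are not orthogonal when the cell sizes differ.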

As in the balanced case, start the hypothesis testing with the most complicated terms and work towards simpler terms. In this case, start with the interaction effects.

Our hypotheses are:
H: no interaction among the effects of food and temperature on the mean response.
A: some interaction among the effects of food and temperature on the mean response.

Our test statistic is F = 1.80, and the p-value (0.1939) is not very small. Hence there is no evidence of an interaction among the effects of food and temperature on the mean response. This is somewhat surprising given the profile plots constructed earlier; it may be an artifact of the unequal standard deviations among the groups or because our sample sizes are not very large.

Examining main effects - temperature.

The F-statistic is 1.47 and the p-value is 0.2395. There is no evidence of a difference among the mean energy requirements at the two temperature levels.

This would be confirmed by looking at the estimated marginal means and the estimated difference in the means:

Here are estimates of the marginal means (and standard errors) and the single pairwise difference (it is not necessary to use Tukey’s procedure - why?). These are requested using the LSMeans statement in the GLM procedure. The least-squares means equal the raw sample means ONLY in balanced data. In the case of unbalanced data, as here, the least-squares means seem like a sensible way to estimate marginal means.

temp  energy LSMEAN  Standard Error  H0:LSMEAN=0 Pr > |t|  H0:LSMean1=LSMean2 Pr > |t|
18C   76.3956250     4.8294863       <.0001                0.2395
8C    86.7187500     7.0279349       <.0001

temp energy LSMEAN 95% Confidence Limits

18C 76.395625 66.352158 86.439092

8C 86.718750 72.103359 101.334141


Least Squares Means for Effect temp

i j Difference Between Means Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)

1 2 -10.323125 -28.056590 7.410340

Proc GLM also provides a difference plot.

In this plot, follow the light grey lines from the pair of temperature levels to where they intersect on either a solid blue line or a dashed red line. The confidence interval runs out from this point. If the confidence interval for the difference in the means does NOT intersect the dashed grey line (X = Y), then there is evidence that this difference in the means is not equal to zero (the solid blue lines). If the confidence interval for the difference in the means does intersect the dashed grey line (the dashed red lines), then there is no evidence of a difference in the means.

Finally, Proc GLM can also produce the joined-line plot to make the interpretation easier, as seen in the previous chapters.

Tukey-Kramer Comparison Lines for Least Squares Means of temp

LS-means with the same letter are not significantly different.

energy LSMEAN temp LSMEAN Number

A 86.71875 8C 2

A

A 76.39563 18C 1

The output from Proc Mixed is not quite as extensive and, in my opinion, is organized in a much more logical fashion than in Proc GLM. First, here are the estimates of the marginal means:

Effect  temp  diet  Estimate  Standard Error  DF  t Value  Pr>|t|  Alpha  Lower    Upper
temp    18C         76.3956   4.8295          21  15.82    <.0001  0.05   66.3522  86.4391
temp    8C          86.7188   7.0279          21  12.34    <.0001  0.05   72.1034  101.33

and then the pairwise differences:

temp  diet  _temp  _diet  Estimate  Standard Error  Adjustment    Adj P   Adj Low   Adj Upp
18C         8C            -10.3231  8.5274          Tukey-Kramer  0.2395  -28.0566  7.4103
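The standard error and confidence limits of this difference can be recovered from quantities already shown. Each least-squares mean is the simple average of two cell means, so the difference uses all four cell means with weights of +1/2 or -1/2. The sketch below is illustrative Python, not SAS. The t critical value for 21 degrees of freedom is typed in from a t table rather than computed, and with only two levels the Tukey-Kramer adjustment coincides with the ordinary t interval.

```python
import math

mse = 395.13495            # error mean square from the ANOVA table (21 df)
cell_sizes = [8, 9, 4, 4]  # n for 18C-ad, 18C-res, 8C-ad, 8C-res
ls_mean_18c = 76.395625    # least-squares means from the Proc GLM output
ls_mean_8c = 86.718750
t_crit = 2.0796            # 97.5th percentile of t with 21 df (from a t table)

diff = ls_mean_18c - ls_mean_8c
# Var(diff) = MSE * sum over cells of (1/2)^2 / n
se_diff = math.sqrt(mse / 4 * sum(1 / n for n in cell_sizes))
lower = diff - t_crit * se_diff
upper = diff + t_crit * se_diff

print(f"difference = {diff:.4f}, SE = {se_diff:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```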

The pdmix800 macro is used to produce the joined-line plots:

Effect=temp Method=Tukey-Kramer(P<0.05) Set=2

Obs temp diet Estimate Standard Error Alpha Lower Upper Letter Group

5 8C 86.7188 7.0279 0.05 72.1034 101.33 A

6 18C 76.3956 4.8295 0.05 66.3522 86.4391 A

Here is where the imbalance again causes some difficulty in the analysis. Notice that the least-squares means need no longer equal the raw means. In unbalanced data, simple means across the factors may be affected by unequal sample sizes. The simple mean for the 8°C level would include 4 mice on the restricted diet and 4 mice on the ad libitum diet - an equal split. However, the simple mean for the 18°C level would include 9 mice on the restricted diet and 8 mice on the ad libitum diet - no longer an equal weighting of the two diets. The least-squares means are computed by giving equal weight to each of the two diet means. There is no universal agreement upon this (and you thought that Statistics was so cut and dried) but, in most situations, the estimated marginal means seem preferable.
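The weighting argument can be checked numerically. The following Python sketch (illustrative names, standard library only) computes, for each temperature, the raw mean that pools all mice and the equal-weight mean of the two diet means.

```python
# Energy use (kcal/g) for each (temperature, diet) cell, from the data table.
energy = {
    ("18C", "ad"):  [101.19, 76.88, 74.08, 81.40, 66.58, 84.38, 118.95, 118.95],
    ("18C", "res"): [72.60, 70.97, 74.32, 53.02, 46.22, 59.10, 61.79, 61.89, 62.50],
    ("8C", "ad"):   [95.73, 63.95, 144.30, 144.30],
    ("8C", "res"):  [62.69, 54.07, 65.73, 62.98],
}

raw_mean = {}
ls_mean = {}
for temp in ("18C", "8C"):
    cells = [v for (t, _diet), v in energy.items() if t == temp]
    pooled = [y for cell in cells for y in cell]
    # Raw mean: every mouse counts once, so the larger diet group dominates.
    raw_mean[temp] = sum(pooled) / len(pooled)
    # Equal-weight mean: average the diet means, ignoring the cell sizes.
    ls_mean[temp] = sum(sum(c) / len(c) for c in cells) / len(cells)

for temp in ("18C", "8C"):
    print(f"{temp}: raw mean = {raw_mean[temp]:.2f}, least-squares mean = {ls_mean[temp]:.2f}")
```

At 8°C the two diets have the same number of mice, so the two means coincide. At 18°C the cell sizes differ (8 and 9) and so do the two means.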

Examining main effects - food level.

The ANOVA table gives F = 21.2 and the p-value is 0.0002, which is very small. There is very strong evidence of a difference in the mean energy requirements between the two food levels. Because we only have two levels, it is obvious where the difference lies. But we find the least-squares means and estimated differences for the food levels:

Here are estimates of the marginal means (and standard errors) and the single pairwise difference (it is not necessary to use Tukey’s procedure - why?). These are requested using the LSMeans statement in the GLM procedure. The least-squares means equal the raw sample means ONLY in balanced data. In the case of unbalanced data, as here, the least-squares means seem like a sensible way to estimate marginal means.

diet  energy LSMEAN  Standard Error  H0:LSMEAN=0 Pr > |t|  H0:LSMean1=LSMean2 Pr > |t|
ad    101.185625     6.086370        <.0001                0.0002
res   61.928750      5.972596        <.0001

diet energy LSMEAN 95% Confidence Limits

ad 101.185625 88.528325 113.842925

res 61.928750 49.508056 74.349444

Least Squares Means for Effect diet

i j Difference Between Means Simultaneous 95% Confidence Limits for LSMean(i)-LSMean(j)

1 2 39.256875 21.523410 56.990340

Proc GLM also provides a difference plot.

In this plot, follow the light grey lines from the pair of diet levels to where they intersect on either a solid blue line or a dashed red line. The confidence interval runs out from this point. If the confidence interval for the difference in the means does NOT intersect the dashed grey line (X = Y), then there is evidence that this difference in the means is not equal to zero (the solid blue lines). If the confidence interval for the difference in the means does intersect the dashed grey line (the dashed red lines), then there is no evidence of a difference in the means.

Finally, Proc GLM can also produce the joined-line plot to make the interpretation easier, as seen in the previous chapters.

Tukey-Kramer Comparison Lines for Least Squares Means of diet

LS-means with the same letter are not significantly different.

energy LSMEAN diet LSMEAN Number

A 101.1856 ad 1

B 61.9288 res 2

The output from Proc Mixed is not quite as extensive and, in my opinion, is organized in a much more logical fashion than in Proc GLM. First, here are the estimates of the marginal means:

Effect  temp  diet  Estimate  Standard Error  DF  t Value  Pr>|t|  Alpha  Lower    Upper
diet          ad    101.19    6.0864          21  16.62    <.0001  0.05   88.5283  113.84
diet          res   61.9287   5.9726          21  10.37    <.0001  0.05   49.5081  74.3494

and then the pairwise differences:

temp  diet  _temp  _diet  Estimate  Standard Error  Adjustment    Adj P   Adj Low  Adj Upp
      ad           res    39.2569   8.5274          Tukey-Kramer  0.0002  21.5234  56.9903

The pdmix800 macro is used to produce the joined-line plots:

Effect=diet Method=Tukey-Kramer(P<0.05) Set=3

Obs temp diet Estimate Standard Error Alpha Lower Upper Letter Group

7 ad 101.19 6.0864 0.05 88.5283 113.84 A

8 res 61.9287 5.9726 0.05 49.5081 74.3494 B

Again, notice that the estimated marginal means are different from the raw means. This will usually happen when the design is unbalanced. The estimated marginal means give equal weight to the two temperature levels, while the raw means weight the two temperature levels according to the observed sample sizes.

Don’t forget to examine the residual and other diagnostic plots (often produced automatically by packages). Here is the diagnostic plot from Proc GLM:


Here is the diagnostic plot from Proc Mixed:

There is no evidence of any gross problems in the analysis. There is some evidence that the residuals are not equally variable, but the difference in variance is not large.

10.7.6 Adjusting for unequal variances?

This is beyond the scope of this course, but a formal test for unequal variances showed clear evidence of a problem. A more exact test, fortunately, came to similar conclusions, but the estimated standard error of the difference is slightly different.

10.8 Example: Use-Dependent Inactivation in Sodium Channel Beta Subunit Mutation - BPK

This dataset was provided by Csilla Egri as part of a 2011 M.Sc. Thesis from the Department of Biomedical Physiology & Kinesiology at Simon Fraser University, Burnaby, BC. “C121W: A thermosensitive sodium channel mutation”. Additional details are available at http://dx.doi.org/10.1016/j.bpj.2010.12.2506.

10.8.1 Introduction

Voltage gated sodium (NaV) channels are macromolecular complexes which pass sodium specific inward current and are the main determinants of action potential initiation and propagation. NaV channels normally associate with one or more auxiliary beta subunits which modify voltage dependent properties. Mutations to these proteins can cause epilepsy, cardiac arrhythmias, and skeletal muscle disorders.

The research question is whether temperature exacerbates the functional effect of an epilepsy-causing sodium channel beta subunit mutation (C121W).

10.8.2 Experimental protocol

The experiment was conducted using whole-cell voltage clamp experiments on CHO cells expressing sodium channel proteins with or without auxiliary beta subunits. On each day, a different batch of cells was used, and the voltage dependent properties were determined at one of two temperatures (22°C or 34°C). Three sodium channels were examined:

• NaV1.2 Sodium channel without associated subunit which served as a negative control.

• NaV1.2 + B(WT) is a control

• NaV1.2 + B(CW) is the experimental group where the mutation in the beta subunit causes genetic epilepsy with febrile seizures plus, in which patients experience seizures in response to fever.

The Use-dependent inactivation (UDI) was measured for each batch of cells.

10.8.3 Analysis

The raw data are available in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-Ecology-Datasets. The csv file named UDI.csv is read into SAS in the usual way:

data UDI;
   infile 'UDI.csv' dlm=',' dsd missover firstobs=2;
   length subunit $20;
   input Subunit $ Temp UDI_Yo;
run;

proc print data=UDI (obs=15);
   title2 'Part of the raw data';
run;

The first few lines of the raw data are shown below:

Obs subunit Temp UDI_Yo trt

1 Nav1.2 22 0.31 22.Nav1.2

2 Nav1.2 22 0.10 22.Nav1.2

3 Nav1.2 22 0.17 22.Nav1.2

4 Nav1.2 22 0.27 22.Nav1.2

5 Nav1.2 22 0.43 22.Nav1.2

6 Nav1.2 22 0.39 22.Nav1.2

7 Nav1.2 + Beta 22 0.68 22.Nav1.2 + Beta

8 Nav1.2 + Beta 22 0.51 22.Nav1.2 + Beta

9 Nav1.2 + Beta 22 0.27 22.Nav1.2 + Beta

10 Nav1.2 + Beta 22 0.19 22.Nav1.2 + Beta

11 Nav1.2 + Beta 22 0.12 22.Nav1.2 + Beta

12 Nav1.2 + Beta 22 0.41 22.Nav1.2 + Beta

13 Nav1.2 + Beta 22 0.21 22.Nav1.2 + Beta

14 Nav1.2 + Beta 22 0.44 22.Nav1.2 + Beta

15 Nav1.2 + Beta(CW) 22 0.19 22.Nav1.2 + Beta(CW)

We begin by examining the experimental protocol to determine the appropriate analysis for this experiment.

Treatment structure. There are two factors for this experiment (subunit type with 3 levels and temperature with 2 levels). All subunits were tested at both temperatures. This is a factorial experiment. We are interested only in the levels tested of each factor, so both factors are fixed effects.

Experimental structure. A separate batch of cells was tested on each day and each batch of cells was only tested once. The batch of cells is the experimental unit.

Randomization structure. Each batch of cells of a subunit type was randomized (we hope) to a separate temperature and day to be tested. There was no blocking by week.

Consequently, this experiment appears to be a two-factor completely randomized design (Two-factor CRD).

Before performing any analysis, it is always wise to examine the data to see if the assumptions required for the analysis are approximately satisfied, to search for outliers or unusual points, and to check for any data anomalies.

We begin by examining the sample sizes, means, and standard deviations of each treatment (combination of subunit and temperature) group.

The sample size, mean, and standard deviation for each treatment group are found using Proc Tabulate:

proc tabulate data=UDI;
   class Temp Subunit;
   var UDI_Yo;
   table temp*subunit, UDI_Yo*(n*f=4.0 mean*f=7.2 std*f=7.2);
run;

                               UDI_Yo
Temp  subunit               N   Mean    Std
22    Nav1.2                6   0.28   0.13
22    Nav1.2 + Beta         8   0.35   0.19
22    Nav1.2 + Beta(CW)     6   0.28   0.08
34    Nav1.2                5   0.54   0.16
34    Nav1.2 + Beta         5   0.47   0.09
34    Nav1.2 + Beta(CW)     7   0.62   0.09

The design is unbalanced (the number of batches of cells under each treatment combination is unequal). CAUTION: The analysis of unbalanced data can be tricky, as different packages make different assumptions about how to test the main effects and interaction.

The standard deviations are all approximately equal, which appears to satisfy the assumption of equal variances for each treatment group. A plot of the standard deviation of each treatment group versus the mean of each treatment group also doesn’t show any strong relationship between the standard deviation and the mean [6].

We also construct side-by-side dot plots by constructing a pseudo-factor from the combination of the two factor levels. We create a pseudo-factor by concatenating the values of the Temperature and Subunit variables together and then do a dot plot:

/* Side-by-side dot plots to check for outliers */
/* Create the pseudo factor */
data UDI;
   set UDI;
   length trt $20;
   trt = put(temp,2.0) || '.' || subunit;
run;

proc sgplot data=UDI;
   title2 'Side-by-side dot plots';
   yaxis label='UDI' offsetmin=.05 offsetmax=.05;
   xaxis label='Treatment' offsetmin=.05 offsetmax=.05;
   scatter x=trt y=UDI_yo / markerattrs=(symbol=circlefilled);
run;

[6] Often the standard deviation increases with the mean, which would imply that some transformation (usually a log() transformation) should be used.

The dot plot doesn’t show any obvious outliers.

The assumption of normality will be checked after we fit the model. It is not correct to simply check if the pooled data (over all treatment groups) come from a single normal distribution, as each of the treatment groups has a different mean.

The assumption of independence needs to be assessed based on the experimental protocol. Separate batches of cells were used, so there should be no “batch effects” that may cause a similar response over the experimental conditions.

It appears that all the assumptions required for this design are satisfied. We now proceed to the formal analysis. We start by developing the formal statistical model for this experiment. This model has terms corresponding to the factors (and their interaction if the experiment is factorial), the experimental units (the smallest size of experimental unit is always implicit), and the randomization structure (we always assume complete randomization at this time).

Using a standard shorthand syntax, the model is

UDI = Temp Subunit Temp ∗ Subunit

The UDI represents the variation in the response variable (the UDI value). Some of the variation is explained by the factor (and their interaction) effects (the Temp, Subunit, and Temp∗Subunit terms). Because there is only one size of experimental unit (the batch of cells), the random batch-to-batch variation is implicit (and does not formally enter the model). Finally, complete randomization occurred during the experiment, so no extra terms are needed.

The above shorthand notation is commonly used in many computer packages as noted below.
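Written out in conventional notation, the shorthand corresponds to the model sketched below. The symbols are not used in the notes and are introduced here only for illustration: i indexes the two temperatures, j the three subunit types, and k the batches of cells within a treatment combination. The error term carries the implicit batch-to-batch variation.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2) \ \text{independently}
```

Here \alpha_i is the Temp effect, \beta_j the Subunit effect, and (\alpha\beta)_{ij} the Temp∗Subunit interaction.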

Either Proc GLM or Proc Mixed can be used to analyze these data. The code sequences are shown below.

ods graphics on;
proc glm data=UDI plots=diagnostics;
   title2 'GLM analysis';
   class temp subunit;
   model UDI_Yo = temp subunit temp*subunit;
   lsmeans temp / stderr pdiff cl adjust=tukey lines;
   lsmeans subunit / stderr pdiff cl adjust=tukey lines;
   lsmeans temp*subunit / stderr pdiff cl adjust=tukey lines;
   /* Nav1.2 + Beta minus Nav1.2 + Beta(CW) at 34; the E option prints the coefficients */
   estimate 'CW vs reg at 34'
      subunit 0 1 -1
      subunit*temp 0 0 0 0 1 -1 / e;
   ods output ModelAnova=GLMModelAnova;
run;
ods graphics off;

ods graphics on;
proc mixed data=UDI plot=residualpanel;
   title2 'Mixed analysis';
   class temp subunit;
   model UDI_Yo = temp subunit temp*subunit / ddfm=kr;
   lsmeans temp / diff adjust=tukey;
   lsmeans subunit / diff adjust=tukey;
   lsmeans temp*subunit / diff adjust=tukey;
   /* Same contrast as in the GLM step */
   estimate 'CW vs reg at 34'
      subunit 0 1 -1
      subunit*temp 0 0 0 0 1 -1 / e;
   ods output tests3=MixedTests;
   ods output lsmeans=MixedLSMeans;
   ods output diffs=MixedDiffs;
run;
ods graphics off;

Selected portions of the output are shown below and for the remainder of the document. You should run the SAS programs available from the Sample Program Library to get the complete listing.

Effect        Num DF  Den DF  F Value  Pr > F
Temp               1      31    29.41  <.0001
subunit            2      31     0.41  0.6676
Temp*subunit       2      31     2.26  0.1212

Dependent  Hypothesis Type  Source        DF  Type I SS   Mean Square  F Value  Pr > F
UDI_Yo     3                Temp           1  0.52095609  0.52095609     29.41  <.0001
UDI_Yo     3                subunit        2  0.01450323  0.00725162      0.41  0.6676
UDI_Yo     3                Temp*subunit   2  0.08008009  0.04004004      2.26  0.1212

Start by examining the effect tests for the interaction terms. The p-value for the interaction effect is 0.12, indicating no evidence of an interaction effect between the two factors of the experiment. This is consistent with the responses to the two temperatures being parallel over the different sub-unit types, i.e. the difference in mean UDI between the two temperature levels being the same for all sub-units.

Because there was no evidence of an interaction, it is sensible to look at the main effects. Had there been strong evidence of an interaction, it may not be sensible to look at the main effects.

The p-value for the effect of sub-unit is 0.67, indicating no evidence of a difference in mean UDI among the three sub-units when averaged across both temperatures. The p-value for the effect of temperature is < .0001, indicating strong evidence of a temperature effect.

Diagnostic plots for assessing normality and the influence of individual observations are provided in the output generated by ODS GRAPHICS. Only the output from GLM is shown:

There is no evidence of a problem with the model fit.

The profile plot shows a separation in the profiles between the two temperatures, as expected from the results of the effect tests, but the variation in response is sufficient to “hide” any non-parallelism (the interaction) that may be present. It is a pity that the default output from most packages doesn’t add standard error (or confidence interval) bars to the above plot.

Examine carefully the estimated population marginal means for each level of temperature, subunit, or the treatment combination (not all of which may be shown below):

Effect        Temp  subunit            Estimate  Standard Error  DF  t Value  Pr > |t|
Temp          22                         0.3040         0.03003  31    10.12  <.0001
Temp          34                         0.5448         0.03269  31    16.67  <.0001
subunit       _     Nav1.2               0.4072         0.04029  31    10.10  <.0001
subunit       _     Nav1.2 + Beta        0.4139         0.03794  31    10.91  <.0001
subunit       _     Nav1.2 + Beta(CW)    0.4521         0.03702  31    12.21  <.0001
Temp*subunit  22    Nav1.2               0.2783         0.05433  31     5.12  <.0001
Temp*subunit  22    Nav1.2 + Beta        0.3538         0.04705  31     7.52  <.0001
Temp*subunit  22    Nav1.2 + Beta(CW)    0.2800         0.05433  31     5.15  <.0001
Temp*subunit  34    Nav1.2               0.5360         0.05952  31     9.01  <.0001
Temp*subunit  34    Nav1.2 + Beta        0.4740         0.05952  31     7.96  <.0001
Temp*subunit  34    Nav1.2 + Beta(CW)    0.6243         0.05030  31    12.41  <.0001

Because the design is NOT balanced, the marginal means (known as least-squares means, or LSMEANS, in SAS lingo, and referred to as emmeans elsewhere in these notes) are not equal to the corresponding raw means computed using the data. These marginal estimates are found by first computing the average of each combination of subunit and temperature, and then averaging these averages across the three subunits for each temperature level (and, similarly, across the two temperature levels for each subunit).
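The averaging described above can be checked by hand. The following Python sketch (not part of the SAS program; the variable and function names are made up for illustration) takes the six cell means and sample sizes from the output tables and compares the equal-weight marginal mean with the sample-size-weighted raw mean at each temperature. Because the inputs are rounded to four decimals, the results agree with the SAS output only to about that precision.

```python
# Cell means and sample sizes copied from the SAS output tables above.
cell_means = {
    (22, "Nav1.2"): 0.2783, (22, "Nav1.2 + Beta"): 0.3538, (22, "Nav1.2 + Beta(CW)"): 0.2800,
    (34, "Nav1.2"): 0.5360, (34, "Nav1.2 + Beta"): 0.4740, (34, "Nav1.2 + Beta(CW)"): 0.6243,
}
cell_n = {
    (22, "Nav1.2"): 6, (22, "Nav1.2 + Beta"): 8, (22, "Nav1.2 + Beta(CW)"): 6,
    (34, "Nav1.2"): 5, (34, "Nav1.2 + Beta"): 5, (34, "Nav1.2 + Beta(CW)"): 7,
}


def marginal_mean(temp):
    """Unweighted average of the cell means at one temperature."""
    values = [m for (t, _), m in cell_means.items() if t == temp]
    return sum(values) / len(values)


def raw_mean(temp):
    """Average of all observations at one temperature (cell means weighted by n)."""
    keys = [k for k in cell_means if k[0] == temp]
    total = sum(cell_means[k] * cell_n[k] for k in keys)
    return total / sum(cell_n[k] for k in keys)


for temp in (22, 34):
    print(temp, round(marginal_mean(temp), 4), round(raw_mean(temp), 4))
```

The marginal means reproduce the 0.3040 and 0.5448 reported for the Temp effect, while the raw means (about 0.309 and 0.554) are pulled toward whichever subunit happened to have more batches.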

Because the interaction term was not statistically significant, the effect of temperature could be constant for all subunits, resulting in two parallel lines. The estimated temperature effect (the average difference between the means at the two temperature levels) is taken from the differences in the marginal means, which the LSMEANS statement also estimates:

Effect        Temp  subunit            _Temp  _subunit           Est       SE       Adj P
Temp          22                       34                        -0.2407   0.04439  <.0001
subunit       _     Nav1.2             _      Nav1.2 + Beta      -0.00671  0.05534  0.9919
subunit       _     Nav1.2             _      Nav1.2 + Beta(CW)  -0.04498  0.05472  0.6924
subunit       _     Nav1.2 + Beta      _      Nav1.2 + Beta(CW)  -0.03827  0.05301  0.7525
Temp*subunit  22    Nav1.2             22     Nav1.2 + Beta      -0.07542  0.07188  0.8972
Temp*subunit  22    Nav1.2             22     Nav1.2 + Beta(CW)  -0.00167  0.07684  1.0000
Temp*subunit  22    Nav1.2             34     Nav1.2             -0.2577   0.08059  0.0343
Temp*subunit  22    Nav1.2             34     Nav1.2 + Beta      -0.1957   0.08059  0.1779
Temp*subunit  22    Nav1.2             34     Nav1.2 + Beta(CW)  -0.3460   0.07404  0.0007
Temp*subunit  22    Nav1.2 + Beta      22     Nav1.2 + Beta(CW)   0.07375  0.07188  0.9055
Temp*subunit  22    Nav1.2 + Beta      34     Nav1.2             -0.1823   0.07587  0.1867
Temp*subunit  22    Nav1.2 + Beta      34     Nav1.2 + Beta      -0.1202   0.07587  0.6141
Temp*subunit  22    Nav1.2 + Beta      34     Nav1.2 + Beta(CW)  -0.2705   0.06888  0.0054
Temp*subunit  22    Nav1.2 + Beta(CW)  34     Nav1.2             -0.2560   0.08059  0.0360
Temp*subunit  22    Nav1.2 + Beta(CW)  34     Nav1.2 + Beta      -0.1940   0.08059  0.1849
Temp*subunit  22    Nav1.2 + Beta(CW)  34     Nav1.2 + Beta(CW)  -0.3443   0.07404  0.0008
Temp*subunit  34    Nav1.2             34     Nav1.2 + Beta       0.06200  0.08417  0.9757
Temp*subunit  34    Nav1.2             34     Nav1.2 + Beta(CW)  -0.08829  0.07793  0.8639
Temp*subunit  34    Nav1.2 + Beta      34     Nav1.2 + Beta(CW)  -0.1503   0.07793  0.4048

The estimated difference in the mean UDI between the two temperature levels (averaged across all three sub-unit types) is 0.24 (SE 0.044), with the higher mean at 34°C.
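As a rough check on the reported standard error, the sketch below (Python, with made-up variable names) combines the two marginal means and their standard errors from the output. It assumes the two marginal means are independent, which is reasonable here because they are built from different batches of cells in a completely randomized design.

```python
import math

# Marginal means and standard errors for Temp, copied from the output above.
mean_22, se_22 = 0.3040, 0.03003
mean_34, se_34 = 0.5448, 0.03269

# Difference (22 minus 34) and its standard error under independence.
diff = mean_22 - mean_34
se_diff = math.hypot(se_22, se_34)

# With a single degree of freedom, the squared t-ratio should be close
# to the F-statistic reported for Temp in the effect tests.
t_stat = diff / se_diff

print(round(diff, 4), round(se_diff, 5), round(t_stat ** 2, 1))
```

The standard error agrees with the 0.04439 in the table, and the squared t-ratio is within rounding of the F-value of 29.41 for temperature.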

A specialized comparison of the NaV1.2+B(CW) and NaV1.2+B sub-unit types at the elevated temperature can be formed using a contrast.

The ESTIMATE statement in the previous program estimates this contrast. Note that you need to know the order in which SAS has stored the factor levels and the parameterization used to ensure that the correct coefficients are being used. Please see me for more details.

The Mixed Procedure

Coefficients for CW vs reg at 34

Effect        subunit            Temp  Row1
Intercept
Temp                             22
Temp                             34
subunit       Nav1.2
subunit       Nav1.2 + Beta              1
subunit       Nav1.2 + Beta(CW)         -1
Temp*subunit  Nav1.2             22
Temp*subunit  Nav1.2 + Beta      22
Temp*subunit  Nav1.2 + Beta(CW)  22
Temp*subunit  Nav1.2             34
Temp*subunit  Nav1.2 + Beta      34      1
Temp*subunit  Nav1.2 + Beta(CW)  34     -1

Estimates

Label            Estimate  Standard Error  DF  t Value  Pr > |t|
CW vs reg at 34   -0.1503         0.07793  31    -1.93    0.0630

This indicates that there is no evidence that the mean UDI at 34°C for the Nav1.2+Beta(CW) sub-unit type is different from the mean UDI at 34°C for the Nav1.2+Beta sub-unit type, because the estimated difference (Nav1.2+Beta minus Nav1.2+Beta(CW)) is -0.15 (SE 0.08, p = .063).
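On the scale of the six treatment means, the coefficients listed above collapse to +1 for Nav1.2 + Beta at 34 and -1 for Nav1.2 + Beta(CW) at 34. The Python sketch below (names are illustrative, not from the SAS program) applies that coefficient vector to the cell means and cell standard errors from the earlier output, again assuming the cell means are independent.

```python
import math

# Cell means and standard errors in the order SAS stores them:
# Temp 22 x (Nav1.2, Nav1.2 + Beta, Nav1.2 + Beta(CW)), then Temp 34 x (same).
cell_means = [0.2783, 0.3538, 0.2800, 0.5360, 0.4740, 0.6243]
cell_se = [0.05433, 0.04705, 0.05433, 0.05952, 0.05952, 0.05030]

# Contrast on the cell-mean scale: Beta at 34 minus Beta(CW) at 34.
coef = [0, 0, 0, 0, 1, -1]

estimate = sum(c * m for c, m in zip(coef, cell_means))
std_err = math.sqrt(sum((c * s) ** 2 for c, s in zip(coef, cell_se)))
t_value = estimate / std_err

print(round(estimate, 4), round(std_err, 5), round(t_value, 2))
```

The estimate, standard error, and t-value match the Estimates table; the p-value additionally requires the t-distribution with 31 degrees of freedom and is not computed here.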

Other contrasts can be explored in a similar fashion.

The original research question was “if temperature exacerbates the functional effect of an epilepsy causing sodium channel beta subunit mutation (C121W).” If temperature exacerbated the effect of the mutation, then the profiles would not be parallel. As there was no evidence of non-parallelism in the ANOVA table, there is no evidence against the hypothesis of parallelism, i.e. no evidence that temperature changes the effect of the mutation. Similarly, there was no evidence that the mean response at 34°C differed between the two non-control sub-units (the final contrast).

10.9 Blocking in two-factor CRD designs

This twist on the design poses no real problems. Things to look out for:

• the randomization within each block is done independently of every other block

• the blocks must be complete, i.e., every treatment combination must appear in every block.

• when analyzing the data, specify factors as fixed or random as well as blocks as fixed or random.

• as before, random blocks or random factors imply that some packages may give you incorrect results.

• random blocks or random factors imply that some of the packages will have difficulty in estimating the se of the marginal means and of the contrasts among the means.

The statistical model for this experiment will be

Y = Block Factor1 Factor2 Factor1 ∗ Factor2

The blocking variables can be declared as a fixed or random effect. If blocks are complete, i.e. every block has every treatment combination, then it makes no difference to the analysis if blocks are treated as fixed or random effects. If blocks are incomplete, e.g. due to missing values or deliberate action, then a model with random blocks can recover additional information – see the chapter on Incomplete Block Designs.

An example of such a design will be done in an assignment.

10.10 FAQ

10.10.1 How to determine sample size in two-factor designs

How are sample sizes determined in two-factor experiments?

In a later chapter, you will analyze in detail an experiment to investigate ways of warming people suffering from hypothermia.

Suppose that literature reviews have shown that in past experiments, the standard deviation in the time needed to rewarm bodies was around 10 minutes. We are interested in detecting differences of about 10 minutes in the mean time needed to rewarm bodies among the three methods and between the two sexes. What sample sizes would be needed for a CRD?

First, compute the sample size required for each of the two factors.

When examining sex effects, we find that the standard deviation is about 10, the difference to detect is about 10, and a total sample size of about 34 is required for 80% power at α = 0.05.

When examining method effects, we find that again the standard deviation is about 10, the difference between the smallest and largest mean is set to 10 while the third mean is placed in the middle, and a total sample size of about 60 is needed for 80% power at α = 0.05.

The two results must be reconciled. If you must obtain the desired power for both factors, then you must use the larger sample size, i.e., about 10 subjects per treatment combination (60 subjects spread over the 2 × 3 = 6 treatment combinations).

If this is too costly, you will have to make compromises. For example, you could choose a sample size of 50, which would give you the desired power for detecting sex effects, but not method effects.
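The reconciliation can be sketched numerically. The Python fragment below uses the textbook normal-approximation formula for comparing two means, which is an assumption made here for illustration and is not necessarily how the figures above were obtained. The approximation gives slightly fewer subjects than the total of about 34 quoted for sex effects, because t-based calculations add roughly one subject per group. The figure of 60 for method effects is taken from the text rather than recomputed, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Sex effect: detect a difference of 10 when the standard deviation is 10.
n_sex = n_per_group_two_means(delta=10, sigma=10)
total_for_sex = 2 * n_sex

# Method effect: total sample size quoted in the text (not recomputed here).
total_for_method = 60

# To reach the target power for both factors, use the larger total and
# spread it over the 2 sexes x 3 methods = 6 treatment combinations.
total_needed = max(total_for_sex, total_for_method)
per_cell = math.ceil(total_needed / (2 * 3))

print(n_sex, total_for_sex, total_needed, per_cell)
```

Under these assumptions the binding requirement is the method effect, which leads to the 10 subjects per treatment combination stated above.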

10.10.2 What is the difference between a ‘block’ and a ‘factor’?

Blocks are typically not manipulated but rather are a collection of experimental units that are similar. You usually assume that blocking effects don’t interact with factor effects. Usually, blocks are not assigned to experimental units - the experimental units are conveniently grouped into blocks. Blocks are "passive".

A factor has levels that are manipulated and randomized over the experimental units. Factors are usually assigned to experimental units. [Of course in analytical surveys, units are randomly selected from each level rather than being assigned to levels.] There is usually no natural grouping of experimental units to factor levels. Factors are "active".

In the case of the rancid fat, laboratories were assigned at random to irradiate the experimental units – the batches of fat. There is no natural grouping of batches of fat with the laboratories. Consequently, laboratories were treated as a factor rather than a block.

In the case of the mosquito repellent, people were randomly assigned to locations. There were no natural groups of people at each location. Location was again treated as a factor rather than a block.

In the case of seedling growth, the blocks were locations around the province. At each location, there were several 1 ha plots. The plots were grouped naturally by location. Hence, location should be treated as a block rather than as a factor.

In some cases, the distinction is not as clear cut. The computer packages will give the same results whether a term is treated as a factor or a block, so the final results will be the same. The only real reason to distinguish carefully between blocks and factors is for interpreting the results. Usually, blocks are not randomized, so tests for "block" effects don’t make much sense.

10.10.3 If there is evidence of an interaction, does the analysis stop there?

In the case of an interaction being detected, first examine if a transformation would remove the interaction. For example, if the factor operates multiplicatively (it reduces yield by 1/2) rather than additively (it reduces yield by 50 kg/ha), a log-transform would remove interaction effects.

For example, consider an experiment with two factors, one with three levels and the second with two levels. Table 10.84 shows population means under the assumption of additivity.

Table 10.84: Population means under assumption of additivity between factors

             Factor A
Factor B     a1    a2    a3
b1           10    20    15
b2           35    45    40

Notice that the effect of going from a1 to a2 is the same for both levels of B. Similarly, the effect of going from b1 to b2 is the same for all levels of A. Note that the above table refers to POPULATION means - the sample means may not enjoy this strict additivity as an artifact of the sampling process.

There are two ways in which additivity can fail. First, the units may be measured on the wrong scale. Consider Table 10.85 of means where additivity does not hold:

Table 10.85: Population means when the assumption of additivity between factors is false but correctable

             Factor A
Factor B     a1    a2    a3
b1           10    20    15
b2          100   200   150

In Table 10.85, the treatment effects of each factor are not the same across all levels of the other factor, but notice that the mean for level a2 is always twice the mean for level a1 regardless of the level of B, and that the mean for level b1 is always one-tenth of the mean for level b2 regardless of the level of A. This suggests that the effects of each factor are multiplicative rather than additive, and that the analysis should proceed on the log-scale. Indeed, consider the same values in Table 10.86 after a log-transformation [7]:

Table 10.86: Population means when the assumption of additivity among factors is false but correctable. A log-transform is applied.

             Factor A
Factor B     a1     a2     a3
b1           2.30   3.00   2.71
b2           4.61   5.30   5.01

The treatment effects in Table 10.86 are now additive on the log-scale (how can you tell?).
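One way to answer that question is to compare the b2 minus b1 differences across the levels of Factor A: under additivity they are all equal. The Python sketch below (the helper name is made up for illustration) applies this check to the population means of Tables 10.84 and 10.85, and to the natural logarithms of Table 10.85.

```python
import math


def interaction_spread(table):
    """Range of the (b2 - b1) differences across the levels of A.

    A value of zero (up to rounding) means the two factors are additive.
    """
    diffs = [b2 - b1 for b1, b2 in zip(table[0], table[1])]
    return max(diffs) - min(diffs)


# Rows are b1 and b2; columns are a1, a2, a3.
additive = [[10, 20, 15], [35, 45, 40]]           # Table 10.84
multiplicative = [[10, 20, 15], [100, 200, 150]]  # Table 10.85
logged = [[math.log(v) for v in row] for row in multiplicative]

print(interaction_spread(additive))
print(interaction_spread(multiplicative))
print(interaction_spread(logged))
```

The spread is zero for Table 10.84, large for Table 10.85, and zero up to floating-point error after taking logarithms, because every b2 mean is ten times the corresponding b1 mean and so every difference on the log-scale equals ln(10).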

In some cases, no transformation will correct non-additivity among factors, as illustrated in Table 10.87.

Table 10.87: Population means when the assumption of additivity among factors is false and not correctable.

             Factor A
Factor B     a1    a2    a3
b1           10    20    15
b2           35    30    25

Notice that in Table 10.87 the response in going from a1 to a2 is different depending upon the level of B, but there is no simple pattern to the means.

If evidence of interaction is still present, then it really doesn’t make much sense to test main effects. An interaction between two factors indicates that the effect of a factor changes depending on the level of the other factor. The test for the main effect in the presence of an interaction would examine if the average effect exists - this may not bear any relevance to the individual effects.

At this point, you should examine the individual treatment combinations to see which treatments appear to differ from other treatments.

[7] Either the ln or log transformation can be used. The results will differ by a constant multiplier.

10.10.4 When should you use raw means or emmeans?

When should raw means be used and when should emmeans be used?

The difference between emmeans and raw means (ordinary means) is important whenever you "average" across units with unequal sample sizes.

Sub-sampling example: For example, suppose you have 2 fish, one with 2 sub-samples and the other with 5 sub-samples, with the following made-up data for the fat concentration in tissues:

Fish 1   2 4

Fish 2   7 8 9 10 11

The raw mean is computed as $(2 + 4 + 7 + 8 + 9 + 10 + 11)/7 = 7.3$. The emmeans is computed as the average of the averages:

$$\frac{\frac{2+4}{2} + \frac{7+8+9+10+11}{5}}{2} = \frac{3+9}{2} = 6.$$

Which is a better representation of the average fat concentration across fish? In this case it seems reasonable that each fish should be given equal "weight" in the averaging process, so the emmeans is likely a better estimate.
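The two kinds of average can be reproduced in a few lines of Python. This is only an illustrative sketch with made-up variable names, using the values that appear in the calculation above (2, 4 for the first fish and 7, 8, 9, 10, 11 for the second).

```python
# Made-up fat concentrations: sub-samples nested within fish.
fish = {"Fish 1": [2, 4], "Fish 2": [7, 8, 9, 10, 11]}

# Raw mean: every sub-sample gets equal weight.
all_readings = [x for readings in fish.values() for x in readings]
raw_mean = sum(all_readings) / len(all_readings)

# Average of averages: every fish gets equal weight.
fish_means = [sum(readings) / len(readings) for readings in fish.values()]
mean_of_means = sum(fish_means) / len(fish_means)

print(round(raw_mean, 2), mean_of_means)
```

The raw mean (51/7, about 7.3) is pulled toward the fish with more sub-samples, while the average of averages is 6.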

Now suppose rather than fish 1 and fish 2, you had one reading from each of 2 male fish and one reading from each of 5 female fish:

Male 2 4

Female 7 8 9 10 11

What is the best estimate of the average fat concentration over males and females? Here it is not so clear. If the sex ratio in the population is 50:50, then the emmeans are appropriate, as the sample size just reflects that females were easier to catch.

But if the two sexes were equally catchable, then the different sample size for the two sexes is a reflection of an unequal sex ratio, so the population average has to reflect the different sex ratio and the raw means may be more appropriate.

But what if the sex ratio was not 50:50 and males/females were not equally catchable? Then again a different weighting would be needed.

Two-factor example: Consider the following (made up) data from a two-factor completely randomized design where factor A has 2 levels, and factor B has 3 levels:

                        Factor B
                b1         b2        b3
            +----------+---------+-------------+
Factor A a1 | 1, 3, 5  | 7, 9    | 12          |
            +----------+---------+-------------+
         a2 | 5        | 9, 11   | 13, 14, 15  |
            +----------+---------+-------------+

We wish to estimate the effect of Factor A. For each level of B, the mean of level a2 is 2 more than the mean of level a1. For example, the mean of treatment a1,b1 is 3 and the mean of treatment a2,b1 is 5.

The two raw means for Factor A are computed as

$$\bar{Y}_{a1,\text{raw}} = \frac{1 + 3 + 5 + 7 + 9 + 12}{6} = 6.17$$

$$\bar{Y}_{a2,\text{raw}} = \frac{5 + 9 + 11 + 13 + 14 + 15}{6} = 11.17$$

and the difference in the raw means is not consistent with the effect of Factor A in the table!

Again, what has happened is that the raw mean for level a1 of Factor A is more heavily weighted towards level b1 of Factor B, while the raw mean for level a2 is more heavily weighted towards level b3 of Factor B.

In many cases, it seems sensible to give equal weight to all of the levels of the factor you are averaging across. As in the previous example, we take the averages of the averages to get:

$$\bar{Y}_{a1,\text{emmeans}} = \frac{3 + 8 + 12}{3} = 7.67$$

$$\bar{Y}_{a2,\text{emmeans}} = \frac{5 + 10 + 14}{3} = 9.67$$

and now the difference in the least-square means reflects the effect of Factor A seen in the table.
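The same comparison for the two-factor table can be sketched in Python. The function names are made up for illustration; the data are the made-up values from the table above.

```python
from statistics import fmean

# Made-up observations for each (Factor A, Factor B) treatment combination.
data = {
    ("a1", "b1"): [1, 3, 5], ("a1", "b2"): [7, 9],  ("a1", "b3"): [12],
    ("a2", "b1"): [5],       ("a2", "b2"): [9, 11], ("a2", "b3"): [13, 14, 15],
}


def raw_mean(a_level):
    """Mean of all observations at one level of A, ignoring Factor B."""
    pooled = [y for (a, _), ys in data.items() if a == a_level for y in ys]
    return fmean(pooled)


def equal_weight_mean(a_level):
    """Average of the cell means at one level of A (equal weight to b1, b2, b3)."""
    return fmean(fmean(ys) for (a, _), ys in data.items() if a == a_level)


raw_diff = raw_mean("a2") - raw_mean("a1")
equal_weight_diff = equal_weight_mean("a2") - equal_weight_mean("a1")

print(round(raw_diff, 2), round(equal_weight_diff, 2))
```

The difference in raw means is 5, whereas the difference in the equally weighted means recovers the effect of 2 that is built into every column of the table.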

While most computer packages automatically give equal weight to the levels of the other factors when finding main effects (or any other effect when you have to average over the levels of other factors), this is NOT always necessarily the correct thing to do.

Perhaps the sample sizes in the above table actually reflect the different population frequencies, i.e. there are many more population units in treatment a1, b1 than in a1, b3. If this is correct, then you will need to use a weighted average of averages to reflect these population values.

Missing values in an RCB: A similar reasoning applies when missing values occur in a Randomized Complete Block design. Consider the values below in a design with Factor A with 2 levels and three blocks.

                        Blocks
                1          2         3
            +----------+---------+-------------+
Factor A a1 | 3        | 7       | 12          |
            +----------+---------+-------------+
         a2 | 5        | 9       | missing     |
            +----------+---------+-------------+

If we find the raw means of each level of Factor A, we find that the difference in the raw means does not reflect the consistent difference seen in block 1 and block 2.

Now an “average-of-averages” cannot be computed because each cell only has one observation. However, a key assumption of blocked designs is that there is no interaction between blocks and treatments, or, that the treatment effect is consistent among blocks. So the treatment effect estimated in blocks 1 and 2 should be a reasonable “guess” for the treatment effect in block 3, and we impute a value for the missing observation, in this case 14 = 12 + 2, before we compute the least-square-means. (In actual fact, the computations are more difficult, but the general idea is the same).
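The imputation idea can be sketched as follows in Python. This is a deliberately simplified illustration of the reasoning above, with made-up variable names; as noted, the actual least-squares computations are more involved.

```python
# Responses by block for each level of Factor A; a2 is missing in block 3.
a1 = {1: 3, 2: 7, 3: 12}
a2 = {1: 5, 2: 9}

# Treatment effect (a2 minus a1) estimated from the complete blocks only.
complete_blocks = [b for b in a1 if b in a2]
effect = sum(a2[b] - a1[b] for b in complete_blocks) / len(complete_blocks)

# Fill in the missing cell by adding the effect to the a1 value in that block.
a2_filled = {b: a2.get(b, a1[b] + effect) for b in a1}

mean_a1 = sum(a1.values()) / len(a1)
raw_mean_a2 = sum(a2.values()) / len(a2)
adjusted_mean_a2 = sum(a2_filled.values()) / len(a2_filled)

print(round(raw_mean_a2 - mean_a1, 2), round(adjusted_mean_a2 - mean_a1, 2))
```

The raw difference is about -0.33, which has the wrong sign, while the difference after imputing 14 for block 3 is 2, the effect seen in blocks 1 and 2.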

So... in many cases the emmeans are preferable, but this is not always true. You need to examine the circumstances for each survey or experiment to decide which mean is most appropriate. More details about this are available at: http://onbiostatistics.blogspot.ca/2009/04/least-squares-means-marginal-means-vs.html
