© Imperial College London

Experimental design

Emma McCoy, Imperial College London

Upload: elmer-kelley

Post on 27-Dec-2015



1. Introduction

• The aim of these two lectures is to give you an introduction to some of the ideas, principles, and methods of experimental design.

• Experimental design is planned, purposeful intervention and manipulation of a process to:

- understand how it works
- identify how to improve it

• Experimental design shows you how to collect the data so that it gives maximum information about the process or system you are studying

- in the most efficient way
- for the least cost
- in the least time


Note that

• we need to take natural variation into account:
  - people, machines, and software systems vary

• our aim is to try to identify the causes of that variation

• blocking and balance
  - control variation arising from known causes

• randomisation
  - controls variation arising from unknown causes
  - allows valid statistical inferences

• simultaneous study of several factors
  - can do better than just varying one at a time


You may have heard of experimental design by other names

• the Deming method
• Six Sigma
• the Kaizen method

• These systems are based on elementary ideas of experimental design.


2. One factor designs

• A telephone holiday booking company is considering replacing the system it uses on all its booking screens. It has been shown two different systems, A and B, and would like to choose the one that minimises the time required to train new operators. To investigate this, it intends to train a series of new operators for a day using each system, and then see how well they can do the job. How well they can do the job is measured by a test, on which a high score is good.

• Note that once an operator has been trained using one system, they cannot also be trained using the other: they would already know what the job involves, and so would pick it up more quickly.

• Question: Which system gives the higher performance scores after one day’s training?


• Problem: There is natural variability between operators - so they won’t all score the same, even using identical systems.

• Solution: Train several operators using each system and compare the average results.

• Basic statistical result: averages are more likely to be near the true mean than are single observations. The more observations that are averaged, the better.
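A small simulation illustrates this result. It is only a sketch: the scores are drawn from a normal population whose mean (18) and standard deviation (4) are made-up illustration values, not estimates from the study. The spread of the averages should shrink roughly like 4 divided by the square root of the number of scores averaged.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population values, for illustration only (not from the study)
TRUE_MEAN, SD = 18.0, 4.0

def spread_of_averages(n, reps=5000):
    """Standard deviation, over many repeats, of the average of n simulated scores."""
    averages = [
        statistics.fmean([random.gauss(TRUE_MEAN, SD) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(averages)

for n in (1, 4, 16):
    # Expect roughly SD / sqrt(n): about 4, 2 and 1
    print(n, round(spread_of_averages(n), 2))
```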


• Problem: Maybe older people score differently from younger people. So maybe any differences are due to the fact that system A was tested on mainly young operators, system B on older ones. Maybe there are other differences between people which tend to lead to different scores (sex, education, previous experience, interest, etc.).

• We won’t be able to think of all possible such differences

• Solution: Randomise so that people are assigned to a system at random.

• Now any differences are either due to differences between systems, or due to chance, which we can assess with statistical tests.
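Random allocation is easy to do by computer. A minimal sketch, assuming 16 hypothetical operators numbered 1 to 16 and equal group sizes; `randomise` is an illustrative helper, not part of any package.

```python
import random

def randomise(operators, systems=("A", "B"), seed=None):
    """Randomly allocate operators to systems in equal-sized groups.

    Assumes the number of operators divides evenly among the systems.
    """
    rng = random.Random(seed)
    shuffled = list(operators)
    rng.shuffle(shuffled)  # random order breaks any link between operator traits and system
    k = len(shuffled) // len(systems)
    return {s: sorted(shuffled[i * k:(i + 1) * k]) for i, s in enumerate(systems)}

allocation = randomise(range(1, 17), seed=2015)
print(allocation)
```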


• Alternative solution: control the random variation by arranging people in blocks:
  - Block 1: people aged 30 or over: 5 using A, 5 using B
  - Block 2: people aged under 30: 5 using A, 5 using B

• Now age does not contribute to any difference between systems in block 1,

• and age does not contribute to any difference between systems in block 2.

• Average the block 1 and block 2 differences to get the overall difference: age does not contribute to any difference between systems overall.
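Blocking and randomisation combine naturally: randomise separately inside each block. The sketch below assumes 20 hypothetical operator IDs, 10 in each age block; `randomise_within_blocks` is an illustrative name.

```python
import random

def randomise_within_blocks(blocks, seed=None):
    """Within each block, send half the people to A and half to B at random."""
    rng = random.Random(seed)
    plan = {}
    for name, people in blocks.items():
        people = list(people)
        rng.shuffle(people)  # randomise only inside the block
        half = len(people) // 2
        plan[name] = {"A": sorted(people[:half]), "B": sorted(people[half:])}
    return plan

# Hypothetical operator IDs: 10 aged 30 or over, 10 aged under 30
blocks = {"30 or over": range(1, 11), "under 30": range(11, 21)}
plan = randomise_within_blocks(blocks, seed=7)
print(plan)
```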


Example: results from 16 people (a high score is good):

System A: 13, 14, 15, 18, 13, 23, 19, 21 (mean 17)
System B: 15, 17, 19, 21, 12, 25, 20, 23 (mean 19)

• So system B appears better than system A, the difference in means being 2.
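The means and their difference can be checked directly from the 16 scores on the slide:

```python
from statistics import fmean

# Scores from the 16 operators (8 per system)
scores = {
    "A": [13, 14, 15, 18, 13, 23, 19, 21],
    "B": [15, 17, 19, 21, 12, 25, 20, 23],
}
means = {system: fmean(values) for system, values in scores.items()}
difference = means["B"] - means["A"]
print(means, difference)  # {'A': 17.0, 'B': 19.0} 2.0
```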


• Simple graphical display (boxplot):
  - medians
  - Q1 and Q3
  - whiskers connecting the extreme points to the quartiles

• Does the difference of 2 reflect the underlying truth?
• Or is it just chance arising from the particular people chosen?
• If we had chosen 16 different people, would we have got a different result?

[Figure: boxplots of the scores for systems A and B; vertical axis from 12 to 24]
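The numbers behind a boxplot can be computed as below. Quartile conventions differ between packages; this sketch uses Python's default (`statistics.quantiles` with its 'exclusive' method), so its Q1 and Q3 may not match those drawn on the slide exactly.

```python
import statistics

scores = {
    "A": [13, 14, 15, 18, 13, 23, 19, 21],
    "B": [15, 17, 19, 21, 12, 25, 20, 23],
}

def box_summary(values):
    """Minimum, Q1, median, Q3 and maximum: the numbers a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return min(values), q1, median, q3, max(values)

for system, values in scores.items():
    print(system, box_summary(values))
```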


Such questions are answered by hypothesis tests and significance tests

• Basic principle:

If there is really no difference between the methods, how often would we expect to observe such a large difference between A and B?

• If A is really better than B, how often would we expect to observe B two or more points larger than A?


e.g. t-tests for comparing two groups

• Basic principle of the t-test:
  - simulate the distribution of differences between the average A score and the average B score, when the scores are drawn from a population in which there is no average difference
  - see what proportion of these differences have values of 2 or more
  - we can sidestep the hassle of simulation by using basic statistical ideas, which tell us that the (suitably scaled) distribution of differences has a known form called a t-distribution

• NOTE: the groups do not have to have the same number of observations in them.
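The simulation idea can be carried out directly. This sketch swaps in one specific way of simulating "no average difference": a permutation scheme that repeatedly reshuffles the 16 observed scores into two groups of 8. That is an assumption, not the method on the slide. It also computes the pooled two-sample t statistic, which allows unequal group sizes.

```python
import math
import random
import statistics

a = [13, 14, 15, 18, 13, 23, 19, 21]
b = [15, 17, 19, 21, 12, 25, 20, 23]
observed = statistics.fmean(b) - statistics.fmean(a)  # 2.0

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(y) - statistics.fmean(x)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def simulated_p(x, y, reps=10_000, seed=0):
    """Proportion of random relabellings whose mean difference is >= the observed one."""
    rng = random.Random(seed)
    pooled = x + y
    diff_obs = statistics.fmean(y) - statistics.fmean(x)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # pretend the system labels carry no information
        diff = statistics.fmean(pooled[len(x):]) - statistics.fmean(pooled[:len(x)])
        hits += diff >= diff_obs
    return hits / reps

t = pooled_t(a, b)
p = simulated_p(a, b)
print(round(t, 3), p)
```

On these data t is about 0.99 on 14 degrees of freedom, well below the value of about 1.76 needed for a one-sided test at the 5% level, so the difference of 2 could plausibly be due to chance.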


But perhaps differences between the ways the systems are used lead to differences between scores. If so, what is the best way to use the system?

e.g. 1: does editing style (Edit, with two possible styles, D and E) make a difference?

We could do a similar experiment: train a group of people using editing style D and a group using editing style E, and compare the results.

e.g. 2: does the format of the display (Format, with two possible formats, Y or N) make a difference?

Again, do a similar experiment: train a group of people using format Y and a group using format N, and compare the results.

• And so on


• Many experiments needed:
  - one to compare systems
  - one to compare editing styles
  - one to compare display formats

• 16 people to compare systems
• 16 people to compare editing styles
• 16 people to compare display formats
  - making 48 people in all

• Very expensive.
• Very slow: the company needs to know next month, not next year!


• Worse:

- perhaps the factors interact:

• That is, editing style D may be better with display format Y than with display format N

• but editing style E may be better with display format N than with display format Y.


3. Two factor designs

• Suppose we want to investigate the effect of both System (A or B) and Editing style (D or E)

• If we do two separate experiments we need 32 people

- and we can’t see if there is any interaction

• Instead adopt a more sophisticated experimental design


• We can arrange things in a 2×2 (“two by two”) design (also written 2^2):

System A, Edit E: 13, 14, 15, 18
System B, Edit E: 15, 17, 19, 21
System A, Edit D: 13, 23, 19, 21
System B, Edit D: 12, 25, 20, 23

• Only 16 people, but 8 on each system and 8 on each editing style!
• Half the cost of the ‘obvious’ approach (16 people instead of 32)
• And joint information on both factors: we can explore the interaction
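A sketch that recovers the cell means from the raw scores and confirms the 8/8 balance on each factor:

```python
from statistics import fmean

# Raw scores from the 2x2 layout: (system, edit) -> four operators' scores
data = {
    ("A", "E"): [13, 14, 15, 18],
    ("B", "E"): [15, 17, 19, 21],
    ("A", "D"): [13, 23, 19, 21],
    ("B", "D"): [12, 25, 20, 23],
}
cell_means = {cell: fmean(scores) for cell, scores in data.items()}

# Each system level still has 8 people, and so does each editing style
per_system = {s: sum(len(v) for (sys, _), v in data.items() if sys == s) for s in "AB"}
per_edit = {e: sum(len(v) for (_, ed), v in data.items() if ed == e) for e in "DE"}
print(cell_means, per_system, per_edit)
```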


• Two possible situations:

Case 1: the difference between system levels is the same for each editing style

That is, there is no interaction between system and edit style

Case 2: the difference between system levels is different for each editing style

That is, there is an interaction between system and edit style


Means in each combination of factors:

          System A   System B
Edit E       15         18
Edit D       19         20


Case 1: the difference between system levels is the same for each editing style

• That is, the system A-B difference is the same for editing style E and for D [we say system and editing style do not interact].

• Note: we expect to observe some differences due to random variation, but they are not statistically significant: they can be attributed to chance.

• So we get a better estimate of the difference between A and B by averaging over the two levels of editing style.


• Same argument for the difference between levels of editing style. The four group means, with row and column averages, are:

          System A   System B   Average
Edit E       15         18        16.5
Edit D       19         20        19.5
Average      17         19

• The effect of switching from A to B is an increase in performance of 19 - 17 = 2

• That is: the main effect of system is 2

• Likewise: the main effect of editing style is 19.5 - 16.5 = 3


Put another way:

          System A   System B   Average
Edit E       15         18        16.5
Edit D       19         20        19.5
Average      17         19

• The simple effect of System when Edit=D is (20-19) = 1

• The simple effect of System when Edit=E is (18-15) = 3

• The main effect of System is the average of these 1/2 [1 + 3] = 2

• And the main effect of Edit is the average of its simple effects: 1/2 [(20 - 18) + (19 - 15)] = 3


Case 2: the difference between system levels is different for each editing style (System and Edit interact)

• This means we believe that the effect of System depends on whether Edit is D or E,
• and the effect of Edit depends on whether system A or system B is being used.
• The interaction is half the difference of the simple effects of System: 1/2 [(18 - 15) - (20 - 19)] = 1

          System A   System B   Average
Edit E       15         18        16.5
Edit D       19         20        19.5
Average      17         19
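The simple effects, main effects and interaction are plain arithmetic on the four cell means. This sketch follows the sign used on this slide for the interaction (simple effect of System at E minus that at D):

```python
# Cell means from the 2x2 table: m[(system, edit)]
m = {("A", "E"): 15, ("B", "E"): 18, ("A", "D"): 19, ("B", "D"): 20}

# Simple effects of System (B minus A) at each editing style
simple_E = m[("B", "E")] - m[("A", "E")]   # 3
simple_D = m[("B", "D")] - m[("A", "D")]   # 1

system_main = (simple_E + simple_D) / 2     # average of the simple effects
edit_main = ((m[("A", "D")] - m[("A", "E")]) + (m[("B", "D")] - m[("B", "E")])) / 2
interaction = (simple_E - simple_D) / 2     # half the difference of the simple effects
print(system_main, edit_main, interaction)  # 2.0 3.0 1.0
```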


4. A shorthand notation

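• System main effect: 1/2 [(18 - 15) + (20 - 19)] = 1/2 [+18 - 15 + 20 - 19] = 2, i.e. the signs attached to the cell means are:

          System A   System B
Edit E      -15        +18
Edit D      -19        +20

• Edit main effect = 1/2 [-18 - 15 + 20 + 19] = 3
• Interaction = 1/2 [-18 + 15 + 20 - 19] = -1 (the same size as before; the sign changes because the difference of simple effects is now taken as D minus E)

Signs for the Edit main effect:

          System A   System B
Edit E      -15        -18
Edit D      +19        +20

Signs for the interaction:

          System A   System B
Edit E      +15        -18
Edit D      -19        +20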


• Note that all of the effects and interactions are sums and differences of the cell means.

• The different effects and interactions are given by different combinations of plus and minuses.

• A weighted combination of the cell means whose weights sum to zero is called a contrast.


(i) Label factors by lower case letter: s for System, e for Edit

(ii) Designate one level of each factor as high and the other as low Thus B = high System, D = high Edit

(iii) List, as rows of a table, all factor combinations, labelling each with the letters of the factors at their high level and writing (1) when every factor is at its low level: (s) means the high level of System and the low level of Edit, (se) means the high level of both, and (1) means the low level of both.

(iv) List, as columns of the table, all main effects and interactions. For main effects use a minus if the low level of the factor occurs in the row, and a plus if the high level occurs. For interactions use the product of the signs for its factors.


s: System, B = high, A = low
e: Edit, D = high, E = low

System  Edit  Notation  Average   E   S   ES
A       E     (1)       15        -   -   +
A       D     (e)       19        +   -   -
B       E     (s)       18        -   +   -
B       D     (se)      20        +   +   +

(v) To compute a main effect or interaction, add up the averages, multiplying by the appropriate sign, and divide by the number of pluses.
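Step (v) can be written directly as code. The sign columns below are copied from the table; the sketch multiplies, sums and divides by the number of pluses. Note that ES comes out as -1 here, the same size as the interaction of 1 computed from simple effects earlier, because the table's convention takes D (high) minus E.

```python
# Cell averages in the table's notation (B = high System, D = high Edit),
# in the same row order as the sign table: (1), (e), (s), (se)
averages = [15, 19, 18, 20]

# Sign columns copied from the table, one sign per row
columns = {
    "E":  [-1, +1, -1, +1],
    "S":  [-1, -1, +1, +1],
    "ES": [+1, -1, -1, +1],  # product of the E and S signs in each row
}

# Step (v): signed sum of the averages divided by the number of pluses
effects = {
    name: sum(sign * avg for sign, avg in zip(signs, averages)) / signs.count(1)
    for name, signs in columns.items()
}
print(effects)  # {'E': 3.0, 'S': 2.0, 'ES': -1.0}
```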


5. More than two factors

• Suppose we want to investigate the effect of Edit, System, and Format (that is, the display format), where f = Format can be type N (high) or type Y (low)

• Testing each of Edit, System, and Format separately would require 48 people, and would not allow us to explore interactions

• Now there are 8 possible treatment combinations


Scores (two people per treatment combination):

Edit  Format  System A   System B
E     Y       13, 14     15, 17
E     N       15, 18     19, 21
D     Y       13, 23     12, 25
D     N       19, 21     20, 23

• Again only 16 people altogether but have 8 at each level of System and of Edit and of Format

Magic!


This is a 2^3 design.

• Generalising the ideas from the 2^2 design above, we get the following table (S = System, B high; E = Edit, D high; F = Format, N high; the Factors column lists the System, Edit and Format levels):

Factors  Notation  Average  S  E  F  ES  EF  SF  SEF
AEY      (1)       13.5     -  -  -  +   +   +   -
ADY      (e)       18       -  +  -  -   -   +   +
BEY      (s)       16       +  -  -  -   +   -   +
AEN      (f)       16.5     -  -  +  +   -   -   +
BDY      (es)      18.5     +  +  -  +   -   -   -
ADN      (ef)      20       -  +  +  -   +   -   -
BEN      (sf)      20       +  -  +  -   -   +   -
BDN      (esf)     21.5     +  +  +  +   +   +   +

• Note that now there are three two-factor interactions and one three-factor interaction.

• A three-factor interaction tells us how the two-factor interaction between two factors varies according to the level of the third factor.


• From this, the main effect of System is

1/4 [-13.5 - 18 + 16 - 16.5 + 18.5 - 20 + 20 + 21.5] = 2.0

That is, the average advantage in performance if system B is used instead of system A is 2.0.

• The interaction of Edit and Format is 1/4 [+13.5 - 18 + 16 - 16.5 - 18.5 + 20 - 20 + 21.5] = -0.5

• The effect of Format depends on whether editing style D or E is used: the Format effect is 2.5 with D and 3.5 with E, and the interaction of -0.5 is half the difference between them.

• etc.
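The same rule works for any 2^k design once each row's signs are generated from its notation: a factor's letter present means +, absent means -, and an interaction takes the product. A sketch (the helper name `effects` is illustrative) applied to the 2^3 averages:

```python
from itertools import combinations

def effects(averages, factors):
    """Every main effect and interaction of a 2^k design from its +/- table.

    averages: notation -> cell average, "(1)" for all factors low; a letter in the
    notation means that factor is at its high level.
    """
    n_plus = len(averages) // 2          # each column has pluses in half the rows
    out = {}
    for size in range(1, len(factors) + 1):
        for term in combinations(factors, size):
            total = 0.0
            for notation, avg in averages.items():
                sign = 1
                for letter in term:      # interaction sign = product of factor signs
                    sign *= 1 if letter in notation else -1
                total += sign * avg
            out["".join(term).upper()] = total / n_plus
    return out

# Cell averages of the 2^3 study (s = System B, e = Edit D, f = Format N are high)
averages = {"(1)": 13.5, "e": 18, "s": 16, "f": 16.5,
            "es": 18.5, "ef": 20, "sf": 20, "esf": 21.5}
result = effects(averages, "sef")
print(result["S"], result["EF"])  # 2.0 -0.5
```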


NOTE:

1) There are formal statistical test procedures to see if the effects are real (represent true differences between factor levels) or are merely due to the people we happened to have chosen for our study.

2) There are informal graphical procedures to see if the effects are real (represent true differences between factor levels), or are merely due to chance.

3) Designs like these are called factorial designs, because they allow the testing of many factors.


6. Example: A 2^4 design

• In an experiment to compare spreadsheets, the response variable was the time (in seconds) needed to complete a task.

• There were:
  - two spreadsheets: Excel, WingZ (a)
  - two computers: Macintosh SE, Macintosh IIcx (b)
  - two RAM configurations: 1Mb free RAM, 2Mb free RAM (c)
  - two tasks: import file, open the program and file (d)

• That is, four factors, each at two levels.


The data

Treatment combination          Notation  Response time (sec)
Excel, SE, 1Mb RAM, import     (1)       87.0
WingZ, SE, 1Mb RAM, import     a         21.0
Excel, IIcx, 1Mb RAM, import   b         44.5
Excel, SE, 2Mb RAM, import     c         82.0
Excel, SE, 1Mb RAM, open       d         51.0
WingZ, IIcx, 1Mb RAM, import   ab        8.5
WingZ, SE, 2Mb RAM, import     ac        20.0
WingZ, SE, 1Mb RAM, open       ad        38.5
Excel, IIcx, 2Mb RAM, import   bc        45.0
Excel, IIcx, 1Mb RAM, open     bd        37.0
Excel, SE, 2Mb RAM, open       cd        43.0
WingZ, IIcx, 2Mb RAM, import   abc       9.0
WingZ, IIcx, 1Mb RAM, open     abd       21.2
WingZ, SE, 2Mb RAM, open       acd       32.0
Excel, IIcx, 2Mb RAM, open     bcd       37.0
WingZ, IIcx, 2Mb RAM, open     abcd      18.0


Notation  Response  A  B  C  D  AB  AC  AD  BC  BD  CD  ABC  ABD  ACD  BCD  ABCD
(1)       87.0      -  -  -  -  +   +   +   +   +   +   -    -    -    -    +
a         21.0      +  -  -  -  -   -   -   +   +   +   +    +    +    -    -
b         44.5      ...
c         82.0      ...
...
abcd      ...

Question: (i) What is the main effect of RAM configuration?

(ii) What is the interaction effect of spreadsheet, computer, and RAM configuration?


Solution:

(i) This is the column headed C. It will have a plus in it whenever a c appears in a row, and a minus otherwise:

[-87.0-21.0-44.5+82.0-51.0.....+37.0+18.0]/8 = -2.8375


(ii) This is the column headed ABC. It will have a plus in it whenever an odd number of the three symbols {a, b, c} appears in a row, and a minus otherwise.

[-87.0+21.0+44.5+82.0-51.0.....-37.0+18.0]/8 = -1.0875

All 15 effects, in increasing order:

A      -32.2875   (i)
B      -19.2875   (ii)
ABD     -8.4125
D       -4.9125
C       -2.8375
CD      -1.5875
ABC     -1.0875
ACD     -0.7125
ABCD    -0.0875
AC       0.2875
BCD      0.5375
BC       2.2875
AB       5.5875
BD       6.4625
AD      17.7125
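A sketch that computes all 15 effects from the 16 response times with the sign-table rule (the illustrative helper is restated so the block runs on its own) and ranks them as in the list above:

```python
from itertools import combinations

def effects(averages, factors):
    """All main effects and interactions of a 2^k design via the +/- table."""
    n_plus = len(averages) // 2
    out = {}
    for size in range(1, len(factors) + 1):
        for term in combinations(factors, size):
            total = 0.0
            for notation, y in averages.items():
                sign = 1
                for letter in term:  # + if the factor is high in this row, else -
                    sign *= 1 if letter in notation else -1
                total += sign * y
            out["".join(term).upper()] = total / n_plus
    return out

# Response times (s); a = WingZ, b = IIcx, c = 2Mb RAM, d = open are the high levels
times = {"(1)": 87.0, "a": 21.0, "b": 44.5, "c": 82.0, "d": 51.0, "ab": 8.5,
         "ac": 20.0, "ad": 38.5, "bc": 45.0, "bd": 37.0, "cd": 43.0, "abc": 9.0,
         "abd": 21.2, "acd": 32.0, "bcd": 37.0, "abcd": 18.0}

result = effects(times, "abcd")
ranked = sorted(result.items(), key=lambda kv: kv[1])
for name, value in ranked:
    print(f"{name:5s} {value:9.4f}")
```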


Interpretation:

(i) Spreadsheet (factor A) main effect = -32.29

The spreadsheet product WingZ has an average response time which is 32.29 seconds shorter than that of the spreadsheet product Excel.

(ii) Computer (factor B) main effect = -19.29

The Macintosh IIcx has an average response time which is 19.29 seconds shorter than that of the Macintosh SE.


But what about interactions?

In particular, the AD interaction (spreadsheet by task) is large. Means (seconds):

              Import    Open
Excel         64.625    42.000
WingZ         14.625    27.425

• We see that Excel is slower at importing than opening files, whereas WingZ is slower at opening files than importing them.

• However, WingZ is always faster than Excel (the main effect of spreadsheet favours WingZ).
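The AD means table can be reproduced by averaging over computer and RAM. The sketch also shows that half the difference between the two spreadsheets' open-minus-import effects reproduces the AD interaction of 17.7125 (`mean_time` is an illustrative helper):

```python
from statistics import fmean

times = {"(1)": 87.0, "a": 21.0, "b": 44.5, "c": 82.0, "d": 51.0, "ab": 8.5,
         "ac": 20.0, "ad": 38.5, "bc": 45.0, "bd": 37.0, "cd": 43.0, "abc": 9.0,
         "abd": 21.2, "acd": 32.0, "bcd": 37.0, "abcd": 18.0}

def mean_time(wingz, open_task):
    """Average response time for one spreadsheet/task cell, over computers and RAM."""
    return fmean([t for row, t in times.items()
                  if ("a" in row) == wingz and ("d" in row) == open_task])

table = {(s, task): mean_time(s == "WingZ", task == "Open")
         for s in ("Excel", "WingZ") for task in ("Import", "Open")}

# AD interaction: half the difference between WingZ's and Excel's open-minus-import effects
ad = ((table["WingZ", "Open"] - table["WingZ", "Import"])
      - (table["Excel", "Open"] - table["Excel", "Import"])) / 2
print(table, round(ad, 4))
```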


7. Fractional replication

• A replicate is a repeat of a measurement or an experimental design under identical conditions

• The results are unlikely to be exactly identical because of random variation.

• For example, if I randomly choose students, put them in a particular office, with a particular calculator, at a particular time of day, and give them a statistics test, the result may differ from student to student


• A 2^4 design has 16 treatment combinations.

• A 2^10 design has 1024 treatment combinations.

• With m observations of each treatment combination there are 1024m observations to be made.

• This might be very slow, expensive, etc.

• It might be impossible to get 1024 results (e.g. train 1024 students?)

• Can we choose a subset of the treatment combinations which still gives us the information we want?


• To illustrate, return to the 2^3 study of performance in which system (System), editing style (Edit), and display format (Format) were factors of interest.

• There are 8 treatment combinations, so if we could afford to train 8 people we could estimate the main effects and interactions.

• (In our previous study, we trained 2 people in each treatment combination, making 16 people in all. 2 in each combination is better than just 1 because we get more accurate estimates and more powerful tests.)

• Suppose, however, that we can only afford to test 4 people. That is, we can only test 4 of the 8 treatment combinations.

• Let’s choose those which have a plus in the three-way interaction column.


Full design:

Factors  Notation  Average  S  E  F  ES  EF  SF  SEF
AEY      (1)       13.5     -  -  -  +   +   +   -
ADY      (e)       18       -  +  -  -   -   +   +
BEY      (s)       16       +  -  -  -   +   -   +
AEN      (f)       16.5     -  -  +  +   -   -   +
BDY      (es)      18.5     +  +  -  +   -   -   -
ADN      (ef)      20       -  +  +  -   +   -   -
BEN      (sf)      20       +  -  +  -   -   +   -
BDN      (esf)     21.5     +  +  +  +   +   +   +

Rows with a plus in the SEF column:

Factors  Notation  Average  S  E  F  ES  EF  SF  SEF
ADY      (e)       18       -  +  -  -   -   +   +
BEY      (s)       16       +  -  -  -   +   -   +
AEN      (f)       16.5     -  -  +  +   -   -   +
BDN      (esf)     21.5     +  +  +  +   +   +   +


• We can still estimate the S main effect
• We can still estimate the E main effect
• We can still estimate the F main effect

• E has two observations at its high level, and two at its low level: the estimated main effect of Edit is 1/2 [18 - 16 - 16.5 + 21.5] = 3.5

• S has two observations at its high level, and two at its low level: the estimated main effect of System is 1/2 [-18 + 16 - 16.5 + 21.5] = 1.5

• F has two observations at its high level, and two at its low level: the estimated main effect of Format is 1/2 [-18 - 16 + 16.5 + 21.5] = 2

• But we cannot estimate the three-way interaction: in this fraction its column contains only pluses, so it is not a contrast.
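A sketch of the half fraction: keep the rows with a plus in SEF, estimate each main effect from the four remaining averages, and check that S and EF share a sign column. Note that 1/2 [-18 + 16 - 16.5 + 21.5] works out to 1.5. The helper names `sign` and `estimate` are illustrative.

```python
# Full 2^3 table of cell averages (s, e, f mark the high levels)
averages = {"(1)": 13.5, "e": 18, "s": 16, "f": 16.5,
            "es": 18.5, "ef": 20, "sf": 20, "esf": 21.5}

def sign(term, notation):
    """+1/-1 entry of a main-effect or interaction column for one row."""
    result = 1
    for letter in term:
        result *= 1 if letter in notation else -1
    return result

# Keep only the rows with a plus in the three-way (SEF) column
half = {k: v for k, v in averages.items() if sign("sef", k) == 1}

def estimate(term):
    """Signed sum over the half fraction divided by the number of pluses."""
    col = [sign(term, k) for k in half]
    return sum(s * y for s, y in zip(col, half.values())) / col.count(1)

print(sorted(half))                               # ['e', 'esf', 'f', 's']
print({t: estimate(t) for t in ("s", "e", "f")})  # {'s': 1.5, 'e': 3.5, 'f': 2.0}

# Aliasing: within the half fraction, S and EF have identical sign columns
print([sign("s", k) for k in half] == [sign("ef", k) for k in half])  # True
```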


• If we were to do separate experiments to study each effect, with two observations at the high and low levels in each case, we would need 4 observations for each of the three factors

• That is 3×2×2 = 12 observations altogether

• But we’ve managed to achieve the same result with only 4

Magic!


• But it is at a price.
• The table shows that the +/- pattern is the same
  - for S and EF
  - for E and SF
  - for F and SE
• The S and EF effects are said to be confounded, and so on.
• So the estimate for S is also the estimate for EF.
• But this is OK if we believe the EF interaction is likely to be small.
• Such a design is a fractional factorial design because it allows the testing of many factors using only a fraction of the total number of treatment combinations.


8. Example: A 2^(4-1) fractional factorial design

• Just take those treatment combinations which have a + in the four-way interaction.

Treatment combination       Notation  A  B  C  D  AB=CD  AC=BD  AD=BC
Excel, SE, 1Mb, import      (1)       -  -  -  -  +      +      +
WingZ, IIcx, 1Mb, import    (ab)      +  +  -  -  +      -      -
WingZ, SE, 2Mb, import      (ac)      +  -  +  -  -      +      -
WingZ, SE, 1Mb, open        (ad)      +  -  -  +  -      -      +
Excel, IIcx, 2Mb, import    (bc)      -  +  +  -  -      -      +
Excel, IIcx, 1Mb, open      (bd)      -  +  -  +  -      +      -
Excel, SE, 2Mb, open        (cd)      -  -  +  +  +      -      -
WingZ, IIcx, 2Mb, open      (abcd)    +  +  +  +  +      +      +
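The half fraction can be generated mechanically, and the aliasing of two-way interactions checked, with a short sketch (`sign` is an illustrative helper):

```python
from itertools import product

FACTORS = "abcd"

def sign(term, row):
    """Product of +1 (factor at high level) / -1 (low level) over the letters in term."""
    s = 1
    for letter in term:
        s *= 1 if letter in row else -1
    return s

# All 16 treatment combinations in notation form, "(1)" for all-low
full = ["".join(l for l, hi in zip(FACTORS, levels) if hi) or "(1)"
        for levels in product((False, True), repeat=4)]

# Half fraction: keep rows with + in the four-way ABCD column
fraction = [row for row in full if sign("abcd", row) == 1]
print(fraction)

# Two-way interactions that share a sign pattern in the fraction are aliased
pairs = [("ab", "cd"), ("ac", "bd"), ("ad", "bc")]
aliased = {(x, y): [sign(x, r) for r in fraction] == [sign(y, r) for r in fraction]
           for x, y in pairs}
print(aliased)
```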


• Can estimate all the main effects.

• But note: All two way interactions are confounded with other two way interactions.

• e.g. the interaction AB is estimated by using the sequence + + - - - - + +

• and the interaction CD is also estimated by using the sequence + + - - - - + +

• If we used all possible treatment combinations, including those which have a minus in the four-way interaction, we’d find that the patterns for AB and CD agree on the rows with a plus in the four-way interaction but are opposite on the rows with a minus, so the two interactions could then be told apart.


• Similarly, all three-way interactions are confounded with main effects: e.g. the +/- pattern for BCD is the same as that for A if we only take those treatment combinations which have + in the four-way interaction.

• If we believe the three way interactions are negligible, and if we believe the two way interactions are small or do not matter, then we can estimate all 4 of the main effects using just 8 measurements: each compares 4 at low level with 4 at high level

• Contrast this with: if you do separate experiments to estimate the 4 main effects, each estimate being based on comparing 4 values with 4 values, you will need

4 (effects) × 4 (values) × 2 (levels) = 32 measurements
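Using only the eight runs of the fraction, the main-effect estimates can be computed from the response times in the full data table (these numbers come from the sketch, not from the slides). Each estimate is exactly the full-design main effect plus its aliased three-way interaction: for A, -32.2875 + 0.5375 (BCD) = -31.75, and for C, -2.8375 + (-8.4125) (ABD) = -11.25. The C estimate shows how much a three-way interaction that is not negligible can distort a fractional estimate.

```python
# Response times (s) for the eight runs of the 2^(4-1) fraction, from the full data table
runs = {"(1)": 87.0, "ab": 8.5, "ac": 20.0, "ad": 38.5,
        "bc": 45.0, "bd": 37.0, "cd": 43.0, "abcd": 18.0}

def main_effect(letter):
    """Average at the high level minus average at the low level (4 runs each)."""
    high = [t for row, t in runs.items() if letter in row]
    low = [t for row, t in runs.items() if letter not in row]
    return sum(high) / len(high) - sum(low) / len(low)

estimates = {f.upper(): main_effect(f) for f in "abcd"}
print(estimates)  # {'A': -31.75, 'B': -20.0, 'C': -11.25, 'D': -6.0}
```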


These ideas can be taken even further, so that one can test the main effects of large numbers of factors in simple experiments without obtaining a huge number of results.


9. More than two levels

• These ideas can be extended to situations in which the factors have more than two levels (though the convenient +/- table cannot then be used).

• A 1-factor experiment where the factor has 4 levels: e.g. 4 different background colours for a display

• A 2-factor experiment, with one factor having 4 levels and one having 3: e.g. A = background colour (4 levels), B = icon size (3 levels)

        A = 1   A = 2   A = 3   A = 4
B = 1     •       •       •       •
B = 2     •       •       •       •
B = 3     •       •       •       •

Here we have 12 treatment combinations (AB) = 11, 12, 13, 21, 22, 23, 31, 32, 33, 41, 42, 43
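Enumerating the treatment combinations is a Cartesian product of the factor levels:

```python
from itertools import product

colours = [1, 2, 3, 4]   # levels of A (background colour)
icon_sizes = [1, 2, 3]   # levels of B (icon size)

treatments = [f"{a}{b}" for a, b in product(colours, icon_sizes)]
print(len(treatments), treatments)
# 12 ['11', '12', '13', '21', '22', '23', '31', '32', '33', '41', '42', '43']
```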


10. Blocks

• In the 4×3 experiment, suppose we can afford to collect 36 observations: 3 measurements at each treatment combination

• Suppose also that we think the results might depend on the age of user

• Then we might want to arrange things so that 12 of the observations were collected from young users, 12 from older users, and 12 from users older still.

• That is, we would block the 36 observations into three groups, or blocks, to reduce the variability in the results:

Block 1: 11, 12, 13, 21, 22, 23, 31, 32, 33, 41, 42, 43
Block 2: 11, 12, 13, 21, 22, 23, 31, 32, 33, 41, 42, 43
Block 3: 11, 12, 13, 21, 22, 23, 31, 32, 33, 41, 42, 43
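A sketch of the blocked layout: each block gets every one of the 12 combinations once, and, as an added assumption not stated on the slide, the order of runs within each block is randomised. The block labels and seed are hypothetical.

```python
import random
from itertools import product

treatments = [f"{a}{b}" for a, b in product(range(1, 5), range(1, 4))]
blocks = ["young", "older", "older still"]  # hypothetical age-group labels

rng = random.Random(42)
schedule = {}
for block in blocks:
    order = treatments[:]   # every treatment combination once per block
    rng.shuffle(order)      # assumed: random run order within the block
    schedule[block] = order
print(schedule)
```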


• The main effects and interaction of background colour and icon size can be evaluated within each block (and averaged over the blocks)...

... so that the age differences do not affect the results

• That is, age is not confounded with the factors

• We would be controlling for the possibility that the results depend on age.


11. Within subjects designs

• Sometimes we can do even better and eliminate variation by using each subject for more than one treatment combination. e.g. is it quicker to type scientific papers using the word processing system Word or the typesetting language LaTeX? A group of people each type a document using both systems, yielding the times (mins) below:

Subject   Word   LaTeX
1         10     11
2          8     12
3         15     14
4         11     10
5         21     17
6         18     18
7         14     15
8         20     22
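With each subject using both systems, the analysis works on the within-subject differences. A sketch of a paired t statistic, a standard choice for this design that is assumed here rather than taken from the slides:

```python
import math
import statistics

word = [10, 8, 15, 11, 21, 18, 14, 20]
latex = [11, 12, 14, 10, 17, 18, 15, 22]

diffs = [l - w for w, l in zip(word, latex)]   # LaTeX minus Word, per subject
mean_diff = statistics.fmean(diffs)             # 0.25 minutes
se = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_paired = mean_diff / se                       # paired t statistic, 7 degrees of freedom
print(mean_diff, round(t_paired, 3))            # 0.25 0.298
```

The mean difference is 0.25 minutes (LaTeX slightly slower) with t of about 0.30 on 7 degrees of freedom, so these data give no real evidence that either system is quicker.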


• The same comparison of two systems as we started with, but now each subject provides a result for both systems.

But care needs to be taken: perhaps people are faster when they type the document the second time.

• So balance order in which the subjects do the tests:

Half use Word first, and half use LaTeX first.

• Then any learning effect will not be confounded with the difference between the two systems.
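A sketch of balancing the order: shuffle the eight subjects, then send the first half to Word first and the rest to LaTeX first (the seed is arbitrary):

```python
import random

subjects = list(range(1, 9))   # the eight subjects in the table
rng = random.Random(3)
rng.shuffle(subjects)          # random choice of who goes first on which system
half = len(subjects) // 2
order = {s: ("Word", "LaTeX") for s in subjects[:half]}
order.update({s: ("LaTeX", "Word") for s in subjects[half:]})
print(order)
```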


• Example: suppose we want to have three observations (x) at each treatment combination of a 2 by 3 design.

                Factor B
                1        2        3
Factor A   1    x x x    x x x    x x x
           2    x x x    x x x    x x x


Design 1: between subjects factors

                Factor B
                1               2               3
Factor A   1    S1, S2, S3      S4, S5, S6      S7, S8, S9
           2    S10, S11, S12   S13, S14, S15   S16, S17, S18


Design 2: within subjects factors

                Factor B
                1             2             3
Factor A   1    S1, S2, S3    S1, S2, S3    S1, S2, S3
           2    S1, S2, S3    S1, S2, S3    S1, S2, S3
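The difference in subject numbers between the two layouts can be made concrete with a short sketch that builds both allocations (`n_subjects` is an illustrative helper):

```python
from itertools import product

cells = list(product((1, 2), (1, 2, 3)))  # (level of A, level of B): 6 cells
REPS = 3                                   # observations per cell

# Design 1: a fresh subject for every observation
between, next_id = {}, 1
for cell in cells:
    between[cell] = [f"S{next_id + i}" for i in range(REPS)]
    next_id += REPS

# Design 2: the same three subjects appear in every cell
within = {cell: [f"S{i}" for i in range(1, REPS + 1)] for cell in cells}

def n_subjects(design):
    """Number of distinct subjects used anywhere in the design."""
    return len({s for subjects in design.values() for s in subjects})

print(n_subjects(between), n_subjects(within))  # 18 3
```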


• Advantages of within subjects designs:
  1) Only 3 subjects needed instead of 18.
  2) Increased sensitivity, because the estimates of differences between levels of Factor A do not include random differences due to different subjects being used.
  3) The same holds for Factor B.

• Disadvantages of within subjects designs:
  1) They can only be used if there’s no ‘learning effect’. e.g. if Factor B is the method of teaching a new programming language, then you can’t do it: you can’t learn a new language three times!


FINISHED (Phew!)