© 2002 prentice-hall, inc.chap 9-1 statistics for managers using microsoft excel 3 rd edition...

© 2002 Prentice-Hall, Inc. Chap 9-1

Statistics for Managers Using Microsoft Excel

3rd Edition

Chapter 9Analysis of Variance


The completely randomized design: one-factor analysis of variance ANOVA assumptions F test for difference in c means The Tukey-Kramer procedure

The factorial design: two-way analysis of variance Examine effects of factors and interaction

Kruskal-Wallis rank test for differences in c medians

Chapter Topics


General Experimental Setting

Investigator controls one or more independent variables Called treatment variables or factors Each treatment factor contains two or more

levels (or categories/classifications) Observe effects on dependent variable

Response to levels of independent variable Experimental design: the plan used to

test hypothesis


Completely Randomized Design

Experimental units (subjects) are assigned randomly to treatments Subjects are assumed homogeneous

Only one factor or independent variable With two or more treatment levels

Analyzed by One-factor analysis of variance (one-way

ANOVA)


Factor (Training Method)

Factor Levels

(Treatments)

Randomly Assigned

Units

Dependent Variable

(Response)

21 hrs 17 hrs 31 hrs



Randomized Design Example


One-Factor Analysis of VarianceF Test

Evaluate the difference among the mean responses of two or more (c ) populations e.g.: Several types of tires, oven temperature settings

Assumptions Samples are randomly and independently drawn

This condition must be met Populations are normally distributed

F test is robust to moderate departure from normality

Populations have equal variances Less sensitive to this requirement when samples

are of equal size from each population


Why ANOVA? Could compare the means one by one

using Z or t tests for difference of means Each Z or t test contains Type I error The total Type I error with k pairs of

means is 1- (1 - ) k

e.g.: If there are 5 means and use = .05 Must perform 10 comparisons Type I error is 1 – (.95) 10 = .40 40% of the time you will reject the null

hypothesis of equal means in favor of the alternative even when the null is true!


Hypotheses of One-Way ANOVA

All population means are equal No treatment effect (no variation in means

among groups)

At least one population mean is different

(others may be the same!) There is treatment effect Does not mean that all population means are

different

0 1 2: cH

1 : Not all are the sameiH


One-Factor ANOVA (No Treatment Effect)

The Null Hypothesis is True

0 1 2: cH


1 2 3


One-Factor ANOVA (Treatment Effect Present)

The Null Hypothesis is

NOT True

0 1 2: cH


1 2 3 1 2 3


One-Factor ANOVA(Partition of Total Variation)

Variation Due to Treatment SSA

Variation Due to Treatment SSA

Variation Due to Random Sampling SSW

Variation Due to Random Sampling SSW

Total Variation SSTTotal Variation SST

Commonly referred to as: Sum of Squares Within Sum of Squares Error Sum of Squares

Unexplained Within Groups Variation

Commonly referred to as: Sum of Squares Among Sum of Squares Between Sum of Squares Model Sum of Squares

Explained Sum of Squares

Treatment Among Groups Variation

= +


Total Variation

2

1 1

1 1

( )

: the -th observation in group

: the number of observations in group

: the total number of observations in all groups

: the number of groups

the over

j

j

nc

ijj i

ij

j

nc

ijj i

SST X X

X i j

n j

n

c

X

Xc

all or grand mean


Total Variation(continued)

Group 1 Group 2 Group 3

Response, X


Response, X

2 2 2

11 21 cn cSST X X X X X X

X


2

1

( )c

j jj

SSA n X X

Among-Group Variation

Variation Due to Differences Among Groups.

1

SSAMSA

c

i j

: The sample mean of group

: The overall or grand mean

jX j

X


Among-Group Variation(continued)


Response, X


Response, X

2 2 2

1 1 2 2 c cSSA n X X n X X n X X

X1X 2X

3X


Summing the variation within each group and then adding over all groups.

: The sample mean of group

: The -th observation in group

j

ij

X j

X i j

Within-Group Variation

2

1 1

( )jnc

ij jj i

SSW X X

SSWMSW

n c

j


Within-Group Variation(continued)


Response, X


Response, X

22 2

11 1 21 1 cn c cSSW X X X X X X

1X 2X3X


Within-Group Variation(continued)

2 2 21 1 2 2

1 2

( 1) ( 1) ( 1)

( 1) ( 1) ( 1)c c

c

SSWMSW

n c

n S n S n S

n n n

For c = 2, this is the pooled-variance in the t-Test.

•If more than two groups, use F Test.

•For two groups, use t-Test. F Test more limited.

j


One-Factor ANOVAF Test Statistic

Test statistic

MSA is mean squares among or between variances

MSW is mean squares within or error variances

Degrees of freedom

MSAF

MSW

1 1df c 2 1df n


One-Factor ANOVA Summary Table

Source of

Variation

Degrees of

Freedom

Sum ofSquares

Mean Squares

(Variance)

FStatistic

Among(Factor)

c – 1 SSAMSA =

SSA/(c – 1 )MSA/MSW

Within(Error)

n – c SSWMSW =

SSW/(n – c )

Total n – 1SST =SSA + SSW


Features of One-Factor ANOVA

F Statistic The F Statistic is the ratio of the

among estimate of variance and the within estimate of variance The ratio must always be positive Df1 = c -1 will typically be small Df2 = n - c will typically be large

The ratio should be closed to 1 if the null is true


Features of One-Factor ANOVA

F Statistic

The numerator is expected to be greater than the denominator

The ratio will be larger than 1 if the null is false

(continued)


One-Factor ANOVA F Test Example

As production manager, you want to see if three filling machines have different mean filling times. You assign 15 similarly trained and experienced workers, five per machine, to the machines. At the .05 significance level, is there a difference in mean filling times?

Machine1 Machine2 Machine325.40 23.40 20.0026.31 21.80 22.2024.10 23.50 19.7523.74 22.75 20.6025.10 21.60 20.40


One-Factor ANOVA Example: Scatter Diagram

27

26

25

24

23

22

21

20

19

••

•••

•••••

••••

•

Time in SecondsMachine1 Machine2 Machine325.40 23.40 20.0026.31 21.80 22.2024.10 23.50 19.7523.74 22.75 20.6025.10 21.60 20.40

1 2

3

24.93 22.61

20.59 22.71

X X

X X

1X

2X

3X

X


One-Factor ANOVA Example Computations

Machine1 Machine2 Machine3

25.40 23.40 20.0026.31 21.80 22.2024.10 23.50 19.7523.74 22.75 20.6025.10 21.60 20.40

2 2 25 24.93 22.71 22.61 22.71 20.59 22.71

47.164

4.2592 3.112 3.682 11.0532

/( -1) 47.16 / 2 23.5820

/( - ) 11.0532 /12 .9211

SSA

SSW

MSA SSA c

MSW SSW n c

1

2

3

24.93

22.61

20.59

22.71

X

X

X

X

5

3

15

jn

c

n


Summary Table

Source of

Variation

Degrees of

Freedom

Sum ofSquares

Mean Squares

(Variance)

FStatistic

Among(Factor)

3-1=2 47.1640 23.5820MSA/MSW

=25.60

Within(Error)

15-3=12

11.0532 .9211

Total15-

1=1458.2172


One-Factor ANOVA Example Solution

F0 3.89

H0: 1 = 2 = 3

H1: Not All Equal = .05df1= 2 df2 = 12

Critical Value(s):

Test Statistic:

Decision:

Conclusion:

Reject at = 0.05

There is evidence that at least one i differs from the rest.

= 0.05

FMSA

MSW

23 5820

921125 6

.

..


Solution In EXCEL

Use tools | data analysis | ANOVA: single factor

EXCEL worksheet that performs the one-factor ANOVA of the example

Microsoft Excel Worksheet


The Tukey-Kramer Procedure

Tells which population means are significantly different e.g.: 1 = 2 3

Two groups whose means may be significantly different

Post hoc (a posteriori) procedure Done after rejection of equal means in ANOVA

Ability for pair-wise comparisons Compare absolute mean differences with

critical range

X

f(X)

1 = 2

3


The Tukey-Kramer Procedure: Example

1. Compute absolute mean differences:Machine1 Machine2 Machine3

25.40 23.40 20.0026.31 21.80 22.2024.10 23.50 19.7523.74 22.75 20.6025.10 21.60 20.40

1 2

1 3

2 3

24.93 22.61 2.32

24.93 20.59 4.34

22.61 20.59 2.02

X X

X X

X X

2. Compute Critical Range:

3. All of the absolute mean differences are greater. There is a significance difference between each pair of means at 5% level of significance.

( , )'

1 1Critical Range 1.618

2U c n cj j

MSWQ

n n


Solution in PHStat

Use PHStat | c-sample tests | Tukey-Kramer procedure …

EXCEL worksheet that performs the Tukey-Kramer procedure for the previous example



Two-Way ANOVA

Examines the effect of Two factors on the dependent variable

e.g.: Percent carbonation and line speed on soft drink bottling process

Interaction between the different levels of these two factors

e.g.: Does the effect of one particular percentage of carbonation depend on which level the line speed is set?


Two-Way ANOVA

Assumptions Normality

Populations are normally distributed

Homogeneity of Variance Populations have equal variances

Independence of Errors Independent random samples are drawn

(continued)


SSE

Two-Way ANOVA Total Variation Partitioning

Variation Due to Treatment A

Variation Due to Treatment A

Variation Due to Random SamplingVariation Due to

Random Sampling

Variation Due to Interaction

Variation Due to Interaction

SSA

SSABSST

Variation Due to Treatment B

Variation Due to Treatment B

SSB

Total VariationTotal Variation

d.f.= n-1

d.f.= r-1

=

+

+ d.f.= c-1

+ d.f.= (r-1)(c-1)

d.f.= rc(n’-1)


Two-Way ANOVA Total Variation Partitioning

'

the number of levels of factor A

the number of levels of factor B

the number of values (replications) for each cell

the total number of observations in the experiment

the value of the -th oijk

r

c

n

n

X k

bservation for level of

factor A and level of factor B

i

j


Total Variation

' '

' 2

1 1 1

1 1 1 1 1 1

'

Sum of Squares Total

= total variation among all

observations around the grand mean

the overall or grand mean

r c n

ijki j k

r c n r c n

ijk ijki j k i j k

SST X X

X X

Xrcn n


Factor A Variation

2'

1

r

i

i

SSA cn X X

Sum of Squares Due to Factor A = the difference among the various levels of factor A and the grand mean


Factor B Variation

2'

1

c

j

j

SSB rn X X

Sum of Squares Due to Factor B = the difference among the various levels of factor B and the grand mean


Interaction Variation

2'

1 1

r c

ij i j

i j

SSAB n X X X X

Sum of Squares Due to Interaction between A and B = the effect of the combinations of factor A and factor B


Random Error

Sum of Squares Error = the differences among the observations within

each cell and the corresponding cell means

' 2

1 1 1

r c n

ijijki j k

SSE X X


Two-Way ANOVA:The F Test Statistic

F Test for Factor B Main Effect

F Test for Interaction Effect

H0: 1 .= 2 . = ••• = r .

H1: Not all i . are equal

H0: ij = 0 (for all i and j)

H1: ij 0

H0: 1 = . 2 = ••• = c

H1: Not all . j are equal

Reject if F > FU

Reject if F > FU

Reject if F > FU

1

MSA SSAF MSA

MSE r

F Test for Factor A Main Effect

1

MSB SSBF MSB

MSE c

1 1

MSAB SSABF MSAB

MSE r c


Two-Way ANOVASummary Table

Source ofVariation

Degrees of Freedom

Sum ofSquare

s

Mean Squares

FStatisti

c

Factor A(Row)

r – 1 SSAMSA =

SSA/(r – 1)MSA/MSE

Factor B(Column)

c – 1 SSBMSB =

SSB/(c – 1)MSB/MSE

AB(Interaction)

(r – 1)(c – 1) SSABMSAB =

SSAB/ [(r – 1)(c – 1)]MSAB/MSE

Error rc n’ – 1) SSEMSE =

SSE/[rc n’ – 1)]

Total rc n’ – 1 SST


Features of Two-Way ANOVA

F Test

Degrees of freedom always add up rcn’-1=rc(n’-1)+(c-1)+(r-1)+(c-1)(r-1) Total=error+column+row+interaction

The denominator of the F Test is always the same but the numerator is different.

The sums of squares always add up Total=error+column+row+interaction


Kruskal-Wallis Rank Test for c Medians

Extension of Wilcoxon rank-sum test Tests the equality of more than 2 (c)

population medians

Distribution-free test procedure Used to analyze completely randomized

experimental designs Use 2 distribution to approximate if

each sample group size nj > 5 df = c – 1


Kruskal-Wallis Rank Test

Assumptions Independent random samples are drawn Continuous dependent variable Data may be ranked both within and among

samples Populations have same variability Populations have same shape

Robust with regard to last two conditions Use F test in completely randomized designs

and when the more stringent assumptions hold


Kruskal-Wallis Rank Test Procedure

Obtain ranks In event of tie, each of the tied values gets

their average rank Add the ranks for data from each of the

c groups Square to obtain tj

2

2

1

123( 1)

( 1)

cj

j j

TH n

n n n

1 2 cn n n n



Compute test statistic

Number of observation in j –th sample H may be approximated by chi-square

distribution with df = c –1 when each nj >5

(continued)

2

1

123( 1)

( 1)

cj

j j

TH n

n n n

1 2 cn n n n

jn



Critical value for a given Upper tail

Decision rule Reject H0: M1 = M2 = ••• = mc if test statistic

H > Otherwise do not reject H0

(continued)

2U

2U



Kruksal-Wallis Rank Test: Example

As production manager, you want to see if three filling machines have different median filling times. You assign 15 similarly trained & experienced workers, five per machine, to the machines. At the .05 significance level, is there a difference in median filling times?


Machine1 Machine2 Machine3 14 9 2 15 6 7 12 10 1 11 8 4 13 5 3

Example Solution: Step 1 Obtaining a Ranking

Raw Data Ranks

65 38 17



Example Solution: Step 2 Test Statistic Computation

212

3( 1)( 1) 1

2 2 212 65 38 173(15 1)

15(15 1) 5 5 5

11.58

Tc jH n

n n nj j


Kruskal-Wallis Test Example Solution

H0: M1 = M2 = M3

H1: Not all equal = .05df = c - 1 = 3 - 1 = 2Critical Value(s): Reject at

Test Statistic:

Decision:

Conclusion:There is evidence that population medians are not all equal.

= .05

= .05

H = 11.58

0 5.991


Kruskal-Wallis Test in PHStat

PHStat | c-sample tests | Kruskal-Wallis rank sum test …

Example solution in excel spreadsheet



Chapter Summary Described the completely randomized

design: one-factor analysis of variance ANOVA assumptions F test for difference in c means The Tukey-Kramer procedure

Described the factorial design: two-way analysis of variance Examine effects of factors and interaction

Discussed Kruskal-Wallis rank test for differences in c medians

© 2002 prentice-hall, inc.chap 9-1 statistics for managers using microsoft excel 3 rd edition...

Documents

factorseach treatment

treatment ssavariation

null hypothesis of equal

differentonefactor anova

distributedf test

pairs of means

population mean

c populationse