ch 5 bb analyze

7/29/2019 Ch 5 BB Analyze


Chapter 5

Analyze



Objective

In the measure phase, our focus was primarily on Y.

In the analyze phase, we shall focus on the Xs.

Objectives are:

To identify and establish the root cause(s) of the problem (i.e. the Xs affecting Y)

[If required] To develop the function Y = f(X), that is useful

To identify the important Xs for experimentation



Types of Factors (Xs)

Noise Factors: Neither the effect difference among levels can be reduced nor can the best level be picked up for implementation

Example: Environmental condition, measurement method, degradation time, supply voltage, tolerances on control factors

Block Factors: The best level can't be picked up for implementation but the effect difference among levels can be reduced

One or more control factors and/or noise factors are associated with a (naturally occurring) block

Example: Geographical location, shift of operation, spindle number, supplier



Notes: Analysis Steps

Validate measurement

Develop operating definition for each X

Perform gauge R & R for each X

Select probable causes

Base lining of each X

Is-Is Not testing

Importance rating

Develop and test hypotheses

Develop hypothesis

Collect data

Test hypothesis



Brainstorming

Brainstorming is a part of problem solving, where a group of people put social inhibitions and norms aside for generating as many new ideas as possible (regardless of their initial worth).



CE Diagram: Dispersion Analysis Type

[Figure: cause-and-effect diagram for the effect "Poor Data Quality". Branch labels: Sampling; Sample collection; Design implementation; Sampling design; Requirement; Accuracy; Cost; Unit sampling cost; Sample size; Stratification]

Why does the dispersion in Sampling occur? Because of sampling design, design implementation and sample collection

Why does the dispersion in Sampling design occur? Because of requirement variation



Bad CE Diagram

[Figure: four example CE diagrams, numbered 1 to 4, each leading to an "Effect" box]

(1) Very simple looking

(2) Contains very few causes

(3) Complicated looking, but full of non-actionable causes

(4) Complicated looking, but contains many causes having no effect

A good CE diagram is one that is easy to use, leads to action, and whose actions are likely to yield results



Example Solution: One Tail

$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{372.5 - 368}{15/\sqrt{36}} = 1.80$

Observed t = 1.80 does not fall in the rejection region
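The observed t can be checked with a few lines of Python. This is a minimal sketch that assumes the garbled figures on the slide read as sample mean 372.5, hypothesized mean 368, S = 15 and n = 36; the variable names are illustrative, and the significance level that defines the rejection region is not recoverable from the slide, so it is left out.

```python
import math

# Values read from the slide's garbled formula; treat them as assumptions.
x_bar = 372.5   # sample mean
mu_0 = 368.0    # hypothesized mean under H0
s = 15.0        # sample standard deviation
n = 36          # sample size

standard_error = s / math.sqrt(n)
t_observed = (x_bar - mu_0) / standard_error

print(f"t = {t_observed:.2f} with {n - 1} degrees of freedom")
```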



Example: Z Test for Proportion

Problem: A marketing company claims that it receives 4% responses from its mailing.

Approach: To test this claim, a random sample of 500 were surveyed with 25 responses. So observed p = .05 is more than the claimed ps = .04.

Solution: Z test (assume α = .05). Why test at all since p > ps? Both sided H1.
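The mailing example can be worked through as a one-sample Z test for a proportion. The sketch below uses only the figures given above (ps = .04, n = 500, 25 responses, α = .05, two-sided H1); the variable names are illustrative, and the standard error is computed from the claimed proportion because H0 is assumed true.

```python
import math
from statistics import NormalDist

p_claimed = 0.04      # ps, the proportion claimed under H0
n = 500               # sample size
responses = 25        # observed responses
alpha = 0.05          # significance level assumed on the slide

p_observed = responses / n
# Standard error uses the claimed proportion because H0 is assumed true.
std_error = math.sqrt(p_claimed * (1 - p_claimed) / n)
z = (p_observed - p_claimed) / std_error

z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, critical = {z_critical:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if abs(z) > z_critical else "Cannot reject H0")
```

Under these assumptions z comes out near 1.14, below the two-sided critical value of about 1.96. That is why a test is needed even though p > ps: an observed 5% is within ordinary sampling variation of a true 4% at this sample size.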



t Test: An Example

Hardness after hardening by two quenching methods

Sample no.        1     2     3     4     5     6     7     8     9     10
Oil quenching     145   150   153   148   141   152   146   154   139   148
Water quenching   155   153   150   158   143   149   161   155   154   146

[Figure: Dotplot of Oil Quenching, Water Quenching; data axis from 141 to 159]

What can you conclude?

           Mean    s.d.
Oil Q      147.6   4.97
Water Q    152.4   5.46

$H_0: \mu_{oil} = \mu_{water}$, $H_A: \mu_{oil} \neq \mu_{water}$

$s_p = \sqrt{(9 \times 4.97^2 + 9 \times 5.46^2)/(10 + 10 - 2)} = 5.221$

$t_0 = (147.6 - 152.4) / [5.221 \times \sqrt{2/10}] = -2.06$

$t_{critical} = t_{.025,\,18} = 2.101 > |t_0|$

H0 cannot be rejected at the 95% level of confidence
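The pooled two-sample calculation above can be reproduced from the raw hardness data. This is a minimal sketch using only the Python standard library; `pooled_t` is an illustrative helper name, and the critical value 2.101 is copied from the slide rather than computed.

```python
import math
from statistics import mean, stdev

oil = [145, 150, 153, 148, 141, 152, 146, 154, 139, 148]
water = [155, 153, 150, 158, 143, 149, 161, 155, 154, 146]

def pooled_t(sample_1, sample_2):
    """Return (t0, pooled s.d., degrees of freedom) for an equal-variance t test."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * stdev(sample_1) ** 2
                  + (n2 - 1) * stdev(sample_2) ** 2) / df
    sp = math.sqrt(pooled_var)
    t0 = (mean(sample_1) - mean(sample_2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t0, sp, df

t0, sp, df = pooled_t(oil, water)
t_critical = 2.101  # t(.025, 18), taken from the slide rather than computed

print(f"sp = {sp:.3f}, t0 = {t0:.2f}, df = {df}")
print("Reject H0" if abs(t0) > t_critical else "Cannot reject H0")
```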



Pizza Experiment: Solution

Subject   1      2     3      4      5      6     7      8      9      10     Mean    s.d.
Pizza A   12.9   5.7   16.0   14.3   2.4    1.6   14.6   10.2   4.3    6.6    8.86    5.40
Pizza B   16.0   7.5   16.0   14.0   13.2   3.4   15.5   11.3   15.4   10.6   12.29   4.18

Two Sample t Test

Hypotheses: $H_0: \mu_A = \mu_B$, $H_1: \mu_A \neq \mu_B$

Pooled s.d. $= s_p = \sqrt{(5.40^2 + 4.18^2)/2} = 4.83$

$t = (12.29 - 8.86) / [4.83 \times \sqrt{1/5}] = 3.43/2.16 = 1.59 < t_{.025,\,18} = 2.101$

H0 cannot be rejected

Subject   1     2     3     4      5      6     7     8     9      10    Mean   s.d.
B - A     3.1   1.8   0.0   -0.3   10.8   1.8   0.9   1.1   11.1   4.0   3.43   4.17

Hypotheses: $H_0: \mu_{B-A} = 0$, $H_1: \mu_{B-A} \neq 0$

$t = 3.43 / [4.17/\sqrt{10}] = 2.60 > t_{.025,\,9} = 2.262$

H0 is rejected

There is sufficient evidence to claim that people prefer Pizza A more than Pizza B
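The contrast between the two analyses of the pizza data is easy to reproduce. The sketch below computes both t statistics from the scores in the table, using only the standard library; variable names are illustrative. The two-sample version treats the columns as independent samples, while the second version works on the within-subject differences B - A, which removes subject-to-subject variation.

```python
import math
from statistics import mean, stdev

pizza_a = [12.9, 5.7, 16.0, 14.3, 2.4, 1.6, 14.6, 10.2, 4.3, 6.6]
pizza_b = [16.0, 7.5, 16.0, 14.0, 13.2, 3.4, 15.5, 11.3, 15.4, 10.6]
n = len(pizza_a)

# Two-sample (pooled) t test: treats the two columns as independent samples.
sp = math.sqrt((stdev(pizza_a) ** 2 + stdev(pizza_b) ** 2) / 2)
t_two_sample = (mean(pizza_b) - mean(pizza_a)) / (sp * math.sqrt(2 / n))

# Test on differences: works on the within-subject differences B - A.
diffs = [b - a for a, b in zip(pizza_a, pizza_b)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))

print(f"two-sample t = {t_two_sample:.2f} (df = {2 * n - 2})")
print(f"paired t     = {t_paired:.2f} (df = {n - 1})")
```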



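F Test: Testing Variances of Two Populations

Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$
Test Statistic: $F_0 = s_1^2/s_2^2$, with $s_1 > s_2$
H0 Rejection criteria: $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$

Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$
Test Statistic: $F_0 = s_1^2/s_2^2$
H0 Rejection criteria: $F_0 > F_{\alpha,\,n_1-1,\,n_2-1}$

Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 < \sigma_2^2$
Test Statistic: $F_0 = s_2^2/s_1^2$
H0 Rejection criteria: $F_0 > F_{\alpha,\,n_2-1,\,n_1-1}$

In the previous pizza example, the standard deviations in the two cases are 5.40 and 4.18. Assuming a two sided alternative,

$F_0 = 5.40^2/4.18^2 = 1.67 < F_{.025,\,9,\,9} = 4.03$

H0 not rejected. Two variances are not significantly different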
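The two-sided F test on the pizza standard deviations can be sketched as follows. The code assumes the summary values quoted above (5.40 and 4.18, ten subjects each) and copies the critical value 4.03 from the slide rather than computing it; variable names are illustrative.

```python
s1, s2 = 5.40, 4.18      # sample standard deviations from the pizza example
n1, n2 = 10, 10          # sample sizes

# Two-sided alternative: put the larger variance in the numerator.
if s1 < s2:
    s1, s2, n1, n2 = s2, s1, n2, n1

f0 = s1 ** 2 / s2 ** 2
f_critical = 4.03        # F(.025, 9, 9), taken from the slide rather than computed

print(f"F0 = {f0:.2f} on ({n1 - 1}, {n2 - 1}) degrees of freedom")
print("Reject H0" if f0 > f_critical else "Cannot reject H0")
```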



Basic ANOVA Situation

We want to compare means of two or more groups

Analysis of variance (ANOVA) makes use of an omnibus F test that tells us if there is any significant difference anywhere among the groups

If the F test says no significant difference then there is no point in searching further

If the F test indicates significance then we may use other tools to find out where the difference is



An Example ANOVA Situation

Subjects: 25 patients with blisters

Treatments: Treatment A, Treatment B, Placebo

Measurement: # of days until blisters heal

Data [and means]:

A: 5, 6, 6, 7, 7, 8, 9, 10 [7.25]

B: 7, 7, 8, 9, 9, 10, 10, 11 [8.875]

P: 7, 9, 9, 10, 10, 10, 11, 12, 13 [10.11]

Are these differences significant?



Two Sources of Variability

In ANOVA, an estimate of variability between groups is compared with variability within groups.

Between-group variation is the variation among the means of the different treatment conditions due to chance (random sampling error) and treatment effects, if any exist.

Within-group variation is the variation due to chance (random sampling error) among individuals given the same treatment.

Total Variation in Data

Within Group Variation: Variation due to chance

Between Group Variation: Variation due to chance and treatment effects, if any



Variability Between Groups

There is a lot of variability from one mean to the next.

Large differences between means probably are not due to chance.

It is difficult to imagine that all six groups are random samples taken from the same population.

The null hypothesis is rejected, indicating a treatment effect in at least one of the groups.



Variability Within Groups

Same amount of variability between group means.

However, there is more variability within each group.

The larger the variability within each group, the less confident we can be that we are dealing with samples drawn from different populations.



The F Ratio

The ANOVA F-statistic is the ratio of the Between Group Variation to the Within Group Variation

$F = \frac{\text{Between Group Variation}}{\text{Within Group Variation}}$

A large F is evidence against H0, since it indicates that there is more difference between groups than within groups


$F = \frac{\text{Variability Between Groups}}{\text{Variability Within Groups}} > 1$

$F = \frac{\text{Variability Between Groups}}{\text{Variability Within Groups}} \approx 1$



Blister Experiment: Minitab ANOVA Output

Analysis of Variance for days

Source      DF   SS      MS      F      P
Treatment   2    34.74   17.37   6.45   0.006
Error       22   59.26   2.69
Total       24   94.00

DF for Treatment: 1 less than # of groups

DF for Error: # of data values - # of groups (equals df for each group added together)

DF for Total: 1 less than # of individuals (just like other situations)



Minitab ANOVA Output

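Analysis of Variance for days

Source      DF   SS      MS      F      P
Treatment   2    34.74   17.37   6.45   0.006
Error       22   59.26   2.69
Total       24   94.00

SS stands for sum of squares

Treatment SS: $\sum_{obs} (\bar{x}_i - \bar{x})^2$

Error SS: $\sum_{obs} (x_{ij} - \bar{x}_i)^2$

Total SS: $\sum_{obs} (x_{ij} - \bar{x})^2$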



Minitab ANOVA Output

Analysis of Variance for days

Source      DF   SS      MS      F      P
Treatment   2    34.74   17.37   6.45   0.006
Error       22   59.26   2.69
Total       24   94.00

MSG = SSG / DFG

MSE = SSE / DFE

F = MSG / MSE

P-value comes from F(DFG, DFE)
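The ANOVA table for the blister experiment can be rebuilt from the raw healing times. This is a minimal sketch using only the standard library; it computes the sums of squares, mean squares and F ratio by the definitions above, and leaves out the P-value, which would need an F distribution from a statistics package. The names `ssg`, `sse`, `msg` and `mse` follow the slide's notation.

```python
from statistics import mean

groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-group (treatment) and within-group (error) sums of squares.
ssg = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
sse = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

dfg = len(groups) - 1                 # 1 less than # of groups
dfe = len(all_values) - len(groups)   # # of data values - # of groups
msg, mse = ssg / dfg, sse / dfe
f_ratio = msg / mse

print(f"Treatment: DF={dfg}, SS={ssg:.2f}, MS={msg:.2f}, F={f_ratio:.2f}")
print(f"Error:     DF={dfe}, SS={sse:.2f}, MS={mse:.2f}")
print(f"Total:     DF={dfg + dfe}, SS={ssg + sse:.2f}")
```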



Contingency Table

             Outcome
Group        Defective   Non-defective   Total
Supplier 1   13 (26%)    37 (74%)        50
Supplier 2   6 (4%)      144 (96%)       150
Total        19          181             200

Y = Outcome; Y is discrete

It's a 2 x 2 contingency table

Is there any difference between the two suppliers?

Generally speaking, we want to know if there is any association between the groups and outcomes

In the above we have two categorical variables each at two levels. In general we can have a p x q x r x ... table

We shall discuss only p x q



Expected Frequencies

             Outcome
Group        Defective   Non-defective   Total
Supplier 1   13 [5]      37 [45]         50
Supplier 2   6 [14]      144 [136]       150
Total        19          181             200

[#] = Expected frequency under H0, i.e. no supplier difference

Under H0, probability of getting a defective = 19 / 200 = 0.095

So, under H0, the numbers of defectives expected in the samples are

Supplier 1: 50 x 0.095 = 5 (rounded to nearest integer)

Supplier 2: 150 x 0.095 = 14 (rounded to nearest integer)

Expected frequency of non-defectives can be found out similarly, or simply as (50 - 5) = 45 (supplier 1) and (150 - 14) = 136 (supplier 2)



Contingency Table $\chi^2$ Test

             Outcome
Group        Defective   Non-defective   Total
Supplier 1   13 [5]      37 [45]         50
Supplier 2   6 [14]      144 [136]       150
Total        19          181             200

$O_{ij}$ = Observed frequency

$E_{ij}$ = Expected frequency

$\chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

H0 will be rejected if $\chi^2 > \chi^2_{\alpha,\,(p-1)(q-1)}$

Expected frequencies should not be rounded

For our data

$\chi^2 = \frac{(13 - 4.75)^2}{4.75} + \frac{(37 - 45.25)^2}{45.25} + \frac{(6 - 14.25)^2}{14.25} + \frac{(144 - 135.75)^2}{135.75} = 21.11$

$\chi^2_{.05,\,1} = 3.84$

H0 is rejected. Two suppliers are significantly different
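The supplier comparison can be reproduced from the observed counts alone. The sketch below computes the unrounded expected frequencies as (row total x column total) / grand total and then the chi-square statistic; it uses plain Python lists, the variable names are illustrative, and the critical value 3.84 is copied from the slide rather than computed.

```python
observed = [
    [13, 37],    # Supplier 1: defective, non-defective
    [6, 144],    # Supplier 2: defective, non-defective
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under H0: (row total x column total) / grand total, unrounded.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
chi_critical = 3.84   # chi-square(.05, 1), taken from the slide rather than computed

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
print("Reject H0" if chi_square > chi_critical else "Cannot reject H0")
```

The same code works for any p x q table of counts, since the degrees of freedom are derived from the table's shape.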



Correlation

In practice we frequently find that a group of two or more variables move together

For simplicity, let us assume two continuous variables

The variables may be (Y1, Y2), (X1, X2) or (X, Y)

At this moment we are also not concerned whether the correlation is meaningful!

How do we measure the strength of correlation between the two variables?



Correlation Coefficient

The most popular measure of association between two continuous variables is the correlation coefficient r

It is defined in such a manner that it varies

Between -1 (perfect negative correlation)

Through 0 (absolutely no correlation)

To +1 (perfect positive correlation)

However we should never jump to compute the value of r. The first step in studying association between two variables is to plot the data in the form of a Scatter Diagram



Scatter Diagram and r

[Figures: six scatter diagrams illustrating r = +1.0, r = -1.0, r = +0.7, r = +0.3, r = -0.3 and r = 0]

Usefulness of Scatter Diagram

Drawing a scatter diagram should always be the first step in correlation analysis for the following reasons

Guards against gross computational error

Detects outliers, if any (may be genuine or data error)

Detects influential points, if any (may be genuine or data error)

Detects groups in data, if any

Detects nonlinearity (r is a measure of linear association only)

Provides a quick approximate solution

Easy to understand



Computing r

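Data: (Var 1, Var 2) = (x, y)

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$

Formulae for convenient and efficient computing

$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$

$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$

$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$

#       x    y    x^2   y^2   xy
1       2    6    4     36    12
2       6    8    36    64    48
3       1    4    1     16    4
4       4    7    16    49    28
5       3    4    9     16    12
6       5    9    25    81    45
7       8    10   64    100   80
Total   29   48   155   362   229

$S_{xy} = 229 - (29 \times 48)/7 = 30.14$

$S_{xx} = 155 - (29 \times 29)/7 = 34.86$

$S_{yy} = 362 - (48 \times 48)/7 = 32.86$

$r = 30.14/\sqrt{34.86 \times 32.86} = 0.891$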
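The hand calculation of r can be checked with the same computational formulae. This is a minimal sketch in plain Python using the seven (x, y) pairs from the table; the variable names `s_xy`, `s_xx` and `s_yy` mirror the slide's Sxy, Sxx and Syy.

```python
import math

x = [2, 6, 1, 4, 3, 5, 8]
y = [6, 8, 4, 7, 4, 9, 10]
n = len(x)

# Computational forms: S_xy = sum(xy) - sum(x)sum(y)/n, and likewise for S_xx, S_yy.
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
s_yy = sum(b * b for b in y) - sum(y) ** 2 / n

r = s_xy / math.sqrt(s_xx * s_yy)

print(f"Sxy = {s_xy:.2f}, Sxx = {s_xx:.2f}, Syy = {s_yy:.2f}, r = {r:.3f}")
```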


Regression Analysis

As in the correlation analysis we have n observations of the form (X, Y) or (X1, X2, X3, ..., Xk, Y)

Y is called the dependent variable and the Xs independent variables

Objective is to develop an equation relating Y to the Xs for predicting Y in the data range

Simple linear regression

$Y = \alpha + \beta X + \varepsilon$, where $\varepsilon$ is the random error component

Multiple linear regression

$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$

All or a few of the Xs may be a function of x1

Nonlinear regression

Here we shall discuss only simple linear regression


Regression Analysis: An Example

Given one variable

Goal: Predict Y

Example: Given Years of Experience, predict Salary

Questions: When X = 10, what is Y? When X = 25, what is Y?

This is known as regression

X (years)   Y (salary in Rs. 1,000)
3           30
8           57
9           64
13          72
3           36
6           43
11          59
21          90
1           20


Obtaining the Best Fit Line

1. Plot the data

[Figure: Scatterplot of Salary vs Years of Experience]





2. Then obtain the best fit line

[Figure: Linear Regression: Y = 3.5*X + 23.2, plotted as Salary against Years]





The best fit line is obtained by minimizing the deviations

Usually, we minimize the square of the deviations for estimating $\alpha$ and $\beta$

Hence the name Least Square Method


Regression Formulae

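$\hat{Y} = \hat{\alpha} + \hat{\beta} X$

$\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$

Using our earlier notation we have $\hat{\beta} = S_{xy}/S_{xx}$

Exercise: Estimate $\alpha$ and $\beta$ for the salary data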
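One way to work the exercise is to apply the least squares formulae directly. The sketch below is fitted to the nine (X, Y) pairs listed in the salary table and nothing else, so its estimates need not match a line quoted elsewhere in the deck if that line was fitted to a different or fuller data set. The names `beta_hat`, `alpha_hat` and `predict` are illustrative.

```python
x = [3, 8, 9, 13, 3, 6, 11, 21, 1]          # years of experience
y = [30, 57, 64, 72, 36, 43, 59, 90, 20]    # salary in Rs. 1,000
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)

beta_hat = s_xy / s_xx                  # slope
alpha_hat = y_bar - beta_hat * x_bar    # intercept

def predict(years):
    """Predicted salary (Rs. 1,000) from the fitted line."""
    return alpha_hat + beta_hat * years

print(f"slope = {beta_hat:.2f}, intercept = {alpha_hat:.2f}")
print(f"predicted salary at X = 10: {predict(10):.1f}")
```

Note that X = 10 lies inside the observed range of experience (1 to 21 years), whereas X = 25 lies outside it, so a prediction there is an extrapolation.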


Goodness of Fit

$R^2 = \frac{\text{Sum of squares due to regression}}{\text{Total sum of squares}} = \frac{SSR}{TSS} = \frac{\hat{\beta} S_{xy}}{S_{yy}}$

Note that $R^2$ is nothing but the square of the correlation coefficient r

For practical purposes, usefulness of the equation is judged by the standard error s

Error sum of squares $= SSE = TSS - SSR$

Mean square error $= MSE = SSE/(n - 2)$

Standard error $= s = \sqrt{MSE}$

Approximate prediction band $= \pm 2s$
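The goodness-of-fit quantities can be computed for the salary data in the same way. This is a minimal sketch on the nine listed (X, Y) pairs, following the definitions above (TSS = Syy, SSR = slope x Sxy, SSE = TSS - SSR, MSE = SSE/(n - 2)); the variable names are illustrative.

```python
import math

x = [3, 8, 9, 13, 3, 6, 11, 21, 1]          # years of experience
y = [30, 57, 64, 72, 36, 43, 59, 90, 20]    # salary in Rs. 1,000
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)      # total sum of squares (TSS)

beta_hat = s_xy / s_xx
ssr = beta_hat * s_xy          # sum of squares due to regression
r_squared = ssr / s_yy
sse = s_yy - ssr               # error sum of squares
mse = sse / (n - 2)            # mean square error
s = math.sqrt(mse)             # standard error

print(f"R^2 = {r_squared:.3f}, s = {s:.2f}, approximate band = +/- {2 * s:.1f}")
```

Because R^2 equals the squared correlation coefficient, `r_squared` can be cross-checked against `s_xy ** 2 / (s_xx * s_yy)`.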
