-
7/29/2019 Ch 5 BB Analyze
1/83
Chapter 5
Analyze
-
Objective
In the measure phase, our focus was primarily on Y.
In the analyze phase, we shall focus on the Xs.
The objectives are:
To identify and establish the root cause(s) of the problem (i.e. the Xs affecting Y)
[If required] To develop a useful function Y = f(X)
To identify the important Xs for experimentation
-
Types of Factors (Xs)
Noise Factors: Neither can the effect difference among levels be reduced, nor can the best level be picked for implementation.
Example: Environmental condition, measurement method, degradation time, supply voltage, tolerances on control factors
Block Factors: The best level can't be picked for implementation, but the effect difference among levels can be reduced. One or more control factors and/or noise factors are associated with a (naturally occurring) block.
Example: Geographical location, shift of operation, spindle number, supplier
-
Notes: Analysis Steps
Validate measurement
  Develop operating definition for each X
  Perform gauge R&R for each X
Select probable causes
  Baselining of each X
  Is/Is Not testing
  Importance rating
Develop and test hypotheses
  Develop hypothesis
  Collect data
  Test hypothesis
-
Brainstorming
Brainstorming is a part of problem solving in which a group of people put social inhibitions and norms aside to generate as many new ideas as possible (regardless of their initial worth).
-
CE Diagram: Dispersion Analysis Type
[Figure: cause-and-effect diagram for the effect "Poor Data Quality"; labels include Sampling, Sample collection, Design implementation, Sampling design, Requirement, Accuracy, Cost, Unit sampling cost, Sample size, Stratification]
Why does the dispersion in Sampling occur? Because of sampling design, design implementation and sample collection.
Why does the dispersion in Sampling design occur? Because of requirement variation.
-
Bad CE Diagram
[Figure: examples of bad CE diagrams]
(1) Very simple looking
(2) Contains very few causes
(3) Complicated looking, but full of non-actionable causes
(4) Complicated looking, but contains many causes having no effect
A good CE diagram is one that is easy to use, leads to action, and whose actions are likely to yield results.
-
Example Solution: One Tail
$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{372.5 - 368}{15/\sqrt{36}} = 1.80$
Observed t = 1.80 does not fall in the rejection region
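A minimal Python sketch of the one-sample t statistic $t = (\bar{X} - \mu_0)/(S/\sqrt{n})$. The inputs are our reading of the garbled slide values (sample mean 372.5, hypothesised mean 368, S = 15, n = 36), so treat them as an assumption; the one-tail critical value $t_{\alpha, n-1}$ still has to come from a t table.

```python
import math


def one_sample_t(x_bar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t statistic (x_bar - mu0) / (s / sqrt(n)), with n - 1 df."""
    return (x_bar - mu0) / (s / math.sqrt(n))


# Assumed inputs, read from the slide: x_bar = 372.5, mu0 = 368, s = 15, n = 36
t = one_sample_t(x_bar=372.5, mu0=368.0, s=15.0, n=36)
print(f"t = {t:.2f} with {36 - 1} degrees of freedom")
```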
-
Example: Z Test for Proportion
Problem: A marketing company claims that it receives 4% responses from its mailing.
Approach: To test this claim, a random sample of 500 was surveyed, with 25 responses. So the observed p = .05 is more than the claimed ps = .04.
Solution: Z test (assume α = .05). Why test at all, since p > ps? Both-sided H1.
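One way to carry out this Z test, as a Python sketch using only the standard library: the statistic is $Z = (p - p_s)/\sqrt{p_s(1 - p_s)/n}$, with the standard error computed under the claimed proportion, and the two-sided critical value comes from `statistics.NormalDist`. The function name is ours. By this computation Z is about 1.14, below 1.96, so the sample does not contradict the 4% claim at α = .05.

```python
import math
from statistics import NormalDist


def z_test_proportion(successes: int, n: int, p0: float) -> float:
    """Z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


alpha = 0.05
z = z_test_proportion(successes=25, n=500, p0=0.04)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided H1, about 1.96
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {abs(z) > z_crit}")
```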
-
t Test: An Example
Sample no.       1    2    3    4    5    6    7    8    9    10
Oil quenching    145  150  153  148  141  152  146  154  139  148
Water quenching  155  153  150  158  143  149  161  155  154  146
Hardness after hardening by two quenching methods
[Figure: Dotplot of Oil Quenching, Water Quenching]
What can you conclude?
          Mean   s.d.
Oil Q     147.6  4.97
Water Q   152.4  5.46
$s_p = \sqrt{\frac{9 \times 4.97^2 + 9 \times 5.46^2}{10 + 10 - 2}} = 5.221$
$t_0 = \frac{147.6 - 152.4}{5.221\sqrt{2/10}} = -2.06$
$t_{critical} = t_{0.025,18} = 2.101 > |t_0|$
$H_0: \mu_{oil} = \mu_{water}$, $H_A: \mu_{oil} \neq \mu_{water}$
H0 cannot be rejected at the 95% level of confidence.
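The pooled two-sample t test for the quenching data can be sketched in Python with the standard `statistics` module. The critical value 2.101 is taken from the slide rather than computed, since the standard library has no t distribution; `pooled_t` is a name introduced here. Working from the raw data, sp comes out as about 5.222 rather than 5.221 because the slide rounds the standard deviations first; t0 is still -2.06.

```python
import math
from statistics import mean, stdev

# Hardness after hardening, ten samples per quenching method
oil = [145, 150, 153, 148, 141, 152, 146, 154, 139, 148]
water = [155, 153, 150, 158, 143, 149, 161, 155, 154, 146]


def pooled_t(a, b):
    """Return (sp, t0) for a two-sample t test assuming equal variances."""
    na, nb = len(a), len(b)
    # Pooled s.d.: weight each sample variance by its degrees of freedom
    sp = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    t0 = (mean(a) - mean(b)) / (sp * math.sqrt(1 / na + 1 / nb))
    return sp, t0


sp, t0 = pooled_t(oil, water)
t_crit = 2.101  # t(0.025, 18) from a t table, as quoted on the slide
print(f"sp = {sp:.3f}, t0 = {t0:.2f}, reject H0: {abs(t0) > t_crit}")
```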
-
Pizza Experiment: Solution
Subject  1     2    3     4     5     6    7     8     9     10    Mean   s.d.
Pizza A  12.9  5.7  16.0  14.3  2.4   1.6  14.6  10.2  4.3   6.6   8.86   5.40
Pizza B  16.0  7.5  16.0  14.0  13.2  3.4  15.5  11.3  15.4  10.6  12.29  4.18
Two Sample t Test
Hypotheses: $H_0: \mu_A = \mu_B$, $H_1: \mu_A \neq \mu_B$
Pooled s.d.: $s_p = \sqrt{(5.40^2 + 4.18^2)/2} = 4.83$
$t = \frac{12.29 - 8.86}{4.83\sqrt{1/5}} = \frac{3.43}{2.16} = 1.59 < t_{0.025,18} = 2.101$
H0 cannot be rejected
Paired t Test
Subject  1    2    3    4     5     6    7    8    9     10   Mean  s.d.
B - A    3.1  1.8  0.0  -0.3  10.8  1.8  0.9  1.1  11.1  4.0  3.43  4.17
Hypotheses: $H_0: \mu_{B-A} = 0$, $H_1: \mu_{B-A} \neq 0$
$t = \frac{3.43}{4.17/\sqrt{10}} = 2.60 > t_{0.025,9} = 2.262$
H0 is rejected
There is sufficient evidence to claim that people prefer Pizza A more than Pizza B
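Both analyses of the pizza data can be reproduced from the raw scores in a Python sketch (standard library only; critical values are the table values quoted on the slide). The contrast shows why the design matters: pairing removes the subject-to-subject variation, so the same 3.43 mean difference gives t of about 1.59 when the samples are treated as independent but about 2.60 when they are paired.

```python
import math
from statistics import mean, stdev

pizza_a = [12.9, 5.7, 16.0, 14.3, 2.4, 1.6, 14.6, 10.2, 4.3, 6.6]
pizza_b = [16.0, 7.5, 16.0, 14.0, 13.2, 3.4, 15.5, 11.3, 15.4, 10.6]
n = len(pizza_a)

# Two-sample t: treats A and B as independent groups (equal n, so sp^2 is the average variance)
sp = math.sqrt((stdev(pizza_a) ** 2 + stdev(pizza_b) ** 2) / 2)
t_two_sample = (mean(pizza_b) - mean(pizza_a)) / (sp * math.sqrt(2 / n))

# Paired t: uses each subject's difference B - A, removing subject-to-subject variation
diffs = [b - a for a, b in zip(pizza_a, pizza_b)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))

print(f"two-sample t = {t_two_sample:.2f} (compare t(0.025, 18) = 2.101)")
print(f"paired t     = {t_paired:.2f} (compare t(0.025, 9) = 2.262)")
```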
-
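F Test: Testing Variances of Two Populations
Hypotheses / Test statistic / H0 rejection criteria:
$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$; $F_0 = s_1^2/s_2^2$ with $s_1^2 > s_2^2$; reject H0 if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$
$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$; $F_0 = s_1^2/s_2^2$; reject H0 if $F_0 > F_{\alpha,\,n_1-1,\,n_2-1}$
$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 < \sigma_2^2$; $F_0 = s_2^2/s_1^2$; reject H0 if $F_0 > F_{\alpha,\,n_2-1,\,n_1-1}$
In the previous pizza example, the standard deviations in the two cases are 5.40 and 4.18. Assuming a two-sided alternative,
$F_0 = 5.40^2/4.18^2 = 1.67 < F_{0.025,9,9} = 4.03$
H0 not rejected: the two variances are not significantly different.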
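The two-sided F ratio for the pizza data, as a small Python sketch. `f_ratio` is a hypothetical helper that puts the larger variance in the numerator and returns the matching degrees of freedom; the critical value is the F-table value quoted for this example, not computed.

```python
def f_ratio(s1: float, n1: int, s2: float, n2: int):
    """Two-sided F statistic with the larger sample variance in the numerator.

    Returns (F0, numerator df, denominator df).
    """
    if s1 >= s2:
        return s1 ** 2 / s2 ** 2, n1 - 1, n2 - 1
    return s2 ** 2 / s1 ** 2, n2 - 1, n1 - 1


f0, df1, df2 = f_ratio(5.40, 10, 4.18, 10)  # pizza A and B standard deviations
f_crit = 4.03  # F(0.025, 9, 9) from an F table
print(f"F0 = {f0:.2f} on ({df1}, {df2}) df, reject H0: {f0 > f_crit}")
```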
-
Basic ANOVA Situation
We want to compare the means of two or more groups.
Analysis of variance (ANOVA) makes use of an omnibus F test that tells us whether there is any significant difference anywhere among the groups.
If the F test says there is no significant difference, then there is no point in searching further.
If the F test indicates significance, then we may use other tools to find out where the difference is.
-
An Example ANOVA Situation
Subjects: 25 patients with blisters
Treatments: Treatment A, Treatment B, Placebo
Measurement: # of days until blisters heal
Data [and means]:
A: 5, 6, 6, 7, 7, 8, 9, 10 [7.25]
B: 7, 7, 8, 9, 9, 10, 10, 11 [8.875]
P: 7, 9, 9, 10, 10, 10, 11, 12, 13 [10.11]
Are these differences significant?
-
Two Sources of Variability
In ANOVA, an estimate of variability between groups is compared with variability within groups.
Between-group variation is the variation among the means of the different treatment conditions due to chance (random sampling error) and treatment effects, if any exist.
Within-group variation is the variation due to chance (random sampling error) among individuals given the same treatment.
Total Variation in Data
  Within Group Variation: variation due to chance
  Between Group Variation: variation due to chance and treatment effects, if any
-
Variability Between Groups
There is a lot of variability from one mean to the next.
Large differences between means probably are not due to chance.
It is difficult to imagine that all six groups are random samples taken from the same population.
The null hypothesis is rejected, indicating a treatment effect in at least one of the groups.
-
Variability Within Groups
Same amount of variability between group means.
However, there is more variability within each group.
The larger the variability within each group, the less confident we can be that we are dealing with samples drawn from different populations.
-
The F Ratio
The ANOVA F-statistic is the ratio of the Between Group Variation to the Within Group Variation:
$F = \frac{\text{Between Group Variation}}{\text{Within Group Variation}}$
A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.
-
Two Sources of Variability
$F = \frac{\text{Variability Between Groups}}{\text{Variability Within Groups}}$
$F > 1$
-
Two Sources of Variability
$F = \frac{\text{Variability Between Groups}}{\text{Variability Within Groups}}$
$F \approx 1$
-
Blister Experiment: Minitab ANOVA Output
Analysis of Variance for days
Source      DF   SS      MS      F      P
Treatment    2   34.74   17.37   6.45   0.006
Error       22   59.26    2.69
Total       24   94.00
Treatment DF (2): 1 less than # of groups
Error DF (22): # of data values - # of groups (equals the df for each group added together)
Total DF (24): 1 less than # of individuals (just like other situations)
-
Minitab ANOVA Output
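Analysis of Variance for days
Source      DF   SS      MS      F      P
Treatment    2   34.74   17.37   6.45   0.006
Error       22   59.26    2.69
Total       24   94.00
SS stands for sum of squares:
SS Treatment = $\sum_{obs} (\bar{x}_i - \bar{x})^2$
SS Error = $\sum_{obs} (x_{ij} - \bar{x}_i)^2$
SS Total = $\sum_{obs} (x_{ij} - \bar{x})^2$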
-
Minitab ANOVA Output
MSG = SSG / DFG
MSE = SSE / DFE
Analysis of Variance for days
Source      DF   SS      MS      F      P
Treatment    2   34.74   17.37   6.45   0.006
Error       22   59.26    2.69
Total       24   94.00
F = MSG / MSE
P-value comes from F(DFG, DFE)
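The blister ANOVA table can be reproduced with a short Python sketch of the SS, MS and F definitions (standard library only; `one_way_anova` is a name introduced here). A P-value normally needs the F distribution (from a table or a library such as SciPy); this sketch instead uses the closed form for the upper tail of F(2, d), $(1 + 2F/d)^{-d/2}$, which applies only because there are three groups. It should give SSG of about 34.74, SSE of about 59.26, F of about 6.45 and P of about 0.006, in line with the Minitab output.

```python
from statistics import mean

# Days until blisters heal, per treatment group
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "Placebo": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}


def one_way_anova(groups):
    """Return (DFG, DFE, SSG, SSE, F) for a one-way ANOVA."""
    values = [x for xs in groups.values() for x in xs]
    grand_mean = mean(values)
    group_means = {name: mean(xs) for name, xs in groups.items()}
    # SSG: every observation contributes (group mean - grand mean)^2
    ssg = sum(len(xs) * (group_means[name] - grand_mean) ** 2 for name, xs in groups.items())
    # SSE: every observation contributes (x_ij - group mean)^2
    sse = sum((x - group_means[name]) ** 2 for name, xs in groups.items() for x in xs)
    dfg = len(groups) - 1
    dfe = len(values) - len(groups)
    f = (ssg / dfg) / (sse / dfe)
    return dfg, dfe, ssg, sse, f


dfg, dfe, ssg, sse, f = one_way_anova(groups)
# Upper tail of F(2, dfe) in closed form; valid only because DFG = 2 here
p_value = (1 + 2 * f / dfe) ** (-dfe / 2)
print(f"SSG = {ssg:.2f}, SSE = {sse:.2f}, F = {f:.2f}, P = {p_value:.3f}")
```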
-
Contingency Table
             Outcome
Group        Defective   Non-defective   Total
Supplier 1   13 (26%)    37 (74%)        50
Supplier 2   6 (4%)      144 (96%)       150
Total        19          181             200
Y = Outcome; Y is discrete
It's a 2 x 2 contingency table
Is there any difference between the two suppliers?
Generally speaking, we want to know if there is any association between the groups and the outcomes.
In the above we have two categorical variables, each at two levels. In general we can have a p x q x r x ... table.
We shall discuss only p x q.
-
Expected Frequencies
Observed frequencies, with the expected frequency under H0 (i.e. no supplier difference) in parentheses:
             Outcome
Group        Defective   Non-defective   Total
Supplier 1   13 (5)      37 (45)         50
Supplier 2   6 (14)      144 (136)       150
Total        19          181             200
Under H0, probability of getting a defective = 19 / 200 = 0.095
So, under H0, the numbers of defectives expected in the samples are
Supplier 1: 50 x 0.095 = 5 (rounded to nearest integer)
Supplier 2: 150 x 0.095 = 14 (rounded to nearest integer)
Expected frequency of non-defectives can be found similarly, or simply as (50 - 5) = 45 (supplier 1) and (150 - 14) = 136 (supplier 2)
-
Contingency Table χ² Test
Observed frequencies, with expected frequencies in parentheses:
             Outcome
Group        Defective   Non-defective   Total
Supplier 1   13 (5)      37 (45)         50
Supplier 2   6 (14)      144 (136)       150
Total        19          181             200
$\chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Oij = Observed frequency
Eij = Expected frequency
H0 will be rejected if $\chi^2 > \chi^2_{\alpha,(p-1)(q-1)}$
Expected frequencies should not be rounded. For our data:
$\chi^2 = \frac{(13 - 4.75)^2}{4.75} + \frac{(37 - 45.25)^2}{45.25} + \frac{(6 - 14.25)^2}{14.25} + \frac{(144 - 135.75)^2}{135.75} = 21.11$
$\chi^2_{0.05,1} = 3.84$
H0 is rejected. The two suppliers are significantly different.
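The χ² computation can be checked with a short Python sketch (`chi_square` is a name introduced here). It builds unrounded expected counts as row total x column total / grand total, which is the same as the 19/200 = 0.095 argument for defectives, then sums (O - E)²/E. For the 2 x 2 case only, the critical value and p-value use the fact that a χ² variable with 1 df is the square of a standard normal, so `statistics.NormalDist` and `math.erfc` suffice; larger tables would need a χ² table or a statistics library.

```python
import math
from statistics import NormalDist

# Observed counts: rows = suppliers, columns = (defective, non-defective)
observed = [
    [13, 37],
    [6, 144],
]


def chi_square(table):
    """Return (chi2, df) for a p x q contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count, not rounded
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square(observed)
# Valid for df = 1 only: chi-square(1) is Z^2, so its critical value is z(alpha/2)^2
# and its upper-tail probability is erfc(sqrt(chi2 / 2)).
crit = NormalDist().inv_cdf(1 - 0.05 / 2) ** 2
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.2f}, df = {df}, critical = {crit:.2f}, p = {p_value:.1e}")
```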
-
Correlation
In practice we frequently find that a group of two or more variables move together.
For simplicity, let us assume two continuous variables. The variables may be (Y1, Y2), (X1, X2) or (X, Y).
At this point we are also not concerned with whether the correlation is meaningful!
How do we measure the strength of correlation between the two variables?
-
Correlation Coefficient
The most popular measure of association between two continuous variables is the correlation coefficient r.
It is defined so that it varies
from -1 (perfect negative correlation)
through 0 (no linear correlation)
to +1 (perfect positive correlation).
However, we should never jump straight to computing r. The first step in studying the association between two variables is to plot the data in the form of a scatter diagram.
-
Scatter Diagram and r
r = +1.0
-
Scatter Diagram and r
r = -1.0
-
Scatter Diagram and r
r = +0.7
-
Scatter Diagram and r
r = +0.3
-
Scatter Diagram and r
r = -0.3
-
Scatter Diagram and r
r = 0
-
Usefulness of Scatter Diagram
Drawing a scatter diagram should always be the first step in correlation analysis, for the following reasons:
Guards against gross computational error
Detects outliers, if any (may be genuine or data error)
Detects influential points, if any (may be genuine or data error)
Detects groups in data, if any
Detects nonlinearity (r is a measure of linear association only)
Provides a quick approximate solution
Easy to understand
-
Computing r
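Data: (Var 1, Var 2) = (x, y)
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
Formulae for convenient and efficient computing:
$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$
$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$
$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$
#      x    y    x²    y²    xy
1      2    6    4     36    12
2      6    8    36    64    48
3      1    4    1     16    4
4      4    7    16    49    28
5      3    4    9     16    12
6      5    9    25    81    45
7      8    10   64    100   80
Total  29   48   155   362   229
Sxy = 229 - (29 * 48) / 7 = 30.14
Sxx = 155 - (29 * 29) / 7 = 34.86
Syy = 362 - (48 * 48) / 7 = 32.86
r = 30.14 / √(34.86 * 32.86) = 0.891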
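A short Python sketch of the computing formulas, applied to the seven (x, y) pairs in the table; `pearson_r` is a name introduced here, not a library function. It should reproduce r of about 0.891.

```python
import math

# (x, y) pairs from the worked table
x = [2, 6, 1, 4, 3, 5, 8]
y = [6, 8, 4, 7, 4, 9, 10]


def pearson_r(x, y):
    """Correlation coefficient via the computing formulas for Sxy, Sxx, Syy."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(x, y)
print(f"r = {r:.3f}")
```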
-
Regression Analysis
As in correlation analysis, we have n observations of the form (X, Y) or (X1, X2, X3, ..., Xk, Y).
Y is called the dependent variable and the Xs the independent variables.
The objective is to develop an equation relating Y to the Xs for predicting Y within the data range.
Simple linear regression: $Y = \alpha + \beta X + \varepsilon$, where $\varepsilon$ is the random error component
Multiple linear regression: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$. All or a few of the Xs may be functions of x1 (e.g. X2 = x1²).
Nonlinear regression
Here we shall discuss only simple linear regression.
-
Regression Analysis: An Example
Given: one variable. Goal: predict Y.
Example: Given years of experience, predict salary.
Questions: When X = 10, what is Y? When X = 25, what is Y? This is known as regression.
X (years)   Y (salary in Rs. 1,000)
3           30
8           57
9           64
13          72
3           36
6           43
11          59
21          90
1           20
-
Obtaining the Best Fit Line
[Figure: Scatterplot of Salary vs Years of Experience]
1. Plot the data
-
Obtaining the Best Fit Line
[Figure: Salary vs Years with the fitted line, Linear Regression: Y = 3.5*X + 23.2]
2. Then obtain the best fit line
-
Obtaining the Best Fit Line
by minimizing the deviations.
Usually, we minimize the sum of the squared deviations for estimating α and β.
Hence the name Least Squares Method.
-
Regression Formulae
$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$a = \bar{Y} - b\bar{X}$
Using our earlier notation, we have $b = S_{xy}/S_{xx}$
Exercise: Estimate α and β for the salary data
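A possible answer to the exercise, sketched in Python with $b = S_{xy}/S_{xx}$ and $a = \bar{Y} - b\bar{X}$; the function name `least_squares` is ours. Without intermediate rounding it gives b of about 3.46 and a of about 23.5; the plotted line Y = 3.5*X + 23.2 matches rounding the slope to 3.5 first (52.33 - 3.5 x 8.33 ≈ 23.2). Note that X = 25 lies outside the observed range (1 to 21 years), so the second question asks for an extrapolation rather than a prediction within the data range.

```python
# Salary data: X = years of experience, Y = salary in Rs. 1,000
years = [3, 8, 9, 13, 3, 6, 11, 21, 1]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20]


def least_squares(x, y):
    """Return (a, b) minimising the sum of squared deviations of y from a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(years, salary)
print(f"Y = {a:.2f} + {b:.3f} X")
print(f"predicted salary at X = 10: {a + b * 10:.1f} thousand Rs.")
```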
-
Goodness of Fit
$R^2 = \frac{\text{Sum of squares due to regression}}{\text{Total sum of squares}} = \frac{SSR}{TSS} = \frac{b S_{xy}}{S_{yy}}$
Note that R² is nothing but the square of the correlation coefficient r.
For practical purposes, the usefulness of the equation is judged by the standard error s:
Error sum of squares: SSE = TSS - SSR
Mean square error: MSE = SSE / (n - 2)
Standard error: $s = \sqrt{MSE}$
Approximate prediction band: $\pm 2s$
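To check the salary fit numerically, here is a Python sketch of R² = bSxy/Syy, SSE = TSS - SSR, MSE = SSE/(n - 2) and s = √MSE. `goodness_of_fit` is a name we introduce, and the data are repeated so the block runs on its own. By this computation R² is about 0.94, which equals r² for r of about 0.968, and s is about 5.95 (thousand Rs.).

```python
import math

# Salary data: X = years of experience, Y = salary in Rs. 1,000
years = [3, 8, 9, 13, 3, 6, 11, 21, 1]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20]


def goodness_of_fit(x, y):
    """Return (R^2, standard error s) for the least-squares line through (x, y)."""
    n = len(x)
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n  # total sum of squares (TSS)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    slope = s_xy / s_xx
    ssr = slope * s_xy  # sum of squares due to regression
    sse = s_yy - ssr  # error sum of squares
    mse = sse / (n - 2)
    return ssr / s_yy, math.sqrt(mse)


r2, s = goodness_of_fit(years, salary)
print(f"R^2 = {r2:.3f}, s = {s:.2f}")
```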