statistical controls for qc
TRANSCRIPT
1
Statistical Tools for the Quality Control Laboratory and Validation Studies:
Session 1
l STEVEN S. KUWAHARA, Ph.D. l GXP BioTechnology LLC
l PMB #506 l 1669-2 Hollenbeck Avenue
l Sunnyvale, CA 94087-5042 USA l Tel. & FAX 408-530-9338
l e-Mail: [email protected] l Website: www.gxpbiotech.org
IVTPHL1012S1
2
NORMAL DISTRIBUTION
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{X_i-\mu}{\sigma}\right)^{2}}$$
6
NORMAL DISTRIBUTION PROPERTIES
l The normal distribution has the following properties:
l Bell-shaped
l Unimodal
l Symmetrical
l Extends from -∞ to +∞ (tails never reach zero frequency)
l Same value for mean, median, and mode
l This pattern of variation is common for manufacturing processes.
8
VARIANCE (S²)
$$S^2 = \frac{\sum (X_i-\bar{X})^2}{n-1} = \frac{\sum X_i^2 - \dfrac{(\sum X_i)^2}{n}}{n-1} = \frac{n\sum X_i^2 - (\sum X_i)^2}{n(n-1)}$$
Averages and Standard Deviations and the SEM. 1.
l All of the n measurements that go into the mean (x̄) must be measurements of the same thing.
l The mean of fruits and the mean of oranges are different things unless all of the fruits are oranges.
l But then it is still the mean of oranges, not fruits.
l The standard deviation (s) is a measure of the variation among the n components of x̄, NOT the variation of x̄ itself.
l Thus the next item (n + 1) from the original population should have a 95% chance of being within ±1.96s of x̄, but this does not apply to the next average (x̄₁).
Averages and Standard Deviations and the SEM. 2.
l The variation in the averages is the standard error of the mean (SEM), which is s/√n.
l Thus the next average (x̄₁) has a 95% probability of being within ±1.96(s/√n), or ±1.96 SEM, of the original mean (x̄).
l When dealing with single numbers, s is used, but when dealing with means the SEM is the number to use.
l It is incorrect to use s to set a specification on a value that is actually an average.
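The distinction between s and the SEM can be sketched numerically. The snippet below is only an illustration: it reuses the six %H results from the CHON example later in this deck, and the variable names are assumptions made for this sketch. It compares the approximate 95% limits for the next single result (based on s) with those for the next average of n results (based on the SEM).

```python
import math
import statistics

# Six %H results reused from the CHON example later in this deck.
data = [9.17, 9.09, 9.14, 9.10, 9.13, 9.27]

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
sem = s / math.sqrt(n)          # standard error of the mean

# Approximate 95% limits using the normal multiplier 1.96 quoted on the slides.
limits_single = (mean - 1.96 * s, mean + 1.96 * s)      # next single result
limits_mean = (mean - 1.96 * sem, mean + 1.96 * sem)    # next average of n results

print(f"mean = {mean:.3f}, s = {s:.4f}, SEM = {sem:.4f}")
print(f"next single result: {limits_single[0]:.3f} to {limits_single[1]:.3f}")
print(f"next average (n={n}): {limits_mean[0]:.3f} to {limits_mean[1]:.3f}")
```

The interval for the next average is narrower by a factor of √n, which is why a specification on an averaged result should be built from the SEM rather than from s.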
11
RANGE AND C.V.
l The range can be related to the standard deviation for n<16.
$$s = \frac{X_L - X_s}{d_2}, \qquad d_2 \text{ is a tabular value.}$$
$$C.V. = \frac{S}{\bar{X}} \times 100\% = RSD$$
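A minimal sketch of both quantities follows. The d2 constants are the commonly tabulated control-chart values for n = 2 to 10; they are an assumption here, since the slide only says that d2 is a tabular value. The function names are hypothetical, and the data are the Lot B Karl Fischer results used later in this deck.

```python
import statistics

# d2 constants for sample sizes 2..10 (standard control-chart table values;
# an assumption here, because the slide does not list them).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}

def s_from_range(values):
    """Estimate the standard deviation as (largest - smallest) / d2."""
    return (max(values) - min(values)) / D2[len(values)]

def cv_percent(values):
    """Coefficient of variation (RSD) as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Lot B Karl Fischer results from the known-variance example in this deck.
lot_b = [0.53, 0.56, 0.51, 0.53, 0.50]

print(f"s from range  = {s_from_range(lot_b):.4f}")
print(f"s from formula = {statistics.stdev(lot_b):.4f}")
print(f"C.V. = {cv_percent(lot_b):.2f}%")
```

For small samples the range-based estimate is usually close to the calculated s, which is what makes it a convenient shortcut.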
12
F - TEST
$$F_{\alpha,\,df_1,\,df_2} = \frac{s_1^2}{s_2^2}$$
Note: This is slightly different from the F-test that is used for ANOVA and factorial experiments.
Note: F0.05,3,3 = 9.28, F0.05,10,10 = 2.98
13
Student’s t
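Basic form:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad df = n - 1$$
Independent averages, known variances:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$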
14
t-TEST vs THEORETICAL OR KNOWN VALUE
l CHON Analysis. 9.55% H calculated.
l Data: 9.17, 9.09, 9.14, 9.10, 9.13, 9.27. n = 6, x̄ = 9.15, s = ±0.0654
l t0.05/2, 5 = 2.57, t0.01/2, 5 = 4.032, t0.001/2, 5 = 6.869, p < 0.001
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{9.15 - 9.55}{0.0654/\sqrt{6}} = -14.98$$
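The calculation on this slide can be reproduced with a short sketch. The critical values are copied from the slide rather than computed, and the variable names are assumptions made for this illustration.

```python
import math
import statistics

data = [9.17, 9.09, 9.14, 9.10, 9.13, 9.27]   # measured %H
mu = 9.55                                      # theoretical %H

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)
t = (xbar - mu) / (s / math.sqrt(n))

# Two-sided critical t values for 5 degrees of freedom, as quoted on the slide.
critical = {0.05: 2.57, 0.01: 4.032, 0.001: 6.869}
significant = {alpha: abs(t) > t_crit for alpha, t_crit in critical.items()}

print(f"x-bar = {xbar:.2f}, s = {s:.4f}, t = {t:.2f}")
print(significant)
```

Because |t| exceeds even the 0.001-level critical value, the measured hydrogen content differs significantly from the theoretical value.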
15
KNOWN VARIANCES, t-TEST OF TWO AVERAGES
l Karl Fischer H2O. σ = 0.025 from historical data.
l Data: Lot A: 0.50, 0.53, 0.47.
l Lot B: 0.53, 0.56, 0.51, 0.53, 0.50
l n1 = 3, n2 = 5, x̄1 = 0.500, x̄2 = 0.526
l t0.05/2, ∞ = 1.96, df = n1 + n2 - 2 = 6, t0.05/2, 6 = 2.447
$$t = \frac{0.500 - 0.526}{\sqrt{\dfrac{(0.025)^2}{3} + \dfrac{(0.025)^2}{5}}} = -1.424$$
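As a rough check of the arithmetic, the known-variance comparison can be sketched as below. The function name is hypothetical, and the critical value 1.96 is the one quoted on the slide for a known σ.

```python
import math
import statistics

SIGMA = 0.025                      # historical standard deviation from the slide
lot_a = [0.50, 0.53, 0.47]
lot_b = [0.53, 0.56, 0.51, 0.53, 0.50]

def known_variance_t(x1, x2, sigma1, sigma2):
    """t (strictly a z statistic) for two averages with known variances."""
    se = math.sqrt(sigma1 ** 2 / len(x1) + sigma2 ** 2 / len(x2))
    return (statistics.mean(x1) - statistics.mean(x2)) / se

t = known_variance_t(lot_a, lot_b, SIGMA, SIGMA)
T_CRIT = 1.96                      # t0.05/2 at infinite df, from the slide
print(f"t = {t:.3f}, significant: {abs(t) > T_CRIT}")
```

With |t| = 1.424 below 1.96, the two lots do not differ significantly in water content.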
16
t for Unknown and Equal Variances
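$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{n_1+n_2}{n_1 n_2}}}, \qquad df = n_1 + n_2 - 2$$
If n1 = n2 = n:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{2/n}}$$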
17
t-TEST, UNKNOWN BUT EQUAL VARIANCES, 1.
l Data (mg/L Fe3+): Lot A: 6.1, 5.8, 7.0.
l Lot B: 5.9, 5.7, 6.1. x̄A = 6.30, sA = 0.6245, x̄B = 5.90, sB = 0.2000.
$$F = \frac{(0.6245)^2}{(0.2000)^2} = 9.75, \qquad F_{0.05/2,\,2,\,2} = 39.00$$
$$s_p = \sqrt{\frac{2(0.6245)^2 + 2(0.2000)^2}{(3-1)+(3-1)}} = 0.4637$$
18
t-TEST UNKNOWN BUT EQUAL VARIANCES. 2.
l df = n1 + n2 - 2, df = 4
$$t = \frac{6.30 - 5.90}{0.4637\sqrt{\dfrac{3+3}{3\cdot 3}}} = 1.056, \qquad t_{0.05/2,\,4} = 2.78$$
19
POOLED VARIANCE
$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$
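The Fe3+ example (variance ratio, pooled standard deviation, then t) can be sketched as follows. The function name is hypothetical, and the tabular F and t values are those quoted on the slides, not computed here.

```python
import math
import statistics

lot_a = [6.1, 5.8, 7.0]   # mg/L Fe3+
lot_b = [5.9, 5.7, 6.1]

def pooled_t(x1, x2):
    """Return (t, s_p, df) for two averages assuming equal variances."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt((n1 + n2) / (n1 * n2)))
    return t, sp, n1 + n2 - 2

# Variance ratio, larger variance on top, to check the equal-variance assumption.
f_ratio = statistics.variance(lot_a) / statistics.variance(lot_b)
F_CRIT = 39.00            # F0.05/2,2,2 from the slide
t, sp, df = pooled_t(lot_a, lot_b)
T_CRIT = 2.78             # t0.05/2,4 from the slide

print(f"F = {f_ratio:.2f} (variances differ: {f_ratio > F_CRIT})")
print(f"s_p = {sp:.4f}, df = {df}, t = {t:.3f}, significant: {abs(t) > T_CRIT}")
```

The F ratio stays below its tabular value, so pooling is acceptable, and the resulting t does not reach significance.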
20
t for Independent Averages with unknown and unequal variances.
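$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1+1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2+1}} - 2$$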
21
t-TEST UNKNOWN AND UNEQUAL VARIANCES, 1.
l Data: Extension of previous Fe3+ mg/L study
l x̄A = 6.13, sA = 0.3529
l x̄B = 5.76, sB = 0.1647
l nA = nB = 10
l F0.05/2,9,9 = 4.03
l F = (0.3529)² / (0.1647)²
l F = 4.59

No.   A     B
1     6.1   5.9
2     5.8   5.7
3     7.0   6.1
4     6.1   5.8
5     6.1   5.7
6     6.4   5.6
7     6.1   5.6
8     6.0   5.9
9     5.9   5.7
10    5.8   5.6
22
t-TEST UNKNOWN AND UNEQUAL VARIANCES, 2.
l t0.05/2,17 = 2.110
$$t = \frac{6.13 - 5.76}{\sqrt{\dfrac{(0.3529)^2}{10} + \dfrac{(0.1647)^2}{10}}} = \frac{0.37}{\sqrt{0.0151664}} = 3.0044$$
23
t-TEST UNKNOWN AND UNEQUAL VARIANCES, 3.
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1+1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2+1}} - 2$$
24
t-TEST UNKNOWN AND UNEQUAL VARIANCES, 4.
$$df = \frac{(0.0395799)^2}{\dfrac{(0.01245384)^2}{10+1} + \dfrac{(0.0271261)^2}{10+1}} - 2$$
$$df = \frac{0.0015666}{0.0000141 + 0.0000669} - 2 = \frac{0.0015666}{0.000081} - 2$$
$$df = 19.23 - 2 \approx 17 \quad \text{(rounded to a whole number)}$$
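The unequal-variance calculation can be re-run from the ten pairs of raw results as a cross-check. This is a sketch under stated assumptions: the function name is hypothetical, `df_slide` follows the formula given on the slides (n + 1 divisors, minus 2), and `df_ws` is the more widely used Welch-Satterthwaite form (n - 1 divisors), added only for comparison. When evaluated this way, with s²/n = 0.3529²/10 ≈ 0.01245 and 0.1647²/10 ≈ 0.00271, the slide formula gives a df of roughly 13.6 rather than 17; the worked arithmetic on the slide appears to carry 0.0271261 (the variance before division by n) in the second term. The conclusion is unaffected, because t ≈ 3.00 exceeds the tabular t at either df (2.110 at 17 df, about 2.14 at 14 df).

```python
import math
import statistics

lot_a = [6.1, 5.8, 7.0, 6.1, 6.1, 6.4, 6.1, 6.0, 5.9, 5.8]
lot_b = [5.9, 5.7, 6.1, 5.8, 5.7, 5.6, 5.6, 5.9, 5.7, 5.6]

def unequal_variance_t(x1, x2):
    """Return (t, df_slide, df_ws) for two averages with unequal variances."""
    n1, n2 = len(x1), len(x2)
    a = statistics.variance(x1) / n1          # s1^2 / n1
    b = statistics.variance(x2) / n2          # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(a + b)
    # df formula as written on the slides: (n + 1) divisors, then subtract 2.
    df_slide = (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1)) - 2
    # Welch-Satterthwaite df with (n - 1) divisors, shown for comparison only.
    df_ws = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df_slide, df_ws

t, df_slide, df_ws = unequal_variance_t(lot_a, lot_b)
print(f"t = {t:.4f}, df (slide formula) = {df_slide:.2f}, df (Welch-Satterthwaite) = {df_ws:.2f}")
```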
25
Paired t-Test
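$$t = \frac{\bar{d}\,\sqrt{n}}{s_d}, \qquad d_i = x_{1i} - x_{2i}, \qquad df = n - 1$$
$$s_d = \sqrt{\frac{\sum d_i^2 - \dfrac{(\sum d_i)^2}{n}}{n-1}}$$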
26
DATA FOR t-TESTS
Sample   New      Original   d
1        12.1%    14.7%      2.6%
2        10.9     14.0       3.1
3        13.1     12.9       -0.2
4        14.5     16.2       1.7
5        9.6      10.2       0.6
6        11.2     12.4       1.2
7        9.8      12.0       2.2
8        13.7     14.8       1.1
9        12.0     11.8       -0.2
10       9.1      9.7        0.6
ave.     11.60    12.87      1.27
s        1.814    2.075      1.126
27
Paired t-Test Calculation
$$t = \frac{\bar{d}\,\sqrt{n}}{S_d} = \frac{1.27\sqrt{10}}{1.126} = 3.567, \qquad t_{0.05/2,\,9} = 2.26$$
Therefore a significant difference exists.
28
t-Test for unknown but equal variances.
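l Showing that there is no significant difference?
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\dfrac{n_1+n_2}{n_1 n_2}}} = \frac{11.60 - 12.87}{1.9488\sqrt{\dfrac{10+10}{100}}} = -1.457$$
$$df = n_1 + n_2 - 2 = 18, \qquad t_{0.05/2,\,18} = 2.10$$
l Since |t| = 1.457 is smaller than 2.10, the same data treated as two independent groups show no significant difference, even though the paired test did.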
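The contrast between the paired and the unpaired treatment of the same ten samples can be sketched as below. The critical values are the ones quoted on the slides, and the variable names are assumptions made for this illustration.

```python
import math
import statistics

new = [12.1, 10.9, 13.1, 14.5, 9.6, 11.2, 9.8, 13.7, 12.0, 9.1]
original = [14.7, 14.0, 12.9, 16.2, 10.2, 12.4, 12.0, 14.8, 11.8, 9.7]
n = len(new)

# Paired test: work on the per-sample differences (original - new).
d = [o - w for o, w in zip(original, new)]
t_paired = statistics.mean(d) * math.sqrt(n) / statistics.stdev(d)

# Unpaired test with pooled variance (equal n, so s_p^2 is the mean of the two variances).
sp = math.sqrt((statistics.variance(new) + statistics.variance(original)) / 2)
t_unpaired = (statistics.mean(new) - statistics.mean(original)) / (sp * math.sqrt((n + n) / (n * n)))

T_CRIT_PAIRED = 2.26      # t0.05/2,9 from the slide
T_CRIT_UNPAIRED = 2.10    # t0.05/2,18 from the slide

print(f"paired:   t = {t_paired:.3f}, significant: {abs(t_paired) > T_CRIT_PAIRED}")
print(f"unpaired: t = {t_unpaired:.3f}, significant: {abs(t_unpaired) > T_CRIT_UNPAIRED}")
```

Pairing removes the large sample-to-sample variation from the error term, which is why only the paired test detects the difference between methods.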
29
Student’s t to a C.I.
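Basic form:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad df = n - 1$$
$$\bar{x} - \mu = \pm\frac{ts}{\sqrt{n}} \quad\Rightarrow\quad \mu = \bar{x} \pm \frac{ts}{\sqrt{n}}$$
The value of t is taken from a t-table for n - 1 degrees of freedom and the desired confidence.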
30
CONFIDENCE INTERVAL 1.
$$C.I. = \mu \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
$$C.I. = \bar{X} \pm \frac{ts}{\sqrt{n}}, \qquad t = t_{0.05,\,n-1}, \qquad t_{0.05,\,2} = 4.30$$
31
DATA SET FOR SETTING SPECS. 1.
} 67.0 65.8 78.1 66.4 69.0 70.5
} 67.5 75.6 74.2 74.5 85.0 81.1
} 76.0 71.9 70.8 67.3 75.0 74.0
} 72.7 68.8 84.9 73.2 74.7 76.6
} 73.1 82.6 72.2 68.7 69.5 64.2
} n = 30, range = 64.2 - 85.0, range = 20.8
} Ave. = 73.03, s or σ = 5.4416, SQRT(30) = 5.4772
} t0.995, 29 = 2.756, 99% C.I.(t) = 70.29 - 75.77
32
DATA SET FOR SETTING SPECS. 2. SETS OF 3
Data     67.0   72.7   71.9   82.6   70.8   66.4   73.2   85.0   69.5   74.0
         67.5   73.1   68.8   78.1   84.9   74.5   68.7   75.0   70.5   64.2
         76.0   65.8   75.6   74.2   72.2   67.3   69.0   74.7   81.1   76.6
Ave.     70.2   70.5   72.1   78.3   76.0   69.4   70.3   78.2   73.7   71.6
s        5.06   4.10   3.40   4.20   7.77   4.44   2.52   5.86   6.43   6.54
C.V.     7.21   5.82   4.72   5.37   10.23  6.40   3.58   7.49   8.72   9.13
C.I. ±   29.0   23.5   19.5   24.1   44.5   25.4   14.4   33.6   36.8   37.5
l X̄3 = 73.03, s = 3.36, C.V. = 4.6%, n = 10, t0.995,9 = 3.250
l 99% C.I.(ave) = ±3.46 = 69.57 - 76.49
33
DATA SET FOR SETTING SPECS. 3. SETS OF 10
} Set A: 67.0 67.5 76.0 72.7 73.1 65.8 75.6 71.9 68.8 82.6
} Set B: 78.1 74.2 70.8 84.9 72.2 66.4 74.5 67.3 73.2 68.7
} Set C: 69.0 85.0 75.0 74.7 69.5 70.5 81.1 74.0 76.6 64.2
} SQRT(10) = 3.162278, t0.995, 9 = 3.250
            A               B               C
Ave. ± s    72.1 ± 5.13     73.0 ± 5.49     74.0 ± 6.08
C.V.        7.1%            7.5%            8.2%
C.I.        66.8 - 77.37    67.4 - 80.6     65.7 - 82.2
C.I. ±      5.2             5.64            8.23
} Ave(10s) = 73.03, s = 0.9300, C.V. = 1.3%, 99% C.I. = ±5.33
} 99% C.I. = 67.7 - 78.4. SQRT(3) = 1.7321, t0.995,2 = 9.925
34
DATA SET FOR SETTING SPECS. 4. CUMULATIVE
n     Ave.    s      C.V.   99% C.I. (±)   SQRT(n)   t0.995,n-1
2     67.25   0.35   0.5    15.9           1.4142    63.66
3     70.17   5.06   7.2    29.0           1.7321    9.925
4     70.80   4.32   6.1    12.6           2.0000    5.841
5     71.26   3.88   5.4    8.0            2.2361    4.604
6     70.35   4.12   5.9    6.8            2.4495    4.032
9     70.93   3.78   5.3    4.2            3.0000    3.355
12    72.78   4.97   6.8    4.5            3.4641    3.106
18    72.74   5.40   7.4    3.7            4.2426    2.898
24    73.13   5.45   7.5    3.1            4.8990    2.807
30    73.03   5.44   7.5    2.7            5.4773    2.756
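How the 99% confidence interval tightens as results accumulate can be sketched with the same 30 values. This is only an illustration: the order of the data (Set A, then Set B, then Set C) is inferred from the cumulative averages, the t values are copied from the slides rather than computed, and the function name is hypothetical.

```python
import math
import statistics

set_a = [67.0, 67.5, 76.0, 72.7, 73.1, 65.8, 75.6, 71.9, 68.8, 82.6]
set_b = [78.1, 74.2, 70.8, 84.9, 72.2, 66.4, 74.5, 67.3, 73.2, 68.7]
set_c = [69.0, 85.0, 75.0, 74.7, 69.5, 70.5, 81.1, 74.0, 76.6, 64.2]
data = set_a + set_b + set_c      # accumulation order inferred from the slides

# Two-sided 99% t values (t0.995, n-1) as listed on the slides.
T_995 = {2: 63.66, 3: 9.925, 4: 5.841, 5: 4.604, 6: 4.032, 9: 3.355,
         12: 3.106, 18: 2.898, 24: 2.807, 30: 2.756}

def ci99(values):
    """Return (mean, half-width) of the 99% confidence interval of the mean."""
    n = len(values)
    half = T_995[n] * statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values), half

for n in sorted(T_995):
    mean, half = ci99(data[:n])
    print(f"n = {n:2d}: mean = {mean:.2f}, 99% C.I. = {mean - half:.2f} to {mean + half:.2f}")
```

The half-width shrinks quickly at first, because both t and 1/√n fall steeply for small n, and then levels off, which is the practical argument against setting specifications from only two or three results.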
35
Wilcoxon’s Signed Rank Test 1.
l Nonparametric test for paired test results.
l Does the same thing as the paired t-test but without the assumption of normality.
l First, take your paired data and calculate the differences, including their signs.
l Second, place the differences in order (low to high) based on their absolute values.
l Third, assign a rank to the differences and assign to the rank a sign according to the sign of the original difference. (continued)
36
Wilcoxon’s Signed Rank Test 2.
l Fourth, count the number of positive and negative ranks, take the group with the smaller number of members, and sum the absolute values of the ranks in that group. This gives a value, Tn, where n = the number of pairs.
l Go to a Wilcoxon table for n pairs and a confidence level of at least 95% to obtain a tabular value of Tn. For significance, the calculated value must be smaller than the tabular value for Tn.
37
Signed Rank Test: Example
l A minimum of 6 pairs is needed.
l With 6 pairs, all of the differences must have the same sign. This gives T6 = 0, which is significant at the 95% level.
l Differences from 19 pairs of test results.
l Diff:  +2     -4     -6   +8   +10  -11  -12  +13  +22  -25
l Rank:  +1     -2     -3   +4   +5   -6   -7   +8   +9   -10
l Diff:  -33    +33    +41  -45  +45  +45  +81  +92  +139
l Rank:  -11.5  +11.5  +13  -15  +15  +15  +17  +18  +19
38
Signed Rank Test: Example: Continued
l There are 7 negative ranks and 12 positive ranks, so the absolute sum is taken of:
l -2, -3, -6, -7, -10, -11.5, and -15. This gives:
l T19 = 54.5. The tabular value for T0.05, 19 is 46. The calculated value is not smaller than the tabular value, so the data show no significant difference between the groups.
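The ranking steps can be sketched in a few lines. Two points are assumptions of this sketch rather than statements from the slides: zero differences are dropped, and T is taken as the smaller of the two rank sums, which is the usual textbook statement of the test and coincides with the negative group chosen in this example. The function name is hypothetical, and the tabular value 46 is the one quoted on the slide.

```python
# Signed differences from the 19 pairs on the example slide.
diffs = [2, -4, -6, 8, 10, -11, -12, 13, 22, -25,
         -33, 33, 41, -45, 45, 45, 81, 92, 139]

def signed_ranks(differences):
    """Rank nonzero differences by absolute value; ties share the average rank."""
    ordered = sorted((d for d in differences if d != 0), key=abs)
    ranked = []
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        for d in ordered[i:j]:
            ranked.append((d, avg_rank if d > 0 else -avg_rank))
        i = j
    return ranked

ranked = signed_ranks(diffs)
t_neg = sum(-r for _, r in ranked if r < 0)   # sum of negative ranks (absolute)
t_pos = sum(r for _, r in ranked if r > 0)    # sum of positive ranks
t_stat = min(t_neg, t_pos)

T_CRIT = 46                                   # tabular T0.05,19 from the slide
print(f"T- = {t_neg}, T+ = {t_pos}, T = {t_stat}, significant: {t_stat < T_CRIT}")
```

Unlike the t statistics earlier in the deck, a smaller calculated T indicates stronger evidence of a difference, so significance requires T below the tabular value.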
39
A Simpler Nonparametric Test 1.
l The following is not as powerful as the Signed Rank Test, but is faster and easier. It tests the hypothesis that p = 0.5 for a given sign. It is a Chi-square (χ²) test.
$$\chi^2 = \frac{\left(|n_1 - n_2| - 1\right)^2}{n_1 + n_2}$$
40
Simpler Signed Rank Test 2.
l n1 and n2 are the number of positive and negative differences. From the previous data there are 12 positive and 7 negative differences so:
$$\chi^2 = \frac{\left(|12 - 7| - 1\right)^2}{12 + 7} = \frac{4^2}{19} = \frac{16}{19} < 1.0$$
41
Simpler Signed Rank Test 3.
l Tabular χ² values are usually greater than 1.0, so a calculated value below 1.0 indicates no significance, since the calculated χ² must be larger than the tabular χ² for significance.
l This test can be adopted as a rapid and easy method to decide if further investigation is required. It is even possible to prepare tables in advance for routine use.
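The quick sign-based check is simple enough to sketch directly. The function name is hypothetical, and the tabular value 3.84 (χ² at the 0.05 level with 1 degree of freedom) is a standard table value that does not appear on the slides, so treat it as an assumption of this example.

```python
def sign_chi_square(n_pos, n_neg):
    """Chi-square statistic for the sign counts, with the -1 continuity correction."""
    return (abs(n_pos - n_neg) - 1) ** 2 / (n_pos + n_neg)

# 12 positive and 7 negative differences from the signed rank example.
chi2 = sign_chi_square(12, 7)

CHI2_CRIT = 3.84   # chi-square, alpha = 0.05, 1 df (standard table value; assumption)
print(f"chi-square = {chi2:.3f}, significant: {chi2 > CHI2_CRIT}")
```

A table of the smallest split that reaches significance for each number of pairs could be generated in advance by looping this function over n.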