statistical controls for qc


Statistical Tools for the Quality Control Laboratory and Validation Studies:

Session 1

• STEVEN S. KUWAHARA, Ph.D.
• GXP BioTechnology LLC
• PMB #506
• 1669-2 Hollenbeck Avenue
• Sunnyvale, CA 94087-5042 USA
• Tel. & FAX 408-530-9338
• e-Mail: [email protected]
• Website: www.gxpbiotech.org

IVTPHL1012S1

NORMAL DISTRIBUTION

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{X_i - \mu}{\sigma}\right)^2}$$

NORMAL DISTRIBUTION PROPERTIES

• The normal distribution has the following properties:
  • Bell-shaped
  • Unimodal
  • Symmetrical
  • Extends from -∞ to +∞ (tails never reach zero frequency)
  • Same value for mean, median, and mode
• This pattern of variation is common for manufacturing processes.


VARIANCE (S²)

$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{\sum X_i^2 - \dfrac{(\sum X_i)^2}{n}}{n - 1} = \frac{n\sum X_i^2 - (\sum X_i)^2}{n(n - 1)}$$
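As a minimal sketch, the sample variance can be computed from the deviations about the mean or from the sums of X and X², and the results should agree to within rounding. The function name `variance_three_ways` is hypothetical, and the six values are borrowed from the CHON hydrogen example later in this session purely for illustration.

```python
import statistics

def variance_three_ways(x):
    """Return the sample variance from three algebraically equivalent forms."""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    mean = sum_x / n
    # Deviations about the mean, divided by n - 1
    deviation_form = sum((v - mean) ** 2 for v in x) / (n - 1)
    # Sum of squares minus (sum)^2 / n, divided by n - 1
    sums_form = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    # Same expression written as a single fraction
    single_fraction_form = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
    return deviation_form, sums_form, single_fraction_form

# Illustrative data: % H results from the CHON example
h_percent = [9.17, 9.09, 9.14, 9.10, 9.13, 9.27]
forms = variance_three_ways(h_percent)
reference = statistics.variance(h_percent)  # standard-library sample variance
print(forms, reference)
```

The forms based on raw sums are convenient for hand calculation, but they subtract two large, nearly equal numbers, so the deviation form is usually the safer choice in software.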

Averages and Standard Deviations and the SEM. 1.

• All of the n measurements that go into the mean (x̄) must be measurements of the same thing.
  • The mean of fruits and the mean of oranges are different things unless all of the fruits are oranges.
  • But then it is still the mean of oranges, not fruits.
• The standard deviation (s) is a measure of the variation among the n components of x̄, NOT the variation of x̄ itself.
  • Thus the next item (n + 1) from the original population should have a 95% chance of being within ±1.96s of x̄, but this does not apply to the next average (x̄₁).

Averages and Standard Deviations and the SEM. 2.

• The variation in the averages is the standard error of the mean (SEM), which is s/√n.
  • Thus the next average (x̄₁) has a 95% probability of being within ±1.96(s/√n), or ±1.96 SEM, of the original mean (x̄).
• When dealing with single numbers, s is used, but when dealing with means the SEM is the number to use.
  • It is incorrect to use s to set a specification on a value that is actually an average.
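The distinction between s and the SEM can be sketched as follows. This is an illustration only: the function name `mean_s_sem` is hypothetical, the data are the CHON hydrogen results used later in this session, and the 1.96 multiplier is the large-sample value quoted on the slides (a t value would normally be used for so few points).

```python
import math
import statistics

def mean_s_sem(values):
    """Return the mean, the standard deviation s, and the SEM = s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)   # variation among individual results
    sem = s / math.sqrt(n)         # variation expected among averages of n results
    return mean, s, sem

h_percent = [9.17, 9.09, 9.14, 9.10, 9.13, 9.27]
mean, s, sem = mean_s_sem(h_percent)

# Limits for the next single result versus the next average of n results
single_value_limits = (mean - 1.96 * s, mean + 1.96 * s)
average_limits = (mean - 1.96 * sem, mean + 1.96 * sem)
print(single_value_limits, average_limits)
```

The limits for an average are narrower by a factor of √n, which is why a specification on an averaged result should not be set from s alone.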

RANGE AND C.V.

• The range can be related to the standard deviation for n < 16.

$$s = \frac{X_L - X_s}{d_2}, \qquad d_2 \text{ is a tabular value.}$$

$$C.V. = \frac{S}{\bar{X}} \times 100\% = RSD$$

F - TEST

$$F_{\alpha,\,df_1,\,df_2} = \frac{s_1^2}{s_2^2}$$

Note: F0.05,3,3 = 9.28; F0.05,10,10 = 2.98.

Note: This is slightly different from the F-test that is used for ANOVA and factorial experiments.

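Student’s t

Basic form:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad df = n - 1$$

Independent averages, known variances:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$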

t-TEST vs THEORETICAL OR KNOWN VALUE

• CHON analysis. 9.55% H calculated.
• Data: 9.17, 9.09, 9.14, 9.10, 9.13, 9.27. n = 6, x̄ = 9.15, s = ±0.0654
• t0.05/2, 5 = 2.57, t0.01/2, 5 = 4.032, t0.001/2, 5 = 6.869, p < 0.001

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{9.15 - 9.55}{0.0654/\sqrt{6}} = -14.98, \qquad |t| = 14.98$$
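The CHON calculation can be reproduced with a short sketch. The function name `one_sample_t` is hypothetical, and the critical value is typed in from a t-table (the slide lists 2.57) rather than computed.

```python
import math
import statistics

def one_sample_t(values, mu):
    """t = (mean - mu) / (s / sqrt(n)), with df = n - 1."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return (mean - mu) / (s / math.sqrt(n)), n - 1

h_found = [9.17, 9.09, 9.14, 9.10, 9.13, 9.27]   # % H found
t, df = one_sample_t(h_found, mu=9.55)           # 9.55% H calculated

t_critical = 2.571   # two-sided 0.05 level, 5 df, from a t-table
significant = abs(t) > t_critical
print(round(t, 2), df, significant)
```

The sign of t only shows the direction of the difference (found below theoretical); the comparison with the table uses the absolute value.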

KNOWN VARIANCES, t-TEST OF TWO AVERAGES

• Karl Fischer H2O. σ = 0.025 from historical data.
• Data: Lot A: 0.50, 0.53, 0.47.
• Lot B: 0.53, 0.56, 0.51, 0.53, 0.50
• n1 = 3, n2 = 5, x̄1 = 0.500, x̄2 = 0.526
• t0.05/2, ∞ = 1.96; df = n1 + n2 – 2 = 6, t0.05/2, 6 = 2.447

$$t = \frac{0.500 - 0.526}{\sqrt{\dfrac{(0.025)^2}{3} + \dfrac{(0.025)^2}{5}}} = -1.424, \qquad |t| = 1.424$$
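A minimal sketch of the known-variance comparison for the Karl Fischer data follows. The function name `known_sigma_t` is hypothetical, and the same historical σ is assumed to apply to both lots, as on the slide.

```python
import math
import statistics

def known_sigma_t(a, b, sigma):
    """Test statistic for two averages when sigma is known from historical data."""
    standard_error = math.sqrt(sigma ** 2 / len(a) + sigma ** 2 / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

lot_a = [0.50, 0.53, 0.47]
lot_b = [0.53, 0.56, 0.51, 0.53, 0.50]
t = known_sigma_t(lot_a, lot_b, sigma=0.025)

# Both tabular values quoted on the slide
exceeds_1_96 = abs(t) > 1.96     # infinite df
exceeds_2_447 = abs(t) > 2.447   # 6 df
print(round(t, 3), exceeds_1_96, exceeds_2_447)
```

With either tabular value the calculated statistic is smaller, so the two lots are not shown to differ.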

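t for Unknown and Equal Variances

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}}, \qquad df = n_1 + n_2 - 2$$

If n1 = n2 = n:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{2/n}}$$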

t-TEST, UNKNOWN BUT EQUAL VARIANCES, 1.

• Data (mg/L Fe³⁺): Lot A: 6.1, 5.8, 7.0.
• Lot B: 5.9, 5.7, 6.1.
• x̄A = 6.30, sA = 0.6245, x̄B = 5.90, sB = 0.2000.

$$F = \frac{(0.6245)^2}{(0.2000)^2} = 9.75, \qquad F_{0.05/2,\,2,\,2} = 39.00$$

$$s_p = \sqrt{\frac{(3 - 1)(0.6245)^2 + (3 - 1)(0.2000)^2}{3 + 3 - 2}} = 0.4637$$

t-TEST UNKNOWN BUT EQUAL VARIANCES. 2.

• df = n1 + n2 - 2; df = 4

$$t = \frac{6.30 - 5.90}{0.4637\sqrt{\dfrac{3 + 3}{3 \times 3}}} = 1.056, \qquad t_{0.05/2,\,4} = 2.78$$

POOLED VARIANCE

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
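The F check, the pooled standard deviation, and the equal-variance t statistic for the three-point Fe³⁺ lots can be sketched together. The function name `pooled_t` is hypothetical; the tabular values in the comments are the ones quoted on the slides.

```python
import math
import statistics

def pooled_t(a, b):
    """Return (F, s_p, t, df) for two samples assumed to share one variance."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    f_ratio = var1 / var2
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    t = (statistics.mean(a) - statistics.mean(b)) / (s_p * math.sqrt((n1 + n2) / (n1 * n2)))
    return f_ratio, s_p, t, n1 + n2 - 2

lot_a = [6.1, 5.8, 7.0]   # mg/L Fe3+
lot_b = [5.9, 5.7, 6.1]
f_ratio, s_p, t, df = pooled_t(lot_a, lot_b)

variances_differ = f_ratio > 39.00   # F(0.05/2, 2, 2) from the slide
means_differ = abs(t) > 2.78         # t(0.05/2, 4) from the slide
print(round(f_ratio, 2), round(s_p, 4), round(t, 3), df)
```

Here the F ratio is below its tabular value, so pooling is reasonable, and t is below its tabular value, so the lots are not shown to differ.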

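t for Independent Averages with unknown and unequal variances.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 + 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 + 1}} - 2$$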

t-TEST UNKNOWN AND UNEQUAL VARIANCES, 1.

• Data: Extension of previous Fe³⁺ mg/L study
• x̄A = 6.13, sA = 0.3529
• x̄B = 5.76, sB = 0.1647
• nA = nB = 10
• F0.05/2,9,9 = 4.03
• F = (0.3529)² / (0.1647)²
• F = 4.59

Sample   Lot A   Lot B
1        6.1     5.9
2        5.8     5.7
3        7.0     6.1
4        6.1     5.8
5        6.1     5.7
6        6.4     5.6
7        6.1     5.6
8        6.0     5.9
9        5.9     5.7
10       5.8     5.6

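$$t = \frac{6.13 - 5.76}{\sqrt{\dfrac{(0.3529)^2}{10} + \dfrac{(0.1647)^2}{10}}} = \frac{0.37}{\sqrt{0.0151664}} = 3.0044$$

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 + 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 + 1}} - 2$$

$$df = \frac{\left(\dfrac{0.1245384}{10} + \dfrac{0.0271261}{10}\right)^2}{\dfrac{(0.1245384/10)^2}{10 + 1} + \dfrac{(0.0271261/10)^2}{10 + 1}} - 2 = \frac{0.00023002}{0.00001410 + 0.00000067} - 2 = 15.57 - 2 = 13.57$$

df = 14, rounded to a whole number.

• t0.05/2, 14 = 2.145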
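A minimal sketch of the unequal-variance calculation from the raw ten-point Fe³⁺ data follows. The function name `unequal_variance_t` is hypothetical. The degrees of freedom use the form given in this session (n + 1 in the denominators, then minus 2); many statistics packages use a different approximation, so their df may not match.

```python
import math
import statistics

def unequal_variance_t(a, b):
    """Return (t, df) for two independent averages with unequal variances."""
    v1 = statistics.variance(a) / len(a)   # s1^2 / n1
    v2 = statistics.variance(b) / len(b)   # s2^2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) + 1) + v2 ** 2 / (len(b) + 1)) - 2
    return t, df

lot_a = [6.1, 5.8, 7.0, 6.1, 6.1, 6.4, 6.1, 6.0, 5.9, 5.8]   # mg/L Fe3+
lot_b = [5.9, 5.7, 6.1, 5.8, 5.7, 5.6, 5.6, 5.9, 5.7, 5.6]
t, df = unequal_variance_t(lot_a, lot_b)
print(round(t, 3), round(df, 2), round(df))
```

Both s²/n terms must be divided by their own n before they are added or squared; skipping that division for one lot inflates the degrees of freedom.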

Paired t-Test

$$t = \frac{\bar{d}\sqrt{n}}{s_d}, \qquad d = x_1 - x_2, \qquad df = n - 1$$

$$s_d = \sqrt{\frac{\sum d_i^2 - \dfrac{(\sum d_i)^2}{n}}{n - 1}}$$

DATA FOR t-TESTS

Sample   New (%)   Original (%)   d (%)
1        12.1      14.7            2.6
2        10.9      14.0            3.1
3        13.1      12.9           -0.2
4        14.5      16.2            1.7
5         9.6      10.2            0.6
6        11.2      12.4            1.2
7         9.8      12.0            2.2
8        13.7      14.8            1.1
9        12.0      11.8           -0.2
10        9.1       9.7            0.6
ave.     11.60     12.87           1.27
s         1.814     2.075          1.126

Paired t-Test Calculation

$$t = \frac{\bar{d}\sqrt{n}}{S_d} = \frac{1.27\sqrt{10}}{1.126} = 3.567, \qquad t_{0.05/2,\,9} = 2.26$$

Therefore a significant difference exists.
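The paired calculation can be sketched from the New and Original columns of the data table. The function name `paired_t` is hypothetical, d is taken as Original minus New to match the table, and the tabular value is the 2.26 quoted on the slide.

```python
import math
import statistics

def paired_t(first, second):
    """t = mean(d) * sqrt(n) / s_d for paired differences d = first - second."""
    d = [x - y for x, y in zip(first, second)]
    n = len(d)
    return statistics.mean(d) * math.sqrt(n) / statistics.stdev(d), n - 1

new = [12.1, 10.9, 13.1, 14.5, 9.6, 11.2, 9.8, 13.7, 12.0, 9.1]
original = [14.7, 14.0, 12.9, 16.2, 10.2, 12.4, 12.0, 14.8, 11.8, 9.7]

t, df = paired_t(original, new)
significant = abs(t) > 2.26   # t(0.05/2, 9) from the slide
print(round(t, 3), df, significant)
```

Working from unrounded intermediate values gives a t that differs from the slide's hand calculation only in the third decimal place.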

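t-Test for unknown but equal variances.

• Showing that there is no significant difference?

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}} = \frac{11.60 - 12.87}{1.9488\sqrt{\dfrac{10 + 10}{100}}} = -1.457, \qquad |t| = 1.457$$

$$df = n_1 + n_2 - 2 = 18, \qquad t_{0.05/2,\,18} = 2.10$$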

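Student’s t to a C.I.

Basic form:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad df = n - 1$$

$$\bar{x} - \mu = \pm\frac{ts}{\sqrt{n}}$$

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}}$$

The value of t is taken from a t-table for n - 1 degrees of freedom and the desired confidence.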

CONFIDENCE INTERVAL 1.

$$C.I. = \bar{X} \pm \frac{t_{0.05,\,n-1}\; s}{\sqrt{n}}, \qquad t_{0.05,\,2} = 4.30$$

$$C.I. = \mu \pm \frac{1.96\,\sigma}{\sqrt{n}}$$

DATA SET FOR SETTING SPECS. 1.

67.0   65.8   78.1   66.4   69.0   70.5
67.5   75.6   74.2   74.5   85.0   81.1
76.0   71.9   70.8   67.3   75.0   74.0
72.7   68.8   84.9   73.2   74.7   76.6
73.1   82.6   72.2   68.7   69.5   64.2

• n = 30, range = 64.2 - 85.0 = 20.8
• Ave. = 73.03, s or σ = 5.4416, SQRT(30) = 5.4772
• t0.995, 29 = 2.756, 99% C.I.(t) = 70.29 - 75.77
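The 99% confidence interval for the full set of 30 results can be sketched as follows. The function name `confidence_interval` is hypothetical, and the t value is typed in from the table entry quoted for this data set (t0.995, 29 = 2.756) rather than computed.

```python
import math
import statistics

def confidence_interval(values, t_value):
    """Return (mean, half_width) where half_width = t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = t_value * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width

results = [
    67.0, 65.8, 78.1, 66.4, 69.0, 70.5,
    67.5, 75.6, 74.2, 74.5, 85.0, 81.1,
    76.0, 71.9, 70.8, 67.3, 75.0, 74.0,
    72.7, 68.8, 84.9, 73.2, 74.7, 76.6,
    73.1, 82.6, 72.2, 68.7, 69.5, 64.2,
]
mean, half_width = confidence_interval(results, t_value=2.756)
lower, upper = mean - half_width, mean + half_width
print(round(mean, 2), round(lower, 2), round(upper, 2))
```

This interval describes where the true mean is likely to lie; it is much narrower than the spread of individual results (range 64.2 to 85.0).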

DATA SET FOR SETTING SPECS. 2. SETS OF 3

Set     1      2      3      4      5      6      7      8      9      10
        67.0   72.7   71.9   82.6   70.8   66.4   73.2   85.0   69.5   74.0
        67.5   73.1   68.8   78.1   84.9   74.5   68.7   75.0   70.5   64.2
        76.0   65.8   75.6   74.2   72.2   67.3   69.0   74.7   81.1   76.6
Ave.    70.2   70.5   72.1   78.3   76.0   69.4   70.3   78.2   73.7   71.6
s       5.06   4.10   3.40   4.20   7.77   4.44   2.52   5.86   6.43   6.54
C.V.    7.21   5.82   4.72   5.37   10.23  6.40   3.58   7.49   8.72   9.13
CI ±    29.0   23.5   19.5   24.1   44.5   25.4   14.4   33.6   36.8   37.5

• Mean of the ten set averages = 73.03, s = 3.36, C.V. = 4.6%, n = 10, t0.995,9 = 3.250
• 99% C.I.(ave) = ±3.46 = 69.57 - 76.49

DATA SET FOR SETTING SPECS. 3. SETS OF 10

• Set A: 67.0 67.5 76.0 72.7 73.1 65.8 75.6 71.9 68.8 82.6
• Set B: 78.1 74.2 70.8 84.9 72.2 66.4 74.5 67.3 73.2 68.7
• Set C: 69.0 85.0 75.0 74.7 69.5 70.5 81.1 74.0 76.6 64.2
• SQRT(10) = 3.162278, t0.995, 9 = 3.250

Set              A                    B                    C
Ave. ± s, C.V.   72.1 ± 5.13, 7.1%    73.0 ± 5.49, 7.5%    74.0 ± 6.08, 8.2%
C.I.             66.8 - 77.37: 5.2    67.4 - 80.6: 5.64    65.7 - 82.2: 8.23

• Ave(10s) = 73.03, s = 0.9300, C.V. = 1.3%, 99% C.I. = ±5.33
• 99% CI = 67.7 - 78.4. SQRT(3) = 1.7321, t0.995,2 = 9.925

DATA SET FOR SETTING SPECS. 4. CUMULATIVE

n     Ave.    s      C.V.   99% C.I. (±)   SQRT(n)   t0.995,n-1
2     67.25   0.35   0.5    15.9           1.4142    63.66
3     70.17   5.06   7.2    29.0           1.7321    9.925
4     70.80   4.32   6.1    12.6           2.0000    5.841
5     71.26   3.88   5.4    8.0            2.2361    4.604
6     70.35   4.12   5.9    6.8            2.4495    4.032
9     70.93   3.78   5.3    4.2            3.0000    3.355
12    72.78   4.97   6.8    4.5            3.4641    3.106
18    72.74   5.40   7.4    3.7            4.2426    2.898
24    73.13   5.45   7.5    3.1            4.8990    2.807
30    73.03   5.44   7.5    2.7            5.4773    2.756

Wilcoxon’s Signed Rank Test 1.

• Nonparametric test for paired test results.
• Does the same thing as the paired t-test but without the assumption of normality.
• First, take your paired data and calculate the differences, including their signs.
• Second, place the differences in order (low to high) based on their absolute values.
• Third, assign a rank to the differences and assign to the rank a sign according to the sign of the original difference. (continued)

Wilcoxon’s Signed Rank Test 2.

• Fourth, count the number of positive and negative ranks, take the group with the smaller number of members, and sum the absolute values of the ranks in that group. This will give a value, Tn, where n = the number of pairs.
• Go to a Wilcoxon table for n pairs and significance level of at least 95% to obtain a tabular value of Tn. For significance, the calculated value must be smaller than the tabular value for Tn.

Signed Rank Test: Example

• A minimum of 6 pairs is needed.
• With 6 pairs, all of the differences must have the same sign. This gives T6 = 0, which is significant at the 95% level.
• Differences from 19 pairs of test results:

Diff:   +2     -4     -6     +8     +10    -11    -12    +13    +22    -25
Rank:   +1     -2     -3     +4     +5     -6     -7     +8     +9     -10

Diff:   -33    +33    +41    -45    +45    +45    +81    +92    +139
Rank:   -11.5  +11.5  +13    -15    +15    +15    +17    +18    +19

Signed Rank Test: Example: Continued

• There are 7 negative ranks and 12 positive ranks, so the absolute sum is taken of: -2, -3, -6, -7, -10, -11.5, and -15.
• This gives T19 = 54.5. The tabular value for T0.05, 19 is 46, so the data show no difference between the groups.
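The ranking steps can be sketched in a few lines. The function names `signed_ranks` and `smaller_group_sum` are hypothetical; tied absolute differences receive the average of the ranks they span, as in the example, and zero differences are assumed not to occur. The tabular value 46 is the one quoted above.

```python
def signed_ranks(differences):
    """Rank differences by absolute value (ties share the mean rank), keeping signs."""
    ordered = sorted(differences, key=abs)
    ranks = []
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        shared_rank = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        for d in ordered[i:j]:
            ranks.append(shared_rank if d > 0 else -shared_rank)
        i = j
    return ranks

def smaller_group_sum(ranks):
    """Sum the absolute ranks of whichever sign has fewer members (the rule used here)."""
    negative = [abs(r) for r in ranks if r < 0]
    positive = [r for r in ranks if r > 0]
    group = negative if len(negative) < len(positive) else positive
    return sum(group), len(negative), len(positive)

diffs = [+2, -4, -6, +8, +10, -11, -12, +13, +22, -25,
         -33, +33, +41, -45, +45, +45, +81, +92, +139]
t_value, n_negative, n_positive = smaller_group_sum(signed_ranks(diffs))
significant = t_value < 46   # tabular T(0.05, 19)
print(t_value, n_negative, n_positive, significant)
```

Unlike the t-tests, significance here requires the calculated value to fall below the tabular value.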

A Simpler Nonparametric Test 1.

• The following is not as powerful as the Signed Rank Test, but is faster and easier. It tests the hypothesis that p = 0.5 for a given sign. It is a Chi-square (χ²) test.

$$\chi^2 = \frac{\left(|n_1 - n_2| - 1\right)^2}{n_1 + n_2}$$

Simpler Signed Rank Test 2.

• n1 and n2 are the numbers of positive and negative differences. From the previous data there are 12 positive and 7 negative differences, so:

$$\chi^2 = \frac{\left(|12 - 7| - 1\right)^2}{12 + 7} = \frac{4^2}{19} = \frac{16}{19} < 1.0$$

Simpler Signed Rank Test 3.

• Tabular χ² values are usually > 1.0 (for example, χ² = 3.84 at the 0.05 level with 1 degree of freedom), so this result indicates that there is no significance, since the calculated χ² should be larger than the tabular χ² for significance.
• This test can be adopted as a rapid and easy method to decide if further investigation is required. It is even possible to have prepared tables for use.
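As a quick sketch, the χ² screening statistic needs only the counts of positive and negative differences. The function name `sign_chi_square` is hypothetical, and the 3.84 cut-off is the standard tabular χ² for 1 degree of freedom at the 0.05 level, included here as an assumption about which table entry would be used.

```python
def sign_chi_square(n_positive, n_negative):
    """Chi-square for the hypothesis p = 0.5, with the -1 continuity correction."""
    return (abs(n_positive - n_negative) - 1) ** 2 / (n_positive + n_negative)

chi_square = sign_chi_square(12, 7)   # counts from the 19-pair example
needs_investigation = chi_square > 3.84
print(round(chi_square, 3), needs_investigation)
```

A table of the minimum count imbalance needed for each number of pairs could be prepared in advance by evaluating the same function over a grid of counts.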
