Comparison of alternative measurement methods
Siriporn Kuttatharmmakul, D. Luc Massart, Johanna Smeyers-Verbeke*
ChemoAC, Pharmaceutical Institute, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090, Brussel, Belgium
Received 19 February 1998; received in revised form 14 July 1998; accepted 14 July 1998
Abstract
A procedure to compare the performance (precision and bias) of an alternative measurement method and a reference method
is described in detail. It is based on ISO 5725-6, which has been adapted to the intralaboratory situation. This means
that the proposed approach does not evaluate the reproducibility, but considers the (operator+instrument+time)-different intermediate precision and/or the time-different intermediate precision. A 4-factor nested design is used for the study. The
calculation of the different variance estimates from the experimental data is carried out by ANOVA. The Satterthwaite
approximation is included to determine the number of degrees of freedom associated with the compound variances. Taking into
account the acceptable bias, the acceptable ratio between the precision parameters of the two methods, the significance level and the probability of wrongly accepting an alternative method with an unacceptable performance, formulae to determine the number of measurements required for the comparison are given. For the evaluation of the bias, in addition to point
hypothesis testing, interval hypothesis testing is also included as an alternative. Two examples are given as an illustration
of the proposed approach. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Comparison; Alternative measurement method; Bias; Precision; Repeatability; Time-different intermediate precision;
(Operator+instrument+time)-different intermediate precision; Nested design; ANOVA; Satterthwaite approximation; Interval hypothesis testing
1. Introduction
When a laboratory wants to replace an existing analytical method by a new method (e.g. because the latter is cheaper or easier to use) it has to show that the new method performs at least as well as the existing one. A comparison of the performance (precision and bias) of both methods therefore has to be performed. One of the most advanced guidelines for the comparison of two methods can be found in ISO 5725-6 [1]. However, the ISO guideline is based on interlaboratory studies and is therefore not applicable in the intralaboratory situation. Indeed, within a single laboratory, the reproducibility, as evaluated by ISO, cannot be determined, but intermediate precision conditions, such as changes in operator, equipment and time, should be considered since they contribute to the variability of measurements performed in the laboratory.
In the ISO guideline the reference method is an international standard method that was studied in an interlaboratory test program and its precision ($\sigma^2$) is assumed to be known. This assumption is reasonable since the precision is obtained from a large number of measurements. In the intralaboratory situation a laboratory has developed a first method and later on wishes to compare a new method to the older already internally validated method. For the latter, referred to as the reference method, only an estimate of the precision ($s^2$) will be available since the precision is determined from a rather limited number of measurements. This of course determines the statistical tests to be used in the comparison of the performance characteristics of both methods.
Analytica Chimica Acta 391 (1999) 203–225
*Corresponding author. Tel.: +32-2477-4737; fax: +32-2477-4735; e-mail: [email protected]
0003-2670/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0003-2670(99)00115-4
Moreover, the ISO standard is meant to show that both methods have similar precision and/or trueness, whereas a laboratory that performs a method comparison study is interested in evaluating whether the new method is at least as good as the reference method. This implies that some two-sided statistical tests included in the ISO guideline are not appropriate for the comparison of two methods in a single laboratory, where, for example, in the evaluation of the precision one-sided tests have to be considered.
In the decision making concerning the new alternative method it is important (i) not to reject an alternative method which in fact is appropriate, and (ii) not to accept an alternative method which in fact is not appropriate. The former is related to the $\alpha$-error of the statistical tests used in the comparison and is controlled through the selection of the significance level. The latter is related to the $\beta$-error and, when it is considered, it is generally taken into account by including sample size calculations. This approach is also included in the ISO guideline.
In this article we propose an adaptation of the ISO
guideline to the intralaboratory comparison of two
methods. It is also applicable to the situation in which
two laboratories of, e.g., the same organisation are
involved, each laboratory being specialized in one of
the methods. For the evaluation of the bias, in addition to point hypothesis testing, interval hypothesis testing [2], in which the probability of accepting a method that is too biased is controlled, is also included.
Due to the specified acceptance criteria for the alternative method, the proposed approach might lead to a large number of measurements to be performed. An alternative approach (which will be described in a subsequent article) is to perform the method comparison with a user-defined number of measurements and to evaluate the probability that a method with an unacceptable performance will be accepted.
2. Methods
All symbols and abbreviations used in this paper are
defined in Table 1.
2.1. Experimental design
A 4-factor nested experimental design is used [3–7]. This design is also one of the designs recommended by ISO [3]. The schematic layout of the
design is given in Fig. 1. The four factors represent
four sources of variation that contribute to the varia-
bility of the measurements within one laboratory. The
factors considered are operator, instrument, time, and
random error. The experimental approach can be
described as follows. For each analytical method,
the sample is analysed by m operators. Each operator
performs, on each of q instruments, n replicated
measurements on each of p different days. To avoid
an underestimation of the day effect, the set of p
different days during which the measurements are
performed on each of the q instruments must be
different, i.e. two instruments cannot be operated on
the same day.
Fig. 1. Schematic layout for the 4-factor nested experimental
design applied. Only the nested structure under the ith operator, jth
instrument and the kth day is shown here. The nested structure
under other operators, instruments and days has the same pattern.
(instru = instrument, rep = replicate).
Table 1
Definition of symbols and abbreviations applied in the document
$d$    Absolute difference between the grand means obtained with two methods
$D$    Component of day effect in a test result
$E$    Random error component occurring in every test result
$F_{I(OIT)}$    Calculated F-value obtained from the comparison of (operator+instrument+time)-different intermediate precision (variance)
$F_{I(T)}$    Calculated F-value obtained from the comparison of time-different intermediate precision (variance)
$F_r$    Calculated F-value obtained from the comparison of repeatability variance
$F_{\alpha(\nu_B,\nu_A)}$    Value of the F-distribution with $\nu_B$ degrees of freedom associated with the numerator and $\nu_A$ degrees of freedom associated with the denominator; $\alpha$ represents the portion of the F-distribution to the right of the given F-value
$F_{\beta(\nu_A,\nu_B)}$    Value of the F-distribution with $\nu_A$ degrees of freedom associated with the numerator and $\nu_B$ degrees of freedom associated with the denominator; $\beta$ represents the portion of the F-distribution to the right of the given F-value
$I$    Component of instrumental effect in a test result
$m$    Number of operators
$M$    General mean (expectation) of the test results
$MS$    Mean squares
$n$    Number of replicates performed on each day
$\bar{n}$    Average number of replicates performed on each day
$N$    Total number of measurements
$O$    Component of operator effect in a test result
$p$    Number of days
$q$    Number of instruments
$s$    Estimate of $\sigma$
$s^2$    Estimate of $\sigma^2$
$t_{cal}$    Calculated t-value obtained from the comparison of the means obtained with two methods
$t_{\alpha/2}$    Two-sided tabulated t-value at significance level $\alpha$ and $\nu$ degrees of freedom
$t_{\beta}$    One-sided tabulated t-value at significance level $\beta$ and $\nu$ degrees of freedom
UCL    Upper confidence limit
$y$    Test result
$\bar{y}$    Grand mean of test results
$\bar{y}_i$    Arithmetic mean of the test results obtained from the ith operator
$\bar{y}_{ij}$    Arithmetic mean of the test results obtained from the ith operator and the jth instrument
$\bar{y}_{ijk}$    Arithmetic mean of the test results obtained from the ith operator, the jth instrument and the kth day
$y_{ijkL}$    Particular test result related to the Lth replicate of the kth day, the jth instrument and the ith operator
$z_{\alpha/2}$    Two-sided tabulated z-value of the standard normal distribution at significance level $\alpha$
$\alpha$    Significance level (type I error probability)
$\beta$    Type II error probability
$\lambda$    Detectable difference between the means obtained from the two methods
$\nu$    Number of degrees of freedom
$\rho$    Detectable ratio between the repeatability standard deviations of method B and method A
$\sigma$    True value of a standard deviation
$\sigma^2$    True value of variance
$\phi_{I(OIT)}$    Detectable ratio between the square roots of the (operator+instrument+day) mean squares (or the (operator+instrument+time)-different intermediate precision (standard deviation)) of method B and method A
$\phi_{I(T)}$    Detectable ratio between the square roots of the between-day mean squares (or the time-different intermediate precision (standard deviation)) of method B and method A
Symbols used as superscripts and subscripts
A    Method A
B    Method B
d    Difference between the grand means obtained with two methods
D    Between-day
E    Residual
I    Between-instrument
I(T)    Time-different intermediate precision
I(OIT)    (Operator+instrument+time)-different intermediate precision
i    Index for a particular operator
2.2. Basic statistical model
To understand the following statistical approach, it
is necessary to briefly explain the basic statistical
model. More details can be found in [3].
Here, we assume that every test result y obtained with a particular analytical method is the sum of five components
$$y = M + O + I + D + E, \tag{1}$$
where M is the general mean (expectation) of the test results, O the random effect caused by changing the operator, I the random effect caused by changing the instrument, D the random effect caused by the fact that measurements are performed on different days, and E the random error occurring in every measurement under repeatability conditions.
These four factors (operator, instrument, time, and random error under repeatability conditions) are selected for our approach, since they are the main sources that contribute to the variability of the measurements within a laboratory. The precision of the method is then determined by the contribution from the variance ($\sigma^2$) of each factor, i.e. $\sigma_O^2$, $\sigma_I^2$, $\sigma_D^2$ and $\sigma_E^2$, which are estimated as $s_O^2$, $s_I^2$, $s_D^2$ and $s_E^2$, respectively. Since it can be assumed that these estimated variance components are not related, the estimate of the overall precision parameter, also called the (operator+instrument+time)-different intermediate precision $s_{I(OIT)}^2$, can be obtained as the sum of all variance components: $s_O^2 + s_I^2 + s_D^2 + s_E^2$. It is an estimate of the variance of an individual measurement made by an arbitrary operator on an arbitrary instrument. When in the laboratory the analyses are performed by the same operator on a single instrument, the overall precision corresponds to the time-different intermediate precision, which is obtained as $s_{I(T)}^2 = s_D^2 + s_E^2$. The intermediate precision is useful for indicating the ability of the analytical method to repeat the test result under the defined conditions.
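The model of Eq. (1) can be illustrated with a short Python sketch that generates test results for a nested design and combines variance components into the two intermediate precisions. It is only an illustration: the function names, the nested-list layout, the normally distributed effects and the choice to draw a fresh instrument effect within each operator are assumptions of this sketch, not prescriptions of the text.

```python
import random


def simulate_nested(m, q, p, n, mean, sd_o, sd_i, sd_d, sd_e, seed=0):
    """Return y[i][j][k][L] drawn from y = M + O + I + D + E.

    Assumption of this sketch: all effects are independent and normal,
    and the instrument and day effects are drawn anew within each operator.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(m):                        # operators
        o_eff = rng.gauss(0.0, sd_o)
        per_operator = []
        for _ in range(q):                    # instruments
            i_eff = rng.gauss(0.0, sd_i)
            per_instrument = []
            for _ in range(p):                # days
                d_eff = rng.gauss(0.0, sd_d)
                day = [mean + o_eff + i_eff + d_eff + rng.gauss(0.0, sd_e)
                       for _ in range(n)]     # replicates
                per_instrument.append(day)
            per_operator.append(per_instrument)
        data.append(per_operator)
    return data


def intermediate_precision(s2_o, s2_i, s2_d, s2_e):
    """Return (time-different, (operator+instrument+time)-different) variances."""
    return s2_d + s2_e, s2_o + s2_i + s2_d + s2_e
```

For example, `intermediate_precision(1.0, 2.0, 3.0, 4.0)` returns `(7.0, 10.0)`: the time-different variance contains only the day and residual components, while the overall variance contains all four.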
2.3. Calculation of the variance estimates
In analogy with ISO guidelines [3], the calculation of the different variance estimates is carried out by ANOVA (see Table 2). If the numbers of replicates per day ($n_{ijk}$), as well as the numbers of instruments used by each operator ($q_i$), are equal for all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, q$ and $k = 1, 2, \ldots, p_{ij}$, the calculation is simplified as shown in Table 3. The number of days ($p_{ij}$) might not be constant for different operators and instruments if the detection of outlying day means $\bar{y}_{ijk}$ leads to the rejection of some data. If, however, $p_{ij}$ is equal for all $i$ and $j$, then the terms $\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}$ and $\sum_{i=1}^{m}\sum_{j=1}^{q} (p_{ij} - 1)$ which appear in Table 3 are simply replaced by $mqp$ and $mq(p-1)$, respectively. Throughout the rest of the text the calculations as represented in Table 3 will be considered.
No calculation is given for the individual variance component for operators ($s_O^2$) and for the individual variance component for instruments ($s_I^2$) in the ANOVA tables. Since the number of operators and instruments within a single laboratory is generally limited, a small value for the degrees of freedom associated with the variance components $s_O^2$ and $s_I^2$ is to be expected. Consequently, poor estimates for $s_O^2$ and $s_I^2$ will be obtained. Therefore, besides the time-different intermediate precision $s_{I(T)}^2$, the (operator+instrument+time)-different intermediate precision $s_{I(OIT)}^2$ is estimated as shown in Table 3. This estimate includes the calculation of $MS_{OID}$, which is obtained from the sum of squared differences between
Table 1 (Continued)
j    Index for a particular instrument
k    Index for a particular day
L    Index for a particular test result performed by the ith operator, on the jth instrument and kth day
$m$    Number of operators
$n_{ijk}$    Number of replicates performed by the ith operator on the jth instrument and kth day
$p_{ij}$    Number of days performed by the ith operator on the jth instrument
$q_i$    Number of instruments performed by the ith operator
O    Between-operator
OID    (Operator+instrument+day)
r    Repeatability
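Table 2
Calculation of the variance components (ANOVA table)

Source: Operator+instrument+day
Mean squares: $$MS_{OID} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}(\bar{y}_{ijk} - \bar{y})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij} - 1}$$
Estimate of: $\sigma_r^2 + \bar{n}\sigma_{OID}^2$

Source: Day
Mean squares: $$MS_D = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}(\bar{y}_{ijk} - \bar{y}_{ij})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i} (p_{ij} - 1)}$$
Estimate of: $\sigma_r^2 + \bar{n}\sigma_D^2$

Source: Residual
Mean squares: $$MS_E = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n_{ijk}} (y_{ijkL} - \bar{y}_{ijk})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} (n_{ijk} - 1)}$$
Estimate of: $\sigma_r^2$

where
$$\bar{y}_{ijk} = \frac{\sum_{L=1}^{n_{ijk}} y_{ijkL}}{n_{ijk}}; \quad \bar{y}_{ij} = \frac{\sum_{k=1}^{p_{ij}} n_{ijk}\bar{y}_{ijk}}{\sum_{k=1}^{p_{ij}} n_{ijk}}; \quad \bar{y} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}\bar{y}_{ijk}}{N};$$
$$\bar{n} = \frac{N - \left(\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}^2\right)/N}{\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij} - 1}; \quad N = \sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk} = \text{total number of measurements}$$

Calculation of the variance estimates
The repeatability variance: $s_r^2 = MS_E$; degrees of freedom: $\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} (n_{ijk} - 1)$
The between-day variance component: $s_D^2 = (MS_D - MS_E)/\bar{n}$; if $s_D^2 < 0$, set $s_D^2 = 0$
The (operator+instrument+day) variance component: $s_{OID}^2 = (MS_{OID} - MS_E)/\bar{n}$; if $s_{OID}^2 < 0$, set $s_{OID}^2 = 0$
Time-different intermediate precision (variance): $s_{I(T)}^2 = s_D^2 + s_r^2 = (MS_D + (\bar{n} - 1)MS_E)/\bar{n}$
(Operator+instrument+time)-different intermediate precision (variance): $s_{I(OIT)}^2 = s_{OID}^2 + s_r^2 = (MS_{OID} + (\bar{n} - 1)MS_E)/\bar{n}$
Variance of the day means $\bar{y}_{ijk}$: $$s_{\bar{y}_{ijk}}^2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij} - 1} = \frac{MS_{OID}}{\bar{n}} = s_{OID}^2 + s_r^2/\bar{n};$$ degrees of freedom: $\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij} - 1$

$n_{ijk}$ is the number of replicates on the kth day performed on the jth instrument by the ith operator ($L = 1, 2, \ldots, n_{ijk}$); $p_{ij}$ the number of days performed on the jth instrument by the ith operator ($k = 1, 2, \ldots, p_{ij}$); $q_i$ the number of instruments performed by the ith operator ($j = 1, 2, \ldots, q_i$); $m$ is the number of operators ($i = 1, 2, \ldots, m$).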
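Table 3
Calculation of the variance components in case of equal $n_{ijk}$ and equal $q_i$ for all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, q$ and $k = 1, 2, \ldots, p_{ij}$. Only $p_{ij}$ may be unequal for different operators and instruments due to possible rejection of some discordant data (ANOVA table)

Source: Operator+instrument+day
Mean squares: $$MS_{OID} = \frac{n\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij} - 1}$$
Estimate of: $\sigma_r^2 + n\sigma_{OID}^2$

Source: Day
Mean squares: $$MS_D = \frac{n\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y}_{ij})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q} (p_{ij} - 1)}$$
Estimate of: $\sigma_r^2 + n\sigma_D^2$

Source: Residual
Mean squares: $$MS_E = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n} (y_{ijkL} - \bar{y}_{ijk})^2}{(n - 1)\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}}$$
Estimate of: $\sigma_r^2$

where
$$\bar{y}_{ijk} = \frac{\sum_{L=1}^{n} y_{ijkL}}{n}; \quad \bar{y}_{ij} = \frac{\sum_{k=1}^{p_{ij}} \bar{y}_{ijk}}{p_{ij}}; \quad \bar{y} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n} y_{ijkL}}{n\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}}$$

Calculation of the variance estimates
The repeatability variance: $s_r^2 = MS_E$; degrees of freedom: $(n - 1)\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}$
The between-day variance component: $s_D^2 = (MS_D - MS_E)/n$; if $s_D^2 < 0$, set $s_D^2 = 0$
The (operator+instrument+day) variance component: $s_{OID}^2 = (MS_{OID} - MS_E)/n$; if $s_{OID}^2 < 0$, set $s_{OID}^2 = 0$
Time-different intermediate precision (variance): $s_{I(T)}^2 = s_D^2 + s_r^2 = (MS_D + (n - 1)MS_E)/n$
(Operator+instrument+time)-different intermediate precision (variance): $s_{I(OIT)}^2 = s_{OID}^2 + s_r^2 = (MS_{OID} + (n - 1)MS_E)/n$
Variance of the day means $\bar{y}_{ijk}$: $$s_{\bar{y}_{ijk}}^2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij} - 1} = \frac{MS_{OID}}{n} = s_{OID}^2 + s_r^2/n;$$ degrees of freedom: $\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij} - 1$

$n$ is the number of replicates ($L = 1, 2, \ldots, n$); $p_{ij}$ the number of days performed on the jth instrument by the ith operator ($k = 1, 2, \ldots, p_{ij}$); $q$ the number of instruments ($j = 1, 2, \ldots, q$); $m$ is the number of operators ($i = 1, 2, \ldots, m$).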
the day means $\bar{y}_{ijk}$ and the grand mean $\bar{y}$. This might result in an underestimation of the effects of the instrument and the operator since those parameters are not changed for every $\bar{y}_{ijk}$ obtained. However, this is the best possible approach to estimate the intermediate precision $s_{I(OIT)}^2$ with small numbers of operators and instruments and, although it might not adequately reflect the true precision, it is useful for comparison studies as long as the numbers of operators and instruments for the methods being compared are equal.
Considering the formulae to calculate the between-day variance component $s_D^2$ and the (operator+instrument+day) variance component $s_{OID}^2$ in Table 3, negative values for those parameters can be obtained. For example, if due to random effects $MS_D$ is smaller than $MS_E$, we will get a negative value for $s_D^2$. In that case, the negative estimates of variance are given the value 0. This is the usual practice, which is also considered by ISO [8] if a negative value for the between-laboratory variance $s_L^2$ is obtained. Another approach to deal with negative variance estimates is reported in [9]. It applies the method of pooling minimal mean squares with predecessors.
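The calculations of Table 3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function name `nested_anova` and the layout `data[i][j][k]` (a list of the n replicates of operator i, instrument j, day k) are hypothetical choices of this sketch, every day is assumed to carry the same number of replicates (at least 2), and every (operator, instrument) pair is assumed to cover at least two days. The intermediate precisions are returned as sums of the variance components after negative estimates have been set to 0.

```python
def nested_anova(data):
    """Variance estimates of Table 3 for data[i][j][k] = list of n replicates."""
    days = [day for op in data for inst in op for day in inst]
    n = len(days[0])
    if n < 2 or any(len(day) != n for day in days):
        raise ValueError("need the same number (>= 2) of replicates on every day")
    total_days = len(days)                    # sum of p_ij over i and j
    pairs = sum(len(op) for op in data)       # number of (operator, instrument) pairs
    grand_mean = sum(sum(day) for day in days) / (n * total_days)

    ss_oid = ss_d = ss_e = 0.0
    for op in data:
        for inst in op:
            day_means = [sum(day) / n for day in inst]
            pair_mean = sum(day_means) / len(day_means)
            for day, ybar in zip(inst, day_means):
                ss_oid += (ybar - grand_mean) ** 2
                ss_d += (ybar - pair_mean) ** 2
                ss_e += sum((y - ybar) ** 2 for y in day)

    ms_oid = n * ss_oid / (total_days - 1)
    ms_d = n * ss_d / (total_days - pairs)    # df = sum of (p_ij - 1)
    ms_e = ss_e / ((n - 1) * total_days)
    s2_r = ms_e
    s2_d = max((ms_d - ms_e) / n, 0.0)        # negative estimates are set to 0
    s2_oid = max((ms_oid - ms_e) / n, 0.0)
    return {
        "MS_OID": ms_oid, "MS_D": ms_d, "MS_E": ms_e,
        "s2_r": s2_r, "s2_D": s2_d, "s2_OID": s2_oid,
        "s2_I(T)": s2_d + s2_r, "s2_I(OIT)": s2_oid + s2_r,
    }
```

With one operator, one instrument and two days of duplicate results, `nested_anova([[[[1.0, 3.0], [5.0, 7.0]]]])` gives MS_D = 16, MS_E = 2 and therefore a between-day variance component of 7 and a time-different intermediate precision (variance) of 9.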
2.4. Number of measurements
As mentioned earlier, the probability of accepting an alternative method which in fact is not appropriate ($\beta$-error), because it is not precise enough or too biased in comparison with the reference method, can be controlled by determining the number of measurements required to detect a certain bias as well as a certain difference in precision (if it exists). This implies that an acceptable difference between the means of the two methods as well as an acceptable ratio between the precision parameters of the two methods have to be specified. The former is called by ISO the detectable difference between the biases of the two methods, $\lambda$, and is defined as the minimum difference between the means of the two methods that the experimenter wishes to detect with high probability. The latter is called by ISO the detectable ratio between the precision parameters of the two methods. It is defined as the minimum ratio of precision parameters that the experimenter wishes to detect with high probability from the results obtained with the two methods. In analogy with what is given in ISO, the detectable ratios to be considered in the intralaboratory situation are:
$\rho = \sigma_{r_B}/\sigma_{r_A}$ for the comparison of repeatabilities;
$\phi_{I(T)} = \sqrt{MS_{D_B}/MS_{D_A}}$ for the comparison of time-different intermediate precisions;
$\phi_{I(OIT)} = \sqrt{MS_{OID_B}/MS_{OID_A}}$ for the comparison of (operator+instrument+time)-different intermediate precisions.
Due to the complexity in the determination of the degrees of freedom associated with $s_{I(T)}^2$ and $s_{I(OIT)}^2$ (see further), the detectable ratios $\phi_{I(T)}$ and $\phi_{I(OIT)}$ are given in terms of the mean squares.
It is recommended to use a significance level of $\alpha = 0.05$ in the comparison of the precision parameters and the means ($\alpha$ represents the probability that the alternative method B is rejected when in fact its performance is not worse than that of the reference method A). ISO recommends that the risk of failing to detect the chosen minimum ratio of standard deviations or the minimum difference between the means is set at $\beta = 0.05$. For the intralaboratory situation this might be too stringent and therefore $\beta = 0.05$ as well as $\beta = 0.2$ will be considered. The latter is inspired by the requirement in bioequivalence studies [10], where it is demanded that the statistical tests have 80% power (power $= 100(1 - \beta)$).
2.4.1. Determination of the minimum number of
measurements required for the detection of $\lambda$
In the ISO document [1], the precision ($\sigma^2$) is assumed to be known and the repeatability variance as well as the between-laboratory variance is included in the calculation for the optimal number of measurements. In what follows, this is adapted to the situation in which only an estimate of the precision ($s^2$) is available. This requires the use of t-values instead of z-values (applied in ISO). Moreover, the repeatability variance as well as the (operator+instrument+day) variance component is considered.
Eq. (2) is used for the determination of the minimum number of measurements required for the detection of $\lambda$. In this equation the subscripts A and B refer to method A and method B, respectively; $t_{\alpha/2,\nu}$ is the two-sided tabulated t-value at significance level $\alpha$ and degrees of freedom $\nu = m_A q_A p_A + m_B q_B p_B - 2$; $t_{\beta,\nu}$ is the one-sided tabulated t-value at significance level $\beta$ and degrees of freedom $\nu = m_A q_A p_A + m_B q_B p_B - 2$.
This expression is based on the t-test for the comparison of two means and therefore assumes that the precisions of both methods are equal. This assumption should be acceptable for an estimation of the optimal number of measurements. If the precision of the alternative method B is unknown, which might often be the case, it is substituted by the precision of the reference method A, which gives Eq. (3). In both equations $\lambda$ is the acceptable difference between the means, which one wants to detect with $(1-\beta)100\%$ confidence from a two-tailed t-test performed at the significance level $\alpha$. The t-distribution of the non-zero mean difference is a non-central t-distribution. Therefore, instead of $(t_{\alpha/2} + t_{\beta})$, the non-centrality parameter of the non-central t-distribution should be used. An evaluation of the effect of approximating this by means of the central t-distribution indicated that very similar results are obtained. Therefore the central t-distribution is used.
As indicated earlier, it is strongly recommended to have the same numbers of operators ($m_A = m_B$) and instruments ($q_A = q_B$) for both methods. If, moreover, the number of days as well as the number of replicates are taken the same for both methods, i.e. $p_A = p_B$ and $n_A = n_B$, Eq. (3) simplifies to
$$\lambda \ge (t_{\alpha/2} + t_{\beta}) \sqrt{\frac{2(s_{OID_A}^2 + s_{r_A}^2/n_A)}{m_A q_A p_A}}. \tag{4}$$
Generally, the number of operators ($m_A = m_B$) and instruments ($q_A = q_B$) will be fixed by practical constraints. It is recommended that the number of replicates per day is equal to 2 ($n = 2$) and to focus on the number of days required, since this will lead to a balanced design in which the number of degrees of freedom associated with the repeatability is almost the same as the number of degrees of freedom associated with the between-day component. Therefore, the minimum number required is mostly determined only for the number of days $p_A$ ($= p_B$), which can be obtained by finding the smallest value for $p_A$ that satisfies Eq. (4).
The equations above are only approximations, which could be further simplified by replacing $(t_{\alpha/2} + t_{\beta})$ by a constant value. Indeed, for $\alpha = 0.05$ and $\beta = 0.05$, $(t_{\alpha/2} + t_{\beta})$ varies between 3.6 ($\nu = \infty$) and 3.9 ($\nu = 14$, i.e. $m = q = p = 2$) and therefore a constant value equal to 4 could be used. For $\alpha = 0.05$ and $\beta = 0.2$, $(t_{\alpha/2} + t_{\beta})$ varies between 2.8 ($\nu = \infty$) and 3.0 ($\nu = 14$, i.e. $m = q = p = 2$), thus a constant value of 3.0 could be applied. Eq. (4) then becomes
$$\lambda \ge 4 \sqrt{\frac{2(s_{OID_A}^2 + s_{r_A}^2/n_A)}{m_A q_A p_A}} \quad \text{when } \alpha = 0.05 \text{ and } \beta = 0.05, \tag{5}$$
$$\lambda \ge 3 \sqrt{\frac{2(s_{OID_A}^2 + s_{r_A}^2/n_A)}{m_A q_A p_A}} \quad \text{when } \alpha = 0.05 \text{ and } \beta = 0.2. \tag{6}$$
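As an illustration of how Eqs. (5) and (6) can be used, the sketch below solves the simplified expression for the number of days. It is a hedged example rather than part of the procedure itself: the function name `min_days` and its argument names are hypothetical, the relation is read as "the detectable difference must be at least the right-hand side", and the same design (m operators, q instruments, n replicates per day) is assumed for both methods.

```python
import math


def min_days(lam, s2_oid, s2_r, m, q, n=2, k=4.0):
    """Smallest number of days p satisfying
    lam >= k * sqrt(2 * (s2_oid + s2_r / n) / (m * q * p)).

    k = 4.0 corresponds to Eq. (5) (alpha = 0.05, beta = 0.05),
    k = 3.0 to Eq. (6) (alpha = 0.05, beta = 0.2).
    """
    if lam <= 0:
        raise ValueError("the detectable difference must be positive")
    var_day_mean = s2_oid + s2_r / n          # variance of a day mean
    p = 2.0 * k ** 2 * var_day_mean / (m * q * lam ** 2)
    return max(1, math.ceil(p))
```

For instance, with a detectable difference of 1.0, variance estimates of 0.5 for both the (operator+instrument+day) component and the repeatability, two operators, two instruments and duplicate measurements, the sketch asks for 6 days with the constant 4 and 4 days with the constant 3.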
$$\lambda \ge (t_{\alpha/2} + t_{\beta}) \sqrt{\frac{(m_A q_A p_A - 1)(s_{OID_A}^2 + s_{r_A}^2/n_A) + (m_B q_B p_B - 1)(s_{OID_B}^2 + s_{r_B}^2/n_B)}{m_A q_A p_A + m_B q_B p_B - 2}\left(\frac{1}{m_A q_A p_A} + \frac{1}{m_B q_B p_B}\right)}, \tag{2}$$
$$\lambda \ge (t_{\alpha/2} + t_{\beta}) \sqrt{\frac{(m_A q_A p_A - 1)(s_{OID_A}^2 + s_{r_A}^2/n_A) + (m_B q_B p_B - 1)(s_{OID_A}^2 + s_{r_A}^2/n_B)}{m_A q_A p_A + m_B q_B p_B - 2}\left(\frac{1}{m_A q_A p_A} + \frac{1}{m_B q_B p_B}\right)} \tag{3}$$
2.4.2. Determination of the minimum number of
measurements required for the detection of the
minimum ratio of precision parameters
In the ISO document [1], values of the minimum detectable ratio of the precision parameters corresponding to the chosen degrees of freedom ($\nu_A$, $\nu_B$) are given for the significance level $\alpha = 0.05$ and the power $(1 - \beta) = 0.95$. Since ISO applies a two-sided F-test to check whether the two methods have different precision, these values are obtained based on
a two-sided F-test. In our approach, the objective is to
demonstrate that the precision of the alternative
method B is at least as good as that of the reference
method A. Therefore, a one-sided F-test is applied to
compare the precision of both methods. Consequently,
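the minimum ratio of precision parameters $\rho$ or $\phi$ corresponding to the given values of ($\nu_A$, $\nu_B$, $\alpha$, $\beta$) can be computed as
$$\rho,\ \phi_{I(T)}\ \text{or}\ \phi_{I(OIT)} = \sqrt{F_{\alpha(\nu_B,\nu_A)}\, F_{\beta(\nu_A,\nu_B)}}, \tag{7}$$
where
$$\nu_A = m_A q_A p_A (n_A - 1)\ \text{and}\ \nu_B = m_B q_B p_B (n_B - 1)\ \text{in case that } \rho \text{ is considered}; \tag{8}$$
$$\nu_A = m_A q_A (p_A - 1)\ \text{and}\ \nu_B = m_B q_B (p_B - 1)\ \text{in case that } \phi_{I(T)} \text{ is considered}; \tag{9}$$
$$\nu_A = m_A q_A p_A - 1\ \text{and}\ \nu_B = m_B q_B p_B - 1\ \text{in case that } \phi_{I(OIT)} \text{ is considered}. \tag{10}$$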
Tables 4 and 5 give the minimum ratios of precision parameters ($\rho$, $\phi_{I(T)}$ or $\phi_{I(OIT)}$) as a function of the degrees of freedom $\nu_A$ and $\nu_B$ for ($\alpha = 0.05$, $\beta = 0.05$) and ($\alpha = 0.05$, $\beta = 0.2$), respectively. If the method precision is known, a number of degrees of freedom equal to 200 can be used.
With $m_A = m_B$, $q_A = q_B$ and $n_A = n_B = 2$, the minimum number of days required for the detection of the minimum ratio $\rho$, $\phi_{I(T)}$ or $\phi_{I(OIT)}$ can be obtained by first finding the smallest values for the degrees of freedom ($\nu_A$ and $\nu_B$) that satisfy Eq. (7); the associated minimum number of days can then be calculated from Eq. (8), Eq. (9) or Eq. (10), depending on which precision parameters are considered. When the values of $\alpha$ and $\beta$ considered correspond to those given in Table 4 or Table 5, the minimum values for the degrees of freedom are directly obtained by looking for the tabulated $\rho$, $\phi_{I(T)}$ or $\phi_{I(OIT)}$ that is closest to (preferably smaller than) the given detectable ratio $\rho$, $\phi_{I(T)}$ or $\phi_{I(OIT)}$ and finding its associated numbers of degrees of freedom ($\nu_A$, $\nu_B$).
The minimum number of measurements required is computed for the minimum difference $\lambda$, as well as for the minimum ratios $\rho$, $\phi_{I(T)}$ and $\phi_{I(OIT)}$, and the largest value is chosen to perform the method comparison.
Table 4
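Values of $\rho(\nu_A, \nu_B, \alpha, \beta)$ or $\phi_{I(T)}(\nu_A, \nu_B, \alpha, \beta)$ or $\phi_{I(OIT)}(\nu_A, \nu_B, \alpha, \beta)$ for ($\alpha = 0.05$, $\beta = 0.05$)
Rows: $\nu_B$; columns: $\nu_A$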
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 50 200
5 5.05 4.66 4.40 4.22 4.08 3.97 3.88 3.81 3.75 3.70 3.66 3.62 3.59 3.56 3.54 3.52 3.43 3.27 3.15
6 4.66 4.28 4.03 3.85 3.72 3.61 3.53 3.46 3.40 3.36 3.31 3.28 3.25 3.22 3.20 3.17 3.09 2.93 2.81
7 4.40 4.03 3.79 3.61 3.48 3.38 3.29 3.23 3.17 3.12 3.08 3.05 3.02 2.99 2.96 2.94 2.86 2.70 2.59
8 4.22 3.85 3.61 3.44 3.31 3.21 3.13 3.06 3.00 2.96 2.92 2.88 2.85 2.82 2.80 2.78 2.70 2.54 2.42
9 4.08 3.72 3.48 3.31 3.18 3.08 3.00 2.93 2.88 2.83 2.79 2.75 2.72 2.70 2.67 2.65 2.57 2.41 2.29
10 3.97 3.61 3.38 3.21 3.08 2.98 2.90 2.83 2.78 2.73 2.69 2.66 2.62 2.60 2.57 2.55 2.47 2.31 2.19
11 3.88 3.53 3.29 3.13 3.00 2.90 2.82 2.75 2.70 2.65 2.61 2.58 2.55 2.52 2.49 2.47 2.39 2.23 2.11
12 3.81 3.46 3.23 3.06 2.93 2.83 2.75 2.69 2.63 2.59 2.55 2.51 2.48 2.45 2.43 2.41 2.33 2.16 2.05
13 3.75 3.40 3.17 3.00 2.88 2.78 2.70 2.63 2.58 2.53 2.49 2.46 2.42 2.40 2.37 2.35 2.27 2.11 1.99
14 3.70 3.36 3.12 2.96 2.83 2.73 2.65 2.59 2.53 2.48 2.44 2.41 2.38 2.35 2.33 2.30 2.22 2.06 1.94
15 3.66 3.31 3.08 2.92 2.79 2.69 2.61 2.55 2.49 2.44 2.40 2.37 2.34 2.31 2.29 2.26 2.18 2.02 1.90
16 3.62 3.28 3.05 2.88 2.75 2.66 2.58 2.51 2.46 2.41 2.37 2.33 2.30 2.28 2.25 2.23 2.15 1.98 1.86
17 3.59 3.25 3.02 2.85 2.72 2.62 2.55 2.48 2.42 2.38 2.34 2.30 2.27 2.24 2.22 2.20 2.12 1.95 1.83
18 3.56 3.22 2.99 2.82 2.70 2.60 2.52 2.45 2.40 2.35 2.31 2.28 2.24 2.22 2.19 2.17 2.09 1.92 1.80
19 3.54 3.20 2.96 2.80 2.67 2.57 2.49 2.43 2.37 2.33 2.29 2.25 2.22 2.19 2.17 2.15 2.06 1.90 1.77
20 3.52 3.17 2.94 2.78 2.65 2.55 2.47 2.41 2.35 2.30 2.26 2.23 2.20 2.17 2.15 2.12 2.04 1.87 1.74
25 3.43 3.09 2.86 2.70 2.57 2.47 2.39 2.33 2.27 2.22 2.18 2.15 2.12 2.09 2.06 2.04 1.96 1.78 1.65
50 3.27 2.93 2.70 2.54 2.41 2.31 2.23 2.16 2.11 2.06 2.02 1.98 1.95 1.92 1.90 1.87 1.78 1.60 1.45
200 3.15 2.81 2.59 2.42 2.29 2.19 2.11 2.05 1.99 1.94 1.90 1.86 1.83 1.80 1.77 1.74 1.65 1.45 1.26
2.5. Evaluation of test results
For each test sample, the following parameters are to be computed:
$s_{r_A}^2$, $s_{r_B}^2$: estimates of the repeatability variance for methods A and B, respectively
$s_{D_A}^2$, $s_{D_B}^2$: estimates of the between-day variance component for methods A and B, respectively
$s_{OID_A}^2$, $s_{OID_B}^2$: estimates of the (operator+instrument+day) variance component for methods A and B, respectively
$s_{I(T)_A}^2$, $s_{I(T)_B}^2$: estimates of the time-different intermediate precision (variance) for methods A and B, respectively
$s_{I(OIT)_A}^2$, $s_{I(OIT)_B}^2$: estimates of the (operator+instrument+time)-different intermediate precision (variance) for methods A and B, respectively
$s_{\bar{y}_{ijk}A}^2$, $s_{\bar{y}_{ijk}B}^2$: estimates of the variance of the day means $\bar{y}_{ijk}$ for methods A and B, respectively
$\bar{y}_A$, $\bar{y}_B$: grand means obtained from methods A and B, respectively.
The calculation of all these parameters is given in Tables 2 and 3.
2.5.1. Comparison of precision
As mentioned before, it is important to show that the precision of method B is at least as good as that of method A. Therefore a one-sided F-test is applied here instead of the two-sided test used in ISO [1]. The null hypothesis $H_0$ is that the precision of the alternative method B is better than or equal to the precision of the reference method A ($H_0: \sigma_B^2 \le \sigma_A^2$) and the alternative hypothesis $H_1$ is that the precision of the alternative method B is worse than the precision of the reference method A ($H_1: \sigma_B^2 > \sigma_A^2$).
2.5.1.1. Comparison of repeatability. To compare the repeatability of two methods, the sample statistic $F_r$ is calculated as follows:
$$F_r = \frac{s_{r_B}^2}{s_{r_A}^2}. \tag{11}$$
Table 5
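Values of $\rho(\nu_A, \nu_B, \alpha, \beta)$ or $\phi_{I(T)}(\nu_A, \nu_B, \alpha, \beta)$ or $\phi_{I(OIT)}(\nu_A, \nu_B, \alpha, \beta)$ for ($\alpha = 0.05$, $\beta = 0.2$)
Rows: $\nu_B$; columns: $\nu_A$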
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 50 200
3 5.22 4.41 4.00 3.76 3.60 3.48 3.39 3.32 3.27 3.23 3.19 3.16 3.13 3.11 3.09 3.07 3.05 3.04 2.99 2.89 2.81
4 4.76 3.98 3.59 3.35 3.19 3.08 2.99 2.92 2.87 2.83 2.79 2.76 2.74 2.71 2.69 2.68 2.66 2.65 2.60 2.49 2.42
5 4.51 3.74 3.35 3.12 2.96 2.85 2.77 2.70 2.65 2.60 2.57 2.54 2.51 2.49 2.47 2.45 2.44 2.42 2.37 2.27 2.20
6 4.35 3.59 3.21 2.97 2.82 2.70 2.62 2.55 2.50 2.46 2.42 2.39 2.37 2.34 2.32 2.31 2.29 2.28 2.22 2.12 2.05
7 4.24 3.49 3.10 2.87 2.71 2.60 2.52 2.45 2.40 2.36 2.32 2.29 2.26 2.24 2.22 2.20 2.19 2.17 2.12 2.02 1.94
8 4.15 3.41 3.03 2.79 2.64 2.53 2.44 2.38 2.32 2.28 2.24 2.21 2.19 2.16 2.14 2.13 2.11 2.10 2.04 1.94 1.86
9 4.09 3.35 2.97 2.74 2.58 2.47 2.38 2.32 2.26 2.22 2.18 2.15 2.13 2.10 2.08 2.07 2.05 2.04 1.98 1.88 1.80
10 4.04 3.30 2.92 2.69 2.53 2.42 2.34 2.27 2.22 2.17 2.14 2.11 2.08 2.06 2.04 2.02 2.00 1.99 1.93 1.83 1.75
11 4.00 3.26 2.88 2.65 2.50 2.38 2.30 2.23 2.18 2.14 2.10 2.07 2.04 2.02 2.00 1.98 1.96 1.95 1.89 1.78 1.70
12 3.97 3.23 2.85 2.62 2.47 2.35 2.27 2.20 2.15 2.10 2.07 2.03 2.01 1.98 1.96 1.95 1.93 1.91 1.86 1.75 1.67
13 3.94 3.21 2.83 2.60 2.44 2.33 2.24 2.17 2.12 2.08 2.04 2.01 1.98 1.96 1.94 1.92 1.90 1.89 1.83 1.72 1.64
14 3.92 3.18 2.80 2.57 2.42 2.30 2.22 2.15 2.10 2.05 2.02 1.98 1.96 1.93 1.91 1.89 1.88 1.86 1.81 1.69 1.61
15 3.90 3.17 2.79 2.55 2.40 2.28 2.20 2.13 2.08 2.03 1.99 1.96 1.94 1.91 1.89 1.87 1.86 1.84 1.78 1.67 1.59
16 3.88 3.15 2.77 2.54 2.38 2.27 2.18 2.11 2.06 2.01 1.98 1.94 1.92 1.89 1.87 1.85 1.84 1.82 1.76 1.65 1.56
17 3.87 3.13 2.75 2.52 2.37 2.25 2.17 2.10 2.04 2.00 1.96 1.93 1.90 1.88 1.86 1.84 1.82 1.80 1.75 1.63 1.55
18 3.86 3.12 2.74 2.51 2.35 2.24 2.15 2.08 2.03 1.98 1.95 1.91 1.89 1.86 1.84 1.82 1.81 1.79 1.73 1.62 1.53
19 3.84 3.11 2.73 2.50 2.34 2.23 2.14 2.07 2.02 1.97 1.93 1.90 1.87 1.85 1.83 1.81 1.79 1.78 1.72 1.60 1.51
20 3.83 3.10 2.72 2.49 2.33 2.22 2.13 2.06 2.01 1.96 1.92 1.89 1.86 1.84 1.82 1.80 1.78 1.76 1.71 1.59 1.50
25 3.79 3.06 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.92 1.88 1.85 1.82 1.79 1.77 1.75 1.73 1.72 1.66 1.54 1.44
50 3.71 2.98 2.60 2.37 2.21 2.09 2.00 1.93 1.88 1.83 1.79 1.76 1.73 1.70 1.68 1.66 1.64 1.62 1.56 1.43 1.32
200 3.65 2.92 2.54 2.31 2.15 2.03 1.94 1.87 1.81 1.76 1.72 1.69 1.66 1.63 1.60 1.58 1.56 1.55 1.48 1.33 1.19
212 S. Kuttatharmmakul et al. / Analytica Chimica Acta 391 (1999) 203225
$F_r$ is then compared with $F_{\alpha}(\nu_{rB}, \nu_{rA})$, where $F_{\alpha}(\nu_{rB}, \nu_{rA})$ is the value of the F-distribution with degrees of freedom of the numerator $\nu_{rB} = (n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}$ and of the denominator $\nu_{rA} = (n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}$; $\alpha$ represents the portion of the F-distribution to the right of the given value ($\alpha = 0.05$).
If $F_r > F_{\alpha}(\nu_{rB}, \nu_{rA})$, method B has worse repeatability than method A at the $(1-\alpha)100\%$ (i.e. 95%) confidence level and therefore the repeatability of the alternative method B is not acceptable.
If $F_r \le F_{\alpha}(\nu_{rB}, \nu_{rA})$, the repeatability of the alternative method B is acceptable, which means that it is at most a factor $\rho$ worse than that of method A.
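As a small illustration of the decision rule above, the following Python sketch computes $F_r$ from Eq. (11) and compares it with a tabulated critical value. The function name `one_sided_f_check` is hypothetical; the critical value has to be supplied from an F-table, and the numbers are those quoted for Example 1 in Section 3.1.8.1.

```python
def one_sided_f_check(s2_b, s2_a, f_crit):
    """Return (F_r, acceptable) for the one-sided F-test of Eq. (11).

    s2_b, s2_a : repeatability variances of methods B and A
    f_crit     : tabulated F_alpha(nu_rB, nu_rA), supplied by the user
    """
    f_r = s2_b / s2_a
    return f_r, f_r <= f_crit


# Example 1: s2_rB = 1.4810, s2_rA = 1.2317, F_0.05(20, 20) = 2.12
f_r, acceptable = one_sided_f_check(1.4810, 1.2317, 2.12)
print(f"F_r = {f_r:.2f}, repeatability of B acceptable: {acceptable}")
```

The same helper can be reused for Eqs. (13), (14), (17) and (18) by passing the corresponding mean squares or variances and the matching tabulated F-value.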
2.5.1.2. Comparison of time-different intermediate
precision. For the comparison of the time-different
intermediate precision, we need the number of degrees
of freedom associated with the precision estimates.
Since these estimates are not directly estimated from
the data but are calculated as a linear combination of
two mean squares, MSD and MSE (see Table 3), it is
not evident how to determine the number of degrees of
freedom associated with this compound variance. The
Satterthwaite approximation [11] (see further) can
then be used.
However, to avoid the complexity in the determination of the degrees of freedom associated with $s^2_{I(T)}$, the comparison of time-different intermediate precisions can be performed in an indirect way by comparing the day mean squares $MS_D$ [12]. Indeed, since (see Table 3):
$$MS_D = n s_D^2 + s_r^2 \quad \text{and} \quad s^2_{I(T)} = s_D^2 + s_r^2,$$
it follows that
$$MS_D = n s^2_{I(T)} - n s_r^2 + s_r^2 = n s^2_{I(T)} - (n - 1) s_r^2.$$
Therefore, provided that the repeatabilities of both methods are equal ($\sigma^2_{rA} = \sigma^2_{rB}$) and the number of replicates per day for both methods is equal ($n_A = n_B$), the day mean squares $MS_D$ are considered instead of $s^2_{I(T)}$. If $n_A = n_B$, the equality of the repeatabilities for both methods is first to be tested ($H_0: \sigma^2_{rA} = \sigma^2_{rB}$; $H_1: \sigma^2_{rA} \ne \sigma^2_{rB}$) by means of a two-sided F-test. The results obtained from the comparison of the repeatabilities in Section 2.5.1.1 cannot be used here, since a one-sided F-test has been considered to test the hypotheses $H_0: \sigma^2_{rB} \le \sigma^2_{rA}$; $H_1: \sigma^2_{rB} > \sigma^2_{rA}$. A non-significant test, which means that the repeatability of method B is acceptable, does not necessarily imply that the repeatabilities of both methods are equal; the repeatability of method B can be better (smaller) than the repeatability of method A. Therefore, the repeatabilities of both methods have to be compared again by applying a two-sided F-test. $F_r$ is obtained here as follows:
$$F_r = \frac{s_1^2}{s_2^2} \qquad (12)$$
with $s_1^2$ the larger of $s_{rA}^2$ and $s_{rB}^2$.
$F_r$ is then compared with $F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, where $F_{\alpha/2}(\nu_{r1}, \nu_{r2})$ is the value of the F-distribution with degrees of freedom of the numerator $\nu_{r1}$ and of the denominator $\nu_{r2}$; $\alpha/2$ represents the portion of the F-distribution to the right of the given value ($\alpha = 0.05$). Here
$\nu_{r1} = \nu_{rA}$ and $\nu_{r2} = \nu_{rB}$ if $s^2_{rA} > s^2_{rB}$;
$\nu_{r1} = \nu_{rB}$ and $\nu_{r2} = \nu_{rA}$ if $s^2_{rB} > s^2_{rA}$;
$$\nu_{rB} = (n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} \quad \text{and} \quad \nu_{rA} = (n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}.$$
If $F_r \le F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, there is no evidence that the two methods have different repeatabilities and therefore the equality of the repeatabilities for both methods can be assumed.
If $F_r > F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, the repeatabilities of the two methods are significantly different at the significance level $\alpha$ (i.e. 5%). In that case the following simplified approach to compare the time-different intermediate precisions cannot be applied. The Satterthwaite approximation to estimate the number of degrees of freedom associated with $s^2_{I(T)}$ has then to be used (see further).
In the situation where the number of replicates per day for both methods is equal ($n_A = n_B$) and the equality of the repeatabilities for both methods can be assumed ($\sigma^2_{rA} = \sigma^2_{rB}$), the comparison of time-different intermediate precision is performed by calculating $F_{I(T)}$ as follows:
$$F_{I(T)} = \frac{MS_{DB}}{MS_{DA}} = \frac{n_B s^2_{DB} + s^2_{rB}}{n_A s^2_{DA} + s^2_{rA}} = \frac{n_B s^2_{I(T)B} - (n_B - 1) s^2_{rB}}{n_A s^2_{I(T)A} - (n_A - 1) s^2_{rA}}. \qquad (13)$$
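The identity behind Eq. (13), $MS_D = n s_D^2 + s_r^2 = n s^2_{I(T)} - (n-1) s_r^2$, can be checked numerically. The sketch below is only an illustration with hypothetical function names; the inputs are the mean squares reported for method A in Table 11 (Example 2).

```python
import math


def ms_d_from_components(n, s2_d, s2_r):
    """Day mean square written as n*s2_D + s2_r."""
    return n * s2_d + s2_r


def ms_d_from_intermediate(n, s2_it, s2_r):
    """Day mean square written as n*s2_I(T) - (n - 1)*s2_r."""
    return n * s2_it - (n - 1) * s2_r


# Table 11 (method A, Example 2): MS_D = 0.2050, MS_E = 0.0239, n = 2
n, ms_d, ms_e = 2, 0.2050, 0.0239
s2_r = ms_e                      # repeatability variance
s2_d = (ms_d - ms_e) / n         # between-day variance component
s2_it = s2_d + s2_r              # time-different intermediate precision

print(round(s2_d, 4), round(s2_it, 4))
print(math.isclose(ms_d_from_components(n, s2_d, s2_r), ms_d))
print(math.isclose(ms_d_from_intermediate(n, s2_it, s2_r), ms_d))
```

Both expressions return the original mean square, which is why comparing $MS_D$ is equivalent to comparing $s^2_{I(T)}$ only when $n$ and the repeatability are the same for both methods.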
$F_{I(T)}$ is compared with $F_{\alpha}(\nu_{I(T)B}, \nu_{I(T)A})$, where $F_{\alpha}(\nu_{I(T)B}, \nu_{I(T)A})$ is the value of the F-distribution with degrees of freedom of the numerator $\nu_{I(T)B} = \sum_{i=1}^{m_B}\sum_{j=1}^{q_B}(p_{ijB} - 1)$ and of the denominator $\nu_{I(T)A} = \sum_{i=1}^{m_A}\sum_{j=1}^{q_A}(p_{ijA} - 1)$; $\alpha$ represents the portion of the F-distribution to the right of the given value ($\alpha = 0.05$).
If $F_{I(T)} > F_{\alpha}(\nu_{I(T)B}, \nu_{I(T)A})$, method B has worse time-different intermediate precision than method A at the $(1-\alpha)100\%$ (i.e. 95%) confidence level and therefore the time-different intermediate precision of the alternative method B is not acceptable.
If $F_{I(T)} \le F_{\alpha}(\nu_{I(T)B}, \nu_{I(T)A})$, the time-different intermediate precision of the alternative method B is acceptable, which means that it is at most a factor $\rho_{I(T)}$ worse than that of method A.
In the situation where the number of replicates per day for both methods is not equal ($n_A \ne n_B$) or the equality of the repeatabilities for both methods cannot be assumed ($\sigma^2_{rA} \ne \sigma^2_{rB}$), the comparison of the time-different intermediate precisions cannot be performed by Eq. (13) but must be investigated through the comparison of $s^2_{I(T)}$. Then $F_{I(T)}$ is calculated as follows:
$$F_{I(T)} = \frac{s^2_{I(T)B}}{s^2_{I(T)A}}. \qquad (14)$$
As mentioned earlier, the number of degrees of freedom associated with $s^2_{I(T)}$ for both methods is obtained from the Satterthwaite approximation (Eqs. (15) and (16)). When a non-integer value is obtained for $\nu_{I(T)}$, round the number down to the nearest integer.
The comparison of $F_{I(T)}$ with $F_{\alpha}(\nu_{I(T)B}, \nu_{I(T)A})$, where $\nu_{I(T)B}$ and $\nu_{I(T)A}$ are computed from Eqs. (15) and (16), respectively, is then performed in the same way as mentioned earlier when Eq. (13) is applied.
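The Satterthwaite degrees of freedom of Eqs. (15) and (16) can be sketched in a few lines of Python. This is a minimal illustration, assuming a balanced design with $n$ replicates per day; the function name `satterthwaite_df_it` and its argument names are hypothetical, and the inputs are the mean squares and degrees of freedom listed for Example 2 in Tables 11 and 12.

```python
import math


def satterthwaite_df_it(ms_d, ms_e, n, df_day, df_resid):
    """Degrees of freedom of s2_I(T) = MS_D/n + (n - 1)*MS_E/n.

    df_day   : degrees of freedom of MS_D
    df_resid : degrees of freedom of MS_E
    The result is rounded down, as prescribed in the text.
    """
    day_term = ms_d / n
    resid_term = (n - 1) * ms_e / n
    s2_it = day_term + resid_term
    denom = day_term ** 2 / df_day + resid_term ** 2 / df_resid
    return math.floor(s2_it ** 2 / denom)


# Example 2: method B (Table 12) and method A (Table 11), n = 2
nu_it_b = satterthwaite_df_it(0.0397, 0.0024, 2, df_day=10, df_resid=11)
nu_it_a = satterthwaite_df_it(0.2050, 0.0239, 2, df_day=10, df_resid=11)
print(nu_it_b, nu_it_a)
```

With these inputs the sketch reproduces the values 11 and 12 used in Section 3.2.8.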
2.5.1.3. Comparison of (operator+instrument+time)-different intermediate precision. In analogy with the comparison of the time-different intermediate precision, the comparison of (operator+instrument+time)-different intermediate precision can be performed in an indirect way by comparing the (operator+instrument+day)-mean squares $MS_{OID}$. If the number of replicates per day for both methods is equal ($n_A = n_B$) and the comparison of the repeatabilities in Eq. (12) does not give evidence against the equality of the repeatabilities of both methods (i.e. $\sigma^2_{rA} = \sigma^2_{rB}$), $F_{I(OIT)}$ is calculated as follows:
$$F_{I(OIT)} = \frac{MS_{OIDB}}{MS_{OIDA}} = \frac{n_B s^2_{OIDB} + s^2_{rB}}{n_A s^2_{OIDA} + s^2_{rA}} = \frac{n_B s^2_{I(OIT)B} - (n_B - 1) s^2_{rB}}{n_A s^2_{I(OIT)A} - (n_A - 1) s^2_{rA}}. \qquad (17)$$
The comparison of $F_{I(OIT)}$ with $F_{\alpha}(\nu_{I(OIT)B}, \nu_{I(OIT)A})$, where $\nu_{I(OIT)B} = \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1$ and $\nu_{I(OIT)A} = \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1$, is then performed in analogy with the comparison of the time-different intermediate precision mentioned earlier when Eq. (13) is considered.
In the situation where the number of replicates per day for both methods is not equal ($n_A \ne n_B$) or the equality of the repeatabilities of both methods cannot be assumed ($\sigma^2_{rA} \ne \sigma^2_{rB}$), the comparison of (operator+instrument+time)-different intermediate precision must be investigated through the comparison of $s^2_{I(OIT)}$. Then $F_{I(OIT)}$ is calculated as follows:
$$F_{I(OIT)} = \frac{s^2_{I(OIT)B}}{s^2_{I(OIT)A}}. \qquad (18)$$
Again, the further steps to compare $F_{I(OIT)}$ with $F_{\alpha}(\nu_{I(OIT)B}, \nu_{I(OIT)A})$, where $\nu_{I(OIT)B}$ and $\nu_{I(OIT)A}$ are computed from Eqs. (19) and (20), respectively, are in analogy with the comparison of the time-different intermediate precision mentioned earlier when Eq. (13) is considered.
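$$\nu_{I(T)B} = \frac{\left(s^2_{I(T)B}\right)^2}{\dfrac{(MS_{DB}/n_B)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B}(p_{ijB} - 1)} + \dfrac{\left((n_B - 1) MS_{EB}/n_B\right)^2}{(n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}, \qquad (15)$$
$$\nu_{I(T)A} = \frac{\left(s^2_{I(T)A}\right)^2}{\dfrac{(MS_{DA}/n_A)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A}(p_{ijA} - 1)} + \dfrac{\left((n_A - 1) MS_{EA}/n_A\right)^2}{(n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}}}. \qquad (16)$$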
2.5.1.4. Comment concerning the ISO 5725-6
procedure. In ISO [1], the comparison of the overall
precision (a term which is not really explained) is
performed in analogy with Eq. (17) without pre-
evaluating the equality of the repeatabilities of both
methods. The fact that the same number of replicates
per laboratory for the two methods ($n_A = n_B$) is required is not taken into consideration either. If the
overall precision refers to the reproducibility, the
indirect comparison of the latter for both methods
by a comparison of the variance of the laboratory
means is only possible if the repeatability and the
number of replicates per laboratory (n) is the same for
both methods. If this is not the case, a direct
comparison of the reproducibility obtained with the
two methods should be performed, possibly in analogy
with Eq. (18).
2.5.2. Evaluation of the bias
2.5.2.1. Comments concerning the ISO 5725-6
procedure. The evaluation of the bias is performed
by comparing the grand means obtained with both
methods. In ISO [1] the comparison is based on the z-
test, since the sample statistic is compared with 2 (an approximation of the two-sided tabulated z-value of $z_{\alpha} = z_{0.05} = 1.96$). This implies that the estimated standard deviations used in the comparison are obtained from large samples and therefore that they are sufficiently good estimates of the true standard deviations. If the sample statistic is larger than 2, the difference between the means obtained with the two methods is statistically significant. In that case, to avoid the rejection of the method with an acceptable bias, it is further examined whether the estimated bias $d$ can be considered acceptable. ISO concludes that the bias is significant, but acceptable if its absolute value is not larger than $\lambda/2$.
However, this approach is questionable for the following reasons. ISO specifies $\lambda$ to be four times $\sigma_d$, where $\sigma_d$ represents the standard deviation of the difference between the means of methods A and B. This is obtained from $\lambda = (z_{\alpha/2} + z_{\beta})\sigma_d$, which with $\alpha = \beta = 0.05$ becomes $(1.96 + 1.645)\sigma_d \approx 4\sigma_d$. In the evaluation of the bias $\sigma_d$ is estimated from the experiments as $s_d$. If the estimated bias $d = |\bar{y}_A - \bar{y}_B| = \lambda/2$, the 95% confidence interval (CI) around $d$ can be calculated as
$$\lambda/2 \pm 1.96 s_d \quad \text{or} \quad 2\sigma_d \pm 2 s_d.$$
If $s_d$ is exactly equal to $\sigma_d$, the lower limit of the confidence interval is equal to 0 and the upper limit is equal to $\lambda$ (see Fig. 2(a)). Since 0 is just included in the CI, the bias is not significantly different from zero. (Evaluation of the bias by means of the z-test, as done in ISO, would of course lead to the same conclusion since the test statistic, $d/s_d = 2\sigma_d/s_d = 2$). Moreover, the probability that the true absolute bias, as estimated by
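$$\nu_{I(OIT)B} = \frac{\left(s^2_{I(OIT)B}\right)^2}{\dfrac{(MS_{OIDB}/n_B)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1} + \dfrac{\left((n_B - 1) MS_{EB}/n_B\right)^2}{(n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}, \qquad (19)$$
$$\nu_{I(OIT)A} = \frac{\left(s^2_{I(OIT)A}\right)^2}{\dfrac{(MS_{OIDA}/n_A)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1} + \dfrac{\left((n_A - 1) MS_{EA}/n_A\right)^2}{(n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}}}. \qquad (20)$$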
Fig. 2. Different situations of a bias evaluation. $d$: the estimated absolute difference between the grand means obtained with the two methods; $\lambda$: the acceptable bias; () 95% confidence interval. (a) $s_d = \sigma_d$ and $d = \lambda/2$; (b) $s_d < \sigma_d$ and $d > \lambda/2$; (c) $s_d > \sigma_d$ and $d$ […]
$d = \lambda/2$, exceeds $\lambda$ is only 2.5%. ISO considers a significant bias to be acceptable if $d$ is smaller than $\lambda/2$. However, if $s_d$ equals $\sigma_d$ there is no point in comparing $d$ with $\lambda/2$ since with $d$ larger than $\lambda/2$ there is no chance that the significant difference can be acceptable.
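The arithmetic behind the ISO choice $\lambda \approx 4\sigma_d$ and the resulting confidence interval can be reproduced with the Python standard library. The sketch below is only an illustration; the variable names are hypothetical and $\sigma_d$ is set to 1 for convenience.

```python
from statistics import NormalDist

std_normal = NormalDist()      # standard normal distribution
alpha = beta = 0.05

# lambda / sigma_d = z_{alpha/2} + z_beta (about 3.6, rounded to 4 by ISO)
factor = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(1 - beta)

# Take sigma_d = s_d = 1, so that lambda = 4 and the estimated bias d = lambda/2 = 2
sigma_d = 1.0
lam = 4.0 * sigma_d
d = lam / 2
ci = (d - 1.96 * sigma_d, d + 1.96 * sigma_d)   # 95% CI around d

# Probability that the true bias exceeds lambda when the estimate is d
tail = 1 - std_normal.cdf((lam - d) / sigma_d)

print(f"factor = {factor:.3f}")
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"P(bias > lambda) = {tail:.4f}")
```

The interval runs from just above 0 to just below $\lambda$, and the tail probability is a little over 2%, in line with the roughly 2.5% quoted in the text when 1.96 is rounded to 2.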
If $s_d$ is different from $\sigma_d$, as is to be expected, the comparison can lead to wrong conclusions. Indeed if $s_d$ is smaller than $\sigma_d$, which will be, e.g. the case if the acceptance criteria for the precision measure are defining the number of measurements to be performed, considering the bias to be unacceptable if $d$ is larger than $\lambda/2$ can lead to the rejection of a method with an acceptable bias (see Fig. 2(b)). On the other hand if $s_d$ is larger than $\sigma_d$, an unacceptable bias can lead to a non-significant test and therefore to acceptance of the method (see Fig. 2(c)). Therefore, it is more appropriate to compare the one-sided upper 95% confidence limit around $d$ with $\lambda$ to conclude on the acceptability of the method (see further).
2.5.2.2. Adapted approach. In our approach which is intended for the intralaboratory situation, the standard deviations are generally estimated from a relatively small sample size, and therefore the t-test is more appropriate than the z-test. A two-sided test ($H_0: \mu_A = \mu_B$; $H_1: \mu_A \ne \mu_B$) is considered since the difference between the two means can be positive as well as negative. Therefore
$$t_{cal} = \frac{|\bar{y}_A - \bar{y}_B|}{s_d}, \qquad (21)$$
where $s_d$ represents the estimated standard deviation of the difference between the means obtained with the two methods.
The use of the t-test requires that the variances of the day means obtained with the two methods are equal (i.e. $\sigma^2_{\bar{y}_{ijk}A} = \sigma^2_{\bar{y}_{ijk}B}$). This equality must first be tested by applying a two-sided F-test. The degrees of freedom associated with $s^2_{\bar{y}_{ijk}A}$ and $s^2_{\bar{y}_{ijk}B}$ are $\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1$ and $\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1$, respectively.
If there is no evidence against the equality of $\sigma^2_{\bar{y}_{ijk}A}$ and $\sigma^2_{\bar{y}_{ijk}B}$, $s_d$ is calculated by applying the pooled variance $s_p^2$ as follows:
$$s_d = \sqrt{s_p^2\left(\frac{1}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}} + \frac{1}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}\right)}, \qquad (22)$$
where $s_p^2$ is given by Eq. (23), with
$$\nu_d = \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} + \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 2. \qquad (24)$$
When the equality of the variances of the day means, $\sigma^2_{\bar{y}_{ijk}A}$ and $\sigma^2_{\bar{y}_{ijk}B}$, cannot be assumed, the variances cannot be pooled and $s_d$ is calculated as follows [13]:
$$s_d = \sqrt{\frac{s^2_{\bar{y}_{ijk}A}}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}} + \frac{s^2_{\bar{y}_{ijk}B}}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}. \qquad (25)$$
The number of degrees of freedom $\nu_d$ associated with $s_d$ is then calculated by applying the Satterthwaite approximation (Eq. (26)).
The $t_{cal}$ is compared with $t_{\alpha/2,\nu_d}$, where $t_{\alpha/2,\nu_d}$ is the two-sided tabulated t-value at the significance level $\alpha = 0.05$ and the degrees of freedom $\nu_d$ as indicated in Eq. (24) or Eq. (26).
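For the unequal-variance case, Eqs. (21), (25) and (26) can be sketched as follows. The function name `unpooled_sd_and_df` is hypothetical; the inputs are the variances of the day means and the number of day means from Example 2 (Tables 11 and 12), and the degrees of freedom are rounded down, following the worked example in Section 3.2.

```python
import math


def unpooled_sd_and_df(var_a, k_a, var_b, k_b):
    """Return (s_d, nu_d) following Eqs. (25) and (26).

    var_a, var_b : variances of the day means for methods A and B
    k_a, k_b     : total number of day means for methods A and B
    """
    term_a = var_a / k_a
    term_b = var_b / k_b
    s_d = math.sqrt(term_a + term_b)
    nu_d = (term_a + term_b) ** 2 / (
        term_a ** 2 / (k_a - 1) + term_b ** 2 / (k_b - 1)
    )
    return s_d, math.floor(nu_d)


# Example 2: variances of day means 0.1025 (A) and 0.0199 (B), 11 days each
s_d, nu_d = unpooled_sd_and_df(0.1025, 11, 0.0199, 11)
t_cal = abs(39.884 - 39.486) / s_d      # Eq. (21) with the two grand means
print(round(s_d, 4), nu_d, round(t_cal, 2))
```

The resulting $t_{cal}$ is then compared with a tabulated two-sided t-value for $\nu_d$ degrees of freedom, which has to be looked up separately.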
If $t_{cal} > t_{\alpha/2,\nu_d}$, the difference between the means obtained with the two methods is statistically significant. Though the difference is significant it might not
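$$s_p^2 = \frac{\left(\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1\right) s^2_{\bar{y}_{ijk}A} + \left(\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1\right) s^2_{\bar{y}_{ijk}B}}{\nu_d} \qquad (23)$$
$$\nu_d = \frac{\left(s_d^2\right)^2}{\dfrac{\left(s^2_{\bar{y}_{ijk}A} / \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}\right)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1} + \dfrac{\left(s^2_{\bar{y}_{ijk}B} / \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}\right)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1}} \qquad (26)$$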
be relevant to the application. Therefore it could be further evaluated whether the difference found can be considered acceptable. The one-sided $(1-\beta)100\%$ (i.e. 95% for $\beta = 0.05$) upper confidence limit (UCL) around the absolute difference $d$ is compared with the acceptance limit $\lambda$. The UCL is obtained as follows:
$$UCL = |\bar{y}_A - \bar{y}_B| + s_d\, t_{\beta,\nu_d}, \qquad (27)$$
where $s_d$ is as shown in Eq. (22) or Eq. (25) and $t_{\beta,\nu_d}$ is the one-sided tabulated t-value at the significance level $\beta = 0.05$ and the degrees of freedom $\nu_d$ (as shown in Eq. (24) or Eq. (26)).
If $UCL \le \lambda$, the bias although statistically significant is acceptable since there is a smaller than (or at most) $\beta$ probability that the true absolute difference as estimated by $|\bar{y}_A - \bar{y}_B|$ is larger than $\lambda$.
If $UCL > \lambda$, the bias is not acceptable since there is a larger than $\beta$ probability that the true absolute difference as estimated by $|\bar{y}_A - \bar{y}_B|$ is larger than $\lambda$.
If $t_{cal} \le t_{\alpha/2,\nu_d}$, the difference between the means obtained with the two methods is statistically insignificant. However, if the precision estimates ($s_r^2$ and $s^2_{OID}$) of the two methods obtained experimentally are larger than those used for the calculation of the minimum number of measurements required for the detection of $\lambda$ in Eqs. (2)–(6), an unacceptable bias can lead to a non-significant test.
Therefore, to limit the risk of adopting a method with an unacceptable bias, the interval hypothesis testing as proposed by Hartmann et al. [2] should be more appropriate for the evaluation of the bias than the approach mentioned above (Eqs. (21)–(27)). The procedure is as follows.
Calculate the one-sided $(1-\alpha)100\%$ (i.e. 95% for $\alpha = 0.05$) upper confidence limit (UCL) around the absolute difference $d$:
$$UCL = |\bar{y}_A - \bar{y}_B| + s_d\, t_{\alpha,\nu_d}, \qquad (28)$$
where $\alpha$ is 0.05, $s_d$ is the same as in Eq. (22) or Eq. (25) and $\nu_d$ is the same as in Eq. (24) or Eq. (26).
Since in interval hypothesis testing, the null and alternative hypotheses are reversed, the roles of $\alpha$ and $\beta$ are also reversed. Therefore, here $\alpha$ corresponds to the probability that a method that is biased to an unacceptably large extent will be accepted.
If the UCL is not larger than the acceptable bias $\lambda$, the difference between the grand means of method A and method B is considered acceptable at the $(1-\alpha)100\%$ confidence level and the bias of method B is acceptable. If the UCL is larger than the acceptable bias $\lambda$, the bias of method B is not acceptable. With this approach the probability of accepting a method that is too much biased is controlled at 5%.
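The upper confidence limit used in both the point and the interval hypothesis approach is a one-line computation. The sketch below uses a hypothetical helper `upper_confidence_limit`; the one-sided t-values (1.771 and 0.870 for 13 degrees of freedom) are the tabulated values quoted in Example 2 and must be supplied by the user.

```python
def upper_confidence_limit(mean_a, mean_b, s_d, t_one_sided):
    """One-sided upper confidence limit around |mean_a - mean_b|."""
    return abs(mean_a - mean_b) + s_d * t_one_sided


acceptable_bias = 0.50                      # lambda in Example 2

# One-sided t-value for probability 0.05 and 13 degrees of freedom
ucl_005 = upper_confidence_limit(39.884, 39.486, 0.1055, 1.771)
# One-sided t-value for probability 0.2 and 13 degrees of freedom
ucl_020 = upper_confidence_limit(39.884, 39.486, 0.1055, 0.870)

for label, ucl in (("0.05", ucl_005), ("0.20", ucl_020)):
    verdict = "acceptable" if ucl <= acceptable_bias else "not acceptable"
    print(f"probability {label}: UCL = {ucl:.3f} -> bias {verdict}")
```

The verdict changes between the two probabilities, which illustrates how strongly the conclusion depends on the risk the laboratory is willing to accept.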
The evaluation of the bias described above is based
on nested designs performed separately for methods A
and B. If it is possible to design a simultaneous
experiment (e.g. same days and same operators) a
paired comparison [6], for which a smaller sd is to be
expected, could be preferable.
3. Examples
Two examples will illustrate the approach discussed. In the first example measurements are performed under (operator+instrument+time)-different intermediate precision conditions while in the second example only time-different intermediate precision conditions are considered.
3.1. Example 1: quantification of diazepam in
diazepam tablets (the example is fictitious)
3.1.1. Background
Method A is a HPLC method, method B is a UV
(second derivative) method for the quantification of
diazepam in diazepam tablets. A laboratory uses
method A but developed method B as an alternative.
The laboratory wants to compare the performance of
both methods. The results are expressed as percentage
of the labelled amount (%). For method A an estimate
of the precision ($s_r$ and $s_{OID}$) is available: $s_r = 1\%$, $s_{OID} = 2\%$.
3.1.2. Requirements
The acceptable bias $\lambda$ is 2%. The acceptable ratio of the standard deviations between the two methods, $\rho$, $\rho_{I(T)}$ or $\rho_{I(OIT)}$, is 2. The statistical tests are performed at the significance level $\alpha = 0.05$. The probability $\beta$ to wrongly adopt the method with an unacceptable performance is set at 0.2.
3.1.3. Experimental design
It is decided that the number of operators, instru-
ments and replicates per day for each method is 2 and
the number of days for both methods is equal ($p_A = p_B$).
From one batch of diazepam tablets, 300 tablets are
randomly taken. They are powdered and kept in a cool,
dry place, e.g. desiccator. Each day during pA days,
two replicates ($n_A = 2$) prepared from the powdered samples are analysed with method A on the first
instrument by the first operator. The analysis is
repeated independently in the same way during
another pA days but on the second instrument. The
second operator performs the procedures in the same
way as the first operator does. The experiments for
method B are designed in the same way as those for
method A.
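3.1.4. Determination of the minimum number of days
(1) For the detection of $\lambda$. Since $p_A = p_B$ and $n_A = n_B = 2$, Eq. (4) is used:
$$\lambda \ge (t_{\alpha/2} + t_{\beta})\sqrt{\frac{2\left(s^2_{OIDA} + s^2_{rA}/n_A\right)}{m_A q_A p_A}},$$
$$2 \ge (t_{\alpha/2} + t_{\beta})\sqrt{\frac{2\left(2^2 + 1^2/2\right)}{4 p_A}}.$$
With $p_A = 4$ and $\nu = [2(m_A q_A p_A) - 2] = [2(2 \times 2 \times 4) - 2] = 30$, $(t_{\alpha/2} + t_{\beta}) = 2.896$ and the right side of the equation above equals 2.172; with $p_A = 5$ and $\nu = [2(m_A q_A p_A) - 2] = [2(2 \times 2 \times 5) - 2] = 38$, $(t_{\alpha/2} + t_{\beta}) = 2.876$ and the right side of the equation above equals 1.929. Hence $p_A = p_B = 5$. (The use of a constant multiplication factor equal to 3 would yield $p_A = p_B = 6$.)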
(2) For the comparison of precision measures. From Table 5 it can be seen that $\rho$ (or $\rho_{I(T)}$ or $\rho_{I(OIT)}$) $= 2$ is given by $\nu_A = \nu_B = 14$.
To compare repeatability, $\nu_A = m_A q_A p_A$ and $\nu_B = m_B q_B p_B$; so $p_A = p_B = 14/4 = 3.5 \approx 4$.
To compare time-different intermediate precision, $\nu_A = m_A q_A (p_A - 1)$ and $\nu_B = m_B q_B (p_B - 1)$; so $p_A = p_B = (14/4) + 1 = 4.5 \approx 5$.
To compare (operator+instrument+day)-different intermediate precision, $\nu_A = m_A q_A p_A - 1$ and $\nu_B = m_B q_B p_B - 1$; so $p_A = p_B = 15/4 = 3.75 \approx 4$.
(3) Conclusion. The minimum number of days required for both methods (with two operators, two instruments and two measurements per day) is 5.
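The two calculations of this section can be reproduced with a short Python sketch. The function names `eq4_right_side` and `days_from_df` are hypothetical; the sums $(t_{\alpha/2} + t_{\beta})$ are the tabulated values quoted in the text and are passed in as plain numbers, and the repeatability case assumes $\nu = (n - 1) m q p$.

```python
import math


def eq4_right_side(t_sum, s2_oid, s2_r, n, m, q, p):
    """(t_{alpha/2} + t_beta) * sqrt(2 * (s2_OID + s2_r / n) / (m * q * p))."""
    return t_sum * math.sqrt(2 * (s2_oid + s2_r / n) / (m * q * p))


def days_from_df(nu, m, q, kind, n=2):
    """Smallest number of days p that gives at least nu degrees of freedom."""
    if kind == "repeatability":      # nu = (n - 1) * m * q * p
        return math.ceil(nu / ((n - 1) * m * q))
    if kind == "time":               # nu = m * q * (p - 1)
        return math.ceil(nu / (m * q) + 1)
    if kind == "oit":                # nu = m * q * p - 1
        return math.ceil((nu + 1) / (m * q))
    raise ValueError(f"unknown kind: {kind}")


# (1) Detection of lambda = 2 with s_OID = 2, s_r = 1, n = m = q = 2
rhs_p4 = eq4_right_side(2.896, 2 ** 2, 1 ** 2, 2, 2, 2, 4)
rhs_p5 = eq4_right_side(2.876, 2 ** 2, 1 ** 2, 2, 2, 2, 5)
print(round(rhs_p4, 3), round(rhs_p5, 3))

# (2) Precision comparison: nu_A = nu_B = 14 read from Table 5
days = [days_from_df(14, 2, 2, kind) for kind in ("repeatability", "time", "oit")]
print(days, "-> minimum number of days:", max(days))
```

Only $p = 5$ brings the right side of Eq. (4) below the acceptable bias of 2, and the largest of the precision requirements is also 5 days.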
3.1.5. The data
The data for methods A and B are summarized in
Tables 6 and 7, respectively.
3.1.6. Investigation of outliers
Grubbs' tests were applied to the day means [8]. No single or double stragglers or outliers were found for either method.
Table 6
Data obtained with method A (example 1)
Operator Instrument 1 Instrument 2
Day $y_{i1k1}$ $y_{i1k2}$ $\bar{y}_{i1k}$ $\bar{y}_{i1}$ Day $y_{i2k1}$ $y_{i2k2}$ $\bar{y}_{i2k}$ $\bar{y}_{i2}$
1 1 97.13 98.81 97.970 97.888 1 98.98 99.52 99.250 99.253
2 101.23 100.68 100.955 2 102.84 100.93 101.885
3 97.13 96.63 96.880 3 99.65 99.29 99.470
4 97.17 95.82 96.495 4 98.67 98.46 98.565
5 96.82 97.46 97.140 5 97.08 97.11 97.095
2 1 100.71 99.37 100.040 100.208 1 101.55 104.04 102.795 102.211
2 101.26 103.78 102.520 2 99.66 98.70 99.180
3 98.49 100.87 99.680 3 102.54 100.60 101.570
4 97.06 98.92 97.990 4 104.95 102.61 103.780
5 101.85 99.77 100.810 5 103.10 104.36 103.730
Grand mean = 99.890
3.1.7. Calculation of the variance estimates
Tables 8 and 9 summarize the calculation of the
variance estimates for methods A and B, respectively.
3.1.8. Comparison of precision
3.1.8.1. Repeatability. The repeatabilities are compared according to Eq. (11):
$$F_r = \frac{1.4810}{1.2317} = 1.20.$$
This is to be compared with $F_{0.05}(20,20) = 2.12$. Since $F_r$ […]
-
of replicates per day for both methods is equal ($n_A = n_B$), the comparison of time-different intermediate precision is performed according to Eq. (13):
$$F_{I(T)} = \frac{MS_{DB}}{MS_{DA}} = \frac{9.7042}{6.3345} = 1.53.$$
This is to be compared with $F_{0.05}(16,16) = 2.33$. Since $F_{I(T)}$ […]
-
This is to be compared with the acceptable bias $\lambda = 2$. Since $UCL > 2$ the bias of method B is not acceptable. Notice that in this approach the probability of accepting a method that is too much biased is controlled at 5%.
3.2. Example 2: determination of moisture in cheese
(the example is fictitious)
3.2.1. Background
Method A is a Karl Fischer method, method B is a
vacuum oven method for the determination of moist-
ure in cheese. A laboratory uses method A but devel-
oped method B as an alternative. The laboratory wants
to compare the performance of both methods. The
results are expressed as percentage moisture. For
method A an estimate of the precision ($s_r^2$ and $s_D^2$) is available: $s_r^2 = 0.023$; $s_D^2 = 0.08$.
3.2.2. Requirements
The acceptable bias $\lambda$ is 0.50%. The acceptable ratio of the standard deviations between the two methods, $\rho$ or $\rho_{I(T)}$, is 3. The statistical tests are performed at the significance level $\alpha = 0.05$. The probability $\beta$ to wrongly adopt the method with an unacceptable performance is set at 0.05.
3.2.3. Experimental design
The material is a cheese, analysed with both
methods.
It is decided that the number of replicates per day for each method is two and the number of days for both methods is equal ($p_A = p_B$). Each day during $p_A$ days, two independent samples ($n_A = 2$) from the cheese are analysed with method A by the same operator using the same instrument. Each day during $p_B$ days, two independent samples ($n_B = 2$) from the cheese are analysed with method B by the same operator using the same instrument.
3.2.4. Determination of the minimum number of days
(1) For the detection of $\lambda$. Since $p_A = p_B$ and $n_A = n_B = 2$, Eq. (4) is used. Since in the comparison only time-different intermediate precision conditions are considered, the number of operators $m_A = m_B = 1$ and the number of instruments $q_A = q_B = 1$. Consequently, as can be derived from Table 3, $MS_{OID}$ and $s^2_{OID}$ are equal to $MS_D$ and $s^2_D$, respectively.
$$0.5 \ge (t_{\alpha/2} + t_{\beta})\sqrt{\frac{2\left(s^2_{DA} + s^2_{rA}/n_A\right)}{p_A}},$$
$$0.5 \ge (t_{\alpha/2} + t_{\beta})\sqrt{\frac{2\left(0.08 + 0.023/2\right)}{p_A}}.$$
With $p_A = 10$ and $\nu = [2(m_A q_A p_A) - 2] = [2(1 \times 1 \times 10) - 2] = 18$, $(t_{\alpha/2} + t_{\beta}) = 3.835$ and the right side of the equation above equals 0.519; with $p_A = 11$ and $\nu = [2(m_A q_A p_A) - 2] = [2(1 \times 1 \times 11) - 2] = 20$, $(t_{\alpha/2} + t_{\beta}) = 3.811$ and the right side of the equation above equals 0.492. Hence $p_A = p_B = 11$. (The use of a constant multiplication factor equal to 4 would yield $p_A = p_B = 12$.)
(2) For the comparison of precision. From Table 4 it can be seen that $\rho = 3$ or $\rho_{I(T)} = 3$ is given by $\nu_A = \nu_B = 10$.
To compare repeatability standard deviations, $\nu_A = m_A q_A p_A$ and $\nu_B = m_B q_B p_B$; so $p_A = p_B = 10$.
To compare between-day mean squares, $\nu_A = m_A q_A (p_A - 1)$ and $\nu_B = m_B q_B (p_B - 1)$; so $p_A = p_B = 10 + 1 = 11$.
(3) Conclusion. The minimum number of days required (with two measurements per day) is 11.
3.2.5. The data
The data for methods A and B are summarized in
Table 10.
3.2.6. Investigation of outliers
Grubbs' tests were applied to the day means [8]. No single or double stragglers or outliers were found for method A. For method B the single Grubbs' test applied on the mean of day 9 is significant at the 5% level but not at the 1% level. Indeed
$$G = \frac{39.845 - 39.486}{0.1411} = 2.544,$$
which is to be compared with Grubbs' critical values for $p = 11$ at 5% (2.355) and 1% (2.564). Therefore, since this observation is considered as a straggler it is retained but indicated with "a" in Table 10.
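The single Grubbs' statistic can be recomputed directly from the replicate results of method B in Table 10. The sketch below is an illustration only; the variable names are hypothetical, the day means are recomputed from the two replicates of each day, and the critical values 2.355 and 2.564 are the ones quoted in the text.

```python
from statistics import mean, stdev

# Method B replicates (y_11k1, y_11k2) for days 1 to 11, taken from Table 10
replicates_b = [
    (39.29, 39.36), (39.51, 39.38), (39.45, 39.49), (39.59, 39.51),
    (39.41, 39.41), (39.45, 39.54), (39.55, 39.55), (39.29, 39.36),
    (39.82, 39.87), (39.44, 39.45), (39.45, 39.53),
]

day_means = [mean(pair) for pair in replicates_b]
grand_mean = mean(day_means)

# Single Grubbs' statistic for the largest day mean (day 9)
g = (max(day_means) - grand_mean) / stdev(day_means)

crit_5, crit_1 = 2.355, 2.564      # critical values for p = 11 quoted in the text
is_straggler = crit_5 < g <= crit_1
print(f"grand mean = {grand_mean:.3f}, G = {g:.2f}, straggler: {is_straggler}")
```

Because the sketch works with unrounded intermediate values, $G$ differs slightly in the third decimal from the 2.544 obtained with the rounded inputs above, but it falls between the same two critical values.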
3.2.7. Calculation of the variance estimates
Tables 11 and 12 summarize the calculation of the
variance estimates for methods A and B, respectively.
Table 10
Data for example 2
Day Method A Method B
$y_{11k1}$ $y_{11k2}$ $\bar{y}_{11k}$ $y_{11k1}$ $y_{11k2}$ $\bar{y}_{11k}$
1 39.68 39.77 39.725 39.29 39.36 39.325
2 39.08 39.38 39.230 39.51 39.38 39.445
3 40.39 40.33 40.360 39.45 39.49 39.470
4 39.87 39.98 39.925 39.59 39.51 39.550
5 39.70 39.95 39.825 39.41 39.41 39.410
6 39.93 39.95 39.940 39.45 39.54 39.495
7 39.78 39.97 39.875 39.55 39.55 39.550
8 39.92 40.20 40.060 39.29 39.36 39.325
9 40.34 39.89 40.115 39.82 39.87 39.845 (a)
10 40.12 40.26 40.190 39.44 39.45 39.445
11 39.43 39.54 39.485 39.45 39.53 39.490
Grand mean = 39.884 Grand mean = 39.486
(a) Straggler.
Table 11
Calculation of the variance estimates for method A (example 2) (ANOVA table)
Source Mean squares Estimate of
Day $MS_D = 0.2050$ $\sigma^2_{rA} + n_A \sigma^2_{DA}$
Residual $MS_E = 0.0239$ $\sigma^2_{rA}$
Calculation of the variance estimates
The repeatability variance: $s^2_{rA} = 0.0239$; $\nu = 11$
The between-day variance component: $s^2_{DA} = (0.2050 - 0.0239)/2 = 0.0906$
Time-different intermediate precision (variance): $s^2_{I(T)A} = s^2_{DA} + s^2_{rA} = 0.1145$
Variance of the day means $\bar{y}_{ijk}$: $s^2_{\bar{y}_{ijk}A} = s^2_{DA} + s^2_{rA}/n_A = 0.1025$; $\nu = 10$
Table 12
Calculation of the variance estimates for method B (example 2) (ANOVA table)
Source Mean squares Estimate of
Day $MS_D = 0.0397$ $\sigma^2_{rB} + n_B \sigma^2_{DB}$
Residual $MS_E = 0.0024$ $\sigma^2_{rB}$
Calculation of the variance estimates
The repeatability variance: $s^2_{rB} = 0.0024$; $\nu = 11$
The between-day variance component: $s^2_{DB} = (0.0397 - 0.0024)/2 = 0.0187$
Time-different intermediate precision (variance): $s^2_{I(T)B} = s^2_{DB} + s^2_{rB} = 0.0211$
Variance of the day means $\bar{y}_{ijk}$: $s^2_{\bar{y}_{ijk}B} = s^2_{DB} + s^2_{rB}/n_B = 0.0199$; $\nu = 10$
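The entries of Tables 11 and 12 can be recomputed from the replicate data of Table 10. The sketch below assumes a balanced one-factor (day) nested design with the same number of replicates every day; the function name `day_nested_anova` and the dictionary keys are hypothetical.

```python
from statistics import mean, variance


def day_nested_anova(replicates):
    """Variance estimates for a balanced design of days with n replicates each."""
    n = len(replicates[0])
    day_means = [mean(day) for day in replicates]
    ms_d = n * variance(day_means)                      # day mean square
    ms_e = mean([variance(day) for day in replicates])  # pooled within-day variance
    s2_d = (ms_d - ms_e) / n                            # between-day component
    return {
        "MS_D": ms_d,
        "MS_E": ms_e,
        "s2_r": ms_e,
        "s2_D": s2_d,
        "s2_IT": s2_d + ms_e,
        "s2_day_means": ms_d / n,
    }


method_a = [
    (39.68, 39.77), (39.08, 39.38), (40.39, 40.33), (39.87, 39.98),
    (39.70, 39.95), (39.93, 39.95), (39.78, 39.97), (39.92, 40.20),
    (40.34, 39.89), (40.12, 40.26), (39.43, 39.54),
]
method_b = [
    (39.29, 39.36), (39.51, 39.38), (39.45, 39.49), (39.59, 39.51),
    (39.41, 39.41), (39.45, 39.54), (39.55, 39.55), (39.29, 39.36),
    (39.82, 39.87), (39.44, 39.45), (39.45, 39.53),
]

anova_a = day_nested_anova(method_a)
anova_b = day_nested_anova(method_b)
for name, table in (("A", anova_a), ("B", anova_b)):
    print(name, {key: round(value, 4) for key, value in table.items()})
```

The results agree with the tabulated values to within rounding of the last reported digit.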
3.2.8. Comparison of precision
3.2.8.1. Repeatability. The repeatabilities are compared according to Eq. (11):
$$F_r = \frac{0.0024}{0.0239} = 0.10.$$
This is to be compared with $F_{0.05}(11,11) = 2.82$. Since $F_r$ […] $> 3.47$ there is evidence that the repeatabilities of both methods are different (in fact the repeatability for method B is better than for method A).
(ii) Since the repeatabilities of both methods are different ($\sigma^2_{rA} \ne \sigma^2_{rB}$), the comparison of time-different intermediate precision is performed according to Eq. (14):
$$F_{I(T)} = \frac{s^2_{I(T)B}}{s^2_{I(T)A}} = \frac{0.0211}{0.1145} = 0.18.$$
The number of degrees of freedom associated with $s^2_{I(T)}$ for both methods is obtained from the Satterthwaite approximation (Eqs. (15) and (16)):
$$\nu_{I(T)B} = \frac{(0.0211)^2}{(0.0397/2)^2/10 + (0.0024/2)^2/11} \approx 11,$$
$$\nu_{I(T)A} = \frac{(0.1145)^2}{(0.2050/2)^2/10 + (0.0239/2)^2/11} \approx 12.$$
$F_{I(T)}$ is to be compared with $F_{0.05}(11,12) = 2.72$. Since $F_{I(T)}$ […] $> 3.72$ there is evidence that the variances of the day means obtained with the two methods are different. Therefore, the standard deviation $s_d$ and its associated degrees of freedom $\nu_d$ are calculated as follows (see Eqs. (25) and (26)):
$$s_d = \sqrt{\frac{0.1025}{11} + \frac{0.0199}{11}} = 0.1055,$$
$$\nu_d = \frac{(0.0111)^2}{(0.1025/11)^2/10 + (0.0199/11)^2/10} \approx 13.$$
The test statistic $t_{cal}$ is obtained as given in Eq. (21):
$$t_{cal} = \frac{|\bar{y}_A - \bar{y}_B|}{s_d} = \frac{|39.884 - 39.486|}{0.1055} = 3.77.$$
This is to be compared with $t_{0.025;13} = 2.16$. Since $t_{cal} > 2.16$ the difference between the grand means of the two methods is statistically significant at $\alpha = 0.05$.
To further evaluate whether the difference found can be acceptable, the UCL is calculated according to Eq. (27):
$$UCL = |39.884 - 39.486| + 0.1055\, t_{\beta,13} = 0.398 + 0.1055\, t_{0.05;13} = 0.398 + (0.1055 \times 1.771) = 0.585.$$
Since the UCL is larger than the acceptable bias $\lambda = 0.50$, there is evidence that the difference between the means of the two methods is unacceptable.
(If the probability $\beta$ is allowed to be 0.2, the UCL would be $(0.398 + (0.1055 \times 0.870)) = 0.490$ which is smaller than $\lambda = 0.50$ and the bias would be acceptable.)
In the interval hypothesis testing approach, the one-sided 95% upper confidence limit (UCL) around the absolute difference between the grand means is calculated as follows (Eq. (28)):
$$UCL = |39.884 - 39.486| + 0.1055\, t_{\alpha,13} = |39.884 - 39.486| + 0.1055\, t_{0.05;13} = 0.398 + (0.1055 \times 1.771) = 0.585.$$
This is to be compared with the acceptable bias $\lambda = 0.5$. Since $UCL > 0.5$ the bias of method B is not
acceptable. Notice that in this example, the interval hypothesis leads to the same conclusion as the point hypothesis (with the inclusion of the $\beta$-error consideration) does. This is due to the fact that with the point hypothesis testing, a significant difference between the means of both methods is detected and that in the further evaluation whether the difference found can be considered acceptable, the probability $\beta$ considered is equal to the probability $\alpha$ in the interval hypothesis testing.
4. Conclusion
An approach for the comparison of an alternative
method to a reference method has been proposed
for the intralaboratory situation. Instead of the repro-
ducibility (as included in the ISO guidelines), the
(operator+instrument+time)-different intermediate precision is considered in the comparison. The proposal includes:
1. the experimental design (i.e. the determination of
the number of measurements required to perform
the comparison),
2. the estimation of different precision parameters
and the comparison of these precision measures
for both methods, and
3. the statistical approach for the evaluation of the
bias in which the interval hypothesis testing has
also been proposed as an alternative.
The comparison of the bias and precision described
in this article is performed at a single concentration
level. If the alternative method is intended for use over
a rather broad concentration range, the comparison
should be performed at more concentration levels (e.g.
low, middle and high). Due to the problem with
multiple comparison [5], the present approach is not
recommended if the methods are to be compared at
more than three levels. For trace analysis, an evalua-
tion of the detection and quantification limit should
also be performed.
The proposal is an optimal approach in the sense
that it is based on sample size calculations. The
number of measurements to be performed is such
that there is a high probability ($1-\beta$) that an alternative method with an unacceptable performance will
not be adopted. This of course is of utmost importance
but might require a number of measurements that the
laboratory is not able (or not willing) to perform
because of time and cost involved. If this is the case,
an alternative approach based on a number of mea-
surements that in practice is feasible, is required. Two
approaches can be conceived. The first is to perform
the comparison, based on a user-defined number of
measurements, in the classical way using point
hypothesis testing and to evaluate the $\beta$-error. In this way the laboratory would at least have an idea of the
probability that an alternative method with an unac-
ceptable performance has been accepted and thus of
the risk that is run that the method will not perform as
expected during routine use of the method.
Another approach is to control the probability that a
method with unacceptable performance characteris-
tics will be adopted by using interval hypothesis
testing. The latter was already included here as an
alternative for the evaluation of the bias but can also be
considered in the comparison of precision measures.
After it was proposed for the evaluation of the bias in
method validation studies by Hartmann et al. [2],
interval hypothesis testing has been considered by
the SFSTP in a guideline for the validation of bioa-
nalytical methods [14].
Acknowledgements
This work has received financial support from the
European Commission (Standards, Measurements and
Testing Programme Contract SMT4-CT95-2031) and
the Belgian government (The Prime Minister Services
Federal Office for Scientific, Technical and Cultural
Affairs, Standardisation Programme Research Con-
tract no/03/003).
References
[1] International Standard, Accuracy (Trueness and Precision) of
Measurement methods and results, ISO 5725-6, Geneva,
1994.
[2] C. Hartmann, J. Smeyers-Verbeke, W. Penninckx, Y. Vander
Heyden, P. Vankeerberghen, D.L. Massart, Anal. Chem. 67
(1995) 4491.
[3] International Standard, Accuracy (Trueness and Precision) of
Measurement methods and results, ISO 5725-3, Geneva,
1994.
[4] International Standard, Statistics (Vocabulary and symbols):
Design of experiments, ISO 3534-3, Geneva, 1985.
[5] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De
Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics
and Qualimetrics: Part A, Elsevier, Amsterdam, 1997.
[6] D.C. Montgomery, Design and Analysis of Experiments, 4th
ed., Wiley, New York, 1997.
[7] J. Mandel, The Statistical Analysis of Experimental Data,
Dover, New York, 1964, p. 359.
[8] International Standard, Accuracy (Trueness and Precision) of
Measurement methods and results, ISO 5725-2, Geneva,
1994.
[9] W. Gerisch, D. Abraham, Comput. Stat. Quarterly 4 (1989)
299.
[10] D.J. Schuirmann, J. Pharmacokinet. Biopharm. 15 (1987)
657.
[11] F.E. Satterthwaite, Biometrics Bull. 2 (1946) 110.
[12] G.T. Wernimont, Use of Statistics to Develop and Evaluate
Analytical Methods, in: W. Spendley (Ed.), AOAC, Arlington,
VA, 1985, p. 39.
[13] G.W. Snedecor, W.G. Cochran, Statistical Methods, 7th ed.,
The Iowa State University Press, Ames, Iowa, 1982, p. 96.
[14] E. Chapuzet, N. Mercier (Presidents), S. Bervoas-Martin, B.
Boulanger, P. Chevalier, P. Chiap, D. Grandjean, P. Hubert, P.
Lagorce, M. Lallier, M.C. Laparra, M. Laurentie, J.C. Nivet,
S.T.P. Pharma Prat. 7 (3) (1997) 169.