Comparison of alternative measurement methods

Siriporn Kuttatharmmakul, D. Luc Massart, Johanna Smeyers-Verbeke*

ChemoAC, Pharmaceutical Institute, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090, Brussel, Belgium

Received 19 February 1998; received in revised form 14 July 1998; accepted 14 July 1998

Abstract

A procedure to compare the performance (precision and bias) of an alternative measurement method and a reference method is described in detail. It is based on ISO 5725-6, adapted to the intralaboratory situation. This means that the proposed approach does not evaluate the reproducibility, but considers the (operator + instrument + time)-different intermediate precision and/or the time-different intermediate precision. A 4-factor nested design is used for the study. The different variance estimates are calculated from the experimental data by ANOVA. The Satterthwaite approximation is included to determine the number of degrees of freedom associated with the compound variances. Formulae are given to determine the number of measurements required for the comparison, taking into account the acceptable bias, the acceptable ratio between the precision parameters of the two methods, the significance level and the probability of wrongly accepting an alternative method with an unacceptable performance. For the evaluation of the bias, interval hypothesis testing is included as an alternative to point hypothesis testing. Two examples are given as an illustration of the proposed approach. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Comparison; Alternative measurement method; Bias; Precision; Repeatability; Time-different intermediate precision; (Operator + instrument + time)-different intermediate precision; Nested design; ANOVA; Satterthwaite approximation; Interval hypothesis testing

1. Introduction

When a laboratory wants to replace an existing analytical method by a new method (e.g. because the latter is cheaper or easier to use), it has to show that the new method performs at least as well as the existing one. A comparison of the performance (precision and bias) of both methods therefore has to be performed. One of the most advanced guidelines for the comparison of two methods can be found in ISO 5725-6 [1]. However, the ISO guideline is based on interlaboratory studies and is therefore not applicable in the intralaboratory situation. Indeed, within a single laboratory the reproducibility, as evaluated by ISO, cannot be determined, but intermediate precision conditions, such as changes in operator, equipment and time, should be considered since they contribute to the variability of measurements performed in the laboratory.

In the ISO guideline the reference method is an international standard method that was studied in an interlaboratory test program and its precision ($\sigma^2$) is assumed to be known. This assumption is reasonable since the precision is obtained from a large number of measurements. In the intralaboratory situation a laboratory has developed a first method and later on wishes to compare a new method to the older, already internally validated method. For the latter, referred to as the reference method, only an estimate of the precision ($s^2$) will be available since the precision is determined from a rather limited number of measurements. This of course determines the statistical tests to be used in the comparison of the performance characteristics of both methods.

Analytica Chimica Acta 391 (1999) 203–225

*Corresponding author. Tel.: +32-2477-4737; fax: +32-2477-4735; e-mail: [email protected]

0003-2670/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0003-2670(99)00115-4

Moreover, the ISO standard is meant to show that both methods have similar precision and/or trueness, whereas a laboratory that performs a method comparison study is interested in evaluating whether the new method is at least as good as the reference method. This implies that some two-sided statistical tests included in the ISO guideline are not appropriate for the comparison of two methods in a single laboratory, where, for example, one-sided tests have to be considered in the evaluation of the precision.

In the decision making concerning the new alternative method it is important (i) not to reject an alternative method which in fact is appropriate, and (ii) not to accept an alternative method which in fact is not appropriate. The former is related to the $\alpha$-error of the statistical tests used in the comparison and is controlled through the selection of the significance level. The latter is related to the $\beta$-error and, when it is considered, it is generally taken into account by including sample size calculations. This approach is also included in the ISO guideline.

In this article we propose an adaptation of the ISO guideline to the intralaboratory comparison of two methods. It is also applicable to the situation in which two laboratories of, e.g., the same organisation are involved, each laboratory being specialized in one of the methods. For the evaluation of the bias, in addition to point hypothesis testing, interval hypothesis testing [2], in which the probability of accepting a method that is too biased is controlled, is also included.

Due to the specified acceptance criteria for the alternative method, the proposed approach might lead to a large number of measurements to be performed. An alternative approach (which will be described in a subsequent article) is to perform the method comparison with a user-defined number of measurements and to evaluate the probability that a method with an unacceptable performance will be accepted.

2. Methods

All symbols and abbreviations used in this paper are

defined in Table 1.

2.1. Experimental design

A 4-factor nested experimental design is used [3–7]. This design is also one of the designs recommended by ISO [3]. The schematic layout of the design is given in Fig. 1. The four factors represent four sources of variation that contribute to the variability of the measurements within one laboratory. The factors considered are operator, instrument, time, and random error. The experimental approach can be described as follows. For each analytical method, the sample is analysed by m operators. Each operator performs, on each of q instruments, n replicate measurements on each of p different days. To avoid an underestimation of the day effect, the sets of p days during which the measurements are performed on each of the q instruments must be different, i.e. two instruments cannot be operated on the same day.
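The layout described above can be written out explicitly. The following is a minimal sketch, not part of the original procedure; the function name `nested_design` and the tuple layout are assumptions made for illustration. It lists one record per measurement and confirms that a balanced design contains N = m·q·p·n measurements per method.

```python
from itertools import product


def nested_design(m, q, p, n):
    """Return one (operator, instrument, day, replicate) tuple per measurement.

    Days are nested within each operator/instrument cell: 'day 1' on
    instrument 1 is meant to be a different calendar day than 'day 1' on
    instrument 2, as the text requires.
    """
    runs = []
    for operator, instrument in product(range(1, m + 1), range(1, q + 1)):
        for day in range(1, p + 1):
            for replicate in range(1, n + 1):
                runs.append((operator, instrument, day, replicate))
    return runs


layout = nested_design(m=2, q=2, p=4, n=2)
print(len(layout))  # total number of measurements N = m * q * p * n
```

With two operators, two instruments, four days and duplicate measurements, this gives 32 measurements per method.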

Fig. 1. Schematic layout for the 4-factor nested experimental design applied. Only the nested structure under the ith operator, jth instrument and the kth day is shown here. The nested structure under other operators, instruments and days has the same pattern (instru = instrument, rep = replicate).


Table 1

Definition of symbols and abbreviations applied in the document

| Symbol | Definition |
|---|---|
| $d$ | Absolute difference between the grand means obtained with two methods |
| $D$ | Component of day effect in a test result |
| $E$ | Random error component occurring in every test result |
| $F_{I(OIT)}$ | Calculated F-value obtained from the comparison of (operator + instrument + time)-different intermediate precision (variance) |
| $F_{I(T)}$ | Calculated F-value obtained from the comparison of time-different intermediate precision (variance) |
| $F_r$ | Calculated F-value obtained from the comparison of repeatability variance |
| $F_{\alpha(\nu_B,\nu_A)}$ | Value of the F-distribution with $\nu_B$ degrees of freedom associated with the numerator and $\nu_A$ degrees of freedom associated with the denominator; $\alpha$ represents the portion of the F-distribution to the right of the given F-value |
| $F_{\beta(\nu_A,\nu_B)}$ | Value of the F-distribution with $\nu_A$ degrees of freedom associated with the numerator and $\nu_B$ degrees of freedom associated with the denominator; $\beta$ represents the portion of the F-distribution to the right of the given F-value |
| $I$ | Component of instrumental effect in a test result |
| $m$ | Number of operators |
| $M$ | General mean (expectation) of the test results |
| MS | Mean squares |
| $n$ | Number of replicates performed on each day |
| $\bar{n}$ | Average number of replicates performed on each day |
| $N$ | Total number of measurements |
| $O$ | Component of operator effect in a test result |
| $p$ | Number of days |
| $q$ | Number of instruments |
| $s$ | Estimate of $\sigma$ |
| $s^2$ | Estimate of $\sigma^2$ |
| $t_{cal}$ | Calculated t-value obtained from the comparison of the means obtained with two methods |
| $t_{\alpha/2}$ | Two-sided tabulated t-value at significance level $\alpha$ and $\nu$ degrees of freedom |
| $t_{\beta}$ | One-sided tabulated t-value at significance level $\beta$ and $\nu$ degrees of freedom |
| UCL | Upper confidence limit |
| $y$ | Test result |
| $\bar{\bar{y}}$ | Grand mean of test results |
| $\bar{y}_i$ | Arithmetic mean of the test results obtained from the ith operator |
| $\bar{y}_{ij}$ | Arithmetic mean of the test results obtained from the ith operator and the jth instrument |
| $\bar{y}_{ijk}$ | Arithmetic mean of the test results obtained from the ith operator, the jth instrument and the kth day |
| $y_{ijkL}$ | Particular test result related to the Lth replicate of the kth day, the jth instrument and the ith operator |
| $z_{\alpha/2}$ | Two-sided tabulated z-value of the standard normal distribution at significance level $\alpha$ |
| $\alpha$ | Significance level (type I error probability) |
| $\beta$ | Type II error probability |
| $\lambda$ | Detectable difference between the means obtained from the two methods |
| $\nu$ | Number of degrees of freedom |
| $\rho$ | Detectable ratio between the repeatability standard deviations of method B and method A |
| $\sigma$ | True value of a standard deviation |
| $\sigma^2$ | True value of variance |
| $\varphi_{I(OIT)}$ | Detectable ratio between the square roots of the (operator + instrument + day) mean squares (or the (operator + instrument + time)-different intermediate precision (standard deviation)) of method B and method A |
| $\varphi_{I(T)}$ | Detectable ratio between the square roots of the between-day mean squares (or the time-different intermediate precision (standard deviation)) of method B and method A |

Symbols used as superscripts and subscripts

| Symbol | Definition |
|---|---|
| A | Method A |
| B | Method B |
| d | Difference between the grand means obtained with two methods |
| D | Between-day |
| E | Residual |
| I | Between-instrument |
| I(T) | Time-different intermediate precision |
| I(OIT) | (Operator + instrument + time)-different intermediate precision |
| i | Index for a particular operator |


2.2. Basic statistical model

To understand the following statistical approach, it

is necessary to briefly explain the basic statistical

model. More details can be found in [3].

Here, we assume that every test result y obtained with a particular analytical method is the sum of five components

$$y = M + O + I + D + E, \qquad (1)$$

where M is the general mean (expectation) of the test results, O the random effect caused by changing the operator, I the random effect caused by changing the instrument, D the random effect caused by the fact that measurements are performed on different days, and E the random error occurring in every measurement under repeatability conditions.

These four factors (operator, instrument, time, and random error under repeatability conditions) are selected for our approach, since they are the main sources that contribute to the variability of the measurements within a laboratory. The precision of the method is then determined by the contribution from the variance ($\sigma^2$) of each factor, i.e. $\sigma_O^2$, $\sigma_I^2$, $\sigma_D^2$ and $\sigma_E^2$, which are estimated as $s_O^2$, $s_I^2$, $s_D^2$ and $s_E^2$, respectively. Since it can be assumed that these estimated variance components are not related, the estimate of the overall precision parameter, also called the (operator + instrument + time)-different intermediate precision ($s_{I(OIT)}^2$), can be obtained as the sum of all variance components: $s_O^2 + s_I^2 + s_D^2 + s_E^2$. It is an estimate of the variance of an individual measurement made by an arbitrary operator on an arbitrary instrument. When the analyses in the laboratory are performed by the same operator on a single instrument, the overall precision corresponds to the time-different intermediate precision, which is obtained as $s_{I(T)}^2 = s_D^2 + s_E^2$. The intermediate precision is useful for indicating the ability of the analytical method to repeat the test result under the defined conditions.
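To make the additive model of Eq. (1) concrete, the sketch below simulates test results and combines variance components into the two intermediate precisions. It is an illustration only: the function names, the normal distribution of the effects, and the choice to draw a separate instrument effect under each operator are assumptions, not statements from the paper.

```python
import random


def simulate_results(m, q, p, n, mean, sd_O, sd_I, sd_D, sd_E, seed=0):
    """Simulate y = M + O + I + D + E for a balanced nested design.

    Returns {(operator, instrument, day): [n replicate results]}.
    All effects are drawn as independent normal deviates (an assumption).
    """
    rng = random.Random(seed)
    results = {}
    for i in range(m):
        O = rng.gauss(0.0, sd_O)              # operator effect
        for j in range(q):
            I = rng.gauss(0.0, sd_I)          # instrument effect (nested in operator)
            for k in range(p):
                D = rng.gauss(0.0, sd_D)      # day effect
                results[(i, j, k)] = [
                    mean + O + I + D + rng.gauss(0.0, sd_E)  # E: repeatability error
                    for _ in range(n)
                ]
    return results


def intermediate_precision(var_O, var_I, var_D, var_E):
    """Combine variance components into the two intermediate precisions."""
    return {
        "I(T)": var_D + var_E,                    # same operator, same instrument
        "I(OIT)": var_O + var_I + var_D + var_E,  # arbitrary operator and instrument
    }


print(intermediate_precision(0.5, 0.25, 1.0, 2.0))
```

The second function simply restates the two sums given in the text; the simulation is useful mainly for generating test data for the ANOVA calculations of Section 2.3.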

2.3. Calculation of the variance estimates

In analogy with the ISO guidelines [3], the different variance estimates are calculated by ANOVA (see Table 2). If the numbers of replicates per day ($n_{ijk}$), as well as the numbers of instruments used by each operator ($q_i$), are equal for all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, q$ and $k = 1, 2, \ldots, p_{ij}$, the calculation is simplified as shown in Table 3. The number of days ($p_{ij}$) might not be constant for different operators and instruments if the detection of outlying day means $\bar{y}_{ijk}$ leads to the rejection of some data. If, however, $p_{ij}$ is equal for all i and j, then the terms $\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}$ and $\sum_{i=1}^{m}\sum_{j=1}^{q} (p_{ij} - 1)$ which appear in Table 3 are simply replaced by $mqp$ and $mq(p-1)$, respectively. Throughout the rest of the text the calculations as represented in Table 3 will be considered.

No calculation is given for the individual variance component for operators ($s_O^2$) and for the individual variance component for instruments ($s_I^2$) in the ANOVA tables. Since the number of operators and instruments within a single laboratory is generally limited, a small value for the degrees of freedom associated with the variance components $s_O^2$ and $s_I^2$ is to be expected. Consequently, poor estimates for $s_O^2$ and $s_I^2$ will be obtained. Therefore, besides the time-different intermediate precision $s_{I(T)}^2$, the (operator + instrument + time)-different intermediate precision $s_{I(OIT)}^2$ is estimated as shown in Table 3. This estimate includes the calculation of $MS_{OID}$, which is obtained from the sum of squared differences between

Table 1 (Continued)

| Symbol | Definition |
|---|---|
| j | Index for a particular instrument |
| k | Index for a particular day |
| L | Index for a particular test result performed by the ith operator, on the jth instrument and kth day |
| m | Number of operators |
| $n_{ijk}$ | Number of replicates performed by the ith operator on the jth instrument and kth day |
| $p_{ij}$ | Number of days on which the ith operator used the jth instrument |
| $q_i$ | Number of instruments used by the ith operator |
| O | Between-operator |
| OID | (Operator + instrument + day) |
| r | Repeatability |


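Table 2

Calculation of the variance components (ANOVA table)

| Source | Mean squares | Estimate of |
|---|---|---|
| Operator + instrument + day | $MS_{OID} = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}\,(\bar{y}_{ijk} - \bar{\bar{y}})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij} - 1}$ | $\sigma_r^2 + \bar{n}\,\sigma_{OID}^2$ |
| Day | $MS_{D} = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}\,(\bar{y}_{ijk} - \bar{y}_{ij})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i} (p_{ij} - 1)}$ | $\sigma_r^2 + \bar{n}\,\sigma_{D}^2$ |
| Residual | $MS_{E} = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n_{ijk}} (y_{ijkL} - \bar{y}_{ijk})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} (n_{ijk} - 1)}$ | $\sigma_r^2$ |

with

$$\bar{y}_{ijk} = \frac{\sum_{L=1}^{n_{ijk}} y_{ijkL}}{n_{ijk}}, \qquad \bar{y}_{ij} = \frac{\sum_{k=1}^{p_{ij}} n_{ijk}\,\bar{y}_{ijk}}{\sum_{k=1}^{p_{ij}} n_{ijk}}, \qquad \bar{\bar{y}} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}\,\bar{y}_{ijk}}{N},$$

$$\bar{n} = \frac{N - \left(\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}^2\right)/N}{\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij} - 1}, \qquad N = \sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk} \ \text{(total number of measurements)}.$$

Calculation of the variance estimates

| Quantity | Estimate | Degrees of freedom |
|---|---|---|
| Repeatability variance | $s_r^2 = MS_E$ | $\nu_r = \sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} (n_{ijk} - 1)$ |
| Between-day variance component | $s_D^2 = \dfrac{MS_D - MS_E}{\bar{n}}$; if $s_D^2 < 0$, set $s_D^2 = 0$ | |
| (Operator + instrument + day) variance component | $s_{OID}^2 = \dfrac{MS_{OID} - MS_E}{\bar{n}}$; if $s_{OID}^2 < 0$, set $s_{OID}^2 = 0$ | |
| Time-different intermediate precision (variance) | $s_{I(T)}^2 = s_D^2 + s_r^2 = \dfrac{MS_D + (\bar{n} - 1)\,MS_E}{\bar{n}}$ | |
| (Operator + instrument + time)-different intermediate precision (variance) | $s_{I(OIT)}^2 = s_{OID}^2 + s_r^2 = \dfrac{MS_{OID} + (\bar{n} - 1)\,MS_E}{\bar{n}}$ | |
| Variance of the day means $\bar{y}_{ijk}$ | $s_{\bar{y}_{ijk}}^2 = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{\bar{y}})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij} - 1} \approx \dfrac{MS_{OID}}{\bar{n}} = s_{OID}^2 + \dfrac{s_r^2}{\bar{n}}$ | $\nu = \sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij} - 1$ |

$n_{ijk}$ is the number of replicates on the kth day performed on the jth instrument by the ith operator ($L = 1, 2, \ldots, n_{ijk}$); $p_{ij}$ the number of days on which the jth instrument was used by the ith operator ($k = 1, 2, \ldots, p_{ij}$); $q_i$ the number of instruments used by the ith operator ($j = 1, 2, \ldots, q_i$); m is the number of operators ($i = 1, 2, \ldots, m$).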


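Table 3

Calculation of the variance components in case of equal $n_{ijk}$ and equal $q_i$ for all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, q$ and $k = 1, 2, \ldots, p_{ij}$. Only $p_{ij}$ may be unequal for different operators and instruments due to possible rejection of some discordant data (ANOVA table)

| Source | Mean squares | Estimate of |
|---|---|---|
| Operator + instrument + day | $MS_{OID} = \dfrac{n\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{\bar{y}})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij} - 1}$ | $\sigma_r^2 + n\,\sigma_{OID}^2$ |
| Day | $MS_{D} = \dfrac{n\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y}_{ij})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q} (p_{ij} - 1)}$ | $\sigma_r^2 + n\,\sigma_{D}^2$ |
| Residual | $MS_{E} = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n} (y_{ijkL} - \bar{y}_{ijk})^2}{(n - 1)\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}}$ | $\sigma_r^2$ |

with

$$\bar{y}_{ijk} = \frac{\sum_{L=1}^{n} y_{ijkL}}{n}, \qquad \bar{y}_{ij} = \frac{\sum_{k=1}^{p_{ij}} \bar{y}_{ijk}}{p_{ij}}, \qquad \bar{\bar{y}} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n} y_{ijkL}}{n\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}}.$$

Calculation of the variance estimates

| Quantity | Estimate | Degrees of freedom |
|---|---|---|
| Repeatability variance | $s_r^2 = MS_E$ | $\nu_r = (n - 1)\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}$ |
| Between-day variance component | $s_D^2 = \dfrac{MS_D - MS_E}{n}$; if $s_D^2 < 0$, set $s_D^2 = 0$ | |
| (Operator + instrument + day) variance component | $s_{OID}^2 = \dfrac{MS_{OID} - MS_E}{n}$; if $s_{OID}^2 < 0$, set $s_{OID}^2 = 0$ | |
| Time-different intermediate precision (variance) | $s_{I(T)}^2 = s_D^2 + s_r^2 = \dfrac{MS_D + (n - 1)\,MS_E}{n}$ | |
| (Operator + instrument + time)-different intermediate precision (variance) | $s_{I(OIT)}^2 = s_{OID}^2 + s_r^2 = \dfrac{MS_{OID} + (n - 1)\,MS_E}{n}$ | |
| Variance of the day means $\bar{y}_{ijk}$ | $s_{\bar{y}_{ijk}}^2 = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{\bar{y}})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij} - 1} = \dfrac{MS_{OID}}{n} = s_{OID}^2 + \dfrac{s_r^2}{n}$ | $\nu = \sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij} - 1$ |

n is the number of replicates ($L = 1, 2, \ldots, n$); $p_{ij}$ the number of days on which the jth instrument was used by the ith operator ($k = 1, 2, \ldots, p_{ij}$); q the number of instruments ($j = 1, 2, \ldots, q$); m is the number of operators ($i = 1, 2, \ldots, m$).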


the day means $\bar{y}_{ijk}$ and the grand mean $\bar{\bar{y}}$. This might result in an underestimation of the effects of the instrument and the operator since those parameters are not changed for every $\bar{y}_{ijk}$ obtained. However, this is the best possible approach to estimate the intermediate precision $s_{I(OIT)}^2$ with small numbers of operators and instruments and, although it might not adequately reflect the true precision, it is useful for comparison studies as long as the numbers of operators and instruments for the methods being compared are equal.

Considering the formulae to calculate the between-day variance component $s_D^2$ and the (operator + instrument + day) variance component $s_{OID}^2$ in Table 3, negative values for those parameters can be obtained. For example, if due to random effects $MS_D$ is smaller than $MS_E$, we will get a negative value for $s_D^2$. In that case, the negative estimates of variance are given the value 0. This is the usual practice, which is also followed by ISO [8] if a negative value for the between-laboratory variance $s_L^2$ is obtained. Another approach to deal with negative variance estimates is reported in [9]. It applies the method of pooling minimal mean squares with predecessors.
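The calculations of Table 3 (equal n, possibly unequal numbers of days) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `variance_estimates`, the input layout (a dictionary keyed by operator and instrument) and the result keys are hypothetical, and negative variance components are set to 0 as described above.

```python
def variance_estimates(data, n):
    """Variance estimates following Table 3.

    data: {(operator, instrument): [day_1_replicates, day_2_replicates, ...]},
          every day holding exactly n replicate results (n >= 2).
    """
    day_means = {cell: [sum(day) / n for day in days] for cell, days in data.items()}
    total_days = sum(len(means) for means in day_means.values())  # sum of p_ij
    grand_mean = sum(sum(means) for means in day_means.values()) / total_days

    # Sums of squares: residual, between-day (within cell), day means vs grand mean
    ss_e = sum((y - mean) ** 2
               for cell, days in data.items()
               for day, mean in zip(days, day_means[cell])
               for y in day)
    ss_d = n * sum((mean - sum(means) / len(means)) ** 2
                   for means in day_means.values() for mean in means)
    ss_oid = n * sum((mean - grand_mean) ** 2
                     for means in day_means.values() for mean in means)

    ms_e = ss_e / ((n - 1) * total_days)
    ms_d = ss_d / sum(len(means) - 1 for means in day_means.values())
    ms_oid = ss_oid / (total_days - 1)

    s2_r = ms_e
    s2_d = max((ms_d - ms_e) / n, 0.0)      # negative estimates are set to 0
    s2_oid = max((ms_oid - ms_e) / n, 0.0)
    return {
        "MS_E": ms_e, "MS_D": ms_d, "MS_OID": ms_oid,
        "s2_r": s2_r, "s2_D": s2_d, "s2_OID": s2_oid,
        "s2_I(T)": s2_d + s2_r,             # uses the truncated component
        "s2_I(OIT)": s2_oid + s2_r,
        "s2_day_means": ms_oid / n,
    }


example = {(1, 1): [[1.0, 3.0], [5.0, 7.0]]}   # one operator, one instrument, two days
print(variance_estimates(example, n=2))
```

Note that when a component is truncated to 0, the sum $s_D^2 + s_r^2$ used here no longer equals the mean-square form $(MS_D + (n-1)MS_E)/n$ given in Table 3; which of the two to report in that case is a choice the sketch makes, not one stated in the text.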

2.4. Number of measurements

As mentioned earlier, the probability of accepting an alternative method which in fact is not appropriate ($\beta$-error), because it is not precise enough or too biased in comparison with the reference method, can be controlled by determining the number of measurements required to detect a certain bias as well as a certain difference in precision (if it exists). This implies that an acceptable difference between the means of the two methods as well as an acceptable ratio between the precision parameters of the two methods have to be specified. The former is called by ISO the detectable difference between the biases of the two methods, $\lambda$, and is defined as the minimum difference between the means of the two methods that the experimenter wishes to detect with high probability. The latter is called by ISO the detectable ratio between the precision parameters of the two methods. It is defined as the minimum ratio of precision parameters that the experimenter wishes to detect with high probability from the results obtained with the two methods. In analogy with what is given in ISO, the detectable ratios to be considered in the intralaboratory situation are:

$$\rho = \frac{\sigma_{rB}}{\sigma_{rA}} \quad \text{for the comparison of repeatabilities;}$$

$$\varphi_{I(T)} = \sqrt{\frac{MS_{DB}}{MS_{DA}}} \quad \text{for the comparison of time-different intermediate precisions;}$$

$$\varphi_{I(OIT)} = \sqrt{\frac{MS_{OIDB}}{MS_{OIDA}}} \quad \text{for the comparison of (operator + instrument + time)-different intermediate precisions.}$$

Due to the complexity in the determination of the degrees of freedom associated with the I(T) and I(OIT) precision estimates (see further), the detectable ratios $\varphi_{I(T)}$ and $\varphi_{I(OIT)}$ are given in terms of the mean squares.

It is recommended to use a significance level of $\alpha = 0.05$ in the comparison of the precision parameters and the means ($\alpha$ represents the probability that the alternative method B is rejected when in fact its performance is not worse than that of the reference method A). ISO recommends that the risk of failing to detect the chosen minimum ratio of standard deviations or the minimum difference between the means is set at $\beta = 0.05$. For the intralaboratory situation this might be too stringent and therefore $\beta = 0.05$ as well as $\beta = 0.2$ will be considered. The latter is inspired by the requirement in bioequivalence studies [10], where it is demanded that the statistical tests have 80% power (power = $100(1 - \beta)$%).

2.4.1. Determination of the minimum number of measurements required for the detection of $\lambda$

In the ISO document [1], the precision ($\sigma^2$) is assumed to be known and the repeatability variance as well as the between-laboratory variance is included in the calculation for the optimal number of measurements. In what follows, this is adapted to the situation in which only an estimate of the precision ($s^2$) is available. This requires the use of t-values instead of z-values (applied in ISO). Moreover, the repeatability variance as well as the (operator + instrument + day) variance component is considered.

The following equation is used for the determination of the minimum number of measurements required for the detection of $\lambda$:

$$\lambda = (t_{\alpha/2} + t_{\beta})\sqrt{\frac{(m_A q_A p_A - 1)\left(s_{OIDA}^2 + s_{rA}^2/n_A\right) + (m_B q_B p_B - 1)\left(s_{OIDB}^2 + s_{rB}^2/n_B\right)}{m_A q_A p_A + m_B q_B p_B - 2}\left(\frac{1}{m_A q_A p_A} + \frac{1}{m_B q_B p_B}\right)}, \qquad (2)$$

where the subscripts A and B refer to method A and method B, respectively; $t_{\alpha/2}$ is the two-sided tabulated t-value at significance level $\alpha$ and degrees of freedom $\nu = m_A q_A p_A + m_B q_B p_B - 2$; and $t_{\beta}$ is the one-sided tabulated t-value at significance level $\beta$ and degrees of freedom $\nu = m_A q_A p_A + m_B q_B p_B - 2$.

This expression is based on the t-test for the comparison of two means and therefore assumes that the precisions of both methods are equal. This assumption should be acceptable for an estimation of the optimal number of measurements. If the precision of the alternative method B is unknown, which might often be the case, it is substituted by the precision of the reference method A:

$$\lambda = (t_{\alpha/2} + t_{\beta})\sqrt{\frac{(m_A q_A p_A - 1)\left(s_{OIDA}^2 + s_{rA}^2/n_A\right) + (m_B q_B p_B - 1)\left(s_{OIDA}^2 + s_{rA}^2/n_B\right)}{m_A q_A p_A + m_B q_B p_B - 2}\left(\frac{1}{m_A q_A p_A} + \frac{1}{m_B q_B p_B}\right)}, \qquad (3)$$

where $\lambda$ is the acceptable difference between the means, which one wants to detect with $(1 - \beta)100\%$ confidence from a two-tailed t-test performed at the significance level $\alpha$. The t-distribution of the non-zero mean difference is a non-central t-distribution. Therefore, instead of $(t_{\alpha/2} + t_{\beta})$, the non-centrality parameter of the non-central t-distribution should be used. An evaluation of the effect of approximating this by means of the central t-distribution indicated that very similar results are obtained. Therefore the central t-distribution is used.

As indicated earlier, it is strongly recommended to have the same numbers of operators ($m_A = m_B$) and instruments ($q_A = q_B$) for both methods. If, moreover, the number of days as well as the number of replicates are taken the same for both methods, i.e. $p_A = p_B$ and $n_A = n_B$, Eq. (3) simplifies to

$$\lambda = (t_{\alpha/2} + t_{\beta})\sqrt{\frac{2\left(s_{OIDA}^2 + s_{rA}^2/n_A\right)}{m_A q_A p_A}}. \qquad (4)$$

Generally, the number of operators ($m_A = m_B$) and instruments ($q_A = q_B$) will be fixed by practical constraints. It is recommended that the number of replicates per day is equal to 2 ($n = 2$) and to focus on the number of days required, since this will lead to a balanced design in which the number of degrees of freedom associated with the repeatability is almost the same as the number of degrees of freedom associated with the between-day component. Therefore, the minimum number required is mostly determined only for the number of days $p_A$ ($= p_B$), which can be obtained by finding the smallest value for $p_A$ that satisfies Eq. (4).

The equations above are only approximations, which could be further simplified by replacing $(t_{\alpha/2} + t_{\beta})$ by a constant value. Indeed, for $\alpha = 0.05$ and $\beta = 0.05$, $(t_{\alpha/2} + t_{\beta})$ varies between 3.6 ($\nu = \infty$) and 3.9 ($\nu = 14$, i.e. $m = q = p = 2$) and therefore a constant value equal to 4 could be used. For $\alpha = 0.05$ and $\beta = 0.2$, $(t_{\alpha/2} + t_{\beta})$ varies between 2.8 ($\nu = \infty$) and 3.0 ($\nu = 14$, i.e. $m = q = p = 2$), thus a constant value of 3.0 could be applied. Eq. (4) then becomes

$$\lambda = 4\sqrt{\frac{2\left(s_{OIDA}^2 + s_{rA}^2/n_A\right)}{m_A q_A p_A}} \quad \text{when } \alpha = 0.05 \text{ and } \beta = 0.05, \qquad (5)$$

$$\lambda = 3\sqrt{\frac{2\left(s_{OIDA}^2 + s_{rA}^2/n_A\right)}{m_A q_A p_A}} \quad \text{when } \alpha = 0.05 \text{ and } \beta = 0.2. \qquad (6)$$

2.4.2. Determination of the minimum number of measurements required for the detection of the minimum ratio of precision parameters

In the ISO document [1], values of the minimum detectable ratio of the precision parameters corresponding to the chosen degrees of freedom ($\nu_A$, $\nu_B$) are given for the significance level $\alpha = 0.05$ and the power $(1 - \beta) = 0.95$. Since ISO applies a two-sided F-test to check whether the two methods have different precision, these values are obtained based on a two-sided F-test. In our approach, the objective is to demonstrate that the precision of the alternative method B is at least as good as that of the reference method A. Therefore, a one-sided F-test is applied to compare the precision of both methods. Consequently, the minimum ratio of precision parameters $\rho$ or $\varphi$ corresponding to the given values of ($\nu_A$, $\nu_B$, $\alpha$, $\beta$) can be computed as

$$\rho,\ \varphi_{I(T)} \ \text{or} \ \varphi_{I(OIT)} = \sqrt{F_{\alpha(\nu_B,\nu_A)}\,F_{\beta(\nu_A,\nu_B)}}, \qquad (7)$$

where

$$\nu_A = m_A q_A p_A (n_A - 1) \ \text{and} \ \nu_B = m_B q_B p_B (n_B - 1) \ \text{in case that } \rho \text{ is considered;} \qquad (8)$$

$$\nu_A = m_A q_A (p_A - 1) \ \text{and} \ \nu_B = m_B q_B (p_B - 1) \ \text{in case that } \varphi_{I(T)} \text{ is considered;} \qquad (9)$$

$$\nu_A = m_A q_A p_A - 1 \ \text{and} \ \nu_B = m_B q_B p_B - 1 \ \text{in case that } \varphi_{I(OIT)} \text{ is considered.} \qquad (10)$$
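Eq. (7) can be motivated by a short power calculation. The sketch below is not taken from the paper: it assumes normally distributed errors, and it uses $s_A^2$ and $s_B^2$ as generic names for the two variance estimates (or mean squares) being compared, with the true ratio $\sigma_B/\sigma_A$ equal to the detectable ratio, written $\varphi$ here.

```latex
% One-sided test: H0 is rejected when s_B^2 / s_A^2 > F_{\alpha(\nu_B,\nu_A)}.
% If the true ratio is \sigma_B^2/\sigma_A^2 = \varphi^2, the scaled ratio follows an F-distribution:
\frac{s_B^2/s_A^2}{\varphi^2} \sim F(\nu_B,\nu_A),
\qquad
P\!\left(\frac{s_B^2}{s_A^2} > F_{\alpha(\nu_B,\nu_A)}\right)
  = P\!\left(F(\nu_B,\nu_A) > \frac{F_{\alpha(\nu_B,\nu_A)}}{\varphi^2}\right)
  = 1-\beta .
% The upper (1-\beta) point of F(\nu_B,\nu_A) is the reciprocal of the upper \beta point of F(\nu_A,\nu_B):
\frac{F_{\alpha(\nu_B,\nu_A)}}{\varphi^2}
  = F_{1-\beta(\nu_B,\nu_A)}
  = \frac{1}{F_{\beta(\nu_A,\nu_B)}}
\;\Longrightarrow\;
\varphi = \sqrt{F_{\alpha(\nu_B,\nu_A)}\,F_{\beta(\nu_A,\nu_B)}} .
```

As a check, for $\nu_A = \nu_B = 5$ and $\alpha = \beta = 0.05$ the expression reduces to $F_{0.05(5,5)} = 5.05$, which is the first entry of Table 4.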

Tables 4 and 5 give the minimum ratios of precision parameters ($\rho$, $\varphi_{I(T)}$ or $\varphi_{I(OIT)}$) as a function of the degrees of freedom $\nu_A$ and $\nu_B$ for ($\alpha = 0.05$, $\beta = 0.05$) and ($\alpha = 0.05$, $\beta = 0.2$), respectively. If the method precision is known, degrees of freedom $\nu$ equal to 200 can be used.

With $m_A = m_B$, $q_A = q_B$ and $n_A = n_B = 2$, the minimum number of days required for the detection of the minimum ratio $\rho$, $\varphi_{I(T)}$ or $\varphi_{I(OIT)}$ can be obtained by first finding the smallest values for the degrees of freedom ($\nu_A$ and $\nu_B$) that satisfy Eq. (7); the associated minimum number of days can then be calculated from Eq. (8), Eq. (9) or Eq. (10), depending on which precision parameters are considered. When the values of $\alpha$ and $\beta$ considered correspond to those given in Table 4 or Table 5, the minimum values for the degrees of freedom are directly obtained by looking for the tabulated $\rho$, $\varphi_{I(T)}$ or $\varphi_{I(OIT)}$ that is closest to (preferably smaller than) the given detectable ratio $\rho$, $\varphi_{I(T)}$ or $\varphi_{I(OIT)}$ and finding its associated numbers of degrees of freedom ($\nu_A$, $\nu_B$).

The minimum number of measurements required is computed for the minimum difference $\lambda$, as well as for the minimum ratios $\rho$, $\varphi_{I(T)}$ and $\varphi_{I(OIT)}$, and the largest value is chosen to perform the method comparison.
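The selection of the number of days can be sketched as follows, assuming equal m, q and n for both methods. The function names are hypothetical. The bias requirement uses the constant-factor forms of Eqs. (5) and (6), read here as the condition that the right-hand side must not exceed $\lambda$ (an interpretation, since the text only speaks of the smallest $p_A$ that "satisfies" the equation). The required degrees of freedom for the precision ratios are taken as an input that the reader looks up in Table 4 or Table 5.

```python
import math


def days_for_bias(lam, s2_oid, s2_r, m, q, n=2, beta=0.05):
    """Smallest p with k * sqrt(2 * (s2_oid + s2_r / n) / (m * q * p)) <= lam.

    k = 4 for (alpha, beta) = (0.05, 0.05), Eq. (5);
    k = 3 for (alpha, beta) = (0.05, 0.2),  Eq. (6).
    """
    k = {0.05: 4.0, 0.2: 3.0}[beta]
    p = 2 * k ** 2 * (s2_oid + s2_r / n) / (m * q * lam ** 2)
    return math.ceil(p)


def days_for_precision(nu, m, q, n=2, kind="r"):
    """Smallest p giving at least nu degrees of freedom (n >= 2 assumed)."""
    if kind == "r":          # Eq. (8): nu = m * q * p * (n - 1)
        return math.ceil(nu / (m * q * (n - 1)))
    if kind == "I(T)":       # Eq. (9): nu = m * q * (p - 1)
        return math.ceil(nu / (m * q)) + 1
    if kind == "I(OIT)":     # Eq. (10): nu = m * q * p - 1
        return math.ceil((nu + 1) / (m * q))
    raise ValueError("kind must be 'r', 'I(T)' or 'I(OIT)'")


# Illustrative inputs (not from the paper): two operators, two instruments,
# duplicate measurements, and 14 degrees of freedom read from a table.
candidates = [
    days_for_bias(lam=1.5, s2_oid=1.0, s2_r=1.0, m=2, q=2, n=2, beta=0.05),
    days_for_precision(14, m=2, q=2, n=2, kind="r"),
    days_for_precision(14, m=2, q=2, n=2, kind="I(T)"),
    days_for_precision(14, m=2, q=2, n=2, kind="I(OIT)"),
]
print(max(candidates))  # the largest requirement determines the design
```

Taking the maximum mirrors the rule stated above: the number of days is fixed by whichever criterion (bias or one of the precision ratios) is the most demanding.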

Table 4

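Values of $\rho(\nu_A, \nu_B, \alpha, \beta)$ or $\varphi_{I(T)}(\nu_A, \nu_B, \alpha, \beta)$ or $\varphi_{I(OIT)}(\nu_A, \nu_B, \alpha, \beta)$ for ($\alpha = 0.05$, $\beta = 0.05$). Rows: $\nu_B$; columns: $\nu_A$.

$\nu_B \backslash \nu_A$ 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 50 200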

5 5.05 4.66 4.40 4.22 4.08 3.97 3.88 3.81 3.75 3.70 3.66 3.62 3.59 3.56 3.54 3.52 3.43 3.27 3.15

6 4.66 4.28 4.03 3.85 3.72 3.61 3.53 3.46 3.40 3.36 3.31 3.28 3.25 3.22 3.20 3.17 3.09 2.93 2.81

7 4.40 4.03 3.79 3.61 3.48 3.38 3.29 3.23 3.17 3.12 3.08 3.05 3.02 2.99 2.96 2.94 2.86 2.70 2.59

8 4.22 3.85 3.61 3.44 3.31 3.21 3.13 3.06 3.00 2.96 2.92 2.88 2.85 2.82 2.80 2.78 2.70 2.54 2.42

9 4.08 3.72 3.48 3.31 3.18 3.08 3.00 2.93 2.88 2.83 2.79 2.75 2.72 2.70 2.67 2.65 2.57 2.41 2.29

10 3.97 3.61 3.38 3.21 3.08 2.98 2.90 2.83 2.78 2.73 2.69 2.66 2.62 2.60 2.57 2.55 2.47 2.31 2.19

11 3.88 3.53 3.29 3.13 3.00 2.90 2.82 2.75 2.70 2.65 2.61 2.58 2.55 2.52 2.49 2.47 2.39 2.23 2.11

12 3.81 3.46 3.23 3.06 2.93 2.83 2.75 2.69 2.63 2.59 2.55 2.51 2.48 2.45 2.43 2.41 2.33 2.16 2.05

13 3.75 3.40 3.17 3.00 2.88 2.78 2.70 2.63 2.58 2.53 2.49 2.46 2.42 2.40 2.37 2.35 2.27 2.11 1.99

14 3.70 3.36 3.12 2.96 2.83 2.73 2.65 2.59 2.53 2.48 2.44 2.41 2.38 2.35 2.33 2.30 2.22 2.06 1.94

15 3.66 3.31 3.08 2.92 2.79 2.69 2.61 2.55 2.49 2.44 2.40 2.37 2.34 2.31 2.29 2.26 2.18 2.02 1.90

16 3.62 3.28 3.05 2.88 2.75 2.66 2.58 2.51 2.46 2.41 2.37 2.33 2.30 2.28 2.25 2.23 2.15 1.98 1.86

17 3.59 3.25 3.02 2.85 2.72 2.62 2.55 2.48 2.42 2.38 2.34 2.30 2.27 2.24 2.22 2.20 2.12 1.95 1.83

18 3.56 3.22 2.99 2.82 2.70 2.60 2.52 2.45 2.40 2.35 2.31 2.28 2.24 2.22 2.19 2.17 2.09 1.92 1.80

19 3.54 3.20 2.96 2.80 2.67 2.57 2.49 2.43 2.37 2.33 2.29 2.25 2.22 2.19 2.17 2.15 2.06 1.90 1.77

20 3.52 3.17 2.94 2.78 2.65 2.55 2.47 2.41 2.35 2.30 2.26 2.23 2.20 2.17 2.15 2.12 2.04 1.87 1.74

25 3.43 3.09 2.86 2.70 2.57 2.47 2.39 2.33 2.27 2.22 2.18 2.15 2.12 2.09 2.06 2.04 1.96 1.78 1.65

50 3.27 2.93 2.70 2.54 2.41 2.31 2.23 2.16 2.11 2.06 2.02 1.98 1.95 1.92 1.90 1.87 1.78 1.60 1.45

200 3.15 2.81 2.59 2.42 2.29 2.19 2.11 2.05 1.99 1.94 1.90 1.86 1.83 1.80 1.77 1.74 1.65 1.45 1.26


2.5. Evaluation of test results

For each test sample, the following parameters are to be computed:

$s_{rA}^2$, $s_{rB}^2$: estimates of the repeatability variance for methods A and B, respectively

$s_{DA}^2$, $s_{DB}^2$: estimates of the between-day variance component for methods A and B, respectively

$s_{OIDA}^2$, $s_{OIDB}^2$: estimates of the (operator + instrument + day) variance component for methods A and B, respectively

$s_{I(T)A}^2$, $s_{I(T)B}^2$: estimates of the time-different intermediate precision (variance) for methods A and B, respectively

$s_{I(OIT)A}^2$, $s_{I(OIT)B}^2$: estimates of the (operator + instrument + time)-different intermediate precision (variance) for methods A and B, respectively

$s_{\bar{y}_{ijk}A}^2$, $s_{\bar{y}_{ijk}B}^2$: estimates of the variance of the day means $\bar{y}_{ijk}$ for methods A and B, respectively

$\bar{\bar{y}}_A$, $\bar{\bar{y}}_B$: grand means obtained from methods A and B, respectively.

The calculation of all these parameters is given in Tables 2 and 3.

2.5.1. Comparison of precision

As mentioned before, it is important to show that the precision of method B is at least as good as that of method A. Therefore a one-sided F-test is applied here instead of the two-sided test used in ISO [1]. The null hypothesis $H_0$ is that the precision of the alternative method B is better than or equal to the precision of the reference method A ($H_0: \sigma_B^2 \le \sigma_A^2$) and the alternative hypothesis $H_1$ is that the precision of the alternative method B is worse than the precision of the reference method A ($H_1: \sigma_B^2 > \sigma_A^2$).

2.5.1.1. Comparison of repeatability. To compare the repeatability of two methods, the sample statistic $F_r$ is calculated as follows:

$$F_r = \frac{s_{rB}^2}{s_{rA}^2} \qquad (11)$$

Table 5

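Values of $\rho(\nu_A, \nu_B, \alpha, \beta)$ or $\varphi_{I(T)}(\nu_A, \nu_B, \alpha, \beta)$ or $\varphi_{I(OIT)}(\nu_A, \nu_B, \alpha, \beta)$ for ($\alpha = 0.05$, $\beta = 0.2$). Rows: $\nu_B$; columns: $\nu_A$.

$\nu_B \backslash \nu_A$ 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 50 200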

3 5.22 4.41 4.00 3.76 3.60 3.48 3.39 3.32 3.27 3.23 3.19 3.16 3.13 3.11 3.09 3.07 3.05 3.04 2.99 2.89 2.81

4 4.76 3.98 3.59 3.35 3.19 3.08 2.99 2.92 2.87 2.83 2.79 2.76 2.74 2.71 2.69 2.68 2.66 2.65 2.60 2.49 2.42

5 4.51 3.74 3.35 3.12 2.96 2.85 2.77 2.70 2.65 2.60 2.57 2.54 2.51 2.49 2.47 2.45 2.44 2.42 2.37 2.27 2.20

6 4.35 3.59 3.21 2.97 2.82 2.70 2.62 2.55 2.50 2.46 2.42 2.39 2.37 2.34 2.32 2.31 2.29 2.28 2.22 2.12 2.05

7 4.24 3.49 3.10 2.87 2.71 2.60 2.52 2.45 2.40 2.36 2.32 2.29 2.26 2.24 2.22 2.20 2.19 2.17 2.12 2.02 1.94

8 4.15 3.41 3.03 2.79 2.64 2.53 2.44 2.38 2.32 2.28 2.24 2.21 2.19 2.16 2.14 2.13 2.11 2.10 2.04 1.94 1.86

9 4.09 3.35 2.97 2.74 2.58 2.47 2.38 2.32 2.26 2.22 2.18 2.15 2.13 2.10 2.08 2.07 2.05 2.04 1.98 1.88 1.80

10 4.04 3.30 2.92 2.69 2.53 2.42 2.34 2.27 2.22 2.17 2.14 2.11 2.08 2.06 2.04 2.02 2.00 1.99 1.93 1.83 1.75

11 4.00 3.26 2.88 2.65 2.50 2.38 2.30 2.23 2.18 2.14 2.10 2.07 2.04 2.02 2.00 1.98 1.96 1.95 1.89 1.78 1.70

12 3.97 3.23 2.85 2.62 2.47 2.35 2.27 2.20 2.15 2.10 2.07 2.03 2.01 1.98 1.96 1.95 1.93 1.91 1.86 1.75 1.67

13 3.94 3.21 2.83 2.60 2.44 2.33 2.24 2.17 2.12 2.08 2.04 2.01 1.98 1.96 1.94 1.92 1.90 1.89 1.83 1.72 1.64

14 3.92 3.18 2.80 2.57 2.42 2.30 2.22 2.15 2.10 2.05 2.02 1.98 1.96 1.93 1.91 1.89 1.88 1.86 1.81 1.69 1.61

15 3.90 3.17 2.79 2.55 2.40 2.28 2.20 2.13 2.08 2.03 1.99 1.96 1.94 1.91 1.89 1.87 1.86 1.84 1.78 1.67 1.59

16 3.88 3.15 2.77 2.54 2.38 2.27 2.18 2.11 2.06 2.01 1.98 1.94 1.92 1.89 1.87 1.85 1.84 1.82 1.76 1.65 1.56

17 3.87 3.13 2.75 2.52 2.37 2.25 2.17 2.10 2.04 2.00 1.96 1.93 1.90 1.88 1.86 1.84 1.82 1.80 1.75 1.63 1.55

18 3.86 3.12 2.74 2.51 2.35 2.24 2.15 2.08 2.03 1.98 1.95 1.91 1.89 1.86 1.84 1.82 1.81 1.79 1.73 1.62 1.53

19 3.84 3.11 2.73 2.50 2.34 2.23 2.14 2.07 2.02 1.97 1.93 1.90 1.87 1.85 1.83 1.81 1.79 1.78 1.72 1.60 1.51

20 3.83 3.10 2.72 2.49 2.33 2.22 2.13 2.06 2.01 1.96 1.92 1.89 1.86 1.84 1.82 1.80 1.78 1.76 1.71 1.59 1.50

25 3.79 3.06 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.92 1.88 1.85 1.82 1.79 1.77 1.75 1.73 1.72 1.66 1.54 1.44

50 3.71 2.98 2.60 2.37 2.21 2.09 2.00 1.93 1.88 1.83 1.79 1.76 1.73 1.70 1.68 1.66 1.64 1.62 1.56 1.43 1.32

200 3.65 2.92 2.54 2.31 2.15 2.03 1.94 1.87 1.81 1.76 1.72 1.69 1.66 1.63 1.60 1.58 1.56 1.55 1.48 1.33 1.19


$F_r$ is then compared with $F_\alpha(\nu_{rB}, \nu_{rA})$, where $F_\alpha(\nu_{rB}, \nu_{rA})$ is the value of the F-distribution with degrees of freedom of numerator $\nu_{rB} = (n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}$ and denominator $\nu_{rA} = (n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}$; $\alpha$ represents the portion of the F-distribution to the right of the given value; $\alpha = 0.05$.

If $F_r > F_\alpha(\nu_{rB}, \nu_{rA})$, method B has worse repeatability than method A at the $(1-\alpha)100\%$ (i.e. 95%) confidence level and therefore the repeatability of the alternative method B is not acceptable.

If $F_r \le F_\alpha(\nu_{rB}, \nu_{rA})$, the repeatability of the alternative method B is acceptable, which means that it is at most a factor $\rho$ worse than that of method A.
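The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the critical value $F_\alpha(\nu_{rB}, \nu_{rA})$ is assumed to be read from a table and passed in by the caller rather than computed. The numbers in the usage lines are those of Example 1.

```python
def compare_repeatability(s2_rB, s2_rA, f_crit):
    """One-sided F-test of Eq. (11) (hypothetical helper).

    s2_rB, s2_rA : repeatability variances of methods B and A
    f_crit       : tabulated F_alpha(nu_rB, nu_rA), supplied by the caller
    Returns (F_r, acceptable), where acceptable means F_r <= f_crit.
    """
    if s2_rA <= 0:
        raise ValueError("s2_rA must be positive")
    f_r = s2_rB / s2_rA
    return f_r, f_r <= f_crit


# Example 1: s2_rB = 1.4810, s2_rA = 1.2317, F_0.05(20, 20) = 2.12
f_r, acceptable = compare_repeatability(1.4810, 1.2317, 2.12)
print(round(f_r, 2), acceptable)
```

Passing the tabulated value explicitly keeps the sketch free of any statistics library; a routine for F-quantiles could be substituted if one is available.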

2.5.1.2. Comparison of time-different intermediate

precision. For the comparison of the time-different

intermediate precision, we need the number of degrees

of freedom associated with the precision estimates.

Since these estimates are not directly estimated from

the data but are calculated as a linear combination of

two mean squares, MSD and MSE (see Table 3), it is

not evident how to determine the number of degrees of

freedom associated with this compound variance. The

Satterthwaite approximation [11] (see further) can

then be used.

However, to avoid the complexity in the determination of the degrees of freedom associated with $s_{I(T)}^2$, the comparison of time-different intermediate precisions can be performed in an indirect way by comparing the day mean squares MSD [12]. Indeed since (see Table 3):

$$\mathrm{MSD} = n s_D^2 + s_r^2 \quad \text{and} \quad s_{I(T)}^2 = s_D^2 + s_r^2.$$

It follows that

$$\mathrm{MSD} = n s_{I(T)}^2 - n s_r^2 + s_r^2 = n s_{I(T)}^2 - (n-1) s_r^2.$$

Therefore, provided that the repeatabilities of both methods are equal ($\sigma_{rA}^2 = \sigma_{rB}^2$) and the number of replicates per day for both methods is equal ($n_A = n_B$), the day mean squares MSD are considered instead of $s_{I(T)}^2$. If $n_A = n_B$, the equality of the repeatabilities for both methods is first to be tested ($H_0: \sigma_{rA}^2 = \sigma_{rB}^2$; $H_1: \sigma_{rA}^2 \ne \sigma_{rB}^2$) by means of a two-sided F-test. The results obtained from the comparison of the repeatabilities in Section 2.5.1.1 cannot be used here, since a one-sided F-test has been considered to test the hypotheses $H_0: \sigma_{rB}^2 \le \sigma_{rA}^2$; $H_1: \sigma_{rB}^2 > \sigma_{rA}^2$. A non-significant test, which means that the repeatability of method B is acceptable, does not necessarily imply that the repeatabilities of both methods are equal; the repeatability of method B can be better (smaller) than the repeatability of method A. Therefore, the repeatabilities of both methods have to be compared again by applying a two-sided F-test. $F_r$ is obtained here as follows:

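$$F_r = \frac{s_1^2}{s_2^2} \qquad (12)$$

with $s_1^2$ the largest of $s_{rA}^2$ and $s_{rB}^2$.

$F_r$ is then compared with $F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, where $F_{\alpha/2}(\nu_{r1}, \nu_{r2})$ is the value of the F-distribution with degrees of freedom of numerator $\nu_{r1}$ and denominator $\nu_{r2}$; $\alpha/2$ represents the portion of the F-distribution to the right of the given value; $\alpha = 0.05$.

$\nu_{r1} = \nu_{rA}$ and $\nu_{r2} = \nu_{rB}$ if $s_{rA}^2 > s_{rB}^2$;

$\nu_{r1} = \nu_{rB}$ and $\nu_{r2} = \nu_{rA}$ if $s_{rB}^2 > s_{rA}^2$;

$$\nu_{rB} = (n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} \quad \text{and} \quad \nu_{rA} = (n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}.$$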

If $F_r \le F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, there is no evidence that the two methods have different repeatabilities and therefore the equality of the repeatabilities for both methods can be assumed.

If $F_r > F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, the repeatabilities of the two methods are significantly different at the significance level of $\alpha$ (i.e. 5%). In that case the following simplified approach to compare the time-different intermediate precisions cannot be applied. The Satterthwaite approximation to estimate the number of degrees of freedom associated with $s_{I(T)}^2$ has then to be used (see further).
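As a hedged illustration of the two-sided test of Eq. (12), the sketch below places the larger repeatability variance in the numerator and returns the matching order of the degrees of freedom. The function names are hypothetical, and the critical value $F_{\alpha/2}(\nu_{r1}, \nu_{r2})$ is assumed to be taken from a table. The usage lines use the Example 2 variances and the critical value 3.47 quoted there.

```python
def two_sided_f_statistic(s2_rA, nu_rA, s2_rB, nu_rB):
    """Eq. (12): larger variance in the numerator (hypothetical helper).

    Returns (F_r, nu_r1, nu_r2) with nu_r1 belonging to the numerator.
    """
    if s2_rA <= 0 or s2_rB <= 0:
        raise ValueError("variances must be positive")
    if s2_rA >= s2_rB:
        return s2_rA / s2_rB, nu_rA, nu_rB
    return s2_rB / s2_rA, nu_rB, nu_rA


def repeatabilities_can_be_assumed_equal(f_r, f_crit_half_alpha):
    """True when F_r does not exceed the tabulated F_{alpha/2}(nu_r1, nu_r2)."""
    return f_r <= f_crit_half_alpha


# Example 2: s2_rA = 0.0239 and s2_rB = 0.0024, both with 11 degrees of freedom
f_r, nu_r1, nu_r2 = two_sided_f_statistic(0.0239, 11, 0.0024, 11)
equal = repeatabilities_can_be_assumed_equal(f_r, 3.47)
print(round(f_r, 2), nu_r1, nu_r2, equal)
```

When the test rejects equality, as here, the indirect comparison through the day mean squares is not applicable and the direct comparison of $s_{I(T)}^2$ is needed.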

In the situation where the number of replicates per day for both methods is equal ($n_A = n_B$) and the equality of the repeatabilities for both methods can be assumed ($\sigma_{rA}^2 = \sigma_{rB}^2$), the comparison of time-different intermediate precision is performed by calculating $F_{I(T)}$ as follows:

$$F_{I(T)} = \frac{\mathrm{MSD}_B}{\mathrm{MSD}_A} = \frac{n_B s_{DB}^2 + s_{rB}^2}{n_A s_{DA}^2 + s_{rA}^2} = \frac{n_B s_{I(T)B}^2 - (n_B - 1) s_{rB}^2}{n_A s_{I(T)A}^2 - (n_A - 1) s_{rA}^2}. \qquad (13)$$


$F_{I(T)}$ is compared with $F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$, where $F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$ is the value of the F-distribution with degrees of freedom of the numerator $\nu_{I(T)B} = \sum_{i=1}^{m_B}\sum_{j=1}^{q_B}(p_{ijB} - 1)$ and the denominator $\nu_{I(T)A} = \sum_{i=1}^{m_A}\sum_{j=1}^{q_A}(p_{ijA} - 1)$; $\alpha$ represents the portion of the F-distribution to the right of the given value; $\alpha = 0.05$.

If $F_{I(T)} > F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$, method B has worse time-different intermediate precision than method A at the $(1-\alpha)100\%$ (i.e. 95%) confidence level and therefore the time-different intermediate precision of the alternative method B is not acceptable.

If $F_{I(T)} \le F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$, the time-different intermediate precision of the alternative method B is acceptable, which means that it is at most a factor $\rho_{I(T)}$ worse than that of method A.

In the situation where the number of replicates per day for both methods is not equal ($n_A \ne n_B$) or the equality of the repeatabilities for both methods cannot be assumed ($\sigma_{rA}^2 \ne \sigma_{rB}^2$), the comparison of the time-different intermediate precisions cannot be performed by Eq. (13) but must be investigated through the comparison of $s_{I(T)}^2$. Then $F_{I(T)}$ is calculated as follows:

$$F_{I(T)} = \frac{s_{I(T)B}^2}{s_{I(T)A}^2}. \qquad (14)$$
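The choice between the indirect comparison (Eq. (13)) and the direct comparison (Eq. (14)) can be sketched as follows. The function and argument names are hypothetical; only the branching condition and the two ratios come from the text. The degrees of freedom needed to look up the critical value are not handled here.

```python
def f_time_different(n_A, n_B, repeatabilities_equal,
                     msd_A=None, msd_B=None,
                     s2_IT_A=None, s2_IT_B=None):
    """Return (F_I(T), equation used) for the time-different comparison.

    Eq. (13) (ratio of day mean squares) applies only when n_A == n_B and
    the repeatabilities can be assumed equal; otherwise Eq. (14) (ratio of
    the intermediate precision variances) is used.
    """
    if n_A == n_B and repeatabilities_equal:
        return msd_B / msd_A, "Eq. (13)"
    return s2_IT_B / s2_IT_A, "Eq. (14)"


# Example 1: equal n, Eq. (13) was applied to the day mean squares
f13, used13 = f_time_different(2, 2, True, msd_A=6.3345, msd_B=9.7042)
# Example 2: repeatabilities differ, so Eq. (14) is applied
f14, used14 = f_time_different(2, 2, False, s2_IT_A=0.1145, s2_IT_B=0.0211)
print(round(f13, 2), used13)
print(round(f14, 2), used14)
```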

As mentioned earlier, the number of degrees of freedom associated with $s_{I(T)}^2$ for both methods is obtained from the Satterthwaite approximation (Eqs. (15) and (16)). When a non-integer value is obtained for $\nu_{I(T)}$, round the number down to the nearest integer.

The comparison of $F_{I(T)}$ with $F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$, where $\nu_{I(T)B}$ and $\nu_{I(T)A}$ are computed from Eqs. (15) and (16), respectively, is then performed in the same way as mentioned earlier when Eq. (13) is applied.

2.5.1.3. Comparison of (operator+instrument+time)-different intermediate precision. In analogy with the comparison of the time-different intermediate precision, the comparison of (operator+instrument+time)-different intermediate precision can be performed in an indirect way by comparing the (operator+instrument+day)-mean squares MSOID. If the number of replicates per day for both methods is equal ($n_A = n_B$) and the comparison of the repeatabilities in Eq. (12) does not give evidence against the equality of the repeatabilities of both methods (i.e. $\sigma_{rA}^2 = \sigma_{rB}^2$), $F_{I(OIT)}$ is calculated as follows:

$$F_{I(OIT)} = \frac{\mathrm{MSOID}_B}{\mathrm{MSOID}_A} = \frac{n_B s_{OIDB}^2 + s_{rB}^2}{n_A s_{OIDA}^2 + s_{rA}^2} = \frac{n_B s_{I(OIT)B}^2 - (n_B - 1) s_{rB}^2}{n_A s_{I(OIT)A}^2 - (n_A - 1) s_{rA}^2}. \qquad (17)$$

The comparison of $F_{I(OIT)}$ with $F_\alpha(\nu_{I(OIT)B}, \nu_{I(OIT)A})$, where $\nu_{I(OIT)B} = \left(\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}\right) - 1$ and $\nu_{I(OIT)A} = \left(\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}\right) - 1$, is then performed in analogy with the comparison of the time-different intermediate precision mentioned earlier when Eq. (13) is considered.

In the situation where the number of replicates per day for both methods is not equal ($n_A \ne n_B$) or the equality of the repeatabilities of both methods cannot be assumed ($\sigma_{rA}^2 \ne \sigma_{rB}^2$), the comparison of (operator+instrument+time)-different intermediate precision must be investigated through the comparison of $s_{I(OIT)}^2$. Then $F_{I(OIT)}$ is calculated as follows:

$$F_{I(OIT)} = \frac{s_{I(OIT)B}^2}{s_{I(OIT)A}^2}. \qquad (18)$$

Again, the further steps to compare $F_{I(OIT)}$ with $F_\alpha(\nu_{I(OIT)B}, \nu_{I(OIT)A})$, where $\nu_{I(OIT)B}$ and $\nu_{I(OIT)A}$ are computed from Eqs. (19) and (20), respectively, are in analogy with the comparison of the time-different intermediate precision mentioned earlier when Eq. (13) is considered.

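$$\nu_{I(T)B} = \frac{\left(s_{I(T)B}^2\right)^2}{\dfrac{(\mathrm{MSD}_B/n_B)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B}(p_{ijB} - 1)} + \dfrac{\left((n_B - 1)\mathrm{MSE}_B/n_B\right)^2}{(n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}; \qquad (15)$$

$$\nu_{I(T)A} = \frac{\left(s_{I(T)A}^2\right)^2}{\dfrac{(\mathrm{MSD}_A/n_A)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A}(p_{ijA} - 1)} + \dfrac{\left((n_A - 1)\mathrm{MSE}_A/n_A\right)^2}{(n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}}}. \qquad (16)$$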
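Eqs. (15) and (16) can be evaluated with a short routine such as the one below. It is a sketch under stated assumptions: the function name is hypothetical, $s_{I(T)}^2$ is formed as $\mathrm{MSD}/n + (n-1)\mathrm{MSE}/n$ (which follows from $\mathrm{MSD} = n s_D^2 + s_r^2$ and $s_r^2 = \mathrm{MSE}$), and the result is rounded down as the text prescribes. The usage lines take the mean squares of Example 2.

```python
import math


def satterthwaite_df_IT(msd, mse, n, df_msd, df_mse):
    """Satterthwaite degrees of freedom for s2_I(T) (Eqs. (15)/(16)).

    msd, mse       : day and residual mean squares
    n              : replicates per day (n >= 2)
    df_msd, df_mse : degrees of freedom of MSD and MSE
    Returns (s2_IT, nu_IT), nu_IT rounded down to an integer.
    """
    if n < 2:
        raise ValueError("at least two replicates per day are needed")
    term_d = msd / n
    term_e = (n - 1) * mse / n
    s2_IT = term_d + term_e
    nu = s2_IT ** 2 / (term_d ** 2 / df_msd + term_e ** 2 / df_mse)
    return s2_IT, math.floor(nu)


# Example 2: method A (MSD = 0.2050, MSE = 0.0239), method B (0.0397, 0.0024)
s2_A, nu_A = satterthwaite_df_IT(0.2050, 0.0239, 2, 10, 11)
s2_B, nu_B = satterthwaite_df_IT(0.0397, 0.0024, 2, 10, 11)
print(round(s2_A, 4), nu_A)
print(round(s2_B, 4), nu_B)
```

The same routine would serve for Eqs. (19) and (20) if MSOID and its degrees of freedom are passed in place of MSD.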


2.5.1.4. Comment concerning the ISO 5725-6

procedure. In ISO [1], the comparison of the overall

precision (a term which is not really explained) is

performed in analogy with Eq. (17) without pre-

evaluating the equality of the repeatabilities of both

methods. The fact that the same number of replicates

per laboratory for the two methods ($n_A = n_B$) is required is not taken into consideration either. If the

overall precision refers to the reproducibility, the

indirect comparison of the latter for both methods

by a comparison of the variance of the laboratory

means is only possible if the repeatability and the

number of replicates per laboratory (n) is the same for

both methods. If this is not the case, a direct

comparison of the reproducibility obtained with the

two methods should be performed, possibly in analogy

with Eq. (18).

2.5.2. Evaluation of the bias

2.5.2.1. Comments concerning the ISO 5725-6

procedure. The evaluation of the bias is performed

by comparing the grand means obtained with both

methods. In ISO [1] the comparison is based on the z-test, since the sample statistic is compared with 2 (an approximation of the two-sided tabulated z-value of $z_\alpha = z_{0.05} = 1.96$). This implies that the estimated standard deviations used in the comparison are obtained from large samples and therefore that they are sufficiently good estimates of the true standard deviations. If the sample statistic is larger than 2, the difference between the means obtained with the two methods is statistically significant. In that case, to avoid the rejection of the method with an acceptable bias, it is further examined whether the estimated bias $d$ can be considered acceptable. ISO concludes that the bias is significant, but acceptable if its absolute value is not larger than $\lambda/2$.

However, this approach is questionable for the following reasons. ISO specifies $\lambda$ to be four times $\sigma_d$, where $\sigma_d$ represents the standard deviation of the difference between the means of methods A and B. This is obtained from $\lambda = (z_{\alpha/2} + z_\beta)\sigma_d$, which with $\alpha = \beta = 0.05$ becomes $(1.96 + 1.645)\sigma_d \approx 4\sigma_d$. In the evaluation of the bias $\sigma_d$ is estimated from the experiments as $s_d$. If the estimated bias $d = |\bar{y}_A - \bar{y}_B| = \lambda/2$, the 95% confidence interval (CI) around $d$ can be calculated as

$$\lambda/2 \pm 1.96\, s_d \quad \text{or} \quad 2\sigma_d \pm 2 s_d.$$

If $s_d$ is exactly equal to $\sigma_d$, the lower limit of the confidence interval is equal to 0 and the upper limit is equal to $\lambda$ (see Fig. 2(a)). Since 0 is just included in the CI, the bias is not significantly different from zero. (Evaluation of the bias by means of the z-test, as done in ISO, would of course lead to the same conclusion since the test statistic $d/s_d = 2\sigma_d/s_d = 2$.) Moreover, the probability that the true absolute bias, as estimated by

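$$\nu_{I(OIT)B} = \frac{\left(s_{I(OIT)B}^2\right)^2}{\dfrac{(\mathrm{MSOID}_B/n_B)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1} + \dfrac{\left((n_B - 1)\mathrm{MSE}_B/n_B\right)^2}{(n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}; \qquad (19)$$

$$\nu_{I(OIT)A} = \frac{\left(s_{I(OIT)A}^2\right)^2}{\dfrac{(\mathrm{MSOID}_A/n_A)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1} + \dfrac{\left((n_A - 1)\mathrm{MSE}_A/n_A\right)^2}{(n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}}}. \qquad (20)$$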

Fig. 2. Different situations of a bias evaluation. $d$: the estimated absolute difference between the grand means obtained with the two methods; $\lambda$: the acceptable bias; () 95% confidence interval. (a) $s_d = \sigma_d$ and $d = \lambda/2$; (b) $s_d < \sigma_d$ and $d > \lambda/2$; (c) $s_d > \sigma_d$ and $d$ …

$d = \lambda/2$, exceeds $\lambda$ is only 2.5%. ISO considers a significant bias to be acceptable if $d$ is smaller than $\lambda/2$. However, if $s_d$ equals $\sigma_d$ there is no point in comparing $d$ with $\lambda/2$ since with $d$ larger than $\lambda/2$ there is no chance that the significant difference can be acceptable.

If $s_d$ is different from $\sigma_d$, as is to be expected, the comparison can lead to wrong conclusions. Indeed if $s_d$ is smaller than $\sigma_d$, which will be the case, e.g., if the acceptance criteria for the precision measure are defining the number of measurements to be performed, considering the bias to be unacceptable if $d$ is larger than $\lambda/2$ can lead to the rejection of a method with an acceptable bias (see Fig. 2(b)). On the other hand if $s_d$ is larger than $\sigma_d$, an unacceptable bias can lead to a non-significant test and therefore to acceptance of the method (see Fig. 2(c)).

Therefore, it is more appropriate to compare the one-sided upper 95% confidence limit around $d$ with $\lambda$ to conclude on the acceptability of the method (see further).

2.5.2.2. Adapted approach. In our approach, which is intended for the intralaboratory situation, the standard deviations are generally estimated from a relatively small sample size, and therefore the t-test is more appropriate than the z-test. A two-sided test ($H_0: \mu_A = \mu_B$; $H_1: \mu_A \ne \mu_B$) is considered since the difference between the two means can be positive as well as negative. Therefore

$$t_{cal} = \frac{|\bar{y}_A - \bar{y}_B|}{s_d}, \qquad (21)$$

where $s_d$ represents the estimated standard deviation of the differences between the means obtained with the two methods.

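The use of the t-test requires that the variances of the day means obtained with the two methods are equal (i.e. $\sigma^2_{\bar{y}_{ijk}A} = \sigma^2_{\bar{y}_{ijk}B}$). This equality must first be tested by applying a two-sided F-test. The degrees of freedom associated with $s^2_{\bar{y}_{ijk}A}$ and $s^2_{\bar{y}_{ijk}B}$ are $\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1$ and $\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1$, respectively. If there is no evidence against the equality of $\sigma^2_{\bar{y}_{ijk}A}$ and $\sigma^2_{\bar{y}_{ijk}B}$, $s_d$ is calculated by applying the pooled variance $s_p^2$ as follows:

$$s_d = \sqrt{s_p^2\left(\frac{1}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}} + \frac{1}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}\right)}, \qquad (22)$$

where $s_p^2$ is given by Eq. (23), with

$$\nu_d = \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} + \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 2. \qquad (24)$$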

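When the equality of the variances of the day means, $\sigma^2_{\bar{y}_{ijk}A}$ and $\sigma^2_{\bar{y}_{ijk}B}$, cannot be assumed, the variances cannot be pooled and $s_d$ is calculated as follows [13]:

$$s_d = \sqrt{\frac{s^2_{\bar{y}_{ijk}A}}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}} + \frac{s^2_{\bar{y}_{ijk}B}}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}. \qquad (25)$$

The number of degrees of freedom $\nu_d$ associated with $s_d$ is then calculated by applying the Satterthwaite approximation (Eq. (26)).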

The $t_{cal}$ is compared with $t_{\alpha/2;\nu_d}$, where $t_{\alpha/2;\nu_d}$ is the two-sided tabulated t-value at the significance level $\alpha = 0.05$ and the degrees of freedom $\nu_d$ as indicated in Eq. (24) or Eq. (26).
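The two routes to $s_d$ and $\nu_d$ (pooled, Eqs. (22) to (24); unpooled, Eqs. (25) and (26)) can be sketched as below. The function name and the shorthand `N_A`, `N_B` for the total number of day means $\sum\sum p_{ij}$ of each method are assumptions introduced for the illustration; rounding $\nu_d$ down follows the rule the text gives for $\nu_{I(T)}$ and is assumed to carry over. The usage lines take the variances of the day means of Example 2.

```python
import math


def sd_and_df(s2_A, N_A, s2_B, N_B, equal_variances):
    """Standard deviation of the difference of grand means and its df.

    s2_A, s2_B : variances of the day means of methods A and B
    N_A, N_B   : total number of day means per method
    Returns (s_d, nu_d).
    """
    if equal_variances:
        nu_d = N_A + N_B - 2                                   # Eq. (24)
        s2_p = ((N_A - 1) * s2_A + (N_B - 1) * s2_B) / nu_d    # Eq. (23)
        return math.sqrt(s2_p * (1 / N_A + 1 / N_B)), nu_d     # Eq. (22)
    term_A = s2_A / N_A
    term_B = s2_B / N_B
    s2_d = term_A + term_B                                     # Eq. (25)
    nu = s2_d ** 2 / (term_A ** 2 / (N_A - 1) + term_B ** 2 / (N_B - 1))
    return math.sqrt(s2_d), math.floor(nu)                     # Eq. (26)


# Example 2: variances of the day means 0.1025 (A) and 0.0199 (B), 11 days each
s_d, nu_d = sd_and_df(0.1025, 11, 0.0199, 11, equal_variances=False)
print(round(s_d, 4), nu_d)
```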

If $t_{cal} > t_{\alpha/2;\nu_d}$, the difference between the means obtained with the two methods is statistically significant. Though the difference is significant it might not

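$$s_p^2 = \frac{\left(\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1\right) s^2_{\bar{y}_{ijk}A} + \left(\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1\right) s^2_{\bar{y}_{ijk}B}}{\nu_d} \qquad (23)$$

$$\nu_d = \frac{\left(s_d^2\right)^2}{\dfrac{\left(s^2_{\bar{y}_{ijk}A} / \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}\right)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1} + \dfrac{\left(s^2_{\bar{y}_{ijk}B} / \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}\right)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1}} \qquad (26)$$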


be relevant to the application. Therefore it could be further evaluated whether the difference found can be considered acceptable. The one-sided $(1-\beta)100\%$ (i.e. 95% for $\beta = 0.05$) upper confidence limit (UCL) around the absolute difference $d$ is compared with the acceptance limit $\lambda$. The UCL is obtained as follows:

$$\mathrm{UCL} = |\bar{y}_A - \bar{y}_B| + s_d\, t_{\beta;\nu_d}, \qquad (27)$$

where $s_d$ is as shown in Eq. (22) or Eq. (25) and $t_{\beta;\nu_d}$ is the one-sided tabulated t-value at the significance level $\beta = 0.05$ and the degrees of freedom $\nu_d$ (as shown in Eq. (24) or Eq. (26)).

If UCL $\le \lambda$, the bias although statistically significant is acceptable since there is a smaller than (or at most) $\beta$ probability that the true absolute difference as estimated by $|\bar{y}_A - \bar{y}_B|$ is larger than $\lambda$.

If UCL $> \lambda$, the bias is not acceptable since there is a larger than $\beta$ probability that the true absolute difference as estimated by $|\bar{y}_A - \bar{y}_B|$ is larger than $\lambda$.

If $t_{cal} \le t_{\alpha/2;\nu_d}$, the difference between the means obtained with the two methods is statistically insignificant. However, if the precision estimates ($s_r^2$ and $s_{OID}^2$) of the two methods obtained experimentally are larger than those used for the calculation of the minimum number of measurements required for the detection of $\lambda$ in Eqs. (2)-(6), an unacceptable bias can lead to a non-significant test.

Therefore, to limit the risk of adopting a method with an unacceptable bias, the interval hypothesis testing as proposed by Hartmann et al. [2] should be more appropriate for the evaluation of the bias than the approach mentioned above (Eqs. (21)-(27)). The procedure is as follows.

Calculate the one-sided $(1-\alpha)100\%$ (i.e. 95% for $\alpha = 0.05$) upper confidence limit (UCL) around the absolute difference $d$:

$$\mathrm{UCL} = |\bar{y}_A - \bar{y}_B| + s_d\, t_{\alpha;\nu_d}, \qquad (28)$$

where $\alpha$ is 0.05, $s_d$ is the same as in Eq. (22) or Eq. (25) and $\nu_d$ is the same as in Eq. (24) or Eq. (26).

Since in interval hypothesis testing the null and alternative hypotheses are reversed, the roles of $\alpha$ and $\beta$ are also reversed. Therefore, here $\alpha$ corresponds to the probability that a method that is biased to an unacceptably large extent will be accepted.

If the UCL is not larger than the acceptable bias $\lambda$, the difference between the grand means of method A and method B is considered acceptable at the $(1-\alpha)100\%$ confidence level and the bias of method B is acceptable. If the UCL is larger than the acceptable bias $\lambda$, the bias of method B is not acceptable. With this approach the probability of accepting a method that is too much biased is controlled at 5%.
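The interval hypothesis decision of Eq. (28) reduces to one addition and one comparison, as the sketch below illustrates. The function names are hypothetical and the one-sided t-value is assumed to be taken from a table; the usage lines use the Example 2 figures ($s_d = 0.1055$, $t_{0.05;13} = 1.771$, acceptable bias 0.50).

```python
def upper_confidence_limit(mean_A, mean_B, s_d, t_one_sided):
    """One-sided upper confidence limit around |mean_A - mean_B| (Eq. (28))."""
    return abs(mean_A - mean_B) + s_d * t_one_sided


def bias_is_acceptable(ucl, acceptable_bias):
    """The bias is accepted only when the UCL does not exceed the limit."""
    return ucl <= acceptable_bias


# Example 2: grand means 39.884 (A) and 39.486 (B)
ucl = upper_confidence_limit(39.884, 39.486, 0.1055, 1.771)
print(round(ucl, 3), bias_is_acceptable(ucl, 0.50))
```

The same function evaluates Eq. (27) when the t-value for the chosen probability in the point hypothesis approach is supplied instead.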

The evaluation of the bias described above is based

on nested designs performed separately for methods A

and B. If it is possible to design a simultaneous

experiment (e.g. same days and same operators) a

paired comparison [6], for which a smaller sd is to be

expected, could be preferable.

3. Examples

Two examples will illustrate the approach discussed. In the first example measurements are performed under (operator+instrument+time)-different intermediate precision conditions while in the second example only time-different intermediate precision conditions are considered.

3.1. Example 1: quantification of diazepam in

diazepam tablets (the example is fictitious)

3.1.1. Background

Method A is a HPLC method, method B is a UV

(second derivative) method for the quantification of

diazepam in diazepam tablets. A laboratory uses

method A but developed method B as an alternative.

The laboratory wants to compare the performance of

both methods. The results are expressed as percentage

of the labelled amount (%). For method A an estimate of the precision ($s_r$ and $s_{OID}$) is available: $s_r = 1\%$, $s_{OID} = 2\%$.

3.1.2. Requirements

The acceptable bias $\lambda$ is 2%. The acceptable ratio of the standard deviations between the two methods, $\rho$, $\rho_{I(T)}$ or $\rho_{I(OIT)}$, is 2. The statistical tests are performed at the significance level $\alpha = 0.05$. The probability $\beta$ to wrongly adopt the method with an unacceptable performance is set at 0.2.

3.1.3. Experimental design

It is decided that the number of operators, instru-

ments and replicates per day for each method is 2 and

the number of days for both methods is equal ($p_A = p_B$).


From one batch of diazepam tablets, 300 tablets are

randomly taken. They are powdered and kept in a cool,

dry place, e.g. desiccator. Each day during pA days,

two replicates ($n_A = 2$) prepared from the powdered samples are analysed with method A on the first

instrument by the first operator. The analysis is

repeated independently in the same way during

another pA days but on the second instrument. The

second operator performs the procedures in the same

way as the first operator does. The experiments for

method B are designed in the same way as those for

method A.

3.1.4. Determination of the minimum number of days

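(1) For the detection of $\lambda$. Since $p_A = p_B$ and $n_A = n_B = 2$, Eq. (4) is used:

$$2 \ge (t_{\alpha/2} + t_\beta)\sqrt{\frac{2\left(s_{OIDA}^2 + s_{rA}^2/n_A\right)}{m_A q_A p_A}},$$

$$2 \ge (t_{\alpha/2} + t_\beta)\sqrt{\frac{2\left(2^2 + 1^2/2\right)}{4 p_A}}.$$

With $p_A = 4$ and $\nu = [2(m_A q_A p_A) - 2] = [2(2 \times 2 \times 4) - 2] = 30$, $(t_{\alpha/2} + t_\beta) = 2.896$ and the right side of the equation above equals 2.172; with $p_A = 5$ and $\nu = [2(m_A q_A p_A) - 2] = [2(2 \times 2 \times 5) - 2] = 38$, $(t_{\alpha/2} + t_\beta) = 2.876$ and the right side of the equation above equals 1.929. Hence $p_A = p_B = 5$. (The use of a constant multiplication factor equal to 3 would yield $p_A = p_B = 6$.)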
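The trial-and-error search above can be reproduced with a small helper. It is a sketch only: the function name is hypothetical, the form of the right-hand side is taken from the worked equation above, and the sums $(t_{\alpha/2} + t_\beta)$ are the tabulated values quoted in the text rather than computed quantities.

```python
import math


def detectable_bias(t_sum, s2_between, s2_r, n, m, q, p):
    """Right-hand side of the sample-size condition used above.

    t_sum      : tabulated (t_{alpha/2} + t_beta) for nu = 2*m*q*p - 2
    s2_between : between-day (or operator+instrument+day) variance component
    s2_r       : repeatability variance
    n, m, q, p : replicates per day, operators, instruments, days
    """
    return t_sum * math.sqrt(2 * (s2_between + s2_r / n) / (m * q * p))


# Example 1: s_OID = 2 %, s_r = 1 %, n = m = q = 2, acceptable bias 2 %
rhs_4_days = detectable_bias(2.896, 2 ** 2, 1 ** 2, 2, 2, 2, 4)
rhs_5_days = detectable_bias(2.876, 2 ** 2, 1 ** 2, 2, 2, 2, 5)
print(round(rhs_4_days, 3), rhs_4_days <= 2)
print(round(rhs_5_days, 3), rhs_5_days <= 2)
```

The smallest number of days for which the value does not exceed the acceptable bias is retained, here five.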

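(2) For the comparison of precision measures. From Table 5 it can be seen that $\rho$ (or $\rho_{I(T)}$ or $\rho_{I(OIT)}$) $= 2$ is given by $\nu_A = \nu_B = 14$.

To compare repeatability,

$\nu_A = m_A q_A p_A$ and $\nu_B = m_B q_B p_B$; so $p_A = p_B = 14/4 = 3.5 \rightarrow 4$.

To compare time-different intermediate precision,

$\nu_A = m_A q_A (p_A - 1)$ and $\nu_B = m_B q_B (p_B - 1)$; so $p_A = p_B = (14/4) + 1 = 4.5 \rightarrow 5$.

To compare (operator+instrument+day)-different intermediate precision,

$\nu_A = m_A q_A p_A - 1$ and $\nu_B = m_B q_B p_B - 1$; so $p_A = p_B = 15/4 = 3.75 \rightarrow 4$.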

(3) Conclusion. The minimum number of days

required for both methods (with two operators, two

instruments and two measurements per day) is 5.
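The conversion from a required number of degrees of freedom to a number of days can be written out as below. The function name and the `kind` labels are hypothetical; the three formulas are the degrees-of-freedom expressions used in the example, with the factor $(n - 1)$ for the repeatability taken from the general definition of $\nu_r$.

```python
import math


def days_needed(nu_required, m, q, n, kind):
    """Smallest whole number of days p giving at least nu_required df.

    kind = "repeatability": nu = (n - 1) * m * q * p
    kind = "time":          nu = m * q * (p - 1)
    kind = "oit":           nu = m * q * p - 1
    """
    groups = m * q
    if kind == "repeatability":
        p = nu_required / ((n - 1) * groups)
    elif kind == "time":
        p = nu_required / groups + 1
    elif kind == "oit":
        p = (nu_required + 1) / groups
    else:
        raise ValueError("unknown kind: " + kind)
    return math.ceil(p)


# Example 1: nu_A = nu_B = 14 from Table 5, two operators, two instruments, n = 2
days = {kind: days_needed(14, 2, 2, 2, kind)
        for kind in ("repeatability", "time", "oit")}
days_for_bias = 5  # from the detection of the acceptable bias, step (1)
print(days, max(max(days.values()), days_for_bias))
```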

3.1.5. The data

The data for methods A and B are summarized in

Tables 6 and 7, respectively.

3.1.6. Investigation of outliers

Grubbs tests were applied to the day means [8]. No

single or double stragglers or outliers were found for

both methods.

Table 6

Data obtained with method A (example 1)

Operator | Instrument 1: Day | y_i1k1 | y_i1k2 | ȳ_i1k | ȳ_i1 | Instrument 2: Day | y_i2k1 | y_i2k2 | ȳ_i2k | ȳ_i2
1 | 1 | 97.13 | 98.81 | 97.970 | 97.888 | 1 | 98.98 | 99.52 | 99.250 | 99.253
  | 2 | 101.23 | 100.68 | 100.955 | | 2 | 102.84 | 100.93 | 101.885 |
  | 3 | 97.13 | 96.63 | 96.880 | | 3 | 99.65 | 99.29 | 99.470 |
  | 4 | 97.17 | 95.82 | 96.495 | | 4 | 98.67 | 98.46 | 98.565 |
  | 5 | 96.82 | 97.46 | 97.140 | | 5 | 97.08 | 97.11 | 97.095 |
2 | 1 | 100.71 | 99.37 | 100.040 | 100.208 | 1 | 101.55 | 104.04 | 102.795 | 102.211
  | 2 | 101.26 | 103.78 | 102.520 | | 2 | 99.66 | 98.70 | 99.180 |
  | 3 | 98.49 | 100.87 | 99.680 | | 3 | 102.54 | 100.60 | 101.570 |
  | 4 | 97.06 | 98.92 | 97.990 | | 4 | 104.95 | 102.61 | 103.780 |
  | 5 | 101.85 | 99.77 | 100.810 | | 5 | 103.10 | 104.36 | 103.730 |

Grand mean = 99.890


3.1.7. Calculation of the variance estimates

Tables 8 and 9 summarize the calculation of the

variance estimates for methods A and B, respectively.


3.1.8.1. Repeatability. The repeatabilities are compared according to Eq. (11):

$$F_r = \frac{1.4810}{1.2317} = 1.20.$$

This is to be compared with $F_{0.05}(20,20) = 2.12$. Since $F_r$ […]

[…] of replicates per day for both methods is equal ($n_A = n_B$), the comparison of time-different intermediate precision is performed according to Eq. (13):

$$F_{I(T)} = \frac{\mathrm{MSD}_B}{\mathrm{MSD}_A} = \frac{9.7042}{6.3345} = 1.53.$$

This is to be compared with $F_{0.05}(16,16) = 2.33$. Since $F_{I(T)}$ […]

This is to be compared with the acceptable bias $\lambda = 2$. Since UCL $> 2$ the bias of method B is not acceptable. Notice that in this approach the probability of accepting a method that is too much biased is controlled at 5%.

3.2. Example 2: determination of moisture in cheese

(the example is fictitious)

3.2.1. Background

Method A is a Karl Fischer method, method B is a

vacuum oven method for the determination of moist-

ure in cheese. A laboratory uses method A but devel-

oped method B as an alternative. The laboratory wants

to compare the performance of both methods. The

results are expressed as percentage moisture. For

method A an estimate of the precision ($s_r^2$ and $s_D^2$) is available: $s_r^2 = 0.023$; $s_D^2 = 0.08$.

3.2.2. Requirements

The acceptable bias $\lambda$ is 0.50%. The acceptable ratio of the standard deviations between the two methods, $\rho$ or $\rho_{I(T)}$, is 3. The statistical tests are performed at the significance level $\alpha = 0.05$. The probability $\beta$ to wrongly adopt the method with an unacceptable performance is set at 0.05.

3.2.3. Experimental design

The material is a cheese, analysed with both

methods.

It is decided that the number of replicates per day

for each method is two and the number of days for both methods is equal ($p_A = p_B$). Each day during $p_A$ days, two independent samples ($n_A = 2$) from the cheese are analysed with method A by the same operator using the same instrument. Each day during $p_B$ days, two independent samples ($n_B = 2$) from the cheese are analysed with method B by the same

operator using the same instrument.

3.2.4. Determination of the minimum number of days

(1) For the detection of $\lambda$. Since $p_A = p_B$ and $n_A = n_B = 2$, Eq. (4) is used. Since in the comparison only time-different intermediate precision conditions are considered, the number of operators $m_A = m_B = 1$ and the number of instruments $q_A = q_B = 1$. Consequently, as can be derived from Table 3, MSOID and $s_{OID}^2$ are equal to MSD and $s_D^2$, respectively.

$$0.5 \ge (t_{\alpha/2} + t_\beta)\sqrt{\frac{2\left(s_{DA}^2 + s_{rA}^2/n_A\right)}{p_A}},$$

$$0.5 \ge (t_{\alpha/2} + t_\beta)\sqrt{\frac{2\left(0.08 + 0.023/2\right)}{p_A}}.$$

With $p_A = 10$ and $\nu = [2(m_A q_A p_A) - 2] = [2(1 \times 1 \times 10) - 2] = 18$, $(t_{\alpha/2} + t_\beta) = 3.835$ and the right side of the equation above equals 0.519; with $p_A = 11$ and $\nu = [2(m_A q_A p_A) - 2] = [2(1 \times 1 \times 11) - 2] = 20$, $(t_{\alpha/2} + t_\beta) = 3.811$ and the right side of the equation above equals 0.492. Hence $p_A = p_B = 11$. (The use of a constant multiplication factor equal to 4 would yield $p_A = p_B = 12$.)

(2) For the comparison of precision. From Table 4 it can be seen that $\rho = 3$ or $\rho_{I(T)} = 3$ is given by $\nu_A = \nu_B = 10$.

To compare repeatability standard deviations,

$\nu_A = m_A q_A p_A$ and $\nu_B = m_B q_B p_B$; so $p_A = p_B = 10$.

To compare between-day mean squares,

$\nu_A = m_A q_A (p_A - 1)$ and $\nu_B = m_B q_B (p_B - 1)$; so $p_A = p_B = 10 + 1 = 11$.

(3) Conclusion. The minimum number of days

required (with two measurements per day) is 11.

3.2.5. The data

The data for methods A and B are summarized in

Table 10.

3.2.6. Investigation of outliers

Grubbs tests were applied to the day means [8]. No

single or double stragglers or outliers were found for

method A. For method B the single Grubbs test

applied on the mean of day 9 is significant at the

5% level but not at the 1% level. Indeed

$$G = \frac{39.845 - 39.486}{0.1411} = 2.544,$$

which is to be compared with Grubbs critical values for $p = 11$ at 5% (2.355) and 1% (2.564). Therefore, since this observation is considered as a straggler it is retained but indicated with an "a" in Table 10.
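The single Grubbs statistic can be recomputed from the day means as a check. The sketch below is illustrative: the function name is hypothetical, the day means are formed from the replicate pairs of method B listed in Table 10, and the critical values are those quoted in the text. Because the standard deviation is not rounded to 0.1411 first, the result differs from 2.544 in the third decimal.

```python
import statistics


def grubbs_single(values):
    """Single Grubbs statistic for the value farthest from the mean.

    Returns (suspect value, G) with G = |suspect - mean| / s.
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    suspect = max(values, key=lambda v: abs(v - mean))
    return suspect, abs(suspect - mean) / s


# Method B replicate pairs for days 1 to 11 (Table 10)
pairs_B = [(39.29, 39.36), (39.51, 39.38), (39.45, 39.49), (39.59, 39.51),
           (39.41, 39.41), (39.45, 39.54), (39.55, 39.55), (39.29, 39.36),
           (39.82, 39.87), (39.44, 39.45), (39.45, 39.53)]
day_means_B = [sum(pair) / 2 for pair in pairs_B]

suspect, g = grubbs_single(day_means_B)
crit_5, crit_1 = 2.355, 2.564  # critical values for p = 11 quoted above
is_straggler = crit_5 < g <= crit_1
print(round(suspect, 3), round(g, 2), is_straggler)
```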

3.2.7. Calculation of the variance estimates

Tables 11 and 12 summarize the calculation of the

variance estimates for methods A and B, respectively.


Table 10

Data for the example 2

Day | Method A: y_11k1 | y_11k2 | ȳ_11k | Method B: y_11k1 | y_11k2 | ȳ_11k
1 | 39.68 | 39.77 | 39.725 | 39.29 | 39.36 | 39.325
2 | 39.08 | 39.38 | 39.230 | 39.51 | 39.38 | 39.445
3 | 40.39 | 40.33 | 40.360 | 39.45 | 39.49 | 39.470
4 | 39.87 | 39.98 | 39.925 | 39.59 | 39.51 | 39.550
5 | 39.70 | 39.95 | 39.825 | 39.41 | 39.41 | 39.410
6 | 39.93 | 39.95 | 39.940 | 39.45 | 39.54 | 39.495
7 | 39.78 | 39.97 | 39.875 | 39.55 | 39.55 | 39.550
8 | 39.92 | 40.20 | 40.060 | 39.29 | 39.36 | 39.325
9 | 40.34 | 39.89 | 40.115 | 39.82 | 39.87 | 39.845 (a)
10 | 40.12 | 40.26 | 40.190 | 39.44 | 39.45 | 39.445
11 | 39.43 | 39.54 | 39.485 | 39.45 | 39.53 | 39.490

Grand mean (method A) = 39.884; grand mean (method B) = 39.486

(a) Straggler.

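Table 11

Calculation of the variance estimates for method A (example 2) (ANOVA table)

Source | Mean squares | Estimate of
Day | MSD = 0.2050 | $\sigma_{rA}^2 + n_A \sigma_{DA}^2$
Residual | MSE = 0.0239 | $\sigma_{rA}^2$

The repeatability variance: $s_{rA}^2 = 0.0239$; $\nu = 11$

The between-day variance component: $s_{DA}^2 = (0.2050 - 0.0239)/2 = 0.0906$

Time-different intermediate precision (variance): $s_{I(T)A}^2 = s_{DA}^2 + s_{rA}^2 = 0.1145$

Variance of the day means $\bar{y}_{ijk}$: $s^2_{\bar{y}_{ijk}A} = s_{DA}^2 + s_{rA}^2/n_A = 0.1025$; $\nu = 10$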

Table 12

Calculation of the variance estimates for method B (example 2) (ANOVA table)

Source | Mean squares | Estimate of
Day | MSD = 0.0397 | $\sigma_{rB}^2 + n_B \sigma_{DB}^2$
Residual | MSE = 0.0024 | $\sigma_{rB}^2$

The repeatability variance: $s_{rB}^2 = 0.0024$; $\nu = 11$

The between-day variance component: $s_{DB}^2 = (0.0397 - 0.0024)/2 = 0.0187$

Time-different intermediate precision (variance): $s_{I(T)B}^2 = s_{DB}^2 + s_{rB}^2 = 0.0211$

Variance of the day means $\bar{y}_{ijk}$: $s^2_{\bar{y}_{ijk}B} = s_{DB}^2 + s_{rB}^2/n_B = 0.0199$; $\nu = 10$
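The entries of the ANOVA tables can be recomputed from the raw data of Table 10. The following sketch assumes a balanced design with one operator and one instrument (as in Example 2); the function name and the returned dictionary keys are hypothetical, and setting a negative between-day component to zero is an added convention, not something the text prescribes. The usage lines reproduce the method A figures.

```python
import statistics


def day_anova(replicates):
    """One-factor (day) ANOVA variance estimates for a balanced design.

    replicates: list of per-day lists, each holding the n results of one day.
    """
    n = len(replicates[0])
    p = len(replicates)
    day_means = [statistics.mean(day) for day in replicates]
    # Residual mean square: pooled within-day variance, p * (n - 1) df
    mse = sum(statistics.variance(day) for day in replicates) / p
    # Day mean square: n times the variance of the day means, p - 1 df
    var_day_means = statistics.variance(day_means)
    msd = n * var_day_means
    s2_D = max((msd - mse) / n, 0.0)  # assumption: truncate at zero
    return {"MSD": msd, "MSE": mse, "s2_r": mse, "s2_D": s2_D,
            "s2_IT": s2_D + mse, "var_day_means": var_day_means}


# Method A replicate pairs for days 1 to 11 (Table 10)
data_A = [[39.68, 39.77], [39.08, 39.38], [40.39, 40.33], [39.87, 39.98],
          [39.70, 39.95], [39.93, 39.95], [39.78, 39.97], [39.92, 40.20],
          [40.34, 39.89], [40.12, 40.26], [39.43, 39.54]]
est_A = day_anova(data_A)
for name, value in est_A.items():
    print(name, round(value, 4))
```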



3.2.8.1. Repeatability. The repeatabilities are compared according to Eq. (11):

$$F_r = \frac{0.0024}{0.0239} = 0.10.$$

This is to be compared with $F_{0.05}(11,11) = 2.82$. Since $F_r$ […] 3.47 there is evidence that the repeatabilities of both methods are different (in fact the repeatability for method B is better than for method A).

(ii) Since the repeatabilities of both methods are different ($\sigma_{rA}^2 \ne \sigma_{rB}^2$), the comparison of time-different intermediate precision is performed according to Eq. (14):

$$F_{I(T)} = \frac{s_{I(T)B}^2}{s_{I(T)A}^2} = \frac{0.0211}{0.1145} = 0.18.$$

The number of degrees of freedom associated with $s_{I(T)}^2$ for both methods is obtained from the Satterthwaite approximation (Eqs. (15) and (16)):

$$\nu_{I(T)B} = \frac{(0.0211)^2}{(0.0397/2)^2/10 + (0.0024/2)^2/11} \approx 11;$$

$$\nu_{I(T)A} = \frac{(0.1145)^2}{(0.2050/2)^2/10 + (0.0239/2)^2/11} \approx 12.$$

$F_{I(T)}$ is to be compared with $F_{0.05}(11,12) = 2.72$. Since $F_{I(T)}$ […] 3.72 there is evidence that the variances of the day means obtained with the two methods are different. Therefore, the standard deviation $s_d$ and its associated degrees of freedom $\nu_d$ are calculated as follows (see Eqs. (25) and (26)):

$$s_d = \sqrt{\frac{0.1025}{11} + \frac{0.0199}{11}} = 0.1055;$$

$$\nu_d = \frac{(0.0111)^2}{(0.1025/11)^2/10 + (0.0199/11)^2/10} \approx 13.$$

The test statistic $t_{cal}$ is obtained as given in Eq. (21):

$$t_{cal} = \frac{|\bar{y}_A - \bar{y}_B|}{s_d} = \frac{|39.884 - 39.486|}{0.1055} = 3.77.$$

This is to be compared with $t_{0.025;13} = 2.16$. Since $t_{cal} > 2.16$ the difference between the grand means of the two methods is statistically significant at $\alpha = 0.05$.

To further evaluate whether the difference found can be acceptable, the UCL is calculated according to Eq. (27):

$$\mathrm{UCL} = |39.884 - 39.486| + 0.1055\, t_{\beta;13} = 0.398 + 0.1055\, t_{0.05;13} = 0.398 + (0.1055 \times 1.771) = 0.585.$$

Since the UCL is larger than the acceptable bias $\lambda = 0.50$, there is evidence that the difference between the means of the two methods is unacceptable.

(If the probability $\beta$ is allowed to be 0.2, the UCL would be $(0.398 + (0.1055 \times 0.870)) = 0.490$, which is smaller than $\lambda = 0.50$, and the bias would be acceptable.)

In the interval hypothesis testing approach, the one-sided 95% upper confidence limit (UCL) around the absolute difference between the grand means is calculated as follows (Eq. (28)):

$$\mathrm{UCL} = |39.884 - 39.486| + 0.1055\, t_{\alpha;13} = |39.884 - 39.486| + 0.1055\, t_{0.05;13} = 0.398 + (0.1055 \times 1.771) = 0.585.$$

This is to be compared with the acceptable bias $\lambda = 0.5$. Since UCL $> 0.5$ the bias of method B is not


acceptable. Notice that in this example, the interval hypothesis leads to the same conclusion as the point hypothesis (with the inclusion of the $\beta$-error consideration) does. This is due to the fact that with the point hypothesis testing, a significant difference between the means of both methods is detected and that in the further evaluation whether the difference found can be considered acceptable, the probability $\beta$ considered is equal to the probability $\alpha$ in the interval hypothesis testing.

4. Conclusion

An approach for the comparison of an alternative

method to a reference method has been proposed

for the intralaboratory situation. Instead of the repro-

ducibility (as included in the ISO guidelines), the

(operator+instrument+time)-different intermediate precision is considered in the comparison. The proposal includes:

1. the experimental design (i.e. the determination of

the number of measurements required to perform

the comparison),

2. the estimation of different precision parameters

and the comparison of these precision measures

for both methods, and

3. the statistical approach for the evaluation of the

bias in which the interval hypothesis testing has

also been proposed as an alternative.

The comparison of the bias and precision described

in this article is performed at a single concentration

level. If the alternative method is intended for use over

a rather broad concentration range, the comparison

should be performed at more concentration levels (e.g.

low, middle and high). Due to the problem with

multiple comparison [5], the present approach is not

recommended if the methods are to be compared at

more than three levels. For trace analysis, an evalua-

tion of the detection and quantification limit should

also be performed.

The proposal is an optimal approach in the sense

that it is based on sample size calculations. The number of measurements to be performed is such that there is a high probability ($1 - \beta$) that an alternative method with an unacceptable performance will

not be adopted. This of course is of utmost importance

but might require a number of measurements that the

laboratory is not able (or not willing) to perform

because of time and cost involved. If this is the case,

an alternative approach based on a number of mea-

surements that in practice is feasible, is required. Two

approaches can be conceived. The first is to perform

the comparison, based on a user-defined number of

measurements, in the classical way using point

hypothesis testing and to evaluate the $\beta$-error. In this way the laboratory would at least have an idea of the

probability that an alternative method with an unac-

ceptable performance has been accepted and thus of

the risk that is run that the method will not perform as

expected during routine use of the method.

Another approach is to control the probability that a

method with unacceptable performance characteris-

tics will be adopted by using interval hypothesis

testing. The latter was already included here as an

alternative for the evaluation of the bias but can also be

considered in the comparison of precision measures.

After it was proposed for the evaluation of the bias in

method validation studies by Hartmann et al. [2],

interval hypothesis testing has been considered by

the SFSTP in a guideline for the validation of bioa-

nalytical methods [14].

Acknowledgements

This work has received financial support from the

European Commission (Standards, Measurements and

Testing Programme Contract SMT4-CT95-2031) and

the Belgian government (The Prime Minister Services

Federal Office for Scientific, Technical and Cultural

Affairs, Standardisation Programme Research Con-

tract no/03/003).

References

[1] International Standard, Accuracy (Trueness and Precision) of

Measurement methods and results, ISO 5725-6, Geneva,

1994.

[2] C. Hartmann, J. Smeyers-Verbeke, W. Penninckx, Y. Vander

Heyden, P. Vankeerberghen, D.L. Massart, Anal. Chem. 67

(1995) 4491.



1994.


[4] International Standard, Statistics (Vocabulary and symbols):

Design of experiments, ISO 3534-3, Geneva, 1985.

[5] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, 1997.

[6] D.C. Montgomery, Design and Analysis of Experiments, 4th

ed., Wiley, New York, 1997.

[7] J. Mandel, The Statistical Analysis of Experimental Data,

Dover, New York, 1964, p. 359.



1994.

[9] W. Gerisch, D. Abraham, Comput. Stat. Quarterly 4 (1989)

299.

[10] D.J. Schuirmann, J. Pharmacokinet. Biopharm. 15 (1987)

657.

[11] F.E. Satterthwaite, Biometrics Bull. 2 (1946) 110.

[12] G.T. Wernimont, Use of Statistics to Develop and Evaluate

Analytical Methods, in: W. Spendley (Ed.), AOAC, Arlington,

VA, 1985, p. 39.

[13] G.W. Snedecor, W.G. Cochran, Statistical Methods, 7th ed., The Iowa State University Press, Ames, Iowa, 1982, p. 96.

[14] E. Chapuzet, N. Mercier (Presidents), S. Bervoas-Martin, B.

Boulanger, P. Chevalier, P. Chiap, D. Grandjean, P. Hubert, P.

Lagorce, M. Lallier, M.C. Laparra, M. Laurentie, J.C. Nivet,

S.T.P. Pharma Prat. 7 (3) (1997) 169.

