  • Comparison of alternative measurement methods

    Siriporn Kuttatharmmakul, D. Luc Massart, Johanna Smeyers-Verbeke*

    ChemoAC, Pharmaceutical Institute, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090, Brussel, Belgium

    Received 19 February 1998; received in revised form 14 July 1998; accepted 14 July 1998

    Abstract

    A procedure to compare the performance (precision and bias) of an alternative measurement method and a reference method is described in detail. It is based on ISO 5725-6, adapted to the intralaboratory situation. This means that the proposed approach does not evaluate the reproducibility, but considers the (operator+instrument+time)-different intermediate precision and/or the time-different intermediate precision. A 4-factor nested design is used for the study. The different variance estimates are calculated from the experimental data by ANOVA. The Satterthwaite approximation is included to determine the number of degrees of freedom associated with the compound variances. Taking into account the acceptable bias, the acceptable ratio between the precision parameters of the two methods, the significance level and the probability of wrongly accepting an alternative method with an unacceptable performance, the formulae to determine the number of measurements required for the comparison are given. For the evaluation of the bias, interval hypothesis testing is included as an alternative to point hypothesis testing. Two examples are given as an illustration of the proposed approach. © 1999 Elsevier Science B.V. All rights reserved.

    Keywords: Comparison; Alternative measurement method; Bias; Precision; Repeatability; Time-different intermediate precision; (Operator+instrument+time)-different intermediate precision; Nested design; ANOVA; Satterthwaite approximation; Interval hypothesis testing

    1. Introduction

    When a laboratory wants to replace an existing analytical method by a new method (e.g. because the latter is cheaper or easier to use), it has to show that the new method performs at least as well as the existing one. A comparison of the performance (precision and bias) of both methods therefore has to be performed. One of the most advanced guidelines for the comparison of two methods can be found in ISO 5725-6 [1]. However, the ISO guideline is based on interlaboratory studies and is therefore not applicable in the intralaboratory situation. Indeed, within a single laboratory the reproducibility, as evaluated by ISO, cannot be determined, but intermediate precision conditions, such as changes in operator, equipment and time, should be considered since they contribute to the variability of measurements performed in the laboratory.

    In the ISO guideline the reference method is an international standard method that was studied in an interlaboratory test program and its precision ($\sigma^2$) is assumed to be known. This assumption is reasonable since the precision is obtained from a large number of measurements. In the intralaboratory situation a

    Analytica Chimica Acta 391 (1999) 203–225

    *Corresponding author. Tel.: +32-2477-4737; fax: +32-2477-4735; e-mail: [email protected]

    0003-2670/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0003-2670(99)00115-4

  • laboratory has developed a first method and later on wishes to compare a new method to the older, already internally validated method. For the latter, referred to as the reference method, only an estimate of the precision ($s^2$) will be available since the precision is determined from a rather limited number of measurements. This of course determines the statistical tests to be used in the comparison of the performance characteristics of both methods.

    Moreover, the ISO standard is meant to show that both methods have similar precision and/or trueness, whereas a laboratory that performs a method comparison study is interested in evaluating whether the new method is at least as good as the reference method. This implies that some two-sided statistical tests included in the ISO guideline are not appropriate for the comparison of two methods in a single laboratory where, for example, one-sided tests have to be considered in the evaluation of the precision.

    In the decision making concerning the new alternative method it is important (i) not to reject an alternative method which in fact is appropriate, and (ii) not to accept an alternative method which in fact is not appropriate. The former is related to the $\alpha$-error of the statistical tests used in the comparison and is controlled through the selection of the significance level. The latter is related to the $\beta$-error and, when it is considered, it is generally taken into account by including sample size calculations. This approach is also included in the ISO guideline.

    In this article we propose an adaptation of the ISO guideline to the intralaboratory comparison of two methods. It is also applicable to the situation in which two laboratories of, e.g., the same organisation are involved, each laboratory being specialized in one of the methods. For the evaluation of the bias, in addition to point hypothesis testing, interval hypothesis testing [2], in which the probability of accepting a method that is too biased is controlled, is also included.

    Due to the specified acceptance criteria for the alternative method, the proposed approach might require a large number of measurements. An alternative approach (which will be described in a subsequent article) is to perform the method comparison with a user-defined number of measurements and to evaluate the probability that a method with an unacceptable performance will be accepted.

    2. Methods

    All symbols and abbreviations used in this paper are

    defined in Table 1.

    2.1. Experimental design

    A 4-factor nested experimental design is used [3–7]. This design is also one of the designs recommended by ISO [3]. The schematic layout of the design is given in Fig. 1. The four factors represent four sources of variation that contribute to the variability of the measurements within one laboratory. The factors considered are operator, instrument, time, and random error. The experimental approach can be described as follows. For each analytical method, the sample is analysed by $m$ operators. Each operator performs, on each of $q$ instruments, $n$ replicated measurements on each of $p$ different days. To avoid an underestimation of the day effect, the set of $p$ different days during which the measurements are performed must be different for each of the $q$ instruments, i.e. two instruments cannot be operated on the same day.
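    The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `nested_design` and the rule used to keep the days of different instruments apart (shifting the day labels by `j * p`) are assumptions made here, not part of the original procedure.

```python
from itertools import product


def nested_design(m, q, p, n):
    """Return one (operator, instrument, day, replicate) tuple per measurement."""
    runs = []
    for i, j, k, rep in product(range(m), range(q), range(p), range(n)):
        # Assumption: shift the day labels of instrument j by j * p so that,
        # for a given operator, no two instruments share a day.
        day = j * p + k + 1
        runs.append((i + 1, j + 1, day, rep + 1))
    return runs


runs = nested_design(m=2, q=2, p=3, n=2)
print(len(runs))  # total number of measurements N = m * q * p * n
```

    With m = 2 operators, q = 2 instruments, p = 3 days and n = 2 replicates, the design contains 2 × 2 × 3 × 2 = 24 measurements per method.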

    Fig. 1. Schematic layout for the 4-factor nested experimental design applied. Only the nested structure under the ith operator, jth instrument and the kth day is shown here. The nested structure under other operators, instruments and days has the same pattern (instru = instrument, rep = replicate).


  • Table 1

    Definition of symbols and abbreviations applied in the document

    Symbol                       Definition
    $d$                          Absolute difference between the grand means obtained with two methods
    $D$                          Component of day effect in a test result
    $E$                          Random error component occurring in every test result
    $F_{I(OIT)}$                 Calculated F-value obtained from the comparison of (operator+instrument+time)-different intermediate precision (variance)
    $F_{I(T)}$                   Calculated F-value obtained from the comparison of time-different intermediate precision (variance)
    $F_r$                        Calculated F-value obtained from the comparison of repeatability variance
    $F_{\alpha(\nu_B,\nu_A)}$    Value of the F-distribution with $\nu_B$ degrees of freedom associated with the numerator and $\nu_A$ degrees of freedom associated with the denominator; $\alpha$ represents the portion of the F-distribution to the right of the given F-value
    $F_{\beta(\nu_A,\nu_B)}$     Value of the F-distribution with $\nu_A$ degrees of freedom associated with the numerator and $\nu_B$ degrees of freedom associated with the denominator; $\beta$ represents the portion of the F-distribution to the right of the given F-value
    $I$                          Component of instrumental effect in a test result
    $m$                          Number of operators
    $M$                          General mean (expectation) of the test results
    $MS$                         Mean squares
    $n$                          Number of replicates performed on each day
    $\bar{n}$                    Average number of replicates performed on each day
    $N$                          Total number of measurements
    $O$                          Component of operator effect in a test result
    $p$                          Number of days
    $q$                          Number of instruments
    $s$                          Estimate of $\sigma$
    $s^2$                        Estimate of $\sigma^2$
    $t_{cal}$                    Calculated t-value obtained from the comparison of the means obtained with two methods
    $t_{\alpha/2}$               Two-sided tabulated t-value at significance level $\alpha$ and $\nu$ degrees of freedom
    $t_{\beta}$                  One-sided tabulated t-value at significance level $\beta$ and $\nu$ degrees of freedom
    UCL                          Upper confidence limit
    $y$                          Test result
    $\bar{y}$                    Grand mean of test results
    $\bar{y}_i$                  Arithmetic mean of the test results obtained from the ith operator
    $\bar{y}_{ij}$               Arithmetic mean of the test results obtained from the ith operator and the jth instrument
    $\bar{y}_{ijk}$              Arithmetic mean of the test results obtained from the ith operator, the jth instrument and the kth day
    $y_{ijkL}$                   Particular test result related to the Lth replicate of the kth day, the jth instrument and the ith operator
    $z_{\alpha/2}$               Two-sided tabulated z-value of the standard normal distribution at significance level $\alpha$
    $\alpha$                     Significance level (type I error probability)
    $\beta$                      Type II error probability
    $\lambda$                    Detectable difference between the means obtained from the two methods
    $\nu$                        Number of degrees of freedom
    $\rho$                       Detectable ratio between the repeatability standard deviations of method B and method A
    $\sigma$                     True value of a standard deviation
    $\sigma^2$                   True value of a variance
    $\phi_{I(OIT)}$              Detectable ratio between the square roots of the (operator+instrument+day) mean squares (or the (operator+instrument+time)-different intermediate precision (standard deviation)) of method B and method A
    $\phi_{I(T)}$                Detectable ratio between the square roots of the between-day mean squares (or the time-different intermediate precision (standard deviation)) of method B and method A

    Symbols used as superscripts and subscripts
    A                            Method A
    B                            Method B
    d                            Difference between the grand means obtained with two methods
    D                            Between-day
    E                            Residual
    I                            Between-instrument
    I(T)                         Time-different intermediate precision
    I(OIT)                       (Operator+instrument+time)-different intermediate precision
    i                            Index for a particular operator


  • 2.2. Basic statistical model

    To understand the following statistical approach, it

    is necessary to briefly explain the basic statistical

    model. More details can be found in [3].

    Here, we assume that every test result $y$ obtained with a particular analytical method is the sum of five components

    $$y = M + O + I + D + E, \tag{1}$$

    where $M$ is the general mean (expectation) of the test results, $O$ the random effect caused by changing the operator, $I$ the random effect caused by changing the instrument, $D$ the random effect caused by the fact that measurements are performed on different days, and $E$ the random error occurring in every measurement under repeatability conditions.

    These four factors (operator, instrument, time, and random error under repeatability conditions) are selected for our approach, since they are the main sources that contribute to the variability of the measurements within a laboratory. The precision of the method is then determined by the contribution from the variance ($\sigma^2$) of each factor, i.e. $\sigma_O^2$, $\sigma_I^2$, $\sigma_D^2$ and $\sigma_E^2$, which are estimated as $s_O^2$, $s_I^2$, $s_D^2$ and $s_E^2$, respectively. Since it can be assumed that these estimated variance components are not related, the estimate of the overall precision parameter, also called the (operator+instrument+time)-different intermediate precision $s_{I(OIT)}^2$, can be obtained as the sum of all variance components: $s_{I(OIT)}^2 = s_O^2 + s_I^2 + s_D^2 + s_E^2$. It is an estimate of the variance of an individual measurement made by an arbitrary operator on an arbitrary instrument. When the analyses in the laboratory are performed by the same operator on a single instrument, the overall precision corresponds with the time-different intermediate precision, which is obtained as $s_{I(T)}^2 = s_D^2 + s_E^2$. The intermediate precision is useful for indicating the ability of the analytical method to repeat the test result under the defined conditions.
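    As an illustration of model (1), the sketch below simulates nested test results and adds up variance components into the two intermediate precision estimates. The function names, the use of normally distributed effects and the nesting of instruments within operators are assumptions made for this example only.

```python
import random


def simulate_results(M, sd_O, sd_I, sd_D, sd_E, m, q, p, n, seed=0):
    """Simulate data[i][j][k] = list of n results following y = M + O + I + D + E."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        O = rng.gauss(0.0, sd_O)          # one operator effect per operator
        operator = []
        for _ in range(q):
            I = rng.gauss(0.0, sd_I)      # one instrument effect per instrument
            instrument = []
            for _ in range(p):
                D = rng.gauss(0.0, sd_D)  # one day effect per day
                # E is drawn anew for every replicate of that day.
                day = [M + O + I + D + rng.gauss(0.0, sd_E) for _ in range(n)]
                instrument.append(day)
            operator.append(instrument)
        data.append(operator)
    return data


def intermediate_precision(s2_O, s2_I, s2_D, s2_E):
    """Return (s2_I(OIT), s2_I(T)) as sums of the variance components."""
    return s2_O + s2_I + s2_D + s2_E, s2_D + s2_E


data = simulate_results(M=100.0, sd_O=0.5, sd_I=0.5, sd_D=1.0, sd_E=1.0,
                        m=2, q=2, p=3, n=2)
s2_i_oit, s2_i_t = intermediate_precision(0.5, 0.25, 1.0, 2.0)
```

    The numerical inputs are hypothetical; they only show how the components combine.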

    2.3. Calculation of the variance estimates

    In analogy with the ISO guidelines [3], the different variance estimates are calculated by ANOVA (see Table 2). If the numbers of replicates per day ($n_{ijk}$), as well as the numbers of instruments used by each operator ($q_i$), are equal for all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, q$ and $k = 1, 2, \ldots, p_{ij}$, the calculation is simplified as shown in Table 3. The number of days ($p_{ij}$) might not be constant for different operators and instruments if the detection of outlying day means $\bar{y}_{ijk}$ leads to the rejection of some data. If, however, $p_{ij}$ is equal for all $i$ and $j$, then the terms $\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}$ and $\sum_{i=1}^{m}\sum_{j=1}^{q}(p_{ij} - 1)$ which appear in Table 3 are simply replaced by $mqp$ and $mq(p-1)$, respectively. Throughout the rest of the text the calculations as represented in Table 3 will be considered.

    No calculation is given for the individual variance component for operators $s_O^2$ and for the individual variance component for instruments $s_I^2$ in the ANOVA tables. Since the number of operators and instruments within a single laboratory is generally limited, a small number of degrees of freedom associated with the variance components $s_O^2$ and $s_I^2$ is to be expected. Consequently, poor estimates for $s_O^2$ and $s_I^2$ will be obtained. Therefore, besides the time-different intermediate precision $s_{I(T)}^2$, the (operator+instrument+time)-different intermediate precision $s_{I(OIT)}^2$ is estimated as shown in Table 3. This estimate includes the calculation of $MS_{OID}$, which is obtained from the sum of squared differences between

    Table 1 (Continued)

    Symbol                       Definition
    j                            Index for a particular instrument
    k                            Index for a particular day
    L                            Index for a particular test result performed by the ith operator, on the jth instrument and kth day
    m                            Number of operators
    $n_{ijk}$                    Number of replicates performed by the ith operator on the jth instrument and kth day
    $p_{ij}$                     Number of days on which the ith operator uses the jth instrument
    $q_i$                        Number of instruments used by the ith operator
    O                            Between-operator
    OID                          (Operator+instrument+day)
    r                            Repeatability


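  • Table 2

    Calculation of the variance components (ANOVA table)

    Source                     Mean squares                                                                                                                                                                              Estimate of
    Operator+instrument+day    $MS_{OID} = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}(\bar{y}_{ijk} - \bar{y})^2}{\left(\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij}\right) - 1}$                     $\sigma_r^2 + \bar{n}\sigma_{OID}^2$
    Day                        $MS_D = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}(\bar{y}_{ijk} - \bar{y}_{ij})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i}(p_{ij} - 1)}$                               $\sigma_r^2 + \bar{n}\sigma_D^2$
    Residual                   $MS_E = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n_{ijk}} (y_{ijkL} - \bar{y}_{ijk})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}}(n_{ijk} - 1)}$   $\sigma_r^2$

    with

    $$\bar{y}_{ijk} = \frac{\sum_{L=1}^{n_{ijk}} y_{ijkL}}{n_{ijk}}, \qquad \bar{y}_{ij} = \frac{\sum_{k=1}^{p_{ij}} n_{ijk}\bar{y}_{ijk}}{\sum_{k=1}^{p_{ij}} n_{ijk}}, \qquad \bar{y} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}\bar{y}_{ijk}}{N},$$

    $$\bar{n} = \frac{N - \left(\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk}^2\right)/N}{\left(\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij}\right) - 1}, \qquad N = \sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} n_{ijk} \ \text{(total number of measurements)}.$$

    Calculation of the variance estimates

    The repeatability variance: $s_r^2 = MS_E$; $\nu = \sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}}(n_{ijk} - 1)$
    The between-day variance component: $s_D^2 = (MS_D - MS_E)/\bar{n}$; if $s_D^2 < 0$, set $s_D^2 = 0$
    The (operator+instrument+day) variance component: $s_{OID}^2 = (MS_{OID} - MS_E)/\bar{n}$; if $s_{OID}^2 < 0$, set $s_{OID}^2 = 0$
    Time-different intermediate precision (variance): $s_{I(T)}^2 = s_D^2 + s_r^2 = \dfrac{MS_D + (\bar{n} - 1)MS_E}{\bar{n}}$
    (Operator+instrument+time)-different intermediate precision (variance): $s_{I(OIT)}^2 = s_{OID}^2 + s_r^2 = \dfrac{MS_{OID} + (\bar{n} - 1)MS_E}{\bar{n}}$
    Variance of the day means $\bar{y}_{ijk}$: $s_{\bar{y}_{ijk}}^2 = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q_i}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y})^2}{\left(\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij}\right) - 1} = \dfrac{MS_{OID}}{\bar{n}} = s_{OID}^2 + s_r^2/\bar{n}$; $\nu = \left(\sum_{i=1}^{m}\sum_{j=1}^{q_i} p_{ij}\right) - 1$

    $n_{ijk}$ is the number of replicates on the kth day performed on the jth instrument by the ith operator ($L = 1, 2, \ldots, n_{ijk}$); $p_{ij}$ the number of days on which the ith operator uses the jth instrument ($k = 1, 2, \ldots, p_{ij}$); $q_i$ the number of instruments used by the ith operator ($j = 1, 2, \ldots, q_i$); $m$ is the number of operators ($i = 1, 2, \ldots, m$).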


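  • Table 3

    Calculation of the variance components in case of equal $n_{ijk}$ and equal $q_i$ for all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, q$ and $k = 1, 2, \ldots, p_{ij}$. Only $p_{ij}$ may be unequal for different operators and instruments, due to possible rejection of some discordant data (ANOVA table)

    Source                     Mean squares                                                                                                                                                          Estimate of
    Operator+instrument+day    $MS_{OID} = \dfrac{n\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y})^2}{\left(\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}\right) - 1}$            $\sigma_r^2 + n\sigma_{OID}^2$
    Day                        $MS_D = \dfrac{n\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y}_{ij})^2}{\sum_{i=1}^{m}\sum_{j=1}^{q}(p_{ij} - 1)}$                      $\sigma_r^2 + n\sigma_D^2$
    Residual                   $MS_E = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n} (y_{ijkL} - \bar{y}_{ijk})^2}{(n - 1)\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}}$            $\sigma_r^2$

    with

    $$\bar{y}_{ijk} = \frac{\sum_{L=1}^{n} y_{ijkL}}{n}, \qquad \bar{y}_{ij} = \frac{\sum_{k=1}^{p_{ij}} \bar{y}_{ijk}}{p_{ij}}, \qquad \bar{y} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}}\sum_{L=1}^{n} y_{ijkL}}{n\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}}.$$

    Calculation of the variance estimates

    The repeatability variance: $s_r^2 = MS_E$; $\nu = (n - 1)\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}$
    The between-day variance component: $s_D^2 = (MS_D - MS_E)/n$; if $s_D^2 < 0$, set $s_D^2 = 0$
    The (operator+instrument+day) variance component: $s_{OID}^2 = (MS_{OID} - MS_E)/n$; if $s_{OID}^2 < 0$, set $s_{OID}^2 = 0$
    Time-different intermediate precision (variance): $s_{I(T)}^2 = s_D^2 + s_r^2 = \dfrac{MS_D + (n - 1)MS_E}{n}$
    (Operator+instrument+time)-different intermediate precision (variance): $s_{I(OIT)}^2 = s_{OID}^2 + s_r^2 = \dfrac{MS_{OID} + (n - 1)MS_E}{n}$
    Variance of the day means $\bar{y}_{ijk}$: $s_{\bar{y}_{ijk}}^2 = \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{q}\sum_{k=1}^{p_{ij}} (\bar{y}_{ijk} - \bar{y})^2}{\left(\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}\right) - 1} = \dfrac{MS_{OID}}{n} = s_{OID}^2 + s_r^2/n$; $\nu = \left(\sum_{i=1}^{m}\sum_{j=1}^{q} p_{ij}\right) - 1$

    $n$ is the number of replicates ($L = 1, 2, \ldots, n$); $p_{ij}$ the number of days on which the ith operator uses the jth instrument ($k = 1, 2, \ldots, p_{ij}$); $q$ the number of instruments ($j = 1, 2, \ldots, q$); $m$ is the number of operators ($i = 1, 2, \ldots, m$).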


  • the day means $\bar{y}_{ijk}$ and the grand mean $\bar{y}$. This might result in an underestimation of the effects of the instrument and the operator, since those parameters are not changed for every $\bar{y}_{ijk}$ obtained. However, this is the best possible approach to estimate the intermediate precision $s_{I(OIT)}^2$ with small numbers of operators and instruments and, although it might not adequately reflect the true precision, it is useful for comparison studies as long as the numbers of operators and instruments for the methods being compared are equal.

    Considering the formulae to calculate the between-day variance component $s_D^2$ and the (operator+instrument+day) variance component $s_{OID}^2$ in Table 3, negative values for those parameters can be obtained. For example, if due to random effects $MS_D$ is smaller than $MS_E$, we will get a negative value for $s_D^2$. In that case, the negative estimates of variance are given the value 0. This is the usual practice, which is also considered by ISO [8] if a negative value for the between-laboratory variance $s_L^2$ is obtained. Another approach to deal with negative variance estimates is reported in [9]. It applies the method of pooling minimal mean squares with predecessors.
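    The Table 3 calculations, including the replacement of negative variance estimates by 0, can be sketched as follows. The function name `anova_estimates`, the nested-list input format and the dictionary keys are choices made for this illustration; the sketch assumes the same number of replicates (at least two) on every day and at least one instrument measured on two or more days.

```python
def anova_estimates(data):
    """Variance estimates for data[i][j][k] = list of n replicates.

    i indexes operators, j instruments and k days. The number of days may
    differ between (operator, instrument) pairs; n must be the same everywhere.
    """
    # Flatten to (operator, instrument, replicates of one day) records.
    days = [(i, j, reps)
            for i, operator in enumerate(data)
            for j, instrument in enumerate(operator)
            for reps in instrument]
    n = len(days[0][2])
    if n < 2 or any(len(reps) != n for _, _, reps in days):
        raise ValueError("need the same number (>= 2) of replicates on every day")

    day_means = [(i, j, sum(reps) / n) for i, j, reps in days]
    total_days = len(day_means)                       # sum of p_ij
    grand_mean = sum(mean for _, _, mean in day_means) / total_days

    # Group the day means per (operator, instrument) pair to obtain y_ij.
    groups = {}
    for i, j, mean in day_means:
        groups.setdefault((i, j), []).append(mean)
    df_day = total_days - len(groups)                 # sum of (p_ij - 1)
    if df_day < 1:
        raise ValueError("need at least two days for some instrument")

    ss_oid = sum((mean - grand_mean) ** 2 for _, _, mean in day_means)
    ss_day = sum((mean - sum(g) / len(g)) ** 2 for g in groups.values() for mean in g)
    ss_res = sum((y - sum(reps) / n) ** 2 for _, _, reps in days for y in reps)

    ms_oid = n * ss_oid / (total_days - 1)
    ms_day = n * ss_day / df_day
    ms_res = ss_res / ((n - 1) * total_days)

    s2_r = ms_res
    s2_d = max(0.0, (ms_day - ms_res) / n)            # negative estimate set to 0
    s2_oid = max(0.0, (ms_oid - ms_res) / n)          # negative estimate set to 0
    return {"s2_r": s2_r, "s2_D": s2_d, "s2_OID": s2_oid,
            "s2_I(T)": s2_d + s2_r, "s2_I(OIT)": s2_oid + s2_r}


# Hypothetical data: 1 operator, 1 instrument, 2 days, 2 replicates per day.
estimates = anova_estimates([[[[1.0, 3.0], [5.0, 7.0]]]])
```

    For this small hypothetical data set the day means are 2 and 6, so $MS_{OID} = MS_D = 16$ and $MS_E = 2$, which gives $s_r^2 = 2$, $s_D^2 = s_{OID}^2 = 7$ and both intermediate precision variances equal to 9.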

    2.4. Number of measurements

    As mentioned earlier, the probability of accepting an alternative method which in fact is not appropriate ($\beta$-error), because it is not precise enough or too biased in comparison with the reference method, can be controlled by determining the number of measurements required to detect a certain bias as well as a certain difference in precision (if it exists). This implies that an acceptable difference between the means of the two methods as well as an acceptable ratio between the precision parameters of the two methods have to be specified. The former is called by ISO the detectable difference between the biases of the two methods, $\lambda$, and is defined as the minimum difference between the means of the two methods that the experimenter wishes to detect with high probability. The latter is called by ISO the detectable ratio between the precision parameters of the two methods. It is defined as the minimum ratio of precision parameters that the experimenter wishes to detect with high probability from the results obtained with the two methods. In analogy with what is given in ISO, the detectable ratios to be considered in the intralaboratory situation are:

    $\rho = \sigma_{rB}/\sigma_{rA}$ for the comparison of repeatabilities;

    $\phi_{I(T)} = \sqrt{MS_{DB}/MS_{DA}}$ for the comparison of time-different intermediate precisions;

    $\phi_{I(OIT)} = \sqrt{MS_{OIDB}/MS_{OIDA}}$ for the comparison of (operator+instrument+time)-different intermediate precisions.

    Due to the complexity in the determination of the degrees of freedom associated with $\sigma_{I(T)}$ and $\sigma_{I(OIT)}$ (see further), the detectable ratios $\phi_{I(T)}$ and $\phi_{I(OIT)}$ are given in terms of the mean squares.

    It is recommended to use a significance level of $\alpha = 0.05$ in the comparison of the precision parameters and the means ($\alpha$ represents the probability that the alternative method B is rejected when in fact its performance is not worse than that of the reference method A). ISO recommends that the risk of failing to detect the chosen minimum ratio of standard deviations or the minimum difference between the means is set at $\beta = 0.05$. For the intralaboratory situation this might be too stringent and therefore $\beta = 0.05$ as well as $\beta = 0.2$ will be considered. The latter is inspired by the requirement in bioequivalence studies [10], where it is demanded that the statistical tests have 80% power (power = $100(1 - \beta)$).

    2.4.1. Determination of the minimum number of measurements required for the detection of $\lambda$

    In the ISO document [1], the precision ($\sigma^2$) is assumed to be known and the repeatability variance as well as the between-laboratory variance is included in the calculation for the optimal number of measurements. In what follows, this is adapted to the situation in which only an estimate of the precision ($s^2$) is available. This requires the use of t-values instead of z-values (applied in ISO). Moreover, the repeatability variance as well as the (operator+instrument+day) variance component is considered.

    Eq. (2) is used for the determination of the minimum number of measurements required for the detection of $\lambda$. In this equation the subscripts A and B refer to method A and method B, respectively; $t_{\alpha/2}$ is the two-sided tabulated t-value at significance level $\alpha$ and degrees of freedom $\nu = m_A q_A p_A + m_B q_B p_B - 2$; $t_{\beta}$ is the one-sided tabulated t-value at significance level $\beta$ and degrees of freedom $\nu = m_A q_A p_A + m_B q_B p_B - 2$.

    This expression is based on the t-test for the comparison of two means and therefore assumes that the precisions of both methods are equal. This assumption should be acceptable for an estimation of the optimal number of measurements. If the precision of the alternative method B is unknown, which might often be the case, it is substituted by the precision of the reference method A, which gives Eq. (3). Here $\lambda$ is the acceptable difference between the means, which one wants to detect with $(1 - \beta)100\%$ confidence from a two-tailed t-test performed at the significance level $\alpha$. The t-distribution of the non-zero mean difference is a non-central t-distribution. Therefore, instead of $(t_{\alpha/2} + t_{\beta})$, the non-centrality parameter of the non-central t-distribution should be used. An evaluation of the effect of approximating this by means of the central t-distribution indicated that very similar results are obtained. Therefore the central t-distribution is used.

    As indicated earlier, it is strongly recommended to have the same numbers of operators ($m_A = m_B$) and instruments ($q_A = q_B$) for both methods. If, moreover, the number of days as well as the number of replicates are taken the same for both methods, i.e. $p_A = p_B$ and $n_A = n_B$, Eq. (3) simplifies to

    $$\lambda \geq (t_{\alpha/2} + t_{\beta})\sqrt{\frac{2(s_{OIDA}^2 + s_{rA}^2/n_A)}{m_A q_A p_A}}. \tag{4}$$

    Generally, the number of operators ($m_A = m_B$) and instruments ($q_A = q_B$) will be fixed by practical constraints. It is recommended that the number of replicates per day is equal to 2 ($n = 2$) and to focus on the number of days required, since this will lead to a balanced design in which the number of degrees of freedom associated with the repeatability is almost the same as the number of degrees of freedom associated with the between-day component. Therefore, the minimum number required is mostly determined only for the number of days $p_A$ ($= p_B$), which can be obtained by finding the smallest value for $p_A$ that satisfies Eq. (4).

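    The equations above are only approximations, which could be further simplified by replacing $(t_{\alpha/2} + t_{\beta})$ by a constant value. Indeed, for $\alpha = 0.05$ and $\beta = 0.05$, $(t_{\alpha/2} + t_{\beta})$ varies between 3.6 ($\nu = \infty$) and 3.9 ($\nu = 14$, i.e. $m = q = p = 2$) and therefore a constant value equal to 4 could be used. For $\alpha = 0.05$ and $\beta = 0.2$, $(t_{\alpha/2} + t_{\beta})$ varies between 2.8 ($\nu = \infty$) and 3.0 ($\nu = 14$, i.e. $m = q = p = 2$), thus a constant value of 3.0 could be applied. Eq. (4) then becomes

    $$\lambda \geq 4\sqrt{\frac{2(s_{OIDA}^2 + s_{rA}^2/n_A)}{m_A q_A p_A}} \quad \text{when } \alpha = 0.05 \text{ and } \beta = 0.05, \tag{5}$$

    $$\lambda \geq 3\sqrt{\frac{2(s_{OIDA}^2 + s_{rA}^2/n_A)}{m_A q_A p_A}} \quad \text{when } \alpha = 0.05 \text{ and } \beta = 0.2. \tag{6}$$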
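    Solving the simplified expressions (5) and (6) for the number of days gives $p_A \geq 2c^2(s_{OIDA}^2 + s_{rA}^2/n_A)/(m_A q_A \lambda^2)$, with $c = 4$ or $c = 3$. A minimal sketch of this arithmetic is given below; the function name `min_days` and the numerical inputs are hypothetical, and the result relies on the same approximation (a constant in place of the t-values) as Eqs. (5) and (6).

```python
import math


def min_days(lam, s2_oid, s2_r, m, q, n=2, constant=4.0):
    """Smallest number of days p satisfying Eq. (5) (constant=4) or Eq. (6) (constant=3).

    lam      -- acceptable (detectable) difference between the means
    s2_oid   -- (operator+instrument+day) variance component of method A
    s2_r     -- repeatability variance of method A
    m, q, n  -- numbers of operators, instruments and replicates per day
    """
    s2_day_mean = s2_oid + s2_r / n  # variance of a day mean
    return math.ceil(2.0 * constant ** 2 * s2_day_mean / (m * q * lam ** 2))


# Hypothetical inputs: lambda = 1.0, s2_OID = 0.25, s2_r = 0.5, two operators, one instrument.
p_beta_005 = min_days(1.0, 0.25, 0.5, m=2, q=1, constant=4.0)
p_beta_020 = min_days(1.0, 0.25, 0.5, m=2, q=1, constant=3.0)
print(p_beta_005, p_beta_020)
```

    With these inputs the bound is 8 days for $\beta = 0.05$ and 4.5, rounded up to 5 days, for $\beta = 0.2$, which shows how relaxing the power requirement reduces the experimental effort.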

    Eqs. (2) and (3) of Section 2.4.1 read:

    $$\lambda \geq (t_{\alpha/2} + t_{\beta})\sqrt{\frac{(m_A q_A p_A - 1)(s_{OIDA}^2 + s_{rA}^2/n_A) + (m_B q_B p_B - 1)(s_{OIDB}^2 + s_{rB}^2/n_B)}{m_A q_A p_A + m_B q_B p_B - 2}\left(\frac{1}{m_A q_A p_A} + \frac{1}{m_B q_B p_B}\right)}, \tag{2}$$

    $$\lambda \geq (t_{\alpha/2} + t_{\beta})\sqrt{\frac{(m_A q_A p_A - 1)(s_{OIDA}^2 + s_{rA}^2/n_A) + (m_B q_B p_B - 1)(s_{OIDA}^2 + s_{rA}^2/n_B)}{m_A q_A p_A + m_B q_B p_B - 2}\left(\frac{1}{m_A q_A p_A} + \frac{1}{m_B q_B p_B}\right)}. \tag{3}$$

    2.4.2. Determination of the minimum number of measurements required for the detection of the minimum ratio of precision parameters

    In the ISO document [1], values of the minimum detectable ratio of the precision parameters corresponding to the chosen degrees of freedom ($\nu_A$, $\nu_B$) are given for the significance level $\alpha = 0.05$ and the power $(1 - \beta) = 0.95$. Since ISO applies a two-sided F-test to check whether the two methods have different precision, these values are obtained based on a two-sided F-test. In our approach, the objective is to demonstrate that the precision of the alternative method B is at least as good as that of the reference method A. Therefore, a one-sided F-test is applied to compare the precision of both methods. Consequently, the minimum ratio of precision parameters $\rho$ or $\phi$ corresponding to the given values of ($\nu_A$, $\nu_B$, $\alpha$, $\beta$) can be computed as

    $$\rho,\ \phi_{I(T)} \ \text{or} \ \phi_{I(OIT)} = \sqrt{F_{\alpha(\nu_B,\nu_A)}\, F_{\beta(\nu_A,\nu_B)}}, \tag{7}$$

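    where

    $$\nu_A = m_A q_A p_A (n_A - 1) \ \text{and} \ \nu_B = m_B q_B p_B (n_B - 1) \ \text{in case that } \rho \text{ is considered;} \tag{8}$$

    $$\nu_A = m_A q_A (p_A - 1) \ \text{and} \ \nu_B = m_B q_B (p_B - 1) \ \text{in case that } \phi_{I(T)} \text{ is considered;} \tag{9}$$

    $$\nu_A = m_A q_A p_A - 1 \ \text{and} \ \nu_B = m_B q_B p_B - 1 \ \text{in case that } \phi_{I(OIT)} \text{ is considered.} \tag{10}$$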

    Tables 4 and 5 give the minimum ratios of precision parameters ($\rho$, $\phi_{I(T)}$ or $\phi_{I(OIT)}$) as a function of the degrees of freedom $\nu_A$ and $\nu_B$ for ($\alpha = 0.05$, $\beta = 0.05$) and ($\alpha = 0.05$, $\beta = 0.2$), respectively. If the method precision is known, a number of degrees of freedom $\nu$ equal to 200 can be used.

    With $m_A = m_B$, $q_A = q_B$ and $n_A = n_B = 2$, the minimum numbers of days required for the detection of the minimum ratio $\rho$, $\phi_{I(T)}$ or $\phi_{I(OIT)}$ can be obtained by first finding the smallest values for the degrees of freedom ($\nu_A$ and $\nu_B$) that satisfy Eq. (7); the associated minimum number of days can then be calculated from Eq. (8), Eq. (9) or Eq. (10), depending on which precision parameters are considered. When the values of $\alpha$ and $\beta$ considered correspond to those given in Table 4 or Table 5, the minimum values for the degrees of freedom are directly obtained by looking for the tabulated $\rho$, $\phi_{I(T)}$ or $\phi_{I(OIT)}$ that is closest to (preferably smaller than) the given detectable ratio $\rho$, $\phi_{I(T)}$ or $\phi_{I(OIT)}$ and finding its associated numbers of degrees of freedom ($\nu_A$, $\nu_B$).

    The minimum number of measurements required is computed for the minimum difference $\lambda$, as well as for the minimum ratios $\rho$, $\phi_{I(T)}$ and $\phi_{I(OIT)}$, and the largest value is chosen to perform the method comparison.
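    The bookkeeping of Eqs. (7) to (10) can be sketched as follows. The helper names are hypothetical, and the tabulated F-values are supplied by the user (for example from a statistical table) rather than computed, so that the sketch needs only the standard library.

```python
import math


def degrees_of_freedom(m, q, p, n):
    """Degrees of freedom of Eqs. (8)-(10) for one method."""
    return {"r": m * q * p * (n - 1),   # Eq. (8), repeatability
            "I(T)": m * q * (p - 1),    # Eq. (9), time-different
            "I(OIT)": m * q * p - 1}    # Eq. (10), (operator+instrument+time)-different


def detectable_ratio(f_alpha, f_beta):
    """Eq. (7): f_alpha = F_alpha(nu_B, nu_A), f_beta = F_beta(nu_A, nu_B)."""
    return math.sqrt(f_alpha * f_beta)


def days_for_dof(nu, m, q, parameter, n=2):
    """Smallest number of days p whose degrees of freedom reach nu."""
    if parameter == "r":
        return -(-nu // (m * q * (n - 1)))  # integer ceiling of nu / (mq(n-1))
    if parameter == "I(T)":
        return -(-nu // (m * q)) + 1
    if parameter == "I(OIT)":
        return -(-(nu + 1) // (m * q))
    raise ValueError("parameter must be 'r', 'I(T)' or 'I(OIT)'")


dof = degrees_of_freedom(m=2, q=2, p=3, n=2)
# Tabulated F(0.05; 5, 5) = 5.05, used for both factors when alpha = beta = 0.05.
ratio = detectable_ratio(5.05, 5.05)
```

    With $\nu_A = \nu_B = 5$ and $\alpha = \beta = 0.05$ the ratio equals 5.05, which agrees with the corresponding entry of Table 4.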

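    Table 4

    Values of $\rho(\nu_A, \nu_B, \alpha, \beta)$ or $\phi_{I(T)}(\nu_A, \nu_B, \alpha, \beta)$ or $\phi_{I(OIT)}(\nu_A, \nu_B, \alpha, \beta)$ for ($\alpha = 0.05$, $\beta = 0.05$)

    $\nu_B$ (rows) \ $\nu_A$ (columns)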

    5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 50 200

    5 5.05 4.66 4.40 4.22 4.08 3.97 3.88 3.81 3.75 3.70 3.66 3.62 3.59 3.56 3.54 3.52 3.43 3.27 3.15

    6 4.66 4.28 4.03 3.85 3.72 3.61 3.53 3.46 3.40 3.36 3.31 3.28 3.25 3.22 3.20 3.17 3.09 2.93 2.81

    7 4.40 4.03 3.79 3.61 3.48 3.38 3.29 3.23 3.17 3.12 3.08 3.05 3.02 2.99 2.96 2.94 2.86 2.70 2.59

    8 4.22 3.85 3.61 3.44 3.31 3.21 3.13 3.06 3.00 2.96 2.92 2.88 2.85 2.82 2.80 2.78 2.70 2.54 2.42

    9 4.08 3.72 3.48 3.31 3.18 3.08 3.00 2.93 2.88 2.83 2.79 2.75 2.72 2.70 2.67 2.65 2.57 2.41 2.29

    10 3.97 3.61 3.38 3.21 3.08 2.98 2.90 2.83 2.78 2.73 2.69 2.66 2.62 2.60 2.57 2.55 2.47 2.31 2.19

    11 3.88 3.53 3.29 3.13 3.00 2.90 2.82 2.75 2.70 2.65 2.61 2.58 2.55 2.52 2.49 2.47 2.39 2.23 2.11

    12 3.81 3.46 3.23 3.06 2.93 2.83 2.75 2.69 2.63 2.59 2.55 2.51 2.48 2.45 2.43 2.41 2.33 2.16 2.05

    13 3.75 3.40 3.17 3.00 2.88 2.78 2.70 2.63 2.58 2.53 2.49 2.46 2.42 2.40 2.37 2.35 2.27 2.11 1.99

    14 3.70 3.36 3.12 2.96 2.83 2.73 2.65 2.59 2.53 2.48 2.44 2.41 2.38 2.35 2.33 2.30 2.22 2.06 1.94

    15 3.66 3.31 3.08 2.92 2.79 2.69 2.61 2.55 2.49 2.44 2.40 2.37 2.34 2.31 2.29 2.26 2.18 2.02 1.90

    16 3.62 3.28 3.05 2.88 2.75 2.66 2.58 2.51 2.46 2.41 2.37 2.33 2.30 2.28 2.25 2.23 2.15 1.98 1.86

    17 3.59 3.25 3.02 2.85 2.72 2.62 2.55 2.48 2.42 2.38 2.34 2.30 2.27 2.24 2.22 2.20 2.12 1.95 1.83

    18 3.56 3.22 2.99 2.82 2.70 2.60 2.52 2.45 2.40 2.35 2.31 2.28 2.24 2.22 2.19 2.17 2.09 1.92 1.80

    19 3.54 3.20 2.96 2.80 2.67 2.57 2.49 2.43 2.37 2.33 2.29 2.25 2.22 2.19 2.17 2.15 2.06 1.90 1.77

    20 3.52 3.17 2.94 2.78 2.65 2.55 2.47 2.41 2.35 2.30 2.26 2.23 2.20 2.17 2.15 2.12 2.04 1.87 1.74

    25 3.43 3.09 2.86 2.70 2.57 2.47 2.39 2.33 2.27 2.22 2.18 2.15 2.12 2.09 2.06 2.04 1.96 1.78 1.65

    50 3.27 2.93 2.70 2.54 2.41 2.31 2.23 2.16 2.11 2.06 2.02 1.98 1.95 1.92 1.90 1.87 1.78 1.60 1.45

    200 3.15 2.81 2.59 2.42 2.29 2.19 2.11 2.05 1.99 1.94 1.90 1.86 1.83 1.80 1.77 1.74 1.65 1.45 1.26


  • 2.5. Evaluation of test results

    For each test sample, the following parameters are to be computed:

    $s_{rA}^2$, $s_{rB}^2$: estimates of the repeatability variance for methods A and B, respectively

    $s_{DA}^2$, $s_{DB}^2$: estimates of the between-day variance component for methods A and B, respectively

    $s_{OIDA}^2$, $s_{OIDB}^2$: estimates of the (operator+instrument+day) variance component for methods A and B, respectively

    $s_{I(T)A}^2$, $s_{I(T)B}^2$: estimates of the time-different intermediate precision (variance) for methods A and B, respectively

    $s_{I(OIT)A}^2$, $s_{I(OIT)B}^2$: estimates of the (operator+instrument+time)-different intermediate precision (variance) for methods A and B, respectively

    $s_{\bar{y}_{ijk}A}^2$, $s_{\bar{y}_{ijk}B}^2$: estimates of the variance of the day means $\bar{y}_{ijk}$ for methods A and B, respectively

    $\bar{y}_A$, $\bar{y}_B$: grand means obtained from methods A and B, respectively.

    The calculation of all these parameters is given in Tables 2 and 3.

    2.5.1. Comparison of precision

    As mentioned before, it is important to show that the

    precision of method B is at least as good as that of

    method A. Therefore a one-sided F-test is applied here

    instead of the two-sided test used in ISO [1]. The null

    hypothesis $H_0$ is that the precision of the alternative method B is better than or equal to the precision of the reference method A ($H_0: \sigma^2_B \le \sigma^2_A$) and the alternative hypothesis $H_1$ is that the precision of the alternative method B is worse than the precision of the reference method A ($H_1: \sigma^2_B > \sigma^2_A$).

    2.5.1.1. Comparison of repeatability. To compare the

    repeatability of two methods, the sample statistic Fr is

    calculated as follows:

    $$F_r = \frac{s^2_{rB}}{s^2_{rA}} \qquad (11)$$

    Table 5

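    Values of $\rho(\nu_A, \nu_B, \alpha, \beta)$ or $\rho_{I(T)}(\nu_A, \nu_B, \alpha, \beta)$ or $\rho_{I(OIT)}(\nu_A, \nu_B, \alpha, \beta)$ for $(\alpha, \beta) = (0.05, 0.2)$

    $\nu_B$ \ $\nu_A$  3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 50 200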

    3 5.22 4.41 4.00 3.76 3.60 3.48 3.39 3.32 3.27 3.23 3.19 3.16 3.13 3.11 3.09 3.07 3.05 3.04 2.99 2.89 2.81

    4 4.76 3.98 3.59 3.35 3.19 3.08 2.99 2.92 2.87 2.83 2.79 2.76 2.74 2.71 2.69 2.68 2.66 2.65 2.60 2.49 2.42

    5 4.51 3.74 3.35 3.12 2.96 2.85 2.77 2.70 2.65 2.60 2.57 2.54 2.51 2.49 2.47 2.45 2.44 2.42 2.37 2.27 2.20

    6 4.35 3.59 3.21 2.97 2.82 2.70 2.62 2.55 2.50 2.46 2.42 2.39 2.37 2.34 2.32 2.31 2.29 2.28 2.22 2.12 2.05

    7 4.24 3.49 3.10 2.87 2.71 2.60 2.52 2.45 2.40 2.36 2.32 2.29 2.26 2.24 2.22 2.20 2.19 2.17 2.12 2.02 1.94

    8 4.15 3.41 3.03 2.79 2.64 2.53 2.44 2.38 2.32 2.28 2.24 2.21 2.19 2.16 2.14 2.13 2.11 2.10 2.04 1.94 1.86

    9 4.09 3.35 2.97 2.74 2.58 2.47 2.38 2.32 2.26 2.22 2.18 2.15 2.13 2.10 2.08 2.07 2.05 2.04 1.98 1.88 1.80

    10 4.04 3.30 2.92 2.69 2.53 2.42 2.34 2.27 2.22 2.17 2.14 2.11 2.08 2.06 2.04 2.02 2.00 1.99 1.93 1.83 1.75

    11 4.00 3.26 2.88 2.65 2.50 2.38 2.30 2.23 2.18 2.14 2.10 2.07 2.04 2.02 2.00 1.98 1.96 1.95 1.89 1.78 1.70

    12 3.97 3.23 2.85 2.62 2.47 2.35 2.27 2.20 2.15 2.10 2.07 2.03 2.01 1.98 1.96 1.95 1.93 1.91 1.86 1.75 1.67

    13 3.94 3.21 2.83 2.60 2.44 2.33 2.24 2.17 2.12 2.08 2.04 2.01 1.98 1.96 1.94 1.92 1.90 1.89 1.83 1.72 1.64

    14 3.92 3.18 2.80 2.57 2.42 2.30 2.22 2.15 2.10 2.05 2.02 1.98 1.96 1.93 1.91 1.89 1.88 1.86 1.81 1.69 1.61

    15 3.90 3.17 2.79 2.55 2.40 2.28 2.20 2.13 2.08 2.03 1.99 1.96 1.94 1.91 1.89 1.87 1.86 1.84 1.78 1.67 1.59

    16 3.88 3.15 2.77 2.54 2.38 2.27 2.18 2.11 2.06 2.01 1.98 1.94 1.92 1.89 1.87 1.85 1.84 1.82 1.76 1.65 1.56

    17 3.87 3.13 2.75 2.52 2.37 2.25 2.17 2.10 2.04 2.00 1.96 1.93 1.90 1.88 1.86 1.84 1.82 1.80 1.75 1.63 1.55

    18 3.86 3.12 2.74 2.51 2.35 2.24 2.15 2.08 2.03 1.98 1.95 1.91 1.89 1.86 1.84 1.82 1.81 1.79 1.73 1.62 1.53

    19 3.84 3.11 2.73 2.50 2.34 2.23 2.14 2.07 2.02 1.97 1.93 1.90 1.87 1.85 1.83 1.81 1.79 1.78 1.72 1.60 1.51

    20 3.83 3.10 2.72 2.49 2.33 2.22 2.13 2.06 2.01 1.96 1.92 1.89 1.86 1.84 1.82 1.80 1.78 1.76 1.71 1.59 1.50

    25 3.79 3.06 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.92 1.88 1.85 1.82 1.79 1.77 1.75 1.73 1.72 1.66 1.54 1.44

    50 3.71 2.98 2.60 2.37 2.21 2.09 2.00 1.93 1.88 1.83 1.79 1.76 1.73 1.70 1.68 1.66 1.64 1.62 1.56 1.43 1.32

    200 3.65 2.92 2.54 2.31 2.15 2.03 1.94 1.87 1.81 1.76 1.72 1.69 1.66 1.63 1.60 1.58 1.56 1.55 1.48 1.33 1.19


  • $F_r$ is then compared with $F_\alpha(\nu_{rB}, \nu_{rA})$, where $F_\alpha(\nu_{rB}, \nu_{rA})$ is the value of the F-distribution with degrees of freedom of numerator $\nu_{rB} = (n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}$ and denominator $\nu_{rA} = (n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}$; $\alpha$ represents the portion of the F-distribution to the right of the given value; $\alpha = 0.05$.

    If $F_r > F_\alpha(\nu_{rB}, \nu_{rA})$, method B has worse repeatability than method A at the $(1-\alpha)100\%$ (i.e. 95%) confidence level and therefore the repeatability of the alternative method B is not acceptable.

    If $F_r \le F_\alpha(\nu_{rB}, \nu_{rA})$, the repeatability of the alternative method B is acceptable, which means that it is at most a factor $\rho$ worse than that of method A.
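    The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `repeatability_acceptable` is hypothetical, and the critical value of the F-distribution is assumed to be looked up in a table and passed in by the caller. The figures are those quoted for Example 1 (variances 1.4810 and 1.2317, tabulated F value 2.12).

```python
def repeatability_acceptable(s2_b: float, s2_a: float, f_crit: float) -> bool:
    """One-sided F comparison of Eq. (11).

    s2_b, s2_a: repeatability variances of methods B and A.
    f_crit: tabulated one-sided critical value F_alpha(nu_rB, nu_rA),
            supplied by the caller (assumption: taken from an F table).
    Returns True when F_r does not exceed the critical value.
    """
    f_r = s2_b / s2_a
    return f_r <= f_crit


# Figures quoted for Example 1: F_r = 1.4810 / 1.2317, F_0.05(20, 20) = 2.12
f_r = 1.4810 / 1.2317
print(f"F_r = {f_r:.2f}, acceptable: {repeatability_acceptable(1.4810, 1.2317, 2.12)}")
```

    The same function applies unchanged to the mean-square ratios of Eqs. (13) and (17), since only the ratio and the tabulated value differ.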

    2.5.1.2. Comparison of time-different intermediate

    precision. For the comparison of the time-different

    intermediate precision, we need the number of degrees

    of freedom associated with the precision estimates.

    Since these estimates are not directly estimated from

    the data but are calculated as a linear combination of

    two mean squares, MSD and MSE (see Table 3), it is

    not evident how to determine the number of degrees of

    freedom associated with this compound variance. The

    Satterthwaite approximation [11] (see further) can

    then be used.

    However, to avoid the complexity in the determination of the degrees of freedom associated with $s^2_{I(T)}$, the comparison of time-different intermediate precisions can be performed in an indirect way by comparing the day mean squares $MS_D$ [12]. Indeed, since (see Table 3)

    $$MS_D = n s^2_D + s^2_r \quad \text{and} \quad s^2_{I(T)} = s^2_D + s^2_r,$$

    it follows that

    $$MS_D = n s^2_{I(T)} - n s^2_r + s^2_r = n s^2_{I(T)} - (n - 1) s^2_r.$$

    Therefore, provided that the repeatabilities of both methods are equal ($\sigma^2_{rA} = \sigma^2_{rB}$) and the number of replicates per day for both methods is equal ($n_A = n_B$), the day mean squares $MS_D$ are considered instead of $s^2_{I(T)}$. If $n_A = n_B$, the equality of the repeatabilities for both methods is first to be tested ($H_0: \sigma^2_{rA} = \sigma^2_{rB}$; $H_1: \sigma^2_{rA} \ne \sigma^2_{rB}$) by means of a two-sided F-test. The results obtained from the comparison of the repeatabilities in Section 2.5.1.1 cannot be used here, since a one-sided F-test has been considered to test the hypotheses $H_0: \sigma^2_{rB} \le \sigma^2_{rA}$; $H_1: \sigma^2_{rB} > \sigma^2_{rA}$. A non-significant test, which means that the repeatability

    of method B is acceptable, does not necessarily imply

    that the repeatabilities of both methods are equal; the

    repeatability of method B can be better (smaller) than

    the repeatability of method A. Therefore, the repeat-

    abilities of both methods have to be compared again

    by applying a two-sided F-test. Fr is obtained here as

    follows:

    $$F_r = \frac{s_1^2}{s_2^2} \qquad (12)$$

    with $s_1^2$ the larger of $s^2_{rA}$ and $s^2_{rB}$.

    $F_r$ is then compared with $F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, where $F_{\alpha/2}(\nu_{r1}, \nu_{r2})$ is the value of the F-distribution with degrees of freedom of numerator $\nu_{r1}$ and denominator $\nu_{r2}$; $\alpha/2$ represents the portion of the F-distribution to the right of the given value; $\alpha = 0.05$. Here

    $\nu_{r1} = \nu_{rA}$ and $\nu_{r2} = \nu_{rB}$ if $s^2_{rA} > s^2_{rB}$;

    $\nu_{r1} = \nu_{rB}$ and $\nu_{r2} = \nu_{rA}$ if $s^2_{rB} > s^2_{rA}$;

    $$\nu_{rB} = (n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} \quad \text{and} \quad \nu_{rA} = (n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}.$$

    If $F_r \le F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, there is no evidence that the two methods have different repeatabilities and therefore the equality of the repeatabilities for both methods can be assumed.

    If $F_r > F_{\alpha/2}(\nu_{r1}, \nu_{r2})$, the repeatabilities of the two methods are significantly different at the significance level $\alpha$ (i.e. 5%). In that case the following simplified approach to compare the time-different intermediate precisions cannot be applied. The Satterthwaite approximation to estimate the number of degrees of freedom associated with $s^2_{I(T)}$ has then to be used (see further).

    In the situation where the number of replicates per day for both methods is equal ($n_A = n_B$) and the equality of the repeatabilities for both methods can be assumed ($\sigma^2_{rA} = \sigma^2_{rB}$), the comparison of time-different intermediate precision is performed by calculating $F_{I(T)}$ as follows:

    $$F_{I(T)} = \frac{MS_{DB}}{MS_{DA}} = \frac{n_B s^2_{DB} + s^2_{rB}}{n_A s^2_{DA} + s^2_{rA}} = \frac{n_B s^2_{I(T)B} - (n_B - 1) s^2_{rB}}{n_A s^2_{I(T)A} - (n_A - 1) s^2_{rA}}. \qquad (13)$$


  • $F_{I(T)}$ is compared with $F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$, where $F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$ is the value of the F-distribution with degrees of freedom of the numerator $\nu_{I(T)B} = \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} (p_{ijB} - 1)$ and the denominator $\nu_{I(T)A} = \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} (p_{ijA} - 1)$; $\alpha$ represents the portion of the F-distribution to the right of the given value; $\alpha = 0.05$.

    If $F_{I(T)} > F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$, method B has worse time-different intermediate precision than method A at the $(1-\alpha)100\%$ (i.e. 95%) confidence level and therefore the time-different intermediate precision of the alternative method B is not acceptable.

    If $F_{I(T)} \le F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$, the time-different intermediate precision of the alternative method B is acceptable, which means that it is at most a factor $\rho_{I(T)}$ worse than that of method A.

    In the situation where the number of replicates per day for both methods is not equal ($n_A \ne n_B$) or the equality of the repeatabilities for both methods cannot be assumed ($\sigma^2_{rA} \ne \sigma^2_{rB}$), the comparison of the time-different intermediate precisions cannot be performed by Eq. (13) but must be investigated through the comparison of $s^2_{I(T)}$. Then $F_{I(T)}$ is calculated as follows:

    $$F_{I(T)} = \frac{s^2_{I(T)B}}{s^2_{I(T)A}}. \qquad (14)$$

    As mentioned earlier, the number of degrees of freedom associated with $s^2_{I(T)}$ for both methods is obtained from the Satterthwaite approximation, Eqs. (15) and (16). When a non-integer value is obtained for $\nu_{I(T)}$, it is rounded down to the nearest integer.

    The comparison of $F_{I(T)}$ with $F_\alpha(\nu_{I(T)B}, \nu_{I(T)A})$, where $\nu_{I(T)B}$ and $\nu_{I(T)A}$ are computed from Eqs. (15) and (16), respectively, is then performed in the same way as mentioned earlier when Eq. (13) is applied.

    2.5.1.3. Comparison of (operator+instrument+time)-different intermediate precision. In analogy with the comparison of the time-different intermediate precision, the comparison of (operator+instrument+time)-different intermediate precision can be performed in an indirect way by comparing the (operator+instrument+day) mean squares $MS_{OID}$. If the number of replicates per day for both methods is equal ($n_A = n_B$) and the comparison of the repeatabilities in Eq. (12) does not give evidence against the equality of the repeatabilities of both methods (i.e. $\sigma^2_{rA} = \sigma^2_{rB}$), $F_{I(OIT)}$ is calculated as follows:

    $$F_{I(OIT)} = \frac{MS_{OIDB}}{MS_{OIDA}} = \frac{n_B s^2_{OIDB} + s^2_{rB}}{n_A s^2_{OIDA} + s^2_{rA}} = \frac{n_B s^2_{I(OIT)B} - (n_B - 1) s^2_{rB}}{n_A s^2_{I(OIT)A} - (n_A - 1) s^2_{rA}}. \qquad (17)$$

    The comparison of $F_{I(OIT)}$ with $F_\alpha(\nu_{I(OIT)B}, \nu_{I(OIT)A})$, where $\nu_{I(OIT)B} = \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1$ and $\nu_{I(OIT)A} = \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1$, is then performed in analogy with the comparison of the time-different intermediate precision mentioned earlier when Eq. (13) is considered.

    In the situation where the number of replicates per day for both methods is not equal ($n_A \ne n_B$) or the equality of the repeatabilities of both methods cannot be assumed ($\sigma^2_{rA} \ne \sigma^2_{rB}$), the comparison of (operator+instrument+time)-different intermediate precision must be investigated through the comparison of $s^2_{I(OIT)}$. Then $F_{I(OIT)}$ is calculated as follows:

    $$F_{I(OIT)} = \frac{s^2_{I(OIT)B}}{s^2_{I(OIT)A}}. \qquad (18)$$

    Again, the further steps to compare $F_{I(OIT)}$ with $F_\alpha(\nu_{I(OIT)B}, \nu_{I(OIT)A})$, where $\nu_{I(OIT)B}$ and $\nu_{I(OIT)A}$ are computed from Eqs. (19) and (20), respectively, are in analogy with the comparison of the time-different intermediate precision mentioned earlier when Eq. (13) is considered.

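    $$\nu_{I(T)B} = \frac{\left(s^2_{I(T)B}\right)^2}{\dfrac{(MS_{DB}/n_B)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B}(p_{ijB} - 1)} + \dfrac{\left((n_B - 1) MS_{EB}/n_B\right)^2}{(n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}; \qquad (15)$$

    $$\nu_{I(T)A} = \frac{\left(s^2_{I(T)A}\right)^2}{\dfrac{(MS_{DA}/n_A)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A}(p_{ijA} - 1)} + \dfrac{\left((n_A - 1) MS_{EA}/n_A\right)^2}{(n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}}}. \qquad (16)$$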


  • 2.5.1.4. Comment concerning the ISO 5725-6

    procedure. In ISO [1], the comparison of the overall

    precision (a term which is not really explained) is

    performed in analogy with Eq. (17) without pre-

    evaluating the equality of the repeatabilities of both

    methods. The fact that the same number of replicates per laboratory for the two methods ($n_A = n_B$) is required is not taken into consideration either. If the

    overall precision refers to the reproducibility, the

    indirect comparison of the latter for both methods

    by a comparison of the variance of the laboratory

    means is only possible if the repeatability and the

    number of replicates per laboratory (n) is the same for

    both methods. If this is not the case, a direct

    comparison of the reproducibility obtained with the

    two methods should be performed, possibly in analogy

    with Eq. (18).

    2.5.2. Evaluation of the bias

    2.5.2.1. Comments concerning the ISO 5725-6

    procedure. The evaluation of the bias is performed

    by comparing the grand means obtained with both

    methods. In ISO [1] the comparison is based on the z-

    test, since the sample statistic is compared with 2 (an approximation of the two-sided tabulated z-value at $\alpha = 0.05$, $z = 1.96$). This implies that the estimated standard deviations used in the comparison are obtained from large samples and therefore that they are sufficiently good estimates of the true standard deviations. If the sample statistic is larger than 2, the difference between the means obtained with the two methods is statistically significant. In that case, to avoid the rejection of a method with an acceptable bias, it is further examined whether the estimated bias $d$ can be considered acceptable. ISO concludes that the bias is significant, but acceptable, if its absolute value is not larger than $\lambda/2$.

    However, this approach is questionable for the following reasons. ISO specifies $\lambda$ to be four times $\sigma_d$, where $\sigma_d$ represents the standard deviation of the difference between the means of methods A and B. This is obtained from $\lambda = (z_{\alpha/2} + z_\beta)\sigma_d$, which with $\alpha = \beta = 0.05$ becomes $(1.96 + 1.645)\sigma_d \approx 4\sigma_d$. In the evaluation of the bias $\sigma_d$ is estimated from the experiments as $s_d$. If the estimated bias $d = |\bar{\bar{y}}_A - \bar{\bar{y}}_B| = \lambda/2$, the 95% confidence interval (CI) around $d$ can be calculated as

    $$\lambda/2 \pm 1.96 s_d \quad \text{or} \quad 2\sigma_d \pm 2 s_d.$$

    If $s_d$ is exactly equal to $\sigma_d$, the lower limit of the confidence interval is equal to 0 and the upper limit is equal to $\lambda$ (see Fig. 2(a)). Since 0 is just included in the CI, the bias is not significantly different from zero. (Evaluation of the bias by means of the z-test, as done in ISO, would of course lead to the same conclusion since the test statistic, $d/s_d = 2\sigma_d/s_d = 2$). Moreover, the probability that the true absolute bias, as estimated by $d = \lambda/2$, exceeds $\lambda$ is only 2.5%. ISO considers a significant bias to be acceptable if $d$ is smaller than $\lambda/2$. However, if $s_d$ equals $\sigma_d$ there is no point in comparing $d$ with $\lambda/2$ since with $d$ larger than $\lambda/2$ there is no chance that the significant difference can be acceptable.

    If $s_d$ is different from $\sigma_d$, as is to be expected, the comparison can lead to wrong conclusions. Indeed if $s_d$ is smaller than $\sigma_d$, which will be the case, e.g., if the acceptance criteria for the precision measure define the number of measurements to be performed, considering the bias to be unacceptable if $d$ is larger than $\lambda/2$ can lead to the rejection of a method with an acceptable bias (see Fig. 2(b)). On the other hand if $s_d$ is larger than $\sigma_d$, an unacceptable bias can lead to a non-significant test and therefore to acceptance of the method (see Fig. 2(c)). Therefore, it is more appropriate to compare the one-sided upper 95% confidence limit around $d$ with $\lambda$ to conclude on the acceptability of the method (see further).

    $$\nu_{I(OIT)B} = \frac{\left(s^2_{I(OIT)B}\right)^2}{\dfrac{(MS_{OIDB}/n_B)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1} + \dfrac{\left((n_B - 1) MS_{EB}/n_B\right)^2}{(n_B - 1)\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}; \qquad (19)$$

    $$\nu_{I(OIT)A} = \frac{\left(s^2_{I(OIT)A}\right)^2}{\dfrac{(MS_{OIDA}/n_A)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1} + \dfrac{\left((n_A - 1) MS_{EA}/n_A\right)^2}{(n_A - 1)\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}}}. \qquad (20)$$

    Fig. 2. Different situations of a bias evaluation. $d$: the estimated absolute difference between the grand means obtained with the two methods; $\lambda$: the acceptable bias; () 95% confidence interval. (a) $s_d = \sigma_d$ and $d = \lambda/2$; (b) $s_d < \sigma_d$ and $d > \lambda/2$; (c) $s_d > \sigma_d$ and $d <$ […]

    2.5.2.2. Adapted approach. In our approach which is

    intended for the intralaboratory situation, the standard

    deviations are generally estimated from a relatively

    small sample size, and therefore the t-test is more

    appropriate than the z-test. A two-sided test ($H_0: \mu_A = \mu_B$; $H_1: \mu_A \ne \mu_B$) is considered since the difference between the two means can be positive as well as negative. Therefore

    $$t_{cal} = \frac{|\bar{\bar{y}}_A - \bar{\bar{y}}_B|}{s_d}, \qquad (21)$$

    where $s_d$ represents the estimated standard deviation of the difference between the means obtained with the two methods.

    The use of the t-test requires that the variances of the day means obtained with the two methods are equal (i.e. $\sigma^2_{\bar{y}_{ijk}A} = \sigma^2_{\bar{y}_{ijk}B}$). This equality must first be tested by applying a two-sided F-test. The degrees of freedom associated with $s^2_{\bar{y}_{ijk}A}$ and $s^2_{\bar{y}_{ijk}B}$ are $\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1$ and $\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1$, respectively. If there is no evidence against the equality of $\sigma^2_{\bar{y}_{ijk}A}$ and $\sigma^2_{\bar{y}_{ijk}B}$, $s_d$ is calculated by applying the pooled variance $s^2_p$ as follows:

    $$s_d = \sqrt{s^2_p \left( \frac{1}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}} + \frac{1}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}} \right)}, \qquad (22)$$

    where

    $$s^2_p = \frac{\left(\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1\right) s^2_{\bar{y}_{ijk}A} + \left(\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1\right) s^2_{\bar{y}_{ijk}B}}{\nu_d} \qquad (23)$$

    with

    $$\nu_d = \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} + \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 2. \qquad (24)$$

    When the equality of the variances of the day means, $\sigma^2_{\bar{y}_{ijk}A}$ and $\sigma^2_{\bar{y}_{ijk}B}$, cannot be assumed, the variances cannot be pooled and $s_d$ is calculated as follows [13]:

    $$s_d = \sqrt{\frac{s^2_{\bar{y}_{ijk}A}}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}} + \frac{s^2_{\bar{y}_{ijk}B}}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}}}. \qquad (25)$$

    The number of degrees of freedom $\nu_d$ associated with $s_d$ is then calculated by applying the Satterthwaite approximation:

    $$\nu_d = \frac{\left(s^2_d\right)^2}{\dfrac{\left(s^2_{\bar{y}_{ijk}A} / \sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA}\right)^2}{\sum_{i=1}^{m_A}\sum_{j=1}^{q_A} p_{ijA} - 1} + \dfrac{\left(s^2_{\bar{y}_{ijk}B} / \sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB}\right)^2}{\sum_{i=1}^{m_B}\sum_{j=1}^{q_B} p_{ijB} - 1}}. \qquad (26)$$

    The $t_{cal}$ is compared with $t_{\alpha/2;\nu_d}$, where $t_{\alpha/2;\nu_d}$ is the two-sided tabulated t-value at the significance level $\alpha = 0.05$ and the degrees of freedom $\nu_d$ as indicated in Eq. (24) or Eq. (26).

    If $t_{cal} > t_{\alpha/2;\nu_d}$, the difference between the means obtained with the two methods is statistically significant. Though the difference is significant it might not be relevant to the application. Therefore it could be further evaluated whether the difference found can be considered acceptable. The one-sided $(1-\alpha)100\%$ (i.e. 95% for $\alpha = 0.05$) upper confidence limit (UCL) around the absolute difference $d$ is compared with the acceptance limit $\lambda$. The UCL is obtained as follows:

    $$UCL = |\bar{\bar{y}}_A - \bar{\bar{y}}_B| + s_d \, t_{\alpha;\nu_d}, \qquad (27)$$

    where $s_d$ is as shown in Eq. (22) or Eq. (25) and $t_{\alpha;\nu_d}$ is the one-sided tabulated t-value at the significance level $\alpha = 0.05$ and the degrees of freedom $\nu_d$ (as shown in Eq. (24) or Eq. (26)).

    If $UCL \le \lambda$, the bias, although statistically significant, is acceptable since there is a smaller than (or at most) $\alpha$ probability that the true absolute difference as estimated by $|\bar{\bar{y}}_A - \bar{\bar{y}}_B|$ is larger than $\lambda$.

    If $UCL > \lambda$, the bias is not acceptable since there is a larger than $\alpha$ probability that the true absolute difference as estimated by $|\bar{\bar{y}}_A - \bar{\bar{y}}_B|$ is larger than $\lambda$.

    If $t_{cal} \le t_{\alpha/2;\nu_d}$, the difference between the means obtained with the two methods is statistically insignificant. However, if the precision estimates ($s^2_r$ and $s^2_{OID}$) of the two methods obtained experimentally are larger than those used for the calculation of the minimum number of measurements required for the detection of $\lambda$ in Eqs. (2)–(6), an unacceptable bias can lead to a non-significant test.

    Therefore, to limit the risk of adopting a method with an unacceptable bias, the interval hypothesis testing as proposed by Hartmann et al. [2] should be more appropriate for the evaluation of the bias than the approach mentioned above (Eqs. (21)–(27)). The procedure is as follows.

    Calculate the one-sided $(1-\alpha)100\%$ (i.e. 95% for $\alpha = 0.05$) upper confidence limit (UCL) around the absolute difference $d$:

    $$UCL = |\bar{\bar{y}}_A - \bar{\bar{y}}_B| + s_d \, t_{\alpha;\nu_d}, \qquad (28)$$

    where $\alpha$ is 0.05, $s_d$ is the same as in Eq. (22) or Eq. (25) and $\nu_d$ is the same as in Eq. (24) or Eq. (26).

    Since in interval hypothesis testing the null and alternative hypotheses are reversed, the roles of $\alpha$ and $\beta$ are also reversed. Therefore, here $\alpha$ corresponds to the probability that a method that is biased to an unacceptably large extent will be accepted.

    If the UCL is not larger than the acceptable bias $\lambda$, the difference between the grand means of method A and method B is considered acceptable at the $(1-\alpha)100\%$ confidence level and the bias of method B is acceptable. If the UCL is larger than the acceptable bias $\lambda$, the bias of method B is not acceptable. With this approach the probability of accepting a method that is too much biased is controlled at 5%.
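    As a rough illustration of the interval hypothesis decision, the sketch below computes the UCL and compares it with the acceptable bias. The function names are hypothetical, and the one-sided t-value is assumed to be read from a t-table by the user. The numbers are the ones reported for Example 2 (grand means 39.884 and 39.486, standard deviation of the difference 0.1055, tabulated t-values 1.771 and 0.870 for 13 degrees of freedom, acceptable bias 0.50).

```python
def upper_confidence_limit(mean_a: float, mean_b: float,
                           s_d: float, t_one_sided: float) -> float:
    """One-sided upper confidence limit around |mean_a - mean_b|.

    t_one_sided is the tabulated one-sided t-value for the chosen
    probability and degrees of freedom (assumption: supplied by the caller).
    """
    return abs(mean_a - mean_b) + s_d * t_one_sided


def bias_acceptable(ucl: float, acceptable_bias: float) -> bool:
    """The bias is accepted only when the UCL does not exceed the limit."""
    return ucl <= acceptable_bias


# Example 2 figures, tabulated t-value 1.771 (probability 0.05, 13 d.f.)
ucl_05 = upper_confidence_limit(39.884, 39.486, 0.1055, 1.771)
# Same data with the tabulated t-value 0.870 (probability 0.2, 13 d.f.)
ucl_20 = upper_confidence_limit(39.884, 39.486, 0.1055, 0.870)

print(f"UCL = {ucl_05:.3f}, acceptable: {bias_acceptable(ucl_05, 0.50)}")
print(f"UCL = {ucl_20:.3f}, acceptable: {bias_acceptable(ucl_20, 0.50)}")
```

    Keeping the t-value as an explicit argument makes the effect of the chosen probability visible: the same data pass or fail depending only on the tabulated value used.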

    The evaluation of the bias described above is based

    on nested designs performed separately for methods A

    and B. If it is possible to design a simultaneous

    experiment (e.g. same days and same operators) a

    paired comparison [6], for which a smaller sd is to be

    expected, could be preferable.

    3. Examples

    Two examples will illustrate the approach discussed. In the first example measurements are performed under (operator+instrument+time)-different intermediate precision conditions while in the second

    example only time-different intermediate precision

    conditions are considered.

    3.1. Example 1: quantification of diazepam in

    diazepam tablets (the example is fictitious)

    3.1.1. Background

    Method A is a HPLC method, method B is a UV

    (second derivative) method for the quantification of

    diazepam in diazepam tablets. A laboratory uses

    method A but developed method B as an alternative.

    The laboratory wants to compare the performance of

    both methods. The results are expressed as percentage

    of the labelled amount (%). For method A an estimate

    of the precision ($s_r$ and $s_{OID}$) is available: $s_r = 1\%$, $s_{OID} = 2\%$.

    3.1.2. Requirements

    The acceptable bias $\lambda$ is 2%. The acceptable ratio of the standard deviations between the two methods, $\rho$, $\rho_{I(T)}$ or $\rho_{I(OIT)}$, is 2. The statistical tests are performed at the significance level $\alpha = 0.05$. The probability $\beta$ to wrongly adopt the method with an unacceptable performance is set at 0.2.

    3.1.3. Experimental design

    It is decided that the number of operators, instruments and replicates per day for each method is 2 and the number of days for both methods is equal ($p_A = p_B$).


  • From one batch of diazepam tablets, 300 tablets are

    randomly taken. They are powdered and kept in a cool,

    dry place, e.g. a desiccator. Each day during $p_A$ days, two replicates ($n_A = 2$) prepared from the powdered samples are analysed with method A on the first

    instrument by the first operator. The analysis is

    repeated independently in the same way during

    another pA days but on the second instrument. The

    second operator performs the procedures in the same

    way as the first operator does. The experiments for

    method B are designed in the same way as those for

    method A.

    3.1.4. Determination of the minimum number of days

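    (1) For the detection of $\lambda$. Since $p_A = p_B$ and $n_A = n_B = 2$, Eq. (4) is used:

    $$\lambda \ge (t_{\alpha/2} + t_\beta) \sqrt{\frac{2\left(s^2_{OIDA} + s^2_{rA}/n_A\right)}{m_A q_A p_A}},$$

    $$2 \ge (t_{\alpha/2} + t_\beta) \sqrt{\frac{2\left(2^2 + 1^2/2\right)}{4 p_A}}.$$

    With $p_A = 4$ and $\nu = [2(m_A q_A p_A) - 2] = [2(2 \times 2 \times 4) - 2] = 30$, $(t_{\alpha/2} + t_\beta) = 2.896$ and the right side of the equation above equals 2.172; with $p_A = 5$ and $\nu = [2(m_A q_A p_A) - 2] = [2(2 \times 2 \times 5) - 2] = 38$, $(t_{\alpha/2} + t_\beta) = 2.876$ and the right side of the equation above equals 1.929. Hence $p_A = p_B = 5$. (The use of a constant multiplication factor equal to 3 would yield $p_A = p_B = 6$.)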
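    The arithmetic of this step can be checked with a short Python sketch. It assumes a balanced design (the same number of days for every operator and instrument) and takes the sum of the two tabulated t-values as an input, since those depend on the degrees of freedom and must be looked up for each trial value of the number of days. The function name `detectable_bias` is hypothetical.

```python
import math


def detectable_bias(t_sum: float, s2_between: float, s2_r: float,
                    n: int, m: int, q: int, p: int) -> float:
    """Right-hand side of the sample-size condition used above.

    t_sum: (t_{alpha/2} + t_beta) for the current degrees of freedom,
           read from a t-table by the caller (assumption).
    s2_between: between-run variance component (s^2_OID or s^2_D).
    s2_r: repeatability variance; n: replicates per day.
    m, q, p: operators, instruments and days (balanced design assumed).
    """
    return t_sum * math.sqrt(2.0 * (s2_between + s2_r / n) / (m * q * p))


# Example 1: s_OID = 2 %, s_r = 1 %, n = 2, two operators, two instruments
for days, t_sum in [(4, 2.896), (5, 2.876)]:
    rhs = detectable_bias(t_sum, 2.0 ** 2, 1.0 ** 2, n=2, m=2, q=2, p=days)
    print(f"p = {days}: right side = {rhs:.3f}, sufficient: {rhs <= 2.0}")
```

    The number of days is sufficient as soon as the computed value no longer exceeds the acceptable bias, which here first happens at five days.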

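    (2) For the comparison of precision measures. From Table 5 it can be seen that $\rho$ (or $\rho_{I(T)}$ or $\rho_{I(OIT)}$) $= 2$ is given by $\nu_A = \nu_B = 14$.

    To compare repeatability,

    $\nu_A = m_A q_A p_A$ and $\nu_B = m_B q_B p_B$; so $p_A = p_B = 14/4 = 3.5 \approx 4$.

    To compare time-different intermediate precision,

    $\nu_A = m_A q_A (p_A - 1)$ and $\nu_B = m_B q_B (p_B - 1)$; so $p_A = p_B = (14/4) + 1 = 4.5 \approx 5$.

    To compare (operator+instrument+day)-different intermediate precision,

    $\nu_A = m_A q_A p_A - 1$ and $\nu_B = m_B q_B p_B - 1$; so $p_A = p_B = 15/4 = 3.75 \approx 4$.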
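    The three conversions from required degrees of freedom to a number of days can be written as small helper functions. This is a sketch for a balanced design only; the helper names are hypothetical, and the required degrees of freedom are assumed to have been read from the table beforehand. Fractional results are rounded up, as in the text.

```python
import math


def days_for_repeatability(nu: int, m: int, q: int, n: int) -> int:
    """Days needed when nu = (n - 1) * m * q * p (balanced design)."""
    return math.ceil(nu / ((n - 1) * m * q))


def days_for_time_different(nu: int, m: int, q: int) -> int:
    """Days needed when nu = m * q * (p - 1)."""
    return math.ceil(nu / (m * q) + 1)


def days_for_oit_different(nu: int, m: int, q: int) -> int:
    """Days needed when nu = m * q * p - 1."""
    return math.ceil((nu + 1) / (m * q))


# Example 1: 14 degrees of freedom required, 2 operators, 2 instruments, n = 2
needed = [
    days_for_repeatability(14, m=2, q=2, n=2),
    days_for_time_different(14, m=2, q=2),
    days_for_oit_different(14, m=2, q=2),
]
print(needed, "-> minimum number of days:", max(needed))
```

    Taking the maximum over the three criteria gives the number of days that satisfies all precision comparisons at once.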

    (3) Conclusion. The minimum number of days

    required for both methods (with two operators, two

    instruments and two measurements per day) is 5.

    3.1.5. The data

    The data for methods A and B are summarized in

    Tables 6 and 7, respectively.

    3.1.6. Investigation of outliers

    Grubbs tests were applied to the day means [8]. No

    single or double stragglers or outliers were found for

    both methods.

    Table 6

    Data obtained with method A (example 1)

    Operator  Instrument 1                                             Instrument 2
              Day  y_{i1k1}  y_{i1k2}  \bar{y}_{i1k}   \bar{y}_{i1}    Day  y_{i2k1}  y_{i2k2}  \bar{y}_{i2k}   \bar{y}_{i2}
    1         1    97.13     98.81     97.970          97.888          1    98.98     99.52     99.250          99.253
              2    101.23    100.68    100.955                         2    102.84    100.93    101.885
              3    97.13     96.63     96.880                          3    99.65     99.29     99.470
              4    97.17     95.82     96.495                          4    98.67     98.46     98.565
              5    96.82     97.46     97.140                          5    97.08     97.11     97.095
    2         1    100.71    99.37     100.040         100.208         1    101.55    104.04    102.795         102.211
              2    101.26    103.78    102.520                         2    99.66     98.70     99.180
              3    98.49     100.87    99.680                          3    102.54    100.60    101.570
              4    97.06     98.92     97.990                          4    104.95    102.61    103.780
              5    101.85    99.77     100.810                         5    103.10    104.36    103.730

    Grand mean = 99.890


  • 3.1.7. Calculation of the variance estimates

    Tables 8 and 9 summarize the calculation of the

    variance estimates for methods A and B, respectively.

    3.1.8. Comparison of precision

    3.1.8.1. Repeatability. The repeatabilities are compared according to Eq. (11):

    $$F_r = \frac{1.4810}{1.2317} = 1.20.$$

    This is to be compared with $F_{0.05}(20, 20) = 2.12$. Since $F_r$ […]

  • of replicates per day for both methods is equal ($n_A = n_B$), the comparison of time-different intermediate precision is performed according to Eq. (13):

    $$F_{I(T)} = \frac{MS_{DB}}{MS_{DA}} = \frac{9.7042}{6.3345} = 1.53.$$

    This is to be compared with $F_{0.05}(16, 16) = 2.33$. Since $F_{I(T)}$ […]

  • This is to be compared with the acceptable bias $\lambda = 2$. Since UCL > 2 the bias of method B is not acceptable. Notice that in this approach the probability

    of accepting a method that is too much biased is

    controlled at 5%.

    3.2. Example 2: determination of moisture in cheese

    (the example is fictitious)

    3.2.1. Background

    Method A is a Karl Fischer method, method B is a

    vacuum oven method for the determination of moist-

    ure in cheese. A laboratory uses method A but devel-

    oped method B as an alternative. The laboratory wants

    to compare the performance of both methods. The

    results are expressed as percentage moisture. For

    method A an estimate of the precision ($s^2_r$ and $s^2_D$) is available: $s^2_r = 0.023$; $s^2_D = 0.08$.

    3.2.2. Requirements

    The acceptable bias $\lambda$ is 0.50%. The acceptable ratio of the standard deviations between the two methods, $\rho$ or $\rho_{I(T)}$, is 3. The statistical tests are performed at the significance level $\alpha = 0.05$. The probability $\beta$ to wrongly adopt the method with an unacceptable performance is set at 0.05.

    3.2.3. Experimental design

    The material is a cheese, analysed with both

    methods.

    It is decided that the number of replicates per day for each method is two and the number of days for both methods is equal ($p_A = p_B$). Each day during $p_A$ days, two independent samples ($n_A = 2$) from the cheese are analysed with method A by the same operator using the same instrument. Each day during $p_B$ days, two independent samples ($n_B = 2$) from the cheese are analysed with method B by the same operator using the same instrument.

    3.2.4. Determination of the minimum number of days

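    (1) For the detection of $\lambda$. Since $p_A = p_B$ and $n_A = n_B = 2$, Eq. (4) is used. Since in the comparison only time-different intermediate precision conditions are considered, the number of operators $m_A = m_B = 1$ and the number of instruments $q_A = q_B = 1$. Consequently, as can be derived from Table 3, $MS_{OID}$ and $s^2_{OID}$ are equal to $MS_D$ and $s^2_D$, respectively.

    $$0.5 \ge (t_{\alpha/2} + t_\beta) \sqrt{\frac{2\left(s^2_{DA} + s^2_{rA}/n_A\right)}{p_A}},$$

    $$0.5 \ge (t_{\alpha/2} + t_\beta) \sqrt{\frac{2\left(0.08 + 0.023/2\right)}{p_A}}.$$

    With $p_A = 10$ and $\nu = [2(m_A q_A p_A) - 2] = [2(1 \times 1 \times 10) - 2] = 18$, $(t_{\alpha/2} + t_\beta) = 3.835$ and the right side of the equation above equals 0.519; with $p_A = 11$ and $\nu = [2(m_A q_A p_A) - 2] = [2(1 \times 1 \times 11) - 2] = 20$, $(t_{\alpha/2} + t_\beta) = 3.811$ and the right side of the equation above equals 0.492. Hence $p_A = p_B = 11$. (The use of a constant multiplication factor equal to 4 would yield $p_A = p_B = 12$.)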
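    The parenthetical remark about a constant multiplication factor can be reproduced with a simple search over the number of days. The sketch below replaces the sum of tabulated t-values by a fixed constant `k`, so no table lookup is needed; the function name and the `max_days` guard are illustrative choices, and a balanced design is assumed.

```python
import math


def min_days_constant_factor(k: float, acceptable_bias: float,
                             s2_between: float, s2_r: float, n: int,
                             m: int = 1, q: int = 1,
                             max_days: int = 1000) -> int:
    """Smallest number of days p for which
    k * sqrt(2 * (s2_between + s2_r / n) / (m * q * p)) <= acceptable_bias.

    k stands in for (t_{alpha/2} + t_beta); using a constant is the
    simplification mentioned in the text, not the full procedure.
    """
    for p in range(1, max_days + 1):
        rhs = k * math.sqrt(2.0 * (s2_between + s2_r / n) / (m * q * p))
        if rhs <= acceptable_bias:
            return p
    raise ValueError("no suitable number of days found up to max_days")


# Example 2: factor 4, acceptable bias 0.50, s2_D = 0.08, s2_r = 0.023, n = 2
print(min_days_constant_factor(4.0, 0.50, 0.08, 0.023, n=2))
# Example 1: factor 3, acceptable bias 2, s_OID = 2, s_r = 1, n = 2, m = q = 2
print(min_days_constant_factor(3.0, 2.0, 4.0, 1.0, n=2, m=2, q=2))
```

    In both examples the constant factor asks for one more day than the calculation based on the tabulated t-values.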

    (2) For the comparison of precision. From Table 4 it can be seen that $\rho = 3$ or $\rho_{I(T)} = 3$ is given by $\nu_A = \nu_B = 10$.

    To compare repeatability standard deviations,

    $\nu_A = m_A q_A p_A$ and $\nu_B = m_B q_B p_B$; so $p_A = p_B = 10$.

    To compare between-day mean squares,

    $\nu_A = m_A q_A (p_A - 1)$ and $\nu_B = m_B q_B (p_B - 1)$; so $p_A = p_B = 10 + 1 = 11$.

    (3) Conclusion. The minimum number of days

    required (with two measurements per day) is 11.

    3.2.5. The data

    The data for methods A and B are summarized in

    Table 10.

    3.2.6. Investigation of outliers

    Grubbs tests were applied to the day means [8]. No

    single or double stragglers or outliers were found for

    method A. For method B the single Grubbs test

    applied on the mean of day 9 is significant at the

    5% level but not at the 1% level. Indeed

    $$G = \frac{39.845 - 39.486}{0.1411} = 2.544,$$

    which is to be compared with Grubbs critical values for $p = 11$ at 5% (2.355) and 1% (2.564). Therefore, since this observation is considered as a straggler it is retained but indicated with "a" in Table 10.
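    The single Grubbs statistic for the day means of method B can be recomputed from the replicate data of Table 10, as sketched below. The function names are hypothetical, the critical values are the two quoted in the text, and the straggler/outlier labels follow the usage in this section. Because the day means are recomputed from the replicates and the standard deviation is not rounded, the result differs slightly in the third decimal from the value quoted above.

```python
import statistics


def grubbs_statistic(values: list[float], suspect: float) -> float:
    """Single Grubbs statistic: |suspect - mean| / standard deviation."""
    return abs(suspect - statistics.fmean(values)) / statistics.stdev(values)


def classify(g: float, crit_5: float, crit_1: float) -> str:
    """Label the suspect value using the 5 % and 1 % critical values."""
    if g > crit_1:
        return "outlier"
    if g > crit_5:
        return "straggler"
    return "accepted"


# Replicate pairs for method B, days 1 to 11 (Table 10)
replicates_b = [(39.29, 39.36), (39.51, 39.38), (39.45, 39.49), (39.59, 39.51),
                (39.41, 39.41), (39.45, 39.54), (39.55, 39.55), (39.29, 39.36),
                (39.82, 39.87), (39.44, 39.45), (39.45, 39.53)]
day_means = [(a + b) / 2 for a, b in replicates_b]

g = grubbs_statistic(day_means, day_means[8])  # day 9 is the suspect mean
print(f"G = {g:.3f}: {classify(g, crit_5=2.355, crit_1=2.564)}")
```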

    3.2.7. Calculation of the variance estimates

    Tables 11 and 12 summarize the calculation of the

    variance estimates for methods A and B, respectively.


  • Table 10

    Data for the example 2

    Day  Method A                                Method B
         y_{11k1}  y_{11k2}  \bar{y}_{11k}       y_{11k1}  y_{11k2}  \bar{y}_{11k}
    1    39.68     39.77     39.725              39.29     39.36     39.325
    2    39.08     39.38     39.230              39.51     39.38     39.445
    3    40.39     40.33     40.360              39.45     39.49     39.470
    4    39.87     39.98     39.925              39.59     39.51     39.550
    5    39.70     39.95     39.825              39.41     39.41     39.410
    6    39.93     39.95     39.940              39.45     39.54     39.495
    7    39.78     39.97     39.875              39.55     39.55     39.550
    8    39.92     40.20     40.060              39.29     39.36     39.325
    9    40.34     39.89     40.115              39.82     39.87     39.845 (a)
    10   40.12     40.26     40.190              39.44     39.45     39.445
    11   39.43     39.54     39.485              39.45     39.53     39.490

    Grand mean = 39.884                          Grand mean = 39.486

    (a) Straggler.

    Table 11

    Calculation of the variance estimates for method A (example 2) (ANOVA table)

    Source     Mean squares      Estimate of
    Day        MS_D = 0.2050     $\sigma^2_{rA} + n_A \sigma^2_{DA}$
    Residual   MS_E = 0.0239     $\sigma^2_{rA}$

    Calculation of the variance estimates

    The repeatability variance: $s^2_{rA} = 0.0239$; $\nu = 11$

    The between-day variance component: $s^2_{DA} = (0.2050 - 0.0239)/2 = 0.0906$

    Time-different intermediate precision (variance): $s^2_{I(T)A} = s^2_{DA} + s^2_{rA} = 0.1145$

    Variance of the day means $\bar{y}_{ijk}$: $s^2_{\bar{y}_{ijk}A} = s^2_{DA} + s^2_{rA}/n_A = 0.1025$; $\nu = 10$

    Table 12

    Calculation of the variance estimates for method B (example 2) (ANOVA table)

    Source     Mean squares      Estimate of
    Day        MS_D = 0.0397     $\sigma^2_{rB} + n_B \sigma^2_{DB}$
    Residual   MS_E = 0.0024     $\sigma^2_{rB}$

    Calculation of the variance estimates

    The repeatability variance: $s^2_{rB} = 0.0024$; $\nu = 11$

    The between-day variance component: $s^2_{DB} = (0.0397 - 0.0024)/2 = 0.0187$

    Time-different intermediate precision (variance): $s^2_{I(T)B} = s^2_{DB} + s^2_{rB} = 0.0211$

    Variance of the day means $\bar{y}_{ijk}$: $s^2_{\bar{y}_{ijk}B} = s^2_{DB} + s^2_{rB}/n_B = 0.0199$; $\nu = 10$
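    The variance estimates of Tables 11 and 12 follow directly from the two mean squares and the number of replicates per day. The sketch below shows that derivation for the one-operator, one-instrument case of this example; the function name and the returned tuple layout are illustrative choices, and small differences in the last digit against the tables come from rounding.

```python
def variance_estimates(ms_day: float, ms_error: float, n: int):
    """Derive the variance estimates from the ANOVA mean squares.

    Returns (repeatability variance, between-day component,
    time-different intermediate precision variance,
    variance of the day means).
    """
    s2_r = ms_error                      # repeatability variance
    s2_d = (ms_day - ms_error) / n       # between-day component
    s2_it = s2_d + s2_r                  # time-different intermediate precision
    s2_day_mean = s2_d + s2_r / n        # equals ms_day / n
    return s2_r, s2_d, s2_it, s2_day_mean


for label, ms_d, ms_e in [("A", 0.2050, 0.0239), ("B", 0.0397, 0.0024)]:
    s2_r, s2_d, s2_it, s2_mean = variance_estimates(ms_d, ms_e, n=2)
    print(f"method {label}: s2_r={s2_r:.4f} s2_D={s2_d:.4f} "
          f"s2_I(T)={s2_it:.4f} s2_daymean={s2_mean:.4f}")
```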


  • 3.2.8. Comparison of precision

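    3.2.8.1. Repeatability. The repeatabilities are compared according to Eq. (11):

    $$F_r = \frac{s^2_{rB}}{s^2_{rA}} = \frac{0.0024}{0.0239} = 0.10.$$

    This is to be compared with $F_{0.05}(11,11) = 2.82$. Since $F_r < 2.82$, there is no evidence that the repeatability of method B is worse than that of method A. In the two-sided comparison, the ratio of the larger to the smaller variance, $0.0239/0.0024 = 9.96$, exceeds $F_{0.025}(11,11) = 3.47$, so there is evidence that the repeatabilities of both methods are different (in fact the repeatability for method B is better than for method A).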
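    The repeatability comparison can be sketched in a few lines. This is only an illustration: the critical values are taken from the text rather than computed, 3.47 is read here as the two-sided 5% critical value for 11 and 11 degrees of freedom, and the function name `compare_variances` is hypothetical.

```python
def compare_variances(s2_alt, s2_ref, f_one_sided, f_two_sided):
    """F-tests on two variance estimates with equal degrees of freedom.

    s2_alt, s2_ref : variances of the alternative and reference method
    f_one_sided    : tabulated critical value for the one-sided test
    f_two_sided    : tabulated critical value for the two-sided test
    """
    ratio = s2_alt / s2_ref
    # One-sided test: is the alternative method less precise than the reference?
    alt_is_worse = ratio > f_one_sided
    # Two-sided test: larger variance over smaller variance.
    larger_over_smaller = max(ratio, 1.0 / ratio)
    differ = larger_over_smaller > f_two_sided
    return ratio, alt_is_worse, larger_over_smaller, differ


# Repeatability variances of method B (alternative) and method A (reference).
f_r, b_is_worse, inverse_ratio, differ = compare_variances(
    0.0024, 0.0239, f_one_sided=2.82, f_two_sided=3.47
)
print(f"F_r = {f_r:.2f}, B worse: {b_is_worse}, "
      f"larger/smaller = {inverse_ratio:.2f}, different: {differ}")
```

    Taking the larger over the smaller variance is valid as written only because both estimates carry the same number of degrees of freedom here.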

    (ii) Since the repeatabilities of both methods are different ($\sigma^2_{rA} \neq \sigma^2_{rB}$), the comparison of time-different intermediate precision is performed according to Eq. (14):

    $$F_{I(T)} = \frac{s^2_{I(T)B}}{s^2_{I(T)A}} = \frac{0.0211}{0.1145} = 0.18.$$

    The number of degrees of freedom associated with $s^2_{I(T)}$ for both methods is obtained from the Satterthwaite approximation (Eqs. (15) and (16)):

    $$\nu_{I(T)B} = \frac{(0.0211)^2}{\frac{(0.0397/2)^2}{10} + \frac{(0.0024/2)^2}{11}} = 11,$$

    $$\nu_{I(T)A} = \frac{(0.1145)^2}{\frac{(0.2050/2)^2}{10} + \frac{(0.0239/2)^2}{11}} = 12.$$
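    The Satterthwaite calculation above can be written as a small helper. This is a sketch under the assumption that the compound variance is a linear combination of independent mean squares; with two replicates per day, $s^2_{I(T)} = \tfrac{1}{2}\mathrm{MS}_D + \tfrac{1}{2}\mathrm{MS}_E$. The function name `satterthwaite_df` is illustrative.

```python
def satterthwaite_df(terms):
    """Approximate degrees of freedom of a linear combination of mean squares.

    terms: iterable of (coefficient, mean_square, degrees_of_freedom).
    """
    terms = list(terms)
    total = sum(c * ms for c, ms, _ in terms)
    denominator = sum((c * ms) ** 2 / df for c, ms, df in terms)
    return total ** 2 / denominator


# Mean squares and degrees of freedom from Tables 11 and 12 (n = 2 replicates).
nu_it_b = satterthwaite_df([(0.5, 0.0397, 10), (0.5, 0.0024, 11)])
nu_it_a = satterthwaite_df([(0.5, 0.2050, 10), (0.5, 0.0239, 11)])
print(f"nu_I(T)B = {nu_it_b:.2f}, nu_I(T)A = {nu_it_a:.2f}")
```

    The text reports the integer values 11 and 12; for these data rounding and truncating the approximation give the same result.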

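    $F_{I(T)}$ is to be compared with $F_{0.05}(11,12) = 2.72$. Since $F_{I(T)} < 2.72$, there is no evidence that the time-different intermediate precision of method B is worse than that of method A.

    For the evaluation of the bias, the variances of the day means are compared first. Their ratio, $0.1025/0.0199 = 5.15$, exceeds $F_{0.025}(10,10) = 3.72$, so there is evidence that the variances of the day means obtained with the two methods are different. Therefore, the standard deviation $s_d$ and its associated degrees of freedom $\nu_d$ are calculated as follows (see Eqs. (25) and (26)):

    $$s_d = \sqrt{\frac{0.1025}{11} + \frac{0.0199}{11}} = 0.1055,$$

    $$\nu_d = \frac{(0.0111)^2}{\frac{(0.1025/11)^2}{10} + \frac{(0.0199/11)^2}{10}} = 13.$$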
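    The two quantities above follow the usual unequal-variance (Welch-type) form. The sketch below assumes 11 day means per method, each variance carrying 10 degrees of freedom; the function name `welch_sd_and_df` is illustrative.

```python
from math import sqrt


def welch_sd_and_df(s2_a, p_a, s2_b, p_b):
    """Standard deviation of the difference of two means and its approximate df.

    s2_a, s2_b : variances of the day means for each method
    p_a, p_b   : number of day means for each method
    """
    term_a = s2_a / p_a
    term_b = s2_b / p_b
    s_d = sqrt(term_a + term_b)
    nu_d = (term_a + term_b) ** 2 / (
        term_a ** 2 / (p_a - 1) + term_b ** 2 / (p_b - 1)
    )
    return s_d, nu_d


s_d, nu_d = welch_sd_and_df(0.1025, 11, 0.0199, 11)
print(f"s_d = {s_d:.4f}, nu_d = {nu_d:.2f}")
```

    The unrounded approximation lies between 13 and 14; the value 13 used in the text corresponds to truncating it, which is the more conservative choice.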

    The test statistic $t_{cal}$ is obtained as given in Eq. (21):

    $$t_{cal} = \frac{|\bar{y}_A - \bar{y}_B|}{s_d} = \frac{|39.884 - 39.486|}{0.1055} = 3.77.$$

    This is to be compared with $t_{0.025;13} = 2.16$. Since $t_{cal} > 2.16$, the difference between the grand means of the two methods is statistically significant at $\alpha = 0.05$. To further evaluate whether the difference found can be acceptable, the UCL is calculated according to Eq. (27):

    $$\mathrm{UCL} = |39.884 - 39.486| + 0.1055 \times t_{\beta;13} = 0.398 + 0.1055 \times t_{0.05;13} = 0.398 + 0.1055 \times 1.771 = 0.585.$$

    Since the UCL is larger than the acceptable bias of 0.50, there is evidence that the difference between the means of the two methods is unacceptable. (If the probability $\beta$ is allowed to be 0.2, the UCL would be $0.398 + (0.1055 \times 0.870) = 0.490$, which is smaller than 0.50, and the bias would be acceptable.)
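    The point-hypothesis test and the upper confidence limit can be retraced numerically. This is a minimal sketch that takes the tabulated $t$ values quoted in the text as inputs rather than computing them; the function names are illustrative.

```python
def t_statistic(mean_a, mean_b, s_d):
    """Test statistic for the difference between two grand means."""
    return abs(mean_a - mean_b) / s_d


def upper_confidence_limit(mean_a, mean_b, s_d, t_crit):
    """One-sided upper confidence limit on the absolute difference."""
    return abs(mean_a - mean_b) + s_d * t_crit


MEAN_A, MEAN_B, S_D = 39.884, 39.486, 0.1055
ACCEPTABLE_BIAS = 0.50

t_cal = t_statistic(MEAN_A, MEAN_B, S_D)
significant = t_cal > 2.16                  # tabulated t(0.025; 13) from the text

ucl_beta_05 = upper_confidence_limit(MEAN_A, MEAN_B, S_D, 1.771)  # t(0.05; 13)
ucl_beta_20 = upper_confidence_limit(MEAN_A, MEAN_B, S_D, 0.870)  # t(0.20; 13)

print(f"t_cal = {t_cal:.2f}, significant: {significant}")
print(f"UCL (beta = 0.05) = {ucl_beta_05:.3f}, "
      f"acceptable: {ucl_beta_05 <= ACCEPTABLE_BIAS}")
print(f"UCL (beta = 0.20) = {ucl_beta_20:.3f}, "
      f"acceptable: {ucl_beta_20 <= ACCEPTABLE_BIAS}")
```

    Passing the critical value as a parameter makes it easy to see how the conclusion depends on the chosen probability $\beta$.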

    In the interval hypothesis testing approach, the one-sided 95% upper confidence limit (UCL) around the absolute difference between the grand means is calculated as follows (Eq. (28)):

    $$\mathrm{UCL} = |39.884 - 39.486| + 0.1055 \times t_{\alpha;13} = 0.398 + 0.1055 \times t_{0.05;13} = 0.398 + 0.1055 \times 1.771 = 0.585.$$

    This is to be compared with the acceptable bias of 0.5. Since UCL > 0.5, the bias of method B is not acceptable. Notice that in this example, interval hypothesis testing leads to the same conclusion as point hypothesis testing (with the inclusion of the $\beta$-error consideration). This is because point hypothesis testing detects a significant difference between the means of both methods and because, in the further evaluation of whether the difference found can be considered acceptable, the probability $\beta$ considered is equal to the probability $\alpha$ in interval hypothesis testing.

    4. Conclusion

    An approach for the comparison of an alternative method to a reference method has been proposed for the intralaboratory situation. Instead of the reproducibility (as included in the ISO guidelines), the (operator+instrument+time)-different intermediate precision is considered in the comparison. The proposal includes:

    1. the experimental design (i.e. the determination of the number of measurements required to perform the comparison),

    2. the estimation of different precision parameters and the comparison of these precision measures for both methods, and

    3. the statistical approach for the evaluation of the bias, in which interval hypothesis testing has also been proposed as an alternative.

    The comparison of the bias and precision described in this article is performed at a single concentration level. If the alternative method is intended for use over a rather broad concentration range, the comparison should be performed at several concentration levels (e.g. low, middle and high). Because of the multiple-comparison problem [5], the present approach is not recommended if the methods are to be compared at more than three levels. For trace analysis, the detection and quantification limits should also be evaluated.

    The proposal is an optimal approach in the sense that it is based on sample size calculations. The number of measurements to be performed is such that there is a high probability ($1 - \beta$) that an alternative method with an unacceptable performance will not be adopted. This is of course of utmost importance, but it might require a number of measurements that the laboratory is not able (or not willing) to perform because of the time and cost involved. If this is the case, an alternative approach based on a number of measurements that is feasible in practice is required. Two approaches can be conceived. The first is to perform the comparison, based on a user-defined number of measurements, in the classical way using point hypothesis testing and to evaluate the $\beta$-error. In this way the laboratory would at least have an idea of the probability that an alternative method with an unacceptable performance has been accepted, and thus of the risk that the method will not perform as expected during routine use.

    Another approach is to control the probability that a method with unacceptable performance characteristics will be adopted by using interval hypothesis testing. The latter was already included here as an alternative for the evaluation of the bias but can also be considered in the comparison of precision measures. After it was proposed for the evaluation of the bias in method validation studies by Hartmann et al. [2], interval hypothesis testing has been considered by the SFSTP in a guideline for the validation of bioanalytical methods [14].

    Acknowledgements

    This work has received financial support from the European Commission (Standards, Measurements and Testing Programme Contract SMT4-CT95-2031) and the Belgian government (The Prime Minister Services Federal Office for Scientific, Technical and Cultural Affairs, Standardisation Programme Research Contract no/03/003).

    References

    [1] International Standard, Accuracy (Trueness and Precision) of Measurement Methods and Results, ISO 5725-6, Geneva, 1994.

    [2] C. Hartmann, J. Smeyers-Verbeke, W. Penninckx, Y. Vander Heyden, P. Vankeerberghen, D.L. Massart, Anal. Chem. 67 (1995) 4491.

    [3] International Standard, Accuracy (Trueness and Precision) of Measurement Methods and Results, ISO 5725-3, Geneva, 1994.

    [4] International Standard, Statistics (Vocabulary and Symbols): Design of Experiments, ISO 3534-3, Geneva, 1985.

    [5] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, 1997.

    [6] D.C. Montgomery, Design and Analysis of Experiments, 4th ed., Wiley, New York, 1997.

    [7] J. Mandel, The Statistical Analysis of Experimental Data, Dover, New York, 1964, p. 359.

    [8] International Standard, Accuracy (Trueness and Precision) of Measurement Methods and Results, ISO 5725-2, Geneva, 1994.

    [9] W. Gerisch, D. Abraham, Comput. Stat. Quarterly 4 (1989) 299.

    [10] D.J. Schuirmann, J. Pharmacokinet. Biopharm. 15 (1987) 657.

    [11] F.E. Satterthwaite, Biometrics Bull. 2 (1946) 110.

    [12] G.T. Wernimont, Use of Statistics to Develop and Evaluate Analytical Methods, in: W. Spendley (Ed.), AOAC, Arlington, VA, 1985, p. 39.

    [13] G.W. Snedecor, W.G. Cochran, Statistical Methods, 7th ed., The Iowa State University Press, Ames, Iowa, 1982, p. 96.

    [14] E. Chapuzet, N. Mercier (Presidents), S. Bervoas-Martin, B. Boulanger, P. Chevalier, P. Chiap, D. Grandjean, P. Hubert, P. Lagorce, M. Lallier, M.C. Laparra, M. Laurentie, J.C. Nivet, S.T.P. Pharma Prat. 7 (3) (1997) 169.
