a unified approach to linear equating for the nonequivalent groups design

Research & Development Alina A. von Davier Nan Kong Research Report November 2003 RR-03-31 A Unified Approach to Linear Equating for the Non-Equivalent Groups Design

Research & Development

Alina A. von Davier

Nan Kong

ResearchReport

November 2003 RR-03-31

A Unified Approach to Linear Equating for the Non-Equivalent Groups Design

Alina A. von Davier and Nan Kong

Educational Testing Service, Princeton, NJ

November 2003

Research Reports provide preliminary and limited dissemination of ETS research prior to publication. They are available without charge from:

Research Publications Office Mail Stop 7-R Educational Testing Service Princeton, NJ 08541

Abstract

This paper describes a new, unified framework for linear equating in a Non-Equivalent-groups

Anchor Test (NEAT) design. We focus on three methods for linear equating in the NEAT

design—Tucker, Levine observed-score, and chain—and develop a common parameterization

that allows us to show that each particular equating method is a special case of the linear

equating function in the NEAT design. We use a new concept, the Method Function, to

distinguish among the linear equating functions, in general, and among the three equating

methods, in particular. This approach leads to a general formula for the standard error of

equating for all equating functions in the NEAT design. We also present a new tool, the standard

error of equating difference, to investigate if the observed difference in the equating functions is

statistically significant.

Key words: Test equating, Non-Equivalent groups Anchor Test (NEAT) design, Tucker

equating, Levine observed-score equating, chain linear equating, standard error of equating, delta

method

Acknowledgments

The authors would like to thank Paul Holland, Neil Dorans, Hariharan Swaminathan, Dan

Eignor, Shelby Haberman, Skip Livingston, and Krishna Tateneni for helpful comments and

suggestions during the development of this project. We are also thankful to Bruce Kaplan and

Ted Blew, who were very supportive in developing the software and carrying out the resampling

procedure. We would also like to thank Kim Fryer, Elizabeth Brophy, and Diane Rein for their

help in editing the manuscript. The Educational Testing Service Research Allocation supported

our work. Any opinions expressed in this paper are those of the authors and not necessarily of

Educational Testing Service.

Test equating methods are statistical tools used to produce exchangeable scores across different

test forms. In particular, observed-score equating methods, as opposed to true-score equating

methods, transform the raw scores of a new test, X, onto the raw-score scale of an

old test, Y. Any test equating process consists of a data collection design and different equating

methods. This paper focuses on the Non-Equivalent-groups Anchor Test (NEAT) design and the

linear equating function.

The NEAT design is a data collection design widely used in practice. It involves two

populations of test-takers (usually different test administrations), P and Q, and a sample of

examinees from each. The sample from P takes test X, the sample from Q takes test Y, and both

samples take an anchor test, V, which is used to link X and Y.

For the NEAT design, several observed-score equating methods are commonly used.

Here we focus only on observed-score linear equating methods for the NEAT design.

In this paper, we take a new, mathematical approach to three linear observed-score

equating methods—Tucker equating for the NEAT design with an anchor (T), Levine observed-

score equating (L), and chain linear equating (CL)—by emphasizing their common framework

and their similarities. We define these methods carefully later.

In this way, we introduce a unified approach to linear equating in the NEAT design, and

we show that each of these equating methods is a special case of the linear equating function.

This approach allows us to establish new theoretical results on otherwise well-known equating

methods, creating a conceptual shift in the analysis of the observed-score linear equating

methods in the NEAT design: These are not merely disparate methods, each with its own

framework; they share the same parameter space and have numerous similarities. One of the

consequences is that we can develop one general formula for the standard error of equating

(SEE) that is applicable to most of the observed-score linear equating functions for the NEAT

design that are available. We also introduce in this paper a new, practical tool, the standard error

of equating difference (SEED), for investigating whether the differences between the (linear)

equating methods are statistically significant.

More precisely, in this paper we investigate linear equating in the NEAT design from

several points of view:

1. We put the three methods on a common footing by developing the same

parameterization for the (equating) functions. We also make use of the concept of a

Method Function in the framework of the linear equating to show that there is only

one definition of the linear observed-score equating function that might have different

special cases. (A related concept, the Design Function, was first introduced in von

Davier, Holland, & Thayer, 2004, in a different context, to model the data collection

design.) This approach leads to a general formula for the SEE for linear equating in

the NEAT design in general and for the three equating functions in particular.

2. We generalize the SEED (von Davier et al., 2004) for any pair of (linear) equating

functions that share the same set of parameters. The SEED is a new tool to investigate

whether the difference between the equating functions is statistically significant.

3. We use real and resampled data from two national administrations of a high volume

testing program to illustrate the SEE (computed via the new general formula) and the

SEED.

We do not make any distributional assumptions about the variables involved in this

theoretical exposition.

Linear Equating Function for the NEAT Design

This section sets up the basic notation. We assume there are two tests to be equated, X

and Y, and a “target” population, T, on which this is to be done (Braun & Holland, 1982; Kolen

& Brennan, 1995).

In this paper, we use the standard notation of $\mu$ and $\sigma^2$ for the means and the variances.

We also use the symbol π to denote the parameters in general. The subscripts usually indicate the

variable and the population. We use $\sigma$ with appropriate subscripts to denote the covariances; for

example, $\sigma_{X,V;P}$ denotes the covariance of X and V in P, while Σ denotes the covariance matrix

of π.

Many observed score equating methods are based on the linear equating function.

Usually, the rationale behind linear equating on the target population, T, is to set standardized

deviation scores (z-scores) on the two forms to be equal such that

$$\frac{x - \mu_{XT}}{\sigma_{XT}} = \frac{y - \mu_{YT}}{\sigma_{YT}},$$

where $\mu_{XT}$, $\sigma_{XT}$, $\mu_{YT}$, and $\sigma_{YT}$ are the means and the standard deviations of X and Y in T. Solving for y in

the above equation results in the formula for the linear equating function,

$$\mathrm{Lin}_{XY;T}(x) = \mu_{YT} + \frac{\sigma_{YT}}{\sigma_{XT}}\,(x - \mu_{XT}). \tag{1}$$
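As a minimal sketch of (1), the function below maps a score x on X to the scale of Y. The function name and the numerical moments are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the data in this report.

```python
import math

def linear_equate(x, mu_xt, var_xt, mu_yt, var_yt):
    """Equation (1): map a raw score x on X to the scale of Y on the target population T."""
    slope = math.sqrt(var_yt / var_xt)  # sigma_YT / sigma_XT
    return mu_yt + slope * (x - mu_xt)

# Hypothetical target-population moments, for illustration only.
mu_xt, var_xt = 40.0, 16.0 ** 2
mu_yt, var_yt = 35.0, 20.0 ** 2

x = 48.0
y = linear_equate(x, mu_xt, var_xt, mu_yt, var_yt)

# The equated score has the same z-score on Y as x has on X.
z_x = (x - mu_xt) / math.sqrt(var_xt)
z_y = (y - mu_yt) / math.sqrt(var_yt)
```

With these made-up values the equated score sits the same number of standard deviations above the mean of Y as x does above the mean of X, which is the z-score matching that motivates (1).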

In the NEAT design, there is also another rationale, and implicitly another definition, of a

linear equating function, that is, the chain linear equating function. The chain linear equating

function is given by chaining together the two linear linking functions (i.e., by using the

mathematical composition of the two linear functions), from X to V on P and from V to Y on Q,

that is, $\mathrm{Lin}_{XV;P}(x)$ and $\mathrm{Lin}_{VY;Q}(v)$. This results in

$$\begin{aligned}
\mathrm{CL}_{XY}(x) &= \mathrm{Lin}_{VY;Q}\bigl(\mathrm{Lin}_{XV;P}(x)\bigr) \\
&= \mu_{YQ} + \frac{\sigma_{YQ}}{\sigma_{VQ}}\Bigl(\mu_{VP} + \frac{\sigma_{VP}}{\sigma_{XP}}(x - \mu_{XP}) - \mu_{VQ}\Bigr) \\
&= \mu_{YQ} + \frac{\sigma_{YQ}}{\sigma_{VQ}}(\mu_{VP} - \mu_{VQ}) + \frac{\sigma_{YQ}}{\sigma_{VQ}}\,\frac{\sigma_{VP}}{\sigma_{XP}}\,(x - \mu_{XP}),
\end{aligned} \tag{2}$$

and (2) is the usual form for the chain linear equating function. Moreover, the final equating

function does not depend on the target population, T. As shown in von Davier, Holland, and

Thayer (in press), (2) can be rewritten as (1) under appropriate assumptions. This will be

discussed in more detail later.
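The composition in (2) can be sketched directly. The snippet below builds the two linking functions and checks numerically that composing them agrees with the closed form in the last line of (2). The means and standard deviations are the rounded sample values reported in Table 1 of Study 1; the function names are hypothetical.

```python
def linear_link(mu_from, sd_from, mu_to, sd_to):
    """Return the linear linking function that matches means and standard deviations."""
    def link(score):
        return mu_to + (sd_to / sd_from) * (score - mu_from)
    return link

# Rounded sample statistics from Table 1 (X and V observed in P; Y and V observed in Q).
mu_xp, sd_xp = 39.25, 17.23
mu_vp, sd_vp = 17.05, 8.33
mu_yq, sd_yq = 32.69, 16.73
mu_vq, sd_vq = 14.39, 8.21

lin_xv_p = linear_link(mu_xp, sd_xp, mu_vp, sd_vp)  # X -> V on P
lin_vy_q = linear_link(mu_vq, sd_vq, mu_yq, sd_yq)  # V -> Y on Q

def chain_linear(x):
    """First line of (2): the composition Lin_VY;Q(Lin_XV;P(x))."""
    return lin_vy_q(lin_xv_p(x))

def chain_linear_closed_form(x):
    """Last line of (2): intercept shift plus slope times (x - mu_XP)."""
    slope = (sd_yq / sd_vq) * (sd_vp / sd_xp)
    shift = (sd_yq / sd_vq) * (mu_vp - mu_vq)
    return mu_yq + shift + slope * (x - mu_xp)
```

Neither function takes the weight w as an argument, which mirrors the observation that the chain linear function does not depend on the target population.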

Usually, X and Y are the “operational tests” given to two samples from the two “test

administrations” P and Q, respectively, and V is the “anchor test” given to both samples from P

and Q. The anchor test score, V, can be either a part of both X and Y (called the internal anchor)

or a separate score (the external anchor).

In this study we assume that the target population, T, for the NEAT design is a mixture of

P and Q and is denoted by

$$T = wP + (1 - w)Q, \tag{3}$$

(see Braun & Holland, 1982, or Kolen & Brennan, 1995, for details on the concept of a target

population in the NEAT design).

The target population in (3) is determined by a weight w. When w = 1, then T = P, and

when w = 0, then T = Q. Other choices of w may be used as well. Typically, w is the ratio of the

sample size of the group from P to the sum of the sample sizes of the two groups.

In the NEAT design, X and Y are each only observed either on P or on Q, but not both.

Thus, X and Y are not both observed on T, regardless of the choice of w. For this reason

assumptions must be made in order to overcome this lack of complete information in the NEAT

design.

The three equating methods used in the NEAT design that concern us here, Tucker,

Levine, and chain linear equating, make different assumptions about the distributions of X and Y

in the populations where they are not observed. We identify these assumptions in the next

section.

Tucker, Levine, and Chain Linear Equating Methods

In this section we briefly describe the methods we use and their assumptions, which can

be found in more detail elsewhere (Kolen & Brennan, 1995, pp. 114-118; Angoff, 1984;

von Davier et al., in press). Here we provide only the information that is necessary to explain our

new approach, which is given in more detail later in this paper.

This section is structured as follows: First, we present the Tucker and Levine methods together,

stressing the similarities between them. Although the assumptions that underlie the two methods

are different, the computational forms are similar. We will not give computational details on

Tucker and Levine because they are well documented (Kolen & Brennan, 1995, pp. 114-118).

Then, we describe the chain linear equating method, following the development given in von

Davier et al. (in press). In the next section, we develop a common parameterization for the three

functions that allows us to compare the equating functions as well as their standard errors (SEE).

Tucker Equating Method: Assumptions

T1: The linear regressions of X on V and of Y on V are the same in the two populations.

T2: The conditional variances of X given V and of Y given V are the same in the two populations.

Levine Observed-score Equating Method: Assumptions

L1: X, Y, and V all measure the same thing, or, stated in different words, the true scores of the

tests ($T_X$ and $T_Y$) and of the anchor ($T_V$) in the two populations are perfectly correlated.

L2: The regressions of $T_X$ on $T_V$ and of $T_Y$ on $T_V$ are linear and the same in the two

populations.

L3: The measurement error variances for X and for Y are the same in the two populations.

From the two sets of assumptions and from (1) the formulas for the parameters of X and Y

on T for Tucker and Levine follow. They are similar in form for the two equating methods:

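$$\mu_{XT} = \mu_{XP} - (1 - w)\,\Delta_P\,(\mu_{VP} - \mu_{VQ}), \tag{4}$$

$$\mu_{YT} = \mu_{YQ} + w\,\Delta_Q\,(\mu_{VP} - \mu_{VQ}), \tag{5}$$

$$\sigma^2_{XT} = \sigma^2_{XP} - (1 - w)\,\Delta_P^2\,(\sigma^2_{VP} - \sigma^2_{VQ}) + w(1 - w)\,\Delta_P^2\,(\mu_{VP} - \mu_{VQ})^2, \tag{6}$$

$$\sigma^2_{YT} = \sigma^2_{YQ} + w\,\Delta_Q^2\,(\sigma^2_{VP} - \sigma^2_{VQ}) + w(1 - w)\,\Delta_Q^2\,(\mu_{VP} - \mu_{VQ})^2 \tag{7}$$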

(see Kolen & Brennan, 1995, pp. 114–118 for the derivations).

The four ∆-parameters, which distinguish the two equating methods, Tucker and Levine,

have the following formulas:

For the Tucker method:

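$$\Delta_P = \alpha_P = \frac{\sigma_{X,V;P}}{\sigma^2_{VP}} \quad\text{and}\quad \Delta_Q = \alpha_Q = \frac{\sigma_{Y,V;Q}}{\sigma^2_{VQ}}, \tag{8}$$

where $\sigma_{X,V;P}$ denotes the covariance of X and V in P and $\sigma_{Y,V;Q}$ denotes the covariance of Y and

V in Q.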

For the Levine observed-score equating function for a NEAT design with an external

anchor:

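$$\Delta_P = \gamma_P = \frac{\sigma^2_{XP} + \sigma_{X,V;P}}{\sigma^2_{VP} + \sigma_{X,V;P}} \quad\text{and}\quad \Delta_Q = \gamma_Q = \frac{\sigma^2_{YQ} + \sigma_{Y,V;Q}}{\sigma^2_{VQ} + \sigma_{Y,V;Q}}, \tag{9}$$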

which are the formulas for the Levine function derived under the assumptions L1-L3 and the

additional assumption of a congeneric model, for which the error variances are proportional to

the effective test lengths (see Kolen & Brennan, 1995, p. 117).

For the Levine function for a NEAT design with an internal anchor:

$$\Delta_P = \gamma_P = \frac{\sigma^2_{XP}}{\sigma_{X,V;P}} \quad\text{and}\quad \Delta_Q = \gamma_Q = \frac{\sigma^2_{YQ}}{\sigma_{Y,V;Q}}, \tag{10}$$

which are also derived under the additional assumption of a congeneric model (see Kolen &

Brennan, 1995, p. 116).
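Equations (4)–(10) can be collected into one routine, sketched below. The dictionary keys and function names are hypothetical. The example values are the rounded statistics from Table 1 of Study 1, with the covariances rebuilt from the rounded sample correlations (0.88 and 0.87) reported there, so the resulting numbers are illustrative rather than the report's own results.

```python
def delta_params(pi, method):
    """Return (Delta_P, Delta_Q) from equation (8), (9), or (10)."""
    if method == "tucker":  # (8)
        return pi["cov_xv_p"] / pi["var_vp"], pi["cov_yv_q"] / pi["var_vq"]
    if method == "levine_external":  # (9)
        d_p = (pi["var_xp"] + pi["cov_xv_p"]) / (pi["var_vp"] + pi["cov_xv_p"])
        d_q = (pi["var_yq"] + pi["cov_yv_q"]) / (pi["var_vq"] + pi["cov_yv_q"])
        return d_p, d_q
    if method == "levine_internal":  # (10)
        return pi["var_xp"] / pi["cov_xv_p"], pi["var_yq"] / pi["cov_yv_q"]
    raise ValueError(f"unknown method: {method}")

def synthetic_moments(pi, w, method):
    """Equations (4)-(7): (mu_XT, var_XT, mu_YT, var_YT) on T = wP + (1 - w)Q."""
    d_p, d_q = delta_params(pi, method)
    dmu = pi["mu_vp"] - pi["mu_vq"]
    dvar = pi["var_vp"] - pi["var_vq"]
    mu_xt = pi["mu_xp"] - (1 - w) * d_p * dmu
    mu_yt = pi["mu_yq"] + w * d_q * dmu
    var_xt = pi["var_xp"] - (1 - w) * d_p**2 * dvar + w * (1 - w) * d_p**2 * dmu**2
    var_yt = pi["var_yq"] + w * d_q**2 * dvar + w * (1 - w) * d_q**2 * dmu**2
    return mu_xt, var_xt, mu_yt, var_yt

# Illustrative inputs: Table 1 values; covariances = correlation * SD * SD (an assumption).
pi = {
    "mu_xp": 39.25, "var_xp": 17.23**2, "mu_vp": 17.05, "var_vp": 8.33**2,
    "cov_xv_p": 0.88 * 17.23 * 8.33,
    "mu_yq": 32.69, "var_yq": 16.73**2, "mu_vq": 14.39, "var_vq": 8.21**2,
    "cov_yv_q": 0.87 * 16.73 * 8.21,
}
w = 10634 / (10634 + 11321)  # sample-size weight, as described for (3)
tucker_moments = synthetic_moments(pi, w, "tucker")
levine_moments = synthetic_moments(pi, w, "levine_external")
```

Setting w = 1 returns the observed moments of X in P, and w = 0 returns the observed moments of Y in Q, as (4)–(7) require.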

Chain Linear Equating Method: Assumptions

C1: The (linear) linking function from X to V is the same in the two populations, P and Q.

C2: The (linear) linking function from V to Y is the same in the two populations, Q and P.

We follow the notations and approach to chain linear equating given in von Davier et al.

(in press, Appendix A). We do not give any computational detail in this paper; instead we refer

to that work and quote only those formulas from it that are necessary for our exposition here. As

shown in von Davier et al. (in press, Appendix A) from C1 and C2, it follows that on a target

population, T, as defined in (3), we have

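$$\mu_{XT} = \mu_{XP} + \frac{\sigma_{XP}}{\sigma_{VP}}\,(\mu_{VT} - \mu_{VP}), \tag{11}$$

$$\sigma_{XT} = \frac{\sigma_{VT}}{\sigma_{VP}}\,\sigma_{XP}, \tag{12}$$

$$\mu_{YT} = \mu_{YQ} + \frac{\sigma_{YQ}}{\sigma_{VQ}}\,(\mu_{VT} - \mu_{VQ}), \quad\text{and} \tag{13}$$

$$\sigma_{YT} = \frac{\sigma_{VT}}{\sigma_{VQ}}\,\sigma_{YQ}. \tag{14}$$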

Von Davier et al. (in press) show that under the assumptions C1 and C2 made by

chain equating, $\mathrm{CL}_{XY}(x)$ defined in (2) is, in fact, $\mathrm{Lin}_{XY;T}(x)$, as defined in (1). More precisely,

that work shows that applying (11)–(14) to the chain linear function from (2) results in (1). The

target population, T, cancels out of the composed function, $\mathrm{Lin}_{VY;Q}(\mathrm{Lin}_{XV;P}(x))$. This provides a

direct argument that chain linear equating is the linear observed score equating on T with

$\mu_{XT}$, $\mu_{YT}$, $\sigma^2_{XT}$, and $\sigma^2_{YT}$ given by the expressions in (11)–(14).
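The cancellation of T can be checked by hand. The following worked substitution is a sketch of that argument, not a quotation from von Davier et al. (in press): it inserts (11)–(14) into (1) and simplifies, using only symbols already defined above.

```latex
\begin{aligned}
\frac{\sigma_{YT}}{\sigma_{XT}}
  &= \frac{(\sigma_{VT}/\sigma_{VQ})\,\sigma_{YQ}}{(\sigma_{VT}/\sigma_{VP})\,\sigma_{XP}}
   = \frac{\sigma_{YQ}}{\sigma_{VQ}}\cdot\frac{\sigma_{VP}}{\sigma_{XP}}, \\[4pt]
\mu_{YT} + \frac{\sigma_{YT}}{\sigma_{XT}}\,(x - \mu_{XT})
  &= \mu_{YQ} + \frac{\sigma_{YQ}}{\sigma_{VQ}}(\mu_{VT} - \mu_{VQ})
     + \frac{\sigma_{YQ}}{\sigma_{VQ}}\frac{\sigma_{VP}}{\sigma_{XP}}
       \Bigl[(x - \mu_{XP}) - \frac{\sigma_{XP}}{\sigma_{VP}}(\mu_{VT} - \mu_{VP})\Bigr] \\
  &= \mu_{YQ} + \frac{\sigma_{YQ}}{\sigma_{VQ}}\bigl[(\mu_{VT} - \mu_{VQ}) - (\mu_{VT} - \mu_{VP})\bigr]
     + \frac{\sigma_{YQ}}{\sigma_{VQ}}\frac{\sigma_{VP}}{\sigma_{XP}}\,(x - \mu_{XP}) \\
  &= \mu_{YQ} + \frac{\sigma_{YQ}}{\sigma_{VQ}}(\mu_{VP} - \mu_{VQ})
     + \frac{\sigma_{YQ}}{\sigma_{VQ}}\frac{\sigma_{VP}}{\sigma_{XP}}\,(x - \mu_{XP}).
\end{aligned}
```

The ratio $\sigma_{VT}$ drops out of the slope in the first line and $\mu_{VT}$ drops out of the intercept in the third, leaving the last line of (2).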

Identifying the Parameters of the Tucker, Levine, and Chain Linear Equating Functions

In this section, we introduce a common parameterization for the linear equating functions

described above. We show that this approach leads to a unified framework for all the linear

equating functions in the NEAT design.

Consider the linear equating function (1) that equates X to Y on the target population, T,

in the form of (3). This equating function depends on $\mu_{XT}$, $\sigma^2_{XT}$, $\mu_{YT}$, and $\sigma^2_{YT}$, which are

parameters on the population T. We can express this dependence of the equating function on the

target population parameters by using the notation

$$\mathrm{Lin}_{XY;T}(x; \mu_{XT}, \sigma^2_{XT}, \mu_{YT}, \sigma^2_{YT}) = \text{a generic linear equating function.} \tag{15}$$

In (4)–(14), we observe that the four parameters on T depend on the 10 means, variances, and

covariances in the two populations, P and Q. Denote by π the column vector of the 10

parameters from the two bivariate distributions, that is,

$$\boldsymbol{\pi} = (\mu_{XP}, \sigma^2_{XP}, \mu_{VP}, \sigma^2_{VP}, \sigma_{X,V;P}, \mu_{YQ}, \sigma^2_{YQ}, \mu_{VQ}, \sigma^2_{VQ}, \sigma_{Y,V;Q})^t. \tag{16}$$

We use a new concept, a function that will map the 10 parameters from the two

populations, P and Q, into the four parameters on the population T.

To preserve the similarities to von Davier et al. (2004) and to emphasize the similarities

across the equating methods, we will call this function the Method Function (MF).

$$\mathrm{MF}(\boldsymbol{\pi}) = (\mu_{XT}, \sigma^2_{XT}, \mu_{YT}, \sigma^2_{YT})^t. \tag{17}$$

Now, we rewrite (15) as

$$\mathrm{Lin}_{XY;T}(x; \mathrm{MF}(\boldsymbol{\pi})) = \text{a linear equating function obtained through a specific MF,} \tag{18}$$

with π defined in (16).

The previous section showed that all three linear equating functions, Tucker, Levine, and

chain linear, can be expressed as (1). Thus, they can also be expressed as (18), in which the

Method Function differs according to which equating method is used. For the Tucker method,

the Method Function is described by the formulas (4)–(8). For the Levine method, the Method

Function is given by (4)–(7) and (9) for an external anchor, and by (4)–(7) and (10) for an

internal anchor. For the chain linear method, the Method Function is described by (11)–(14).

Each Method Function is given in detail in the appendix in Table A1.
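To illustrate (18) for one Method Function, the sketch below implements (11)–(14) for chain linear equating and passes the result to the generic linear function (1). The names `mf_chain`, `lin`, and `chain_direct` are hypothetical. The moments of V on T use the standard mean and variance formulas for a two-component mixture, which is an assumption of this sketch rather than a formula quoted from the report. The inputs are the rounded Table 1 statistics.

```python
import math

def mf_chain(pi, w):
    """Equations (11)-(14): (mu_XT, var_XT, mu_YT, var_YT) for chain linear equating."""
    # Mean and variance of the anchor V on the mixture T = wP + (1 - w)Q.
    mu_vt = w * pi["mu_vp"] + (1 - w) * pi["mu_vq"]
    var_vt = (w * pi["var_vp"] + (1 - w) * pi["var_vq"]
              + w * (1 - w) * (pi["mu_vp"] - pi["mu_vq"]) ** 2)
    ratio_p = math.sqrt(pi["var_xp"] / pi["var_vp"])  # sigma_XP / sigma_VP
    ratio_q = math.sqrt(pi["var_yq"] / pi["var_vq"])  # sigma_YQ / sigma_VQ
    mu_xt = pi["mu_xp"] + ratio_p * (mu_vt - pi["mu_vp"])   # (11)
    var_xt = var_vt * pi["var_xp"] / pi["var_vp"]           # (12), squared
    mu_yt = pi["mu_yq"] + ratio_q * (mu_vt - pi["mu_vq"])   # (13)
    var_yt = var_vt * pi["var_yq"] / pi["var_vq"]           # (14), squared
    return mu_xt, var_xt, mu_yt, var_yt

def lin(x, moments):
    """Equation (18): the linear function (1) evaluated at MF(pi)."""
    mu_xt, var_xt, mu_yt, var_yt = moments
    return mu_yt + math.sqrt(var_yt / var_xt) * (x - mu_xt)

def chain_direct(x, pi):
    """Last line of (2), written without any reference to T."""
    a = math.sqrt(pi["var_yq"] / pi["var_vq"])
    b = math.sqrt(pi["var_vp"] / pi["var_xp"])
    return pi["mu_yq"] + a * (pi["mu_vp"] - pi["mu_vq"]) + a * b * (x - pi["mu_xp"])

# Rounded Table 1 statistics; the two covariances are not needed by this Method Function.
pi = {
    "mu_xp": 39.25, "var_xp": 17.23**2, "mu_vp": 17.05, "var_vp": 8.33**2,
    "mu_yq": 32.69, "var_yq": 16.73**2, "mu_vq": 14.39, "var_vq": 8.21**2,
}
equated_40 = lin(40.0, mf_chain(pi, 0.5))
```

Whatever weight w is supplied, `lin(x, mf_chain(pi, w))` returns the same value as `chain_direct(x, pi)`, and the dictionary holds only eight entries, in line with the remark that the chain linear function does not use the two covariances.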

From (2) as well as from (11)–(14), we observe that the covariance between X and V on P

and the covariance of Y and V on Q do not appear in the formulas of the chain linear equating

function. Hence, the chain linear equating function depends only on eight parameters, while the

Tucker and Levine functions depend on ten parameters. However, by using (15), (17), and (18),

we can express the three linear equating functions as sharing the same parameter space. Note the

chain linear function implicitly depends on the covariances between the tests and the anchor only

if before computing the equating function, the two bivariate distributions of the tests and the

anchor are presmoothed using, for example, log-linear models (see von Davier et al., 2004;

Holland & Thayer, 2000).

Equating functions are estimated by substituting estimates of the population parameters

in (18), that is,

$$\widehat{\mathrm{Lin}}_{XY;T}(x; \mathrm{MF}(\boldsymbol{\pi})) = \mathrm{Lin}_{XY;T}(x; \mathrm{MF}(\hat{\boldsymbol{\pi}})), \tag{19}$$

where $\hat{\boldsymbol{\pi}}$ denotes a sample estimate of π.

The uncertainty in $\mathrm{Lin}_{XY;T}(x; \mathrm{MF}(\hat{\boldsymbol{\pi}}))$ derives from the uncertainty in the estimate of π.

Because the samples are independently drawn from populations P and Q, the covariances

between each of the five parameters estimated from the population P and the five parameters

estimated from the population Q are zero.

Hence, the covariance matrix Σ of the parameter π for the three equating functions,

Tucker, Levine, and chain linear, is:

$$\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_P & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_Q \end{pmatrix}, \tag{20}$$

where $\boldsymbol{\Sigma}_P$ denotes the covariance matrix of the five parameters obtained from the population P

and $\boldsymbol{\Sigma}_Q$ denotes the covariance matrix of the five parameters obtained from the population Q.
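The block structure in (20) is simple to assemble in code. The sketch below uses plain lists of rows; the helper names and the numbers in the two blocks are hypothetical placeholders, since the actual entries are the sampling variances and covariances of the estimated moments and are not reproduced here.

```python
def block_diagonal(sigma_p, sigma_q):
    """Assemble the matrix in (20) from two square blocks given as lists of rows."""
    n_p, n_q = len(sigma_p), len(sigma_q)
    top = [list(row) + [0.0] * n_q for row in sigma_p]      # [Sigma_P, 0]
    bottom = [[0.0] * n_p + list(row) for row in sigma_q]   # [0, Sigma_Q]
    return top + bottom

def diagonal(values):
    """Helper: a diagonal matrix as a list of rows."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical 5 x 5 blocks, for illustration only.
sigma_p = diagonal([0.03, 8.0, 0.007, 0.9, 2.5])
sigma_q = diagonal([0.025, 7.0, 0.006, 0.8, 2.2])
sigma = block_diagonal(sigma_p, sigma_q)
```

The zero off-diagonal blocks encode the independence of the two samples; the diagonal form of each block here is only a simplification for the example.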

Also note that the “Braun and Holland” linear equating method for the NEAT design

(Braun & Holland, 1982; Kolen & Brennan, 1995, p. 146) shares the same parameter vector

π and has the same covariance matrix of π, as in (20).

In this section, we introduced a common parameterization that can be used for most of

the available observed-score linear equating functions in the NEAT design. We showed that one

could write down the Method Function formulas for each of three methods that we analyzed

here, and we think that one could easily write the appropriate Method Function for any other

observed-score linear equating function. However, the investigation of additional equating

functions is beyond the scope of this study.

Standard Error of Equating

In this section, we show that using a common parameterization for all linear equating

functions in a NEAT design leads to a general formula for the SEE.

The delta method, a general method for approximating standard errors that is based on

the Taylor expansion (Rao, 1965; Kendall & Stuart, 1977), is widely used for computing

standard errors. Kolen (1985) and Hanson, Zeng, and Kolen (1993) used the delta method to

compute the SEE for the Tucker method and the Levine method, respectively.

Although we also use the delta method for computing the SEE, our approach differs from

Kolen (1985) and Hanson et al. (1993) in the following sense: We provide a unified approach

that, through the MF, includes not only the Tucker and the Levine methods, but also chain linear

equating and other linear observed-score equating functions—such as the Braun and Holland

linear equating method (Braun & Holland, 1982; Kolen & Brennan, 1995). In order to emphasize

this unity, we focus on the matrix form of the SEEs, (21) below, rather than on the sum form, as

did Kolen (1985) and Hanson et al. (1993). The approach presented here has similarities with the

approach developed in von Davier et al. (2004).

Delta Method Applied to Linear Equating

We use the delta method to calculate the asymptotic variance, $\mathrm{Var}(\mathrm{Lin}_{XY;T}(x; \mathrm{MF}(\boldsymbol{\pi})))$,

whose square root is the SEE.

From the delta method (Theorem A1 in the appendix), it follows that the asymptotic

variance of a smooth function, f, that depends on the parameter vector, π, is

$$\mathrm{Var}(f(\boldsymbol{\pi})) = \mathbf{J}_f(\boldsymbol{\pi})\,\boldsymbol{\Sigma}\,\mathbf{J}_f(\boldsymbol{\pi})^t, \tag{21}$$

where $\mathbf{J}_f(\boldsymbol{\pi})$ is the Jacobian (the matrix or vector of the first derivatives of the function f with

respect to the components of π) computed at the estimated values of π (see also von Davier et

al., 2004; von Davier, 2001).

Let the parameter π from (16) be the parameter vector described in Theorem A1 and let f

be a linear equating function, $\mathrm{Lin}_{XY;T}(x; \mathrm{MF}(\boldsymbol{\pi}))$, given in (1). The Method Function can refer to

any of the Tucker, Levine, and chain linear functions. The Jacobian of $\mathrm{Lin}_{XY;T}(x; \mathrm{MF}(\boldsymbol{\pi}))$ is,

according to matrix differentiation theory and differentiability of composition of functions,

$$\mathbf{J}_f = \mathbf{J}_{\mathrm{Lin}}\,\mathbf{J}_{\mathrm{MF}},$$

where $\mathbf{J}_{\mathrm{Lin}}$ is the vector of the first derivatives of the function from (1) with respect to

$(\mu_{XT}, \sigma^2_{XT}, \mu_{YT}, \sigma^2_{YT})$, and $\mathbf{J}_{\mathrm{MF}}$ is the matrix of the first derivatives of $(\mu_{XT}, \sigma^2_{XT}, \mu_{YT}, \sigma^2_{YT})$ with respect

to the components of π from (16).
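For reference, the four entries of the row vector of first derivatives of (1) can be written out by direct differentiation. This is a worked derivation under the convention that the variances, not the standard deviations, are the arguments, as in (15); it is not copied from Kolen (1985) or from Table A2.

```latex
\begin{aligned}
\mathrm{Lin}_{XY;T}(x) &= \mu_{YT} + \bigl(\sigma^2_{YT}\bigr)^{1/2}\bigl(\sigma^2_{XT}\bigr)^{-1/2}(x - \mu_{XT}), \\[4pt]
\frac{\partial\,\mathrm{Lin}}{\partial \mu_{XT}} &= -\frac{\sigma_{YT}}{\sigma_{XT}}, &
\frac{\partial\,\mathrm{Lin}}{\partial \sigma^2_{XT}} &= -\frac{\sigma_{YT}}{2\,\sigma^3_{XT}}\,(x - \mu_{XT}), \\[4pt]
\frac{\partial\,\mathrm{Lin}}{\partial \mu_{YT}} &= 1, &
\frac{\partial\,\mathrm{Lin}}{\partial \sigma^2_{YT}} &= \frac{x - \mu_{XT}}{2\,\sigma_{XT}\,\sigma_{YT}}.
\end{aligned}
```

Only the two variance derivatives depend on x, and both vanish at $x = \mu_{XT}$.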

In the previous section we showed that the (10 by 10) covariance matrix Σ is the same

for the Tucker, Levine, and chain linear functions. Moreover, the Jacobian $\mathbf{J}_{\mathrm{Lin}}$ will also have the

same form for all observed-score linear equating functions (the Jacobian of the linear function,

for any of the three equating functions, is a 4-dimensional (row) vector). The Jacobian $\mathbf{J}_{\mathrm{MF}}$ is a 4

by 10 matrix and will have a different form for each of the equating methods.

Now, by using (21), the SEE of a linear equating function, $\widehat{\mathrm{Lin}}_{XY;T}(x; \mathrm{MF}(\boldsymbol{\pi}))$, can be

expressed as

$$\mathrm{SEE}^2(x) = \hat{\mathbf{J}}_{\mathrm{Lin}}\,\hat{\mathbf{J}}_{\mathrm{MF}}\,\boldsymbol{\Sigma}\,\hat{\mathbf{J}}_{\mathrm{MF}}^t\,\hat{\mathbf{J}}_{\mathrm{Lin}}^t, \tag{22}$$

with Σ from (20).

Equation (22) is the computational formula for the SEE for the Tucker, Levine, and chain

linear methods, that is, the formula that might be implemented into a computer program.

It is easy to see the computational advantages of having only one formula for the SEE for

all linear equating methods. Note that this formula does not require any distributional assumption

on the variables involved.
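A minimal sketch of how (22) might be coded is given below. It departs from the report in one respect that should be kept in mind: the Jacobian of the composed function is approximated by central finite differences instead of the analytic derivatives from Kolen (1985), Hanson et al. (1993), or Table A2. All function names are hypothetical, and the covariance matrix in the example is a toy matrix chosen so that the answer is known in advance.

```python
import math

def numeric_gradient(f, pi, rel_step=1e-6):
    """Central-difference gradient of a scalar function f at the list pi."""
    grad = []
    for i, value in enumerate(pi):
        h = rel_step * max(1.0, abs(value))
        up, down = list(pi), list(pi)
        up[i], down[i] = value + h, value - h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def quadratic_form(row, sigma):
    """Return row * sigma * row^t for a list `row` and a list-of-rows `sigma`."""
    n = len(row)
    return sum(row[i] * sigma[i][j] * row[j] for i in range(n) for j in range(n))

def see(x, equate, pi_hat, sigma):
    """Equation (22): SEE(x) = sqrt(J_f Sigma J_f^t), with J_f approximated numerically."""
    j_f = numeric_gradient(lambda p: equate(x, p), pi_hat)
    return math.sqrt(max(quadratic_form(j_f, sigma), 0.0))

def chain_linear(x, pi):
    """Chain linear function, last line of (2); pi is ordered as in (16)."""
    mu_xp, var_xp, mu_vp, var_vp, _, mu_yq, var_yq, mu_vq, var_vq, _ = pi
    a = math.sqrt(var_yq / var_vq)
    b = math.sqrt(var_vp / var_xp)
    return mu_yq + a * (mu_vp - mu_vq) + a * b * (x - mu_xp)

# Rounded Table 1 statistics; covariances rebuilt from the rounded correlations (assumption).
pi_hat = [39.25, 17.23**2, 17.05, 8.33**2, 0.88 * 17.23 * 8.33,
          32.69, 16.73**2, 14.39, 8.21**2, 0.87 * 16.73 * 8.21]

# Toy covariance matrix: only the mean of Y in Q is uncertain, with variance 0.04.
sigma_toy = [[0.0] * 10 for _ in range(10)]
sigma_toy[5][5] = 0.04
see_toy = see(40.0, chain_linear, pi_hat, sigma_toy)
```

In the toy case the derivative of the chain linear function with respect to the mean of Y in Q is 1, so the SEE equals the square root of 0.04 at every x; a realistic Σ would give the x-dependent curves discussed in Study 1.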

The entries of Σ in (22) can be obtained from Kolen (1985). The derivatives $\mathbf{J}_f = \mathbf{J}_{\mathrm{Lin}}\mathbf{J}_{\mathrm{MF}}$

for the Tucker equating function are given in Kolen (1985). The derivatives $\mathbf{J}_f = \mathbf{J}_{\mathrm{Lin}}\mathbf{J}_{\mathrm{MF}}$ for the

Levine function for a NEAT design are given in Hanson et al. (1993). The derivatives $\mathbf{J}_f = \mathbf{J}_{\mathrm{Lin}}\mathbf{J}_{\mathrm{MF}}$

for the chain linear equating, given in Table A2, were computed by us.

We use the notations SEET, SEEL, and SEECL to refer to the SEE for the Tucker, Levine,

and chain linear methods, respectively.

SEED for Linear Equating Functions

In this section, we state a new result that is analogous to (21) and that will allow us to

compute a standard error for the difference between two linear equating functions. This standard

error can be used to inform discussion about the final form of an equating function.

The SEED was first introduced in von Davier et al. (2004) for the kernel method of test

equating. This paper applies the same concept to the linear equating functions. The main

differences between the SEED in von Davier et al. (2004) and the SEED here lie in the fact that

the parameters of the equating functions and the equating functions themselves differ. In the

kernel method of test equating, the parameters are the score probabilities of the tests to be

equated (and, in chain equipercentile, also the score probabilities of the anchor test); in the case

of linear equating, the parameters are the means, the variances, and the covariances of the tests to

be equated and of the anchor test in the two populations, P and Q.

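Consider two equating functions, $\mathrm{Lin}(x; \mathrm{MF}_1(\boldsymbol{\pi}))$ and $\mathrm{Lin}(x; \mathrm{MF}_2(\boldsymbol{\pi}))$, which have the

form given in (1) and depend on the same parameter vector from (16) (i.e., the assumptions on

the functions required by the delta method are met). We are interested in

$$\mathrm{Var}\bigl(\mathrm{Lin}(x; \mathrm{MF}_1(\boldsymbol{\pi})) - \mathrm{Lin}(x; \mathrm{MF}_2(\boldsymbol{\pi}))\bigr). \tag{23}$$

Theorem 1. If $\mathrm{Lin}(x; \mathrm{MF}_1(\boldsymbol{\pi}))$ and $\mathrm{Lin}(x; \mathrm{MF}_2(\boldsymbol{\pi}))$ are two equating functions that have

the form given in (1) and depend on the same parameter vector, π, from (16), then

$$\mathrm{Var}\bigl(\mathrm{Lin}(x; \mathrm{MF}_1(\boldsymbol{\pi})) - \mathrm{Lin}(x; \mathrm{MF}_2(\boldsymbol{\pi}))\bigr) = \hat{\mathbf{J}}_{\mathrm{Lin}}\,(\hat{\mathbf{J}}_{\mathrm{MF}_1} - \hat{\mathbf{J}}_{\mathrm{MF}_2})\,\boldsymbol{\Sigma}\,(\hat{\mathbf{J}}_{\mathrm{MF}_1} - \hat{\mathbf{J}}_{\mathrm{MF}_2})^t\,\hat{\mathbf{J}}_{\mathrm{Lin}}^t, \tag{24}$$

where $\mathbf{J}_{\mathrm{Lin}}$ is the 4-dimensional row vector of the first derivatives of the function from (1) with

respect to the parameters on T, $(\mu_{XT}, \sigma^2_{XT}, \mu_{YT}, \sigma^2_{YT})$, $\mathbf{J}_{\mathrm{MF}}$ is the 4 by 10 matrix of the first

derivatives of the four components of the Method Function, $(\mu_{XT}, \sigma^2_{XT}, \mu_{YT}, \sigma^2_{YT})$, with respect to

the components of π, and Σ is the variance-covariance matrix of π, given in (20).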

The proof follows from the delta method (Theorem A1), applied to the difference of two

smooth functions that depend on the same parameters (see also von Davier et al., 2004, chapter 5).

Hence, the SEED is

$$\mathrm{SEED}^2 = \mathrm{Var}\bigl(\mathrm{Lin}(x; \mathrm{MF}_1(\boldsymbol{\pi})) - \mathrm{Lin}(x; \mathrm{MF}_2(\boldsymbol{\pi}))\bigr). \tag{25}$$

Corollary 1. The SEEDs for any pair of the three equating functions, Tucker, Levine, and

chain linear, are:

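$$\mathrm{SEED}^2_{T,L} = \hat{\mathbf{J}}_{\mathrm{Lin}}\,(\hat{\mathbf{J}}_{T} - \hat{\mathbf{J}}_{L})\,\boldsymbol{\Sigma}\,(\hat{\mathbf{J}}_{T} - \hat{\mathbf{J}}_{L})^t\,\hat{\mathbf{J}}_{\mathrm{Lin}}^t, \tag{26}$$

$$\mathrm{SEED}^2_{CL,L} = \hat{\mathbf{J}}_{\mathrm{Lin}}\,(\hat{\mathbf{J}}_{CL} - \hat{\mathbf{J}}_{L})\,\boldsymbol{\Sigma}\,(\hat{\mathbf{J}}_{CL} - \hat{\mathbf{J}}_{L})^t\,\hat{\mathbf{J}}_{\mathrm{Lin}}^t, \tag{27}$$

$$\mathrm{SEED}^2_{T,CL} = \hat{\mathbf{J}}_{\mathrm{Lin}}\,(\hat{\mathbf{J}}_{T} - \hat{\mathbf{J}}_{CL})\,\boldsymbol{\Sigma}\,(\hat{\mathbf{J}}_{T} - \hat{\mathbf{J}}_{CL})^t\,\hat{\mathbf{J}}_{\mathrm{Lin}}^t, \tag{28}$$

with Σ from (20).

The proof follows from Theorem 1.

The entries of $\mathbf{J}_{\mathrm{Lin}}\mathbf{J}_{T}$ are given in Kolen (1985). The entries of $\mathbf{J}_{\mathrm{Lin}}\mathbf{J}_{L}$ are given in

Hanson et al. (1993) and the entries of $\mathbf{J}_{\mathrm{Lin}}\mathbf{J}_{CL}$ are given in Table A2.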

In conclusion, the SEED is a measure of the uncertainty in the difference between two

equating functions that is due to the estimation of the parameters (the means, variances, and

covariances in the two samples). It also reflects the differences in the two Method Functions. We

propose the following practical rule: If the difference between two linear equating functions is no

larger than the noise level in the data, then this difference would be smaller than twice the SEED

in either direction (see also von Davier et al., 2004).
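The SEED and the ±2 SEED rule can be sketched as follows. As an assumption of this sketch, the gradient of the difference between the two equating functions is approximated numerically and inserted into the delta-method quadratic form, following the idea of the proof of Theorem 1; the analytic Jacobians referred to in Corollary 1 are not used. Function names are hypothetical, the parameter values are the rounded Table 1 statistics with covariances rebuilt from the rounded correlations, and the covariance matrix is a made-up diagonal matrix.

```python
import math

W = 10634 / (10634 + 11321)  # weight w from the two sample sizes in Study 1

def chain_linear(x, pi):
    """Chain linear function, last line of (2); pi is ordered as in (16)."""
    mu_xp, var_xp, mu_vp, var_vp, _, mu_yq, var_yq, mu_vq, var_vq, _ = pi
    a, b = math.sqrt(var_yq / var_vq), math.sqrt(var_vp / var_xp)
    return mu_yq + a * (mu_vp - mu_vq) + a * b * (x - mu_xp)

def tucker(x, pi, w=W):
    """Tucker function: (1) evaluated at the moments from (4)-(8)."""
    mu_xp, var_xp, mu_vp, var_vp, cov_xv, mu_yq, var_yq, mu_vq, var_vq, cov_yv = pi
    d_p, d_q = cov_xv / var_vp, cov_yv / var_vq
    dmu, dvar = mu_vp - mu_vq, var_vp - var_vq
    mu_xt = mu_xp - (1 - w) * d_p * dmu
    mu_yt = mu_yq + w * d_q * dmu
    var_xt = var_xp - (1 - w) * d_p**2 * dvar + w * (1 - w) * (d_p * dmu) ** 2
    var_yt = var_yq + w * d_q**2 * dvar + w * (1 - w) * (d_q * dmu) ** 2
    return mu_yt + math.sqrt(var_yt / var_xt) * (x - mu_xt)

def numeric_gradient(f, pi, rel_step=1e-6):
    """Central-difference gradient of a scalar function f at the list pi."""
    grad = []
    for i, value in enumerate(pi):
        h = rel_step * max(1.0, abs(value))
        up, down = list(pi), list(pi)
        up[i], down[i] = value + h, value - h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def seed(x, equate_1, equate_2, pi_hat, sigma):
    """Square root of (23): delta-method standard error of the difference at x."""
    grad = numeric_gradient(lambda p: equate_1(x, p) - equate_2(x, p), pi_hat)
    n = len(grad)
    var = sum(grad[i] * sigma[i][j] * grad[j] for i in range(n) for j in range(n))
    return math.sqrt(max(var, 0.0))

def differs_beyond_noise(x, equate_1, equate_2, pi_hat, sigma):
    """Practical rule: is the difference at x larger than twice the SEED?"""
    diff = equate_1(x, pi_hat) - equate_2(x, pi_hat)
    return abs(diff) > 2 * seed(x, equate_1, equate_2, pi_hat, sigma)

pi_hat = [39.25, 17.23**2, 17.05, 8.33**2, 0.88 * 17.23 * 8.33,
          32.69, 16.73**2, 14.39, 8.21**2, 0.87 * 16.73 * 8.21]
# Hypothetical diagonal covariance matrix of the ten estimated moments.
toy_vars = [0.03, 20.0, 0.007, 1.0, 4.0, 0.025, 18.0, 0.006, 0.9, 3.5]
sigma_toy = [[toy_vars[i] if i == j else 0.0 for j in range(10)] for i in range(10)]
seed_t_cl = seed(40.0, tucker, chain_linear, pi_hat, sigma_toy)
```

Because the covariance matrix here is invented, the resulting SEED values only demonstrate the mechanics and should not be compared with Figures 3 to 6.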

Study 1

Here we illustrate how the general formula for the SEE and a new tool, the SEED, for the

Tucker, Levine, and chain linear methods can be applied using an example that involves data

from two national administrations of a high volume testing program. The two testing

administrations were in the fall of 2001 (P) and in the winter of 2000 (Q).

We consider this example to be an informative one, in the sense that it departs from the

ideal conditions described in von Davier (2003) under which the equating methods give the same

results. Moreover, as seen later, the difference between the three equating functions of interest is

about half a score point or more, which is a difference that matters for the program from which the

data come. (A difference in equating results that is large enough to make a difference in the

reported scores is called a difference that matters.)

The data, which were collected following a NEAT design with an external anchor,

consisted of the raw sample frequencies of rounded formula scores for two parallel, 78 item tests

and a 35 item external anchor test given to two samples from a national population of examinees.

(The rounded formula scores are scores in which “the right minus a quarter wrong” formula

scores are rounded to integers.) In this study, the negative scores were rounded to zero.

The data are sample frequencies for two bivariate distributions. We denote the two sets of

sample frequencies by $n_{jl}$ = number of examinees with $X = x_j$ and $V = v_l$, and $m_{kl}$ = number of

examinees with $Y = y_k$ and $V = v_l$.

In this example, $x_1 = 0, x_2 = 1, \ldots, x_{79} = 78$; the same is true for $y_k$. For $v_l$, we have

$v_1 = 0, v_2 = 1, \ldots, v_{36} = 35$. The two sample sizes are given by: N = 10,634 and M = 11,321. The

sample correlation of X and V in P was 0.88, and the sample correlation of Y and V in Q was

0.87.

Table 1

Summary Statistics for the Observed Distributions of X, Y, V in P and V in Q

          X        Y        V in P   V in Q
Mean      39.25    32.69    17.05    14.39
SD        17.23    16.73     8.33     8.21

From Table 1 we see that the mean of the anchor test V is 17.05 (±0.08) in population P,

and 14.39 (±0.08) in Q, where 0.08 is the standard error of the mean. Thus Q is a less proficient

population than P, as measured by V. In terms of effect sizes, the difference between these two

means (2.66) is approximately 32% of the average standard deviation of 8.27. For this type of

testing program, a mean difference of this magnitude indicates a fairly large difference between

the two populations.
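The figures quoted in this paragraph can be reproduced from Table 1 and the two sample sizes. The short check below is only an arithmetic illustration; the variable names are hypothetical.

```python
import math

n_p, n_q = 10634, 11321          # sample sizes N and M
mean_vp, sd_vp = 17.05, 8.33     # anchor V in P (Table 1)
mean_vq, sd_vq = 14.39, 8.21     # anchor V in Q (Table 1)

# Standard error of each anchor mean: SD / sqrt(sample size).
se_mean_vp = sd_vp / math.sqrt(n_p)
se_mean_vq = sd_vq / math.sqrt(n_q)

# Effect size: mean difference relative to the average of the two SDs.
mean_diff = mean_vp - mean_vq
avg_sd = (sd_vp + sd_vq) / 2
effect_size = mean_diff / avg_sd
```

Rounded to two decimals, both standard errors come to 0.08, the mean difference to 2.66, the average standard deviation to 8.27, and the effect size to 0.32.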

Before chain linear equating was in use, ETS researchers were guided by the following

rules when they had to choose between Tucker and Levine equating: “If the standardized mean

difference of the anchor scores in the two samples is smaller than 0.25, then choose the Tucker

method,” and “If the ratio of the variances of the anchor in the two samples is between 0.80 and

1.25, then use the Tucker method” (Kirk, 1971; Wichert, 1967). We could not find any rational

explanation for these rules, especially for the cut-off values. Kolen and Brennan (1995, pp. 131–

132), however, suggest “choosing Levine when it is known that populations differ

substantially…and if there is also reason to believe that the forms are quite similar” and choosing

Tucker if the forms are suspected to differ, with the observation that “if the populations [and

forms] are too dissimilar, then any equating is suspect” and with the note that “this ad hoc

reasoning is by no means definitive.”
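The two quoted rules of thumb can be evaluated mechanically. In the sketch below the function name is hypothetical, and two details are assumptions because the quoted rules do not spell them out: the mean difference is standardized by the average of the two anchor standard deviations, and the variance ratio is taken as P over Q with inclusive bounds.

```python
def tucker_rule_checks(mean_vp, sd_vp, mean_vq, sd_vq):
    """Evaluate the two historical rules of thumb on the anchor statistics."""
    avg_sd = (sd_vp + sd_vq) / 2                 # assumption: average-SD standardization
    std_mean_diff = abs(mean_vp - mean_vq) / avg_sd
    var_ratio = sd_vp**2 / sd_vq**2              # assumption: ratio taken as P over Q
    return {
        "std_mean_diff": std_mean_diff,
        "mean_rule_favors_tucker": std_mean_diff < 0.25,
        "var_ratio": var_ratio,
        "variance_rule_favors_tucker": 0.80 <= var_ratio <= 1.25,
    }

# Anchor statistics from Table 1.
checks = tucker_rule_checks(17.05, 8.33, 14.39, 8.21)
```

The function only reports the two quantities and the two comparisons; it does not choose an equating method.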

Hence, based on this information, one would have chosen the Levine equating function

particularly for this example since the test forms are very carefully constructed to be parallel in

this assessment program.

We used the formulas (1), (4)–(10) to compute the Tucker and Levine functions. We used

(2) to compute the chain linear equating function. The equating functions, the SEEs, and the

SEEDs are discrete functions of x.

The three functions, shown in Figure 1, give relatively different results. The differences

between the Tucker and Levine functions and the Tucker and chain functions are more than

half a raw score point for the whole score range, which is a difference that matters. The difference

between the Levine function and the chain function is less than the size of a difference that matters

for the whole score range.

[Figure 1 plot: equating difference (vertical axis) against X-score (horizontal axis, 0 to 80) for the pairs T-L, T-C, and L-C.]

Figure 1. The Tucker, Levine, chain linear functions. Study 1. NEAT design with an external anchor.

The three SEEs are given in Figure 2. The shape of the SEEs is the usual one for linear

equating functions, with lower values around the means and higher values for the extreme score

ranges. The SEE for the Levine function seems to be larger than the SEE for the chain linear

function and for the Tucker function, which has the smallest values over almost the whole score

range (see Figure 2). The SEEs for the three functions are very close to each other, though, and

therefore, one could not choose a method solely based on these SEE values.

[Figure 2 plot: SEE (vertical axis) against X-score (horizontal axis) for the T, L, and C functions.]

Figure 2. The SEET, SEEL, and SEEC. Study 1. NEAT design with an external anchor.

We then used (24)–(28) to compute the SEED for each pair of linear equating functions

(Figure 3). From the results plotted in Figure 3, we might conclude that the accuracy of the

difference between the Levine and chain linear functions is very high in the middle of the score

range, but relatively low for the lower and upper score range. In contrast, the accuracy of the

difference between the Tucker and chain functions and Tucker and Levine functions varies less

across the score range, being relatively high in the middle of the score range. Since the accuracy

of estimating the parameter π is the same for the three equating functions and the vector of the

first derivatives of the linear function is also the same for the three equating functions, this plot

reflects the differences in the pairs of the Method Functions (more exactly, the differences in the

first derivatives of these functions)—see also Corollary 1.

[Figure 3 plot: SEED versus X-score; series T-L, T-C, and L-C.]

Figure 3. The standard error of equating differences for three equating functions. Study 1. NEAT design with an external anchor.

Figures 4–6 plot the difference between two linear equating functions together with the

corresponding ±2 SEED. In these three cases, the differences between the three functions (about

half of a raw score point or more—see also Figure 1) are statistically significant relative to the

SEEDs. It appears that the Levine and the chain functions agree only at the very low end of the

score range. As mentioned before, the SEEDs reflect the uncertainty in these differences that is due to the estimation of the parameters (the means, the variances, and the covariances in the two samples) as well as to the differences in the Method Functions.

[Figure 4 plot: T-L difference versus X-score, with the curves 2*SEED(T,L) and -2*SEED(T,L).]

Figure 4. The difference between Tucker and Levine together with a band of ±2SEED_{T,L}. Study 1. NEAT design with an external anchor.

[Figure 5 plot: T-C difference versus X-score, with the curves 2*SEED(T,C) and -2*SEED(T,C).]

Figure 5. The difference between Tucker and chain linear together with a band of ±2SEED_{T,C}. Study 1. NEAT design with an external anchor.

[Figure 6 plot: L-C difference versus X-score, with the curves 2*SEED(L,C) and -2*SEED(L,C).]

Figure 6. The difference between Levine and chain linear functions together with a band of ±2SEED_{L,C}. Study 1. NEAT design with an external anchor.

In conclusion, we observe that the differences between the three equating functions are

statistically significant. In other words, with the help of the SEED, we can distinguish between

noise and real differences between the analyzed functions.
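
The ±2SEED comparison used in Figures 4–6 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `significant_scores` and the numbers are hypothetical and are not taken from the study; the rule applied is simply "flag a score when the absolute difference exceeds twice the SEED."

```python
def significant_scores(diff, seed, k=2.0):
    """Return the indices of score points where |difference| exceeds k * SEED."""
    if len(diff) != len(seed):
        raise ValueError("diff and seed must have the same length")
    return [i for i, (d, s) in enumerate(zip(diff, seed)) if abs(d) > k * s]


# Hypothetical differences between two equating functions and their SEEDs
# at five raw-score points (illustrative values only, not the study's data).
diff = [-0.10, -0.40, -0.70, -0.90, -0.20]
seed = [0.12, 0.05, 0.03, 0.05, 0.12]

flagged = significant_scores(diff, seed)
print(flagged)  # indices where the difference falls outside the +/-2*SEED band
```

With these illustrative inputs the two extreme score points fall inside the band, which mirrors the pattern of lower accuracy at the ends of the score range.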

Study 2

The SEEDs are asymptotic results, so it is of interest to investigate how they vary with

sample size. Sample sizes of 10,000 are relatively large, and therefore, the estimation of the

parameters is relatively accurate. As a consequence, the ±2SEED band will be very narrow.

Study 2 examined the following research questions:

What is going to happen to the SEED when the sample sizes get smaller?

For which N will the ±2SEED band be about half of a raw score point (a difference that

matters)?

For which N will the ±2SEED band encompass the difference between the equating

functions? More precisely, for which N will the SEED not be able to detect that the equating

functions differ statistically?
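
A rough way to anticipate the answers, before any resampling, is to assume that the SEED shrinks in proportion to 1/√N, as an asymptotic standard error typically does. The sketch below rests on that assumption and on both samples shrinking by the same factor; the reference values are hypothetical, and actual resampled SEEDs will also vary because the parameter estimates themselves change from sample to sample.

```python
import math


def projected_seed(seed_ref, n_ref, n_new):
    """Scale a reference SEED to a new sample size, assuming SEED ~ 1/sqrt(N)."""
    return seed_ref * math.sqrt(n_ref / n_new)


def n_for_band(seed_ref, n_ref, half_width):
    """Sample size at which 2*SEED equals half_width under the same assumption."""
    return n_ref * (2.0 * seed_ref / half_width) ** 2


# Hypothetical reference point: SEED = 0.04 at N = 10,000 (not from the study).
seed_ref, n_ref = 0.04, 10_000

seed_at_100 = projected_seed(seed_ref, n_ref, 100)
n_half_point = n_for_band(seed_ref, n_ref, 0.5)
print(seed_at_100, n_half_point)
```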

We resampled seven samples of sizes 5,000; 2,500; 1,700; 800; 400; 200; and 100 for

each group of students from P (those who took (X, V)) and Q (those who took (Y, V)),

respectively. These samples were independent random samples, drawn without replacement from

the original N = 10,634 for (X, V) and M = 11,321 for (Y, V). Sorted, uniformly distributed

random numbers between 0 and 1 (including 0 and 1) were generated in Microsoft Excel using

the RAND function. The steps we used for sampling are as follows for each sample size within

each population:

1) Assign a random number from the uniform distribution between 0 and 1 to

each case (person) in the group. There are N cases in the first group.

2) Sort these N random numbers.

3) The first NS cases (where NS is the size of the new sample and NS is less than or equal to N) are chosen to be included in the new sample.

We repeated the same procedure for the second group.
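
The three sampling steps above can be sketched as follows. This is a minimal illustration, not the authors' Excel workflow: Python's `random.random()` (which draws from [0, 1)) stands in for the RAND function, and the function name `draw_subsample`, the seed, and the integer stand-ins for examinee records are assumptions.

```python
import random


def draw_subsample(cases, ns, rng):
    """Draw ns cases without replacement by sorting on random uniform keys."""
    if ns > len(cases):
        raise ValueError("ns must be less than or equal to the number of cases")
    keyed = [(rng.random(), case) for case in cases]  # step 1: assign a random number
    keyed.sort(key=lambda pair: pair[0])              # step 2: sort by that number
    return [case for _, case in keyed[:ns]]           # step 3: keep the first ns cases


rng = random.Random(2003)           # fixed seed so the sketch is reproducible
group_p = list(range(10634))        # stand-ins for the N = 10,634 records in P
sample_p = draw_subsample(group_p, 800, rng)
print(len(sample_p))
```

Sorting on independent uniform keys gives every subset of size NS the same chance of selection, so the result is a simple random sample without replacement.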

The summary statistics for X, Y, and V in P and Q in the new samples are given in Tables

2a and 2b.

Table 2a

Summary Statistics for the Distributions of X, Y, V in P and V in Q, for the Samples With Different Sample Sizes, NS and MS

                N = 10,634   NS = 5,000   NS = 2,500   NS = 1,700
Statistic       M = 11,321   MS = 5,000   MS = 2,500   MS = 1,700
μ_XP            39.25        39.29        39.28        38.90
σ²_XP           17.22        17.21        17.35        17.28
μ_VP            17.05        17.04        17.10        16.84
σ²_VP           8.33         8.31         8.36         8.33
σ_{X,V;P}       126.43       126.04       129.31       126.16
μ_YQ            32.68        32.78        33.00        32.62
σ²_YQ           16.72        16.79        16.86        16.59
μ_VQ            14.38        14.51        14.52        14.31
σ²_VQ           8.20         8.20         8.25         8.11
σ_{Y,V;Q}       120.11       120.35       120.51       116.94

Moreover, we took care to preserve the same sign as that for the differences of the means

in the two samples. We also took care to approximately preserve the same effect sizes (with

respect to the difference in the ability in the two populations as measured by the anchor) across

the samples (for example, we resampled a second set of samples of size 100 in order to preserve

the same sign for the differences of the means in the two samples). It is important to note that, although the resampling was carefully carried out, the parameter estimates in the smaller samples fluctuate around the values in the original samples. This sampling variability also affects the computed equating functions and their differences.

Table 2b

Summary Statistics for the Distributions of X, Y, V in P and V in Q, for the Samples With Different Sample Sizes, NS and MS

                NS = 800     NS = 400     NS = 200     NS = 100
Statistic       MS = 800     MS = 400     MS = 200     MS = 100
μ_XP            38.51        39.28        39.87        38.09
σ²_XP           17.47        17.63        17.73        18.0
μ_VP            16.74        17.03        17.54        17.04
σ²_VP           8.43         8.62         8.34         8.86
σ_{X,V;P}       130.77       135.23       132.62       146.31
μ_YQ            32.44        32.12        32.28        33.77
σ²_YQ           16.58        16.78        16.33        16.83
μ_VQ            13.99        14.08        14.10        14.86
σ²_VQ           8.19         8.35         8.32         8.14
σ_{Y,V;Q}       117.68       121.43       121.43       121.54

Figures 7 to 13 plot only the differences between the Tucker and the chain linear equating functions together with ±2SEED_{T,C}. Study 2 focuses on the SEED's behavior for small and medium sample sizes, and for this purpose it does not matter which pair of functions we focus on. The results for the differences between the Tucker and Levine functions are similar to the results for Tucker and chain linear, while the results for the differences between chain linear and Levine are as in Figure 6. Each figure illustrates one sample size, with NS = MS = 100, 200, 400, 800, 1,700, 2,500, and 5,000, respectively.

We notice that when the sample sizes are small, the uncertainty related to computing the equating functions is large relative to the difference between the two functions: the SEED increases from 0.025–0.12 in the original sample (see Figures 3 and 5) to 0.3–0.6 when NS = MS = 100 (see Figure 7). Hence, with a sample size of 100 available, we would conclude that the differences between the two equating functions are not statistically significant. Moreover, the ±2SEED_{T,C}, in absolute value, is larger than a difference that matters.

[Figure 7 plot: T-C difference versus X-score, with the curves 2*SEED(T,C) and -2*SEED(T,C).]

Figure 7. The difference between Tucker and chain linear functions together with a band of ±2SEED_{T,C}. Study 2, N = 100. NEAT design with an external anchor.

For a sample size of 200, the differences between the Tucker and chain functions are statistically significant and the ±2SEED_{T,C} is about the size of a difference that matters (see Figure 8). However, at the lower and upper ends of the score range, the difference between the two equating functions is inside the band provided by the ±2SEED_{T,C}. One of the reasons is that the accuracy is lower at the extremes of the score range.

[Figure 8 plot: T-C difference versus X-score, with the curves 2*SEED(T,C) and -2*SEED(T,C).]

Figure 8. The difference between Tucker and chain linear functions together with a band of ±2SEED_{T,C}. Study 2, N = 200. NEAT design with an external anchor.

For a sample size of 400, the differences between the Tucker and chain functions are statistically significant over most of the score range, and the ±2SEED_{T,C} is about the size of a difference that matters (see Figure 9).

[Figure 9 plot: T-C difference versus X-score, with the curves 2*SEED(T,C) and -2*SEED(T,C).]

Figure 9. The difference between Tucker and chain linear functions together with a band of ±2SEED_{T,C}. Study 2, N = 400. NEAT design with an external anchor.

For all the larger samples, the differences between the Tucker and chain functions are

statistically significant over all of the score range (see Figures 10–13).

[Figure 10 plot: T-C difference versus X-score, with the curves 2*SEED(T,C) and -2*SEED(T,C).]

Figure 10. The difference between Tucker and chain linear functions together with a band of ±2SEED_{T,C}. Study 2, N = 800. NEAT design with an external anchor.

[Figure 11 plot: T-C difference versus X-score, with the curves 2*SEED(T,C) and -2*SEED(T,C).]

Figure 11. The difference between Tucker and chain linear functions together with a band of ±2SEED_{T,C}. Study 2, N = 1,700. NEAT design with an external anchor.

[Figure 12 plot: T-C difference versus X-score, with the curves 2*SEED(T,C) and -2*SEED(T,C).]

Figure 12. The difference between Tucker and chain linear functions together with a band of ±2SEED_{T,C}. Study 2, N = 2,500. NEAT design with an external anchor.

[Figure 13 plot: T-C difference versus X-score, with the curves 2*SEED(T,C) and -2*SEED(T,C).]

Figure 13. The difference between Tucker and chain linear functions together with a band of ±2SEED_{T,C}. Study 2, N = 5,000. NEAT design with an external anchor.

With a sample size of 200 available, we conclude that the two equating functions differ significantly over most of the score range. For larger sample sizes, we notice that the accuracy increases (i.e., the ±2SEED_{T,C} decreases in absolute value), and one can conclude that the differences between the two functions are statistically significant.

It follows that for this data set, a sample size of 200 seems to be enough for the SEED to

detect that the two equating methods, Tucker and chain, differ statistically. The level of accuracy

is slightly decreased for this small sample size. More studies are necessary to investigate the

SEED behavior in small and medium samples. Given that in most of the practical equating

situations the sample sizes are much larger, the SEED probably will detect whether the

differences between the equating methods are significant.

In von Davier (2003), several idealized conditions are described under which the three methods give the same results. In practical applications, however, each of these conditions holds only approximately. In a real-life situation, when plots like those in Figures 4–6 indicate that the differences between the methods are statistically significant, which of the methods should one choose?

From a score-reporting point of view, it does matter which method one would choose in this example because the differences between the results from the Tucker method and the others do have an impact on the final results. (These differences are larger than half a raw score point for most of the raw-score range of X.) From Study 1, we can conclude that the Tucker function lies far from the other two equating functions and that the chain linear function lies between the Tucker and Levine functions ($\mathrm{Lin}_{XY;T}(x; \mathrm{MF}_{T}(\pi)) < \mathrm{Lin}_{XY;T}(x; \mathrm{MF}_{CL}(\pi)) < \mathrm{Lin}_{XY;T}(x; \mathrm{MF}_{L}(\pi))$). Moreover, all observed differences are statistically significant.

We cannot make the decision about the final equating function using the SEED alone, if

each of the equating methods relies on a different set of assumptions. We also cannot resolve the

choice between the methods by directly checking their assumptions (T1–T2, L1–L3, C1–C3)

against the data, since these assumptions are not directly testable.

In a practical situation, one will also investigate the issues related to the possible

nonlinearity of the appropriate equating function. In addition, one should also investigate the

SEE for each equating function. The equating results with a higher accuracy (smaller SEEs)

should prevail. However, in Study 1, the differences in the SEEs were very small and therefore,

it would be difficult to use them for making the decision.

As mentioned before, for this example, where the two populations seem to be dissimilar,

using the rules and discussion previously presented, one would choose the Levine equating

function (when choosing between Tucker and Levine methods), since usually the test forms are

very carefully constructed in this assessment program. Hence, the final decision would appear to

be between the Levine and the chain functions.

At this point, one’s belief in the plausibility of each set of assumptions “appears to be the

sole basis left for making this important judgment” (von Davier et al., 2004, p. 194). Further

research in this area is necessary. The advantages of the SEED are outlined in the next section.

Discussion

This paper takes a new perspective on linear equating. It introduces a unified approach to

linear equating in the NEAT design by developing a common parameterization that allows one to

emphasize the similarities between different methods. Based on this common parameterization,

we claim that there is only one definition of observed-score linear equating in the NEAT design,

given in (1), which might take different forms under different assumptions.

We use a new concept, the Method Function, to distinguish among the possible forms

that a linear equating function in a NEAT design might take (in particular among the three

equating methods investigated here—Tucker, Levine, and chain linear equating). By using this

approach, the SEE formula and concept also become unified, covering all of the particular equating functions.

The new approach to linear equating provides a better understanding of equating in

general as well as of the SEE. This view is provided here for the first time (to our knowledge).

Because a single SEE formula covers all of these equating functions, a computer program needs to implement it only once.

We also present a new tool, the standard error of equating difference (SEED), to

investigate if the observed difference in equating functions is statistically significant. Although

the SEED is an asymptotic result, it seems to be stable enough to detect the differences in a

sample size of 200 for the data investigated here. Additional studies might be necessary to

describe the behavior of the SEED for small and medium sample sizes for different data.

The SEED provides an additional measure to consider when making decisions about the

final equating function, especially for medium sample sizes. It is important to know whether the observed differences between two equating functions are statistically significant or reflect only random error. This issue has been investigated extensively in empirical studies; as Harris and Crouse (1993, p. 219) conclude:

Perhaps the most common process followed in conducting an equating study is to

apply a series of equating methods to a particular situation. Usually all that can be

concluded from such a comparison is whether the methods appear to be providing

similar or dissimilar results, and even that cannot be determined with any

accuracy, because one generally does not have a baseline by which to judge if the

differences between results are simply the result of random error, or something

else.

The SEED is exactly the answer to the second part of Harris and Crouse’s remark: The

SEED can tell if the observed differences are the result of random error or not. While it does not

solve the problem of how to decide between different equating functions, it is a step forward in

providing more insight and information that one can use when making this decision.

Harris and Crouse (1993) reviewed all criteria and methods that researchers had

developed for improving this decisional process up to 1993. Three other methods can be

considered:

1. Investigating how sensitive each of the equating functions is to the population

invariance assumption (see Dorans & Holland, 2000; von Davier et al., 2003). The

method introduced in von Davier et al. (2003), though promising, needs additional

research.

2. Carrying out a score equity analysis proposed in Dorans (2003). This is also an

approach to the study of population invariance, but it focuses on different issues:

specifying the number of subpopulations that should be investigated, checking if

the subpopulation score distributions are similar, computing the standardized

difference between the means in the important subpopulations, and using the

Dorans and Holland measure (2000) to investigate the population invariance of

the equating function.

3. Comparing the first several moments of the distribution obtained through equating

with those of the distribution of the old form (the targeted distribution; see von Davier et al., 2004, chapter 4).

It is also worth noting that an approach similar to the one outlined here (with general formulas for the SEE and SEED) is being developed to investigate the differences between linear and nonlinear equating functions in the framework of the kernel method of test equating (see von Davier et al., 2004). A similar SEED formula is not feasible for classical equipercentile equating (which uses linear interpolation as a continuization procedure) because the resulting equating function is not continuously differentiable at the endpoints of the linear segments (and therefore the delta method cannot be applied). A bootstrap SEED might be devised for this situation, which would be an interesting topic for further research.

References

Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing

Service. (Reprinted from Educational measurement, 2nd ed., pp. 508–600, by R. L.

Thorndike, Ed., 1971, Washington, DC: American Council on Education.)

Braun, H. I., & Holland, P. W. (1982). Observed-score test equating: A mathematical analysis of

some ETS equating procedures. In P. W. Holland & D. B. Rubin (Eds.), Test equating

(pp. 9–49). New York: Academic.

von Davier, A. A. (2001). Testing unconfoundedness in regression models with normally

distributed variables. Aachen: Shaker Verlag.

von Davier, A. A. (2003). Notes on linear equating methods for the Non-Equivalent Groups

design (ETS RR-03-24). Princeton, NJ: Educational Testing Service.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating.

New York: Springer Verlag.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2003). Population invariance and chain

versus post-stratification methods for equating and test linking. In N. Dorans (Ed.),

Population invariance of score linking: Theory and applications to Advanced Placement

Program® Examinations (ETS RR-03-27). Princeton, NJ: Educational Testing Service.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (in press). The chain and post-stratification

methods for observed-score equating: Their relationship to population invariance.

Journal of Educational Measurement.

Dorans, N. J. (2003, May 16). Score equity analysis. Paper presented at the Ledyard R. Tucker

Psychometric Workshop, Educational Testing Service, Princeton, NJ.

Dorans, N. J., & Holland, P. W. (2000). Population invariance and equitability of tests: Basic

theory and the linear case. Journal of Educational Measurement, 37, 281–306.

Hanson, B. A., Zeng, L., & Kolen, M. J. (1993). Standard errors of Levine linear equating.

Applied Psychological Measurement, 17, 225–237.

Harris, D. J., & Crouse, J. D. (1993). A study of criteria used in equating. Applied Measurement

in Education, 6(3), 195–240.

Holland, P. W., King, B. F., & Thayer, D. T. (1989). The standard error of equating for the

kernel method of equating score distributions (ETS PSRTR-89-83, ETS RR-89-06).

Princeton, NJ: Educational Testing Service.

Holland, P. W., & Thayer, D. T. (2000). Univariate and bivariate loglinear models for discrete

score distributions. Journal of Educational and Behavioral Statistics, 25, 133–183.

Kendall, M., & Stuart, A. (1977). The advanced theory of statistics (4th ed., Vol. 1). New York:

Macmillan.

Kirk, D. B. (1971). Toward a better understanding of the equating process. Unpublished

manuscript.

Kolen, M. J. (1985). Standard errors of Tucker equating. Applied Psychological Measurement, 9,

209–223.

Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York:

Springer.

Rao, C. R. (1965). Linear statistical inference and its applications. New York: Wiley.

Wichert, V. E. (1967). Methods of equating test forms and an equating computer system.

Unpublished manuscript.

Appendix

Delta Method

Theorem A1. Suppose that there is given a sequence of statistical models indexed by $n \in \mathbb{N}$ (usually the sample size) with the same parameter space $\Theta$, which is a nonempty open subset of $\mathbb{R}^m$. Let $\hat{\pi}_n$ be a sequence of vector statistics, such that $\hat{\pi}_n$ is an asymptotically normal estimator for $\pi$, that is,

$$\sqrt{n}\,(\hat{\pi}_n - \pi) \xrightarrow{d} N(0, \Sigma(\pi)), \quad \forall \pi \in \Theta,$$

where $N(0, \Sigma(\pi))$ denotes the multivariate normal distribution with expectation zero and covariance matrix $\Sigma(\pi)$. Consider a function, $f$, of $\pi$ with $f: \Theta \to \mathbb{R}^p$, $m \geq p$, and assume that $f$ is continuously differentiable on $\Theta$. By $J_f(\pi)$, we denote the $p$ by $m$ Jacobian matrix of $f$ at $\pi$ (the matrix of the first derivatives of $f$ by the components of $\pi$). Then the distribution of $\sqrt{n}\,(f(\hat{\pi}_n) - f(\pi))$ converges to the multivariate normal distribution with expectation zero and covariance matrix $J_f(\pi)\,\Sigma(\pi)\,J_f(\pi)^t$.
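
To make the covariance formula of the theorem concrete, the following sketch evaluates $J_f(\pi)\,\Sigma(\pi)\,J_f(\pi)^t$ for a toy function. Everything in it is an assumption for illustration: the function $f(\pi) = \pi_1/\pi_2$, the parameter values, the covariance matrix, and the helper names `jacobian` and `delta_cov`. The Jacobian is approximated by central differences rather than derived analytically.

```python
def jacobian(f, pi, h=1e-6):
    """Central-difference Jacobian of f at pi (p rows, m columns)."""
    p = len(f(pi))
    jac = [[0.0] * len(pi) for _ in range(p)]
    for j in range(len(pi)):
        up, down = list(pi), list(pi)
        up[j] += h
        down[j] -= h
        f_up, f_down = f(up), f(down)
        for i in range(p):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * h)
    return jac


def delta_cov(jac, sigma):
    """Return the p-by-p matrix J * Sigma * J^t."""
    p, m = len(jac), len(sigma)
    js = [[sum(jac[i][k] * sigma[k][j] for k in range(m)) for j in range(m)]
          for i in range(p)]
    return [[sum(js[i][k] * jac[j][k] for k in range(m)) for j in range(p)]
            for i in range(p)]


# Toy example (hypothetical values): f(pi) = pi_1 / pi_2.
def ratio(pi):
    return [pi[0] / pi[1]]


pi = [4.0, 2.0]
sigma = [[1.0, 0.2],
         [0.2, 0.5]]

cov = delta_cov(jacobian(ratio, pi), sigma)
print(cov[0][0])  # asymptotic variance of sqrt(n) * (f(pi_hat) - f(pi))
```

For this toy function the analytic Jacobian is (1/π₂, −π₁/π₂²) = (0.5, −1), so the quadratic form equals 0.25 − 0.2 + 0.5 = 0.55, which the numerical result should reproduce closely.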

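Table A1

The Method Function for Tucker, Levine, and Chain Linear Equating

Each Method Function (MF) returns the four moments $\mu_{XT}$, $\sigma^2_{XT}$, $\mu_{YT}$, and $\sigma^2_{YT}$.

MF_T (Tucker):
$\mu_{XT} = \mu_{XP} - (1-w)\,\alpha_P\,(\mu_{VP} - \mu_{VQ})$
$\sigma^2_{XT} = \sigma^2_{XP} - (1-w)\,\alpha_P^2\,(\sigma^2_{VP} - \sigma^2_{VQ}) + w(1-w)\,\alpha_P^2\,(\mu_{VP} - \mu_{VQ})^2$
$\mu_{YT} = \mu_{YQ} + w\,\alpha_Q\,(\mu_{VP} - \mu_{VQ})$
$\sigma^2_{YT} = \sigma^2_{YQ} + w\,\alpha_Q^2\,(\sigma^2_{VP} - \sigma^2_{VQ}) + w(1-w)\,\alpha_Q^2\,(\mu_{VP} - \mu_{VQ})^2$

MF_L (Levine):
$\mu_{XT} = \mu_{XP} - (1-w)\,\gamma_P\,(\mu_{VP} - \mu_{VQ})$
$\sigma^2_{XT} = \sigma^2_{XP} - (1-w)\,\gamma_P^2\,(\sigma^2_{VP} - \sigma^2_{VQ}) + w(1-w)\,\gamma_P^2\,(\mu_{VP} - \mu_{VQ})^2$
$\mu_{YT} = \mu_{YQ} + w\,\gamma_Q\,(\mu_{VP} - \mu_{VQ})$
$\sigma^2_{YT} = \sigma^2_{YQ} + w\,\gamma_Q^2\,(\sigma^2_{VP} - \sigma^2_{VQ}) + w(1-w)\,\gamma_Q^2\,(\mu_{VP} - \mu_{VQ})^2$

MF_C (chain linear):
$\mu_{XT} = \mu_{XP} + \dfrac{\sigma_{XP}}{\sigma_{VP}}\,(\mu_{VT} - \mu_{VP})$
$\sigma^2_{XT} = \sigma^2_{XP}\,\dfrac{\sigma^2_{VT}}{\sigma^2_{VP}}$
$\mu_{YT} = \mu_{YQ} + \dfrac{\sigma_{YQ}}{\sigma_{VQ}}\,(\mu_{VT} - \mu_{VQ})$
$\sigma^2_{YT} = \sigma^2_{YQ}\,\dfrac{\sigma^2_{VT}}{\sigma^2_{VQ}}$

Note. The αs and the γs are given in (8)–(13).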
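
The way a Method Function feeds the single linear equating function can be sketched in code. The sketch assumes that (1) has the standard linear form $\mu_{YT} + (\sigma_{YT}/\sigma_{XT})(x - \mu_{XT})$; the names `lin` and `mf_chain`, the dictionary keys, and all numerical values are hypothetical. Only the chain linear row of Table A1 is implemented.

```python
import math


def lin(x, mu_xt, var_xt, mu_yt, var_yt):
    """Linear equating of x onto the Y scale on T (assumed form of equation (1))."""
    return mu_yt + math.sqrt(var_yt / var_xt) * (x - mu_xt)


def mf_chain(p, mu_vt, var_vt):
    """Chain linear Method Function: returns (mu_XT, var_XT, mu_YT, var_YT)."""
    mu_xt = p["mu_xp"] + math.sqrt(p["var_xp"] / p["var_vp"]) * (mu_vt - p["mu_vp"])
    var_xt = p["var_xp"] * var_vt / p["var_vp"]
    mu_yt = p["mu_yq"] + math.sqrt(p["var_yq"] / p["var_vq"]) * (mu_vt - p["mu_vq"])
    var_yt = p["var_yq"] * var_vt / p["var_vq"]
    return mu_xt, var_xt, mu_yt, var_yt


# Hypothetical moments (variances, not standard deviations); not the study's data.
p = {"mu_xp": 40.0, "var_xp": 289.0, "mu_vp": 17.0, "var_vp": 64.0,
     "mu_yq": 33.0, "var_yq": 256.0, "mu_vq": 14.5, "var_vq": 49.0}

# Two different (hypothetical) choices for the anchor moments on T.
y_a = lin(50.0, *mf_chain(p, mu_vt=16.0, var_vt=60.0))
y_b = lin(50.0, *mf_chain(p, mu_vt=15.0, var_vt=55.0))
print(y_a, y_b)
```

In this sketch the two calls agree up to rounding: for the chain row, the anchor moments on T cancel once the four moments are substituted into the linear function, leaving an expression in the moments of P and Q only.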

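Table A2

The Entries of the $J_{\mathrm{Lin}} J_{\mathrm{MF}}$ for Chain Linear Equating

Parameter            Derivative

$\mu_{YQ}$           $1$

$\mu_{VP}$           $\sqrt{\dfrac{\sigma^2_{YQ}}{\sigma^2_{VQ}}}$

$\mu_{XP}$           $-\sqrt{\dfrac{\sigma^2_{YQ}}{\sigma^2_{VQ}}} \cdot \sqrt{\dfrac{\sigma^2_{VP}}{\sigma^2_{XP}}}$

$\mu_{VQ}$           $-\sqrt{\dfrac{\sigma^2_{YQ}}{\sigma^2_{VQ}}}$

$\sigma^2_{YQ}$      $\dfrac{1}{2\sqrt{\sigma^2_{YQ}\,\sigma^2_{VQ}}} \cdot \left[\mu_{VP} - \mu_{VQ} + \sqrt{\dfrac{\sigma^2_{VP}}{\sigma^2_{XP}}}\,(x - \mu_{XP})\right]$

$\sigma^2_{VQ}$      $-\dfrac{1}{2\sigma^2_{VQ}}\sqrt{\dfrac{\sigma^2_{YQ}}{\sigma^2_{VQ}}} \cdot \left[\mu_{VP} - \mu_{VQ} + \sqrt{\dfrac{\sigma^2_{VP}}{\sigma^2_{XP}}}\,(x - \mu_{XP})\right]$

$\sigma^2_{VP}$      $\sqrt{\dfrac{\sigma^2_{YQ}}{\sigma^2_{VQ}}} \cdot \dfrac{1}{2\sqrt{\sigma^2_{VP}\,\sigma^2_{XP}}} \cdot (x - \mu_{XP})$

$\sigma^2_{XP}$      $-\dfrac{1}{2\sigma^2_{XP}}\sqrt{\dfrac{\sigma^2_{YQ}}{\sigma^2_{VQ}}} \cdot \sqrt{\dfrac{\sigma^2_{VP}}{\sigma^2_{XP}}} \cdot (x - \mu_{XP})$

$\sigma_{X,V;P}$     $0$

$\sigma_{Y,V;Q}$     $0$

Note. $f(x) = \dfrac{1}{\sqrt{x}} \Rightarrow f'(x) = -\dfrac{(\sqrt{x})'}{(\sqrt{x})^2} = -\dfrac{1}{2x\sqrt{x}}.$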
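
The derivatives of the chain linear function with respect to the means and variances can be checked numerically. The sketch below assumes the chain linear function has the usual form $\mu_{YQ} + (\sigma_{YQ}/\sigma_{VQ})\,[\mu_{VP} - \mu_{VQ} + (\sigma_{VP}/\sigma_{XP})(x - \mu_{XP})]$ and that the parameters are the means and variances (not standard deviations); the function names and the numerical values are hypothetical. It compares closed-form partial derivatives with central differences.

```python
import math

# Parameter order: mu_YQ, mu_VP, mu_XP, mu_VQ, var_YQ, var_VQ, var_VP, var_XP


def chain(x, p):
    """Chain linear equating function (assumed standard form)."""
    mu_yq, mu_vp, mu_xp, mu_vq, var_yq, var_vq, var_vp, var_xp = p
    inner = mu_vp - mu_vq + math.sqrt(var_vp / var_xp) * (x - mu_xp)
    return mu_yq + math.sqrt(var_yq / var_vq) * inner


def chain_gradient(x, p):
    """Closed-form partial derivatives of chain(x, p), in the parameter order above."""
    mu_yq, mu_vp, mu_xp, mu_vq, var_yq, var_vq, var_vp, var_xp = p
    a = math.sqrt(var_yq / var_vq)
    b = math.sqrt(var_vp / var_xp)
    inner = mu_vp - mu_vq + b * (x - mu_xp)
    return [
        1.0,                                               # d / d mu_YQ
        a,                                                 # d / d mu_VP
        -a * b,                                            # d / d mu_XP
        -a,                                                # d / d mu_VQ
        inner / (2.0 * math.sqrt(var_yq * var_vq)),        # d / d var_YQ
        -a * inner / (2.0 * var_vq),                       # d / d var_VQ
        a * (x - mu_xp) / (2.0 * math.sqrt(var_vp * var_xp)),  # d / d var_VP
        -a * b * (x - mu_xp) / (2.0 * var_xp),             # d / d var_XP
    ]


def numeric_gradient(x, p, h=1e-5):
    """Central-difference approximation to the same partial derivatives."""
    grad = []
    for j in range(len(p)):
        up, down = list(p), list(p)
        up[j] += h
        down[j] -= h
        grad.append((chain(x, up) - chain(x, down)) / (2.0 * h))
    return grad


# Hypothetical parameter values (not the study's data).
params = [33.0, 17.0, 40.0, 14.5, 256.0, 49.0, 64.0, 289.0]
analytic = chain_gradient(50.0, params)
numeric = numeric_gradient(50.0, params)
max_gap = max(abs(u - v) for u, v in zip(analytic, numeric))
print(max_gap)
```

The two covariance parameters do not enter the chain linear function at all, so their partial derivatives are zero and they are omitted from the check.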