
Copyright © The British Psychological Society. Reproduction in any form (including the internet) is prohibited without prior permission from the Society.

Investigating the relationship between item exposure and test overlap: Item sharing and item pooling

Shu-Ying Chen¹* and Pui-Wa Lei²
¹National Chung-Cheng University, Chia-Yi, Taiwan, Republic of China
²The Pennsylvania State University, USA

To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed-length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.

1. Introduction

Due to significant progress in computer technology, computerized adaptive testing (CAT) has grown in popularity in recent years. Many conventional paper-and-pencil (P&P) tests are now offered in a CAT format. It is well known that CAT has many advantages over conventional P&P tests. These advantages, however, may not be sustained if test items are compromised. When examinees can easily obtain test information from previous test takers and answer items correctly simply based on item pre-knowledge rather than on their proficiency, the observed test scores would be invalid. Item sharing among examinees does not raise great concern for conventional P&P testing because most P&P tests are administered on periodic test schedules and most items are not reused. Item sharing does raise great concern for CATs because most CATs are continuous tests and items are reused (Chang & Zhang, 2002; Davey & Parshall, 1995; Revuelta & Ponsoda, 1998; Way, 1998). To reduce the threat of item sharing among examinees, item exposure must be taken into account in conducting CATs.

*Correspondence should be addressed to Dr Shu-Ying Chen, Department of Psychology, National Chung-Cheng University, 168 University Road, Min-Hsiung, Chia-Yi, Taiwan 62107, Republic of China (e-mail: [email protected]).

British Journal of Mathematical and Statistical Psychology (2010), 63, 205–226
© 2010 The British Psychological Society
www.bpsjournals.co.uk
DOI:10.1348/000711009X430906

Item exposure rate and test overlap rate are two indices commonly used to track

item exposure in CATs. Item exposure rate refers to the proportion of all CATs in which

an item is administered, and test overlap rate is defined as the proportion of items shared

by pairs of exams, averaged across all possible pairwise comparisons. By considering

both indices, item exposure can be monitored at both the item and test levels. With

respect to item exposure at the test level, the test overlap rate could provide useful

information on item sharing between pairs of examinees. This index alone, however, may not be sufficient to capture real item sharing among examinees. In practice,

examinees may obtain test information from more than one examinee. This larger scope

of information sharing should be considered in addition to the pairwise item sharing.

That is, test overlap among a group of examinees as well as that between pairs of

examinees should be controlled in CATs.

To control item exposure and test overlap simultaneously, the relationship between

the two indices needs to be identified first. The relationship between item exposure and

pairwise test overlap has been provided by Chen, Ankenmann, and Spray (2003) and can be expressed as:

\[
T = \frac{S_r^2 + \bar{r}^2}{\bar{r}}, \tag{1}
\]

where $\bar{r}$ and $S_r^2$ are the sample mean and variance of item exposure rates, respectively, and $T$ is a test overlap rate defined as the proportion of items shared by pairs of exams,

averaged across all possible pairwise comparisons. Based on the relationship shown in

equation (1), Chen and Lei (2005) modified the Sympson and Hetter (1985) procedure

such that both item exposure rate and pairwise test overlap rate can be controlled

simultaneously.
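As a rough numerical illustration of equation (1), the sketch below compares the mean-and-variance form with the exact pairwise count (the combinatorial form that the paper gives later as equation (2)). The usage counts are hypothetical, the function names are illustrative, and the variance is taken as the population variance (divisor n), which is an assumption made here so that the identity holds exactly for the large-sample form.

```python
from math import comb
from statistics import fmean, pvariance


def exact_overlap(usage, p, h):
    """Exact pairwise overlap rate: sum_i C(m_i, 2) / (h * C(p, 2))."""
    return sum(comb(m, 2) for m in usage) / (h * comb(p, 2))


def approx_overlap(usage, p):
    """Equation (1): (S_r^2 + rbar^2) / rbar, computed from exposure rates."""
    rates = [m / p for m in usage]
    r_bar = fmean(rates)
    # Population variance (divisor n) is an assumption of this sketch.
    return (pvariance(rates) + r_bar ** 2) / r_bar


# Hypothetical usage counts: 6 items, p = 1000 tests of length h = 3,
# so the counts sum to p * h = 3000.
usage = [900, 700, 600, 400, 300, 100]
p, h = 1000, 3
print(exact_overlap(usage, p, h), approx_overlap(usage, p))
```

With a large number of administered tests the two values should be close; with only a handful of tests the exact form is the safer choice.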

Nevertheless, the test overlap considered in the Sympson and Hetter procedure with

test overlap control is defined as the proportion of items shared by pairs of exams.

Because examinees may obtain test information from more than one previous test taker in the field, controlling the number of common items shared by a group of examinees as

well as by pairs of exams is necessary. This paper extends the analytical derivation of

Chen et al. (2003) to document the relationship between item exposure rates and test

overlap rate when the proportion of items shared by a group of examinees larger than

two is of practical interest.

1.1. Theoretical lower bound for test overlap

Even though controlling test overlap among a group of examinees may be complex, Chang and Zhang (2002) provided a theoretical lower bound for the number of common items shared by a group of examinees under the assumption of completely

randomized item selection. The theoretical lower bound can serve as a benchmark

for test practitioners to assess the severity of test overlap observed in practice and to

set an appropriate target test overlap rate if procedures for controlling test overlap

are available.

Chang and Zhang (2002) distinguished two forms of test overlap for a group

of examinees: item sharing and item pooling. Item sharing is defined as the number of

common items shared by all examinees in a group, while item pooling is the number

of overlapping items that an examinee has with a group of examinees. To be more specific, let $A_i$ be the set of items that examinee $i$ takes, and let $p$ be the number of CATs administered. Item sharing ($X_a$) is the number of common items shared by all $a$ examinees in a group (i.e. number of items in $\bigcap_{i=1}^{a} A_i$) averaged over all possible combinations of $a$ examinees out of the $p$ CATs administered. Item pooling ($Y_a$) is the number of overlapping items between an examinee's test and the combined set of items exposed to a group of $a$ examinees (i.e. the number of items in $A_j \cap (\bigcup_{i=1}^{a} A_i)$ for $j \neq i$) averaged over all possible such combinations out of the $p$ CATs administered. Because examinees are likely to obtain test information from those who have taken the test earlier, it may be more efficient to consider ordered item pooling ($Z_a$), which is the average number of items in $A_j \cap (\bigcup_{i=1}^{a} A_i)$ for $j > i$ instead of $j \neq i$. The former form of

item pooling is henceforth referred to as unordered item pooling, and the latter as

ordered item pooling.
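The set definitions above translate almost directly into set operations. The following is a minimal sketch for a single group; the three item sets are hypothetical and the function names `shared` and `pooled` are illustrative only.

```python
def shared(tests):
    """Item sharing for one group: items common to every test (intersection of all A_i)."""
    return set.intersection(*tests)


def pooled(own, group):
    """Item pooling for one examinee: items on `own` that appear on any test in `group`."""
    return own & set.union(*group)


# Hypothetical item sets for three examinees (not taken from the paper).
a1, a2, a3 = {1, 2, 3, 4}, {2, 3, 5, 6}, {3, 4, 6, 7}

print(shared([a1, a2, a3]))   # items all three examinees saw
print(pooled(a3, [a1, a2]))   # items examinee 3 could have learned from examinees 1 and 2
```

Pooling is never smaller than sharing for the same examinees, which is why it is the more conservative view of pre-knowledge. For ordered item pooling, `group` would simply be restricted to examinees tested before the examinee who owns `own`.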

1.2. Purpose of the study

The purpose of this paper is to analytically derive the relationships between item

exposure rate and each of the three forms of test overlap: item sharing, unordered item

pooling, and ordered item pooling. The accuracy of the relationships derived was

verified using numerical examples, by comparing the computational results with the

corresponding tabulations based on the definitions.

A simulation study was conducted to examine the extent of exposure control by two currently available exposure control procedures, the Sympson and Hetter (1985)

procedure (SH) and the Sympson and Hetter procedure with test overlap control (SHT),

in light of these new definitions of test overlap rates. The relationships derived, along with findings from the simulation study, will lay the foundation for future development

of procedures to simultaneously control item exposure and item sharing or item pooling

among a group of examinees larger than two.

2. Relationship between item exposure and test overlap

2.1. Item sharing

Considering item sharing between pairs of exams, Chen et al. (2003) showed that the

relationship between test overlap (i.e. item sharing between pairs of examinees) and

item exposure rate is given by:

\[
T = \frac{\sum_{i=1}^{n}\binom{m_i}{2}}{h\binom{p}{2}}
  = \frac{\sum_{i=1}^{n} m_i(m_i-1)}{hp(p-1)}
  = \frac{\sum_{i=1}^{n} r_i(pr_i-1)}{h(p-1)}, \tag{2}
\]

where h is test length, n is the number of items in the pool, p is the number of CATs

administered, mi is the number of times item i is administered, and ri is the item

exposure rate of item i, defined as mi/p. The relationship shown in equation (2) is

derived based on combinatorial mathematics. Specifically, the numerator term

represents the total number of items in common across all pairwise comparisons, and $\sum_{i=1}^{n}\binom{m_i}{2}\big/\binom{p}{2}$ is the average number of items in common between a pair of examinees.

It is straightforward to extend equation (2) to item sharing among a group of a

examinees ($W_a$), by simply substituting the number 2 in equation (2) with $a$ ($a$ can be 2 or greater). After simplifying and substituting $m_i$ with its equivalent ($pr_i$), the general

relationship between item exposure rate and item sharing among a examinees is

given by:

\[
W_a = \frac{X_a}{h}
    = \frac{\sum_{i=1}^{n}\binom{m_i}{a}}{h\binom{p}{a}}
    = \frac{\sum_{i=1}^{n} m_i(m_i-1)\cdots[m_i-(a-1)]}{hp(p-1)\cdots[p-(a-1)]}
    = \frac{\sum_{i=1}^{n} r_i(pr_i-1)\cdots[pr_i-(a-1)]}{h(p-1)\cdots[p-(a-1)]}, \tag{3}
\]

where $X_a$ is the average number of common items shared by a group of $a$ examinees. Item sharing can be found by using equation (3) once item usage is known. Equation (3) is especially efficient when $p$, the number of CATs administered, is large, because it can be further simplified to give:

\[
W_a = \frac{X_a}{h}
    = \frac{\sum_{i=1}^{n} r_i(pr_i-1)\cdots[pr_i-(a-1)]}{h(p-1)\cdots[p-(a-1)]}
    = \frac{\sum_{i=1}^{n} r_i\left[(pr_i)^{a-1}+\cdots\right]}{h\left[p^{a-1}+\cdots\right]}
    \;\xrightarrow{\,p\to\infty\,}\; \frac{\sum_{i=1}^{n} r_i^{a}}{h}. \tag{4}
\]

According to equation (4), the average number of common items shared by a group of a

examinees can be obtained simply by adding $r_i^a$ across the item pool when $p$ is large.

As item sharing between pairs of examinees (i.e. T of Chen et al., 2003) is a special

case of equation (3), the large-sample approximation of $T$ provided by Chen et al. (2003) is similarly a special case of equation (4). That is,

\[
W_2 = T = \frac{\sum_{i=1}^{n} m_i(m_i-1)}{hp(p-1)}
        = \frac{\sum_{i=1}^{n} r_i(pr_i-1)}{h(p-1)}
        \;\xrightarrow{\,p\to\infty\,}\; \frac{\sum_{i=1}^{n} r_i^{2}}{h}
        = \frac{S_r^2 + \bar{r}^2}{\bar{r}}. \tag{5}
\]

Based on the relationship between item exposure rate and test overlap rate as shown in

equation (5), Chen and Lei (2005) developed a procedure to simultaneously control

item exposure and test overlap at pre-specified values by controlling both the maximum

value and the variance of item exposure rates. A similar process could be applied for

controlling item exposure and test overlap simultaneously for $a > 2$. However, since an examinee might not study for just the items administered to all his or her friends but for any item administered to any of his or her friends, controlling item pooling among examinees might be more realistic in practice. It is therefore important to explore the

relationship between item pooling and item exposure.
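Equations (3) and (4) can be evaluated directly from item usage counts. The sketch below uses the usage counts from the paper's own numerical example (Table 2, with p = 5 CATs of length h = 5); the function names are illustrative.

```python
from math import comb


def item_sharing_rate(usage, p, h, a):
    """Equation (3): W_a = sum_i C(m_i, a) / (h * C(p, a))."""
    return sum(comb(m, a) for m in usage) / (h * comb(p, a))


def item_sharing_rate_large_p(usage, p, h, a):
    """Equation (4): W_a is approximately sum_i r_i**a / h when p is large."""
    return sum((m / p) ** a for m in usage) / h


usage = [3, 5, 2, 3, 2, 2, 4, 3, 1, 0]  # m_i for items 1..10 (Table 2)
p, h = 5, 5
for a in (2, 3):
    print(a, item_sharing_rate(usage, p, h, a), item_sharing_rate_large_p(usage, p, h, a))
```

Because p = 5 is tiny here, the large-sample value is only a loose approximation of the exact rate; the gap shrinks as the number of administered CATs grows.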

2.2. Unordered item pooling

Let $Y_a$ be the average number of overlapping items between an examinee and any of the

a examinees in a group, h be test length, n be the number of items in the pool, p be the

number of CATs administered, and mi be the number of times item i is administered.

Using similar counting rules in tabulating item sharing, the proportion of overlapping

items between an examinee’s test and the combined set of items administered to

a group of $a$ examinees, averaged over all $p\binom{p-1}{a}$ possible such combinations out of the

total number of CATs administered, can be expressed as a function of item exposure

rates as follows (see Appendix A for details of the derivation):

\[
V_a = \frac{Y_a}{h}
    = \frac{\sum_{i=1}^{n}\sum_{j=1}^{a} m_i\binom{m_i-1}{j}\binom{p-m_i}{a-j}}{hp\binom{p-1}{a}}
    = 1 - \frac{1}{h}\sum_{i=1}^{n}
      \frac{r_i\,[p(1-r_i)][p(1-r_i)-1]\cdots[p(1-r_i)-(a-1)]}{(p-1)(p-2)\cdots(p-a)}. \tag{6}
\]

Equation (6) was constructed based on the hypergeometric distribution and combinatorial mathematics. The numerator $\sum_{i=1}^{n}\sum_{j=1}^{a} m_i\binom{m_i-1}{j}\binom{p-m_i}{a-j}$ represents the total number of times that the same item is administered to an examinee and any of the $a$ examinees in the group across all $p\binom{p-1}{a}$ possible comparisons. Take $a = 2$, for example, and assume that item $i$ has been administered to $m$ examinees among $p$ CATs administered and examinee $k$ is one of the $m$ examinees taking item $i$. That is, excluding examinee $k$, item $i$ has been administered to $m-1$ examinees among $p-1$ CATs administered. To count the number of times that item $i$ has been administered to examinee $k$ and any other two ($a$) examinees, the hypergeometric distribution could be applied. Specifically, among $p-1$ CATs administered (excluding examinee $k$), there are $\binom{m-1}{1}\binom{p-m}{2-1}$ ways that item $i$ can appear in one of the two examinees' tests ($j = 1$) and $\binom{m-1}{2}\binom{p-m}{2-2}$ ways that item $i$ can appear in both examinees' tests ($j = 2$). Since item $i$ should be counted as an overlapping item regardless of whether it is administered to one or both of the two examinees, the number of times that item $i$ is administered to examinee $k$ and any other two examinees would be $\binom{m-1}{1}\binom{p-m}{2-1} + \binom{m-1}{2}\binom{p-m}{2-2}$. The same counting rule should be applied for the other examinees who have taken item $i$, and the number of times that item $i$ is administered to an examinee and any other two examinees would be $m\left[\binom{m-1}{1}\binom{p-m}{2-1} + \binom{m-1}{2}\binom{p-m}{2-2}\right]$ or $\sum_{j=1}^{2} m\binom{m-1}{j}\binom{p-m}{2-j}$. A similar counting rule should be applied for other items in the pool, hence the summation over all $n$ items in the pool. The denominator is used to transform the total number into a proportion.
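The two forms of equation (6) can be checked against each other numerically. The sketch below again uses the usage counts of the paper's Table 2 (p = 5, h = 5); the function names are illustrative. The closed form is written with binomial coefficients, relying on the fact that the falling-factorial ratio in equation (6) equals C(p - m_i, a) / C(p - 1, a), and it assumes every test has exactly h items so that the exposure rates sum to h.

```python
from math import comb


def unordered_pooling_rate(usage, p, h, a):
    """Equation (6), first form: double sum over items and j = 1..a."""
    total = sum(
        m * comb(m - 1, j) * comb(p - m, a - j)
        for m in usage
        if m > 0                      # unused items contribute nothing
        for j in range(1, a + 1)
    )
    return total / (h * p * comb(p - 1, a))


def unordered_pooling_rate_closed(usage, p, h, a):
    """Equation (6), second form: 1 - (1/h) * sum_i r_i * C(p - m_i, a) / C(p - 1, a)."""
    miss = sum((m / p) * comb(p - m, a) for m in usage)
    return 1 - miss / (h * comb(p - 1, a))


usage = [3, 5, 2, 3, 2, 2, 4, 3, 1, 0]  # m_i for items 1..10 (Table 2)
for a in (1, 2):
    print(a, unordered_pooling_rate(usage, 5, 5, a), unordered_pooling_rate_closed(usage, 5, 5, a))
```

The closed form counts the complement: the chance that none of the a other examinees saw the item, which is why it needs only one binomial coefficient per item.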

Equation (6) is especially useful for finding the unordered item pooling rate when

the number of CATs administered is large. Indeed, for large p, equation (6) can be further

simplified to

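\[
V_a = 1 - \frac{1}{h}\sum_{i=1}^{n}
      \frac{r_i\,[p(1-r_i)][p(1-r_i)-1]\cdots[p(1-r_i)-(a-1)]}{(p-1)(p-2)\cdots(p-a)}
    = 1 - \frac{1}{h}\sum_{i=1}^{n}
      \frac{r_i\left[p^{a}(1-r_i)^{a}+\cdots\right]}{p^{a}+\cdots}
    \;\xrightarrow{\,p\to\infty\,}\; 1 - \frac{\sum_{i=1}^{n} r_i(1-r_i)^{a}}{h}. \tag{7}
\]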

Based on equation (7), the proportion of overlapping items between an examinee and a

group of $a$ examinees can be obtained easily from item exposure rates when $p$ is large.

According to the definitions of item sharing and unordered item pooling, $V_1$ is the same as $W_2$ (i.e. average test overlap rate between a pair of examinees). Their large-sample approximation counterparts are also equal. That is, substituting $a = 1$ in equation (7) simplifies to equation (5) as follows:

\[
V_1 \;\xrightarrow{\,p\to\infty\,}\; 1 - \frac{\sum_{i=1}^{n} r_i(1-r_i)}{h}
    = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} r_i^{2}}{h}
    = 1 - \frac{h - \sum_{i=1}^{n} r_i^{2}}{h}
    = \frac{\sum_{i=1}^{n} r_i^{2}}{h}.
\]
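To see how quickly the exact rate of equation (6) approaches the large-sample form of equation (7), the sketch below holds the exposure rates fixed and increases the number of administered CATs. The rates are the Table 2 usage counts divided by 5; scaling them to larger p is a hypothetical device used only for illustration, and it assumes that p times each rate is an integer.

```python
from math import comb


def exact_rate(rates, p, h, a):
    """Equation (6) in closed form, with m_i = p * r_i assumed to be an integer."""
    usage = [round(p * r) for r in rates]
    miss = sum(r * comb(p - m, a) for r, m in zip(rates, usage))
    return 1 - miss / (h * comb(p - 1, a))


def limit_rate(rates, h, a):
    """Equation (7): 1 - sum_i r_i * (1 - r_i)**a / h."""
    return 1 - sum(r * (1 - r) ** a for r in rates) / h


rates = [0.6, 1.0, 0.4, 0.6, 0.4, 0.4, 0.8, 0.6, 0.2, 0.0]  # Table 2 usage / 5
h = 5
for p in (5, 50, 500, 5000):
    print(p, exact_rate(rates, p, h, 2), limit_rate(rates, h, 2))
```

For a = 1 the limit reduces to the sum of squared exposure rates divided by h, matching the derivation above.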

Like the large-sample approximation of item sharing, the simplified relationship

between unordered item pooling and item exposure rates in the large-sample

approximation as shown in equation (7) provides the necessary information for future development of exposure control procedures to simultaneously control item exposure

and unordered item pooling. Because CATs are often administered at different times,

examinees are likely to obtain test information from those who have taken the test but

not from those who are taking the test at the same time or those who will take the test

later. It is logical to tabulate item pooling for an examinee with a group of a examinees

who have already taken the test instead of tabulating all possible combinations

disregarding the order of CAT administrations. Hence, the relationship between item

exposure and test overlap with respect to item pooling while taking into account the order of CATs is explored below.

2.3. Ordered item pooling

Ordered item pooling refers to the number of common items between an examinee and a group of $a$ examinees who have previously taken the test. That is, it is the number of items in $A_j \cap (\bigcup_{i=1}^{a} A_i)$ for $j > i$ rather than that in $A_j \cap (\bigcup_{i=1}^{a} A_i)$ for $j \neq i$. Controlling ordered

item pooling may meet practical needs better than controlling its unordered

counterpart.

Applying counting rules as before, the average proportion of common items

between an examinee and a group of a examinees who have previously taken the test

can be expressed as a function of item usage as

\[
V_a = \frac{Z_a}{h}
    = \frac{\sum_{t=a+1}^{p}\sum_{i=1}^{n}\sum_{j=1}^{a}
      \left[m_{it}-m_{i(t-1)}\right]\binom{m_{it}-1}{j}\binom{t-m_{it}}{a-j}}{h\binom{p}{a+1}}, \tag{8}
\]

where $Z_a$ is the average number of overlapping items encountered by an examinee with $a$ examinees who have previously taken the test, $h$ is test length, $n$ is the number of items in the pool, $p$ is the number of CATs administered, and $m_{it}$ is the number of times that item $i$ has been administered to the first $t$ examinees. Similar to equation (6), the numerator $\sum_{t=a+1}^{p}\sum_{i=1}^{n}\sum_{j=1}^{a}[m_{it}-m_{i(t-1)}]\binom{m_{it}-1}{j}\binom{t-m_{it}}{a-j}$ represents the total number of times that the same item is administered to an examinee and a group of $a$ examinees who have previously taken the test across $\binom{p}{a+1}$ possible comparisons. Since the order of exams is taken into account, instead of using $m$ as a multiplier as in the unordered item pooling case, the number of overlapping items is calculated for each examinee after $a$ examinees have taken the test. In addition, the number of overlapping items is affected only when item $i$ is administered to examinee $t$. When item $i$ is not administered to examinee $t$, the number of overlapping items remains the same because $m_{it}-m_{i(t-1)} = 0$. With regard to the number of all possible comparisons, instead of considering $p\binom{p-1}{a}$ overlap comparisons as in the unordered item pooling case, only $\binom{p}{a+1}$ overlap comparisons need to be considered for the ordered case; $\binom{p}{a+1}$ is the sum over comparisons for each examinee after $a$ examinees have taken the test, that is, $\binom{p}{a+1} = \binom{a}{a} + \binom{a+1}{a} + \binom{a+2}{a} + \cdots + \binom{p-1}{a}$ (see Appendix B for details of the derivation).
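Equation (8) can be evaluated in a single pass over the tests in administration order, updating cumulative usage as each test arrives. The sketch below does this for the five tests of the paper's numerical example (Table 1); the function name and the list-based bookkeeping are illustrative choices.

```python
from math import comb


def ordered_pooling_rate(tests, n_items, a):
    """Equation (8), evaluated from cumulative usage m_it in administration order."""
    p, h = len(tests), len(tests[0])
    usage = [0] * (n_items + 1)        # usage[i] holds m_it for item i (1-based)
    total = 0
    for t, test in enumerate(tests, start=1):
        for i in test:
            usage[i] += 1              # only these items have m_it - m_i(t-1) = 1
        if t > a:                      # needs at least a earlier examinees
            total += sum(
                comb(usage[i] - 1, j) * comb(t - usage[i], a - j)
                for i in test
                for j in range(1, a + 1)
            )
    return total / (h * comb(p, a + 1))


tests = [
    [2, 5, 4, 7, 8],   # A1
    [1, 3, 4, 6, 2],   # A2
    [7, 9, 2, 5, 8],   # A3
    [8, 7, 1, 3, 2],   # A4
    [2, 1, 6, 7, 4],   # A5
]
for a in (1, 2):
    print(a, ordered_pooling_rate(tests, n_items=10, a=a))
```

Only the items on examinee t's own test enter the inner sum, which mirrors the role of the factor m_it - m_i(t-1) in the equation.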

Recent research on exposure control procedures shows interest in controlling item

exposure on the fly instead of relying on large tedious simulations conducted prior to

operational CATs to derive stable item exposure parameters (e.g. Chen, Lei, & Liao, 2008).

It is useful to express the ordered item pooling rate for the first $t$ examinees ($V_{a,t}$) as a function of the ordered item pooling rate for the previous $(t-1)$ examinees ($V_{a,(t-1)}$). A simplified expression (derived in Appendix C) is

\[
V_{a,t} = \frac{t-a-1}{t}\,V_{a,(t-1)} + \frac{a+1}{t}
        - \frac{\sum_{i=1}^{n}\left[m_{it}-m_{i(t-1)}\right]\binom{t-m_{it}}{a}}{h\binom{t}{a+1}}. \tag{9}
\]

This relationship could be used to develop exposure control procedures to control

ordered item pooling on the fly. To be more specific, the numerator term $\lambda = \sum_{i=1}^{n}[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a}$ is critical in controlling $V_{a,t}$ because $a$, $t$, $h$, and $V_{a,(t-1)}$ are known constants after $(t-1)$ examinees have taken tests. To keep $V_{a,t}$ no greater than a pre-specified test overlap rate $V_{\max}$, the relationship between $\lambda$ and $V_{\max}$ can be expressed as follows:

\[
V_{a,t} = \frac{t-a-1}{t}\,V_{a,(t-1)} + \frac{a+1}{t}
        - \frac{\sum_{i=1}^{n}\left[m_{it}-m_{i(t-1)}\right]\binom{t-m_{it}}{a}}{h\binom{t}{a+1}}
        \le V_{\max}, \tag{10}
\]

\[
\lambda = \sum_{i=1}^{n}\left[m_{it}-m_{i(t-1)}\right]\binom{t-m_{it}}{a}
        \ge h\binom{t}{a+1}\left[\frac{t-a-1}{t}\,V_{a,(t-1)} + \frac{a+1}{t} - V_{\max}\right] = \delta. \tag{11}
\]

Even though $\lambda$ is the sum of $[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a}$ over items in the pool, only $h$ items (where $h$ is test length) would affect $\lambda$ because $m_{it}-m_{i(t-1)} \neq 0$ for these $h$ items but $m_{it}-m_{i(t-1)} = 0$ for the rest of the items in the pool. To have $\lambda \ge \delta$, each item administered should contribute at least $\delta/h$ to $\lambda$, where the contribution of each item is defined as $[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a}$. In other words, to keep $V_{a,t}$ no greater than $V_{\max}$, the value of $\delta/h$ should serve as an item selection criterion to determine if a selected item can be

administered to the tth examinee or not. A study is currently under way to investigate

the effects of the proposed procedure on controlling ordered item pooling for a group of

examinees larger than two.
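The recursive update and the threshold on the numerator term can be sketched as two small functions. This is only an illustration of equations (9) and (11), not the on-the-fly procedure the authors have under study; the function names are illustrative, and the data are the cumulative usage counts after the fourth and fifth tests of the paper's numerical example (Table 3).

```python
from math import comb


def update_rate(prev_rate, t, a, h, usage_after, test):
    """Equation (9): V_{a,t} from V_{a,t-1}; usage_after[i] is m_it for item i."""
    numer = sum(comb(t - usage_after[i], a) for i in test)
    return (t - a - 1) / t * prev_rate + (a + 1) / t - numer / (h * comb(t, a + 1))


def numerator_threshold(prev_rate, t, a, h, v_max):
    """Right-hand side of equation (11): the smallest numerator that keeps V_{a,t} <= v_max."""
    return h * comb(t, a + 1) * ((t - a - 1) / t * prev_rate + (a + 1) / t - v_max)


items = range(1, 11)
usage_t4 = dict(zip(items, [2, 4, 2, 2, 2, 1, 3, 3, 1, 0]))  # Table 3, t = 4
usage_t5 = dict(zip(items, [3, 5, 2, 3, 2, 2, 4, 3, 1, 0]))  # Table 3, t = 5

v3 = 0.8                                                     # V_{2,3} from the example
v4 = update_rate(v3, 4, 2, 5, usage_t4, [8, 7, 1, 3, 2])
v5 = update_rate(v4, 5, 2, 5, usage_t5, [2, 1, 6, 7, 4])
print(v4, v5, numerator_threshold(v3, 4, 2, 5, v_max=0.85))
```

A larger numerator means the new test draws on items that few earlier examinees saw, so requiring it to reach the threshold caps the updated pooling rate.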

3. Verification using a numerical example

A simple example is now provided to verify the equations derived above.

Suppose that an item pool consists of n ¼ 10 items, from which p ¼ 5 fixed-length CATs

are administered, each CAT consisting of h ¼ 5 items. Table 1 shows the items that are

administered in each of the five CATs, and Table 2 shows the number of times each item

is used (mi).

Table 1. Items administered in each of five CATs

CAT   Items administered
A1    2  5  4  7  8
A2    1  3  4  6  2
A3    7  9  2  5  8
A4    8  7  1  3  2
A5    2  1  6  7  4

3.1. Item sharing

Item sharing ($W_a$) is defined as the proportion of items shared by $a$ exams, averaged over all possible combinations of $a$ out of the $p$ CATs administered. That is, $W_a = (1/h) \times (1/\binom{p}{a}) \times$ (total number of common items among $a$ exams over all possible combinations of $a$ out of $p$ CATs). For $a = 2$ out of $p = 5$ CATs,

\[
\begin{aligned}
W_2 &= \frac{1}{5}\times\frac{1}{\binom{5}{2}}\,(A_1A_2 + A_1A_3 + A_1A_4 + A_1A_5 + A_2A_3 + A_2A_4 + A_2A_5 + A_3A_4 + A_3A_5 + A_4A_5) \\
    &= \frac{1}{5}\times\frac{1}{10}\,(2 + 4 + 3 + 3 + 1 + 3 + 4 + 3 + 2 + 3) = \frac{1}{50}\times 28 = 0.56.
\end{aligned}
\]

Note that the notation $A_iA_j$ above indicates the number of common items in set $A_i$ and set $A_j$. For example, there are two common items between examinees 1 and 2, items 2 and 4 (see Table 1). Because there are $\binom{p}{a} = \binom{5}{2} = 10$ possible pairs for five CATs, there are 10 $A_iA_j$ terms. The average number of common items between pairs of examinees is divided by test length ($h = 5$) so that $W_2$ represents the average proportion of common items shared by two examinees. Similarly, for $a = 3$,

\[
\begin{aligned}
W_3 &= \frac{1}{5}\times\frac{1}{\binom{5}{3}}\,(A_1A_2A_3 + A_1A_2A_4 + A_1A_2A_5 + A_1A_3A_4 + A_1A_3A_5 + A_1A_4A_5 + A_2A_3A_4 \\
    &\qquad + A_2A_3A_5 + A_2A_4A_5 + A_3A_4A_5) \\
    &= \frac{1}{5}\times\frac{1}{10}\,(1 + 1 + 2 + 3 + 2 + 2 + 1 + 1 + 2 + 2) = \frac{1}{50}\times 17 = 0.34.
\end{aligned}
\]

Substituting the item usage information from Table 2 into equation (3), the item sharing rate for $a = 2$ and 3, respectively, is

\[
W_2 = \frac{\binom{3}{2}+\binom{5}{2}+\binom{2}{2}+\binom{3}{2}+\binom{2}{2}+\binom{2}{2}+\binom{4}{2}+\binom{3}{2}+\binom{1}{2}+\binom{0}{2}}{5\times\binom{5}{2}}
    = \frac{3+10+1+3+1+1+6+3+0+0}{5\times 10} = \frac{28}{50} = 0.56,
\]

\[
W_3 = \frac{\binom{3}{3}+\binom{5}{3}+\binom{2}{3}+\binom{3}{3}+\binom{2}{3}+\binom{2}{3}+\binom{4}{3}+\binom{3}{3}+\binom{1}{3}+\binom{0}{3}}{5\times\binom{5}{3}}
    = \frac{1+10+0+1+0+0+4+1+0+0}{5\times 10} = \frac{17}{50} = 0.34.
\]

Based on the results above, it is clear that the item sharing rate found by using

equation (3) is identical to that based on the definition for each a value.
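The definition-based tabulation of item sharing can also be reproduced by brute force, which is a convenient cross-check on equation (3) for any small data set. The sketch below enumerates every group of a tests from Table 1; the function name is illustrative.

```python
from itertools import combinations


def item_sharing_by_definition(tests, a):
    """Average proportion of items common to all a tests, over every group of a tests."""
    h = len(tests[0])
    groups = list(combinations(tests, a))
    common = sum(len(set.intersection(*map(set, group))) for group in groups)
    return common / (h * len(groups))


tests = [
    [2, 5, 4, 7, 8],   # A1
    [1, 3, 4, 6, 2],   # A2
    [7, 9, 2, 5, 8],   # A3
    [8, 7, 1, 3, 2],   # A4
    [2, 1, 6, 7, 4],   # A5
]
for a in (2, 3):
    print(a, item_sharing_by_definition(tests, a))
```

Enumeration grows combinatorially with the number of tests, which is the practical reason for preferring the usage-count form of equation (3).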

Table 2. Item usage corresponding to items administered in Table 1

Item (i)                               1  2  3  4  5  6  7  8  9  10
Number of times item i is used (m_i)   3  5  2  3  2  2  4  3  1  0

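3.2. Unordered item pooling

Similarly, the correctness of equation (6) can be verified against the corresponding tabulation based on the definition of unordered item pooling. The unordered item pooling rate is defined as the proportion of overlapping items between an examinee's test and the combined set of items exposed to a group of $a$ examinees, averaged over all $p\binom{p-1}{a}$ possible such combinations out of the total number of $p$ CATs administered. The notation $A_1A_2\cdots A_i \parallel A_j$ below indicates the number of overlapping items between examinee $j$'s test and the union set of items administered to a group of $a$ examinees (examinees $1, 2, \ldots, i$, $i \neq j$). According to the definition of unordered item pooling, the test overlap rate for $a = 1$ and 2, respectively, is

\[
\begin{aligned}
V_1 &= \frac{1}{5}\times\frac{1}{5\times\binom{4}{1}}\,(A_1\parallel A_2 + A_1\parallel A_3 + A_1\parallel A_4 + A_1\parallel A_5 + A_2\parallel A_1 + A_2\parallel A_3 + A_2\parallel A_4 + A_2\parallel A_5 \\
    &\qquad + A_3\parallel A_1 + A_3\parallel A_2 + A_3\parallel A_4 + A_3\parallel A_5 + A_4\parallel A_1 + A_4\parallel A_2 + A_4\parallel A_3 + A_4\parallel A_5 + A_5\parallel A_1 \\
    &\qquad + A_5\parallel A_2 + A_5\parallel A_3 + A_5\parallel A_4) \\
    &= \frac{1}{5}\times\frac{1}{20}\,(2+4+3+3+2+1+3+4+4+1+3+2+3+3+3+3+3+4+2+3) = \frac{1}{100}\times 56 = 0.56,
\end{aligned}
\]

\[
\begin{aligned}
V_2 &= \frac{1}{5}\times\frac{1}{5\times\binom{4}{2}}\,(A_1A_2\parallel A_3 + A_1A_2\parallel A_4 + A_1A_2\parallel A_5 + A_1A_3\parallel A_2 + A_1A_3\parallel A_4 + A_1A_3\parallel A_5 \\
    &\qquad + A_1A_4\parallel A_2 + A_1A_4\parallel A_3 + A_1A_4\parallel A_5 + A_1A_5\parallel A_2 + A_1A_5\parallel A_3 + A_1A_5\parallel A_4 + A_2A_3\parallel A_1 \\
    &\qquad + A_2A_3\parallel A_4 + A_2A_3\parallel A_5 + A_2A_4\parallel A_1 + A_2A_4\parallel A_3 + A_2A_4\parallel A_5 + A_2A_5\parallel A_1 + A_2A_5\parallel A_3 \\
    &\qquad + A_2A_5\parallel A_4 + A_3A_4\parallel A_1 + A_3A_4\parallel A_2 + A_3A_4\parallel A_5 + A_3A_5\parallel A_1 + A_3A_5\parallel A_2 + A_3A_5\parallel A_4 \\
    &\qquad + A_4A_5\parallel A_1 + A_4A_5\parallel A_2 + A_4A_5\parallel A_3) \\
    &= \frac{1}{5}\times\frac{1}{30}\,(4+5+5+2+3+3+4+4+4+4+4+4+5+5+5+4+3+5 \\
    &\qquad + 3+2+4+4+3+3+5+4+4+4+5+3) = \frac{1}{150}\times 117 = 0.78.
\end{aligned}
\]

The corresponding unordered item pooling rate based on equation (6) for $a = 1$ and 2, respectively, is

\[
V_1 = \frac{6+20+2+6+2+2+12+6+0+0}{5\times 5\times 4} = \frac{56}{100} = 0.56,
\]

\[
V_2 = \frac{15+30+6+15+6+6+24+15+0+0}{5\times 5\times 6} = \frac{117}{150} = 0.78.
\]

Again, the two ways of calculating unordered item pooling rates produce the same results. Note that $V_1 = W_2 = 0.56$ in this simple example, as discussed above.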
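The 20 and 30 comparisons tabulated above can be generated mechanically. The following sketch enumerates, for each examinee, every group of a other examinees from Table 1 and averages the overlap with the union of the group's items; the function name is illustrative.

```python
from itertools import combinations


def unordered_pooling_by_definition(tests, a):
    """Average overlap of each test with the union of every group of a other tests."""
    sets = [set(t) for t in tests]
    h, total, count = len(tests[0]), 0, 0
    for j, own in enumerate(sets):
        others = sets[:j] + sets[j + 1:]
        for group in combinations(others, a):
            total += len(own & set.union(*group))
            count += 1                 # ends at p * C(p - 1, a) comparisons
    return total / (h * count)


tests = [
    [2, 5, 4, 7, 8],   # A1
    [1, 3, 4, 6, 2],   # A2
    [7, 9, 2, 5, 8],   # A3
    [8, 7, 1, 3, 2],   # A4
    [2, 1, 6, 7, 4],   # A5
]
for a in (1, 2):
    print(a, unordered_pooling_by_definition(tests, a))
```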

3.3. Ordered item pooling

The ordered item pooling rate is defined as the proportion of common items between an examinee and a group of $a$ examinees who have previously taken the test, averaged over the $\binom{p}{a+1}$ possible combinations out of $p$ CATs. The notation $A_1A_2\cdots A_i \parallel A_j$ below indicates the number of overlapping items between examinee $j$'s test and the union set of items administered to a group of $a$ examinees before examinee $j$ (examinees $1, 2, \ldots, i$, $i < j$). According to the definition of ordered item pooling, the test overlap rate for $a = 1$ and 2, respectively, is

\[
\begin{aligned}
V_1 &= \frac{1}{5}\times\frac{1}{\binom{5}{2}}\,(A_1\parallel A_2 + A_1\parallel A_3 + A_1\parallel A_4 + A_1\parallel A_5 + A_2\parallel A_3 + A_2\parallel A_4 + A_2\parallel A_5 \\
    &\qquad + A_3\parallel A_4 + A_3\parallel A_5 + A_4\parallel A_5) \\
    &= \frac{1}{5}\times\frac{1}{10}\,(2+4+3+3+1+3+4+3+2+3) = \frac{1}{50}\times 28 = 0.56,
\end{aligned}
\]

\[
\begin{aligned}
V_2 &= \frac{1}{5}\times\frac{1}{\binom{5}{3}}\,(A_1A_2\parallel A_3 + A_1A_2\parallel A_4 + A_1A_2\parallel A_5 + A_1A_3\parallel A_4 + A_1A_3\parallel A_5 \\
    &\qquad + A_1A_4\parallel A_5 + A_2A_3\parallel A_4 + A_2A_3\parallel A_5 + A_2A_4\parallel A_5 + A_3A_4\parallel A_5) \\
    &= \frac{1}{5}\times\frac{1}{10}\,(4+5+5+3+3+4+5+5+5+3) = \frac{1}{50}\times 42 = 0.84.
\end{aligned}
\]

To tabulate the ordered item pooling rate using equation (8), item usage needs to be updated as each CAT is administered, which is done routinely in on-the-fly exposure control procedures. Corresponding to the data in Table 1, Table 3 details the number of times that each item is administered for the first $t$ examinees ($m_{it}$), $t = 1, 2, \ldots, 5$. Substituting the data of Table 3 into equation (8), the ordered item pooling rate for $a = 1$ and 2, respectively, is

\[
V_1 = \frac{(2) + (2+3) + (2+3+4) + (2+4+2+1+3)}{5\times 10} = \frac{28}{50} = 0.56,
\]

\[
V_2 = \frac{(1+3) + (4+3+6) + (10+6+3+6)}{5\times 10} = \frac{42}{50} = 0.84.
\]

As expected, the two ways of tabulating the ordered item pooling rate provide identical

results. Ordered item pooling can also be calculated from the rate obtained for the previous examinees using equation (9). Taking $a = 2$ and $t = 3, 4, 5$ for example and using the data of Table 3,

Table 3. Item usage corresponding to items administered in Table 1 for the first t examinee(s)

        Item (i)
t       1  2  3  4  5  6  7  8  9  10
1       0  1  0  1  1  0  1  1  0  0
2       1  2  1  2  1  1  1  1  0  0
3       1  3  1  2  2  1  2  2  1  0
4       2  4  2  2  2  1  3  3  1  0
5       3  5  2  3  2  2  4  3  1  0

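\[
V_{2,3} = \frac{4}{5} = 0.8,
\]

\[
\begin{aligned}
V_{2,4} &= \frac{4-2-1}{4}\times 0.8 + \frac{2+1}{4}
         - \frac{\binom{4-2}{2}+\binom{4-4}{2}+\binom{4-2}{2}+\binom{4-3}{2}+\binom{4-3}{2}}{5\binom{4}{2+1}} \\
        &= \frac{0.8}{4} + \frac{3}{4} - \frac{2}{20} = 0.85,
\end{aligned}
\]

\[
\begin{aligned}
V_{2,5} &= \frac{5-2-1}{5}\times 0.85 + \frac{2+1}{5}
         - \frac{\binom{5-3}{2}+\binom{5-5}{2}+\binom{5-3}{2}+\binom{5-2}{2}+\binom{5-4}{2}}{5\binom{5}{2+1}} \\
        &= \frac{1.7}{5} + \frac{3}{5} - \frac{5}{50} = 0.84.
\end{aligned}
\]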

The result of V2;5 using equation (9) is identical to the result of V2 using equation (8) as

expected because equations (8) and (9) are simply different ways of calculating ordered

item pooling for the same first five examinees.
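As a further cross-check, the ordered item pooling rate can be tabulated straight from its definition by pairing each examinee only with groups of examinees tested earlier. The sketch below does this for the Table 1 data; the function name is illustrative.

```python
from itertools import combinations
from math import comb


def ordered_pooling_by_definition(tests, a):
    """Average overlap of each test with the union of every group of a earlier tests."""
    sets = [set(t) for t in tests]
    h, total, count = len(tests[0]), 0, 0
    for j, own in enumerate(sets):
        for group in combinations(sets[:j], a):   # empty until a tests precede j
            total += len(own & set.union(*group))
            count += 1
    assert count == comb(len(tests), a + 1)       # number of ordered comparisons
    return total / (h * count)


tests = [
    [2, 5, 4, 7, 8],   # A1
    [1, 3, 4, 6, 2],   # A2
    [7, 9, 2, 5, 8],   # A3
    [8, 7, 1, 3, 2],   # A4
    [2, 1, 6, 7, 4],   # A5
]
for a in (1, 2):
    print(a, ordered_pooling_by_definition(tests, a))
```

The internal assertion confirms the comparison count used as the denominator in equation (8).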

4. Simulation study

The effectiveness of two currently available exposure control procedures, SH and SHT, and a no exposure control procedure was evaluated in light of these new definitions of

test overlap rates using simulation CAT studies. The purpose of the simulation study was

to demonstrate the level of test overlap in light of the new definitions using no exposure

control and some current exposure control methods to gauge the level of security

afforded by these methods under the more realistic new definition of test overlap.

Two exposure control methods that target maximum item exposure rate were selected

for this purpose and they were by no means exhaustive of exposure control methods.

The SH procedure is designed to control maximum item exposure rate and is commonlyused in practice, while the SHT procedure controls test overlap between pair, of

examinees in addition to the maximum item exposure rate. The CAT specifications are

described below.

4.1. CAT specifications

All simulated data were generated using the three-parameter logistic item response model (3PLM); maximization of item information was the criterion used for item selection at each stage; expected a posteriori (EAP) estimation with a θ ~ N(0, 1) prior distribution was used for both provisional and final trait estimation; and the initial trait estimate was assumed to be zero. The adaptive tests were administered to 10,000 simulees drawn from a standard normal distribution, N(0, 1), on the theta metric. For the item exposure control procedures, two levels of test security were considered. For a high level of test security, rmax = 0.2 was used for the SH procedure, and SHT was implemented with rmax = 0.2 and Tmax = 0.1. For a low level of test security, rmax = 0.3 was selected for the SH procedure, and the SHT procedure was conducted with rmax = 0.3 and Tmax = 0.2.

A real item pool consisting of 400 mathematics items with item parameters calibrated using the 3PLM was employed. The content areas covered by the item pool were: pre-algebra, 84 items (21%); elementary algebra, 60 items (15%); intermediate algebra, 54 items (13.5%); coordinate geometry, 54 items (13.5%); plane geometry, 84 items (21%); and trigonometry, 64 items (16%).
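The response model and the maximum-information selection rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code used in the study: the scaling constant `D = 1.7` is assumed (the text does not report it), and the names `p_3pl`, `info_3pl`, `pick_max_info`, `disc`, `diff`, and `guess` are hypothetical. Exposure control and content balancing are deliberately left out.

```python
import math

D = 1.7  # logistic scaling constant; an assumption, not stated in the text


def p_3pl(theta, disc, diff, guess):
    """Probability of a correct response under the three-parameter logistic model."""
    return guess + (1.0 - guess) / (1.0 + math.exp(-D * disc * (theta - diff)))


def info_3pl(theta, disc, diff, guess):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, disc, diff, guess)
    return (D * disc) ** 2 * ((1.0 - p) / p) * ((p - guess) / (1.0 - guess)) ** 2


def pick_max_info(theta_hat, pool, used):
    """Index of the unused item with the largest information at theta_hat.

    pool is a list of (disc, diff, guess) tuples; used is a set of indices
    already administered to this examinee.
    """
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: info_3pl(theta_hat, *pool[i]))
```

Because this rule always favours the most informative items near the current trait estimate, the same items tend to be chosen for examinees of similar ability, which is why exposure control is layered on top of it.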


Two fixed test lengths were used: 20 and 40 items. For the 20-item test, the content areas were distributed as follows: pre-algebra, four items (20%); elementary algebra, three items (15%); intermediate algebra, three items (15%); coordinate geometry, three items (15%); plane geometry, four items (20%); and trigonometry, three items (15%). Thus, the distribution of content areas in the 20-item test was similar to that of the item pool. For the 40-item test, the number of items in each content area was twice that of the 20-item test. The multinomial model proposed by Chen and Ankenmann (2004) to randomize content area selection was implemented in this study to ensure content balance. In this approach, a cumulative distribution of content areas is formed based on the target percentages, and a random number drawn from U(0, 1) is then used to identify the corresponding content area in the cumulative distribution. As a result, the target percentage in each content area can be met exactly. See Chen and Ankenmann (2004) for more details of the multinomial model.
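The content-balancing step can be illustrated with a small sketch. It assumes, as one plausible reading of the description above, that a content area is dropped from the cumulative distribution once its quota is filled and that the remaining target percentages are renormalized; the details of the actual procedure are given in Chen and Ankenmann (2004). The names `TARGETS`, `next_content_area`, and `content_sequence` are illustrative.

```python
import random

# Target item counts per content area for the 20-item test (from the text).
TARGETS = {
    "pre-algebra": 4,
    "elementary algebra": 3,
    "intermediate algebra": 3,
    "coordinate geometry": 3,
    "plane geometry": 4,
    "trigonometry": 3,
}


def next_content_area(targets, remaining, rng):
    """Draw the next content area from the cumulative distribution of open areas."""
    # Assumption: areas whose quota is already filled are removed and the
    # target percentages of the remaining areas are renormalized.
    open_areas = [area for area in targets if remaining[area] > 0]
    total = sum(targets[area] for area in open_areas)
    u = rng.random()  # u ~ U(0, 1)
    cumulative = 0.0
    for area in open_areas:
        cumulative += targets[area] / total
        if u < cumulative:
            return area
    return open_areas[-1]  # guards against round-off when u is very close to 1


def content_sequence(targets, seed=0):
    """One randomized ordering of content areas that meets every target exactly."""
    rng = random.Random(seed)
    remaining = dict(targets)
    sequence = []
    while sum(remaining.values()) > 0:
        area = next_content_area(targets, remaining, rng)
        remaining[area] -= 1
        sequence.append(area)
    return sequence
```

Under this assumption the order in which content areas appear varies from examinee to examinee, while the number of items per area always equals the target.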

4.2. Results

Empirical test overlap rates for item sharing, unordered item pooling, and ordered item pooling were compared to the corresponding theoretical lower bounds of Chang and Zhang (2002), where randomized item selection is considered, to gauge the level of test overlap achieved by these exposure control procedures. Table 4 reports the empirical test overlap rates for the CAT simulations using no exposure control (No), SH, and SHT, as well as the corresponding theoretical lower bounds of the test overlap rates, for a values from 1 to 5. Note that the classical definition of test overlap is reported as a = 2 for item sharing and as a = 1 for both ordered and unordered item pooling.

It is clear from Table 4 that test overlap rates for item sharing decreased as the value of a increased. However, item sharing as defined here may not reflect realistic practice. In contrast, test overlap rates for item pooling, ordered or unordered, increased as the value of a increased. This makes sense because test security is more likely to be compromised when more individuals are revealing test information. The test overlap rates for ordered and unordered item pooling were very similar (identical to the second decimal place) for this large number of CATs (10,000).

Even though maximum item exposure rates could be controlled by the SH procedure, the item pooling test overlap rates for the SH procedure with rmax = .3 (SH_L) were close to those observed with no exposure control, especially for large a values. Taking the 20-item test and a = 5 as an example, the item pooling test overlap rate was .70 for SH_L and .78 when item exposure was not controlled. In other words, the item pooling test overlap could not be controlled by the SH_L procedure. For the 20-item test, there were 14 items in common on average (.70 × 20) between an individual and five other test takers when the SH procedure was implemented with rmax = .3.

The high item pooling test overlap observed in the SH_L procedure could be reduced by implementing the SHT procedure with rmax = .3 and Tmax = .2 (SHT_L). It is clear that test overlap between pairs of examinees (a = 1) was well controlled by the SHT_L procedure regardless of test length. However, item pooling test overlap rates for the other a values were too high to be acceptable in practice. Even though the high item pooling test overlap observed from both the SH_L and SHT_L procedures could be reduced by increasing the stringency of exposure control (i.e. smaller values of rmax and Tmax; see results for SH_H and SHT_H), item pooling test overlap rates were still far above their corresponding theoretical lower bounds, especially for tests with short test length.


Table 4. Empirical test overlap rates for CAT simulations and theoretical lower bounds

Item exposure   Item sharing                  Unordered item pooling               Ordered item pooling
control         a=2    3      4      5        a=1    2      3      4      5        a=1    2      3      4      5

20-item test
No              .352   .162   .084   .047     .352   .543   .656   .728   .778     .352   .542   .655   .727   .777
SH_L            .238   .064   .018   .005     .238   .412   .540   .635   .705     .238   .412   .539   .634   .704
SHT_L           .202   .048   .012   .003     .202   .356   .474   .566   .637     .202   .356   .473   .564   .636
SH_H            .173   .033   .006   .001     .173   .313   .427   .520   .596     .173   .314   .428   .521   .597
SHT_H           .105   .012   .002   .000     .105   .197   .279   .352   .416     .105   .197   .279   .351   .416
LB              .050   .003   .000   .000     .050   .098   .143   .185   .226

40-item test
No              .390   .193   .107   .063     .390   .586   .698   .766   .812     .390   .586   .697   .766   .812
SH_L            .250   .068   .019   .006     .250   .431   .563   .659   .731     .250   .431   .563   .660   .732
SHT_L           .204   .045   .010   .002     .204   .363   .487   .585   .662     .204   .363   .488   .586   .662
SH_H            .182   .035   .007   .001     .182   .329   .448   .544   .622     .182   .329   .448   .544   .622
SHT_H           .111   .013   .002   .000     .111   .209   .296   .373   .441     .111   .209   .296   .373   .441
LB              .100   .010*  .001   .000     .100   .190   .271*  .344   .410

Note. Test overlap rates for item sharing and unordered item pooling were calculated using the large-sample approximations. The classical definition of test overlap was reported as a = 2 for item sharing and as a = 1 for both ordered and unordered item pooling. No, no item exposure control; SH_L, SH procedure with less stringent exposure control, rmax = .3; SHT_L, SHT procedure with less stringent exposure control, rmax = .3, Tmax = .2; SH_H, SH procedure with stringent exposure control, rmax = .2; SHT_H, SHT procedure with stringent exposure control, rmax = .2, Tmax = .1; LB, theoretical lower bounds based on the derivation of Chang and Zhang (2002) under random item selection. Values marked with an asterisk are the values set in bold in the original table and discussed in the text.


Figures 1 and 2 present the unordered item pooling rates for the four item selection procedures for the 20-item test and the 40-item test, respectively, where rmax = .2 was used for the SH procedure, and SHT was implemented with rmax = .2 and Tmax = .1. As expected, the selection procedure without exposure control (i.e. based on the maximum item information criterion only) produced the highest unordered item pooling rates. The SH selection procedure provided much lower test overlap rates (by about .20) than the procedure without exposure control, but not as low as those provided by the SHT procedure with test-level exposure control. The difference between the SH and the SHT procedures in unordered item pooling rates appeared to widen gradually as the value of a increased. This seems to support the importance of controlling test overlap in conjunction with item exposure. Compared to the randomized selection procedure, the SHT procedure appeared to have some room for improvement, especially when the test was short (20 items). When the test was long (40 items), however, the unordered item pooling rates for the SHT procedure were just slightly higher than those for the randomized selection procedure (by up to .031).

Figure 1. Unordered item pooling rates for 20-item test (rmax = .2, Tmax = .1).

Figure 2. Unordered item pooling rates for 40-item test (rmax = .2, Tmax = .1).

Because the large-sample approximation for the unordered item pooling rate (equation (7)) is much simpler computationally than the exact rate given by equation (6), it is of interest to find out how large the number of examinees, p, must be for the approximation to function satisfactorily. Figure 3 shows the unordered item pooling rates calculated by the exact and approximation formulas for the first 1,000 examinees using a = 5, h = 20, and the SHT selection procedure with rmax = .2, Tmax = .1. (We plotted data for the first 1,000 examinees because we expected the approximation to work well within a sample size of 1,000. Note that the exposure times of each item would change when the number of examinees changed in the tabulation of test overlap rates.) The approximated rates were higher than the exact rates for the first 200 or so examinees, and the difference gradually disappeared as the number of examinees increased.
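The behaviour of the approximation can be explored numerically. The sketch below is illustrative only: the exact rate is written in the binomial form that appears in the Appendix A derivation, $1 - \sum_i m_i \binom{p-m_i}{a} / [hp\binom{p-1}{a}]$, the approximation is $1 - \sum_i r_i(1-r_i)^a/h$ with $r_i = m_i/p$, and the function names and the toy usage counts (the last row of Table 3) are assumptions made for the example, not the simulation data.

```python
from math import comb


def unordered_exact(usage, p, h, a):
    """Exact unordered item pooling rate from item usage counts m_i."""
    missed = sum(m * comb(p - m, a) for m in usage)
    return 1.0 - missed / (h * p * comb(p - 1, a))


def unordered_approx(usage, p, h, a):
    """Large-sample approximation based on exposure rates r_i = m_i / p."""
    return 1.0 - sum((m / p) * (1.0 - m / p) ** a for m in usage) / h


def gap(usage, p, h, a, scale):
    """Approximation minus exact rate when every count and p grow by `scale`.

    Scaling keeps the exposure rates fixed, so only the sample size changes.
    """
    big_usage = [m * scale for m in usage]
    big_p = p * scale
    return (unordered_approx(big_usage, big_p, h, a)
            - unordered_exact(big_usage, big_p, h, a))
```

For the Table 3 counts `[3, 5, 2, 3, 2, 2, 4, 3, 1, 0]` with p = 5, h = 5, and a = 1, the exact rate is 0.56 while the approximation is about 0.648; holding the exposure rates fixed and multiplying the sample size by 100 shrinks the gap to below 0.001, which is the same qualitative pattern as in Figure 3.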

Figure 3. Exact and approximated unordered item pooling rates for a = 5, test length = 20, and with SHT exposure control using rmax = .2, Tmax = .1.

Due to the similarity between ordered and unordered item pooling test overlap rates and the complexity of equations (8) and (9) for tabulating ordered item pooling, the unordered rate can serve as a large-sample approximation for the ordered rate. Figure 4 displays the ordered and unordered item pooling rates for the first 1,000 examinees using a = 5, h = 20, and the SHT selection procedure with rmax = .2, Tmax = .1. The unordered rates might overestimate or underestimate their ordered counterparts somewhat when the number of CATs administered was small (fewer than 250). The ordered and unordered rates became virtually indistinguishable after 250 examinees.

In addition, the accuracy of the large-sample approximations for the item sharing (equation (4)) and unordered item pooling (equation (7)) rates can be verified by using the theoretical lower bounds. Taking a = 3 and test length (h) equal to 40 as an example, under randomized item selection the item exposure rate is equal to 40/400 for each item in the 400-item pool. By equation (4), the large-sample approximation for the item sharing rate is $\sum_{i=1}^{400} 0.1^{3}/40 = 0.01$, which is identical to the test overlap rate (value in bold) observed in the last row of Table 4. The large-sample approximation for the unordered item pooling rate using equation (7) is $1 - \sum_{i=1}^{400} 0.1(0.9)^{3}/40 = 0.271$, which is identical to the corresponding test overlap rate (value in bold) in Table 4. While the lower bounds for test overlap rates are applicable for randomized item selection only, equations (4) and (7) can be used to obtain item sharing and unordered item pooling rates, respectively, for any item selection procedure implemented in practice as long as the item exposure rates ($r_i$) are known. The CAT simulations above demonstrated the use of two item selection procedures with exposure control.
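The lower-bound rows of Table 4 can be regenerated from the item exposure rates alone. The sketch below assumes that equations (4) and (7) have the forms implied by the worked numbers in the preceding paragraph, namely $\sum_i r_i^{a}/h$ for item sharing and $1 - \sum_i r_i(1-r_i)^{a}/h$ for unordered item pooling; the function names are illustrative.

```python
def sharing_rate(rates, h, a):
    """Large-sample item sharing rate for a group of a examinees."""
    return sum(r ** a for r in rates) / h


def pooling_rate(rates, h, a):
    """Large-sample unordered item pooling rate against a other examinees."""
    return 1.0 - sum(r * (1.0 - r) ** a for r in rates) / h


def lower_bounds(pool_size, h):
    """Rates under randomized selection, where every item has r_i = h / n."""
    rates = [h / pool_size] * pool_size
    sharing = [sharing_rate(rates, h, a) for a in range(2, 6)]
    pooling = [pooling_rate(rates, h, a) for a in range(1, 6)]
    return sharing, pooling
```

Calling `lower_bounds(400, 40)` reproduces, after rounding to three decimals, the LB row for the 40-item test. Passing empirical exposure rates to `sharing_rate` and `pooling_rate` instead gives the corresponding rates for any other selection procedure.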

Figure 4. Ordered and unordered item pooling rates for a = 5, test length = 20, and with SHT exposure control using rmax = .2, Tmax = .1.

5. Conclusions

Controlling item exposure, test overlap, or both is a popular means of increasing test security in CATs. To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one other examinee in practice. This larger scope of information sharing needs to be taken into account in estimating test security level and in refining exposure control procedures.

This paper examines three different definitions of test overlap rate to reflect possible ways of sharing information among a group of examinees larger than two. Item sharing should be considered if one believes that examinees would only take advantage of information on items that have been administered to all examinees in a group. Item pooling is probably more realistic in that examinees would take advantage of information on any items that have been administered to a group. Between the two variants of item pooling, the ordered version (taking into account order of exams) is likely to be more reasonable than the unordered version (disregarding order of exams) given the continuous nature of CATs. Note that the mathematical definitions of test overlap do not specify what constitutes a group. Researchers and practitioners can narrow the definition to any group of interest by stipulating the attributes of the group.

Because procedures designed to control test overlap and item exposure simultaneously would have item exposure rates or item usage information readily available, it is useful to derive the relationship between item exposure rates and test overlap rate. Following the approach of Chen et al. (2003), we document the mathematical relationships between item exposure rates and each of the definitions of test overlap rate for a group of examinees in this paper. Large-sample approximations are also provided, when available, to simplify these relationships. Moreover, the test overlap rate for ordered item pooling is expressed as a function of that calculated for previous examinees to make the development of exposure control procedures on the fly (i.e. relevant statistics are updated for each CAT) more convenient.

Empirical test overlap rates based on the new definitions for three item selection procedures (maximum information without exposure control, SH, and SHT) are examined along with the theoretical lower bounds. Results are as expected: the more stringent the exposure control, the lower the overlap rates. Item pooling rates are substantially higher than the theoretical lower bounds even for the most stringent exposure control procedure examined here (SHT), especially when the test is short. Moreover, test overlap rates for item pooling among a group of examinees larger than two are considerably higher than those for item sharing between pairs of examinees (the classical definition of test overlap). More stringent exposure control than that afforded by the currently available procedures may be needed if test information pooling generally happens in groups larger than two.

Furthermore, empirical evidence suggests that the large-sample approximation for the unordered item pooling rate converges to the exact rate quite quickly (after around 200 examinees). The unordered rate also converges quickly to the ordered rate (after about 250 examinees). Even though the unordered rates may not be as applicable as the ordered rates in practice, these two indices are indistinguishable when the number of CATs is not too small. Given the simplicity of the large-sample approximation for unordered rates provided in equation (7), the ordered rates could be approximated efficiently by using the large-sample approximation of unordered rates. The large-sample approximation of unordered rates is especially useful for estimating the lower bound for the ordered rates, which cannot be theoretically derived. That is, the lower bound for ordered rates can be estimated by applying equation (7) with the item exposure rate of each item set equal to the ratio of test length to pool size.


Empirical test overlap rates based on the new definitions are provided only for two exposure control procedures that explicitly target maximum item exposure rate (i.e. SH and SHT). Even though the two procedures can control item exposure (and test overlap) well and are commonly used in practice, they cannot increase exposure rates for underexposed items. Another popular exposure control procedure, the a-stratified method proposed by Chang and Ying (1999), can effectively balance item usage. By implementing the a-stratified method, there are few items with exposure rates equal to zero and item usage is well balanced without sacrificing the efficiency in ability estimation (Chang & Ying, 1999). However, the maximum of observed item exposure rates may not be controlled by implementing the a-stratified method alone. It would be interesting to compare the SH and SHT procedures to the a-stratified method on empirical test overlap rates based on the new definitions.

As noted earlier, the new definitions of test overlap do not specify what constitutes a group of examinees, and the formulas derived in this paper can be applied to different definitions of groups with different characteristics. In practice, test practitioners would consider test overlap conditional on ability because item information is more likely to be shared among individuals with similar ability. To find conditional test overlap rates, the formulas presented in this paper can simply be applied to the group of examinees with similar ability rather than the larger group containing all ability levels. Since examinees with similar ability levels tend to take similar items in CATs, it could be expected that conditional test overlap rates would be much higher than the unconditional test overlap rates observed in this study. For exposure control conditional on ability levels, conditional procedures that consider ability subgroups, such as the Stocking and Lewis (1998) conditional procedure, can be used, and they are expected to provide lower conditional test overlap rates. Future studies should compare the Stocking and Lewis conditional procedure and the item exposure control procedures implemented in this study on the level of test overlap conditional on ability.
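Applying the formulas within ability subgroups amounts to recomputing the item exposure rates inside each subgroup. The following toy sketch illustrates the idea with the large-sample unordered item pooling rate; the ability bands, the cut points, and the function names are assumptions made for illustration, and in practice the subgroups would need to be large enough for the approximation to be adequate.

```python
from collections import Counter


def pooling_rate_for_group(tests, a):
    """Large-sample unordered item pooling rate computed within one group.

    tests is a list of fixed-length item lists, one per examinee.
    """
    p = len(tests)
    h = len(tests[0])
    usage = Counter(item for test in tests for item in test)
    return 1.0 - sum((m / p) * (1.0 - m / p) ** a for m in usage.values()) / h


def conditional_pooling_rates(records, a, cut_points):
    """Split (theta, items) records into ability bands and rate each band."""
    bands = {}
    for theta, items in records:
        band = sum(theta >= cut for cut in cut_points)  # index of ability band
        bands.setdefault(band, []).append(items)
    return {band: pooling_rate_for_group(tests, a)
            for band, tests in sorted(bands.items())}
```

In a contrived case where two low-ability examinees both receive items 1 and 2 and two high-ability examinees both receive items 3 and 4, the rate within each band is 1.0 while the rate over all four examinees is 0.5 for a = 1, which mirrors the expectation that conditional overlap exceeds unconditional overlap.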

In addition to conditioning on ability level, test overlap conditioning on testing time could be considered because an individual would be more likely to get item information from the most recent examinees than from examinees who took the test years ago. To find the test overlap rate conditional on testing time, the formulas presented in this paper can likewise be applied to the group of examinees who take the test in close proximity.

In sum, this paper lays the groundwork for future developments of exposure control procedures to simultaneously control item exposure and test overlap rates for more realistic item information sharing situations, in which examinees may obtain test information from a group of two or more examinees. We have illustrated the initial development process of such a procedure for controlling ordered item pooling on the fly based on equation (9). Similar development processes can be followed if one would like to control for item sharing or unordered item pooling. Large-sample approximations for both item sharing and unordered item pooling depend on a power function of item exposure rates. One can start by expanding the power function of order a, which varies by the number of examinees sharing or pooling information. However, developing the actual exposure control algorithms is beyond the scope of the paper.

Future studies should naturally develop appropriate exposure control procedures based on the relationships shown in this paper and thoroughly evaluate their effect on both test security and score precision. In addition, how examinees share test information is likely to depend on the stakes and nature of the test as well as characteristics of test takers. It may be worthwhile to investigate the common group sizes (i.e. determining a reasonable a value) for different tests and different populations of test takers.

Acknowledgements

This study was supported by the National Science Council, Taiwan (NSC 97-2410-H-194-107-MY3).

References

Chang, H.-H., & Ying, Z. (1999). a-Stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23, 211–222.

Chang, H.-H., & Zhang, J. (2002). Hypergeometric family and item overlap rates in computerized adaptive testing. Psychometrika, 67, 387–398.

Chen, S., & Ankenmann, R. D. (2004). Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing. Journal of Educational Measurement, 41, 149–174.

Chen, S., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and test overlap in computerized adaptive testing. Journal of Educational Measurement, 40, 129–145.

Chen, S., & Lei, P. (2005). Controlling item exposure and test overlap in computerized adaptive testing. Applied Psychological Measurement, 29, 204–217.

Chen, S., Lei, P., & Liao, W. (2008). Controlling item exposure and test overlap on the fly in computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 61, 471–492.

Davey, T., & Parshall, C. G. (1995, April). New algorithms for item selection and exposure control with computerized adaptive testing. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

Revuelta, J., & Ponsoda, V. (1998). A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35, 311–327.

Stocking, M. L., & Lewis, C. (1998). Controlling item exposure conditional on ability in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57–75.

Sympson, J. B., & Hetter, R. D. (1985, October). Controlling item-exposure rates in computerized adaptive testing. Proceedings of the 27th Annual Meeting of the Military Testing Association (pp. 973–977). San Diego, CA: Navy Personnel Research and Development Center.

Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17, 17–27.

Received 7 February 2008; revised version received 3 February 2009


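Appendix A. Derivation of unordered item pooling as a function of item exposure rates

$$
\begin{aligned}
V_a &= \frac{\sum_{i=1}^{n}\sum_{j=1}^{a} m_i \binom{m_i-1}{j}\binom{p-m_i}{a-j}}{hp\binom{p-1}{a}}, \quad \text{with } \binom{m_i-1}{j} = 0 \text{ if } m_i - 1 < j \\
&= \sum_{i=1}^{n} \frac{m_i}{hp}\left[\frac{\binom{m_i-1}{1}\binom{p-m_i}{a-1} + \binom{m_i-1}{2}\binom{p-m_i}{a-2} + \cdots + \binom{m_i-1}{a}\binom{p-m_i}{0}}{\binom{p-1}{a}}\right] \\
&= \sum_{i=1}^{n} \frac{m_i}{hp}\left[1 - \frac{\binom{p-m_i}{a}}{\binom{p-1}{a}}\right] \quad \because\ \frac{\sum_{j=0}^{a}\binom{x}{j}\binom{y}{a-j}}{\binom{x+y}{a}} = 1.0 \\
&= \sum_{i=1}^{n} \frac{m_i}{hp}\left[1 - \frac{(p-m_i)(p-m_i-1)\cdots(p-m_i-a+1)}{(p-1)(p-2)\cdots(p-1-a+1)}\right] \\
&= \sum_{i=1}^{n} \frac{m_i}{hp} - \sum_{i=1}^{n} \frac{m_i}{hp}\left[\frac{(p-m_i)(p-m_i-1)\cdots(p-m_i-a+1)}{(p-1)(p-2)\cdots(p-1-a+1)}\right] \\
&= 1 - \sum_{i=1}^{n} \frac{m_i(p-m_i)(p-m_i-1)\cdots(p-m_i-a+1)}{hp(p-1)(p-2)\cdots(p-a)} \quad \because\ \sum_{i=1}^{n}\frac{m_i}{hp} = \sum_{i=1}^{n}\frac{r_i}{h} = \frac{h}{h} = 1 \\
&= 1 - \sum_{i=1}^{n} \frac{(pr_i)(p-pr_i)(p-pr_i-1)\cdots(p-pr_i-a+1)}{hp(p-1)(p-2)\cdots(p-a)} \\
&= 1 - \frac{1}{h}\sum_{i=1}^{n} \frac{r_i\,[p(1-r_i)][p(1-r_i)-1]\cdots[p(1-r_i)-(a-1)]}{(p-1)(p-2)\cdots(p-a)}.
\end{aligned}
$$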
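The equality between the first and the last count-based lines of this derivation can be checked numerically. The sketch below is a sanity check under the assumption that the item usage counts sum to hp, which the derivation uses when it sets $\sum_i m_i/(hp) = 1$; the function names are illustrative and the example counts are the last row of Table 3.

```python
from math import comb


def pooling_double_sum(usage, p, h, a):
    """First line of the derivation: explicit sum over j overlapping examinees."""
    total = sum(
        m * comb(m - 1, j) * comb(p - m, a - j)
        for m in usage if m >= 1
        for j in range(1, a + 1)
    )
    return total / (h * p * comb(p - 1, a))


def falling(x, k):
    """Product x (x - 1) ... (x - k + 1)."""
    out = 1
    for step in range(k):
        out *= x - step
    return out


def pooling_closed_form(usage, p, h, a):
    """Closed form in m_i: one minus a ratio of falling factorials."""
    missed = sum(m * falling(p - m, a) for m in usage)
    return 1.0 - missed / (h * p * falling(p - 1, a))
```

For the Table 3 counts with p = 5 and h = 5, both functions give 0.56 for a = 1 and 0.78 for a = 2.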


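Appendix B. Derivation of the number of all possible comparisons for ordered item pooling

$$
\binom{p}{a+1} = \binom{a}{a} + \binom{a+1}{a} + \binom{a+2}{a} + \cdots + \binom{p-1}{a}
$$

$$
\begin{aligned}
\binom{p}{a+1} &= \binom{p-1}{a} + \binom{p-1}{a+1} \quad \because\ \binom{x}{y} = \binom{x-1}{y-1} + \binom{x-1}{y} \ \text{(Pascal's triangle)} \\
&= \binom{p-1}{a} + \binom{p-2}{a} + \binom{p-2}{a+1} \\
&= \binom{p-1}{a} + \binom{p-2}{a} + \binom{p-3}{a} + \binom{p-3}{a+1} \\
&\ \ \vdots \\
&= \binom{p-1}{a} + \binom{p-2}{a} + \binom{p-3}{a} + \binom{p-4}{a} + \cdots + \binom{a+2}{a} + \binom{a+2}{a+1} \\
&= \binom{p-1}{a} + \binom{p-2}{a} + \binom{p-3}{a} + \binom{p-4}{a} + \cdots + \binom{a+2}{a} + \binom{a+1}{a} + \binom{a+1}{a+1} \\
&= \binom{p-1}{a} + \binom{p-2}{a} + \binom{p-3}{a} + \binom{p-4}{a} + \cdots + \binom{a+2}{a} + \binom{a+1}{a} + \binom{a}{a} + \binom{a}{a+1} \\
&= \binom{p-1}{a} + \binom{p-2}{a} + \binom{p-3}{a} + \binom{p-4}{a} + \cdots + \binom{a+2}{a} + \binom{a+1}{a} + \binom{a}{a} + 0.
\end{aligned}
$$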
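The identity can be confirmed numerically for small cases. In this short sketch the function names are illustrative; the right-hand side is read as the sum, over examinees t = a + 1, ..., p, of the number of groups of a earlier examinees with which examinee t can be compared.

```python
from math import comb


def comparisons_total(p, a):
    """Left-hand side: all ordered comparisons among the first p examinees."""
    return comb(p, a + 1)


def comparisons_by_examinee(p, a):
    """Right-hand side: examinee t contributes C(t - 1, a) comparisons."""
    return sum(comb(t - 1, a) for t in range(a + 1, p + 1))
```

For p = 5, both functions return 10 for a = 1 and for a = 2, which is the factor 10 in the denominators 5 × 10 of the worked examples for V1 and V2.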


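Appendix C. Derivation of ordered item pooling for the first t examinees

$$
\begin{aligned}
V_{at} &= \frac{V_{a(t-1)}\,h\binom{t-1}{a+1} + \sum_{i=1}^{n}\sum_{j=1}^{a}\,[m_{it} - m_{i(t-1)}]\binom{m_{it}-1}{j}\binom{t-m_{it}}{a-j}}{h\binom{t}{a+1}} \\
&= \frac{V_{a(t-1)}\,h\binom{t-1}{a+1} + \sum_{i=1}^{n}\,[m_{it} - m_{i(t-1)}]\left[\binom{t-1}{a} - \binom{t-m_{it}}{a}\right]}{h\binom{t}{a+1}}, \quad \because\ \binom{x+y}{a} = \sum_{j=0}^{a}\binom{x}{j}\binom{y}{a-j} \\
&= \left(\frac{t-a-1}{t}\right)V_{a(t-1)} + \frac{h\binom{t-1}{a} - \sum_{i=1}^{n}\,[m_{it} - m_{i(t-1)}]\binom{t-m_{it}}{a}}{h\binom{t}{a+1}} \\
&= \left(\frac{t-a-1}{t}\right)V_{a(t-1)} + \frac{a+1}{t} - \frac{\sum_{i=1}^{n}\,[m_{it} - m_{i(t-1)}]\binom{t-m_{it}}{a}}{h\binom{t}{a+1}}.
\end{aligned}
$$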
