Copyright © The British Psychological Society. Reproduction in any form (including the internet) is prohibited without prior permission from the Society.
Investigating the relationship between item exposure and test overlap: Item sharing and item pooling
Shu-Ying Chen¹* and Pui-Wa Lei²
¹National Chung-Cheng University, Chia-Yi, Taiwan, Republic of China
²The Pennsylvania State University, USA
To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed-length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.
1. Introduction
Due to significant progress in computer technology, computerized adaptive testing (CAT)
has grown in popularity in recent years. Many conventional paper-and-pencil (P&P) tests
are now offered in a CAT format. It is well known that CAT has many advantages over
conventional P&P tests. These advantages, however, may not be sustained if test items are
compromised. When examinees can easily obtain test information from previous test takers and answer items correctly simply based on item pre-knowledge rather than on
*Correspondence should be addressed to Dr Shu-Ying Chen, Department of Psychology, National Chung-Cheng University, 168 University Road, Min-Hsiung, Chia-Yi, Taiwan 62107, Republic of China(e-mail: [email protected]).
British Journal of Mathematical and Statistical Psychology (2010), 63, 205–226
© 2010 The British Psychological Society
www.bpsjournals.co.uk
DOI:10.1348/000711009X430906
their proficiency, the observed test scores would be invalid. Item sharing among
examinees does not raise great concern for conventional P&P testing because most P&P
tests are administered on periodic test schedules and most items are not reused. Item
sharing does raise great concern for CATs because most CATs are continuous tests and
items are reused (Chang & Zhang, 2002; Davey & Parshall, 1995; Revuelta & Ponsoda,
1998; Way, 1998). To reduce the threat of item sharing among examinees, item exposure must be taken into account in conducting CATs.
Item exposure rate and test overlap rate are two indices commonly used to track
item exposure in CATs. Item exposure rate refers to the proportion of all CATs in which
an item is administered, and test overlap rate is defined as the proportion of items shared
by pairs of exams, averaged across all possible pairwise comparisons. By considering
both indices, item exposure can be monitored at both the item and test levels. With
respect to item exposure at the test level, the test overlap rate could provide useful
information on item sharing between pairs of examinees. This index alone, however, may not be sufficient to capture real item sharing among examinees. In practice,
examinees may obtain test information from more than one examinee. This larger scope
of information sharing should be considered in addition to the pairwise item sharing.
That is, test overlap among a group of examinees as well as that between pairs of
examinees should be controlled in CATs.
To control item exposure and test overlap simultaneously, the relationship between
the two indices needs to be identified first. The relationship between item exposure and
pairwise test overlap has been provided by Chen, Ankenmann, and Spray (2003) and can be expressed as:
$$T = \frac{S_r^2 + \bar{r}^2}{\bar{r}}, \tag{1}$$
where $\bar{r}$ and $S_r^2$ are the sample mean and variance of item exposure rates, respectively, and $T$ is a test overlap rate defined as the proportion of items shared by pairs of exams,
averaged across all possible pairwise comparisons. Based on the relationship shown in
equation (1), Chen and Lei (2005) modified the Sympson and Hetter (1985) procedure
such that both item exposure rate and pairwise test overlap rate can be controlled
simultaneously.
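Equation (1) can be evaluated directly from a vector of item exposure rates. The following is a minimal sketch, not the authors' code: the function name and the two example pools are hypothetical, and it assumes the variance uses the divisor n (the convention under which equation (1) agrees with the large-sample form given later in equation (5)).

```python
from statistics import fmean, pvariance


def pairwise_overlap_from_moments(exposure_rates):
    """Equation (1): T = (S_r^2 + r_bar^2) / r_bar."""
    r_bar = fmean(exposure_rates)
    # Assumption: variance with divisor n, not n - 1.
    s2_r = pvariance(exposure_rates)
    return (s2_r + r_bar ** 2) / r_bar


# Hypothetical pools of 10 items with test length 5 (rates sum to 5).
uniform_rates = [0.5] * 10
skewed_rates = [1.0, 1.0, 0.8, 0.6, 0.6, 0.4, 0.4, 0.2, 0.0, 0.0]

t_uniform = pairwise_overlap_from_moments(uniform_rates)
t_skewed = pairwise_overlap_from_moments(skewed_rates)
print(t_uniform, t_skewed)
```

Both pools have the same mean exposure rate, so the difference in the two values comes entirely from the variance term, which is the quantity that an overlap control procedure has to restrict.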
Nevertheless, the test overlap considered in the Sympson and Hetter procedure with
test overlap control is defined as the proportion of items shared by pairs of exams.
Because examinees may obtain test information from more than one previous test taker in the field, controlling the number of common items shared by a group of examinees as
well as by pairs of exams is necessary. This paper extends the analytical derivation of
Chen et al. (2003) to document the relationship between item exposure rates and test
overlap rate when the proportion of items shared by a group of examinees larger than
two is of practical interest.
1.1. Theoretical lower bound for test overlap
Even though controlling test overlap among a group of examinees may be complex,
Chang and Zhang (2002) provided a theoretical lower bound for the number of common items shared by a group of examinees under the assumption of completely
randomized item selection. The theoretical lower bound can serve as a benchmark
for test practitioners to assess the severity of test overlap observed in practice and to
set an appropriate target test overlap rate if procedures for controlling test overlap
are available.
Chang and Zhang (2002) distinguished two forms of test overlap for a group
of examinees: item sharing and item pooling. Item sharing is defined as the number of
common items shared by all examinees in a group, while item pooling is the number
of overlapping items that an examinee has with a group of examinees. To be more specific, let $A_i$ be the set of items that examinee $i$ takes, and let $p$ be the number of CATs
administered. Item sharing ($X_a$) is the number of common items shared by all $a$
examinees in a group (i.e. number of items in $\bigcap_{i=1}^{a} A_i$) averaged over all possible
combinations of $a$ examinees out of the $p$ CATs administered. Item pooling ($Y_a$) is the
number of overlapping items between an examinee's test and the combined set of items
exposed to a group of $a$ examinees (i.e. the number of items in $A_j \cap (\bigcup_{i=1}^{a} A_i)$ for $j \neq i$)
averaged over all possible such combinations out of the $p$ CATs administered. Because
examinees are likely to obtain test information from those who have taken the test earlier, it may be more efficient to consider ordered item pooling ($Z_a$), which is the
average number of items in $A_j \cap (\bigcup_{i=1}^{a} A_i)$ for $j > i$ instead of $j \neq i$. The former form of
item pooling is henceforth referred to as unordered item pooling, and the latter as
ordered item pooling.
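The three definitions can be tabulated by brute force on a small example. The sketch below is illustrative only: the function names are hypothetical, and the five tests are those listed in Table 1 of Section 3, stored in administration order. It enumerates every group of a examinees and averages the size of the relevant intersection.

```python
from itertools import combinations

# Items taken by examinees 1..5, in administration order (Table 1).
TESTS = [
    {2, 5, 4, 7, 8},
    {1, 3, 4, 6, 2},
    {7, 9, 2, 5, 8},
    {8, 7, 1, 3, 2},
    {2, 1, 6, 7, 4},
]


def item_sharing(tests, a):
    """X_a: mean number of items common to all members of a group of a."""
    counts = [len(set.intersection(*group)) for group in combinations(tests, a)]
    return sum(counts) / len(counts)


def item_pooling(tests, a, ordered=False):
    """Y_a (unordered) or Z_a (ordered): mean overlap with a group's union."""
    counts = []
    for j, own in enumerate(tests):
        # Ordered pooling: examinee j can only learn from earlier examinees.
        others = tests[:j] if ordered else tests[:j] + tests[j + 1:]
        for group in combinations(others, a):
            counts.append(len(own & set.union(*group)))
    return sum(counts) / len(counts)


print(item_sharing(TESTS, 2), item_sharing(TESTS, 3))
print(item_pooling(TESTS, 2), item_pooling(TESTS, 2, ordered=True))
```

Dividing each value by the test length (5 here) gives the corresponding rate. The enumeration grows combinatorially with the number of tests, which is why closed-form expressions in terms of item usage are useful.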
1.2. Purpose of the study
The purpose of this paper is to analytically derive the relationships between item
exposure rate and each of the three forms of test overlap: item sharing, unordered item
pooling, and ordered item pooling. The accuracy of the relationships derived was
verified using numerical examples, by comparing the computational results with the
corresponding tabulations based on the definitions.
A simulation study was conducted to examine the extent of exposure control by two currently available exposure control procedures, the Sympson and Hetter (1985)
procedure (SH) and the Sympson and Hetter procedure with test overlap control (SHT),
in light of these new definitions of test overlap rates. The relationships derived, along
with findings from the simulation study, will lay the foundation for future development
of procedures to simultaneously control item exposure and item sharing or item pooling
among a group of examinees larger than two.
2. Relationship between item exposure and test overlap
2.1. Item sharing
Considering item sharing between pairs of exams, Chen et al. (2003) showed that the
relationship between test overlap (i.e. item sharing between pairs of examinees) and
item exposure rate is given by:
$$T = \frac{\sum_{i=1}^{n}\binom{m_i}{2}}{h\binom{p}{2}} = \frac{\sum_{i=1}^{n} m_i(m_i-1)}{hp(p-1)} = \frac{\sum_{i=1}^{n} r_i(pr_i-1)}{h(p-1)}, \tag{2}$$
where h is test length, n is the number of items in the pool, p is the number of CATs
administered, mi is the number of times item i is administered, and ri is the item
exposure rate of item i, defined as mi/p. The relationship shown in equation (2) is
derived based on combinatorial mathematics. Specifically, the numerator term
represents the total number of items in common across all pairwise comparisons, and $\sum_{i=1}^{n}\binom{m_i}{2}/\binom{p}{2}$ is the average number of items in common between a pair of examinees.
It is straightforward to extend equation (2) to item sharing among a group of a
examinees ($W_a$), by simply substituting the number 2 in equation (2) with $a$ ($a$ can be 2
or greater). After simplifying and substituting mi with its equivalent ( pri ), the general
relationship between item exposure rate and item sharing among a examinees is
given by:
$$W_a = \frac{X_a}{h} = \frac{\sum_{i=1}^{n}\binom{m_i}{a}}{h\binom{p}{a}} = \frac{\sum_{i=1}^{n} m_i(m_i-1)\cdots[m_i-(a-1)]}{hp(p-1)\cdots[p-(a-1)]} = \frac{\sum_{i=1}^{n} r_i(pr_i-1)\cdots[pr_i-(a-1)]}{h(p-1)\cdots[p-(a-1)]}, \tag{3}$$
where $X_a$ is the average number of common items shared by a group of $a$ examinees. Item sharing can be found by using equation (3) once item usage is known. Equation (3)
is especially convenient when $p$, the number of CATs administered, is large, because it can then
be further simplified to give:
$$W_a = \frac{X_a}{h} = \frac{\sum_{i=1}^{n} r_i(pr_i-1)\cdots[pr_i-(a-1)]}{h(p-1)\cdots[p-(a-1)]} = \frac{\sum_{i=1}^{n} r_i\left[(pr_i)^{a-1}+\cdots\right]}{h\left(p^{a-1}+\cdots\right)} \xrightarrow{\;p\to\infty\;} \frac{\sum_{i=1}^{n} r_i^{a}}{h}. \tag{4}$$
According to equation (4), the average number of common items shared by a group of a
examinees can be obtained simply by adding $r_i^a$ across the item pool when $p$ is large.
As item sharing between pairs of examinees (i.e. $T$ of Chen et al., 2003) is a special
case of equation (3), the large-sample approximation of $T$ provided by Chen et al. (2003) is similarly a special case of equation (4). That is,
$$W_2 = T = \frac{\sum_{i=1}^{n} m_i(m_i-1)}{hp(p-1)} = \frac{\sum_{i=1}^{n} r_i(pr_i-1)}{h(p-1)} \xrightarrow{\;p\to\infty\;} \frac{\sum_{i=1}^{n} r_i^{2}}{h} = \frac{S_r^2+\bar{r}^2}{\bar{r}}. \tag{5}$$
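Equations (3) and (4) translate into a few lines of code. The sketch below is an illustration rather than the authors' implementation; the function names are hypothetical, and the item usage counts are those of Table 2 in Section 3. The last lines scale the same exposure rates up to p = 10,000 tests (a hypothetical scenario) to show how close the exact value is to the large-sample form.

```python
from math import comb


def item_sharing_rate(usage, p, h, a):
    """Equation (3): W_a = sum_i C(m_i, a) / (h * C(p, a))."""
    return sum(comb(m, a) for m in usage) / (h * comb(p, a))


def item_sharing_rate_large_p(rates, h, a):
    """Equation (4): large-p approximation, sum_i r_i^a / h."""
    return sum(r ** a for r in rates) / h


usage = [3, 5, 2, 3, 2, 2, 4, 3, 1, 0]  # m_i from Table 2, p = 5, h = 5
w2 = item_sharing_rate(usage, p=5, h=5, a=2)
w3 = item_sharing_rate(usage, p=5, h=5, a=3)

# Hypothetical: same exposure rates, but 10,000 tests instead of 5.
rates = [m / 5 for m in usage]
big_usage = [m * 2000 for m in usage]
w2_exact_big = item_sharing_rate(big_usage, p=10_000, h=5, a=2)
w2_limit = item_sharing_rate_large_p(rates, h=5, a=2)
print(w2, w3, w2_exact_big, w2_limit)
```

With only five tests the exact rate (0.56 for a = 2) differs noticeably from the limiting value, which is expected because equation (4) is a large-p result.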
Based on the relationship between item exposure rate and test overlap rate as shown in
equation (5), Chen and Lei (2005) developed a procedure to simultaneously control
item exposure and test overlap at pre-specified values by controlling both the maximum
value and the variance of item exposure rates. A similar process could be applied for
controlling item exposure and test overlap simultaneously for $a > 2$. However, since an
examinee might not study for just the items administered to all his or her friends but for
any item administered to any of his or her friends, controlling item pooling among examinees might be more realistic in practice. It is therefore important to explore the
relationship between item pooling and item exposure.
2.2. Unordered item pooling
Let $Y_a$ be the average number of overlapping items between an examinee and any of the
a examinees in a group, h be test length, n be the number of items in the pool, p be the
number of CATs administered, and mi be the number of times item i is administered.
Using similar counting rules in tabulating item sharing, the proportion of overlapping
items between an examinee’s test and the combined set of items administered to
a group of $a$ examinees, averaged over all $p\binom{p-1}{a}$ possible such combinations out of the
total number of CATs administered, can be expressed as a function of item exposure
rates as follows (see Appendix A for details of the derivation):
$$V_a = \frac{Y_a}{h} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{a} m_i\binom{m_i-1}{j}\binom{p-m_i}{a-j}}{hp\binom{p-1}{a}} = 1 - \frac{1}{h}\sum_{i=1}^{n}\frac{r_i\,[p(1-r_i)][p(1-r_i)-1]\cdots[p(1-r_i)-(a-1)]}{(p-1)(p-2)\cdots(p-a)}. \tag{6}$$
Equation (6) was constructed based on the hypergeometric distribution and
combinatorial mathematics. The numerator, $\sum_{i=1}^{n}\sum_{j=1}^{a} m_i\binom{m_i-1}{j}\binom{p-m_i}{a-j}$, represents the total number of times that the same item is administered to an examinee and any of the $a$ examinees in the group across all $p\binom{p-1}{a}$ possible comparisons. Take $a = 2$, for
example, and assume that item $i$ has been administered to $m$ examinees among $p$ CATs
administered and examinee $k$ is one of the $m$ examinees taking item $i$. That is, excluding
examinee $k$, item $i$ has been administered to $m-1$ examinees among $p-1$ CATs
administered. To count the number of times that item $i$ has been administered to
examinee $k$ and any other two ($a$) examinees, the hypergeometric distribution could be
applied. Specifically, among the $p-1$ CATs administered (excluding examinee $k$), there are
$\binom{m-1}{1}\binom{p-m}{2-1}$ ways that item $i$ can appear in one of the two examinees' tests ($j = 1$) and $\binom{m-1}{2}\binom{p-m}{2-2}$ ways that item $i$ can appear in both examinees' tests ($j = 2$). Since item $i$
should be counted as an overlapping item regardless of whether it is administered to one
or both of the two examinees, the number of times that item $i$ is administered to
examinee $k$ and any other two examinees would be $\binom{m-1}{1}\binom{p-m}{2-1} + \binom{m-1}{2}\binom{p-m}{2-2}$. The same
counting rule should be applied for the other examinees who have taken item $i$, and the
number of times that item $i$ is administered to an examinee and any other two
examinees would be $m\left[\binom{m-1}{1}\binom{p-m}{2-1} + \binom{m-1}{2}\binom{p-m}{2-2}\right]$ or $\sum_{j=1}^{2} m\binom{m-1}{j}\binom{p-m}{2-j}$. A similar
counting rule should be applied for other items in the pool, hence the summation over all $n$ items in the pool. The denominator is used to transform the total number into a
proportion.
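The two forms of equation (6) can be checked against each other numerically. The following sketch uses hypothetical function names and the item usage of Table 2 in Section 3. The closed form is written with the ratio $\binom{p-m_i}{a}/\binom{p-1}{a}$, which equals the ratio of falling products shown in equation (6).

```python
from math import comb


def unordered_pooling_counting(usage, p, h, a):
    """First form of equation (6): double sum over items and j = 1..a."""
    total = sum(
        m * comb(m - 1, j) * comb(p - m, a - j)
        for m in usage
        if m > 0  # unused items contribute nothing
        for j in range(1, a + 1)
    )
    return total / (h * p * comb(p - 1, a))


def unordered_pooling_closed(usage, p, h, a):
    """Second form of equation (6), using C(p - m_i, a) / C(p - 1, a)."""
    penalty = sum((m / p) * comb(p - m, a) for m in usage)
    return 1 - penalty / (h * comb(p - 1, a))


usage = [3, 5, 2, 3, 2, 2, 4, 3, 1, 0]  # m_i from Table 2, p = 5, h = 5
for a in (1, 2):
    print(a,
          unordered_pooling_counting(usage, p=5, h=5, a=a),
          unordered_pooling_closed(usage, p=5, h=5, a=a))
```

The closed form needs a single pass over the item usage counts, whereas the counting form makes the hypergeometric argument in the text explicit.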
Equation (6) is especially useful for finding the unordered item pooling rate when
the number of CATs administered is large. Indeed, for large p, equation (6) can be further
simplified to
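$$V_a = 1 - \frac{1}{h}\sum_{i=1}^{n}\frac{r_i\,[p(1-r_i)][p(1-r_i)-1]\cdots[p(1-r_i)-(a-1)]}{(p-1)(p-2)\cdots(p-a)} = 1 - \frac{1}{h}\sum_{i=1}^{n}\frac{r_i\left[p^{a}(1-r_i)^{a}+\cdots\right]}{p^{a}+\cdots} \xrightarrow{\;p\to\infty\;} 1 - \frac{\sum_{i=1}^{n} r_i(1-r_i)^{a}}{h}. \tag{7}$$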
Based on equation (7), the proportion of overlapping items between an examinee and a
group of $a$ examinees can be obtained easily from item exposure rates when $p$ is large. According to the definition of the item sharing and unordered item pooling, $V_1$ is the
same as $W_2$ (i.e. average test overlap rate between a pair of examinees). Their large-
sample approximation counterparts are also equal. That is, substituting $a = 1$ in
equation (7) simplifies to equation (5) as follows:
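$$V_1 \xrightarrow{\;p\to\infty\;} 1 - \frac{\sum_{i=1}^{n} r_i(1-r_i)^{1}}{h} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} r_i^{2}}{h} = 1 - \frac{h - \sum_{i=1}^{n} r_i^{2}}{h} = \frac{\sum_{i=1}^{n} r_i^{2}}{h}.$$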
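The large-sample forms are simple enough to explore numerically. The sketch below is hypothetical in its naming and uses the exposure rates implied by Table 2 of Section 3 purely as an illustrative rate vector whose entries sum to the test length. It checks that the a = 1 case of equation (7) coincides with equation (5) and shows how the approximate pooling rate grows with the group size.

```python
def unordered_pooling_large_p(rates, h, a):
    """Equation (7): 1 - sum_i r_i (1 - r_i)^a / h."""
    return 1 - sum(r * (1 - r) ** a for r in rates) / h


def pairwise_overlap_large_p(rates, h):
    """Equation (5): sum_i r_i^2 / h."""
    return sum(r * r for r in rates) / h


# Illustrative rates (Table 2 usage divided by p = 5); they sum to h = 5.
rates = [0.6, 1.0, 0.4, 0.6, 0.4, 0.4, 0.8, 0.6, 0.2, 0.0]
h = 5

v_by_group_size = [unordered_pooling_large_p(rates, h, a) for a in range(1, 6)]
t_large = pairwise_overlap_large_p(rates, h)
print(v_by_group_size, t_large)
```

The equality for a = 1 relies on the rates summing to h, which holds for any fixed-length test.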
Like the large-sample approximation of item sharing, the simplified relationship
between unordered item pooling and item exposure rates in the large-sample
approximation as shown in equation (7) provides the necessary information for future development of exposure control procedures to simultaneously control item exposure
and unordered item pooling. Because CATs are often administered at different times,
examinees are likely to obtain test information from those who have taken the test but
not from those who are taking the test at the same time or those who will take the test
later. It is logical to tabulate item pooling for an examinee with a group of a examinees
who have already taken the test instead of tabulating all possible combinations
disregarding the order of CAT administrations. Hence, the relationship between item
exposure and test overlap with respect to item pooling while taking into account the order of CATs is explored below.
2.3. Ordered item pooling
Ordered item pooling refers to the number of common items between an examinee and a group of $a$ examinees who have previously taken the test. That is, the number of items
in $A_j \cap (\bigcup_{i=1}^{a} A_i)$ for $j > i$ rather than that in $A_j \cap (\bigcup_{i=1}^{a} A_i)$ for $j \neq i$. Controlling ordered
item pooling may meet practical needs better than controlling its unordered
counterpart.
Applying counting rules as before, the average proportion of common items
between an examinee and a group of a examinees who have previously taken the test
can be expressed as a function of item usage as
$$V_a = \frac{Z_a}{h} = \frac{\sum_{t=a+1}^{p}\sum_{i=1}^{n}\sum_{j=1}^{a}\,[m_{it}-m_{i(t-1)}]\binom{m_{it}-1}{j}\binom{t-m_{it}}{a-j}}{h\binom{p}{a+1}}, \tag{8}$$
where Za is the average number of overlapping items encountered by an examinee with
a examinees who have previously taken the test, h is test length, n is the number of
items in the pool, $p$ is the number of CATs administered, and $m_{it}$ is the number of times
that item $i$ has been administered to the first $t$ examinees. Similar to equation (6), the
numerator, $\sum_{t=a+1}^{p}\sum_{i=1}^{n}\sum_{j=1}^{a}[m_{it}-m_{i(t-1)}]\binom{m_{it}-1}{j}\binom{t-m_{it}}{a-j}$, represents the total number of times that the same item is administered to an examinee and a group of $a$ examinees
who have previously taken the test across $\binom{p}{a+1}$ possible comparisons. Since the order of
exams is taken into account, instead of using $m$ as a multiplier as in the unordered item
pooling case, the number of overlapping items is calculated for each examinee after $a$
examinees have taken the test. In addition, the number of overlapping items is affected
only when item $i$ is administered to examinee $t$. When item $i$ is not administered to
examinee $t$, the number of overlapping items remains the same because
$m_{it} - m_{i(t-1)} = 0$. With regard to the number of all possible comparisons, instead of considering $p\binom{p-1}{a}$ overlap comparisons as in the unordered item pooling case, only
$\binom{p}{a+1}$ overlap comparisons need to be considered for the ordered case; $\binom{p}{a+1}$ is the sum over comparisons for each examinee after $a$ examinees have taken the test, that is,
$\binom{p}{a+1} = \binom{a}{a} + \binom{a+1}{a} + \binom{a+2}{a} + \cdots + \binom{p-1}{a}$ (see Appendix B for details of the derivation).
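Equation (8) only needs the running item usage after each test. The sketch below is an illustration with hypothetical function names; the five tests are those of Table 1 in Section 3, listed in administration order. It builds the cumulative usage counts, evaluates equation (8), and checks the identity for the number of comparisons.

```python
from math import comb

# Tests in administration order (Table 1); items are numbered 1..10.
TESTS = [
    [2, 5, 4, 7, 8],
    [1, 3, 4, 6, 2],
    [7, 9, 2, 5, 8],
    [8, 7, 1, 3, 2],
    [2, 1, 6, 7, 4],
]


def cumulative_usage(tests, n_items):
    """Row t - 1 holds m_it: usage of each item among the first t tests."""
    rows, running = [], [0] * n_items
    for items in tests:
        for i in items:
            running[i - 1] += 1
        rows.append(running.copy())
    return rows


def ordered_pooling_rate(m, h, a):
    """Equation (8), with m[t - 1][i] holding m_it."""
    p = len(m)
    total = 0
    for t in range(a + 1, p + 1):
        for i, m_it in enumerate(m[t - 1]):
            if m_it - m[t - 2][i]:  # item i was given to examinee t
                total += sum(
                    comb(m_it - 1, j) * comb(t - m_it, a - j)
                    for j in range(1, a + 1)
                )
    return total / (h * comb(p, a + 1))


usage_rows = cumulative_usage(TESTS, n_items=10)
print(ordered_pooling_rate(usage_rows, h=5, a=1))
print(ordered_pooling_rate(usage_rows, h=5, a=2))

# Number of ordered comparisons: C(p, a + 1) = C(a, a) + ... + C(p - 1, a).
p, a = 5, 2
n_comparisons = sum(comb(k, a) for k in range(a, p))
```

Because only the rows for examinees t - 1 and t are needed at each step, the same computation can be carried out while tests are being administered.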
Recent research on exposure control procedures shows interest in controlling item
exposure on the fly instead of relying on large tedious simulations conducted prior to
operational CATs to derive stable item exposure parameters (e.g. Chen, Lei, & Liao, 2008).
It is useful to express the ordered item pooling rate for the first $t$ examinees ($V_{a,t}$) as a function of the ordered item pooling rate for the previous $(t-1)$ examinees ($V_{a,(t-1)}$). A simplified expression (derived in Appendix C) is
$$V_{a,t} = \frac{t-a-1}{t}\,V_{a,(t-1)} + \frac{a+1}{t} - \frac{\sum_{i=1}^{n}[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a}}{h\binom{t}{a+1}}. \tag{9}$$
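The recursion in equation (9) can be sketched as a one-step update. The code below is illustrative, with hypothetical names; it uses the tests of Table 1 and the cumulative usage of Table 3 from Section 3. It relies on the observation that for t = a + 1 the coefficient of the previous rate is zero, so the recursion can be started from any value (0.0 here).

```python
from math import comb

TESTS = [
    [2, 5, 4, 7, 8],
    [1, 3, 4, 6, 2],
    [7, 9, 2, 5, 8],
    [8, 7, 1, 3, 2],
    [2, 1, 6, 7, 4],
]
# m_it for t = 1..5 (rows) and items 1..10 (columns), as in Table 3.
USAGE_ROWS = [
    [0, 1, 0, 1, 1, 0, 1, 1, 0, 0],
    [1, 2, 1, 2, 1, 1, 1, 1, 0, 0],
    [1, 3, 1, 2, 2, 1, 2, 2, 1, 0],
    [2, 4, 2, 2, 2, 1, 3, 3, 1, 0],
    [3, 5, 2, 3, 2, 2, 4, 3, 1, 0],
]


def update_ordered_pooling(prev_rate, t, a, h, usage_after_t, items_of_t):
    """Equation (9): V_{a,t} from V_{a,t-1} after examinee t is tested."""
    penalty = sum(comb(t - usage_after_t[i - 1], a) for i in items_of_t)
    return (
        (t - a - 1) / t * prev_rate
        + (a + 1) / t
        - penalty / (h * comb(t, a + 1))
    )


def ordered_pooling_path(tests, usage_rows, a, h):
    """V_{a,t} for t = a + 1, ..., p, obtained by repeated updating."""
    rate, path = 0.0, []
    for t in range(a + 1, len(tests) + 1):
        rate = update_ordered_pooling(
            rate, t, a, h, usage_rows[t - 1], tests[t - 1]
        )
        path.append(rate)
    return path


path = ordered_pooling_path(TESTS, USAGE_ROWS, a=2, h=5)
print(path)
```

Each update touches only the h items just administered, so the cost per examinee does not depend on how many tests have already been given.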
This relationship could be used to develop exposure control procedures to control
ordered item pooling on the fly. To be more specific, the numerator term $\lambda = \sum_{i=1}^{n}[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a}$ is critical in controlling $V_{a,t}$ because $a$, $t$, $h$, and $V_{a,(t-1)}$ are known constants after $(t-1)$ examinees have taken tests. To have $V_{a,t}$ less than a pre-
specified test overlap rate $V_{\max}$, the relationship between $\lambda$ and $V_{\max}$ can be expressed
as follows:
$$V_{a,t} = \frac{t-a-1}{t}\,V_{a,(t-1)} + \frac{a+1}{t} - \frac{\sum_{i=1}^{n}[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a}}{h\binom{t}{a+1}} \le V_{\max}, \tag{10}$$
$$\lambda = \sum_{i=1}^{n}[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a} \ge h\binom{t}{a+1}\left[\frac{t-a-1}{t}\,V_{a,(t-1)} + \frac{a+1}{t} - V_{\max}\right] = \delta. \tag{11}$$
Even though $\lambda$ is the sum of $[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a}$ over items in the pool, only $h$ items
(where $h$ is test length) would affect $\lambda$ because $m_{it} - m_{i(t-1)} \neq 0$ for these $h$ items but
$m_{it} - m_{i(t-1)} = 0$ for the rest of the items in the pool. To have $\lambda \ge \delta$, each item administered should contribute at least $\delta/h$ to $\lambda$, where the contribution of each item is
defined as $[m_{it}-m_{i(t-1)}]\binom{t-m_{it}}{a}$. In other words, to have $V_{a,t}$ less than $V_{\max}$, the value of
$\delta/h$ should serve as an item selection criterion to determine if a selected item can be
administered to the $t$th examinee or not. A study is currently under way to investigate
the effects of the proposed procedure on controlling ordered item pooling for a group of
examinees larger than two.
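The bound on the right-hand side of equation (11) and the sum on its left can be computed as follows. This is a sketch under stated assumptions, not the procedure under study: the function names are hypothetical, the target rates 0.85 and 0.80 are made-up values, and the data are those of the fifth examinee in Tables 1 and 3 of Section 3 with a = 2.

```python
from math import comb


def overlap_bound(prev_rate, t, a, h, v_max):
    """Right-hand side of equation (11): required minimum of the sum."""
    return h * comb(t, a + 1) * (
        (t - a - 1) / t * prev_rate + (a + 1) / t - v_max
    )


def overlap_sum(t, a, usage_after_t, items_of_t):
    """Left-hand side of equation (11) for the items given to examinee t."""
    return sum(comb(t - usage_after_t[i - 1], a) for i in items_of_t)


# Examinee t = 5 from Table 1, usage m_i5 from Table 3, V_{2,4} = 0.85.
items_of_5 = [2, 1, 6, 7, 4]
usage_after_5 = [3, 5, 2, 3, 2, 2, 4, 3, 1, 0]
achieved = overlap_sum(5, 2, usage_after_5, items_of_5)

# Hypothetical targets: 0.85 is met by this test, 0.80 is not.
bound_loose = overlap_bound(0.85, t=5, a=2, h=5, v_max=0.85)
bound_tight = overlap_bound(0.85, t=5, a=2, h=5, v_max=0.80)
print(achieved, bound_loose, bound_tight)
```

Here the sum is 5, which is consistent with the resulting rate of 0.84: it satisfies the looser target and violates the tighter one. Note that a frequently used item contributes little or nothing to the sum, so a per-item threshold steers selection toward items that previous examinees have seen less often.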
3. Verification using a numerical example
A simple example is now provided to verify the equations derived above.
Suppose that an item pool consists of n ¼ 10 items, from which p ¼ 5 fixed-length CATs
are administered, each CAT consisting of h ¼ 5 items. Table 1 shows the items that are
administered in each of the five CATs, and Table 2 shows the number of times each item
is used (mi).
Table 1. Items administered in each of five CATs
CAT Items administered
A1 2 5 4 7 8
A2 1 3 4 6 2
A3 7 9 2 5 8
A4 8 7 1 3 2
A5 2 1 6 7 4
3.1. Item sharing
Item sharing ($W_a$) is defined as the proportion of items shared by $a$ exams, averaged over all possible combinations of $a$ out of the $p$ CATs administered. That is, $W_a = (1/h) \times (1/\binom{p}{a}) \times$ (total number of common items among $a$ exams over all possible
combinations of $a$ out of $p$ CATs). For $a = 2$ out of $p = 5$ CATs,
$$\begin{aligned}
W_2 &= \frac{1}{5}\times\frac{1}{\binom{5}{2}}\,(A_1A_2 + A_1A_3 + A_1A_4 + A_1A_5 + A_2A_3 + A_2A_4 + A_2A_5 + A_3A_4 + A_3A_5 + A_4A_5)\\
&= \frac{1}{5}\times\frac{1}{10}\,(2+4+3+3+1+3+4+3+2+3) = \frac{1}{50}\times 28 = 0.56.
\end{aligned}$$
Note that the notation $A_iA_j$ above indicates the number of common items in set $A_i$ and set $A_j$. For example, there are two common items between examinees 1 and 2, items 2
and 4 (see Table 1). Because there are $\binom{p}{a} = \binom{5}{2} = 10$ possible pairs for five CATs, there are
10 $A_iA_j$ terms. The average number of common items between pairs of examinees is
divided by test length ($h = 5$) so that $W_2$ represents the average proportion of common
items shared by two examinees. Similarly, for $a = 3$,
$$\begin{aligned}
W_3 &= \frac{1}{5}\times\frac{1}{\binom{5}{3}}\,(A_1A_2A_3 + A_1A_2A_4 + A_1A_2A_5 + A_1A_3A_4 + A_1A_3A_5 + A_1A_4A_5 + A_2A_3A_4\\
&\qquad + A_2A_3A_5 + A_2A_4A_5 + A_3A_4A_5)\\
&= \frac{1}{5}\times\frac{1}{10}\,(1+1+2+3+2+2+1+1+2+2) = \frac{1}{50}\times 17 = 0.34.
\end{aligned}$$
Substituting the item usage information from Table 2 into equation (3), the item sharing
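rate for $a = 2$ and 3, respectively, is
$$W_2 = \frac{\binom{3}{2}+\binom{5}{2}+\binom{2}{2}+\binom{3}{2}+\binom{2}{2}+\binom{2}{2}+\binom{4}{2}+\binom{3}{2}+\binom{1}{2}+\binom{0}{2}}{5\times\binom{5}{2}} = \frac{3+10+1+3+1+1+6+3+0+0}{5\times 10} = \frac{28}{50} = 0.56,$$
$$W_3 = \frac{\binom{3}{3}+\binom{5}{3}+\binom{2}{3}+\binom{3}{3}+\binom{2}{3}+\binom{2}{3}+\binom{4}{3}+\binom{3}{3}+\binom{1}{3}+\binom{0}{3}}{5\times\binom{5}{3}} = \frac{1+10+0+1+0+0+4+1+0+0}{5\times 10} = \frac{17}{50} = 0.34.$$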
Based on the results above, it is clear that the item sharing rate found by using
equation (3) is identical to that based on the definition for each a value.
Table 2. Item usage corresponding to items administered in Table 1
Item (i ) 1 2 3 4 5 6 7 8 9 10
Number of times item i is used (mi) 3 5 2 3 2 2 4 3 1 0
3.2. Unordered item pooling
Similarly, the correctness of equation (6) can be verified against the corresponding
tabulation based on the definition of unordered item pooling. The unordered item
pooling rate is defined as the proportion of overlapping items between an examinee's
test and the combined set of items exposed to a group of $a$ examinees, averaged over all
$p\binom{p-1}{a}$ possible such combinations out of the total number of $p$ CATs administered. The
notation $A_1A_2\cdots A_i \| A_j$ below indicates the number of overlapping items between
examinee $j$'s test and the union set of items administered to a group of $a$ examinees
(examinees $1, 2, \ldots, i$, $i \neq j$). According to the definition of unordered item pooling,
the test overlap rate for $a = 1$ and 2, respectively, is
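$$\begin{aligned}
V_1 &= \frac{1}{5}\times\frac{1}{5\times\binom{4}{1}}\,(A_1\|A_2 + A_1\|A_3 + A_1\|A_4 + A_1\|A_5 + A_2\|A_1 + A_2\|A_3 + A_2\|A_4 + A_2\|A_5\\
&\qquad + A_3\|A_1 + A_3\|A_2 + A_3\|A_4 + A_3\|A_5 + A_4\|A_1 + A_4\|A_2 + A_4\|A_3 + A_4\|A_5 + A_5\|A_1\\
&\qquad + A_5\|A_2 + A_5\|A_3 + A_5\|A_4)\\
&= \frac{1}{5}\times\frac{1}{20}\,(2+4+3+3+2+1+3+4+4+1+3+2+3+3+3+3+3+4+2+3) = \frac{1}{100}\times 56 = 0.56,
\end{aligned}$$
$$\begin{aligned}
V_2 &= \frac{1}{5}\times\frac{1}{5\times\binom{4}{2}}\,(A_1A_2\|A_3 + A_1A_2\|A_4 + A_1A_2\|A_5 + A_1A_3\|A_2 + A_1A_3\|A_4 + A_1A_3\|A_5\\
&\qquad + A_1A_4\|A_2 + A_1A_4\|A_3 + A_1A_4\|A_5 + A_1A_5\|A_2 + A_1A_5\|A_3 + A_1A_5\|A_4 + A_2A_3\|A_1\\
&\qquad + A_2A_3\|A_4 + A_2A_3\|A_5 + A_2A_4\|A_1 + A_2A_4\|A_3 + A_2A_4\|A_5 + A_2A_5\|A_1 + A_2A_5\|A_3\\
&\qquad + A_2A_5\|A_4 + A_3A_4\|A_1 + A_3A_4\|A_2 + A_3A_4\|A_5 + A_3A_5\|A_1 + A_3A_5\|A_2 + A_3A_5\|A_4\\
&\qquad + A_4A_5\|A_1 + A_4A_5\|A_2 + A_4A_5\|A_3)\\
&= \frac{1}{5}\times\frac{1}{30}\,(4+5+5+2+3+3+4+4+4+4+4+4+5+5+5+4+3+5\\
&\qquad + 3+2+4+4+3+3+5+4+4+4+5+3) = \frac{1}{150}\times 117 = 0.78.
\end{aligned}$$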
The corresponding unordered item pooling rate based on equation (6) for $a = 1$ and 2, respectively, is
$$V_1 = \frac{6+20+2+6+2+2+12+6+0+0}{5\times 5\times 4} = \frac{56}{100} = 0.56,$$
$$V_2 = \frac{15+30+6+15+6+6+24+15+0+0}{5\times 5\times 6} = \frac{117}{150} = 0.78.$$
Again, the two ways of calculating unordered item pooling rates produce the same
results. Note that $V_1 = W_2 = 0.56$ in this simple example as discussed above.
3.3. Ordered item pooling
The ordered item pooling rate is defined as the proportion of common items between an
examinee and a group of $a$ examinees who have previously taken the test, averaged over
the $\binom{p}{a+1}$ possible combinations out of $p$ CATs. The notation $A_1A_2\cdots A_i \| A_j$ below
indicates the number of overlapping items between examinee $j$'s test and the union
set of items administered to a group of $a$ examinees before examinee $j$ (examinees
$1, 2, \ldots, i$, $i < j$). According to the definition of ordered item pooling, the test overlap
rate for $a = 1$ and 2, respectively, is
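$$\begin{aligned}
V_1 &= \frac{1}{5}\times\frac{1}{\binom{5}{2}}\,(A_1\|A_2 + A_1\|A_3 + A_1\|A_4 + A_1\|A_5 + A_2\|A_3 + A_2\|A_4 + A_2\|A_5\\
&\qquad + A_3\|A_4 + A_3\|A_5 + A_4\|A_5)\\
&= \frac{1}{5}\times\frac{1}{10}\,(2+4+3+3+1+3+4+3+2+3) = \frac{1}{50}\times 28 = 0.56,
\end{aligned}$$
$$\begin{aligned}
V_2 &= \frac{1}{5}\times\frac{1}{\binom{5}{3}}\,(A_1A_2\|A_3 + A_1A_2\|A_4 + A_1A_2\|A_5 + A_1A_3\|A_4 + A_1A_3\|A_5\\
&\qquad + A_1A_4\|A_5 + A_2A_3\|A_4 + A_2A_3\|A_5 + A_2A_4\|A_5 + A_3A_4\|A_5)\\
&= \frac{1}{5}\times\frac{1}{10}\,(4+5+5+3+3+4+5+5+5+3) = \frac{1}{50}\times 42 = 0.84.
\end{aligned}$$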
To tabulate the ordered item pooling rate using equation (8), item usage needs to be
updated as each CAT is administered, which is done routinely in on-the-fly exposure
control procedures. Corresponding to the data in Table 1, Table 3 details the number of
times that each item is administered for the first $t$ examinees ($m_{it}$), $t = 1, 2, \ldots, 5$. Substituting the data of Table 3 into equation (8), the ordered item pooling rate for $a = 1$
and 2, respectively, is
$$V_1 = \frac{(2) + (2+3) + (2+3+4) + (2+4+2+1+3)}{5\times 10} = \frac{28}{50} = 0.56,$$
$$V_2 = \frac{(1+3) + (4+3+6) + (10+6+3+6)}{5\times 10} = \frac{42}{50} = 0.84.$$
As expected, the two ways of tabulating the ordered item pooling rate provide identical
results. Ordered item pooling can also be calculated from the rate obtained for the
previous examinees using equation (9). Taking $a = 2$ and $t = 3, 4, 5$, for example, and
using the data of Table 3,
Table 3. Item usage corresponding to items administered in Table 1 for the first t examinee(s)
Item (i )
t 1 2 3 4 5 6 7 8 9 10
1 0 1 0 1 1 0 1 1 0 0
2 1 2 1 2 1 1 1 1 0 0
3 1 3 1 2 2 1 2 2 1 0
4 2 4 2 2 2 1 3 3 1 0
5 3 5 2 3 2 2 4 3 1 0
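$$V_{2,3} = \frac{4}{5} = 0.8,$$
$$\begin{aligned}
V_{2,4} &= \frac{4-2-1}{4}\times 0.8 + \frac{2+1}{4} - \frac{\binom{4-2}{2}+\binom{4-4}{2}+\binom{4-2}{2}+\binom{4-3}{2}+\binom{4-3}{2}}{5\binom{4}{2+1}}\\
&= \frac{0.8}{4} + \frac{3}{4} - \frac{2}{20} = 0.85,
\end{aligned}$$
$$\begin{aligned}
V_{2,5} &= \frac{5-2-1}{5}\times 0.85 + \frac{2+1}{5} - \frac{\binom{5-3}{2}+\binom{5-5}{2}+\binom{5-3}{2}+\binom{5-2}{2}+\binom{5-4}{2}}{5\binom{5}{2+1}}\\
&= \frac{1.7}{5} + \frac{3}{5} - \frac{5}{50} = 0.84.
\end{aligned}$$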
The result of $V_{2,5}$ using equation (9) is identical to the result of $V_2$ using equation (8) as
expected because equations (8) and (9) are simply different ways of calculating ordered
item pooling for the same first five examinees.
4. Simulation study
The effectiveness of two currently available exposure control procedures, SH and SHT, and a no exposure control procedure was evaluated in light of these new definitions of
test overlap rates using simulation CAT studies. The purpose of the simulation study was
to demonstrate the level of test overlap in light of the new definitions using no exposure
control and some current exposure control methods to gauge the level of security
afforded by these methods under the more realistic new definition of test overlap.
Two exposure control methods that target maximum item exposure rate were selected
for this purpose and they were by no means exhaustive of exposure control methods.
The SH procedure is designed to control maximum item exposure rate and is commonly used in practice, while the SHT procedure controls test overlap between pairs of
examinees in addition to the maximum item exposure rate. The CAT specifications are
described below.
4.1. CAT specifications
All simulated data were generated by using the three-parameter logistic item response
model (3PLM); the maximization of item information was the criterion used for item
selection at each stage; expected a posteriori (EAP) estimation with a $\theta \sim N(0, 1)$ prior distribution was used for both provisional and final trait estimation and the initial trait
estimate was assumed to be zero. The adaptive tests were administered to 10,000
simulees drawn from a standard normal distribution N(0,1) on the theta metric. For the
item exposure control procedures, two levels of test security were considered. For a
high level of test security, $r_{\max} = 0.2$ was used for the SH procedure, and SHT was
implemented with $r_{\max} = 0.2$ and $T_{\max} = 0.1$. For a low level of test security, $r_{\max} = 0.3$
was selected for the SH procedure, and the SHT procedure was conducted with
$r_{\max} = 0.3$ and $T_{\max} = 0.2$.
A real item pool consisting of 400 mathematics items with item parameters
calibrated by using the 3PLM was employed. The content areas covered by the item pool
were: pre-algebra, 84 items (21%); elementary algebra, 60 items (15%); intermediate
algebra, 54 items (13.5%); coordinate geometry, 54 items (13.5%); plane geometry,
84 items (21%); and trigonometry, 64 items (16%).
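The 3PLM response function and the maximum-information selection rule mentioned in the specifications can be sketched as follows. This is a generic illustration, not the simulation code used in the study: the function names, the four-item pool, and the scaling constant 1.7 are assumptions, and item parameters are given as (discrimination, difficulty, guessing).

```python
import math


def prob_3pl(theta, disc, diff, guess, scale=1.7):
    """3PLM probability of a correct response (scale = 1.7 is an assumption)."""
    z = scale * disc * (theta - diff)
    return guess + (1.0 - guess) / (1.0 + math.exp(-z))


def info_3pl(theta, disc, diff, guess, scale=1.7):
    """Fisher information of a 3PLM item at trait level theta."""
    p = prob_3pl(theta, disc, diff, guess, scale)
    return (scale * disc) ** 2 * ((1.0 - p) / p) * ((p - guess) / (1.0 - guess)) ** 2


def most_informative(theta_hat, pool, used):
    """Index of the unused item with maximum information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: info_3pl(theta_hat, *pool[i]))


# Hypothetical item parameters: (discrimination, difficulty, guessing).
POOL = [(1.2, -1.0, 0.20), (0.8, 0.0, 0.25), (1.5, 0.1, 0.15), (1.0, 1.5, 0.20)]

first = most_informative(0.0, POOL, used=set())
second = most_informative(0.0, POOL, used={first})
print(first, second)
```

Without exposure control, every examinee whose provisional estimate is near the same value is routed to the same few highly informative items, which is the source of the high overlap rates examined in the results.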
Two fixed lengths were used: 20 and 40 items. For the 20-item test, the content areas
were distributed as follows: pre-algebra, four items (20%); elementary algebra, three
items (15%); intermediate algebra, three items (15%); coordinate geometry, three items
(15%); plane geometry, four items (20%); and trigonometry, three items (15%). Thus, the
distribution of content areas in the 20-item test was similar to that of the item pool. For
the 40-item test, the number of items in each content area was twice that of the 20-item test. A multinomial model proposed by Chen and
Ankenmann (2004) to randomize content area selection was implemented in this study to ensure content balance. In the
approach of the multinomial model, a cumulative distribution of content areas was
formed based on the target percentages. A random number drawn from $U(0, 1)$ was
then used to identify the corresponding content area in the cumulative distribution. As a
result, the target percentage in each content area can be met exactly. See Chen and
Ankenmann (2004) for more details of the multinomial model.
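The content-area draw described above can be sketched as follows. The cumulative distribution and the U(0, 1) draw follow the description in the text. How the exact quotas are enforced is not spelled out here, so the sketch assumes that an area is dropped once its quota is filled and that the remaining target percentages are renormalized. That rule, the function names, and the seed are assumptions made for illustration.

```python
import random
from bisect import bisect_right
from collections import Counter
from itertools import accumulate

TARGETS = {
    "pre-algebra": 0.20, "elementary algebra": 0.15,
    "intermediate algebra": 0.15, "coordinate geometry": 0.15,
    "plane geometry": 0.20, "trigonometry": 0.15,
}
QUOTAS_20 = {
    "pre-algebra": 4, "elementary algebra": 3,
    "intermediate algebra": 3, "coordinate geometry": 3,
    "plane geometry": 4, "trigonometry": 3,
}


def draw_area(targets, open_areas, rng):
    """Locate a U(0, 1) draw in the cumulative distribution of open areas."""
    total = sum(targets[area] for area in open_areas)
    cumulative = list(accumulate(targets[area] / total for area in open_areas))
    u = rng.random()
    # min() guards against rounding leaving the last bound just below 1.
    return open_areas[min(bisect_right(cumulative, u), len(open_areas) - 1)]


def content_sequence(targets, quotas, rng):
    """Random order of content areas that fills every quota exactly."""
    remaining = dict(quotas)
    sequence = []
    while sum(remaining.values()) > 0:
        # Assumption: areas with a filled quota are removed from the draw.
        open_areas = [area for area in targets if remaining[area] > 0]
        area = draw_area(targets, open_areas, rng)
        remaining[area] -= 1
        sequence.append(area)
    return sequence


sequence = content_sequence(TARGETS, QUOTAS_20, random.Random(0))
print(Counter(sequence))
```

Under this assumption the order in which content areas appear varies from test to test while the number of items per area always matches the target.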
4.2. Results
Empirical test overlap rates for item sharing, unordered item pooling, and ordered
item pooling were compared to the corresponding theoretical lower bounds of Chang
and Zhang (2002), where randomized item selection is considered, to gauge the level
of test overlap achieved by these exposure control procedures. Table 4 reports the
empirical test overlap rates for the CAT simulations using no exposure control (No),SH, and SHT, as well as the corresponding theoretical lower bound of the test overlap
rates for a values from 1 to 5. Note that the classical definition of test overlap
was reported as a ¼ 2 for item sharing and as a ¼ 1 for both ordered and unordered
item pooling.
It is clear from Table 4 that test overlap rates for item sharing decreased as the value of a
increased. However, item sharing as defined here may not be a realistic practice. In
contrast, test overlap rates for item pooling, ordered or unordered, increased as the value
of a increased. This is reasonable because test security is more likely to be
compromised when more individuals are revealing test information. The test overlap rates
for ordered and unordered item pooling were very similar (identical to the
second decimal place) for this large number of CATs (10,000).
Even though maximum item exposure rates could be controlled by the SH
procedure, the item pooling test overlap rates for the SH procedure with rmax = .3
(SH_L) were close to those observed with no exposure control, especially for large values
of a. Taking the 20-item test and a = 5, for example, the item pooling test overlap rate
was .70 for SH_L and .78 when item exposure was not controlled. In other words, the
item pooling test overlap could not be controlled by the SH_L procedure. For the
20-item test, there were 14 items in common on average between an individual and five
other test takers when the SH procedure was implemented with rmax = .3.
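
The figure of 14 items is the a = 5 item pooling rate multiplied by the test length. The short sketch below repeats that arithmetic for every row of the 20-item block of Table 4; the function name `expected_overlap_items` is introduced here for illustration only.

```python
# Unordered item pooling rates for a = 5 on the 20-item test (Table 4).
POOLING_RATES_A5 = {
    "No": 0.778,
    "SH_L": 0.705,
    "SHT_L": 0.637,
    "SH_H": 0.596,
    "SHT_H": 0.416,
    "LB": 0.226,
}
TEST_LENGTH = 20


def expected_overlap_items(rate, test_length):
    """Average number of an examinee's items that the group has also seen."""
    return rate * test_length


for procedure, rate in POOLING_RATES_A5.items():
    print(f"{procedure:6s} {expected_overlap_items(rate, TEST_LENGTH):5.1f}")
```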
The high item pooling test overlap observed in the SH_L procedure could be reduced by
implementing the SHT procedure with rmax = .3 and Tmax = .2 (SHT_L). It is clear that test
overlap between pairs of examinees (a = 1) was well controlled by the SHT_L procedure
regardless of test length. However, item pooling test overlap rates for the other values of a
were too high to be acceptable in practice. Even though the high item pooling test overlap
observed from both the SH_L and SHT_L procedures could be reduced by increasing the
stringency of exposure control (i.e. smaller values of rmax and Tmax; see results for SH_H and
SHT_H), item pooling test overlap rates were still far above their corresponding theoretical
lower bounds, especially for tests with short test length.
Table 4. Empirical test overlap rates for CAT simulations and theoretical lower bounds
                         Item sharing             Unordered item pooling         Ordered item pooling
Item exposure control    a=2   a=3   a=4   a=5    a=1   a=2   a=3   a=4   a=5    a=1   a=2   a=3   a=4   a=5
20-item test
  No                    .352  .162  .084  .047   .352  .543  .656  .728  .778   .352  .542  .655  .727  .777
  SH_L                  .238  .064  .018  .005   .238  .412  .540  .635  .705   .238  .412  .539  .634  .704
  SHT_L                 .202  .048  .012  .003   .202  .356  .474  .566  .637   .202  .356  .473  .564  .636
  SH_H                  .173  .033  .006  .001   .173  .313  .427  .520  .596   .173  .314  .428  .521  .597
  SHT_H                 .105  .012  .002  .000   .105  .197  .279  .352  .416   .105  .197  .279  .351  .416
  LB                    .050  .003  .000  .000   .050  .098  .143  .185  .226
40-item test
  No                    .390  .193  .107  .063   .390  .586  .698  .766  .812   .390  .586  .697  .766  .812
  SH_L                  .250  .068  .019  .006   .250  .431  .563  .659  .731   .250  .431  .563  .660  .732
  SHT_L                 .204  .045  .010  .002   .204  .363  .487  .585  .662   .204  .363  .488  .586  .662
  SH_H                  .182  .035  .007  .001   .182  .329  .448  .544  .622   .182  .329  .448  .544  .622
  SHT_H                 .111  .013  .002  .000   .111  .209  .296  .373  .441   .111  .209  .296  .373  .441
  LB                    .100  .010  .001  .000   .100  .190  .271  .344  .410
Note. Test overlap rates for item sharing and unordered item pooling were calculated using the large-sample approximations. The classical definition of test overlap was reported as a = 2 for item sharing and as a = 1 for both ordered and unordered item pooling. No, no item exposure control; SH_L, SH procedure with less stringent exposure control, rmax = .3; SHT_L, SHT procedure with less stringent exposure control, rmax = .3, Tmax = .2; SH_H, SH procedure with stringent exposure control, rmax = .2; SHT_H, SHT procedure with stringent exposure control, rmax = .2, Tmax = .1; LB, theoretical lower bounds based on the derivation of Chang and Zhang (2002) under random item selection.
Figures 1 and 2 present the unordered item pooling rates for the four item selection
procedures for the 20-item test and 40-item test, respectively, where rmax = .2 was used
for the SH procedure, and SHT was implemented with rmax = .2 and Tmax = .1. As
expected, the selection procedure without exposure control (i.e. based on the
maximum item information criterion only) produced the highest unordered item
pooling rates. The SH selection procedure provided much lower test overlap rates (by
about .20) than the procedure without exposure control but not as low as those
provided by the SHT procedure with test level exposure control. The difference
between the SH and the SHT procedures in unordered item pooling rates appeared to
widen gradually as the value of a increased. This seems to support the importance of
controlling test overlap in conjunction with item exposure. Compared to the
randomized selection procedure, the SHT procedure appeared to have some room for
improvement, especially when the test was short (20 items). When the test was long (40
items), however, the unordered item pooling rates for the SHT procedure were just
slightly higher than those for the randomized selection procedure (by up to .031).
Figure 1. Unordered item pooling rates for 20-item test (rmax = .2, Tmax = .1).
Figure 2. Unordered item pooling rates for 40-item test (rmax = .2, Tmax = .1).
Because the large-sample approximation for the unordered item pooling rate
(equation (7)) is much simpler computationally than the exact rate given by equation
(6), it would be interesting to find out how large a value of p (number of examinees) is
required for the approximation to function satisfactorily. Figure 3 shows the unordered
item pooling rates calculated by the exact and approximation formulas for the first 1,000
examinees using a = 5, h = 20, and the SHT selection procedure with rmax = .2,
Tmax = .1. (We plotted data for the first 1,000 examinees because we expected the
approximation to work well within a sample size of 1,000. Note that the exposure times
of each item would change when the number of examinees changed in the tabulation of
test overlap rates.) The approximated rates were higher than the exact rates for the first
200 or so examinees, and the difference gradually disappeared as the number of
examinees increased.
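
The direction and shrinkage of this gap can be illustrated with a small sketch. It does not use the SHT simulation data; instead it assumes that every item has the same exposure count (as under randomized selection from the 400-item pool). The exact rate follows the combinatorial form derived in Appendix A, and the approximation uses the form implied by the worked example in this section, 1 - sum_i r_i(1 - r_i)^a / h. The function names are illustrative.

```python
from math import comb


def exact_unordered_pooling(counts, p, a, h):
    """Exact rate: sum_i m_i/(h*p) * (1 - C(p - m_i, a) / C(p - 1, a))."""
    total = comb(p - 1, a)
    return sum(m / (h * p) * (1 - comb(p - m, a) / total) for m in counts)


def approx_unordered_pooling(counts, p, a, h):
    """Large-sample rate: 1 - sum_i r_i * (1 - r_i)**a / h, with r_i = m_i / p."""
    return 1 - sum((m / p) * (1 - m / p) ** a for m in counts) / h


n_items, h, a = 400, 20, 5
for p in (100, 200, 1000, 10000):
    # Assumption: uniform exposure, so each item is seen by p * h / n examinees.
    counts = [p * h // n_items] * n_items
    exact = exact_unordered_pooling(counts, p, a, h)
    approx = approx_unordered_pooling(counts, p, a, h)
    print(p, round(exact, 4), round(approx, 4), round(approx - exact, 4))
```

Under this uniform-exposure assumption the approximation lies above the exact rate and the difference shrinks as p grows, which mirrors the pattern reported for Figure 3.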
Figure 3. Exact and approximated unordered item pooling rates for a = 5, test length = 20, and
with SHT exposure control using rmax = .2, Tmax = .1.
Due to the similarity between ordered and unordered item pooling test overlap rates
and the complexity of equations (8) and (9) for tabulating ordered item pooling, the
unordered rate can serve as a large-sample approximation for the ordered rate. Figure 4
displays the ordered and unordered item pooling rates for the first 1,000 examinees
using a = 5, h = 20, and the SHT selection procedure with rmax = .2, Tmax = .1. The
unordered rates might overestimate or underestimate the ordered counterpart
somewhat when the number of CATs administered was small (less than 250). The
ordered and unordered rates became virtually indistinguishable after 250 examinees.
In addition, the accuracy of the large-sample approximation for item sharing
(equation (4)) and unordered item pooling (equation (7)) rates can be verified by using
the results from the theoretical lower bounds. Taking a = 3 and test length (h) equal to
40, for example, under randomized item selection, the item exposure rate is equal to
40/400 = 0.1 for each item in the 400-item pool. By using equation (4), the large-sample
approximation for the item sharing rate is $\sum_{i=1}^{400} 0.1^{3}/40 = 0.01$, which is identical to the
test overlap rate observed in the last row of Table 4 (the item sharing lower bound for a = 3). The large-sample
approximation for the unordered item pooling rate using equation (7) is
$1 - \sum_{i=1}^{400} 0.1(0.9)^{3}/40 = 0.271$, which is identical to the corresponding test overlap
rate in Table 4 (the unordered item pooling lower bound for a = 3). While the lower bounds for test overlap rates are
applicable for randomized item selection only, equations (4) and (7) can be used to
obtain item sharing and unordered item pooling rates, respectively, for any item
selection procedures implemented in practice as long as item exposure rates (ri) are
known. The CAT simulations above demonstrated the use of two item selection
procedures with exposure control.
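
The same check can be run for every value of a and both test lengths. The sketch below assumes the two large-sample forms implied by the worked example above (item sharing as sum_i r_i^a / h and unordered item pooling as 1 - sum_i r_i(1 - r_i)^a / h); the function names are illustrative, and any vector of exposure rates could be passed in place of the uniform one.

```python
def item_sharing_rate(exposure_rates, a, h):
    """Large-sample item sharing rate: sum_i r_i**a / h."""
    return sum(r ** a for r in exposure_rates) / h


def unordered_pooling_rate(exposure_rates, a, h):
    """Large-sample unordered item pooling rate: 1 - sum_i r_i*(1 - r_i)**a / h."""
    return 1 - sum(r * (1 - r) ** a for r in exposure_rates) / h


POOL_SIZE = 400
for h in (20, 40):
    # Randomized selection: every item has exposure rate h / n.
    rates = [h / POOL_SIZE] * POOL_SIZE
    sharing = [item_sharing_rate(rates, a, h) for a in range(2, 6)]
    pooling = [unordered_pooling_rate(rates, a, h) for a in range(1, 6)]
    print(h, [f"{v:.4f}" for v in sharing], [f"{v:.4f}" for v in pooling])
```

The printed values can be compared with the LB rows of Table 4.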
Figure 4. Ordered and unordered item pooling rates for a = 5, test length = 20, and with SHT
exposure control using rmax = .2, Tmax = .1.
5. Conclusions
Controlling item exposure, test overlap, or both is a popular means to increase test security
in CATs. To date, exposure control procedures that are designed to control item exposure
and test overlap simultaneously are based on the assumption of item sharing between pairs
of examinees. However, examinees may obtain test information from more than one other
examinee in practice. This larger scope of information sharing needs to be taken into
account in estimating test security level and in refining exposure control procedures.
This paper examines three different definitions of test overlap rate to reflect
possible ways of sharing information among a group of examinees larger than two.
Item sharing should be considered if one believes that examinees would only take
advantage of information on items that have been administered to all examinees in a
group. Item pooling is probably more realistic in that examinees would take
advantage of information on any items that have been administered to a group.
Between the two variants of item pooling, the ordered version (taking into account
order of exams) is likely to be more reasonable than the unordered version
(disregarding order of exams) given the continuous nature of CATs. Note that the
mathematical definitions of test overlap do not specify what constitutes a group.
Researchers and practitioners can narrow the definition to any group of interest by
stipulating the attributes of the group.
Because procedures designed to control test overlap and item exposure
simultaneously would have item exposure rates or item usage information readily
available, it is useful to derive the relationship between item exposure rates and test
overlap rate. Following the approach of Chen et al. (2003), we document the
mathematical relationships between item exposure rates and each of the definitions of
test overlap rate for a group of examinees in this paper. Large-sample approximations are
also provided, when available, to simplify their relationships. Moreover, the test overlap
rate for ordered item pooling is expressed as a function of that calculated for previous
examinees to make the development of exposure control procedures on the fly (i.e.
relevant statistics are updated for each CAT) more convenient.
Empirical test overlap rates based on the new definitions for three item selection
procedures, maximum information without exposure control, SH, and SHT, are
examined along with the theoretical lower bounds. Results are as expected: the more
stringent the exposure control, the lower the overlap rates. Item pooling rates are
substantially higher than the theoretical lower bounds even for the most stringent
exposure control procedure examined here (SHT), especially when the test is short.
Moreover, test overlap rates for item pooling among a group of examinees larger than
two are considerably higher than those for item sharing between pairs of examinees (the
classical definition of test overlap). More stringent exposure control than that afforded
by the currently available procedures may be needed if test information pooling
generally happens in groups larger than two.
Furthermore, empirical evidence suggests that the large-sample approximation for
the unordered item pooling rates converges to the exact rate quite quickly (after
around 200 examinees). The unordered rate also converges quickly to the ordered
rate (after about 250 examinees). Even though the unordered rates may not be as
applicable as the ordered rates in practice, these two indices are indistinguishable
when the number of CATs is not too small. Given the simplicity of the large-sample
approximation for unordered rates provided in equation (7), the ordered rates could
be approximated efficiently by using the large-sample approximation of unordered
rates. The large-sample approximation of unordered rates is especially useful for
estimating the lower bound for the ordered rates, which cannot be theoretically
derived. That is, the lower bound for ordered rates can be estimated by applying
equation (7) with the item exposure rate of each item set equal to the ratio of test
length to pool size.
Empirical test overlap rates based on the new definitions are provided only for two
exposure control procedures that explicitly target maximum item exposure rate (i.e. SH
and SHT). Even though the two procedures can control item exposure (and test overlap)
well and are commonly used in practice, they cannot increase exposure rates for
underexposed items. Another popular exposure control procedure, the a-stratified
method proposed by Chang and Ying (1999), can effectively balance item usage. With
the a-stratified method, few items have exposure rates equal to
zero, and item usage is well balanced without sacrificing the efficiency in ability
estimation (Chang & Ying, 1999). However, the maximum of observed item exposure
rates may not be controlled by implementing the a-stratified method alone. It would be
interesting to compare the SH and SHT procedures to the a-stratified method on
empirical test overlap rates based on the new definitions.
As noted earlier, the new definitions of test overlap do not specify what constitutes a
group of examinees and the formulas derived in this paper can be applied to different
definitions of groups with different characteristics. In practice, test practitioners would
consider test overlap conditional on ability because item information is more likely to be
shared among individuals with similar ability. To find conditional test overlap rates, the
formulas presented in this paper can simply be applied to the group of examinees with
similar ability rather than the larger group containing all ability levels. Since examinees
with similar ability levels tend to take similar items in CATs, it could be expected that
conditional test overlap rates would be much higher than the unconditional test overlap
rates observed in this study. For exposure control conditional on ability levels,
conditional procedures that consider ability subgroups such as the Stocking and Lewis
(1998) conditional procedure can be used, and they are expected to provide lower
conditional test overlap rates. Future studies should compare the Stocking and Lewis
conditional procedure and the item exposure control procedures implemented in this
study on the level of test overlap conditional on ability.
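
As a rough sketch of how the formulas might be applied within ability subgroups, the example below groups examinees by ability estimate and evaluates the large-sample unordered item pooling form within each stratum. The records, the cut point, and the function names are hypothetical, and the strata are far too small for a large-sample approximation to be accurate; the point is only to show the mechanics.

```python
from collections import Counter, defaultdict


def pooling_rate_from_tests(tests, a):
    """Large-sample unordered pooling rate from a list of fixed-length tests."""
    p = len(tests)
    h = len(tests[0])
    counts = Counter(item for test in tests for item in test)
    # Items never administered have r_i = 0 and contribute nothing to the sum.
    return 1 - sum((m / p) * (1 - m / p) ** a for m in counts.values()) / h


def conditional_pooling_rates(records, a, cut_points):
    """Apply the same formula separately within each ability stratum."""
    groups = defaultdict(list)
    for theta, items in records:
        # Stratum index = number of cut points at or below the ability estimate.
        stratum = sum(theta >= cut for cut in cut_points)
        groups[stratum].append(items)
    return {s: pooling_rate_from_tests(tests, a) for s, tests in groups.items()}


# Hypothetical toy records: (ability estimate, set of items administered).
records = [(-1.2, {1, 2}), (-0.8, {1, 2}), (0.9, {3, 4}), (1.4, {3, 4})]
overall = pooling_rate_from_tests([items for _, items in records], a=1)
by_stratum = conditional_pooling_rates(records, a=1, cut_points=[0.0])
print(overall, by_stratum)
```

In this toy data set examinees of similar ability receive identical tests, so the conditional rates exceed the unconditional rate, which is the pattern anticipated in the paragraph above.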
In addition to conditioning on ability level, test overlap conditioning on testing time
could be considered because an individual would be more likely to get item information
from the most recent examinees than from examinees who took the test years ago. To
find the test overlap rate conditional on testing time, the formulas presented in this
paper can likewise be applied to the group of examinees who take the test in close
proximity.
In sum, this paper lays the groundwork for future developments of exposure control
procedures to simultaneously control item exposure and test overlap rates for more
realistic item information sharing situations, in which examinees may obtain test
information from a group of two or more examinees. We have illustrated the initial
development process of such a procedure for controlling ordered item pooling on the
fly based on equation (9). Similar development processes can be followed if one would
like to control for item sharing or unordered item pooling. Large-sample approximations
for both item sharing and unordered item pooling depend on a power function of item
exposure rates. One can start by expanding the power function of order a, which varies
by the number of examinees sharing or pooling information. However, developing the
actual exposure control algorithms is beyond the scope of the paper.
Future studies should naturally develop appropriate exposure control procedures
based on the relationships shown in this paper and thoroughly evaluate their effect on
both test security and score precision. In addition, how examinees share test
information is likely to depend on the stakes and nature of the test as well as
characteristics of test takers. It may be worthwhile to investigate the common group
sizes (i.e. determining a reasonable a value) for different tests and different populations
of test takers.
Acknowledgements
This study was supported by the National Science Council, Taiwan (NSC 97-2410-H-194-107-MY3).
References
Chang, H.-H., & Ying, Z. (1999). a-Stratified multistage computerized adaptive testing. Applied
Psychological Measurement, 23, 211–222.
Chang, H.-H., & Zhang, J. (2002). Hypergeometric family and item overlap rates in computerized
adaptive testing. Psychometrika, 67, 387–398.
Chen, S., & Ankenmann, R. D. (2004). Effects of practical constraints on item selection rules at the
early stages of computerized adaptive testing. Journal of Educational Measurement, 41,
149–174.
Chen, S., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and
test overlap in computerized adaptive testing. Journal of Educational Measurement, 40,
129–145.
Chen, S., & Lei, P. (2005). Controlling item exposure and test overlap in computerized adaptive
testing. Applied Psychological Measurement, 29, 204–217.
Chen, S., Lei, P., & Liao, W. (2008). Controlling item exposure and test overlap on the fly in
computerized adaptive testing. British Journal of Mathematical and Statistical Psychology,
61, 471–492.
Davey, T., & Parshall, C. G. (1995, April). New algorithms for item selection and exposure control
with computerized adaptive testing. Paper presented at the annual meeting of the American
Educational Research Association, San Francisco.
Revuelta, J., & Ponsoda, V. (1998). A comparison of item exposure control methods in
computerized adaptive testing. Journal of Educational Measurement, 35, 311–327.
Stocking, M. L., & Lewis, C. (1998). Controlling item exposure conditional on ability in
computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57–75.
Sympson, J. B., & Hetter, R. D. (1985, October). Controlling item-exposure rates in computerized
adaptive testing. Proceedings of the 27th Annual Meeting of the Military Testing Association
(pp. 973–977). San Diego, CA: Navy Personnel Research and Development Center.
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational
Measurement: Issues and Practice, 17, 17–27.
Received 7 February 2008; revised version received 3 February 2009
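Appendix A. Derivation of unordered item pooling as a function of item exposure rates
\[
\begin{aligned}
V_a &= \frac{\sum_{i=1}^{n}\sum_{j=1}^{a} m_i \binom{m_i-1}{j}\binom{p-m_i}{a-j}}{hp\binom{p-1}{a}}, \quad \text{with } \binom{m_i-1}{j}=0 \text{ if } m_i-1<j \\
&= \sum_{i=1}^{n}\frac{m_i}{hp}\left[\frac{\binom{m_i-1}{1}\binom{p-m_i}{a-1}+\binom{m_i-1}{2}\binom{p-m_i}{a-2}+\cdots+\binom{m_i-1}{a}\binom{p-m_i}{0}}{\binom{p-1}{a}}\right] \\
&= \sum_{i=1}^{n}\frac{m_i}{hp}\left[1-\frac{\binom{p-m_i}{a}}{\binom{p-1}{a}}\right] \qquad \because\ \frac{\sum_{j=0}^{a}\binom{x}{j}\binom{y}{a-j}}{\binom{x+y}{a}}=1.0 \\
&= \sum_{i=1}^{n}\frac{m_i}{hp}\left[1-\frac{(p-m_i)(p-m_i-1)\cdots(p-m_i-a+1)}{(p-1)(p-2)\cdots(p-1-a+1)}\right] \\
&= \sum_{i=1}^{n}\frac{m_i}{hp}-\sum_{i=1}^{n}\frac{m_i}{hp}\left[\frac{(p-m_i)(p-m_i-1)\cdots(p-m_i-a+1)}{(p-1)(p-2)\cdots(p-1-a+1)}\right] \\
&= 1-\sum_{i=1}^{n}\frac{m_i(p-m_i)(p-m_i-1)\cdots(p-m_i-a+1)}{hp(p-1)(p-2)\cdots(p-a)} \qquad \because\ \sum_{i=1}^{n}\frac{m_i}{hp}=\sum_{i=1}^{n}\frac{r_i}{h}=\frac{h}{h}=1 \\
&= 1-\sum_{i=1}^{n}\frac{(pr_i)(p-pr_i)(p-pr_i-1)\cdots(p-pr_i-a+1)}{hp(p-1)(p-2)\cdots(p-a)} \\
&= 1-\frac{1}{h}\sum_{i=1}^{n}\frac{(r_i)[p(1-r_i)][p(1-r_i)-1]\cdots[p(1-r_i)-(a-1)]}{(p-1)(p-2)\cdots(p-a)}.
\end{aligned}
\]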
Appendix B. Derivation of the number of all possible comparisons for ordered item pooling
\[
\binom{p}{a+1}=\binom{a}{a}+\binom{a+1}{a}+\binom{a+2}{a}+\cdots+\binom{p-1}{a}
\]
\[
\begin{aligned}
\binom{p}{a+1} &= \binom{p-1}{a}+\binom{p-1}{a+1} \qquad \because\ \binom{x}{y}=\binom{x-1}{y-1}+\binom{x-1}{y} \ \text{(Pascal's triangle)} \\
&= \binom{p-1}{a}+\binom{p-2}{a}+\binom{p-2}{a+1} \\
&= \binom{p-1}{a}+\binom{p-2}{a}+\binom{p-3}{a}+\binom{p-3}{a+1} \\
&\ \ \vdots \\
&= \binom{p-1}{a}+\binom{p-2}{a}+\binom{p-3}{a}+\binom{p-4}{a}+\cdots+\binom{a+2}{a}+\binom{a+2}{a+1} \\
&= \binom{p-1}{a}+\binom{p-2}{a}+\binom{p-3}{a}+\binom{p-4}{a}+\cdots+\binom{a+2}{a}+\binom{a+1}{a}+\binom{a+1}{a+1} \\
&= \binom{p-1}{a}+\binom{p-2}{a}+\binom{p-3}{a}+\binom{p-4}{a}+\cdots+\binom{a+2}{a}+\binom{a+1}{a}+\binom{a}{a}+\binom{a}{a+1} \\
&= \binom{p-1}{a}+\binom{p-2}{a}+\binom{p-3}{a}+\binom{p-4}{a}+\cdots+\binom{a+2}{a}+\binom{a+1}{a}+\binom{a}{a}+0
\end{aligned}
\]
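
The identity can also be checked numerically. The sketch below compares the direct count C(p, a + 1) with the sum built up one examinee at a time; the function names are illustrative.

```python
from math import comb


def comparisons_direct(p, a):
    """C(p, a + 1): ways to choose a + 1 of the p examinees."""
    return comb(p, a + 1)


def comparisons_by_examinee(p, a):
    """Sum over examinees t = a + 1, ..., p of the C(t - 1, a) groups of earlier examinees."""
    return sum(comb(t - 1, a) for t in range(a + 1, p + 1))


for p, a in [(10, 1), (10, 3), (1000, 5)]:
    print(p, a, comparisons_direct(p, a), comparisons_by_examinee(p, a))
```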
Appendix C. Derivation of ordered item pooling for the first t examinees
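\[
\begin{aligned}
V_{at} &= \frac{V_{a(t-1)}\,h\binom{t-1}{a+1}+\sum_{i=1}^{n}\sum_{j=1}^{a}\left[m_{it}-m_{i(t-1)}\right]\binom{m_{it}-1}{j}\binom{t-m_{it}}{a-j}}{h\binom{t}{a+1}} \\
&= \frac{V_{a(t-1)}\,h\binom{t-1}{a+1}+\sum_{i=1}^{n}\left[m_{it}-m_{i(t-1)}\right]\left[\binom{t-1}{a}-\binom{t-m_{it}}{a}\right]}{h\binom{t}{a+1}}, \qquad \because\ \binom{x+y}{a}=\sum_{j=0}^{a}\binom{x}{j}\binom{y}{a-j} \\
&= \left(\frac{t-a-1}{t}\right)V_{a(t-1)}+\frac{h\binom{t-1}{a}-\sum_{i=1}^{n}\left[m_{it}-m_{i(t-1)}\right]\binom{t-m_{it}}{a}}{h\binom{t}{a+1}} \\
&= \left(\frac{t-a-1}{t}\right)V_{a(t-1)}+\frac{a+1}{t}-\frac{\sum_{i=1}^{n}\left[m_{it}-m_{i(t-1)}\right]\binom{t-m_{it}}{a}}{h\binom{t}{a+1}}
\end{aligned}
\]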
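
The last line of the derivation lends itself to an on-the-fly update. The following is a minimal sketch of that recursion, not an exposure control procedure: the class name `OrderedPoolingTracker` and its interface are assumptions made for illustration. It relies on the facts that m_it - m_i(t-1) equals 1 for items given to examinee t and 0 otherwise, and that the rate is undefined until a + 1 examinees have been tested.

```python
from collections import Counter
from math import comb


class OrderedPoolingTracker:
    """Running ordered item pooling rate for groups of `a` earlier examinees."""

    def __init__(self, a, h):
        self.a = a                # number of earlier examinees pooling information
        self.h = h                # fixed test length
        self.t = 0                # number of examinees tested so far
        self.counts = Counter()   # m_it: exposure count of each item after t tests
        self.rate = None          # V_at, defined once t >= a + 1

    def add_test(self, items):
        """Record the test of examinee t and return the updated rate (or None)."""
        items = set(items)
        if len(items) != self.h:
            raise ValueError("each test must contain exactly h distinct items")
        self.t += 1
        self.counts.update(items)
        t, a, h = self.t, self.a, self.h
        if t < a + 1:
            return None
        # The coefficient (t - a - 1) / t is zero at t = a + 1, so the starting
        # value of the previous rate does not matter.
        previous = self.rate if self.rate is not None else 0.0
        # Groups of a earlier examinees, none of whom saw item i: C(t - m_it, a).
        excluded = sum(comb(t - self.counts[i], a) for i in items)
        self.rate = ((t - a - 1) / t) * previous + (a + 1) / t \
            - excluded / (h * comb(t, a + 1))
        return self.rate


tracker = OrderedPoolingTracker(a=1, h=2)
for test in ({1, 2}, {1, 3}, {1, 2}):
    print(tracker.add_test(test))
```

In the toy run, the three pairwise comparisons share 1, 2, and 1 items out of 2, so the final rate equals 4/6.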