document resume, ed 262 071 skaggs, gary; lissitz, …document resume, ed 262 071 tm 850 548 author...
TRANSCRIPT
DOCUMENT RESUME,
ED 262 071 TM 850 548
AUTHOR Skaggs, Gary; Lissitz, Robert W.TITLE An Exploration of the Robustness of Four Test
- Equating Models.PUB DATE Apr 85NOTE 47p..; Paper presented at the Annual Meeting of the
American Educational Research Association (69th,Chicago, IL, March 31-April 4, 1985). Small print intables 1-3.
PUB TYPE Speeches/Cdinference Papers (150) -- Reports -Research /Technical (143)
EDRS PRICE MFO1 /PCO2 Plus Postage.DESCRIPTORS Computer Simulation; -*Equated Scores;-*Error of
Measurement; *Goodness of Fit; Latent Trait Theory;*Mathematical Models; Scaling; Standardized Tests;'Statistical Analysis; Statistical Studies
IDENTIFIERS Equipercentile Equating; Linear Models; Rasch Model;Robustness;,Three Parameter Model
ABSTRACTThis study examined how four'commonly used test
equating procedures (linear, equipercentile, Rasch Model, andthree-parameter) would respond to situations in which the propertiesor the two tests being equated were different. Data for two testsplus an external anchor test-were generated from a three, parametermodel in which mean test differences in difficulty, discrimination,and Lower asymptote were manipulated. In each case two data sets were.generated consisting of responses of 2,000 examinees to a 35 itemtest plus 15 item anchor test. Each test equating case was comprisedof 4,000 examinees and 85,items. The robustness with respect toviolations of assumptions was tested for the linear and Raschequating methods. For equipercentile equating, the results showed howthe method responded to various conditions for the three-parametermodel, the study primarily tested LOGIST's simultaneous estimationprocedure. Results indicated that equipercentile equating was verystable across the cases studied. Linear and Rasch model equating werevery'sensitive to violations of their models' assumptions. Resultsfor the three-parameter model were disappointing. Paying closeattention to test item properties was advised when selecting anequating method. (BS)
************************************************************************ Reproductions supplie,4. by EDRS are the best that can be made *
* from the original document. *
***********************************************************************
An Exploration of.the Robustnessof Four Test Equating Models
Gary SkaggsRobert W. Lissitz
University of Maryland
April 1985
UAL DEPARTMENT OF EDUCATIONNATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATIONCENTER (ERIC)
VT' his document has been reproduced asreceived from the person or organizationoriginating it.
I Minor changes have been made to Improvereproduction quality.
Points of view or opinions stated in this docu-ment do not necessarily represent official Nitposition or policy.
"PERMISSION TO REPRODUCE THISMATERIAL HAS BEEN GRANTED BY
Liss, 74z eolek:/-- 10
TO THE EDUCATIONAL RESOURCESINFORMATION CENTER (ERIC)."
Paper presented at theAnnual Meeting of the American Educational Research Association
Chicago, ILApril 1985
2fly
7
AN EXPLORATION OF THE ROBUSTNESS OF FOUR TEST EQUATING MODELS
Gary SkaggsRobert W. Liesitz
University of Maryland
The application of item response theory (IRT) to many measurement problems has
been one of the ia3or psychometric breakthroughs of the pest twenty years. IRT
methodology is currently being used in many large standardized testing programs. and
at the same time, a great deal of research is being doni to evaluate the robUstness
of the procedures under a variety of conditions. One of the most important
applications of this model is in the area of test equating.
The purpose of test equating is to determine therelationship between raw
scores on two tests that measure the same ability. Equating can be horizontal,
between tests of equivalent difficulty and content,, or vertical, between tests of
intentionally different difficulties.
Most conventional approaches to test equating have been describe* as either
linear or equipercentile methods (see Angoff, 1971) in contrast to using item
response theory (IRT) (Lord, 1980; Wright S. itone,41979) and are now widely used.
In these techniques, a mathematical relationship between raw scores on tests is
modeled. This relationship is based on estimates of item parameters from two tests"
and placement of these estimates'on the same scale.
A number of equating studies using IRT methods have appeared 4n recent years.
A maJority of these have dealt with the one-parameter logistic, or Rasch, model.
This model is the simplest of the IRT models but also the most demanding one in
terms of its assumptions.
Several studies have found the Rasch model to be useful and appropriate for
item calibration and linking (Tinsley & Davis, 1975; Rentz & Bashaw, 1977; Guskey,
1981; Forsyth, Saisangjan, & Gilmer, 1981). On the other hand, a number of
researchers have noticed problems with the Reach model for vertical equating
3
(Whitely & Dawis, 1974;51inde & Linn, 1978 1979; Loyd & Hoover,. £980: Holmes,
19S2).
Some of the inconsistency can be attributed to different equating-liesigns,
different types of tests being equated, and different procedures used to analyze the
retults. Regardless of the cause, there are some, fundamental concerns about the
Rasch model. The most frequently postulated arguments concern the failure of the
model to account for chance scoring, unequal discrimination, and
multidimensionality. The last concern applies to more complex IRT models as well.
Research tailing the three-parameter logistic and other models has not been as
plentiful as with the Rasch model, but it.suggests the same interpretative
difficulties. Most of the studiel have examined the three - parameter model in the
context of horizontal equating. of general ability tests. This work has generally
supported the use of the three-parameter model (March°, Petersen, .& Stewart, 1979;
Kolen, 1981; Petersen, Cook, & Stocking, 1981). For vertical equating, results have
been more mixed with some-studies finding the there parameter model to be more
effective than the Reach model (Marco et al., 1979; Kolen, 1981). However, the
comparison between the models has been shown to depend largely on the content of the
tests being equated (Kolen, 1981; Petersen et al., 1981; Holmes & Doody-Bogan,
1983).
With all of this conflicting research, it is very difficult to make decisions
about howto use IRT equating or whether to use it at all. The purpose of the
present study is to explore how test equating results can be affected by the
parameters of the items that make up the testa being equated.
Four methods of equating were Chosen representing, popular versions of linear,C
equipercentile, Rasch model, and three-parameter model techniques. Data for two
tests plus en external anchor test were generated from a three-parameter model in
which mean test differences in difficulty, discrimination, and lower asymptote were
manipulated. For Reach model and linear equating, this study is an exploration of
robustness when the model's assumptions are violated. For the three-parameter
Todel, this study amounts to an examination of the parameter estimation strategy.
For tine equipercentile equating, this study explores its effectiveness under a
variety of test conditions.
beta
METHOD
Data in this study were generated from the three-parameter logistic model:
-1P(u ='1/ 8,a ,b ,c ) = c (1-c )(1 * exp(-1.702a (8 -b ))) Illi3 3iii i i i 3 1
The response to item i by person 3, a 0 or 1, was determined by comparing the
probability defined by equation 1 to a random number drawn fro* a (0,1) uniform
distribution. If the probability of a correct response exCeeded the random'number,
the item was scored as correct. Otherwise, the item was scored. as, incorrect. The
random numbers were produced from the GOBS (IMSL, 1980) generator.
In all simulation cases. an external anchor test design was used. Iff each
Case, two date,sets were generated. Each data set consisted of the responses of
2,000 examinees to a 35 item test plus an anchor test of 15 items; Each test
equating case was comprised of 4,000 examinees and 85 items.. This size was chosen
to be large enough to provide stable parameter estimates for both IRT models
(Lissak, Hulin, & Drasgow, 1982).
Item and Ability Parameters
The item parameters used to generate the data were determined by manipulating
lower asymptotes and mean test difficulty and discrimination of the tests being
equated. For each reference, the two tests will be referred to as test A and test
B.
1
item difficulty was Studied a. three levels: 1) TA = bg = 0 ;2) = =
.5; and 3)15A= -1.0, = 1.0. For each test, difficulties were uni- rmly
distributed across a ranee of 4./- 2 logits.
Item discrimination was also examined at three levels: 1)71A =- at = .8: 2)-E1
= Et = 1.1; and 3) "iiA = 1.1, it = .5. In each test, discriminatiOns were \\\
uniformly distributed across a range of 4. - .1. Difficulties and discriminations
were randomly paired.
Lower asymptote, values were.manipulated in four ways: 1) cA = cg = 0; 2) cA =
cB = .2; 3) cA = 0, ca.. = .2, and 4) cA = = O. In each case, lower asymptote
values were the same for all items within a test.
In this study, a complete.crossins.of all levels produced 36 cells, or cases,
of pairs of tests to be equated. All anchor test items had a mean difficulty of
zero end a mean discrimination of .8. For vertical equating,the anchor items
represented an overlap in difficulty between tests A and B. Lower asymptote values
were all zero except in the case where the valuei were .2 for both tests. In these
cases, lower asymptotes were .2 for the anchor teat items.
These item parameters were chosen to refleCt a typical test equating between
tests of either equal or unequal difficulty (to simulate horizontal or vertical
equating). The abilities of the examinees was chosen to match that of each test's
difficulty, an.ideal situation and one found commonly in achievement testing. Each
sample of 2,000 examinees was selected from a normal distribution with.a mean equal
to-the mean difficulty-of the test and a standard deviation of one. The GGNML
(IM5L, 1980) generator was used to generate ability parameters for each sample.
Equatina Methods
One linear, one equipercentile, and two item response theory GIRT) equating
methods were chosen for this study on the basis of their popularity. In all cases,
an external anchor test design was used, and Test B was equated to Test A. That is,
for each raw score on Test B, an equivalent was found on the raw score scale of Test
A. For vertical equating,' Test B was always the more difficulty test.
The linear equating,method has been described by Angoff (1971) as Design Ivq-1
and is a procedure derived by Levine (1955) for equally reliable tests.
Equipercentile equating was accomplished using Levine's (1958) method which has been
described by Angoff as Design IVB. Cureton & Tukey's (1951) rolling weighted
average' method was used to smooth the cumulative distributions.
One of the IRT equating methods is based on the Basch model. Item parameters
were estimated using BICAL_(Wright, Mead, I Bell, 1980), and the equating was done
using procedures outlined by Wright & Stone (1979).
For three-parameter model' equating, parameter estimates were obtained from
LOGIST V.(Wingersky, Barton, & Lord, 1982). Many versions of this program exist.
For this study, a. version adapted by ETS for a UNIVAC 1100 was used. For each
equating case] item parameters for both tests and the anchor test were estimated
simultaneously by employing LOGIST's option,for not reached items.' The equating
\ then followed,pord's (1980) estimated true score equating procedure. For.below
.chance raw scores, Lord's (1980, p. 210) method of linear extrapolation was used.
Analysis Procedures
Since the data were generated from a known three-parameter model, these initial
item parameters. were used to develop a criterion for the test equating cases. This
criterion was simply a pairing of raw scores corresponding to the same ability
estimates:71A Pi o) . cs,- z Pi ce)
i j
This equating function was then compared to the equating functions produced by
the four equating methods. Besides plotting these results, two summary statistics
were used to interpret the results. These statistics are very similar to mean
square error statistics used in other equating studies, (e.g. Marco, Petersen, &
Stewart, 1979; Petersen, Cook, & Stocking, 1981). These indices are referred to
here as the weighted and unweightediman square error OISE) and can.be stated as
follows:k-=1
weighted(MSE) = fi(XE - Xcrit
)
2/ f S 123
k-1
unweighted(MSE) = Z' (X - XE crit
)2/52
B
2where k equals the number of items on Test B, 5g equals the raw score variance for °
Test B, -crit is the criterion test score equivalent on Test A for raw score i on
Test B. Xe is the Test A equivalent for raw score i that is produced by one of the
equating methods, and fi is the frequency of raw score i on Test B. The summation
is over rev score values, except that for the weighted MSE, the summation is only
across that part of the scale where extrapolation was not necessary. Zero and
perfect scores were excluded from all IRT equatirigs, but included in both
conventional equating'.
RESULTS,
Raw score means and standard deviations for all data sets are shown in Table 1.
Raw score means ranged from approximately 17.5 to 21.3 and standard deviations from
4
5.'0 to 7.7. By looking at data sets generated under similar item parameters, it is
clear that the generation procedure produced very consistent results. An
examination of the frequency distributions4for each data set also revealed a high
degree of consistency. In the shapes of the distributions. This in turn suggests
that there was a high degree of stability in the equetings.
As expected, a higher degree of,item discrimination in the generating item
parameters produced More dispersion in the raw score distributions. The reverse was
true for low discrimination. Non-zero lower asymptotes produced negatively skewed
raw score distributions.
The summaries of the two mean square error indices are presented in Tables 2
and 3. The first case, where test difficulties and discriminations were equal and
lower asymptotes were zero, was a situation where the data fit the Reach model.
From a psychometric point of view, too, this represented an ideal (easy) equating
situation. 'The best result for all four methods, almost perfect equating, was
found in this situation. The worst resets for all methods occurred where mean test
difficulties and discriminations were unequal, where levels of chance scoring were
unequal, and where low discrimination was paired with non-zero lower asymptotes on
the, more difficult test.
In general, the error indices for the equipercentile method were the lowest
across all cases.. This was followed by the three-paremeter model. Values for the
Rasch-model and linear equating tended, naturally, to be relatively large in
situations where their assumptions were violated.
To aiein the interpretation of Tables 2 and 3 , repeated measures
analyses-of-variance were performed on the two ?ISE indices. The results for the
weighted 115E appear in Table 4 and for the unweifted OE in Table 5. All effects
involving a comparison of the four methods were significant. Figures 1 to 4 show
cell plots of all means for the first and second order interactions between the four'
equating methods and independent variables. In each plot, the values shown are
means pooled across the variable(s) not included in the plot.
Finally, the actual equating functions for each case are shown in Figure 5, In
each plot, the solid.line represents the criterion equating based,on the initial
item parameters. The four broken lines represent the resultiofrom the four equating
. procedures. The criterion equating in most cases was curvilinear, making linear,
equating clearly inappropriate. In most of the plots. Reach and linear equating
were the most deviant, while the equipercentile equating line was closest to the
criterion, thus visually confirming the ?SSE values in Tables 2 and,3..
DISCUSSION
From a statistical viewpoint, the robustness with respect to violations of
assumptions was tested in this study only for the linear and Rasch equating methods.
For equipercentile equating, the results showed how the method responded to a
variety of conditions. In the case'of the three-parameter model, this study was
primarily a test of LOGIST's simultaneous estimation procedure.
Linear E uatin%
The assumptions of the linear equating model are violated whenever the shapes
of the raw score distributions differ for the two tests being equated. This
occurred when the mean test discrimination and/or level of lower asymptotes differed
between the two testa. The appropriateness of linear equating could be gauged by
the degree of curvilinearity in the criterion equating function. The total error
for linear equating was the smallest for horizontal equating with equally
discriminating tests. Chance scoring did not affect the equating in these cases
since the criterion equating function was still lifiear. Linear equating was clearly
inappropriate for all vertical equating cases and for horizontal equating where mean
test discriminations were unequal.
iU
Ecuinercentile Eouatinc
As can be seen in Tables 2 and 3, equating error for equipercentile equating
was-generally the lowest of' all four methods. All MSE values were below .25, and it
provided the smallest values 'in 30 of the 36 cases and in all-the vertical equatinb
cases. In the most extreme situations, this method was the only one of the four to
produce what we would consider acceptable results. Perhaps one reason for this- is
that it is only one of the four approaches not based on a model. It is simply the
best fit of the data at hand. The issue of cross-validation might be important in
some situations, but in this case, our preliminary work (not reported here)shows
that the results ailkvery stable.
In this version. of equipercentile equating, a total group cumulative
distribution was estimated for both tests basedon the response of the combined
.____.samples to the anchor test items. That this estimation inconjunCtion with a
smoothing routine worked so well was somewhat surprising.
Pasch Model Eauatina
An examination of the results in Tables 2 and 3 suggests that,the Reach model
was not very robust to violations of its assumptions. The first case in those
tables and in Figure 5. shows a situation where the data fit the Reach model for
horizoqtal equating. The Reach model, as expected, performed extremely well as did
the other three methods. In the second case in the tables and in Figure 5, all-4
items had a lower essymptote of .2. Yet, the equating was still quite good for all
methods.
In subsequent cases, where the level oP chance scoring was unequal in the two
tests and where test discriminations were unequal, the Reach model performed very;
poorly. In situations where low discrimination was paired with non-zero lower
asymptotes the total error was relatively large. An explanation for these results
11
\ Acan be found by looking at the estimation and linking procedure. When the BICAL
program is faced with a data set, a metric is chosen so that mean difficulty equal
to ,zero and all discriminations equal to one. When BICAL runs are done for two
tests with different properties, the resulting metrics are different, and estimated
item difficulties foi one test are more or less compressed than they should,be. The
use of an equating' constant does not alter the ,underlying metric, and a bias is
introduced into the equating.
That this bias can be severe can be seen in the next to last plot in Figure 5
in the case wheke low discrimination and chance scoring are bothc'present in the more
difficult test. The BICAL estimates for this case revealed a range of difficulty of
-4.0 to 3.1 for\Test A, but for Test B, the range was -1.5 to 1.2. Both tests were
generated with a range of /-2 logits; Obviously the metrics are quite different..
For vertical aque4ng, where the data for edth test fit the Reach Model, Rasch
equating produced adequate results where the test difficulties differed by one'
logit. However, where tests differed by two legit& in difficulty, the equating was
not as good. One poisible reason for this involves the anchoring procedure. Since
the anchor en overlap in the difficulty ranges of.both
teats, these iteitta_wermLvery difficult for those taking. Test A and very easy for
those taking Test B. Consequently, estimation was not as accurate. An anchor test
with a wider range Of difficulty (see Loyd, 1983) might have alleviated this
problem.
Ip the vertical equating cases; the error introduced by unequal seen
discrimination and chance Scoring was even more pronounced. Even where the same
degree of chance-scoring occurred on the two,tests, Reach equating was clearly
inadequate. These results therefore corroborate from a different methodological
perspective, empirical results that advise against using the Reach model to equate
vertically whenever chance scoring is a poisibility. These results also advise
12
against using thy Reach model in any situation when Sean test discriminations are
°unequal.
O
4 These problems would be especialYy difficult to overcome when one is
constructing alternate forms from an item babk.. To ensure that all tests formed
from the bank had the same mean discrimination, all items in the bank would have to
have the Same discrimination, or a,coiplicated algorithm for item selection be
progressed. Similarly,_chance scoring would not be a problem only if all items had
the same degree of chance scoring the forms to be equated were of Comparable
difficulty. this is a difficult task for any test developer.
Three parameter Model Eouatino
Since the data were generated from a threeaparameter molel, one would expect
three-parameter model estimation and equating to be quite accurate. An examination
ofthe values in Tables 2 and 3 indicates that this was not always the case.
The plots in Figures 1 to 4 suggest that three- parameter equating was.
relatively unaffected by levels of teat difficulty or chance Scoring. However, the
equating was affected by unequal discrimination. There was also an interaction
between unequal discrimination and teat difficulty and chance scoring. The greatest
errors occurred where thelmore difficult test also 'had the lower discrimination and
where a higher degree of hance scoring was paired with lower discrimination.
Since the data actually fit the model, the LOGIST estimation procedure as
programmed should be held responsible for the success of the equating. In this
study, simultaneous estimation was used. A single LOGIST run was used for each test
equating by employing the "not reached" option. In every case for this study, the
LOGIST estimation converged. However, for unequal discrimination and chance
scoring, theAprogram typically took at least 35 stages to converge (For practical
reasons, it was decided to extend the limit on the number of stages rather than
produce continuation runs).
That LOGIST was unable to recover the initial metric can be illustrated by
looking at.he parameter estimates from one of the cases. For the situation where
the/initialiparameters for tests A and B were as follows: be-.5, ce.0 and
ci=.2: the weighted andounweighted HSE's were .616 and .563,
respectivel: For Test A, the LOGIST difficulty estimates ranged from -3.14 to
.1.72, while the original difficulties ranged from -2.5 to 1.5. However, by linearly
transformirig-the_LOGIST estimates to the original metric, the estimations ranged
from -4.03 to 2.52. The LOGIST discriminations for Test A ranged from .8 to 1.3
compared to the original-1.0 to 1.2. After transformation, the range becomes ..6 to
.9.
For Test B, the LOGIST difficulties after transformation ranged from -1.37 to
2.55. The original span was from -1.5 to 2.5. However, the difficulties were
poorly estimated for the easiest half of the test. The LOGIST discriminations after
transformation ranged from .5 to 1.0, the original range being .4 to .6.
Ironically, the lower asymptotes were estimated reasonably well. The default
options were used, and default values were obtained for the easiest items on both
tests. Yet, no item had a c-value_greiter than .1 on Test A, and only six items on
Test B had c values leas than .1.
Another peculiarity was observed in the LOGIST results across all cases.
each teat, parameters for a few items (one to throe out of 35) were estimated
extremely poorly. These tended to occur more frequently on tests with weaker
discriminations. No apparent reason for these outliers could be found as all item
responses were generated from the same function. Still, an erroneous decision on
the quality of an item could be made from these results.
14
Clearly, in cases of unequal discrimination, LOGIST was unable to reproduce the
original metric, and equating was therefore biased. The differences in
discrimination were quite severe in this study, and it is not,known how well LOGIST
would respond to milder differences. On the other hand, the parameters for this
study sample size, test length, and ability distribution- - were chosen to yield
stable, reproduceable estimates. The results suggest therefore that the use of the
simultaneous estimation procedure of LOGIST is questionable in circumstances such as
these. Some other method for transforming estimates to the same scale should be
considered.
Comments on Analysis Procedures
A reviev of published equating studies reveals a wide variety of evaluation
procedures and summary statistics. The degree to which methodology affected
conclusionsis not known in these studies. In this study, the weighted mean square
error statistic was chosen because it has appeared frequently in the literature
(e.g. Marco et al., 1979; Petersen et al., 19811. When the results from these
statistics were compared to graphs of the equating functions (Figure,5), the
weighted MSE values did not seem to represent some of the cases accurately. This
was because there were relatively few persons in the raw score ranges where the
greatest equating errors occurred, at the lower end of the distribution.
Because of this, the inweighted MSE was also computed. Because each raw score
counted equally with this'statistic,. the values tended to be higher than for the
weighted statistics. In Figures 1 to 4, the weighted MSE values appear on the left
hand side and the unweighted values on the right. A comparison of the two sets ofk.
plots suggests thatthe two sets of MSE values turned out to be very similar for
equipercentile and three-parameter model methods. For Linear and Rasch model
equating, quite different results appeared in some of the plots.'
15
Other abnormalities appear when looking at the plots in Figures 1 to 4. For
example,, in Figure 1, the symmetry in the study's design does not appear in th MSE
values for levels of the discrimination and lower asymptote independent variable.
Tests h and B alternate between mean discriminations of .5 and 1.1 and lower
asymptotes of 0 and .2. Yet, the higher MSE values occur when Test A has the higher
discrimination. The same thing occurs when there is more chance scoring on Test B
than on Teat A. If one examines symmetrical designs (third, fourth,:fifth, and
ninth plots) in Figuri 5, e plots appear to be mirror images of one another.
The paradox can be explained by the fact that the HSE statistic nses vertical
distances from the plots. If horizontal diitances were used (i.e. Test A equated to
Test B). the pattern of results would be reversed. This analysis calls into
question the use 9f )5E statistics for this purpose. A great deal of theoreical
statistical work is needed in the area of proper error indexing.
CONCLUSIONS
The purpose of this study was to examine how four commonly used test equating
procedures would respond to situations in which the properties of the two testsI #
,being equated were different. The reset.* indicated that equipercentile equating
was very stable across the cases studied. Linear and Rauch model equating were very
sensitive to violations of their models' assumptions. Rasch model equating showed
robustness only for horizontal equating where the degree of chance scoring was the
same for both tests.
When data fit the Reach model, three-parameter model equating and Rasch
equating achieved comparable and accurate results. In all other cases,
three-parameter equating was far better than the Rasch model but generally not as
good as equipercentile equating. The results for three-parameter model equating
were disappointing since the data were generated from a three-parameter model.
16
Simultaneous estimation using LOGIST seemed unable to recover the original metric,
especially when mean test discriminations were unequal.
All of the equating methods were effected by some situations. Where the tests
being equated-differed in difficulty, mean discrimination, and in their.degree of
chance scoring, the equating terror was the largest for all four methods. This
suggests that equating tests should not be'attempted under such extreme conditions.
None of the equating methods could completely overcome the effect of such divergence
in item type.
The use of the MSE statistics produced several-paradoxes:in the results.
These could be resolved by examining the equating functions themselves. Certainly.
more statistical work needs to be done in the comparison of test characteristic
curves.
Finally, all of the data for this °study were generated from a unidimensional
three-parameter model. Real-data' do not exactly conform to this model, although it
seems reasonable in a wide variety of situations. How these methods would respond
to multidimensional data is not known, but problems for both IRT methods were
uncovered in the unidimensional case.
This study supported otheAesearch finding' that found the Rauch model
inappropriate for use in vertical equating situations. The three-parameter model
procedure used here also did not generally produde acceptable &sults in more
complex situations where we might have expected it to do so. The best advice at
this point would seem to be to pay very close attention to the_propertieerof the
test items. If the tests differ very much in their properties, then classic
equipercentile equating is suggested.
17
REFERENCES
AngoEf, W.H. (1971) Scales, Norms and Equivalents Scores. In R. L. Thorndike.(ed.). Educational Measurement (2nd ed.). Washington, D.C.: American Councilon Education.
Cureton, E.E. & Tukey, J.W. (1951) Smoothing frequency distributions, equatingtests, and preparing norms. American PaYchologist, 60 404. (abstract)
Divgi, D.R. (1981) Does the Reach model really work? Not if you look closely.Paper presented at annual meeting of American Educational Association, LosAngeles.
01
Forsyth, R., Saisang3an, U., & Gilmer, J. (1981) Some empirical results related tothe robustness of the Reach model. Applied Psychological Measurement, 5175-186.
Guskey. T.R. (1981) Comparison of a Reach model scale and the grade-equivalentscale for vertical equating of test scores. Applied Pavpholooical Measurement,1, 187-201.,
1/4
\..
Holmes. S.E. (1982). Unidimensionality and vertical equating with the Resch model.Journal of Educational Measurement, 11,139-147.
Holmesi S.E. & Doody-BOgen, E.N. (1983) An empirieal study of vertical equatingmethods using the three-parameter logistic model. Piper_ presented at theannual'meeting of the National Council on Measurement in Education, Montreal.
Hulin, C.L., Lissak; R.I., & Drasgow, F. (1982) Recovery of two- andthree - parameter. Logistic item characteristic curves: a Monte Carlo study.Applied Psychological Measurement, 60 249-260.
IMSL, Inc,. (1980) Internitiona athematica d-Ste Jet cal Libraries ReferenceManual, Houston, T X
Kolenja.--,1-Comparison of traditional and item response theory methods forequating tests. Journal of Educational Measurement, Ls, 1-11.
Levine. R.S. (1955) Equating the score scales of alternate forms administered to. samples of different ability. Research Bulletin No: 23. Princeton, NJ:'Educational Testing Service.
Levine, R.S. (1958) Estimated national norms for the Scholastic Aptitude Test.Statistical Report No. 1. Princeton, N.J.: Educational Testing Service.
Lord, F.M. (1980) Practical application of item response theory. Hillsdale, NJ:Lawrence Erlbaum.
Loyd, B.H. (1983) A comparison of the stability of selected vertical equatingmethods. Paper presented at the annual meeting of the Alerican EducationalResearch Association, Montreal.
Loyd, B.H. O.Hoover. H.D. (1980) Vertical equating using the Raschsodel., Journal of Educational Measurement, 17, 179-193.
Macro, G.L., Petersen, N.S., & Stewart, E.E. (1979) A test of the adequacy ofourvilinar score equating model. Paper presented at the Computerized AdaptiveTesting Conference, Minneapolis.
Petersen, N.S., Cook, L.L., & Stocking, M.L. (1981) IRT versus conventionalequating methods: A comparitive study of scale drift. Paper presented atannual meeting of American Educational Research Association, Los Angeles.
Slinde, J.A. & Linn, R.L. (1978) An.exploration of the adequacy of the Reach modelfor the 'problem of Vertical equating. Journal of Educational Measurement,23-35.
Slinde, J,A. & Linn, R.L. (1979) A note on vertical equating via the Reach modelfor groups of quite different ability and teats of quite different difficulty,Journal of Educational Measurement, 14, 159-165.
Tinsley. H.E. & Davis, R.V. (1975). An investigation of the Reach simple logisticmodel: Sample-free item and test calibration. Educational and PsvchaloaicalMeasurement, a2, 325-339.
Whitely, S.E. & Dawis, R.V. .(1974) The nature of objectivity with the Rauch moctol.Journal of Educational Measurement. ice, 1P-178.
Wingersky, M.S., Barton, M.H.,& Lord, P.M. (1983)- LOGIST: A computer program forestimating examinee-ability and'item characteristic curve parameters. LOGIST5,__versibh 1. Princeton, NJ: Educational Testing Service
Wright, B. D., Mead, R. J., & Bell, S. R. (1980) BICAL: Calibrating items with the%tech model. Research Memorandum No. 23c Chicago: Statistical Laboratory,Department of Education, University of Chicago.
Wright, B. D. & Stone, M.H. (1979) Best test design. Chicago, IL: Mesa Press.
19
Teals 3.
Raw Score Rims and Standard Striations for Generated Data
Test A Toot 11
e Haan $ .0 ; : e 16.J...n Lai0 .1 .0 17.54 7.23 0 .1 .0 7.5S 1 210 .1 :2 21.01 5.19 0 A 210 Al .0 17.79 7 .12 ' 0
.2. .15 .10.1 .2 XJ5 s.00
0 1 .2 21.20 5j$ : 0 .1 .0 17 .6 ., 0 30
o .5 .0 17.31 1j.3 0 1.1 .0, *
0 .20 .5- .0
20.55 4.5117.75 6.00 -- 0
01.1---.T2---2-.04 6.241.1 .2 21.01 6j11. *
0 .S .2 20_45-5-.09- 0 1.1 .0 17.3* 7.5Q
--------- ---0 1 .1 .0 17.41 7 .55 . 0 .5 Ji 17.3*0 1.1 .2 20.53 1.31 0. .5 .2 21.14 5.00 1.1 .0
1.117.12 7.12 0 .5 .2 2035 5.020 .2 21.20 1..36 0 .5 .0 17.63 5.50
.1 .0.5.1 .2
.0 .0
17.34 7.0120.52 5417.57 7.21
.s
.5
.5
.1 .o 17.42 7.05
.9 .2 20.3* 5 A0.9 .2 21.01 5 Ag
-.s .2 21.04 5.17 .5 .1 .0 37.35 47-.5 5 .0-a .5 .2
17.i321.16 5.07
.5.5
1.1 .0 17.41 7.601.1 .2 20.51 E.37
-.5 .5 .0 3.7. 6.04 .5 1.1 .2 21.16 6.27-15 -5 .2 20.97 5.10 .5 1.1 .0 17.55 7.70
....5 1.1 .0...5 1.1 .2
17.51 _ .4021.03 32
.5.55 6.5.5
.22217..0571
....7 1 .1 .01.1 .2
17.49 7.,20.89 6.42
.5 .5 ..2 20.63 5.15.3 .0 17 .39
-1.0 -1 .0 17.41 7.00 1.0 .1 .0 17 .37 7 44-1.0 .1 .2 20.14 5.50 1.0 . .1 .3 20.11 .6 .02-1.0 .1 .0 17.21 7.06 1 .0 .1 .2 21.04-1.0 9 .2 21.11 S AI 1.0 .9 .0 17.13 7.0$-1.0 .51, .0 12.47 1.12 1.0 1.1 .0 17 AO 7 .40-1.0 -5 -2 21.00 5.14 1.0 1.1 .2 20.75 .32-1.0 .5 .0 17.43 1.05 1.0 1.1 .2 20.55 - .41
A .5 .2 21.14 5.14 1.0 1.1 .0 17.60 .7.41
-1.0 1.1 .0 17.65 715 1.0 .5 .0 17.13-1.0 1.1 .2 20.51 1.46 1.0 .5 .2 20.57 5.15.-2 A 1 J. .0 17.55 7.41 1.0 .5 .2 2054-140 1.1 .2 n.ss 6.32 1.0 .5 .0 17.40 6.00
20
Table 2
Unweighted Meat Square Error for Teat Eouatint Cases
Test A Test
c.
.$ .0 Al .0. 8 .2 .6 .2.8 .0 .8 .2. 8 .2 .8' .0
.5 .0 1.1 .0
.5 .2 1.1 .2
. 5 .0 1.1 .2
. 5 .2 1.1 .0
I' 1.1 .0 .5 .01.1 .2 .5 .21.1 .0 .5 .21.1 .2 .5 .0
:8:8
.8
.0
.2
.2
.8
.8.
.8
.0
.2
.2
.0
.5 .0 1.1 .0.5- .2 1.1 .2.5 .0 1.1 .2.5 .2 1.1 .0
1.1 .0 .5 .01.1 .2 .5 .21.1 .0 .5 .21.1 .2 .5 .0
.8 .0 .8 .0
.8 .2 .8 .2
.8 .0 .8 .2.8 .2 .8 .0
.5 .0 1.1 .0..5 .2 1.1 .2.5 .0 1.1 .2.5 .2 1.1 .0
... . -- _1.1 .0 .5 .01.1 .2 .5 .21.1 .0 .5 .21.1 .2 .5 .0
SA - - 0.0
LINEAR.
E0 T'ILE RASCR 3-PARA
.000 .000 .000 .000
.003 .004 .000 .001
.088 .010 .231 .045
.003 .016 '.172 .037
.037 .043 .501 .146
.086 .036 .309 .133
.041 .042 '.140 .082
.028 .039 .890 .204
.094 .074 .854 .271
.195 .079 .422 .214
.705 .132 1.955 .563
.060 .041 .158 .111-bA a Tr .5
N,
.227 .008 .054 .oso
.369 .055 .185 .043
.391 .023 :826 .120
.163 .008 . .120 .058
.160 .040 .199
.479 .069.494
.133.387 .078
.210.157
.128 .036.370.596 .210
.605 .107 .945 .288
.427 .095 .982 .270
.819 .175 2.817 .563
.426 .040 .352 .131
-1.0; Tis 1.0
1.326 .011 .161 .1331.693 .072 .672 .1422.383 .112 1.787 .2281.109 .013 '.183 .117
.916 .059 .443 .2751.386 .178
.420 .1821.807 .090
1.174 .343.740 .058
.442 .212
3.099 .122 1.286 .5192.455 .107- 2.000 .4773.967 .234 4.585 .4782.168 .032 .807 .216
Table 3
Weighted Mean Square Zrtor for Test Equating Costs
me g TEST
e.8 .0 .8 .0.4 .2 .8 .2.8 .0 .8 .2.8 .2 .8 .0
.5 .0 1.1 .0
.5 .2 1.1 .2
.5 .0 1.1 .2
.5 .2 1.1 .0
1.1 .0 .5 .01.1 .2 .5 .21.1 .0 .5 a1.1 .2 .5` 9.0
.8 .0 .8 .0.8 .2 .1 .2.11 .0 .8 .2Al .2 .8 .0
.5 .0 1.1 .0
.5 .2 1.1 .2.5 .0 1.1 .2.5 .2 1.1 .0
1.1 .0 .5 .01.3. .2 .5 .21.1 .0 .5 .21.1 .2 .5 .0
.8 4 .0. .8 .0
.8 .2 .8 .218 .0 .8 .2.8 .2 .8 .0
.5 .0 1.1 .0
.5 . .2 1.1 .2
.5 .0 1.1 .2
.5 .2 1.1 .0
1.1 .0 .5 .01.1 .2 .5 .21.1 .0 .5 .21.1 .2 .5 .0
bag. bp. 0.0
CASE
=al ED ILE !ASCE 3 -PAPA
.000 °.000 .000 .000
.001 .002 .000 .001
.004 .004 .132 .030
.002 .001 .014 .005
.0E1 .023 .382 ^ .104
.029 .023 .251 .119
.021 .022 .120 .073
.019 .019 .521 .097
.081 .085 .132 .2514081 .015 .390 .244Nis .151 1.430 .510.040 .029 .129 .101
-.51 ;3 is .5
4108 .010 .058 .042.113 .002 .091 .059.198 .014 .215 .109
. .104 .008 .144 .057
.132 .031 .444 .190
.100 .025 .273 .144
.191 .039 .358 .193, .111 .024 .394 .110
.183 .047 .494 .258
.190 . .039 .512 .245
.417 .144 1.413 .414
.121 .034 .291 .114
;A is -1.01 b8. 1.0-
.870 .014 .140 .153
.775 .008 .194 .1551.484 .028 .444 .221
0.779 .013 .230 .123
.830 .048 . .514
.557 .021' .358..371.271
1.047 .034 .792 .410.441 .038 .480 .272
1.317 .031 .340 .2151.001 .033 .543 .3382.134 .093 1.142 485
.910 .014 .387 .157
22
Table 4
Analysis of Variance of Unweighted Mean Square Error
Source df ms
Among SsA,
Difficulty (DI) 2 5.9676 3140.642Discrimination (DS) 2 3.6789 1936.263Lower Aiymptote (L) 3 1.7128 901.474DI x DS 4 .4974 261.789DI x L 6 .3305 173.947DS x L 6 .5190 273.158DI x DS x L .12 .0019 .132
Within Ss
Method (m) 3 5.2413 363.979DI x m 6 2.3454 162.875DS x m 6 .7985 55.451L x al. 9 .4800 33.333DI x DS x m 12 .2033. 14.118DIxLxm '18 .1141 7.924DSxLxm 18 .2231 15.493DI xDSxLxm 36 .0144
Source
Table 5
Analysis of Variance of weighted mean square error
df MS F
Among Ss
Difficulty (DI) 2 1.345 336.657**Discrimination (DS) 2 .712 174.510**Lower Asymptote-(L) 3 .412 100.980**DI x DS 4 .002 .490DI x L 6 .034 8.333**DS x L 6 .156 38.235**DI x DS x L 12 .004
Within Ss
Method (m) . 3 1.178 136.028**DI x m 6 .762 87.991**DS x m 6 .124 14.319**L x m 9 .055 6.351**DI x DS x m 12 .041 . 4.734**DIxLxm 18 .026 3.002**.DSxLxm 18 .030 3.464**DSxDIxLxm 36 .009
1.0
2.05
,_ 1.0-
2.0
1.0
0
WEIGHTED MSE
'Es
.o
CP. 0 ...aeft. OEM 4. ono ale
8A1.1aB .5
Se-1.0Ile 1.0
C
A=.8
-ie. 8
We a.m. ye
O.. .ms irrollioft rimm... mon Imm a
Tv .5
173=1.1
linearequipercentileRaschthree-parameter
20
1.0.
o
DIFFICULTY
2.0
1:0
0
DISCRIMINATION
CA-I
cB.1=,
cA11.2
c.0cB B
.
cA=.2
2.0
1.0
0
=WEIGHTED MSE
40.040.10 :)///ow*
tool)
6.0 .
bA8.0
bB.0Ab .54
.5
Woo-B.
awn
loom dowel
m19,
A8
za--8ITA.1.1-
lie .5
1 60 sum ...err. -.-"A ,..40 g 4"%1:10
ce.2ces 0LOWER ASYMPTOTES
Figure 1. Celiplots of interactions between eqUating methods anddifficulty, discrimination, and lower4symptote
CB-0cA=.2
cB '
2
c 0A2.
cAu.2CB- 0
6
WEIGHTED MSE
4-----41inesr..ea7.11et-- -=34:par al I
410-..---eRasch
1.01
.41'" . aloeow.. aD.' mow 43.
;ono OMB
m.5 = ;5aB
s=1.1 , 4atm .5
0A B
...
r =.8 = .5_A14=1.1
aBw'8
TA=1.1lim .5
lr - -.5ABB = .5
'wow
a" 400.I..ammo IMO
OM...a.' MONO
nsmog.
e
3.. 8 '1 = .5.....k A.
aB=.8 a=1.1
\ . ri,..-1.0134
' '1 0. 1 .
f
Figure 211.: Cellplots of interac,tion.between equating methods, difficulty,...
and discrimination
I
0
3.01
2.04
1.0-
LUWEIGHTEp MSE1
..w
_Aig 1aB=1.1 ae. .5
S.
8
Y1'8
ar .5aB=1.1
gegia- rae -J
0 owe.
0IMAM OM OM, OOM MOD OMM
14.
26 :
WEIGHTED !ISE
,-- e.. Dar.
eitasen-
2.0
1.0
04
3.0 -
2.0
'1.0
0
e.tow twOmm a. 11. WOSo ut 11.06.. Imo OM el. tffillt as Mg
CA=0"
0CA =.2
c.=.2
11.0
1.0 .
0
UNWEIGHTED MSE
.c..,;-...... ..=: -,
...-:-"7. -4------.H.
... - ..
c=0 c..2::0::cA. cB=.2 A. ' :1.
.13
vo,. 11".."1 moco.. a.0",INIM am!.
C 0 c .2 C=O=-.AA.
, eA..2 c 0 C R.
cA
CBAR u A
..2 ccBBma.2 cB
A .5; rB = 1.1
....ero N.. a
C ;0 C c . 0 c ..2cA=0 cA..2 ek
a'2 oe
1.1; AB .5
3.0
2.0
1.0
0
- hD. ..43.
car, 0 c ..2 c ="0 cA=.2c
Bsou cA..2 cA=.2 c8= 0B B
Figure 3. Ceilpiots of interaction between equating methods, discriminationand lower asyrptote
27
of
2.0
3.0
2.0
1.0
WEIGHTED MSE
linear
ti.ri Rasch0
.. 4. ...0. "rk . ... . :
... aro .0. al 4. .p,.. ,.. .... a bme,._.......:
C m0 c ..2 C= 0 c m.2CAmu
A AcA.2 c
A=.2 ce 0B
.13.
b..... bti .. . ocr ' .6.......... pit.. stwmisigaboo. "w. "%ft.
w 1 .....,.
c =0 c ..2 c b 0 c ..2A
cBA.v c1B4..2 -B mgcBA ..2 cA
0
BA= -.
--s
"Z."
A1Cub. b ep
OM Me
-pare
UNWEIGHTED MSE
0
2.0
1.0
A0
."5
1.0
At
A/- *I
emu. "'ND
CA =0
cBinv ::::;
c 0 c A..2cBA..2
CB= 0
I
re.
Or .0, .bow. bow Am.
A°C) c A.I.2 :1 c A..2cAa° CA: 0' c . 26
c c = 0 c ..2,
,934 cB..2
A.2 .'41A 0ce ce c
B.0 c
B..2 .2 0
A 'ce CBS
. 11 4. 0. IF .1 0Ast ' B '
,
Figure 4. 0ellplote of the interaction between equating methods, difficulty,and lower asymptote
28
Figure 5'
Equating functions for four equating methods
Figures follow.
,,,
4e.e5
3e. se
5
T ae.e.
15.08
19.00
5.80
8.90
48.00
121111IIM
alatiWILIDMalUwe
1 VIVOI:DerAt14401111111111111114414441141{111111111111111
8.00 5.88 18.08 15.09 20.08 25.08 38.98 35.80 40.90
TEST
Test A: 15 = 0; E a .3; c = .0; g a 0.0Test 8: 5 = 0; E x .8; c = .0; 8 s 0.0
35.88-1
T 25.00 --
T 20.08A
15.00i
10.e0-
5.00-1211E1104
.m0. ammo 2111111110CM
Mel. UKAII
eav& 1-- taPCX=VII
1 .1 1 111pilipliiiiitili111111440.00 540, 10.00 115.08 20.88 25.00 30.88 35.80 4030
TEST 11
Test A: = 0; a = .8; c = .2; ti = 0.0Test B: = 0; a= .8; c = .2; =0.0
30
40.ee
33.884
38.88
J
18.80 ., .#.e. 00 01412311110
,.?" ....... apulatittVr .4..40.0 WMWM
_.....,.,drcur=erat0.80 . iiiiiiiiaitspiisiataisaliolssat
8.00 5.80 .10.88 15.00 20.00 25.80 30.00 35.00 4040TESTS
Test A; 5 = 0; I a .8; c = .0; i = 0.0Test B: 5= 0; 1 a .8; c = .2; 11 = 0.0
5.00 -C
40.10I(
35.004
30.007
T 25.88'E :$
T 28.06
A
ims
MO, GINO OM..
:
Ir_-
e.se:Ae. -es
triTIMICillME- OP- iposian31
a=
1111114.61114115.-0e Wee isite 20.ee as.es _ae.es 35.00 40
TEST 1
Test A: 5 = 0; = .8; c = .2'; 0.0 31Test 8: 5 = 0: I = .q: r = n z 1
881
40.90
35.8$
38.89
' 25.89
5 ,
7 ae.ee
'A
15.08
1009
5.68
MX=VIM=Si=MIR
milli.'" taral=rnit0.80 ri itaiiii.liers -u iiit al -4 a ii41 I1 1.119.80 5.00 18.88 15.0e.2e.8e 25.88. 30.88 35.80 48.80
TEST I
AO.00
35.08
30.08
ET 25.80
5ae.ee
.A
15.80
10.80
5.88
Test A: 5 s0; Bati .5; c_ .0; 8 * 0.0Test II; B 0; a 1.1; c = .0; e s 0:0
CCTJUIPI.03MMT.AIT .. 11=31
IS=gr=ante.ce -7-1-1-77779-1-1-14 1441111f 11-4110111111111cee 5.80 sem 15.001040 25.98 30.88. 35.08 4840,
TEST
Test A: 5 = 0; I = .5; c = .2; e = 0.0Testl; 15,4= 0; I = 1.1; c = .2; = o.o
32
3i.8.4
3e.88
25.88
S
T aces
A
16.84
18.88
S.80
OA. iliiiilliAtiffilT4146410.80. S.08 18.08 15.88 20.80 25.00 30.00 35.80 40.09
a-
4001MOUR
""""71.011110120111031
agra=rfas
49.80
35.08
38.88
TEST
Test A: B = 0;Test B: 8 = 0;
81
= .5; c = .0;= 1.1; c it .2;
8 .1 0.0
a 0.0
5.08i /
eiee=i4frrrynTr-rrrr,
j
=mum.1.841SCI. 111
111:::31 'a.. tiled
tcrall:0011.1I
0.88 5.08 10.00 15.08 20.88 25.00 30.08 3S.08
TEST 1
Test A: 5 = 0; = .5; c = .2; Om 0.0Test B: c = 0; = 1.1; c = .0; 8 = 0.0
4e.ee
49.00
2530
3e.ee
T as.ee
$
T ae.ee
'A
15.00
18.09
s.ee
0
d.
°,01141MVIM
*WM
!4"=4,014MWM4Cee 41T1 I II i i74Fistipt up-tie wag8.00 5.88 10.80 is.ee 28.08 25.88 38.80'35.88 40.88
TEST
Test A: 5= 0; 3 = 1.1; c = .0; ; = 0.0Test B: = 0; a= .5; c = .0; 13 = 0.0
eC41MUA'. VOC CIU543
e=as
.reft _t
8.8EF
e.ea s.ee tem is.ee 28.88 25.88 38.88 35.08 49.00
TEST
Test A: 5 .= 0; 1 = 1.1; C = .2; 6 = 0.0Test S: 5 = =
4.= G = .2: T1
'leen
548
40.00
25.00
30.00
.1111115.89- 10.08 15.08 28.88 2545
TEST
Test A: 6 = 0; I 1.1; c a .0, 5 =4.0Test B: 5 = 0; E 8. .5; c = .2; 5 = 0.0
...v.Ustampozzant
lidliIirrr30.80 35.08 48 0
10.00
5.03MUM
--a 11entiltfa=WOO
--dw tran=nt.tii1.114114141iiiiIk1141111 41-.111.11i-1SM.848 5.08 10.03 15.00 29.00 25.08 30.08-3540 40.08
TEST .O
Test A: 5 = 0; a = 1.1; c,
35.2; =0.0
.38.89
0.00 5.99 ie.ee 15.00 2940 25.00 30.011 35.00 4.911TEST I
004.4,-
WM=.1110111CM.
MIMI
'"''"` =mg
Test A:. 5 = -.5; a = .8; c a, .0; -.5Test B: B.= .5; a a .8; c a .0; 5 = t5
?nom
?aeonA
isle'
e.relaiwiftfro, otex=In=--- =ILE4#1 ialei 'awl-141611;1 I I'
e.ee SAO ie.ee 15.00 um 25.00 301.0$ 25.00 40.09i TEST
Test A: E r -.5; 3= .8; .c= 2; g -5Test .5; a .8; c_ .2; ; r .5 '36
4.94
311.9t
sloe1=1:01111
311111111M
Malmom.. laale
049 S.90 11.00 15m 80.00 2540 3040 25, 40.0$.TEST 0
Test A: 5 = -.5k 5 a .8; c.* .0; 8 'Test B: 5'.= .5; E a .8; c a :2; 11
4040
-.5.5
35.99
30.8e
T 25.80
29.00
A
15.00
19.00
5.90
15.COC4.414114-1341111i1C
CTITLIM..-....- 3^"Itli
11=1-0.0411.0110AMMI
0 Crallra=k111.10.00 5.00' 10.00 15.90 20.09 26.30 MOO 35.00 40.80
TEST
Test A: 5 = -.5; E = .8; c = .2; i = -.5Test 3: 5 = . A-= .8; c = .0; a = .5
40.00
35.00
3161
er 25.68
-T 26.80
A
15.00
Mel
5.80
a es'
is/e//
// /////.. 111211271111.
//e/ - ri wow=WOWOOpiiINIMORTIS
OM SAO 10.00 15.00 20.00 25.00 30.00 35.00 44.60
TEST
Test. A: 6 = I a c a .0; g -.5Test B:_5 = .5; a =, 1.1; c = .0; = .5
4040
ET as ee
S
T 20.00.7
t
5.00 4
LII=11- COPIZZORTLE
0.80 4
8.40 5.08 10.88 15.88 20.80 25.80 38.40 25.00 40.80
TEST I
Test A: 5 2 INS; I = .5; c = .2;. I = -.5Test B: 8 = .5; 3 = 1.1; c = . ; = .5
38
1g 80es-0
30.01
T.as.se
5
T 26.06
A
15.0e
10.88
5.08
4.1%7doe
r
cmgral311411=12611101
U1111111
00$1pii111111814 1 4 1',11 11111111
"41 IMIPII=Nrvi$ 88 mr.19-19.20-15.00 ae.es 25.88 38.8. 25.88 48.66
EST 1
lk 35.00--
ET as 004
5
T aeon
A
15.00 !e
10.084,
5.80-:
Test A: 5 -.5; 1 .5; c g a --5Test 2: 5 = .5; a 1.1; c = .2; 8 a .5
0.$9. 5..09 1048 15.00 20.09
TEST
Test A: if = -.5; . .5;Test 13: 5 = .5; a = 1.1;
=7"3.1t111I ipla=g111
Wm*"- 11=1":=11111/11 119-111-11-111125:80 30.80 3540 40.80
c = .2; g = -.5c = .0; 0 = .5
4C.Ce
39.80-E
T as.ee-:
5
T 28.43ei
A
19.88-;
. tic
.iimas."1"2
am=
.-----ampacestras0.88 iiiiii1 l4 i4 61 ii-4114-411144 4411646146118.88 5.88 18.88 15.88 28.88 25.81 38.88 35.88 48.88
TEST.
Test A: 6 = -.5; a is 1.1; c = .0; 8 e -.5Test 8: 6 = .5; a .5; c * .0; I a .5
40.0
.35.80
30.00
Er um5
T 29.80
A
15.88
se.ee
5.ee=:r ON
- --- lar=1:1Weltarts
esecitIiiii7i164,111.1.-6-1114-)444411641611t61'""'"' 1=MLi0.80 543 sem mee tem 25.84 38.00 25.130 40 00
TEST i
Test A: 5 = -.5; 5 = 1.1; c = .2; g = -.5Test B: b = .5; a = .5; c = .2; 8 = .5
40
25.88
ae.eeA
15.88
/1 IV
e're. /4/7'
18.88 /5.88 if
i.
.L
400.EA..,-,.........o...e.ee. Iwo, 11181411iii.iii"14-ECT
i.e. 5.81 10488 15.08 29.88 2548
TEST I
Test A: 5 = -.5; 1 = 1.1; c = .0; i a -.5Test B: 5 = .5; a.= .5; c = .2; 0 = .5.
4440
U
T 25.08-4
T 28.881 4%A
15.884ei;//1"/
le.ee-
WEINCII1ada
11111 411 411
30.88 35.88
.41041.0.1114.
rit=70S ee *"."'44 Ong=
:A14 Sr 4.4. UN=I...., Crat:
i ice=C171.eeee
8.e4 5.88 18.88 15.80 29.88 25.88 38.89 35.88 40.88f TEST B
Test A: 6 = -.5; a = 1.1; c = .2; g = -.5Test B: 3 = .5; = .5; c = .0; g = .5
41
.
36.00
" as.efE ,
ST aces
._,A
1648 f/18.88
LSOalga
011.---escrecarentCOO
J-6 71 I 11111 11111111CO. LOS MOO 16.88 28.88 atm 3840 36.8$ 411.11$
TESTS
'.Test A: 6 =,-1.0; r = .8; c .0;Test B: b a 1.0; r .8vc,a .0; g a 1.0
35' CC
3C.8.2
T 25.80
S
720.88
A
16.88
Mee
.e/
C21=1011faie - - S
II:=N%u
CSO I I I I I I 1 - tr..45=mt8.88 6.88 10.08 16.88 acqs E6.88 38.88 36.80 48.0$
TEST*
Test A: F = -1.0; C= .8; c = .2; U. -1.QTest B: b = 1.0; a = .8; c = .2; 0.= 1.0
C.
35.0$-:
34.01i
25.0E' 1
0'.' 20.8071,.A
15.887-5
le.ce
5.00 -
I. Sejaa
R.30
4ece
35.88
30.08
E25.005
T acee
A
15.08
0:7103111
"WagtailWmWOO
''''.0OUOCCONTOS
603 10.88 15.88 20.00 25.811 30.80 35.80 40.80TEST I
Test A: 5 a -1.0; a = .8; c .2; ; a -1.0Test 8: 5 = 1.0: 1 = .8; c .0; 5* 1.0
e
%/
18.88de
teara5.08bags
MP NI UP:01.OWN& traVa=erntesee
0.00 5.09 10.00 15.80 28.00 25.00 33.08 15.00 40.08TEST 1
Test A: 5 = -i.0; E = .8; c = .0; U = -1.0Test B: 5 = 1.0; a = .8; c = .2* 0 = 1.0
43
25.805
T 28.86
A
15.86
18.08
5.00
4949
35.00
30.00
T 25.80
T 29.00
A
15.80
5.00
= INIMOW.12=11111
lasa
auprocanut
10.00 15.80 20.00 25.00 38.08 35.0$ 40.08"
TEST I
Test A: 6 = -1.0; I = .5; c .0; g"ii -1.0Test B: 6 = 1.0; E = 1.1; c = .0; g = 1.0
4.-.P....=11122111
poimcmg
0 100411.
VP MD %ben-IliirPCX:DMILIC
Lee 1 AA kiwCM 540 leas is.ee 20.80 2540 30.30 35.80 40.80
TEST I
Test A: b = -1.0; B = .5; c = .2; g = -1.0Test B: b = 1.0; a = 1.1; c = .2; 8 = 1.0
44
35.28
39.88
'T nom
S
T28.63A
15.80
10.89
538
r
"or
# ".40/10000.00.
e
Ot7.!1;18.88 :I iii1111411.1414114111p4,14ii
0.88 5.88 18.88 15.88 20.38 25.88 38.88 35.88 40.09
TEST 11
Test A: 5 = -1.0; a = .5; c = .0; g = -1.0Test B: b = 1.0; a = 1.1; c = .2; 5 ix 1.0
+00.4.
,CCM2UM11=11.1=11 31111111.=
alial/011 MO
tIrACI=Inii
T 25.88
S
T 20.80
A
............
«do'
5.88-1.-..... UMW
ocial....- 3possen.
eotte amiss611'14114M1s.ee-eime-ab:11-3e.ee- 3s4.--413.e8
Is
etrITZMI
TEST 1
Test A: 6 =Test B: 5 =
-1.0; ' =1.0; a =
.5; c =1.1; c =
.2; g ==
-1.01.0
45
A 4.0
25.04
24.80
as.es
ST ae.ee
A
15.80
10.80
3polinClitWellMe18.001:111=
cos' s.es ie.es is.es 2e.ee as.ee 38.80 35
44.ec
35.0
Mee
Test A: 5 it -1.0; 1 1.1; c = .0; g = -1.0Test B: 5 = 1.0.; I = 5; c 11-.0; g = 1.0-
44.01
MEN
CZITCUIR
reamrst5.00 MU/I WOOemora=cruz
COO 4411411111411 4h1441 it1 411 11111111fLi Lee 10.0 15.00 20.00 35.88 se.es 3540 40.0TEST
-1.01.0
Test A: 5 = -1.0; I = 1.1; c = .2; g =Test B: 5 = 1.0; a = .5; c .2; 6 =
46
38.80
ET
35.001.00
S t
(azA
18.08--
s.es
Lee
4040
35.80
39.80
iv-F.44.4 fl .-.
8.08 5.89 18.00 15.88 20.88 25.08
TESTI
smaggriaIwoModraimapme
38.00 35.88 40.80
Test A: 5 = -1.0; I = 1.1; c = .0; =Test B: 5°a 1.0; a = = .2; 0 =
7 25.80ade .
.0
.1
4r/ /de-,
.0dr
T 2e.ee / /
A df.7 b
15.00 --wt
-1.01.0
.19.00
5.00
0.00
--- C=1211p=c732lig:II
4 -,001.210M!---..wammomng
W.16111111 114 I I II 1-I WI 11 16 1
0.09 5:818 10.00 sae 2e.ee mee 30.88 35.00 48.ee
TEST I
Test A: 5 = -1.0; a = 1.1; cTest B: 5 = 1.0; = .5; c
47
= .0; 0 =-1.01.0