Measurement and Evaluation in Counseling and Development
http://mec.sagepub.com/
Score Reliability: A Retrospective Look Back at 12 Years of Reliability Generalization Studies
Tammi Vacha-Haase and Bruce Thompson
Measurement and Evaluation in Counseling and Development 2011 44: 159
DOI: 10.1177/0748175611409845
The online version of this article can be found at:
http://mec.sagepub.com/content/44/3/159
Published by:
http://www.sagepublications.com
On behalf of:
Institution of Mechanical Engineers
Additional services and information for Measurement and Evaluation in Counseling and Development can be found at:
Email Alerts: http://mec.sagepub.com/cgi/alerts
Subscriptions: http://mec.sagepub.com/subscriptions
Reprints: http://www.sagepub.com/journalsReprints.nav
Permissions: http://www.sagepub.com/journalsPermissions.nav
Citations: http://mec.sagepub.com/content/44/3/159.refs.html
Downloaded from mec.sagepub.com at University of Bucharest on March 6, 2014
>> Version of Record - Jun 2, 2011
Measurement and Evaluation in Counseling and Development
44(3) 159–168
© The Author(s) 2011
Reprints and permission: http://www.sagepub.com/journalsPermissions.nav
DOI: 10.1177/0748175611409845
http://mecd.sagepub.com
Research in Brief
Score Reliability: A Retrospective Look Back at 12 Years of Reliability Generalization Studies
Tammi Vacha-Haase1 and Bruce Thompson2,3
Abstract
The present study was conducted to characterize (a) the features of the thousands of primary reports synthesized in 47 reliability generalization (RG) measurement meta-analysis studies and (b) typical methodological practice within the RG literature to date. With respect to the treatment of score reliability in the literature, in an astounding 54.6% of the 12,994 primary reports authors did not even mention reliability! Furthermore, in 15.7% of the primary reports authors did mention score reliability, but merely inducted previously reported values as if they applied to their data. Clearly, the admonitions of Wilkinson and the APA Task Force (1999) have yet to have their desired impacts with respect to reporting reliability estimates for one's own data.
Keywords
reliability, measurement, psychometrics, reliability generalization, meta-analysis
1Colorado State University, Fort Collins, CO, USA
2Texas A&M University, College Station, TX, USA
3Baylor College of Medicine, Houston, TX, USA
Corresponding Author:
Bruce Thompson, Dept. of Educ. Psyc., 4225 TAMU, College Station, TX 77843, USA
Email: [email protected]
All the statistical analyses (e.g., t tests,
ANOVA, ANCOVA, Pearson r, regression, as
well as T², MANOVA, MANCOVA, descriptive
discriminant analysis, canonical correlation
analysis) within the general linear model
(GLM; see Cohen, 1968; Knapp, 1978) are
correlational in that the implicit building block
for these analyses is the computation of the
intervariable correlation or covariance matrix.
Indeed, secondary analyses of previously
published results are easily performed given
access to these matrices, even if the raw data
are unavailable (Zientek & Thompson, 2009).
However, poor score reliability will compromise
estimates of both statistical significance
(i.e., pCALCULATED values) and effect size within
classical GLM analyses, because score reliabilities
are not considered by the analyses.
Instead, classical GLM analyses assume perfect
or at least very good score reliabilities.
Score reliability characterizes the degree to
which scores measure something as opposed
to nothing (e.g., are completely random).
Random variations in data, including the random
variations associated with measurement
error, attenuate the relationships among
measured variables. Such attenuation occurs
because correlation coefficients are sensitive
to systematic covariances among measured
variables replicated over study participants and
not to random fluctuations.
The fact that poor score reliability compromises
the foundation of commonly applied
statistical analyses suggests the obvious conclusion
that evaluation of the score reliabilities
for the scores in hand ought to be the
obligatory first step in any quantitative study,
prior to conducting any substantive analyses.
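The attenuation of relationships by measurement error, noted above, can be made concrete with the classical test theory relation in which the expected observed correlation equals the true-score correlation multiplied by the square root of the product of the two score reliabilities. The following is a minimal sketch under that assumption; the function name and the input values are hypothetical and chosen only for illustration.

```python
import math


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation under classical test theory.

    true_r : correlation between true scores (hypothetical input)
    rel_x  : score reliability for variable X, between 0 and 1
    rel_y  : score reliability for variable Y, between 0 and 1
    """
    for rel in (rel_x, rel_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliability must lie in [0, 1]")
    # Random error in either variable shrinks the observed relationship.
    return true_r * math.sqrt(rel_x * rel_y)


# Illustrative values only: one set of scores with good reliability,
# one with poor reliability.
observed = attenuated_r(true_r=0.50, rel_x=0.80, rel_y=0.45)
print(round(observed, 2))
```

With these illustrative inputs, a true-score correlation of .50 is expected to appear as roughly .30 in the observed scores, which is shrinkage that classical GLM analyses do not take into account.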
In the words of the American Psychological
Association (APA) Task Force on Statistical
Inference,
It is important to remember that a test is
not reliable or unreliable. Reliability is
a property of the scores on a test for a
particular population of examinees. . . .
Thus, authors should provide reliability
coefficients of the scores for the data
being analyzed even when the focus
of their research is not psychometric.
(Wilkinson & APA Task Force, 1999,
p. 596)
Given the importance of score reliability in all
quantitative analyses, and the fluctuations in
reliabilities across test administrations, ways
to explore systematically the variabilities in
reliabilities should be of special interest to
researchers.
Reliability Generalization (RG) Meta-Analysis
Twelve years ago, in a seminal article, Vacha-
Haase (1998) proposed RG as an extension of
another measurement meta-analytic method,
Validity Generalization, which was developed
by Schmidt and Hunter (1977) and Hunter
and Schmidt (1990). Vacha-Haase (1998)
described RG as a method to characterize
empirically: (a) the typical reliability of
scores for a given test across studies, (b) the
amount of variability in reliability coefficients
for given measures, and (c) the sources of
variability in reliability coefficients across
studies (p. 6).
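The first two aims listed above, typical reliability and its variability, amount to descriptive statistics over the coefficients coded from primary reports. The following is a minimal sketch, assuming a hypothetical list of coefficient alphas and a simple unweighted summary; actual RG studies may weight or transform the coefficients before summarizing them, and the third aim (sources of variability) additionally requires modeling with predictor variables.

```python
from statistics import mean, stdev


def summarize_reliabilities(coefficients):
    """Describe typical reliability and its variability across studies."""
    if len(coefficients) < 2:
        raise ValueError("need at least two coefficients to estimate variability")
    return {
        "k": len(coefficients),        # number of coefficients synthesized
        "mean": mean(coefficients),    # (a) typical reliability
        "sd": stdev(coefficients),     # (b) variability (sample SD)
        "min": min(coefficients),
        "max": max(coefficients),
    }


# Hypothetical coefficient alphas coded from six primary reports.
alphas = [0.72, 0.85, 0.80, 0.91, 0.67, 0.88]
summary = summarize_reliabilities(alphas)
print(summary)
```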
Reliability generalization is built on the
recognition that it is incorrect to speak of "the
reliability of the test," or to say that "the test
is reliable" (Thompson, 1994). Reliability
inures as a property to scores, and not to tests
(Thompson & Vacha-Haase, 2000). Thus,
reliability coefficients fluctuate across test
administrations, and these fluctuations are
ripe for meta-analytic investigation.
The dozen years or so since Vacha-Haase's
(1998) conceptualization of RG have
seen both RG-related methodology developments
(e.g., Bonnett, 2010; Rodriguez &
Maeda, 2006) and an increasing number
of RG studies being published. Tutorials on
how to do RG studies have been presented
(Henson & Thompson, 2002). And recogni-
tion of RG has been international (Dandan &
Houcan, 2004).
To date, several dozen RG meta-analyses
have been reported across an impressive array
of measures. For example, RG studies have
been conducted on literatures for measures
involving state–trait anxiety (Barnes, Harp,
& Jung, 2002), locus of control (Beretvas,
Suizzo, Durham, & Yarnell, 2008), mathe-
matics anxiety (Capraro, Capraro, & Henson,
2001), psychopathology (Campbell, Pulos,
Hogan, & Murry, 2005), learning styles
(Henson & Hwang, 2002), substance abuse
propensities (Miller, Woodson, Howell, &
Shields, 2009), ways of coping (Rexrode,
Petersen, & O'Toole, 2008), and life satisfaction
(Wallace & Wheeler, 2002).
Purposes of the Present Article
The present article reports a secondary analy-
sis of the 47 RG studies presented in journal
articles during the past 12 years. We identified
these 47 RG studies by searching PsycInfo
and ERIC for any use of the term "reliability
generalization" in the title, abstract, or as a
keyword. The source RG reports are designated
with asterisks in our references. We
conducted our study for two broad purposes.
Quality of the social sciences literature. Our
first purpose was to characterize the quality of
the social sciences literature with respect to
score reliability considerations, as reflected in
the primary reports synthesized in the 47 RG
studies. Similar older analyses provide some
historical context for our more contemporary
report.
In an examination of the American Edu-
cational Research Journal (AERJ), Willson
(1980) reported that only 37% of AERJ articles
explicitly provided reliability coefficients for
the data analyzed in the studies, and he concluded
that "reliability . . . is unreported in . . .
[so much published research] is . . . inexcusable
at this late date" (p. 9). Almost 20 years
later, Vacha-Haase, Ness, Nilsson, and Reetz
(1999) reviewed three journals and found that
only 36% of the quantitative articles pro-
vided reliability coefficients for the data being
analyzed.
One reason for poor treatment of psychometric
issues within the social sciences literature
is that "[a]lthough most programs in
sociobehavioral sciences, especially doctoral
programs, require a modicum of exposure to
statistics and research design, few seem to
require the same where measurement is concerned"
(Pedhazur & Schmelkin, 1991, p. 2).
Unfortunately, doctoral curricula in recent
years have allocated less and less space for
psychometric training (Capraro & Thompson,
2008), so practices may not have improved
since the time of Willson's (1980) report.
Our mega meta-analysis (see Vacha-Haase,
Henson, & Caruso, 2002) of the 47 RG stud-
ies provides a contemporary assessment of
the degree to which authors of primary reports
are attending to score reliability issues. The
RG studies each synthesized an average of
342.0 (SD = 494.8) prior studies. Thus, the
present study characterizes a huge array of
original studies in diverse areas of the social
sciences.
This first research focus included consider-
ation of how often primary researchers ignored
reliability, inducted prior reliability for measures
rather than reporting reliability for their
own scores (see Vacha-Haase, Kogan, &
Thompson, 2000), or reported reliability for
the data actually being analyzed in their sub-
stantive studies. We also sought to character-
ize the typical score reliabilities reported in
primary reports summarized in RG studies and
the variability of these reliabilities.
Typical practice within the RG literature. In
addition to characterizing the quality of the
primary reports synthesized in the 47 RG
studies with respect to score reliability, our
second purpose was to characterize typical
methodological practice within the RG literature to
date. For example, we were interested in the
ways that RG researchers identified source
studies, the types of statistical and graphical
analyses reported, the types of predictor variables
used to predict variabilities in score reliabilities,
and which predictors were or were
not generally found to be useful in making
these predictions.
Results
Quality of the Literature With Respect to Score Reliability
Across the 47 studies, on average, literature
searches for instrument uses yielded 814.1
hits (SD = 1195.4). However, many of
these turned out to be theoretical or nonem-
pirical studies or studies in which the target
measure was mentioned but not administered.
On average, each RG study involved 342.0
(SD = 494.9) empirical studies in which the
target measure was administered.
In an astounding 54.6% of the 12,994 pri-
mary reports authors did not even mention
reliability! This is a discouraging finding with
respect to the integrity of such a broad array
of substantive studies, especially because most
of these reports used classical GLM methods.
Although structural equation modeling (SEM)
does estimate measurement error variance as
part of substantive analyses, as noted previ-
ously, classical GLM methods (e.g., ANOVA,
regression, descriptive discriminant analysis)
do not estimate measurement error variances
as part of their substantive analyses (see
Yetkiner & Thompson, in press). Clearly, the
admonitions of Wilkinson and the APA Task
Force (1999) have yet to have their desired
impacts with respect to reporting reliability
estimates for one's own data.
In 15.7% of the 12,994 primary reports,
authors did mention score reliability but
merely inducted previously reported values as
if they applied to their data. When this was
done, in 48.0% of the inductions only the test
manual was referenced as the source of the
induction, whereas in the remaining cases the
manual and/or prior articles were referenced.
RG studies were based on an average of
64.5 (SD = 52.4) primary reports in which
authors reported reliability coefficients for
their own data. In only eight of the RG studies
did the researchers contact primary authors in
an attempt to obtain information missing from
the primary reports. Because some RG studies
involved multiple related measures, multiple
subscale scores from a single measure,
reliability coefficients being reported for
subgroups, or multiple administrations of
measures, a given RG study often involved
multiple reliability coefficients. RG studies
on average involved 240.0 (SD = 755.6) reliability
estimates.
The average of the mean coefficient alpha
values reported across the RG studies was .80
(SD = .09) and ranged from .45 to .95. The
smallest mean alpha reported in the RG stud-
ies was .17, and the largest mean alpha was
.92. However, some of these values were for
subscales on measures rather than for total
scores. And it must be remembered that coef-
ficient alpha and other coefficients, such as
stability reliability coefficients, measure quite
different things and thus tend to vary even for
the same measure (McCrae, Kurtz, Yamagata,
& Terracciano, 2011).
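Because coefficient alpha is the estimate summarized above, a brief sketch of its standard computation may be useful: alpha equals k / (k - 1) multiplied by one minus the ratio of the summed item variances to the total score variance, where k is the number of items. The function name and the small data set below are hypothetical and serve only as an illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items table of scores."""
    k = len(rows[0])  # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent needs a score on every item")
    # Variance of each item across respondents (columns of the table).
    item_variances = [variance(column) for column in zip(*rows)]
    # Variance of the total scores (row sums).
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have no variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, three items.
scores = [
    [3.0, 4.0, 3.0],
    [5.0, 4.0, 5.0],
    [2.0, 3.0, 2.0],
    [4.0, 5.0, 5.0],
    [1.0, 2.0, 2.0],
]
print(round(cronbach_alpha(scores), 3))
```

The total score variance sits in the denominator, so the same items administered to a more homogeneous sample can yield a lower alpha, which is consistent with treating reliability as a property of scores rather than of tests.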
Typical Practice Within the RG Literature
Diverse statistical and graphical methods
were used across the 47 RG meta-analyses.
RG researchers frequently used multiple analyses
to understand and characterize their RG
data. A majority (i.e., 54.2%) of the 47 RG
reports used multiple regression as an analy-
sis, whereas 27.1% of the reports used
ANOVA. Some 6.2% of the RG researchers
used hierarchical linear modeling to honor
the fact that in some studies subscales were
nested within measures or several reliability
coefficients were nested within single primary
reports. Box-and-whisker plots were
used in 35.4% of the 47 RG studies.
RG researchers typically investigate which
features of the primary reports may predict
variabilities in score reliabilities. The RG
studies on average investigated 8.5 (SD = 4.0)
predictor variables in these analyses. The most
commonly used predictor variables included
gender (83.3% of the 47 RG studies), sample
size (68.8%), age in years (54.2%), and ethnicity
(52.1%).
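The regression analyses described above treat each reliability coefficient as the outcome and features of the primary reports as predictors. As a deliberately simplified illustration, the sketch below fits an ordinary least-squares line with a single predictor rather than the multiple regression used in most RG reports. The helper name and the data (score standard deviations paired with coefficient alphas) are hypothetical.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x, plus R squared."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need paired data with at least three observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("predictor and outcome must both vary")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Proportion of variability in the reliabilities that the predictor explains.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical coded data from four primary reports.
score_sds = [4.0, 6.0, 8.0, 10.0]   # predictor: score standard deviation
alphas = [0.62, 0.70, 0.78, 0.90]   # outcome: coefficient alpha
slope, intercept, r_squared = simple_ols(score_sds, alphas)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

A positive slope in such an analysis would indicate that samples with more dispersed scores tended to yield higher reliability estimates.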
The predictors that, when used, tended to
be noteworthy included the number of items
for measures that had forms of different
lengths (31.2%) and the score standard devia-
tion in the primary report (29.2%). Both
these results are psychometrically reasonable.
Scores from measures with more items tend to
be more reliable, especially when more items
result in more dispersed total test scores,
because total test score dispersion drives
score reliability so strongly (Reinhardt, 1996;
Thompson, 2003). Of course, scores from
longer tests do not inherently have higher reliability
if the test is made longer by adding
items of poor quality or that do not increase
total score dispersion. For example, the Bem
Sex Role Inventory is a published test for
which score reliabilities on the short form of the
Femininity scale tend to be higher than those for
the long form counterpart (Bem, 1981).
Two other predictors also tended to be
noteworthy in predicting variabilities in score
reliabilities. Participant age (22.9%) and par-
ticipant gender (22.9%) were among the better
predictors of variabilities in score reliabilities.
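The relation between number of items and score reliability discussed above is often illustrated with the Spearman-Brown prophecy formula, which projects reliability when test length changes by a given factor. The sketch below is illustrative only, with a hypothetical function name and inputs. The formula assumes that added items are parallel to the existing ones, which is exactly the assumption that fails when a test is lengthened with items of poor quality.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when test length is multiplied by length_factor.

    Assumes the added (or removed) items are parallel to the existing items.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical inputs: doubling a test whose scores have reliability .70,
# and halving a test whose scores have reliability .80.
print(round(spearman_brown(0.70, 2.0), 3))
print(round(spearman_brown(0.80, 0.5), 3))
```

Under the parallel-items assumption, doubling the length raises the projected reliability from .70 to about .82, whereas halving a test with reliability .80 lowers it to about .67.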
Discussion
RG studies provide some insight about both
the score reliabilities produced by given measures
across samples and typical reliability
reporting practices within the literature. With
respect to the first outcome, authors of RG
studies must work to avoid certain pitfalls
(see Dimitrov, 2002). For example, RG inves-
tigators should take into account use of
(a) different types of reliability estimates
across studies and (b) different test forms,
especially when forms have different numbers
of items.
A particularly difficult challenge for
RG researchers involves the RG modeling
misspecifications that occur when relevant
characteristics of the study samples are not
coded as independent variables in RG analysis
(Dimitrov, 2002, p. 794). These model mis-
specifications may occur because original
reports often do not provide enough detail
about the measurement and sampling designs
being used.
A related problem is that substantive
researchers who report score reliability coef-
ficients for their own data most often report
only Cronbach's alpha, notwithstanding the
limitations of this estimate and the fact that
the measurement model underlying that esti-
mate may not fit many of the situations in
which the estimate is used (see Dimitrov,
2002). Hogan, Benjamin, and Brezinski's
(2000) empirical study of the literature found
that two thirds of the articles they examined
reported alpha, and they also noted that
"despite their prominence in the psychometric
literature of the past 20 years, we encountered
no reference to generalizability coefficients . . .
or to the test information functions that arise
from item response theory" (p. 528).
These problems limit the potential benefits
of RG studies. Of course, as Thompson and
Vacha-Haase (2000) reminded,
It is important to remember that RG
studies are a meta-analytic character-
ization of what is hoped is a population
of previous reports. We may not like
the ingredients that go into making this
sausage, but the RG chef can only work
with the ingredients provided by the
literature. (p. 184)
Score Reliability Within the
Social Sciences Literature
Our most important finding is that such an
astonishingly large proportion (i.e., a little
more than half) of primary substantive studies
do not even mention score reliability! This is
a discouraging finding with respect to the
integrity of such a broad array of primary
studies, especially because most of these
studies used classical GLM analyses. Clearly,
the admonitions of Wilkinson and the APA
Task Force (1999) have yet to have their
desired impacts.
We believe this disturbing reality is an
artifact of too many applied researchers still
believing that tests qua tests themselves have
the property of reliability. This misconception
may not be conscious, but is all the more per-
nicious when unconscious, because uncon-
scious misperceptions may be less likely to be
reconsidered and corrected. The problem of
sloppy speaking about reliability, in which
tests are described as being reliable,
is not just an issue of sloppy speaking –
the problem is that sometimes we
unconsciously come to think what we
say or what we hear, so that sloppy
speaking does sometimes lead to a
more pernicious outcome, sloppy think-
ing and sloppy practice. (Thompson,
1992, p. 436)
Some textbooks directly confront the misconception
that tests are reliable. For example,
Pedhazur and Schmelkin (1991) noted,
"Statements about the reliability of a measure
are . . . [inherently] inappropriate and potentially
misleading" (p. 82). Similarly, Gronlund
and Linn (1990) emphasized that
reliability refers to the results obtained
with an evaluation instrument and not
to the instrument itself. . . . Thus, it
is more appropriate to speak of the
reliability of the test scores or the
measurement than of the test or
the instrument. (p. 78)
More recently, Urbina (2004) emphasized,
"the fact is that the quality of reliability is one
that, if present, belongs not to test but to test
scores" (p. 119). She perceptively noted that
the distinction between scores versus tests
being reliable is subtle, but noted that
the distinction is fundamental to an
understanding of the implications of the
concept of reliability with regard to the
use of tests and the interpretation of test
scores. If a test is described as reliable,
the implication is that its reliability
has been established permanently, in all
respects for all uses, and with all users.
(p. 120)
Urbina utilizes a piano analogy to illustrate
the fallacy of describing tests as reliable, noting
that saying "the test is reliable" is similar
to stating a piano will always sound the same,
regardless of the type of music played, the
person who is playing it, the type of the piano,
or the surrounding acoustical environment.
Our second major finding is that score reli-
ability on average in the applied research lit-
erature appears to be reasonably sufficient to
support inquiry using classical GLM statis-
tics, given a mean coefficient alpha of .80
(SD = .09) and a range from .45 to .95.
These results suggest a glass-half-full-
half-empty conclusion about the quality of
our literature with respect to score reliability.
Clearly, some substantive studies are being
conducted with scores of questionable reli-
ability. Furthermore, we must wonder what
were the reliabilities of those scores in those
studies in which reliabilities were not reported
for data in hand, or reliability was not even
mentioned!
Typical Practice Within
the RG Literature
Each of the 47 RG studies we investigated
involved a gargantuan investment of researcher
time and effort, as does any meta-analysis,
whether the meta-analysis is substantive or
psychometric in focus. RG researchers are
employing a wide array of statistical analyses,
and one out of three used box-and-whisker
plots to communicate their results, which
is consistent with the recommendation of
Wilkinson and APA Task Force (1999) to use
graphics to communicate multiple features of
data (e.g., central tendency, dispersion, shape,
outliers) in pictures. We also found that RG
researchers are using a wide array of predictor
variables to help understand what design
features may cause reliabilities to fluctuate
across test administrations.
Over time, we expect the quality of RG
studies to improve further once more and
more primary reports include estimates of
score reliabilities as more researchers realize
that tests are not reliable. Indeed, the most
important impact of the creation of RG and
the reporting of RG findings is that these
reports in themselves directly confront chronic
misconceptions that tests are reliable. RG
studies in and of themselves communicate the
important understanding that score reliabili-
ties vary across administrations and are not
secreted into test booklets during the test
printing process.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of
interest with respect to the research, authorship,
and/or publication of this article.
Funding
The author(s) received no financial support for the
research, authorship, and/or publication of this
article.
References
Note: The 47 RG studies included in this study are
marked with asterisks.
*Bachner, Y. G., & O'Rourke, N. (2007). Reliability generalization of responses by care providers to the Zarit Burden Interview. Aging & Mental Health, 11, 678–685. doi:10.1080/13607860701529965
*Barnes, L. L. B., Harp, D., & Jung, W. S. (2002). Reliability generalization of scores on the Spielberger State–Trait Anxiety Inventory. Educational and Psychological Measurement, 62, 603–618. doi:10.1177/0013164402062004005
Bem, S. L. (1981). Bem Sex-Role Inventory: Professional manual. Palo Alto, CA: Consulting Psychologists Press.
*Beretvas, S. N., Meyers, J. L., & Leite, W. L. (2002). A reliability generalization study of the Marlowe–Crowne Social Desirability Scale. Educational and Psychological Measurement, 62, 570–589. doi:10.1177/0013164402062004003
*Beretvas, S. N., Suizzo, M.-A., Durham, J. A., & Yarnell, L. M. (2008). A reliability generalization study of scores on Rotter's and Nowicki–Strickland's locus of control scales. Educational and Psychological Measurement, 68, 97–119. doi:10.1177/0013164407301529
Bonnett, D. G. (2010). Varying coefficient alpha meta-analytic methods for alpha reliability. Psychological Methods, 15, 368–385. doi:10.1037/a0020142
*Campbell, J. S., Pulos, S., Hogan, M., & Murry, F. (2005). Reliability generalization of the Psychopathy Checklist applied in youthful samples. Educational and Psychological Measurement, 65, 639–656. doi:10.1177/0013164405275666
*Capraro, R. M., & Capraro, M. M. (2002). Myers–Briggs Type Indicator score reliability across studies: A meta-analytic reliability generalization study. Educational and Psychological Measurement, 62, 590–602. doi:10.1177/0013164402062004004
*Capraro, M. M., Capraro, R. M., & Henson, R. K. (2001). Measurement error of scores on the Mathematics Anxiety Rating Scale across studies. Educational and Psychological Measurement, 61, 373–386. doi:10.1177/00131640121971266
Capraro, R. M., & Thompson, B. (2008). The educational researcher defined: What will future researchers be trained to do? Journal of Educational Research, 101, 247–253. doi:10.3200/JOER.101.4.247-253
*Caruso, J. C. (2000). Reliability generalization of the NEO Personality Scales. Educational and Psychological Measurement, 60, 236–254. doi:10.1177/00131640021970484
*Caruso, J. C., & Edwards, S. (2001). Reliability generalization of the Junior Eysenck Personality Questionnaire. Personality and Individual Differences, 31, 173–184. doi:10.1016/S091-8869(00)00126-4
*Caruso, J. C., Witkiewitz, K., Belcourt-Dittloff, A., & Gottlieb, J. D. (2001). Reliability of scores from the Eysenck Personality Questionnaire: A reliability generalization study. Educational and Psychological Measurement, 61, 675–689. doi:10.1177/00131640121971437
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426–433. doi:10.1037/h0026714
Dandan, G., & Houcan, Z. (2004). A redefinition of reliability and the study of reliability generalization. Psychological Science (China), 27, 445–448.
*Deditius-Island, H. K., & Caruso, J. C. (2002). An examination of the reliability of scores from Zuckerman's Sensation Seeking Scales, Form V. Educational and Psychological Measurement, 62, 728–734. doi:10.1177/0013164402062004012
Dimitrov, D. M. (2002). Reliability: Arguments for multiple perspectives and potential problems with generalizability across studies. Educational and Psychological Measurement, 62, 783–801. doi:10.1177/001316402236878
*Dunn, T. W., Smith, T. B., & Montoya, J. A. (2006). Multicultural competency instrumentation: A review and analysis of reliability generalization. Journal of Counseling & Development, 84, 471–482.
*Graham, J. M., & Christiansen, K. (2009). The reliability of romantic love: A reliability generalization meta-analysis. Personal Relationships, 16, 49–66. doi:10.1111/j.1475-6811.2009.01209.x
*Graham, J. M., Liu, Y. J., & Jeziorski, J. L. (2006). The Dyadic Adjustment Scale: A reliability generalization meta-analysis. Journal of Marriage and Family, 68, 701–717. doi:10.1111/j.1741-3737.2006.00284.x
Gronlund, N. E., & Linn, R. L. (1990). Measurement and evaluation in teaching (6th ed.). New York, NY: Macmillan.
*Hanson, W. E., Curry, K. T., & Bandalos, D. L. (2002). Reliability generalization of Working Alliance Inventory scale scores. Educational and Psychological Measurement, 62, 659–673. doi:10.1177/0013164402062004008
*Hellman, C. M., Fuqua, D. R., & Worley, J. (2006). A reliability generalization study on the Survey of Perceived Organizational Support: The effects of mean age and number of items on score reliability. Educational and Psychological Measurement, 66, 631–642. doi:10.1177/0013164406288158
*Hellman, C. M., Muilenburg-Trevino, E. M., & Worley, J. A. (2008). The belief in a just world: An examination of reliability estimates across three measures. Journal of Personality Assessment, 90, 399–401. doi:10.1080/00223890802108238
*Henson, R. K., & Hwang, D.-Y. (2002). Vari-
ability and prediction of measurement error in
a Kolbs Learning Style Inventory Scores: A
reliability generalization study. Educational
and Psychological Measurement, 62, 712727.doi:10.1177/ 0013164402062004011
*Henson, R. K., Kogan, L. R., & Vacha-Haase, T.
(2001). A reliability generalization study of the
Teacher Efficacy Scale and related instruments.
Educational and Psychological Measurement,
61, 404–420. doi:10.1177/00131640121971284
Henson, R. K., & Thompson, B. (2002).
Characterizing measurement error in scores across
studies: Some recommendations for conducting
reliability generalization (RG) studies.
Measurement and Evaluation in Counseling
and Development, 35, 113–127.
Hogan, T. P., Benjamin, A., & Brezinski, K. L.
(2000). Reliability methods: A note on the
frequency of use of various types. Educational
and Psychological Measurement, 60,
523–531.
Hunter, J. E., & Schmidt, F. L. (1990). Methods
of meta-analysis: Correcting error and bias in
research findings. Newbury Park, CA: Sage.
*Huynh, Q.-L., Howell, R. T., & Benet-Martinez, V.
(2009). Reliability of bidimensional acculturation
scores: A meta-analysis. Journal of
Cross-Cultural Psychology, 40, 256–274.
doi:10.1177/0022022108328919
*Kieffer, K. M., Cronin, C., & Fister, M. C. (2004).
Exploring variability and sources of measurement
error in Alcohol Expectancy Questionnaire
reliability coefficients: A meta-analytic
reliability generalization study. Journal of
Studies on Alcohol, 65, 663–671.
*Kieffer, K. M., & Reese, R. J. (2002). A reliability
generalization study of the Geriatric Depression
Scale. Educational and Psychological
Measurement, 62, 969–994.
doi:10.1177/0013164402238085
Knapp, T. R. (1978). Canonical correlation analysis:
A general parametric significance testing system.
Psychological Bulletin, 85, 410–416.
doi:10.1037//0033-2909.85.2.410
*Lane, G. G., White, A. E., & Henson, R. K. (2002).
Expanding reliability generalization methods
with KR-21 estimates: An RG study of the
Coopersmith Self-Esteem Inventory. Educational
and Psychological Measurement, 62,
685–711. doi:10.1177/0013164402062004010
*Leach, L. F., Henson, R. K., Odom, L. R., &
Cagle, L. S. (2006). A reliability generalization
study of the Self-Description Questionnaire.
Educational and Psychological Measurement,
66, 285–304. doi:10.1177/0013164405284030
*Li, A., & Bagger, J. (2007). The Balanced Inventory
of Desirable Responding (BIDR): A reliability
generalization study. Educational and
Psychological Measurement, 67, 525–544.
doi:10.1177/001316440292087
*Lopez-Pina, J. A., Sanchez-Meca, J., &
Rosa-Alcazar, A. I. (2009). The Hamilton Rating
Scale for Depression: A meta-analytic reliability
generalization study. International
Journal of Clinical and Health Psychology, 9,
143–159.
McCrae, R. R., Kurtz, J. E., Yamagata, S., &
Terracciano, A. (2011). Internal consistency,
retest reliability, and their implications
for personality scale validity. Personality
and Social Psychology Review, 15, 28–50.
doi:10.1177/1088868310366253
*Miller, B. K., & Byrne, Z. S. (2009). Perceptions
of organizational politics: A demonstration of
the reliability generalization technique. Journal
of Managerial Issues, 21, 280–300.
*Miller, C. S., Shields, A. L., Campfield, D.,
Wallace, K. A., & Weiss, R. D. (2007).
Substance use scales of the Minnesota Multiphasic
Personality Inventory: An exploration of score
reliability via meta-analysis. Educational and
Psychological Measurement, 67, 1052–1065.
doi:10.1177/0013164406299130
*Miller, C. S., Woodson, J., Howell, R. T., &
Shields, A. L. (2009). SASSI: Assessing the
reliability of scores produced by the Substance
Abuse Subtle Screening Inventory. Substance
Use & Misuse, 44, 1090–1100.
*Mji, A., & Alkhateeb, H. M. (2005). Combining
reliability coefficients: Toward reliability
generalization of the Conceptions of Mathematics
Questionnaire. Psychological Reports, 96,
627–634. doi:10.2466/pr0.96.3.627-634
*Nilsson, J. E., Schmidt, C. K., & Meek, W. D.
(2002). Reliability generalization: An examination
of the Career Decision-Making
Self-Efficacy Scale. Educational and Psychological
Measurement, 62, 647–658.
doi:10.1177/0013164402062004007
*O'Rourke, N. (2004). Reliability generalization
of responses by care providers to the Center
for Epidemiologic Studies–Depression Scale.
Educational and Psychological Measurement,
64, 973–990. doi:10.1177/0013164404268668
Pedhazur, E. J., & Schmelkin, L. P. (1991).
Measurement, design, and analysis: An integrated
approach. Hillsdale, NJ: Erlbaum.
*Reese, R. J., Kieffer, K. M., & Briggs, B. K.
(2002). A reliability generalization study of
select measures of adult attachment style.
Educational and Psychological Measurement, 62,
619–646. doi:10.1177/0013164402062004006
Reinhardt, B. (1996). Factors affecting coefficient
alpha: A mini Monte Carlo study. In
B. Thompson (Ed.), Advances in social science
methodology (Vol. 4, pp. 3–20). Greenwich,
CT: JAI Press.
*Rexrode, K. R., Petersen, S., & O'Toole, S.
(2008). The Ways of Coping Scale: A reliability
generalization study. Educational and
Psychological Measurement, 68, 262–280.
doi:10.1177/0013164407310128
Rodriguez, M. C., & Maeda, Y. (2006). Meta-analysis
of coefficient alpha. Psychological Methods,
11, 306–322. doi:10.1037/1082-989X.11.3.306
*Ross, M. E., Blackburn, M., & Forbes, S. (2005).
Reliability generalization of the Patterns of
Adaptive Learning Survey Goal Orientation Scales.
Educational and Psychological Measurement,
65, 451–464. doi:10.1177/0013164404272496
*Rouse, S. V. (2007). Using reliability generalization
methods to explore measurement error:
An illustration using the MMPI-2 PSY-5
Scales. Journal of Personality Assessment, 88,
264–275.
*Ryngala, D. J., Shields, A. L., & Caruso, J. C.
(2005). Reliability generalization of the
Revised Children's Manifest Anxiety Scale.
Educational and Psychological Measurement,
65, 259–271. doi:10.1177/0013164404272495
Schmidt, F. L., & Hunter, J. E. (1977). Development
of a general solution to the problem of
validity generalization. Journal of Applied
Psychology, 62, 529–540.
doi:10.1037//0021-9010.62.5.529
*Shields, A. L., & Caruso, J. C. (2003). Reliability
generalization of the Alcohol Use Disorders
Identification Test. Educational and
Psychological Measurement, 63, 404–413.
doi:10.1177/0013164403063003004
*Shields, A. L., & Caruso, J. C. (2004). A reliability
induction and reliability generalization
study of the CAGE Questionnaire. Educational
and Psychological Measurement, 64, 254–270.
doi:10.1177/0013164403261814
Thompson, B. (1992). Two and one-half decades
of leadership in measurement and evaluation.
Journal of Counseling and Development, 70,
434–438.
Thompson, B. (1994). Guidelines for authors.
Educational and Psychological Measurement, 54,
837–847.
Thompson, B. (Ed.). (2003). Score reliability:
Contemporary thinking on reliability issues.
Thousand Oaks, CA: Sage.
*Thompson, B., & Cook, C. (2002). Stability of the
reliability of LibQUAL+™ scores: A reliability
generalization meta-analysis study.
Educational and Psychological Measurement, 62,
735–743. doi:10.1177/0013164402062004013
Thompson, B., & Vacha-Haase, T. (2000).
Psychometrics is datametrics: The test is not reliable.
Educational and Psychological Measurement,
60, 174–195. doi:10.1177/00131640021970448
Urbina, S. (2004). Essentials of psychological
testing. Hoboken, NJ: John Wiley.
Vacha-Haase, T. (1998). Reliability generalization:
Exploring variance in measurement error
affecting score reliability across studies.
Educational and Psychological Measurement, 58,
6–20. doi:10.1177/00131640121971059
Vacha-Haase, T., Henson, R. K., & Caruso, J. C.
(2002). Reliability generalization: Moving
toward improved understanding and use of
score reliability. Educational and Psychological
Measurement, 62, 562–569.
doi:10.1177/0013164402062004002
*Vacha-Haase, T., Kogan, L. R., Tani, C. R.,
& Woodall, R. A. (2001). Reliability generalization:
Exploring variation of reliability
coefficients of MMPI clinical scales scores.
Educational and Psychological Measurement,
61, 45–59. doi:10.1177/00131640121971059
Vacha-Haase, T., Kogan, L. R., & Thompson, B.
(2000). Sample compositions and variabilities
in published studies versus those in test
manuals: Validity of score reliability inductions.
Educational and Psychological Measurement,
60, 509–522. doi:10.1177/00131640021970682
Vacha-Haase, T., Ness, C. M., Nilsson, J., &
Reetz, D. (1999). Practices regarding reporting
of reliability coefficients: A review of three
journals. Journal of Experimental Education, 67,
335–341. doi:10.1080/00220979909598487
*Vacha-Haase, T., Tani, C. R., Kogan, L. R.,
Woodall, R. A., & Thompson, B. (2001).
Reliability generalization: Exploring reliability
variations on MMPI/MMPI-2 validity scale
scores. Assessment, 8, 391–401.
doi:10.1177/107319110100800404
*Vassar, M., & Crosby, J. W. (2008). A reliability
generalization study of coefficient alpha for
the UCLA Loneliness Scale. Journal of
Personality Assessment, 90, 601–607.
doi:10.1080/00223890802388624
*Victorson, D., Barocas, J., Song, J., & Cella, D.
(2008). Reliability across studies from the
Functional Assessment of Cancer Therapy–General
(FACT-G) and its subscales: A reliability
generalization. Quality of Life Research: An
International Journal of Quality of Life Aspects of
Treatment, Care & Rehabilitation, 17, 1137–1146.
doi:10.1007/s11136-008-9398-2
*Wallace, K. A., & Wheeler, A. J. (2002).
Reliability generalization of the Life Satisfaction
Index. Educational and Psychological
Measurement, 62, 674–684.
doi:10.1177/0013164402062004009
Wilkinson, L., & American Psychological Association
(APA) Task Force on Statistical Inference.
(1999). Statistical methods in psychology
journals: Guidelines and explanations. American
Psychologist, 54, 594–604.
doi:10.1037//0003-066X.54.8.594
Willson, V. L. (1980). Research techniques in
AERJ articles: 1969 to 1978. Educational
Researcher, 9(6), 5–10. doi:10.2307/1175221
Yetkiner, Z. E., & Thompson, B. (in press).
Demonstration of how score reliability is integrated
into SEM and how reliability affects all
statistical analyses. Multiple Linear Regression
Viewpoints, 36(2).
*Yin, P., & Fan, X. (2000). Assessing the reliability
of Beck Depression Inventory scores:
Reliability generalization across studies.
Educational and Psychological Measurement, 60,
201–223. doi:10.1177/00131640021970466
*Youngstrom, E. A., & Green, K. W. (2003).
Reliability generalization of self-report of emotions
when using the Differential Emotions Scale.
Educational and Psychological Measurement,
63, 279–295. doi:10.1177/0013164403253226
*Zangaro, G. A., & Soeken, K. L. (2005).
Meta-analysis of the reliability and validity of Part B of
the Index of Work Satisfaction across studies.
Journal of Nursing Measurement, 13, 7–22.
doi:10.1891/jnum.2005.13.1.7
Zientek, L. R., & Thompson, B. (2009). Matrix
summaries improve research reports: Secondary analyses
using published literature. Educational Researcher,
38, 343–352. doi:10.3102/0013189X09339056
Bios
Tammi Vacha-Haase is a professor of psychology
at Colorado State University.
Bruce Thompson is a distinguished professor of
educational psychology, and of library science, at
Texas A&M University, and adjunct professor of
allied health sciences, Baylor College of Medicine
(Houston).