eysenck personality test

Upload: zmada-anghel

Post on 03-Jun-2018


  • 8/12/2019 Eysenck Personality test

    1/12

    Measurement and Evaluation in Counseling and Development

    http://mec.sagepub.com/

    Score Reliability: A Retrospective Look Back at 12 Years of Reliability Generalization Studies

    Tammi Vacha-Haase and Bruce Thompson

    Measurement and Evaluation in Counseling and Development 2011 44: 159

    DOI: 10.1177/0748175611409845

    The online version of this article can be found at: http://mec.sagepub.com/content/44/3/159

    Published by:

    http://www.sagepublications.com

    On behalf of:

    Institution of Mechanical Engineers

    Additional services and information for Measurement and Evaluation in Counseling and Development can be found at:

    Email Alerts: http://mec.sagepub.com/cgi/alerts

    Subscriptions: http://mec.sagepub.com/subscriptions

    Reprints: http://www.sagepub.com/journalsReprints.nav

    Permissions: http://www.sagepub.com/journalsPermissions.nav

    Citations: http://mec.sagepub.com/content/44/3/159.refs.html

    Downloaded from mec.sagepub.com at University of Bucharest on March 6, 2014


    Version of Record - Jun 2, 2011


    Measurement and Evaluation in Counseling and Development

    44(3) 159-168 © The Author(s) 2011

    Reprints and permission: http://www.sagepub.com/journalsPermissions.nav

    DOI: 10.1177/0748175611409845

    http://mecd.sagepub.com

    Research in Brief

    All the statistical analyses (e.g., t tests,

    ANOVA, ANCOVA, Pearson r, regression, as

    well as T2, MANOVA, MANCOVA, descrip-

    tive discriminant analysis, canonical correla-

    tion analysis) within the general linear model

    (GLM; see Cohen, 1968; Knapp, 1978) are

    correlational in that the implicit building block

    for these analyses is the computation of the

    intervariable correlation or covariance matrix.

    Indeed, secondary analyses of previously

    published results are easily performed given

    access to these matrices, even if the raw data

    are unavailable (Zientek & Thompson, 2009).

    However, poor score reliability will compro-

    mise estimates of both statistical significance

    (i.e., p_CALCULATED values) and effect size within

    classical GLM analyses, because score reli-

    abilities are not considered by the analyses.

    Instead, classical GLM analyses assume per-

    fect or at least very good score reliabilities.

    Score reliability characterizes the degree to

    which scores measure something as opposed

    to nothing (e.g., are completely random).

    Random variations in data, including the ran-

    dom variations associated with measure-

    ment error, attenuate the relationships among

    measured variables. Such attenuation occurs

    because correlation coefficients are sensitive

    to systematic covariances among measured

    variables replicated over study participants and

    not random fluctuations.

    The fact that poor score reliability compromises

    the foundation of commonly applied

    statistical analyses suggests the obvious con-

    clusion that evaluation of the score reliabili-

    ties for the scores in hand ought to be the


    1 Colorado State University, Fort Collins, CO, USA

    2 Texas A&M University, College Station, TX, USA

    3 Baylor College of Medicine, Houston, TX, USA

    Corresponding Author: Bruce Thompson, Dept. of Educ. Psyc., 4225 TAMU,

    College Station, TX 77843, USA

    Email: [email protected]

    Score Reliability: A

    Retrospective Look Back

    at 12 Years of Reliability

    Generalization Studies

    Tammi Vacha-Haase1 and Bruce Thompson2,3

    Abstract

    The present study was conducted to characterize (a) the features of the thousands of primary reports synthesized in 47 reliability generalization (RG) measurement meta-analysis studies and (b) typical methodological practice within the RG literature to date. With respect to the treatment of score reliability in the literature, in an astounding 54.6% of the 12,994 primary reports, authors did not even mention reliability! Furthermore, in 15.7% of the primary reports, authors did mention score reliability, but merely inducted previously reported values as if they applied to their data. Clearly, the admonitions of Wilkinson and the APA Task Force (1999) have yet to have their desired impacts with respect to reporting reliability estimates for one's own data.

    Keywords

    reliability, measurement, psychometrics, reliability generalization, meta-analysis


    obligatory first step in any quantitative study,

    prior to conducting any substantive analyses.
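    The attenuation described above can be sketched numerically. The sketch below assumes the classical test theory relation in which the expected observed correlation equals the true-score correlation multiplied by the square root of the product of the two score reliabilities. That relation is standard in psychometrics but is not stated in this article, and the function name and input values are illustrative assumptions.

```python
from math import sqrt


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation under classical test theory.

    true_r: correlation between true scores (hypothetical value)
    rel_x, rel_y: score reliabilities of the two measured variables
    """
    return true_r * sqrt(rel_x * rel_y)


# Hypothetical case: a true-score correlation of .50 measured with
# score reliabilities of .80 and .45.
observed = attenuated_r(0.50, 0.80, 0.45)

# With perfectly reliable scores there is no attenuation.
unattenuated = attenuated_r(0.50, 1.0, 1.0)
```

    Under these assumptions the expected observed correlation falls well below the true-score value, which is why effect sizes computed from unreliable scores tend to understate the underlying relationship.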

    In the words of the American Psychological

    Association (APA) Task Force on Statistical

    Inference,

    It is important to remember that a test is

    not reliable or unreliable. Reliability is

    a property of the scores on a test for a

    particular population of examinees. . . .

    Thus, authors should provide reliability

    coefficients of the scores for the data

    being analyzed even when the focus

    of their research is not psychometric.

    (Wilkinson & APA Task Force, 1999,

    p. 596)
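    As an illustration of what reporting reliability for the data in hand can involve, the following is a minimal sketch of coefficient alpha computed from a respondents-by-items score table. The function name and the response data are hypothetical, and the sketch assumes complete data with items scored in the same direction.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the total (summed) score across respondents.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items.
scores = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(scores)
```

    Because the estimate depends on the item and total score variances of the particular sample, the same items administered to a different group can yield a different value.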

    Given the importance of score reliability in all

    quantitative analyses, and the fluctuations in

    reliabilities across test administrations, ways

    to explore systematically the variabilities in

    reliabilities should be of special interest to

    researchers.

    Reliability Generalization (RG) Meta-Analysis

    Twelve years ago, in a seminal article, Vacha-

    Haase (1998) proposed RG as an extension of

    another measurement meta-analytic method,

    Validity Generalization, which was developed

    by Schmidt and Hunter (1977) and Hunter

    and Schmidt (1990). Vacha-Haase (1998)

    described RG as a method to characterize

    empirically: (a) the typical reliability of

    scores for a given test across studies, (b) the

    amount of variability in reliability coefficients

    for given measures, and (c) the sources of

    variability in reliability coefficients across

    studies (p. 6).
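    The three aims just listed can be sketched on a small coded data set. The values below are hypothetical, and the unweighted mean, standard deviation, and ordinary least squares slope are simplifying assumptions; published RG studies may weight or transform coefficients differently.

```python
from statistics import mean, stdev

# Hypothetical coded RG data: one (alpha, number_of_items) pair per primary report.
reports = [(0.72, 10), (0.78, 15), (0.81, 20), (0.85, 25), (0.88, 30)]

alphas = [a for a, _ in reports]
items = [k for _, k in reports]

typical = mean(alphas)   # (a) typical reliability across studies
spread = stdev(alphas)   # (b) variability in reliability coefficients

# (c) one candidate source of variability: the ordinary least squares
# slope of alpha on test length.
mx, my = mean(items), mean(alphas)
slope = (
    sum((x - mx) * (y - my) for x, y in zip(items, alphas))
    / sum((x - mx) ** 2 for x in items)
)
```

    A positive slope in such an analysis would suggest that, in the coded literature, longer forms tend to yield more reliable scores.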

    Reliability generalization is built on the

    recognition that it is incorrect to speak of "the

    reliability of the test," or to say that "the test

    is reliable" (Thompson, 1994). Reliability

    inures as a property to scores, and not to tests

    (Thompson & Vacha-Haase, 2000). Thus,

    reliability coefficients fluctuate across test

    administrations, and these fluctuations are

    ripe for meta-analytic investigation.

    The dozen years or so since Vacha-Haase's

    (1998) conceptualization of RG have

    seen both RG-related methodology developments

    (e.g., Bonnett, 2010; Rodriguez &

    Maeda, 2006) and an increasing number

    of RG studies being published. Tutorials on

    how to do RG studies have been presented

    (Henson & Thompson, 2002). And recogni-

    tion of RG has been international (Dandan &

    Houcan, 2004).

    To date, several dozen RG meta-analyses

    have been reported across an impressive array

    of measures. For example, RG studies have

    been conducted on literatures for measures

    involving state-trait anxiety (Barnes, Harp,

    & Jung, 2002), locus of control (Beretvas,

    Suizzo, Durham, & Yarnell, 2008), mathe-

    matics anxiety (Capraro, Capraro, & Henson,

    2001), psychopathology (Campbell, Pulos,

    Hogan, & Murry, 2005), learning styles

    (Henson & Hwang, 2002), substance abuse

    propensities (Miller, Woodson, Howell, &

    Shields, 2009), ways of coping (Rexrode,

    Petersen, & O'Toole, 2008), and life satisfaction

    (Wallace & Wheeler, 2002).

    Purposes of the Present Article

    The present article reports a secondary analy-

    sis of the 47 RG studies presented in journal

    articles during the past 12 years. We identified

    these 47 RG studies by searching PsycInfo

    and ERIC for any use of the term "reliability

    generalization" in the title, abstract, or as a

    keyword. The source RG reports are designated

    with asterisks in our references. We

    conducted our study for two broad purposes.

    Quality of the social sciences literature. Our

    first purpose was to characterize the quality of

    the social sciences literature with respect to

    score reliability considerations, as reflected in

    the primary reports synthesized in the 47 RG

    studies. Similar older analyses provide some

    historical context for our more contemporary

    report.

    In an examination of the American Edu-

    cational Research Journal (AERJ), Willson

    (1980) reported that only 37% of AERJ articles

    explicitly provided reliability coefficients for


    the data analyzed in the studies, and he concluded

    that "reliability . . . is unreported in . . .

    [so much published research] is . . . inexcusable

    at this late date" (p. 9). Almost 20 years

    later, Vacha-Haase, Ness, Nilsson, and Reetz

    (1999) reviewed three journals and found that

    only 36% of the quantitative articles pro-

    vided reliability coefficients for the data being

    analyzed.

    One reason for poor treatment of psychometric

    issues within the social sciences literature

    is that "[a]lthough most programs in

    sociobehavioral sciences, especially doctoral

    programs, require a modicum of exposure to

    statistics and research design, few seem to

    require the same where measurement is concerned"

    (Pedhazur & Schmelkin, 1991, p. 2).

    Unfortunately, doctoral curricula in recent

    years have allocated less and less space for

    psychometric training (Capraro & Thompson,

    2008), so practices may not have improved

    since the time of Willson's (1980) report.

    Our mega meta-analysis (see Vacha-Haase,

    Henson, & Caruso, 2002) of the 47 RG stud-

    ies provides a contemporary assessment of

    the degree to which authors of primary reports

    are attending to score reliability issues. The

    RG studies each synthesized an average of

    342.0 (SD = 494.8) prior studies. Thus, the

    present study characterizes a huge array of

    original studies in diverse areas of the social

    sciences.

    This first research focus included consider-

    ation of how often primary researchers ignored

    reliability, inducted prior reliability for measures

    rather than reporting reliability for their

    own scores (see Vacha-Haase, Kogan, &

    Thompson, 2000), or reported reliability for

    the data actually being analyzed in their sub-

    stantive studies. We also sought to character-

    ize the typical score reliabilities reported in

    primary reports summarized in RG studies and

    the variability of these reliabilities.

    Typical practice within the RG literature. In

    addition to characterizing the quality of the

    primary reports synthesized in the 47 RG

    studies with respect to score reliability, second,

    we also sought to characterize typical meth-

    odological practice within the RG literature to

    date. For example, we were interested in the

    ways that RG researchers identified source

    studies, the types of statistical and graphical

    analyses reported, the types of predictor variables

    used to predict variabilities in score reliabilities,

    and which predictors were or were

    not generally found to be useful in making

    these predictions.

    Results

    Quality of the Literature With Respect to Score Reliability

    Across the 47 studies, on average, literature

    searches for instrument uses yielded 814.1

    hits (SD = 1195.4). However, many of

    these turned out to be theoretical or nonem-

    pirical studies or studies in which the target

    measure was mentioned but not administered.

    On average, each RG study involved 342.0

    (SD = 494.9) empirical studies in which the

    target measure was administered.

    In an astounding 54.6% of the 12,994 primary

    reports, authors did not even mention

    reliability! This is a discouraging finding with

    respect to the integrity of such a broad array

    of substantive studies, especially because most

    of these reports used classical GLM methods.

    Although structural equation modeling (SEM)

    does estimate measurement error variance as

    part of substantive analyses, as noted previ-

    ously, classical GLM methods (e.g., ANOVA,

    regression, descriptive discriminant analysis)

    do not estimate measurement error variances

    as part of their substantive analyses (see

    Yetkiner & Thompson, in press). Clearly, the

    admonitions of Wilkinson and the APA Task

    Force (1999) have yet to have their desired

    impacts with respect to reporting reliability

    estimates for one's own data.

    In 15.7% of the 12,994 primary reports,

    authors did mention score reliability but

    merely inducted previously reported values as

    if they applied to their data. When this was

    done, in 48.0% of the inductions only the test

    manual was referenced as the source of the

    induction, whereas in the remaining cases the

    manual and/or prior articles were referenced.


    RG studies were based on an average of

    64.5 (SD = 52.4) primary reports in which

    authors reported reliability coefficients for

    their own data. In only eight of the RG studies

    did the researchers contact primary authors in

    an attempt to obtain information missing from

    the primary reports. Because some RG studies

    involved multiple related measures, multiple

    subscale scores from a single measure,

    reliability coefficients being reported for

    subgroups, or multiple administrations of

    measures, a given RG study often involved

    multiple reliability coefficients. RG studies

    on average involved 240.0 (SD = 755.6) reliability

    estimates.

    The average of the mean coefficient alpha

    values reported across the RG studies was .80

    (SD = .09) and ranged from .45 to .95. The

    smallest mean alpha reported in the RG stud-

    ies was .17, and the largest mean alpha was

    .92. However, some of these values were for

    subscales on measures rather than for total

    scores. And it must be remembered that coef-

    ficient alpha and other coefficients, such as

    stability reliability coefficients, measure quite

    different things and thus tend to vary even for

    the same measure (McCrae, Kurtz, Yamagata,

    & Terracciano, 2011).

    Typical Practice Within the RG Literature

    Diverse statistical and graphical methods

    were used across the 47 RG meta-analyses.

    RG researchers frequently used multiple analyses

    to understand and characterize their RG

    data. A majority (i.e., 54.2%) of the 47 RG

    reports used multiple regression as an analy-

    sis, whereas 27.1% of the reports used

    ANOVA. Some 6.2% of the RG researchers

    used hierarchical linear modeling to honor

    the fact that in some studies subscales were

    nested within measures or several reliability

    coefficients were nested within single primary

    reports. Box-and-whisker plots were

    used in 35.4% of the 47 RG studies.
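    The quantities that a box-and-whisker plot displays can be sketched as a five-number summary of the reliability coefficients gathered in an RG study. The coefficients below are hypothetical, and the choice of the "inclusive" quartile method is an assumption; plotting software may use a different quartile convention.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of coefficients."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical reliability coefficients from five primary reports.
alphas = [0.70, 0.75, 0.80, 0.85, 0.90]
summary = five_number_summary(alphas)
```

    The box spans the first to third quartile, the line inside it marks the median, and the whiskers extend toward the extremes, so central tendency and dispersion of the coefficients are visible at a glance.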

    RG researchers typically investigate which

    features of the primary reports may predict

    variabilities in score reliabilities. The RG

    studies on average investigated 8.5 (SD = 4.0)

    predictor variables in these analyses. The most

    commonly used predictor variables included

    gender (83.3% of the 47 RG studies), sample

    size (68.8%), age in years (54.2%), and ethnicity (52.1%).

    The predictors that, when used, tended to

    be noteworthy included the number of items

    for measures that had forms of different

    lengths (31.2%) and the score standard devia-

    tion in the primary report (29.2%). Both

    these results are psychometrically reasonable.

    Scores from measures with more items tend to

    be more reliable, especially when more items

    result in more dispersed total test scores,

    because total test score dispersion drives

    score reliability so strongly (Reinhardt, 1996;

    Thompson, 2003). Of course, scores from

    longer tests do not inherently have higher reliability

    if the test is made longer by adding

    items of poor quality or that do not increase

    total score dispersion. For example, the Bem

    Sex Role Inventory is a published test for

    which the reliabilities of short form scores on the

    Femininity scale tend to be higher than those of

    their long form counterparts (Bem, 1981).

    Two other predictors also tended to be

    noteworthy in predicting variabilities in score

    reliabilities. Participant age (22.9%) and par-

    ticipant gender (22.9%) were among the better

    predictors of variabilities in score reliabilities.
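    The relationship between test length and score reliability noted above can be illustrated with the Spearman-Brown prophecy formula. This is a textbook relation rather than an analysis reported in this article, and it assumes the added or removed items are parallel to the existing ones, which is exactly the condition that fails when a test is lengthened with poor-quality items. The function name and input values are illustrative assumptions.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor.

    Assumes the added (or removed) items are parallel to the existing items.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting reliability of .80.
doubled = spearman_brown(0.80, 2)    # test length doubled
halved = spearman_brown(0.80, 0.5)   # test length halved
```

    When the parallel-items assumption does not hold, a shorter form can produce more reliable scores than a longer one, as in the Bem Sex Role Inventory example above.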

    Discussion

    RG studies provide some insight about both

    the score reliabilities produced by given measures

    across samples and typical reliability

    reporting practices within the literature. With

    respect to the first outcome, authors of RG

    studies must work to avoid certain pitfalls

    (see Dimitrov, 2002). For example, RG inves-

    tigators should take into account use of

    (a) different types of reliability estimates

    across studies and (b) different test forms,

    especially when forms have different numbers

    of items.

    A particularly difficult challenge for

    RG researchers involves the RG modeling

    misspecifications that occur when relevant


    characteristics of the study samples are not

    coded as independent variables in RG analysis

    (Dimitrov, 2002, p. 794). These model mis-

    specifications may occur because original

    reports often do not provide enough detail

    about the measurement and sampling designs

    being used.

    A related problem is that substantive

    researchers who report score reliability coef-

    ficients for their own data most often report

    only Cronbach's alpha, notwithstanding the

    limitations of this estimate and the fact that

    the measurement model underlying that esti-

    mate may not fit many of the situations in

    which the estimate is used (see Dimitrov,

    2002). Hogan, Benjamin, and Brezinski's

    (2000) empirical study of the literature found

    that two thirds of the articles they examined

    reported alpha, and they also noted that

    "despite their prominence in the psychometric

    literature of the past 20 years, we encountered

    no reference to generalizability coefficients . . .

    or to the test information functions that arise

    from item response theory" (p. 528).

    These problems limit the potential benefits

    of RG studies. Of course, as Thompson and

    Vacha-Haase (2000) reminded,

    It is important to remember that RG

    studies are a meta-analytic character-

    ization of what is hoped is a population

    of previous reports. We may not like

    the ingredients that go into making this

    sausage, but the RG chef can only work

    with the ingredients provided by the

    literature. (p. 184)

    Score Reliability Within the

    Social Sciences Literature

    Our most important finding is that such an

    astonishingly large proportion (i.e., a little

    more than half) of primary substantive studies

    do not even mention score reliability! This is

    a discouraging finding with respect to the

    integrity of such a broad array of primary

    studies, especially because most of these

    studies used classical GLM analyses. Clearly,

    the admonitions of Wilkinson and the APA

    Task Force (1999) have yet to have their

    desired impacts.

    We believe this disturbing reality is an

    artifact of too many applied researchers still

    believing that tests qua tests themselves have

    the property of reliability. This misconception

    may not be conscious, but is all the more per-

    nicious when unconscious, because uncon-

    scious misperceptions may be less likely to be

    reconsidered and corrected. The problem of

    sloppy speaking about reliability, in which

    tests are described as being reliable,

    is not just an issue of sloppy speaking --

    the problem is that sometimes we

    unconsciously come to think what we

    say or what we hear, so that sloppy

    speaking does sometimes lead to a

    more pernicious outcome, sloppy think-

    ing and sloppy practice. (Thompson,

    1992, p. 436)

    Some textbooks directly confront the misconception

    that tests are reliable. For example,

    Pedhazur and Schmelkin (1991) noted,

    "Statements about the reliability of a measure

    are . . . [inherently] inappropriate and potentially

    misleading" (p. 82). Similarly, Gronlund

    and Linn (1990) emphasized that

    reliability refers to the results obtained

    with an evaluation instrument and not

    to the instrument itself. . . . Thus, it

    is more appropriate to speak of the

    reliability of the test scores or the

    measurement than of the test or

    the instrument. (p. 78)

    More recently, Urbina (2004) emphasized,

    "the fact is that the quality of reliability is one

    that, if present, belongs not to test but to test

    scores" (p. 119). She perceptively noted that

    the distinction between scores versus tests

    being reliable is subtle, but noted that

    the distinction is fundamental to an

    understanding of the implications of the

    concept of reliability with regard to the

    use of tests and the interpretation of test


    scores. If a test is described as reliable,

    the implication is that its reliability

    has been established permanently, in all

    respects for all uses, and with all users.

    (p. 120)

    Urbina utilizes a piano analogy to illustrate

    the fallacy of describing tests as reliable, not-

    ing that saying the test is reliable is similar

    to stating a piano will always sound the same,

    regardless of the type of music played, the

    person who is playing it, the type of the piano,

    or the surrounding acoustical environment.

    Our second major finding is that score reli-

    ability on average in the applied research lit-

    erature appears to be reasonably sufficient to

    support inquiry using classical GLM statis-

    tics, given a mean coefficient alpha of .80

    (SD = .09) and a range from .45 to .95.

    These results suggest a glass-half-full-

    half-empty conclusion about the quality of

    our literature with respect to score reliability.

    Clearly, some substantive studies are being

    conducted with scores of questionable reli-

    ability. Furthermore, we must wonder what

    were the reliabilities of those scores in those

    studies in which reliabilities were not reported

    for data in hand, or reliability was not even

    mentioned!

    Typical Practice Within

    the RG Literature

    Each of the 47 RG studies we investigated

    involved a gargantuan investment of researcher

    time and effort, as does any meta-analysis,

    whether the meta-analysis is substantive or

    psychometric in focus. RG researchers are

    employing a wide array of statistical analyses,

    and one out of three used box-and-whisker

    plots to communicate their results, which

    is consistent with the recommendation of

    Wilkinson and APA Task Force (1999) to use

    graphics to communicate multiple features of

    data (e.g., central tendency, dispersion, shape,

    outliers) in pictures. We also found that RG

    researchers are using a wide array of predictor

    variables to help understand what design

    features may cause reliabilities to fluctuate

    across test administrations.

    Over time, we expect the quality of RG

    studies to improve further once more and

    more primary reports include estimates of

    score reliabilities as more researchers realize

    that tests are not reliable. Indeed, the most

    important impact of the creation of RG and

    the reporting of RG findings is that these

    reports in themselves directly confront chronic

    misconceptions that tests are reliable. RG

    studies in and of themselves communicate the

    important understanding that score reliabili-

    ties vary across administrations and are not

    secreted into test booklets during the test

    printing process.

    Declaration of Conflicting Interests

    The author(s) declared no potential conflicts of

    interest with respect to the research, authorship,

    and/or publication of this article.

    Funding

    The author(s) received no financial support for the

    research, authorship, and/or publication of this

    article.

    References

    Note: The 47 RG studies included in this study are

    marked with asterisks.

    *Bachner, Y. G., & O'Rourke, N. (2007). Reliability generalization of responses by care providers to the Zarit Burden Interview. Aging & Mental Health, 11, 678-685. doi:10.1080/13607860701529965

    *Barnes, L. L. B., Harp, D., & Jung, W. S. (2002). Reliability generalization of scores on the Spielberger State-Trait Anxiety Inventory. Educational and Psychological Measurement, 62, 603-618. doi:10.1177/0013164402062004005

    Bem, S. L. (1981). Bem Sex-Role Inventory: Professional manual. Palo Alto, CA: Consulting Psychologists Press.

    *Beretvas, S. N., Meyers, J. L., & Leite, W. L. (2002). A reliability generalization study of the Marlowe-Crowne Social Desirability Scale. Educational and Psychological Measurement, 62, 570-589. doi:10.1177/0013164402062004003


    *Beretvas, S. N., Suizzo, M.-A., Durham, J. A.,

    & Yarnell, L. M. (2008). A reliability gen-

    eralization study of scores on Rotters and

    NowickiStricklands locus of control scales.

    Educational and Psychological Measurement,68, 97119. doi:10.1177/0013164407301529

    Bonnett, D. G. (2010). Varying coefficient alpha

    meta-analytic methods for alpha reliability.Psy-

    chological Methods, 15, 368385. doi:10.1037/

    a0020142

    *Campbell, J. S., Pulos, S., Hogan, M., & Murry, F.

    (2005). Reliability generalization of the Psycho-

    pathy Checklist Applied in youthful samples.

    Educational and Psychological Measurement,

    65, 639656. doi:10.1177/0013164405275666

    *Capraro, R. M., & Capraro, M. M. (2002).

    MyersBriggs Type Indicator score reliability

    across studies: A meta-analytic reliability gen-

    eralization study.Educational and Psychologi-

    cal Measurement, 62, 590602. doi:10.1177/

    0013164402062004004

    *Capraro, M. M., Capraro, R. M., & Henson, R. K.

    (2001). Measurement error of scores on the

    Mathematics Anxiety Rating Scale across

    studies. Educational and Psychological Mea-

    surement, 61, 373386. doi:10.1177/00131640121971266

    Capraro, R. M., & Thompson, B. (2008). The edu-

    cational researcher defined: What will future

    researchers be trained to do?Journal of Edu-

    cational Research, 101, 247253. doi:10.3200/

    JOER.101.4.247-253

    *Caruso, J. C. (2000). Reliability generalization

    of the NEO Personality Scales. Educational

    and Psychological Measurement, 60, 236254.

    doi:10.1177/00131640021970484*Caruso, J. C., & Edwards, S. (2001). Reliability gen-

    eralization of the Junior Eysenck Personality Ques-

    tionnaire.Personality and Individual Differences,

    31, 173184. doi:10.1016/S091-8869(00)00126-4

    *Caruso, J. C., Witkiewitz, K., Belcourt-Dittloff, A., & Gottlieb, J. D. (2001). Reliability of scores from the Eysenck Personality Questionnaire: A reliability generalization study. Educational and Psychological Measurement, 61, 675–689. doi:10.1177/00131640121971437

    Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426–433. doi:10.1037/h0026714

    Dandan, G., & Houcan, Z. (2004). A redefinition of reliability and the study of reliability generalization. Psychological Science (China), 27, 445–448.

    *Deditius-Island, H. K., & Caruso, J. C. (2002). An examination of the reliability of scores from Zuckerman's Sensation Seeking Scales, Form V. Educational and Psychological Measurement, 62, 728–734. doi:10.1177/0013164402062004012

    Dimitrov, D. M. (2002). Reliability: Arguments for multiple perspectives and potential problems with generalizability across studies. Educational and Psychological Measurement, 62, 783–801. doi:10.1177/001316402236878

    *Dunn, T. W., Smith, T. B., & Montoya, J. A. (2006). Multicultural competency instrumentation: A review and analysis of reliability generalization. Journal of Counseling & Development, 84, 471–482.

    *Graham, J. M., & Christiansen, K. (2009). The reliability of romantic love: A reliability generalization meta-analysis. Personal Relationships, 16, 49–66. doi:10.1111/j.1475-6811.2009.01209.x

    *Graham, J. M., Liu, Y. J., & Jeziorski, J. L. (2006). The Dyadic Adjustment Scale: A reliability generalization meta-analysis. Journal of Marriage and Family, 68, 701–717. doi:10.1111/j.1741-3737.2006.00284.x

    Gronlund, N. E., & Linn, R. L. (1990). Measurement and evaluation in teaching (6th ed.). New York, NY: Macmillan.

    *Hanson, W. E., Curry, K. T., & Bandalos, D. L. (2002). Reliability generalization of Working Alliance Inventory scale scores. Educational and Psychological Measurement, 62, 659–673. doi:10.1177/0013164402062004008

    *Hellman, C. M., Fuqua, D. R., & Worley, J. (2006). A reliability generalization study on the Survey of Perceived Organizational Support: The effects of mean age and number of items on score reliability. Educational and Psychological Measurement, 66, 631–642. doi:10.1177/0013164406288158

    *Hellman, C. M., Muilenburg-Trevino, E. M., & Worley, J. A. (2008). The belief in a just world: An examination of reliability estimates across three measures. Journal of Personality Assessment, 90, 399–401. doi:10.1080/00223890802108238


    *Henson, R. K., & Hwang, D.-Y. (2002). Variability and prediction of measurement error in Kolb's Learning Style Inventory Scores: A reliability generalization study. Educational and Psychological Measurement, 62, 712–727. doi:10.1177/0013164402062004011

    *Henson, R. K., Kogan, L. R., & Vacha-Haase, T. (2001). A reliability generalization study of the Teacher Efficacy Scale and related instruments. Educational and Psychological Measurement, 61, 404–420. doi:10.1177/00131640121971284

    Henson, R. K., & Thompson, B. (2002). Characterizing measurement error in scores across studies: Some recommendations for conducting reliability generalization (RG) studies. Measurement and Evaluation in Counseling and Development, 35, 113–127.

    Hogan, T. P., Benjamin, A., & Brezinski, K. L. (2000). Reliability methods: A note on the frequency of use of various types. Educational and Psychological Measurement, 60, 523–531.

    Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.

    *Huynh, Q.-L., Howell, R. T., & Benet-Martinez, V. (2009). Reliability of bidimensional acculturation scores: A meta-analysis. Journal of Cross-Cultural Psychology, 40, 256–274. doi:10.1177/0022022108328919

    *Kieffer, K. M., Cronin, C., & Fister, M. C. (2004). Exploring variability and sources of measurement error in Alcohol Expectancy Questionnaire reliability coefficients: A meta-analytic reliability generalization study. Journal of Studies on Alcohol, 65, 663–671.

    *Kieffer, K. M., & Reese, R. J. (2002). A reliability generalization study of the Geriatric Depression Scale. Educational and Psychological Measurement, 62, 969–994. doi:10.1177/0013164402238085

    Knapp, T. R. (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin, 85, 410–416. doi:10.1037//0033-2909.85.2.410

    *Lane, G. G., White, A. E., & Henson, R. K. (2002). Expanding reliability generalization methods with KR-21 estimates: An RG study of the Coopersmith Self-Esteem Inventory. Educational and Psychological Measurement, 62, 685–711. doi:10.1177/0013164402062004010

    *Leach, L. F., Henson, R. K., Odom, L. R., & Cagle, L. S. (2006). A reliability generalization study of the Self-Description Questionnaire. Educational and Psychological Measurement, 66, 285–304. doi:10.1177/0013164405284030

    *Li, A., & Bagger, J. (2007). The Balanced Inventory of Desirable Responding (BIDR): A reliability generalization study. Educational and Psychological Measurement, 67, 525–544. doi:10.1177/001316440292087

    *Lopez-Pina, J. A., Sanchez-Meca, J., & Rosa-Alcazar, A. I. (2009). The Hamilton Rating Scale for Depression: A meta-analytic reliability generalization study. International Journal of Clinical and Health Psychology, 9, 143–159.

    McCrae, R. R., Kurtz, J. E., Yamagata, S., & Terracciano, A. (2011). Internal consistency, retest reliability, and their implications for personality scale validity. Personality and Social Psychology Review, 15, 28–50. doi:10.1177/1088868310366253

    *Miller, B. K., & Byrne, Z. S. (2009). Perceptions of organizational politics: A demonstration of the reliability generalization technique. Journal of Managerial Issues, 21, 280–300.

    *Miller, C. S., Shields, A. L., Campfield, D., Wallace, K. A., & Weiss, R. D. (2007). Substance use scales of the Minnesota Multiphasic Personality Inventory: An exploration of score reliability via meta-analysis. Educational and Psychological Measurement, 67, 1052–1065. doi:10.1177/0013164406299130

    *Miller, C. S., Woodson, J., Howell, R. T., & Shields, A. L. (2009). SASSI: Assessing the reliability of scores produced by the Substance Abuse Subtle Screening Inventory. Substance Use & Misuse, 44, 1090–1100.

    *Mji, A., & Alkhateeb, H. M. (2005). Combining reliability coefficients: Toward reliability generalization of the Conceptions of Mathematics Questionnaire. Psychological Reports, 96, 627–634. doi:10.2466/pr0.96.3.627-634

    *Nilsson, J. E., Schmidt, C. K., & Meek, W. D. (2002). Reliability generalization: An examination of the Career Decision-Making Self-Efficacy Scale. Educational and Psychological Measurement, 62, 647–658. doi:10.1177/0013164402062004007

    *O'Rourke, N. (2004). Reliability generalization of responses by care providers to the Center for Epidemiologic Studies–Depression Scale. Educational and Psychological Measurement, 64, 973–990. doi:10.1177/0013164404268668

    Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Erlbaum.

    *Reese, R. J., Kieffer, K. M., & Briggs, B. K. (2002). A reliability generalization study of select measures of adult attachment style. Educational and Psychological Measurement, 62, 619–646. doi:10.1177/0013164402062004006

    Reinhardt, B. (1996). Factors affecting coefficient alpha: A mini Monte Carlo study. In B. Thompson (Ed.), Advances in social science methodology (Vol. 4, pp. 3–20). Greenwich, CT: JAI Press.

    *Rexrode, K. R., Petersen, S., & O'Toole, S. (2008). The Ways of Coping Scale: A reliability generalization study. Educational and Psychological Measurement, 68, 262–280. doi:10.1177/0013164407310128

    Rodriguez, M. C., & Maeda, Y. (2006). Meta-analysis of coefficient alpha. Psychological Methods, 11, 306–322. doi:10.1037/1082-989X.11.3.306

    *Ross, M. E., Blackburn, M., & Forbes, S. (2005). Reliability generalization of the Patterns of Adaptive Learning Survey Goal Orientation Scales. Educational and Psychological Measurement, 65, 451–464. doi:10.1177/0013164404272496

    *Rouse, S. V. (2007). Using reliability generalization methods to explore measurement error: An illustration using the MMPI-2 PSY-5 Scales. Journal of Personality Assessment, 88, 264–275.

    *Ryngala, D. J., Shields, A. L., & Caruso, J. C. (2005). Reliability generalization of the Revised Children's Manifest Anxiety Scale. Educational and Psychological Measurement, 65, 259–271. doi:10.1177/0013164404272495

    Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529–540. doi:10.1037//0021-9010.62.5.529

    *Shields, A. L., & Caruso, J. C. (2003). Reliability generalization of the Alcohol Use Disorders Identification Test. Educational and Psychological Measurement, 63, 404–413. doi:10.1177/0013164403063003004

    *Shields, A. L., & Caruso, J. C. (2004). A reliability induction and reliability generalization study of the CAGE Questionnaire. Educational and Psychological Measurement, 64, 254–270. doi:10.1177/0013164403261814

    Thompson, B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development, 70, 434–438.

    Thompson, B. (1994). Guidelines for authors. Educational and Psychological Measurement, 54, 837–847.

    Thompson, B. (Ed.). (2003). Score reliability: Contemporary thinking on reliability issues. Thousand Oaks, CA: Sage.

    *Thompson, B., & Cook, C. (2002). Stability of the reliability of LibQUAL+™ scores: A reliability generalization meta-analysis study. Educational and Psychological Measurement, 62, 735–743. doi:10.1177/0013164402062004013

    Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174–195. doi:10.1177/00131640021970448

    Urbina, S. (2004). Essentials of psychological testing. Hoboken, NJ: John Wiley.

    Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6–20. doi:10.1177/00131640121971059

    Vacha-Haase, T., Henson, R. K., & Caruso, J. C. (2002). Reliability generalization: Moving toward improved understanding and use of score reliability. Educational and Psychological Measurement, 62, 562–569. doi:10.1177/0013164402062004002

    *Vacha-Haase, T., Kogan, L. R., Tani, C. R., & Woodall, R. A. (2001). Reliability generalization: Exploring variation of reliability coefficients of MMPI clinical scales scores. Educational and Psychological Measurement, 61, 45–59. doi:10.1177/00131640121971059


    Vacha-Haase, T., Kogan, L. R., & Thompson, B. (2000). Sample compositions and variabilities in published studies versus those in test manuals: Validity of score reliability inductions. Educational and Psychological Measurement, 60, 509–522. doi:10.1177/00131640021970682

    Vacha-Haase, T., Ness, C. M., Nilsson, J., & Reetz, D. (1999). Practices regarding reporting of reliability coefficients: A review of three journals. Journal of Experimental Education, 67, 335–341. doi:10.1080/00220979909598487

    *Vacha-Haase, T., Tani, C. R., Kogan, L. R., Woodall, R. A., & Thompson, B. (2001). Reliability generalization: Exploring reliability variations on MMPI/MMPI-2 validity scale scores. Assessment, 8, 391–401. doi:10.1177/107319110100800404

    *Vassar, M., & Crosby, J. W. (2008). A reliability generalization study of coefficient alpha for the UCLA Loneliness Scale. Journal of Personality Assessment, 90, 601–607. doi:10.1080/00223890802388624

    *Victorson, D., Barocas, J., Song, J., & Cella, D. (2008). Reliability across studies from the Functional Assessment of Cancer Therapy–General (FACT-G) and its subscales: A reliability generalization. Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care & Rehabilitation, 17, 1137–1146. doi:10.1007/s11136-008-9398-2

    *Wallace, K. A., & Wheeler, A. J. (2002). Reliability generalization of the Life Satisfaction Index. Educational and Psychological Measurement, 62, 674–684. doi:10.1177/0013164402062004009

    Wilkinson, L., & American Psychological Association (APA) Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604. doi:10.1037//0003-066X.54.8.594

    Willson, V. L. (1980). Research techniques in AERJ articles: 1969 to 1978. Educational Researcher, 9(6), 5–10. doi:10.2307/1175221

    Yetkiner, Z. E., & Thompson, B. (in press). Demonstration of how score reliability is integrated into SEM and how reliability affects all statistical analyses. Multiple Linear Regression Viewpoints, 36(2).

    *Yin, P., & Fan, X. (2000). Assessing the reliability of Beck Depression Inventory scores: Reliability generalization across studies. Educational and Psychological Measurement, 60, 201–223. doi:10.1177/00131640021970466

    *Youngstrom, E. A., & Green, K. W. (2003). Reliability generalization of self-report of emotions when using the Differential Emotions Scale. Educational and Psychological Measurement, 63, 279–295. doi:10.1177/0013164403253226

    *Zangaro, G. A., & Soeken, K. L. (2005). Meta-analysis of the reliability and validity of Part B of the Index of Work Satisfaction across studies. Journal of Nursing Measurement, 13, 7–22. doi:10.1891/jnum.2005.13.1.7

    Zientek, L. R., & Thompson, B. (2009). Matrix summaries improve research reports: Secondary analyses using published literature. Educational Researcher, 38, 343–352. doi:10.3102/0013189X09339056

    Bios

    Tammi Vacha-Haase is a professor of psychology at Colorado State University.

    Bruce Thompson is a distinguished professor of educational psychology and of library science at Texas A&M University, and an adjunct professor of allied health sciences at Baylor College of Medicine (Houston).