
RELIABILITY OF SCORES FROM THE EYSENCK PERSONALITY QUESTIONNAIRE: A RELIABILITY GENERALIZATION STUDY

JOHN C. CARUSO, KATIE WITKIEWITZ, ANNIE BELCOURT-DITTLOFF, AND JENNIFER D. GOTTLIEB

University of Montana

A reliability generalization study was conducted on data from 69 samples found in 44 studies that employed the Psychoticism (P), Extraversion (E), Neuroticism (N), and Lie (L) scales of the Eysenck Personality Questionnaire (EPQ) or EPQ-Revised. The reliability of the scores varied considerably between scales, with P scores tending to have the lowest reliability. Hierarchical regression analyses revealed that a larger standard deviation of scores was associated with higher score reliability for all four EPQ scales. More variability in age was associated with higher score reliability for the P scale and the L scale. Samples composed of students provided scores with higher reliability than those composed of other types of individuals for the P scale. Several other potential predictors (form, language of administration, average score, average age, gender composition, and number of items per scale) were not significantly related to score reliability.

Researchers performing meta-analytic reliability generalization (RG) studies attempt to characterize the reliability of scores on a particular psychological test and to investigate the factors that influence score reliability. Briefly, the methodology consists of collecting score reliability coefficients and other information from existing studies and using various characteristics of each sample or study (such as age or gender composition) to predict score reliability. Such studies are executed to empirically examine the belief that it is not a test per se that has greater or lesser reliability but a particular set of scores derived from the administration of the test to a particular group. Wilkinson and the American Psychological Association (APA) Task Force on Statistical Inference (1999) stated that "it is important to remember that a test is not reliable or unreliable. Reliability is a property of the scores on a test for a particular population of examinees" (p. 596).

Educational and Psychological Measurement, Vol. 61 No. 4, August 2001, 675-689. © 2001 Sage Publications


The RGs that have been conducted have usually found that the reliability of scores does, in fact, vary as a function of participant and study characteristics (e.g., Caruso, 2000; Caruso & Edwards, in press; Vacha-Haase, 1998; Yin & Fan, 2000; but see Viswesvaran & Ones, 2000), supporting the notion that reliability is a property of scores and not tests. Based on this notion, the manuscript submission guidelines for empirical manuscripts submitted to this journal require the reporting of complete information on the reliability of scores when feasible and proscribe the use of phrasing such as "the test is reliable" (Thompson, 1994).

RG studies typically use some form of the general linear model (e.g., regression, ANOVA, or canonical correlation) to examine the relationships between various study characteristics and score reliability. Score reliability coefficients, or some transformation of them (e.g., the standard error of measurement), are employed as criterion variables. Two well-known assumptions of general linear techniques are that the criterion variable(s) be intervally scaled and normally distributed. With regard to the former, classical test theory (e.g., Lord & Novick, 1968) leads to two seemingly contradictory interpretations of score reliability. First, the reliability coefficient is the correlation between parallel observed measurements (X and X′) of a given construct. However, it is also equal to the squared correlation between either observed measurement and the true score (Xt). Thus, score reliability coefficients can be reasonably interpreted as either correlations or squared correlations. The importance of this point is that squared correlations are variance-accounted-for statistics and as such are scaled on an interval level, whereas correlations themselves are not. In a very informative exchange on the value and implementation of RG studies in a recent special issue of this journal, Sawilowsky (2000) chose to interpret reliability coefficients as correlations, whereas Thompson and Vacha-Haase (2000) preferred the squared correlation interpretation. If we adhere to the interpretation of Thompson and Vacha-Haase, we need not transform the score reliability coefficients prior to analysis to ostensibly satisfy the interval level requirement, but if we interpret them as Sawilowsky did, then they should be transformed in some way to more closely approximate the interval level assumption. Although either case seems reasonable due to the dual interpretation of reliability coefficients, the interval level property is difficult to test (cf. Cliff, 1992).
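The dual interpretation is easy to verify numerically. The following sketch (ours, not part of the original study; the variance values are arbitrary) simulates two parallel forms under the classical model X = T + e and shows that the correlation between the forms approximately equals the squared correlation of either form with the true score.

```python
import numpy as np

# Illustrative sketch (not from the original article): under classical test
# theory, X = T + e with independent errors, so corr(X, X') should match
# corr(X, T)**2, i.e., the two readings of the reliability coefficient agree.
rng = np.random.default_rng(0)
n = 100_000
true_var, error_var = 6.0, 4.0  # hypothetical variances

t = rng.normal(0.0, np.sqrt(true_var), n)         # true scores
x1 = t + rng.normal(0.0, np.sqrt(error_var), n)   # parallel form 1
x2 = t + rng.normal(0.0, np.sqrt(error_var), n)   # parallel form 2

r_parallel = np.corrcoef(x1, x2)[0, 1]            # correlation between parallel forms
r_with_true_sq = np.corrcoef(x1, t)[0, 1] ** 2    # squared correlation with true score

print(f"theoretical reliability: {true_var / (true_var + error_var):.3f}")
print(f"corr(X, X'):             {r_parallel:.3f}")
print(f"corr(X, T)^2:            {r_with_true_sq:.3f}")
```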

Normality, on the other hand, can be examined by computing the skewness and kurtosis of the distributions of reliability coefficients and various transformations of them. In addition to untransformed score reliability coefficients, we will also consider squared score reliability coefficients (due to the issues raised by Sawilowsky, 2000) and the use of Fisher's z′ transformation, which has been shown to adjust for the skewness of the distribution of correlation coefficients (Dunlap, Silver, & Phelps, 1988; Silver & Dunlap, 1987). The precision of these measures of nonnormality is indicated by their standard errors.

The EPQ

The original EPQ was the result of successive improvements and additions to the Maudsley Personality Inventory (MPI) (H. J. Eysenck & Knapp, 1962) and the Eysenck Personality Inventory (EPI) (H. J. Eysenck & Eysenck, 1964). The MPI was designed to measure two personality characteristics: extraversion (E) and neuroticism (N). High scorers on the E scale are characterized as sociable, exciting, pleasurable, carefree, and aggressive. Low scorers are more withdrawn, serious, moralistic, and tend to enjoy being alone. An individual who scores high on the N scale is more likely to be a worried and moody person. People with high N scores also tend to suffer from emotional and psychosomatic disorders. Someone with a low N score can often be characterized as stable, less emotional, and not very anxious. It was found that the two scales of the MPI were slightly intercorrelated, although they measured theoretically distinct constructs, and they often produced scores with low reliability (H. J. Eysenck & Eysenck, 1994).

The EPI was developed in response to these criticisms and also included the Lie (L) scale for assessing response bias. H. J. Eysenck and Eysenck (1975) then developed the EPQ, which incorporated the Psychoticism (P) scale for assessing psychotic personality characteristics. The P scale was designed to measure behavior patterns that might be considered schizoid or psychopathic in the extreme case. An individual with a high score on the P scale may be inclined to exhibit conduct or other behavioral disorders and may lack empathy. In addition, these individuals may be hostile, insensitive, or disengaged from society. Although various researchers occasionally exclude items or employ short forms, the original versions of the full scales include the following number of items: P (25 items), E (21 items), N (23 items), and L (21 items).

Despite the widespread use of the questionnaire, several studies have reported that EPQ scores may have undesirable psychometric properties (e.g., Block, 1977; Goh, King, & King, 1982; Helmes, 1980). These studies have reported problems with the factor structure and low reliability of the scores, particularly on the P scale. S.B.G. Eysenck, Eysenck, and Barrett (1985) recognized three major problems with scores on the original P scale: low reliability, low range, and highly skewed distributions. Primarily to remedy the psychometric weaknesses of scores on the P scale, S.B.G. Eysenck et al. (1985) developed a revised version of the EPQ (the EPQ-R). The 94-item EPQ-R includes 27 items on the P scale, 22 items on the E scale, 24 items on the N scale, and 21 items on the L scale. The internal consistency of the scores in the standardization sample, reported in the EPQ-R manual, ranged from .66 (P scale, male respondents) to .86 (N scale, male and female respondents). The test's authors (H. J. Eysenck & Eysenck, 1994) justify the low reliability of scores on the P scale by stating,

It must be remembered that the P scale taps several different facets (hostility, cruelty, lack of empathy, nonconformism, etc.) which may hold reliabilities lower than would be true of a scale like E, which comprises largely sociability and activity items only. (p. 14)

But the low score reliability nevertheless casts doubt on the meaningful interpretation of the scores. To the extent that the items of the P scale are not unidimensional, it may be the case that two or more subscales would allow for a more meaningful examination of individual differences. In addition, the statement of H. J. Eysenck and Eysenck (1994) implies that low reliability is a property of the P scale and that high reliability is a property of the E scale. Using the methodology of RG, we can begin to elucidate the group or study characteristics that may be related to the lower reliability of scores on the P scale, as opposed to attributing low reliability to the P scale categorically and with finality. Furthermore, we will be able to ascertain whether the reliability of scores on the P scale of the EPQ-R is typically greater than that of the EPQ, that is, whether S.B.G. Eysenck et al. (1985) achieved that goal in their revision of the scale.

Purposes

The present study has three primary purposes. First, we will assess the typical reliability of scores on the P, E, N, and L scales of the EPQ and EPQ-R. Second, we will compare the distributions of score reliability coefficients, and various transformations of them, to examine the appropriateness of parametric statistical analyses. Third, we will examine the relationships between various study and respondent characteristics and score reliability.

Method

Data

In December 1999, the American Psychological Association's (1992) PsycINFO database was used to generate a list of empirical journal articles in which the EPQ or EPQ-R were used. At that time, PsycINFO covered 1,471 periodicals from psychology and related fields. Only those articles appearing between 1980 and 1999 were selected. The literature search identified 1,540 empirical journal articles in which "Eysenck Personality Questionnaire," "EPQ," or "EPQ-R" appeared as an index term, in the title, or in the abstract.


Of the 1,540 articles, most were excluded from this study. Three hundred and thirteen articles (20%) were published in a language other than English. Seven hundred and sixty-seven (50%), a disappointingly high number, did not mention reliability or score reliability whatsoever. Two hundred and forty-nine (16%) asserted that the EPQ (or EPQ-R) was a reliable instrument or produced reliable scores but provided no data to support this claim. One hundred and thirty-five (9%) reported reliabilities from one of the EPQ manuals or from other data not collected for that study. The pattern of not even mentioning reliability is common but certainly disturbing and may originate from endemic misconceptions that tests per se are reliable (Vacha-Haase, Ness, Nilsson, & Reetz, 1999; Whittington, 1998). The pattern of "inducting" reliability coefficients from prior studies, although also common, is often unjustified and disturbing as well (Vacha-Haase, Kogan, & Thompson, 2000).

Twenty-four (2%) provided reliabilities from the data at hand but in poor form, such as the range of reliability across all scales. Of the remaining 52 studies, 8 reported test-retest reliability estimates, and these were excluded. This left 44 studies presenting usable internal consistency coefficients. These studies are marked with asterisks in the References section, although some are cited elsewhere as well. Data from 69 samples were extracted from the 44 studies. Four of these studies did not employ the L scale, and so the numbers of samples for all analyses presented here are 69 for P, E, and N and 65 for L.
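As a quick check on the screening arithmetic (a sketch using only the counts given above; the category labels are our paraphrases), the exclusion categories account for all but 52 of the 1,540 articles, and removing the 8 test-retest studies leaves the 44 usable studies:

```python
# Sketch reproducing the sample-selection tallies reported in the text.
total_articles = 1540
excluded = {
    "published in a language other than English": 313,
    "no mention of reliability": 767,
    "claimed reliability without supporting data": 249,
    "inducted reliability from manuals or other data": 135,
    "reliability reported in unusable form": 24,
}

reported_from_data = total_articles - sum(excluded.values())  # 52
usable_internal_consistency = reported_from_data - 8          # 8 were test-retest only

for label, count in excluded.items():
    print(f"{label}: {count} ({count / total_articles:.0%})")
print(f"reported reliability from the data at hand: {reported_from_data}")
print(f"studies with usable internal consistency coefficients: {usable_internal_consistency}")
```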

Procedure

Separate analyses were conducted for P, E, N, and L score reliability. We selected multiple regression as our method to examine the relationship between score reliability and the selected predictor variables. We performed a hierarchical analysis with the number of items administered and the standard deviation of scores entered as predictors in the first block. These variables were entered first because, with a few common assumptions, they are both algebraically related to score reliability. First, the Spearman-Brown prophecy formula presents the relationship between the number of items on a particular scale and the reliability of the scores it produces:

$$\rho^{*}_{XX'} = \frac{k\,\rho_{XX'}}{1 + (k - 1)\,\rho_{XX'}} \qquad (1)$$

where ρXX′ is the reliability of the original scores, k is the ratio of the number of items on the new test to items on the original test, and ρ*XX′ is the predicted reliability of scores on the new test. For example, if a test with 20 items produces scores with a reliability of .70, and 20 additional items are added, k = 2 and ρ*XX′ = .82.
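Equation 1 can be written as a one-line function (a hypothetical helper, not from the article) that reproduces the worked example:

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability after changing test length by a factor of k (Equation 1)."""
    return k * reliability / (1 + (k - 1) * reliability)

# Worked example from the text: 20 items with reliability .70, doubled in length (k = 2).
print(round(spearman_brown(0.70, 2), 2))  # 0.82
```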


Second, observed score variability is related to score reliability by the following equalities:

$$r_{XX'} = 1 - \frac{\sigma_e^2}{\sigma_X^2} = \frac{\sigma_t^2}{\sigma_X^2} \qquad (2)$$

where σX² is the observed score variance, σt² is the true score variance, and σe² is the error variance, with the error and true score variances summing to form the observed score variance. Nunnally (1970, p. 556) suggested that the first equality in Equation 2 could be used to estimate what score reliability would be if observed score variance were larger or smaller than in a given population. For example, if the reliability of a set of scores was .50, with an error variance of 2 and an observed score variance of 4 (.50 = 1 − 2/4), then the estimated score reliability in a more heterogeneous population with an observed score variance of 8 would be 1 − 2/8 = .75. Using Equation 2 in this way assumes that the error variance is the same in the two populations or, equivalently, that true score variance has increased by exactly the same amount as observed score variance. Because of these algebraic relationships, we assigned priority to the number of items administered and the standard deviation of scores and consequently entered them in the first block of our regression analyses.
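The same adjustment can be sketched in code (the function name is ours); holding the error variance fixed, Equation 2 recovers the .75 estimate from the example above:

```python
def adjusted_reliability(reliability: float, old_var: float, new_var: float) -> float:
    """Estimate reliability in a population with a different observed score variance,
    assuming the error variance is unchanged (Equation 2, after Nunnally, 1970)."""
    error_var = (1 - reliability) * old_var  # sigma_e^2 implied by the original scores
    return 1 - error_var / new_var

# Example from the text: r = .50 with observed variance 4; variance doubles to 8.
print(adjusted_reliability(0.50, old_var=4, new_var=8))  # 0.75
```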

The other predictor variables, selected largely based on availability because this is an archival study, were entered simultaneously in a second block: the mean score, the mean age of participants, the standard deviation of age, sample type (0 = student, 1 = nonstudent), gender composition (coded as the proportion of subjects who were male), language of administration (0 = English, 1 = non-English), and EPQ form (0 = EPQ, 1 = EPQ-R). Table 1 provides descriptive statistics for the predictor variables.
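A minimal sketch of this two-block strategy is given below, using simulated data in place of the archival values (all variable names, the simulated sample size of 69, and the data-generating choices are ours, for illustration only): Block 1 enters the number of items and the standard deviation of scores, Block 2 adds the remaining predictors, and the increment in R² is evaluated with an F-change test.

```python
import numpy as np

def r_squared(y: np.ndarray, X: np.ndarray) -> float:
    """R^2 from an OLS fit of y on X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def block_f_change(y, X_block1, X_block2):
    """Hierarchical (two-block) OLS: R^2 for Block 1 alone, R^2 for the full model,
    and the F test for the increment in R^2 contributed by Block 2."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X_block1])
    X2 = np.column_stack([X1, X_block2])
    r2_1, r2_2 = r_squared(y, X1), r_squared(y, X2)
    df1, df2 = X_block2.shape[1], n - X2.shape[1]
    f_change = ((r2_2 - r2_1) / df1) / ((1.0 - r2_2) / df2)
    return r2_1, r2_2, f_change, (df1, df2)

# Hypothetical illustration: simulated data standing in for the 69 samples.
rng = np.random.default_rng(1)
n = 69
n_items = rng.integers(6, 33, n).astype(float)   # items administered per sample
sd_scores = rng.uniform(1.2, 5.8, n)             # standard deviation of scores
block2 = rng.normal(size=(n, 7))                 # mean score, mean age, SD of age, etc.
reliability = 0.40 + 0.05 * sd_scores + 0.002 * n_items + rng.normal(0, 0.05, n)

r2_1, r2_2, f_change, (df1, df2) = block_f_change(
    reliability, np.column_stack([n_items, sd_scores]), block2)
print(f"Block 1 R^2 = {r2_1:.2f}; full-model R^2 = {r2_2:.2f}")
print(f"F change({df1}, {df2}) = {f_change:.2f}")
```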

Results

The first goal of this study was to characterize the reliability of scores on each EPQ scale in terms of central tendency and variability. Table 2 presents the median, mean, standard deviation, and range of score reliabilities for each of the four EPQ scales. As shown, scores on the N and E scales tend to be most reliable, with medians of .83 and .82, respectively. Scores on the P and L scales were less reliable, with medians of .66 and .77, respectively. Scores on the P scale in particular often had poor reliability, with a minimum of .36 and an interquartile range from .55 to .77.

Table 1
Descriptive Statistics for Predictor Variables

Predictor                       M       SD      Range           Percentage
Average age                     27.89   9.73    16.51-63.50
Standard deviation of age       7.45    4.84    0.67-18.60
Proportion male                 0.50    0.40    0.00-1.00
Number of items
  Psychoticism                  25.72   6.38    6-32
  Extraversion                  20.66   3.82    6-25
  Neuroticism                   22.06   4.18    6-25
  Lie                           20.08   3.36    6-23
Mean of scores
  Psychoticism                  4.96    2.48    0.90-11.53
  Extraversion                  12.32   3.12    2.80-20.05
  Neuroticism                   10.12   3.39    2.10-17.44
  Lie                           8.63    3.02    2.30-16.55
Standard deviation of scores
  Psychoticism                  3.12    1.14    1.20-5.83
  Extraversion                  4.26    1.01    1.50-7.35
  Neuroticism                   4.46    1.09    1.10-6.18
  Lie                           3.90    0.94    1.50-8.20
Language of administration
  English (n = 49)                                              71
  Non-English (n = 17)                                          25
  Missing (n = 3)                                               4
EPQ form
  Original (n = 38)                                             55
  Revised (n = 31)                                              45
Sample type
  Student (n = 31)                                              45
  Not student (n = 38)                                          55

Table 2
Descriptive Statistics for Score Reliability Coefficients

EPQ Scale       Minimum   Maximum   Median   M     SD
Psychoticism    .36       .91       .68      .66   .13
Extraversion    .68       .93       .82      .82   .05
Neuroticism     .69       .97       .83      .83   .04
Lie             .59       .88       .78      .77   .05

The second goal of this study was to compare the distributions of the score reliability coefficients for each scale to the distributions that resulted after two transformations: squaring and Fisher's z′ transformation. The skewness and kurtosis of each type of coefficient for each scale were computed, along with their standard errors, and these are provided in Table 3. The distributions of the three types of coefficients were generally not highly skewed, and, except for the N scale, they were not highly kurtotic. The untransformed score reliability coefficients for the L scale had a statistically significant amount of negative skew, and the Fisher's transformations on the N scale had a significant amount of positive skew. Based on this preliminary evidence, it appears that using the z′ transformation is not indicated and that neither the score reliability coefficients themselves nor the squared score reliability coefficients suffer from debilitating nonnormality. We also conducted parallel regression analyses for each operationalization of score reliability (results not shown) and found no differences in substantive interpretation. Untransformed score reliability coefficients were used as criterion variables in the regression analyses presented next.
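The distributional checks summarized in Table 3 can be outlined as follows (a sketch with simulated coefficients standing in for the 69 collected values; the standard-error expressions are the usual small-sample formulas, which give approximately .29 and .57 at n = 69, matching the table):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def se_skewness(n: int) -> float:
    """Standard error of sample skewness."""
    return np.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis."""
    return 2 * se_skewness(n) * np.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

# Simulated stand-ins for the collected reliability coefficients (one scale's worth).
rng = np.random.default_rng(2)
r = np.clip(rng.normal(0.80, 0.06, 69), 0.36, 0.97)

operationalizations = {
    "r": r,                     # untransformed reliability coefficients
    "r squared": r ** 2,        # squared coefficients (Sawilowsky, 2000, reading)
    "Fisher z'": np.arctanh(r), # Fisher's z' transformation
}
n = len(r)
for name, values in operationalizations.items():
    print(f"{name}: skew = {skew(values, bias=False):.2f} (SE {se_skewness(n):.2f}), "
          f"kurtosis = {kurtosis(values, bias=False):.2f} (SE {se_kurtosis(n):.2f})")
```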

Our third and final goal was to examine the relationships between the predictor variables and score reliability. Table 4 shows the unstandardized and standardized regression weights and the structure coefficients of the predictors for P score reliability. Both sets of predictors made statistically significant contributions: the R² for Block 1 was .34, F(2, 58) = 14.71, p < .0005, and the additional variance explained by the Block 2 predictors was .18, F(7, 51) = 2.66, p = .02. The adjusted R² (a better estimate of the population R²) for the model with all predictors entered was .43. The standard deviation of scores was the strongest predictor of score reliability in both models. Based on the unstandardized regression coefficients (the Bs), the following interpretations can be made. As the standard deviation of scores increases by one, the reliability of scores increases by .05 (in Block 1) or .10 (when Block 2 variables were entered). Sample type was also a statistically significant predictor, and, because this variable was coded as 0 (student) and 1 (nonstudent), we can also state that the reliability of scores from student samples was somewhat higher than that from nonstudent samples. Although statistically significant at an α level of .05, this effect was modest and is somewhat difficult to interpret due to the variety of sample types making up the nonstudent group. The standard deviation of age was also a statistically significant predictor, with more age variability being associated with higher score reliability. This effect was also modest, but note that both sample type and age variability accounted for significant amounts of variance in score reliability over and above score variability.

Table 5 provides the regression weights and structure coefficients for predicting E score reliability. Neither set of predictors made a statistically significant contribution, F(2, 58) = 2.01, p = .143 for Block 1 and F(7, 51) = 1.39, p = .23 for Block 2. Despite the nonsignificance of the models, the standard deviation of the scores was significant when the Block 2 variables were entered. This effect was also small: With all other predictors in the model, as the standard deviation of scores increases by one, the score reliability increases by only .02.

Table 3
Nonnormality of the Three Operationalizations of Score Reliability Coefficients

                 Skewness                                   Kurtosis
EPQ Scale        ρxx′         ρ²xx′         z′              ρxx′         ρ²xx′        z′
Psychoticism     –0.33 (.29)  0.03 (.29)    0.32 (.29)      –0.76 (.57)  –0.92 (.57)  –0.29 (.57)
Extraversion     –0.66 (.29)  –0.45 (.29)   0.23 (.29)      0.49 (.57)   0.35 (.57)   0.89 (.57)
Neuroticism      –0.30 (.29)  –0.01 (.29)   2.23 (.29)      1.65 (.57)   1.92 (.57)   12.35 (.57)
Lie              –0.80 (.30)  –0.60 (.30)   –0.17 (.30)     1.23 (.59)   0.64 (.59)   0.21 (.59)

Note. Standard errors of skewness and kurtosis are provided in parentheses.

Only Block 1 predictors were statistically significant for N score reliability (see Table 6), with an adjusted R² of .17: Block 1 F(2, 58) = 7.27, p = .002; Block 2 F(7, 51) = .81, p = .58. The standard deviation of scores was the only significant predictor of N score reliability.

Table 4
Regression Analyses for Psychoticism Score Reliability

Block  Predictor                      B      SEB    β      t      p       rs
1      Constant                       .416   .057   —      7.36   <.0005  —
       Number of items                .003   .003   .162   1.22   .229    .762
       Standard deviation of scores   .052   .015   .468   3.51   .001    .974
2      Constant                       .481   .098   —      4.88   <.0005  —
       Number of items                –.001  .003   –.025  –0.16  .874    .616
       Standard deviation of scores   .098   .030   .874   3.30   .002    .788
       Mean of scores                 –.022  .015   –.435  –1.53  .133    .640
       Mean age                       –.005  .002   –.344  –1.81  .077    –.123
       Standard deviation of age      .015   .005   .578   3.06   .004    .227
       Sample type                    –.070  .033   –.276  –2.15  .036    –.160
       Proportion male                .040   .032   .127   1.25   .218    .042
       Language                       .028   .029   .098   0.98   .334    –.014
       EPQ form                       .055   .030   .215   1.84   .072    .466

Table 5
Regression Analyses for Extraversion Score Reliability

Block  Predictor                      B      SEB    β      t      p       rs
1      Constant                       .786   .037   —      21.33  <.0005  —
       Number of items                –.002  .002   –.123  –0.82  .417    .137
       Standard deviation of scores   .015   .008   .298   1.99   .052    .914
2      Constant                       .857   .060   —      14.17  <.0005  —
       Number of items                –.001  .009   –.096  –0.45  .651    .075
       Standard deviation of scores   .023   .009   .464   2.60   .012    .502
       Mean of scores                 –.006  .004   –.389  –1.59  .119    –.267
       Mean age                       –.002  .001   –.386  –1.54  .131    .063
       Standard deviation of age      .003   .003   .309   1.28   .208    –.110
       Sample type                    –.003  .017   –.033  –0.19  .849    .330
       Proportion male                –.009  .016   –.071  –0.55  .585    –.317
       Language                       .011   .016   .094   0.66   .510    .297
       EPQ form                       –.001  .014   –.006  –0.05  .963    –.276

Table 7 gives the regression weights for predictors of score reliability on the L scale. Only Block 1 predictors explained a statistically significant amount of variance: Block 1 F(2, 55) = 12.93, p < .0005; Block 2 F(7, 48) = .99, p = .45. The adjusted R² for Block 1 was .30. Again, the standard deviation of scores was the strongest predictor, but an increase of one results in an increase of only about .02 to .03 in score reliability. The standard deviation of age of the sample was also a significant predictor in Block 2: As the variability in the age of the sample increased by one, the reliability of the scores increased by .005.

Table 6
Regression Analyses for Neuroticism Score Reliability

Block  Predictor                      B      SEB    β      t      p       rs
1      Constant                       .761   .027   —      27.91  <.0005  —
       Number of items                –.001  .002   –.080  –0.53  .601    .525
       Standard deviation of scores   .019   .006   .494   3.25   .002    .989
2      Constant                       .698   .048   —      14.52  <.0005  —
       Number of items                –.000  .002   –.013  –0.08  .940    .443
       Standard deviation of scores   .023   .009   .575   2.64   .011    .836
       Mean of scores                 –.000  .003   –.031  –0.15  .886    .479
       Mean age                       .002   .001   .345   1.47   .148    .243
       Standard deviation of age      .000   .002   .012   0.05   .958    .362
       Sample type                    –.015  .013   –.181  –1.18  .245    .094
       Proportion male                .004   .014   .035   0.26   .794    –.087
       Language                       .000   .012   .000   0.00   .998    –.201
       EPQ form                       .003   .010   .030   2.48   .805    .004

Table 7
Regression Analyses for Lie Score Reliability

Block  Predictor                      B      SEB    β      t      p       rs
1      Constant                       .613   .036   —      17.04  <.0005  —
       Number of items                .003   .002   .224   1.73   .089    .772
       Standard deviation of scores   .023   .007   .418   3.23   .002    .940
2      Constant                       .657   .060   —      11.00  <.0005  —
       Number of items                .000   .003   .027   0.16   .877    .686
       Standard deviation of scores   .026   .009   .475   2.96   .005    .835
       Mean of scores                 .002   .003   .111   0.64   .524    .491
       Mean age                       –.002  .001   –.280  –1.28  .208    –.060
       Standard deviation of age      .005   .002   .464   2.22   .031    .345
       Sample type                    –.027  .020   –.254  –1.34  .185    .146
       Proportion male                .004   .015   .033   0.28   .779    –.094
       Language                       –.003  .015   –.024  –0.20  .844    –.298
       EPQ form                       .015   .012   .136   1.17   .249    .173

Discussion

The main finding of the present study is that scores on the E, N, and L scales typically have adequate reliability, whereas scores on the P scale often have poor reliability. One of the reasons for the development of the EPQ-R was the poor reliability of scores on the P scale of the EPQ (S.B.G. Eysenck et al., 1985). Unfortunately, the form of the EPQ employed was not a statistically significant predictor of P score reliability. Thus, it does not appear that the revision resulted in an improvement in this property of scores from the P scale. Furthermore, both the mean and median reliability of scores on the P scale were less than .70, a value that is typically considered the minimum acceptable value for personality questionnaires. It has been noted (H. J. Eysenck & Eysenck, 1994) that the P scale may be less unidimensional than the other scales of the EPQ, but, although this is consistent with the present findings, it does not reduce the difficulties encountered when score reliability is low. Future research attempting to delineate the factor structure of the P items may lead to the development of two or more subscales that are more internally consistent than the sum of all of the P items.

As noted, it appears that the distributions of either score reliability coefficients or squared reliability coefficients adequately approximate normality for most EPQ scales. Therefore, depending on one's interpretation of the issues raised by Sawilowsky (2000) and Thompson and Vacha-Haase (2000), either could be appropriately employed as criterion variables in parametric statistical analyses. The use of Fisher's z′ transformation, often employed when analyzing correlation coefficients, resulted in an increase in deviation from normality. Why a transformation that has proved valuable in normalizing distributions of correlations (e.g., Dunlap et al., 1988; Silver & Dunlap, 1987) did not improve conditions here is unclear, but this is an area worthy of further study. It would be valuable for future RG studies to include an examination of nonnormality for the expressions of score reliability employed here and others.

The results of this and other RGs, as well as basic formulas of classical test theory, indicate that observed score variability is a very important predictor of score reliability. In fact, few instances were found in this study in which other predictor variables accounted for variance in score reliability over and above that accounted for by score variability. However, other variables included in our analysis may have an effect on score reliability through increasing observed score variability, a hypothesis that seems quite likely for many of the predictors in our analyses. For example, the number of items administered correlated between .51 and .64 with the standard deviation of scores, making it difficult for the former variable to make an independent contribution to predicting score reliability. It may be the case that a thorough understanding of the variables that affect score reliability must wait for a more thorough understanding of those that affect score variability. Future RGs may draw on the results of this and other analyses to develop path models or other testable models of score reliability.

A disappointing finding from the present study is the small proportion of studies in which the reliability of scores was provided, as there are many important reasons for the inclusion of complete psychometric data in all empirical studies. Thompson and Vacha-Haase (2000) recommend that

large discrepancies between reliability estimates reported in the manual and those obtained in a given study alert the researcher to the possibility that the normative sample and the research sample may represent discrete populations, and thus such comparisons even may bear somewhat on the generalizability of substantive results. (p. 191)

Thus, the calculation (and presentation) of reliability coefficients in substantive research is not of interest solely to psychometricians. Such calculations can influence substantive interpretations and are an integral aspect of complete research reporting.

References

Studies presenting usable internal consistency coefficients are marked with an asterisk.

*Allsopp, J., Eysenck, H. J., & Eysenck, S.B.G. (1991). Machiavellianism as a component in psychoticism and extraversion. Personality and Individual Differences, 12, 29-41.
American Psychological Association. (1992). PsycINFO Psychological Abstracts Information Services users reference manual. Washington, DC: Author.
*Biswas, P. K. (1990). The Eysenck Personality Questionnaire (EPQ) on educated Mizos. Indian Journal of Clinical Psychology, 17, 71-73.
Block, J. (1977). The Eysencks and psychoticism. Journal of Abnormal Psychology, 86, 653-654.
Caruso, J. C. (2000). Reliability generalization of the NEO Personality Scales. Educational and Psychological Measurement, 60, 236-254.
Caruso, J. C., & Edwards, S. (in press). Reliability generalization of the Junior Eysenck Personality Questionnaire. Personality and Individual Differences.
Cliff, N. (1992). Abstract measurement theory and the revolution that never happened. Psychological Science, 3, 186-190.
*Corulla, W. J. (1987). A psychometric investigation of the Eysenck Personality Questionnaire (Revised) and its relationship to the I.7 Impulsiveness Questionnaire. Personality and Individual Differences, 8, 651-658.
*Corulla, W. J. (1988). A further psychometric investigation of the Sensation Seeking Scale Form-V and its relationship to the EPQ-R and the I.7 Impulsiveness Questionnaire. Personality and Individual Differences, 9, 277-287.
*Corulla, W. J. (1989). The relationships between the Strelau Temperament Inventory, Sensation Seeking and Eysenck's dimensional system of personality. Personality and Individual Differences, 10, 161-173.
*De Flores, T., & Valdes, M. (1986). Behaviour pattern A: Reward, fight or punishment? Personality and Individual Differences, 7, 319-326.
Dunlap, W. P., Silver, N. C., & Phelps, G. R. (1988). A Monte Carlo study of using the first eigenvalue for averaging intercorrelations. Educational and Psychological Measurement, 47, 917-923.


*Egan, V., Miller, E., & McLellan, I. (1998). Does the personal questionnaire provide a more sensitive measure of cardiac surgery related-anxiety than a standard pencil-and-paper checklist? Personality and Individual Differences, 24, 465-473.
Eysenck, H. J., & Eysenck, S.B.G. (1964). Manual of the Eysenck Personality Inventory. London: University of London Press.
Eysenck, H. J., & Eysenck, S.B.G. (1975). Manual of the Eysenck Personality Questionnaire. London: Hodder & Stoughton/EdITS.
Eysenck, H. J., & Eysenck, S.B.G. (1994). Manual of the Eysenck Personality Questionnaire: Comprising the EPQ-Revised (EPQ-R) and EPQ-R Short Scale. San Diego, CA: EdITS.
Eysenck, H. J., & Knapp, R. R. (1962). Manual for the Maudsley Personality Inventory. San Diego, CA: EdITS.
*Eysenck, S.B.G. (1981). National differences in personality: Sicily and England. Italian Journal of Psychology, 8, 87-93.
*Eysenck, S.B.G., & Allsopp, J. F. (1986). Personality differences between students and craftsmen. Personality and Individual Differences, 7, 439-441.
*Eysenck, S.B.G., Barrett, P., Spielberger, C., Evans, F. J., & Eysenck, H. J. (1986). Cross-cultural comparisons of personality dimensions: England and America. Personality and Individual Differences, 7, 209-214.
*Eysenck, S.B.G., & Chan, J. (1982). A comparative study of personality in adults and children: Hong Kong vs. England. Personality and Individual Differences, 3, 153-160.
*Eysenck, S.B.G., Eysenck, H. J., & Barrett, P. (1985). A revised version of the Psychoticism scale. Personality and Individual Differences, 6, 21-29.
*Eysenck, S.B.G., & Haapasalo, J. (1989). Cross-cultural comparisons of personality: Finland and England. Personality and Individual Differences, 10, 121-125.
*Eysenck, S.B.G., & Long, F. Y. (1986). A cross-cultural comparison of personality in adults and children: Singapore and England. Journal of Personality and Social Psychology, 50, 124-130.
*Eysenck, S.B.G., & Tambs, K. (1990). Cross-cultural comparison of personality: Norway and England. Scandinavian Journal of Psychology, 31, 191-197.
*Eysenck, S.B.G., & Yanai, O. (1985). A cross-cultural study of personality: Israel and England. Psychological Reports, 57, 111-116.
*Fontaine, K. R. (1994). Personality correlates of sexual risk-taking among men. Personality and Individual Differences, 17, 693-694.
*French, C. C., & Beaumont, J. G. (1989). A computerized form of the Eysenck Personality Questionnaire: A clinical study. Personality and Individual Differences, 10, 1027-1032.
Goh, D. S., King, D. W., & King, L. A. (1982). Psychometric evaluation of the Eysenck Personality Questionnaire. Educational and Psychological Measurement, 42, 297-309.
*Gomà-i-Freixanet, M. (1997). Consensual validity of the EPQ: Self-reports and spouse-reports. European Journal of Psychological Assessment, 13, 179-185.
*Heaven, P.C.L. (1989). Orientation to authority and its relation to impulsiveness. Current Psychology: Research and Reviews, 8, 38-45.
*Heaven, P.C.L., Connors, J., & Trevathan, R. (1987). Authoritarianism and the EPQ. Personality and Individual Differences, 8, 677-680.
Helmes, E. (1980). A psychometric investigation of the Eysenck Personality Questionnaire. Applied Psychological Measurement, 4, 43-55.
*Hosokawa, T., & Ohyama, M. (1993). Reliability and validity of a Japanese version of the short-form Eysenck Personality Questionnaire—Revised. Psychological Reports, 72, 823-832.
*Jahanshahi, M. (1990). Personality in torticollis: Changes across time. Personality and Individual Differences, 11, 355-363.
*Kardum, I., & Hudek-Knezevic, J. (1996). The relationship between Eysenck's personality traits, coping styles and moods. Personality and Individual Differences, 20, 341-350.


*Levin, J., & Montag, I. (1987). The effect of testing instructions for handling social desirability on the Eysenck Personality Questionnaire. Personality and Individual Differences, 8, 163-167.
*Lewis, C. A., & Maltby, J. (1996). Personality, prayer, and church attendance in a sample of male college students in the USA. Psychological Reports, 78, 976-978.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
*McCown, W., Keiser, R., Mulhearn, S., & Williamson, D. (1997). The role of personality and gender in preference for exaggerated bass in music. Personality and Individual Differences, 23, 543-547.
*Merten, T., & Ruch, W. (1996). A comparison of computerized and conventional administration of the German versions of the Eysenck Personality Questionnaire and the Carroll Rating Scale for depression. Personality and Individual Differences, 20, 281-291.
*Merten, T., & Siebert, K. (1997). A comparison of computerized and conventional administration of the EPQ-R and CRS: Further data on the Merten and Ruch (1996) study. Personality and Individual Differences, 22, 283-286.
*Mohan, J., & Virdi, P. K. (1985). Standardisation of P. Q. on Punjab University students. Indian Psychological Review, 28, 20-28.
*Mortenson, E. L., Reinisch, J. M., & Sanders, S. A. (1996). Psychometric properties of the Danish 16PF and EPQ. Scandinavian Journal of Psychology, 37, 221-225.
*Muntaner, C., Garcia-Sevilla, L., Fernandez, A., & Torrubia, R. (1988). Personality dimensions, schizotypal and borderline personality traits and psychosis proneness. Personality and Individual Differences, 9, 257-268.
*Nagoshi, C. T., Pitts, S. C., & Nakata, T. (1993). Intercorrelations of attitudes, personality, and sex role orientation in a college sample. Personality and Individual Differences, 14, 603-604.
Nunnally, J. C. (1970). Introduction to psychological measurement. New York: McGraw-Hill.
*Parker, J.D.A., Bagby, R. M., & Taylor, G. J. (1989). Toronto Alexithymia Scale, EPQ and self-report measures of somatic complaints. Personality and Individual Differences, 10, 599-604.
*Perera, M., & Eysenck, S.B.G. (1984). A cross-cultural study of personality: Sri Lanka and England. Journal of Cross-Cultural Psychology, 15, 353-371.
*San Martini, P., & Mazzoti, E. (1990). Relationships between the factorial dimensions of the Strelau Temperament Inventory and the EPQ-R. Personality and Individual Differences, 11, 909-914.
*San Martini, P., Mazzotti, E., & Setaro, S. (1996). Factor structure and psychometric features of the Italian version of the EPQ-R. Personality and Individual Differences, 21, 877-882.
Sawilowsky, S. S. (2000). Psychometrics versus datametrics: Comment on Vacha-Haase's "reliability generalization" method and some EPM editorial policies. Educational and Psychological Measurement, 60, 157-173.
Silver, N. C., & Dunlap, W. P. (1987). Averaging correlation coefficients: Should Fisher's z′ transformation be used? Journal of Applied Psychology, 72, 146-148.
*Tambs, K., Sundet, J. M., Eaves, L., & Berg, K. (1989). Relations between EPQ and Jenkins Activity Survey. Personality and Individual Differences, 10, 1229-1235.
*Tarrier, N., Eysenck, S.B.G., & Eysenck, H. J. (1980). National differences in personality: Brazil and England. Personality and Individual Differences, 1, 164-171.
Thompson, B. (1994). Guidelines for authors. Educational and Psychological Measurement, 54, 837-847.
Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174-195.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20.


Vacha-Haase, T., Kogan, L. R., & Thompson, B. (2000). Sample compositions and variabilities in published studies versus those in test manuals: Validity of score reliability inductions. Educational and Psychological Measurement, 60, 509-522.
Vacha-Haase, T., Ness, C., Nilsson, J., & Reetz, D. (1999). Practices regarding reporting of reliability coefficients: A review of three journals. Journal of Experimental Education, 67, 335-341.
*Vanderzee, K., Buunk, B., & Sanderman, R. (1996). The relationship between social comparison processes and personality. Personality and Individual Differences, 20, 551-565.
Viswesvaran, C., & Ones, D. S. (2000). Measurement error in "Big Five Factors" personality assessment: Reliability generalization across studies and measures. Educational and Psychological Measurement, 60, 224-235.
*Weyers, P., Krebs, H., & Janke, W. (1995). Reliability and construct validity of the German version of Cloninger's Tridimensional Personality Questionnaire. Personality and Individual Differences, 19, 853-861.
Whittington, D. (1998). How well do researchers report their measures? An evaluation of measurement in published educational research. Educational and Psychological Measurement, 58, 21-37.
Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
*Wilson, D. J., & Doolabh, A. (1990). A cross-cultural examination of Howarth's primary factors and Eysenck's secondary factors among Zimbabwean adolescents. Personality and Individual Differences, 11, 657-662.
*Wilson, D. J., & Doolabh, A. (1992). Reliability, factorial validity and equivalence of several forms of the Eysenck Personality Inventory/Questionnaire in Zimbabwe. Personality and Individual Differences, 13, 637-643.
*Wilson, D., & Mutero, C. (1989). Personality concomitants of teacher stress in Zimbabwe. Personality and Individual Differences, 10, 1195-1198.
Yin, P., & Fan, X. (2000). Assessing the reliability of the Beck Depression Inventory scores: Reliability across studies. Educational and Psychological Measurement, 60, 201-223.
