PSYCHOLOGICAL SCIENCE

General Article

THE COMPREHENSIVE SYSTEM FOR THE RORSCHACH: A Critical Examination

By James M. Wood,1 M. Teresa Nezworski,2 and William J. Stejskal3

1Department of Psychology, University of Texas at El Paso; 2School of Human Development, University of Texas at Dallas; and 3Woodbridge Psychological Associates, Woodbridge, Virginia

The Comprehensive System (Exner, 1993) is widely accepted as a reliable and valid approach to Rorschach interpretation. However, the present article calls attention to significant problems with the system. First, contrary to common opinion, the interrater reliability of most scores in the system has never been demonstrated adequately. Second, important scores and indices in the system are of questionable validity. Third, the research base of the system consists mainly of unpublished studies that are often unavailable for examination. Recommendations are made regarding research and clinical use of the Comprehensive System.

By the end of the 1960s, heated controversy had developed among psychologists regarding the Rorschach technique. On the one hand, eminent critics (e.g., Eysenck, 1959; Jensen, 1965; Zubin, Eron, & Schumer, 1965) had identified numerous problems with the Rorschach, including (a) lack of standardized rules for administration and scoring, (b) poor interrater reliability, (c) lack of adequate norms, (d) undemonstrated or weak validity, and (e) susceptibility to situational influences.

On the other hand, defenders of the Rorschach questioned the methodology and clinical relevance of existing research and cited the consensus of clinicians regarding the test's value. There was a feeling that many criticisms of the test were "naive and unjust, often fomented from bias, ignorance, or simply a misunderstanding of the method and the principles that led to its exploration by Rorschach" (Exner, 1993, p. 3).

The controversy remained unresolved until the publication of The Rorschach: A Comprehensive System (TRACS; Exner, 1974). That volume and its subsequent extensions and revisions (Exner, 1978, 1986, 1991, 1993; Exner & Weiner, 1982) appeared to accomplish the remarkable feat of satisfying both sides in the Rorschach controversy. Satisfying the technique's defenders, TRACS presented an approach that preserved and strengthened the long clinical tradition regarding the test. The Comprehensive System for administration, scoring, and interpretation borrowed features from the various Rorschach systems already in use (Exner, 1969, 1993). Furthermore, the Comprehensive System incorporated many interpretive terms congenial to psychodynamic conceptualizations (e.g., dependency, narcissism, flight into fantasy).

Address correspondence to James M. Wood, Department of Psychology, University of Texas at El Paso, El Paso, TX 79968.

At the same time, TRACS provided enough standardization and empirical data to satisfy the most exacting of scientific critics. Over a period of 20 years, the various editions of TRACS (a) established detailed, objective rules for administration, scoring, and interpretation of the Rorschach; (b) catalogued extensive data regarding the interrater reliability of the scales; (c) provided norms and reference data for numerous psychiatric and nonpsychiatric groups, including children; and (d) cited numerous empirical studies to support the validity of Comprehensive System scores.

Thanks to these accomplishments, both sides in the Rorschach controversy appeared to have "won." With the Comprehensive System, the clinicians had their test and the empiricists their data. As Anastasi (1988) commented, "The availability of this system, together with the research completed thus far, has injected new life into the Rorschach as a potential psychometric instrument" (p. 599).

Perhaps because the Comprehensive System represents a "middle ground" (Groth-Marnat, 1990, p. 281), however, it has been scrutinized less carefully than might have been expected for so popular an assessment procedure. The present article focuses attention on three specific issues regarding the Comprehensive System: interrater reliability, validity, and nature of the research base.

VOL. 7, NO. 1, JANUARY 1996 Copyright © 1996 American Psychological Society



INTERRATER RELIABILITY

Interrater Reliability in TRACS

Standard textbooks on clinical assessment have commented favorably on the interrater reliability of the Comprehensive System (Erdberg, 1985; Groth-Marnat, 1990; Kaplan & Saccuzzo, 1993). For example, Groth-Marnat (1990) reported:

During the development of Exner's Comprehensive System, Exner gave particular attention to reliability in developing his different scoring categories. No category was included unless it achieved a minimum .85 level between different scorers. . . . (p. 279)

By incorporating detailed, explicit scoring rules, the Comprehensive System appears to have overcome the problems of rater disagreement that plagued earlier applications of the Rorschach.

The positive opinions of commentators are supported by numerous tables and discussions in TRACS (Exner, 1993) that address interrater reliability in the form of "percentage of agreement." For example, percentages of agreement for 32 scores in two different studies are provided in a table titled "Percentage of Coder Agreement for Two Reliability Studies" (Exner, 1993, p. 138). The percentages appear quite high, ranging from 88% to 99%.

However, percentage of agreement has long been recognized as a potentially inadequate and misleading measure of reliability (Cohen, 1960, 1968; Fleiss, 1981; Light, 1971; see also Jensen, 1965). The problem is that percentage of agreement makes no adjustment for agreement by chance and can therefore yield inflated estimates of true consistency among raters. In some circumstances, raters can achieve a very high percentage of agreement even if they score completely at random.

As an example, consider m (i.e., the perceived movement of an inanimate object, such as "a swaying tree"), which occurs in about 5% of Rorschach responses (Exner, 1993, p. 260). Imagine that two raters independently rate a large number of Rorschach protocols and randomly assign a score of m to 5% of responses. Even though the two raters score at random, they will agree that m is present in about .0025 (.05 × .05) of responses and absent in about .9025 (.95 × .95). By chance alone, therefore, the total percentage of agreement between the two raters will be .9050 (.0025 + .9025). This 90% agreement signifies random rather than good interrater reliability.

It is instructive to reexamine the TRACS reliability table that was just cited (Exner, 1993, p. 138). The table shows that 20 coders achieved 93% agreement, and 15 raters achieved 95% agreement, when scoring m. These percentages are only somewhat higher than the 90% that two coders would be expected to achieve by chance.
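The chance-agreement arithmetic in this example can be checked with a short script. The following is a minimal sketch, not an analysis from TRACS: it assumes two raters who each assign m at random to 5% of responses, and it uses Cohen's (1960) kappa to rescale an observed percentage of agreement against that chance level. The function names are chosen for illustration only.

```python
def chance_agreement(base_rate: float) -> float:
    """Expected agreement for two raters who each assign the score at random
    to `base_rate` of responses, independently of one another."""
    both_present = base_rate * base_rate
    both_absent = (1.0 - base_rate) * (1.0 - base_rate)
    return both_present + both_absent


def cohen_kappa(observed: float, expected: float) -> float:
    """Cohen's kappa: agreement beyond chance, rescaled so that 0 means
    chance-level agreement and 1 means perfect agreement."""
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    p_e = chance_agreement(0.05)  # assumed 5% base rate for m, as in the text
    print(f"chance agreement: {p_e:.4f}")
    for observed in (0.93, 0.95):  # agreement percentages reported for m
        print(f"observed {observed:.2f} -> kappa {cohen_kappa(observed, p_e):.2f}")
```

Under these assumptions, agreement of 93% and 95% corresponds to kappa values of roughly .26 and .47. These figures are illustrative only, because the actual base rates used by the TRACS coders are not reported.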

This example illustrates why percentage of agreement is an unacceptable measure of interrater reliability. Yet it is the only measure provided in TRACS (Exner, 1993) for most scores. Despite the considerable data regarding interrater reliability in TRACS, therefore, the statistical approach is inadequate and potentially misleading. If the reliability of Comprehensive System scores is to be evaluated properly, appropriate statistics must be provided, such as kappa, phi, Spearman's rho, or Pearson's r (Fleiss, 1981; Nunnally & Bernstein, 1994).

Three additional difficulties may be noted regarding the treatment of interrater reliability in TRACS. First, it is unclear how percentage of agreement was calculated for many Comprehensive System scores. Although two computational methods are described (Exner, 1991, pp. 459-460), neither is appropriate for calculating percentage of agreement for individual scores such as m.

Second, the percentages of agreement reported in TRACS are primarily for individual responses, not total scores. For example, TRACS reports that raters achieved 93% to 95% agreement when scoring individual responses for m. However, the total number of m responses in a protocol appears to be the most clinically relevant figure, insofar as it serves as the basis for Comprehensive System interpretations (Exner, 1991, p. 169). No reliability information regarding total m is provided in TRACS. The same is true for many other Comprehensive System scores.

Third, for a number of important scores, TRACS (Exner, 1991, 1993) provides no measure of interrater reliability, not even percentage of agreement. For example, no reliability information is provided for the Suicide Constellation, Schizophrenia Index, Depression Index, or most individual content categories. The problems noted here are particularly important because the Standards for Educational and Psychological Testing of the American Psychological Association (1985, p. 20) state that the reliability of test scores should be fully reported.

Field Interrater Reliability

A distinction may be made between a test's ideal interrater reliability and its field interrater reliability. The ideal reliability is demonstrated by highly trained experts who are performing at their best under optimal conditions. By contrast, field reliability is demonstrated by practitioners who are performing under the time constraints and conditions typical of their work.

The ideal and field reliabilities of a test can be substantially different, as illustrated by recent controversies in forensic medicine regarding the DNA test for tissue samples. Although reliable when carried out by scientists with appropriate training and equipment, the DNA test can be unreliable when performed under the conditions of some commercial laboratories (Annas, 1992).

The field reliability of the Comprehensive System has received little attention from researchers. However, one study (Exner, 1988) seems relevant. Quizzes were administered to more than 300 alumni of the Rorschach Workshops to evaluate scoring accuracy. The results were "disconcerting" (Exner, 1988, p. 5). Error rates ranged from around 15% for Determinants to about 27% for Special Scores. Exner commented:

The Rorschach is a good test from which to derive information about personality organization and functioning and it is reasonably easy to interpret, but if the bulk of the interpretation is generated from a Structural Summary that has average error rates similar to those in Table 1, the results will be misleading, and even totally wrong in some cases. (p. 5)

Surprisingly, the results of this study have not been discussed in subsequent editions of TRACS (Exner, 1991, 1993).

Reliability of Administration and Recording

Although the present discussion has focused on scoring, the reliability of test administration and recording also merits comment. Before the advent of the Comprehensive System, researchers found that Rorschach scores could be inadvertently contaminated by situational factors (Masling, 1960). For example, the interpersonal style of the test administrator could influence the subject's responses to the cards (Lord, 1950).

To guard against such extraneous influences, the Comprehensive System (Exner, 1993) requires that the test be administered according to narrowly defined procedures and that the administrator write down the subject's responses verbatim. However, the degree to which examiners using the Comprehensive System in clinical settings actually adhere to the standardized administration procedures, or are capable of recording subjects' responses verbatim, has not been studied empirically.

VALIDITY

Strictly speaking, it is imprecise to ask if the Comprehensive System for the Rorschach is valid. The system yields 54 percentages and ratios in a Structural Summary, in addition to numerous other scores, and the validity of each must be established separately. No single article can examine the validity of all scores in the system. Therefore, the present discussion focuses on a subset of scores that, according to TRACS, are related to clinically important phenomena, such as psychological symptoms or disorders, level of functioning, or level of stress. Because such scores can influence decision making in important contexts (e.g., clinical and forensic settings), their validity is particularly important.

Exner (1991) has warned that "the Rorschach interpreter usually should not anticipate the discovery of direct diagnostic evidence in the data of the test" (p. 129). However, the Comprehensive System includes several scores (e.g., Egocentricity Index, Adjusted D, Depression Index, Suicide Constellation) that bear directly on clinical decision making. Empirical evidence regarding the validity of these scores is often scant or negative.

The Egocentricity Index and Reflections

In the Comprehensive System, a Rorschach response is scored as a pair if it refers to two of the same object (two bears, two lobsters) and as a reflection if it refers to a mirror image or reflection ("trees reflected in a lake"). Scores on the Egocentricity Index (EGOI) are derived by multiplying the number of reflection responses by 3, adding the number of pair responses, and dividing by the total number of Rorschach responses (Exner, 1974, 1993).
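The EGOI computation just described can be written out as a short function. This is a minimal sketch of the formula as stated in the text; the function name, argument names, and the example counts are hypothetical and are not taken from TRACS.

```python
def egocentricity_index(reflections: int, pairs: int, total_responses: int) -> float:
    """EGOI as described in the text: (3 * reflections + pairs) / R."""
    if total_responses <= 0:
        raise ValueError("total_responses must be positive")
    return (3 * reflections + pairs) / total_responses


if __name__ == "__main__":
    # Hypothetical protocol: 1 reflection, 6 pairs, 20 responses in all.
    print(egocentricity_index(reflections=1, pairs=6, total_responses=20))
```

Because each reflection is weighted three times as heavily as a pair, a single reflection response in a short protocol can move the index noticeably.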

According to TRACS (Exner, 1974, 1993), the EGOI and reflection responses are indicators of self-esteem, self-focus, and narcissism. In a review of empirical studies, however, we (Nezworski & Wood, 1995) concluded that the EGOI and reflection responses are probably unrelated to self-focus or self-esteem, and that their relationship to narcissistic personality disorder has not been established. The most recent edition of TRACS (Exner, 1993) fails to cite numerous negative research findings regarding the EGOI.

D and Adjusted D

Scores for D and Adjusted D are derived by an algorithm from the number of Rorschach responses involving movement, color, achromatic color, texture, shading, and vista. According to TRACS (Exner, 1993), D and Adjusted D measure the presence of situational stressors and the individual's ability to cope with them. However, in a recent review, Kleiger (1992, p. 293; but see Exner, 1992b) noted "two broad problem areas" with research regarding D and Adjusted D. First, about half of the empirical studies on major structural concepts in the Comprehensive System are unpublished reports and have not appeared in refereed journals. Second, the findings of the published studies appear "equivocal."

The research reviewed by Kleiger (1992, pp. 293-294) included treatment outcome studies, laboratory studies, and normative data on children (Exner, 1974, 1978, 1986; Exner & Bryant, 1975, 1976; Weiner & Exner, 1991; Wiener-Levy & Exner, 1981). Kleiger noted that some data (Exner, 1974) seemed to be "incomplete" or "described in a confusing manner," making it "difficult for the reader to evaluate adequately the inferences drawn from the research" (p. 293). In addition, Kleiger found that the results of a published study (Wiener-Levy & Exner, 1981) had been interpreted as support for the validity of D and Adjusted D, but in fact contradicted earlier findings (Exner & Bryant, 1975, 1976).

The Depression Index

Like other indices included in the Comprehensive System, the Depression Index (DEPI) is derived by combining scores on several Rorschach variables according to an algorithm. The DEPI has existed in two versions. The first (Exner, 1986) was found to miss a large proportion of depressed patients (Exner, 1991, pp. 22-26; Viglione, Brager, & Haller, 1988). The DEPI has therefore been revised, and the second version appears in the most recent edition of TRACS, Volume 2 (Exner, 1991).

According to data from normative and reference samples (Exner, 1993, pp. 260-264, 309-311), the new DEPI is both sensitive and specific. About 75% of inpatient depressives, but only 3% of nonpatient adults, have scores of 5 or higher on the DEPI. By contrast, other researchers have generally failed to find a relationship of the new DEPI to diagnoses or self-report measures of depression among child, adolescent, or adult patients (Ball, Archer, Gordon, & French, 1991; Meyer, 1993). Additionally, in a dissertation study of 109 adult inpatients, Sells (1990/1991) found that scores on neither the original nor the new DEPI were significantly correlated either with diagnoses of depression, assigned according to the third revised edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R; American Psychiatric Association, 1987), or with scores on Scale 2 (Depression) of the Minnesota Multiphasic Personality Inventory (MMPI).
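The sensitivity and specificity implied by the TRACS percentages can be made explicit with a small calculation. The sketch below is illustrative only: the group sizes of 100 are hypothetical (they are not the TRACS sample sizes), and the function names are chosen for this example.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of truly depressed subjects who score at or above the cutoff."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of nondepressed subjects who score below the cutoff."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical groups of 100 each, chosen to match the reported percentages:
    # 75 of 100 depressed inpatients and 3 of 100 nonpatients score 5 or higher.
    print("sensitivity:", sensitivity(true_positives=75, false_negatives=25))
    print("specificity:", specificity(true_negatives=97, false_positives=3))
```

These values describe performance in the reference samples only. Whether they hold in new samples is a separate, empirical question.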

The question arises why the new DEPI performed well with the depressed reference sample in TRACS but poorly in replications. Three possibilities may be identified. First, it is not clear that the depressed patients described in TRACS were diagnosed accurately. TRACS (Exner, 1991, 1993) does not specify the diagnostic criteria or interview procedures used to identify depressed patients in the reference sample, although the "DSM-III-SADS" [sic] is mentioned at one point in TRACS (Exner, 1991, p. 23).

Second, and perhaps more important, criterion contamination may have influenced diagnoses of depressed patients in the TRACS reference sample. Specifically, TRACS (Exner, 1991, 1993) does not indicate that diagnoses of depression were made by judges blind to patients' Rorschach scores. Some patients (particularly those with ambiguous symptoms) may have been classified as depressed on the basis of their Rorschachs, creating a spuriously high correlation between diagnoses and DEPI scores.

Third, the DEPI appears to have fallen victim to an old nemesis of empirically derived indices: shrinkage during cross-validation. The DEPI was developed using actuarial methods (Exner, 1991). That is, variables that discriminated depressed from nondepressed subjects in the Rorschach subject pool were identified empirically. When variables are selected in this manner, they normally show less predictive power when applied to new groups, a phenomenon known as shrinkage (Meier, 1994, p. 116; Wiggins, 1988, pp. 46-49). The proportion of hits may be high in the original group of subjects (the derivation sample) but dismally low in the new group (the cross-validation sample). To the disappointment of many a researcher who has used the actuarial method, variables that appear promising in a derivation sample may have no predictive power in cross-validation groups.

The fact that the DEPI predicts depression among subjects in the Rorschach subject pool is to be expected. The actuarial approach ensures such a result in a derivation sample. The critical question is whether the DEPI can predict depression in new samples. The results of Ball et al. (1991), Meyer (1993), and Sells (1990/1991) suggest that it cannot. The shrinkage of the DEPI in cross-validation may be so great that the scale lacks meaningful predictive power.
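Shrinkage can be demonstrated with a small simulation, sketched here under strong and deliberately artificial assumptions. All candidate variables are pure noise, the sample sizes and number of variables are arbitrary, and the selection rule (keep the variables with the largest group differences and sum them with unit weights) is a simplified stand-in, not the procedure used to construct the DEPI. All names are hypothetical.

```python
import random
from statistics import mean


def draw_groups(rng, n_per_group, n_vars):
    """Return (cases, controls); every variable is N(0, 1) noise in both groups."""
    def rows():
        return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_per_group)]
    return rows(), rows()


def composite(row, chosen):
    """Unit-weighted sum of the selected variables, signed toward the case group."""
    return sum(sign * row[j] for j, sign in chosen)


def hit_rate(cases, controls, chosen, cutoff):
    """Proportion of all subjects that the cutoff classifies correctly."""
    hits = sum(composite(r, chosen) > cutoff for r in cases)
    hits += sum(composite(r, chosen) <= cutoff for r in controls)
    return hits / (len(cases) + len(controls))


def simulate_shrinkage(seed, n_per_group=50, n_vars=100, n_keep=10):
    """Build an index on one noise sample, then apply it to a fresh one."""
    rng = random.Random(seed)
    cases, controls = draw_groups(rng, n_per_group, n_vars)

    # Derivation: keep the variables whose group means differ most.
    diffs = [mean(r[j] for r in cases) - mean(r[j] for r in controls)
             for j in range(n_vars)]
    top = sorted(range(n_vars), key=lambda j: abs(diffs[j]), reverse=True)[:n_keep]
    chosen = [(j, 1 if diffs[j] > 0 else -1) for j in top]

    # Cutoff: midpoint between the two groups' mean composite scores.
    cutoff = (mean(composite(r, chosen) for r in cases)
              + mean(composite(r, chosen) for r in controls)) / 2.0

    derivation = hit_rate(cases, controls, chosen, cutoff)
    new_cases, new_controls = draw_groups(rng, n_per_group, n_vars)
    validation = hit_rate(new_cases, new_controls, chosen, cutoff)
    return derivation, validation


if __name__ == "__main__":
    for seed in range(3):
        d, v = simulate_shrinkage(seed)
        print(f"seed {seed}: derivation {d:.2f}, cross-validation {v:.2f}")
```

Because no variable in this simulation is truly related to group membership, the derivation hit rate typically lands well above 50% purely through selection, while the cross-validation hit rate hovers near 50%.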

The Suicide Constellation

The Suicide Constellation (S-CON) was also developed using actuarial methods and subsequently revised. The Rorschach protocols of patients who had committed suicide and of control subjects were compared (Exner & Wylie, 1977). A cluster of 11 variables, the original S-CON, was found to discriminate between suicides and other patients, with an optimal cutoff score of 8 or above.

During cross-validation, shrinkage in an actuarial scale's predictive power is likely to be substantial if the original sample of subjects was small and the number of variables considered for inclusion in the scale was large. Because the original study of the S-CON (Exner & Wylie, 1977) involved a small sample of subjects (59 suicides) and considered a large number of variables (apparently more than 100), substantial shrinkage was to be expected when the scale was cross-validated on a new sample.

A cross-validation of the S-CON has been reported (Exner, 1986, pp. 411-416; 1993, pp. 342-345), but surprisingly, no shrinkage occurred. In the original sample (Exner & Wylie, 1977), 75% of suicide patients and 0% of nonpatients had S-CON scores of 8 or above. In the cross-validation sample, the corresponding figures were 74% and 0%. Contrary to what might have been expected, sensitivity and specificity were undiminished.

In contrast to the excellent performance of the S-CON as reported in TRACS (Exner, 1986, 1993), S.K. Eyman and J.R. Eyman (1987; see also J.R. Eyman & S.K. Eyman, 1992) found that in a sample of 50 patients who committed suicide, only 1 had an S-CON score of 8 or above. They (J.R. Eyman & S.K. Eyman, 1992) concluded that the S-CON was "ineffective in predicting suicidal behavior" (p. 189).

These findings are not definitive: Rorschachs were scored according to the Comprehensive System but administered according to the system of Rapaport, Gill, and Schafer (1945), with inquiry immediately following each card. Furthermore, the length of time between testing and subjects' suicide varied from less than 90 days to more than 2 years. Thus, although the results of S.K. Eyman and J.R. Eyman (1987) raise doubts concerning the validity of the S-CON, further research is needed to resolve the issue. It should be added that a second version of the S-CON has been developed, based on minor modifications of the original scale (Exner, 1993). However, the predictive power of the new version has not yet been demonstrated in a cross-validation study.

The Influence of Response Frequency

As has long been recognized, many Rorschach scores are correlated with R, the number of responses made by the subject (Fiske & Baughman, 1953; Meyer, 1992). Because R is influenced by intelligence, educational level, and social class, its influence on other scores is problematic (see discussion in Anastasi, 1988). Some commentators (e.g., Groth-Marnat, 1984) believe that the Comprehensive System has eliminated this problem by adjusting for R or using ratios.

In fact, however, many of the clinical scores and indices of the system are unadjusted. For example, the S-CON, Schizophrenia Index (SCZI), DEPI, Coping Deficit Index (CDI), Hypervigilance Index (HVI), and Obsessive Style Index (OBS) are either unadjusted or only partially adjusted for R.

The most thorough investigations of this topic are those of Meyer (1992, 1993; but see Exner, 1992a). Meyer (1993) found that among psychiatric patients, R was significantly correlated with the S-CON, SCZI, DEPI, CDI, HVI, and OBS. The size of the correlations ranged from .25 for the S-CON to .60 for the HVI. In addition, Meyer concluded that the indices might be valid for some values of R but not others. For example, the DEPI correlated significantly with depression scales of the MMPI-2 among patients with high R, but not among those with average or low R, or among the patient sample as a whole.

Interpretations Based on a Single Response

In a recent refinement (Exner, 1991), the Comprehensive System provides interpretive statements for particular test scores or test score combinations. In several instances, these statements are based on a single test response:

• If even a single reflection response appears in a Rorschach protocol, an interpretive statement of the Comprehensive System indicates that "a nuclear element in the subject's self-image is a narcissistic-like feature that includes a marked tendency to overvalue personal worth" (Exner, 1991, p. 173).

• A single Human Experience response ("two people who are deeply in love, gazing longingly at each other") is interpreted to mean that the subject "attempts to deal with issues of self-image and/or self-value in an overly intellectualized manner that tends to ignore reality" and is likely to have "ideational impulse control problems" (Exner, 1991, p. 176).

• A single Food response ("a Thanksgiving turkey already eaten") is interpreted to mean that the subject "can be expected to manifest many more dependency behaviors than usually expected. . . ." If the subject is also "passive," then "it is reasonable to conclude that a passive-dependent feature is an important core component in the personality structure of the subject" (Exner, 1991, p. 184).

There are two reasons to question Comprehensive System interpretations that identify a "core component" of personality on the basis of a single test response. First, any single-sign approach to personality assessment is limited by that sign's reliability. Because a single behavioral indicator or test response is unlikely to have high reliability, most diagnostic and assessment approaches employ multiple indicators or signs. For example, in the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994), a diagnosis of narcissistic personality disorder or dependent personality disorder requires the presence of five criteria. The DSM-IV implicitly recognizes the unreliability of any individual sign, and therefore requires that diagnoses be based on multiple signs. By contrast, the Comprehensive System identifies narcissism as a "nuclear element" or dependency as a "core component" of personality on the basis of a single reflection or Food response.




Second, if clinical decisions about a subject's personality are to be based on a single test response, that response should have high validity. However, the single-sign interpretations of the Comprehensive System often lack well-documented validity:

• As already discussed, we (Nezworski & Wood, 1995) concluded that the reflection response is probably unrelated to self-esteem or self-focus.

• TRACS (Exner, 1991, 1993) provides no empirical evidence that a single Human Experience response indicates an "overly intellectualized" personality style or "ideational impulse control problems."

• The most recent edition of TRACS (Exner, 1993, pp. 438-439) reports three studies that have found a relationship between Food responses and dependency: One is an unpublished study of the Rorschach Workshops, and the remaining two are described without scholarly citation. The discussion in TRACS is brief: Two of the studies are each summarized in a single paragraph, and the third is summarized in a single sentence. No statistical tests are reported. An earlier edition of TRACS (Exner, 1974, p. 303) discussed the relationship between Food responses and dependency, cited two studies with contradictory findings, and concluded that the evidence was "at best, limited."

Incremental Validity

Although the present discussion has focused on evidence of validity, the issue of incremental validity merits comment (Sechrest, 1963; Wiggins, 1988, pp. 250-251). Specifically, do Comprehensive System scores contribute information relevant to clinical decision making beyond what can be gained from a diagnostic interview and consideration of MMPI-2 scores?

For example, cross-validation studies (Archer & Gordon, 1988; Meyer, 1993) have confirmed that scores on the SCZI of the Comprehensive System (Exner, 1993) are related to diagnoses of schizophrenia among adult and adolescent psychiatric patients. However, Archer and Gordon (1988, p. 285) found that when optimal cutoff points were used, Scale 8 (Schizophrenia) of the MMPI classified schizophrenic and nonschizophrenic adolescents more accurately than did the SCZI. Furthermore, classifications based on Scale 8 and the SCZI combined were not significantly more accurate than classifications based on Scale 8 alone.

The study by Archer and Gordon (1988) is not cited here as proof that the SCZI lacks incremental validity. More research is necessary before the issue can be resolved one way or the other (Archer & Krishnamurthy, 1993). However, the example illustrates the point that even valid indicators, such as the SCZI, may sometimes add very little to existing sources of information. In the future, research may more thoroughly examine the incremental validity of Comprehensive System scores (Archer & Krishnamurthy, 1993).

RESEARCH BASE OF THE COMPREHENSIVE SYSTEM

The various editions of TRACS (Exner, 1974, 1978, 1986, 1991, 1993; Exner & Weiner, 1982) provide numerous citations to unpublished studies of the Rorschach Workshops. For example, the various editions of TRACS cite 156 of Exner's works. Twenty-seven of these (17%) have been published in peer-reviewed journals, whereas ninety-nine (63%) are unpublished studies of the Rorschach Workshops. It may be said that the Workshops Studies constitute the broad empirical foundation of the Comprehensive System.

In response to queries, the Rorschach Workshops has informed us that more than 1,000 Workshops Studies were undertaken from 1968 to 1990. Thus, the studies cited in TRACS constitute less than 10% of the total. It is worth noting that 1,000 studies, each examining a single outcome variable, would be expected to yield about 50 results significant at the .05 level by chance alone.
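The figure of 50 chance findings follows from simple arithmetic, sketched below. The calculation assumes the conventional .05 significance level and assumes that no true effect exists in any of the studies; both are assumptions made for illustration, and the function name is hypothetical.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha


if __name__ == "__main__":
    # 1,000 studies, one outcome variable each, tested at the .05 level.
    print(expected_false_positives(1000))
```

If each study examined several outcome variables rather than one, the expected number of chance findings would be correspondingly larger.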

Many readers of TRACS are probably under the impression that the Workshops Studies are actual documents that can be examined by other scholars. However, this impression is often mistaken. In preparation for writing this article, we requested 23 of the Workshops Studies cited in TRACS. Letters from the Rorschach Workshops informed us that some of the Workshops Studies were not in their files. The methods and results of the remaining studies either had not been formally written up or could not be released. We were informed that the Rorschach Workshops could provide raw data related to specific questions, but that we might have to pay for computer costs.

Based on the response from the Rorschach Workshops, we arrived at three conclusions: First, most of the Workshops Studies cited in TRACS are apparently research projects, not written documents. Second, the methods and results of many Workshops Studies are unavailable for public examination, except for summaries in articles and books. Third, scholars who seek to examine the studies often cannot obtain reports or quantitative analyses. Raw data may be obtained, but the expense of doing so may be substantial.

DISCUSSION

Current acceptance of the Comprehensive System seems to be based on several implicit and explicit assumptions:


1. In contrast to earlier Rorschach systems, the Comprehensive System has demonstrated a high level of interrater reliability.

2. The clinical interpretations generated by the Comprehensive System are consistent with research findings and have been well validated.

3. The various indices of the Comprehensive System have performed well in cross-validation samples.

4. The research base of the Comprehensive System is well documented and has been scrutinized and confirmed by independent scholars.

As the present article indicates, these assumptions are mistaken. Basic issues regarding the reliability and validity of the Comprehensive System have not been resolved. Clinical interpretations generated by the system are often either inadequately supported or inconsistent with research findings. In addition, the empirical underpinnings of the Comprehensive System are themselves in doubt. In particular, the unpublished Rorschach Workshops Studies, which constitute the main empirical support for the system, are often unavailable for examination and review.

Although the present discussion has focused on a limited number of issues, the problems identified are so fundamental as to raise questions about the Comprehensive System as a whole. We suggest that psychologists thoughtfully review the relevant scientific literature before relying on a particular Comprehensive System score or index for clinical assessment. Ethical and practical considerations (American Psychological Association, 1985) discourage reliance on test scores whose reliability or validity has not been established, particularly if the resulting decisions may have a major impact on clients' lives (e.g., in clinical or forensic settings).

In our opinion, there is a need for examination of the Comprehensive System by researchers. Fundamental issues of reliability and validity are yet to be resolved.

• The interrater reliability of the various Comprehensive System scores needs to be studied under both ideal and field conditions. Also needed are field studies of the administration, scoring, and recording practices of psychologists who use the Comprehensive System.

• In the future, the validity of scores in the Comprehensive System should be tested rigorously rather than assumed. The present article has focused on clinical scores. Similar considerations apply to the validation of nonclinical scores. Consistent with our earlier conclusions (Nezworski & Wood, 1995), we suggest that clinical validation studies (a) use well-defined and rigorous diagnostic criteria (e.g., from structured interviews based on the DSM-III-R or DSM-IV), (b) ensure that diagnosticians are blinded to Rorschach results, (c) use other tests in addition to the Rorschach to measure relevant constructs, (d) examine the relationship between Rorschach scores and ecologically valid, real-world behaviors, and (e) report findings thoroughly and completely, including measures of diagnostic performance (see Kessel & Zimmerman, 1993).
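The two recommendations above can be made concrete with a small Python sketch. It is an illustration only, not a procedure taken from the Comprehensive System or from the studies reviewed here. It assumes that interrater reliability is summarized with one chance-corrected index, Cohen's (1960) kappa, alongside raw percentage agreement, and that "measures of diagnostic performance" refers to sensitivity, specificity, and positive and negative predictive value. All function names and all data are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of responses on which two raters assign the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement for nominal codes (Cohen, 1960)."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    p_chance = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one code only")
    return (p_observed - p_chance) / (1.0 - p_chance)


def diagnostic_performance(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2 x 2 table of counts."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    return {
        "sensitivity": tp / (tp + fn),  # diagnosed cases flagged by the index
        "specificity": tn / (tn + fp),  # non-cases not flagged by the index
        "ppv": tp / (tp + fp),          # flagged cases that carry the diagnosis
        "npv": tn / (tn + fn),          # unflagged cases without the diagnosis
    }


# Hypothetical data: two raters code a rare score as present (1) or absent (0)
# on ten responses and never agree on a "present" code.
rater_a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
rater_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
agreement = percent_agreement(rater_a, rater_b)  # 0.8
kappa = cohens_kappa(rater_a, rater_b)           # about -0.11

# Hypothetical 2 x 2 table: an index checked against blind diagnoses
# in 100 patients, 10 of whom carry the diagnosis.
performance = diagnostic_performance(tp=8, fp=18, fn=2, tn=72)
```

In this hypothetical case the raters agree on 80% of responses, yet kappa falls slightly below zero because nearly all of the agreement concerns the absence of a rare code. Likewise, an index with sensitivity and specificity of .80 yields a positive predictive value of only about .31 when the diagnosis has a base rate of 10%, which is why complete reporting of the full table matters.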

Acknowledgments: We gratefully acknowledge Richard Bootzin, Jane Bretschger, Russell Clark, Robyn Dawes, Lynette Heslet, Pamela McCauley, Lee Sechrest, Leonore Simon, and Randy Whitworth for their assistance and comments on this article.

REFERENCES

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

American Psychological Association. (1985). Standards for educational and psychological testing. Washington, DC: Author.

Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.

Annas, G.J. (1992). Setting standards for the use of DNA-typing results in the courtroom: The state of the art. New England Journal of Medicine, 326, 1641-1644.

Archer, R.P., & Gordon, R.A. (1988). MMPI and Rorschach indices of schizophrenic and depressive diagnoses among adolescent inpatients. Journal of Personality Assessment, 52, 276-287.

Archer, R.P., & Krishnamurthy, R. (1993). Combining the Rorschach and the MMPI in the assessment of adolescents. Journal of Personality Assessment, 60, 132-140.

Ball, J.D., Archer, R.P., Gordon, R.A., & French, J. (1991). Rorschach depression indices with children and adolescents: Concurrent validity findings. Journal of Personality Assessment, 57, 465-476.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.

Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.

Erdberg, P. (1985). The Rorschach. In C.S. Newmark (Ed.), Major psychological assessment instruments (pp. 65-88). Boston: Allyn and Bacon.

Exner, J.E. (1969). The Rorschach systems. New York: Grune and Stratton.

Exner, J.E. (1974). The Rorschach: A comprehensive system: Vol. 1. New York: Wiley.

Exner, J.E. (1978). The Rorschach: A comprehensive system: Vol. 2. Current research and advanced interpretation. New York: Wiley.

Exner, J.E. (1986). The Rorschach: A comprehensive system: Vol. 1. Basic foundations (2nd ed.). New York: Wiley.

Exner, J.E. (1988). Scoring issues. 1988 Alumni Newsletter, pp. 4-8.

Exner, J.E. (1991). The Rorschach: A comprehensive system: Vol. 2. Interpretation (2nd ed.). New York: Wiley.

Exner, J.E. (1992a). R in Rorschach research: A ghost revisited. Journal of Personality Assessment, 58, 245-251.

Exner, J.E. (1992b). Some comments on "A conceptual critique of the EA:es comparison in the Comprehensive Rorschach System." Psychological Assessment, 4, 297-300.

Exner, J.E. (1993). The Rorschach: A comprehensive system: Vol. 1. Basic foundations (3rd ed.). New York: Wiley.

Exner, J.E., & Bryant, E. (1975). Pursuit motor performance and the Ea-ep relation (Workshops Study No. 222). Unpublished manuscript, Rorschach Workshops, Asheville, NC.

Exner, J.E., & Bryant, E. (1976). Mirror star tracing as related to different Rorschach variables (Workshops Study No. 240). Unpublished manuscript, Rorschach Workshops, Asheville, NC.

Exner, J.E., & Weiner, I.B. (1982). The Rorschach: A comprehensive system: Vol. 3. Assessment of children and adolescents. New York: Wiley.

Exner, J.E., & Wylie, J. (1977). Some Rorschach data concerning suicide. Journal of Personality Assessment, 41, 339-348.

Eyman, J.R., & Eyman, S.K. (1992). Personality assessment in suicide prediction. In R.W. Maris, A.L. Berman, J.T. Maltsberger, & R.I. Yufit (Eds.), Assessment and prediction of suicide (pp. 183-201). New York: Guilford Press.

Eyman, S.K., & Eyman, J.R. (1987, August). An investigation of Exner's Suicide Constellation. Paper presented at the meeting of the American Psychological Association, New York. (Available from J.R. Eyman, Menninger Clinic, Box 829, Topeka, KS 66601-0829)

Eysenck, H.J. (1959). The Rorschach Inkblot Test. In O.K. Buros (Ed.), The fifth mental measurements yearbook (pp. 276-278). Highland Park, NJ: Gryphon Press.

Fiske, D.W., & Baughman, E.E. (1953). Relationships between Rorschach scoring categories and the total number of responses. Journal of Abnormal and Social Psychology, 48, 25-32.

Fleiss, J.L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: Wiley & Sons.

Groth-Marnat, G. (1984). Handbook of psychological assessment. New York: Van Nostrand Reinhold.

Groth-Marnat, G. (1990). Handbook of psychological assessment (2nd ed.). New York: Wiley & Sons.

Jensen, A.R. (1965). The Rorschach Inkblot Test. In O.K. Buros (Ed.), The sixth mental measurements yearbook (pp. 501-509). Highland Park, NJ: Gryphon Press.

Kaplan, R.M., & Saccuzzo, D.P. (1993). Psychological testing (3rd ed.). Pacific Grove, CA: Brooks/Cole.

Kessel, J.B., & Zimmerman, M. (1993). Reporting errors in studies of the diagnostic performance of self-administered questionnaires: Extent of the problem, recommendations for standardized presentation of results, and implications for the peer review process. Psychological Assessment, 5, 395-399.

Kleiger, J.H. (1992). A conceptual critique of the EA:es comparison in the Comprehensive Rorschach System. Psychological Assessment, 4, 288-296.

Light, R.J. (1971). Measures of response agreement for qualitative data: Some generalizations and alternatives. Psychological Bulletin, 76, 365-377.

Lord, E. (1950). Experimentally induced variation in Rorschach performance. Psychological Monographs, 64 (Whole No. 316).

Masling, J. (1960). The influence of situational and interpersonal variables in projective testing. Psychological Bulletin, 57, 65-85.

Meier, S.T. (1994). The chronic crisis in psychological measurement and assessment. San Diego: Academic Press.

Meyer, G.J. (1992). Response frequency problems in the Rorschach: Clinical and research implications with suggestions for the future. Journal of Personality Assessment, 58, 231-244.

Meyer, G.J. (1993). The impact of response frequency on the Rorschach constellation indices and on their validity with diagnostic and MMPI-2 criteria. Journal of Personality Assessment, 60, 153-180.

Nezworski, M.T., & Wood, J.M. (1995). Narcissism in the Comprehensive System for the Rorschach. Clinical Psychology: Science and Practice, 2, 179-199.

Nunnally, J.C., & Bernstein, I.H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Rapaport, D., Gill, M., & Schafer, R. (1945). Diagnostic psychological testing. Chicago: Yearbook Publishers.

Sechrest, L. (1963). Incremental validity: A recommendation. Educational and Psychological Measurement, 23, 153-158.

Sells, J.E. (1991). A validity study of the DEPI index: The Rorschach Comprehensive System (Doctoral dissertation, University of Utah, 1990). Dissertation Abstracts International, 51, 5590B.

Viglione, D.J., Brager, R.C., & Haller, N. (1988). Usefulness of structural Rorschach data in identifying inpatients with depressive symptoms: A preliminary study. Journal of Personality Assessment, 52, 524-529.

Weiner, I.B., & Exner, J.E. (1991). Rorschach changes in long-term and short-term psychotherapy. Journal of Personality Assessment, 56, 453-465.

Wiener-Levy, D., & Exner, J.E. (1981). The Rorschach EA-ep variable as related to persistence in a task frustration situation under feedback conditions. Journal of Personality Assessment, 45, 118-124.

Wiggins, J.S. (1988). Personality and prediction: Principles of personality assessment. Malabar, FL: Krieger.

Zubin, J., Eron, L.D., & Schumer, F. (1965). An experimental approach to projective techniques. New York: Wiley.

(RECEIVED 8/10/94; ACCEPTED 12/U/94)
