
Journal of Vocational Behavior, 5, 31-39 (1974)

Methodological Pitfalls in Predicting Counseling Success1,2

RANDALL PARKER
The University of Texas at Austin

Prediction models employing multiple linear regression of raw scores, multiple linear regression of factor scores, the single best predictor, and a nine-point decision rule index were compared. The subjects were 296 clients undergoing vocational counseling and evaluation. Predictor variables included performance ratings, demographic variables, and WAIS subtest scores; the criterion was employment status upon program completion. The least statistically sophisticated model, employing the single best predictor, was the most successful approach. Considerable shrinkage in predictive power was demonstrated upon cross-validation, particularly for the multiple linear regression of raw scores model, indicating the necessity of cross-validating prediction schemes. Additional suggestions are made to those designing prediction studies.

A number of attempts have been made to devise prediction schemes to be used in screening or placement of clients in various counseling programs (e.g., Ayer, Thoreson, & Butler, 1966; DeMann, 1963; Drasgow & Dreher, 1964; Norris, Marra, & Zadrozny, 1960; Perlman & Hylbert, 1969). Although such studies typically employ sophisticated statistical or configural techniques in developing prediction methods, they frequently fail to compare the developed methods with the base rates. In addition, some studies fail to cross-validate prediction equations, and thus fail to take into account the shrinkage in predictive accuracy that almost always occurs upon cross-validation (see Helmstadter, 1964). The result of these failures is inconsistent findings among studies, or what Coone and Barry (1970) refer to as the “oscillating r phenomenon.” The purpose of this article was to compare four prediction models regarding their relative effectiveness, to demonstrate the effects of shrinkage in predictability upon cross-validation, and finally, to evaluate each of the prediction schemes.

1Requests for reprints should be sent to the author at the University of Texas, 327 Sutton Hall, Austin, Texas 78712.

2Financial support for data gathering was provided by RSA Grants RD-2326-G and RD-2655-G. The assistance of Dr. Dan Bolanovich and Dr. Joe Kunce is gratefully acknowledged. This is a revised version of a paper presented at the APGA Annual Convention in New Orleans, March 1970.


Copyright © 1974 by Academic Press, Inc. All rights of reproduction in any form reserved.


METHODOLOGY

Subjects and procedure. The subjects for this study were 296 disabled clients of the St. Louis Jewish Employment Vocational Service who had been referred for vocational counseling and evaluation, and who had complete data available on the variables of concern. The variables of concern included 27 predictor variables: race, sex, age, education, 13 Wechsler Adult Intelligence Scale (WAIS) subtest and total scores, and ten ratings of workshop performance. The workshop ratings were obtained from workshop counselors and supervisors after the client had completed three weeks of evaluation. Ratings were made on seven-point graphic rating scales in the following ten areas: productivity, ability to get along with others, motivation, ability to follow instructions, punctuality, reliability, judgment, social competence and communication skills, cooperativeness, and dress and appearance.

The criterion variable, employment status, was a seven-categoried variable determined by the counselors after the evaluation program. Category one was assigned to clients who had become competitively employed; category two, to clients who the counselors judged were employable; category three, to those who were judged employable but in need of educational or vocational training; category four, to clients who were judged to need further work adjustment training; category five, to those who were judged as employable only in sheltered settings; category six, to those who required further medical or other rehabilitation services; and category seven, to those who were judged to be unemployable regardless of additional service.

Prediction models. First, the 296 clients were randomly assigned to sample A (n = 151) or sample B (n = 145). Sample A was used to develop the different prediction methods, and sample B was used as a cross-validation sample. Four prediction models were compared. The first model, stepwise multiple linear regression of raw scores, is probably the procedure most frequently employed in prediction studies. Veldman (1967, p. 294) indicates that this “. . . analytic procedure determines a set of weights for the predictor variables . . . which will yield a composite variable . . . that correlates maximally with the criterion variable. . . .” The second model involved the same regression procedure, but applied to varimax-rotated, principal axis factor scores instead of raw scores. Computation of the factor scores involved a factor analysis and other related statistical procedures described fully in Veldman (1967). The third model, the single best predictor, involved identifying the predictor variable that correlated most highly with the criterion and using the values of that variable as the sole predictor. The fourth and final model, the decision rule index, required the formulation of nine decision rules for predicting which clients will become successfully employed. The rules were based on the research reported by Kunce and Worley (1970) and on a personal communication with Kunce. Each prediction scheme will be described more fully later.
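For readers who want a concrete picture of the design, a minimal sketch of the random split is given below. The sample sizes come from the study; the array names (`predictors`, `criterion`) and the use of NumPy are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(1974)  # any seed; fixed here only so the sketch is reproducible

n_clients = 296
order = rng.permutation(n_clients)

# Sample A (n = 151) develops each prediction scheme; sample B (n = 145)
# is reserved for cross-validation, as in the study design.
idx_a, idx_b = order[:151], order[151:]

# predictors: (296, 27) array of the raw predictor variables (hypothetical)
# criterion:  (296,) array of the seven-category employment status (hypothetical)
# X_a, y_a = predictors[idx_a], criterion[idx_a]
# X_b, y_b = predictors[idx_b], criterion[idx_b]
```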


                          Actual Counseling Outcome3

                     Success                        Failure

Hits        Cell A: Predicted success      Cell C: Predicted failure

Misses      Cell B: Predicted failure      Cell D: Predicted success
                    (false positives)

Fig. 1. Hits and misses depicted within the context of predicted and actual outcomes.


The initial step in evaluating the regression of raw scores and regression of factor scores models was to reduce the 27 predictors to a more manageable number by identifying a subset of the “best” predictors. A stepwise multiple correlation analysis (Veldman, 1967) was computed on sample A, employing the 27 aforementioned variables as predictors and the employability status variable as the criterion. Since it had been decided arbitrarily to reduce the original number of predictors by two-thirds, the nine of the 27 predictors that contributed most to the variance of the predictor-criterion relationship were selected (see Table 1).
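The stepwise reduction can be sketched as a greedy forward-selection loop: at each step the predictor that most increases the squared multiple correlation with the criterion is added, and the loop stops after nine variables. This is only an approximation of the routine in Veldman (1967); `X_a` and `y_a` are the assumed sample A arrays from the earlier sketch.

```python
import numpy as np

def r_squared(X, y):
    """Squared multiple correlation of y with the columns of X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, n_keep=9):
    """Greedy forward selection of n_keep predictors by incremental R-squared."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_keep:
        best_j, best_rsq = None, -np.inf
        for j in remaining:
            rsq = r_squared(X[:, selected + [j]], y)
            if rsq > best_rsq:
                best_j, best_rsq = j, rsq
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# nine_best = forward_select(X_a, y_a)   # column indices of the nine retained predictors
```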

The first prediction model employed was the stepwise multiple linear regression technique. The nine selected predictors were used to predict the seven-categoried criterion for sample A. Regression weights and the cutoff score minimizing the number of errors in prediction were determined (Helmstadter, 1964). The predicted criterion score for each subject of samples A and B was computed using the regression weights computed from sample A. By using the cutoff score for sample A and comparing the predicted score with the actual criterion for every subject, the number of hits and misses in prediction was determined for both samples. The number of hits is the number of people who are predicted to succeed and actually do, plus those who are predicted to fail and actually do. The term “misses” refers to the number of people for whom predictions are in error. “False positives” are the subgroup of “misses” who succeeded although failure was predicted (see Fig. 1).

3Those clients who had criterion scores of 1-3 comprised the actual success group (cells A & B). Those with scores of 4-7 made up the actual failure group (cells C & D). The criterion variable was reduced from a seven-point scale to the success-failure dichotomy after the statistical computations were made, to simplify the presentation of the results.
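A sketch of this first model end to end, assuming `X_a`, `y_a`, `X_b`, `y_b` hold the nine selected predictors and the seven-category criterion for samples A and B. Because low criterion categories (1-3) denote success, lower predicted scores are read as predicted success, and the cutoff is the value that minimizes misclassifications in sample A; this is an illustration of the general procedure, not the original program.

```python
import numpy as np

def fit_weights(X, y):
    """Least-squares regression weights (with intercept) for the composite predictor."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def composite(X, coef):
    """Predicted criterion score for each client."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def best_cutoff(scores, actual_success):
    """Cutoff on the composite that minimizes prediction errors in the development sample."""
    best_c, fewest_errors = None, np.inf
    for c in np.unique(scores):
        predicted_success = scores <= c          # low predicted criterion = predicted success
        errors = np.sum(predicted_success != actual_success)
        if errors < fewest_errors:
            best_c, fewest_errors = c, errors
    return best_c

def hits_and_misses(scores, cutoff, actual_success):
    """Hits, misses, and false positives (succeeded although failure was predicted)."""
    predicted_success = scores <= cutoff
    hits = int(np.sum(predicted_success == actual_success))
    false_positives = int(np.sum(~predicted_success & actual_success))
    return hits, len(actual_success) - hits, false_positives

# Develop on sample A, cross-validate on sample B:
# coef = fit_weights(X_a, y_a)
# cutoff = best_cutoff(composite(X_a, coef), y_a <= 3)
# print(hits_and_misses(composite(X_b, coef), cutoff, y_b <= 3))
```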


Another approach, multiple linear regression of factor scores (Darlington, 1968), necessitated a factor analysis of the nine predictors for sample A. Using unities as communality estimates, three factors with eigenvalues greater than 1.0 and accounting for 67.31% of the trace were extracted, and varimax-rotated factor scores were computed for each of the subjects on each of the factors (Veldman, 1967). The three factor scores then became the predictor variables in a multiple linear regression analysis of sample A. The remaining procedure involved determining the regression weights and the cutoff score (Helmstadter, 1964) for sample A, applying these parameters to samples A and B, and counting the number of hits and misses in prediction for both sample A (the initial sample) and sample B (the cross-validation sample).
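A sketch of the factor-score variant under the same assumptions. Using unities as communality estimates makes the extraction equivalent to principal components of the correlation matrix; factors with eigenvalues above 1.0 are kept, varimax-rotated, and scored by the regression method. The varimax routine below is a standard textbook algorithm, not the specific program in Veldman (1967).

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-6):
    """Varimax rotation of a factor loading matrix (standard iterative SVD algorithm)."""
    L = loadings.copy()
    p, k = L.shape
    R = np.eye(k)
    criterion_old = 0.0
    for _ in range(n_iter):
        LR = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p))
        R = u @ vt
        criterion_new = np.sum(s)
        if criterion_new - criterion_old < tol:
            break
        criterion_old = criterion_new
    return L @ R

def rotated_factor_scores(X, min_eigenvalue=1.0):
    """Varimax-rotated factor scores from standardized predictors (unit communalities)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    keep = eigvals[order] > min_eigenvalue        # eigenvalue-greater-than-one rule
    loadings = eigvecs[:, order][:, keep] * np.sqrt(eigvals[order][keep])
    rotated = varimax(loadings)
    weights = np.linalg.solve(corr, rotated)      # regression-method score estimates
    return Z @ weights

# factor_scores_a = rotated_factor_scores(X_a)   # then reuse fit_weights/best_cutoff above
```

The resulting factor scores simply take the place of the nine raw predictors in the same regression-and-cutoff procedure sketched earlier.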

For the third model, the single best predictor, the predictor that correlated most highly with the criterion (the rating of motivation) was used in predicting the employment status criterion. A cutting score which minimized errors in prediction was computed (Helmstadter, 1964) on sample A and used on sample B. The number of hits and misses for both samples was also determined.
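A sketch of the single-best-predictor model, reusing `best_cutoff` and `hits_and_misses` from the earlier sketch. Because the best predictor (the motivation rating) correlates negatively with the criterion, its sign is flipped so that lower scores again mean predicted success; the arrays remain the assumed ones.

```python
import numpy as np

def single_best_predictor(X_a, y_a):
    """Index and direction of the predictor correlating most highly with the criterion."""
    r = np.array([np.corrcoef(X_a[:, j], y_a)[0, 1] for j in range(X_a.shape[1])])
    best = int(np.argmax(np.abs(r)))              # the motivation rating in this study
    return best, np.sign(r[best])                 # negative sign: high rating means success

# best, sign = single_best_predictor(X_a, y_a)
# cutoff = best_cutoff(sign * X_a[:, best], y_a <= 3)
# hits, misses, fp = hits_and_misses(sign * X_b[:, best], cutoff, y_b <= 3)
```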

The last model employed was a simple nine-point index, based on research by Kunce and Worley (1970) and on personal communication with Kunce. The technique involved nine decision rules scored one if true and zero if false. Each client’s total score was the sum of his scores from each of the following items:

1. High school graduate?
2. WAIS Comprehension score greater than 7?
3. WAIS Picture Completion score greater than 7?
4. WAIS Picture Arrangement score greater than 7?
5. Productivity rating greater than 4?
6. Motivation rating greater than 4?
7. Ability to follow instructions rating greater than 4?
8. Cooperativeness rating greater than 4?
9. Dress and appearance rating greater than 4?

Total decision rule scores were computed for each client and the cutoff score minimizing the number of erroneous predictions was determined (Helmstadter, 1964) for sample A and used to determine the number of hits and misses for both samples A and B.
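The scoring of the index reduces to counting satisfied rules; a sketch follows, in which the dictionary keys are illustrative field names for a client record, not the study's actual variable labels.

```python
def nine_point_index(client):
    """Count of decision rules satisfied (each scored 1 if true, 0 if false)."""
    rules = [
        client["high_school_graduate"],
        client["wais_comprehension"] > 7,
        client["wais_picture_completion"] > 7,
        client["wais_picture_arrangement"] > 7,
        client["productivity_rating"] > 4,
        client["motivation_rating"] > 4,
        client["follow_instructions_rating"] > 4,
        client["cooperativeness_rating"] > 4,
        client["dress_appearance_rating"] > 4,
    ]
    return sum(int(bool(rule)) for rule in rules)
```

Higher totals point toward success, so the cutoff search runs in the opposite direction from the regression composites (or the total can simply be negated before reusing the earlier routines).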

Although frequently not considered in prediction studies, the base rates serve an essential function, as Meehl and Rosen (1955) have pointed out. Base rates simply reflect the percent of individuals succeeding (or failing, whichever is larger) in a program. If 60% of the counselees were judged successful in a counseling program, one could predict success for all entrants and be correct 60% of the time. To be of any value, a prediction scheme must improve over the base rates.
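In code, the base-rate check amounts to a single subtraction; a sketch, assuming `y` holds the seven-category criterion and `validity_rate` is a model's percent of correct predictions:

```python
import numpy as np

def base_rate(y):
    """Percent of clients in the larger outcome group (success = categories 1-3)."""
    percent_success = 100.0 * np.mean(y <= 3)
    return max(percent_success, 100.0 - percent_success)

# A scheme earns its keep only if this is positive:
# improvement = validity_rate - base_rate(y)
```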


TABLE 1

The Nine Best Predictors’ Correlations with the Criterion, Cumulative Contribution to the Total Variance, and Factor Loadings on the Three Extracted Factors

Predictor variable                   r4       RSQ5     Factor 1   Factor 2   Factor 3

Motivation                          -.55**    .2983      .86        .00       -.02
Productivity                        -.53**    .3622      .79        .10       -.16
Cooperativeness                     -.51**    .3856      .81       -.05        .02
Education                            .16*     .4015     -.02        .16        .94
WAIS Picture Arrangement             .05      .4068      .02        .72       -.02
WAIS Picture Completion             -.02      .4124     -.05        .85        .04
WAIS Comprehension                   .03      .4198      .11        .77        .17
Ability to follow instructions      -.43**    .4261      .68        .38       -.26
Dress and appearance                -.26**    .4307      .69       -.08        .24

4r represents the predictor-criterion correlations.
5RSQ represents the cumulative value of the squared multiple correlation coefficient, which is also the cumulative proportion of variance accounted for by each predictor.
*p < .05.
**p < .01.

RESULTS

The nine best predictors (see Table 1) were ratings of motivation, productivity, and cooperativeness; level of education; WAIS Picture Arrangement, Picture Completion, and Comprehension; and ratings of the ability to follow instructions and of dress and appearance, in order of their decreasing contribution to the variance of the predictor-criterion relationship. All variables, except the three WAIS subtests, correlated significantly with the employability criterion. A factor analysis of the nine best predictors yielded three factors. Factor 1 had the five ratings loading highly on it and might be termed Work Skills. The three WAIS subtests loaded on Factor 2; this factor could be labeled Intellectual Ability. The third factor had only the level of education loading on it, and thus might simply be termed Educational Level. One could, however, explain the three factors largely by method variance (Jackson, 1969), in that all variables loading on Factor 1 were determined by rating, Factor 2 by testing, and Factor 3 by interview.

The number of hits and misses for the initial and cross-validation samples for each of the four prediction models is presented in Table 2. Most notable is that, for the regression of raw scores model, the number of hits declined substantially (from 127 to 43) from the initial to the cross-validation sample, while the number of false positives sharply increased (from 11 to 60). For the regression of factor scores model the number of hits declined less (from 121 to 98), and the number of false positives more than doubled (from 13 to 33). The remaining models showed less shrinkage in the number of hits and less inflation in the number of false positives. If, in similar situations, one wished to minimize the number of people who would succeed although they are predicted to fail (false positives), he should be wary of the regression of raw scores model.


TABLE 2

Comparison of Prediction Models on the Number of Hits and Misses for Successful and Unsuccessful Clients6

                                                 Initial sample           Cross-validation sample
Model                                          Success   Fail   Total     Success   Fail   Total

Multiple linear regression     Hits               72       55     127        28       15      43
of raw scores                  Misses             11       13      24        60       42     102
                               Total              83       68     151        88       57     145

Multiple linear regression     Hits               70       51     121        55       43      98
of factor scores               Misses             13       17      30        33       14      47
                               Total              83       68     151        88       57     145

Single best predictor          Hits               70       46     116        67       37     104
                               Misses             13       22      35        21       20      41
                               Total              83       68     151        88       57     145

Nine-point index               Hits               70       43     113        62       39     101
                               Misses             13       25      38        26       18      44
                               Total              83       68     151        88       57     145

6Success clients were those who had obtained criterion scores of 1 through 3, i.e., employed competitively or judged employable after evaluation. Unsuccessful clients (fail) were those who had obtained criterion ratings of 4 through 7, i.e., judged unemployable or as needing further rehabilitation services before they could be considered employable.


The most direct comparison of the predictive models’ efficiency is presented in Table 3. The columns depicting Improvement over the base rates upon cross-validation and Shrinkage in predicting power from the initial to the cross-validation sample are particularly revealing. The single best predictor model showed the greatest improvement over the base rates upon cross-validation (11.0%), followed by the nine-point index (9.0%), the regression of factor scores model (6.9%), and the regression of raw scores model (-31.0%). The last model was the only one that failed to improve over base rate prediction. Both the single best predictor and the nine-point index showed the least shrinkage (10.8%) in prediction power upon cross-validation. Shrinkage was somewhat greater for the regression of factor scores model (18.2%), and tremendously large for the regression of raw scores model (60.1%).


TABLE 3

Comparison of Prediction Models on Validity Rate, Base Rate, Improvement over Base Rate, and Shrinkage upon Cross-Validation

                                                               Measure7
Model                             Sample              Validity rate   Base rate   Improvement   Shrinkage

Multiple linear regression        Initial                  84.1          55.0         29.1          -
of raw scores                     Cross-validation         29.7          60.7        -31.0         60.1

Multiple linear regression        Initial                  80.1          55.0         25.1          -
of factor scores                  Cross-validation         67.6          60.7          6.9         18.2

Single best predictor             Initial                  76.8          55.0         21.8          -
                                  Cross-validation         71.7          60.7         11.0         10.8

Nine-point index                  Initial                  74.8          55.0         19.8          -
                                  Cross-validation         69.7          60.7          9.0         10.8

7Validity rate = (# correct predictions/N) × 100. Base rate = (# successful/N) × 100. Improvement = (validity rate) - (base rate). Shrinkage = (initial improvement) - (improvement upon cross-validation). All figures are percentages.

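The Table 3 entries follow directly from the frequencies in Table 2; as a worked example, the single-best-predictor row can be recomputed as follows (counts taken from Table 2).

```python
# Single best predictor, counts taken from Table 2
hits_initial, n_initial, success_initial = 116, 151, 83
hits_cross, n_cross, success_cross = 104, 145, 88

validity_initial = 100 * hits_initial / n_initial        # 76.8
validity_cross   = 100 * hits_cross / n_cross            # 71.7
base_initial     = 100 * success_initial / n_initial     # 55.0
base_cross       = 100 * success_cross / n_cross         # 60.7

improvement_initial = validity_initial - base_initial    # 21.8
improvement_cross   = validity_cross - base_cross        # 11.0
shrinkage           = improvement_initial - improvement_cross   # 10.8
```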

A multiple regression enthusiast might point out that the stepwise linear regression method might have been applied more appropriately: instead of using all nine predictors, only those that added significantly to the predictor-criterion relationship might have been used. This is often not done in practice, and in some cases with justification. Since predictor and criterion measures in the behavioral sciences are usually not perfectly reliable, predictor-criterion relationships may vary considerably from one sample to another; the best predictor in one sample may do poorly in a second sample. Adding predictors beyond those that are significant will frequently improve the prediction upon cross-validation. In this study only the first two predictors were significant at the .05 level. Employing the two-best-predictor regression equation resulted in a 68.3% validity rate upon cross-validation, a 7.6% improvement over the base rates, and an 18.9% shrinkage from the initial to the cross-validation sample. Although this model performed much better than the nine-predictor regression equation, it still fell short of the single best predictor and the nine-point index. Adding a third predictor, and also a fourth, each improved the validity rate slightly; adding the fifth through ninth predictors to the multiple regression equation decreased the validity rate at each step.


DISCUSSION

The results suggest several considerations for those engaging in prediction studies. To evaluate developed prediction schemes, it is necessary to cross-validate them and to measure each scheme’s improvement over the base rates. Stepwise multiple linear regression of raw scores, which is frequently used in prediction studies, may not be the “best” approach, as was the case in this study. This method should be most seriously considered when there are few (10 or fewer) reliable predictors, and where moderate shrinkage upon cross-validation can be tolerated (Baggaley, 1964; Coone & Barry, 1970; Darlington, 1968).

In situations where there are many predictors (more than 10), some of which may be rather unreliable, the stepwise multiple linear regression of factor scores approach should be considered. Here factor analysis serves both to reduce the number of predictor variables and to increase the reliability of the predictors (Darlington, 1968).

The single best predictor approach can perhaps best be used in situations where the predictors tend to intercorrelate highly and also correlate highly with the criterion. This approach should involve little shrinkage in predicting power from the initial to the cross-validation sample, assuming the samples are similar. It is a relatively simple model to employ and might routinely be used, along with the base rates, to evaluate other prediction models.

A simple decision rule index can be developed when theory, previous research, or clinical experience leads to specific decision rules. This procedure may be indicated when curvilinear or other complex relationships exist between the predictors and the criterion. If such a prediction scheme is soundly based on theory or empirical relationships, prediction power may be retained even when it is used with somewhat dissimilar samples (Kunce & Worley, 1970).

In short, when embarking on a prediction study, one should consider several appropriate models, cross-validate the prediction schemes, and determine the schemes’ improvement over the base rates. One would usually select the model that shows the greatest improvement over the base rates upon cross-validation. However, the choice of a model may also depend on the kind of errors made. For instance, one might wish to choose a model that minimizes the number of false positives. This would most likely be the case when selecting a prediction method to screen clients for counseling; one would probably want to limit the number of those who would be screened out although they would improve or be successful in a counseling program. Following these suggestions should result both in the selection of the best prediction model and in more consistent results from one study to another.


REFERENCES


Ayer, M. J., Thoreson, R., & Butler, A. Predicting rehabilitation success with the MMPI and demographic data. Personnel and Guidance Journal, 1966, 44, 631-637.

Baggaley, A. Intermediate correlational methods. New York: John Wiley & Sons, 1964.

Coone, J., & Barry, J. The oscillating r phenomenon in the prediction of rehabilitation outcome. Rehabilitation Research and Practice Review, 1970, 1, 61-71.

Darlington, R. Multiple regression in psychological research and practice. Psychological Bulletin, 1968, 69, 161-182.

DeMann, M. A predictive study of rehabilitation counseling outcomes. Journal of Counseling Psychology, 1963, 10, 340-343.

Drasgow, J., & Dreher, R. Predicting client readiness for training and placement in vocational rehabilitation. Rehabilitation Counseling Bulletin, 1964, 8, 94-98.

Helmstadter, G. Principles of psychological measurement. New York: Appleton-Century-Crofts, 1964.

Jackson, D. Multimethod factor analysis in the evaluation of convergent and discriminant validity. Psychological Bulletin, 1969, 72, 30-49.

Kunce, J., & Worley, B. Simplified prediction of occupational adjustment of distressed clients. Journal of Counseling Psychology, 1970, 17, 326-330.

Meehl, P., & Rosen, A. Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 1955, 52, 194-216.

Norris, F., Marra, J., & Zadrozny, L. Quantitative measurement in the initial screening of rehabilitation potential. Personnel and Guidance Journal, 1960, 39, 262-269.

Perlman, L., & Hylbert, K. Identifying potential dropouts at a rehabilitation center. Rehabilitation Counseling Bulletin, 1969, 13, 217-225.

Veldman, D. Fortran programming for the behavioral sciences. New York: Holt, Rinehart, and Winston, 1967.

Received: July 16, 1973.