Can past academic criteria predict students at risk of future failure?
Medical Education 1988, 22, 94-98
N. C. BOREHAM†, C. RUSSELL‡ & D. G. WASTELL§

†Department of Higher Education, ‡Department of Oral Medicine and §Faculty of Medicine Computational Group, University of Manchester
Summary. A significant minority of medical and dental students fail their undergraduate courses. Early warning systems (EWSs) have been developed in some areas of higher education to identify at-risk students at a stage when remedial action is still possible. An attempt is made to develop an EWS to predict failure in the bacteriology component of the Bachelor of Dental Surgery course at Manchester Dental School. A system based on class tests and previous end-of-year performance is derived and used to predict those students likely to fail or fall in the bottom 20-25% in their finals examination. The predictors are combined by a simple equal weights method, which is found to have the same predictive power as multiple regression. Failure was correctly predicted in 60% of cases, at the expense of 71% false alarms. The high number of false alarms reflects the low failure rate rather than a lack of predictive information. The need for effective cross-validation of EWSs is discussed; many previous studies have not been tested on independent data.
Key words: *education, dental; *educational measurement; educational status; bacteriology/educ; risk; England
Despite the highly selective nature of entry to medical and dental schools in the UK, academic failure remains a problem affecting a significant
Correspondence: Dr D. G. Wastell, Faculty of Medicine Computational Group, University of Manchester, Oxford Road, Manchester M13 9PT, UK.
minority of students. Undoubtedly, most institutions provide some sort of feedback to students who, from their course performance, appear to be at risk of failure. There have been experiments with formal early warning systems (EWSs) in some subject areas and the aim of this study was to investigate whether an effective system could be devised in dentistry.
Table 1 summarizes the critical features of several published EWSs. There are many differences between the studies which hamper comparison. All take a number of predictor variables and combine them in some way; an at-risk threshold is then set such that failure is predicted if performance falls below this level. The predictors are most commonly class tests and/or university selection data, but some studies have used social and personality variables (Kapur 1972). There is little consistency in the way predictors are combined; important differences in background factors (difficulty of subject matter, support for students, etc.) also make comparisons difficult and possibly misleading.
Many of the published systems have been tested on the same data from which they were derived, rather than new data, which is well known to give an optimistic bias for any prediction system (Wastell 1987). Only two of the studies (Nisbet & Welsh 1966, 1976) have been independently cross-validated. There is also considerable variation, arbitrariness and vagueness concerning at-risk criteria in the studies. The setting of the at-risk threshold is critical: a high threshold will lead to more correct predictions of failure but will also produce more false alarms.
On average, the EWS studies correctly
Table 1. Summary of early warning system studies

Authors: Sherwin & Child (1976); Nisbet & Welsh (1966); Hoare & Yeaman (1971); Nisbet & Welsh (1976)

Subjects: chemistry (n=181); first-year science students (n=614); first-year science students; first-year arts and science (n=977); first-year intake (n=1838); first-year science students; first-year physics students (n=700)

Predictors: class tests; selection data; selection data; social background, IQ, etc.; questionnaire (social background, personality, etc.); class tests; selection data

Prediction functions: complex algorithm involving number and grade of A levels; failure in two out of four class tests; failure in two out of three subjects; high proportion of negative indicators; failure in two out of four class tests; failure in one out of three subjects

[Hit and false-alarm percentages for each study are given in the original table.]
predicted 62% of actual failures (the hit rate). The false alarm rate in these studies is defined as the proportion of students predicted to fail (i.e. at risk) who actually passed. That the rate seems to be relatively high (53% on average) reflects more the relative rarity of failure than deficiencies in the predictive information. Because success is roughly four times as common as failure, predicting failure in only a small proportion of successful students produces a large number of incorrect predictions when measured relative to the size of the at-risk group.
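The arithmetic behind this base-rate effect can be sketched with invented round numbers (not the study's own data):

```python
# Hypothetical round numbers illustrating why a rare outcome inflates
# the false-alarm rate even when the predictor is informative.
cohort = 100
failures = 20                 # success is roughly four times as common
flagged = 25                  # students classified as at-risk
hits = 12                     # flagged students who actually failed

hit_rate = hits / failures                      # failures correctly predicted
false_alarm_rate = (flagged - hits) / flagged   # flagged students who passed

print(f"hit rate {hit_rate:.0%}, false alarms {false_alarm_rate:.0%}")
# prints: hit rate 60%, false alarms 52%
```

Even though the predictor catches most failures, over half the flagged group passes, simply because passes vastly outnumber failures.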
The present study focuses on the feasibility of predicting performance in one component of the dental curriculum, the teaching of bacteriology, which is examined at the end of the second year of the degree course. A pass in the second-year examinations is a prerequisite for entry into the final year of study, which leads to the degree of Bachelor of Dental Surgery (BDS) that confers a licence to practise as a dentist in the UK. In some
recent years, as many as 18% of students have failed to achieve the pass mark in bacteriology.
The investigation reviewed 7 years of data from 1979 to 1985 (years 1-7). A total of 439 dental undergraduates were involved. The bulk of the bacteriology course was taught in the second year and examined as part of the BDS III examination set at the end of this year. The examination contained a number of essay and short-answer items on bacteriology which were marked independently of the other subjects being examined.
The following predictors were analysed:

(1) A levels: chemistry, physics, biology, mathematics.

(2) Academic grade: The dental school operates a system of continuous assessment. Each term, tutors give students a grade from A to E reflecting their intellect, manual skill and professional attitude. The grades for the term immediately prior to the BDS III were combined into a single overall grade. Note that this grading reflected all subjects, not just bacteriology.

(3) Spot test: The spot test was set after the bacteriology course to assess the students' practical work; it involved making judgements on a sample of the materials that had been handled during the practical classes. (Spot marks were unavailable for the second year of students in the study.)

(4) Class assessment: Before and after the bacteriology course, students were set a short test of 10 brief questions covering the basic ground taught in the course. The same questions were given before and after.

(5) BDS II examination: A mark for the biochemistry part of the BDS II examination, set at the end of the first year, was also used.
Examination marks were scaled from 0 to 100% with a pass mark of 50%. The same examiners marked the class tests and the spot test throughout, but a number of examiners were involved in the end-of-year examinations. Students were unaware that the system was in operation, which rules out any distortion of their learning and study habits.
Combining a series of predictors involves weighting them before adding them up to produce a predicted score. Part of this research was to compare two different weighting systems. The conventional means of deriving a set of weights is multiple linear regression. To avoid optimistic bias, it is necessary to estimate the weighting function from a set of data different from that used to test predictive performance. Thus predictions for each year of students were based on the regression equation derived either from the immediately preceding year or from several preceding years of data. The other method of combination simply converted all variables to standard scores and then added them up. This equal weights method avoids bias because the choice of weights is independent of the data.
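As a sketch (with hypothetical predictor names and marks, not the study's data), the equal weights method amounts to standardizing each predictor across the cohort and summing:

```python
from statistics import mean, stdev

def equal_weights_scores(rows):
    """Convert every predictor to a standard score (z-score) across the
    cohort, then sum the standard scores for each student (equal weights)."""
    names = list(rows[0])
    stats = {n: (mean(r[n] for r in rows), stdev(r[n] for r in rows))
             for n in names}
    return [sum((r[n] - stats[n][0]) / stats[n][1] for n in names)
            for r in rows]

# Hypothetical marks for four students on three predictors.
cohort = [
    {"bds2": 62, "spot": 55, "class_test": 58},
    {"bds2": 48, "spot": 41, "class_test": 44},
    {"bds2": 71, "spot": 66, "class_test": 70},
    {"bds2": 39, "spot": 45, "class_test": 40},
]
scores = equal_weights_scores(cohort)
weakest = min(range(len(scores)), key=scores.__getitem__)  # most at-risk student
```

Because each predictor is rescaled to unit variance before summing, none dominates simply by having a wider mark range, and since the weights never look at the outcome data there is no optimistic bias to correct for.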
Spearman correlation coefficients were calculated for each year of data relating each predictor to BDS III performance. Very little correlation was apparent between A level grades and BDS III scores (r=0.13 on average). All the university indices showed significant positive relationships: the average correlation coefficients were 0.41 for BDS II, 0.31 for the spot test, 0.30 for the class tests and 0.25 for academic grading. As the A level information appeared of relatively little value, only the university indices were used in developing a combined prediction system.
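For reference, the Spearman coefficient is the correlation of ranks; with no tied marks it reduces to the classic formula rho = 1 - 6*sum(d^2)/(n(n^2-1)). A minimal sketch with invented marks (not the study's data):

```python
def spearman(x, y):
    """Spearman rank correlation, assuming no tied values:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented marks: an earlier examination vs a later one.
earlier = [62, 48, 71, 39, 55]
later   = [60, 52, 68, 45, 50]
rho = spearman(earlier, later)   # 0.9 for these invented marks
```

Rank-based correlation is a sensible choice here: examination marks need not be normally distributed, and only the ordering of students matters for an at-risk ranking.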
First, multiple regression was used. A predicted score was derived for each year of students using a regression equation estimated from the previous year's data. The average correlation between these predicted scores and the actual scores obtained later in the year was 0.48 (Table 2). The mean value of the regression coefficient for each predictor, averaged over the 7 years of the study, is shown in Table 3.
Table 2. Average correlation coefficients between actual performance and predicted performance for three prediction systems: regression using the previous year's data, regression involving three preceding years, and the equal weights system
Prediction system                      Correlations by year       Mean
Regression (previous year's data)      0.68  0.55  0.37  0.44     0.48
Regression (three preceding years)     -                          0.51
Equal weights                          0.36  0.59  0.69           0.53
Regression coefficients are notoriously variable. The coefficient for the strongest of our predictors (BDS II) varied during the course of the study over a range from 0.13 to 0.35. The larger the sample, the lower the standard error; thus, by combining the results of several previous years, it should be possible to derive a more stable set of coefficients. Because of the absence of spot data for year 2, this procedure was applied to year 6 and year 7 only, using equations trained on years 3, 4, 5 and 4, 5, 6 respectively. The average correlation between predicted and actual scores was 0.51 (Table 2). The two regression equations are shown in Table 3. The coefficients are similar for BDS II, academic grading and spot, but there
Table 3. Mean values of the regression coefficients averaged over the 7 years of the study for the single-year prediction system are shown in row 1. Rows 2 and 3 give the regression equations for predicting year 6 from years 3-5 and year 7 from years 4-6 respectively
Regression model                 BDS II   Academic grade   Spot   Class test (before)   Class test (after)
Average for single-year system   0.24     1.53             0.16   0.79                  2.33
Year 6 from years 3-5            0.29     2.07             0.11   0.02                  0.52
Year 7 from years 4-6            0.29     2.04             0.16   0.85                  0.25
are some sharp discrepancies for the class tests, again reflecting the instability of regression estimates.
Using the equal weights method of combination, the average correlation with actual scores was 0.53 (Table 2), which is at least as good as either of the multiple regression solutions.
The foregoing techniques measure the quality of prediction across the whole range of achievement; however, our main interest is prediction at the pass/fail boundary. The equal weights method was used in an attempt to predict failure. Rather than setting an absolute at-risk threshold, our decision rule was to classify students in the bottom quarter of predicted scores as being at risk of failure.
Over the 7 years of the study, the average failure rate was 13%, varying from 5% to 18%. On average, 60% of these failures were correctly predicted, but at the expense of 71% false alarms. Performance was rather variable, reflecting the annual changes in failure rate: hit rates ranged from 46% to 100% (the year when only 5% failed) and false alarms from 50% to 81%. The system was also used, not to predict failure, but to predict whether a student's examination mark fell in the bottom quarter. The hit rate was found to be 45%, with 55% false alarms. Performance was relatively stable: hit rates varied from 38% to 50% and false alarms from 50% to 62%. Using the system to predict the bottom 20% of examination scores, the hit rate was 48% and false alarms ran at 52%.
Any prediction system, be it a clinical diagnostic test or a system for giving advance warning of academic failure, faces a difficult task when the
event to be predicted is a rare occurrence. High false-alarm rates are inevitable. Academic failure is not only relatively rare, but the use of absolute pass marks in higher education means that the a priori probability of failure also fluctuates, with year-to-year variations in students, examination material and examining standards. Errors of prediction, i.e. false alarms, are especially undesirable given the stigma attached to academic failure; they cause needless distress as well as undermining the credibility of a system designed to help, not punish.
Many of the difficulties of prediction can be avoided by moving away from the absolute concept of academic failure. There are no absolute standards of how much bacteriology a dentist needs to know. All would agree, though, that it is eminently reasonable to warn students in, for example, the bottom quartile that they are falling behind in their course work and that remedial measures are called for. By the same token, falling in the bottom quartile at examination, whether or not this is a pass, is a sign that the student's knowledge base leaves definite room for improvement.
This study shows that early warning systems based on relative class positions do have predictive validity and that their performance is relatively stable. Using information readily to hand and very simple calculations, almost 50% of students in the bottom group at examination were correctly identified from their relative performance on a combined set of pre-examination indices. False alarms ran at a fairly constant rate of 50-60%, depending on the setting of the at-risk threshold. Allowing for the more rigorous system of cross-validation in this study, these figures compare very well with the general standard in other EWS studies.
Evaluating EWSs only in terms of the prediction of failure in the small group of at-risk cases perhaps gives a rather pessimistic impression of their forecasting power. Taking an overall view of the number of correct predictions (of success as well as failure) in the whole group of students gives a more accurate impressio...