
Higher Education in Europe, Vol. XXVII, No. 4, 2002

Some Guidelines for Academic Quality Rankings

MARGUERITE CLARKE

While rankings are a popular method for comparing the relative quality of higher education institutions, there is much confusion and debate over which indicators to use and how to present the information in ranked format. This article offers some guidelines in both areas as well as questions to consider at each stage of the ranking process.

Nonetheless, just as democracy, according to Winston Churchill, is the worst form of government except for all the others, so quality rankings are the worst device for comparing the quality of … colleges and universities, except for all the others (Webster, 1986, p. 6).

Academic quality rankings are a controversial, but enduring, part of the educational landscape—controversial because not everyone agrees that the quality of a school or a programme can be quantified (Casper, 1996), and enduring because of the lack of other publicly attractive methods for comparing institutions (Sanoff, 1998). While the lack of appealing alternatives has legitimated the use of rankings in the eyes of many (but not all), there is still a lively debate over the issue of how to rank (Hattendorf, 1993).

This article addresses the “how to” issue by offering some general guidelines for academic quality rankings. The discussion is presented in three parts and is couched in an American context. The first section of the paper outlines the conceptual categories that underpin many ranking efforts. It describes the strengths and limitations of some of the methods used to collect information on each, and offers some guidelines for the selection of ranking indicators. The second section examines the popular weight-and-sum approach to presenting this information in ranked format and explores its limitations for the evaluation of educational quality.¹ The third section draws on the findings of the previous two and presents a list of things to consider at each stage of the ranking process.

Before proceeding, it is worth defining what an academic quality ranking means. According to Webster (1986), an academic quality ranking:

[M]ust be arranged according to some criterion or set of criteria which the compiler(s) of the list believed measured or reflected academic quality[; and] it must be a list of the best colleges, universities, or departments in a field of study, in numerical order according to their supposed quality, with each school or department having its own individual rank, not just lumped together with other schools into a handful of quality classes, groups, or levels (p. 5).

This definition reveals two characteristics of academic quality rankings that are the key to the discussion that follows. The first characteristic is that the choice of indicators rests with those doing the ranking. Consequently, while certain normative views of academic quality exist, the set of indicators used will vary according to the value system of the person or group doing the ranking.

¹ While this section may seem more of a “how not to” than a “how to” one, the discussion raises important issues that should be kept in mind no matter what ranking approach is used.


The second characteristic is that schools or programmes must be placed in order on the basis of their relative performance on these indicators. Thus, when more than one indicator is involved, the information must either be combined in order to produce a single ranked list of schools, or schools must be ranked separately on each indicator. The next section focuses on the first of these ranking characteristics.

CONCEPTUALIZING AND MEASURING ACADEMIC QUALITY

The conceptualizations of quality that underpin most ranking efforts can be organized into three categories: student achievements, faculty accomplishments, and institutional academic resources. These categories are portrayed in Table 1 along with some of the methods traditionally used to collect information on each. The data obtained through these methods are organized into indicators. For instance, the information obtained through a survey of reputation is usually recorded in the form of a score for each institution. Together, these scores form a “reputation” indicator that may be used—either alone, or in conjunction with other indicators—to rank schools.

As indicated in Table 1, each indicator has strengths and weaknesses that should be kept in mind when determining the appropriateness of its use (see also Hattendorf, 1993). For example, while surveys of reputation can produce rankings with high public credibility, they can also be misleading if the overall reputation of an institution clouds the evaluation of a particular department by a rater (e.g., if a university has a strong reputation, this fact might lead a rater to give a higher than deserved score to a weak department within the university).

Another strength of surveys of reputation is that they tend to be good at identifying the strongest programmes in a particular area. For instance, if one wanted to identify the top-ten graduate schools of business, a survey of reputation might serve the purpose very well. However, these surveys are not effective when trying to sort out all the programmes in a particular area since not all schools have well-known reputations. Consequently, using this approach to rank all 352 accredited business programmes in the United States could be problematic since raters may have little or no knowledge of many of the programmes they are being asked to evaluate. The point to keep in mind is that every indicator comes with strengths and weaknesses that need to be considered before including the given indicator in a particular ranking effort.

The set of indicators shown in Table 1 does not reflect some of the more recent changes in higher education (e.g., distance learning and computers in the classroom) that may affect how quality is conceived and measured. Thus, it is useful, when choosing indicators, to think in terms of the more general and desirable measurement properties of validity (does the indicator measure what it purports to measure?), reliability (does it do so in a consistent/error-free fashion?), and comparability (can it be interpreted in a similar way across different kinds of programmes or institutions?) (Linn, 1993). Basically, if an indicator is to be included in a ranking, there should be some evidence (either in the form of existing data or data collected by those doing the ranking) that shows that it is appropriate—i.e., valid, reliable, and comparable—for the intended purpose. It is not easy to establish these properties since there can be conflicting evidence, and opinion, as to the appropriateness of an indicator for a given purpose. For example, among the indicators portrayed in Table 1, the validity of the test scores of incoming students as a measure of institutional quality has been criticized on the grounds that it measures only what students bring with them and not what the institution does for students (Seaman, 1998). However, others argue that it is a valid, albeit indirect, indicator of institutional quality since “higher quality” institutions tend to attract the most talented students, and one way of measuring this attraction is through the scores of incoming students on standardized tests (Morse and Flanigan, 2001).


TABLE 1. Categories of academic quality

Faculty accomplishments

Surveys of reputation (e.g., ratings of faculty or programme reputation)
Advantage: They produce results with face validity, i.e., results that most nearly match what the educated public considers the hierarchy of colleges and universities to be.
Disadvantage: The overall reputation of an institution may influence the assessments by raters of the particular department(s) they are being asked to rank.

Counts of faculty awards, honours, and prizes
Advantage: They are useful for ranking the best or the better institutions.
Disadvantage: They may be years behind or ahead of reality.

Counts of faculty citations in citation indexes
Advantage: Useful in assessing the influence and importance of the publications of faculty members, and not just their sheer volume.
Disadvantage: The citation indexes on which the rankings are based do not distinguish between “good”, “neutral”, or “bad” citations.

Student achievements

Distinguished alumni and the achievements of graduates after graduation
Advantage: While only a small percentage of colleges and universities have faculties that produce much research, almost all of them attempt to prepare their students for rewarding careers in later life.
Disadvantage: Usually years, if not decades, behind reality.

The scores of incoming students on standardized tests
Advantage: The data are easy to obtain and are a measure by which most institutions can be ranked.
Disadvantage: Based on the academic abilities of students before they enter college and thus fail to consider anything that these institutions do to educate their students once they enroll.

Institutional academic resources

Compilation of measures of institutional resources, including educational expenditure per student, faculty–student ratios, and library resources
Advantage: The data are easy to obtain and are a measure on which all institutions can be compared.
Disadvantage: Offers little or no information about how often and how beneficially students use these resources.

Source: Adapted from Webster (1986).


Another way of thinking about the choice of indicators is in terms of inputs, processes, and outputs. All else being equal, process (e.g., teaching quality) and output (e.g., effectiveness of graduates in the workplace) measures are preferred since they are better indications of the quality of the instruction, preparation, and resources offered by an institution. The measures in Table 1 can be grouped into inputs (e.g., the scores of incoming students on standardized tests and library resources) and outputs (e.g., faculty citations and awards). The lack of process measures in this table, and more generally, is explained by their being more difficult to identify and more difficult and costly to measure. However, when available, they can provide very useful information on aspects such as classroom environment and teaching effectiveness. For instance, until recently, the rankings of The Times Higher Education Supplement of British universities included a “teaching quality” measure that indicated the effectiveness of instruction in different undergraduate departments at these universities (see <http://www.thesis.co.uk/>).

Another guideline to consider when choosing indicators is the objective or subjective nature of the indicators themselves. Objective indicators are those not dependent on the person doing the counting. For example, if two people were asked to compute the student–faculty ratio or the number of books in the library of a particular institution, they would come up with the same result (if given the same formula and no computational errors were made). Subjective indicators are those that can vary depending on who is responding. For example, if two people were asked to rate the overall academic quality of a particular institution (as in a survey of reputation) using the same set of criteria, they might come up with two very different scores because their subjective opinion would enter the process. Not surprisingly, reliability can be more of an issue with subjective indicators.

Other guidelines could be added, but the above list covers some of the more important ones to keep in mind. In addition to these considerations, standardized procedures should be used to collect, store, analyze, and present the information. These controls are necessary because errors at any stage of the process will reduce or eliminate the usefulness of an indicator as a potential measure of academic quality. The next section addresses the second characteristic of rankings outlined at the start of this article—i.e., the choice of method for presenting the information in ranked format.

PRESENTING THE INFORMATION IN RANKED FORMAT

Once the set of indicators has been chosen, a method must be selected for presenting the information in a ranked format. Several options exist, some of which are discussed in other articles in this issue of Higher Education in Europe. Instead of re-examining those approaches, this section explores the popular weight-and-sum approach and discusses its limited usefulness for representing the relative quality of institutions or programmes (see Clarke, 2002a; also Clarke, 2002b). In doing so, some general issues are raised that should be kept in mind when presenting ranked information.

The Weight-and-Sum Approach

The weight-and-sum approach involves assigning a weight to each indicator according to its perceived importance and then using the weights to combine the indicator information into an overall score (Scriven, 1991). While the result of this process is one easy-to-digest number, critics have pointed out several problems, including the fact that the choice of weights is itself a value judgment and thus can vary depending on who is making the decision (Camilli and Firestone, 2000). Depending on the number of criteria and their weights, one dimension may dominate all the others, or several trivial dimensions may swamp more crucial ones (Evaluation News, 1981). Nonetheless, the approach works quite well in certain contexts and has been used for years in the area of product evaluations (see <www.consumerreports.org> for an example).
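The mechanics can be sketched in a few lines of Python: indicators on different scales are standardized, multiplied by their weights, and summed into a single overall score on which schools are ordered. The school names, indicator values, and weights below are hypothetical, chosen only to illustrate the calculation; the decision to reverse the sign on the acceptance-rate weight is itself the kind of value judgment discussed above.

```python
from statistics import mean, stdev

# Hypothetical raw indicator values for three schools (illustrative only).
schools = {
    "School A": {"reputation": 4.2, "starting_salary": 95000, "percent_accepted": 18.0},
    "School B": {"reputation": 3.8, "starting_salary": 88000, "percent_accepted": 25.0},
    "School C": {"reputation": 4.5, "starting_salary": 91000, "percent_accepted": 12.0},
}

# Weights reflect a value judgment; "percent accepted" is given a negative weight
# so that lower acceptance rates are treated as better (another judgment call).
weights = {"reputation": 0.5, "starting_salary": 0.3, "percent_accepted": -0.2}

def standardize(values):
    """Convert raw values to z-scores so indicators on different scales can be combined."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

names = list(schools)
indicators = list(weights)
z = {ind: standardize([schools[n][ind] for n in names]) for ind in indicators}

# Weight and sum the standardized indicators into a single overall score per school.
overall = {n: sum(weights[ind] * z[ind][i] for ind in indicators)
           for i, n in enumerate(names)}

for rank, (name, score) in enumerate(sorted(overall.items(), key=lambda kv: -kv[1]), 1):
    print(rank, name, round(score, 2))
```

Changing any of the hypothetical weights above can reorder the three schools, which is exactly the sensitivity to value judgments that critics of the approach point to.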

Despite the popularity of this method for ranking cars and toasters, there is divided opinion as to whether it works for the evaluation of educational quality (Hattendorf, 1993; Scriven, 1991). This lack of consensus arises partly because product ratings are based on easily observable and quantifiable aspects of the product or its performance (e.g., in a car: fuel economy, the warranty, acceleration, and safety features), whereas academic quality ratings tend to be based on institutional components that are harder to observe or to quantify (e.g., reputation and distinguished alumni). Consequently, it is not as easy to reach consensus as to which indicators to use and how to measure them when assessing academic quality as it is in the case of assessing car performance.

Even if one is conceptually comfortable with using this approach to rank institutions, there is little research on the relationships that exist among the various indicators of academic quality, or on their relative importance in the assessment of institutional or programme quality. Therefore, it is difficult to know how to weight them in order to come up with an overall score. This problem is less pronounced for product ratings since there is a clearer sense of the relative importance of different aspects of product performance as well as how these relate to each other. In addition, it is easier to validate a formula used to rank products since there are more easily observable outcome indicators to rely on—e.g., do cars that are ranked highest actually perform better/break down less frequently?

Limitations of the Weight-and-Sum Approach for the Evaluation of Educational Quality

One way to look at the implications of these issues for academic quality rankings is by examining rankings that use the weight-and-sum approach. The data presented here are from the weekly magazine US News and World Report. It is one of the most popular sources of college and graduate school rankings in the United States and has been ranking schools since 1983.² While the indicators used for each of the college and graduate school rankings vary, the methodology is the same: i.e., the chosen indicators are standardized, weighted, and summed to produce an overall score on which to rank schools or programmes in each category against their peers (Garrett, 2002; Morse and Flanigan, 2001). No research is cited as the basis for the indicators and weights used, and there is no indication that the properties of the indicators or how they are related to each other have been examined.

However, the importance of obtaining this type of information, particularly if the indicators are going to be combined, can be seen by looking at Tables 2 and 3. These tables are based on data from the US News and World Report 2001 calendar year rankings of the top-fifty business schools and the top-fifty schools of education in the United States (data tend not to be published for schools below the top-fifty). In each table, the set of indicators used for that ranking is presented across the top and down the side of the table. The numbers in the table represent the strength of the relationships (or correlations) among the various indicators. The magnitude of the correlation can vary between 0 and 1, with 0 being the weakest and 1 the strongest relationship. In addition, the direction of the relationship can be positive or negative.

² The college rankings can be obtained at <http://www.usnews.com/usnews/edu/college/rankings/rankindex.htm>. The graduate school rankings can be obtained at <http://www.usnews.com/usnews/edu/beyond/bcrank.htm>.


TABLE 2. Correlations for 2001 business school indicators*

                                  (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
(1) Reputation (academic)         1       0.90    0.90    0.32    0.24    0.81    0.57   −0.73
(2) Reputation (recruiter)                1       0.86    0.32    0.18    0.70    0.50   −0.68
(3) Starting salary                               1       0.52    0.43    0.80    0.49   −0.68
(4) Employed at graduation                                1       0.49    0.30    0.03   −0.20
(5) Employed three months later                                   1       0.24    0.04   −0.05
(6) Average GMAT scores                                                   1       0.66   −0.69
(7) Average undergraduate GPA                                                     1      −0.58
(8) Percent accepted                                                                      1

* Based on data for the top-fifty schools and rounded to two decimal places.


TABLE 3. Correlations for 2001 education school indicators*

                                          (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)
(1) Reputation (academic)                 1       0.78    0.27    0.45   −0.28    0.16    0.39    0.44    0.30    0.00
(2) Reputation (superintendent)                   1       0.21    0.23   −0.15    0.13    0.21    0.39    0.16    0.05
(3) Verbal GRE scores                                     1       0.61   −0.46    0.14   −0.29   −0.04   −0.08    0.14
(4) Quantitative GRE scores                                       1      −0.41    0.01   −0.10    0.10    0.09    0.04
(5) Percent accepted                                                      1      −0.34   −0.11    0.04   −0.48   −0.46
(6) Ratio of students to faculty                                                  1       0.10    0.02    0.09    0.19
(7) Number of doctoral degrees granted                                                    1       0.21    0.49    0.01
(8) Proportion in doctoral programmes                                                             1      −0.12   −0.13
(9) Total research                                                                                        1       0.63
(10) Research per faculty member                                                                                  1

* Based on data for the top-fifty schools and rounded to two decimal places.


A positive number means that as values for one indicator increase, the values for the other indicator also tend to increase. A negative number means that as values for one indicator increase, the values for the other indicator tend to decrease. The strength of the relationship between any two indicators in Table 2 or 3 can be determined by finding the number at which the column location for one and the row location for the other intersect. For example, in Table 2 the correlation between “Starting salary” and “Average Graduate Management Admissions Test (GMAT) scores” is 0.80. This means that schools with higher average GMAT scores tend to have graduates with higher starting salaries. The correlation between “Average GMAT scores” and “Percent accepted” is −0.69, which means that schools that accept fewer applicants tend to have incoming students with higher average test scores (note that the relationship is not as strong as that between “Starting salary” and “Average GMAT scores”).
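For readers who wish to reproduce this kind of analysis, the following sketch shows how a correlation matrix like those in Tables 2 and 3 can be computed from a set of indicator values. The indicator names and values are invented for illustration; the published tables were computed from US News and World Report data for the top-fifty schools.

```python
import numpy as np

# Hypothetical indicator values for five schools (rows) on four indicators (columns).
indicator_names = ["reputation", "starting_salary", "avg_gmat", "percent_accepted"]
data = np.array([
    [4.5, 98000, 710, 12.0],
    [4.1, 91000, 690, 18.0],
    [3.9, 87000, 680, 22.0],
    [3.5, 80000, 650, 30.0],
    [3.2, 76000, 640, 35.0],
])

# np.corrcoef expects variables in rows, so transpose; the result is a symmetric
# matrix of Pearson correlations between every pair of indicators.
corr = np.corrcoef(data.T)

for i, name in enumerate(indicator_names):
    row = "  ".join(f"{corr[i, j]:5.2f}" for j in range(len(indicator_names)))
    print(f"{name:>17}  {row}")
```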

The most obvious difference between Tables 2 and 3 is in the sizes of the correlations. Specifically, the correlations among the business indicators tend to be larger than those among the education indicators, meaning that there are stronger relationships among the indicators used to assess quality in graduate schools of business. A business school that performs well on one of these quality indicators is also likely to perform well on the rest. It is harder to make this prediction with schools of education since the relationships are not as strong. For instance, a school of education with high average “Quantitative Graduate Record Examination (GRE) scores” for its entering students may or may not also have high “Total research” expenditures (this refers to the amount of funded research being conducted at the school). It is hard to predict, since the relationship between these indicators is quite weak (i.e., r = 0.09). The reason for the different correlation patterns in Tables 2 and 3 is not clear. For example, these differences could be the result of variations in what quality looks like in a school of education versus what it looks like in a school of business, or they may be due to the differential availability of quality-related information for these school types.

Either way, these correlations have implications for the weighting of the indicators used to produce an overall score. For instance, the relative sizes of the weights assigned to each of the business indicators are less likely to have an impact upon the final ordering of schools in this ranking since schools tend to perform similarly on each indicator (i.e., if they do well on one, they also tend to do well on another). Therefore, whether the heaviest weight is given to, for example, “Reputation (academic)” (the reputation scores given to schools by academics) or “Starting salary” will have little effect on the final outcome since schools tend to perform similarly in regard to both indicators (r = 0.90). The same cannot be said for the education indicators where, depending on which indicator receives the most weight, there can be a very different rank ordering of schools. For example, giving the largest weight to “Reputation (superintendent)” (the reputation scores given to schools by superintendents) versus “Total research” (the amount of funded research being conducted at the school) could produce quite different rankings since schools may not perform similarly on both (r = 0.16).

These differences also raise questions about the choice of ranking indicators in general. Should the indicators have high or low correlations? Are high correlations a sign of validity (i.e., are they all measuring quality?) or of redundancy (i.e., are they all measuring the same thing, and are therefore not all needed?)? Are low correlations a sign of invalidity (i.e., are some or all of the indicators not measuring quality?) or of unique information (i.e., are they all measuring different aspects of academic quality?)? In addition, if efficiency is of value, and if the indicators are highly correlated, should some of the indicators be dispensed with since doing so would probably not affect the final ordering of schools? For instance, since “Reputation (academic)” correlates highly with most of the other indicators used to rank business schools, one could theoretically dispense with the seven other indicators and still arrive at a very similar ordering of schools.

Another implication of the relationships shown in Tables 2 and 3 is seen in the sensitivity of each ranking to changes in the ranking formula. US News and World Report, like many organizations involved in ranking efforts, makes frequent changes in its ranking formulae. These changes are meant to improve the rankings and may involve the addition or removal of indicators, changes to the methodology/definition for an indicator, or changes to the weights used. Any of these changes can have a significant impact on the relative ordering of schools, an impact that is quite separate from real change in the schools' relative performance on the indicators. The impact is magnified with the weight-and-sum approach, particularly when the indicators used are not highly correlated, since this method produces only one score on which to rank schools.

For example, the correlation between the top-fifty business schools according to the US News and World Report in 1995 and 2001 is 0.88, while the correlation between the top-fifty schools of education in 1995 and 2001 is only 0.78. Thus, the list of the top-fifty business schools in 1995 is quite similar to the list in 2001, but the list of the top-fifty schools of education in 1995 varies somewhat from the list published in 2001. There were quite a few changes to the formula for the education rankings during this six-year period (twenty changes in total), far more than to the formula for the business rankings (eight changes in all).

While this phenomenon is one of the reasons behind the greater movement of schools in the education rankings, it is not the only one. For instance, the US News and World Report ranking formula for law schools also experienced a large number of changes during this time period (fourteen changes in all), but the list of the top-fifty law schools in 1995 is very similar to that for 2001 (r = 0.92). The reason for this situation is that the indicators used for the law school rankings tend to be quite highly correlated. Thus, formula changes tend not to affect the final outcome substantially in terms of the relative ordering of schools.
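A year-to-year stability check of this kind can be sketched as follows. The rank positions here are invented, and a Spearman rank correlation is used as one plausible way of comparing two years' lists; the article does not specify exactly how its correlations were computed.

```python
from scipy.stats import spearmanr

# Hypothetical rank positions of the same eight schools in the 1995 and 2001
# top-fifty lists (only schools appearing in both lists can be compared).
ranks_1995 = [1, 2, 3, 4, 5, 6, 7, 8]
ranks_2001 = [1, 3, 2, 6, 4, 5, 8, 7]

rho, _ = spearmanr(ranks_1995, ranks_2001)
print(f"Rank correlation between 1995 and 2001 positions: {rho:.2f}")
```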

The weight-and-sum approach attempts to capture in one number an aggregate evaluation of the worth of an institution relative to others. But this overall score can be misleading for several reasons. It implies a comprehensive measure of the quality of the colleges or programmes being ranked even though no currently available data offer sufficient coverage to accomplish this task (Lombardi et al., 2001). In addition, the indicators used may be problematic in terms of their validity, reliability, and comparability for the schools being ranked (Cantor, 1996). All of these issues are magnified when weights are applied to create an overall score (Casper, 1996). The usefulness of the information is further reduced when frequent changes to the formula make it difficult to interpret movement in the relative positions of schools in the rankings from year to year.

One way to reveal the uncertainty behind this overall score is shown in Tables 4 and 5. The formulae used to produce these tables are given in the explanation that follows them, and more detail is provided in Clarke (2002a). Basically, the tables show what happens to the score awarded a school by US News and World Report when slight changes are made to the indicators and weights being used. The amount of movement in the score awarded to a school owing to these changes is used to generate a “standard error” band around the score. This error band can then be used in a t-test in order to assess whether the overall score of the school is significantly different statistically from that of another.

The results of these school-by-school comparisons for each top-fifty ranking (i.e., business and education) are shown in Tables 4 and 5. In each table, schools are ordered by their US News and World Report overall score across the heading and down the rows (the overall score is located in parentheses after the name of each school).


TABLE 4. Business school ratings in 2001 as per the US News and World Report methodology
[The school-by-school comparison chart for the top-fifty business schools is not reproduced here.]


TABLE 5. Education school ratings in 2001 as per the US News and World Report methodology
[The school-by-school comparison chart for the top-fifty schools of education is not reproduced here.]


One must read across the row for a school in order to compare its performance with that of the schools listed in the heading of the chart. The symbols indicate whether or not the overall score of the school in the row is significantly higher than that of the comparison school in the heading (arrow pointing up), significantly lower than that of the comparison school in the heading (arrow pointing down), or if there is no statistically significant difference between the two schools (shaded cell with circle). A blank diagonal represents the comparison of a school against itself.

Regarding Tables 4 and 5: The Jackknife Technique

A regression model is substituted for the US News and World Report formula by using the overall scores for schools in a ranking as the outcome variable and the indicators as the predictor variables (this process basically replaces one linear model with another). The jackknife procedure then removes one indicator at a time from the regression model, recalculating the overall score for each school with the remaining indicators before replacing the indicator and repeating the procedure. The jackknife standard error for a school is obtained from these values, using the following formula (Efron and Tibshirani, 1993):

se_{\text{jackknife}} = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}\right)^{2}}, \qquad \hat{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)},

where n is the number of regression models to be estimated and \hat{\theta}_{(i)} is the predicted score for a school from the ith regression model with one indicator removed.

This standard error can be used in a t-test to assess whether the score of one school is significantly different statistically from that of another. In order to control for the increased probability of Type I error (finding a significant difference when there is none) owing to the number of comparisons being made, the Bonferroni method for multiple comparisons is used (see Glass and Hopkins, 1996).
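A minimal sketch of this procedure is given below, using simulated indicator data and scores. The sample sizes, weights, and degrees of freedom are illustrative simplifications rather than the exact choices made in Clarke (2002a), but the steps mirror the description above: refit the regression with one indicator removed at a time, compute each school's jackknife standard error from the resulting predictions, and then run Bonferroni-corrected t-tests on pairs of schools.

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Simulated data: X holds standardized indicator values (rows = schools,
# columns = indicators); y holds the "overall scores" being approximated.
rng = np.random.default_rng(0)
n_schools, n_indicators = 10, 5
X = rng.normal(size=(n_schools, n_indicators))
weights = np.array([0.4, 0.25, 0.15, 0.15, 0.05])          # hypothetical weights
y = 60 + 20 * (X @ weights) + rng.normal(scale=1.0, size=n_schools)

def predict_without(drop):
    """Refit the linear model with one indicator removed and predict all scores."""
    X_sub = np.delete(X, drop, axis=1)
    A = np.column_stack([np.ones(n_schools), X_sub])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# One predicted score per school from each leave-one-indicator-out model.
preds = np.array([predict_without(i) for i in range(n_indicators)])
n = n_indicators
se_jack = np.sqrt((n - 1) / n * ((preds - preds.mean(axis=0)) ** 2).sum(axis=0))

# Pairwise t-tests on the overall scores, Bonferroni-corrected for the number
# of school-by-school comparisons being made; df = n - 1 is a simplification.
pairs = list(combinations(range(n_schools), 2))
alpha = 0.05 / len(pairs)
for i, j in pairs[:5]:                                       # show a few comparisons
    t = (y[i] - y[j]) / np.sqrt(se_jack[i] ** 2 + se_jack[j] ** 2)
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    verdict = "significantly different" if p < alpha else "not distinguishable"
    print(f"school {i} vs school {j}: t = {t:.2f}, {verdict}")
```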

If there were no errors around the overall scores for schools, Tables 4 and 5 would consist only of arrows pointing up and down, except for instances in which two schools have the same overall score and are tied for rank. Such is not the case, as evidenced by the amount of shaded area in each table. The fact that there is more shaded area in Table 5 means that it is harder to find “real” differences between the overall scores for schools of education since the scores of these schools can change quite a bit depending on the indicators and weights being used. The fact that there is less shaded area in Table 4 means that it is easier to find “real” differences between the overall scores for business schools since the scores of these schools tend to be fairly stable. Even so, it is evident that there are far fewer “real” differences between business schools than is implied by the overall score attributed by the US News and World Report.³

CONCLUSION

Any effort at creating an academic quality ranking system must grapple with the difficulty of trying to quantify the intangibles of a set of complex teaching, learning, resource, and research phenomena.

³ While the grouping patterns in Tables 4 and 5 are an artifact of the number of schools included in the analyses (e.g., if data for all 341 business schools surveyed were included, slightly different groupings would result), they illustrate the “false precision” of the overall score produced using a weight-and-sum approach.


It is evident that the choice of indicators is a non-arbitrary process that should be guided by, among other things, a knowledge of the strengths and limitations of the indicators being considered as well as their validity, reliability, and comparability for the schools or programmes to be ranked. The choice of a method for presenting this information in ranked format must be guided by, among other things, an understanding of the nature of quality in the schools or programmes being ranked as well as the relationships among the quality indicators that are to be used. In addition to these guidelines, the following questions are offered as a way to think about the methodological issues raised at each stage of the ranking process:

What is Being Ranked?

Are institutions, departments, or programmes being ranked? Depending on the answer, not only will the choice of indicators vary; so too will the unit of measurement. For instance, if graduate schools of business are being ranked, then the indicators used should reflect information on graduate schools of business, and not on the institutions in which they are located. Thus, a student–faculty ratio indicator should be based on information for the business school and not for the institution as a whole. In addition, it should be recognized that all schools in a particular area may not be similar and therefore should not be “lumped” into the same ranking. For example, among schools of education there are schools whose mission is primarily teacher training, and others whose primary mission is research. Combining both kinds of schools into the same ranking may be very misleading in terms of showing their relative quality. Two separate rankings would be better.

Why is it Being Ranked? Who is the Intended Audience?

The reasons for rankings are many. For example, the purpose of the ranking may be to inform, to act as a spur for improvement, or to provide benchmarks. The audience for rankings also varies, and may include students, parents, institutions, or the general public. Depending on the reason for a ranking and its intended audience, the choice of indicators and approach will vary. For instance, a college ranking that is meant to help students decide where to go to school will most likely be different from a ranking that is meant to provide information to college administrators or to higher education policy-makers.

What Can I Do to Improve the Quality of My Indicators?

In addition to the above guidelines, information utility and reliability can be improved by using multi-year data. Doing so tends to reduce any anomalies (e.g., spikes or dips) in the performance of an institution that may throw off its ranking. Using score ranges (e.g., percentile ranges) rather than point estimates (e.g., averages) can also provide a better picture of the spread of performance in a particular institution or programme.
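The following short sketch, using made-up numbers, illustrates both suggestions: a multi-year average smooths an anomalous spike in one indicator, and a percentile range conveys spread rather than a single point estimate.

```python
import numpy as np

# Hypothetical values of a single indicator (e.g., median starting salary) for
# one school over three years; the middle year contains an anomalous spike.
yearly_values = np.array([88000, 104000, 91000])
print("latest-year point estimate:", yearly_values[-1])
print("three-year average:", round(yearly_values.mean()))

# A percentile range conveys the spread of performance rather than one point.
incoming_scores = np.array([610, 640, 650, 660, 670, 680, 690, 700, 720, 760])
low, high = np.percentile(incoming_scores, [25, 75])
print(f"25th-75th percentile range: {low:.0f}-{high:.0f}")
```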

Indicator quality may also be adversely affected if institutions manipulate the information being used. For instance, schools may inflate their application pools in order to look more selective, thereby improving their performance on a selectivity indicator. Standardized data collection, processing, and reporting techniques can reduce the occurrence of some of these problems. However, the bigger issue to be addressed may be how to reduce the pressures that institutions feel to do well in rankings.


How Will I Present the Information to My Audience?

The choice of a specific ranking methodology is dependent on the particular context and goals of the person or group doing the ranking. Nonetheless, the usefulness of the ranking will be increased if the chosen methodology allows the values and needs of the eventual user to be incorporated into the outcome. For example, the indicator information could be made available on a Web site that allows the user to decide on a formula for creating a final ranked list of schools. In addition to allowing the user to specify the indicators and their relative importance in the final outcome, there could also be an option that allows the user to specify a minimum required performance level on some or all of the indicators.
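One possible shape for such a user-driven ranking is sketched below. The indicator values, weights, and minimum thresholds are hypothetical; the point is simply that the user, rather than the compiler, supplies the weights and the cut-offs.

```python
# Sketch of a user-driven ranking: the user supplies weights and minimum
# performance levels; indicator values (on a 0-1 scale) are hypothetical.
schools = {
    "School A": {"reputation": 0.9, "research": 0.4, "student_faculty": 0.7},
    "School B": {"reputation": 0.6, "research": 0.9, "student_faculty": 0.5},
    "School C": {"reputation": 0.7, "research": 0.7, "student_faculty": 0.2},
}

user_weights = {"reputation": 0.2, "research": 0.6, "student_faculty": 0.2}
minimums = {"student_faculty": 0.4}   # drop schools below a required level

# Filter out schools that fail any user-specified minimum, then rank the rest
# by the user's own weighted sum.
eligible = {
    name: vals for name, vals in schools.items()
    if all(vals[k] >= threshold for k, threshold in minimums.items())
}
ranked = sorted(
    eligible,
    key=lambda name: sum(user_weights[k] * eligible[name][k] for k in user_weights),
    reverse=True,
)
print(ranked)   # ['School B', 'School A'] with these weights; School C is filtered out
```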

How Often Will I Present the Information to My Audience?

Rankings can be as frequent or as infrequent as one wishes. However, since institutions tend to be stable rather than unstable systems, and since their quality tends to change slowly over time, it is doubtful that annual rankings offer any educational value to the average consumer.

How Often Will I Change My Approach?

Rankings need to be responsive to changes in education. Thus, as education changes, so too should the indicators used to represent it. The rise of distance learning and classroom-based technology are examples of developments that have also changed some aspects of how we think about and measure academic quality. Nonetheless, it is useful to maintain a subset of ranking indicators that are constant across years so that users can gain some sense of the degree of stability in relative institutional quality over time.

As a final consideration, it should be remembered that transparency is essential to the success of any ranking system. Thus, the openness of the process in terms of how the indicators were chosen, the approach taken to present this information in ranked format, and access to the original data should always be maintained.

REFERENCES

CAMILLI, G., and FIRESTONE, W. A. “Values and State Ratings: An Examination of the State-by-State Education Indicators in Quality Counts”, Educational Measurement: Issues and Practice 18 4 (2000): 17–25.

CANTOR, G. “Universities Increasingly Question the Criteria of College Rankings”, The Detroit News (1 December 1996): 3.

CASPER, G. “Letter to the Editor of US News and World Report” (23 September 1996), retrieved 17 February 1999 from <http://www-portfolio.stanford.edu:8050/documents/president/961206gcfallow.html>.

CLARKE, M. “Quantifying Quality: What Can the US News and World Report Rankings Tell Us about the Quality of Higher Education?”, Education Policy Analysis Archives 10 16 (2002a), <http://epaa.asu.edu/epaa/v10n16/>.

CLARKE, M. “News or Noise: An Analysis of US News and World Report’s Ranking Scores”, Educational Measurement: Issues and Practice 21 4 (2002b): 39–48.

EFRON, B., and TIBSHIRANI, R. J. An Introduction to the Bootstrap. New York: Chapman and Hall, 1993.

Evaluation News 2 1 (February 1981): 85–90.

GARRETT, G. “Our Method Explained”, US News and World Report Best Graduate Schools 2003 Edition. Washington, D.C.: US News and World Report, 2002, pp. 34–35.

GLASS, G. V., and HOPKINS, K. D. Statistical Methods in Education and Psychology, 3rd edn. Needham Heights, Massachusetts: Simon & Schuster, 1996.

HATTENDORF, L. C., ed. Educational Rankings Annual. Detroit: Gale Research, 1993.

LINN, R., ed. Educational Measurement, 3rd edn. Washington, D.C.: American Council on Education, 1993.

LOMBARDI, J., CRAIG, D., CAPALDI, E., GATER, D., and MENDON, S. The Top American Research Universities. Gainesville, Florida: The Center, University of Florida, 2001.

MORSE, R. J., and FLANIGAN, S. M. “How We Rank Schools”, US News and World Report America’s Best Colleges 2002 Edition. Washington, D.C.: US News and World Report, 2001, pp. 67–70.

SANOFF, A. “Rankings Are Here to Stay: Colleges Can Improve Them”, Chronicle of Higher Education 45 2 (4 September 1998): A96.

SCRIVEN, M. Evaluation Thesaurus, 4th edn. Newbury Park: Sage, 1991.

SEAMAN, B. “What Makes a Good College”, Time 152 17 (26 October 1998): 1–2.

WEBSTER, D. S. Academic Quality Rankings of American Colleges and Universities. Springfield: Charles C. Thomas, 1986.
