nber working paper series standing on academic … · j. roger clemmons institute for child health...

NBER WORKING PAPER SERIES

STANDING ON ACADEMIC SHOULDERS:MEASURING SCIENTIFIC INFLUENCE IN UNIVERSITIES

James D. AdamsJ. Roger Clemmons

Paula E. Stephan

Working Paper 10875http://www.nber.org/papers/w10875

NATIONAL BUREAU OF ECONOMIC RESEARCH1050 Massachusetts Avenue

Cambridge, MA 02138October 2004

Paper prepared for the International Conference on R&D, Education, and Productivity in Memory of ZviGriliches, Paris, France; August 25-27, 2003. The Andrew W. Mellon Foundation provided generous supportfor this research. Nancy Bayers and Henry Small of the Institute for Scientific Information (ISI) offerednumerous clarifications concerning the data, while Adam Jaffe provided nonlinear regression programs thatwe adapted for use in the estimation procedures. We thank three referees for their comments and suggestions.The views expressed herein are those of the author(s) and not necessarily those of the National Bureau ofEconomic Research.

© 2004 by James D. Adams, J. Roger Clemmons, and Paula E. Stephan. All rights reserved. Short sectionsof text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit,including © notice, is given to the source.

Standing on Academic Shoulders: Measuring Scientific Influence in UniversitiesJames D. Adams, J. Roger Clemmons, and Paula E. StephanNBER Working Paper No. 10875October 2004JEL No. L30, O30

ABSTRACT

This article measures scientific influence by means of citations to academic papers. The data sourceis the Institute for Scientific Information (ISI); the scientific institutions included are the top 110U.S. research universities; the 12 main fields that classify the data cover nearly all of science; andthe time period is 1981-1999. Altogether the database includes 2.4 million papers and 18.8 millioncitations. Thus the evidence underlying our findings accounts for much of the basic researchconducted in the United States during the last quarter of the 20th century. This research in turncontributes a significant part of knowledge production in the U.S. during the same period.

The citation measure used is the citation probability, which equals actual citations divided bypotential citations, and captures average utilization of cited literature by individual citing articles.The mean citation probability within fields is on the order of 10-5. Cross-field citation probabilities

are one-tenth to one-hundredth as large, or 10-6 to 10-7. Citations between pairs of citing and cited

fields are significant in less than one-fourth of the possible cases. It follows that citations are largely

bounded by field, with corresponding implications for the limits of scientific influence.

Cross-field citation probabilities appear to be symmetric for mutually citing fields. Scientific

influence is asymmetric within fields, and occurs primarily from top institutions to those less highly

ranked. Still, there is significant reverse influence on higher-ranked schools. We also find that top

institutions are more often cited by peer institutions than lower-ranked institutions are cited by their

peers. Overall the results suggest that knowledge spillovers in basic science research are important,

but are circumscribed by field and by intrinsic relevance. Perhaps the most important implication of

the results are the limits that they seem to impose on the returns to scale in the knowledge production

function for basic research, namely the proportion of available knowledge that spills over from one

scientist to another.

James D. AdamsDepartment of Economics3504 Russell Sage LaboratoryRensselaer Polytechnic Institute110 8th StreetTroy, NY 12180-3590and [email protected]

J. Roger ClemmonsInstitute for Child Health PolicyUniversity of FloridaPO Box 100147Gainesville, FL [email protected]

Paula E. StephanDepartment of Economicsand Andrew Young School ofPolicy StudiesGeorgia State UniversityAtlanta, GA [email protected]

I. Introduction

This paper is part of a larger project that has occupied much of our time in recent years 1. In the early

going the project’s main goal is to describe basic research interactions among the top 110 U.S. universities,

among the top 200 U.S. R&D firms, and between the universities and firms. Owing to space constraints,

this paper concentrates on measurement of scientific influence among universities. The time period is

1981-1999; the scope of our investigation includes all of science. The data cover 2.4 million scientific

papers by universities and 18.8 million citations to these papers2. To date our research emphasizes pre-

technology science rather than patented technology, though the two overlap. Branstetter (2003), Darby and

Zucker (2003), Jensen and Thursby (1999), and other papers show that this overlap is important to

technical progress. This paper’s study of scientific influence within academia, however, lays the

groundwork for understanding its influence on industry, since fertile areas of science can give rise to

industrial science, in part by means of faculty consultants and entrepreneurs (Audretsch and Stephan, 1996;

Zucker, Darby and Brewer, 1998).

This paper describes scientific influence, which in principle refers to the productive role that earlier

work plays in later work. Our goal is to trace scientific influence across institutions, sciences, and time.

Space limitations confine our measurements not just to universities, but to a single indicator, citations to

scientific papers from other papers. But publications are the most common channel of knowledge flow in a

recent survey of industrial research (Cohen, Nelson, and Walsh, 2002), let alone universities. The findings

in this paper are a first step towards allowing for many channels of influence, especially collaboration and

mobility of graduate students and faculty. We assume that papers represent new knowledge, though the

amount of this knowledge varies. We also assume that citations to papers on the whole indicate intellectual

influence rather than pre-publication strategy or honorific referencing.

The citation measures used in this paper owes a great deal to bibliometrics. Examples of this literature

are De Solla Price (1965, 1986), Garfield (1972), Narin and Hamilton (1996), Narin, Hamilton, and

Olivastro (1997), and Van Raan (1990), among others.

1 During the planning phases of the project we had the indisputable advantage as well as undeniable privilege of working with Zvi Griliches. We deeply and profoundly regret that he did not live to see this work near completion. 2 The full data set consisting of all papers and citations of the universities and firms amounts to 2.7 million papers and more than 20 million citations.

The following are key questions raised in the course of our research. What determines the level of

influence within different fields? How does influence within fields compare with influence across fields?

How common are the “between” components compared to “within” components? Are cross-field

influences symmetric or not? What role does quality of scientific institutions play in scientific influence?

Could scientific influence increase with quality in such a way as to reinforce scientific excellence?3

This paper draws on a literature which argues that the role of outside knowledge in productivity growth

is important but limited. Its common thread is that knowledge flows are constrained by limited relevance,

so that increasing returns from knowledge are correspondingly limited. The various studies cover

agriculture (Evenson and Kislev, 1975), manufacturing (Scherer, 1982a, b), basic research (Adams, 1990),

applied industrial research (Adams and Jaffe, 1996), as well as surveys of the externalities from R&D and

their limits (Griliches, 1979, 1992). Our approach to measurement of these limits has been shaped by Jaffe

(1986), Trajtenberg (1990), and especially Jaffe and Trajtenberg (1999).

Likewise the findings in this paper imply that limits apply to the relevance of scientific ideas. And

since science could replenish research opportunities in industry (and conversely), this result bears on

economic growth. In the evolution of growth theory, scale effects in the knowledge production function

have been steadily curtailed over time. Thus, opportunities for growth have been viewed as deriving from

the growth of R&D rather than its level, supported by a contribution from spillovers that is sufficient to

avoid diminishing returns to research. However, growth models (Romer (1990), and Jones (1995, 2002))

tend to assume that knowledge flow without friction throughout countries and even the world4. And yet

our findings suggest that the frictions are substantial, so that influence is limited by the narrow applicability

of most discoveries.

The measure of scientific influence used in this paper is the citation probability, which divides actual

citations by potential citations. This is the average number of citations per potentially cited paper, divided

by the number of potentially citing papers. This measure captures the rarity of citation by individual

science papers by normalizing actual citations by potential citations.

3 This is the Matthew Effect. For more on this topic see Merton (1973) and Zuckerman (1977). 4 Marshall (1920, p. 220) observes that “Many of those economies in the use of specialized skill and machinery which are commonly regarded as within the reach of very large factories, do not depend on the size of individual factories. Some depend on the aggregate volume of production of the kind in the neighborhood; while others again, especially those connected with the growth of knowledge and the progress of the arts, depend chiefly on the aggregate volume of production in the whole civilized world.”

2

Main findings of the paper are as follows. First, the within-field citation probability is on the order of

10-5 and is 10 to 100 times the between-field citation probability. The comparative rarity of cross-field

citations holds even when cross-field interactions are required to be statistically significant.

Second, the number of significant cross-field interactions is one-fourth of the potential number. Since

the data include 12 science fields, there are 11×12=132 cross-field interactions. Thirty-two of these differ

significantly from zero. This suggests that knowledge flows are bounded by scientific distance, just as

knowledge flows in industry are bounded by technological distance (Adams and Jaffe, 1996).

Third, the findings support the hypothesis of symmetry of cross-citation between pairs of mutually

citing fields. The rate at which biology cites medicine is nearly the same as the rate at which medicine cites

biology. However, this finding is conditional on two-way citation. Asymmetries exist, but take a more

subtle form of applied fields citing underlying basic fields more than the reverse.

Fourth, the modal or most frequent lag in science citations based on publication years, a measure of the

speed of diffusion, is slightly more than three years. This compares with a modal lag of more than five

years in patent citations based on analogous grant years (Jaffe and Trajtenberg, 1999).

Fifth, the rank of scientific institutions increases scientific influence. Sixth, citation interactions with

peer institutions increase with rank. University-fields in the top 20% cite one another at a significantly

higher rate more than university-fields in the middle 40% or bottom 40% cite one another. This suggests

that knowledge flows from peer programs increase with rank.

The rest of this paper consists of five sections. Section II motivates the paper by reexamining the

knowledge production function in light of our findings. Section III discusses the meaning of science

citations, presents the citation probability, and describes the science citation function. Section IV describes

the data, specifies the science fields, and discusses measurement of the quality of scientific institutions.

Section V reports econometric estimates of the science citation function. Section VI concludes.

II. Motivation

We motivate the empirical findings by exploring a knowledge production function in which searching

the literature increases absorption of knowledge into current research. The research process accumulates

heterogeneous stocks of knowledge. Accumulation of one stock depends on several knowledge stocks, as

the findings suggest, and on the allocation of labor to invention and searching the literature. The stocks

3

consist of basic scientific knowledge. Using this construct we illustrate the constrained output-maximizing

allocation of scientific labor. We then show how this allocation might replicate the empirical results below.

Let be the time derivative of science i , let be “inventive” labor engaged in discovery, and let

be “absorptive” labor in i engaged in searching stock . Then the knowledge production function is

iA& iL

ijK jA

(1) ( ) 1 0 ,1 1

...N i,βAKLA ijN

jjijii ij =>>= ∑

=αδ βα&

Returns to scale are constant or diminishing so . Inventive labor and some absorptive

labor are required for accumulation. If

∑ = ≤+ Nj ij1 1βα

iiβ exceeds ijβ within-field effects exceed cross-field effects.

Let the wage of researchers be and let total research cost beC . If we take the objective of

scientists to be maximization of output subject to a cost constraint, then the Lagrangian is

RW

( )⎥⎥

⎦

⎤

⎢⎢

⎣

⎡

⎟⎟

⎠

⎞

⎜⎜

⎝

⎛+−+=Η ∑∑

==

N

jijiR

N

jjiji KLWCAKL ij

11

λδ βα

whereλ is the Lagrangian multiplier. Adopting this as the objective avoids the assumption of profit

maximization, which is not clearly relevant to university research. Maximization of H yields the follwing

first order conditions for absorptive labor in the same and other fields:

(2) , Rjijiij

Riiiiii

WAKL

WAKL

ijij

iiii

λδβ

λδβββα

ββα

≤

=−

−

1

1

In the interest of brevity we omit the first order condition for and we assume that the second order

conditions are satisfied. The top equation determines absorption of science in the same field while the

second inequality determines absorption of other fields. In the case of generally small

iL

iiβ and ijβ ,

absorptive labor is small in the face of limited research budgets. Limiting values of ijβ imply that the

second member of (2) is a strict inequality. In that case science drops out of the production function

for , as we frequently find. If, as also appears to hold empirically, technological distance generates an

j

i

4

extreme drop off in the cross-term ijβ relative to iiβ , absorptive labor in the same science

exceeds committed to other sciences. To see this, let the second member of (2) be an equation,

compute the ratio of the two equations, and solve for . After some manipulations the result is

iiK

ijK

iiK

(3) ii

ijii

ij

ii

ijjij

iiiii K

A

AK β

ββ

β

β

β

β −

−−

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛= 1

11

1

If , and if, as already postulated,ji AA ≈ ijii ββ > , then it follows that . To put this point

another way, unless greatly exceeds , absorptive labor is concentrated within the same field.

ijii KK >

jA iA

In the next section we argue that science citations proxy for absorptive labor, even if with error, by

representing the outcome of that labor. If this is so the implications of this section are reflected in a small

propensity to cite, dominance of within-field citations, and absence of many between-field citation

possibilities.

III. Citation Analysis

A. The Meaning of Science Citations

Science citations refer to prior literature, but their motivation is obscure. Citations could measure

influence of earlier ideas, they could limit the problem being addressed, they could seek to refute findings,

or they could be a strategy to raise the odds of acceptance. Of these motives the first two are most likely to

represent scientific influence. Given negative or strategic citations, though, we have to regard citations as

measuring scientific influence with error5.

Science citations are typically controlled by authors. Referees and editors can suggest references,

but their inclusion requires author’s assent, suggesting some knowledge of the references. In contrast

patent citations are often suggested by examiners and attorneys and do not imply the same acquaintance as

science citations do. This is an advantage of science citations, but it should not be overstated. The quality

of science citations varies with the people that make them. Again the citations could be strategic because 5 See Banks, Fogarty, and Jaffe (1996) for an analysis that uses a set of NASA patents, as well as expert opinion on the patents, to test the validity of patent citations, answered in the affirmative, as an indicator of the importance of patents.

5

authors choose references with no monetary punishment, unlike patent citations. Citations have gotten

easier to make because of search engines, though whether this raises or lowers quality is unclear.

Suppose that science citations reflect credible investments of time spent searching the literature.

What would be the earmarks of such investments? For starters, the number of citations would set marginal

benefit of another citation equal to its marginal cost, as above. This suggests that citations would span

larger fractions of smaller disciplines, since similar marginal benefit and cost relationships across

disciplines would lower the proportion of larger literatures that is cited. Furthermore, literatures requiring

larger investments of time per cited paper would result in lower citations, size of literature held constant.

B. Citation Measures

The citation probability used by Jaffe and Trajtenberg (1999) is one way to take account of the size of

citing and cited populations. The citation probability is

(4) jtiT

iTjtiTjt nn

cp =

where i and j are citing and cited groups, defined below, and T and t are citing and cited years, T>t. The

term in the numerator is the citation count from group i in year T to group j in year t. Citations are the

number of linked papers in (i, T) and (j, t). The product of and in the denominator are the numbers

of potentially citing and potentially cited papers in (i, T) and (j, t) that could be linked. Thus (4) is bounded

between 0 and 1 and has a probability interpretation. If not one paper in (i, T) cites not one paper in (j, t)

then (4) equals zero. If every paper in (i, T) cites every paper in (j, t) then (4) equals 1.0. In fact (4) is

closer to zero than 1.0, reflecting limits on citation that operate on individual scientific papers.

iTjtc

iTn jtn

Of course the citation probability is one of several measures that could be examined. It is a measure

of average (not total) scientific influence by group on papers in group i . The logic is this. Suppose that

a paper in iT (with superscript ) references a fraction of the literature in , as indicated by citations per

paper . This resembles a utilization rate. The average utilization rate of the literature in by

papers in is then the sum over of divided by , or (4). For a group of citing papers it

makes more sense to use the group utilization rate, which is or (4) multiplied by the number of

j

k jt

jtkiT nc / jt

iT k jtkiT nc / iTn

jtjti nc /τ

6

citing papers . But an analysis of alternative metrics is beyond the scope of this article, which deals

with measuring scientific influence on individual papers.

iTn

C. Citation Function

The citation probability (4) is defined on cells that are specified by citing and cited groups i and j and

by years. The groups are first of all fields. But if the citing and cited fields are the same, then we make a

further distinction. In that case we supplement field with rankings of scientific institutions into high,

medium, or low “rank-stratification” classes. The use of rank-stratification classes in the citation function

is discussed below, while section IV.C explains the empirical procedure that defines the classes. That

procedure relies heavily on National Research Council peer rankings of the quality of graduate programs,

which are statistically independent of the bibliometric evidence that we explore below.

To fit the citation probability to the data we adapt a procedure used in Jaffe and Trajtenberg (1999) to

explain patterns of patent citation. We incorporate the following features of the data. As already noted, we

incorporate differences in own and cross-field citation propensities. Second, we allow for top-down

asymmetries in citation which favor leading scientific institution. Third, we take account of the early

peaking of citation, followed by a long decline, as the lag in citation increases. In this section we model

these effects using nonlinear regression6. The baseline citation function is:

(5) ( )[ ]

( )[ ]{ } iTjtutT

tTjtTijiTjtp

+−−−

−−=

2 exp1

11 exp

β

ββααα

In (5) ijα is the average probability that field cites field , i j Tα is the average probability that a citation is

made in periodT , and tα is the average probability that a citation is received in period t . In the data i

and represent the 12 fields of science in table 1 below. Notice that the parameters are defined relative to

a baseline value

j

7. We normalize ijα by the parameter for chemistry citing itself, whose value accordingly

equals 1.0. Likewise Tα and tα are parameters that are normalized by the earliest periods citing and

cited, whose values equal 1.0. The estimation procedure does not converge when we specify a full set of

6 We thank Adam Jaffe for nonlinear regression programs that we modified and used in this paper. 7 Use of a baseline value avoids indeterminacy in the multiplicative specification (4). This indeterminacy is the analogue to the dummy variable trap in additive regression models.

7

citing and cited years, so we aggregate citing years into periods: 1981-1985, 1986-1990, 1991-1995, and

1996-1999. Thus T and Tα refer to citing intervals, while t and tα refer to single years cited.

The parameter 1β stands for the decay rate in citations by the baseline field of chemistry. The j1β

terms are thus decay parameters relative to chemistry. The parameter 2β governs overall diffusion.

Since 2β positions the overall rate of citation, this parameter is not identified by field independently of the

ijα vector. Finally, the error term is . Estimates of the citation function (5) are shown in table 4

below. The exponential form of (5) handles the familiar hump-shaped pattern of citations over different

lags, as in figure 1 below.

iTjtu

Recall that in addition to (5) we consider a more elaborate specification of the citation function. This

version allows for rank-stratification class effects within sciences. To allow for such effects we replace the

within-field parameter iiα with this 3×3 matrix of citation possibilities:

(6) ⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛=

33,32,31,

23,22,21,

13,12,11,

))((

iii

iii

iii

ii

ααααααααα

α

Equation (6) takes rank-stratification class effects into account within fields, where most citations occur8.

The leading subscript of (6) refers to field while trailing subscripts refer to the top 20%, the middle 40%,

and the bottom 40% of institutions. Again the parameters are identified up to a baseline value, which in

this case is the probability that top 20% institutions in chemistry cite each other. Hence 11,4α =1.

The first row of (6) consists of probabilities that top 20% schools cite themselves, middle 40%

institutions, and bottom 40% institutions. The second row capture probabilities that the middle 40% cites

the top 20%, themselves, and the bottom 40%. And in the third row are probabilities that the bottom 40%

cites the top 20%, the middle 40%, and themselves. We report estimates of (6) in table 5.

We group the data on citations, potentially citing papers, and potentially cited papers into cells for the

purpose of estimation. The cells are as follows. For each field i and field combination there are 171 j

8 Citations between fields may be too uncommon to allow estimates of rank-stratification effects.

8

citing and cited year combinations, given a citation lag of at least one year9. For each such combination we

define nine cells within fields, as in (6). Between-field cells are fields potentially citing 11 other fields10.

Given 12 fields there are 9×12=108 within-field cells, and up to 11×12=132 between-field cells, or up to

240 cells, for each of the 171 citing and cited year combinations. The potential number of cells is

240×171=41,040. But 4,206 of these are missing. This happens because citing and cited year pairs are

missing if cross-field citations are rare. Thus the number of cells is 41,040-4,206= 36,834.

IV. Description of the Database

A. Scientific Papers and Citations Data

The data set consists of 2.4 million publications of the top 110 U.S. universities and 18.8 million

citations to those papers during 1981-1999. The top 110 universities that are included in the data account

for about three-fourths of the academic R&D conducted in the U.S. The schools are identified in the

appendix to Adams, Black, Clemmons, and Stephan (2004).

The source of the data is the Institute for Scientific Information (ISI) in Philadelphia, Pennsylvania.

The set of publications includes articles, reviews, notes, and proceedings, or ISI’s standard set of

communications, in 12 main fields of science that span scientific research. The fields are agriculture,

astronomy, biology, chemistry, computer science, earth sciences, economics and business; engineering,

mathematics and statistics; medicine, physics, and psychology.

The papers appear in 7,137 scientific journals. The set expands to include new journals through the end

of the sample period. The journal set is also world-wide in scope. It is important to note that journals are

assigned to a single dominant field11. This assignment is reasonably accurate for the vast majority of

specialized journals, in part because of the breadth of the 12 fields. But the method produces serious

assignment errors for about one percent or 70 journals that fall into the Multidisciplinary category. ISI

9 Papers in 1999 can cite papers from 1998 through 1981, forming 18 combinations. Papers in 1998 can cite papers from 1997 through 1981, forming 17 combinations. This continues through 1982, when only 1981 papers can be cited. The sum of the series is S=18+17+16+…+1, or S=(19×18)÷2=171. 10 In table 4 we ignore rank-stratification classes and average over all nine probabilities that correspond to all possible interactions between the different rank-stratification classes for each science. 11 In the case of 5,507 journals that are currently published, we follow the assignment of journals to fields by ISI and rely on ISI’s experience to provide an accurate assignment. In the case of 1,630 journals that are formerly published we rely on field assignments of CHI (Computer Horizons Inc.). The argument is the same, that experience of an established firm in bibliometrics is likely to be more accurate than idiosyncratic assignments by ourselves.

9

treats this category as part of biology, which accounts for the largest number of its papers. General

Multidisciplinary journals include Nature, Science, Proceedings of the National Academy of Sciences

USA, and Philosophical Transactions of the Royal Society. To classify all articles in these journals in

biology is a mistake, though solutions to the problem are scarce. Moreover, quite a few Multidisciplinary

journals are linked to biology, so that this error applies to less than one percent of the journals12.

The main alternative is to assign papers according to academic departments of the authors. But current

bibliometric information rules this out, and in any event this would not generate multiple field

assignments13. This is unlike patents, where multiple class assignments of individual patents are common.

To carry out multiple assignments clear criteria would be needed and these would have to be agreed on by a

Scientific Papers Office, much like the Patent Office. Neither of these conditions is currently satisfied.

Another problem is that science citation data do not and cannot include publication date but not first

submission dates at the start of perhaps several submissions. The use of publication dates creates an

upward bias in the citation lag. The reason is that the lag between cited publication dates and citing

submission dates captures the lag when the paper is written. The extra “frictional” lag between citing

submission and publication dates overstates the true lag. This point suggests that differences in diffusion

across science are biased by the use of citing publication dates.

While voluminous, the scientific papers and citations data are only a window on scientific research.

They are truncated on the left and right in time. As a result we lack most of the citations to papers from the

late 1990s, since the citations are not yet made. We know nothing about the papers that influenced research

in the early 1980s since citations to these papers are excluded. The data are also limited by sector and

country. They are limited to papers that have at least one author in the top 110 U.S. universities. Citations

made by U.S. researchers to foreign institutions are excluded, as are citations received by U.S. researchers

from foreign institutions. For this reason many of the rich interactions of the international scientific

enterprise are left out of our analysis. But the system of U.S. university citations and papers is still a great

improvement over what we have had.

12 Examples include Bioinformatics, Biomaterials, Biometrics, Biometrika, Journal of Mathematical Biology, Journal of Theoretical Biology and many others. 13 As an experiment we tried to assign all Harvard papers to one of the 12 main science fields using address information. A third of the papers could not be assigned to a field using information on authors’ Harvard addresses. This led us to abandon the effort, though more could be done on this in the future were journals to codify fields of authors and classify papers by field of author rather than field of journal.

10

B. Field Dimensions of Citation

Table 1 looks at field dimensions of the data. The 12 main science fields are shown on the left. The

second column reports total papers, percent of all papers, total citations received, and percent of all

citations received. The third column reveals the composition of main fields in terms of sub-fields. Besides

variation in size and complexity, the table brings out differences in citation practices. Biology ranks

second in publications and first in citations received. Computer science ranks last in both, pointing out how

differences in the size of citing populations and propensities to cite can affect citation.

For the field combinations that are significant in the regressions, table 2 reports means of citations,

potential papers citing, and potential papers cited. Table 2 excludes self-citation by a university to itself, as

do all later tables. Such “institutional” self-citation reflects influence of a university’s past research rather

than knowledge spillovers. Out of similar concerns we exclude same-university citations between fields.

Even these precautions do not eliminate hidden self-citations due to collaborations with other universities,

but they are the best that we can do with the present information.

The top entry for each science displays within-field citations, potential papers citing, and potential

papers cited. This is followed by similar statistics on cross-field interactions. For example, biology

interacts with agriculture, chemistry, earth sciences, medicine, and psychology. By and large the cross-

field interactions in table 2 seem reasonable. For example, biology and medicine are significantly linked

through cross-citation and chemistry and physics are similarly linked. While this information confirms

expectations as to the structure of the sciences, it also shows selectivity of cross-field interactions. Each of

the 12 fields can interact with any of the others, yielding a total of 12×11=132 possible interactions, and yet

only 32 or about one-fourth are significant.

Not surprisingly, citations within fields are more common than between-field citations. This is partly

because of our field definitions, which sweep up sub-fields into large aggregates. But it represents as well

greater scientific influence within disciplines, as explained in sections II and IIIA. This discussion neglects

differences in the size of cross- and within field interactions. Comparing the highest cross-field citation

count to the within field count shows that astronomy, mathematics and statistics, and physics are almost

autonomous from the rest of science. But the life science fields, agriculture, biology, medicine, and

psychology display strong cross-field dependencies.

11

Table 3 reports mean citation probabilities. We report means, standard deviations, maxima, and

minima to show variations in the probabilities. On average within-field probabilities are 10 to 100 times

greater than cross-field probabilities. In small fields such as astronomy, computer science, earth sciences,

and economics, within field probabilities exceed the corresponding probabilities in large fields like biology,

chemistry, and medicine. Following sections II and IIIA, our explanation for this is that research tends to

span a larger fraction of the smaller fields, holding constant the difficulty of searching the literature.

Figure 1 explores lags in citation. The figure graphs the mean citation probability by citation lag.

Since the data cover 19 years, the maximum lag is 18 years. Four curves are represented in the figure.

Higher curves represent actual and fitted citation probabilities within fields by citation lag. As in other

applications, both curves are hump-shaped and positively skewed. The fitted curve declines at a faster rate

because it controls for the higher propensity to cite in recent years, which dominates longer lags. The

lower curves show the result of including actual and fitted citation probabilities between fields.

C. Citations and the Quality of Scientific Institutions

As noted, we distinguish high, medium, and low rank-stratification classes in part of the analysis.

The expanded regressions require nine parameters for different combinations of citing and cited rank-

stratification class. Four classes would imply 16 parameters; five classes 25 parameters, and so on.

Allowing for three classes is a compromise that limits the parameters that are handled in the estimation

procedure while still allowing estimates of rank-stratification effects.

We classify schools into rank-stratification classes as follows. First, we use the 1993 National

Research Council (NRC) department chair rankings of graduate programs (National Research Council,

1995) to assign quality ranks to schools. NRC collects rankings in these 10 broad sciences: astronomy,

biology, chemistry, computer science, earth sciences, economics and business; engineering, mathematics

and statistics; physics, and psychology. NRC does not rank agriculture and medicine. As a very imperfect

substitute we use federal R&D of the top 110 schools in 1998 to rank agriculture and medicine. The

strength of the 1993 NRC rankings is their emphasis on quality of program rather than quantity of funding.

Use of federal R&D to rank agriculture and medicine weakens the link between quality and citations. And

yet peer rankings in agriculture and medicine are lacking.

12

The number of ranked graduate programs varies with the size of disciplines. The scale of

engineering, biology, and medicine leads us to break out 75 schools in these fields with the rest of the

schools treated as a residual institution. In the case of eight fields—agriculture, chemistry, computer

science, earth sciences, economics and business; mathematics and statistics; physics, and psychology—we

consider 50 separate schools, with the remainder again treated as a residual14. In the case of astronomy,

where there are few ranked programs we break out 25 schools and treat the rest as a residual. The size of

residual “institutions” treated in this way is the same as an average ranked program.

We classify the rank of a school as high if it falls in the top 20% of ranked schools in a field, as

medium if it falls in the middle 40% and as low if it falls in the bottom 40% (including the remainder)15. In

this way we construct the rank-stratification classes used in the study. Next we calculate the number of

citations, the number of potentially citing papers, and the number of potentially cited papers for every

citing and cited field, every rank-stratification class combination, and every citing and cited pair of years.

Figure 2 sums up the evidence on citation interactions between rank-stratification classes,

averaged across the 12 sciences. Clearly the citation probability from the top 20% to the middle 40% lies

below the probability from the middle 40% to the top 20%. To see this, compare the middle bar in the

leftmost group to the first bar in the middle group. The same holds true of other groups. Scientific

influence is top-down rather than bottom-up, though less highly ranked schools do influence those higher

ranked. In addition the figure shows that the probability of citation among top 20% schools exceeds the

probability among the middle 40%. And the probability of citation among the middle 40% again exceeds

the probability among the bottom 40%. By this measure, knowledge flows among peer schools increase

with rank. To an extent this counters the leveling effect of the top-down asymmetry of knowledge flows.

Figures 3 and 4 illustrate field differences in the rank-stratification effects. Figure 3 contains

findings for engineering, where these effects are small. Thus in figure 3 the citation probability varies by a

factor of 1.75:1 across classes and this citation gradient is smaller than the average for the 10 fields that are

ranked by quality of graduate program.

14 There are 48 formally recognized schools of agriculture. All other research in agriculture derives from researchers scattered through related disciplines. 15 The percentage ranking implies that there are 15 institutions in the top 20% of engineering, biology, and medicine. There are 10 schools in the top 20% of agriculture, chemistry, computer science, earth sciences, economics and business; mathematics and statistics; physics, and psychology. The top 20% of astronomy consists of five schools. In this way scale conditions size of the rank-stratification classes.

13

Results for economics and business are shown in figure 4. Rank-stratification effects are large:

the citation probability varies by 4.25:1 across classes, and the gradient is the steepest of any field. The

difference in figures 3 and 4 is likely due to greater inequality of program quality in economics than in

engineering16. The crossing of the citation curves in figure 4 also suggests some separation of research in

the bottom 40% of economics from the rest of the field. This is because the bottom 40% cites itself at a

higher rate than it is cited by higher-ranked departments in this field.

V. Findings

A. Citation Function Estimates

We turn next to econometric results from fitting the citation function to the data. Table 4 reports

estimates of the baseline function (5). We begin by discussing the within- and between-field intercept

parameters. Recall that the parameters are normalized by chemistry. All the within-field parameters differ

significantly from zero. There is considerable variation in the within-field rate of citation, from a low

0.234 in engineering to a high 13.346 in astronomy, and this is a 50 to 1 range in citation probabilities.

These differences are highly significant compared with the baseline value of 1.0.

Even the leading cross-field parameters are considerably smaller than the within field parameters.

Indeed, citation parameters within fields are typically 10 to 100 times larger than the cross-field parameters,

though the cross-field parameters only include those that are near or above the 5% level of significance17.

Fields vary in the extent to which they cite other fields. Judging by the ratio of the leading cross-field

parameter to the within-field parameter, the following fields—agriculture, biology, engineering, and

medicine—are strongly dependent on other sciences. This shows up in the closeness of agriculture and

biology, biology and medicine, engineering and computer science, and medicine and biology, where the

cross-field parameter is 1/5 or more the size of the within-field parameter. Some of the cross-field effects

may reflect the difficulty of drawing distinctions between fields, but some are real. In contrast,

mathematics and statistics shows no significant dependence on other disciplines.

16 The much higher citation probability in economics and business is due in part to the smaller size of the economics and business literature. This issue is discussed in sections II and III. 17 The results reported in the table are about the same whether interactions that are insignificantly different from zero are included or not in the estimation procedure and thus whether the data cells on which such estimates are based are or are not included. This implies that insignificant cells add very little information.

14

Turning to year effects, we find that these are U-shaped and reach a minimum in 1991. This pattern

controls for citing period effects, which drop slightly during the late 1980s and increase thereafter. The

cited year parameters seem to measure vintage effects, although the most recent papers have not had the

same opportunity to be cited as earlier papers. The upward drift in citation over time is shown by the

rising parameters over more recent citing intervals.

This discussion of table 4 concludes with the decay and diffusion parameters. Differentiation of the

citation function (5) shows that the reciprocal of the baseline diffusion parameter (chemistry) times the

diffusion parameter for each science yields the modal lag, or the lag at which citations peak:

1β

i1β

(7) iiModalL 11, /1 ββ= .

To prove this, take the derivative of the citation function, set it equal to zero, and solve for . The

modal lag is a measure of the speed of diffusion. Table 4 shows that this lag ranges from 1.75 years in

physics to 4.2 years in computer science. Average modal lags between citing and cited publication years

are about two years less than lags for patented technology (Jaffe and Trajtenberg, 1999) based on citing and

cited grant years. Since publication and grant years are to a degree analogues, one interpretation of this

difference could be that Open Science institutions accelerate the spread of knowledge through society

compared with proprietary technology

iModalL ,

18.

The peak citation probability is approximately equal to

(8) iiP 112max / βββ=

For chemistry (where =1) the peak citation probability is approximately 2.0×10i1β-4. This is roughly 100

times larger than the peak citation probability for patent citations, which is about 1.5×10-6, indicating the

greater volume of science citations compared with patent citations19.

Table 5 reports estimates of the rank-stratification effects within sciences that follow the expanded

citation function (6). Reporting in table 5 is limited to rank-stratification parameters, since the other

estimates are closely similar to those of table 4.

18 See David (1998) for an account of the creation of Open Science institutions as a result of competing patronage arrangements, and of the importance of these institutions for the rise of science in Europe. 19 This comparison draws on Appendix B of Jaffe and Trajtenberg (1999), which reports a baseline value for 1β of 0.190, and a value for 2β of 0.289×10-6.

15

Results are as follows. First, the diagonal parameters generally confirm the view that institutions

in the top 20% are more likely to cite other top 20% institutions than middle 40% institutions are to cite

each other; and these are more likely to each other than bottom 40% institutions are. For example, the

diagonal parameters in computer science for the top 20%, middle 40%, and bottom 40% are 1.490, 1.176,

and 0.712. The same pattern holds everywhere but agriculture. Citation probabilities increase with rank of

institutions, suggesting that knowledge spillovers among peer institutions rise with quality.

A second feature is the tendency for less prestigious schools to cite those more prestigious at a

higher rate than the reverse. The implication is that scientific influence is primarily from the top-down.

Chemistry provides an example. The parameter for the top 20% citing the middle 40% is 0.700, while the

parameter for the middle 40% citing the top 20% is 0.924. The difference turns out to be highly significant.

The same pattern holds in other comparisons within chemistry and most other disciplines and implies that

scientific influence increases with quality of program. The main exception again is agriculture.

B. Conditional Symmetry Tests

The regression findings of tables 4 and 5 suggest opportunities to test for symmetry of the citation

parameters across fields and rank-stratification classes. Table 6 summarizes the results. Note that these are

pair-wise tests of symmetry of the parameters, which we shall call conditional symmetry tests, because they

assume statistically significant two-way citation between pairs of fields or classes.

The first line of table 6 reports tests of equality of the cross-field citation parameters in table 4.

For example, does the rate at which medicine cites biology differ from the rate at which biology cites

medicine? The first line provides a summary of answers to this question, if citation takes place in both

directions. The answer is that cross-citation effects are symmetric. The null hypothesis of equality of the

parameters is usually accepted. One exception is economics, which cites mathematics and statistics more

than the reverse at the 1% level of significance. A second case is physics, which cites astronomy at a

higher rate than the reverse at the 2% level of significance. Two other cross-citation parameters are

unbalanced and are missed by the above evaluation: agriculture cites earth sciences and astronomy cites

biology, but neither is cited in return. Besides this, the conditional symmetry tests miss asymmetries in

scientific influence that occur in the main body of scientific papers. Many papers use older mathematics

and statistics but feedback effects from these fields to modern mathematics and statistics are less common.

16

Our method ignores such hidden asymmetries, which require encoding of article content that is beyond

current frontiers of bibliometrics.

The second line of table 6 tests for differences in the probability of citation by rank-stratification

class and refers to table 5. Equality and symmetry are rejected in most cases, with agriculture the main

exception. Findings for computer science illustrate the result. The top 20% institutions in computer

science cite each other more than the middle or bottom 40% institutions do, and the differences are

significant at the 1% level. The third line tests for asymmetries in top-down versus bottom-up citation.

Symmetry in the cross-effects is rejected at the 1% level in most cases, with agriculture and medicine the

exceptions20. Thus asymmetry of the rank-stratification effects in table 6 is accepted most of the time.

VI. Conclusion

This paper has described scientific influence among U.S. universities measured by means of citations

to scientific papers. One finding is that scientific influence mostly occurs within fields. Another is that

cross-field interactions are selective: significant cross-field interactions represent less than a quarter of

potential cases. Collectively this suggests that academic knowledge flows are bounded by scientific

distance, much as industrial knowledge flows seem to be hemmed in by technological distance (Adams and

Jaffe, 1996).

We also find that two-way interactions between fields are typically symmetric, so that field A cites

field B as often as B cites A. Still, we are convinced that hidden asymmetries are present in field-to-field

interactions. This is because applied fields cite other fields more often than more basic fields cite, and

because of deeply buried content in applied science papers that uses and interprets basic science materials,

while the reverse is not true.

In addition, knowledge seems to diffuse rapidly, and may turn out to diffuse more rapidly within

science than technology does within industry. Scientific paper citations are also more abundant than patent

citations judged by citation frequency. Our evidence confirms that rank of field in a university is correlated

with scientific influence. Tests of the null hypothesis that higher ranked university-fields are more often

cited than those lower-ranked are accepted in 30 out of 36 cases. We test whether interactions with peer 20 The reappearance of agriculture and medicine on the list of exceptions calls for an explanation. In part the pattern recurs because we rank programs according to quantity of federal R&D rather than quality of graduate program. But also the two fields may show greater equality than most other sciences. This tendency seems to hold for engineering, where rank-stratification classes follow the NRC rankings.

17

institutions increase with rank. The answer is again yes, in 30 out of 36 cases. This implies that

surrounding programs reinforce research in a given program more as rank increases.

The work presented here is but one ingredient in a full-fledged knowledge production for the academic

sector. The production process would explain papers and patents of universities and it would include as

explanatory variables knowledge spillovers as well as current and past contributions from a university’s

own research21. Of course, pursuit of this agenda requires information on different channels of interaction

among universities and fields, as well as the spillback of a university-field’s own past research, besides its

current research support.

Still further ahead our goal is to extend this methodology to consider effects on firm papers and patents

of multi-dimensional spillovers from universities to firms and from firms to firms, in addition to the role of

firm’s own research efforts in the determination of innovative success22. The resulting edifice of the

knowledge production function in industry is itself an ingredient, though a crucial one, in the economics of

growth and technical change.

21 Adams and Griliches (1998) studied production of academic research for samples of university-fields, but without knowledge spillovers. Their findings suggest that production obeyed constant returns at the aggregate level, but decreasing returns at the individual level. This may follow from knowledge externalities, or another factor operating more strongly at the individual level, such as errors in variables. 22 Popp (2002) represents an approach to this question.

18

19

Figure 1--Mean Citation Probabilities by Citation Lag

0

0.00002

0.00004

0.00006

0.00008

0.0001

0.00012

0.00014

0.00016

0.00018

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Lag in Years

Cita

tion

Prob

abili

ty

All Citations, ActualAll Citations, PredictedCitations Within Field, ActualCitations Within Field, Predicted

Figure 2--Mean Citation Probabilities Within FieldsBetween Schools of Different Rank

0

0.00004

0.00008

0.00012

0.00016

0.0002

Citation to Top 20% inField

Citation to Middle 40%in Field

Citation to Bottom 40%in Field

Cited Group

Cita

tion

Prob

abili

ty

Citation from Top 20%Citation from Middle 40%Citation from Bottom 40%

Figure 3--Probability of Citation by Citing and Cited Rank

Of School, Engineering

10

12.5

15

17.5

20

22.5

Top 20% Middle 40% Bottom 40%

Cited Rank of School

Prob

abili

ty x

106

Top 20% Citing

Middle 40% Citing

Bottom 40% Citing

Figure 4--Probability of Citation by Citing and Cited RankOf School, Economics and Business

50

100

150

200

250

300

350

Top 20% Middle 40% Bottom 40%

Cited Rank of School

Prob

abili

ty x

106

Top 20% Citing

Middle 40% Citing

Bottom 40% Citing

20

Table 1 Definition, Size, and Composition of 12 Main Science Fields

Practiced in the Top 110 U.S. Universities, 1981-1999

Main Science Field

Total Papers

(% of Total Papers) [Total Citations Received]

{% of Total Citations Received}

Sub-Field Composition of Main Science Field

Agriculture

189,740 (7.8%)

[730,777] {3.9%}

General agriculture and agronomy; aquatic sciences; animal sciences; plant sciences; agricultural chemistry; entomology and pest control; food science and nutrition; veterinary medicine and animal health

Astronomy

35,795 (1.5%)

[371,982] {2.0%}

Astronomy and astrophysics

Biology

639,195 (26.3%)

[8,339,862] {44.4%}

General biological sciences; biochemistry and biophysics; cell and developmental biology; ecology and environment; molecular biology and genetics; biotechnology and applied microbiology; microbiology; experimental biology; immunology; neurosciences and behavior; pharmacology and toxicology; physiology; oncogenesis and cancer research

Chemistry

195,437 (8.0%)

[1,371,491] {7.3%}

General chemistry; analytical chemistry; inorganic and nuclear chemistry; organic chemistry and polymer science; physical chemistry and chemical physics; spectroscopy, instrumentation, and analytical science

Computer Science

28,184 (1.2%)

[76,424] {0.4%}

Computer science and engineering; information technology and communications systems

Earth Sciences

73,126 (3.0%)

[566,280] {3.0%}

Atmospheric sciences; geology and other earth sciences; geological, petroleum, and mining engineering; oceanography

Economics and Business

43,892 (1.8%)

[161,813] {0.9%}

Economics; accounting; decision and information sciences; finance, insurance, and real estate; management; marketing

Engineering

170,569 (7.0%)

[467,955] {2.5%}

Aeronautical engineering; biomedical engineering; chemical engineering; civil engineering; electrical and electronics engineering; engineering mathematics; environmental engineering and energy; industrial engineering, materials science; mechanical engineering; metallurgy; nuclear engineering

21

Table 1 Definition, Size, and Composition of 12 Main Science Fields

Practiced in the Top 110 U.S. Universities, 1981-1999

Main Science Field

Total Papers

(% of Total Papers) [Total Citations Received]

{% of Total Citations Received}

Sub-Field Composition of Main Science Field

Mathematics and Statistics

61, 061 (2.5%)

[187,484] {1.0%}

Mathematics; biostatistics and statistics

Medicine

659,000 (27.1%)

[4,563,261] {24.3%}

General and internal medicine; anesthesia and intensive care; cardiovascular and hematology research; cardiovascular and respiratory systems; clinical immunology and infectious disease; clinical psychology and psychiatry; dentistry and oral surgery; dermatology; endocrinology, metabolism, and nutrition; environmental medicine and public health; gastroenterology and hepatology; health care sciences and services; hematology; medical research, diagnosis, and treatment; medical research, general topics; medical research, organs and systems; neurology; oncology; ophthalmology; orthopedics, rehabilitation, and sports medicine; otolaryngology; pediatrics; radiology, nuclear medicine, and imaging; reproductive medicine; research, laboratory medicine, and medical technology; rheumatology; surgery; urology and nephrology

Physics

217,026 (8.9%)

[1,219,080] {6.5%}

General physics; applied physics, condensed matter, and materials science; optics and acoustics

Psychology 116,976 (4.8%)

[727,673] {3.9%}

Psychology and psychiatry

Notes: Citations received derive from top 110 universities during the period 1981-1999. They are not a census of citations received from all scientific institutions or all papers in the future. Citations can be from any field to any field among the sciences listed in the table. The total number of papers is 2,430,001. The total number of citations received is 18,784,082.

22

Table 2 Mean Citations and Papers by Citing and Cited Field of Science,

The Top 110 U.S. Universities, 1981-1999

Citing Field

Cited Field Citations Potential Papers

Citing Potential Papers

Cited

Agriculture Agriculture 2,543 11,326 10,671 “ Biology 1,843 “ 34,411 “ Earth Sciences 106 “ 3,979 Astronomy

Astronomy

3,218

2,879

2,127

“ Biology 212 “ 34,411 “ Earth Sciences 123 “ 3,979 “ Physics 118 “ 11,747 Biology

Biology

40,349

44,135

34,411

“ Agriculture 905 “ 10,671 “ Chemistry 625 “ 10,035 “ Earth Sciences 351 “ 3,979 “ Medicine 6,454 “ 36,725 “ Psychology 530 “ 7,007 Chemistry

Chemistry

4,989

12,166

10,035

“ Biology 1,101 “ 34,411 “ Physics 492 “ 11,747 Computer Science

Computer Science

326

2,031

1,410

“ Mathematics & Statistics 28 “ 3,572 “ Engineering 81 “ 8,205 Earth Sciences

Earth Sciences

3,324

5,312

3,979

“ Astronomy 113 “ 2,127 “ Biology 668 “ 34,411 Economics & Business

Economics & Business

1,315

2,966

2,632

“ Mathematics & Statistics 153 “ 3,572 “ Psychology 54 “ 7,007 Engineering

Engineering

1,501

11,434

8,205

“ Computer Science 133 “ 1,410 “ Mathematics & Statistics 99 “ 3,572 “ Physics 328 “ 11,747 Mathematics & Statistics

Mathematics & Statistics

828

3,975

3,572

“ Computer Science 19 “ 1,410 “ Economics & Business 32 “ 2,632 “ Engineering 48 “ 8,205 Medicine

Medicine

26,714

45,734

36,725

“ Biology 9,764 “ 34,411 “

Psychology 737 “ 7,007

23

Table 2 Mean Citations and Papers by Citing and Cited Field of Science,


Citing Field

Cited Field Citations Potential Papers

Citing Potential Papers

Cited

Physics

Physics

13,561

16,272

11,747

“ Astronomy 174 “ 2,127 “ Chemistry 315 “ 10,035 “ Engineering 226 “ 8,205 Psychology

Psychology

4,399

7,804

7,007

“ Biology 555 “ 34,411 “ Economics & Business 50 “ 2,632 “

Medicine 789 “ 36,725

Notes: Entries are means over as many as 171 citing and cited year pairs for each citing and cited field combination, where the lags range from one to eighteen years. The statistics are based on 36,834 cells that report number of citations and numbers of potentially citing and cited papers, classified by citing and cited groups and years. Self-citations within a field and citations between fields in the same university are excluded from this analysis.

24

Table 3

Moments of the Citation Probabilities, By Citing and Cited Field Papers and Citations of the Top 110 U.S. Universities, 1981-1999

Mean S.D. Min Max Citing Field Cited Field (All Entries in Units of 10-6 a )

Agriculture

Agriculture

20.7

6.6

8.6

38.7

“ Biology 4.6 1.2 2.3 7.1 “ Earth Sciences 2.3 0.6 0.8 4.0 Astronomy

Astronomy

526.0

226.3

108.2

1081.0

“ Biology 2.2 1.7 0.3 11.2 “ Earth Sciences 11.8 7.1 2.1 40.9 “ Physics 3.6 2.2 0.6 14.7 Biology

Biology

25.7

11.9

5.0

44.6

“ Agriculture 1.9 0.7 0.7 3.3 “ Chemistry 1.4 0.5 0.5 2.5 “ Earth Sciences 1.9 0.7 0.6 3.4 “ Medicine 3.9 1.7 0.9 7.0 “ Psychology 1.7 0.6 0.7 3.7 Chemistry

Chemistry

40.9

16.5

11.1

84.1

“ Biology 2.5 1.0 0.8 5.1 “ Physics 3.4 1.2 1.1 5.8 Computer Science

Computer Science

125.5

52.6

41.0

302.3

“ Mathematics & Statistics 4.0 2.1 0.3 11.5 “ Engineering 4.9 2.3 0.7 12.9 Earth Sciences

Earth Sciences

159.1

56.9

67.8

416.6

“ Astronomy 10.3 5.8 2.3 42.1 “ Biology 3.5 1.5 1.4 10.4 Economics & Business

Economics & Business

165.1

47.4

80.2

262.4

“ Mathematics & Statistics 15.1 7.6 1.6 39.0 “ Psychology 2.7 1.2 0.2 6.0 Engineering

Engineering

15.9

5.0

6.1

27.3

“ Computer Science 8.6 3.0 2.0 17.7 “ Mathematics & Statistics 2.5 0.7 0.7 4.5 “ Physics 2.4 0.8 0.8 4.8 Mathematics & Statistics

Mathematics & Statistics

58.2

13.6

31.3

86.8

“ Computer Science 3.6 1.7 0.4 10.2 “ Economics & Business 3.3 1.9 0.2 11.0 “ Engineering 1.5 0.6 0.3 3.9 Medicine

Medicine

15.5

6.1

4.1

25.8

“ Biology 5.9 2.5 1.2 10.5 “

Psychology 2.3 0.7 0.9 4.3

25

Table 3 Moments of the Citation Probabilities, By Citing and Cited Field Papers and Citations of the Top 110 U.S. Universities, 1981-1999

Mean S.D. Min Max Citing Field Cited Field (All Entries in Units of 10-6 a )

Physics

Physics

60.2

49.0

10.0

238.0

“ Astronomy 5.0 3.2 0.5 14.1 “ Chemistry 2.0 0.9 0.5 4.5 “ Engineering 1.6 0.7 0.4 3.5 Psychology

Psychology

79.6

23.9

30.7

145.5

“ Biology 2.0 0.6 0.8 3.5 “ Economics & Business 2.5 1.8 0.1 11.6 “

Medicine 2.7 0.9 1.0 4.9

Notes. The entries are means over as many as 171 citing and cited year pairs for each of the citing and cited field combinations. The calculations are based on 36,834 citing and cited field, rank-stratification class, and year observations. See the text for more discussion. a The statement that all entries are in units of 10-6 means that 20.7 is 20.7×10-6, 6.6 is 6.6×10-6, and likewise for all the other entries in the table.

26

Table 4

Baseline Citation Function, with Cross-Field Effects The Top 110 U.S. Universities, 1981-1999

Variable or Statistic

Regression Parameter

Asymptotic

Standard Error

Asymptotic

t-Statistic, H0=0

Asymptotic

t-Statistic, H0=1

Field Intercepts (αi s)

Citing Field Cited Field

Agriculture

Agriculture

0.334

0.020

16.7

-33.3

“ Biology 0.073 0.012 6.1 -77.3 “ Earth Sciences 0.036 0.019 1.9 -50.7

Astronomy Astronomy 13.346 0.337 39.6 36.6 “ Biology 0.057 0.024 2.4 -39.3 “ Earth Sciences 0.265 0.042 6.3 -17.5 “ Physics 0.086 0.031 2.8 -29.5 Biology Biology 0.702 0.023 30.5 -13.0 “ Agriculture 0.048 0.017 2.8 -56.0 “ Chemistry 0.035 0.017 2.1 -56.8 “ Earth Sciences 0.048 0.021 2.3 -45.3 “ Medicine 0.102 0.013 7.8 -69.1 “ Psychology 0.041 0.019 2.2 -50.5 Chemistry Chemistry 1.000 -- -- -- “ Biology 0.058 0.016 3.6 -58.9 “ Physics 0.076 0.020 3.8 -46.2 Computer Science Computer Science 1.616 0.056 28.9 11.0 “ Engineering 0.065 0.021 3.1 -44.5 “ Mathematics & Statistics 0.053 0.026 2.0 -36.4 Earth Sciences Earth Sciences 2.929 0.079 37.1 24.4 “ Astronomy 0.186 0.031 6.0 -26.3 “ Biology 0.066 0.015 4.4 -62.3 Economics & Business Economics & Business 2.358 0.066 35.7 20.6 “ Mathematics & Statistics 0.190 0.024 7.9 -40.9 “ Psychology 0.035 0.019 1.8 -50.8 Engineering Engineering 0.234 0.018 13.0 -42.6 “ Computer Science 0.124 0.025 5.0 -35.0 “ Mathematics & Statistics 0.036 0.019 1.9 -50.7 “ Physics 0.035 0.014 2.5 -68.9 Mathematics & Statistics Mathematics & Statistics 0.867 0.035 24.8 -3.8 “ Computer Science 0.049 0.029 1.7 -32.8 “ Economics & Business 0.047 0.025 1.9 -38.1 “ Engineering

0.021 0.019 1.1 -51.5

27

Table 4 Baseline Citation Function, with Cross-Field Effects




Asymptotic

Standard Error

Asymptotic

t-Statistic, H0=0

Asymptotic

t-Statistic, H0=1

Field Intercepts (αi s)

Citing Field Cited Field

Medicine

Medicine

0.324

0.014

23.1

-48.3

“ Biology 0.126 0.011 11.5 -79.5 “ Psychology 0.045 0.015 3.0 -63.7 Physics Physics 3.414 0.096 35.6 25.1 “ Astronomy 0.239 0.059 4.1 -12.9 “ Chemistry 0.086 0.040 2.2 -22.9 “ Engineering 0.069 0.041 1.7 -22.7 Psychology Psychology 1.137 0.034 33.4 4.0 “ Biology 0.028 0.011 2.5 -88.4 “ Economics & Business 0.034 0.020 1.7 -48.3 “ Medicine 0.037 0.010 3.7 -96.3 Cited Year Effects

1981 1.000 -- -- -- 1982 1.012 0.009 112.4 1.3 1983 1.031 0.010 103.1 3.1 1984 1.027 0.011 93.4 2.5 1985 0.993 0.011 90.3 -0.6 1986 0.956 0.011 86.9 -4.0 1987 0.862 0.011 78.4 -12.5 1988 0.798 0.011 72.5 -18.4 1989 0.761 0.011 69.2 -21.7 1990 0.739 0.012 61.6 -21.8 1991 0.722 0.012 60.2 -23.2 1992 0.775 0.013 59.6 -17.3 1993 0.725 0.013 55.8 -21.2 1994 0.744 0.015 49.6 -17.1 1995 0.766 0.016 47.9 -14.6 1996 0.822 0.018 45.7 -9.9 1997 0.832 0.019 43.8 -8.8 1998 1.110 0.028 39.6 3.9 Citing Interval Effects

1981-1985 1.000 -- -- 1986-1990 0.925 0.008 115.6 -9.4 1991-1995 1.070 0.015 71.3 4.7 1996-1998

1.160 0.022 52.7 7.3

28

Table 4 Baseline Citation Function, with Cross-Field Effects




Asymptotic

Standard Error

Asymptotic

t-Statistic, H0=0

Asymptotic

t-Statistic, H0=1

Decay Parameter (β1)

0.353

0.006

58.8

--

Diffusion Parameter (β2) 7.2×10-5 1.86×10-6 387.1 --

Field Decay Parameters (β1i s)

Agriculture 0.778 0.029 26.8 -7.7 Astronomy 1.044 0.016 65.3 2.8 Biology 1.068 0.021 50.9 3.2 Chemistry 1.000 -- -- -- Computer Science 0.675 0.015 45.0 -21.7 Earth Sciences 0.849 0.014 60.6 -10.8 Economics & Business 0.679 0.012 56.6 -26.8 Engineering 0.738 0.037 19.9 -7.1 Mathematics & Statistics 0.716 0.018 39.8 -15.8 Medicine 0.917 0.024 38.2 -3.5 Physics 1.623 0.028 58.0 22.3 Psychology

0.691 0.013 53.2 -23.8

Notes. The number of cells, classified by citing and cited fields and years, is 36,834. The adjusted R2=0.900 and the standard error of the regression (the root mean squared error) is 0.0013. Citations from the same university are treated as self-citations and hence are excluded from the equation. Reported cross-field citation parameters are at or near the margin of significance for a test of H0=0.

29

Table 5

Citation Function: Effects of Rank Stratification-Class The Top 110 Universities, 1981-1999

(Asymptotic Standard Errors in Parentheses)

Cited Rank-Stratification Class Field and Citing Rank-

Stratification Class Top 20%

Middle 40% Bottom 40%

Agriculture

Top 20%

0.198 (0.017)

0.224 (0.016)

0.198 (0.017)

Middle 40%

0.244 (0.017)

0.232 (0.016)

0.203 (0.017)

Bottom 40%

0.195 (0.017)

0.184 (0.016)

0.326 (0.022)

Astronomy Top 20%

8.733 (0.263)

9.753 (0.291)

7.969 (0.239)

Middle 40%

11.254 (0.335)

8.133 (0.244)

7.902 (0.236)

Bottom 40%

9.893 (0.295)

8.852 (0.264)

7.723 (0.231)

Biology Top 20%

0.867 (0.031)

0.528 (0.021)

0.303 (0.015)

Middle 40%

0.733 (0.026)

0.451 (0.018)

0.307 (0.015)

Bottom 40%

0.531 (0.021)

0.395 (0.017)

0.275 (0.014)

Chemistry Top 20%

1.000 (--)

0.700 (0.028)

0.516 (0.023)

Middle 40%

0.924 (0.032)

0.660 (0.026)

0.519 (0.022)

Bottom 40%

0.809 (0.028)

0.623 (0.023)

0.490 (0.019)

Computer Science Top 20%

1.490 (0.059)

1.079 (0.047)

0.598 (0.035)

Middle 40%

1.636 (0.063)

1.176 (0.049)

0.739 (0.037)

Bottom 40% 1.280 (0.051)

1.066 (0.045)

0.712 (0.035)

Earth Science Top 20%

2.685 (0.086)

2.093 (0.068)

1.671 (0.056)

Middle 40%

2.556 (0.081)

1.852 (0.061)

1.608 (0.053)

Bottom 40%

2.140 (0.069)

1.801 (0.059)

1.519 (0.050)

Economics and Business Top 20%

2.693 (0.086)

1.572 (0.054)

0.678 (0.030)

30

Table 5 Citation Function: Effects of Rank Stratification-Class

The Top 110 Universities, 1981-1999 (Asymptotic Standard Errors in Parentheses)

Cited Rank-Stratification Class Field and Citing Rank-

Stratification Class Top 20%

Middle 40% Bottom 40%

Economics and Business (Cont.)

Middle 40%

2.853 (0.090)

1.597 (0.054)

0.897 (0.035)

Bottom 40%

2.001 (0.065)

1.528 (0.051)

1.046 (0.037)

Engineering Top 20%

0.211 (0.018)

0.156 (0.016)

0.122 (0.016)

Middle 40%

0.198 (0.017)

0.140 (0.015)

0.122 (0.015)

Bottom 40%

0.173 (0.017)

0.141 (0.016)

0.124 (0.016)

Mathematics and Statistics Top 20%

0.857 (0.040)

0.637 (0.033)

0.365 (0.025)

Middle 40%

0.910 (0.040)

0.575 (0.029)

0.382 (0.023)

Bottom 40%

0.677 (0.032)

0.543 (0.027)

0.395 (0.022)

Medicine Top 20%

0.248 (0.013)

0.226 (0.012)

0.180 (0.011)

Middle 40%

0.252 (0.013)

0.211 (0.011)

0.187 (0.011)

Bottom 40%

0.222 (0.013)

0.206 (0.012)

0.187 (0.012)

Physics Top 20%

2.184 (0.077)

2.258 (0.078)

1.572 (0.059)

Middle 40%

2.634 (0.089)

2.840 (0.094)

2.253 (0.076)

Bottom 40%

2.120 (0.073)

2.622 (0.087)

1.994 (0.068)

Psychology Top 20%

0.929 (0.034)

0.806 (0.031)

0.658 (0.025)

Middle 40%

0.996 (0.036)

0.755 (0.029)

0.634 (0.024)

Bottom 40%

0.880 (0.031)

0.748 (0.027)

0.610 (0.022)

Notes. The number of citing and cited group and year observations is 36,834. The adjusted R2=0.938 and the standard error of the regression (root mean squared error) is 0.0010. * The t-statistic is reported for the null hypothesis H0=0. ** The t-statistic is reported for the null hypothesis that H1=1. Citations from the same university are treated as self-citations and hence are excluded from the equation. The regression includes all the cross-field citation parameters, cited year effects, and citing year interval effects of Table 5.

31

Table 6 Conditional Symmetry Tests of the Citation Function

Top 110 Universities, 1981-1999

Test

Null Hypothesis

Purpose

Summary

Exceptions

Equality of Between-Field Citation Parameters

jiij αα = Check for asymmetries in the direction of citation between fields i and j

Equality is accepted by 13 of 15 tests at the 5% level of significance

Economics and Business cites Mathematics and Statistics more than the reverse (χ2=16.7, P<0.0001); Physics cites Astronomy more than the reverse (χ2=5.1, P=0.0240)

Equality of Within-Field, Within Rank-Stratification Class Parameters

llikki ,, αα =

Check for asymmetries in citation within quality groups k and l within field i

Equality is rejected by 33 of 36 tests at the 1% level of significance. Citation increases with quality of institution in 30 of 36 tests

Top 20% of Agriculture is cited less than the bottom 40% (χ2=33.7, P<0.0001); middle 40% is cited less than the bottom 40% (χ2=21.5, P<0.0001). Top 20% of Physics is cited less than the middle 40% (χ2=139.7,P<0.0001);

Equality of Within-Field, Between Rank-Stratification Class Parameters

lkikli ,, αα = Check for asymmetries in citation across quality groups k and l and within field i

Equality is rejected by 30 of 36 tests at the 1% level of significance. Citation is less top to bottom than it is bottom to top in 30 of 36 tests

All tests accept equality in agriculture. Equality between the top 20% and middle 40% of engineering is accepted at the 1% but not 2% levels (χ2=5.1, P=0.0237). Equality between the top 20% and middle 40% of medicine is accepted at the 1% but not 3% levels (χ2=4.6, P=0.0317). Equality between the middle 40% and bottom 40% of medicine is accepted.

Notes. All χ2 tests are Wald Tests that evaluate the difference in the parameters from zero evaluated at the unrestricted likelihood function.

32

References

Adams, James D., “Fundamental Stocks of Knowledge and Productivity Growth,” Journal of Political

Economy 98 (August 1990): 673-702.

______________, and Adam B. Jaffe, “Bounding the Effects of R&D: An Investigation Using Matched

Establishment-Firm Data,” RAND Journal of Economics 27 (Winter 1996): 700-721.

______________, and Zvi Griliches, “Research Productivity in a System of Universities,” Annales

D’Economie et de Statistique 49/50 (1998): 127-162.

Audretsch, David, and Paula E Stephan, “Company-Scientist Locational Links: The Case of

Biotechnology,” American Economic Review 86 (June 1996): 641-652.

Branstetter, Lee, “Measuring the Impact of Academic Science on Industrial Innovation: The Case of

California’s Research Universities,” Working Paper, Columbia University, New York, NY:

August 2003.

Cohen, Wesley M., Richard R. Nelson, and John P. Walsh, “Links and Impacts: The Influence of Public

Research on Industrial R&D,” Management Science 48 (January 2002): 1-23.

Darby, Michael R., and Lynne G. Zucker, “Grilichesian Breakthroughs: Inventions of Methods of

Inventing and Firm Entry in Nanotechnology,” NBER Working Paper 9825, Cambridge, MA: July

2003.

David, Paul A., “Common Agency Contracting and the Emergence of ‘Open Science’ Institutions,”

American Economic Review, Papers and Proceedings 88 (May 1998): 15-21.

De Solla Price, Derek J., “Networks of Scientific Papers,” Science, New Series, 149 (July 30, 1965):

510-515.

___________________, Little Science, Big Science…and Beyond, New York: Columbia University

Press, 1986.

Griliches, Zvi, “Issues in Assessing the Contribution of Research and Development to Productivity

Growth,” Bell Journal of Economics 10 (Spring 1979): 92-116.

___________, “The Search for R&D Spillovers,” Scandinavian Journal of Economics 94 (Supplement,

1992): S29-S47.

33

Evenson, Robert E., and Yoav Kislev, Agricultural Research and Productivity. New Haven,

Connecticut: Yale University Press, 1975.

Garfield, Eugene, “Citation Analysis as a Tool in Journal Evaluation,” Science, New Series 178

(November 3, 1973): 471-479.

Jaffe, Adam B., “Technological Opportunity and Spillovers of R&D: Evidence from Firms’ Patents,

Profits, and Market Value,” American Economic Review 75 (December 1986): 984-1002.

____________, Michael S. Fogarty, and Bruce R. Banks, “Evidence from Patents and Patent Citations on

The Impact of NASA and Other Federal Labs on Commercial Innovation,” Journal of Industrial

Economics 96 (1998): 183-205.

____________, and Manuel Trajtenberg, “International Knowledge Flows: Evidence from Patent

Citations,” Economics of Innovation and New Technology 8 (1999): 105-136.

Jensen, Richard, and Marie Thursby, “Proofs and Prototypes for Sale: The Licensing of University

Inventions,” American Economic Review 91 (March 2001): 240-259.

Jones, Charles I., “R&D-Based Models of Economic Growth,” Journal of Political Economy 103 (August

1995): 759-784.

_____________, “Sources of Economic Growth in a World of Ideas,” American Economic Review 92

(March 2002): 220-239.

Marshall, Alfred, Principles of Economics, 8th Edition. London, Macmillan, 1920

Merton, Robert K., The Sociology of Science. Chicago, Illinois: University of Chicago Press, 1973.

National Research Council, Research-Doctorate Programs: Continuity and Change, Washington, DC:

National Academy Press, 1995.

Narin, F., and Kimberly S. Hamilton, “Bibliometric Performance Measures,” Scientometrics 36 (1996):

293-310.

_______, Kimberly S. Hamilton, and Dominic Olivastro, “The Increasing Linkage between U.S.

Technology and Public Science,” Research Policy 26 (1997): 317-330.

Popp, David, “Induced Innovation and Energy Prices,” American Economic Review 92 (March 2002):

160-180.

34

Romer, Paul M., “Endogenous Technological Change,” Journal of Political Economy 98 (October 1990,

Part 2): S71-S102.

Scherer, F. Michael, “Interindustry Technology Flows in the United States,” Research Policy 11 (August

1982a): 227-245.

________________, “Interindustry Technology Flows and Productivity Growth,” The Review of

Economics and Statistics 64 (November 1982b): 627-634.

Trajtenberg, Manuel, “A Penny for your Quotes: Patent Citations and the Value of Innovations,” RAND

Journal of Economics 21 (1990): 172-187.

Van Raan, A. F. J., “Fractal Dimensions of Co-Citations,” Nature 347 (October 18, 1990): 626.

Zucker, Lynne G., Michael R. Darby, and Marilyn B. Brewer, “Intellectual Human Capital and the Birth of

Biotechnology Enterprises,” American Economic Review 88 (March 1998): 290-306.

Zuckerman, Harriet A., Scientific Elite: Nobel Laureates in the United States. New York: The Free

Press, 1977.

35

nber working paper series standing on academic … · j. roger clemmons institute for child health...

Documents