
The Synthesis of Specialty Narratives from Co-citation Clusters

Henry Small, Institute for Scientific Information, 3507 Market Street, Philadelphia, PA 19104

A method is described for generating reviews or synopses of scientific fields called specialty narratives. The raw data for the narrative are the Science Citation Index and selected texts from the published scientific literature. The review process is modeled as a walk through a co-citation network using a quasi-minimal spanning tree path and a depth-first search method. Text for the narrative is derived from a statistical analysis of passages which cite the core (highly cited) documents in the cluster. A method is devised for selecting the passage which is most representative of the citing passages for a specific core document using shared content words. This is called the consensus passage. Transitions between nodes in the network are derived from an analysis of co-citing contexts, which reveal why the concepts represented by the documents are associated. The final specialty narrative consists of consensus passages ordered by the tree search, and connected to one another by transitional sentences. The method is applied to a co-citation cluster in the field of cancer virology, and the resulting specialty narrative is described in relation to the structure of the original co-citation map. Implications of the procedure for the cognitive processes involved in reviewing a field, and the extension of the method to other fields and higher level maps of science are discussed.

Introduction

Recent developments in computer, information, and cognitive science have stimulated interest in programming computers so that they appear to be “intelligent” to human users. In information retrieval there is interest in developing natural language interfaces to retrieval systems [1], so that, in effect, the computer talks to the user in his or her language. In the subdiscipline of cognitive science called artificial intelligence, there is the advent of expert systems and their attendant knowledge bases which assist the user in decision making and problem solving [2]. In the field of scientific information there are specialized text files summarizing what is known about certain topics [3]. Also, linguists have turned to scientific and medical texts to attempt an automatic analysis of content into frames [4]. This is possible because scientific prose may be considered as a series of specialized sublanguages.

Received February 4, 1985; revised April 10, 1985; accepted April 10, 1985

© 1986 by John Wiley & Sons, Inc.

One of the tasks which may be amenable to computer-assisted intelligence is the generation of summaries or synopses of specific scientific areas. Information scientists have long recognized the importance of review articles [5], and more recently the short review has come into wider use. The Institute for Scientific Information’s Atlas of Science project incorporated the idea of the “minireview” in its presentation of bibliographic information over a wide range of specialized areas [6]. A recent collection of essays commissioned by the National Institute of Education points out the importance of knowledge synthesis in the field of education [7].

From the cognitive science perspective, the problem is how to model the thought processes of a scientist when he or she constructs a review of a field. In describing such a complex intellectual enterprise, what determines the order or structure of the presentation, the words used to express the ideas, and the way the author moves from one idea to the next? To what extent is there agreement among scientists on the state of a given field, and does a shared perspective or paradigm exist?

The point of departure for the methodology proposed here is the co-citation clustering concept [8]. A co-citation cluster is a bibliometrically defined network structure, and the hypothesis is that it defines a knowledge structure as well as an “invisible college” or social structure, as has been explored by other authors [9]. Hence the objective of the present work is to see how a specialty’s co-citation network can be used to generate a synthetic review statement, and what light this sheds on the thought processes involved in review writing.

In essence, the method pieces together selected passages from a variety of sources to form, as far as possible, a coherent whole. The sources of text for the narrative are the papers which cite the core documents in the cluster, and specifically the contexts of citation for those documents. Earlier work has shown that cited works often play the role of concept symbols for the citing authors, and analysis of the language of citing passages has revealed a uniformity of terminological usage specific to the cited work [10,11]. The study of co-citing contexts has also been explored to find the reasons why specific pairs of cited works are associated [12]. The language of such co-citing passages is less uniform, but it appears possible to use the classification systems proposed for citing contexts to analyze the co-citing passages as well [13]. More recently it has been shown how paragraphs which cite multiple core documents in a cluster can be used to provide an interpretation of the structure of the co-citation network map [14]. This concept is extended here so that citing passages from several sources, selected as the most representative, can be combined, in a predetermined sequence and with appropriate transitions, to form a specialty narrative.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 37(3):97-110, 1986 CCC 0002-8231/86/030097-14$04.00

Co-citation Clusters

The generation of the specialty narrative begins with a co-citation cluster derived from a single-link cluster analysis of a citation database. The method for deriving such clusters has been described in detail elsewhere [15], but the main steps in the process will be reviewed here. A citation database such as the Science Citation Index (SCI) consists of records of the form “document A cites document B”. It covers a predefined set of sources of citations (the A’s) spanning a prescribed time period (for example, some number of years), and an unconstrained set of cited documents (the B’s), whatever earlier literature the sources cite. In the example used later in this article, a special 3-year citation database was extracted from the SCI using the interest profile of a single research institute [16]. More commonly, the database is defined in terms of a set of source journals.

To form co-citation clusters, a citation frequency threshold is set to select the most cited documents in the database. Usually this threshold is in terms of an integer citation count, but more recently a fractional citation threshold has been used which normalizes citation intensities across fields [17]. A threshold is used to define the resolution of the analysis in terms of the smallest unit from which the structure is built, analogous to the resolution of a visual image. Since citation frequency is considered a rough measure of impact or importance [18], this criterion provides a systematic sampling of key works represented in the database. The second step is to determine the frequency of co-citation between all pairs of cited documents selected by the threshold. This is done by permuting and summarizing all the qualifying co-cited references in the source documents, giving a file of co-cited document pairs, each with a frequency of co-citation, as a measure of their association or similarity. This measure is normalized by one of several possible formulas (for example, Salton’s cosine formula [19] or the Jaccard formula [20]), and the pairs are then subject to a clustering routine.

Because the cited documents and co-citation links among them usually number in the tens of thousands, the most efficient clustering method available is the single-link method [21]. The characteristic of single-link clusters is that each object must have at least one link to another object in the group, at the specified level, to be a member of the cluster. Hence, single-link clusters may be very sparse in their interconnections, a phenomenon called “chaining,” but, in practice, display a mixture of densely and weakly linked regions. By contrast, complete-linkage clustering would result in smaller, more densely packed clusters with more sharply defined boundaries. Since our objective is to simulate paths of thought, single-link clusters with their long trails of association leading to wider ranges of topics, seem more appropriate than the narrowly focused complete-linkage clusters. Setting co-citation thresholds for clustering also determines the breadth or narrowness of the resulting groups. Because these clustering methods are hierarchical, a high-level clustering will be included in a lower level solution, but the problem of how to optimize the clustering level is the subject of ongoing research [22].
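The steps just described (citation threshold, co-citation counting, normalization, single-link grouping) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the production procedure used with the SCI: the function names and toy reference lists are hypothetical, the cosine normalization is taken in its usual co-citation form (co-citation count divided by the square root of the product of the two citation counts), and single-link clusters at a fixed level are computed as connected components of the thresholded links.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cocitation_links(reference_lists, min_citations):
    """Map each co-cited pair (a, b) to its cosine-normalized co-citation.

    reference_lists holds one list of cited documents per source paper.
    Only documents cited at least min_citations times are kept.
    """
    citations = Counter(doc for refs in reference_lists for doc in set(refs))
    core = {doc for doc, n in citations.items() if n >= min_citations}

    pair_counts = Counter()
    for refs in reference_lists:
        qualifying = sorted(set(refs) & core)
        pair_counts.update(combinations(qualifying, 2))

    # Assumed form of the cosine formula: cc(a, b) / sqrt(c(a) * c(b)).
    return {
        (a, b): n / sqrt(citations[a] * citations[b])
        for (a, b), n in pair_counts.items()
    }


def single_link_clusters(links, level):
    """Group documents joined by any chain of links at or above level.

    Documents with no qualifying link are omitted, so every cluster
    returned has size 2 or larger.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), strength in links.items():
        if strength >= level:
            parent[find(a)] = find(b)

    clusters = {}
    for doc in list(parent):
        clusters.setdefault(find(doc), set()).add(doc)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: six source papers and their reference lists.
sources = [["A", "B", "C"], ["A", "B"], ["A", "B", "D"],
           ["C", "D"], ["C", "D"], ["E"]]
links = cocitation_links(sources, min_citations=2)
print(single_link_clusters(links, level=0.35))
print(single_link_clusters(links, level=0.30))
```

Lowering the level merges the two small groups into one, which illustrates the hierarchical property mentioned above: the clusters found at a high level are contained in those found at a lower level.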

Once clusters have been obtained at some level of normalized co-citation, it is possible to represent individual clusters as sets of connected nodes. This may be done by hand, simply placing the documents at arbitrary points and then connecting them according to the specified pattern of links, or alternatively, a program such as multidimensional scaling can be used to assign more meaningful locations to the points [23]. Scaling takes the values of all links as input, and computes a location for each document in a space of N dimensions, such that the rank ordering of interpoint distances fits as well as possible the inverse ranking of link values (normalized co-citation, in this case). The configuration obtained in two dimensions from scaling usually clarifies the relationships between documents by placing strongly linked items close to one another and weakly linked items further apart. It is useful to draw the links on the scaling plot at the level the cluster was formed to show both the spatial proximity and the linkage pattern.

An example of a cluster derived from the special 3-year file noted above is presented in Figure 1 (ignoring for the moment the shading and labels on some of the links). It was one of 1278 clusters formed at the normalized co-citation level of .35 (using the Salton formula). This level was selected because it produced the maximum number of clusters of size 2 or larger. Each box is a document cited at least 12 times during the period 1979-1981, and the lines connecting them are co-citation links equal to or greater than .35, the level at which the cluster was formed. (The documents are listed in Appendix 1, and citations to them were obtained from the SCI for the years 1979-1981.) The points have been positioned by two-dimensional scaling conducted using the normalized co-citation links shown in Figure 1 and taking all other interdocument links as zero.

FIG. 1. Leukemia viruses (NIAID 78-81 cluster 148).

The subject matter is the study of leukemia viruses, a topic in biomedical science and more specifically cancer virology. Two viruses play prominent roles: the Friend virus, named after its discoverer Charlotte Friend, and the MCF virus, an abbreviation for mink-cell-focus-forming virus. The history of Friend virus research has been studied by Hackett [24] up to 1974, and provides background for this study of the 1979-1981 period. The documents of Figure 1 constitute an intellectually coherent set of key papers in this field. Our problem is how to construct, in summary form, the knowledge base of this field, what we will call the specialty narrative.

Narrative Structure

The term “narrative” suggests that we seek a way to transform the structure of the co-citation network into a linear ordering of ideas appropriate to a written text. If the network accurately represents the relationships among key ideas in the field for a particular time period, then to construct a specialty narrative we need to traverse all nodes in an orderly fashion. Rather than taking each node in an arbitrary sequence, the strong co-citation links may be used as paths from node to node since they represent the way the documents were associated by the current authors on the subject. Following these association trails means that we are in some sense retracing the sequence of thought which authors took in writing their accounts.

Even in restricting the paths between nodes to the strong co-citation links, however, there are many different ways of traversing the total network. Some further guidelines are needed to limit the possibilities. One reasonable rule is that the path should be an efficient one, touching each node only once to eliminate redundancy. Another consideration is that the narrative should be “connected” with as few “jumps” as possible between unconnected nodes. For a review it also seems reasonable that it begin with one of the older documents in the network, providing historical background as a point of departure. Another criterion for selecting a starting point might be to use a cut-point [25], in graph-theoretic terms, assuming that such points represent critical or pivotal ideas. These same considerations may operate when individual authors attempt to order their ideas for coherent expression, without of course being aware of any explicit graphical structure.

The formal structure of the network suggests one solution: find the most efficient path through the network which touches every node one time. The concept of the minimal spanning tree is the mathematical equivalent of the most efficient path, since this is the shortest path through the network touching all nodes which does not contain any cycles [26]. The shading indicated on certain links of Figure 1 is one such spanning tree, although more than one such tree may exist. This is equivalent to saying that there is in general more than one way to tell the story of this field.

The particular tree shown in Figure 1 was produced by the following algorithm: (1) A starting point is selected. (2) The next point selected is the closest one (in terms of co-citation or distance on the scaling map) linked to that starting point which has not been previously visited. (3) No cycles are allowed, that is, returning to a previously visited point via an alternate route. (4) In the event that no unvisited node is connected to the current node, the program “backtracks” along the path taken to the point until an unvisited node can be reached. (5) When all nodes have been visited, the path is complete. The types of trees generated by this algorithm will span the network, but will not always be minimal. The reason this procedure is preferred over the usual minimal spanning-tree definition is that it depends more on paths available in a local situation, rather than global minima, and therefore may resemble more what an individual might do caught in a maze, in our case a maze of ideas.
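A minimal sketch of the five steps, assuming a connected cluster, may make the procedure concrete. The function name, the toy link values, and the tie-breaking rule (alphabetical order of node names) are illustrative assumptions and do not come from the original program; "closest" is interpreted here as the strongest normalized co-citation link.

```python
def greedy_spanning_tree(links, start):
    """Return tree edges (from_node, to_node) in the order they are added.

    links maps an unordered pair (a, b) to a link strength. A higher
    strength is treated as a shorter distance (an assumption).
    """
    neighbors = {}
    for (a, b), strength in links.items():
        neighbors.setdefault(a, {})[b] = strength
        neighbors.setdefault(b, {})[a] = strength

    visited = {start}          # step 1: the chosen starting point
    trail = [start]            # path taken so far, kept for backtracking
    tree = []
    while trail:
        current = trail[-1]
        unvisited = [n for n in neighbors.get(current, {}) if n not in visited]
        if not unvisited:
            trail.pop()        # step 4: back up along the path taken
            continue
        # Step 2: strongest link to an unvisited node, ties broken by name.
        nxt = min(unvisited, key=lambda n: (-neighbors[current][n], n))
        tree.append((current, nxt))
        visited.add(nxt)       # step 3: visited nodes are never re-entered
        trail.append(nxt)
    return tree                # step 5: the trail is exhausted


# Hypothetical six-node cluster with normalized co-citation strengths.
example_links = {
    ("A", "B"): 0.90, ("B", "C"): 0.80, ("A", "C"): 0.40,
    ("B", "D"): 0.50, ("D", "E"): 0.70, ("C", "E"): 0.36,
    ("B", "F"): 0.45,
}
example_tree = greedy_spanning_tree(example_links, "A")
print(example_tree)
```

In this toy cluster the walk reaches E through the weak C-E link (.36) and therefore never uses the stronger B-D link (.50), which a globally optimal spanning tree would include. The result spans the network without being minimal, and node F is reached only after backtracking to B.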

The spanning tree provides a pathway through the network touching all nodes, but not a sequential ordering of nodes, that is, which one comes first, second, and so on. This requires specification of a starting point, a “search” method on the tree, and designation of trunk, branches, and twigs. The search method which seems most appropriate to the construction of a connected narrative is what is known as the “depth first search” [27]. This method traverses each branch and twig of the tree as deeply as possible before proceeding to the next branch or twig. Such a mechanism ensures a continuity of ideas since jumps only occur when the end of a twig is reached and the narrative must then return to the main trunk. This may be contrasted to the “breadth first search” which deals with each node connected to the starting node as hierarchically contained within it, taking each in succession, and then proceeding to the nodes two steps removed from the starting point. Such a method would lead to many jumps between unconnected nodes, and a more disjointed narrative.

The starting point, or root of the tree, is also important in determining the structure of presentation. As noted before, reviews typically begin with older, pioneering work, and in the case of Figure 1, the two oldest papers happen to be linked (nodes 1 and 2) and are located at an extremity of the network. Selection of an alternative starting point, for example nodes 17 or 19 which are cut-points in the graph, would divide the review into two major sections, one dealing with the right, the other with the left side of the network.

To implement the “depth first search” on the tree we must also specify the trunk, branches, and twigs. Using the rule that the longest path through the tree is the trunk (the main narrative sequence), and that branches are longer than twigs, a schematic version of the tree is derived and shown in Figure 2. The branches off the trunk and twigs off the branches may be regarded as digressions from the main narrative. As soon as the branch or twig has been traversed, the digression is completed and the narrative returns to the main line of thought. Selection of the longest path through the tree as the trunk or main narrative sequence ensures that the discussion will flow logically over a long period and that digressions will be as short as possible. The tree provides us with a linear ordering of the nodes in the network, and a formal structure for the specialty narrative. While it is not the only possible ordering of ideas or way of telling the story of the field, it is a particularly efficient one.
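The ordering rule can be illustrated with a short sketch. It assumes the spanning tree is given as (parent, child) edges directed away from the chosen root, and it takes the trunk to be the longest path starting at that root. At every node the shallower subtrees (twigs and branches, that is, digressions) are visited first and the deepest subtree last, so the main line continues only after its digressions are finished. The function name, the alphabetical tie-breaking, and the example tree are hypothetical.

```python
def narrative_order(tree_edges, root):
    """Depth-first ordering of tree nodes with digressions taken first.

    tree_edges is a list of (parent, child) pairs pointing away from root.
    """
    children = {}
    for parent, child in tree_edges:
        children.setdefault(parent, []).append(child)

    def depth(node):
        # Number of nodes on the longest downward path starting at node.
        return 1 + max((depth(c) for c in children.get(node, [])), default=0)

    order = []

    def visit(node):
        order.append(node)
        # Shallow subtrees first; the deepest subtree carries the trunk.
        for child in sorted(children.get(node, []), key=lambda c: (depth(c), c)):
            visit(child)

    visit(root)
    return order


# Hypothetical tree: trunk A-B-C-E-D with a one-node twig F hanging off B.
example_edges = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("B", "F")]
example_order = narrative_order(example_edges, "A")
print(example_order)
```

Here the twig F is discussed immediately after B, and the sequence then resumes along the trunk through C, E, and D. Only one jump between unconnected nodes occurs, from F back to the trunk.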

Narrative Content

With narrative structure established, the next problem is what content should fill it. Up to now we have used only the statistical characteristics of the co-citation graph. Since the content of the narrative consists of sentences, it must be derived using another kind of analysis. Rather than use the highly cited documents themselves as a source of this text, we use what the citing authors say about these works when they are cited (that is, the citation contexts) [28]. Such an analysis involves identifying the sentence or sentences where reference to specific works occurs. The advantages of using these data for constructing the specialty narrative are that they capture a current way of discussing the earlier work, and can take into account several observations on the same work because the documents on the map are highly cited. Also, the transitions between the nodal works can be deduced more readily since the works are often cited in close proximity to one another in the citing papers, and the specific nature of the relation is often explicitly stated. In short, since citation and co-citation patterns were used to generate the cluster in the first place, it is logical to use the citing and co-citing sentences from the same papers to supply the content for the narrative.

FIG. 2. Narrative sequence on minimal spanning tree.

Since multiple citing passages exist for each highly cited work, the problem becomes which passage should be selected? If it is assumed that most passages express the same concept, but in different words, then it is possible to select that passage which is the most characteristic or typical of the group by virtue of using the most frequently encountered words to describe the idea of the cited work. We will refer to the passage selected in this way as the “consensus passage.” (A similar strategy has been suggested for the automatic naming of clusters by I. Sher of ISI and was used by H. Luhn for automatic abstracting [29].) A review writer might engage in a similar mental procedure to express an idea, if, for example, he or she recalls the ways the idea has been expressed by others, and selects one by comparing each expression against an overall impression of the collection of wordings or formulations.

To implement this procedure a sample of citing papers is selected which provides both comprehensive coverage of the cluster and instances of co-usage, and the citing passages are located in the texts for all core documents. This involves identifying the reference numbers for the core documents in the reference list at the end of the paper, and then scanning the text to find where they occur, or scanning for some other indicator, such as an author-year combination, which matches an alphabetic list of references at the end of the paper. When all citing passages for a given source paper have been located and marked, and the extent of the relevant text has been determined (usually one or two sentences around the point of the citation indicator) [30], the texts of the citing passages are keyed. In the process of keying, or in a prior preediting step, all content-bearing words in the passage are marked. Stemming, truncation, or other methods of terminological unification may also be used at this point, with the idea of identifying the common language used by different authors in citing a given work. Function or nonlexical words which serve to form the syntactic frame for statements are not marked for analysis, since they are usually not unique to the expression of the concept.

After the passages from all citing papers have been coded and keyed, a program is run which counts the frequency of occurrence of all content words for all passages associated with each cited reference. It then scores each passage for “representativeness” by summing the frequencies of those words contained in the specific passage whose frequency is greater than one. Dividing this score by the sum of all word frequencies occurring two or more times in all citing passages for a particular cited reference gives a normalized score which equals 1 if a passage contains all the words which occurred two or more times. A score of zero is obtained if a passage contains only words which occur uniquely in that passage. If all passages have zero scores, then no common words exist and it can be said that there is no consensus on the meaning of the reference. If all scores are 1, then every passage contains the same words appearing in every other passage, and there is complete consensus on meaning. Obviously, most cases are between these extremes, and passages vary in the degree to which they may be said to “represent” or be “typical of” all other passages. Hence it is possible to select the most representative or “consensus passage” of the set.
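The scoring rule can be written compactly. The notation below does not appear in the original and is introduced only for illustration: for one cited reference, f(w) is the frequency of content word w over all of its citing passages, W is the set of words with frequency of two or more, and C_p is the set of content words in passage p.

```latex
W = \{\, w : f(w) \ge 2 \,\}, \qquad
S(p) = \sum_{w \in C_p \cap W} f(w), \qquad
\hat{S}(p) = \frac{S(p)}{\sum_{w \in W} f(w)}
```

Under these definitions the normalized score equals 1 exactly when the passage contains every word of W, and equals 0 when the passage shares no word with W, which matches the two limiting cases described above.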

Another kind of normalization is required because of the redundant citation phenomenon, that is, the citing of multiple references within a given context [31]. Since it is not possible to distinguish which words belong with which reference or whether they are appropriate to all references, the scores are divided by the number of references the passage cites. Hence, these redundant citing passages are much less likely to be selected as the consensus passage for a given reference.

For the leukemia virus cluster of Figure 1, the seven papers citing the largest number of core documents in the cluster were selected for citing context extraction. As noted previously, it is important to select papers which multiply cite the cluster because these papers will contain more instances of core document co-citation which are important for determining the transitions from node to node along the narrative path. Of the 7 highest citing papers, the highest cited 17 of the 29 core documents in the cluster, and the lowest cited 14 (see Appendix 2). The total number of citing passages obtained from these 7 papers was 146 (rather than 105 which is the sum of the individual citing frequencies), because of multiple citing passages for the same core document (that is, op cits) [32]. This effect is complicated by the tendency for certain passages to be footnoted with more than 1 of the 29 core documents, that is, redundantly citing passages. The 146 citing passages for the 7 papers have a redundancy rate of 1.71 (250/146) references per passage, while there is an op cit rate of 2.38 (250/105) passages per core document per citing paper.

From the point of view of the cited documents, the sample of 7 citing papers contains an average of 8.6 (250/29) passages per core document, 1 document having as many as 33 citing passages (counting the op cits). Nevertheless, 4 documents on the map (nodes 23, 24, 27, and 29 on the left side) have no citing passages at all in the 7 papers, and hence 4 additional citing papers were required to span the cluster and have an adequate number of passages for each node. This means that the selection of a few of the most highly citing papers for a given cluster may not be sufficient to span all the cited documents. The 4 supplementary citing papers were selected on the basis of citations to the specific items missed by the first sample. The full bibliography of 7 most highly citing, and 4 supplementary papers is given in Appendix 2.

After the citing passages were coded and keyed for all 11 papers, the computer run was carried out to determine the scores for each passage citing a particular reference. This process can be illustrated for the second cited document in the narrative, namely Friend ’57 (node 2). Figure 3 shows 6 citing passages for this document along with the source of each passage, the references cited by the passage, the score (normalized by the number of references cited), and the list of content words appearing two or more times in the 6 passages. To reiterate, the score is the sum of the content word frequencies greater than 1 occurring in a given passage (italicized in Figure 3) divided by the number of references cited by the passage. By this criterion we select the passage from source 6 to include in the narrative for node 2 because its score is the highest (24). To determine on a more absolute scale how well this passage represents the sample of 6, we divide by the sum of word frequencies occurring two or more times, and obtain a normalized score of .69 (24/35) out of a maximum possible score of 1.0. The mean normalized score for the 29 passages for all nodes in the network is .48, and so the passage selected for node 2 is “better” or more representative than the average consensus passage.
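The arithmetic for node 2 can be retraced with a small sketch. The term sets below are hand-stemmed from the six passages of Figure 3. The shared terms follow the figure's term list, while the handful of unique terms per passage are an illustrative subset chosen here. Counting each term once per passage is an assumption: the text does not say whether repeated occurrences within a passage are counted, but this choice reproduces the frequencies and the total of 35 listed in the figure. All names in the code are hypothetical.

```python
from collections import Counter

# source number -> (stemmed content terms, number of references cited)
PASSAGES = {
    2: ({"friend", "strain", "murine", "acute", "erythroleukemia", "virus",
         "two", "component", "replication", "transform", "cell"}, 3),
    3: ({"isolate", "friend", "virus", "acute", "erythroblastosis",
         "anemia"}, 1),
    4: ({"isolate", "friend", "rauscher", "erythroleukemia", "virus", "two",
         "component", "disease"}, 1),
    5: ({"component", "spleen", "virus", "sffv", "transform", "cell", "mice",
         "foci"}, 3),
    6: ({"friend", "murine", "leukemia", "virus", "isolate", "cell", "mice",
         "carcinoma", "swiss"}, 1),
    10: ({"virus", "lymphoma", "leukemia", "mice", "murine", "tumor",
          "gross"}, 1),
}


def score_passages(passages):
    """Return (total shared frequency, {source: (score, normalized score)})."""
    # Assumption: a term is counted once for each passage that contains it.
    freq = Counter(term for terms, _ in passages.values() for term in terms)
    shared = {term: n for term, n in freq.items() if n >= 2}
    total = sum(shared.values())

    results = {}
    for source, (terms, n_refs) in passages.items():
        raw = sum(shared[t] for t in terms if t in shared)
        score = raw / n_refs            # redundancy normalization
        results[source] = (score, score / total)
    return total, results


total, results = score_passages(PASSAGES)
consensus = max(results, key=lambda source: results[source][0])
for source, (score, norm) in sorted(results.items()):
    print(source, round(score, 1), round(norm, 2))
print("consensus passage: source", consensus)
```

Under these assumptions the passage from source 6 scores highest, and the two passages citing three references each (sources 2 and 5) are pushed down by the division, as the redundancy normalization intends.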

The consensus passages for each of the 29 nodes on the network (Figure 1) are given in the full specialty narrative (Figure 4). Before describing this narrative in more detail, however, we need to address the problem of transitions between passages, symbolized by the links between nodes.

Transitions Between Nodes

The consensus passages for each cited document in the network are arranged in the narrative sequence given by the spanning tree. What is lacking to constitute a coherent narrative are transitions between the passages. For

this we return to the citing papers and locate the co-cita-

tion contexts, the passages between the citing contexts, for each link on the spanning tree path. Generally, the kinds of words we seek are the “function” or “connect-

FIG. 3. Citing passages for node 2 (Friend ‘57).

Source 2. References: 2, 3, 7. Score: 27/3 = 9. Normalized score: .26.
The Friend strain of murine acute erythroleukemia virus consists of at least two components. One is a replication-defective virus which is responsible for the rapid transformation of erythropoietic stem cells in vivo.

Source 3. References: 2. Score: 15/1 = 15. Normalized score: .43.
The original isolate of Friend virus described in 1957 caused acute erythroblastosis associated with anemia.

Source 4. References: 2. Score: 20/1 = 20. Normalized score: .57.
The independently isolated Friend and Rauscher erythroleukemia viruses cause similar diseases and consist of at least two components.

Source 5. References: 2, 3, 7. Score: 17/3 = 5.7. Normalized score: .16.
This component is called the spleen focus-forming virus SFFV because of its ability to induce discrete foci of transformed cells in the spleens of infected mice.

Source 6. References: 2. Score: 24/1 = 24. Normalized score: .69.
The Friend murine leukemia virus was isolated in 1957 by Dr. Charlotte Friend by passage of cell-free extracts of Erlich ascites carcinoma cells in newborn Swiss mice.

Source 10. References: 2. Score: 14/1 = 14. Normalized score: .40.
C-type viruses have long been implicated in the etiology of spontaneous lymphomas or leukemias of mice. This inference was grounded, in part, in the early demonstrations by Gross and others of the existence of filterable leukemogenic agents in AKR mice or transplanted murine tumors.

Term               Frequency
Virus/             6
Friend             4
Murine             3
Mice               3
Isolate/           3
Cell/              3
Component/         3
Leukemia           2
Acute              2
Erythroleukemia    2
Two                2
Transform/         2
Total              35

102 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE-May 1986

FIG. 4. Specialty narrative.

Each entry gives the node and document (or the transition), the most representative citing passage, then the source in parentheses and the comment in brackets.

1 Metcalf 59: Pathogenic strains of Friend murine erythroleukemia virus have the capacity to induce rapid proliferation of erythroid precursor cells within days after inoculation into adult mice of susceptible strains. (Source 3)

Transition (1-2): The virus causing this disease was discovered previously. (Source 3)

2 Friend 57: The Friend murine leukemia virus was isolated in 1957 by Dr. Charlotte Friend by passage of cell-free extracts of Erlich ascites carcinoma cells in newborn Swiss mice. (Source 6)

Transition (2-3): The virus consists of at least two components. (Source 4)

3 Axelrad 64: One component is called the spleen focus-forming virus (SFFV) because of its ability to induce discrete foci of transformed cells in the spleens of infected mice. (Source 4)

Transition (3-4): Another component has also been identified. (Source 6) [Enter branch 1]

4 Steeves 71: Replicating helper-independent murine type-C viruses which did not retain the rapid leukemogenicity of Friend virus complex in adult mice were derived from Friend virus stocks by several laboratories. (Source 6)

Transition (4-5): This other component behaves differently on its own. (Source 6) [Enter twig 1]

5 Dawson 68: Inoculation of rats or mice with replicating helper virus derived from Friend virus complex by several methods resulted in the development of an apparent lymphoid leukemia after the latent interval of up to 6 months. (Source 6)

Transition (4-6): But when both components are combined, the characteristic Friend disease is produced. (Source 1) [Jump back to branch]

6 Troxler 77c: In the case of Friend virus-induced erythroleukemia, it was shown that formation of these spleen foci required the synergistic action of the F-MuLV helper virus and the defective spleen focus-forming virus (SFFV). (Source 7)

Transition (3-7): Returning to the replication defective component, its effect on spleens is characteristic. (Source 6) [Jump back to trunk]

7 Steeves 75: This spleen focus-forming activity was considered to be a distinct viral entity in Friend virus and became known as spleen focus-forming virus or SFFV. (Source 6)

Transition (7-8): But much is still not known about the mechanisms by which viruses cause leukemia. (Source 2)

8 Troxler 78b: The possibility that viruses belonging to the MCF class may also induce erythroleukemias was suggested by the finding that spleens of leukemic mice, which had been inoculated with a clone of Friend murine leukemia virus (F-MuLV), contained Friend-MCF virus. (Source 7)

Transition (8-9): Further evidence exists of the involvement of MCF virus. (Source 4)

9 Ruscetti 78: A cross-reacting SFFV gene product has been detected in SFFV nonproducer cell lines using a radioimmunoassay specific for a mink cell focus-inducing (MCF) virus, gp70. (Source 2)

Transition (9-10): This is consistent with the recombinant nature of F-SFFV.

10 Troxler 77a: SFFV was found to contain F-MuLV-derived genetic sequences by hybridization.

Transition (10-11): Hence further work has been done to find proteins encoded by F-SFFV and their relation to F-MuLV. [Enter branch 2]

11 Barbacid 78: Using cell lines and radioimmunoassay techniques for detecting viral antigens, it has been reported that F-SFFV encodes a protein that is structurally and immunologically indistinguishable from the p15 polypeptide encoded by F-MuLV.

Transition (11-12): The cell surface localization of these SFFV gene products has been studied.

12 Dresler 79: It has been shown that F-SFFV encodes a glycoprotein with an apparent MW of 55,000 (gp55) that is immunologically and structurally related to the F-MuLV envelope glycoprotein gp70.

Transition (12-13): The glycoproteins of the viral envelope have been further characterized. (Source 1) [Enter twig 2]

13 Racevskis 78: A glycoprotein of MW 52,000, gp52, has been found in cells either productively infected with the Friend virus complex or nonproductively infected with pseudotyped F-SFFV.

Transition (13-14): A relation was found between this glycoprotein and a specific gene product. (Source 6)

14 Racevskis 77: The gp52 gene product detected in SFFV normal rat kidney cells is presumably identical to the 50,000-60,000 dalton glycoprotein which was previously detected in Friend murine leukemia cells and which was considered to be potentially SFFV-specific. (Source 6)

Transition (12-15): We turn to the relationship between F-SFFV and MCF-related env gene products. (Source 3) [Jump back to branch 2]

15 Ruscetti 80: Peptide maps show that the F-SFFV gp52 shares peptides with the gp70 of both ecotropic MuLVs and MCF-MuLVs.

Transition (15-16): Other results support this relationship.

16 Ruscetti 79: MCF-specific antiserum fails to immunoprecipitate a glycoprotein gene product, gp70, from ecotropic F-MuLV-infected cells but does immunoprecipitate gp70 from MCF-MuLV-infected cells and gp52 from SFFV-infected nonproducer cells.

Transition (10-17): We return to the question of the genome structure of Friend virus which could bear on its leukemogenic properties, and the recombinant nature of SFFV. [Jump back to trunk]

17 Troxler 77b: Nucleic acid hybridization studies have suggested that F-SFFV is closely related to the dual-tropic MCF viruses which are env gene recombinants between ecotropic and xenotropic MuLVs. (Source 4)

Transition (17-18): Friend virus may then arise in an analogous way to MCF viruses. (Source 7) [Enter branch 3]

18 Troxler 78a: It has been proposed that the SFFVs may have derived from dual-tropic env gene recombinant MCF viruses which form during the process of leukemogenesis by F-MuLV. (Source 4)

Transition (17-19): There was good evidence for the recombinant nature of the MCF viruses. (Source 7) [Jump back to trunk]

19 Elder 77: Since MCF viruses were known to be env gene recombinants between ecotropic and xenotropic virus, it was concluded that the xenotropic-related portion of SFFV was derived from a portion of the env gene of murine xenotropic virus. (Source 6)

Transition (19-20): Further evidence pertains specifically to the Friend-MCF strain. (Source 7)

20 Chien 78: Friend-MCF virus, like other known MuLV-MCF viruses also appears to be an envelope gene recombinant between ecotropic and xenotropic viruses. (Source 1)

Transition (20-21): And in general recombination seemed related to the ability to cause leukemia. (Source 7)

21 Rommelaere 78: Studies have indicated that several highly leukemogenic MuLVs may have arisen by recombination between less leukemogenic lymphatic leukemia viruses and unidentified endogenous xenotropic MuLVs. (Source 2)

Transition (21-22): This is similar to what occurs during the development of spontaneous leukemia in AKR mice. (Source 2)

22 Hartley 77: The onset of spontaneous leukemia in AKR mice is associated with the formation of highly leukemogenic env gene recombinants between endogenously inherited non-leukemogenic ecotropic and xenotropic MuLVs. (Source 5)

Transition (22-23): In the presence of murine leukemia virus (MuLV), these MCF viruses can accelerate the development of tumors. (Source 9) [Enter branch 4]

23 Cloyd 80: Rowe and coworkers have described a classification of MCF virus isolates based on virus replication in AKR thymus and acceleration of leukemia development in AKR mice. (Source 8)

Transition (23-24): Immunological assays have been developed for these various MCF viruses.

24 Cloyd 79: From the extensive heterogeneity of AKR dual-tropic MuLV, it seems unlikely that such viruses have a germ line origin but represent the products of independent recombination events between endogenous ecotropic and xenotropic MuLV genetic information.

Transition (22-25): The MCF viruses may also play a causal role in other cancers. [Jump back to trunk, enter branch 5]

25 Fischinger 75: MCF virus has also been found in Moloney murine leukemia virus Mo-MuLV-induced T-cell lymphomas.

Transition (22-26): The recombinant nature of MCF viruses also explains the timing of leukemia development. [Jump back to trunk]

26 Nowinski 78: Certain isolates of dual-tropic MCF-MuLV, but not ecotropic or xenotropic MuLV, have been shown to accelerate leukemia development after injection of newborn or young AKR mice, suggesting that age-dependent formation of dual-tropic recombinant viruses in thymus can account for at least part of the disease latent period.

Transition (26-27): These experiments in turn explained earlier results. [Enter branch]

27 Hays 77: Earlier work showed that only extracts of thymus of older AKR mice could accelerate leukemia development.

Transition (26-28): These age effects are ascribed to the production of the mink cell focus-inducing (MCF) viruses. [Jump back to trunk]

28 Kawashima 76b: The discovery that spontaneous thymic lymphomas in AKR mice were associated with the local production of viruses with an expanded host range has led to the speculation that the expression of these viruses may represent a necessary step in viral-induced lymphomas.

Transition (28-29): Similarly for leukemia, a preleukemic stage can be defined in terms of virus-related antigens.

29 Kawashima 76a: Since antigen-amplified thymocytes from preleukemic AKR mice are not yet leukemic based on transplantation tests in young AKR recipients, amplified expression of a modified cellular env gene encoded by recombinant viruses may serve as a priming event in AKR leukemogenesis by affecting normal thymocyte function or proliferation in some way.

ing” words rather than the “content” words, since the former tell us the kinds of relationships one document or

concept has to another. For small samples of citing papers, it is usually not possible to have a significant number of proximate co-citation contexts for any two references (within a paragraph or two of one another), and hence a frequency analysis of function or connecting words is not attempted here. However, there may be a set of frequently occurring function or connecting words encountered in practice which would be amenable to statistical treatment. With larger samples of co-citing contexts it should be possible to automate the identification and selection of transitional sentences, similar to the way the consensus passages were identified for each core document, by counting the occurrences of different connecting words.

The generation of transitional sentences may be illustrated using the first step in the narrative, which leads from Metcalf ‘59 (node 1) to Friend ‘57 (node 2, see Figure 1). The Metcalf ‘59 paper is cited as the definitive study of the pathology of “Friend disease.” Two of the seven highly citing papers co-cite these core documents in context, in one case within the same paragraph and, in the other case, in successive paragraphs. One begins by citing Friend ‘57 and then proceeds to Metcalf ‘59 saying: “. . . however, precise histopathologic classification of Friend virus-induced leukemia was unclear.” [Italics added.] The other paper cites the Metcalf ‘59 paper first, and then cites the Friend ‘57 paper as the “original isolate of Friend virus” which “caused” the disease syndrome. In both cases, the transition from one paper to the other is one of causation, namely, that a certain disease is caused by a specific virus.
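Locating proximate co-citation contexts, as in the Metcalf ‘59 and Friend ‘57 example, can be sketched as a simple search over the paragraphs of a citing paper. The sketch below is illustrative only: it assumes each citing paper has already been coded as a list of paragraphs, each represented by the set of references it cites, and the function name, the reference labels, and the `window` parameter are hypothetical.

```python
from itertools import product


def cocitation_contexts(paragraph_refs, ref_a, ref_b, window=1):
    """Find proximate co-citation contexts for two references in one paper.

    paragraph_refs: list of sets, one per paragraph, holding the references
                    cited in that paragraph (an assumed coding scheme).
    Returns (i, j) pairs where ref_a is cited in paragraph i and ref_b in
    paragraph j, with the two paragraphs at most `window` apart.
    """
    a_positions = [i for i, refs in enumerate(paragraph_refs) if ref_a in refs]
    b_positions = [j for j, refs in enumerate(paragraph_refs) if ref_b in refs]
    return [(i, j) for i, j in product(a_positions, b_positions)
            if abs(i - j) <= window]


# Toy papers (invented for illustration): one co-cites within a paragraph,
# one in successive paragraphs, and one only at a distance.
same_paragraph = [{"Friend57", "Metcalf59"}, {"Axelrad64"}]
successive = [{"Metcalf59"}, {"Friend57"}, set(), {"Axelrad64"}]
distant = [{"Metcalf59"}, set(), set(), {"Friend57"}]

print(cocitation_contexts(same_paragraph, "Metcalf59", "Friend57"))
print(cocitation_contexts(successive, "Metcalf59", "Friend57"))
print(cocitation_contexts(distant, "Metcalf59", "Friend57"))
```

The text lying between the two citations in each returned context is the raw material from which a transitional sentence would then be paraphrased; setting `window=2` corresponds to the looser "paragraph or two" criterion.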

This transition is shown at the beginning of Figure 4 with the consensus passages for each document quoted and a transitional sentence between them: “The virus

causing this disease was discovered previously.” The transitional sentence sums up the relationship between the two documents, one which represents the “disease” and the other which represents the “virus.”

While the citing passages are direct quotes, with only minor editorial changes made for consistency or clarity, the transitional sentences are paraphrases of text which comes between the references in the co-citing text. More than one co-citing passage may be used to compose a transition, but the source of the co-citing passage most similar to the paraphrase is given in Figure 4. For example, two co-citing passages connecting Metcalf ‘59 and Friend ‘57 were noted previously, but only one was indicated in the source column (source 3). Ideally a transition may be reduced to an indication of how the concepts in adjacent citing passages are related, but often it is necessary to include content in the transition if the concepts are separated by other ideas in the co-citing text. Since the cluster is a sampling of the most important concepts in the field, it is inevitable that some transitional ideas will have to be supplied.

Special treatment is required for transitions at points on the narrative path where branches or twigs occur, or where the narrative returns to the main trunk after completing a branch or twig. As noted before, these branches are analogous to interruptions or digressions. In written texts such points are often signalled by new section headings or new paragraphs. The longest branch in the narrative occurs at node 10 (Troxler 77a), and in the text of source #6 the new topic is signalled by the beginning of a new section of the paper titled “Proteins Coded for by SFFV.” In the narrative, a branch point could be signalled by a statement such as, “We turn now to consider proteins coded for by SFFV,” although continuity in the narrative will not be lost since a link leads to the next node. However, at the end of a branch, when we jump back to the trunk of the tree, a discontinuity occurs in the narrative. Since there is no way to return to the branch point without passing through nodes already discussed, which is equivalent to repeating ourselves, a jump is required. We might signal this change in topic by saying, “We return now to the properties of SFFV,” and proceed to the transition between the point at which the branch occurred and the next node in the narrative.

A transitional sentence is required for each link on the spanning tree. The narrative is completed by inserting a transitional sentence between the consensus passages for each of the core documents. Transitional sentences are important not only because they provide continuity between passages; they also indicate the cognitive relationships between key concepts in the specialty, and hence reflect the thought process that led to the review statement.
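The walk through the tree, with its jumps back to earlier nodes, can be sketched as a depth-first traversal. The following is a rough illustration rather than the procedure actually used: the links are read off the transition labels in Figure 4 and treated as directed from the earlier to the later node, and children are visited in node-number order, which simply replays the published sequence instead of reproducing the criterion by which branches were ordered. The names `TREE_LINKS` and `narrative_walk` are hypothetical.

```python
from collections import defaultdict

# Spanning-tree links, taken from the transition labels of Figure 4 and
# assumed to run from the earlier node to the later one.
TREE_LINKS = [
    (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (3, 7), (7, 8), (8, 9), (9, 10),
    (10, 11), (11, 12), (12, 13), (13, 14), (12, 15), (15, 16), (10, 17),
    (17, 18), (17, 19), (19, 20), (20, 21), (21, 22), (22, 23), (23, 24),
    (22, 25), (22, 26), (26, 27), (26, 28), (28, 29),
]


def narrative_walk(links, root):
    """Depth-first walk of the tree.

    Returns (order, steps): `order` is the sequence of nodes visited, and
    each step is (from_node, to_node, is_jump), where is_jump is True when
    the walk must jump back to from_node because the node just discussed
    was a different one (the end of a branch or twig).
    """
    children = defaultdict(list)
    for parent, child in links:
        children[parent].append(child)

    order, steps = [root], []
    previous = root
    # Stack of pending links; reversed sorting makes the lowest-numbered
    # child the next one popped (an assumed visiting order).
    stack = [(root, c) for c in sorted(children[root], reverse=True)]
    while stack:
        parent, node = stack.pop()
        steps.append((parent, node, parent != previous))
        order.append(node)
        previous = node
        stack.extend((node, c) for c in sorted(children[node], reverse=True))
    return order, steps


order, steps = narrative_walk(TREE_LINKS, root=1)
jumps = [(a, b) for a, b, is_jump in steps if is_jump]
```

Each step corresponds to one transitional sentence, and each jump marks a place where the narrative needs a "we return now to" signal of the kind described above.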

The Specialty Narrative

The specialty narrative, now completed, is a combination of statements by several individuals from several sources, melded together by common usage, and selected to typify that usage. Hence it represents the main stream of specialty thinking, not a single individual’s view. Figure 4 gives the complete specialty narrative consisting of 29 consensus passages and 28 transitions. The node number appears on the left, followed by the first author and year of the core document. The consensus passages for each node are presented in the narrative sequence defined earlier with the aid of the spanning tree. Each passage is followed on the right by its source, that is, the citing paper from which it was drawn (see Appendix 2). A transitional sentence follows each consensus passage, and its most similar source is given. The column headed “comment” indicates when the narrative enters a branch or twig on the tree, or returns to the trunk. These branch and return points are also shown in Figure 2.

The narrative presented in Figure 4 provides a gradual

unfolding of the subject matter of the original co-citation

network. We begin with the characterization of the Friend virus complex and the disease it produces (nodes 1 and 2). Each component of the virus complex has its own characteristic behavior (nodes 3-7). The suggestion of gene relationships between the components and other viruses (nodes 8-10) leads to the large branch concerned with proteins coded by the viral envelope (nodes 11-16). Completing this branch, we return to a discussion of the role of recombination in the origin of the viruses, and cancer-causing properties (nodes 17 and 18). An analogy to the MCF virus is discussed, which supports the recombinant nature of Friend virus (node 19).

While the right side of the network deals mainly with the Friend virus complex, the left side deals with the related MCF virus. The relation, however, is perhaps more than analogy, and it is suggested that the MCF virus may in some way have given rise to one component of the Friend virus. The bridge from right to left sides of the map (nodes 17-19) symbolizes both a similarity between the two viruses in terms of their recombinant nature (node 20) and also a possible genetic lineage.

Further into the MCF region (left side), the map deals with the recombinant origin of viruses of the MCF type

which cause leukemia (nodes 21 and 22). Their ability to accelerate leukemia development provides a way to classify these heterogeneous viruses (nodes 23 and 24). MCF viruses are associated with lymphomas as well as leukemias (node 25). Evidence for the role of recombination in accelerating leukemias and lymphomas comes from studies of age dependencies and latent periods (nodes 26-28), and suggests that recombination may play a causal role in carcinogenesis (node 29).

It is evident from Figure 4 that sources 6 and 8 were

especially useful for the narrative in covering the Friend virus and MCF sides, respectively. Together these two

sources supplied 12 of the 29 (41%) consensus passages

(source 6 has 7 and source 8 has 5). In effect these papers are the most successful in expressing the results of research in the field in a language common to other researchers. However, no single citing paper is broad enough to span the entire cluster of 29 core documents. Another important point is that the structure of the texts is mirrored by the structure of the network. This is seen not only in the division between different but related viruses from right to left on the map, but also, within the right side, in how the viral proteins branch forms a separate subdivision.

Since we are positing a relationship between the specialty narrative and the way a scientist thinks about and reviews a field, a closer look at the transitions between consensus passages is warranted. These links are the “glue” which binds together the key concepts of the field. A classification of the “types” of transitions was therefore attempted, analogous to previous efforts to classify citing passages. Five categories were sufficient to encompass all 28 transitions: (1) AND ALSO or connective transitions, expressed in such words as “another” or “further”; (2) IN CONTRAST transitions signaled by words such as “but” or “however”; (3) SIMILAR TO transitions which point out analogies or like character; (4) PROPERTY OF which includes relations such as “characterized by” or “consists of”; and (5) EXPLAINS which indicates that concepts are “consistent with,” “support,” “cause,” or give “reasons for” other concepts, also indicated by words such as “because” or “therefore.” The numbers of occurrences of these types in the narrative are:

1. AND ALSO        5
2. IN CONTRAST     3
3. SIMILAR TO      3
4. PROPERTY OF     7
5. EXPLAINS       10
   Total          28 transitions

The links on the spanning tree of Figure 1 have been labeled to indicate the linkage type. The most frequent linkage is what might be termed the “explanatory” type, which means that formal justification plays a major role in the review of findings in a field. The second most frequently occurring type is PROPERTY OF, which serves a descriptive function, pointing to the importance of detailed characterization. Whatever the personal paths of thought or inspiration might be, the public expressions of review authors appear to be dominated by explaining and describing the phenomena.
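The classification reported here was made by hand, but the cue words named for each category suggest how a first-pass automatic labeling might look. The sketch below is a hypothetical illustration only: the cue lists for AND ALSO, IN CONTRAST, PROPERTY OF, and EXPLAINS are those quoted above, the cues for SIMILAR TO are assumed (no specific words are given for that category), and the labels the function returns are its own guesses, not the labels assigned in Figure 1.

```python
import re

# Cue words per transition type. Categories are tried in this order and the
# first matching cue wins (an arbitrary tie-breaking choice).
CUE_WORDS = {
    "AND ALSO": ["another", "further"],
    "IN CONTRAST": ["but", "however"],
    "SIMILAR TO": ["similar", "analogous"],  # assumed cues, not from the text
    "PROPERTY OF": ["characterized by", "consists of"],
    "EXPLAINS": ["consistent with", "support", "cause", "reasons for",
                 "because", "therefore"],
}


def classify_transition(sentence):
    """Return the first category whose cue occurs as a whole word or phrase,
    or None when no cue is found."""
    text = sentence.lower()
    for category, cues in CUE_WORDS.items():
        for cue in cues:
            if re.search(r"\b" + re.escape(cue) + r"\b", text):
                return category
    return None


examples = [
    "Another component has also been identified.",
    "But when both components are combined, the characteristic Friend disease is produced.",
    "This is similar to what occurs during the development of spontaneous leukemia in AKR mice.",
    "This is consistent with the recombinant nature of F-SFFV.",
    "We turn to the relationship between F-SFFV and MCF-related env gene products.",
]
for sentence in examples:
    print(classify_transition(sentence), "|", sentence)
```

A transition with no recognizable cue, such as the last example, is left unlabeled, which is one reason a purely word-based scheme would need larger samples of co-citing contexts or human review.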

It is important to note that the specialty narrative is not a history of the specialty. The narrative may incorporate certain “historical” progressions if these features are important to the current citing authors. More likely, the narrative will consist of logical or procedural relations between key concepts, as in the present case.

Conclusions

We have explored the hypothesis that the thought processes involved in reviewing a field can be modeled by a walk through a co-citation network. The process, as we have presented it, can be divided into four phases. The first phase involves a selection of key ideas or documents and recognition of their pattern of co-relation (formation of the co-citation cluster). The second phase involves an ordering or structuring of the ideas to establish a sequence of presentation consistent with their co-relations (the spanning tree and search method). The third phase involves the expression of the ideas in words, which entails a selection of a wording which is representative of existing formulations by others (citing passage analysis and consensus passage selection). The fourth phase involves an expression of the mode of relationship between the ideas (creation of transitional sentences).

This model suggests the kinds of mental processes which might be involved in each phase. In the selection phase, authors might identify “important” papers or concepts and reasons for their association by repeated exposure to discussions of what papers or concepts others regard as “important,” as well as their own judgments. In the ordering phase, the author might try out various sequences of ideas and imagine how the materials would fit together. In the expression phase, the author might recall all the contexts in which a certain work was discussed, and by comparing each instance against an overall impression of the set, determine which wording seemed most appropriate. Since many of these operations involve perceptions of frequencies of occurrence, the manner in which the mind performs such an integration of experience assumes major importance [33].

It is unlikely, of course, that the human mind adheres to a fixed sequence of phases; rather, all operations may proceed in parallel. For example, two ideas may be selected, associated, ordered, expressed, and related before a third idea is brought in. Or the expression of an idea may be decided on before its ordering in the narrative, and so on. Nevertheless, the various cognitive tasks outlined above appear essential to the process of constructing a review. Further progress in understanding this process will depend upon our ability to simulate these steps, such as finding an efficient path through a maze of related concepts, or selecting a representative wording of an idea based on a general impression of prior usage, using only the kinds of information and processing capabilities available to the human being.

Another issue is to what extent we capture a representation of a collective view of a specialty, the shared or commonly traveled trails taken by scientists in thinking about or reviewing their field. Related to this is the finding that the dominant means of travel between ideas are the explanatory and descriptive modes. Is this a feature unique to scientific discourse? Regarding the question of consensus, a statistical measure has been introduced, essentially an averaging of all matchings of individual utterances against a group profile of related expressions, which allows us to determine the degree of consensus or dissensus in samples of text. Clearly there need not be total consensus for there to exist a “most representative” account, and there is no need for everyone to hold or express a view for that view to be taken seriously.

It has been suggested that the various evaluative acts of individual scientists are expressed from the point of view of a “generalized other,” a kind of extension of the editorial “we” [34]. This may also apply to review writing and would help explain the congruence among reviews. This does not mean that the reviewer concocts a fictitious image of total unanimity in the field, but rather seeks out those most typical or characteristic views which capture points of agreement among groups of scientists.

Since reviewing is only one way a scientist thinks about his or her field, and many different narrative sequences are possible, it is natural to suppose that there is a multidimensional network of ideas of which the linear ordering of a particular text is one expression. The commonality of this network among different individuals depends both on shared information flows (reading the literature, meetings, discussions with colleagues) and on the adoption of the viewpoint of the “generalized other.”

At this point we can say only that the synthetic specialty narrative presents a plausible account of the key ideas in one field of cancer virology. Much work remains to validate the model and its product, including the sensitivity of the narrative to changes in the clustering thresholds, selection of alternative pathways through the network, and other methods for sampling the citing passages. It can be claimed that a unique entity has been created by piecing together statements from several sources into a single text which is not matched in breadth by any of the reviews in the field then available. The method could in principle be extended to any of the several thousand co-citation clusters currently generated from the combined annual Science Citation Index and Social Sciences Citation Index files. Also, we can only speculate what such a narrative might resemble if it were derived for a higher level map of science, where the nodes on the map are entire specialties or disciplines [35]. In other words, is it possible to envision a synthetic review of science as a whole?

APPENDIX 1

Node Core documents

Citation

frequency

1979-1981

1 Metcalf, D.; Furth, J.; Buffett, R. F. “Pathogenesis of mouse leukemia caused by Friend virus.” Cancer Research. 1952-58; 1959.

2 Friend, C. “Cell-free transmission in adult Swiss mice of a disease having the character of a leukemia.” The Journal of Experimental Medicine. 105:307-318; 1957.

3

4

5

6

7

Axelrad, A. A.; Steeves, R. A. “Assay for Friend leukemia virus: rapid quantitative method based on enumeration of

macroscopic spleen foci in mice.” Virology. 24513-518; 1964.

Steeves, R. A.; Eckner R. J.; Bennett, M.; Mirand, E. A.; Trudel, P. J. “Isolation and characterization of a lymphatic

leukemia virus in the Friend virus complex.” Journal of the National Cancer Institute. 46:1209-1217; 1971.

Dawson, P. J.; Tacke, R. B.; Fieldsteel, A. H. “Relationship between Friend virus and an associated lymphatic leuke-

mia virus.” British Journal of Cancer. 22:569-576; 1968.

Troxler, D. H.; Parks, W. P.; Vass, W. C.; Scolnick, E. M. “Isolation of a fibroblast nonproducer cell line containing

the Friend strain of the spleen focus-forming virus.” Virology. 76:602-615; 1977.

Steeves, R. A. “Spleen focus-forming virus in Friend and Rauscher leukemia virus preparations.” Journal of the Nu- tional Cuncer Institute. 541289-297; 1975.

8 Troxler, D. H.; Scolnick, E. M. “Rapid leukemia induced by cloned Friend strain of replicating murine type-C virus.”

Virology. 85:17-27; 1978.

9

10

Ruscetti, S.; Linemeyer, D.; Feild, J.; Troxler, D.; Scolnick, E. “Type-specific radioimmunoassays for the gp7Os of

mink cell focus-inducing murine leukemia viruses: expression of a cross-reacting antigen in cells infected with the

Friend strain of the spleen focus-forming virus.” The Journal of Experimental Medicirre. 148:654-663; 1978.

10 Troxler, D. H.; Boyars, J. K.; Parks, W. P.; Scolnick, E. M. “Friend strain of spleen focus-forming virus: a recombinant between mouse type C ecotropic viral sequences and sequences related to xenotropic virus.” Journal of Virology. 22:361-372; 1977.

11 Barbacid, M.; Troxler, D. H.; Scolnick, E. M.; Aaronson, S. A. “Analysis of translational products of Friend strain of spleen focus-forming virus.” Journal of Virology. 27:826-830; 1978.

12 Dresler, S.; Ruta, M.; Murray, M. J.; Kabat, D. “Glycoprotein encoded by the Friend spleen focus-forming virus.” Journal of Virology. 30:564-575; 1979.

13 Racevskis, J.; Koch, G. “Synthesis and processing of viral proteins in Friend erythroleukemia cell lines.” Virology. 87:354-365; 1978.

14 Racevskis, J.; Koch, G. “Viral protein synthesis in Friend erythroleukemia cell lines.” Journal of Virology. 21:328-337; 1977.

15 Ruscetti, S. K.; Linemeyer, D.; Feild, J.; Troxler, D.; Scolnick, E. M. “Characterization of a protein found in cells infected with the spleen focus-forming virus that shares immunological cross-reactivity with the gp70 found in mink cell focus-inducing virus particles.” Journal of Virology. 30:787-798; 1979.

16 Ruscetti, S.; Troxler, D.; Linemeyer, D.; Scolnick, E. “Three laboratory strains of spleen focus-forming virus: comparison of their genomes and translational products.” Journal of Virology. 33:140-151; 1980.

17 Troxler, D. H.; Lowy, D.; Howk, R.; Young, H.; Scolnick, E. M. “Friend strain of spleen focus-forming virus is a recombinant between ecotropic murine type C virus and the env gene region of xenotropic type C virus.” Proceedings of the National Academy of Sciences. 74:4671-4675; 1977.

18 Troxler, D. H.; Yuan, E.; Linemeyer, D.; Ruscetti, S.; Scolnick, E. M. “Helper-independent mink cell focus-inducing strains of Friend murine type-C virus: potential relationship to the origin of replication-defective spleen focus-forming virus.” The Journal of Experimental Medicine. 148:639-653; 1978.

19 Elder, J. H.; Gautsch, J. W.; Jensen, F. C.; Lerner, R. A.; Hartley, J. W.; Rowe, W. P. “Biochemical evidence that MCF murine leukemia viruses are envelope (env) gene recombinants.” Proceedings of the National Academy of Sciences. 74:4676-4680; 1977.

108 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE-May 1986

Citation frequency 1979-1981, nodes 1-19 in order: 34, 108, 90, 42, 16, 45, 50, 40, 28, 74, 23, 41, 19, 20, 17, 46, 69, 32, 155

APPENDIX 1 (Continued)

Node, core document [citation frequency 1979-1981]

20 Chien, Y. H.; Verma, I. M.; Shih, T. Y.; Scolnick, E. M.; Davidson, N. “Heteroduplex analysis of the sequence relations between the RNAs of mink cell focus-inducing and murine leukemia viruses.” Journal of Virology. 28:352-360; 1978. [40]

21 Rommelaere, J.; Faller, D. V.; Hopkins, N. “Characterization and mapping of RNase T1-resistant oligonucleotides derived from the genomes of Akv and MCF murine leukemia viruses.” Proceedings of the National Academy of Sciences. 75:495-499; 1978. [91]

22 Hartley, J. W.; Wolford, N. K.; Old, L. J.; Rowe, W. P. “A new class of murine leukemia virus associated with development of spontaneous lymphomas.” Proceedings of the National Academy of Sciences. 74:789-792; 1977. [242]

23 Cloyd, M. W.; Hartley, J. W.; Rowe, W. P. “Lymphomagenicity of recombinant mink cell focus-inducing murine leukemia viruses.” The Journal of Experimental Medicine. 151:542-552; 1980. [30]

24 Cloyd, M. W.; Hartley, J. W.; Rowe, W. P. “Cell-surface antigens associated with recombinant mink cell focus-inducing murine leukemia viruses.” The Journal of Experimental Medicine. 149:702-712; 1979. [15]

25 Fischinger, P. J.; Nomura, S.; Bolognesi, D. P. “A novel murine oncornavirus with dual eco- and xenotropic properties.” Proceedings of the National Academy of Sciences. 72:5150-5155; 1975. [55]

26 Nowinski, R. C.; Hays, E. F. “Oncogenicity of AKR endogenous leukemia viruses.” Journal of Virology. 27:13-18; 1978. [47]

27 Hays, E. F.; Vredevoe, D. L. “A discrepancy in XC and oncogenicity assays for murine leukemia virus in AKR mice.” Cancer Research. 37:726-730; 1977. [19]

28 Kawashima, K.; Ikeda, H.; Hartley, J. W.; Stockert, E.; Rowe, W. P.; Old, L. J. “Changes in expression of murine leukemia virus antigens and production of xenotropic virus in the late preleukemic period in AKR mice.” Proceedings of the National Academy of Sciences. 73:4680-4684; 1976. [78]

29 Kawashima, K.; Ikeda, H.; Stockert, E.; Takahashi, T.; Old, L. J. “Age-related changes in cell surface antigens of preleukemic AKR thymocytes.” The Journal of Experimental Medicine. 144:193-208; 1976. [38]

APPENDIX 2

Source, highly citing paper [citing frequency]

1 Bosselman, R. A.; Van Griensven, L. J. L. D.; Vogt, M.; Verma, I. M. “Genome organization of retroviruses. IX. Analysis of the genomes of Friend spleen focus-forming (F-SFFV) and helper murine leukemia viruses by heteroduplex-formation.” Virology. 102:234-239; 1980. [15]

2 Dresler, S.; Ruta, M.; Murray, M. J.; Kabat, D. “Glycoprotein encoded by the Friend spleen focus-forming virus.” Journal of Virology. 30:564-575; 1979. [17]

3 Troxler, D. H.; Ruscetti, S. K.; Linemeyer, D. L.; Scolnick, E. M. “Helper-independent and replication-defective erythroblastosis-inducing virus contained within anemia-inducing Friend virus complex (FV-A).” Virology. 102:28-45; 1980. [14]

4 Ruta, M.; Kabat, D. “Plasma membrane glycoproteins encoded by cloned Rauscher and Friend spleen focus-forming viruses.” Journal of Virology. 35:844-853; 1980. [15]

5 Kabat, D.; Ruta, M.; Murray, M. J.; Polonoff, E. “Immunoselection of mutants deficient in cell surface glycoproteins encoded by murine erythroleukemia viruses.” Proceedings of the National Academy of Sciences. 77:57-61; 1980. [14]

6 Troxler, D. H.; Ruscetti, S. K.; Scolnick, E. M. “The molecular biology of Friend virus.” Biochimica et Biophysica Acta. 605:305-324; 1980. [16]

7 Van Griensven, L. J. L. D.; Vogt, M. “Rauscher ‘mink cell focus-inducing’ (MCF) virus causes erythroleukemia in mice: Its isolation and properties.” Virology. 101:376-388; 1980. [14]

Supplemental papers

8 O’Donnell, P. V.; Stockert, E.; Obata, Y.; Old, L. J. “Leukemogenic properties of AKR dualtropic (MCF) viruses: Amplification of murine leukemia virus-related antigens on thymocytes and acceleration of leukemia development in AKR mice.” Virology. 112:548-563; 1981. [10]

9 Chattopadhyay, S. K.; Lander, M. R.; Gupta, S.; Rands, E.; Lowy, D. R. “Origin of mink cytopathic focus-forming (MCF) viruses: Comparison with ecotropic and xenotropic murine leukemia virus genomes.” Virology. 113:465-483; 1981. [11]

10 Cloyd, M. W.; Hartley, J. W.; Rowe, W. P. “Lymphomagenicity of recombinant mink cell focus-inducing murine leukemia viruses.” The Journal of Experimental Medicine. 151:542-552; 1980. [9]

11 Famulari, N. G.; Tung, J.-S.; O’Donnell, P. V.; Fleissner, E. “Murine leukemia virus env-gene expression in preleukemic thymocytes and leukemia cells of AKR strain mice.” Cold Spring Harbor. 44:1281-1287; 1979. [8]


References

1. Doszkocs, T. E. “Cite NLM: natural language searching in an online catalog.” Information Technology and Libraries. 2(4):364-380; 1983.

2. Yaghmai, N. S.; Maxin, J. A. “Expert systems: a tutorial.” Journal of the American Society for Information Science. 35(5):297-305; 1984.

3. Bernstein, L. M.; Siegel, E. R.; Goldstein, C. M. “The hepatitis knowledge base: a prototype information transfer system.” Annals of Internal Medicine. 93(1):165-222; 1980.

4. Sager, N. Natural Language Information Processing. Reading, MA: Addison-Wesley; 1981.

5. Herring, C. “Distill or drown: the need for reviews.” Physics Today. 21(9):27-33; 1968.

6. Garfield, E. ISI Atlas of Science: Biochemistry and Molecular Biology. Philadelphia: Institute for Scientific Information; 1981.

7. Ward, S. A.; Reed, L. J., Eds. Knowledge Structure and Use: Implications for Synthesis and Interpretation. Philadelphia: Temple University Press; 1983.

8. Small, H.; Griffith, B. C. “The structure of scientific literatures. I. Identifying and graphing specialties.” Science Studies. 4:339-365; 1974.

9. Mullins, N. C.; Hargens, L. L.; Hecht, P. K.; Kick, E. L. “Group structure of co-citation clusters: a comparative study.” American Sociological Review. 42:522-562; 1977.

10. Garfield, E. “Can citation indexing be automated?” In: Stevens, M. E.; Giuliano, V. E.; Heilprin, L. B., Eds. Statistical Association Methods for Mechanized Documentation. Washington, DC: NBS; 15 December 1965: pp. 189-192. Reprinted in: Essays of an Information Scientist, Vol. 1. Philadelphia: ISI Press; 1977: pp. 83-90.

11. Small, H. “Cited documents as concept symbols.” Social Studies of Science. 8:237-340; 1978.

12. Small, H. “Co-citation context analysis and the structure of paradigms.” Journal of Documentation. 36(3):183-196; 1980.

13. Small, H. “Citation context analysis.” Progress in Communication Sciences. 3:287-310; 1982.

14. Small, H. “The lives of a scientific paper.” In: Warren, K., Ed. Selectivity in Information Systems. New York: Praeger; 1984: pp. 83-97.

15. Garfield, E. Citation Indexing: Its Theory and Application in Science, Technology and Humanities. New York: John Wiley & Sons, Inc.; 1979: pp. 98-147.

16. Small, H. “Co-citation cluster analysis of NIAID intramural research.” Final report on NIH contract NO1-AI-32684. Institute for Scientific Information; 1983.

17. Small, H.; Sweeney, E. “Clustering the Science Citation Index using co-citations. I. A comparison of methods.” Scientometrics. 7:391-409; 1985.

18. Narin, F. Evaluative Bibliometrics. Report on NSF contract C-627. Computer Horizons Inc.; 1976: Chap. 5.

19. Salton, G.; Bergmark, D. “A citation study of computer science literature.” IEEE Transactions on Professional Communication. PC-22(3):146-158; 1979.

20. Sneath, P. H. A.; Sokal, R. R. Numerical Taxonomy. San Francisco: W. H. Freeman; 1973: p. 131.

21. Hartigan, J. A. Clustering Algorithms. New York: John Wiley & Sons, Inc.; 1975: p. 199.

22. Shaw, W. M. “Critical thresholds in co-citation graphs.” Journal of the American Society for Information Science. 36(1):38-43; 1985.

23. Kruskal, J. B. “Multidimensional scaling by optimizing goodness-of-fit to a non-metric hypothesis.” Psychometrika. 29:1-37; 1964.

24. Hackett, E. J. Social and Cultural Influences on Contemporary Biomedical Science: A Case Study of Friend Virus Research. PhD dissertation, Cornell University; 1979.

25. Harary, F.; Norman, R. Z.; Cartwright, D. Structural Models. New York: John Wiley; 1965: p. 225.

26. Hillier, F. S.; Lieberman, G. J. Introduction to Operations Research. San Francisco: Holden-Day, Inc.; 1967: p. 222.

27. Baron, R. J.; Shapiro, L. G. Data Structures and their Implementation. New York: Van Nostrand Reinhold; 1980: p. 180.

28. Small, H. “Citation context analysis.” Op. cit.

29. Luhn, H. P. “The automatic creation of literature abstracts.” IBM Journal of Research and Development. 2(2):159-165; 1958.

30. O’Connor, J. “Biomedical citing statements: computer recognition and use to aid full-text retrieval.” Information Processing and Management. 19:361-368; 1983.

31. Moravcsik, M. J.; Murugesan, P. “Some results on the function and quality of citations.” Social Studies of Science. 5(1):86-92; 1976.

32. Voos, H.; Dagaev, K. S. “Are all citations equal? Or did we op. cit. your idem?” Journal of Academic Librarianship. 1(6):19-21; 1976.

33. Hasher, L.; Zacks, R. T. “Automatic processing of fundamental information: the case of frequency of occurrence.” American Psychologist. 39(12):1372-1388; 1984.

34. Burt, R. S.; Doreian, P. “Testing a structural model of perception: conformity and deviance with respect to journal norms in elite sociological methodology.” Quality & Quantity. 16:109-150; 1982.

35. Small, H.; Sweeney, E.; Greenlee, E. “Clustering the Science Citation Index using co-citations. II. Mapping science.” Scientometrics. 8(5-6):321-340; 1985.
