INFORMATION FOR THE GERMANIST: A PRACTICAL SUGGESTION

BY R. W. LAST

In an article in German Life and Letters published in 1961, B. A. Rowley voices the frustration of the Germanist going about the business of seeking information:

The sheer physical labour involved in finding copies of secondary sources is inordinate. Moreover, it is virtually impossible to obtain a full list of secondary sources in a given field without a quite disproportionate expenditure of effort.¹

The problem is familiar to all engaged in academic studies; and, although the plight of the Germanist in search of information is by no means as desperate as that of the scientist in particular is now, the accelerating expansion of universities throughout the world, and a corresponding population explosion in lecturers and researchers, make it imperative that we put our house in order before a similar situation overtakes us, in which communication is in danger of breaking down.

We deal mainly in two kinds of information: primary and secondary literature. Primary literature tends to follow a pattern of advance towards a definitive edition, with relatively minor subsequent improvements. Although there is scope for the mechanized collation of texts to speed up this process, it is in secondary literature that chaos threatens.

The transmission of an idea or set of ideas from originator to reader takes three stages: (1) publishing; (2) storing; (3) retrieving. It is this vital self-perpetuating cycle of research and publication that must be rationalized in some way if the point of diminishing returns that confronts the scientist is not to confront us also in a decade or so.

Even at the first of these stages serious obstacles lie in the path of the scholar. In his preface to Urfaust and Faust, ein Fragment, L. A. Willoughby writes: ‘The critical literature on Faust is now so vast that it is beyond the capacity of one man to master it.’² But it is not only in well-ploughed fields such as this that problems for editor and critic alike abound. Even in less familiar expanses of literature it is more than possible to overlook an important argument or insight, or to duplicate work done before and immured in some obscure Proceedings or Transactions.

The problem could be tackled at the second stage, that is in the stores of information, of which there is an immense variety: the results of research appear in full-length studies, shorter monographs, Festschriften, critical introductions to primary literature, periodicals devoted to German literature or the humanities in general, even newspaper articles, book reviews and so forth. But any attempt at standardizing storage of information would not only be hopelessly inadequate and create an outcry from the traditionalist, it would also jeopardize the character of the organs of communication one would be seeking to reform and challenge the freedom to publish an idea in the form and manner to which it is suited. Moreover, it would not solve the problem of the mass of information already stored.

Information retrieval is best attempted at the third stage. A partial solution, like that of the bibliographical periodical Germanistik, is worse than none at all. A search there for studies dealing with, say, Gerhart Hauptmann’s comedy Der Biberpelz demands firstly that the time-wasting assumption be made that there are studies of the play in existence. Such a search requires a large number of operations: the index to each volume must be examined under the general heading G. Hauptmann; each number checked to discover if the reference bears a title which appears to be relevant to Der Biberpelz; then each of the articles or monographs that seem to promise information must be hunted out and read, in the case of Der Biberpelz yielding a meagre harvest of one article drawing a rather questionable comparison with Brecht’s Mutter Courage, and another giving little more than a tentative prod at the structure of the work.

There are three wasteful and unreliable aspects of this procedure which give rise to concern: it is not known in advance if material exists; at one point the searcher is presented with an apparent excess of information, namely reference to all the secondary literature on Hauptmann; and, thirdly, at another point he is faced with a serious deficiency, in that titles of articles and monographs often do not present a sufficiently detailed and exclusive summary of the contents. Should one, for example, look into an article baldly entitled ‘Gerhart Hauptmann’? It may prove to be one of a series concerned with turn-of-the-century comic literature in Europe, or merely a sketch of the writer’s achievement on the occasion of some anniversary. Beyond this area, studies of Naturalism in general, or even works which on the surface have nothing to do with Hauptmann in particular, could yield valuable information.

It is not the function of the researcher in German literature to spend a great deal of time in looking for and reading a great mass of secondary literature on the off-chance that there may be something of use to him buried away somewhere.

An entirely different and comprehensive approach to information retrieval is urgently required, one which is not liable to become increasingly unwieldy with the passage of time and the high annual growth increment of stored information. A conventional bibliography is neither versatile nor usable enough to produce the necessary results. It would demand an immense amount of effort on the part of compilers to record every significant reference in all possible permutations, and omniscient foreknowledge of the future user’s requests and the manner in which he formulates them.

In the last decade work has been progressing on ways and means of resolving information retrieval for the scientist.³ Various experimental techniques of automatic secondary literature-searching by electronic computer are under investigation, which rejoice in exotic titles like KWIC (and its opposite KWOC), SMART, ACORN, and EDIAC, and which have met with considerable success. Some, indeed, have advanced beyond the experimental level and are now in practical use, not only in science, but in the fields of law and librarianship.⁴ All aim to render preliminary literature-searching automatic, thus reducing the number of stages required of the scholar in quest of sources and increasing greatly the probability that he will find the relevant ones. KWIC (Keyword-in-Context), for example, ideally reduces the number of stages to two: in answer to a request for information on ‘gamma rays’, the user is provided with a list of occurrences of the word ‘gamma’ in the titles of a certain range of publications, together with the words immediately before and after it, with the appropriate references. The same process can be applied to the text of these publications as a whole. Thus a kind of crude concordance is proffered with a centrally sited keyword, and even at this first level the user can exercise his judgment on the usefulness of the sources. The second operation is to find and read those sources considered of value.⁵
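The KWIC principle described above can be illustrated by a minimal sketch in Python. It is not a reconstruction of any of the systems named; the function `kwic`, its `width` parameter, the sample titles and the reference codes are all invented for the purpose of illustration.

```python
import re

def kwic(titles, keyword, width=3):
    """Return (left context, keyword, right context, reference) for each
    occurrence of `keyword` in the titles, with up to `width` words a side."""
    rows = []
    for ref, title in titles.items():
        words = re.findall(r"\w+", title)
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                rows.append((left, word, right, ref))
    return rows

# Hypothetical titles and reference codes, not taken from any real index.
TITLES = {
    "A1": "Measurement of gamma rays in the upper atmosphere",
    "B7": "Shielding against gamma radiation",
    "C3": "A note on beta decay",
}

# Print the keyword in a central column, as in a crude concordance.
for left, key, right, ref in kwic(TITLES, "gamma"):
    print(f"{left:>25}  {key}  {right:<25} {ref}")
```

The same function could in principle be run over the full text of a publication rather than its title, which is the extension the paragraph above envisages.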

Automatic literature-searching techniques at this and far higher levels of sophistication can equally be applied for the Germanist. The aims of such an information-retrieval project should be: (1) the minimum of pre-editing; (2) the maximum possible usefulness; (3) the ability to adapt to changing needs and improved techniques.

The only way of meeting all these basic criteria is to process the whole of the data to be analysed (i.e. the secondary texts) in machine-readable form. This is not so ludicrous as it sounds: there is an increasing use of punched tape in mechanized type-setting, and such tapes, suitably edited and converted when available, could well spare the effort of punching out every work of criticism as it is published. There are two further reasons for encoding the whole text: a user could then request, copyright problems apart, that an obscure source be printed from the tape in its entirety, a process quite feasible with a high-speed line printer. Also there is an increasing tendency towards miniaturization of literature storage, and it seems inevitable that the point will soon be reached at which some periodicals at least will no longer appear in the conventional form, but, for reasons of space, on microfilm or even punched tape. Duplication of tape is far less expensive than producing a facsimile of an out-of-print serial. And when optical scanning of natural language is perfected the whole process of achieving a machine-readable form will be executed automatically.

‘The minimum of pre-editing’ has an ominous ring, but the term is used here specifically. It does not mean asking the author to modify his language to suit the machine (a counsel of despair proposed in the context of mechanical translation at a Conference on the Mechanization of Thought-processes by L. Brandwood);⁶ it simply underlines the need to ensure that context has appended to it all the information required in order to enable the user to locate its source. There are technical pre-editing problems that would require careful planning: for example, how to deal with footnotes and bibliographies, misprints and homonyms.

The criterion of maximum possible usefulness applies not only to the student of literature, but also to other potential users, for example, the psychologist interested in the style of learned articles. The principal user, the Germanist, should be able to make as wide a variety of requests as possible, from the generation of a simple bibliography of critical works on a specific topic to the production of information of great complexity. Even citation indexes can be obtained.⁷

The third requirement will also be covered by encoding the entire text: one envisages that advances, not only in automatic indexing, but in the related fields of automatic syntactic analysis and mechanical translation, will profit and expand the usefulness of the system.

Devising a scheme in detail for the Germanist creates problems in more than one plane: critical works are produced in a variety of languages as well as forms. It is essential that experiments start on a small scale to obviate the long and tedious correction of any large errors in procedure that may come to light; hence it is suggested that one should begin by taking one periodical run, the contents of which are mainly in English, and perhaps for a long time after the experimental stage restricting references to longer publications (and unpublished theses) to name of author, title, edition, place and date of publication and contents of the index where available, for their transfer en bloc to machine-readable form under prevailing methods would occasion immense problems of cost.

An experimental approach could be to establish a preliminary filter for the input data in the form of two dictionaries (stored on magnetic tape or disc), with the following procedure: (1) data are checked against a dictionary which would instruct that certain words be ignored in subsequent processing, such as definite articles, certain verbs, and other words of no use as keywords, but with the facility of adding to and subtracting from the list; (2) the remaining words are sorted alphabetically, together with their locations; (3) these words are then collated with a second dictionary, which would contain all those words likely to be of value. Naturally decisions would have to be made relating to nouns which subsequently appear as pronouns, variants such as theor-y, theor-etical, theor-izing, theor-ies, and to synonyms like ‘Aufklärung’ and Enlightenment. At this stage, a list of words could be printed out which occur in neither dictionary, thus allowing progressive refinement of the inclusive and exclusive dictionaries (which may later be merged into one), and, through the medium of German and other language quotations in the secondary text, allow the gradual build-up of a multilingual system.
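The three-step filter may be made concrete in a short Python sketch. The name `filter_words`, the two small word lists and the sample sentence are assumptions introduced here for illustration; the treatment of pronouns, variants and synonyms mentioned above is deliberately left out.

```python
import re
from collections import defaultdict

def filter_words(text, exclude, include):
    """Apply the three steps: (1) drop words in the exclusive dictionary,
    (2) sort the remainder alphabetically with their locations,
    (3) collate against the inclusive dictionary.
    Returns the accepted keywords and the words found in neither dictionary."""
    locations = defaultdict(list)
    for position, word in enumerate(re.findall(r"\w+", text.lower())):
        if word not in exclude:                                   # step 1
            locations[word].append(position)
    ordered = sorted(locations.items())                           # step 2
    keywords = {w: loc for w, loc in ordered if w in include}     # step 3
    unknown = [w for w, _ in ordered if w not in include]
    return keywords, unknown

# Hypothetical dictionaries and input, invented for this sketch.
EXCLUDE = {"the", "of", "is", "a", "in"}
INCLUDE = {"drama", "theory", "lessing"}
SAMPLE = "The theory of the drama in Lessing is a contested theory"

keywords, unknown = filter_words(SAMPLE, EXCLUDE, INCLUDE)
print(keywords)  # accepted keywords with word positions
print(unknown)   # candidates for adding to one dictionary or the other
```

The `unknown` list corresponds to the print-out of words occurring in neither dictionary, from which both lists would be progressively refined.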

The material ‘passed’ by the second dictionary should be coded in such a way that it allows readily accessible subdivision of concepts, both in order to make up-dating less unwieldy, and also to allow the user to frame his request in as specific a manner as possible. A storage system could be evolved for retrieval of information (in a format similar to that of the KWIC system) in three dimensions. Theoretically, this takes as its basic unit a chessboard arrangement (although in practice a far greater number of squares than sixty-four will be needed), which represents the storage location of all concepts relating to one particular writer, ordered in a pattern predetermined by a master-plan. For example (using the descriptive notation from White’s viewpoint), QR3 might be the ‘address’ of contexts from sources relating to the dramatic theory of writer A, QKt3 his theory of the lyric, and so forth. The rest of the column QR3 would be devoted to specific details under the heading ‘drama’. Each of these matrices would contain, in so far as it is possible, similar information at the same location number, and each matrix would be part of a pile corresponding to a literary grouping, with a general heading matrix for the group as a whole. The matrices would be numbered so that, for the sake of argument, F1 = Aufklärung (= Enlightenment), F2 = Gottsched, F3 = Frau Gottsched, etc. Thus a request coded F2/QR3 would produce KWIC-type contexts of sources relating to Gottsched’s dramas (a request to an address in one dimension would generate information on that and inferior addresses), and a request coded F1 . . . Fn/QR3 would produce all contexts relating to Aufklärung dramatic theory.
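A rough Python sketch of how such coded requests might be resolved follows. The store contents, the function `lookup` and the exact request syntax it accepts are assumptions made for illustration, and the sketch does not attempt the retrieval of ‘inferior addresses’ within a column.

```python
import re

# Hypothetical store: (matrix number, square) -> KWIC-type contexts.
STORE = {
    (1, "QR3"): ["... Enlightenment views of dramatic theory ..."],
    (2, "QR3"): ["... Gottsched's rules for the drama ..."],
    (2, "QKt3"): ["... Gottsched on the lyric ..."],
    (3, "QR3"): ["... Frau Gottsched's comedies ..."],
}

def lookup(request, store=STORE):
    """Resolve 'F2/QR3', or a range such as 'F1 . . . F3/QR3', to contexts."""
    match = re.fullmatch(r"F(\d+)(?:\s*\.\s*\.\s*\.\s*F(\d+))?/(\w+)", request)
    if match is None:
        raise ValueError(f"unrecognized request: {request!r}")
    first = int(match.group(1))
    last = int(match.group(2) or first)   # a single matrix if no range given
    square = match.group(3)
    found = []
    for matrix in range(first, last + 1):
        found.extend(store.get((matrix, square), []))
    return found

print(lookup("F2/QR3"))            # one writer, one square
print(lookup("F1 . . . F3/QR3"))   # the same square across the whole pile
```

Keeping the same square for the same topic in every matrix is what allows the second request to gather one concept across a whole literary grouping.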

The type of service that should be offered would be the answering of specific requests by a central agency linked with an existing specialist library, rather than the periodic productions of a global list, for the reasons adduced above. The user would be provided with a handbook which would enable him to formulate his request in a highly specific manner.

There are, however, a great number of difficulties to overcome before the system would achieve self-sufficiency. One such difficulty that immediately comes to mind is that the proposed automatic procedure involves, not simply derivative, but assignment, indexing, a highly skilled task. It may well prove advisable, in dividing and subdividing the data, to draw on the experience of a specialized library system already worked out in detail and proven workable.

Another difficulty is that of the incomplete reference. Take this passage as an example:

In a review article on Samuel’s edition, Maurice Benn claimed that the editor had been faced with the necessity of choice between the education theory and the idealist interpretation, which sees the play as a drama of redemption in the Schillerian tradition. Benn’s ingenious suggestion as to how to escape from Professor Samuel’s dilemma is still closely linked to the traditional interpretations . . .⁸

Anyone interested in Gottfried Benn would be more than puzzled by a KWIC-type reference such as ‘ . . . in the Schillerian tradition. Benn’s ingenious suggestion as to . . . ’ The Schlegels, the La Roches, the Kleists all face the compiler of an automatic indexing system with equal, if not greater, demands on his ingenuity. The problem can be solved by modifying the input, i.e. by appending ‘Maurice’ to every subsequent occurrence of ‘Benn’ in the article; by sensibly interpreting the output, where the two Benn references would occur consecutively; or, in the last resort, by manual pre-editing. It may prove helpful to assign to each batch of input data the code number of the matrix or matrices under which heading the data would be likely to fall. This last procedure would also ensure that characters’ names find their way to the correct address.
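The first of these remedies, modifying the input, might be sketched as follows in Python. The function `expand_surname` is hypothetical, and it assumes that the first mention in the article gives the name in full and that no other bearer of the surname appears in the same text.

```python
import re

def expand_surname(text, forename, surname):
    """Prefix `forename` to each bare `surname` after its first full mention."""
    full = f"{forename} {surname}"
    first = text.find(full)
    if first == -1:
        return text  # no full mention to anchor on; leave the text untouched
    head_end = first + len(full)
    head, tail = text[:head_end], text[head_end:]
    # Match the surname only where the forename does not already precede it.
    pattern = rf"(?<!{re.escape(forename)} )\b{re.escape(surname)}\b"
    return head + re.sub(pattern, full, tail)

passage = ("Maurice Benn claimed that the editor had been faced with a choice. "
           "Benn's ingenious suggestion is still closely linked to tradition.")
print(expand_surname(passage, "Maurice", "Benn"))
```

A KWIC context drawn from the expanded text would then read ‘Maurice Benn's ingenious suggestion’, which the student of Gottfried Benn could dismiss at a glance.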

Another, and more serious, difficulty, is presented by a topic which is expressible only by a phrase, such as ‘the dramas of Lessing’, which can be written in a variety of ways. It is not enough to program a machine to recognize this concept if the words ‘Lessing’ (or ‘his’ in the right context) plus ‘drama’, ‘dramas’, ‘plays’, or ‘dramatic works’, etc., occur in the same sentence. One could readily concoct situations in which the required words exist but the concept is quite different: ‘Lessing plays an important role in the development of the “Fabel”’, or ‘The drama of Lessing confronting the orthodox cleric was a spur to the liberal, the despair of the conservative’. Techniques are in existence for the recognition of such patterns within the framework of a syntactic relationship,⁹ but some anomalies will still slip through the net. However, it is far preferable to create a system which errs on the side of excess.
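The weakness of simple co-occurrence can be demonstrated with a few lines of Python. The matcher `naive_match` and the word list are assumptions for illustration; the first test sentence is invented, while the other two are the counter-examples given above.

```python
import re

DRAMA_TERMS = {"drama", "dramas", "plays", "dramatic"}

def naive_match(sentence, author="Lessing", terms=DRAMA_TERMS):
    """True if the author's name and any drama term share a sentence.
    No syntactic relationship between the two is checked."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return author.lower() in words and bool(words & terms)

SENTENCES = [
    # Invented sentence in which the concept really is present.
    "The dramas of Lessing broke with French models.",
    # The two counter-examples: same words, different concept.
    "Lessing plays an important role in the development of the 'Fabel'.",
    "The drama of Lessing confronting the orthodox cleric was a spur to the liberal.",
]

for sentence in SENTENCES:
    print(naive_match(sentence), "|", sentence)
```

All three sentences are accepted, so the matcher errs on the side of excess; separating the first from the other two requires the kind of syntactic analysis referred to above.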

It must be recognized that no single method is capable of resolving all the Germanist’s information retrieval problems, nor is the wholesale borrowing of methods applied to the scientific publication, for there the structure and layout has a quite distinctive pattern. Only a combination and adaptation of all suitable procedures so far developed, and the evolution of new ones, will allow the system to meet the demands of the basic criteria.

The development of a viable method of automatic indexing for the Germanist will produce a huge additional return, not only for those interested, say, in the development of criticism, a full history of which has yet to be written in this field, but will also profit the major part of our work, the study of primary literature, automatic techniques for the investigation of which are still in their infancy.¹⁰

However it is produced, such a system is inevitable if the work of the Germanist is not soon to be defeated by the range of material and variety of sources to which he is exposed. It would not render his essential task of making value judgments on primary literature any easier, for a machine, even an electronic computer, can do no more than man can accomplish, given time and patience; but it would minimize the drudgery that now faces us daily of ploughing through a vast amount of irrelevant information in the hope of finding some useful items. And at present there is always the nagging suspicion that something vital has been overlooked.

NOTES

1. B. A. Rowley, ‘Partly Knowing and Not Knowing: Some Clearings in the Jungle of Research’, GLL, vol. XIV (1961), 52.

2. L. A. Willoughby (ed.), Urfaust and Faust, ein Fragment, Oxford (1946), ix.

3. The two most valuable surveys of automatic indexing are M. E. Stevens, Automatic Indexing: A State-of-the-Art Report, U.S. Department of Commerce, National Bureau of Standards Monograph 91 (1965); and D. G. Hays (ed.), Readings in Automatic Language Processing, New York (1966).

4. The 1960-63 Index to Computing Reviews is an example of the KWIC technique which has been employed by Biological Abstracts for their Annual Cumulative Index since 1960. M. E. Stevens (op. cit.) cites thirty similar projects.

5. Detailed information on KWIC can be found in H. P. Luhn, ‘Keyword-in-context Index for Technical Literature (KWIC Index)’, American Documentation, vol. XI (1960), 288-95.

6. ‘Pronoun Reference in German’, in National Physical Laboratory Symposium No. 10, H.M.S.O. (1959), vol. I, 338.

7. This kind of index offers a list of secondary literature in which a specified citation is quoted.

8. E. W. Herd, ‘Form and Intention in Kleist’s Prinz Friedrich von Homburg’, Seminar, vol. II (1966), 1.

9. Vide G. Salton, ‘Automatic Phrase Matching’, in D. G. Hays (op. cit.), 169-88; and G. Salton and M. E. Lesk, ‘The SMART Automatic Document Retrieval System - An Illustration’, Communications of the ACM, vol. VIII (1965), 391-8.

10. The electronic computer is now well-established as a tool for the humanist (witness the establishment of the Literary and Linguistic Computing Centre in Cambridge under the Directorship of Dr R. A. Wisbey), both in the production of concordances and in more sophisticated applications to primary literature. Vide, e.g. L. Brandwood, ‘Analysing Plato’s Style with an Electronic Computer’, Institute of Classical Studies (University of London) Bulletin No. 3, 45-54; J. B. Bessinger, ‘Computer Techniques for an Old English Concordance’, American Documentation, vol. XII (1961), 227-9; R. T. Cargo (ed.), A Concordance to Baudelaire’s Les Fleurs du Mal, Univ. of N. Carolina (1965); A. Ellegård, A Statistical Method for Determining Authorship, Gothenburg (1962); E. G. Fogel, ‘Electronic Computers and Elizabethan Studies’, Studies in Bibliography, vol. XV (1964), 15-31; S. M. Parrish, ‘Problems in the Making of Computer Concordances’, Studies in Bibliography, vol. XV (1964), 1-14 (which discusses some of the Cornell series of computer concordances); B. Vannier’s review of R. T. Cargo’s Baudelaire concordance in MLN, vol. 81 (1966), 358-60; R. Wisbey, ‘Concordance Making by Electronic Computer: Some Experiences with the Wiener Genesis’, MLR, vol. LVII (1962), 161-72; R. Wisbey, ‘The Analysis of Middle High German Texts by Computer - Some Lexicographical Aspects’, Transactions of the Philological Society (1963), 28-48. Also the periodicals Computers and the Humanities (obtainable from Queens College, Flushing, N.Y.), and Revue (published by the International Organization for Ancient Languages Analysis by Computer, Liège).