
Toward Universal Serial Item Names

Robert D. Cameron
School of Computing Science
Simon Fraser University

Accepted for publication in JoDI - Revisions Completed - June 25, 1998

Copyright 1997, 1998, Robert D. Cameron.

Contents

Abstract
I. Introduction
II. Requirements
    Requirement #1: Unambiguous Article Identification
    Requirement #2: Canonical USINs
    Requirement #3: Identification of Secondary Serial Components
    Requirement #4: Scholar-Friendliness
        Requirement #4.1: No Required Redundancy
        Requirement #4.2: Standard Mnemonics
        Requirement #4.3: Publication Numbering
        Requirement #4.4: Standard Numbering Syntax
        Requirement #4.5: Brevity of Article Identification
        Requirement #4.6: Ease of Construction and Analysis
        Requirement #4.7: Media Independent Specification
        Requirement #4.8: Embedding USINs in Context
    Requirement #5: Permanence of USIN Designation
    Requirement #6: Accommodating Serial Evolution
III. Global Naming of Serial Publications
    Hierarchical Naming Using the DNS Model
    Three Initial Domains
    Evolution of the USIN System: Towards Scholar-Friendly Names
IV. Hierarchical Identification of Serial Items
    Example: Journal Article Citation
        Multiple Articles Per Page
        Unpaginated E-Journals
    A General Model for Identification by Hierarchical Numbering
        Scope
        Scope-Dependent Numbering
        Syntactic Representation
        Parallel Numbering Hierarchies
        Chronology
    Further Work: Hierarchical Numbering Theory


    Additional Design Ideas for Hierarchical Numbering
        Syntax for Holdings Description
        Secondary Component Notation
        The Reference Notation
        Hyphenation Notation
V. USIN Support Technology
    USIN Global Registry
        SDL - Serials Definition Language
        UPP: USIN Publication Protocol
        SRP: Serial Registration Protocol
        PDP: Publication Domain Protocol
    USIN Global Database System
        UIP - USIN Inquiry Protocol
        Bibliographic Retrieval and Formatting
    USINs, the USIN Global Database and Literature Research
VI. Conclusion
References

Abstract

The Universal Serial Item Name (USIN) scheme is proposed as a framework for a single global namespace of articles and other contributions published in organized serial collections. Requirements for USINs are analyzed with an emphasis on the use of USINs in scholarly communication. A uniform naming model is described based on the hierarchical naming of serial publications and the hierarchical numbering of serial items. A number of concrete design ideas for USIN syntax are presented. A USIN Global Registry and a USIN Global Database are proposed and analyzed in terms of specific architectural features that interact to meet the requirements of publishers, librarians and scholars. Applications of the USIN concept to literature research, document retrieval, bibliography preparation and addressing the "broken links" problem of the World-Wide Web are considered.

I. Introduction

The Universal Serial Item Name (USIN) scheme is proposed as a framework for a single global namespace of articles and other contributions published in organized serial collections. Although the initial focus is scholarly literature published in journals, conference proceedings, technical reports and books, the scheme is intended to accommodate extensions to include other types of serialized contributions such as magazine articles, bills of a legislature, decisions of a court or minutes of university committee meetings. The USIN is intended as a vehicle for interoperability between various bibliographic citation applications, including finding citations (literature research), retrieving citations (from on-line sources, libraries or document delivery services), citation indexing, and citation formatting (bibliography preparation). The USIN is also intended as one possible mechanism




with these products, editors will be spared the task of correcting author errors in citation, and readers will be spared the difficulty of resolving errors in citations that authors and editors miss.

In application to literature databases, the USIN can serve as a standard notation to report the results of a search process. This could open up new opportunities for combining search results from distinct databases. For example, duplications could be filtered by USIN matching, or relevant items from one search might be fed back into a search on a different database. In fact, the USIN idea is intended to serve as the core data element in a scheme for universal citation databases: databases that link every document to the documents it cites and vice versa [6].

In application to the World-Wide Web, the USIN concept has considerable promise as a potential partial solution to the problem of "broken links" [5, 13]. In short, the URLs that are presently used for hypertext links on the World-Wide Web are based on "locations" that specify documents in terms of access protocols, port numbers, directory paths, and filenames. For various reasons, all of these attributes of document location are subject to change and web links frequently become broken as a result. Many proposals to resolve this problem through the creation of some form of Uniform Resource Name have been put forward, but none seem to have progressed beyond the experimental stage [8, 9].

In comparison to the URN approach, the USIN scheme concentrates on the somewhat smaller problem of establishing a universal naming scheme for publications in serialized collections only. One could imagine that USINs could be developed within the overall URN structure as one particular "namespace" [17]. On the other hand, there are several reasons why it may be best to focus on a specific solution for USINs instead of the general URN problem. First of all, it could be argued that the best focus for perpetual naming schemes is to concentrate on those items actually intended to be long-term contributions to the global knowledge archive. From this perspective, publication in an organized serial collection may be the best single indication of such an intent. Second, the act of assigning a document a number within a serial collection represents an important technical opportunity unavailable for general web resources: a specific event in the publication process to which naming scheme protocols can be tied. Third, focussing on the evolving global knowledge archive as a development from the present international network of libraries may suggest different approaches to identifying the "resolution service" for a USIN. For example, users could be allowed to choose their own resolution service from those offered by different local libraries, instead of being forced to accept a network-specified service. In the terminology of the Dexter Hypertext Reference Model [15], we can take advantage of the flexibilities afforded by resolution within the run-time layer to overcome difficulties in storage-layer resolution. For all these reasons, focussing on publications in organized serial collections may be both the right problem to solve and the one for which URN solutions are most feasible.

Applications of the USIN scheme to other areas such as legal citation and legal research are also envisaged. However, these are at present beyond the scope of this paper and are left as an area for future consideration.

This paper is intended as a discussion document to set the framework for development of the USIN concept. Overall, the goal is to propose the requirements that must be met by any USIN system, and to suggest some reasonably concrete design ideas that meet those requirements. Section II focuses on the requirements analysis with a particular emphasis on the concept of scholar-friendly naming. Sections III and IV focus on design concepts that satisfy the USIN requirements, broken down into two main tasks: globally unique naming of serial publications and hierarchical identification of serial items within a particular publication series. Section V then discusses requirements for important USIN support technologies. Section VI concludes the paper.

II. Requirements Analysis

The goal of this section is to discuss the general requirements that any USIN system must meet, without making premature commitments to particular USIN design ideas. At the same time, the requirements are used to analyze some of the inadequacies of the existing identification standards, primarily SICI and ISSN. This serves both to help establish the need for a new identification scheme and to bring some concreteness to the discussion. The reader who prefers additional concreteness may wish to briefly look ahead to some example design ideas for journal article citation in Section IV.

Requirement #1: Unambiguous Article Identification

It may seem obvious that a USIN scheme must meet the basic goal of unambiguous article identification: every article must be denotable and every USIN denoting an article must denote no other article. However, there are difficulties in achieving this goal and the goal is in fact not achieved by the existing SICI coding scheme. In essence, the SICI scheme is prone to failure in some rare cases involving articles appearing on the same page and having similarly abbreviated titles. To deal with the multiple article per page problem, SICI uses a "title code" of up to six characters, usually formed from the initial letters of title words. Different articles on a page can usually be distinguished by this title abbreviation. In principle, however, it is possible to have two or more articles with the same SICI title abbreviation and hence the same overall SICI code. Presumably this is one of the reasons for the 12 ambiguities reported within 4 million SICI strings stored in the Uncover database [22]. Another problem with SICI serial title abbreviation is that it requires human judgment when the title contains symbology; this is a further possible source of ambiguity.

In order to ensure that every article is denotable, a logical first step is to ensure that every serial is denotable. Unfortunately, the existing international standard in serial identification, the ISSN, has an insufficiently large denotation space. The ISSN system is based on an eight-digit identifier with seven working digits and a check digit. The upper limit on the number of serials that can be accommodated is therefore 10 million. When contemplating a universal designation scheme for serial items as fine-grained as the minutes of curriculum committee meetings of a particular university department, it should become clear that the ISSN system as presently constituted will not suffice.
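The capacity argument above can be made concrete with a short sketch. The function below computes the check character from the seven working digits using the standard ISSN weighting (weights 8 down to 2, modulo 11, with 10 written as "X"). The function and constant names are illustrative only and are not part of any USIN proposal.

```python
def issn_check_digit(working_digits: str) -> str:
    """Return the ISSN check character for the seven working digits."""
    if len(working_digits) != 7 or not working_digits.isdigit():
        raise ValueError("expected exactly seven decimal digits")
    # Standard ISSN weighting: 8 for the first digit down to 2 for the seventh.
    total = sum(int(d) * w for d, w in zip(working_digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


# Only the seven working digits carry information; the eighth is derived.
# The denotation space is therefore 10**7 serials.
ISSN_CAPACITY = 10 ** 7

print(issn_check_digit("0378595"))  # check character for working digits 0378595
print(ISSN_CAPACITY)
```

Because the check character is fully determined by the working digits, it adds redundancy but no capacity, which is why the limit is 10 million rather than 100 million.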

Requirement #2: Canonical USINs

Although every USIN must denote at most one article, it is reasonable to allow different USINs to denote the same article. For example, issue numbers may be an optional part of the USIN syntax, required only when journals are paginated by issue. In the case that journals are paginated by volume, it could be desirable to allow either form (with or without issue numbers) as an acceptable USIN form. There are many other reasons that alternative forms of a USIN might be desirable and there is no particular reason to rule this option out in the initial requirement specification for USINs.

Nevertheless, of the set of USINs that may legally denote an article, exactly one of them should be specified as the canonical or preferred form. One use for canonical forms is to make it easy to determine whether two different USINs denote the same article: convert them both to canonical form and see if they are the same. For example, if a user searches two distinct databases for articles of interest on a particular topic and both databases return USINs in canonical form, then it is an easy matter to filter out duplicate references to the same article because they are represented by exactly the same string. A second important role for canonical forms is to support indexing of information by USIN. By always associating information with the canonical form of a USIN, it will be possible to retrieve that information given any legal USIN form by first converting to the canonical form.

A further requirement for USINs is that conversion to canonical form be an algorithmic process based on globally available information. In this way, separate software systems will be able to interoperate by conversion to the common canonical form. The requirement for globally available information is not particularly a restriction on the syntax of USINs, but is a constraint on the implementation of the overall USIN system and how the basic information on USINs and their formation on a serial-by-serial basis must be shared.
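A minimal sketch of canonicalization and duplicate filtering follows. It rests on several assumptions not made by this paper: USINs are represented as (serial, volume, issue, page) records rather than strings, the canonical form of a volume-paginated serial omits the issue number, and the dictionary PAGINATED_BY_VOLUME stands in for the globally available per-serial information. The serial name X.EXAMPLE/NEWS is invented for illustration.

```python
from typing import List, NamedTuple, Optional


class Usin(NamedTuple):
    serial: str
    volume: int
    issue: Optional[int]
    page: int


# Hypothetical stand-in for globally shared per-serial information.
PAGINATED_BY_VOLUME = {"S.ACM/TOPLAS": True, "X.EXAMPLE/NEWS": False}


def canonical(u: Usin) -> Usin:
    """Convert any legal form to the (assumed) canonical form."""
    if PAGINATED_BY_VOLUME[u.serial]:
        # Issue number is redundant when pages run through the whole volume.
        return u._replace(issue=None)
    if u.issue is None:
        raise ValueError("issue number required for serials paginated by issue")
    return u


def merge_results(*result_lists: List[Usin]) -> List[Usin]:
    """Combine search results, keeping one entry per canonical USIN."""
    seen = set()
    merged = []
    for results in result_lists:
        for u in results:
            c = canonical(u)
            if c not in seen:
                seen.add(c)
                merged.append(c)
    return merged


with_issue = Usin("S.ACM/TOPLAS", 17, 2, 233)
without_issue = Usin("S.ACM/TOPLAS", 17, None, 233)
print(merge_results([with_issue], [without_issue]))  # a single entry remains
```

The same canonical() function could serve as the key function for an index, so that information stored under the canonical form is retrievable from any legal form.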

Requirement #3: Identification of Secondary Serial Components

Although the primary focus of the USIN concept is on the identification of published articles, there are a number of other related elements worthy of identification at both coarser and finer granularities. On the coarser side, this includes identification of the serial itself, volumes or volume ranges of serials, an index to the volume, individual issues and issue ranges, contents of an issue or special sections of an issue. At a finer level of granularity, it may include named or numbered components of articles, such as article abstracts or individual sections, figures, tables, or equations. Scholars may sometimes want to make reference to these components; other applications include identifying library holdings on a volume/issue basis, checking in serial issues when they arrive at the library or submitting claims for them when they are late, and ordering table of contents pages for awareness services. The SICI scheme includes capabilities for designating some of these components through its code structure identifier (CSI) and derivative part identifier (DPI); the PII and DOI schemes do not appear to account for such components. ANSI Serials Holdings Statements, used to identify holdings in library catalogs, include a variety of conventions for specifying volumes, issues, ranges of volumes and issues and similar units of collection [2].

It is neither possible nor desirable to define a priori the specific set of secondary serial components that are identifiable in the USIN syntax. Instead, the requirement presented here is that the USIN scheme should accommodate specification of these elements through an extensible syntax that can be coupled with a specification of what elements exist on a serial-by-serial basis.

Requirement #4: Scholar-Friendliness

A key requirement central to the entire focus of the USIN concept is that it emphasizes the needs of the people who use USINs over the needs of computers that process them. This encompasses many aspects that can be generally grouped under the term scholar-friendliness. However, this term is not intended to restrict the set of people whose requirements are considered. Instead, it reflects the notion that anyone who uses a USIN to cite prior works may be said to be taking on the role of a scholar in that act.

One might consider that there is a middle ground between accommodating the needs of scholars and the needs of computer systems. However, the goal of establishing USINs as names that will serve to denote published items over the long term should be considered. From this viewpoint, apparent requirements that might derive from the limitations of present-day computer systems (e.g., fixed-length fields, limited storage capacity, etc.) should be avoided. There is little doubt that the processing and storage capabilities of the computer systems that will be available in coming decades will be vastly superior to those of their present-day counterparts.

Nevertheless, scholar-friendliness cannot be considered an absolute requirement at the possible expense of unambiguous article identification, canonical forms or other requirements. Instead, scholar-friendliness should be considered as a desirable trait to be maximized subject to the constraints imposed by other requirements.

Requirement #4.1: No Required Redundancy

Scholars will often need to write down USINs of interest or type them into their computers. To minimize the tedium and the chance of error in these manual processes, USINs should be designed to include only that information necessary to clearly identify the cited work. Redundant forms that include additional information may be allowed but must not be required. For example, for a journal that is paginated by volume and that follows the convention of beginning each article on a new page, it is sufficient to specify the journal, volume number and initial page number to uniquely identify an article. In this case, a USIN specification must not require the inclusion of additional information such as issue number, date or complete page range.

One counterargument is that redundant information helps prevent errors, but one can in turn counter that this approach to error control is obsolescent and inferior. Historically, the requirement for redundant information at data-entry time is designed to allow error detection at some future processing step. This is the basis for three forms of redundancy in the existing SICI scheme for article identification: chronology (date of publication), title codes and check digits. However, these devices provide error detection without error correction. When an error is encountered, there may be a considerable delay (e.g., days in interlibrary loan applications) before the error can be corrected and processing resumed. Consider instead an interactive process supported by a global network. When a scholar enters a USIN, interactive software could immediately consult the global USIN database to verify its correctness and to allow any necessary corrections or resolutions of ambiguity. One existing model for this is the immediate feedback one receives when entering an incorrect URL on the World-Wide Web (Web). In this way, an interactive data entry process can both avoid the tedium of redundant data entry and support a process of immediate error correction as well as detection. Construction of such a global USIN system is probably feasible using the present-day technology of Internet-connected computers; if not, it will certainly become feasible within a small number of years.
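The interactive entry process described above might be sketched as follows. This is an illustration only: the in-memory list KNOWN_USINS stands in for the global USIN database, the colon-separated USIN strings use an invented syntax, and approximate matching is done with Python's standard difflib module rather than any protocol proposed in this paper.

```python
import difflib
from typing import List, Tuple

# Hypothetical stand-in for a query against the global USIN database.
KNOWN_USINS = [
    "S.ACM/TOPLAS:17:233",
    "S.ACM/TOPLAS:17:264",
    "CA.SFU.CMPT/TR:1997:12",
]


def check_usin(entered: str, known: List[str] = KNOWN_USINS) -> Tuple[str, List[str]]:
    """Verify an entered USIN at once; on failure, offer likely corrections."""
    if entered in known:
        return ("ok", [entered])
    # Error correction, not just detection: propose close registered USINs.
    suggestions = difflib.get_close_matches(entered, known, n=3, cutoff=0.8)
    return ("unknown", suggestions)


status, candidates = check_usin("S.ACM/TOPLAS:17:234")
print(status, candidates)
```

The point of the sketch is that no redundant field was entered, yet the mistyped page number is caught immediately and candidate corrections are offered in the same step.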

Requirement #4.2: Standard Mnemonics

A second requirement deriving from scholar-friendliness is to emphasize the use of mnemonic forms for identifying serial publications, and whenever possible, the standard mnemonic forms that are actually used by the community of scholars that use a particular serial. For example, the journal ACM Transactions on Programming Languages and Systems published by the Association for Computing Machinery is widely known by the acronym TOPLAS. An acceptable mnemonic form for identifying this serial might thus be S.ACM/TOPLAS where S might denote a global domain of scholarly societies and ACM is a unique code for the Association for Computing Machinery within that domain. As a second example, a designation such as CA.SFU.CMPT/TR might be acceptable as a globally unique mnemonic code for the Technical Report series of the School of Computing Science at Simon Fraser University. This code is mnemonic and builds on several accepted and standard abbreviations: CA as the ISO country code for Canada, SFU as a unique institutional code for Simon Fraser University in the CA domain (cf. the Internet DNS name sfu.ca), CMPT as the standard 4-letter department ID used by Simon Fraser University for the School of Computing Science and TR as the abbreviation for "Technical Report" as used by the School. The syntax shown in these examples is intended to be illustrative of a possible realization of this USIN requirement, but not prescriptive.

Incorporation of existing standard mnemonic codes within USIN designations will assist scholars in a number of ways. USINs will be reported to scholars as the results of bibliographic search processes, scholars will enter USINs when doing citation searches, scholars will use USINs when including references in papers and scholars will make note of USINs when they find papers of interest. In all these applications, scholars will find mnemonic forms easier to read, easier to reproduce and generally more useful. However, note that these requirements are met if the mnemonic forms are acceptable as one of the alternative forms on USIN input and are produced during output by any USIN-generating software. More precisely, the requirement for using mnemonic forms applies to the definition of the canonical form of USINs, but does not preclude alternative non-mnemonic forms.
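To show how the illustrative designations S.ACM/TOPLAS and CA.SFU.CMPT/TR decompose, the following sketch splits such a name into its domain path and series code. It assumes, for illustration only, that "." separates domain levels and "/" separates the domain path from the series; the paper presents this syntax as illustrative rather than prescriptive, and the names SerialName and parse_serial_name are invented here.

```python
from typing import NamedTuple, Tuple


class SerialName(NamedTuple):
    domains: Tuple[str, ...]  # outermost domain first, e.g. ("CA", "SFU", "CMPT")
    series: str               # series code within the innermost domain, e.g. "TR"


def parse_serial_name(name: str) -> SerialName:
    """Split a mnemonic serial designation into domain path and series code."""
    domain_part, sep, series = name.partition("/")
    if not sep or not domain_part or not series:
        raise ValueError(f"expected DOMAIN[.DOMAIN...]/SERIES, got {name!r}")
    domains = tuple(domain_part.split("."))
    if any(not d for d in domains):
        raise ValueError(f"empty domain component in {name!r}")
    return SerialName(domains, series)


print(parse_serial_name("CA.SFU.CMPT/TR"))
print(parse_serial_name("S.ACM/TOPLAS"))
```

Reading the domain path left to right mirrors the delegation described in the text: a global domain, then an institution within it, then a unit within the institution.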

Adopting scholar-friendly mnemonic identification necessarily imposes a further limit on the role of ISSNs within the USIN scheme. Where a serial is unambiguously known by a mnemonic form, that form must be used as canonical in place of the ISSN. Nevertheless, ISSNs are likely to have an important role both in identifying serials for which no mnemonic abbreviation has been defined and in initially identifying serials before their mnemonic identifications have been registered and accepted as globally unique.

Requirement #4.3: Publication Numbering

A further requirement deriving from the general principle of scholar-friendliness is that existing publication numbering conventions should be employed or adapted wherever possible to identify published articles within a particular serial. For example, articles in traditional print journals will typically be identified by volume number, issue number (if required) and page number, with the possible addition of a code to discriminate multiple articles on a single page. This will be of the greatest assistance to scholars when forming USINs from either copies of the article in question or from a citation of the article in a reference list. It will also be helpful to scholars in decoding USINs and retrieving the items from (physical or virtual) library shelves.

The requirement for the use of publication numbering rules out the article identification mechanisms contemplated by the PII and DOI schemes as a basis for canonical USINs. Both of those schemes emphasize publisher-generated numbers that may be different from the actual numbering on the published serial. This requirement also rules out other reasonable schemes for unambiguous article identification. For example, a scheme based on volume number and sequential article number would be widely applicable as an unambiguous numbering scheme for many journals. But scholars may be unable to easily determine the sequential article number from either a printed copy of the article or a conventional bibliographic citation. If publication numbering exists, it should be used.

One might argue that identification by publication numbering is less scholar-friendly than identification using more mnemonic article attributes, such as author name and key title words. However, this is an instance in which scholar-friendliness should not be considered an absolute at the expense of a system of unambiguous article identification.

One might also prefer to use publication chronology (e.g., dates, month-year combinations) instead of publication numbering. In fact, chronology is a form of numbering that happens also to be correlated with the passage of time. For some types of publication, chronology may be the only numbering that exists and hence must be used. In other cases, acceptable alternative USIN forms may be defined based on chronology. However, chronology is generally more complex and involves more identification pitfalls. For example, if (volume, page) identification generally suffices for article identification in a particular journal, it may be the case that (year, page) identification is inadequate for at least two reasons. First, the journal may publish multiple volumes per year. Second, even if volumes are annual, they may not correspond to calendar years; articles with the same starting page number in two consecutive volumes could still end up being published in the same year. In other cases, serial items may have duplicated and hence ambiguous chronology, for example, when two technical reports are issued on the same date. There are also a number of annoying coding problems for chronology. If numeric codes are used for months, how do you code for month combinations or seasons? If nonnumeric coding is used, should it be in English or the original language and should abbreviations be used? For all these reasons of potential ambiguity and complexity, identification by simple publication numbering should be used in preference to chronology.
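The first pitfall, multiple volumes per year, can be illustrated with a small worked example. The article records below are invented for illustration; the sketch simply counts how many articles share each candidate identifying key.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# Invented records for a journal that published volumes 10 and 11 in 1997.
articles = [
    {"volume": 10, "year": 1997, "page": 1},
    {"volume": 11, "year": 1997, "page": 1},
    {"volume": 11, "year": 1997, "page": 45},
]


def collisions(items: Iterable[dict], key_fields: Sequence[str]) -> Dict[Tuple, int]:
    """Return each key value shared by more than one article, with its count."""
    counts = Counter(tuple(a[f] for f in key_fields) for a in items)
    return {key: n for key, n in counts.items() if n > 1}


print(collisions(articles, ("volume", "page")))  # no collisions
print(collisions(articles, ("year", "page")))    # (1997, 1) denotes two articles
```

Under (volume, page) every article has a distinct key, whereas (year, page) assigns the same key to the first article of each volume, which would violate the requirement for unambiguous article identification.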

Requirement #4.4: Standard Numbering Syntax

Because technical reports, government publications, court documents and journal papers have various different numbering schemes, alternative syntactic conventions for each type of publication will likely be necessary. In principle, each serial should be accompanied by a definition of its numbering scheme, including syntax and semantics of the USIN designations. However, in order to ease the burden on scholars, efforts should be made to limit the syntactic variations wherever possible. Thus, there should also be methods for defining standardized numbering schemes, with the goal that the vast majority of serials will use one of the standard schemes rather than one of their own design.

Requirement #4.5: Brevity of Article Identification

From the scholar's point of view, the primary role and need for USINs is in identification of articles. Identifying secondary serial components (volumes, issues, special sections, abstracts, etc.) is a secondary issue of considerably less importance. The requirement for scholar-friendliness then is that the syntax for article identification not be complicated by codes to distinguish articles from other types of component. Instead, where necessary, the syntax for secondary components should include additional coding to indicate that a secondary component is being identified; the absence of such coding should be taken to indicate an article identification.

Requirement #4.6: Ease of Construction and Analysis

It should be easy for scholars to construct and analyze USINs manually. Checksums and other calculations should be avoided. Appropriate punctuation should be used to avoid running numeric items together. For example, the code 20000229 used as the SICI specification for February 29, 2000 violates this requirement. Arcane numeric codes should also be avoided. Although numeric month codes 1 through 12 are arguably acceptable, the SICI code 23 meaning "Fall" is not.

Requirement #4.7: Media Independent Specification

It is not uncommon to find a particular serial published in two or more formats, for example, in HTML format on the Web and on paper. From the scholar's viewpoint, it is usually the content of the article, not the form of its presentation, that matters. When there is no difference in content, the USIN specification for articles should be fundamentally independent of publication medium. This requirement does not preclude media specification from inclusion as an optional element in a USIN syntax. However, the SICI convention of including the medium format identifier (MFI) as standard practice would not satisfy the requirement for USINs.

It may be the case that a publisher creates separate designations for different formats of a serial, particularly when there may be significant differences in content. In this case, the publication medium or format may be implicitly identified by the choice of publication series designation. However, this does not represent a violation of format independence of the USIN syntax itself.

Requirement #4.8: Embedding USINs in Context

Scholars will need to make use of USINs as notational elements in a variety of contexts, both formal and informal. Formal contexts include use of USINs as citation tags for bibliographic formatting software and data elements for bibliographic database queries. Informal contexts are generally oriented to the human reader, such as presentation of USINs in reference lists or direct use of USINs as nouns in sentences. In any of these contexts, there is a potential for confusion to be created by interaction of the syntax of the USIN with the notational conventions of its embedding.

The syntax of USINs should be designed to avoid confusions that can be created by common notational features that may be expected in typical embeddings. In particular, both formal and informal settings may embed USINs as notational elements within structures delimited by parentheses, braces or similar bracketing structures. To avoid confusion, USIN syntax should be constrained to allow bracketing symbols only if they occur in matched pairs. For example, if a USIN X is to be acceptable as a parameter in a BibTeX citation tag of the form \cite{X}, then any unmatched braces within X would surely cause confusion. It may be worthwhile to avoid braces altogether because of their use in the TeX family of document languages and similarly to avoid angle brackets ("<" and ">") because of their use in HTML and SGML.

When USINs are used as elements in ordinary discourse, they may often occur at the end of a sentence or phrase. Punctuation (periods, commas, semicolons and so on) added at this point should not be a source of confusion. The presence or absence of whitespace (blanks, tabs or line breaks) after such a punctuation symbol may be used to discriminate. That is, a period, comma or other punctuation may be used within the USIN syntax only if it is immediately followed by a nonblank character. Any of these punctuation marks followed by whitespace should always denote the end of a sentence or phrase.
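The whitespace rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the set of punctuation marks, and the heuristic that a candidate USIN contains a "/" are all hypothetical choices, not part of any USIN specification, and the sketch does not handle USINs wrapped in parentheses or braces.

```python
# Punctuation that may close a sentence or phrase (an assumed, illustrative set).
PUNCTUATION = ".,;:!?"


def strip_trailing_punctuation(token: str) -> str:
    """Drop punctuation at the end of a whitespace-delimited token.

    Under the rule in the text, punctuation followed by whitespace is
    never part of a USIN, so it can be removed safely.
    """
    return token.rstrip(PUNCTUATION)


def usin_candidates(text: str) -> list[str]:
    """Return tokens of `text` that look like USINs.

    Heuristic (an assumption of this sketch): a candidate contains the
    "/" separator used in the paper's illustrative syntax.
    """
    tokens = (strip_trailing_punctuation(t) for t in text.split())
    return [t for t in tokens if "/" in t]


sentence = "See S.ACM/TOPLAS:16@1811, and also S.ACM.SIGPLAN/Notices:32(1)@66."
print(usin_candidates(sentence))
```

Periods inside a USIN survive because they are followed by a nonblank character; only the comma and the final period, each followed by whitespace or the end of the text, are stripped.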

Requirement #5: Permanence of USIN Designation

A necessary requirement for the USIN system is that USINs, once assigned and validated, remain permanently unambiguous identifiers of their documents. This applies to both canonical and noncanonical USINs. Three hundred years from now, a scholar may come across a USIN designation in an obsolete form of print media. She may highlight it with her data capture pen and expect to see instantly the resolution of it to a full bibliographic reference on her electronic work area. This requirement implies the need for a global registry system and a set of protocols for ensuring that USINs, once assigned, are never reused.

However, it need not be required that canonical USINs always remain canonical, at least in the initial development of the USIN system. Initially, the canonical USIN forms for many serials will include serial designation by ISSN. As globally unique mnemonic designations for these serials are gradually registered and accepted, those forms may become canonical. It may also be the case that changes in the canonical form of serial numbering become desirable, particularly for those aspects of numbering that are not directly reflected in publication numbering (for example, position of an article on a page).

It may be useful to impose constraints on how frequently canonical forms may be varied and/or on how results of USIN processing may be combined. For example, new canonical forms might be allowed to be registered at any time, but to take effect only at certain designated times. When such a time is reached, an updating process might (a) temporarily disallow new USIN processing requests, (b) allow current requests to complete or time out, (c) perform global updating of canonical form information, and (d) allow USIN processing requests to resume. Any application that needs to ensure the completeness of USIN matching could use the simple device of requiring that all USIN processing requests are initiated and completed in the same time frame.

Requirement #6: Accommodating Serial Evolution

Serials evolve. Changes in title, publisher or publication frequency are commonplace. Serials may merge together or split apart. Serials may suspend publication and then resume publication at a later date. Serial publishers are also subject to many kinds of change: renaming, relocation, reorganization and so on. There is no doubt that accommodation of change must be an important design goal for USIN development.

Two issues involving particular forms of change deserve special attention in the development of USIN syntax. The first is that title changes should not necessarily require changes in the USIN code for a serial. This is at odds with the ISSN convention, which requires new ISSNs to be issued when there is any significant change in title. However, in considering mnemonic abbreviations of serial titles, various changes in title may be accommodated with the same mnemonic. If the publisher and readers of a journal wish to retain a particular mnemonic by which the journal is known, the USIN system should respect this. The second issue is that the syntax for identifying components of a particular serial should be flexible and changeable. For example, if a serial starts out with sequentially numbered issues, its USIN syntax should nevertheless accommodate a later reorganization to number the publication by volume. Similarly, if a traditional print journal identifies articles by volume and page number, the USIN syntax should accommodate a later change to an electronic format in which articles are identified by volume and article number.

Requirement #7: Version Discrimination

Articles evolve. Draft versions may be initially circulated in a working paper series, followed by revised versions in conference papers and further revised versions in journals. At various stages an author may circulate intermediate versions to limited groups for review and comment. Post-publication revision of journal articles is also becoming a possibility with novel e-journal policies such as those of Living Reviews in Relativity [24].

The USIN scheme generates distinct identifiers for each separately published version of an article. One possible view of this is that each of these identifiers is in fact an alternative identifier of the same article, with one of them (presumably the most recent) being the canonical form. However, this approach has several serious problems. The first is that there is no good basis for saying when two versions of an article should be treated as the same. How many insertions and/or deletions of text may be accommodated? What about changes in title or authorship? It is difficult to imagine any set of rules that could provide a satisfactory and implementable decision procedure. It is also difficult to imagine any mechanism that could ensure that publishers actually identify these equivalent versions so that the correct mappings to canonical form can be made automatically. Beyond these concerns, there is also a problem with such equivalences automatically being applied to citations: changes in the content of an article between versions may render a citation apparently irrelevant or incorrect. This should not be considered a failure on the part of the citing author. In essence, it is a misrepresentation to map the author's citation of a particular version to any version other than the one the author intended.

Philosophically, then, USINs are names for particular versions of articles, not names for the more abstract notion of an article that maintains its identity through various versions over time. Systems to support this more abstract notion, at least at the coarse-grain level of publication versioning, might well be built on top of a USIN system, using USINs to identify particular published versions of articles. Finer-grained versioning concepts, such as those of Augment/NLS [12] or Xanadu [20], might also make use of USINs to interoperate with conventional bibliographic databases.

The sharp reader may notice an apparent contradiction between the USIN requirements with respect to changes to serials and changes to articles. The USIN requirement for serial codes does represent the more abstract notion of a serial publication as it goes through various changes rather than the serial as it exists at a single point in time. However, this distinction between the treatment of serial and article identifications reflects a fundamental philosophical view. In this view, serials are like timelines and articles are like points on those lines. The timeline may go through the twists and turns of changes in publisher, title or numbering scheme and still retain its identity. Each point on each line is a separate entity with a separate identity. There may be relationships between points such as "version-of" and "cites", but the separate identities of the points should be maintained in the USIN approach.

III. Global Naming of Serial Publications

Hierarchical Naming Using the DNS Model

The Domain Name System (DNS) of the Internet is a successful model of a hierarchical, globally-unique naming system using distributed authority [18]. Under DNS, a number of global domains such as "edu" (educational institutions, primarily U.S.), "org" (organizations, primarily non-profit) and "ca" (Canadian sites) have been established by common agreement. Each domain is managed by an independent domain authority. Each domain authority assigns unique identifiers within its domain to create subdomains and/or to specify particular computer systems. When a subdomain is created, authority for assigning further identifiers within the subdomain is often passed to a responsible organization. Subdomains may be further divided into subsubdomains and so on.

Consider a USIN scheme that adopts the hierarchical naming idea of DNS, but with a focus on naming serial publications and publishing organizations, not computer resources. The distinction between naming publications and naming computer resources is critical; the failure to make it may be one of the underlying problems of the URN concept. Notations such as the following may be contemplated:

S.ACM/TOPLAS as a designation for ACM Transactions on Programming Languages and Systems published by the Association for Computing Machinery within a global domain for scholarly societies,

S.ACM.SIGPLAN/Notices for SIGPLAN Notices of the ACM's Special Interest Group on Programming Languages,

CA.SFU.CMPT/TR for the Technical Report series of the School of Computing Science of Simon Fraser University,

and AU.NLA.ABN.SC/Papers for papers of the Standards Committee of the Australian Bibliographic Network of the National Library of Australia within a global domain for Australia.

These examples are for illustrative purposes only; the actual development of a domain structure and names for serials and their publishers requires a process of international consultation and consensus.

In the USIN scheme, then, serial publications are given identifiers which must be unique in the context of a particular publication domain. Thus d1.d2.d3 is interpreted to specify a subdomain d3 within domain d1.d2, which is itself hierarchically specified as a subdomain d2 within the global domain d1. In general, domains will denote publishing organizations, administrative divisions of such organizations or collectives for identifying organizations or publications.
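The left-to-right interpretation of d1.d2.d3 can be made concrete with a small Python sketch. The function names here are hypothetical, the separators are those of the paper's illustrative syntax, and quoted names containing periods are deliberately not handled.

```python
def split_serial_designation(designation: str) -> tuple[list[str], str]:
    """Split 'd1.d2.d3/Series' into (['d1', 'd2', 'd3'], 'Series')."""
    domain_part, sep, series = designation.partition("/")
    if not sep or not series:
        raise ValueError(f"no series name in {designation!r}")
    domains = domain_part.split(".")
    if any(not d for d in domains):
        raise ValueError(f"empty domain component in {designation!r}")
    return domains, series


def domain_prefixes(domains: list[str]) -> list[str]:
    """List each enclosing domain, from the global domain downward."""
    return [".".join(domains[:i]) for i in range(1, len(domains) + 1)]


domains, series = split_serial_designation("CA.SFU.CMPT/TR")
print(domains, series)           # the domain path and the series name
print(domain_prefixes(domains))  # CA, then CA.SFU, then CA.SFU.CMPT
```

Each prefix names a domain whose authority would be responsible for assigning the next component, mirroring the delegation of authority under DNS.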

The USIN syntax shown in this paper is intended to be illustrative rather than prescriptive of the final form of USINs. Thus the choice of periods and slash marks as separators is somewhat arbitrary. One could also argue that the distinction between slash marks and periods is artificial, i.e., that S.ACM.TOPLAS would do as well as S.ACM/TOPLAS. However, distinguished punctuation allows us to infer directly from the form of a specification that S.ACM/TOPLAS is a serial publication of the ACM, while S.ACM.SIGPLAN is an administrative division thereof. One could also question the decision to reverse the right-to-left structuring of domains under DNS; the reason for this is to use a consistent left-to-right hierarchical structuring within all levels of the USIN notation. Lastly, the final syntax of domain, subdomain and series identifiers is left as an area for further work. However, allowance for case-sensitivity in such identifiers seems reasonable, e.g., CaS and CAS could denote separate items.

Three Initial Domains

Prior to international agreements to develop a full domain structure for USINs, it is nevertheless possible to initialize the scheme by building on existing global identification standards. With the present focus on the problem for scholarly literature taken in this paper, three initial USIN domains can be identified: ISSN, ISBN and RDNS. The ISSN and ISBN domains directly use the international standard numbering systems for serials and books. For example, ISSN/0164-0925 is an initial USIN designation for ACM TOPLAS. Over time the notation S.ACM/TOPLAS might be adopted as the canonical designation of this journal, but ISSN/0164-0925 will always be acceptable. Similarly ISBN is identified as a global domain based on International Standard Book Numbers.

Names assigned under the Internet's Domain Name System are the basis for the third leg of the initial tripod supporting the USIN scheme. Whenever a DNS domain name or host name is clearly associated with a particular publishing organization, it may be used as a component of the RDNS (restricted DNS) domain of the USIN scheme. For example, acm.org is a DNS domain identified with the Association for Computing Machinery, so RDNS."acm.org"/TOPLAS denotes ACM TOPLAS. Similarly, sfu.ca is a DNS domain for Simon Fraser University, so RDNS."sfu.ca".CMPT/TR denotes the Technical Report series of the School of Computing Science at SFU. In this last example, one might consider instead basing the USIN specification on the cs.sfu.ca domain, that is, RDNS."cs.sfu.ca"/TR. This form might be allowed, but the form based on the CMPT designation may be preferred (canonical), because that designation has been specifically chosen by SFU in a system of unambiguous codes for its departments.

The syntactic convention of enclosing a DNS name in double quotes when used as an RDNS domain serves two purposes. First, it emphasizes that the hierarchical structure of the DNS name plays no role in the interpretation of that name as an RDNS subdomain. In essence, DNS names are being cited as atomic identifiers for publishing organizations. Second, the quote marks delimit the scope of a DNS name, within which the "." separator is understood not as a part of the USIN syntax, but simply as a character in a quoted DNS name.
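A minimal sketch of the quoting convention, assuming the illustrative syntax of this paper: periods separate domain components except inside double quotes, where they are ordinary characters of the DNS name. The function name is hypothetical, and the quote marks are kept in the output to mark the component as an atomic DNS name.

```python
def split_domain_path(path: str) -> list[str]:
    """Split a USIN domain path on '.' while keeping quoted DNS names whole."""
    parts: list[str] = []
    current: list[str] = []
    in_quotes = False
    for ch in path:
        if ch == '"':
            in_quotes = not in_quotes  # entering or leaving a quoted DNS name
            current.append(ch)
        elif ch == "." and not in_quotes:
            parts.append("".join(current))  # a USIN-level separator
            current = []
        else:
            current.append(ch)  # includes '.' inside a quoted name
    if in_quotes:
        raise ValueError(f"unterminated quoted DNS name in {path!r}")
    parts.append("".join(current))
    return parts


print(split_domain_path('RDNS."sfu.ca".CMPT'))
```

The quoted component comes back as a single item, so the internal structure of sfu.ca plays no part in the USIN hierarchy.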

Unfortunately, there is no constraint within the DNS system that DNS domains are permanently unique designations of organizations or their successors. Under DNS, the essential requirement is that domains are unique at any particular point in time, but it is quite conceivable that a naming authority at some level may reuse or reassign a name. Furthermore, the association between DNS names and organizations breaks down as one descends into the hierarchy of subdomains, subsubdomains and so on. To avoid these problems, the USIN standardization process could include the publication of a list of acceptable DNS names and their associated organizations for use within the RDNS domain of the USIN scheme. These designations should be permanent; the interpretation of a designation within the RDNS domain should be derived from this list, even if that designation is later reassigned to some other purpose within DNS itself. The intention of the list should be to identify all and only those DNS domains that may be clearly identified with publishing organizations.

The astute reader will note that designations such as RDNS."acm.org"/TOPLAS and RDNS."sfu.ca".CMPT/TR seem unnecessarily awkward compared to the earlier examples S.ACM/TOPLAS and CA.SFU.CMPT/TR. We should hope that forms such as the latter ultimately become canonical under the USIN system. One might ask, then, why not just skip the RDNS prefix, reverse the order of DNS domain names and use those reversed names directly at the top-level of the USIN hierarchy in the initial instance? The answer is that the top-level domain structure of the USIN system should not be prematurely constrained. Once established for a particular use, USIN designations are intended to be reserved permanently for that use. The RDNS prefix allows existing DNS names to be used as a way of initializing the USIN system, giving time for an orderly process of developing an internationally-acceptable top-level domain structure.

Within the RDNS domain for a particular publishing organization, the identification of administrative divisions and publication series should use codes specified by that organization. In many cases, clear coding schemes are already in place now. In the important case of universities, a system of unambiguous mnemonic codes for the academic departments is typically available in the university calendar. Codes to denote a publication series of a university department (e.g., TR for Technical Report, TN for Technical Note and so on) are often included on publication lists produced by the department or may be found on the documents themselves. Wherever possible, the use of existing naming schemes should be accommodated in this way, in order to maximize the scholar-friendliness of USIN designations.

Occasionally, one finds a DNS domain that directly corresponds to a particular serial publication. For example, the electronic journal First Monday has an associated DNS domain firstmonday.dk. In this case, the DNS name can be used as a serial publication name directly within RDNS. Assuming then that the internet domain for First Monday is registered on the list of acceptable RDNS domains, it has the USIN RDNS/"firstmonday.dk".

In order to ensure the robustness and permanence of USIN designations, one should expect that certain adaptations and accommodations of historical naming schemes will be required. Thus, the USIN system must include a method for describing naming schemes and rules for maintaining consistency. In order to make the greatest use of historical naming schemes, the rules should be designed to accommodate a great deal of variability. Nevertheless, some modifications of historical naming schemes should be expected in order to comply with USIN requirements.

The three initial domains ISSN, ISBN and RDNS provide a plausible initial basis for unified, permanent and globally-unique designations of archivable serial, book and institutional publications. There are undoubtedly many cases in which the coding of USIN specifications will initially be unclear, especially in the case of institutional publications. However, it is certainly a common practice for the serial publications of an institution to be identified using a numbering scheme that serves to unambiguously denote those publications in the local context of an institution. It is certainly also the case that the vast majority of publishing institutions in the industrialized world can now be identified by an appropriate DNS domain. These conditions suggest that it is presently feasible to initiate a USIN system.

Evolution of the USIN System: Towards Scholar-Friendly Names



Although the ISSN, ISBN and RDNS domains may serve to initialize a USIN system, they will not generally provide a satisfactory basis for the scholar-friendly canonical designations that meet USIN Requirement #4.2. The development of an internationally acceptable domain structure is beyond the scope of this paper. However, to stimulate discussion along these lines, the References section of this paper includes, for each of the cited references, the discussion of possible initial USIN designations and forms that may evolve over time.

IV. Hierarchical Identification of Serial Items

This section focusses on the problem of identifying articles and other components within the context of a particular serial. For concreteness, the first subsection starts with a proposed USIN syntax for citing journal articles. Following this, a general model for serial item identification by hierarchical numbering of items within a series is presented. The final subsection returns to the exploration of some additional design ideas for USIN syntax.

Example: Journal Article Citation

The following examples illustrate a proposed syntax for citation of traditional (print) journal articles.

S.ACM/TOPLAS:16@1811

Assuming that S.ACM does become the code for the Association for Computing Machinery in the global domain for scholarly societies, this is the canonical USIN in the proposed syntax for the article "A Behavioral Notion of Subtyping" by Barbara H. Liskov and Jeannette M. Wing appearing in ACM Transactions on Programming Languages and Systems, volume 16, number 6 (November 1994), pages 1811-1841.

S.ACM/TOPLAS:16(6)@1811

This is an acceptable alternative USIN for the same journal article, specifying the issue number.

S.ACM.SIGPLAN/Notices:32(1)@66

This denotes the position paper entitled "Global Computation" by Luca Cardelli, published in ACM SIGPLAN Notices, Volume 32, Number 1, January 1997, pp. 66-68. In this case the issue number is required, because pages are renumbered from 1 with each issue of SIGPLAN Notices.

The syntax is intended to be scholar-friendly, with notation that is mnemonic of the role of each component in the numbering. Volumes are emphasized as the first numbering component, issues are enclosed in parentheses consistent with many standard citation formats and the "at sign" indicates the page number at which the article starts.
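As a rough illustration of how the proposed citation syntax might be analyzed mechanically, the following Python sketch parses the serial:volume(issue)@page form with a regular expression. The class and function names are hypothetical, and the pattern covers only the examples shown above; it is not a definitive grammar for USINs.

```python
import re
from typing import NamedTuple, Optional


class ArticleUSIN(NamedTuple):
    serial: str
    volume: str
    issue: Optional[str]  # None when the issue number is omitted
    page: str


ARTICLE_PATTERN = re.compile(
    r"""
    (?P<serial>[^:]+)            # serial designation, e.g. S.ACM/TOPLAS
    :(?P<volume>\d+)             # volume number after the colon
    (?:\((?P<issue>[^()]+)\))?   # optional issue number in parentheses
    @(?P<page>\w+)               # starting page after the at sign
    """,
    re.VERBOSE,
)


def parse_article_usin(text: str) -> ArticleUSIN:
    """Parse an article USIN in the illustrative journal syntax."""
    match = ARTICLE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not an article USIN: {text!r}")
    return ArticleUSIN(match["serial"], match["volume"], match["issue"], match["page"])


print(parse_article_usin("S.ACM/TOPLAS:16@1811"))
print(parse_article_usin("S.ACM.SIGPLAN/Notices:32(1)@66"))
```

Both the canonical form without an issue number and the alternative form with one are accepted; deciding which of them is canonical for a given serial would require information about that serial's page numbering, which this sketch does not model.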

It is possible to contemplate a generic syntax for the numbering of serial items, avoiding specialized syntax for each type of item. For example, the conventions of the Web's Universal Resource Identifiers [3] might be adopted to use the "/" punctuation for separation of all elements within the hierarchical numbering of a serial item. The designation of the TOPLAS example might become S.ACM/TOPLAS/16/6/1811. Unfortunately, there are a number of disadvantages to a generic syntax for hierarchical numbering. First, with respect to journal numbering, optional issue numbers are not easily accommodated. For example, how does one reconcile S.ACM/TOPLAS/16/1811 as an article denotation with S.ACM/TOPLAS/16/6 as an issue denotation? Second, the mnemonic value of associating specific symbols (e.g., "@") with specific concepts (e.g., "at page number") is lost. Finally, there may be syntactic conflicts between the universal syntax and existing syntaxes for publisher's numbering schemes. For example, the "/" separator for URI syntax conflicts with the combined-issue designations such as 3/4 that are frequently used by journals such as The Serials Librarian. For these reasons, it seems preferable to avoid specifying a generic universal syntax for serial numbering and instead allow series-dependent syntax. Nevertheless, the number of alternative syntactic schemes should be kept fairly limited to avoid cognitive burdens for the scholar.

Multiple Articles Per Page.

Occasionally, one may find journals with more than one article starting on a particular page. For example, these might be items of technical correspondence. One solution to this problem of starting page ambiguity is to use sequential denotations with lower case letters. For example, S.ACM/CACM:38(1)@43a and S.ACM/CACM:38(1)@43b could respectively denote the two short articles "Women and Computing in the UK" by Alison Adam and "Announcing a New Resource: The WCAR List" by Laura L. Downey, both appearing on page 43 of Communications of the ACM, volume 38, number 1 (January 1995).

There are three small problems with this scheme that may be quite rare but are theoretically possible and should be addressed. The first is that there may potentially be more than 26 articles on a page. However, the scheme easily extends so that designations such as aa for the 27th article and aaa for the 703rd article may be used. Second, there may be an ambiguity in determining the ordering of articles; pages are two-dimensional while orderings are one-dimensional. The most scholar-friendly way to resolve this is to follow the natural text ordering. For publications in English and similar languages, this is column-major numbering: articles in column 1 always precede articles in column 2 and so on, while articles within columns are numbered top to bottom. Finally, note that page numbers themselves might in some cases include lower case letters. An example is preface material in a journal volume numbered using lower case roman numerals. To handle this case, the USIN scheme might specify that the underscore ("_") character can be used as a separator.
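The lower case letter sequence a, b, ..., z, aa, ab, ... is a bijective base-26 numbering, which can be sketched as follows. The function names are hypothetical; the code simply illustrates the counting scheme and is not taken from any USIN specification.

```python
def to_alphabetic(n: int) -> str:
    """Encode a positive item count as a, b, ..., z, aa, ab, ..."""
    if n < 1:
        raise ValueError("item counts start at 1")
    letters = []
    while n > 0:
        # Subtracting 1 first makes the digits run from a (1) to z (26)
        # with no zero digit, which is what distinguishes this from
        # ordinary base-26 notation.
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("a") + remainder))
    return "".join(reversed(letters))


def from_alphabetic(code: str) -> int:
    """Decode a, b, ..., z, aa, ab, ... back to a positive item count."""
    if not code:
        raise ValueError("empty alphabetic numeral")
    n = 0
    for ch in code:
        if not "a" <= ch <= "z":
            raise ValueError(f"not a lower case letter: {ch!r}")
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n


print(to_alphabetic(1), to_alphabetic(26), to_alphabetic(27))
```

The two functions are inverses, so a page designation such as 43b can be converted to an item count and back without loss.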

In practice, scholars will not want to learn the details of how to distinguish multiple articles on a page until it becomes a problem. They may not even be aware of the problem if they are entering a citation from its written form in a reference list. In such a case, the user will likely omit the required lower case code when entering the citation. Interactive USIN processing software should notify the user of the ambiguity and query him or her for its resolution. Batch-oriented software could return the set of all articles on the page and issue a warning report through an appropriate message or log file.

Unpaginated E-Journals

When a journal is not printed on pages, one might expect that article identification by page number is no longer appropriate. Although many electronic journals have in fact retained page-oriented formatting and numbering, many others have chosen not to do so. In particular, there is a growing trend to use the logical document markup capabilities of SGML [7] and HTML in electronic journals. One advantage is that formatting may be left to the reader's software; articles can be viewed and printed in a variety of different formats (with a variety of different paginations) depending on hardware capability and reader preference. In view of this, it seems reasonable to expect that the trend towards unpaginated e-journals will continue.

Consider a variation on the standard USIN journal syntax that accommodates unpaginated e-journals by replacing the @page syntax with $article-number. (An earlier version of this paper used the more mnemonic # to denote article numbers, but the $ is easier to use when USINs may be encoded as URLs.) Some e-journals have explicit article numbering by volume, for example, the Chicago Journal of Theoretical Computer Science. Supposing that S.MITP/CJTCS identifies this journal, S.MITP/CJTCS:1995$3 then denotes article 3 in volume 1995, entitled "Rabin Measures" by Nils Klarlund and Dexter Kozen. In other cases, articles may be numbered within issues. Thus ISSN/1201-2459:2(3)$4 would denote the article "Reflections on Milton and Ariosto" by Roy Flannagan, published as article 4 in Early Modern Literary Studies (ISSN 1201-2459), volume 2, number 3.
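A small sketch of how the numbering part of a USIN (the text after the colon) might be analyzed when both page and article-number forms are allowed. The function name and the dictionary keys are hypothetical, and the pattern covers only the illustrative forms used in this section.

```python
import re

# volume, optional (issue), then either @page or $article-number
LOCATOR_PATTERN = re.compile(
    r"(?P<volume>\d+)(?:\((?P<issue>[^()]+)\))?(?P<kind>[@$])(?P<number>\w+)"
)


def parse_locator(locator: str) -> dict:
    """Parse a locator such as '16(6)@1811', '1995$3' or '2(3)$4'."""
    match = LOCATOR_PATTERN.fullmatch(locator)
    if match is None:
        raise ValueError(f"unrecognized locator: {locator!r}")
    # The operator symbol tells us how to interpret the final number.
    kind = "page" if match["kind"] == "@" else "article"
    return {"volume": match["volume"], "issue": match["issue"], kind: match["number"]}


print(parse_locator("1995$3"))
print(parse_locator("2(3)$4"))
print(parse_locator("16(6)@1811"))
```

Because the operator symbol is distinct for each kind of item, the same volume and number can denote a page in one serial and an article in another without ambiguity in the syntax itself.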

When no explicit numbering is provided, article numbers should be determined by issue, if possible, or by volume, otherwise. In general, scholars will determine article numbers by counting through the table of contents. In some cases, this may be a source of ambiguity; if the table of contents includes regular articles, short notes, corrigenda, submission instructions and/or other items, scholars may have difficulty determining what to count and what to omit. With the expected availability of on-line USIN databases, however, a scholar may simply query the database to verify or determine the correct USINs for articles published in a particular issue or volume.

A General Model for Identification by Hierarchical Numbering

The scheme just illustrated for journal citation is an example of a general concept for serial item identification: the use of a hierarchical numbering system. Abstractly, serial items are identified in the context of their serials by specifying hierarchical numbering tuples. For example, (volume, page) 2-tuples serve to identify articles in some print journals, while (volume, issue, page, item-count) 4-tuples may be required for magazines. In some cases, the hierarchy may be quite deep; items in a particular newspaper may be identified by a 7-level numbering (volume, issue, edition, section, page, column, item-count). In general, this is the essence of serial identification: although the particular scheme employed may vary from serial to serial, every item within every serial may be abstractly identified by some form of hierarchical numbering tuple.

It is interesting to note that a hierarchical enumeration system ("tumbler addressing") was also used as the basis of universal document identification in the proposals for the Xanadu Docuverse [20]. However, those identifications were based on a server/user/document/version/content hierarchy rather than the pure publication numbering hierarchy considered here. In essence, the Xanadu address system attempted to develop a new numbering system to apply to all documents, whereas the USIN approach is to characterize and use existing publication numbering hierarchies within a common framework.

Scope

One defining characteristic of the USIN hierarchical numbering model is that every counter within every numbering tuple has a scope that defines the context of its numbering. Issues of a journal are typically numbered from 1 within each volume; they are said to have volume scope. Page numbers may have volume scope or issue scope, depending on the particular serial. An "item-count" for distinguishing multiple articles per page has page scope. The first, or principal, numbering component of a serial is said to have global scope; it is numbered consecutively in perpetuity.

Numbering scope is correlated with, but not synonymous with, hierarchical level. For example, volume scope for page numbers is often used even when volumes are divided into issues. Similarly, although issues are usually given volume scope when volumes exist, they may sometimes be given global scope.

Scope-Dependent Numbering

Another important aspect of the model is the use of scope-dependent numbering. In general, this reflects the fact that some properties of a counter at a particular level may depend on the actual values of counters at superior scope levels. Some of the scope dependencies may be relatively minor. For example, a quarterly journal that changes to a bimonthly journal starting with volume 23 exhibits a scope-dependency: issues are numbered 1 through 4 for volumes 1 through 22, and are numbered 1 through 6 thereafter. Scope-dependency may even affect the need for a particular counter in serial item identification. For example, the item-counter for multiple articles per page is not needed for those pages that have only one article starting on a page. Scope-dependencies may even affect the entire numbering system. For example, a print journal may switch to electronic publication at some point with a corresponding switch from a (volume, issue, page) numbering scheme to a (volume, article-number) scheme.
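The quarterly-to-bimonthly example can be expressed as a rule in which the valid range of the issue counter depends on the value of the volume counter. The sketch below hard-codes that single hypothetical journal; the function names are illustrative assumptions, and a real system would presumably look such rules up per serial.

```python
def issues_in_volume(volume: int) -> int:
    """Number of issues per volume for the hypothetical journal in the text.

    Quarterly (4 issues) through volume 22, bimonthly (6 issues) from
    volume 23 onward.
    """
    if volume < 1:
        raise ValueError("volumes are numbered from 1")
    return 4 if volume <= 22 else 6


def is_valid_issue(volume: int, issue: int) -> bool:
    """Check an issue number against the range implied by its volume."""
    return volume >= 1 and 1 <= issue <= issues_in_volume(volume)


print(is_valid_issue(22, 5))  # issue 5 does not exist in a quarterly volume
print(is_valid_issue(23, 5))  # but it does after the change to bimonthly
```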

Syntactic Representation

In general, the numbering scheme for every serial has a syntactic representation that may be generated by mapping rules from the abstract representation as a hierarchical numbering tuple. In the suggested standard journal article syntax, the (volume, page, item-number) tuple of (12, 135, 2) maps to the syntactic representation 12@135b. In general, each number in a hierarchical numbering tuple is first mapped to a numeral in some encoding system, such as arabic numerals, roman numerals or "alphabetic numerals" (a, b, c, ..., aa, ab, ...). Then a syntactic string for the entire structure may be constructed by concatenation with appropriate mnemonic operator symbols as punctuation. An essential goal of this process is that the syntactic encoding be uniquely decodable. Operator symbols must be carefully chosen both to have mnemonic value and to ensure unambiguous interpretation of the syntactic forms. In principle, the order of appearance of numbering elements may also be considered a design choice, but for simplicity and to avoid confusion it may be desirable to enforce a strict left-to-right ordering of elements according to the numbering hierarchy.
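The mapping between a (volume, page, item-number) tuple and its syntactic form can be sketched as a pair of inverse functions. All names here are hypothetical, and the sketch assumes arabic numerals for volume and page and alphabetic numerals for the item number; roman-numeral pages and other encodings are outside its scope.

```python
import re
from typing import Optional, Tuple


def _alpha(n: int) -> str:
    """Alphabetic numeral: 1 -> a, 2 -> b, ..., 26 -> z, 27 -> aa."""
    letters = []
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("a") + remainder))
    return "".join(reversed(letters))


def _unalpha(code: str) -> int:
    """Inverse of _alpha for a non-empty string of lower case letters."""
    n = 0
    for ch in code:
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n


def encode(volume: int, page: int, item: Optional[int] = None) -> str:
    """Map a (volume, page, item-number) tuple to a string such as 12@135b."""
    suffix = _alpha(item) if item is not None else ""
    return f"{volume}@{page}{suffix}"


def decode(text: str) -> Tuple[int, int, Optional[int]]:
    """Recover the numbering tuple from its syntactic form."""
    match = re.fullmatch(r"(\d+)@(\d+)([a-z]*)", text)
    if match is None:
        raise ValueError(f"cannot decode {text!r}")
    volume, page, item = match.groups()
    return int(volume), int(page), (_unalpha(item) if item else None)


print(encode(12, 135, 2))
print(decode("12@135b"))
```

Unique decodability holds here because digits and lower case letters never overlap and the @ operator occurs exactly once; a serial whose page numbers contain letters would need an extra separator, as discussed earlier for the underscore.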

Parallel Numbering Hierarchies

A fourth aspect of the hierarchical numbering model is that a serial may have parallel numbering hierarchies for different purposes. In general, these hierarchies have a common numbering prefix consisting of one or more of their uppermost numbering levels, with divergence of numbering below these level(s). The simplest example is that of the article-identification and issue-identification hierarchies of journals that are paginated with volume scope. In this case, the (volume, page) and (volume, issue) hierarchies may be considered parallel. In general, syntactic devices are necessary to distinguish which hierarchy is intended in any particular coding; the (volume, page) and (volume, issue) hierarchies are distinguished by the @ and () syntax notations given previously. Other examples of parallel numbering are given in the later subsection on secondary component notation.

Chronology

Finally, chronology is the fifth general property associated with the hierarchical numbering model for serials. Chronology is the association of a date and/or time of publication with a particular serial numbering component. In general, chronology is a fundamental aspect of serial publication and should be defined for all hierarchical numbering components down to some level at which all further structure is considered simultaneously published. For example, traditional print journals have chronology specified to the issue level, while electronic journals may have chronology specified to the article level. In general, chronology is scope-dependent; for example, when a quarterly journal changes to a monthly one, the chronology associated with issue 3 in each volume may change from "Fall" to "March". Chronology may also be irregular and possibly out-of-sequence, that is, with publication numbers assigned out of order of actual publication dates. Chronology itself is also an instance of hierarchical numbering, for example, using (year, month, day) 3-tuples or (year, season) 2-tuples.
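Since both enumeration and chronology can be treated as tuples, out-of-sequence publication can be detected by comparing the two orderings. The sketch below assumes that all dates are (year, month, day) tuples; the function name and the sample data in the usage note are hypothetical.

```python
def out_of_sequence(items):
    """Return numbering tuples published earlier than a lower-numbered item.

    `items` is a list of (numbering_tuple, date_tuple) pairs; tuples compare
    lexicographically, so higher levels of each hierarchy dominate lower ones.
    """
    flagged = []
    latest = None
    for number, date in sorted(items):
        if latest is not None and date < latest:
            flagged.append(number)
        else:
            latest = date
    return flagged
```

For instance, given invented technical reports numbered (97, 15), (97, 16) and (97, 17) with dates (1997, 6, 2), (1997, 5, 20) and (1997, 7, 1), only (97, 16) is flagged, because it appeared before its lower-numbered predecessor.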

Further Work: Hierarchical Numbering Theory

One direction for further development is to consider formalization of the model to become a theory of hierarchical numbering. Such a theory would have as its purpose the establishment of certain important properties, such as ensuring that every published item is denotable by a hierarchical numbering tuple, every tuple has a syntactic representation and every syntactic representation is unambiguously decodable. In particular, careful attention should be given to the formulation of arithmetic operations to avoid problems such as the "paradoxes of tumbler arithmetic" in the Xanadu scheme [20]. The theory should also account for the particular properties of hierarchical chronological numbering. In this regard, the theory should be informed by the extensive work of Dershowitz and Reingold in developing the mathematics of many of the world's important calendar systems [10].

Additional Design Ideas for Hierarchical Numbering

The following subsections present a number of additional design ideas for the identification of serial items by hierarchical numbering. Although many of the ideas are illustrated using examples related to journals, they are intended to apply to other types of serial as well.

Syntax for Holdings Description

Beyond article identification, the next most important application area for USINs may be in the description of library holdings or document delivery service coverage. A single volume or issue of a journal is simple to identify by including numbering only to the desired level. For example, S.ACM/TOPLAS:16 denotes volume 16 of TOPLAS, while S.ACM/TOPLAS:16(6) denotes issue 6 thereof. But holdings are more often described as volume ranges. In cases where issues are missing, subscriptions are cancelled and then reinstated, or miscellaneous holdings have been received by donation, the holdings may be broken up into a list of individually held items or ranges. To accommodate these requirements, it seems reasonable to reserve the comma (",") to separate elements of a holdings list and the double hyphen ("--") to serve as a range operator.

Consider a holdings pattern for ACM TOPLAS consisting of volumes 2 through 12 and 16 forward, except for the missing issues 2 and 4 of volume 10. The following USIN holdings specification could be descriptive.

S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff

Here, the serial code is specified only once. Commas separate individually held items or ranges. The start and end of a range are indicated by enumeration to the required level of specificity. An end range of "ff" indicates a continuing subscription. As a syntactic constraint to aid in error detection, holdings should be listed in strictly ascending order.
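The holdings conventions just described lend themselves to a simple parser. The following sketch assumes journal-style enumeration of the form volume or volume(issue), and assumes that the first colon separates the serial code from the enumeration; the function names are hypothetical.

```python
import re

_POINT = re.compile(r"(\d+)(?:\((\d+)\))?")   # volume or volume(issue)

def parse_point(text):
    """Turn '10' into (10,) and '10(3)' into (10, 3)."""
    m = _POINT.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognized enumeration: {text!r}")
    return tuple(int(g) for g in m.groups() if g is not None)

def parse_holdings(spec):
    """Parse a holdings specification into (serial_code, [(start, end), ...]).

    A single item has start == end; an open-ended range ("ff") has end None.
    """
    serial, _, body = spec.partition(":")
    ranges = []
    for element in body.split(","):
        start_text, sep, end_text = element.partition("--")
        start = parse_point(start_text)
        if not sep:
            end = start
        elif end_text == "ff":
            end = None
        else:
            end = parse_point(end_text)
        if ranges:
            prev_end = ranges[-1][1]
            # Nothing may follow an open range, and each element must begin
            # strictly after everything covered by the previous one.
            if prev_end is None or start[:len(prev_end)] <= prev_end:
                raise ValueError("holdings must be listed in strictly ascending order")
        ranges.append((start, end))
    return serial, ranges
```

Applied to S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff, this yields four elements: volume 2 through 10(1), the single issue 10(3), volumes 11 through 12, and volume 16 forward.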

Only positive holdings data is shown, following the principle adopted by ANSI Serials Holding Statements [2]. Determination of missing items can be made by reference to either the USIN global database or an appropriate serial "definition" (see the subsection on Serials Definition Language in the following section). For example, using the knowledge that TOPLAS was quarterly during volume 10 tells us that 10(2) and 10(4) are missing for these holdings while 10(5) is not (because it does not exist).
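Deriving the missing items from positive holdings data can be sketched as follows. The example assumes that holdings have already been reduced to (start, end) tuples, with an end of None standing for "ff", and that the number of issues in the volume is supplied by a serial definition; the function name is hypothetical.

```python
def missing_issues(ranges, volume, issues_in_volume):
    """List the issues of `volume` that exist but are not covered by `ranges`."""
    missing = []
    for issue in range(1, issues_in_volume + 1):
        point = (volume, issue)
        held = any(
            # Compare only to the depth given by each endpoint, so that a
            # bare volume number covers every issue of that volume.
            point[:len(start)] >= start
            and (end is None or point[:len(end)] <= end)
            for start, end in ranges
        )
        if not held:
            missing.append(issue)
    return missing

# The TOPLAS holdings 2--10(1),10(3),11--12,16--ff in tuple form.
toplas = [((2,), (10, 1)), ((10, 3), (10, 3)), ((11,), (12,)), ((16,), None)]
print(missing_issues(toplas, 10, 4))
```

With four issues in volume 10, the call reports issues 2 and 4 as missing; issue 5 is never considered because the definition says it does not exist.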

The conventions for serials holdings are intended to apply to serials with any form of hierarchical numbering and to any level of specificity. One implication is that the syntax of USINs generally must be structured to avoid conflicts with the "," and "--" symbols of the holdings notation. Another implication is that coverage can be specified to a finer level of detail. For example, a document delivery service may wish to identify "scanned holdings" to the article level, that is, the articles that have already been scanned or digitized and are hence available for short-turnaround delivery.

Secondary Component Notation

Secondary component notation is a proposed means of specifying abstracts of articles, tables of contents of issues, indexes of volumes and other secondary components of serials or their articles. In general, secondary component notation is introduced by a USIN for the relevant article, issue, volume or other component, followed by a vertical bar and a component specification. The component specification is typically a standardized mnemonic for the component, possibly followed by a parenthesized enumeration. The following examples are illustrative.

S.ACM/TOPLAS:16|index
The index of volume 16 of TOPLAS (found at the end of S.ACM/TOPLAS:16(6)).

S.ACM/TOPLAS:16(6)|contents
The table of contents of volume 16, issue 6 of TOPLAS.

S.ACM/TOPLAS:16@1811|abstract
The abstract of an example TOPLAS article.

S.ACM/TOPLAS:16@1811|sec(4.1)
Subsection 4.1 in the example article, entitled "Type Specifications".

S.ACM/TOPLAS:16@1811|fig(3)
Figure 3 in the example article, captioned "Stack Type".

The last two examples illustrate parallel (volume, page, section, subsection) and (volume, page, figure) numbering hierarchies respectively for sections and figures within articles.

It is anticipated that a standard set of mnemonics for standard kinds of components would be globally defined (index, abstract, section, figure, table, equation and so on) while others may be defined for individual publications. However, scope dependencies and numbering syntax for enumerated components will typically be defined on a serial-by-serial basis.
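Separating a USIN from its secondary component specification is straightforward. The sketch below assumes that the vertical bar occurs only as the component operator and that mnemonics are purely alphabetic; the function name is hypothetical, and the enumeration is returned uninterpreted because its syntax is defined serial by serial.

```python
import re

# A mnemonic, optionally followed by a parenthesized enumeration.
_COMPONENT = re.compile(r"([A-Za-z]+)(?:\(([^()]*)\))?")

def split_component(usin):
    """Return (base USIN, mnemonic, enumeration); the last two may be None."""
    base, sep, component = usin.partition("|")
    if not sep:
        return base, None, None
    m = _COMPONENT.fullmatch(component)
    if m is None:
        raise ValueError(f"unrecognized component specification: {component!r}")
    return base, m.group(1), m.group(2)
```

For example, S.ACM/TOPLAS:16@1811|sec(4.1) splits into the article USIN, the mnemonic "sec" and the enumeration "4.1".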

One may question the need for fine-grained identification of article components. Indeed it is reasonable to consider deployment of an initial USIN system that focusses on article identification. Nevertheless, for a scheme that is designed to serve for article identification and related purposes in perpetuity, it would seem foolhardy not to allow the extension of the scheme using a notation such as the secondary component notation presented here.

The Reference Notation

The reference notation is a particular application of the secondary component notation that would allow designation of an article or other contribution by indirect reference. For example, S.ACM/TOPLAS:16@1811|ref(17) denotes reference 17 of the article starting on page 1811 of volume 16 of TOPLAS. As it happens, this reference is to an article entitled "A semantic database model," by Hammer and McLeod appearing in ACM Transactions on Database Systems, 6(3), pp. 351-386. Assuming that the appropriate citation database exists, the indirect reference in this case could map to the canonical form S.ACM/TODS:6@351.

One use of the reference notation is to guarantee that you can quickly generate an acceptable USIN for every reference in an article, providing that you can generate a USIN for the article itself. During creation of citation databases, it may be desirable to produce a full set of USINs for the reference lists of articles in a fairly expeditious fashion. If the resolution of some references to their direct USIN form is proving problematic, they may be left in indirect form during initial data entry. At a later time, the resolutions of indirect references may be entered either manually or by acquisition of an independently developed citation set for the same article.
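This data-entry workflow can be sketched with an ordinary lookup table. In the example, the citation database is assumed to be a simple mapping from indirect to direct USINs, and the function names are hypothetical.

```python
def resolve(usin, citation_db):
    """Return the direct USIN if a resolution is recorded, else the indirect form."""
    return citation_db.get(usin, usin)

def reference_list(article_usin, count, citation_db):
    """Generate one USIN per numbered reference of an article."""
    return [
        resolve(f"{article_usin}|ref({n})", citation_db)
        for n in range(1, count + 1)
    ]

# Only reference 17 has been resolved so far.
citation_db = {"S.ACM/TOPLAS:16@1811|ref(17)": "S.ACM/TODS:6@351"}
usins = reference_list("S.ACM/TOPLAS:16@1811", 17, citation_db)
```

Every reference receives an acceptable USIN immediately; unresolved entries simply remain in indirect form until a resolution is added to the table.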

Another use of the reference notation is to serve as a unique canonical form for personal communications, unpublished works and other otherwise undenotable items. In this way, there would be no need to create a classification or coding scheme for such references. Furthermore, each such item would be automatically given a permanent and unique code. For example, if two authors each write articles citing "Famous Person, personal communication", those citations would be given distinct canonical identifiers. This would prevent false positives when doing coreference searches (finding papers that have 2 or more references in common).
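A coreference search over canonical identifiers reduces to set intersection. The sketch below uses invented article USINs (S.EX/JA:1@1 and S.EX/JB:2@10 are purely hypothetical) whose personal communications are left in indirect form; the function name is likewise an assumption.

```python
from itertools import combinations

def coreferences(reference_sets, minimum=2):
    """Return (article, article, shared references) for pairs sharing enough references."""
    pairs = []
    for a, b in combinations(sorted(reference_sets), 2):
        shared = reference_sets[a] & reference_sets[b]
        if len(shared) >= minimum:
            pairs.append((a, b, shared))
    return pairs

# Both hypothetical articles cite one real article and one personal communication.
refs = {
    "S.EX/JA:1@1": {"S.ACM/TODS:6@351", "S.EX/JA:1@1|ref(2)"},
    "S.EX/JB:2@10": {"S.ACM/TODS:6@351", "S.EX/JB:2@10|ref(7)"},
}
```

Because the two personal communications keep distinct indirect identifiers, the articles share only one reference and are not reported at the default threshold of two.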

The reference notation is best supported by article styles with an explicitly numbered reference list at the back. If a reference list exists, but is not numbered, reference numbers may be determined by counting. Alternatively, if references are cited by symbolic tags, as in this paper, a possible design choice is to use the symbolic code itself in the reference notation. For example, the citation of the SICI standard referenced in an earlier version of this paper might be given the indirect reference RDNS."sfu.ca".CMPT/TR:97-16|ref(SICI). Another style may use numbered endnotes, with the possibility of more than one reference per note. In this case, enumeration with endnote number may use lower case letters; |ref(3c) would denote the third item cited in endnote 3 of a particular article. In general, each serial may define its own reference numbering conventions, but it is highly desirable that one of the standard forms be chosen.

Hyphenation Notation

In some cases it may be desirable to break a long USIN over multiple lines. This can be accommodated by the following hyphenation convention. A line break may be inserted after any hyphen appearing in a USIN, without changing its meaning. Furthermore, any nonhyphenated USIN operator can be converted into a hyphenated equivalent of that operator by adding a hyphen to the end. Thus, the hyphenated equivalents of "." and "/" and "--" are respectively ".-" and "/-" and "--" (no change). The following examples illustrate this convention in use.

RDNS."sfu.ca".CMPT/-TR:97-16|ref(SICI)

S.ACM/TOPLAS:2--15(1),-
15(3),15(5)--17,20--ff

S.ACM/TOPLAS:2--15(1),15(3),15(5)--17,20--ff

RDNS."sfu.ca".CMPT/-TR:97-16|ref(SICI)

The last example illustrates that a newline character is not strictly required after a hyphenated operator. This accommodates reformatting operations that might eliminate an inserted newline character but leave a vestigial hyphen in place. Conversion to canonical form eliminates any hyphenated operators and embedded newlines. USIN processing software should fully recognize the hyphenation convention in the event that a multi-line USIN is entered using a cut-and-paste operation.
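Conversion of a hyphenated USIN to canonical form might be sketched as follows. The set of single-character operators (".", "/", ",", ":", "@" and "|") is an assumption made for the example, as is the function name; a real implementation would take the operator set from the USIN syntax definition.

```python
import re

def canonicalize(usin):
    """Remove embedded newlines and the hyphen added to hyphenated operators."""
    # Step 1: drop line breaks together with any whitespace around them.
    joined = re.sub(r"\s*\n\s*", "", usin)
    # Step 2: an operator followed by exactly one hyphen is a hyphenated
    # operator. The lookahead leaves the "--" range operator alone, and a
    # hyphen that follows a digit or letter (as in 97-16) is never touched.
    return re.sub(r"([./,:@|])-(?!-)", r"\1", joined)

print(canonicalize("S.ACM/TOPLAS:2--15(1),-\n15(3),15(5)--17,20--ff"))
```

The same function handles the case of a vestigial hyphen with no newline, so RDNS."sfu.ca".CMPT/-TR:97-16|ref(SICI) becomes RDNS."sfu.ca".CMPT/TR:97-16|ref(SICI).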

V. USIN Support Technology

This section considers two important models of support technology for a USIN scheme: a USIN Global Registry and a USIN Global Database System. The USIN Global Registry is proposed as a system of institutions and technologies designed to preserve the knowledge of assigned USINs and their denotations for posterity and to support publishers and librarians in the assignment of new USINs for new and/or unassigned works. As differentiated from the Registry, a USIN Global Database System is not intended for USIN updating, but is instead intended to support the day-to-day needs of scholars for access to USIN information. This distinction is conceptually valuable in organizing requirements for the separate purposes of USIN registration and USIN-based information retrieval. It might ultimately be the case that the registry and database components are implemented in a single system, however.

In discussing these technologies, the goal is to present a vision of how USINs may be generated, verified and used in the day-to-day work of publishers, librarians and scholars. At this point in the development of the USIN concept, the focus should be more on the analysis of overall system requirements than on the implementation details of underlying mechanisms. Nevertheless, a number of design ideas are included to help give a more concrete picture of the possible operation of an integrated global USIN system.

USIN Global Registry

Consider a design for the USIN Global Registry based on four principal components. These are:

SDL: Serials Definition Language:
a language for specifying serial publications and their publication schemes.

UPP: USIN Publication Protocol:
a protocol for assigning USINs as part of the publication process and verifying that they meet global uniqueness and permanence of identification requirements.

SRP: Serial Registration Protocol:
a protocol for registering and revising serial codes and their SDL definitions.

PDP: Publication Domain Protocol:
a protocol for creating, modifying and deactivating publication domains.

These are the technologies that publishers and librarians could use on a daily basis in the assignment of USINs to serially published items.

SDL - Serials Definition Language

Fundamental to the USIN concept is the use of serial designations and numbering schemes for identification of articles and other serial components. In order to formally specify these schemes, consider the creation of a Serials Definition Language (SDL). Each SDL specification would define one serial, establishing its basic identity and publication scheme. In particular, this would include formal specification of the hierarchical numbering scheme of the serial including its abstract structure, scope-dependencies, chronology, and syntactic identification schemes for articles and other serial components. It would also include the specification of the canonical and allowable alternative forms for USIN designations.



In addition to its formal role in the USIN scheme, SDL should also be designed to serve a variety of related purposes. From a serials check-in and claiming perspective, the enumeration and chronology specifications of an SDL definition should also have predictive value as contemplated, for example, by the serial pattern scheme of McNellis [16]. The SDL definition of a serial should also provide a basis for evaluating and interpreting USIN holdings specifications and possibly converting them to MARC Holdings Format. Similarly, from a bibliographic database perspective, it should be possible to verify the enumeration and chronology recorded in a database entry against that specified in an SDL definition. It should also be possible to determine the comprehensiveness of database coverage: are there any issues or articles published that are not in the database, or is the database complete?

The requirements above relate to a fairly narrow definition of serials, namely, in terms of the logical schemes for enumeration, chronology and serial item identification. It is possible to define a language (say, SECIL) that would be limited to these requirements. Such a narrow approach would serve to support a USIN system, but it seems reasonable to consider serial definition from a broader perspective while the opportunity exists. In particular, the definition of a serial logically includes not only its numbering scheme, but also the title, publisher and publication format. Incorporation of such elements into the language would seem necessary to merit the term "serials definition language." Beyond this, one might wish to include additional information, notably classification and indexing information. This reflects a cataloguing perspective and suggests that a nomenclature of SCL (serials cataloguing language) might be appropriate. However, from the viewpoint of designing good modular systems, the SDL approach is arguably preferable, because it focusses on information deriving directly from the serial's publication and relevant to the essence of what the serial is. Cataloguing information is essentially third-party information that may derive from a variety of sources and should be kept separate; it is information about the serial, not information defining it. Detailed exploration of these issues is an area for further work.

UPP: USIN Publication Protocol

When USIN-based bibliographic databases are in widespread use, publishers will find that the sooner an article is assigned a USIN, the sooner it is advertised to large communities of scholars. The USIN Publication Protocol (UPP) is therefore proposed to allow publishers to assign each article a USIN during the publication process, thereby updating the USIN databases automatically.

A major requirement for UPP is to ensure the integrity of assigned USINs from the standpoint of global uniqueness and consistency with the current SDL definitions of serials in question. One approach to this is to maintain within the USIN Global Registry a current publication state for each serial and to define acceptable UPP actions in terms of this state. In essence, the publication state identifies the last issued USIN for the serial, plus a specification of which numbering levels in the hierarchical numbering scheme are currently open. This gives a basis for predicting the counter and date values for upcoming UPP requests.

For example, consider the publication state that might exist after registering the article "Collecting Interpretations of Expressions" by Paul Hudak and Jonathon Young appearing in ACM TOPLAS, Volume 13, Number 2, April 1991, pages 269-290 with the USIN S.ACM/TOPLAS:13@269. The state may include volume and issue counters that are currently open with values 13 and 2, respectively. A page counter may be closed at page 290 (nothing more will appear on page 290). At this point, there may be two legal UPP actions: add another article in this issue or close it. As it happens, there is one more article in the issue. Based on the current publication state, an expectation may be generated that the next article will have USIN S.ACM/TOPLAS:13@291. If the publisher indeed submits that USIN with the next UPP request, it can be accepted, otherwise an error can be reported.

After a "close issue" request has been made, the SDL definition and publication state can be used to predict the next publication action and expected date. In the example, this is an "open new issue" request for issue 3 of volume 13, July 1991. These may be verified when the actual request is made. When issue 4 of this volume is closed, the SDL definition should tell us that there are no more expected issues in this volume. The expected sequence of following UPP requests is then a "close volume" request, followed by an "open volume" request for volume 14, 1992, an "open issue" request for issue 1 in January 1992 and an article publication request with USIN S.ACM/TOPLAS:14@1. Each of these expectations may be in turn verified against the actual UPP requests made.
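The sequence of expectations in this example can be modelled as a small state machine. The sketch below is an assumption-laden illustration rather than a protocol definition: the class and method names are hypothetical, dates are omitted, four issues per volume are assumed, and pagination is taken to have volume scope.

```python
from dataclasses import dataclass

@dataclass
class PublicationState:
    serial: str
    volume: int
    issue: int
    next_page: int               # first page not yet closed
    issues_per_volume: int = 4   # would come from the SDL definition
    issue_open: bool = True
    volume_open: bool = True

    def expected_usin(self):
        """The USIN predicted for the next article."""
        return f"{self.serial}:{self.volume}@{self.next_page}"

    def publish_article(self, usin, last_page):
        if not self.issue_open:
            raise ValueError("no issue is open")
        if usin != self.expected_usin():
            raise ValueError(f"expected {self.expected_usin()}, got {usin}")
        self.next_page = last_page + 1

    def close_issue(self):
        if not self.issue_open:
            raise ValueError("no issue is open")
        self.issue_open = False

    def open_issue(self, issue):
        if self.issue_open or not self.volume_open:
            raise ValueError("cannot open an issue now")
        if issue != self.issue + 1 or issue > self.issues_per_volume:
            raise ValueError(f"unexpected issue number {issue}")
        self.issue, self.issue_open = issue, True

    def close_volume(self):
        if self.issue_open or self.issue != self.issues_per_volume:
            raise ValueError("volume still has open or expected issues")
        self.volume_open = False

    def open_volume(self, volume):
        if self.volume_open or volume != self.volume + 1:
            raise ValueError(f"unexpected volume number {volume}")
        # Volume-scoped counters restart with the new volume.
        self.volume, self.issue, self.next_page = volume, 0, 1
        self.volume_open = True
```

Starting from PublicationState("S.ACM/TOPLAS", volume=13, issue=2, next_page=291), the predicted USIN is S.ACM/TOPLAS:13@291; after issues 3 and 4 have been opened and closed, followed by "close volume", "open volume" for 14 and "open issue" for 1, the prediction becomes S.ACM/TOPLAS:14@1.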

Of course, mechanisms will be required to deal with various kinds of exceptions to the predicted publication pattern. For example, when a particular issue is expected, one may instead see a combined issue (with combined enumeration). Alternatively, an issue may be skipped altogether, or a special issue may be inserted into the publication stream between two regular issues. Publication numbering may also be out of order with respect to date of publication. For example, in a technical report series, it is not uncommon for numbers to be assigned in advance of publication, with variable delays between the assignment of a number and actual publication. An apparent publication exception may also be the first indication of an actual change in publication pattern. In this case, the SDL definition should be corrected to reflect the updated publication pattern and reregistered with SRP, described below.

SRP: Serial Registration Protocol

Serial Registration Protocol is the proposed service for registering a serial code and its accompanying SDL definition and tracking changes thereto over time. This includes registering changes in publication numbering or chronology, changes in publisher or publication domain, addition of alternative USIN codings, changes to the canonical USIN form and/or deactivations and reactivations. In general, SRP requests would be made with respect to a particular publication-domain/serial-code combination.

Perhaps the most critical function under SRP is the creation of a new serial code within an existing publication domain. The code may be the initial code for a new or previously unregistered serial publication or it may be an alternative code for an existing publication. In either event, creation of a serial code should always be considered with care, because it creates, in the context of the given publication domain, a permanent USIN binding between that code and the serial in question. From this perspective, it is worth considering appropriate verification actions for creation of a new serial code. Of course, verification that the code is previously unassigned is an automatic function that should be implemented by the appropriate query to the USIN Global Registry. Beyond this, there should also be some manual verification to ensure that the code assignment is reasonably consistent with the USIN concept. One option is to use national serial registration centres analogous to those of the current international ISSN network. However, such a system is likely to be too cumbersome for the management of publications at the fine-grained level of, say, minutes of committee meetings of particular university departments. It also does not account for an institutional role in approving the serial codes chosen by administrative divisions within the institution.

An alternative for verifying serial code assignments that overcomes these problems is the following. SRP requests for new serial code creation must be approved by a USIN-certified cataloguing librarian. Certifications are awarded by an appropriate international standards body. Each authority for a publication domain may designate a certified librarian for that domain. When an SRP request to create a new serial code is issued, it is handled by the librarian registered for that domain, if such a librarian exists. Otherwise, verification of the creation request is attempted in the immediately superior publication domain, and so on. For example, a university may designate a single USIN-certified librarian to handle all institutional requests for new serial codes. Regardless of how deeply structured the administrative hierarchy within the university is, all serial code creation requests within the university are passed up the domain hierarchy to be handled by this individual.
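The routing rule amounts to a walk up the domain hierarchy. In the sketch below, a publication domain is represented as a tuple of components from the top level down, which avoids parsing quoted domain names; the function name, the table of librarians and the sample names are all hypothetical.

```python
def find_approver(domain, librarians):
    """Return the certified librarian for `domain` or its nearest superior domain.

    Returns None if no librarian is registered anywhere on the path; how such
    a request should then be handled is left open here.
    """
    for depth in range(len(domain), 0, -1):
        approver = librarians.get(domain[:depth])
        if approver is not None:
            return approver
    return None

# A university that designates one librarian for all of its subdomains.
librarians = {("RDNS", "sfu.ca"): "university serials librarian"}
approver = find_approver(("RDNS", "sfu.ca", "CMPT"), librarians)
```

A request from the CMPT subdomain finds no librarian at its own level and is therefore passed up to the one registered for the university as a whole.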

The second major function of the SRP protocol is to register the publication pattern of a serial and changes to that pattern as required from time to time. As described above, these publication patterns are specified as part of the serial's SDL definition. UPP can be used to check the consistency of the publication patterns against future publication attempts. That is, each time a USIN is specified in a future UPP request, it serves to check that the SDL definition is correctly predicting the actual publication numbering and chronology.

Whenever the publication pattern of a serial is changed, the SDL definition must be modified to account for both future and past publications. The checking of future publications is done by UPP. SRP is responsible for checking that the revised SDL definition correctly accounts for the USINs assigned to past publications. This checking may be done by formally re-evaluating the revised definition against the entire history of actual publication as recorded in the global registry. The checking should satisfy two conditions: (1) every USIN previously registered should be accounted for by the new SDL definition, and (2) the new SDL definition should not "predict" any past publication that does not, in fact, exist. Exhaustive checking or a provably equivalent alternative method should be used. That is, a reduced form of checking that puts at risk the consistency of the USIN system should not be justified on the basis of minor concerns of computer processing efficiency.
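In its exhaustive form, the two-condition check is a pair of set differences. The sketch assumes that the registry history and the past publications predicted by the revised definition are both available as sets of USIN strings; the function name and the sample USINs for a fictitious serial S.EX/J are hypothetical.

```python
def check_revision(registered, predicted):
    """Compare registered USINs with those a revised definition predicts.

    Returns (unaccounted, phantom): violations of condition (1) and of
    condition (2). The revision is consistent only if both sets are empty.
    """
    unaccounted = registered - predicted   # registered but not predicted
    phantom = predicted - registered       # predicted but never published
    return unaccounted, phantom

registered = {"S.EX/J:1@1", "S.EX/J:1@17", "S.EX/J:2@1"}
predicted = {"S.EX/J:1@1", "S.EX/J:1@17", "S.EX/J:3@1"}
unaccounted, phantom = check_revision(registered, predicted)
```

Here the revised definition fails on both counts: it does not account for S.EX/J:2@1 and it predicts S.EX/J:3@1, which was never published.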

The third major function of SRP is to register canonical and alternative forms of USIN for a serial. When a serial is registered for the first time, the publication-domain/serial-code combination under which it is first registered is the canonical form of USIN. Subsequently, SRP may be used to create alternative USIN forms. When such an attempt is made, the SRP request must specify both the publication-domain/serial-code combination for the current canonical USIN and the new alternative publication-domain/serial-code combination. It may be reasonable to require that permission from the domain authority of both domains be obtained. Any number of alternative forms for a serial may be created in this way.

The SRP request to change the canonical form of a serial must specify the publication-domain/serial-code combination of both the current and proposed new canonical forms. The request is made by the authority for the new publication domain and must be verified by the authority for the currently canonical publication domain. If approved, the change will be scheduled to occur at the next scheduled global synchronization time for changes to USIN canonical forms, or at a later synchronization time specified in the change request. Once the change becomes effective, the canonical form is switched, but both forms remain acceptable.

SRP also can be used to deactivate or reactivate a serial. In essence, deactivation of a serial registers a new publication pattern in which no further publications are predicted. Reactivation requires a new SDL definition that may change the title and future publication pattern of a serial, but still requires consistency with the entire history of previously assigned USINs.

PDP: Publication Domain Protocol

Publication Domain Protocol is the final proposed service of the USIN Global Registry. This protocol is used to create and register new publication domains, transfer authority for domains, register the USIN-certified librarians for a domain and other related functions. In general, these actions will refer to subdomains of some existing publication domain; even top-level USIN domains such as ISSN and RDNS may be considered as subdomains of a global USIN publication domain.

Creation of a code for a new publication domain under PDP parallels the creation of a new serial code under SRP. In both cases, the proposed code must be checked to verify that it is previously unused in the context of the parent publication domain. Furthermore, the manual review of serial codes by a USIN-certified librarian should also occur for new publication domains. Ideally, this manual review should verify that the publication domain corresponds to an actual publishing institution, organization or administrative division thereof and is a scholar-friendly mnemonic designation of that unit consistent with historical practice wherever possible. Alternatively, the publication domain may represent a newly-formed collective or coalition expressly formed for the purpose of organizing the upper levels of the USIN domain structure.

A further parallel with SRP is to suggest that formal domain definitions be registered and revised as required from time to time. These definitions would specify the identity and organizational history of a publishing entity. From a domain definition, then, one should be able to determine the name of a particular publishing entity, its parent organization, its successors and predecessors and so on. However, domain definitions would not have the complexity of serial definitions under SDL, because there are no corresponding requirements in publication domains for enumeration, chronology and other aspects of serial definitions.

PDP should also support the registration of alternative USINs and changes in canonical USIN for the publishing entities denoted by publishing domains. The registration of alternative USINs under PDP could parallel SRP in a straightforward fashion. However, registration of a new canonical USIN for a publishing domain is complicated by the implications for serials and subdomains within that domain. Consider a proposed change from RDNS."acm.org" to S.ACM as the canonical USIN for the Association for Computing Machinery. Normally, this should imply corresponding changes for all subordinate serials and subdomains recursively. Thus, changes in canonical USIN from RDNS."acm.org"/CACM to S.ACM/CACM, from RDNS."acm.org".SIGPLAN to S.ACM.SIGPLAN and from RDNS."acm.org".SIGPLAN/Notices to S.ACM.SIGPLAN/Notices should all be expected in the example. However, it may be unwise to automatically make such changes without review in every instance. Thus, under PDP, a change in canonical form for a publishing domain should be carried out by first registering all the appropriate changes for subordinate serials and subdomains. This may be enforced under PDP by permitting a registration of a new canonical form for a publication domain only when alternative canonical forms for all active subdomains and serials therein have been registered.
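The precondition on a change of canonical form can be sketched as a prefix rewrite followed by a lookup. The example assumes that subordinate names extend the domain name with "." or "/", and that registered alternative forms are available as a mapping from each current USIN to a set of alternatives; the function names are hypothetical.

```python
def rewrite(usin, old, new):
    """Replace the domain prefix `old` by `new` in a subordinate USIN."""
    if usin != old and not usin.startswith((old + ".", old + "/")):
        raise ValueError(f"{usin!r} is not subordinate to {old!r}")
    return new + usin[len(old):]

def may_change_canonical(old, new, subordinates, alternatives):
    """Return (permitted, subordinates still lacking the rewritten form)."""
    lacking = [
        s for s in subordinates
        if rewrite(s, old, new) not in alternatives.get(s, set())
    ]
    return not lacking, lacking

old, new = 'RDNS."acm.org"', "S.ACM"
subordinates = [
    'RDNS."acm.org"/CACM',
    'RDNS."acm.org".SIGPLAN',
    'RDNS."acm.org".SIGPLAN/Notices',
]
# Only CACM has had its new form registered so far.
alternatives = {'RDNS."acm.org"/CACM': {"S.ACM/CACM"}}
permitted, lacking = may_change_canonical(old, new, subordinates, alternatives)
```

With only the CACM alternative registered, the change is refused and the two SIGPLAN entries are reported as still requiring review and registration.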

Finally, PDP should also provide for the deactivation and possible reactivation of domains. Deactivation of a publication domain implies that no further publication activity is contemplated within that domain or its subdomains. Hence deactivation of a domain should only be permitted when all subordinate serials and subdomains have themselves been deactivated. Reactivation of a publication domain may occasionally be contemplated. However, to ensure the permanence of identification of USINs issued in the subdomain prior to its earlier deactivation, a reactivation request should not be automatically granted. Instead, a "contract" may be first returned identifying previous use of the domain, assigned subdomains and serials and the requirement that new use will respect these. The proposed new domain authority should agree to these terms before the domain can be reactivated.



USIN Global Database System

Now consider how the day-to-day needs of scholars can be directly supported by a USIN Global Database System. Three basic needs can be identified: (a) the need to inquire about the article or other item denoted by a given USIN, (b) the need of authors to cite articles by USIN, and (c) the need to use USINs in literature research, both to denote search keys (citation indexing) and search results. USIN Inquiry Protocol is the first proposed technology to assist users in this regard; it provides for both the interactive inquiry about USINs and for hypertext citation of USINs in World-Wide Web documents. To support citation by USIN in other types of document formatting software, a Bibliographic Retrieval Protocol is proposed coupled with bibliographic formatting "plug-ins" for standard word processing packages. The final subsection discusses the role of the USIN Global Database and USINs generally in literature research.

UIP - USIN Inquiry Protocol

One of the primary motivations underlying the USIN concept is to address the "broken links" problem on the World-Wide Web: citation of works by Uniform Resource Locator (URL) is prone to failure when the cited item is moved or removed. To solve this problem, it has long been suggested that names of resources rather than their locations should be the basis of citation, but none of the proposals for Uniform Resource Names (URNs) has yet succeeded. A more successful approach may be to concentrate on an important subset of the general problem: links to serially-published documents. For this subset, consider the direct use of USINs as permanent, "unbreakable" links and the development of USIN Inquiry Protocol (UIP) to enable this use. For example, a hypertext reference to a sample TOPLAS article could be coded using the following HTML markup.

<A HREF="uip:S.ACM/TOPLAS:16@1811">A Behavioral Notion of Subtyping</A>

Note that a hyperlink formed in this way makes no reference to any particular computer system. Thus, the requirements of URNs are satisfied; the target of a link is designated by naming what it is instead of where it is located.

Apart from this use in Web-based documents, UIP also supports direct inquiries about a particular USIN. All that the scholar need do is to type uip:S.ACM/TOPLAS:16@1811 directly into the "location" field of his favorite Web browser (assuming that the browser has been updated to include the UIP client-side software).
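As a rough illustration of what UIP client-side software would first have to do, the sketch below splits a `uip:` reference of the shape used in the example into its naming and numbering parts. The function name, the returned field names, and the regular expression are assumptions; the sketch handles only the simple `domain/serial:numbering` shape and does not attempt quoted domain components or the other USIN forms.

```python
import re

# Matches only the simple shape  uip:<domain path>/<serial>:<numbering>
UIP_PATTERN = re.compile(
    r"^uip:(?P<domain>[^/]+)/(?P<serial>[^:]+):(?P<numbering>.+)$"
)


def parse_uip(reference: str) -> dict:
    """Split a uip: reference into domain path, serial code and numbering."""
    match = UIP_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"not a recognised uip: reference: {reference!r}")
    return {
        "domain": match["domain"].split("."),  # e.g. ["S", "ACM"]
        "serial": match["serial"],             # e.g. "TOPLAS"
        "numbering": match["numbering"],       # left uninterpreted here
    }


print(parse_uip("uip:S.ACM/TOPLAS:16@1811"))
```

The numbering part is deliberately returned as an opaque string, since interpreting it depends on the hierarchical numbering rules of the serial concerned.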

Ignoring for the moment how it works, the critical issue from a user perspective is what you get when you make a UIP/USIN inquiry, either directly or by activating a hyperlink. One answer is that you retrieve a metadata page, that is, an information page about a document, but not the document itself. In general, the direct retrieval of documents cannot be guaranteed because many of them may not be electronically available. On the other hand, if a document is available on-line, it may be available from a variety of different sources with a variety of different formats and/or pricing structures. The purpose of a metadata page, then, is to provide a full bibliographic description of the article or other item denoted by the target USIN, and a set of links for making further inquiries about the article and/or retrieving a copy of it.

In general, one may consider an ambitious design goal for metadata pages: to provide a comprehensive information resource with respect to the cited items. In addition to basic bibliographic information and links for acquiring copies of articles, a number of other items could be provided. Each article metadata page could include direct links to information about the serial and its publisher. Using the USIN notation it should also be easy to include links for retrieval of contents pages for sibling articles in the same journal issue or volume. Links for exploring other publications by the authors of the article might be included. In particular, links for locating subsequently published corrigenda would be worth highlighting. Information on review articles that discuss the document of interest may be included. In conjunction with a citation database, links for retrieving the sets of articles that are respectively cited by and cite this article could also be considered. Finally, it may be reasonable to consider including links to search services that can locate similar articles by full-text searching using a document surrogate (keywords and other metadata that describe the current document).

It may be the case that the coded USIN in a UIP hyperreference does not refer to a single article, but instead denotes some other serial component or is ambiguous or erroneous. In each of these cases, the page returned through UIP should also strive to provide comprehensive information to the user. For example, in the case of a USIN reference by page number where two or more articles start on the specified page, a menu showing each possible article could be returned together with their correct canonical USINs.

These ambitious goals for the metadata pages returned by UIP servers need not represent an obstacle to server development. The initial implementations of UIP servers may focus on basic capabilities, allowing additional functionality to be added over time. In addition, many of the capabilities could be implemented in a fairly modular fashion. For example, if a particular document delivery service supports web-based document ordering by USIN, then generating the appropriate document ordering link is a simple matter.

Returning to the issue of how UIP may be implemented, note that the syntax for UIP/USIN citations does not specify the actual server to be consulted in resolving the UIP request. Rather it is reasonable to expect that the server would be specified by an appropriate client-side mechanism, such as a UIPSERVER browser parameter or environment variable. Typically, users might choose to set their UIPSERVER to specify a server operated by a major local research library or library consortium. In this way, the metadata pages returned can be formatted to emphasize local holdings of cited documents, even when the citing document is remotely located.
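A client-side resolution step of this kind might look like the following sketch. Only the UIPSERVER variable comes from the proposal; the HTTP request form (`/uip?usin=...`) and the function name are hypothetical, since no wire format for UIP is specified here.

```python
import os
from urllib.parse import quote


def resolver_url(usin: str, environ=os.environ) -> str:
    """Build the URL of a metadata-page request for the user's chosen UIP server."""
    server = environ.get("UIPSERVER")
    if not server:
        raise LookupError("UIPSERVER is not set; no UIP server to consult")
    # Hypothetical request form: the USIN is passed as a percent-encoded query value.
    return f"http://{server}/uip?usin={quote(usin, safe='')}"


# Example with an explicitly supplied (made-up) server name:
print(resolver_url("S.ACM/TOPLAS:16@1811", {"UIPSERVER": "uip.library.example"}))
```

Because the server is chosen on the client side, the same citation resolves against whichever library service the reader has configured, and the citing document never needs to change.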



Bibliographic Retrieval and Formatting

A key goal of the USIN scheme is to support authors of scholarly works in the preparation of bibliographic references. This may be achieved by bibliographic processing "plug-ins" or "add-ons" to standard word processing software that will allow authors to cite works by merely entering USINs at the appropriate citation points. The bibliographic processing modules could then take care of all the remaining details for resolving and formatting the citations: retrieving the actual full bibliographic citations, assigning appropriate in-text reference numbers or labels, formatting the citations according to a chosen style guideline, sorting them according to a user- or style-specified ordering, and incorporating the citations into the document as a reference list at the back or sequentially in footnotes. As well as removing a considerable source of tedium in the preparation of scholarly works, the use of USINs in this way should also improve the accuracy and quality of citations by eliminating manual errors and inconsistencies. Finally, a serendipitous benefit of having the citations in a paper represented as USINs is that the citation set can then be made available as data; citation databases can thus be supported by citation data provision at the source [6].
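The processing steps listed above (retrieval, numbering, formatting, and assembly of a reference list) can be sketched in a few lines. Everything specific here is an assumption: the `[[...]]` citation marker, the record field names, the citation style, and the in-memory `RECORDS` table that stands in for retrieval from the USIN Global Database.

```python
import re

# Stand-in for retrieved bibliographic records, keyed by USIN (illustrative data).
RECORDS = {
    "S.ACM/CACM:30@933": {
        "authors": "James H. Coombs, Allen H. Renear, and Steven J. DeRose",
        "title": "Markup Systems and the Future of Scholarly Text Processing",
        "serial": "Communications of the ACM",
        "enumeration": "30(11)",
        "date": "Nov. 1987",
    },
    "S.ACM/CACM:37(2)@30": {
        "authors": "Frank Halasz and Mayer Schwartz",
        "title": "The Dexter Hypertext Reference Model",
        "serial": "Communications of the ACM",
        "enumeration": "37(2)",
        "date": "February 1994",
    },
}

CITE = re.compile(r"\[\[(.+?)\]\]")  # hypothetical in-text marker: [[USIN]]


def format_record(r: dict) -> str:
    """Format one record in a simple, made-up citation style."""
    return f'{r["authors"]}. "{r["title"]}", {r["serial"]} {r["enumeration"]}, {r["date"]}.'


def process_citations(text: str, records: dict):
    """Replace USIN markers by numbers and build the matching reference list."""
    numbers = {}  # USIN -> reference number, in order of first citation

    def replace(match):
        usin = match.group(1)
        if usin not in records:
            raise KeyError(f"no bibliographic record for {usin}")
        numbers.setdefault(usin, len(numbers) + 1)
        return f"[{numbers[usin]}]"

    body = CITE.sub(replace, text)
    references = [f"[{n}] {format_record(records[u])}" for u, n in numbers.items()]
    return body, references
```

This sketch numbers references in order of first citation; a style that sorts by author would only need a different ordering step before the numbers are assigned.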

A modular design for a USIN-based bibliographic processing system is to allow many different bibliographic formatting tools to retrieve data from the USIN Global Database using a common retrieval protocol (say BRP: Bibliographic Retrieval Protocol) and citation representation format (say BDF: Bibliographic Data Format). This would allow the development of competing bibliographic formatting tools that might cater to different user preferences and to different types of document processing system. BRP could be designed to work with locally-mounted copies of the USIN database for access to the bulk of historic bibliographic data, coupled with direct Internet access to the USIN Global Database for access to the latest references. BDF should provide a highly-structured logical format for citation data, in order to allow various transformations on that data to be easily implemented. Ideally, UPP (USIN Publication Protocol) and BDF should be designed together so that the bibliographic data in the correct format is gathered directly during the USIN registration process.

USINs, the USIN Global Database and Literature Research

In support of bibliographic inquiry, retrieval and formatting, the USIN Global Database is designed to provide a comprehensive solution when starting with a set of citations represented as USINs. But consider also the literature research task, that is, the need to find citations of potential interest using various search methods. In this case, the USINs are not known ahead of time, but may represent the results of the search process. In support of literature research, then, what role should USINs, in general, and the USIN Global Database, in particular, play?

One possible approach is to expand the requirements for the USIN Global Database to also provide comprehensive support for literature research activities. After all, the USIN Global Database is intended to be comprehensive in its coverage of the citable works and must provide the basic bibliographic data (author, title, serial name, serial enumeration, publication date) for each archived item. With the extension of the database to include abstracts, keywords and classification data for each item, it is possible to contemplate comprehensive support for literature research.

An alternative approach, however, is to support multiple alternative literature databases, each of which provides its own methods of augmenting the basic bibliographic data available from the USIN Global Database. USINs themselves could form the basis of interoperability between the databases, i.e., distinct results from different databases could be easily combined by USIN sorting and matching operations. Such an approach would support different classification schemes that might be appropriate in different subject areas, competition between different full-text searching techniques based on article abstracts and/or article full text, selective databases that target sources relevant to a particular topic or type of material, experimentation with filtering schemes that grade the level or nature of materials, alternative language databases that support searching in languages other than English, and so on.
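The idea of combining results by USIN sorting and matching can be made concrete with a small sketch. It assumes that each database returns its results as a mapping from canonical USIN to that database's own metadata, and that plain string ordering of USINs is acceptable; both the function names and the result shape are hypothetical.

```python
def combine_results(*result_sets: dict) -> dict:
    """Merge per-database results, matching entries on their USIN key."""
    merged = {}
    for results in result_sets:
        for usin, metadata in results.items():
            # The same USIN from two databases denotes the same item,
            # so their metadata can simply be pooled.
            merged.setdefault(usin, {}).update(metadata)
    return dict(sorted(merged.items(), key=lambda item: item[0]))


def common_usins(*result_sets: dict) -> list:
    """Return, in sorted order, the USINs found by every database."""
    if not result_sets:
        return []
    shared = set(result_sets[0]).intersection(*result_sets[1:])
    return sorted(shared)
```

No database needs to know anything about the others: agreement on the canonical USIN is the only interface between them.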

From the standpoint of good modular system design, one can also argue that the USIN Global Database should deal only with the basic bibliographic data that derives from the publication process. Classification, evaluation and review materials should be considered third-party metadata that may come from a variety of sources. Without any agreed-upon method for standardizing what types of metadata should be provided and who should provide it, it would be a poor choice to impose de facto standardization by incorporating a particular third-party metadata scheme into the USIN Global Database.

Nevertheless, it is reasonable to consider a limited extension of the USIN Global Database to support one additional form of metadata, namely citation metadata. A requirement of UPP could be that the USINs of cited references be supplied as part of the publication process. If, as suggested previously, scholars use USINs in writing their documents, it should not be difficult to provide them in the publication process. If this were done, it could support the development of a universal citation database that would in turn be a valuable tool for literature research and a potential catalyst for reform in scholarly communication [6].
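If cited USINs were supplied at publication time, a citation index could be derived mechanically. The sketch below assumes the registration data is available as a mapping from each citing item's USIN to the list of USINs it cites; that shape and the function name are assumptions for illustration.

```python
from collections import defaultdict


def build_cited_by(citations: dict) -> dict:
    """Invert 'item -> items it cites' into 'item -> items that cite it'."""
    cited_by = defaultdict(set)
    for citing, cited_list in citations.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    # Sorted lists give a stable, canonical ordering of the citing USINs.
    return {usin: sorted(citers) for usin, citers in cited_by.items()}
```

The forward mapping answers "what does this article cite?" and the inverted one answers "what cites this article?", which are the two link sets suggested earlier for metadata pages.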

VI. Conclusion

The USIN scheme is a proposed system for the global and persistent identification of the publications in organized serial collections. Ultimately some global identification scheme is likely to be developed for interoperation of various article citation applications. Scholars should seize the opportunity that now exists to ensure that the scheme that succeeds is the one that is designed primarily to meet the long-term needs of people (authors and readers), not the short-term needs of particular present-day computer systems belonging to vendors, libraries or document delivery services.

This paper has presented a vision for a scholar-friendly universal identification system for serially published works. It has also presented a number of concrete design proposals for USIN syntax and technological components that can support a global USIN system. In particular, a uniform naming model has been presented based on hierarchical naming of serial publications and hierarchical numbering of serial items. Two important systems in support of the USIN concept have been proposed, specifically, a USIN Global Registry and a USIN Global Database. Designs for each of these systems have been presented at a level that illustrates how specific architectural features can interact to meet the requirements of publishers, librarians and scholars.

There is a great deal more work required to fully realize the USIN concept. The author would be most appreciative of your help.

Acknowledgements

Andrew Walenstein has helped greatly by providing valuable feedback on several drafts of this paper. Jim Cole, while still questioning some issues from a serials cataloguing perspective, has been a source of considerable encouragement. I am also grateful to the anonymous referees for many constructive criticisms and helpful suggestions.

References

[1] American Chemical Society, American Institute of Physics, American Mathematical Society, American Physical Society, Elsevier Science, IEEE, "Publisher Item Identifier as a means of document identification", updated October 9, 1997. Archived publication unknown. Available at http://www.elsevier.nl/inca/homepage/about/pii/.

With no other formal denotation known for this work, it might only be denotable by reference to this paper. Possible eventual USIN: S.BCS/JoDI:1(3)$1|ref(1). This assumes that BCS becomes assigned to the British Computer Society in the international domain of scholarly societies, and that JoDI is reserved by BCS to denote the Journal of Digital Information.

[2] American National Standards Committee on Library and Information Sciences and Related Publishing Practices, Z39, Subcommittee E: Serials Holding Statements. American National Standard for Information Sciences - Serial Holdings Statements. ANSI Z39.44-1986. Approved August 14, 1985. American National Standards Institute, New York, 1986.

Suggested initial USIN: ISSN.8756-0860/Z39.44-1986. Possible eventual form US.ANSI/ANS:Z39.44-1986.

[3] T. Berners-Lee. "Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web", RFC 1630, RFC Editor, Internet Society, June 1994. Available at URL: http://ds.internic.net/rfc/rfc1630.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:1630. Possible eventual form I.ISOC/RFC:1630.

[4] T. Berners-Lee, L. Masinter, M. McCahill (Eds.), "Uniform Resource Locators", RFC 1738, RFC Editor, Internet Society, December 1994. Available at URL: http://ds.internic.net/rfc/rfc1738.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:1738. Possible eventual form I.ISOC/RFC:1738, where ISOC might uniquely denote the Internet Society in a domain I of International organizations. Here, RFCs are identified in the domain for the Internet Society, the principal sponsor of the series. Technically, the "RFC Editor", chartered by the Internet Society, is said to be the publisher. However, it seems clear enough that RFC will remain an unambiguous code for this series in the context of Internet Society sponsored publications.

[5] Robert D. Cameron. "To Link or To Copy?-Four Principles for Materials Acquisition in Internet Electronic Libraries", Technical Report TR 94-08, School of Computing Science, Simon Fraser University, December 1994. Available at http://elib.cs.sfu.ca/project/papers/e-lib-links.html.

Suggested initial USIN: RDNS."sfu.ca".CMPT/TR:94-08. Possible eventual form CA.SFU.CMPT/TR:94-08.

[6] Robert D. Cameron. "A Universal Citation Database as a Catalyst for Reform in Scholarly Communication", First Monday 2(4), April 1997. Available at URL: http://www.firstmonday.dk/issues/issue2_4/cameron/index.html

Suggested initial USIN: RDNS/"firstmonday.dk":2(4)$4. Here, the article number ($4) is determined by counting. Eventually, the form P.Munksgaard/FirstMonday:2(4)$4 may be used, where Munksgaard is the code for Munksgaard International Publishers in an international publishers domain. Another possibility is J.FirstMonday:2(4)$4 based on the concept of a global journal domain J operated by a publisher consortium.

[7] James H. Coombs, Allen H. Renear, and Steven J. DeRose. "Markup Systems and the Future of Scholarly Text Processing." Communications of the ACM, 30(11), Nov. 1987, pages 933-947. Available at URL: http://www.sil.org/sgml/coombs.html.

Suggested initial USINs: ISSN/0001-0782:30@933, RDNS."acm.org"/CACM:30@933. Possible eventual form S.ACM/CACM:30@933. An interesting point to note is that issue numbers are not required for CACM prior to volume 33.

[8] R. Daniel. "A Trivial Convention for using HTTP in URN Resolution", RFC 2169, RFC Editor, Internet Society, June 1997. Available at URL: http://ds.internic.net/rfc/rfc2169.txt.


[9] R. Daniel and M. Mealling. "Resolution of Uniform Resource Identifiers using the Domain Name System", RFC 2168, RFC Editor, Internet Society, June 1997. Available at URL: http://ds.internic.net/rfc/rfc2168.txt.


[10] Nachum Dershowitz and Edward M. Reingold. Calendrical Calculations, Cambridge University Press, Cambridge, UK, 1997.

Suggested USINs: ISBN/0-521-56413-1 and ISBN/0-521-56474-3. These codes use ISBNs for the hardback and paperback versions, respectively. Choosing the code for the hardback version as canonical may be appropriate.

[11] DOI Foundation, "A Guide to Using Digital Object Identifiers", October 10, 1997. Archived publication unknown. Available at http://www.doi.org/guidebook/guidebook.html.

Possible eventual USIN: S.BCS/JoDI:1(3)$1|ref(11).

[12] Douglas C. Engelbart, "Authorship Provisions in AUGMENT", Digest of Papers - Compcon Spring 84 - Twenty-Eighth IEEE Computer Society International Conference, San Francisco, February 27-March 1, 1984, pp. 465-472.

Initial USINs: ISBN/0-8186-0525-1@465 (paper), ISBN/0-8186-4525-3@465 (microfiche), ISBN/0-8186-8525-5@465 (casebound). Possible eventual form I.IEEE/Compcon:28@465.

[13] Roy T. Fielding. "Maintaining Distributed Hypertext Infostructures: Welcome to MOMspider's Web", Computer Networks and ISDN Systems 27(2), November 1994, Special Issue Selected Papers of the First World-Wide Web Conference, pp. 193-204. On-line paper and software distribution available at http://www.ics.uci.edu/WebSoft/MOMspider/.

Suggested initial USIN: ISSN/0169-7552:27@193. Possible eventual form P.Elsevier/COMNET:27@193. Here, the code COMNET is used by Elsevier for this journal.

[14] Brian Green and Mark Bide. "Unique Identifiers: A Brief Introduction", Book Industry Communication, London, 1997. Archived publication unknown. Available at URL http://www.bic.org.uk/bic/uniquid.

Possible eventual USIN: S.BCS/JoDI:1(3)$1|ref(14).

[15] Frank Halasz and Mayer Schwartz. "The Dexter Hypertext Reference Model", Communications of the ACM 37(2), February 1994, pp. 30-39. Available at URL: http://ds.internic.net/rfc/rfc2141.txt.

Suggested initial USIN: ISSN/0001-0782:37(2)@30. Possible eventual form S.ACM/CACM:37(2)@30.

[16] Claudia Houk McNellis. "A Serial Pattern Scheme for a Value-Based Predictive Check-in System", Serials Review, Vol 22, No. 4, Winter 1996, pages 1-11.

Suggested initial USINs: ISSN/0098-7913:22(4)@1, RDNS."jaipress.com"/SR:22(4)@1. The code SR is speculative. Possible eventual form P.JAI/SR:22(4)@1.

[17] R. Moats. "URN Syntax", RFC 2141, RFC Editor, Internet Society, May 1997. Available at URL: http://ds.internic.net/rfc/rfc2141.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:2141. Possible eventual form I.ISOC/RFC:2141.

[18] P. Mockapetris, "Domain Names: Concepts and Facilities", RFC 1034, RFC Editor, Internet Society, November 1987. Available at URL: http://ds.internic.net/rfc/rfc1034.txt.


I.ISOC/RFC:1034.

[19] National Information Standards Organization. Serial Item and Contribution Identifier (SICI): An American National Standard Developed by the National Information Standards Organization: Approved August 14, 1996 by the American National Standards Institute. National Information Standards series ANSI/NISO Z39.56-1996 (Version 2). NISO Press, Bethesda, Maryland, 1997. Available at URL: http://sunsite.Berkeley.EDU/SICI/.

This is an interesting case which is published in the National Information Standards series (ISSN 1041-5653) of NISO. It has also been given an ISBN. But the code Z39.56-1996 represents its numbering as an American National Standard. Suggested initial USIN: ISSN.1041-5653/Z39.56-1996. Possible eventual form US.ANSI/ANS:Z39.56-1996.

[20] Theodor Holm Nelson. Literary Machines, Edition 87.1, 1987.

Initial USIN: ISBN/0-89347-055-4.

[21] Norman Paskin. "Information Identifiers", Learned Publishing, Vol 10, No. 2, April 1997, pages 135-156. Available at URL http://www.elsevier.com/inca/homepage/about/infoident/Menu.shtml.

Suggested initial USIN: ISSN/0953-1513:10@135. Learned Publishing is published by the Association of Learned and Professional Society Publishers. On the path towards mnemonic identification, the USIN form RDNS."alpsp.org.uk"/LP:10@135 may temporarily be used before an international domain structure is in place. Eventually, the canonical form may become S.ALPSP/LP:10@135 based on a domain S of scholarly societies.

[22] Fritz Schwarz and Cindy Hepfer. "Changes to the Serial Item and Contribution Identifier and the Effects of Those on Publishers and Libraries", The Serials Librarian 28(3/4), 1996, pp. 367-70.

Suggested initial USINs: ISSN/0361-526X:28@367 and RDNS."haworth.com"/SL:28@367. Possible eventual form P.Haworth/SL:28@367.

[23] K. Sollins and L. Masinter. "Functional Requirements for Uniform Resource Names", RFC 1737, RFC Editor, Internet Society, December 1994. Available at URL: http://ds.internic.net/rfc/rfc1737.txt.


I.ISOC/RFC:1737.

[24] Jennifer Wheary and Bernard F. Schutz, "Living Reviews in Relativity: Making an Electronic Journal Live", The Journal of Electronic Publishing. Available at URL: http://www.press.umich.edu:80/jep/03-01/LR.html.

Suggested initial USIN: ISSN/1080-2711:3(1)$5. Possible eventual form EDU.UMICH.PRESS/JEP:3(1)$5.
