ASIST 2011, October 9-13, 2011, New Orleans, LA, USA.

Together Under One Roof: Combining Collection and Item Level Description through Multiple Metadata Schemas

Carolyn Sheffield

National Museum of Natural History Smithsonian Institution Washington, DC 20056

[email protected]

Sonoe Nakasone

Smithsonian Institution Archives Washington, DC 20024

[email protected]

ABSTRACT

The Smithsonian’s Field Book Project presents a “fusion” of metadata standards to meet the access needs of a diverse user base and to set the framework for establishing best practices for managing field book collections.

The key access challenges around field books stem from a lack of best practices when it comes to collection management and description. Field books are unique materials that sometimes fall under the auspices of departmental libraries or laboratories (item level description), sometimes archives (collection level description) and just as often can be found intermingled and uncataloged in museum collections and curators’ offices (little to no description). These varying forms of custodianship result in collection management and descriptive practices that are not consistent across or even within institutions. The Field Book Project draws on existing standards and community input to develop a structured online resource for contributing and locating field book content.

This poster provides examples of user needs related to field books; illustrates the use of different metadata schemas within the system and how they have been linked together to bridge collection and item level descriptions; and invites discussion on the potential impacts in terms of establishing best practices, improving access, and leveraging the technological capabilities of XML to expand content and features in the future.

Keywords

Metadata, Standards for metadata, XML

INTRODUCTION

The Field Book Project is a joint initiative between the Smithsonian Institution Archives and the National Museum of Natural History. Our overall mission is to create one online location for scholars and others to visit when searching for field books and other field research materials. The scope of the Field Book Project focuses on content related to biodiversity research, specifically botany, entomology, and vertebrate and invertebrate zoology.

The project will begin as a Smithsonian-wide initiative and lay the foundation for an online Field Book Registry composed of content contributed by museums and research institutions from throughout the country. The Field Book Project is funded by the Council on Library and Information Resources (CLIR).

Why are field books so important?

Field books are THE original source materials documenting field activities. They are the source for specimen labels and catalog entries. They include not only identifications for specimens collected but also, in many cases, journaling, maps, photos, and other rich contextual information.

The supplemental information found in field books can provide clues about interspecies relationships and can be used to guide habitat reconstruction and responsible land management. In addition, a range of personal observations and insights into the environmental and cultural conditions of a given place at the time of collection can be gleaned from prose journal entries found in some researchers’ notes.

BACKGROUND

The Field Book Project grew out of the need for better resources, both for those managing field book collections and those trying to access them. Despite their incredible research and intrinsic value, field books frequently land in the “hidden” category, meaning that they are collections for which little to no documentation exists or for which the documentation itself is difficult to access.

Different forms of custodianship--library, archive, and museum--have led to variations in descriptive and collection management practices that complicate the discovery process for those seeking field books. In archives, field books are typically grouped together by creator or expedition and described in collection-level finding aids. Discipline-specific libraries produce item-level inventory lists, somewhat similar to what might be found in a typical library catalog record, albeit more minimal. There are often fewer access points and a lack of controlled terminology for person or place names. It is also important to note that, as field books are unique and non-circulating materials, these descriptions are not necessarily made available through the libraries' public-facing catalogs. Additionally, field books are frequently passed down from mentors to mentees, leaving researchers with the gargantuan task of identifying the keepers of institutional memory to locate those materials stashed in curators’ offices or on laboratory shelves.

The search becomes more complicated still because related field books can be distributed among multiple institutions. A given collector’s work may span affiliations with multiple institutions, and a given expedition may include collectors from multiple disciplines. Research conducted by Rusty Russell, co-PI on the Field Book Project, located field books from the U.S. Exploring Expedition in no fewer than 15 libraries, archives, and museums.

Related Work

There are several examples of libraries, archives and museums making field book holdings available online. Three projects and their metadata approaches are described in this section. The following section will present the Field Book Project approach and how it relates to these.

The University of Florida Digital Collections (UFDC) (http://ufdc.ufl.edu/UF00073894/00001/1j?toc=y) provides item-level access to digitally imaged surrogates of Walter Judd’s field books. Transcribed species names have been linked back to the page on which they occur, providing a wonderful level of granularity for navigating through the digital objects. The UFDC system boasts “rich metadata support, with automatic transformations for maximum interoperability.” UFDC records follow the METS/MODS metadata standards and are available for export as METS/MODS, MARCXML, and qualified Dublin Core.

The Jepson Herbarium at the University of California, Berkeley, digitally imaged Willis Linn Jepson’s field books (http://ucjeps.berkeley.edu/images/fieldbooks/jepson_fieldbooks.html). The website provides an item level index of the field books with brief descriptions that include the volume number, date range, and collection numbers included in the volume. Additional information is available for some volumes and may include a description of the collection event, notable information on the volume’s content, and status of transcription. For example: “Volume 28: Sep 7 1913 to Dec 1914: collection numbers 5636 to 5729 International Phytogeographers Excursion to Yosemite. Extensive notes on people, places, errors, etc. [completely transcribed]”. The herbarium is also collecting transcriptions, making it possible to link the content to databased specimens described in the field books. Similarly, the California Academy of Sciences is leading Connecting Content (http://research.calacademy.org/library/fieldnotes), a project with which the Field Book Project partners to identify connections between specimens, field book content, and references in published literature.

The Yale Peabody Museum of Natural History recently received funding from CLIR to catalog field books (http://fromdnatodinosaurs.blogspot.com/2011/05/peabody-receives-hidden-collections.html). Their project represents a larger, institution-wide scope to establish intellectual control over field books. Their approach will include a combination of the Dublin Core (DC) and Darwin Core (DwC) metadata schemas.

APPROACH

The Field Book Project solution is to create a “fusion” of metadata schemas that will bridge the contextual information provided by a collection level description (as in finding aids) with the more granular descriptions that are possible at the item level (as found in inventory lists).

Combining levels of description improves an end user’s ability to make effective relevance assessments. Collection level descriptions supply the contextual information surrounding the creation of the objects: the creator’s career, the purpose of creation, and sometimes the temporal and cultural context in which the collection was created. Adding item level descriptions supplies a deeper granularity on the geographic and temporal conditions of collecting events and can provide additional insight into how items within a collection relate to and differ from one another.

The rich descriptive framework presented here will be implemented in three phases with a final product delivered in an XML environment. XML provides a flexible structure that supports ease of migration and helps ensure that as standards and user expectations evolve, the data in the Field Book Registry will be poised to adapt and remain accessible over time.

Metadata

We draw from three existing metadata standards, each available as an XML schema:

Natural Collections Description (NCD): for collection-level description of all types of natural history collections

Metadata Object Description Schema (MODS): based on MARC 21, for item-level descriptions (we will also be implementing the Metadata Encoding and Transmission Standard (METS) to support page level navigation once digitization begins)

Encoded Archival Context (EAC): supports consistent and controlled entry of names for entities involved in the creation and maintenance of the collections and items
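One way to picture how the three kinds of record fit together is as three linked record types. The sketch below is illustrative only: the class names, field names, identifiers, and sample values are hypothetical and are not taken from the NCD, MODS, or EAC schemas or from the Registry itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EntityRecord:
    """EAC-style record: one controlled name per person or organization."""
    entity_id: str
    authorized_name: str


@dataclass
class CollectionRecord:
    """NCD-style record: context shared by every item in the collection."""
    collection_id: str
    title: str
    description: str
    creator_ids: List[str] = field(default_factory=list)


@dataclass
class ItemRecord:
    """MODS-style record: granular detail for a single field book."""
    item_id: str
    title: str
    date_range: str
    place: str
    collection_id: str  # link up to the collection-level description


class Registry:
    """Hypothetical in-memory store that resolves links between record types."""

    def __init__(self) -> None:
        self.entities: Dict[str, EntityRecord] = {}
        self.collections: Dict[str, CollectionRecord] = {}
        self.items: Dict[str, ItemRecord] = {}

    def items_in(self, collection_id: str) -> List[ItemRecord]:
        # Collection -> items: drill down from context to individual volumes.
        return [i for i in self.items.values() if i.collection_id == collection_id]

    def context_for(self, item_id: str) -> dict:
        # Item -> collection -> entities: recover context for one volume.
        item = self.items[item_id]
        collection = self.collections[item.collection_id]
        creators = [self.entities[e].authorized_name for e in collection.creator_ids]
        return {"item": item.title, "collection": collection.title, "creators": creators}


# Made-up sample data, for illustration only.
registry = Registry()
registry.entities["e1"] = EntityRecord("e1", "Example, Collector A.")
registry.collections["c1"] = CollectionRecord(
    "c1", "Example Collector field notes", "Hypothetical collection-level summary.", ["e1"]
)
registry.items["i1"] = ItemRecord("i1", "Field book, volume 1", "1913-1914", "Example County", "c1")
print(registry.context_for("i1"))
```

In this arrangement an item carries only its own details plus a pointer to its collection, so contextual information is recorded once at the collection level and names are recorded once at the entity level.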

Natural Collections Description

Developed by the Biodiversity Information Standards (TDWG, formerly the Taxonomic Databases Working Group), NCD is based on Dublin Core and supports collection-level description of natural history collections. As mentioned under Related Work, Dublin Core and Darwin Core (another TDWG standard) will comprise the core of the Peabody’s data structure for their field book holdings. We anticipate that some overlap in DC-based elements and the natural history focus of both NCD and DwC will make the two approaches highly compatible and facilitate data sharing and cross-searching in the future.

Metadata Object Description Schema

Developed and maintained by the Library of Congress, this schema consists of a subset of MARC fields, presented as language-based XML tags rather than the numeric codes found in MARC 21 (Guenther and McCallum, 2003). This provides a notable advantage for the multi-institutional Registry, as some contributing institutions may not have individuals on staff trained in traditional library cataloging. Recall that UFDC also uses METS/MODS, making their item level records already closely aligned with this approach.
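To make the contrast with numeric MARC codes concrete, the following minimal sketch builds a small MODS-style item record with Python's standard library. The element names (titleInfo, title, name, namePart, originInfo, dateCreated) and the namespace come from the MODS schema; the helper function and the sample values, loosely based on the Jepson volume quoted under Related Work, are assumptions for illustration and do not represent an actual Registry record.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def build_mods_item(title: str, creator: str, date_created: str) -> ET.Element:
    """Build a minimal MODS-style record (hypothetical helper)."""

    def tag(name: str) -> str:
        # ElementTree expects namespaced tags in "{namespace}name" form.
        return f"{{{MODS_NS}}}{name}"

    mods = ET.Element(tag("mods"))

    # MARC 21 would carry the title in field 245; MODS spells it out.
    title_info = ET.SubElement(mods, tag("titleInfo"))
    ET.SubElement(title_info, tag("title")).text = title

    # MARC 21 field 100 (personal name) becomes a readable <name> element.
    name = ET.SubElement(mods, tag("name"), {"type": "personal"})
    ET.SubElement(name, tag("namePart")).text = creator

    origin = ET.SubElement(mods, tag("originInfo"))
    ET.SubElement(origin, tag("dateCreated")).text = date_created
    return mods


# Illustrative values only.
record = build_mods_item("Field book, volume 28", "Jepson, Willis Linn", "1913/1914")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A cataloger without MARC training can read the resulting tags directly, which is the advantage described above.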

Encoded Archival Context

The EAC-CPF schema captures information on the entities involved in the creation, use, and maintenance of the materials. EAC-CPF is maintained by the Society of American Archivists in partnership with the Berlin State Library. The Field Book Registry will use EAC-CPF to provide historical and bibliographic context and help reduce ambiguity for person and corporate names. Consistency in named entity entries will be especially important as the Registry expands to accept records contributed by multiple institutions.
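The idea of reducing name ambiguity can be sketched as a small authority lookup in which every variant spelling resolves to one controlled entry. This is a simplified illustration and not the EAC-CPF data model: the class, the normalization rule, the identifier, and the variant spellings are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


class NameAuthority:
    """Hypothetical lookup from variant name forms to one authorized entry."""

    def __init__(self) -> None:
        self._authorized: Dict[str, str] = {}  # entity id -> authorized name
        self._variants: Dict[str, str] = {}    # normalized variant -> entity id

    @staticmethod
    def _normalize(name: str) -> str:
        # Ignore case, periods, commas, and spacing differences.
        cleaned = name.lower().replace(".", " ").replace(",", " ")
        return " ".join(cleaned.split())

    def register(self, entity_id: str, authorized_name: str,
                 variants: Iterable[str] = ()) -> None:
        self._authorized[entity_id] = authorized_name
        for form in (authorized_name, *variants):
            self._variants[self._normalize(form)] = entity_id

    def resolve(self, name: str) -> Optional[Tuple[str, str]]:
        entity_id = self._variants.get(self._normalize(name))
        if entity_id is None:
            return None  # unknown name: flag for review by a cataloger
        return entity_id, self._authorized[entity_id]


authority = NameAuthority()
authority.register(
    "entity-0001",  # made-up identifier
    "Jepson, Willis Linn",
    variants=["W. L. Jepson", "Willis L. Jepson"],
)
print(authority.resolve("w.l. jepson"))
```

Records contributed by different institutions could then store the identifier instead of a free-text name, so that all variants collate under the same entity.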

Phases of Implementation

The Field Book Project brings together key elements from each of the three schemas to form the Registry, which will be implemented in three phases.

Phase 1: A Local Prototype

The first implementation is a local prototype developed in FileMaker 11. This phase is a temporary solution to enable the project team to begin producing catalog records while the more robust system is developed. The data structure for the FileMaker database closely follows the structure of the three schemas described above. At the time of writing, more than 70 collection records, 1,250 item records, and 230 EAC records have been created in the FileMaker Prototype Registry. We are currently testing the XML export and conversion in preparation for Phase 2.

Phase 2: Robust online implementation in XML

The second phase will move the system from FileMaker to a web-based, XML-driven implementation. Development is underway in a Drupal/Fedora repository. Additional functions to be implemented during this phase include the ability to save searches and results, the ability to generate citations, and sequential page navigation (for when digitized pages are available).

Phase 3: An Open Registry

Phase 3 will extend the Field Book Registry to accept catalog records created by partner institutions and to support interactions with end users. Partner institutions creating new records will either do so locally and then batch ingest them or work directly within the web-based environment. Since each of the schemas is XML-based, metadata crosswalks between MARC, Dublin Core, and any number of other commonly used standards could be developed to extract data from many pre-existing systems and populate the majority of elements in these records. Additionally, the biodiversity research community has already established many collaborative, consortial information resources after which we can model this project. Some notable examples include the Biodiversity Heritage Library (BHL), the Encyclopedia of Life (EOL), and the BioSciCol Project.
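A crosswalk of the kind described here is, at its simplest, a lookup table from source fields to target elements. The sketch below maps a handful of MARC 21 field and subfield pairs to MODS element paths. The pairings follow commonly used MARC-to-MODS correspondences but are heavily simplified, and the function, the dictionary-based record format, and the sample values are assumptions for illustration.

```python
from typing import Dict, Tuple

MarcKey = Tuple[str, str]  # (field tag, subfield code)

# Simplified, partial mapping from MARC 21 to MODS element paths.
MARC_TO_MODS: Dict[MarcKey, str] = {
    ("245", "a"): "titleInfo/title",
    ("100", "a"): "name/namePart",
    ("260", "c"): "originInfo/dateIssued",
    ("520", "a"): "abstract",
    ("651", "a"): "subject/geographic",
}


def crosswalk(marc_fields: Dict[MarcKey, str]) -> Tuple[Dict[str, str], Dict[MarcKey, str]]:
    """Split a record into mapped MODS values and fields needing manual review."""
    mapped: Dict[str, str] = {}
    unmapped: Dict[MarcKey, str] = {}
    for key, value in marc_fields.items():
        path = MARC_TO_MODS.get(key)
        if path is None:
            unmapped[key] = value  # no rule yet: keep it visible, do not drop it
        else:
            mapped[path] = value
    return mapped, unmapped


# Made-up legacy record from a hypothetical partner catalog.
legacy = {
    ("245", "a"): "Field notes, volume 1",
    ("100", "a"): "Example, Collector A.",
    ("999", "x"): "local shelving note",
}
mods_values, needs_review = crosswalk(legacy)
print(mods_values)
print(needs_review)
```

Returning the unmapped fields separately reflects the expectation that a crosswalk populates the majority of elements, with the remainder completed by hand.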

POTENTIAL IMPACT

While we are only in the first phase of a three-phase project, we have identified potential areas of impact in terms of improved access, best practices, and technological capabilities for field book collections.

Improving access

The “fusion” approach to metadata for field books will help us reach our goal of bridging collection and item level access points to respond to a range of information needs.

Field books and journals frequently comprise the core documentation of all collecting events from a given researcher’s career. As such, collection-level description helps to maintain the functional context in which each volume/item was created and establishes clear relationships to other items created within the same context. In addition, description at this level is an efficient way for institutions without the resources to perform item-level cataloging to begin to describe and provide access to their collections.

Due to the nature of the materials, and their relationships to a vast number of other items in natural history collections, effective and efficient access also greatly benefits from item-level description. Bringing both approaches together under one roof allows end users to make more informed relevance assessments and to follow relationships within and across collections.

Best practices

As a large-scale community resource, the Field Book Registry will serve as an infrastructure for accepting field book content contributed by institutions from throughout the country. One of our primary goals is to ensure that this infrastructure will be nonproprietary and easy for both large and small institutions to adopt. An XML environment was chosen for its ease of data migration and flexibility for responding to changing community and user expectations. Based on existing standards and community input, this infrastructure is also poised to serve as a foundation for standards and best practices. We are excited to see how our approach can be aligned with other simultaneously occurring efforts for improving field book access.

Technical possibilities

There are a number of possibilities for extending the technological capabilities of the Field Book Registry. The metadata schemas adopted are designed to be easily mapped to similar schemas (MODS to MARC, NCD to DC), making them agile and responsive to changing technological formats. At the same time, they offer a richness that a purely skeletal schema like DC cannot offer. In fact, according to the Digital Library Federation (2007), OAI-PMH best practices encourage the use of multiple metadata formats to provide the richest descriptions possible. Simple DC does set an underlying structure that is easily shared, but it cannot support description to the depth that many specialized collections require for adequate access. Simple DC also does not support the use of controlled vocabularies needed for many collections. With this enriched level of access, the Field Book Registry can potentially balance interoperability and compatibility with in-depth, discipline-specific description.

On a grander scale, the underlying XML structure could be modified to include RDF, aligning it with Linked Open Data protocols and preparing the data for inclusion in the Semantic Web.

This poster provides examples of user needs related to field books; illustrates how the multiple metadata schemas have been linked together to bridge collection and item level descriptions; and invites discussion on the potential impacts in terms of establishing best practices, improving access, and leveraging the technological capabilities of XML to expand content and features in the future.

ACKNOWLEDGMENTS

The authors would like to recognize the contributions of numerous members of the library, archives, museum and biodiversity communities who have given their time and input into the design of what will become the Field Book Registry, including: Markus Döring and Éamonn Ó Tuama, GBIF Secretariat; Christina V. Fidler, California Academy of Sciences (CAS); Rebecca Guenther, Library of Congress; Barbara Mathe, American Museum of Natural History Library; Stacy Schiff, American Museum of Natural History Library; Suzanne Pilsk, Smithsonian Institution Libraries and Biodiversity Heritage Library; and Katherine Wisser, EAC Working Group Chair.

The authors would also like to acknowledge their colleagues on the Field Book Project Team: Rusty Russell and Anne Van Camp, Principal Investigators; Lesley Parilla, Cataloger; Ricc Ferrante, Director of Digital Services; Tammy Peters, Supervisory Archivist; and each of our partner institutions: Biodiversity Heritage Library; Botany Libraries of the Harvard University Herbaria; California Academy of Sciences; Ernst Mayr Library at Harvard University; LuEsther T. Mertz Library at The New York Botanical Garden; and Missouri Botanical Garden.

REFERENCES

Digital Library Federation. (2007) Multiple Metadata Formats. Retrieved June 30, 2011 at http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/MultipleMetadataFormats.

Encoded Archival Context-Corporate Bodies, Persons and Families. Society of American Archivists and Berlin State Library. Retrieved January 28, 2011 from http://eac.staatsbibliothek-berlin.de/.

Guenther, R. and McCallum, S. (2003) New metadata standards for digital resources: MODS and METS. Bulletin of the American Society for Information Science & Technology. Retrieved January 28, 2011 at http://findarticles.com/p/articles/mi_qa3991/is_200212/ai_n9150534/.

Haas, J. K., Samuels, H.W., and Simmons, B.T. (1985) Appraising the Records of Modern Science and Technology: A Guide. Massachusetts Institute of Technology.

Metadata Encoding and Transmission Standard. Network Development and MARC Standards Office of the Library of Congress, and developed as an initiative of the Digital Library Federation. Retrieved January 28, 2011 from http://www.loc.gov/standards/mets/.

Metadata Object Description Schema. Library of Congress. Retrieved January 28, 2011 from http://www.loc.gov/standards/mods/.

Natural Collections Description. TDWG Interest Group. Overview: http://www.tdwg.org/activities/ncd/ and v0.7 Schema: http://rs.tdwg.org/ncd/0.70/ncd.xsd. Retrieved January 28, 2011.