Together Under One Roof: Combining Collection and Item Level Description through Multiple Metadata Schemas
Carolyn Sheffield
National Museum of Natural History Smithsonian Institution Washington, DC 20056
Sonoe Nakasone Smithsonian Institution Archives
Washington, DC 20024
ABSTRACT
The Smithsonian’s Field Book Project presents a “fusion”
of metadata standards to meet the access needs of a diverse
user base and to set the framework for establishing best
practices for managing field book collections.
The key access challenges around field books stem from a
lack of best practices when it comes to collection
management and description. Field books are unique
materials that sometimes fall under the auspices of
departmental libraries or laboratories (item level
description), sometimes archives (collection level
description) and just as often can be found intermingled and
uncataloged in museum collections and curators’ offices
(little to no description). These varying forms of
custodianship result in collection management and
descriptive practices that are not consistent across or even
within institutions. The Field Book Project draws on
existing standards and community input to develop a
structured online resource for contributing and locating
field book content.
This poster provides examples of user needs related to field
books; illustrates the use of different metadata schemas
within the system and how they have been linked together
to bridge collection and item level descriptions; and invites
discussion on the potential impacts in terms of establishing
best practices, improving access, and leveraging the
technological capabilities of XML to expand content and
features in the future.
Keywords
Metadata, Standards for metadata, XML
INTRODUCTION
The Field Book Project is a joint initiative between the
Smithsonian Institution Archives and the National Museum
of Natural History. Our overall mission is to create one
online location for scholars and others to visit when
searching for field books and other field research materials.
The scope of the Field Book Project focuses on content
related to biodiversity research, specifically botany,
entomology, and vertebrate and invertebrate zoology.
The project will begin as a Smithsonian-wide initiative and
lay the foundation for an online Field Book Registry
comprised of content contributed by museums and research
institutions from throughout the country. The Field Book
Project is funded by the Council for Library and
Information Resources (CLIR).
Why are field books so important?
Field books are THE original source materials documenting
field activities. They are the source for specimen labels and
catalog entries. They include not only identifications for
specimens collected but also often journaling, maps, photos,
and other rich contextual information.
The supplemental information found in field books can
provide clues about interspecies relationships and can be
used to guide habitat reconstruction and responsible land
management. In addition, a range of personal observations
and insights into the environmental and cultural conditions
of a given place at the time of collection can be gleaned from
prose journal entries found in some researchers’ notes.
BACKGROUND
The Field Book Project grew out of the need for better
resources, both for those managing field book collections
and those trying to access them. Despite their incredible
research and intrinsic value, field books frequently land in
the “hidden” category, meaning that they are collections for
which little to no documentation exists or that the
documentation itself is difficult to access.
Different forms of custodianship--library, archive, and museum--have led to variations in descriptive and collection management practices that complicate the discovery process for those seeking field books. In archives, field books are typically grouped together by creator or expedition and described in collection-level finding aids. Discipline-specific libraries produce item-level inventory lists, somewhat similar to what might be found in a typical library catalog record, albeit more minimal. There are often fewer access points and a lack of controlled terminology for person or place names. It is also important to note that, as field books are unique and non-circulating materials, these descriptions are not necessarily made available through the libraries' public-facing catalogs. Additionally, field books are frequently passed down from mentors to mentees, leaving researchers with the gargantuan task of identifying the keepers of institutional memory to locate those materials stashed in curators’ offices or on laboratory shelves.
ASIST 2011, October 9-13, 2011, New Orleans, LA, USA.
This quickly becomes more complicated as related field books can be distributed amongst multiple institutions. A given collector’s work may span affiliations with multiple institutions, and a given expedition may include collectors from multiple disciplines. Research conducted by Rusty Russell, co-PI on the Field Book Project, led to locating field books from the U.S. Exploring Expedition in no fewer than 15 libraries, archives, and museums.
Related Work
There are several examples of libraries, archives and museums making field book holdings available online. Three projects and their metadata approaches are described in this section. The following section will present the Field Book Project approach and how it relates to these.
The University of Florida Digital Collections (UFDC) (http://ufdc.ufl.edu/UF00073894/00001/1j?toc=y) provides item-level access to digitally imaged surrogates of Walter Judd’s field books. Transcribed species names have been linked back to the page on which they occur, providing a wonderful level of granularity for navigating through the digital objects. The UFDC system boasts “rich metadata support, with automatic transformations for maximum interoperability.” UFDC records follow the METS/MODS metadata standards and are available for export as METS/MODS, MARCXML, and qualified Dublin Core.
The Jepson Herbarium at the University of California, Berkeley, digitally imaged Willis Linn Jepson’s field books (http://ucjeps.berkeley.edu/images/fieldbooks/jepson_fieldbooks.html). The website provides an item level index of the field books with brief descriptions that include the volume number, date range, and collection numbers included in the volume. Additional information is available for some volumes and may include a description of the collection event, notable information on the volume’s content, and status of transcription. For example: “Volume 28: Sep 7 1913 to Dec 1914: collection numbers 5636 to 5729 International Phytogeographers Excursion to Yosemite. Extensive notes on people, places, errors, etc. [completely transcribed]”. They are also collecting transcriptions, making it possible to link the content to databased specimens described in the field books. Similarly, the California Academy of Sciences is leading Connecting Content (http://research.calacademy.org/library/fieldnotes), a project with which the Field Book Project partners to identify connections between specimens, field book content, and references in published literature.
The Yale Peabody Museum of Natural History recently received funding from CLIR to catalog field books (http://fromdnatodinosaurs.blogspot.com/2011/05/peabody-receives-hidden-collections.html). Their project represents a larger, institution-wide scope to establish intellectual control over field books. Their approach will include a combination of the Dublin Core (DC) and Darwin Core (DwC) metadata schemas.
APPROACH
The Field Book Project solution is to create a “fusion” of
metadata schemas that will bridge the contextual information provided by a collection level description (as in finding aids) with the more granular descriptions that are possible at the item level (as found in inventory lists).
Combining levels of description improves an end user’s
ability to make effective relevance assessments. Collection level descriptions supply the contextual information surrounding the creation of the objects: the creator’s career,
the purpose of creation, and sometimes the temporal and cultural context in which the collection was created. Adding item level descriptions supplies a deeper granularity on the geographic and temporal conditions of collecting events and can provide additional insight into how items within a collection relate to and differ from one another.
The rich descriptive framework presented here will be implemented in three phases with a final product delivered in an XML environment. XML provides a flexible structure that supports ease of migration and helps ensure that as standards and user expectations evolve, the data in the Field Book Registry will be poised to adapt and remain accessible over time.
Metadata
We draw from three existing metadata standards, each available as an XML schema:
Natural Collections Description (NCD): for collection-level description of all types of natural history collections
Metadata Object Description Schema (MODS): based on MARC 21, for item-level descriptions (it is important to note that we will also be implementing the Metadata Encoding and Transmission Standard (METS) to support page level navigation once digitization begins)
Encoded Archival Context (EAC): supports consistent and controlled entry of names for entities involved in the creation and maintenance of the collections and items
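As a rough illustration of how the three record types listed above could reference one another, the following Python sketch models a collection record, an item record, and a named-entity record as plain data classes. The class names, field names, and sample values are assumptions made for illustration only; they are not the Registry's actual data structure, and they do not reproduce the element sets of NCD, MODS, or EAC.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EntityRecord:
    """EAC-style record: one controlled entry per person or organization."""
    entity_id: str
    authorized_name: str


@dataclass
class CollectionRecord:
    """NCD-style record: context shared by every item in a collection."""
    collection_id: str
    title: str
    creator_ids: List[str] = field(default_factory=list)


@dataclass
class ItemRecord:
    """MODS-style record: one field book, pointing back to its collection."""
    item_id: str
    title: str
    collection_id: str
    creator_ids: List[str] = field(default_factory=list)


def describe_item(item: ItemRecord,
                  collections: Dict[str, CollectionRecord],
                  entities: Dict[str, EntityRecord]) -> dict:
    """Merge item-level detail with collection context and controlled names."""
    collection = collections[item.collection_id]
    return {
        "item": item.title,
        "collection": collection.title,
        "creators": [entities[i].authorized_name for i in item.creator_ids],
    }


# Placeholder data (hypothetical, not taken from the Registry).
entities = {"e1": EntityRecord("e1", "Collector, Example")}
collections = {"c1": CollectionRecord("c1", "Example Collector field notes", ["e1"])}
item = ItemRecord("i1", "Field notebook, volume 1", "c1", ["e1"])

summary = describe_item(item, collections, entities)
print(summary)
```

Because names live only in the entity records, a correction to an authorized name would propagate to every collection and item that points at it, which is the consistency benefit the EAC layer is meant to provide.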
Natural Collections Description
Developed by the Biodiversity Information Standards
(TDWG, formerly the Taxonomic Databases Working
Group), NCD is based on Dublin Core and supports
collection-level description of natural history collections.
As mentioned under Related Work, Dublin Core and
Darwin Core (another TDWG standard) will comprise the
core of the Peabody’s data structure for their field book
holdings. We anticipate that some overlap in DC-based
elements and the natural history focus of both NCD and
DwC will make the two approaches highly compatible and
facilitate data sharing and cross-searching in the future.
Metadata Object Description Schema
Developed and maintained by the Library of Congress, this
schema consists of a subset of MARC fields, presented as
language-based XML tags rather than the numeric codes
found in MARC 21 (Guenther and McCallum, 2003). This
provides a notable advantage for the multi-institutional
Registry as some contributing institutions may not have
individuals on staff trained in traditional library cataloging.
Recall that UFDC also uses METS/MODS, making their
item level records already closely aligned with this
approach.
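To make the point about language-based tags concrete, here is a minimal Python sketch that builds a small MODS item record with the standard library. The element names used (titleInfo, title, name, namePart, originInfo, dateCreated, relatedItem) and the namespace are part of MODS version 3; the helper function, the choice of elements, and the sample values are assumptions for illustration and do not represent the Field Book Project's actual record profile.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified form of a MODS element name."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods_item(title, creator, date_created, collection_title):
    """Build a minimal item-level MODS record (hypothetical helper)."""
    mods = ET.Element(q("mods"))

    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title

    name = ET.SubElement(mods, q("name"), {"type": "personal"})
    ET.SubElement(name, q("namePart")).text = creator

    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateCreated")).text = date_created

    # relatedItem type="host" points from the item to its parent collection.
    host = ET.SubElement(mods, q("relatedItem"), {"type": "host"})
    host_title = ET.SubElement(host, q("titleInfo"))
    ET.SubElement(host_title, q("title")).text = collection_title

    return mods


# Placeholder values, not a real catalog record.
record = build_mods_item(
    title="Field notebook, volume 1",
    creator="Collector, Example",
    date_created="1913/1914",
    collection_title="Example Collector field notes",
)
print(ET.tostring(record, encoding="unicode"))
```

A cataloger reading the output sees titleInfo and namePart rather than MARC tags such as 245 or 100, which is the readability advantage described above.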
Encoded Archival Context
The EAC-CPF schema captures information on the entities
involved in the creation, use, and maintenance of the
materials. EAC-CPF is maintained by the Society of
American Archivists in partnership with the Berlin State
Library. The Field Book Registry will use EAC-CPF to
provide historical and bibliographic context and help reduce
ambiguity for person and corporate names. Consistency in
named entity entries will be especially important as the
Registry expands to accept records contributed by multiple
institutions.
Phases of Implementation
The Field Book Project brings together key elements from
each of the three schemas to form the Registry which will
be implemented in three phases.
Phase 1: A Local Prototype
The first implementation is a local prototype developed in
FileMaker 11. This phase is a temporary solution to enable
the project team to begin producing catalog records while
the more robust system is developed. The data structure for
the FileMaker database closely follows the structure of the
three schemas described above. At the time of writing, more
than 70 collection records, 1,250 item records, and 230
EAC records have been created in the FileMaker Prototype
Registry. We are currently testing the XML export and
conversion in preparation for Phase 2.
Phase 2: Robust online implementation in XML
The second phase will move the system from FileMaker to
a web-based, XML-driven implementation. Development is
underway in a Drupal/Fedora repository. Additional
functions to be implemented during this phase include:
ability to save searches and results; ability to generate
citations; and sequential page navigation (for when
digitized pages are available).
Phase 3: An Open Registry
Phase 3 will extend the Field Book Registry to accept
catalog records created by partner institutions and to support
interactions with end users. Partner institutions creating
new records will either do so locally and then batch ingest
them or work directly within the web-based environment. Since
each of the schemas is XML-based, metadata crosswalks from
MARC, Dublin Core, and any number of other commonly used
standards could be developed to extract data from many
pre-existing systems and populate the majority of elements
in these records. Additionally, the
biodiversity research community has already established
many collaborative, consortial information resources that
we can model this project after. Some notable examples
include: the Biodiversity Heritage Library (BHL);
Encyclopedia of Life (EOL); and the BioSciCol Project.
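The crosswalk idea mentioned above can be sketched as a simple lookup table. The Python example below maps a handful of MARC 21 field tags to the top-level MODS elements they commonly correspond to, and sets aside anything it cannot map for manual review. The input format (a dictionary of tag to list of values, with subfields ignored), the function name, and the sample record are simplifying assumptions; a production crosswalk would handle subfields, indicators, and many more fields.

```python
# A few MARC 21 tags and the top-level MODS elements they commonly map to.
# This is a deliberately tiny subset chosen for illustration.
MARC_TO_MODS = {
    "100": "name",        # main entry, personal name
    "245": "titleInfo",   # title statement
    "260": "originInfo",  # publication / creation information
    "520": "abstract",    # summary note
    "651": "subject",     # geographic subject heading
}


def crosswalk(marc_fields):
    """Split a simplified MARC record into mapped and unmapped fields.

    marc_fields: dict of MARC tag -> list of string values (hypothetical
    simplified input; real MARC records carry indicators and subfields).
    """
    mapped, unmapped = {}, {}
    for tag, values in marc_fields.items():
        target = MARC_TO_MODS.get(tag)
        if target is None:
            unmapped[tag] = list(values)
        else:
            mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


# Placeholder record, not real catalog data.
legacy_record = {
    "100": ["Collector, Example"],
    "245": ["Field notebook, volume 1"],
    "651": ["Example Valley"],
    "999": ["local shelf note"],
}

mapped, unmapped = crosswalk(legacy_record)
print(mapped)
print(unmapped)
```

Keeping the unmapped fields rather than discarding them lets a contributing institution see exactly which parts of its legacy data still need a home in the target schema.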
POTENTIAL IMPACT
While we are only in the first phase of a three phase project,
we have identified potential areas of impact in terms of
improved access, best practices, and technological
capabilities for field book collections.
Improving access
The “fusion” approach to metadata for field books will help
us reach our goal of bridging collection and item level
access points to respond to a range of information needs.
Field books and journals frequently comprise the core
documentation of all collecting events from a given
researcher’s career. As such, collection-level description
helps to maintain the functional context in which each
volume/item was created and establishes clear relationships
to other items created within the same context. In addition,
description at this level is an efficient way for institutions
without the resources to perform item-level cataloging to
begin to describe and provide access to their collections.
Due to the nature of the materials, and their relationships to
a vast number of other items in natural history collections,
effective and efficient access also greatly benefits from
item-level description. Bringing both approaches together
under one roof allows end users to make more informed
relevance assessments and to follow relationships within
and across collections.
Best practices
As a large scale community resource, the Field Book
Registry will serve as an infrastructure for accepting field
book content contributed by institutions from throughout
the country. One of our primary goals is to ensure that this
infrastructure will be nonproprietary and easy to adopt by
both large and small institutions. An XML environment
was chosen for its ease of data migration and flexibility for
responding to changing community and user expectations.
Based on existing standards and community input, this
infrastructure is also poised to serve as a foundation for
standards and best practices. We’re excited to see how our
approach can be aligned with other simultaneously
occurring efforts for improving field book access.
Technical possibilities
There are a number of possibilities for extending the
technological capabilities of the Field Book Registry. The
metadata schemas adopted are designed to be easily
mapped to similar schemas (MODS to MARC, NCD to
DC), making them agile and responsive to changing
technological formats. At the same time, they offer a
richness that a purely skeletal schema like DC cannot offer.
In fact, according to the Digital Library Federation (2007),
OAI-PMH best practices encourage the use of multiple
metadata formats to provide the richest descriptions
possible. Simple DC does set an underlying structure that is
easily shared but it cannot support description to the depth
that many specialized collections require for adequate
access. Simple DC also does not support the use of
controlled vocabularies needed for many collections. With
this enriched level of access, the Field Book Registry can
potentially balance interoperability and compatibility with
in-depth, discipline-specific description.
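The trade-off described above, where simple DC is easy to share but shallow, can be illustrated with a short Python sketch that derives a simple DC view from a richer record and offers both side by side. The shape of the rich record, the function names, and the sample values are assumptions for illustration; "oai_dc" is the standard OAI-PMH prefix for simple Dublin Core, while the "mods" key is only an assumed label for the richer format.

```python
def to_simple_dc(rich):
    """Flatten a richer record into simple Dublin Core elements.

    Controlled-vocabulary details (authority, identifier) are dropped,
    because simple DC carries only plain string values.
    """
    dc = {
        "title": [rich["title"]],
        "creator": [c["name"] for c in rich.get("creators", [])],
        "date": [rich["date_range"]] if rich.get("date_range") else [],
        "coverage": [p["label"] for p in rich.get("places", [])],
    }
    # Omit elements that ended up empty.
    return {element: values for element, values in dc.items() if values}


def available_formats(rich):
    """Offer the same record in a shareable and a rich form."""
    return {"oai_dc": to_simple_dc(rich), "mods": rich}


# Placeholder record with hypothetical authority identifiers.
rich_record = {
    "title": "Field notebook, volume 1",
    "creators": [{"name": "Collector, Example", "authority": "local", "id": "e1"}],
    "date_range": "1913/1914",
    "places": [{"label": "Example Valley", "authority": "local", "id": "p1"}],
}

formats = available_formats(rich_record)
print(formats["oai_dc"])
```

The simple DC view keeps the place label but loses the identifier that ties it to a controlled vocabulary, which is why a harvester that needs precise matching would request the richer format instead.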
On a grander scale, the underlying XML structure could be
modified to include RDF, aligning it with Linked Open
Data protocols and preparing the data for inclusion in the
Semantic Web.
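As one possible reading of what an RDF-ready version of the data might look like, the Python sketch below serializes the relationships between an item, its collection, and its creator as N-Triples using Dublin Core terms. The example.org URIs, the function names, and the choice of predicates are assumptions for illustration; the project has not specified an RDF vocabulary, and a real implementation would more likely rely on an RDF library than on hand-built strings.

```python
DCTERMS = "http://purl.org/dc/terms/"


def uri(value: str) -> str:
    """Wrap an absolute URI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def item_to_ntriples(item_uri, title, collection_uri, creator_uri):
    """Express an item's title, parent collection, and creator as triples."""
    triples = [
        (uri(item_uri), uri(DCTERMS + "title"), literal(title)),
        (uri(item_uri), uri(DCTERMS + "isPartOf"), uri(collection_uri)),
        (uri(item_uri), uri(DCTERMS + "creator"), uri(creator_uri)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical identifiers under example.org, not real Registry URIs.
output = item_to_ntriples(
    item_uri="http://example.org/item/i1",
    title="Field notebook, volume 1",
    collection_uri="http://example.org/collection/c1",
    creator_uri="http://example.org/entity/e1",
)
print(output)
```

Expressing the collection and creator as URIs rather than strings is what would let outside datasets link to the same entities.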
This poster provides examples of user needs related to field
books; illustrates how the multiple metadata schemas have
been linked together to bridge collection and item level
descriptions; and invites discussion on the potential impacts
in terms of establishing best practices, improving access,
and leveraging the technological capabilities of XML to
expand content and features in the future.
ACKNOWLEDGMENTS
The authors would like to recognize the contributions of
numerous members of the library, archives, museum and
biodiversity communities who have given their time and
input into the design of what will become the Field Book
Registry, including: Markus Döring and Éamonn Ó Tuama,
GBIF Secretariat; Christina V. Fidler, California Academy
of Sciences (CAS); Rebecca Guenther, Library of
Congress; Barbara Mathe, American Museum of Natural
History Library; Stacy Schiff, American Museum of
Natural History Library; Suzanne Pilsk, Smithsonian
Institution Libraries and Biodiversity Heritage Library; and
Katherine Wisser, EAC Working Group Chair.
The authors would also like to acknowledge their
colleagues on the Field Book Project Team: Rusty Russell
and Anne Van Camp, Principal Investigators; Lesley
Parilla, Cataloger; Ricc Ferrante, Director of Digital
Services; Tammy Peters, Supervisory Archivist; and each
of our partner institutions: Biodiversity Heritage Library;
Botany Libraries of the Harvard University Herbaria;
California Academy of Sciences; Ernst Mayr Library at
Harvard University; LuEsther T. Mertz Library at The New
York Botanical Garden; Missouri Botanical Garden.
REFERENCES
Digital Library Federation. (2007) Multiple Metadata
Formats. Retrieved June 30, 2011 from
http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/MultipleMetadataFormats.
Encoded Archival Context-Corporate Bodies, Persons and
Families. Society of American Archivists and Berlin
State Library. Retrieved January 28, 2011 from
http://eac.staatsbibliothek-berlin.de/.
Guenther, R. and McCallum, S. (2003) New metadata
standards for digital resources: MODS and METS.
Bulletin of the American Society for Information Science
& Technology. Retrieved January 28, 2011 from
http://findarticles.com/p/articles/mi_qa3991/is_200212/ai_n9150534/.
Haas, J. K., Samuels, H.W., and Simmons, B.T. (1985)
Appraising the Records of Modern Science and
Technology: A Guide. Massachusetts Institute of
Technology.
Metadata Encoding and Transmission Standard. Network
Development and MARC Standards Office of the Library
of Congress, and developed as an initiative of the Digital
Library Federation. Retrieved January 28, 2011 from
http://www.loc.gov/standards/mets/.
Metadata Object Description Schema. Library of Congress.
Retrieved January 28, 2011 from
http://www.loc.gov/standards/mods/.
Natural Collections Description. TDWG Interest Group.
Overview: http://www.tdwg.org/activities/ncd/ and v0.7
Schema: http://rs.tdwg.org/ncd/0.70/ncd.xsd, Retrieved
January 28, 2011.