Non-MARC Cataloging Standards Overview: TEI & EAD, MODS, METS, XML-based MARC

Eric Childress, OCLC
February 10, 2003


Overview

• Fundamentals
  – Metadata and content
  – Types of metadata
  – Document mark-up languages & character encoding
• The Big Picture
• Metadata formats:
  – MARC
  – MODS
  – METS
  – MIX
  – TEI
  – EAD
  – ONIX

Fundamentals: Metadata and content

1. Metadata embedded in content object
  • Title page / CIP
  • HTML header in HTML document
2. Metadata separate from content object
  • Book + catalog card
  • Book + MARC record
3. Metadata linked to content object
  • MARC record with URL for FTP object
4. Metadata embedded and linked
  • MARC record with URL for HTML document
  • PDF document linked to DC-XML record
  • Aggregation of discrete objects linked to record
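
The first case above (metadata embedded in the content object) can be illustrated with a minimal Python sketch that pulls the title and `<meta>` pairs out of an HTML header. The sample document, the `DC.`-prefixed names, and the helper names are assumptions made for illustration; they do not come from the presentation.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.metadata[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


# Hypothetical document: the metadata travels inside the content object itself.
SAMPLE_HTML = """<html>
<head>
  <title>Annual Report</title>
  <meta name="DC.creator" content="Example Library">
  <meta name="DC.date" content="2003-02-10">
</head>
<body><p>Report text goes here.</p></body>
</html>"""


def extract_embedded_metadata(html_text):
    """Return a dict of the metadata found in the HTML header."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata
```

Calling `extract_embedded_metadata(SAMPLE_HTML)` yields a small descriptive record without consulting any external catalog, which is the defining trait of embedded metadata.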

Fundamentals: Types of metadata

• Administrative metadata:
  – Data about the metadata (e.g., record number)
• Descriptive metadata:
  – Description of the object for discovery and retrieval (e.g., title)
• Technical metadata:
  – Technical characteristics of the object (e.g., file size)

Fundamentals: Markup languages & character encoding

• Markup languages:
  – Address the structure of a document
  – Convey instructions to software that will process the text to:
    • Index the text for searching
    • Render the text (e.g., for screen display or print)
    • Transform the text for some output device(s) (e.g., for a voice synthesizer)
  – The markup is generally invisible to end-users
• Extensible Markup Language (XML):
  – XML is a metalanguage: agencies define their own XML to suit their task by creating Document Type Definitions (DTDs) or XML schemas
  – Data is kept separate from presentation instructions (which are recorded in a style sheet)
  – Offers just the right mix of flexibility and structure
• Character encoding:
  – Used for communicating text characters in a computing environment
  – Hundreds of character encoding standards exist
  – Character conversion is complex and expensive
• Unicode:
  – A single, “comprehensive” global encoding standard
  – Includes characters from the scripts of all major modern, most minor, and selected ancient languages
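
To make the character-encoding point concrete, here is a small Python sketch showing the same text as bytes under two encodings, and what happens when the bytes are read with the wrong one. The sample word and the `transcode` helper are illustrative assumptions, not part of the presentation.

```python
# The same text has different byte representations under different encodings.
text = "Ministère"

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # "è" takes two bytes in UTF-8

# Reading UTF-8 bytes as if they were Latin-1 silently corrupts the text.
misread = utf8_bytes.decode("latin-1")


def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes between encodings by going through Unicode text.

    Raises UnicodeEncodeError if the target encoding lacks a character.
    """
    return data.decode(source_encoding).encode(target_encoding)


print(len(latin1_bytes), len(utf8_bytes))  # 9 10
print(misread)                             # MinistÃ¨re
```

A converter has to know the source encoding in advance, and it can fail when the target repertoire lacks a character, which is part of why conversion between legacy encodings is costly.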

The Big Picture: Standards in a grid

[Figure: metadata standards arranged in a grid, with one axis running from simple description to rich description and the other from item to collections. Standards shown include Dublin Core, RSLP, OAI set record, TEI, VRA Core, ONIX, MARC, and CSDGM.]

Library-related standards: MARC 21 (ISO 2709)

• MARC 21 (ISO 2709) MARC 8:
  – Library metadata communications format based on ISO 2709
  – Strengths:
    • Mature standard
    • Widely adopted by libraries (U.S., Canada, and beyond)
    • Large universe of records available
    • Wide choice of software vendors
  – Weaknesses (in the present & future):
    • Virtually unused outside of libraries
    • Field and record size limitations
    • Restricted range of scripts supported (MARC 8 repertoire only)
    • Limited ability to convey hierarchical & complex relationships, attributes
    • No ability to embed related objects (e.g., book cover GIF)
    • Cannot be directly processed by widely-used web applications
• MARC 21 (ISO 2709) Unicode:
  – MARC 21 with Unicode character encoding
  – Limited to 16K characters equivalent to MARC 8 repertoire
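
The field and record size limitations follow from the ISO 2709 layout: each directory entry stores a field's length in four digits, and the leader stores the record length in five. The sketch below builds and re-reads a toy record to show where those limits come from. The function names and sample field values are hypothetical, and this is a simplified illustration rather than a complete MARC 21 implementation.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_record(fields):
    """Serialize (tag, field_bytes) pairs into one ISO 2709 record."""
    directory = b""
    data = b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        if len(body) > 9999:  # directory entry has only 4 digits for length
            raise ValueError(f"field {tag} exceeds the 4-digit length limit")
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)  # leader is always 24 bytes
    record_length = base_address + len(data) + 1  # +1 for record terminator
    if record_length > 99999:  # leader has only 5 digits for length
        raise ValueError("record exceeds the 5-digit length limit")
    leader = f"{record_length:05d}nam  22{base_address:05d}   4500"
    return leader.encode("ascii") + directory + data + RECORD_TERMINATOR


def parse_directory(record):
    """Return (record_length, [(tag, field_bytes), ...]) from a record."""
    record_length = int(record[0:5])
    base_address = int(record[12:17])
    directory = record[24:base_address - 1]  # drop the field terminator
    fields = []
    for i in range(0, len(directory), 12):  # 12 bytes per directory entry
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        begin = base_address + start
        fields.append((tag, record[begin:begin + length - 1]))
    return record_length, fields


# Hypothetical sample: a control number and a title field with indicators.
SAMPLE_FIELDS = [
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aExample title"),
]
SAMPLE_RECORD = build_record(SAMPLE_FIELDS)
```

Because every offset and length is a fixed-width decimal number, a field longer than 9,999 bytes or a record longer than 99,999 bytes simply cannot be expressed in this format.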

Library-related standards: MARC and XML

• MARC 21 and XML:
  – Library of Congress’ MARCXML:
    • LC’s schema provides a lossless conversion of MARC 21 (ISO 2709) to XML
    • LC’s XML framework positions MARCXML as both an end format and an intermediate format for conversion to non-MARC formats
  – Stanford University’s Lane Medical Library XMLMARC:
    • Developed before LC’s MARCXML schema
    • Ignores/simplifies some MARC 21 data
• UNIMARC and XML:
  – Ministère de la culture et de la communication (France), Board of Research and Technology
    • BiblioML DTD for converting UNIMARC to XML
    • Conversion tools in development
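
As a rough illustration of why the MARCXML conversion can be lossless, the sketch below writes a record's leader, control fields, and data fields (with tags, indicators, and subfield codes intact) into the MARCXML element structure using Python's standard library. The `to_marcxml` helper and the sample values are assumptions for illustration; this is not LC's conversion tooling.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def _q(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARCXML_NS}}}{name}"


def to_marcxml(leader, control_fields, data_fields):
    """Build a MARCXML <record> element.

    control_fields: list of (tag, value)
    data_fields: list of (tag, ind1, ind2, [(code, value), ...])
    """
    ET.register_namespace("marc", MARCXML_NS)
    record = ET.Element(_q("record"))
    ET.SubElement(record, _q("leader")).text = leader
    for tag, value in control_fields:
        ET.SubElement(record, _q("controlfield"), tag=tag).text = value
    for tag, ind1, ind2, subfields in data_fields:
        field = ET.SubElement(record, _q("datafield"),
                              tag=tag, ind1=ind1, ind2=ind2)
        for code, value in subfields:
            ET.SubElement(field, _q("subfield"), code=code).text = value
    return record


# Hypothetical sample record.
SAMPLE_MARCXML = to_marcxml(
    "00074nam  2200049   4500",
    [("001", "12345")],
    [("245", "1", "0", [("a", "Example title")])],
)
MARCXML_TEXT = ET.tostring(SAMPLE_MARCXML, encoding="unicode")
```

Every numeric tag, indicator, and subfield code survives as an attribute value, so nothing has to be interpreted or discarded on the way into XML.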

Library-related standards: MODS

• Metadata Object Description Schema (MODS)
  – Essentially MARC 21 recast in an XML-native framework
    • Text-based tags rather than numeric ones
    • Selected clusters of related MARC 21 attributes condensed into a single MODS element
  – MARC 21 readily converts to MODS, but MODS cannot be converted back to MARC 21 without loss
• Value of MODS:
  – A rich, library-metadata-oriented XML metadata schema
  – Optimized for from-MARC conversion of legacy records
  – Selectively “improves” some of MARC’s mechanisms for representing resource type
  – Well-suited as a metadata format for OAI harvesting
  – Maintained by the same agency (LC) that maintains MARC 21
• Applications of MODS:
  – LC planning to convert 100K American Memory records
  – Minerva project, U of Chicago Press, California Digital Library, and others using or planning to use MODS for records for web sites and e-texts
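
The move from numeric tags to text-based elements can be sketched as follows. This hand-written mapping covers only three MARC fields and is an assumption made for illustration; it is not LC's MARC-to-MODS stylesheet, and the function and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def q(name):
    """Qualify an element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"


def marc_to_mods(data_fields):
    """Map a few MARC data fields to MODS elements.

    data_fields: list of (tag, ind1, ind2, [(code, value), ...]).
    Indicators, repeated subfields, and unmapped tags are dropped
    in this sketch.
    """
    ET.register_namespace("mods", MODS_NS)
    mods = ET.Element(q("mods"))
    for tag, _ind1, _ind2, subfields in data_fields:
        values = dict(subfields)
        if tag == "245":  # title statement
            title_info = ET.SubElement(mods, q("titleInfo"))
            ET.SubElement(title_info, q("title")).text = values.get("a")
            if "b" in values:
                ET.SubElement(title_info, q("subTitle")).text = values["b"]
        elif tag == "100":  # main entry, personal name
            name = ET.SubElement(mods, q("name"), type="personal")
            ET.SubElement(name, q("namePart")).text = values.get("a")
        elif tag == "260":  # publication information
            origin = ET.SubElement(mods, q("originInfo"))
            if "b" in values:
                ET.SubElement(origin, q("publisher")).text = values["b"]
            if "c" in values:
                ET.SubElement(origin, q("dateIssued")).text = values["c"]
    return mods


# Hypothetical sample fields.
SAMPLE_MODS = marc_to_mods([
    ("100", "1", " ", [("a", "Doe, Jane")]),
    ("245", "1", "0", [("a", "Example title"), ("b", "a subtitle")]),
    ("260", " ", " ", [("b", "Example Press"), ("c", "2003")]),
])
```

In this sketch the indicator values never reach the output, so the original MARC fields could not be rebuilt exactly from the result, which gives a small-scale picture of why the reverse conversion is lossy.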

Library-related standards: METS

• Metadata Encoding and Transmission Standard (METS)
  – Standard for encoding descriptive, administrative, structural, rights, and other data essential for retrieving, preserving, and serving up digital resources
  – Six modules (header, descriptive metadata, administrative metadata, file section, structural map, behavior section)
  – Header and structural map are required; descriptive, administrative, and behavior metadata may reside in the METS object or be external
• Value of METS:
  – Need for METS identified at DLF metadata experts meetings: varied local approaches to non-descriptive metadata were neither scaling well nor supporting interoperability between agencies
  – Can be used to collect digital resource metadata for submission to a repository, hold metadata in the repository, and inform user access applications
• Applications of METS:
  – LC using for moving images, audio recordings, and folk life mixed media collections
  – OCLC DPR, RLG, Harvard, and National Library of Wales exploring or using for a variety of projects
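
A minimal sketch of how the METS modules fit together: a header, a descriptive metadata section that points to an external record, a file section, and a structural map that ties the files to the object. The helper names, identifiers, and URLs are hypothetical, and the administrative and behavior sections are omitted for brevity.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = f"{{{XLINK_NS}}}href"


def m(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets(label, mods_url, files):
    """Build a small METS document. files: list of (file_id, mime_type, url)."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    mets = ET.Element(m("mets"), LABEL=label)
    ET.SubElement(mets, m("metsHdr"), CREATEDATE="2003-02-10T00:00:00")

    # Descriptive metadata kept external: only a pointer is stored here.
    dmd = ET.SubElement(mets, m("dmdSec"), ID="DMD1")
    ET.SubElement(dmd, m("mdRef"),
                  {"LOCTYPE": "URL", "MDTYPE": "MODS", XLINK_HREF: mods_url})

    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"),
                        TYPE="object", DMDID="DMD1")
    for file_id, mime_type, url in files:
        entry = ET.SubElement(file_grp, m("file"),
                              ID=file_id, MIMETYPE=mime_type)
        ET.SubElement(entry, m("FLocat"), {"LOCTYPE": "URL", XLINK_HREF: url})
        ET.SubElement(div, m("fptr"), FILEID=file_id)
    return mets


def files_in_structure(mets):
    """Resolve each structural-map file pointer to its file location."""
    locations = {
        f.get("ID"): f.find(m("FLocat")).get(XLINK_HREF)
        for f in mets.iter(m("file"))
    }
    return [locations[p.get("FILEID")] for p in mets.iter(m("fptr"))]


# Hypothetical object with two page images.
SAMPLE_METS = build_mets(
    "Example pamphlet",
    "http://example.org/records/pamphlet-mods.xml",
    [("F1", "image/tiff", "http://example.org/files/page1.tif"),
     ("F2", "image/tiff", "http://example.org/files/page2.tif")],
)
```

The structural map carries only identifiers, so an access application resolves them through the file section, as `files_in_structure` does here.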

Library-related standards: MIX

• Metadata for Images in XML (MIX)
  – Collaboration of LC and the NISO Technical Metadata for Digital Still Images Standards Committee
  – XML schema for a set of technical data elements required to manage digital image collections
  – Format for interchange and/or storage of the data specified in the NISO Draft Standard Data Dictionary: Technical Metadata for Digital Still Images (version 1.2)
  – Still in early development and testing phases
• Value of MIX:
  – Provides a common XML schema for expressing technical data particular to still and moving digital images
  – Can be used with other schemas such as METS and MODS as part of a comprehensive approach to managing and preserving digital images
• Applications of MIX:
  – OCLC DPR, LC, others planning or testing
  – MIX still in nascent stage of development and testing

E-text-related standard: TEI

• Text Encoding Initiative (TEI):
  – For complex markup of literary texts
  – Both SGML & XML [new] DTDs available
  – TEI “header” (TEIH) can be used as a descriptive metadata record
  – Maintenance agency: TEI Consortium
    • TEI Consortium has executive offices in Bergen, Norway, and is hosted at four university sites worldwide: the University of Bergen, Brown University, Oxford University, and the University of Virginia
    • Consortium maintains “P4” Guidelines for Electronic Text Encoding and Interchange
• Value of TEI:
  – Designed to meet the needs of the scholarly research community (esp. in the humanities) for a variety of activities including:
    • Adding in-line academic commentary in e-texts
    • As an aid to research through supporting special indexing points, etc.
• Applications of TEI:
  – Widely used by major humanities electronic text collections such as CETH, UVa e-text center, many others
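
The use of the TEI header as a descriptive metadata record can be sketched by reading a few header elements out of an encoded text. The sample document below is a made-up, heavily simplified P4-style file, and the `header_to_record` helper is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified P4-style document: header first, then the text.
SAMPLE_TEI = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example Poems: an electronic edition</title>
        <author>Doe, Jane</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example University Library</publisher>
        <date>2003</date>
      </publicationStmt>
      <sourceDesc>
        <bibl>Example Poems. Example Press, 1901.</bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The encoded text itself goes here.</p>
    </body>
  </text>
</TEI.2>"""


def header_to_record(tei_xml):
    """Extract a flat descriptive record from the TEI header's fileDesc."""
    root = ET.fromstring(tei_xml)
    file_desc = root.find("teiHeader/fileDesc")
    return {
        "title": file_desc.findtext("titleStmt/title"),
        "author": file_desc.findtext("titleStmt/author"),
        "publisher": file_desc.findtext("publicationStmt/publisher"),
        "date": file_desc.findtext("publicationStmt/date"),
        "source": file_desc.findtext("sourceDesc/bibl"),
    }
```

The header sits inside the same file as the marked-up text, so it is another instance of metadata embedded in the content object.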

Archives-related standard: EAD

• Encoded Archival Description (EAD)
  – A format for expressing electronic archival finding aids
  – Created by LC and the Society of American Archivists (SAA)
  – EAD DTD (Version 2002) is designed to function as both an SGML and XML DTD
• Value of EAD:
  – Effectively an organized presentation of a collection of documents
    • EAD header carries metadata for the finding aid
    • Provides for simple or complex mark-up to support varying levels of indexing
    • Well-suited for interweaving narrative with links to specific objects in a collection (either directly to the object or via a record for the object that may link to the object)
• Applications of EAD:
  – Conversion of existing paper finding aids to electronic form
  – Widely used by academic institutions and archives in North America
  – RLG Archival Resources database hosts copies of many EADs
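
A finding aid is hierarchical, and the sketch below walks the nested components of a tiny EAD-style document to produce an outline, including a link to a digital object where one is present. The sample finding aid is invented and much simpler than a real one, and the helper names are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified finding aid in the style of the EAD 2002 DTD.
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid>example-001</eadid>
    <filedesc>
      <titlestmt><titleproper>Guide to the Example Papers</titleproper></titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Papers</unittitle></did>
    <dsc type="combined">
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <unittitle>Letters, 1901-1905</unittitle>
            <dao href="http://example.org/letters-1901.pdf"/>
          </did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# Component elements are named c or c01 through c12.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def walk_components(parent, depth=0):
    """Yield (depth, level, title, link) for each nested component."""
    for child in parent:
        if COMPONENT.match(child.tag):
            did = child.find("did")
            title = did.findtext("unittitle") if did is not None else None
            dao = did.find("dao") if did is not None else None
            link = dao.get("href") if dao is not None else None
            yield depth, child.get("level"), title, link
            yield from walk_components(child, depth + 1)


def outline(ead_xml):
    """Return the component outline of a finding aid as a list of tuples."""
    root = ET.fromstring(ead_xml)
    return list(walk_components(root.find("archdesc/dsc")))
```

The same traversal could feed an indexer or a display layer, since each component carries its own level, title, and optional link.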

Publishing-related standard: ONIX

• ONIX International (Online Information Exchange):
  – Standard format for publishers to use to distribute electronic information about their publications
  – XML schema with Unicode encoding
  – Based on EPICS (EDItEUR Product Information Communication Standards)
  – Maintenance agency: EDItEUR, working with input from Book Industry Communication (BIC) and the Book Industry Study Group (BISG)
• Value of ONIX:
  – Designed to meet the needs of publishers, jobbers, and retail sellers for:
    • Richer book data online (including cover art)
    • A common data exchange format that will allow players to be rid of the burden of costly, custom programming to handle data from individual suppliers
  – Offers two levels of richness (level 1 & level 2)
• Applications of ONIX:
  – Primarily oriented towards jobbers and publishers
  – Most major players (Amazon, Baker & Taylor, etc.) now using/supporting
  – Some interest in implementation in library systems

Questions & Answers

Links: Major emphasis in this presentation

• MARC 21: http://lcweb.loc.gov/marc/marcdocz.html
• MARCXML: http://www.loc.gov/marc/marcxml.html
• XMLMARC: http://laneweb.stanford.edu:2380/wiki/medlane/xmlmarc
• BiblioML (UNIMARC XML): http://www.culture.fr/BiblioML
• MODS: http://www.loc.gov/standards/mods
• METS: http://www.loc.gov/standards/mets
• MIX: http://www.loc.gov/standards/mix
• TEI: http://www.tei-c.org
• EAD: http://www.loc.gov/ead
• ONIX: http://www.editeur.org/onix.html

Further reading on MARCXML, MODS, METS:
“New Metadata Standards for Digital Resources,” Bulletin of the American Society for Information Science and Technology, Dec/Jan 2003, pp. 12-15. http://www.asis.org/Bulletin/Dec-02/ASISTDecJan.pdf

Links: Also appearing (in Big Picture)

• SCORM: http://www.adlnet.org/index.cfm?fuseaction=scormabt
• RSLP: http://www.ukoln.ac.uk/metadata/rslp
• VRA Core: http://www.vraweb.org/vracore3.htm
• IMS LOM: http://www.imsglobal.org/metadata
• CSDGM: http://www.fgdc.gov/metadata/contstan.html
• GEM: http://www.geminfo.org/Workbench
• CIMI: http://www.cimi.org/old_site/standards