

Page 1: Library of Congress Metadata Landscape Sally H. McCallum smcc@loc.gov

Library of Congress Metadata Landscape

Sally H. McCallum
smcc@loc.gov

Content

Library of Congress perspective on
• Descriptive metadata now
• Descriptive metadata evolution
• Broader metadata concerns

LC metadata needs

Same problem everyone has:
• Many types of resources
  • Books, journals, maps, audio, moving image, still image, artifacts, electronic
• Many possible levels of access
  • Collection, item, analytic, cut, etc.
• Many items
  • 125+ million non-electronic
• Cataloging for electronic resources
  • 3+ million digital resources
• Linking to electronic resources

LC service perspective

• Coherence and consistency (as much as possible)
• Explainable to the end user

Primary access tools at LC

Online catalog            content      tagging
Full level cataloging     AACR         MARC 21
Minimal level             AACR         MARC 21
Initial bib. control      AACR-like    MARC 21

Some collections are represented by collection-level records in the catalog, which connect to finding aid tools

Finding aid tools at LC

Finding aids                         local        EAD
  • Various collections of manuscripts, music, photographs
SONIC catalog                        AACR-like    MARC-like
  • Sound recording collections
PPOC catalog                         AACR         MARC 21
  • Photograph collections
InQuery                              mixed        internal
  • Digital conversion collections
Indexing and abstracting services
  • Serials

Current LC cataloging “feeds”

• Vendor records (MARC 21)
• Copy from OCLC, etc. (MARC 21)
• Publisher records (ONIX)
• Other for special materials
• Metadata in digital objects
• Metadata with digital objects (future)

LC Links to electronic resources

• URIs or equivalent in catalog records and finding aids
• Handle server
• OpenURL (experimentation)

Questions

• Is this coherent and consistent?
• Is it scalable to electronic resources?
• Do all resources need the same kind of treatment?
• How about proliferating metadata schemas?
• How do we maintain evolutionary pathway and standardization?

We see content diversity

Content = the data (title, subject term, etc.)
• AACR data, EAD data, DC data, ONIX data, …
• Different use of content rules: Does MARC main entry = DC creator = ONIX contributor?
• More types – administrative, structural, product data, rights management
• Global library community convergence on AACR for descriptive metadata?

We see markup diversity

Markup = data tagging
• MARC 21 tags, DC tags, ONIX tags, MAB tags, UNIMARC tags
• HTML tags
• EAD DTD tags
• (XML tag sets easy to establish)
• Global library community convergence on MARC 21 and EAD?

We see different structures

Structure = record “arrangement”
• ISO 2709
• Microsoft Access
• DTDs, Schemas
• SGML, XML, HTML, ?ML family
• Convergence on XML, Schemas

And at LC we have

• 13,000,000 MARC 21 bibliographic records in primary catalog
• 5,000,000 MARC 21 name authority records
• 300,000 MARC 21 subject records
• 350 trained catalogers
• Integrated Library System

Descriptive metadata evolution

• Need to take advantage of XML
  • Establish standard MARC 21 in an XML structure
• Need simpler (but compatible) alternatives
  • Development of MODS
• Need interoperability with different schemas
  • Assemble coordinated set of tools
• Need continuity with current data
  • Provide flexible transition options

MARC 21 evolution to XML

MARC 21 (2709)

MARC 21 (2709) records
• Highly developed semantic content
• Installed base of 1000s of MARC 21 systems
• Over 1,000,000,000 MARC 21 records in local and network systems
• Accessible to 100s of Z39.50 clients
• Thousands of librarians who “speak” MARC 21

MARC 21 (2709) record (machine view)

00967cam 2200277 a 4500 001000800000005001700008008004100025020005300229040001800282050002400312082002100336100003000357245007400387260004400461300003500505440001200540500002000552650004200572651002500614

347139419990429094819.1931129s1994 wauab 001 0 eng a 93047676 a0898863872 (acid-free, recycled paper) :c$14.95 aDLCcDLCcDLC 00aGV1046.G3bG47 199400a796.6/4/09432201 aSlavinski, Nadine,d1968-10aGermany by bike :b20 tours geared for discovery /cNadine Slavinski. aSeattle, Wash. :bMountaineers,cc1994. a238 p. :bill., maps ;c22 cm. 0aBy bike aIncludes index. 0aBicycle touringzGermanyxGuidebooks.
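The machine view above can be decoded mechanically. As a minimal sketch, assuming the usual ISO 2709 layout used by MARC 21 (the first five leader characters give the record length, and each directory entry is 12 characters: a 3-character tag, a 4-digit field length, and a 5-digit starting offset), the directory printed on the slide can be split as shown here. The names `LEADER`, `DIRECTORY`, and `parse_directory` are illustrative only. The strings are copied from the slide, where whitespace has been collapsed; the offsets are not all contiguous, which suggests the slide omits some entries.

```python
# Sketch: split an ISO 2709 directory into (tag, field length, start offset).
# Strings are copied from the slide; internal whitespace was collapsed there.
LEADER = "00967cam 2200277 a 4500"
DIRECTORY = (
    "001000800000005001700008008004100025020005300229"
    "040001800282050002400312082002100336100003000357"
    "245007400387260004400461300003500505440001200540"
    "500002000552650004200572651002500614"
)


def parse_directory(directory):
    """Return a list of (tag, length, start) tuples from 12-character entries."""
    if len(directory) % 12 != 0:
        raise ValueError("directory length must be a multiple of 12")
    entries = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3]            # 3-character field tag
        length = int(entry[3:7])    # 4-digit field length
        start = int(entry[7:12])    # 5-digit offset from the base address
        entries.append((tag, length, start))
    return entries


# The first five characters of the leader hold the total record length.
record_length = int(LEADER[0:5])
entries = parse_directory(DIRECTORY)

for tag, length, start in entries:
    print(tag, length, start)
```

Each tuple tells a reader where one field's data sits relative to the base address, which is how a 2709 parser finds fields without scanning the whole record.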

MARCXML - MARC 21 in XML

MARCXML record
• XML exact equivalent of MARC (2709) record
• Lossless/roundtrip conversion to/from MARC 21 record
• Simple flexible XML schema, no need to change when MARC 21 changes
• Presentations using XML stylesheets
• Converters available from LC, open source
• LC using with OAI, METS, ZING
• Adopted by OAI to replace oai_marc

MARC 21 (2709) to MARCXML

<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00967cam 2200277 a 4500</leader>
  <controlfield tag="001">3471394</controlfield>
  <controlfield tag="005">19990429094819.1</controlfield>
  <controlfield tag="008">931129s1994 wauab 001 0 eng </controlfield>
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">0898863872 (acid-free, recycled paper) :</subfield>
    <subfield code="c">$14.95</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">DLC</subfield>
    <subfield code="c">DLC</subfield>
    <subfield code="d">DLC</subfield>
  </datafield>
  <datafield tag="050" ind1="0" ind2="0">
    <subfield code="a">GV1046.G3</subfield>
    <subfield code="b">G47 1994</subfield>
  </datafield>
  <datafield tag="082" ind1="0" ind2="0">
    <subfield code="a">796.6/4/0943</subfield>
    <subfield code="2">20</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Slavinski, Nadine,</subfield>
    <subfield code="d">1968-</subfield>
  </datafield>

MARCXML record (continued)

  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Germany by bike :</subfield>
    <subfield code="b">20 tours geared for discovery /</subfield>
    <subfield code="c">Nadine Slavinski.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Seattle, Wash. :</subfield>
    <subfield code="b">Mountaineers,</subfield>
    <subfield code="c">c1994.</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">238 p. :</subfield>
    <subfield code="b">ill., maps ;</subfield>
    <subfield code="c">22 cm.</subfield>
  </datafield>
  <datafield tag="440" ind1=" " ind2="0">
    <subfield code="a">By bike</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">Includes index.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Bicycle touring</subfield>
    <subfield code="z">Germany</subfield>
    <subfield code="x">Guidebooks.</subfield>
  </datafield>
</record>
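One practical consequence of the MARCXML form is that general-purpose XML tooling can read the record. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; it embeds an abridged copy of the record shown on the slides. The helper `field_text` and the choice of subfields are illustrative assumptions, not part of any LC tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARCXML record on the slide.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Abridged copy of the slide's record, kept small for illustration.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00967cam 2200277 a 4500</leader>
  <controlfield tag="001">3471394</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Slavinski, Nadine,</subfield>
    <subfield code="d">1968-</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Germany by bike :</subfield>
    <subfield code="b">20 tours geared for discovery /</subfield>
    <subfield code="c">Nadine Slavinski.</subfield>
  </datafield>
</record>"""


def field_text(record, tag, codes=None):
    """Join subfields of the first datafield with this tag (hypothetical helper)."""
    field = record.find(f"marc:datafield[@tag='{tag}']", NS)
    if field is None:
        return None
    parts = [
        sf.text or ""
        for sf in field.findall("marc:subfield", NS)
        if codes is None or sf.get("code") in codes
    ]
    return " ".join(parts)


record = ET.fromstring(MARCXML)
control_number = record.find("marc:controlfield[@tag='001']", NS).text
title = field_text(record, "245", codes={"a", "b"})
author = field_text(record, "100")

print(control_number)
print(title)
print(author)
```

The numeric tags and subfield codes stay exactly as in MARC 21, so the lookup logic mirrors how a cataloger would name the field (245 $a $b), only expressed through XML attributes.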

MODS

MODS
• Metadata Object Description Schema – a MARC 21 companion
• Simpler element set than full MARC, but MARC semantics - simplified coded data
• Richer element set than DC
• More compatible with MARC than others
• “Friendly” schema and tagging, no coded values
• Special accommodation of electronic resources

MODS for electronic resources

• Development
  • electronic resources an important target
  • input from several digital library projects
• Xlink attribute throughout
• Related item structure supports hierarchy needed for complex digital objects
• Digital origin attribute
• Several date types specifically for digital projects (e.g., capture)
• E-resource identifiers, e.g., DOI

MODS

LC uses of MODS
• Describing electronic resources
  • AV project, web archiving
• Technician input
  • web archiving
• Incorporation with XML resources
  • METS projects
• OAI collections
  • LC offers MODS, MARCXML, DC simple

MARCXML to MODS

<mods xmlns="http://www.loc.gov/mods/">
  <titleInfo>
    <title>Germany by bike : 20 tours geared for discovery /</title>
  </titleInfo>
  <name type="personal">
    <namePart>Slavinski, Nadine,</namePart>
    <namePart type="date">1968-</namePart>
    <role>creator</role>
  </name>
  <typeOfResource>text</typeOfResource>
  <publicationInfo>
    <placeCode authority="marc">wau</placeCode>
    <place>Seattle, Wash. :</place>
    <publisher>Mountaineers,</publisher>
    <dateIssued>c1994.</dateIssued>
    <dateIssued encoding="marc">1994</dateIssued>
    <issuance>monographic</issuance>
  </publicationInfo>
  <language authority="iso639-2b">eng</language>
  <physicalDescription>
    <extent>238 p. : ill., maps ; 22 cm.</extent>
  </physicalDescription>
  <note type="statement of responsibility">Nadine Slavinski.</note>
  <note>Includes index.</note>

MODS (continued)

  <subject authority="lcsh">
    <topic>Bicycle touring</topic>
    <geographic>Germany</geographic>
    <topic>Guidebooks.</topic>
  </subject>
  <classification authority="lcc">GV1046.G3 G47 1994</classification>
  <classification authority="ddc" edition="20">796.6/4/0943</classification>
  <relatedItem type="series">
    <titleInfo>
      <title>By bike</title>
    </titleInfo>
  </relatedItem>
  <identifier type="isbn">0898863872 (acid-free, recycled paper) :</identifier>
  <identifier type="lccn">93047676</identifier>
  <recordInfo>
    <recordContentSource>DLC</recordContentSource>
    <recordCreationDate encoding="marc">931129</recordCreationDate>
    <recordChangeDate encoding="iso8601">19990429094819.1</recordChangeDate>
    <recordIdentifier>3471394</recordIdentifier>
  </recordInfo>
</mods>

MARCXML and DC

• DC application target – cross domain, metadata in document headers
• Transformation software important to help standardize crosswalks - LC already maintains DC/MARC 21 mapping
• Transformation available from LC, open source
• Offer items for OAI harvesting in DC (MARCXML and MODS)

MARCXML to DC

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Germany by bike : 20 tours geared for discovery </dc:title>
  <dc:creator>Slavinski, Nadine, 1968-</dc:creator>
  <dc:type>text</dc:type>
  <dc:publisher>Seattle, Wash. : Mountaineers,</dc:publisher>
  <dc:date>c1994.</dc:date>
  <dc:language>eng</dc:language>
  <dc:subject>Bicycle touring</dc:subject>
</rdf:Description>
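The idea of a crosswalk can be illustrated with a small table-driven mapping. This is a hedged sketch only: the `CROSSWALK` table is inferred from the example output above and is not LC's maintained mapping or its stylesheet, the `metadata` wrapper element is hypothetical, and the record is held as plain Python tuples with values taken from the slides. Unlike the slide's output, this sketch does not strip trailing ISBD punctuation such as the final slash in the title.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical, simplified mapping inferred from the slide's example:
# (MARC tag, subfield codes to join, DC element name).
CROSSWALK = [
    ("245", "ab", "title"),
    ("100", "ad", "creator"),
    ("260", "ab", "publisher"),
    ("260", "c", "date"),
    ("650", "a", "subject"),
]

# Record held as (tag, [(code, value), ...]) pairs; values are from the slides.
RECORD = [
    ("100", [("a", "Slavinski, Nadine,"), ("d", "1968-")]),
    ("245", [("a", "Germany by bike :"),
             ("b", "20 tours geared for discovery /"),
             ("c", "Nadine Slavinski.")]),
    ("260", [("a", "Seattle, Wash. :"), ("b", "Mountaineers,"), ("c", "c1994.")]),
    ("650", [("a", "Bicycle touring"), ("z", "Germany"), ("x", "Guidebooks.")]),
]


def to_dc(record):
    """Build DC elements from a record using the CROSSWALK table."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for tag, codes, element in CROSSWALK:
        for field_tag, subfields in record:
            if field_tag != tag:
                continue
            values = [value for code, value in subfields if code in codes]
            if values:
                child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
                child.text = " ".join(values)
    return root


dc = to_dc(RECORD)
print(ET.tostring(dc, encoding="unicode"))
```

Keeping the mapping in a table rather than in code is one way to make a crosswalk easy to review and standardize, which is the concern the slide raises.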

MARCXML from ONIX

• Publisher/bookseller record to MARC (2709) via MARCXML
• Complex XML format with
  • traditional descriptive data possibilities
  • potentially useful descriptive data LC does not currently have or supply
  • publisher/bookseller data not of current interest

<imprint>
  <b241>02</b241>
  <b242>Clarion Books</b242>
  <b243>HMCo008</b243>
</imprint>
<b081>Houghton Mifflin Company</b081>
<b209>New York</b209>
<b083>US</b083>
<b003>20021021</b003>
<b087>2002</b087>
<measure>
  <c093>08</c093>
  <c094>0.0</c094>
  <c095>lb</c095>
</measure>
<d101>A little buckaroo is turning two in this birthday book for the very young, the fifth story about the delightful holiday mice. Mischief and near disaster abound when the littlest mouse's sister and brothers throw him a cowboy-themed party. Through simple rhymes and charming illustrations, readers witness the party preparations, the arrival of the guests, the opening of presents, and the blowing out of the candles, as well as the ensuing fulfillment of the little mouse's fondest birthday wish: to be a cowboy.</d101>
<mediafile>
  <f114>04</f114>
  <f115>05</f115>
  <f116>01</f116>
  <f117>ftp://imagesro:[email protected]/low_res/juvenile_jacket_low_res/fall_2002/0618077723.tif</f117>
  <f122>These images may be used only to promote the Houghton Mifflin publications with which they are associated. They may be used only in their entirety without any alteration, other than to change the size of the images. The images must be accompanied by any proprietary notice included therewith.</f122>

Snip from ONIX record

<datafield ind1=" " ind2=" " tag="260">
  <subfield code="a">New York</subfield>
  <subfield code="b">Houghton Mifflin Company</subfield>
  <subfield code="c">2002</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="300">
  <subfield code="a">32 p.</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="521">
  <subfield code="a">Children/juvenile.</subfield>
</datafield>
<datafield ind1="1" ind2=" " tag="700">
  <subfield code="a">Cushman, Doug</subfield>
  <subfield code="e">illustrator</subfield>
</datafield>
<datafield ind1="4" ind2="2" tag="856">
  <subfield code="3">Front cover image</subfield>
  <subfield code="u">ftp://imagesro:[email protected]/low_res/juvenile_jacket_low_res/fall_2002/0618077723.tif</subfield>
  <subfield code="z">These images may be used only to promote the Houghton Mifflin publications with which they are associated. They may be used only in their entirety without any alteration, other than to change the size of the images. The images must be accompanied by any proprietary notice included therewith.</subfield>
</datafield>

Snip from MARCXML from ONIX
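Comparing the two snips suggests how publisher data lands in MARC field 260. The sketch below assumes, purely from the values shown on the slides, that ONIX short tag `b209` feeds 260 $a, `b081` feeds $b, and `b087` feeds $c; the actual LC conversion may use different sources or rules. The `<product>` wrapper and the names `ONIX_TO_260` and `onix_to_260` are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# ONIX short-tag elements copied from the slide, wrapped in a hypothetical
# <product> root so the fragment parses as a single document.
ONIX = """<product>
  <b081>Houghton Mifflin Company</b081>
  <b209>New York</b209>
  <b083>US</b083>
  <b003>20021021</b003>
  <b087>2002</b087>
</product>"""

# Assumed mapping read off the two slides: ONIX short tag -> 260 subfield code.
ONIX_TO_260 = [("b209", "a"), ("b081", "b"), ("b087", "c")]


def onix_to_260(product):
    """Build a MARCXML 260 datafield from ONIX publisher elements."""
    field = ET.Element(
        f"{{{MARC_NS}}}datafield", {"tag": "260", "ind1": " ", "ind2": " "}
    )
    for onix_tag, code in ONIX_TO_260:
        value = product.findtext(onix_tag)
        if value:  # skip elements that are absent or empty
            sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
            sub.text = value
    return field


field_260 = onix_to_260(ET.fromstring(ONIX))
subfields = [(sf.get("code"), sf.text) for sf in field_260]
print(subfields)
```

Elements with no place in the mapping (here `b083` and `b003`) are simply left behind, which matches the slide's point that some publisher data is not of current interest to the catalog record.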

MARCXML – other tools

• Tagging transformations
  • Name instead of number tags?
  • Different language tags for MODS?
  • MARC 21 XML “full” tagging
  • oai_marc to MARCXML
• Character set transformations
• MARCXML to FRBR tool (for experimentation)
• MARC record validation tool

Uses of MARCXML and related tools

• Standardize MARC 21 across community for XML communication and manipulation
• Open MARC 21 to XML programming tools and presentation style sheets
• Standardize MARC 21 for OAI harvesting
• Standardize transformations to and from other standard formats (DC, ONIX, …)
• Basis for evolution while maintaining standardization

Broader metadata needs

• LC descriptions of digitized items include technical and rights data not appropriate for MARC
• Focusing on METS - Metadata Encoding and Transmission Standard
  • Descriptive, administrative, and structural metadata in one XML document

Characteristics of METS

• METS enables resource retrieval, object validation, preservation, rights mgt., ...
• Non-proprietary; being developed by library community
• (relatively) Simple; extensible; modular

METS Schema

METS use

LC
• Moving image project
• Selected digital collections
• Developing a record creation utility

Others
• BnF web archiving and digital preservation
• OCLC web archiving
• NL Wales digital collections
• Harvard audio collection
• Michigan State, Berkeley, etc.

In summary

• LC focuses on AACR, MARC 21, and EAD for primary access
• New development is evolutionary
• Employing XML through MARCXML
• Focus for electronic documents in MODS, a MARC derivative
• For broader metadata, METS and appropriate extension schema

More information

• www.loc.gov/marcxml
• www.loc.gov/mods
• www.loc.gov/marc