Lorcan Dempsey (with contributions from colleagues) VP Research and Chief Strategist Library of Congress, 15 June 2004 OCLC: some development and research directions in the areas of metadata management and knowledge organization. Presented to Library of Congress cataloging managers retreat.


Page 1

OCLC: some development and research directions in the areas of metadata management and knowledge organization.

Lorcan Dempsey (with contributions from colleagues)

VP Research and Chief Strategist

Library of Congress, 15 June 2004

Presented to Library of Congress cataloging managers retreat.

Page 2

Topics

• Framework for WorldCat directions

• Metadata management and knowledge organization

• Working with web services

• Making data work harder

• Some research, some production

• Open WorldCat

Page 3

Framework for WorldCat directions

Page 4

Collections grid

[Figure: grid with two axes, stewardship (low to high) and uniqueness (low to high). Groups shown:]

• Books, Journals, Newspapers, Gov. docs, CD, DVD, Maps, Scores

• Special collections, Archives: Rare books, Local history materials, Archives & Manuscripts, Theses & dissertations

• Research and learning materials: ePrints/tech reports, Learning objects, Courseware, E-portfolios, Research data

• Untransferred records

• Freely-accessible web resources

Page 5

WorldCat – the what?

WorldCat: Grow, Version, Improve

• Easier to use (FRBR)

• Microcontent

• Evaluative content

Add special collections & institutional content to WorldCat: dissertations, cultural heritage collections, Eprints, learning objects

The Open Web: both surface and acquire WorldCat content

Page 6

WorldCat – the how?

[Diagram: WorldCat processing framework. Recoverable labels, grouped as they appear:]

• Prospects: OAI repositories, CONTENTdm, DSpace, ILS, TOCs, Cover art

• Selection: WorldCat Collection Development Policy

• Transmission: OAI, FTP, Tapes, Crawl

• Verification: TEI, DC, M21, EAD, METS, MPEG21

• Validation: validating to a standard such as AACR2

• Collection metadata creation: administrative metadata, service description, content description

• Conversion: DC -> M21, HTML -> DC, M21 -> XML

• Enhancement services: metadata capture, Auto-Dewey, authorities, set holdings

• Micro services

• Composite services: FirstSearch, Group Cat., Connexion, ILL/RM, Collection management, Google/Yahoo/Amazon

• Publish: WorldCat, Open WorldCat

• Users: academic cataloger, public reference librarian, public library patron, web surfer

• Access; Enhance; Ref. stays with items

Research in these areas

Page 7

Some issues

• Metadata variety
  – Encoding, element sets, values/content
  – Provenance

• Metadata manipulation
  – Validation, identification
  – Enhancement, augmentation
  – Relation, FRBR, deduplication
  – Transformation

• Schematization and web services
  – Make data available in forms that allow machine services to be flexibly built on top of them
  – Everything is a service

Page 8

Open WorldCat

Page 9

Open WorldCat

• Facilitate the rendezvous of users and library services on the web

• Surface the library where the users are

• Help release the value of library services in the working and learning lives of their users.

Page 10

Open WorldCat Architecture

[Diagram labels: Content Owner; Metadata; Schemas and Vocabularies; Profiles and Relationships; Aggregators; Portals; Distribution, Search, Display; Access]

• Google, Yahoo and book vendors, organization and presentation: OCLC organizes WorldCat content in a model suitable for harvesting and anticipates unique aspects of various portals.

• OCLC uses a host of authentication and authorization tools to progressively match content to rights.

• OCLC developed geo-locator services to match users to extensive FirstSearch WorldCat institution and user profiles.

• WorldCat: additional collections can be added to the Worldcatlibraries domain.

• OCLC will use tools such as xISBN and FRBR models to organize WorldCat public views suitable for low-precision access.

Page 11

Current partners

• Book vendors and bibliographies: ABE Books, ABAA, Alibris, HCBIB, BookPage

• Search engines (pilot with 2M records exposed as web pages for harvesting): Google, Yahoo!

Try a search for: A history of caricature and grotesque in literature and art

Page 12

Google and Yahoo! timeline

• 8/14/03: Google contract signed

• 9/19/03: Google given go-ahead to harvest records

• 10/22/03: Google harvests 150,000 records

• Dec. ’03: Records begin to appear in Google; 800 inbound links logged (search-site-originating [SSO])

• Jan. ’04: 32,000 inbound links logged (SSO)

• Mar. ’04: 109,000 inbound links logged (SSO)

• 5/21/04: Yahoo contract signed

• 5/28/04: Yahoo harvests records

• May ’04: 725,000 inbound links logged (SSO)

• 6/6/04: Yahoo completes indexing of 2 million WC records

Page 13

Traffic

Search Engine History (full record displays; * June figure is projected)

Month   Full record displays
Dec     800
Jan     32,064
Feb     42,659
Mar     108,971
Apr     315,988
May     725,545
Jun*    2,452,521

[Pie chart: Off Click Dispersion. Categories: Full Text, ILL Request Form, Library Information Page, Library's Map Page, netLibrary, OPAC Links, OpenURL Resolver. Shares as extracted: 17%, 1%, 7%, 4%, 0%, 69%, 2%.]

Page 14

Metadata management and knowledge organization

Page 15

Research activities

• Structures
  – FRBR
  – VIAF
  – FAST
  – Vocabulary encoding and mappings

• Services
  – xISBN
  – Metadata transformation services
  – Terminology services
  – Authority services
  – Automatic classification and cataloging: Eprints UK, Web harvesting

Page 16

FRBR

• OR Work-set algorithm

• Work-based view incorporated into WorldCat in FirstSearch in late 2004

• FictionFinder
  – 2.6+ million fiction records from WorldCat, clustered by OCLC’s FRBR algorithm
  – Make greater use of data (genres, settings, imaginary characters, etc.)

• Participate in ongoing FRBR refinement
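The clustering idea behind a work-based view can be sketched roughly as follows. This is only an illustration, assuming a simple normalized author/title key; the slides do not describe how OCLC's work-set algorithm builds its keys, and the function names and sample records here are made up.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(record: dict) -> str:
    """Hypothetical grouping key: normalized author plus normalized title."""
    return f"{normalize(record['author'])}/{normalize(record['title'])}"


def cluster_into_work_sets(records: list) -> dict:
    """Group bibliographic records that share the same work key."""
    work_sets = defaultdict(list)
    for record in records:
        work_sets[work_key(record)].append(record)
    return dict(work_sets)


# Made-up sample records, for illustration only.
records = [
    {"id": "r1", "author": "Austen, Jane", "title": "Pride and Prejudice"},
    {"id": "r2", "author": "AUSTEN, JANE.", "title": "Pride and prejudice."},
    {"id": "r3", "author": "Austen, Jane", "title": "Emma"},
]
work_sets = cluster_into_work_sets(records)
for key, members in work_sets.items():
    print(key, [m["id"] for m in members])
```

Records r1 and r2 differ only in case and punctuation, so they fall into one work set, while r3 forms its own. A production algorithm would need to handle uniform titles, translations and authority-controlled names, which this sketch ignores.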

Page 17

FAST

Page 18

Vocabulary mappings

Page 19

Services

• Web services
  – Computer to computer applications over the web

• Unplug and play
  – Unbundling monolithic applications and making functionality available in more modular ways

• Reuse and sharing
  – Of services!

• Release the value in a web environment of the historical library investment in vocabularies and structures

Page 20

xISBN

• An experimental web service
  – Leverages FRBRization work
  – Give it an ISBN, it returns all related ISBNs
  – Based on WorldCat
  – Designed for machine-to-machine data exchange

• Examples:
  – Check user ILL requests against all editions/versions in OPAC
  – Find library’s editions when user finds any edition/version of item on Amazon
  – Check OPAC for all editions during selection/acquisitions/gift book processing
  – …
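The first example, checking an ILL request against every edition held locally, might look something like the sketch below. It does not call the xISBN service; a local list of ISBN groups stands in for whatever the service would return, and all names and identifiers are placeholders rather than real ISBNs or a real API.

```python
def related_isbns(isbn: str, work_sets: list) -> set:
    """Return every ISBN grouped with `isbn`, including `isbn` itself."""
    for group in work_sets:
        if isbn in group:
            return set(group)
    # Unknown ISBN: treat it as a work set of one.
    return {isbn}


def held_editions(requested_isbn: str, work_sets: list, opac_holdings: set) -> set:
    """ISBNs of any edition of the requested work that the library already holds."""
    return related_isbns(requested_isbn, work_sets) & opac_holdings


# Placeholder identifiers standing in for a related-ISBN response and an OPAC export.
WORK_SETS = [{"isbn-a1", "isbn-a2", "isbn-a3"}, {"isbn-b1"}]
OPAC_HOLDINGS = {"isbn-a3", "isbn-c9"}

# The patron asked for edition a1; the library holds edition a3 of the same work.
print(held_editions("isbn-a1", WORK_SETS, OPAC_HOLDINGS))
```

A non-empty result suggests the request could be filled from the local collection instead of going out on interlibrary loan.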

Page 21

xISBN

Click cover to search amazon.co.uk

Click cover to search Seattle Public Library

Install FRBR Bookmarklets in your browser to see xISBN working. See Bookmarklets page at www.oclc.org/research/researchworks/

Page 22

Metadata schema transformations

• Metadata Schema Transformation Services
  – Evaluate approaches to crosswalking metadata
  – Prototype transformation environments

• The XSLT “short path”
  – Supports lightweight XML processing
  – Designed for public access
  – Deliverables: OAI repository of METS-captured xwalks [NEW]

• The “long path” option
  – Designed for high-fidelity translations
  – May be public or proprietary
  – Deliverables: Toolkit; expertise in non-MARC formats

Page 23

[Diagram: translation from format X to format Y through a common CORE. Stages in order:]

• File of records in format X

• STRUCTURAL TRANSFORM: transform to intermediate form

• SEMANTIC TRANSLATION: translate input semantics to CORE

• CORE

• SEMANTIC TRANSLATION: translate CORE to output semantics

• STRUCTURAL TRANSFORM: transform to output format Y

• File of records in format Y
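The separation of structural transforms from semantic translations can be sketched in a few lines. Everything concrete here is an assumption made for illustration: format X is taken to be "tag=value" lines, format Y is JSON, and the CORE element names and both mapping tables are invented.

```python
import json

# Hypothetical semantic mappings into and out of an invented CORE element set.
X_TO_CORE = {"ti": "title", "au": "creator", "dt": "date"}
CORE_TO_Y = {"title": "name", "creator": "author", "date": "issued"}


def parse_x(text: str) -> dict:
    """Structural transform: format X text to an intermediate dict."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {tag.strip(): value.strip() for tag, value in pairs}


def translate(record: dict, mapping: dict) -> dict:
    """Semantic translation: rename elements, dropping any without a mapping."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def serialize_y(record: dict) -> str:
    """Structural transform: dict to output format Y (JSON in this sketch)."""
    return json.dumps(record, sort_keys=True)


def crosswalk(text: str) -> str:
    intermediate = parse_x(text)                # structural transform in
    core = translate(intermediate, X_TO_CORE)   # input semantics to CORE
    y_record = translate(core, CORE_TO_Y)       # CORE to output semantics
    return serialize_y(y_record)                # structural transform out


print(crosswalk("ti=An example title\nau=Example, Author\nxx=ignored"))
```

Because each format only needs a mapping to and from CORE, adding a new format means writing one pair of mappings rather than one per existing format.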

Page 24

A crosswalk as a METS record

• Describe the crosswalk object in the METS header.

• Assemble and identify six objects in the METS structural map:
  – The source metadata schema
  – The target metadata schema
  – The crosswalk
  – Human-readable and executable versions of each

• Associate metadata for each file in the METS Descriptive Metadata Section.
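A rough sketch of how such a record could be assembled is shown below. It uses standard METS element names (metsHdr, dmdSec, fileSec, structMap) but is not validated against the METS schema, and the file identifiers, labels and example.org URLs are placeholders rather than anything from the OCLC repository.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_crosswalk_mets(label: str, files: list) -> ET.Element:
    """files: list of (file_id, description, href) tuples, one per object."""
    root = ET.Element(m("mets"), LABEL=label)
    ET.SubElement(root, m("metsHdr"))  # the header describes the crosswalk object
    # One descriptive metadata section per file.
    for file_id, description, _ in files:
        dmd = ET.SubElement(root, m("dmdSec"), ID=f"dmd-{file_id}")
        wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="DC")
        xml_data = ET.SubElement(wrap, m("xmlData"))
        ET.SubElement(xml_data, f"{{{DC_NS}}}description").text = description
    # File section pointing at each object.
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    for file_id, _, href in files:
        file_el = ET.SubElement(file_grp, m("file"), ID=file_id, DMDID=f"dmd-{file_id}")
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
    # Structural map identifying the six objects.
    top = ET.SubElement(ET.SubElement(root, m("structMap")), m("div"), LABEL=label)
    for file_id, description, _ in files:
        div = ET.SubElement(top, m("div"), LABEL=description)
        ET.SubElement(div, m("fptr"), FILEID=file_id)
    return root


# Placeholder objects: human-readable and executable versions of each component.
FILES = [
    ("src-doc", "Source schema (human-readable)", "https://example.org/source.html"),
    ("src-xsd", "Source schema (executable)", "https://example.org/source.xsd"),
    ("tgt-doc", "Target schema (human-readable)", "https://example.org/target.html"),
    ("tgt-xsd", "Target schema (executable)", "https://example.org/target.xsd"),
    ("xw-doc", "Crosswalk (human-readable)", "https://example.org/xwalk.html"),
    ("xw-xsl", "Crosswalk (executable)", "https://example.org/xwalk.xsl"),
]
record = build_crosswalk_mets("Example crosswalk", FILES)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the six objects in one record is what lets a harvester fetch both the documentation and the executable stylesheet from a single identifier.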

Page 25

Crosswalk METS record in OAI repository

Page 28

What the METS encoding solves

• The semantic and syntactic information required for interpreting and executing a crosswalk is collected into a single object.

• The repository is searchable by humans and automated processes.

• Services can be built on top of it.

• It encourages the development and standardization of crosswalks.

These outcomes are possible because every component in the system is a standard.

Page 29

Terminology Services

• Terminology services are web services for knowledge organization schemes (KOS)
  – e.g., authority files, subject heading systems, thesauri, taxonomies, and classification schemes

• A web service that provides mappings from a term in one vocabulary to one or more terms in another vocabulary is an example of a terminology service
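The mapping example can be pictured as a lookup from a term in one vocabulary to zero or more terms in another. The sketch below keeps the mappings in memory; the class, vocabulary names and terms are invented for illustration and do not describe OCLC's service interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    vocabulary: str
    label: str


class MappingService:
    """In-memory stand-in for a term-mapping terminology service."""

    def __init__(self) -> None:
        self._mappings = {}

    def add_mapping(self, source: Term, target: Term) -> None:
        self._mappings.setdefault(source, []).append(target)

    def map_term(self, source: Term, target_vocabulary: str) -> list:
        """Return the terms in `target_vocabulary` mapped from `source`."""
        return [
            term
            for term in self._mappings.get(source, [])
            if term.vocabulary == target_vocabulary
        ]


# Invented vocabularies and terms.
service = MappingService()
cookery = Term("vocab-a", "Cookery")
service.add_mapping(cookery, Term("vocab-b", "Cooking"))
service.add_mapping(cookery, Term("vocab-b", "Recipes"))
service.add_mapping(cookery, Term("vocab-c", "641.5"))

print([term.label for term in service.map_term(cookery, "vocab-b")])
```

Exposed over the web, the same lookup would let a catalog or search interface translate a user's term into the vocabulary a target collection was indexed with.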

Page 30

Current Situation

• A plethora of vocabularies

• Many encoding formats

• Few inter-vocabulary connections

• Identifiers inadequate
  – Unavailable
  – Temporary
  – Inconsistent

Page 31

Terminology services system framework

• Schema transformation:
  – MARC XML
  – SKOS
  – Zthes

• Record enhancement:
  – Inter-vocabulary mappings
  – Persistent identifiers (info:uri)

• Access:
  – Human-readable: browse interface (ERRoLs); search/retrieve records (SRU/W); switch between schema-specific views (XSLT)
  – m2m: publishing (OAI); search/retrieve records (SRU/W); info:uri resolution (OpenURL)

• Open standards:
  – MARC 21
  – XML/XSLT/XPath
  – SKOS
  – Zthes
  – SRU/SRW
  – OAI
  – info:uri
  – OpenURL

• Open source software:
  – OCLC OAICat
  – OCLC SRU/SRW server
  – OCLC ERRoL J2EE webapp

• Open content:
  – GSAFD, others…

• Open access

• Web services-oriented

Page 32

Schema Transformation

• MARC XML
  – Authority Format & Classification Format

• SKOS
  – Simple Knowledge Organization Systems

• Zthes
  – Z39.50 Profile for Thesaurus Navigation.5
  – Based on Z39.19 (NISO Thesaurus Standard)

Page 33

Vocabulary Processing

[Diagram: processing flow from Vocabulary X to Vocabulary Y. Stages:]

• Conversion from most formats: Z39.19; wordlists in PDF, etc.

• Initial conversion to MARC XML: Authorities format, or Classification format

• Data enhancement
  – Add: provenance (MARC Org. Codes); persistent identifiers (info:kos)
  – Optionally, add: inter-vocabulary mappings

• Schema transformation: Zthes, SKOS
  – Concepts & terms
  – Persistent identifiers (info:kos)

Page 34

Info:kos

• Info:uri
  – provides a mechanism for the registration of public namespaces that are used for the identification of information assets

• The kos identifier
  – provides a mechanism for identifying knowledge organization schemes and the concepts used in those schemes. It has two elements: scheme, concept
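Splitting such an identifier into its two elements might be done along these lines. The layout `info:kos/<scheme>/<concept>` is an assumption made for this sketch (the slide names the two elements but does not show the syntax), and the example values are placeholders.

```python
from typing import NamedTuple


class KosIdentifier(NamedTuple):
    scheme: str
    concept: str


def parse_kos(uri: str) -> KosIdentifier:
    """Split an identifier of the assumed form info:kos/<scheme>/<concept>."""
    prefix = "info:kos/"
    if not uri.startswith(prefix):
        raise ValueError(f"not an info:kos identifier: {uri!r}")
    scheme, separator, concept = uri[len(prefix):].partition("/")
    if not scheme or not separator or not concept:
        raise ValueError(f"expected both a scheme and a concept in {uri!r}")
    return KosIdentifier(scheme, concept)


# Placeholder scheme and concept values.
identifier = parse_kos("info:kos/example-scheme/c123")
print(identifier.scheme, identifier.concept)
```

Having the scheme as a separate element is what lets a resolver route the identifier to the right vocabulary before looking up the concept.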

Page 35

New services environment

• http://errol.oclc.org [OpenURL base URL]

• http://errol.oclc.org/xyz.search [SRU-to-HTML gateway]

• http://errol.oclc.org/xyz.html [HTML interface]

• http://errol.oclc.org/xyz.rss [RSS feed]

• http://errol.oclc.org/xyz.sru [SRU gateway]

• http://errol.oclc.org/xyz.srw2oai [OAI gateway]

• http://alcme.oclc.org/srw/ [SRW request]

[Diagram labels: server (info:uri resolver); server; DC, SKOS, Zthes; SRW/SRU response; ERRoLs server stylesheets applied]

Page 41

Name authority lookup

• Interactive

• As a web service

• An example: authority control service invoked from within DSpace

[Example lookup: Lorcan Dempsey]

Page 46

Working with web services

Page 47

Making data work harder

Page 48

Data mining

• Research

• Production
  – Collection analysis service in development phase
  – Leverages WorldCat data in interactive mode: compare my collection to my peers; compare my collection to my neighbors; profile my collection by subject, by age, … etc.
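A "compare my collection to my peers" report reduces to set arithmetic over holdings. The sketch below assumes each collection is available as a set of title identifiers; the identifiers and peer names are placeholders, and this is not a description of how the OCLC service is built.

```python
def compare_collections(mine: set, peers: dict) -> dict:
    """For each peer, count shared titles and titles unique to either side."""
    report = {}
    for name, holdings in peers.items():
        report[name] = {
            "overlap": len(mine & holdings),
            "only_mine": len(mine - holdings),
            "only_peer": len(holdings - mine),
        }
    return report


# Placeholder title identifiers.
MY_COLLECTION = {"t1", "t2", "t3", "t4"}
PEERS = {
    "peer-a": {"t1", "t2", "t9"},
    "peer-b": {"t4", "t5", "t6", "t7"},
}

report = compare_collections(MY_COLLECTION, PEERS)
for peer, counts in report.items():
    print(peer, counts)
```

The "only_peer" count points at possible gaps and the "overlap" count at duplication, which are the two questions the collection slides raise.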

Page 49

Collection

• Change creates demand for better data.

• Growing interest in knowing more about:
  – Characteristics
  – Gaps and overlaps
  – Use

• Tuning collections based on data.

• Focus collection spending where it creates most value.

Page 50

Some projects

• Characteristics of collections
  – WorldCat
  – CIC

• Compare ILL, circulation and holdings data.

• Last copy: what is irreplaceable?

• ARL Global Resources.
  – Exploring coverage of overseas titles in ARL libraries.

• Depends on consistency, coverage, currency

Page 51

Comparing CIC Collection Profiles

Page 52

Audience level

[Figure labels: Forge, Letters]

Page 53

Profiles of ‘Letters’ & ‘Forge’ Example

[Chart: percent of holdings (0% to 80%) by library type (ARL, Academic, Public, School) for two titles, "Letters of …" and "Forge of Liberty". Values shown: 0.81, 0.65]

Page 54

Topics

• Framework for WorldCat directions

• Metadata management and knowledge organization

• Working with web services

• Making data work harder

• Some research, some production

• Open WorldCat

Page 55

Thoughts

• Machines will do more work
  – Consistency becomes more important

• Variety

• Low precision
  – Make data work

Page 56

The pattern is new …

The knowledge imposes a pattern and falsifies

For the pattern is new in every moment

Page 57

Further information

Thanks to colleagues in OCLC Research for contributions to this presentation. Further information about OCLC Research projects can be found at http://www.oclc.org/research/

Thanks to colleagues in OCLC Collection Management Services for contributions to this presentation. Further information about Open WorldCat at http://www.oclc.org/worldcat/pilot/