Dublin Core conference 2009, Seoul, Oct 2009
The Open Archives Initiative Protocol for Metadata Harvesting
Muriel Foulonneau
Tudor Research Centre
The protocol was born:
To create a minimal layer of interoperability between distributed repositories of scientific publications
As an alternative to federated search
To network digital repositories
“OAI divides the world between data providers and service providers”
Sharing metadata: data aggregation
The portal gathers metadata and implements its own retrieval system
E.g. search engines, union catalogs, OAI
The OAI framework
[Figure: a service provider with a harvester and a portal interface, an aggregator, and several data providers, each with its own repository]
Mechanisms to transfer large datasets: resumption tokens, incremental harvesting
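The resumption-token mechanism can be sketched as a harvesting loop. This is a minimal Python sketch, not a production harvester: it assumes a caller-supplied `fetch(url)` function that returns the response XML as text (a hypothetical hook, for instance a wrapper around `urllib.request.urlopen`), and it leaves out error handling and retries. As in the OAI-PMH 2.0 specification, a follow-up request carries only `verb` and `resumptionToken`, and a missing or empty token ends the list.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every <record> element of a ListRecords result, page by page.

    `fetch` is a caller-supplied (hypothetical) function that takes a request
    URL and returns the XML text of the response.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        page = root.find(OAI_NS + "ListRecords")
        if page is None:
            # No ListRecords element, e.g. an <error code="noRecordsMatch"/> reply.
            return
        yield from page.findall(OAI_NS + "record")
        token = page.find(OAI_NS + "resumptionToken")
        # A missing or empty resumptionToken means the list is complete.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Because `harvest` is a generator, a service provider can store each page of records as it arrives instead of holding the whole dataset in memory.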
Incremental harvest
The harvester asks the data providers: what's new since the last time I came?
• New or modified records
• Deleted records
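In protocol terms, the harvester passes the datestamp of its previous harvest as the `from` argument, and the repository flags deleted records with `status="deleted"` in the record header. The minimal Python sketch below assumes a local `{identifier: datestamp}` dictionary standing in for whatever store a real service provider keeps (an illustrative assumption); deletions are only reported by repositories whose Identify response declares `deletedRecord` as `transient` or `persistent`.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def incremental_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """ListRecords request limited to records changed since `last_harvest`.

    `last_harvest` is a UTC datestamp such as "2009-10-01"; a finer value like
    "2009-10-01T12:00:00Z" only works if the repository's Identify response
    declares second-level granularity.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix,
              "from": last_harvest}
    return base_url + "?" + urllib.parse.urlencode(params)


def apply_changes(local_copy, response_xml):
    """Merge one response page into a local {identifier: datestamp} copy."""
    root = ET.fromstring(response_xml)
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier")
        if header.get("status") == "deleted":
            # Deleted record: only its header is sent, so drop the local entry.
            local_copy.pop(identifier, None)
        else:
            # New or modified record: insert or overwrite.
            local_copy[identifier] = header.findtext(OAI_NS + "datestamp")
    return local_copy
```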
Multiple representations of an object
The same record, oai:lcoa1.loc.gov:loc.pnp/cph.3b23005 ([School of arts for girls Kiz Sanayi Mektebi]), is available in:
Dublin Core
MARC21
MODS
OAI repositories can be organized in sets
What do sets represent?
Journals: issues
Institutional repositories:
Departments, research centers, etc.
EPrint archives: subject, publication status
Cultural heritage repositories: collections with intent
Set representations may be constrained by the software package used.
Sets enable selective harvesting
Sets can overlap: one item can belong to several sets
Sets can be described (e.g. with DC or DC Collection)
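Sets surface in three places of the protocol: a ListSets response names them, each record header lists every `setSpec` the item belongs to (which is how overlap shows up), and the `set` argument restricts a list request to one set. A minimal Python sketch of all three, using only the standard library; the function names are illustrative, not part of any OAI toolkit.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_sets(list_sets_xml):
    """Return {setSpec: setName} from a ListSets response."""
    root = ET.fromstring(list_sets_xml)
    return {s.findtext(OAI_NS + "setSpec"): s.findtext(OAI_NS + "setName")
            for s in root.iter(OAI_NS + "set")}


def sets_by_record(response_xml):
    """Return {identifier: [setSpec, ...]}; one item may list several sets."""
    root = ET.fromstring(response_xml)
    return {h.findtext(OAI_NS + "identifier"):
                [s.text for s in h.findall(OAI_NS + "setSpec")]
            for h in root.iter(OAI_NS + "header")}


def selective_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """ListRecords request restricted to one set (selective harvesting)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix,
              "set": set_spec}
    return base_url + "?" + urllib.parse.urlencode(params)
```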
OAI-PMH supports six verbs
Identify: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
ListSets: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets
ListRecords: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
ListMetadataFormats: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats
ListIdentifiers: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc
GetRecord: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
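The six verbs differ in which arguments they accept. Below is a minimal Python sketch of a request builder, with the argument table transcribed from the OAI-PMH 2.0 specification; the `oai_url` name and the choice to raise `ValueError` messages named after the protocol's `badVerb` and `badArgument` error codes are illustrative assumptions. Arguments are passed as a dictionary because `from` is a reserved word in Python.

```python
import urllib.parse

# Required and optional arguments of each verb in OAI-PMH 2.0.
VERBS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}
# Verbs whose lists may be split and resumed with a resumptionToken.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def oai_url(base_url, verb, args=None):
    """Build a GET request URL, rejecting argument sets the protocol forbids."""
    args = dict(args or {})
    if verb not in VERBS:
        raise ValueError(f"badVerb: {verb}")
    if "resumptionToken" in args:
        # resumptionToken is exclusive: it must be the only argument besides verb.
        if verb not in RESUMABLE or len(args) != 1:
            raise ValueError("badArgument: resumptionToken must be used alone")
    else:
        required, optional = VERBS[verb]
        given = set(args)
        missing = required - given
        unexpected = given - required - optional
        if missing or unexpected:
            raise ValueError(f"badArgument: missing {sorted(missing)}, "
                             f"unexpected {sorted(unexpected)}")
    # urlencode percent-escapes reserved characters such as ":" in identifiers.
    return base_url + "?" + urllib.parse.urlencode({"verb": verb, **args})
```

For example, `oai_url(base, "ListRecords", {"metadataPrefix": "oai_dc", "from": "2009-10-01"})` builds an incremental request, while omitting `metadataPrefix` is rejected before any request is sent.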
An OAI response
<record>
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
The optional about section is often not used; it can, e.g., state rights on the metadata record
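For a harvester, the record above boils down to a header (identifier, datestamp, sets) plus a list of values per DC element. A minimal Python sketch with the standard `xml.etree.ElementTree` module; it assumes the `<record>` element sits in the OAI-PMH namespace, as it does in a complete response (the excerpt above omits the enclosing `<OAI-PMH>` element that declares it). The output dictionary layout is an illustrative choice.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}


def parse_record(record):
    """Flatten one OAI <record> element carrying oai_dc metadata into a dict."""
    header = record.find("oai:header", NS)
    dc = {}
    # Every child of <oai_dc:dc> is a Dublin Core element; all are repeatable.
    for element in record.iterfind("oai:metadata/oai_dc:dc/*", NS):
        name = element.tag.split("}", 1)[1]  # "{...elements/1.1/}creator" -> "creator"
        dc.setdefault(name, []).append(element.text)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "deleted": header.get("status") == "deleted",
        "dc": dc,
    }
```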
Examples of repositories
Library of Congress
http://memory.loc.gov/cgi-bin/oai2_0
ContentDM at UIUC
http://images.library.uiuc.edu:8081/cgi-bin/oai.exe
Ohio State Knowledge Bank
https://kb.osu.edu/dspace-oai/request
PictureAustralia
Aggregates metadata from large institutions, uses web crawling for small ones and Flickr for individuals
“Using OAI has the advantage that only new and changed records need to be harvested, while for web crawl harvesting all records have to be re-harvested each time a harvest is run.”
http://www.pictureaustralia.org/schemas/pa/index.html
DRIVER – aggregation as an infrastructure
Europeana
IVOA – synchronization of service repositories
Turnkey systems
ContentDM: http://contentdm.com/
Digitool: http://www.exlibrisgroup.com/digitool.htm
DSpace: http://www.dspace.org/
EPrints: http://software.eprints.org/
Metadata formats
DC, QDC, ETDMS, MODS, MARC, EAD, …
Each format requires an XML schema
Most implementations only use simple DC
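A harvester discovers which formats a repository offers with the ListMetadataFormats verb; each entry names a `metadataPrefix` and the XML schema its records follow. Since every OAI-PMH repository must support simple Dublin Core (`oai_dc`), a harvester can prefer a richer format and fall back on DC. A minimal Python sketch; the preference order is a hypothetical harvester policy, and prefixes other than `oai_dc` (such as `mods`) are chosen by each repository.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_formats(list_formats_xml):
    """Return {metadataPrefix: schema URL} from a ListMetadataFormats response."""
    root = ET.fromstring(list_formats_xml)
    return {f.findtext(OAI_NS + "metadataPrefix"): f.findtext(OAI_NS + "schema")
            for f in root.iter(OAI_NS + "metadataFormat")}


def choose_prefix(formats, preferred=("mods", "marc21", "oai_dc")):
    """Pick the first preferred prefix the repository offers.

    The preference order is a hypothetical policy; oai_dc is mandatory in
    OAI-PMH, so it is the fallback.
    """
    for prefix in preferred:
        if prefix in formats:
            return prefix
    return "oai_dc"
```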
Examples of values found in dc:date
September 29–October 28, 51 AD; 1970
second half of IXth century AD; 1978
Rebuilt 1984
Possibly Vth/VIth century AD; 1935
Planted 1985
n/a
n.d.
Mid IInd century AD; 1973
Jul-51
circa 900 AD
ca. 701 BC
Begun 14th century
184-?
1839
18–?
August 23, 2000
between 1827 and 183
VIIIth/IXth century AD ? (TC);1965
Vth-VIth century AD (McNamee); IVth century AD (Cribiore); 1982
XVIII Dynasty
Winter 2003
era of redevelopment
various
2002-00
1980, refurbished 1997
China: Neolithic Period (5000 BCE-ca 1600 BCE)?
1969
1968
21. Nouemb. Anno. 1564.
And finisshed on the euen of thanunciacion of our said bilissid Lady falling on the wednesday the xxiiij daye of Marche. in the xix yeer of Kyng Edwarde the fourthe[1479]]
19193xxxx Oct xx
Various
1938-05-38
1963 to 1953
[not after 1579]
163[5?]
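DCMI recommends encoding dc:date with a scheme such as W3CDTF, a profile of ISO 8601. A minimal Python sketch of a harvester-side check that separates the values a machine can sort and compare from those only a human can read; it deliberately does not try to normalize free text such as "circa 900 AD", and the function name is illustrative.

```python
import re
from datetime import datetime

# W3CDTF: YYYY, YYYY-MM or YYYY-MM-DD, optionally followed by a time,
# which must then carry a time zone designator (Z or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?")
DATE_FORMATS = {4: "%Y", 7: "%Y-%m", 10: "%Y-%m-%d"}


def is_w3cdtf(value):
    """True if a dc:date value is machine-readable W3CDTF.

    Only the date part is checked against the calendar; the time part is
    checked for shape, not for valid hour and minute ranges.
    """
    value = value.strip()
    if not W3CDTF.fullmatch(value):
        return False
    date_part = value[:10]
    try:
        datetime.strptime(date_part, DATE_FORMATS[len(date_part)])
    except ValueError:  # e.g. month 00 or day 38
        return False
    return True
```

Applied to the list above, only plain years such as "1839" pass; "1938-05-38" and "2002-00" have the right shape but are rejected by the calendar check.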
Who is metadata made for?
Machine: dc:type “Text.Correspondence.Letter”; dc:language “wln”
Human: dc:type “Correspondence”; dc:language “wallon”
Who knows? dc:date “197-”; dc:description “First ed. Cf. BM.”
Improving quality
Quality certificates for open access repositories: DINI (Deutsche Initiative für Netzwerkinformation, German Initiative for Network Information)
Best practices for OAI and shareable metadata by the Digital Library Federation and the National Science Digital Library
http://www.diglib.org/pubs/dlf108.pdf
Meeting with software providers
Test environment (e.g. Europeana)
Community guidelines
Conclusion
Has the protocol “crossed the chasm”?
The objective is to create a network of repositories rather than networking individual resources
Lack of a specific mechanism to relate resources to each other
Approaches: linked data and OAI-ORE
References
OAI-PMH: http://www.openarchives.org/pmh/
Best practices for OAI and shareable metadata: http://www.diglib.org/pubs/dlf108.pdf
Tim Cole and Muriel Foulonneau, Using the Open Archives Initiative Protocol for Metadata Harvesting, Libraries Unlimited, 2007
Muriel Foulonneau and Jenn Riley, Metadata for Digital Resources, Chandos Publishing, 2008