OAI Protocol for Metadata Harvesting
Tim Brody, Intelligence, Agents, Multimedia Group, University of Southampton. OpCit – http://opcit.eprints.org/ www.ecs.soton.ac.uk. BCS Metadata Meeting, London, 29th May 2002. (Many slides borrowed from Michael L. Nelson.)
![Page 1: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/1.jpg)
OAI Protocol for Metadata Harvesting
Tim Brody
Intelligence, Agents, Multimedia Group
University of Southampton
OpCit – http://opcit.eprints.org/
www.ecs.soton.ac.uk
BCS Metadata Meeting, London, 29th May 2002
(Many slides borrowed from Michael L. Nelson)
![Page 2: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/2.jpg)
OAI 2.0
• Public, stable version not released yet … (but very close)
– Beta released mid-May
– Public release scheduled: 1st June
• 2.0 implementations in the pipeline
– British Library, Cornell Univ, Ex Libris, my.OAI, Humboldt Univ, InQuirion Pty Ltd, Library of Congress, NASA, OCLC, Old Dominion Univ, U. of Illinois, U. of Southampton, UCLA, Johns Hopkins U., Indiana U., NYU, UKOLN, Virginia Tech
![Page 3: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/3.jpg)
Open Archives Initiative
The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!)
Archive defined as a “collection of stuff” -- not the archivist’s definition of “archive”. “Repository” used in most OAI documents.
OAI is happening at break-neck speed...
![Page 4: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/4.jpg)
Metadata Harvesting
• Move away from distributed searching
• Extract metadata from various sources
• Build services on local copies of metadata
– Resources remain at remote repositories

[Diagram: a user searches for “cfd applications” against a local copy of metadata. The metadata is harvested offline from several nodes, each independently maintained. All searching, browsing, etc. is performed on the local metadata; individual nodes can still support direct user interaction.]
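The harvesting model on this slide can be sketched in a few lines of Python: metadata copied from several repositories is indexed locally, and the search never touches the remote nodes. This is only an illustration under stated assumptions. The repository names, identifiers, and titles are made up, and `build_index` and `search` are hypothetical helpers, not part of any OAI tool.

```python
def build_index(harvested):
    """Build a word -> {(repository, identifier)} index from harvested metadata.

    `harvested` maps a repository name to a list of (identifier, title) pairs.
    """
    index = {}
    for repository, records in harvested.items():
        for identifier, title in records:
            for word in title.lower().split():
                index.setdefault(word, set()).add((repository, identifier))
    return index


def search(index, query):
    """Return the records whose title contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Hypothetical metadata, as if harvested offline from two repositories.
HARVESTED = {
    "repoA": [
        ("oai:repoA:1", "CFD applications in aerospace"),
        ("oai:repoA:2", "Metadata harvesting"),
    ],
    "repoB": [
        ("oai:repoB:1", "Parallel CFD applications"),
    ],
}

INDEX = build_index(HARVESTED)
RESULTS = search(INDEX, "cfd applications")
```

The search result only carries identifiers; the resources themselves stay at the remote repositories, as the slide notes.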
![Page 5: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/5.jpg)
Metadata Harvesting
• Repositories (archives etc.) = low implementation cost
• Services = higher implementation cost
• Similar to web search model
– DP9 gateway makes it exactly the same
![Page 6: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/6.jpg)
| | Santa Fe convention | OAI-PMH v.1.0/1.1 | OAI-PMH v.2.0 |
|---|---|---|---|
| about | eprints | document-like objects | resources |
| metadata | OAMS | unqualified Dublin Core | unqualified Dublin Core |
| transport | HTTP | HTTP | HTTP |
| responses | XML | XML | XML |
| requests | HTTP GET/POST | HTTP GET/POST | HTTP GET/POST |
| verbs | Dienst | OAI-PMH | OAI-PMH |
| nature | experimental | experimental | stable |
| model | metadata harvesting | metadata harvesting | metadata harvesting |
![Page 7: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/7.jpg)
OAI-PMH v.2.0 [06/2002]
• Goal: recurrent exchange of metadata about resources between systems
• Input:
– OAI-PMH v.1.0 [01/01 – 09/02]
– feedback on OAI-implementers
– deliberations by OAI-tech [09/01 -]
– alpha test group of OAI-PMH v.2.0 [03/02 -]
![Page 8: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/8.jpg)
OAI-PMH v.2.0 [06/2002]
• low-barrier interoperability specification
• metadata harvesting model: data provider / service provider
• metadata about resources
• autonomous protocol
• distinction between protocol and periphery
• community-specific extensions
• HTTP based
• XML responses
• unqualified Dublin Core
• stable (1.0 characterized as experimental)
![Page 9: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/9.jpg)
OAI Data Model: Resources / Items / Records

[Diagram: three layers for one example resource, David]
• resource
• item: all available metadata about David
• records: Dublin Core metadata, MARC metadata, SPECTRUM metadata

item = identifier
record = identifier + metadata format + datestamp
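The two definitions at the end of this slide map naturally onto a pair of small data types. The sketch below is one possible modelling, not something prescribed by the protocol: the class names, the `add_record` helper, and the identifier `oai:example:david` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp"""
    identifier: str       # identifier of the item this record belongs to
    metadata_prefix: str  # metadata format, e.g. "oai_dc" or "marc"
    datestamp: str        # date associated with this record


@dataclass
class Item:
    """item = identifier; an item groups all records about one resource."""
    identifier: str
    records: dict = field(default_factory=dict)  # metadata_prefix -> Record

    def add_record(self, metadata_prefix, datestamp):
        """Attach one more metadata format to this item and return the record."""
        record = Record(self.identifier, metadata_prefix, datestamp)
        self.records[metadata_prefix] = record
        return record


# Hypothetical item carrying the metadata about the resource "David".
david = Item("oai:example:david")
david.add_record("oai_dc", "2002-05-29")
david.add_record("marc", "2002-05-29")
```

Every record shares the identifier of its item, so the pair (identifier, metadata format) is enough to pick out one record.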
![Page 10: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/10.jpg)
Overview of OAI Verbs

| Verb | Function | Group |
|---|---|---|
| Identify | description of archive | archival metadata |
| ListMetadataFormats | metadata formats supported by archive | archival metadata |
| ListSets | sets defined by archive | archival metadata |
| ListIdentifiers | OAI unique ids contained in archive | harvesting verbs |
| ListRecords | listing of N records | harvesting verbs |
| GetRecord | listing of a single record | harvesting verbs |

Most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control).
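Because requests are plain HTTP GET/POST, a verb and its arguments can be turned into a request URL with the standard library alone. The sketch below assumes the arXiv base URL shown later in the deck; `build_request` is a hypothetical helper, and `oai_dc` is used as the metadata prefix for unqualified Dublin Core.

```python
from urllib.parse import urlencode

# The six verbs listed in the table above.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, args=None):
    """Return the GET URL for one request; `args` maps argument names to values."""
    if verb not in VERBS:
        raise ValueError(f"not a known verb: {verb}")
    params = {"verb": verb}
    params.update(args or {})
    # urlencode percent-escapes reserved characters such as ':' and '/'.
    return base_url + "?" + urlencode(params)


identify_url = build_request("http://arXiv.org/oai2", "Identify")
record_url = build_request(
    "http://arXiv.org/oai2",
    "GetRecord",
    {"identifier": "oai:arXiv:cs/0112017", "metadataPrefix": "oai_dc"},
)
```

Passing the arguments as a dictionary rather than as keyword arguments is deliberate: `from` is a reserved word in Python and could not be written as a keyword argument.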
![Page 11: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/11.jpg)
Identify

OAI-PMH 1.1:
• Arguments
– none
• Errors
– none

OAI-PMH 2.0:
• Arguments
– none
• Errors
– badArgument
![Page 12: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/12.jpg)
ListMetadataFormats

OAI-PMH 1.1:
• Arguments
– identifier (OPTIONAL)
• Errors
– id does not exist

OAI-PMH 2.0:
• Arguments
– identifier (OPTIONAL)
• Errors
– badArgument
– noMetadataFormats
– idDoesNotExist
![Page 13: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/13.jpg)
ListSets

OAI-PMH 1.1:
• Arguments
– resumptionToken (EXCLUSIVE)
• Errors
– no set hierarchy

OAI-PMH 2.0:
• Arguments
– resumptionToken (EXCLUSIVE)
• Errors
– badArgument
– badResumptionToken
– noSetHierarchy
![Page 14: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/14.jpg)
ListIdentifiers

OAI-PMH 1.1:
• Arguments
– from (OPTIONAL)
– until (OPTIONAL)
– set (OPTIONAL)
– resumptionToken (EXCLUSIVE)
• Errors
– no records match

OAI-PMH 2.0:
• Arguments
– from (OPTIONAL)
– until (OPTIONAL)
– set (OPTIONAL)
– resumptionToken (EXCLUSIVE)
– metadataPrefix (REQUIRED)
• Errors
– badArgument
– cannotDisseminateFormat
– badResumptionToken
– noSetHierarchy
– noRecordsMatch
![Page 15: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/15.jpg)
ListRecords

OAI-PMH 1.1:
• Arguments
– from (OPTIONAL)
– until (OPTIONAL)
– set (OPTIONAL)
– resumptionToken (EXCLUSIVE)
– metadataPrefix (REQUIRED)
• Errors
– no records match
– metadata format cannot be disseminated

OAI-PMH 2.0:
• Arguments
– from (OPTIONAL)
– until (OPTIONAL)
– set (OPTIONAL)
– resumptionToken (EXCLUSIVE)
– metadataPrefix (REQUIRED)
• Errors
– noRecordsMatch
– cannotDisseminateFormat
– badResumptionToken
– noSetHierarchy
– badArgument
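The argument rules for ListRecords in 2.0 (OPTIONAL, EXCLUSIVE, REQUIRED) can be read as a small validation routine that yields `badArgument` for illegal combinations. The sketch below is one reading of those rules, not a reference implementation. `check_list_records_args` is a hypothetical helper, the `verb` argument itself is assumed to have been stripped already, and EXCLUSIVE is taken to mean that `resumptionToken` must be the only argument.

```python
ALLOWED = {"from", "until", "set", "resumptionToken", "metadataPrefix"}


def check_list_records_args(args):
    """Return "badArgument" if the arguments are illegal, otherwise None.

    `args` maps argument names to values, without the `verb` argument.
    """
    if set(args) - ALLOWED:
        # An argument the verb does not define.
        return "badArgument"
    if "resumptionToken" in args:
        # EXCLUSIVE: no other argument may accompany the token.
        return None if len(args) == 1 else "badArgument"
    if "metadataPrefix" not in args:
        # REQUIRED whenever no resumptionToken is given.
        return "badArgument"
    return None
```

The other 2.0 errors on the slide (`noRecordsMatch`, `cannotDisseminateFormat`, `badResumptionToken`, `noSetHierarchy`) depend on repository content and are outside what this check covers.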
![Page 16: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/16.jpg)
GetRecord

OAI-PMH 1.1:
• Arguments
– identifier (REQUIRED)
– metadataPrefix (REQUIRED)
• Errors
– id does not exist
– metadata format cannot be disseminated

OAI-PMH 2.0:
• Arguments
– identifier (REQUIRED)
– metadataPrefix (REQUIRED)
• Errors
– badArgument
– cannotDisseminateFormat
– idDoesNotExist
![Page 17: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/17.jpg)
response no errors

    <?xml version="1.0" encoding="UTF-8"?>
    <OAI-PMH>
      <responseDate>2002-02-08T08:55:46Z</responseDate>
      <request verb="GetRecord"… …>http://arXiv.org/oai2</request>
      <GetRecord>
        <record>
          <header>
            <identifier>oai:arXiv:cs/0112017</identifier>
            <datestamp>2001-12-14</datestamp>
            <setSpec>cs</setSpec>
            <setSpec>math</setSpec>
          </header>
          <metadata> ….. </metadata>
        </record>
      </GetRecord>
    </OAI-PMH>
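A harvester would typically pull the header fields out of such a response. The sketch below does this with Python's standard `xml.etree.ElementTree`, using a simplified copy of the slide's response: the elided request attributes and metadata body are left out, and, as on the slide, no XML namespace is declared. Real OAI-PMH 2.0 responses declare the namespace `http://www.openarchives.org/OAI/2.0/`, so the element paths would need to be namespace-qualified there. `parse_get_record` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the slide's response (elided parts omitted, no namespace).
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata/>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_get_record(xml_bytes):
    """Extract identifier, datestamp and set membership from a GetRecord response."""
    root = ET.fromstring(xml_bytes)
    header = root.find("GetRecord/record/header")
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        "sets": [element.text for element in header.findall("setSpec")],
    }


HEADER = parse_get_record(RESPONSE)
```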
![Page 18: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/18.jpg)
response with error

    <?xml version="1.0" encoding="UTF-8"?>
    <OAI-PMH>
      <responseDate>2002-02-08T08:55:46Z</responseDate>
      <request>http://arXiv.org/oai2</request>
      <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
    </OAI-PMH>
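Since errors arrive inside a normal XML response, a client has to inspect the body to detect them. The sketch below turns an `<error>` element into a Python exception. `OAIError` and `raise_for_errors` are hypothetical names, and the sample response follows the slide's simplified, namespace-free form; real 2.0 responses declare an XML namespace that the lookup would have to account for.

```python
import xml.etree.ElementTree as ET

ERROR_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


class OAIError(Exception):
    """Raised when a response carries an <error> element."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_errors(xml_bytes):
    """Parse a response; raise OAIError if it reports an error, else return the root."""
    root = ET.fromstring(xml_bytes)
    error = root.find("error")
    # Compare with None explicitly: an Element without children is falsy.
    if error is not None:
        raise OAIError(error.get("code"), (error.text or "").strip())
    return root
```

A caller can branch on `OAIError.code`, for instance to treat `noRecordsMatch` as an empty result rather than a failure.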
![Page 19: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/19.jpg)
resumptionToken Flow-Control
• Idempotency of resumptionToken: return the same incomplete list when the resumptionToken is re-issued
– while no changes occur in the repo: strict
– while changes occur in the repo: all items with unchanged datestamp
• new attributes for the resumptionToken:
– expirationDate
– completeListSize
– cursor
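The flow control described here amounts to a simple loop on the harvester side: keep re-issuing the request with the last resumptionToken until no token comes back. The sketch below runs that loop against an in-memory stand-in for a repository. Everything in it is an assumption for illustration: `harvest` and `make_fake_repository` are hypothetical, the token is just a list offset, and the `verb` argument and HTTP transport are left out.

```python
def harvest(fetch_page, first_args):
    """Yield every identifier of a list that is delivered in incomplete chunks.

    `fetch_page(args)` must return (identifiers, token); a falsy token means
    the list is complete.
    """
    args = dict(first_args)
    while True:
        identifiers, token = fetch_page(args)
        yield from identifiers
        if not token:
            break
        # resumptionToken is EXCLUSIVE: it replaces all other arguments.
        args = {"resumptionToken": token}


def make_fake_repository(identifiers, page_size):
    """Return a fetch_page function that serves `identifiers` page by page."""
    def fetch_page(args):
        start = int(args.get("resumptionToken", 0))
        end = start + page_size
        token = str(end) if end < len(identifiers) else None
        return identifiers[start:end], token
    return fetch_page


IDS = [f"oai:example:{n}" for n in range(7)]
fetch = make_fake_repository(IDS, page_size=3)
HARVESTED = list(harvest(fetch, {"metadataPrefix": "oai_dc"}))
```

Because the fake token encodes a fixed offset into an unchanging list, re-issuing it returns the same chunk, which mirrors the strict idempotency case on the slide.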
![Page 20: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/20.jpg)
Adoption
• evolution
– from talking about OAI-PMH
– to talking about projects that use OAI-PMH
– to talking about projects and failing to mention they use OAI-PMH
• => OAI-PMH becomes part of the infrastructure
![Page 21: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/21.jpg)
Data Providers (a.k.a. repositories)
• 49 registered repositories [11/2001]
• 65 registered repositories [03/2002]
• 77 registered repositories [05/2002]
• 5+ million records
• many unregistered repositories
• private implementations (e.g. RDN)
![Page 22: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/22.jpg)
Service Providers
• Arc: cross-searching of registered repositories [ http://arc.cs.odu.edu ]
• CiteBase: research literature search + citation ranking [ http://citebase.eprints.org ]
• OLAC: cross-searching of Language Archive Community repositories [ http://www.language-archives.org/index.html ]
![Page 23: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/23.jpg)
Service Providers
• Scirus scientific search engine [Elsevier] [ http://www.scirus.com ]
• my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.] [ http://www.myoai.com ]
• Growing interest from web search engines
![Page 24: OAI Protocol for Metadata Harvesting](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815fbd550346895dceb720/html5/thumbnails/24.jpg)
OAI-PMH tools
• Repository Explorer: interactive exploration of repositories [Virginia Tech] [ http://www.purl.org/NET/oai_explorer ]
• eprints.org: generic OAI-PMH compliant repository software [U of Southampton] [ http://www.eprints.org ]
• ALCME repository and harvester software [OCLC] [ http://alcme.oclc.org/index.html ]
• APIs, other tools @ www.openarchives.org