22 April 2023 ECDL 2003, Trondheim, Norway 1 of 24
Digitometric Services for Open Archives Environments
Tim Brody, Simon Kampa, Stevan Harnad,
Les Carr, Steve Hitchcock
{tdb01r,srk,harnad,lac,sh94r}@ecs.soton.ac.uk
University of Southampton, Intelligence, Agents, Multimedia Group
Open Archives Initiative
• The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!)
• Archive is defined as a “collection of stuff”, not the archivist’s definition of “archive”. “Repository” is used in most OAI documents.
• Promoting interoperability
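OAI Data Model: Resources/Items/Records
[Diagram: a resource, the item describing it, and the item's records]
• Item: all available (meta)data about the resource, identified by its OAI identifier
• Records: the item's metadata in each available format (Dublin Core metadata, MARC metadata, other XML formats)
• Record = metadata + identifier + datestamp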
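The resource/item/record relationship can be sketched as two plain Python data classes. This is an illustration only, not part of OAI-PMH: the class and field names (`Record`, `Item`, `metadata_prefix`) are hypothetical, the identifier is invented, and `marc` stands in for whatever prefix a repository might use for MARC. `oai_dc` is the prefix OAI-PMH uses for Dublin Core.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One record: metadata + identifier + datestamp, in one metadata format."""
    identifier: str        # OAI identifier of the item the record belongs to
    datestamp: str         # date the record was created, changed or deleted
    metadata_prefix: str   # metadata format, e.g. "oai_dc" (Dublin Core)
    metadata: str          # the metadata itself, kept as raw XML text here


@dataclass
class Item:
    """All available (meta)data about one resource, named by its OAI identifier."""
    identifier: str
    records: dict[str, Record] = field(default_factory=dict)  # prefix -> Record

    def add(self, record: Record) -> None:
        if record.identifier != self.identifier:
            raise ValueError("record belongs to a different item")
        # At most one record per metadata format for a given item.
        self.records[record.metadata_prefix] = record

    def formats(self) -> list[str]:
        return sorted(self.records)


item = Item("oai:example.org:1234")  # invented identifier
item.add(Record(item.identifier, "2003-04-02", "oai_dc", "<oai_dc:dc>...</oai_dc:dc>"))
item.add(Record(item.identifier, "2003-04-02", "marc", "<record>...</record>"))
print(item.formats())  # ['marc', 'oai_dc']
```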
Protocol Responses
Protocol
Service Provider → Data Provider: HTTP URL requests
Data Provider → Service Provider: XML responses
1. Identify
– Collection-level description
2. ListRecords?metadataPrefix=xyz
– All repository xyz records
3. ListRecords?from=2003-04-02&…
– All repository xyz records since 2003-04-02
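In OAI-PMH 2.0 each request is an HTTP GET with the command passed in the `verb` parameter. A minimal sketch that builds the three requests on this slide with the standard library, assuming a hypothetical base URL `http://archive.example.org/oai` and using `oai_dc` in place of the placeholder `xyz`:

```python
from urllib.parse import urlencode

BASE_URL = "http://archive.example.org/oai"  # hypothetical data provider


def oai_request_url(verb, **args):
    """Build an OAI-PMH request URL; keyword names are protocol arguments."""
    # 'from' is a Python keyword, so callers pass it via a dict: **{"from": ...}
    return BASE_URL + "?" + urlencode({"verb": verb, **args})


# 1. Collection-level description
identify = oai_request_url("Identify")
# 2. All of the repository's records in one metadata format
all_records = oai_request_url("ListRecords", metadataPrefix="oai_dc")
# 3. Only records added or changed since a given date (incremental harvest)
since = oai_request_url("ListRecords", metadataPrefix="oai_dc", **{"from": "2003-04-02"})

for url in (identify, all_records, since):
    print(url)
```

A real harvester must also follow the `resumptionToken` a provider returns when it splits a large response; that loop is left out here.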
Other Commands
• ListIdentifiers
– Return only the identifier/datestamp/set membership
• ListMetadataFormats
– Return the available metadata formats
• ListSets
– Return the set structure (if there is one)
• GetRecord
– Return the record for a given OAI identifier (in a given metadata format)
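A ListIdentifiers response carries only record headers. The sketch below parses one with `xml.etree.ElementTree`. The namespace is the OAI-PMH 2.0 one; the sample response is invented and abbreviated (a real one also contains `responseDate` and `request` elements). It also reports the `status="deleted"` flag that OAI-PMH puts on headers of deleted records.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented, abbreviated ListIdentifiers response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2003-04-02</datestamp>
      <setSpec>physics</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2003-04-03</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_headers(xml_text):
    """Return (identifier, datestamp, set specs, deleted?) for each header."""
    root = ET.fromstring(xml_text)
    headers = []
    for h in root.iterfind("oai:ListIdentifiers/oai:header", OAI_NS):
        headers.append((
            h.findtext("oai:identifier", namespaces=OAI_NS),
            h.findtext("oai:datestamp", namespaces=OAI_NS),
            [s.text for s in h.findall("oai:setSpec", OAI_NS)],
            h.get("status") == "deleted",
        ))
    return headers


for header in parse_headers(SAMPLE):
    print(header)
```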
Interest in OAI
• 111 registered OAI repositories
– Many unregistered (e.g. all GNU EPrints.org and DSpace archives)
• 4,500,000 public records
– http://arc.cs.odu.edu/
• NSDL project, UK’s JISC Information Environment
• OLAC (language community built on OAI)
Why OAI?
1. Mandated Dublin Core allows basic services and tools to be established quickly
2. A simple, metadata-neutral protocol allows more interesting possibilities and extensions (without breaking 1.) …
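Because every OAI-PMH data provider must support unqualified Dublin Core (`oai_dc`), a basic service can rely on a handful of DC fields from any repository. A minimal sketch that pulls title, creators and date out of an `oai_dc` block; the namespace URIs are the standard ones, while the sample record and the choice of fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented oai_dc metadata block, as it would appear inside <metadata>.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Paper</dc:title>
  <dc:creator>Smith, A.</dc:creator>
  <dc:creator>Jones, B.</dc:creator>
  <dc:date>2003-04-02</dc:date>
</oai_dc:dc>"""


def summarise_dc(xml_text):
    """Collect the DC fields a simple listing or search service might index."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "creators": [c.text for c in root.findall("dc:creator", NS)],
        "date": root.findtext("dc:date", namespaces=NS),
    }


print(summarise_dc(SAMPLE))
```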
Adding Caching to OAI-PMH
[Diagram: two configurations of large repositories, small repositories and services]
• Non-aggregated OAI-PMH (including legacy OAI implementations): services harvest each repository directly
• Aggregated OAI-PMH: OAI caches sit between the repositories and the services
Celestial (OAI Cache)
• Developed to maintain a local copy of metadata
– Avoids repeated, large harvests during development
• Provides an abstraction over multiple OAI versions
– (hence acts as a gateway to older implementations)
• Useful for testing OAI implementations and improving performance
• Provides a Web interface to OAI using XSLT
• Provides redundancy
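The core idea of an OAI cache, keeping a local copy and re-harvesting only what has changed, can be sketched as below. This is not Celestial's code: the `MetadataCache` class is hypothetical, harvests are simulated as lists of (identifier, datestamp, metadata) tuples, and datestamps are assumed to be ISO `YYYY-MM-DD` strings so that they compare correctly as text.

```python
class MetadataCache:
    """Hypothetical local copy of one repository's records (sketch only)."""

    def __init__(self):
        self.records = {}        # identifier -> (datestamp, metadata)
        self.last_harvest = None

    def from_argument(self):
        """Value for the next request's 'from' argument; None means full harvest."""
        return self.last_harvest

    def apply(self, harvested):
        """Merge (identifier, datestamp, metadata) tuples from one harvest."""
        for identifier, datestamp, metadata in harvested:
            cached = self.records.get(identifier)
            # Keep whichever copy carries the newer (or equal) datestamp.
            if cached is None or datestamp >= cached[0]:
                self.records[identifier] = (datestamp, metadata)
            if self.last_harvest is None or datestamp > self.last_harvest:
                self.last_harvest = datestamp

    def get_record(self, identifier):
        """Answer a GetRecord-style lookup locally, without contacting the repository."""
        return self.records.get(identifier)


cache = MetadataCache()
cache.apply([("oai:x:1", "2003-04-01", "v1"), ("oai:x:2", "2003-04-02", "v1")])
print(cache.from_argument())  # next harvest only needs records from 2003-04-02
cache.apply([("oai:x:1", "2003-04-03", "v2")])
print(cache.get_record("oai:x:1"))
```

Because `from` is inclusive, reusing the newest datestamp seen re-fetches records stamped on that day; the merge above tolerates that. A production harvester would more likely store the provider's `responseDate` from the previous harvest instead.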
Citebase Search – Data Model
[Diagram: data flow from repositories into Citebase]
• Repositories
– Metadata harvest via Celestial (OAI-PMH)
– Full-text harvest
• Citebase: Meta Database, References Database, Citation Database
• Web Interface and OAI-PMH Interface
• e-Services
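Once references have been extracted and linked to other full texts in the archive, a citation database amounts to the reference lists inverted: for each paper, which papers cite it. A minimal sketch with invented paper identifiers; Citebase's actual database schema is not described in these slides.

```python
from collections import defaultdict

# Hypothetical references database: citing paper -> papers it cites
# (only references already linked to full texts in the archive).
references = {
    "paperA": ["paperC", "paperD"],
    "paperB": ["paperC"],
    "paperC": ["paperD"],
}


def build_citation_db(refs):
    """Invert reference lists into: cited paper -> sorted list of citing papers."""
    cited_by = defaultdict(set)
    for citing, cited_list in refs.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return {paper: sorted(citers) for paper, citers in cited_by.items()}


citations = build_citation_db(references)
print(citations["paperC"])       # papers citing paperC
print(len(citations["paperD"]))  # citation count for paperD
```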
Content
• 250,000 full-text resources
– 240,000 of which are from arXiv.org
• 6 million references
– Mean of 29 references per paper (implying reference extraction failed for 18% of papers)
– (n.b. the modal number of references is 19)
• 1 million references linked internally to the full text (15%)
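The 18% figure follows from the other numbers, assuming the mean of 29 is taken over papers for which extraction succeeded: total references divided by that mean estimates how many papers yielded any references. A quick check with the rounded figures from the slide (exact counts are not given, so the result only roughly matches):

```python
full_texts = 250_000      # full-text resources (rounded, from the slide)
references = 6_000_000    # extracted references (rounded)
mean_refs = 29            # assumed: mean over papers with extracted references

papers_with_refs = references / mean_refs
failure_rate = 1 - papers_with_refs / full_texts

print(f"papers with references ~ {papers_with_refs:,.0f}")
print(f"extraction failed for ~ {failure_rate:.0%} of papers")
```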
Citebase Search
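Citebase Search: Navigation by Citation Links
[Diagram: navigating from the current article]
• Current article, with its reference list
• Past: articles it cites (reference links)
• Future: articles that cite it
• Related: articles co-cited with it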
Citebase Search
[Screenshot: links labelled “cites”]
Citebase Search
“Co-cited”
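Two papers are co-cited when a third paper cites both. One plausible way to rank the articles co-cited with a given paper is to count, over every paper that cites it, the other papers in the same reference list. A sketch over an invented reference graph; it illustrates co-citation counting and is not necessarily how Citebase orders its "Co-cited" list.

```python
from collections import Counter

# Hypothetical reference lists: citing paper -> papers it cites.
references = {
    "p1": ["target", "a", "b"],
    "p2": ["target", "a"],
    "p3": ["target", "c"],
    "p4": ["a", "b"],  # does not cite target, so it is ignored
}


def co_cited_with(target, refs):
    """Papers cited alongside `target`, most frequently co-cited first."""
    counts = Counter()
    for cited in refs.values():
        if target in cited:
            counts.update(p for p in set(cited) if p != target)
    return counts.most_common()


print(co_cited_with("target", references))
```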
Read/Cite Cycle
[Diagram: cycle]
• Author reads and cites paper
• Authors follow citations, read and cite
Digitometric Services for OAI
• Tools for visualising research metadata
• Builds an analysis service on Citebase
• Knowledge mapping (co-authors, co-citation, etc.)
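Knowledge mapping can start from simple pairwise counts. For co-authorship, every pair of authors on the same paper gets an edge, weighted by the number of papers they share. A minimal sketch over invented records (author names as they might appear in `dc:creator`); it is one straightforward way to build such a map, not the project's actual tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical papers -> author lists (e.g. taken from dc:creator fields).
papers = {
    "p1": ["Smith", "Jones", "Lee"],
    "p2": ["Smith", "Jones"],
    "p3": ["Lee", "Patel"],
}


def coauthor_edges(author_lists):
    """Count shared papers for each unordered pair of co-authors."""
    edges = Counter()
    for authors in author_lists:
        # sorted() so that (A, B) and (B, A) count as the same pair
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


edges = coauthor_edges(papers.values())
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```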
Co-Citation Network
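A co-citation network links two papers whenever some third paper cites both, with the edge weight counting how often that happens. A sketch that builds such a network from invented reference lists and writes it in Graphviz DOT syntax for drawing; the `min_weight` threshold and the use of DOT are assumptions for illustration, not details from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing paper -> papers it cites.
references = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c"],
    "p4": ["a", "d"],
}


def co_citation_network(refs, min_weight=1):
    """Return {(x, y): weight} for pairs co-cited at least min_weight times."""
    weights = Counter()
    for cited in refs.values():
        weights.update(combinations(sorted(set(cited)), 2))
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def to_dot(network):
    """Render an undirected, weighted graph in Graphviz DOT syntax."""
    lines = ["graph cocitation {"]
    for (x, y), w in sorted(network.items()):
        lines.append(f'  "{x}" -- "{y}" [weight={w}, label={w}];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(co_citation_network(references, min_weight=2)))
```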
Full Co-Citation Map
Digitometric Services for Open Archives Environments
• http://www.openarchives.org/
• http://opcit.eprints.org/
• http://citebase.eprints.org/
• http://www.eprints.org/
• http://www.hyphen.info/
– AKT Project (knowledge)
Thank you for listening!
Tim Brody