Digitometric Services for Open Archives Environments


Page 1: Digitometric Services for Open Archives Environments

22 April 2023 ECDL 2003, Trondheim, Norway 1 of 24

Digitometric Services for Open Archives Environments

Tim Brody, Simon Kampa, Stevan Harnad, Les Carr, Steve Hitchcock
{tdb01r,srk,harnad,lac,sh94r}@ecs.soton.ac.uk

University of Southampton, Intelligence, Agents, Multimedia Group

Page 2: Digitometric Services for Open Archives Environments

Open Archives Initiative

The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!)

“Archive” is defined as a “collection of stuff”, not the archivist’s definition of “archive”. “Repository” is the term used in most OAI documents.


Page 3: Digitometric Services for Open Archives Environments

OAI Data Model: Resources/Items/Records

• Item: all available (meta)data about the resource
  – Item = OAI identifier
• Record: e.g. Dublin Core metadata
  – Record = metadata + identifier + datestamp
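The item/record relationship can be sketched as two small data structures. This is only an illustration under assumptions: the class names, the `metadata_prefix` field and the example identifier are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One metadata rendering of an item: metadata + identifier + datestamp."""
    identifier: str       # OAI identifier of the item this record belongs to
    datestamp: str        # date the record was created or last modified
    metadata_prefix: str  # metadata format, e.g. "oai_dc" for Dublin Core
    metadata: dict        # parsed metadata fields


@dataclass
class Item:
    """All available metadata about one resource, keyed by metadata format."""
    identifier: str
    records: dict = field(default_factory=dict)

    def add(self, record: Record) -> None:
        # A record can only be attached to the item it identifies.
        if record.identifier != self.identifier:
            raise ValueError("record belongs to a different item")
        self.records[record.metadata_prefix] = record


# Hypothetical example values, for illustration only.
item = Item("oai:example.org:1234")
item.add(Record("oai:example.org:1234", "2003-04-02", "oai_dc",
                {"title": ["An example paper"]}))
print(sorted(item.records))
```

Keying records by metadata format reflects the idea that one item can be disseminated in several formats, each with its own datestamp.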

Page 4: Digitometric Services for Open Archives Environments

Protocol Responses

Page 5: Digitometric Services for Open Archives Environments

[Diagram: Service Provider sends HTTP URL requests to Data Provider; Data Provider returns XML responses]

• Collection-level description
• All repository xyz records
• All repository xyz records since 2003-04-02
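The three requests listed here correspond to the OAI-PMH verbs `Identify` and `ListRecords` (the latter with and without a `from` argument). A minimal sketch of how a service provider might build those URLs follows; the base URL and the helper name `oai_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a data provider.
BASE_URL = "http://repository.example.org/oai"


def oai_url(base_url, verb, arguments=None):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = {"verb": verb}
    query.update(arguments or {})
    return f"{base_url}?{urlencode(query)}"


# Collection-level description
identify = oai_url(BASE_URL, "Identify")

# All records, as Dublin Core
all_records = oai_url(BASE_URL, "ListRecords", {"metadataPrefix": "oai_dc"})

# All records added or changed since 2003-04-02
since = oai_url(BASE_URL, "ListRecords",
                {"metadataPrefix": "oai_dc", "from": "2003-04-02"})

for url in (identify, all_records, since):
    print(url)
```

The arguments are passed as a dictionary because `from` is a reserved word in Python and cannot be used as a keyword argument.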




Page 6: Digitometric Services for Open Archives Environments

Other Commands

• ListIdentifiers
  – Return only the identifier/datestamp/set membership
• ListMetadataFormats
  – Return the available data formats
• ListSets
  – Return the set structure (if there is one)
• GetRecord
  – Return a record given by OAI identifier
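As an illustration of what a `ListIdentifiers` response carries, the sketch below parses a hand-written sample response and extracts the identifier, datestamp and set membership from each header. The sample XML, the repository URL and the function name `parse_headers` are assumptions; only the element names and the namespace follow OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response with hypothetical identifiers.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-08-18T12:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2003-04-02</datestamp>
      <setSpec>physics</setSpec>
    </header>
    <header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2003-05-10</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def parse_headers(xml_text):
    """Return (identifier, datestamp, [setSpec, ...]) for each header."""
    root = ET.fromstring(xml_text.strip())
    headers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        sets = [s.text for s in header.findall("oai:setSpec", OAI_NS)]
        headers.append((identifier, datestamp, sets))
    return headers


headers = parse_headers(SAMPLE_RESPONSE)
for identifier, datestamp, sets in headers:
    print(identifier, datestamp, sets)
```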

Page 7: Digitometric Services for Open Archives Environments

Interest in OAI

• 111 registered OAI repositories
  – Many unregistered (e.g. all GNU EPrints.org and DSpace archives)
• 4,500,000 public records
  – http://arc.cs.odu.edu/
• NSDL project, UK’s JISC Information Environment
• OLAC (language community built on OAI)

Page 8: Digitometric Services for Open Archives Environments

Why OAI?

1. Mandated Dublin Core allows the quick establishment of basic services and tools

2. Simple and metadata-neutral protocol allows more interesting possibilities (without breaking 1.) and extensions …
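Because every repository must be able to return unqualified Dublin Core (the `oai_dc` format), a basic tool can be written once against those fields. The sketch below pulls Dublin Core fields out of a hand-written sample record; the record content and the function name `dublin_core_fields` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for elements in the Dublin Core namespace.
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"

# Hand-written sample record with hypothetical content.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1</identifier>
    <datestamp>2003-04-02</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example paper</dc:title>
      <dc:creator>Author, A.</dc:creator>
      <dc:creator>Writer, B.</dc:creator>
      <dc:date>2003-04-02</dc:date>
    </oai_dc:dc>
  </metadata>
</record>
"""


def dublin_core_fields(record_xml):
    """Collect Dublin Core elements into {field name: [values, ...]}."""
    root = ET.fromstring(record_xml.strip())
    fields = {}
    for element in root.iter():
        if element.tag.startswith(DC_PREFIX):
            name = element.tag[len(DC_PREFIX):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


fields = dublin_core_fields(SAMPLE_RECORD)
print(fields["title"], fields["creator"])
```

Values are kept as lists because Dublin Core elements such as `creator` are repeatable.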

Page 9: Digitometric Services for Open Archives Environments

Adding Caching to OAI-PMH

[Diagram 1: Non-aggregated OAI-PMH. Labels: Large Repositories, Legacy OAI Implementations, Services]

[Diagram 2: Aggregated OAI-PMH. Labels: Large Repositories, OAI Caches, Services]

Page 10: Digitometric Services for Open Archives Environments

Celestial (OAI Cache)

• Developed to maintain a local metadata copy
  – Avoid repeated, large harvests during development
• Provides an abstraction over multiple OAI versions
  – (hence acts as a gateway to older implementations)
• Useful for testing OAI implementations & improving performance
• Using XSLT provides a Web interface to OAI
• Provides redundancy
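The idea of keeping a local metadata copy to avoid repeated large harvests can be sketched as an incremental cache. This is not Celestial's implementation: the class `OAICache`, the `fetch` callable (a stand-in for a `ListRecords` harvest) and the sample records are all assumptions for illustration.

```python
class OAICache:
    """Keeps a local copy of harvested records, keyed by OAI identifier."""

    def __init__(self, fetch):
        # fetch(from_date) -> iterable of (identifier, datestamp, metadata);
        # a hypothetical stand-in for harvesting the source repository.
        self._fetch = fetch
        self._records = {}
        self._last_datestamp = None

    def refresh(self):
        """Harvest only records changed since the newest datestamp seen."""
        newest = self._last_datestamp
        fetched = 0
        for identifier, datestamp, metadata in self._fetch(self._last_datestamp):
            self._records[identifier] = (datestamp, metadata)
            if newest is None or datestamp > newest:
                newest = datestamp
            fetched += 1
        self._last_datestamp = newest
        return fetched

    def list_records(self, from_date=None):
        """Serve records from the local copy, optionally since from_date."""
        matching = [
            (identifier, datestamp, metadata)
            for identifier, (datestamp, metadata) in self._records.items()
            if from_date is None or datestamp >= from_date
        ]
        return sorted(matching, key=lambda record: record[0])


# Fake source repository with hypothetical records.
repository = [
    ("oai:example.org:1", "2003-04-01", {"title": "First"}),
    ("oai:example.org:2", "2003-04-02", {"title": "Second"}),
]
requested_from = []


def fake_fetch(from_date):
    requested_from.append(from_date)
    return [r for r in repository if from_date is None or r[1] >= from_date]


cache = OAICache(fake_fetch)
first = cache.refresh()    # full harvest
repository.append(("oai:example.org:3", "2003-04-05", {"title": "Third"}))
second = cache.refresh()   # incremental harvest from the newest datestamp
print(first, second, requested_from)
```

Since the `from` argument is inclusive, the second harvest fetches the newest known record again; storing records by identifier makes that harmless.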

Page 11: Digitometric Services for Open Archives Environments

Page 12: Digitometric Services for Open Archives Environments

Citebase Search – Data Model

[Diagram. Components: Repositories; Metadata Harvest via Celestial (OAI-PMH); Full-text Harvest; Meta Database; References Database; Citation Database; Web Interface; OAI-PMH Interface]



Page 13: Digitometric Services for Open Archives Environments

• 250,000 full-text resources
  – 240,000 of which are from arXiv.org
• 6 million references
  – 29 mean refs/paper (therefore failed to extract references for 18% of papers)
  – (n.b. modal refs is 19)
• 1 million references linked internally to the full-text (15%)
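The "therefore" step can be checked with a little arithmetic, assuming the mean of 29 references is taken over papers whose reference lists were successfully extracted. With the rounded figures from the slide this gives roughly 17% for both percentages, close to the quoted 18% and 15%; the quoted values presumably come from unrounded counts. Variable names are illustrative.

```python
# Rounded figures as quoted on the slide.
full_texts = 250_000
references = 6_000_000
mean_refs_per_parsed_paper = 29
linked_references = 1_000_000

# Assumption: the mean is over papers with an extracted reference list,
# so the reference total implies how many papers were parsed.
papers_with_refs = references / mean_refs_per_parsed_paper

# Share of papers for which no references were extracted.
failed_fraction = 1 - papers_with_refs / full_texts

# Share of references linked to a full text held internally.
linked_fraction = linked_references / references

print(f"papers with extracted references: {papers_with_refs:,.0f}")
print(f"extraction failed for: {failed_fraction:.1%}")
print(f"references linked internally: {linked_fraction:.1%}")
```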

Page 14: Digitometric Services for Open Archives Environments

Page 15: Digitometric Services for Open Archives Environments

Citebase Search

Page 16: Digitometric Services for Open Archives Environments

Citebase Search: Navigation by Citation Links

[Figure labels: Current Article; Co-cited; Article with reference list]
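Navigation by citation links can be sketched with two indexes built from (citing, cited) pairs: one for reference lists and one for "cited by". From the current article a reader can then move to what it cites, what cites it, and what is co-cited with it. The data and function names below are hypothetical, not Citebase's code.

```python
from collections import defaultdict

# Hypothetical citation links: (citing paper, cited paper).
CITATIONS = [
    ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("B", "E"),
    ("C", "E"),
]


def build_indexes(citations):
    """Return (references, cited_by) lookup tables."""
    references = defaultdict(set)
    cited_by = defaultdict(set)
    for citing, cited in citations:
        references[citing].add(cited)
        cited_by[cited].add(citing)
    return references, cited_by


def co_cited_with(article, references, cited_by):
    """Articles that share at least one reference list with `article`."""
    result = set()
    for citing in cited_by.get(article, ()):
        result |= references[citing]
    result.discard(article)
    return result


references, cited_by = build_indexes(CITATIONS)
current = "C"
print("references:", sorted(references[current]))
print("cited by:", sorted(cited_by[current]))
print("co-cited with:", sorted(co_cited_with(current, references, cited_by)))
```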





Page 17: Digitometric Services for Open Archives Environments

Citebase Search


Page 18: Digitometric Services for Open Archives Environments

Citebase Search


Page 19: Digitometric Services for Open Archives Environments

Citebase Search


Page 20: Digitometric Services for Open Archives Environments

Read/Cite Cycle

Author Reads and Cites Paper

Authors follow citations, read and cite

Page 21: Digitometric Services for Open Archives Environments

Digitometric Services for OAI

• Tools for visualising research metadata
• Builds an analysis service on Citebase
• Knowledge mapping (co-authors, co-citation, etc.)
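A co-citation map of this kind can be derived from reference lists alone: two papers are co-cited whenever they appear together in the same reference list, and the number of such lists gives the weight of the edge between them. The sketch below uses hypothetical data and names and is not the service's actual implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing paper -> cited papers.
REFERENCE_LISTS = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}


def co_citation_counts(reference_lists):
    """Count how often each pair of papers is cited together."""
    counts = Counter()
    for cited in reference_lists.values():
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


def network_edges(counts, min_weight=1):
    """Edges (paper, paper, weight) at or above a weight threshold."""
    return sorted((a, b, w) for (a, b), w in counts.items() if w >= min_weight)


counts = co_citation_counts(REFERENCE_LISTS)
edges = network_edges(counts, min_weight=2)
print(edges)
```

Raising `min_weight` is one simple way to thin a dense map down to its strongest links.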

Page 22: Digitometric Services for Open Archives Environments

Co-Citation Network

Page 23: Digitometric Services for Open Archives Environments

Full Co-Citation Map

Page 24: Digitometric Services for Open Archives Environments

Digitometric Services for Open Archives Environments

• http://www.openarchives.org/
• http://opcit.eprints.org/
• http://citebase.eprints.org/
• http://www.eprints.org/
• http://www.hyphen.info/
  – AKT Project (knowledge)

Thank you for listening!
Tim Brody