Kurt Maly, Department of Computer Science, Old Dominion University, Norfolk, Virginia 23529, USA, maly@cs.odu.edu. Digital Libraries, OAI and Free Software for Education and Science. 5th National Conference, Computer Application Federation of China Instrument & Control Society, Yinchuan, Ningxia Province, PRC, September 22-24, 2003


Kurt Maly
Department of Computer Science
Old Dominion University
Norfolk, Virginia 23529, USA
maly@cs.odu.edu

Digital Libraries, OAI and Free Software for Education and Science

5th National Conference
Computer Application Federation of China Instrument & Control Society
Yinchuan, Ningxia Province, PRC
September 22-24, 2003

Sept 24, 2003 5th National CACIS Conference


Outline
• Digital Libraries
• The Open Archives Initiative
• Free Software Systems
  • Arc
  • DP9
  • Kepler
  • RVOT
• Conclusions
• Important URLs

Digital Libraries
• DL = a library whose content is stored digitally and can be accessed over the Internet
• The key difference between DLs and the general Web is that DL content is structured and has metadata associated with it, which allows more precise results to queries
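
To illustrate why structured metadata allows more precise query results, here is a minimal Python sketch. The records, the field names, and the helpers `keyword_search` and `fielded_search` are invented for illustration and do not come from any particular DL system.

```python
# Hypothetical records: structured metadata fields plus unstructured full text.
RECORDS = [
    {"title": "Harvesting Metadata", "creator": "Smith", "year": 2002,
     "text": "We thank Jones for comments on harvesting."},
    {"title": "Digital Preservation", "creator": "Jones", "year": 2001,
     "text": "A survey of preservation strategies."},
    {"title": "Web Crawling", "creator": "Lee", "year": 2003,
     "text": "Crawlers, as described by Jones, follow links."},
]


def keyword_search(records, word):
    """Web-style search: match the word anywhere in any field."""
    word = word.lower()
    return [r["title"] for r in records
            if any(word in str(v).lower() for v in r.values())]


def fielded_search(records, field, value):
    """DL-style search: match the value against one metadata field only."""
    return [r["title"] for r in records
            if str(r[field]).lower() == value.lower()]


print(keyword_search(RECORDS, "jones"))             # every record mentioning Jones
print(fielded_search(RECORDS, "creator", "Jones"))  # only the record authored by Jones
```

The keyword search returns all three records because the name appears somewhere in each, while the fielded search returns only the record whose creator is Jones.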

Digital Libraries
• Development of software to support DLs has proceeded along proprietary lines
• It is extremely difficult for the average user to find information that is spread across different DLs
• There is a need for interoperability between DLs

Digital Libraries
• DL interoperability can be achieved at three levels
  • technical: protocols, formats, etc. should be consistent so that messages can be exchanged
  • content: agreements cover the data and metadata, and the interpretation of messages
  • organizational: includes rules for access, for changing collections and services, payment, and authentication
• Need to federate, filter, and provide value-added services on remote content

Open Archives Initiative
• Addresses technical interoperability among distributed archives
• Facilitates the discovery of content in distributed archives
• The OAI framework defines two functional roles: data providers (archives) and service providers

Open Archives Initiative
• Data providers expose the metadata of their objects for harvesting
• Service providers extract metadata from data providers via the OAI metadata harvesting protocol
• Service providers develop value-added services based on the metadata collected from data providers, such as cross-archive search engines, linking systems, and peer-review systems
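
As a rough sketch of what a service provider sends to a data provider, the following Python function builds a harvesting request URL. The argument names follow the OAI-PMH 2.0 specification (`verb`, `metadataPrefix`, `from`, `set`, `resumptionToken`); the function name `list_records_url` and the base URL are assumptions made for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for one data provider."""
    if resumption_token:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument:
        # it is sent alone with the verb to fetch the next chunk of a list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # harvest only records changed since then
        if set_spec:
            params["set"] = set_spec    # restrict the harvest to one set
    return base_url + "?" + urlencode(params)


# Hypothetical data provider base URL, used only for illustration.
print(list_records_url("http://archive.example.org/oai", from_date="2003-01-01"))
```

A harvester would issue this request over HTTP, process the XML reply, and repeat with the returned resumption token until the data provider reports no more records.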

OAI origin

The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance.
Paul Ginsparg, Rick Luce & Herbert Van de Sompel

Herbert Van de Sompel

Core concepts of Santa Fe convention

• low-barrier interoperability
• data-provider & service-provider model
• metadata harvesting model
• shared metadata format and parallel, community-specific metadata formats
• acceptable use

Slide callouts: Dienst subset; XML reply; HTTP based; Gentlemen's agreement

Herbert Van de Sompel

Core concepts in OAI 1.0

• low-barrier interoperability
• data-provider & service-provider model
• metadata harvesting model
• shared metadata format and parallel, community-specific metadata formats
• acceptable use
• flexibility

Slide callouts: OAI 1.0 protocol; Dublin Core; HTTP based; Community specific; Reply (XML Schema, self contained)

Herbert Van de Sompel
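
To make the idea of an XML reply carrying Dublin Core metadata concrete, the sketch below parses a small hand-written reply with Python's standard library. The sample uses the OAI-PMH 2.0 namespaces rather than the 1.0 protocol named on the slide, and the identifier, dates, and field values are invented; `parse_records` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample reply; all values are made up for illustration.
SAMPLE_REPLY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-09-24T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://archive.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:1</identifier>
        <datestamp>2003-09-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Report</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record: identifier, datestamp, and DC fields."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        item = {
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        }
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for field in dc:
                # Strip the "{namespace}" prefix ElementTree puts on tag names.
                name = field.tag.split("}", 1)[1]
                item.setdefault(name, []).append(field.text)
        records.append(item)
    return records


print(parse_records(SAMPLE_REPLY))
```

Dublin Core elements are repeatable, which is why each field is collected into a list.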

New OAI mission statement

The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.

Herbert Van de Sompel

New OAI mission statement (continued)

The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. Continued support of this work remains a cornerstone of the Open Archives program.

Herbert Van de Sompel

New OAI mission statement (continued)

The fundamental technological framework and standards that are developing to support this work are, however, independent of both the type of content offered and the economic mechanisms surrounding that content, and promise to have much broader relevance in opening up access to a range of digital materials.

Herbert Van de Sompel

Free Software - Arc
• Arc currently harvests metadata from about 150 OAI-compliant archives, normalizes it, and stores it in a search service based on a relational database (MySQL or Oracle)
• Over 6 million metadata records from various subject domains
• Arc also provides an OAI layer, which makes hierarchical harvesting possible
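
The harvest, normalize, store, and search pipeline can be sketched as follows. This is not Arc's code: it uses Python's built-in `sqlite3` as a stand-in for MySQL or Oracle, and the table layout, the normalization rules, and the function names `normalize`, `store`, and `search` are all assumptions made for illustration.

```python
import sqlite3


def normalize(record):
    """Bring a harvested record into a common shape (assumed rules)."""
    return {
        "identifier": record["identifier"].strip(),
        "archive": record["archive"].strip(),
        "title": " ".join(record.get("title", "").split()),  # collapse whitespace
        "subject": record.get("subject", "").strip().lower(),
    }


def store(conn, records):
    """Insert normalized records; re-harvested records replace old copies."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "identifier TEXT PRIMARY KEY, archive TEXT, title TEXT, subject TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO metadata VALUES "
        "(:identifier, :archive, :title, :subject)",
        [normalize(r) for r in records],
    )
    conn.commit()


def search(conn, word):
    """Cross-archive search over titles, regardless of the source archive."""
    rows = conn.execute(
        "SELECT identifier, archive FROM metadata "
        "WHERE title LIKE ? ORDER BY identifier",
        ("%" + word + "%",),
    )
    return rows.fetchall()


# Made-up records, as if harvested from two different archives.
harvested = [
    {"identifier": "oai:a.example.org:1 ", "archive": "A",
     "title": "Deep   Web  Crawling", "subject": "Computer Science"},
    {"identifier": "oai:b.example.org:7", "archive": "B",
     "title": "Plasma Physics Notes", "subject": " Physics"},
]
conn = sqlite3.connect(":memory:")
store(conn, harvested)
print(search(conn, "web"))
```

Because the stored records could themselves be exposed again through an OAI layer, another service provider could harvest from this store instead of from each archive, which is the idea behind hierarchical harvesting.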

Free Software – DP9
• The "deep web" or "invisible web" is a vast repository of content, such as documents in online databases, that general-purpose web crawlers cannot reach
• Its size is 500 times that of the surface web
• Internet search engines cannot index OAI collections, as they are not aware of the OAI protocol

Free Software – DP9
• A Web crawler indexes a Web site by starting with a base HTML page and following the links on that page to retrieve other pages deeper in the site
• DP9 computes an HTML page from the result of an OAI request and presents it to the Web crawler; the links on that page lead to further OAI requests
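
The following sketch shows the general idea of turning an OAI reply into a crawlable HTML page. It is not DP9's implementation: it builds the HTML with plain Python string handling, and the gateway path layout (`/getrecord/<identifier>`), the function name `render_identifier_page`, and the sample identifiers are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample ListIdentifiers reply with made-up identifiers.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:archive.example.org:1</identifier></header>
    <header><identifier>oai:archive.example.org:2</identifier></header>
  </ListIdentifiers>
</OAI-PMH>"""


def render_identifier_page(xml_text, gateway_prefix):
    """Turn a ListIdentifiers reply into an HTML page of crawlable links."""
    root = ET.fromstring(xml_text)
    items = []
    for ident in root.iterfind(".//oai:header/oai:identifier", OAI_NS):
        # Each link points back at the gateway, which would answer it
        # by issuing a further OAI request for that record.
        href = gateway_prefix + "/getrecord/" + quote(ident.text, safe="")
        items.append('<li><a href="%s">%s</a></li>'
                     % (escape(href), escape(ident.text)))
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"


print(render_identifier_page(SAMPLE, "/dp9"))
```

A crawler that fetches this page sees ordinary hyperlinks, so it needs no knowledge of the OAI protocol to reach every record.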

Free Software – DP9
• DP9 provides an entry page; if a web crawler finds this entry page, it may follow the links on it and send requests to DP9
• DP9 then forwards each request to the corresponding OAI data provider and processes the returned XML records
• Depending on how deep a crawler follows the links, it can index all records in an OAI data provider
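
The forwarding step can be sketched as a translation from a crawler-friendly path to an OAI request URL. The path layout, the `ARCHIVES` registry, and the function name `translate` are hypothetical; the verbs and arguments (`ListIdentifiers`, `GetRecord`, `identifier`, `metadataPrefix`) are those of OAI-PMH 2.0.

```python
from urllib.parse import urlencode, unquote

# Hypothetical registry of data providers the gateway knows about.
ARCHIVES = {"demo": "http://archive.example.org/oai"}


def translate(path):
    """Map a static-looking path to an OAI-PMH request URL.

    Assumed path layout (illustrative only):
        /<archive>/listidentifiers
        /<archive>/getrecord/<url-encoded identifier>
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] not in ARCHIVES:
        raise ValueError("unknown path: " + path)
    base, action = ARCHIVES[parts[0]], parts[1]
    if action == "listidentifiers" and len(parts) == 2:
        params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
    elif action == "getrecord" and len(parts) == 3:
        params = {"verb": "GetRecord", "identifier": unquote(parts[2]),
                  "metadataPrefix": "oai_dc"}
    else:
        raise ValueError("unknown path: " + path)
    return base + "?" + urlencode(params)


print(translate("/demo/listidentifiers"))
print(translate("/demo/getrecord/oai%3Aarchive.example.org%3A1"))
```

The gateway would then fetch the translated URL, convert the XML reply to HTML, and return that page to the crawler.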

Free Software – DP9

[Figure: DP9 architecture. Labels recovered from the figure: Web Crawler; DP9; URLWrapper; JSP/Servlet; XSLT Processor; OAIHandler; OAI Repository (two); Static URL; Translate; Send OAI request / Get XML reply; Return HTML.]

Free Software - Kepler
• The objective of the Kepler framework is to satisfy the need of average researchers at an average university to publish results and disseminate them to a wide audience quickly and conveniently
• The Kepler framework is based on OAI and supports what are called "personal data providers" or "archivelets"

Free Software - Kepler
• Kepler framework: a digital library of many 'little' publishers
  • an easy-to-use archivelet that is downloadable and self-installing
  • an automated registration service to support tens of thousands of publishers
  • a simple service provider to harvest metadata from archivelets
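
One way to picture the registration service is as a registry that archivelets announce themselves to and that the service provider consults before harvesting. The sketch below is an assumption-laden illustration, not Kepler's design: the class `RegistrationService`, its methods, and the idea of filtering by last contact time are all hypothetical.

```python
import time


class RegistrationService:
    """Keeps track of archivelets so a service provider knows whom to harvest."""

    def __init__(self):
        self._archivelets = {}  # archivelet id -> {"base_url", "last_seen"}

    def register(self, archivelet_id, base_url, now=None):
        """Called by an archivelet when it is installed or comes online."""
        self._archivelets[archivelet_id] = {
            "base_url": base_url,
            "last_seen": time.time() if now is None else now,
        }

    def active(self, max_age_seconds, now=None):
        """Base URLs of archivelets heard from within max_age_seconds."""
        now = time.time() if now is None else now
        return sorted(
            info["base_url"]
            for info in self._archivelets.values()
            if now - info["last_seen"] <= max_age_seconds
        )


# Explicit timestamps keep the example deterministic.
registry = RegistrationService()
registry.register("alice", "http://alice.example.org/oai", now=1000)
registry.register("bob", "http://bob.example.org/oai", now=5000)
print(registry.active(3600, now=6000))
```

Filtering by recency reflects the assumption that personal archivelets run on desktop machines and are not always online, so a harvester would skip those it has not heard from lately.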

[Figure: Kepler architecture. Labels recovered from the figure: OAI Compliant Repository and Publishing Tool (three pairs); Registration Service.]




Free Software - RVOT
• Rapid Visual OAI Tool (RVOT) is a tool that can help small organizations make their collections OAI-PMH compliant
• Constructs an OAI-PMH repository from a collection of files
• Metadata translation tool
  • records in the original collection can be in any of the supported formats, including RFC1807, MARC subset, and COSATI
• Lightweight HTTP server, including an OAI-PMH request handler
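
As an illustration of the metadata translation step, the sketch below converts a record in RFC 1807 syntax (`TAG:: value` lines) into Dublin Core element names. It is not RVOT's translator: the field mapping in `RFC1807_TO_DC` is an assumption, the sample record is invented, and `parse_rfc1807` and `to_dublin_core` are hypothetical helper names.

```python
# Assumed mapping from RFC 1807 field tags to Dublin Core element names.
RFC1807_TO_DC = {
    "TITLE": "title",
    "AUTHOR": "creator",
    "DATE": "date",
    "ABSTRACT": "description",
    "ID": "identifier",
}

# Made-up record in RFC 1807 "TAG:: value" syntax.
SAMPLE = """BIB-VERSION:: CS-TR-v2.1
ID:: EXAMPLE//TR-2003-01
ENTRY:: September 24, 2003
TITLE:: An Example Technical Report
AUTHOR:: Doe, Jane
AUTHOR:: Roe, Richard
ABSTRACT:: This abstract spans
   two lines.
END:: EXAMPLE//TR-2003-01"""


def parse_rfc1807(text):
    """Parse 'TAG:: value' lines; untagged lines continue the previous field."""
    fields = []
    for line in text.splitlines():
        tag, sep, value = line.partition("::")
        tag = tag.strip()
        if sep and tag.replace("-", "").isalpha() and tag.isupper():
            fields.append([tag, value.strip()])
        elif fields and line.strip():
            fields[-1][1] += " " + line.strip()
    return fields


def to_dublin_core(fields):
    """Translate parsed fields into a dict of Dublin Core element lists."""
    dc = {}
    for tag, value in fields:
        name = RFC1807_TO_DC.get(tag)
        if name:  # tags without a mapping (e.g. BIB-VERSION, END) are skipped
            dc.setdefault(name, []).append(value)
    return dc


print(to_dublin_core(parse_rfc1807(SAMPLE)))
```

The resulting dictionary could then be serialized as an `oai_dc` record and served by the tool's OAI-PMH request handler.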

Free Software - RVOT

Table 1. OAI-PMH Related Tools

Category                                    Tools
Publishing software                         DSpace, eprints.org, CDSWare, Kepler
Data provider programming framework         UIUC OAI Implementation, OCLC OAICat, VTOAI package, oaiperl
Server software integrated with harvester   Arc, Celestial
Harvester programming framework             OCLC OAIHarvester, oaiperl, my.OAI
Other tools                                 DP9, Repository Explorer


Conclusions
• OAI makes the many digital libraries available today interoperate, so that users can discover information across a wide variety of domains without having to be aware of the different user interfaces of the individual libraries
• OAI was founded by researchers who were interested not only in the free distribution of information but also in the free distribution of software

Conclusions
• All the software systems described in this paper are freely available, either as open source or directly from the research groups that created them
• One caveat: free software does not necessarily mean that services run at no cost. One still has to account for the technical support and hardware needed to set up services

Important URLs
• http://dlib.cs.odu.edu - ODU digital library research group
• http://www.openarchives.org
• http://arc.cs.odu.edu
• http://sourceforge.net/projects/oaiarc/
• http://dlib.cs.odu.edu/dp9
• http://kepler.cs.odu.edu