Natasa Bulatovic, Max Planck Digital Library, Research and Development


Post on 15-Jan-2016

TRANSCRIPT

Page 1: Natasa Bulatovic Max Planck Digital Library Research and Development

This work is licensed under a Creative Commons Attribution 2.0 Germany License http://creativecommons.org/licenses/by/2.0/de/

eSciDoc, VIRR and Digitization Lifecycle - insights into an infrastructure for management of digitized resources Natasa Bulatovic

Max Planck Digital Library

Research and Development

Page 2: Natasa Bulatovic Max Planck Digital Library Research and Development

Max Planck Digital Library (MPDL) is a service unit within the Max Planck Society (MPG)

MPG consists of about 80 institutes in three scientific sections:

the Chemistry, Physics and Technology Section

the Biology and Medicine Section

the Human Sciences Section

The core activities of the MPDL lie in building up service infrastructure and tools for publications and research data

MPDL develops software solutions in close cooperation with scientists, librarians and technicians

In the Human Sciences Section, several institutes have digitized cultural artefacts and want to make them openly accessible

The Max Planck Digital Library (MPDL) in a Nutshell

Page 3: Natasa Bulatovic Max Planck Digital Library Research and Development

eSciDoc SOA Landscape

Page 4: Natasa Bulatovic Max Planck Digital Library Research and Development

Which data are managed?

Page 5: Natasa Bulatovic Max Planck Digital Library Research and Development

How?

PubMan – Publication Management

VIRR – Textual digitized resources management

IMEJI – Image management

Page 6: Natasa Bulatovic Max Planck Digital Library Research and Development

PubMan: Management of publications

Page 7: Natasa Bulatovic Max Planck Digital Library Research and Development

21.04.23

Collaboration of the MPDL with the Max Planck Institute for European Legal History

Motivation: The period of the Holy Roman Empire produced an enormous corpus of legislative sources. Until now, no complete collection of these works exists.

VIRR is about

Page 8: Natasa Bulatovic Max Planck Digital Library Research and Development

ViRR Key features

Web-based collaborative application

Editor (bibliographic metadata, table of contents and structural metadata)

Viewer (online representation)

Browser

Page 9: Natasa Bulatovic Max Planck Digital Library Research and Development

ViRR Editor

Combines a set of tools

Paginator

Table of Contents Editor

Metadata Editor

One complex, but flexible workspace

No default order for the usage of the tools

Page 10: Natasa Bulatovic Max Planck Digital Library Research and Development

ViRR Editor - Paginator

Assign the logical page numbers to the physical ones

Choose between different formats (Arabic, Latin, custom)

Paginate manually or automatically
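Automatic pagination can be pictured with a minimal sketch. It assumes that "Latin" on the slide refers to Roman numerals and that front matter is numbered separately from the main text. The function names and parameters are hypothetical and are not taken from ViRR.

```python
from typing import List


def to_roman(number: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


def paginate(scan_count: int, front_matter: int, first_page: int = 1) -> List[str]:
    """Return one logical page label per physical scan.

    The first `front_matter` scans get Roman numerals (i, ii, ...);
    the remaining scans get Arabic numbers starting at `first_page`.
    """
    labels = [to_roman(i) for i in range(1, front_matter + 1)]
    labels += [str(first_page + i) for i in range(scan_count - front_matter)]
    return labels


# Index n of the list holds the logical label of physical scan n + 1.
labels = paginate(scan_count=6, front_matter=2)
```

Manual pagination would then amount to overwriting individual entries of the resulting list.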

Page 11: Natasa Bulatovic Max Planck Digital Library Research and Development

ViRR Editor - ToC Editor

Gather the logical structure of a work by breaking it down into structural elements

Arrange the hierarchical order of structural elements in the tree

Assign scans to structural elements

Choose from over sixty fine-grained structural element types
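The structure edited in the ToC Editor can be pictured as a tree whose nodes carry a type, a title and the scans assigned to them. The following is a minimal sketch under that assumption. The class name, field names and element types are hypothetical and do not reproduce the ViRR data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuralElement:
    element_type: str                                   # e.g. "chapter" (illustrative type name)
    title: str
    scans: List[int] = field(default_factory=list)      # physical scan numbers
    children: List["StructuralElement"] = field(default_factory=list)

    def add(self, child: "StructuralElement") -> "StructuralElement":
        """Attach a child element and return it, so calls can be nested."""
        self.children.append(child)
        return child

    def outline(self, depth: int = 0) -> List[str]:
        """Render the tree as indented lines, one per structural element."""
        lines = [f"{'  ' * depth}{self.element_type}: {self.title} {self.scans}"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


work = StructuralElement("monograph", "Example work")
chapter = work.add(StructuralElement("chapter", "First chapter", scans=[3, 4, 5]))
chapter.add(StructuralElement("section", "Opening section", scans=[3]))
```

Rearranging the hierarchy then means moving a node from one `children` list to another.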

Page 12: Natasa Bulatovic Max Planck Digital Library Research and Development

ViRR Editor – Metadata Editor

Assign descriptive metadata to structural elements

Detailed description of every structural element

Systematic browsing

Dedicated search will be possible

Page 13: Natasa Bulatovic Max Planck Digital Library Research and Development

ViRR Viewer

Browse by scan

Browse by ToC

Navigate to page

View metadata of structural element

Page (web resolution)

Page (full resolution) on click

Page 14: Natasa Bulatovic Max Planck Digital Library Research and Development

ViRR: Sharing and reuse

http://virr.mpdl.mpg.de

Page 15: Natasa Bulatovic Max Planck Digital Library Research and Development

From ViRR to Digitization Lifecycle

Project goal: support the complete Digitization Lifecycle with guidelines, standards, tools and a publishing platform

Partners: MPI for European Legal History, Frankfurt

Kunsthistorisches Institut, Florence (KHI)

Bibliotheca Hertziana, Rome

MPI for Human Development, Berlin

Related projects: ViRR (see http://colab.mpdl.mpg.de/mediawiki/ViRR:_Virtueller_Raum_Reichsrecht)

XML-Workflow (see http://colab.mpdl.mpg.de/mediawiki/MPDL_Project_XML_Workflow)

Page 16: Natasa Bulatovic Max Planck Digital Library Research and Development

Imeji: Management of image collections

Page 17: Natasa Bulatovic Max Planck Digital Library Research and Development

Imeji: repository of Digital Images

Organized into

Collections

Created and defined by the institution, project or working group

Albums

Created and defined by the researcher

Page 18: Natasa Bulatovic Max Planck Digital Library Research and Development

Imeji: what is so different about it?

Imeji is not Flickr, nor Facebook...

Freely definable metadata profiles at collection level

Controlled Vocabularies may be integrated

Smart search for dates, ranges (based on the metadata type)

Helps gather metadata more effectively

Focuses on collaboration and metadata quality

Repository: Data can be exported at any time
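A typed metadata profile at collection level, and a range search that relies on the metadata type, might look roughly like the sketch below. The profile layout, the statement names and the functions are assumptions made for illustration and are not the Imeji implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Image:
    filename: str
    metadata: Dict[str, object]


# Hypothetical collection-level profile: statement name -> expected type.
profile = {"title": str, "taken_on": date}


def conforms(image: Image, profile: Dict[str, type]) -> bool:
    """Check that every profiled statement is present with the expected type."""
    return all(isinstance(image.metadata.get(name), expected)
               for name, expected in profile.items())


def search_date_range(images: List[Image], name: str,
                      start: date, end: date) -> List[Image]:
    """Return images whose date-typed statement lies within [start, end]."""
    return [img for img in images
            if isinstance(img.metadata.get(name), date)
            and start <= img.metadata[name] <= end]


images = [
    Image("a.jpg", {"title": "A", "taken_on": date(2009, 5, 1)}),
    Image("b.jpg", {"title": "B", "taken_on": date(2011, 1, 1)}),
    Image("c.jpg", {"title": "C"}),   # incomplete: no date statement
]
hits = search_date_range(images, "taken_on", date(2009, 1, 1), date(2009, 12, 31))
```

Because the profile records that `taken_on` is a date, the search can compare values as dates rather than as strings.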

Page 19: Natasa Bulatovic Max Planck Digital Library Research and Development

eSciDoc and other services

Page 20: Natasa Bulatovic Max Planck Digital Library Research and Development

eSciDoc SOA Landscape

Page 21: Natasa Bulatovic Max Planck Digital Library Research and Development

eSciDoc core infrastructure

Set Handler (OAI-PMH)

Admin Handler

Aggregation Definition Handler

Statistics Data Handler

Scope Handler

Report Handler

Report Definition Handler

Item Handler

Container Handler

Context Handler

Organizational Unit Handler

Content Model Manager

User Account Handler

Role Handler

Group Handler

Handler groups shown in the diagram: Resources & Data, Statistics, Security

Content Relation Handler

Page 22: Natasa Bulatovic Max Planck Digital Library Research and Development

CoNE Service

● Manages named entities

○ Journals

○ Persons

○ Dewey Decimal Classification (3 public levels)

○ Creative Commons Licenses (CC licenses)

○ ISO 639-3 Languages

○ MIME Types

○ PACS classification

○ Custom classifications

● Reuse

○ Data delivered in multiple formats (JSON, HTML, RDF/XML, Options list)

● Motivation

○ Metadata quality: autosuggest components in solutions during metadata editing

○ Disambiguation: each entity is a named graph

○ Data linking: CoNE identifiers in publication metadata

○ Technical facilitation: all lists in one place

○ Persons: Researcher Portfolio

● Extensions

○ Refresh data from external sources

Page 23: Natasa Bulatovic Max Planck Digital Library Research and Development

CoNE – Control of Named Entities

http://cone.mpdl.mpg.de/

http://pubman.mpdl.mpg.de/cone/persons/resource/persons2450+

Content negotiation supported
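Content negotiation means that the client names the representation it wants in the HTTP Accept header. The sketch below only builds such a request and does not send it. The mapping of formats to media types and the example URL are assumptions; the media types CoNE actually honours are not stated on the slide.

```python
from urllib.request import Request

# Standard media types for three of the formats named for CoNE (assumed mapping).
MEDIA_TYPES = {
    "json": "application/json",
    "html": "text/html",
    "rdf": "application/rdf+xml",
}


def build_request(resource_url: str, fmt: str) -> Request:
    """Build a GET request that asks for one representation of a resource."""
    return Request(resource_url, headers={"Accept": MEDIA_TYPES[fmt]})


# Hypothetical resource URL, used for illustration only.
request = build_request("http://cone.example.org/persons/resource/persons1", "rdf")
```

Passing the request to `urllib.request.urlopen` would perform the call; the same URL then serves different formats depending on the header.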

Page 24: Natasa Bulatovic Max Planck Digital Library Research and Development

Transformation Service

● Transforms textual data formats

○ Metadata

○ Resources

○ Standard formats

○ Specific formats (e.g. EndNote custom fields)

● Motivation

○ Migration of data from MPI

○ Exports and dissemination

○ Imports

○ Continuous interoperability enhancement

○ Implement once, use wherever needed

Formats shown in the slide diagram: eDoc, BibTex, APA, OpenURL, EndNote, arXiv, Pmc, TEI, AJP, Bmc, METS, Spires, eSciDoc-Publication, eSciDoc-TOC
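"Implement once, use wherever needed" suggests a single registry of transformations keyed by source and target format. The following is a minimal sketch of that idea. The registry, the decorator and the toy input format are hypothetical and do not reflect the actual service interface.

```python
from typing import Callable, Dict, Tuple

Transformer = Callable[[str], str]
_registry: Dict[Tuple[str, str], Transformer] = {}


def register(source: str, target: str) -> Callable[[Transformer], Transformer]:
    """Decorator that registers a transformation for one format pair."""
    def decorator(func: Transformer) -> Transformer:
        _registry[(source, target)] = func
        return func
    return decorator


def transform(data: str, source: str, target: str) -> str:
    """Look up the registered transformation and apply it."""
    try:
        func = _registry[(source, target)]
    except KeyError:
        raise ValueError(f"no transformation from {source} to {target}") from None
    return func(data)


@register("toy-record", "BibTex")
def toy_to_bibtex(data: str) -> str:
    # Toy input format (an assumption): "key|author|title|year"
    key, author, title, year = data.split("|")
    return (f"@misc{{{key},\n  author = {{{author}}},\n"
            f"  title = {{{title}}},\n  year = {{{year}}}\n}}")


result = transform("doe2010|Doe, J.|A title|2010", "toy-record", "BibTex")
```

Imports, exports and migrations can then all call `transform` instead of carrying their own conversion code.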

Page 25: Natasa Bulatovic Max Planck Digital Library Research and Development

Search & Export Service

Citation style manager

● Searches and exports results

● Citation styles (Citation style manager)

○ EndNote

○ BibTex

○ …

● Reuse

○ Data delivered in multiple formats (PDF, HTML, XML, ODT)

○ By external systems (content management, WordPress)

● Motivation

○ Search results should be available in various outputs

○ One service – many presentations (e.g. WordPress plug-in)

○ One interface – easy inclusion of various export formats

Page 26: Natasa Bulatovic Max Planck Digital Library Research and Development

Syndication Service

● Provides the latest data updates

● RSS

● Atom

● Reuse

○ Subscription to feeds and data reuse

○ By any external clients

● Extensions

○ Media RSS
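An external client that reuses such a feed only needs a standard Atom parser. The sketch below reads an inline sample feed with the Python standard library. The feed content is invented for illustration and is not output of the Syndication Service.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"

# Invented sample feed; a real client would download the feed instead.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Recent releases (sample)</title>
  <entry>
    <title>First item</title>
    <id>urn:example:item:1</id>
    <updated>2010-05-01T10:00:00Z</updated>
  </entry>
  <entry>
    <title>Second item</title>
    <id>urn:example:item:2</id>
    <updated>2010-05-02T09:30:00Z</updated>
  </entry>
</feed>"""


def latest_entries(feed_xml: str) -> List[Tuple[str, str]]:
    """Return (updated, title) pairs, newest first.

    Sorting the timestamps as strings works here because both use the
    same UTC format.
    """
    root = ET.fromstring(feed_xml)
    entries = [(entry.findtext(ATOM + "updated"), entry.findtext(ATOM + "title"))
               for entry in root.findall(ATOM + "entry")]
    return sorted(entries, reverse=True)


entries = latest_entries(SAMPLE_FEED)
```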

Page 27: Natasa Bulatovic Max Planck Digital Library Research and Development

Validation service

Semantic validation

Contextual validation

Validation rule editor (upcoming)

Page 28: Natasa Bulatovic Max Planck Digital Library Research and Development

Data acquisition service

• Fetches data from known sources via identifier (unAPI interface)

• Transforms data to other format
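An unAPI endpoint is addressed with up to two query parameters, `id` and `format`. The sketch below builds the three request forms of the unAPI specification. The base URL, the identifier and the format name are placeholders and are not values of the MPDL service.

```python
from typing import Optional
from urllib.parse import urlencode


def unapi_url(base_url: str, identifier: Optional[str] = None,
              fmt: Optional[str] = None) -> str:
    """Build an unAPI request URL.

    No parameters: list the formats the service supports.
    id only:       list the formats available for that identifier.
    id and format: fetch the object in the requested format.
    """
    if fmt is not None and identifier is None:
        raise ValueError("format requires an identifier")
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        params["format"] = fmt
    return base_url + ("?" + urlencode(params) if params else "")


# Placeholder endpoint and identifier, for illustration only.
url = unapi_url("http://example.org/unapi", "arxiv:0904.3933", "bibtex")
```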

Page 29: Natasa Bulatovic Max Planck Digital Library Research and Development

PubMan SWORD Server

• Deposit of data packages (metadata and fulltexts)

• Logic implements a PubMan-specific workflow

Page 30: Natasa Bulatovic Max Planck Digital Library Research and Development

PID Cache manager

● Fetches Handles from the GWDG Handle System (dummy resolution)

● Assigns a pre-fetched handle to the resource

● Synchronizes the assigned handle with the resolution to a resource in the Handle system
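The three steps above (pre-fetch, assign, synchronize) can be sketched as a small in-memory cache. The class, its method names and the fake handle service are assumptions made for illustration. The real component talks to the GWDG Handle System, whose interface is not shown here.

```python
import itertools
from collections import deque
from typing import Callable, Deque, Dict


class PidCache:
    def __init__(self, fetch_handle: Callable[[], str], size: int = 3) -> None:
        self._fetch = fetch_handle          # obtains a handle with a dummy resolution
        self._size = size
        self._free: Deque[str] = deque()    # pre-fetched, not yet assigned
        self.pending: Dict[str, str] = {}   # handle -> resource URL awaiting sync
        self.refill()

    def refill(self) -> None:
        """Pre-fetch handles until the cache is full again."""
        while len(self._free) < self._size:
            self._free.append(self._fetch())

    def assign(self, resource_url: str) -> str:
        """Hand out a pre-fetched handle without waiting for the remote system."""
        if not self._free:
            self.refill()
        handle = self._free.popleft()
        self.pending[handle] = resource_url
        return handle

    def synchronize(self, update_handle: Callable[[str, str], None]) -> int:
        """Push the real resolution of every assigned handle, then refill."""
        count = 0
        for handle, url in list(self.pending.items()):
            update_handle(handle, url)
            del self.pending[handle]
            count += 1
        self.refill()
        return count


# Fake handle service for the demonstration (hypothetical prefix).
_counter = itertools.count(1)
cache = PidCache(lambda: f"00.TEST/{next(_counter)}", size=2)
handle = cache.assign("http://example.org/item/1")
resolved: Dict[str, str] = {}
synced = cache.synchronize(lambda h, u: resolved.update({h: u}))
```

The point of the cache is that assigning an identifier never blocks on the remote system; only `refill` and `synchronize` contact it.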

EPIC – European Persistent Identifier Consortium (GWDG Germany, SARA Netherlands, CSC Finland, http://www.pidconsortium.eu/ )

Page 31: Natasa Bulatovic Max Planck Digital Library Research and Development

A note on the metadata profiles

● DCAP based (Dublin Core Application Profile)

● DC terms (identified by URIs)

● eSciDoc solution specific terms (identified by URIs)

● METS/MODS

● Publicly available

● Functional description http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Application_Profiles

● Schemas http://metadata.mpdl.mpg.de/escidoc/metadata/schemas/0.1/

● Interoperability levels

● Shared term definitions (done)

● Semantic interoperability (done)

● Description set syntactic interoperability (prepared)

● Description set profile interoperability (prepared)

Page 32: Natasa Bulatovic Max Planck Digital Library Research and Development

Premises

● Applications

○ Web-based

○ Internationalized

○ Integrated Help system

○ Easy to use

○ Easy to install

● Services and infrastructure

○ Reusable, interoperable, composed, technology-independent

○ Extensible, scalable and performant

● Data

○ Persistently identified, versioned, discoverable, provenance and authenticity information, fine-grained authorization

○ Described with published metadata profiles

○ Interoperable and enabled for reuse and repurpose

Page 33: Natasa Bulatovic Max Planck Digital Library Research and Development

Related projects and new developments

DARIAH

Digital Research Infrastructure for Arts and Humanities (see http://dariah.eu)

Imeji

AWOB

Astronomers Workbench

Resource Registries

ECHO – European Cultural Heritage Online (see http://echo.mpiwg-berlin.mpg.de/home )

Page 34: Natasa Bulatovic Max Planck Digital Library Research and Development

Thank you!

[email protected]

http://colab.mpdl.mpg.de

http://escidoc.org