


Page 1: Reinhard Altenhöner

| IFLA2010. Newspaper section | 2010-02-26

Changing preservation tasks for the German National Library: Some insights and preliminary remarks

IFLA International Newspaper Conference 2010 at IGNCA, New Delhi in India during 26th February to 28th February, 2010

"Digital Preservation and Access to news and views"

Reinhard Altenhöner


Page 2

Table of contents

1. Starting situation / setting

2. Digital Preservation in DNB

3. Practical Example: E-Papers

Page 3

Publications issued in Germany since 1913

Since June 22, 2006: online/net publications are covered by the new law

Newspapers as well: ca. 450 newspapers (i.e. a selection!) are microfilmed every day

About 9,000 datasets in the central database

Some years ago we started brainstorming alternatives to this microfilm (MF) approach:

- collecting e-papers from the web
- archiving print files
- cooperation with media / clipping agencies

DNB's task: collecting, archiving and providing permanent access


Page 4

Characteristics of online newspapers

- Frequent update processes
- Dedicated publication workflow: database, content management system, presentation on the fly
- Web 2.0 facilities for comments, blogging & tagging
- Multiple forms of embedded advertising
- Complex navigation and search functions
- Harvesting is extremely difficult: some experiments (e.g. on newsletters), but no operational workflow

Page 5

"kopal"

Co-operative development of a long-term digital information archive

Started in 2004

Task: development of a standardized long-term preservation solution that also facilitates corresponding solutions for other libraries / industries

Basis: DIAS (Digital Information and Archiving System) of the Royal Dutch Library, condensed and extended with peripheral open-source software

- Enhancement for cooperative use
- Development of a universal object scheme
- Hosting outside the library (remote access)


Page 6

kopal: cooperation

GWDG: hosting

IBM: archiving software

DNB: ingest/access software

SUB: ingest/access software

Common task: Preservation Planning


Page 7

kopal: Structure & concept

[Diagram: GWDG (Göttingen) hosts DIAS by IBM with separate accounts (Account 1, Account 2); DNB (Frankfurt), SUB Göttingen and further partners (nn) each work with their own local software.]

Page 8

Packaging

[Diagram: Submission Information Package in the Universal Object Format: the object plus a mets.xml file, based on METS 1.4 and LMER 1.2 (Long-term preservation Metadata for Electronic Resources). Sections of mets.xml: header, dmdSec, amdSec, file section, structural map.]
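The structure of mets.xml named on this slide can be illustrated with a minimal sketch. It builds a METS skeleton with the five top-level sections (header, dmdSec, amdSec, file section, structural map) using Python's standard `xml.etree.ElementTree`. The object ID, the file name `page-001.tif`, the helper names and the labelling of the technical metadata slot as `LMER` are hypothetical. The sketch does not model LMER content or kopal's actual UOF profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def empty_wrap(parent: ET.Element, **attrs: str) -> ET.Element:
    """Add an mdWrap with an empty xmlData child (placeholder for real metadata)."""
    wrap = ET.SubElement(parent, m("mdWrap"), attrs)
    ET.SubElement(wrap, m("xmlData"))
    return wrap


def build_mets_skeleton(object_id: str, file_href: str) -> ET.Element:
    """Build a minimal METS tree with the five top-level sections of mets.xml.

    Illustrative only: no descriptive or LMER metadata is filled in.
    """
    root = ET.Element(m("mets"), {"OBJID": object_id})

    # Header: data about the METS document itself
    ET.SubElement(root, m("metsHdr"), {"CREATEDATE": "2010-02-26T00:00:00"})

    # dmdSec: descriptive (bibliographic) metadata
    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    empty_wrap(dmd, MDTYPE="OTHER")

    # amdSec: technical metadata; labelling it "LMER" here is an assumption
    amd = ET.SubElement(root, m("amdSec"), {"ID": "AMD1"})
    tech = ET.SubElement(amd, m("techMD"), {"ID": "TECH1"})
    empty_wrap(tech, MDTYPE="OTHER", OTHERMDTYPE="LMER")

    # File section: one file group listing the content file
    grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    f = ET.SubElement(grp, m("file"), {"ID": "FILE1", "MIMETYPE": "image/tiff"})
    ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_href})

    # Structural map: links the logical object to its metadata and file
    div = ET.SubElement(ET.SubElement(root, m("structMap")), m("div"),
                        {"DMDID": "DMD1", "ADMID": "TECH1"})
    ET.SubElement(div, m("fptr"), {"FILEID": "FILE1"})
    return root


if __name__ == "__main__":
    mets = build_mets_skeleton("example-object-1", "page-001.tif")
    print(ET.tostring(mets, encoding="unicode"))
```

Running the script prints the serialised skeleton. In a real package, the empty `xmlData` placeholders would carry the actual descriptive and long-term preservation metadata.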


Page 9

[Diagram: koLibRI; labels: administration interface, online archivist, machine interface]

Page 10

kopal preservation strategy

Migrate the object with URN xxx into new format yyy

Migrate all objects of format xxx and/or that were ingested before a certain date and/or that are larger than zzz MB into new format xyz (e.g. from TIFF to PNG)

Implementation of emulation view paths

No restriction on file size or file format/type: all known and unknown file formats are accepted (text, pictures, video, audio, executables, etc.)
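The second migration rule above is essentially a query over the archive's technical metadata. The following is a minimal sketch, assuming a hypothetical in-memory record type (`ArchivedObject` with `urn`, `file_format`, `ingest_date`, `size_mb`). It combines the three criteria with a switchable AND/OR to mirror the slide's "and/or". It only selects migration candidates. It does not convert anything and does not reflect how DIAS or kopal implement this internally.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ArchivedObject:
    """Hypothetical subset of the technical metadata kept per archived object."""
    urn: str
    file_format: str
    ingest_date: date
    size_mb: float


def select_for_migration(
    objects: Iterable[ArchivedObject],
    file_format: Optional[str] = None,
    ingested_before: Optional[date] = None,
    larger_than_mb: Optional[float] = None,
    require_all: bool = True,
) -> List[str]:
    """Return the URNs of objects matching the given criteria.

    Criteria left as None are ignored; require_all=True combines the
    remaining ones with AND, require_all=False with OR.
    """
    selected = []
    for obj in objects:
        checks = []
        if file_format is not None:
            checks.append(obj.file_format == file_format)
        if ingested_before is not None:
            checks.append(obj.ingest_date < ingested_before)
        if larger_than_mb is not None:
            checks.append(obj.size_mb > larger_than_mb)
        if not checks:
            continue  # no criteria given: select nothing rather than everything
        if all(checks) if require_all else any(checks):
            selected.append(obj.urn)
    return selected


archive = [
    ArchivedObject("urn:example:1", "TIFF", date(2005, 3, 1), 40.0),
    ArchivedObject("urn:example:2", "TIFF", date(2009, 7, 15), 5.0),
    ArchivedObject("urn:example:3", "PDF", date(2004, 11, 2), 120.0),
]

# Candidates for e.g. TIFF -> PNG: TIFF files ingested before 2008
print(select_for_migration(archive, file_format="TIFF", ingested_before=date(2008, 1, 1)))
```

Returning URNs rather than objects keeps the selection step separate from the migration step, which would act on each URN in turn.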


Page 11

Digital newspapers in DNB

Some results (collections) from digitisation projects:

- Simple access to image data in a dedicated system
- Including full-text OCR & access

Online newspapers: some pre-studies on objects like "Spiegel", but no operational workflow

Focus on e-papers


Page 12

Digitisation results in DNB 1


Page 13

Digitisation results in DNB 2


Page 14

E-papers in DNB

Preliminary thoughts: Requirements

Structured, normalised metadata set on three levels: article/photo, issue, newspaper

Persistent identification of each unique object, with links between the objects; citable

Additional author/title information at article level is useful but not strictly necessary
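The three-level model with a persistent, citable identifier per object and links between the levels can be sketched as follows. The identifier pattern (`<newspaper>/<issue date>/<article number>` under the documentation-only `urn:example:` prefix), the class names and the sample newspaper are hypothetical, since the slide does not specify a scheme. Author and title are optional at article level, matching the last requirement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Newspaper:
    key: str                      # hypothetical short key
    title: str
    issues: List["Issue"] = field(default_factory=list)

    @property
    def pid(self) -> str:
        return f"urn:example:{self.key}"


@dataclass
class Issue:
    newspaper: Newspaper = field(repr=False, compare=False)  # link to parent level
    date: str = ""                # ISO date of the issue
    articles: List["Article"] = field(default_factory=list)

    @property
    def pid(self) -> str:
        return f"{self.newspaper.pid}/{self.date}"


@dataclass
class Article:
    issue: Issue = field(repr=False, compare=False)  # link to parent level
    number: int = 0
    kind: str = "article"          # "article" or "photo"
    author: Optional[str] = None   # useful, but not required
    title: Optional[str] = None    # useful, but not required

    @property
    def pid(self) -> str:
        return f"{self.issue.pid}/{self.number:03d}"


def add_issue(paper: Newspaper, iso_date: str) -> Issue:
    issue = Issue(paper, iso_date)
    paper.issues.append(issue)
    return issue


def add_article(issue: Issue, **metadata) -> Article:
    art = Article(issue, number=len(issue.articles) + 1, **metadata)
    issue.articles.append(art)
    return art


paper = Newspaper("dailynews", "Hypothetical Daily")
issue = add_issue(paper, "2010-02-25")
photo = add_article(issue, kind="photo")
story = add_article(issue, title="Example headline")
print(story.pid)  # urn:example:dailynews/2010-02-25/002
```

Deriving each identifier from its parent keeps identifiers stable and readable. A real scheme would more likely mint opaque persistent identifiers and store the links explicitly.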


Page 15

E-paper requirements

Quantity:

- One newspaper: ca. 150 articles per day / 900 per week / 47,000 per year
- All ca. 450 newspapers: 21,150,000 articles per year

Start modestly

Retrodigitisation (the collection starts in 1913) will extend this to more than 1 billion articles

Challenge in terms of resources and technical capacities
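The volume figures on this slide are mutually consistent under two assumptions the slide does not state: six publishing days per week and about 52 weeks per year. The yearly total then equals the rounded 47,000 articles per newspaper multiplied by the ca. 450 newspapers microfilmed daily (page 3). A quick check:

```python
ARTICLES_PER_DAY = 150               # one newspaper (from the slide)
PUBLISHING_DAYS_PER_WEEK = 6         # assumption: implied by 150 -> 900 per week
WEEKS_PER_YEAR = 52                  # assumption
ARTICLES_PER_YEAR_ROUNDED = 47_000   # figure used on the slide
NEWSPAPERS = 450                     # newspapers microfilmed daily (page 3)

per_week = ARTICLES_PER_DAY * PUBLISHING_DAYS_PER_WEEK
per_year = per_week * WEEKS_PER_YEAR
all_papers_per_year = ARTICLES_PER_YEAR_ROUNDED * NEWSPAPERS

print(per_week)             # 900
print(per_year)             # 46800, rounded to ca. 47,000 on the slide
print(all_papers_per_year)  # 21150000
```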

Page 16

E-paper project (recently started)

In cooperation with a vendor selected through a tender procedure

Ca. 20 important newspapers, starting with two

Metadata to be delivered in ONIX

Harvesting interface: OAI-PMH

All data delivered in an XML file

Integrated digital preservation in the kopal environment
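An OAI-PMH harvest issues `ListRecords` requests and follows the `resumptionToken` until the repository returns an empty token. The sketch below shows the two building blocks of that loop using only Python's standard library: building the request URL and extracting record identifiers and the token from a response. The base URL and the metadata prefix `onix` are hypothetical placeholders, because the vendor's endpoint and ONIX profile are not given on the slide. Network access is left out so the parsing can be tried on a canned response without a `<metadata>` payload.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://vendor.example.org/oai"  # hypothetical endpoint


def list_records_url(metadata_prefix: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers and the resumptionToken (None when done)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None  # an empty token marks the last batch


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:issue-1</identifier></header></record>
    <record><header><identifier>oai:example:issue-2</identifier></header></record>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
print(ids)    # ['oai:example:issue-1', 'oai:example:issue-2']
print(list_records_url("onix", token))
```

A complete harvester would call these two functions in a loop, fetching each URL (for example with `urllib.request.urlopen`) until `parse_list_records` returns `None` as the token.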

Page 17

XML record for e-papers

Page 18

E-paper & access

Principal question for access: integration into the portal environment or a dedicated (independent) search area

Advanced requirements for text segmentation

Direct link between portal (metadata) and text

Navigation/browsing within the object, direct access to single chapters/pages

Zooming, scrolling

Integrated full-text search

Print and store facilities

DRM, IDM
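Integrated full-text search with direct access to single pages is commonly built on an inverted index that maps each term to the pages containing it. The sketch below is a generic in-memory illustration with a naive lowercase tokenizer. The page keys, issue identifiers and texts are hypothetical, and a production system would rely on a dedicated search engine instead.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

PageKey = Tuple[str, int]  # (hypothetical issue identifier, page number)


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase runs of word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(pages: Dict[PageKey, str]) -> Dict[str, Set[PageKey]]:
    """Map each term to the set of pages in which it occurs."""
    index: Dict[str, Set[PageKey]] = defaultdict(set)
    for key, text in pages.items():
        for term in tokenize(text):
            index[term].add(key)
    return index


def search(index: Dict[str, Set[PageKey]], query: str) -> List[PageKey]:
    """Return pages containing all query terms (boolean AND), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


pages = {
    ("issue-2010-02-25", 1): "Parliament debates the budget",
    ("issue-2010-02-25", 2): "Budget cuts hit libraries",
    ("issue-2010-02-26", 1): "Libraries digitise newspaper archives",
}
index = build_index(pages)
print(search(index, "budget libraries"))  # [('issue-2010-02-25', 2)]
```

Because each hit is a page key, a viewer can open the matching page directly rather than the whole issue, which fits the navigation requirement above.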


Page 19

Reuse of results from the CONTENTUS project

[Diagram, six numbered layers: (1) digitisation; (2) automated optimisation; (3) content analysis (face, logo, text, person, speaker 1/speaker 2, image, title); (4) knowledge base with semantic relations; (5) open knowledge networks with a core of professionals (media archives…), a mantle of automated learning and a shell of end users (Wikipedia); (6) semantic multimedia search, linking information about a film (actors, director, producers, music, sequence, year of production, short description of the picture or video sequence, content, rights, short summary for fast access) to related books, internet links, music scores, films, songs and news (year of printing, editions, authors, summary).]

Page 20

Data processing

Automated page segmentation (headlines, images, tables)

OCR + entity recognition

Full-text search

Semantic search interface

Based on:

- Intellectually approved (manually curated) authority files
- Statistical data analysis
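In its simplest form, entity recognition based on authority files can be illustrated as dictionary lookup. OCR text is scanned for name forms listed in an authority file, and each match is linked to the identifier of the authority record. The sketch below uses a tiny hypothetical authority list with made-up `auth:` identifiers and a greedy longest-match strategy. It is not the CONTENTUS method, which according to the slide also relies on statistical data analysis.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical authority file: name form -> authority record identifier
AUTHORITY: Dict[str, str] = {
    "german national library": "auth:0001",
    "deutsche nationalbibliothek": "auth:0001",  # variant name, same record
    "new delhi": "auth:0002",
    "delhi": "auth:0003",
}


def link_entities(text: str, authority: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (matched text, authority id) pairs using greedy longest match."""
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    max_len = max(len(name.split()) for name in authority)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(lowered[i:i + n])
            if candidate in authority:
                matches.append((" ".join(tokens[i:i + n]), authority[candidate]))
                i += n  # continue after the matched span
                break
        else:
            i += 1  # no authority entry starts at this token
    return matches


ocr_text = "The German National Library presented in New Delhi."
print(link_entities(ocr_text, AUTHORITY))
# [('German National Library', 'auth:0001'), ('New Delhi', 'auth:0002')]
```

Longest match ensures that "New Delhi" is linked as one entity instead of matching only "Delhi". Variant name forms resolve to the same record, which is what makes authority files useful for semantic search.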


Page 21

Our current solution: integrated search and retrieval

Page 22

Next step: integrated e-papers

Page 23

Integrated e-paper "ZEIT" 1

Page 24

Integrated e-paper "ZEIT" 2

Provision of freely available texts

Page 25

Integrated e-paper "ZEIT" 3

Page 26

Reinhard Altenhöner


http://www.d-nb.de
