
Mass Digitisation and Long-term Preservation – Processes and Production at Munich Digitisation Centre

Dr. Markus Brantl
Copyright Bayerische Staatsbibliothek


Dr. Markus Brantl Goethe Institute - Singapore/Jakarta June 2008

Who invented Mass Digitisation?


Agenda

1. The Bavarian State Library (BSB) and the Munich Digitisation Centre (MDZ)

2. BSB Digitisation Strategy and the Challenges for Mass Digitisation and Long-Term Preservation

3. Processes and Production

4. Projects, examples

5. Outlook


The Free State of Bavaria and Munich

Population: 12.5 million
7 counties
8 towns with more than 100,000 inhabitants
Capital city: Munich (1.5 million)
245,000 university students


Bavarian State Library: Some Key Facts & Figures

Founded in 1558 by Duke Albrecht V
680 employees
Annual budget of 43 million euros
Inventory of ~10 million volumes, with an annual increase of ~150,000 volumes
~1 million visitors in the reading room
…


Major Functions of BSB

International research library of worldwide importance
Central regional and archival library of the Free State of Bavaria
Head of the Bavarian Library Network (BVB) and leader of the Bavarian Library Consortium
One of three partners of the "German Virtual National Library" (together with Staatsbibliothek zu Berlin and Die Deutsche Nationalbibliothek in Frankfurt/Leipzig)


BSB as a Gateway to Cultural Heritage Resources

Collections: timeframe from the 8th to the 21st century
Manuscripts and Rare Books Collection of world importance

89,000 manuscripts (number 4 worldwide)
19,000 incunabula (top position in the world)
130,000 16th century prints (largest collection in Germany)

Illuminated Reichenau Manuscripts part of the UNESCO Memory of the World since 2004


The Munich Digitisation Centre (MDZ)

Founded in 1997 with funding from the German Research Foundation (DFG) as one of two national digitisation centres
Integrated in 2003 as a division of the Department of Acquisition, Collection Development and Cataloguing

MDZ today
National competence centre for digitisation and long-term preservation
Central innovation and digital production unit of the BSB


MDZ Tasks within the Framework of "The Digital Library"

Research & Development
Production
Consulting (e.g. workflows)
Service provider (e.g. scanning technology, open-source software for digitisation)

in

(Retro-)Digitisation
Long-term Preservation
Subject-Specific Portals


MDZ – Some Facts

Digitisation expertise in materials from the 8th to the 21st century

~32 M. files successfully processed
= ~10 M. pages – 27,000 books
= ~60 terabytes of digital data stored and long-term preserved in cooperation with the Leibniz Supercomputing Centre (Munich-Garching)
= the largest operating long-term preservation archive in Germany

Complete production line from capture to long-term preservation, based on the in-house software ZEND

Comprehensive hardware equipment, e.g. for processing the Google Book Search image copies
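
The figures on this slide can be related to one another with a little arithmetic. The sketch below only divides the quoted numbers; it assumes decimal terabytes and treats the rounded slide values as exact, so the results are rough orders of magnitude rather than measured averages.

```python
# Approximate figures quoted on the slide (June 2008).
FILES = 32_000_000             # ~32 M. files processed
PAGES = 10_000_000             # ~10 M. pages
BOOKS = 27_000                 # 27,000 books
ARCHIVE_BYTES = 60 * 10**12    # ~60 terabytes, assuming 1 TB = 10**12 bytes

files_per_page = FILES / PAGES                 # masters plus derivatives per page
pages_per_book = PAGES / BOOKS                 # average book length in pages
mb_per_file = ARCHIVE_BYTES / FILES / 10**6    # mean file size in megabytes

if __name__ == "__main__":
    print(f"files per page: {files_per_page:.1f}")
    print(f"pages per book: {pages_per_book:.0f}")
    print(f"mean file size: {mb_per_file:.2f} MB")
```

The ratio of files to pages is greater than one, which fits a workflow that keeps a master file and several derivatives per page, although the slide itself does not break the count down.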


Examples of Current R&D Projects at MDZ

Long-term Preservation
Development of organisational and business models for the long-term preservation of digital objects in Germany (German Research Foundation)
BABS II – next step in the deployment of practical long-term preservation (German Research Foundation)
…

Digitisation
IMProving Access to Text – innovation in OCR
Development of new book scanning devices for fragile books
Automated book scanners for 16th century books – development partnership with the Treventus company
Book cradles
3D animation of books via the WWW
…


Organisational structure 1

MDZ as Part of the Department of Acquisition, Collection Development and Cataloguing


MDZ Today: Organisational Unit in Detail


MDZ Staff & Financing

5 full-time and 1 part-time established posts

70 full- and part-time temporary employees (44.5 FTE) financed by third-party grants from:

European Union
German Research Foundation (DFG)
The Free State of Bavaria / Ministry of Science


www.munich-digitisation-centre.de

Daily Status


Digitisation Strategy

Objective of retrodigital collection development at BSB:

to digitise and make accessible, free of charge, all copyright-free library holdings


The Four Columns of


Digitisation Strategy of the BSB

8th-16th century: manuscripts, incunabula, special collections
Third-party funds, digitisation on demand, conservation digitisation

17th-19th century
Public-private partnership (with Google)

20th-21st century
Third-party funds (new programmes in the context of the German Digital Library approach)


What is Mass Digitisation?

Production of more than a million pages? Volumes?

"Definition" today (in Germany): production of more than a million pages within a limited time

And tomorrow …?


What did we do before Mass Digitisation?

"Boutique" digitisation
Since 1997
Small to medium-sized projects
Explorative projects – learning and experimenting
Selective picking of special objects under marketing aspects
Deeper indexing of objects
Manual work

Mass digitisation
Started in 2006 with a pilot
Large projects with 1 million pages
Using technology and automation whenever possible
No selection
Dynamic indexing – starting at a low level; deeper indexing when technically possible (e.g. blackletter script)


Mass Digitisation: Plans for Quantities?

German Research Foundation (DFG) target for the digitisation of 16th-18th century books

~1 M. titles =

~250 M. pages

Duration: Next 5-? years


Current Mass Digitisation Projects at MDZ

Project                      Volumes    Date        Pages
VD16digital-1                ~4,300     2006-2008   ~1 M.
VD16digital-2                ~37,000    2007-2009   ~7.5 M.
Inkunabeln (incunabula)      ~9,000     2008-2012   ~1.6 M.
Public-Private Partnership   >1.2 M.    2008-       >300 M.

3 ScanRobots in the MDZ Scan Centre

Linux cluster of the MDZ for processing the Google book copies


Mass Digitisation: Main Challenges

Logistics: book pulling, transport, book-loan charging, quality management, return transport, reshelving …
Integration of metadata
Storage and preservation – approx. 300,000 images per day
Automated processing, e.g. automated book scanning devices
…
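
To get a feeling for what 300,000 images per day means, the following sketch estimates how long a given page count would take at that rate and how much storage a day of intake would need. It assumes one image per page and a constant daily rate; the mean image size passed to `daily_volume_tb` is a hypothetical parameter, not a figure from the slides.

```python
import math

IMAGES_PER_DAY = 300_000  # "approx. 300,000 images per day" from the slide


def days_to_ingest(total_images, images_per_day=IMAGES_PER_DAY):
    """Whole days needed to take in total_images at a constant daily rate."""
    return math.ceil(total_images / images_per_day)


def daily_volume_tb(mean_image_mb, images_per_day=IMAGES_PER_DAY):
    """Daily intake in decimal terabytes for an assumed mean image size."""
    return images_per_day * mean_image_mb / 10**6


if __name__ == "__main__":
    # 300 M. pages is the lower bound quoted for the largest project in the deck.
    print(days_to_ingest(300_000_000), "days")
    # 2 MB per image is purely illustrative.
    print(daily_volume_tb(2.0), "TB per day")
```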


Mass Digitisation and Long-term Preservation

MDZ data production
1997-2004: over 3,000 CD-ROMs
2004: migrated from CD-ROMs to tapes at the Leibniz Supercomputing Centre
2005: start of the newly built Scan Centre; fast-growing archive

New challenges: ScanRobots and Google's output – more and more terabytes


Agenda

1. The Bavarian State Library (BSB) and the Munich Digitisation Centre (MDZ)

2. BSB Digitisation Strategy and the Challenges for Mass Digitisation and Long-Term Preservation

3. Processes and Production

   a) Processes and Production Cycle
      Preparation
      Handling of Originals
      Scanning
      Storage and Long-term Preservation
      Metadata Creation
      Online Publication and Data Management
      Proof
      Reuse
   b) ZEND Production Software
   c) Example of a Production Workflow with ZEND

4. Projects

5. Outlook


Processes and Production Cycle


Preparation

Selection of all relevant titles for the digitisation project from the collections
Blocking the selected items for other forms of use (e.g. lending orders)
Printing order forms for collecting the relevant items from the shelves
Checking the scannability of the selected items under conservation aspects
Registering the reason for non-scannability for every affected item
Correcting the catalogue if necessary
Printing scanning order forms with the digital identifier
Checking that the books for scanning match the metadata of the selected titles
Delivering the books of one batch, together with the scanning order forms, to the Scan Centre
… Scanning …
Taking the books back from the Scan Centre and checking their completeness and actual condition
Reshelving the books
Unblocking lending orders
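
The preparation steps above form a fixed sequence with one branch (items that fail the conservation check). A minimal sketch of how such a sequence could be tracked per item is given below. The class and step names are hypothetical and are not taken from the ZEND software; the allowed transitions are a simplified reading of the slide.

```python
from enum import Enum, auto


class Step(Enum):
    SELECTED = auto()    # title chosen for the project
    BLOCKED = auto()     # item blocked for lending
    COLLECTED = auto()   # fetched from the shelves
    REJECTED = auto()    # not scannable; reason must be recorded
    CLEARED = auto()     # scannable, catalogue checked, order form printed
    DELIVERED = auto()   # batch handed to the Scan Centre
    SCANNED = auto()
    RETURNED = auto()    # back from the Scan Centre, condition checked
    RESHELVED = auto()
    UNBLOCKED = auto()   # available for lending again


NEXT_STEPS = {
    Step.SELECTED: {Step.BLOCKED},
    Step.BLOCKED: {Step.COLLECTED},
    Step.COLLECTED: {Step.CLEARED, Step.REJECTED},
    Step.REJECTED: {Step.RESHELVED},
    Step.CLEARED: {Step.DELIVERED},
    Step.DELIVERED: {Step.SCANNED},
    Step.SCANNED: {Step.RETURNED},
    Step.RETURNED: {Step.RESHELVED},
    Step.RESHELVED: {Step.UNBLOCKED},
    Step.UNBLOCKED: set(),
}


class Item:
    """One physical volume moving through the preparation workflow."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.step = Step.SELECTED
        self.rejection_reason = None
        self.history = [Step.SELECTED]

    def advance(self, new_step, reason=None):
        if new_step not in NEXT_STEPS[self.step]:
            raise ValueError(f"{self.step.name} -> {new_step.name} is not allowed")
        if new_step is Step.REJECTED:
            if not reason:
                raise ValueError("a rejected item needs a recorded reason")
            self.rejection_reason = reason
        self.step = new_step
        self.history.append(new_step)
```

Keeping the transition table separate from the `Item` class makes it easy to adapt the sequence to a different project without touching the bookkeeping code.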


Handling of Different Material Types

BSB's case: wide range of materials from the 8th to the 21st century

Manuscripts
Incunabula
Early and rare prints
Historical maps
Photographic material (slides, glass plates, etc.)
Monographs
Newspapers
Magazines
…


Handling: Central Risks

Damage caused by
Manual handling of the originals
Room climate, e.g. during transport and reproduction
Light


Handling: Damage Caused by an Opening Angle of 180°

120° opening angle: spine stretched, but fold still okay

The same book at a 180° opening angle: fold broken


Book Handling during Reproduction

BSB's goal: reduction of the central risk factors for damage
No fixation under a glass plate during reproduction
Deformation of the book's body

Close cooperation with the Institute for Book and Manuscript Restoration at BSB
Cooperation in the selection process for scanning devices
Obligatory use of cool lighting and of different book cradles for gentle book handling

Disadvantage: Higher set-up time


The MDZ Scan Centre

Especially for books and special collection materials from before 1800 (modern books are outsourced)
Developed in 2005 from the reorganisation of the analogue reproduction services

Until 2004

Since 2005


Scan Facilities

14 book scanners

3 automated book scanners, "ScanRobots"

2 special book scanners for rare manuscripts and treasure holdings ("Graz camera table")

Formats up to ~0.85 × 1.18 metres (DIN A0)

High resolution up to 600 ppi


Image Scanning: Our Philosophy

"Do it right the first time": scanning at the best possible quality

Multiple reuse
Facsimile, reprint
WWW
Digitisation on demand
…

Images
Digital master files: TIFF
Derivatives for WWW presentation: JPEG and PDF

Extensive automation of workflows


Image Quality and Scanning Parameters

High-resolution digitisation in relation to the original size of manuscripts, early printed books, historical maps and special materials
Colour (24 bit)
400 up to 600 ppi
Digital master: TIFF, uncompressed
Use of colour management software (ICC profiles) and of size and colour targets

Modern prints of the 19th and 20th centuries, depending on the original
Text only: black and white (1 bit)
Text/images: 1 bit and greyscale (8 bit) and/or colour
300 up to 600 ppi (600 ppi only for 1 bit, TIFF G4 compressed)

Image storage size between 100 kilobytes and 800 megabytes per image

Detail pictures (same image enlarged) in 24 bit, 1 bit and 8 bit
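
The wide range of storage sizes follows directly from the scanning parameters: pixel count grows with the square of the resolution, and bit depth multiplies the result. The sketch below estimates the uncompressed size of a single image from its physical dimensions, resolution and bit depth. It ignores TIFF headers, row padding and compression, so it is an approximation; the function names are illustrative only.

```python
import math

INCH_IN_METRES = 0.0254


def pixels(length_m, ppi):
    """Number of pixels along one edge of the original at the given resolution."""
    return round(length_m / INCH_IN_METRES * ppi)


def uncompressed_bytes(width_m, height_m, ppi, bits_per_pixel):
    """Approximate raw image size in bytes (no header, padding or compression)."""
    total_bits = pixels(width_m, ppi) * pixels(height_m, ppi) * bits_per_pixel
    return math.ceil(total_bits / 8)


if __name__ == "__main__":
    # Largest format named in the deck (~0.85 x 1.18 m), colour, 400 ppi.
    large = uncompressed_bytes(0.85, 1.18, 400, 24)
    # An A4-sized text page, black and white, 300 ppi.
    small = uncompressed_bytes(0.210, 0.297, 300, 1)
    print(f"large colour sheet: {large / 10**6:.0f} MB")
    print(f"A4 text page:       {small / 10**6:.2f} MB")
```

Under these assumptions the large colour sheet comes out at roughly three quarters of a gigabyte, the same order of magnitude as the upper figure on the slide, while a bitonal text page needs about a megabyte before compression.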


Scanning Output: Examples


Digital Long-Term Preservation

Still in an experimental phase, no all-round solution

Different approaches
Migration
Emulation
Computer museum

In cooperation at national and international level
nestor – the German network of expertise in digital long-term preservation
Institute of Informatics, University of the German Federal Armed Forces
European Union
USA

MDZ focuses on
Digital long-term preservation in practice, with the Leibniz Supercomputing Centre
Trusted repositories: certification for long-term archives


MDZ: Milestones in Long-term Preservation

1999: First project concerning removable media – testbed of a computer museum (Institute of Informatics, University of the German Federal Armed Forces)

2004: First migration of digitisation data stored on CD-ROM to a large-scale archiving system of the Leibniz Supercomputing Centre (LRZ) in Munich (IBM Tivoli Storage Manager)

2005: Automated storage procedures implemented, connecting the storage/archiving system of the LRZ with the MDZ production infrastructure (ZEND software)

Daily online transfer of new data
Secure storage of multiple copies
Long-term preservation
Rapid restore functions
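
Storing multiple copies is only useful if the copies can be shown to be identical. The slides do not say how this is checked at the MDZ or LRZ; a common general technique is to compare cryptographic checksums, sketched below as an assumption. The function names are hypothetical, and file-like objects are used so that the example works equally with real files opened in binary mode.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hex SHA-256 digest of a binary stream, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_copies(master, copies):
    """Map each copy name to True if its checksum matches the master's."""
    expected = sha256_of_stream(master)
    return {name: sha256_of_stream(stream) == expected
            for name, stream in copies.items()}


if __name__ == "__main__":
    master = io.BytesIO(b"page image bytes")
    copies = {
        "copy_a": io.BytesIO(b"page image bytes"),
        "copy_b": io.BytesIO(b"page image bytez"),  # simulated corruption
    }
    print(verify_copies(master, copies))
```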


Long-term Preservation: The Archival System of the LRZ

2 x IBM TS3500 (3584)
20 LTO II drives
35 MByte/s transfer rate
5,000 tape slots
200 GByte per tape
990 TByte total capacity
Maximum possible capacity: 6,900 tape slots, 1,400 TByte

STK SL8500
16 Titanium drives
120 MByte/s transfer rate
4,900 tape slots
500 GByte per tape
2,400 TByte total capacity (eq. 3.8 M CDs)
Maximum possible capacity: 300,000 tape slots, 146,000 TByte
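
The total capacities quoted for the tape libraries are essentially the number of tape slots multiplied by the capacity per tape. The sketch below reproduces that calculation, assuming 1 TByte = 1,024 GByte. The slide's totals are rounded, so only approximate agreement should be expected.

```python
def capacity_tbyte(tape_slots, gbyte_per_tape, gbyte_per_tbyte=1024):
    """Library capacity in TByte for a given slot count and tape size."""
    return tape_slots * gbyte_per_tape / gbyte_per_tbyte


if __name__ == "__main__":
    libraries = {
        "IBM TS3500, current": (5_000, 200),
        "IBM TS3500, maximum": (6_900, 200),
        "STK SL8500, current": (4_900, 500),
        "STK SL8500, maximum": (300_000, 500),
    }
    for name, (slots, per_tape) in libraries.items():
        print(f"{name}: {capacity_tbyte(slots, per_tape):,.0f} TByte")
```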


MDZ: Increase of Long-Term Preserved Data

[Chart: volume of long-term preserved data, January 2005 to January 2008; vertical axis from 0 to 60,000,000 (unit not given)]

Expected annual increase: at least 100-150 terabytes


Indexing – Metadata Creation

Metadata are needed for
Search and retrieval
Navigation/browsing

Creation of
1. Bibliographical (descriptive) metadata = from the OPAC
2. Structural metadata = stage model for indexing a book
   a) Images only
   b) Images and text of the table of contents
   c) Images, text of the table of contents and indices
   d) Images and "hidden text" (with errors)
   e) Complete, layout-faithful, nearly error-free text
3. Technical metadata = how were the data created?
4. Administrative metadata = when were the data changed? By whom?
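
The four kinds of metadata and the stage model for structural indexing map naturally onto a small data structure. The following is a minimal sketch; the class, field and stage names are hypothetical and do not describe the MDZ's actual data model.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class IndexingStage(IntEnum):
    """Stage model for structural metadata, a) to e) on the slide."""
    IMAGES_ONLY = 1
    IMAGES_AND_TOC = 2
    IMAGES_TOC_AND_INDICES = 3
    IMAGES_AND_HIDDEN_TEXT = 4
    FULL_TEXT = 5


@dataclass
class DigitalObject:
    identifier: str
    descriptive: dict                                    # taken from the OPAC record
    stage: IndexingStage = IndexingStage.IMAGES_ONLY     # structural metadata
    technical: dict = field(default_factory=dict)        # how the data were created
    administrative: list = field(default_factory=list)   # when changed, and by whom

    def deepen(self, new_stage, changed_by, when):
        """Move to a deeper indexing stage and log the change."""
        if new_stage <= self.stage:
            raise ValueError("indexing can only be deepened")
        self.administrative.append(
            {"when": when, "by": changed_by,
             "change": f"{self.stage.name} -> {new_stage.name}"}
        )
        self.stage = new_stage
```

Because the stages are ordered, an object can start at the lowest level during mass digitisation and be deepened later, which matches the "dynamic indexing" idea mentioned earlier in the deck.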


Standards for Document formats and Character Set

XML as data exchange format with
Metadata Encoding and Transmission Standard (METS)
Text Encoding Initiative (TEI)

Character set
Unicode, UTF-8 encoding
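
To illustrate what a METS document in UTF-8 looks like, the sketch below builds a very small one with Python's standard library: a file section listing the page images and a physical structure map pointing at them. It uses real METS element and attribute names, but it is not validated against the schema and does not reflect the MDZ's own METS profile. The identifier, title and file names are made up for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag):
    """Qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id, title, image_files):
    """Return a minimal METS document as UTF-8 encoded bytes."""
    root = ET.Element(mets("mets"), {"OBJID": object_id, "LABEL": title})
    file_sec = ET.SubElement(root, mets("fileSec"))
    file_grp = ET.SubElement(file_sec, mets("fileGrp"), {"USE": "MASTER"})
    struct_map = ET.SubElement(root, mets("structMap"), {"TYPE": "PHYSICAL"})
    book = ET.SubElement(struct_map, mets("div"), {"TYPE": "book", "LABEL": title})

    for number, name in enumerate(image_files, start=1):
        file_id = f"FILE_{number:04d}"
        file_el = ET.SubElement(
            file_grp, mets("file"), {"ID": file_id, "MIMETYPE": "image/tiff"}
        )
        ET.SubElement(
            file_el, mets("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        page = ET.SubElement(book, mets("div"), {"TYPE": "page", "ORDER": str(number)})
        ET.SubElement(page, mets("fptr"), {"FILEID": file_id})

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    doc = build_mets("hypothetical-0001", "Wappenbuch für München",
                     ["0001.tif", "0002.tif"])
    print(doc.decode("utf-8"))
```

The title with an umlaut shows why a Unicode encoding matters: the same bytes can carry German, Latin or any other script found in the holdings.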


Building the WWW-Interfaces

Depending on the project's requirements and the depth of indexing, for

Search and Retrieval

Navigation/Browsing

Usability

Accessibility


Examples of Different Online Publications


Proofs of Digitised Items in Catalogues, Subject Gateways ...

Reuse

Digital Master files at high resolution

Cross-media publishing
   Facsimile and reprint
   Exhibition catalogues
   …

Document delivery

Deeper indexing at any time

3D

Reuse: Example 3D Animation with a Book of Arms

Testbed in collaboration with

Last but not least: Quality Control

Everywhere in production at defined points

But: the 100% accuracy of the „boutique“ digitisation era is unrealistic for mass digitisation (only spot checks)

New approaches, e.g.
   Checking the page number sequence with OCR during scanning, for completeness
   Involving the users in QC by offering a comment form with each image

Problem: Each reprocessing increases costs
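
The page-number completeness check mentioned above can be sketched roughly as follows. This is only an illustration, not the MDZ implementation: it assumes that OCR has already produced one printed page number per image (or None where no number was recognised), and the function name find_sequence_gaps is hypothetical.

```python
def find_sequence_gaps(page_numbers):
    """Report images whose printed page number breaks the expected sequence.

    page_numbers: one entry per image in scan order; an int if OCR recognised
    a printed page number, None otherwise (assumed input format).
    Returns a list of (image_position, expected_number, found_number).
    """
    problems = []
    last_pos, last_num = None, None
    for pos, num in enumerate(page_numbers):
        if num is None:
            continue  # unnumbered or unreadable page: skip, but keep counting images
        if last_num is not None:
            # Each image between two recognised numbers should advance the count by one.
            expected = last_num + (pos - last_pos)
            if num != expected:
                problems.append((pos, expected, num))
        last_pos, last_num = pos, num
    return problems


if __name__ == "__main__":
    # Page 3 is missing from this hypothetical scan.
    print(find_sequence_gaps([1, 2, 4, 5]))
```

A check like this only flags candidates for a spot check; it cannot help with books that carry no pagination at all.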

ZEND Production Software

ZEND = Zentrale Erfassungs- und NachweisDatenbank [central acquisition and proof database]

ZEND developed by MDZ since 2003

Electronic publishing system with modules for
   Document management
   Web content management
   Workflow management
   Long-term preservation of digital media

Based on open-source software
   Linux
   Apache
   MySQL
   PHP (or Perl)
   Cocoon – XML publishing framework from Apache
   …

Flexible (for different media types)

Scalable and expandable (e.g. Cross Server Architecture)

ZEND Goals

Mapping of the entire production cycle and its processes in a modular system

Different service providers (scanning, text capture) can supply unlimited data to ZEND

Workflow control
   Every BSB item that is to be digitised follows only the ZEND workflow

Time and cost reduction through extensive automation

ZEND Main Features

Web-based user interface and administration
Specific job control
Collaborative working
Import of scanned books processed by any number of different service providers
Automatic URN allocation with link resolving via the German National Library
OAI Data Provider
Import of bibliographic metadata via Z39.50 interface
XML support for all common standards (for example, TEI, METS)
Rights management: in-house digital reading room or WWW publication
Long-term archiving
Search and Retrieval
OCR server (works only for Latin print type)
PDF on Demand
…

ZEND – Automated Processes

Fully automatic processes
   Collection of image data from different service providers
   Image conversion
   Fast supply of a simple, working page-turning version of the book (XML)
   Regular bibliographic data updates (ZEND from the OPAC)
   OCR
   Long-term archiving of master files at LRZ and subsequent deletion of the master files on the MDZ servers
   …

Semi-automatic processes
   Loading of large amounts of bibliographic metadata from the catalogue
   Re-import of URN lists into the catalogue
   Import of prepared Excel sheets for the creation of ToCs in ZEND
   Allocation of page numbers to digitised books (for navigation)
   Transfer of digital masters back from the long-term archive
   …

ZEND-Layers

Capture, Content-Management, Indexing

Publication

Services

Workflow-Management

Data-Management

Overview: ZEND-Production

[Diagram: ZEND production system with numbered step markers. Entry points: Digitisation on Demand, Project-oriented Digitisation, Conservational Digitisation. Elements shown: Order; Catalogue (OPAC) and Network of Bavarian Libraries; exchange of metadata; Digitisation; Digitised Object; definitive file name; URN and URN resolving (XEpicur) via DNB; Archival Storage; Online Publication (in-house-only publication, web publication); OAI; portals (BLO, ZvDD, Chronicon, ...); search engines. ZEND: administration of all metadata (bibliographic, technical, administrative) and automatic image processing.]

ZEND: Capture/Management (1)

with ZEND-Services

ZEND: Capture/Management (2)

Result of order form: Process slip with barcode

ZEND: Capture/Creation of File Name and URN (3)

Automated assignment, on creation of the process slip, of the definitive file name

Example: bsb00001119_00001.tif

and at the same time

Assignment of a persistent identifier by use of the National Bibliography Number

Example: urn:nbn:de:bvb:12-bsb00001119

Assignment is done locally, but the administration of the URN and the central link resolving are the responsibility of the DNB: http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:bvb:12-bsb00001119
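
The naming pattern in the two examples above can be sketched as follows. This is an illustration inferred only from the examples on this slide, not ZEND code: the field widths (eight digits for the object number, five for the page) and all function names are assumptions.

```python
# Prefix and resolver address are copied from the slide examples.
URN_PREFIX = "urn:nbn:de:bvb:12-"
RESOLVER = "http://nbn-resolving.de/urn/resolver.pl"


def object_id(number: int) -> str:
    """Object identifier such as 'bsb00001119' (eight-digit width assumed)."""
    return f"bsb{number:08d}"


def master_file_name(obj_id: str, page: int) -> str:
    """Definitive file name of one page image (five-digit page counter assumed)."""
    return f"{obj_id}_{page:05d}.tif"


def urn_for(obj_id: str) -> str:
    """Persistent identifier following the pattern of the slide example."""
    return URN_PREFIX + obj_id


def resolver_url(urn: str) -> str:
    """Link that asks the central resolver for the current location."""
    return f"{RESOLVER}?urn={urn}"


if __name__ == "__main__":
    oid = object_id(1119)
    print(master_file_name(oid, 1))
    print(urn_for(oid))
    print(resolver_url(urn_for(oid)))
```

Deriving the file name and the URN from the same object identifier means both can be assigned in one step when the process slip is created.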

ZEND: Capture/Management (3)

Books in process list: status

ZEND: Indexing (4)

Import of bibliographic metadata

ZEND: Indexing (5)

Import of bibliographic metadata via Z39.50

ZEND: After Scanning and Import of the Digital Master Files into the Production System (6)

Creation of an index file for the online publication of the images
Automatic production of image formats for the web presentation (JPG, PDF etc.)
Creation of the browsing structure for the object (ToC-Editor)
OCR processing of the images (optional)
Storage of all the data at the Leibniz Supercomputing Centre for long-term preservation
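
As a rough illustration of the first step, an index file for one object could be generated like this. The slides do not show ZEND's index format, so the XML element and attribute names here are purely hypothetical; only the file-naming pattern is taken from the earlier slide example.

```python
import xml.etree.ElementTree as ET


def build_index(obj_id: str, page_count: int) -> str:
    """Return a small XML index listing the master file of every page.

    The element names 'object' and 'page' are invented for this sketch.
    """
    root = ET.Element("object", id=obj_id)
    for page in range(1, page_count + 1):
        ET.SubElement(
            root,
            "page",
            n=str(page),
            master=f"{obj_id}_{page:05d}.tif",  # naming pattern from the slide example
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_index("bsb00001119", 3))
```

An index of this kind is enough to drive a simple page-turning view before any deeper structural indexing has been done.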

ZEND: Indexing after Import (7)

URN – National bibliography number

ZEND: Indexing - ToC Editor (8)

Adding structural information and enabling WWW access

Immediate availability of the link inside the local catalogue (OPAC) and the Bavarian Union Catalogue

ZEND: Final Online Publication (9)

Current Mass-digitisation Projects at MDZ

Project                                 Volumes    Project period   Pages
VD16digital-1                           ~4,300     2006-2008        ~1 M.
VD16digital-2                           ~37,000    2007-2009        ~7.5 M.
Incunabula                              ~9,000     2008-2012        ~1.6 M.
Public-Private Partnership with [logo]  >1.2 M.    2008-            >300 M.

3 ScanRobots in the MDZ-Scancentre

Linux-Cluster of the MDZ for Processing the Google book copy

„VD16-1 and VD16-2 digital“ Projects

Goal: Online publication of all 16th-century books that are held uniquely in the BSB's collection

1501-1517 („VD16-1“)

Pilot project for mass digitisation
Duration: 2006-2008
Manual scanning with book scanners (A2)
Digital Master with 300-400 ppi, 24 bit, TIFF with ICC profiles, uncompressed
~4,300 titles = ~20 terabytes of digital master data
Deeper indexing
Processing from capture to long-term preservation with ZEND

1518-1600 („VD16-2“)

Duration: 2007-2009
Scanning with automatic book scanners
Digital Master with 300 ppi, otherwise like VD16-1
37,000 titles = ~175 terabytes of digital master data
ZEND processing

Cooperation partner for long-term storage and archiving: Leibniz Supercomputing Centre
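
The storage figures can be cross-checked with a little arithmetic. The sketch below uses only the totals given on the slides (20 terabytes for about 1 million pages, 175 terabytes for about 7.5 million pages, decimal units assumed) and the stated master format (24 bit, uncompressed). The page size of 8 x 10 inches in the usage example is a made-up value for illustration, and the function names are hypothetical.

```python
def avg_mb_per_page(total_terabytes: float, pages_millions: float) -> float:
    """Average master size per page in megabytes (decimal units assumed)."""
    total_bytes = total_terabytes * 1e12
    pages = pages_millions * 1e6
    return total_bytes / pages / 1e6


def uncompressed_bytes(width_in: float, height_in: float, ppi: int,
                       bits_per_pixel: int = 24) -> int:
    """Raw pixel data of an uncompressed image, ignoring TIFF header and ICC profile."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    print(avg_mb_per_page(20, 1))       # VD16-1 totals from the slides
    print(avg_mb_per_page(175, 7.5))    # VD16-2 totals from the slides
    # Hypothetical scan area of 8 x 10 inches at 300 ppi:
    print(uncompressed_bytes(8, 10, 300) / 1e6)
```

Under these assumptions both projects come out at roughly 20 to 23 megabytes per page, which is the order of magnitude an uncompressed 24-bit scan at 300 ppi would produce.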

Scanning Difficulties in the „VD16-1“-Project

Conservational requirements: Only 70% of books could be opened up to 90°

Scanning with „normal book scanners“

Need for book cradle support: only one-sided scanning possible (1 scan / click = 1 page)

3 process steps
   1. Scanning all left pages
   2. Scanning all right pages
   3. Assembling left and right pages

Lack of pagination in 16th-century books led to a higher error rate

Reduction of throughput
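
Step 3, assembling the two one-sided scan runs into reading order, can be sketched as a simple interleave. This is a minimal illustration, not the MDZ tooling: it assumes both runs are complete and in order, and the starts_with parameter (which side the first image belongs to) is a hypothetical switch.

```python
from itertools import zip_longest


def assemble_pages(left, right, starts_with="left"):
    """Interleave the left-page run and the right-page run into one sequence.

    Raises ValueError if the run lengths cannot belong to the same book,
    which is the kind of error that missing pagination makes hard to spot.
    """
    first, second = (left, right) if starts_with == "left" else (right, left)
    if not 0 <= len(first) - len(second) <= 1:
        raise ValueError("left and right scan runs do not match in length")
    merged = []
    for a, b in zip_longest(first, second):
        merged.append(a)
        if b is not None:
            merged.append(b)
    return merged


if __name__ == "__main__":
    print(assemble_pages(["L1", "L2"], ["R1", "R2"]))
```

A length check catches a skipped page only if the two runs end up unequal; a page skipped in both runs would pass unnoticed.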

Scanning Throughput in the „VD16-1“-Project

Average rate: 3 books per scanner per 8-hour working day

Increasing set-up time due to conservational requirements

Set-up time =
   Adjustment of the book before the first scan
   Scanning of the color/sharpness targets
   Adjusting of sharpness during the scan process (only for scanning without glass plate)
   Integration time – writing the data from the CCD/firmware to hard disk
   Shut-down of the book

Set-up time varies from object to object

The Way to the ScanRobot

Objective: optimisation of throughput for the scanning of early and rare prints with limited opening angle

2006 - Market evaluation for automatic book scanners (hardware and software) and tender

Development partnership with

Milestones
   July 2007: 2 prototypes in the BSB
   January 2008: start of production
   March 2008: third robot
   April 2008: start of 4-shift operation from 07.00 a.m. to 11.00 p.m.

„VD16-2“: ScanRobot-Throughput

Status now: 10 books per ScanRobot per 8-hour working day

Total weekly production: 300 books ~ 75,000 pages ~ 2.2 terabytes

Interim result: The objective of a significant increase in production has been achieved

Next target - optimisation of the book cradle: 500 + x titles per week
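
The weekly total can be reproduced from the per-robot rate. The sketch below is one possible reading, not a statement of how MDZ calculated it: it takes the 3 ScanRobots and the 07.00 a.m. to 11.00 p.m. operation from the other slides and assumes a five-day week, which the slides do not state.

```python
BOOKS_PER_ROBOT_PER_8H = 10   # from the slide
ROBOTS = 3                    # "3 ScanRobots in the MDZ-Scancentre"
HOURS_PER_DAY = 16            # 07.00 a.m. to 11.00 p.m.
DAYS_PER_WEEK = 5             # assumption, not stated on the slides


def weekly_books(rate=BOOKS_PER_ROBOT_PER_8H, robots=ROBOTS,
                 hours=HOURS_PER_DAY, days=DAYS_PER_WEEK):
    """Books per week if every robot sustains the stated 8-hour rate."""
    return rate * robots * (hours / 8) * days


if __name__ == "__main__":
    books = weekly_books()
    print(books)                       # weekly books under these assumptions
    print(75_000 / books)              # average pages per book
    print(2.2e12 / 75_000 / 1e6)       # average megabytes per page (decimal units)
```

Under these assumptions the per-robot rate matches the stated 300 books per week, with an average of 250 pages per book.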

ScanRobot in Action

Scanning 16th-century books with automated book scanners

Digitisation in the GBS programme

By Google and at Google's expense

Google Digital Copy
   Integration in Google's services

Library Digital Copy
   Integration in the BSB services

Mass Digitisation

No „Selective Picking“: no prioritisation of text corpora, shelf numbers, material types etc.

Logistics determines procedures and tempo of digitisation (trucks, carts, operating conditions, shifts etc.)

Selection focuses strictly on conservation concerns, formats or copyright issues

Continuous optimisation of Google's scanning technologies („If it doesn't work today, it will tomorrow“)

In principle: if Google cannot digitise an item for technical or conservational reasons, the MDZ will

Some Workflow Facets

Organisation of a workflow across many departments for the digitisation of 1,000 books or more per day

Guaranteeing the quality of the workflow and its outcome: Track every book in every step of the process

Collection and documentation of process data (non-scannable marks, comments…)

The digitisation process shouldn‘t disturb the normal library processes
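
Tracking every book through every step can be pictured as a small status log per barcode. The following is only a sketch: the step names are invented for illustration (the slides do not list the actual workflow stations), and BookTrack is a hypothetical class, not part of ZEND.

```python
from dataclasses import dataclass, field

# Hypothetical workflow stations; the real sequence is not given on the slides.
STEPS = ["retrieved", "shipped", "scanned", "returned", "reshelved"]


@dataclass
class BookTrack:
    barcode: str
    events: list = field(default_factory=list)  # (step, comment) in order

    def record(self, step: str, comment: str = "") -> None:
        """Log the next step; refuse steps that are unknown or out of order."""
        done = len(self.events)
        expected = STEPS[done] if done < len(STEPS) else None
        if step != expected:
            raise ValueError(f"{self.barcode}: expected {expected!r}, got {step!r}")
        self.events.append((step, comment))

    @property
    def status(self) -> str:
        return self.events[-1][0] if self.events else "not started"


if __name__ == "__main__":
    book = BookTrack("B-0001")
    book.record("retrieved")
    book.record("shipped", comment="fold-out plate, not scannable")
    print(book.status, book.events)
```

Keeping the comment next to each step is one simple way to collect process data such as non-scannable marks alongside the status.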

The Role of MDZ

MDZ as part of a BSB-wide project organisation (all departments involved)

Responsible for the processing of the „Digital Library Copy“ from data import to online publication

Processing software: a special application of ZEND = „Google-ZEND“

Storage and long-term preservation in cooperation with the Leibniz Supercomputing Centre

Google-ZEND in a Nutshell

DLC – Preview of Online Publication by MDZ

Coming soon …

Outlook

MDZ is only in the initial phase of mass digitisation

Data in long-term preservation can be processed in the future with new technologies

Need for further development and automation in
   OCR
   Indexing – Text Mining
   Quality Control

More digital content available online allows deeper linking between different repositories

There are still a lot of challenges …

Thank you for your attention!

Contact: markus.brantl@bsb-muenchen.de
