Approaches to preserving digitized taxonomic data

Approaches to preserving digitized taxonomic data: Prints, manuscripts & specimens. Chris Freeland, Director, Center for Biodiversity Informatics; Technical Director, Biodiversity Heritage Library. 28 October 2011. @chrisfreeland

DESCRIPTION

Sherborn Symposium. Natural History Museum, London. 28 October 2011.

TRANSCRIPT

Page 1: Approaches to preserving digitized taxonomic data

Approaches to preserving digitized taxonomic data:

Prints, manuscripts & specimens

Chris Freeland

Director, Center for Biodiversity Informatics

Technical Director, Biodiversity Heritage Library

28 October 2011

@chrisfreeland

Page 2: Approaches to preserving digitized taxonomic data

Prints / Manuscripts / Specimens

Different objects, similar management

http://www.flickr.com/photos/biodivlibrary/6257859557 http://www.flickr.com/photos/chrisfreeland/6018724034 http://www.biodiversitylibrary.org/page/34045915

Page 3: Approaches to preserving digitized taxonomic data

Overview of Talk

• Why worry about digital preservation?

• Considerations for preservation
  – Collaboration
  – File formats
  – Metadata standards

• Views to the future

Preservation Panic!

Page 4: Approaches to preserving digitized taxonomic data

WHY WORRY?

http://www.flickr.com/photos/biodivlibrary/6008902662

Page 5: Approaches to preserving digitized taxonomic data

Do it once, do it right

Costs more to get object to scanner than to scan

Page 6: Approaches to preserving digitized taxonomic data

Cautionary Tales

• Conversion / Compost / Corruption

• Longevity of digital objects

• File changes

• Media obsolescence
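One common way to notice corruption or unexpected file changes is to record a checksum for each file and re-check it later. The slides do not prescribe a method; the following is a minimal sketch, assuming a SHA-256 manifest kept alongside the files. The function names and the manifest layout are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest: dict[str, str], root: Path) -> list[str]:
    """List manifest entries that are missing or whose digest no longer matches.

    `manifest` maps a path relative to `root` to the digest recorded
    when the file was first stored (a hypothetical layout).
    """
    damaged = []
    for relative_name, expected in sorted(manifest.items()):
        target = root / relative_name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(relative_name)
    return damaged
```

Running such a check on a schedule, and on every copy, turns silent corruption into something that can be reported and repaired from another copy.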

Page 7: Approaches to preserving digitized taxonomic data

CONSIDERATION: COLLABORATION

Page 8: Approaches to preserving digitized taxonomic data

LOCKSS

Lots Of Copies Keeps Stuff Safe

• LOCKSS is both a software platform & a concept
  – Software: http://www.lockss.org

Page 9: Approaches to preserving digitized taxonomic data

Rule of 3

Museum X / Library Y / Archive Z

1. Geographic Locations
2. Administrations
3. Technology Platforms
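One reading of the Rule of 3 is that the copies of a collection should differ along all three dimensions at once: location, administration and technology platform. The sketch below checks a replication plan against that reading. The `Copy` class and `rule_of_three_gaps` function are hypothetical names, and the threshold of three distinct values per dimension is an assumption drawn from the slide's title.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a digital collection."""
    location: str        # where the copy physically lives
    administration: str  # organisation responsible for the copy
    platform: str        # storage technology holding the copy


def rule_of_three_gaps(copies: list[Copy], minimum: int = 3) -> list[str]:
    """Return the dimensions with fewer than `minimum` distinct values."""
    gaps = []
    for dimension in ("location", "administration", "platform"):
        distinct = {getattr(copy, dimension) for copy in copies}
        if len(distinct) < minimum:
            gaps.append(dimension)
    return gaps
```

An empty result means the plan is diverse on every dimension. A result such as `["platform"]` means that a single technology failure could affect more than one copy.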

Page 10: Approaches to preserving digitized taxonomic data

CONSIDERATION: FILE FORMATS

Page 11: Approaches to preserving digitized taxonomic data

JPEG2000

• Wavelet compression, lossless encoding

• 12 Parts

• Of particular interest to documents & specimens:
  – Part 1: Core Coding System, ISO/IEC 15444-1
  – Part 6: Compound image file format
  – Part 10: JP3D, Volumetric images

http://www.jpeg.org/jpeg2000/
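When auditing a collection it can help to confirm that a file really is JPEG 2000 rather than trusting its extension. The sketch below inspects the first bytes of a file. It assumes the JP2-family layout in which a 12-byte signature box is followed immediately by a File Type box carrying a four-character brand, and it also recognises a bare codestream by its opening markers. The function name is hypothetical, and this is an identification aid, not a validator.

```python
from typing import Optional

# 12-byte JP2 signature box shared by the JP2 family of file formats.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")
# A raw codestream opens with the SOC marker followed by the SIZ marker.
CODESTREAM_START = bytes.fromhex("FF4FFF51")


def identify_jpeg2000(header: bytes) -> Optional[str]:
    """Classify the leading bytes of a file.

    Returns the File Type box brand (for example "jp2") for a boxed file,
    "codestream" for a raw codestream, "unknown" when the signature is
    present but no brand can be read, and None for anything else.
    """
    if header.startswith(JP2_SIGNATURE):
        if len(header) >= 24 and header[16:20] == b"ftyp":
            return header[20:24].decode("ascii", errors="replace").strip()
        return "unknown"
    if header.startswith(CODESTREAM_START):
        return "codestream"
    return None
```

Reading the first 24 bytes of each file is enough for this check, so it stays cheap even across a large image store.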

Page 12: Approaches to preserving digitized taxonomic data

http://www.tropicos.org/ImageFullView.aspx?imageid=62182

Page 13: Approaches to preserving digitized taxonomic data

JPEG2000 (Hurrahs & Hisses)

• Advantages
  – Store a single file for access & preservation
  – Standards-based
  – Saves drive space (important at museum scale)

• Disadvantages
  – Doesn’t have wide native support in many apps
  – Requires an intermediary app to decode & serve
    • But, there’s an open source option: djatoka http://djatoka.sourceforge.net
  – Reports of data loss

Page 14: Approaches to preserving digitized taxonomic data

PDF/A

• ISO-standardized version of PDF suitable for long-term preservation

• Identifies a "profile" for electronic documents that ensures the documents can be reproduced exactly the same way in years to come.*

• Makes the file self-contained (and therefore larger)
  – Embeds fonts
  – Graphics

* http://en.wikipedia.org/wiki/PDF/A
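A PDF/A file states its conformance level in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below reads that declaration from the raw bytes. It assumes the XMP packet is stored uncompressed, and the function name is hypothetical. It reports only what the file claims; confirming real conformance needs a dedicated validator.

```python
import re
from typing import Optional

# The properties may appear as XML attributes or as child elements.
_PART = re.compile(rb"""pdfaid:part(?:=["']|>)\s*(\d)""")
_CONFORMANCE = re.compile(rb"""pdfaid:conformance(?:=["']|>)\s*([A-Za-z])""")


def declared_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the declared level, such as "PDF/A-1b", or None if absent."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conformance = _CONFORMANCE.search(pdf_bytes)
    suffix = conformance.group(1).decode("ascii").lower() if conformance else ""
    return f"PDF/A-{part.group(1).decode('ascii')}{suffix}"
```

A scan like this is useful for sorting an existing collection into files that already claim PDF/A and files that still need conversion.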

Page 15: Approaches to preserving digitized taxonomic data

CONSIDERATION: METADATA

Page 16: Approaches to preserving digitized taxonomic data

The Great Thing About STANDARDS Is That There Are SO MANY To Choose From

Page 17: Approaches to preserving digitized taxonomic data

Metadata Preservation

• Descriptive information (metadata) provides content & context for indexing, reuse

• Can bundle metadata within files
  – EXIF: images, common in digital cameras
  – Adobe XMP: docs, images

• Should commit metadata to file system
  – Should not manage just in DB or other management system

[Diagram: Filesystem holding <DwC> XML and JP2 files]
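Committing metadata to the file system can be as simple as writing a small XML file next to each image. The sketch below writes a Simple Darwin Core record as a sidecar beside a JP2 file. The sidecar naming convention (same base name, `.xml` extension) and the function name are assumptions, and the record values in any call are illustrative. The term names passed in should be real Darwin Core terms such as `scientificName` or `catalogNumber`.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
DWR_NS = "http://rs.tdwg.org/dwc/xsd/simpledarwincore/"


def write_sidecar(image_path: Path, terms: dict[str, str]) -> Path:
    """Write Darwin Core terms to an XML file beside the image.

    Returns the path of the sidecar file that was written.
    """
    ET.register_namespace("dwc", DWC_NS)
    ET.register_namespace("dwr", DWR_NS)
    record_set = ET.Element(f"{{{DWR_NS}}}SimpleDarwinRecordSet")
    record = ET.SubElement(record_set, f"{{{DWR_NS}}}SimpleDarwinRecord")
    for name, value in terms.items():
        ET.SubElement(record, f"{{{DWC_NS}}}{name}").text = value
    sidecar = image_path.with_suffix(".xml")
    ET.ElementTree(record_set).write(
        str(sidecar), encoding="utf-8", xml_declaration=True
    )
    return sidecar
```

Because the description travels with the image as an ordinary file, it survives even if the database or management system that produced it does not.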

Page 18: Approaches to preserving digitized taxonomic data

THE FUTURE

Page 19: Approaches to preserving digitized taxonomic data

Electronic Publications

• Happening now, has been for years

• Should take same care in ensuring heterogeneity & diversity in digital management systems as with printed, bound books
  – Monolithic libraries have failed over time
  – Monolithic electronic archives will, too

Page 20: Approaches to preserving digitized taxonomic data

http://www.biodiversitylibrary.org/page/22681143

Need a meadow…

Page 21: Approaches to preserving digitized taxonomic data

…not a monoculture.

Page 22: Approaches to preserving digitized taxonomic data

There is no silver bullet

• Make best decision today

• Stay up with technology changes & best practices
  – <insert library & archive professionals here>

• Evaluate, experiment, document, lead

• Move to stable new technologies when necessary

Page 23: Approaches to preserving digitized taxonomic data

Questions?

Chris Freeland

Director, Center for Biodiversity Informatics

Technical Director, Biodiversity Heritage Library

28 October 2011

Email: [email protected]

Twitter: @chrisfreeland