Digital Libraries, R. Jantz - Feb. 26, 2002

Digital Preservation - Outline

Introduction - Definitions, Facts, Challenges

Digital Archiving – A Life Cycle View
– Metadata
– Strategies
– RUL Projects

Trusted Digital Repositories

Digital Dark Ages?

As we move into the electronic era of digital objects it is important to know that there are new barbarians at the gate and that we are moving into an era where much of what we know today, much of what is coded and written electronically, will be lost forever. We are, to my mind, living in the midst of digital Dark Ages; consequently, much as monks of times past, it falls to librarians and archivists to hold to the tradition which reveres history and the published heritage of our times. (Kuny, 1998)

Preservation

“The urge to preserve is endemic to our roles as librarians.”

“The patent office, home to nearly 6.5 million patents dating to 1790, is converting to an electronic database and discarding a significant portion of its paper files after they have been scanned and digitized.” -Mitchell, A. (2001). Ingenuity’s Blueprints, Into History’s Dustbin. NY Times. December 30, 2001, p. A1.

A scenario: A truck loaded with hazardous waste is headed toward a dump site. Will our descendants know where we have buried the waste? (Bide, et al, 1999)

Digital Preservation – Some Numbers

$20 Trillion loss of information expected over the next 20 years (Lysakowski and Leibowitz, 2000)

– Within 10 years, the total number of electronic records could be doubling every 60 minutes.

– From an economic model, the cost of converting from MS-Office95 to Office97 is estimated at 711,110 work years.

80 million books in the US are rapidly deteriorating
– Yale University states that 80% of their collection is endangered.

Print material
– All print material (ASCII text) published in the world each year could be stored in about 5 terabytes.

Images
– Over 80 billion photographs are taken each year, which would take 400 petabytes to store.

Numbers – continued (from: http://www.ccsf.caltech.edu/~roy/dataquan/)

Megabyte – one million bytes
Gigabyte – 1000 megabytes
Terabyte – 1000 gigabytes
– 10 terabytes: the printed collection of the US Library of Congress
Petabyte – 1000 terabytes
– 2 petabytes: all the material in US academic research libraries
Exabyte – 1000 petabytes
– 5 exabytes: all words ever spoken by human beings
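
The unit ladder above makes the figures on the previous slide easy to sanity-check. The following is a minimal sketch, assuming the decimal (factor of 1000) units given on the slide; the helper names `to_bytes` and `bytes_per_item` are illustrative and do not come from the talk.

```python
# Decimal storage units as listed on the slide: each step up is a factor of 1000.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def to_bytes(value, unit):
    """Convert a quantity such as (400, "PB") into a number of bytes."""
    return value * UNITS[unit]


def bytes_per_item(total_value, total_unit, item_count):
    """Average size of one item when item_count items fill the given total."""
    return to_bytes(total_value, total_unit) / item_count


# 80 billion photographs stored in 400 petabytes implies this many bytes per photo.
photo_bytes = bytes_per_item(400, "PB", 80 * 10**9)
print(photo_bytes / UNITS["MB"])  # prints 5.0 (megabytes per photograph)
```

In other words, the 400 petabyte estimate assumes roughly 5 megabytes per photograph.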

Preservation in Digital Libraries

Preservation: “The managerial, financial, and technical issues involved in preserving library (or archive) materials in all formats - and/or their information content - so as to maximize their useful life” (Eden, 1997)

Digital preservation is defined as the managed activities necessary for ensuring:

1. The long-term maintenance of a byte stream, and
2. Continued accessibility of the contents through time and changing technology.

Digital Libraries vs. Digital Archives: Archives make a commitment to long-term preservation of digital information. (Joint Task Force on Digital Archiving)
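
The first requirement above, long-term maintenance of a byte stream, is commonly checked by recording a checksum when an object is archived and recomputing it later. The sketch below is one possible illustration, not a method described in the talk; SHA-256 and the function names `sha256_of` and `is_unchanged` are assumptions chosen for the example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """True if the file's current digest matches the one recorded at archiving time."""
    return sha256_of(path) == recorded_digest
```

A matching digest only shows that the bytes are intact. It says nothing about the second requirement, whether the contents can still be opened and understood.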

Why Would You Digitally Preserve?

Protect original print artifact

Provide access by accurately representing originals

Preserve material that exists in electronic form only

Enhance research by “improving” originals
– High resolution imagery to study details
– Searchable text

The Challenges of Digital Preservation

Lack of standards (or too many standards)
Lack of documentation on production and use
Cost and rapid obsolescence of technology
Impermanence of the medium
Mutability of the content (easily changed – legal issues)
Version control
Need to guarantee integrity of digital information
Migration of information (driven by external factors)

What to Archive – A Checklist

Historical and research value
Aesthetic and artistic merit
Uniqueness of an item
Subject content – relevant to Institution
Access – Restrictions and inventory
Condition
Frequency of use – frequency of change
Ownership
Redundancy – concern for loss or modification
Length of preservation
Is any other institution archiving the material?

Candidates for Preservation

Material created (not digitized) in digital format
– Reference databases (online catalogs, subject-specific indexes, etc.)
– Electronic journals
– Digital maps
– Data
– Websites (e.g. research guides, web-based databases, documents)

Government information
– Census data, international statistics (Do we rely on the government to preserve this material?)
– Consortiums such as the Inter-university Consortium for Political and Social Research (ICPSR) have a role.

Print material/manuscripts that are digitized for access and/or preservation:
– Original documents not retained (e.g. as in the NJ Environmental Digital Library)
– Original document retained (as in Special Collections)

Electronic (analog) media that is digitized (audio, video tapes)

Digital Archiving – A Life Cycle View

1. Creation

2. Acquisition and Collection Development

3. Identification and Cataloging

4. Storage

5. Preservation (incl. Metadata)

6. Access

from (Hodge, 2000)

Digital Preservation Strategies

Migration: transferring digital materials from one medium or format to another because of obsolescence, media failure, software updates, changing standards, etc.

Emulation: refers to the process of mimicking, in software, a piece of hardware or software so that other processes think the original equipment/function is still available in its original form. (http://www.nla.gov.au/padi/topics/17.html)

Encapsulation: A technique of grouping together a digital object and anything else necessary to provide access to that object. This technique aims to overcome the problems of the technological obsolescence of file formats because the details of how to interpret the digital bits in the object can be part of the encapsulated information. (http://www.nla.gov.au/padi/topics/17.html)
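
Encapsulation, as defined above, can be pictured as a single package holding the object together with a description of how to interpret its bits. The following is a rough sketch under stated assumptions: the ZIP container, the `README-format.json` file name, and the functions `encapsulate` and `unpack` are all hypothetical choices for illustration, not a format named in the talk or by PADI.

```python
import io
import json
import zipfile


def encapsulate(name, payload, format_notes):
    """Bundle an object's bytes with notes describing how to interpret them."""
    notes = dict(format_notes, file=name)  # record which file the notes describe
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("object/" + name, payload)
        package.writestr("README-format.json", json.dumps(notes, indent=2))
    return buffer.getvalue()


def unpack(package_bytes):
    """Return (payload, notes) from a package produced by encapsulate()."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        notes = json.loads(package.read("README-format.json"))
        payload = package.read("object/" + notes["file"])
    return payload, notes


package = encapsulate(
    "prices.csv",
    b"year,price\n1350,12\n",
    {"format": "comma-separated text", "encoding": "ASCII"},
)
payload, notes = unpack(package)
```

The container itself is still a format that can become obsolete, which is one reason encapsulation is usually discussed alongside migration and emulation rather than as a replacement for them.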

Migration of Digital Information

Reasons for Migration:

Medium refreshing (e.g. rewrite a CD)

Medium conversion (diskette to CD)

Format conversion (ascii to pdf)

Version upgrade (Office97 to Office2000)

Migration of technical environment (W98 to NT)
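
The kinds of migration listed above differ in how they can be verified. A medium refresh should produce a bit-identical copy, while a format conversion changes the bytes, so only the content can be compared. The sketch below illustrates the difference under stated assumptions: a text re-encoding (Latin-1 to UTF-8) stands in for the slide's "ascii to pdf" example, and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """SHA-256 hex digest of a (small) file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def refresh_medium(source, target):
    """Copy a file to a new location and confirm the copy is bit-identical."""
    shutil.copyfile(source, target)
    if sha256(source) != sha256(target):
        raise OSError(f"copy of {source} is not bit-identical")
    return sha256(target)


def convert_encoding(source, target, old="latin-1", new="utf-8"):
    """Re-encode a text file; the bytes change, so verify the decoded text instead."""
    text = Path(source).read_bytes().decode(old)
    Path(target).write_bytes(text.encode(new))
    if Path(target).read_bytes().decode(new) != text:
        raise OSError(f"converted text in {target} does not match {source}")
    return len(text)  # number of characters carried over
```

For richer formats such as word-processor files, deciding what counts as "the same content" after conversion is much harder than comparing text.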

The Migration Process

Error Prone

Labor intensive and expensive

Governed by external factors

The only approach that works for now

RUL Projects – A Sampling

Medieval Early Modern Data Bank

Eagleton Public Opinion Polls

The Augustine Collection

REALITI – A Digital Preservation Framework

Medieval Early Modern Data Bank - MEMDB

Characteristics
• At: http://www.scc.rutgers.edu/memdb
• Content: commodity prices in the medieval period
• Access: public domain
• Compiler: Co-directors of MEMDB
• Owner: RUL?
• Archiver: (who should archive?)
• Type: Database on the web
• Format: html, Active Server Pages, MS-Access
• Metadata req’mts: numeric data

• Questions: What is the primary document? How long should it be preserved? Extent of “document”? Owner? Preserve look & feel?

Eagleton Public Opinion Polls

Characteristics
• At: http://www.scc.rutgers.edu/eagleton_tst
• Content: New Jersey public opinion (1970 - )
• Access: public domain
• Compiler: Eagleton Institute
• Owner: Eagleton/Star Ledger
• Archiver: RUL/Scholarly Communication Center
• Type: database on the Web
• Format: html, MS-Access, portable spss files
• Metadata req’mts: Questionnaires & numeric data

• Questions: Preserve “look & feel”, spss (proprietary software)

The Augustine Collection

Characteristics
• At: http://www.scc.rutgers.edu/augustine
• Content: Photographs from 19th Century New Jersey Shore
• Access: public domain
• Compiler: William F. Augustine
• Owner: RUL Special Collections
• Archiver: RUL Special Collections
• Type: image archive
• Format: html, jpeg
• Metadata req’mts: original artifacts

• Questions: image format, preserve digital archive, individual items/collection

REALITI – A Digital Preservation Framework (Rutgers Electronic Access to Library Information thru Technology Integration)

Characteristics
• At: http://www.scc.rutgers.edu/realiti
• Content: Civil War period in New Jersey
• Access: public domain
• Compiler: RUL Special Collections
• Owner: RUL
• Archiver: RUL Special Collections/SCC
• Type: Images on the Web
• Format: html, ColdFusion, MS-Access, PDF, djvu, tiff
• Metadata: Preservation, multiple formats

• Questions: formats, compression, metadata, original artifact

Preservation Metadata for Digital Collections*

Collection Level
1. Persistent identifier:
2. Date of creation:
3. Structural type: (e.g. ascii text, jpeg images, etc.)
4. Technical infrastructure: files, databases, html, etc.
5. File description:
6. System requirements:
7. Installation requirements:
8. Storage information:
9. Access inhibitors:
10. Access facilitators:
11. Preservation action permission:
12. Validation: (information about validation mechanism)
13. Relationships (to other objects):
14. Quirks: (any characteristic that may cause loss in functionality)
15. Archiving decision (work):
16. Decision reason (work):
17. Institution responsible for archiving decision (work):
18. Archiving decision (manifestation):
19. Decision reason (manifestation):
20. Institution responsible for archiving decision (manifestation):
21. Intention type:
22. Institution with preservation responsibility:
23. Process:
24. Record creator:
25. Other:

* (from National Library of Australia: http://www.nla.gov.au/preserve/pmeta.html)
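
A handful of the elements above can be sketched as a simple record, which makes it easy to see which elements are still unfilled for a collection. This is only an illustration: the class `CollectionRecord`, the selection of eight elements, the helper `missing_elements`, and the identifier value are assumptions, not part of the National Library of Australia element set. The format values are taken from the Augustine Collection slide.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class CollectionRecord:
    """A few of the collection-level preservation metadata elements."""
    persistent_identifier: str
    structural_type: str
    technical_infrastructure: str
    date_of_creation: str = ""
    system_requirements: str = ""
    access_inhibitors: str = ""
    quirks: str = ""
    relationships: list = field(default_factory=list)


def missing_elements(record):
    """Names of elements that have not been filled in yet."""
    return [name for name, value in asdict(record).items() if value in ("", [], None)]


augustine = CollectionRecord(
    persistent_identifier="example:augustine",  # hypothetical identifier
    structural_type="jpeg images",
    technical_infrastructure="html, jpeg files",
)
print(missing_elements(augustine))
```

Listing the gaps this way is one possible first step when trialing a metadata policy on an existing project.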

Trusted Digital Repositories (http://www.rlg.org/pr/pr2001-attributes.html)

A Proposed Framework for a Trusted Archival Agent:

Administrative – adherence to agreed upon standards

Organizational – commitment to management on behalf of depositors

Financial – adherence to good business practices

Technological – infrastructure in place with upgrade policies

Security – policies for security, auditability, and backup

Procedural – Repository practices will be in place and documented.

Possible Organizational Models (Who might be a digital repository)

Originators (e.g. individual researchers)
Publishers (What happens when the publisher goes out of business?)
Libraries, museums, and other conservator institutions
National libraries and archives
Cooperative service agencies (e.g. OCLC, RLG, ICPSR for social science research)
Segmented market providers (e.g. Bell & Howell for preserving dissertation literature and Early English Books)
Private storage providers
Computer centers
Scholarly associations (e.g. American Institute of Physics)
Indexing and abstracting services
Certified digital archives

Institutional Efforts

OCLC and Web Document Digital Archive (WDDA) Project
– Tools for libraries and archives to preserve and maintain access to digital content
– At: http://www.oclc.org/digitalpreservation

RLG Cultural Materials
– Cultural Materials is being developed through members to set the conditions for contributing and distributing their digital surrogates of valuable collections.
– The goal is a growing, significant, online resource and service solution.
– At: http://www.rlg.org/culturalres

LOCKSS – A permanent web publishing and access system
– Addresses problem of material no longer available from the publisher
– Modeled on distributed print libraries. Reich, et al. (2001). D-Lib Magazine, 7, (6).

OAIS – Open Archival Information System Reference Model
– Requirements for any system responsible for preserving any type of information over a long period.
– At: http://ssdoo.gsfc.nasa.gov/nost/isoas

Digital Preservation – Concluding Thoughts

Librarians and archivists are a key to the solution.

“A major academic scandal will have to happen first . . .” in order to focus attention and resources. (Graham, 2000).

A combination of solutions will be employed including migration and emulation.

Digital “archaeology” will be used to recover lost data.

Preservation Resources on the Web

Institutional Issues:
– ARL Preservation Program (http://www.arl.org/preserv/index.html)
– Digital Preservation Needs and Requirements in RLG Member Institutions (http://www.rlg.org/preserv/digpres.html)
– RLG DigiNews (http://www.rlg.org/preserv/diginews/)

Technical Information/Papers:
– Avoiding Technological Quicksand (http://www.clir.org/pubs/reports/rothenberg/contents.html)
– PADI - Preserving Access to Digital Information - from the National Library of Australia (http://www.nla.gov.au/padi/)
– Background Papers and Technical Information - from LOC American Memory site (http://memory.loc.gov/ammem/ftpfiles.html)
– Preservation of electronic information - a bibliography (http://homes.ukoln.ac.uk/~lismd/preservation.html)
– Digital Imaging Tutorial - http://www.library.cornell.edu/preservation/tutorial/

More Information on the Web

Technical Information/Papers (continued):
– CLIR Publications (http://www.clir.org/pubs/reports/reports.html)
– Kuny, T. (1998/May). The digital dark ages? Challenges in the preservation of electronic information. International Preservation News, (17). At: http://www.ifla.org/VI/4/news/17-98.htm#2
– Hodge, G. M. (2000). Best practices for digital archiving: An information life cycle approach. D-Lib Magazine, 6, (1). Available at: http://www.dlib.org/dlib/january00/01hodge.html

Handbooks:
– Hunter, G. S. (2000). Preserving Digital Information: A How-To-Do-It Manual. New York: Neal-Schuman Publishers
– Sitts, M. K. (2000). Handbook for Digital Projects: A Management Tool for Preservation and Access. Andover, Massachusetts: Northeast Document Conservation Center

References

Bide, M., Potter, E., & Watkinson, A. (1999). Digital Preservation: an introduction to the standards issues surrounding the deposit of non-print publications. At: www.bic.org.uk/digpres.doc

Graham, P. (2000). RLG and Archiving at the heart of the research library mission. RLG News, Winter 2000, (50), pp. 12–13.

Graham, P. (1998/February). Digital strategies for the Rutgers University Libraries: a white paper draft. DRAFT 4.

Hedstrom, M. & Montgomery, S. (1998). Digital Preservation Needs and Requirements in RLG Member Institutions: A Study Commissioned by the Research Libraries Group. Available at: http://www.rlg.org/preserv/digpres.html

Hodge, G. (2000). Best practices for digital archiving: An information life cycle approach. D-Lib Magazine, 6, (1). Available at: http://www.dlib.org/dlib/january00/01hodge.html.

Lysakowski, R. & Leibowitz, Z. (2000). Looming information age crisis expected to cause trillion-dollar losses over the next 20 years: Titanic 2020 – a call to action. Available at: http://www.censa.org.

Rothenberg, J. (1998/January). Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. Available at: http://www.clir.org/pubs/reports/rothenberg/contents.html

Migration & Complexity of the Technical Environment

                       Client                         Server
Hardware               PCs, disks, CDs                Server, storage, network
Operating System       W95/98/2000                    Unix/NT
Network                TCP/IP                         TCP/IP
Application Software   Adobe, Netscape, Word, SPSS    Servers: web, file, email, etc.
Formats (extensions)   pdf, doc, wpd, txt, gif, jpeg, tiff, eoo, por, ebx, mdb

Preservation in Digital Libraries

Preservation: “The managerial, financial, and technical issues involved in preserving library (or archive) materials in all formats - and/or their information content - so as to maximize their useful life” (Eden, 1997)

Digital preservation: The term refers exclusively to the preservation (whatever exactly that entails) of material which is available [solely?] in electronic form (Bide, 1999).

And the digital version is considered to be the primary archival item. (Hedstrom, 1998)

Digital Libraries vs. Digital Archives: Archives make a commitment to long-term preservation of digital information. (Joint Task Force on Digital Archiving)

Digital Archiving – Getting Started

Form an archiving working group
Prepare a preliminary policy statement
Trial the policy statement with several small, existing projects
Examine what others are doing and bring in best ideas.
Collaborate with others who are interested in digital preservation.
Initiate forums on digital archiving; invite colleagues, students, researchers, etc.
Submit a recommendation for a digital archiving program and next steps.

Trusted Digital Repositories (http://www.rlg.org/pr/pr2001-attributes.html)

A Proposed Definition (from RLG document):

Technology Infrastructure
– Auditability, security, and communication
– Backup policies incl. avoiding, detecting and restoring corrupted data

Organization
– Certification
– Compliance
– Reputation and performance
– Agreements between creators and providers
– Open sharing of what is being preserved and for whom
– Balanced risk, benefit, and cost