Capturing preservation metadata from institutional repositories Preserv Project Presented by Steve Hitchcock Intelligence Agents Multimedia Group, School of Electronics and Computer Science (ECS), Southampton University DCC Workshop on the Long-term Curation within Digital Repositories Cambridge, 6 July 2005


Page 1

Capturing preservation metadata from institutional repositories

Preserv Project

Presented by Steve Hitchcock

Intelligence Agents Multimedia Group, School of Electronics and Computer Science (ECS), Southampton University

DCC Workshop on the Long-term Curation within Digital Repositories

Cambridge, 6 July 2005

Page 2

Abstract

Preservation scenarios are often based on hypothetical situations, generalised applications such as digital libraries or cultural heritage organisations, or on specific applications such as digitisation. This presentation will consider the emerging but real scenario of preservation in the context of institutional repositories (IRs), being investigated by the JISC Preserv project. While on the surface this particular scenario may not seem to differ significantly from others, we will build a picture of relationships between repositories and preservation service providers to reveal what differences there may be with other scenarios and to understand the implications. We will use this analysis to inform the capture of some preservation metadata from IRs through the user deposit interface, perhaps the most critical data capture point in the IR preservation chain. Some initial ideas on formalising these IR preservation elements will be proposed for consultation, with a view to learning from, and possibly contributing to, the standard reference in this area, currently the PREMIS Working Group Data Dictionary for Preservation Metadata.

Page 3

Preserv, Capturing preservation metadata from institutional repositories, DCC, Cambridge, 6 July 2005

Page 4

Preservation

Page 5

Preservation: storage media, media refreshing, reformatting, backups and disaster recovery, environment, audit, security, preservation strategy, migration, technology preservation, emulation, records management, etc.

Page 6

Preservation: storage media, migration, etc.

Preserv partner: British Library

Page 8

Preservation: storage media, migration, etc.

Preserv partner: British Library

Institutional repository: Eprints.org, DSpace, FAIR, JISC DRs

Page 9

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

Page 10

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Page 12

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI
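
The machine interface labelled OAI is presumably the OAI Protocol for Metadata Harvesting (OAI-PMH), through which a preservation service could pull records from an IR. As a minimal sketch, assuming a harvester that only needs to build a `ListRecords` request: the function name and the base URL below are hypothetical and not taken from the talk.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # "from" limits the harvest to records changed on or after this date
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, for illustration only
print(list_records_url("https://repository.example.org/oai", from_date="2005-07-01"))
```

A selective harvest of this kind would let a service provider collect only what has changed since its last visit, rather than the whole repository.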

Page 13

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

“Access is still not the primary purpose of a preservation system” (Cornell OAIS tutorial)

Page 15

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

M I/F (OAI)

M I/F

Page 17

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

IR author deposit interface

Page 18

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

IR author deposit interface

“It is important to build the concept of preservation from the outset” (JISC Circular 4/04)

Page 19

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

Eprints deposit interface

Page 20

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

Eprints deposit interface

Contents of IRs
• Many types of digital objects, formats
• Versioning issues, some duplication
• Different degrees of moderation: institutional membership is selection baseline

Page 21

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

IR author deposit interface

Format

Format ID: TNA + Pronom
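
The diagram places format identification at the deposit interface. As a rough sketch of what deposit-time identification might look like, assuming files are recognised by their leading byte signatures: the `SIGNATURES` table and `identify_format` function are hand-written stand-ins for illustration, not data or an API from TNA or Pronom.

```python
# Hand-written stand-in for a format registry; NOT actual Pronom data.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(data: bytes) -> str:
    """Return a format label based on the leading bytes of a deposited file."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


# A deposit interface could record this label alongside the author's metadata.
print(identify_format(b"%PDF-1.4\n"))
```

Identifying the format automatically at deposit avoids asking the author for technical detail they may not know.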

Page 22

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

IR author deposit interface

Format

Format ID: TNA + Pronom

Influence/feedback

Page 24

Preservation: storage media, migration, etc.

Preserv partner: British Library

Preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

IR author deposit interface

Format

Format ID: TNA + Pronom

Stakeholders
• Users
• IR managers and admins
• Heads of institutions
• Research funders
• Course leaders
• Teachers
• Education funders

Page 25

Connecting IRs (content providers) and preservation services

• How far can we apply the OAIS model across our IR-preservation model?

• How can we embrace preservation metadata in this model?

• To what extent do these apply just to the preservation component?

It looks as if many of the ideas focus on the preservation archive rather than the content provider

• How can we connect content providers with preservation services?

Page 26

“Preservation metadata is seldom shared across organizations”

Clifford Lynch: "there has been some useful work done on metadata standards for preservation, although that work is not highly advanced. Part of the problem is that a lot of the work on preservation metadata has given rise to organizational guidance, the kinds of things you should think about as you attach metadata to objects when you want to preserve them, rather than hard specifics that would be more typical in interchange format, because preservation metadata today is seldom shared across organizations."*

* Since this talk was given there has been a great deal of progress in relevant areas here. I would point the interested reader at the work on METS, PREMIS, and the NISO Still Image Technical Metadata draft standard.

Preserving Digital Documents: Choices, Approaches, and Standards, Law Library Journal, 96 (4), 2004

http://www.aallnet.org/products/2004-40.pdf

Page 27

OAIS model

This model is not very different from the schematic sketched for Preserv, especially in terms of the core components – Ingest, Data Management, Archival Store, Access – but how effectively can OAIS be applied across different organisations?

Page 29

Ingest relies upon rules established to determine the metadata that must be present, the formats that are acceptable, the means that may be used for transferring objects, and the quality checks that must be performed

Archival Storage functions are like storage functions that are performed in all kinds of digital storage environments, whether long-term preservation is a goal or not. The difference lies in the added rigor in error checking, media replacement, and disaster recovery.

Data management provides the glue for the system by capturing and managing all of the metadata that is needed to operate the system. As in Archival Storage, the functions of Data Management are familiar to anyone who has worked with production databases.

Access in OAIS may provide objects to an intermediary system that then interacts directly with users, or it may deliver directly to users.

Access is still not the primary purpose of a preservation system.

From Digital Preservation Management, 4B. The OAIS Reference Model. A Cornell tutorial http://www.library.cornell.edu/iris/dpworkshop/working/index.html
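
The Ingest description above names three kinds of rule: required metadata, acceptable formats, and quality checks. A minimal sketch of such a check follows, assuming rule values invented purely for illustration; `REQUIRED_METADATA`, `ACCEPTED_FORMATS`, `Submission` and `ingest_problems` are hypothetical names, not part of OAIS or of Preserv.

```python
from dataclasses import dataclass

# Hypothetical ingest rules; a real archive would define its own.
REQUIRED_METADATA = {"title", "creator", "date"}
ACCEPTED_FORMATS = {"PDF", "XML", "TIFF"}


@dataclass
class Submission:
    metadata: dict
    file_format: str
    checksum_ok: bool = True  # outcome of a transfer-integrity check


def ingest_problems(sub: Submission) -> list:
    """Return the list of rule violations; an empty list means accept."""
    problems = []
    missing = REQUIRED_METADATA - set(sub.metadata)
    if missing:
        problems.append("missing metadata: " + ", ".join(sorted(missing)))
    if sub.file_format not in ACCEPTED_FORMATS:
        problems.append("format not accepted: " + sub.file_format)
    if not sub.checksum_ok:
        problems.append("quality check failed: checksum mismatch")
    return problems


print(ingest_problems(Submission({"title": "A thesis"}, "DOC")))
```

In the IR scenario, the open question is which of these rules run at the repository's deposit interface and which at the preservation service.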

Page 30

OAIS information model: AIP

• Archival Information Package (AIP):
  – Content Information
    • Original target of preservation
    • Information Object (Data Object & Representation Information)
  – Preservation Description Information (PDI)
    • Other information (metadata) "which will allow the understanding of the Content Information over an indefinite period of time"

From Michael Day, Categories, uses and challenges of metadata and process documentation

http://www.ukoln.ac.uk/preservation/presentations/2005/delos-summerschool/slides.ppt

Page 31

OAIS information model: PDI

Preservation Description Information (PDI) (Figure 4-16)
• Reference Information
• Provenance Information
• Context Information
• Fixity Information

From Michael Day, Categories, uses and challenges of metadata and process documentation

http://www.ukoln.ac.uk/preservation/presentations/2005/delos-summerschool/slides.ppt
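
The AIP and PDI structure in the two slides above can be sketched as nested records. This is an illustrative data structure only: the class and field names are assumptions chosen to mirror the OAIS terms, and the use of SHA-256 for fixity is one possible choice rather than anything the model prescribes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ContentInformation:
    data_object: bytes                # the bit stream being preserved
    representation_information: str   # what is needed to interpret the bits


@dataclass
class PreservationDescriptionInformation:
    reference: str    # identifier for the content
    provenance: str   # history and origin of the content
    context: str      # relationship to other content
    fixity: str       # evidence that the content is unaltered


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation


def make_aip(data: bytes, representation: str, identifier: str,
             provenance: str, context: str) -> ArchivalInformationPackage:
    """Package a data object with its PDI, computing fixity from the bytes."""
    fixity = "sha256:" + hashlib.sha256(data).hexdigest()
    return ArchivalInformationPackage(
        content=ContentInformation(data, representation),
        pdi=PreservationDescriptionInformation(identifier, provenance, context, fixity),
    )


# Hypothetical values throughout
aip = make_aip(b"%PDF-1.4 ...", "PDF 1.4", "oai:repository.example.org:123",
               "deposited by author", "part of a thesis collection")
print(aip.pdi.fixity)
```

Of the four PDI categories, fixity is the only one derived here from the object itself; the rest would have to come from the depositor or the repository.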

Page 32

A suggestion

• PDI, or some other part of the package, ought also to indicate what you want to do with the object, and perhaps act as the basis of selection for preservation services. What isn't clear is how these features could be incorporated.

Page 33

Preservation metadata

PREMIS = Preservation Metadata: Implementation Strategies

Preservation metadata = "the information a repository uses to support the digital preservation process"

The PREMIS Data Dictionary for Preservation Metadata (May 2005)

http://www.oclc.org/research/projects/pmwg/

Page 34

PREMIS data model

Intellectual entities

Objects

Events

Rights

Agents

Page 35

PREMIS Data Dictionary, v 1.0

• Defines semantic units for Objects, Events, Agents and Rights
  – Object: objectIdentifier, preservationLevel, objectCategory, objectCharacteristics (format, significant properties, etc.), creatingApplication, storageMedium, environment (dependencies, hardware and software details, etc.), relationship, …
  – Event: eventIdentifier, eventType (from a controlled list, e.g. ingestion, migration, normalization), eventDateTime, eventDetail, eventOutcomeInformation, linkingAgentIdentifier, …
  – Agent: agentIdentifier, agentName, agentType, …
  – Rights: permissionStatement, …
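
To make the semantic units above concrete, here is a small sketch that records an Object and an ingestion Event as plain dictionaries keyed by the unit names listed on the slide. The helper functions, the identifier values and the `"full"` preservation level are assumptions for illustration; this is not the PREMIS schema or any official serialisation.

```python
from datetime import datetime, timezone


def new_object(identifier, fmt, level="full"):
    """Describe a deposited file using Object semantic units."""
    return {
        "objectIdentifier": identifier,
        "preservationLevel": level,        # hypothetical value
        "objectCategory": "file",
        "objectCharacteristics": {"format": fmt},
    }


def new_event(identifier, event_type, agent_id, detail="", when=None):
    """Describe something that happened to an object using Event semantic units."""
    when = when or datetime.now(timezone.utc)
    return {
        "eventIdentifier": identifier,
        "eventType": event_type,           # e.g. ingestion, migration, normalization
        "eventDateTime": when.isoformat(),
        "eventDetail": detail,
        "linkingAgentIdentifier": agent_id,
    }


# Hypothetical identifiers throughout
record = {
    "object": new_object("obj-1", "PDF"),
    "events": [new_event("ev-1", "ingestion", "agent-ir-1", "deposited via IR interface")],
}
print(record["events"][0]["eventType"])
```

An IR could in principle populate most of these units automatically at deposit, which is the kind of automatic capture the later slides point to.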

Page 36

Limits to scope of PREMIS dictionary

– Does not focus on descriptive metadata
  • Domain specific and dealt with by many other schemes
– Does not deal with technical metadata for all different types of digital file (left to format experts)
– Does not consider in detail the business rules of a repository, e.g. roles, policies, and strategies (but this could be added to data model)

Page 37

Resonance

Points raised by PREMIS that have resonance for Preserv are:

• "Questions about business plans, policies, preservation strategies, as well as metadata"

• "Recognition of the need for automatic capture of metadata"

Page 38

Some questions

• Is there a need or scope for Preserv to describe semantic units within the PREMIS data dictionary that may be relevant to preservation metadata for IRs?

• Is what we need already included? For example, might the result of interaction with Pronom produce data residing in the Object entity (but noting PREMIS "Does not deal with technical metadata")?

• Which other entities might we contribute to? Events is one possibility. It's possible that some information we'd like to capture, e.g. funder, might fall within the scope of the Intellectual entity, which is outside the scope of PREMIS.

Page 39

Preservation: storage media, migration, etc.

Preserv partner: British Library

Other preservation service providers

Preserv partners: eprints.soton, Oxford Univ.

IRs, Eprints.org, DSpace, etc.

User/author User/reader

Deposit Access

Machine interface: OAI

IR author deposit interface

Format

Format ID: TNA + Pronom

Stakeholders
• Users
• IR managers and admins
• Heads of institutions
• Research funders
• Course leaders
• Teachers
• Education funders

Q. Preservation metadata or Selection metadata?

Page 40

Preservation metadata or selection metadata?

Who are the key stakeholders in selecting materials to preserve?

Institutions – selection by admission to IR, or other criteria?

Research funders – preserve outputs of a funded programme

Authors – objective or subjective?

Preservation business models for IRs: perhaps the answer lies in who pays?

Page 41

Credits

• Southampton University: Les Carr, Tim Brody, Jessie Hey, Steve Hitchcock
• British Library: Richard Boulderstone, Adam Farquhar, Richard Masters
• National Archives: Adrian Brown
• Oxford University: David Price, Frances Boyle, Neil Jefferies, Michael Popham