KeepIt Course 5: Revision

Digital Preservation Tools for Repository Managers: a practical course in five parts, presented by the KeepIt project. By Chris Blakeley. Revision with Steve Hitchcock: a rapid recap of tools from the course (what they do, what they look like, what we did with them).

Upload: jisc-keepit-project

Post on 17-Nov-2014


DESCRIPTION

This course revision presents a rapid recap of all the tools covered in the KeepIt course. It reproduces selected slides from each of the presentations given during the course to illustrate three aspects of each tool encountered: what they do, what they look like, what we did with them. The presentation was given as part of the final module of a 5-module course on digital preservation tools for repository managers, presented by the JISC KeepIt project. For more on this and other presentations in this course, look for the tag ‘KeepIt course’ in the project blog: http://blogs.ecs.soton.ac.uk/keepit/

TRANSCRIPT

Page 1: Keepit Course 5: Revision

Digital Preservation Tools for Repository Managers
A practical course in five parts

presented by the KeepIt project

By Chris Blakeley

Revision with Steve Hitchcock
A rapid recap of tools from the course:

what they do, what they look like, what we did with them

Page 2: Keepit Course 5: Revision

Tools Module 1

• The Data Asset Framework (DAF), Sarah Jones, University of Glasgow, and Harry Gibbs, University of Southampton

• The AIDA toolkit: Assessing Institutional Digital Assets, Ed Pinsent, University of London Computer Centre

Page 3: Keepit Course 5: Revision

… because good research needs good data

DAF at KeepIt Digital preservation tools for repositories, 19/01/10, Southampton

www.data-audit.eu/

Themes addressed in DAF surveys

• Data: type / format, volume, description, creator, funder

• Creation: policy, naming, versioning, metadata & documentation

• Management: storage, backup, roles and responsibilities, planning

• Access: restrictions, rights, security, frequency, ease of retrieval, publish

• Sharing: collaborators, requirements to share, methods, concerns

• Preservation: selection / retention, repository services, obsolescence

• Gaps / needs: services, advice, support, infrastructure

Page 4: Keepit Course 5: Revision


The methodology

http://www.data-audit.eu/DAF_Methodology.pdf

Page 5: Keepit Course 5: Revision


How would you scope:
1) the range of data being created at your institution?
2) user expectations / requirements on the repository to help manage and preserve those data?

• What would you want to find out? What would your key questions be?

• How would you go about collecting information?

• How would you ensure participation?

Page 6: Keepit Course 5: Revision

Relevance to this Course

• AIDA can…
– Measure your ability to manage digital content effectively
– Show how good you are at sustaining continued access
– Be directly relevant to managing a repository (access, sharing, and usage)
– Help you find out where you are
– Help you decide what to do next

Page 7: Keepit Course 5: Revision
Page 8: Keepit Course 5: Revision

Exercise

• Divide into four teams
• One element from each leg, relating to one activity
• Agree on the scope of what you will assess - work on a single Institution (real or imaginary)
• Assess the capacity for this activity
• Expected results:
– A score for the element in each leg and at each level (6 scores in all)
– Explain why you arrived at that decision
– Roles / job titles of people consulted
– Outline evidentiary sources that might help
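
The "6 scores in all" expected from the exercise above can be recorded and checked with a small structure. This is only a sketch: the slide does not name the legs or levels, so the leg names (Organisation, Technology, Resources), the two levels (Department, Institution) and the 1 to 5 stage scale used here are assumptions about the AIDA toolkit rather than anything stated in this presentation.

```python
from dataclasses import dataclass

# Assumptions, not taken from the slide: leg names, level names and a 1-5 stage scale.
LEGS = ("Organisation", "Technology", "Resources")
LEVELS = ("Department", "Institution")
MIN_STAGE, MAX_STAGE = 1, 5


@dataclass(frozen=True)
class Score:
    leg: str
    level: str
    stage: int
    rationale: str  # why the team arrived at this score


def validate(scores):
    """Return True if there is exactly one in-range score per (leg, level) pair."""
    expected = {(leg, level) for leg in LEGS for level in LEVELS}
    seen = set()
    for s in scores:
        key = (s.leg, s.level)
        if key not in expected:
            raise ValueError(f"unknown leg/level: {key}")
        if key in seen:
            raise ValueError(f"duplicate score for {key}")
        if not MIN_STAGE <= s.stage <= MAX_STAGE:
            raise ValueError(f"stage out of range for {key}: {s.stage}")
        seen.add(key)
    missing = expected - seen
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return True
```

Keeping the rationale next to each score mirrors the exercise's request to explain how each decision was reached.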

Page 9: Keepit Course 5: Revision

Tools Module 2

• Keeping Research Data Safe (KRDS), Costs, Policy, and Benefits in Long-term Digital Preservation, Neil Beagrie, Charles Beagrie Ltd consultancy

• LIFE3: Predicting Long Term Preservation Costs, Brian Hole, The British Library

Page 10: Keepit Course 5: Revision

What was Produced?

• A cost framework consisting of:

– activity model in 3 parts: pre-archive, archive, support services

– Key cost variables divided into economic adjustments and service adjustments

– Resources template for Transparent Costing (TRAC)

• 4 detailed case studies (ADS, Cambridge, KCL, Southampton)

• Data from other services.

Page 11: Keepit Course 5: Revision

Benefits Framework

KRDS2 Benefits Taxonomy

Dimension 1 (Type of Outcome): Direct / Indirect (costs avoided)

Dimension 2 (When): Near-Term Benefits / Long-term Benefits

Dimension 3 (Who): Private / Public
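
The three dimensions of the taxonomy above can be treated as independent tags on each benefit. The sketch below shows one possible way to record and group benefits along those dimensions; the class names, the `costed` flag and the two example benefits are hypothetical and are not part of KRDS2.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    DIRECT = "Direct"
    INDIRECT = "Indirect (costs avoided)"


class When(Enum):
    NEAR_TERM = "Near-term"
    LONG_TERM = "Long-term"


class Who(Enum):
    PRIVATE = "Private"
    PUBLIC = "Public"


@dataclass(frozen=True)
class Benefit:
    name: str            # hypothetical benefit description
    outcome: Outcome     # Dimension 1
    when: When           # Dimension 2
    who: Who             # Dimension 3
    costed: bool = False # whether the group thinks it can be costed


def by_dimension(benefits, dimension):
    """Group benefit names by one dimension: 'outcome', 'when' or 'who'."""
    groups = {}
    for b in benefits:
        groups.setdefault(getattr(b, dimension), []).append(b.name)
    return groups


# Invented examples for illustration only.
benefits = [
    Benefit("Data re-used instead of re-collected", Outcome.INDIRECT,
            When.LONG_TERM, Who.PUBLIC, costed=True),
    Benefit("New research opportunities", Outcome.DIRECT,
            When.NEAR_TERM, Who.PRIVATE),
]
```

Calling `by_dimension(benefits, "who")` groups the same list by the third dimension, which is one way to see whether a chosen set of key benefits covers both private and public interests.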

Page 12: Keepit Course 5: Revision

Group Exercise

• Agree a spokesperson and “recorder”
• Using KRDS2 Benefits Taxonomy:
– Q1 Identify which benefits can be costed?
– Q2 Select 3 Key benefits (include costed and uncosted)
– Q3 Identify the information you might need for measuring them
• Report back at 12.10 !

Page 13: Keepit Course 5: Revision


LIFE3: Estimating preservation costs

The LIFE3 Project

Aim: To develop the ability to estimate preservation costs across the digital lifecycle

The Project is developing:
• A series of costing models for each stage and element of the digital lifecycle
• An easy to use costing tool
• Support to enable easy input of data
• Integration to facilitate use of the results

[Diagram labels: Context, Organisational Profile, Content Profile, Cost Estimation Tool, Predicted Lifecycle Cost]

Page 14: Keepit Course 5: Revision


LIFE3 costing tool outputs – estimated costs

Lifecycle stages and their lifecycle elements:

• Creation or Purchase: ....
• Acquisition: Selection, Submission Agreement, IPR & Licensing, Ordering & Invoicing, Obtaining, Check-in
• Ingest: Quality Assurance, Metadata, Deposit, Holdings Update, Reference Linking
• Bit-stream Preservation: Repository Admin, Storage Provision, Refreshment, Backup, Inspection
• Content Preservation: Preservation Watch, Preservation Planning, Preservation Action, Re-ingest, Disposal
• Access: Access Provision, Access Control, User Support
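
Because costs are estimated per lifecycle element and grouped by lifecycle stage, a predicted lifecycle cost can be read as a roll-up of element estimates. The sketch below shows only that roll-up; it is not the LIFE3 costing model, and every figure in it is invented for illustration.

```python
# Hypothetical per-element estimates (arbitrary currency units), keyed by lifecycle stage.
estimates = {
    "Acquisition": {"Selection": 120.0, "Check-in": 30.0},
    "Ingest": {"Metadata": 200.0, "Deposit": 50.0},
    "Bit-stream Preservation": {"Storage Provision": 400.0, "Backup": 100.0},
}


def stage_totals(estimates):
    """Sum the element estimates within each lifecycle stage."""
    return {stage: sum(elements.values()) for stage, elements in estimates.items()}


def lifecycle_total(estimates):
    """Sum the stage totals to give one predicted lifecycle cost."""
    return sum(stage_totals(estimates).values())
```

A per-stage breakdown like this makes it easier to answer the feedback questions in the exercise, for example whether a stage relevant to your repository is missing from the model.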

Page 15: Keepit Course 5: Revision


Exercise

• Excel model
• The Content Profile
• Refining the calculations

Feedback

• Do you feel that this approach is sound?
• Have we included all relevant factors?
• Is the model suitable for the kind of content your repository deals with?
• Are we making correct assumptions, and is it clear what these are?
• How could we improve it?

Page 16: Keepit Course 5: Revision

Tools Module 3

• Significant characteristics, Stephen Grace and Gareth Knight, King’s College London

• PREMIS, Open Provenance Model

Page 17: Keepit Course 5: Revision

Preservation workflow

Check
• Format identification, versioning
• File validation
• Virus check
• Bit checking and checksum calculation
Tools, e.g. DROID, JHOVE, FITS

Analyse
• Preservation planning
• Characterisation: significant properties and technical characteristics, provenance, format, risk factors
• Risk analysis
Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois), KB

Action
• Migration
• Emulation
• Storage selection
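
The "bit checking and checksum calculation" step of the workflow can be sketched with the Python standard library. This is a minimal illustration, not how any of the tools named on the slide implement it; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum, algorithm="sha256"):
    """Bit check: compare the current checksum with one recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum
```

A repository would record the checksum at deposit and re-run `is_unchanged` periodically; a mismatch signals that the stored bits have changed and the object needs attention.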

Page 18: Keepit Course 5: Revision

A group task on format risks

1. Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)
2. By working through the (surviving) list of format risks, select a winner (or a draw) between your chosen formats for each risk category (1 point for a win)
3. Total the scores to find an overall winning format
4. Suggest one reason why the winning format using this method may not be the one you would choose for your repository
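
The scoring rule in steps 2 and 3 is simple enough to write down directly. In the sketch below the risk categories and the per-category judgements are invented placeholders, not the list used in the course and not an assessment of the formats themselves.

```python
def compare_formats(format_a, format_b, winners):
    """Score two formats: 1 point per risk category won, nothing for a draw.

    `winners` maps each risk category to the winning format name, or None for a draw.
    Returns (scores, overall_winner), where overall_winner is None if the totals tie.
    """
    scores = {format_a: 0, format_b: 0}
    for category, winner in winners.items():
        if winner is None:
            continue  # a draw scores no points
        if winner not in scores:
            raise ValueError(f"{winner!r} is not one of the compared formats ({category})")
        scores[winner] += 1
    a, b = scores[format_a], scores[format_b]
    overall = format_a if a > b else format_b if b > a else None
    return scores, overall


# Hypothetical judgements for illustration only.
winners = {
    "Risk category 1": "TIFF",
    "Risk category 2": "JPEG",
    "Risk category 3": None,
    "Risk category 4": "TIFF",
}
```

Every category carries the same weight here, which is one reason (step 4) the overall winner may not be the format you would actually choose: a single risk that matters a great deal to your repository counts no more than a minor one.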

Page 19: Keepit Course 5: Revision


Determine expected behaviours

• What activities would a user – any type of stakeholder – perform when using an email?

• Draw upon list of property descriptions performed in the previous step, formal standards and specifications, or other information sources.

Task 2: Identify the type of actions that a user would be able to perform using the email (Groups. 15 mins).

• E.g. Establish name of person who sent email

• E.g. May want to confirm that email originated from stated source.

Workflow steps: Select object type for analysis; Analyse structure; Identify purpose of technical properties; Determine expected behaviours; Classify behaviours into functions; Associate structure with each function; Review & finalise

[Behaviour / Structure diagram. Structural properties listed: Recipient local-part, Recipient domain-part, Recipient display-name, Sender local-part, Sender domain-part, Sender display-name, Trace-route, Message-id, references, In-reply-to, subject, Message text, Paragraph, Line break, underline, strikethrough, Body text colour, Body background]
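
Several of the structural properties named above (display-name, local-part, domain-part, Message-id, In-reply-to, subject, message text) can be pulled out of a plain-text email with the Python standard library, which helps when deciding which properties support which behaviour. This is only a sketch: the sample message is invented, the dictionary keys are our own labels, and formatting properties such as text colour would need an HTML body that is not handled here.

```python
from email import message_from_string
from email.utils import parseaddr

# Invented sample message for illustration.
raw = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.com>
Subject: Meeting notes
Message-ID: <1234@example.org>
In-Reply-To: <1200@example.com>

Please find the notes attached.
"""


def address_parts(header_value):
    """Split an address header into display-name, local-part and domain-part."""
    display_name, address = parseaddr(header_value)
    local_part, _, domain_part = address.rpartition("@")
    return {"display-name": display_name,
            "local-part": local_part,
            "domain-part": domain_part}


def structural_properties(raw_message):
    """Extract a few structural properties from a single-part text email."""
    msg = message_from_string(raw_message)
    return {
        "sender": address_parts(msg.get("From", "")),
        "recipient": address_parts(msg.get("To", "")),
        "subject": msg.get("Subject"),
        "message-id": msg.get("Message-ID"),
        "in-reply-to": msg.get("In-Reply-To"),
        "message text": msg.get_payload(),
    }
```

The behaviour "establish name of person who sent email" from Task 2 depends on the sender display-name, so that property would be a candidate for recording as significant.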

Page 20: Keepit Course 5: Revision


Exercise overview

• Analyse the content of an email
– Analyse structure of email message
– Determine purpose that each technical property performs

• Consider how email will be used by stakeholders
– Identify set of expected behaviours
– Classify set of behaviours into functions for recording

Page 21: Keepit Course 5: Revision


Page 22: Keepit Course 5: Revision


JHOVE Demo

Page 23: Keepit Course 5: Revision

Define Sample Objects

Page 24: Keepit Course 5: Revision

Some revision from KeepIt Module 3

• Preservation workflow
– Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.

• Significant properties
– We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and classifying the functions of formatted emails
– We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder – e.g. creator, user, archivist

• Documentation
– We looked at two means to document these characteristics, and the changes over time
1. Broad and established (PREMIS)
2. Focussed, and work-in-progress (Open Provenance Model)

• Provenance in action: transmission and recording
– Through a simple game we learned that if we don’t recognise the necessary properties at the outset, and maintain a record through all stages of transmission, the information at the end of the chain will likely not be the same as you started with

Page 25: Keepit Course 5: Revision

Tools Module 4

• Eprints preservation apps, including the storage controller, Dave Tarrant and Adam Field, University of Southampton

• Plato, preservation planning tool from the Planets project, Andreas Rauber and Hannes Kulovits, TU Wien

Page 26: Keepit Course 5: Revision

Hybrid Storage Policies

Page 27: Keepit Course 5: Revision

EPrints Storage Manager

Page 28: Keepit Course 5: Revision

Preservation - Analyse EPrints File Classification + Risk Analysis

Risk Analysis In EPrints

Page 29: Keepit Course 5: Revision

Preservation - Action Mock up Transformation Interface

Transformation?

Tool Preservation Level

PPT -> PPTX

PPT -> PDF

Migration Tools

Risk Analysis In EPrints Migration?

Page 30: Keepit Course 5: Revision

Viewing high-risk objects

Page 31: Keepit Course 5: Revision

Exercise: EPrints
Adding ‘at risk’ image collection

Page 32: Keepit Course 5: Revision

Preservation Planning

Page 33: Keepit Course 5: Revision

Plato

• Assists in analyzing the collection
– Profiling, analysis of sample objects via Pronom and other services

• Allows creation of objective tree
– Within application or via import of mindmaps

• Allows the selection of Preservation action tools

Preservation Planning with Plato

Page 34: Keepit Course 5: Revision

Plato

• Runs experiments and documents results
• Allows definition of transformation rules, weightings
• Performs evaluation, sensitivity analysis
• Provides recommendation (ranks solutions)

Preservation Planning with Plato
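
The idea of weighting criteria and ranking candidate solutions can be illustrated with a plain weighted sum. This sketch does not claim to reproduce Plato's own evaluation; the criteria, weights, candidate actions and scores (on an assumed 0 to 5 scale) are all hypothetical.

```python
def rank_alternatives(weights, scores):
    """Rank alternatives by weighted sum of their per-criterion scores.

    weights: criterion -> weight, expected to sum to 1
    scores:  alternative -> {criterion -> score on an assumed 0-5 scale}
    Returns a list of (alternative, total) pairs, best first.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    totals = {
        alt: sum(weights[c] * per_criterion[c] for c in weights)
        for alt, per_criterion in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria, weights and scores for illustration only.
weights = {"content preserved": 0.5, "file size": 0.2, "fits infrastructure": 0.3}
scores = {
    "Migrate to format A": {"content preserved": 5, "file size": 4, "fits infrastructure": 4},
    "Migrate to format B": {"content preserved": 5, "file size": 2, "fits infrastructure": 4},
    "Keep as is": {"content preserved": 5, "file size": 5, "fits infrastructure": 1},
}
```

Re-running the ranking with slightly different weights is a crude form of sensitivity analysis: if the top-ranked solution changes easily, the recommendation is not robust.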

Page 35: Keepit Course 5: Revision

Exercise Time! The Scenario

• National library
• Scanned yearbooks archive
• GIF images
• The purpose of this plan is to find a strategy on how to preserve this collection for the future, i.e. choose a tool to handle our collection with.
• The tool must be compatible with our existing hardware and software infrastructure, to install it within our server and network environment.
• The files haven't been touched for several years now and no detailed description exists. However, we have to ensure their accessibility for the next years.
• Re-scanning is not an option because of costs, and some pages from the original newspapers do not exist anymore.

Page 36: Keepit Course 5: Revision

Exercise: EPrints
Adding ‘at risk’ image collection

Page 37: Keepit Course 5: Revision

Exercise: Plato-EPrints
Plan-migrate-review

Page 38: Keepit Course 5: Revision

Tools Module 5

• TRAC, Trusted Repository Audit and Certification: criteria and checklist

• DRAMBORA, Digital Repository Audit Method Based On Risk Assessment, Martin Donnelly, Digital Curation Centre, University of Edinburgh

Page 39: Keepit Course 5: Revision


DRAMBORA and DAF, EDINA, 27th October 2009

www.data-audit.eu www.repositoryaudit.eu

Trustworthy Repositories Audit & Certification (TRAC) Criteria and Checklist

• RLG/NARA assembled an International Task Force to address the issue of repository certification

• TRAC is a set of criteria applicable to a range of digital repositories and archives, from academic institutional preservation repositories to large data archives and from national libraries to third-party digital archiving services

• Provides tools for the audit, assessment, and potential certification of digital repositories

• Establishes audit documentation requirements

• Delineates a process for certification

• Establishes appropriate methodologies for determining the soundness and sustainability of digital repositories

Page 40: Keepit Course 5: Revision

TRAC Criteria Checklist

• Within TRAC, there are 84 individual criteria

Only 82 criteria to go!

Page 41: Keepit Course 5: Revision

To certify or not to certify? That is the question

1. Take a spreadsheet with all 84 TRAC criteria.
2. Select one.
3. Decide whether you could certify your repository for this, based on where your repository is now or where you think it might be after participating in this course.

Image credits: Cayusa, fabiux

Page 42: Keepit Course 5: Revision


KeepIt #5: University of Northampton, 30 March 2010

www.repositoryaudit.eu

DRAMBORA Method

• Discrete phases of (self-)assessment, reflecting the realities of audit
• Preservation is fundamentally a risk management process:
– Define Scope
– Document Context and Classifiers
– Formalise Organisation
– Identify and Assess Risks
• Builds audit into internal repository management procedures

Page 43: Keepit Course 5: Revision


Repository Administration

Page 44: Keepit Course 5: Revision


Part I – Identify a risk (30 minutes)

Each group should identify one risk (based on your own experiences wherever possible), and complete the DRAMBORA worksheet.

Groups should complete:
• name and description of the risk;
• example manifestations of the risk;
• nature of the risk;
• risk owner(s);
• stakeholders who would be affected;
• if possible, relationships with other risks.

Page 45: Keepit Course 5: Revision


Part II – Mitigate the risk (30 minutes)

Now identify what steps your archive might take to manage and mitigate the identified risk over time…

Each group should complete:

• Risk management strategy/-ies;
• Risk management activities;
• Risk management activity owner(s).
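
The fields collected in Parts I and II of the exercise can be held together in one record per risk. The sketch below is a hypothetical structure built from the field names on the slides; it is not the DRAMBORA worksheet or tool format, and the example risk is invented.

```python
from dataclasses import dataclass, field


@dataclass
class RiskEntry:
    # Part I: identify the risk
    name: str
    description: str
    manifestations: list
    nature: str
    owners: list
    stakeholders: list
    related_risks: list = field(default_factory=list)  # optional on the worksheet
    # Part II: mitigate the risk
    strategies: list = field(default_factory=list)
    activities: list = field(default_factory=list)
    activity_owners: list = field(default_factory=list)

    def mitigation_complete(self):
        """True once a strategy, an activity and an activity owner are all recorded."""
        return bool(self.strategies and self.activities and self.activity_owners)


# Invented example for illustration only.
risk = RiskEntry(
    name="Loss of key staff",
    description="The only person who understands the ingest scripts leaves.",
    manifestations=["Deposits stall after a resignation"],
    nature="Personnel",
    owners=["Repository manager"],
    stakeholders=["Depositors", "Library management"],
)
```

Recording the Part II fields on the same entry keeps each mitigation activity tied to the risk it addresses and to a named owner.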