Automation in Digital Preservation: Three Scenarios
Milena Dobreva 1, Yunhyong Kim 2, Gillian Oliver 3, Seamus Ross 2, Raivo Ruusalepp 4
1 Centre for Digital Library Research
2 Digital Curation Centre (DCC) & Humanities Advanced Technology and Information Institute (HATII), University of Glasgow, UK
3 School of Information Management, Victoria University of Wellington, Wellington, New Zealand
4 Estonian Business Archives Consultancy, Algi 29, Tallinn 10620, Estonia
UK e-Science ALL HANDS meeting: 10 September 2008
Talk overview
1 Automation in digital preservation
2 Three case studies
3 Findings
What comes next?
The Preservation Landscape
“… if we do not actively pursue the preservation of digital material now, we risk having a gap in our intellectual record. … If you allow me another historical reference, we do not want to experience the digital equivalent of the destruction of the Alexandria Library. Scientific assets are just too valuable to be put at risk”.
Ms. Viviane Reding, EU Commissioner on IS & Media
“The digital preservation community’s inability to bring firm evidence to bear in support of its contentions about data loss, coupled with the alarmist rhetoric of terms such as digital dark ages and digital black hole, leave us exposed.”
Prof. Ross Harvey, Charles Sturt University, NSW, Australia
1 Automation in digital preservation
• Digital preservation
– part of information management
– highly interdisciplinary
• What does ‘interoperability into the future’ actually mean?
– preservation of the bit streams
– preservation of semantics
• A coherent theory of preservation is still not developed
– need justified in 2001
– currently the CASPAR and SHAMAN projects are working in this direction
Typical preservation issues…
• Failure of any component of the technological chain
– Hardware, software or support environment change
– Outcomes of a project which is not sustained
• Problems with “the bits”
– This could happen because of a storage device or medium failure, or if a DNS entry is no longer resolvable
• Changes in the Knowledge Base
– Loss of understanding or usability
• Lost data on provenance or authenticity (requirements of trusted repositories)
– Record of who did what and how they did it
Strategies: migration, emulation, digital archaeology, …
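One common way to detect problems with “the bits” is a fixity check: record a checksum when an object is ingested and recompute it later. The slides do not name a mechanism, so the following is only a minimal sketch under that assumption; the function names, the choice of SHA-256 and the demo file are illustrative, not part of the talk.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


def demo():
    """Ingest a hypothetical object, then simulate bit-level damage."""
    with tempfile.TemporaryDirectory() as tmp:
        obj = Path(tmp) / "object.bin"
        obj.write_bytes(b"preserved content")
        recorded = sha256_of(obj)              # digest stored at ingest
        before = fixity_ok(obj, recorded)      # unchanged file
        obj.write_bytes(b"preserved c0ntent")  # simulated corruption
        after = fixity_ok(obj, recorded)       # damaged file
    return before, after
```

A check like this only covers preservation of the bit stream; it says nothing about whether the semantics of the object remain understandable.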
DigitalPreservationEurope: Research Roadmap (2007): 9 themes
1. Restoration
2. Conservation
3. Collection and Repository Management
4. Preservation as Risk Management
5. Preserving the Interpretability and Functionality of Digital Objects
6. Collection Cohesion and Interoperability
7. Automation in Preservation
8. Preserving the Context
9. Storage Technologies and Methods
Functional Model: Open Archival Information System (OAIS)
2 Three case studies
• Build upon project experiences of HATII at the University of Glasgow and partners
– Appraisal
– Metadata extraction based on genre classification
– Risk assessment and audit
Appraisal
Approach: developed sets of appraisal criteria; work on automation of their evaluation continues.
Metadata Extraction Based on Genre Classification
Risk Assessment and Audit
DRAMBORA (Digital Repository Audit Method Based on Risk Assessment)
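To illustrate what a risk-based audit might rank, the sketch below scores each risk as probability times impact and sorts by the result. This scoring rule, the 1 to 5 scales, the class and function names, and the example scores are all assumptions made for illustration; the slide names DRAMBORA but does not describe its scoring, so this should not be read as the DRAMBORA method itself. The risk names are borrowed from the typical preservation issues listed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A hypothetical risk-register entry (scales are assumed, 1 to 5)."""
    name: str
    probability: int  # 1 = rare, 5 = almost certain (assumed scale)
    impact: int       # 1 = negligible, 5 = severe (assumed scale)

    @property
    def severity(self):
        # Assumed scoring rule: severity = probability * impact.
        return self.probability * self.impact


def rank_risks(risks):
    """Return the risks ordered from highest to lowest severity."""
    return sorted(risks, key=lambda risk: risk.severity, reverse=True)


# Example scores are invented purely for illustration.
RISKS = [
    Risk("Storage medium failure", probability=3, impact=5),
    Risk("Loss of provenance records", probability=2, impact=4),
    Risk("Software environment change", probability=4, impact=3),
]

if __name__ == "__main__":
    for risk in rank_risks(RISKS):
        print(f"{risk.severity:>2}  {risk.name}")
```

Keeping the register as plain data makes it easy to re-score risks at each audit cycle and compare the rankings over time.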
The Three Case Studies as OAIS Functional Entities
SWOT observations
3 Findings
• These case studies cover different functions in the sense of the OAIS model.
• For such ‘smaller’ digital preservation solutions we need to know more about the common logic.
• For bigger applications a coherent theory will be helpful.
• The degree of automation differs!
– Profiler
– Single automated model
– Hybrid model