ontario.ca/archives
Preserving Digital Records for the Long-Term: Building a Trustworthy Digital Repository at the Archives of Ontario
Association for Manitoba Archives – April 29th, 2011
Ryan Carpenter
Senior Coordinator, Archival Electronic Records
Archives of Ontario
Agenda
• Archives of Ontario – A Brief Introduction
• Digital Preservation Challenge
• Digital Preservation at the Archives of Ontario
• Trustworthy Digital Repository (TDR)
– What is it
– Why do we need it
– What has been done
– What is being done
– What’s next
• TDR & ECM
• Digital Preservation Collaboration
Archives of Ontario: A Brief Introduction
• The Archives was established in 1903.
• Provides leadership to collect, manage and preserve the records of Ontario and to promote and facilitate their use by present and future generations.
• Recently became part of the Information, Privacy and Archives Division of the Corporate Chief Information Office.
• The Archives is made up of three integrated program delivery areas:
– Collections Development and Management
– Customer Service and Outreach
– Recordkeeping Support
Digital Preservation Challenge
The Digital Environment
• Digital records encompass email, audiovisual recordings, textual documents, websites, images, etc.
• Digital records are pervasive in all aspects of our personal and working life.
• The creation of digital information is exploding at an exponential rate.
• Some similarities but many differences between digital and analog records.
The Digital Environment – Government
• Ontario Public Service (OPS) digital records experience mirrors what is happening in other jurisdictions.
• Currently, 98% of new information created in the OPS is in digital format only.
• The implementation of the Enterprise Content Management (ECM) system will shift government recordkeeping from paper to electronic media across the OPS; the electronic form of the record, rather than the paper record, will be considered authoritative.
• The complexity of long-term digital preservation, coupled with the explosive growth of archival digital records in the next few years, presents the Archives with a critical challenge: the volume of potentially archival digital records across the OPS is roughly estimated to be 100 terabytes by 2013.
• Under the Archives and Recordkeeping Act, 2006, the Archives is mandated to preserve and make available archival electronic records for as long as required.
Long-term Digital Preservation - Volume Impact to the Archives
[Chart: Volume of Potential Archival Electronic Records in OPS, in terabytes (axis 0 to 120), for 2007, 2008, 2011 and 2012]
[Chart: Volume of Electronic Information in OPS, in terabytes (axis 0 to 2500): 2007: 600; 2008: 780; 2011: 1713; 2012: 2226]
Volume
• Approximately 85 TB of electronic information created in OPS in 2011 is of archival value and will potentially have to be transferred to the AO eventually. (Literature suggests that 3-5% of government records (paper records) are archival.)
• The current total volume of digital records collections in the Archives is 5.5 TB.
• The average annual volume increase rate is approximately 400% (1998-2010)
• With future ECM implementations in the OPS, there will be more rigour in transferring archival electronic records to the Archives.
• The OPS is managing about 1.7 PB of electronic information in 2011. (Source: Managing Information Assets in the OPS: The Future is Now)
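The 85 TB estimate is consistent with applying the upper end of the 3-5% archival share to the roughly 1.7 PB (1713 TB in the chart data) managed in 2011. The arithmetic can be sketched as follows; the function name is illustrative only, and the sketch assumes the share reported for paper records carries over to electronic records.

```python
def archival_volume_range_tb(total_tb, low_share=0.03, high_share=0.05):
    """Return a (low, high) estimate, in TB, of the archival portion of a volume.

    The 3% and 5% defaults are the shares the slide quotes for paper records;
    applying them to electronic records is an assumption.
    """
    return total_tb * low_share, total_tb * high_share


# 1713 TB is the 2011 volume of electronic information in the OPS.
low_tb, high_tb = archival_volume_range_tb(1713)
print(f"Estimated archival volume: {low_tb:.0f} to {high_tb:.0f} TB")
```

The upper bound lands close to the 85 TB quoted on the slide, which suggests the estimate was made at the 5% end of the range.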
What is Digital Preservation?
• Digital Preservation is the management of digital information to ensure it is accessible and understandable over time.
OR
• Digital Preservation encompasses a broad range of activities designed to extend the usable life of digital files and protect them from media failure, physical loss, and obsolescence.
• However, it is one thing to preserve a bitstream, but quite another to preserve the content, form, style, appearance, and functionality.
Digital Preservation Threats
• File Format and Software Obsolescence
• Hardware and Media Obsolescence
• Physical Threats
Digital Preservation Strategies
Basic
• Bitstream Copying (backups)
• Refreshing
• Durable/Persistent Media (e.g. Gold CDs)
• Analog Backups (e.g. microfilm)
Expensive – Not Feasible
• Technology Preservation (‘computer museum’)
• Digital Archaeology (data recovery)
Preferred Approaches
• Migration (most preferred approach currently)
• Normalization (reliance on standard format – PDF/A)
• Emulation (e.g. Universal Virtual Computer)
• Encapsulation (‘wrapping’)
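Bitstream copying and refreshing only protect a record if each new copy is verified against the original. The sketch below shows one common way to do that, comparing SHA-256 checksums before and after the copy. It is an illustration, not the Archives' procedure; the function and file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source, target):
    """Copy source to target and confirm the copy is bit-for-bit identical."""
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise OSError(f"Fixity mismatch copying {source} to {target}")
    return after


# Demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "record.txt"
    original.write_bytes(b"archival record")
    refreshed = Path(workdir) / "record_copy.txt"
    checksum = copy_with_fixity(original, refreshed)
    copies_match = refreshed.read_bytes() == original.read_bytes()
```

A check like this confirms only that the bitstream survived. As the earlier slide notes, preserving content, form and functionality needs the preferred approaches such as migration or normalization.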
Digital Preservation Standards - ISO
• ISO 14721:2003 - Open Archival Information System (OAIS) - Reference model
• Metrics for Digital Repository Audit and Certification RED BOOK, CCSDS. Oct 2009
• ISO/TR 18492:2005 - Long-term preservation of electronic document-based information
• ISO 19005-1:2005 - Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1)
• ISO 15801 - Electronic imaging - Information stored electronically - Recommendations for trustworthiness and reliability
Digital Preservation at the Archives of Ontario
Existing Archival Digital Records Program
• Program has existed since 1997.
• Program is focused on the long-term preservation of archival digital records.
• 2 full-time employees – Senior Coordinators, Archival Electronic Records.
• Created Electronic Records Online section of AO website in 2009.
Existing Archival Digital Repository
• Existing digital repository is on a virtual server maintained by Infrastructure Technology Services (ITS).
• Current digital holdings are about 5.5 TB, consisting of some 1.5 TB of archival born-digital records and 4 TB of digitized images (mostly VS records).
• These digital records are in various formats: MS Office documents, e-mails, HTML, digital audio and video files, databases, digital images, websites, etc.
• Existing repository is not adequate to meet future operational requirements as it offers little functionality to preserve and secure the digital records properly or make them accessible online.
Transfer of Digital Records
• The Archives of Ontario currently acquires archival digital records from Ontario public bodies and private donors.
• The Guideline for Transferring Electronic Records to the Archives of Ontario was revised in September 2009.
• The guideline assists with the transfer of archival digital records to the Archives in accordance with an approved records series that has a final disposition of ‘Transfer to Archives’.
• This guideline applies to all Ontario government public bodies that are subject to the requirements of the Archives and Recordkeeping Act, 2006.
Transfer of Digital Records – Cont’d
• Originating public bodies are responsible for ensuring that all digital records in their custody remain readable, accessible, secure, free of viruses, and are able to satisfy legal and evidentiary requirements throughout their lifecycle.
• Digital records are to be transferred in a software independent format whenever possible, or in a format the Archives finds acceptable.
• In general, the Archives will not acquire specialized software applications and their ongoing licenses.
Transfer of Digital Records – Cont’d
• Transfer Procedures
– Consult with Archives
– Identify Records for Transfer
– Complete a Test Transfer
– Transfer Official Records and Documentation
– Confirm Receipt of Records Transfer
Trustworthy Digital Repository
(TDR)
Trustworthy Digital Repository (TDR) – What is it?
Definition:
‘a mission to provide reliable, long-term access to managed digital resources to its Designated Community, now and into the future’
Taken from ‘Audit and Certification of Trustworthy Digital Repositories’ - October 2009
TDR - What is it - Cont’d
• A TDR is a long-term solution for the preservation of digital records of archival value.
• It will be driven by the Archives’ business requirements and will be modelled on ISO standards and other best practices.
TDR - What is it - Key Components
Staff
TDR will be modelled on ISO standards – OAIS Reference Model, and Audit and Certification of Trustworthy Digital Repository.
The Archives’ TDR will be certified once an international/national certification process is developed.
TDR – What is it - OAIS Reference Model
TDR - Why do we need it?
• Ensures the Archives meets its mandated statutory obligations as per the Archives and Recordkeeping Act, 2006.
• Meets the priority for long-term digital preservation as identified in Ontario’s Five Year Corporate I&IT Plan (2008-2013).
• Meets the government’s priority of strengthening front-line service delivery by greatly improving services to the public at the Archives. TDR will provide ‘anytime, anywhere’ remote 24/7 online access to archival digital records.
TDR - Why do we need it? Cont’d
• To preserve any type of electronic record,
• Created using any type of application,
• On any computing platform,
• Delivered on any digital media,
• From any public body in the Ontario Government and any private donor,
• To provide discovery and delivery to anyone with an interest and legal right of access,
• For present and future generations …
Revised from: http://www.archives.gov/era (U.S. National Archives and Records Administration, Electronic Records Archives)
TDR - What has been done?
• Full Business Case
– Main recommendation: Acquire a Modifiable Off-the-Shelf (MOTS) solution or a Commercial Off-the-Shelf (COTS) solution
• Request for Information (RFI) for a trusted digital repository solution
– Identified 5 vendors with viable long-term digital preservation repository solutions
• High-level Functional Requirement Analysis for the future trustworthy digital repository
– For main entities and functions of digital repository
• IT Governance Process
– Gate 0 approval and Gate 1 GGRC endorsement
TDR – What has been done - Full Business Case
• Main recommendation: Acquire a Modifiable Off-the-Shelf (MOTS) solution or a Commercial Off-the-Shelf (COTS) solution
• Other options which have been analyzed for the development of a TDR are:
– Utilize an integrated open source software (OSS) solution
– Acquire a commercial custom system
– Develop a digital preservation system in-house
– Rely on OPS public bodies to preserve archival digital records
TDR – What has been done - Request for Information
• The RFI has been well received by potential vendors with none finding difficulty with the concepts and constructs (such as OAIS Reference Model and TDR etc.) contained in the RFI document. A wealth of valuable information was received from the 7 respondents.
• All 5 TDR-focused submissions meet or exceed the basic requirements for a TDR as outlined in the RFI and demonstrate the availability of modifiable off-the-shelf (MOTS) products on the digital repository market.
• The estimated cost of purchasing and implementing such a solution (including software, hardware, customization, integration, and implementation, etc.) varies from $400,000 to $2,000,000.
• The adoption of Open Source Software (OSS) applications seems inevitable. Among the 5 TDR-focused submissions, 3 solutions include OSS components, while the other 2 are made up entirely of OSS applications.
• The OAIS Reference Model and the other TDR-related standards and best practices are widely accepted and followed by the solution providers.
• Using any of the proposed solutions will not, on its own, guarantee the TDR’s compliance with the OAIS Reference Model and Trustworthy Repositories Audit & Certification.
TDR – What has been done - High-level Functional Requirement Analysis
35 Use Cases were developed for main Entities and Functions of a TDR:
– Ingest (7)
– Archival Storage (8)
– Data Management (4)
– Access (4)
– Administration (7)
– Preservation Planning (5)
Ingest (Entity): comparison of OAIS and TDR functions
OAIS (Function) | TDR (Function) | Comparison
(none) | Manage Transfer Agreement | Move Transfer Agreement Management from Administration to Ingest
Receive Submission | Receive SIP Submission | (none)
Quality Assurance | Perform SIP Quality Assurance | (none)
Generate AIP | Generate AIP | (none)
Generate Descriptive Information | Extract Descriptive Metadata | (none)
Coordinate Updates | (none) | Delete Coordinate Updates, and incorporate the functionalities into Generate AIP and Extract Descriptive Metadata under Ingest
(none) | Notify Transfer Result | Add Notify Transfer Result
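The TDR Ingest functions named above (Receive SIP Submission, Perform SIP Quality Assurance, Generate AIP, Extract Descriptive Metadata, Notify Transfer Result) can be read as a simple pipeline. The sketch below is a toy illustration of that flow, not the Archives' design. The `SIP` and `AIP` classes, their fields, and the checksum-based quality check are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Hypothetical Submission Information Package."""
    transfer_id: str
    files: dict      # file name -> content bytes
    manifest: dict   # file name -> expected SHA-256 hex digest


@dataclass
class AIP:
    """Hypothetical Archival Information Package."""
    transfer_id: str
    files: dict
    checksums: dict
    descriptive_metadata: dict = field(default_factory=dict)


def perform_sip_quality_assurance(sip):
    """Return names of files that are missing or fail the manifest checksum."""
    problems = []
    for name, expected in sip.manifest.items():
        data = sip.files.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            problems.append(name)
    return problems


def extract_descriptive_metadata(sip):
    """Derive minimal descriptive metadata from the submission itself."""
    return {
        "transfer_id": sip.transfer_id,
        "file_count": len(sip.files),
        "total_bytes": sum(len(data) for data in sip.files.values()),
    }


def generate_aip(sip):
    """Package the files with fresh checksums and descriptive metadata."""
    checksums = {n: hashlib.sha256(d).hexdigest() for n, d in sip.files.items()}
    return AIP(sip.transfer_id, dict(sip.files), checksums,
               extract_descriptive_metadata(sip))


def ingest(sip):
    """Run QA, then return (AIP or None, transfer result notice)."""
    problems = perform_sip_quality_assurance(sip)
    if problems:
        return None, f"Transfer {sip.transfer_id} rejected: {', '.join(sorted(problems))}"
    return generate_aip(sip), f"Transfer {sip.transfer_id} accepted"


payload = b"minutes of meeting"
expected = hashlib.sha256(payload).hexdigest()
aip, notice = ingest(SIP("T-001", {"minutes.txt": payload}, {"minutes.txt": expected}))
rejected, rejection = ingest(SIP("T-002", {"minutes.txt": b"corrupted"}, {"minutes.txt": expected}))
```

Returning the notice alongside the result mirrors the added Notify Transfer Result function: the producer hears back whether the transfer was accepted or rejected.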
TDR - What has been done - High-level Functional Requirement Analysis cont’d
Use Case Template
TDR - What has been done - High-level Functional Requirement Analysis cont’d
[Diagram: RADR – Potential Integration with other IT applications, Friday, October 15, 2010. Producers (EIM system) submit SIPs to the TDR and Consumers receive DIPs; the TDR entities shown are Ingest, Archival Storage, Data Management, Access, Administration and Preservation Planning; numbered links 1 to 7 connect the TDR to the systems described in the notes, with link labels including ‘Search results’ and ‘Metadata maintenance’.]
Notes:
1. TDR (Ingest) interfaces to Producers’ systems, especially their EIM Open Text System, for the transfer of SIPs.
2. TDR (Ingest) interfaces to Digitization projects in coordinating transfers of digitized images.
3. TDR (Ingest) interfaces to the CTS in coordinating transfers of mixed physical/digital records. Functionality might be very limited at the early stage of RADR implementation.
4. TDR (Ingest) interfaces to the Series Management Database to collect records schedule information. Functionality might be very limited at the early stage of RADR implementation.
5. TDR (Ingest) integrates with the ADD to cooperate on metadata capture and describing digital records.
6. TDR (Data Management) interfaces to the ADD for proper storage and maintenance of metadata, especially duplicate descriptive metadata.
7. TDR (Access) interfaces to the AO Federated Search Engine and Customer Service System (CSS) to assist users’ searching and ordering activities. The TDR doesn’t interact with users directly; however, it is responsible for preparing query results, reports and DIPs for the Search Engine and/or CSS to deliver.
TDR - What has been done - High-level Functional Requirement Analysis cont’d
Reengineering the digital records management process is one of the biggest challenges we are facing. We mapped the archival process into OAIS Entities and Functions.
TDR - What has been done - High-level Functional Requirement Analysis cont’d
[Diagram: Recommended Structure of Policies and Procedures, in three tiers: The Archives Fundamental Digital Preservation Policies & Procedures; TDR Overall Policies & Procedures; TDR Entity-specific Policies & Procedures (Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning)]
Documents shown in the diagram:
• Digital Preservation Policy
• Digital Preservation Strategic Plan
• Digital Collection Policy
• Digital Preservation Method
• TDR Mission Statement
• TDR Naming/Numbering Convention
• Digital Records Transfer Guideline
• Digital Records File Format Guideline
• TDR Security Policy
• TDR User Access Control
• Backup and Recovery Policy
• TDR Contingency Plan
• System Configuration Manual
• TDR AIP Packaging Standard
• TDR Media Management Guideline
• TDR AIP Migration Procedure
• TDR Database Administration Policy
• DIP Packaging Standard
• TDR Import and Export Guideline
• Technology Monitoring Guideline
• Digital Records Selection and Culling Guideline
TDR - What is being done - Open Source Software (OSS) Experiments
OSS testing: objectives
• Test functionalities of various products
• Assess the feasibility of utilizing these tools for interim use
• Validate and refine the detailed functional requirements for the TDR
• Inform revisions to the Archives’ existing digital records guidelines and associated policies
• Determine appropriate preservation tools
• Further understand our existing electronic records, identify preservation risks, and potential mitigation approaches
TDR - What is being done - Open Source Software (OSS) Experiments – Cont’d
OSS testing: tools to be tested
• Tools which validate file formats and extract technical metadata:
– DROID (created by The National Archives of the UK)
– JHOVE (created by Harvard University)
– NLNZ (created by the National Library of New Zealand)
• Tools which convert digital objects to open formats:
– XENA (created by the National Archives of Australia)
• Tools which manage the object assessment and ingest process:
– Archivematica (created by Artefactual Systems)
• Preservation testbed environment and project management software:
– Planets Comparator, Planets Testbed, Planets Plato
TDR - What is being done - Open Source Software (OSS) Experiments – Cont’d
Technical Inventory of Digital Records in the AO’s e-Repository
• Identify the file formats and the other technical features of digital records in the Archives’ holdings
• Identify records requiring immediate preservation action
• Assess preservation risks of digital records in the Archives’ holdings
• Determine priorities for future preservation operations
• Inform revisions to current procedures
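The first step of such an inventory, identifying file formats, can be illustrated with a small signature-based scan. The sketch below is not DROID or JHOVE and does not use their APIs. It matches a handful of widely documented file signatures ("magic numbers") and counts the results; the function names and sample files are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path

# A few well-known leading byte signatures; real tools use far larger registries.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container"),
]


def identify_format(path):
    """Return a format label based on the file's leading bytes."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unidentified"


def inventory(root):
    """Count files under root by identified format."""
    return Counter(identify_format(p) for p in sorted(root.rglob("*")) if p.is_file())


# Demonstration on a throwaway directory with hypothetical sample files.
with tempfile.TemporaryDirectory() as holdings:
    root = Path(holdings)
    (root / "report.pdf").write_bytes(b"%PDF-1.4\n")
    (root / "memo.docx").write_bytes(b"PK\x03\x04" + b"\x00" * 8)
    (root / "notes.txt").write_bytes(b"plain text has no signature")
    counts = inventory(root)
```

Files that come back as "unidentified" are natural candidates for closer review when setting preservation priorities.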
TDR – Next Steps?
• Work will proceed in-house on developing detailed functional requirements for the TDR.
• Explore options for the development of the TDR.
• Creation of long-term digital preservation strategy.
• Creation of long-term digital preservation policy.
TDR - Detailed Requirements – Preliminary Plan
• Deliverables
– Detailed requirement specifications for all 6 Entities (Ingest, Archival Storage, Data Management, Access, Preservation Planning and Administration) of a future TDR to be developed and validated
– Detailed workflow for the management of archival digital records, starting from receiving, selection, accessioning, through archival description, storage to search and ordering etc. to be developed and validated
• Objectives
– Provide a sound foundation for the future development and implementation of a TDR in the Archives;
– Ensure the future TDR can fit well into the overall Archives business environment, meet actual business requirements, work smoothly with the other IT applications already in place, and
– Follow related ISO standards and digital preservation/TDR best practices.
TDR - Detailed Requirements - Reference Materials
TDR - Detailed Requirements - Reference Materials cont’d
TDR - Detailed Requirements - Methodology
TDR & ECM
Linkages with ECM
• Long-term digital preservation begins at the desktop, with active records.
• Proper recordkeeping during all stages of IM lifecycle will ensure that records can be properly managed in TDR.
• Preservation policy required to mitigate risks to legacy digital records.
• IT and information management areas need to partner to address challenges, incorporating recordkeeping requirements.
Linkages with ECM Cont’d
• Elements of a TDR can be applied to non-archival active/semi-active records that have long-term retention requirements.
• TDR ensures the sustainability of an Enterprise Content Management (ECM) strategy by providing a trustworthy exporting channel and permanent repository for archival digital records initially managed by ECM system.
TDR vs. ECM/RDMS
TDR ≠ ECM
• Have different objectives.
• Use different standards.
• Look forward to future developments such as an integrated solution with both records management and long-term digital preservation capabilities.
TDR vs. ECM/RDMS Cont’d
Objectives
– ECM (RDMS as major component): To regain control over electronic records/information by providing system tools to capture, classify and apply retention schedules and access controls to e-records.
– Trustworthy Digital Repository: To preserve and provide access to digital records/information, free from dependence on any specific hardware and software, for as long as required.
Functions
– ECM (RDMS as major component): Capture, File plan, Retention and disposition, Access control, Document management, Workflow, Collaboration
– Trustworthy Digital Repository: Ingest, Archival Storage, Data Management, Preservation Planning, Access, and Administration
Standards/Best Practices
– ECM (RDMS as major component): ISO 15489; DOD 5015.2; MoReq; Functional Requirements for ERMS (ICA 2008); etc.
– Trustworthy Digital Repository: ISO 14721:2003: Open Archival Information System (OAIS) - Reference model; ISO 20652:2006 Producer-archive interface -- Methodology abstract standard; Trustworthy Repositories Audit & Certification: Criteria and Checklist V1.0; etc.
Suppliers
– ECM (RDMS as major component): Open Text, EMC2 (Documentum), HP (Trim), IBM (Filenet), etc.
– Trustworthy Digital Repository: Lockheed Martin, Tessella, Ex Libris, IBM, SUN, HP, Microsoft
TDR vs. ECM/RDMS Cont’d
[Diagram: records lifecycle stages Active, Semi-active and Inactive, mapped to ECM Repositories and the Archives’ TDR]
• Almost all public electronic records
• Public electronic records with long retention periods
• All archival electronic records that have fulfilled their retention periods
Transfer of archival electronic records into the Archives' Repository
Digital Preservation Collaboration:
Pan-Canadian Efforts & External/Internal Partnerships
Collaboration - Goals
• Similar to the Archives of Ontario, other archives and many areas of government are facing preservation challenges.
• Promote the awareness of long-term digital preservation.
• Bring key stakeholders together.
• Collectively share the knowledge gained from the important work being done in the Archives and across government.
National Digital Preservation Working Group (NDPWG)
• The group was established by the Archives of Ontario in August 2008. 8 meetings have been held to date.
• The mandate of the group is to provide a forum for practitioners in the field of digital preservation to share ideas and expertise, discuss best practices and lessons learned.
• The membership includes:
– Saskatchewan
– Manitoba
– Nova Scotia
– Nunavut
– Northwest Territories
– Yukon
– Alberta
– Library and Archives Canada
• The Archives of Ontario is the current chair for the NDPWG.
Canadian Preservation Cooperation Strategy
• Library and Archives Canada (LAC) visited the Archives on July 27th, 2010, to discuss a number of digital preservation projects where they could work collaboratively with the Archives.
• Subsequent to the meeting, the Archives, LAC and the Saskatchewan Archives Board agreed to develop a Canadian Preservation Cooperation Strategy on Digital Preservation that outlines the principles of the group and its proposed projects.
• Meetings have been held to develop work plans and other planning documents.
• Canadian Preservation Cooperation Strategy was presented at National, Provincial and Territorial Archivists Conference (NPTAC) on Friday 22 October 2010.
• First joint project is Canadian Registry of Digital Storage Media – final draft completed.
Canadian TDR Network
• Initiative started by LAC and the University of Alberta in March 2010.
• Emerged out of the process that built the Canadian Digital Information Strategy.
• Idea is to start with a small group of pioneering institutions that will begin a process of understanding and articulating the issues involved with building a TDR network.
• The short-term goal is to create a coalition from which the group can begin to build its preservation capacity.
• Kick-off meeting held November 26th at LAC.
• Development of a strategy and vision document is underway (by LAC, University of Alberta Library, Archives of Ontario, University of British Columbia Library).
Academic Partnerships - iSchool
• The Archives has partnered with the Faculty of Information (iSchool) at the University of Toronto on a number of digital preservation activities:
– Attended Digital Preservation Reading Course led by Dean Seamus Ross from February – April 2010.
– Hosted practicum (internship) for iSchool student Suzanne Leblanc from May-August 2010. She completed a survey and report on digital preservation file formats for digital video.
– Attended iSchool hosted Digital Curation Matters conference June 16-17 2010.
– Have explored possibility of employing PhD students and jointly applying for grant funding for preservation research projects.
International Liaisons
• Have had numerous interactions with international digital preservation jurisdictions.
– Hosted delegations from international archives including:
• Hefei City, Anhui Province, China – April 17, 2009
• National Archives of Japan - March 19, 2010
• Malaysia National Archives – April 30, 2010
– Ongoing information sharing with colleagues in the USA, UK, Australia and New Zealand.
Plans for Ontario Government
• Creation of Digital Preservation Collaboration Committee
• Launching a Digital Preservation OPSpedia (internal social networking) site
• Setting up digital preservation web presence on the Archives’ inter/intranet
Thank You!
Questions?
Contact Information
Ryan Carpenter
Senior Coordinator, Archival Electronic Records
Archives of Ontario
[email protected], 416-327-8174
Lijuan Yu
Senior Coordinator, Archival Electronic Records
Archives of Ontario
[email protected], 416-327-1588