1
South Carolina Information Technology Directors Association
September 8, 2008
Bill Henry, Matt Guzzi
SC Department of Archives and History
2
2007 NHPRC grant proposal not funded
AZ Archives submitted multi-state grant proposal to Library of Congress
AZ proposal had same basic goals
SC too late for funding
Paid own expenses to join project
3
One-time funding from General Assembly
Digitize paper records
Capture agency website snapshots
Purchase hardware and software
Library of Congress approved additional funds for project
SC now a fully funded partner
4
Persistent Digital Archives and Library System
Multi-state grant project funded by the Library of Congress and the Institute of Museum and Library Services
Five state partners: Arizona, Florida, New York, Wisconsin, South Carolina
Project will run 18-24 months; if successful, SCDAH intends to continue participation beyond this period
At the end of the project each partner will have a functioning digital archives system
5
An increasing number of long-term and archival records are created and maintained only in digital formats
Traditional archival practices designed for paper records won’t work in a digital environment
Need ability to preserve electronic records so that we can demonstrate authenticity and protect integrity
PeDALS is both a learning opportunity and a chance to implement a functioning system
6
To develop a curatorial rationale that can be implemented in software to support an automated, integrated workflow to process collections of digital records
To build “digital stacks” – storage that has appropriate controls for preservation and disaster preparedness
7
Appraisal
Acquisition
Arrangement and description
Housing and storage
Reference and access
Preservation
8
Transformation of traditional, paper-based practices into the digital arena
Focus on the rules, not the records
Automate the rules
9
More than storing the data (CD, tape, disk)
LOCKSS
1. Automatic integrity checking and error detection
2. Secure
3. Geographically distributed
10
To build a community of shared practice that meets the needs of a wide range of repositories
- For best practices
- For resource sharing
To remove barriers by keeping costs as low as possible
11
OAIS is an international (ISO) standard
Defines minimal set of responsibilities for long-term preservation
Can be applied to any information or object that needs to be retained long-term
OAIS does not specify a specific design or implementation
http://public.ccsds.org/publications/archive/650x0b1.pdf
12
Producer
OAIS (PeDALS)
Consumer
Management
13
PeDALS (OAIS) Functional Areas
Ingest
Archival storage
Data management
Administration
Preservation planning
Access
14
PeDALS Overview - 1
Agency records in an electronic records system are transferred via the Internet to the PeDALS system
Supplemental processing checks for file integrity and completeness prior to transfer
15
PeDALS Overview - 2
Agency records with associated metadata are transferred to middleware server (Microsoft BizTalk®)
Rules-based software will transform records into format for long-term storage along with a copy for web access
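One way to picture "rules-based" processing is a lookup table that maps each incoming file type to a preservation format and a web access format. The sketch below is only an illustration in Python, not the BizTalk configuration used by the project; the `Rule` class, the `RULES` table, the `plan_outputs` function, and every format mapping in it are hypothetical placeholders.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Rule:
    """Target formats for one kind of incoming record."""
    preservation_format: str
    access_format: str


# Hypothetical rule table: these mappings are placeholders, not PeDALS policy.
RULES = {
    ".doc": Rule(preservation_format="pdf/a", access_format="pdf"),
    ".tif": Rule(preservation_format="tiff", access_format="jpeg"),
}


def plan_outputs(filename: str, rules: dict[str, Rule] = RULES) -> dict[str, str]:
    """Look up the rule for a file and return the formats to produce."""
    suffix = PurePosixPath(filename).suffix.lower()
    rule = rules.get(suffix)
    if rule is None:
        # No rule matched: leave the decision to a person rather than guess.
        raise LookupError(f"no rule for {suffix!r}")
    return {"preservation": rule.preservation_format, "access": rule.access_format}
```

Keeping the rules in data rather than in code means an archivist could review or change a mapping without touching the processing logic, which is the sense of "focus on the rules, not the records."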
16
PeDALS Overview - 3
Records are transferred into LOCKSS servers for long-term preservation
LOCKSS is a “dark archive”
17
PeDALS Overview - 4
Public access will be provided via the web
Restricted records will be blocked from public access
18
19
Agencies will be able to log in and upload records to the South Carolina Digital Archive.
BizTalk will check the incoming records for completeness and verify that each file's hash value matches on upload.
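The completeness and hash check can be sketched in a few lines of Python. This is an illustration of the idea, not the BizTalk process itself: the slides do not name the hash algorithm or the manifest layout, so SHA-256 and a `{filename: expected digest}` manifest are assumptions, and `sha256_of` and `check_transfer` are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_transfer(folder: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare received files against a manifest of {filename: expected digest}.

    Assumption: the agency sends such a manifest alongside its records.
    """
    report: dict[str, list[str]] = {"missing": [], "corrupt": [], "ok": []}
    for name, expected in sorted(manifest.items()):
        target = folder / name
        if not target.is_file():
            report["missing"].append(name)   # completeness failure
        elif sha256_of(target) != expected.lower():
            report["corrupt"].append(name)   # integrity failure
        else:
            report["ok"].append(name)
    return report
```

A transfer would be accepted only when both the `missing` and `corrupt` lists come back empty.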
20
Once records are received, the Archivist will receive an email.
The files will then be reviewed and a high-level description will be entered in the Database Catalog.
The SIP (Submission Information Package) is created.
21
This is where the magic happens.
22
DIP (Dissemination Information Package) created.
The Catalog database is updated with Access, Description and Preservation Information.
The Archival records are placed on the Manifest Server for Ingest into LOCKSS.
The public access database is updated.
23
Based at Stanford University.
LOCKSS has primarily been used for scientific journals and publications.
Open source; uses OpenBSD, a multi-platform 4.4BSD-based UNIX-like operating system.
24
Boots from CD = No operating system installed on the server.
Communicates over a virtual private network (VPN).
Files for LOCKSS are stored on a separate Admin server running Linux.
One LOCKSS cluster with seven servers in our private distributed LOCKSS network.
Initially set up to take in 1 TB of data; can be expanded.
25
Dark secure archival storage
LOCKSS is a sophisticated data storage system that scans for and repairs file corruption and other data integrity problems
Level 4 firewalls and geographic distribution provide added security
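The idea of scanning for and repairing corruption across several servers can be illustrated with a simple majority vote over replica digests. This Python sketch is a deliberate simplification and not the actual LOCKSS polling protocol; `audit_and_repair` is a hypothetical name, SHA-256 is an assumed hash, and the replicas are held in memory only to keep the example short.

```python
import hashlib
from collections import Counter


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Repair replicas that disagree with the majority; return repaired node names."""
    digests = {node: hashlib.sha256(data).hexdigest() for node, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # Without a strict majority there is no trustworthy copy to repair from.
        raise ValueError("no majority agreement; manual review needed")
    good_copy = next(data for node, data in replicas.items() if digests[node] == winner)
    repaired = [node for node in replicas if digests[node] != winner]
    for node in repaired:
        replicas[node] = good_copy  # overwrite the damaged copy in place
    return repaired
```

With seven servers, as in the cluster described earlier, up to three copies could be damaged at once and a strict majority would still agree on the intact content.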
26
BizTalk Process - AIP (Archival Information Package).
This process moves records from LOCKSS to the Public Access web server based on the record access date.
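The date-based release step can be sketched as a filter over catalog entries. This Python example is an assumption-laden illustration rather than the BizTalk process: `ArchivedRecord`, its fields, and `due_for_release` are hypothetical, and treating a missing access date as "restricted until further notice" is a design choice made here, not something stated in the slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ArchivedRecord:
    """Minimal catalog entry; field names are illustrative only."""
    record_id: str
    access_date: Optional[date]  # None = no release date set (assumed to stay dark)


def due_for_release(records: list[ArchivedRecord], today: date) -> list[ArchivedRecord]:
    """Return records whose access date has arrived, in their original order."""
    return [
        record for record in records
        if record.access_date is not None and record.access_date <= today
    ]
```

Only the records returned by such a filter would have a user copy sent to the public web server; everything else stays in the dark archive.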
27
Web server will provide Internet access to records through a web-based search interface.
Access to records restricted by statute or otherwise will be blocked during the restriction period.
Restricted records are held in the LOCKSS dark archive; no user copy is sent to the web server until public access is allowed.
28
We are currently in the process of implementing the web component of Rediscovery.
This will allow the public to search our holdings.
We hope to use BizTalk to automatically populate the Rediscovery catalog.
Public access will be granted through URLs to the Rediscovery web component.
29
30
Permanently valuable electronic records scheduled for transfer to the SCDAH
Pilot project agencies and records:
Judicial Department – Supreme Court Case Files
Election Commission – Voter Registration Master Files
Public Service Commission – Orders
DHEC – Electronic Index to Death Certificates
31
Project Status
Core metadata defined and data dictionary completed
System design completed
Hardware and software acquired and installed
Agency partners and records identified
System prototype built (AZ & SC)
BizTalk® training completed
32
On the Horizon
Other states purchase and configure hardware & software
First ingest of records in early winter
Develop public search website
33
Move from pilot to production mode
Develop procedures for agency participation
Expand participation to additional agencies and records
34
Bill Henry Electronic Records Consultant [email protected] (803) 896-6137
Matt Guzzi Electronic Records Archivist [email protected] (803) 896-6103