TRANSCRIPT
Advisory Committee on the Electronic Records Archives
April 29-30, 2009
Program Director’s Update
Topics
Development and deployment of the ERA instance for the G.W. Bush presidential records
Plans for further development
Where is ERA?
Rocket Center, WV
Erma Ora Byrd Conference & Learning Center
The Search & Access ERA Instance for G. W. Bush Electronic Presidential Records
What Does the Base ERA Do?
Focus:
Federal Records
Nationwide records management program
National Archives
Functions:
Creation, review and approval of records schedules
Manage transfer of physical and legal custody of all types of records
Systematically collect, create, and manage lifecycle data about records
Actual transfer, inspection, and archival storage of electronic records
What Does the Search & Access ERA Do?
Focus:
Presidential Electronic Records
George W. Bush Presidential Library
Functions:
Rapid ingest of very large volumes of electronic records
Automatic indexing on ingest
Immediate searchability, based on index
Creation of different versions to support structured search of priority records
Basic case management for review and redaction of sensitive content.
Search and Access Instance Development
Achieved Initial Operating Capability December 8, 2008
LMC proposed and received NARA and EOP agreement on an expedited method for transfer of electronic records.
NARA has enjoyed excellent collaboration from the EOP.
NARA implemented a contingency plan for access to high priority e-records, the finding aid for WH paper records and the database of digital photography, pending completion of processing into ERA.
EOP Transfer & Ingest Overview
[Diagram: timeline of EOP data transfers, software drops and ingest operations. The layout did not survive extraction; the recoverable labels are listed below.]
Rows: Data Type; SW Drops; Storage Arrays; SASS Operations (Ingest)
SW Drops: 6.0, 7.0, 7.1, 7.2
Storage: SAN A, SAN A2, SAN B1, SAN B2, SAN B Returns, Snap Server
Data volumes: ARMS (PRA) = 1.9 TB; ARMS (FRA) = 5.1 TB; PDS = 0.0005 TB; PDS (delta) = 0.0005 TB; WARDS = .018 TB; WARDS (delta) = 0.001 TB; RMS = 1.0 TB; Merlin One = 36 TB; Merlin One2 = 36 TB; Exchange = 57 TB; Non-Pri Types = 20TB; Non-Pri Types = 0.2 TB
Other data labels: ARMS (SAN); RMS (Update)
Dates: 12/5; (IOC) 12/8; 12/12; 12/15; 1/15; 1/20; 1/26; 1/30; 2/11; ? 5/16?
G.W. Bush Presidential Electronic Records
Records | Number of objects | Gigabytes of Data | Shipped to ERA Data Center | Status
Priority Records
Email (2000-2003) | 44,815,184 | 1,688 | 12/8/2008 | >99% available for search in ERA. There are technical problems with the remaining messages.
MS Exchange email (2003-2008) | 150,000,000 estimated | 16,500 estimated | Expected mid May | In temporary storage. Conversion to standard format, separation from federal records, and identification of responsible EOP component largely complete.
Presidential Diary | 682,193 | 1 | 12/8/2008 and 1/26/2009 | 100% available for search in ERA
Digital photography | 11,220,044 | 31,000 | 1/26/2009 | Problems require shipment of a second set, expected in mid May
Index to White House paper records | 313,850 | 583 | 1/26/2009 | 100% available for search in ERA, but about 6% of the records appear to be missing some pieces of data.
Visitor and worker access to EOP buildings | 28,922,988 | 14 | 12/8/2008 and 1/26/2009 | 100% available for search in ERA
Index to motion video | 305 | 5 | 1/26/2009 | In ERA, being processed
Email from WH Counsel | 572,051 | 1,057 | 1/26/2009 | In ERA, being processed
Other Records | >12,000,000 | >5,450 | Partial shipment 1/26/2009 | Some in ERA, being processed. Remainder expected mid May
Processing Status - 1
All Bush e-records have been transferred to NARA’s custody.
Not all have been transferred to the ERA Data Center in WV. EOP is maintaining copies until NARA successfully completes ingest.
Archives Operational Issues
Several sets of records were not transferred in the formats previously agreed by NARA and EOP
o NARA required retransmission
Some records exhibited anomalies
o Some ARMS email records had binary data in the “To” field
o Some metadata in the digital photography system did not have corresponding images.
o Some entries in the Records Management System are missing some fields.
o MS Exchange email was not divided presidential from federal records or associated with EOP component, and contained numerous duplicates. EOP is addressing these problems prior to transfer to ABL. EOP has converted from proprietary to standard format. NARA will preserve both the original files and the output of the EOP processing.
o Encoding of date of birth in the Access system impeded searches on that field.
Viruses have been found in a small percentage of files.
o Infected files have been successfully quarantined. LMC & NARA are working to produce clean copies.
Processing Status - 2
Technical Issues
Issues with COTS products:
o Automatic indexing of a batch of records stops when errors are found in any of the records; e.g., binary data in headers of email.
o Erroneous results returned in certain conditions
o Incomplete search results returned in other cases.
o LMC underestimated storage space needed for the index.
Additional hardware has been ordered.
Unanticipated software development needed to ensure complete and accurate mapping between ‘.eml’ email produced by the EOP and the original MS ‘.pst’ files
NARA directed LMC to hire a subcontractor to perform actual ingest of records.
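The first COTS issue on this slide, a whole batch halting because one record is malformed, is commonly mitigated by isolating failures per record and setting the bad records aside for later repair. The following is a minimal Python sketch of that general pattern only. It is not the COTS product's behavior or LMC's fix; the names `index_batch` and `index_email_header` and the ASCII-only header check are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def index_email_header(record_id: str, raw_header: bytes) -> None:
    """Hypothetical stand-in for an indexer: rejects headers that are not plain text."""
    try:
        raw_header.decode("ascii")
    except UnicodeDecodeError:
        # Mirrors the anomaly named on the slide: binary data in an email header.
        raise ValueError("binary data in header") from None


def index_batch(
    records: Iterable[Tuple[str, bytes]],
    index_one: Callable[[str, bytes], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Index each record independently so one bad record cannot stop the batch."""
    indexed: List[str] = []
    set_aside: List[Tuple[str, str]] = []
    for record_id, raw in records:
        try:
            index_one(record_id, raw)
        except ValueError as exc:
            # Record the failure and keep going instead of aborting the batch.
            set_aside.append((record_id, str(exc)))
        else:
            indexed.append(record_id)
    return indexed, set_aside


batch = [
    ("m1", b"To: a@example.gov"),
    ("m2", b"To: \xff\xfe\x00"),  # malformed header
    ("m3", b"To: b@example.gov"),
]
indexed, set_aside = index_batch(batch, index_email_header)
```

In this sketch the good records on either side of the malformed one are still indexed, and the failure list gives staff a worklist of records needing repair or retransmission.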
Status of Requests for Bush Records
28 Requests for access as of March 17, 2009
Primarily for paper records
NARA has responded using data about the paper records in the Records Management System
A few requests were for digital photographs.
Most requests were addressed using the two systems NARA set up under the Contingency Plan because processing of the records had not been completed at the time the requests were received.
Three requests fulfilled using records on temporary ERA storage.
Plans for Further Development
What’s in Store for the Future?
Increment 2
Preservation Framework
o Introduction and use of a variety of tools for different preservation needs
Public access
o Information about all types of records
o Online access to electronic records
Initial system evolution
Increments 3 - 5
Incremental enhancements in capability & capacity
Continuing system evolution
Governmentwide expansion
Full Lifecycle Management Plans
Appraisal case management and workflow
Search Framework supporting different tools
FOIA and other access case management
Review and redaction of sensitive content
ERA Functional View: Current Status
[Diagram. Recoverable labels follow.]
Shared Services: System Management; Help Desk; Network; Data Management
Enterprise Service Bus
Instances: Base Instance; EOP Instance
Users: White House; Agencies
ERA Functional View: Planned
[Diagram. Recoverable labels follow.]
Shared Services: System Management; Preservation Framework; Public Access; Help Desk; Network; Data Management
Enterprise Service Bus
Instances: Base Instance; EOP Instance; Congressional Instance; Records Center Instance
Users: White House; Committees; Agencies; Public
Legend: Current capability: solid fill. Future capability: hashed fill.
ERA Instances
Base Instance (June 2008)
o Used by NARA and federal agencies
o For management of all federal records
o For transfer, inspection and management of federal electronic records
EOP instance (December 2008)
o Used by NARA and Presidential Administrations
o For transfer, inspection, and management of presidential electronic records
Congressional Instance (future)
o Used by NARA for Congressional Committees
o For transfer, inspection, and management of congressional electronic records
Federal Records Center Instance (future)
o Used by NARA and other federal agencies
o For transfer and storage of temporary and permanent federal electronic records that remain under the control of the originating agency
ERA Shared Services
System Management (current)
o System operation and maintenance
o Security
o User account management
o Deployment of new & updated software
o Backup & other common services
Help Desk (current)
o Respond to technical questions and issues from users
Network
o Link to the Internet, NARANET (current)
o Interfaces with other systems (future)
Data Management
o Data about records and transactions related to them (current)
o Description of NARA holdings (Increment 2)
o Review and redaction of records with restricted content (future)
Preservation Framework (Increment 2)
o Tools to overcome obsolescence of different digital formats (future)
Public Access (Inc. 2 +)
o Search and retrieval of information about records, regardless of custody
o Search and access to electronic records in NARA’s custody
o Search and access to digitized records from NARA’s holdings
o Freedom of Information Act for restricted records in NARA’s custody
Advantages of the Instances & Shared Services Approach
Instances enable different business rules and processes for different mission requirements:
o Base Instance: Federal Records Act provisions on governmentwide records management and on the National Archives
o EOP instance: Presidential Records Act
o Congressional instance: House and Senate rules.
o Federal Records Center Instance: Federal Records Act provisions on storage of temporary and permanent records under originating agencies’ authority.
Advantages of the Instances & Shared Services Approach
Shared services maximize utilization of resources, reduce redundancy and provide a stable foundation for system growth and evolution over time.
Shared services deliver capabilities and capacity wherever needed, regardless of differences in mission and business needs
E.g., the Preservation Framework can be used to preserve any electronic records, regardless of whether they came from Congress, the White House or a federal agency.
E.g., a citizen seeking access to information will be able to find it using a single web portal, regardless of whether
o it is information about records or in the records,
o the records are in NARA’s physical custody,
o the records are electronic or hard copy,
o they originated in the White House, Congress or an agency.
Preservation
[Diagram: Electronic Record 1, Electronic Record 2 … Electronic Record n enter the Preservation Framework, which applies Tool 1, Tool 2 … Tool n while maintaining Record Identity, Record Integrity and Original Order, and produces Electronic Record 1’, Electronic Record 2’ … Electronic Record n’.]
The Preservation Framework supports the introduction and use of an arbitrary number and variety of processes under the control of archival requirements for authenticity.
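The Preservation slide describes a framework that accepts any number of tools but keeps them under archival control for authenticity. One way to picture that arrangement is a registry of pluggable tools whose output is accepted only if identity, integrity and original order checks pass. The Python sketch below is illustrative only and is not the ERA design: every name in it (`Record`, `PreservationFramework`, `register`, `preserve`, `to_standard`) is hypothetical, and the three checks are deliberately simplistic stand-ins for the archival requirements named on the slide.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    record_id: str  # stand-in for record identity
    sequence: int   # stand-in for position in original order
    fmt: str        # current digital format
    text: str       # stand-in for the record's content


Tool = Callable[[Record], Record]


class PreservationFramework:
    """Registry of format-specific tools, gated by authenticity checks."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, source_fmt: str, tool: Tool) -> None:
        # Any number of tools can be added, one per source format here.
        self._tools[source_fmt] = tool

    def preserve(self, records: List[Record]) -> List[Record]:
        preserved = []
        for rec in records:
            tool = self._tools.get(rec.fmt)
            new = tool(rec) if tool else rec  # no tool registered: keep as is
            if new.record_id != rec.record_id:
                raise ValueError(f"{rec.record_id}: identity changed")
            if new.sequence != rec.sequence:
                raise ValueError(f"{rec.record_id}: original order changed")
            if new.text != rec.text:
                raise ValueError(f"{rec.record_id}: content altered")
            preserved.append(new)
        return preserved


def to_standard(rec: Record) -> Record:
    """Hypothetical tool: changes only the format label, not the content."""
    return replace(rec, fmt="standard")


framework = PreservationFramework()
framework.register("legacy", to_standard)
records = [Record("r1", 0, "legacy", "hello"), Record("r2", 1, "standard", "world")]
preserved = framework.preserve(records)
```

The point of the design is that tools can be swapped in as formats become obsolete, while the checks that define authenticity live in the framework rather than in any one tool.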
Public Access
Information about all records
o From Records Schedules
o Archival Descriptions
o Other NARA information
Online access to electronic records
Online access to scanned versions of hard copy records
Requests for copies of records
Freedom of Information Act requests for restricted records
Assistance from NARA staff
Increment 3 Work Status
Authority to Proceed Issued for Early Analysis
o Architectural Framework
o Preservation examination and prototyping
o Search Engine examination and selection
o Open Access examination and selection
o Enhancements to address authorized user defined changes and software defects not addressed at IOC
Discussions begun on scope of work and technical details for full proposal
Target date for award: 7/09
Governmentwide Expansion
Initial Implementation: June 2008 – June 2009
o Four collaborating agencies
o NARA staff proxy for other agencies
Invitational Phase: June 2009 – February 2010
o Additional agencies by invitation
Voluntary Phase: February 2010 – December 2010
o Additional agencies who volunteer and meet criteria
Mandatory Phase: January 2011
o All agencies
The Development Timeline
[Timeline chart. Recoverable labels follow.]
Time axis: 9/05, 9/06, 9/07, 9/08, 9/09, 9/10, 9/11
Bars: ERA Base; Search & Access ERA; Public Access & Preservation Framework; Enhancement; Enhancement; Operation & Maintenance
Milestones: Initial Operating Capability (6/08); Full Operating Capability