Taming the Wild LISTSERV; or, How to Preserve Specialized E-Mail Lists
Lisa M. Schmidt
[email protected]
http://www.h-net.org/archive/
MATRIX: The Center for Humane Arts, Letters & Social Sciences Online
Michigan State University
May 23, 2007
H-Net: Humanities and Social Sciences Online
• International consortium of scholars and teachers
• Oldest collection of born-digital and content-moderated arts, humanities, and social science material on the Internet
• Valuable scholarly resource
– More than 180 networks, or e-mail lists
– More than 230 “private” lists
• More than 1 million e-mail messages
MATRIX
• Digital humanities research center
• Devoted to the application of new technologies in humanities and social science teaching and research
• Uses Internet technologies to improve education and increase the flow of information
NHPRC Grant
• Conduct assessment of existing H-Net preservation policies and practices
• Develop an improved long-term preservation plan
• Apply NARA/OCLC TRAC checklist
• Useful to those managing large collections of electronic records
• Research semantic clustering search techniques
Preserving E-Mail Lists as Scholarly Resources
• How H-Net Works
• Current Preservation Practices
• Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
• Other E-Mail Preservation Projects
• Preservation Improvement Plan
How H-Net Works: Network Configuration
How H-Net Works: Backup & Security
• Daily incremental backups, weekly full backups
– Tapes cycle through system every 6 weeks
– Swapped tapes kept in locked cabinet in secured room
– Tapes replaced as needed
• Monthly full, permanent tape backups
– Tapes kept in secured room
– Plans to keep log and move to offsite storage
• Server rack kept in climate controlled, physically secured room
How H-Net Works: Posting Messages
• H-Net runs on LISTSERV software
• Users must be list subscribers to post
• Messages written in plain text
• No attachments allowed on public lists
How H-Net Works: Posting Messages
Message Posting Process
How H-Net Works: Archiving of Lists
• Messages kept in flat text files called “notebooks”
• Messages post anywhere from a few seconds to several days after approval
• Notebook includes messages posted during a weekly time period
How H-Net Works: Archiving of Lists
Time Period    Day of Month
a              1-7
b              8-14
c              15-21
d              22-28
e              29-31
Ex. “h-africa.log0802a”
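The weekly naming scheme in the table can be sketched in a few lines of Python. This is only an illustration, not H-Net's or LISTSERV's actual code: the function names are hypothetical, and the sketch assumes the four digits in the example are a two-digit year followed by a two-digit month, which the slide does not spell out.

```python
from datetime import date

# Week letters as listed in the table: a = days 1-7, b = 8-14,
# c = 15-21, d = 22-28, e = 29-31.
WEEK_LETTERS = "abcde"


def week_letter(day_of_month: int) -> str:
    """Map a day of the month (1-31) to its notebook week letter."""
    if not 1 <= day_of_month <= 31:
        raise ValueError(f"day of month out of range: {day_of_month}")
    return WEEK_LETTERS[(day_of_month - 1) // 7]


def notebook_name(list_name: str, posted: date) -> str:
    """Build a notebook filename such as 'h-africa.log0802a'.

    Assumption: the digits after 'log' are YYMM (two-digit year, then month).
    """
    return f"{list_name}.log{posted:%y%m}{week_letter(posted.day)}"
```

Under that assumption, a message posted to h-africa on February 5, 2008 would fall in a notebook named like the slide's example.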
How H-Net Works: Archiving of Lists
• BRS Database
– Newest notebook messages parsed and copied every 24 hours
– MD5 hashes created for each message
– Available for full-text search
• MySQL Database Cache
– Log browse cache extracts key metadata, creates MD5 hashes
– Cache builder script writes metadata to MySQL database cache
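As a rough illustration of the caching step, the sketch below pulls a few headers out of one message and computes an MD5-based identifier. It is not the H-Net cache builder: the function names are hypothetical, the field names are borrowed from the cached-metadata table later in the slides (filename, from, subject, date, messageid), and both the choice to hash the full message text and the unpadded base64 encoding of the digest are assumptions.

```python
import base64
import hashlib
from email import message_from_string


def message_id(raw_message: str) -> str:
    """MD5 digest of the message text, base64-encoded with padding removed.

    Assumption: the whole message text is hashed and encoded this way;
    the slides only say that MD5 hashes are created for each message.
    """
    digest = hashlib.md5(raw_message.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def cache_record(filename: str, raw_message: str) -> dict:
    """Extract key metadata from one message for a metadata cache row."""
    msg = message_from_string(raw_message)
    return {
        "filename": filename,
        "from": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        "date": msg.get("Date", ""),
        "messageid": message_id(raw_message),
    }
```

A record built this way could then be written to a database table; the insert itself is omitted because the slides do not describe the schema.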
How H-Net Works: Archiving of Lists
Message Metadata Stored in MySQL Database
How H-Net Works: Message Retrieval
Constructed Persistent URL
http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-Albion&month=0805&week=c&msg=jeSTCR0QAxq28hhgJPZ%2beQ&user=&pw=
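A URL of this shape can be assembled from its parts, as in the sketch below. The parameter names and values come from the example URL; the function name is hypothetical, and treating msg as an opaque, already-computed message identifier is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "http://h-net.msu.edu/cgi-bin/logbrowse.pl"


def persistent_url(list_name: str, month: str, week: str, msg_id: str) -> str:
    """Assemble a log-browse URL in the same parameter order as the example.

    urlencode percent-encodes reserved characters, so a '+' inside the
    message identifier becomes '%2B'.
    """
    params = [
        ("trx", "vx"),
        ("list", list_name),
        ("month", month),
        ("week", week),
        ("msg", msg_id),
        ("user", ""),
        ("pw", ""),
    ]
    return f"{BASE_URL}?{urlencode(params)}"


url = persistent_url("H-Albion", "0805", "c", "jeSTCR0QAxq28hhgJPZ+eQ")
```

Python emits the escape in upper case (%2B) where the example URL uses %2b; percent-encoding is case-insensitive, so the two forms are equivalent.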
Current Preservation Practices
Message Ingest, Storage, and Retrieval Processes
Current Preservation Practices
• Backup and storage
• Significant properties: message content, stored in plain text formats
• Authenticity
– Informal check by author and/or editor on posting
– Broken URL on message retrieval attempt
• Cached metadata fulfills PDI requirement
Current Preservation Practices
OAIS PDI Term            H-Net Cached Metadata
Reference Information    filename + messageid
Context Information      filename, from, subject, date (dpb)
Provenance Information   filename, from, subject, date (dpb)
Fixity Information       messageid
Cached Metadata
TRAC
• Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) published by NARA and OCLC, 02/07
• For certification by third party or self assessment
• Three sections
– Organizational Infrastructure
– Digital Object Management
– Technologies, Technical Infrastructure, & Security
Other E-Mail Preservation Projects
• Preservation of Electronic Mail Collaboration Initiative
– North Carolina State Archives, Kentucky Department for Libraries and Archives, Pennsylvania State Archives
– http://www.ah.dcr.state.nc.us/records/EmailPreservation/default.htm
• Collaborative Electronic Records Project
– Smithsonian Institution/Rockefeller Archive Center
– http://siarchives.si.edu/cerp/index.htm
• Collection-Based Long-Term Preservation
– San Diego Supercomputer Center
– http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA365661&Location=U2&doc=GetTRDoc.pdf
All Used XML Encoding
Preservation Improvement Plan: Backup & Storage
• Media refreshment schedule for all tapes
• Systematic sampling, remounting, reading, retensioning permanent tapes
• More than one set of backup tapes, or a server mirror
• Secure storage systems
• Backup log
• Participation in distributed storage system, such as LOCKSS or iRODS
Preservation Improvement Plan: Authenticity
• Shorten and standardize ingest time window to seconds rather than weeks
• Define and document access permissions
• Maintain audit log that tracks all activities associated with records
• Perform regular authenticity checks using message digests
• Consider using SHA-2 for integrity checks
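A regular authenticity check with message digests could look roughly like the sketch below, which records a SHA-256 digest (a member of the SHA-2 family) for each stored message and later reports any message that is missing or no longer matches. The function names and the in-memory dictionary standing in for the message store are hypothetical; a real check would read from the notebooks or database and write its results to the audit log.

```python
from __future__ import annotations

import hashlib


def sha256_digest(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a message's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(messages: dict[str, bytes]) -> dict[str, str]:
    """Record a digest for every message at ingest time."""
    return {msg_id: sha256_digest(body) for msg_id, body in messages.items()}


def audit(messages: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return ids whose content is missing or differs from the recorded digest."""
    failures = []
    for msg_id, expected in manifest.items():
        body = messages.get(msg_id)
        if body is None or sha256_digest(body) != expected:
            failures.append(msg_id)
    return failures
```

Running audit on a schedule and logging its output would cover both the regular-check and the audit-log items above in one pass.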
Preservation Improvement Plan
• Continue to use MD5 to calculate name
• Generate shorter persistent URL for use as citation
• Awkward metadata handling
• Editor data should be added to what’s there, not replace it
Preservation Improvement Plan: Migration
• Messages and Notebooks
– No migration strategy needed
– Plain text ASCII and UTF-8 are stable, open formats
• Attachments
– Make private lists browsable by providing constructed URL
– Display attachment link in browse window
– Detach attachments from notebook files, store separately, link to original message
– Provide conversion on demand to current formats
Preservation Improvement Plan: From TRAC Checklist
• Succession plan
• Periodic review or trigger event definition
• Technology watch
• Document, document, document!
– Technology history
– Change management system
– Staff roles, responsibilities, and authorizations
– Written recovery plan
References
• H-Net Archives, Documentation, http://www.hnet.org/archive/doc.php
• H-Net: Humanities and Social Sciences Online, http://www.h-net.org
• InterPARES, http://www.interpares.org
• MATRIX: The Center for Humane Arts, Letters, and Social Sciences Online, http://www.matrix.msu.edu
• OAIS Reference Model, http://public.ccsds.org/publications/archive/650x0b1.pdf
• Trustworthy Repositories Audit & Certification: Criteria and Checklist, http://www.crl.edu/PDF/trac.pdf