Your Scholarship. Our World. Preserving the Long Tail, by Vicky Reich
Victoria Reich, Executive Director, LOCKSS Program
Stanford University Libraries
http://www.lockss.org/
1 September 2015
Content Is Web-based
Web size: 1200 PB
Internet Archive: 9 PB
Brewster Kahle, Founder and Director, Internet Archive
CONTENT IN CONTEXT
Scholarly communication is: author’s words, data, software, communication, identity, related works, etc.
50% Preserved?
• In 2010, the median ARL research library received ~80K serials
• Keepers Registry reports 28.5K preserved; 10K in progress
  – Not adjusted for risk
  – Not adjusted for difficulty and cost
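The question mark in the slide title can be checked against the two figures above. Treating them as directly comparable is an assumption: the Keepers Registry tallies titles preserved by archiving agencies, while ~80K is one median library's serials list, so the two sets of titles need not overlap exactly.

```latex
\frac{28.5\text{K} + 10\text{K}}{80\text{K}} = \frac{38.5}{80} \approx 0.48,
\qquad
\frac{28.5\text{K}}{80\text{K}} \approx 0.36
```

"About half" holds only if every in-progress title is completed; counting finished work alone gives roughly 36%, before any adjustment for risk, difficulty, or cost.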
Biggest Threat to Content?
• Obsolescence and/or failure
  – Formats
  – Media
  – Hardware
  – Software
• Economic: national, organizational
• Natural disasters
• Humans
Reality
The rate of loss to future researchers from “never preserved” will vastly exceed that from all other causes.
Dr. David S. H. Rosenthal
http://blog.dshr.org/2014/12/talk-at-fall-cni.html
Philosophy
Preservation is an active community effort.
LOTS OF COPIES KEEP STUFF SAFE
LOTS OF COMMUNITIES KEEP STUFF SAFE
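The two slogans can be read as a simple probability argument, sketched here under a strong assumption: each of n copies is lost in a given period independently, with the same probability p. The values of p and n below are illustrative only, not LOCKSS figures.

```latex
P(\text{all } n \text{ copies lost}) = p^{n},
\qquad \text{e.g. } p = 0.01,\; n = 7: \quad 0.01^{7} = 10^{-14}
```

Independence is the weak point. Copies that share the same software, institution, or funding source tend to fail together, so their losses are correlated and the real risk is far higher than p^n. Spreading copies across many communities is a way of keeping failures closer to independent.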
The LOCKSS Program
• Communities use LOCKSS open source software to preserve & access their scholarly record
• LOCKSS staff provide services and software
LOCKSS At Stanford
Preservation Architecture
• Ingest
• Preservation
• Dissemination
• Management
Formats Preserved
application/eps application/epub+zip application/javascript application/msword application/octet-stream application/powerpoint application/pdf application/postscript application/rss+xml application/rtf application/vnd.fdf application/vnd.ms-excel application/vnd.ms-powerpoint application/vnd.ms-word application/vnd.openxmlformats-officedocument.wordprocessingml.document application/vnd.rn-realmedia application/wordperfect5.1 application/xhtml+xml application/x-javascript application/xml application/x-msexcel application/x-research-info-systems application/x-troff application/x-zip-compressed application/zip audio/mpeg audio/x-mp3 audio/x-pn-realaudio chemical/x-mdl-molfile image/bmp image/gif image/jpeg image/pjpeg image/png image/svg+xml image/tiff image/vnd.microsoft.icon image/x-icon text/css text/html text/javascript text/plain text/rtf text/x-bibtex text/x-js text/xml video/avi video/mp4 video/mp4v-es video/mpeg video/quicktime video/x-msvideo video/x-ms-wmv
• LOCKSS software migrates formats as needed
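The slide does not say how or when migration is triggered. The sketch below assumes migration on access: the preserved bits are never rewritten, and a converter runs only when a reader cannot accept the stored type (much as an HTTP Accept header would signal). The registry, the `serve` and `register` functions, and the Latin-1 to UTF-8 pairing are all hypothetical illustrations, not LOCKSS code.

```python
"""Sketch of format migration on access: stored bits never change; a
converter runs only when a reader cannot accept the stored format."""
from typing import Callable, Dict, List, Tuple

Converter = Callable[[bytes], bytes]

# Hypothetical registry: (stored type, target type) -> converter function.
CONVERTERS: Dict[Tuple[str, str], Converter] = {}

LATIN1_TEXT = "text/plain; charset=iso-8859-1"
UTF8_TEXT = "text/plain; charset=utf-8"


def register(source: str, target: str) -> Callable[[Converter], Converter]:
    """Decorator that records a converter for one (source, target) pair."""
    def wrap(fn: Converter) -> Converter:
        CONVERTERS[(source, target)] = fn
        return fn
    return wrap


@register(LATIN1_TEXT, UTF8_TEXT)
def latin1_to_utf8(data: bytes) -> bytes:
    # Illustrative migration: re-encode legacy Latin-1 text as UTF-8.
    return data.decode("latin-1").encode("utf-8")


def serve(stored_type: str, data: bytes, accept: List[str]) -> Tuple[str, bytes]:
    """Return (content type, body) for a reader that accepts `accept` types."""
    if stored_type in accept:
        return stored_type, data            # reader can use the original as-is
    for target in accept:
        convert = CONVERTERS.get((stored_type, target))
        if convert is not None:
            return target, convert(data)    # migrate a copy; original untouched
    return stored_type, data                # no migration path: serve original
```

For example, `serve(LATIN1_TEXT, "café".encode("latin-1"), [UTF8_TEXT])` returns `(UTF8_TEXT, b"caf\xc3\xa9")`. Keeping the original bits and migrating only the delivered copy means a better converter written later can still start from the authentic object.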
Automated Cooperative Preservation
Identify and preserve authoritative version
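The slide names the goal but not the mechanism. As a greatly simplified sketch, assume the boxes holding the same content audit it by polling: each peer hashes its copy together with a fresh nonce, the version held by a strict majority is treated as authoritative, and a box that disagrees repairs from an agreeing peer. The function names are hypothetical, and the real LOCKSS protocol is considerably more elaborate (for example, it must resist misbehaving peers).

```python
"""Greatly simplified sketch of cooperative auditing by polling: peers
hash their copies with a fresh nonce, and the majority version is
treated as authoritative."""
import hashlib
import secrets
from collections import Counter
from typing import Dict, List, Tuple


def vote_hash(nonce: bytes, content: bytes) -> str:
    # Mixing in a per-poll nonce means a peer must actually read its copy;
    # it cannot replay a hash remembered from an earlier poll.
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(my_copy: bytes, peer_copies: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Return ("agree" | "repair" | "inconclusive", peers to repair from)."""
    nonce = secrets.token_bytes(16)
    # In a real network each peer computes its own vote; here we simulate it.
    votes = {peer: vote_hash(nonce, copy) for peer, copy in peer_copies.items()}
    if not votes:
        return "inconclusive", []
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(votes):
        return "inconclusive", []           # no strict majority among voters
    if vote_hash(nonce, my_copy) == winner:
        return "agree", []
    # Local copy disagrees with the majority: repair from an agreeing peer.
    return "repair", sorted(p for p, h in votes.items() if h == winner)
```

With peers `A` and `B` holding the good copy and `C` holding a corrupted one, a box whose own copy has bit-rotted gets `("repair", ["A", "B"])`. Requiring a strict majority rather than a plurality means a split vote triggers no repair, so a few bad copies cannot overwrite a good one.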
TRAC/ISO 16363 Audit
• First-ever perfect score for Technologies, Technical Infrastructure, and Security
• Equaled the previous highest overall score (Scholars Portal)
• Documentation is public
• Linked from blog.dshr.org
  – Introduction
  – TRAC Audit: Process - http://blog.dshr.org/2014/08/trac-audit-process.html
  – TRAC Audit: Lessons - http://blog.dshr.org/2014/08/trac-audit-lessons.html
  – TRAC Audit: Do-It-Yourself Demos - http://blog.dshr.org/2014/08/trac-audit-do-it-yourself-demos.html
Many LOCKSS Networks
• Thousands of publishers
  – Subscription, open access, etc.
• Ingest techniques
  – OAI-PMH, web crawling, file transfer, API, etc.
• Preserved content types
  – Journals, books, databases, government documents, theses and dissertations, image collections…
• Each with an organization & business model
• Each with an appropriate access policy
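OAI-PMH is one of the ingest techniques listed above. The verb and arguments used here (`ListRecords`, `metadataPrefix`, `oai_dc`, `resumptionToken`) come from the OAI-PMH 2.0 protocol. The function names, the example base URL, and the injected `fetch` function are hypothetical, and this is not LOCKSS's own harvester: it only collects record identifiers and ignores error responses such as `noRecordsMatch`.

```python
"""Sketch of OAI-PMH harvesting: page through ListRecords responses by
following resumptionToken until the repository reports no more pages."""
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"   # OAI-PMH 2.0 XML namespace


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    # resumptionToken is an exclusive argument: it replaces metadataPrefix.
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def harvest_identifiers(base_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield record identifiers; `fetch` maps a URL to the response body."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        for header in root.iter(f"{OAI}header"):
            identifier = header.findtext(f"{OAI}identifier")
            if identifier:
                yield identifier
        token_el = root.find(f".//{OAI}resumptionToken")
        token = token_el.text if token_el is not None else None
        if not token:   # an absent or empty token marks the last page
            return
```

Injecting `fetch` keeps the paging logic testable offline; against a live repository it could wrap `urllib.request.urlopen(url).read().decode("utf-8")`.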
A Few Networks
Global LOCKSS Network
Private LOCKSS Networks
U.S. Government Documents
James Jacobs, Stanford
Canadian Government Information
Innovative Technology Award
5 Universities / 3 countries
Master’s and Ph.D. theses
Academic publications
Research data
Brazil’s Cariniana
PKP Private LOCKSS Network
Access From A LOCKSS Box
When The Publisher Is Not Available
Appreciation
The LOCKSS Program’s simple and flexible technical architecture is particularly well suited to the rapidly evolving landscape of e-journal publishing and scholarly practice.
Bernie Reilly, President, Center for Research Libraries, 2014
Research & Development
• Internet Archive
  – IMLS project to build web preservation APIs
  – Web infrastructure and collection building
• Mellon Foundation
  – Preserving the future web
  – Emulation as a preservation strategy
• IIPC
  – Preservation and access (Memento, INA’s LAP)
• Library of Congress
  – Economics of long-term storage
• FORCE11
  – Scholarly communication
• University of California, Santa Cruz
  – Storage technologies; advising PhD students
• 4C
  – Preservation costs
Thank you
Looking forward to the conversation!