Archive What I See Now: Personal Web Archiving with WARCs
![Page 1: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/1.jpg)
Archive What I See Now: Personal Web Archiving with WARCs
Michele C. Weigle, Michael L. Nelson, Mat Kelly, and John Berlin
Web Science and Digital Libraries (WS-DL) Research Group
Old Dominion University
ws-dl.cs.odu.edu • @WebSciDL
http://bit.ly/iipcWAC2017
HD-51670-13 • HK-50181-14
@machawk1
IIPC Web Archiving Conference 2017
June 15, 2017
London, UK
![Page 2: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/2.jpg)
Web Archiving Tools for Web Users
Standard Web archiving tools are difficult for non-IT experts to use.
“Save Page As” is not suitable for archiving purposes.
Pages are behind authentication.
Pages change quickly, but current state needs archiving.
ARCHIVE WHAT I SEE NOW
![Page 3: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/3.jpg)
Why?
● Allow non-technical users to locally create and replay their own archives
● Preserve the previously unpreserved
more archives → more better
![Page 4: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/4.jpg)
CREATION + ACCESS
of personal and private web archives
![Page 5: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/5.jpg)
Goals: Advance Development of 3 Tools
WARCreate
● Create a WARC from what you see in your browser
Web Archiving Integration Layer (WAIL)
● Replay the WARC using software on your desktop
● Your captures never leave your machine
Mink
● See how your captures temporally integrate with institutions’ captures
● Submit new URIs to Web archives (was to-WAIL in scope?)
![Page 6: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/6.jpg)
WARCreate
![Page 7: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/7.jpg)
● Google Chrome browser extension
● Save WARC files from your browser
● No credentials pass through a 3rd party
● Heavily leverages the Chrome webRequest API
● Built in 2012; APIs and libraries have evolved since!
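To make "save WARC files" concrete, here is a minimal sketch of how one captured HTTP response could be laid out as a WARC/1.0 `response` record. This is not WARCreate's own code (WARCreate runs as JavaScript inside the Chrome extension); the function name `build_warc_response_record`, the example URI, and the sample payload are hypothetical and used only for illustration.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Serialize one WARC/1.0 'response' record (hypothetical helper).

    `http_response` is the raw HTTP response (status line, headers, body).
    `captured_at` is assumed to already be in UTC.
    """
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {captured_at.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines end with CRLF; a blank line separates them from the block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs.
    return head + http_response + b"\r\n\r\n"


# Sample payload standing in for what the browser received.
http_bytes = (b"HTTP/1.1 200 OK\r\n"
              b"Content-Type: text/html\r\n"
              b"Content-Length: 13\r\n\r\n"
              b"<p>Hello</p>\n")

record = build_warc_response_record(
    "http://example.com/",
    http_bytes,
    datetime(2017, 6, 15, 12, 0, 0, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

A complete WARC file would usually also carry a `warcinfo` record and the matching `request` records; they are left out here to keep the sketch short.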
![Page 8: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/8.jpg)
WARCreate - Recent Advancements
● Three New Modes for Browser-Based Preservation
○ Record Mode - retain buffer as you browse
○ Countdown Mode - preserve reloading page on an interval
○ Event Mode - preserve page when it’s automatically reloaded
● Save to local Web archive (e.g., WAIL)
![Page 9: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/9.jpg)
![Page 10: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/10.jpg)
Web Archiving Integration Layer (WAIL)
![Page 11: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/11.jpg)
Web Archiving Integration Layer (WAIL)
● Stand-alone desktop application
● Collection-based Web archiving
● Includes Heritrix for crawling, OpenWayback for replay
● Python scripts compiled to OS-native binaries (.app, .exe)
● What to do with WARCs?
● See: How WAIL came about, "Lipstick or Ham"
![Page 12: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/12.jpg)
WAIL - Recent Advancements
● New User Interface
● Ported from Python to Electron
○ Now using Web technologies to archive the Web
● Single archive to collection-based archiving
● OpenWayback to pywb
● Twitter integration
WAIL-Electron Feature Walk-through
![Page 13: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/13.jpg)
WAIL - New User Interface
Original one-click interface
New collection-based interface
![Page 14: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/14.jpg)
![Page 15: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/15.jpg)
![Page 16: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/16.jpg)
![Page 17: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/17.jpg)
![Page 18: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/18.jpg)
![Page 19: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/19.jpg)
![Page 20: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/20.jpg)
![Page 21: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/21.jpg)
Mink
● Google Chrome browser extension
● Indicates archival capture count as you browse
● Quickly submit URI to multiple archives from UI
● From Mink(owski Space)
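One way to arrive at a capture count like the one Mink displays is to count the `memento` links in a Memento TimeMap (link format). The sketch below is only an illustration under stated assumptions: Mink itself is a JavaScript browser extension, and the helper names `parse_timemap` and `count_mementos`, the `archive.example.org` host, and the sample TimeMap are all hypothetical.

```python
import re

# Each entry looks like: <URI>; rel="..."; datetime="..."
# Datetimes contain commas, so entries are matched with a regex
# rather than by splitting the text on commas.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')

SAMPLE_TIMEMAP = """\
<http://example.com/>; rel="original",
<http://archive.example.org/timemap/link/http://example.com/>; rel="self"; type="application/link-format",
<http://archive.example.org/web/20120101000000/http://example.com/>; rel="first memento"; datetime="Sun, 01 Jan 2012 00:00:00 GMT",
<http://archive.example.org/web/20150615120000/http://example.com/>; rel="memento"; datetime="Mon, 15 Jun 2015 12:00:00 GMT",
<http://archive.example.org/web/20170615120000/http://example.com/>; rel="last memento"; datetime="Thu, 15 Jun 2017 12:00:00 GMT"
"""


def parse_timemap(text):
    """Return one dict per link entry, with its URI and quoted attributes."""
    links = []
    for uri, params in LINK_RE.findall(text):
        attrs = dict(PARAM_RE.findall(params))
        attrs["uri"] = uri
        links.append(attrs)
    return links


def count_mementos(text):
    """Count entries whose rel value includes the 'memento' relation type."""
    return sum(
        1 for link in parse_timemap(text)
        if "memento" in link.get("rel", "").split()
    )


print(count_mementos(SAMPLE_TIMEMAP))  # prints 3
```

The `rel` value is split on whitespace because a single link can carry several relation types, such as `first memento`.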
![Page 22: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/22.jpg)
Mink - Recent Advancements
● Enhance interface
○ Add number of archived pages to icon at bottom of page
○ Allow users to set preferences on how to view large sets of mementos
● Communication with user-specified (or local) archive in addition to aggregated institutional archives’ results
![Page 23: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/23.jpg)
Mink - Previous Interface
➢ Interface affected by page CSS
➢ Obtrusive on the viewport by default
➢ Haphazard, inconsistent animations
![Page 24: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/24.jpg)
Mink - User Interface Revamp
![Page 25: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/25.jpg)
Mink - User Interface Revamp
➢ Interface-on-demand
➢ Shadow DOM, no CSS intrusion
➢ More consistent, intuitive Miller columns for many captures
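Miller columns present a hierarchy as a row of lists, where selecting an item in one column fills the next. For a large set of captures, one plausible hierarchy is year, then month, then day, then time. The following is a sketch of that grouping only, not Mink's actual implementation; the function name `miller_columns` and the sample capture times are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def miller_columns(capture_times):
    """Group capture datetimes into a year -> month -> day -> [times] tree."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for ts in sorted(capture_times):
        tree[ts.year][ts.month][ts.day].append(ts.strftime("%H:%M:%S"))
    return tree


captures = [
    datetime(2017, 6, 15, 9, 30, 0),
    datetime(2017, 6, 15, 14, 5, 0),
    datetime(2017, 5, 2, 8, 0, 0),
    datetime(2016, 12, 31, 23, 59, 59),
]

columns = miller_columns(captures)
print(sorted(columns))        # first column: years -> [2016, 2017]
print(sorted(columns[2017]))  # second column: months of 2017 -> [5, 6]
print(columns[2017][6][15])   # last column -> ['09:30:00', '14:05:00']
```

Each column stays short even when a page has thousands of captures, because the user only ever sees one level of the tree at a time.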
![Page 26: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/26.jpg)
Mink - User Interface Revamp
![Page 27: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/27.jpg)
Mink - Communication with Local Archives
Any Memento-compatible archive
Aggregated TimeMap
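As a rough sketch of what combining a local archive's captures with the institutional archives' results could involve, the code below merges several capture lists into one list ordered by capture time. It assumes each source has already been parsed into (datetime, URI) pairs; the `Memento` type, the `aggregate` function, and the example hosts are hypothetical and do not describe how Mink or any particular aggregator is implemented.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Memento(NamedTuple):
    captured_at: datetime
    uri: str
    source: str  # label for where the capture lives, e.g. "local"


def aggregate(*timemaps):
    """Merge capture lists, drop duplicate URIs, and sort by capture time."""
    seen = set()
    merged = []
    for timemap in timemaps:
        for memento in timemap:
            if memento.uri not in seen:
                seen.add(memento.uri)
                merged.append(memento)
    return sorted(merged, key=lambda m: m.captured_at)


institutional = [
    Memento(datetime(2012, 1, 1, tzinfo=timezone.utc),
            "http://archive.example.org/web/20120101000000/http://example.com/",
            "institutional"),
    Memento(datetime(2017, 6, 10, tzinfo=timezone.utc),
            "http://archive.example.org/web/20170610000000/http://example.com/",
            "institutional"),
]
local = [
    Memento(datetime(2015, 3, 3, tzinfo=timezone.utc),
            "http://localhost:8080/web/20150303000000/http://example.com/",
            "local"),
]

combined = aggregate(institutional, local)
for memento in combined:
    print(memento.captured_at.date(), memento.source)
```

Keeping a `source` label on each entry lets an interface show which captures exist only on the user's machine.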
![Page 28: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/28.jpg)
Mink usage GIF, also available at: https://youtu.be/bGjxofpTgv4
![Page 29: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/29.jpg)
Tools’ Integration
![Page 30: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/30.jpg)
Tools’ Integration: WARCreate→WAIL
● Save WARC directly to local archive (by reference [easier])
○ By-value integration feasibility being investigated a la WASAPI
● Automatically indexed and replayable
![Page 31: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/31.jpg)
Tools’ Integration: Mink→WAIL
Any Memento-compatible archive
![Page 32: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/32.jpg)
Some Future Work
● Decouple Mink from external Memento aggregator
○ Client-side customizable set of archives instead
● WARC replay using browser extensions/apps
● Further integration with other archiving tools in WAIL
○ Re-add MemGator Memento aggregator (removed from Electron version)
● Firefox version of tools
○ XUL→WebExtensions
○ Decouple from Chrome APIs
● Integration with InterPlanetary Wayback (to be presented later today)
![Page 33: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/33.jpg)
Acknowledgements
● NEH Grant #s HD-51670-13 • HK-50181-14
● Dr. Liza Potts and WIDE Research Center at Michigan State University
● ODU SEES Travel Grant
![Page 34: Archive What I See Now: Personal Web Archiving with WARCs](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a6508cb7f8b9aa2548b6135/html5/thumbnails/34.jpg)
Archive What I See Now: Personal Web Archiving with WARCs
Michele C. Weigle, Michael L. Nelson, Mat Kelly, and John Berlin
Web Science and Digital Libraries (WS-DL) Research Group
Old Dominion University
ws-dl.cs.odu.edu