Client-side Reconstruction of Composite Mementos Using ServiceWorker

Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson
Web Science and Digital Libraries Research Group
Old Dominion University, Norfolk, VA, 23529
@ibnesayeed @WebSciDL
Supported in part by NSF III 1526700
JCDL 2017, June 19-23, 2017, Toronto, Ontario, Canada

Upload: sawood-alam

Post on 22-Jan-2018


TRANSCRIPT

Page 1: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Client-side Reconstruction of Composite Mementos Using ServiceWorker

Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson

Web Science and Digital Libraries Research Group

Old Dominion University, Norfolk, VA, 23529

@ibnesayeed @WebSciDL

Supported in part by NSF III 1526700

JCDL 2017, June 19-23, 2017, Toronto, Ontario, Canada

Page 2: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Sawood Alam <@ibnesayeed>

2008 Memento Seen in 2017

● https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html

?

Page 3: Client-side Reconstruction of Composite Mementos Using ServiceWorker

2008 Memento Seen in 2012

● http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html

Page 4: Client-side Reconstruction of Composite Mementos Using ServiceWorker

XenLand @ Alpha Centauri

Page 5: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Zombies in Archive

?

Page 6: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Zombies in Archive

<img src="http://xenland.alpha/images/map.png">
// Is rewritten on replay to become:
<img src="http://archive.example.org/1998/http://xenland.alpha/images/map.png">

// URLs constructed by JavaScript are harder to rewrite on replay, e.g.:
var base = 'http://xenland.alpha';
var imgdir = '/images/';
var img = document.createElement('img');
img.src = base + imgdir + 'ruler.png';
document.getElementById('ruler').appendChild(img);
//=>> http://xenland.alpha/images/ruler.png
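The difference between the two cases above can be made concrete with a minimal sketch of a naive server-side rewriter. The names `ARCHIVE_PREFIX` and `rewriteStaticUrls` are hypothetical and are not taken from any archive's code; the prefix follows the slide's example. The rewriter only sees URLs that appear literally in `src` or `href` attributes, so the URL assembled at run time from `base + imgdir + 'ruler.png'` passes through untouched and would later be fetched from the live web.

```javascript
// Hypothetical archive prefix, following the slide's example.
const ARCHIVE_PREFIX = 'http://archive.example.org/1998/';

// Naive static rewriter (illustrative only): prefixes absolute URLs that
// appear literally inside src="..." or href="..." attributes.
function rewriteStaticUrls(html) {
  return html.replace(
    /\b(src|href)="(https?:\/\/[^"]+)"/g,
    (match, attr, url) => `${attr}="${ARCHIVE_PREFIX}${url}"`
  );
}

const page = [
  '<img src="http://xenland.alpha/images/map.png">',
  "<script>var base = 'http://xenland.alpha'; var imgdir = '/images/';",
  "img.src = base + imgdir + 'ruler.png';</script>"
].join('\n');

const rewritten = rewriteStaticUrls(page);

// The literal attribute is rewritten; the script text is left as it was,
// so the URL it builds at run time still points at the original host.
console.log(rewritten);
```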

Page 7: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Replay URL Resolution & Rewriting

Reference type   Example                                     Resolution after relocation
Relative path    images/logo.png                             Potentially correct
Absolute path    /public/images/logo.png                     Potentially incorrect
Absolute URL     http://example.com/public/images/logo.png   Potentially live leakage

http://example.com/public/index.html

...<img src="/public/images/logo.png">...

http://archive.example.org/<datetime>/http://example.com/public/index.html

...<img src="/<datetime>/http://example.com/public/images/logo.png">...
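The three rows of the table can be checked with the standard `URL` constructor, which resolves a reference against a base the same way a browser does. This is only a sketch: the 14-digit `DATETIME` value is a made-up stand-in for the slide's `<datetime>` placeholder, and `classify` is a hypothetical helper name. It resolves each reference against the relocated (replay) page and compares the result with where the archived copy should live.

```javascript
// Hypothetical values; DATETIME stands in for the slide's <datetime> placeholder.
const DATETIME = '20170619000000';
const ARCHIVE = 'http://archive.example.org/';
const ORIGINAL_PAGE = 'http://example.com/public/index.html';
const REPLAY_PAGE = ARCHIVE + DATETIME + '/' + ORIGINAL_PAGE;

// Classify what happens to an unrewritten reference once the page is
// served from the archive instead of its original location.
function classify(ref) {
  const intended = new URL(ref, ORIGINAL_PAGE).href;  // what the author meant
  const wanted = ARCHIVE + DATETIME + '/' + intended;  // its archived location
  const actual = new URL(ref, REPLAY_PAGE).href;       // what the browser requests
  if (actual === wanted) return 'correct';
  if (new URL(actual).origin === new URL(ORIGINAL_PAGE).origin) return 'live leakage';
  return 'incorrect';
}

for (const ref of [
  'images/logo.png',
  '/public/images/logo.png',
  'http://example.com/public/images/logo.png'
]) {
  console.log(ref, '->', classify(ref));
}
```

The table says "potentially" because the outcome depends on the archive's URL layout; the sketch covers only the single layout shown on the slide.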

Page 8: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Avoiding Zombies

● Ahead-of-time rendering and JS execution
  ○ http://archive.is/
● Archival replay proxy
  ○ https://github.com/ikreymer/pywb/wiki/Pywb-Proxy-Mode-Usage
● Browser extension
  ○ MementoFox (deprecated)
● JS override
  ○ wombat.js in PyWB
● ServiceWorker

Page 9: Client-side Reconstruction of Composite Mementos Using ServiceWorker

ServiceWorker

● New web API (still a working draft)
● A standalone JavaScript file
● Persists in the browser independent of the window
● Acts as a proxy
● Installed by a web page under its domain at a specific path (called scope)
● Intercepts all requests in scope
  ○ Resources under the scope path (at any depth)
  ○ Secondary resource requests originating from any resource under scope
● Allows modification of requests and responses
● Primarily used in web applications for offline access and notification support
● Requires HTTPS
● Growing browser support (73.61% as of June 8, 2017)
  ○ http://caniuse.com/#feat=serviceworkers
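As a rough illustration of the installation and scope points above, the sketch below registers a worker and models which requests it would see. `navigator.serviceWorker.register(scriptUrl, { scope })` is the standard registration call; the script path, the scope, and the helper names `registerWorker` and `isIntercepted` are assumptions made for this example. `isIntercepted` is a deliberately simplified model of the rule on the slide, not the browser's actual matching algorithm.

```javascript
// Page side: install the worker under a scope. The guard lets this file load
// in environments without ServiceWorker support (older browsers, Node).
function registerWorker(scriptUrl, scope) {
  if (typeof navigator === 'undefined' || !('serviceWorker' in navigator)) {
    return Promise.resolve(null); // unsupported: the page works without the worker
  }
  return navigator.serviceWorker.register(scriptUrl, { scope });
}

// Simplified model of "intercepts all requests in scope":
//  - a page load (navigation) is intercepted when its own URL is under the scope;
//  - a secondary request is intercepted when the page that issued it is under
//    the scope, wherever the requested resource itself lives.
function isIntercepted(request, scopeUrl) {
  if (request.isNavigation) return request.url.startsWith(scopeUrl);
  return request.clientUrl.startsWith(scopeUrl);
}

// Hypothetical usage on an archive's replay page:
// registerWorker('/sw.js', '/memento/');
```

The second rule is what matters for replay: an embedded request to a third-party host still reaches the worker as long as the memento that issued it is served from under the scope.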

Page 10: Client-side Reconstruction of Composite Mementos Using ServiceWorker

reconstructive.js

● https://github.com/oduwsdl/reconstructive

● A ServiceWorker script written for archival replay
● Plug-in for web archives or Memento aggregators
● Intercepts all network requests originated from a memento
● Reroutes requests to an archive (prevents live leakage & incorrect references)
● Optionally rewrites the content to add banner & to fix hyperlinks
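The rerouting idea can be sketched as a small pure function plus a `fetch` handler. This is not the reconstructive.js source or its API: `ARCHIVE_ORIGIN`, `MEMENTO_PATTERN`, `parseMemento`, and `reroute` are hypothetical names, and the URL layout (`/<14-digit datetime>/<original URL>`) is assumed from the earlier slides. The function takes the datetime from the referring memento and uses it to send any stray request back to the archive.

```javascript
// Assumed archive layout: https://archive.example.org/<14-digit datetime>/<original URL>
const ARCHIVE_ORIGIN = 'https://archive.example.org';
const MEMENTO_PATTERN = /^\/(\d{14})\/(https?:\/\/.+)$/;

// Split a memento URL into its datetime and original URL; null if it is not one.
function parseMemento(url) {
  try {
    const u = new URL(url);
    if (u.origin !== ARCHIVE_ORIGIN) return null;
    const m = MEMENTO_PATTERN.exec(u.pathname + u.search);
    return m ? { datetime: m[1], original: m[2] } : null;
  } catch (e) {
    return null; // empty or malformed URL (e.g. no referrer)
  }
}

// Decide where a request issued by a memento should really go.
function reroute(requestUrl, referrerUrl) {
  if (parseMemento(requestUrl)) return requestUrl; // already an archived URL
  const ref = parseMemento(referrerUrl);
  if (!ref) return requestUrl;                     // not issued by a memento
  const req = new URL(requestUrl);
  // An absolute path resolved against the archive host is re-anchored on the
  // original site; anything else (live leakage) is kept as the original URL.
  const original = req.origin === ARCHIVE_ORIGIN
    ? new URL(req.pathname + req.search, ref.original).href
    : req.href;
  return `${ARCHIVE_ORIGIN}/${ref.datetime}/${original}`;
}

// ServiceWorker wiring; skipped when the file is loaded outside a worker.
if (typeof self !== 'undefined' && typeof self.addEventListener === 'function' &&
    typeof self.registration !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    const target = reroute(event.request.url, event.request.referrer);
    if (target !== event.request.url) event.respondWith(fetch(target));
  });
}
```

Because the correction happens at request time, the archived HTML and JavaScript can be served without rewriting, including URLs that scripts assemble at run time.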

Page 11: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Zombies, No More!

● https://github.com/oduwsdl/ipwb

Page 12: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Rewriting Mementos is Expensive

Original capture (without any rewriting)

In our experiment over 500 home pages we observed:

● One-fifth mean data overhead
● One-third mean time overhead

15% more data in twice the time

Page 13: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Archival Capture Replay Test Suite (ACRTS)

reconstructive.js

● https://ibnesayeed.github.io/acrts/

Page 14: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Reconstruction Winners: PyWB & reconstructive.js

A. OpenWayback
B. PyWB
C. Memento Reconstruct
D. Memento for Chrome
E. reconstructive.js

Page 15: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Future Work

● Use “Prefer” header for original content (when archives support it)
  ○ http://ws-dl.blogspot.co.uk/2016/08/2016-08-15-mementos-in-raw-take-two.html
● Add a customizable archival banner
● Add click handler for lazy rewriting of hyperlinks
● Handle archived ServiceWorkers
● Write a 404-combat ServiceWorker script for webmasters
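A possible shape for the first item, sketched under assumptions: `Prefer` and `Preference-Applied` are the standard HTTP preference headers, but the preference token `original-content` used here is an assumption for illustration, and `rawMementoInit` and `preferenceApplied` are hypothetical helper names. The slide states only that the header would be used once archives support it.

```javascript
// Assumed preference token; the real token depends on what archives adopt.
const RAW_PREFERENCE = 'original-content';

// Build fetch() options asking the archive for the unrewritten capture.
function rawMementoInit(archiveSupportsPrefer) {
  return archiveSupportsPrefer ? { headers: { Prefer: RAW_PREFERENCE } } : {};
}

// Check a Preference-Applied response header value to see whether the
// archive honored the request.
function preferenceApplied(headerValue) {
  return typeof headerValue === 'string' &&
    headerValue.split(',').map((s) => s.trim()).includes(RAW_PREFERENCE);
}

// Hypothetical usage inside a ServiceWorker fetch handler:
// const response = await fetch(urim, rawMementoInit(true));
// const raw = preferenceApplied(response.headers.get('Preference-Applied'));
```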

Page 16: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Conclusions

● reconstructive.js => no zombies!
● Rerouting instead of rewriting (lazy rewriting)
● Mean overhead reduction
  ○ one-fifth data
  ○ one-third time
● 73.61% (and growing) browser support for ServiceWorker
  ○ http://caniuse.com/#feat=serviceworkers
● reconstructive.js
  ○ https://github.com/oduwsdl/reconstructive
● Archival Capture Replay Test Suite
  ○ https://ibnesayeed.github.io/acrts/
● In-depth recap: WADL 2017 Thursday, June 22, 3:45pm (https://fox.cs.vt.edu/wadl2017.html)