
Harvesting and showing complicated sites using archive-it – status for some of our tests from October 2014 – January 2015

January 2015

By Tue Hejlskov Larsen, netarchive.dk


Archive-It (AIT) setup, January 2015

Heritrix 3.3.0 snapshot

Umbra: all seed URLs in AIT are crawled using Umbra and Heritrix.

<<When a crawl is run using Umbra, designated seeds are sent by Heritrix to a separate process that mimics the way a browser would access the seed URLs. This allows client-side script to be executed so that previously unavailable URLs can be detected for Heritrix to crawl. Umbra also gives Heritrix a flexible way to imitate human interactions with Web sites that were previously not possible, such as executing JavaScript through clicking or hovering the mouse over different Web page elements and scrolling down a page>>

Harvesting using "Only one page", from October 2014 to January 2015, following the help instructions here:

https://webarchive.jira.com/wiki/pages/viewpage.action?pageId=3113092

(Apologies if I have missed some of the instructions; AIT updates them from time to time.)

Browser used for Wayback in proxy mode: Internet Explorer 9

http://www.b.dk/nationalt/danske-universiteter-dumper-internationalt

AIT can harvest JavaScript includes together with the articles.

https://www.youtube.com/watch?v=iHNBl2aSJ9g

AIT video player
No comments
Missing some images
With video playback in place, but only with Firefox in proxy mode

http://twitter.com/Spolitik/

With tweets, images and video links
No mouse-down paging (scrolling down to load more)
Tiny URLs OK, e.g. http://t.co/dJ0BmbSV9E
AIT free-text search found posts/comments older than those shown on the page; there are some locale problems…
With linked videos, not in place

https://www.facebook.com/socialdemokraterne

Images, posts and some comments
Posts to page in full view
History (mouse down)
No "View comments"
No view of previous comments
Using free-text search I found comments which could not be shown on the page

https://wayback.archive-it.org/4897/20141027123826/https://www.facebook.com/socialdemokraterne/posts/10152451814408030#
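The archived post URL above packs three pieces of information into its path: a number that appears to be the AIT collection ID, a 14-digit capture timestamp, and the original URL. The sketch below shows one way to pull them apart. The function and class names are hypothetical, and the pattern is an assumption that only covers URLs shaped like this example (full 14-digit timestamps, which Wayback software normally records in UTC).

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# Assumed shape: https://wayback.archive-it.org/<collection>/<YYYYMMDDhhmmss>/<original URL>
AIT_WAYBACK_PATTERN = re.compile(
    r"^https?://wayback\.archive-it\.org/"
    r"(?P<collection>\d+)/(?P<timestamp>\d{14})/(?P<original>.+)$"
)


class ArchivedCapture(NamedTuple):
    collection_id: str
    captured_at: datetime
    original_url: str


def parse_ait_wayback_url(url: str) -> ArchivedCapture:
    """Split an Archive-It Wayback URL into collection, capture time and original URL."""
    match = AIT_WAYBACK_PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a recognised Archive-It Wayback URL: {url!r}")
    # Interpret the 14-digit timestamp as UTC (assumption stated in the lead-in).
    captured_at = datetime.strptime(
        match.group("timestamp"), "%Y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)
    return ArchivedCapture(
        collection_id=match.group("collection"),
        captured_at=captured_at,
        original_url=match.group("original"),
    )


capture = parse_ait_wayback_url(
    "https://wayback.archive-it.org/4897/20141027123826/"
    "https://www.facebook.com/socialdemokraterne/posts/10152451814408030#"
)
print(capture.collection_id, capture.captured_at.isoformat())
```

Read this way, the example was captured on 27 October 2014, which falls inside the test period described in these slides.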

http://instagram.com/socialdemokraterne

Images
Mouse-down paging: 2 times
No provenance top bar
No full image
No "Show more" button

http://www.tumblr.com/search/socialdemokratiet/

Posts and images
With big images
No notes

http://vimeo.com/77382505

With video, not in place

https://www.google.com/culturalinstitute/collection/statens-museum-for-kunst?projectId=art-project

Images, not in place
No zoom
No Street View


Comparison of display capabilities between Archive-It Wayback and NAS Wayback in proxy mode (AIT/NAS)

Summary of the tests, performed from October 2014 to January 2015

* AIT free-text search and AIT GUI views are also used to test what can be shown out of the box with AIT tools.

Harvesting frequency currently available: AIT twice daily, NAS every 15 minutes.

Values are given as AIT/NAS. ? = not yet tested.

Sites: b.dk and dr.dk, YouTube, Twitter, Facebook, Instagram, Tumblr, Vimeo, Google Art Project

Articles/posts/tweets: Yes/No; Yes/No; Yes/No; Yes/No; Yes/No; Yes/No

Comments/notes/retweets: No/No; No/No; Yes some*/No; No/No; No/No; ?/No

Likes: Yes/No; Yes/No; Yes/No

Timeline/history: No/No; Yes some*/No; Yes/No; Yes some*/No

Images in (location) / out (site in AIT GUI / NAS GUI): Yes/some missing; Yes/No; Yes/No; Yes/No; Yes/No; Yes out/No

Ads in (location) / out (site in AIT GUI / NAS GUI): Yes some out*/No; No/No; ?/No; ?/No; ?/No

Tiny links: Yes/No; ?/No

Video in (location) / out (site in AIT GUI / NAS GUI): ?/No; Yes in/No; Yes out/No; Yes out/No; ?/No; ?/No; Yes out/No

Image "zoom": No/No; Yes/No; No/No

Street View: No/No
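Each value in the summary above follows the same small notation: the AIT result, a slash, then the NAS result, with `?` meaning not yet tested and an optional qualifier such as `some*`, `in` or `out` after a "Yes". As a rough illustration of how these values could be read by a script, here is a minimal sketch. The names `Cell`, `parse_side` and `parse_cell` are hypothetical, and the sketch assumes every side starts with "Yes", "No" or "?" (so a value such as `Yes/some missing` is not handled).

```python
from typing import NamedTuple, Optional, Tuple


class Cell(NamedTuple):
    ait: Optional[bool]  # None means "not yet tested"
    nas: Optional[bool]
    ait_note: str        # qualifier after the AIT answer, e.g. "some*" or "out"


def parse_side(text: str) -> Tuple[Optional[bool], str]:
    """Parse one side of a value, e.g. 'Yes some*', 'No' or '?'."""
    text = text.strip()
    if text == "?":
        return None, ""
    word, _, note = text.partition(" ")
    if word.lower() not in ("yes", "no"):
        raise ValueError(f"unexpected answer: {text!r}")
    return word.lower() == "yes", note.strip()


def parse_cell(cell: str) -> Cell:
    """Parse an 'AIT/NAS' value such as 'Yes some*/No'."""
    ait_text, nas_text = cell.split("/", 1)
    ait, ait_note = parse_side(ait_text)
    nas, _ = parse_side(nas_text)
    return Cell(ait, nas, ait_note)


for raw in ["Yes some*/No", "?/No", "Yes out/no"]:
    print(raw, "->", parse_cell(raw))
```

Keeping "not yet tested" as `None` rather than `False` avoids counting an untested feature as a failure when the results are tallied.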