user access patterns in web archives
User Access Patterns in Web Archives
Robot sessions outnumber human sessions 10:1 in the Internet Archive
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson
{yasmin, mweigle, mln}@cs.odu.edu
How do Users access Web Archives?
Although user patterns on the live web are well understood, there has been no corresponding study of how users, both humans and robots, access web archives.
Abstract Models for Accessing Web Archives
Methodology
Data Set: random sample of 2M requests from the Internet Archive’s Wayback Machine access logs of Feb. 2, 2012.
Robots vs Humans
User    Raw Requests         Filtered Requests   Sessions          MBs Transferred
Robots  1,002,573 (50.1%)    396,627 (93.0%)     34,203 (90.9%)    20,010
Humans  810,049 (40.5%)      29,690 (7.0%)       3,431 (9.1%)      4,459
Results
[Figure: bar charts of percentage by access pattern (Dip, Dive, Slide and Dive, Skim, Slide) for robots and humans, with a TimeMap/Memento breakdown.]
Robots and humans exhibit different access patterns.
Conclusion
• Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1 in terms of MB transferred.
• Robots mainly exhibit the Dip and Skim patterns, with about 49% of their sessions in each pattern, and they access TimeMaps almost exclusively.
• Humans mainly exhibit the Dip pattern (39% of their sessions) and the Dive pattern (30% of their sessions). Unlike robots, humans mainly access archived pages rather than TimeMaps.
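The robot-to-human ratios quoted in the first bullet can be checked against the counts in the Robots vs Humans table. The following is a minimal sketch of that arithmetic only; the dictionary keys and the function name are illustrative choices, not anything from the original study.

```python
# Counts copied from the "Robots vs Humans" table.
# Key names are illustrative assumptions, not the authors' terminology.
robots = {"raw_requests": 1_002_573, "sessions": 34_203, "mb_transferred": 20_010}
humans = {"raw_requests": 810_049, "sessions": 3_431, "mb_transferred": 4_459}


def robot_to_human_ratio(metric: str) -> float:
    """Return how many robot units there are per human unit for one metric."""
    return robots[metric] / humans[metric]


# Compute the ratio for every metric in the table.
ratios = {metric: robot_to_human_ratio(metric) for metric in robots}

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}:1")
```

With these counts the sessions ratio comes out just under 10:1 and the raw-request ratio close to 5:4, while the MB ratio lands nearer 4.5:1, so the 4:1 figure reads as a rounded value.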
References
1- Access Patterns for Robots and Humans in Web Archives. Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson. IEEE/ACM Joint Conference on Digital Libraries, 2013.