The Hiberlink Project is supported by the Andrew W. Mellon Foundation

Hiberlink – Towards Time Travel for the Scholarly Web July 25th 2013, Indianapolis, IN, USA

Hiberlink – Towards Time Travel for the Scholarly Web

Martin Klein, martinklein0815@gmail.com, @mart1nkle1n

Robert Sanderson, azaroth42@gmail.com, @azaroth42

Herbert Van de Sompel, hvdsomp@gmail.com, @hvdsomp

http://www.hiberlink.org/
http://www.mementoweb.org/

The Hiberlink Project is supported by the Andrew W. Mellon Foundation

Hiberlink Project and Partners

LANL

• Herbert Van de Sompel
• Rob Sanderson
• Martin Klein

U. Edinburgh

• Claire Grover
• Beatrix Alex
• Richard Tobin
• Adam Zhou

EDINA

• Peter Burnhill
• Christine Rees
• Muriel Mewissen
• Tim Strickland
• Neil Mayo

Two-year project funded by the Andrew W. Mellon Foundation

Problem Statement

Preservation of formal scholarly output is (relatively) well understood.

Preservation of the resources that make up the context for that research is not:

• Datasets
• Software
• Workflows
• Videos, slides
• Project and demonstration web sites
• AJAX
• …

Pilot Study

To what extent are web resources that are referenced from works in repositories still available at their original URL … or from archives of web resources?

Participants: LANL, UNT, arXiv

Paper: http://arxiv.org/abs/1105.3459

Contributions:

• Much larger scale than any previous study: 162,052 unique URLs
• Automatically searched multiple archives for all URLs, rather than manually searching a small subset
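The pilot study searched multiple archives automatically. Memento (RFC 7089) provides one way to query an archive that supports it, called datetime negotiation. The client sends the original URL to the archive's TimeGate with an Accept-Datetime header, and the TimeGate redirects to the archived copy (memento) closest to that datetime. The minimal sketch below only builds such a request with Python's standard library. The Internet Archive TimeGate prefix is an assumption, and the sketch is not presented as the pilot study's actual tooling.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: the Internet Archive's TimeGate takes the original URL appended to this prefix.
TIMEGATE_PREFIX = "http://web.archive.org/web/"


def build_timegate_request(uri_r: str, when: datetime) -> Request:
    """Build, but do not send, a Memento datetime-negotiation request for uri_r."""
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # Accept-Datetime carries an HTTP-date, which is always expressed in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(
        TIMEGATE_PREFIX + uri_r,
        headers={"Accept-Datetime": accept_datetime},
        method="HEAD",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it to the archive. That step is left out so the sketch runs offline.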

Pilot Study: Results

UNT

• 72% in archives and/or still exist
• High proportion of archived URLs, possibly due to academic level and general disciplines

arXiv

• 78% in archives and/or still exist
• 45% still exist, but are not archived! Possibly due to high-value but very discipline-specific references
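Each of these figures combines two observations per URL: whether the original URL still resolves, and whether at least one memento was found in an archive. The small sketch below shows that tally. The `UrlCheck` record and `summarize` function are hypothetical, and any input fed to them here is illustrative only, not the pilot data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlCheck:
    url: str
    live: bool      # the original URL still resolves
    archived: bool  # at least one memento was found in some archive


def summarize(checks):
    """Return the percentage of URLs falling into each live/archived category."""
    checks = list(checks)
    if not checks:
        return {}
    counts = Counter()
    for c in checks:
        if c.live and c.archived:
            counts["live and archived"] += 1
        elif c.live:
            counts["live, not archived"] += 1
        elif c.archived:
            counts["archived only"] += 1
        else:
            counts["lost"] += 1
    total = len(checks)
    summary = {category: 100.0 * n / total for category, n in counts.items()}
    # "In archives and/or still exist" covers every URL except the lost ones.
    summary["in archives and/or still exist"] = 100.0 * (total - counts["lost"]) / total
    return summary
```

Take four URLs that cover each live/archived combination once. They yield 25% per category and 75% "in archives and/or still exist".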

Hiberlink: Quantify Full Extent of the Problem

To what extent are web resources that are referenced from works in repositories still available at their original URL … or from archives of web resources?

Redo the same experiment with:

• An even larger dataset, with millions of papers and URLs
• Text mining processes for URL extraction
• Tracking of each URL's location (citation, footnote, body text, etc.)
• Evaluation of the extraction against a gold-standard dataset
• Determination of the type of resource referenced
• Tracking of the type of publication (journal, thesis, report, etc.)
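The plan calls for text-mining URL extraction that also records where each URL appears. A deliberately naive regular-expression pass shows the shape of that task. It is an illustrative assumption, not the project's extraction method. It will miss URLs split across lines or mangled by PDF conversion, which is one reason an evaluation against a gold standard matters. The location labels ("body", "footnote") are hypothetical.

```python
import re

# Deliberately naive: an http(s):// or www. prefix, then everything up to whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)
# Sentence punctuation and closing brackets that commonly trail a URL in running text.
TRAILING = ".,;:!?)]}"


def extract_urls(sections):
    """Return (url, location) pairs from (location, text) pairs, in order of appearance.

    The same URL is reported once per location, so a URL cited in both the body
    and a footnote appears twice.
    """
    seen = set()
    found = []
    for location, text in sections:
        for match in URL_PATTERN.finditer(text):
            url = match.group(0).rstrip(TRAILING)
            if (url, location) not in seen:
                seen.add((url, location))
                found.append((url, location))
    return found
```

Stripping trailing brackets is a trade-off. It cleans up URLs quoted in parentheses, but it truncates the rare URL that legitimately ends in ")".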

Hiberlink: Propose Solutions (1)

We propose two active archiving solutions for resources referenced from scholarly papers, to ensure that the scholarly record remains unbroken.

1. Active Crawling:

• Run extraction routines at repositories, publishers, or third parties, via text mining agreements or on open access publications
• Feed the resulting URL seed list to existing web crawlers, such as those of the Internet Archive (IA)
• The IA (and other archives) are already Memento compliant
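For active crawling, the extracted URLs become a crawler seed list. The minimal sketch below assumes the crawler accepts a plain-text file with one URL per line; the actual seed format of a given crawler should be checked in its documentation. It applies light normalization: lower-case scheme and host, drop the fragment, and default bare "www." addresses to http://. It then removes duplicates while keeping first-seen order. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Light normalization so trivially different spellings collapse to one seed."""
    if "://" not in url:
        url = "http://" + url  # assumption: bare "www." URLs are served over http
    parts = urlsplit(url)
    # Scheme and host are case-insensitive; the fragment is never sent to the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def build_seed_list(urls):
    """Return newline-separated, de-duplicated seeds in first-seen order."""
    seeds = dict.fromkeys(normalize(u) for u in urls)
    return "\n".join(seeds) + "\n" if seeds else ""
```

Normalization is kept conservative on purpose. Paths and query strings are left untouched, because rewriting them could point the crawler at a different resource than the one the paper cited.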

Hiberlink: Propose Solutions (2)

2. Transactional Archiving:

• A willing server forks its responses for resources, sending them both to the browser and to an archive for preservation
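Transactional archiving can be pictured as a tee in the server's response path. The sketch below expresses that idea as Python WSGI middleware (PEP 3333). The client receives the response unchanged, and once the body has been fully sent a copy goes to an archive callback. The `TransactionalArchiver` class and its `archive` callable are hypothetical. A production setup would more likely do this inside the web server and hand the copy off asynchronously. This illustrates the concept only and is not the project's implementation.

```python
from typing import Callable


class TransactionalArchiver:
    """WSGI middleware that forks each response: the client gets it unchanged,
    and a copy of the status, headers, and body goes to an archive sink."""

    def __init__(self, app, archive: Callable[[str, str, list, bytes], None]):
        self.app = app
        self.archive = archive  # hypothetical sink, e.g. a queue feeding a web archive

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            # Remember what the wrapped app sent, then pass it through untouched.
            captured["status"], captured["headers"] = status, list(headers)
            return start_response(status, headers, exc_info)

        result = self.app(environ, capturing_start_response)
        chunks = []
        try:
            for chunk in result:
                chunks.append(chunk)
                yield chunk  # the browser receives each chunk as before
        finally:
            if hasattr(result, "close"):
                result.close()
        # Reached only if the body was delivered completely, so truncated responses are not archived.
        self.archive(environ.get("PATH_INFO", "/"), captured["status"], captured["headers"], b"".join(chunks))
```

Archiving only after complete delivery is a design choice. If the client disconnects or the app fails mid-response, no partial copy ends up in the archive.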

Summary

The 2011 pilot study showed:

• A significant problem!
• Random archiving by web crawlers is not enough

The Hiberlink project will:

• Fully quantify the extent to which web resources that form the context of scholarly output are available and archived
• Propose active solutions to prevent the loss of further resources
• Use Memento for both research and access
