
Wellcome Library & JISC Web Archiving Project

Presented by Michael Day, UKOLN, University of Bath

[Author of the Web Archiving feasibility study]

Digital Preservation Coalition and Pilgrim Trust Digital Preservation Award, British Library, 6 April 2004

Overview of presentation

Discuss the background to the Wellcome & JISC Web archiving feasibility study.

Consider why we need to archive the Web, with special reference to the medical Web.

Highlight the results of the feasibility study.

Discuss the actions that have resulted from the study.

Draw some conclusions and look to the future of Web archiving in the UK.

Web Archiving Project: background

The Wellcome Library’s ICT Strategy 2001-2004 recommended that the Library should “build and maintain archives of ‘born digital’ collections”

“The evanescent nature of this material – here today gone tomorrow – means that capturing born digital materials should be a priority for the Wellcome Library.”

The Library’s mission is to “preserve the record of medicine past and present”. We can’t ignore the electronic archive just because it is more difficult!

The JISC digital strategy also recognised the importance of digital preservation, in particular:

the importance of electronic records and the value of JISC project websites;

the research value of academic materials that are available on the Web and are at risk.

Wellcome and JISC share an interest in Web archiving and initiated a joint study as a basis for developing their institutional strategies and future programmes.

Aims of the feasibility study - 1

To provide the Wellcome Library and the JISC with an analysis of existing Web archiving arrangements and to determine to what extent they address the needs of the UK research and further/higher education communities. Specifically:

clarify the need for creating Web archives;

establish whether the archiving initiatives already in place (e.g. the Internet Archive) are meeting the needs of our communities, since we would not wish to replicate this work.

Aims of the feasibility study - 2

To provide recommendations on how the Wellcome Library and the JISC could develop Web archiving initiatives to meet the needs of their constituent communities. Specifically:

identify the benefits of Web archiving for the defined user communities;

evaluate the issues of copyright, sensitive information, performance, and long-term viability in selected approaches;

determine which collecting approach, or combination of approaches, could best address the needs of the defined user communities;

consider the interests of potential partners, which may allow responsibilities, development risks and costs to be shared.

Archiving the medical Web: the need

Dear Librarian, Future generations will need to know about medicine today, not just for antiquarian reasons but often for the sake of pressing scientific, medical and epidemiological inquiry. Those researchers will largely be dependent on what you do now … Future researchers will bless us or blame us depending upon how wisely we act now.

(Roy Porter, Health Information and Libraries Journal, 2001, 18:137-138)

What are we in danger of losing?

The record of the history of medicine as recorded on the Web. Some specific examples:

WHO pages relating to the SARS outbreak. It is likely that future historians will want to see how this scare was first covered on the Web.

Cochrane Database of Systematic Reviews. This database of evidence-based reviews is the gold standard of medical information, but it is only published electronically. If it is not archived, how will we know what the “best treatments” were in, say, 2002? This will be a key source when reviewing future claims of medical negligence.

GP-UK discussion list. Future historians interested in how the medical profession responded to BSE or the MMR/autism debate will find this a rich source of information.

Results from the study: key recommendations

Successful Web archiving requires collaboration. It is recommended that the JISC and Wellcome should work together, and with other partners, to create a pilot Web archiving service.

Web archiving initiatives are required NOW to help preserve the informational, cultural and evidential value of the Web.

The medical Web has long-term documentary value for medical historians.

Examination of existing archives (e.g. the Internet Archive) shows that significant content and functionality are missing.

If Wellcome and JISC are to meet their strategic objectives they should …

A selective approach to Web archiving, with appropriate permissions secured, would be the best way to proceed.

Further research is required to help develop Web crawlers that can archive deep Web sites.

Validating the feasibility study

The study was endorsed by an International Advisory Board comprising representatives from the Library of Congress, the Internet Archive, the National Library of Medicine and the National Library of Australia.

Colin Webb, Director of Preservation at the NLA, applauded the recommendation to undertake a Web archiving project and hoped it would “act as a model to others”.

Both reports, the Feasibility study into Web archiving and the Legal study into Web archiving, can be accessed at:

http://library.wellcome.ac.uk/projects/archiving.shtml

Implementing the recommendations - 1

Collaboration: Wellcome/JISC led the development of a UK Web Archiving Consortium.

The Consortium was formally established in October 2003. Its members are the British Library, The National Archives, the National Library of Scotland, the National Library of Wales, the Wellcome Library and JISC.

The BL agreed to be the lead party in the Consortium.

All costs relating to the establishment of the infrastructure (hosting the service, providing technical support, etc.) will be shared equally by the partners.

Implementing the recommendations - 2

Start archiving: the Consortium has agreed to run a 2-year Web archiving pilot project.

A formal Invitation to Tender (ITT) was issued in January 2004. Following a pre-qualification questionnaire (PQQ) exercise, six organisations were invited to submit full tenders to host and manage this service. The contract will be awarded in April 2004, and the service should be live by June 2004.

A formal evaluation will take place to determine the effectiveness of Web archiving and to consider how other institutions can participate in and buy into the service.

Implementing the recommendations - 3

Adopt a selective approach: a selective approach (with archiving permissions secured from rights holders) has been agreed.

The Consortium is planning to use the PANDAS software (developed by the National Library of Australia) for harvesting sites, adding metadata, and making the archive publicly available.

The Consortium will work with the appointed “hosting” Contractor and the NLA to actively develop this software.

Implementing the recommendations - 4

Undertake research: during the 2-year pilot, specific research elements will be commissioned by the Consortium.

One such element is ensuring that the Web archive can interoperate with institutional library catalogues through the use of Z39.50 and/or OAI-PMH.

The Consortium will also keep a watching brief on the IIPC project and the development of the Heritrix harvester. <http://crawler.archive.org>

Conclusions

The feasibility study has led to demonstrable actions that promise to deliver a significant leap forward in UK Web archiving activity.

The two-year UK Web Archiving Project will develop a methodology around issues of selection, sustainability and technical functionality.

The pilot study will be evaluated and recommendations made on how to progress Web archiving in the UK and make it sustainable.

The aim is to develop a service that other institutions interested in Web archiving could buy into.

Though this project is focused exclusively on Web archiving, it is also helping Consortium members get the issue of digital preservation onto institutional agendas.

Thank you