2012.03.20 ihr farquhar v03

Digital Scholarship, The British Library @The Future of History

Adam Farquhar, Head of Digital Scholarship, The British Library

Outline

• The British Library’s Digital Scholarship Department

• Trends and requirements for research

• Projects to address the requirements

• Conclusion

A New Department: Digital Scholarship

• Develop clear strategies and operating models for the British Library’s role in/contribution to digital scholarship

• Develop innovative models for digital scholarship exploiting digital content and new technologies

• Develop a coherent strategy for digitisation

• Engage with new and existing user communities

• Strengthen the Library’s capabilities

Digital Scholarship Collection Areas - Maps

Digital Scholarship Collection Areas – Arts, Sound, Video, Music

Digital Scholarship - Digitisation

Digital Scholarship – International Dunhuang Project

Digital Scholarship – Digital Curator Team

Digital Scholarship

Definition

• The production, use and integration of digital content, services and tools to facilitate scholarship and research

• Allow research areas to be investigated in new ways, using new tools, leading to new discoveries and analysis to generate new understanding

Requirements

• Comprehensive digital collections

• The ability to apply the tools of scholarship to digital collections: annotation, citation, comparison

• Infrastructure to store, preserve, discover, access

• The ability to apply new tools for analysis, visualisation, and experimentation

• Collaboration through social networking tools, social bookmarking, wikis, sharing drafts with commentary

• Non-traditional forms of outreach to draw attention to research

Research trends

Trend

• More digital content

• More cross-disciplinary

• More collaborative

• More analysis

• More data-driven

• More repurposing of content

Requirement

• Mass and focused digitisation

• Improved discovery

• Interfaces for sharing and building services, annotation

• Visualisation tools

• Conversion to data and analysis tools

• Open licenses & APIs, documented formats

Creating thematic content - First World War

Europeana – SB Berlin

Centenary of the outbreak of the First World War

Will create a European corpus of digitised materials concerning the First World War in all its aspects

Will contribute to Europeana a substantial collection of more than 400,000 outstanding sources

User generated content

Roadshows in 10 countries to create unique pan-European archive

Preston event produced more than 2300 images from letters, diaries, medals, pictures, trench art, and more

Creating massive digitised collections through partnership

The British Newspaper Archive

A partnership between the British Library and brightsolid online publishing

Will digitise up to 40 million newspaper pages from the British Library's collection over 10 years

Collection includes runs of most newspapers published in the UK since 1800

Over 4m pages added since launch
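The figures above imply a simple average rate. The sketch below only restates the slide's arithmetic; the function names are illustrative, and treating "over 4m" as exactly 4 million is an assumption that makes the result a lower bound.

```python
# Figures taken from the slide; "up to 40 million" is an upper target.
TARGET_PAGES = 40_000_000
PROJECT_YEARS = 10


def average_pages_per_year(total_pages: int, years: int) -> float:
    """Average yearly throughput needed to reach the target."""
    return total_pages / years


def fraction_complete(pages_done: int, total_pages: int) -> float:
    """Share of the target already digitised."""
    return pages_done / total_pages


rate = average_pages_per_year(TARGET_PAGES, PROJECT_YEARS)
print(f"{rate:,.0f} pages per year on average")

# Assumption: "over 4m pages" is rounded down to 4,000,000 here.
done = fraction_complete(4_000_000, TARGET_PAGES)
print(f"at least {done:.0%} of the upper target")
```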

Google Books

A six-year project starting June 2011

250,000 books, 1700-1870

From the French Revolution to the end of slavery.

Material in major European languages

Focus on books that are not yet freely available in digital form online

Access via Google Books and BL

Storage at Google and BL

Contract and terms available on the web!

Making broadcast news more accessible through speech-to-text

• Broadcast News

Television and radio news programmes receivable in the UK, recorded by the British Library since May 2010

Currently record 37 hours per day from 15 TV channels and 2 radio channels such as Al-Jazeera English, CNN, France 24, Russia Today

Innovative search across subtitles (where available)

Launch in reading rooms May 2012

• Opening up speech archives

AHRC-funded project looking at speech-to-text technologies for opening up audio and video archives

Project will index 3,00 hours of TV news and 3,000 hours of radio content
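Search across subtitles or speech-to-text transcripts can be pictured as a word index over timed text segments. The following is a minimal sketch under stated assumptions, not the Library's system: the segment layout (programme id, start time in seconds, text), the function names, and the sample data are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Map each word to the positions of the segments that contain it.

    segments: list of (programme_id, start_seconds, text) tuples
    (a hypothetical layout chosen for this illustration).
    """
    index = defaultdict(set)
    for pos, (_, _, text) in enumerate(segments):
        for word in tokenize(text):
            index[word].add(pos)
    return index


def search(index, segments, query):
    """Return the segments that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [segments[i] for i in sorted(hits)]


# Hypothetical sample data, not real broadcast content.
segments = [
    ("prog-a", 12.0, "The election results are expected tonight"),
    ("prog-a", 45.5, "Weather: heavy rain expected in the north"),
    ("prog-b", 3.0, "Election coverage continues after the break"),
]
index = build_index(segments)

for programme, start, text in search(index, segments, "election"):
    print(programme, start, text)
```

Because each hit carries a start time, a result can point the reader at the matching moment in a recording rather than at the whole programme.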

IMPACT Historic Text

• Improve the digital accessibility of printed text produced before 1900

State-of-the-art OCR does not produce satisfactory results for old books, magazines and newspapers

Commercial OCR focuses on modern documents

Historic material has archaic fonts, complex layouts, warped or degraded pages

Manual post-correction is slow and expensive

Visualising Personal Digital Archives and Web Archives

Personal digital archives

• Data analysis beyond documents

• Use computer forensics techniques

• Capture, management, description, and preservation of personal digital collections to facilitate access and analysis

• Archives range from poets (W Cope) and playwrights (H Pinter) to computer scientists (D Michie) and biologists

Web archives

• Create a research collection of UK websites

• Develop high-impact data analytical access services

• Demonstrate the potential of domain level web archives, or the “haystacks”

• UK web domain > 9m .uk domain names

• Estimate 110TB/crawl

Making maps accessible through crowd-sourced geo-referencing

• Goal

Make maps easy to find, access, use

Crowd-sourcing map geo-referencing

Built on previous crowd-sourcing projects

Addressed key challenges – awareness, engagement, productivity

• Approach

Accessible and convenient application

Immediate results and feedback

Competitive tools

Recognition and visible contribution

• Results

725 maps assigned spatial metadata over 5 days

Publicity minimal – social media key

~90 participants

Top five completed half the work

Data quality good: <3% had errors >.005

Conclusion – Support for Historical Research

• Massive collections of digitised historical material

• Increased integration of images, sound, video

• Improved conversion to text

• Improved support for entity extraction

• Improved linkage across content silos

• Improved frameworks to bring analysis tools to data

• Improved tools for visualisation

• Improved frameworks for annotation and sharing

• Improved integration with research tools

Leveraging the power of digital

reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church

– Franco Moretti, Stanford
