2012.03.20 ihr farquhar v03
TRANSCRIPT
Digital Scholarship, The British Library @The Future of History
Adam Farquhar, Head of Digital Scholarship, The British Library
Outline
• The British Library’s Digital Scholarship Department
• Trends and requirements for research
• Projects to address the requirements
• Conclusion
A New Department: Digital Scholarship
• Develop clear strategies and operating models for the British Library’s role in, and contribution to, digital scholarship
• Develop innovative models for digital scholarship exploiting digital content and new technologies
• Develop a coherent strategy for digitisation
• Engage with new and existing user communities
• Strengthen the Library’s capabilities
Digital Scholarship Collection Areas - Maps
Digital Scholarship Collection Areas – Arts, Sound, Video, Music
Digital Scholarship - Digitisation
Digital Scholarship – International Dunhuang Project
Digital Scholarship – Digital Curator Team
Digital Scholarship
Definition
• The production, use and integration of digital content, services and tools to facilitate scholarship and research
• Allows research areas to be investigated in new ways and with new tools, leading to new discoveries and analyses that generate new understanding
Requirements
• Comprehensive digital collections
• The ability to apply the tools of scholarship to digital collections: annotation, citation, comparison
• Infrastructure to store, preserve, discover, access
• The ability to apply new tools for analysis, visualisation, and experimentation
• Collaboration through social networking tools, social bookmarking, wikis, sharing drafts with commentary
• Non-traditional forms of outreach to draw attention to research
Research trends
Trend → Requirement
• More digital content → Mass and focused digitisation
• More cross-disciplinary → Improved discovery
• More collaborative → Interfaces for sharing and building services, annotation
• More analysis → Visualisation tools
• More data-driven → Conversion to data and analysis tools
• More repurposing of content → Open licences & APIs, documented formats
Creating thematic content - First World War
Europeana – SB Berlin
Centenary of the outbreak of the First World War
Will create a European corpus of digitised materials concerning the First World War in all its aspects
Will contribute to Europeana a substantial collection of more than 400,000 outstanding sources
User-generated content
Roadshows in 10 countries to create a unique pan-European archive
Preston event produced more than 2,300 images from letters, diaries, medals, pictures, trench art, and more
Creating massive digitised collections through partnership
The British Newspaper Archive
A partnership between the British Library and brightsolid online publishing
Will digitise up to 40 million newspaper pages from the British Library's collection over 10 years
Collection includes runs of most newspapers published in the UK since 1800
Over 4 million pages added since launch
Google Books
A six-year project starting June 2011
250,000 books, 1700–1870
From the French Revolution to the end of slavery.
Material in major European languages
Focus on books that are not yet freely available in digital form online
Access via Google Books and BL
Storage at Google and BL
Contract and terms available on the web!
Making broadcast news more accessible through speech-to-text
• Broadcast News
Television and radio news programmes receivable in the UK, recorded by the British Library since May 2010
Currently records 37 hours per day from 15 TV channels and 2 radio channels, including Al-Jazeera English, CNN, France 24 and Russia Today
Innovative search across subtitles (where available)
Launch in reading rooms May 2012
• Opening up speech archives
AHRC-funded project looking at speech-to-text technologies for opening up audio and video archives
Project will index 3,00 hours of TV news and 3,000 hours of radio content
IMPACT Historic Text
• Improve the digital accessibility of printed text produced before 1900
State-of-the-art OCR does not produce satisfactory results for old books, magazines and newspapers
Commercial OCR focuses on modern documents
Historic materials have archaic fonts, complex layouts, and warped or degraded pages
Manual post-correction is slow and expensive
Visualising Personal Digital Archives and Web Archives
Personal digital archives
• Data analysis beyond documents
• Use computer forensics techniques
• Capture, management, description, and preservation of personal digital collections to facilitate access and analysis
• Archives range from poets (W Cope) and playwrights (H Pinter) to computer scientists (D Michie) and biologists
Web archives
• Create a research collection of UK websites
• Develop high-impact data analytical access services
• Demonstrate the potential of domain level web archives, or the “haystacks”
• UK web domain: more than 9 million .uk domain names
• Estimated 110 TB per crawl
Making maps accessible through crowd-sourced geo-referencing
• Goal: make maps easy to find, access and use
Crowd-sourcing map geo-referencing
Built on previous crowd-sourcing projects
Addressed key challenges – awareness, engagement, productivity
• Approach: accessible and convenient application
Immediate results and feedback
Competitive tools
Recognition and visible contribution
• Results: 725 maps assigned spatial metadata over 5 days
Publicity minimal – social media key
~90 participants
Top five completed half the work
Data quality good: fewer than 3% had errors greater than 0.005
Conclusion – Support for Historical Research
• Massive collections of digitised historical material
• Increased integration of images, sound, video
• Improved conversion to text
• Improved support for entity extraction
• Improved linkage across content silos
• Improved frameworks to bring analysis tools to data
• Improved tools for visualisation
• Improved frameworks for annotation and sharing
• Improved integration with research tools
Leveraging the power of digital
“Reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church”
– Franco Moretti, Stanford