
Palimpsest: an Edinburgh Literary Cityscape

Beatrice Alex

University of Edinburgh

British Library Labs Symposium 2014, London, November 3rd 2014


Palimpsest

AHRC (Big Data) project: 01/2014 - 03/2015

Literature, University of Edinburgh:
James Loxley, Professor of Early Modern Literature
Miranda Anderson, Research Fellow

Informatics, University of Edinburgh:
Jon Oberlander, Professor of Epistemics
Beatrice Alex, Research Fellow in Text Mining
Claire Grover, Senior Research Fellow

SACHI: St Andrews Human Computer Interaction Research:
Aaron Quigley, Director of SACHI & Chair of Human Computer Interaction
David Harris-Birtill, Research Fellow
Uta Hinrichs, Research Fellow

EDINA:
James Reid, Workgroup Leader, Geoservices
Nicola Osborne, Social Media Officer

Prototype


Frankenstein

I visited Edinburgh with languid eyes and mind; and yet that city might have interested the most unfortunate being. Clerval did not like it so well as Oxford; for the antiquity of the latter city was pleasing to him. But the beauty and regularity of the new town of Edinburgh, its romantic castle and its environs, the most delightful in the world, Arthur’s Seat, St. Bernard’s Well, and the Pentland Hills, compensated him for the change and filled him with cheerfulness and admiration.

Mary Shelley, Frankenstein


Edinburgh: Picturesque Notes

But it is not only pipers who have vanished, many a solid bulk of masonry has been likewise spirited into the air. Here, for example, is the shape of a heart let into the causeway. This was the site of the Tolbooth, the Heart of Midlothian, a place old in story and namefather to a noble book.

Robert Louis Stevenson, Edinburgh: Picturesque Notes


Trainspotting

These burds ur gaun oantay us aboot how fuckin beautiful Edinburgh is, and how lovely the fuckin castle is oan the hill ower the gairdins n aw that shite. That's aw they tourist cunts ken though, the castle n Princes Street, n the High Street. Like whin Monny's auntie came ower fae that wee village oan that Island oaf the west coast ay Ireland, wi aw her bairns. The wifey goes up tae the council fir a hoose. The council sais tae her, whair's it ye want tae fuckin stey, like? The woman sais, ah want a hoose in Princes Street lookin oantay the castle.…Perr cunt jist liked the look ay the street whin she came oaf the train, thoat the whole fuckin place wis like that. The cunts in the council jist laugh n stick the cunt n one ay they hoatline joabs in West Granton, thit nae cunt else wants. Instead ay a view ay the castle, she's goat a view ay the gasworks. That's how it fuckin works in real life, if ye urnae a rich cunt wi a big fuckin hoose n plenty poppy.

Irvine Welsh, Trainspotting


Datasets

HathiTrust collection (all worldwide public domain material)

British Library Nineteenth Century Books collection

English Project Gutenberg books

Oxford Text Archive data

National Library of Scotland data

ECCO/EEBO?

Limited set of copyrighted material, if author/publisher agrees (Irvine Welsh, Muriel Spark, Alexander McCall Smith ...)


Palimpsest Workflow


Workflow diagram (components and their labels):

DIGITISED DOCUMENTS: HathiTrust collection, British Library Nineteenth Century Books, National Library of Scotland collection, Oxford Text Archive, Project Gutenberg, ...

DOCUMENT RETRIEVAL & FILTERING: ranked lists of Edinburgh-specific candidates, for example:

24.189 The Journal of Sir Walter Scott (Scott, Walter)
22.079 Robert Louis Stevenson (Black, Margaret Moyes)
20.725 The Modern Scottish Minstrel, Volumes I-VI. (Various)
19.610 Spare Hours (Brown, John)
17.181 The Heart of Mid-Lothian (Scott, Walter)
15.369 The Works of Robert Louis Stevenson (Stevenson, Robert L.)
15.018 Rab and His Friends and Other Papers (Brown, John)
14.177 Greyfriars Bobby (Atkinson, Eleanor)
...

MANUAL CURATION: curation of Edinburgh-specific literature

EDINBURGH GAZETTEER: gazetteer of Edinburgh place names and their latitude/longitude pairs or shape files derived from several sources

TEXT MINING: fine-grained location extraction and geo-referencing using the Edinburgh Geoparser

RELATIONAL DATABASE: geo-referenced locations, snippets, metadata

USER INTERFACES


Big data IN

Small data OUT

Geo-specific Tasks

Retrieve literary works which are at least partly set in Edinburgh from all literature accessible to us.

Devise a method for identifying “loco-specificity” in literature automatically based on input from literary scholars.

Create a fine-grained location gazetteer for Edinburgh.

Identify and geo-reference locations (including street names and buildings) using the Edinburgh Geoparser.


But first …

All input documents must first be:

Converted to a common format.

Identified as written English text.

Post-corrected automatically, if necessary.

Once curated, linguistically pre-processed.
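The slides do not say how written English text was identified. As a minimal sketch of one possible filter, the snippet below flags a document as English when enough of its tokens are very common English function words. The word list, the threshold and both function names are assumptions made for illustration; a production system would more likely use a trained language identifier.

```python
import re

# Small hand-picked set of frequent English function words (an assumption
# for this sketch, not a list taken from the project).
ENGLISH_STOPWORDS = frozenset(
    "the of and to a in that is was it for with as his he be on not at by".split()
)


def english_stopword_ratio(text: str) -> float:
    """Return the fraction of tokens that are common English function words."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in ENGLISH_STOPWORDS)
    return hits / len(tokens)


def looks_english(text: str, threshold: float = 0.2) -> bool:
    """Heuristic filter: keep text whose stopword ratio meets the threshold."""
    return english_stopword_ratio(text) >= threshold
```

A heuristic like this is cheap enough to run over a whole collection, but the threshold would need tuning on OCR output and on dialect writing, where standard function words are rarer.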


Document Retrieval

Goal: Find all Edinburgh loco-specific items which fit our remit (fiction, autobiography, travel writing, memoirs, ...).

Index the collections and rank documents according to the query and their metadata.

Initial experiments on HathiTrust data (239,481 documents = all books, serials, journals, biographies).

Ranked outputs are checked by literary scholars, whose feedback is used to improve the retrieval component.

The improved methods are then applied to all other collections.
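The slides do not give the ranking function itself. The following is a rough sketch of what query- and metadata-dependent ranking could look like: each document gets a log-damped count for every query term found in its text, and the total is boosted when a query term also appears in its metadata. The `Document` fields, the boost factor and the toy collection are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    subject: str  # hypothetical metadata field
    text: str


def score(doc: Document, query_terms: list[str], metadata_boost: float = 2.0) -> float:
    """Sum log-damped counts of each query term; boost if metadata matches."""
    text = doc.text.lower()
    metadata = f"{doc.title} {doc.subject}".lower()
    total = 0.0
    for term in query_terms:
        # Substring count, so "Edinburgh's" also counts as a hit for "Edinburgh".
        count = text.count(term.lower())
        if count:
            total += 1.0 + math.log(count)
    if any(term.lower() in metadata for term in query_terms):
        total *= metadata_boost
    return total


def rank(docs: list[Document], query_terms: list[str]) -> list[tuple[float, str]]:
    """Return (score, title) pairs for matching documents, best first."""
    scored = [(score(doc, query_terms), doc.title) for doc in docs]
    matching = [pair for pair in scored if pair[0] > 0]
    return sorted(matching, key=lambda pair: pair[0], reverse=True)


# Toy collection invented for illustration (not project data).
docs = [
    Document("London Walks", "travel", "A short trip to Edinburgh."),
    Document("A Walk Through Edinburgh", "travel",
             "Edinburgh castle and Edinburgh streets near Arthur's Seat."),
    Document("Paris Sketches", "travel", "Nothing relevant here."),
]
ranking = rank(docs, ["Edinburgh", "Arthur's Seat"])
```

In this toy run the document that mentions both places and names Edinburgh in its title comes first, and the document with no matches is dropped. Feedback from curators could then be used to adjust the query terms or the boost.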


Ranked Documents

255.23 Cassell's Old and New Edinburgh, etc GRANT, James (1880)

130.41 Picturesque Edinburgh LOCKIE, Katharine F. (1899)

98.02 Memorials of Edinburgh in the olden time. 2nd ed. WILSON, Daniel (1891)

95.33 Memorials of Edinburgh in the olden time WILSON, Daniel (1848)

90.13 Home Country of R. L. Stevenson ... GEDDIE, John (1898)

89.75 Water of Leith, Source to Sea…. GEDDIE, John (1896)

27.18 Brought to Bay; Experiences of a City Detective MACGOVAN, James (1878)


Assisted Curation


Discovery


Gazetteer Creation

Our text mining tools use the Edinburgh Geoparser to mark-up place names and resolve them to coordinates with a choice of gazetteer as the reference source (GeoNames, OS, ...).
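To illustrate the idea of marking up place names and resolving them against a gazetteer, here is a minimal sketch using plain dictionary lookup. It is not the Edinburgh Geoparser and does not use its API; the function name and the tiny gazetteer are assumptions, with coordinates copied from the aggregated gazetteer examples in this talk.

```python
import re

# Tiny illustrative gazetteer: place name -> (latitude, longitude).
GAZETTEER = {
    "Adam Smith Statue": (55.9497628, -3.1900024),
    "Oxford Bar": (55.9529618, -3.2047389),
}


def find_places(text: str, gazetteer: dict[str, tuple[float, float]]):
    """Return (name, start_offset, (lat, lon)) for each gazetteer name in text."""
    if not gazetteer:
        return []
    # Try longer names first so a long name wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (match.group(0), match.start(), gazetteer[match.group(0)])
        for match in pattern.finditer(text)
    ]


sentence = "He drank at the Oxford Bar, then walked past the Adam Smith Statue."
mentions = find_places(sentence, GAZETTEER)
```

Matching here is exact and case-sensitive. A real geoparser also has to recognise names that are not in the gazetteer and choose between several candidate locations for the same name, which this sketch does not attempt.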

We need to create a local gazetteer by aggregating information from multiple sources:

OpenStreetMap

OSLocator (Ordnance Survey roads)

RCAHMS and Historic Scotland (listed buildings, parks, monuments)


Multiple Sources and Formats


Aggregated Gazetteer

Good examples:

<place name="Adam Bothwell's House" lat="55.949947" long="-3.19158" source="rcahms"/>

<place name="Adam Smith Statue" lat="55.9497628" long="-3.1900024" source="osm"/>

<place name="Oxford Bar" lat="55.9529618" long="-3.2047389" source="osm"/>

<place name="Oxford Bar" lat="55.952983" long="-3.204677" source="rcahms"/>

Bad examples:

<place name="Spring" lat="55.9097916" long="-3.2180268" source="osm"/>

<place name="Aerated Water Factory" lat="55.942886" long="-3.04704" source="rcahms"/>

<place name="Amaryllis" lat="55.9406106" long="-3.2079740" source="osm"/>
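Entries like the two "Oxford Bar" records come from different sources but describe the same place. As a sketch of one way such records might be aggregated, the code below parses `place` elements, skips names on a stoplist, and merges same-name entries that lie within a few metres of each other. The `<gazetteer>` wrapper element, the stoplist and the 25 m threshold are assumptions; the slides do not describe the project's actual merging or filtering rules.

```python
import math
import xml.etree.ElementTree as ET

# Sample records from the slide, wrapped in a hypothetical <gazetteer> root.
XML = """<gazetteer>
<place name="Adam Bothwell's House" lat="55.949947" long="-3.19158" source="rcahms"/>
<place name="Oxford Bar" lat="55.9529618" long="-3.2047389" source="osm"/>
<place name="Oxford Bar" lat="55.952983" long="-3.204677" source="rcahms"/>
<place name="Spring" lat="55.9097916" long="-3.2180268" source="osm"/>
</gazetteer>"""


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def aggregate(xml_text: str, stoplist=frozenset(), max_distance_m: float = 25.0):
    """Merge same-name places closer than max_distance_m; skip stoplisted names."""
    merged = []
    for element in ET.fromstring(xml_text).iter("place"):
        name = element.get("name")
        if name in stoplist:
            continue
        lat, lon = float(element.get("lat")), float(element.get("long"))
        for entry in merged:
            close = haversine_m(lat, lon, entry["lat"], entry["lon"]) <= max_distance_m
            if entry["name"] == name and close:
                entry["sources"].append(element.get("source"))
                break
        else:
            # No nearby entry with this name yet, so start a new one.
            merged.append({"name": name, "lat": lat, "lon": lon,
                           "sources": [element.get("source")]})
    return merged


entries = aggregate(XML, stoplist={"Spring"})
```

Each merged entry keeps the coordinates of the first record seen and lists every contributing source. Names such as "Spring" are kept out only because they are on the hand-made stoplist, which stands in for whatever curation step the project actually used.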


Geo-Referencing


Mobile Interface


Screen labels: "start walking", "select mode", "literature-on-the-go"

… under development at SACHI.


Summary

We are 10 months into the Palimpsest project.

The final outputs will be web-based visualisations and a mobile app. The aim is to create interfaces for literary scholars and the general public.

Big data to small data.

Assisted curation of literary works set in Edinburgh ensures that the final data content is of high precision and recall.

A good example of digital humanities and interdisciplinary collaboration at work.

The fine-grained Edinburgh gazetteer and the Geoparser can be used for future research.


LTG

Ongoing projects of the Edinburgh Language Technology Group:

Palimpsest (Mining Literary Edinburgh)

UK Connectivity (Analysis of social media for the British Council)

BotaniTours (Information aggregation and presentation of botanical points of interest in the Scottish Borders)

Trading Consequences (Text mining trends in commodity trading in large 19th-century text collections)

New: Text mining brain scan reports for clinical neurologists.


Thank You


LTG: www.ltg.ed.ac.uk

Palimpsest: palimpsest.blogs.edina.ac.uk

Twitter: @LitPalimpsest

Contact:

Beatrice Alex | balex@inf.ed.ac.uk
