The Listening Experience Database
Alessandro Adamou, Knowledge Media Institute, The Open University
@anticitizen79
Background
Research gap between leading strands of analysis of the musical experience (cognitive, commercial, critical) [1,2], widened during the Web of Data age.
Primary sources assumed to exist in significant quantities but:
• unstructured (or worse, not digitised) and/or
• unpublished and/or
• domain-biased (popular interest, phonographic era, social media)
LED consortium formed end 2012 to collate primary evidence of listening.
• £0.75m AHRC grant (2013-15)
• £0.98m AHRC grant (2016-19)
[1] S. Burstyn. In quest of the period ear. Early Music, XXV(4):692-701, November 1997.
[2] R. C. Wegman. Music as heard: Listeners and listening in Late-Medieval and Early Modern Europe (1300-1600). Musical Quarterly, 82(3-4):432-433, 1998.
Crowdsourcing in databases
Obtaining data by soliciting contributions from a community.
Examples:
• Historic Cambridge Newspaper Collection
• Zooniverse (SETIlive, Old Weather etc.)
• Wikimedia Foundation projects
• setlist.fm, discogs.com
• UK Reading Experience Database
“Must a modern database really start up empty today?”
Inclusion protocols
• From any historical period and culture
– current oldest entry is 11th Century AD
• Involving any musical genre
• Must be documented with a referenceable source
– also unpublished, if obtained from an archival resource
– e.g. diaries, private correspondence, oral history, official papers, (auto)biographies, social media
• No solicited criticism or fictional accounts
• No minimum standard for the level of detail in describing the entities involved
• In English (primarily or officially translated)
LED-in-a-slide
http://open.ac.uk/Arts/LED
8227 individual listening experiences / ~10k submissions
Evidence from published sources as well as manuscripts
Supervised crowdsourcing by experts and enthusiasts
Implemented using Linked Data, cross-domain data reuse
Faceted browsing, search, geographical browsing
~ 400,000 RDF triples
Shape of LED data - by region
Shape of LED data - by genre
“Art music” groups Classical (incl. contemporary Classical), Baroque, Romantic, Chamber music
Shape of LED data - by period
Native Linked Data implementation
• All generated data are entirely stored as RDF triples
– Browsing, searching etc. directly on the quad store
• Multi-tenancy and crowdsourcing model with named graphs
• Modular ontology using Bibo, DC, Music Ontology, FOAF, Schema.org (and a little in-house)
• Data reuse and reconciliation with external sources is integrated with the whole lifecycle
• Flexibility of SPARQL query interface: not constrained by the facets offered by the Web portal.
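The named-graph model above can be illustrated with a minimal in-memory sketch. This is not LED's actual store, which is an RDF quad store queried with SPARQL; the class, the graph names (graph:user/alice, graph:public) and the identifiers are hypothetical, chosen only to show how a fourth "graph" component can keep each contributor's triples separate until they are reviewed.

```python
class QuadStore:
    """Toy quad store: each statement is (subject, predicate, object, graph)."""

    def __init__(self):
        self._quads = set()

    def add(self, s, p, o, graph):
        # The graph name records which tenant or contributor owns the triple.
        self._quads.add((s, p, o, graph))

    def match(self, s=None, p=None, o=None, graph=None):
        """Return matching quads as a sorted list; None acts as a wildcard."""
        pattern = (s, p, o, graph)
        return sorted(
            quad for quad in self._quads
            if all(want is None or want == got
                   for want, got in zip(pattern, quad))
        )

    def promote(self, from_graph, to_graph):
        """Copy every triple of one graph into another, e.g. after review."""
        for s, p, o, _ in self.match(graph=from_graph):
            self.add(s, p, o, to_graph)


# Hypothetical usage: one submission per contributor graph.
store = QuadStore()
store.add("led:exp/1", "dc:title", "A concert", "graph:user/alice")
store.add("led:exp/2", "dc:title", "A recital", "graph:user/bob")
store.promote("graph:user/alice", "graph:public")
```

Restricting a lookup to one graph (store.match(graph="graph:public")) plays the role that a GRAPH clause plays in a SPARQL query, while leaving the graph unset searches across all contributions.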
External Linked datasets
● British National Bibliography http://bnb.data.bl.uk
○ Published works in the UK
● DBpedia http://dbpedia.org
○ Geographical data, musical works, published works worldwide
● LinkedBrainz http://linkedbrainz.org
○ Discontinued, but reengineering code has been made public
● data.gov.uk http://reference.data.gov.uk
○ Exact time instants and intervals in the British calendar
● VIAF http://viaf.org
○ for bridging alignments between BNB and DBpedia (mainly)
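To make the bridging role of VIAF concrete, here is a minimal sketch that groups identifiers into identity clusters with a union-find structure. It is an assumption about how such bridging could be computed, not the project's reconciliation code; the VIAF identifier viaf:EXAMPLE-1 is a made-up placeholder, and the function names are hypothetical.

```python
def cluster_identities(links):
    """Group identifiers connected by 'same entity' links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


def same_entity(clusters, a, b):
    """True if both identifiers ended up in the same cluster."""
    return any(a in cluster and b in cluster for cluster in clusters)


# Hypothetical links: neither dataset points at the other directly,
# but both point at the same (placeholder) VIAF record.
links = [
    ("dbpedia:Jane_Austen", "viaf:EXAMPLE-1"),
    ("bnb:AustenJane1775-1817", "viaf:EXAMPLE-1"),
]
clusters = cluster_identities(links)
```

Because both identifiers share the VIAF record, same_entity(clusters, "dbpedia:Jane_Austen", "bnb:AustenJane1775-1817") holds even though no direct link between them was supplied.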
Dealing with vagueness
• Under-represented or vague spatial data
– e.g. “at home in Haymarket”, “church in Italy”, “a trip from Vienna to London”
• Not fully qualified temporal instants or intervals
– e.g. “April 3, the 1820s”, “late 18th Century”, “Summer 1938, at night”, “sometime between [...]”
• Entities being described but not named
– e.g. “British soldiers”, “Anglican Mass”, “mourners of Felix Mendelssohn”, “Mrs. Britten”
• Unaligned semantics
– e.g. “Chords”, “Electric Guitar”, “Gibson Les Paul Sunburst”
– e.g. “King of England”, “Queen”, “Monarch”
Not the primary application of Linked Data, but the paradigm and founding semantics can be adapted (to an extent).
Spatial data extraction
Pipeline: Free text input → Stanbol + OpenNLP → Curated input → RDF + GeoSPARQL
Free text input: Apollo Theatre, gallery
Candidates from Stanbol + OpenNLP for “Apollo Theatre, gallery”:
• dbp:Apollo_Theater (Manhattan)
• dbp:Apollo_Theatre (City of Westminster)
RDF + GeoSPARQL output:
led:place/12345 sg:sfIntersects dbp:Apollo_Theater ; rdfs:label “Apollo ...gallery”@en
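The last step of this pipeline, turning a curated choice into a statement, can be sketched as follows. This is an illustrative assumption rather than the LED code: the function names are hypothetical, the prefixes (led:, sg:, dbp:, rdfs:) are copied from the slide without expansion, and the output mirrors the slide's abbreviated notation rather than strictly valid Turtle.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for use inside a quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def curated_place_statement(place_id, free_text, chosen_resource):
    """Link a vague LED place to the resource the curator selected.

    The free text is kept as the label, so detail such as "gallery"
    is not lost when the place is tied to a known location.
    """
    label = escape_literal(free_text)
    return (
        f"led:place/{place_id} sg:sfIntersects {chosen_resource} ;\n"
        f'    rdfs:label "{label}"@en .'
    )


# Candidates as they might come back from entity extraction (from the slide).
candidates = {
    "dbp:Apollo_Theater": "Manhattan",
    "dbp:Apollo_Theatre": "City of Westminster",
}

# The curator picks one candidate; here the one used on the slide.
statement = curated_place_statement(
    12345, "Apollo Theatre, gallery", "dbp:Apollo_Theater"
)
```

Using an "intersects" relation instead of an identity link reflects that the described place (a gallery seat) is only part of, or overlaps with, the named location.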
Fuzziness in temporal data
Extended Date/Time Format (standard draft, Library of Congress, 2012)
• Allows formalisation of underspecified points in time and intervals
– “187u-22-uu” means “sometime in Summer in the 1870s”
• We extended it to support subjective intervals (e.g. early/mid/late, also for daytimes) and ranges (from-to)
• Made available in RDF for others to reuse, through data.open.ac.uk (currently only materialised data)
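A minimal sketch of how a string such as “187u-22-uu” can be read. It assumes the draft's conventions that "u" marks an unspecified digit and that codes 21 to 24 in the month position stand for spring, summer, autumn and winter. It covers only this simple year-season/month-day shape, not the full draft and not the LED extensions for early/mid/late or from-to ranges; the function names are hypothetical.

```python
import re

# Season codes in the month position, as assumed from the EDTF draft.
SEASONS = {"21": "spring", "22": "summer", "23": "autumn", "24": "winter"}


def year_bounds(year_pattern):
    """Expand a 4-character year with 'u' placeholders to (earliest, latest)."""
    if not re.fullmatch(r"[0-9u]{4}", year_pattern):
        raise ValueError(f"unsupported year pattern: {year_pattern!r}")
    earliest = int(year_pattern.replace("u", "0"))
    latest = int(year_pattern.replace("u", "9"))
    return earliest, latest


def describe(edtf):
    """Summarise an underspecified date as year bounds plus known parts."""
    parts = edtf.split("-")
    earliest, latest = year_bounds(parts[0])
    info = {"earliest_year": earliest, "latest_year": latest,
            "season": None, "month": None, "day": None}
    if len(parts) > 1:
        if parts[1] in SEASONS:
            info["season"] = SEASONS[parts[1]]
        elif parts[1].isdigit():
            info["month"] = int(parts[1])
    if len(parts) > 2 and parts[2].isdigit():
        info["day"] = int(parts[2])
    return info


summer_1870s = describe("187u-22-uu")
```

Here summer_1870s carries the year bounds 1870 and 1879 and the season "summer", while month and day stay None because they are unspecified.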
LED contributions to the LD cloud
Royal Carl Rosa Company – “Faust”
for orchestra and voice
date: 14 May, 1917
location: Garrick Theatre (indoors, private space)
Novel data: historical music performances
Novel data: portions and quotes of document sources / manuscripts (not modelled in BNB)
Journeying boy : the diaries of the young Benjamin Britten 1928-1938
Diary entries:
• Page 17, Feb 14 1929: “Still absent from school work. Everso much more […]”
• Page 67, March 18 1931: “Go with Mummy to B.B.C – Beethoven concert […]”
• Page 70, April 22 1931: “Go to John Nicholson’s to tea at 2.45. & to hear Gramophone records on his new Radio-Gram Hear. Brahms. Pft. Concerto Mov. 1. (Rubenstein) Tchaik.”
• …
LED contributions to the LD cloud
Refined data: biographical enhancements
Refined data: semantic alignments between DBpedia, BNB and MusicBrainz
dbpedia:Aaron_Copland ≡ mb:aad3af83-5b59-4b86-a569-1a8409149b09#_
dbpedia:Jane_Austen ≡ bnb:AustenJane1775-1817
Mary Somerville
Full name: Mary Fairfax Greig Somerville
Social group: Rulers, chiefs, aristocracy & gentry etc.
Occupation: Scientist
Religion: Christian, Protestant
Wrote: Memoir of Mary Somerville (1817, 1840s, 1849, 1850…)
Figures on reuse
Type                            Unique instances   Total reuse   Peak
People                          8186               31869         1479
Written works                   425                7474          431
Geographical locations          1410               8470          1061
Musical works (songs, albums)   6790               4241          46
Musical genres                  343                7104          1195
Computed on 8227 distinct listening experiences
Source        Reused distinct instances
DBpedia       2596
BNB           553
data.gov.uk   3278
MusicBrainz   1203
Instances reused from external data sources
Ongoing work
• Text mining of listening evidence (e.g. most commonly used terms for describing listening for specific periods or genres).
• Analytics on structured data (community detection/clustering)
• Detection of listening experiences through Web crawling or hooking into the user experience
• Controlled vocabularies (e.g. HISCO for historical occupations)
• Linked Data Fragments for facilitating reuse (under investigation)
Further Reading
about LED:
Brown, S., Barlow, H., Adamou, A. and d'Aquin, M. (2015). The Listening Experience Database Project: Collating the Responses of the "Ordinary Listener" to Prompt New Insights into Musical Experience, The International Journal of the Humanities: Annual Review, 13, p. 17-32, CGPublisher
Brown, S., Adamou, A., Barlow, H. and d'Aquin, M. (2014). Building listening experience Linked Data through crowd-sourcing and reuse of library data, Proceedings of the 1st International Workshop on Digital Libraries for Musicology, p. 1-8, ACM
related:
Hyvönen, E. (2012). Publishing and Using Cultural Heritage Linked Data on the Semantic Web, Morgan & Claypool
Lewis, D. and Martin, T. (2015). Managing Vagueness with Fuzzy in Hierarchical Big Data. 2015 INNS Conference on Big Data, Vol. 53, p. 19-28, Elsevier
Thank you - Q&A time