lifting the lid on linked data
Presentation at ELAG 2011, European Library Automation Group Conference, Prague, Czech Republic. 25th May 2011
http://elag2011.techlib.cz/en/815-lifting-the-lid-on-linked-data/
Linked Data and the LOCAH project
Jane Stevenson & Adrian Stevenson
Linked Data on the Hub & Copac
Linked Open Copac and Archives Hub: Locah
JISC funded project
August 2010 – July 2011
Mimas, UKOLN, Eduserv
The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today.
It is a space where people and organizations can post and consume data about anything.
Bizer/Cyganiak/Heath Linked Data Tutorial, linkeddata.org
Core questions
Is it achievable?
Will it bring substantial benefits?
“It is the unexpected re-use of information which is the value added by the web”
What is Linked Data?
4 ‘rules’ for the web of data:
Use URIs as names for things
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
Use URIs as Names
We can make statements about things and establish relationships by assigning identifiers to them.
Uniform Resource Identifiers (URIs) are identifiers for entities (people, places, subjects, records, institutions).
They identify resources, and ideally allow you to access representations of those resources.
author = http://archiveshub.ac.uk/janefoaf.rdf
book = http://dbpedia.org/resource/manchester
subject = English = http://lexvo.org/id/iso639-3/eng
Entities and Relationships
Bibliographic Resource
Library
ProvidesAccessTo
Subject: Bibliographic Resource
Predicate: AccessProvidedBy
Object: Library
Subject > Predicate > Object
AccessProvidedBy
Triple statement
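A triple statement can be sketched in code as a three-part record. The Python below is illustrative only: the example.org URIs are hypothetical placeholders, not identifiers used by the project.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject > predicate > object statement."""
    subject: str
    predicate: str
    object: str


# Hypothetical URIs, used only to show the shape of a statement.
resource = "http://example.org/id/bibliographicresource/1"
library = "http://example.org/id/library/1"

statement = Triple(resource, "http://example.org/def/accessProvidedBy", library)

print(" > ".join(statement))
```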
An RDF Graph
Nodes: Bibliographic Resource, Library, Bibliographic Record, MODS document, Title
Relationships: describedBy, heldAt, encodedAs, has
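A graph of this kind is just a collection of triples. The sketch below holds one as a Python list and walks it. The slide gives only the node and relationship labels, so which nodes each relationship connects is an assumption made for illustration.

```python
from collections import defaultdict

# Edge directions are an assumption; the slide only lists the labels.
triples = [
    ("Bibliographic Resource", "describedBy", "Bibliographic Record"),
    ("Bibliographic Resource", "heldAt", "Library"),
    ("Bibliographic Resource", "has", "Title"),
    ("Bibliographic Record", "encodedAs", "MODS document"),
]

# Index outgoing edges by subject so the graph can be walked.
outgoing = defaultdict(list)
for s, p, o in triples:
    outgoing[s].append((p, o))


def reachable(start):
    """Return every node that can be reached from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in outgoing.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


print(sorted(reachable("Bibliographic Resource")))
```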
So...?
If something is identified, it can be linked to
We can then take items from one dataset and link them to items from other datasets
BBC
VIAF
DBPedia
Archives Hub
Copac
GeoNames
BBC:Cranford
VIAF:Gaskell
DBPedia: Gaskell
Hub:Gaskell
Copac:Cranford
Geonames:Manchester
DBPedia: Dickens
Hub:Dickens
The Linking benefits of Linked Data
The Web of ‘Documents’
Global information space (for humans)
Document paradigm
Hyperlinks
Search engines index and infer relevance
Implicit relationships between documents
Lack of semantics
The Web of Linked Data
Global data space (for humans and machines)
Making connections between entities across domains (people, books, films, music, genes, medicines, health, statistics...)
LD is not about searching for specific documents or visiting particular websites; it is about things - identifying and connecting them.
Copac model
Groundwork done with Archives Hub. Then had to decide what we wanted to say about the data
Challenges over what a ‘record’ is – ‘Bleak House’ from each contributor? or one merged record?
In many ways simpler than archival data; but also can decide to create a simpler model
Copac Model (as at November 2010)
Copac specification
Model = entities and relationships
Specification = means to specify these more exactly – programmer can create transform script
Iterative process – model – spec – RDF output
Cardinality (min max)  Property            URI/literal
1 1                    dct:title           literal
0 1                    dct:extent          literal
0 m                    bibo:isbn           literal
0 m                    bibo:issn           literal
0 m                    bibo:note           literal
0 m                    dct:alternative     literal
0 m                    copac:uniformtitle  literal
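A specification like the table above can be checked mechanically. The following is a minimal sketch, not the project's transform code: it reads each row as (minimum, maximum, property), treats "m" as unbounded, and reports any property whose number of values falls outside that range. The sample record is hypothetical.

```python
import math

# (min, max, property) rows from the table above; "m" (many) becomes infinity.
SPEC = [
    (1, 1, "dct:title"),
    (0, 1, "dct:extent"),
    (0, math.inf, "bibo:isbn"),
    (0, math.inf, "bibo:issn"),
    (0, math.inf, "bibo:note"),
    (0, math.inf, "dct:alternative"),
    (0, math.inf, "copac:uniformtitle"),
]


def cardinality_errors(resource):
    """Return a message for each property whose value count breaks the spec.

    resource maps a property name to a list of literal values.
    """
    errors = []
    for minimum, maximum, prop in SPEC:
        count = len(resource.get(prop, []))
        if count < minimum or count > maximum:
            errors.append(f"{prop}: found {count}, expected {minimum}..{maximum}")
    return errors


# Hypothetical record, for illustration only.
record = {"dct:title": ["Bleak House"], "bibo:isbn": ["0000000000", "1111111111"]}
print(cardinality_errors(record))

# A record with no title and two extents breaks two rows of the spec.
print(cardinality_errors({"dct:extent": ["a", "b"]}))
```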
Node name              MODS field        Ontology
BibliographicResource  <modscollection>  bibo
Cardinality (min max)  Property               URI/literal                                    Ontology
0 1                    copac:creator          Creator URI                                    dc
0 m                    copac:contributor      Contributor URI                                copac
0 1                    event:producedIn       Production Date URI                            event
0 1                    dct:issued             Production Date URI                            dc
0 m                    pode:publicationPlace  Place URI                                      pode
0 m                    isbd:P1016             Place URI                                      isbd
0 m                    dct:publisher          Publisher URI                                  dc
0 1                    dct:isPartOf           Series URI                                     dc
1 m                    copac:HeldBy           Institution URI (with Institution as subject)
1 1                    bibo:type              Type URI                                       bibo
0 m                    dct:subject            Subject URI                                    dc
0 m                    skos:subject           subject URI                                    skos
0 m                    dct:language           Language URI                                   dc
1 1                    hub:encodedAs          mods URI                                       hub
Cardinality (min max)  Property        URI/literal                 URI
1 1                    rdf:type        URIs                        http://purl.org/dc/terms/Agent, http://xmlns.com/foaf/0.1/Agent
1 1                    rdfs:label      literal                     {namePart}
1 1                    skos:prefLabel  literal                     {namePart}
1 1                    isCreatorOf     Bibliographic Resource URI  root/id/bibliographicresource/{recordIdentifer}
Node name  MODS field                                                       URI namespace  URI pattern
Creator    <name> <namePart></namePart> where <roleTerm>creator</roleTerm>  copac          root/id/agent/{BibID}{namePart}
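Filling a URI pattern such as root/id/agent/{BibID}{namePart} can be sketched as below. This is an illustration under stated assumptions: the base URI standing in for "root" is hypothetical, and the rule for turning a name into a URI-safe string (keep letters and digits, upper-case them) is a guess, not the project's documented behaviour.

```python
import re

# "root" in the pattern is a placeholder; this base URI is hypothetical.
ROOT = "http://example.org"
AGENT_PATTERN = "{root}/id/agent/{bib_id}{name_part}"


def normalise(name_part):
    """Reduce a name to upper-case letters and digits (an assumed rule)."""
    return re.sub(r"[^A-Za-z0-9]", "", name_part).upper()


def agent_uri(bib_id, name_part):
    """Fill the agent URI pattern for one creator name in one record."""
    return AGENT_PATTERN.format(
        root=ROOT, bib_id=bib_id, name_part=normalise(name_part)
    )


print(agent_uri("6957115", "Knapp, F."))
```

Because the record identifier is part of the pattern, the same person named in two records receives two different URIs unless a later matching step links them.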
Aggregated Data
Copac MODS record = an aggregated book record
e.g. ‘Bleak House’ held at 10 different libraries
Copac ‘merges’ the descriptions from 8 of them
2 are not consistent with the rest, so they remain as stand-alone descriptions
End result: have 3 records for ‘Bleak House’
Not talking about ‘a book’
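The arithmetic of the ‘Bleak House’ example can be sketched as a grouping step. The match key below is a hypothetical stand-in for whatever consistency test Copac actually applies: descriptions that share a key are merged, and the rest stay separate.

```python
from collections import defaultdict

# Ten hypothetical descriptions of 'Bleak House', one per holding library.
# The "key" stands in for whatever matching Copac actually performs.
descriptions = [{"library": f"Library {n}", "key": "A"} for n in range(1, 9)]
descriptions.append({"library": "Library 9", "key": "B"})
descriptions.append({"library": "Library 10", "key": "C"})

# Descriptions with the same key are merged into one record.
records = defaultdict(list)
for description in descriptions:
    records[description["key"]].append(description["library"])

for key, libraries in records.items():
    print(key, len(libraries))

print(len(records), "records")
```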
Copac decisions
Vocabularies: dcterms:creator, dcterms:contributor, copac:heldBy
When to create URIs: Title = literal, Publication place = URI
How to deal with problematic/ambiguous data: Date? = productionDate
‘Creator’
Copac ‘creator’ = author or editor
<copac:creator> <dcterms:creator> <biblioResource>
6957115KNAPPF 6947115
<isCreatorOf>
• Alternative name = dct:alternative
• Uniform name = copac:uniform
‘Contributor’
Contributor = editor, illustrator, translator
Cannot specify role – has to be general
<dcterms:contributor>
RDF Process
What is LOCAH doing?
Part 1: Exposing the Linked Data
Part 2: Creating a prototype visualisation
Part 3: Reporting on opportunities and barriers
How are we exposing the Data?
1. Model our ‘things’ into RDF
2. Transform the existing data into RDF/XML
3. Enhance the data
4. Load the RDF/XML into a triple store
5. Create Linked Data Views
6. Document the process, opportunities and barriers on LOCAH Blog
1. Modelling ‘things’ into RDF
Hub data in ‘Encoded Archival Description’ EAD XML form
Copac data in ‘Metadata Object Description Schema’ MODS XML form
Take a step back from the data format
Think about your ‘things’
What is the EAD document “saying” about “things in the world”?
What questions do we want to answer about those “things”?
http://www.loc.gov/ead/ http://www.loc.gov/standards/mods/
1. Modelling ‘things’ into RDF
Need to decide on patterns for URIs we generate
Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’
http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner (‘thing’ URI)
… is HTTP 303 ‘See Other’ redirected to …
http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner (document URI)
… which is then content negotiated to …
http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.html
http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.rdf
http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.turtle
http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.json
http://www.w3.org/TR/cooluris/
http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
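The redirect and content negotiation steps can be sketched as two small functions. This is not the server configuration used by the project: the media types mapped to each file extension are assumptions, and quality values in the Accept header are ignored to keep the example short.

```python
# Extensions taken from the slide; the media types mapped to them are assumptions.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".turtle",
    "application/json": ".json",
}


def see_other(thing_uri):
    """Return (status, Location) for a request to a 'thing' URI."""
    return 303, thing_uri.replace("/id/", "/doc/", 1)


def negotiate(document_uri, accept):
    """Pick a representation for the first media type in Accept that we serve.

    Quality values (q=...) are ignored to keep the sketch short.
    """
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in EXTENSIONS:
            return document_uri + EXTENSIONS[media_type]
    return document_uri + ".html"  # fall back to the human-readable page


status, location = see_other(
    "http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner"
)
print(status, location)
print(negotiate(location, "text/turtle, text/html;q=0.8"))
```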
1. Modelling ‘things’ into RDF
Using existing RDF vocabularies: DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies
Define additional RDF terms where required, e.g. copac:BibliographicResource, copac:Creator
It can be hard to know where to look for vocabs and ontologies
Decide on licence – CC BY-NC 2.0, CC0, ODC PDDL
Vocabularies in Linked Data
Common vocabularies to describe the data, so that e.g. ‘title’, ‘author’, ‘contributor’ mean the same thing
Adopt the same vocabularies for expressing meaning
Use semantics to link data
Want to avoid transformation, mapping, contracts between data providers
Commonly used vocabularies (ones we’ve used in bold)
Friend-of-a-Friend (FOAF), vocabulary for describing people.
Dublin Core (DC) defines general metadata attributes. See also their new domains and ranges draft.
Semantically-Interlinked Online Communities (SIOC), vocabulary for representing online communities.
Description of a Project (DOAP), vocabulary for describing projects.
Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.
Music Ontology provides terms for describing artists, albums and tracks.
Review Vocabulary, vocabulary for representing reviews.
Creative Commons (CC), vocabulary for describing license terms.
Bibo, vocabulary for bibliographic data
Shared use of vocabularies
[Diagram: Copac RDF and Hub RDF; vocabularies shown: DC, foaf, skos, bibo; example shared terms: dcterms:title, dcterms:identifier]
2. Transforming into RDF/XML
Transform EAD and MODS to RDF/XML based on our models
Hub: created XSLT Stylesheet and used Saxon parser http://saxon.sourceforge.net/
Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files
Copac: created in-house Java transformation program
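The shape of such a transform can be sketched in a few lines. The project used XSLT with Saxon for the Hub and a Java program for Copac; the Python below is a stand-in technique, not either of those, and maps only a MODS title to dct:title. The MODS fragment is hand-written and the resource URI is hypothetical.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dct", DCT)

# A tiny hand-written MODS fragment; real Copac records are far richer.
SOURCE = f"""
<mods xmlns="{MODS}">
  <titleInfo><title>Bleak House</title></titleInfo>
</mods>
"""


def mods_to_rdf(mods_xml, resource_uri):
    """Map the MODS title to dct:title on an rdf:Description."""
    mods = ET.fromstring(mods_xml)
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": resource_uri}
    )
    title = mods.find(f"{{{MODS}}}titleInfo/{{{MODS}}}title")
    if title is not None and title.text:
        ET.SubElement(description, f"{{{DCT}}}title").text = title.text
    return ET.tostring(root, encoding="unicode")


# Hypothetical resource URI following the "root/id/..." pattern.
print(mods_to_rdf(SOURCE, "http://example.org/id/bibliographicresource/1"))
```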
3. Enhancing our data
Language - lexvo.org
Time periods - reference.data.gov.uk
Geolocation - UK Postcodes URIs and Ordnance Survey URIs
Names - Virtual International Authority File
Matches and links widely-used authority files - http://viaf.org/
Names (and subjects) - DBPedia
Subjects - Library of Congress Subject Headings
Use of ‘SameAs’
Estelle Sylvia Pankhurst, 1882-1960: http://archiveshub.ac.uk/data/gb-106-7esp <sameAs> http://viaf.org/viaf/51731588/
John William Bradley, fl. 1874: http://archiveshub.ac.uk/data/gb0096ms415 <sameAs> http://viaf.org/viaf/61047183/
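Links like these can be written out as triples. The sketch below serialises the two pairs from the slide as N-Triples lines. It assumes the ‘sameAs’ shown is owl:sameAs; the slide does not name the property's namespace.

```python
# Assumption: the slide's <sameAs> is owl:sameAs.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# URI pairs as given on the slide.
links = [
    ("http://archiveshub.ac.uk/data/gb-106-7esp", "http://viaf.org/viaf/51731588/"),
    ("http://archiveshub.ac.uk/data/gb0096ms415", "http://viaf.org/viaf/61047183/"),
]


def same_as_ntriples(pairs):
    """Serialise each (local, external) pair as one N-Triples line."""
    return [
        f"<{local}> <{OWL_SAME_AS}> <{external}> ." for local, external in pairs
    ]


for line in same_as_ntriples(links):
    print(line)
```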
4. Load RDF/XML into triple store
Using the Talis Platform triple store
RDF/XML is HTTP POSTed
We’re using Pynappl Python client for the Talis Platform http://code.google.com/p/pynappl/
Store provides us with a SPARQL query interface
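A SPARQL query interface is normally reached over HTTP with the query passed as a URL parameter. The sketch below only builds such a request URL and sends nothing. The endpoint address is a hypothetical placeholder and the query is a generic example, not one taken from the project documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real address is published with the data.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?s ?title
WHERE { ?s <http://purl.org/dc/terms/title> ?title }
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```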
5. Create Linked Data Views
Expose ‘bounded’ descriptions from the triple store over the Web
Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)
Using Paget ‘Linked Data Publishing Framework’ http://code.google.com/p/paget/
PHP scripts query Sparql endpoint
http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner
http://data.archiveshub.ac.uk/
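A ‘bounded’ description can be approximated as every triple that has the requested resource as its subject. The sketch below applies that simplest possible bound to an in-memory list. The store contents are hypothetical; only the first URI comes from the slides, and this is not how Paget itself is implemented.

```python
def bounded_description(triples, resource):
    """Return the triples whose subject is the given resource.

    This is the simplest possible bound; following blank nodes, as a
    fuller bounded description would, is left out.
    """
    return [t for t in triples if t[0] == resource]


# Hypothetical store contents; only the first URI comes from the slides.
SKINNER = "http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner"
store = [
    (SKINNER, "http://purl.org/dc/terms/title", "Example title"),
    (SKINNER, "http://purl.org/dc/terms/language", "http://lexvo.org/id/iso639-3/eng"),
    ("http://example.org/id/other", "http://purl.org/dc/terms/title", "Another resource"),
]

for triple in bounded_description(store, SKINNER):
    print(triple)
```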
Accessing the Locah Linked Data
Hub data released
Copac data release imminent
Include Linked Data views, Sparql endpoint details, example queries and supporting documentation
Reporting on opportunities and barriers
Locah Blog (tags: ‘opportunities’ ‘barriers’)
Feed into #JiscEXPO programme evidence gathering
More at:
http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/
http://blogs.ukoln.ac.uk/locah/2010/12/01/assessing-linked-data
Feedback Requested!
We would like feedback on the project
Via blog
http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/
http://blogs.ukoln.ac.uk/locah/2010/11/08/some-more-things-some-extensions-to-the-hub-model/
http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/
Via email, twitter, in person
Creating a Visualisation Prototype
Currently working on Hub visualisation
Data queried from Sparql endpoint
Use tools such as Simile, Many Eyes, Google Charts
Timemap visualisation: Googlemaps and Simile http://code.google.com/p/timemap/
Visualisation Prototype Using Timemap – Googlemaps and Simile
http://code.google.com/p/timemap/
Early stages with this
Will give location and ‘extent’ of archive.
Will link through to Archives Hub
http://socialarchive.iath.virginia.edu/prototype.html
The learning process
Model the data, not the description
The description is one of the entities
Understand the importance of URIs
Think about your world before others
…but external links are important
Try to get to grips with terminology
Be prepared for unexpected surprises!
Risks
Can you rely on data sources long-term?
Persistence of persistent URIs?
New technologies
Investment of time – unsure of benefits
Licensing issues
Licensing
Nature of Linked Data: each triple as a piece of data
‘Ownership’ of data?
Data often already freely available (M2M interfaces)
Licensing
Public Domain Licences: simple, explicit, and permit widest possible reuse. Waive all rights to the data
BL, British National Bibliography uses public domain licence
Limit commercial uses?
Build in community norms: attribution, share alike - to reinforce desire for acknowledgement
Legal situation?
Thank You
Sections of this presentation adapted from materials created by other members of the LOCAH Project
This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/
Attribution and CC licence