lifting the lid on linked data

Linked Data and the LOCAH project Jane Stevenson & Adrian Stevenson

Upload: jane-stevenson

Post on 13-Jan-2015


DESCRIPTION

Presentation at ELAG 2011, European Library Automation Group Conference, Prague, Czech Republic, 25th May 2011. http://elag2011.techlib.cz/en/815-lifting-the-lid-on-linked-data/

TRANSCRIPT

Page 1: Lifting the Lid on Linked Data

Linked Data and the LOCAH project

Jane Stevenson & Adrian Stevenson

Page 2: Lifting the Lid on Linked Data

Linked Data on the Hub & Copac

Linked Open Copac and Archives Hub: Locah

JISC funded project

August 2010 – July 2011

Mimas, UKOLN, Eduserv

Page 3: Lifting the Lid on Linked Data

The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today.

It is a space where people and organizations can post and consume data about anything.

Bizer/Cyganiak/Heath Linked Data Tutorial, linkeddata.org

Page 4: Lifting the Lid on Linked Data

Core questions

Is it achievable?

Will it bring substantial benefits?

“It is the unexpected re-use of information which is the value added by the web”

Page 5: Lifting the Lid on Linked Data

What is Linked Data?

4 ‘rules’ for the web of data:

Use URIs as names for things

Use HTTP URIs so that people can look up those names.

When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html

Page 6: Lifting the Lid on Linked Data

Use URIs as Names

We can make statements about things and establish relationships by assigning identifiers to them.

Uniform Resource Identifiers (URIs) are identifiers for entities (people, places, subjects, records, institutions).

They identify resources, and ideally allow you to access representations of those resources.

author = http://archiveshub.ac.uk/janefoaf.rdf

book = http://dbpedia.org/resource/manchester

subject = English = http://lexvo.org/id/iso639-3/eng

Page 7: Lifting the Lid on Linked Data

Entities and Relationships

Page 8: Lifting the Lid on Linked Data

Bibliographic Resource

Library

ProvidesAccessTo

Subject: Bibliographic Resource
Predicate: AccessProvidedBy
Object: Library

Subject > Predicate > Object

AccessProvidedBy

Triple statement
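
A minimal sketch of the triple statement above, held as a three-part tuple and written out as one N-Triples line. The example.org URIs are hypothetical and are not taken from the LOCAH data.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# Hypothetical URIs, used only for illustration.
statement = Triple(
    subject="http://example.org/id/bibliographicresource/1",
    predicate="http://example.org/def/accessProvidedBy",
    obj="http://example.org/id/library/1",
)
print(to_ntriples(statement))
```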

Page 9: Lifting the Lid on Linked Data

An RDF Graph

Diagram nodes: Bibliographic Resource, Library, Bibliographic Record, MODS document, Title

Diagram edge labels: describedBy, heldAt, encodedAs, has
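
A graph of this kind is simply a set of triples. The sketch below uses short labels in place of URIs; the direction of each edge is an assumption, because only the node and edge labels of the diagram survive in this transcript.

```python
# Each edge of the graph is one (subject, predicate, object) triple.
# Short labels stand in for URIs; the edge directions are assumed.
graph = {
    ("BibliographicResource", "describedBy", "BibliographicRecord"),
    ("BibliographicResource", "heldAt", "Library"),
    ("BibliographicResource", "has", "Title"),
    ("BibliographicRecord", "encodedAs", "MODSDocument"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything said about the bibliographic resource.
about_resource = match(graph, s="BibliographicResource")
# Every statement that uses the encodedAs relationship.
encoded = match(graph, p="encodedAs")
```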

Page 10: Lifting the Lid on Linked Data

So...?

If something is identified, it can be linked to

We can then take items from one dataset and link them to items from other datasets

BBC

VIAF

DBPedia

Archives Hub

Copac

GeoNames

Page 11: Lifting the Lid on Linked Data

BBC: Cranford

VIAF: Gaskell

DBPedia: Gaskell

Hub: Gaskell

Copac: Cranford

Geonames: Manchester

DBPedia: Dickens

Hub: Dickens

The Linking benefits of Linked Data

Page 12: Lifting the Lid on Linked Data

The Web of ‘Documents’

Global information space (for humans)

Document paradigm

Hyperlinks

Search engines index documents and infer relevance

Implicit relationships between documents

Lack of semantics

Page 13: Lifting the Lid on Linked Data

The Web of Linked Data

Global data space (for humans and machines)

Making connections between entities across domains (people, books, films, music, genes, medicines, health, statistics...)

Linked Data is not about searching for specific documents or visiting particular websites; it is about things: identifying and connecting them.

Page 14: Lifting the Lid on Linked Data

Copac model

Groundwork done with Archives Hub. Then had to decide what we wanted to say about the data

Challenges over what a ‘record’ is – ‘Bleak House’ from each contributor, or one merged record?

In many ways simpler than archival data, and we can also decide to create a simpler model

Page 15: Lifting the Lid on Linked Data

Copac Model (as at November 2010)

Page 16: Lifting the Lid on Linked Data

Copac specification

Model = entities and relationships

Specification = means to specify these more exactly – programmer can create transform script

Iterative process – model – spec – RDF output

Page 17: Lifting the Lid on Linked Data

Node name              MODS field        Ontology
BibliographicResource  <modscollection>  bibo

Cardinality  Property            URI/literal
1 1          dct:title           literal
0 1          dct:extent          literal
0 m          bibo:isbn           literal
0 m          bibo:issn           literal
0 m          bibo:note           literal
0 m          dct:alternative     literal
0 m          copac:uniformtitle  literal
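
A minimal sketch of how the cardinality column could be checked in a transform script. It assumes the two values are a minimum and a maximum and that ‘m’ means ‘many’, which the slide does not spell out. The record layout (a dict mapping each property to a list of values) is hypothetical.

```python
# (min, max) per property, copied from the table; None stands for "m" (many).
CARDINALITY = {
    "dct:title": (1, 1),
    "dct:extent": (0, 1),
    "bibo:isbn": (0, None),
    "bibo:issn": (0, None),
    "bibo:note": (0, None),
    "dct:alternative": (0, None),
    "copac:uniformtitle": (0, None),
}


def cardinality_errors(record):
    """List the properties whose number of values breaks the specification."""
    errors = []
    for prop, (low, high) in CARDINALITY.items():
        count = len(record.get(prop, []))
        if count < low or (high is not None and count > high):
            errors.append(prop)
    return errors


# Invented records: one valid, one with no title and two extents.
good = {"dct:title": ["Bleak House"], "bibo:isbn": ["isbn-a", "isbn-b"]}
bad = {"dct:extent": ["3 v.", "1 v."]}
```

Calling cardinality_errors(good) gives an empty list, while cardinality_errors(bad) reports dct:title (missing) and dct:extent (too many values).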

Page 18: Lifting the Lid on Linked Data

Node name              MODS field        Ontology
BibliographicResource  <modscollection>  bibo

Cardinality  Property               URI/literal          Ontology
0 1          copac:creator          Creator URI          dc
0 m          copac:contributor      Contributor URI      copac
0 1          event:producedIn       Production Date URI  event
0 1          dct:issued             Production Date URI  dc
0 m          pode:publicationPlace  Place URI            pode
0 m          isbd:P1016             Place URI            isbd
0 m          dct:publisher          Publisher URI        dc
0 1          dct:isPartOf           Series URI           dc
1 m          copac:HeldBy           Institution URI      (with Institution as subject)
1 1          bibo:type              Type URI             bibo
0 m          dct:subject            Subject URI          dc
0 m          skos:subject           subject URI          skos
0 m          dct:language           Language URI         dc
1 1          hub:encodedAs          mods URI             hub

Page 19: Lifting the Lid on Linked Data

Node name  MODS field                                                       URI namespace  URI pattern
Creator    <name> <namePart></namePart> where <roleTerm>creator</roleTerm>  copac          root/id/agent/{BibID}{namePart}

Cardinality  Property        URI/literal                 URI
1 1          rdf:type        URIs                        http://purl.org/dc/terms/Agent, http://xmlns.com/foaf/0.1/Agent
1 1          rdfs:label      literal                     {namePart}
1 1          skos:prefLabel  literal                     {namePart}
1 1          isCreatorOf     Bibliographic Resource URI  root/id/bibliographicresource/{recordIdentifer}
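
The URI patterns in this specification can be filled in mechanically. The sketch below is illustrative only: the example.org root, the sample identifier and name, and the rule for normalising a name part (keep letters and digits, upper case) are all assumptions, not the project's documented behaviour.

```python
import re

ROOT = "http://example.org"  # hypothetical stand-in for "root" in the patterns


def normalise(name_part: str) -> str:
    """Assumed normalisation: keep letters and digits only, in upper case."""
    return re.sub(r"[^A-Za-z0-9]", "", name_part).upper()


def agent_uri(bib_id: str, name_part: str) -> str:
    """Fill the pattern root/id/agent/{BibID}{namePart}."""
    return f"{ROOT}/id/agent/{bib_id}{normalise(name_part)}"


def resource_uri(record_id: str) -> str:
    """Fill the pattern root/id/bibliographicresource/{record identifier}."""
    return f"{ROOT}/id/bibliographicresource/{record_id}"


# Invented identifier and name, for illustration only.
creator = agent_uri("12345", "Dickens, Charles")
book = resource_uri("12345")
```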

Page 20: Lifting the Lid on Linked Data

Aggregated Data

Page 21: Lifting the Lid on Linked Data

Aggregated data

Copac MODS record = an aggregated book record

e.g. ‘Bleak House’ held at 10 different libraries

Copac ‘merges’ the descriptions from 8 of them

2 are not consistent with the rest, so they remain as stand-alone descriptions

End result: have 3 records for ‘Bleak House’

Not talking about ‘a book’

Page 22: Lifting the Lid on Linked Data

Copac decisions

Vocabularies:
dcterms:creator
dcterms:contributor
copac:heldBy

When to create URIs:
Title = literal
Publication place = URI

How to deal with problematic/ambiguous data:
Date? = productionDate

Page 23: Lifting the Lid on Linked Data

‘Creator’

Copac ‘creator’ = author or editor

<copac:creator> <dcterms:creator> <biblioResource>

6957115KNAPPF 6947115

<isCreatorOf>

• Alternative name = dct:alternative

• Uniform name = copac:uniform

Page 24: Lifting the Lid on Linked Data

‘Contributor’

Contributor = editor, illustrator, translator

Cannot specify role – has to be general

<dcterms:contributor>

Page 25: Lifting the Lid on Linked Data

RDF Process

Page 26: Lifting the Lid on Linked Data

What is LOCAH doing?

Part 1: Exposing the Linked Data

Part 2: Creating a prototype visualisation

Part 3: Reporting on opportunities and barriers

Page 27: Lifting the Lid on Linked Data

How are we exposing the Data?

1. Model our ‘things’ into RDF

2. Transform the existing data into RDF/XML

3. Enhance the data

4. Load the RDF/XML into a triple store

5. Create Linked Data Views

6. Document the process, opportunities and barriers on LOCAH Blog

Page 28: Lifting the Lid on Linked Data

1. Modelling ‘things’ into RDF

Hub data in ‘Encoded Archival Description’ EAD XML form

Copac data in ‘Metadata Object Description Schema’ MODS XML form

Take a step back from the data format

Think about your ‘things’

What is the EAD document “saying” about “things in the world”?

What questions do we want to answer about those “things”?

http://www.loc.gov/ead/ http://www.loc.gov/standards/mods/

Page 29: Lifting the Lid on Linked Data

1. Modelling ‘things’ into RDF

Need to decide on patterns for URIs we generate

Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’

http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner ‘thing’ URI

… is HTTP 303 ‘See Other’ redirected to …

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner

document URI

… which is then content negotiated to …

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.html

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.rdf

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.turtle

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.json

http://www.w3.org/TR/cooluris/

http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
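
The redirect and content negotiation steps on this slide can be sketched as two small functions. This is not the project's server code: the media types paired with each file extension are the usual ones for those formats rather than confirmed LOCAH settings, q-values in the Accept header are ignored, and the HTML fallback is an assumption.

```python
ID_PREFIX = "http://data.archiveshub.ac.uk/id/"
DOC_PREFIX = "http://data.archiveshub.ac.uk/doc/"

# File extensions from the slide, keyed by an assumed media type.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".turtle",
    "application/json": ".json",
}


def see_other(thing_uri):
    """Return the (status, Location) pair for a request to a 'thing' URI."""
    if not thing_uri.startswith(ID_PREFIX):
        raise ValueError("not a 'thing' URI")
    return 303, DOC_PREFIX + thing_uri[len(ID_PREFIX):]


def negotiate(doc_uri, accept):
    """Pick the representation for the first listed media type we can serve."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in EXTENSIONS:
            return doc_uri + EXTENSIONS[media_type]
    return doc_uri + ".html"  # assumed default


status, doc_uri = see_other(
    "http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner"
)
rdf_uri = negotiate(doc_uri, "application/rdf+xml, text/html;q=0.8")
```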

Page 30: Lifting the Lid on Linked Data

1. Modelling ‘things’ into RDF

Using existing RDF vocabularies: DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies

Define additional RDF terms where required: copac:BibliographicResource, copac:Creator

It can be hard to know where to look for vocabs and ontologies

Decide on licence – CC BY-NC 2.0, CC0, ODC PDDL

Page 31: Lifting the Lid on Linked Data

Vocabularies in Linked Data

Common vocabularies to describe the data, e.g. ‘title’ ‘author’ ‘contributor’ mean the same thing

Adopt the same vocabularies for expressing meaning

Use semantics to link data

Want to avoid transformation, mapping, contracts between data providers

Page 32: Lifting the Lid on Linked Data

Commonly used vocabularies (ones we’ve used in bold)

Friend-of-a-Friend (FOAF), vocabulary for describing people.

Dublin Core (DC) defines general metadata attributes. See also their new domains and ranges draft.

Semantically-Interlinked Online Communities (SIOC), vocabulary for representing online communities.

Description of a Project (DOAP), vocabulary for describing projects.

Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.

Music Ontology provides terms for describing artists, albums and tracks.

Review Vocabulary, vocabulary for representing reviews.

Creative Commons (CC), vocabulary for describing license terms.

Bibo, vocabulary for bibliographic data

Page 33: Lifting the Lid on Linked Data

Shared use of vocabularies

Diagram labels: Copac RDF, Hub RDF, DC, foaf, skos, bibo, Hub, Copac, dcterms:title, dcterms:identifier

Page 34: Lifting the Lid on Linked Data

2. Transforming into RDF/XML

Transform EAD and MODS to RDF/XML based on our models

Hub: created XSLT stylesheet and used Saxon parser

http://saxon.sourceforge.net/

Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files

Copac: created in-house Java transformation program
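
The project used XSLT for the Hub and a Java program for Copac; neither is reproduced here. As a rough illustration of the same idea, the sketch below reads a tiny invented MODS record with Python's standard library and emits triples for the title and creator. For brevity the creator is kept as a literal, whereas the specification above mints an agent URI.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"m": "http://www.loc.gov/mods/v3"}

# A tiny invented MODS record, just enough to exercise the mapping.
SAMPLE = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Bleak House</title></titleInfo>
  <name>
    <namePart>Dickens, Charles</namePart>
    <role><roleTerm>creator</roleTerm></role>
  </name>
</mods>
"""


def mods_to_triples(xml_text, resource_uri):
    """Map a MODS record to (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    title = root.findtext("m:titleInfo/m:title", namespaces=MODS_NS)
    if title:
        triples.append((resource_uri, "dct:title", title))
    for name in root.findall("m:name", MODS_NS):
        role = name.findtext("m:role/m:roleTerm", namespaces=MODS_NS)
        part = name.findtext("m:namePart", namespaces=MODS_NS)
        if role == "creator" and part:
            triples.append((resource_uri, "dct:creator", part))
    return triples


# Hypothetical resource URI.
triples = mods_to_triples(SAMPLE, "http://example.org/id/bibliographicresource/1")
```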

Page 35: Lifting the Lid on Linked Data

3. Enhancing our data

Language - lexvo.org

Time periods - reference.data.gov.uk

Geolocation - UK Postcodes URIs and Ordnance Survey URIs

Names - Virtual International Authority File

Matches and links widely-used authority files - http://viaf.org/

Names (and subjects) - DBPedia

Subjects - Library of Congress Subject Headings

Page 36: Lifting the Lid on Linked Data

Use of ‘SameAs’

<sameAs>

Estelle Sylvia Pankhurst, 1882-1960:
http://archiveshub.ac.uk/data/gb-106-7esp
http://viaf.org/viaf/51731588/

John William Bradley, fl. 1874:
http://archiveshub.ac.uk/data/gb0096ms415
http://viaf.org/viaf/61047183/
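
A minimal sketch of how the two links on this slide could be written as N-Triples. It assumes the slide's ‘sameAs’ is owl:sameAs, which the transcript does not state explicitly.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed property

# URI pairs as listed on the slide: (Archives Hub URI, VIAF URI).
LINKS = [
    ("http://archiveshub.ac.uk/data/gb-106-7esp", "http://viaf.org/viaf/51731588/"),
    ("http://archiveshub.ac.uk/data/gb0096ms415", "http://viaf.org/viaf/61047183/"),
]


def same_as_lines(links):
    """Write one N-Triples line per (hub, viaf) pair."""
    return [f"<{hub}> <{OWL_SAME_AS}> <{viaf}> ." for hub, viaf in links]


lines = same_as_lines(LINKS)
print("\n".join(lines))
```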

Page 37: Lifting the Lid on Linked Data
Page 38: Lifting the Lid on Linked Data
Page 39: Lifting the Lid on Linked Data
Page 40: Lifting the Lid on Linked Data

4. Load RDF/XML into triple store

Using the Talis Platform triple store

RDF/XML is HTTP POSTed

We’re using Pynappl Python client for the Talis Platform

http://code.google.com/p/pynappl/

Store provides us with a SPARQL query interface
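
As a rough illustration of what a SPARQL query interface accepts, the sketch below builds a SPARQL Protocol GET URL with Python's standard library and does not send it. The endpoint address is hypothetical (it is not the LOCAH endpoint), and the query is a generic one that lists labelled things.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical; not the LOCAH endpoint

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label
WHERE { ?thing rdfs:label ?label }
LIMIT 10
"""


def query_url(endpoint, sparql):
    """Build a SPARQL Protocol GET URL; the query text goes in 'query'."""
    return endpoint + "?" + urlencode({"query": sparql.strip()})


url = query_url(ENDPOINT, QUERY)
# Decoding the URL again recovers the original query text.
sent_back = parse_qs(urlsplit(url).query)["query"][0]
```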

Page 41: Lifting the Lid on Linked Data

5. Create Linked Data Views

Expose ‘bounded’ descriptions from the triple store over the Web

Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)

Using Paget ‘Linked Data Publishing Framework’

http://code.google.com/p/paget/

PHP scripts query Sparql endpoint

Page 42: Lifting the Lid on Linked Data

http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner

Page 43: Lifting the Lid on Linked Data

http://data.archiveshub.ac.uk/

Page 44: Lifting the Lid on Linked Data

Accessing the Locah Linked Data

Hub data released

Copac data release imminent

Include Linked Data views, Sparql endpoint details, example queries and supporting documentation

Page 45: Lifting the Lid on Linked Data

Reporting on opportunities and barriers

Locah Blog (tags: ‘opportunities’ ‘barriers’)

Feed into #JiscEXPO programme evidence gathering

More at:

http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/

http://blogs.ukoln.ac.uk/locah/2010/12/01/assessing-linked-data

Page 46: Lifting the Lid on Linked Data

Feedback Requested!

We would like feedback on the project

Via blog

http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/

http://blogs.ukoln.ac.uk/locah/2010/11/08/some-more-things-some-extensions-to-the-hub-model/

http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/

Via email, twitter, in person

Page 47: Lifting the Lid on Linked Data

Creating a Visualisation Prototype

Currently working on Hub visualisation

Data queried from Sparql endpoint

Use tools such as Simile, Many Eyes, Google Charts

Timemap visualisation: Googlemaps and Simile

http://code.google.com/p/timemap/

Page 48: Lifting the Lid on Linked Data

Visualisation Prototype Using Timemap – Googlemaps and Simile

http://code.google.com/p/timemap/

Early stages with this

Will give location and ‘extent’ of archive.

Will link through to Archives Hub

Page 49: Lifting the Lid on Linked Data
Page 50: Lifting the Lid on Linked Data

http://socialarchive.iath.virginia.edu/prototype.html

Page 51: Lifting the Lid on Linked Data

The learning process

Model the data, not the description

The description is one of the entities

Understand the importance of URIs

Think about your world before others

…but external links are important

Try to get to grips with terminology

Be prepared for unexpected surprises!

Page 52: Lifting the Lid on Linked Data

Risks

Can you rely on data sources long-term?

Persistence of persistent URIs?

New technologies

Investment of time – unsure of benefits

Licensing issues

Page 53: Lifting the Lid on Linked Data

Licensing

Nature of Linked Data: each triple as a piece of data

‘Ownership’ of data?

Data often already freely available (M2M interfaces)

Page 54: Lifting the Lid on Linked Data

Licensing

Public Domain Licences: simple, explicit, and permit widest possible reuse. Waive all rights to the data

BL, British National Bibliography uses public domain licence

Limit commercial uses?

Build in community norms: attribution, share alike - to reinforce desire for acknowledgement

Legal situation?

Page 55: Lifting the Lid on Linked Data

Thank You

Page 56: Lifting the Lid on Linked Data

Sections of this presentation adapted from materials created by other members of the LOCAH Project

This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/

Attribution and CC licence