linked data and locah, uksg2011

How to Become a First Class Citizen of the Web Linked Data and the LOCAH project Jane Stevenson & Adrian Stevenson

Upload: jane-stevenson

Post on 28-Jan-2015

DESCRIPTION

An introduction to Linked Data and to the Linked Open Copac and Archives Hub project.

TRANSCRIPT

Page 1: Linked Data and Locah, UKSG2011

How to Become a First Class Citizen of the Web

Linked Data and the LOCAH project

Jane Stevenson & Adrian Stevenson

Page 2: Linked Data and Locah, UKSG2011

Remit

This session will give a brief overview of the concepts behind Linked Data and will explain how we are applying these ideas to archival and bibliographic data.

Archives Hub: merged catalogue of archival descriptions from 200 institutions across the UK

Copac: merged catalogue of bibliographic records from libraries across the UK

Page 3: Linked Data and Locah, UKSG2011

Introduction

Page 4: Linked Data and Locah, UKSG2011

The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today.

[The creation of] a space where people and organizations can post and consume data about anything.

Bizer/Cyganiak/Heath Linked Data Tutorial, linkeddata.org

Page 5: Linked Data and Locah, UKSG2011

In essence, it marks a shift in thinking from publishing data in human readable HTML documents to machine readable documents. That means that machines can do a little more of the thinking work for us.

http://www.linkeddatatools.com/semantic-web-basics

Page 6: Linked Data and Locah, UKSG2011

Linked Data encourages open data, open licences and reuse.

…but Linked Data does not have to be open.

Page 7: Linked Data and Locah, UKSG2011

Core questions

Is it achievable?

Will it bring substantial benefits?

“It is the unexpected re-use of information which is the value added by the web”

Page 8: Linked Data and Locah, UKSG2011

What is Linked Data?

4 ‘rules’ for the web of data:

Use URIs as names for things

Use HTTP URIs so that people can look up those names.

When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html
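The first two rules can be illustrated with a small check. This is only a sketch: the helper name `is_http_uri` and the `urn:example:` name are made up for illustration, and passing the check says nothing about whether a lookup would actually return useful data.

```python
from urllib.parse import urlparse

def is_http_uri(name: str) -> bool:
    """True if the name is a URI that could be looked up over HTTP(S)."""
    parts = urlparse(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

names = [
    "http://dbpedia.org/resource/manchester",  # HTTP URI: can be looked up
    "urn:example:manchester",                  # a URI, but not an HTTP one (made up)
    "Manchester",                              # a plain string, not a URI
]

lookupable = [n for n in names if is_http_uri(n)]
```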

Page 9: Linked Data and Locah, UKSG2011

Giving Things identifiers

We can make statements about things and establish relationships by assigning identifiers to them.

Jane Stevenson = http://archiveshub.ac.uk/janefoaf.rdf

Manchester = http://dbpedia.org/resource/manchester

English = http://lexvo.org/id/iso639-3/eng

Page 10: Linked Data and Locah, UKSG2011

URIs

Uniform Resource Identifiers (URIs) are identifiers for entities (people, places, subjects, records, institutions).

They identify resources, and ideally allow you to access representations of those resources.

Think not of locations, but of identifiers!

For Linked Data you use HTTP URIs

Jane Stevenson = http://archiveshub.ac.uk/janefoaf.rdf

Manchester = http://dbpedia.org/resource/manchester

English = http://lexvo.org/id/iso639-3/eng

Page 11: Linked Data and Locah, UKSG2011

Entities and Relationships

Page 12: Linked Data and Locah, UKSG2011

Archival Resource

Repository

ProvidesAccessTo

Subject: Archival Resource

Predicate: AccessProvidedBy

Object: Repository

Subject > Predicate > Object

AccessProvidedBy

Triple statement

Page 13: Linked Data and Locah, UKSG2011

ArchivalResource: http://data.archiveshub.ac.uk/id/findingaid/gb-106-7esp

<accessProvidedBy>

Repository: http://data.archiveshub.ac.uk/id/repository/gb106

HTTP URIs

Archival Resource

Repository

accessProvidedBy
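The triple on this slide can be written down as plain data. A minimal sketch: the subject and object URIs are the ones shown above, but the namespace used for the predicate (`.../def/`) is an assumption, since the slide gives only the name accessProvidedBy.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

HUB = "http://data.archiveshub.ac.uk/"
# Assumed namespace for the predicate; the slide shows only its local name.
HUB_VOCAB = HUB + "def/"

triple = Triple(
    subject=HUB + "id/findingaid/gb-106-7esp",
    predicate=HUB_VOCAB + "accessProvidedBy",
    obj=HUB + "id/repository/gb106",
)

def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."

line = to_ntriples(triple)
```

Because all three parts are HTTP URIs, each of them can in principle be looked up, which is what makes the statement linkable.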

Page 14: Linked Data and Locah, UKSG2011

Archival Resource

Repository

Finding Aid

describedBy

heldAt

encodedAs

EAD document

Title

has

An RDF Graph

Page 15: Linked Data and Locah, UKSG2011

So...?

If something is identified, it can be linked to

We can then take items from one dataset and link them to items from other datasets

BBC

VIAF

DBPedia

Archives Hub

Copac

GeoNames

Page 16: Linked Data and Locah, UKSG2011

BBC: Cranford

VIAF: Dickens

DBPedia: Gaskell

Hub: Gaskell

Copac: Cranford

Geonames: Manchester

DBPedia: Dickens

Hub: Dickens

The Linking benefits of Linked Data

Page 17: Linked Data and Locah, UKSG2011

The Web of ‘Documents’

Global information space (for humans)

Document paradigm

Hyperlinks

Search engines index and infer relevance

Implicit relationships between documents

Lack of semantics

Page 18: Linked Data and Locah, UKSG2011

The Web of Linked Data

Global data space (for humans and machines)

Making connections between entities across domains (people, books, films, music, genes, medicines, health, statistics...)

LD is not about searching for specific documents or visiting particular websites, it is about things - identifying and connecting them.

Closely aligned to the general architecture of the Web

Page 19: Linked Data and Locah, UKSG2011

From one thing…to the same thing

<sameAs>

http://dbpedia.org/resource/manchester

http://sws.geonames.org/2643123

http://data.archiveshub.ac.uk/id/concept/ncarules/manchester

Are they the same?
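If sameAs links were asserted between these three URIs, a consumer could treat them as aliases for one thing. The sketch below assumes such links purely for illustration (the slide itself asks whether they really are the same) and uses the standard owl:sameAs property URI.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Assumed links, for illustration only; the project may not assert these.
links = [
    ("http://data.archiveshub.ac.uk/id/concept/ncarules/manchester", OWL_SAME_AS,
     "http://dbpedia.org/resource/manchester"),
    ("http://dbpedia.org/resource/manchester", OWL_SAME_AS,
     "http://sws.geonames.org/2643123"),
]

def same_as_closure(start, triples):
    """Return every URI reachable from start via sameAs, in either direction."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            # sameAs is symmetric, so follow the link both ways
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen

aliases = same_as_closure("http://sws.geonames.org/2643123", links)
```

Following the links transitively is exactly why the question matters: one careless sameAs between a concept and a place merges everything said about both.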

Page 20: Linked Data and Locah, UKSG2011

Vocabularies & Ontologies

Page 21: Linked Data and Locah, UKSG2011

Vocabularies & Ontologies

Vocabulary: set of terms

Ontology: organisation of terms – hierarchy, relationships

Page 22: Linked Data and Locah, UKSG2011

Two different databases: one for films, one for actors

To collaborate using their current databases, the owners of the two sites would have to agree on a common data format for sharing information that both could understand, using a common film and actor unique ID scheme of their own invention.

Problems of data integration: information exchange across independently designed systems

Shared vocabularies

Page 23: Linked Data and Locah, UKSG2011

Need ‘film title’; ‘actor name’; ‘actor birthdate’, etc. to mean the same thing to each

Use the same vocabulary

Query both databases.

No need for transformations, mappings, contracts
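A rough sketch of the film and actor case: when both datasets use the same terms and the same URIs for films, their triples can simply be pooled and queried together. Every URI and value below is made up for illustration, including the `vocab.example` terms.

```python
# Hypothetical shared vocabulary terms
FILM_TITLE = "http://vocab.example/film-title"
ACTOR_NAME = "http://vocab.example/actor-name"
APPEARS_IN = "http://vocab.example/appears-in"

# Two independently maintained datasets (made-up data)
films_db = [
    ("http://films.example/id/f1", FILM_TITLE, "Example Film"),
]
actors_db = [
    ("http://actors.example/id/a1", ACTOR_NAME, "Example Actor"),
    ("http://actors.example/id/a1", APPEARS_IN, "http://films.example/id/f1"),
]

def cast_of(film_title, *datasets):
    """Answer one question across all datasets, with no mapping step in between."""
    merged = [t for dataset in datasets for t in dataset]
    film_uris = {s for s, p, o in merged if p == FILM_TITLE and o == film_title}
    actor_uris = {s for s, p, o in merged if p == APPEARS_IN and o in film_uris}
    return sorted(o for s, p, o in merged if p == ACTOR_NAME and s in actor_uris)

cast = cast_of("Example Film", films_db, actors_db)
```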

Page 24: Linked Data and Locah, UKSG2011

Vocabularies in Linked Data

Common vocabulary to describe the data, e.g. ‘film-title’ means the same thing

Adopt the same ontologies for expressing meaning

Use semantics to link data

Want to avoid transformation, mapping, contracts between data providers

Page 25: Linked Data and Locah, UKSG2011

Shared use of vocabularies

[Diagram: Hub RDF and Copac RDF drawing on shared vocabularies (DC, foaf, skos, bibo), with dcterms:title and dcterms:identifier shown as example terms]

Page 26: Linked Data and Locah, UKSG2011

Ontologies

Many widely used ontologies

Use others as far as possible

Use your own where necessary

Dublin Core

Friend of a Friend (FOAF)

Simple Knowledge Organisation System (SKOS)

Bibo

Open Cyc

Page 27: Linked Data and Locah, UKSG2011

Linked Data on the Hub & Copac

Linked Open Copac and Archives Hub: Locah

JISC funded project

August 2010 – July 2011

Mimas

UKOLN

Eduserv

Page 28: Linked Data and Locah, UKSG2011

What is LOCAH doing?

Part 1: Exposing the Linked Data

Part 2: Creating a prototype visualisation

Part 3: Reporting on opportunities and barriers

Page 29: Linked Data and Locah, UKSG2011

How are we exposing the Data?

1. Model our ‘things’ into RDF

2. Transform the existing data into RDF/XML

3. Enhance the data

4. Load the RDF/XML into a triple store

5. Create Linked Data Views

6. Document the process, opportunities and barriers on LOCAH Blog

Page 30: Linked Data and Locah, UKSG2011

1. Modelling ‘things’ into RDF

Hub data in ‘Encoded Archival Description’ EAD XML form

Copac data in ‘Metadata Object Description Schema’ MODS XML form

Take a step back from the data format

Think about your ‘things’

What is the EAD document “saying” about “things in the world”?

What questions do we want to answer about those “things”?

http://www.loc.gov/ead/ http://www.loc.gov/standards/mods/

Page 31: Linked Data and Locah, UKSG2011

1. Modelling ‘things’ into RDF

Need to decide on patterns for URIs we generate

Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’

http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner ‘thing’ URI

… is HTTP 303 ‘See Other’ redirected to …

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner document URI

… which is then content negotiated to …

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.html

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.rdf

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.turtle

http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.json

http://www.w3.org/TR/cooluris/

http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
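The ‘thing’ URI, document URI and format-specific URIs on this slide follow a pattern that can be sketched as two small functions. This models only the URI mapping, not a web server; the media types paired with each extension are assumptions, and quality values in the Accept header are ignored for brevity.

```python
BASE = "http://data.archiveshub.ac.uk"

# Extensions are from the slide; the media types matched to them are assumed.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".turtle",
    "application/json": ".json",
}

def see_other(thing_uri: str) -> str:
    """Return the document URI that a 303 'See Other' response would point to."""
    prefix = BASE + "/id/"
    if not thing_uri.startswith(prefix):
        raise ValueError("not a 'thing' URI: " + thing_uri)
    return BASE + "/doc/" + thing_uri[len(prefix):]

def negotiate(doc_uri: str, accept: str) -> str:
    """Pick the representation for the first supported type in the Accept header."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in EXTENSIONS:
            return doc_uri + EXTENSIONS[media_type]
    return doc_uri + ".html"  # fall back to the human-readable page

doc = see_other(BASE + "/id/findingaid/gb1086skinner")
rdf = negotiate(doc, "application/rdf+xml, text/html;q=0.5")
```

Keeping /id/ for the thing and /doc/ for the description is what lets statements about the archive be told apart from statements about the web page describing it.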

Page 32: Linked Data and Locah, UKSG2011

1. Modelling ‘things’ into RDF

Using existing RDF vocabularies: DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies

Define additional RDF terms where required:

hub:ArchivalResource

copac:BibiographicResource

hub:maintenanceAgency

copac:Creator

It can be hard to know where to look for vocabs and ontologies

Decide on licence – CC BY-NC 2.0, CC0, ODC PDD

Page 33: Linked Data and Locah, UKSG2011

Archives Hub Model (as at 14/2/2011)

[Diagram. Entities: ArchivalResource, Finding Aid, EAD Document, Biographical History, Agent, Family, Person, Organisation, Repository (Agent), Place, Concept, Genre, Function, ConceptScheme, Book, Object, Language, Level, PostcodeUnit, Extent, Creation, Birth, Death, TemporalEntity]

[Relationships: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, encodedAs/encodes, administeredBy/administers, hasBiogHist/isBiogHistFor, foaf:focus, is-a, level, language, inScheme, representedBy, extent, participates in, at time, product of, in]

Page 34: Linked Data and Locah, UKSG2011

Copac Model (as at November 2010)

Page 35: Linked Data and Locah, UKSG2011

Feedback Requested!

We would like feedback on the model

Appreciate this will be easier when the data is available

Via blog

http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/

http://blogs.ukoln.ac.uk/locah/2010/11/08/some-more-things-some-extensions-to-the-hub-model/

http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/

Via email, twitter, in person

Page 36: Linked Data and Locah, UKSG2011

2. Transforming into RDF/XML

Transform EAD and MODS to RDF/XML based on our models

Hub: created XSLT stylesheet and used the Saxon processor

http://saxon.sourceforge.net/

Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files

Copac: created in-house Java transformation program
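To give a feel for what such a transformation does, here is a Python stand-in. It is not the project's XSLT stylesheet or Java program: the EAD fragment is made up and heavily simplified, the resource URI is a placeholder, and only two fields are mapped (to dcterms:identifier and dcterms:title).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dcterms", DCT)

# Made-up, minimal EAD fragment; real finding aids are far richer.
EAD = """<ead><archdesc level="fonds"><did>
  <unitid>GB 0000 EXAMPLE</unitid>
  <unittitle>Example Papers</unittitle>
</did></archdesc></ead>"""

def ead_to_rdfxml(ead_xml: str, resource_uri: str) -> str:
    """Map the top-level identifier and title of an EAD file to RDF/XML."""
    did = ET.fromstring(ead_xml).find("./archdesc/did")
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": resource_uri}
    )
    ET.SubElement(desc, f"{{{DCT}}}identifier").text = did.findtext("unitid")
    ET.SubElement(desc, f"{{{DCT}}}title").text = did.findtext("unittitle")
    return ET.tostring(root, encoding="unicode")

rdfxml = ead_to_rdfxml(EAD, "http://data.example.org/id/archivalresource/example")
```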

Page 37: Linked Data and Locah, UKSG2011

3. Enhancing our data

Language - lexvo.org

Time periods - reference.data.gov.uk

Geolocation - UK Postcodes URIs and Ordnance Survey URIs

Names - Virtual International Authority File

Matches and links widely-used authority files - http://viaf.org/

Names (and subjects) - DBPedia

Subjects - Library of Congress Subject Headings
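The language enhancement is the simplest of these to sketch, because the Lexvo URI pattern appears earlier in the deck (http://lexvo.org/id/iso639-3/eng). The record structure and the `language_uri` field name below are assumptions made for illustration.

```python
def lexvo_uri(iso639_3: str) -> str:
    """Build a Lexvo URI from a three-letter ISO 639-3 language code."""
    code = iso639_3.strip().lower()
    if len(code) != 3 or not code.isalpha():
        raise ValueError("expected a three-letter ISO 639-3 code: " + repr(iso639_3))
    return "http://lexvo.org/id/iso639-3/" + code

def enhance(record: dict) -> dict:
    """Return a copy of the record with an external language link added."""
    enhanced = dict(record)
    enhanced["language_uri"] = lexvo_uri(record["language"])
    return enhanced

# Made-up record for illustration
enhanced = enhance({"title": "Example Papers", "language": "ENG"})
```

Names and subjects are harder, since matching against VIAF, DBPedia or LCSH needs a lookup and a judgement about whether the match is right, not just a URI pattern.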

Page 38: Linked Data and Locah, UKSG2011

4. Load RDF/XML into triple store

Using the Talis Platform triple store

RDF/XML is HTTP POSTed

We’re using Pynappl Python client for the Talis Platform

http://code.google.com/p/pynappl/

Store provides us with a SPARQL query interface
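Loading by HTTP POST amounts to sending the RDF/XML as the request body with a suitable content type. The sketch below uses only the Python standard library rather than Pynappl, builds the request without sending it, and uses a placeholder store URL; the real store address and any authentication are not shown on the slide.

```python
import urllib.request

def build_post(store_url: str, rdfxml: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying RDF/XML."""
    return urllib.request.Request(
        store_url,
        data=rdfxml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )

payload = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
# Placeholder URL; sending would be urllib.request.urlopen(request)
request = build_post("http://store.example.org/meta", payload)
```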

Page 39: Linked Data and Locah, UKSG2011

5. Create Linked Data Views

Expose ‘bounded’ descriptions from the triple store over the Web

Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)

Using Paget ‘Linked Data Publishing Framework’

http://code.google.com/p/paget/

PHP scripts query Sparql endpoint
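One way a view script might ask the store for a description of a single resource is a SPARQL DESCRIBE query sent as the `query` parameter of a GET request. This is a sketch in Python rather than the PHP that Paget uses; the endpoint URL is a placeholder, and exactly which triples a DESCRIBE returns depends on the store.

```python
from urllib.parse import parse_qs, urlencode, urlparse

def describe_url(endpoint: str, resource_uri: str) -> str:
    """Build the GET URL for a SPARQL DESCRIBE query about one resource."""
    query = "DESCRIBE <" + resource_uri + ">"
    return endpoint + "?" + urlencode({"query": query})

url = describe_url(
    "http://sparql.example.org/query",  # placeholder endpoint
    "http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner",
)

# Decode the URL again to confirm which query the endpoint would receive
sent_query = parse_qs(urlparse(url).query)["query"][0]
```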

Page 40: Linked Data and Locah, UKSG2011

http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner

Page 41: Linked Data and Locah, UKSG2011

http://data.archiveshub.ac.uk/

Page 42: Linked Data and Locah, UKSG2011

Can I access the Locah Linked Data?

Will be releasing the Hub data very soon!

Copac data will follow approx 1 month later

Release will include Linked Data views, Sparql endpoint details, example queries and supporting documentation

Page 43: Linked Data and Locah, UKSG2011

Reporting on opportunities and barriers

Locah Blog (tags: ‘opportunities’ ‘barriers’)

Feed into #JiscEXPO programme evidence gathering

More at:

http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/

http://blogs.ukoln.ac.uk/locah/2010/12/01/assessing-linked-data

Page 44: Linked Data and Locah, UKSG2011

Creating the Visualisation Prototype

Based on researcher use cases

Data queried from Sparql endpoint

Use tools such as Simile, Many Eyes, Google Charts

For first Hub visualisation using Timemap – Googlemaps and Simile

http://code.google.com/p/timemap/

Page 45: Linked Data and Locah, UKSG2011

Visualisation Prototype Using Timemap – Googlemaps and Simile

http://code.google.com/p/timemap/

Early stages with this

Will give location and ‘extent’ of archive.

Will link through to Archives Hub

Page 46: Linked Data and Locah, UKSG2011

Sir Ernest Henry Shackleton

Archives related to Shackleton:

VIAF URL: http://viaf.org/viaf/12338195/

Biographical History:

Ernest Henry Shackleton was born on 15 February 1874 in Kilkea, Ireland, one of six children of Anglo-Irish parents. The family moved from their farm to Dublin, where his father, Henry, studied medicine. On qualifying in 1884, Henry took up a practice in south London, and between 1887 and 1890, Ernest was educated at Dulwich College. On leaving school, he entered the merchant service, serving in the square-rigged ship Hoghton Tower until 1894 when he transferred to tramp steamers. In 1896, he qualified as first mate, and two years later, was certified as master, joining the Union Castle line in 1899. [more]

http://archiveshub.ac.uk/data/gb15sirernesthenryshackleton

Books related to Shackleton:

Page 47: Linked Data and Locah, UKSG2011

The challenges

Page 48: Linked Data and Locah, UKSG2011

The learning process

Model the data, not the description

The description is one of the entities

Understand the importance of URIs

Think about your world before others

…but external links are important

Try to get to grips with terminology

Page 49: Linked Data and Locah, UKSG2011

Names

6947115KNAPPF

F Knapp associated with record 6947115

/id/agent/6947115KNAPPF

<copac:isCreatorOf rdf:resource="http://data.copac.ac.uk/id/mods/6947115"/>

6957115KNAPPF <isCreatorOf> 6947115

Page 50: Linked Data and Locah, UKSG2011

Index terms (names, subjects, places)

‘AssociatedWith’ as the relationship

Benefits of structured index terms

Use /person/ and /organisation/ in the URI

Distinguish /person/pilkington (the person) from /organisation/pilkington

Distinguish place/reading/ and subject/reading/
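Putting the entity type into the URI path means the same label can yield distinct identifiers. A minimal sketch, assuming a base of http://data.archiveshub.ac.uk/id/ and a made-up slug rule (lower-case, letters and digits only); the project's actual URI minting rules are not given on the slide.

```python
import re

BASE = "http://data.archiveshub.ac.uk/id/"

def mint_uri(kind: str, label: str) -> str:
    """Build a URI from an entity type and an index term (assumed slug rule)."""
    slug = re.sub(r"[^a-z0-9]+", "", label.lower())
    if not slug:
        raise ValueError("label has no usable characters: " + repr(label))
    return BASE + kind + "/" + slug

person = mint_uri("person", "Pilkington")
organisation = mint_uri("organisation", "Pilkington")
place = mint_uri("place", "Reading")
subject = mint_uri("subject", "Reading")
```

This only works when the index terms are structured enough to say which kind of thing a term refers to, which is the benefit noted above.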

Page 51: Linked Data and Locah, UKSG2011

Problems with source data

EAD very permissive: whole range of finding aids

Copac more consistent but still wide variety

Hub EAD: We limited the tags we worked with

Large files (around 5MB) tend to need splitting up

Page 52: Linked Data and Locah, UKSG2011

Duplication of data

“So statements which relate things in the two documents must be repeated in each. This clearly is against the first rule of data storage: don't store the same data in two different places: you will have problems keeping it consistent.” (T B-L www.w3.org/designissues/linkeddata.html)

Page 53: Linked Data and Locah, UKSG2011

Archival Inheritance

“Do not repeat information at a lower level of description that has already been given at a higher level.” ISAD(G)

Many elements do not apply to ‘child’ descriptions

Simple rule of inheritance not always appropriate

LD does assert hierarchical relationships but no requirement to follow these links
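The ‘simple rule of inheritance’ can be sketched as a walk up the partOf links until a value is found. All identifiers and values below are made up, and as noted above this rule is not always appropriate, so a real consumer would need to decide per property whether to apply it.

```python
# child -> parent links, i.e. the partOf relationship (made-up identifiers)
part_of = {"item-1": "series-1", "series-1": "fonds-1"}

descriptions = {
    "fonds-1": {"title": "Example Papers", "language": "eng"},
    "series-1": {"title": "Correspondence"},
    "item-1": {"title": "Letter"},
}

def lookup(unit, prop):
    """Return (value, unit that supplied it), walking up partOf until found."""
    current = unit
    while current is not None:
        value = descriptions.get(current, {}).get(prop)
        if value is not None:
            return value, current
        current = part_of.get(current)
    return None, None

language, source = lookup("item-1", "language")
```

Returning the source unit alongside the value keeps it visible that the language was stated at the top level and only inferred for the item.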

Page 54: Linked Data and Locah, UKSG2011

Copac

Larger community: more potential vocabularies/documentation/support/confusion/inconsistencies

Merged catalogues: a unique scenario

‘Creator’ and ‘Others’ (editor, authors, illustrator)

Learning from Hub / Doing what is appropriate

Usually not right or wrong answers

Page 55: Linked Data and Locah, UKSG2011

Copac model

Groundwork done with Archives Hub. Then had to decide what we wanted to say about the data

Challenges over what a ‘record’ is – ‘Bleak House’ from each contributor? or one merged record?

In many ways simpler than archival data; but also can decide to create a simpler model

Page 56: Linked Data and Locah, UKSG2011

Copac Model

Page 57: Linked Data and Locah, UKSG2011

Copac specification

Hard to start but proved to be crucial

Very iterative process between spec and RDF output

Important to establish the structure of the spec (we used tabs for each ‘entity’)

Page 58: Linked Data and Locah, UKSG2011

Copac specification

Page 59: Linked Data and Locah, UKSG2011

Copac decisions

Where to create Copac URIs – copac:creator, copac:contributor, copac:heldBy

When to create URIs

Title = literal

Publication place = URI

How to deal with problematic/ambiguous data

Date? = productionDate
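The literal versus URI decision shows up directly in the serialised triples. In this sketch the record URI follows the pattern shown earlier in the deck, dcterms:title is a real Dublin Core term, and the publication place property is a hypothetical URI; the title text and the choice of Manchester as the place are illustrative only.

```python
DCT_TITLE = "http://purl.org/dc/terms/title"
# Hypothetical property URI; the slide names the decision, not the term used.
PUB_PLACE = "http://example.org/def/publicationPlace"

def literal(text: str) -> str:
    """Format a plain string as an N-Triples literal (backslash and quote escaped)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def uri(value: str) -> str:
    """Format a URI as an N-Triples term."""
    return "<" + value + ">"

record = "http://data.copac.ac.uk/id/mods/6947115"
lines = [
    # Title stays a literal: it is just text about the record
    f"{uri(record)} {uri(DCT_TITLE)} {literal('An Example Title')} .",
    # Publication place becomes a URI: a thing others can link to
    f"{uri(record)} {uri(PUB_PLACE)} {uri('http://sws.geonames.org/2643123')} .",
]
```

A literal ends the chain of links, whereas a URI object lets a consumer carry on to whatever else is said about that place.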

Page 60: Linked Data and Locah, UKSG2011

Issues

Page 61: Linked Data and Locah, UKSG2011

Risks

Can you rely on data sources long-term?

Persistence of persistent URIs?

New technologies

Investment of time – unsure of benefits

Licensing issues

Page 62: Linked Data and Locah, UKSG2011

Provenance

Track which data comes from our sources: URIs identify your entities

Linked Data tends towards disassembling

Copac/Hub as trusted sources…is DBPedia (for example) as reliable?

Contributors may want data to be identified

Issues around administrative/biographical history

Benefits of trust?

Users may want to know where data is from

Page 63: Linked Data and Locah, UKSG2011

Licensing

Nature of Linked Data: each triple as a piece of data

‘Ownership’ of data?

Data often already freely available (M2M interfaces)

Page 64: Linked Data and Locah, UKSG2011

Licensing

Public Domain Licences: simple, explicit, and permit widest possible reuse. Waive all rights to the data

BL, British National Bibliography uses public domain licence

Limit commercial uses?

Build in community norms: attribution, share alike - to reinforce desire for acknowledgement

Legal situation?

Page 65: Linked Data and Locah, UKSG2011

Thank You

Page 66: Linked Data and Locah, UKSG2011

Sections of this presentation adapted from materials created by other members of the LOCAH Project

This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/

Attribution and CC licence