intertwingularity, semantic web and linked geo data

Dan Brickley <[email protected] > ‘Semantic Web and linked Geo data’ Geonovum workshop, Wageningen, 2010-10-12 Tuesday, 2 November 2010

Upload: dan-brickley

Post on 19-Jan-2015


DESCRIPTION

A talk given at a GeoNovum workshop in the Netherlands,

TRANSCRIPT

Page 1: Intertwingularity, Semantic Web and linked Geo data

Dan Brickley <[email protected]>

‘Semantic Web and linked Geo data’

Geonovum workshop, Wageningen, 2010-10-12

Tuesday, 2 November 2010

Page 2: Intertwingularity, Semantic Web and linked Geo data

Overview

• historical origins of the Semantic Web initiative

• example of SPARQL querying ‘Linked Data’

• some conclusions and suggestions

A brief introduction to Semantic Web data sharing, focussing on underlying principles.


Page 3: Intertwingularity, Semantic Web and linked Geo data

Part 1: RDF & history


Page 13: Intertwingularity, Semantic Web and linked Geo data

Part 2: SemWeb today

• lessons: no global consistency; Web pages that make claims; intertwingularity...

• what does this mean for modern RDF tools?

• how can we share and link data in the Web, in practice?


Page 14: Intertwingularity, Semantic Web and linked Geo data

over 24.7 billion triples

over 436 million links between datasets


Page 17: Intertwingularity, Semantic Web and linked Geo data

USA

UK


Page 18: Intertwingularity, Semantic Web and linked Geo data

Linked Data guidelines

• 1. Use URIs as names for things (eg. schools!)

• 2. Use HTTP URIs to allow people to get info.

• 3. Publish useful info there (eg. using RDF).

• 4. Include links to other URIs in your data.

see: http://www.w3.org/DesignIssues/LinkedData.html
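Guidelines 2 and 3 can be sketched as an HTTP request that asks for RDF rather than HTML. This is a minimal sketch in Python, assuming the publisher honours content negotiation on the Accept header. The helper name build_rdf_request is hypothetical, and the URI is the local-authority identifier used elsewhere in these slides (it may no longer resolve), so the request is only prepared, not sent.

```python
from urllib.request import Request


def build_rdf_request(uri: str) -> Request:
    """Prepare (but do not send) a GET request that prefers RDF over HTML."""
    # Assumption: the server supports Accept-based content negotiation.
    accept = "application/rdf+xml, text/turtle;q=0.9"
    return Request(uri, headers={"Accept": accept})


req = build_rdf_request(
    "http://statistics.data.gov.uk/id/local-authority-district/00HA"
)
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing req to urllib.request.urlopen would perform the actual fetch.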

Page 19: Intertwingularity, Semantic Web and linked Geo data

RDF/SPARQL example

“Q: Which schools in the BANES area have a nursery?”

prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?name WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:districtAdministrative <http://statistics.data.gov.uk/id/local-authority-district/00HA> ;
          sch-ont:nurseryProvision "true"^^xsd:boolean
}
ORDER BY ?name

examples by Leigh Dodds, Talis: http://blogs.talis.com/n2/archives/818
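A query like this one is typically sent to an endpoint over HTTP, with the query text in a "query" parameter as the SPARQL protocol describes. The sketch below only builds the request. The endpoint is the data.gov.uk education endpoint named in these slides, which may no longer be online; the helper name build_sparql_request and the choice of JSON results are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Endpoint as named in the slides; it may no longer be available.
ENDPOINT = "http://services.data.gov.uk/education/sparql"

QUERY = """
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:districtAdministrative
              <http://statistics.data.gov.uk/id/local-authority-district/00HA> ;
          sch-ont:nurseryProvision "true"^^xsd:boolean .
}
ORDER BY ?name
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL protocol GET request (query passed as ?query=...)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Assumption: the endpoint can return the JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
# Decode the URL again to confirm the query survives the encoding intact.
sent_back = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(sent_back == QUERY)
```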


Page 20: Intertwingularity, Semantic Web and linked Geo data

In RDF “nodes and arcs”:
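The nodes-and-arcs view can be sketched as a set of (subject, predicate, object) triples, with the nursery question answered by matching patterns against them. This is an illustrative in-memory sketch, not how a SPARQL engine is implemented. The school URIs and the second school are hypothetical; the property names are the sch-ont terms from the query.

```python
SCH = "http://education.data.gov.uk/def/school/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BANES = "http://statistics.data.gov.uk/id/local-authority-district/00HA"

# Hypothetical school URIs and data, for illustration only.
triples = {
    ("http://example.org/school/1", RDF_TYPE, SCH + "School"),
    ("http://example.org/school/1", SCH + "establishmentName", "Fosse Way School"),
    ("http://example.org/school/1", SCH + "districtAdministrative", BANES),
    ("http://example.org/school/1", SCH + "nurseryProvision", True),
    ("http://example.org/school/2", RDF_TYPE, SCH + "School"),
    ("http://example.org/school/2", SCH + "establishmentName", "Example School"),
    ("http://example.org/school/2", SCH + "districtAdministrative", BANES),
    ("http://example.org/school/2", SCH + "nurseryProvision", False),
}


def objects(subject, predicate):
    """Follow every arc labelled `predicate` that leaves the `subject` node."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def schools_with_nursery(district):
    """Names of schools in `district` with nursery provision, sorted by name."""
    names = []
    for s, p, o in triples:
        if p == RDF_TYPE and o == SCH + "School":
            in_district = district in objects(s, SCH + "districtAdministrative")
            has_nursery = True in objects(s, SCH + "nurseryProvision")
            if in_district and has_nursery:
                names.extend(objects(s, SCH + "establishmentName"))
    return sorted(names)


print(schools_with_nursery(BANES))
```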


Page 21: Intertwingularity, Semantic Web and linked Geo data

Answer:

Fosse Way School, Fosseway Infant School, Keynsham Primary School, King Edward's School, Midsomer Norton Primary School, Monkton Prep School, Peasedown St John Primary School, Royal High School, Southdown Community Infant School, St Andrew's CofE Primary School, St Keyna Primary School, St Martin's Garden Primary School, St Saviour's CofE Infant School, The Paragon School, Junior School of Prior Park College, Trinity Coe VC Primary, Twerton Infant School...

(according to the SPARQL RDF database at http://services.data.gov.uk/education/sparql)


Page 22: Intertwingularity, Semantic Web and linked Geo data

RDF/XML at http://statistics.data.gov.uk/id/local-authority-district/00HA ...


Page 23: Intertwingularity, Semantic Web and linked Geo data

More SPARQL-able queries from UK linked data:

Select the name, lowest and highest age ranges, capacity and pupil:teacher ratio for all schools in the Bath & North East Somerset district.

What is the uri, name, and opening date of the oldest school in the UK?

Select the name, easting and northing for the 100 newest schools in the UK.

Select the uri, name, and the reason for closing for all schools that are currently scheduled for closure. The reason is a URI from a controlled vocabulary in the ontology.

In which parliamentary constituencies did schools open in 2008?

examples by Leigh Dodds, Talis: http://blogs.talis.com/n2/archives/818


Page 24: Intertwingularity, Semantic Web and linked Geo data

Lessons from part 1

• no global consistency: RDF and SPARQL allow for contradictory, competing data

• semantics: RDF/XML, RDFa, GRDDL - several ways to get RDF statements from a document; several publishing models for RDF in your Web site.

• intertwingularity: “the interconnectedness of all things” as an engineering problem...
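The "no global consistency" lesson can be sketched by keeping each claim together with the source that made it, so that contradictory values sit side by side instead of being rejected. The sources, the property name and the figures below are all hypothetical.

```python
from collections import defaultdict

# Each claim is (subject, predicate, object, source).
# Hypothetical sources and values, for illustration only.
quads = [
    ("http://example.org/school/1", "capacity", 210,
     "http://example.org/source/council"),
    ("http://example.org/school/1", "capacity", 180,
     "http://example.org/source/school-site"),
    ("http://example.org/school/2", "capacity", 90,
     "http://example.org/source/council"),
]


def claims_by_source(subject, predicate):
    """Map each claimed value to the set of sources asserting it."""
    found = defaultdict(set)
    for s, p, o, source in quads:
        if s == subject and p == predicate:
            found[o].add(source)
    return dict(found)


def is_disputed(subject, predicate):
    """True when the sources disagree about the value."""
    return len(claims_by_source(subject, predicate)) > 1


print(claims_by_source("http://example.org/school/1", "capacity"))
print(is_disputed("http://example.org/school/1", "capacity"))
```

Nothing here decides which source is right; that choice is left to whoever consumes the data.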


Page 25: Intertwingularity, Semantic Web and linked Geo data

‘Scope creep’

• “intertwingularity” is a silly name for a serious problem: scope creep

• Schema designers are under constant pressure to change, add, improve their designs. Problems are not tidily packaged.

• RDF is built to survive this: independent schemas and datasets can be freely mixed together, without always ‘asking permission’.


Page 26: Intertwingularity, Semantic Web and linked Geo data

In practice

• Each school could have an HTML/RDFa page (or RDF/XML too)

• Datasets that distinguish institution from location might publish one set of RDF; others that flatten these aspects together can do likewise with their data.

• Cross-dataset consistency comes later, if at all.


Page 27: Intertwingularity, Semantic Web and linked Geo data

Problems don't come nicely scoped and packaged into cleanly distinct domains. Whenever you try to solve one problem, it borders on a dozen others that are a higher priority for people elsewhere.

You think you're working with 'events' data but find yourself with information describing musicians; you think you're describing musicians, but find yourself describing digital images; you think you're describing digital images, but find yourself describing geographic locations; you think you're building a database of geographic locations, and find yourself modeling the opening hours of the businesses based at those locations.

To a poet or idealist, these interconnections might be beautiful or inspiring; to a project manager or product manager, they are as likely to be terrifying.

By dropping in identifiers that link to a big pile of other people's data, we can hopefully make it easier to keep projects nicely scoped without needlessly restricting future functionality.

An events database can remain an events database, but use identifiers for artists and performers, making it possible to filter events by properties of those participants. A database of places can be only a link or two away from records describing the opening hours or business offerings of the things at those places.
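The events example can be sketched as two independently maintained datasets that share only identifiers. The events data stays an events database, and the filter reaches into someone else's artist data through the performer URI. All identifiers, records and field names below are hypothetical.

```python
# Dataset 1: an events database that only stores identifiers for performers.
# Hypothetical records, for illustration only.
events = [
    {"title": "Friday concert", "performer": "http://example.org/artist/a"},
    {"title": "Sunday matinee", "performer": "http://example.org/artist/b"},
]

# Dataset 2: artist descriptions published by someone else (hypothetical).
artists = {
    "http://example.org/artist/a": {"name": "Artist A", "genre": "jazz"},
    "http://example.org/artist/b": {"name": "Artist B", "genre": "folk"},
}


def events_by_genre(genre):
    """Filter events by a property that lives in the artists dataset."""
    return [
        event["title"]
        for event in events
        if artists.get(event["performer"], {}).get("genre") == genre
    ]


print(events_by_genre("jazz"))
```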


Page 28: Intertwingularity, Semantic Web and linked Geo data

“Pay as you go” integration

• there is no single “right” ontology

• data can be mixed and merged ad-hoc

• relations like owl:sameAs, skos:closeMatch can be used to interlink datasets later

• common models emerge from bottom up, “pave the cowpaths...” *

* analogy by Richard Cyganiak
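Interlinking later can be sketched as follows: two datasets describe the same place under different identifiers, and an owl:sameAs link added afterwards lets their descriptions be read as one. This is a simplified sketch that treats owl:sameAs as plain equivalence and ignores the rest of OWL. The datasets, property names and values are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical data from two datasets, plus one link added afterwards.
triples = [
    ("http://example.org/a/example-town", "label", "Example Town"),
    ("http://example.org/b/place/17", "population", 12345),
    ("http://example.org/a/example-town", OWL_SAME_AS,
     "http://example.org/b/place/17"),
]


def make_find(triples):
    """Union-find over owl:sameAs links; returns a node -> canonical-id function."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Keep the lexicographically smaller URI as the canonical one.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    return find


def merged_description(triples, node):
    """All (predicate, object) pairs stated about `node` or any of its aliases."""
    find = make_find(triples)
    target = find(node)
    return {
        (p, o) for s, p, o in triples
        if p != OWL_SAME_AS and find(s) == target
    }


print(sorted(merged_description(triples, "http://example.org/b/place/17"),
             key=str))
```

Without the sameAs triple the two descriptions simply stay separate, which is the "pay as you go" point: the link can be added whenever someone finds it worthwhile.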

Page 29: Intertwingularity, Semantic Web and linked Geo data

Geo questions

• Can GML, KML etc be handled in RDF?

• yes: as links, as textual ‘islands’, or through the spatial-query extensions to SPARQL that some RDF systems provide.

• Which geo-related ontology to use?

• several exist, simple and complex. It depends.

• Is it better to use a common ontology, or capture our data exactly in a custom one?

• you can do both and let others decide.
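As an example of the "simple" end of the ontology range, the sketch below uses the lat and long properties of the W3C Basic Geo (WGS84) vocabulary and a plain bounding-box test. The range comparison stands in for a real spatial-query extension and is not one. The place URIs and coordinates are hypothetical.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical places and coordinates, for illustration only.
triples = [
    ("http://example.org/place/1", GEO + "lat", 51.38),
    ("http://example.org/place/1", GEO + "long", -2.36),
    ("http://example.org/place/2", GEO + "lat", 52.09),
    ("http://example.org/place/2", GEO + "long", 5.12),
]


def points(triples):
    """Collect (lat, long) pairs for every subject that has both properties."""
    coords = {}
    for s, p, o in triples:
        if p == GEO + "lat":
            coords.setdefault(s, {})["lat"] = o
        elif p == GEO + "long":
            coords.setdefault(s, {})["long"] = o
    return {s: (c["lat"], c["long"]) for s, c in coords.items() if len(c) == 2}


def within_box(triples, south, west, north, east):
    """Subjects whose point falls inside the bounding box, sorted by URI."""
    return sorted(
        s for s, (lat, lon) in points(triples).items()
        if south <= lat <= north and west <= lon <= east
    )


print(within_box(triples, south=51.0, west=-3.0, north=52.0, east=-2.0))
```

Richer geometries (GML, KML) would need one of the other routes mentioned above: a link to the geometry file, a textual island, or a store with spatial extensions.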


Page 30: Intertwingularity, Semantic Web and linked Geo data

Suggestions

• Build a Linked Data test-bed with several datasets whose coverage overlaps in scope

• each dataset initially mapped to its own RDF

• experiment with finding common models; schemas/ontologies, and shared identifiers

• evaluate against use cases expressed as SPARQL queries


Page 31: Intertwingularity, Semantic Web and linked Geo data

Conclusions

• The Semantic Web project applies Web ideas to data sharing.

• Linked RDF datasets have different emphasis (eg. geo, schools, politics, events), accuracy and focus.

• Treated properly this is a strength, as it allows the Web of data to grow organically without central control.

• Location-related data is a natural ‘hub’, often mixed with non-geo data. RDF and SPARQL offer Web standards for sharing and querying such mixed data, allowing for decentralised schemas.


Page 32: Intertwingularity, Semantic Web and linked Geo data

Questions?

Credits: original NeXT browser, see http://en.wikipedia.org/wiki/WorldWideWeb

Images: Tim Berners-Lee, Richard Cyganiak, Anja Jentzsch