BBC Linked Data Platform (SemTechBiz San Fran 2013)

Upload: dave-rogers

Post on 08-May-2015

DESCRIPTION

An introduction to the BBC's Linked Data Platform, with occasional dips into the detail of the code, ontologies and queries that make it possible.

TRANSCRIPT

Page 1: BBC Linked Data Platform (SemTechBiz San Fran 2013)

5 June 2013

BBC Linked Data Platform

Using semantic technologies to make our content more connected and more discoverable

Page 2: BBC Linked Data Platform (SemTechBiz San Fran 2013)

A (very) short history

✤ Dynamic Semantic Publishing

✤ BBC Sport - Transition from ‘static’ to ‘dynamic’

✤ Introduction of Semantic Technologies for World Cup 2010

✤ Raising the bar for Olympics 2012

✤ Linked Data Platform & The Creative Work

Page 3: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Olympics 2012

Athletes & Medals: from trackside to our audience

Page 4: BBC Linked Data Platform (SemTechBiz San Fran 2013)

BBC Linked Data Platform

(our logo)

Page 5: BBC Linked Data Platform (SemTechBiz San Fran 2013)

LDP: The Creative Work

Minimal Metadata

Semantically Aggregated Metadata

Triple Store

Website

Triple Store

Mobile Apps

IPTV

Open API

Page 6: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Creative Works

✤ Minimal metadata

✤ Enough non-semantic metadata to support ‘rich links’ in a wide range of applications

✤ Enough semantic metadata (tags) to support discovery through semantic queries

✤ Full metadata requires a content-type-specific metadata API

✤ Access to content requires a content API

Page 7: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Some use-cases

✤ Automated index pages/feeds

✤ Semantic navigation

✤ Semantic search

✤ A typical query:

✤ Top 10, most recent, BBC News Items about Politicians who are members of The Labour Party
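
To make the shape of that typical query concrete, here is a minimal in-memory sketch in Scala. It is not how the platform answers the query (that is done with SPARQL against the triple store); `NewsItem`, `memberOf` and `topRecentAbout` are hypothetical names invented for illustration, and the identifiers in the usage note are made up.

```scala
import java.time.Instant

object TypicalQuerySketch {
  // Hypothetical, simplified stand-in for a Creative Work of type "news item".
  final case class NewsItem(title: String, modified: Instant, about: Set[String])

  // memberOf maps a person's URI to the URI of the party they belong to
  // (an assumption for this sketch; the real data lives in the triple store).
  def topRecentAbout(
      items: Seq[NewsItem],
      memberOf: Map[String, String],
      party: String,
      limit: Int = 10
  ): Seq[NewsItem] = {
    // Step 1: resolve the semantic part ("politicians who are members of ...").
    val members: Set[String] =
      memberOf.collect { case (person, p) if p == party => person }.toSet

    // Step 2: keep items tagged as being about any of those people,
    // newest first, and cut the list off at the requested size.
    items
      .filter(item => item.about.exists(members.contains))
      .sortWith((a, b) => a.modified.isAfter(b.modified))
      .take(limit)
  }
}
```

Calling `TypicalQuerySketch.topRecentAbout(items, memberOf, "urn:example:labour")` returns at most ten items, most recently modified first.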

Page 8: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Powered by LDP

BBC Sport

BBC Music

BBC Olympics 2012

BBC Knowledge & Learning Beta

BBC News Local Beta

BBC Sport Mobile App

Page 9: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Creative Work Ontology

Page 10: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Creative Works in Code

case class CreativeWork(
  locators: Set[Locator],
  title: String,
  modified: DateTime,
  format: Option[FormatType.FormatType] = None,
  created: Option[DateTime] = None,
  uri: Option[String] = None,
  primaryContentOf: List[PrimaryContentOf] = List(),
  about: List[String] = List(),
  mentions: List[String] = List(),
  `type`: CreativeWorkType = CreativeWorkType.CreativeWork,
  provenance: Option[CreativeWorkProvenance] = None,
  thumbnails: List[Thumbnail] = List(),
  audience: Option[AudienceType] = None,
  category: Option[CreativeWorkCategory] = None) {

  private val oneLocatorPerType = locators.groupBy(_.`type`).forall(_._2.size == 1)
  private val allLocatorsDistinct = locators.map(_.uri).size == locators.size

  require(title.trim.isEmpty == false, "Creative Work has an empty title")
  require(title.length <= CreativeWork.MaxTitleLength, "Creative Work title exceeded the maximum length allowed of " + CreativeWork.MaxTitleLength)
  require(oneLocatorPerType, "Creative Work contained multiple Locators of the same type")
  require(allLocatorsDistinct, "Creative Work contained multiple identical Locator URNs")

  def guid = uri.map(_.replace("http://www.bbc.co.uk/things/", "")).map(_.replace("#id", ""))
}

object CreativeWork {
  val Locator = "http://www.bbc.co.uk/ontologies/cms/locator"
  val MaxTitleLength = 300
}

Page 11: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Creative Work Query*

CONSTRUCT {
  ?creativeWork a cwork:CreativeWork ;
    a ?type ;
    cwork:title ?title ;
    cwork:about ?about ;
    cwork:mentions ?mentions ;
    cwork:dateModified ?modified .
  ?about bbc:preferredLabel ?aboutPreferredLabel .
  ?mentions bbc:preferredLabel ?mentionsPrefLabel .
}
WHERE {
  {
    SELECT DISTINCT ?creativeWork
    WHERE {
      {{#about}}
        FILTER (?about = <{{about}}>) .
        ?creativeWork cwork:about ?about .
      {{/about}}
      {{#mentions}}
        FILTER (?mentions = <{{mentions}}>) .
        ?creativeWork cwork:mentions ?mentions .
      {{/mentions}}
      ?creativeWork a cwork:CreativeWork ;
        a ?type ;
        cwork:title ?title ;
        cwork:dateModified ?modified .
    }
    ORDER BY DESC(?modified)
    LIMIT 10
    {{#offset}}OFFSET {{offset}}{{/offset}}
  }
  ?creativeWork a cwork:CreativeWork .
  {
    ?creativeWork a cwork:CreativeWork ;
      a ?type ;
      cwork:title ?title ;
      cwork:dateModified ?modified .
    { ?type rdfs:subClassOf cwork:CreativeWork . }
    UNION
    {
      OPTIONAL {
        ?creativeWork cwork:about ?about .
        OPTIONAL { ?about rdfs:label ?aboutLabel . }
        OPTIONAL { ?about bbc:preferredLabel ?aboutPreferredLabel . }
      }
      OPTIONAL {
        ?creativeWork cwork:mentions ?mentions .
        OPTIONAL { ?mentions rdfs:label ?mentionsLabel . }
        OPTIONAL { ?mentions bbc:preferredLabel ?mentionsPrefLabel . }
      }
    }
  }
}

*Simplified

SPARQL CONSTRUCT

Inner SELECT

Parameterisation

Pagination

Mustache-templated
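
The parameterisation and pagination labels refer to the `{{#name}}...{{/name}}` sections and `{{name}}` variables in the query. The following is a rough sketch of how such optional sections behave, using a tiny hand-rolled renderer rather than a real Mustache library; `QueryTemplateSketch` and `render` are hypothetical names, and the sketch ignores nested sections, lists and the other rules of the Mustache specification.

```scala
import java.util.regex.Matcher

object QueryTemplateSketch {
  // A section such as {{#offset}}OFFSET {{offset}}{{/offset}}; (?s) lets the body span lines.
  private val Section = """(?s)\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}""".r
  // A plain variable such as {{offset}}.
  private val Variable = """\{\{(\w+)\}\}""".r

  // Values are substituted verbatim: a real service would need to validate
  // them (e.g. check that a URI is well formed) before building a query.
  def render(template: String, params: Map[String, String]): String = {
    // Keep a section's body only when its parameter was supplied.
    val withSections = Section.replaceAllIn(template, m =>
      Matcher.quoteReplacement(if (params.contains(m.group(1))) m.group(2) else ""))
    // Then fill in the remaining variables.
    Variable.replaceAllIn(withSections, m =>
      Matcher.quoteReplacement(params.getOrElse(m.group(1), "")))
  }
}
```

With this sketch, rendering the pagination fragment with `Map("offset" -> "20")` keeps the `OFFSET` clause, while rendering it with an empty map drops the clause entirely, which is how one template can serve both the first page and later pages.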

Page 12: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Our principal challenge:

Data Management

Page 13: BBC Linked Data Platform (SemTechBiz San Fran 2013)

4 Kinds of Data

✤ Creative Works

✤ Reference Data, managed in sets (Datasets)

✤ Reference Data, managed individually (Resources)

✤ Ontologies

Page 14: BBC Linked Data Platform (SemTechBiz San Fran 2013)

99.99% Availability

Page 15: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Our own URIs

✤ Everything has a ‘Thing URI’:

✤ http://www.bbc.co.uk/things/{GUID}#id

✤ Opaque ID, dereferenceable*

✤ BBC controls identity, therefore quality & consistency

✤ bbc:sameAs to DBpedia, Wikidata, Freebase etc

*coming soon
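
As a small illustration of the Thing URI pattern, the sketch below converts between a GUID and its URI. It assumes the lowercase `#id` fragment that the `guid` method on Page 10 strips; `ThingUriSketch`, `toUri` and `toGuid` are hypothetical names, not part of the platform's code.

```scala
object ThingUriSketch {
  val Prefix = "http://www.bbc.co.uk/things/"
  val Fragment = "#id" // assumption: lowercase, matching the guid method on Page 10

  // Build the Thing URI for an opaque GUID.
  def toUri(guid: String): String = Prefix + guid + Fragment

  // Recover the GUID, or None when the string is not shaped like a Thing URI.
  def toGuid(uri: String): Option[String] =
    if (uri.startsWith(Prefix) && uri.endsWith(Fragment) &&
        uri.length > Prefix.length + Fragment.length)
      Some(uri.substring(Prefix.length, uri.length - Fragment.length))
    else None
}
```

Unlike a plain string replace, `toGuid` returns `None` for URIs minted elsewhere, which keeps externally sourced identifiers (for example the targets of bbc:sameAs links) from being mistaken for BBC GUIDs.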

Page 16: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Our own ontologies

✤ Core set of ontologies that are BBC owned

✤ Creative Work, BBC, (Organisational) Provenance, etc

✤ Ability to change regularly and unilaterally

✤ Provide ‘mappings’ to more widely used ontologies (e.g. Schema.org)

✤ Domain ontologies can be shared or reused

✤ Sport, Politics, GeoLocation, etc

Page 17: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Open data

✤ Provided through Mashery

✤ ‘Connected Studio’ events will validate our API

✤ Public beta to follow

✤ JSON-LD & Turtle

✤ Future

✤ Self-provisioned, cloud-based triple stores

✤ Data Dumps

Page 18: BBC Linked Data Platform (SemTechBiz San Fran 2013)

The Hard Problems...

Page 19: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Managing concepts across BBC

✤ Which domain ‘owns’ Arnold Schwarzenegger?

✤ News? Entertainment? History? Politics?

✤ Can domains ‘own’ predicates?

✤ Layering information over shared concepts

✤ High quality sub-sets vs. lower quality ‘long-tail’

✤ Synchronisation with external datasets

✤ Tools for creating and managing concepts

✤ Emerging, splitting & combining concepts

✤ Linked Data gives us a language to solve these problems

Page 20: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Metadata

Often subjective, never complete

✤ What is this TV programme about?

✤ Manual tag curation

✤ Subjective

✤ Long-term expense

✤ Inconsistent

✤ Automated tag generation

✤ Short-term expense

✤ Value in data or algorithm?

✤ Complex

✤ Relies on assumptions

✤ Our approach? Invest in both. Validate learnings.

Page 21: BBC Linked Data Platform (SemTechBiz San Fran 2013)

When to reason?

✤ Our options...

✤ Before writing to the triple store

✤ Materialised in the triple store (Forward-chaining inference)

✤ Inferred by the SPARQL engine (Backward-chaining inference)

✤ After SPARQL results have returned

✤ None/some/all of the above
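
To illustrate what materialising inferences means (whether done before writing or by forward-chaining in the store), here is a minimal sketch that applies a single RDFS rule to a set of triples until nothing new appears. It is a toy under stated assumptions: triples are plain string tuples, only the subclass rule is handled, and `ForwardChainingSketch` and `materialise` are hypothetical names, not the platform's implementation.

```scala
object ForwardChainingSketch {
  type Triple = (String, String, String) // (subject, predicate, object)
  val RdfType = "rdf:type"
  val SubClassOf = "rdfs:subClassOf"

  // Rule applied: (?x rdf:type ?c) and (?c rdfs:subClassOf ?d) => (?x rdf:type ?d).
  // Repeats until a fixed point is reached, so chains of subclasses are followed.
  @annotation.tailrec
  def materialise(triples: Set[Triple]): Set[Triple] = {
    val inferred = for {
      t1 <- triples if t1._2 == RdfType
      t2 <- triples if t2._2 == SubClassOf && t2._1 == t1._3
    } yield (t1._1, RdfType, t2._3)
    val next = triples ++ inferred
    if (next == triples) triples else materialise(next)
  }
}
```

Materialising up front makes queries simpler and faster at the cost of more stored triples and more work on every write; backward-chaining makes the opposite trade.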

Page 22: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Maturity of Semantic Tech

✤ From a Software Industry perspective, Semantic (RDF) Technology is not mainstream and is therefore hard to sell

✤ Library/application immaturity can be a hindrance to innovation

✤ I believe the Sem Tech industry needs to focus on simplicity and abstraction

✤ Semantic Technology is complex, but using it need not be