BBC Linked Data Platform (SemTechBiz San Francisco 2013)
DESCRIPTION
An introduction to the BBC's Linked Data Platform, with occasional dips into the detail of the code, ontologies and queries that make it possible.
TRANSCRIPT
5 June 2013
BBC Linked Data Platform
Using semantic technologies to make our content more connected and more discoverable
A (very) short history
✤ Dynamic Semantic Publishing
✤ BBC Sport - Transition from ‘static’ to ‘dynamic’
✤ Introduction of Semantic Technologies for World Cup 2010
✤ Raising the bar for Olympics 2012
✤ Linked Data Platform & The Creative Work
Olympics 2012
Athletes & Medals: from trackside to our audience
BBC Linked Data Platform
(our logo)
LDP: The Creative Work
[Diagram labels: Minimal Metadata; Semantically Aggregated Metadata; Triple Store; Website; Mobile Apps; IPTV; Open API]
Creative Works
✤ Minimal metadata
✤ Enough non-semantic metadata to support ‘rich links’ in a wide range of applications
✤ Enough semantic metadata (tags) to support discovery through semantic queries
✤ Full metadata requires a content-type-specific metadata API
✤ Access to content requires a content API
Some use-cases
✤ Automated index pages/feeds
✤ Semantic navigation
✤ Semantic search
✤ A typical query:
✤ Top 10, most recent, BBC News Items about Politicians who are members of The Labour Party
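To show the shape of that typical query without a triple store, here is a minimal in-memory sketch. It assumes a deliberately simplified model: `Thing`, `Work`, `TypicalQuery` and the type and party identifiers are hypothetical names chosen for illustration, not the LDP schema. The platform itself answers this kind of question with SPARQL.

```scala
import java.time.Instant

// Hypothetical, simplified model; names are illustrative, not LDP's schema.
final case class Thing(uri: String, types: Set[String], memberOf: Set[String])
final case class Work(title: String, workType: String, modified: Instant, about: Set[String])

object TypicalQuery {
  // "Top N, most recent, works of a given type about things of a given
  // type that are members of a given organisation."
  def topRecentAbout(
      works: Seq[Work],
      things: Map[String, Thing],
      workType: String,
      thingType: String,
      organisation: String,
      limit: Int = 10): Seq[Work] =
    works
      .filter(_.workType == workType)
      .filter(_.about.exists { uri =>
        // Keep the work if any tagged thing matches both conditions.
        things.get(uri).exists(t => t.types(thingType) && t.memberOf(organisation))
      })
      // Newest first.
      .sortBy(_.modified.toEpochMilli)(Ordering[Long].reverse)
      .take(limit)
}
```

Called with a work type of news item, a thing type of politician and the Labour Party as the organisation, this returns at most ten matching works, newest first.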
Powered by LDP
BBC Sport
BBC Music
BBC Olympics 2012
BBC Knowledge & Learning Beta
BBC News Local Beta
BBC Sport Mobile App
Creative Work Ontology
Creative Works in Code
case class CreativeWork(
  locators: Set[Locator],
  title: String,
  modified: DateTime,
  format: Option[FormatType.FormatType] = None,
  created: Option[DateTime] = None,
  uri: Option[String] = None,
  primaryContentOf: List[PrimaryContentOf] = List(),
  about: List[String] = List(),
  mentions: List[String] = List(),
  `type`: CreativeWorkType = CreativeWorkType.CreativeWork,
  provenance: Option[CreativeWorkProvenance] = None,
  thumbnails: List[Thumbnail] = List(),
  audience: Option[AudienceType] = None,
  category: Option[CreativeWorkCategory] = None) {

  // Invariants checked when a CreativeWork is constructed
  private val oneLocatorPerType = locators.groupBy(_.`type`).forall(_._2.size == 1)
  private val allLocatorsDistinct = locators.map(_.uri).size == locators.size

  require(title.trim.nonEmpty, "Creative Work has an empty title")
  require(title.length <= CreativeWork.MaxTitleLength,
    "Creative Work title exceeded the maximum length allowed of " + CreativeWork.MaxTitleLength)
  require(oneLocatorPerType, "Creative Work contained multiple Locators of the same type")
  require(allLocatorsDistinct, "Creative Work contained multiple identical Locator URNs")

  // The GUID is the Thing URI with its prefix and "#id" fragment stripped
  def guid = uri.map(_.replace("http://www.bbc.co.uk/things/", "")).map(_.replace("#id", ""))
}

object CreativeWork {
  val Locator = "http://www.bbc.co.uk/ontologies/cms/locator"
  val MaxTitleLength = 300
}
Creative Work Query*
CONSTRUCT {
  ?creativeWork a cwork:CreativeWork ;
    a ?type ;
    cwork:title ?title ;
    cwork:about ?about ;
    cwork:mentions ?mentions ;
    cwork:dateModified ?modified .
  ?about bbc:preferredLabel ?aboutPreferredLabel .
  ?mentions bbc:preferredLabel ?mentionsPrefLabel .
}
WHERE {
  {
    SELECT DISTINCT ?creativeWork
    WHERE {
      {{#about}}
        FILTER (?about = <{{about}}>) .
        ?creativeWork cwork:about ?about .
      {{/about}}
      {{#mentions}}
        FILTER (?mentions = <{{mentions}}>) .
        ?creativeWork cwork:mentions ?mentions .
      {{/mentions}}
      ?creativeWork a cwork:CreativeWork ;
        a ?type ;
        cwork:title ?title ;
        cwork:dateModified ?modified .
    }
    ORDER BY DESC(?modified)
    LIMIT 10
    {{#offset}}OFFSET {{offset}}{{/offset}}
  }
  ?creativeWork a cwork:CreativeWork .
  {
    ?creativeWork a cwork:CreativeWork ;
      a ?type ;
      cwork:title ?title ;
      cwork:dateModified ?modified .
    {
      ?type rdfs:subClassOf cwork:CreativeWork .
    }
    UNION
    {
      OPTIONAL {
        ?creativeWork cwork:about ?about .
        OPTIONAL { ?about rdfs:label ?aboutLabel . }
        OPTIONAL { ?about bbc:preferredLabel ?aboutPreferredLabel . }
      }
      OPTIONAL {
        ?creativeWork cwork:mentions ?mentions .
        OPTIONAL { ?mentions rdfs:label ?mentionsLabel . }
        OPTIONAL { ?mentions bbc:preferredLabel ?mentionsPrefLabel . }
      }
    }
  }
}
*Simplified
SPARQL CONSTRUCT
Inner SELECT
Parameterisation
Pagination
Mustache-templated
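As a rough illustration of the templating and pagination steps only, the sketch below hand-rolls just enough Mustache-style substitution to switch optional sections such as {{#offset}}...{{/offset}} on and off. `QueryTemplate` and `render` are hypothetical names, not LDP code. A real service would more likely use a full Mustache library and validate any URI before placing it in a query.

```scala
import scala.util.matching.Regex

object QueryTemplate {
  // Matches a Mustache-style section: {{#name}} body {{/name}}
  private val Section: Regex = """(?s)\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}""".r
  // Matches a plain variable: {{name}}
  private val Variable: Regex = """\{\{(\w+)\}\}""".r

  // Keeps a section only when its parameter is supplied, then fills in
  // the remaining variables. Nested sections are not supported.
  def render(template: String, params: Map[String, String]): String = {
    val withSections = Section.replaceAllIn(template, m =>
      if (params.contains(m.group(1))) Regex.quoteReplacement(m.group(2)) else "")
    Variable.replaceAllIn(withSections, m =>
      Regex.quoteReplacement(params.getOrElse(m.group(1), "")))
  }
}
```

Rendering with `Map("offset" -> "20")` keeps the OFFSET clause and rendering with an empty map drops it, so one template can serve the first page and every later page.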
Our principal challenge:
Data Management
4 Kinds of Data
✤ Creative Works
✤ Reference Data, managed in sets (Datasets)
✤ Reference Data, managed individually (Resources)
✤ Ontologies
99.99% Availability
Our own URIs
✤ Everything has a ‘Thing URI’:
✤ http://www.bbc.co.uk/things/{GUID}#ID
✤ Opaque ID, dereferenceable*
✤ BBC controls identity, therefore quality & consistency
✤ bbc:sameAs to DBpedia, Wikidata, Freebase etc
*coming soon
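The Thing URI pattern can be sketched as a pair of helper functions. This is a minimal sketch, assuming the lowercase "#id" fragment that the `guid` method in the CreativeWork code strips. `ThingUri`, `fromGuid` and `toGuid` are hypothetical names introduced here for illustration.

```scala
object ThingUri {
  private val Prefix = "http://www.bbc.co.uk/things/"
  // Assumption: fragment written as "#id", as in the CreativeWork.guid code.
  private val Suffix = "#id"

  // Builds a Thing URI from an opaque GUID.
  def fromGuid(guid: String): String = Prefix + guid + Suffix

  // Recovers the GUID, or None when the URI does not follow the pattern.
  def toGuid(uri: String): Option[String] =
    if (uri.startsWith(Prefix) && uri.endsWith(Suffix))
      Some(uri.substring(Prefix.length, uri.length - Suffix.length))
    else None
}
```

Returning an `Option` from `toGuid` makes a URI that is not one of these URIs visible to the caller instead of silently returning the input unchanged.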
Our own ontologies
✤ Core set of ontologies that are BBC owned
✤ Creative Work, BBC, (Organisational) Provenance, etc
✤ Ability to change regularly and unilaterally
✤ Provide ‘mappings’ to more widely used ontologies (e.g. Schema.org)
✤ Domain ontologies can be shared or reused
✤ Sport, Politics, GeoLocation, etc
Open data
✤ Provided through Mashery
✤ ‘Connected Studio’ events will validate our API
✤ Public beta to follow
✤ JSON-LD & Turtle
✤ Future
✤ Self-provisioned, cloud-based triple stores
✤ Data Dumps
The Hard Problems...
Managing concepts across BBC
✤ Which domain ‘owns’ Arnold Schwarzenegger?
✤ News? Entertainment? History? Politics?
✤ Can domains ‘own’ predicates?
✤ Layering information over shared concepts
✤ High quality sub-sets vs. lower quality ‘long-tail’
✤ Synchronisation with external datasets
✤ Tools for creating and managing concepts
✤ Emerging, splitting & combining concepts
✤ Linked Data gives us a language to solve these problems
Metadata
Often subjective, never complete
✤ What is this TV programme about?
✤ Manual tag curation
✤ Subjective
✤ Long-term expense
✤ Inconsistent
✤ Automated tag generation
✤ Short-term expense
✤ Value in data or algorithm?
✤ Complex
✤ Relies on assumptions
✤ Our approach? Invest in both. Validate learnings.
When to reason?
✤ Our options...
✤ Before writing to the triple store
✤ Materialised in the triple store (Forward-chaining inference)
✤ Inferred by the SPARQL engine (Backward-chaining inference)
✤ After SPARQL results have returned
✤ None/some/all of the above
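To make the forward-chaining option concrete, here is a minimal in-memory sketch. It assumes a single rule (if x has type C and C is a subclass of D, then x has type D) applied until no new triples appear. `Triple`, `ForwardChaining` and the prefixed names used as plain strings are illustrative assumptions. A triple store's rule engine does this at scale with a much richer rule set.

```scala
import scala.annotation.tailrec

final case class Triple(s: String, p: String, o: String)

object ForwardChaining {
  val Type = "rdf:type"
  val SubClassOf = "rdfs:subClassOf"

  // Repeatedly applies the subclass rule and adds the inferred triples
  // to the set, stopping when an iteration adds nothing new (a fixpoint).
  @tailrec
  def materialise(triples: Set[Triple]): Set[Triple] = {
    val inferred = for {
      t  <- triples if t.p == Type
      sc <- triples if sc.p == SubClassOf && sc.s == t.o
    } yield Triple(t.s, Type, sc.o)
    val next = triples ++ inferred
    if (next == triples) triples else materialise(next)
  }
}
```

The trade-off this illustrates is that materialised triples cost storage and write-time work but keep queries simple, whereas backward-chaining leaves the stored data small and moves the same work to query time.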
Maturity of Semantic Tech
✤ From a Software Industry perspective, Semantic (RDF) Technology is not mainstream and is therefore hard to sell
✤ Library/application immaturity can be a hindrance to innovation
✤ I believe the Sem Tech industry needs to focus on simplicity and abstraction
✤ Semantic Technology is complex, but using it need not be
Find out more
✤ Video from QCon London 2013:
✤ http://www.infoq.com/presentations/bbc-data-platform-api
✤ BBC Internet Blog:
✤ http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content
✤ @daverog