Information intermediaries for government linked data
Dave Reynolds, Epimorphics Ltd
Governments around the world are releasing data
Why?
transparency, openness: it’s public data
tap the creativity and enthusiasm of web developers
stimulate applications for citizens & commerce
track crime in your area
understand where funding is going
plan travel
choose a school
Theme for this talk
how to accelerate this uptake?
reduce the cost of exploiting public data?
stimulate an ecosystem of value-added services?
Outline
data dumps and information intermediaries
linked data approach
intermediaries for a linked data world
Traditional publication approach: data dumps
publish individual datasets, typically CSV
easy for the publisher
consumer has complete control
no complex formats or query languages
manage data as they want to
familiar technology stack
growing set of intermediaries: web services to help you work with datasets
not specific to public sector data
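With a data dump, everything after the download happens in the consumer's own stack: parse the CSV, convert types, filter, join. A minimal Python sketch of that work; the column names (name, district, capacity) and the rows are hypothetical, made up for illustration rather than taken from any published dataset.

```python
import csv
import io

# Hypothetical CSV dump: the column names and rows are made up for illustration.
DUMP = """name,district,capacity
School A,Oxford,1250
School B,Oxford,900
School C,Othertown,1400
"""


def large_schools(csv_text: str, district: str, min_capacity: int) -> list:
    """Filter a locally held CSV dump: parsing, typing and filtering are all the consumer's job."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["name"]
        for row in reader
        if row["district"] == district and int(row["capacity"]) >= min_capacity
    ]


print(large_schools(DUMP, "Oxford", 1200))  # ['School A']
```

Every application that needs the same answer repeats this parsing and filtering against its own copy of the dump.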
Intermediary services
Discovery
  Features: metadata search; faceted search; social annotation; aggregation across repositories
  Examples: CKAN, Numbrary, Guardian data store, Socrata, Infochimps, Factual
API access
  Features: programmatic access; query support; RESTful access API; multiple formats
  Examples: Factual, Google spreadsheets (e.g. Guardian data store)
Data model
  Features: access to the data model, schema or ontology
  Examples: Factual
Intermediary services (continued)
Data exploration
  Features: table views; slice, dice, drill down
  Examples: Swivel, Factual, Socrata
Visualization & comparison
  Features: interactive charting; graph one set against another
  Examples: Swivel, ManyEyes, Google public data explorer
Embeddable views
  Features: static embeddable charts/graphs; embeddable interactive widgets
  Examples: Swivel, Factual
Intermediary services (continued)
Data quality
  Features: ability to correct data; provenance tracking for corrections
  Examples: Factual
  [Several support social annotations]
Commerce support
  Features: marketplace for datasets; bids and offers; pay per set (pay per use?)
  Examples: Infochimps, Microsoft
Limitations to data dumps
silo design pattern
each application does its own data integration
hard to share or reuse efforts between applications
static local stores, which require management and update
(Image credit: http://www.flickr.com/photos/zoomzoom/)
Linked data: public sector data web
How:
URIs to identify the things described
URIs dereference to RDF (& other formats)
SPARQL endpoints for query
vocabularies and patterns for statistics, versioning, provenance ...
standard URI sets: time periods, regions, departments, schools ...
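"URIs dereference to RDF (& other formats)" can be illustrated with the /id/ and /doc/ paths that appear in URIs later in this talk (e.g. http://education.data.gov.uk/id/school/123242). A minimal sketch, assuming a server that answers a request for a thing's /id/ URI with a 303 redirect to a /doc/ URI whose suffix follows the Accept header; the suffix table and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical media-type -> file-suffix table for the /doc/ representation.
SUFFIXES = {
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
    "application/json": "json",
    "text/html": "html",
}


def negotiate(accept: str) -> str:
    """Pick the first supported media type in an Accept header (q-values ignored)."""
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in SUFFIXES:
            return SUFFIXES[media_type]
    return "html"  # fall back to a human-readable page


def redirect_target(thing_uri: str, accept: str) -> str:
    """Map an /id/ URI (the thing) to the /doc/ URI of a document describing it."""
    scheme, host, path, _query, _fragment = urlsplit(thing_uri)
    if not path.startswith("/id/"):
        raise ValueError("not an /id/ URI: " + thing_uri)
    doc_path = "/doc/" + path[len("/id/"):] + "." + negotiate(accept)
    return urlunsplit((scheme, host, doc_path, "", ""))


# A server would answer GET <thing URI> with "303 See Other" and this Location.
print(redirect_target("http://education.data.gov.uk/id/school/123242", "text/turtle"))
# http://education.data.gov.uk/doc/school/123242.ttl
```

Keeping the thing's URI and the document's URI distinct lets the same identifier be linked to from other datasets while the representation is chosen per request.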
Public sector data web
[Diagram: linked datasets and URI sets: Schools, Time Periods, Gov. Bodies, Admin Geography, Edubase, Ofsted, DCSF]
Benefits of linked data approach
integrated (linked!) data
standard identifiers enable linking other sets
seed connections between third-party sets
fine-grained addressing of data
annotations (e.g. provenance)
fine-grained programmatic access
consume live or cache, not forced to use static data
model directly linked from data
But ... barrier to entry too high: “just give us CSV”
alien data model
alien query methods
alien representation formats
overall mismatch with the typical web developer toolkit
Solution
middleware to provide web-friendly access
run at the publisher end or as an intermediary
publish as linked data -> automatic API
configure automatically from the ontology
customize the configuration (e.g. URI patterns) if needed
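One piece of "configure automatically from the ontology" is naming: the JSON output later in this talk uses short property names such as schoolCapacity, districtAdministrative and label, which look like the local names of the vocabulary's property URIs. A minimal sketch, assuming short names default to the URI local name and can be overridden by hand-written configuration; the function names are hypothetical, and the schoolCapacity property URI is assumed for illustration.

```python
import re
from typing import Dict, List, Optional


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", uri)[-1]


def short_names(property_uris: List[str],
                overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map property URIs to JSON-friendly short names.

    Defaults to each URI's local name; entries in `overrides` (standing in for
    hand-customized configuration) take precedence. Clashing names are rejected.
    """
    overrides = overrides or {}
    names: Dict[str, str] = {}
    for uri in property_uris:
        name = overrides.get(uri, local_name(uri))
        if name in names.values():
            raise ValueError(f"short name {name!r} is ambiguous; add an override")
        names[uri] = name
    return names


# Assumed property URIs, for illustration only.
props = [
    "http://education.data.gov.uk/def/school/schoolCapacity",
    "http://www.w3.org/2000/01/rdf-schema#label",
]
print(short_names(props))
```

Rejecting clashes rather than silently picking one forces the customization step only where the automatic default is ambiguous.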
Linked data API
Access
RESTful API design
serve lists of resources or individual resources
automatic sorting and paging of lists
simple web API to control filtering and viewing
Formatting
developer-friendly JSON & XML
retain the resource-centric model
remove round-tripping requirements
rooted graph
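Automatic paging appears in the JSON example later in this talk as page metadata (_about, first, isPartOf, page, pageSize) and, in the generated SPARQL, as OFFSET/LIMIT. A minimal sketch deriving both from a 0-based _page parameter; the prev/next link names and the "a full page means there may be more" rule are assumptions for illustration, not taken from the specification.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url: str, **params: object) -> str:
    """Return `url` with the given query parameters set, replacing any existing values."""
    scheme, host, path, query, fragment = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(query) if k not in params]
    pairs += [(k, str(v)) for k, v in params.items()]
    return urlunsplit((scheme, host, path, urlencode(pairs), fragment))


def limit_offset(page: int, page_size: int) -> str:
    """SPARQL paging clause for a 0-based page number."""
    return f"OFFSET {page * page_size} LIMIT {page_size}"


def page_metadata(list_url: str, page: int, page_size: int, items_returned: int) -> dict:
    """Page description: links to this, first, previous and next pages."""
    meta = {
        "_about": with_params(list_url, _page=page),
        "first": with_params(list_url, _page=0),
        "isPartOf": list_url,
        "page": page,
        "pageSize": page_size,
    }
    if page > 0:
        meta["prev"] = with_params(list_url, _page=page - 1)
    if items_returned == page_size:  # a full page suggests more results may follow
        meta["next"] = with_params(list_url, _page=page + 1)
    return meta


print(limit_offset(0, 10))  # OFFSET 0 LIMIT 10
```

Because the paging links are ordinary URLs, a client can walk a list with plain HTTP GETs and never see the SPARQL.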
Structure
[Diagram: API specification and vocabulary of the data set; data source (SPARQL endpoint); an endpoint that takes a request such as GET /doc/schools/district/Oxford.json?min-capacity=1200 through a selector (SELECT ?item WHERE { ... }), a viewer (DESCRIBE <x> <y>) and a formatter to produce the response; a cache]
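The structure separates an endpoint's selector (which picks the items with a SELECT) from its viewer (which fetches their descriptions with a DESCRIBE) and its formatter. A minimal Python sketch of that wiring, assuming hypothetical run_select / run_describe callables in place of a real SPARQL endpoint and a plain dict as the cache; it is not taken from any implementation of the specification.

```python
import json
from typing import Callable, Dict, List


def viewer_query(item_uris: List[str]) -> str:
    """Build a DESCRIBE query for exactly the items the selector chose."""
    return "DESCRIBE " + " ".join(f"<{uri}>" for uri in item_uris)


class Endpoint:
    """Hypothetical wiring of one API endpoint: selector -> viewer -> formatter, plus a cache.

    `run_select` and `run_describe` stand in for calls to a SPARQL endpoint.
    """

    def __init__(self, select_query: str,
                 run_select: Callable[[str], List[str]],
                 run_describe: Callable[[str], dict]) -> None:
        self.select_query = select_query
        self.run_select = run_select
        self.run_describe = run_describe
        self.cache: Dict[str, str] = {}

    def handle(self, request_key: str) -> str:
        if request_key in self.cache:  # repeated requests skip the data source entirely
            return self.cache[request_key]
        items = self.run_select(self.select_query)  # selector
        description = self.run_describe(viewer_query(items)) if items else {}  # viewer
        body = json.dumps({"items": items, "view": description})  # formatter (JSON only here)
        self.cache[request_key] = body
        return body
```

Splitting selection from viewing keeps the SELECT small (just item URIs, easy to page) while the DESCRIBE can fetch as much or as little of each item as the chosen view needs.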
Operation
1. Match endpoint: /doc/schools/district/Oxford.json?min-capacity=1200 matches /doc/schools/district/{d}
2. Retrieve matches:
   SELECT ?r WHERE {
     ?r a school:School;
        school:district [rdfs:label 'Oxford'];
        school:capacity ?c .
     FILTER (?c >= 1200)
   } OFFSET 0 LIMIT 10
3. Build response: a list with metadata (query and configuration), links to pages N-1, N and N+1, and the matched items (school i)
4. Select format: JSON
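The first two operation steps (match endpoint, retrieve matches) can be sketched by compiling the endpoint's URI template into a regex, binding {d}, and generating the selector query. A minimal sketch hard-wired to this schools example; the template handling, the min-capacity to FILTER rule and the function names are assumptions, and escaping is minimal (a real service must guard against SPARQL injection).

```python
import re
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlsplit


def template_regex(template: str):
    """Compile a URI template such as /doc/schools/district/{d} into a regex.

    Each {name} becomes a named group; an optional .format suffix is captured too.
    """
    parts = re.split(r"\{(\w+)\}", template)
    # With one capture group, re.split alternates literal text and variable names.
    pattern = "".join(
        f"(?P<{part}>[^/.]+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(pattern + r"(?:\.(?P<_format>\w+))?$")


def match_endpoint(url: str, template: str) -> Optional[Dict]:
    """Return variable bindings and query parameters if `url` matches `template`."""
    split = urlsplit(url)
    m = template_regex(template).match(split.path)
    if m is None:
        return None
    bindings = {k: v for k, v in m.groupdict().items() if v is not None}
    bindings["params"] = dict(parse_qsl(split.query))
    return bindings


def sparql_literal(text: str) -> str:
    """Quote a string as a SPARQL literal, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def selector_query(district: str, params: Dict[str, str],
                   page: int = 0, page_size: int = 10) -> str:
    """Hard-wired to the schools example: min-capacity becomes a FILTER on ?c."""
    where = (
        "?r a school:School; "
        f"school:district [rdfs:label {sparql_literal(district)}]; "
        "school:capacity ?c . "
    )
    filt = ""
    if "min-capacity" in params:
        filt = f"FILTER (?c >= {int(params['min-capacity'])})"
    return f"SELECT ?r WHERE {{ {where}{filt}}} OFFSET {page * page_size} LIMIT {page_size}"


b = match_endpoint("/doc/schools/district/Oxford.json?min-capacity=1200",
                   "/doc/schools/district/{d}")
print(selector_query(b["d"], b["params"]))
```

The captured .json suffix is what the final step ("select format") would use to choose a formatter.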
JSON serialization (excerpt)
"results":[
  {
    "_about":"http://.../district/Oxford?min-schoolCapacity=1200&_page=0",
    "first":"http://.../district/Oxford?&min-schoolCapacity=1200&_page=0",
    "isPartOf":"http://.../district/Oxford?&min-schoolCapacity=1200",
    "page":0,
    "pageSize":10,
    "type":"http://www.epimorphics.com/vocabularies/api#Page",
    "contains":[
      {
        "_about":"http://education.data.gov.uk/id/school/123242",
        "label":"Peers School",
        "districtAdministrative":{
          "_about":"http://statistics.data.gov.uk/id/local-authority-district/38UC",
          "label":"Oxford" },
        "phaseOfEducation":{
          "_about":"http://education.data.gov.uk/def/school/PhaseOfEducation_Secondary",
          "label":"Secondary" },
        "schoolCapacity":1220,
        "type":[
          {
            "_about":"http://education.data.gov.uk/def/school/School",
            "label":"School" } ]
      }, ...
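The serialization is a rooted graph: each item is a JSON object, and linked resources such as the district appear as nested objects carrying _about and label, so a developer can read districtAdministrative.label directly. A minimal sketch of that nesting step, assuming triples whose property short names are already resolved and treating any string that is itself a described subject as a resource; the function name is hypothetical, and the output approximates rather than reproduces the serialization above (for instance, the real output shows type as a list even with one value).

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, object]


def rooted_json(root: str, triples: List[Triple], depth: int = 2) -> dict:
    """Render the part of a graph reachable from `root` as nested, JSON-ready dicts.

    A string object that is itself a described subject is treated as a resource
    and nested (up to `depth` levels, cycle-safe). Single values become scalars,
    repeated values become lists.
    """
    by_subject: Dict[str, List[Tuple[str, object]]] = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    def render(node: str, remaining: int, seen: FrozenSet[str]) -> dict:
        out: dict = {"_about": node}
        values: Dict[str, list] = defaultdict(list)
        for p, o in by_subject.get(node, []):
            described = isinstance(o, str) and o in by_subject
            if described and remaining > 0 and o not in seen:
                values[p].append(render(o, remaining - 1, seen | {o}))
            elif described:
                values[p].append({"_about": o})  # depth limit or cycle: reference only
            else:
                values[p].append(o)  # literal (or undescribed URI)
        for p, vs in values.items():
            out[p] = vs[0] if len(vs) == 1 else vs
        return out

    return render(root, depth, frozenset({root}))


# Values taken from the JSON excerpt above.
school = "http://education.data.gov.uk/id/school/123242"
district = "http://statistics.data.gov.uk/id/local-authority-district/38UC"
triples = [
    (school, "label", "Peers School"),
    (school, "schoolCapacity", 1220),
    (school, "districtAdministrative", district),
    (district, "label", "Oxford"),
]
print(rooted_json(school, triples))
```

The depth limit and the seen set keep the tree finite even when the underlying graph has cycles, which is the main difference between a rooted tree and the full RDF graph.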
Linked data API: outcomes
lowers the barrier to entry
very positive reception
build linked data applications with e.g. jQuery
no need for a full RDF stack
stepping stone to the linked data world
retains the concept of resources with URIs
retains the schema-less model
look at the SPARQL your request generated, look at the API configuration
open specification (Epimorphics, Talis, TSO)
multiple implementations, including open source: http://code.google.com/p/linked-data-api/
What other mediators are needed for a linked data world?
Discovery
  Features: metadata search; search on entity/concept use
  Examples: Sindice, DCAT, VOiD
Integration
  Features: entity co-reference discovery; ontology mapping; link to text (named entity recognition)
  Examples: Uberblic, SameAs.org, TSO doc service, Freebase
Enrichment
  Features: inference closure; structure transformation
  Examples: WebPIE, ??
Exploration
  Features: follow linked data graph
  Examples: Tabulator, ODE, Disco, Zitgist, sig.ma ...
Visualization
  Features: interactive slice, compare, visualize
  Examples: ??
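For the Integration row, co-reference results are often published as owl:sameAs links (the kind SameAs.org aggregates), and a consumer or intermediary then needs their closure: every URI in a connected group denotes the same entity. A minimal union-find sketch, assuming the links are already available as URI pairs; this is a generic technique, not how any of the listed services is implemented, and the example URIs are hypothetical.

```python
from typing import Dict, Set


class SameAsIndex:
    """Union-find over URIs: each group collects URIs linked, directly or transitively, by owl:sameAs."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, uri: str) -> str:
        """Return the representative URI of `uri`'s group."""
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        while uri != root:  # path compression
            next_uri = self.parent[uri]
            self.parent[uri] = root
            uri = next_uri
        return root

    def link(self, a: str, b: str) -> None:
        """Record an owl:sameAs link between a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def aliases(self, uri: str) -> Set[str]:
        """All known URIs co-referent with `uri` (including itself)."""
        root = self.find(uri)
        return {u for u in list(self.parent) if self.find(u) == root}


# Hypothetical co-reference links, e.g. as harvested from a sameAs service.
idx = SameAsIndex()
idx.link("http://example.org/a/oxford", "http://example.net/b/Oxford")
idx.link("http://example.net/b/Oxford", "http://example.com/c/oxford-city")
print(sorted(idx.aliases("http://example.com/c/oxford-city")))
```

Union-find keeps each new link close to constant time, so an intermediary can fold in fresh co-reference data incrementally instead of recomputing the closure.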
Conclusions
intermediary services, such as the LD access API, can make the power and flexibility of linked data available to a broader range of developers
meet public sector goals of stimulating a network of value-added applications for citizens and business
lots more to do ...