Open Provenance Model Tutorial Session 4: Use cases from data.gov.uk

Post on 01-Jan-2016

Outline

• Background about data.gov.uk
• The use cases
  – XML serialization
  – Data transformation on the fly
  – Complex and nested processes

data.gov.uk

• Linking UK government data
• Aims:
  – Provide a set of best practices for government agencies
  – Provide the minimum set of tooling and specification to facilitate the publication of data
  – Encourage “responsible” data publishing

XML -> RDF

[Diagram: an XSLT Processor with an input and an output (RDF File), together with an XSLT Parameter Binding, an XSLT Stylesheet, and an XSLT Template. Annotations: who, when, which version, how.]

[Diagram: the same XSLT Processor, input, RDF File output, XSLT Parameter Binding, XSLT Stylesheet, and XSLT Template, with the additional labels "Downloaded from; Unzipped from, etc." and "Made accessible". Annotations: who, when, which version, how.]
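The "who, when, which version, how" annotations can be sketched as provenance assertions about a single transformation run. The Python below is a minimal illustration, not the tutorial's own tooling: the edge names (used, wasGeneratedBy, wasControlledBy, wasDerivedFrom) are Open Provenance Model relations, while the record layout, file names, agent, and version string are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceGraph:
    """A tiny in-memory store of OPM-style edges (illustrative layout only)."""
    edges: list = field(default_factory=list)

    def add(self, effect, relation, cause, **annotations):
        # Each edge points from an effect to its cause, as in OPM.
        self.edges.append((effect, relation, cause, annotations))

    def causes_of(self, effect, relation):
        return [c for (e, r, c, _) in self.edges if e == effect and r == relation]


def record_xslt_run(graph, xml_in, stylesheet, rdf_out, agent, version, when):
    """Record one XSLT run; all identifiers here are hypothetical."""
    process = f"xslt-run:{rdf_out}"
    # How: the process used the XML input and the stylesheet.
    graph.add(process, "used", xml_in, role="input")
    graph.add(process, "used", stylesheet, role="stylesheet")
    # Who, and which version of the processor.
    graph.add(process, "wasControlledBy", agent, processor_version=version)
    # When: the time at which the RDF file was generated.
    graph.add(rdf_out, "wasGeneratedBy", process, time=when.isoformat())
    graph.add(rdf_out, "wasDerivedFrom", xml_in)
    return process


graph = ProvenanceGraph()
run = record_xslt_run(
    graph,
    xml_in="data.xml",
    stylesheet="to-rdf.xsl",
    rdf_out="data.rdf",
    agent="publisher@example.org",
    version="1.0",
    when=datetime(2010, 1, 1, tzinfo=timezone.utc),
)
```

Asking `graph.causes_of("data.rdf", "wasGeneratedBy")` then returns the run that produced the RDF file, and `graph.causes_of(run, "used")` returns its inputs.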

On-the-fly Transformation

[Diagram: a data transformation wrapper serving http://mytransportatio.db/j10. Annotations: who, when, which version, how.]
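One possible reading of this slide is that a wrapper transforming data on request can also emit the provenance of each response it produces. The sketch below assumes exactly that; the wrapper name, version string, payload, and the stand-in transformation are hypothetical, and only the URL comes from the slide.

```python
from datetime import datetime, timezone


def with_provenance(transform, wrapper_name, wrapper_version):
    """Wrap a transformation so each call also returns a provenance record."""
    def wrapped(source_url, payload, requested_by):
        result = transform(payload)
        provenance = {
            "source": source_url,
            "who": requested_by,
            "when": datetime.now(timezone.utc).isoformat(),
            "which_version": wrapper_version,
            "how": f"{wrapper_name}:{transform.__name__}",
        }
        return result, provenance
    return wrapped


def to_upper(text):
    # Stand-in for a real transformation (hypothetical).
    return text.upper()


serve = with_provenance(to_upper, "data-transformation-wrapper", "0.1")
result, provenance = serve("http://mytransportatio.db/j10", "sample record", "anonymous")
```

Because the record is built at request time, it describes the specific response rather than a stored file, which is the distinguishing feature of the on-the-fly case.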

Complex Data Creation Pipeline

[Diagram: pipeline stages]
• GATE Pipeline
• GateXMLRegressionTransformation
• GateXMLRdfaTransformation
• RdfaRdfXmlTransformation

Courtesy of Paul Appleby from TSO (Data Enrichment Service)

Complex Data Creation Pipeline

[Diagram: pipeline stages]
• GATE Pipeline
• GateXMLRegressionTransformation
• GateXMLRdfaTransformation
• RdfaRdfXmlTransformation

[Diagram: components shown for the GATE Pipeline]
• Document Reset PR
• ANNIE English Tokeniser
• ANNIE English Splitter
• ANNIE POS Tagger
• Data.gov.uk Morphological Analyzer
• Data.gov.uk Flexible Roof Gazetteer
• Data.gov.uk Generic Gazetteer
• GATE Noun Phrase Chunker
• Data.gov.uk Generic Transducer
• TSO Coreference

[Diagram: provenance at two levels of detail]
• Level 1: Provenance of execution at higher level
• Level 0: Provenance of execution at detailed level
• Node groups: services used by executions; artifacts; a data collection
• Edge labels: wasGeneratedBy, hasParentProcess, iterationOfProcess, followed, wasDerivedFrom, wasTriggeredBy, accessedService
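The two levels can be related by following hasParentProcess links, so that a detailed (Level 0) step is rolled up to the higher-level (Level 1) execution that contains it. The sketch below assumes a simplified reading of the diagram in which each component points to the GATE Pipeline as its parent; the mappings and artifact names are illustrative and are not taken from the actual TSO provenance data.

```python
# hasParentProcess: detailed process -> enclosing higher-level process
# (assumed structure, for illustration)
has_parent_process = {
    "ANNIE English Tokeniser": "GATE Pipeline",
    "ANNIE POS Tagger": "GATE Pipeline",
    "GATE Noun Phrase Chunker": "GATE Pipeline",
}

# wasGeneratedBy: artifact -> process that generated it (hypothetical artifacts)
was_generated_by = {
    "tokens": "ANNIE English Tokeniser",
    "pos-tags": "ANNIE POS Tagger",
    "gate-xml": "GATE Pipeline",
}


def top_level_process(process):
    """Follow hasParentProcess links until a process with no parent is reached."""
    seen = set()
    while process in has_parent_process:
        if process in seen:
            raise ValueError(f"cycle in hasParentProcess at {process!r}")
        seen.add(process)
        process = has_parent_process[process]
    return process


def level1_view(generated_by):
    """Rewrite detailed wasGeneratedBy edges in terms of top-level processes."""
    return {artifact: top_level_process(proc) for artifact, proc in generated_by.items()}


summary = level1_view(was_generated_by)
```

In this view every artifact is attributed to the GATE Pipeline, while the detailed edges remain available for anyone who needs the Level 0 account.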

Non-digital Data Objects

• Organizations
  – Organizational structure changes over time
  – Origin organization, resulting organization
• Boundary
• Legislation

An organization ontology: http://www.epimorphics.com/public/vocabulary/org.html
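The idea of origin and resulting organizations can be sketched as a list of change events, from which the history of an organization is traced backwards. This is a plain-Python illustration under assumed names; it does not use the terms of the ontology linked above, and the departments and years are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """A change that turns origin organizations into resulting ones."""
    year: int
    origins: tuple
    results: tuple


def predecessors(org, events):
    """Return every organization that org descends from, directly or indirectly."""
    found = []
    frontier = [org]
    while frontier:
        current = frontier.pop()
        for event in events:
            if current in event.results:
                for origin in event.origins:
                    if origin not in found:
                        found.append(origin)
                        frontier.append(origin)
    return found


# Hypothetical organizations, for illustration only.
events = [
    ChangeEvent(2001, origins=("Dept A", "Dept B"), results=("Dept AB",)),
    ChangeEvent(2007, origins=("Dept AB",), results=("Dept C",)),
]

history = predecessors("Dept C", events)
```

Treating each restructuring as an event keeps the earlier organizations addressable, so data published under an old name can still be linked to its successor.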

The Challenges

• Data in different representations, physical forms, and granularities
• No tooling support
• Provenance across different types of systems
  – Identification
  – Different terminologies

The Gaps

• A vocabulary able to describe the provenance of all types of data, from different systems
• A vocabulary that still provides enough terms to describe provenance accurately

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
