AUKEGGS Canberra, 2006-11-29 Exposing legacy file-based data (interop-for-files) Andrew Woolf CCLRC Rutherford Appleton Laboratory A.Woolf@rl.ac.uk


Page 1

AUKEGGS

Canberra, 2006-11-29

Exposing legacy file-based data (interop-for-files)

Andrew Woolf
CCLRC Rutherford Appleton Laboratory

A.Woolf@rl.ac.uk

Page 2

Outline

• Introduction

• The feature model as integration key

• An interoperability approach for files

• xlink review and proposed profile for legacy data

• Examples

• Issues

Page 3

Introduction

• Much ‘earth-science’ data exists as large legacy file-stores
– e.g. ECMWF: 2 PB of file-based data
– e.g. British Atmospheric Data Centre: 40 TB of file-based data

• Interoperability demands common approaches

• BUT, multitude of formats masks commonality
– netCDF, HDF4, HDF5, GRIB, NASA Ames, PP, ...

Page 4

Introduction

• File-centred data management focusses on the container rather than content

• File API is fundamental point of reference
– binary format details not always exposed or guaranteed
– public API may be only supported access mechanism
– often implemented as performant optimised native library

• Conclusion: can’t/shouldn’t migrate

Page 5

Introduction

• Want to expose information, not format...

Page 6

Introduction

• Information structures may be composed across files

Page 7

The feature model

• Common pattern with file-data:
– need to integrate information structures across multiple files
– (relational tables provide this implicitly)

• Semantics provide an integration key
– e.g. an oceanographer and meteorologist can share a conversation about data despite format differences

Page 8

The feature model

Page 9

A model for file-based interoperability

• Retain file-based persistence format

• Supplement with feature-based conceptual model

• ‘Cast’ legacy data onto conceptual model
– interoperableData = (featureModel) legacyData

• Legacy file data + GML-encoded conceptual ‘metadata’ = ‘interoperable view’
– may be exposed through W*S

Page 10

A model for file-based interoperability

• GML provides conceptual feature ‘skeleton’

• File provides ‘flesh’

• GML ‘by-reference’ pattern for property values
– uses simple xlink
– “The value of a GML property that carries an xlink:href attribute is the resource returned by traversing the link”

Page 11

xlink review

[Diagram: an extended xlink (attributes: role, title) containing local resources A and D (role, title, label), remote resources B and C (href, role, title, label), and arcs 1, 2 and 3 (arcrole, title, show, actuate)]

Page 12

xlink review

[Diagram: a simple xlink (attributes: role, title) with one local resource (role, title, label), one remote resource (href, role, title, label), and a single arc (arcrole, title, show, actuate)]

Page 13

xlink review

• ‘role’ (URI):
– indicates a property of the remote resource
– must be a URI reference that “identifies some resource that describes the intended property”

• ‘arcrole’ (URI):
– describes the “meaning of the arc’s ending resource relative to its starting resource”
– corresponds to RDF notion of a property

• starting-resource HAS arc-role ending-resource
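The "starting-resource HAS arc-role ending-resource" reading can be sketched as a plain subject/predicate/object triple. This is only an illustration: the `Arc` type, the `as_triple` helper and the `example.org` arcrole URI are hypothetical, while `#ID001` and `myfile.nc#lon` are borrowed from the example slides later in the talk.

```python
from typing import NamedTuple, Tuple


class Arc(NamedTuple):
    """One xlink arc (hypothetical helper type, not part of any xlink library)."""
    start: str    # starting resource, e.g. a GML element
    arcrole: str  # URI naming the relationship
    end: str      # ending resource, e.g. a fragment of a file


def as_triple(arc: Arc) -> Tuple[str, str, str]:
    """Read the arc as 'starting-resource HAS arcrole ending-resource'."""
    return (arc.start, arc.arcrole, arc.end)


# The arcrole URI below is an invented placeholder.
arc = Arc("#ID001", "http://example.org/arcrole/hasValue", "myfile.nc#lon")
triple = as_triple(arc)
print("%s --[%s]--> %s" % triple)
```

Treating the arcrole as the predicate is what lets the link be read as an RDF-style statement about the two resources.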

Page 14

xlink patterns for files

[Diagram: GML feature instance with an extended xlink]

• Aggregation semantics determined by xlink arc traversal rules

Page 15

xlink patterns for files

[Diagram: GML feature instance with a simple xlink]

• Aggregation semantics determined by storage descriptor

Page 16

xlink proposal

• href examples:
– netCDF#variable
– RDBMS#SQLQuery
– GRIBFile#recordNumber
– CSMLStorageDescriptor#arrayID

<someGMLElement
    xlink:arcrole="hasRemoteContentEmbeddedAt#localXpath"
    xlink:href="storageDescriptor#portion"
    xlink:role="storageSchemaIdentifier"
    xlink:show="embed"
    xlink:actuate="onRequest | onLoad"/>
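As a rough sketch of how a client might take the proposed attributes apart, the snippet below splits `xlink:href` and `xlink:arcrole` at the `#` using the standard-library `urllib.parse.urldefrag`. The `FileLink` class, the `parse_link` helper and the field names are assumptions made for illustration; the attribute values are the ones used in the netCDF example later in the talk.

```python
from dataclasses import dataclass
from urllib.parse import urldefrag


@dataclass
class FileLink:
    """Parsed view of the proposed attributes (hypothetical helper type)."""
    storage: str      # xlink:href, left of '#': file or storage descriptor
    portion: str      # xlink:href, right of '#': format-specific fragment
    usage: str        # xlink:arcrole, left of '#': embedding semantics
    local_xpath: str  # xlink:arcrole, right of '#': where content is inserted
    schema: str       # xlink:role: identifies the storage format


def parse_link(attrs: dict) -> FileLink:
    storage, portion = urldefrag(attrs["href"])
    usage, local_xpath = urldefrag(attrs["arcrole"])
    return FileLink(storage, portion, usage, local_xpath, attrs["role"])


link = parse_link({
    "href": "myfile.nc#lon",
    "arcrole": "http://ndg.nerc.ac.uk/xlinkUsage/insert"
               "#SpatialOrTemporalPositionList/coordinateList",
    "role": "http://ndg.nerc.ac.uk/fileFormat/netcdf",
})
print(link)
```

How the `portion` is interpreted (variable name, SQL query, record number) depends on the format named by `xlink:role`, so a real client would dispatch on `schema` before reading anything.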

Page 17

Example

• GML CR 06-160
– ISO 19123 CV_ReferenceableGrid

<gml:ReferenceableGrid gml:id="ID001" srsName="urn:ogc:def:crs:EPSG:6.6:4326" dimension="2">
  <gml:limits>
    <gml:GridEnvelope>
      <gml:low>0 0</gml:low>
      <gml:high>7 4</gml:high>
    </gml:GridEnvelope>
  </gml:limits>
  <gml:axisLabels>x y</gml:axisLabels>
  <gml:coordTransformTable>
    <gml:GridCoordinatesTable>
      <gml:gridOrdinate>
        <gml:GridOrdinateDescription>
          <gml:coordAxisLabel>Geodetic longitude</gml:coordAxisLabel>
          <gml:coordAxisValues>
            <gml:SpatialOrTemporalPositionList>
              <gml:coordinateList>13.5 24.9 32.4 37.7 41.5 46.8 54.4 65.7</gml:coordinateList>
            </gml:SpatialOrTemporalPositionList>
          </gml:coordAxisValues>
          <gml:gridAxesSpanned>x</gml:gridAxesSpanned>
          <gml:sequenceRule axisOrder="+1">Linear</gml:sequenceRule>
        </gml:GridOrdinateDescription>
      </gml:gridOrdinate>
      <gml:gridOrdinate>
        <gml:GridOrdinateDescription>
          <gml:coordAxisLabel>Geodetic latitude</gml:coordAxisLabel>
          <gml:coordAxisValues>
            <gml:SpatialOrTemporalPositionList>
              <gml:coordinateList>
                53.1 48.7 46.2 44.7 43.9 43.3 43.1 44.0
                46.2 43.2 41.5 40.6 40.2 40.0 40.3 41.7
                37.1 36.1 35.6 35.5 35.7 36.0 37.1 39.5
                30.4 30.2 30.4 30.7 31.1 32.0 33.8 37.2
                24.3 24.8 25.3 26.0 26.6 27.7 29.7 33.4
              </gml:coordinateList>
            </gml:SpatialOrTemporalPositionList>
          </gml:coordAxisValues>
          <gml:gridAxesSpanned>x y</gml:gridAxesSpanned>
          <gml:sequenceRule axisOrder="+1 -2">Linear</gml:sequenceRule>
        </gml:GridOrdinateDescription>
      </gml:gridOrdinate>
    </gml:GridCoordinatesTable>
  </gml:coordTransformTable>
</gml:ReferenceableGrid>

Page 18

Example

• netCDF ASCII dump:

netcdf myfile {
dimensions:
    x = 8 ;
    y = 5 ;
variables:
    float lon(x) ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
    float lat(x,y) ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
    float temp(x,y) ;
        temp:coordinates = "lon lat" ;
        temp:long_name = "temperature" ;
        temp:units = "degC" ;
data:
    lon = 13.5, 24.9, 32.4, 37.7, 41.5, 46.8, 54.4, 65.7 ;
    lat = 53.1, 48.7, 46.2, 44.7, 43.9, 43.3, 43.1, 44.0, 46.2, 43.2, 41.5, ...

Page 19

Example

<gml:gridOrdinate>
  <gml:GridOrdinateDescription>
    <gml:coordAxisLabel>Geodetic longitude</gml:coordAxisLabel>
    <gml:coordAxisValues>
      <gml:SpatialOrTemporalPositionList>
        <gml:coordinateList srsName="WGS84">13.5 24.9 32.4 37.7 41.5 46.8 54.4 65.7</gml:coordinateList>
      </gml:SpatialOrTemporalPositionList>
    </gml:coordAxisValues>
    <gml:gridAxesSpanned>x</gml:gridAxesSpanned>
    <gml:sequenceRule axisOrder="+1">Linear</gml:sequenceRule>
  </gml:GridOrdinateDescription>
</gml:gridOrdinate>

<gml:coordAxisValues
    xlink:arcrole="http://ndg.nerc.ac.uk/xlinkUsage/insert#SpatialOrTemporalPositionList/coordinateList"
    xlink:href="myfile.nc#lon"
    xlink:role="http://ndg.nerc.ac.uk/fileFormat/netcdf"
    xlink:show="embed">
  <gml:SpatialOrTemporalPositionList>
    <gml:coordinateList srsName="WGS84"/>
  </gml:SpatialOrTemporalPositionList>
</gml:coordAxisValues>
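The embedding step implied by this example can be sketched as follows: read the fragment named by `xlink:href`, then write its values into the element found at the relative XPath carried by `xlink:arcrole`. Several things here are assumptions rather than part of the talk: the `embed` function, the `FAKE_FILES` dictionary standing in for a real netCDF reader, the namespace declarations (the slide omits them), and the choice to treat every step of the unprefixed XPath as a GML element.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumed; the slide does not declare them.
NS = {"gml": "http://www.opengis.net/gml",
      "xlink": "http://www.w3.org/1999/xlink"}
XLINK = "{%s}" % NS["xlink"]

# Stand-in for reading variable 'lon' from myfile.nc (no real file access).
FAKE_FILES = {("myfile.nc", "lon"):
              [13.5, 24.9, 32.4, 37.7, 41.5, 46.8, 54.4, 65.7]}

SKELETON = """
<gml:coordAxisValues xmlns:gml="http://www.opengis.net/gml"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:arcrole="http://ndg.nerc.ac.uk/xlinkUsage/insert#SpatialOrTemporalPositionList/coordinateList"
    xlink:href="myfile.nc#lon"
    xlink:role="http://ndg.nerc.ac.uk/fileFormat/netcdf"
    xlink:show="embed">
  <gml:SpatialOrTemporalPositionList>
    <gml:coordinateList srsName="WGS84"/>
  </gml:SpatialOrTemporalPositionList>
</gml:coordAxisValues>
""".strip()


def embed(prop, read=FAKE_FILES.__getitem__):
    """Fill the element at the arcrole's relative XPath with the linked values."""
    path, variable = prop.get(XLINK + "href").split("#", 1)
    _, rel_xpath = prop.get(XLINK + "arcrole").split("#", 1)
    # Assumption: each step of the unprefixed XPath is in the GML namespace.
    qualified = "/".join("gml:" + step for step in rel_xpath.split("/"))
    target = prop.find(qualified, NS)
    target.text = " ".join(str(v) for v in read((path, variable)))
    return prop


prop = embed(ET.fromstring(SKELETON))
embedded = prop.find(
    "gml:SpatialOrTemporalPositionList/gml:coordinateList", NS).text
print(embedded)
```

After the call, the skeleton carries the same longitude values as the fully inline version of the property, which is the sense in which the file supplies the content for the GML structure.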

Page 20

Issues

• Need to ‘get as close as possible’ to target
– ‘merge’ semantics consistent with GML? (Opportunity: no best practice for GML yet!)

• “If both a link and content are present in an instance of a property element, then the object found by traversing the xlink:href link shall be the normative value of the property. The object included as content shall be used by the data recipient only if the remote instance cannot be resolved; this may be considered to be a "cached" version of the object.” [GML 7.2.3.4]
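The quoted rule can be illustrated with a small resolver that prefers the linked object and falls back to inline content only when the link cannot be resolved. This is a simplified sketch: `property_value` and the `remote` dictionary are hypothetical, and the inline "object" is reduced to element text, whereas real GML content would normally be child elements.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def property_value(prop, resolve):
    """Linked object is normative; inline content is only a cached fallback."""
    href = prop.get(XLINK_HREF)
    inline = (prop.text or "").strip()
    if href is None:
        return inline
    try:
        return resolve(href)
    except LookupError:
        # Remote instance cannot be resolved: use the cached inline copy.
        return inline


# Hypothetical resolver table standing in for real link traversal.
remote = {"myfile.nc#lon": "13.5 24.9 32.4"}
prop = ET.fromstring(
    '<values xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:href="myfile.nc#lon">0.0 0.0 0.0</values>'
)

resolved = property_value(prop, remote.__getitem__)  # link wins
cached = property_value(prop, {}.__getitem__)        # link fails, cache used
print(resolved, "|", cached)
```

Under this rule the inline content is never merged with the linked content, which is why the ‘merge’ semantics asked about on this slide need their own convention.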

Page 21

Issues

• xlink:href (URI) for remote resource fragment (format-specific)
– e.g. RDBMS#SQLQuery, netCDF#variable, etc.

• xlink:role (URI) for resource format
– e.g. reference PRONOM-type format repository?

• implied conversion to GML target content type

• xlink:arcrole (URI) for ‘embed remote content’ semantics
– ‘insert at relative XPath’ essential

• simple xlink can’t handle multiple resources
– application-specific ‘storage descriptor’ schemas for file aggregation semantics

Page 22

Conclusion

• Presented a profile for xlink with files in absence of current best practice

• Meets key practical requirements
– retain file-based persistence formats
– provide interoperability ‘wrapper’
– focus on logical content, not container (feature model)

• Semantic governance at appropriate points

• Enables powerful, scalable mechanism for real data
– e.g. large meteorological datasets