netcdf-ld - towards linked data conventions for delivery of environmental data using netcdf

37
Towards linked data conventions for delivery of environmental data using netCDF LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP Jonathan Yu, Nicholas Car, Adam Leadbetter*, Bruce A. Simons, and Simon J.D. Cox CSIRO LWF and BODC/ Marine Institute (Galway) 25 March 2014 | ISESS … netCDF-LD

Upload: jonathan-yu

Post on 11-Feb-2017

246 views

Category:

Data & Analytics


5 download

TRANSCRIPT

Page 1: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF

LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP

Jonathan Yu, Nicholas Car, Adam Leadbetter*, Bruce A. Simons, and Simon J.D. Cox CSIRO LWF and BODC/ Marine Institute (Galway)25 March 2014 | ISESS

… netCDF-LD

Page 2: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Outline

1. Big data and netCDF

2. Metadata challenges

3. Linked Data, JSON-LD & CSV-on-the-web

4. netCDF-LD

5. Applications

2 |

Page 3: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Big data

• “90% of the world’s data has been produced over the last two years”

• Environmental and earth observation has always been big-data…

3 |

Page 4: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

National Soil and Landscape Grid

4 |

Page 5: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Hydrodynamic models

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu5 |

Page 6: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu6 |

Real-time wind data

Page 7: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF

• Persistence layer• Simple: self-describing, user

accessible “flat file”

• Java, C++, python, others

• Interfaces to the Data Model

7 |

File format

Software library

API

Page 8: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

netCDF-4 Data Model

Page 9: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF cont’

• Really good at handling array-oriented data – efficient subsetting• Multi-dimensional grids

• CF conventionsclimatology and weather forecasting domains

9 |

Page 10: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF metadata example

10 |

float Nap_MIM(time, latitude, longitude) ;  Nap_MIM:_FillValue = -999.f ;  Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;  Nap_MIM:units = "mg/L" ;  Nap_MIM:valid_min = 0.01209607f ; Nap_MIM:valid_max = 226.9626f ;  Nap_MIM:standard_name = ”total_suspended_solids”; 

Page 11: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Use of netCDF for Chlorophyll observations

11 |

Page 12: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF challenges

• Encoding format and framework

• “Domain agnostic” but conventions exist for communities – e.g. Climate and Forecasting (CF conventions)

• CF conventions limited• Use of ‘standard_name’ attribute to denote the combination of

medium/transformation/substance/state/quantity using a particular syntax• Assumes English• Standard names take a while to be adopted• Standard names variations of each other• Units are from UDUNITS – Unidata managed• Narrow scope of scientific domain – can’t reuse for other domains

12 |

Page 13: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF Metadata challenges – Semantic heterogeneity

13 |

Enviro Application

#1

Data

DB

Chl_MIM

Enviro Application

#2

Data

DB

mass_conc_chlorophyll_In_sea_water

Enviro Application

#3

Data

DB

mass_conc_chlorophyll_a_In_sea_water

Page 14: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Semantic Heterogeneity… leads to data silos

14 |

Enviro Application

#1

Enviro Application

#2

Enviro Application

#3

Data Data Data

DB DBDB

Chl_MIMmass_conc_chlorophyll_In_sea_water

mass_conc_chlorophyll_a_In_sea_waterX X

Meetings

Page 15: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Approaches to address netCDF metadata challenge

1. Strict naming grammars for standard_name

15 |

(Peckham 2014) CSDMS Standard Naming Conventions

Page 16: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Approaches to address netCDF metadata challenge (2)

2. Break up various concerns in standard_name into multiple attributes – ref (Yu et al. 2014)

float Nap_MIM(time, latitude, longitude) ;   Nap_MIM:_FillValue = -999.f ;   Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;   Nap_MIM:units = "mg/L" ;   Nap_MIM:valid_min = 0.01209607f ;   Nap_MIM:valid_max = 226.9626f ;   Nap_MIM:scaledQuantityKind_id = "http://environment.data.gov.au/water/quality/def/property/solids-total_suspended" ;   Nap_MIM:unit_id = "http://environment.data.gov.au/water/quality/def/unit/MilliGramsPerLitre" ;   Nap_MIM:substanceOrTaxon_id = "http://environment.data.gov.au/water/quality/def/object/solids";   Nap_MIM:medium_id = "http://environment.data.gov.au/water/quality/def/object/ocean" Nap_MIM:procedure_id = "http://data.ereefs.org.au/ocean-colour/MIM_SVDC_RRS" ;

16 |

Page 17: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Harmonised Publish, Discovery, Access and Use

17 |

Relies on community agreed vocabularies

Describe/Publish the data

Query/Use dataEnviro

Application #1

Enviro Application

#2

Enviro Application

#3

Data Data Data

DB DBDB

substanceOrTaxon= http://environment.data.gov.au/def/object/chlorophyll

scaledQuantityKind = http://environment.data.gov.au/def/property/chlorophyll_concentration

Need to communicate more consistentlyRequires shared, precise, agreed semantics

Page 18: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Linked Data, JSON-LD & CSV-on-the-web

18 |"LOD Cloud Diagram as of September 2011" by Anja Jentzsch

Page 19: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Linked Data

• Method of connect related data, existing vocabularies and other semantics using web links (URIs) and a language for describing resources (RDF)

• Applications can thenquery data, follow the linksand infer new insights from the data

19 |

Page 20: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

JSON-LD

{ "name": "John Lennon", "born": "1940-10-09", "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"}

20 |

JSON-LDDecorators

Page 21: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

JSON-LD and Semantic Web

21 |

http://dbpedia.org/resource/

John_Lennon

http://dbpedia.org/resource/

Cynthia_Lennon

“John Lennon”1940-10-09

Page 22: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

CSV-on-the-web

• Linked data for CSV tabular data

• Add context to tables

22 |

Page 23: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Example: Domain vocab term

23 |

Page 24: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Example: Domain vocab term

24 |

... wqp:chlorophyll_a_concentration a skos:Concept, op:ScaledQuantityKind, qudt:ChemistryQuantityKind ;

skos:broader wqp:chlorophyll_concentration ; skos:prefLabel "chlorophyll a concentration"@en ;

Page 25: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

netCDF-LD

Take the already successful and popular encoding format and…

‘linkify’ netCDF!

25 |

Page 26: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF-LD: Linkifying netCDF

26 |

‘Context’ boilerplates

Page 27: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF-LD: Linkifying netCDF

27 |

Air Temperature Definition

Air

Temperature

Medium Quantity Kind

Kelvin

UoMeReefs Vocabularies

Page 28: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF-LD: Global Attributes

28 |

Page 29: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Assigning URIs to variable level attributes

z:units = "meters";z:units_ref = "http://qudt.org/vocab/unit#Meter";z:a = "http://environment.data.gov.au/def/op#quantityKind";z:dcPartOf = "http://foo.bar/linked_netCDF_example";z:valid_range = 0., 5000.;

29 |

Page 30: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF-LD to RDF

@prefix unit: <http://qudt.org/vocab/unit#> .@prefix qudt: <http://qudt.org/1.1/schema/qudt#> .@prefix op: <http://environment.data.gov.au/def/op#> .@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .@prefix dcterms: <http://purl.org/dc/terms/> .@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:z qudt:unit unit:Meter; a op:ScaledQuantityKind; dcterms:isPartOf <http://foo.bar/linked_netCDF_example>.

30 |

Page 31: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Use of netCDF-LD to support Data Discovery

31 |

Page 32: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Data annotated with bindings to vocab URIs

32 |

THREDDS

THREDDS Catalog

Domain Vocabs(Water Quality at

environment.data.gov.au)

Quantities/ Units ontology(QUDT)

substanceOrTaxon= http://environment.data.gov.au

/def/object/chlorophyll

scaledQuantityKind = http://environment.data.gov.au

/def/property/chlorophyll_concentration

unit= http://qudt.org/vocab/unit#Unitless

medium= http://environment.data.gov.au

/def/feature/ocean

Page 33: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

DBL Harvesting and End Use

33 |

Data Brokering Layer

THREDDS

THREDDS Catalog

Domain Vocabs(Water Quality at

environment.data.gov.au)

Quantities/ Units ontology(QUDT)

substanceOrTaxon= http://environment.data.gov.au

/def/object/chlorophyll

scaledQuantityKind = http://environment.data.gov.au

/def/property/chlorophyll_concentration

unit= http://qudt.org/vocab/unit#Unitless

medium= http://environment.data.gov.au

/def/feature/ocean

End users

Client application

chlorophyll

Page 34: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

netCDF-LD benefits

• Enhanced metadata – better self-description for any domain

• Allows binding to rich semantics from existing vocabularies being published via SKOS/OWL/RDF in multiple domains

• Linked Data approach – consistent with Semantic Web technologies, RDF, JSON-LD, CSV-on-the-web

• Hook into toolkits and software libraries for data discovery via Semantic Web technologies and other Linked data already being published online

34 |

Page 35: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Future/Current Work

1. netCDF-LD requires adoption at community level – need to test the proposal at OGC, Unidata, CF communities

2. Tooling for parsing netCDF-LD

3. Encoding large volumes using netCDF-LD

4. Demonstrate Data Broker concept using netCDF-LD

35 |

Page 36: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu

Summary

• netCDF is really good at big data – esp. array-oriented data

• Limitations with the current netCDF metadata structure

• *-LD pattern is emerging across multiple formats (JSON, CSV)

• Propose netCDF-LD which is an unintrusive extension on netCDF to facilitate improved self-described datasets

• netCDF-LD has potential to enhance data discovery from environmental dataset across to other Linked Data being published online

36 |

Page 37: netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

LAND AND WATER

Thank you

Land and WaterJonathan YuResearch Software Engineert +61 3 9252 6440e [email protected]