ECMWF WMO Metadata Workshop – Beijing Sep 2005 Experience with the WMO core metadata in the SIMDAT/VGISC project Baudouin Raoult ECMWF

The SIMDAT/VGISC project

SIMDAT: EU-funded Grid project
  7 technologies: Grid infrastructure, Virtual Organisation, Ontologies, Analysis Services, Workflows, Distributed data access, Knowledge Services
  4 activities: Automotive, Aerospace, Pharmacy and Meteorology
Meteorology activity: build a Virtual GISC (V-GISC)
  DWD, UKMO, Météo-France, EUMETSAT, ECMWF

V-GISC infrastructure

V-GISC Conceptual view

Through the Distributed Portal, users search for and retrieve data and subscribe to services, subject to authentication and authorization

The Virtual Database Service provides a single view of the partners' databases

VGISC Distributed Architecture

Why do we need metadata (in this project)?

Create a catalogue (discovery metadata)
  Searchable (Keyword, Geographical location, Time range)
  Browsable (Directory hierarchy)
Implement the V-GISC (service metadata)
  Describe where the data resides (physical location)
  Describe how to request the data
  Describe the data format (useful for offering a list of transformations, e.g. sub-sampling of gridded data, plots or format conversions)
  Describe associated data policies

Study of the WMO core

Starting point:
  XML files available on the WMO web site
  XML files from an earlier DWD prototype
Trying to describe the ECMWF archive (1.3 × 10^10 GRIB fields)

XML Root element

<p:piTimeseries xmlns:p="http://www.wmo.ch/web/www/metadata/piTimeseries" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.wmo.ch/web/www/metadata" xsi:schemaLocation="http://www.wmo.ch/web/www/metadata http://www.dwd.de/UNIDART/metadata/WMO19115_metadata_v0_2.xsd http://www.wmo.ch/web/www/metadata/piTimeseries http://www.dwd.de/UNIDART/metadata/WMO19115_piTimeseries_schema.xsd">

or

<metaData xmlns="http://www.wmo.ch/web/www/metadata" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fc="http://www.wmo.ch/web/www/featurecatalogue" xsi:schemaLocation="http://www.wmo.ch/web/www/metadata/../WMO19115_metadata_v0_2.xsd http://www.wmo.ch/web/www/featurecatalogue/./featurecat/iso19110.xsd">

Namespaces are a nightmare to use (especially using XPath when there is a default namespace)
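A minimal illustration of the default-namespace problem, using Python's standard xml.etree.ElementTree (not necessarily what the V-GISC prototype used). The document is a cut-down, hypothetical instance in the WMO metadata namespace from the root elements above. An unprefixed path finds nothing, and the query only works once every step is namespace-qualified.

```python
import xml.etree.ElementTree as ET

WMO_NS = "http://www.wmo.ch/web/www/metadata"

# Cut-down, illustrative document: every element is in the default namespace.
DOC = """<metaData xmlns="http://www.wmo.ch/web/www/metadata">
  <identificationInfo>
    <descriptiveKeywords>Temperature</descriptiveKeywords>
  </identificationInfo>
</metaData>"""

root = ET.fromstring(DOC)

# Unqualified path: silently matches nothing, because each tag is really
# "{http://www.wmo.ch/web/www/metadata}localName".
naive = root.find("identificationInfo/descriptiveKeywords")

# Works once each step is bound to an (arbitrary) prefix...
prefixed = root.find("wmo:identificationInfo/wmo:descriptiveKeywords",
                     {"wmo": WMO_NS})

# ...or written in Clark notation with the full namespace URI.
clark = root.find(f"{{{WMO_NS}}}identificationInfo/{{{WMO_NS}}}descriptiveKeywords")

print(naive)          # None
print(prefixed.text)  # Temperature
print(clark.text)     # Temperature
```

As a result, every XPath expression written against these documents must know which namespace, if any, each site used for its elements.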

XML Keywords

<descriptiveKeywords>Russian Federation</descriptiveKeywords><descriptiveKeywords>Moscow region</descriptiveKeywords><descriptiveKeywords>Temperature</descriptiveKeywords><descriptiveKeywords>Clouds</descriptiveKeywords><descriptiveKeywords>Meteorology</descriptiveKeywords><descriptiveKeywords>Observation</descriptiveKeywords><descriptiveKeywords>Pressure</descriptiveKeywords><descriptiveKeywords>Rainfall</descriptiveKeywords><descriptiveKeywords>Snow</descriptiveKeywords><descriptiveKeywords>Snowfall</descriptiveKeywords><descriptiveKeywords>Weather</descriptiveKeywords><descriptiveKeywords>Wind</descriptiveKeywords><descriptiveKeywords>Phenomenon</descriptiveKeywords>

Or…

<descriptiveKeywords>EARTH SCIENCE > Cryosphere > Sea Ice</descriptiveKeywords><descriptiveKeywords>EARTH SCIENCE > Atmosphere</descriptiveKeywords><descriptiveKeywords>EARTH SCIENCE > Oceans</descriptiveKeywords><descriptiveKeywords>EARTH SCIENCE > Solid Earth</descriptiveKeywords><descriptiveKeywords>ocean, atmosphere, ice, land</descriptiveKeywords>

Or…

<descriptiveKeywords>METAR aviation hourly weather observation temperature dew point precipitation amount visibility cloud amount type height weather runway colour state</descriptiveKeywords>
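To index these three styles consistently, they first have to be normalised into a flat set of terms. The sketch below uses heuristics of our own, not WMO rules: '>' marks the levels of a hierarchical keyword, ',' separates list items, and whitespace is treated as a separator only when the caller knows the publisher writes free-text keyword strings. The function name is hypothetical.

```python
import re


def normalise_keywords(values, split_on_spaces=False):
    """Flatten <descriptiveKeywords> texts into a set of lower-case terms.

    Assumed heuristics: '>' separates levels of a hierarchical keyword,
    ',' separates list items, and whitespace is only a separator when
    split_on_spaces is True (free-text keyword strings).
    """
    terms = set()
    for value in values:
        for part in re.split(r"[>,]", value):
            part = part.strip().lower()
            if not part:
                continue
            if split_on_spaces:
                terms.update(part.split())
            else:
                terms.add(part)
    return terms


# The three styles shown above (abridged).
style1 = normalise_keywords(["Russian Federation", "Moscow region", "Temperature"])
style2 = normalise_keywords(["EARTH SCIENCE > Cryosphere > Sea Ice",
                             "ocean, atmosphere, ice, land"])
style3 = normalise_keywords(["METAR aviation hourly weather observation"],
                            split_on_spaces=True)
```

"Russian Federation" must not be split on spaces, but the METAR string must be. Nothing in the XML says which treatment applies.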

XML Geographical extent

<geographicElement> <polygon> <point> <latitude>50.78</latitude> <longitude>6.1</longitude> </point> </polygon></geographicElement>

Or…

<geographicElement><geographicIdentifier gazetteer="http://www.wmo.ch/web/www/ois/volume-a/vola-home.htm">

CCCC2</geographicIdentifier>

</geographicElement>

Or…

<geographicElement><boundingBox>

<westBoundLongitude>-126.3</westBoundLongitude><eastBoundLongitude>-126.3</eastBoundLongitude><southBoundLatitude>39.9</southBoundLatitude><northBoundLatitude>39.9</northBoundLatitude>

</boundingBox></geographicElement>
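A geographical index needs a single representation, so all three encodings must be reduced to one. The sketch below maps each one to a (west, east, south, north) box, assuming elements carry no namespace. The gazetteer is a hypothetical lookup table: a real implementation would resolve the gazetteer named in the `gazetteer` attribute, and the single entry here is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical gazetteer: identifier -> (west, east, south, north). Made-up entry.
GAZETTEER = {"EXAMPLE-ID": (0.0, 1.0, 0.0, 1.0)}

BOX_TAGS = ("westBoundLongitude", "eastBoundLongitude",
            "southBoundLatitude", "northBoundLatitude")


def extent_to_bbox(geo):
    """Reduce a <geographicElement> to (west, east, south, north), or None."""
    box = geo.find("boundingBox")
    if box is not None:
        return tuple(float(box.findtext(tag)) for tag in BOX_TAGS)
    points = geo.findall("polygon/point")
    if points:
        # Enclosing box of the polygon's vertices.
        lons = [float(p.findtext("longitude")) for p in points]
        lats = [float(p.findtext("latitude")) for p in points]
        return (min(lons), max(lons), min(lats), max(lats))
    ident = geo.find("geographicIdentifier")
    if ident is not None and ident.text:
        return GAZETTEER.get(ident.text.strip())  # None if unknown
    return None


point = ET.fromstring("<geographicElement><polygon><point><latitude>50.78</latitude>"
                      "<longitude>6.1</longitude></point></polygon></geographicElement>")
print(extent_to_bbox(point))  # (6.1, 6.1, 50.78, 50.78)
```

A single station ends up as a degenerate box, whether it was encoded as a one-point polygon or as a bounding box with equal bounds.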

XML Temporal extent

<temporalElement><beginDateTime>0100-01-01</beginDateTime><endDateTime>0299-12-31</endDateTime><dataFrequency>monthly</dataFrequency><dataFrequency>daily</dataFrequency>

</temporalElement>

Or…

<temporalElement><referenceDateTime>2004-02-05T00:00:00</referenceDateTime><beginDateTime>2004-02-05T06:00:00</beginDateTime><endDateTime>2004-02-05T06:00:00</endDateTime>

</temporalElement>

Or…

<referenceDate><date>2004-01-28</date><dateType>creationDate</dateType>

</referenceDate>
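A periods index needs a single (begin, end) pair per document. The sketch below maps the three encodings onto one, using Python's datetime. It assumes ISO 8601 strings without time zones. Treating a lone <referenceDate> as a zero-length period is our own policy choice; in the third example that date is a creation date, not the period the data covers.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def parse_when(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DDThh:mm:ss' (no time zone)."""
    return datetime.fromisoformat(text.strip())


def to_period(elem):
    """Map a <temporalElement> or <referenceDate> (no namespace) to (begin, end)."""
    if elem.tag == "temporalElement":
        # <referenceDateTime> and <dataFrequency>, if present, are ignored here.
        return (parse_when(elem.findtext("beginDateTime")),
                parse_when(elem.findtext("endDateTime")))
    if elem.tag == "referenceDate":
        # Assumption: a single date is indexed as a zero-length period.
        when = parse_when(elem.findtext("date"))
        return (when, when)
    raise ValueError(f"unsupported temporal element: {elem.tag}")


climate = ET.fromstring("<temporalElement><beginDateTime>0100-01-01</beginDateTime>"
                        "<endDateTime>0299-12-31</endDateTime></temporalElement>")
print(to_period(climate))
```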

Repetition of XML elements (means extension)

<dataExtent><verticalElement>

<minimumValue>3.5</minimumValue><maximumValue>992.5</maximumValue><unitOfMeasure>mb</unitOfMeasure>

</verticalElement></dataExtent><dataExtent>

<geographicElement><boundingBox>

<westBoundLongitude>-180</westBoundLongitude><eastBoundLongitude>+180</eastBoundLongitude><southBoundLatitude>-90</southBoundLatitude><northBoundLatitude>+90</northBoundLatitude>

</boundingBox><geographicIdentifier

gazetteer="http://gcmd.gsfc.nasa.gov/Resources/valids/location.html">Global</geographicIdentifier>

</geographicElement></dataExtent><dataExtent>

<temporalElement><beginDateTime>1900-01-01</beginDateTime><endDateTime>1999-12-31</endDateTime><dataFrequency>monthly</dataFrequency><dataFrequency>daily</dataFrequency>

</temporalElement></dataExtent>

Repetition of XML elements (means redefinition)

<dataExtent>

<description>Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin Sector S</description><geographicElement>

<boundingBox><westBoundLongitude>-180</westBoundLongitude><eastBoundLongitude>-60</eastBoundLongitude><southBoundLatitude>0</southBoundLatitude><northBoundLatitude>90</northBoundLatitude>

</boundingBox></geographicElement>

</dataExtent>

<dataExtent><description>Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin Sector T</description><geographicElement>

<boundingBox><westBoundLongitude>-60</westBoundLongitude><eastBoundLongitude>60</eastBoundLongitude><southBoundLatitude>0</southBoundLatitude><northBoundLatitude>90</northBoundLatitude>

</boundingBox></geographicElement>

</dataExtent>
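Nothing in the XML says whether repeated <dataExtent> elements extend or redefine each other. Suppose an indexer decides that sector extents like those above are alternatives covering one product. One option, which is our assumption rather than anything the WMO core specifies, is to index their union:

```python
def union_boxes(boxes):
    """Smallest (west, east, south, north) box covering all the input boxes.

    Assumes no box crosses the antimeridian (west <= east for every box).
    """
    wests, easts, souths, norths = zip(*boxes)
    return (min(wests), max(easts), min(souths), max(norths))


# Sectors S and T from the example above, as (west, east, south, north).
sector_s = (-180.0, -60.0, 0.0, 90.0)
sector_t = (-60.0, 60.0, 0.0, 90.0)
print(union_boxes([sector_s, sector_t]))  # (-180.0, 60.0, 0.0, 90.0)
```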

Findings

A flexible format, which leads to a lack of consistency
  Different ways to encode geographical extents, keywords and temporal extents
Missing information (for the V-GISC) needed:
  To create a directory
  To locate the data
  To create retrieval requests
  To describe available transformations
  To implement data policies

Findings (cont.)

Seems to be designed for human consumption
  Free text in XML elements such as:
    • <distributionInfo>
    • <dataQualityInfo>
Not scalable
  Some documents may change frequently (hourly?)
  Some documents are orders of magnitude larger than the data itself
  Cannot represent very large archives with small granularity

SIMDAT/VGISC problem

Each site has its own practices
  We have to be ready for variability in the XML
  We will have to handle XML from other WMO programmes
We need to handle tens of thousands of documents
  Lots of repeated information
  We need fast search
We need to automatically:
  Index the keywords, the geographical extent and the temporal extent
  Create a browsable directory (similar to NCAR’s Community Data Portal)
  Locate and retrieve the data
  Implement the data policy

Solution: split XML documents into fragments

WMO core metadata is structured

Some parts are shared amongst many documents:
  All metadata share the Core part
  All UKMO metadata share the Owner part
  All synops (should) share the same description
  All observations at Heathrow have the same location
  The date part is variable but very small

[Figure: fragment layers, from the most widely shared to the most specific: WMO (Core), UKMO (Owner), Synop (Data type), Heathrow (Station, i.e. geographical extent), 2005-10-12 (Date, i.e. temporal extent)]

XML fragments are hierarchically linked

[Figure: fragment hierarchy linking WMO, UKMO, Synop, Heathrow, "Heathrow Synop" and "Heathrow Synop 2005-10-12"]
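Because "Heathrow Synop" has two parents, the fragments form a directed acyclic graph rather than a tree. The sketch below resolves a fragment against its ancestors. It assumes that each fragment stores its parent ids, and that a more specific fragment overrides a more generic one when both define the same part. The fragment ids and part names are hypothetical.

```python
# Hypothetical fragments loosely following the Heathrow example: each one
# lists its parents ("inherit") and the metadata parts it defines itself.
FRAGMENTS = {
    "wmo": {"inherit": [], "parts": {"core": "WMO core"}},
    "ukmo": {"inherit": ["wmo"], "parts": {"owner": "UKMO"}},
    "synop": {"inherit": ["ukmo"], "parts": {"dataType": "SYNOP"}},
    "heathrow": {"inherit": ["ukmo"], "parts": {"station": "Heathrow"}},
    "heathrow.synop": {"inherit": ["synop", "heathrow"], "parts": {}},
    "heathrow.synop.2005-10-12": {"inherit": ["heathrow.synop"],
                                  "parts": {"date": "2005-10-12"}},
}


def ancestors(frag_id, fragments=FRAGMENTS):
    """The fragment and everything it inherits from, parents before children,
    each listed once even when reachable along several paths (assumes no cycles)."""
    order, seen = [], set()

    def visit(fid):
        if fid in seen:
            return
        seen.add(fid)
        for parent in fragments[fid]["inherit"]:
            visit(parent)
        order.append(fid)  # post-order: all parents are already in 'order'

    visit(frag_id)
    return order


def effective_parts(frag_id, fragments=FRAGMENTS):
    """Merge the parts along the hierarchy; more specific fragments override."""
    merged = {}
    for fid in ancestors(frag_id, fragments):
        merged.update(fragments[fid]["parts"])
    return merged


print(ancestors("heathrow.synop.2005-10-12"))
print(effective_parts("heathrow.synop.2005-10-12"))
```

Only the small daily fragment changes from day to day. Everything above it is shared and needs to be stored and indexed only once.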

Fragments: advantages

Factorizing commonalities into static fragments
  Reduces the size of XML documents
  Indexing is done once
Avoids redundancy of information
  Faster searches
Frequently updated documents are small
  Manageable
  Scalable
The complete XML document can be rebuilt
  For exchange outside the V-GISC

Indexing of XML fragments

[Figure: the same fragment hierarchy (WMO, UKMO, Synop, Heathrow, "Heathrow Synop", "Heathrow Synop 2005-10-12"), annotated with the indexes it feeds: Keywords, Geographical Extent, Temporal Extent]

Prototype implementation

XML fragments are stored as “text”
  Fragment table
  Hierarchy table
Indexed at insertion time
  Keywords table
  Locations table
  Periods table
  Directory table
Implemented with MySQL
  With OpenGIS extension
  With text search extension
Indexes are “inherited”
  OO approach
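One way to read "indexes are inherited": a search matches a fragment if the fragment itself, or any of its ancestors, carries the indexed term. The sketch below uses Python's built-in sqlite3 in place of MySQL, without the spatial or text-search extensions. The table and column names are illustrative guesses, not the V-GISC schema. A recursive query walks down from the fragments that carry the keyword to all their descendants.

```python
import sqlite3

# SQLite stands in for the MySQL prototype; schema names are illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE fragment  (id TEXT PRIMARY KEY, xml TEXT NOT NULL);
CREATE TABLE hierarchy (child TEXT NOT NULL, parent TEXT NOT NULL);
CREATE TABLE keywords  (fragment TEXT NOT NULL, keyword TEXT NOT NULL);
CREATE INDEX keywords_kw ON keywords(keyword);
""")


def insert_fragment(frag_id, xml, parents=(), keywords=()):
    """Store the fragment as text and index it at insertion time."""
    db.execute("INSERT INTO fragment VALUES (?, ?)", (frag_id, xml))
    db.executemany("INSERT INTO hierarchy VALUES (?, ?)",
                   [(frag_id, p) for p in parents])
    db.executemany("INSERT INTO keywords VALUES (?, ?)",
                   [(frag_id, k.lower()) for k in keywords])


def search_keyword(keyword):
    """Fragments carrying the keyword themselves or inheriting it from an
    ancestor (exact match; a real system would use full-text search)."""
    rows = db.execute("""
        WITH RECURSIVE hit(id) AS (
            SELECT fragment FROM keywords WHERE keyword = ?
            UNION
            SELECT h.child FROM hierarchy h JOIN hit ON h.parent = hit.id
        )
        SELECT id FROM hit ORDER BY id
    """, (keyword.lower(),))
    return [r[0] for r in rows]


# Hypothetical fragments; the XML bodies are left empty for brevity.
insert_fragment("synop", "<metaData/>", keywords=["SYNOP", "Observation"])
insert_fragment("heathrow", "<metaData/>", keywords=["Heathrow"])
insert_fragment("heathrow.synop", "<metaData/>", parents=["synop", "heathrow"])
insert_fragment("heathrow.synop.2005-10-12", "<metaData/>",
                parents=["heathrow.synop"])

print(search_keyword("synop"))
```

The daily fragment has no keywords of its own, yet a search for "synop" finds it through its parents.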

Object Oriented Approach - Behaviours

[Figure: the same fragment hierarchy (WMO, UKMO, Synop, Heathrow, "Heathrow Synop", "Heathrow Synop 2005-10-12"), with indexing behaviours attached to its fragments:
  Index <geographicElement><boundingBox> as geography
  Index <featureAttribute><membrName> as keyword
  Index <referenceDate><date> as period
  Index <descriptiveKeywords> as keyword]

Fragment properties - Behaviours

Only the owner of the data knows how to:
  Describe the data (indexing information)
  Request the data (create internal requests)
  Extract a subset of the data (define an interface to extract a subset)
Ancillary metadata can be associated with each fragment to describe how to index, request and sub-select the data
Behaviours are inherited
  Object-oriented approach

Behaviours example: indexing

<indexing class="XPathKeywordIndexer" separator=" ">
  <xpath>//identificationInfo/descriptiveKeywords</xpath>
</indexing>

<indexing class="XPathBoundingBoxIndexer">
  <xpath>//identificationInfo/dataExtent/geographicElement/boundingBox</xpath>
</indexing>

<indexing class="XPathPolygonIndexer">
  <xpath>//identificationInfo/dataExtent/geographicElement/polygon</xpath>
</indexing>

<indexing class="XPathDateIndexer">
  <xpath>//identificationInfo/referenceDate/date</xpath>
</indexing>

<indexing class="XPathPeriodIndexer">
  <xpath>//identificationInfo/dataExtent/temporalElement</xpath>
  <xpath>//identificationInfo/referenceDate/period</xpath>
</indexing>

<indexing class="XPathDirectoryIndexer">
  <xpath>//identificationInfo/topicCategory</xpath>
</indexing>
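The `class` attribute names an indexer implementation, and the `<xpath>` children say where it applies. The sketch below dispatches these behaviours in Python with ElementTree. The registry and the two indexer functions are hypothetical stand-ins for the classes named above; only two of the six are sketched. The `<behaviours>` wrapper element is also an assumption, and documents are assumed to carry no namespace.

```python
import xml.etree.ElementTree as ET


def keyword_indexer(elements, config):
    """Index element texts as keywords, split on the optional 'separator'."""
    sep = config.get("separator")
    entries = []
    for e in elements:
        text = (e.text or "").strip()
        for term in (text.split(sep) if sep else [text]):
            if term:
                entries.append(("keyword", term))
    return entries


def directory_indexer(elements, config):
    """Index element texts as directory paths, unchanged."""
    return [("directory", (e.text or "").strip()) for e in elements]


# Hypothetical registry from the 'class' attribute to an implementation.
INDEXERS = {"XPathKeywordIndexer": keyword_indexer,
            "XPathDirectoryIndexer": directory_indexer}


def run_indexing(behaviours_xml, document_xml):
    """Apply each <indexing> behaviour to a (namespace-free) metadata document."""
    doc = ET.fromstring(document_xml)
    entries = []
    for rule in ET.fromstring(behaviours_xml).iter("indexing"):
        indexer = INDEXERS.get(rule.get("class"))
        if indexer is None:
            continue  # behaviour not implemented in this sketch
        for xp in rule.findall("xpath"):
            # ElementTree rejects paths starting with '//'; './/' searches
            # all descendants of the root instead.
            entries.extend(indexer(doc.findall("." + xp.text.strip()), rule.attrib))
    return entries


BEHAVIOURS = """<behaviours>
  <indexing class="XPathKeywordIndexer" separator=" ">
    <xpath>//identificationInfo/descriptiveKeywords</xpath>
  </indexing>
  <indexing class="XPathDirectoryIndexer">
    <xpath>//identificationInfo/topicCategory</xpath>
  </indexing>
</behaviours>"""

DOC = """<metaData><identificationInfo>
  <descriptiveKeywords>ERA40 GRIB</descriptiveKeywords>
  <topicCategory>NWP Outputs > ECMWF > 40 years reanalysis</topicCategory>
</identificationInfo></metaData>"""

for entry in run_indexing(BEHAVIOURS, DOC):
    print(entry)
```

Because the behaviours are data rather than code, a site whose keywords are not space-separated can simply omit the `separator` attribute.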

<vgisc> extension

A <vgisc> element from the “http://www.vgisc.org/” namespace is embedded in all the fragments

It contains all the information needed to implement the V-GISC that is not defined by the WMO core, because it is not relevant outside the scope of the V-GISC:
  Internal unique ID
  Hierarchy relationship
  Physical location (which V-GISC node holds the data)
  Information used to create data requests
  Information used to create web pages
It is removed when the full XML document is recomposed for use outside the V-GISC
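The sketch below covers only the last step of recomposition: removing every <v:vgisc> element before a document leaves the V-GISC. It uses Python's ElementTree. The function name and the sample document, including its id, are hypothetical.

```python
import xml.etree.ElementTree as ET

VGISC_TAG = "{http://www.vgisc.org/}vgisc"  # Clark notation for <v:vgisc>


def strip_vgisc(xml_text):
    """Serialise a document with every <v:vgisc> element removed."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):  # copy: we modify parent while looping
            if child.tag == VGISC_TAG:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


DOC = ("<metaData xmlns:v='http://www.vgisc.org/'>"
       "<v:vgisc><id>urn:example</id><location>ecmwf.obs</location></v:vgisc>"
       "<identificationInfo><referenceDate><date>2005-06-29</date></referenceDate>"
       "</identificationInfo></metaData>")
print(strip_vgisc(DOC))
# <metaData><identificationInfo><referenceDate><date>2005-06-29</date></referenceDate></identificationInfo></metaData>
```

No V-GISC namespace declaration survives, because ElementTree only declares namespaces that are still used when it serialises the document.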

Fragment example

<metaData xmlns:v='http://www.vgisc.org/'>
  <v:vgisc>
    <id>urn:akrotiri.synop.land.second.record.20050629</id>
    <inherit>urn:akrotiri</inherit>
    <inherit>urn:int.wmo.synop.land.second.record</inherit>
    <location>ecmwf.obs</location>
  </v:vgisc>
  <identificationInfo>
    <referenceDate>
      <date>2005-06-29</date>
    </referenceDate>
  </identificationInfo>
</metaData>
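The <inherit> elements supply the links for the hierarchy table, and <location> tells the portal which node holds the data. The sketch below reads both from the fragment above using Python's ElementTree; the function name is hypothetical. In this fragment only <vgisc> itself is namespace-qualified, so its children are looked up without a namespace.

```python
import xml.etree.ElementTree as ET

V = "{http://www.vgisc.org/}"  # Clark-notation prefix for the V-GISC namespace


def read_vgisc_header(xml_text):
    """Return the id, parent ids and storage location declared in <v:vgisc>."""
    block = ET.fromstring(xml_text).find(V + "vgisc")
    if block is None:
        raise ValueError("not a V-GISC fragment")
    # <id>, <inherit> and <location> carry no prefix, hence no namespace.
    return {
        "id": block.findtext("id"),
        "inherit": [e.text for e in block.findall("inherit")],
        "location": block.findtext("location"),
    }


FRAGMENT = """<metaData xmlns:v='http://www.vgisc.org/'>
  <v:vgisc>
    <id>urn:akrotiri.synop.land.second.record.20050629</id>
    <inherit>urn:akrotiri</inherit>
    <inherit>urn:int.wmo.synop.land.second.record</inherit>
    <location>ecmwf.obs</location>
  </v:vgisc>
  <identificationInfo>
    <referenceDate><date>2005-06-29</date></referenceDate>
  </identificationInfo>
</metaData>"""

print(read_vgisc_header(FRAGMENT))
```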

Variables and Requests

Some datasets have too many items
  Impossible to describe every one of them
  But describing the whole dataset is simple
Some datasets are very homogeneous
  E.g. the same parameters for a long period of time
  This can be described in a compact form (<beginDateTime> and <endDateTime>)
  But we still need to specify that individual dates can be requested by the user

Variables and requests (cont.)

Associate two elements with an XML fragment:
  <request>: holds information on how to generate a valid request to the data repository
  <variable>: holds information on how to create a web interface that lets the user select items from the dataset
Web portal
  We use the WMO core for discovery
  We use the <variable> element to present selection dialogues to the user

Fragment example: ECMWF Reanalysis

<metadata xmlns:v='http://www.vgisc.org/'>
  <v:vgisc>
    <id>urn:int.ecmwf.era40.sfc</id>
    <inherit>urn:int.wmo.core</inherit>
    <location>ecmwf.mars</location>
    <request>
      <class>e4</class>
      <levtype>sfc</levtype>
      <database>marser</database>
    </request>
    <variables>
      <date type='date'>
        <startDate>1980-01-01</startDate>
        <endDate>1990-12-31</endDate>
      </date>
      <param title='Parameter' multiple='1' type='enum'>
        <value>2t</value>
        <value>msl</value>
      </param>
      <time title='Base time' multiple='1' type='enum'>
        <value>0000</value>
        <value>0600</value>
        <value>1200</value>
        <value>1800</value>
      </time>
    </variables>
  </v:vgisc>
  <identificationInfo>
    <descriptiveKeywords>ECMWF 40 Years reanalysis ERA40 ERA-40 in GRIB</descriptiveKeywords>
    <topicCategory>NWP Outputs > ECMWF > 40 years reanalysis</topicCategory>
    <dataExtent>
      <temporalElement>
        <beginDateTime>1980-01-01</beginDateTime>
        <endDateTime>1990-12-31</endDateTime>
      </temporalElement>…
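The <request> and <variables> blocks are enough to turn a user's selections into a concrete request while checking them against what the fragment offers. This includes any single date within the compact start/end range. The sketch below is a hypothetical Python function. It assumes the <v:vgisc> contents are passed without their namespace prefix. Its 'retrieve,key=value,...' output with '/' between list values imitates a MARS-style request for illustration only.

```python
import xml.etree.ElementTree as ET
from datetime import date


def build_request(vgisc_xml, choices):
    """Merge the fixed <request> keys with user choices checked against <variables>."""
    block = ET.fromstring(vgisc_xml)
    request = {e.tag: e.text for e in block.find("request")}
    for var in block.find("variables"):
        if var.tag not in choices:
            raise ValueError(f"no value chosen for {var.tag}")
        picked = choices[var.tag]
        kind = var.get("type")
        if kind == "enum":
            values = picked if isinstance(picked, list) else [picked]
            if var.get("multiple") != "1" and len(values) > 1:
                raise ValueError(f"{var.tag} accepts a single value")
            offered = {v.text for v in var.findall("value")}
            unknown = [v for v in values if v not in offered]
            if unknown:
                raise ValueError(f"{var.tag}: {unknown} not offered")
            request[var.tag] = "/".join(values)
        elif kind == "date":
            # Any single day inside the compactly described range is valid.
            first = date.fromisoformat(var.findtext("startDate"))
            last = date.fromisoformat(var.findtext("endDate"))
            if not first <= date.fromisoformat(picked) <= last:
                raise ValueError(f"{picked} is outside {first}..{last}")
            request[var.tag] = picked
        else:
            raise ValueError(f"unsupported variable type: {kind}")
    return "retrieve," + ",".join(f"{k}={v}" for k, v in request.items())


ERA40_SFC = """<vgisc>
  <request><class>e4</class><levtype>sfc</levtype><database>marser</database></request>
  <variables>
    <date type='date'><startDate>1980-01-01</startDate><endDate>1990-12-31</endDate></date>
    <param title='Parameter' multiple='1' type='enum'><value>2t</value><value>msl</value></param>
    <time title='Base time' multiple='1' type='enum'>
      <value>0000</value><value>0600</value><value>1200</value><value>1800</value>
    </time>
  </variables>
</vgisc>"""

print(build_request(ERA40_SFC, {"date": "1985-07-01", "param": ["2t", "msl"], "time": "1200"}))
# retrieve,class=e4,levtype=sfc,database=marser,date=1985-07-01,param=2t/msl,time=1200
```

The same <variables> block could drive the portal's selection dialogues: an enumeration becomes a list of checkboxes, and a date range becomes a date picker.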

Directory structure

Problem: create a browsable hierarchy of topics, like the “Google directory” (see NCAR’s Community Data Portal)

Not to be confused with the internal “fragment hierarchy”, which is not exposed to the end user

Currently using the <topicCategory> element:
  <topicCategory>NWP Outputs > ECMWF > 40 years reanalysis</topicCategory>

The same product can appear in several locations of the directory:
  <topicCategory>Observations > By Type > Profile > Temp Land</topicCategory>
  <topicCategory>Observations > By Region > Asia > China</topicCategory>

Usage should be recommended by the WMO
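A browsable tree can be derived directly from the '>'-separated <topicCategory> paths. The sketch below builds it as nested Python dicts. The product ids are hypothetical, and storing each node's products under a reserved "_items" key is an assumed convention. A product with several <topicCategory> elements appears at each of its locations.

```python
def build_directory(entries):
    """Build a browsable topic tree from (product_id, topicCategory) pairs.

    Each node maps a sub-topic name to its subtree; the reserved key
    "_items" (assumed convention) lists the products filed at that node.
    """
    tree = {}
    for product, category in entries:
        node = tree
        for level in category.split(">"):
            node = node.setdefault(level.strip(), {})
        node.setdefault("_items", []).append(product)
    return tree


def show(node, depth=0):
    """Print the tree with one indented line per topic."""
    for name, child in node.items():
        if name == "_items":
            continue
        items = child.get("_items", [])
        print("  " * depth + name + (f"  {items}" if items else ""))
        show(child, depth + 1)


# Hypothetical product ids; the category paths are the examples above.
directory = build_directory([
    ("era40.sfc", "NWP Outputs > ECMWF > 40 years reanalysis"),
    ("temp.china", "Observations > By Type > Profile > Temp Land"),
    ("temp.china", "Observations > By Region > Asia > China"),
])
show(directory)
```

This works only if every publisher spells the shared upper levels identically. That is why the topic hierarchy itself needs to be agreed.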

Conclusion

The approach taken in the V-GISC should help us support the large variety of XML documents
Nevertheless, the standard is too flexible
  A lot of programming is required to support all possible variations
The WMO must provide “best practices” guidelines
  How to encode a point in time, how to encode ranges, …
A topic hierarchy must be defined to create the directory
WMO core metadata need only contain sufficient information for discovery
  The rest can be implemented as a series of local extensions, as long as they are not exported or exchanged