Tips & Tricks for Spatial Data Harmonization

Dr. Christine Giger

Dr. Jan Schulze Althoff

Overview

➞Why is Spatial Data Harmonization still important or necessary?
➞Tools & Methods
➞Tips & Tricks
➞Conclusions

25.06.2013 INSPIRE Conference 2013, Florence - Dr. Christine Giger 2

Provision of INSPIRE-compliant data

➞All of the commercial SW vendors and many open source products offer “off-the-shelf” solutions to work with INSPIRE-compliant services

➞All data providers deliver their data in INSPIRE-compliant form

➞Everything should be interoperable when using INSPIRE-compliant data

BUT: Observed Problems (Software)

➞Still various interoperability issues between different software systems
– “Data exported by system abc as INSPIRE-compliant data cannot directly be used with system xyz”
– Some examples:
  • The data is not valid against the schema
  • The data and/or the schema cannot be imported into the favoured spatial ETL tool or GIS ABC
  • The data cannot be visualized in the ETL tool or GIS ABC
  • Data on different themes created by the same tool cannot be integrated or migrated
  • The data cannot be migrated with other (non-INSPIRE) GML/XML data
  • Etc.

Causes for the problems

➞Many GIS and ETL tools...:
– ... operate “data-centred” and are not “schema-aware”
  • Data is not validated in the production process (errors like missing attributes, non-declared elements)
  • Implicit restrictions (e.g. a specific geometry representation is required, <pos> vs. <posList>, or a specific position of the “srs” attribute)
– ... use hard-coded namespaces and schema locations
– ... use a vendor-specific GML 3.2.1 core schema
– ... use deprecated types
– …
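
The implicit <pos> vs. <posList> restriction can often be worked around with a small normalisation step before import. The following is only a minimal sketch in Python (standard library only), assuming GML 3.2 LineStrings; the function name `pos_to_poslist` and the sample data are illustrative assumptions and not taken from the slides.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"
ET.register_namespace("gml", GML_NS)


def pos_to_poslist(root):
    """Replace the gml:pos children of each gml:LineString by one gml:posList.

    Returns the number of converted LineStrings.
    """
    converted = 0
    # list(...) so that the tree is not mutated while it is being walked
    for line in list(root.iter(f"{{{GML_NS}}}LineString")):
        positions = line.findall(f"{{{GML_NS}}}pos")
        if len(positions) < 2:
            continue  # nothing to merge (e.g. already encoded as posList)
        # Normalise whitespace inside each position and join them in order
        coords = " ".join(" ".join(p.text.split()) for p in positions)
        for p in positions:
            line.remove(p)
        pos_list = ET.SubElement(line, f"{{{GML_NS}}}posList")
        pos_list.text = coords
        converted += 1
    return converted


# Illustrative input, not real data
SAMPLE = """<gml:LineString xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="l1">
  <gml:pos>0 0</gml:pos>
  <gml:pos>10 5</gml:pos>
</gml:LineString>"""

root = ET.fromstring(SAMPLE)
count = pos_to_poslist(root)
result = root.find(f"{{{GML_NS}}}posList").text
```

The same pattern works in the other direction if the target system only accepts <pos> elements.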

Observed Problems (Data provision)

➞Often data is still delivered in different formats/structures
– Several XML-based formats, e.g.
  • proprietary XML formats
  • GML 2.1
  • GML 3.2.1 (in different flavors)
– Schemas are huge (up to 1.5 million lines) and (partly) complex (e.g. 580 complex types and over 80 referenced schemas)

Example: Overview of the structure of German topographic data (here: streets)

[Figure: type hierarchy spanning the schemas GML.xsd, AAA-Basisschema.xsd and AAA-Fachschema.xsd. Elements and types shown: AX_Strasse (AX_StrasseType), AX_Strassenachse (AX_StrassenachseType), AX_Fahrbahnachse (AX_FahrbahnachseType), AA_ZUSOType, AA_REOType, AA_ObjektType, AbstractFeatureType, TA_CurveComponentType, AG_ObjekteMitGemeinsamerGeometrie. Relations shown: istTeilVon, hatDirektUnten.]

Requirements for data harmonization

➞Interoperability between software
– “minor” adaptations (adding/removing attributes, changing namespaces, validating, …)
-> “Scripting tasks”
➞Data transformation of delivered/provided data
– “major” structural changes (extraction of elements, reclassification, grouping, …)
-> “Complex transformations”
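
A typical “scripting task” such as changing a namespace can be sketched in a few lines. The example below is a minimal Python illustration (standard library only); the function name `change_namespace` and the example.org namespace URIs are hypothetical and only serve to show the idea.

```python
import xml.etree.ElementTree as ET


def change_namespace(root, old_ns, new_ns):
    """Move all elements and attributes from namespace old_ns to new_ns."""
    old_prefix = f"{{{old_ns}}}"
    new_prefix = f"{{{new_ns}}}"
    for elem in root.iter():
        # ElementTree stores qualified names as "{namespace-uri}local-name"
        if isinstance(elem.tag, str) and elem.tag.startswith(old_prefix):
            elem.tag = new_prefix + elem.tag[len(old_prefix):]
        for name in list(elem.attrib):
            if name.startswith(old_prefix):
                elem.attrib[new_prefix + name[len(old_prefix):]] = elem.attrib.pop(name)
    return root


# Hypothetical namespaces, for illustration only
OLD = "http://example.org/roads/1.0"
NEW = "http://example.org/roads/2.0"

SAMPLE = f'<r:Road xmlns:r="{OLD}" r:id="a1"><r:name>Main St</r:name></r:Road>'

root = change_namespace(ET.fromstring(SAMPLE), OLD, NEW)
```

Note that this sketch does not validate the result; schema validation would need an additional schema-aware tool.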

Requirements for tools to support data transformation

Technical requirements (grouped/simplified)
➞Read & write XML data
➞Support namespaces & schema validation
➞Support filtering on values, types, structures
➞Support conditional statements
➞Support group functions
➞Support simple spatial operations
➞Support GML 3.2.1 types directly

Technology used in different projects

➞Spatial ETL
– Safe FME
– Talend Spatial Data Integrator
– GeoKettle
– Humboldt Alignment Editor
– …
➞Combinations of open toolsets
– XSLT
– Python
– GDAL
– XQuery (incl. EXPath Geo Module)
➞Observation: No or very few spatial transformations are needed

XQuery - Overview

➞Functional language to query and create XML
➞Official W3C standard aligned with
– XPath for addressing XML
– XSLT as template language
➞Increasing relevance and maturity
– esp. XML databases (eXist-db, MarkLogic, Oracle, MS, ..)
➞Several tools
– Execution environments (Saxon, Zorba, Altova, …)
– Development support (Eclipse XQDT, XMLSpy, Oxygen, Stylus Studio, …)

XQuery - Technically

➞Functional language
– Execution as chain of functions
– Variety of predefined functions
  • Standard functions for strings, numerics, paths, …
  • Extended functions for fulltext, geo operations, …
  • External functions in C, Java, …
➞XML oriented
– XPath-based selection and filtering
– Native XML types & schema aware
– Loops, conditional statements and grouping on XML collections
– Static & dynamic creation of XML elements

XQuery for Spatial Data Harmonization

➞Pros
– Open standard; several implementations and tools
– Optimized for XML processing (schema aware; collections/sequences and XML types)
– Modularization and external libraries (e.g. EXPath Geo)
➞Cons
– “Programming” language with specific syntax (steep learning curve)
– No direct geospatial support

Main “Tip & Trick”: use XQuery

➞Trick: for data/schema reduction
– Simplifies data analysis and understanding of structures
– Speeds up processing
➞Tip: for data/schema transformation
– Encapsulate repetitive tasks in functions
– Build modules for common structures

Example 1: Data reduction by filtering of feature types

(: Assumes that the prolog imports the schema and declares the
   namespaces gid, wfs and gml (plus the default element namespace). :)

(: Step 1 :)
declare variable $input as element() :=
  validate strict { doc('file:///C:/data/dataset.xml')/gid:AX_Bestandsdatenauszug };

(: Step 2 :)
let $featureSet := $input/enthaelt/wfs:FeatureCollection/gml:featureMember
  [name(child::*) = 'AX_Strasse'
   or name(child::*) = 'AX_Strassenachse'
   or name(child::*) = 'AX_Fahrbahnachse']

(: Step 3 :)
for $feature in $featureSet
return $feature

1. Define and validate an external file as data source
2. Select XML elements by using an XPath expression
3. Iterate over the result and return the data
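
For comparison, the same filtering idea can be sketched with one of the other open toolsets mentioned earlier. This is a rough Python equivalent (standard library only) and not the authors' implementation: it does not perform schema validation, and the wrapper element, the `ex` namespace and the sample features are made up for illustration.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"
WANTED = {"AX_Strasse", "AX_Strassenachse", "AX_Fahrbahnachse"}


def local_name(elem):
    """Return the element name without its "{namespace}" part."""
    return elem.tag.rsplit("}", 1)[-1]


def filter_members(root, wanted=WANTED):
    """Keep only gml:featureMember elements whose first child is a wanted type."""
    members = root.iter(f"{{{GML_NS}}}featureMember")
    return [m for m in members if len(m) > 0 and local_name(m[0]) in wanted]


# Illustrative input; the ex namespace is a placeholder, not the real AAA namespace
SAMPLE = """<wrapper xmlns:gml="http://www.opengis.net/gml/3.2"
                     xmlns:ex="http://example.org/aaa">
  <gml:featureMember><ex:AX_Strasse/></gml:featureMember>
  <gml:featureMember><ex:AX_Gebaeude/></gml:featureMember>
  <gml:featureMember><ex:AX_Fahrbahnachse/></gml:featureMember>
</wrapper>"""

kept = filter_members(ET.fromstring(SAMPLE))
kept_names = [local_name(m[0]) for m in kept]
```

Comparing on the local name keeps the filter independent of the prefix used in a particular file.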

Example 2: Reclassification of values

(: Step 1 :)
switch ($strasse/aaa:widmung)
  (: Step 2 :)
  case "1301" return attribute {"abc:roadType"} {"highway"}
  case "1303" return attribute {"abc:roadType"} {"road"}
  (: Step 3 :)
  default return attribute {"abc:roadType"} {"unknown"}

1. Select the criterion
2. Decode the values and create the corresponding attributes
3. Return the default attribute value
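
The reclassification above is essentially a lookup table with a default. A minimal Python sketch of the same idea follows; the names `ROAD_TYPES`, `road_type` and `reclassify` are hypothetical, and namespaces are left out for brevity, which a real transformation would have to handle.

```python
import xml.etree.ElementTree as ET

# Code list taken from the example above: source code -> target road type
ROAD_TYPES = {"1301": "highway", "1303": "road"}


def road_type(widmung):
    """Decode a 'widmung' code; unknown or missing codes map to 'unknown'."""
    code = (widmung or "").strip()
    return ROAD_TYPES.get(code, "unknown")


def reclassify(strasse, widmung_tag="widmung"):
    """Set a roadType attribute on the element based on its widmung child."""
    code = strasse.findtext(widmung_tag)  # None if the child is missing
    strasse.set("roadType", road_type(code))
    return strasse


# Illustrative elements without namespaces
highway = reclassify(ET.fromstring("<AX_Strasse><widmung>1301</widmung></AX_Strasse>"))
other = reclassify(ET.fromstring("<AX_Strasse><widmung>9999</widmung></AX_Strasse>"))
missing = reclassify(ET.fromstring("<AX_Strasse/>"))
```

Keeping the code list in one table makes it easy to extend without touching the transformation logic.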

Tips: Get started with XQuery

➞Info:
– XQuery Spec: http://www.w3.org/TR/xquery-30/
– XQuery Tutorial: http://www.w3schools.com/xquery/
➞Environment:
– Zorba (XQuery Processor): http://www.zorba.io/
– Eclipse XQDT: http://wiki.eclipse.org/XQDT/

XQuery Tools - XQDT

Tip: Use the older Eclipse “Indigo” (Eclipse “Juno” and “Kepler” fail on big XML Schemas)

Conclusions

➞XQuery is an excellent, easy-to-use method to specify and execute transformations for XML/GML data

➞Further possibilities for simplification, dependent on input and output schemas

Next Steps - XQuery Modules

GML Base Modules:
• GML321_basicTypes.xq
• GML321_geometryAggregates.xq
• GML321_geometryBasic0d1d.xq
• GML321_geometryBasic2d.xq
• GML321_geometryPrimitives.xq
• GML321_gmlBase.xq
• GML321_feature.xq
• XLink10.xq (XLink Schema)

Helper:
• Tools (Type extension, UUID, …)
• External Calls

Simplifying Modules:
• GML321_Simple.xq (Creation of Feature, Point, Curve, Surface)
• GML321_GeometryTools.xq (Harmonising Geometry, Simple Transf.)

Schema Modules:
• MySchema.xq (Creation of CoreElements)
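
To illustrate what a “simplifying” module could offer, here is a small Python sketch of helper functions that create GML 3.2 geometries. It is not the content of GML321_Simple.xq; the function names `make_point` and `make_curve`, the identifiers and the coordinates are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"
ET.register_namespace("gml", GML_NS)


def gml(name):
    """Qualified name in the GML 3.2 namespace, as ElementTree expects it."""
    return f"{{{GML_NS}}}{name}"


def make_point(gml_id, x, y, srs_name):
    """Create a gml:Point with a single gml:pos child."""
    point = ET.Element(gml("Point"), {gml("id"): gml_id, "srsName": srs_name})
    pos = ET.SubElement(point, gml("pos"))
    pos.text = f"{x} {y}"
    return point


def make_curve(gml_id, coords, srs_name):
    """Create a gml:LineString from (x, y) tuples, encoded as gml:posList."""
    line = ET.Element(gml("LineString"), {gml("id"): gml_id, "srsName": srs_name})
    pos_list = ET.SubElement(line, gml("posList"))
    pos_list.text = " ".join(f"{x} {y}" for x, y in coords)
    return line


# Illustrative values only
SRS = "urn:ogc:def:crs:EPSG::25832"
point = make_point("p1", 500000.0, 5400000.0, SRS)
curve = make_curve("c1", [(0, 0), (10, 5)], SRS)
xml_text = ET.tostring(point, encoding="unicode")
```

Encapsulating the geometry encoding in one place means that a decision such as <pos> vs. <posList> has to be made only once.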

Conclusions

➞We are very much interested in exchanging experiences on the usage of XQuery for the transformation of spatial data!

➞Thank you for your attention!
