translation proofing

Translation Proofing – Quantitative Tools for Connecting Metadata Dialects. Ted Habermann, Director of Earth Science, The HDF Group, thabermann@hdfgroup.org

Upload: ted-habermann

Post on 16-Jan-2015


Category: Science



DESCRIPTION

Communities use many different dialects to document their data. We need to be able to translate between these dialects and to understand how much is lost in translation.

TRANSCRIPT

Page 1: Translation proofing


Translation Proofing – Quantitative Tools for Connecting Metadata Dialects

Ted Habermann, Director of Earth Science, The HDF Group, thabermann@hdfgroup.org

Page 2: Translation proofing

Metadata in Multiple Dialects

Documentation Repository

ISO 19115, 19115-2, 19119 and extensions

THREDDS

HDF, netCDF (NcML)

FGDC, Data.Gov

SensorML

WCS, WMS, WFS, SOS

Open Provenance Model, PROV

DIF, ECS, ECHO

KML

Page 3: Translation proofing

Translation Lossiness

Documentation dialects generally overlap significantly because the concepts being documented (who, where, what, when, and why?) are shared across many communities and dialects.

At the same time, there are differences…

[Figure: dialects A and B with overlap AB, shown from more lossy to less lossy]

We are familiar with the idea of lossiness from data compression. How can we quantify the lossiness of a translation?

Page 4: Translation proofing

Characterizing the Source

The distribution of elements in any metadata collection reflects the requirements of the data providers and users. Some elements are more common (important?) than others.

This heterogeneity needs to be considered when evaluating the translation.

448 CSDGM records
161,151 elements and attributes
Place Keywords: 10,713 occurrences
/metadata/USGSErp/MetadataNotes: 1 occurrence
264 elements occur < 100 times
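A distribution like the one summarized above can be gathered by walking each record and counting element paths. The following is a minimal Python sketch. It assumes the records are XML strings and that an element is identified by its full path from the root (as in /metadata/USGSErp/MetadataNotes), with attributes recorded as path/@name. The helper names `element_distribution` and `rare_elements` are hypothetical, as are the two tiny sample records. This is not the tooling used to produce the CSDGM counts.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, List


def element_distribution(records: Iterable[str]) -> Counter:
    """Count element and attribute occurrences, keyed by path, across XML records.

    Hypothetical helper: elements are keyed as /root/child/..., attributes
    as /root/child/@name. Namespaced tags keep ElementTree's {uri}local form.
    """
    counts: Counter = Counter()

    def walk(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        counts[path] += 1
        for name in elem.attrib:
            counts[f"{path}/@{name}"] += 1
        for child in elem:  # child elements only; the default parser drops comments
            walk(child, path)

    for text in records:
        walk(ET.fromstring(text), "")
    return counts


def rare_elements(counts: Counter, threshold: int) -> List[str]:
    """Paths that occur fewer than `threshold` times (hypothetical helper)."""
    return sorted(path for path, n in counts.items() if n < threshold)


# Two tiny, made-up CSDGM-style records for illustration.
records = [
    "<metadata><idinfo><keywords><place>"
    "<placekey>Colorado</placekey><placekey>Utah</placekey>"
    "</place></keywords></idinfo></metadata>",
    "<metadata><idinfo><keywords><place>"
    "<placekey>Texas</placekey>"
    "</place></keywords></idinfo></metadata>",
]
dist = element_distribution(records)
print(dist.most_common(1))  # [('/metadata/idinfo/keywords/place/placekey', 3)]
```

`dist.most_common(n)` gives the dominant elements. `rare_elements(dist, 100)` gives the long tail, which is the analogue of the "< 100 times" figure above.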

Page 5: Translation proofing

Lossiness = Distribution + Crosswalk

[Figure: Actual Distribution (collection & community) + Reference Crosswalk, mapping Source to Destination]

To calculate the lossiness of a translation, we need the actual distribution of elements in the source and a reference crosswalk that gives the destination each source element is mapped to.

Page 6: Translation proofing

Three Examples

ESIP Winter 2014, January 8-10, 2014

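Element A occurs 134 times and makes up 66% of the source. Element B occurs 50 times and makes up 25% of the source. Element C occurs 20 times and makes up 10% of the source.

Example 1: all three elements are in the crosswalk.

Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    1             25%
C         20    10%    1             10%
Total     204   100%                 100%

Example 2: element B is not in the crosswalk.

Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    0             0%
C         20    10%    1             10%
Total     204   100%                 75%

Example 3: element C is not in the crosswalk.

Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    1             25%
C         20    10%    0             0%
Total     204   100%                 90%

Row percentages are rounded. Totals are computed from the counts (for example, 184/204 ≈ 90%).

100% of element occurrences translated: lossiness = 0%

75% of element occurrences translated: lossiness = 25%

90% of element occurrences translated: lossiness = 10%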
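Working the three examples from the raw counts shows where each lossiness figure comes from. Lossiness is the share of element occurrences that have no destination in the crosswalk. Computing from counts rather than from rounded row percentages avoids rounding drift: 66% + 25% suggests 91%, but 184/204 is about 90.2%.

$$
\begin{aligned}
\text{Example 1: } & 1 - \frac{134 + 50 + 20}{204} = 1 - \frac{204}{204} = 0 \\
\text{Example 2: } & 1 - \frac{134 + 20}{204} = \frac{50}{204} \approx 0.245 \approx 25\% \\
\text{Example 3: } & 1 - \frac{134 + 50}{204} = \frac{20}{204} \approx 0.098 \approx 10\%
\end{aligned}
$$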

Page 7: Translation proofing

Calculating Lossiness

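$$
\text{Lossiness} = 1 - \sum_{n=1}^{\text{number of elements}} \frac{\text{Number of Occurrences}_n}{\text{Total Number of Elements}} \times \begin{cases} 1 & \text{if in crosswalk} \\ 0 & \text{if not} \end{cases}
$$

The fraction comes from the actual distribution (collection & community). The 1/0 factor comes from the reference crosswalk from source to destination.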
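The formula maps directly onto a few lines of code. This Python sketch assumes two inputs:

- The distribution is a mapping from source element to occurrence count.
- The crosswalk is a mapping from source element to destination element, where a missing or `None` entry means "not in the crosswalk".

The function names `lossiness` and `lost_elements` and the destination names are hypothetical. The sketch is for illustration only.

```python
from typing import List, Mapping, Optional, Tuple


def lossiness(occurrences: Mapping[str, int],
              crosswalk: Mapping[str, Optional[str]]) -> float:
    """1 - sum_n (occurrences_n / total) * c_n, with c_n = 1 if element n is in the crosswalk.

    An element counts as "in the crosswalk" when it maps to a non-None destination.
    """
    total = sum(occurrences.values())
    if total == 0:
        raise ValueError("the source distribution is empty")
    # Summing integer counts first is algebraically the same as summing the
    # per-element fractions, and avoids floating-point drift in the total.
    translated = sum(n for element, n in occurrences.items()
                     if crosswalk.get(element) is not None)
    return 1 - translated / total


def lost_elements(occurrences: Mapping[str, int],
                  crosswalk: Mapping[str, Optional[str]]) -> List[Tuple[str, int]]:
    """Source elements with no destination, most frequent first."""
    lost = [(e, n) for e, n in occurrences.items() if crosswalk.get(e) is None]
    return sorted(lost, key=lambda item: item[1], reverse=True)


counts = {"A": 134, "B": 50, "C": 20}
crosswalk = {"A": "dest/A", "C": "dest/C"}  # hypothetical destinations; B is unmapped
print(f"{lossiness(counts, crosswalk):.1%}")  # 24.5%
print(lost_elements(counts, crosswalk))       # [('B', 50)]
```

Because each element is weighted by its occurrences, `lost_elements` ranks the gaps by their contribution to lossiness. The top of that list shows which crosswalk entries would reduce lossiness the most.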

Page 8: Translation proofing


Questions?

thabermann@hdfgroup.org

Page 9: Translation proofing

Acknowledgements

This work was partially supported by contract number NNG10HP02C from NASA.

Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of NASA or The HDF Group.