metadata transformation important technical considerations: extraction / normalization / enrichment


Upload: ronald-mcdonald

Post on 17-Jan-2016


Page 1

METADATA TRANSFORMATION

Important technical considerations: extraction / normalization / enrichment

Page 2

Extraction: XML is picky

• All tags must be closed, as opposed to HTML
  – <element></element>
  – <element/>

• Doesn’t like any of your special characters
  – & = &amp;
  – < = &lt;
  – > = &gt;

• Encoding sensitive
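
The three points above can be illustrated with a minimal Python sketch using only the standard library. The sample title string is made up for illustration; the sketch shows that unescaped text is rejected by the parser, that escaping fixes it, and that the encoding can be declared explicitly on output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative value (not from the slides): contains &, <, > and a non-ASCII character.
raw_title = "Café & Bar <1920>"

# Unescaped special characters make the document ill-formed, so parsing fails.
try:
    ET.fromstring(f"<title>{raw_title}</title>")
    parsed_unescaped = True
except ET.ParseError:
    parsed_unescaped = False

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
safe_xml = f"<title>{escape(raw_title)}</title>"
title = ET.fromstring(safe_xml).text  # the parser turns the entities back into characters

# Serialize to bytes with an explicit encoding declaration.
encoded = ET.tostring(ET.fromstring(safe_xml), encoding="utf-8", xml_declaration=True)

print(parsed_unescaped)
print(safe_xml)
print(encoded)
```

Escaping at export time, rather than patching the XML afterwards, keeps the original values recoverable: the parsed text is identical to the raw input.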

Page 3

Extraction: Attributes

• Consider the following example:
• If you export names, post codes and coordinates into the "coverage" element – how can you use these afterwards?
  – <coverage>London</coverage>
  – <coverage>12.1234,89.1235531</coverage>

• The ESE doesn’t define these attributes for anything but language
  – <coverage type="text">London</coverage>
  – <coverage type="coordinates">12.1234,89.1235531</coverage>
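
Without a type attribute, a consumer has to guess what each coverage value means. The following is a rough sketch of such a guess, not part of the ESE: the function name `classify_coverage`, the regular expression, and the assumption that pairs are ordered latitude, longitude are all illustrative.

```python
import re

# Two decimal numbers separated by a comma, e.g. "12.1234,89.1235531".
COORD_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$")


def classify_coverage(value):
    """Guess whether an untyped coverage value is a coordinate pair or free text."""
    match = COORD_RE.match(value)
    if match:
        lat, lon = float(match.group(1)), float(match.group(2))
        # Assumption: the pair is "latitude,longitude" in decimal degrees.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            return ("coordinates", (lat, lon))
    return ("text", value.strip())


for value in ["London", "12.1234,89.1235531"]:
    print(classify_coverage(value))
```

A heuristic like this cannot tell a place name from a post code, which is the practical argument for recording the type at export time.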

Page 4

Extraction: Additional data

• The ESE may not always contain all the information that may be interesting from an aggregator’s perspective

• The ESE can be extended without breaking the format – but it needs to be done in such a way as not to conflict or interfere with the XML structure of ESE elements
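
One common way to extend an XML format without interfering with its own elements is to put the extra data in a separate namespace. The sketch below assumes this approach; the slides do not prescribe it. The `ext` namespace URI and the `coordinates` element are hypothetical, the unqualified `record` wrapper is a simplification, and only the Dublin Core namespace is a real one.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"    # Dublin Core element namespace
EXT = "http://example.org/aggregator-ext/"  # hypothetical extension namespace

ET.register_namespace("dc", DC)
ET.register_namespace("ext", EXT)

# Simplified record: standard elements stay untouched, extra data lives in "ext".
record = ET.Element("record")
ET.SubElement(record, f"{{{DC}}}coverage").text = "London"
ET.SubElement(record, f"{{{EXT}}}coordinates").text = "12.1234,89.1235531"

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# A consumer that only understands Dublin Core can skip the extension elements.
dc_values = [el.text for el in record if el.tag.startswith(f"{{{DC}}}")]
print(dc_values)
```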

Page 5

Normalization: dates

• Date extraction is somewhat inaccurate and may well render "bogus" output
  – "...it was almost as bad as in the 1920s..."
  – "...back in the dark ages..."
  – Values given by reference may be erroneously considered valid for the content

• If uncertain about what to put where – consider what is most useful to the end-user
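
To see why automatic date extraction produces bogus output, consider a deliberately naive extractor. This is a sketch for illustration only; the pattern and the function name `naive_years` are assumptions, not a method described in the slides.

```python
import re

# Naive pattern: any four-digit year from 1000 to 2099, optionally followed by "s".
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})s?\b")


def naive_years(text):
    """Return every year-like number found in the text, with no regard for context."""
    return [int(m.group(1)) for m in YEAR_RE.finditer(text)]


print(naive_years("...it was almost as bad as in the 1920s..."))
print(naive_years("...back in the dark ages..."))
```

The first sentence yields 1920 even though the year is only a comparison and says nothing about the date of the content, while the second names a period and yields nothing at all.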

Page 6

Normalization: vocabularies

• Van Eyck, Jan
  – Jan Van Eyck
  – Van Eyck Jan
  – Van Eyck, Jan en Hubert
  – gebroeders Van Eyck
  – Van Eyck, J. (1395-1441)

• (Example from “Erfgoedplus.be”, courtesy of Jef Malliet)
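
A small lookup table is one way to map such variants to a preferred form. The sketch below is an assumption about how this could be done, not the method used by Erfgoedplus.be; the names `VARIANTS`, `normalize_key` and `to_preferred` are illustrative. Variants that refer to both brothers are deliberately left unmapped, since they do not identify Jan alone.

```python
import re

PREFERRED = "Van Eyck, Jan"

# Normalized variant -> preferred label (illustrative table).
VARIANTS = {
    "van eyck jan": PREFERRED,
    "jan van eyck": PREFERRED,
    "van eyck j": PREFERRED,
}


def normalize_key(name):
    """Drop parenthesized life dates and punctuation, lowercase, collapse spaces."""
    name = re.sub(r"\(.*?\)", "", name)
    name = re.sub(r"[.,]", " ", name)
    return " ".join(name.lower().split())


def to_preferred(name):
    """Return the preferred label, or None when the variant needs manual review."""
    return VARIANTS.get(normalize_key(name))


for name in ["Jan Van Eyck", "Van Eyck, J. (1395-1441)", "gebroeders Van Eyck"]:
    print(name, "->", to_preferred(name))
```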

Page 7

Normalization: precision

• ca. 1560
• 1560 ?
• 16th century
• 1500-1599

• (Example from “Erfgoedplus.be”, courtesy of Jef Malliet)
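
One possible way to make these four notations comparable is to convert each into a year range plus an uncertainty flag. This is a sketch under stated assumptions: the function name `to_year_range` and the five-year margin for "ca." are arbitrary choices, and "16th century" is mapped to 1500-1599 to follow the slide's own convention.

```python
import re


def to_year_range(value, circa_margin=5):
    """Return (start, end, uncertain) for a date string, or None if unrecognized."""
    v = value.strip().lower()

    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", v)  # "1500-1599"
    if m:
        return (int(m.group(1)), int(m.group(2)), False)

    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", v)  # "16th century"
    if m:
        start = (int(m.group(1)) - 1) * 100
        return (start, start + 99, False)

    m = re.fullmatch(r"ca\.\s*(\d{4})", v)  # "ca. 1560"
    if m:
        year = int(m.group(1))
        return (year - circa_margin, year + circa_margin, True)

    m = re.fullmatch(r"(\d{4})\s*\?", v)  # "1560 ?"
    if m:
        year = int(m.group(1))
        return (year, year, True)

    return None


for value in ["ca. 1560", "1560 ?", "16th century", "1500-1599"]:
    print(value, "->", to_year_range(value))
```

Keeping the original string alongside the computed range avoids losing the precision information the cataloguer recorded.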

Page 8

Enrichment: what is it?

• Example
  – Mapping content values to a common vocabulary with defined relationships between them
  – Enables vast quantities of unrelated content to be automatically linked to each other – rendering considerable added value

• Example
  – Automatic language translation
  – Poor quality – but possibly better than nothing
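
The first example, mapping values to a common vocabulary, can be sketched as follows. Everything here is hypothetical and chosen for illustration: the concept identifiers, the record identifiers, the place-name spellings, and the `BROADER` relation standing in for "defined relationships".

```python
from collections import defaultdict

# Hypothetical vocabulary: local value -> shared concept identifier.
VALUE_TO_CONCEPT = {
    "London": "place:london",
    "Londen": "place:london",
    "Londres": "place:london",
    "Brugge": "place:bruges",
    "Bruges": "place:bruges",
}

# Hypothetical relationship between concepts (narrower -> broader).
BROADER = {"place:london": "place:uk", "place:bruges": "place:belgium"}

# Records from unrelated sources, each with its own spelling.
RECORDS = [
    ("museum-a/1", "London"),
    ("archive-b/7", "Londres"),
    ("library-c/3", "Brugge"),
]


def link_by_concept(records):
    """Group record ids by the shared concept their value maps to."""
    groups = defaultdict(list)
    for record_id, value in records:
        concept = VALUE_TO_CONCEPT.get(value)
        if concept is not None:
            groups[concept].append(record_id)
    return dict(groups)


links = link_by_concept(RECORDS)
print(links)
print(BROADER["place:london"])
```

Records that never shared a literal string end up linked through the concept, and the broader relation allows further grouping, for example by country.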