Europeana Metadata - ESE to EDM
Robina Clayphan
Interoperability Manager, Europeana Foundation
Collections Trust, London, 28 June 2010
Introduction
• The current metadata schema
• Europeana Semantic Elements - ESE
• Content ingestion
• The future data model
• Europeana Data Model - EDM
Europeana
Europeana brings together and makes available digital
content from:
• Four cultural heritage sectors
• Museums, Archives, Libraries, Audio-visual archives
• Twenty-nine countries
• EU plus Norway and Switzerland
• Twenty-six languages
• Four types of material
• Image, sound, video, text
… need for a metadata lingua franca …
ESE V3.2
Europeana Semantic Elements (ESE) V3.2 developed for the
prototype
• A Dublin Core-based application profile
• Cross-domain schema for heterogeneous data
• Not intended to capture the full semantics of providers’ data
• 37 Dublin Core terms – used principally to describe the
objects
• 12 Europeana coined terms - used to support portal
functionality
• Needed to have consistent data for the portal to work
The Dublin Core elements
Element | Refinements
Title | Alternative
Creator |
Subject |
Description | TableOfContents
Publisher |
Contributor |
Date | Created; Issued
Type |
Format | Extent; Medium
Identifier |
Source |
Language |
Relation | isVersionOf; hasVersion; isReplacedBy; replaces; isRequiredBy; requires; isPartOf; hasPart; isReferencedBy; references; isFormatOf; hasFormat; conformsTo
Coverage | Spatial; Temporal
Rights |
Provenance |
Europeana elements
Element | Who is responsible | Function
europeana:isShownAt or europeana:isShownBy | Provider must provide at least one of these elements - both if applicable. | URL. Links to object
europeana:object | Provider - if appropriate to the data | URL. Source of thumbnail
europeana:provider | Provider must provide this element. | Controlled list. Facet
europeana:type | Provider must provide this element. | Controlled list. Facet
europeana:unstored | Provider – only if appropriate to your data. | Text string. Container element
europeana:country | Europeana | Facet
europeana:hasObject | Europeana | System use
europeana:language | Europeana | Facet
europeana:uri | Europeana | System Identifier
europeana:usertag | Europeana | User provided tags (future)
europeana:year | Europeana | Facet, timeline
Europeana is responsible for providing all the elements from europeana:country to europeana:year.
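The provider-side rules in the Europeana elements table can be sketched as a small check. This is only an illustration, not the ESE schema validation itself: the function name `check_provider_elements`, the dict-of-lists record layout and the exact spelling of the type values are assumptions made for the example.

```python
# Hypothetical sketch: a record is a dict mapping element names to lists of values.
# The four type values follow the controlled list named in the slides;
# their exact spelling in the real schema is an assumption here.
ALLOWED_TYPES = {"Text", "Image", "Sound", "Video"}


def check_provider_elements(record):
    """Return a list of problems; an empty list means the provider rules are met."""
    problems = []
    # At least one of isShownAt / isShownBy must be present.
    if not (record.get("europeana:isShownAt") or record.get("europeana:isShownBy")):
        problems.append("need europeana:isShownAt or europeana:isShownBy")
    # europeana:provider is mandatory.
    if not record.get("europeana:provider"):
        problems.append("missing europeana:provider")
    # europeana:type is mandatory and must come from the controlled list.
    types = record.get("europeana:type", [])
    if not types:
        problems.append("missing europeana:type")
    elif any(t not in ALLOWED_TYPES for t in types):
        problems.append("europeana:type not in controlled list")
    return problems


example = {
    "europeana:isShownAt": ["http://example.org/item/1"],  # placeholder URL
    "europeana:provider": ["Example Museum"],               # placeholder name
    "europeana:type": ["Image"],
}
print(check_provider_elements(example))
```

Elements that Europeana itself supplies (country, language, uri, year and so on) are deliberately not checked, since the provider is not responsible for them.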
Normalised elements
• Language
• ISO 639-1 standard two-character code.
• Country
• ISO 3166 standard
• Year
• Four digit year from Gregorian calendar (YYYY)
• Generated where possible from date supplied in <dc:date>
• Provider
• Controlled list of names, in the language of the provider
• Type
• Controlled list (in English) of four types: Text, Image, Sound, Video
• mapped from the diverse types used in source data (by provider)
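Two of the normalisation steps listed above, deriving a four-digit year from a date string and mapping a provider's own type to one of the four Europeana types, can be sketched as follows. The mapping table `TYPE_MAP` and both function names are hypothetical; the real rules are in the Normalisation Guidelines, which this sketch does not reproduce.

```python
import re

# Hypothetical mapping from provider-specific type values to the four Europeana types.
TYPE_MAP = {
    "photograph": "Image",
    "painting": "Image",
    "manuscript": "Text",
    "book": "Text",
    "recording": "Sound",
    "film": "Video",
}


def extract_year(dc_date):
    """Return the first standalone four-digit run in a dc:date string, or None."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", dc_date)
    return match.group(1) if match else None


def normalise_type(source_type):
    """Map a provider type to a Europeana type, or None if it is not in the table."""
    return TYPE_MAP.get(source_type.strip().lower())


print(extract_year("1889-05-06"))
print(normalise_type(" Photograph "))
```

A date such as "18th century" yields no year, which matches the "generated where possible" wording: the element is simply left out when nothing can be derived.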
Mapping and Normalisation
Three key reference documents for providers:
• ESE Specification V3.2
• Normalisation Guidelines V1.2
• ESE V3.2 XML schema + explanatory text
All available from the “Provide Content” section of the
Europeana Group pages:
http://group.europeana.eu/web/guest/provide_content
Additional elements for Rhine
Currently under development:
• europeana:dataProvider
• The name of the content provider
• europeana:rights
• Indication of a licence type that covers the digital content and the thumbnail
• Waiting for all the interdependencies to be worked through
Content Ingestion
• Europeana has provided a Content Checker tool which has
two parts:
• The Content Ingestor
• Allows uploading of a data set
• Validation against the ESE V3.2 XML schema
• Importing the data into the database
• Indexing of data
• Caching of thumbnails
• The Test Portal
• Separate from the operational portal
• Allows provider to search for uploaded data
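The order of the ingestion stages (validate, import, index, then search in a test portal) can be illustrated with in-memory stand-ins. Everything here is an assumption for illustration: the function names, the dict used as a "database", the word index built from `dc:title`, and the caller-supplied `validate` function standing in for XML schema validation. Thumbnail caching is left out.

```python
def ingest(records, validate):
    """Run the stages in order; records failing validation are never imported."""
    database = {}   # stand-in for the database: record id -> record
    index = {}      # stand-in for the search index: word -> set of record ids
    rejected = []
    for rec_id, record in records.items():
        if not validate(record):          # stage 1: validation
            rejected.append(rec_id)
            continue
        database[rec_id] = record         # stage 2: import
        for word in record.get("dc:title", "").lower().split():
            index.setdefault(word, set()).add(rec_id)   # stage 3: indexing
    return database, index, rejected


def search(index, word):
    """Test-portal style lookup: ids of records whose title contains the word."""
    return sorted(index.get(word.lower(), set()))


uploaded = {"a": {"dc:title": "Mona Lisa"}, "b": {}}   # placeholder data set
database, index, rejected = ingest(uploaded, lambda r: "dc:title" in r)
print(search(index, "mona"), rejected)
```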
Content Ingestor
Select “new data set” - the ingestor automatically
creates a new ID – “null05” in this example
Content Ingestor - upload
Content Ingestor - validate
Index
Test Portal - search
Looking forward
• Rhine release – July to September 2010
• Some ESE-related changes
• Addition of europeana:dataProvider
• Addition of europeana:rights
• Danube release – April 2011
• Incremental move to Europeana Data Model
• Features will depend on outcome of current prototyping work
Planning: from April 2010 till December 2010
Danube requirements specification
• Explore
• New ways of searching and browsing content: e.g. map searches, virtual
exhibitions, improved timeline, extended facets and multi-lingual support.
• Re-Use
• Search API, Linked Data
• Interact
• Tagging, more social media features, user generated content coming in
from partners
• Under the hood
• New richer data model (EDM), Metadata Service Registry, (External)
Service Registry, Resolution Discovery Service (PIDs)
• Experiment
• The ThoughtLab will showcase new services developed by our partners.
Rationale of EDM
• Precursor: ESE (Europeana Semantic Elements)
• used in 2008 version of Europeana
• represents lowest common denominator for object metadata
• converts datasets to a Dublin Core-like standard
• forces interoperability
• major drawback: original metadata is lost
• EDM goals
• preserve original data while still allowing for interoperability
• Semantic Web representation
• Semantic linking between objects
EDM requirements
1. Distinction between the real world object (painting, book,
program) and its digital representation
2. Distinction between the object and the metadata record
describing the object.
3. Allow multiple records for same object, containing
potentially contradictory statements about an object
4. Support for objects that are composed of other objects
5. Standard metadata format that can be specialized
6. Standard vocabulary format that can be specialized
7. EDM should be based on existing standards
• “not yet another standard”!
EDM basics
• OAI ORE for organization of metadata about an object
• Requirements 1-4
• Dublin Core for metadata representation
• Requirement 5
• SKOS for vocabulary representation
• Requirement 6
• OAI ORE, Dublin Core and SKOS together fulfil
Requirement 7
The General Picture
Semantic Network
Networked object representations
The Class Taxonomy (from V5.0+)
Proxy
The Property Taxonomy (without ESE)
Example 1 – from the Direction des Musées de France
Example 2 – from the Louvre
Aggregation organizes data of a single provider: example 1
[Figure labels: aggregation, digital representation, object, provenance, metadata]
Proxy: metadata record for an object
[Figure labels: proxy, object, metadata]
Multiple providers = multiple aggregations (the same object)
[Figure labels: aggregation of DMF, aggregation of Louvre]
Europeana is “just” a special provider
with processed/enriched metadata
[Figure labels: Europeana aggregation, enriched metadata, landing page]
Read about it
• EDM Primer
http://www.few.vu.nl/~aisaac/edm/EDM_Primer_100401.pdf
Thank you!
Advanced modeling in EDM
• See the documentation
• Relations between “provided” objects
• Part-whole links for complex objects
• Derivation and versioning relations
• Predefined classes for person, place, time and event
Planning: from April 2010 till December 2010
Priorities for Danube
• Improved Access
• Contextualization
• Content reuse
• (User) participation
• Data Enrichment
• Ingestion Infrastructure
• Repository Infrastructure
• PR & Projects Activity
• Experimentation
EDM representation: RDF standard
• Ovals are web resources with a URL
• Arcs are properties linking resources to other resources or to literals
• Resources belong to classes
• RDF model can be specialized using subclass and subproperty definitions
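As a rough illustration of the points above, RDF statements can be treated as (subject, property, object) triples, and a subproperty definition lets a query on a general property also find values stated with a more specific one. The sketch uses plain Python tuples rather than an RDF library; the `ex:` resources and the function names are hypothetical, while the two `rdfs:subPropertyOf` statements reflect how DCMI Metadata Terms defines `dcterms:spatial` and `dcterms:temporal`.

```python
SUBPROPERTY = "rdfs:subPropertyOf"

# A tiny graph: each entry is (subject, property, object).
triples = {
    ("dcterms:spatial", SUBPROPERTY, "dcterms:coverage"),
    ("dcterms:temporal", SUBPROPERTY, "dcterms:coverage"),
    ("ex:painting1", "dcterms:spatial", "ex:Paris"),        # hypothetical resources
    ("ex:painting1", "dcterms:title", "Example title"),     # literal value
}


def super_properties(prop, graph):
    """All properties that `prop` specializes, directly or transitively."""
    found, todo = set(), [prop]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == SUBPROPERTY and o not in found:
                found.add(o)
                todo.append(o)
    return found


def values(subject, prop, graph):
    """Objects of `prop` for `subject`, including those stated via subproperties."""
    return {
        o for s, p, o in graph
        if s == subject and (p == prop or prop in super_properties(p, graph))
    }


print(values("ex:painting1", "dcterms:coverage", triples))
```

Asking for `dcterms:coverage` returns the value that was recorded as `dcterms:spatial`, which is the sense in which a specialized model stays interoperable with the general one.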
Dublin Core
• EDM uses the latest version of DCMI Metadata Terms
http://dublincore.org/documents/dcmi-terms/
• Specified with an RDF model
• Specialization of the 15 original DC elements
• e.g. dcterms:spatial and dcterms:temporal specialize dcterms:coverage
• Can be specialized itself
• see requirement
SKOS: vocabulary publication on the Web
• W3C standard
http://www.w3.org/TR/skos-primer/
• Adopted by large institutions such as Library of Congress
• Specified with an RDF model
• Can be specialized itself
OAI ORE: Open Archives Initiative Object Reuse & Exchange
• Specification:
http://www.openarchives.org/ore/1.0/toc.html
• Specified with an RDF model
• Four key notions (RDF classes)
• Object: the book/painting/program being described
• Aggregation: organizes object information from a particular provider (museum, archive, library)
• Digital representation: some digital form of the object with a Web address
• Proxy: the metadata record for the object
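The four notions can be sketched as simple data structures to show how two providers may describe the same object with differing statements. This is a simplified illustration, not the ORE or EDM class definitions: the class layout, the helper `statements_about`, the URIs and the provider names and values are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Proxy:
    """The metadata record for the object, as stated by one provider."""
    metadata: dict


@dataclass
class Aggregation:
    """Organizes one provider's information about an object."""
    provider: str
    object_uri: str                 # identifies the real-world object described
    digital_representations: list   # web addresses of digital forms of the object
    proxy: Proxy


def statements_about(object_uri, aggregations, prop):
    """Collect each provider's value for `prop`; the values may contradict each other."""
    return {
        a.provider: a.proxy.metadata[prop]
        for a in aggregations
        if a.object_uri == object_uri and prop in a.proxy.metadata
    }


# Two hypothetical providers describing the same object differently.
aggregations = [
    Aggregation("Provider A", "ex:object1", ["http://example.org/a/1.jpg"],
                Proxy({"dc:creator": "Name as given by A"})),
    Aggregation("Provider B", "ex:object1", ["http://example.org/b/1.jpg"],
                Proxy({"dc:creator": "Name as given by B"})),
]
print(statements_about("ex:object1", aggregations, "dc:creator"))
```

Keeping each provider's statements in its own proxy is what allows both records to coexist without one overwriting the other.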