Europeana: Update on Metadata Mapping and Normalisation, Content Ingestion and Aggregation Activities
Robina Clayphan
Interoperability Manager, EDLF
ECDL Workshop – Harvesting Metadata: Practices and Challenges
September 30 2009
Introduction
• A look at the metadata schema we use and the elements that must be in a standard form
• The whole ingestion process
• Summary of the aspects of and approach to aggregation
Europeana
Europeana brings together and makes available digital content from:
• Four cultural heritage sectors
  • Museums, Archives, Libraries, Audio-visual archives
• Twenty-nine countries
  • EU plus Norway and Switzerland
• Twenty-six languages
• Four types of material
  • Image, sound, video, text
….need for a metadata lingua franca…
ESE V3.2
Europeana Semantic Elements (ESE) V3.2 developed for the prototype
• A Dublin Core-based application profile
  • Cross-domain schema for heterogeneous data
  • Not intended to capture the full semantics of providers' data
• 37 Dublin Core terms – used principally to describe the objects
• 12 Europeana coined terms – used to support portal functionality
  • Needed to have consistent data for the portal to work
The Dublin Core elements
Element | Refinements
Title | Alternative
Creator |
Subject |
Description | TableOfContents
Publisher |
Contributor |
Date | Created; Issued
Type |
Format | Extent; Medium
Identifier |
Source |
Language |
Relation | isVersionOf; hasVersion; isReplacedBy; replaces; isRequiredBy; requires; isPartOf; hasPart; isReferencedBy; references; isFormatOf; hasFormat; conformsTo
Coverage | Spatial; Temporal
Rights |
Provenance |
Europeana elements
Element | Who is responsible | Function
europeana:isShownAt or europeana:isShownBy | Provider must provide at least one of these elements, both if applicable. URL | Links to object
europeana:object | Provider, if appropriate to the data. URL | Source of thumbnail
europeana:provider | Provider must provide this element. Controlled list | Facet
europeana:type | Provider must provide this element. Controlled list | Facet
europeana:unstored | Provider, only if appropriate to your data. Text string | Container element
europeana:country | Europeana | Facet
europeana:hasObject | Europeana | System use
europeana:language | Europeana | Facet
europeana:uri | Europeana | System identifier
europeana:usertag | Europeana | User-provided tags (future)
europeana:year | Europeana | Facet, timeline
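The provider obligations in the element list above can be expressed as a simple check. What follows is a minimal sketch, not Europeana's actual validator: the record is modelled as a plain dictionary keyed by element name, and the function name `missing_mandatory` and the example values are hypothetical.

```python
# Hypothetical sketch: check the provider-supplied Europeana elements
# described above. Modelling a record as a dict that maps element names
# to lists of values is an assumption, not the ESE XML format itself.

MANDATORY = ("europeana:provider", "europeana:type")
LINK_ELEMENTS = ("europeana:isShownAt", "europeana:isShownBy")


def missing_mandatory(record):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for name in MANDATORY:
        if not record.get(name):
            problems.append(f"missing {name}")
    # At least one of the two link elements must be present.
    if not any(record.get(name) for name in LINK_ELEMENTS):
        problems.append("missing europeana:isShownAt or europeana:isShownBy")
    return problems


example = {
    "dc:title": ["Example object"],
    "europeana:provider": ["Example Library"],
    "europeana:type": ["Image"],
    "europeana:isShownAt": ["http://example.org/object/1"],
}
print(missing_mandatory(example))  # → []
```

The elements that Europeana itself supplies (country, language, year and so on) are deliberately left out of the check, since providers are not expected to deliver them.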
Normalised elements
• Language
  • ISO 639-1 standard two-character code
• Country
  • ISO 3166 standard
• Year
  • Four-digit year from the Gregorian calendar (YYYY)
  • Generated where possible from the date supplied in <dc:date>
• Provider
  • Controlled list of names, in the language of the provider
• Type
  • Controlled list (in English) of four types: Text, Image, Sound, Video
  • Mapped from the diverse types used in source data (by provider)
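Two of these normalisations, year and type, lend themselves to a short illustration. This is a sketch under stated assumptions rather than the rules in the Normalisation Guidelines: the regular expression, the function names and the `TYPE_MAP` entries are all hypothetical, and a real mapping depends on each provider's source data.

```python
import re

# Hypothetical provider-side mapping from source type strings to the
# four controlled values (Text, Image, Sound, Video) listed above.
TYPE_MAP = {
    "photograph": "Image",
    "painting": "Image",
    "book": "Text",
    "manuscript": "Text",
    "recording": "Sound",
    "film": "Video",
}

# A run of exactly four digits that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def year_from_date(dc_date):
    """Return the first standalone four-digit run in dc_date, or None."""
    match = YEAR_RE.search(dc_date)
    return match.group(1) if match else None


def normalise_type(source_type):
    """Map a source type string to the controlled list, or None if unknown."""
    return TYPE_MAP.get(source_type.strip().lower())


print(year_from_date("1889-05-12"))    # → 1889
print(year_from_date("18th century"))  # → None
print(normalise_type(" Photograph "))  # → Image
```

Returning `None` when no year can be found mirrors the "generated where possible" wording: a value is only produced when the supplied date actually contains one.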
Mapping and Normalisation
Three key reference documents for providers:
•ESE Specification V3.2
•Normalisation Guidelines V1.2
•ESE V3.2 XML schema + explanatory text
All available from the “Provide Content” section of the Europeana Group pages:
http://group.europeana.eu/web/guest/provide_content
Content Ingestion
• Europeana has provided a Content Checker tool which has two parts:
• The Content Ingestor
  • Allows uploading of a data set
  • Validation against the ESE V3.2 XML schema
  • Importing the data into the database
  • Indexing of data
  • Caching of thumbnails
• The Test Portal
  • Separate from the operational portal
  • Allows provider to search for uploaded data
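The order of the ingestor stages can be sketched in a few lines. This is an illustration only and not the Content Checker's code: the un-namespaced tags `record`, `title` and `object` are a hypothetical simplification of ESE, the "database" is an in-memory list, and validation is reduced to an XML well-formedness check because the Python standard library cannot validate against an XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified data set; real ESE records use namespaced elements.
SAMPLE = """<metadata>
  <record><title>Blue vase</title><object>http://example.org/thumb/1.jpg</object></record>
  <record><title>Blue map</title></record>
</metadata>"""


def ingest(xml_text):
    """Run simplified stand-ins for the ingestor stages on one data set."""
    # 1. Validation: well-formedness only (raises ParseError if malformed).
    root = ET.fromstring(xml_text)

    # 2. Import: store each <record> as a dict of tag -> text.
    database = []
    for record in root.iter("record"):
        database.append({child.tag: (child.text or "").strip() for child in record})

    # 3. Indexing: map each lower-cased title word to record positions.
    index = {}
    for position, fields in enumerate(database):
        for word in fields.get("title", "").lower().split():
            index.setdefault(word, []).append(position)

    # 4. Thumbnail caching: collect the URLs that would be fetched.
    thumbnails = [fields["object"] for fields in database if fields.get("object")]
    return database, index, thumbnails


database, index, thumbnails = ingest(SAMPLE)
print(index["blue"])  # → [0, 1]
print(thumbnails)     # → ['http://example.org/thumb/1.jpg']
```

The index built in step 3 stands in for what the Test Portal would search, which is why indexing follows import in the list above.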
Content Ingestor
Select “new data set” - the ingestor automatically creates a new ID – “null05” in this example
Aggregation and the Content Strategy
This part looks at various aspects of aggregation in Europeana: the need for it and the approach to it.
Aggregation - terminology
• A Content Provider
  • an organization that provides metadata that enables access to its digital objects
• An Aggregator
  • collects metadata from a group of content providers
  • transmits them to Europeana
  • helps content providers with guidance on conformance with Europeana norms
  • transforms metadata if necessary
  • supports the content providers with administration, operations and training
Roles and benefits
• Content providers
  • Know their content and data best, so make fewer mapping errors
  • Can look at the results before they are ingested into the operational system
• Aggregators
  • Know the needs of the providers (domain, level)
  • Play a bridging role between providers and Europeana: single point of contact, conduit for information in both directions
• Europeana
  • Supporting role for consultation, co-ordination, standardisation
  • Management of the 10 million objects
  • Offers the cross-domain and multi-lingual service
Organisational Model
[Diagram: three tiers, with Europeana at the top, Aggregators in the middle and Institutes at the bottom]
Types of aggregator
Matrix of aggregators:
• cross-domain, single domain, thematic
• level of operation – regional, national, European, global
Domain / Geographic coverage | Regional | National | European | Worldwide
Cross-domain (horizontal) | Thuis in Brabant | CulturaItalia | Europeana |
Single-domain (vertical) | MovE (museums in East Flanders) | Direcção-Geral de Arquivos (Portuguese archives) | Dismarc (music); TEL (books); EFG (movies) | World Digital library; WorldCat
Thematic, cross-domain | | | Judaica | ArXiv.org
Thematic, single-domain | | Great War Archive | |
Why aggregation?
• November 2008 – 5 million items in Europeana
• July 2009 - content from over 1000 providers
• July 2010 – target of 10 million items
• Many individual organisations asking to contribute
• Currently there are six projects that aggregate content for Europeana (amongst other objectives)
• another three projects starting later this year
• Europeana Group site at: http://group.europeana.eu/web/guest/home
Why aggregation?
• Labour-intensive administration and ingestion processes
  • Not due to the amount of data, but to the number of organisations
• Aggregation provides economies of scale allowing Europeana Office to remain relatively small
Promoting aggregation and providing services and expertise to aggregators will be key to Europeana’s Content Strategy
• Europeana is a small organisation!
Aggregation activities
• Aggregators survey
  • Establish shared issues and need for support
• Formation of Aggregators group
  • Council of Content Providers and Aggregators is now part of Europeana Governance structure
• Training for aggregators
  • Generic and bespoke training days as the need arises
• Identifying potential aggregators
• “EuropeanaLabs” for Aggregators
• Test environment for content delivery and/or software development
Aggregation activities
• Handbook for aggregators. Content to be decided as part of survey but likely to cover:
  • Europeana source code, APIs, content checker etc.
  • Technical documentation for participating in Europeana
  • Templates and documentation for budget planning, fundraising, revenue generation, sustainability
  • Templates and documentation for administrative and organisational aspects of running an aggregator
  • Templates and documentation on IPR and European Licensing framework
  • Documentation for establishing political and network support
  • Templates and documentation for dissemination activities
  • Wiki for aggregator issues