Torsten Schaßan – Encodage de documents – Caen – 2012, March 30

Cataloguing manuscripts in an international context

Experiences from the Europeana Regia project

Europeana Regia: partners

• „A digital cooperative library of royal manuscripts in Medieval and Renaissance Europe“
• Co-funded by the European Commission (ICT-PSP, 50%)
• Jan 2010 - June 2012
• 5 partners
  – BnF Bibliothèque nationale de France (and many municipal libraries)
  – BSB Bayerische Staatsbibliothek, Munich
  – BHUV Biblioteca historica, Universitat de Valencia
  – HAB Herzog August Bibliothek Wolfenbüttel
  – KBR Bibliothèque royale de Belgique, Brussels

ER: collections + formats

• 3 collections
  – Bibliotheca Carolina (8th/9th cent., 425 mss)
  – The Library of King Charles V of France (14th cent., 167 mss)
  – The library of the Aragonese Kings of Naples (15th cent., 282 mss)
• Information in 6 languages
  – Catalan, Dutch, English, French, German, Spanish
• Descriptions in 5 formats
  – MARC, EAD, TEI, MAB, MXML (= format of the German national manuscript database Manuscripta Mediaevalia)

Workpackages

1. Management
2. Specification of metadata
3. Integration of metadata
4. Digitisation
5. Integration of images
6. Dissemination

WP leaders: BnF 1+6, HAB 2+3, BSB 4+5

WP2 Specification of metadata

• This work package aims at consolidating the list and format of metadata to be used by each participant.
  – formats
  – level of metadata
  – quality of metadata
  – organise ingestion format

D2.1 State of the art in metadata

• global survey of cataloguing and metadata standards
• obtain a matched OAI extraction despite the different formats used in libraries (e.g. EAD and TEI)
  – BnF: EAD
  – KBR: local DB, had to decide upon the format (= TEI)
  – BSB: MAB → METS/MODS, MXML → TEI
  – HAB: TEI
  – BHUV: MARC-XML
  – Lyons: special data, with mapping to MODS
• quality and amount of metadata varied heavily

D2.2 Minimum metadata

• Which information about a manuscript is necessary as a very short overview and sufficient for basic orientation?
  – the manuscript identifiers (current and former shelf marks, hosting institution or possessors)
  – a summary title
  – basic historical information (date of origin, place of origin, previous owners)
  – basic material information (material of the support, number of leaves, size of leaves)
  – introductory bibliographical information
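The minimum level listed above could be modelled as a small record type. The following Python sketch is only an illustration: the class name `MinimumRecord`, its field names, and the helper `missing()` are assumptions and are not taken from the project's deliverables.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MinimumRecord:
    """Hypothetical container for the minimum metadata of one manuscript."""
    shelfmark: str                       # current shelf mark
    institution: str                     # hosting institution
    summary_title: str
    former_shelfmarks: List[str] = field(default_factory=list)
    origin_date: Optional[str] = None    # date of origin, kept as free text
    origin_place: Optional[str] = None   # place of origin
    previous_owners: List[str] = field(default_factory=list)
    support: Optional[str] = None        # material of the support
    leaves: Optional[int] = None         # number of leaves
    leaf_size: Optional[str] = None      # size of leaves, kept as free text
    bibliography: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Return the names of all fields that are still empty."""
        return [name for name, value in vars(self).items()
                if value in (None, [], "")]
```

A cataloguer could call `missing()` on a draft record to see which parts of the minimum level still need to be supplied before export.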

D2.2 Minimum metadata

• Europeana's view (according to ESE v3.2):
  – Obligatory: europeana:rights
  – Strongly recommended
    • dc:title
    • dcterms:alternative
    • dc:creator
    • dc:date
    • dc:contributor
    • dcterms:created
    • dcterms:issued
  – Consider how your data will perform in response to „who“, „what“, „where“, and „when“ queries
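A record could be checked against this list before delivery. The sketch below is a minimal illustration in Python; it treats a record as a plain dictionary from element name to a list of values, which is an assumption made for the example and not the project's actual data model. The function name `check_ese` is likewise hypothetical.

```python
from typing import Dict, List

# Element names as given on the slide (ESE v3.2 view).
OBLIGATORY = ["europeana:rights"]
RECOMMENDED = [
    "dc:title", "dcterms:alternative", "dc:creator", "dc:date",
    "dc:contributor", "dcterms:created", "dcterms:issued",
]


def check_ese(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report which obligatory and recommended elements have no value."""
    def absent(names: List[str]) -> List[str]:
        return [name for name in names if not record.get(name)]

    return {
        "missing_obligatory": absent(OBLIGATORY),
        "missing_recommended": absent(RECOMMENDED),
    }
```

A record with an empty `missing_obligatory` list would pass the hard requirement; the `missing_recommended` list shows where the answers to "who", "what", "where" and "when" queries may be weak.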

D2.2 „Academic“ metadata

• All information beyond this minimum level is, together with the minimum level, what the project calls "academic metadata"; it will have to be added in a second step.
• The project partners will make sure that very important bits of information (authors' names, standardized titles of the contained texts) are accurately provided.
• References to norm data like VIAF will be included in the descriptions.
• These special metadata will be translated into the relevant languages.

D2.3 Vademecum for librarians

• study of the existing descriptions (printed catalogues, card files, computer files)
• selection of common metadata to be provided by each participant, formalised in a guideline for librarians and academic staff
• description of the format of metadata, in TEI, EAD and MARC

D2.4 Attractive Guidelines

• Guidelines (were intended to) cover the areas
  – Content aggregation, metadata, image processing
  – Now: description of the project, the partners, the collections, minimum metadata, and user's requirements
• They (were intended to) address an external audience
  – Librarians, scholars, professionals
  – Now: interested public
• „Technical description“ (former D2.1 – D2.3) will be published separately

WP3 Integration of metadata

• Integration of the existing (minimum or academic) metadata in each library's system
• Description of the digital object with a table of contents editor or image-related XML file, to update the metadata according to recent research (i.e. date, patron, artist, origin)
• Eventually providing more detailed information in other internet resources
• Ingestion of data in Europeana

WP3 Progress on Metadata

• Librarians will enter the minimum existing data in each library's system, as a first step, in order to make these metadata immediately available for Europeana.
• Metadata will be updated and (if necessary) amended, following a scheme that is agreed among the project partners and allows improved access to the digital copy.
• Full descriptions of the manuscripts will be accessible in specialized databases, e.g. Manuscripta Mediaevalia.
• As far as possible this information should be made available to Europeana.

Mapping data: General rules

• Map as many of the original source elements as possible to the available ESE elements
• If this is not possible, leave it unmapped or consider using <europeana:unstored>
• If possible use one of the more specific <dcterms> refinements
• Consider how best to meet the expectations of the user and the functionality of the system
• Consider how the data would perform in response to “who, what, where and when” queries. This therefore encompasses names, types, places and dates relevant to the object and what it depicts
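These rules can be sketched as a simple lookup with a fallback. In the Python example below, the source field names on the left of `MAPPING` are hypothetical, and the choice of target elements is an assumption made for illustration; it is not the mapping table prepared in the project.

```python
from typing import Dict, List

# Hypothetical source field names mapped to ESE element names.
MAPPING = {
    "author": "dc:creator",
    "title": "dc:title",
    # A more specific dcterms refinement is preferred over plain dc:date.
    "origin_date": "dcterms:created",
    "origin_place": "dcterms:spatial",
}


def map_record(source: Dict[str, str]) -> Dict[str, List[str]]:
    """Map source fields to ESE elements; unknown fields go to europeana:unstored."""
    ese: Dict[str, List[str]] = {}
    for name, value in source.items():
        target = MAPPING.get(name, "europeana:unstored")
        ese.setdefault(target, []).append(value)
    return ese
```

Sending unknown fields to `europeana:unstored` is one of the two options named in the rules; leaving them out entirely would be the other, and would only require skipping fields that are absent from `MAPPING`.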

WP2 vs. WP3

• WP2 = Theoretical foundation (Me)
• WP3 = Practical implementation (Stefanie Gehrke)

Wolfenbütteler Buchspiegel

Local presentation: HAB

Europeana: Search result

Europeana: Detail view

europeanaregia.eu: Homepage

europeanaregia.eu: byRepository - HAB

europeanaregia.eu: ms detail

Decisions: customise + delivery

• All institutions needed to adapt their cataloguing formats
  – Customise the ENRICH-TEI schema, adapt TEI, adapt AMREMM
• Aggregation via TEL (The European Library)
  – obligatory: rights declaration → tei:availability
  – obligatory: thumbnail → tei:pubPlace/tei:ptr
  – needed: project / collection → tei:projectDesc
• Delivery of ESE, preparation for EDM
  – EDM still under construction, but possibility “to represent” manuscript data
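The three TEI locations named for aggregation can be read from a header with a few path lookups. The Python sketch below uses the standard library's `xml.etree.ElementTree`; the sample document, its placeholder values, and the function names are invented for illustration and are not validated against the project's schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample header; all values are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Placeholder title</title></titleStmt>
      <publicationStmt>
        <publisher>Placeholder library</publisher>
        <pubPlace><ptr target="http://example.org/thumb.jpg"/></pubPlace>
        <availability><p>Placeholder rights statement</p></availability>
      </publicationStmt>
      <sourceDesc><p>Placeholder</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc><p>Placeholder collection</p></projectDesc>
    </encodingDesc>
  </teiHeader>
</TEI>"""


def text_of(root, path):
    """Return the concatenated text below the first match, or None."""
    node = root.find(path, TEI_NS)
    return "".join(node.itertext()).strip() if node is not None else None


def extract(tei_xml):
    """Collect rights, thumbnail and collection from a TEI document string."""
    root = ET.fromstring(tei_xml)
    ptr = root.find(".//tei:pubPlace/tei:ptr", TEI_NS)
    return {
        "rights": text_of(root, ".//tei:availability"),
        "thumbnail": ptr.get("target") if ptr is not None else None,
        "collection": text_of(root, ".//tei:projectDesc"),
    }
```

A value of `None` for `rights` or `thumbnail` would flag a record that lacks one of the two obligatory pieces of information.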

Local Decisions: Theory

• One TEI file for the manuscript, referencing the facsimile and resources (descriptions, websites, etc.)
  – Only „minimum“ metadata, ready to export to Europeana
  – Will be updated
  – <msDesc> = metadata, to be stored in <sourceDesc>
• One TEI file for each description
  – As rich a description as it was originally
  – Will stay as it is, as it represents an „historical“ document
  – <msDesc> = data, to be stored in <text>
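The two placements of <msDesc> can be made concrete with two empty skeletons. The Python sketch below builds them with `xml.etree.ElementTree`; the function names are hypothetical, and the intermediate <body> element in the second skeleton is an assumption, since the slide only names <text>.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)  # serialise TEI as the default namespace


def q(name):
    """Return the namespace-qualified tag name for a TEI element."""
    return f"{{{TEI}}}{name}"


def manuscript_file():
    """msDesc as metadata: teiHeader/fileDesc/sourceDesc/msDesc."""
    root = ET.Element(q("TEI"))
    header = ET.SubElement(root, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("msDesc"))
    return root


def description_file():
    """msDesc as data: text/body/msDesc (body is an assumption)."""
    root = ET.Element(q("TEI"))
    ET.SubElement(root, q("teiHeader"))
    text = ET.SubElement(root, q("text"))
    body = ET.SubElement(text, q("body"))
    ET.SubElement(body, q("msDesc"))
    return root
```

Either skeleton can be serialised with `ET.tostring(manuscript_file(), encoding="unicode")` to inspect the nesting; neither is a complete or valid TEI document.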

Mapping: Practice

• TEL hasn't dealt with TEI themselves
  – Crosswalk had to be implemented → we supplied XSLTs
• How to
  – make sure every institution submits the same set of information?
  – make similar information from different formats look alike?
• Refinement of the mapping table prepared in WP2

Decisions: translate + harmonise

• Translation
  – Some of the basic categories can be translated (semi-)automatically (names, dates, etc.)
  – In order to „avoid“ translation, use Latin names and text titles
• Harmonisation
  – Done during transformation (e.g. via OAI-PMH) or during processing by the aggregator
    → break with habits: in TEL the summary title contains the shelfmark
• Normalisation, semantic quality
  – Norm data like VIAF, TGN, etc. shall be applied wherever possible
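The harmonisation of summary titles can be pictured as a small transformation step. The Python sketch below appends the shelfmark to a summary title unless it is already present; the function name and the output pattern "title (shelfmark)" are assumptions for illustration, since the slides do not state the format actually used in TEL.

```python
def harmonised_title(summary_title: str, shelfmark: str) -> str:
    """Return the summary title with the shelfmark appended if it is missing."""
    title = summary_title.strip()
    if shelfmark in title:
        # The shelfmark is already part of the title; leave it unchanged.
        return title
    return f"{title} ({shelfmark})"
```

Running such a step during transformation, and not in each library's catalogue, keeps local titles untouched while the aggregated records look alike.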

Wish-list

• Subject classification
  – In order to allow for browsing through repositories, subject classification would be helpful → special index entries?
• Ontologies
  – Norm vocabulary would be helpful, e.g. for bindings, decoration, musical notation, scripts, etc.
    → cf. Europeana Regia's TEI customisation

Chan(c|g)es

• The project has seen many changes
  – Implement workflows for the first time (digitisation + metadata - KBR; OAI for mss → HAB; norm data → BHUV, KBR)
  – Adaptations (AMREMM – BHUV)
  – Delivery to Europeana → aggregation through TEL
  – Adapt export formats → harmonisation through TEL
    • Adaptation of OAI difficult for BnF/BSB → selection by TEL
  – ESE → EDM
  – Copyright status: RR-F → CC0
  – Organise multilingual access ourselves → europeanaregia.eu

Conclusions I

• Cataloguing in a world of electronic publication and distribution, portals, and the need to exchange data needs to take into account
  – Data formats (local practices)
  – Publication paths (in print, electronic)
  – Mapping paths (generalisation of data types)
  – even: arranging data (position of kinds of data in the character stream)

Conclusions II

• Additionally, some bits of information need to be encoded (more) explicitly
  – e.g. textual language
• Data tends to sit in multiple places
  – Each of them with special views, interests
  – Still: the most up-to-date and complete information will usually be available from local presentations
• Until the realisation of the Semantic Web (and having solved some copyright issues) we might do with slim descriptions in portals and the richness in local (i.e. specialised) presentations.

References

• http://www.europeanaregia.eu
• http://www.europeana.eu
• http://www.hab.de
• http://diglib.hab.de/?db=mss

Glossary

AMREMM: Descriptive Cataloguing of Ancient, Medieval, Renaissance, and Early-modern Manuscripts
EDM: Europeana Data Model
ENRICH: European Networking Resources and Information concerning Cultural Heritage
ESE: Europeana Semantic Elements
MAB: Maschinelles Austauschformat für Bibliotheken
OAI-PMH: Open Archives Initiative, Protocol for Metadata Harvesting
PND (GND): Personennamendatei (→ Gemeinsame Normdatei)
TEL: The European Library
TGN: Getty Thesaurus of Geographic Names
VIAF: Virtual International Authority File
