Enrichment and Structuring of Archival Description Metadata

Post on 25-Feb-2016

DESCRIPTION

Enrichment and Structuring of Archival Description Metadata. Kalliopi Zervanou*, Ioannis Korkontzelos**, Antal van den Bosch* & Sophia Ananiadou**. ** National Centre for Text Mining, The University of Manchester, UK (Ioannis.Korkontzelos@manchester.ac.uk, Sophia.Ananiadou@manchester.ac.uk).

TRANSCRIPT

  • Enrichment and Structuring of Archival Description Metadata

    Kalliopi Zervanou*, Ioannis Korkontzelos**, Antal van den Bosch* & Sophia Ananiadou**

    * Tilburg Centre for Cognition & Communication, The University of Tilburg, NL
      K.Zervanou@uvt.nl, Antal.vdnBosch@uvt.nl

    ** National Centre for Text Mining, The University of Manchester, UK
       Ioannis.Korkontzelos@manchester.ac.uk, Sophia.Ananiadou@manchester.ac.uk

  • Research on Metadata

    Developing standards: collection-specific (e.g. EAD, MARC21) and cross-collection (e.g. Dublin Core)

    Providing mappings: across schemas, and to ontologies (ad hoc or the standard CIDOC-CRM)

    Discarding metadata for IR (Koolen et al., 2007)

    Exploiting metadata for IR (Zhang & Kamps, 2009)

  • The IISH EAD dataset

    EAD: XML standard for encoding archival descriptions

    Challenges: variety of languages used; varying type and amount of information; style (enumerations, lists, incomplete sentences)

  • Motivation & Objectives

    Improved search and retrieval

    Content-based metadata document clustering

    Content-based/semantic search

    Support exploratory search

    Link across collections, metadata formats & institutions

    Create unified metadata knowledge resources

  • Method overview

  • Pre-processing

    EAD/XML element selection & extraction: EAD elements containing free text & archive content information

    Language identification (n-gram method): identifier trained on the Europarl corpus

    Text snippet length: ~20 tokens
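
The element selection and extraction step can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree`. The slides do not list which EAD elements were selected, so the set `FREE_TEXT_TAGS` (`scopecontent`, `bioghist`, `abstract`), the helper names, and the sample record are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed selection of free-text EAD elements; the slides do not name the ones used.
FREE_TEXT_TAGS = {"scopecontent", "bioghist", "abstract"}

# Invented sample record, not taken from the IISH dataset.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example collection</unittitle>
      <abstract>Papers of a trade union.</abstract>
    </did>
    <bioghist><p>Founded in 1905 in Amsterdam.</p></bioghist>
    <scopecontent><p>Minutes, <emph>correspondence</emph> and pamphlets.</p></scopecontent>
  </archdesc>
</ead>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{namespace}scopecontent'."""
    return tag.rsplit("}", 1)[-1]


def extract_free_text(xml_string, wanted=FREE_TEXT_TAGS):
    """Return (element name, whitespace-normalised text) pairs in document order."""
    root = ET.fromstring(xml_string)
    results = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in wanted:
            # itertext() also collects text nested in inline markup such as <emph>.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                results.append((name, text))
    return results


snippets = extract_free_text(SAMPLE_EAD)
for name, text in snippets:
    print(name, "->", text)
```

Collecting text with `itertext()` keeps content nested inside inline markup, which matters for archival descriptions that mix prose with formatting elements.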

  • Snippet length based on language
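
The slides say only that language identification used an "n-gram method" trained on Europarl. As a hedged illustration, the sketch below uses one common variant, rank-ordered character n-gram profiles compared with an "out-of-place" distance. The variant, the function names, the penalty constant, and the one-sentence toy training texts (standing in for Europarl) are all assumptions, not the authors' setup.

```python
from collections import Counter

TOP_K = 300            # number of n-grams kept per profile (assumed value)
MISSING_PENALTY = 300  # distance charged for an n-gram absent from a language profile


def ngram_profile(text, n_max=3, top_k=TOP_K):
    """Map the most frequent character n-grams (n = 1..n_max) to their rank."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between a document profile and a language profile."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else MISSING_PENALTY
        for gram, rank in doc_profile.items()
    )


def identify(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training sentences, invented for illustration; far too small for real use.
TRAINING = {
    "en": "the archive contains letters and minutes of the meetings of the union and its members",
    "nl": "het archief bevat brieven en notulen van de vergaderingen van de vakbond en zijn leden",
    "de": "das archiv enthält briefe und protokolle der sitzungen der gewerkschaft und ihrer mitglieder",
}
profiles = {lang: ngram_profile(text) for lang, text in TRAINING.items()}

print(identify("brieven en notulen van de vakbond", profiles))
```

With profiles built from a large corpus, the same functions can be applied to each extracted snippet of roughly 20 tokens.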

  • Enrichment & Structuring

    Topic detection: automatic term recognition using the C-value method

    Agglomerative hierarchical term clustering: complete, single & average linkage criteria; document co-occurrence & lexical similarity measures
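
For readers unfamiliar with C-value, the sketch below applies the commonly cited form of the measure: a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, and the result is weighted by log2 of the candidate's length in words. This is a simplified illustration, not the authors' implementation: it omits the linguistic filtering and frequency bookkeeping of a full term recogniser, and the candidate list and counts are invented.

```python
import math


def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside the longer tuple."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(freq):
    """Score candidates; `freq` maps word tuples to their total corpus frequency."""
    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidate terms in which `term` is nested.
        nesting = [f_b for other, f_b in freq.items() if contains(other, term)]
        if nesting:
            f = f - sum(nesting) / len(nesting)
        # log2 of the length in words; single-word candidates therefore score 0.
        scores[term] = math.log2(len(term)) * f
    return scores


# Toy candidate frequencies, invented for illustration only.
candidates = {
    ("trade", "union"): 10,
    ("trade", "union", "congress"): 4,
    ("international", "trade", "union"): 2,
    ("labour", "movement"): 5,
}
scores = c_value(candidates)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

Here "trade union" occurs 10 times but is nested in two longer candidates with a mean frequency of 3, so its adjusted frequency is 7.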

  • Term results (automatic evaluation)

  • Results

    C-value best performance: candidates that occur as non-nested at least once

    Average linkage criterion & document co-occurrence: provide broader and richer hierarchies
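
The average-linkage configuration mentioned above can be sketched in plain Python. The slides do not state how document co-occurrence was turned into a similarity, so Jaccard distance over the sets of documents containing each term is an assumption here, as are the function names and the toy term-to-document mapping.

```python
from itertools import combinations


def jaccard_distance(docs_a, docs_b):
    """1 - |A & B| / |A | B| over the sets of documents containing each term."""
    union = docs_a | docs_b
    return 1.0 - len(docs_a & docs_b) / len(union) if union else 1.0


def average_linkage(cluster_a, cluster_b, term_docs):
    """Mean pairwise distance between the terms of two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    total = sum(jaccard_distance(term_docs[a], term_docs[b]) for a, b in pairs)
    return total / len(pairs)


def agglomerate(term_docs):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(term,) for term in sorted(term_docs)]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], term_docs),
        )
        merges.append((a, b, average_linkage(a, b, term_docs)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy mapping from terms to the ids of documents they occur in (invented).
term_docs = {
    "trade union": {1, 2, 3},
    "strike": {2, 3},
    "pamphlet": {4, 5},
    "newspaper": {4, 5, 6},
}
history = agglomerate(term_docs)
for left, right, distance in history:
    print(left, "+", right, "at distance", round(distance, 3))
```

Replacing the mean in `average_linkage` with `max` or `min` over the pairwise distances gives the complete and single linkage criteria, respectively.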

  • Questions? Check out our poster!
