INFuture 2009, Zagreb
Slovenian Biographical Lexicon – From a Digital Edition to an On-Line Application
Jan Jona Javoršek* Tomaž Erjavec* Petra Vide Ogrin**
*Jožef Stefan Institute, Ljubljana, Slovenia
**Slovenian Academy of Sciences and Arts, Library, Ljubljana, Slovenia
Outline
Digitization
Encoding methodology
XML–TEI structure
On-line application
Future plans
Slovenian Biographical Lexicon
Printed version comprises 15 volumes plus an index, published between 1925 and 1991
Includes notable figures important for Slovenian cultural life, from the earliest times to the present day
Contains 5,042 biographical entries covering over 5,100 persons, since some entries describe whole families
Data in the articles are checked against the relevant primary sources
Example page from SBL
Encoding methodology
Use of open standards and software
Use of TEI P5: specific elements for describing biographical and prosopographical data, e.g.:
<birth>, <death>, <date>, <placeName>, <sex>, <faith>, <occupation>, <floruit>
Up-conversion into TEI–XML: OpenOffice – TEI OO package (XSLT stylesheets) → TEI–XML document (basic structure)
Semi-automatic extraction of metadata: Perl, XSLT + manual intervention
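The semi-automatic step above can be illustrated with a minimal sketch. The slides name Perl and XSLT as the tools actually used; Python is swapped in here purely for illustration. The abbreviations `b.` and `d.`, the sample entries, and the helper names `extract_dates` and `to_tei` are all assumptions, not the SBL conventions. The idea is only that a pattern fills in what it can and flags the rest for manual intervention.

```python
import re

# Hypothetical markers for birth and death years; the real SBL entries
# use their own (Slovenian) abbreviations, which are not shown on the slides.
BIRTH = re.compile(r"\bb\.\s*(\d{4})")
DEATH = re.compile(r"\bd\.\s*(\d{4})")

def extract_dates(entry_text):
    """Return the years found and a flag marking the entry for manual review."""
    birth = BIRTH.search(entry_text)
    death = DEATH.search(entry_text)
    found = {
        "birth": birth.group(1) if birth else None,
        "death": death.group(1) if death else None,
    }
    needs_review = None in found.values()
    return found, needs_review

def to_tei(found):
    """Serialize the extracted years as TEI <birth>/<death> elements."""
    parts = []
    for tag in ("birth", "death"):
        if found[tag]:
            parts.append(f'<{tag}><date when="{found[tag]}"/></{tag}>')
    return "".join(parts)

# Invented placeholder entries, not SBL data.
complete, review_1 = extract_dates("Example Person, poet, b. 1800 in Place A, d. 1849 in Place B.")
partial, review_2 = extract_dates("Another Person, painter, b. around the turn of the century.")
tei_fragment = to_tei(complete)
```

Entries for which `needs_review` is true would be passed on for manual checking rather than encoded automatically.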
SBL article structure
<div>
  <listPerson>
    <person n="main">
      <!-- other elements for biographical data: birth, death, occupation … -->
    </person>
    <person n="author">
      <!-- author's name -->
    </person>
  </listPerson>
  <p>
    <!-- the annotated text of the article -->
  </p>
</div>
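A short sketch of how an article with this structure might be read programmatically, using Python's standard `xml.etree.ElementTree`. The sample document, its names and dates, and the function `read_article` are invented placeholders. Real TEI files also declare the TEI namespace, which is left out here to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample article following the structure above; the names,
# date and text are placeholders, not SBL data.
SAMPLE = """
<div>
  <listPerson>
    <person n="main">
      <persName>Example Person</persName>
      <birth><date when="1800-01-01"/></birth>
      <occupation>poet</occupation>
    </person>
    <person n="author">
      <persName>Example Author</persName>
    </person>
  </listPerson>
  <p>The annotated text of the article.</p>
</div>
"""

def read_article(xml_text):
    """Return the main person's data, the author's name and the article text."""
    div = ET.fromstring(xml_text)
    # Index the <person> elements by their n attribute ("main", "author").
    people = {p.get("n"): p for p in div.iter("person")}
    main, author = people["main"], people["author"]
    birth_date = main.find("birth/date")
    return {
        "name": main.findtext("persName"),
        "born": birth_date.get("when") if birth_date is not None else None,
        "occupation": main.findtext("occupation"),
        "author": author.findtext("persName"),
        "text": "".join(div.find("p").itertext()).strip(),
    }

article = read_article(SAMPLE)
```

Separating the structured `<listPerson>` data from the running text in `<p>` is what lets an application index the metadata without having to parse the prose.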
Example of various attribute values for <persName>
@type
= adopted 2
= artistic 21
= incorrect 6
= married 193
= monastic 4
= nickname 37
= operosorum 21
= partisan 96
= pseudo 2350
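A tally like the one above could be produced along these lines. This is a sketch under assumptions: the sample fragment and the function `count_name_types` are invented, and the TEI namespace is omitted for brevity.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical fragment; the names are placeholders, not SBL data.
SAMPLE = """
<listPerson>
  <person n="main">
    <persName>Example Person</persName>
    <persName type="pseudo">E. P.</persName>
    <persName type="married">Example Married-Name</persName>
  </person>
  <person n="main">
    <persName type="pseudo">Another Alias</persName>
  </person>
</listPerson>
"""

def count_name_types(xml_text):
    """Count how often each @type value occurs on <persName> elements."""
    root = ET.fromstring(xml_text)
    return Counter(
        name.get("type") for name in root.iter("persName") if name.get("type")
    )

counts = count_name_types(SAMPLE)
for value, n in counts.most_common():
    print(f"= {value} {n}")
```

Names without a `type` attribute are skipped, so only the typed variants are counted.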
SBL online application
Fedora Commons: extensible framework for storage, management and dissemination of complex objects and object relationships
Repository + a digital library of biographical articles, enabling browsing and searching
Fedora Generic Search – provides a native Fedora Commons interface between an external search system and the Fedora Commons API
Solr: a search system based on the Apache Lucene search and indexing library
OAI-PMH, REST and SOAP protocols
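To show what harvesting over OAI-PMH looks like at the request level, here is a minimal sketch that only builds request URLs and makes no network calls. The verbs `Identify` and `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH standard. The base URL is a hypothetical placeholder, not the SBL endpoint, and `oai_request` is an illustrative helper.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the actual address of the SBL OAI provider is not
# given on the slides.
BASE_URL = "http://example.org/oai"

def oai_request(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{BASE_URL}?{query}"

identify_url = oai_request("Identify")
records_url = oai_request("ListRecords", metadataPrefix="oai_dc")
```

A harvester would fetch `records_url` and read the returned XML, which carries one Dublin Core record per repository object.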
Example entry
Advanced search options
Advanced search
Drop-down menus for occupations – integrated taxonomy
Drop-down menus for place names: search by different categories, e.g. country, district, settlement; multilingual search for some places, e.g. Gradec (Slovenian) – Graz (German)
Search by forename, surname, and by forms of a person's name in different languages
Search by rolename: e.g. bishop, or nobility titles, e.g. count, knight, baron etc.
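The search options above could be turned into a Lucene-style query string roughly as follows. This is a sketch, not the application's code: the field names (`surname`, `roleName`, `settlement`) and both helper functions are assumptions chosen for illustration, and the real index schema is not shown on the slides.

```python
def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def any_of(field, values):
    """Match any of several variants, e.g. a place name in two languages."""
    return "(" + " OR ".join(f"{field}:{quote(v)}" for v in values) + ")"

def build_query(**criteria):
    """Combine the non-empty form fields with AND; match everything if empty."""
    clauses = []
    for field, value in criteria.items():
        if not value:
            continue  # the user left this form field blank
        if isinstance(value, (list, tuple)):
            clauses.append(any_of(field, value))
        else:
            clauses.append(f"{field}:{quote(value)}")
    return " AND ".join(clauses) if clauses else "*:*"

# Hypothetical form input: a surname, a role name, and a bilingual place name.
q = build_query(surname="Novak", roleName="bishop", settlement=["Gradec", "Graz"])
```

Expanding a place into its language variants at query time is one way to get multilingual search. Indexing all variants in a single field would be another.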
Future plans
Expansion and normalization of numerous abbreviations – problem: Slovenian is a highly inflected language
Named Entity Recognition: to enable (semi-)automatic extraction and encoding of person and place names occurring in the full text
Encode other information in the full text: relatives within SBL, person disambiguation, links within SBL and to external sources, e.g. COBISS bibliographical records, Wikisource (online literature publication)
Map place names on an atlas, e.g. Google Maps
Slovenian Biographical Hub – SBL joined by other biographical resources
Welcome to beta:
http://nl.ijs.si/fedora/sbl
Thank you!