FRBRization of European Catalogues: challenges and some solutions. Trond Aalberg, Norwegian University of Science and Technology (NTNU). Workshop on FRBR in The European Library, 9 October 2008, National Library of Portugal, Lisbon, Portugal.


1

FRBRization of European Catalogues: challenges and some solutions

Trond Aalberg

Norwegian University of Science and Technology (NTNU)

Workshop on FRBR in The European Library

9 October 2008, National Library of Portugal – Lisbon - Portugal

2

Overview

• FRBRization?

• FRBR and new requirements for bibliographic information

• Challenges, problems and possibilities

– With some examples

[Figure: FRBR Group 1 entities. A Work is realized through an Expression, an Expression is embodied in a Manifestation, and a Manifestation is exemplified by an Item.]

3

FRBRization

• Catchy term for "the FRBR model applied to existing bibliographic information"

– Converting existing bibliographic information

– Or just interpreting it at run-time

• Different levels of ambition:

– Following the FRBR model or just FRBR-inspired

– User interface only: presenting search results and allowing users to navigate along the axis of FRBR relationships

– Data model that implements (part of) the FRBR model
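
A data model that implements part of FRBR could be sketched as below. This is only an illustration of the Group 1 entities and their relationships, not the model used in the work presented here; the class and attribute names, the `manifestations_of` helper and the `"ger"` language code are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    identifier: str  # hypothetical copy identifier, e.g. a barcode


@dataclass
class Manifestation:
    title_statement: str
    items: List[Item] = field(default_factory=list)  # "is exemplified by"


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)  # "is embodied in"


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)  # "is realized through"


def manifestations_of(work: Work) -> List[Manifestation]:
    """Navigate Work -> Expression -> Manifestation."""
    return [m for e in work.expressions for m in e.manifestations]


# One work, one German expression, one published manifestation.
book = Manifestation("Das Ekel aus Säffle ; Verschlossen und verriegelt")
german = Expression(language="ger", manifestations=[book])
work = Work(title="Det slutna rummet", expressions=[german])
```

A user interface that is "FRBR-inspired" only needs the navigation step (`manifestations_of`); a full conversion also has to populate these objects from existing records.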

4

Cross-catalogue FRBRization

• FRBRization is even more relevant in a broader context:

– reuse of information across catalogues

– as a framework for portals: integrated access to multiple catalogues or cross-domain integration

– novel, explorative user interfaces

• In Europe

– Diversity in language, format and cataloguing practice

5

6

What FRBR really is about

• Emphasis on "content" and the documentation of intellectual/artistic endeavour

– What are the works and expressions in this product?

– Who are the actors and how do they relate to the expressions and works?

– It's like drawing a map...

• More consistently structured bibliographic information

– That can be processed and interpreted – not only searched and displayed

7

Our focus

• Conceptual models are ideal solutions

– A "this is where we want to go" objective

– But how do we get there?

• Existing bibliographic information is a valuable asset

– One of the problems for future implementations of FRBR will be compatibility with already created information

• Identification of entities and relationships

– Experimenting with different rules, algorithms etc.

– Gathering statistics and evaluating the results

– Looking for solutions

8

Our experience so far.....

• Based on FRBRization of different collections

– BIBSYS (Norwegian catalogue, BIBSYSMARC)

– The Slovenian National Bibliography (UNIMARC)

– BTJ (Swedish catalogue, MARC 21)

• Different catalogues, different formats, different practices

– Many catalogue-specific rules are needed

• A certain level of FRBRization is easy to achieve

– For "richer" FRBRization there are a number of common problems related to the poor structuring capabilities of the MARC formats

9

10

Persons and Corporate Bodies

• Persons and Corporate Bodies are usually easy to identify

– Specific fields for these entities

• Duplicate entities are a frequent problem

– Despite the use of authority control

• Relator codes are needed to associate persons and corporate bodies with the correct kind of product entity

• For records with multiple persons and multiple works/expressions it is often difficult to set up the correct relationships
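
The role of relator codes can be sketched as a simple lookup. The codes `aut`, `trl` and `pbl` are MARC relator codes (author, translator, publisher); the mapping to FRBR entity levels and the function name `entity_level_for` are assumptions made for this illustration, not rules taken from the experiments described here.

```python
from typing import Optional

# Assumed mapping from MARC relator code to the FRBR entity the agent relates to.
RELATOR_LEVEL = {
    "aut": "work",           # author: created the work
    "trl": "expression",     # translator: realized an expression
    "pbl": "manifestation",  # publisher: produced a manifestation
}


def entity_level_for(relator_code: Optional[str]) -> Optional[str]:
    """Return the FRBR level for a relator code, or None if it cannot be decided."""
    if relator_code is None:
        return None  # no $4 subfield: the relationship has to be guessed elsewhere
    return RELATOR_LEVEL.get(relator_code.strip().lower())


print(entity_level_for("trl"))
print(entity_level_for(None))
```

When the code is missing, the function returns `None` rather than a default, which mirrors the problem on the slide: without a relator code the record does not say which entity the person belongs to.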

11

12

Works and Expressions

• Works can be identified by titles and associated creators (if applicable)

– Major challenge is to find and select the title, identify multiple works, ...

– Problems related to the identification of persons are "inherited"

• Expressions can be identified by the work they are associated with and additional expression-level information

• Typical problems

– Lack of original title/uniform title when the title statement is inappropriate

– Often inconsistent practice for work titles within and across catalogues
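
Identifying works by title and associated creators can be sketched as building a normalized key. The normalization below (case folding, dropping punctuation, collapsing whitespace) and the choice to group creators by title within one record are assumptions for illustration; real catalogues need catalogue-specific rules.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Case-fold, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def works_from_entries(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (creator, title) pairs from ONE record by normalized title."""
    creators_by_title = defaultdict(set)
    for creator, title in entries:
        creators_by_title[normalize(title)].add(normalize(creator))
    return {title: sorted(names) for title, names in creators_by_title.items()}


# Name/title headings of the kind found in MARC main and added entries.
entries = [
    ("Sjöwall, Maj,", "Den vedervärdige mannen från Säffle."),
    ("Wahlöö, Per,", "Den vedervärdige mannen från Säffle."),
    ("Sjöwall, Maj,", "Det slutna rummet."),
    ("Wahlöö, Per,", "Det slutna rummet."),
]
works = works_from_entries(entries)
```

Grouping by title first avoids treating a co-authored work as two works, one per creator heading. It would be wrong across records, where different works can share a title.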

13

Not always easy...

100 1 $a Sjöwall, Maj, $d 1935-
240 14 $a Den vedervärdige mannen från Säffle. $l Tyska
245 14 $a Das Ekel aus Säffle ; $b Verschlossen und verriegelt : zwei Romane / $c Maj Sjöwall, Per Wahlöö
260 $a Erftstadt : $b Area, $c 2006
300 $a 639 s.
500 $a Den vedervärdige mannen från Säffle / ... in der deutschen übersetzung von Eckerhard Schultz -- Det slutna rummet / ... in der deutschen übersetzung von Hans-Joachim Maass
700 12 $a Sjöwall, Maj, $d 1935-. $t Det slutna rummet. $l Tyska
700 12 $a Wahlöö, Per, $d 1926-1975. $t Det slutna rummet. $l Tyska
700 1 $a Schultz, Eckehard $4 trl
700 1 $a Maass, Hans-Joachim $4 trl
700 12 $a Wahlöö, Per, $d 1926-1975. $t Den vedervärdige mannen från Säffle. $l Tyska
740 4 $a Det slutna rummet
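
To see why this record is hard to interpret, the 700 fields can be parsed into subfields. The sketch below assumes the textual layout shown on the slide (tag, optional indicators, then `$x value` subfields); `parse_field` and `split_added_entries` are hypothetical helpers, not part of any MARC library.

```python
import re
from typing import List, Tuple

# A subfield is "$" + one code character, whitespace, then text up to the next subfield.
SUBFIELD = re.compile(r"\$([a-z0-9])\s+(.*?)(?=\s+\$[a-z0-9]\s|$)")


def parse_field(line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split a textual MARC field into (tag, indicators, [(code, value), ...])."""
    head, _, rest = line.partition("$")
    parts = head.split()
    tag = parts[0]
    indicators = parts[1] if len(parts) > 1 else ""
    return tag, indicators, SUBFIELD.findall("$" + rest)


def split_added_entries(lines: List[str]):
    """Separate 700 name/title entries (works) from 700 entries with relator trl."""
    works, translators = [], []
    for line in lines:
        tag, _, subfields = parse_field(line)
        if tag != "700":
            continue
        sub = dict(subfields)
        if "t" in sub:
            works.append((sub["a"], sub["t"]))
        elif sub.get("4") == "trl":
            translators.append(sub["a"])
    return works, translators


record = [
    "700 12 $a Sjöwall, Maj, $d 1935-. $t Det slutna rummet. $l Tyska",
    "700 12 $a Wahlöö, Per, $d 1926-1975. $t Det slutna rummet. $l Tyska",
    "700 1 $a Schultz, Eckehard $4 trl",
    "700 1 $a Maass, Hans-Joachim $4 trl",
    "700 12 $a Wahlöö, Per, $d 1926-1975. $t Den vedervärdige mannen från Säffle. $l Tyska",
]
works, translators = split_added_entries(record)
```

The parse yields three name/title entries and two translators, but nothing in the 700 fields says which translator produced which German expression. That link exists only as free text in the 500 note.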

14

Manifestations

• Each record describes a single manifestation

– and manifestations can easily be identified by e.g. ISBN and/or title statement etc.

• But there are different solutions used for multivolume publications

– Record linking

– Note fields

– Linking fields

15

Major challenges for FRBRization

• A number of techniques and a complex set of rules must be applied when interpreting records

– Inspecting fields, subfields and even parsing the text in note fields

– Interpreting relator codes

– No single set of rules for all catalogues

– Still struggling with the basic relationships...

• Results must be evaluated and corrected

– Equivalent entities have to be identified

– Erroneously identified entities and relationships have to be removed
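
Identifying equivalent entities can be sketched as grouping headings under a normalized key. The key below (case-folded name and dates with surrounding punctuation removed) is an assumption chosen for illustration; it merges headings that differ only in trailing punctuation and will not catch variant spellings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def person_key(name: str, dates: str = "") -> Tuple[str, str]:
    """Normalize a personal-name heading: case-fold, trim punctuation and spaces."""
    def clean(value: str) -> str:
        return " ".join(value.casefold().strip(" .,;").split())
    return clean(name), clean(dates)


def merge_equivalent(headings: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Group (name, dates) headings that share the same normalized key."""
    groups = defaultdict(list)
    for name, dates in headings:
        groups[person_key(name, dates)].append((name, dates))
    return dict(groups)


# The same two persons, written with and without a trailing full stop.
headings = [
    ("Sjöwall, Maj,", "1935-"),
    ("Sjöwall, Maj,", "1935-."),
    ("Wahlöö, Per,", "1926-1975."),
    ("Wahlöö, Per,", "1926-1975"),
]
groups = merge_equivalent(headings)
```

The four headings collapse into two groups. Removing erroneously identified entities is the harder half of the task and usually needs manual evaluation.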

16

What are the consequences?

• The current (rather simple) interfaces are tolerant of errors and inconsistencies

• The FRBR context adds new requirements to the data

17

The reason why

020 $a 0396070213 : $c $6.95
040 $a DLC $c DLC $d DLC
050 00 $a PZ3.C4637 $b Hh3 $a PR6005.H66
082 00 $a 823/.9/12
100 1 $a Christie, Agatha, $d 1890-1976.
245 10 $a Hercule Poirot's early cases / $c Agatha Christie.
260 $a New York : $b Dodd, Mead, $c [1974]
300 $a 250 p. ; $c 22 cm.
505 0 $a The affair at the victory ball.--The adventure of the Clapham cook. --The cornish mystery.--The adventure of Johnnie Waverly.--The double clue.--The king of clubs. --The Lemesurier inheritance.--The lost mine.--The Plymouth express.--The chocolate box. --The submarine plans.--The third-floor flat.--Double sin.--The market basing mystery. --Wasps' nest.--The veiled lady.--Problem at sea.--How does your garden grow?
650 0 $a Poirot, Hercule (Fictitious character) $x Fiction.
650 0 $a Private investigators $z England $x Fiction.
650 0 $a Detective and mystery stories, English.
984 $a gsl
991 $b c-GenColl $h PZ3.C4637 $i Hh3 $p 00022213155 $t Copy 1 $w BOOKS
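
In this record the individual works are listed only in the 505 contents note, so finding them means parsing free text. A minimal sketch, assuming titles are separated by `--` as in this note; `titles_from_contents_note` is a hypothetical helper, and other catalogues punctuate contents notes differently.

```python
import re
from typing import List

CONTENTS_NOTE = (
    "The affair at the victory ball.--The adventure of the Clapham cook. "
    "--The cornish mystery.--The adventure of Johnnie Waverly.--The double clue."
    "--The king of clubs. --The Lemesurier inheritance.--The lost mine."
    "--The Plymouth express.--The chocolate box. --The submarine plans."
    "--The third-floor flat.--Double sin.--The market basing mystery. "
    "--Wasps' nest.--The veiled lady.--Problem at sea.--How does your garden grow?"
)


def titles_from_contents_note(note: str) -> List[str]:
    """Split a 505 $a note on '--' and drop the trailing full stop of each title."""
    parts = re.split(r"\s*--\s*", note)
    titles = [part.strip().rstrip(".").strip() for part in parts]
    return [title for title in titles if title]


titles = titles_from_contents_note(CONTENTS_NOTE)
print(len(titles))
```

Each extracted title is a candidate work by the author in field 100, but the note carries no uniform titles or identifiers, so matching these candidates against works found in other records remains uncertain.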

18

What quality can we achieve?

• A large number of records have a "simple" FRBR structure

– Single creator, published once...

• The quality of the results for the more complex records is more questionable

– But this is where FRBR is most needed

• Errors and problems that users would never notice become very visible when FRBRizing

19

Concluding remarks

• Is MARC sufficient for FRBR?

– More structured information about expressions and works is possible even in MARC

– Extensive use of relator codes is needed

– Field linking (in MARC 21) could solve many of the problems caused by multiplicity

• Can we automatically improve existing records?

– By implementing more intelligent entity discovery solutions

– Using information from other records/catalogues in the interpretation of a record