OCLC Online Computer Library Center
OCLC and FRBR: directions and research results
Lorcan Dempsey
with contributions from Diane Vizine-Goetz, Ed O’Neill, Thom Hickey and Eric Childress
Revolution or Evolution? The impact of FRBR (Functional Requirements for Bibliographic Records)
Organized by the Australian Committee on Cataloguing.
Melbourne Convention Centre, 2 February 2004
Overview
FRBR and OCLC
OCLC research work
OCLC production plans
Some issues
FRBR and OCLC
Long-standing interest in work-based approaches
– The Humphry Clinker problem
Strong practical interest
– End-user presentation
– ILL
– Cataloging: help find records
– Collection analysis
– Data enrichment
OCLC Research and FRBR
Mining the data …
– Ed O’Neill
Algorithmically FRBRizing
– Thom Hickey
Work-based prototypes
– Diane Vizine-Goetz
– Thom Hickey
Mining the data
Analyzing representations of a single work in detail.
Tested OCLC Research conversion algorithm against 1000 works.
Types of Works
Elemental Works have only a single manifestation (78 %)
Simple Works have only a single expression but multiple manifestations (16 %)
Complex Works have multiple expressions (6 %)
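The three categories above can be expressed as a small decision rule. The following is only a sketch: the function name `classify_work` and its two count parameters are illustrative assumptions, not part of the OCLC study.

```python
def classify_work(expression_count: int, manifestation_count: int) -> str:
    """Classify a work by its expression and manifestation counts.

    Hypothetical helper following the slide's definitions:
      elemental: a single manifestation
      simple:    a single expression, multiple manifestations
      complex:   multiple expressions
    """
    if expression_count < 1 or manifestation_count < expression_count:
        # Every expression needs at least one manifestation to be observed.
        raise ValueError("counts are inconsistent")
    if expression_count > 1:
        return "complex"
    if manifestation_count > 1:
        return "simple"
    return "elemental"


if __name__ == "__main__":
    # An augmented work with 48 expressions and 114 manifestations
    # falls in the complex category.
    print(classify_work(48, 114))
```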
Principal Types of Complex Works
Translations
Augmented
Revised
Collected/Selected
Translations
All translations are expressions
Other types of complex works frequently include translations
Typical Augmented Work
48 Expressions
114 Manifestations
Expressions created by augmentation with: notes, introductions, illustrations, bibliographies, glossaries, etc.
The Expedition of Humphry Clinker
Typical Revised Work
1st and 2nd editions are by John Phillip Immroth
3rd and 4th editions are by Lois Mai Chan and “Immroth’s” was added to the title
Collected Works
A collection of items each of which is a distinct intellectual or artistic creation; a collection of works
50% of ‘collected works’ explicitly list component works.
And …
Expressions not clear.
Bring out the differences that matter.
Retrospective activity constrained by available bibliographic data.
Empirical work will support ongoing clarification of the model (Working group on the expression entity)
Algorithmically ‘FRBRizing’
The OCLC Research work set algorithm
Our Approach
Concentrating on work-level
– Problems with expression-level clusters
Efficient, maintainable, understandable
Useful matches with correct cataloging
– Err on the side of missed matches
– Some accommodation of frequent variants (e.g. Shakespeare’s Hamlet = Hamlet)
Comparison with manual clustering
– Reliable at work level. Expression level not clear enough.
The Algorithm
A key is generated for each record
Extract author, title
– Look up in NACO authority file
– Added entry information as needed
Form a key from bibliographic record
– Author, title, added entry information
– These can be sorted, compared
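The key-building step can be sketched as follows. This is a minimal illustration, not the OCLC Research code: it assumes the author is already in its established (authority file) form, it omits added entries, and the normalization rules (folding diacritics, dropping apostrophes, removing a leading English article) are guesses chosen to reproduce keys of the shape `author\dates/title`. The names `work_key` and `cluster_records` are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

# Assumption: only English leading articles are dropped from titles.
LEADING_ARTICLES = ("the ", "a ", "an ")


def _fold(text: str, keep: str = "") -> str:
    """Lowercase, strip diacritics and apostrophes, turn punctuation into spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_apostrophes = no_marks.lower().replace("'", "").replace("\u2019", "")
    pattern = "[^a-z0-9" + re.escape(keep) + "]+"
    return " ".join(re.sub(pattern, " ", no_apostrophes).split())


def work_key(author: str, dates: str, title: str) -> str:
    """Build a sortable, comparable key from author, dates and title."""
    title_part = _fold(title)
    for article in LEADING_ARTICLES:
        if title_part.startswith(article):
            title_part = title_part[len(article):]
            break
    author_part = _fold(author, keep=",")
    if dates:
        author_part += "\\" + _fold(dates)
    return author_part + "/" + title_part


def cluster_records(records):
    """Group (author, dates, title) tuples into work sets by identical key."""
    clusters = defaultdict(list)
    for author, dates, title in records:
        clusters[work_key(author, dates, title)].append((author, dates, title))
    return dict(clusters)


if __name__ == "__main__":
    print(work_key("Twain, Mark", "1835-1910", "Adventures of Huckleberry Finn"))
```

Because records match only when their keys are identical, this design errs on the side of missed matches rather than false ones.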
Results
Manual estimate: 1.5 manifestations/work in WorldCat
Algorithm: ~1.27
25,000 clusters have >20 records
415,000 clusters have >4 records
30% of records and 50% of holdings are in a cluster
Work-based prototypes
FictionFinder
xISBN
FictionFinder
A prototype system of 2.6+ million bibliographic records for fiction clustered according to the OCLC FRBR work set algorithm
Uses the FRBR model to organize, index, and display bibliographic elements of potential interest to users
Fiction Subset
2,665,662 WorldCat records (fiction indicator)
1,758,479 work clusters
1.5 records/cluster
3,866 clusters have 20 or more records
50,540 clusters have 5 or more records
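As a quick check of the arithmetic, the figures above can be recomputed directly. The variable and function names are illustrative; the counts are the ones listed on this slide.

```python
# Counts taken from the Fiction Subset slide.
RECORDS = 2_665_662
CLUSTERS = 1_758_479
CLUSTERS_WITH_5_OR_MORE = 50_540
CLUSTERS_WITH_20_OR_MORE = 3_866


def records_per_cluster(records: int, clusters: int) -> float:
    """Average number of records in a work cluster."""
    return records / clusters


def share_of_clusters(subset: int, clusters: int) -> float:
    """Fraction of all clusters that fall in the given subset."""
    return subset / clusters


if __name__ == "__main__":
    print(f"records per cluster: {records_per_cluster(RECORDS, CLUSTERS):.2f}")
    print(f"clusters with 5+ records: {share_of_clusters(CLUSTERS_WITH_5_OR_MORE, CLUSTERS):.2%}")
    print(f"clusters with 20+ records: {share_of_clusters(CLUSTERS_WITH_20_OR_MORE, CLUSTERS):.2%}")
```

The average comes out at roughly 1.5 records per cluster, and fewer than 3% of clusters hold five or more records.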
Most widely held fiction works
Holdings  Manifestations  Key
29,043    692             twain, mark\1835 1910/adventures of huckleberry finn
26,088    1,267           carroll, lewis\1832 1898/alices adventures in wonderland
20,843    640             twain, mark\1835 1910/adventures of tom sawyer
19,410    1,341           defoe, daniel\1661 1731/robinson crusoe
18,566    983             cervantes saavedra, miguel de\1547 1616/don quixote
18,492    836             stevenson, robert louis\1850 1894/treasure island
18,123    526             dickens, charles\1812 1870/christmas carol
18,100    278             crane, stephen\1871 1900/red badge of courage
17,761    525             bronte, charlotte\1816 1855/jane eyre
17,499    332             chekhov, anton pavlovich\1860 1904/short stories
FictionFinder & FRBR
Information that applies to all expressions of a given work, such as summaries, genre terms, and subjects, is given precedence in work/expression-level screen displays.
Because of the difficulty of consistently identifying expressions, manifestations are organized by language of expression.
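Organizing manifestations by language amounts to a simple grouping step. The sketch below assumes each manifestation is a dictionary with a `language` field holding a three-letter code; both the field name and the sample records are illustrative, and FictionFinder's actual data model is not described here.

```python
from collections import defaultdict


def group_by_language(manifestations):
    """Group manifestation records by language code, sorted by code.

    Records with no language value are filed under "und" (undetermined).
    """
    groups = defaultdict(list)
    for record in manifestations:
        groups[record.get("language") or "und"].append(record)
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    # Illustrative records only.
    sample = [
        {"title": "Example novel", "year": 1998, "language": "eng"},
        {"title": "Example novel", "year": 1999, "language": "eng"},
        {"title": "Roman exemple", "year": 1999, "language": "fre"},
    ]
    for code, records in group_by_language(sample).items():
        print(code, len(records))
```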
Work display
Work/expression display
FictionFinder & FRBR
Some characteristics of an expression, such as expression title, e.g.,
– Harry Potter and the Philosopher's Stone vs.
– Harry Potter and the Sorcerer’s Stone
are presented at the Work/Expression level.
Other less clear-cut distinctions between expressions & manifestations, such as Braille and electronic book versions, are presented at both the Work/Expression level and the Manifestation level.
Work/expression/manifestation display
xISBN
An experimental web service:
– xISBN server receives a single ISBN and returns a list of all ISBNs for the work cluster
– Designed for machine-to-machine data exchange
– Can return list in XML or XHTML
Supports automatic expansion of ISBN searches:
– Check user ILL requests against all editions/versions in OPAC
– Use xISBN bookmarklet to find local library’s editions when user finds any edition of item on Amazon, etc.
– Quickly check OPAC for all editions/versions during selection/acquisitions/gift book processing
xISBN
work cluster 1: ISBN 1, ISBN 2, ISBN 3
work cluster 2: ISBN 5, ISBN 6, ISBN 7
work cluster 3: ISBN 8, ISBN 9, ISBN 10
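The lookup implied by the cluster diagram can be sketched as an inverted index from each ISBN to its work cluster. This is an assumption about how such a service could be organized, not a description of the xISBN server; the function names are hypothetical and the placeholder labels are the ones used in the diagram.

```python
def build_isbn_index(clusters):
    """Map every ISBN to the id of the work cluster that contains it."""
    index = {}
    for cluster_id, isbns in clusters.items():
        for isbn in isbns:
            index[isbn] = cluster_id
    return index


def expand_isbn(isbn, clusters, index):
    """Return all ISBNs in the same work cluster as the given ISBN.

    An ISBN that is not in any cluster is returned on its own.
    """
    cluster_id = index.get(isbn)
    if cluster_id is None:
        return [isbn]
    return list(clusters[cluster_id])


if __name__ == "__main__":
    clusters = {
        "work cluster 1": ["ISBN 1", "ISBN 2", "ISBN 3"],
        "work cluster 2": ["ISBN 5", "ISBN 6", "ISBN 7"],
        "work cluster 3": ["ISBN 8", "ISBN 9", "ISBN 10"],
    }
    index = build_isbn_index(clusters)
    print(expand_isbn("ISBN 6", clusters, index))
```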
<?xml version="1.0" encoding="UTF-8" ?>
<idlist>
  <isbn>1875847634</isbn>
  <isbn>1860464947</isbn>
  <isbn>1860464955</isbn>
  <isbn>963859313x</isbn>
  <isbn>2221087615</isbn>
  <isbn>9532060065</isbn>
  <isbn>9657120055</isbn>
</idlist>
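A client can read this response with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a copy of the response shown above; fetching it over HTTP is left out, and the function name `parse_idlist` is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# Copy of the sample response, held as bytes so the encoding
# declaration is handled by the parser.
SAMPLE_RESPONSE = (
    b'<?xml version="1.0" encoding="UTF-8" ?>'
    b"<idlist>"
    b"<isbn>1875847634</isbn>"
    b"<isbn>1860464947</isbn>"
    b"<isbn>1860464955</isbn>"
    b"<isbn>963859313x</isbn>"
    b"<isbn>2221087615</isbn>"
    b"<isbn>9532060065</isbn>"
    b"<isbn>9657120055</isbn>"
    b"</idlist>"
)


def parse_idlist(xml_bytes: bytes) -> list:
    """Return the ISBNs listed in an <idlist> document, in document order."""
    root = ET.fromstring(xml_bytes)
    return [element.text.strip() for element in root.iter("isbn") if element.text]


if __name__ == "__main__":
    for isbn in parse_idlist(SAMPLE_RESPONSE):
        print(isbn)
```

The resulting list can then be used to search an OPAC for every edition in the work cluster rather than for a single ISBN.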
OCLC FRBR Work-Set Algorithm http://labs.oclc.org/xisbn/1875847634
Eucalyptus / Murray Bail 1998 Melbourne : Text Pub. ISBN: 1875847634
Eucalyptus 1998 Melbourne : Text Pub.
Eucalyptus 1998 London : Harvill Press
Eucalyptus 1999 London : Panther
Eukaliptusz 1999 Budapest : Ulpius-ház [Hungarian]
Eucalyptus 1999 Paris : R. Laffont [French]
Eukaliptus 1999 Zagreb : Meandar [Croatian]
Ekaliptus 2001 Tel Aviv : Hargol [Hebrew]
xISBN table builder
xISBN server
LibraryLookup bookmarklet
Searching for the book on Amazon
http://www.amazon.co.uk/exec/obidos/ASIN/1860464955/qid=1075134526/sr=1-1/ref=sr_1_10_1/202-6426661-8213436
LibraryLookup: single ISBN
Is the book at my library?
xISBN bookmarklet
http://www.amazon.co.uk/exec/obidos/ASIN/1860464955/qid=1075134526/sr=1-1/ref=sr_1_10_1/202-6426661-8213436
LibraryLookup + xISBN server: multiple ISBNs added
Is the book at my library?
OCLC production plans
FRBR in FirstSearch (end-user searching)
– End 2004 as part of broader searching enhancement.
– Present users with view most relevant to them (work, manifestation, …)
FRBR and cataloging
– Interested in potential for ‘FRBRization’ services
– Use FRBR as aid to finding cataloging copy
– FRBR view of cataloging yet to be discussed.
Some issues
Data. Variations in cataloging practice, and errors or omissions in transcription and input, lead to false clusters.
Systems. Support in library management and other systems.
Agreement and shared practice. Theoretical discussion needs to be informed by practice. The detail!
Communications format. How to share works etc. Different internal implementations.
Further information
www.oclc.org/research
Projects
Publications
ResearchWorks (soon)
Software (algorithm)