The world’s libraries. Connected.
Reintroducing GLIMIR
Plenary Session: WorldCat Local Panel
Music OCLC Users Group Annual Meeting
San Jose, California
2013 February 27
Jay Weitz
Senior Consulting Database Specialist
WorldCat Quality Management Division
OCLC
Reintroducing GLIMIR: Definition and Objectives
GLIMIR = Global LIbrary Manifestation IdentifieR
• To identify records describing the same manifestation: Manifestation Clusters.
• Parallel records: Same resource with same content in same format, but described in different languages of cataloging.
• Create OCLC Manifestation Identifiers (OMI) and index them in WorldCat.
• To identify records describing different manifestations with the same content: Content Clusters.
• Originals, reprints, microform reproductions, digital reproductions.
• Create OCLC Content Identifiers (OCI) and index them in WorldCat.
• To improve FRBR work sets by merging those containing records that GLIMIR assesses to be equal in content.
• Informing FRBR of algorithm improvements.
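The two kinds of cluster described above can be illustrated with a minimal Python sketch. The record fields and both matching keys are assumptions made for illustration only; the actual GLIMIR matching adapts the DDR algorithms and is far more involved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    ocn: int                  # control number of the bibliographic record
    title: str
    publisher: str
    year: int
    fmt: str                  # e.g. "print", "microform", "digital"
    cataloging_language: str  # language of cataloging, e.g. "eng", "ger"


def manifestation_key(r):
    # Hypothetical key: same resource, same format. Language of cataloging
    # is deliberately left out so that parallel records fall together.
    return (r.title.casefold(), r.publisher.casefold(), r.year, r.fmt)


def content_key(r):
    # Hypothetical key: same content regardless of format, publisher, or
    # year, so originals, reprints, and reproductions fall together.
    return r.title.casefold()


def cluster(records, key):
    """Group record numbers by the given key function."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r.ocn)
    return dict(groups)


records = [
    Record(1, "Moby Dick", "Harper", 1851, "print", "eng"),
    Record(2, "Moby Dick", "Harper", 1851, "print", "ger"),      # parallel record
    Record(3, "Moby Dick", "Harper", 1851, "microform", "eng"),  # reproduction
]

manifestation_clusters = cluster(records, manifestation_key)
content_clusters = cluster(records, content_key)
```

In this sketch records 1 and 2 share a manifestation cluster, while all three records share one content cluster.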
Reintroducing GLIMIR: Relation to FRBR and DDR
FRBR algorithm:
• Works in real time.
• Makes author/title key.
• Creates work clusters.
• Assigns the OCLC Work Identifier (OWI).
Duplicate Detection and Resolution (DDR):
• Works as an offline process.
• Launches queries to find candidate duplicates.
• Resolution program determines “retained” record.
• GLIMIR adapts DDR algorithms, creates clusters and identifiers.
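As a rough illustration of the author/title key step, the following Python sketch normalizes both strings and joins them. The normalization rules and the `author_title_key` name are assumptions; the slides do not say how the FRBR algorithm builds its key.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Strip diacritics, lowercase, drop punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_marks.lower())
    return " ".join(cleaned.split())


def author_title_key(author, title):
    # Hypothetical key format: "<author>/<title>" after normalization.
    return f"{normalize(author)}/{normalize(title)}"


def work_clusters(records):
    """records: iterable of (record_number, author, title) tuples."""
    clusters = defaultdict(list)
    for number, author, title in records:
        clusters[author_title_key(author, title)].append(number)
    return dict(clusters)


clusters = work_clusters([
    (1, "Dvořák, Antonín", 'Symphony no. 9, "From the New World"'),
    (2, "DVORAK, ANTONIN", "Symphony No. 9: From the New World"),
])
```

Both sample records produce the same key, so they land in one work cluster. A work without a clear author would yield a weaker key, which is consistent with the scattering the later slides describe.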
Reintroducing GLIMIR: Diagram of Metadata and Identifier Structure
• Identifiers at all levels
• Holdings at all levels
• Metadata summaries at higher levels
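One way to picture "holdings at all levels" is to roll the holdings attached to individual records up to the manifestation (OMI), content (OCI), and work (OWI) levels. The sketch below assumes each record already carries all three identifiers; the identifier values and library symbols are made up.

```python
from collections import defaultdict

# (record number, OWI, OCI, OMI, holding libraries); all values are invented.
rows = [
    (101, "W1", "C1", "M1", {"AAA", "BBB"}),
    (102, "W1", "C1", "M1", {"BBB", "CCC"}),
    (103, "W1", "C1", "M2", {"DDD"}),
    (104, "W1", "C2", "M3", {"AAA"}),
]


def roll_up(rows):
    """Union the holding libraries at each level of the hierarchy."""
    levels = {
        "omi": defaultdict(set),
        "oci": defaultdict(set),
        "owi": defaultdict(set),
    }
    for _number, owi, oci, omi, libraries in rows:
        levels["omi"][omi] |= libraries
        levels["oci"][oci] |= libraries
        levels["owi"][owi] |= libraries
    return levels


levels = roll_up(rows)
```

Using sets rather than counts means a library that holds two records in the same cluster is counted once at the cluster level.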
Reintroducing GLIMIR: Before
WorldCat.org: Before GLIMIR: Multiple Works, Scattered Holdings
• Retrieves and displays one representative record per work set.
• Currently there may be multiple work sets for the same work (particularly for works without clear authors).
• Depending on the search, these records may be scattered in large result sets.
Reintroducing GLIMIR: After
WorldCat.org: After GLIMIR: One Work, Consolidated Holdings
• Consolidated work set (more likely to get a thumbnail image).
• Includes translations.
• Briefer short lists, more complete retrieval.
Reintroducing GLIMIR: Perceived Duplicates
• Perception of duplicate problem in WorldCat has worsened as more non-English language of cataloging records are loaded and parallel records are added.
• Holdings scatter.
• DDR has deleted nearly 13 million records since 1992.
• Perception of duplicates in WorldCat remains.
• GLIMIR OMI should have a bigger impact on perceived duplication.
• Importance of good work groups.
Reintroducing GLIMIR: De-Duplication
GLIMIR complements de-duplication:
• Hides records that are duplicates but cannot be de-duplicated (styles/rules too different, sparse records).
• Surfaces holdings, hides less desired descriptions.
• Gives more accurate count of the numbers of manifestations in WorldCat.
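The effect described above can be sketched as a display step over manifestation clusters: show one record per OMI, hide the rest, and sum the holdings. The rule used here to pick the displayed record (the one with the most holdings) is purely an assumption for illustration; the slides do not state how the preferred description is chosen.

```python
from collections import defaultdict


def summarize(records):
    """records: iterable of (record_number, omi, holdings_count) tuples."""
    clusters = defaultdict(list)
    for number, omi, holdings in records:
        clusters[omi].append((number, holdings))

    display = {}
    for omi, members in clusters.items():
        # Assumed rule: display the record with the most holdings.
        shown, _ = max(members, key=lambda m: m[1])
        display[omi] = {
            "shown": shown,
            "hidden": sorted(n for n, _ in members if n != shown),
            "holdings": sum(h for _, h in members),
        }
    return display


display = summarize([(101, "M1", 40), (102, "M1", 3), (103, "M2", 12)])
manifestation_count = len(display)
```

Three records collapse to two manifestations here, and the holdings on the hidden record still count toward its cluster.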
Reintroducing GLIMIR: De-Duplication
Just as with FRBR, improvements to general matching have been identified:
• Typo tolerance in pagination.
• Improvements to lists of noise titles.
• Improved language and transliteration sensitivity.
• Interpretation of size (e.g. gr8 = octavo = 8o = 22 cm = 8 in.)
• Normalizing titles.
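Two of the matching improvements listed above, size interpretation and typo tolerance in pagination, can be sketched in Python as follows. The equivalence table contains only the forms named on the slide, and the tolerance rule (one wrong digit or one adjacent transposition) is an assumption rather than the rule GLIMIR applies.

```python
# Forms the slide treats as equivalent; anything else passes through unchanged.
SIZE_CLASSES = {
    "octavo": {"gr8", "octavo", "8o", "22 cm", "8 in"},
}


def normalize_size(raw):
    """Map a size statement to a shared class name when one is known."""
    token = " ".join(raw.strip().lower().rstrip(".").split())
    for name, variants in SIZE_CLASSES.items():
        if token in variants:
            return name
    return token


def pages_match(a, b):
    """Assumed rule: equal, one differing digit, or one adjacent swap."""
    sa, sb = str(a), str(b)
    if sa == sb:
        return True
    if len(sa) != len(sb):
        return False
    diffs = [i for i in range(len(sa)) if sa[i] != sb[i]]
    if len(diffs) == 1:
        return True
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and sa[diffs[0]] == sb[diffs[1]]
        and sa[diffs[1]] == sb[diffs[0]]
    )
```

For example, `normalize_size("8 in.")` and `normalize_size("GR8")` both map to the octavo class, and `pages_match(321, 312)` treats the transposed digits as a likely typo.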
Reintroducing GLIMIR: Music and Film
• “Cast list.”
• Dates.
• Scores, Parts, Scores and Parts.
Reintroducing GLIMIR: Cluster Holdings Information Displays on Each Bibliographic Record