Member Forum • 16 December 2016

Data Designed for Discovery

Roy Tennant, Senior Program Officer, OCLC Research


• This is the Research view of linked data

• We (OCLC) have experiments and prototypes, but no products or production services (yet)

• We (OCLC Research) have been working with linked data for as long as anyone in the library world

• Our (OCLC Research) playground is the entirety of WorldCat (380 million records) and a parallel computing cluster

• Stay tuned for more information on production services

A few introductory remarks


WHY LINKED DATA?


What we have to work with


• A collection of text strings…
• Taken from the piece itself…
• Sometimes “enhanced” with inferred parentheticals (e.g., [1975])…
• Or additional statements not on the piece (e.g., subject headings)
• Punctuation, which may or may not be present, is used (inconsistently) for structure
• Mostly uncontrolled and only loosely connected to anything else
• Designed for description rather than discovery

What we have to work with


THE PROBLEM


• Identification Problems (illustrated next):
– The Title Problem
– The Names Problem

• Quality Problems (illustrated next):
– The Legacy Problem (strings are not controlled terms; often, they cannot be turned into them)

• Linkage Problems:
– The Web Problem (records aren’t enough, you need links)
– The Language Problem (showing the right translation for a given user)

Actually, A Number of Problems


Data Quality Problems


THE SOLUTION


First, define ALL

THE THINGS


Quick Definitions

entity /ˈɛntɪti/ noun

a thing with distinct and independent existence.

relationship /rɪˈleɪʃ(ə)nʃɪp/ noun

the way in which two or more people or things are connected


Albert Einstein (Person)

Relativity: The Special and General Theory (Work)

Physics (Concept)

author

about

…establish relationships with other entities


Wikidata and VIAF: https://www.wikidata.org/wiki/Q937 and http://viaf.org/viaf/75121530

WorldCat Works: http://experiment.worldcat.org/entity/work/data/369081611

Library of Congress Subject Headings: http://id.loc.gov/authorities/subjects/sh85101653.html

author

about

…with actionable links from authoritative data hubs
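
The entities and relationships on these two slides can be written down as subject-predicate-object triples. The sketch below is illustrative only: the identifiers are the ones shown on the slide, but the use of Schema.org `author` and `about` as predicate URIs, the direction of the links (from the Work outward), and the helper name `objects_of` are assumptions that the talk does not specify.

```python
# Illustrative sketch: three entities and two relationships as triples.
# Assumption: Schema.org property URIs stand in for "author" and "about".

EINSTEIN = "http://viaf.org/viaf/75121530"  # Person (VIAF)
RELATIVITY = "http://experiment.worldcat.org/entity/work/data/369081611"  # Work
PHYSICS = "http://id.loc.gov/authorities/subjects/sh85101653.html"  # Concept (LCSH)

AUTHOR = "http://schema.org/author"
ABOUT = "http://schema.org/about"

# Each triple is (subject, predicate, object).
triples = [
    (RELATIVITY, AUTHOR, EINSTEIN),
    (RELATIVITY, ABOUT, PHYSICS),
]


def objects_of(subject, predicate, graph):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


if __name__ == "__main__":
    # Print the graph in an N-Triples-like form.
    for s, p, o in triples:
        print(f"<{s}> <{p}> <{o}> .")
```

Because every value is a URI rather than a text string, a program can follow each link to the hub that maintains it instead of guessing what a name or heading refers to.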


From Records to Entities: Works


Diagram layers: OCLC Production Services; External OCLC Research Systems; Internal OCLC Research Resources

Diagram components: enhanced WorldCat, WORKS, Kindred Works, Classify, Identities, FictionFinder, Cookbook Finder, LCSH, FAST, VIAF, GMGPC, GSAFD, GTT, DDC, LCTGM, MeSH

Linked Data Entities


OCLC’s linked data resources

WorldCat Catalog: 15 billion triples

WorldCat Works: 5 billion RDF triples

FAST: 23 million triples

VIAF: 2 billion triples

ISNI: 10-50 million triples


VIAF aggregates identifiers


Wikidata disseminates identifiers


OCLC’S 2015 INTERNATIONAL LINKED DATA SURVEY

SOURCE: KAREN SMITH-YOSHIMURA


Academic library: 31%
National library: 20%
Network: 14%
Government: 10%
Scholarly: 8%
Public Library: 7%
Museum: 4%
Other: 6%

2015 responding institutions by type

71 institutions total


What is published as linked data

Bar chart (axis from 0 to 60; bar values are not recoverable from the extracted text). Categories: Authority files; Bibliographic data; Data about museum objects; Datasets; Descriptive metadata; Digital collections; Encoded archival descriptions; Geographic data; Ontologies/vocabularies; Other


2015 linked data sources most consumed

VIAF (Virtual International Authority File): 41
DBpedia: 36
GeoNames: 35
id.loc.gov: 35
Resources we convert to linked data ourselves: 17
Getty's AAT: 16
FAST (Faceted Application of Subject Terminology): 15
WorldCat.org: 15
data.bnf.fr: 12
Deutsche National Bib Linked Data Service: 12


SOLVING PROBLEMS & MOVING TOWARD A LINKED DATA FUTURE


Improving the Discovery Experience


Exploring Ways to Use Linked Data


Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf:

Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf:

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:

Title: Tây du ký bình khảo
Language: Vietnamese
Translator: Phan Quân
Date: 1980
IsTranslationOf:

Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf:

Title: Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf:

Offering the right translation
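
One way to read this slide: once translations are linked to the original work, a discovery system can choose the edition that matches the user's language. The sketch below is a minimal illustration built from the records shown above. The `pick_translation` function, the dictionary layout, and the two-letter language codes are hypothetical, and this is not a description of how any OCLC system works.

```python
# Illustrative sketch: choose editions by the user's language.
# Assumption: two-letter language codes and this dict layout are made up
# for the example; the bibliographic values come from the slide.

ORIGINAL = {"title": "西遊記", "language": "zh", "author": "吳承恩", "created": "1592"}

TRANSLATIONS = [
    {"title": "Journey to the West", "language": "en",
     "translator": "Anthony C. Yu", "date": "1977"},
    {"title": "Journey to the West", "language": "en",
     "translator": "W. J. F. Jenner", "date": "1982-1984"},
    {"title": "Tây du ký bình khảo", "language": "vi",
     "translator": "Phan Quân", "date": "1980"},
    {"title": "西遊記", "language": "ja",
     "translator": "中野美代子", "date": "1986"},
    {"title": "Pilgerfahrt", "language": "de",
     "translator": "Georgette Boner", "date": "1983"},
]


def pick_translation(user_language, original, translations):
    """Return the records in the user's language, else the original."""
    if original["language"] == user_language:
        return [original]
    matches = [t for t in translations if t["language"] == user_language]
    # Fall back to the original when no translation matches.
    return matches or [original]


if __name__ == "__main__":
    for record in pick_translation("en", ORIGINAL, TRANSLATIONS):
        print(record["title"], "/", record["translator"], "/", record["date"])
```

Note that the Japanese translation and the Chinese original share the same title string, so the language and the translation link, not the title, are what tell them apart.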


Bringing Authority Control to the Web


• Person Lookup Service – An experimental service for looking up OCLC Person Entities

• Scenario:
– A library wants to disambiguate a name
– It sends the name text string to our API
– We check all of our aggregated authority files and send back the best match(es)
– Each response comes with one or more URIs (e.g., to LCNAF, Wikidata, ISNI, etc.)
– The library inserts this data into their record, turning a text string into an actionable link on the web

Prototyping New Services
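
The lookup scenario above can be sketched locally, without calling any real service. Everything in this example is an assumption made for illustration: the in-memory `AUTHORITY_RECORDS` list stands in for aggregated authority files, `lookup_person` is a hypothetical name, and the fuzzy matching uses Python's standard `difflib` rather than whatever the experimental service does.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for aggregated authority data.
AUTHORITY_RECORDS = [
    {
        "name": "Einstein, Albert",
        "uris": [
            "https://www.wikidata.org/wiki/Q937",
            "http://viaf.org/viaf/75121530",
        ],
    },
    {
        "name": "Example, Alice",
        "uris": ["http://example.org/person/alice"],  # placeholder URI
    },
]


def normalize(name):
    """Lowercase a name and drop commas, periods, and extra spaces."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def lookup_person(name, records=AUTHORITY_RECORDS, threshold=0.8):
    """Return records whose name resembles `name`, best match first."""
    query = normalize(name)
    scored = []
    for record in records:
        score = SequenceMatcher(None, query, normalize(record["name"])).ratio()
        if score >= threshold:
            scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


if __name__ == "__main__":
    for match in lookup_person("Einstein, Albert"):
        print(match["name"], match["uris"])
```

This simple comparison does not handle names given in a different order (for example, forename first); a real service would need richer matching. The returned URIs are what the library would store in its record in place of, or alongside, the text string.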


Replicate existing library functions more cheaply and efficiently

Improve data integration

A better user experience

Greater Web visibility

Develop better models of resources not well served by current standards

Improve internal data management

In Summary: Why Linked Data?


EASING THE TRANSITION


• Working with the Library of Congress and others to finalize the BIBFRAME standard

• Beginning to explore what working with it at scale will mean

Collaborating on BIBFRAME


• Modeling bibliographic data using Schema.org
• Collaborating on expanding Schema.org with additional bibliographic elements at bib.schema.org
• Syndicating WorldCat data to search engines using Schema.org markup

Working With the Web
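
As a rough illustration of Schema.org markup for a bibliographic item, the sketch below builds a JSON-LD description and wraps it in the script element that web pages commonly use for embedding it. The `book_jsonld` function and the choice of properties are assumptions made for this example; it does not reproduce the markup WorldCat actually publishes. The identifiers are the ones shown earlier in the talk.

```python
import json


def book_jsonld(name, author_name, author_uri, about_name, about_uri):
    """Build a Schema.org Book description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": name,
        "author": {"@type": "Person", "name": author_name, "sameAs": author_uri},
        "about": {"@type": "Thing", "name": about_name, "sameAs": about_uri},
    }


record = book_jsonld(
    name="Relativity: The Special and General Theory",
    author_name="Albert Einstein",
    author_uri="http://viaf.org/viaf/75121530",
    about_name="Physics",
    about_uri="http://id.loc.gov/authorities/subjects/sh85101653.html",
)

# Wrap the JSON-LD in a script element for embedding in an HTML page.
markup = (
    '<script type="application/ld+json">\n'
    + json.dumps(record, indent=2, ensure_ascii=False)
    + "\n</script>"
)

if __name__ == "__main__":
    print(markup)
```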


Learning About Changing Workflows

Photo by https://www.flickr.com/photos/sanjoselibrary/ - CC BY-SA 2.0


• Use uniform titles
• Use added entries with role codes (7xx and $4)
• Use 041 for translations, including intermediate translations
• Use indicators to refine the meaning

• Use the most specific fields appropriate for a descriptive task
• Minimize the use of 500 fields
• Obey field semantics
• Avoid redundancy

If you must use free text:
• Use established conventions
• Use standardized terms

Least machine-processable

Most machine-processable

Algorithmically recoverable

Making MARC “Linked Data Ready”
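
To make the contrast concrete, the sketch below reads a small MARC-like record held as plain Python data. The record layout, the sample values, and the function names are hypothetical and chosen for illustration; a real workflow would use a MARC parsing library. It assumes the usual MARC 21 meanings: 041 first indicator 1 marks a translation, with $a for the language of the text, $h for the original language, and $k for intermediate translations; $4 carries a relator code such as `trl` (translator).

```python
# A MARC-like record: (tag, indicators, [(subfield code, value), ...]).
# Sample data is made up for this example.
RECORD = [
    ("041", "1 ", [("a", "eng"), ("h", "chi")]),
    ("100", "1 ", [("a", "Wu, Cheng'en")]),
    ("700", "1 ", [("a", "Yu, Anthony C."), ("4", "trl")]),
    ("500", "  ", [("a", "Translated from the Chinese.")]),
]


def subfields(field, code):
    """Return all values of one subfield code within a field."""
    _tag, _indicators, pairs = field
    return [value for c, value in pairs if c == code]


def translation_info(record):
    """Read coded language data from 041 when it flags a translation."""
    for field in record:
        tag, indicators, _pairs = field
        if tag == "041" and indicators[0] == "1":
            return {
                "text": subfields(field, "a"),
                "original": subfields(field, "h"),
                "intermediate": subfields(field, "k"),
            }
    return None


def added_entries_with_roles(record):
    """Pair each name added entry (700, 710, 711) with its $4 role codes."""
    result = []
    for field in record:
        if field[0] in ("700", "710", "711"):
            for name in subfields(field, "a"):
                result.append((name, subfields(field, "4")))
    return result


if __name__ == "__main__":
    print(translation_info(RECORD))
    print(added_entries_with_roles(RECORD))
```

The coded 041 and $4 values can be read directly, while the same facts in the free-text 500 note would have to be recovered by parsing a sentence.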


The Charge: How should URIs be added to MARC records to ease the transition to Linked Data?

Participants
• British Library, German National Library, Library of Congress, National Library of Medicine, OCLC
• University libraries at Cornell, Columbia, George Washington, Harvard, Ohio State, Stanford, University of Washington

Creating Standards for URIs


• We are in a major transition that will take YEARS to navigate

• We don’t know yet exactly what the future holds…

• ...but we know that it will be more linked and machine readable (actionable) than ever before

• And that’s a Good Thing

Summary Remarks


For More Information


Together we make breakthroughs possible.

Thank you!

Roy Tennant

@[email protected]

OCLC Member Forum • 3 Nov 2016

©2016 OCLC. This work is licensed under a Creative Commons Attribution 4.0 International License. Suggested attribution: “This work uses content from “Data Designed for Discovery” © OCLC, used under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/.”