KNOWLEDGE BASES AND RELATED TOOLS:
IMPROVING OPENURL EFFECTIVENESS
Jason Price, PhD, Claremont Colleges/SCELC; KBART Working Group Member
ER&L 2009 Conference, UCLA
Today’s Outline
- OpenURL Overview
  - Measure of success
  - Positives and negatives
- KBART: Reviewing Problems & Seeking Solutions
  - KBART background, goals, membership
  - Main problem areas & solutions:
    - Improve holdings data accuracy
    - Improve application of OpenURL syntax from “sources”
    - Improve knowledge of OpenURL & its importance & issues
- KBART Deliverables
OpenURL Overview
The evolution of the OpenURL in reality:
If links fail, patrons will turn to the tool that always works
Three main problems with OpenURL today: bad data, bad formatting, and lack of knowledge
The Measure of Success
Better access for patrons:
- Fewer false positives: saying it’s available when it’s not
- Fewer false negatives: saying it’s not available when it is
Best-case scenario:
IF a patron is seeking an item, and her library offers access to it through exactly seven online resources,
THEN the OpenURL resolver returns exactly seven accurate links to the full text
AND the ‘best’ resources appear first
http://tinyurl.com/59txop
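The measure of success can be made concrete by comparing the links a resolver offers with the resources that actually provide the full text. A minimal sketch, assuming both sets are already known for a single citation; the resource names and the `score_resolver_menu` helper are hypothetical.

```python
def score_resolver_menu(offered, accessible):
    """Compare resolver links with actual access for one citation.

    offered:    resources the resolver linked to (hypothetical names)
    accessible: resources that really provide the full text
    """
    offered, accessible = set(offered), set(accessible)
    return {
        # linked, but the patron hits a dead end
        "false_positives": sorted(offered - accessible),
        # real access that the resolver never mentioned
        "false_negatives": sorted(accessible - offered),
        # best case: every offered link works and nothing is missing
        "perfect": offered == accessible,
    }


# Hypothetical example: seven real routes to the full text, but the
# resolver offers one dead link and misses two live ones.
accessible = {"Publisher site", "Aggregator A", "Aggregator B", "Aggregator C",
              "Archive D", "Backfile E", "Repository F"}
offered = {"Publisher site", "Aggregator A", "Aggregator B", "Aggregator C",
           "Archive D", "Aggregator Z"}

result = score_resolver_menu(offered, accessible)
print(result["false_positives"])  # ['Aggregator Z']
print(result["false_negatives"])  # ['Backfile E', 'Repository F']
```

This only scores completeness and accuracy; whether the ‘best’ resources appear first would need a separate ranking check.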
Why we do what we do…
The OpenURL resolver window
Transport to the target database…
…containing the full text
Dan in Real Life…
The Positives
- It gets patrons to content they would not otherwise have found
- It’s a great leap forward in library services
- It’s fairly straightforward; it’s not incredibly complicated
The Negatives: it doesn’t get patrons to content as effectively as it should
- Inaccurate data leads to bad and missing links
- Incorrect implementation doesn’t transfer metadata properly
- Lack of knowledge of its importance means:
  - some vendors aren’t using it
  - many others aren’t investing in improved source implementation or more accurate & timely data transfer
But first, a bit of history
OpenURL in Real Life…
KBART: A History
UKSG 2007 research report by James Culling, “Link Resolvers and the Serials Supply Chain” (at http://www.uksg.org/projects/linkfinal)
- Provided ideas on improving usage and accuracy
- Recommended follow-up to address some specifics
NISO partnership to broaden reach and include US audience
KBART: An Introduction
Knowledge Bases And Related Tools
A UKSG and NISO collaborative project
Get better data for everyone:
- Those who provide data (publishers, aggregators)
- Those who process data (link resolvers, ERMs, etc.)
- Those who present data (libraries, consortia)
All for THOSE WHO USE DATA: library patrons
Ensuring timely transfer of accurate data to knowledge bases, ERMs, etc.
Who’s in KBART?
Core working group, chaired by Peter McCracken (Serials Solutions) and Charlie Rapple (TBI Communications; formerly Ingenta):
- Link resolver/ERM suppliers: Ex Libris, Serials Solutions
- Publishers: British Medical Journal Group, Taylor & Francis
- Subscription agents/aggregators: Credo, EBSCO, Swets
- Consortia: California Digital Library, SCELC
- Libraries: Claremont, Cornell, Edinburgh, Leicester, Princeton, Pacific Northwest Technical Lab
Monitoring group:
- More of these, plus other related groups, e.g. NASIG
- Anyone can join the monitoring group; sign up for updates: [email protected]
Problem Overview
[Diagram: knowledge bases linked to six problem areas: date coverage, title relations, licensing, data & transfer, supply chain, and compliance. Issues named in the diagram: accuracy; format; vol/issue vs date; date granularity (day, month, season, year); title changes; title mapping; abbreviations; ISSN/ISBN variations; re-use of ISSN; effect on licensing; genericism/granularity; misrepresentation; package variations; free content; ownership; contacts/feedback mechanisms; incentive; informal structure; unclear responsibilities; duplication of effort; file format; format definitions; shoe-horning; age of data; frequency; link syntax and granularity.]
KBART: Examining the problems
“OpenURL’s Negatives”:
- Inaccurate holdings data leads to bad & missing links
- Incorrect implementation doesn’t transfer metadata properly
- Lack of knowledge means some vendors aren’t using it and the remainder aren’t improving it
Inaccurate Data – The problem
Error level | False (+): including links to inaccessible content | False (-): lacking links to accessible content
Title | Access not activated by publisher | Accessible title not listed in KB/Catalog
Date range | Part of access not activated by publisher, OR years of access over-represented in KB/Catalog | Years of access under-represented in KB/Catalog
Inaccurate Data – Impact
Error level: false (+) vs. false (-)
- Title errors: 290 journal-years of false positives; 1,205 journal-years of false negatives
- Date-range errors: 485 journal-years
Listings of ≈120,000 articles needed correction (based on an estimated average of 6 issues/yr and 10 articles/issue)
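Summing the three journal-year figures and applying the slide’s estimates (6 issues per year, 10 articles per issue) reproduces the ≈120,000 figure. This quick check assumes the totals were combined by simple addition.

```python
# Journal-years of incorrect listings, from the impact slide
title_false_positive = 290
title_false_negative = 1205
date_range_errors = 485

ISSUES_PER_YEAR = 6      # estimated average from the slide
ARTICLES_PER_ISSUE = 10  # estimated average from the slide

journal_years = title_false_positive + title_false_negative + date_range_errors
articles = journal_years * ISSUES_PER_YEAR * ARTICLES_PER_ISSUE
print(journal_years, articles)  # 1980 118800
```

118,800 articles, rounded on the slide to ≈120,000.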
Inaccurate Data – Current responses
REACTIVE: correcting data for individual articles that patrons report as inaccessible
- But what about the (large) majority that go unreported? (esp. the false negatives that prove that “Google has lots of content ‘not available through the library’”)
PROACTIVE: before we get (or don’t get) complaints
- title by title or package by package
- extremely labor intensive
An example
Proactive reconciliation of an ejournal package list
General process (library, consortium or KB vendor):
- (Re-)request updated access list from publisher
- Sample publisher list for accuracy
- Translate publisher list to match KB list
- Number of titles never matches
- Perform ISSN match with MS Access
- Watch for & integrate title changes, mergers, acquisitions and losses
- Watch for publisher re-use of ISSN/title combinations
- Identify date discrepancies manually (inconsistent formats)
- Decide when it’s ‘good enough’ and go live/distribute new list
- Lather, rinse, repeat
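The ISSN-matching step is essentially a join between two lists. A minimal sketch in Python rather than MS Access, assuming each list has already been reduced to (ISSN, start year, end year) rows; the row layout, sample data, and the `reconcile` helper are hypothetical. It normalizes ISSN formatting (hyphens, lowercase x) before matching, since inconsistent formatting is one reason the title counts never agree.

```python
def normalize_issn(raw):
    """Strip spaces/hyphens and upper-case the check character: '0028-0836' -> '00280836'."""
    return raw.replace("-", "").replace(" ", "").upper()


def reconcile(publisher_rows, kb_rows):
    """Compare a publisher access list with a knowledge-base list.

    Each row is (issn, start_year, end_year); end_year None means 'to present'.
    Returns titles missing on either side and titles whose coverage differs.
    """
    pub = {normalize_issn(i): (s, e) for i, s, e in publisher_rows}
    kb = {normalize_issn(i): (s, e) for i, s, e in kb_rows}

    return {
        # publisher says we have it, KB doesn't list it: false negatives
        "missing_from_kb": sorted(pub.keys() - kb.keys()),
        # KB lists it, publisher doesn't: candidate false positives
        "not_in_publisher_list": sorted(kb.keys() - pub.keys()),
        # both list it, but the date ranges disagree
        "date_mismatches": sorted(i for i in pub.keys() & kb.keys() if pub[i] != kb[i]),
    }


# Hypothetical sample data
publisher = [("1234-5679", 1997, None), ("2345-678X", 2000, 2005), ("3456-7890", 2001, None)]
kb = [("12345679", 1997, None), ("2345-678x", 1998, 2005), ("9999-9999", 1990, 1995)]

print(reconcile(publisher, kb))
```

Title changes, mergers, and publisher re-use of ISSNs still need human review; a pure ISSN join cannot tell a re-used ISSN from the original title.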
Correcting Inaccurate Data – the hard way
226 titles = 16%
Inaccurate Data – The KBART Solutions
Standardize transfer of data within and among supply chain participants
Phase I: best practices recommendations specifying:
- means of data transfer
- frequency of updates
- file structure
- data elements, mandatory and optional (e.g. start and end date format & granularity)
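Date format and granularity matter because a knowledge base has to decide whether a given article date falls inside a holdings range. A minimal sketch that parses coverage dates written at year, month, or day granularity into comparable dates; the ISO 8601-style input forms (YYYY, YYYY-MM, YYYY-MM-DD) are an assumption for illustration, not a format prescribed in this talk. Seasons are left out because they have no single agreed mapping to months.

```python
import calendar
from datetime import date


def parse_coverage_date(text, is_end):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a date.

    A coarse start date expands to the earliest day it covers; a coarse end
    date expands to the latest day, so '2005' as an end date means 2005-12-31.
    """
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        year = parts[0]
        return date(year, 12, 31) if is_end else date(year, 1, 1)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, last_day) if is_end else date(year, month, 1)
    if len(parts) == 3:
        return date(*parts)
    raise ValueError(f"unsupported date format: {text!r}")


def covers(start, end, pub_date):
    """True if pub_date falls inside the holdings range (end None = to present)."""
    lo = parse_coverage_date(start, is_end=False)
    hi = parse_coverage_date(end, is_end=True) if end else date.max
    return lo <= pub_date <= hi


print(covers("1997", "2005-06", date(2005, 6, 15)))  # True
print(covers("1997", "2005-06", date(2005, 7, 1)))   # False
```

If the end date were supplied only as "2005", the second lookup would wrongly succeed; that is the false positive a vague granularity produces.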
KBART: progressive data element recommendations
Under consideration (mandatory or optional?):
- Title-level information
- Issue completeness (includes all articles?)
- Article completeness (includes tables & figs?)
- Full text format (HTML vs PDF)
- Embargo period (granular specification)
- Moving wall (à la Nature/Palgrave)
- Genre
- Freely accessible content listed separately
- Ebook fields
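Embargoes and moving walls are where granular specification matters most, because the accessible range shifts every day. A minimal sketch, assuming an embargo is expressed as a count of days, months, or years (e.g. "12M"); that notation and the `latest_accessible_date` helper are hypothetical, not a KBART decision recorded in this talk.

```python
import calendar
from datetime import date, timedelta


def subtract_months(d, months):
    """Move d back by whole months, clamping the day (e.g. Mar 31 -> Feb 28/29)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month = divmod(total, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_accessible_date(embargo, today):
    """Return the newest publication date a moving wall lets patrons reach.

    embargo: hypothetical notation '<n><unit>', unit D (days), M (months) or Y (years).
    """
    amount, unit = int(embargo[:-1]), embargo[-1].upper()
    if unit == "D":
        return today - timedelta(days=amount)
    if unit == "M":
        return subtract_months(today, amount)
    if unit == "Y":
        return subtract_months(today, 12 * amount)
    raise ValueError(f"unknown embargo unit: {embargo!r}")


today = date(2009, 3, 31)
print(latest_accessible_date("12M", today))  # 2008-03-31
print(latest_accessible_date("1M", today))   # 2009-02-28
print(latest_accessible_date("30D", today))  # 2009-03-01
```

A knowledge base that stores only a fixed end year instead of the embargo rule goes stale the day after it is loaded.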
If we build it, & they don’t come…
How do we handle incorrect data?
- Grading? Policing? Shaming?
- Biggest and most difficult problem to solve
Highlight to content providers how important completely accurate data is to their end users:
- Consider the ‘false positive’: arrrgh, that’s frustrating…
- Consider the ‘false negative’: much, much worse; how would you ever know?
[Diagram: the OpenURL chain, from article citation (SOURCE) to query (base URL + metadata string) to link resolver/knowledge base to target (cited) article. Other elements shown: publisher website, database, print collections, gateways, publisher/provider holdings data, repository.]
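In this chain, the “query” is the library’s resolver base URL followed by a metadata string describing the citation. A minimal sketch of a source building an OpenURL 0.1-style link for the book chapter used in the next example, with each title in the key the 0.1 syntax intends (`atitle` for the chapter, `title` for the book). The base URL and the `sid` value are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical resolver base URL; each library has its own.
BASE_URL = "https://resolver.example.edu/openurl"


def build_openurl_01(base_url, citation):
    """Append an OpenURL 0.1-style metadata string to a resolver base URL.

    Empty values are dropped rather than sent as 'issn=&volume=' noise.
    """
    params = {k: v for k, v in citation.items() if v}
    return base_url + "?" + urlencode(params)


chapter = {
    "genre": "bookitem",
    "isbn": "0805805931",
    # the item (chapter) title goes in atitle...
    "atitle": "Cognitive psychology, new test design, and new test theory: An introduction",
    # ...and the containing book's title goes in title
    "title": "Test theory for a new generation of tests",
    "aulast": "Snow",
    "aufirst": "Richard E.",
    "date": "1993",
    "spage": "1",
    "epage": "17",
    "sid": "ExampleVendor:ExampleDB",  # hypothetical vendor:database id
    "issn": "",  # not applicable to a book; dropped by build_openurl_01
}

url = build_openurl_01(BASE_URL, chapter)
print(url)
```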
Incorrect Implementation – the problem
A book chapter citation in a database:
Cognitive psychology, new test design, and new test theory: An introduction. Snow, Richard E.; Lohman, David F.; In: Test theory for a new generation of tests. Frederiksen, Norman; Mislevy, Robert J.; Bejar, Isaac I.; Hillsdale, NJ, England: Lawrence Erlbaum Associates, Inc, 1993. pp. 1-17. [Chapter]
Incorrect Implementation – an example
No self-respecting OpenURL talk…
http://ry6af4uu9w.search.serialssolutions.com/?genre=bookitem&isbn=0805805931&issn=&atitle=Test+theory+for+a+new+generation+of+tests.&volume=&issue=&date=19930101&title=Cognitive+psychology%2c+new+test+design%2c+and+new+test+theory%3a+An+introduction.&aulast=Snow%2c+Richar&spage=1&pages=1-17&sid=XXXX:PsycINFO&pid=%3Cui%3E1992-98936-001%3C/ui%3E&%3Cdate%3E19930101%3C/date%3E&%3Cdb%3EPsycINFO%3C/db%3E
Incorrect Implementations – the Example made worse
http://ry6af4uu9w.search.serialssolutions.com/?genre=article&isbn=0805805931&issn=&atitle=Test+theory+for+a+new+generation+of+tests.&volume=&issue=&date=19930101&title=Cognitive+psychology%2c+new+test+design%2c+and+new+test+theory%3a+An+introduction.&aulast=Snow%2c+Richar&spage=1&pages=1-17&sid=XXXX:PsycINFO&pid=%3Cui%3E1992-98936-001%3C/ui%3E&%3Cdate%3E19930101%3C/date%3E&%3Cdb%3EPsycINFO%3C/db%3E
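Errors like these can be caught mechanically before a link ever reaches a patron, by comparing each outbound OpenURL with the source database’s own record. A minimal sketch of such a check for OpenURL 0.1 links; the rules, the record layout, and the `check_openurl` helper are illustrative assumptions, not a published validator. The sample URL is the `bookitem` link from the slide, with the trailing `pid` parameters omitted.

```python
from urllib.parse import parse_qs, urlsplit


def check_openurl(url, record):
    """Compare an outbound OpenURL 0.1 link with the source's own record.

    record: hypothetical dict with 'type', 'chapter_title' and 'book_title'.
    Returns a list of human-readable problems (empty list = none found).
    """
    # parse_qs drops blank values such as 'issn=' by default
    q = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    problems = []

    if record["type"] == "chapter":
        if q.get("genre") != "bookitem":
            problems.append(f"genre is {q.get('genre')!r}, expected 'bookitem'")
        if q.get("atitle", "").rstrip(".") != record["chapter_title"]:
            problems.append("atitle should carry the chapter title")
        if q.get("title", "").rstrip(".") != record["book_title"]:
            problems.append("title should carry the book title")
    if q.get("genre") == "article" and "isbn" in q:
        problems.append("ISBN sent with genre=article; journal links expect an ISSN")
    if "," in q.get("aulast", ""):
        problems.append("aulast holds more than a surname")
    return problems


slide_url = (
    "http://ry6af4uu9w.search.serialssolutions.com/?genre=bookitem"
    "&isbn=0805805931&issn=&atitle=Test+theory+for+a+new+generation+of+tests."
    "&volume=&issue=&date=19930101"
    "&title=Cognitive+psychology%2c+new+test+design%2c+and+new+test+theory%3a+An+introduction."
    "&aulast=Snow%2c+Richar&spage=1&pages=1-17&sid=XXXX:PsycINFO"
)
record = {
    "type": "chapter",
    "chapter_title": "Cognitive psychology, new test design, and new test theory: An introduction",
    "book_title": "Test theory for a new generation of tests",
}

for problem in check_openurl(slide_url, record):
    print("-", problem)
# The "made worse" variant trips two more rules.
worse = check_openurl(slide_url.replace("genre=bookitem", "genre=article"), record)
```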
Genre: OpenURL 0.1 vs 1.0
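OpenURL 1.0 (ANSI/NISO Z39.88-2004) declares the metadata format explicitly with `rft_val_fmt`, and its book format has separate keys for the book title (`rft.btitle`) and the chapter title (`rft.atitle`), which leaves less room for the title mix-up in the examples above. A minimal sketch translating the 0.1 keys used in these examples into 1.0 key/encoded-value (KEV) keys; the mapping covers only these few keys and is an illustration, not a complete crosswalk.

```python
from urllib.parse import parse_qs, urlencode

# 1.0 keys for the 0.1 keys used in these examples (illustrative subset only).
BOOK_KEYS = {"genre": "rft.genre", "isbn": "rft.isbn", "atitle": "rft.atitle",
             "title": "rft.btitle", "aulast": "rft.aulast", "aufirst": "rft.aufirst",
             "date": "rft.date", "spage": "rft.spage", "epage": "rft.epage",
             "pages": "rft.pages"}
JOURNAL_KEYS = {"genre": "rft.genre", "issn": "rft.issn", "atitle": "rft.atitle",
                "title": "rft.jtitle", "aulast": "rft.aulast", "aufirst": "rft.aufirst",
                "date": "rft.date", "volume": "rft.volume", "issue": "rft.issue",
                "spage": "rft.spage", "epage": "rft.epage", "pages": "rft.pages"}


def to_openurl_10(params_01):
    """Translate OpenURL 0.1 key/value pairs into a Z39.88-2004 KEV query string."""
    # In 1.0 the genre is interpreted within the declared format.
    is_book = params_01.get("genre", "") in ("book", "bookitem")
    keys = BOOK_KEYS if is_book else JOURNAL_KEYS
    fmt = "book" if is_book else "journal"
    kev = {"url_ver": "Z39.88-2004", "ctx_ver": "Z39.88-2004",
           "rft_val_fmt": f"info:ofi/fmt:kev:mtx:{fmt}"}
    if "sid" in params_01:
        # 0.1 'sid=vendor:db' is commonly carried over as a referrer id
        kev["rfr_id"] = "info:sid/" + params_01["sid"]
    for key, value in params_01.items():
        if value and key in keys:
            kev[keys[key]] = value
    return urlencode(kev)


chapter_01 = {"genre": "bookitem", "isbn": "0805805931",
              "atitle": "Cognitive psychology, new test design, and new test theory: An introduction",
              "title": "Test theory for a new generation of tests",
              "aulast": "Snow", "date": "1993", "spage": "1", "epage": "17",
              "sid": "XXXX:PsycINFO"}
query = to_openurl_10(chapter_01)
print(parse_qs(query)["rft.btitle"])  # ['Test theory for a new generation of tests']
```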
Study at Claremont
- 5 results in each of 5 genres from each of 5 databases: journal articles, books, chapters, newspaper articles, [dissertations]
- Measure success rate and cause of each failure
Preliminary analysis shows:
- Journal articles have a significantly lower failure rate
- Source URL formation is a major cause of failure
- Relative consistency within a database/genre combination
Incorrect Implementations – the impact
Solving the Problems: Lack of Knowledge
Some content providers simply aren’t aware of what OpenURL does and why it benefits them
- Education & advocacy
- Follow recommendations of the Culling/SIS report; provide useful information to those content providers:
  - how to implement correctly
  - contacts for those needing assistance
The remainder may not recognize the value of OpenURL and their role in improving its effectiveness
Solving the Problem: Lack of knowledge
Help content providers determine what is working, and what isn’t
- Cornell project to focus on source OpenURLs
  - Identify correct and incorrect implementations
  - Give vendors the opportunity to grade themselves
Offer more & better examples of why OpenURL matters
- Quiet challenge (OK, at least out loud) to the ER community: produce and distribute studies of the effect of OpenURL (& poor implementations) on usage
- There is one underway at Claremont; we need many more
- So many possibilities, so little time (shortsighted? We’re treating the symptoms, not curing the disease)
Summary: KBART Deliverables
- Create a report that provides general guidance on problematic issues:
  - data problems
  - incorrect implementation
  - limited knowledge
- Offer best practices guidelines for how to effectively transfer accurate data among parties
- Provide better understanding of the supply chain
Challenges
Figuring out how to deal with data accuracy questions
Ensuring uptake among smaller or less-committed content providers
Providing ongoing support for new participants
Thanks!
http://www.uksg.org/kbart
http://www.niso.org/workrooms/kbart
Peter McCracken (NISO co-chair), [email protected]
Co-founder & Director for Research, Serials Solutions
Charlie Rapple (UKSG co-chair), [email protected]
Head of Marketing Development, TBI Communications
Jason Price (Working group member), [email protected]
Head of Collections @ Claremont; SCELC ejournal package analyst