Post on 18-Dec-2015
THE SCIENCE THE SEARCH THE SOLUTION www.cabi-publishing.org
DOIs and the Secondary Publisher; a match made in heaven?
Andrea Powell
Product Development Director
CABI Publishing
It is a truth universally acknowledged….
….that a secondary database in possession of millions of bibliographic references, must be in want of a linking solution
(with apologies to Jane Austen)
A bit about CABI Publishing
• First publication in 1912
• Applied life sciences publisher
• Database products at the heart of our publishing business (CAB Abstracts and Global Health)
• Primary journals and books now account for 30% of turnover
• Total turnover approx. £12 million
Some facts and figures
• CAB Abstracts (1973-2004) contains 4.5 million bibliographic references
• Our Archive (1912-1972) adds a further 2.2 million references
• Our acquisitions database lists 9000 active publishers from whom we receive content
• We receive about 7500 serials in any one year, from over 125 countries in over 50 languages
Oh, and not forgetting...
… we also cover books, conference proceedings, technical bulletins, “grey literature”, websites, annual reports, theses…… (approx. 18% of total)
So what do we do?
• Create a consistently indexed, standardised, searchable database to enable the discovery of this rich content
• And then link the user to the full-text as seamlessly as possible
DOIs and CrossRef - a heaven-sent solution?
• Universal, multi-publisher protocol
• Cost-effective, although concerns at the beginning about escalation of look-up fee costs
• Hurray!
Adding DOIs to the database
• Creation of new field within Production Database
• Development and implementation of new workflows to collect DOIs at most appropriate stage of our process
• Matching our serials list against the CrossRef Metadata Database
Looking up DOIs - the early days
• In early 2002, we were able to achieve 4% matching rates (ought to have been 18%)
• Reasons for poor match rate:
- timing of deposits
- poor quality data
- rigid matching algorithm
- mis-match between our records and retrieved metadata
Our DOI look-up and implementation process
• Two methods:
- weekly look-up
- twice-yearly batch look-up
Weekly look-up
• Automated system built into our weekly mechanism for transferring records from our production system to our live database
• Manual option to re-run this stage is also available if necessary
• Records with no DOI value but with an ISSN are selected and extracted into a processing list
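The selection step above can be sketched in a few lines of Python. This is only an illustration: the record layout and the key names `pan`, `issn` and `doi` are hypothetical stand-ins, not the actual fields of the CABI production system.

```python
def select_for_lookup(records):
    """Return the records that have an ISSN but no DOI yet.

    `records` is a list of dicts; the keys "issn" and "doi" are
    hypothetical names chosen for this sketch.
    """
    return [r for r in records if r.get("issn") and not r.get("doi")]


# Hypothetical sample records, for illustration only.
sample = [
    {"pan": "A1", "issn": "1234-5678", "doi": ""},                   # selected
    {"pan": "A2", "issn": "1234-5678", "doi": "10.1000/example.1"},  # already has a DOI
    {"pan": "A3", "issn": "", "doi": ""},                            # no ISSN, skipped
]
processing_list = select_for_lookup(sample)
```

Only the first sample record ends up in the processing list, since it is the only one with an ISSN and no DOI.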
Weekly look-up
• Each field is processed to replace CABI-specific formatting with URL-safe coding
• Single query string constructed from the data from 50 records
• Each new query added to the string, separated from neighbour using URL-safe line feed “%0A”
• Approximately 3800 look-ups per week
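A minimal Python sketch of the encoding and batching described above, assuming each query is already a piped string and that percent-encoding via the standard library is an acceptable stand-in for the "URL-safe coding" step. The function names are invented for this example.

```python
from urllib.parse import quote

BATCH_SIZE = 50  # records per query string, as stated on the slide


def encode_field(value):
    """Percent-encode one field so it is safe inside a URL.

    safe="" means every reserved character, including "|", is encoded,
    so a pipe inside a field cannot be mistaken for a field separator.
    """
    return quote(value, safe="")


def build_qdata(queries):
    """Join already-piped queries with the URL-safe line feed "%0A"."""
    return "%0A".join(queries)


def batches(items, size=BATCH_SIZE):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

At roughly 3800 look-ups per week, batches of 50 work out to 76 query strings.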
Weekly look-up
• We use a piped format: SN|DO|AU|VL|NO|PP|YR||PA* (*PA is our unique identifier)
• Query string sent to CrossRef via web link: "http://doi.crossref.org/servlet/query?&usr=cabi&pwd=crpw1683&type=q&area=live&fuzzy=true&format=piped&qdata=SN|DO|AU|VL|NO|PP|YR||PA|%0ASN|DO|AU|VL|NO|PP|YR||PA|"
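As a rough sketch, the piped line and the query URL could be assembled as follows. The field order and the URL parameters (`usr`, `pwd`, `type`, `area`, `fuzzy`, `format`, `qdata`) are taken from the slide; the sample record, the placeholder credentials and the function names are assumptions made for illustration, and the code only builds the URL without sending it.

```python
from urllib.parse import quote

# Field order from the slide; the empty eighth slot is left blank.
FIELD_ORDER = ["SN", "DO", "AU", "VL", "NO", "PP", "YR", "", "PA"]
BASE_URL = "http://doi.crossref.org/servlet/query"


def piped_line(record):
    """Build one piped query from a dict keyed by CABI field codes."""
    parts = [quote(record.get(code, ""), safe="") if code else ""
             for code in FIELD_ORDER]
    return "|".join(parts) + "|"  # trailing pipe, as in the slide's example


def build_url(usr, pwd, lines):
    """Assemble the look-up URL; queries are separated by "%0A"."""
    qdata = "%0A".join(lines)
    return (f"{BASE_URL}?usr={quote(usr, safe='')}&pwd={quote(pwd, safe='')}"
            f"&type=q&area=live&fuzzy=true&format=piped&qdata={qdata}")


# Hypothetical record and placeholder credentials, for illustration only.
record = {"SN": "1234-5678", "DO": "Example Journal", "AU": "Smith",
          "VL": "12", "NO": "3", "PP": "45", "YR": "2003", "PA": "20033000001"}
url = build_url("USER", "PASSWORD", [piped_line(record)])
```

Keeping credentials out of the code and passing them in as arguments avoids publishing them along with the query template.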
DOI Assignment
• Web feed returned and converted into a text file, which is processed to extract the individual queries
• Each query then processed to recover the PAN (unique ID) and DOI data
• PAN matched back to our database and DOI data embedded in record
• BINGO!
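The assignment step might look something like the Python sketch below. It assumes that each returned line echoes the piped query with the DOI appended as a final field, so that the PAN sits in the ninth position and the DOI in the tenth; the slides do not spell out the response layout, so treat that as an assumption. The in-memory `database` dict is a hypothetical stand-in for the real production database.

```python
def parse_response(text):
    """Map PAN -> DOI for every returned line that carries a DOI.

    Assumed line layout: SN|DO|AU|VL|NO|PP|YR||PA|<doi>
    """
    matches = {}
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 10:
            continue  # malformed or truncated line
        pan, doi = fields[8], fields[9]
        if pan and doi:
            matches[pan] = doi
    return matches


def apply_dois(database, matches):
    """Embed each DOI in the matching record; return how many were updated."""
    updated = 0
    for pan, doi in matches.items():
        if pan in database:
            database[pan]["doi"] = doi
            updated += 1
    return updated


# Hypothetical response text and database, for illustration only.
response = (
    "1234-5678|Example Journal|Smith|12|3|45|2003||20033000001|10.1000/example.1\n"
    "1234-5678|Example Journal|Jones|12|3|60|2003||20033000002|\n"
)
database = {"20033000001": {"doi": ""}, "20033000002": {"doi": ""}}
updated = apply_dois(database, parse_response(response))
```

Unmatched queries come back without a DOI and are simply left untouched, so they are picked up again by a later look-up.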
Twice-yearly batch look-up
• Entirely manual process, using text files and e-mails
• Look-up process much the same, but date range added to selection process
• Piped query strings output in batches of 1000 and prefixed with a CABI e-mail address
• Each file of 1000 queries uploaded via CrossRef website
• Results returned via e-mail and processed to extract PAN and DOI
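Although the batch process is described as manual, the file preparation could be scripted along these lines. The sketch assumes the e-mail address goes on the first line of each file, followed by one piped query per line; the exact file layout expected by the upload form is not given in the slides, and the address used here is a placeholder.

```python
def build_batch_files(queries, email, size=1000):
    """Split piped queries into file bodies of at most `size` queries,
    each prefixed with the e-mail address that should receive the results."""
    files = []
    for start in range(0, len(queries), size):
        chunk = queries[start:start + size]
        files.append("\n".join([email] + chunk) + "\n")
    return files


# Hypothetical queries and placeholder address, for illustration only.
queries = [f"1234-5678|Example Journal|Smith|12|3|{page}|2003||PAN{page}|"
           for page in range(2500)]
files = build_batch_files(queries, "lookup@example.org")
```

With 2500 queries this produces three files: two full batches of 1000 and a final one of 500.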
Looking up DOIs - these days
• Now consistently achieving 25-30% matching rate
• Backfile look-ups are even better, at 40%
• But how frequently should we add DOIs to our backfile - is twice a year enough?
• Not yet querying for Books or Conferences, but plan to soon
Getting DOIs to the customer
• A&I databases are typically delivered via a number of third parties, e.g. Ovid, ISI, EBSCO, Dialog….
• It’s taken until late 2004 for some vendors to implement DOIs in our database
• Not all vendors use DOIs for linkage, preferring their proprietary systems
Other ways of linking to full-text
• 40% is good, but that still leaves a lot of unmatched references!
• User demand is for more and more full-text linkage - “good enough” generation won’t pursue non-linking items
• Customers can tailor their own links with Link Resolvers
• CAB Direct provides a default linking solution for subscribers without a Link Resolver
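One way to picture the linking options above is a simple fallback: use the DOI when a record has one, otherwise hand the citation to the customer's link resolver. The Python sketch below is an assumption-laden illustration and not a description of how CAB Direct works: the record keys, the resolver address and the choice of OpenURL-style parameters (`genre`, `issn`, `volume`, `spage`, `date`) are all chosen for the example.

```python
from urllib.parse import quote, urlencode


def full_text_link(record, resolver_base=None):
    """Return a DOI link if available, else a resolver link, else None."""
    doi = record.get("doi")
    if doi:
        # Resolve through the public DOI proxy.
        return "https://doi.org/" + quote(doi, safe="/")
    if resolver_base:
        # Pass whatever citation data we have to the link resolver.
        citation = {
            "genre": "article",
            "issn": record.get("issn"),
            "volume": record.get("volume"),
            "spage": record.get("first_page"),
            "date": record.get("year"),
        }
        params = {key: value for key, value in citation.items() if value}
        return resolver_base + "?" + urlencode(params)
    return None


# Hypothetical records and resolver address, for illustration only.
with_doi = {"doi": "10.1000/example.1"}
without_doi = {"issn": "1234-5678", "volume": "12",
               "first_page": "45", "year": "2003"}
resolver = "http://resolver.example.org/openurl"
```

A record with neither a DOI nor a configured resolver returns `None`, which corresponds to the non-linking items that users are unlikely to pursue.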
Digital Archives
• Many primary and secondary publishers now digitising their archives
• CAB Abstracts archive adds 2.2 million references, back to 1912
• Full-text linking more difficult with incomplete references, no ISSNs (pre-database era), lack of digital originals
• Issue of timing again writ large!
The bigger picture
• Researchers still use secondary databases heavily in their resource discovery processes
• The amount of material to be indexed increases year by year
• Secondary databases have to keep pace with changes in scholarly communication
• We must put our content where the users are, not the other way round