Post on 04-Jan-2016

OCLC Online Computer Library Center

Interoperability Standards & Searching Multiple Repositories

Ralph LeVan/OCLC

Ray Denenberg/Library of Congress

The Problem

How do I provide a common interface for my users?

How do I combine results from multiple sources?

How do I provide a common interface for my users?

How do I convert my queries into the Content Provider’s (CP’s) queries?

How do I ask for 10 records?

How do I ask for more records?

How do I interpret their response?

How do I convert my queries into the CP’s queries?

My user said "author=twain and title=huck finn"

Google expects: +twain +"huck finn"

Z39.50: twain/1=1003;4=2 "huck finn"/1=4;4=1 and

Lucene: creator:twain and titlePhrase:"huck finn"
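
The translation problem above can be sketched as one small function per target syntax. This is only an illustration: the function names and the `TRANSLATORS` table are hypothetical, and the output strings simply reproduce the three examples on the slide.

```python
def to_google(author: str, title: str) -> str:
    """Required term plus required phrase, as in the Google example."""
    return f'+{author} +"{title}"'


def to_z3950(author: str, title: str) -> str:
    """Attribute notation from the slide: Bib-1 use attribute 1003 is
    author and 4 is title; structure 2 is word and 1 is phrase."""
    return f'{author}/1=1003;4=2 "{title}"/1=4;4=1 and'


def to_lucene(author: str, title: str) -> str:
    """Field names (creator, titlePhrase) are the ones shown on the slide."""
    return f'creator:{author} and titlePhrase:"{title}"'


# Hypothetical dispatch table: one translator per content provider syntax.
TRANSLATORS = {"google": to_google, "z3950": to_z3950, "lucene": to_lucene}


def translate(target: str, author: str, title: str) -> str:
    return TRANSLATORS[target](author, title)
```

Every additional content provider means another entry in the table, which is the maintenance cost the rest of the talk is trying to remove.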

How do I ask for 10 records?

Amazon won’t let you

RedLightGreen: MAXRECORDS=n

British Library: records=n

How do I ask for more records?

Amazon: page=n

RedLightGreen: STARTINDEX=n

British Library: start=n
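
A rough sketch of what a client has to do with these differences: map one internal notion of "start at record N, return M records" onto each provider's parameter names. The parameter names come from the slides; the function name, the provider keys, the assumption that start indexes are 1-based, and the assumption that Amazon's fixed page size equals the requested count are all illustrative guesses.

```python
from urllib.parse import urlencode


def paging_params(provider: str, start: int, count: int) -> dict:
    """Translate a 1-based start index and a page size into CP parameters."""
    if provider == "redlightgreen":
        return {"STARTINDEX": start, "MAXRECORDS": count}
    if provider == "britishlibrary":
        return {"start": start, "records": count}
    if provider == "amazon":
        # Amazon only accepts page=n and fixes the page size itself.
        # ASSUMPTION: its page size equals `count`.
        return {"page": (start - 1) // count + 1}
    raise ValueError(f"unknown provider: {provider}")


# Example: the second page of ten records, as a query-string fragment.
fragment = urlencode(paging_params("britishlibrary", 11, 10))
```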

How do I interpret their response?

How many records did I retrieve?

Did something go wrong?

How do I convert the CP’s records into something my users will recognize?

How many records did I retrieve?

Amazon:

<a href="/gp/search/ref=sr_nr_i_0/002-2019116-8269663?%5Fencoding=UTF8&keywords=pratchett&rh=i%3Aaps%2Ck%3Apratchett%2Ci%3Astripbooks&page=1">Books</a><span class="narrowValue">&nbsp;(334)</span>

RedLightGreen:

<b>Viewing:</b> 1-10 of 239 results

British Library:

<opensearch:totalResults>190</opensearch:totalResults>
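
To make the point concrete, here is a minimal sketch of pulling the hit count out of each of the three fragments above. The regular expressions are written against those exact fragments and are assumptions about the surrounding pages; they would need revisiting whenever a site changes its markup.

```python
import re

# One hand-written pattern per provider, keyed by a hypothetical provider id.
HIT_COUNT_PATTERNS = {
    "amazon": re.compile(r'class="narrowValue">&nbsp;\((\d+)\)'),
    "redlightgreen": re.compile(r"of (\d+) results"),
    "britishlibrary": re.compile(
        r"<opensearch:totalResults>(\d+)</opensearch:totalResults>"
    ),
}


def hit_count(provider: str, body: str):
    """Return the reported number of hits, or None if the pattern fails."""
    match = HIT_COUNT_PATTERNS[provider].search(body)
    return int(match.group(1)) if match else None
```

Only the British Library value comes from a named element; the other two depend on presentation details.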

Did Something Go Wrong?

RedLightGreen:

<span class=smallText>We didn't find any matches for <b>dog and</b>.</span>

British Library:

<item ><title >Nothing found due to an error</title><description >Too many hits. Refine your request.</description></item>
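
Without a defined diagnostic format, failure detection comes down to matching message text. A small sketch, assuming the two messages above are the only failure signals (which is unlikely to hold in practice); the function name and provider keys are hypothetical.

```python
# Marker strings copied from the two responses shown above.
FAILURE_MARKERS = {
    "redlightgreen": "We didn't find any",
    "britishlibrary": "Nothing found due to an error",
}


def looks_like_failure(provider: str, body: str) -> bool:
    """True if the response body contains the provider's known failure text."""
    return FAILURE_MARKERS[provider] in body
```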

How do I convert the records?

Amazon:

<table class="searchresults" border="0" width="100%" cellpadding="0" cellspacing="0">

<tr><td width="100%" class="searchitem" id="Td:0">

<table border="0" width="100%" cellpadding="0" cellspacing="0"><tr valign="top">

<td>

<table class="n2" border="0" cellpadding="0" cellspacing="0">

<tr>

<td class="imageColumn" width="88"><table border="0" cellpadding="0" cellspacing="0">

<tr><td align="center" width="80">

<a href="http://www.amazon.com/gp/product/0060815221/sr=8-1/qid=1142436987/ref=pd_bbs_1/002-2019116-8269663?%5Fencoding=UTF8"><img src="http://ec1.images-amazon.com/images/P/0060815221.01._PIsitb-st-arrow,TopLeft,-1,-14_SCTHUMBZZZ_.jpg" width="55" alt="Thud! (Discworld, Book 32)" height="82" border="0" /></a>

</td><td width="8"></td></tr></table></td>

<td class="dataColumn"><table cellpadding="0" cellspacing="0" border="0"><tr><td>

<a href="http://www.amazon.com/gp/product/0060815221/sr=8-1/qid=1142436987/ref=pd_bbs_1/002-2019116-8269663?%5Fencoding=UTF8"><span class="srTitle">Thud! (Discworld, Book 32)</span></a>

by Terry Pratchett (<span class="binding">Hardcover</span>

- Sep 13, 2005)</td></tr>

<tr><td class="brandLink"><span class="aliasName">Books:</span> <a href="/gp/search/ref=sr_nr_seeall_1/002-2019116-8269663?%5Fencoding=UTF8&keywords=pratchett&rh=i%3Aaps%2Ck%3Apratchett%2Ci%3Astripbooks">See all 334 items</a></td></tr>

<tr><td><span class="priceType"><a href="http://www.amazon.com/gp/product/0060815221/sr=8-1/qid=1142436987/ref=pd_bbs_1/002-2019116-8269663?%5Fencoding=UTF8">Buy new</a>: </span>&nbsp;<span class="listprice">$24.95</span> <span class="saleprice">$15.72</span>

&nbsp; <span class="priceType">

<a href="http://www.amazon.com/gp/offer-listing/0060815221/sr=8-1/qid=1142436987/ref=pd_bbs_1/002-2019116-8269663?%5Fencoding=UTF8">Used &amp; new</a>

</span> from <span class="otherprice">$3.76</span>

&nbsp; <span class="avail">Usually ships in 24 hours</span>

</td></tr><tr><td colspan="2"><table cellpadding="0" cellspacing="0" border="0">

<tr><td class="excerptStart"><span class="excerptLead">Excerpt from</span> <a href="/gp/reader/0060815221/ref=sib_aps_pg/002-2019116-8269663?%5Fencoding=UTF8&keywords=pratchett&p=S00E&checkSum=y3glB4NEGJ6Ql3iAWFd6teZptAJmys3Uu8CCW9387%252BA%253D">page 2</a>: &quot;<span class="excerpt">... Terry <b>Pratchett</b> "Most of the news is ...</span>&quot;</td></tr>

<tr><td class="excerptSeeMore"><a href="/gp/reader/0060815221/ref=sib_aps_ref/002-2019116-8269663?%5Fencoding=UTF8&keywords=pratchett&v=search-inside">See more references</a> to <span class="excerptUserInput">pratchett</span> in this book.</td></tr><tr><td style="padding-top: 5px; padding-bottom: 8px;"><span style="font-weight: bold; color: #339933;">Surprise me!</span> <a href="http://www.amazon.com/gp/reader/0060815221/ref=sib_aps_sup/002-2019116-8269663?%5Fencoding=UTF8&p=random">See a random page</a> in this book.</td></tr></table></td></tr>

</table></td></tr></table>

</td></tr></table></td>

</tr>

Converting Records Cont.

RedLightGreen:

<td class="highlightcell"><span class="titleText"><b><a title="View more information about this title." href="ucw.servlets.UCWController?ACTION=EDITION&amp;WORKID=21537371&amp;LANGUAGE=ENG&amp;MATERIAL=books&amp;FROMRSLT=3&amp;FROMWORK=1&amp;lang=english">Hogfather</a></b>, by Terry Pratchett <br>3 editions published between 1996 and 1998 in English.<br>Primary Subject: Discworld Imaginary Place - Fiction<br><img src="/ucwprod/web/images/green.gif" height="3" width="10" alt="A title's position in a search result is based on relevancy (how closely your search terms match the description) &#xA;and availability (how many libraries have a copy of the title)."/><img src="/ucwprod/web/images/white.gif" height="3" width="1"/><img src="/ucwprod/web/images/green.gif" height="3" width="10" alt="A title's position in a search result is based on relevancy (how closely your search terms match the description) &#xA;and availability (how many libraries have a copy of the title)."/><img src="/ucwprod/web/images/white.gif" height="3" width="1"/><img src="/ucwprod/web/images/green.gif" height="3" width="10" alt="A title's position in a search result is based on relevancy (how closely your search terms match the description) &#xA;and availability (how many libraries have a copy of the title)."/><img src="/ucwprod/web/images/white.gif" height="3" width="1"/><img src="/ucwprod/web/images/green.gif" height="3" width="10" alt="A title's position in a search result is based on relevancy (how closely your search terms match the description) &#xA;and availability (how many libraries have a copy of the title)."/><img src="/ucwprod/web/images/white.gif" height="3" width="1"/><img src="/ucwprod/web/images/gray.gif" height="3" width="10" alt="A title's position in a search result is based on relevancy (how closely your search terms match the description) &#xA;and availability (how many libraries have a copy of the title)."/><img src="/ucwprod/web/images/white.gif" height="3" 
width="1"/></span></td></tr></table><table xmlns="http://www.w3.org/TR/REC-html40" border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td class="recordsepcell" colspan="2"><img src="/ucwprod/web/images/clear.gif" height="1"/></td></tr></table><table xmlns="http://www.w3.org/TR/REC-html40" border="0" cellpadding="3" cellspacing="0" width="100%"><tr valign="top"><td width="25" align="right" class="highlightcell"><span class="titleText">2.</span></td>

Converting Records Cont.

British Library:

<item ><title >Thud! / Terry Pratchett.</title>

<link >http://catalogue.bl.uk/F/-?func=direct-doc-set&doc_number=013220851&l_base=BLL01&from=A9OpenSearch</link>

<description > Pratchett, Terry. ; London : Doubleday, 2005. . ISBN 0385608675 (hbk.) : £17.99 . (Added : 20050614 )</description></item>
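
Of the three, the British Library item is the easiest to convert because it is XML. The sketch below turns it into a simple dictionary. The target field names and the rule of splitting the title on " / " are assumptions made for illustration, and the sample escapes the ampersands in the link as `&amp;` so that it parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

ITEM = (
    "<item ><title >Thud! / Terry Pratchett.</title>"
    "<link >http://catalogue.bl.uk/F/-?func=direct-doc-set&amp;"
    "doc_number=013220851&amp;l_base=BLL01&amp;from=A9OpenSearch</link>"
    "<description > Pratchett, Terry. ; London : Doubleday, 2005. . "
    "ISBN 0385608675 (hbk.) : £17.99 . (Added : 20050614 )</description></item>"
)


def item_to_record(item_xml: str) -> dict:
    """Map one RSS item to a hypothetical common record layout."""
    item = ET.fromstring(item_xml)
    # ASSUMPTION: the title uses "title / statement of responsibility".
    title, _, creator = item.findtext("title", "").partition(" / ")
    return {
        "title": title.strip(),
        "creator": creator.strip().rstrip("."),
        "link": item.findtext("link", "").strip(),
        "description": item.findtext("description", "").strip(),
    }


record = item_to_record(ITEM)
```

The Amazon and RedLightGreen fragments would each need their own HTML-specific extraction code to reach the same layout.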

How do I combine results from multiple sources?

Things you might want the server to do for you:

– Common Record Format

– Common Sort Order

– Common Rank Order
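
If the servers do not provide these, the client has to. A minimal sketch, assuming every source has already been converted to dictionaries with a `title` key (a hypothetical common format) and that a case-insensitive title sort is an acceptable common order; a common rank order cannot be reconstructed this way.

```python
def combine(result_sets: dict) -> list:
    """Merge per-source record lists and impose one sort order on them."""
    merged = [
        {**record, "source": source}
        for source, records in result_sets.items()
        for record in records
    ]
    # sorted() is stable, so records with equal titles keep source order.
    return sorted(merged, key=lambda record: record["title"].casefold())


combined = combine({
    "amazon": [{"title": "Thud!"}],
    "redlightgreen": [{"title": "Hogfather"}],
    "britishlibrary": [{"title": "Thud!"}],
})
```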

Functional Matrix

Request Record Starting Point

Request Number of Records

Request Record Schema

Defined Query Grammar

Specify Sort Order

Specify Ranking Order

Diagnostic Messages

XML Response

Record Count In Response

Records In Known Schema

The Old Solutions

Screen Scraping

Private APIs

Z39.50

Screen Scraping

A query has to be generated and embedded in a CP-specific URL

Code has to be written to examine the HTML returned by a CP

Prone to breakage

– Web sites change formatting frequently

Every site is unique

– Separate code to be maintained for every site

Private APIs

Often only a slight improvement over screen scraping

Provides documentation on how to construct the URL

Might provide documentation on how to construct the query

Might guarantee a stable response format

Still requires unique code for each site

Z39.50

Guarantees a standard request and response

But…

– Not HTTP or HTML

• Binary encoding over raw TCP/IP

– Complicated

• 11 services

• 7 extended services

– Easy to be compliant and not interoperable

– Unfriendly

• The response to a protocol error was to drop the connection

Why Use A Standard API?

Defined requests and responses

Reusable code across sites

Open Source code

The New Solutions

OpenSearch 1.1

MXG

– Levels 0-2

SRU

OpenSearch 1.1

From Wikipedia

– OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication. It is a way for search engines to publish their search results in a standard and accessible format.

OpenSearch 1.1 (cont.)

Defines a Description Record with information about the CP

– ShortName and LongName

– Description

– Tags

– URL template

Example:

http://herbie.bl.uk:9080/opensearch.xml
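
A sketch of reading such a description record. The document embedded here is a made-up example (the names, tags, and `example.org` template are hypothetical, not the British Library's actual record); the namespace is the OpenSearch 1.1 one.

```python
import xml.etree.ElementTree as ET

OS_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document, for illustration only.
DESCRIPTION = """\
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example Catalogue</ShortName>
  <Description>Hypothetical catalogue search</Description>
  <Tags>books catalogue</Tags>
  <Url type="application/rss+xml"
       template="http://example.org/search?q={searchTerms}&amp;start={startIndex?}"/>
</OpenSearchDescription>"""


def describe(description_xml: str) -> dict:
    """Extract the fields a metasearch client needs from a description record."""
    root = ET.fromstring(description_xml)
    return {
        "short_name": root.findtext("os:ShortName", "", OS_NS),
        "description": root.findtext("os:Description", "", OS_NS),
        "tags": root.findtext("os:Tags", "", OS_NS).split(),
        # One template per response type the provider offers.
        "templates": {
            url.get("type"): url.get("template")
            for url in root.findall("os:Url", OS_NS)
        },
    }


info = describe(DESCRIPTION)
```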

OpenSearch 1.1 (cont.)

URL Template

– Server indicates how to specify OpenSearch request parameters

– Parameters not specified in the template are unavailable

– The only mandatory parameter is {searchTerms}

<Url type="application/rss+xml" template="http://herbie.bl.uk:9080/cgi-bin/OSxml1.cgi/?q={searchTerms}&start={startIndex?}&records={count?}&format=rss" />
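
Filling in a template like the one above might look like the following sketch. The function name is hypothetical; optional parameters (those ending in `?`) are replaced with an empty string when no value is supplied, and a missing mandatory parameter is treated as an error. Prefixed parameter names from OpenSearch extensions are not handled.

```python
import re
from urllib.parse import quote

TEMPLATE = (
    "http://herbie.bl.uk:9080/cgi-bin/OSxml1.cgi/"
    "?q={searchTerms}&start={startIndex?}&records={count?}&format=rss"
)


def expand_template(template: str, **values) -> str:
    """Substitute {name} and {name?} placeholders with URL-encoded values."""

    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameter left empty
        raise KeyError(name)  # mandatory parameter missing

    return re.sub(r"\{(\w+)(\??)\}", substitute, template)


url = expand_template(TEMPLATE, searchTerms="huck finn", count=10)
```

Because the template is data published by the server, the same client code works for any provider that offers a description record.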

OpenSearch 1.1 (cont.)

Request Parameters

– {searchTerms}

– {count}

– {startIndex}

– {startPage}

– {language}

– {outputEncoding}

– {inputEncoding}

OpenSearch 1.1 (cont.)

Uses RSS 2.0 with a few extra elements for the response

– RSS defines the title, description and link elements

– OpenSearch adds the totalResults, startIndex, itemsPerPage, link and Query elements

http://herbie.bl.uk:9080/cgi-bin/OSxml1.cgi/?q=levan&format=rss
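
A sketch of reading such a response with a standard XML parser instead of pattern matching. The sample feed is abbreviated and partly invented (the channel title and the startIndex and itemsPerPage values are placeholders); only the element names and the OpenSearch 1.1 namespace are taken as given.

```python
import xml.etree.ElementTree as ET

NS = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}

# Abbreviated, partly hypothetical response for illustration.
RESPONSE = """\
<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example search results</title>
    <opensearch:totalResults>190</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>Thud! / Terry Pratchett.</title></item>
  </channel>
</rss>"""


def parse_response(rss_xml: str) -> dict:
    """Return paging information and item titles from an OpenSearch RSS feed."""
    channel = ET.fromstring(rss_xml).find("channel")
    return {
        "total": int(channel.findtext("opensearch:totalResults", "0", NS)),
        "start": int(channel.findtext("opensearch:startIndex", "1", NS)),
        "per_page": int(channel.findtext("opensearch:itemsPerPage", "0", NS)),
        "titles": [item.findtext("title") for item in channel.findall("item")],
    }


page = parse_response(RESPONSE)
```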

Functional Matrix (OS 1.1)

Request Record Starting Point ●

Request Number of Records ○

Request Record Schema

Defined Query Grammar

Specify Sort Order

Specify Ranking Order

Diagnostic Messages

XML Response ○

Record Count In Response ○

Records In Known Schema ○

Key: ●==Full Support ○==Limited Support

Cool Feature

The RSS mechanism in OpenSearch provides the ability to have persistent and periodic queries!

NISO MetaSearch XML Gateway

MXG

MXG has been designed to provide a low implementation barrier to content providers that want to make their databases available to metasearch engines. Interoperability across content providers was explicitly not a goal of MXG.

MXG Levels of Support

Level 0: Requests are simple URLs using any query grammar and responses are XML records

Level 1: Adds a description record for the database

Level 2: Supports a limited subset of a standard query grammar: CQL

MXG Request

Version (mandatory)

Query (mandatory)

StartRecord

MaximumRecords

http://alcme.oclc.org/MXG/search/ORPubs?version=1.1&query="levan"&startRecord=1&maximumRecords=10
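
Building such a request is a few lines with a standard URL library. This is a sketch: the function name and its defaults are hypothetical, and the only things taken from the slide are the four parameter names and the base URL. Note that `urlencode` percent-encodes the quotation marks in the query.

```python
from urllib.parse import urlencode


def mxg_url(base, query, start_record=None, maximum_records=None, version="1.1"):
    """Build an MXG request URL; version and query are the mandatory parameters."""
    params = {"version": version, "query": query}
    if start_record is not None:
        params["startRecord"] = start_record
    if maximum_records is not None:
        params["maximumRecords"] = maximum_records
    return f"{base}?{urlencode(params)}"


url = mxg_url("http://alcme.oclc.org/MXG/search/ORPubs", '"levan"', 1, 10)
```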

MXG Response

<?xml version="1.0" ?>
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records> … </records>
  <nextRecordPosition>1</nextRecordPosition>
  <echoedSearchRetrieveRequest>
    <version>1.1</version>
    <query>&quot;stuff&quot;</query>
  </echoedSearchRetrieveRequest>
</searchRetrieveResponse>
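
Because the response envelope is defined, the client can read the record count and next position by element name. A minimal sketch using the values from the response above, with the record list left empty; the function name and the returned dictionary keys are illustrative.

```python
import xml.etree.ElementTree as ET

SRW = {"srw": "http://www.loc.gov/zing/srw/"}

RESPONSE = """\
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records/>
  <nextRecordPosition>1</nextRecordPosition>
</searchRetrieveResponse>"""


def summarize(response_xml: str) -> dict:
    """Read the envelope fields of a searchRetrieveResponse."""
    root = ET.fromstring(response_xml)
    next_position = root.findtext("srw:nextRecordPosition", None, SRW)
    return {
        "version": root.findtext("srw:version", "", SRW),
        "count": int(root.findtext("srw:numberOfRecords", "0", SRW)),
        # nextRecordPosition may be absent, so allow None.
        "next": int(next_position) if next_position else None,
    }


summary = summarize(RESPONSE)
```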

MXG Response Records

<record>
  <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
  <recordPacking>xml</recordPacking>
  <recordData> … </recordData>
  <recordPosition>1</recordPosition>
</record>

MXG Response recordData

<srw_dc:dc xmlns="http://www.w3.org/TR/xhtml1/strict" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:srw_dc="info:srw/schema/1/dc-v1.1">
  <dc:identifier>rrl1234</dc:identifier>
  <dc:title>Dog and Cat</dc:title>
</srw_dc:dc>
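
Putting the record wrapper and its recordData together, a client can collect the Dublin Core fields without any site-specific code. In this sketch the sample combines the two fragments above into one record; the function name and the returned layout are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "srw_dc": "info:srw/schema/1/dc-v1.1",
}
DC = "{http://purl.org/dc/elements/1.1/}"

RECORD = """\
<record xmlns="http://www.loc.gov/zing/srw/">
  <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
  <recordPacking>xml</recordPacking>
  <recordData>
    <srw_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:srw_dc="info:srw/schema/1/dc-v1.1">
      <dc:identifier>rrl1234</dc:identifier>
      <dc:title>Dog and Cat</dc:title>
    </srw_dc:dc>
  </recordData>
  <recordPosition>1</recordPosition>
</record>"""


def record_fields(record_xml: str) -> dict:
    """Return the record schema and the Dublin Core fields it carries."""
    record = ET.fromstring(record_xml)
    fields = {}
    container = record.find("srw:recordData/srw_dc:dc", NS)
    for child in container if container is not None else []:
        if child.tag.startswith(DC):
            # Dublin Core elements are repeatable, so collect lists.
            fields.setdefault(child.tag[len(DC):], []).append(child.text)
    return {
        "schema": record.findtext("srw:recordSchema", "", NS).strip(),
        "fields": fields,
    }


parsed = record_fields(RECORD)
```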

MXG Error Messages

<diagnostics>
  <diagnostic xmlns="http://www.loc.gov/zing/srw/diagnostic/">
    <uri>info:srw/diagnostic/1/51</uri>
    <details>66ntqk</details>
  </diagnostic>
</diagnostics>

http://www.loc.gov/z3950/agency/zing/srw/diagnostics-list.html
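
With diagnostics in a defined structure, error handling no longer depends on message wording. A sketch that extracts the URI and details from the fragment above; the function name and the tuple layout are illustrative.

```python
import xml.etree.ElementTree as ET

DIAG_URI = "http://www.loc.gov/zing/srw/diagnostic/"
DIAG = {"d": DIAG_URI}

SAMPLE = """\
<diagnostics>
  <diagnostic xmlns="http://www.loc.gov/zing/srw/diagnostic/">
    <uri>info:srw/diagnostic/1/51</uri>
    <details>66ntqk</details>
  </diagnostic>
</diagnostics>"""


def diagnostics(xml_text: str) -> list:
    """Return (uri, details) pairs for every diagnostic in the document."""
    root = ET.fromstring(xml_text)
    return [
        (d.findtext("d:uri", "", DIAG), d.findtext("d:details", "", DIAG))
        for d in root.iter(f"{{{DIAG_URI}}}diagnostic")
    ]


found = diagnostics(SAMPLE)
```

The numeric suffix of the URI can then be looked up in the diagnostics list linked above.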

Functional Matrix (MXG Level 0)

Request Record Starting Point ●

Request Number of Records ●

Request Record Schema ○

Defined Query Grammar

Specify Sort Order

Specify Ranking Order

Diagnostic Messages ●

XML Response ●

Record Count In Response ●

Records In Known Schema ●

Key: ●==Full Support ○==Limited Support

MXG Level 1

Add a description record for the database

http://www.loc.gov/z3950/agency/zing/srw/explain.html

http://alcme.oclc.org/MXG/search/ORPubs

Functional Matrix (MXG Level 1)

Request Record Starting Point ●

Request Number of Records ●

Request Record Schema ●

Defined Query Grammar

Specify Sort Order

Specify Ranking Order

Diagnostic Messages ●

XML Response ●

Record Count In Response ●

Records In Known Schema ●

Key: ●==Full Support ○==Limited Support

MXG Level 2

Supports a limited subset of a standard query grammar: CQL

Supports indexes and Booleans

http://www.loc.gov/z3950/agency/zing/cql/

http://alcme.oclc.org/srw/search/ORPublications?version=1.1&query=dc.author=levan&maximumRecords=1

Functional Matrix (MXG Level 2)

Request Record Starting Point ●

Request Number of Records ●

Request Record Schema ●

Defined Query Grammar ○

Specify Sort Order

Specify Ranking Order

Diagnostic Messages ●

XML Response ●

Record Count In Response ●

Records In Known Schema ●

Key: ●==Full Support ○==Limited Support

SRU

MXG Level 2 Plus:

– Full Query Grammar (CQL)

– Full Sort Specification

CQL: Common Query Language

Loosely based on CCL Search

Boolean & Proximity Operators

Index Sets & Indexes

String Indexes vs. Keyword Indexes

Truncation Characters ‘*’, ‘#’ & ‘?’

Relations: ‘=’, all, any, exact, within

Example: dc.title="harry potter" or bib1.isbn=123-456-78x
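
With a defined grammar, one query builder serves every server. A small sketch that assembles the example above; the helper names are hypothetical, the quoting rule (quote any term containing whitespace or reserved characters) is a simplification, and nested Boolean groups would additionally need parentheses.

```python
def cql_term(index: str, relation: str, value: str) -> str:
    """Build one index/relation/term clause, quoting the term when needed."""
    if value == "" or any(ch in value for ch in ' ()=<>/"'):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        value = f'"{escaped}"'
    if relation.isalpha():
        # Word relations such as all, any, exact need surrounding spaces.
        return f"{index} {relation} {value}"
    return f"{index}{relation}{value}"


def cql_bool(operator: str, *clauses: str) -> str:
    """Join clauses with a Boolean operator (and, or, not)."""
    return f" {operator} ".join(clauses)


query = cql_bool(
    "or",
    cql_term("dc.title", "=", "harry potter"),
    cql_term("bib1.isbn", "=", "123-456-78x"),
)
```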

Sort

sortKeys parameter with the following comma-separated values specified:

– Xpath (path to the element to be sorted on)

– Schema (that the xpath comes from)

– Ascending (value is 1==true or 0==false, default==true)

– CaseSensitive (value is 1==true or 0==false, default==false)

– missingValue (values are omit, abort, highValue or lowValue, default==highValue)

e.g. &sortKeys=title,onix,0
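
A sketch of assembling one sort key from those five values. The function name is hypothetical; it assumes, as the example above suggests, that trailing values left at their defaults can simply be omitted, while a skipped value in the middle is left as an empty slot.

```python
def sort_key(xpath, schema="", ascending=None, case_sensitive=None, missing_value=None):
    """Build one comma-separated sortKeys value; None means 'use the default'."""

    def flag(value):
        # Booleans are written as 1 or 0; unspecified stays empty.
        return "" if value is None else ("1" if value else "0")

    parts = [xpath, schema, flag(ascending), flag(case_sensitive), missing_value or ""]
    # ASSUMPTION: trailing defaults may be dropped, as in "title,onix,0".
    while parts and parts[-1] == "":
        parts.pop()
    return ",".join(parts)


key = sort_key("title", "onix", ascending=False)
```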

Functional Matrix (SRU)

Request Record Starting Point ●

Request Number of Records ●

Request Record Schema ●

Defined Query Grammar ●

Specify Sort Order ●

Specify Ranking Order ○

Diagnostic Messages ●

XML Response ●

Record Count In Response ●

Records In Known Schema ●

Key: ●==Full Support ○==Limited Support

Cool Feature

Combining SRU response data and echoed data with JavaScript and stylesheets allows for thin, browser-based clients

http://alcme.oclc.org/MXG/search/ORPubs?version=1.1&query="levan"&startRecord=1&maximumRecords=10

Functional Matrix

Feature                          OS 1.1   MXG L0   MXG L1   MXG L2   SRU
Request Record Starting Point      ●        ●        ●        ●       ●
Request Number of Records          ○        ●        ●        ●       ●
Request Record Schema              -        ○        ●        ●       ●
Defined Query Grammar              -        -        -        ○       ●
Specify Sort Order                 -        -        -        -       ●
Specify Ranking Order              -        -        -        -       ○
Diagnostic Messages                -        ●        ●        ●       ●
XML Response                       ○        ●        ●        ●       ●
Record Count In Response           ○        ●        ●        ●       ●
Records In Known Schema            ○        ●        ●        ●       ●

Key: ●==Full Support ○==Limited Support -==None
