
Next Generation Z39.50
A Web Services Approach for Search and Retrieve

6th Annual State GILS Conference, March 31 – April 3, 2004, Raleigh, NC

William E. Moen <[email protected]>

School of Library and Information Sciences

Texas Center for Digital Knowledge
University of North Texas

Denton, TX 76203

Overview

• Quick description of SRW
• Brief background: historical, political, conceptual
• Non-technical (almost) introduction to SRW
• Common Query Language (CQL), briefly
• Concluding thoughts

What is SRW?

• Search and Retrieve Web Service (SRW)
• An XML-based protocol for searching, retrieving, and other information retrieval transactions
• Cast in the standards and technologies of web services: XML, SOAP, HTTP
• Brings the concepts and experience of Z39.50 into the web environment using web technologies

Why SRW?

• Genesis: several years of soul-searching by Z39.50 developers and implementors
• The "web" had become the common implementation environment
• Z39.50 was not perceived as web-friendly
• Pivotal moments:
  • December 2000 ZIG (Z39.50 Implementors Group) meeting
  • July 2001 meeting

Turning point: December 2000 "Z39.50 Future" discussion

• Perceptions of Z39.50: broken, heavyweight, difficult and complex, old technology, not web-friendly
• Several options presented:
  • Rewrite the protocol from the ground up
  • Rewrite it as an XML protocol
  • Separate the Z39.50 protocol from its use of BER as a wire protocol
  • Simplify the protocol specifications to focus on core features
• Recognition of the intellectual contribution of Z39.50

Taking action: June 2001

• Invitational meeting to discuss moving Z39.50 to an XML-based protocol
• Goal: lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50, discarding those aspects no longer useful or meaningful
• Objectives:
  • Define specifications for a new web service definition based on Z39.50 together with web technologies
  • Separate the Z39.50 abstract model and its associated semantics from its specific encoding and wire protocol (i.e., ASN.1/BER and TCP/IP)
• Initially called Z39.50 Next Generation (ZNG)
  • Intended as a proof of concept
  • Defining only those protocol specifications that participants would actually implement

ZING – Z39.50 International Next Generation

• Make the intellectual and semantic content of Z39.50 more broadly available
• Make Z39.50 more attractive by lowering barriers to implementation:
  • Use of XML to represent and encode data
  • Use of HTTP for transport
  • Use of SOAP for interaction between client and server, based on Remote Procedure Call (RPC)
• Several ZING initiatives: ZOOM, ez39.50, ZeeRex, SRW/U

For more information, visit the ZING website: http://www.loc.gov/z3950/agency/zing/

SRW/U, SRW, SRU

• SRW/U: Search and Retrieve for the Web
  • General designation for this initiative
• SRW: Search and Retrieve Web Service
  • HTTP POST
  • Simple Object Access Protocol (SOAP)
  • XML messages
• SRU: Search and Retrieve URL Service
  • HTTP GET
  • Request parameters included in URL syntax
• Development
  • Version 1.0: November 2001
  • Version 1.1: February 2002

For more information, visit the SRW website: http://www.loc.gov/srw

Networked information retrieval

What's needed:
• Identifying a target to search
• A vocabulary for expressing search requests, search criteria, retrieval requests, etc.
• Methods to encode the requests and responses from the target
• Methods to transport the requests and responses across a network

In other words, a protocol and supporting specifications.

Abstract Model of IR

Abstract model of Z39.50

Z39.50 classic & SRW

SRW Overview

• Builds on Z39.50 concepts and web technologies
• Web technologies: XML, SOAP, HTTP
• Uses a new, human-readable query language
• Combines several Z39.50 features into several "operation types":
  • searchRetrieve operation
  • scan operation
  • explain operation
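The three operation types can be made concrete with a small server-side check of which parameters each request must carry. This is a minimal Python sketch under stated assumptions: the searchRetrieve entries (version, query) come from this presentation, while the scan parameter name (scanClause) and explain needing only version reflect my reading of SRU 1.1 and should be checked against the specification. The names REQUIRED_PARAMS and missing_params are hypothetical.

```python
from typing import Dict, Set

# Required parameters per SRW/U operation type.
# searchRetrieve: version and query, as the presentation lists them.
# scan and explain: ASSUMED from SRU 1.1 (scanClause for scan; explain needs only version).
REQUIRED_PARAMS: Dict[str, Set[str]] = {
    "searchRetrieve": {"version", "query"},
    "scan": {"version", "scanClause"},
    "explain": {"version"},
}


def missing_params(params: Dict[str, str]) -> Set[str]:
    """Return the required parameters absent from a request.

    Raises ValueError when 'operation' is not one of the three operation types.
    """
    operation = params.get("operation")
    if operation not in REQUIRED_PARAMS:
        raise ValueError(f"unsupported operation: {operation!r}")
    return REQUIRED_PARAMS[operation] - set(params)


if __name__ == "__main__":
    request = {"operation": "searchRetrieve", "version": "1.1"}
    print(missing_params(request))  # {'query'}
```

Keeping the rules in a table rather than in branching code makes it easy to see that searchRetrieve is the only operation that needs a CQL query.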

searchRetrieve operation

• The core of the protocol
• Expresses the search and additional criteria
• Records are returned in XML
• Request parameters: version, query
  • Optional parameters: sortKeys, recordPacking, recordSchema, recordXPath, stylesheet
• Response parameters: version, numberOfRecords
  • Optional parameters: resultSetId, resultSetIdleTime, records, diagnostics
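One way to model the request side of this slide is a Python dataclass with the two required parameters and the optional ones defaulting to None, so that serializing simply drops anything unset. This is a hypothetical sketch: the class and method names are illustrative, field names mirror the wire parameter names (hence camelCase), and startRecord and maximumRecords are included because the presentation's examples use them.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class SearchRetrieveRequest:
    """Hypothetical model of SRW searchRetrieve request parameters."""

    # Required parameters (per the slide)
    version: str
    query: str
    # Paging parameters used in the presentation's examples
    startRecord: Optional[int] = None
    maximumRecords: Optional[int] = None
    # Optional parameters (per the slide)
    sortKeys: Optional[str] = None
    recordPacking: Optional[str] = None
    recordSchema: Optional[str] = None
    recordXPath: Optional[str] = None
    stylesheet: Optional[str] = None

    def to_params(self) -> Dict[str, str]:
        """Return only the parameters that were set, as strings, in declaration order."""
        params: Dict[str, str] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:
                params[f.name] = str(value)
        return params


if __name__ == "__main__":
    req = SearchRetrieveRequest(version="1.1", query='dc.title all "Squirrel Hungry"',
                                maximumRecords=1, recordSchema="dc")
    print(req.to_params())
```

The resulting dictionary is transport-neutral: the same parameters can be written as XML elements for SRW or as a query string for SRU.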

SRW & XML

• XML as the foundation for the protocol
  • Provides syntax for intelligent markup
  • Defines or references XML schemas
• Example XML schemas for the SRW specifications:
  • searchRetrieveRequest
  • searchRetrieveResponse

searchRetrieveRequest example

• Sent as an HTTP POST: an XML document is sent to the server
• SOAP is used to wrap the request

<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title all "Squirrel Hungry"</query>
  <maximumRecords>1</maximumRecords>
  <startRecord>1</startRecord>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>
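The wrapping step described above can be sketched with Python's standard ElementTree. The SOAP 1.1 envelope namespace is standard; the SRW namespace URI below is an assumption (check the service's WSDL), and build_soap_request is a hypothetical helper. The HTTP POST itself is left out so the sketch runs offline.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SRW_NS = "http://www.loc.gov/zing/srw/"  # ASSUMED SRW namespace; verify against the WSDL


def build_soap_request(params: Dict[str, str]) -> bytes:
    """Wrap searchRetrieveRequest parameters in a SOAP 1.1 envelope (hypothetical helper)."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    ET.register_namespace("srw", SRW_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # One child element per parameter, in the order given
    for name, value in params.items():
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_soap_request({
        "version": "1.1",
        "query": 'dc.title all "Squirrel Hungry"',
        "maximumRecords": "1",
        "startRecord": "1",
        "recordSchema": "dc",
    })
    print(xml_bytes.decode("utf-8"))
```

A client would then POST these bytes to the service endpoint; for SOAP 1.1 the usual content type is text/xml.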

searchRetrieveResponse example

<searchRetrieveResponse>
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
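On the client side, a response like this can be read with ElementTree. The fragment above uses a dc: prefix without declaring it, so it is not well-formed XML as shown. This sketch binds dc to the Dublin Core elements namespace (http://purl.org/dc/elements/1.1/) as an assumption and leaves the SRW elements unqualified, as on the slide. Real responses are namespace-qualified throughout. parse_response is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"  # ASSUMED binding for the dc: prefix

# The slide's response, with an xmlns:dc declaration added so it parses
SAMPLE_RESPONSE = f"""<searchRetrieveResponse xmlns:dc="{DC_NS}">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>"""


def parse_response(xml_text: str) -> Tuple[int, List[str]]:
    """Return (numberOfRecords, titles of the records included in this response)."""
    root = ET.fromstring(xml_text)
    hits = int(root.findtext("numberOfRecords", default="0"))
    titles = [el.text or "" for el in root.iter(f"{{{DC_NS}}}title")]
    return hits, titles


if __name__ == "__main__":
    print(parse_response(SAMPLE_RESPONSE))  # (10, ['Squirrel is Hungry'])
```

numberOfRecords reports the total hit count (10), while only one record is returned here, consistent with maximumRecords=1 in the request.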

searchRetrieve response

• Records are returned in the response
• All records are in XML syntax
• Records conform to one or more XML schemas (semantics), e.g.:
  • Dublin Core
  • ONIX
  • MODS
  • MARCXML

searchRetrieve example

[Retrieval results: XML view and screenshot]

<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title="computer"</query>
  <startRecord>1</startRecord>
  <maximumRecords>10</maximumRecords>
  <recordPacking>xml</recordPacking>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>

SRW results

SRU briefly

• Protocol requests can be carried via HTTP GET
• searchRetrieveRequest parameters are expressed in standard URL syntax
• The base URL and the search part are separated by a question mark ("?")
• The response is an XML document containing records
• The searchRetrieveRequest in SRU:

http://alcme.oclc.org/srw/search/SOAR?operation=searchRetrieve&version=1.1&query=dc.title=%22computer%22&recordSchema=DC&startRecord=1&maximumRecords=10&recordPacking=xml
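The SRU URL above is simply the searchRetrieve parameters encoded as a query string after the base URL. Here is a minimal sketch using Python's urllib.parse. The base URL and parameter values are copied from the slide, and nothing is assumed about whether that endpoint still responds. sru_search_url is a hypothetical helper. Unlike the slide, urlencode also percent-encodes the "=" inside the CQL query (as %3D); both forms decode to the same query.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://alcme.oclc.org/srw/search/SOAR"  # from the slide; may no longer be live


def sru_search_url(base_url: str, query: str, **options: str) -> str:
    """Build an SRU searchRetrieve URL: base URL, '?', then URL-encoded parameters."""
    params = {"operation": "searchRetrieve", "version": "1.1", "query": query}
    params.update(options)
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    url = sru_search_url(BASE_URL, 'dc.title="computer"', recordSchema="DC",
                         startRecord="1", maximumRecords="10", recordPacking="xml")
    print(url)
    # Decoding the query string recovers the original CQL query
    print(parse_qs(urlsplit(url).query)["query"])  # ['dc.title="computer"']
```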

searchRetrieve query

• An SRW query consists of one or more query statements linked by Boolean operators
• Five categories of query statements:
  1. Single search clause
  2. Two or more search clauses linked by Boolean operators
  3. Search clauses and result sets linked by Boolean operators
  4. Two or more result sets linked by Boolean operators
  5. Single result set
• Expressed in the Common Query Language (CQL)

Common Query Language (CQL)

• A formal language for representing queries to information retrieval systems
• Human-readable
• Search clause:
  • Always includes a term
    • Simple terms consist of one or more words
  • May include an index name
    • Limits the search to a particular field/element
    • An index name includes a base name and may include a prefix, e.g., title, subject; dc.title, dc.subject
  • Several index sets have been defined (called context sets in SRW): dc, bath, srw
  • A context set defines the available indexes for a particular application
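Index names such as dc.title split into an optional context-set prefix and a base name. The following Python sketch resolves them against a context-set registry, under several assumptions: the dc set is taken to contain the fifteen Dublin Core element names, only dc is registered for brevity, and treating an unprefixed name as belonging to a default set is a hypothetical server policy, not something the slides specify. CONTEXT_SETS, DEFAULT_SET and resolve_index are illustrative names.

```python
from typing import Dict, FrozenSet, Tuple

# ASSUMED contents: the 15 Dublin Core element names as dc indexes
CONTEXT_SETS: Dict[str, FrozenSet[str]] = {
    "dc": frozenset({
        "title", "creator", "subject", "description", "publisher", "contributor",
        "date", "type", "format", "identifier", "source", "language",
        "relation", "coverage", "rights",
    }),
}
DEFAULT_SET = "dc"  # hypothetical server default for unprefixed index names


def resolve_index(name: str) -> Tuple[str, str]:
    """Split 'prefix.base' (or a bare 'base') and check it against its context set."""
    prefix, dot, base = name.rpartition(".")
    if not dot:
        prefix = DEFAULT_SET
    indexes = CONTEXT_SETS.get(prefix)
    if indexes is None:
        raise KeyError(f"unknown context set: {prefix!r}")
    if base not in indexes:
        raise KeyError(f"index {base!r} not defined in context set {prefix!r}")
    return prefix, base
```

For example, resolve_index("dc.subject") and resolve_index("subject") both yield ("dc", "subject"), while resolve_index("bath.title") raises KeyError here only because this sketch registers no bath set.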

Other components of CQL

• Relations:
  • <, >, <=, >=, =, <>
  • exact: used for string matching
  • all: when the term is a list of words, indicates that all of the words must be found
  • any: when the term is a list of words, indicates that at least one of the words must be found
• Boolean operators: and, or, not
• Proximity (prox operator), with modifiers:
  • relation (<, >, <=, >=, =, <>)
  • distance (integer)
  • unit (word, sentence, paragraph, element)
  • ordering (ordered or unordered)
• Masking rules and special characters:
  • A single asterisk (*) masks zero or more characters
  • A single question mark (?) masks a single character
  • A caret (^) indicates anchoring, left or right
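The masking rules map naturally onto regular expressions. This Python sketch translates a CQL term that uses *, ? and ^ into a compiled regex. It is a simplification under stated assumptions: it ignores CQL's backslash escaping, treats ^ as an anchor only at the very start or end of the term, lets an unanchored term match anywhere in the field, and matches case-insensitively. cql_mask_to_regex is a hypothetical name.

```python
import re


def cql_mask_to_regex(term: str) -> "re.Pattern[str]":
    """Translate CQL masking characters into a compiled regular expression.

    * -> zero or more characters; ? -> exactly one character;
    a leading/trailing ^ anchors the term to the start/end of the field.
    Other characters are matched literally.
    """
    anchor_start = term.startswith("^")
    anchor_end = term.endswith("^") and len(term) > 1
    core = term[1 if anchor_start else 0: len(term) - 1 if anchor_end else len(term)]
    parts = []
    for ch in core:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    pattern = ("^" if anchor_start else "") + "".join(parts) + ("$" if anchor_end else "")
    return re.compile(pattern, re.IGNORECASE)
```

With re.search semantics, "dino*" matches the field "The Complete Dinosaur", "^complete" does not (the field does not start with it), and "dinosa?r^" matches only when "dinosaur" (or a one-character variant) ends the field.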

CQL examples

• Simple queries:
    dinosaur
    "the complete dinosaur"
• Boolean:
    dinosaur and bird or dinobird
    "feathered dinosaur" and (yixian or jehol)
• Proximity:
    foo prox bar
    foo prox/>/4/word/ordered bar
• Indexes:
    title = dinosaur
    bath.title="the complete dinosaur"
    srw.serverChoice=dinosaur
• Relations:
    year > 1998
    title all "complete dinosaur"
    title any "dinosaur bird reptile"
    title exact "the complete dinosaur"
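To show how the components on the previous slides fit together, here is a small recursive-descent parser in Python that handles the example queries above and returns nested tuples. It is a sketch, not a conforming CQL implementation. It assumes that Boolean operators have equal precedence and associate left to right. A bare term is treated as srw.serverChoice = term, echoing the srw.serverChoice example. prox modifiers are kept as a raw list. Relation modifiers, escape handling, and error recovery are not supported. All names are hypothetical.

```python
import re
from typing import List, Optional

# Token pattern: quoted string | two-char comparison | single punctuation | bare word
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|<=|>=|<>|[()=<>/]|[^\s()"=<>/]+')
RELATIONS = {"=", "<", ">", "<=", ">=", "<>", "exact", "all", "any"}
BOOLEANS = {"and", "or", "not", "prox"}
DEFAULT_INDEX = "srw.serverChoice"  # assumed index for a bare term
Node = tuple  # ("clause", index, relation, term) or ("bool", op, modifiers, left, right)


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def unquote(token: str) -> str:
    """Strip surrounding double quotes (escape sequences are left untouched)."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    return token


class CQLParser:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self) -> Node:
        tree = self.query()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def query(self) -> Node:
        # Boolean operators combine left to right with equal precedence (assumption)
        left = self.clause()
        while self.peek() is not None and self.peek().lower() in BOOLEANS:
            op = self.advance().lower()
            modifiers: List[str] = []
            while self.peek() == "/":  # e.g. prox/>/4/word/ordered
                self.advance()
                modifiers.append(self.advance())
            right = self.clause()
            left = ("bool", op, modifiers, left, right)
        return left

    def clause(self) -> Node:
        if self.peek() == "(":
            self.advance()
            inner = self.query()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        first = self.advance()
        if first in {")", "/"}:
            raise SyntaxError(f"unexpected token {first!r}")
        following = self.peek()
        if following is not None and following.lower() in RELATIONS:
            relation = self.advance().lower()
            return ("clause", first, relation, unquote(self.advance()))
        return ("clause", DEFAULT_INDEX, "=", unquote(first))


def parse_cql(text: str) -> Node:
    return CQLParser(text).parse()


if __name__ == "__main__":
    for q in ['"feathered dinosaur" and (yixian or jehol)',
              "foo prox/>/4/word/ordered bar",
              'title exact "the complete dinosaur"']:
        print(q, "->", parse_cql(q))
```

Under the left-to-right assumption, "dinosaur and bird or dinobird" groups as (dinosaur and bird) or dinobird; parentheses, as in the yixian/jehol example, override that grouping.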

SRW & classic Z39.50

SRW:
• No explicit concept of connection, session, or state
• Result sets named by the server
• Single record syntax (XML), multiple schemas
• String (i.e., human-readable) queries in CQL
• Named indexes

Classic Z39.50:
• Stateful
• Result sets named by the client
• Multiple record syntaxes
• No human-readable query language
• Type 1 query using attribute sets
• Use attribute to identify the access point

Z39.50 concepts retained:
• Result sets
• Abstract access points
• Abstract record schemas
• Explain
• Diagnostics

What problems does SRW solve?

• Addresses the need for standards-based searching in the networked environment
• Shows the vitality of the Z39.50 concepts and implements them in a web services and URL access context
• Offers database providers a web-friendly method for offering standards-based searching of resources
• Provides a low-barrier-to-entry solution using commonly available technologies
• XML-formatted records allow more reuse, and more interesting uses, of resources

Possible implementation venues

• Gateways to existing Z39.50 servers
• Lightweight SRW/U servers for specialized databases
• Cost-effective search access to commercial databases (e.g., citation, full-text)
• Metasearching
• Beyond libraries, to many other information communities

References

• Z39.50 International Next Generation (ZING): http://www.loc.gov/z3950/agency/zing/
• Search and Retrieve for the Web (SRW/U): http://www.loc.gov/srw
• A Gentle Introduction to SRW: http://www.loc.gov/z3950/agency/zing/srw/introduction.html
• A Gentle Introduction to CQL: http://zing.z3950.org/cql/intro.html
• van Veen and Oldroyd, "Search and Retrieval in The European Library: A New Approach," D-Lib Magazine, February 2004: http://www.dlib.org/dlib/february04/vanveen/02vanveen.html