OCLC Research TAI CHI Webinar
7/1/2010
OCLC Open Source Linked Data Framework
Ralph LeVan, Sr. Research Scientist, OCLC Research
Goal: Expose Text Database Content as Linked Data
Technique: By combining the urlrewritefilter from tuckey.org, the content negotiation component from the Freie Universität Berlin’s Pubby server, and our Open Source SRW/U server, you can expose the records in your database as Linked Data.
Roadmap
1. SRW/U Server
2. URIs for records
3. Real World Objects for records
4. Multiple record formats
5. RDF needs to be returned
6. Content Negotiation
SRW/U Server
SRW/U server sits in front of text databases
• We have interfaces for DSpace, Lucene and Pears
Easy to write your own interface
• Convert CQL query to native query language
• Do search and return a resultset object
• Return records from the resultset
• (The Lucene interface is a good simple example of how to build your own database interface)
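The three steps a database interface performs can be sketched as follows. This is a minimal illustration in Python, not the SRW/U server's actual Java interface: the names `RECORDS`, `ResultSet`, `cql_to_native`, `search`, and `get_records` are hypothetical, the "database" is an in-memory dictionary, and the CQL handling covers only queries of the form `index exact "value"`.

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-memory "database": record ID -> record body.
RECORDS = {"123": "<record id='123'/>", "456": "<record id='456'/>"}


@dataclass
class ResultSet:
    """Holds the IDs of the records matched by a search."""
    ids: list = field(default_factory=list)


def cql_to_native(cql):
    """Step 1: convert a (very restricted) CQL query to a native lookup key.

    Only the form  index exact "value"  is recognised in this sketch.
    """
    match = re.fullmatch(r'\s*[\w.]+\s+exact\s+"([^"]*)"\s*', cql)
    if match is None:
        raise ValueError(f"unsupported CQL query: {cql!r}")
    return match.group(1)


def search(cql):
    """Step 2: run the search and return a result set object."""
    key = cql_to_native(cql)
    return ResultSet(ids=[key] if key in RECORDS else [])


def get_records(result_set):
    """Step 3: return the records named by the result set."""
    return [RECORDS[record_id] for record_id in result_set.ids]
```

A real interface would translate the full CQL grammar into the backend's own query language; the point here is only the shape of the three calls.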
I expose my SRW/U service as <context>/search, but you can put it wherever you want.
URIs for records
urlrewritefilter implements Apache mod_rewrite patterns for Java servlets.
It sees the URI and converts it to an SRU search.
• <from>^/([0-9][0-9]+)/$</from>
<to>/search?query=local.viafID+exact+%22$1%22</to>
• E.g. viaf/123 becomes viaf/search?query=viafID+exact+%22123%22
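To make the effect of the rule concrete, here is a small Python emulation of the same pattern and target. It is only a sketch of what the rule does, not how urlrewritefilter is implemented; the function name `rewrite` is hypothetical, and the path is assumed to be relative to the servlet context (so `/123/` rather than `/viaf/123/`).

```python
import re

# Same pattern as the <from> element: two or more digits, then a slash.
FROM_PATTERN = re.compile(r"^/([0-9][0-9]+)/$")


def rewrite(path):
    """Return the SRU search path for a record URI, or None if the rule
    does not apply to this path."""
    match = FROM_PATTERN.match(path)
    if match is None:
        return None
    # Same target as the <to> element; %22 is a percent-encoded double quote.
    return f"/search?query=local.viafID+exact+%22{match.group(1)}%22"


print(rewrite("/123/"))  # /search?query=local.viafID+exact+%22123%22
print(rewrite("/123"))   # None (no trailing slash, so the rule does not match)
```

Once the query string is decoded, the SRU server sees the CQL query `local.viafID exact "123"`.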
Aside: What to Return?
An SRU query returns a searchRetrieveResponse. A smart client can pick its record out of that response, but that seems wrong.
A bad URI will result in “no records found”, but a 404 (not found) is more appropriate.
• Solution: add a new parameter (service=APP) to signal that this was a request for a single record
• E.g. viaf/123 becomes viaf/search?query=viafID+exact+%22123%22&service=APP
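The intended difference in behavior can be sketched as below. This is an assumption-laden illustration, not the SRW/U server's code: the function name `respond` is hypothetical, the dictionary stands in for a real searchRetrieveResponse, and returning the first record when several match is a choice made for the sketch (the talk does not say what happens in that case).

```python
def respond(records, service=None):
    """Return an (HTTP status, body) pair for a finished search.

    records: list of record bodies found by the search.
    service: value of the service parameter, or None if it was absent.
    """
    if service == "APP":
        # The request was for a single record: no match is a 404.
        if not records:
            return 404, "record not found"
        # Return the bare record rather than a search response envelope.
        return 200, records[0]
    # Ordinary SRU search: always succeeds, even with zero hits.
    # (The dict is a stand-in for a real searchRetrieveResponse.)
    return 200, {"numberOfRecords": len(records), "records": records}
```

With `service=APP`, a bad record URI produces a 404 instead of a successful response that reports zero records.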
Real World Objects for records
urlrewritefilter can generate 303 (See Other) redirects based on URI patterns
• <from>^/([0-9][0-9]+)$</from>
<to type="seeother-redirect">/viaf/$1/</to>
• E.g. viaf/123 redirects to viaf/123/
• The target, viaf/123/, is called the Generic Record
• (note: now we use viaf/123/ as the URI that gets turned into the SRU search)
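The redirect rule can be emulated in a few lines of Python. As before, this is a sketch of the rule's effect rather than of urlrewritefilter itself; the function name `redirect` and the `context` argument are hypothetical, and the path is assumed to be relative to the servlet context.

```python
import re

# Same pattern as the <from> element: two or more digits, no trailing slash.
REDIRECT_PATTERN = re.compile(r"^/([0-9][0-9]+)$")


def redirect(path, context="/viaf"):
    """Return (status, headers) for a Real World Object URI, or None if
    the path is not one."""
    match = REDIRECT_PATTERN.match(path)
    if match is None:
        return None
    # 303 See Other points the client at the Generic Record.
    return 303, {"Location": f"{context}/{match.group(1)}/"}


print(redirect("/123"))   # (303, {'Location': '/viaf/123/'})
print(redirect("/123/"))  # None (already the Generic Record URI)
```

Because the two patterns differ only in the trailing slash, the URI without the slash names the real-world object and the URI with the slash names the record about it.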
Multiple record formats
urlrewritefilter plus the new httpAccept parameter in SRU
• E.g., viaf/123/marc21.xml becomes viaf/search?query=viafID+exact+123&httpAccept=application/marc21+xml
SRU is configured with a list of supported media types and the XSL stylesheets that render them
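A sketch of how a format suffix might be turned into an httpAccept parameter follows. The suffix table and the function name `rewrite_format` are hypothetical; only the two suffixes whose media types appear in this talk are listed. Unlike the example above, the sketch percent-encodes the `+` in the media type so that a standard query-string parser recovers it unchanged; whether the real server requires that is an assumption.

```python
import re
from urllib.parse import quote

# Hypothetical suffix table, limited to media types named in the talk.
SUFFIX_TO_MEDIA_TYPE = {
    "marc21.xml": "application/marc21+xml",
    "rdf.xml": "application/rdf+xml",
}

# Record ID, a slash, then a "name.ext" format suffix.
FORMAT_PATTERN = re.compile(r"^/([0-9][0-9]+)/([A-Za-z0-9]+\.[A-Za-z0-9]+)$")


def rewrite_format(path):
    """Return the SRU search path for a format-specific record URI,
    or None if the path or suffix is not recognised."""
    match = FORMAT_PATTERN.match(path)
    if match is None:
        return None
    media_type = SUFFIX_TO_MEDIA_TYPE.get(match.group(2))
    if media_type is None:
        return None
    # Encode "+" as %2B so it is not decoded as a space.
    encoded = quote(media_type, safe="/")
    return f"/search?query=viafID+exact+{match.group(1)}&httpAccept={encoded}"


print(rewrite_format("/123/marc21.xml"))
# /search?query=viafID+exact+123&httpAccept=application/marc21%2Bxml
```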
MimeType Configuration
XML.mimeTypes=application/sru+xml;q=0.85, application/xml, text/xml
HTML.mimeTypes=text/html;q=0.9, application/xhtml+xml
RSS.mimeTypes=application/rss+xml;q=0.8
RSS.styleSheet=viaf2rss.xsl
M21.mimeTypes=application/marc21+xml;q=0.7
M21.styleSheet=viaf2marc21.xsl
marc21HTML.mimeTypes=application/marc21+html;q=0.7
marc21HTML.styleSheet=viaf2marc21.xsl
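To show how such a configuration could be read, here is a small Python parser for the `NAME.mimeTypes` and `NAME.styleSheet` properties. It is an illustration only, not the SRW/U server's own configuration code; the function name `parse_config` is hypothetical, and treating a media type without a `q` value as having quality 1.0 is an assumption borrowed from HTTP.

```python
CONFIG = """\
XML.mimeTypes=application/sru+xml;q=0.85, application/xml, text/xml
HTML.mimeTypes=text/html;q=0.9, application/xhtml+xml
RSS.mimeTypes=application/rss+xml;q=0.8
RSS.styleSheet=viaf2rss.xsl
M21.mimeTypes=application/marc21+xml;q=0.7
M21.styleSheet=viaf2marc21.xsl
"""


def parse_config(text):
    """Parse NAME.mimeTypes / NAME.styleSheet lines into a dict keyed by
    format name."""
    formats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first "=" only; "q=0.85" contains further "=" signs.
        key, value = line.split("=", 1)
        name, _, prop = key.partition(".")
        entry = formats.setdefault(name, {"mimeTypes": [], "styleSheet": None})
        if prop == "mimeTypes":
            for item in value.split(","):
                media_type, _, param = item.strip().partition(";")
                param = param.strip()
                # Assumption: a missing q value means quality 1.0.
                quality = float(param[2:]) if param.startswith("q=") else 1.0
                entry["mimeTypes"].append((media_type.strip(), quality))
        elif prop == "styleSheet":
            entry["styleSheet"] = value.strip()
    return formats


formats = parse_config(CONFIG)
print(formats["RSS"])
# {'mimeTypes': [('application/rss+xml', 0.8)], 'styleSheet': 'viaf2rss.xsl'}
```

Formats with no styleSheet entry (XML and HTML here) keep `None` in this sketch.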
Aside: 123/marc21.xml NOT 123.m21
The Generic Record being at viaf/123/ seems to imply that it is a collection of records
How do I ask for the HTML version of the MARC21 version of viaf/123 if suffix mangling is all I have?
RDF needs to be returned
viaf/123/rdf.xml
Making good RDF is tricky and beyond the scope of this presentation (but I think we’re getting close to agreement on sensible basics)
Content Negotiation on Generic Record
Pubby has a really nice Content Negotiation module
• It is configured with the list of supported media types with optional quality measures
• It takes an HTTP Accept header and returns the supported media type that matches best
SRU is configured with a list of supported media types and their quality measures and the XSL stylesheets that render them (see previous slide)
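The matching step can be sketched as follows. This is a simplified negotiation routine written for illustration; it is not Pubby's actual algorithm or API. The function names are hypothetical, as is the scoring rule (client quality times server quality, with the most specific matching range supplying the client quality).

```python
def parse_accept(header):
    """Split an Accept header into (media_range, quality) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        ranges.append((parts[0], quality))
    return ranges


def specificity(media_range):
    """Rank */* below type/* below type/subtype."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def best_match(accept_header, supported, default):
    """Pick the supported media type that best fits the Accept header.

    supported: list of (media_type, server_quality) pairs.
    default: returned when the client states no preference.
    Returns None when nothing acceptable is supported.
    """
    if not accept_header or not accept_header.strip():
        return default
    ranges = parse_accept(accept_header)
    best, best_score = None, 0.0
    for media_type, server_q in supported:
        hits = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not hits:
            continue
        client_q = max(hits)[1]  # most specific matching range wins
        score = client_q * server_q
        if score > best_score:
            best, best_score = media_type, score
    return best
```

For example, with supported types `[("application/xml", 0.85), ("text/html", 0.9), ("application/rdf+xml", 0.8)]` (the RDF quality value is made up for the example), a typical browser header such as `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` selects `text/html`, while a client sending `application/rdf+xml` gets RDF.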
Result
viaf/123 redirected to viaf/123/
viaf/123/ turned into viaf/search?query=viafID+exact+123
VIAF record returned by SRW/U server
Content Negotiation causes viaf/123/viaf.html to be returned to googlebot and browsers, viaf/123/rdf.xml to applications that ask for application/rdf+xml, and viaf/123/viaf.xml when no preference is provided
Lucene Database Demonstration
TBD
Questions?