Searching Featured Lists
Jeremy Brown, Linda Chen
Mercer University Libraries
About Mercer University
• Academic University
• 11 Schools
• 10,000+ students
Featured Lists
• User Manual Page #107098
• Supports review files of bib or item records
• Does not support other record types
• Need to keep the list(s) created in Create Lists
2 Steps
• Create a list
• Set up a WWWOption
Step 1 – Create a List
Step 1 - Creating List Search Screen
Step 2 - WWWOption
• User Manual Page #106908
• WWWOPTION: FEATURED_LIST
Example 1
• WWWOPTION: FEATURED_LIST=[list]|[table_header]|[revfile_header]|[#recs_header]
• FEATURED_LIST=01,03,12,17
Example 2
• WWWOPTION: FEATURED_LIST=[list]|[table_header]|[revfile_header]|[#recs_header]
• FEATURED_LIST=01,03,12,17|SELECTED TITLES|TOPIC|# TITLES
Our WWWOption
FEATURED_LIST=[list]|[table_header]|[revfile_header]|[#recs_header]
FEATURED_LIST=03,02,05,06,36|February 2012|Collections|# Titles
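The four pipe-delimited parts of the option can be illustrated with a small parser. This is only a sketch of the value's structure, not how WebPAC itself reads the option; the `FeaturedList` type and `parse_featured_list` function are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class FeaturedList(NamedTuple):
    review_files: list[str]   # review file numbers, in display order
    table_header: str         # heading above the whole table
    revfile_header: str       # column heading for the list names
    recs_header: str          # column heading for the record counts


def parse_featured_list(option: str) -> FeaturedList:
    """Split a FEATURED_LIST line into its four pipe-delimited parts."""
    name, _, value = option.partition("=")
    if name.strip() != "FEATURED_LIST":
        raise ValueError(f"not a FEATURED_LIST option: {option!r}")
    # Headers that are left off (as in Example 1) become empty strings.
    parts = [p.strip() for p in value.split("|")]
    parts += [""] * (4 - len(parts))
    files = [f.strip() for f in parts[0].split(",") if f.strip()]
    return FeaturedList(files, parts[1], parts[2], parts[3])


ours = parse_featured_list(
    "FEATURED_LIST=03,02,05,06,36|February 2012|Collections|# Titles"
)
print(ours.review_files)  # ['03', '02', '05', '06', '36']
print(ours.table_header)  # February 2012
```

The review file numbers are kept as strings so that leading zeros such as `03` survive.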
Feature List
• What do we use it for?
  • Monthly New Book Lists
  • Special Projects
  • Display Different Collections
  • Does anyone have other suggestions?
Feature List
• This is ok, but…
• Something is missing…
Cannot search the list
Searching Feature Lists
What We Wanted
• Menu of Feature Lists to Search
• Keyword Search function
What We Noticed
• Feature List URLs always contain
  • “ftlist”
  • Review file number
• Result Record URLs always contain
  • Review file number
  • Bibliographic record number
Anatomy of a List URL
A Result URL
Anatomy of a Result URL
This Sounded Indexable!
• Each URL contains the list review file
• Each URL contains the bib number
• Bibliographic records have lots of keywords
Indexing WebPAC Feature Lists
Technologies Used
• Nutch: http://nutch.apache.org/
• SOLR: http://lucene.apache.org/solr/
• jQuery/AJAX: http://jquery.com
Flow Chart
Crawl The WebPAC
What Does Nutch Do?
1. Read list of URLs
2. Download one of the URLs
3. Extract all the links on the page, add to list of URLs
4. Extract the full text of the page
5. Repeat until done with URLs
6. Send pages to SOLR for indexing
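The six steps above can be sketched as a toy crawl loop. This is not Nutch code: the `PAGES` dictionary is a hypothetical three-page site held in memory so the sketch runs offline, and the returned dictionary merely stands in for what would be sent to SOLR.

```python
import re
from collections import deque

# Hypothetical in-memory "site"; a real crawler would fetch over HTTP.
PAGES = {
    "http://example.org/list": (
        '<a href="http://example.org/bib1">One</a> '
        '<a href="http://example.org/bib2">Two</a>'
    ),
    "http://example.org/bib1": 'First title <a href="http://example.org/list">back</a>',
    "http://example.org/bib2": "Second title",
}


def crawl(seeds):
    queue = deque(seeds)              # 1. read list of URLs
    seen = set(seeds)
    pages = {}
    while queue:                      # 5. repeat until done with URLs
        url = queue.popleft()
        html = PAGES.get(url, "")     # 2. download one of the URLs
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:      # 3. add newly found links to the list
                seen.add(link)
                queue.append(link)
        text = re.sub(r"<[^>]+>", " ", html)    # 4. extract the full text
        pages[url] = " ".join(text.split())
    return pages                      # 6. these pages would go to SOLR


index = crawl(["http://example.org/list"])
print(index["http://example.org/list"])  # One Two
```

The `seen` set is what keeps the loop from revisiting the list page when a record links back to it.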
Set up WebPAC for Crawling
• We need to let our crawler look at the WebPAC
• Name the crawler in [nutch_home]/conf/nutch-site.xml
• Edit /iiidb/http/robots.txt
• Add:
User-agent: MercuSpider
Disallow:
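An empty `Disallow:` line means nothing is off limits for the named crawler. One way to check this is with Python's standard `urllib.robotparser`. The catch-all `User-agent: *` block below is an assumption added only to show the contrast; the slides do not say what else is in the real robots.txt.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: MercuSpider
Disallow:

User-agent: *
Disallow: /
"""  # the second block is hypothetical, added for contrast

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://library.mercer.edu/search~S1?/ftlist"
print(parser.can_fetch("MercuSpider", url))   # True
print(parser.can_fetch("SomeOtherBot", url))  # False
```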
Goals of Nutch URL Configuration
• Only want bibs from the featured lists
• We do not want to crawl:
  • MARC view
  • Save record screens
  • Request item screens
  • Probably others…
• We do not want to index the whole WebPAC
Set up Nutch for Crawling the WebPAC
• Edit nutch/conf/regex-urlfilter.txt
• Add:
-^http://library.mercer.edu/.*request.*
-^http://library.mercer.edu/.*save.*
-^http://library.mercer.edu/.*marc.*
+^http://library.mercer.edu/search~S1.*ftlist
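In Nutch's regex URL filter the first matching rule decides: `-` excludes the URL, `+` includes it, and a URL that matches no rule is ignored. The sketch below imitates that behavior in Python to try the four rules against sample URLs. The `load_rules` and `accepts` helpers are hypothetical names, and every test URL except the featured list address is invented for illustration.

```python
import re

RULES = """\
-^http://library.mercer.edu/.*request.*
-^http://library.mercer.edu/.*save.*
-^http://library.mercer.edu/.*marc.*
+^http://library.mercer.edu/search~S1.*ftlist
"""


def load_rules(text):
    """Turn '+regex' / '-regex' lines into (include, pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(rules, url):
    """First matching rule wins; no match means the URL is skipped."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


rules = load_rules(RULES)
print(accepts(rules, "http://library.mercer.edu/search~S1?/ftlist"))       # True
print(accepts(rules, "http://library.mercer.edu/search~S1?/ftlist/marc"))  # False
print(accepts(rules, "http://library.mercer.edu/search~S1/X?history"))     # False
```

Because order matters, the three exclusions have to come before the `+` rule, or a MARC view of a featured record would be accepted.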
Indexing with SOLR
What Is SOLR?
• A search engine in a box
• Put documents in
• Creates a full text index
• Answers queries
Set up Solr for Crawling the WebPAC
• SOLR comes ready to run
• It works well with Nutch
• Easy to use as search engine back end
The Single, Solitary SOLR Modification
• Add to solr-example/solr/conf/solrconfig.xml
<requestHandler name="/mercuFtlist" class="solr.StandardRequestHandler">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="echoParams">explicit</str>
    <str name="fl">url,id</str>
  </lst>
  <lst name="appends">
    <str name="fq">(url:ftlist)+(url:\+\+ftlist/\+\+ftlist)</str>
  </lst>
  <lst name="invariants">
    <bool name="facet">false</bool>
  </lst>
</requestHandler>
SOLR Request Handler, Explained
• We only want the URL and ID in our results
<str name="fl">url,id</str>
• We only want to search items that appear in the feature list
<str name="fq">(url:ftlist)+(url:\+\+ftlist/\+\+ftlist)</str>
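Because the handler supplies `fl`, `fq`, and `rows` itself, a client only needs to send the keywords. A minimal sketch of building such a request URL follows. The base address `http://localhost:8983/solr` is an assumption (the Solr example server's default), as is the `ftlist_query_url` helper; only the `/mercuFtlist` handler name comes from the configuration above.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr


def ftlist_query_url(keywords: str, start: int = 0) -> str:
    """Build a keyword query against the /mercuFtlist handler."""
    # fl, fq and rows are left out on purpose: the handler sets them.
    params = {"q": keywords, "start": start, "wt": "json"}
    return f"{SOLR_BASE}/mercuFtlist?{urlencode(params)}"


url = ftlist_query_url("civil war")
print(url)
# http://localhost:8983/solr/mercuFtlist?q=civil+war&start=0&wt=json
```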
Performance Hit (Millennium)
Performance Hit (Web Server)
Searching
Architecture
• Standard web form
• AJAX calls to SOLR for results
• AJAX results enhancement
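The AJAX step receives a Solr JSON response and pulls out the `url` and `id` of each hit. The sketch below shows that extraction in Python rather than jQuery, purely to illustrate the response shape. The sample documents are invented, and `extract_hits` is a hypothetical helper.

```python
import json

# Invented sample in the standard Solr JSON response layout.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "doc-1", "url": "http://library.mercer.edu/hypothetical-record-1"},
      {"id": "doc-2", "url": "http://library.mercer.edu/hypothetical-record-2"}
    ]
  }
}
"""


def extract_hits(raw: str):
    """Return (total matches, [(id, url), ...]) from a Solr JSON reply."""
    body = json.loads(raw)["response"]
    hits = [(doc["id"], doc["url"]) for doc in body["docs"]]
    return body["numFound"], hits


total, hits = extract_hits(SAMPLE_RESPONSE)
print(total)       # 2
print(hits[0][0])  # doc-1
```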
Search Form
Search Form in the WebPAC
Basic Results Display
Enhanced Results
More Information
• Our Featured Lists: http://library.mercer.edu/search~S1?/ftlist
• Visit our Systems page: http://libraries.mercer.edu/university-libraries/systems-department/iug-2012
  or: http://goo.gl/3q3QV
• MAJAX is located here: http://libx.org/majax/
Thank You!