Introduction to Apache Lucene/Solr
CSCI 572: Information Retrieval and Search Engines
Summer 2010
May-20-10 CS572-Summer2010 CAM-2
Outline
• What is Lucene/Solr?
• Where did it come from?
• What are the current versions of Lucene/Solr?
• What can it do?
Apache Lucene
• The brainchild of Doug Cutting
• Free-text indexing library that implements most of the functionality I’ve talked to you about
  – Query models, ranking, indexing
• Core API is implemented in Java
  – C/C++, Ruby, and Python APIs exist as well, but they have small communities or are automatically generated
• Initially hosted on SourceForge; moved to Apache in 2001
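To make "indexing, query models, ranking" concrete, here is a minimal sketch of an inverted index in plain Java. It is not Lucene's API: the class TinyIndex, its methods, and the term-frequency scoring are assumptions chosen for illustration, and Lucene's real scoring is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy inverted index (hypothetical class, not part of Lucene). */
class TinyIndex {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    /** Lowercase and split on non-alphanumerics; a stand-in for a real analyzer. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Indexing: record every term occurrence and return the new document id. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    /** Query and ranking: a document's score is the summed frequency of the query terms it contains. */
    List<Integer> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> hits = postings.getOrDefault(term, Collections.emptyMap());
            for (Map.Entry<Integer, Integer> e : hits.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties go to the lower document id so results are deterministic.
        ranked.sort(Comparator.<Integer>comparingInt(d -> -scores.get(d)).thenComparingInt(d -> d));
        return ranked;
    }

    public static void main(String[] args) {
        TinyIndex idx = new TinyIndex();
        idx.add("Lucene is a search library");
        idx.add("Solr is a search server built on Lucene");
        idx.add("Tomcat is a servlet container");
        System.out.println(idx.search("lucene search"));
    }
}
```

The point of the structure is that a query only touches the postings of its own terms, never the full document text, which is what makes free-text search fast.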
Apache Solr
• Originally developed at CNET
• Web service layer built on top of the Lucene library
• Provides a schema and an understanding of field types, with conversion to and from their internal representation
• Scales to large deployments; runs inside a servlet container such as Tomcat or Jetty
• Programming-language-independent APIs
• Sharding, replication, faceting, highlighting, explain, MoreLikeThis, and other functionality provided easily
How to get started
• Lucene (2.9.2 and 3.0.1 stable)
  – Put your Java hat on
  – Have Eclipse ready or your favorite IDE
  – Download lucene-core-<version>.jar from
    • http://repo1.maven.org/maven2/org/apache/lucene/
  – Download src and build from
    • http://www.apache.org/dyn/closer.cgi/lucene/java/
  – Check out some example Java code that demonstrates indexing and querying from Otis Gospodnetic
    • http://onjava.com/pub/a/onjava/2003/01/15/lucene.html
How to get started
• Solr
  – Grab a release of Solr (1.4.0 stable)
    • http://www.apache.org/dyn/closer.cgi/lucene/solr/
  – Unpack into, e.g., /usr/local/solr
  – Deploy onto Tomcat
    • Install Tomcat into /usr/local/tomcat
    • Create a solr.xml context file and drop it into /usr/local/tomcat/conf/Catalina/localhost/
      – Define the solr/home JNDI entry in it and point it to /usr/local/solr/solr
    • Start Tomcat
  – Head over to $solr/example/exampledocs and post a document file
    • curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml
    • Port 8983 is the default of the Jetty example bundled with Solr; a default Tomcat install listens on 8080
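The curl call above posts an XML file to Solr's update handler. A rough Java equivalent, as a sketch only: the class SolrXmlPost and the field names id and name are assumptions (your schema.xml decides which fields exist), while the <add><doc><field> envelope and the follow-up <commit/> message are Solr's XML update format.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds and posts a Solr XML update message. */
class SolrXmlPost {
    /** Escape the characters that are special inside XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Wrap one document's fields in the add/doc/field envelope. */
    static String addDocXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
              .append(escape(f.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** POST the XML with the same Content-type header as the curl example; returns the HTTP status. */
    static int post(String updateUrl, String xml) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-type", "text/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "artist-1");                // assumed field name
        doc.put("name", "Simon & Garfunkel");     // assumed field name
        String xml = addDocXml(doc);
        System.out.println(xml);
        if (args.length > 0) {
            // Only touch the network when an update URL is passed on the command line.
            System.out.println(post(args[0], xml));
            System.out.println(post(args[0], "<commit/>")); // makes the added document searchable
        }
    }
}
```

Run it with the update URL as the only argument, for example http://localhost:8983/solr/update; without an argument it just prints the XML it would send.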
Modifying your schema.xml
• Field Types
• Analyzers
• Tokenizers
http://wiki.apache.org/solr/SchemaXml
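In Solr these pieces are declared per field type in schema.xml rather than written by hand. Still, the idea of an analyzer as one tokenizer followed by a chain of filters can be sketched in plain Java. Everything below (the class AnalysisChainSketch, the stage names, the stop-word list) is hypothetical and is not Solr or Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical illustration of a tokenizer plus filter chain. */
class AnalysisChainSketch {
    /** Tokenizer stage: split the raw text on whitespace. */
    static List<String> whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Filter stage: lowercase every token. */
    static List<String> lowerCase(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    /** Filter stage: drop tokens that appear in the stop-word set. */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        return tokens.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
    }

    /** Analyzer: the tokenizer followed by the filters, applied in order. */
    static List<String> analyze(String text) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of")); // assumed list
        return removeStopWords(lowerCase(whitespaceTokenize(text)), stopWords);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Brainchild of Doug Cutting"));
    }
}
```

Order matters: the stop-word filter here only works because lowercasing runs first, and the same kind of ordering applies to the filters listed under an analyzer in a field type.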
Solr Faceting
• facet=on&facet.field=&facet.field=…
• http://wiki.apache.org/solr/SimpleFacetParameters
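A facet request is just a normal query URL with facet=on and one facet.field parameter per field. The sketch below assembles such a URL; the class FacetQueryUrl and the field names genre and country are assumptions for illustration, so substitute fields from your own schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles a Solr facet query URL. */
class FacetQueryUrl {
    /** URL-encode one parameter value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    /** Append q, facet=on, and one facet.field parameter per requested field. */
    static String build(String selectUrl, String query, List<String> facetFields) {
        StringBuilder sb = new StringBuilder(selectUrl)
                .append("?q=").append(enc(query))
                .append("&facet=on");
        for (String field : facetFields) {
            sb.append("&facet.field=").append(enc(field));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "genre" and "country" are assumed field names, not taken from the slides.
        System.out.println(build("http://localhost:8983/solr/select", "*:*",
                Arrays.asList("genre", "country")));
    }
}
```

Fetching the resulting URL returns the usual search results plus a count of matching documents for each value of each facet field.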
Advanced Topics
• Standing up cores
• Sharding
• Replication
• Zookeeper and Cloud
Development currently in flux
• Stick with release versions
• Depending on trunk won’t really help
• Lucene and Solr have merged
Wrapup
• Lots more information at
  – http://lucene.apache.org
  – http://lucene.apache.org/solr/
  – http://lucene.apache.org/java/
• Possible projects
  – Geospatial search
    • Improving existing code and contributing back to Apache SIS and to Apache Solr
  – Improving date faceting
  – Rewriting the ResponseWriter framework
Acknowledgements
• Material inspired by discussions and talks on the Apache mailing lists for Solr and Lucene, and by discussions with the rest of the Lucene community