Page 1: Introduction to Apache Lucene/Solr

Introduction to Apache Lucene/Solr

CSCI 572: Information Retrieval and Search Engines

Summer 2010

May-20-10 CS572-Summer2010 CAM-2

Outline

• What is Lucene/Solr?
• Where did it come from?
• What are the current versions of Lucene/Solr?
• What can it do?

Apache Lucene

• The brainchild of Doug Cutting
• Free-text indexing library that implements most of the functionality I’ve talked to you about
  – Query Models, Ranking, Indexing
• Core API is implemented in Java
  – C++/C, Ruby, Python APIs as well, but these have small communities or are automatically generated
• Initially on SourceForge, moved to Apache in 2001
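To make "Indexing" and "Ranking" concrete, here is a minimal sketch of an inverted index with a simple TF-IDF score. It is a toy illustration of the ideas only, not Lucene's implementation or scoring formula; the class `TinyIndex`, the whitespace tokenizer, and the sample documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        """Index one document by recording per-term frequencies."""
        self.doc_count += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Return doc ids ranked by a simple TF-IDF sum over the query terms."""
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rare terms weigh more than common ones (assumed IDF variant).
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc id for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = TinyIndex()
index.add("d1", "lucene is a search library")
index.add("d2", "solr is a search server built on lucene")
index.add("d3", "tomcat is an application server")
ranked = index.search("search server")
```

Here `d2` ranks first because it is the only document that matches both query terms. Lucene adds a great deal on top of this idea (analyzers, on-disk segments, richer scoring).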

Apache Solr

• Originally developed at CNET
• Web service layer built on top of the Lucene library
• Provides a schema and an understanding of field types, with conversion to and from their representation
• Scales to very large deployments; deployed on top of an application server like Tomcat or Jetty
• Programming-language-independent APIs
• Sharding, replication, faceting, highlighting, explain, more-like-this and other functionality provided easily

How to get started

• Lucene (2.9.2 and 3.0.1 stable)
  – Put your Java hat on
  – Have Eclipse or your favorite IDE ready
  – Download lucene-core-<version>.jar from
    • http://repo1.maven.org/maven2/org/apache/lucene/
  – Download src and build from
    • http://www.apache.org/dyn/closer.cgi/lucene/java/
  – Check out some example Java code from Otis Gospodnetic that demonstrates indexing and querying
    • http://onjava.com/pub/a/onjava/2003/01/15/lucene.html

How to get started

• Solr
  – Grab a release of Solr (1.4.0 stable)
    • http://www.apache.org/dyn/closer.cgi/lucene/solr/
  – Unpack into e.g., /usr/local/solr
  – Deploy onto Tomcat
    • Install Tomcat into /usr/local/tomcat
    • Create solr.xml file and drop into /usr/local/tomcat/conf/Catalina/localhost/
      – Create solr.home JNDI property and point to /usr/local/solr/solr
    • Start Tomcat
  – Head over to $solr/example/example-docs
    • curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml
    • Port 8983 is the one used by Solr's bundled Jetty example; Tomcat listens on 8080 by default, so adjust the port to match your container
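The same update can be sent from a program instead of curl. The sketch below only builds the HTTP POST with Python's standard library and does not send it. The helper `build_update_request`, the sample document, and its `id` field are assumptions for illustration; the URL and Content-Type mirror the curl command above.

```python
import urllib.request


def build_update_request(xml_bytes, base_url="http://localhost:8983/solr"):
    """Build (but do not send) an HTTP POST to Solr's XML update handler."""
    return urllib.request.Request(
        url=base_url + "/update",
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical document; the field names must exist in your schema.xml.
payload = b'<add><doc><field name="id">artist-1</field></doc></add>'
request = build_update_request(payload)

# To actually send it against a running Solr instance:
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

Newly added documents normally become searchable only after a commit, for example by posting `<commit/>` to the same URL.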

Modifying your schema.xml

• Field Types
• Analyzers
• Tokenizers

http://wiki.apache.org/solr/SchemaXml
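Conceptually, an analyzer is a tokenizer followed by a chain of token filters, and a field type in schema.xml names which chain to apply. The sketch below mimics that pipeline in plain Python to show the idea. It is not Solr code; the function names and the stop-word list are hypothetical.

```python
import re

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # illustrative list only


def whitespace_tokenizer(text):
    """Tokenizer stage: split raw text into tokens on whitespace."""
    return text.split()


def strip_punctuation_filter(tokens):
    """Filter stage: trim leading/trailing punctuation, drop empty tokens."""
    cleaned = (re.sub(r"^\W+|\W+$", "", t) for t in tokens)
    return [t for t in cleaned if t]


def lowercase_filter(tokens):
    """Filter stage: normalize case so 'Lucene' matches 'lucene'."""
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    """Filter stage: remove very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, tokenizer=whitespace_tokenizer,
            filters=(strip_punctuation_filter, lowercase_filter, stop_filter)):
    """Run the tokenizer, then each filter in order, like an analyzer chain."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze("The Brainchild of Doug Cutting!")
```

With this chain, `terms` is `['brainchild', 'doug', 'cutting']`. The same analysis should generally be applied at index time and at query time so that terms line up.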

Solr Faceting

• facet=on&facet.field=&facet.field=…
• http://wiki.apache.org/solr/SimpleFacetParameters
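As a small sketch of how those parameters fit into a request, the helper below assembles a faceted query URL, repeating `facet.field` once per field. The helper name, the base URL, and the field names `genre` and `country` are assumptions; use fields that exist in your own schema.

```python
from urllib.parse import urlencode


def build_facet_url(query, facet_fields,
                    base_url="http://localhost:8983/solr/select"):
    """Build a Solr select URL with faceting enabled on the given fields."""
    params = [("q", query), ("facet", "on")]
    # facet.field may be repeated, once for each field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical field names; they must be indexed fields in schema.xml.
url = build_facet_url("jazz", ["genre", "country"])
```

The response then carries, alongside the normal results, a count of matching documents per value of each faceted field.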

Advanced Topics

• Standing up cores
• Sharding
• Replication
• ZooKeeper and Cloud
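For the sharding topic, a rough sketch under stated assumptions: the client decides which shard each document is indexed into (here by hashing its id), and at query time passes Solr's `shards` parameter so one node fans the query out to all shards. The shard addresses and helper names are hypothetical, and hashing by id is just one possible routing choice.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical shard addresses, written host:port/path as the shards
# parameter expects (no "http://" prefix).
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]


def shard_for(doc_id, shards=SHARDS):
    """Pick a shard for a document with a stable hash of its id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distributed_query_url(query, shards=SHARDS):
    """Build a query URL that asks the first shard to search all shards."""
    params = {"q": query, "shards": ",".join(shards)}
    return "http://" + shards[0] + "/select?" + urlencode(params)


target = shard_for("artist-1")
url = distributed_query_url("jazz")
```

A stable hash (rather than Python's built-in `hash`) keeps the routing the same across runs, so a document is always re-indexed into the same shard.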

Development currently in flux

• Stick with release versions
• Depending on trunk won’t really help
• Lucene and Solr have merged

Wrapup

• Lots more information at
  – http://lucene.apache.org
  – http://lucene.apache.org/solr/
  – http://lucene.apache.org/java/
• Possible projects
  – Geospatial search
    • Improving existing code and contributing back to Apache SIS and to Apache Solr
  – Improving date faceting
  – Rewriting the ResponseWriter framework

Acknowledgements

• Material inspired by discussions and talks on the Apache mailing lists for Solr and Lucene, and by discussions with the rest of the Lucene community