Make Your Data Searchable With Solr in 25 Minutes
Kai Chan
BruinTech Tech-a-Thon, November 19, 2013
The Goal
[Figure: a large body of data, with one item marked "find this"]
• objectives
o find something in the (text) data
o get the results fast
o get the most relevant results first
o avoid getting the not-so-relevant results first
• (one) solution: Solr
What Solr is
• used by high-profile websites like Twitter
… and interesting projects like NewsScape
• open-source, full-text search platform
• uses Lucene for indexing and searching
• standalone process/program (typically)
• REST-like API over HTTP
• different output formats (XML, JSON, CSV)
How to Talk To Solr
• have the front end (e.g. the browser) make HTTP requests
• language-specific clients
o .Net
o Java
o PHP
o Python
o Ruby
• integration with other applications
o Moodle
o Drupal
o Plone
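Whatever the client, the conversation underneath is plain HTTP. A minimal sketch using only Python's standard library: the base URL http://localhost:8983/solr/collection1 (port 8983 and core collection1 as in a default Solr 4 example setup) and the select handler are assumptions, and the opener parameter is a hypothetical hook added only so the function can be tried without a running server.

```python
import json
import urllib.parse
import urllib.request


def solr_get(base_url, handler, params, opener=urllib.request.urlopen):
    """Send a GET request to a Solr request handler and decode the JSON reply.

    base_url (e.g. "http://localhost:8983/solr/collection1") and handler
    (e.g. "select") are assumptions about your Solr setup. `opener` is a
    hypothetical hook so the function can be exercised without a server.
    """
    # Always ask for JSON (wt=json) so the standard library can decode it.
    query = dict(params, wt="json")
    url = f"{base_url.rstrip('/')}/{handler}?{urllib.parse.urlencode(query)}"
    with opener(url) as response:
        return json.load(response)
```

Against a running instance, `solr_get("http://localhost:8983/solr/collection1", "select", {"q": "text:negotiation"})["response"]["numFound"]` would give the number of matching documents; the dedicated client libraries for the languages above wrap the same kind of request.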
How Solr works
[Figure: a query (i.e. search criteria) goes to Solr, which looks it up in an index built from the data to be searched and returns a result (i.e. things being looked for); additions, updates, and deletions change the index into index', so the same query then returns result']
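The idea behind the index is the inverted index that Lucene maintains: instead of scanning every document for a word, Solr looks the word up and gets the matching documents directly. The toy Python class below (ToyIndex is a hypothetical name) shows only that idea, plus how additions, updates, and deletions change what the same query returns. As in Lucene, an update is handled as a delete followed by an add; real indexes also do text analysis, relevance scoring, and far more efficient storage.

```python
from collections import defaultdict


class ToyIndex:
    """A toy inverted index: maps each word to the ids of documents containing it.

    Illustration only; not how Lucene actually stores its index.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.docs = {}                    # document id -> original text

    def add(self, doc_id, text):
        # Adding under an existing id acts as an update: delete, then add.
        self.delete(doc_id)
        self.docs[doc_id] = text
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def delete(self, doc_id):
        text = self.docs.pop(doc_id, None)
        if text is None:
            return
        for word in text.lower().split():
            self.postings[word].discard(doc_id)

    def search(self, word):
        # A direct lookup by word; no document is scanned.
        return sorted(self.postings.get(word.lower(), set()))
```

After `add("1", "debt ceiling negotiation")` and `add("2", "budget negotiation")`, `search("negotiation")` returns both ids; updating document 2 to "budget talks" makes the same query return only "1", which is the index to index' change in the diagram.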
How Data Are Organized
[Figure: a collection contains documents, and each document contains fields. In one example the documents are e-mail messages with fields such as subject, date, from, reply-to, and text, and not every message has every field; in another, a document may describe a product (title, SKU, price, text description) or a person (first name, last name, phone, address)]
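This nesting maps naturally onto ordinary data structures. As an illustration only, one collection can be pictured in Python as a list of dictionaries, one per document, with one key per field. The id field, all values, and the spelling reply_to are assumptions (Solr field names are usually kept to letters, digits, and underscores). Note that documents in the same collection need not share the same fields.

```python
# A collection holds documents; each document holds named fields.
# All ids, field names, and values here are made-up examples.
collection = [
    {"id": "msg-1", "subject": "Budget talks", "date": "2013-10-01T09:00:00Z",
     "from": "alice@example.com", "text": "Negotiation over the debt ceiling"},
    {"id": "msg-2", "subject": "Re: Budget talks", "date": "2013-10-02T10:30:00Z",
     "from": "bob@example.com", "reply_to": "list@example.com", "text": "Agreed."},
    {"id": "sku-42", "title": "USB cable", "sku": "CBL-042", "price": 4.99,
     "description": "A 1 m USB cable"},
]


def documents_with_field(docs, field):
    """Return the ids of the documents that have a value for `field`."""
    return [doc["id"] for doc in docs if field in doc]
```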
Solr Field Definition
• field
o name
o type
o options
• field type
o text: "string", "text_general"
o numeric: "int", "long", "float", "double"
• options
o indexed: content can be searched
o stored: content can be returned at search-time
o multiValued: the field can hold multiple values in the same document
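To make the three parts concrete, here is a hypothetical Python model of one field definition (FieldDef is an illustrative name). In Solr 4.x, fields are normally declared in schema.xml with the same name, type, indexed, stored, and multiValued attributes; the to_add_field_command method assumes the JSON "add-field" command of the Schema API in later Solr releases that use a managed schema, so check the reference guide for your version before relying on it.

```python
import json
from dataclasses import dataclass


@dataclass
class FieldDef:
    """A Solr field definition: a name, a field type, and options."""
    name: str
    type: str                   # e.g. "string", "text_general", "int", "double"
    indexed: bool = True        # content can be searched
    stored: bool = True         # content can be returned at search time
    multi_valued: bool = False  # several values per field in one document

    def to_add_field_command(self):
        # Assumed shape of the Schema API "add-field" command found in later
        # Solr releases that use a managed schema.
        return {"add-field": {
            "name": self.name,
            "type": self.type,
            "indexed": self.indexed,
            "stored": self.stored,
            "multiValued": self.multi_valued,
        }}

    def to_json(self):
        # JSON body that would be POSTed to the collection's /schema endpoint.
        return json.dumps(self.to_add_field_command())
```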
Solr Dynamic Field
• define fields by naming convention: a field whose name matches a pattern gets that pattern's definition
• "amount_i": int, indexed, stored
• "tag_ss": string, indexed, stored, multiValued

name     type          indexed  stored  multiValued
*_i      int           true     true    false
*_l      long          true     true    false
*_f      float         true     true    false
*_d      double        true     true    false
*_s      string        true     true    false
*_ss     string        true     true    true
*_t      text_general  true     true    false
*_txt    text_general  true     true    true
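The lookup behind a dynamic field can be sketched as follows. The rules are copied from the table above, and the longest-pattern-first order follows the precedence described in Solr's example schema (longer patterns are matched first). This is an illustration, not Solr's code; resolve_dynamic_field is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Rules from the table: pattern -> (type, indexed, stored, multiValued)
DYNAMIC_FIELDS = {
    "*_i": ("int", True, True, False),
    "*_l": ("long", True, True, False),
    "*_f": ("float", True, True, False),
    "*_d": ("double", True, True, False),
    "*_s": ("string", True, True, False),
    "*_ss": ("string", True, True, True),
    "*_t": ("text_general", True, True, False),
    "*_txt": ("text_general", True, True, True),
}


def resolve_dynamic_field(name, rules=DYNAMIC_FIELDS):
    """Return (pattern, definition) for the first matching rule, or None."""
    # Try longer patterns first; sorted() is stable, so equal-length
    # patterns keep their original order.
    for pattern in sorted(rules, key=len, reverse=True):
        if fnmatchcase(name, pattern):
            return pattern, rules[pattern]
    return None
```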
Getting Data into Solr
• submit (post) files to Solr
o XML
o JSON
o CSV
• have Solr pull data from a database or file
o RDBMS
o XML data locally (file) or remotely (HTTP)
o extract data (XPath)
o manipulate data (regex replace, strip HTML tags)
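Posting JSON is one of the options above. A minimal sketch that builds (but does not send) such a request with Python's standard library; the /update path, the commit=true parameter, and the base URL are assumptions based on a default Solr 4 example setup, and build_update_request is a hypothetical helper.

```python
import json
import urllib.request


def build_update_request(base_url, docs, commit=True):
    """Build (but do not send) a POST request that adds documents as JSON.

    base_url, e.g. "http://localhost:8983/solr/collection1", is an assumption.
    """
    url = f"{base_url.rstrip('/')}/update"
    if commit:
        url += "?commit=true"  # make the new documents searchable right away
    body = json.dumps(docs).encode("utf-8")  # a JSON array of documents
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")
```

To actually send it, pass the result to `urllib.request.urlopen(...)`; XML or CSV bodies would go to the same kind of endpoint with a matching Content-Type.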
Searching Data in Solr
• send request to http://host:port/solr/search
• parameters
o q - main query
o fl - fields to return
o sort - sort criteria
o wt - response writer (e.g. xml, json)
o indent - set to true for pretty-printing
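Putting those parameters into a URL is mostly a matter of percent-encoding. A sketch in Python; the host localhost and port 8983 filled into the slide's http://host:port/solr/search are assumptions (a stock Solr 4 install serves searches at /solr/collection1/select), and build_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_search_url(handler_url, q, fl=None, sort=None, wt="json", indent=True):
    """Build a Solr search URL from the parameters listed above.

    handler_url is an assumption, e.g. "http://localhost:8983/solr/search".
    """
    params = {"q": q, "wt": wt}
    if fl:
        params["fl"] = ",".join(fl)   # comma-separated list of fields to return
    if sort:
        params["sort"] = sort         # e.g. "date desc"
    if indent:
        params["indent"] = "true"     # pretty-print the response
    # urlencode percent-escapes characters such as ":" and '"' in the query.
    return f"{handler_url}?{urlencode(params)}"
```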
Query Syntax
• basic format: field name “:” word/phrase
text:negotiation
text:"debt ceiling"
• several clauses: separated by space (by default, matching any one clause is enough)
text:negotiation subject:debt
• make the word/phrase required: “+” prefix
+text:negotiation +subject:debt
• make the word/phrase prohibited: “-” prefix
text:negotiation -subject:debt
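Assembling these clauses in code mostly means joining strings. A hypothetical helper pair, clause and build_query, sketched in Python: it quotes multi-word values as phrases but does not escape Lucene special characters such as : or ( in user input, which real code would need to handle.

```python
def clause(field, value, occur=""):
    """Build one query clause such as +subject:debt or text:"debt ceiling".

    occur is "" (optional), "+" (required), or "-" (prohibited).
    """
    if occur not in ("", "+", "-"):
        raise ValueError(f"unsupported occur prefix: {occur!r}")
    # Quote multi-word values so they are searched as a phrase.
    term = f'"{value}"' if " " in value else value
    return f"{occur}{field}:{term}"


def build_query(*clauses):
    # Clauses are separated by spaces.
    return " ".join(clauses)
```

For example, `build_query(clause("text", "negotiation"), clause("subject", "debt", "-"))` gives `text:negotiation -subject:debt`.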
Additional Things Solr Can Do
• other types of queries
o range
o fuzzy
o wildcard
o regex
o proximity
o spatial
o join
• sorting
• faceted search
• … and more
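A few of these, written out: the query strings below use the standard (Lucene) query syntax, and facet=true plus facet.field are Solr's faceting parameters. The field names, values, and the faceted_search_params helper are hypothetical; join and spatial queries use separate local-parameter syntax and are left out of this sketch.

```python
from urllib.parse import urlencode

# Example query strings in Solr's standard (Lucene) query syntax.
# Field names and values are hypothetical.
EXAMPLE_QUERIES = {
    "range": "price:[10 TO 20]",             # 10 <= price <= 20 (inclusive)
    "fuzzy": "text:negotiation~1",           # terms within 1 edit of "negotiation"
    "wildcard": "subject:negotiat*",         # terms starting with "negotiat"
    "regex": "subject:/de[bp]t/",            # the terms "debt" or "dept"
    "proximity": 'text:"debt ceiling"~3',    # phrase with slop 3
}


def faceted_search_params(q, facet_field, sort=None):
    """Encode search parameters that also ask Solr for facet counts."""
    params = {"q": q, "wt": "json", "facet": "true", "facet.field": facet_field}
    if sort:
        params["sort"] = sort  # e.g. "date desc"
    return urlencode(params)
```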
Conclusion
• more about Solr: http://lucene.apache.org/solr/
• Solr reference guide: http://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
• my e-mail: [email protected]
• questions?