
Make Your Data Searchable With Solr in 25 Minutes

Kai Chan

BruinTech Tech-a-Thon, November 19, 2013

The Goal

[Diagram: a body of data, with one piece of it marked "find this"]

• objectives
  o find something in the (text) data
  o get the results fast
  o get the most relevant results first
  o avoid getting the not-so-relevant results first
• (one) solution: Solr

What Solr Is

• used by high-profile websites like Twitter, and by interesting projects like NewsScape
• open-source, full-text search platform
• uses Lucene for indexing and searching
• typically runs as a standalone process/program
• REST-like API over HTTP
• multiple output formats (XML, JSON, CSV)

How to Talk to Solr

• have the front end/browser make HTTP requests directly
• language-specific clients
  o .NET
  o Java
  o PHP
  o Python
  o Ruby
• integration with other applications
  o Moodle
  o Drupal
  o Plone
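Solr is reached over plain HTTP, so any HTTP library can act as a client. Below is a minimal Python sketch that uses only the standard library. It assumes a Solr server at http://localhost:8983/solr with a core named collection1 (the Solr 4.x example core) and the /select search handler. `solr_url` and `solr_get` are hypothetical helper names and are not part of any Solr client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed server and core; adjust for your own installation.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def solr_url(handler, params):
    """Build the URL for a request to a Solr request handler such as "select"."""
    # doseq=True turns list values into repeated parameters (fq=a&fq=b).
    return f"{SOLR_BASE}/{handler}?{urlencode(params, doseq=True)}"


def solr_get(handler, params):
    """Send a GET request and decode the JSON response (requires a running Solr)."""
    params = dict(params, wt="json")  # ask Solr to answer in JSON
    with urlopen(solr_url(handler, params)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds and prints the URL; call solr_get(...) to actually send it.
    print(solr_url("select", {"q": "*:*", "rows": 5}))
```

A browser front end would issue the same kind of GET request. The language-specific clients listed above wrap this pattern in friendlier APIs.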

How Solr Works

[Diagram, built up over four slides: a query (i.e. search criteria) goes to Solr, which returns a result (i.e. the things being looked for). Solr answers queries from an index, which it builds from the data to be searched. Additions, updates, and deletions change the index (index → index'), so the same query can then return a different result (result → result').]
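The index is what makes queries fast. Lucene, and therefore Solr, keeps an inverted index that maps each term to the documents containing it, so a query looks terms up instead of scanning every document. The toy Python class below illustrates that idea and shows how additions, updates, and deletions change the result of the same query. It is a simplified model, not how Lucene is implemented: it has no text analysis, no scoring, and no on-disk segments. `ToyIndex` is a hypothetical name.

```python
from collections import defaultdict


class ToyIndex:
    """A toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                    # id -> original text

    def add(self, doc_id, text):
        # Adding a document with an existing id replaces it (an "update").
        if doc_id in self.docs:
            self.delete(doc_id)
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        text = self.docs.pop(doc_id, None)
        if text is None:
            return
        for term in text.lower().split():
            self.postings[term].discard(doc_id)

    def search(self, term):
        # Look the term up directly instead of scanning every document.
        return sorted(self.postings.get(term.lower(), set()))


if __name__ == "__main__":
    index = ToyIndex()
    index.add(1, "debt ceiling negotiation")
    index.add(2, "budget negotiation")
    print(index.search("negotiation"))  # result
    index.add(2, "budget vote")         # update
    index.delete(1)                     # deletion
    print(index.search("negotiation"))  # result' for the same query
```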

How Data Are Organized

[Diagram: a collection contains documents, and each document contains fields.]

[Diagram: the same structure with example fields such as subject, date, from, text, and reply-to; not every document has every field.]

[Diagram: documents in the same collection can have different sets of fields, drawn from subject, date, from, title, SKU, price, text description, first name, last name, phone, and address.]
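In code, a collection can be pictured as a list of documents. Each document is a mapping from field names to values, and a list stands in for a multivalued field. The Python sketch below is illustrative only. The field values are made up, and the `id` field reflects the unique-key field that Solr's example schema declares. `reply_to` replaces the slide's "reply-to" because Solr recommends field names made of letters, digits, and underscores.

```python
# Hypothetical documents in one collection; values are made up for illustration.
collection = [
    {"id": "1", "subject": "Budget talks", "date": "2013-11-19T09:00:00Z",
     "from": "alice@example.com", "text": "Notes on the debt ceiling negotiation."},
    {"id": "2", "date": "2013-11-19T10:30:00Z", "from": "bob@example.com",
     "text": "Re: notes",
     "reply_to": ["alice@example.com", "carol@example.com"]},  # multivalued
]


def field_names(docs):
    """Sorted union of the field names used anywhere in the collection."""
    names = set()
    for doc in docs:
        names.update(doc.keys())
    return sorted(names)


def missing_fields(docs):
    """For each document id, the collection's fields that the document lacks."""
    all_names = set(field_names(docs))
    return {doc["id"]: sorted(all_names - set(doc)) for doc in docs}


if __name__ == "__main__":
    print(field_names(collection))
    print(missing_fields(collection))
```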

Solr Field Definition

• field
  o name
  o type
  o options
• field type
  o text: "string" (exact value, not tokenized), "text_general" (tokenized for full-text search)
  o numeric: "int", "long", "float", "double"
• options
  o indexed: content can be searched
  o stored: content can be returned at search time
  o multiValued: a document can have multiple values for the field
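Field definitions live in the schema. Older releases such as Solr 4.x are typically configured by editing schema.xml. In Solr versions that accept JSON Schema API commands (5.x and later), a field with a name, a type, and options can be added by POSTing an "add-field" command to the core's /schema endpoint. The Python sketch below only builds that request. The server URL, the core name collection1, and the example field are assumptions, and `add_field_request` is a hypothetical helper.

```python
import json
from urllib.request import Request

# Assumed server and core; adjust for your installation.
SOLR_CORE = "http://localhost:8983/solr/collection1"


def add_field_request(name, field_type, indexed=True, stored=True, multi_valued=False):
    """Build (but do not send) a Schema API request that adds one field."""
    command = {
        "add-field": {
            "name": name,
            "type": field_type,         # must be a field type the schema defines
            "indexed": indexed,         # content can be searched
            "stored": stored,           # content can be returned at search time
            "multiValued": multi_valued,
        }
    }
    return Request(
        f"{SOLR_CORE}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = add_field_request("subject", "text_general")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it to a running Solr.
```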

Solr Dynamic Field

• define fields by naming convention: a field whose name matches a pattern gets that pattern's type and options
• "amount_i": int, indexed, stored
• "tag_ss": string, indexed, stored, multiValued

name    type          indexed  stored  multiValued
*_i     int           true     true    false
*_l     long          true     true    false
*_f     float         true     true    false
*_d     double        true     true    false
*_s     string        true     true    false
*_ss    string        true     true    true
*_t     text_general  true     true    false
*_txt   text_general  true     true    true
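To make the naming convention concrete, the Python sketch below looks up which dynamic-field rule from the table a field name falls under. It models the matching only; Solr does this itself while indexing. `resolve_dynamic_field` is a hypothetical helper, and its rules are copied from the table rather than read from a real schema.

```python
from fnmatch import fnmatchcase

# Dynamic-field rules copied from the table above:
# pattern -> (type, indexed, stored, multiValued)
DYNAMIC_FIELDS = {
    "*_i": ("int", True, True, False),
    "*_l": ("long", True, True, False),
    "*_f": ("float", True, True, False),
    "*_d": ("double", True, True, False),
    "*_s": ("string", True, True, False),
    "*_ss": ("string", True, True, True),
    "*_t": ("text_general", True, True, False),
    "*_txt": ("text_general", True, True, True),
}


def resolve_dynamic_field(name):
    """Return (pattern, options) for the dynamic field a name falls under, or None."""
    # When several patterns match, Solr prefers the longer one; try longest first.
    for pattern in sorted(DYNAMIC_FIELDS, key=len, reverse=True):
        if fnmatchcase(name, pattern):
            return pattern, DYNAMIC_FIELDS[pattern]
    return None


if __name__ == "__main__":
    for field in ("amount_i", "tag_ss", "notes_txt", "title"):
        print(field, resolve_dynamic_field(field))
```

A name like "title" matches no pattern, so it would need an explicit field definition.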

Getting Data into Solr

• submit (post) files to Solr
  o XML
  o JSON
  o CSV
• have Solr pull data from a database or files (e.g. with the DataImportHandler)
  o RDBMS
  o XML data, locally (file) or remotely (HTTP)
  o extract data (XPath)
  o manipulate data (regex replace, strip HTML tags)
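One way to submit data is to POST a JSON array of documents to the core's /update handler. Adding commit=true makes the changes visible to searches right away. The Python sketch below only builds the request. It assumes a Solr server at localhost:8983 with a core named collection1. The document field names are hypothetical and rely on the dynamic-field suffixes in the table above, and `update_request` is a hypothetical helper.

```python
import json
from urllib.request import Request

# Assumed server and core; adjust for your installation.
SOLR_CORE = "http://localhost:8983/solr/collection1"


def update_request(docs, commit=True):
    """Build (but do not send) a POST that adds or replaces documents via /update."""
    url = f"{SOLR_CORE}/update"
    if commit:
        url += "?commit=true"  # make the changes visible to searches right away
    return Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical documents whose field names rely on dynamic-field suffixes.
docs = [
    {"id": "1", "subject_t": "Debt ceiling", "tag_ss": ["budget", "congress"]},
    {"id": "2", "subject_t": "Negotiation notes", "amount_i": 42},
]

if __name__ == "__main__":
    req = update_request(docs)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running Solr.
```

Sending a document whose id already exists replaces the stored copy. That is how an update works in the simplest case.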

Searching Data in Solr

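• send requests to a search handler, e.g. http://host:port/solr/search (Solr's example configuration uses /select, e.g. http://host:port/solr/collection1/select)
• parameters
  o q - main query
  o fl - fields to return
  o sort - sort criteria
  o wt - response writer (e.g. xml, json)
  o indent - set to true for pretty-printing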
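The parameters above map directly onto a query string. The Python sketch below builds a search URL and reads the hit count and documents out of a decoded JSON response, which Solr returns under a top-level "response" key with "numFound" and "docs". The server, the core collection1, the /select handler, and the field names are assumptions. `search_url` and `read_hits` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Assumed server/core; "select" is the search handler in Solr's example config.
SOLR_CORE = "http://localhost:8983/solr/collection1"


def search_url(q, fl=None, sort=None, wt="json", indent=True, handler="select"):
    """Build a search URL from the q, fl, sort, wt, and indent parameters."""
    params = {"q": q, "wt": wt}
    if fl:
        params["fl"] = ",".join(fl)  # comma-separated list of fields to return
    if sort:
        params["sort"] = sort        # e.g. "date desc"
    if indent:
        params["indent"] = "true"    # pretty-print the response
    return f"{SOLR_CORE}/{handler}?{urlencode(params)}"


def read_hits(response):
    """Return (number of matches, list of documents) from a decoded wt=json response."""
    body = response["response"]
    return body["numFound"], body["docs"]


if __name__ == "__main__":
    print(search_url('text:"debt ceiling"', fl=["id", "subject"], sort="date desc"))
```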

Query Syntax

• basic format: field name, ":", then a word or phrase
  o text:negotiation
  o text:"debt ceiling"

• several clauses: separated by spaces
  o text:negotiation subject:debt
• make a word/phrase required: "+" prefix
  o +text:negotiation +subject:debt
• make a word/phrase prohibited: "-" prefix
  o text:negotiation -subject:debt
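The Python sketch below assembles queries in this syntax. `clause` and `build_query` are hypothetical helpers. They only quote multi-word values as phrases and do not escape Lucene's other special characters (such as : ( ) * ?). Real code that takes user input would need to handle those.

```python
def clause(field, value, occur=""):
    """One clause; occur is "" (optional), "+" (required), or "-" (prohibited)."""
    if occur not in ("", "+", "-"):
        raise ValueError("occur must be '', '+', or '-'")
    # Quote multi-word values so they are searched as a phrase.
    if " " in value:
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{occur}{field}:{value}"


def build_query(*clauses):
    """Join clauses with spaces, as in the examples above."""
    return " ".join(clauses)


if __name__ == "__main__":
    print(build_query(clause("text", "negotiation", "+"), clause("subject", "debt", "-")))
    print(clause("text", "debt ceiling"))
```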

Additional Things Solr Can Do

• other types of queries
  o range
  o fuzzy
  o wildcard
  o regex
  o proximity
  o spatial
  o join
• sorting
• faceted search
• … and more
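The Python sketch below expresses a few of these features as request parameters, using Solr's standard query syntax. The field names (price_f, text, tag_ss) are hypothetical. Some syntax depends on the Solr version: the integer edit-distance form of fuzzy queries and /regex/ queries need Lucene/Solr 4 or later. `faceted_search_params` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Hypothetical field names; each string is a query in Solr's standard syntax.
EXAMPLE_QUERIES = {
    "range": "price_f:[10 TO 100]",
    "fuzzy": "text:negotation~1",          # misspelled on purpose; ~1 allows one edit
    "wildcard": "text:negoti*",
    "regex": "text:/negotiat(e|ion)/",
    "proximity": 'text:"debt ceiling"~3',  # words may be up to 3 positions apart
}


def faceted_search_params(q, facet_field, sort=None):
    """Query-string parameters for a search with faceting on one field."""
    params = [("q", q), ("facet", "true"), ("facet.field", facet_field)]
    if sort:
        params.append(("sort", sort))  # e.g. "price_f asc"
    return urlencode(params)


if __name__ == "__main__":
    print(faceted_search_params(EXAMPLE_QUERIES["range"], "tag_ss", sort="price_f asc"))
```

With facet=true, Solr adds per-value counts for the facet field alongside the normal results.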

Conclusion

• more about Solr: http://lucene.apache.org/solr/
• Solr reference guide: http://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
• my e-mail: [email protected]

• questions?