Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr (Build Your Own Search Engine with Apache Solr)

Creare il proprio motore di ricerca con Apache Solr, by Alfonso Focareta ([email protected], @afocareta), Pro-netics S.p.A., and Angelo Quercioli ([email protected]), Pro-netics S.p.A.

Upload: alfonso-focareta

Post on 11-May-2015


TRANSCRIPT

Page 1: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Creare il proprio motore di ricerca con Apache Solr

Alfonso Focareta ([email protected], @afocareta), Pro-netics S.p.A.

Angelo Quercioli ([email protected]), Pro-netics S.p.A.

Page 2: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr & Lucene

Page 3: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Lucene: features

• High-performance, scalable full-text search library
• 100% pure Java
• Focus: indexing and searching documents (a "document" is just a list of name+value pairs)
• No crawlers or document parsing
• Flexible text analysis (tokenizers + token filters)

Page 4: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: features

• A full-text search server based on Lucene
• XML/HTTP and JSON interfaces
• Faceted search (category counting)
• Flexible data schema to define types and fields
• Hit highlighting
• Configurable advanced caching
• Index replication
• Extensible open architecture, plugins
• Web administration interface
• Written in Java 5, deployable as a WAR

Page 5: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: license

OPEN SOURCE!! Apache License

Page 6: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Architecture

Page 7: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Installing and Starting

• JDK 5 or above installed
• Open http://localhost:8983/solr/admin/ in your web browser to administer Solr

Page 8: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Define a schema.xml

Define a Schema (schema.xml)

The file schema.xml describes the structure of the indexed data.

• Type definitions
• Field definitions
• CopyField section
• Additional definitions

Page 9: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Define a schema.xml (type definition)

Type Definition

List of types and components (simple and complex):
• Primitive types
• WhitespaceTokenizerFactory
• StopFilterFactory
• WordDelimiterFilterFactory
• LowerCaseFilterFactory
• SnowballPorterFilterFactory (stemming)
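To make the idea of an analysis chain (a tokenizer followed by token filters) concrete, here is a minimal Python sketch. It is a toy model, not Solr's implementation: the function names and the stop-word list are assumptions made for illustration, and stemming and word-delimiter handling are left out.

```python
# Toy analysis chain: whitespace tokenizer -> lowercase filter -> stop filter.
# Illustrative only; the names and the stop-word list are hypothetical.

STOP_WORDS = {"a", "an", "and", "the", "of", "with"}


def whitespace_tokenize(text):
    """Split the text on runs of whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Lowercase every token so matching is case-insensitive."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Drop very common words that carry little meaning for search."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text):
    """Run the whole chain and return the terms that would be indexed."""
    tokens = whitespace_tokenize(text)
    tokens = lowercase_filter(tokens)
    return stop_filter(tokens)


print(analyze("The Search Engine of Apache Solr"))
```

The same chain would normally be applied both at index time and at query time, so that a query term and an indexed term end up in the same form.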

Page 10: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Define a schema.xml (type definition - example)

Type Definition - Example

Page 11: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Define a schema.xml (field definitions)

Field Definitions

• Field attributes: name, type, indexed, stored, multiValued, omitNorms, termVectors

<field name="id" type="string" indexed="true" stored="true"/>
<field name="sku" type="textTight" indexed="true" stored="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="false"/>
<field name="price" type="sfloat" indexed="true" stored="false"/>
<field name="category" type="text_ws" indexed="true" stored="true" multiValued="true"/>

• Dynamic fields

<dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
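The way a field name is resolved to a type, first against explicitly declared fields and then against dynamic-field patterns such as `*_i`, can be sketched in Python as follows. This is a simplified model under stated assumptions: `resolve_type` is a hypothetical helper, only a few of the fields above are included, and the first matching pattern wins, which may differ from Solr's actual precedence rules.

```python
from fnmatch import fnmatchcase

# A subset of the explicit fields and the dynamic-field patterns from the slide.
EXPLICIT_FIELDS = {"id": "string", "name": "text", "price": "sfloat"}
DYNAMIC_FIELDS = [("*_i", "sint"), ("*_s", "string"), ("*_t", "text")]


def resolve_type(field_name):
    """Return the type name for a field, or None if nothing matches."""
    # Explicitly declared fields take priority over patterns.
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    # Otherwise fall back to the first dynamic pattern that matches.
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


print(resolve_type("id"), resolve_type("views_i"), resolve_type("nickname_s"))
```

Dynamic fields let a document carry fields such as `views_i` without each one being declared in schema.xml beforehand.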

Page 12: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Define a schema.xml (Copy Field - example)

Copy Field

Copies one field to another at index time.

Case #1: Analyze the same field in different ways
– copy into a field with a different analyzer
– boost exact-case, exact-punctuation matches
– language translations, thesaurus, soundex

<field name="title" type="text"/>
<field name="title_exact" type="text_exact" stored="false"/>
<copyField source="title" dest="title_exact"/>

Case #2: Index multiple fields into a single searchable field
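Both copy-field cases can be modelled with a short Python sketch that applies (source, dest) rules to a document before it is indexed. This illustrates the concept only and is not Solr code: `apply_copy_fields` is a hypothetical helper, and the `description` and catch-all `text` fields are assumed names that do not appear on the slide.

```python
# Each rule copies the raw values of `source` into `dest` at index time.
COPY_RULES = [
    ("title", "title_exact"),   # case 1: same content, different analyzer
    ("title", "text"),          # case 2: several fields feed one search field
    ("description", "text"),
]


def apply_copy_fields(doc, rules=COPY_RULES):
    """Return a new document with copy-field destinations filled in."""
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in rules:
        if source in doc:
            # The destination accumulates values, so it acts as multi-valued.
            result.setdefault(dest, []).extend(doc[source])
    return result


doc = {"title": ["Apache Solr"], "description": ["A search server"]}
indexed = apply_copy_fields(doc)
print(indexed["title_exact"])
print(indexed["text"])
```

Each destination field is then analyzed according to its own type, which is what makes the "exact" variant in case 1 behave differently from the original field.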

Page 13: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Indexing Method

Indexing Method

Documents are added to Solr (this is called "indexing") via:

• XML
• JSON
• CSV
• Binary over HTTP (multipart request)

Page 14: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Indexing (Java API)

Indexing by Solrj

Send an XML message like this:

<add>
  <doc>
    <field name="id">043564</field>
    <field name="name">Alfonso</field>
    <field name="surname">Focareta</field>
    <field name="category">developer</field>
    <field name="language">Italian</field>
    <field name="language">English</field>
  </doc>
</add>
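As a rough illustration, the add message can be generated from a plain dictionary with Python's standard library. `build_add_message` is a hypothetical helper written for this sketch; it only builds the XML string and does not send it to Solr. A list value produces one `<field>` element per entry, which is how the multi-valued `language` field is expressed.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc):
    """Build an <add><doc>...</doc></add> message from a dict of fields."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # Normalize single values to a list so both cases share one loop.
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


message = build_add_message({
    "id": "043564",
    "name": "Alfonso",
    "surname": "Focareta",
    "category": "developer",
    "language": ["Italian", "English"],
})
print(message)
```

Building the message with an XML library rather than string concatenation ensures that special characters in field values are escaped correctly.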

Page 15: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Indexing (Solrj)

Solrj

Solrj is a Java client for Solr. It offers a Java interface to add, update, and query the Solr index.

Example ->

Page 16: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Indexing (Solrj) Example

Page 17: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Delete Document

Delete document(s)

• Delete by ID (most efficient)

<delete>
  <id>05591</id>
  <id>32552</id>
</delete>

• Delete by query

<delete>
  <query>language:english</query>
</delete>
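The two delete forms can be produced programmatically in the same spirit. The sketch below uses a hypothetical `build_delete_message` helper that only constructs the XML text; posting it to Solr's update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_delete_message(ids=(), queries=()):
    """Build a <delete> message from document IDs and/or queries."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = str(doc_id)
    for query in queries:
        ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


by_id = build_delete_message(ids=["05591", "32552"])
by_query = build_delete_message(queries=["language:english"])
print(by_id)
print(by_query)
```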

Page 18: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Commit and Optimize

Commit and Optimize

Commit: when you index documents into Solr, none of your changes become visible to searches until you run the commit command!

Optimize: this command merges the index into fewer segments (increasing search speed) and removes any deleted (replaced) documents.

Page 19: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Searching

Searching

You can search documents in Solr over HTTP or with the Solrj library.

http://localhost:8983/solr/select?q=language:italian&start=0&rows=2&fl=name,surname

<response>
  <result numFound="15" start="0">
    <doc>
      <str name="name">Angelo</str>
      <str name="surname">Quercioli</str>
    </doc>
    <doc>
      <str name="name">Alfonso</str>
      <str name="surname">Focareta</str>
    </doc>
  </result>
</response>
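A search URL of this shape can be assembled from its parts, which makes the role of each parameter explicit: `q` is the query, `start` and `rows` control paging, and `fl` lists the fields to return. The following Python sketch uses a hypothetical `build_select_url` helper and assumes the local Solr address from the slide; it builds the URL without performing the request.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/select"


def build_select_url(query, start=0, rows=10, fields=None):
    """Assemble a select URL from the query, paging, and field list."""
    params = {"q": query, "start": start, "rows": rows}
    if fields:
        params["fl"] = ",".join(fields)
    # Keep ':' and ',' readable; everything else is percent-encoded.
    return BASE_URL + "?" + urlencode(params, safe=":,")


url = build_select_url("language:italian", rows=2, fields=["name", "surname"])
print(url)
```

To fetch the next page, only `start` changes, for example `start=2` with the same `rows=2`.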

Page 20: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Searching (Response Format)

Response Format

You can add &wt=json for a JSON-formatted response:

{"response": {"numFound": 15, "start": 0, "docs": [
  {"name": "Angelo", "surname": "Quercioli"},
  {"name": "Alfonso", "surname": "Focareta"}
]}}
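A JSON response is straightforward to consume from client code. The sketch below parses a hard-coded sample body rather than calling a server, and assumes the documents are nested under a top-level `response` key, which is where Solr's JSON writer normally places them. `extract_docs` is a hypothetical helper.

```python
import json

# Sample body mirroring the slide; a real response also carries a header.
SAMPLE_BODY = """
{"response": {"numFound": 15, "start": 0, "docs": [
  {"name": "Angelo", "surname": "Quercioli"},
  {"name": "Alfonso", "surname": "Focareta"}
]}}
"""


def extract_docs(body):
    """Return (total hit count, documents on this page) from a JSON body."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


total, docs = extract_docs(SAMPLE_BODY)
print(total, [doc["name"] for doc in docs])
```

Note that `numFound` is the total number of matches, while `docs` holds only the page selected by `start` and `rows`.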

Page 21: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Searching – Query Syntax

Lucene Query Syntax

• italian english
  Equiv: italian OR english (the QueryParser default operator is "OR", so each term is optional)
• Wildcard searches: ang?o, alf*o, rom*
• +italian +english -name:angelo
  Equiv: italian AND english NOT name:angelo
• "justice league" -name:aquaman
• releaseDate:[2012-01-01T00:00:00Z TO 2013-12-31T23:59:59Z]
• description:"legge roma"~100
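Date range clauses are easy to get wrong by hand, because Solr expects UTC timestamps with colons in the time part and a trailing `Z`. A small Python sketch of a formatter follows; `solr_date` and `date_range_query` are hypothetical helper names, and the input datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timezone


def solr_date(moment):
    """Format a timezone-aware datetime as a UTC timestamp ending in Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range_query(field, start, end):
    """Build an inclusive range clause of the form field:[start TO end]."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


clause = date_range_query(
    "releaseDate",
    datetime(2012, 1, 1, tzinfo=timezone.utc),
    datetime(2013, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(clause)
```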

Page 22: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Searching – Query Syntax 2

Lucene Query Syntax 2

• *:*
• (angelo AND "pier francesco") OR (+federico +paolo)

Page 23: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Function Query

Function Query

• Allows adding a function of a field value to the score
  – Boost recently added or popular documents
• Current parser only supports function notation
• Example: log(sum(popularity,1))
• sum, min, max, log, sqrt, currency, ms … etc
• scale(x, target_min, target_max)
  – calculates min & max of x across all docs
• map(x, min, max, target)
  – useful for dealing with defaults
• map(x, min, max, target) and scale(x, target_min, target_max) can be nested like any other function
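The semantics of `map` and `scale` can be clarified with a plain Python model operating on a list of field values. This is a sketch of the behavior as described on the slide, not Solr code: `map_value` and `scale_all` are hypothetical names, and the handling of the case where all values are equal is an assumption made to avoid a division by zero.

```python
def map_value(x, lo, hi, target):
    """Replace x with target when lo <= x <= hi; otherwise keep x."""
    return target if lo <= x <= hi else x


def scale_all(values, target_min, target_max):
    """Linearly rescale values so they span [target_min, target_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption for this sketch: a constant field maps to target_min.
        return [float(target_min) for _ in values]
    span = target_max - target_min
    return [target_min + (value - lo) * span / (hi - lo) for value in values]


popularity = [0, 5, 10]
print([map_value(p, 0, 0, 1) for p in popularity])
print(scale_all(popularity, 1, 2))
```

In this model, `map_value(p, 0, 0, 1)` shows the "defaults" use case: documents with a popularity of 0 are treated as if they had 1, while other values pass through unchanged.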

Page 24: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Boosted Query

Boosted Query

• Score is multiplied instead of added
  – New local params {!...} syntax added

&q={!boost b=sqrt(popularity)}"super man"

• Parameter dereferencing in local params

&q={!boost b=$boost v=$userq}&boost=sqrt(popularity)&userq="super man"
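Parameter dereferencing keeps the user's text out of the local-params string, which is convenient when the request is built by code. The Python sketch below assembles such a query string and URL-encodes it; `build_boost_params` is a hypothetical helper, while the parameter names `boost` and `userq` are the ones used on the slide.

```python
from urllib.parse import parse_qs, urlencode


def build_boost_params(user_query, boost_function):
    """Return request parameters using $-dereferencing in local params."""
    return {
        # The q template stays fixed; values are looked up by name.
        "q": "{!boost b=$boost v=$userq}",
        "boost": boost_function,
        "userq": user_query,
    }


params = build_boost_params('"super man"', "sqrt(popularity)")
query_string = urlencode(params)
print(query_string)
```

Because the user's input travels in its own parameter, quotes or braces inside it cannot break the `{!boost ...}` prefix.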

Page 25: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Facet Query

Facet Query

Faceted search breaks up search results into multiple categories.

http://solr/select?q=foo&wt=json&indent=on&facet=true&facet.field=cat&facet.query=price:[0 TO 100]&facet.query=manu:IBM

{"response": {"numFound": 26, "start": 0, "docs": […]},
 "facet_counts": {
   "facet_queries": {
     "price:[0 TO 100]": 6,
     "manu:IBM": 2},
   "facet_fields": {
     "cat": ["electronics", 14, "memory", 3, "card", 2, "connector", 2]
   }}}
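In the JSON output, each entry under `facet_fields` is a flat list that alternates values and counts. Client code usually wants pairs instead, and a short Python sketch of that conversion looks like this (`facet_pairs` is a hypothetical helper name):

```python
def facet_pairs(flat):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


cat_facets = ["electronics", 14, "memory", 3, "card", 2, "connector", 2]
for value, count in facet_pairs(cat_facets):
    print(f"{value}: {count}")
```

The pairs keep the order returned by the server, so they can be rendered directly as a list of category links with counts.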

Page 26: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Solr: Filter Query

Filter Query

• Filters are restrictions in addition to the query
• Use in faceting to narrow the results
• Filters are cached separately for speed

1. User queries for memory; the query sent to Solr is &q=memory&fq=inStock:true&facet=true&…
2. User selects 1GB memory size: &q=memory&fq=inStock:true&fq=size:1GB&…
3. User selects DDR2 memory type: &q=memory&fq=inStock:true&fq=size:1GB&fq=type:DDR2&…
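The drill-down sequence above amounts to keeping `q` fixed and appending one `fq` parameter per selected facet. A minimal Python sketch, using a hypothetical `build_query_string` helper, could look like this:

```python
from urllib.parse import urlencode


def build_query_string(query, filters):
    """Build q plus one fq parameter per active filter."""
    params = [("q", query)] + [("fq", clause) for clause in filters]
    # Keep ':' readable so the field:value clauses stay legible.
    return urlencode(params, safe=":")


filters = ["inStock:true"]
print(build_query_string("memory", filters))

filters.append("size:1GB")      # user selects 1GB memory size
filters.append("type:DDR2")     # user selects DDR2 memory type
print(build_query_string("memory", filters))
```

Sending each restriction as a separate `fq` rather than one combined clause matches the slide's point that filters are cached separately, so a filter such as `inStock:true` can be reused across different queries.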

Page 27: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Demo!

Page 28: Codemotion 2013 - Creare il proprio motore di ricerca con Apache Solr

Demo!

Questions?