Discovering ElasticSearch

Upload: ben-corlett

Post on 13-Nov-2014


Category:

Technology



DESCRIPTION

My introduction to ElasticSearch at Laracon EU 2014, where I explain the ins and outs of ElasticSearch. The talk is centred around a single example objective that is modelled both in ElasticSearch and plain SQL. I discuss the advantages of both, as well as integration with Laravel 4.

TRANSCRIPT

Discovering ElasticSearch

A little about me

23 years old

Jess' boyfriend

Zoe's dad

Mountain biker

Scuba diver

Scuba biker?

Martial Arts Instructor

I'm also a developer. I run a development agency in Australia,

Webcomm

Finally, I'm a Laracon addict

who has travelled 132,000 km

a third of the distance to the moon.

Flashcard

• @ben_corlett

• http://github.com/bencorlett

• http://webcomm.com.au

Search is filtering information and determining relevance

Picture this scenario

"I want to find hotels called Renaissance for under €150, within 500m of Bimhuis so I am close to Laracon EU 2014. The hotel needs to have disability access and ideally provide Wifi and have rooms above ground level."

Requirements

1. Name of hotel called Renaissance.

2. Under €150. [1]

3. Within 500m of Bimhuis. [1]

4. Disability access.

[1] Perceived requirements may be flexible.

Wants

1. Wifi. [1]

2. Rooms above ground level.

[1] Let's be honest, this should be a requirement ;)

If search is filtering information and determining relevance

And humans think with expression and emotion, why do your apps operate like robots?

How can we tailor our apps to think like our users?

Use the right toolset; and

Know your data.

Introducing ElasticSearch

ElasticSearch

1. Powerful search and analytics engine. [1]

2. Object-based document store where every field is indexed and searchable.

3. Distributed; ready to scale. [2]

[1] ElasticSearch may be used for much more than just a search engine.

[2] Webscalez FTW (trollolol).

SQL vs. ElasticSearch

SQL and ElasticSearch; a comparison

• SQL is a relational database, ElasticSearch is a search engine.

• Where SQL is great at filtering on a binary level, ElasticSearch thrives on both binary data and full text relevance.

• SQL indexes are always up to date with your primary store, ElasticSearch needs to be synced.

• ElasticSearch is very easy to horizontally scale for performance and redundancy.

SQL and ElasticSearch; a comparison

SQL uses the following structure for its data store:

database > table > row

ElasticSearch has a different, yet comparable structure:

index > type > document

ElasticSearch 101

It's all about documents

1. ElasticSearch is document-oriented.

2. Documents are represented using JSON.

3. Data can be held in nested JSON objects and arrays, and is all searchable.

It's all about documents

{
  "name": "Renaissance Hotel Amsterdam",
  "company": "Marriott",
  "location": [52.3712561, 4.9005577],
  "floor_levels": 6,
  "features": ["disability_access", "wifi", "smoking_allowed", "pool"]
}

ElasticSearch Requires

Java

You have 3 seconds to sulk and complain, then shut up.

Installation is easy

1. Install Java.

2. Download and run ElasticSearch in 3 bash commands. [1]

3. Debian or RPM packages available.

4. Puppet & Chef scripts available.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_installation.html

Scaling isn't scary

1. Near-zero configuration to build a cluster of ElasticSearch instances.

2. Easy to scale horizontally.

3. Each ElasticSearch instance is referred to as a node.

4. Any node is capable of handling any request and delegating load.

In very basic terms, horizontal scaling is adding more servers to build a cluster, or cloud...

...where vertical scaling is throwing more resources at an individual server.

ElasticSearch exposes a

RESTful API

Win.

Communicating with ElasticSearch

1. HTTP verbs - GET, POST, PUT, DELETE, etc...

2. Send & receive JSON payloads.

:VERB /:index/:type/:document

{
  "key": "value",
  "complex": ["foo", "bar"]
}
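The URL pattern above can be sketched in Python with only the standard library. The slides use PHP for client code; Python is used here purely as a neutral illustration. The host (a local node on port 9200) and the helper name `build_request` are assumptions, and the request is only constructed, never sent.

```python
import json
import urllib.request


def build_request(verb, index, doc_type, doc_id=None, body=None,
                  host="http://localhost:9200"):
    """Build (but do not send) a request shaped like :VERB /:index/:type/:document."""
    # Hypothetical helper: the default host assumes a node running locally.
    path = "/".join(part for part in (index, doc_type, doc_id) if part)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(f"{host}/{path}", data=data, method=verb)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


request = build_request("PUT", "myapp", "hotel", "1", {"floor_levels": 10})
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running node.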

Creating a document

POST /myapp/hotel

{
  "name": "Renaissance Hotel Amsterdam",
  "company": "Marriott",
  "location": [52.3712561, 4.9005577],
  "floor_levels": 6,
  "features": ["disability_access", "wifi", "smoking_allowed", "pool"]
}

Updating a document

PUT /myapp/hotel/1 # [1]

{
  "name": "Renaissance Hotel Amsterdam",
  "company": "Marriott",
  "location": [52.3712561, 4.9005577],
  "floor_levels": 10,
  "features": ["disability_access", "wifi", "pool", "restaurant"]
}

[1] Actually an upsert; create or update depending on existence.

Deleting a document

DELETE /myapp/hotel/1

There's lots more to documents

1. Partial document updating.

2. Document versioning.

3. Conflict resolution for distributed documents.

4. Bulk CRUD methods to avoid HTTP bottleneck.

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html
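As a rough sketch of the bulk idea, the snippet below builds a newline-delimited request body in which each document is preceded by an action line, which is the shape the bulk endpoint expects. The helper name `build_bulk_body` and the second hotel are made up for illustration.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Serialise {id: document} pairs into a newline-delimited bulk body."""
    lines = []
    for doc_id, document in documents.items():
        # Action line: what to do with the document and where it belongs.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(document))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


hotels = {
    "1": {"name": "Renaissance Hotel Amsterdam", "floor_levels": 10},
    "2": {"name": "Hypothetical Canal Hotel", "floor_levels": 1},  # made-up row
}
body = build_bulk_body("myapp", "hotel", hotels)
print(body)
```

Sending many documents in one body like this replaces one HTTP round trip per document with a single request.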

ElasticSearch makes searching fun

Searching in ElasticSearch

1. Every single field can be searchable.

2. Perform structured queries or filters, against fields. [1]

3. Perform full text queries to find documents.

4. Queries and filters represented using JSON.

5. Organise results by relevance.

[1] SQL-like approach.

Index

1. Index (noun) - refers to the equivalent of a database in an SQL system.

2. Index (verb) - refers to the process of storing a document in an index.

3. Inverted index - list of all terms inside ElasticSearch and the documents in which they appear.

Analysis

• Character filters simplify data, such as changing:

  • "&" to "and".

  • "é" to "e".

• Data is split into terms through a process called tokenisation.

Analysis

• Token filters tweak and normalise terms, such as:

  • Cast to lowercase.

  • Remove stop-words like "a" or "the".

  • Add synonyms.

Inverted index

1. Analysis process extremely configurable. [1]

2. Multilingual support (33 languages in total), interchangeable per index.

3. Any fields not indexed are not searchable.

4. The same analysis process occurs at search time.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html

Example inverted index

Consider the two following sentences:

"The quick brown fox jumped over the lazy dog"

"Quick brown foxes leap over lazy dogs in summer"

Example inverted index

Term      Doc_1   Doc_2
-------------------------
brown   |   X   |   X
dog     |   X   |   X
fox     |   X   |   X
in      |       |   X
jump    |   X   |   X
lazy    |   X   |   X
over    |   X   |   X
quick   |   X   |   X
summer  |       |   X
the     |   X   |
-------------------------
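To make the analysis and indexing steps concrete, here is a toy sketch in Python. It is not how ElasticSearch is implemented; the `NORMALISE` table is a hypothetical stand-in for real stemming and synonym token filters, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical normalisation table standing in for stemming and synonym filters.
NORMALISE = {"jumped": "jump", "leap": "jump", "foxes": "fox", "dogs": "dog"}


def analyse(text):
    """Lowercase the text, split it into terms, then normalise each term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALISE.get(token, token) for token in tokens]


def build_inverted_index(documents):
    """Map each term to the set of document ids in which it appears."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index


documents = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(documents)
for term in sorted(index):
    print(f"{term:<8}{sorted(index[term])}")
```

Because the same `analyse` step would run on the search text, a search for "Foxes" and a search for "fox" end up looking for the same term.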

Scoring

1. Term frequency - the more often a term appears in a field, the more relevant.

2. Inverse document frequency - the more often a term appears in the inverted index, the less relevant.

3. Field length norm - the longer the field, the less relevant each term in it is.
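The three factors can be sketched numerically. The formulas below follow the classic Lucene TF/IDF similarity as it is commonly documented (square root of the term count, one plus the log of the document ratio, one over the square root of the field length). Combining them as a plain product in `simple_score` is a simplification that leaves out query normalisation, coordination and boosts, and all numbers are made up.

```python
import math


def term_frequency(occurrences):
    """More occurrences of a term in a field means more relevant."""
    return math.sqrt(occurrences)


def inverse_document_frequency(total_docs, docs_with_term):
    """Terms that appear in many documents carry less weight."""
    return 1 + math.log(total_docs / (docs_with_term + 1))


def field_length_norm(terms_in_field):
    """Longer fields dilute the weight of each term in them."""
    return 1 / math.sqrt(terms_in_field)


def simple_score(occurrences, total_docs, docs_with_term, terms_in_field):
    """Simplified product of the three factors for a single term."""
    return (term_frequency(occurrences)
            * inverse_document_frequency(total_docs, docs_with_term)
            * field_length_norm(terms_in_field))


rare = simple_score(occurrences=1, total_docs=1000, docs_with_term=5, terms_in_field=4)
common = simple_score(occurrences=1, total_docs=1000, docs_with_term=900, terms_in_field=4)
print(rare > common)
```

A term found in only a handful of documents scores higher than one found in most of them, all else being equal.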

Scoring

1. Fields are "boostable" to increase relevance.

2. Functions (inbuilt and scripted) can be used to increase/decrease relevance.

3. Altering analysis to fine-tune scoring.

4. Very important to know your data.

Queries and filters

1. Both are modular; think of building blocks.

2. Both can be nested inside one another.

3. Syntax does not change, regardless of position or nesting.

4. Entire JSON object is the ElasticSearch Query DSL.

Querying in ElasticSearch

1. There are 37 queries (as of August 2014). [1]

2. Queries are intelligent; they score all results according to a relevance algorithm.

3. Any nesting passes relevance back to parents.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-intro.html

Filters

1. You will find 27 filters (as of August 2014). [1]

2. Filters are binary; either a field matches or it doesn't.

3. Filters don't affect relevance scoring.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html

Querying in ElasticSearch

GET /myapp/hotel/_search

{
  "query": {
    "match": { "name": "Renaissance" }
  }
}

This is a match query. It is the go-to full text query.

Querying in ElasticSearch

GET /myapp/hotel/_search

{
  "query": {
    "filtered": {
      "filter": {
        "term": { "features": "disability_access" }
      }
    }
  }
}

This is a filtered query, containing a term filter.

Back to our scenario

"I want to find hotels called Renaissance for under €150, within 500m of Bimhuis so I am close to Laracon EU 2014. The hotel needs to have disability access and ideally provide Wifi and have rooms above ground level."

Two approaches

1. Possible with both SQL and ElasticSearch.

2. Much easier with ElasticSearch.

3. ElasticSearch understands the concept of relevance.

4. ElasticSearch can severely outperform SQL.

SQL approach

First, we'll build the obvious...

select *
from `hotels`
where `name` like "%Renaissance%"
and `price` <= 150
and `disability_access` = 1

Performance & relevance

Consider the following:

select *
from `hotels`
where `name` like "%Renaissance%"

1. This query will be slow; the leading wildcard prevents an ordinary index from being used.

2. This query only accounts for names which contain the (correctly) spelt Renaissance.

Full text search

1. Add a full text index. [1]

2. Alter the query, and search across both name and company.

select *
from `hotels`
where match (`name`, `company`) against ("Renaissance")
and `price` <= 150
and `disability_access` = 1

[1] http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

Checklist

1. [x] Name of hotel called Renaissance.

2. [x] Under €150.

3. [ ] Within 500m of Bimhuis.

4. [x] Disability access.

5. [ ] Wifi.

6. [ ] Above ground level.

Adding "wants" in

select *, if (`floor_levels` > 1, 1, 0) as `has_multiple_floor_levels`
from `hotels`
where match (`name`, `company`) against ("Renaissance")
and `price` <= 150
and `disability_access` = 1
order by `wifi` desc, `has_multiple_floor_levels` desc -- [1]

[1] We're prioritising Wifi over multiple floor levels...

Checklist

1. [x] ~Name of hotel called Renaissance.

2. [x] Under €150.

3. [ ] Within 500m of Bimhuis.

4. [x] Disability access.

5. [x] Wifi.

6. [x] Rooms above ground level.

Spatial awareness...

1. Not so easy.

2. PostGIS for PostgreSQL. [1]

3. Possible with MySQL with MyISAM tables only. [2]

4. Very finite; either a match or not a match.

5. Outside the scope of this talk.

[1] http://postgis.net

[2] Possible with other engines in new versions of MySQL

Checklist

1. [x] ~Name of hotel called Renaissance.

2. [x] Under €150.

3. [x] Within 500m of Bimhuis.

4. [x] Disability access.

5. [x] Wifi.

6. [x] Rooms above ground level.

What if...

1. Somebody searched for "Residence Inn" as the hotel name?

2. There was an appropriate hotel for €151?

3. A brilliant candidate could be found 501m away from Bimhuis?

4. Somebody cared more about having rooms above ground level than being provided Wifi?
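The €151 question can be reproduced in a few lines. This sketch uses Python's built-in SQLite as a stand-in for the MySQL of the slides, which is an assumption, and every row apart from the Renaissance Hotel Amsterdam name is made up for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "create table hotels (name text, price integer, disability_access integer)"
)
# Hypothetical rows: the second hotel misses the price limit by a single euro.
connection.executemany(
    "insert into hotels values (?, ?, ?)",
    [
        ("Renaissance Hotel Amsterdam", 149, 1),
        ("Renaissance Canal View", 151, 1),
        ("Residence Inn", 120, 1),
    ],
)

rows = connection.execute(
    "select name from hotels "
    "where name like '%Renaissance%' and price <= 150 and disability_access = 1"
).fetchall()
names = [name for (name,) in rows]
print(names)
```

The hotel at €151 never reaches the result set, however good a match it is otherwise; a filter can only say yes or no.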

ElasticSearch approach

Populating ElasticSearch

POST /myapp/hotel

{
  "name": "Renaissance Hotel Amsterdam",
  "company": "Marriott",
  "location": [52.3712561, 4.9005577],
  "floor_levels": 10,
  "features": ["disability_access", "wifi", "pool", "restaurant"]
}

Rinse and repeat for as many hotels as required

The bool query

{
  "bool": {
    "must": {},
    "must_not": {},
    "should": {}
  }
}

We specify conditions which must and must not match. Terms that should match make a document more relevant.

Prepare a bool query

{
  "bool": {
    "must": [
      { "multi_match": { "query": "Renaissance", "fields": ["name^2", "company"] } },
      { "term": { "features": "disability_access" } }
    ],
    "should": [
      { "term": { "features": "wifi" } },
      { "range": { "floor_levels": { "gt": 1 } } }
    ]
  }
}

A field boost of 2 was applied to name to increase relevance.
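Because the Query DSL is plain JSON, a bool query like the one above can be assembled from ordinary data structures. The following Python sketch shows one way to do that; the helper name `hotel_bool_query` and its parameters are hypothetical, and multiple clauses are passed as lists.

```python
import json


def hotel_bool_query(name, required_features, preferred_features, min_floor_levels=1):
    """Assemble a bool query: hard requirements in must, wants in should."""
    must = [{"multi_match": {"query": name, "fields": ["name^2", "company"]}}]
    must += [{"term": {"features": feature}} for feature in required_features]

    should = [{"term": {"features": feature}} for feature in preferred_features]
    should.append({"range": {"floor_levels": {"gt": min_floor_levels}}})

    return {"bool": {"must": must, "should": should}}


query = hotel_bool_query("Renaissance", ["disability_access"], ["wifi"])
print(json.dumps({"query": query}, indent=2))
```

Moving a feature from the required list to the preferred list turns it from a condition into a relevance hint without changing anything else.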

Checklist

1. [x] ~Name of hotel called Renaissance.

2. [ ] Under €150.

3. [ ] Within 500m of Bimhuis.

4. [x] Disability access.

5. [x] Wifi.

6. [x] Have rooms above ground level.

What if...

1. There was an appropriate hotel for €151?

2. A brilliant candidate could be found 501m away from Bimhuis?

This is all possible with ElasticSearch,

plus it's easy.

Controlling relevance

{
  "gauss": {
    "location": {
      "origin": "52.3712561,4.9005577",
      "offset": "0.5km",
      "decay": 20
    }
  }
}

Set the origin to Bimhuis, allowing locations of hotels within 500m. Outside that, a steep decay of relevance occurs.

Controlling relevance

{
  "gauss": {
    "price": {
      "origin": 0,
      "offset": 100,
      "decay": 20
    }
  }
}

Any price over €100 suffers a similar, severe relevance penalty.
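The shape of a Gaussian decay can be sketched in Python. As far as the ElasticSearch reference describes them, decay functions take an origin, a scale, an optional offset and a decay value between 0 and 1; the slide snippets give no scale, so the scale of 50 and decay of 0.5 used below are assumptions chosen only to make the curve visible.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 within offset of origin, `decay` at offset + scale."""
    if not 0 < decay < 1:
        raise ValueError("decay must be between 0 and 1")
    # Distance beyond the penalty-free zone around the origin.
    distance = max(0.0, abs(value - origin) - offset)
    # Choose the variance so that the score equals `decay` at distance == scale.
    sigma_squared = -(scale ** 2) / (2 * math.log(decay))
    return math.exp(-(distance ** 2) / (2 * sigma_squared))


# Assumed parameters for the price example: no penalty up to 100,
# and half the score another 50 above that.
for price in (80, 100, 150, 200):
    print(price, round(gauss_decay(price, origin=0, scale=50, offset=100), 3))
```

Unlike a hard filter, a slightly too expensive hotel is not discarded; it is only pushed down the result list.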

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            { "multi_match": { "query": "Renaissance", "fields": ["name", "company"] } },
            { "term": { "features": "disability_access" } }
          ],
          "should": [
            { "term": { "features": "wifi" } },
            { "range": { "floor_levels": { "gt": 1 } } }
          ]
        }
      },
      "functions": [
        { "gauss": { "location": { "origin": "52.3712561,4.9005577", "offset": "0.5km", "decay": 20 } } },
        { "gauss": { "price": { "origin": 0, "offset": 100, "decay": 20 } } }
      ]
    }
  }
}

See example in detail over at

http://git.io/V4Hm6w

Integrating with

WordPress

*Joking*

Integrating with

Laravel

Install via Composer

{
  "require": {
    "elasticsearch/elasticsearch": "1.1.*"
  }
}

Creating / updating documents

$client = new Elasticsearch\Client();

$client->index([
    'index' => 'myapp',
    'type' => 'hotel',
    'id' => '1',
    'body' => [
        'name' => 'Renaissance Hotel Amsterdam',
        'company' => 'Marriott',
        'location' => [52.3712561, 4.9005577],
        'floor_levels' => 10,
        'features' => ['disability_access', 'wifi', 'pool', 'restaurant'],
    ],
]);

You can create or update in the same request.

Partially updating documents

$client = new Elasticsearch\Client();

$client->update([
    'index' => 'myapp',
    'type' => 'hotel',
    'id' => '1',
    'body' => [
        // A partial update wraps the changed fields in a "doc" key.
        'doc' => [
            'floor_levels' => 11,
        ],
    ],
]);

Deleting documents

$client = new Elasticsearch\Client();

$client->delete([
    'index' => 'myapp',
    'type' => 'hotel',
    'id' => '1',
]);

Searching documents

$client = new Elasticsearch\Client();

$client->search([
    'index' => 'myapp',
    'type' => 'hotel',
    'body' => [
        'query' => [
            // A match query names the field it searches.
            'match' => ['name' => 'Renaissance'],
        ],
    ],
]);

Create/update/delete Eloquent documents

Hotel::created(function ($hotel) {
    $client = new Elasticsearch\Client();

    $client->index([
        'index' => 'myapp',
        'type' => 'hotel',
        'id' => $hotel->id,
        'body' => $hotel->toArray(),
    ]);
});

Create/update/delete Eloquent documents

Hotel::updated(function ($hotel) {
    $client = new Elasticsearch\Client();

    $client->index([
        'index' => 'myapp',
        'type' => 'hotel',
        'id' => $hotel->id,
        'body' => $hotel->toArray(),
    ]);
});

Create/update/delete Eloquent documents

Hotel::deleted(function ($hotel) {
    $client = new Elasticsearch\Client();

    $client->delete([
        'index' => 'myapp',
        'type' => 'hotel',
        'id' => $hotel->id,
    ]);
});

Searching - two approaches

1. Acceptable to have a search() method directly on Eloquent to use ElasticSearch.

2. Better approach is to decorate a repository; decouple and remove vendor lock-in.

Eloquent repository

class EloquentHotelRepository implements HotelRepository
{
    public function create($name)
    {
        // Create and save the model
    }

    public function search($name, array $filters)
    {
        // Perform search as best you can without ElasticSearch...
    }

    // Truncated for brevity...
}

Decorating the repository

class ElasticSearchHotelRepository implements HotelRepository
{
    protected $eloquent;

    public function __construct(EloquentHotelRepository $eloquent)
    {
        $this->eloquent = $eloquent;
    }

    public function create($name)
    {
        $this->eloquent->create($name);
    }

    public function search($name, array $filters)
    {
        // Truncated for brevity...
    }
}

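Decorating the repository

class ElasticSearchHotelRepository implements HotelRepository
{
    public function search($name, array $filters)
    {
        // Assumes an Elasticsearch\Client instance is held in $this->client.
        $results = $this->client->search([
            // ...
        ]);

        // Matching documents are returned under hits.hits in the response.
        return array_map(function ($result) {
            $hotel = new Hotel([
                //
            ]);
            $hotel->exists = true;
            return $hotel;
        }, $results['hits']['hits']);
    }
}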
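The decorator idea is independent of PHP, so here is a small language-neutral sketch in Python. Both classes are hypothetical stand-ins: `InMemoryHotelRepository` plays the role of the Eloquent repository and a plain dictionary plays the role of the ElasticSearch index.

```python
class InMemoryHotelRepository:
    """Stand-in for the Eloquent-backed repository (the primary store)."""

    def __init__(self):
        self.hotels = {}

    def create(self, name):
        hotel_id = len(self.hotels) + 1
        self.hotels[hotel_id] = {"id": hotel_id, "name": name}
        return self.hotels[hotel_id]

    def search(self, name):
        # Best effort without a search engine: case-sensitive substring match.
        return [h for h in self.hotels.values() if name in h["name"]]


class SearchIndexHotelRepository:
    """Decorator: delegates writes, keeps a search index in sync, owns search."""

    def __init__(self, inner):
        self.inner = inner
        self.index = {}  # hypothetical stand-in for an ElasticSearch index

    def create(self, name):
        hotel = self.inner.create(name)
        self.index[hotel["id"]] = name.lower().split()
        return hotel

    def search(self, name):
        wanted = name.lower()
        ids = [hotel_id for hotel_id, terms in self.index.items() if wanted in terms]
        return [self.inner.hotels[hotel_id] for hotel_id in ids]


repository = SearchIndexHotelRepository(InMemoryHotelRepository())
repository.create("Renaissance Hotel Amsterdam")
repository.create("Residence Inn")
print(repository.search("renaissance"))
```

Calling code only ever sees `create` and `search`, so the search backend can be swapped or removed without touching it.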

To dig deeper, please visit

http://git.io/CRW6Mg [1]

[1] Repository will be live soon

Why I chose ElasticSearch

1. Built for realtime search applications.

2. Handles concurrent read/write much better than competitors.

3. Download and execute a single binary as a bare minimum.

4. Easy to configure; you don't need to configure anything.

5. JSON over a RESTful API. Need I say more?

Why I chose ElasticSearch over "X"

1. Solr - I dislike how you communicate with it; maybe I'm not enterprise enough for XML. I also don't like its realtime performance. [1]

2. Sphinx - query language was peculiar, SQL-like. Was never built as a realtime search engine.

[1] http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/

Things I haven't told you about

1. Partial matching - matching partial words using ngrams.

2. How easy and fast autocomplete can be.

3. Fuzzy search - misspelt words.

4. Fine-tuning analysis for specific data sets.

5. Analytics - aggregating statistics to produce things like reports or faceted filtering - part of a query.

One more thing...

ElasticSearch is coming soon to

Laravel Homestead

Run vagrant box update to get the awesomeness.

Further learning

1. http://www.elasticsearch.org

2. http://shop.oreilly.com/product/0636920028505.do

3. http://git.io/CRW6Mg

http://joind.in/11691

https://github.com/bencorlett/laracon-eu-2014