Discovering ElasticSearch
DESCRIPTION
My introduction to ElasticSearch at Laracon EU 2014, where I explain the ins and outs of ElasticSearch. The talk is centred around a single example objective that is modelled both in ElasticSearch and plain SQL. I discuss the advantages of both, as well as integration with Laravel 4.
TRANSCRIPT
"I want to find hotels called Renaissance for under €150, within 500m of Bimhuis so I am close to Laracon EU 2014. The hotel needs to have disability access and ideally provide Wifi and have rooms above ground level."
Requirements
1. Name of hotel called Renaissance.
2. Under €150. [1]
3. Within 500m of Bimhuis. [1]
4. Disability access.
[1] Perceived requirements may be flexible.
If search is filtering information and determining relevance, and humans think with expression and emotion, why do your apps operate like robots?
How can we tailor our apps to think like our users?
ElasticSearch
1. Powerful search and analytics engine. [1]
2. Object-based document store where every field is indexed and searchable.
3. Distributed; ready to scale. [2]
[1] ElasticSearch may be used for much more than just a search engine.
[2] Webscalez FTW (trollolol).
SQL and ElasticSearch; a comparison
• SQL is a relational database, ElasticSearch is a search engine.
• Where SQL is great at filtering on a binary level, ElasticSearch thrives on both binary data and full text relevance.
• SQL indexes are always up to date with your primary store, ElasticSearch needs to be synced.
• ElasticSearch is very easy to horizontally scale for performance and redundancy.
SQL and ElasticSearch; a comparison
SQL uses the following structure for its data store:
database > table > row
ElasticSearch has a different, yet comparable structure:
index > type > document
It's all about documents
1. ElasticSearch is document-oriented.
2. Documents are represented using JSON.
3. Data can be in nested JSON objects and arrays, and is all searchable.
It's all about documents
{
  "name": "Renaissance Hotel Amsterdam",
  "company": "Marriott",
  "location": [52.3712561, 4.9005577],
  "floor_levels": 6,
  "features": ["disability_access", "wifi", "smoking_allowed", "pool"]
}
Installation is easy
1. Install Java.
2. Download and run ElasticSearch in 3 bash commands. [1]
3. Debian or RPM packages available.
4. Puppet & Chef scripts available.
[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_installation.html
Scaling isn't scary
1. Near-zero configuration to build a cluster of ElasticSearch instances.
2. Easy to scale horizontally.
3. Each ElasticSearch instance is referred to as a node.
4. Any node is capable of handling any request and delegating load.
Communicating with ElasticSearch
1. HTTP verbs: GET, POST, PUT, DELETE, etc.
2. Send & receive JSON payloads.
:VERB /:index/:type/:document
{ "key": "value", "complex": ["foo", "bar"]}
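As a rough illustration of the :VERB /:index/:type/:document pattern above, this Python sketch builds (but never sends) such a request using only the standard library. The base URL http://localhost:9200 and the helper name build_request are assumptions made for the example, not something shown in the talk.

```python
import json
from urllib.request import Request

# Assumption: a node listening on localhost:9200.
BASE_URL = "http://localhost:9200"


def build_request(verb, index, doc_type, document_id=None, body=None):
    """Build (but do not send) a request shaped like :VERB /:index/:type/:document."""
    # Skip the document segment when no ID is given, e.g. for POST /myapp/hotel.
    path = "/".join(part for part in (index, doc_type, document_id) if part)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(
        f"{BASE_URL}/{path}",
        data=data,
        method=verb,
        headers={"Content-Type": "application/json"},
    )


request = build_request(
    "PUT", "myapp", "hotel", "1",
    {"key": "value", "complex": ["foo", "bar"]},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send it, which requires a running node.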
Creating a document
POST /myapp/hotel
{
  "name": "Renaissance Hotel Amsterdam",
  "company": "Marriott",
  "location": [52.3712561, 4.9005577],
  "floor_levels": 6,
  "features": ["disability_access", "wifi", "smoking_allowed", "pool"]
}
Updating a document
PUT /myapp/hotel/1 # [1]
{
  "name": "Renaissance Hotel Amsterdam",
  "company": "Marriott",
  "location": [52.3712561, 4.9005577],
  "floor_levels": 10,
  "features": ["disability_access", "wifi", "pool", "restaurant"]
}
[1] Actually an upsert; create or update depending on existence.
There's lots more to documents
1. Partial document updating.
2. Document versioning.
3. Conflict resolution for distributed documents.
4. Bulk CRUD methods to avoid the HTTP bottleneck.
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html
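To give a feel for the bulk methods mentioned above, here is a minimal Python sketch that assembles a newline-delimited bulk payload: one action line followed by one source line per document. The helper name build_bulk_body and the sample documents are assumptions for illustration; the payload would be sent to the _bulk endpoint in a single HTTP request.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Return a bulk payload: an action line, then a source line, per document."""
    lines = []
    for doc_id, source in documents.items():
        # Action line: what to do and where to put the document.
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        ))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk payload is newline-delimited and ends with a newline.
    return "\n".join(lines) + "\n"


hotels = {
    "1": {"name": "Renaissance Hotel Amsterdam"},
    "2": {"name": "Residence Inn"},
}
print(build_bulk_body("myapp", "hotel", hotels))
```

Two documents become four lines in one request body, instead of two separate HTTP round trips.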
Searching in ElasticSearch
1. Every single field can be searchable.
2. Perform structured queries or filters against fields. [1]
3. Perform full text queries to find documents.
4. Queries and filters are represented using JSON.
5. Organise results by relevance.
[1] SQL-like approach.
Index
1. Index (noun) - refers to the equivalent of a database in an SQL system.
2. Index (verb) - refers to the process of storing a document in an index.
3. Inverted index - list of all terms inside ElasticSearch and the documents in which they appear.
Analysis
• Character filters simplify data, such as changing:
  • "&" to "and".
  • "é" to "e".
• Data is split into terms through a process called tokenisation.
Analysis
• Token filters tweak and normalise terms, such as:
  • Cast to lowercase.
  • Remove stop-words like "a" or "the".
  • Add synonyms.
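The three analysis stages (character filters, tokenisation, token filters) can be sketched in a few lines of Python. This is a toy pipeline, not ElasticSearch's implementation: the stop-word list is deliberately tiny, the function names are made up for the example, and synonyms are left out.

```python
import re
import unicodedata

# Assumption: a deliberately tiny stop-word list for illustration.
STOP_WORDS = {"a", "the"}


def char_filter(text):
    """Simplify raw text before it is split into terms."""
    text = text.replace("&", " and ")
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenise(text):
    """Split text into terms on word boundaries."""
    return re.findall(r"\w+", text)


def token_filter(tokens):
    """Lowercase every term, then remove stop-words."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOP_WORDS]


def analyse(text):
    return token_filter(tokenise(char_filter(text)))


print(analyse("The Café & a Bar"))  # ['cafe', 'and', 'bar']
```

Running the same analyse function on both documents and search input is what makes "Café" and "cafe" meet on the same term.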
Inverted index
1. The analysis process is extremely configurable. [1]
2. Multilingual support (33 languages in total), interchangeable per index.
3. Any fields not indexed are not searchable.
4. The same analysis process occurs at search time.
[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html
Example inverted index
Consider the two following sentences:
"The quick brown fox jumped over the lazy dog"
"Quick brown foxes leap over lazy dogs in summer"
Example inverted index
Term      Doc_1  Doc_2
-------------------------
brown   |   X   |  X
dog     |   X   |  X
fox     |   X   |  X
in      |       |  X
jump    |   X   |  X
lazy    |   X   |  X
over    |   X   |  X
quick   |   X   |  X
summer  |       |  X
the     |   X   |
-------------------------
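A small Python sketch of how such an index could be built from the two sentences. The NORMALISE table is a hypothetical stand-in for the stemming and synonym steps of analysis (it is what turns "jumped" and "leap" into "jump"); a real analyser would derive these rather than hard-code them.

```python
from collections import defaultdict

# Assumption: a hand-written table standing in for stemming and synonyms.
NORMALISE = {"jumped": "jump", "leap": "jump", "foxes": "fox", "dogs": "dog"}


def analyse(text):
    """Lowercase, split on whitespace, then normalise each term."""
    return [NORMALISE.get(term, term) for term in text.lower().split()]


def build_inverted_index(docs):
    """Map every term to the set of document IDs in which it appears."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index


docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)
for term in sorted(index):
    print(f"{term:<8}{sorted(index[term])}")
```

Looking up a search term is then a single dictionary access, which is why full text lookups do not need to scan every document.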
Scoring
1. Term frequency - the more often a term appears in a field, the more relevant.
2. Inverse document frequency - the more often a term appears in the inverted index, the less relevant.
3. Field length norm - the longer the field, the less relevant each term in it is.
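The three factors can be combined into a simplified score, sketched here in Python in the spirit of Lucene's classic TF/IDF similarity. The exact formulas (square root of the count, 1 + ln(N / (df + 1)), 1 / sqrt(length)) are my understanding of that classic model, and real scoring involves further factors, so treat this as an illustration only. The sample fields are invented.

```python
import math


def term_frequency(term, field_terms):
    """More occurrences in the field means more relevant."""
    return math.sqrt(field_terms.count(term))


def inverse_document_frequency(term, all_fields):
    """Terms that appear in many documents are less relevant."""
    doc_freq = sum(1 for terms in all_fields if term in terms)
    return 1 + math.log(len(all_fields) / (doc_freq + 1))


def field_length_norm(field_terms):
    """Each term in a longer field carries less weight (field must be non-empty)."""
    return 1 / math.sqrt(len(field_terms))


def score(term, field_terms, all_fields):
    return (
        term_frequency(term, field_terms)
        * inverse_document_frequency(term, all_fields)
        * field_length_norm(field_terms)
    )


# Assumption: three already-analysed hotel name fields.
fields = [
    ["renaissance", "hotel", "amsterdam"],
    ["renaissance", "renaissance", "spa"],
    ["grand", "hotel", "amsterdam", "city", "centre"],
]
for terms in fields:
    print(terms, round(score("renaissance", terms, fields), 3))
```

The second field scores highest for "renaissance" because the term appears twice in a field of the same length as the first; the third scores zero because the term is absent.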
Scoring
1. Fields are "boostable" to increase relevance.
2. Functions (inbuilt and scripted) can be used to increase/decrease relevance.
3. Altering analysis to fine-tune scoring.
4. Very important to know your data.
Queries and filters
1. Both are modular; think of building blocks.
2. Both can be nested inside one another.
3. Syntax does not change, regardless of position or nesting.
4. The entire JSON object is the ElasticSearch Query DSL.
Querying in ElasticSearch
1. There are 37 queries (as of August 2014). [1]
2. Queries are intelligent; they score all results according to a relevance algorithm.
3. Any nesting passes relevance back to parents.
[1] http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-intro.html
Filters
1. You will find 27 filters (as of August 2014). [1]
2. Filters are binary; either a field matches or it doesn't.
3. Filters don't affect relevance scoring.
[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
Querying in ElasticSearch
GET /myapp/hotel/_search
{
  "query": {
    "match": { "name": "Renaissance" }
  }
}
This is a match query. It is the go-to full text query.
Querying in ElasticSearch
GET /myapp/hotel/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "features": "disability_access" }
      }
    }
  }
}
This is a filtered query, containing a term filter.
"I want to find hotels called Renaissance for under €150, within 500m of Bimhuis so I am close to Laracon EU 2014. The hotel needs to have disability access and ideally provide Wifi and have rooms above ground level."
Two approaches
1. Possible with both SQL and ElasticSearch.
2. Much easier with ElasticSearch.
3. ElasticSearch understands the concept of relevance.
4. ElasticSearch can severely outperform SQL.
First, we'll build the obvious...
select *
from `hotels`
where `name` like "%Renaissance%"
and `price` <= 150
and `disability_access` = 1
Performance & relevance
Consider the following:
select *
from `hotels`
where `name` like "%Renaissance%"
1. This query will be slow.
2. This query only accounts for names which contain the (correctly) spelt Renaissance.
Full text search
1. Add a full text index. [1]
2. Alter the query, and search across both name and company.
select *
from `hotels`
where match (`name`, `company`) against ("Renaissance")
and `price` <= 150
and `disability_access` = 1
[1] http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Checklist
1. [x] Name of hotel called Renaissance.
2. [x] Under €150.
3. [ ] Within 500m of Bimhuis.
4. [x] Disability access.
5. [ ] Wifi.
6. [ ] Above ground level.
Adding "wants" in
select *, if (`floor_levels` > 1, 1, 0) as `has_multiple_floor_levels`
from `hotels`
where match (`name`, `company`) against ("Renaissance")
and `price` <= 150
and `disability_access` = 1
order by `wifi` desc, `has_multiple_floor_levels` desc -- [1]
[1] We're prioritising Wifi over multiple floor levels...
Checklist
1. [x] ~Name of hotel called Renaissance.
2. [x] Under €150.
3. [ ] Within 500m of Bimhuis.
4. [x] Disability access.
5. [x] Wifi.
6. [x] Rooms above ground level.
Spatial awareness...
1. Not so easy.
2. PostGIS for PostgreSQL. [1]
3. Possible with MySQL with MyISAM tables only. [2]
4. Very finite; either a match or not a match.
5. Outside the scope of this talk.
[1] http://postgis.net
[2] Possible with other engines in new versions of MySQL.
Checklist
1. [x] ~Name of hotel called Renaissance.
2. [x] Under €150.
3. [x] Within 500m of Bimhuis.
4. [x] Disability access.
5. [x] Wifi.
6. [x] Rooms above ground level.
What if...
1. Somebody searched for "Residence Inn" as the hotel name?
2. There was an appropriate hotel for €151?
3. A brilliant candidate could be found 501m away from Bimhuis?
4. Somebody cared more about having rooms above ground level than being provided Wifi?
Populating ElasticSearch
POST /myapp/hotel
{
  "name": "Renaissance Hotel Amsterdam",
  "company": "Marriott",
  "location": [52.3712561, 4.9005577],
  "floor_levels": 10,
  "features": ["disability_access", "wifi", "pool", "restaurant"]
}
Rinse and repeat for as many hotels as required.
The bool query
{
  "bool": {
    "must": {},
    "must_not": {},
    "should": {}
  }
}
We specify conditions which must and must not match. Terms that should match make a document more relevant.
Prepare a bool query
{
  "bool": {
    "must": [
      { "multi_match": { "query": "Renaissance", "fields": ["name^2", "company"] } },
      { "term": { "features": "disability_access" } }
    ],
    "should": [
      { "term": { "features": "wifi" } },
      { "range": { "floor_levels": { "gt": 1 } } }
    ]
  }
}
A field boost of 2 was applied to name to increase relevance.
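To make the must / must_not / should semantics concrete, here is a toy in-memory evaluator in Python. It is a deliberate simplification: it understands only term and range (gt) clauses, and it "scores" a document by counting matching should clauses, whereas ElasticSearch computes a real relevance score. The function names and sample hotels are assumptions for the example.

```python
def matches(clause, doc):
    """Evaluate a tiny subset of clauses: term on a list field, range with gt."""
    kind, spec = next(iter(clause.items()))
    field, condition = next(iter(spec.items()))
    if kind == "term":
        return condition in doc.get(field, [])
    if kind == "range":
        return doc.get(field, 0) > condition["gt"]
    raise ValueError(f"unsupported clause: {kind}")


def bool_score(query, doc):
    """Return None if the document is excluded, else the number of 'should' hits."""
    if not all(matches(clause, doc) for clause in query.get("must", [])):
        return None
    if any(matches(clause, doc) for clause in query.get("must_not", [])):
        return None
    return sum(1 for clause in query.get("should", []) if matches(clause, doc))


query = {
    "must": [{"term": {"features": "disability_access"}}],
    "should": [
        {"term": {"features": "wifi"}},
        {"range": {"floor_levels": {"gt": 1}}},
    ],
}
hotels = {
    "tall_with_wifi": {"features": ["disability_access", "wifi"], "floor_levels": 10},
    "bungalow": {"features": ["disability_access"], "floor_levels": 1},
    "no_access": {"features": ["wifi"], "floor_levels": 6},
}
for name, hotel in hotels.items():
    print(name, bool_score(query, hotel))
```

The bungalow still appears in the results with a lower score, while the hotel without disability access is dropped entirely; that is the difference between a "want" and a "need".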
Checklist
1. [x] ~Name of hotel called Renaissance.
2. [ ] Under €150.
3. [ ] Within 500m of Bimhuis.
4. [x] Disability access.
5. [x] Wifi.
6. [x] Have rooms above ground level.
What if...
1. There was an appropriate hotel for €151?
2. A brilliant candidate could be found 501m away from Bimhuis?
Controlling relevance
{
  "gauss": {
    "location": {
      "origin": "52.3712561,4.9005577",
      "offset": "0.5km",
      "decay": 20
    }
  }
}
Set the origin to Bimhuis, allowing locations of hotels within 500m. Outside that, a steep decay of relevance occurs.
Controlling relevance
{
  "gauss": {
    "price": {
      "origin": 0,
      "offset": 100,
      "decay": 20
    }
  }
}
Any price over €100 suffers a similar, severe relevance penalty.
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            { "multi_match": { "query": "Renaissance", "fields": ["name", "company"] } },
            { "term": { "features": "disability_access" } }
          ],
          "should": [
            { "term": { "features": "wifi" } },
            { "range": { "floor_levels": { "gt": 1 } } }
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": "52.3712561,4.9005577",
              "offset": "0.5km",
              "decay": 20
            }
          }
        },
        {
          "gauss": {
            "price": {
              "origin": 0,
              "offset": 100,
              "decay": 20
            }
          }
        }
      ]
    }
  }
}
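The shape of a gauss decay can be sketched in plain Python. As I understand the documented function, it takes an origin, an offset inside which nothing is penalised, a scale, and a decay between 0 and 1 that is the score reached at a distance of scale beyond the offset. The scale of 25 and decay of 0.2 below are assumed values chosen for illustration; they are not taken from the slides.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Return a multiplier in (0, 1]: 1.0 within the offset, falling off beyond it."""
    if not 0 < decay < 1:
        raise ValueError("decay must be between 0 and 1")
    # Distance from the origin, ignoring anything inside the offset.
    distance = max(0.0, abs(value - origin) - offset)
    # Chosen so that the result equals `decay` when distance == scale.
    sigma_squared = -scale ** 2 / (2 * math.log(decay))
    return math.exp(-distance ** 2 / (2 * sigma_squared))


# Assumption: prices up to 100 are unpenalised; 25 beyond that scores 0.2.
for price in (80, 100, 125, 150, 151):
    multiplier = gauss_decay(price, origin=0, scale=25, offset=100, decay=0.2)
    print(price, round(multiplier, 4))
```

A hotel at €151 is scored only marginally lower than one at €150 rather than being excluded outright, which is the behaviour the hard `price <= 150` filter in SQL cannot express.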
Creating / updating documents
$client = new Elasticsearch\Client();
$client->index([
    'index' => 'myapp',
    'type' => 'hotel',
    'id' => '1',
    'body' => [
        'name' => 'Renaissance Hotel Amsterdam',
        'company' => 'Marriott',
        'location' => [52.3712561, 4.9005577],
        'floor_levels' => 10,
        'features' => ['disability_access', 'wifi', 'pool', 'restaurant'],
    ],
]);
You can create or update in the same request.
Partially updating documents
$client = new Elasticsearch\Client();
$client->update([
    'index' => 'myapp',
    'type' => 'hotel',
    'id' => '1',
    'body' => [
        // The update API expects the changed fields under the "doc" key.
        'doc' => [
            'floor_levels' => 11,
        ],
    ],
]);
Deleting documents
$client = new Elasticsearch\Client();
$client->delete([
    'index' => 'myapp',
    'type' => 'hotel',
    'id' => '1',
]);
Searching documents
$client = new Elasticsearch\Client();
$client->search([
    'index' => 'myapp',
    'type' => 'hotel',
    'body' => [
        'query' => [
            // A match query names the field it searches.
            'match' => ['name' => 'Renaissance'],
        ],
    ],
]);
Create/update/delete eloquent documents
Hotel::created(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->index([
        'index' => 'myapp',
        'type' => 'hotel',
        'id' => $hotel->id,
        'body' => $hotel->toArray(),
    ]);
});
Create/update/delete eloquent documents
Hotel::updated(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->index([
        'index' => 'myapp',
        'type' => 'hotel',
        'id' => $hotel->id,
        'body' => $hotel->toArray(),
    ]);
});
Create/update/delete eloquent documents
Hotel::deleted(function ($hotel) {
    $client = new Elasticsearch\Client();
    $client->delete([
        'index' => 'myapp',
        'type' => 'hotel',
        'id' => $hotel->id,
    ]);
});
Searching - two approaches
1. Acceptable to have a search() method directly on Eloquent to use ElasticSearch.
2. Better approach is to decorate a repository; decouple and remove vendor lock-in.
Eloquent repository
class EloquentHotelRepository implements HotelRepository
{
    public function create($name)
    {
        // Create and save the model
    }

    public function search($name, array $filters)
    {
        // Perform search as best you can without ElasticSearch...
    }

    // Truncated for brevity...
}
Decorating the repository
class ElasticSearchHotelRepository implements HotelRepository
{
    protected $eloquent;

    public function __construct(EloquentHotelRepository $eloquent)
    {
        $this->eloquent = $eloquent;
    }

    public function create($name)
    {
        $this->eloquent->create($name);
    }

    public function search($name, array $filters)
    {
        // Truncated for brevity...
    }
}
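Decorating the repository
class ElasticSearchHotelRepository implements HotelRepository
{
    public function search($name, array $filters)
    {
        // Assumes an Elasticsearch\Client instance is available as $this->client.
        $results = $this->client->search([
            // ...
        ]);

        // The matching documents live under hits.hits in the search response.
        return array_map(function ($result) {
            $hotel = new Hotel([
                //
            ]);
            $hotel->exists = true;
            return $hotel;
        }, $results['hits']['hits']);
    }
}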
Why I chose ElasticSearch
1. Built for real-time search applications.
2. Handles concurrent read/write much better than competitors.
3. Download and execute a single binary as a bare minimum.
4. Easy to configure; you don't need to configure anything.
5. JSON over a RESTful API. Need I say more?
Why I chose ElasticSearch over "X"
1. Solr - I dislike how you communicate with it; maybe I'm not enterprise enough for XML. I also don't like its real-time performance. [1]
2. Sphinx - query language was peculiar, SQL-like. Was never built as a real-time search engine.
[1] http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/
Things I haven't told you about
1. Partial matching - matching partial words using ngrams.
2. How easy and fast autocomplete can be.
3. Fuzzy search - misspelt words.
4. Fine-tuning analysis for specific data sets.
5. Analytics - aggregating statistics to produce things like reports or faceted filtering - part of a query.
Further learning
1. http://www.elasticsearch.org
2. http://shop.oreilly.com/product/0636920028505.do
3. http://git.io/CRW6Mg