99 problems, but the search ain't one

68
99 Problems, But The Search Ain’t One Andrei Zmievski • PHP UK • Feb 25, 2011

Upload: andrei-zmievski

Post on 22-Apr-2015

36.637 views

Category:

Technology


5 download

DESCRIPTION

ElasticSearch is the new kid on the search block. Built on top of Lucene and adhering to the best concepts of so-called NoSQL movement, ElasticSearch is a distributed, highly available, fast RESTful search engine, ready to be plugged into Web applications.

TRANSCRIPT

Page 1: 99 Problems, But The Search Ain't One

99 Problems, ButThe Search Ain’t OneAndrei Zmievski • PHP UK •!Feb 25, 2011

Page 2: 99 Problems, But The Search Ain't One

who am I?

curl http://localhost:9200/speaker/info/andrei

{“name”: “Andrei Zmievski”, “projects”: [“PHP”, “PHP-GTK”, “Smarty”, “Unicode/i18n”], “likes”: [“coding”, “beer”, “brewing”, “photography”], “twitter”: “@a”, “email”: “[email protected]”}

Page 3: 99 Problems, But The Search Ain't One

what is elasticsearch?

a search engine for the NoSQL generation

domain-driven

distributed

RESTful

Hitchhiker’s Guide to the Galaxy (no, really)

Page 4: 99 Problems, But The Search Ain't One

document model

document-oriented

JSON-based

schema-free

Page 5: 99 Problems, But The Search Ain't One

based on Lucene

multi-tenancy

distributed, out of the box

engine

Page 6: 99 Problems, But The Search Ain't One

nomenclature

index

type

document

_id

node

Page 7: 99 Problems, But The Search Ain't One

3 easy steps

Page 8: 99 Problems, But The Search Ain't One

1. index!"#$%&'()*+%,--./00$1!2$,13-/45660!17803.92:9#0;%&<=

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7==-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N=

requ

est

>

%%%%?1:?/-#"9

%%%%?OB7<9P?/?!178?

%%%%?O-I.9?/?3.92:9#?

%%%%?OB<?/?;?

Nresp

onse

Page 9: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%?-11:?%/%TE

%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

%%%%?-1-2$?%/%;E

%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

%%%%%%?OB7<9P?%/%?!178?E

%%%%%%?O-I.9?%/%?3.92:9#?E

%%%%%%?OB<?%/%?5?E

%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N%N%J%N%N

resp

onse

Page 10: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%?-11:?%/%TE

%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

!!!!"#$#%&"!'!()%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

%%%%%%?OB7<9P?%/%?!178?E

%%%%%%?O-I.9?%/%?3.92:9#?E

%%%%%%?OB<?%/%?5?E

%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N%N%J%N%N

resp

onse

total number of hits

Page 11: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%?-11:?%/%TE

%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

%%%%?-1-2$?%/%;E

%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

!!!!!!"*+,-./"!'!"0$,1")%%%%%%?O-I.9?%/%?3.92:9#?E

%%%%%%?OB<?%/%?5?E

%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N%N%J%N%N

resp

onse the index of the doc

Page 12: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%?-11:?%/%TE

%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

%%%%?-1-2$?%/%;E

%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

%%%%%%?OB7<9P?%/%?!178?E

!!!!!!"*#23."!'!"43.%5.6")%%%%%%?OB<?%/%?5?E

%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N%N%J%N%N

resp

onse the type of the doc

Page 13: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%?-11:?%/%TE

%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

%%%%?-1-2$?%/%;E

%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

%%%%%%?OB7<9P?%/%?!178?E

%%%%%%?O-I.9?%/%?3.92:9#?E

!!!!!!"*+-"!'!"7")%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N%N%J%N%N

resp

onse

the id of the doc

Page 14: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%?-11:?%/%TE

%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

%%%%?-1-2$?%/%;E

%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

%%%%%%?OB7<9P?%/%?!178?E

%%%%%%?O-I.9?%/%?3.92:9#?E

!!!!!!"*+-"!'!"7")%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N%N%J%N%N

resp

onse

the id of the doc

Page 15: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%?-11:?%/%TE

%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

%%%%?-1-2$?%/%;E

%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

%%%%%%?OB7<9P?%/%?!178?E

%%%%%%?O-I.9?%/%?3.92:9#?E

!!!!!!"*+-"!'!"7")%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N%N%J%N%N

resp

onse

the hit score

Page 16: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%?-11:?%/%TE

%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

%%%%?-1-2$?%/%;E

%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

%%%%%%?OB7<9P?%/%?!178?E

%%%%%%?O-I.9?%/%?3.92:9#?E

!!!!!!"*+-"!'!"7")%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

8!!!!",%9."'!":,-6.+!;9+.<45+")!!!!"#%&5"'!"==!>6$?&.94)!?@#!#A.!B.%60A!:+,C#!D,.")!!!!"&+5.4"'!E"0$-+,F")!"?..6")!"3A$#$F6%3A2"G)!!!!"#H+##.6"'!"%")!!!!"A.+FA#"'!(IJK%N%J%N%N

resp

onse

the original source

Page 17: 99 Problems, But The Search Ain't One

2. search!"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99#

requ

est

>%"#$$5"!'!L)%%?O3,2#<3?%/%>

%%%%?-1-2$?%/%;E

%%%%?3"!!9338"$?%/%;E

%%%%?82B$9<?%/%6

%%NE

%%?,B-3?%/%>

%%%%?-1-2$?%/%;E

%%%%?@2PO3!1#9?%/%6UV46LM64E

%%%%?,B-3?%/%G%>

%%%%%%?OB7<9P?%/%?!178?E

%%%%%%?O-I.9?%/%?3.92:9#?E

%%%%%%?OB<?%/%?5?E

%%%%%%?O3!1#9?%/%6UV46LM64E

%%%%%%?O31"#!9?%/%

>

%%%%?72@9?/%?A7<#9B%C@B9D3:B?E

%%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E

%%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE

%%%%?-KB--9#?/%?2?E

%%%%?,9BH,-?/%;LM

N%N%J%N%N

resp

onse

the execution time

Page 18: 99 Problems, But The Search Ain't One

3. profit

that’s up to you

Page 19: 99 Problems, But The Search Ain't One

demo

Page 20: 99 Problems, But The Search Ain't One

distributed model

provides:

performance

resiliency (high-availability)

Page 21: 99 Problems, But The Search Ain't One

shards

a portion of the document space

each one is a separate Lucene index

thus, many per-index settings are available

document is sharded by its _id value

but can be assigned (routed) to a shard deterministically

Page 22: 99 Problems, But The Search Ain't One

zero-conf discovery

zen (multicast and unicast)

cloud (EC2 via API)

Page 23: 99 Problems, But The Search Ain't One

auto-routing

master node:

maintains cluster state

reassigns shards if nodes leave/join cluster

any node can serve as the request router

the query is handled via scatter-gather mechanism

Page 24: 99 Problems, But The Search Ain't One

replicas

each shard can have 1 or more replicas

# of replicas can be updated dynamically after index creation

replicas can be used for querying in parallel

Page 25: 99 Problems, But The Search Ain't One

shard allocationnode 1

start with a single node

Page 26: 99 Problems, But The Search Ain't One

shard allocation

PUT /person { “index”: { “number_of_shards”: 2, “number_of_replicas”: 1}}

node 1person1person2

Page 27: 99 Problems, But The Search Ain't One

shard allocationnode 1person1person2

node 2person1person2

start the second node

Page 28: 99 Problems, But The Search Ain't One

shard allocationnode 1 node 2 node 3 node 4person1person2

person1person2

start 2 more nodes

Page 29: 99 Problems, But The Search Ain't One

shard allocationnode 1 node 2 node 3 node 4person1

person2person1

person2

start 2 more nodes

Page 30: 99 Problems, But The Search Ain't One

document shardingnode 1 node 2 node 3 node 4person1

person2person1

person2

PUT /person/info/1{ … }

Page 31: 99 Problems, But The Search Ain't One

document shardingnode 1 node 2 node 3 node 4person1

person2person1

person2

hashed to shard 1PUT /person/info/1{ … }

Page 32: 99 Problems, But The Search Ain't One

document shardingnode 1 node 2 node 3 node 4person1

person2person1

person2

replicated

PUT /person/info/1{ … }

Page 33: 99 Problems, But The Search Ain't One

document shardingnode 1 node 2 node 3 node 4person1

person2person1

person2

PUT /person/info/2{ … }

Page 34: 99 Problems, But The Search Ain't One

document shardingnode 1 node 2 node 3 node 4person1

person2person1

person2

hashed to shard 2

PUT /person/info/2{ … }

Page 35: 99 Problems, But The Search Ain't One

document shardingnode 1 node 2 node 3 node 4person1

person2person1

person2

replicated

PUT /person/info/2{ … }

Page 36: 99 Problems, But The Search Ain't One

scatter-gathernode 1 node 2 node 3 node 4person1

person2person1

person2

GET /person/_search?q=name:thomas

Page 37: 99 Problems, But The Search Ain't One

shard allocationnode 1 node 2 node 3 node 4person1

person2person1

person2

GET /person/_search?q=name:thomas

Page 38: 99 Problems, But The Search Ain't One

shard allocationnode 1 node 2 node 3 node 4person1

person2person1

person2

GET /person/_search?q=name:thomas

Page 39: 99 Problems, But The Search Ain't One

shard allocationnode 1 node 2 node 3 node 4person1

person2person1

person2

GET /person/_search?q=name:thomas

Page 40: 99 Problems, But The Search Ain't One

transactional model

per-document consistency

no need to commit/flush

uses write-behind transaction log

write consistency (W) can be controlled

one, quorum, or all

Page 41: 99 Problems, But The Search Ain't One

(near) real-time search

1 second refresh rate by default

_refresh API also

Page 42: 99 Problems, But The Search Ain't One

index storage

node data considered transient

can be stored in local file system, JVM heap, native OS memory, or FS & memory combination

persistent storage requires a gateway

Page 43: 99 Problems, But The Search Ain't One

gateways

persistent store for cluster state and indices

asynchronous, translog-based write strategy

allows full recovery if a cluster restart is needed

supported gateways:local

shared FS

Hadoop via HDFS

S3

Page 44: 99 Problems, But The Search Ain't One

mapping

describes document structure to the search engine

automatically created with sensible defaults

explicit mapping can be provided (generally, a good idea)

can run into merge conflicts

Page 45: 99 Problems, But The Search Ain't One

mapping

important meta fields:

_source

_all

_boost

Page 46: 99 Problems, But The Search Ain't One

mapping types

simple:

string, integer/long, float/double, boolean, and null)

complex:

array, object

Page 47: 99 Problems, But The Search Ain't One

sample mapping

>?"39#?/%%%%%%?<9#B!:?E

%?-B-$9?/%%%%%?W17X-%(27B!?E

%?-2H3?/%%%%%%G?.#18B$B7H?E%?<9F"HHB7H?E%?.,.?JE

%?.13-W2-9?/%%?56;6&;5&55+;M/;Y/;5?E

%?.#B1#B-I?/%%5Ndocu

men

t

>?.13-?/%>

%%?.#1.9#-B93?%/%>

%%%%?"39#?/%>?-I.9?/%?3-#B7H?E%?B7<9P?/%?71-O272$IZ9<?NE

%%%%?@9332H9?/%>?-I.9?/%?3-#B7H?E%[F113-\/%;UVNE

%%%%?-2H3?/%>?-I.9?/%?3-#B7H?E%?B7!$"<9OB7O2$$?/%?71?NE

%%%%?.13-W2-9?%/%>?-I.9?%/%?<2-9?E%[3-1#9\/%[71\NE

%%%%?.#B1#B-I?%/%>?-I.9?%/%?B7-9H9#?N

NNN

map

ping

Page 48: 99 Problems, But The Search Ain't One

analyzers

break down (tokenize) and normalize fields during indexing and query strings at search time

analyzer = tokenizer + token filters (0 or more)

*-27<2#<%A72$IZ9#%S

%%%*-27<2#<%+1:97BZ9#%]

%%%%%%%*-27<2#<%+1:97%^B$-9#%]

%%%%%%%_1K9#!239%+1:97%^B$-9#%]

%%%%%%%*-1.%+1:97%^B$-9#

Page 49: 99 Problems, But The Search Ain't One

analyzers

analyzers, tokenizers, and filters can be customizedB7<9P/

%%272$I3B3/

%%%%272$IZ9#/

%%%%%%.@&%,F/%%%%%%%%-I.9/%!"3-1@

%%%%%%%%-1:97BZ9#/%3-27<2#<

%%%%%%%%8B$-9#/%G3-27<2#<E%$1K9#!239E%3-1.E

%%%%%%%%%%%%%%%%%23!BB81$<B7HE%.1#-9#*-9@Jelas

ticse

arch

.ym

l

`

?-B-$9?/%>?-I.9?/%?3-#B7H?E%?272$IZ9#?/%?9"$27H?NE

`

map

ping

Page 50: 99 Problems, But The Search Ain't One

API

Page 51: 99 Problems, But The Search Ain't One

API conventions

append ?pretty=true to get readable JSON

boolean values: false/0/off = false, rest is true

JSONP support via callback parameter

Page 52: 99 Problems, But The Search Ain't One

API structure

http://host:port/[index]/[type]/[_action/id]

GET http://es:9200/_status

GET http://es:9200/twitter/_status

POST http://es:9200/twitter/tweet/1

GET http://es:9200/twitter/tweet/1

Page 53: 99 Problems, But The Search Ain't One

API structure

http://host:port/[index]/[type]/[_action/id]

GET http://es:9200/twitter/tweet/_search

GET http://es:9200/twitter/user/_search

GET http://es:9200/twitter/tweet,user/_search

GET http://es:9200/twitter,facebook/_search

GET http://es:9200/_search

Page 54: 99 Problems, But The Search Ain't One

_cluster API structure

GET /_cluster/health

GET /_cluster/health/index1,index2

GET /_cluster/nodes/stats

GET /_cluster/nodes/nodeId1,nodeId2/stats

Page 55: 99 Problems, But The Search Ain't One

API {core}

index

bulk

delete

delete by query

get

count

search

query

from/size paging

sort

highlighting

selective fields

Page 56: 99 Problems, But The Search Ain't One

API {indices}

create

delete

open/close

get/put/delete mapping

refresh

optimize

snapshot

update settings

analyze

status

flush

Page 57: 99 Problems, But The Search Ain't One

API {cluster}

health

state

nodes info

nodes stats

nodes shutdown

Page 58: 99 Problems, But The Search Ain't One

Query DSL

term / terms

range

prefix

bool

fuzzy

wildcard

query_string

default_operator

analyzer

phrase_slop

etc

Page 59: 99 Problems, But The Search Ain't One

filters

share some similar features with queries (term, range, etc)

why use a filter?

Page 60: 99 Problems, But The Search Ain't One

filters

faster than queries

cached (depends on the filter)

the cache is used for different queries against the same filter

no scoring

more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query

Page 61: 99 Problems, But The Search Ain't One

facets

provide aggregated data based on the search request

terms, histogram, date histogram, range, statistical, and more

Page 62: 99 Problems, But The Search Ain't One

geo search

implemented as filters (and a facet)

geo_distance

geo_bounding_box

geo_polygon

Page 63: 99 Problems, But The Search Ain't One

interfaces

REST

including memcached

Java /!Groovy

Language clients (REST/Thrift):

pyes, PHP (standalone and symfony), Ruby, Perl

Flume sink implementation

Page 64: 99 Problems, But The Search Ain't One

elastica

similar to the other PHP ElasticSearch client

API naming is consistent with Zend Framework

can be extended for new filters, facets, etc

still under development

Page 65: 99 Problems, But The Search Ain't One

elastica

$es = new Elastica_Client('vm', 9200);$index = new Elastica_Index($es, 'test');$index->create(array(), true);$type = new Elastica_Type($index, 'person');$doc = new Elastica_Document(1, array('name' => 'Andrei Zmievski', 'email' => '[email protected]', 'username' => 'andrei', 'bills' => array(2, 3, 5)));$type->addDocument($doc);

$qs = new Elastica_Query_QueryString('andrei');$query = new Elastica_Query($qs);$resultSet = $type->search($query);print $resultSet->count();

exam

ple

Page 66: 99 Problems, But The Search Ain't One

data import

ES is not the primary data store (usually)

to import/synchronize data:

write an agent (Gearman, message queues, etc)

use rivers (CouchDB, RabbitMQ, Twitter)

Page 67: 99 Problems, But The Search Ain't One

10 more features

versioning

index aliases

parent/child docs

scripting

dynamic mapping templates

load balancing nodes

plugins

more_like_this

multi_field mapping

percolation