elasticsearch - berlin buzzwords...
TRANSCRIPT
![Page 1: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/1.jpg)
elasticsearchThe Road to a
Distributed, (Near) Real Time, Search Engine
Shay Banon - @kimchy
Tuesday, June 7, 2011
![Page 2: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/2.jpg)
Lucene Basics -Directory
A File System Abstraction
Mainly used to read and write “files”
Used to read and write different index files
Tuesday, June 7, 2011
![Page 3: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/3.jpg)
Lucene Basics - IndexWriter
Used to add documents / delete documents from the index
Changes are stored in memory (possibly flushing to maintain memory limits)
Requires a commit to make changes “persistent”, which is expensive
A single IndexWriter can write to an index, expensive to create (reuse at all cost!)
Tuesday, June 7, 2011
![Page 4: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/4.jpg)
Lucene Basics - Index Segments
An index is composed of internal segments
Each segment is almost a self sufficient index by itself, immutable up to deletes
Commits “officially” adds segments to the index, though internal flushing might create new segments as well
Segments are merged continuously
A lot of caching per segment (terms, field)Tuesday, June 7, 2011
![Page 5: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/5.jpg)
Lucene Basics - (Near) Real Time
IndexReader is the basis for searching
IndexWriter#getReader allows to get a refreshed reader that sees changes done to IW
Requires flushing (but not committing)
Can’t call it on each operation, too expensive
Segment based readers and search
Tuesday, June 7, 2011
![Page 6: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/6.jpg)
Distributed DirectoryImplement a Directory that works on top of a distributed “system”
Store file chunks, read them on demand
Implemented for most (Java) data grids
Compass - GigaSpaces, Coherence, Terracotta
Infinispan
Tuesday, June 7, 2011
![Page 7: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/7.jpg)
Distributed Directory
DIR
Node
IndexWriter
IndexReader
ChunkChunk
NodeChunk
Chunk
NodeChunk
Chunk
Tuesday, June 7, 2011
![Page 8: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/8.jpg)
Distributed Directory“Chatty”- many network roundtrips to fetch data
Big indices still suffer from a non distributed IndexReader
Lucene IndexReader can be quite “heavy”
Single IndexWriter problem, can’t really scale writes
Tuesday, June 7, 2011
![Page 9: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/9.jpg)
Partitioning
Document Partitioning
Each shard has a subset of the documents
A shard is a fully functional “index”
Term Partitioning
Shards has subset of terms for all docs
Tuesday, June 7, 2011
![Page 10: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/10.jpg)
Partitioning - Term Based
pro: K term query -> handled at most by K shards
pro: O(K) disk seeks for K term query
con: high network traffic
data about each matching term needs to be collected in one place
con: harder to have per doc information (facets / sorting / custom scoring)
Tuesday, June 7, 2011
![Page 11: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/11.jpg)
Partitioning - Term Based
Riak Search - Utilizing its distributed key-value storage
Lucandra (abandoned, replaced by Solandra)
Custom IndexReader and IndexWriter to work on top of Cassandra
Very very “chatty” when doing a search
Does not work well with other Lucene constructs, like FieldCache (by doc info)
Tuesday, June 7, 2011
![Page 12: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/12.jpg)
Partitioning - Document Based
pro: each shard can process queries independently
pro: easy to keep additional per-doc information (facets, sorting, custom scoring)
pro: network traffic small
con: query has to be processed by each shard
con: O(K*N) disk seeks for K term on N shard
Tuesday, June 7, 2011
![Page 13: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/13.jpg)
Distributed Lucene Doc Partitioning
Shard Lucene into several instances
Index a document to one Lucene shard
Distribute search across Lucene shards
Lucene Lucene Lucene
Search Index
Tuesday, June 7, 2011
![Page 14: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/14.jpg)
Distributed Lucene Replication
Replicated Lucene Shards
High Availability
Scale search by searching replicas
Lucene Lucene
Tuesday, June 7, 2011
![Page 15: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/15.jpg)
Pull ReplicationMaster - Slave configuration
Slave pulls index files from the master (delta, only new segments)
LuceneSegmentSegment
SegmentLucene
SegmentSegment
Segment
Tuesday, June 7, 2011
![Page 16: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/16.jpg)
Pull Replication - Downsides
Requires a “commit”on master to make changes available for replication to slave
Redundant data transfer as segments are merged (especially for stored fields)
Friction between commit (heavy) and replication, slaves can get “way” behind master (big new segments), looses HA
Does not work for real time search, slaves are “too” behind
Tuesday, June 7, 2011
![Page 17: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/17.jpg)
Push Replication“Master/Primary” push to all the replicas
Indexing is done on all replicas
Lucene Lucene
Client
Doc
Doc
Tuesday, June 7, 2011
![Page 18: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/18.jpg)
Push Replication - Downsides
Indexing the document on all nodes
(though less data transfer over the wire)
Delicate control over concurrent indexing operations
Usually solved using versioning
Tuesday, June 7, 2011
![Page 19: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/19.jpg)
Push Replication - Benefits
Documents indexed are immediately available on all replicas
Improves High Availability
Allows for (near) real time search architecture
Architecture allows to switch “roles” ->
Primary dies, slave can become primary, and still allow indexing
Tuesday, June 7, 2011
![Page 20: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/20.jpg)
Push Replication - IndexWriter#commit
IndexWriter#commit is heavy, but required in order to make sure data is actually persisted
Can be solved by having a write ahead log that can be replayed on the event of a crash
Can be more naturally supported in push replication
Tuesday, June 7, 2011
![Page 21: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/21.jpg)
elasticsearchhttp://www.elasticsearch.org
Tuesday, June 7, 2011
![Page 22: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/22.jpg)
index - shards and replicas
Node Node
Client
curl -XPUT localhost:9200/test -d '{ "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 }}'
Tuesday, June 7, 2011
![Page 23: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/23.jpg)
index - shards and replicas
NodeShard 0
(primary)
Shard 1(replica)
NodeShard 0(replica)
Shard 1(primary)
Client
curl -XPUT localhost:9200/test -d '{ "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 }}'
Tuesday, June 7, 2011
![Page 24: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/24.jpg)
indexing - 1
NodeShard 0
(primary)
Shard 1(replica)
NodeShard 0(replica)
Shard 1(primary)
Client
curl -XPUT localhost:9200/test/type1/1 -d '{ "name" : { "first" : "Shay", "last" : "Banon" } , "title" : "ElasticSearch - A distributed search engine"}'
Automatic sharding, push replication
Tuesday, June 7, 2011
![Page 25: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/25.jpg)
indexing - 2
NodeShard 0
(primary)
Shard 1(replica)
NodeShard 0(replica)
Shard 1(primary)
Client
curl -XPUT localhost:9200/test/type1/2 -d '{ "name" : { "first" : "Shay", "last" : "Banon" } , "title" : "ElasticSearch - A distributed search engine"}'
Automatic request “redirection”
Tuesday, June 7, 2011
![Page 26: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/26.jpg)
search - 1
NodeShard 0
(primary)
Shard 1(replica)
NodeShard 0(replica)
Shard 1(primary)
Client
curl -XPUT localhost:9200/test/_search?q=test
Scatter / Gather search
Tuesday, June 7, 2011
![Page 27: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/27.jpg)
search - 2
NodeShard 0
(primary)
Shard 1(replica)
NodeShard 0(replica)
Shard 1(primary)
Client
curl -XPUT localhost:9200/test/_search?q=test
Automatic balancing between replicas
Tuesday, June 7, 2011
![Page 28: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/28.jpg)
search - 3
NodeShard 0
(primary)
Shard 1(replica)
NodeShard 0(replica)
Shard 1(primary)
Client
curl -XPUT localhost:9200/test/_search?q=test
failure
Automatic failover
Tuesday, June 7, 2011
![Page 29: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/29.jpg)
adding a node
NodeShard 0
(primary)
Shard 1(replica)
Node
Shard 1(primary)
Shard 0(replica)
“Hot” relocation of shards to the new node
Tuesday, June 7, 2011
![Page 30: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/30.jpg)
adding a node
NodeShard 0
(primary)
Shard 1(replica)
Node
Shard 1(primary)
NodeShard 0(replica)
“Hot” relocation of shards to the new node
Tuesday, June 7, 2011
![Page 31: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/31.jpg)
adding a node
NodeShard 0
(primary)
Shard 1(replica)
Node
Shard 1(primary)
NodeShard 0(replica)
“Hot” relocation of shards to the new node
Shard 0(replica)
Tuesday, June 7, 2011
![Page 32: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/32.jpg)
node failure
Node
Shard 1(primary)
NodeShard 0(replica)
NodeShard 0
(primary)
Shard 1(replica)
Tuesday, June 7, 2011
![Page 33: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/33.jpg)
node failure - 1
Node
Shard 1(primary)
NodeShard 0
(primary)
Replicas can automatically become primaries
Tuesday, June 7, 2011
![Page 34: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/34.jpg)
node failure - 2
Node
Shard 1(primary)
NodeShard 0
(primary)
Shards are automatically assigned, and do “hot” recovery
Shard 0(replica)
Shard 1(replica)
Tuesday, June 7, 2011
![Page 35: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/35.jpg)
dynamic replicas
NodeShard 0
(primary)
NodeShard 0(replica)
Client
curl -XPUT localhost:9200/test -d '{ "index" : { "number_of_shards" : 1,
"number_of_replicas" : 1 }}'
Tuesday, June 7, 2011
![Page 36: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/36.jpg)
dynamic replicas
NodeShard 0
(primary)
Node NodeShard 0(replica)
Client
Tuesday, June 7, 2011
![Page 37: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/37.jpg)
dynamic replicas
NodeShard 0
(primary)
Node NodeShard 0(replica)
Client
Shard 0(replica)
curl -XPUT localhost:9200/test/_settings -d '{ "index" : {
"number_of_replicas" : 2 }}'
Tuesday, June 7, 2011
![Page 38: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/38.jpg)
multi tenancy -indices
Node Node Node
Client
curl -XPUT localhost:9200/test1 -d '{ "index" : { "number_of_shards" : 1, "number_of_replicas" : 1 }}'
Tuesday, June 7, 2011
![Page 39: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/39.jpg)
multi tenancy -indices
Nodetest1 S0
(primary)
Node Nodetest1 S0(replica)
Client
curl -XPUT localhost:9200/test1 -d '{ "index" : { "number_of_shards" : 1, "number_of_replicas" : 1 }}'
Tuesday, June 7, 2011
![Page 40: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/40.jpg)
multi tenancy -indices
Nodetest1 S0
(primary)
Node Nodetest1 S0(replica)
Client
curl -XPUT localhost:9200/test2 -d '{ "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 }}'
Tuesday, June 7, 2011
![Page 41: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/41.jpg)
multi tenancy -indices
Nodetest1 S0
(primary)
Node Nodetest1 S0(replica)
Client
curl -XPUT localhost:9200/test2 -d '{ "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 }}'
test2 S0(replica)
test2 S1(primary)
test2 S1(replica)
test2 S0(primary)
Tuesday, June 7, 2011
![Page 42: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/42.jpg)
multi tenancy - indices
Search against specific index
curl localhost:9200/test1/_search
Search against several indices
curl localhost:9200/test1,test2/_search
Search across all indices
curl localhost:9200/_search
Can be simplified using aliasesTuesday, June 7, 2011
![Page 43: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/43.jpg)
transaction logIndexed / deleted doc is fully persistent
No need for a Lucene IndexWriter#commit
Managed using a transaction log / WAL
Full single node durability (kill dash 9)
Utilized when doing hot relocation of shards
Periodically “flushed” (calling IW#commit)
Tuesday, June 7, 2011
![Page 44: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/44.jpg)
many more...(dist. related)
Custom routing when indexing and searching
Different “search execution types”
dfs, query_then_fetch, query_and_fetch
Complete non blocking, event IO based communication (no blocking threads on sockets, no deadlocks, scalable with large number of shards/replicas)
Tuesday, June 7, 2011
![Page 45: elasticsearch - Berlin Buzzwords 20112011.berlinbuzzwords.de/.../files/elasticsearch-bbuzz2011.pdf · elasticsearch The Road to a Distributed, (Near) Real Time, Search Engine Shay](https://reader036.vdocuments.mx/reader036/viewer/2022062907/5a8141797f8b9a571e8d3f48/html5/thumbnails/45.jpg)
ThanksShay Banon, twitter: @kimchy
elasticsearch
http://www.elasticsearch.org/
twitter: @elasticsearch
github: https://github.com/elasticsearch/elasticsearch
Tuesday, June 7, 2011