ElasticSearch with Tire
DESCRIPTION
An introduction to how a search engine works, with ElasticSearch and Tire.
TRANSCRIPT
![Page 1: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/1.jpg)
ElasticSearch with Tire
@AbookYun, Polydice Inc.
Wednesday, February 6, 2013
![Page 2: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/2.jpg)
It’s all about Search
• How does search work?
• ElasticSearch
• Tire
![Page 3: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/3.jpg)
How does search work?
A collection of articles
• Article.find(1).to_json
  { title: "One", content: "The ruby is a pink to blood-red colored gemstone." }
• Article.find(2).to_json
  { title: "Two", content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." }
• Article.find(3).to_json
  { title: "Three", content: "Ruby is a song by English rock band." }
![Page 4: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/4.jpg)
How does search work?
How do you search?
Article.where("content like ?", "%ruby%")
![Page 5: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/5.jpg)
How does search work?
The inverted index
T0 = "it is what it is"
T1 = "what is it"
T2 = "it is a banana"

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

A term search for the terms "what", "is" and "it" intersects their sets:
{0, 1} ∩ {0, 1, 2} ∩ {0, 1, 2} = {0, 1}
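The lookup above can be sketched in a few lines of Ruby. This is only an illustration of the idea: the helper names `build_inverted_index` and `search_all` are made up for this example, and tokenizing is reduced to splitting on whitespace.

```ruby
require 'set'

# Build an inverted index: token => set of ids of the documents containing it.
def build_inverted_index(documents)
  index = Hash.new { |hash, token| hash[token] = Set.new }
  documents.each_with_index do |text, doc_id|
    text.downcase.split.each { |token| index[token] << doc_id }
  end
  index
end

# A multi-term search intersects the sets of every term.
# An unknown term contributes an empty set, so nothing matches.
def search_all(index, terms)
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&) || Set.new
end

# The three example texts T0, T1 and T2 from the slide.
DOCUMENTS = ["it is what it is", "what is it", "it is a banana"]
INDEX_EXAMPLE = build_inverted_index(DOCUMENTS)
```

With this index, `search_all(INDEX_EXAMPLE, %w[what is it])` returns the set containing 0 and 1, the same result as the intersection on the slide.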
![Page 6: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/6.jpg)
How does search work?
The inverted index
TOKEN        ARTICLES
ruby         article_1  article_2  article_3
pink         article_1
gemstone     article_1
dynamic      article_2
reflective   article_2
programming  article_2
song         article_3
english      article_3
rock         article_3
![Page 7: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/7.jpg)
How does search work?
The inverted index
Article.search("ruby")
Matching row of the index:
ruby         article_1  article_2  article_3
![Page 8: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/8.jpg)
How does search work?
The inverted index
Article.search("song")
Matching row of the index:
song         article_3
![Page 9: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/9.jpg)
module SimpleSearch
  def index document, content
    tokens = analyze content
    store document, tokens
    puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
  end

  def analyze content
    # Split content by words into "tokens", downcase every word,
    # then reject stop words, digits and empty strings
    content.split(/\W/)
           .map    { |word| word.downcase }
           .reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
  end

  def store document_id, tokens
    tokens.each do |token|
      ((INDEX[token] ||= []) << document_id).uniq!
    end
  end

  def search token
    puts "Results for token '#{token}':"
    # Print every matching document; an unknown token has no results
    (INDEX[token] || []).each { |document| puts " * #{document}" }
  end

  INDEX = {}
  STOPWORDS = %w(a an and are as at but by for if in is it no not of on or that the then there)

  extend self
end
![Page 11: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/11.jpg)
How does search work?
Indexing documents
SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
SimpleSearch.index "article2", "Ruby is a song."
SimpleSearch.index "article3", "Ruby is a stone."
SimpleSearch.index "article4", "Java is a language."

Indexed document article1 with tokens:
["ruby", "language", "java", "also", "language"]

Indexed document article2 with tokens:
["ruby", "song"]

Indexed document article3 with tokens:
["ruby", "stone"]

Indexed document article4 with tokens:
["java", "language"]
![Page 12: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/12.jpg)
How does search work?
Index
print SimpleSearch::INDEX

{
  "ruby" => ["article1", "article2", "article3"],
  "language" => ["article1", "article4"],
  "java" => ["article1", "article4"],
  "also" => ["article1"],
  "stone" => ["article3"],
  "song" => ["article2"]
}
![Page 13: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/13.jpg)
How does search work?
Search the index
SimpleSearch.search "ruby"

Results for token 'ruby':
 * article1
 * article2
 * article3
![Page 14: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/14.jpg)
How does search work?
Search is ...
Inverted Index
{ "ruby": [1,2,3], "language": [1,4] }
+
Relevance Scoring
• How many matching terms does this document contain?
• How frequently does each term appear in all your documents?
• ... other complicated algorithms.
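The first two questions above correspond roughly to term frequency and inverse document frequency. The following is a simplified sketch of that idea, not the scoring formula Lucene or ElasticSearch actually uses; the function name `tf_idf_scores` and the `+ 1.0` smoothing are choices made for this example, and the token lists are the ones printed by SimpleSearch earlier.

```ruby
# Score each document for a query with a simple TF-IDF sum.
# tokenized_docs maps a document id to its list of tokens.
def tf_idf_scores(tokenized_docs, query_terms)
  total_docs = tokenized_docs.size.to_f
  scores = Hash.new(0.0)

  query_terms.each do |term|
    matching = tokenized_docs.select { |_id, tokens| tokens.include?(term) }
    next if matching.empty?

    # A term that appears in few documents gets a higher weight.
    idf = Math.log(total_docs / matching.size) + 1.0

    matching.each do |doc_id, tokens|
      # Share of this document's tokens that equal the term.
      tf = tokens.count(term) / tokens.size.to_f
      scores[doc_id] += tf * idf
    end
  end

  # Highest score first, as [document id, score] pairs.
  scores.sort_by { |_id, score| -score }
end

DOCS = {
  "article1" => %w[ruby language java also language],
  "article2" => %w[ruby song],
  "article3" => %w[ruby stone],
  "article4" => %w[java language]
}
```

Calling `tf_idf_scores(DOCS, %w[java language])` ranks article4 above article1 here, because both query terms make up a larger share of the shorter document.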
![Page 15: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/15.jpg)
ElasticSearch
ElasticSearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.
http://github.com/elasticsearch/elasticsearch
![Page 16: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/16.jpg)
ElasticSearch
Terminology
Relational DB    ElasticSearch
Database         Index
Table            Type
Row              Document
Column           Field
Schema           Mapping
Index            *Everything
SQL              Query DSL
![Page 17: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/17.jpg)
ElasticSearch
RESTful
# Add document
curl -XPUT 'http://localhost:9200/articles/article/1' -d '{ "title": "One" }'
# Delete document
curl -XDELETE 'http://localhost:9200/articles/article/1'
# Search
curl -XGET 'http://localhost:9200/articles/_search?q=One'
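Because the API is plain HTTP, the same three calls can be built from Ruby's standard library without any ElasticSearch client. The sketch below only constructs the requests; the helper names and the `BASE_URL` constant are assumptions for this example, and the `/index/type/id` URL layout simply follows the slide.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumed address of a local node, as in the curl examples.
BASE_URL = 'http://localhost:9200'

# Build (but do not send) the PUT request that adds a document.
def add_document_request(index, type, id, document)
  uri = URI("#{BASE_URL}/#{index}/#{type}/#{id}")
  request = Net::HTTP::Put.new(uri.request_uri)
  # Assumption: send a JSON content type; the curl examples do not set one.
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(document)
  [uri, request]
end

# Build the DELETE request that removes a document.
def delete_document_request(index, type, id)
  uri = URI("#{BASE_URL}/#{index}/#{type}/#{id}")
  [uri, Net::HTTP::Delete.new(uri.request_uri)]
end

# Build the GET request for a URI search (?q=...).
def search_request(index, query)
  uri = URI("#{BASE_URL}/#{index}/_search")
  uri.query = URI.encode_www_form(q: query)
  [uri, Net::HTTP::Get.new(uri.request_uri)]
end

# Sending a request needs a running node, for example:
# uri, request = add_document_request('articles', 'article', 1, { 'title' => 'One' })
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```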
![Page 18: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/18.jpg)
ElasticSearch
JSON in / JSON out
# Query
curl -XGET 'http://localhost:9200/articles/article/_search' -d '{
  "query": { "term": { "title": "One" } }
}'
# Results
{
  "_shards": { "total": 5, "success": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "hits": [{
      "_index": "articles",
      "_type": "article",
      "_id": "1",
      "_source": {
        "title": "One",
        "content": "Ruby is a pink to blood-red colored gemstone."
      }
    }]
  }
}
![Page 19: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/19.jpg)
ElasticSearch
Distributed
Automatic Discovery Protocol
[Diagram: four nodes, Node 1 to Node 4, one of them marked as Master]
The discovery module is responsible for discovering nodes within a cluster, as well as electing a master node.
The responsibility of the master node is to maintain the global cluster state, and to act if nodes join or leave the cluster by reassigning shards.
![Page 20: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/20.jpg)
ElasticSearch
Distributed
Index A
By default, every index is split into 5 shards, and each shard is duplicated in 1 replica.
Shards:   A1  A2  A3  A4  A5
Replicas: A1' A2' A3' A4' A5'
![Page 21: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/21.jpg)
ElasticSearch
Query DSL
Queries
- query_string
- term
- wildcard
- boosting
- bool
- filtered
- fuzzy
- range
- geo_shape
- ...
Filters
- term
- query
- range
- bool
- and
- or
- not
- limit
- match_all
- ...
![Page 22: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/22.jpg)
ElasticSearch
Query DSL
Queries: with relevance, without cache
Filters: with cache, without relevance
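To make the split concrete, here is a sketch of a request body that combines one scored query with one unscored filter, using the `filtered` query listed on the slide. The helper name `filtered_search` is made up for this example, the `tags` field is borrowed from the facet example, and the `filtered` query belongs to the ElasticSearch versions of that period.

```ruby
require 'json'

# Build a "filtered" query: the query part contributes to the relevance
# score, while the filter part only includes or excludes documents.
def filtered_search(text, tag)
  {
    'query' => {
      'filtered' => {
        'query'  => { 'query_string' => { 'query' => text } },
        'filter' => { 'term' => { 'tags' => tag } }
      }
    }
  }
end

# Print the JSON body that would be sent to the _search endpoint.
puts JSON.pretty_generate(filtered_search('ruby', 'foo'))
```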
![Page 23: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/23.jpg)
ElasticSearch
Facets
curl -X DELETE "http://localhost:9200/articles"
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : ["foo"]}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
  "query" : { "query_string" : {"query" : "T*"} },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'
![Page 24: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/24.jpg)
ElasticSearch
Facets
"facets" : {
  "tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total": 5,
    "other": 0,
    "terms" : [
      { "term" : "foo", "count" : 2 },
      { "term" : "bar", "count" : 2 },
      { "term" : "baz", "count" : 1 }
    ]
  }
}
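The counts in a terms facet can be reproduced by hand. The sketch below is a plain Ruby approximation and does not talk to ElasticSearch: `terms_facet` is a name made up for this example, and the `T*` query is imitated with a simple prefix check on the title.

```ruby
# The three articles indexed in the facet example.
ARTICLES = [
  { 'title' => 'One',   'tags' => %w[foo] },
  { 'title' => 'Two',   'tags' => %w[foo bar] },
  { 'title' => 'Three', 'tags' => %w[foo bar baz] }
]

# Count how often each value of a field occurs in the given documents,
# most frequent first; ties keep the order of first appearance.
def terms_facet(documents, field)
  counts = Hash.new(0)
  documents.each { |doc| doc[field].each { |term| counts[term] += 1 } }
  counts.each_with_index
        .sort_by { |(_term, count), position| [-count, position] }
        .map { |(term, count), _position| { 'term' => term, 'count' => count } }
end

# Stand-in for the query_string query "T*": titles starting with "T".
MATCHING = ARTICLES.select { |doc| doc['title'].start_with?('T') }
FACET_RESULT = terms_facet(MATCHING, 'tags')
```

Only "Two" and "Three" match, so `FACET_RESULT` holds foo and bar with a count of 2 and baz with a count of 1, five tag occurrences in total.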
![Page 25: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/25.jpg)
ElasticSearch
Mapping
curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '{
  "article": {
    "properties": {
      "tags": { "type": "string", "analyzer": "keyword" },
      "title": { "type": "string", "analyzer": "snowball", "boost": 10.0 },
      "content": { "type": "string", "analyzer": "snowball" }
    }
  }
}'
curl -XGET 'http://localhost:9200/articles/article/_mapping'
![Page 26: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/26.jpg)
ElasticSearch
Analyzer
curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '{
  "article": {
    "properties": {
      "title": { "type": "string", "analyzer": "trigrams" }
    }
  }
}'
curl -XPOST 'localhost:9200/articles/article' -d '{ "title": "cupertino" }'

C u p e r t i n o
C u p
  u p e
    p e r
. . .
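The splitting shown above can be imitated in a few lines of Ruby. This is only a sketch of what a trigram analyzer produces; `ngrams` is a name made up for this example, and "trigrams" on the slide is presumably a custom analyzer that has to be defined in the index settings.

```ruby
# Split a word into overlapping n-grams (trigrams by default).
def ngrams(word, size = 3)
  word.downcase.each_char.each_cons(size).map(&:join)
end

# The title indexed on the slide.
TRIGRAMS = ngrams("cupertino")
```

For "cupertino" this gives cup, upe, per, ert, rti, tin and ino, so a search for a fragment such as "pert" can match through the shared trigrams "per" and "ert".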
![Page 27: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/27.jpg)
Tire
A rich Ruby API and DSL for the ElasticSearch search engine.
http://github.com/karmi/tire/
![Page 28: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/28.jpg)
Tire
ActiveRecord Integration
# New rails application
$ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb

# Callback
class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
end

# Create an article
Article.create :title => "I Love Elasticsearch",
               :content => "...",
               :author => "Captain Nemo",
               :published_on => Time.now

# Search
Article.search do
  query { string 'love' }
  facet('timeline') { date :published_on, :interval => 'month' }
  sort { by :published_on, 'desc' }
end
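For comparison with the curl examples, the search block above corresponds roughly to the JSON body below. This is a hand-written approximation for illustration, not output captured from Tire; `SEARCH_BODY` is a name made up for this example.

```ruby
require 'json'

# Approximate request body for the Tire search block:
# a query_string query, a date histogram facet and a sort clause.
SEARCH_BODY = {
  'query'  => { 'query_string' => { 'query' => 'love' } },
  'facets' => {
    'timeline' => {
      'date_histogram' => { 'field' => 'published_on', 'interval' => 'month' }
    }
  },
  'sort' => [{ 'published_on' => 'desc' }]
}

puts JSON.pretty_generate(SEARCH_BODY)
```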
![Page 29: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/29.jpg)
Tire
ActiveRecord Integration
class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  # Setting
  settings :number_of_shards => 3,
           :number_of_replicas => 2,
           :analysis => {
             :analyzer => {
               :url_analyzer => {
                 'tokenizer' => 'lowercase',
                 'filter' => ['stop', 'url_ngram']
               }
             }
           }

  # Mapping
  mapping do
    indexes :title, :analyzer => :not_analyzer, :boost => 100
    indexes :content, :analyzer => 'snowball'
  end
end
![Page 30: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/30.jpg)
Reference
# github
http://github.com/elasticsearch/elasticsearch
http://github.com/karmi/tire/
# Slides
https://speakerdeck.com/kimchy/the-road-to-a-distributed-search-engine
https://speakerdeck.com/karmi/elasticsearch-your-data-your-search-euruko-2011
https://speakerdeck.com/clintongormley/to-infinity-and-beyond
![Page 31: ElasticSearch with Tire](https://reader031.vdocuments.mx/reader031/viewer/2022013102/554f77e6b4c9052a518b4867/html5/thumbnails/31.jpg)
Thanks