© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.
Your data, your search, Elasticsearch
Costin Leau (@costinl)
Agenda
Elasticsearch
Big Data
Analytics
What is Elasticsearch?
Open-Source Search & Analytics engine
- Structured & Unstructured Data
- Real Time
- Analytics capabilities (facets)
- REST based
Distributed
- Designed for the Cloud
- Designed for Big Data
Lightweight
Popular: ~200K downloads/month
Users
Platform adoption
http://www.thoughtworks.com/radar#platforms 2013
Use Case – Text search
1.3 billion files, 130 billion lines of code
https://github.com/blog/1381-a-whole-new-code-search
Use Case – Geolocation
50 million venues / day
Use Case – Recommendations
millions of recommendations
Use Case – Support/Reporting
Use Case – Centralized Logging
Use Case – Pure Analytics
Plug & Play
Installation
$ wget https://download.elasticsearch.org/...
$ tar -xf elasticsearch-0.90.3.tar.gz
$ ./elasticsearch-0.90.3/bin/elasticsearch
... [INFO ][node][Ghost Maker] {0.90.3}[5645]: initializing ...
Index a document
$ curl -X PUT localhost:9200/products/product/1 -d '{
"title" : "Welcome!"}'
Update a document
$ curl -X PUT localhost:9200/products/product/1 -d '{
"title" : "Welcome to SpringOne2GX 2013!"}'
Search for documents...
$ curl -X GET 'localhost:9200/products/_search?q=welcome'
Scaling out
$ ./elasticsearch-0.90.3/bin/elasticsearch -Des.node.name=Node2
...[cluster.service] [Node2] detected_master [Node1] ...
Primaries and Replicas
curl -XPUT 'http://localhost:9200/a/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
    }
  }
}'
[Diagram: index "a" with three primary shards (A1, A2, A3) and one replica of each (A1, A2, A3)]
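How a document ends up on one of the three primaries can be sketched as below. Elasticsearch routes a document by hashing its routing value (the document id by default) modulo the number of primary shards. The sketch assumes Java's String.hashCode() as a stand-in for the real hash function, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of hash-based shard routing; not Elasticsearch's actual code.
final class ShardRoutingSketch {

    // Picks a primary shard for a routing value (the document id by default).
    // String.hashCode() is an assumption standing in for the real hash function.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    // Groups document ids by the shard they would be routed to.
    static Map<Integer, List<String>> distribute(List<String> ids, int numberOfPrimaryShards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (String id : ids) {
            int shard = shardFor(id, numberOfPrimaryShards);
            byShard.computeIfAbsent(shard, k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        // Three primaries, matching "number_of_shards" : 3.
        System.out.println(distribute(Arrays.asList("1", "2", "3", "abc123"), 3));
    }
}
```

Because the target shard depends on the number of primaries, that number is fixed when the index is created, while the number of replicas can be changed later.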
Scaling out
$ ./elasticsearch-0.90.3/bin/elasticsearch -Des.node.name=Node3
...[cluster.service] [Node3] detected_master [Node1] ...
JSON & HTTP
{
  "id" : "abc123",
  "title" : "A JSON Document",
  "body" : "A JSON document is a ...",
  "published_on" : "2013/06/27 10:00:00",
  "featured" : true,
  "tags" : ["search", "json"],
  "author" : {
    "first_name" : "Clara",
    "last_name" : "Rice",
    "email" : "[email protected]"
  }
}
http:// Lingua Franca of APIs
Also supported: Native Java protocol, Thrift, Memcached
Search & Find
$ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"
Terms       apple
            apple iphone
Phrases     "apple iphone"
Proximity   "apple safari"~5
Fuzzy       apple~0.8
Wildcards   app*
            *pp*
Boosting    apple^10 safari
Range       [2011/05/01 TO 2011/05/31]
            [java TO json]
Boolean     apple AND NOT iphone
            +apple -iphone
            (apple OR iphone) AND NOT review
Fields      title:iphone^15 OR body:iphone
            published_on:[2011/05/01 TO "2011/05/27 10:00:00"]
Query DSL
curl -X GET localhost:9200/articles/_search -d '{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "must" : [
            { "match" : {
              "author.first_name" : { "query" : "claire", "fuzziness" : 0.1 }
            } },
            { "multi_match" : {
              "query" : "elasticsearch",
              "fields" : ["title^10", "body"]
            } }
          ]
        }
      },
      "filter" : {
        "and" : [
          { "terms" : { "tags" : ["search"] } },
          { "range" : { "published_on" : { "from" : "2013" } } },
          { "term" : { "featured" : true } }
        ]
      }
    }
  }
}'
Search types
Full-text Search
“Find all articles with ‘search’ in their title or body, give matches in titles a higher score”
Structured Search
“Find all articles from year 2013 tagged ‘search’”
Custom Scoring
See custom_score and custom_filters_score queries
User Search Engine
Fetch document field ➝
Pick configured analyzer ➝
Parse text into tokens ➝
Apply token filters ➝
Store into index
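The steps above can be sketched in a few lines of Java. This is a much simplified illustration, not how Elasticsearch or Lucene implement analysis: the tokenizer, the stop word list and all class and method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified analysis chain feeding an inverted index.
final class AnalysisSketch {

    // Assumed stop word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "to", "is"));

    // Inverted index: term -> ids of the documents that contain it.
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Parses text into tokens, then applies two token filters:
    // lowercasing and stop word removal.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stores the analyzed tokens of one document field into the index.
    void indexField(String docId, String fieldValue) {
        for (String token : analyze(fieldValue)) {
            index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Runs the search term through the same analyzer before the lookup.
    Set<String> search(String term) {
        List<String> tokens = analyze(term);
        if (tokens.isEmpty()) {
            return Collections.emptySet();
        }
        return index.getOrDefault(tokens.get(0), Collections.emptySet());
    }

    public static void main(String[] args) {
        AnalysisSketch sketch = new AnalysisSketch();
        sketch.indexField("1", "Welcome to SpringOne2GX 2013!");
        System.out.println(sketch.search("WELCOME"));
    }
}
```

Applying the same analyzer at index time and at search time is what lets a query for "WELCOME" find a document that contains "Welcome".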
Search perspectives
Slice & Dice
Query
Facets
OLAP Cube
Dimensions, measures, aggregations
Slice, Dice, Drill Down / Roll Up
Show me sales numbers for all products across all locations in year 2013
Show me product A sales numbers across all locations over all years
Show me sales numbers for all products in location X over all years
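The questions above all follow one pattern: fix one dimension, then aggregate a measure along another. A minimal in-memory sketch of that pattern follows; the data, the Sale type and the method names are made up for illustration, and this is not how Elasticsearch computes facets.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of slicing a sales cube.
final class SliceAndDiceSketch {

    // One fact row: three dimensions (product, location, year) and one measure (amount).
    static final class Sale {
        final String product;
        final String location;
        final int year;
        final long amount;

        Sale(String product, String location, int year, long amount) {
            this.product = product;
            this.location = location;
            this.year = year;
            this.amount = amount;
        }
    }

    // Fixes the year, then sums the amount per product across all locations.
    static Map<String, Long> salesPerProduct(List<Sale> sales, int year) {
        Map<String, Long> totals = new TreeMap<>();
        for (Sale s : sales) {
            if (s.year == year) {
                totals.merge(s.product, s.amount, Long::sum);
            }
        }
        return totals;
    }

    // Fixes the product, then sums the amount per location over all years.
    static Map<String, Long> salesPerLocation(List<Sale> sales, String product) {
        Map<String, Long> totals = new TreeMap<>();
        for (Sale s : sales) {
            if (s.product.equals(product)) {
                totals.merge(s.location, s.amount, Long::sum);
            }
        }
        return totals;
    }

    // Made-up sample data.
    static List<Sale> sampleData() {
        return Arrays.asList(
                new Sale("A", "Berlin", 2012, 10),
                new Sale("A", "Berlin", 2013, 20),
                new Sale("A", "Paris", 2013, 5),
                new Sale("B", "Paris", 2013, 7));
    }

    public static void main(String[] args) {
        System.out.println(salesPerProduct(sampleData(), 2013));
        System.out.println(salesPerLocation(sampleData(), "A"));
    }
}
```

In search terms, the fixed dimension plays the role of the query or filter, and the per-bucket totals play the role of a facet.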
Clients
Pick your language
Java
Perl*
Python*
Ruby*
PHP*
JavaScript
.NET
Scala
Clojure
Go
Erlang
EventMachine
CLI
Smalltalk
OCaml
Spring Data
Spring Data Elasticsearch
Easy to use Elasticsearch in a Spring-powered app
Configuring Elasticsearch client
Dedicated template for one-liners
Repository support
Configuration
<beans xmlns:es="http://www.sf.org/schema/data/elasticsearch">
  <es:repositories base-package="com.acme" />
  <es:transport-client id="client"
      cluster-nodes="localhost:9300,someip:9300" />
</beans>
@Configuration
@EnableElasticsearchRepositories(basePackages = "com/acme")
static class Config {
    @Bean
    public ElasticsearchOperations elasticsearchTemplate() {
        return new ElasticsearchTemplate(nodeBuilder().local(true).node().client());
    }
}
Dedicated Template
Create/delete index/mappings
Query options
– Criteria
– String
– Search
Bulk operations
Scrolling/streaming
Repositories
public interface BookRepository extends Repository<Book, String> {
    List<Book> findByNameAndPrice(String name, Integer price);
    List<Book> findByNameOrPrice(String name, Integer price);
    Page<Book> findByName(String name, Pageable page);
    Page<Book> findByNameNot(String name, Pageable page);
    // Between takes a lower and an upper bound
    Page<Book> findByPriceBetween(int from, int to, Pageable page);
    Page<Book> findByNameLike(String name, Pageable page);

    @Query("{\"bool\" : {\"must\" : {\"field\" : {\"message\" : \"?0\"}}}}")
    Page<Book> findByMessage(String message, Pageable pageable);
}
Sophisticated query creation
Keyword                  Example
And / Or                 findByNameAndPrice
Is                       findByName
Not                      findByNameNot
Less/GreaterThanEqual    findByPriceLessThan
Before / After           findByPriceAfter
Starting/EndingWith      findByNameEndingWith
Contains / Containing    findByNameContaining
OrderBy                  findByCountryOrderByName
True / False             findByRetiredFalse
Near                     soon
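To give a feel for how a method name can be turned into query criteria, here is a minimal sketch. It is not Spring Data's parser: the class, the keyword subset and the output format are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of deriving criteria from a finder method name.
final class DerivedQuerySketch {

    // Suffix keywords this sketch understands (a small subset of the table above).
    private static final String[] KEYWORDS =
            {"Not", "Like", "Between", "LessThan", "EndingWith", "Containing"};

    // And/Or count as connectors only when they sit between two capitalised words.
    private static final Pattern CONNECTOR = Pattern.compile("(?<=[a-z])(And|Or)(?=[A-Z])");

    // Turns e.g. "findByNameAndPrice" into ["name Is", "AND", "price Is"].
    static List<String> parse(String methodName) {
        if (!methodName.startsWith("findBy") || methodName.length() == "findBy".length()) {
            throw new IllegalArgumentException("Expected findBy<Property>...: " + methodName);
        }
        String rest = methodName.substring("findBy".length());
        List<String> parts = new ArrayList<>();
        Matcher m = CONNECTOR.matcher(rest);
        int start = 0;
        while (m.find()) {
            parts.add(criterion(rest.substring(start, m.start())));
            parts.add(m.group(1).toUpperCase(Locale.ROOT));
            start = m.end();
        }
        parts.add(criterion(rest.substring(start)));
        return parts;
    }

    // Splits one segment such as "PriceBetween" into "price Between";
    // a segment without a known keyword defaults to "Is".
    private static String criterion(String segment) {
        for (String keyword : KEYWORDS) {
            if (segment.endsWith(keyword) && segment.length() > keyword.length()) {
                String property = segment.substring(0, segment.length() - keyword.length());
                return decapitalize(property) + " " + keyword;
            }
        }
        return decapitalize(segment) + " Is";
    }

    private static String decapitalize(String s) {
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(parse("findByNameAndPrice"));
        System.out.println(parse("findByPriceBetween"));
    }
}
```

Each resulting criterion would then be bound to the method arguments in order, which is why the parameter list has to match the keywords in the name.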
Big Data
A Holistic View of a Big Data System
[Diagram] Components shown:
- ETL
- Real-Time Streams
- Unstructured Data (HDFS)
- Real-Time Semi-structured Database (HBase, Cassandra, Mongo)
- Big SQL (Greenplum, AsterData, etc.)
- Batch Processing
- Real-Time Processing (S4, Storm)
- Analytics
Hadoop eco-system
Hadoop Distributed File System (HDFS)
Map Reduce Framework (MapRed)
Elasticsearch - Hadoop
Read/write data to Hadoop transparently
• Hadoop Input/OutputFormat
• Cascading Tap
• Pig Storage
• Hive SerDe
Native Map/Reduce model
Elasticsearch + Hadoop
[Chart: Writing. Series: M/R, Pig, Hive, Raw; axis from 0 to 60]
[Chart: Reading / Querying. Series: M/R, Pig, Hive, Raw; axis from 0 to 60]
Data Ingestion
DIY
Logstash
Flume
Graylog2
HDFS
Logstash
Tool for managing events and logs
Collect, parse and store
Tons of
– inputs (~40)
– codecs (~11)
– filters (~40)
– outputs (~50)
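The "parse" step is what turns a raw log line into a structured event that can be indexed. A minimal Java illustration of the idea follows; it is not Logstash code, and the log line layout, the field names and the class name are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of parsing a log line into a structured event.
final class LogEventSketch {

    // Assumed line layout: "<yyyy-MM-dd HH:mm:ss> <LEVEL> <message>".
    private static final Pattern LINE =
            Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) (\\w+) (.*)$");

    // Returns timestamp, level and message fields when the line matches;
    // otherwise keeps the whole line as the message so nothing is lost.
    static Map<String, String> parse(String line) {
        Map<String, String> event = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            event.put("timestamp", m.group(1));
            event.put("level", m.group(2));
            event.put("message", m.group(3));
        } else {
            event.put("message", line);
        }
        return event;
    }

    public static void main(String[] args) {
        // Made-up sample line.
        System.out.println(parse("2013-09-10 12:00:01 INFO node started"));
    }
}
```

Once events carry separate fields such as level and timestamp, they can be filtered and aggregated on those fields instead of being searched as opaque text.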
Kibana
Make sense of logging data
Runs inside your browser
Highly customizable
Leverages Elasticsearch aggregations/facets
Thank you!
@costinl