Search and analyze your data with Elasticsearch

SEARCH AND ANALYZE YOUR DATA WITH ELASTICSEARCH
Anton Udovychenko
JEEConf, May 20, 2016

Upload: anton-udovychenko

Post on 13-Apr-2017

Page 2: Search and analyze your data with elasticsearch

ABOUT ME

Software Architect @ Levi9
8+ years of Java experience
Passionate about agile methodology and clean code

http://ua.linkedin.com/in/antonudovychenko

http://www.slideshare.net/antonudovychenko

Page 3: Search and analyze your data with elasticsearch

AGENDA

• Why does search matter to you
• Why Elasticsearch
• Basic Concepts
• Comparison with SQL
• Elasticsearch usage
• Elasticsearch and Java
• Q&A

Page 4: Search and analyze your data with elasticsearch

WHY DOES SEARCH MATTER TO YOU

Page 6: Search and analyze your data with elasticsearch

WHAT IS IT ABOUT

Elasticsearch is a distributed, open source, document-oriented, schema-free, RESTful, full-text search and analytics engine, designed for horizontal scalability and high availability.

Page 7: Search and analyze your data with elasticsearch

WHY ELASTICSEARCH

Elasticsearch is a distributed, open source, document-oriented, schema-free, RESTful, full-text search and analytics engine, designed for horizontal scalability and high availability.

Page 9: Search and analyze your data with elasticsearch

WHY ELASTICSEARCH

Open source: Apache 2.0 License

Page 10: Search and analyze your data with elasticsearch

WHY ELASTICSEARCH

Document-oriented: data is stored as JSON documents, for example:

{
  "title": "My blogpost",
  "body": "Having a lot of text...",
  "user": "es_user",
  "postDate": "2016-01-01 15:03:32"
}

Page 11: Search and analyze your data with elasticsearch

WHY ELASTICSEARCH

RESTful: REST API

Page 14: Search and analyze your data with elasticsearch

WHY ELASTICSEARCH - ALTERNATIVES

Lucene
+ More fine-grained control
– Complex logic (no additional level of abstraction)
= Elasticsearch is based on Lucene

Page 15: Search and analyze your data with elasticsearch

WHY ELASTICSEARCH - ALTERNATIVES

Sphinx
+ Faster on a cold start
+ Occupies less memory
– Proprietary protocol
– Real-time caveats
– Difficult to go to cloud
– More difficult to start using
– Smaller community
= Not Java-based (C++)

Page 16: Search and analyze your data with elasticsearch

WHY ELASTICSEARCH - ALTERNATIVES

Solr
+ Truly open-source
+ Primary support of Hadoop distributors
+ ZooKeeper is more mature than Zen
– More difficult to start using
– SolrCloud (vs ES out of the box)
– ZooKeeper is harder to use than Zen
– Worse operational tools
– Worse monitoring tools
– Worse analytical abilities
= Near real-time search
= Similar performance

Page 17: Search and analyze your data with elasticsearch

WHY ELASTICSEARCH

Page 18: Search and analyze your data with elasticsearch

BASIC CONCEPTS

• Near realtime
• Cluster
• Node
• Index
• Type
• Document
• Shards and replicas

Page 19: Search and analyze your data with elasticsearch

BASIC CONCEPTS

Cluster

Page 20: Search and analyze your data with elasticsearch

BASIC CONCEPTS

[Diagram: three nodes]

Page 21: Search and analyze your data with elasticsearch

BASIC CONCEPTS

[Diagram: eight shards]

Page 22: Search and analyze your data with elasticsearch

BASIC CONCEPTS

[Diagram: eight shards, labeled Index]

Page 23: Search and analyze your data with elasticsearch

BASIC CONCEPTS

[Diagram: a shard containing four segments, labeled Lucene Index]

Page 24: Search and analyze your data with elasticsearch

BASIC CONCEPTS: Segment core

Inverted index
Term | Freq | DocIds
brown | 2 | 0,1
dog | 2 | 0,1
fox | 2 | 0,1
in | 1 | 1
jump | 2 | 0,1
lazy | 2 | 0,1
over | 2 | 0,1
quick | 2 | 0,1
summer | 1 | 1
the | 2 | 0,1

Document store
DocId | Fields
0 | Text: The quick brown fox jumped over the lazy dog; Author: Bob
1 | Text: Quick brown foxes leap over lazy dogs in summer; Author: Bill

Column store
DocId | Shared | Likes
0 | 210 | 59
1 | 90 | 23

Page 25: Search and analyze your data with elasticsearch

BASIC CONCEPTS: Segment core

Search term: Leaping brown Fox

[The slide repeats the inverted index, document store, and column store from the previous slide.]
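The lookup illustrated on this slide can be sketched in a few lines of Python. This is a toy model, not how Lucene actually stores segments: the `NORMALIZE` table is a hand-written stand-in (an assumption for this sketch) for real analyzers with stemming and synonym filters, and the ranking simply counts matching terms.

```python
from collections import defaultdict

# Toy normalization table (an assumption for this sketch): real analyzers use
# stemmers and synonym filters instead of a hand-written dictionary.
NORMALIZE = {
    "jumped": "jump",
    "leap": "jump",
    "leaping": "jump",
    "foxes": "fox",
    "dogs": "dog",
}


def analyze(text):
    """Lowercase, split on whitespace, and normalize each token."""
    return [NORMALIZE.get(token, token) for token in text.lower().split()]


def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return matching document ids, most matching terms first."""
    hits = defaultdict(int)
    for term in analyze(query):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


DOCS = {
    0: "The quick brown fox jumped over the lazy dog",
    1: "Quick brown foxes leap over lazy dogs in summer",
}
INDEX = build_index(DOCS)
RESULT = search(INDEX, "Leaping brown Fox")
```

The query is analyzed with the same function as the documents, which is why "Leaping" and "Fox" can still hit terms stored as `jump` and `fox`.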

Page 26: Search and analyze your data with elasticsearch

SQL

ELASTIC

Page 27: Search and analyze your data with elasticsearch

COMPARISON WITH SQL

SQL | Elasticsearch
Database | Index
Table | Type
Row | Document
Column (field) | Field

Page 28: Search and analyze your data with elasticsearch

COMPARISON WITH SQL

SQL | Elasticsearch
Database | Index
Table | Type
Row | Document with properties
Column (field) | Field

Page 29: Search and analyze your data with elasticsearch

COMPARISON WITH SQL

id | title | body | user | postDate
1 | My first blogpost | Having a lot of text... | es_user | 2016-01-01 15:03:32
2 | About search | The search data sometimes has a peculiar property… | es_user | 2016-01-01 19:22:03
3 | Introduction to Elasticsearch | Once I have stumbled upon this idea… | es_user | 2016-01-03 11:55:41

Page 30: Search and analyze your data with elasticsearch

COMPARISON WITH SQL

SQL:
CREATE DATABASE blog;
USE blog;
CREATE TABLE post(
  id bigint(20) AUTO_INCREMENT,
  title varchar(250),
  body text,
  user varchar(50),
  postDate timestamp,
  PRIMARY KEY(id)
);

Elasticsearch (an explicit mapping is not obligatory):
POST http://localhost:9200/blog
{
  "mappings": {
    "post": {
      "properties": {
        "title": { "type": "string" },
        "body": { "type": "string" },
        "user": { "type": "string" },
        "postDate": { "type": "date" }
      }
    }
  }
}
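As a minimal sketch, the index-creation call above could be assembled from Python with only the standard library. It mirrors the 2016-era request from the slide (the `string` field type and the `post` mapping type); the helper name `create_index_request` is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

# Base URL taken from the slide; adjust for your own cluster.
ES_URL = "http://localhost:9200"


def create_index_request(index, doc_type, properties):
    """Build (but do not send) the index-creation request shown on the slide."""
    body = {"mappings": {doc_type: {"properties": properties}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # same HTTP method as on the slide
    )


request = create_index_request(
    "blog",
    "post",
    {
        "title": {"type": "string"},
        "body": {"type": "string"},
        "user": {"type": "string"},
        "postDate": {"type": "date"},
    },
)
# To actually send it against a running node: urllib.request.urlopen(request)
```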

Page 31: Search and analyze your data with elasticsearch

COMPARISON WITH SQL (CREATE)

SQL:
INSERT INTO post(title, body, user, postDate)
VALUES('My blogpost', 'Having a lot of text...', 'es_user', '2016-01-01 15:03:32');

Elasticsearch:
POST http://localhost:9200/blog/post
{
  "title": "My blogpost",
  "body": "Having a lot of text...",
  "user": "es_user",
  "postDate": "2016-01-01 15:03:32"
}
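The indexing call can be sketched the same way. The helper `index_document_request` is a hypothetical name; it assumes the typed URL layout used on the slides (`/index/type`), uses POST when Elasticsearch should generate the id and PUT when the caller supplies one, and only builds the request.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # host and port as used on the slides


def index_document_request(index, doc_type, document, doc_id=None):
    """Build the request that stores one JSON document.

    Without doc_id Elasticsearch generates an id (POST); with doc_id the
    document is stored under that id (PUT).
    """
    url = f"{ES_URL}/{index}/{doc_type}"
    method = "POST"
    if doc_id is not None:
        url = f"{url}/{doc_id}"
        method = "PUT"
    return urllib.request.Request(
        url=url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


post = {
    "title": "My blogpost",
    "body": "Having a lot of text...",
    "user": "es_user",
    "postDate": "2016-01-01 15:03:32",
}
auto_id = index_document_request("blog", "post", post)
explicit_id = index_document_request("blog", "post", post, doc_id=1)
```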

Page 32: Search and analyze your data with elasticsearch

COMPARISON WITH SQL (UPDATE)

SQL:
UPDATE post SET title='My blogpost' WHERE id=1;

Elasticsearch:
POST http://localhost:9200/blog/post/1/_update
{
  "doc": {
    "title": "My blogpost"
  }
}
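A small sketch of what the partial update carries and what it does to the stored document. Both function names are hypothetical, and `apply_partial_update` is only a local, shallow illustration of the effect (fields under `doc` overwrite the stored ones, everything else is kept), not Elasticsearch code.

```python
import json


def update_request(index, doc_type, doc_id, changed_fields):
    """Return (method, path, body) for a partial update of one document."""
    path = f"/{index}/{doc_type}/{doc_id}/_update"
    body = json.dumps({"doc": changed_fields})
    return "POST", path, body


def apply_partial_update(stored, changed_fields):
    """Local illustration: changed fields overwrite stored ones, the rest stays."""
    merged = dict(stored)
    merged.update(changed_fields)
    return merged


method, path, body = update_request("blog", "post", 1, {"title": "My blogpost"})

before = {"title": "My first blogpost", "user": "es_user"}
after = apply_partial_update(before, {"title": "My blogpost"})
```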

Page 33: Search and analyze your data with elasticsearch

COMPARISON WITH SQL (DELETE)

SQL:
DELETE FROM post WHERE id=1

Elasticsearch:
DELETE http://localhost:9200/blog/post/1

Page 34: Search and analyze your data with elasticsearch

COMPARISON WITH SQL (READ)

SELECT * FROM post WHERE id=1
GET http://localhost:9200/blog/post/1

SELECT * FROM post
GET http://localhost:9200/blog/post/_search

SELECT * FROM post WHERE user='es_user'
GET http://localhost:9200/blog/post/_search?q=user:es_user
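When the `q=` parameter is built in code rather than typed by hand, the value should be URL-encoded. A minimal sketch with the standard library follows; `uri_search_url` is a hypothetical helper name, and the URL is only constructed, not requested.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ES_URL = "http://localhost:9200"  # host and port as used on the slides


def uri_search_url(index, doc_type, field, value):
    """Build a URI search URL of the form .../_search?q=field:value."""
    query_string = urlencode({"q": f"{field}:{value}"})
    return f"{ES_URL}/{index}/{doc_type}/_search?{query_string}"


url = uri_search_url("blog", "post", "user", "es_user")

# Decoding the query string gives back the readable form from the slide.
decoded = parse_qs(urlsplit(url).query)
```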

Page 35: Search and analyze your data with elasticsearch

COMPARISON WITH SQL (READ)

SQL:
SELECT * FROM post WHERE body LIKE '%Having %';

Elasticsearch:
POST http://localhost:9200/blog/post/_search
{
  "query": {
    "match": {
      "body": "Having"
    }
  }
}
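The two statements above are close but not identical: `LIKE` tests for a substring, while `match` compares analyzed tokens. The sketch below builds the request body and contrasts the two behaviours locally. `toy_match` and `tokens` are rough stand-ins invented for this illustration (lowercased word tokens), not the real analysis chain, and `sql_like_contains` ignores collation rules.

```python
import json
import re


def match_query(field, text):
    """Request body for the full-text query on the slide."""
    return {"query": {"match": {field: text}}}


def tokens(text):
    """Very rough stand-in for an analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def toy_match(document_text, query_text):
    """True if any query token equals a document token."""
    doc_tokens = set(tokens(document_text))
    return any(token in doc_tokens for token in tokens(query_text))


def sql_like_contains(document_text, fragment):
    """What LIKE '%fragment%' does: a plain, case-sensitive substring test."""
    return fragment in document_text


body = json.dumps(match_query("body", "Having"))

whole_word = toy_match("Having a lot of text...", "having")
inside_word = toy_match("Behaving badly", "Having")
substring = sql_like_contains("Behaving badly", "having")
```

Here the token comparison finds "Having" regardless of case but does not match it inside "Behaving", whereas the substring test does.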

Page 36: Search and analyze your data with elasticsearch

DEMO TIME

Page 37: Search and analyze your data with elasticsearch

ELASTICSEARCH AND JAVA

• Native Java client
• Spring Data Elasticsearch
• REST endpoints
• Jest (https://github.com/searchbox-io/Jest)

https://github.com/terrafant/es-feeder

Page 38: Search and analyze your data with elasticsearch

DEMO TIME

Page 39: Search and analyze your data with elasticsearch

ELASTICSEARCH USAGE

[Diagram: the application sends SQL requests to the DB over JDBC and talks to the Elasticsearch cluster through an ES client, either over REST (JSON) or over the native protocol (binary)]

Page 40: Search and analyze your data with elasticsearch

ELASTICSEARCH USAGE (DETAILS)

[Diagram: Elasticsearch cluster with a load balancer, three client nodes, one master node, two master-eligible nodes, and twelve data nodes]

Page 41: Search and analyze your data with elasticsearch

ELASTICSEARCH USAGE (ELK)

[Diagram: ELK setup with three Logstash instances, a broker, Elasticsearch, and Kibana; other labels: Frontend, Backend, Browser, DB]

Page 51: Search and analyze your data with elasticsearch

TOP 10 PRODUCTION RECOMMENDATIONS

1. Take care of security
2. Avoid split-brain
3. Use dedicated master nodes
4. Use unicast (not multicast)
5. Configure recovery settings
6. Number of replicas is not less than 2
7. Allocate enough physical memory
8. Configure OS user
9. Use monitoring tools
10. Use Oracle JDKs
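For recommendations 2 and 4, a common rule with Zen discovery (the discovery mechanism of that era) is to require a quorum of master-eligible nodes: floor(n / 2) + 1. The sketch below computes that value and pairs it with a unicast host list. The function names and the host names are hypothetical, and the two setting keys are the Zen-era names, so check them against the version you run.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def split_brain_settings(master_hosts):
    """Settings (Zen-era keys) for a quorum plus an explicit unicast host list."""
    return {
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(len(master_hosts)),
        "discovery.zen.ping.unicast.hosts": list(master_hosts),
    }


# Hypothetical host names, for illustration only.
settings = split_brain_settings(["es-master-1", "es-master-2", "es-master-3"])
```

With three master-eligible nodes the quorum is two, so a single isolated node cannot elect itself master.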

Page 52: Search and analyze your data with elasticsearch

THANK YOU!

Get social: @elastic
Explore the docs: elastic.co/guide
Give it a try: elastic.co/downloads/elasticsearch
Join the community: discuss.elastic.com
Check ELK stack: demo.elastic.co