elasticsearch introduction at bigdata meetup

Introduction to Elasticsearch 27th May 2014 - BigData Meetup Eric Rodriguez @wavyx

Upload: eric-rodriguez

Post on 26-Jan-2015

DESCRIPTION

Global introduction to Elasticsearch presented at the BigData meetup. Use cases, getting started, REST CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...

TRANSCRIPT

Page 1: Elasticsearch Introduction at BigData meetup

Introduction to Elasticsearch

27th May 2014 - BigData Meetup

Eric Rodriguez @wavyx

Page 2: Elasticsearch Introduction at BigData meetup

About Me

Eric Rodriguez, Founder of data.be

• Web entrepreneur

• Data addict

• Multi-Language: PHP, Java/Groovy/Grails, .Net, …

be.linkedin.com/in/erodriguez

github.com/wavyx

@wavyx

Page 3: Elasticsearch Introduction at BigData meetup

Elasticsearch - Company

• Founded in 2012 => http://www.elasticsearch.com

• Professional services

• Training

• Consultancy / Development support

• Production support subscription (3 levels of SLAs)

Page 4: Elasticsearch Introduction at BigData meetup

Enterprises using Elasticsearch

Page 5: Elasticsearch Introduction at BigData meetup

(M)ELK Stack

• Elasticsearch - Search server based on Lucene

• Logstash - Tool for managing events and logs

• Kibana - Visualize logs and time-stamped data

• Marvel - Monitor your cluster’s heartbeat

You Know, for Search…

Page 6: Elasticsearch Introduction at BigData meetup

Logstash

• Collect, parse, index, and search logs

Page 7: Elasticsearch Introduction at BigData meetup

Kibana

• A versatile dashboard to see and interact with your data

Page 8: Elasticsearch Introduction at BigData meetup

Marvel

• Monitor the health of your cluster

• Cluster-wide metrics, overview of all nodes and indices, and events (master election, new nodes)

Page 9: Elasticsearch Introduction at BigData meetup

real time search and analytics engine

open-source

Lucene

JSON

schema free

document store

RESTful API

documentation

scalability

high availability

distributed

multi tenancy

per-operation persistence

Page 10: Elasticsearch Introduction at BigData meetup

Use Cases

• Full-Text Search

• Data Store

• Analytics

• Alerts

• Ads

• …

Page 11: Elasticsearch Introduction at BigData meetup

Copyright 2014 Elasticsearch Inc / Elasticsearch BV. All rights reserved. Content used with permission from Elasticsearch.

Page 18: Elasticsearch Introduction at BigData meetup

Elasticsearch core

• Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java

• Elasticsearch added value: “Simple is best”

• Simple API (with documentation)

• JSON & RESTful

• Sharding & Replication

• Extensibility: plugins and scripts

• Interoperability: clients and integrations

Page 19: Elasticsearch Introduction at BigData meetup

Terms for DBAs

Elasticsearch → RDBMS

• Index → Database

• Type → Table

• Document → Row

• Fields → Column

• Mapping → Schema

Page 20: Elasticsearch Introduction at BigData meetup

Plug & Play

• Zero configuration

• 4 LoC to get started ;)

Page 21: Elasticsearch Introduction at BigData meetup

Alive !

=> http://localhost:9200/?pretty

Page 22: Elasticsearch Introduction at BigData meetup

REST

• Check your cluster, node, and index health, status, and statistics

• Administer your cluster, node, and index data and metadata

• Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes

• Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others

Page 23: Elasticsearch Introduction at BigData meetup

Basic Operations 1/3

• Add a document

• Create index
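The slide showed these two operations as screenshots. A minimal sketch of the requests, assuming a node on http://localhost:9200 (the address from the "Alive !" slide) and the 1.x-era REST paths `/{index}` and `/{index}/{type}/{id}`. The index name `meetup`, the type `talk`, and the document fields are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default local node, as on the "Alive !" slide


def create_index(index):
    """Build the request that creates an empty index: PUT /{index}."""
    return ("PUT", f"/{index}", None)


def add_document(index, doc_type, doc_id, document):
    """Build the request that indexes a document: PUT /{index}/{type}/{id}."""
    return ("PUT", f"/{index}/{doc_type}/{doc_id}", document)


def send(request, base_url=BASE_URL):
    """Send a (method, path, body) tuple to a running node and decode the JSON reply."""
    method, path, body = request
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical index, type, and document, for illustration only.
create_request = create_index("meetup")
add_request = add_document(
    "meetup", "talk", 1, {"title": "Introduction to Elasticsearch", "speaker": "wavyx"}
)
```

With a node running, `send(create_request)` followed by `send(add_request)` would perform both steps. Indexing a document into a missing index normally creates that index on the fly, so the explicit create step is optional.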

Page 24: Elasticsearch Introduction at BigData meetup

Basic Operations 2/3

• Modify/Replace a document

• Delete a document

• Delete index

Page 25: Elasticsearch Introduction at BigData meetup

Basic Operations 3/3

• Update a document
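The modify, delete, and update operations from the last two slides can be sketched as plain request tuples. This assumes the 1.x-era paths (`/{index}/{type}/{id}` and `/{index}/{type}/{id}/_update`); later releases changed these paths. The index, type, and field names are hypothetical.

```python
def replace_document(index, doc_type, doc_id, document):
    """Indexing to an existing id replaces the whole document."""
    return ("PUT", f"/{index}/{doc_type}/{doc_id}", document)


def update_document(index, doc_type, doc_id, partial):
    """A partial update merges the given fields into the stored document."""
    return ("POST", f"/{index}/{doc_type}/{doc_id}/_update", {"doc": partial})


def delete_document(index, doc_type, doc_id):
    """Delete a single document by id."""
    return ("DELETE", f"/{index}/{doc_type}/{doc_id}", None)


def delete_index(index):
    """Delete the index and every document in it."""
    return ("DELETE", f"/{index}", None)


# Hypothetical names, for illustration only.
requests_to_send = [
    replace_document("meetup", "talk", 1, {"title": "Elasticsearch 101"}),
    update_document("meetup", "talk", 1, {"attendees": 42}),
    delete_document("meetup", "talk", 1),
    delete_index("meetup"),
]

for method, path, body in requests_to_send:
    print(method, path, body)
```

The difference between the first two is the point of the slide split: a replace sends the full document again, while an update sends only the changed fields wrapped in `doc`.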

Page 26: Elasticsearch Introduction at BigData meetup

Mapping 1/2

• Define how a document should be mapped (similar to a schema): searchable fields, tokenization, storage, …

• Explicit mapping is defined on an index/type level

• A default mapping is automatically created

Page 27: Elasticsearch Introduction at BigData meetup

Mapping 2/2

• Core types: string, integer/long, float/double, boolean, and null

• Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment

• Example
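The example on the slide was an image. A sketch of what an explicit mapping body could look like, assuming the 1.x-era `PUT /{index}/_mapping/{type}` endpoint and 1.x type names (`string`, `not_analyzed`). The type `talk` and all field names are hypothetical.

```python
import json


def build_mapping(doc_type, properties):
    """Wrap field definitions in the {type: {"properties": ...}} envelope."""
    return {doc_type: {"properties": properties}}


# Hypothetical fields, for illustration only.
talk_mapping = build_mapping(
    "talk",
    {
        "title": {"type": "string"},  # analyzed full-text field
        "speaker": {"type": "string", "index": "not_analyzed"},  # exact-value field
        "attendees": {"type": "integer"},
        "recorded": {"type": "boolean"},
        "location": {"type": "geo_point"},
    },
)

mapping_request = ("PUT", "/meetup/_mapping/talk", talk_mapping)
print(json.dumps(talk_mapping, indent=2))
```

Fields left out of an explicit mapping still get indexed: the default mapping mentioned on the previous slide guesses their type from the first value seen.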

Page 28: Elasticsearch Introduction at BigData meetup

Search API 1/2

• Multi-index, Multi-type

• URI search - Google-like operators (AND/OR), fields, sort, paging, wildcards, …
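A sketch of how a multi-index, multi-type URI search could be assembled, assuming the `q`, `sort`, `from`, and `size` query-string parameters and the 1.x-era `/{indices}/{types}/_search` path. The index, type, and field names are hypothetical.

```python
from urllib.parse import urlencode


def uri_search(indices, types, params):
    """Build a URI search path over several indices and types."""
    path = "/" + ",".join(indices)
    if types:
        path += "/" + ",".join(types)
    return f"{path}/_search?{urlencode(params)}"


# Hypothetical indices, type, and fields, for illustration only.
url = uri_search(
    ["meetup", "conference"],
    ["talk"],
    {
        "q": "title:elasticsearch AND speaker:eric",  # Google-like operators
        "sort": "date:desc",
        "from": 0,  # paging offset
        "size": 10,  # page size
    },
)
print(url)
```

URI search is handy for quick checks from a browser or the command line; anything beyond simple operators is easier to express in the request-body Query DSL.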

Page 29: Elasticsearch Introduction at BigData meetup

Search API 2/2

• Paging & Sort

• Fields: selection, scripts

• Post filter

• Highlighting

• Rescoring

• Explain

• …

Page 30: Elasticsearch Introduction at BigData meetup

Query DSL

• “SQL” for Elasticsearch

• Queries should be used

• for full text search

• where the result depends on a relevance score

• Filters should be used

• for binary yes/no searches

• for queries on exact values
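The split between queries and filters can be sketched as two request bodies. This assumes the 1.x-era `filtered` query, which later releases replaced with a `bool` query carrying a `filter` clause. Field names and values are hypothetical.

```python
import json


def match_query(field, text):
    """Full-text query: results are ranked by relevance score."""
    return {"match": {field: text}}


def term_filter(field, value):
    """Exact-value filter: a document either matches or it does not, with no scoring."""
    return {"term": {field: value}}


def filtered_search(query, filter_clause):
    """Combine a scored query with a yes/no filter (1.x-era 'filtered' query)."""
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


# Hypothetical fields, for illustration only.
body = filtered_search(
    match_query("title", "introduction elasticsearch"),
    term_filter("language", "en"),
)
print(json.dumps(body, indent=2))
```

Because a filter produces no score, its result can be cached and reused across requests, which is why exact-value conditions belong in the filter part.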

Page 31: Elasticsearch Introduction at BigData meetup

Basic Queries

Page 32: Elasticsearch Introduction at BigData meetup

Basic Filters

Page 33: Elasticsearch Introduction at BigData meetup

Analysis 1/2

• Analysis is extracting “terms” from a given text

• Processing natural language to make it computer searchable

• Configurable registry of Analyzers that can be used

• to break analyzed fields into terms when a document is indexed

• to process query strings

Page 34: Elasticsearch Introduction at BigData meetup

Analysis 2/2

• Analyzers are composed of

• a single Tokenizer (may be preceded by one or more CharFilters)

• zero or more TokenFilters

• Default Analyzers: standard, pattern, whitespace, language, snowball
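The analyzer structure described above (CharFilters, then one Tokenizer, then TokenFilters) can be illustrated with a toy pipeline in plain Python. This is a conceptual sketch only, not Elasticsearch or Lucene code; every function name and the stopword list are made up for the illustration.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "of"}  # tiny illustrative list


def strip_html(text):
    """CharFilter: rewrite the raw character stream before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text):
    """Tokenizer: split the text into tokens (exactly one per analyzer)."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """TokenFilter: normalize case."""
    return [token.lower() for token in tokens]


def remove_stopwords(tokens):
    """TokenFilter: drop very common words."""
    return [token for token in tokens if token not in STOPWORDS]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run the three stages in order and return the resulting terms."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze(
    "<p>The Quick Brown Foxes</p>",
    [strip_html],
    word_tokenizer,
    [lowercase, remove_stopwords],
)
print(terms)  # ['quick', 'brown', 'foxes']
```

The same chain has to run on query strings as well: a search for "Quick" only finds the indexed term `quick` if the query text is lowercased by the same analyzer.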

Page 36: Elasticsearch Introduction at BigData meetup

Analytics

• Aggregation of information: similar to “group by”

• Facets

• Aggregated data based on a search query

• One-dimensional results

• Ex: “term facets” return facet counts for the values of a specific field. Think color, tag, category, …

• Aggregations (ES 1.0+)

• Nested Facets

• Basic Stats: mean, min, max, std dev, term counts

• Significant Terms, Percentiles, Cardinality estimations

Page 37: Elasticsearch Introduction at BigData meetup

Facets

• Not yet deprecated, but use aggregations!

• Various facets: terms, range, histogram, date, statistical, geo distance, …

Page 38: Elasticsearch Introduction at BigData meetup

Aggregations

• A generic, powerful framework that can be divided into 2 main families:

• Bucketing: each bucket is associated with a key and a document criterion. The aggregation process provides a list of buckets, each one with a set of documents that "belong" to it.

• Metric: aggregations that keep track of and compute metrics over a set of documents.

• Aggregations can be nested !
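A sketch of a nested aggregation request body: a `terms` bucket aggregation with an `avg` metric computed inside each bucket. It assumes the `aggs` request-body syntax introduced with aggregations in 1.0. The aggregation names and fields are hypothetical.

```python
import json


def metric(kind, field):
    """Metric aggregation, e.g. avg, min, max, sum over one field."""
    return {kind: {"field": field}}


def terms_bucket(field, sub_aggs=None):
    """Bucket aggregation: one bucket per distinct value of the field."""
    aggregation = {"terms": {"field": field}}
    if sub_aggs:
        aggregation["aggs"] = sub_aggs  # nested aggregations run once per bucket
    return aggregation


# Hypothetical aggregation names and fields, for illustration only.
body = {
    "aggs": {
        "by_category": terms_bucket(
            "category", {"avg_price": metric("avg", "price")}
        )
    }
}
print(json.dumps(body, indent=2))
```

In SQL terms this is roughly a "group by category" with an average price per group, which a one-dimensional facet cannot express in a single request.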

Page 39: Elasticsearch Introduction at BigData meetup

Bucket Aggregators

• global

• filter

• missing

• terms

• range

• date range

• ip range

• histogram

• date histogram

• geo distance

• geohash grid

• nested

• reverse nested

• top hits (version 1.3)

Page 40: Elasticsearch Introduction at BigData meetup

Metrics Aggregators

• count

• stats

• extended stats

• cardinality

• percentiles

• min

• max

• sum

• avg

Page 41: Elasticsearch Introduction at BigData meetup

Search for end users

• Suggesters - “Did you mean”: Terms, Phrases, Completion, Context

• “More like this”: find documents that are "like" the provided text by running it against one or more fields

Page 42: Elasticsearch Introduction at BigData meetup

Percolator

• Classic ES

1. Add & Index documents

2. Search with queries

3. Retrieve matching documents

• Percolator

1. Add & Index queries

2. Percolate documents

3. Retrieve matching queries
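The inversion described above (store queries, then run documents against them) can be illustrated with a toy in-memory percolator. This is a conceptual sketch in plain Python, not the Elasticsearch percolator API: the class, its methods, and the "all terms must appear" query model are assumptions made for the illustration.

```python
class ToyPercolator:
    """Stores queries up front and matches incoming documents against them."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        """Step 1: add and index a query (here: a set of terms that must all appear)."""
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document_text):
        """Steps 2 and 3: run a document through and return the ids of matching queries."""
        tokens = set(document_text.lower().split())
        return sorted(
            query_id
            for query_id, terms in self.queries.items()
            if terms <= tokens  # every required term is present in the document
        )


# Hypothetical alert queries, for illustration only.
percolator = ToyPercolator()
percolator.register("es-news", ["elasticsearch"])
percolator.register("storm-alert", ["storm", "warning"])

matches = percolator.percolate("New Elasticsearch release announced")
print(matches)  # ['es-news']
```

Each registered query acts like a standing subscription: the document is never stored, only the list of queries it satisfies comes back.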

Page 43: Elasticsearch Introduction at BigData meetup

Why Percolate ?!

• Alerts: social media mentions, weather forecast, news alerts

• Automatic Monitoring: price monitoring, stock alerts, logs

• Ads: display targeted ads based on user’s search queries

• Enrich: percolate new documents, then add query matches as document tags

Page 44: Elasticsearch Introduction at BigData meetup

High Availability 1/2

• Sharding - Write Scalability

• Split logical data over multiple machines & Control data flows

• Each index has a fixed number of shards

• Improve indexing performance

• Replication - Read Scalability

• Each shard can have 0-many replicas (dynamic setup)

• Removing SPOF (Single Point Of Failure)

• Improve search performance
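A sketch of why the shard count is fixed while the replica count stays dynamic. A document is routed to a primary shard by hashing its routing value (its id by default) modulo the number of primary shards, so changing that number would send existing ids to different shards. The `crc32` hash below is only a stand-in for Elasticsearch's internal hash function, and the index settings values are hypothetical.

```python
import zlib


def index_settings(shards, replicas):
    """Settings body for index creation; only the replica count can be changed later."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def pick_shard(routing_value, number_of_primary_shards):
    """Route a document: hash(routing) modulo the number of primary shards.

    zlib.crc32 stands in for the real hash function (an assumption of this sketch).
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


settings = index_settings(shards=5, replicas=1)

for doc_id in ["talk-1", "talk-2", "talk-3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Replicas are full copies of a shard, so adding or removing them never changes where a document is routed; it only changes how many nodes can serve reads for that shard.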

Page 45: Elasticsearch Introduction at BigData meetup

High Availability 2/2

• Zen Discovery

• Automatic discovery of nodes within a cluster and electing a master node

• Useful for failover and replication

• Specific modules: Amazon EC2, Microsoft Azure, Google Compute Engine

• Snapshot & Restore module

Page 46: Elasticsearch Introduction at BigData meetup

Cluster Management

• Marvel - http://www.elasticsearch.org/overview/marvel/

• BigDesk - http://bigdesk.org/

• Paramedic - https://github.com/karmi/elasticsearch-paramedic

• KOPF - https://github.com/lmenezes/elasticsearch-kopf/

• Elastic HQ - http://www.elastichq.org/

Page 47: Elasticsearch Introduction at BigData meetup

Clients & Integration

• Ecosystem: Kibana, Logstash, Marvel, Hadoop integration

• API Clients: Java, Javascript, Groovy, PHP, Perl, Python, .Net, Ruby, Scala, Clojure, Go, Erlang, …

• Integrations: Grails, Django, Play!, Symfony2, Carrot2, Spring, Drupal, Wordpress, …

• Rivers: CouchDB, JDBC, MongoDB, Neo4j, Redis, RabbitMQ, ActiveMQ, Amazon SQS, File System, Twitter, Wikipedia, RSS, …

Page 48: Elasticsearch Introduction at BigData meetup

Fast & Furious Evolution

Version 1.1 - March 25, 2014

• Cardinality Agg

• Percentiles Agg

• Significant Terms Agg

• Search Templates

• Cross fields search

• Alias for indices & templates

Version 1.2 - May 22, 2014

• Java 7

• Indexing & Merging performance

• Aggregations performance

• Context suggester

• Deep scrolling

• Field value factor

Benchmark API coming in 1.3

Version 1.0 - Feb 12, 2014

• Aggregations

• Snapshot & Restore

• Distributed Percolator

• Cat API

• Federated search

• Doc values

• Circuit breaker

Page 49: Elasticsearch Introduction at BigData meetup

Resources

• http://www.elasticsearch.org/guide/

• http://www.elasticsearch.org/videos/

• http://www.elasticsearchtutorial.com/

• http://exploringelasticsearch.com/

• http://joelabrahamsson.com/elasticsearch-101/

• http://belczyk.com/2014/01/elasticsearch-recomended-learning-materials/

• http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html

Page 50: Elasticsearch Introduction at BigData meetup

Books

• Elasticsearch Server: http://www.packtpub.com/elasticsearch-server-2e/book

• Elasticsearch in Action: http://www.manning.com/hinman/

Page 51: Elasticsearch Introduction at BigData meetup

Books

• Elasticsearch Cookbook: http://www.packtpub.com/elasticsearch-cookbook/book

• Mastering Elasticsearch: http://www.packtpub.com/mastering-elasticsearch-querying-and-data-handling/book

Page 52: Elasticsearch Introduction at BigData meetup

Books

• Elasticsearch - The Definitive Guide: http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/

Page 53: Elasticsearch Introduction at BigData meetup

Thank you

[email protected] - @wavyx

be.linkedin.com/in/erodriguez - github.com/wavyx

http://www.meetup.com/ElasticSearch-User-Group-Belux-Belgium-Luxembourg/