elasticsearch : petit déjeuner du 13 mars 2014

Petit dejeuner

13 Mars 2014

Hotel Intercontinental

[email protected]

[email protected]

Agenda & Intervenants

Introduction

Alter Way in 2 slides

Elasticsearch in 1 slide

• More than 6 million downloads

• 450,000 New Downloads per Month

• 1000s of Mission Critical Implementations

• Top Investors: Benchmark Capital, Index

Ventures

• Seasoned Executive Team

– Founded by Creator of Elasticsearch

– Seasoned Executives from SpringSource

Les enjeux de la recherche à

l’ère du BigData

Big Data in Todayʼ’s Business and Technology

Environment : some significant figures

• 2.7 Zetabytes of data exist in the digital universe today. (=1 billion Terabytes)

• 235 Terabytes of data has been collected by the U.S. Library of Congress in April

2011.

• Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data.

• Akamai analyzes 75 million events per day to better target advertisements.

• Walmart handles more than 1 million customer transactions every hour, which is

imported into databases estimated to contain more than 2.5 petabytes of data.

• The largest AT&T database boasts titles including the largest volume of data in one

unique database (312 terabytes) and the second largest number of rows in a unique

database (1.9 trillion), which comprises AT&Tʼ’s extensive calling records.

• Hadoop :

– 94% of Hadoop users perform analytics on large volumes of data not possible

before

– 88% analyze data in greater detail;

– while 82% can now retain more of their data.

The Rapid Growth of Unstructured Data

• YouTube users upload 48 hours of new video every minute of the

day.

• 500+ new websites are created every minute of the day.

• Brands and organizations on Facebook receive 34,722 Likes every

minute of the day.

• 100 terabytes of data uploaded daily to Facebook.

• According to Twitterʼ’s own research in early 2012, it sees roughly

175 million tweets every day, and has more than 465 million

accounts.

• 30 Billion pieces of content shared on Facebook every month.

Data production will be 44 times greater in 2020 than it was in 2009. .

Big Data & Real Business Issues

• 25+ % of decision‐makers surveyed predict that data volumes in their

companies will rise by more than 60% by the end of 2014, with the

average of all respondents anticipating a growth of no less than 42 %.

• 40% projected growth in global data generated per year vs. 5% growth in

global IT spending.

• According to estimates, the volume of business data worldwide, across all

companies, doubles every 1.2 years.

– Poor data can cost businesses 20%–35% of their operating revenue.

– Bad data or poor data quality costs US businesses $600 billion annually.

• 75+ % of decision-makers surveyed anticipate significant impacts in the

domain of storage systems as a result of the “Big Data” phenomenon.

• We anticipate a new challenge : to be able to Search and Analyse all

those datas … in real time !

Présentation des

fonctionnalités

d’Elasticsearch, Logstash,

Kibana et Marvel

StartUp

search = like % ?SELECT doc.*, pays.* FROM doc, pays WHERE doc.pays_code = pays.code AND doc.date_doc > to_date('2011-12', 'yyyy-mm') AND doc.date_doc < to_date('2012-01', 'yyyy-mm') AND lower(pays.libelle) = 'france' AND lower(doc.commentaire) LIKE ‘%produit%' AND lower(doc.commentaire) LIKE ‘%david%';

StartUp

Moteur de recherche ?

elasticsearch ?

plug & play

REST/JSON

scalable

Apache 2 license

Lucene

elasticsearch

Start…

$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.0.tar.gz!$ tar -xf elasticsearch-1.0.0.tar.gz!$ ./elasticsearch-1.0.0/bin/elasticsearch![INFO ][node ][Ghost Maker] {1.0.0}[5645]: initializing

… and play!$ curl -XPUT localhost:9200/sessions/session/1 -d '{! "title" : "Elasticsearch",! "subtitle" : "Make sense of your (BIG) data !",! "date" : "2014-02-13T16:30:00",! "tags" : [ "elasticsearch", "elosi", "cloud" ],! "speaker" : [{! "first_name" : "David", ! "last_name" : "Pilato" ! }]!}'

Search!$ curl -XPOST http://localhost:9200/sessions/session/_search -d' { "query": { "multi_match": { "query": "elasticsearch elosi david", "fields": [ "title^2", "speaker.first_name" ] } }, "post_filter": { "range": { "date": { "from": "2014-02-13", "to": "2014-02-14" } } } }'

StartUp

Compute?

$ curl -XPOST http://localhost:9200/sessions/session/_search -d' { "query": { ... }, "aggs": { "by_date": { "date_histogram": { "field": "date", "interval": "day", "format" : "dd/MM/yyyy" } } } }'

"by_date": [ { "key_as_string": "11/02/2014", "doc_count": 1 }, { "key_as_string": "12/02/2014", "doc_count": 2 }, { "key_as_string": "13/02/2014", "doc_count": 3 } ]

Compute!

Let’s make sense of …• logs

• twitter

• github

• marketing data

• ...

• your data

• your big data

{ "name":"Pilato David", "dateOfBirth":"1971-12-26", "gender":"male", "marketing":{ "fashion":334, "music":3363, "hifi":2351 }, "address":{ "country":"France", "countrycode":"FR", "city":"Paris" } }

Let’s make sense of …• logs

• twitter

• github

• marketing data

• ...

• your data

• your big data

démo#mstechdays #elasticsearch StartUp

MAKE SENSE OF YOUR (BIG) DATA!

let’s inject some marketing documents…

ELASTICSEARCH

Elastique ? Distribué ?

StartUp

Distributed indices node 1

orders

products

1 2

3 4

1 2

$ curl -XPUT localhost:9200/orders -d '{! "settings.index.number_of_shards" : 4,! "settings.index.number_of_replicas" : 1!}'

$ curl -XPUT localhost:9200/products -d '{! "settings.index.number_of_shards" : 2,! "settings.index.number_of_replicas" : 0!}'

StartUp


orders

products

1 2

3 4

1 2

node 2

$ ./elasticsearch-1.0.0/bin/elasticsearch![INFO ][node ][Armageddon] {1.0.0}[5645]: initializing [INFO ][cluster.service][Armageddon] detected_master [Ghost Maker]

StartUp


orders

products

1

4

1

node 2

orders

products

2

3

2

2

3

1

4

2

3

2

StartUp

node 3


orders

products

1

4

1

node 2

orders

products

2

3

2

2

3

1

4

$ ./elasticsearch-1.0.0/bin/elasticsearch![INFO ][node ][Karnak] {1.0.0}[5645]: initializing [INFO ][cluster.service][Karnak] detected_master [Ghost Maker]

StartUp

node 3

products

orders


orders

products

1

4

1

node 2

orders

products

2

33

2

2

3

1

4 3

1

4

elasticsearch.elasticsearch

kibana

logstash

Marvel

Elasticsearch 1.0 : une

solution prête pour la

production !

Revolutionizing Data Search

and Analytics

Purpose of Elasticsearch

• Organize data and make it easily accessible

– Through powerful search and analytics

– Easily consumable (even for non-data scientists)

– Elegantly handles extremely large data volumes

– Delivers results in real time

• Technology stack agnostic

• Used across all market verticals

Features of Elasticsearch

• Structured & unstructured search

• Advanced analytics capabilities

• Unmatched performance

• Real-time results

• Highly scalable

• User friendly installation and maintenance

User: GitHub

Searches 20TB of data, 1.3 billion files and 130

lines of code using Elasticsearch

User: Foursquare

Searches 50,000,000 venues every day using

Elasticsearch

User: Fog Creek Software

Searches 40,000,000,000 (40 billion) lines of code

in real-time using Elasticsearch

User: StumbleUpon

Delivers millions of recommendations every day

using Elasticsearch

Example: Email ArchivingEmail Archiving of 2 Petabytes of data across 100’s of servers

Big data, structured and unstructured

Example: Support AgentsCustom Support – Search, Facets, and Reports

Real time metrics

What’s new in ES 1.0

• Snapshot & Restore

• Distributed Percolation

• Aggregations

• Much more….

Unprecedented Uptake

Elasticsearch has more than 6 Million downloads

… and 450,000 more each month

Cumulative

User Raves

Chris Cowan @uhduh

I’m in love with @elasticsearch! I want to use it for everything right now!

Alain Richardt @alaincxs

Moving ffrom #solr to # Elasticsearch is like upgrading from a Reliant Robin to a McLaren

F1

Pete Connolly @peteconnolly

Two really useful and productive days of training from @kimchy and @uboness all

about #elasticsearch. Best training course in years

Cyril Lacôte @clacote

#ElasticSearch is the s*&t. Amazingly simple and powerful. Open source is awesome.

That's made my day.

Logan Lowell @fractaloop

Tweaking @elasticsearch for huge indexes can be fun. I'm very glad the IRC channel is

so helpful too.

Product Offerings:Support Throughout Your Project

1. Core Elasticsearch Training

2. Development and Production Support

1: Training

Core Elasticsearch Training

• Two day classroom training

• Delivered by Elasticsearch developers

1. Worldwide Public Courses

2. Onsite Training Course

2: Support

Resources

• www.elasticsearch.com

• www.elasticsearch.org

• User Groups:

http://www.elasticsearch.org/community/forum/

• Contact:

Richard Maurer

Territory Manager

[email protected]

http://www.elasticsearch.com

http://www.elasticsearch.org

http://www.elasticsearch.org/community/forum/

Comment insérer

Elasticsearch dans votre

système d’information et en

tirer le meilleur parti !

Elasticsearch to do What ?

SEARCH

ANALYZE

Are you ready to use

ElasticSearch in your IT?

Github projects

• Many

project

s

• Great

activity

• Many

langua

ges

Clients

Scripting Plugins Language

Why it ‘s easy

• One to

many

• Zero

conf

• Cloud

oriente

d

• Scalab

ility

Start Small Grow Big

Where / How can you use

ElasticSearch?

VIA

Centralized Log Storage (eg.)

…

CMS Search Engine

• Faceting

• Fuzzy Search

• Speed

• Auto Completion

• Geo Search

• Log Analysis

Ecommerce Enhanced Search

Engine

• REST based

• Memory and I/O efficient

• Adaptive I/O

• Map/Reduce API support

• Pig support

• Hive support

• = elasticsearch-hadoop

Combining Hadoop & ElasticSearch

It’s up to you to decide what to build with ES

Analysis / Dasboards

Some Examples

Kibana examples : IRC Activity

Kibana examples : Pfsense Monitoring

Kibana examples : Windows Events

Kibana examples : Inventory

Kibana examples : Syslog

ES = No Limits

Conclusion

Conclusion

• Il est temps de révolutionner la façon dont vous valorisez

vos données : offrez Elasticsearch à vos applicatifs !

• La stack ELK (Elasticsearch, Logstash, Kibana) en

version 1.0 est prête pour la production !

• Faites vous accompagner pour bénéficier des bonnes

pratiques et du support à tous les stades de votre projet :

conception, développement, production

Questions / Réponses