Alter Way Big Data seminar - Elasticsearch - October 2014
Agenda & Speakers
Introduction
Alter Way in 2 slides
Elasticsearch in 1 slide
• More than 11 million downloads
• 650,000 New Downloads per Month
• 1000s of Mission Critical Implementations
• Top Investors: Benchmark Capital, Index Ventures
• Seasoned Executive Team
– Founded by Creator of Elasticsearch
– Seasoned Executives from SpringSource
The challenges of search in
the Big Data era
Big Data in Today's Business and Technology
Environment: some significant figures
• 2.7 Zettabytes of data exist in the digital universe today (1 Zettabyte = 1 billion Terabytes).
• 235 Terabytes of data had been collected by the U.S. Library of Congress as of
April 2011.
• Facebook stores, accesses, and analyzes 30+ Petabytes of user generated
data.
• Akamai analyzes 75 million events per day to better target advertisements.
• Walmart handles more than 1 million customer transactions every hour,
which is imported into databases estimated to contain more than 2.5 petabytes
of data.
• The largest AT&T database holds the records for the largest volume of data in
one unique database (312 terabytes) and the second largest number of rows in
a unique database (1.9 trillion); it comprises AT&T's extensive calling
records.
• Hadoop :
– 94% of Hadoop users perform analytics on large volumes of data not
possible before
– 88% analyze data in greater detail;
– while 82% can now retain more of their data.
The Rapid Growth of Unstructured Data
• YouTube users upload 48 hours of new video every minute of the
day.
• 500+ new websites are created every minute of the day.
• Brands and organizations on Facebook receive 34,722 Likes every
minute of the day.
• 100 terabytes of data uploaded daily to Facebook.
• According to Twitter's own research in early 2012, it sees roughly
175 million tweets every day, and has more than 465 million
accounts.
• 30 Billion pieces of content shared on Facebook every month.
Data production will be 44 times greater in 2020 than it was in 2009.
Big Data & Real Business Issues
• 25+ % of decision‐makers surveyed predict that data volumes in their
companies will rise by more than 60% by the end of 2014, with the
average of all respondents anticipating a growth of no less than 42 %.
• 40% projected growth in global data generated per year vs. 5% growth in
global IT spending.
• According to estimates, the volume of business data worldwide, across all
companies, doubles every 1.2 years.
– Poor data can cost businesses 20%–35% of their operating revenue.
– Bad data or poor data quality costs US businesses $600 billion annually.
• 75+ % of decision-makers surveyed anticipate significant impacts in the
domain of storage systems as a result of the “Big Data” phenomenon.
• We anticipate a new challenge: being able to search and analyze all
this data … in real time!
Elasticsearch
A solution already in production
with significant French
implementations
Revolutionizing Data Search and Analytics
Richard Maurer, SEMEA Territory Manager
Purpose of Elasticsearch
• Organize data and make it easily accessible
– Through powerful search and analytics
– Easily consumable (even for non-data scientists)
– Elegantly handles extremely large data volumes
– Delivers results in real time
• Technology stack agnostic
• Used across all market verticals
Features of Elasticsearch
• Structured & unstructured search
• Advanced analytics capabilities
• Unmatched performance
• Real-time results
• Highly scalable
• User friendly installation and maintenance
Elasticsearch 1.4: a production-ready solution
• Real-time data indexing
• Distributed
• High Availability
• Schema Free
• Real Time Data Analytics
• Multi Tenancy
• Much more….
Unprecedented Uptake
Elasticsearch has more than 11 million downloads
… and 650,000 more each month
Cumulative
French Users
French Use Cases
Bouygues Telecom:
Uses Elasticsearch in their Big Data Platform. Cut their web resolution time by 10X
Dailymotion:
Indexing their 20 million videos on Elasticsearch. In production for over 2 years
Voyages SNCF:
They recently announced that ES is live on their “Usine Logicielle” (software factory)
Fotolia:
Search engine built on Elasticsearch to access 24 million images, after moving over to ES
Orange:
With over 1.2 billion documents, looking for a better solution and cost reduction
Product Offerings: Support Throughout Your Project
1. Core Elasticsearch Training (2 days)
2. ELK Workshop (1 day)
3. Development and Production Support
4. Marvel, Monitoring of your ES clusters
2: Support
Resources
• www.elasticsearch.com
• www.elasticsearch.org
• User Groups:
http://www.elasticsearch.org/community/forum/
• Contact:
Richard Maurer
Territory Manager
MAKE SENSE OF YOUR (BIG) DATA!
David Pilato
Technical advocate
elasticsearch.
@dadoonet
StartUp
data ?
BIG data ?
Source: http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode
35,000,000,000,000,000 MB (35 zettabytes)
Source: http://www.domo.com/learn/data-never-sleeps-2
search = like % ?
SELECT
  doc.*, country.*
FROM
  doc, country
WHERE
  doc.country_code = country.code AND
  doc.date_doc > to_date('2011-12', 'yyyy-mm') AND
  doc.date_doc < to_date('2012-01', 'yyyy-mm') AND
  lower(country.name) = 'france' AND
  lower(doc.comment) LIKE '%product%' AND
  lower(doc.comment) LIKE '%david%';
Search engine ?
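Why a search engine rather than LIKE '%…%'? A leading-wildcard LIKE generally has to scan every comment, whereas a search engine looks terms up in an inverted index built ahead of time. The following is a minimal Python sketch of that idea, not how Lucene is actually implemented; the sample comments and the function names (tokenize, build_index, search_all) are made up for illustration. Note that it matches whole terms, while LIKE matches substrings.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical comments, standing in for the doc.comment column.
docs = {
    1: "Great product, thanks David",
    2: "David asked about delivery",
    3: "The product arrived late",
}
index = build_index(docs)
print(search_all(index, "product david"))  # {1}
```

Each query term costs one dictionary lookup plus a set intersection, regardless of how long the individual comments are.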
elasticsearch ?
plug & play
REST/JSON
scalable
Apache 2 license
Lucene
elasticsearch
Start…
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.tar.gz
$ tar -xf elasticsearch-1.1.1.tar.gz
$ ./elasticsearch-1.1.1/bin/elasticsearch
[INFO ][node ][Ghost Maker] {1.1.1}[5645]: initializing
… and play!
$ curl -XPUT localhost:9200/sessions/session/1 -d '{
  "title" : "Elasticsearch",
  "subtitle" : "Make sense of your (BIG) data !",
  "date" : "2014-05-20T10:30:00",
  "tags" : [ "elasticsearch", "alterway", "bigdata" ],
  "speakers" : [{
    "first_name" : "David",
    "last_name" : "Pilato"
  }]
}'
Search!
$ curl http://localhost:9200/sessions/session/_search -d '{
  "query": {
    "multi_match": {
      "query": "elasticsearch alterway david",
      "fields": [ "title^3", "tags^2", "speakers.first_name" ]
    }
  },
  "post_filter": {
    "range": {
      "date": { "from": "2014-05-01", "to": "2014-06" }
    }
  }
}'
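In the search request, "title^3" and "tags^2" are boosts: a match in the title weighs three times, and a match in the tags twice, as much as a match in speakers.first_name. As a sketch of how an application might assemble the same request body, here is a small Python helper. The name build_search_body is hypothetical, the range bounds are written with gte/lte instead of the slide's from/to, and nothing is sent over HTTP since that would need a running node.

```python
import json


def build_search_body(text, start, end):
    """Build a search body shaped like the one on the slide."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "^3" and "^2" boost matches found in those fields.
                "fields": ["title^3", "tags^2", "speakers.first_name"],
            }
        },
        # Applied after the query; keeps only hits inside the date range.
        "post_filter": {
            "range": {"date": {"gte": start, "lte": end}}
        },
    }


body = build_search_body("elasticsearch alterway david", "2014-05-01", "2014-06")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, exactly as in the slide.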
Compute?
$ curl http://localhost:9200/sessions/session/_search -d '{
  "query": { ... },
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "format" : "dd/MM/yyyy"
      }
    }
  }
}'
"by_date": [
  { "key_as_string": "03/04/2014", "doc_count": 1 },
  { "key_as_string": "12/04/2014", "doc_count": 2 },
  { "key_as_string": "16/04/2014", "doc_count": 3 }
]
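To make the by_date result concrete: a day-interval date histogram groups documents by calendar day and counts them. The Python sketch below reproduces that computation in memory, on hypothetical timestamps chosen to give the same counts as the slide. The function name date_histogram_by_day is invented, and this is only an illustration of the result, not of how Elasticsearch computes it across shards.

```python
from collections import Counter
from datetime import datetime


def date_histogram_by_day(timestamps, fmt="%d/%m/%Y"):
    """Count timestamps per calendar day, one bucket per non-empty day."""
    days = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").date() for ts in timestamps
    )
    return [
        {"key_as_string": day.strftime(fmt), "doc_count": count}
        for day, count in sorted(days.items())
    ]


# Hypothetical values of the "date" field of six session documents.
timestamps = [
    "2014-04-03T09:00:00",
    "2014-04-12T10:30:00", "2014-04-12T14:00:00",
    "2014-04-16T09:00:00", "2014-04-16T11:00:00", "2014-04-16T16:30:00",
]
by_date = date_histogram_by_day(timestamps)
for bucket in by_date:
    print(bucket)
```

Like the output shown on the slide, the sketch lists only days that contain at least one document.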
Compute!
#mstechdays #elasticsearch
Let’s make sense of …
• logs
• twitter
• github
• marketing data
• ...
• your data
• your big data
{
  "name": "Pilato David",
  "dateOfBirth": "1971-12-26",
  "gender": "male",
  "children": 3,
  "marketing": {
    "fashion": 334,
    "music": 3363,
    "hifi": 2351
  },
  "address": {
    "country": "France",
    "city": "Paris",
    "location": [2.332395, 48.861871]
  }
}
Demo
MAKE SENSE OF YOUR (BIG) DATA!
let’s inject some marketing documents…
elasticsearch
kibana
logstash
Marvel
@dadoonet
thanks
How to integrate Elasticsearch
into your information system
and get the most out of it
Elasticsearch to do what?
STORE
SEARCH
ANALYZE
Are you ready to use
ElasticSearch in your IT?
What you need to run it
• Java 8 update 20 or later, or Java 7 update 55 or later
• Only Oracle’s Java and the OpenJDK are supported.
GitHub projects
• Many projects
• Big activity
• Many languages
6 months!
Clients
Scripting Plugins Language
Why it's easy
• One to many
• ~ Zero conf
• Cloud oriented
• Scalability DNA
• Replication
• Sharding
• Distributed
• Resilience
• Snapshot
• Restore
Start Small Grow Big
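Sharding is what lets an index start on one node and grow across many: each document is assigned to one primary shard by hashing its routing value (the document id by default) modulo the number of primary shards, and shards can then be spread over nodes. A minimal Python sketch of that rule follows. zlib.crc32 is only a stand-in for the hash function Elasticsearch actually uses, and shard_for is a hypothetical name.

```python
import zlib
from collections import Counter


def shard_for(routing, number_of_primary_shards):
    """Pick a primary shard: hash(routing) % number_of_primary_shards."""
    # crc32 is a stand-in hash for illustration, not the one used by ES.
    digest = zlib.crc32(routing.encode("utf-8"))
    return digest % number_of_primary_shards


# Spread 1000 hypothetical document ids over 5 primary shards.
doc_ids = [str(i) for i in range(1000)]
distribution = Counter(shard_for(doc_id, 5) for doc_id in doc_ids)
print(sorted(distribution.items()))
```

Because the shard number depends on the modulus, the same id can land on a different shard when the number of primary shards changes. That is why the primary shard count is chosen when the index is created, while the number of replicas can be adjusted later.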
Where / How can you use
ElasticSearch?
VIA
Centralized Log Storage 1/2
Centralized Log Storage 2/2
…
CMS Search Engine
• Faceting
• Fuzzy Search
• Speed
• Auto Completion
• Geo Search
• Log Analysis
Ecommerce Enhanced Search Engine
• REST based
• Memory and I/O efficient
• Adaptive I/O
• Map/Reduce API support
• Pig support
• Hive support
elasticsearch-hadoop
Combining Hadoop & ElasticSearch
What Else ?
It’s up to you to decide what to build with ES
Analysis / Dashboards
Some Examples
Kibana examples : IRC Activity
Kibana examples : pfSense Monitoring
Kibana examples : Windows Events
Kibana examples : Inventory
Kibana examples : Syslog
Kibana examples : Web Activity
ES = No Limits
Conclusion
• It is time to revolutionize the way you extract value from
your data: give your applications Elasticsearch!
• The ELK stack (Elasticsearch, Logstash, Kibana) is already
widely used in production!
• Get expert guidance to benefit from best
practices and support at every stage of your project:
design, development, production
Questions & Answers