Delhi Elasticsearch Meetup
Bharvi Dixit @d_bharvi
Nov 29, 2014 Delhi Elasticsearch Meetup: Talk About Most Advance Search Engine of The world "Elasticsearch"
Agenda
• What is a search engine?
• Lucene Overview and Indexing Pipeline
• Data Driven Approaches & Problems
• Elasticsearch Comes to Rescue
• Understanding Elasticsearch Architecture
• Logstash & Kibana Overview
• The ELK stack together
• Some tips
About Me
• Software engineer @Orkash.
• Loves Java, Data, Elasticsearch, MongoDB, Eclipse.
• Interested in all things scale, search, security & DevOps.
• Creator: CIBET Pro Manager
• Working on Elasticsearch for more than a year.
• Social Media and News Media Intelligence. (Complex schemas & Query designs)
What is a search engine?
• An information retrieval system designed to find information stored in a computer system.
A search engine has different modules:
• Data collected from various sources
• Indexing
• Data stored in indexes
• Data is queried
• But what about the relevant or irrelevant results??
What is a search engine?
• Auto completion
• Did-You-Mean
• Spell correction
• Multi-lingual
• Stemming
• Synonyms
• Highlighting
• More-Like-This
Lucene Overview
Lucene:
• Open source, fast, high-performance search/IR library.
• Written in Java.
• Initially developed by Doug Cutting (also author of Hadoop).
• Indexing and searching.
• Inverted index of documents.
• Provides advanced search options like synonyms, stopwords, similarity, proximity.
Lucene Internals- Inverted Index
Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
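To make the inverted index idea concrete, here is a minimal sketch in Python. It is not how Lucene implements its index; the function names and the three sample documents are made up for illustration, and tokenization is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene is a search library",
    3: "Kibana visualizes search results",
}
index = build_inverted_index(docs)
```

A lookup such as `search(index, "search library")` only touches the posting lists of the two terms instead of scanning every document, which is the point of inverting the index.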
Lucene Internals- Continued
• Defines a document model.
• An index contains documents.
• Each document consists of fields.
• Each field has attributes.
– What is the data type (FieldType)
– How to handle the content (Analyzers, Filters)
– Is it a stored field (stored="true") or an indexed field (indexed="true")
Indexing Pipeline
• Analyzer: creates tokens using a Tokenizer and/or by applying Filters (Token Filters).
• Each field can define an Analyzer at index time, at query time, or both.
Document → Tokenizer → Token Filter → Document Writer → Inverted Index
(Analysis Phase: Tokenizer and Token Filter)
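The analysis phase can be sketched as a tokenizer followed by a chain of token filters. This is an illustrative Python sketch, not Lucene's Analyzer API; the function names and the small stopword set are assumptions chosen for the example.

```python
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "is", "of", "on"}


def tokenizer(text):
    """Split text into raw tokens on any non-alphanumeric character."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase_filter(tokens):
    """Token filter: normalize case so 'Lucene' and 'lucene' match."""
    return [token.lower() for token in tokens]


def stopword_filter(tokens):
    """Token filter: drop very common words that carry little meaning."""
    return [token for token in tokens if token not in STOPWORDS]


def analyze(text, filters=(lowercase_filter, stopword_filter)):
    """Run the tokenizer, then apply each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens
```

Running the same `analyze` function on documents at index time and on the query string at query time keeps the terms comparable, which is why a field usually shares one analyzer for both.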
Everything starts with a problem..!!
• Data Driven Decisions
• Logfiles for scaling up/down
• Warehouse withdrawal triggers orders
• History for fraud detection
• Assembly line, throughput improvement
... data explosion
Everything starts with a problem..!!
Better decisions == more data?
Data
Big Data
BIG DATA
Big Data Problem goes on..
• I need BIG DATA.
• I need to analyze this data.
• I need to enrich this big data & make it even bigger.
• I need fast searching.
• I need real-time analytics.
• Ohh wait.. I need relational queries on this big data to get more insights..
• I need .. I need .. I need..
And I guess this is why someone nailed it..
Elasticsearch comes to rescue..
What is Elasticsearch:
• “you know, for search”
• Schema-free, REST & JSON based distributed full-text search engine & document store.
• Written in Java & built on top of Lucene.
• Highly reliable, scalable, fault tolerant.
• Supports distributed indexing, replication, and load-balanced querying.
• Powerful geo-spatial queries.
• Latest release: 1.4.1
Wait..!! Schema free?? The real gotcha..
Elasticsearch comes to rescue..
What does it add to Lucene:
• REST service: JSON APIs over HTTP.
• High availability & performance: clustering & replication.
• A powerful query DSL.
• Interoperation with non-Java/JVM languages.
• More and more resilience.
• Multitenancy.
• And the best one: it allows maintaining relationships among documents.
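As a rough illustration of "JSON APIs over HTTP" plus the query DSL, the sketch below builds (but does not send) a search request using only the Python standard library. The `_search` endpoint and the `match` query are part of Elasticsearch; the host `http://localhost:9200`, the index name `meetups`, and the field `title` are assumptions made up for the example.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index_name, field, text, size=10):
    """Build an HTTP request carrying a query DSL 'match' query as JSON."""
    body = {
        "query": {"match": {field: text}},  # full-text match on one field
        "size": size,                        # maximum number of hits to return
    }
    return Request(
        url=f"{base_url}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index, and field; nothing is sent over the network here.
req = build_search_request("http://localhost:9200", "meetups", "title", "elasticsearch")
```

Against a running node, passing `req` to `urllib.request.urlopen` would return a JSON response. Because the interface is plain HTTP and JSON, the same request can be issued from any language, which is what makes interoperation outside the JVM straightforward.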
The Elasticsearch Open Source Model
The Popularity of Elasticsearch
10M downloads in 2 years and counting..
The Popularity of Elasticsearch
Have a look at the case studies here: http://www.elasticsearch.org/case-studies/
Understanding Elasticsearch Structure
A live demo is better than nothing
Logstash
• Tool for receiving, processing and outputting logs. (Input → Filter → Output)
• All kinds of logs: system logs, error logs, webserver logs, application logs & just about anything you can throw at it.
• Open Source: Apache License 2.0.
Kibana
• Execute queries on your data & visualize results.
• Add/remove widgets.
• Share/Save/Load dashboards.
• No need to know coding.
• Open Source: Apache License 2.0.
The ELK Stack Together
meetup.com RSVP stream
• All RSVPs are written out to an HTTP stream
• Each line is a JSON document
• Available at http://stream.meetup.com/2/rsvps
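Since each line of the stream is a JSON document, consuming it comes down to parsing newline-delimited JSON. The sketch below works on an in-memory list of sample lines rather than the live stream; the `response` and `guests` fields are hypothetical and only stand in for whatever the real RSVP documents contain.

```python
import json


def parse_rsvp_lines(lines):
    """Yield one dict per non-empty line, skipping lines that are not valid JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # keep-alive or blank line
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore a truncated or malformed line


# Hypothetical sample input; field names are assumptions for illustration.
sample_lines = [
    '{"response": "yes", "guests": 1}',
    "",
    "not json",
    '{"response": "no", "guests": 0}',
]
rsvps = list(parse_rsvp_lines(sample_lines))
yes_count = sum(1 for rsvp in rsvps if rsvp.get("response") == "yes")
```

The same generator could be fed from any iterable of lines, for example an open HTTP response, and its output handed on to an indexing step.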
meetup.com RSVP stream
In the end..
• Look out for best practices. (Proper cluster formation, bulk indexing)
• Continuous monitoring: Marvel, Bigdesk, HQ
• Open-JDK strictly prohibited.
• Elasticsearch is always hungry: Give me more RAM..!!
• Benchmark your data before creating indexes/shards. (Once created, they can't be broken.)
• And don't forget to create mappings.
• Manage your security.. But now it's coming soon.. Elasticsearch Shield.. “you know, for security”
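For the bulk indexing tip, the body of a bulk request is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The sketch below only builds that body; the index name and documents are made up, and older releases such as the 1.x line also expect a document type, either in the URL or as a `_type` key in the action line, which is left out here for brevity.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited bulk body: an action line, then the document source."""
    lines = []
    for doc_id, source in docs:
        # Action line: says which index and id the next line should be indexed under.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical index name and documents.
body = build_bulk_body(
    "meetups",
    [("1", {"title": "Delhi Elasticsearch Meetup"}), ("2", {"title": "ELK stack"})],
)
```

Sending many documents in one such body to the `_bulk` endpoint avoids one HTTP round trip per document, which is the main reason bulk indexing shows up in best-practice lists.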
Thank You for Listening
[email protected]
twitter.com/d_bharvi
slideshare.net/bharvidixit/