Configuring Elasticsearch for Performance and Scale

Based on the knowledge gained from attending the Elasticsearch webinar on 30th September 2014. Prepared by: Bharvi Narayan Dixit, Software Engineer, Orkash Services Pvt. Ltd.


DESCRIPTION

The contents are based on the experience shared by experts from organizations such as The Guardian, Datadog, Captora, and Elasticsearch itself.

TRANSCRIPT

Page 1: Configuring elasticsearch for performance and scale

Configuring Elasticsearch For Performance and Scale

Based on the knowledge gained from attending the Elasticsearch webinar on 30th September 2014

Prepared by: Bharvi Narayan Dixit, Software Engineer, Orkash Services Pvt. Ltd.

Page 2: Configuring elasticsearch for performance and scale

Contents

• The Elasticsearch Open Source Model

• The Popularity of Elasticsearch

• Insights across The Guardian

• Ophan - The real-time analytics tool

• Datadog’s Elasticsearch Story

• What Datadog’s event dashboards look like

• Elasticsearch use @ Captora

• Captora dashboard and its architecture

• Webinar poll on the types of infrastructure used for Elasticsearch

Page 3: Configuring elasticsearch for performance and scale

The Elasticsearch Open Source Model

Page 4: Configuring elasticsearch for performance and scale

The Popularity of Elasticsearch

10M downloads in 2 years, and counting.

Page 5: Configuring elasticsearch for performance and scale

Insights across The Guardian

• A large portion of The Guardian’s business relies on Elasticsearch to understand how their content is being consumed.

• Before Ophan, The Guardian used a traditional analytics package that had a four-hour lag and many restrictions.

• ~40M documents are processed per day, and 360M documents can be queried easily.

• Real-time traffic analysis of each piece of content, which lets the organization see audience engagement.

• Easy scaling of the cluster (adding more capacity) whenever a new feature puts stress on Elasticsearch.
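As a rough back-of-the-envelope reading of the figures above, the two document counts can be related to each other. This is only a sketch: it assumes a constant ingest rate and that the 360M queryable documents are simply the most recent ones, neither of which the slide states.

```python
# Figures taken from the slide; the constant-rate assumption is ours.
DOCS_PER_DAY = 40_000_000      # ~40M documents processed per day
QUERYABLE_DOCS = 360_000_000   # 360M documents that can be queried


def queryable_window_days(queryable_docs: int, docs_per_day: int) -> float:
    """Return how many days of data are held, assuming a constant ingest rate."""
    return queryable_docs / docs_per_day


window = queryable_window_days(QUERYABLE_DOCS, DOCS_PER_DAY)
print(f"Queryable window: about {window:.0f} days of traffic")
```

Under those assumptions the queryable set corresponds to roughly nine days of traffic.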

Page 6: Configuring elasticsearch for performance and scale

Ophan - The real-time analytics tool created by The Guardian, based on Elasticsearch

Page 7: Configuring elasticsearch for performance and scale

Datadog’s Elasticsearch Story

• Elasticsearch is used as Datadog’s primary data store for events/logs.

• Before Elasticsearch, Postgres was used.

• Event data is always structured, with the flexibility of adding/removing fields as needed.

• Hundreds of millions of full-text events across 12+ indices.

• ~10M documents/day, with the volume doubling every 4-5 months.
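To get a feel for what "doubling every 4-5 months" means for capacity planning, the growth can be projected with a simple exponential. This is a sketch only: the function name and the 12-month horizon are illustrative choices, and it assumes the doubling rate stays constant.

```python
def projected_daily_docs(current_per_day: float,
                         months_ahead: float,
                         doubling_months: float) -> float:
    """Project daily document volume, assuming steady exponential growth."""
    return current_per_day * 2 ** (months_ahead / doubling_months)


CURRENT_PER_DAY = 10_000_000  # ~10M documents/day, from the slide

# Show both ends of the 4-5 month doubling range over one year.
for doubling in (4, 5):
    after_year = projected_daily_docs(CURRENT_PER_DAY, 12, doubling)
    print(f"doubling every {doubling} months -> "
          f"{after_year / 1e6:.1f}M documents/day after 12 months")
```

At this rate the daily volume grows roughly five- to eightfold within a year.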

Page 8: Configuring elasticsearch for performance and scale

First version of the Elasticsearch cluster at Datadog

• One node per AZ (availability zone) handling HTTP and data.

• One large index storing all events from all time.

• Writing to a pool of all nodes in the cluster.

• Worked well for 1-1.5 years.

Page 9: Configuring elasticsearch for performance and scale

Faster and more scalable cluster

• Split the cluster into head and data nodes.

• Head nodes act as a load balancer, accepting the HTTP requests.

• Data nodes interact only with head nodes and other data nodes.

• Use rolling indices, each holding one month of event data.
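A rolling monthly index usually comes down to deriving the index name from the event timestamp, so that writes go to the current month and reads touch only the months they need. The sketch below illustrates that idea; the `events-YYYY.MM` naming scheme and both function names are hypothetical and not taken from Datadog's setup.

```python
from datetime import date, datetime
from typing import List

INDEX_PREFIX = "events"  # hypothetical prefix, not from the webinar


def index_for(ts: datetime) -> str:
    """Name of the monthly index an event with this timestamp is written to."""
    return f"{INDEX_PREFIX}-{ts:%Y.%m}"


def indices_between(start: date, end: date) -> List[str]:
    """Names of all monthly indices a query over [start, end] has to search."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{INDEX_PREFIX}-{year:04d}.{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return names


print(index_for(datetime(2014, 9, 30, 12, 0)))
print(indices_between(date(2014, 11, 15), date(2015, 1, 3)))
```

With this layout, old data can be dropped or archived one index at a time instead of deleting documents from one large index.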

Page 10: Configuring elasticsearch for performance and scale

What Datadog’s engineers learned

• Spend some time planning for sizing before settling on a data format.

– With a bit of planning, they could have avoided migrating to a rolling index later on.

– But you can’t plan for everything, so architect deployments with migration in mind.

• Monitor your Elasticsearch cluster from the beginning.

• Tooling for backup and restore should be in place almost from your first deployment.
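As one illustration of monitoring from the beginning, a small check can be run over the parsed JSON returned by Elasticsearch's cluster health API. The sketch below works on a plain dictionary and makes no network calls; the field names `status`, `number_of_nodes` and `unassigned_shards` are part of that API's response, while the function name, the `expected_nodes` parameter and the alert wording are hypothetical and not Datadog's tooling.

```python
from typing import Any, Dict, List


def health_alerts(health: Dict[str, Any], expected_nodes: int) -> List[str]:
    """Return human-readable alerts for a parsed cluster health response."""
    alerts = []
    status = health.get("status")
    if status != "green":
        alerts.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        alerts.append(f"{unassigned} unassigned shards")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        alerts.append(f"only {nodes} of {expected_nodes} nodes are in the cluster")
    return alerts


# Made-up sample response, for illustration only.
sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 5}
for alert in health_alerts(sample, expected_nodes=3):
    print("ALERT:", alert)
```

A check like this can be scheduled from the first deployment and extended later with disk, heap and indexing-rate metrics.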

Page 11: Configuring elasticsearch for performance and scale

What Datadog’s event dashboards look like

Page 12: Configuring elasticsearch for performance and scale

What Datadog’s event dashboards look like

Page 13: Configuring elasticsearch for performance and scale

What Datadog’s event dashboards look like

Provides the ability to comment on events and mention peers.

Page 14: Configuring elasticsearch for performance and scale

What Datadog’s event dashboards look like

Page 15: Configuring elasticsearch for performance and scale

Elasticsearch use @ Captora

• Captora is the first marketing cloud solution to automatically expand and optimize marketing campaigns to engage and convert thousands of future buyers.

• It offers an Adaptive Marketing approach: discovering markets, engaging them, and converting new buyers by intelligently and automatically scaling content-driven campaigns across multiple channels (search, advertising, and social).

• Read more at http://www.captora.com/technology/

Page 16: Configuring elasticsearch for performance and scale

Elasticsearch use @ Captora

At Captora, Elasticsearch is primarily used for:

• Indexing all textual data (i.e. crawled multi-channel content streams, user-generated documents, etc.)

• Powering the textual search, rankings, and relevance calculation of the content recommendation engine.

• Powering the user portal search of the content stream.

Elasticsearch stats at Captora:

• Mostly semi-structured data (e.g. web pages, white papers, metadata of videos from YouTube, LinkedIn updates, blogs, Tweets, etc.)

• ~200M documents, ~300GB of data.

• Partitioned across ~1200 indices, 2300 shards, with replication factor of 4.

• 6 EC2 nodes (c3.2xlarge, provisioned SSD), two AWS availability zones, ELB balanced.

• Index rate: 10 to 500 requests/sec.

• Query rate: 100 to 2000 requests/sec.
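A few averages can be derived from these stats to see how small the individual indices and shards are. This is a rough sketch under stated assumptions: the slide does not say whether 2300 shards and ~300GB include replica copies, so the numbers are used exactly as given, GB is taken as 10^9 bytes, and shards are assumed to be spread evenly across nodes.

```python
# Figures from the slide, used as given.
DOCS = 200_000_000          # ~200M documents
DATA_BYTES = 300 * 10**9    # ~300GB, assuming decimal gigabytes
INDICES = 1200              # ~1200 indices
SHARDS = 2300               # 2300 shards (unclear whether replicas are included)
NODES = 6                   # 6 EC2 nodes

avg_doc_bytes = DATA_BYTES / DOCS    # average document size in bytes
docs_per_index = DOCS / INDICES      # average documents per index
shards_per_index = SHARDS / INDICES  # average shards per index
shards_per_node = SHARDS / NODES     # assumes an even spread across nodes

print(f"average document size: {avg_doc_bytes:.0f} bytes")
print(f"documents per index:   {docs_per_index:,.0f}")
print(f"shards per index:      {shards_per_index:.2f}")
print(f"shards per node:       {shards_per_node:.0f}")
```

Read this way, the deployment consists of many small indices with about two shards each, which matches the partitioned layout described on the slide.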

Page 17: Configuring elasticsearch for performance and scale

Captora’s Dashboard

Page 18: Configuring elasticsearch for performance and scale

Captora’s Architecture

Page 19: Configuring elasticsearch for performance and scale

Poll Time (based on votes by webinar attendees)

Page 20: Configuring elasticsearch for performance and scale

Thank You