AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)


TRANSCRIPT

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Mahdi Ben Hamida - SignalFx

11/30/2016

DEV307

How to Scale and Operate Elasticsearch on AWS

What to Expect from the Session

• Elasticsearch (ES) usage at SignalFx

• What do we use ES for?

• How is ES deployed on AWS?

• Backup/restore of ES on Amazon S3

• Important ES/AWS metrics to monitor; what to alert on

• ES capacity planning

• Zero-downtime re-sharding

• SignalFx metadata storage architecture overview

• Scaling up and zero-downtime re-sharding on AWS

Elasticsearch at SignalFx

ES Usage

• Ad-hoc queries
• Auto-complete
• Full-text search

Cluster Size

• 4 clusters in production on Amazon EC2
• Biggest cluster
  • 54 data nodes, 3 master nodes, 6 client nodes deployed across 3 AZs
  • Over 1.3 billion unique documents
  • 10+ TB of data
  • 270 shards (primaries + replicas)
  • Sustained 75 QPS, 1K index/sec

ES Deployment on AWS

• Dockerized ES 2.3/1.7 clusters. Orchestration done using MaestroNG
• Biggest cluster
  • Data nodes: i2.2xlarge – 16 GB heap (61 GB total)
  • Master nodes: m3.large – 2 GB heap (7.5 GB total)
  • Client nodes: m3.xlarge – 10 GB heap (15 GB total)
• ES rack awareness to distribute the primary and 2 replicas across 3 Availability Zones (a configuration sketch follows)
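A minimal configuration sketch for zone-aware shard allocation, assuming the cloud-aws plugin is installed and allowed to auto-populate the aws_availability_zone node attribute (shown as elasticsearch.yml entries):

  # elasticsearch.yml: let the cloud-aws plugin tag each node with its AZ
  cloud.node.auto_attributes: true
  # place a shard's primary and replicas in different AZs
  cluster.routing.allocation.awareness.attributes: aws_availability_zone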

Backup/Restore

• Made easy using the AWS Cloud plugin:

  PUT _snapshot/s3-repo
  {
    "type": "s3",
    "settings": {
      "bucket": "signalfx-es-backups",
      "region": "us-east"
    }
  }

• Incremental backups

• Un-versioned S3 bucket

• VPC S3 endpoint to avoid bandwidth constraints

• Instance profiles for authentication to S3

• Cron job for hourly snapshots and weekly rotation
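A minimal sketch of the calls such a cron job could issue against the repository defined above (snapshot names are placeholders):

  # create an incremental snapshot; returns immediately
  PUT _snapshot/s3-repo/hourly-2016-11-30-10?wait_for_completion=false

  # restore that snapshot later if needed
  POST _snapshot/s3-repo/hourly-2016-11-30-10/_restore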

ES Monitoring & Alerting

Key Performance Metrics

Key Detectors

• High CPU usage, low free disk space
• Sustained high heap usage
• Master node availability
• Cluster state (green/yellow/red)
• Unassigned shards
• Thread pool rejections (search, bulk, and index are the most critical)
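A minimal sketch of the cluster APIs these detectors can be driven from (column names as in ES 1.x/2.x):

  # cluster state (green/yellow/red) and number of unassigned shards
  GET _cluster/health

  # per-node rejection counts for the critical thread pools
  GET _cat/thread_pool?v&h=host,search.rejected,bulk.rejected,index.rejected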

Always Test your ES Detectors/Alerts

Elasticsearch Capacity Planning

Capacity Factors

• Indexing
  • CPU/IO utilization can be considerable
  • Merges are CPU/IO intensive. Improved in ES 2.0
• Queries
  • CPU load
  • Memory load

ES Sharding & Scale-up

[Diagram: a two-shard index (primaries 0P/1P, replicas 0R/1R) starts on node-1 and node-2; after adding node-3 and node-4, primaries and replicas spread across four nodes; adding node-5 and node-6 hosts a second set of replicas 0R/1R.]

Sizing Shards

• Create an index with one shard (see the sketch after this list)
• Simulate what you expect your indexing load to be – measure CPU/IO load, find where it breaks
• Do the same with queries
• Determine disk consumption (average document size)
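A minimal sketch of the single-shard benchmark index (the index name is a placeholder; use your production mappings):

  PUT shard-sizing-test
  {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }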

Zero-downtime Re-sharding

Why Re-shard?

• Required if you can’t scale up indexing by adding more nodes
• If the index is read-only, you could implement a simpler approach using aliases
• If the index is being written to, it’s more complicated

SignalFx’s Metadata Storage Architecture

[Diagram: service-A uses metabase-client, which (1) enqueues the write onto write-topic; mb-server-1 (2) dequeues the write, (3) writes to C*, and (4) enqueues an index request onto index-topic; metabase-1 (5) dequeues the index request, (6) reads from C*, and (7) indexes the document into Elasticsearch.]

Index Re-sharding Process

• Pre-requisites

• Phase 1: create target index

• Phase 2: bulk re-indexing

• Phase 3: double writing & change reconciliation

• Phase 4: testing new index

• Phase 5: complete re-sharding process

Pre-requisite 1: readers query from an alias

[Diagram: readers query the myindex alias, which points at myindex_v1.]
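A minimal sketch of wiring readers to an alias instead of the concrete index (names as on the slides):

  POST _aliases
  {
    "actions": [
      { "add": { "index": "myindex_v1", "alias": "myindex" } }
    ]
  }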

Pre-requisite 2: indexing state + generation number

[Diagram: the indexer keeps a persistent write state alongside myindex_v1 (generation: 42, extra: <null>, current: myindex_v1), and each indexed document carries a _generation stamp.]

Phase 1: create new index with updated mappings

[Diagram: myindex_v2 is created next to myindex_v1; indexer state is unchanged (generation: 42, extra: <null>, current: myindex_v1).]

Phase 2: increment generation, then start bulk re-indexing of older generations

[Diagram: indexer state moves to generation: 43 (extra: <null>, current: myindex_v1); documents with _generation <= 42 are bulk re-indexed from myindex_v1 into myindex_v2.]
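A sketch of how the older-generation documents could be selected for bulk re-indexing, assuming a scroll search filtered on the _generation field (each batch is then written to myindex_v2 with the _bulk API):

  POST myindex_v1/_search?scroll=1m
  {
    "size": 1000,
    "query": {
      "range": { "_generation": { "lte": 42 } }
    }
  }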

During this step, documents may get added/updated (or deleted*)

[Diagram: while bulk re-indexing runs, documents created or updated in myindex_v1 are stamped with generation 43; indexer state stays at generation: 43, extra: <null>, current: myindex_v1.]

Index state at the end of the bulk indexing

[Diagram: myindex_v2 now contains everything up to generation 42; documents written at generation 43 exist only in myindex_v1; indexer state: generation: 43, extra: <null>, current: myindex_v1.]

Phase 3 – (a): enable double writing & bump generation

[Diagram: indexer state becomes generation: 44, extra: myindex_v2, current: myindex_v1; new writes (generation 44) go to both indices, while generation-43 documents still exist only in myindex_v1.]

Phase 3 – (b): re-index documents at generation 43

[Diagram: documents written at generation 43 are copied from myindex_v1 into myindex_v2 while double writing continues at generation 44.]

Phase 3 – (c): re-index documents at generation 43

[Diagram, shown over several frames: successive passes copy the remaining generation-43 documents into myindex_v2 while double writing keeps sending generation-44 writes to both indices; indexer state stays at generation: 44, extra: myindex_v2, current: myindex_v1.]

Phase 3 – (e): perfect sync of both indices

[Diagram: myindex_v1 and myindex_v2 now contain the same document set; indexer state: generation: 44, extra: myindex_v2, current: myindex_v1.]

Phase 4: A/B testing of the new index

[Diagram: readers still query the myindex alias (pointing at myindex_v1) while the fully synchronized myindex_v2 is tested with the same queries; double writing keeps both indices in sync.]

Phase 4: swap read alias (or swap back!)

[Diagram: the myindex alias is switched from myindex_v1 to myindex_v2; because both indices stay in sync, the alias can be swapped back at any time.]
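A minimal sketch of the swap: both actions run in a single _aliases call, so readers never see an alias that points to no index (names as on the slides):

  POST _aliases
  {
    "actions": [
      { "remove": { "index": "myindex_v1", "alias": "myindex" } },
      { "add":    { "index": "myindex_v2", "alias": "myindex" } }
    ]
  }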

Phase 5: switch write index, generation, stop double writing

[Diagram: indexer state becomes generation: 45, extra: <null>, current: myindex_v2; new documents (generation 45) are written only to myindex_v2.]

Handling Failures

• Bulk re-indexing can fail (and it does); you don’t want to restart from scratch
• Use a “partition” field
• Migrate partition ranges (see the sketch after this list)
• Deletions could be a problem. We handle that by using “deletion markers” instead, then cleaning up afterwards
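A sketch of restricting a migration pass to one partition range, assuming the “partition” field holds a numeric bucket per document (the range boundaries are placeholders):

  POST myindex_v1/_search?scroll=1m
  {
    "size": 1000,
    "query": {
      "bool": {
        "must": [
          { "range": { "partition":   { "gte": 0, "lt": 64 } } },
          { "range": { "_generation": { "lte": 42 } } }
        ]
      }
    }
  }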

Performance Considerations

• Migrate using partition ranges to avoid holding segments for a long time
• Add temporary nodes to handle the load
• Disable refreshes on the target index (so worth it!) – see the settings sketch after this list
• Start with no replicas (or one, just in case)
• Avoid “hot” shards by sorting on a field (a timestamp, for example)
• Have throttling controls to limit indexing load
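A minimal sketch of the target-index settings during the migration (restore refresh_interval and bump number_of_replicas back up once re-indexing finishes):

  PUT myindex_v2/_settings
  {
    "index": {
      "refresh_interval": "-1",
      "number_of_replicas": 0
    }
  }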

Thank you!

Sign up for a free trial at signalfx.com

Remember to complete your evaluations!
