elasticsearch meetup final_2014_04
Elasticsearch for reporting analytics on communities
Elasticsearch Meetup - 23 April 2014
Marc Harrison
Lithium makes software that helps brands better connect with their customers
Our social software helps companies respond on social networks and build trusted content on a community they own.
Empower brands to distill terabytes of daily data into understanding participation
▪ fast
▪ flexible
▪ scalable
What products/services are generating the most conversations?
Who is authoring content that generates the most kudos/likes?
Are customer posts getting timely replies?
What types of content does this audience segment look for?
Lithium Social Intelligence (LSI)
Cluster specs
▪ One of our clusters – Elasticsearch 1.0
• 7+ billion documents / 4.4+ TB and growing fast!
• 21 nodes (3 masters, 2 clients, 16 data)
Lessons learned
▪ Bulk loading
▪ Faceting
Bulk initial load / rebuild of data
[Diagram: Hadoop / MySQL stream → Transform/route → … → JSON → Elasticsearch]
Bulk loading
▪ Make sure ingest logic is robust
• Idempotent for bulk replay - '_id'
• Include revision based on processor/time
• Check cluster/index status to make sure it is ready to ingest
▪ Know the cache and thread pool sizes
• Bulk thread pool – fixed, size = # of processors, queue size 50
• Handle back-off and retry
▪ How many docs?
• Like capacity - test with your own data:
• number of shards
• index.refresh_interval: 30s
• indices.memory.index_buffer_size: 5%
• indices.memory.*
• index.translog.*
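A minimal sketch of the two ingest points above (idempotent '_id' and back-off/retry), not the loader used at Lithium. The record fields (community, message_id), the index and type names, and the send callable are all hypothetical; send stands in for whatever HTTP client posts the body to the _bulk endpoint and returns True on success.

```python
import hashlib
import json
import time


def doc_id(community, message_id):
    """Deterministic _id: replaying the same record overwrites it, never duplicates it."""
    key = "%s:%s" % (community, message_id)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def bulk_body(index, doc_type, records, revision):
    """Build a newline-delimited _bulk body: one action line, then one source line."""
    lines = []
    for rec in records:
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": doc_id(rec["community"], rec["message_id"])}}
        lines.append(json.dumps(action, sort_keys=True))
        # 'revision' is an assumed field recording which processor run wrote the doc.
        lines.append(json.dumps(dict(rec, revision=revision), sort_keys=True))
    return "\n".join(lines) + "\n"


def send_with_backoff(send, body, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(body) until it returns True, doubling the delay after each rejection."""
    for attempt in range(max_attempts):
        if send(body):
            return attempt + 1  # number of attempts used
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("bulk request rejected after %d attempts" % max_attempts)
```

Because the '_id' depends only on the record's own keys, a rebuild from Hadoop can be re-run after a partial failure without creating duplicates.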
Search - time series pattern for scale
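A small sketch, assuming the slide refers to the common practice of one index per time period so that a search only touches the indices covering its date range. The monthly granularity and the "lsi-YYYY.MM" naming scheme are assumptions for illustration, not taken from the talk.

```python
from datetime import date


def monthly_indices(prefix, start, end):
    """List the per-month index names that cover the range start..end (inclusive)."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append("%s-%04d.%02d" % (prefix, year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# A query for mid-November 2013 through early February 2014 hits four indices.
print(",".join(monthly_indices("lsi", date(2013, 11, 15), date(2014, 2, 1))))
```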
Faceting
▪ Don't forget about memory!
• Strings - not_analyzed
• Numbers - long vs int, double vs float, etc.
• Do you need seconds/minutes when faceting?
• fielddata format - doc_values (1.0)
• Admin APIs allow checking field data size + evictions
• indices.cache.filter.size: 15%
• indices.fielddata.cache.size: 45%
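The memory points above could translate into a mapping along these lines. This is a sketch only: the type name and field names (post, author, kudos, posted_hour) are hypothetical, and the mapping is written in Elasticsearch 1.0 syntax as a Python dict ready to be serialised to JSON.

```python
import json
from datetime import datetime

# Hypothetical mapping for a 'post' type.
POST_MAPPING = {
    "post": {
        "properties": {
            # Facet on the whole string, stored on disk as doc values.
            "author": {"type": "string", "index": "not_analyzed",
                       "fielddata": {"format": "doc_values"}},
            # Smallest numeric type that fits the data (integer rather than long).
            "kudos": {"type": "integer",
                      "fielddata": {"format": "doc_values"}},
            # Timestamp truncated before indexing to cut the number of distinct values.
            "posted_hour": {"type": "date"},
        }
    }
}


def truncate_to_hour(ts):
    """Drop minutes and seconds when hourly resolution is enough for faceting."""
    return ts.replace(minute=0, second=0, microsecond=0)


print(json.dumps(POST_MAPPING, indent=2))
print(truncate_to_hour(datetime(2014, 4, 23, 18, 45, 12)).isoformat())
```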
Faceting II
▪ Accuracy
• shard_size
• Number of shards
• Cardinality
• Routing
▪ Great custom plugin framework
• Uniques
• Array faceting
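Why shard_size affects accuracy can be seen with a toy simulation: each shard returns only its own top terms, and the coordinating node merges those partial lists. This is an illustrative model in plain Python with made-up data, not Elasticsearch code.

```python
from collections import Counter


def shard_top(terms, shard_size):
    """Top terms as a single shard would report them."""
    return Counter(terms).most_common(shard_size)


def merged_facet(shards, size, shard_size):
    """Merge per-shard top lists, as the coordinating node does."""
    total = Counter()
    for terms in shards:
        for term, count in shard_top(terms, shard_size):
            total[term] += count
    return total.most_common(size)


# Term 'c' is the true winner (3 + 4 = 7) but is not the top term on either shard.
shard_a = ["a"] * 5 + ["b"] * 4 + ["c"] * 3
shard_b = ["d"] * 5 + ["c"] * 4 + ["b"] * 1

print(merged_facet([shard_a, shard_b], size=1, shard_size=1))  # 'c' is missed
print(merged_facet([shard_a, shard_b], size=1, shard_size=3))  # 'c' found with count 7
```

More shards and higher cardinality make this kind of miss more likely; routing related documents to the same shard makes it less likely.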
Impact
▪ Order of magnitude improvement
▪ Developers able to focus on improving insights
▪ community + Elasticsearch + Hadoop + Hortonworks = exciting
Select settings (data center)
• bootstrap.mlockall: true
• cluster.routing.allocation.disk.threshold_enabled: true
• http.compression: true
• transport.tcp.compress: true
• gateway.recover_after_data_nodes: 13
• gateway.recover_after_master_nodes: 2
• gateway.recover_after_time: 3m
• gateway.expected_nodes: 17
• indices.memory.index_buffer_size: 5%
• indices.cache.filter.size: 15%
• indices.fielddata.cache.size: 45%
• index.store.type: mmapfs
• index.translog.flush_threshold_ops: 10000
• action.auto_create_index: false
• action.disable_delete_all_indices: true
• cluster.routing.allocation.node_initial_primaries_recoveries: 4
• cluster.routing.allocation.node_concurrent_recoveries: 15
• indices.recovery.max_bytes_per_sec: 100mb
• indices.recovery.concurrent_streams: 5
• discovery.zen.minimum_master_nodes: 2
• index.search.slowlog.threshold.query.warn: 5s
• index.search.slowlog.threshold.query.info: 1s
• index.indexing.slowlog.threshold.index.warn: 5s
• plugin.mandatory: lithium-unique-facets
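The discovery.zen.minimum_master_nodes value of 2 matches the usual quorum rule (a strict majority of master-eligible nodes) for the 3 masters in this cluster. A one-function sketch of that arithmetic, with a hypothetical function name:

```python
def minimum_master_nodes(master_eligible):
    """Quorum: strict majority of master-eligible nodes, to avoid split brain."""
    return master_eligible // 2 + 1


print(minimum_master_nodes(3))  # 3 dedicated masters, as in the cluster described here
```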
Questions?