The journey to real-time analytics
TRANSCRIPT
The journey to real-time analytics
Ido Friedman
IdoFriedman.yml
Name: Ido Friedman
Past: "SQL Server consultant, Instructor, Team Leader"
Present: "Data engineer and Architect, Elasticsearch, CouchBase, MongoDB, Python", …]
WorkPlace: Perion
WhenNotWorking: @Sea
Agenda
What is real-time analytics
Our use case: goals and insights
What’s next
Real-time analytics
Real-time analytics is the use of, or the capacity to use, all available enterprise data and resources when they are needed. It consists of dynamic analysis and reporting, based on data entered into a system less than one minute before the actual time of use. Real-time analytics is also known as real-time data analytics, real-time data integration, and real-time intelligence.
Time dimensions/SLAs
Real time (msec/secs)
Near real time (min/hour)
Batch (hours/days)
Analytics
Batch analytics
Real-time analytics
Stream analytics
Our goals
Online segmentation
User report dashboard
Segmentation
Single event granularity
Full filtering flexibility, no predefinition
No restriction on time range queries
No data purging
Msec response time
Hundreds to Thousands of requests per minute
So it began
Elasticsearch was selected because:
No overhead on indexing fields – it's all indexed
VERY fast filtering and aggregation
Rich aggregation and querying
Relatively easy maintenance of large data sets
Some words on Elasticsearch
Full-text engine gone wild
Highly available Search and analytics
Ultra scalable and easily maintainable
By developers, for developers
https://www.elastic.co/products/elasticsearch
ES examples
Date histograms
Filters
Aggs
Cardinality
Top
Many more...
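Several of the features named on this slide can be combined in a single request. The following is a minimal sketch of such a request body, built as a plain Python dictionary so it runs without a cluster. The field names (`event_time`, `country`, `browser`, `user_id`) are hypothetical and not taken from the talk, and `calendar_interval` is the parameter name in recent Elasticsearch versions (older versions used `interval`).

```python
import json


def build_segment_query(country, start, end):
    """Build an Elasticsearch request body: filter first, then aggregate.

    All field names here are illustrative assumptions, not the talk's schema.
    """
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"country": country}},
                    {"range": {"event_time": {"gte": start, "lt": end}}},
                ]
            }
        },
        "aggs": {
            # Date histogram with a nested cardinality (distinct count) aggregation
            "events_per_day": {
                "date_histogram": {
                    "field": "event_time",
                    "calendar_interval": "day",
                },
                "aggs": {
                    "unique_users": {"cardinality": {"field": "user_id"}}
                },
            },
            # Terms aggregation returning the top 5 values of a field
            "top_browsers": {"terms": {"field": "browser", "size": 5}},
        },
    }


body = build_segment_query("IL", "2016-01-01", "2016-02-01")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent to an index's `_search` endpoint; filters placed under `bool.filter` do not affect scoring, which suits this kind of segmentation query.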
POC
Number of indexes and shards was decide…
Index mapping was set
Search patterns, queries and SLA were achieved
Data set was not big enough
RE – POC
IN PRODUCTION
POC v2 - Goals
Find the correct sharding / nodes combination
Create a manageable cluster
Achieve repeatable results
Reduce costs
The insights
Sharding
Replication
Nodes
Routing
Cluster management
Doc Values vs Field Data
Master nodes
The insights - Nodes
[Figure: two data node layout options for the same data, with nodes labeled "1 TB Data" and "250 GB Data". Effect of a single node downtime: 50% vs. 25%.]
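The 50% and 25% figures are consistent with simple arithmetic: if data is spread evenly and there are no replicas, losing one of N nodes affects 1/N of the data. The sketch below works through that under those assumptions; the 2-node and 4-node layouts are one reading of the slide, not something the transcript states explicitly, and 1 TB is treated as 1000 GB.

```python
def downtime_impact(total_gb, node_count):
    """Return (GB held per node, fraction of data affected when one node is down).

    Assumes data is spread evenly across nodes and that there are no replicas.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return total_gb / node_count, 1 / node_count


for nodes in (2, 4):
    per_node_gb, fraction = downtime_impact(1000, nodes)
    print(f"{nodes} nodes: {per_node_gb:.0f} GB per node, "
          f"one node down affects {fraction:.0%} of the data")
```

More, smaller nodes reduce the impact of a single failure, at the price of more machines to manage.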
Data loading
• Analyze your needs and choose your tools to suit
• If you know your data, don't invest in a generic solution
• Check your data load processes and verify their accuracy
Re-sharding
Will be handled internally by Elasticsearch in future versions
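Re-sharding is costly because Elasticsearch places each document by hashing its routing value (the document ID by default) modulo the number of primary shards. The sketch below illustrates that rule and what happens when the shard count changes. It uses `zlib.crc32` as a stand-in hash, which is an assumption for illustration and not the hash function Elasticsearch uses, and the document IDs are made up.

```python
import zlib


def shard_for(routing_value, num_primary_shards):
    """Pick a shard as hash(routing value) modulo the number of primary shards."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards, new_shards):
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1 for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
    return moved / len(doc_ids)


doc_ids = [f"event-{i}" for i in range(10000)]  # hypothetical document IDs
fraction = moved_fraction(doc_ids, 5, 6)
print(f"{fraction:.0%} of documents would land on a different shard")
```

Because most documents map to a different shard after the count changes, the data has to be reindexed rather than moved in place.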
$$$$$
Money is not your enemy
Use costs as the main driver to improve your solution
Use costs as the main metric; it will keep your company running
Issues – not all is perfect
Cardinality aggregation
Performance
Accuracy
Data set size
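Elasticsearch's cardinality aggregation returns an approximate distinct count (it is based on the HyperLogLog++ algorithm), trading accuracy for memory and speed. The sketch below is not that algorithm. It uses linear counting, a much simpler approximate counter, only to illustrate the same trade-off: a fixed-size structure gives an estimate instead of an exact answer. The user IDs and the slot count are made up for the example.

```python
import hashlib
import math


def approx_distinct(values, num_slots=65536):
    """Estimate the number of distinct values with linear counting.

    Each value is hashed to one slot of a fixed-size table; the share of
    slots that remain empty is then used to estimate the distinct count.
    """
    slots = bytearray(num_slots)  # one byte per slot, kept simple on purpose
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        slots[int.from_bytes(digest[:8], "big") % num_slots] = 1
    empty = num_slots - sum(slots)
    if empty == 0:
        raise ValueError("table saturated; use more slots")
    return -num_slots * math.log(empty / num_slots)


# 50,000 events generated by 10,000 distinct (hypothetical) users
events = [f"user-{i % 10000}" for i in range(50000)]
exact = len(set(events))
estimate = approx_distinct(events)
print(f"exact={exact}, estimate={estimate:.0f}")
```

Shrinking `num_slots` saves memory but makes the estimate less accurate, which mirrors the performance, accuracy and data set size tension listed on the slide.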
Hardware resource balance
Find your real bottleneck
Choose the correct node for your load
Best practices are sometimes too general
We are not happy yet
We need joins – data modeling
Elasticsearch main issue for us –> data piping
Where do we go next?
Other analytics engines?
Druid
MongoDB
Couchbase
MongoDB Aggregation framework
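The slide's own example did not survive in the transcript. As a rough sketch of what an aggregation framework pipeline looks like, the following builds one as a plain Python list; the collection layout and field names (`event_time`, `country`, `user_id`) are hypothetical and not taken from the talk.

```python
from datetime import datetime


def events_per_country_pipeline(start, end, limit=5):
    """Build a pipeline: filter by time, group by country, rank by event count."""
    return [
        {"$match": {"event_time": {"$gte": start, "$lt": end}}},
        {"$group": {
            "_id": "$country",
            "events": {"$sum": 1},                # events per country
            "users": {"$addToSet": "$user_id"},   # distinct users per country
        }},
        {"$project": {"events": 1, "unique_users": {"$size": "$users"}}},
        {"$sort": {"events": -1}},
        {"$limit": limit},
    ]


pipeline = events_per_country_pipeline(datetime(2016, 1, 1), datetime(2016, 2, 1))
# With pymongo and a running server, a pipeline like this would be run as:
#   results = list(collection.aggregate(pipeline))
print([next(iter(stage)) for stage in pipeline])
```

Each stage consumes the output of the previous one, so placing `$match` first keeps the grouped data set as small as possible.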
Couchbase - Global Secondary Indexing (GSI)
CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) WHERE type="product" USING GSI WITH {"nodes":"node1:8091"};
CREATE INDEX productName_index2 ON bucket_name(productName, ProductID) WHERE type="product" USING GSI WITH {"nodes":"node2:8091"};
CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) WHERE type="product" AND productName BETWEEN "A" AND "K" USING GSI WITH {"nodes":"node1:8091"};
CREATE INDEX productName_index2 ON bucket_name(productName, ProductID) WHERE type="product" AND productName BETWEEN "K" AND "Z" USING GSI WITH {"nodes":"node2:8091"};
Manual scale out and replication
Druid
Joins in Elasticsearch
http://siren.solutions/relational-joins-for-elasticsearch-the-siren-join-plugin/
$ curl -XGET 'http://localhost:9200/articles/_coordinate_search?pretty' -d '{
  "query" : {
    "filtered" : {
      "query" : { "match_all" : { } },
      "filter" : {
        "filterjoin" : {
          "mentions" : {
            "indices" : ["companies"],
            "path" : "id",
            "query" : {
              "term" : { "name" : "orient" }
            }
          }
        }
      }
    }
  }
}'
Summary
No magic solutions
Always understand your data and needs
Invest time in modeling and optimization