apache eagle - monitor hadoop in real time
TRANSCRIPT
![Page 1: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/1.jpg)
Apache EagleMonitor Hadoop in Real Time
Hao Chen
http://people.apache.org/~hao
![Page 2: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/2.jpg)
ABOUT SPEAKER
PMC & Committer of Apache [email protected]
MTS Software Engineer at [email protected]
Hao Chen / 陈浩
![Page 3: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/3.jpg)
Agenda
• Introduction•Use Case•Architecture•Q & A
![Page 4: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/4.jpg)
is a distributed real-time monitoring and alerting engine for hadoop from eBay
Open sourced as Apache Incubator Project on Oct 26th 2015
Secure Hadoop in Realtime a data activity monitoring solution to instantly identify access to sensitive data, recognize attacks/ malicious activity and block access in real time.
See http://eagle.incubator.apache.org or http://github.com/apache/incubator-eagle
Apache Eagle
![Page 5: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/5.jpg)
Eagle was initialized by end of 2013 for hadoop ecosystem monitoring as any existing tool like zabbix, ganglia can not handle the huge volume of metrics/logs generated by hadoop system in eBay.
2014/201610,000 nodes150,000+ cores200 PB2000+ user
3000+ nodes10,000+ cores50+ PB2012
20111000+ nodes10,000+ cores10+ PB
100+ nodes1000 + cores1 PB2010
200950+ nodes
20071-‐10 nodes
Hadoop Data • Security• ActivityHadoop Platform • Heath• Availability• Performance
Hadoop @ eBay Inc
Initiative – Why build Eagle?
7+ CLUSTERS10000+ NODES200+ PB DATA
10 B+ EVENTS / DAY500+ METRIC TYPES50,000+ JOBS / DAY50,000,000+ TASKS / DAY
![Page 6: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/6.jpg)
Eagle Use Cases
Data Activity MonitoringSecure Hadoop in Realtime a data activity monitoring solution to instantly identify access to sensitive data, recognize attacks/ malicious activity and block access in real time.
Job Performance MonitoringHadoop, Spark Job Profiling & Performance Monitoring, Cluster Health Anomaly Detection
1
24
eBay Unified Monitoring PlatformProvide unified monitoring-‐as-‐service for everything around infrastructure or business.
3
![Page 7: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/7.jpg)
Eagle Data Activity MonitoringData Loss PreventionGet alerted and stop a malicious user trying to copy, delete, move sensitive data from the Hadoop cluster.
Malicious LoginsDetect login when malicious user tries to guess password. Eagle creates user profiles using machine learning algorithm to detect anomalies
Unauthorized accessDetect and stop a malicious user trying to access classified data without privilege.
Malicious user operationDetect and stop a malicious user trying to delete large amount of data. Operation type is one parameter of Eagle user profiles. Eagle supports multiple native operation types.
User
Privileges
Common Data Sets
Patterns
CommandsZones
Query
Columns
![Page 8: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/8.jpg)
§ HDFS Policies§ Access to Sensitive files§ HDFS Commands used (read, write, update…)§ Client host § Destination§ Security Zones
§ Hive Policies§ Access to tables with PII Data§ SQL Query Profiles§ Client Host§ Security Zone
Eagle Data Activity MonitoringHadoop Data Security: Detect anomalies in accessing HDFS and Hive
![Page 9: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/9.jpg)
Offline: Determine bandwidth from training dataset the kernel density function parameters (KDE)Online: If a test data point lies outside the trained bandwidth, it is anomaly (Policy)
PCs(Principle Components) in EVD(Eigenvalue Value Decomposition)
Kernel Density Function
Eagle Machine Learning User Profile
![Page 10: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/10.jpg)
Use Case Detect node anomaly by analyzing task failure ratio across all nodesAssumption Task failure ratio for every node should be approximately equalAlgorithm Node by node compare (symmetry violation) and per node trend
Eagle Job Performance Monitoring
![Page 11: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/11.jpg)
Alerting: Anomaly Detection Alerting
Insight: Task failure drill-‐down
Insight: Task failure drill-‐down
Task Failure based Anomaly Host Detection
![Page 12: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/12.jpg)
Counters & Features
Use Case Detect data skew by statistics and distributions for attempt execution durations and countersAssumption Duration and counters should be in normal distribution
mapDurationreduceDurationmapInputRecordsreduceInputRecordscombineInputRecordsmapSpilledRecordsreduceShuffleRecordsmapLocalFileBytesReadreduceLocalFileBytesReadmapHDFSBytesReadreduceHDFSBytesRead
Modeling & Statistics
AvgMinMax DistributionsMax z-scoreTop-NCorrelation
Threshold & Detection
Counters
Correlation > 0.9 & Max(Z-Score) > 90%
Hadoop Job Data Skew Detection
![Page 13: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/13.jpg)
Counters & Features
Use Case Detect data skew by statistics and distributions for attempt execution durations and countersAssumption Duration and counters should be in normal distribution
mapDurationreduceDurationmapInputRecordsreduceInputRecordscombineInputRecordsmapSpilledRecordsreduceShuffleRecordsmapLocalFileBytesReadreduceLocalFileBytesReadmapHDFSBytesReadreduceHDFSBytesRead
Modeling & Statistics
AvgMinMax DistributionsMax z-scoreTop-NCorrelation
Threshold & Detection
Counters
Correlation > 0.9 & Max(Z-Score) > 90%
Hadoop Job Data Skew Detection
![Page 14: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/14.jpg)
Eagle @ eBay Inc.7+ CLUSTERS10000+ NODES200+ PB DATA
10 B+ EVENTS / DAY500+ METRIC TYPES50,000+ JOBS / DAY50,000,000+ TASKS / DAY
Eagle Deployment at eBay Production• 100+ security policies• 8 nodes• 30 worker process• 64 kafka partitionEagle Performance• Avg Latency: ~ 50 ms• Max Throughput/Cluster: 300 k /s
![Page 15: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/15.jpg)
Technical ChallengesLarge Scale Processing and Storage• Scale IO Complexity (Stream)• Scale Computation Complexity (Policy)• Scale Storage & Query (Event, Log, Metric)
1
Real-‐Time Alerting• Real-‐time Data Collection• Real-‐time Stream Processing• Real-‐time Alerting
2
Expressive Correlation Model• Complex policy model• Stream GroupBy, Join, Window• Machine Learning
3
Hadoop Ecosystem Integration• Data Source Integration• Anomaly Detection Model
4
![Page 16: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/16.jpg)
Eagle Architecture
![Page 17: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/17.jpg)
Distributed Policy Engine
METADATA MANAGER
AlertExecutor_{1}
AlertExecutor_{2}
…
AlertExecutor_{N}
Real Time Alerts
Alerts
Policy Management
Policy
Dynamical Policy Deployment
Real-‐time Event Stream
Stream_{1}
Stream_{*}
Dynamical Stream Schema
Stream Processing
• Real-‐time Streaming: Apache Storm (Execution Engine) + Kafka (Message Bus)• Declarative Policy: SQL (CEP) on Streaming + Hot Deploy• Linear Scalability: Data volume scale + Computation scale• Metadata-‐Driven: Schema Management and Dynamical Policy Lifecycle
![Page 18: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/18.jpg)
METADATA MANAGER
AlertExecutor_{1}
AlertExecutor_{2}
…
AlertExecutor_{N}
Real Time Alerts
Alerts
Policy Management
Policy
Dynamical Policy Deployment
Real-‐time Event Stream
Stream_{1}
Stream_{*}
Dynamical Stream Schema
Stream Processing
from MetricStream[(name == 'ReplLag') and (value > 1000)] select * insert into outputStream;
• Real-‐time Streaming: Apache Storm (Execution Engine) + Kafka (Message Bus)• Declarative Policy: SQL (CEP) on Streaming + Hot Deploy• Linear Scalability: Data volume scale + Computation scale• Metadata-‐Driven: Schema Management and Dynamical Policy Lifecycle
Distributed Policy Engine
![Page 19: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/19.jpg)
Declarative Policy
• Filter• Join• Aggregation: Avg, Sum , Min, Max, etc• Group by• Having• Stream handlers for window: TimeWindow, Batch Window,
Length Window • Conditions and Expressions: and, or, not, ==,!=, >=, >, <=, <,
and arithmetic operations• Pattern Processing• Sequence processing• Event Tables: intergrate historical data in realtime processing• SQL-‐Like Query: Query, Stream Definition and Query Plan
compilation
Distributed SQL on Streaming : Siddhi CEP + Storm by default
from MetricStream[(name == 'ReplLag') and (value > 1000)] select * insert into outputStream;
![Page 20: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/20.jpg)
Declarative Policy -‐ Examples
from hadoopJmxMetricEventStream[metric == "hadoop.namenode.fsnamesystemstate.capacityused" and value > 0.9] select metric, host, value, timestamp, component, site insert into alertStream;
Example 1: Alert if hadoop namenode capacity usage exceed 90 percentages
from every a = hadoopJmxMetricEventStream[metric=="hadoop.namenode.fsnamesystem.hastate"] -> b = hadoopJmxMetricEventStream[metric==a.metric and b.host == a.host and a.value != value)] within 10 min select a.host, a.value as oldHaState, b.value as newHaState, b.timestamp as timestamp, b.metric as metric, b.component as component, b.site as site insert into alertStream;
Example 2: Alert if hadoop namenode HA switches
![Page 21: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/21.jpg)
1
Distributed Policy Engine -‐ Scalability
2
Distributed Streaming Cluster Environment
AlertExecutor_{1}
AlertExecutor_{2}
…
AlertExecutor_{N}
Stream_{1}
Stream_{*}
Stream Processing
Dynamic policy partition by {event} * {policy}
• N Users with 3 partitions, M policies with 2 partitions, then 3*2 physical tasks• Physical partition + policy-‐level partition
Linear Scalability Principle
![Page 22: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/22.jpg)
3
Algorithm Weights of Executors By Partition User
Random 0.0484 0.152 0.3535 0.105 0.203 0.072 0.042 0.024
Greedy 0.0837 0.0837 0.0837 0.0837 0.0737 0.0637 0.0437 0.0837
Stream Partition Skew (15:1)
Distributed Policy Engine – OptimizationStream Partition Problem https://en.wikipedia.org/wiki/Partition_problem
![Page 23: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/23.jpg)
Distributed Real-time Policy Engine
Siddhi CEP Policy Evaluator
Machine Learning Policy Evaluator• Support WSO2 Siddhi CEP as first class
• Extensible Policy Engine Implementation• Extensible Policy Lifecycle Management• Metadata-‐based Module Management
Extensible Policy Evaluator
public interface PolicyEvaluatorServiceProvider {public String getPolicyType(); // literal string to identify one type of policypublic Class getPolicyEvaluator(); // get policy evaluator implementationpublic List getBindingModules(); // policy text with json format to object mapping
}
METADATA MANAGER
Policy/Metadata
Distributed Policy Engine – ExtensibilityPolicy Engine Extensibility
![Page 24: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/24.jpg)
Stream Processing Framework
Optimizer
1. Development 2. Optimization 3. Compile to native app
Use eagle alert framework as library
![Page 25: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/25.jpg)
• Light-‐weight ORM Framework for HBase/RDMBS• Full-‐function SQL-‐Like REST Query • Optimized Rowkey design for time-‐series data• Native HBase Coprocessor• Secondary Index Support
@Table("alertdef")@ColumnFamily("f")@Prefix("alertdef")@Service(AlertConstants.ALERT_DEFINITION_SERVICE_ENDPOINT_NAME)@JsonIgnoreProperties(ignoreUnknown = true)@TimeSeries(false)@Tags({"site", "dataSource", "alertExecutorId", "policyId", "policyType"})@Indexes({
@Index(name="Index_1_alertExecutorId", columns = { "alertExecutorID" }, unique = true),})public class AlertDefinitionAPIEntity extends TaggedLogAPIEntity{
@Column("a")private String desc;@Column("b")private String policyDef;@Column("c")private String dedupeDef;
Query=AlertDefinitionService[@dataSource="hiveQueryLog"]{@policyDef}
Large Scale Storage and Query
![Page 26: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/26.jpg)
Uniform HBase rowkey design
• Metric
• Entity
• Log
Rowkey ::= Prefix | Partition Keys | timestamp | tagName | tagValue | …
Rowkey ::= Metric Name | Partition Keys | timestamp | tagName | tagValue | …
Rowkey ::= Default Prefix | Partition Keys | timestamp | tagName | tagValue | …
Rowkey ::= Log Type | Partition Keys | timestamp | tagName | tagValue | …Rowvalue ::= Log Content
Large Scale Storage and Query
http://opentsdb.net
![Page 27: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/27.jpg)
Multi-‐Tenants – Topology Scheduler• Dynamical Topology Management• No-‐downtime Topology Maintenance• Topology High Availability & Balance• Resource Scheduling & Isolation: Runtime,Woker, Topology or Cluster
![Page 28: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/28.jpg)
Multi-‐Tenants -‐ Dynamical Correlation
• Dynamical Correlation on Runtime:• Sort, Groupby, Join, Window• Hot Deploy Logic• Policy Management
• Multi-‐Correlation on the Single Stream• Group by different fields of same stream• Resort same stream by different order• Join certain stream in different way
• Multi Correlation on Multi Streams• Cross Streams Join• Real-‐time & Historical Stream Join
![Page 29: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/29.jpg)
Multi-‐Tenants -‐ Dynamical Correlation
![Page 30: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/30.jpg)
1 Eagle FrameworkDistributed real-‐time framework for efficiently developing highly scalable monitoring applications
Eagle Ecosystem
2 Eagle AppsSecurity/ Hadoop/ Operational Intelligence / …
3 Eagle InterfaceREST Service / Management UI / Customizable Analytics Visualization
4 Eagle IntegrationAmbari / Docker / Ranger / Dataguise
Appsà Securityà Hadoopà Cloudà Database
Interfaceà Web Portalà REST Servicesà Analytics Visualization
Integrationà Ambarià Dockerà Rangerà Dataguise Eagle
Framework
Open SourceCommunity-‐driven and Cross-‐community cooperation5
![Page 31: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/31.jpg)
Learn More about Apache EagleCommunity• Website: http://eagle.incubator.apache.org• Github: http://github.com/apache/incubator-‐eagle• Mailing list: [email protected]
Resources• Documentation: http://eagle.incubator.apache.org/docs/• Docker images: https://hub.docker.com/r/apacheeagle/sandbox/
Publications & Patents• EAGLE: USER PROFILE-‐BASED ANOMALY DETECTION IN HADOOP CLUSTER (IEEE)• EAGLE: DISTRIBUTED REALTIME MONITORING FRAMEWORK FOR HADOOP CLUSTER
![Page 32: Apache Eagle - Monitor Hadoop in Real Time](https://reader031.vdocuments.mx/reader031/viewer/2022022414/586fddfe1a28ab18428b6a41/html5/thumbnails/32.jpg)
Thank you!
http://eagle.incubator.apache.org
The slide is licensed under Creative Commons Attribution 4.0 International license.