Extending and Automating Cloudera Manager via API
DESCRIPTION
As delivered by Patrick Angeles, Director of Field Technical Services
TRANSCRIPT
Cloudera Manager – APIs & Extensibility
Patrick Angeles, Director Field Technical Services
December 2013
CONFIDENTIAL - RESTRICTED
Cloudera Manager
End-to-End Administration for CDH
1. Manage: Easily deploy, configure & optimize clusters
2. Monitor: Maintain a central view of all activity
3. Diagnose: Easily identify and resolve issues
4. Integrate: Use Cloudera Manager with existing tools
©2013 Cloudera, Inc. All Rights Reserved.
Integrating with your IT Mgmt tools
[Diagram: Cloudera Manager (Hadoop Operations) connected to Datacenter Operations tools]
• Installation, deployment tools, e.g. Chef, Puppet etc.
• Monitoring tools, e.g. Orion, Tivoli, BMC etc.
• Alerting tools, e.g. Nagios, SNMP etc.

Various options for integrating Cloudera Manager into your existing Datacenter Operations/Tools
• Cloudera Manager API
  • Introduced in CM4 (June 2012)
  • Installation & deployment
  • Monitoring
• SNMP Alerts
  • Introduced in CM4.5 (Feb 2013)
• And more…
  • Monitoring ‘tsquery’ (Feb 2013)
  • User-defined triggers/alarms (new for C5!)
  • Service extensibility (new for C5!)
Cloudera Manager (CM) API
• API access was a new feature introduced in Cloudera Manager 4.0, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics).
• The CM API is an HTTP REST API, using JSON serialization. The API is served on the same host and port as the CM web UI, and does not require an extra process or extra configuration. API users have the same privileges as they do in the web UI world.
• Docs & Examples
  • http://cloudera.github.io/cm_api/
  • https://github.com/cloudera/cm_api
• Java/Python clients
  • http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
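Since the CM API is plain HTTP with JSON on the web UI's host and port, a request can be sketched with nothing but the Python standard library. This is a minimal illustration, not the official client: the host name, credentials, the default port 7180, the `v1` version prefix, and the `clusters` resource path are assumptions chosen for the example, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_cm_request(host, path, user, password, port=7180, api_version="v1"):
    """Build an HTTP Basic authenticated GET request for a CM API resource.

    Assumption: the API lives under /api/<version>/ on the web UI port.
    """
    url = f"http://{host}:{port}/api/{api_version}/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (needs a reachable CM server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; host and login are placeholders.
example = build_cm_request("cm-host.example.com", "clusters", "admin", "admin")
print(example.full_url)
```

Calling `fetch_json(example)` against a real server would return the decoded JSON document. The API user gets the same privileges as in the web UI, so a read-only account is a sensible choice for monitoring scripts.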
Examples of integration with CM API
• Installation & Deployment
  • Chef
  • Puppet
  • Dell Crowbar
    • http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/
  • StackIQ
    • http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera
  • WANdisco – non-stop NN setup
  • Several other customers/partners leveraging the APIs as part of their install & deployment process
• Monitoring & Alerting
  • Oracle Enterprise Manager (via Big Data Appliance)
  • Nagios
    • https://github.com/cloudera/cm_api/tree/master/nagios
    • https://github.com/harisekhon/nagios-plugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
  • SNMP alerts integration with IBM Netcool
Develop & contribute your plug-ins using the Cloudera Manager API
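As a rough idea of what a monitoring plug-in does with API data, the sketch below maps a service health summary onto Nagios plugin exit codes. It is not taken from the plug-ins linked above: the health strings (`GOOD`, `CONCERNING`, `BAD`) are assumed values for the API's health summary field, and `nagios_status` is a hypothetical helper.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumption: health summaries arrive as these strings; anything else is UNKNOWN.
HEALTH_TO_NAGIOS = {"GOOD": OK, "CONCERNING": WARNING, "BAD": CRITICAL}


def nagios_status(service_name, health_summary):
    """Return (exit_code, status_line) for one service health summary."""
    code = HEALTH_TO_NAGIOS.get(health_summary, UNKNOWN)
    return code, f"{LABELS[code]} - {service_name} health is {health_summary}"


code, line = nagios_status("hdfs1", "CONCERNING")
print(code, line)
```

A real check would fetch the summary from the API, print the status line, and exit with the returned code so that Nagios can pick it up.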
Cloudera Manager – Monitoring via ‘tsquery’
• Introduced as part of CM4.5 release (Feb 2013)
• Great way to add interesting charts (above & beyond what is provided by default) and monitor metrics that are relevant to your clusters
• The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series data store
• Example: How do I compare all disk IO for all the DataNodes that belong to a specific HDFS service?
  select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1
• Retrieved time-series data can be plotted via various options: line, bar, scatter, heat maps, table list etc.
• Extending this concept to create user-defined triggers/alarms (new for C5!).
• More details
  • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Diagnostics-Guide/cm5dg_chart_time_series_data.html
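The DataNode example can also be assembled programmatically and URL-encoded for an HTTP request. The sketch below is illustrative only: the `timeseries` resource path, the `v4` version prefix, and the `query` parameter name are assumptions about the API, and both helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tsquery(metrics, filters):
    """Assemble a 'select ... where ...' tsquery statement from its parts."""
    select_clause = ", ".join(metrics)
    where_clause = " and ".join(f"{key}={value}" for key, value in filters.items())
    return f"select {select_clause} where {where_clause}"


def timeseries_url(base_url, query, api_version="v4"):
    """Assumption: time-series data is served at /api/<version>/timeseries?query=..."""
    return f"{base_url}/api/{api_version}/timeseries?{urlencode({'query': query})}"


query = build_tsquery(
    ["bytes_read", "bytes_written"],
    {"roleType": "DATANODE", "serviceName": "hdfs1"},
)
print(query)
print(timeseries_url("http://cm-host.example.com:7180", query))
```

Encoding with `urlencode` matters because the statement contains spaces, commas, and `=` signs that are not safe in a raw query string.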
Examples of Cloudera Manager ‘tsquery’
Example 1: How do I track the aggregate Cluster Disk IO?
select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum)
  where category = CLUSTER and clusterId = $CLUSTERID

Example 2: How do I compare CPU usage across hosts?
select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_system) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100

Create & contribute your ‘tsqueries’! https://github.com/cloudera/cm_charting_scrapbook
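The CPU query in Example 2 repeats one expression per CPU mode, so it can be generated rather than typed. The following is a small sketch of that pattern. The metric names follow the slide (with `soft_irq` read from a garbled character in the transcript), and the helper names are hypothetical.

```python
# CPU modes taken from Example 2; each becomes total_cpu_<mode>.
CPU_MODES = ["user", "system", "nice", "iowait", "irq", "soft_irq"]


def cpu_percent_expr(mode):
    """Per-core percentage expression for one CPU mode, as written on the slide."""
    return f"dt0(total_cpu_{mode}) / getHostFact(numCores, 1) * 100"


def cpu_usage_query(modes=CPU_MODES):
    """Join the per-mode expressions into a single select statement."""
    return "select " + ", ".join(cpu_percent_expr(mode) for mode in modes)


print(cpu_usage_query())
```

Generating the statement keeps the six expressions consistent and makes it easy to chart a subset, for example `cpu_usage_query(["user", "system"])`.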
Cloudera Manager – Service Extensibility
• Introduced in C5
• Still in Beta!
• Some aspects (especially Parcel mgmt) available in CM4.x
• Example: Collaboration with Syncsort to deploy DMX-h libraries
• Single management console for CDH, non-CDH services and ISV applications
• Similar look and feel as existing services
• Easy to write (Java-free!)
• Flexible
• Independent release cycle
Analogy from the Operating Systems (OS) world
Core OS kernel
Package Mgmt
Process/ Resource Mgmt
Security Mgmt
Data Access Mgmt
ISV’s view of OS
Systems Management
Bringing ISV Apps to CDH
Core Hadoop/CDH kernel
Parcels
Resource Mgmt
Security Mgmt
CDK APIs
ISV’s view of Hadoop
Cloudera Manager
Integrating into the Cloudera Product Portfolio
Cloudera Manager
Features, Description, Examples

Package Mgmt
- Ability to easily package and distribute binaries/JARs via "Parcels"
- Examples: Informatica, Syncsort

Resource Mgmt
- Ability to deploy applications as stand-alone processes or via YARN* on the Hadoop grid
- Resource isolation of cluster resources
- Examples: SAS, 0xData, Accumulo

Security Mgmt
- Support for Kerberos mgmt
- Role-based access control for Tables/Views in Hive/Impala via Sentry

Data Access Mgmt
- HDFS and HBase API abstraction and simplification

Systems Mgmt
- Manage: deploy and upgrade (rolling) services and pkgs; manage configurations
- Monitor: proactive health checks; track resource utilization; custom metrics charts
- Diagnose: distributed log collection and searching; tag and track key events
- Integrate: access operational tools via API; surface overall cluster metrics to ISV dashboard

Non-CDH Apps: ISVs; Accumulo, Spark, Giraph etc.

* Support for YARN planned as part of CM5.x in FY14
So.. How does it work?
• A JSON file that describes your service
• Set of control scripts
• Packaged as a JAR file
• As promised, Java-free
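Because a JAR is an ordinary ZIP archive, the packaging step needs no Java tooling. The sketch below bundles a descriptor and a control script with Python's `zipfile` module. The entry names inside the archive (`descriptor/service.sdl`, `scripts/control.sh`) are assumptions made for this example, since the SDK was not yet published at the time of the talk, and `build_extension_jar` is a hypothetical helper.

```python
import json
import os
import tempfile
import zipfile

# Assumed archive layout; the talk does not specify entry names.
DESCRIPTOR_ENTRY = "descriptor/service.sdl"
SCRIPT_ENTRY = "scripts/control.sh"


def build_extension_jar(jar_path, descriptor, control_script):
    """Write the JSON descriptor and the control script into a JAR (ZIP) file."""
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr(DESCRIPTOR_ENTRY, json.dumps(descriptor, indent=2))
        script_info = zipfile.ZipInfo(SCRIPT_ENTRY)
        script_info.external_attr = 0o755 << 16  # mark the script as executable
        jar.writestr(script_info, control_script)
    return jar_path


with tempfile.TemporaryDirectory() as workdir:
    path = build_extension_jar(
        os.path.join(workdir, "spark-extension.jar"),
        {"name": "spark", "roles": [{"name": "master"}]},
        "#!/bin/bash\necho placeholder\n",
    )
    with zipfile.ZipFile(path) as jar:
        print(jar.namelist())
```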
Example: Cloudera Manager Extensions - Spark
Cloudera Manager Extensions
Cloudera Manager Extensions: Spark
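The Code

Control script (scripts/control.sh):

#!/bin/bash
# Invoked by Cloudera Manager with the command name as the first argument.
CMD=$1
# Assumption: params.properties holds a line of the form master_port=<value>.
MASTER_PORT=$(grep '^master_port=' ./params.properties | cut -d= -f2)
timestamp=$(date)

case $CMD in
  (start_master)
    exec "$SPARK_HOME/scripts/spark-start.sh" master
    ;;
  (*)
    echo "$timestamp Don't understand [$CMD]"
    ;;
esac

Service descriptor (JSON):

{
  "name": "spark",
  "roles": [{
    "name": "master",
    "startRunner": {
      "program": "scripts/control.sh",
      "args": ["start_master", "./params.properties"]
    },
    "parameters": [{
      "name": "master_port",
      "type": "port",
      "default": 7077
    }],
    "configWriter": {
      "generators": [{
        "filename": "params.properties"
      }]
    }
  }]
}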
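To make the link between the descriptor's `parameters` and the generated `params.properties` file concrete, the sketch below loads the Spark descriptor from this slide and renders each parameter's default as a `key=value` line. This only illustrates the idea. It is not how Cloudera Manager actually writes the file, the `key=value` layout is an assumption, and `default_properties` is a hypothetical helper.

```python
import json

# The Spark descriptor from the slide, written as strict JSON.
DESCRIPTOR = json.loads("""
{
  "name": "spark",
  "roles": [{
    "name": "master",
    "startRunner": {
      "program": "scripts/control.sh",
      "args": ["start_master", "./params.properties"]
    },
    "parameters": [{"name": "master_port", "type": "port", "default": 7077}],
    "configWriter": {"generators": [{"filename": "params.properties"}]}
  }]
}
""")


def default_properties(role):
    """Render a role's parameter defaults as key=value lines (assumed layout)."""
    lines = [f"{param['name']}={param['default']}" for param in role.get("parameters", [])]
    return "\n".join(lines) + "\n"


master = DESCRIPTOR["roles"][0]
print(master["configWriter"]["generators"][0]["filename"])
print(default_properties(master), end="")
```

The control script then only has to read that file to learn the port chosen in the management console.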
Next Steps
• Documentation & SDK as part of C5 Beta2 or later (definitely before GA!)
• Working with select ISVs (SAS, Syncsort, 0xData etc.) as part of Beta to further fine-tune this feature
Develop & contribute your Cloudera Manager service extensibility plug-ins!
Vision of CM Extensibility
[Diagram labels]
Core: CDH, CM
Horizontal Extension: API
Vertical Extension: Service Extensibility
ISVs and services: Syncsort, Informatica, Security ISVs, 0xData, SAS, Revolution, Spark, Giraph, Accumulo
Ops Apps: Capacity Mgr, SLA Mgr, Cost Optimizer
Tool integrations (API, SNMP): Oracle OEM, Dell, Nagios, Chef/Puppet
Q&A