Simplified Cluster Operation and Troubleshooting
![Page 1: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/1.jpg)
Simplified Cluster Operation & Troubleshooting
Alejandro Fernandez + Jayush Luniya
![Page 2: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/2.jpg)
Speakers
Alejandro Fernandez, Sr. Software Engineer @ Hortonworks, Apache Ambari
Jayush Luniya, Staff Engineer @ Hortonworks, Apache Ambari
![Page 3: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/3.jpg)
What is Apache Ambari?
Apache Ambari is the open-source platform to provision, manage and monitor Hadoop clusters
![Page 4: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/4.jpg)
New Enterprise Features
Ambari 2.4
• New Services: Log Search, Zeppelin, Hive LLAP
• Role Based Access Control
• Management Packs
• Grafana UI for Ambari Metrics System
• New Views: Zeppelin, Storm
![Page 5: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/5.jpg)
Apache Ambari Jiras
[Chart: Jira counts per Ambari release over time: v2.0 (April 2015), v2.1 (July - Sept 2015), v2.2 (Dec 2015 - Feb 2016), v2.4 (today: 1542 and growing)]
![Page 6: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/6.jpg)
Simplified Operations - Lifecycle
Deploy
Secure/LDAP
Smart Configs
Monitor
Upgrade
Scale, Extend, Analyze
Ease-of-Use Deploy
![Page 7: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/7.jpg)
Deploy On Premise
The Ambari UI wizard handles all of these combinations and makes recommendations based on host specs.
![Page 8: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/8.jpg)
Deploy On The Cloud
• Certified environments
• Sysprepped VMs
• Hundreds of similar clusters
![Page 9: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/9.jpg)
Deploy with Blueprints
• Systematic way of defining a cluster
• Export existing cluster into blueprint
  GET /api/v1/clusters/:clusterName?format=blueprint
Configs Topology Hosts Cluster
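The export call can be sketched in Python as below. This is a minimal illustration that only builds the request; the server address, cluster name, credentials, and the helper name `blueprint_export_request` are assumptions, not part of the talk.

```python
import base64
import urllib.parse
import urllib.request


def blueprint_export_request(base_url, cluster_name, user, password):
    """Build (but do not send) a GET request that exports a cluster as a blueprint."""
    # Hypothetical helper: only the URL shape comes from the slide.
    cluster = urllib.parse.quote(cluster_name, safe="")
    url = f"{base_url}/api/v1/clusters/{cluster}?format=blueprint"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


# Placeholder host and credentials, used only for illustration.
req = blueprint_export_request(
    "http://ambari.example.com:8080", "my-cluster", "admin", "admin"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a reachable Ambari server; the response body is the blueprint JSON.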
![Page 10: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/10.jpg)
Create a cluster with Blueprints

1. POST /api/v1/blueprints/my-blueprint

{
  "configurations" : [
    { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" } }
  ],
  "host_groups" : [
    {
      "name" : "master-host",
      "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" }, … ],
      "cardinality" : "1"
    },
    {
      "name" : "worker-host",
      "components" : [ { "name" : "DATANODE" }, { "name" : "NODEMANAGER" }, … ],
      "cardinality" : "1+"
    }
  ],
  "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.5" }
}

2. POST /api/v1/clusters/my-cluster

{
  "blueprint" : "my-blueprint",
  "host_groups" : [
    {
      "name" : "master-host",
      "hosts" : [ { "fqdn" : "master001.ambari.apache.org" } ]
    },
    {
      "name" : "worker-host",
      "hosts" : [
        { "fqdn" : "worker001.ambari.apache.org" },
        { "fqdn" : "worker002.ambari.apache.org" },
        …
        { "fqdn" : "worker099.ambari.apache.org" }
      ]
    }
  ]
}
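The two POST bodies can also be assembled programmatically. The sketch below is one way to do that in Python; the helper names are hypothetical, and the component lists contain only the components named on the slide (the slide elides the rest, so a real blueprint would need more).

```python
import json


def make_blueprint(stack_name, stack_version, host_groups, configurations=()):
    """Assemble a blueprint body.

    host_groups maps a group name to a (components, cardinality) pair.
    """
    return {
        "configurations": list(configurations),
        "host_groups": [
            {
                "name": name,
                "components": [{"name": c} for c in components],
                "cardinality": cardinality,
            }
            for name, (components, cardinality) in host_groups.items()
        ],
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
    }


def make_cluster_template(blueprint_name, hosts_by_group):
    """Map each host group of the blueprint to concrete host FQDNs."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


def planned_requests(blueprint_name, cluster_name, blueprint, template):
    """Return the two calls in order: register the blueprint, then create the cluster."""
    return [
        ("POST", f"/api/v1/blueprints/{blueprint_name}", json.dumps(blueprint)),
        ("POST", f"/api/v1/clusters/{cluster_name}", json.dumps(template)),
    ]


blueprint = make_blueprint(
    "HDP",
    "2.5",
    {
        "master-host": (["NAMENODE", "RESOURCEMANAGER"], "1"),
        "worker-host": (["DATANODE", "NODEMANAGER"], "1+"),
    },
    configurations=[
        {"hdfs-site": {"dfs.datanode.data.dir": "/hadoop/1,/hadoop/2,/hadoop/3"}}
    ],
)
template = make_cluster_template(
    "my-blueprint",
    {
        "master-host": ["master001.ambari.apache.org"],
        "worker-host": ["worker001.ambari.apache.org", "worker002.ambari.apache.org"],
    },
)
requests_to_send = planned_requests("my-blueprint", "my-cluster", blueprint, template)
for method, path, body in requests_to_send:
    print(method, path, len(body), "bytes")
```

Keeping the blueprint (what to install) separate from the cluster template (where to install it) is what lets one blueprint be reused across many similar clusters.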
![Page 14: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/14.jpg)
Blueprints for Large Scale
• Kerberos, secure out-of-the-box
• High Availability is set up initially for NameNode, YARN, Hive, Oozie, etc.
• Host Discovery allows Ambari to automatically install services for a host when it comes online
• Stack Advisor recommendations
![Page 15: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/15.jpg)
Blueprint Host Discovery

POST /api/v1/clusters/MyCluster/hosts

[
  {
    "blueprint" : "single-node-hdfs-test2",
    "host_groups" : [
      {
        "host_group" : "slave",
        "host_count" : 3,
        "host_predicate" : "Hosts/cpu_count>1"
      },
      {
        "host_group" : "super-slave",
        "host_count" : 5,
        "host_predicate" : "Hosts/cpu_count>2&Hosts/total_mem>3000000"
      }
    ]
  }
]
![Page 16: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/16.jpg)
Kerberos
Available since Ambari 2.0
• Ambari manages Kerberos principals and keytabs
• Works with existing MIT KDC or Active Directory
• Once Kerberized, handles
  o Adding hosts
  o Adding components to existing hosts
  o Adding services
  o Moving components to different hosts
![Page 17: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/17.jpg)
Management Packs - Motivation
• Release Management
  o Ambari core and stacks released together
  o Stack changes require Ambari release
  o Decouple stack and Ambari core releases
• Add-on Services
  o Release vehicle for 3rd party services
  o Self contained release artifacts
![Page 18: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/18.jpg)
Management Packs – Release Trains
![Page 19: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/19.jpg)
Management Packs
• Generalized release artifact for stacks, add-on services, views, etc
• Decouples stack releases from Ambari core release
• Tarballs with metadata for applicability and content
• Stack is an overlay of multiple management packs
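The overlay idea can be sketched as follows. This is an illustration only, not Ambari's implementation: the `ManagementPack` class, the pack and service names, and the version strings are all hypothetical, and because the slides do not say how conflicting definitions are resolved, the sketch simply refuses to merge them.

```python
from dataclasses import dataclass


@dataclass
class ManagementPack:
    """Hypothetical model of a pack: applicability metadata plus content."""
    name: str
    applies_to: set   # set of (stack_name, stack_version) pairs
    services: dict    # service name -> version string


def overlay(stack_name, stack_version, packs):
    """Build a stack's service set from every pack that applies to it."""
    stack = {}
    for pack in packs:
        if (stack_name, stack_version) not in pack.applies_to:
            continue  # applicability metadata excludes this pack
        for service, version in pack.services.items():
            if service in stack:
                # Conflict handling is an assumption of this sketch.
                raise ValueError(f"{service} is defined by more than one pack")
            stack[service] = (version, pack.name)
    return stack


core = ManagementPack("core-pack", {("HDP", "2.5")}, {"HDFS": "1.0", "YARN": "1.0"})
addon = ManagementPack("addon-pack", {("HDP", "2.5")}, {"MYSERVICE": "1.0"})
other = ManagementPack("other-pack", {("HDP", "2.4")}, {"OLDSERVICE": "1.0"})

stack = overlay("HDP", "2.5", [core, addon, other])
print(sorted(stack))
```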
![Page 20: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/20.jpg)
Overlay of Management Packs
![Page 21: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/21.jpg)
Management Pack++
Short Term Goals (Ambari 2.4)
• Retrofit in Stack Processing Framework
• Enable 3rd party to ship add-on services
• Command line support
Long Term Goals (Future)
• Management Pack Framework
• Deliver Views
• REST API support
![Page 22: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/22.jpg)
Role Based Access Control (RBAC)
As Ambari & organizations grow, so do security needs
Ambari integrates with external authentication systems & LDAP
![Page 23: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/23.jpg)
RBAC Terms
• Roles have permissions, e.g., add services to cluster
• Roles are applied to Resources, e.g., Ambari, particular Cluster, particular View
• Users belong to groups
• A group has a role
• Users can also have additional roles
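These terms map onto a small data model. The sketch below is a hypothetical illustration of how the pieces relate, not Ambari's internal model; the class names, resource strings, and permission strings are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset   # e.g. {"view", "add services"}


@dataclass(frozen=True)
class Grant:
    """A role applied to one resource (Ambari, a cluster, or a view)."""
    role: Role
    resource: str


@dataclass
class Group:
    name: str
    grant: Grant             # a group has a role


@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)
    extra_grants: list = field(default_factory=list)   # additional roles


def can(user, permission, resource):
    """True if any role the user holds on the resource includes the permission."""
    grants = [g.grant for g in user.groups] + list(user.extra_grants)
    return any(
        g.resource == resource and permission in g.role.permissions
        for g in grants
    )


viewer = Role("Read-Only", frozenset({"view"}))
admin = Role("Cluster Admin", frozenset({"view", "add services"}))

analysts = Group("analysts", Grant(viewer, "cluster:prod"))
alice = User("alice", groups=[analysts], extra_grants=[Grant(admin, "cluster:dev")])

print(can(alice, "view", "cluster:prod"))
print(can(alice, "add services", "cluster:prod"))
print(can(alice, "add services", "cluster:dev"))
```

Here the group membership gives alice read-only access to one cluster, while the additional role lets her add services on another.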
![Page 24: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/24.jpg)
New RBAC Roles
Each role has the permissions of the role above it, except as noted:
Ambari Admin: all
Cluster Admin: except manage permissions
Cluster Op: except add services, Kerberos, manage Alerts, & upgrades
Service Admin: except alter cluster topology or install components
Service Op: except change configs
Read-Only: only view
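Reading the role list as cumulative (each role is the one above it minus its listed exceptions, which is an interpretation of the slide), the permission sets can be derived by subtraction. The permission strings come from the slide except `"operate services"`, a hypothetical stand-in for whatever a Service Op is still allowed to do.

```python
# Exceptions per role, in the order shown on the slide.
LADDER = [
    ("Ambari Admin", set()),
    ("Cluster Admin", {"manage permissions"}),
    ("Cluster Op", {"add services", "Kerberos", "manage alerts", "upgrades"}),
    ("Service Admin", {"alter cluster topology", "install components"}),
    ("Service Op", {"change configs"}),
]

# "all" = every exception named on the slide, plus viewing, plus a
# hypothetical "operate services" permission (an assumption of this sketch).
ALL = {"view", "operate services"}.union(*(removed for _, removed in LADDER))


def role_permissions():
    """Walk down the ladder, removing each role's exceptions from the one above."""
    perms = {}
    current = set(ALL)
    for role, removed in LADDER:
        current = current - removed
        perms[role] = frozenset(current)
    perms["Read-Only"] = frozenset({"view"})   # "only view"
    return perms


perms = role_permissions()
for role, allowed in perms.items():
    print(role, len(allowed))
```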
![Page 27: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/27.jpg)
Background: Upgrade Terminology
Manual Upgrade
• The user follows instructions to upgrade the stack
• Incurs downtime
Rolling Upgrade
• Automated
• Upgrades one component per host at a time
• Preserves cluster operation and minimizes service impact
Express Upgrade
• Automated
• Runs in parallel across hosts
• Incurs downtime
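The scheduling difference between the two automated methods can be sketched as batches of hosts for a single component. This is a simplified illustration under that assumption, not Ambari's upgrade orchestration; `plan_batches` and the host names are hypothetical.

```python
def plan_batches(hosts, method):
    """Return the order in which hosts are taken down and upgraded.

    rolling: one host per batch, so the remaining hosts keep serving.
    express: a single batch with every host, upgraded in parallel.
    """
    if method == "rolling":
        return [[host] for host in hosts]
    if method == "express":
        return [list(hosts)]
    raise ValueError(f"unknown upgrade method: {method}")


def all_hosts_down_at_once(batches, hosts):
    """True if some batch stops every host at the same time."""
    return any(len(batch) == len(hosts) for batch in batches)


hosts = ["worker001", "worker002", "worker003"]
rolling = plan_batches(hosts, "rolling")
express = plan_batches(hosts, "express")

print(rolling, all_hosts_down_at_once(rolling, hosts))
print(express, all_hosts_down_at_once(express, hosts))
```

Rolling takes as many steps as there are hosts but never stops them all together; express finishes in one step at the cost of downtime.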
![Page 28: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/28.jpg)
Automated Upgrade: Rolling or Express
Check Prerequisites
Review the prereqs to confirm your cluster configs are ready
Prepare
Take backups of critical cluster metadata
Register + Install
Register the HDP repository and install the target HDP version on the cluster
Perform Upgrade
Perform the HDP upgrade. The steps depend on upgrade method: Rolling or Express
Finalize
Finalize the upgrade, making the target version the current version
![Page 29: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/29.jpg)
Process: Rolling Upgrade
ZooKeeper
Ranger
Core Masters and Core Slaves: HDFS, YARN, HBase
Hive
Oozie
Falcon
Clients: HDFS, YARN, MR, Tez, HBase, Pig, Hive, etc.
Kafka
Knox
Storm
Slider
Flume
Finalize or Downgrade
![Page 30: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/30.jpg)
Grafana for Ambari Metrics
FEATURES
• Grafana as a "Native UI" for Ambari Metrics
• Pre-built Dashboards: Host-level, Service-level
• Supports HTTPS
DASHBOARDS
• System Home, Servers
• HDFS Home, NameNodes, DataNodes
• YARN Home, Applications, Job History Server
• HBase Home, Performance, Misc
![Page 31: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/31.jpg)
Grafana includes pre-built dashboards for visualizing the most important cluster metrics.
![Page 32: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/32.jpg)
The HDFS NameNode dashboard highlights file system activity.
![Page 33: Simplified Cluster Operation and Troubleshooting](https://reader035.vdocuments.mx/reader035/viewer/2022062306/5873c0d91a28abbc788b6637/html5/thumbnails/33.jpg)
Future of Ambari
• Cloud features
• Multiple instances of same service at different versions, e.g., Spark 1.6 and Spark 2.0
• YARN assemblies
• Component & Patch Upgrades: upgrade individual components in the same stack version, e.g., just DN and RM in HDP 2.4.*.* with zero downtime