
Upload: vuongquynh

Post on 28-Aug-2018



CHEF PATTERNS AT BLOOMBERG SCALE HADOOP INFRASTRUCTURE TEAM

https://github.com/bloomberg/chef-bach

Freenode: #chef-bach


BLOOMBERG CLUSTERS 2

• APPLICATION SPECIFIC

• Hadoop, Kafka

• ENVIRONMENT SPECIFIC

• Networking, Storage

• BUILT REGULARLY

• DEDICATED “BOOTSTRAP” SERVER

• Virtual Machine

• DEDICATED CHEF-SERVER


WHY A VM? 3

• LIGHTWEIGHT PRE-REQUISITE

• Low memory/Storage Requirements

• RAPID DEPLOYMENT

• Vagrant for Bring-Up

• Vagrant for Re-Configuration

• EASY RELEASE MANAGEMENT

• MULTIPLE VM PER HYPERVISOR

• Multiple Clusters

• EASY RELOCATION


SERVICES OFFERED 4

• REPOSITORIES

• APT

• Ruby Gems

• Static Files (Chef!)

• CHEF SERVER

• KERBEROS KDC

• PXE SERVER

• DHCP/TFTP Server

• Cobbler (https://github.com/bloomberg/cobbler-cookbook)

• Bridged Networking (for test VMs)

• STRONG ISOLATION


BUILDING BOOTSTRAP 5

• CHEF AND VAGRANT

• Generic Image (Jenkins)

• NETWORK CONFIGURATION

• CORRECTING “KNIFE.RB”

• CHEF SERVER RECONFIGURATION

• CLEAN UP (CHEF REST API)

• CONVERT BOOTSTRAP TO BE AN ADMIN CLIENT

• Secrets/Keys


BUILDING BOOTSTRAP 6

• CHEF-SOLO PROVISIONER

# Chef provisioning
bootstrap.vm.provision "chef_solo" do |chef|
  chef.environments_path = [[:vm, ""]]
  chef.environment = env_name
  chef.cookbooks_path = [[:vm, ""]]
  chef.roles_path = [[:vm, ""]]
  chef.add_recipe("bcpc::bootstrap_network")
  chef.log_level = "debug"
  chef.verbose_logging = true
  chef.provisioning_path = "/home/vagrant/chef-bcpc/"
end

• CHEF SERVER RECONFIGURATION

• NGINX, SOLR, RABBITMQ

# Reconfigure chef-server
bootstrap.vm.provision :shell, :inline => "chef-server-ctl reconfigure"


BUILDING BOOTSTRAP 7

• CLEAN UP (REST API)

ruby_block "cleanup-old-environment-databag" do
  block do
    rest = Chef::REST.new(node[:chef_client][:server_url], "admin",
                          "/etc/chef-server/admin.pem")
    rest.delete("/environments/GENERIC")
    rest.delete("/data/configs/GENERIC")
  end
end

ruby_block "cleanup-old-clients" do
  block do
    system_clients = ["chef-validator", "chef-webui"]
    rest = Chef::REST.new(node[:chef_client][:server_url], "admin",
                          "/etc/chef-server/admin.pem")
    rest.get_rest("/clients").each do |client|
      if !system_clients.include?(client.first)
        rest.delete("/clients/#{client.first}")
      end
    end
  end
end
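The client clean-up above is essentially a filter: keep the clients that ship with the Chef server and delete everything else. A minimal sketch of just that selection step in plain Ruby, with no Chef server involved. The helper name `clients_to_delete` and the sample data are hypothetical, and the hash shape (client name mapped to its URL) is an assumption about what the `/clients` endpoint returns.

```ruby
# Clients that ship with the Chef server and must survive the clean-up.
SYSTEM_CLIENTS = ["chef-validator", "chef-webui"].freeze

# Hypothetical helper: given a hash of client name => URL, return the
# names that the clean-up step would delete.
def clients_to_delete(clients, keep = SYSTEM_CLIENTS)
  clients.keys.reject { |name| keep.include?(name) }
end

# Sample data standing in for the server response (assumed shape).
sample_clients = {
  "chef-validator" => "https://bootstrap/clients/chef-validator",
  "chef-webui"     => "https://bootstrap/clients/chef-webui",
  "old-build-host" => "https://bootstrap/clients/old-build-host"
}

clients_to_delete(sample_clients).each do |name|
  puts "would delete /clients/#{name}"
end
```

Keeping the selection separate from the REST calls makes it possible to check which clients would be removed before anything is deleted.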


BUILDING BOOTSTRAP 8

• CONVERT TO ADMIN (BOOTSTRAP_CONFIG.RB)

ruby_block "convert-bootstrap-to-admin" do
  block do
    rest = Chef::REST.new(node[:chef_client][:server_url],
                          "admin",
                          "/etc/chef-server/admin.pem")
    rest.put_rest("/clients/#{node[:hostname]}", { :admin => true })
    rest.put_rest("/nodes/#{node[:hostname]}",
                  { :name => node[:hostname],
                    :run_list => ['role[BCPC-Bootstrap]'] })
  end
end


CLUSTER USABILITY 9

• CODE DEPLOYMENT

• APPLICATION COOKBOOKS

• RUBY GEMS

• Zookeeper, WebHDFS

• CLUSTERS ARE NOT SINGLE MACHINE

• Which machine to deploy

• Idempotency; Races


DEPLOY TO HDFS 10

• USE CHEF DIRECTORY RESOURCE

• USE CUSTOM PROVIDER

• https://github.com/bloomberg/chef-bach/blob/master/cookbooks/bcpc-hadoop/libraries/hdfsdirectory.rb

directory "/projects/myapp" do
  mode 755
  owner "foo"
  recursive true
  provider BCPC::HdfsDirectory
end


DEPLOY KAFKA TOPIC 11

• USE LWRP

• Dynamic Topic; Right Zookeeper

• PROVIDER CODE AVAILABLE AT

• https://github.com/mthssdrbrg/kafka-cookbook/pull/49

# Kafka Topic Resource
actions :create, :update
attribute :name, :kind_of => String, :name_attribute => true
attribute :partitions, :kind_of => Integer, :default => 1
attribute :replication, :kind_of => Integer, :default => 1
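A provider for a resource like this has to decide, for each topic, whether to create it, update it, or leave it alone. The sketch below shows one way that decision could look in plain Ruby. `TopicSpec` and `plan_topic_action` are hypothetical names, not part of the linked pull request, and the `existing` hash (topic name mapped to its current partition count) stands in for whatever a real provider would read from ZooKeeper.

```ruby
# Hypothetical value object mirroring the resource attributes above,
# including the same defaults and type checks.
TopicSpec = Struct.new(:name, :partitions, :replication) do
  def initialize(name, partitions = 1, replication = 1)
    raise ArgumentError, "name must be a String" unless name.is_a?(String)
    raise ArgumentError, "partitions must be an Integer" unless partitions.is_a?(Integer)
    raise ArgumentError, "replication must be an Integer" unless replication.is_a?(Integer)
    super
  end
end

# existing: hash of topic name => current partition count (assumed shape).
# Kafka only lets a topic's partition count grow, so a request for fewer
# partitions than already exist is treated as a no-op in this sketch.
def plan_topic_action(spec, existing)
  current = existing[spec.name]
  return :create if current.nil?
  return :update if spec.partitions > current
  :nothing
end

existing_topics = { "events" => 2 }
puts plan_topic_action(TopicSpec.new("events", 4), existing_topics)
puts plan_topic_action(TopicSpec.new("audit"), existing_topics)
```

Returning a symbol rather than acting directly keeps the decision idempotent: running it twice against the same state gives the same answer.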


KERBEROS 12

• KEYTABS

• Per Service / Host

• Up to 10 Keytabs per Host

• WHAT ABOUT MULTI HOMED HOSTS?

• Hadoop imputes _HOST

• PROVIDERS

• WebHDFS uses SPNEGO

• SYSTEM ROLE ACCOUNTS

• TENANT ROLE ACCOUNTS

• AVAILABLE AT

• https://github.com/bloomberg/chef-bach/tree/kerberos


LOGIC INJECTION 13

• COMPLETE CODE CAN BE FOUND AT

• Community cookbook

• https://github.com/mthssdrbrg/kafka-cookbook#controlling-restart-of-kafka-brokers-in-a-cluster

• Wrapper custom recipe

• https://github.com/bloomberg/chef-bach/blob/rolling_restart/cookbooks/kafka-bcpc/recipes/coordinate.rb

Statutory Warning

Code snippets have been edited to fit the slides, which may have introduced logical incoherence, bugs, and unreadable code. Reader discretion is requested.


LOGIC INJECTION 14

• WE USE COMMUNITY COOKBOOKS

• They take care of the standard installing, enabling, and starting of services

• NEED TO ADD LOGIC TO COOKBOOK RECIPES

• Take action on a service only when conditions are satisfied

• Take action on a service based on dependent service state


LOGIC INJECTION 15

VANILLA COMMUNITY COOKBOOK:

template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :restart, 'service[kafka]', :delayed
  end
end

service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end


LOGIC INJECTION 16

VANILLA COMMUNITY COOKBOOK:

template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :restart, 'service[kafka]', :delayed
  end
end

#----- Remove -----#
service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end
#----- Remove -----#


LOGIC INJECTION 17

VANILLA COMMUNITY COOKBOOK 2.0:

template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :create, 'ruby_block[pre-shim]', :immediately
  end
end

#----- Replace -----#
include_recipe node["kafka"]["start_coordination"]["recipe"]
#----- Replace -----#


LOGIC INJECTION 18

COOKBOOK COORDINATOR RECIPE:

ruby_block 'pre-shim' do
  # pre-restart no-op
  notifies :restart, 'service[kafka]', :delayed
end

service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end


LOGIC INJECTION 19

WRAPPER COORDINATOR RECIPE:

ruby_block 'pre-shim' do
  # pre-restart done here
  notifies :restart, 'service[kafka]', :delayed
end

service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
  notifies :create, 'ruby_block[post-shim]', :immediately
end

ruby_block 'post-shim' do
  # clean-up done here
end


SERVICE ON DEMAND 20

• COMMON SERVICE WHICH CAN BE REQUESTED

• Copy log files from applications into a centralized location

• Gives users a single location to review logs, which also helps with security

• Service available on all the nodes

• Applications can request the service dynamically


SERVICE ON DEMAND 21

• NODE ATTRIBUTE TO STORE SERVICE REQUESTS

default['bcpc']['hadoop']['copylog'] = {}

• DATA STRUCTURE TO MAKE SERVICE REQUESTS

{
  'app_id' => {
    'logfile' => "/path/file_name_of_log_file",
    'docopy' => true   # or false
  }, ...
}


SERVICE ON DEMAND 22

• APPLICATION RECIPES MAKE SERVICE REQUESTS

#
# Updating node attributes to copy HBase master log file to HDFS
#
node.default['bcpc']['hadoop']['copylog']['hbase_master'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.log",
  'docopy' => true
}

node.default['bcpc']['hadoop']['copylog']['hbase_master_out'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.out",
  'docopy' => true
}


SERVICE ON DEMAND 23

• RECIPE FOR THE COMMON SERVICE

node['bcpc']['hadoop']['copylog'].each do |id, f|
  if f['docopy']
    template "/etc/flume/conf/flume-#{id}.conf" do
      source "flume_flume-conf.erb"
      action :create
      ...
      variables(:agent_name => "#{id}",
                :log_location => "#{f['logfile']}")
      notifies :restart, "service[flume-agent-multi-#{id}]", :delayed
    end

    service "flume-agent-multi-#{id}" do
      supports :status => true, :restart => true, :reload => false
      service_name "flume-agent-multi"
      action :start
      start_command "service flume-agent-multi start #{id}"
      restart_command "service flume-agent-multi restart #{id}"
      status_command "service flume-agent-multi status #{id}"
    end
  end
end
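The common-service recipe is driven entirely by the request hash: each entry with `docopy` set yields one Flume configuration file and one service instance. A minimal sketch of that mapping in plain Ruby, useful for seeing what a given set of requests would produce. `flume_agents_for` is a hypothetical helper introduced here, and the sample requests are illustrative.

```ruby
# Hypothetical helper: turn the copylog request hash into the list of
# Flume agents the common recipe would configure. Requests whose
# 'docopy' flag is false (or missing) are skipped.
def flume_agents_for(requests)
  requests.select { |_id, req| req["docopy"] }.map do |id, req|
    {
      agent_name:   id,
      conf_path:    "/etc/flume/conf/flume-#{id}.conf",
      log_location: req["logfile"],
      service:      "flume-agent-multi-#{id}"
    }
  end
end

# Sample requests in the same shape as the node attribute.
sample_requests = {
  "hbase_master"     => { "logfile" => "/var/log/hbase/hbase-master-host1.log", "docopy" => true },
  "hbase_master_out" => { "logfile" => "/var/log/hbase/hbase-master-host1.out", "docopy" => false }
}

flume_agents_for(sample_requests).each do |agent|
  puts "#{agent[:service]} reads #{agent[:log_location]} via #{agent[:conf_path]}"
end
```

Because applications only write data and the common recipe owns all resources, an application can withdraw a request by flipping `docopy` to false without touching the service recipe.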


PLUGGABLE ALERTS 24

• SINGLE SOURCE FOR MONITORED STATS

• Allows users to visualize stats across different parameters

• Did not want the alerting system to duplicate the stats collection

• Need to feed data to the alerting system to generate alerts


PLUGGABLE ALERTS 25

• ATTRIBUTE WHERE USERS CAN DEFINE ALERTS

default["bcpc"]["hadoop"]["graphite"]["queries"] = {
  'hbase_master' => [
    {
      # Query to pull stats from data source
      'type' => "jmx",
      'query' => "memory.NonHeapMemoryUsage_committed",
      'key' => "hbasenonheapmem",
      # Define alert criteria
      'trigger_val' => "max(61,0)",
      'trigger_cond' => "=0",
      'trigger_name' => "HBaseMasterAvailability",
      'trigger_dep' => ["NameNodeAvailability"],
      'trigger_desc' => "HBase master seems to be down",
      'severity' => 1
    }, {
      'type' => "jmx",
      'query' => "memory.HeapMemoryUsage_committed",
      'key' => "hbaseheapmem",
      ...
    }, ...
  ],
  'namenode' => [...]
  ...
}
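Since each entry carries both the stats query and the alert criteria, the alerting side can be derived from the same attribute without a second definition. A minimal sketch of that derivation in plain Ruby. `triggers_from` and the output field names are hypothetical, and joining `trigger_val` and `trigger_cond` into one expression string is an assumption about how the two fields are meant to combine.

```ruby
# Hypothetical helper: flatten the per-service query lists into one list
# of trigger definitions, skipping queries that define no trigger.
def triggers_from(queries)
  queries.flat_map do |service, entries|
    entries.select { |q| q.key?("trigger_name") }.map do |q|
      {
        service:    service,
        item_key:   q["key"],
        name:       q["trigger_name"],
        # Assumed combination: value function followed by the condition.
        expression: "#{q['trigger_val']}#{q['trigger_cond']}",
        depends_on: q.fetch("trigger_dep", []),
        severity:   q["severity"]
      }
    end
  end
end

# Sample data in the same shape as the attribute above.
sample_queries = {
  "hbase_master" => [
    { "type" => "jmx", "query" => "memory.NonHeapMemoryUsage_committed",
      "key" => "hbasenonheapmem", "trigger_val" => "max(61,0)",
      "trigger_cond" => "=0", "trigger_name" => "HBaseMasterAvailability",
      "trigger_dep" => ["NameNodeAvailability"], "severity" => 1 },
    { "type" => "jmx", "query" => "memory.HeapMemoryUsage_committed",
      "key" => "hbaseheapmem" }
  ]
}

triggers_from(sample_queries).each do |t|
  puts "#{t[:name]} on #{t[:service]}: #{t[:item_key]} #{t[:expression]}"
end
```

Entries without trigger fields still feed the stats collection, so a user can graph a metric without alerting on it.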


TEMPLATE PITFALLS 26

• LIBRARY FUNCTION CALLS IN WRAPPER COOKBOOKS

• Community cookbook provider accepts template as an attribute

• Template passed from wrapper makes a library function call

• Wrapper recipe includes the module of library function


TEMPLATE PITFALLS 27

• WRAPPER RECIPE

...
Chef::Resource.send(:include, Bcpc::OSHelper)
...
cobbler_profile "bcpc_host" do
  kickstart "cobbler.bcpc_ubuntu_host.preseed"
  distro "ubuntu-12.04-mini-x86_64"
end
...

• FUNCTION CALL IN TEMPLATE

...
d-i passwd/user-password-crypted password
  <%="#{get_config(@node, 'cobbler-root-password-salted')}"%>
d-i passwd/user-uid string
...


TEMPLATE PITFALLS 28

• MODIFIED FUNCTION CALL IN TEMPLATE

...
d-i passwd/user-password-crypted password
  <%="#{Bcpc::OSHelper.get_config(@node, 'cobbler-root-password-salted')}"%>
d-i passwd/user-uid string
...
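The pitfall is a scoping one: mixing a helper module into one class does not make its methods visible where the template is actually rendered. The plain-ERB sketch below reproduces the effect outside Chef. It is an analogy under stated assumptions, not Chef's real template context: `DemoHelper`, `DemoRecipe` and `DemoTemplateContext` are hypothetical stand-ins for the library module, the wrapper recipe, and the community cookbook's rendering context.

```ruby
require "erb"

# Stand-in for a cookbook library module (hypothetical).
module DemoHelper
  extend self   # makes DemoHelper.get_config callable directly

  def get_config(store, key)
    store.fetch(key)
  end
end

# Stand-in for the wrapper recipe side: the module is mixed in here only.
class DemoRecipe
  include DemoHelper
end

# Stand-in for the rendering side: a separate context object that never
# included the module, so bare helper calls cannot resolve.
class DemoTemplateContext
  def initialize(store)
    @node = store
  end

  def render(source)
    ERB.new(source).result(binding)
  end
end

store   = { "root-password" => "salted-hash" }
context = DemoTemplateContext.new(store)

bare      = "<%= get_config(@node, 'root-password') %>"
qualified = "<%= DemoHelper.get_config(@node, 'root-password') %>"

begin
  context.render(bare)
rescue NoMethodError => e
  puts "bare call failed: #{e.class}"
end

puts "qualified call rendered: #{context.render(qualified)}"
```

Qualifying the call with the module name, as on the slide above, removes the dependency on whatever happens to be mixed into the rendering context.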


DYNAMIC RESOURCES 29

• ANTI-PATTERN?

ruby_block "create namenode directories" do
  block do
    node[:bcpc][:storage][:mounts].each do |d|
      dir = Chef::Resource::Directory.new("#{mount_root}/#{d}/dfs/nn",
                                          run_context)
      dir.owner "hdfs"
      dir.group "hdfs"
      dir.mode 0755
      dir.recursive true
      dir.run_action :create

      exe = Chef::Resource::Execute.new("fixup nn owner", run_context)
      exe.command "chown -Rf hdfs:hdfs #{mount_root}/#{d}/dfs"
      exe.only_if {
        Etc.getpwuid(File.stat("#{mount_root}/#{d}/dfs/").uid).name != "hdfs"
      }
      # Run the resource immediately, as with the directory above
      exe.run_action :run
    end
  end
end


DYNAMIC RESOURCES 30

• SYSTEM CONFIGURATION

• Lengthy Configuration of a Storage Controller

• Setting Attributes at Converge Time

• Compile Time Actions?

• MUST WRAP IN RUBY_BLOCKS

• Does not Update the Resource Collection

• lazy Everywhere:

• Guards: not_if{lazy{node[…]}.call.map{…}}
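The reason for wrapping in `ruby_block` and deferring with `lazy` is timing: a value read while the recipe is compiled is frozen before the converge phase has had a chance to set it. The plain-Ruby sketch below imitates that with a hash and a lambda. It does not use Chef; `compile_then_converge` and the `/disk/...` paths are hypothetical, and the lambda only stands in for the idea of deferred evaluation.

```ruby
# Simulate one run: values are captured "at compile time", then an
# attribute changes "at converge time", then the results are compared.
def compile_then_converge
  # A tiny stand-in for node attributes that change during the run.
  attributes = { "mounts" => [] }

  # Evaluated immediately: captures the (still empty) list.
  eager_dirs = attributes["mounts"].map { |m| "/disk/#{m}/dfs/nn" }

  # Deferred: the lambda reads the attribute only when it is called.
  deferred_dirs = -> { attributes["mounts"].map { |m| "/disk/#{m}/dfs/nn" } }

  # An earlier step of the converge phase discovers the mounts.
  attributes["mounts"] = ["0", "1"]

  [eager_dirs, deferred_dirs.call]
end

eager, deferred = compile_then_converge
puts "eager:    #{eager.inspect}"
puts "deferred: #{deferred.inspect}"
```

The eager list stays empty while the deferred one sees both mounts, which is the behavior that pushes recipes toward deferred evaluation in properties and guards.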


SERVICE RESTART 31

• WE USE JMXTRANS TO MONITOR JMX STATS

• Service to be monitored varies with node

• There can be more than one service to be monitored

• Monitored service restart requires JMXtrans to be restarted**


SERVICE RESTART 32

• DATA STRUCTURE IN ROLES TO DEFINE THE SERVICES

"default_attributes" : {
  "jmxtrans": {
    "servers": [
      {
        "type": "datanode",
        "service": "hadoop-hdfs-datanode",
        "service_cmd": "org.apache.hadoop.hdfs.server.datanode.DataNode"
      }, {
        "type": "hbase_rs",
        "service": "hbase-regionserver",
        "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
      }
    ]
  } ...

"service": Dependent Service Name

"service_cmd": String to uniquely identify the service process


SERVICE RESTART 33

• JMXTRANS SERVICE RESTART LOGIC BUILT DYNAMICALLY

# Store the dependent service name and process ids in local variables
jmx_services = Array.new
jmx_srvc_cmds = Hash.new
node['jmxtrans']['servers'].each do |server|
  jmx_services.push(server['service'])
  jmx_srvc_cmds[server['service']] = server['service_cmd']
end

service "restart jmxtrans on dependent service" do
  service_name "jmxtrans"
  supports :restart => true, :status => true, :reload => true
  action :restart
  # Subscribes from all dependent services
  jmx_services.each do |jmx_dep_service|
    subscribes :restart, "service[#{jmx_dep_service}]", :delayed
  end
  # What if a process is re/started externally?
  only_if { process_require_restart?("jmxtrans", "jmxtrans-all.jar",
                                     jmx_srvc_cmds) }
end


SERVICE RESTART 34

def process_require_restart?(process_name, process_cmd, dep_cmds)
  # Start time of the service process
  tgt_process_pid = `pgrep -f #{process_cmd}`
  ...
  tgt_process_stime = `ps --no-header -o start_time #{tgt_process_pid}`
  ...
  ret = false
  restarted_processes = Array.new
  # Start time of all the service processes on which it depends
  dep_cmds.each do |dep_process, dep_cmd|
    dep_pids = `pgrep -f #{dep_cmd}`
    if dep_pids != ""
      dep_pids_arr = dep_pids.split("\n")
      dep_pids_arr.each do |dep_pid|
        dep_process_stime = `ps --no-header -o start_time #{dep_pid}`
        # Compare the start time
        if DateTime.parse(tgt_process_stime) <
           DateTime.parse(dep_process_stime)
          restarted_processes.push(dep_process)
          ret = true
        end
        ...
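Stripped of the `pgrep` and `ps` calls, the check is a comparison of start times: the monitor needs a restart if any process it depends on started after it did. A minimal sketch of that core as a pure function, so it can be exercised without live processes. `restarted_after` and `restart_required?` are hypothetical names, and the timestamps are illustrative.

```ruby
# Hypothetical pure function. target_start is the monitoring process's
# start time; dep_starts maps each dependent service name to the start
# times of its processes. Returns the services with at least one process
# that started after the monitor did.
def restarted_after(target_start, dep_starts)
  dep_starts.select { |_svc, starts| starts.any? { |t| t > target_start } }.keys
end

# The monitor needs a restart if any dependency was (re)started later.
def restart_required?(target_start, dep_starts)
  !restarted_after(target_start, dep_starts).empty?
end

jmxtrans_start = Time.utc(2015, 4, 1, 10, 0, 0)
dependencies = {
  "hadoop-hdfs-datanode" => [Time.utc(2015, 4, 1, 9, 30, 0)],
  "hbase-regionserver"   => [Time.utc(2015, 4, 1, 10, 15, 0)]
}

puts "restart needed: #{restart_required?(jmxtrans_start, dependencies)}"
puts "because of: #{restarted_after(jmxtrans_start, dependencies).join(', ')}"
```

Because the decision depends only on observed start times, it also catches a dependent service that was restarted by hand outside of a Chef run.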


ROLLING RESTART 35

• AUTOMATIC CONVERGENCE

• AVAILABILITY

• High Availability

• Toxic Configuration

• HOW

• Check Masters for Slave Status

• Synchronous Communication

• Locking


ROLLING RESTART 36

• FLAGGING

• Negative Flagging – flag when a service is down

• Positive Flagging – flag when a service is reconfiguring

• Deadlock Avoidance

• CONTENTION

• Poll & Wait

• Fail the Run

• Simply Skip Service Restart and Go On

• Store the Need for Restart

• Breaks Assumptions of Procedural Chef Runs
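The flagging and contention options above can be combined into a small loop: try to set a flag, poll and wait a bounded number of times if another node holds it, and otherwise skip the restart so the caller can record that one is still needed. The sketch below uses an in-memory class in place of the shared state store. `FlagStore` and `with_restart_flag` are hypothetical, and a real store would have to provide an atomic create for `try_acquire` to be safe across nodes.

```ruby
# In-memory stand-in for the shared state store (hypothetical).
class FlagStore
  def initialize
    @flags = {}
  end

  # Set the flag only if nobody else holds it; report whether we got it.
  def try_acquire(key, owner)
    return false if @flags.key?(key) && @flags[key] != owner
    @flags[key] = owner
    true
  end

  # Clear the flag, but only if this owner holds it.
  def release(key, owner)
    @flags.delete(key) if @flags[key] == owner
  end

  def holder(key)
    @flags[key]
  end
end

# Positive flagging with poll & wait: hold the flag while the block runs,
# retry a bounded number of times, then give up with :skipped so the
# caller can fail the run or store the need for a restart.
def with_restart_flag(store, key, owner, attempts: 3, wait: 0.1)
  attempts.times do
    if store.try_acquire(key, owner)
      begin
        yield
        return :restarted
      ensure
        store.release(key, owner)   # always clear our own flag
      end
    end
    sleep wait
  end
  :skipped
end

store = FlagStore.new
result = with_restart_flag(store, "kafka-restart", "node-1", wait: 0) do
  puts "restarting kafka on node-1"
end
puts "outcome: #{result}"
```

Bounding the attempts is one way to avoid deadlock: a node never waits forever on a flag left behind by a failed run.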


ROLLING RESTART 37

• SERVICE DEFINITION

hadoop_service "zookeeper-server" do
  dependencies ["template[/etc/zookeeper/conf/zoo.cfg]",
                "template[/usr/lib/zookeeper/bin/zkServer.sh]",
                "template[/etc/default/zookeeper-server]"]
  process_identifier "org.apache.zookeeper ... QuorumPeerMain"
end


ROLLING RESTART 38

• SYNCH STATE STORE

• Zookeeper

• SERVICE RESTART (KAFKA) VALIDATION CHECK

• Based on Jenkins pattern for wait_until_ready!

• Verifies that the service is up to an acceptable level

• Passes or stops the Chef run

• FUTURE DIRECTIONS

• Topology Aware Deployment

• Data Aware Deployment
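A validation check of this kind is a bounded poll: test readiness repeatedly, and either return once the service is acceptable or raise so the Chef run stops. A minimal sketch in plain Ruby follows. It borrows the name `wait_until_ready!` only for illustration and is not the Jenkins cookbook's implementation. The readiness test is supplied by the caller as a block, and the counter used below is a placeholder for a real check.

```ruby
# Hypothetical helper: poll a readiness check until it passes or the
# timeout expires. Returns true on success, raises on timeout.
def wait_until_ready!(timeout: 5, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "service did not become ready within #{timeout}s"
    end
    sleep interval
  end
end

# Placeholder readiness check: passes on the third poll.
polls = 0
wait_until_ready!(timeout: 2, interval: 0.01) { (polls += 1) >= 3 }
puts "ready after #{polls} polls"
```

Raising on timeout is what makes the check "pass or stop the Chef run": the restart of the next node never proceeds on top of a service that has not recovered.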


WE ARE HIRING

JOBS.BLOOMBERG.COM:

• Hadoop Infrastructure Engineer

• DevOps Engineer Search Infrastructure

https://github.com/bloomberg/chef-bach

Freenode: #chef-bach