Managing Puppet using MCollective

R.I.Pienaar Puppet Camp Ghent Managing Puppet using MCollective

Upload: puppet-labs

Post on 10-May-2015

DESCRIPTION

R.I. Pienaar's talk "Managing Puppet using MCollective" at Puppet Camp Ghent, 2013 and at Puppet Camp New York 2013.

TRANSCRIPT

Page 1: Managing Puppet using MCollective

R.I.Pienaar

Puppet Camp Ghent

Managing Puppet using MCollective

Page 2: Managing Puppet using MCollective

R.I.Pienaar | [email protected] | http://devco.net | @ripienaar

Who am I?

• Puppet user since 0.22.x

• Architect of MCollective

• Author of Extlookup and Hiera

• Developer at Puppet Labs London

• Blog at http://devco.net

• Tweets at @ripienaar

• Volcane on IRC

Page 3: Managing Puppet using MCollective

The Problem?

• Puppet needs management just like other software

• Enabling, disabling, ad-hoc runs, custom environments etc

• The Puppet Master is a finite resource that needs protection

• Orchestrated deploys

Page 4: Managing Puppet using MCollective

Available on yum.puppetlabs.com and apt.puppetlabs.com

http://srt.ly/mcpuppet

package { ["mcollective-puppet-agent", "mcollective-puppet-client"]: ensure => present }

MCollective Puppet Agent

Page 5: Managing Puppet using MCollective

Obtaining The Agent Status

Page 6: Managing Puppet using MCollective

Obtaining Statuses

$ mco puppet status

* [ ============================================================> ] 11 / 11

node8.example.net: Currently stopped; last completed run 14 minutes 16 seconds ago ....

Summary of Applying:

false = 11

Summary of Daemon Running:

stopped = 11

Summary of Enabled:

enabled = 10
disabled = 1

Summary of Idling:

false = 11

Finished processing 11 / 11 hosts in 72.05 ms

Callouts: the node8.example.net line is the per node status; the "Summary of" sections are the estate wide summary.
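The estate wide summary is a tally of the per node replies. A minimal Python sketch of that tally follows; the reply field names and the sample data are assumptions made for illustration, not the agent's actual reply format.

```python
from collections import Counter

# Hypothetical per-node replies; field names are assumptions for illustration.
replies = [
    {"node": f"node{i}.example.net", "applying": False, "daemon": "stopped",
     "enabled": i != 9, "idling": False}
    for i in range(11)
]

def summarise(replies, field):
    """Count how many nodes reported each value of one field."""
    return Counter(reply[field] for reply in replies)

for field in ("applying", "daemon", "enabled", "idling"):
    print(field, dict(summarise(replies, field)))
```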

Page 7: Managing Puppet using MCollective

$ mco puppet count

Total Puppet nodes: 11

Nodes currently enabled: 10
Nodes currently disabled: 1

Nodes currently doing puppet runs: 5
Nodes currently stopped: 6

Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 6

Obtaining Statuses

Page 8: Managing Puppet using MCollective

$ mco rpc puppet last_run_summary

* [ ============================================================> ] 28 / 28

. . .

Summary of Config Retrieval Time:

Average: 20.13

Summary of Total Resources:

Average: 435

Summary of Total Time:

Average: 39.33

Finished processing 28 / 28 hosts in 311.23 ms

Obtaining Statuses

Page 9: Managing Puppet using MCollective

Running Puppet

Page 10: Managing Puppet using MCollective

$ mco puppet runonce

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2593.85 ms

$ mco puppet count

Total Puppet nodes: 11

Nodes currently enabled: 10
Nodes currently disabled: 1

Nodes currently doing puppet runs: 2
Nodes currently stopped: 9

Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 8

Doing Basic Runs

Puppet 3 disable message

Run with default configured splay and splaylimit

Page 11: Managing Puppet using MCollective

Run with no splay, still subject to enable/disable

$ mco puppet runonce -f

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing Basic Runs

Page 12: Managing Puppet using MCollective

Force splay and set a custom splay limit

$ mco puppet runonce --splay --splaylimit 120

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing Basic Runs
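Splay spreads the load on the master by delaying each node's run by a random amount, up to the splay limit. The Python sketch below illustrates the idea only; the uniform distribution, the function name and the node names are assumptions, not the agent's code.

```python
import random

def splay_delays(nodes, splaylimit, seed=None):
    """Pick a random start delay in [0, splaylimit] seconds for every node."""
    rng = random.Random(seed)
    return {node: rng.uniform(0, splaylimit) for node in nodes}

# Hypothetical node list; --splaylimit 120 from the slide becomes splaylimit=120.
nodes = [f"node{i}.example.net" for i in range(11)]
delays = splay_delays(nodes, splaylimit=120, seed=1)
for node, delay in sorted(delays.items(), key=lambda item: item[1]):
    print(f"{node} starts after {delay:5.1f}s")
```

With a limit of 120 seconds the 11 runs start at scattered points inside a two minute window instead of all at once.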

Page 13: Managing Puppet using MCollective

Selects 2 tags in a specific Puppet Environment

$ mco puppet runonce --tag webserver --tag syslog --environment development

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Tags and Environment

Page 14: Managing Puppet using MCollective

Does a noop run; gathers reports and audit information

$ mco puppet runonce --noop

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing noop Runs

Page 15: Managing Puppet using MCollective

When puppet.conf has noop=true, do an actual run on demand

$ mco puppet runonce --tag webserver --no-noop

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing no-noop Runs

Page 16: Managing Puppet using MCollective

Does a single run against a different Puppet Master

$ mco puppet runonce --server secops.example.net:8134 --tag compliance

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Choosing a Master

Page 17: Managing Puppet using MCollective

Preventing Puppet Runs

Page 18: Managing Puppet using MCollective

The Big Red Button

Disables Puppet; does not change the reason on nodes that are already disabled

$ mco puppet disable "we f'd up, stop the train!"

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Could not disable Puppet: Already disabled

Summary of Enabled:

disabled = 11

Finished processing 11 / 11 hosts in 90.06 ms

Page 19: Managing Puppet using MCollective

The Big Green Button

Enables the Puppet nodes whose disable message matches /stop the train/; nodes disabled for another reason are not selected

$ mco puppet enable -S 'puppet().disable_message=/stop the train/'

* [ ============================================================> ] 10 / 10

Summary of Enabled:

enabled = 10

Finished processing 10 / 10 hosts in 90.06 ms
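The red and green buttons together behave like a small state machine: disabling never overwrites an existing reason, and enabling is limited to nodes whose message matches a pattern. A Python sketch of that behaviour under a simplified in-memory model (the dictionary and function names are hypothetical):

```python
import re

# Hypothetical model of 11 agents; None means "enabled".
disabled_reason = {f"node{i}.example.net": None for i in range(11)}
disabled_reason["node9.example.net"] = "machine under maintenance"

def disable(state, message):
    """Disable every enabled node; nodes already disabled keep their reason."""
    for node, reason in state.items():
        if reason is None:
            state[node] = message

def enable_matching(state, pattern):
    """Enable only the nodes whose disable message matches the pattern."""
    matched = [n for n, r in state.items() if r is not None and re.search(pattern, r)]
    for node in matched:
        state[node] = None
    return matched

disable(disabled_reason, "we f'd up, stop the train!")
reenabled = enable_matching(disabled_reason, r"stop the train")
print(len(reenabled), "nodes re-enabled")
print("still disabled:", [n for n, r in disabled_reason.items() if r])
```

In this model node9 stays disabled with its maintenance message, which is consistent with the 10 / 10 hosts shown for the enable request.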

Page 20: Managing Puppet using MCollective

Operating On Groups Of Hosts

Page 21: Managing Puppet using MCollective

Selective Runs

Run using a filter: all web servers with fact cluster=a

$ mco puppet runonce -W "cluster=a roles::webserver"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms

Callouts: cluster=a is a Facter fact; roles::webserver is a Puppet class
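The filter string mixes two kinds of terms: a term containing "=" is compared against a fact, and a bare term is looked up among the node's classes. The Python sketch below shows that matching rule against a hypothetical inventory; it is a simplified illustration, not MCollective's discovery code, and it ignores regular expressions and other operators.

```python
def matches(node, filter_string):
    """True when the node satisfies every space-separated term of the filter."""
    for term in filter_string.split():
        if "=" in term:
            fact, value = term.split("=", 1)
            if node["facts"].get(fact) != value:
                return False
        elif term not in node["classes"]:
            return False
    return True

# Hypothetical inventory, made up for illustration.
inventory = [
    {"name": "web1", "facts": {"cluster": "a"}, "classes": {"roles::webserver"}},
    {"name": "web2", "facts": {"cluster": "b"}, "classes": {"roles::webserver"}},
    {"name": "db1", "facts": {"cluster": "a"}, "classes": {"roles::database"}},
]
selected = [n["name"] for n in inventory if matches(n, "cluster=a roles::webserver")]
print(selected)
```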

Page 22: Managing Puppet using MCollective

Selective Runs

Run using a filter: nodes where we manage /srv/www

$ mco puppet runonce -S "resource('File[/srv/www]').managed=true"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms

Any Puppet resource

Page 23: Managing Puppet using MCollective

Selective Runs

Run using a filter: nodes whose most recent run had config_version xyz and more than 5 resource failures

$ mco puppet runonce -S "resource().failed_resources>5 and resource().config_version=xyz"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms
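The compound filter is a boolean expression evaluated against each node's last run data. A Python sketch of the same predicate, using made up data and hypothetical names:

```python
# Hypothetical last-run data per node; the values are made up for illustration.
last_run = {
    "node1.example.net": {"failed_resources": 7, "config_version": "xyz"},
    "node2.example.net": {"failed_resources": 0, "config_version": "xyz"},
    "node3.example.net": {"failed_resources": 9, "config_version": "abc"},
}

def needs_rerun(summary, version="xyz", max_failures=5):
    """Both conditions must hold, mirroring the 'and' in the filter."""
    return (summary["failed_resources"] > max_failures
            and summary["config_version"] == version)

targets = sorted(node for node, summary in last_run.items() if needs_rerun(summary))
print(targets)
```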

Page 24: Managing Puppet using MCollective

Runs all nodes with a maximum concurrency

$ mco puppet runall 7
2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly

Page 25: Managing Puppet using MCollective

Does not attempt to manage disabled nodes

2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes

Roll Out A Change Quickly

Page 26: Managing Puppet using MCollective

Starts the first 6 quickly, but takes into account that an administrator is doing 1 other run at the same time

2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7

Roll Out A Change Quickly

Page 27: Managing Puppet using MCollective

node9 was being run by an administrator or normal schedule already, skipped to next node

2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly

Page 28: Managing Puppet using MCollective

Regularly checks the concurrency and starts more nodes as soon as possible.

Average node run time 34.39s, total time 55 seconds

2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly
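The rollout loop shown across these slides can be modelled as: count what is running, start nodes while there is spare capacity, then wait and check again. The Python sketch below simulates that loop under simplifying assumptions (fixed run durations, a fixed number of runs started by someone else, a one second poll); all names are hypothetical and it is not the plugin's implementation, so its timings are not meant to reproduce the log above.

```python
from collections import deque

def run_all(nodes, durations, concurrency, external_running=0, poll=1):
    """Simulate a concurrency-limited rollout; returns (start_times, finish_time)."""
    if external_running >= concurrency:
        raise ValueError("no spare capacity to start any run")
    pending = deque(nodes)
    running = {}   # node -> simulated time at which its run finishes
    started = {}   # node -> simulated time at which its run was started
    now = 0
    while pending or running:
        # Forget runs that have finished by now.
        running = {n: end for n, end in running.items() if end > now}
        # Start more nodes while we are below the concurrency limit.
        while pending and len(running) + external_running < concurrency:
            node = pending.popleft()
            started[node] = now
            running[node] = now + durations[node]
        if pending or running:
            now += poll
    return started, now

# Hypothetical estate: 11 nodes, every run takes 34 simulated seconds.
nodes = [f"node{i}.example.net" for i in range(11)]
durations = {node: 34 for node in nodes}
started, finished = run_all(nodes, durations, concurrency=7, external_running=1)
print(sorted(started.values()), finished)
```

With one externally started run counted against a limit of 7, the model starts 6 nodes immediately and the remaining 5 once capacity frees up.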

Page 29: Managing Puppet using MCollective

Does runonce in batches of 5, 5 minute sleep per batch. ^C after any batch to stop.

15 minute total run time.

$ mco puppet runonce --batch 5 --batch-sleep 300

* [ ============================================================> ] 11 / 11

Finished processing 11 / 11 hosts in 903686.29 ms

Roll Out A Change Slowly

Wait 5 minutes
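The 15 minute figure follows from the batch arithmetic: 11 nodes in batches of 5 is 3 batches, and 3 sleeps of 300 seconds is 900 seconds. The Python sketch below works that out; treating the last batch as also followed by a sleep is an assumption inferred from the reported 903686.29 ms, and the function name is hypothetical.

```python
import math

def batched_run_seconds(nodes, batch_size, batch_sleep):
    """Total sleep time, assuming one sleep after every batch (including the last)."""
    batches = math.ceil(nodes / batch_size)
    return batches, batches * batch_sleep

batches, seconds = batched_run_seconds(nodes=11, batch_size=5, batch_sleep=300)
print(f"{batches} batches, about {seconds / 60:.0f} minutes of sleeping")
```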

Page 30: Managing Puppet using MCollective

Advanced Status And Performance Metrics

Page 31: Managing Puppet using MCollective

Distribution of various metrics.

$ mco puppet summary

Summary statistics for 28 nodes:

Total resources: ▂▇▂▁▁▃▁▂▂▂▄▁▂▁▁▁▁▁▂▁ min: 332.0 max: 695.0
Out Of Sync resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
Failed resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 0.0
Changed resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1
Time since last run (seconds): ▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ min: 10.0 max: 89.0k

Performance Analysis

Page 32: Managing Puppet using MCollective

Distribution of various metrics.

Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1

Performance Analysis
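Each bar row reads as a histogram: the range between min and max is cut into equal buckets, and the bar height shows how many nodes fall in each bucket. The Python sketch below is one way to draw such a row; the bucket count of 20, the scaling and the function name are assumptions for illustration, not the plugin's actual algorithm.

```python
BARS = "▁▂▃▄▅▆▇"

def sparkline(values, buckets=20):
    """Render a histogram of values as a row of block characters."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1.0   # avoid zero width when all values are equal
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    peak = max(counts)
    return "".join(BARS[round(c / peak * (len(BARS) - 1))] for c in counts)

# Hypothetical config retrieval times in seconds.
times = [2.7, 3.1, 4.0, 4.2, 5.5, 6.0, 8.9, 12.4, 31.0, 33.5, 40.2, 57.1]
print(sparkline(times), "min:", min(times), "max:", max(times))
```

A long tail to the right of such a row, as in the config retrieval time on the slide, points at a small group of slow nodes.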

Page 33: Managing Puppet using MCollective

Distribution of config retrieval time.

$ mco plot resource config_retrieval_time

[ASCII plot "Information about Puppet managed resources": y-axis Nodes (0 to 8), x-axis Config Retrieval Time (0 to 60)]

Performance Analysis

Slow machines

Page 34: Managing Puppet using MCollective

Find machines with config_retrieval_time over 30 seconds: all the dev servers.

$ mco find -S "resource().config_retrieval_time > 30"
dev3.example.net
dev4.example.net
dev7.example.net
dev6.example.net
dev8.example.net
dev9.example.net
dev10.example.net

Performance Analysis

Page 35: Managing Puppet using MCollective

Maintenance Windows and Access Control

Page 36: Managing Puppet using MCollective

Only cert=manager can enable and disable the Puppet Agent, indicating maintenance periods

policy default deny
allow cert=manager enable disable * *
allow cert=sysadmin runonce status * *
allow cert=developer * environment=development *

Puppet State As ACL

Page 37: Managing Puppet using MCollective

Puppet State As ACL

policy default deny
allow cert=manager stop start * *
allow cert=noc stop start puppet().enabled=false
allow cert=developer * environment=development *

NOC can start and stop services only during a maintenance window.

Manager user can always override maintenance windows.
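The policy reads top to bottom: a request is allowed if some line matches the caller, the action and the node condition, otherwise the default applies. The Python sketch below models the service policy above as plain data. It is a simplified illustration with hypothetical names, not the authorization plugin's parser, and the node fields are assumptions.

```python
def allowed(policy, caller, action, node):
    """Return True if any rule matches caller, action and node; otherwise deny."""
    for rule_caller, actions, condition in policy:
        if rule_caller != caller:
            continue
        if "*" not in actions and action not in actions:
            continue
        if condition(node):
            return True
    return False  # mirrors "policy default deny"

# Hand-written model of the three allow lines; the lambdas stand in for the filters.
SERVICE_POLICY = [
    ("cert=manager", {"stop", "start"}, lambda node: True),
    ("cert=noc", {"stop", "start"}, lambda node: not node["puppet_enabled"]),
    ("cert=developer", {"*"}, lambda node: node["environment"] == "development"),
]

in_maintenance = {"puppet_enabled": False, "environment": "production"}
in_service = {"puppet_enabled": True, "environment": "production"}
print(allowed(SERVICE_POLICY, "cert=noc", "stop", in_maintenance))
print(allowed(SERVICE_POLICY, "cert=noc", "stop", in_service))
```

Disabling Puppet on a node is what opens the maintenance window for the NOC in this model, while the manager rule has no condition at all.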

Page 38: Managing Puppet using MCollective

What is MCollective?

• Ruby framework for writing Orchestration systems

• Provides Authentication, Authorization and Auditing

• No direct communication between client and nodes

Page 39: Managing Puppet using MCollective

Questions?

twitter: @ripienaar

email: [email protected]

blog: www.devco.net

github: ripienaar

freenode: Volcane
