Puppet at GitHub / ChatOps

Post on 10-May-2015

DESCRIPTION

"Puppet at GitHub / ChatOps" from PuppetConf 2012, by Jesse Newland

Video of "Puppet at GitHub": http://bit.ly/WVS3vQ

Learn more about Puppet: http://bit.ly/QQoAP1

Abstract: Ops at GitHub has a unique challenge - keeping up with the rabid pace of features and products that the GitHub team develops. In this talk, we'll focus on tools and techniques we use to rapidly and confidently ship infrastructure changes/features with Puppet using Puppet-Rspec, CI, Puppet-Lint, branch puppet deploys, and Hubot.

Speaker Bio: Jesse Newland does Ops at GitHub. His favorite hobby is SPOF whack-a-mole, followed closely by guitar and piano. Prior to GitHub, Jesse was the CTO at Rails Machine where he ran a large private cloud and managed several hundred production Ruby on Rails applications using Puppet. To the delight and/or chagrin of the Puppet community, Jesse is to blame for Moonshine, the Ruby DSL for Puppet before Puppet had a Ruby DSL.

TRANSCRIPT

Jesse Newland
jnewland

hey errbody
my name is jesse newland
I do ops at GitHub

Puppet at GitHub

And today I’m going to be talking about Puppet at GitHub.

Really, I’m telling a story in two parts.

All of the amazing Puppet OSS projects @rodjek has written but doesn’t want to talk about

First... I’ll be talking about all of the amazing Puppet open source projects Tim Sharpe has written but doesn’t want to talk about

and how we use them at GitHub

And then, I want to introduce you to the star of the GitHub Ops team, Hubot, and tell you a little bit about something we’ve been calling ChatOps.

The Setup

But, before I get into all of that, I'm actually going to talk about an upcoming talk, one by a coworker of mine at GitHub. Will Farrington is going to be speaking tomorrow at 2:45pm about The Setup, our Puppet-powered GitHubber laptop management solution. It's amazing. It's one of the coolest uses of Puppet I've ever seen, and it's going to completely change the way you think about your development environment.

But I’m not going to be talking about any of that today.

So, yeah, go to Will's talk tomorrow. You won't be disappointed.

Puppet at GitHub

So I guess you could say that I’m talking about

THE REST of Puppet at GitHub

the rest of Puppet at GitHub. For the scope of this talk, I’m going to be talking about the Puppet infrastructure that runs github.com.

4 years, >100k LOC

We’ve been managing GitHub’s infrastructure with Puppet for 4 years, since the move to Rackspace. There’s a ton of code, and we’re developing at a rapid pace.

Simple

But we are obsessed with keeping our Puppet deployment simple.

Single Master

We use a single puppetmaster running lots of unicorns. Nothing fancy. It works for now.

However, we will need to scale this tier up or out in about 6 months if the trends look right. We’ll probably switch to two load balanced puppetmasters around that time.

# cat /etc/cron.d/puppet
13 * * * * root /usr/bin/

cron FTW

We don’t run the agent, but rather run puppet from cron every hour, in combination with runs triggered via Hubot (more on that later).

No ENC

We don’t use an external node classifier

$ cat manifests/nodes/janky.rscloud.pp

node /^janky\d+\.rscloud\.github\.com$/ {
  github::role::janky { 'janky':
    public_address => dns_lookup($fqdn),
    nginx_hostname => $fqdn,
  }
}

([a-z0-9\-_]+)(\d+)([a-z]?)\.(.*)\.github.com

Instead, we give nodes DNS names that adhere to a naming convention that maps them to a pre-defined role
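To make the naming convention concrete, here is a minimal Ruby sketch that splits a hostname into the four groups of the pattern on the slide. The labels role, index, suffix, and site are assumptions of mine; the slide only shows the unnamed groups. The sketch also departs from the slide in two small ways: the first group is made lazy so that multi-digit indexes stay together, and the dot in github.com is escaped.

```ruby
# Hypothetical helper: split a hostname into the parts named by the
# convention on the slide. The keys (role, index, suffix, site) are
# illustrative names, not taken from GitHub's code.
HOSTNAME_PATTERN = /\A([a-z0-9\-_]+?)(\d+)([a-z]?)\.(.*)\.github\.com\z/

def parse_hostname(fqdn)
  match = HOSTNAME_PATTERN.match(fqdn)
  return nil unless match

  role, index, suffix, site = match.captures
  { role: role, index: index.to_i, suffix: suffix, site: site }
end

p parse_hostname('fs1a.rs.github.com')
p parse_hostname('janky12.rscloud.github.com')
```

A name that does not follow the convention returns nil, which is a reasonable place to fail loudly rather than guess a role.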

$ head modules/github/manifests/role/janky.pp

define github::role::janky($public_address, $nginx_hostname='', $god=true) {
  github::core { 'janky': }
  include github::app::janky
  github::nginx { 'janky': }
}

Where the magic happens

Role definitions are where the magic happens. We try to DRY common functionality into our core module and into other simple classes or defines so that role definitions read like a nice summary of what makes this role different from others

augeas { 'my.cnf/avoid_cardinality_skew':
  context => '/files/etc/mysql/my.cnf/mysqld/',
  changes => [
    'set innodb_stats_auto_update 0',
    'set innodb_stats_on_metadata 0',
    'set innodb_stats_on_metadata 64'
  ],
  require => Percona::Server[$::fqdn],
}

Heavy use of augeas

We generally try to avoid templates for configuration files in favor of using Augeas.

It lets us manage the small pieces of configuration we care about and use the OS defaults for the things we don't.

BORING

But I don’t want to just show all of you Puppet code for thirty minutes. That's boring.

What’s interesting about Puppet at GitHub?

I’d rather talk about what's interesting about how we use Puppet at GitHub. And what I think is the most interesting is that we focus heavily on ensuring the Puppet development workflow is easily accessible to everyone at GitHub.

Making Puppet Less Scary

We’re doing our best to make puppet less scary for people that aren’t familiar with it, so they can help the Ops team grow and evolve our infrastructure. We’re doing some things right here, but there’s still a lot of work to do.

I’ve been thinking about this a lot recently as we’ve just had two large infrastructure projects shipped by people that were completely or relatively new to puppet. First, Derek Greentree shipped a Cassandra cluster.

And Adam Roben shipped puppet manifests for our windows build and CI servers.

this is good

This is an awesome trend, and I want it to continue. So I thought I’d talk a bit today about what we’re doing to try to enable even more of this.

Flow just like a (GitHub) Ruby project

For us, an important part of making Puppet development accessible for other developers at GitHub is making the development flow on our puppet codebase as similar as possible to that of any other GitHub Ruby project. That means sticking with some common conventions.

$ ./script/bootstrap

Setup

Like making it as easy to set up as any other project at GitHub

$ cat Gemfile
source :rubygems

gem 'puppet', '2.7.18'
gem 'facter', '1.6.10'
gem 'rspec-puppet', '0.1.2'
gem 'rake', '0.8.7'
gem 'puppet-lint', '0.2.1'
gem 'ruby-augeas', '0.3.0'
gem 'json', '1.5.1'
gem 'fog', '1.3.1'
gem 'librarian-puppet', '0.9.4'
gem 'parallel_tests'

So ruby deps are managed by Bundler

$ cat Puppetfile

forge "http://forge.puppetlabs.com"

mod 'puppetlabs/apt'...

And puppet deps are managed by librarian-puppet, a Bundler-like library that manages the puppet modules your infrastructure depends on and installs them directly from GitHub repositories.

I’m of the opinion that the unit of open source currency is no longer a tarball downloaded from something named *forge. It’s a GitHub repo. All of the developers at GitHub feel the same way, so Tim wrote librarian-puppet.

rodjek / librarian-puppet

For those of you keeping score at home, that’s the first of Tim Sharpe’s open source projects that I’ve mentioned. Hi Tim!

Making puppet flow like other projects at GitHub means ensuring we have good editor support for the language

$ ./script/cibuild

Tests

It means running tests is a simple one-step process

TESTS!

Tests are super important. A solid and easy to use test harness helps build developer confidence in a new language.

Safety net

And tests are a crucial safety net for helping people cut their teeth on Puppet if they haven’t ever touched it before.

should contain_github__firewall_rule('internal_network')

should contain_ssmtp__relay_to('smtp').with_relay_host('smtp')

should contain_file('/etc/logstash/logstash.conf')

should include_class('github::ksplice')

should contain_networking__bond('bond0').with( :gateway => '172.22.0.2', :arp_ip_target => '172.22.0.2', :up_commands => nil )

rspec-puppet

We use rspec-puppet heavily. If you haven’t used rspec-puppet yet, go check it out right now.

It’s amazing.

There are no fewer than three talks about it at PuppetConf, so I’m not going to talk about HOW to use it today, just touch a little bit on how WE use it.

describe 'github::role::fe' do
  let(:title) { 'fe' }
  let(:node) { 'fe1.rs.github.com' }
  let(:params) { {
    :public_address => '207.97.227.242/27',
    :private_address => '172.22.1.59/22',
    :git_weight => '16'
  } }
  let(:facts) { {
    :ipaddress => '172.22.1.59',
    :operatingsystem => 'Debian',
    :datacenter => 'rackspace-iad2',
  } }

  it do
    should contain_github__core('fe')
    ...
  end
end

role specs are king

We try our best to adequately test our individual puppet modules, but our central and most frequently touched specs exercise our role system. There’s one spec for each role which describes its intended functionality.

These specs focus on critical functionality of each role, and help a great deal to build confidence that we’re not introducing regressions when adding or refactoring functionality or working in other roles.

$ git commit -am "lolbadchange"
modules/github/manifests/role/fe.pp:
err: Could not parse for environment production: Syntax error at 'allow_outbound_syslog'; expected '}' at /Users/jnewland/github/puppet/modules/github/manifests/role/fe.pp:31
modules/github/manifests/role/fe.pp - WARNING: => is not properly aligned on line 626

.git/hooks/pre-commit

For an even faster feedback loop than running specs, all Puppet dev environments are automatically set up with a pre-commit hook that checks for syntax errors and ensures your changes conform to the Puppet style guide.

This has proved amazingly useful for Puppet novices and experts alike. Novices find it helps them understand language conventions quickly and guides them towards solutions, and experts use it to catch typos and help them not look like novices.
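A minimal sketch of what such a pre-commit check could look like, written in Ruby to match the rest of the toolchain. This is not GitHub's hook: the function names and the injected runner are assumptions for illustration. The only external commands assumed are `puppet parser validate` for syntax and `puppet-lint` for style.

```ruby
# Sketch of a pre-commit check. The function names and the injected
# runner are illustrative, not taken from GitHub's hook.

# Keep only Puppet manifests from a list of staged paths.
def staged_manifests(paths)
  paths.select { |path| path.end_with?('.pp') }
end

# Commands to run for one manifest: a syntax check, then a style check.
def checks_for(path)
  [
    ['puppet', 'parser', 'validate', path],
    ['puppet-lint', path]
  ]
end

# Run every check and return the manifests that failed at least one.
# `runner` receives a command array and returns true on success, so a
# real hook could pass ->(cmd) { system(*cmd) }.
def failing_manifests(paths, runner)
  staged_manifests(paths).reject do |path|
    checks_for(path).all? { |cmd| runner.call(cmd) }
  end
end
```

In a real hook the list of paths would come from something like `git diff --cached --name-only`, and the hook would exit non-zero when `failing_manifests` returns anything. How each tool's exit status maps to pass or fail depends on the version and flags in use.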

specs run on each push

auto deploy on CI pass

rspec-puppet and puppet-lint are automatically run by CI on every commit on every branch pushed to our Puppet repo.

Once master passes CI, puppet is automatically deployed

As you can see, Hubot automates a lot of the process of rolling out Puppet

That example covered pushing changes to master, but what about a Pull-Request based workflow?

Say we have a pull request for a branch we want to merge, and that we’ve reviewed the code and it all looks good.

branches == environments

On each deploy, we turn all git branches into puppet environments.

This combined with heaven, our capistrano-powered deployment API we interact with via Hubot, enables us to experiment with unmerged Puppet branches in a powerful way
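As a rough sketch of the branch-to-environment step, assuming each branch name is used directly as an environment directory name: the root path and function names below are hypothetical, and the real deploy does more (bootstrapping, librarian-puppet, restarting the puppetmaster).

```ruby
# Sketch of "branches == environments". Names and paths are hypothetical.

# Map each branch to the directory its environment would live in.
def environment_dirs(branches, root)
  branches.each_with_object({}) do |branch, dirs|
    dirs[branch] = File.join(root, branch)
  end
end

# Environments that exist on disk but no longer have a branch behind
# them: the candidates for a "cleaning up deleted branches" step.
def stale_environments(existing, branches)
  existing - branches
end

p environment_dirs(['master', 'git-gh13'], '/etc/puppet/environments')
p stale_environments(['master', 'old-branch', 'git-gh13'], ['master', 'git-gh13'])
```

Keeping the mapping and the cleanup as pure functions makes them easy to test without touching a checkout or a puppetmaster.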

So, to safely merge this pull request...

hubot ci status puppet/git-gh13

deploy:apply puppet/git-gh13 staging/fs1

deploy:noop puppet/git-gh13 prod/fs1

# merge pull request

hubot deploy:apply puppet to prod/fs

graph me -1h @collectd.load(fs*)

log me hooks github/github

You might ask Hubot to confirm its build status

Build #108816 (5fe75932f26ea62cb5fc5e3d0cb302cc2461d11e) of puppet/git-gh13 was successful (421s) github/puppet@567ea48...5fe7593

Yup, looks good.

Then roll the branch out to a staging box to make sure everything applies cleanly there.

** [out :: REDACTED ] Bootstrapping...
** [out :: REDACTED ] Gem environment up-to-date.
** [out :: REDACTED ] Running librarian-puppet...
** [out :: REDACTED ] Generating puppet environments...
** [out :: REDACTED ] Cleaning up deleted branches...
** [out :: REDACTED ] Done!
** [out :: REDACTED ] Sending 'restart' command
** [out :: REDACTED ] The following watches were affected:
** [out :: REDACTED ] puppetmaster_unicorn
** [out :: fs1a.stg.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs1a.stg.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
...

Yup, looks good.

Then, if you wanted an extra layer of confidence, you could noop the branch against a production node

** [out :: REDACTED ] Bootstrapping...
** [out :: REDACTED ] Gem environment up-to-date.
** [out :: REDACTED ] Running librarian-puppet...
** [out :: REDACTED ] Generating puppet environments...
** [out :: REDACTED ] Cleaning up deleted branches...
** [out :: REDACTED ] Done!
** [out :: REDACTED ] Sending 'restart' command
** [out :: REDACTED ] The following watches were affected:
** [out :: REDACTED ] puppetmaster_unicorn
** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs1a.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: would have changed from '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
...

Yup, looks good

Next, you’d merge the pull request. If you stopped here, the code would gradually roll out to all affected nodes over the next hour.

If you wanted the rollout to happen faster than that, you could force a puppet run on the affected class of nodes

** [out :: REDACTED ] Bootstrapping...
** [out :: REDACTED ] Gem environment up-to-date.
** [out :: REDACTED ] Running librarian-puppet...
** [out :: REDACTED ] Generating puppet environments...
** [out :: REDACTED ] Cleaning up deleted branches...
** [out :: REDACTED ] Done!
** [out :: REDACTED ] Sending 'restart' command
** [out :: REDACTED ] The following watches were affected:
** [out :: REDACTED ] puppetmaster_unicorn
** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs7b.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs1a.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
** [out :: fs7b.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
...

Yup, that looks good.

Then you’d probably want to check out load to make sure nothing went crazy

Yup, looks good

...and maybe check some logs or other related metrics to confirm your change didn’t break something

Yup, looks good

ChatOps

How we interact with Puppet via Hubot is a great example of a core principle of how we do ops at GitHub. We’ve been calling it ChatOps recently.

Essentially, ChatOps is the result of Hubot becoming sentient, and decreeing, among other things, that we now address him as “Supreme Leader” and communicate with our infrastructure through his secure channels alone.

We occasionally observe him speaking in tongues that sound eerily like YouTube comments.

Hubot

Actually, that’s not it at all. Hubot is the star of our Ops team.

Hubot: heaven, janky, shell, graphme

We use Hubot day in, day out to interact with other simple tools we’ve written over JSON APIs.

ALL OF THE APIS

Hubot interacts nicely with tons of external APIs too. If you have a JSON API, making your service work with Hubot is a piece of cake.
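To illustrate why a JSON API is enough, here is a minimal sketch that turns the deploy commands shown in this talk into a JSON body a chat script might send to a deployment service. Everything about the payload is an assumption: the field names, the default branch of master, and the command grammar are mine, not heaven's actual API. Real Hubot scripts are written in CoffeeScript or JavaScript; Ruby is used here only to match the rest of the examples.

```ruby
require 'json'

# Hypothetical grammar covering the deploy commands shown in this talk:
#   deploy:apply puppet/git-gh13 staging/fs1
#   hubot deploy:apply puppet to prod/fs
#   hubot deploy github/smoke-perf to prod/fe1
DEPLOY_COMMAND = %r{\A(?:hubot\s+)?deploy(?::(\w+))?\s+([\w\-]+)(?:/([\w\-]+))?\s+(?:to\s+)?([\w\-]+)/([\w\-]+)\z}

# Parse a chat message into its parts, or return nil if it is not a
# deploy command. The hash keys are illustrative field names.
def parse_deploy(message)
  match = DEPLOY_COMMAND.match(message.strip)
  return nil unless match

  task, app, branch, env, hosts = match.captures
  {
    'task' => task || 'deploy',
    'app' => app,
    'branch' => branch || 'master', # assumed default
    'environment' => env,
    'hosts' => hosts
  }
end

# The JSON body a chat script might POST to a deployment API.
def deploy_payload(message)
  parsed = parse_deploy(message)
  parsed && JSON.generate(parsed)
end

puts deploy_payload('deploy:apply puppet/git-gh13 staging/fs1')
```

The chat side stays a thin parser, and all the real work lives behind the API, which is what makes adding a new command cheap.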

Why is this stupid chat bot so important to Ops?

But why do we obsess about Hubot so much? It’s just a chat bot, right?

There are some distinct upsides to this approach we’ve noticed as our use of Hubot in Ops has grown.

Remember the flow I just showed you for rolling out puppet changes to our infrastructure?

Everyone sees all of that happen on their first day

Everyone sees all of this happen from the minute they join GitHub. It’s right there, in the Ops room, right in the middle of the conversation in Campfire.

You don’t just see how to roll out puppet, you see how to...

hubot ci status github/smoke-perf

check the status of a branch’s last build

hubot deploy github/smoke-perf to prod/fe1

deploy any branch of any github app to any server

hubot graph me -10min @app-perf

get graphs of the app’s recent performance

hubot procs unicorn

check the status of unicorns across all frontends

hubot resque critical

check the status of the resque critical queue

hubot graph me -10min @collectd.load(fe*)

check load on the frontends

hubot conns fe1

check current connections to a frontend that you suspect has a problem

hubot log me smoke fe1

grab smoke logs for that frontend and realize that you did, in fact, break it

hubot lbctl disable fe1

take it out of the load balancer

hubot status yellow Bad deploy. Reverting now.

update the status blog

hubot who’s on call

determine who is currently on call so you can apologize to them

hubot pingdom checks

check pingdom to make sure you haven’t broken everything

hubot upset me

chill yourself out really quick

hubot deploy github to prod/fe1

revert back to master on the busted frontend

hubot log me smoke fe1

verify things have returned to normal

hubot air drum me

get pumped up because you fixed it

hubot lbctl enable fe1

bring the fixed frontend back into the rotation

hubot status green All systems go.

clear alerts on the status page

hubot whois 4.9.23.22

Once the outage has been resolved, you might see how to grab whois information for an IP that exhibited suspicious activity in the logs you saw

hubot khanify spammers

and how to hit meme generator to make a joke when you realize that IP is a spammer

hubot play in the air tonight

then someone would queue up the song that popped into their head when they thought about drums and gorillas at the same time

hubot tweet@github PuppetConf Drinkup Friday night at 8:30 at Zeke’s (3rd & Brannan)

and then finish it all off with a tweet about the Drinkup we’re throwing Friday night

ChatOps

ChatOps means building tools that make it easier to operate your infrastructure via Hubot than via Terminal or Chrome

By placing tools directly in the middle of the conversation

Because...

Everyone is pairing all of the time

This is the core concept behind ChatOps.

Teaching by doing

Teaching by doing is awesome

This was always my main motivation with hubot - teaching by doing by making things visible. It's an extremely powerful teaching technique - @rtomayko

Ryan Tomayko had this in mind from the very first commits to hubot, which just presented a simple wrapper around a repository of shell scripts we use for managing and monitoring our infrastructure.

This is how I respond to “how do I do X” questions in Campfire now.

If there’s not yet Hubot functionality to do a thing, we try to write it.

Communicate by doing

Placing tools in the middle of the conversation also means you get communication of your work for free.

If you’re doing something in a shell or on a website, you have to do it, then tell people about it. If you do it with hubot, that comes free.

THINGS I HAVEN’T ASKED RECENTLY

For example, here are a few things I haven’t asked recently because Hubot has told me the answer:

how’s that deploy going?

are you deploying that or should i?

is anyone responding to that nagios alert?

is that branch green?

how does load look?

did that deploy finish?

did anyone update the status page?

Free communication is especially crucial in a distributed environment.

Our Ops team is entirely remote, so Campfire is our default means of communication.

http://www.flickr.com/photos/7997249@N06/6061305639/

This is extremely helpful during outages or other situations that require tactical response.

You don’t have to SAY that you’re spraying water on the fire, people SEE you doing it.

Hide the ugly

Another awesome benefit of ChatOps-ing all of the things is that you can hide ugly interfaces and design exactly the interaction you want with some simple porcelain commands.

My favorite example of this is the ugliest of the ugly, Nagios.

[nines] hubot opened issue #4263: Nagios (229906) - fs3b/syslog - Tue Sept 25 23:40:18 PDT 2012. github/nines#4263

Hubot politely delivers nagios alerts directly into chat

hubot nagios ack fs3b/syslog

# fix stuff

nagios check fs3b/syslog

nagios status fs3b/syslog

hubot nagios downtime fs3b/syslog 90

nagios mute fs3b/syslog

nagios unmute fs3b/syslog

Which we can interact with without any unnecessary eye bleeding. Making this easy means developers and other ops engineers actually mute or schedule downtime when they’re testing things.

Mobile FTW

Yet another awesome benefit of ChatOps is that you get mobile support for free

Well, that is, if you have a team of awesome iOS developers that have built an actually functioning Campfire client for the iPhone

This lets you do anything hubot can do from your phone.

Which means from your couch. Or your bed. Or a beach in Hawaii.

Which means you can fix a lot of things without pulling your laptop out of your bag.

ChatOps

That’s ChatOps at its finest.

And now for something completely different

While I’m showing off mobile stuff, I thought I’d slip in a demo of something else we’ve done to make Ops more mobile friendly.

We’ve hacked together support for PagerDuty alerts via Apple Push Notifications. When you swipe on the alert, you go directly to the PagerDuty mobile UI for an incident

Which lets you ack an alert

while you’re still in bed

or on the couch.

Boom

I can’t even begin to tell you how happy this makes me, and how much less shitty it makes being on-call.

So, who better to summarize all of this than Hubot himself? I asked him what he thought about ChatOps. Here’s what he said:

ChatOps all the things.

Listen to what Hubot said. You’ll love it. Your ops team will love it.

And you’ll help other developers learn how to interact with ops tools without any additional work.

That’s awesome.

Work at GitHub
jesse@github.com

If you can’t ChatOps all the things at your gig now, you could always just come work with me at GitHub.

Shoot me an email if you’re interested.

Thanks!

That’s all I have. Thanks for listening! Any questions?

Tomorrow @ 8:30 PM, Zeke’s, 3rd & Brannan

While I still have everyone’s attention, I wanted to mention the GitHub Drinkup we’re throwing for PuppetConf again. It’s tomorrow night at 8:30pm at Zeke’s, which is on the corner of 3rd and Brannan. Everyone’s invited. I’ll see you there.

Thanks again!
