Puppet Camp Boston 2014: Orchestrating Infrastructure Change Using Puppet Rake, mcollective, LM and...

Post on 14-May-2015


DESCRIPTION

Orchestrating Infrastructure Change Using Puppet Rake, mcollective, LM and Jenkins presented by Anton Gurov and Chaminda Delpagodage, Paydiant at Puppet Camp Boston 2014

TRANSCRIPT

Application Deployment Orchestration with Puppet and Jenkins
Anton Gurov, Chaminda Delpagodage

August 20, 2014

About Us

Chaminda Delpagodage
Paydiant Technical Operations Team
Release Engineering, Systems Administration, Automation
linkedin.com/in/chamindad

Anton Gurov
Paydiant Technical Operations Team
Infrastructure, Systems Administration, Security
linkedin.com/in/antongurov

Cloud-based mobile wallet solution

Open ecosystem for mobile payments, offers and loyalty

Completely white-label

“Bank grade” platform of shared services
↘ SaaS
↘ Secure SDKs for iPhone and Android

Top tier investors and well capitalized

Paydiant Puppet Use

Puppet Enterprise (PE) users since day one

100% PE coverage of Paydiant platform
↘ PE handles everything after instance bootstrap

Multiple environments actively managed by PE
↘ 4 Puppet Masters in multiple datacenters and security zones
↘ 8 Environments

Licensed node count doubling every year

[Chart: nodes under management by year, 2011, 2012, 2013 and 2014 (2014 estimated by year-end); vertical axis: hosts, 0 to 900]

Paydiant Puppet Use

‘11-12 – Bi-annual production platform releases
↘ Waterfall – major platform change
↘ Big outage – 1-2 days on the weekend

‘13-14 – Transition to daily/weekly non-production and monthly production releases
↘ Agile – smaller platform changes
↘ Zero-downtime deployment
↘ 100% Production release success rate since inception

Heavy usage of Puppet Dashboard, Puppet APIs and Jenkins

Puppet Dashboard as data repository

Why Dashboard?
↘ Visual, flexible, powerful (if used right)
↘ Allows for business data edits by teams unfamiliar with Puppet
↘ Hiera not available at the time

Decided early on to keep Puppet code and data separate

Came up with our own Dashboard pattern – “Classes, Parameters and Supergroups”

[Diagram labels: Puppet Module (code); Puppet Dashboard (business data); parameters passed to the Puppet Module]

Puppet Dashboard as data repository
Classes, Parameters and Supergroups pattern overview

[Diagram: groups layer with supergroup_type_A, class_A, class_B, class_C and parameters_X, parameters_Y, parameters_Z…; nodes layer with node 1, node 2, node 3, node 4, … node X]

Puppet Dashboard as data repository
Classes, Parameters and Supergroups pattern overview

[Diagram: groups layer with supergroup_type_B, class_A, class_B, class_C and parameters_X, parameters_Y, parameters_Z…; nodes layer with node 1, node 2, node 3, node 4, … node X]

Puppet Dashboard as data repository
Class building block

Group name prefixed with class_
Contains Puppet class and some default variables/parameters for the class

[Diagram: groups class_A, class_B and class_C; each group defines default params (def) and includes the matching Puppet class (incl: class A, class B, class C)]

Puppet Dashboard as data repository
Class building block - example

Puppet Dashboard as data repository
Parameters building block

Group name prefixed with parameters_

Only contains data and data overrides

Arbitrary hierarchy levels

Allows for inheritance and reuse

[Diagram: parameters_X defines default params; parameters_X_1 and parameters_X_2 each include parameters_X and define params overrides and additional params; consumers shown are supergroup_A, supergroup_B and supergroup_C]
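The inheritance described on this slide can be sketched in a few lines of Python. This is only an illustration, not how Dashboard is implemented: the parameter names and values (db_host, pool_size, timeout) are hypothetical, and it assumes a child group's own values win over the ones it inherits, as the "params overrides" label suggests.

```python
# Hypothetical model of Dashboard parameter groups. "incl" lists the parent
# groups a group includes; "def" holds the parameters the group defines itself.
# All parameter names and values below are made up for illustration.
GROUPS = {
    "parameters_X": {"incl": [], "def": {"db_host": "SET ME!", "pool_size": "10"}},
    "parameters_X_1": {
        "incl": ["parameters_X"],
        "def": {"db_host": "db1.example", "timeout": "30"},
    },
    "parameters_X_2": {"incl": ["parameters_X"], "def": {"pool_size": "50"}},
}


def resolve(group_name, groups=GROUPS):
    """Return the effective parameters of a group: inherited values first,
    then the group's own values on top (assumed precedence)."""
    group = groups[group_name]
    merged = {}
    for parent in group["incl"]:
        merged.update(resolve(parent, groups))
    merged.update(group["def"])  # own values override inherited ones
    return merged
```

Calling resolve("parameters_X_1") returns the parent's pool_size together with the overridden db_host and the additional timeout, which is the reuse the slide is after.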

Puppet Dashboard as data repository
Parameters building block – inheritance example

Puppet Dashboard as data repository
Supergroup building block == server “role”

Group name prefixed with supergroup_

Contains all the “ingredients” for the node to configure and define itself

Node can belong to only one supergroup (many-to-one)

[Diagram: supergroup_type_A includes class_A, class_B, parameters_X and parameters_Z, and defines params overrides (if any) and additional params (if any); node 1 and node 2 belong to supergroup_type_A]

Puppet Dashboard as data repository
Supergroup building block - example

2-3 pages condensed

Classes, Parameters and Supergroups pattern
Pros

All parameters and classes are visible on the Supergroup page
↘ See missing parameters (if inherited “SET ME!” from parent for example)
↘ See parameter clashes (Dashboard will warn if parameter is defined in 2 places)
↘ See exactly where parameter is defined

Allows teams unfamiliar with Puppet to make changes via Dashboard

Arbitrary data hierarchy/inheritance

Data reuse
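The two checks listed under "Pros" can be mimicked offline. The sketch below is not Dashboard code: check_supergroup is a hypothetical helper, the group contents are invented, and a "clash" is taken literally from the slide as a parameter defined by more than one included group.

```python
PLACEHOLDER = "SET ME!"  # marker value used in the deck for unset parameters


def check_supergroup(included):
    """Inspect the groups a supergroup includes.

    `included` maps group name -> {parameter: value}. Returns a tuple
    (missing, clashes): parameters still left at the placeholder, and
    parameters defined in more than one group with the defining groups.
    """
    seen = {}  # parameter -> list of (group, value) pairs
    for group, params in included.items():
        for name, value in params.items():
            seen.setdefault(name, []).append((group, value))
    missing = sorted(
        name for name, defs in seen.items()
        if all(value == PLACEHOLDER for _, value in defs)
    )
    clashes = {
        name: sorted(group for group, _ in defs)
        for name, defs in seen.items()
        if len(defs) > 1
    }
    return missing, clashes
```

Because every parameter is tracked together with the group that defines it, the same structure also answers "where exactly is this parameter defined".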

Classes, Parameters and Supergroups pattern
Cons

Version control is difficult
↘ Have to resort to group cloning/export/import (custom Rake copy/clone command from Puppet support)
↘ Puppet roadmap to fix this

Dashboard UI could use some help
↘ Too much data on the screen sometimes
↘ Lack of sorting/grouping

Can’t store complex multi-line variables like text blobs

Zero-Downtime Deployment architecture …

High-level platform representation

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.1). Backend Load Balancer: BE-A (v.1), BE-B (v.1). Label: v.1]

parameters_deployment-staging-FE-BankA
paydiant_deployment_bank=STAGING-FRONTEND-A
paydiant_app_operation_mode=LIVE
paydiant_app_version=1

parameters_deployment-staging-BE-BankA
paydiant_deployment_bank=STAGING-BACKEND-A
paydiant_app_operation_mode=LIVE
paydiant_app_version=1

parameters_deployment-staging-FE-BankB
paydiant_deployment_bank=STAGING-FRONTEND-B
paydiant_app_operation_mode=LIVE
paydiant_app_version=1

parameters_deployment-staging-BE-BankB
paydiant_deployment_bank=STAGING-BACKEND-B
paydiant_app_operation_mode=LIVE
paydiant_app_version=1

Disable B (FE+BE)

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.1). Backend Load Balancer: BE-A (v.1), BE-B (v.1). Labels: v.1, v.1]

parameters_deployment-staging-FE-BankB
paydiant_deployment_bank=STAGING-FRONTEND-B
paydiant_app_operation_mode=MAINTENANCE
paydiant_app_version=1

parameters_deployment-staging-BE-BankB
paydiant_deployment_bank=STAGING-BACKEND-B
paydiant_app_operation_mode=MAINTENANCE
paydiant_app_version=1

Run first phase of database changes (i.e. add new stuff & migrate data)

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.1). Backend Load Balancer: BE-A (v.1), BE-B (v.1). Label: v.2a, DB changes Phase 1]

Upgrade B (FE+BE)

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.2). Backend Load Balancer: BE-A (v.1), BE-B (v.2). Labels: v.2a, v.2a]

parameters_deployment-staging-FE-BankB
paydiant_deployment_bank=STAGING-FRONTEND-B
paydiant_app_operation_mode=MAINTENANCE
paydiant_app_version=2

parameters_deployment-staging-BE-BankB
paydiant_deployment_bank=STAGING-BACKEND-B
paydiant_app_operation_mode=MAINTENANCE
paydiant_app_version=2

Re-enable B (FE+BE)

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.2). Backend Load Balancer: BE-A (v.1), BE-B (v.2). Labels: v.2a, v.2a]

parameters_deployment-staging-FE-BankB
paydiant_deployment_bank=STAGING-FRONTEND-B
paydiant_app_operation_mode=LIVE
paydiant_app_version=2

parameters_deployment-staging-BE-BankB
paydiant_deployment_bank=STAGING-BACKEND-B
paydiant_app_operation_mode=LIVE
paydiant_app_version=2

Disable A (FE+BE)

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.2). Backend Load Balancer: BE-A (v.1), BE-B (v.2). Labels: v.2a, v.2a]

parameters_deployment-staging-FE-BankA
paydiant_deployment_bank=STAGING-FRONTEND-A
paydiant_app_operation_mode=MAINTENANCE
paydiant_app_version=1

parameters_deployment-staging-BE-BankA
paydiant_deployment_bank=STAGING-BACKEND-A
paydiant_app_operation_mode=MAINTENANCE
paydiant_app_version=1

Upgrade A (FE+BE)

[Diagram: Frontend Load Balancer: FE-A (v.2), FE-B (v.2). Backend Load Balancer: BE-A (v.2), BE-B (v.2). Labels: v.2a, v.2a]

parameters_deployment-staging-FE-BankA
paydiant_deployment_bank=STAGING-FRONTEND-A
paydiant_app_operation_mode=MAINTENANCE
paydiant_app_version=2

parameters_deployment-staging-BE-BankA
paydiant_deployment_bank=STAGING-BACKEND-A
paydiant_app_operation_mode=MAINTENANCE
paydiant_app_version=2

Re-enable A (FE+BE)

[Diagram: Frontend Load Balancer: FE-A (v.2), FE-B (v.2). Backend Load Balancer: BE-A (v.2), BE-B (v.2). Labels: v.2a, v.2a]

parameters_deployment-staging-FE-BankA
paydiant_deployment_bank=STAGING-FRONTEND-A
paydiant_app_operation_mode=LIVE
paydiant_app_version=2

parameters_deployment-staging-BE-BankA
paydiant_deployment_bank=STAGING-BACKEND-A
paydiant_app_operation_mode=LIVE
paydiant_app_version=2

Run second phase of database changes (cleanup of old v.1 data)

[Diagram: Frontend Load Balancer: FE-A (v.2), FE-B (v.2). Backend Load Balancer: BE-A (v.2), BE-B (v.2). Label: v.2, DB changes Phase 2]
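The slide sequence above amounts to a rolling upgrade over two banks with a database phase before and after the application upgrades. As a rough sketch of that ordering only, the Python below replays it against an in-memory copy of the parameter groups. It simplifies by tracking one entry per bank instead of separate FE and BE groups, and rolling_upgrade is a hypothetical helper, not part of the presenters' tooling.

```python
def rolling_upgrade(state, new_version, order=("B", "A")):
    """Replay the deck's upgrade order and return [(step label, snapshot), ...].

    `state` maps a bank name to its parameter dict and is updated in place.
    The first database phase runs while the first bank is in maintenance;
    the second runs after every bank is back in LIVE mode.
    """
    steps = []

    def record(label):
        # Copy the parameters so later changes do not alter earlier snapshots.
        steps.append((label, {bank: dict(params) for bank, params in state.items()}))

    for index, bank in enumerate(order):
        state[bank]["paydiant_app_operation_mode"] = "MAINTENANCE"
        record(f"Disable {bank}")
        if index == 0:
            record("DB changes Phase 1")
        state[bank]["paydiant_app_version"] = new_version
        record(f"Upgrade {bank}")
        state[bank]["paydiant_app_operation_mode"] = "LIVE"
        record(f"Re-enable {bank}")
    record("DB changes Phase 2")
    return steps
```

Walking the snapshots shows that at every step at least one bank stays in LIVE mode, which is the property that makes the deployment zero-downtime.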

Details of the upgrade sequence …

Putting a set of nodes into maintenance mode

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.1). Backend Load Balancer: BE-A (v.1), BE-B (v.1). Label: v.1]

Putting nodes into maintenance mode
Using LB node health check – http://nodeX:8080/healthcheck.jsp

Puppet ERB template for healthcheck.jsp content

………

Pseudo code:
  If “maintenance mode”, throw exception
  Else
    If “module A” present, check if module A is up
    If “module B” present, check if module B is up
    …
  Throw 503 if any exception caught
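The pseudo code above can be restated as a small Python function. This is a sketch of the decision logic only: the real page is a JSP rendered from an ERB template, and health_status along with the shape of module_checks are assumptions made for illustration.

```python
def health_status(operation_mode, module_checks):
    """Return the HTTP status the load balancer would see for one node.

    `operation_mode` is the node's paydiant_app_operation_mode value.
    `module_checks` maps a module name to a zero-argument callable that
    raises an exception when that module is down (hypothetical interface).
    """
    try:
        if operation_mode == "MAINTENANCE":
            raise RuntimeError("node is in maintenance mode")
        for check in module_checks.values():
            check()  # only modules present on this node are listed
    except Exception:
        return 503  # any failure takes the node out of the LB pool
    return 200
```

Treating maintenance mode as just another failing check means the load balancer needs no special case: a 503 drains the node whether it is broken or deliberately parked.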

Putting nodes into maintenance mode cont.

A parameter group controls the maintenance mode

E.g. Parameter group “parameters_deployment-staging-BankB” controls “paydiant_app_operation_mode” for the nodes in set FE-B of the Staging environment


Putting nodes into maintenance mode cont.

Update group parameter using Rake API (as ‘puppet-dashboard’ user)

RACK_ENV=production /opt/puppet/bin/rake -s -X -f /opt/puppet/share/puppet-dashboard/Rakefile nodegroup:variables [parameters_deployment-staging-BankB, 'paydiant_app_operation_mode=MAINTENANCE']

Puppet run-once using MCO (as ‘peadmin’ user)

mco puppet runonce --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B

While loop… check the health check page till all nodes return 503 (i.e. in maintenance) status

mco shellcmd --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B --cmd=\''curl --silent http://localhost:8080/healthcheck/healthcheck.jsp
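The "while loop" step can be sketched as a bounded polling loop. The version below is an illustration under stated assumptions: wait_for_status is a hypothetical helper, and the probe argument stands in for whatever fetches a node's health check status (in the deck, curl run through mco shellcmd), so the loop itself makes no network calls.

```python
import time


def wait_for_status(nodes, probe, expected, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until every node reports `expected`.

    `probe(node)` must return the node's current HTTP status code.
    Returns True once all nodes match, or False after `attempts` rounds,
    so the calling job can fail instead of hanging forever.
    """
    for attempt in range(attempts):
        pending = [node for node in nodes if probe(node) != expected]
        if not pending:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the Puppet run time to take effect
    return False
```

The same function covers both directions: wait for 503 after switching a bank to MAINTENANCE, and wait for 200 after switching it back to LIVE.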

Upgrading applications on a set of nodes

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.2). Backend Load Balancer: BE-A (v.1), BE-B (v.2). Label: v.2a]

Upgrading Application Version

Disable Puppet agent

mco puppet disable --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B

Stop Tomcat service

mco service tomcat stop --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B

Cleanup exploded Tomcat webapps directory (for sanity)

mco shellcmd --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B --cmd='find $TOMCAT_HOME/webapps/ -maxdepth 1 -mindepth 1 -type d -exec rm -rf {} \;'

Upgrading Application Version Cont.

Upgrade the application version

RACK_ENV=production /opt/puppet/bin/rake -s -X -f /opt/puppet/share/puppet-dashboard/Rakefile nodegroup:variables [parameters_deployment-staging-BankB, 'paydiant_app_version=2']

Re-enable Puppet

mco puppet enable --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B

Puppet run-once

mco puppet runonce --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B
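The six upgrade steps are order-sensitive, so it can help to generate them from one place. The sketch below only assembles the command strings shown on the slides and pairs each with the account the deck says runs it. upgrade_commands is a hypothetical helper, and nothing is executed.

```python
# Command prefix copied from the slides; the rake task is the custom one
# the presenters obtained from Puppet support.
RAKE = ("RACK_ENV=production /opt/puppet/bin/rake -s -X "
        "-f /opt/puppet/share/puppet-dashboard/Rakefile")
CLEAN = "find $TOMCAT_HOME/webapps/ -maxdepth 1 -mindepth 1 -type d -exec rm -rf {} \\;"


def upgrade_commands(bank_fact, param_group, version):
    """Return the upgrade steps as ordered (user, command) pairs."""
    selector = f"--with-fact fact_paydiant_deployment_bank={bank_fact}"
    return [
        # Stop Puppet from reverting changes while the app is down.
        ("peadmin", f"mco puppet disable {selector}"),
        ("peadmin", f"mco service tomcat stop {selector}"),
        ("peadmin", f"mco shellcmd {selector} --cmd='{CLEAN}'"),
        # Change the desired version in Dashboard, then let Puppet apply it.
        ("puppet-dashboard",
         f"{RAKE} nodegroup:variables [{param_group}, 'paydiant_app_version={version}']"),
        ("peadmin", f"mco puppet enable {selector}"),
        ("peadmin", f"mco puppet runonce {selector}"),
    ]
```

Keeping the agent disabled until the version parameter has changed avoids a Puppet run restarting Tomcat on the old version halfway through.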

Taking a set of nodes out of maintenance mode

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.2). Backend Load Balancer: BE-A (v.1), BE-B (v.2). Label: v.2a]

Taking nodes out of maintenance mode

Update parameter using Rake API (as ‘puppet-dashboard’ user)

RACK_ENV=production /opt/puppet/bin/rake -s -X -f /opt/puppet/share/puppet-dashboard/Rakefile nodegroup:variables [parameters_deployment-staging-BankB, 'paydiant_app_operation_mode=LIVE']

Puppet run-once using MCO (as ‘peadmin’ user)

mco puppet runonce --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B

While loop… check the health check page till all nodes return 200 (i.e. live) status

mco shellcmd --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B --cmd=\''curl --silent http://localhost:8080/healthcheck/healthcheck.jsp

Switching traffic to upgraded stack

[Diagram: Frontend Load Balancer: FE-A (v.1), FE-B (v.2). Backend Load Balancer: BE-A (v.1), BE-B (v.2). Label: v.2a]

Viewing transition in Splunk across multiple datacenters

Version Transition

Jenkins …


What is Jenkins

Tool to schedule and monitor the execution of repeated jobs


Why Jenkins ?

Configurability
↘ Different types of input parameters
↘ Invoke shell scripts
↘ Post-build actions (automatic/manual)

Why Jenkins ? cont.

Plugin support
↘ More than 600 plugins (https://wiki.jenkins-ci.org/display/JENKINS/Plugins)
↘ E.g. vSphere plugin (stop/start, snapshots, rollbacks…)
↘ Build pipeline plugin
↘ Parameterized remote trigger plugin

Why Jenkins ? cont.

Keeps all your console logs in a single place
↘ No need to hunt for 10 log files on 5 different machines
↘ Visual representation of passed/failed/in-progress status, based on downstream shell scripts or other jobs

Why Jenkins ? cont.

And it’s…

[Diagram labels: MCO; Rake API; DB, FE-*, BE-*; source code, Liquibase change sets]

Jenkins – Puppet Integration


Jenkins invokes local bash scripts, which in turn use SSH to call:
↘ MCO (as ‘peadmin’ user on Puppet Master)
↘ Rake API (as ‘puppet-dashboard’ user on Puppet Master)

SSH login as ‘peadmin’ and ‘puppet-dashboard’ is password-less, using PKI
↘ Generate RSA keypair for the local Jenkins user, using ssh-keygen command
↘ Append the public key to ~/.ssh/authorized_keys file of ‘peadmin’ and ‘puppet-dashboard’ users, on Puppet Master

MCO special-purpose subcommands we use:
↘ puppet
↘ service
↘ shellcmd* (ask your Puppet Enterprise Support for this custom MCO plugin)
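The SSH hop described above can be sketched as follows. The presenters used bash scripts; Python is used here only for illustration. The host name puppetmaster.example.com and the helpers ssh_argv and run_remote are assumptions, while BatchMode=yes is a standard OpenSSH option that makes the call fail rather than prompt when key-based login is not set up.

```python
import subprocess

PUPPET_MASTER = "puppetmaster.example.com"  # hypothetical host name


def ssh_argv(user, remote_command, host=PUPPET_MASTER):
    """Build the argument list for running one command on the Puppet Master."""
    # BatchMode=yes: never prompt for a password; an unattended Jenkins job
    # should fail fast if the key pair is missing.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]


def run_remote(user, remote_command, host=PUPPET_MASTER):
    """Run the command and return (exit code, combined output) so the
    Jenkins console log captures everything in one place."""
    done = subprocess.run(
        ssh_argv(user, remote_command, host),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return done.returncode, done.stdout
```

For example, run_remote("peadmin", "mco puppet runonce --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B") would issue the run-once step, and a non-zero exit code can be used to fail the Jenkins job.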


Recap/Takeaways…

Use Puppet Enterprise
↘ Support is awesome (Celia Cottle, Jay Wallace, Ken Johnson, Zachary Stern – you guys rock!)
↘ Got help and features from James Turnbull and Nigel Kersten with some early versions of PE
↘ Live Management and MCollective are essential for any self-respecting enterprise

Zero-downtime upgrades
↘ To Dashboard or not to Dashboard?
↘ Database update phases
↘ Managing LB health check monitors dynamically using Puppet

Automation baby steps – don’t boil the ocean
↘ Understand what you are doing before automating it - develop runbooks
↘ Identify manual steps and script some of them
↘ Add scripts to orchestration tool (Jenkins, ServiceNow, whatever else you use in-house)

Thank you.
