Continuous Delivery with Netflix OSS Dan Woods

Upload: daniel-woods

Post on 16-Jul-2015


Page 1: Continuous Delivery with NetflixOSS

Continuous Delivery with Netflix OSS

Dan Woods

Page 2: Continuous Delivery with NetflixOSS

/danveloper

[email protected]

Senior Software Engineer: Delivery Engineering

Learning Ratpack

Page 3: Continuous Delivery with NetflixOSS

Overview of Netflix OSS

• Netflix encourages talking to the world about how we’re solving problems
• We solve a ton of problems that companies both small and large are faced with
• We shoot to open source as much as possible

Page 4: Continuous Delivery with NetflixOSS

Overview of Netflix OSS

• Netflix is a large consumer of cloud offerings, mostly from AWS
• We’ve done a ton of work over the years to lift the infrastructure entirely to the cloud
• Pioneered running at scale on AWS

Page 5: Continuous Delivery with NetflixOSS

Overview of Netflix OSS

• Developed a massive tool suite to operationalize running in the cloud at scale
• Teams need to be able to quickly get code running in the cloud
• Teams need to be able to quickly see metrics and performance

Page 6: Continuous Delivery with NetflixOSS

Overview of Netflix OSS

Links:
http://techblog.netflix.com/
http://github.com/netflix
http://netflix.github.io

Page 7: Continuous Delivery with NetflixOSS

Continuous Delivery

Big Picture:

What Does Continuous Delivery Mean At Netflix?

Page 8: Continuous Delivery with NetflixOSS

Continuous Delivery

Big Picture:
• Immutable Infrastructure
• Tooling the Build System
• Ongoing and Continuous Deployment

Page 9: Continuous Delivery with NetflixOSS

Immutable Infrastructure

• Designing a server to become your unit of deployment
• “Bake” the software into a “pre-cooked” (known-good configuration) image
• Allows you to test and certify a server image for distribution
• Walk that server through the phases of test, QA, and finally prod

Page 10: Continuous Delivery with NetflixOSS

Immutable Infrastructure

• Builds must be designed in a way that produces an OS package
• This allows the build to control the manner in which the server image will be created
• Specify OS-level dependencies (Java, Python, etc.)
• Get all the benefits of a version-controlled configuration

Page 11: Continuous Delivery with NetflixOSS

Tooling the Build System

• Hundreds, sometimes thousands, of builds run every day at Netflix
• Builds need to fit into a somewhat conformant structure to garner the support of the tooling
• A polyglot stack adds a ton of complexity to designing the tooling for the build system
• Teams are free to use whatever language, framework, or stack they want, and we need to do our best to have a handle on the permutations

Page 12: Continuous Delivery with NetflixOSS

Tooling the Build System

• The JVM is the predominant code platform at Netflix
• Many different languages on the JVM, including: JavaScript, Scala, Groovy, Clojure, Ruby, Python
• The “runner up” runtime is Node.js
• Lots of new JavaScript stuff is starting to come out, and we are starting to design scalable tooling around JS

Page 13: Continuous Delivery with NetflixOSS

Tooling the Build System

• Netflix has adopted Gradle as its build platform
• Gradle is a JVM-based build system that is capable of building JVM and non-JVM projects
• Support for dynamically and programmatically designing builds (loads of flexibility)
• Great open source community, tons of support from Gradleware

Page 14: Continuous Delivery with NetflixOSS

Tooling the Build System

• Can build plugins for Gradle in Groovy (ahh soo nice :-))
• Plugins are designed to make it appealing for teams to conform to the tooling infrastructure
• Custom internal Gradle wrapper applies common conventions, along with hacks that would otherwise be unmanageable at scale
• The goal of all this is to make teams want to use the build tooling, so that we can operationalize and manage it at scale

Page 15: Continuous Delivery with NetflixOSS

Continuous Deployment

• Continuous Delivery at Netflix speaks to more than just staging code for deployment
• The Continuous Delivery story is a follow-through, from source to production
• Continuous Deployment is an integral part of that process (it means the code is running in the cloud!)
• Hands down, this is the trickiest and most fragile part of the whole process…

Page 16: Continuous Delivery with NetflixOSS

Continuous Deployment

• By this point in the workflow, the code has already been built and baked…
• We have an immutable server image, and we’re ready to ship it off to the cloud…
• The complexity is here: “ship it off to the cloud” is an inherently asynchronous process…
• There are many failure points.

Page 17: Continuous Delivery with NetflixOSS

Continuous Deployment

What constitutes a successful deployment?
• Every application has a different definition of “success”
• Need to provide tooling so that the process is able to identify the vectors of success

Page 18: Continuous Delivery with NetflixOSS

Continuous Deployment

What constitutes a successful deployment?
• Amazon telling us the server has deployed is basically the equivalent of them saying they pressed the power button
• Need to consider a successful deployment in terms of “this server is ready to start taking traffic”

Page 19: Continuous Delivery with NetflixOSS

Continuous Deployment

What constitutes a successful deployment?
• “Ready to start taking traffic” means different things to different applications:
  • Tomcat has started, and the app is listening?
  • Tomcat has started, app is listening, caches are primed?
  • Tomcat has started, app is listening, and the server group is in some designated traffic pool (canary)?
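One way to picture per-application definitions of "ready to take traffic" is a set of named checks that each application opts into. The sketch below is only an illustration: `InstanceState`, the check names, and `is_ready` are hypothetical and do not come from any NetflixOSS tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InstanceState:
    """Hypothetical snapshot of what the tooling knows about one instance."""
    container_started: bool  # e.g. Tomcat has started
    app_listening: bool
    caches_primed: bool
    in_canary_pool: bool


Check = Callable[[InstanceState], bool]

# Illustrative check names; an application picks the ones that define "success" for it.
CHECKS: Dict[str, Check] = {
    "started": lambda s: s.container_started,
    "listening": lambda s: s.app_listening,
    "caches_primed": lambda s: s.caches_primed,
    "in_canary_pool": lambda s: s.in_canary_pool,
}


def is_ready(state: InstanceState, required: List[str]) -> bool:
    """Return True only if every check the application opted into passes."""
    return all(CHECKS[name](state) for name in required)


state = InstanceState(container_started=True, app_listening=True,
                      caches_primed=False, in_canary_pool=False)
print(is_ready(state, ["started", "listening"]))                   # True
print(is_ready(state, ["started", "listening", "caches_primed"]))  # False
```

The same instance counts as ready for one application and not ready for another, which is the point the slide makes.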

Page 20: Continuous Delivery with NetflixOSS

Continuous Deployment

• Service discovery becomes a very big part of understanding the health of an app
• Gives the app the responsibility to inform the tool as to its traffic-taking readiness
• It would be difficult for the tool to reach out to every instance to ask it for its health; better to have the instance tell us
• The tooling now only needs to query two places: Amazon and the Service Registry
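A minimal sketch of the "query two places" idea, assuming the tooling has already fetched both views as plain dictionaries keyed by instance ID. The function names and input shapes are hypothetical; no real AWS or Eureka client calls are shown. The values `running` and `UP` mirror the state names those systems report.

```python
from typing import Dict, List


def healthy_instances(cloud_states: Dict[str, str],
                      registry_statuses: Dict[str, str]) -> List[str]:
    """Instances the cloud reports as running AND that registered themselves as UP."""
    return sorted(
        instance_id
        for instance_id, state in cloud_states.items()
        if state == "running" and registry_statuses.get(instance_id) == "UP"
    )


def deployment_succeeded(cloud_states: Dict[str, str],
                         registry_statuses: Dict[str, str],
                         desired_count: int) -> bool:
    """Treat the deployment as successful once enough instances are healthy."""
    return len(healthy_instances(cloud_states, registry_statuses)) >= desired_count


cloud = {"i-1": "running", "i-2": "running", "i-3": "pending"}
registry = {"i-1": "UP", "i-2": "STARTING"}  # i-3 has not registered yet

print(healthy_instances(cloud, registry))        # ['i-1']
print(deployment_succeeded(cloud, registry, 2))  # False
```

Because instances report their own status to the registry, the tool never has to contact an instance directly.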

Page 21: Continuous Delivery with NetflixOSS

Continuous Deployment

• Teams can choose if “Discovery” health should be incorporated into their continuous deployment workflow
• This may not be necessary; for strictly IPC stack apps, it’s ok for them to be “up” and to let the IPC client (Ribbon) determine to which instance traffic is routed

Page 22: Continuous Delivery with NetflixOSS

Continuous Deployment

What do we do after success?
• Once the new version of code is deployed, now what?
• Netflix lumps packages of software into a “cluster”, within which different versions may run
• For rapid rollback, we need to keep the ancestor server group around, but take it out of traffic rotation

Page 23: Continuous Delivery with NetflixOSS

Continuous Deployment

What do we do after success?
• Put the ancestral server group into a “disabled” state
• Inform the service registry that the instances within this group are no longer accepting traffic
• Most consuming apps will use the service registry to find their endpoint, so this is sufficient
• For those that use DNS and go through a load balancer, we remove the instances from associated load balancers as well
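The disable step can be sketched with in-memory stand-ins for the service registry and the load balancers. Everything here (`ServerGroup`, `disable`, `enable`, the dictionaries) is hypothetical and only models the bookkeeping; the instances themselves keep running so that rollback stays fast.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ServerGroup:
    name: str
    instance_ids: List[str]
    load_balancers: List[str] = field(default_factory=list)


def disable(group: ServerGroup,
            registry: Dict[str, str],
            load_balancers: Dict[str, Set[str]]) -> None:
    """Take a server group out of traffic rotation without terminating it."""
    for instance_id in group.instance_ids:
        registry[instance_id] = "OUT_OF_SERVICE"  # registry-based consumers stop routing here
    for lb_name in group.load_balancers:
        load_balancers[lb_name] -= set(group.instance_ids)  # DNS/load balancer consumers


def enable(group: ServerGroup,
           registry: Dict[str, str],
           load_balancers: Dict[str, Set[str]]) -> None:
    """Rollback path: put the still-running ancestor back into rotation."""
    for instance_id in group.instance_ids:
        registry[instance_id] = "UP"
    for lb_name in group.load_balancers:
        load_balancers[lb_name] |= set(group.instance_ids)


registry = {"i-a": "UP", "i-b": "UP", "i-c": "UP"}
lbs = {"frontend-elb": {"i-a", "i-b", "i-c"}}
ancestor = ServerGroup("api-v001", ["i-a", "i-b"], ["frontend-elb"])

disable(ancestor, registry, lbs)
print(registry["i-a"], registry["i-c"])  # OUT_OF_SERVICE UP
print(lbs["frontend-elb"])               # {'i-c'}
```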

Page 24: Continuous Delivery with NetflixOSS

Continuous Deployment

Why not just update the existing config and roll the servers (rolling push)?
• Rolling push is a bad, bad thing
• While new instances are launching against a new image, ancestral instances still exist
• Can leave the server group in a half-done state, which can yield very weird results
• Tooling is built around the server group being the management target

Page 25: Continuous Delivery with NetflixOSS

Continuous Deployment

Incubating Deployment Strategies…
• Phased canary
  • 25%, 50%, 75%, 100%
• Global push
  • Deployment windows to different regions
• Highlander
  • Don’t keep the ancestor server group around
  • This is good for test environments that don’t need rollback
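As a rough illustration of the phased canary percentages, the sketch below turns them into instance counts for the new server group at each phase. The slides do not say whether the percentages refer to instance counts or to traffic share, so treating them as instance counts (rounded up) is an assumption, and `phased_targets` is a hypothetical helper.

```python
from typing import List, Sequence


def phased_targets(total_instances: int,
                   phases: Sequence[int] = (25, 50, 75, 100)) -> List[int]:
    """Instance count for the new server group at each phase, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return [(total_instances * pct + 99) // 100 for pct in phases]


print(phased_targets(10))  # [3, 5, 8, 10]
print(phased_targets(4))   # [1, 2, 3, 4]
```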

Page 26: Continuous Delivery with NetflixOSS

Continuous Deployment

Continuous Delivery Tooling
• Many CD tools are available today from NetflixOSS!
• The puzzle pieces are there for the entire problem domain
• Tooling for build system packaging, baking immutable infrastructure, service discovery, continuous deployment, and cluster management

Page 27: Continuous Delivery with NetflixOSS

Build System Tooling

Nebula Gradle Plugins
• Nebula (like, “space clouds”) is a collection of Gradle plugins to assist in the continuous delivery workflow
• Plugins often come in two parts, Nebula and Gradle: the “Gradle” part is just a Gradle plugin, and you’re on your own to configure it; the “Nebula” part is an opinionated veneer
• Tons of great plugins, extensive documentation, and many, many, many available videos and presentations on Nebula

Page 28: Continuous Delivery with NetflixOSS

Build System Tooling

Nebula OS Package Plugins
• The Gradle Side
  • Provides mechanism for producing Debian and RPM artifacts
  • Very straightforward integration that uses Gradle’s well-known CopySpec for getting files into an OS structure
  • Nice DSL for describing OS-level dependencies
• The Nebula Side
  • Derives configuration in a “best fit” kind of way
  • Provides integration with Gradle’s application plugin to package a runnable distribution into an OS artifact
  • Provides ability to produce an OS daemon for your service

https://github.com/nebula-plugins/nebula-ospackage-plugin

Page 29: Continuous Delivery with NetflixOSS

Build System Tooling

Page 30: Continuous Delivery with NetflixOSS

The Bakery

Baking a Server Image
• Aminator
  • Provides easy creation of package-specific AMIs
  • Attaches a “Base Image” volume, installs your software package
  • Takes a snapshot of the volume, resulting in an AMI
  • This AMI is the immutable infrastructure
  • AMI will act as our unit of deployment going forward

https://github.com/netflix/aminator

Page 31: Continuous Delivery with NetflixOSS

Service Discovery

Service Registry for Apps
• Eureka
  • Applications can register their own health
  • Integrates tightly with Ribbon to provide inter-app service discovery, load balancing, and fault tolerance
  • Able to be leveraged during the continuous deployment process to inform as to successful deployments

https://github.com/netflix/eureka
https://github.com/netflix/ribbon

Page 32: Continuous Delivery with NetflixOSS

Continuous Deployment and Cluster Management

Managing Deployments
• Asgard
  • Provides a UI for managing AWS cloud resources
  • RESTful API for consumers to be able to script against
  • Decorates AWS with concepts that are relevant to Netflix’s continuous delivery infrastructure
  • This includes the concept of applications and clusters, which is something that AWS does not have
  • Standalone, runnable JAR or WAR deployment options

https://github.com/netflix/asgard
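To make the "applications and clusters" idea concrete, here is a small sketch that derives both from server group names. It assumes a naming scheme of `app[-stack]-vNNN` purely for illustration; the slides do not spell out the convention, and `group_by_cluster` and `application_of` are hypothetical helpers, not Asgard APIs.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed naming scheme: "<cluster>-v<sequence>", where the cluster name starts with the app name.
_VERSIONED = re.compile(r"^(?P<cluster>.+)-v(?P<seq>\d+)$")


def group_by_cluster(server_group_names: List[str]) -> Dict[str, List[str]]:
    """Group server groups into clusters by stripping the version suffix."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for name in server_group_names:
        match = _VERSIONED.match(name)
        cluster = match.group("cluster") if match else name
        clusters[cluster].append(name)
    return {cluster: sorted(names) for cluster, names in clusters.items()}


def application_of(cluster: str) -> str:
    """The application is taken to be the first dash-separated token of the cluster name."""
    return cluster.split("-", 1)[0]


groups = group_by_cluster(["api-prod-v002", "api-prod-v001", "search-v010"])
print(groups)
# {'api-prod': ['api-prod-v001', 'api-prod-v002'], 'search': ['search-v010']}
print(application_of("api-prod"))  # api
```

Under this scheme, the ancestor server group in a cluster is simply the entry before the newest one in the sorted list.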

Page 33: Continuous Delivery with NetflixOSS

Continuous Deployment and Cluster Management

Page 34: Continuous Delivery with NetflixOSS

Some Harsh Realities…

• All of this stuff is difficult to get up and running
• Every tool makes assumptions about account structure, available resources, naming conventions, etc.
• Non-native concepts, like applications and clusters, are difficult to understand from an outsider’s perspective
• The benefit may not justify the cost if you’re not adopting the entire stack

Page 35: Continuous Delivery with NetflixOSS

Getting better…

• Many initiatives are currently underway to engage the open source community more directly
• The goal is to make the barrier to entry very low for getting up and running with NetflixOSS
• Andrew Spyker (@aspyker) is leading the charge for making NetflixOSS plug-and-play…
• Although, not very much (right now) speaks directly to gluing tools together for continuous delivery

Page 36: Continuous Delivery with NetflixOSS

Some Resources

• Zero to Cloud:
  • http://www.oscon.com/oscon2014/public/schedule/detail/34252
  • Walks you through a document that shows how to set up your AWS account
  • Shows you how to leverage CloudFormation to configure a NetflixOSS runtime
• Zero to Docker:
  • http://techblog.netflix.com/2014/11/zerotodocker-easy-way-to-evaluate.html
  • Pre-built Docker images for NetflixOSS components
  • Provides a quick way to get up and running
  • Not for production use; not in use at Netflix

Page 37: Continuous Delivery with NetflixOSS

Trying to make this easy on you…

Introducing the Zero to Cloud Gradle Plugin

https://github.com/Netflix-Skunkworks/zerotocloud-gradle

• “Netflix Skunkworks”, so not officially NetflixOSS at this point
• A single command can initialize a continuous delivery infrastructure built on NetflixOSS technologies
• Plugin can be utilized by builds to be the “glue” between the OS packaging, the Bakery, and Asgard