continuous deployment: the dirty details

Post on 08-May-2015

18.274 Views

Category:

Technology

1 Downloads

Preview:

Click to see full reader

DESCRIPTION

Presented at ALM Summit 3 in Redmond, WA. January 2013. Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com. http://www.etsy.com/careers

TRANSCRIPT

Mike Brittain@mikebrittainmike@etsy.com

ENGINEERING DIRECTOR

CONTINUOUS DEPLOYMENT

The Dirty Details

CONTINUOUS DEPLOYMENT

The Dirty Details

“OK, sounds cool. But I have some questions...”

CD 100- & 200-levels

- CI environment for automated tests- Committing to trunk- Branching in code- Config flags (a.k.a. feature flags)- DevOps mentality- Metrics and alerting- Automated deploy scripts

credit: photobookgirl (flickr)

CD 100- & 200-levels

- CI environment for automated tests- Committing to trunk- Branching in code- Config flags (a.k.a. feature flags)- DevOps mentality- Metrics and alerting- Automated deploy scripts

- Deploys vs. releases- Decoupled systems, schema changes- How we work: Arch. & Process- Integration and Operations

CD 300 level

credit: photobookgirl (flickr)

www. .com

GROSS MERCHANDISE SALES

http://www.etsy.com/blog/news/2013/notes-from-chad-2012-year-in-review/

DECEMBER 20121.5 Billion page views$117 Million of goods sold6 Million items sold

http://www.etsy.com/blog/news/2013/etsy-statistics-december-2012-weather-report/Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet

175+ Committers, everyone deploys

credit: martin_heigan (flickr)

Very end of 2009Today

DEPLOYMENTS PER DAY

Continuous delivery is a pattern language in growing use in software development to improve the process of software delivery.

~wikipedia

Techniques such as automated testing, continuous integration, and continuous deployment allow software to be developed to a high standard and easily packaged and deployed to test environments, resulting in the ability to rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead.

credit: Stewart, redgen (flickr)

Linux, Apache, MySQL, PHPArchitecture Stack

Memcache, Gearman, Postgresql, Solr, Java, Traffic Server, Hadoop, HBase

credit: Saire Elizabeth (flickr)

Git, Jenkins

2010-today2009

Then Now

Just before we started using CD

15 mins6-14 hours1 person“Deployment Army”

Then Now

Part of everydayworkflow

Special event andhighly orchestrated

Blocked for15 minutes.

15 minutes toredeploy

Blocked for6-14 hours.

6+ hours toredeploy

Then Now

Mainline,minimal linking

and building,rsync,site up

Release branch,database schemas,

data transforms,packaging,

rolling restarts,cache purging,

scheduled downtime

Then Now

1!" #$%Put your face on Etsy.com.

2&# #$%Complete tax, insurance, and

benefits forms.

credit: ktpupp (flickr)

WARNING

GROSS MERCHANDISE SALES

Continuous DeploymentSmall, frequent changes.

Constantly integrating into production.30+ deploys per day.

“Wow... 30 deploys a day.How do you build features so quickly?”

Software Deploy ≠ Product Launch

Deploys frequently gated by config flags(“dark” releases)

$cfg[‘new_search’] = array('enabled' => 'off');$cfg[‘sign_in’] = array('enabled' => 'on');$cfg[‘checkout’] = array('enabled' => 'on');$cfg[‘homepage’] = array('enabled' => 'on');

$cfg[‘new_search’] = array('enabled' => 'off');

$cfg[‘new_search’] = array('enabled' => 'off');

// Meanwhile...

# old and boring search$results = do_grep();

$cfg[‘new_search’] = array('enabled' => 'off');

// Meanwhile...

if ($cfg[‘new_search’] == ‘on’) { # New and fancy search $results = do_solr();} else { # old and boring search $results = do_grep();}

$cfg[‘new_search’] = array('enabled' => 'on');

// or...

$cfg[‘new_search’] = array('enabled' => 'staff');

// or...

$cfg[‘new_search’] = array('enabled' => '1%');

// or...

$cfg[‘new_search’] = array('enabled' => 'users', 'user_list' => 'mike,john,kellan');

Validate in production, hidden from public.

Small incremental changes to the applicationNew classes, methods, controllersGraphics, stylesheets, templatesCopy/content changes

Turning flags on, off, or % ramp up

What’s in a deploy?

Latent bugs and security holesTraffic management, load sheddingAdding and removing infrastructure

Tweaking config flags or releasing patches.

Low MTTR (response times)

http://www.flickr.com/photos/flyforfun/2694158656/

http://www.flickr.com/photos/flyforfun/2694158656/

OperatorConfig flags

Metrics

Favorites“on”

Favorites“off”

Many deploys eventually lead to a product launch.

“How do you continuously deploy database schema changes?”

Code deploysSchema changes

~15-20 minutes

Code deploysSchema changes

~15-20 minutesTHURSDAYS!

Our web application is largely monolithic.

Etsy.com, Support & Back-office tools, Developer API, Gearman (async work)

Our web application is largely monolithic.

Etsy.com, Support & Back-office tools, Developer API, Gearman (async work)

PHP, Apache, Memcache

External “services” are not deployed with the main application.

e.g. Databases, Search, Photo storage, Payments

External “services” are not deployed with the main application.

e.g. Databases, Search, Photo storage, Payments

MYSQL(schema changes)

SOLR, JVM(rolling restarts)

PROXY CACHE, FILERS, AMAZON S3

(specialized infra.)

PCI(controlled access)

For every config flag, there are two stateswe can support — present and future.

For every config flag, there are two stateswe can support — present and future.

... or past and present.

“Non-Breaking Expansions”Expose new version in a service interface;

support multiple versions in the consumer.

Example: Changing a Database SchemaMerging “users” and “users_prefs”

RULE OF THUMB:Prefer ADDs over ALTERs (non-breaking expansion)C

1. Write to both versions2. Backfill historical data3. Read from new version4. Cut-off writes to old version

0. Add new version to schema1. Write to both versions2. Backfill historical data3. Read from new version4. Cut-off writes to old version

0. Add new version to schemaSchema change to add prefs columns to “users” table.

“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “off”“read_prefs_from_users_table” => “off”

1. Write to both versionsWrite code for writing prefs to the “users” table.

“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “off”

2. Backfill historical dataOffline process to sync existing data from “user_prefs”to new columns in “users”

3. Read from new versionData validation tests. Ensure consistency both internallyand in production.

“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “staff”

3. Read from new versionData validation tests. Ensure consistency both internallyand in production.

“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “1%”

3. Read from new versionData validation tests. Ensure consistency both internallyand in production.

“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “5%”

3. Read from new versionData validation tests. Ensure consistency both internallyand in production.

“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “on”

(“on” == “100%”)

4. Cut-off writes to old versionAfter running on the new table for a significant amount of time, we can cut off writes to the old table.

“write_prefs_to_user_prefs_table” => “off”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “on”

“Branch by Astraction”

Controller Controller

Users Model

“users” (old) “user_prefs” “users”

old schema new schema

(Abstraction)

http://paulhammant.com/blog/branch_by_abstraction.htmlhttp://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/

1. Write to both versions2. Backfill historical data3. Read from new version4. Cut-off writes to old version

“The Migration 4-Step”

“When do you clean up all of those config flags?

We might remove config flags for the old version when...

It is no longer valid for the business.It is no longer stable, maintained, or trusted.It has poor performance characteristics.The code is a mess, or difficult to read.We can afford to spend time on it.

Promote “dev flags” to “feature flags”

// Feature flag$cfg[‘mobilized_pages’] = array('enabled' => 'on');

// Dev flags$cfg[‘mobile_templates_seller_tools’] = array('enabled' => 'on');$cfg[‘mobile_templates_account_tools’] = array('enabled' => 'on');$cfg[‘mobile_templates_member_profile’] = array('enabled' => 'on');$cfg[‘mobile_templates_search’] = array('enabled' => 'off');$cfg[‘mobile_templates_activity_feed’] = array('enabled' => 'off');

...

if ($cfg[‘mobilized_pages’] == ‘on’ && $cfg[‘mobile_templates_search’] == ‘on’) { // ... // ...}

// Feature flags$cfg[‘search’] = array('enabled' => 'on');$cfg[‘developer_api’] = array('enabled' => 'on');$cfg[‘seller_tools’] = array('enabled' => 'on');

$cfg[‘the_entire_web_site’] = array('enabled' => 'on');

// Feature flags$cfg[‘search’] = array('enabled' => 'on');$cfg[‘developer_api’] = array('enabled' => 'on');$cfg[‘seller_tools’] = array('enabled' => 'on');

$cfg[‘the_entire_web_site’] = array('enabled' => 'on');$cfg[‘the_entire_web_site_no_really_i_mean_it’] = array('enabled' => 'on');

Architecture and Process

Deploying is cheap.

Releasing is cheap.

Some philosophies on product development...

Gathering data should be cheap, too.staff, opt-in prototypes, 1%

Treat first iterations as experiments.

Get into code as quickly as possible.

“Where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time. Hence plan to throw one away; you will, anyhow.”

~ Fred Brooks, The Mythical Man-Month

Architecture largely doesn’t matter.

Kill things that don’t work.

Your assumptions will be wrongonce you’ve scaled 10x.

“We don’t optimize for being right. We optimize for quickly detecting when we’re wrong.”

~Kellan Elliott-McCrea, CTO

Become really good at changingyour architecture.

Invest time in architecture by the2nd or 3rd iteration.

Integration and Operations

WARNING

REMEMBER THIS?

Continuous DeploymentSmall, frequent changes.

Constantly integrating into production.30 deploys per day.

Code review before commit

Automated tests before deploy

Why Integrate with Production?

Dev ≠ Prod

Verify frequently and in small batches.

Integrating with production is a test in itself.We do this frequently and in small batches.

More database servers in prod.Bigger database hardware in prod.More web servers.Various replication schemes.Different versions of server and OS software.Schema changes applied at different times.Physical hardware in prod.More data in prod.Legacy data (7 years of odd user states).More traffic in prod.Wait, I mean MUCH more traffic in prod.Fewer elves.Faster disks (SSDs) in prod.

Using a MySQL database in dev for an application that will be runningon Oracle in production: Priceless

Verify frequently and in small batches.

Dev ⇾ QA ⇾ Staging ⇾ Prod

Dev ⇾ QA ⇾ Staging ⇾ Prod

Dev ⇾ Pre-Prod ⇾ Prod

Test and integrate where you’ll see value.

Config flags (again)

off, on, staff, opt-in prototypes, user list, 0-100%

Config flags (again)

off, on, staff, opt-in prototypes, user list, 0-100%

“canary pools”

Automated tests after deploy

Real-time metrics and dashboardsNetwork & Servers, Application, Business

SERVER METRICS

Apache requests/sec, Busy processes,CPU utilization, Script exec time (med. & 95th)

APPLICATION METRICS

Logins, Registrations, Checkouts, Listings created, Forum posts

Time and event correlated.

Real humans reporting trouble!

“Theoretical” vs. “Practical” Engineering

http://www.flickr.com/photos/flyforfun/2694158656/

OperatorConfig flags

Metrics

Thanksgiving, “Black Friday,” “Cyber Monday” ➔ Christmas(~30 days)

Managing risk during Holiday Shopping season

Code Freeze?

“Code Slush”

DEPLOYMENTS PER DAY

Tighten your feedback cyclesIntegrate with production and validate early in cycle.Use tools that allow you to detect issues early.Optimize for quick response times.

Applied to both feature development and operability.

Thank you

Mike Brittain

... and questions?

These slides will be available later today at http://mikebrittain.com/talks

@mikebrittainmike@etsy.com

ENGINEERING DIRECTOR

top related