Agility Requires Safety

Upload: yevgeniy-brikman

Post on 15-Jan-2017


Page 1: Agility Requires Safety

AGILITY requires SAFETY

Page 2: Agility Requires Safety

Every startup has the same story:

Page 3: Agility Requires Safety

“We don’t have time for best practices.”

Page 4: Agility Requires Safety
Page 5: Agility Requires Safety
Page 6: Agility Requires Safety
Page 7: Agility Requires Safety
Page 8: Agility Requires Safety
Page 9: Agility Requires Safety
Page 10: Agility Requires Safety

You can’t go faster by being reckless

Page 11: Agility Requires Safety

Think of cars on a highway

Page 12: Agility Requires Safety

What happens if everyone jams down on the gas?

Page 13: Agility Requires Safety
Page 14: Agility Requires Safety

To go fast, a car needs not only a powerful engine…

Page 15: Agility Requires Safety

But also powerful brakes.

Page 16: Agility Requires Safety

As well as seat belts, airbags, bumpers, and auto-pilot

Page 17: Agility Requires Safety

For cars and for software, speed is limited by safety

Page 18: Agility Requires Safety

What are the seat belts, brakes, & self-driving cars of software?

Page 19: Agility Requires Safety

This talk is about safety mechanisms

Page 20: Agility Requires Safety

That make it possible to build software quickly

Page 21: Agility Requires Safety

I’m Yevgeniy Brikman

ybrikman.com

Page 22: Agility Requires Safety

Founder of

Atomic Squirrel

atomic-squirrel.net

Page 23: Agility Requires Safety

PAST LIVES

Page 24: Agility Requires Safety

Author of Hello, Startup

hello-startup.net

Page 25: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 26: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 27: Agility Requires Safety

Good brakes stop your car before you run into something

Page 28: Agility Requires Safety

Continuous integration stops buggy code before it goes into production

Page 29: Agility Requires Safety

Imagine your goal is to build the International Space Station

Page 30: Agility Requires Safety

Each team designs and builds their component in isolation

Page 31: Agility Requires Safety

You launch everything into space and hope it all comes together

Page 32: Agility Requires Safety
Page 33: Agility Requires Safety

I thought the Russians were going to build the bathrooms?

Page 34: Agility Requires Safety

Weren’t the French supposed to do the wiring?

Page 35: Agility Requires Safety

Everyone is using the metric system, right?

Page 36: Agility Requires Safety

Teams working for a long time with incorrect assumptions

Page 37: Agility Requires Safety

Finding this out when you’re in outer space is too late

Page 38: Agility Requires Safety

This is the result of “late integration”

Page 39: Agility Requires Safety

Lots of teams working in isolation on separate branches

Page 40: Agility Requires Safety

Before attempting a massive merge at the very end

Page 41: Agility Requires Safety

MERGE CONFLICT

Page 42: Agility Requires Safety

The alternative is “continuous integration”

Page 43: Agility Requires Safety

Where everyone regularly merges their work

Page 44: Agility Requires Safety

The most common approach is trunk-based development

Page 45: Agility Requires Safety

Everyone works on a single branch (trunk)

Page 46: Agility Requires Safety

That can’t possibly scale to a lot of developers, can it?

Page 47: Agility Requires Safety

Uses trunk-based development for 1,000+ developers

Page 48: Agility Requires Safety

Uses trunk-based development for 4,000+ developers

Page 49: Agility Requires Safety

Uses trunk-based development for 20,000+ developers

Page 50: Agility Requires Safety
Page 51: Agility Requires Safety

Wouldn’t you have merge conflicts all the time?

Page 52: Agility Requires Safety

If you merge (commit) regularly, conflicts are rare.

Page 53: Agility Requires Safety

And those that happen are from a day of work—not months.

Page 54: Agility Requires Safety

Commit early and often.

Page 55: Agility Requires Safety

Small commits are easier to merge, test, revert, review

Page 56: Agility Requires Safety
Page 57: Agility Requires Safety

Wouldn’t there constantly be broken code in trunk?

Page 58: Agility Requires Safety

Not if you run a self-testing build after every commit

Page 59: Agility Requires Safety

It should compile your code and run your automated tests

Page 60: Agility Requires Safety

If a build fails, a developer must fix it ASAP or revert the commit
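A self-testing build can be sketched as a script that runs each step in order and reports failure as soon as one step fails. This is a minimal sketch, not the tooling used in the talk: the function name `run_build` and the example steps in `EXAMPLE_STEPS` are assumptions made for illustration, and a real CI server would trigger the script on every commit.

```python
import subprocess
import sys
from typing import List, Sequence


def run_build(steps: Sequence[List[str]]) -> bool:
    """Run each build step in order and stop at the first failing step."""
    for command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            # A failed step means the commit must be fixed or reverted.
            print(f"BUILD FAILED at step: {' '.join(command)}")
            return False
    print("BUILD PASSED")
    return True


# Hypothetical steps for a Python project: byte-compile the sources,
# then discover and run the automated tests.
EXAMPLE_STEPS = [
    [sys.executable, "-m", "compileall", "-q", "."],
    [sys.executable, "-m", "unittest", "discover"],
]
```

A CI job could call `run_build(EXAMPLE_STEPS)` and exit with a non-zero status when it returns `False`, so that the failure is visible right after the commit that caused it.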

Page 61: Agility Requires Safety

Of course, this depends on having good automated tests

Page 62: Agility Requires Safety

Tests give you the confidence to make changes quickly

Page 63: Agility Requires Safety

JUnit version 4.11

...

Time: 6.063

OK (259 tests)

How long would it take you to do 259 tests manually?

Page 64: Agility Requires Safety

What should you test?

Page 65: Agility Requires Safety

Everything!

Page 66: Agility Requires Safety

Everything!

Page 67: Agility Requires Safety

It’s a trade-off between:

1. Likelihood of bugs
2. Cost of bugs
3. Cost of testing

Page 68: Agility Requires Safety

Likelihood of bugs is higher for complex code and large teams

Page 69: Agility Requires Safety

Cost of bugs is higher for some systems (payments, security)

Page 70: Agility Requires Safety

Cost of tests is higher for integration and UI tests

Page 71: Agility Requires Safety

“Without continuous integration, your software is broken until somebody proves it works, usually during a testing or integration stage.

Page 72: Agility Requires Safety

With continuous integration, your software is proven to work (assuming a sufficiently comprehensive set of automated tests) with every new change—and you know the moment it breaks and can fix it immediately.”

Page 73: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 74: Agility Requires Safety

Ships have bulkheads to try to contain flooding to one area.

Page 75: Agility Requires Safety

You can split up a codebase to contain problems to one area.

Page 76: Agility Requires Safety

Code is the enemy: the more you have, the slower you go

Page 77: Agility Requires Safety

Project size (lines of code)    Bug density (bugs per thousand lines of code)
< 2K                            0 – 25
2K – 16K                        0 – 40
16K – 64K                       0.5 – 50
64K – 512K                      2 – 70
> 512K                          4 – 100

Page 78: Agility Requires Safety

As the code grows, the number of bugs grows even faster

Page 79: Agility Requires Safety

“Software development doesn't happen in a chart, an IDE, or a design tool; it happens in your head.”

Page 80: Agility Requires Safety

The mind can only handle so much complexity at once

Page 81: Agility Requires Safety

One solution is to break the code into multiple codebases

Page 82: Agility Requires Safety

Instead of depending on the source of another module

/moduleA

/moduleB /moduleC /moduleD

/moduleE

Page 83: Agility Requires Safety

You depend on a versioned artifact from that module

moduleA-0.3.1.jar

moduleB-3.1.0.jar moduleC-9.8.0.jar moduleD-1.4.3.jar

moduleE-0.5.6.jar

Page 84: Agility Requires Safety

This provides isolation from changes in other modules

moduleA-0.3.1.jar

moduleB-3.1.0.jar moduleC-9.8.0.jar moduleD-1.4.3.jar

moduleE-0.5.6.jar
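Depending on a versioned artifact means that a module only picks up changes from another module when it deliberately moves to a new version. The sketch below parses artifact file names like the ones on the slide and checks a candidate upgrade against a pinned version. The names `Artifact`, `parse_artifact`, and `is_compatible` are hypothetical, the three-part `major.minor.patch` format is an assumption, and the "same major version" rule is one assumed policy, not something the talk prescribes.

```python
import re
from typing import NamedTuple, Tuple


class Artifact(NamedTuple):
    name: str
    version: Tuple[int, int, int]


# Assumes names of the form <module>-<major>.<minor>.<patch>.jar
ARTIFACT_PATTERN = re.compile(r"^(?P<name>.+)-(?P<version>\d+\.\d+\.\d+)\.jar$")


def parse_artifact(filename: str) -> Artifact:
    """Split an artifact file name into module name and numeric version."""
    match = ARTIFACT_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a versioned artifact: {filename}")
    major, minor, patch = (int(part) for part in match.group("version").split("."))
    return Artifact(match.group("name"), (major, minor, patch))


def is_compatible(pinned: Artifact, candidate: Artifact) -> bool:
    """Assumed policy: same module, same major version, not older than the pin."""
    return (
        candidate.name == pinned.name
        and candidate.version[0] == pinned.version[0]
        and candidate.version >= pinned.version
    )
```

For example, with `moduleB-3.1.0.jar` pinned, `moduleB-3.2.4.jar` would pass this check while `moduleB-4.0.0.jar` would not, so a breaking change in moduleB stays isolated until the dependent module chooses to upgrade.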

Page 85: Agility Requires Safety

You already do this: guava-18.0.jar

jquery-2.2.0.js

Page 86: Agility Requires Safety

Advantages of artifacts:

1. Isolation
2. Decoupling
3. Faster builds

Page 87: Agility Requires Safety

Disadvantages of artifacts:

1. Dependency hell
2. No continuous integration
3. Hard to make global changes

Page 88: Agility Requires Safety

Another option is to break the codebase into services

Page 89: Agility Requires Safety

In a monolith, you use function calls within one process

A.a()

B.b() C.c() D.d()

E.e()

Page 90: Agility Requires Safety

With services, you pass messages between processes

http://A/a

http://B/b http://C/c

http://D/d

http://E/e
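The difference from a function call is that the caller now sends a message over the network and must handle serialization, timeouts, and errors. Here is a minimal sketch using only the Python standard library. The handler `ServiceB`, the `/b` path, and the JSON payload are assumptions for illustration, and the service runs in a background thread of the same process only to keep the example short. A real service would run as its own process.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class ServiceB(BaseHTTPRequestHandler):
    """Hypothetical service B: answers GET /b with a small JSON message."""

    def do_GET(self) -> None:
        if self.path == "/b":
            body = json.dumps({"result": "b"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args: object) -> None:
        pass  # suppress per-request logging to keep the example quiet


def start_service() -> HTTPServer:
    """Start service B on a free local port and serve in the background."""
    server = HTTPServer(("127.0.0.1", 0), ServiceB)  # port 0: the OS picks a port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_b(port: int) -> dict:
    """What used to be B.b() in the monolith is now an HTTP request."""
    with urlopen(f"http://127.0.0.1:{port}/b", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `call_b(server.server_address[1])` after `server = start_service()` returns the decoded JSON message. The explicit timeout hints at the I/O and error handling that a plain function call never needed.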

Page 91: Agility Requires Safety

Advantages of services:

1. Technology agnostic
2. Scalability
3. Isolation

Page 92: Agility Requires Safety

Disadvantages of services:

1. Operational overhead
2. Performance overhead
3. I/O, error handling
4. Backwards compatibility
5. Hard to make global changes

Page 93: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 94: Agility Requires Safety

Autopilot prevents accidents caused by human error

Page 95: Agility Requires Safety

Automated deployments prevent accidents caused by human error

Page 96: Agility Requires Safety

Deploying code can be painful

Page 97: Agility Requires Safety

“If it hurts, do it more often.” – Martin Fowler

Page 98: Agility Requires Safety

The deployment process should be:

Page 99: Agility Requires Safety
Page 100: Agility Requires Safety

That means you should never deploy or configure manually

Page 101: Agility Requires Safety

> ssh [email protected]

   __|  __|  __|
   _|  (   \__ \   Amazon ECS-Optimized Amazon Linux AMI 2015.09.d
 ____|\___|____/

[ec2-user ~]$ sudo apt-get install ruby

Don’t do this

Page 102: Agility Requires Safety

Or this

Page 103: Agility Requires Safety

Instead, automate everything

Page 104: Agility Requires Safety

The gold standard is the blue-green deployment

Page 105: Agility Requires Safety

Let’s say you have version 0.0.1 of your app deployed

Page 106: Agility Requires Safety

First, deploy version 0.0.2 on a duplicate set of servers

Page 107: Agility Requires Safety

If everything looks good, switch the load balancer over to 0.0.2
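The switch-over step can be sketched as a load balancer that only moves traffic to the new environment once a health check passes. This is a toy in-memory model under stated assumptions: `Environment`, `LoadBalancer`, and the `healthy` callback are hypothetical names, and a real deployment would call the API of an actual load balancer instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Environment:
    name: str                     # e.g. "blue" or "green"
    version: str                  # application version deployed there
    healthy: Callable[[], bool]   # hypothetical health check


class LoadBalancer:
    def __init__(self, live: Environment) -> None:
        self.live = live

    def switch_to(self, candidate: Environment) -> bool:
        """Send traffic to the candidate only if its health check passes."""
        if not candidate.healthy():
            return False  # keep serving the current version
        self.live = candidate
        return True


blue = Environment("blue", "0.0.1", healthy=lambda: True)
green = Environment("green", "0.0.2", healthy=lambda: True)
balancer = LoadBalancer(live=blue)
switched = balancer.switch_to(green)
```

Because the old environment stays running until the switch succeeds, a failed health check leaves users on the previous version, and rolling back is just another call to `switch_to`.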

Page 108: Agility Requires Safety

Four main categories of deployment automation tools:

Page 109: Agility Requires Safety

1. Configuration management: Chef, Puppet, Ansible, Salt

Page 110: Agility Requires Safety

- name: Install httpd and php
  yum: name={{ item }} state=present
  with_items:
    - httpd
    - php

- name: start httpd
  service: name=httpd state=started enabled=yes

- name: Copy the code from repository
  git: repo={{ repository }} dest=/var/www/html/

Imperative scripts to configure servers and deploy code

Page 111: Agility Requires Safety

2. Provisioning tools: Terraform, CloudFormation, Heat

Page 112: Agility Requires Safety

resource "aws_instance" "example" {
  ami           = "ami-b960b1d"
  instance_type = "t2.micro"
}

resource "aws_eip" "ip" {
  instance   = "${aws_instance.example.id}"
  depends_on = ["aws_instance.example"]
}

Declarative templates that define your infrastructure

Page 113: Agility Requires Safety

3. Virtual machines: VMware, VirtualBox, Packer, Vagrant

Page 114: Agility Requires Safety

{
  "builders": [{
    "type": "amazon-ebs",
    "source_ami": "ami-de0d9eb7",
    "instance_type": "m1.medium",
    "ami_name": "example-packer-ami-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "sudo apt-get -y update",
      "sudo apt-get -y install httpd php"
    ]
  }]
}

Images of configured servers

Page 115: Agility Requires Safety

4. Containers: Docker, rkt, LXD

Page 116: Agility Requires Safety

FROM ubuntu:12.04

RUN apt-get update && apt-get install -y apache2 php

ENV APACHE_RUN_USER www-data
ENV APACHE_LOG_DIR /var/log/apache2

EXPOSE 80

CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]

Lightweight images of configured servers

Page 117: Agility Requires Safety

These tools allow you to define your infrastructure as code

Page 118: Agility Requires Safety

That way, you can version it, review it, test it, and reuse it.

Page 119: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 120: Agility Requires Safety

Elisha Otis demoing elevator free-fall safety in 1854

Page 121: Agility Requires Safety

The safety elevator patent

Page 122: Agility Requires Safety

The safety catches are locked by default

Page 123: Agility Requires Safety

Only an intact cable can unlock the latches

Page 124: Agility Requires Safety

This elevator provides safety by default

Page 125: Agility Requires Safety

Feature toggles provide safety by default

Page 126: Agility Requires Safety

New feature, part 1

New feature, part 2

New feature, part 3

If a large new feature takes many commits, wouldn’t a user see it in an unfinished state?

Page 127: Agility Requires Safety

<section id="new-section">
  <!-- Code for new section -->
</section>
<section id="original-section">
  <!-- Code for original section -->
</section>

Let’s say you were adding a new section to your website.

Page 128: Agility Requires Safety

<% if toggles.enabled("new-section") %>
  <section id="new-section">
    <!-- Code for new section -->
  </section>
<% end %>
<section id="original-section">
  <!-- Code for original section -->
</section>

Wrap new code in a conditional that looks up a feature toggle

Page 129: Agility Requires Safety

<% if toggles.enabled("new-section") %>
  <section id="new-section">
    <!-- Code for new section -->
  </section>
<% end %>
<section id="original-section">
  <!-- Code for original section -->
</section>

Toggles are off by default, so users won’t see unfinished work

Page 130: Agility Requires Safety

development:
  feature_toggles:
    new-section: true

production:
  feature_toggles:
    new-section: false

You can enable feature toggles in a config file.
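The lookup behind `toggles.enabled(...)` can be sketched as a small class that reads per-environment settings and treats anything missing as off. This is a minimal sketch, assuming the config has already been loaded into a dictionary: the class name `FeatureToggles` and the `CONFIG` structure are illustrative, not the API of a particular library.

```python
from typing import Dict, Mapping

# Hypothetical in-memory copy of the per-environment toggle config.
CONFIG: Dict[str, Dict[str, bool]] = {
    "development": {"new-section": True},
    "production": {"new-section": False},
}


class FeatureToggles:
    def __init__(self, config: Mapping[str, Mapping[str, bool]], environment: str) -> None:
        # An unknown environment simply has no toggles turned on.
        self._toggles: Dict[str, bool] = dict(config.get(environment, {}))

    def enabled(self, name: str) -> bool:
        """Safety by default: a toggle is on only if explicitly set to True."""
        return self._toggles.get(name, False) is True


dev_toggles = FeatureToggles(CONFIG, "development")
prod_toggles = FeatureToggles(CONFIG, "production")
```

Here `dev_toggles.enabled("new-section")` is true while `prod_toggles.enabled("new-section")` is false, and a toggle name that was never configured is reported as off rather than raising an error.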

Page 131: Agility Requires Safety

> curl http://feature.toggles/

{
  "development": { "new-section": true },
  "production": { "new-section": false }
}

Or you could create a web service for feature toggles.

Page 132: Agility Requires Safety

> curl http://feature.toggles/?user=123

{
  "development": { "new-section": "A" },
  "production": { "new-section": "B" }
}

It could return different, complex values for each user.

Page 133: Agility Requires Safety

And provide a web UI for configuring toggles.

Page 134: Agility Requires Safety

This allows you to quickly turn features on or off.

Page 135: Agility Requires Safety

<% if toggles.get("new-section") == "A" %>
  <section id="new-section-bucket-a">
    <!-- Code for new section, version A -->
  </section>
<% elsif toggles.get("new-section") == "B" %>
  <section id="new-section-bucket-b">
    <!-- Code for new section, version B -->
  </section>
<% end %>

This allows A/B testing
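For A/B testing, each user needs to land in the same bucket on every request. One common way to do that, shown here as a sketch and not as the method used in the talk, is to hash the user ID together with the toggle name. The function name `bucket_for` and the default buckets `("A", "B")` are assumptions.

```python
import hashlib
from typing import Sequence


def bucket_for(user_id: int, toggle: str, buckets: Sequence[str] = ("A", "B")) -> str:
    """Deterministically assign a user to one bucket for a given toggle."""
    # Including the toggle name keeps bucket assignments independent
    # across different experiments.
    digest = hashlib.sha256(f"{toggle}:{user_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]
```

A toggle service could return `bucket_for(123, "new-section")` for user 123. Because the result depends only on the inputs, no per-user state has to be stored to keep the experience consistent.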

Page 136: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 137: Agility Requires Safety

A speedometer tells you how fast you’re driving

Page 138: Agility Requires Safety

Monitoring tells you how your product is performing

Page 139: Agility Requires Safety

“If you can’t measure it, you can’t fix it.” – David Henke

Page 140: Agility Requires Safety

There are many types of monitoring

Page 141: Agility Requires Safety

Availability metrics: is my product up or down?

Page 142: Agility Requires Safety

Useful tools: Keynote, Pingdom, Uptime Robot, Route53

Page 143: Agility Requires Safety

Business metrics: what are my users doing in the product?

Page 144: Agility Requires Safety

Useful tools: Google Analytics, KISSMetrics, Mixpanel

Page 145: Agility Requires Safety

Application metrics: how is my application performing?

Page 146: Agility Requires Safety

Useful tools: New Relic, CloudWatch, Datadog

Page 147: Agility Requires Safety

127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /apache_pb.gif HTTP/1.0" 200 2326
64.242.88.10 - - [07/Mar/2004:16:05:49] "GET /twiki/bin/ HTTP/1.1" 401 12846
127.0.0.1 - - [28/Jul/2006:10:22:04] "GET / HTTP/1.0" 200 2216
64.242.88.10 - - [07/Mar/2004:16:06:51] "GET /twiki/bin/Twiki/" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02] "GET /mailman HTTP/1.1" 200 6291
127.0.0.1 - - [28/Jul/2006:10:27:32] "GET /hidden/ HTTP/1.0" 404 7218
192.168.2.20 - - [28/Jul/2006:10:27:10] "GET /cgi-bin/try HTTP/1.0" 200 3395
64.242.88.10 - - [07/Mar/2004:16:11:58] "GET /twiki/bin/view/" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55] "GET /twiki HTTP/1.1" 200 5253

Log files are also a form of application-level monitoring
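To turn raw log lines into a metric, you can parse each line and aggregate a field such as the HTTP status code. The sketch below assumes Apache-style access log lines like those on the slide. The pattern `LOG_PATTERN`, the function `count_statuses`, and the three `SAMPLE` lines are illustrative, and the log tools named in this talk do this at much larger scale.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: host, two ignored fields, [timestamp], "request", status, size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def count_statuses(lines: Iterable[str]) -> Counter:
    """Count responses per HTTP status code, skipping unparseable lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            counts[match.group("status")] += 1
    return counts


SAMPLE = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '64.242.88.10 - - [07/Mar/2004:16:05:49] "GET /twiki/bin/ HTTP/1.1" 401 12846',
    '127.0.0.1 - - [28/Jul/2006:10:27:32] "GET /hidden/ HTTP/1.0" 404 7218',
]
status_counts = count_statuses(SAMPLE)
```

A rising share of 4xx or 5xx codes in `status_counts` is the kind of signal that application-level monitoring is meant to surface.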

Page 148: Agility Requires Safety

127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /apache_pb.gif HTTP/1.0" 200 2326
64.242.88.10 - - [07/Mar/2004:16:05:49] "GET /twiki/bin/ HTTP/1.1" 401 12846
127.0.0.1 - - [28/Jul/2006:10:22:04] "GET / HTTP/1.0" 200 2216
64.242.88.10 - - [07/Mar/2004:16:06:51] "GET /twiki/bin/Twiki/" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02] "GET /mailman HTTP/1.1" 200 6291
127.0.0.1 - - [28/Jul/2006:10:27:32] "GET /hidden/ HTTP/1.0" 404 7218
192.168.2.20 - - [28/Jul/2006:10:27:10] "GET /cgi-bin/try HTTP/1.0" 200 3395
64.242.88.10 - - [07/Mar/2004:16:11:58] "GET /twiki/bin/view/" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55] "GET /twiki HTTP/1.1" 200 5253

Useful tools: loggly, logstash, Papertrail, Sumo Logic

Page 149: Agility Requires Safety

Server metrics: how is my server performing?

Page 150: Agility Requires Safety

Useful tools: Nagios, Icinga, Munin, collectd, CloudWatch

Page 151: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 152: Agility Requires Safety

Warning lights notify you if something is wrong

Page 153: Agility Requires Safety

Alerting systems notify you if something is wrong

Page 154: Agility Requires Safety

You can’t look at metrics 24/7. Alerting systems can.
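At its core, an alerting system compares current metrics against thresholds and notifies someone when a limit is crossed. This is a minimal sketch under stated assumptions: the function `check_alerts`, the metric names, and the `notify` callback are hypothetical, and a real setup would route the notification through a paging service.

```python
from typing import Callable, Dict, List


def check_alerts(
    metrics: Dict[str, float],
    thresholds: Dict[str, float],
    notify: Callable[[str], None],
) -> List[str]:
    """Notify once for every metric whose value exceeds its threshold."""
    fired: List[str] = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            notify(f"ALERT: {name}={value} exceeds {limit}")
            fired.append(name)
    return fired


# Hypothetical readings and limits; messages are collected in a list here.
sent_messages: List[str] = []
fired_alerts = check_alerts(
    metrics={"error_rate": 0.07, "p99_latency_ms": 120.0},
    thresholds={"error_rate": 0.05, "p99_latency_ms": 500.0},
    notify=sent_messages.append,
)
```

Running a check like this on a schedule is what lets the system watch the metrics around the clock, so that people only need to look when `fired_alerts` is non-empty.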

Page 155: Agility Requires Safety

Useful tools: PagerDuty, VictorOps

Page 156: Agility Requires Safety

For a full list of monitoring and alerting tools, see:

hello-startup.net/resources

Page 157: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 158: Agility Requires Safety

Seat belts help you survive crashes

Page 159: Agility Requires Safety

High availability helps you survive crashes

Page 160: Agility Requires Safety

Stateless servers: multiple instances, multiple zones

Page 161: Agility Requires Safety

Load balancer routes around server or zone outages

Page 162: Agility Requires Safety

Auto-recovery mechanism brings server back after outage
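Routing around an outage can be sketched as a round-robin balancer that skips servers marked unhealthy and uses them again once they recover. This is a toy in-memory model: the class `RoundRobinBalancer`, the `mark` method, and the server names are assumptions, and a production load balancer would run its own health checks instead of being told.

```python
from itertools import cycle
from typing import Dict, Iterator, List, Optional


class RoundRobinBalancer:
    def __init__(self, servers: List[str]) -> None:
        self._servers = list(servers)
        self._health: Dict[str, bool] = {server: True for server in self._servers}
        self._ring: Iterator[str] = cycle(self._servers)

    def mark(self, server: str, healthy: bool) -> None:
        """Record the latest health-check result for a server."""
        self._health[server] = healthy

    def next_server(self) -> Optional[str]:
        """Return the next healthy server, or None if every server is down."""
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if self._health[candidate]:
                return candidate
        return None


balancer = RoundRobinBalancer(["zone-a-1", "zone-b-1"])
balancer.mark("zone-a-1", False)   # simulate an outage in zone A
during_outage = [balancer.next_server() for _ in range(3)]
balancer.mark("zone-a-1", True)    # auto-recovery brought the server back
```

While zone A is down, every request in `during_outage` goes to `zone-b-1`. Once the server is marked healthy again it rejoins the rotation, which is why stateless servers spread over multiple zones survive the loss of any single one.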

Page 163: Agility Requires Safety

Stateful servers: multiple instances, multiple zones

Page 164: Agility Requires Safety

Replication to one or more standby servers

Page 165: Agility Requires Safety

Load balancer switches to standby server in case of outage

Page 166: Agility Requires Safety

Auto-recovery mechanism brings server back after outage

Page 167: Agility Requires Safety

Test your recovery process regularly.

Page 168: Agility Requires Safety

Outline

1. Brakes
2. Bulkheads
3. Autopilot
4. Safety catch
5. Speedometer
6. Warning lights
7. Seat belt

Page 169: Agility Requires Safety

Speed is limited by safety

Page 170: Agility Requires Safety

Two cars can drive at 80 mph in opposite directions safely…

Page 171: Agility Requires Safety

Because of two yellow lines

Page 172: Agility Requires Safety

It’s worth the time to put these safety mechanisms in place

Page 173: Agility Requires Safety
Page 174: Agility Requires Safety

For more info, see

Hello, Startup

hello-startup.net

Page 175: Agility Requires Safety

Questions?