
Test Driven Infrastructure

Jess
My pronouns are he/him/his; I identify as 'Ops'.

As Ops I make users happy by keeping the service up.

As Ops I
• make sure the site is available
• ensure resource efficiency
• protect uptime
• troubleshoot issues
• keep an eye on service latency

by
• validating responses
• monitoring the network
• providing canary environments
• measuring resource consumption
• practicing chaos engineering
• studying post-mortems
• aggregating logs

user pages only display partial content -- what's wrong?

(diagram: investigating each dependency in turn with curl, curl, curl, mysql, aws s3 ls)

Make a controlled change (with curl):
1) Observe the current state
2) Ask yourself how your change will manifest
3) Make your change
4) Validate change; observe new state
5) Did your change impact all of, and only, what you wanted to change?

OpsDev
(photo by: Marcel Quinan)

What we've learned
• software development lifecycle
  • central, controlled environment for repeatable infrastructure builds
• immutable infrastructure
  • treat your running environment as a black box: easier to version, easier to replace
  • a unit of composition that's easier to reason about because its composition is known and certain
• version control
  • time travel!
  • a shared repository allows everyone to see (and declare) what the state of the environment should be
  • can compare intended configuration to the running environment to find discrepancies
• infrastructure-as-code
  • repeatable infrastructure

Test Driven Development

Write a failing test → Write enough code to make it pass → Refactor

import unittest
from mycode import *

class MyFirstTests(unittest.TestCase):
    def test_hello(self):
        self.assertEqual(hello_world(), 'hello world')

def hello_world():
    pass

def hello_world():
    return 'hello world'

def hello_world(lang='en'):
    if (lang == 'en'):
        return 'hello world'

Write a failing test → Write enough code to make it pass → Refactor

import unittest
from mycode import *

class MyFirstTests(unittest.TestCase):
    def test_hello_en(self):
        self.assertEqual(hello_world(), 'hello world')

    def test_hello_es(self):
        self.assertEqual(hello_world('es'), 'hola mundo')

def hello_world(lang='en'):
    if (lang == 'en'):
        return 'hello world'

def hello_world(lang='en'):
    if (lang == 'en'):
        return 'hello world'
    if (lang == 'es'):
        return 'hola mundo'

Write a failing test → Write enough code to make it pass → Refactor

Make a controlled change (with curl):
1) Observe the current state
2) Ask yourself how your change will manifest
3) Make your change
4) Observe new state
5) Did your change impact all of, and only, what you wanted to change?

Write a failing test → Write enough code to make it pass → Refactor

$ curl -sko /dev/null -w @status_format \
    https://example.com/path/to/microservice
{'http_code': '404', 'time_total': '0.005953'}

$ curl -sko /dev/null -w @status_format \
    https://example.com/path/to/microservice
{'http_code': '200', 'time_total': '0.030772'}

$ curl -sko /dev/null -w @status_format \
    https://example.com/path/to/microservice
{'http_code': '200', 'time_total': '0.012402'}
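The status_format file passed to -w isn't shown in the deck; as a rough equivalent only, the same two-field check can be sketched in Python with the requests library (the endpoint URL and the 200/latency expectation come from the slides, everything else is an illustrative assumption):

# Rough Python equivalent of the curl check above.
import requests

resp = requests.get('https://example.com/path/to/microservice',
                    verify=False, timeout=5)   # verify=False mirrors curl -k
print({'http_code': str(resp.status_code),
       'time_total': '%.6f' % resp.elapsed.total_seconds()})
assert resp.status_code == 200, 'microservice did not return 200'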

rspec
# https://puppet.com/blog/unit-testing-rspec-puppet-for-beginners
require 'spec_helper'

describe 'nginx' do
  let(:title) { 'nginx' }
  let(:node) { 'example.com' }

  it { is_expected.to contain_package('nginx').with(ensure: 'present') }

  it {
    is_expected.to contain_file('/var/www/index.html').with(
      :ensure  => 'file',
      :require => 'Package[nginx]',
    )
  }

  it {
    is_expected.to contain_service('nginx').with(
      :ensure  => 'running',
      :enabled => true,
    )
  }
end

inspec
# https://github.com/inspec/inspec/blob/master/examples/kitchen-chef/test/integration/default/web_spec.rb
describe package('nginx') do
  it { should be_installed }
end

# extend tests with metadata
control '01' do
  impact 0.7
  title 'Verify nginx service'
  desc 'Ensures nginx service is up and running'
  describe service('nginx') do
    it { should be_enabled }
    it { should be_installed }
    it { should be_running }
  end
end

# implement os dependent tests
web_user = 'www-data'
web_user = 'nginx' if os[:family] == 'centos'
describe user(web_user) do
  it { should exist }
end

goss
# https://github.com/aelsabbahy/goss
# `goss validate` for a one-time check
# `goss serve` for a local http endpoint
port:
  tcp:22:
    listening: true
    ip:
    - 0.0.0.0
service:
  sshd:
    enabled: true
    running: true
process:
  sshd:
    running: true

We have tools to unit test configuration management

kubernetes liveness probes
# https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
---
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

kubernetes readiness probes
# https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
---
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 5
      periodSeconds: 5
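For context, both httpGet probes above only need /healthz to answer with a 2xx while the container is healthy. A minimal sketch of such a handler using Python's standard library (hypothetical; the k8s.gcr.io/liveness image in the examples is the Kubernetes docs' own test server, not this code):

# Minimal /healthz endpoint for an HTTP liveness/readiness probe to hit.
# Port 8080 matches the probe definitions above; the health logic is a stub.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/healthz':
            # A real service would check its own dependencies here
            # (database reachable, caches warm, etc.) before answering 200.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'ok')
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == '__main__':
    HTTPServer(('0.0.0.0', 8080), HealthHandler).serve_forever()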

kubernetes liveness probes w/ goss
---
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: goss_liveness
    image: goss_liveness
    livenessProbe:
      exec:
        command: ["goss", "validate", "-g", "goss.yaml"]
      initialDelaySeconds: 3
      periodSeconds: 3

We have tools for functional testing

Smoke Test Deployments
$ cat Jenkinsfile
...
stage("Validate") {
  steps {
    container('inspec') {
      // Initiate validation tests
      script {
        env.TARGET = "app-${params.CLUSTER}.example"
        sh """
          inspec exec validate/endpoints.rb
        """
      }
    }
  }
}
...

Inspec HTTP Resource
tests = {
  'test site response' => {
    'host' => 'validate.example.com',
    'path' => '/'
  },
}
nginx_proxy = ENV['TARGET'] || 'localhost'  # point at deployed location

tests.each do |testname, testdata|
  control testname.to_s do
    impact 1.0
    host = testdata['host']
    path = testdata['path']
    title "curl -k https://#{nginx_proxy}#{path} -H 'host: #{host}'"
    desc "#{path} with #{host} should work."
    describe http("http://#{nginx_proxy}#{path}",
                  headers: { 'host' => host, 'User-Agent' => "jenkins" }) do
      its('status') { should cmp 200 }
    end
  end
end

We have tools for functional testing

user pages only display partial content -- what's wrong?

(diagram: investigating each dependency in turn with curl, curl, curl, mysql, aws s3 ls)

Write a failing test → Write enough code to make it pass → Refactor

{ "query": "avg(last_5m):avg:http.2xx_responses{endpoint:location-service} by {host} / avg(last_5m):avg:http.total_responses{endpoint:location-service} < 0.95", "message": "Healthy response volume dropped\n@slack-demo-monitors-nonprod", "name": "Demo: response volume", "type": "metric alert" }

Infrastructure

$ # https://github.com/DataDog/datadogpy
$ dog monitor show <monitor_id>    # dumps json
$ dog monitor show <monitor_id> > monitor_id.json
$ dog monitor fupdate monitor_id.json
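The dog CLI above ships with the datadogpy library, so the same show/fupdate round-trip can also be scripted. A minimal sketch, assuming API/app keys are configured in the environment and using a made-up monitor id:

# Sketch of the show -> edit -> update loop with datadogpy.
import json
from datadog import initialize, api

initialize()                  # assumes DATADOG_API_KEY / DATADOG_APP_KEY are set
monitor_id = 1234567          # hypothetical id

monitor = api.Monitor.get(monitor_id)      # same JSON `dog monitor show` prints
with open('monitorfile', 'w') as f:
    json.dump(monitor, f, indent=2)

# ...edit monitorfile, commit it, then push it back (like `dog monitor fupdate`):
with open('monitorfile') as f:
    desired = json.load(f)
api.Monitor.update(monitor_id,
                   query=desired['query'],
                   name=desired['name'],
                   message=desired['message'],
                   options=desired.get('options', {}))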

Implementation:

monitorfile
$ ls -1aF
./
../
.git/
.gitignore
Dockerfile
Jenkinsfile
README.md
app/
monitorfile

{ "tags": [ "app:demo", "endpoint:location-service", "environment:non-prod" ], "query": "avg(last_5m):avg:http.2xx_responses{endpoint:location-service} by {host} / avg(last_5m):avg:http.total_responses{endpoint:location-service} < 0.95", "message": "Healthy response volume dropped\n@slack-demo-monitors-nonprod", "name": "Demo: response volume", "type": "metric alert", "options": { "thresholds": { "critical": 0.95, "warning": 0.9 } } }

jenkinsfile
$ ls -1aF
./
../
.git/
.gitignore
Dockerfile
Jenkinsfile
README.md
app/
monitorfile

...
stage("configure monitor") {
  steps {
    container('datadog') {
      script {
        sh """
          dog monitor fupdate monitorfile
        """
      }
    }
  }
}
...

We monitor services to understand their performance.

Picking appropriate monitors that represent that service well creates a Service Level Indicator (SLI).

A Service Level Objective (SLO) simply states the target level we want for that SLI.

Service Level Agreements are published to users; they describe intentions for the service, and recourse for missed service.
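To make the three terms concrete, a small worked example (the request counts are made up; the 0.95 target echoes the monitor query above):

# SLI = what we measured; SLO = the target for it; the gap is the error budget.
good_responses = 9_940    # hypothetical 2xx count in the window
total_responses = 10_000  # hypothetical total count

sli = good_responses / total_responses    # measured: 0.994
slo = 0.95                                # target, as in the monitor query above
error_budget = 1 - slo                    # 5% of requests are allowed to fail
budget_spent = (1 - sli) / error_budget   # 0.12 -> 12% of the budget used

print(f'SLI={sli:.3f}  SLO={slo:.2f}  error budget spent={budget_spent:.0%}')
assert sli >= slo, 'SLO violated'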

SLOs with Datadog
$ dog screenboard show k9m-b2s-df3
{
  "widgets": [
    {
      "title_text": "Demo: order service error rate (non-2xx)",
      "source": "single_monitor",
      "type": "uptime",
      "showErrorBudget": true,
      "sliType": "time",
      "monitorIds": [ 7117003 ],
      "timeframes": [ "7 days" ],
      "rules": {
        "0": {
          "threshold": 98,
          "color": "red",
          "timeframe": "7 days"
        }
      },
      "scaleFactor": 1,
      ...

Automated Visibility

T E L S

New Dev
• Immediately sees application architecture
• Immediately understands critical functionality for microservices
• Immediately starts building a sense of scale/performance/load

Operability should be a design consideration: build testing in from the start.

Questions?

jmales@gmail.com http://x47industries.com
