DevOps in a Regulated and Embedded Environment (AgileDC)


Upload: arjun-comar

Post on 15-Apr-2017


TRANSCRIPT

Page 1: DevOps in a Regulated and Embedded Environment (AgileDC)

© COPYRIGHT 2016 COVEROS, INC. ALL RIGHTS RESERVED.


Agility. Security. Delivered.

DevOps in a Regulated and Embedded Environment

By: Arjun Comar (Was DevOps on a Legacy Project)

twitter: @arjuncomar email: [email protected]

Page 2: DevOps in a Regulated and Embedded Environment (AgileDC)

Agenda
• About Me
• Agile, DevOps, and Medical Devices: What’s the Problem?
• Git Flow in a Regulated World
• Expect to Deploy
• Scaling for Success and Resource Management
• Questions

Page 3: DevOps in a Regulated and Embedded Environment (AgileDC)

About Me
• B.S. in Computer Science from the Rose-Hulman Institute of Technology
• Worked on everything from the Linux kernel to computer vision.
• Interested in software quality and correctness.
• Been with Coveros for ~2.5 years.
• Run the local HaskellDC meetup group.

Page 4: DevOps in a Regulated and Embedded Environment (AgileDC)

About Coveros
• Coveros builds security-critical applications using agile methods.
• Coveros Services
  • Agile transformations
  • Agile development and testing
  • DevOps and continuous integration
  • Application security analysis
  • Agile & Security training
• Government qualifications
  • DCAA approved rates and accounting
  • TS facility clearance

Areas of Expertise

Page 5: DevOps in a Regulated and Embedded Environment (AgileDC)

Select Clients

Page 6: DevOps in a Regulated and Embedded Environment (AgileDC)

Medical Devices and the Law
• It isn’t sufficient to write the code; release requires regulatory approval.
• Approval is per feature (epic)
  • Contingent on development, testing, risk mitigation, etc.
• We want short-lived branches, but…
• If we don’t get approval for one feature, business still wants to release the others
  • Unmerge all the feature branches that went into an epic?
• Further requirements around documentation, especially:
  • Design
  • Testing
  • Risk Management

Page 7: DevOps in a Regulated and Embedded Environment (AgileDC)

Legacy Problems
• C code, embedded device target
  • Cross compilation: Windows -> QNX
  • Some modules only built on WinXP
  • Manual build, deploy, test process
  • Custom hardware, custom firmware
• Old codebase, not written to be unit tested
• Unit test execution requires target environment
• Rough order of magnitude: 200 kloc codebase
• Hardware platform ~25 years old

Page 8: DevOps in a Regulated and Embedded Environment (AgileDC)

Integration and Deployment
• Manual builds, deploy to unit test?
• Unmaintained deployment scripts
  • Written by a contractor in ksh
  • Last maintainer had already left the company
  • Working deployments flashed unit with USB stick and physical dongle
• Rewrite with Chef? ...Ansible? …Bash?
  • try: sh run over telnet
  • No ruby, python, perl, bash, ssh, dhcp
• Network deployments/updates to a device that goes in a human being…?

Page 9: DevOps in a Regulated and Embedded Environment (AgileDC)

Feedback Cycles
• Deployments took ~30 minutes and required physical interaction throughout the process
• Testing involved long protocols with detailed and very particular steps
  • ~5-6 weeks for the test team, maybe 8 weeks, but at least 3-4.
• Release cycle on the order of years.

Page 10: DevOps in a Regulated and Embedded Environment (AgileDC)

Resource Needs and Team Size
• Business wanted multiple features in development in parallel
• Different tests take different lengths of time to run
  • even when automated
  • seconds -> weeks
• Business needed 4 teams like the one they had
• Continuous integration targets, unit test targets, deployment testing targets, full functional test targets, partially automated test targets
  • Performance, reliability, security, durability, etc.?

Page 11: DevOps in a Regulated and Embedded Environment (AgileDC)

Solutions
One thing at a time...

Page 12: DevOps in a Regulated and Embedded Environment (AgileDC)

Git Flow in a Regulated World

Page 13: DevOps in a Regulated and Embedded Environment (AgileDC)

Git Workflows
• Linux Kernel: benevolent dictator, many trusted lieutenants, an insane number of contributors.
• GitHub: a single maintainer (or a small team of maintainers); contributors submit pull requests
• Corporate git usage: a trusted team of developers co-maintains a shared repository

Page 14: DevOps in a Regulated and Embedded Environment (AgileDC)

Enter: Git Flow

Page 15: DevOps in a Regulated and Embedded Environment (AgileDC)

But I can’t merge back daily...
• No, really. Merging back to develop daily means pulling an epic out requires a virtually impossible unmerge.
• Might be legally required not to go forward with a feature
• Can’t get approval until feature is developed and tested with known risks documented and mitigated
• Business still wants to release what they can

Page 16: DevOps in a Regulated and Embedded Environment (AgileDC)

Can’t not integrate...
• Long-lived lines of development, all separate
• Tested independently prior to release
• Business wants to release, integrate necessary branches and…
• Disaster: merge conflicts, retest everything, unknown interactions everywhere

Page 17: DevOps in a Regulated and Embedded Environment (AgileDC)

Extending the workflow to deal with regulation
• Extend the git flow model
• Keep epic-specific code in ‘develop/epic-name’ branches
• Use ‘feature/epic-name/feature-name’ branches for daily work
  • Merge these back daily!
• Epic branches get merged back for a release
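
The naming scheme is regular enough to check mechanically. A minimal sketch in Python, assuming branch names are spelled exactly as on the slide; the function `merge_target` and the example epic and feature names are hypothetical and not part of git or of any git flow tooling:

```python
def merge_target(branch: str) -> str:
    """Return the branch that `branch` merges back into under the extended model."""
    parts = branch.split("/")
    if parts[0] == "feature" and len(parts) == 3:
        # Daily work merges back into its own epic branch, every day.
        return f"develop/{parts[1]}"
    if parts[0] == "develop" and len(parts) == 2:
        # An epic branch only goes back to develop when the epic is released.
        return "develop"
    raise ValueError(f"not a feature or epic branch: {branch!r}")
```

For example, `merge_target("feature/epic-a/fix-alarm")` gives `develop/epic-a`, so the daily merge never touches develop directly.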

Page 18: DevOps in a Regulated and Embedded Environment (AgileDC)

Integrating Continuously
• Use tooling to manage the problem for you
• Have Jenkins (or your CI stack of choice) do builds by merging develop with the epic branches first
  • develop holds code that will be released; features that conflict must be fixed
• Run the normal deployment and testing cycle on these builds
• Merge conflicts are failed builds

Page 19: DevOps in a Regulated and Embedded Environment (AgileDC)

Integrating even more continuously
• Still need to know if there are potential conflicts between epic branches
  • fail early, fail often, right?
• Take all the epic branches and merge them with develop
• Run a full build/deploy/test cycle on this mess as well.
• Any failures found -> failed build
  • If it doesn’t cleanly merge, we can’t release, right?
• The software should always be ready to release; make it a business decision, not a technical one.
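
One way to picture what the CI server is asked to build: one trial merge per epic, plus one with everything. This is a rough Python sketch under that reading, not the actual Jenkins configuration; `integration_builds` and the branch names in the example are made up for illustration.

```python
def integration_builds(epics, base="develop"):
    """List the merge sets CI should build: one per epic, plus one with all of them."""
    epics = sorted(epics)
    # develop merged with each epic branch on its own
    builds = [(base, epic) for epic in epics]
    if len(epics) > 1:
        # develop merged with every epic branch at once
        builds.append((base, *epics))
    return builds
```

Each tuple is a set of branches to merge in a scratch checkout; a merge conflict or a failing build/deploy/test cycle on any of them counts as a failed build.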

Page 20: DevOps in a Regulated and Embedded Environment (AgileDC)

Digging deeper to unearth conflict
• Better error detection and reporting:
  • If we merge everything together, it looks like the later branches cause conflicts more often
  • Branches that conflict exclude each other
  • Find conflicting pairs and report them both as failed
  • Conflicts may only show up with the interaction of 3+ branches
    • But this gets exponentially hard to detect

Page 21: DevOps in a Regulated and Embedded Environment (AgileDC)

Do what you can
• Merge all possible epic branch pairs together, track+report failures
  • Report these failures once or the team will ignore you...
• Branches that cleanly merge with everything get merged together with develop and built
  • This assesses the health of the software as it exists at this moment
  • This might be expensive, so do it overnight.
• Shortcuts:
  • If ‘A’ merges with ‘B’, then ‘B’ merges with ‘A’
  • ‘A’ always merges with ‘A’
  • (You only need the top half of the n x n matrix, i.e. n(n-1)/2 pairs)
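
The pairwise check and its shortcuts can be sketched in a few lines of Python. This is an illustration under assumptions, not the tooling used on the project: `pairwise_report` is a hypothetical name, and `merges_cleanly(a, b)` stands in for whatever trial merge the CI server performs in a scratch checkout.

```python
from itertools import combinations

def pairwise_report(epics, merges_cleanly):
    """Try every unordered pair of epic branches exactly once.

    Returns (conflicting_pairs, clean), where `clean` lists the branches
    that merged with every other branch.
    """
    epics = sorted(epics)
    # combinations() walks only the top half of the matrix:
    # never (A, A), and never (B, A) once (A, B) has been tried.
    conflicts = [(a, b) for a, b in combinations(epics, 2)
                 if not merges_cleanly(a, b)]
    # Both branches of a conflicting pair are reported as failed.
    tainted = {branch for pair in conflicts for branch in pair}
    clean = [e for e in epics if e not in tainted]
    return conflicts, clean
```

The `clean` list is what gets merged together with develop for the nightly build. Conflicts that only appear among three or more branches are not detected by this check.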

Page 22: DevOps in a Regulated and Embedded Environment (AgileDC)

This is a lot of work...
• Long-lived branches are hard to deal with.
• You could even go further and build the sets of conflicting branches that can be merged together
  • This is really hard; it’s easier to ask the team to fix the mess.
• If you don’t have to do it, don’t.
  • You probably don’t unless regulatory constraints make you.

Page 23: DevOps in a Regulated and Embedded Environment (AgileDC)

Expect to Deploy
What a lifesaver

Page 24: DevOps in a Regulated and Embedded Environment (AgileDC)

Expect?
• Tcl-based scripting tool used to automate interactive programs
  • ...like telnet and ftp
• Was used to automate testing way back in the day
• Turns out to be rather perfect for scripting deployments, testing, etc. in this tool-restricted environment
  • sh, ksh, telnet, ftp
  • not: bash, python, ruby, ssh, perl, etc.

Page 25: DevOps in a Regulated and Embedded Environment (AgileDC)

Wait, why not use ...
• Yes, we could have tried to beat that wall down
• Lots of effort/expertise to produce a working build of python for the target environment
• QNX support would probably have been willing to help
• But loading new software onto the target environment to increase its capabilities is fundamentally risky
  • Business was understandably risk averse
• Rather limited DevOps team at this point: me, myself, and I.

Page 26: DevOps in a Regulated and Embedded Environment (AgileDC)

A little expect script

    $ cat login.expect
    #!/usr/bin/expect

    # Each expect below waits at most 20 seconds for its pattern
    set timeout 20
    # Target address and credentials come from the command line
    set addr [lindex $argv 0]
    set user [lindex $argv 1]
    set pass [lindex $argv 2]

    spawn telnet $addr
    expect "login:"
    send "$user\r"
    expect "Password:"
    send "$pass\r"
    # Wait for the shell prompt, then hand the session over to the user
    expect "#"
    interact

Page 27: DevOps in a Regulated and Embedded Environment (AgileDC)

Adding a little abstraction

    proc login { addr user pass } {
        # spawn sets spawn_id in the current scope; make it global so the
        # caller can keep talking to the session after login returns
        global spawn_id
        spawn telnet $addr
        expect {
            timeout { send_user "Could not connect\n"; exit 1 }
            eof { send_user "Connection refused\n"; exit 1 }
            "login:"
        }
        send "$user\r"
        expect "Password:"
        send "$pass\r"
        expect {
            timeout { send_user "Failed to login.\n"; exit 1 }
            "#"
        }
    }

Page 28: DevOps in a Regulated and Embedded Environment (AgileDC)

Separation of Concerns
• It only takes minor modifications to use the same logic to connect to ftp
• Use ftp to upload deployment archive, install sh script
• Use telnet to set permissions and execute install script on archive
• Deployment logic is now separate from connecting, setup, etc.
  • “talking to the target” vs “doing stuff on the target”
• This is exactly the separation chef/puppet/ansible provide
  • (They also provide a whole lot of other value, but it’s nice to recover any of it!)
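
The split between “talking to the target” and “doing stuff on the target” can be illustrated outside Expect. Below is a small Python sketch, assuming a transport object with `upload` and `run` methods; these names, the file names and the `/tmp` directory are placeholders, and the real scripts were written in Expect over ftp and telnet.

```python
class RecordingTransport:
    """Stand-in for the ftp/telnet layer; it only records what it was asked to do."""

    def __init__(self):
        self.log = []

    def upload(self, local_path, remote_path):
        self.log.append(("upload", local_path, remote_path))

    def run(self, command):
        self.log.append(("run", command))


def deploy(transport, archive, installer, remote_dir="/tmp"):
    """Deployment logic: knows the steps, not how the bytes reach the target."""
    # Upload the deployment archive and the install script
    transport.upload(archive, f"{remote_dir}/{archive}")
    transport.upload(installer, f"{remote_dir}/{installer}")
    # Set permissions, then execute the install script on the archive
    transport.run(f"chmod +x {remote_dir}/{installer}")
    transport.run(f"{remote_dir}/{installer} {remote_dir}/{archive}")
```

Because `deploy` only sees the transport's two methods, the same steps can be exercised against a recording fake in tests and against a real connection on the bench.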

Page 29: DevOps in a Regulated and Embedded Environment (AgileDC)

Towards a deployment framework
• How many environments like this are out there?
  • limited tooling, embedded platform, etc.
• If there are a lot… we have the start of a deployment framework to target these environments
• Dependencies are very minimal, can be used to target virtually anything
• With work, we could get something idempotent with clean modularity and composability.
• A whole lot of work… Is there a market that needs this?

Page 30: DevOps in a Regulated and Embedded Environment (AgileDC)

Scaling for Success and Resource Management

Page 31: DevOps in a Regulated and Embedded Environment (AgileDC)

Resource Needs
• Embedded device with potential hardware attachments for particular tests -- virtualization is out.
• Unit tests need to run in the target environment, so one target is needed at a minimum just for rapid-feedback CI.
• Basic integration testing (i.e. devint env) takes ~1 min to ~10 mins
• Fully automated functional testing (i.e. test env) takes ~10 mins to 1+ hours to run
• Partially automated tests require interaction, need another target.
• Longer term testing (e.g. stress, durability, performance) takes weeks and needs its own target.
• ~5 targets minimum to support development for basic CI/CD

Page 32: DevOps in a Regulated and Embedded Environment (AgileDC)

Tackling Resource Allocation
• If a new build kicks off and reaches deployment testing while the previous round of smoke testing is still ongoing, what happens?
• Probably: target gets bricked as OS-level code is updated while the machine is in use.
• Even if the pipeline is built carefully so these things can’t happen, there’s always PEBKAC
• Deployment and testing tools need to be smart enough to check if a console is available before attempting to use it
• We need a resource allocator...

Page 33: DevOps in a Regulated and Embedded Environment (AgileDC)

Making a first pass
• Track the target state on the target
• Use an old Unix trick -- drop a lock file in a well-known spot, and make tools attempt to acquire the lock before using the target
• Pros: Extremely simple to implement and use; it’s a really simple pair of shell scripts.
• Cons: If the lockfile isn’t cleaned up, the target is unavailable; if the tool (user) doesn’t check for the lock, they could still cause problems. It’s hard to track which targets are in use where; there’s no centralized management.
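
The lock-file trick depends on check-and-create being a single atomic step. The version in the talk was a pair of shell scripts on the target; the sketch below shows the same idea in Python against a local filesystem. `acquire_lock`, `release_lock` and the owner string are illustrative names, not the project's scripts.

```python
import os

def acquire_lock(lock_path, owner):
    """Atomically create the lock file; return False if someone else holds it."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so checking
        # for the lock and taking it happen in one step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(owner)  # record who holds the target, for whoever finds a stale lock
    return True

def release_lock(lock_path):
    """Remove the lock file; a missing file is treated as already released."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

Both cons from the slide still apply: a crashed job leaves the file behind, and nothing stops a tool that never calls `acquire_lock`.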

Page 34: DevOps in a Regulated and Embedded Environment (AgileDC)

Aside: Jenkins Pipeline
• Specifying the pipeline in groovy instead of shell/jenkins xml prevented a lot of bugs.
• acquireLock and releaseLock have simple contracts and provide strong guarantees with try/finally idiom.
• This is tricky/hard to achieve with traditional jenkins.

    def locking(target, action) {
        // Acquire outside the try: if acquisition fails, finally must not
        // release a lock this build never held
        acquireLock(target)
        try {
            action()
        } finally {
            releaseLock(target)
        }
    }

    downloadTests(latest)
    locking(targetAddr) {
        deploy(targetAddr)
        runTests(targetAddr, myBuild, testTags)
    }

Page 35: DevOps in a Regulated and Embedded Environment (AgileDC)

Multiple teams, multiple workstreams
• Goal is to reduce cycle time. If one team has to wait for another team’s build to finish before getting feedback, we’re wasting time.
• Key takeaway: we can’t effectively share environments between parallel streams of development.
• Business wanted ~4 streams of work progressing in parallel.
• Team needs to be able to support old releases via hotfixes (~2 old, previous release, current stream of development).
  • Hardware/firmware platform changes between releases
• Test automation team needs an environment to test their tests.
• DevOps team needs to be able to test pipeline changes.
• ~40 target machines to effectively support CI/CD pipeline.

Page 36: DevOps in a Regulated and Embedded Environment (AgileDC)

That’s a lot of equipment...
• Where do you put it all?
  • Shelving/rackspace, cooling, switches, networking…
• Units are expensive; if they aren’t in use/needed, business is going to get annoyed.
• Hard to track utilization, load, etc. from a really decentralized place.
• We might also be able to save money / use fewer targets if we’re more intelligent about allocating them; i.e. allocate on demand.
• Centralization also means we can start hitting nice-to-haves:
  • console access from the web browser for debugging
  • status/health check daemon reporting to the manager

Page 37: DevOps in a Regulated and Embedded Environment (AgileDC)

Centralized Resource Management
• Pool available targets, expose REST API to acquire a target for use, release a target, check a target, etc.
• Track target status, usage metrics, target requester statistics in backend database.
• Set up a simple frontend to display statistics about usage, provide a manual form to acquire a target for manual/ad-hoc testing, etc.
• Like a library; acquire target for duration, get grumpy emails if it’s not returned in time.
• Can be easily expanded to provide additional services over time.
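
The library analogy maps onto a small data model. Here is an in-memory Python sketch of the acquire/release/overdue logic only; the REST API, database and frontend from the slide are left out, and `TargetPool` and its method names are invented for this example.

```python
import time

class TargetPool:
    """In-memory model of the allocator; a real service would add an API and storage."""

    def __init__(self, targets, clock=time.time):
        self._free = list(targets)
        self._loans = {}      # target -> (requester, due time)
        self._clock = clock   # injectable so tests can control time

    def acquire(self, requester, duration_s):
        """Lend out a free target for duration_s seconds, or return None if none is free."""
        if not self._free:
            return None
        target = self._free.pop(0)
        self._loans[target] = (requester, self._clock() + duration_s)
        return target

    def release(self, target):
        """Return a target to the pool; releasing a target not on loan is a no-op."""
        if self._loans.pop(target, None) is not None:
            self._free.append(target)

    def overdue(self):
        """Loans past their due time: the requesters who get the grumpy emails."""
        now = self._clock()
        return sorted((t, who) for t, (who, due) in self._loans.items() if due < now)
```

Keeping the loan table in one place is what makes usage statistics and on-demand allocation possible, which the per-target lock files could not offer.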

Page 38: DevOps in a Regulated and Embedded Environment (AgileDC)

Lightning Quick Recap
• Integrate continuously to keep software testable, increase quality, and build confidence.
• Prioritize the delivery of working software.
• Fail early, fail often.
• Make your tools serve your needs.
• Set yourself up for success -- plan ahead to cover scaling needs.

Page 39: DevOps in a Regulated and Embedded Environment (AgileDC)

That was fast...
• There’s a lot more I’d love to talk about.
• Please feel free to ask me questions during the break or afterwards.
• Thanks for your time!