Gluecon 2013 - Dark Architecture and How to Forklift Upgrade Your System - Dyn Inc.

Post on 18-Nov-2014

DESCRIPTION

Dyn's CTO Cory von Wallenstein walks through how to evolve a system architecture for scale, performance, and looser coupling using a Dark Architecture approach, without putting the business at risk and while keeping tech team morale high.

TRANSCRIPT

Dark Architecture & How to Forklift Upgrade Your Infrastructure with Zero Downtime

Cory von Wallenstein

Chief Technology Officer, Dyn Inc.

@cvonwallenstein

@cvonwallenstein from @DynInc at #gluecon

But First, Who Is Dyn?

• Internet Infrastructure as a Service
– Managed DNS and Email Delivery
• 230 Global Employees (we bootstrapped to 170)
• Headquarters in Manchester, NH (offices in SFO & UK too)
• Raised first financing in Oct 2012: $38MM from NorthBridge

Problem We Are Trying To Solve

Inputs -> Black Magic (Your Current System Architecture) -> Outputs

Inputs -> Different Black Magic (Your New System Architecture) -> Outputs

• Scale: x10, x10^2, etc.
• Performance: (t2 - t0) <= (t1 - t0)
• Coupling: Tight -> Loose

Pragmatic Engineering over Unicorn Marketing

Why Things Get This Way

• Time to market reigns supreme
– MVP was very… minimum… on… everything
– Sooner is better than perfect
• Prototype to production to scale without architectural rigor
– Skillset for system engineering in high demand
• Seen more often in small teams who find product market fit faster than expected
– Inexperience, but we’ve all been there

Dark Architecture

• A way of thinking about, and technical approach to, solving the scale/performance/coupling problem while enabling the business to succeed and keeping (some of) your hair
• We stand on the shoulders of giants
– Fowler, Amazon, Netflix, etc.

High Level of Dark Architecture

• Legacy approach: Flag Day Upgrade/Deploy
– Scope out 3 month upgrade to swap architecture A to B, turns into 6 months, don’t get to anything else, cross fingers on flag day, fight fires where broken, gain weight, lose hair, girlfriend breaks up with you, team quits, FML…
• Evolved approach: Fowler’s Blue/Green Deploy
– Two copies of system, load balancing to rapidly deploy new system version, rapidly fail back to legacy on failure (only one active at a time)
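The blue/green idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Dyn's implementation or Fowler's reference design: the `BlueGreenRouter` class and the `legacy_system` / `new_system` stand-ins are hypothetical names, and a real deployment would switch at the load balancer rather than inside one process.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class BlueGreenRouter:
    """Routes every request to exactly one active copy of the system."""

    def __init__(self, systems: Dict[str, Handler], active: str) -> None:
        if active not in systems:
            raise KeyError(active)
        self._systems = systems
        self._active = active
        self._previous = active

    @property
    def active(self) -> str:
        return self._active

    def switch_to(self, name: str) -> None:
        """Point all traffic at another copy, remembering the old one."""
        if name not in self._systems:
            raise KeyError(name)
        self._previous, self._active = self._active, name

    def fail_back(self) -> None:
        """Return traffic to the copy that was active before the last switch."""
        self._previous, self._active = self._active, self._previous

    def handle(self, request: str) -> str:
        # Only the active copy ever sees the request.
        return self._systems[self._active](request)


def legacy_system(request: str) -> str:  # hypothetical stand-in
    return f"legacy:{request}"


def new_system(request: str) -> str:  # hypothetical stand-in
    return f"new:{request}"


router = BlueGreenRouter({"blue": legacy_system, "green": new_system}, active="blue")
```

Calling `router.switch_to("green")` moves all traffic to the new copy at once, and `router.fail_back()` returns it to the legacy copy. The point to notice is that the inactive copy never sees production input, which is the gap the dark architecture approach addresses.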

High Level of Dark Architecture

• Dark Architecture Approach
– Two copies of system, both active
– Send inputs for a workflow to both, compare outputs and throw one away (the system whose output you throw away is the “dark architecture”)
– Log and inspect output differences
– Gain confidence in new system when differences go away
– Swap which output you throw away (effectively bringing the “dark” architecture “light”)
– Achieve equilibrium on which workflows get processed by which system so your business has flexibility
– High five everyone, onward and upward.
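The core of the approach (send one input to both systems, keep one output, log any difference) can be sketched as follows. This is a hedged illustration under stated assumptions: `run_with_dark_copy` and the two counting functions are hypothetical names invented here, and the dark call runs inline for simplicity, whereas a production setup would more likely run it asynchronously so it cannot add latency.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("dark_architecture")

System = Callable[[Any], Any]


def run_with_dark_copy(workflow: str, payload: Any,
                       light: System, dark: System) -> Any:
    """Send one input to both systems and return only the light output."""
    light_output = light(payload)
    try:
        dark_output = dark(payload)
    except Exception:
        # A failure in the dark system must never affect the caller.
        logger.exception("dark system failed for workflow %s", workflow)
        return light_output
    if dark_output != light_output:
        # Differences are logged for later inspection, not raised.
        logger.warning("output mismatch for %s: light=%r dark=%r",
                       workflow, light_output, dark_output)
    return light_output  # the dark output is thrown away


def legacy_count(lines):  # hypothetical legacy workflow
    return len(lines)


def new_count(lines):  # hypothetical rewrite that skips blank lines
    return sum(1 for line in lines if line)


result = run_with_dark_copy("count_lines", ["a", ""], legacy_count, new_count)
```

Here `result` is the legacy answer; the disagreement caused by the blank line only shows up in the log, which is exactly the signal the team inspects before bringing the dark system light.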

Tangible Examples

• Scaling Global DNS Stats beyond 17 POPs
– MySQL to Cassandra, Log file rsync to agg counts

Tangible Examples

• Scaling Email Delivery beyond 1 billion/month
– Cron to daemon (2011), Perl to Node.js (now)

Dark Architecture Manifesto

1. Clear definition of success over ambiguity
– Likely scale/performance measured, may get blank stares on coupling
2. Continuously deliver value over months of no visible progress
3. Confidence in functional equivalence over scope creep
4. ^5’s over finger pointing
5. Plan for failure over cross fingers

Dark Architecture Manifesto

6. Customer impact over elegant system diagrams
7. System flows over system components
8. Operational confidence and familiarity over trial by fire
9. Having a ten item list over a nine item list
10. Architecture evolution over architecture revolution

Scope and Priority

• Prioritize a backlog of input/output workflows by amount of pain
– Don’t think on a system component level
• “swap MySQL for Cassandra”
– Think on a system workflow level
• “retrieve query logs and render *.example.com graphs”
– This exercise will force you to hone scope to exactly where the pain is so you can focus on delivering the solution to this pain first and save others for later.

Legacy Approach

Legacy Approach: Week 0

Input -> Legacy System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed

Legacy Approach: Week 1

Input -> Legacy System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed
• New System: 0% of functionality enabled, 0% of functionality consumed

Legacy Approach: Week 4

Input -> Legacy System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed
• New System: 25% of functionality enabled, 0% of functionality consumed

Most people start with easy pieces under a misguided “crawl walk run” philosophy. Quick wins on easy stuff while saving hard problems for later rarely ends well.

Legacy Approach: Week 8

Input -> Legacy System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed
• New System: 35% of functionality enabled, 0% of functionality consumed

Progress slows as harder problems encountered

Legacy Approach: Week 12

Input -> Legacy System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed
• New System: 80% of functionality enabled, 0% of functionality consumed

80% of projects spend 80% of their calendar time at 80% perceived completion. I’m 80% sure.

Legacy Approach: Week 24

Input -> Legacy System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed
• New System: 100% of functionality enabled, 0% of functionality consumed

Other fires came up, things took longer than expected, you know… business. Morale has never been lower.

Legacy Approach: Flag Day!

Input -> New System -> Output

• Legacy System: 100% of functionality enabled, 0% of functionality consumed
• New System: 100% of functionality enabled, 100% of functionality consumed

Dark Architecture Approach

Dark Architecture Approach: Week 0

Input -> Legacy System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed

Dark Architecture Approach: Week 1

Input -> Legacy System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed
• New System: 0% of functionality enabled, 0% of functionality consumed

Dark Architecture Approach: Week 2

Input -> Legacy System -> Output
Input -> New System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed
• New System: 0% of functionality enabled, 0% of functionality consumed

No functionality yet, just dark architecture framework for two inputs and two outputs (throwing one output away)

Dark Architecture Approach: Week 3

Input -> Legacy System -> Output
Input -> New System -> Output

• Legacy System: 100% of functionality enabled, 100% of functionality consumed
• New System: 2% of functionality enabled, 2% of functionality consumed (dark)

Throw one away, but log and inspect differences!
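One way to turn "log and inspect differences" into a decision is to tally comparisons per workflow. The sketch below is illustrative only: `DifferenceLog`, `Tally`, and the thresholds in `ready_to_switch` (1000 samples, zero mismatches) are assumptions made for this example, not figures from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, DefaultDict


@dataclass
class Tally:
    compared: int = 0
    mismatched: int = 0

    @property
    def mismatch_rate(self) -> float:
        return self.mismatched / self.compared if self.compared else 0.0


class DifferenceLog:
    """Counts how often the legacy and new outputs agree, per workflow."""

    def __init__(self) -> None:
        self._tallies: DefaultDict[str, Tally] = defaultdict(Tally)

    def record(self, workflow: str, legacy_output: Any, new_output: Any) -> bool:
        """Record one comparison and return True when the outputs matched."""
        tally = self._tallies[workflow]
        tally.compared += 1
        matched = legacy_output == new_output
        if not matched:
            tally.mismatched += 1
        return matched

    def tally(self, workflow: str) -> Tally:
        return self._tallies[workflow]

    def ready_to_switch(self, workflow: str, min_samples: int = 1000,
                        max_mismatch_rate: float = 0.0) -> bool:
        """Hypothetical readiness rule: enough samples and few enough mismatches."""
        tally = self._tallies[workflow]
        return (tally.compared >= min_samples
                and tally.mismatch_rate <= max_mismatch_rate)


log = DifferenceLog()
for _ in range(3):
    log.record("render_graphs", legacy_output=10, new_output=10)
```

With three matching comparisons recorded, `log.ready_to_switch("render_graphs", min_samples=3)` is true, while the default threshold of 1000 samples is not yet met. The right thresholds depend on the workflow and are a judgment call for the team.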

Dark Architecture Approach: Week 4

Input -> Legacy System -> Output
Input -> New System -> Output

• Legacy System: 100% of functionality enabled, 98% of functionality consumed
• New System: 2% of functionality enabled, 2% of functionality consumed

Gain confidence operating with two equal outputs, switch which one is thrown away for that workflow. Goes horribly wrong? Switch back.
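The per-workflow switch described on this slide can be sketched as a small routing table. This is a minimal sketch under assumptions: `WorkflowSwitchboard`, its method names, and the two stand-in systems are hypothetical, and real systems would hold this state in shared configuration rather than in process memory.

```python
from typing import Any, Callable, Dict

System = Callable[[str, Any], Any]


class WorkflowSwitchboard:
    """Tracks, per workflow, which system's output is kept (is "light")."""

    def __init__(self, legacy: System, new: System) -> None:
        self._legacy = legacy
        self._new = new
        # Only workflows sent to both systems appear here.
        self._light: Dict[str, str] = {}

    def mirror(self, workflow: str) -> None:
        """Start sending this workflow to both systems; legacy stays light."""
        self._light.setdefault(workflow, "legacy")

    def bring_light(self, workflow: str) -> None:
        """Keep the new system's output and throw the legacy output away."""
        if workflow not in self._light:
            raise KeyError(f"{workflow} is not mirrored yet")
        self._light[workflow] = "new"

    def switch_back(self, workflow: str) -> None:
        """Goes horribly wrong? Keep the legacy output again."""
        if workflow not in self._light:
            raise KeyError(f"{workflow} is not mirrored yet")
        self._light[workflow] = "legacy"

    def handle(self, workflow: str, payload: Any) -> Any:
        if workflow not in self._light:
            # Not yet implemented in the new system: legacy only.
            return self._legacy(workflow, payload)
        outputs = {
            "legacy": self._legacy(workflow, payload),
            "new": self._new(workflow, payload),
        }
        return outputs[self._light[workflow]]


def legacy_system(workflow: str, payload: Any) -> Any:  # hypothetical stand-in
    return ("legacy", workflow, payload)


def new_system(workflow: str, payload: Any) -> Any:  # hypothetical stand-in
    return ("new", workflow, payload)


board = WorkflowSwitchboard(legacy_system, new_system)
board.mirror("render_graphs")
```

After `board.bring_light("render_graphs")` that one workflow is served by the new system while every other workflow still comes from legacy, and `board.switch_back("render_graphs")` reverses the decision without a deploy. Both systems keep receiving the input either way, so the comparison data keeps flowing.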

Dark Architecture Approach: Week 12

Input -> Legacy System -> Output
Input -> New System -> Output

• Legacy System: 100% of functionality enabled, 80% of functionality consumed
• New System: 20% of functionality enabled, 20% of functionality consumed

Where do we stand at expected 3 months? Most painful 20% of problems resolved… now we have flexibility for what to do next.

Customer impact over elegant system diagrams

• Your customers are not paying you to have pretty whiteboards of elegant system architectures

• Your customers are paying you to make their pain go away. This gets priority.

• It’s OK to have different workflows handled by different systems to give your team agility
– Other priorities came up? System is stable.
– Have technical debt time? Continue arch migration

Parting Takeaways

• Manifesto is a preference, not a rule
• Think in flows not components
• Deliver most painful pieces first so when priorities change, you’re not left half complete.
• Process success >>> process name
• Be realistic. DA provides flexibility and frequent victories for morale and some value delivered sooner, but it won’t necessarily make a full architecture migration faster in calendar days.

Cory von Wallenstein

@cvonwallenstein

Questions?
