Dependable Cloud Architecture - SWOCC Edition


Upload: michael-wood

Post on 13-Nov-2014


TRANSCRIPT

Page 1: Dependable Cloud Architecture - SWOCC Edition

Image: xkcd.com

Dependable Cloud Architecture

http://mvwood.com

@mikewo on Twitter


Page 2: Dependable Cloud Architecture - SWOCC Edition

“Failure is always an option.”

Image: Discovery Channel, Fair Use

Page 3: Dependable Cloud Architecture - SWOCC Edition

What are we looking for?

Protection From:
• Hardware Failure
• Data Corruption
• Network Failure
• Loss of Facilities

Check out: http://bit.ly/wazbizcont

Images: Office ClipArt & Godzilla Releasing Corp (Fair Use)

Page 4: Dependable Cloud Architecture - SWOCC Edition

Image: FOX, Fair Use

Human Error

Page 5: Dependable Cloud Architecture - SWOCC Edition

What we’re trying to achieve

1. Monitoring
2. Resilient Solutions

Image: Cohdra

Page 6: Dependable Cloud Architecture - SWOCC Edition

Image: Office ClipArt

Cost vs Risk

99.999% $1, … ,000.00

To get more 9’s here add more 0’s here.
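To put a number on what each extra 9 buys, here is a minimal sketch that converts an availability percentage into a yearly downtime budget. The class and method names are hypothetical, and a 365-day year is assumed.

```csharp
using System;

public static class Availability
{
    // Minutes in a 365-day year (leap years ignored for simplicity).
    private const double MinutesPerYear = 365 * 24 * 60;

    // Returns the downtime budget, in minutes per year, for an
    // availability percentage such as 99.999.
    public static double DowntimeMinutesPerYear(double availabilityPercent)
    {
        if (availabilityPercent < 0 || availabilityPercent > 100)
            throw new ArgumentOutOfRangeException(nameof(availabilityPercent));

        return (100.0 - availabilityPercent) / 100.0 * MinutesPerYear;
    }
}
```

Under these assumptions, `Availability.DowntimeMinutesPerYear(99.999)` comes to roughly 5.26 minutes per year, while 99.9 allows roughly 525.6 minutes (about 8.8 hours).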

Page 7: Dependable Cloud Architecture - SWOCC Edition

Image: NASA

Monitoring

Page 8: Dependable Cloud Architecture - SWOCC Edition

Functional Transparency

Image: Office ClipArt

Logging Messages

Hardware Health

Dependent Services Health

Page 9: Dependable Cloud Architecture - SWOCC Edition

Telemetry

Page 10: Dependable Cloud Architecture - SWOCC Edition

Image: NASA

Analyze your Data

Page 11: Dependable Cloud Architecture - SWOCC Edition

Resilience

Image: Office ClipArt

Page 12: Dependable Cloud Architecture - SWOCC Edition

Remember: Failure is always an option.

Common Points of Failure
• Machine/application crashes
• Throttling (exceeding capacity)
• Connectivity/Network
• External service dependencies

Focus less on the uptime of hardware and more on how the solution handles it WHEN something fails!

Page 13: Dependable Cloud Architecture - SWOCC Edition

Try/catch != Resilient

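// Requires: using System; using System.Diagnostics; using System.IO;
private void createFile()
{
    string fileName = @"c:\workingDirectory\someFileName.txt";
    try
    {
        // Dispose the FileStream that File.Create returns so the handle is released.
        using (File.Create(fileName)) { }
    }
    catch (DirectoryNotFoundException ex)
    {
        // Logging and rethrowing records the failure but does not recover from it.
        Trace.WriteLine(String.Format("Unable to create {0}. {1}", fileName, ex));
        throw;
    }
}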
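For contrast, a minimal sketch of what handling the failure could look like. It assumes the right recovery is to create the missing directory and try once more; the class and method names are hypothetical, and other failure modes (permissions, full disk) are left to surface as before.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ResilientFileWriter
{
    // Creates an empty file at fileName. If the parent directory is missing,
    // creates it and tries once more instead of only logging the failure.
    public static void CreateFile(string fileName)
    {
        try
        {
            using (File.Create(fileName)) { }
        }
        catch (DirectoryNotFoundException ex)
        {
            Trace.WriteLine(String.Format(
                "Directory missing for {0}, creating it. {1}", fileName, ex.Message));

            var directory = Path.GetDirectoryName(fileName);
            if (string.IsNullOrEmpty(directory))
                throw; // no directory part to create, so nothing to recover

            Directory.CreateDirectory(directory);
            using (File.Create(fileName)) { } // second and final attempt
        }
    }
}
```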

Page 14: Dependable Cloud Architecture - SWOCC Edition

Image: Michael Wood

Decompose your system…

Page 15: Dependable Cloud Architecture - SWOCC Edition

Capacity Buffering

Content Delivery Networks (CDNs)

Distributed Application Cache

Local Content Cache

Enables recovery during outages or spikes in load
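One way a local content cache can help during an outage is to serve the last good copy when the origin is unavailable. The sketch below is illustrative only: `CachedContentSource` and its delegate parameter are hypothetical names, and cache expiry and size limits are left out.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: wraps a content source with a local cache that is
// refreshed on every successful fetch and served when the source fails.
public class CachedContentSource
{
    private readonly Func<string, string> fetchFromOrigin;
    private readonly ConcurrentDictionary<string, string> localCache =
        new ConcurrentDictionary<string, string>();

    public CachedContentSource(Func<string, string> fetchFromOrigin)
    {
        this.fetchFromOrigin = fetchFromOrigin
            ?? throw new ArgumentNullException(nameof(fetchFromOrigin));
    }

    public string Get(string key)
    {
        try
        {
            string fresh = fetchFromOrigin(key);
            localCache[key] = fresh; // remember the last good copy
            return fresh;
        }
        catch (Exception)
        {
            // Origin is down or overloaded: serve the last good copy if there is one.
            if (localCache.TryGetValue(key, out string cached))
                return cached;

            throw; // nothing cached, so the failure has to surface
        }
    }
}
```

The trade-off is that callers may see stale content during an outage, which is often preferable to seeing an error.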

Image: jepler

Page 16: Dependable Cloud Architecture - SWOCC Edition

Always carry a spare

Node 1: 75% Capacity, half of our load
Node 2: 75% Capacity, half of our load

Total: 100% of load, 150% Capacity

50% more capacity than needed
• Can absorb temporary spikes
• Time to react if we need to add capacity

SYSTEM FAILURE!!! One node at 0% Capacity, redirect all load to the other

Over allocated, but still functioning
• Degrade, but don’t fail
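The arithmetic behind this slide can be worked out as follows. This reads "75% Capacity" as each of two nodes being able to carry 75% of the total load, which is an interpretation; the slide does not spell it out.

```latex
u_{\text{normal}} = \frac{50\%}{75\%} \approx 67\%
\qquad
u_{\text{one node down}} = \frac{100\%}{75\%} \approx 133\%
```

On that reading, each node normally runs at about two thirds of its capacity, and a lone survivor is oversubscribed by about a third, so it has to shed or slow some work (degrade) rather than fall over.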

Image: Kevin Rosseel

Page 17: Dependable Cloud Architecture - SWOCC Edition

Request Buffering

Image: Joe Shlabotnik

Queues
Retry Policies
Async Workloads
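As an illustration of a retry policy, here is a minimal sketch using exponential backoff. The `Retry` class and its parameters are hypothetical, not taken from any particular library, and the sketch treats every exception as transient, which a real policy would usually narrow down.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs operation, making at most maxAttempts attempts in total.
    // The wait doubles after each failed attempt (exponential backoff).
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan initialDelay)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Assumed transient: wait, then try again with a longer delay.
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

On the final attempt the exception filter no longer matches, so the original exception propagates to the caller. A call might look like `Retry.Execute(() => CallService(), 5, TimeSpan.FromMilliseconds(200))`, where `CallService` stands in for whatever operation is being protected.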

Page 18: Dependable Cloud Architecture - SWOCC Edition

Dept. of Redundancy Dept.

Have a backup, somewhere else
More than one? Cost to benefit ratio?

Ready State
• Hot = full capacity
• Warm = scaled down, but ready to grow
• Cold = mothballed, starts from zero

Image: Mr. White

Page 19: Dependable Cloud Architecture - SWOCC Edition

Redundancy - It’s about probability

95% uptime 95% uptime 95% uptime 95% uptime

1 box : 5% downtime or 438hrs per year (that’s 18 ¼ days!)

2 boxes : 5/100 * 5/100 = 25/10,000 = 0.25% downtime or 22hrs per year

4 boxes : 5/100 * 5/100 * 5/100 * 5/100 = 625/100,000,000 = 0.000625% downtime or 3.285 MINUTES per year
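These figures rest on an assumption the slide leaves implicit: the boxes fail independently, so the service is down only when every box is down at once. With a per-box downtime of d = 0.05 and 8,760 hours in a year, the arithmetic works out as:

```latex
\begin{aligned}
D_n &= d^{\,n}, \qquad d = 0.05 \\
D_1 &= 0.05 &&\Rightarrow\ 0.05 \times 8760\ \text{h} = 438\ \text{h} \approx 18.25\ \text{days} \\
D_2 &= 0.0025 &&\Rightarrow\ 0.0025 \times 8760\ \text{h} = 21.9\ \text{h} \\
D_4 &= 6.25 \times 10^{-6} &&\Rightarrow\ 6.25 \times 10^{-6} \times 8760\ \text{h} \approx 0.0548\ \text{h} \approx 3.285\ \text{min}
\end{aligned}
```

Failures that share a cause (same rack, same power feed, same bad deployment) break the independence assumption, so real-world gains are likely to be smaller than these numbers suggest.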

Page 20: Dependable Cloud Architecture - SWOCC Edition

Total Outage duration =
Time to Detect
+ Time to Diagnose
+ Time to Decide
+ Time to Act

Image: Office ClipArt

Page 21: Dependable Cloud Architecture - SWOCC Edition

Dynamic Addressing & Configuration

Page 22: Dependable Cloud Architecture - SWOCC Edition

What about your data?

Image: barrymieny

Page 23: Dependable Cloud Architecture - SWOCC Edition

Availability via Degradation

Image: Michael Wood

Page 24: Dependable Cloud Architecture - SWOCC Edition

Images: Gizmodo

Virtualization and Automation

Page 25: Dependable Cloud Architecture - SWOCC Edition

Images: Orion Pictures owns Terminator Franchise

Page 26: Dependable Cloud Architecture - SWOCC Edition

The “HI” Point

Check out: http://bit.ly/wazinternals

Images: Office Clip Art

Page 27: Dependable Cloud Architecture - SWOCC Edition

Image: NASA

Page 28: Dependable Cloud Architecture - SWOCC Edition

“Don't be too proud of this technological terror you've constructed…”

ADMIT:
• Your Solution WILL fail at some point
• You can learn from others just as well as from yourself

DO:
• Root cause analysis
• Read others’ root cause analyses
• Plan for failure

DON’T:
• Get cocky
• Stick your head in the sand

Images: LucasFilm, Fair Use

Page 29: Dependable Cloud Architecture - SWOCC Edition

@mikewo


http://mvwood.com

http://bit.ly/CloudFailSafe

Questions?