architecting for failure for... · 2014-01-06 · architecting for failure. outsource...
TRANSCRIPT
Outsource Infrastructure?
Traditional Web Application
- Web Site
- Virtual Machine / Directly on Hardware
- 100 MB Relational Database
- Inbound Transactions
- Output Transactions
- File System
Hosting Provider Costs
| Provider | Monthly Cost ($) |
| --- | --- |
| Host Gator | 9.95 |
| Go Daddy | 10 |
| ORCS Web | 69 |
| Amazon | 83+ BYOS |
| Windows Azure | 97 |
Note: traditional hosting, no custom colocation, virtualized data centers.
Cloud is Not Cheaper for Hosting
Perhaps, Higher Availability?
SLA is Not Radically Different
| Provider | Compute SLA (%) |
| --- | --- |
| Go Daddy | 99.9 |
| ORCS Web | 99.9 |
| Host Gator | 99.9 |
| Amazon | 99.95 |
| Azure | 99.95 |
The difference between 99.9% and 99.95% is about 43 seconds a day, or roughly 4.4 hours a year.
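The downtime each SLA figure permits can be worked out directly from the percentages in the table. A minimal Python sketch, assuming a 24-hour day and a 365-day year; the function name `allowed_downtime` is illustrative and not from the talk:

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24


def allowed_downtime(sla_percent):
    """Return (seconds per day, hours per year) of downtime an SLA permits."""
    unavailable = 1 - sla_percent / 100
    return unavailable * SECONDS_PER_DAY, unavailable * HOURS_PER_YEAR


# The two SLA levels that appear in the table.
for sla in (99.9, 99.95):
    per_day, per_year = allowed_downtime(sla)
    print(f"{sla}%: {per_day:.1f} s/day, {per_year:.2f} h/year")
```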
Higher Rate Since You Pay for Flexibility
Hosting is Not Cloud Computing
Why Utility Computing?
- Scalability: do not have to pay for peak scenarios.
- Availability: can approach 100% if you want to pay.
Architecturally, they are the same problem
You must design to accommodate missing computing resources.
Designing for Failure is Cloud Computing
What’s wrong with this Code Fragment?
ClientProxy client = new ClientProxy();
Response response = client.Do(request);
Never assume that any interface between two components always succeeds.
So You Put in a Catch Handler
try
{
    ClientProxy client = new ClientProxy();
    int result = client.Do(a, b, c);
}
catch (Exception ex)
{
}
What if…
- a timeout, how many retries?
- the result is a complete failure?
- the underlying hardware crashed?
- you need to save the user’s data?
- you are in the middle of a transaction?
What Do You Put in the Catch Handler?
try
{
    ClientProxy client = new ClientProxy();
    int result = client.Do(a, b, c);
}
catch (Exception ex)
{
    ????
}
You can’t program yourself out of a failure.
Failure is a first-class design citizen.
The critical issue is how to respond to failure. The underlying infrastructure
cannot guarantee availability.
Principle #1
Consequences of Failure
- Multiple tiers and dependencies
- If your order queue fails, no orders
- If your customer service fails, no membership information
The more dependencies, the more consequences of a poorly handled failure.
Dependencies include your code, third parties, the Internet/Web, anything you
do not control
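One way to see why each added dependency matters: if a request needs every dependency to be up, and failures are assumed to be independent (a simplifying assumption, not a claim from the talk), the combined availability is the product of the individual availabilities. A small Python sketch with illustrative numbers:

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a request path that needs every dependency to be up.

    Assumes the dependencies fail independently of one another.
    """
    return prod(availabilities)


# Hypothetical example: five dependencies, each available 99.9% of the time.
combined = serial_availability([0.999] * 5)
print(f"{combined:.4%}")
```

Under that assumption the combined figure is always lower than the weakest single dependency, so every extra dependency lowers the ceiling on what the application can promise.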
Unhandled failures propagate (like cracks) through your application.
Failures Cascade – an unhandled failure in one part of the system becomes a
failure of your application.
Principle #2
Two Types of Failure
- Transient Failure
- Resource Failure
Typical Response to a Transient Failure
- Retry
- How Often?
- How Long Before You Give Up?
Delays Cascade Just Like Failures
- Delays occur while you are waiting or retrying.
- Delays hog resources like threads, TCP/IP ports, database connections, memory.
- Since delays are usually the result of resource bottlenecks, waiting or retrying for long periods adds to the bottleneck.
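The resource cost of a delay can be estimated with Little's law: the average number of requests in flight equals the arrival rate multiplied by the average time each request takes. A Python sketch with made-up numbers, assuming one thread (or connection) is held per in-flight request:

```python
def requests_in_flight(arrival_rate_per_s, latency_s):
    """Little's law: average concurrency = arrival rate * average latency."""
    return arrival_rate_per_s * latency_s


# Hypothetical service receiving 200 requests per second.
normal = requests_in_flight(200, 0.05)   # dependency answers in 50 ms
degraded = requests_in_flight(200, 5.0)  # callers wait 5 s on a slow dependency

print(normal, degraded)
```

With these assumed figures the same traffic ties up a hundred times as many threads once the dependency slows down, which is how a delay in one place turns into exhaustion somewhere else.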
Transient failures become resource failures
Transient Failures
- Retry for a short time, then give up (like a circuit breaker) if unsuccessful.
- Never block on I/O; time out and assume failure.
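A minimal Python sketch of "retry briefly, then give up like a circuit breaker". The class and function names (`CircuitBreaker`, `call_with_retries`) and the thresholds are illustrative assumptions, not taken from the talk, and `operation` is assumed to enforce its own I/O timeout:

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are refused because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, operation):
        if self.opened_at is not None:
            # While the circuit is open, fail immediately instead of waiting.
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("dependency marked unavailable")
            # Reset period elapsed: allow one trial call; one failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_retries(breaker, operation, attempts=3, delay_s=0.1, sleep=time.sleep):
    """Retry a few times with a short pause, then let the failure surface."""
    for attempt in range(attempts):
        try:
            return breaker.call(operation)
        except CircuitOpenError:
            raise  # do not retry against an open circuit
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(delay_s)
```

Once the breaker opens, later callers get an immediate `CircuitOpenError` rather than holding a thread while they wait, so the caller can move straight to its resource-failure handling.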
There is no such thing as a transient failure. Fail fast and treat it as a resource
failure.
Principle #3
Make Components Failure Resistant
Must Provide Failure Isolation
Make Components Failure Resistant
- Design For Beyond Largest Expected Load
  - Understand latency of adding a new resource
  - User load, virtual memory, CPU size, bandwidth, database
- Handle all Errors
  - Failure affects more people than on the desktop.
Provide Failure Isolation
- Catch all exceptions
- Log all errors
- Return Succeed / Fail to External Services
- Have Failure Strategy For Dependent Services
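The four points above can be sketched as a boundary wrapper: it catches everything, logs it, and hands the caller an explicit succeed/fail result, with an optional fallback standing in for the failure strategy. This is a hedged Python illustration; `Outcome` and `isolated` are hypothetical names, not an API from the talk:

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger("component")


@dataclass
class Outcome:
    """Explicit succeed/fail result handed back to the caller."""
    succeeded: bool
    value: Any = None
    error: Optional[str] = None


def isolated(operation: Callable[[], Any],
             fallback: Optional[Callable[[], Any]] = None) -> Outcome:
    """Run operation at a component boundary so no exception escapes it."""
    try:
        return Outcome(True, operation())
    except Exception as exc:  # boundary handler: catching everything is deliberate
        logger.exception("operation failed")
        if fallback is not None:
            try:
                # Fallback succeeded; keep the original error for diagnostics.
                return Outcome(True, fallback(), error=str(exc))
            except Exception:
                logger.exception("fallback failed")
        return Outcome(False, error=str(exc))
```

A caller then branches on `outcome.succeeded` instead of relying on an exception crossing the component boundary.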
Define your own SLA
Stress test components and system
A chain is as strong as its weakest link
Use a Margin of Safety when designing the resources used.
Principle #4
What is the cost of availability?
Any component or instance can fail – eliminate single points of failure.
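As a rough illustration of what removing a single point of failure buys: if any one of several interchangeable instances can serve a request, and instances are assumed to fail independently (shared infrastructure can break that assumption), availability rises with each extra copy. A Python sketch with an assumed 99% per-instance availability:

```python
def redundant_availability(single, copies):
    """Availability when any one of `copies` interchangeable instances suffices.

    Assumes instances fail independently of one another.
    """
    return 1 - (1 - single) ** copies


# Illustrative figure: each instance is up 99% of the time.
for n in (1, 2, 3):
    print(n, f"{redundant_availability(0.99, n):.6f}")
```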
Search for Dependencies
- Hardware / Virtual Machines
- Third Party Libraries
- Internet/Web
- Interfaces to your own components
- TCP/IP ports
- DNS Servers
- Message Queues
- Database Drivers
- Credit Card Processors, Geocoding services, etc.
Examine Queries
- Only three types of result sets:
  - Zero, One, Many (can become large overnight)
- Search Providers limit results returned
- Remember those 5 way joins your ORM uses
- Objects on a DCOM or RMI call
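One defensive habit that follows from the "Many" case is to cap every result set, the way search providers do. A minimal Python sketch using the standard-library `sqlite3` module; the `orders` table and the `fetch_bounded` helper are hypothetical and exist only for the demonstration:

```python
import sqlite3


def fetch_bounded(conn, sql, params=(), max_rows=100):
    """Run a query but never return more than max_rows rows.

    Returns (rows, truncated) so the caller knows the result was cut off.
    """
    cursor = conn.execute(sql, params)
    rows = cursor.fetchmany(max_rows + 1)  # one extra row reveals truncation
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated


# Hypothetical table with more rows than a caller should receive at once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders (customer) VALUES (?)",
    [("c%d" % i,) for i in range(250)],
)

rows, truncated = fetch_bounded(
    conn, "SELECT id, customer FROM orders ORDER BY id", max_rows=100
)
print(len(rows), truncated)
```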
Eliminate single points of failure. Accept the fact that you must build a distributed
application.
Principle #5
You need redundancy...
but you have to manage state.
Solutions such as database mirroring may have unacceptable latencies, such as
over geography.
Reduce the parts of your application that handle state to a minimum.
Loss of a stateful component usually means loss of user data.
State Handling Components
- Does the UI layer need session state?
- Business Logic, Domain Layer should be stateless
- Use queues where they make sense to hold data
- Design services for minimal dependencies
  - Pay with a customer number
  - Keep state with the message
- Don’t forget infrastructure logs, configuration files
  - State is in specialized stores
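"Keep state with the message" can be illustrated with a handler that holds nothing between calls, so any instance can process any message. A small Python sketch; the message fields and function names are assumptions made for the example:

```python
import json


def make_order_message(customer_number, items):
    """Build a self-contained message: everything a handler needs travels with it."""
    return json.dumps({"customer_number": customer_number, "items": items})


def handle_order(message):
    """Stateless handler: no session or instance fields; output depends only on input."""
    order = json.loads(message)
    total = sum(item["price"] * item["quantity"] for item in order["items"])
    return {"customer_number": order["customer_number"], "total": total}


# Hypothetical order identified only by a customer number.
message = make_order_message(
    "C-1001",
    [{"price": 5, "quantity": 2}, {"price": 3, "quantity": 1}],
)
print(handle_order(message))
```

Because the handler keeps no state of its own, losing the instance that was processing a message loses nothing that is not still in the queue.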
Build atomic services.
Atomic means unified, not small.
Decouple the services.
Stateless components allow for scalability and redundancy.
What about the data tier?
- Can you relax consistency constraints?
- What is acceptable data loss?
What is the cost of an apology?
How important is the relational model?
Design for Eventual Consistency
Consider CQRS
Monitor your components.
Understand why they fail.
Reroute traffic to existing instances or another data center or geographic area?
Add more instances?
Caching or throttling can help your application run under failure.
Poorer performance may be acceptable.
Automate… Automate… Automate
Degrade gracefully and predictably. Know what you can live without.
Principle #6
Cloud Outages Happen
Some Are Normal
Some Are Black Swans
Humans Reason About Probabilities Poorly
Assume the Rare Will Occur - It Will Occur
Principle #7
Case Study: Amazon Four Day Outage
Facts
- April 21, 2011
- One Day of Stabilization, Three Days of Recovery
- Problems: EC2, EBS, Relational Database Service
- Affected: Quora, Hootsuite, Foursquare, Reddit
- Unaffected: Netflix, Twilio
Why were Netflix and Twilio Unaffected?
They Designed For Failure
Netflix Explicitly Architected For Failure
Although there were more errors and higher latency, there was no increase in customer service calls and no inability to find or start movies.
Key Architectural Decisions
- Stateless Services
- Data stored across isolation zones
  - Could switch to hot standby
- Had Excess Capacity (N + 1)
  - Handle large spikes or transient failures
- Used relational databases only where needed.
  - Could partition data
- Degraded Gracefully
Degraded Gracefully
- Fail Fast, Aggressive Timeouts
- Can degrade to lower quality service
  - no personalized movie list, still can get list of available movies
- Non Critical Features can be removed.
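The movie-list example above can be sketched as a call with an aggressive timeout and a cheaper fallback. This is a hedged Python illustration, not Netflix's implementation; `with_fallback`, `personalized_list` and `catalog_list` are hypothetical names, and the sleep merely simulates a slow service:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def with_fallback(primary, fallback, timeout_s=0.2):
    """Try the full-quality call briefly; on timeout or error, degrade.

    Returns (result, degraded). An abandoned primary call still occupies a
    worker until it returns, so real clients should also time out their I/O.
    """
    future = _pool.submit(primary)
    try:
        return future.result(timeout=timeout_s), False
    except FutureTimeout:
        future.cancel()  # has no effect if the call is already running
        return fallback(), True
    except Exception:
        return fallback(), True


def personalized_list():
    time.sleep(1.0)  # stands in for a slow or failing recommendation service
    return ["picked for you"]


def catalog_list():
    return ["all available titles"]


movies, degraded = with_fallback(personalized_list, catalog_list, timeout_s=0.1)
print(movies, degraded)
```

The `degraded` flag lets the caller record that a lower quality response was served, which feeds the monitoring the talk asks for.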
Chaos Monkey
Some Problems
- Had to manually reroute traffic; use more automation in the future for failover and recovery
- Round robin load balancer can overload decreased number of instances.
  - May have to change auto scaling algorithm and internal load balancing.
- Expand to Geographic Regions
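The round robin point can be made concrete with a little arithmetic: when instances disappear and the same traffic is spread evenly over the survivors, each survivor's load rises by the ratio of original to remaining instances. A Python sketch with an invented fleet size and utilization:

```python
def utilization_after_loss(instances, utilization, lost):
    """Per-instance utilization once `lost` instances disappear.

    Assumes total load stays constant and is spread evenly (round robin).
    """
    remaining = instances - lost
    if remaining <= 0:
        raise ValueError("no instances left to serve traffic")
    return utilization * instances / remaining


# Hypothetical fleet: 4 instances, each 80% busy, then one fails.
print(utilization_after_loss(4, 0.80, 1))
```

With these illustrative numbers each of the three survivors is asked for more than 100% of its capacity, which is why spare capacity (N + 1) and load-aware balancing matter.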
Summary
Hosting in a cloud computing environment is valid.
Cloud Computing means designing for failure.