resilient architecture

RESILIENT ARCHITECTURE Matt Stine @mstine http://www.mattstine.com


Post on 12-Apr-2017


TRANSCRIPT

Page 1: Resilient Architecture

RESILIENT ARCHITECTURE
Matt Stine @mstine

http://www.mattstine.com

Page 2: Resilient Architecture

HEADLINES

Page 3: Resilient Architecture

A SYSTEM FAILURE COSTS A WELL-KNOWN RETAILER SIGNIFICANT REVENUE ON THE BIGGEST INTERNET SHOPPING DAY OF THE YEAR.

Page 4: Resilient Architecture

A SYSTEM FAILURE CAUSES THE CANCELLATION OF HUNDREDS OF FLIGHTS, STRANDING THOUSANDS OF AIRLINE PASSENGERS, AND ULTIMATELY COSTING THE AIRLINE MILLIONS IN REVENUE.

Page 5: Resilient Architecture

A BEAUTIFULLY DESIGNED ONLINE STORE CRUMBLES UNDER THE PRESSURE OF A THUNDERING HERD OF CUSTOMERS TRYING TO PURCHASE THE LATEST TECH GADGET.

Page 6: Resilient Architecture

A SECURITY BREACH EXPOSES THOUSANDS OF CUSTOMER CREDIT CARD NUMBERS, LEADING TO MILLIONS IN LOST REVENUE DUE TO THE RESULTING LOSS OF TRUST.

Page 7: Resilient Architecture

WHAT CAN WE DO?

Page 8: Resilient Architecture

DISRUPTIVE COMPANIES ARE ALSO APPROACHING RESILIENCY DIFFERENTLY.

Page 9: Resilient Architecture

STOP TRYING TO PREVENT MISTAKES.

Page 10: Resilient Architecture

EMBRACE FAILURE.

Page 11: Resilient Architecture

FROM MTBF TO MTTR
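One way to read this slide is through the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below uses hypothetical figures (1000 hours between failures, 1 hour to recover) to show that halving recovery time yields the same availability as doubling the time between failures. The class and method names are illustrative only.

```java
final class AvailabilitySketch {

    /** Steady-state availability: the fraction of time the system is up. */
    static double availability(double mtbfHours, double mttrHours) {
        return mtbfHours / (mtbfHours + mttrHours);
    }

    public static void main(String[] args) {
        // Hypothetical figures, chosen only to illustrate the formula.
        double baseline = availability(1000.0, 1.0);
        double doubledMtbf = availability(2000.0, 1.0);
        double halvedMttr = availability(1000.0, 0.5);
        System.out.printf("baseline=%.6f doubledMtbf=%.6f halvedMttr=%.6f%n",
                baseline, doubledMtbf, halvedMttr);
    }
}
```

Recovery time is often the cheaper lever to pull, which is one argument for focusing on MTTR.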

Page 12: Resilient Architecture

WE NEED BETTER TOOLS AND TECHNIQUES.

Page 13: Resilient Architecture

RESILIENT ARCHITECTURES
Enhance Observability
Leverage Resiliency Patterns
Embrace Chaos

Page 14: Resilient Architecture

ENHANCE OBSERVABILITY

Page 15: Resilient Architecture

SEE FAILURE WHEN IT HAPPENS

Page 16: Resilient Architecture

MEASURE EVERYTHING

Page 17: Resilient Architecture

WHAT IS NORMAL?
Values
Rates of Change
Mean? P95/99/99.9?
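The mean-versus-percentile question can be made concrete with a small sketch. It assumes a hypothetical sample of 100 request latencies in which 98 take 10 ms and 2 take 1000 ms, and uses the nearest-rank percentile definition. The names `LatencyStats`, `mean` and `percentile` are illustrative.

```java
import java.util.Arrays;

final class LatencyStats {

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical data: 98 fast requests and 2 slow ones.
        long[] latenciesMs = new long[100];
        Arrays.fill(latenciesMs, 10L);
        latenciesMs[0] = 1000L;
        latenciesMs[1] = 1000L;
        System.out.println("mean=" + mean(latenciesMs)
                + " p50=" + percentile(latenciesMs, 50)
                + " p99=" + percentile(latenciesMs, 99));
    }
}
```

For this sample the mean sits near 30 ms and the median at 10 ms, while P99 is 1000 ms. Only the high percentile shows what the slowest requests experience.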

Page 18: Resilient Architecture

WHAT IS NORMAL?

http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/

Page 19: Resilient Architecture
Page 20: Resilient Architecture
Page 21: Resilient Architecture

SPRING BOOT HEALTH ENDPOINT

    {
      "diskSpace": {
        "status": "UP",
        "total": 1056858112,
        "free": 878850048,
        "threshold": 10485760
      },
      "refreshScope": {
        "status": "UP"
      },
      "configServer": {
        "status": "UP",
        "propertySources": [
          "configClient",
          "https://github.com/spring-cloud-services-samples/fortune-teller/configuration/application.yml"
        ]
      },
      "hystrix": {
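The `diskSpace` entry suggests a simple rule: report UP while free space stays at or above a threshold. The sketch below is a plain-Java illustration of that rule under that assumption. It is not Spring Boot's implementation, and the class and method names are hypothetical.

```java
import java.io.File;

final class DiskSpaceCheck {

    /** Assumed rule: UP while free space is at or above the threshold, DOWN otherwise. */
    static String status(long freeBytes, long thresholdBytes) {
        return freeBytes >= thresholdBytes ? "UP" : "DOWN";
    }

    /** Applies the rule to the usable space of the partition that holds the given path. */
    static String check(File path, long thresholdBytes) {
        return status(path.getUsableSpace(), thresholdBytes);
    }
}
```

With the figures from the slide, `DiskSpaceCheck.status(878850048L, 10485760L)` evaluates to "UP".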

Page 22: Resilient Architecture

SPRING BOOT INFO ENDPOINT

    "git": {
      "build": {
        "host": "Matts-MacBook-Pro.local",
        "version": "0.0.1-SNAPSHOT",
        "time": 1489021333000,
        "user": {
          "name": "Matt Stine",
          "email": "[email protected]"
        }
      },
      "branch": "master",
      "commit": {
        "message": {
          "short": "initial commit",
          "full": "initial commit"
        },
        "id": "9b624974e417693cf921b9abc50b5af4ea0b6dde",
        "id.describe-short": "9b62497-dirty",
        "id.abbrev": "9b62497",
        "id.describe": "9b62497-dirty",

Page 23: Resilient Architecture

DISTRIBUTED TRACING

Zipkin
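Tracing across services depends on each hop passing trace context along. Below is a minimal sketch of that propagation using the B3 header names associated with Zipkin (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`). The class and its methods are hypothetical and stand in for what a tracing library would do automatically.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class B3Propagation {

    /** 64-bit identifier rendered as 16 lower-case hex characters. */
    private static String newId(Random random) {
        return String.format("%016x", random.nextLong());
    }

    /** Headers for the first span of a new trace. */
    static Map<String, String> rootSpan(Random random) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", newId(random));
        headers.put("X-B3-SpanId", newId(random));
        return headers;
    }

    /** Headers for a downstream call: same trace, new span, parent set to the caller's span. */
    static Map<String, String> childSpan(Map<String, String> parent, Random random) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", parent.get("X-B3-TraceId"));
        headers.put("X-B3-ParentSpanId", parent.get("X-B3-SpanId"));
        headers.put("X-B3-SpanId", newId(random));
        return headers;
    }
}
```

Because every hop keeps the trace ID and records its parent span, a collector can reassemble the call tree for a single request.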

Page 24: Resilient Architecture

EXAMPLES:
Spring Boot Actuator: http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#production-ready
PCF Apps Manager: https://docs.pivotal.io/pivotalcf/1-9/console/using-actuators.html
Spring Cloud Sleuth: https://cloud.spring.io/spring-cloud-sleuth/
Zipkin: http://zipkin.io/

Page 25: Resilient Architecture

LEVERAGE RESILIENCY PATTERNS

Page 26: Resilient Architecture

TIMEOUTS

Page 27: Resilient Architecture

TIMEOUTS
Thinking is half the battle!
Anything that blocks threads
Any method call with an optional timeout argument
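For blocking calls that offer no timeout argument of their own, one option is to run the call on another thread and bound the wait with `Future.get(timeout, unit)`. A minimal sketch follows; `TimeoutSketch` and `callWithTimeout` are illustrative names, and creating an executor per call is a simplification that production code would likely replace with a shared pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {

    /** Runs the task on a worker thread and gives up after timeoutMillis, returning the fallback. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the blocked worker
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself failed
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The caller's thread is released after the timeout even if the underlying call never returns.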

Page 28: Resilient Architecture

ADDING TIMEOUTS TO RESTTEMPLATE

    @Bean
    public RestTemplate restTemplate() {
        SimpleClientHttpRequestFactory clientHttpRequestFactory = new SimpleClientHttpRequestFactory();
        clientHttpRequestFactory.setConnectTimeout(10 * 1000); // Ten seconds!
        clientHttpRequestFactory.setReadTimeout(10 * 1000); // Ten seconds!
        return new RestTemplate(clientHttpRequestFactory);
    }

Page 29: Resilient Architecture

RETRIES

Page 30: Resilient Architecture

RETRIES
Potentially transient failures
Immediately
With a backoff
Maximum times
Log all the things
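The bullets above can be sketched without any framework: try the operation up to a maximum number of times, log every failure, and fall back to a recovery value once attempts run out. This version retries immediately, with no backoff. `RetrySketch` and `retry` are hypothetical names.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

final class RetrySketch {

    private static final Logger logger = Logger.getLogger(RetrySketch.class.getName());

    /** Calls the operation up to maxAttempts times, then returns the recovery value. */
    static <T> T retry(int maxAttempts, Supplier<T> operation, Supplier<T> recovery) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                // Log every failed attempt so that retries stay visible.
                logger.warning("Attempt " + attempt + " of " + maxAttempts + " failed: " + e);
            }
        }
        return recovery.get();
    }
}
```

Retrying only makes sense for failures that may be transient, so a fuller version would likely filter on exception type.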

Page 31: Resilient Architecture

SIMPLE RETRY

    @RequestMapping("/acquireThings")
    @Retryable
    public ResponseEntity<String> tryToAcquireThings() {
        logger.info("Attempting to acquire things...");
        String things = restTemplate
                .getForObject("http://localhost:8081/things", String.class);
        return new ResponseEntity<String>(things, HttpStatus.OK);
    }

    @Recover
    public ResponseEntity<String> recover() {
        logger.warn("Returning default response...");
        return new ResponseEntity<String>("default things", HttpStatus.OK);
    }

Page 32: Resilient Architecture

RETRY WITH BACKOFF

    @RequestMapping("/acquireThings")
    @Retryable(maxAttempts = 5, backoff = @Backoff(delay = 100L, maxDelay = 1000L, multiplier = 2, random = true))
    public ResponseEntity<String> tryToAcquireThings() {
        logger.info("Attempting to acquire things...");
        String things = restTemplate
                .getForObject("http://localhost:8081/things", String.class);
        return new ResponseEntity<String>(things, HttpStatus.OK);
    }
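To see what those parameters imply, the sketch below computes a capped exponential schedule: start at the initial delay, multiply after each failed attempt, and never exceed the maximum. It assumes that rule and deliberately leaves out jitter (the `random = true` flag), so treat it as an approximation of the idea, not a description of Spring Retry internals. `BackoffSchedule` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {

    /** Waits between attempts; there is one fewer wait than there are attempts. */
    static List<Long> delays(int maxAttempts, long initialDelayMs, double multiplier, long maxDelayMs) {
        List<Long> delays = new ArrayList<>();
        double next = initialDelayMs;
        for (int retry = 1; retry < maxAttempts; retry++) {
            delays.add(Math.min((long) next, maxDelayMs)); // cap each wait at the maximum
            next *= multiplier;
        }
        return delays;
    }
}
```

With the values from the slide, `BackoffSchedule.delays(5, 100L, 2.0, 1000L)` gives waits of 100, 200, 400 and 800 ms. The 1000 ms cap would only take effect from a sixth attempt onward.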

Page 33: Resilient Architecture

EXPONENTIAL BACKOFF

    @Bean
    public BackOffPolicy backOffPolicy() {
        return new ExponentialBackOffPolicy();
    }

Page 34: Resilient Architecture

BULKHEADS

Page 35: Resilient Architecture

BULKHEADS
Microservices
Thread Pools
Availability Zones
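At the thread level, a bulkhead caps how many callers may occupy one dependency at a time, so a slow dependency cannot absorb every thread in the process. Below is a minimal sketch using a counting semaphore in place of a dedicated thread pool. The `Bulkhead` class is hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class Bulkhead {

    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the task if a permit is free; otherwise rejects immediately with the fallback. */
    <T> T call(Supplier<T> task, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // compartment is full: do not queue, do not block
        }
        try {
            return task.get();
        } finally {
            permits.release();
        }
    }
}
```

Giving each downstream dependency its own `Bulkhead` instance keeps one saturated compartment from flooding the others.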

Page 36: Resilient Architecture

CIRCUIT BREAKERS

Page 37: Resilient Architecture

CIRCUIT BREAKERS

Page 38: Resilient Architecture

SPRING CLOUD HYSTRIX

    @HystrixCommand(fallbackMethod = "fallbackFortune")
    public Fortune randomFortune() {
        return restTemplate.getForObject("http://fortunes/random", Fortune.class);
    }

    private Fortune fallbackFortune() {
        return new Fortune(42L, fortuneProperties.getFallbackFortune());
    }
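The state machine behind a circuit breaker can be sketched in plain Java. This simplified version trips after a number of consecutive failures, fails fast while open, and lets a trial call through once a cool-down has passed. It is not Hystrix's algorithm, which works from rolling error statistics. All names and the injectable clock are assumptions made for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class CircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injectable so the cool-down can be tested
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Current state; an open breaker becomes half-open once the cool-down has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> task, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast without touching the dependency
        }
        try {
            T result = task.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

In real use the clock would be `System::currentTimeMillis`. This sketch also allows more than one concurrent trial call while half-open, which a stricter breaker would prevent.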

Page 39: Resilient Architecture

EXAMPLES:
Spring Retry: https://github.com/spring-projects/spring-retry
Hystrix: https://github.com/Netflix/Hystrix
via Spring Cloud Netflix: https://cloud.spring.io/spring-cloud-netflix/

Page 40: Resilient Architecture

EMBRACE CHAOS

Page 41: Resilient Architecture

HOW DO YOU KNOW YOUR SYSTEM WILL TOLERATE FAILURE IF IT HASN'T FAILED?

Page 42: Resilient Architecture

GAME DAY EXERCISES

Page 43: Resilient Architecture

CAN WE DIAL THAT UP A NOTCH?

Page 44: Resilient Architecture

YAU AND CHEUNG: DESIGN OF SELF-CHECKING SOFTWARE (1975)

Page 45: Resilient Architecture

DID SOMEBODY SAY...

Page 46: Resilient Architecture

EXAMPLES:
Chaos Lemur (BOSH): https://github.com/strepsirrhini-army/chaos-lemur
Chaos Loris (CF): https://github.com/strepsirrhini-army/chaos-loris
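The tools listed here inject failure at the infrastructure level. The same idea can be tried at a much smaller scale inside a test suite by wrapping a dependency so that it fails with a configurable probability. The sketch below is a hypothetical in-process illustration and is not how Chaos Lemur or Chaos Loris work.

```java
import java.util.Random;
import java.util.function.Supplier;

final class ChaosSupplier<T> implements Supplier<T> {

    private final Supplier<T> delegate;
    private final double failureProbability;
    private final Random random;

    ChaosSupplier(Supplier<T> delegate, double failureProbability, Random random) {
        this.delegate = delegate;
        this.failureProbability = failureProbability;
        this.random = random;
    }

    /** Throws with the configured probability; otherwise passes the call through. */
    @Override
    public T get() {
        if (random.nextDouble() < failureProbability) {
            throw new IllegalStateException("chaos: injected failure");
        }
        return delegate.get();
    }
}
```

Wrapping a client in `ChaosSupplier` during a test run is one way to check that timeouts, retries and fallbacks actually engage. Passing a seeded `Random` keeps a failing run reproducible.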

Page 47: Resilient Architecture
Page 48: Resilient Architecture

REVIEW TIME!
Stop trying to prevent mistakes
Focus on MTTR
Enhance observability
Leverage resiliency patterns
Embrace chaos!

Page 49: Resilient Architecture

THANKS!

Matt Stine @mstine
http://www.mattstine.com