resilient architecture
TRANSCRIPT
RESILIENT ARCHITECTURE
Matt Stine @mstine
http://www.mattstine.com
HEADLINES
A SYSTEM FAILURE COSTS A WELL-KNOWN RETAILER SIGNIFICANT REVENUE ON THE BIGGEST INTERNET SHOPPING DAY OF THE YEAR.
A SYSTEM FAILURE CAUSES THE CANCELLATION OF HUNDREDS OF FLIGHTS, STRANDING THOUSANDS OF AIRLINE PASSENGERS, AND ULTIMATELY COSTING THE AIRLINE MILLIONS IN REVENUE.
A BEAUTIFULLY DESIGNED ONLINE STORE CRUMBLES UNDER THE PRESSURE OF A THUNDERING HERD OF CUSTOMERS TRYING TO PURCHASE THE LATEST TECH GADGET.
A SECURITY BREACH EXPOSES THOUSANDS OF CUSTOMER CREDIT CARD NUMBERS, LEADING TO MILLIONS IN LOST REVENUE DUE TO THE RESULTING LOSS OF TRUST.
WHAT CAN WE DO?
DISRUPTIVE COMPANIES ARE ALSO APPROACHING RESILIENCY DIFFERENTLY.
STOP TRYING TO PREVENT MISTAKES.
EMBRACE FAILURE.
FROM MTBF TO MTTR
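One way to see why the focus shifts from MTBF (mean time between failures) to MTTR (mean time to recovery) is the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is only an illustration: the class name and all numbers are made up and do not come from the talk. It compares a system that fails rarely but recovers slowly with one that fails ten times as often but recovers ten times as fast.

```java
// Hypothetical helper for illustration only; not from the talk.
final class Availability {

    // Steady-state availability = MTBF / (MTBF + MTTR), both in the same unit.
    static double availability(double mtbfHours, double mttrHours) {
        if (mtbfHours <= 0 || mttrHours < 0) {
            throw new IllegalArgumentException("need mtbf > 0 and mttr >= 0");
        }
        return mtbfHours / (mtbfHours + mttrHours);
    }

    public static void main(String[] args) {
        // Made-up numbers: rare failures with slow recovery...
        System.out.printf("%.6f%n", availability(1000.0, 1.0));
        // ...versus frequent failures with fast recovery.
        System.out.printf("%.6f%n", availability(100.0, 0.1));
    }
}
```

Both configurations come out at roughly 99.9% availability, which suggests that shortening recovery time can buy as much as preventing failures.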
WE NEED BETTER TOOLS AND TECHNIQUES.
RESILIENT ARCHITECTURES
Enhance Observability
Leverage Resiliency Patterns
Embrace Chaos
ENHANCE OBSERVABILITY
SEE FAILURE WHEN IT HAPPENS
MEASURE EVERYTHING
WHAT IS NORMAL?
Values
Rates of Change
Mean? P95/99/99.9?
WHAT IS NORMAL?
http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
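To illustrate the "Mean? P95/99/99.9?" question, here is a minimal sketch that compares the mean of a set of latency samples with nearest-rank percentiles. The class name, method names, and sample values are all hypothetical, and nearest-rank is just one of several common percentile definitions.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any library named in the talk.
final class LatencyStats {

    // Arithmetic mean of the samples (NaN for an empty array).
    static double mean(long[] samplesMillis) {
        return Arrays.stream(samplesMillis).average().orElse(Double.NaN);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up data: 98 fast requests (10 ms) and 2 very slow ones (2000 ms).
        long[] samples = new long[100];
        Arrays.fill(samples, 10L);
        samples[98] = 2000L;
        samples[99] = 2000L;
        System.out.println("mean = " + mean(samples));
        System.out.println("p95  = " + percentile(samples, 95));
        System.out.println("p99  = " + percentile(samples, 99));
    }
}
```

With this data the mean matches no request anyone actually experienced, P95 still looks healthy, and only P99 exposes the slow tail, which is why a single number rarely answers "what is normal?".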
SPRING BOOT HEALTH ENDPOINT
{
  "diskSpace": {
    "status": "UP",
    "total": 1056858112,
    "free": 878850048,
    "threshold": 10485760
  },
  "refreshScope": {
    "status": "UP"
  },
  "configServer": {
    "status": "UP",
    "propertySources": [
      "configClient",
      "https://github.com/spring-cloud-services-samples/fortune-teller/configuration/application.yml"
    ]
  },
  "hystrix": {
SPRING BOOT INFO ENDPOINT
"git": {
  "build": {
    "host": "Matts-MacBook-Pro.local",
    "version": "0.0.1-SNAPSHOT",
    "time": 1489021333000,
    "user": {
      "name": "Matt Stine",
      "email": "[email protected]"
    }
  },
  "branch": "master",
  "commit": {
    "message": {
      "short": "initial commit",
      "full": "initial commit"
    },
    "id": "9b624974e417693cf921b9abc50b5af4ea0b6dde",
    "id.describe-short": "9b62497-dirty",
    "id.abbrev": "9b62497",
    "id.describe": "9b62497-dirty",
DISTRIBUTED TRACING
Zipkin
EXAMPLES:
Spring Boot Actuator http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#production-ready
PCF Apps Manager https://docs.pivotal.io/pivotalcf/1-9/console/using-actuators.html
Spring Cloud Sleuth https://cloud.spring.io/spring-cloud-sleuth/
Zipkin http://zipkin.io/
LEVERAGE RESILIENCY PATTERNS
TIMEOUTS
TIMEOUTS
Thinking is half the battle!
Anything that blocks threads
Any method call with an optional timeout argument
ADDING TIMEOUTS TO RESTTEMPLATE
@Bean
public RestTemplate restTemplate() {
    SimpleClientHttpRequestFactory clientHttpRequestFactory = new SimpleClientHttpRequestFactory();
    clientHttpRequestFactory.setConnectTimeout(10 * 1000); // Ten seconds!
    clientHttpRequestFactory.setReadTimeout(10 * 1000); // Ten seconds!
    return new RestTemplate(clientHttpRequestFactory);
}
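The same idea applies to any blocking call that accepts a timeout argument, not only to HTTP clients. As a minimal sketch using only the JDK, the helper below bounds the wait on a `Future` and falls back to a default value. `TimeoutDemo`, `callWithTimeout`, and the fallback strings are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; uses only java.util.concurrent.
final class TimeoutDemo {

    // Runs the task on the executor and waits at most timeoutMillis.
    // Returns the fallback if the task times out, fails, or the caller is interrupted.
    static <T> T callWithTimeout(ExecutorService executor, Callable<T> task,
                                 long timeoutMillis, T fallback) {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so its thread is freed
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself threw
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            String fast = TimeoutDemo.<String>callWithTimeout(
                    executor, () -> "things", 500, "default things");
            String slow = TimeoutDemo.<String>callWithTimeout(executor, () -> {
                Thread.sleep(5_000); // simulated slow dependency
                return "things";
            }, 100, "default things");
            System.out.println(fast + " / " + slow);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Cancelling the future on timeout matters here: without it the worker thread would stay blocked on the slow dependency even though the caller has already given up.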
RETRIES
RETRIES
Potentially transient failures
Immediately
With a backoff
Maximum times
Log all the things
SIMPLE RETRY
@RequestMapping("/acquireThings")
@Retryable
public ResponseEntity<String> tryToAcquireThings() {
    logger.info("Attempting to acquire things...");
    String things = restTemplate
            .getForObject("http://localhost:8081/things", String.class);
    return new ResponseEntity<String>(things, HttpStatus.OK);
}

@Recover
public ResponseEntity<String> recover() {
    logger.warn("Returning default response...");
    return new ResponseEntity<String>("default things", HttpStatus.OK);
}
RETRY WITH BACKOFF
@RequestMapping("/acquireThings")
@Retryable(maxAttempts = 5,
        backoff = @Backoff(delay = 100L, maxDelay = 1000L, multiplier = 2, random = true))
public ResponseEntity<String> tryToAcquireThings() {
    logger.info("Attempting to acquire things...");
    String things = restTemplate
            .getForObject("http://localhost:8081/things", String.class);
    return new ResponseEntity<String>(things, HttpStatus.OK);
}
EXPONENTIAL BACKOFF
@Bean
public BackOffPolicy backOffPolicy() {
    return new ExponentialBackOffPolicy();
}
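To make the backoff arithmetic concrete, here is a hand-rolled sketch in plain Java. It is not how Spring Retry is implemented; `RetryWithBackoff` and its methods are hypothetical names, and the random jitter used on the slide is deliberately left out to keep the delays predictable. Each retry waits initial * multiplier^(retry - 1) milliseconds, capped at a maximum.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not Spring Retry's implementation.
final class RetryWithBackoff {

    // Delay before retry number `retry` (1-based), capped at maxMillis.
    static long delayMillis(int retry, long initialMillis, double multiplier, long maxMillis) {
        double delay = initialMillis * Math.pow(multiplier, retry - 1);
        return (long) Math.min(delay, (double) maxMillis);
    }

    // Calls the task up to maxAttempts times, sleeping between failed attempts.
    // Rethrows the last failure once the attempts are exhausted.
    static <T> T call(Supplier<T> task, int maxAttempts,
                      long initialMillis, double multiplier, long maxMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e;
                // Log every failed attempt.
                System.err.println("Attempt " + attempt + " of " + maxAttempts + " failed: " + e);
                if (attempt < maxAttempts) {
                    sleep(delayMillis(attempt, initialMillis, multiplier, maxMillis));
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }

    public static void main(String[] args) {
        // Delays for 5 attempts with delay = 100, multiplier = 2, maxDelay = 1000.
        for (int retry = 1; retry <= 4; retry++) {
            System.out.println("retry " + retry + " waits "
                    + delayMillis(retry, 100L, 2.0, 1000L) + " ms");
        }
    }
}
```

In practice some randomness is usually added to the delay so that many clients retrying at once do not hit the recovering service in lockstep.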
BULKHEADS
BULKHEADS
Microservices
Thread Pools
Availability Zones
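As a small in-process illustration of the bulkhead idea, the sketch below caps the number of concurrent calls to one dependency so that a slow dependency cannot absorb every available thread. It uses a semaphore rather than a dedicated thread pool, which is a simpler swapped-in variant of the thread-pool approach listed above. The `Bulkhead` class and its methods are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical semaphore-based bulkhead for illustration.
final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the call if a permit is free; otherwise returns the fallback
    // immediately instead of queueing behind a struggling dependency.
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always give the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        Bulkhead bulkhead = new Bulkhead(1);
        // The inner call runs while the outer call still holds the only permit,
        // so it is rejected and the fallback value is used.
        String result = bulkhead.<String>execute(
                () -> bulkhead.<String>execute(() -> "inner ran", () -> "inner rejected"),
                () -> "outer rejected");
        System.out.println(result);
    }
}
```

One bulkhead per downstream dependency keeps the failure of one from spreading to the others.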
CIRCUIT BREAKERS
CIRCUIT BREAKERS
SPRING CLOUD HYSTRIX
@HystrixCommand(fallbackMethod = "fallbackFortune")
public Fortune randomFortune() {
    return restTemplate.getForObject("http://fortunes/random", Fortune.class);
}

private Fortune fallbackFortune() {
    return new Fortune(42L, fortuneProperties.getFallbackFortune());
}
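For readers who have not seen the state machine behind a circuit breaker, here is a deliberately simplified sketch in plain Java. It is not Hystrix's implementation, which tracks error percentages over a rolling window; this version merely counts consecutive failures. `SimpleCircuitBreaker`, its parameters, and the injected clock are hypothetical names chosen so the example can be run and tested on its own.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker for illustration; not Hystrix.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so tests can control time
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // Synchronized for simplicity; this serializes calls through the breaker.
    synchronized <T> T call(Supplier<T> protectedCall, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get(); // fail fast without touching the dependency
            }
            state = State.HALF_OPEN; // wait is over: allow one trial call
        }
        try {
            T result = protectedCall.get();
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN; // trip (or re-trip) the breaker
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(2, 1_000L, System::currentTimeMillis);
        for (int i = 0; i < 3; i++) {
            String fortune = breaker.<String>call(
                    () -> { throw new IllegalStateException("fortunes service down"); },
                    () -> "fallback fortune");
            System.out.println(fortune + " (" + breaker.state() + ")");
        }
    }
}
```

The key property is that once the breaker is open, callers get the fallback immediately and the failing dependency is given time to recover.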
EXAMPLES:
Spring Retry https://github.com/spring-projects/spring-retry
Hystrix https://github.com/Netflix/Hystrix
via Spring Cloud Netflix https://cloud.spring.io/spring-cloud-netflix/
EMBRACE CHAOS
HOW DO YOU KNOW YOUR SYSTEM WILL TOLERATE FAILURE IF IT HASN'T FAILED?
GAME DAY EXERCISES
CAN WE DIAL THAT UP A NOTCH?
YAU AND CHEUNG: DESIGN OF SELF-CHECKING SOFTWARE (1975)
DID SOMEBODY SAY...
EXAMPLES:
Chaos Lemur (BOSH) https://github.com/strepsirrhini-army/chaos-lemur
Chaos Loris (CF) https://github.com/strepsirrhini-army/chaos-loris
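The core idea behind this kind of tooling, deliberately and randomly taking instances out of service, can be sketched in a few lines. The code below does not reproduce how Chaos Lemur or Chaos Loris work internally; `ChaosSelector`, the probability value, and the instance names are hypothetical, and the sketch only selects victims rather than destroying anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical victim selector for illustration; it does not terminate anything.
final class ChaosSelector {
    private final double probability;
    private final Random random;

    ChaosSelector(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
        this.random = random;
    }

    // Each instance is chosen independently with the configured probability.
    List<String> pickVictims(List<String> instanceIds) {
        List<String> victims = new ArrayList<>();
        for (String id : instanceIds) {
            if (random.nextDouble() < probability) {
                victims.add(id);
            }
        }
        return victims;
    }

    public static void main(String[] args) {
        // Made-up instance names and probability; a fixed seed keeps runs repeatable.
        ChaosSelector selector = new ChaosSelector(0.2, new Random(42L));
        List<String> instances = Arrays.asList("web-0", "web-1", "web-2", "api-0", "api-1");
        System.out.println("Would terminate: " + selector.pickVictims(instances));
    }
}
```

A real chaos run would hand the selected instances to the platform's API for termination and then watch the observability tooling to confirm the system recovers.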
REVIEW TIME!
Stop trying to prevent mistakes
Focus on MTTR
Enhance observability
Leverage resiliency patterns
Embrace chaos!
THANKS!
Matt Stine @mstine
http://www.mattstine.com