No Microservice is an Island

Download No Microservice is an Island

Post on 22-Jan-2018

153 views

Category:

Software

0 download

TRANSCRIPT

<ol><li> 1. No Microservice Is An Island Michele Titolo Lead Software Engineer @ Capital One </li><li> 2. Microservices Are A Distributed System </li><li> 3. Something WILL PROBABLY GO WRONG </li><li> 4. How Do You Figure Out Whats Wrong? </li><li> 5. How Do You Deploy A Fix? </li><li> 6. New Challenges </li><li> 7. Microservices Are More Than Application Size </li><li> 8. Failure Points++ </li><li> 9. Microservices Need An Ecosystem </li><li> 10. Apps Must Contribute </li><li> 11. Infrastructure Must Contribute </li><li> 12. Otherwise Youll Play Catch Up </li><li> 13. Three Biggest Challenges </li><li> 14. Deployment </li><li> 15. Scaling </li><li> 16. Debugging </li><li> 17. Create A Foundation </li><li> 18. Deploying Microservices </li><li> 19. Characteristics </li><li> 20. Small Releases </li><li> 21. Frequent Releases </li><li> 22. Consistent Releases </li><li> 23. Limit Special Snowflakes </li><li> 24. Top-Notch Tooling </li><li> 25. Automate Every Aspect </li><li> 26. </li><li> 27. </li><li> 28. Strategies </li><li> 29. Staged Deployments </li><li> 30. Canary </li><li> 31. R O U T E R </li><li> 32. Blue/Green Red/Black </li><li> 33. R O U T E R </li><li> 34. R O U T E R </li><li> 35. R O U T E R </li><li> 36. Automatic Rollbacks </li><li> 37. How Do You Know A Deployment Is Successful? </li><li> 38. What Features Or Tools Need To Exist? </li><li> 39. App </li><li> 40. - Tests Pass - Health Check - Dependency/Secret Injection </li><li> 41. Robust Tests </li><li> 42. Unit And Integration </li><li> 43. Frequent Deployments Frequent Testing </li><li> 44. Standardized Health Check </li><li> 45. Consume Dependencies And Secrets </li><li> 46. Dependencies i.e. URL to database </li><li> 47. Secrets i.e. username/password to DB </li><li> 48. - Tests Pass - Health Check - Dependency/Secret Injection </li><li> 49. System </li><li> 50. Aggregate Health Checks </li><li> 51. Show System State </li><li> 52. Successful Deployment? </li><li> 53. Define Successful </li><li> 54. Proactive </li><li> 55. Report </li><li> 56. Bots </li><li> 57. Recap </li><li> 58. Once a Deployment Is Finished, Everythings Done? </li><li> 59. Distributed Systems Are Constantly Changing </li><li> 60. Scaling Microservices </li><li> 61. App Has More Or Less Load </li><li> 62. Health Tracked </li><li> 63. - RAM - CPU - Latency - Requests per second </li><li> 64. Custom Metrics </li><li> 65. How Do Microservices Scale? </li><li> 66. Automation </li><li> 67. Smart System </li><li> 68. Trigger Scaling </li><li> 69. Scale Up Scale Down </li><li> 70. Scale Up Scale Down </li><li> 71. Deploy New Instances </li><li> 72. Monitor Health Checks </li><li> 73. Send Traffic To New Instances </li><li> 74. Scale Up Scale Down </li><li> 75. Instances Arent Being Used </li><li> 76. Stop Routing Traffic </li><li> 77. Gracefully Shutdown </li><li> 78. Disconnect </li><li> 79. Report Progress </li><li> 80. Ultimately Get Removed </li><li> 81. Routing Trac? </li><li> 82. Infrastructure++ </li><li> 83. Load Balancer Service Discovery </li><li> 84. Load Balancer Service Discovery </li><li> 85. All Cloud Providers Have One Built-In </li><li> 86. Attach To Scaling Group </li><li> 87. New Instances Registered </li><li> 88. Conversely, Instances Going Away Are Removed </li><li> 89. Load Balancer Service Discovery </li><li> 90. Routes To Healthy Instances </li><li> 91. Apps Register </li><li> 92. Route Via Convention </li><li> 93. A B Service Discovery GET /b/foo GET /foo </li><li> 94. A Service Discovery GET /b/foo </li><li> 95. Knows Health, Can Trigger Scaling Automation </li><li> 96. Scaling Only Solves So Many Problems </li><li> 97. Debugging Microservices </li><li> 98. First You Need To Know Theres A Problem </li><li> 99. How? </li><li> 100. Alerts </li><li> 101. What Qualies As A Problem? </li><li> 102. Varies Per App </li><li> 103. RAM, CPU, Latency </li><li> 104. Less Obvious, Too </li><li> 105. Key Performance Indicator </li><li> 106. Alerts Are For Known Issues </li><li> 107. Dashboards </li><li> 108. How Do You Debug An Issue? </li><li> 109. First: Check Problematic App </li><li> 110. Pretend Ssh Doesnt Exist </li><li> 111. Next Best Thing: Logs </li><li> 112. Log Collection And Aggregation </li><li> 113. Configurable Log Levels </li><li> 114. Inspect Logs </li><li> 115. Non-Trivial Issue Isolate </li><li> 116. Avoid Cascading Failures </li><li> 117. How Do You Isolate A Service? </li><li> 118. Identify Isolate </li><li> 119. Identify Isolate </li><li> 120. Q: What Parts Of The System Depend On Another? </li><li> 121. Request Tracing </li><li> 122. Unique Request Ids </li><li> 123. Best As Overlay </li><li> 124. Opentracing </li><li> 125. Add Header Manually </li><li> 126. Identify Isolate </li><li> 127. Circuit Breaking </li><li> 128. Let App Recover </li><li> 129. App Or System </li><li> 130. Fail Gracefully </li><li> 131. Deploy A Fix </li><li> 132. Slowly Start Sending Traffic </li><li> 133. Can Be Done Automatically </li><li> 134. Built-In Load Balancers Service Discovery </li><li> 135. Otherwise Update Manually </li><li> 136. Everything Returns To Normal </li><li> 137. What If The Problem Isnt One Of Your Apps? </li><li> 138. Do You Have An External Dependency? </li><li> 139. No Access To Logs </li><li> 140. No Access To Source </li><li> 141. You Cant Deploy A Fix </li><li> 142. Fewer Tools </li><li> 143. Damage Control </li><li> 144. How Limit Impact? </li><li> 145. Status Page </li><li> 146. Monitor The Status Page </li><li> 147. If Status Page Shows No Issue, Raise It </li><li> 148. Keeping Your System Healthy </li><li> 149. Whats Inside Your Control? </li><li> 150. Cant See Logs </li><li> 151. Request Tracing </li><li> 152. Circuit Breaking </li><li> 153. Degrade Gracefully </li><li> 154. Sometimes Infrastructure Fails </li><li> 155. Ex: S3 Outage In 2017 </li><li> 156. Circuit Breaking </li><li> 157. To Recap </li><li> 158. Debugging Internal Apps </li><li> 159. Log, Trace, Circuit Break </li><li> 160. Debugging External Apps </li><li> 161. Trace And Circuit Break </li><li> 162. One Thing All Of These Have In Common </li><li> 163. Metrics </li><li> 164. Monitoring </li><li> 165. Gain Insights Into Distributed System </li><li> 166. Not Free </li><li> 167. Apps Updated </li><li> 168. Health Checks </li><li> 169. Infrastructure Updated </li><li> 170. Monitor Health Checks </li><li> 171. Gather Information </li><li> 172. Create Condence That System Is Healthy </li><li> 173. Creating An Ecosystem For Microservices Is Doable </li><li> 174. Requires More Metrics </li><li> 175. Requires Smarter Infrastructure </li><li> 176. Infrastructure wont evolve on its own; people need to change their behavior and fundamentally think of what it takes to run an application a different way. Nova, Kris and Garrison, Justin (2017). Cloud Native Infrastructure. OReilley. </li><li> 177. Build Software To Manage Software </li><li> 178. Build Software To Support A Healthy Ecosystem </li><li> 179. Thank You! @micheletitolo </li><li> 180. https://unsplash.com/photos/-lp8sTmF9HA https://unsplash.com/photos/Rri94rs8 https://unsplash.com/photos/ B87VRCWmpqU https://unsplash.com/photos/aLDuLRjZknE https://unsplash.com/photos/xal6tVZRCGc https://unsplash.com/photos/ VRAxb_gNjQQ Photo Credits </li></ol>