WTF is a Microservice - Rafael Schloming, Datawire

Post on 18-Feb-2017

Category:

Technology

TRANSCRIPT

<ul><li><p>WTF is a microservice?</p><p>Rafael Schloming</p><p>Co-Founder &amp; Chief Architect</p></li><li><p>datawire.io</p><p>History</p><p>Datawire</p><p>Founded in 2014; focused on microservices</p><p>Me</p><p>Lots of distributed systems experience; starting from zero with microservices</p></li><li><p>What is a microservice?</p><p>Wikipedia:</p><p>...no industry consensus; ...implementation approach for SoA; ...processes that communicate with each other to fulfill a goal; ...naturally enforces a modular structure...</p><p>Everything else:</p><p>Volumes of essays: good, bad, and ugly...</p></li><li><p>Three aspects of Microservices</p><p>Technology</p><p>Process</p><p>People</p></li><li><p>From Three Sources</p><p>Experts</p><p>Bootstrapping</p><p>Migrating</p></li><li><p>Starting Point</p><p>Technical:</p><p>An application composed of a network of small services; building your application from microservices forces you to create clear boundaries, better abstractions, ...</p><p>Process:</p><p>???</p><p>People:</p><p>???</p></li><li><p>The Expert Source</p><p>Read just about every firsthand story out there</p><p>Went to conferences</p><p>Talked to everyone we could</p><p>Started the practitioner summit</p><p>And armed with a little bit of knowledge, we started filling in our picture</p></li><li><p>People Picture</p><p>Developer Happiness/Tooling/Platform Team</p><p>Builds the infrastructure</p><p>Service teams</p><p>Build the features</p></li><li><p>Technical Picture</p><p>Control Plane</p><p>Service Discovery, Logging + Metrics, Configuration, Smart Endpoints</p><p>Traffic Layer</p><p>HTTP, RPC, Messaging</p><p>Reference Architecture</p></li><li><p>First Picture</p><p>Technical:</p><p>A network of small services connected via a control plane and traffic 
layer</p><p>Process:</p><p>???</p><p>People:</p><p>Platform team and service teams</p></li><li><p>The Bootstrap Perspective</p><p>Five engineers building an out-of-the-box control plane...</p><p>Ingest interesting application-level events: start, stop, and heartbeat events; log messages</p><p>Store them in an appropriate piece of infrastructure: service registry, log store</p><p>Transform and present: real-time view of routing table and service health; historic view of request traces, ...</p></li><li><p>Ubiquitous Data Processing Pipeline</p><p>Ingest -&gt; Source of Truth -&gt; Transform -&gt; Present</p><p>Template for many data-driven businesses</p></li><li><p>V1: Started with Discovery</p><p>Requirements: highly available; low throughput; low latency; low operational complexity; able to survive a complete restart; capable of handling spikes</p><p>Initial Choices: vert.x + hazelcast; websockets; smart clients; auth0 + python shim</p><p>Total Services: 2</p></li><li><p>V2: Added Tracing (PoC)</p><p>Requirements: high throughput; highish latency OK; cannot impact application</p><p>Initial Choices: vert.x, hazelcast (only retained transient buffer of last 1000 log messages); websockets; smart circular buffer minimized impact on application</p><p>Total Services: 3</p></li><li><p>V3: Added Persistence for Tracing</p><p>Requirements: keep extended history; provide full-text search, filtering, sorting, etc.</p><p>Initial Choices: elasticsearch for storage/search; query service</p><p>Total Services: 4</p></li><li><p>First hint of pain...</p><p>Rerouting data pathways: touched multiple services; coupled changes</p><p>Poor local dev experience: manually fire up and wire the whole fabric</p><p>Slow deployment pipeline: bunched-up changes</p><p>All this resulted in a big scary cutover</p></li><li><p>V4: Adding Persistence for 
Discovery</p><p>Requirements: track errors associated with particular service nodes; store routing strategies</p><p>Initial Choices: postgres (RDS) for persistence</p><p>Yet another big cutover. Enough is enough!</p><p>Let's fix our tooling once and for all...</p></li><li><p>Deployment Requirements</p><p>Stuff we had tried:</p><p>Deliver everything as a docker image: still too much wiring to bootstrap the system</p><p>Use kubernetes for everything: nice dev experience with minikube, but we use amazon services</p><p>Need to meet both dev &amp; operational requirements:</p><p>Fast dev cycle; good visibility; fast rollback; ability to leverage commodity services</p></li><li><p>Deployment Redesign</p><p>Complete system definition in git: contains all the information necessary to bootstrap the system from scratch in all of its operating environments</p><p>System definition is well factored with respect to its environments:</p><p>Abstract definition: my service needs postgres and redis</p><p>Development: service -&gt; docker image, postgres -&gt; docker image, redis -&gt; docker image; use minikube to run the whole system</p><p>Test:</p><p>Production: service -&gt; docker image, postgres -&gt; RDS, redis -&gt; elasticache; Kubernetes cluster for stateless services</p><p>Tooling caters to the needs of each environment:</p><p>Development: fast feedback cycle</p><p>Test: repeatable environments</p><p>Production: quick and safe updates/rollbacks</p><p>Tooling helps maintain environment parity</p></li><li><p>DevOps?</p><p>DevOps is presented as a solution to an organizational problem, but we all sat in the same room</p><p>We were thinking about operational factors from day one:</p><p>throughput, latency, availability; building a service, not a server</p><p>This forced us to follow an incremental process:</p><p>tooling for this process was inadequate; when we thought about the process it helped us figure out the 
tooling</p></li><li><p>Process: Architecture vs Development (SoA vs SoD)</p><p>Systems (their shape in particular) are traditionally architected</p><p>Architecture</p><p>lots of up-front thinking; slow feedback cycle</p><p>Development</p><p>frequent small changes; quick feedback cycle; measure the impact at every step</p><p>Microservices are about enabling a developmental methodology for systems</p></li><li><p>Methodology for Developing Systems</p><p>Principles: small frequent changes; rapid feedback and good visibility</p><p>Applied to codebases: tooling for rapid feedback (compilers, incremental builds, test suites); tooling for good visibility (printf, logging, debuggers, profilers)</p><p>Applied to systems: key characteristics go beyond just logic and correctness; performance of the running system within a specified tolerance is a critical feature</p><p>Tests don't cut it anymore...</p></li><li><p>Update the Dev Cycle</p><p>Tests assess impact on correctness...</p><p>Build -&gt; Test -&gt; Deploy</p><p>We need a way to assess impact on the system</p><p>Build -&gt; Test -&gt; Assess Impact -&gt; Deploy</p><p>How do you measure system-level impact?</p><p>Measure impact against defined Service Level Objectives (SLOs): throughput, latency, and availability (error rate)</p></li><li><p>Back to the Experts...</p><p>Canary Testing, Circuit Breakers, Dark Launching, Tracing, Metrics, Deployment</p><p>All ways to enable the dev cycle for running systems:</p><p>make small frequent changes; measure the impact on the running system; provide good visibility</p></li><li><p>Second Picture</p><p>Technical:</p><p>A network of small services; scaffolding to safely enable small frequent changes</p><p>Process:</p><p>Service-oriented development: small frequent changes with good visibility and feedback</p><p>People:</p><p>Platform team and service 
teams</p></li><li><p>The Migration Perspective</p><p>Variety of stages...</p><p>Monolith: django, rails, ...</p><p>Monolith++: mothership + several little ducklings</p><p>SoA-ish: small flock of services (maybe 5-10)</p><p>Inbetweeners</p><p>Some moving really slowly...</p><p>Months to create just one microservice</p><p>Some moving much faster</p><p>What's the difference?</p></li><li><p>Migration is about people</p><p>Starting point: team vs tech</p><p>Picking a tech stack for the entire eng org to adopt is slow: lots of organizational friction</p><p>Replatforming/refactoring an entire existing monolith is slow: lots of organizational and orchestrational friction</p><p>Creating a relatively autonomous team to tackle a particular problem in the form of a service</p><p>Growing pains: stability vs progress</p><p>some orgs hit a sticking point, some didn't</p></li><li><p>The People Picture: Dividing up the Work</p><p>The work has two aspects: building the features (dev); keeping the system running (ops)</p><p>You can't usefully divide up the work along these lines: new features are the biggest source of instability (bugs); separate roles create misaligned incentives (devops); yet a big part of the work is keeping things running</p><p>Microservices are about how to divide up the work: break the big app into smaller ones; divide operational responsibility in a way that aligns incentives</p></li><li><p>Third Picture</p><p>Technical:</p><p>A network of small services; scaffolding to quickly and safely enable small frequent changes</p><p>Process:</p><p>Service-oriented development: small frequent changes with good visibility and feedback</p><p>People:</p><p>Dividing up the work: service teams deliver features to users; platform team supports service teams</p></li><li><p>The Hard Way</p><p>1. Start with Tech 2. Reverse Engineer The Process + People 3. 
Make lots of mistakes along the way 4. Learn from them</p></li><li><p>The Easy Way</p><p>1. Understand the principles of People and Process</p><p>2. Use this as a framework to: a. pick tech that fits; b. learn from other people's mistakes</p></li><li><p>Microservices Cheat Sheet (What, Why &amp; How)</p><p>People, Process, Technology</p><p>Microservices are a way to divide the work of building a cloud application</p><p>Microservices are built from a process of frequent small changes with rapid feedback and good visibility</p><p>Microservices are an application that is made up of a network of small services</p><p>This work falls into two categories: keep the system running (ops) and build new features (dev).</p><p>Dividing work along these categories creates conflicting incentives between progress and stability. New features from dev eventually become the biggest source of instability for ops.</p><p>Unifying these roles (devops) allows you to minimize the tradeoff between progress and stability, but you now need to divide up the work by dividing up the app. 
This results in a network of services.</p><p>This is the application of the traditional dev cycle to systems rather than codebases, and for it to work, key system properties must become first-class features for developers.</p><p>This requires dev tooling to support quickly and safely assessing system impact.</p><p>This requires fast deployment tooling and good visibility into key system-level properties:</p><p>Throughput, latency, availability (error rate)</p><p>Depending on your system, this may require tooling for:</p><p>Fancy request routing (for canary testing, dark launching)</p><p>Give your dev teams operational responsibility!</p><p>Define service level objectives &amp; agreements for each service. SLOs: throughput, latency, availability. SLAs: what happens when these aren't met.</p><p>Commoditize common operational overhead.</p><p>Extend the dev cycle to include a stage to assess the impact on key system properties (SLOs)</p><p>Build -&gt; Test -&gt; Deploy becomes Build -&gt; Test -&gt; Assess Impact -&gt; Deploy</p><p>Start with a fast deployment pipeline that incorporates basic system-level metrics and monitoring for each service.</p></li><li><p>Questions?</p></li><li><p>Microservices Cheat Sheet (What, Why &amp; How)</p><p>People</p><p>Microservices are a way to divide the work of building a cloud application</p><p>Two aspects of work: keep it running (ops), build new features (dev)</p><p>Dividing by aspect creates conflicting incentives between progress and stability.</p><p>Unifying roles (devops) to minimize tradeoff... 
divide work by dividing the app</p><p>Give your dev teams operational responsibility!</p><p>Define service level objectives &amp; agreements for each service. SLOs: throughput, latency, availability. SLAs: what happens when these aren't met.</p><p>Commoditize common operational overhead.</p></li><li><p>Microservices Cheat Sheet (What, Why &amp; How)</p><p>Process</p><p>Microservices are built from a process of frequent small changes with rapid feedback and good visibility</p><p>This is the application of the traditional dev cycle to systems rather than codebases, and for it to work, key system properties must become first-class features for developers.</p><p>Extend the dev cycle to include a stage to assess the impact on key system properties (SLOs)</p><p>Build -&gt; Test -&gt; Deploy becomes Build -&gt; Test -&gt; Assess Impact -&gt; Deploy</p></li><li><p>Microservices Cheat Sheet (What, Why &amp; How)</p><p>Technology</p><p>Microservices are an application that is made up of a network of small services</p><p>This requires dev tooling to support quickly and safely assessing system impact.</p><p>This requires fast deployment tooling and good visibility into key system-level properties: throughput, latency, availability (error rate)</p><p>Depending on your system, this may require tooling for: fancy request routing (for canary testing, dark launching)</p><p>Start with a fast deployment pipeline that incorporates basic system-level metrics and monitoring for each service.</p></li><li><p>DevOps: you can't split the work (along these lines)</p><p>[Diagram labels: Dev, Ops, User, DevOps]</p></li><li><p>Features are the largest source of bugs</p><p>[Diagram labels: Dev, Ops, User]</p></li><li><p>Microservices: Divide the work by dividing the 
app</p><p>[Diagram labels: Dev, User, Infra, DevOps]</p></li><li><p>Dividing up Work</p><p>[Diagram labels: Dev, Infra, User, Ops]</p></li></ul>
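
The "Assess Impact" stage that the slides add between Test and Deploy can be illustrated with a small sketch. This is not code from the talk: the names `SLO`, `Measurement`, `assess_impact`, and `decide`, and every threshold value, are hypothetical. The sketch assumes metrics for the new build have already been collected somehow (for example from a canary receiving a small share of traffic) and simply compares them with the three SLOs the slides name: throughput, latency, and availability (error rate).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SLO:
    """Service level objectives for one service (hypothetical thresholds)."""
    min_throughput_rps: float   # requests per second the service must sustain
    max_p99_latency_ms: float   # ceiling for 99th percentile latency
    max_error_rate: float       # allowed fraction of failed requests, 0.0 to 1.0


@dataclass(frozen=True)
class Measurement:
    """Metrics observed for the new build; how they are gathered is assumed."""
    throughput_rps: float
    p99_latency_ms: float
    error_rate: float


def assess_impact(slo: SLO, observed: Measurement) -> List[str]:
    """Return the names of violated objectives; an empty list means none."""
    violations = []
    if observed.throughput_rps < slo.min_throughput_rps:
        violations.append("throughput")
    if observed.p99_latency_ms > slo.max_p99_latency_ms:
        violations.append("latency")
    if observed.error_rate > slo.max_error_rate:
        violations.append("availability")
    return violations


def decide(slo: SLO, observed: Measurement) -> str:
    """Gate the Deploy stage on the outcome of the Assess Impact stage."""
    violations = assess_impact(slo, observed)
    if not violations:
        return "deploy"
    return "rollback: " + ", ".join(violations)


if __name__ == "__main__":
    # Illustrative numbers only.
    slo = SLO(min_throughput_rps=100.0, max_p99_latency_ms=250.0, max_error_rate=0.01)
    canary = Measurement(throughput_rps=120.0, p99_latency_ms=400.0, error_rate=0.002)
    print(decide(slo, canary))
```

Keeping the check as a pure function over measured values means the same gate could sit behind whatever routing or metrics tooling a team already has; the sketch deliberately leaves that tooling out.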