tef con2016 (1)

Best Practices for Inter-process Communication, Gustavo Garcia @anarchyco

Upload: ggarber

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Tef con2016 (1)

Best Practices for Inter-process Communication

Gustavo Garcia @anarchyco

Page 2: Tef con2016 (1)
Page 3: Tef con2016 (1)

What happens when your application and/or team starts growing?

Page 4: Tef con2016 (1)
Page 5: Tef con2016 (1)
Page 6: Tef con2016 (1)

Disclaimer: I don’t like the word. I’m not advocating the use of microservices.

Page 7: Tef con2016 (1)

Inter-process communication

Once you break a monolithic application into separate pieces (microservices), the pieces need to speak to each other. And it turns out that you have many options for inter-process communication.

                1-1                         1-many
SYNCHRONOUS     Request / Response          -
ASYNCHRONOUS    Notification                Publish / Subscribe
                Request / Async Response    Publish / Async Responses

Page 8: Tef con2016 (1)

Request / Response (RPC)

Discover -> Format -> Send

Page 9: Tef con2016 (1)

Discovery and Load Balancing

When you are writing some code that invokes a service, in order to make a request, your code needs to know the network location (IP address and port) of a service instance.

In a modern, cloud-based microservices application, however, this is a much more difficult problem to solve.

Service instances have dynamically assigned network locations and the set of service instances changes dynamically because of autoscaling, failures, and upgrades.

Page 10: Tef con2016 (1)

Discovery and Load Balancing

At a high level there are two different approaches:

Client-Side Discovery Pattern: The calling service needs to find the

Server-Side Discovery Pattern: The calling service sends the request to an intermediary (router/proxy) that is responsible for locating
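
As a rough illustration of the client-side pattern, the sketch below has the caller query a registry and pick an instance itself in round-robin order. The `ServiceRegistry` and `RoundRobinClient` classes, the service name and the addresses are all hypothetical, and the in-memory dictionary stands in for whatever registry a real deployment would use.

```python
import itertools


class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> list of (host, port)."""

    def __init__(self):
        self._instances = {}

    def register(self, name, host, port):
        self._instances.setdefault(name, []).append((host, port))

    def deregister(self, name, host, port):
        self._instances.get(name, []).remove((host, port))

    def lookup(self, name):
        # Return a copy so callers cannot mutate the registry by accident.
        return list(self._instances.get(name, []))


class RoundRobinClient:
    """Client-side discovery: the caller looks up instances and balances load itself."""

    def __init__(self, registry, service_name):
        self._registry = registry
        self._service_name = service_name
        self._counter = itertools.count()

    def next_instance(self):
        # Look up on every call, because the set of instances changes over time.
        instances = self._registry.lookup(self._service_name)
        if not instances:
            raise LookupError(f"no instances registered for {self._service_name!r}")
        return instances[next(self._counter) % len(instances)]


registry = ServiceRegistry()
registry.register("billing", "10.0.0.1", 8080)
registry.register("billing", "10.0.0.2", 8080)

client = RoundRobinClient(registry, "billing")
picks = [client.next_instance() for _ in range(3)]
print(picks)
```

In the server-side pattern the same lookup and selection logic would live in the router/proxy instead of in the calling service.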

Page 11: Tef con2016 (1)

Discovery and Load Balancing

Page 12: Tef con2016 (1)

Ribbon is an Inter-Process Communication (remote procedure calls) library with built-in software load balancers. The primary usage model involves REST calls with various serialization scheme support. It is heavily used in production by Netflix.

Finagle clients come equipped with a load balancer, a pivotal component in the client stack, whose responsibility is to dynamically distribute load across a collection of interchangeable endpoints. Finagle is the core component of the Twitter microservices architecture and it is used by FourSquare, Tumblr, ING...

“A common anti-pattern used for HTTP microservices is to have a load balancing service fronting each stateless microservice.” Joyent.

“Generally, the Proxy Model is workable for simple to moderately complex applications. It’s not the most efficient approach/Model for load balancing, especially at scale.” Nginx.

Page 13: Tef con2016 (1)
Page 14: Tef con2016 (1)

Serialization / Formats

Different ways to serialize the information for sending:

- Interface Definition Language (Protobuf, Thrift, JSON Schema ...)

- Schema-free or “Documentation” based

IDL-based formats are usually binary (but not necessarily) and usually include the possibility of auto-generating code.

Page 15: Tef con2016 (1)

Serialization / Formats

                           Binary / Schema    Text / Schema-free
Efficiency                 High               Lower
Development speed          Low?               High
Debugging / Readability    Low                High
Robustness                 High               Low
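
The efficiency and readability rows can be made concrete with a small comparison. This is only a sketch: the record and its fields are made up, and Python's `struct` module stands in for a real IDL-based format such as Protobuf or Thrift, which would also generate the encoding code from the schema.

```python
import json
import struct

# Hypothetical record to send between two services.
record = {"user_id": 1234, "temperature": 21.5, "active": True}

# Schema-free text encoding: field names travel with every message,
# which makes the payload readable but larger.
text_payload = json.dumps(record).encode("utf-8")

# Schema-based binary encoding: both sides agree on the layout beforehand.
# "<Id?" = little-endian unsigned 32-bit int, 64-bit float, bool.
SCHEMA = struct.Struct("<Id?")
binary_payload = SCHEMA.pack(
    record["user_id"], record["temperature"], record["active"]
)

# Decoding only works if the receiver knows the same schema.
user_id, temperature, active = SCHEMA.unpack(binary_payload)

print(len(text_payload), "bytes as JSON:", text_payload)
print(len(binary_payload), "bytes as binary:", binary_payload)
```

The binary payload is smaller but opaque when inspected on the wire, while the JSON payload can be read directly in a log or a debugger.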

Page 16: Tef con2016 (1)
Page 17: Tef con2016 (1)

Transport

Protocol: HTTP, TCP

Security: SSL, non-SSL

Reusing connections: No reuse, Reusing, Multiplexing

Page 18: Tef con2016 (1)
Page 19: Tef con2016 (1)

Transport

Good News: HTTP/2

● Efficient, SSL, Multiplexed

● Supported by major libraries: gRPC, Finagle ...

Page 20: Tef con2016 (1)

Failures

Applications in complex distributed architectures have dozens of dependencies, each of which will inevitably fail at some point. If the host application is not isolated from these external failures, it risks being taken down with them.

For example, for an application that depends on 30 services where each service has 99.99% uptime, here is what you can expect: 99.99%^30 ≈ 99.7% uptime

2+ hours downtime/month even if all dependencies have excellent uptime.

Reality is generally worse.
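
The uptime figure above can be checked with a few lines of arithmetic. The sketch assumes that the 30 dependencies fail independently and that a month has 30 days; the function name is illustrative.

```python
def combined_uptime(per_service_uptime, dependencies):
    """Probability that all dependencies are up at once, assuming independence."""
    return per_service_uptime ** dependencies


HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month

uptime = combined_uptime(0.9999, 30)
downtime_hours = (1 - uptime) * HOURS_PER_MONTH

print(f"combined uptime: {uptime:.4%}")
print(f"expected downtime: {downtime_hours:.2f} hours per month")
```

The result is roughly 99.7% combined uptime, which is a little over two hours of downtime per month.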

Page 21: Tef con2016 (1)

Engineering for Failure

Detect: How and when to mark a request as a failure

React: What do you do when you detect a failure

Isolate: Minimize the impact in the whole system

Page 22: Tef con2016 (1)
Page 23: Tef con2016 (1)

Detecting failures

What is the definition of failure?

Connection failures vs HTTP Response Status

Timeouts:

Sometimes it is more difficult than it looks.

Fail Fast
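
One possible way to turn these definitions into code is sketched below. It treats three things as failures: a connection error, a response that misses a deadline, and a 5xx status. The function names, the thresholds and the rule that only 5xx counts as a failure are assumptions made for the example.

```python
import concurrent.futures
import time

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def classify(call, timeout_s):
    """Run call() (which returns an HTTP-like status code) and classify the outcome."""
    future = _executor.submit(call)
    try:
        # Fail fast: stop waiting once the deadline passes.
        # Note that the worker thread itself keeps running until call() returns.
        status = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "failure: timeout"
    except ConnectionError:
        return "failure: connection"
    if status >= 500:
        return "failure: status"
    return "success"


def fast_dependency():
    return 200


def slow_dependency():
    time.sleep(0.5)  # simulates a dependency that answers too late
    return 200


def unreachable_dependency():
    raise ConnectionError("connection refused")


def overloaded_dependency():
    return 503


for dependency in (fast_dependency, slow_dependency,
                   unreachable_dependency, overloaded_dependency):
    print(dependency.__name__, "->", classify(dependency, timeout_s=0.1))
```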

Page 24: Tef con2016 (1)

Reacting to failures

Possible ways to react to failures:

Retry the request if it is idempotent

Cache the results and return them when the next request fails, or always

Fall back to returning something else or change the logic when one of the requests fails (for example, sending a predefined value)
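
The three reactions can be combined in one helper, as in the minimal sketch below. The helper name, its parameters and the backoff values are hypothetical, and the wrapped call is assumed to be idempotent so that retrying it is safe.

```python
import time


def call_with_recovery(fn, *, retries=2, backoff_s=0.01,
                       cache=None, key=None, fallback=None):
    """Try fn(); on failure retry, then use the cached value, then the fallback."""
    for attempt in range(retries + 1):
        try:
            result = fn()
        except Exception:
            if attempt < retries:
                # Exponential backoff between attempts.
                time.sleep(backoff_s * (2 ** attempt))
            continue
        if cache is not None:
            cache[key] = result  # remember the last good answer
        return result
    if cache is not None and key in cache:
        return cache[key]  # stale but usable
    return fallback  # predefined value


attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "fresh value"


def always_down():
    raise ConnectionError("service down")


cache = {}
first = call_with_recovery(flaky, cache=cache, key="profile")
second = call_with_recovery(always_down, cache=cache, key="profile")
third = call_with_recovery(always_down, fallback="default value")
print(first, "|", second, "|", third)
```

Here the first call succeeds after two retries, the second is served from the cache, and the third has nothing cached and returns the predefined value.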

Page 25: Tef con2016 (1)
Page 26: Tef con2016 (1)

Circuit Breaker

If something is not working, stop trying for a while, because retrying could make it worse for you or for them.

It can be a local Circuit Breaker or a global one
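
A local (per-process) circuit breaker could look roughly like the sketch below. It is not Hystrix's implementation: the class, the threshold and the timeout are illustrative, and a production breaker would normally track an error rate over a time window and be safe to use from several threads.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to send the request."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def is_open(self):
        return self._opened_at is not None

    def call(self, fn):
        if self.is_open:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open, request not sent")
            # Timeout elapsed: half-open, let one trial request through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self.is_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def failing_service():
    raise ConnectionError("service down")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=30.0)
for _ in range(3):
    try:
        breaker.call(failing_service)
    except ConnectionError:
        print("request sent and failed")
    except CircuitOpenError:
        print("request skipped, circuit is open")
```

A global breaker would share the failure count and the open/closed state between processes instead of keeping them in one object.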

Page 27: Tef con2016 (1)
Page 28: Tef con2016 (1)

Example of logic

https://github.com/Netflix/Hystrix

Page 29: Tef con2016 (1)

Bulkhead pattern

A misbehaving service shouldn’t affect the rest of the services.

Limit the resources that the client for a specific service can use.

Make sure the client for a specific service cannot block the whole process.
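
One common way to limit the resources of a client is to cap the number of in-flight calls per downstream service. The sketch below does this with a semaphore; the `Bulkhead` class and the limit are assumptions for the example, and other resources (threads, connections, memory) can be capped in a similar way.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when the per-service concurrency limit is reached."""


class Bulkhead:
    """Caps concurrent calls to one downstream service."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn):
        # Reject immediately instead of queueing, so a slow service
        # cannot tie up every worker in the process.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("too many in-flight calls to this service")
        try:
            return fn()
        finally:
            self._slots.release()


# One bulkhead per downstream service (hypothetical service and limit).
recommendations = Bulkhead(max_concurrent=1)


def call_while_busy():
    # Simulates a second request arriving while the only slot is taken.
    try:
        recommendations.call(lambda: "inner result")
    except BulkheadFullError:
        return "second call rejected"
    return "second call admitted"


print(recommendations.call(call_while_busy))
```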

Page 30: Tef con2016 (1)

Swimlane pattern

Maintain independent full stacks so that even if one of them has a problem there is no full outage.

Page 31: Tef con2016 (1)

Back Pressure or Flow Control

When your server is under pressure you should use some countermeasures to avoid making it worse.

For example: stop accepting new connections for a while, throttle messages, return 503...
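
The "return 503" option can be sketched with a bounded queue: once the backlog is full, new work is rejected instead of piling up. The `BoundedInbox` class, the status codes it returns and the queue size are assumptions for the example.

```python
import queue


class BoundedInbox:
    """Accepts work until the backlog is full, then sheds load."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        """Return an HTTP-like status: 202 if queued, 503 if overloaded."""
        try:
            self._pending.put_nowait(request)
        except queue.Full:
            return 503  # tell the caller to back off and retry later
        return 202

    def take(self):
        """Called by a worker to pick up the next queued request."""
        return self._pending.get_nowait()


inbox = BoundedInbox(max_pending=2)
statuses = [inbox.submit(f"request-{i}") for i in range(3)]
print(statuses)
```

Once a worker takes a request off the queue, the inbox starts accepting new requests again.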

Page 32: Tef con2016 (1)

Monitoring and Debugging

Knowing what’s happening in your service, and why latency or failures increase, is harder when you are calling 30 services to process a request.

Monitoring

Debugging

Page 33: Tef con2016 (1)

Monitoring

You need to know if any of your requests is taking longer than expected, how many are failing, queue sizes...

33% HTTP EndPoint

33% Logs

33% No stats

Page 34: Tef con2016 (1)

Debugging

Consistency:

It has to be automatic

There have to be some guidelines and you have to be very strict

Traceability:

● Easily find all the requests belonging to the same call flow

● Identify the hierarchy (who is calling whom)

sessionId == X OR sessionid == X OR session_id == X
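
A minimal sketch of what traceability needs in practice: every request in a call flow carries the same trace identifier, and each hop records which hop called it. The `Span` class, the operation names and the header names are hypothetical; the point is that a single spelling is used everywhere, so one query finds the whole flow.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str                    # shared by every request in the call flow
    parent_id: Optional[str] = None  # who called us (None for the entry point)
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])

    def child(self, name):
        """Start a span for a downstream call within the same trace."""
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def headers(self):
        """Hypothetical header names, spelled the same way by every service."""
        return {"x-trace-id": self.trace_id, "x-span-id": self.span_id}


def start_trace(name):
    """Create the root span when a request enters the system."""
    return Span(name=name, trace_id=uuid.uuid4().hex)


root = start_trace("GET /checkout")
auth = root.child("auth.verify")
db = auth.child("db.query")

for span in (root, auth, db):
    print(span.trace_id, span.parent_id, span.span_id, span.name)
```

Filtering logs by the trace identifier returns all three entries, and following the parent identifiers rebuilds the call hierarchy.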

Page 35: Tef con2016 (1)

Debugging

Trace / Spans

Page 36: Tef con2016 (1)

This is just too hard

Page 37: Tef con2016 (1)

Frameworks, Frameworks, Frameworks

DDIY

Boring is Good

Microservices Chassis

“If I have to eat shit, I’d rather eat my own than someone else’s”

Page 38: Tef con2016 (1)

Wrap Up

“When you move to a microservices architecture, it comes with this constant tax on your development cycle that’s going to slow you down from that point on”

Page 39: Tef con2016 (1)

Acknowledgements

All the projects collaborating in the survey