simple solutions for complex problems - boulder meetup

Simple Solutions for Complex Problems Tyler Treat / Workiva Boulder NATS Meetup 6/7/2016

Page 1: Simple Solutions for Complex Problems - Boulder Meetup

Simple Solutions for Complex Problems

Tyler Treat / Workiva

Boulder NATS Meetup 6/7/2016

Page 2: Simple Solutions for Complex Problems - Boulder Meetup

ABOUT THIS TALK

• Embracing the reality of complex systems

• Using simplicity to your advantage

• Why NATS?

• How Workiva uses NATS

Page 3: Simple Solutions for Complex Problems - Boulder Meetup

ABOUT THE SPEAKER

• Messaging tech lead at Workiva

• Platform infrastructure

• Distributed systems

• bravenewgeek.com

@tyler_treat

Page 4: Simple Solutions for Complex Problems - Boulder Meetup

There are a lot of parallels between real-world systems and distributed software systems.

Page 5: Simple Solutions for Complex Problems - Boulder Meetup

The world is eventually consistent…

Page 6: Simple Solutions for Complex Problems - Boulder Meetup

…and the database is just an optimization.[1]

[1] https://christophermeiklejohn.com/lasp/erlang/2015/10/27/tendency.html

Page 7: Simple Solutions for Complex Problems - Boulder Meetup

“There will be no further print editions [of the Merck Manual]. Publishing a printed book every five years and sending reams of paper around the world on trucks, planes, and boats is no longer the optimal way to provide medical information.”

Dr. Robert S. Porter, Editor-in-Chief, The Merck Manuals

Page 8: Simple Solutions for Complex Problems - Boulder Meetup

Programmers find asynchrony hard to reason about, but the truth is…

Page 9: Simple Solutions for Complex Problems - Boulder Meetup

Life is mostly asynchronous.

Page 10: Simple Solutions for Complex Problems - Boulder Meetup

What does this mean for us as programmers?

Page 11: Simple Solutions for Complex Problems - Boulder Meetup

[Chart: complexity vs. time, progressing through timesharing, monoliths, SOA, virtualization, microservices, and ???]

Complicated made complex…

Page 12: Simple Solutions for Complex Problems - Boulder Meetup

Distributed!

Page 13: Simple Solutions for Complex Problems - Boulder Meetup

Distributed computation is inherently asynchronous and the network is inherently unreliable[2]…

[2] http://queue.acm.org/detail.cfm?id=2655736

Page 14: Simple Solutions for Complex Problems - Boulder Meetup

…but the natural tendency is to build distributed systems as if they aren’t distributed at all because it’s easy to reason about.

strong consistency - reliable messaging - predictability

Page 15: Simple Solutions for Complex Problems - Boulder Meetup

• Complicated algorithms

• Transaction managers

• Coordination services

• Distributed locking

What’s in a guarantee?

Page 16: Simple Solutions for Complex Problems - Boulder Meetup
Page 17: Simple Solutions for Complex Problems - Boulder Meetup

What’s a delivery guarantee?

• Message handed to the transport layer?

• Enqueued in the recipient’s mailbox?

• Recipient started processing it?

• Recipient finished processing it?

Page 18: Simple Solutions for Complex Problems - Boulder Meetup

Each of these has a very different set of conditions, constraints, and costs.

Page 19: Simple Solutions for Complex Problems - Boulder Meetup

Guaranteed, ordered, exactly-once delivery is expensive (if not impossible[3]).

[3] http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/

Page 20: Simple Solutions for Complex Problems - Boulder Meetup

Over-engineered

Page 21: Simple Solutions for Complex Problems - Boulder Meetup

Complex

Page 22: Simple Solutions for Complex Problems - Boulder Meetup

Difficult to deploy & operate

Page 23: Simple Solutions for Complex Problems - Boulder Meetup

Fragile

Page 24: Simple Solutions for Complex Problems - Boulder Meetup

Slow

Page 25: Simple Solutions for Complex Problems - Boulder Meetup

At large scale, guarantees will give out.

Page 26: Simple Solutions for Complex Problems - Boulder Meetup

0.1% failure at scale is huge.
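To make the scale point concrete, assume a hypothetical system handling 10,000 messages per second (an illustrative figure, not one from the talk). A 0.1% failure rate then works out to:

$$
10{,}000\,\frac{\text{msg}}{\text{s}} \times 86{,}400\,\frac{\text{s}}{\text{day}} \times 0.001 = 864{,}000\ \text{failed messages per day}
$$

A rate that sounds negligible still produces hundreds of thousands of failures a day, so failure handling has to be part of the design rather than an edge case.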

Page 27: Simple Solutions for Complex Problems - Boulder Meetup
Page 28: Simple Solutions for Complex Problems - Boulder Meetup
Page 29: Simple Solutions for Complex Problems - Boulder Meetup

Replayable > Guaranteed

Page 30: Simple Solutions for Complex Problems - Boulder Meetup

Replayable > Guaranteed

Idempotent > Exactly-once

Page 31: Simple Solutions for Complex Problems - Boulder Meetup

Replayable > Guaranteed

Idempotent > Exactly-once

Commutative > Ordered
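The three rules above can be seen together in a small consumer. This is a minimal sketch, not code from the talk: the `Message` and `Ledger` types are hypothetical. Deduplicating by message ID makes redelivery harmless (idempotent), and because addition is commutative the final state does not depend on arrival order, so the whole stream can simply be replayed after a failure.

```go
package main

import "fmt"

// Message is a hypothetical event carrying a unique ID and an amount to add.
type Message struct {
	ID     string
	Amount int
}

// Ledger applies messages idempotently (duplicates are ignored by ID) and
// commutatively (addition does not depend on arrival order).
type Ledger struct {
	seen  map[string]bool
	Total int
}

func NewLedger() *Ledger {
	return &Ledger{seen: make(map[string]bool)}
}

// Apply processes each message ID at most once; redelivery is a no-op.
func (l *Ledger) Apply(m Message) {
	if l.seen[m.ID] {
		return
	}
	l.seen[m.ID] = true
	l.Total += m.Amount
}

func main() {
	msgs := []Message{{"a", 5}, {"b", 3}, {"c", 2}}

	inOrder := NewLedger()
	for _, m := range msgs {
		inOrder.Apply(m)
	}

	// Out of order and with redeliveries, as a replay after a failure might produce.
	replayed := NewLedger()
	for _, m := range []Message{msgs[2], msgs[0], msgs[1], msgs[0], msgs[2]} {
		replayed.Apply(m)
	}

	fmt.Println(inOrder.Total, replayed.Total) // 10 10
}
```

In a real service the `seen` set would need to be persisted (or bounded) alongside the state it protects; here it lives in memory only.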

Page 32: Simple Solutions for Complex Problems - Boulder Meetup

But delivery != processing

Page 33: Simple Solutions for Complex Problems - Boulder Meetup

Also, what does it even mean to “process” a message?

Page 34: Simple Solutions for Complex Problems - Boulder Meetup

It depends on the business context!

Page 35: Simple Solutions for Complex Problems - Boulder Meetup

If you need business-level guarantees, build them into the business layer.

Page 36: Simple Solutions for Complex Problems - Boulder Meetup
Page 37: Simple Solutions for Complex Problems - Boulder Meetup

We can always build stronger guarantees on top, but we can’t always remove them from below.
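One way to read this: keep the transport at-most-once and layer delivery guarantees on top where the business needs them. The sketch below is illustrative only; `Transport`, `lossyTransport`, and `SendAtLeastOnce` are hypothetical names, and the simulated transport simply drops the first few sends. Retrying until an acknowledgement arrives yields at-least-once delivery, and keying the receiver's state by message ID keeps the resulting duplicates harmless.

```go
package main

import (
	"errors"
	"fmt"
)

// Transport is a hypothetical fire-and-forget transport with at-most-once
// semantics: a send may be lost, in which case no ack comes back.
type Transport interface {
	Send(id string, payload string) (acked bool)
}

// lossyTransport drops the first `drops` sends to simulate a flaky network.
type lossyTransport struct {
	drops     int
	delivered map[string]string
}

func (t *lossyTransport) Send(id, payload string) bool {
	if t.drops > 0 {
		t.drops--
		return false // lost in transit
	}
	// Stored by ID: if an ack were lost after delivery, the retry would
	// overwrite the same entry instead of creating a duplicate effect.
	t.delivered[id] = payload
	return true
}

// SendAtLeastOnce layers retries on top of the transport. The caller (the
// business layer), not the transport, decides how many attempts a message is worth.
func SendAtLeastOnce(t Transport, id, payload string, maxAttempts int) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if t.Send(id, payload) {
			return attempt, nil
		}
	}
	return maxAttempts, errors.New("gave up after max attempts")
}

func main() {
	t := &lossyTransport{drops: 2, delivered: map[string]string{}}
	attempts, err := SendAtLeastOnce(t, "order-42", "created", 5)
	fmt.Println(attempts, err, t.delivered["order-42"]) // 3 <nil> created
}
```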

Page 38: Simple Solutions for Complex Problems - Boulder Meetup

End-to-end system semantics matter much more than the semantics of an individual building block[4].

[4] http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf

Page 39: Simple Solutions for Complex Problems - Boulder Meetup

Embrace the chaos!

Page 40: Simple Solutions for Complex Problems - Boulder Meetup

“Simplicity is the ultimate sophistication.”

Page 41: Simple Solutions for Complex Problems - Boulder Meetup

EMBRACING THE CHAOS MEANS LOOKING AT THE NEGATIVE SPACE.

Page 42: Simple Solutions for Complex Problems - Boulder Meetup

A simple technology in a sea of complexity.

Page 43: Simple Solutions for Complex Problems - Boulder Meetup

Simple doesn’t mean easy.[5]

[5] https://blog.wearewizards.io/some-a-priori-good-qualities-of-software-development

Page 44: Simple Solutions for Complex Problems - Boulder Meetup

“Simple can be harder than complex. You have to work hard to get your thinking clean to make it simple. But it’s worth it in the end because once you get there, you can move mountains.”

Page 45: Simple Solutions for Complex Problems - Boulder Meetup

About Workiva

• Wdesk: platform for enterprises to collect, manage, and report critical business data in real time

• Increasing amounts of data and complexity of formats

• Cloud solution:
  - Data accuracy
  - Secure
  - Highly available
  - Scalable
  - Mobile-enabled

Page 46: Simple Solutions for Complex Problems - Boulder Meetup
Page 47: Simple Solutions for Complex Problems - Boulder Meetup
Page 48: Simple Solutions for Complex Problems - Boulder Meetup

About Workiva

• First solution built on Google App Engine

• Scaling new solutions requires a service-oriented approach

• Scaling new services requires a low-latency communication backplane

Page 49: Simple Solutions for Complex Problems - Boulder Meetup

Why NATS?

Page 50: Simple Solutions for Complex Problems - Boulder Meetup

Availability over everything.

Page 51: Simple Solutions for Complex Problems - Boulder Meetup

Availability over Everything

• Always on, always available

• Protects itself at all costs: no compromises on performance

• Disconnects slow consumers and lazy listeners

• Clients have automatic failover and reconnect logic

• Clients buffer messages while temporarily partitioned
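The last bullet can be sketched as a bounded outgoing buffer. NATS clients implement this internally (the Go client, for example, exposes a `nats.ReconnectBufSize` option); the `BufferedPublisher` type below is a hypothetical, simplified stand-in. Once the buffer is full it fails fast rather than growing without bound, which keeps a long partition from exhausting the client's memory.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBufferFull is returned when the client is partitioned and its
// reconnect buffer has no room left.
var ErrBufferFull = errors.New("reconnect buffer full")

// BufferedPublisher is a hypothetical client wrapper. While disconnected it
// queues outgoing messages up to maxBytes; on reconnect it flushes them in order.
type BufferedPublisher struct {
	connected bool
	maxBytes  int
	used      int
	pending   [][]byte
	send      func([]byte) // the real network write, injected for testing
}

func (p *BufferedPublisher) Publish(msg []byte) error {
	if p.connected {
		p.send(msg)
		return nil
	}
	if p.used+len(msg) > p.maxBytes {
		return ErrBufferFull // fail fast instead of growing without bound
	}
	p.pending = append(p.pending, msg)
	p.used += len(msg)
	return nil
}

func (p *BufferedPublisher) Disconnect() { p.connected = false }

// Reconnect marks the connection healthy and flushes buffered messages in order.
func (p *BufferedPublisher) Reconnect() {
	p.connected = true
	for _, m := range p.pending {
		p.send(m)
	}
	p.pending = nil
	p.used = 0
}

func main() {
	var wire []string
	p := &BufferedPublisher{maxBytes: 10, send: func(b []byte) { wire = append(wire, string(b)) }}

	p.Disconnect()
	fmt.Println(p.Publish([]byte("hello")))  // <nil>
	fmt.Println(p.Publish([]byte("world!"))) // reconnect buffer full
	p.Reconnect()
	fmt.Println(wire) // [hello]
}
```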

Page 52: Simple Solutions for Complex Problems - Boulder Meetup

Simplicity as a feature.

Page 53: Simple Solutions for Complex Problems - Boulder Meetup

Simplicity as a Feature

• Single, lightweight binary

• Embraces the “negative space”:
  - Simplicity → high performance
  - No complicated configuration or external dependencies (e.g., ZooKeeper)
  - No fragile guarantees → face complexity head-on, encourage async

• Simple pub/sub semantics provide a versatile primitive:
  - Fan-in
  - Fan-out
  - Request/response
  - Distributed queueing

• Simple text-based wire protocol
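To show how small the text protocol is, here is a sketch that builds the two most common client operations by hand. The framing follows the NATS client protocol; the subjects (`events.user.created`, `svc.billing`) and the queue group `billing-workers` are made up for illustration. A real client also sends CONNECT, answers the server's PING with PONG, and parses incoming MSG lines.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Pub formats a client PUB operation:
//
//	PUB <subject> [reply-to] <#bytes>\r\n<payload>\r\n
func Pub(subject, replyTo string, payload []byte) string {
	fields := []string{"PUB", subject}
	if replyTo != "" {
		fields = append(fields, replyTo)
	}
	fields = append(fields, strconv.Itoa(len(payload)))
	return strings.Join(fields, " ") + "\r\n" + string(payload) + "\r\n"
}

// Sub formats a client SUB operation; a non-empty queue joins a queue group:
//
//	SUB <subject> [queue group] <sid>\r\n
func Sub(subject, queue, sid string) string {
	fields := []string{"SUB", subject}
	if queue != "" {
		fields = append(fields, queue)
	}
	fields = append(fields, sid)
	return strings.Join(fields, " ") + "\r\n"
}

func main() {
	fmt.Printf("%q\n", Sub("events.>", "", "1"))
	fmt.Printf("%q\n", Sub("svc.billing", "billing-workers", "2"))
	fmt.Printf("%q\n", Pub("events.user.created", "", []byte("hello")))
}
```

Because every frame is plain text with an explicit byte count for the payload, the protocol is easy to implement in a new language and easy to inspect by hand.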

Page 54: Simple Solutions for Complex Problems - Boulder Meetup

Fast as hell.

Page 55: Simple Solutions for Complex Problems - Boulder Meetup

[6] http://bravenewgeek.com/benchmarking-message-queue-latency/

Page 56: Simple Solutions for Complex Problems - Boulder Meetup
Page 57: Simple Solutions for Complex Problems - Boulder Meetup

Fast as Hell

• Fast, predictable performance at scale and at the tail

• ~8 million messages per second

• Auto-pruning of the interest graph allows efficient routing

• When SLAs matter, it’s hard to beat NATS
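Interest-based routing boils down to matching a published subject against the subscriptions each peer has registered. The sketch below implements NATS subject-wildcard semantics (`*` matches one token, `>` matches one or more trailing tokens) with a linear scan. The real server uses a specialized, cached subscription index; `Matches`, `RouteTo`, and the peer names are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Matches reports whether a concrete subject matches a subscription pattern.
// Tokens are separated by '.', '*' matches exactly one token, and '>' (valid
// only as the last token) matches one or more remaining tokens.
func Matches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			return len(s) > i // '>' needs at least one more token
		}
		if i >= len(s) || (tok != "*" && tok != s[i]) {
			return false
		}
	}
	return len(p) == len(s)
}

// RouteTo returns the peers with at least one matching subscription. Peers
// with no interest are pruned, so the message never crosses the wire to them.
func RouteTo(interest map[string][]string, subject string) []string {
	var peers []string
	for peer, patterns := range interest {
		for _, pat := range patterns {
			if Matches(pat, subject) {
				peers = append(peers, peer)
				break
			}
		}
	}
	sort.Strings(peers) // map iteration order is random; sort for stable output
	return peers
}

func main() {
	interest := map[string][]string{
		"vm-a": {"orders.*"},
		"vm-b": {"users.>"},
		"vm-c": {"orders.created", "users.deleted"},
	}
	fmt.Println(RouteTo(interest, "orders.created"))        // [vm-a vm-c]
	fmt.Println(RouteTo(interest, "users.profile.updated")) // [vm-b]
	fmt.Println(RouteTo(interest, "billing.paid"))          // []
}
```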

Page 58: Simple Solutions for Complex Problems - Boulder Meetup

How We Use NATS

• Low-latency service bus

• Pub/Sub

• RPC

Page 59: Simple Solutions for Complex Problems - Boulder Meetup

[Diagram: three web clients connect to a service gateway, which communicates with backend services over NATS]

Page 63: Simple Solutions for Complex Problems - Boulder Meetup

[Diagram: the same architecture with five services behind NATS]

Page 64: Simple Solutions for Complex Problems - Boulder Meetup

[Diagram: web clients → service gateway → NATS → services]

Page 65: Simple Solutions for Complex Problems - Boulder Meetup

[Diagram: services communicating with one another through NATS]

Page 66: Simple Solutions for Complex Problems - Boulder Meetup

Pub/Sub

Page 67: Simple Solutions for Complex Problems - Boulder Meetup

“Just send this thing containing these fields serialized in this way using that encoding to this topic!”

Page 68: Simple Solutions for Complex Problems - Boulder Meetup

“Just subscribe to this topic and decode using that encoding then deserialize in this way and extract these fields from this thing!”

Page 69: Simple Solutions for Complex Problems - Boulder Meetup
Page 70: Simple Solutions for Complex Problems - Boulder Meetup

Pub/Sub is meant to decouple services but often ends up coupling the teams developing them.

Page 71: Simple Solutions for Complex Problems - Boulder Meetup

How do we evolve services in isolation and reduce development overhead?

Page 72: Simple Solutions for Complex Problems - Boulder Meetup

Frugal RPC

• Extension of Apache Thrift

• IDL and cross-language, code-generated pub/sub APIs

• Allows developers to think in terms of services and APIs rather than opaque messages and topics

• Allows APIs to evolve while maintaining compatibility

• Transports are pluggable (we use NATS)

Page 73: Simple Solutions for Complex Problems - Boulder Meetup

struct Event {
    1: i64 id,
    2: string message,
    3: i64 timestamp,
}

scope Events prefix {user} {
    EventCreated: Event
    EventUpdated: Event
    EventDeleted: Event
}

// Generated Go API:
subscriber.SubscribeEventCreated(
    "user-1",
    func(e *event.Event) { fmt.Println(e) },
)

. . .

publisher.PublishEventCreated(
    "user-1",
    event.NewEvent(),
)
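To illustrate what “APIs rather than opaque messages and topics” buys, here is a hand-written approximation of the generated publisher. It is not Frugal's actual generated code: the `Transport` interface, `memTransport`, and `EventsPublisher` are hypothetical, JSON stands in for Thrift serialization, and the `<user>.Events.EventCreated` topic layout is an assumption. The caller only sees a typed method; topic naming and encoding stay inside it, and the transport can be swapped (NATS in production, a map in tests).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event mirrors the Event struct from the IDL.
type Event struct {
	ID        int64  `json:"id"`
	Message   string `json:"message"`
	Timestamp int64  `json:"timestamp"`
}

// Transport is the pluggable piece: NATS in production, an in-memory map here.
type Transport interface {
	Publish(topic string, data []byte) error
}

// memTransport records published payloads per topic.
type memTransport map[string][][]byte

func (m memTransport) Publish(topic string, data []byte) error {
	m[topic] = append(m[topic], data)
	return nil
}

// EventsPublisher is a hypothetical stand-in for generated code.
type EventsPublisher struct{ t Transport }

func (p EventsPublisher) PublishEventCreated(user string, e *Event) error {
	data, err := json.Marshal(e) // JSON stands in for Thrift's wire format
	if err != nil {
		return err
	}
	// Hypothetical topic layout: <prefix variable>.<scope>.<operation>
	return p.t.Publish(user+".Events.EventCreated", data)
}

func main() {
	mem := memTransport{}
	pub := EventsPublisher{t: mem}
	if err := pub.PublishEventCreated("user-1", &Event{ID: 1, Message: "hi"}); err != nil {
		panic(err)
	}
	fmt.Println(string(mem["user-1.Events.EventCreated"][0])) // {"id":1,"message":"hi","timestamp":0}
}
```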

Page 74: Simple Solutions for Complex Problems - Boulder Meetup

RPC over NATS

• Service instances form a queue group

• Client “connects” to an instance by publishing a message to the service queue group

• Serving instance sets up an inbox for the client and sends it back in the response

• Client sends requests to the inbox

• Connecting is cheap: no service discovery and no sockets to create, just a request/response

• Heartbeats used to check the health of server and client

• Very early prototype code: https://github.com/workiva/thrift-nats
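The handshake can be traced with a toy in-memory bus. This is not the thrift-nats prototype: `Bus`, the `svc.echo` subject, and the `_INBOX.<instance>.<n>` naming are assumptions made for illustration, delivery is synchronous, and heartbeats are left out. What it shows is that the queue group hands each connect request to one instance, and every later request goes straight to that instance's inbox.

```go
package main

import "fmt"

// Handler receives a message body and the subject to reply to (if any).
type Handler func(msg, reply string)

// Bus is a toy, synchronous, in-memory stand-in for NATS with plain
// subscriptions and queue groups, just enough to show the message flow.
type Bus struct {
	subs   map[string]Handler   // plain subscriptions (e.g. inboxes)
	queues map[string][]Handler // queue-group members per subject
	next   map[string]int       // round-robin cursor per subject
}

func NewBus() *Bus {
	return &Bus{subs: map[string]Handler{}, queues: map[string][]Handler{}, next: map[string]int{}}
}

func (b *Bus) Subscribe(subject string, h Handler) { b.subs[subject] = h }

func (b *Bus) QueueSubscribe(subject string, h Handler) {
	b.queues[subject] = append(b.queues[subject], h)
}

// Publish delivers to a plain subscriber if there is one, otherwise to exactly
// one queue-group member (round-robin here). With no interest it is dropped.
func (b *Bus) Publish(subject, msg, reply string) {
	if h, ok := b.subs[subject]; ok {
		h(msg, reply)
		return
	}
	members := b.queues[subject]
	if len(members) == 0 {
		return
	}
	i := b.next[subject] % len(members)
	b.next[subject]++
	members[i](msg, reply)
}

// startInstance joins the "svc.echo" queue group; each connect request gets
// a dedicated inbox served by this instance.
func startInstance(b *Bus, name string) {
	conns := 0
	b.QueueSubscribe("svc.echo", func(_, reply string) {
		conns++
		inbox := fmt.Sprintf("_INBOX.%s.%d", name, conns)
		b.Subscribe(inbox, func(req, replyTo string) {
			b.Publish(replyTo, name+": "+req, "")
		})
		b.Publish(reply, inbox, "") // hand the inbox back to the client
	})
}

// connect performs the handshake: one request/response, no sockets to open.
func connect(b *Bus, client string) string {
	var inbox string
	b.Subscribe(client+".handshake", func(msg, _ string) { inbox = msg })
	b.Publish("svc.echo", "connect", client+".handshake")
	return inbox
}

// call sends a request directly to the client's assigned inbox.
func call(b *Bus, client, inbox, req string) string {
	var resp string
	b.Subscribe(client+".resp", func(msg, _ string) { resp = msg })
	b.Publish(inbox, req, client+".resp")
	return resp
}

func main() {
	b := NewBus()
	startInstance(b, "a")
	startInstance(b, "b")

	in1 := connect(b, "c1")
	in2 := connect(b, "c2")
	fmt.Println(in1, in2)                                                    // _INBOX.a.1 _INBOX.b.1
	fmt.Println(call(b, "c1", in1, "ping"), "|", call(b, "c2", in2, "ping")) // a: ping | b: ping
}
```

In real NATS the server chooses which queue-group member receives each message, so clients should not depend on landing on a particular instance.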

Page 75: Simple Solutions for Complex Problems - Boulder Meetup
Page 76: Simple Solutions for Complex Problems - Boulder Meetup

NATS per VM

• Store JSON containing cluster membership in S3

• Container reads the JSON on startup and creates routes with the correct credentials

• Services only talk to the NATS daemon on their VM via localhost

• No need to worry about encryption between services and NATS, only between NATS peers
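The startup step above might look roughly like this. The JSON schema (`user`, `pass`, `port`, `hosts`) and the `Routes` helper are hypothetical; only the `nats-route://user:pass@host:port` URL form and port 6222, the conventional NATS cluster port, come from NATS itself. The resulting URLs would feed the local server's cluster route configuration, skipping the host the container is running on.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// Membership is a hypothetical schema for the cluster file kept in S3.
type Membership struct {
	User  string   `json:"user"`
	Pass  string   `json:"pass"`
	Port  int      `json:"port"`
	Hosts []string `json:"hosts"`
}

// Routes builds authenticated nats-route:// URLs to every peer except self.
func Routes(raw []byte, self string) ([]string, error) {
	var m Membership
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	var routes []string
	for _, h := range m.Hosts {
		if h == self {
			continue // no route to ourselves
		}
		u := url.URL{
			Scheme: "nats-route",
			User:   url.UserPassword(m.User, m.Pass),
			Host:   net.JoinHostPort(h, strconv.Itoa(m.Port)),
		}
		routes = append(routes, u.String())
	}
	return routes, nil
}

func main() {
	raw := []byte(`{"user":"ruser","pass":"s3cret","port":6222,"hosts":["10.0.0.1","10.0.0.2","10.0.0.3"]}`)
	routes, err := Routes(raw, "10.0.0.1")
	if err != nil {
		panic(err)
	}
	for _, r := range routes {
		fmt.Println(r)
	}
	// nats-route://ruser:s3cret@10.0.0.2:6222
	// nats-route://ruser:s3cret@10.0.0.3:6222
}
```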

Page 77: Simple Solutions for Complex Problems - Boulder Meetup

NATS per VM

• Only messages intended for a process on another host go over the network, since the NATS cluster maintains an interest graph

• Greatly reduces network hops (usually 0 vs. 2-3)

• If the local NATS daemon goes down, restart it automatically

Page 78: Simple Solutions for Complex Problems - Boulder Meetup

NATS per VM

• Doesn’t scale to a large number of VMs

• Fairly easy to transition to a floating NATS cluster or to running on a subset of machines per AZ

• NATS communication is abstracted from the service

• Send messages to services without thinking about routing or service discovery

• Queue groups provide service load balancing

Page 79: Simple Solutions for Complex Problems - Boulder Meetup

NATS as a Messaging Backplane

• We’re a SaaS company, not an infrastructure company

• High availability

• Operational simplicity

• Performance

• First-party clients: Go, Java, C, C#, Python, Ruby, Elixir, Node.js

Page 80: Simple Solutions for Complex Problems - Boulder Meetup

Important Corollaries

• Handle failure at the client
  - The less state in your middleware and infrastructure, the easier it is to scale
  - Exponential backoff with jitter

• But never trust the client
  - Rate limits, message size limits, back pressure
  - Be strict in what you accept
  - Limit the failure domain by forcing applications to make design decisions upfront instead of punting

Page 81: Simple Solutions for Complex Problems - Boulder Meetup

Assume every client is trying to DoS you (because they probably are, intentionally or not).

Page 83: Simple Solutions for Complex Problems - Boulder Meetup

“Every solution to every problem is simple… It's the distance between the two where the mystery lies.”

–Derek Landy, Skulduggery Pleasant

Page 84: Simple Solutions for Complex Problems - Boulder Meetup

Thanks!

@tyler_treat

github.com/tylertreat

bravenewgeek.com