DockerCon EU 2015: DNS Service Discovery for Docker Swarm

DNS Service Discovery for Docker Swarm, Ahmet Alp Balkan (@AhmetAlpBalkan), Software Engineer, Microsoft

Upload: docker-inc

Post on 16-Apr-2017


DNS Service Discovery for Docker Swarm

Ahmet Alp Balkan, @AhmetAlpBalkan

Software Engineer, Microsoft

Slides are available at aka.ms/srv-dsc

About The Speaker


I am Ahmet Alp Balkan, a software engineer at Microsoft.

I contribute to Open Source.

Follow me at @AhmetAlpBalkan.

About This Talk

Service Discovery

Service Discovery Methods: a peek into various solutions, thought exercises

Service Discovery for Docker Swarm: can a drop-in tool just do it™ for Swarm?

Where are we headed?

Before we begin


how many of you…

… use Docker Swarm?

… used a Service Discovery method?

… wrote or configured a DNS server?

Service Discovery

[Diagram, built up over ten slides: a cluster of eight machines (machine1 to machine8) running three db, three api and two web containers, plus five containers labelled stuff.]
Service discovery: help services find and talk to each other.

[Diagram, built up over four slides: ServiceA needs the address (addr) of ServiceB, and then has to choose among several instances of ServiceB.]

Service Discovery Methods

Before we begin…

SPOILER: No method is actually really good.

This is still an unsolved problem.

Thought exercise, not a comparison.

Service Discovery in my dreams…

it comes with the orchestrator,

…or it is “set up and forget about it”,

does not infect the application code with the service discovery concern,

uses a reliable networking stack,

does not have too many moving parts.

Common Approaches to Service Discovery

Overlay Networks

Mixing Tools (docker bridge + template + reverse proxy)

Port Scanning

Domain Name System (DNS)

Overlay Networks

[Diagram: ServiceA on node0 reaches ServiceB on node1 at serviceB:80 through the overlay network (*magic*).]

Good at container-to-container networking

Static port allocation is not a problem: one IP address per container

Seamless, does not change the application code

Introduces network latency overhead*

Docker 1.9 Multi-Host Networking


Container-to-container overlay network

Discovery* through /etc/hosts entries (DNS)

Lacking a load balancer (how to http://serviceA?)

Mixin’ Tools (because we can)

docker bridge: interlock, registrator

template: interlock, confd, consul-template

reverse proxy: HAProxy, NGINX, Træfik

service registry: etcd, consul, zookeeper

Reverse Proxies (TCP/IP Load Balancers)

HAProxy, NGINX, Træfik, kube-proxy…

Route and load balance traffic to multiple backends: can route traffic from/to any port, and the list of backends can be dynamically updated

[Diagram: ServiceA → ServiceB proxy → ServiceB instances at node1:32012, node2:33406 and node7:32104.]
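As a rough illustration of "a list of backends that can be dynamically updated", here is a minimal round-robin sketch. The class name `BackendPool` and its methods are made up for this example and are not how HAProxy, NGINX or any other proxy is implemented.

```python
import threading


class BackendPool:
    """Round-robin over (host, port) backends; the list can be swapped at runtime."""

    def __init__(self, backends):
        self._lock = threading.Lock()
        self._backends = list(backends)
        self._cursor = 0

    def update(self, backends):
        """Replace the backend list, e.g. after a container starts or stops."""
        with self._lock:
            self._backends = list(backends)
            self._cursor = 0

    def pick(self):
        """Return the next backend in rotation."""
        with self._lock:
            if not self._backends:
                raise RuntimeError("no backends available")
            backend = self._backends[self._cursor % len(self._backends)]
            self._cursor += 1
            return backend
```

A proxy would call `pick()` once per incoming connection, and whatever watches the cluster would call `update()` when containers come and go.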

Load balancing

Health checks: do not route traffic to unhealthy backends

[Diagram: the ServiceB proxy sends health probes to the ServiceB instances at node1:32012, node2:33406 and node7:32104.]
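A health probe can be as simple as "does a TCP connect succeed?". The sketch below assumes exactly that; real proxies usually offer richer checks (HTTP status, intervals, failure thresholds). The function names are illustrative only.

```python
import socket


def is_healthy(host, port, timeout=1.0):
    """TCP health probe: the backend is healthy if a connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_backends(backends, probe=is_healthy):
    """Keep only the (host, port) backends that pass the probe."""
    return [backend for backend in backends if probe(*backend)]
```

The `probe` parameter is there so a different check (or a fake one in tests) can be swapped in without touching the filtering logic.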

Sticky sessions: route a client’s traffic to the same backend between requests/connections

Origin-based access control (ACLs)

[Diagram: ServiceA and ServiceC each connect to the ServiceB proxy; the addresses shown are 172.0.1.10 and 172.0.1.22.]
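One simple way to get stickiness is to hash the client's source address onto the backend list. This is only a sketch of that idea (cookies are another common approach), and `sticky_backend` is a name invented here.

```python
import hashlib


def sticky_backend(client_ip, backends):
    """Map a client address to one backend, the same one on every call.

    Assumes the backend list is unchanged between calls; if it changes,
    clients may be remapped.
    """
    if not backends:
        raise ValueError("no backends available")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```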

Connection draining*: wait for all connections to close before removing the backend from the routing list

Allows blue-green deployments: flip the switch → the new version of your service starts getting traffic
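Connection draining boils down to two rules: a draining backend gets no new connections, and it is removed only once its open connections reach zero. A minimal bookkeeping sketch of those rules follows; the `DrainingPool` class is hypothetical, not any proxy's API.

```python
class DrainingPool:
    """Track open connections per backend and drain backends gracefully."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}
        self.draining = set()

    def acquire(self):
        """Pick the least-loaded backend that is not draining."""
        candidates = [b for b in self.active if b not in self.draining]
        if not candidates:
            raise RuntimeError("no backends accepting connections")
        backend = min(candidates, key=lambda b: self.active[b])
        self.active[backend] += 1
        return backend

    def release(self, backend):
        """Record a closed connection; remove the backend if it has drained."""
        self.active[backend] -= 1
        self._remove_if_drained(backend)

    def drain(self, backend):
        """Stop sending new connections to this backend."""
        self.draining.add(backend)
        self._remove_if_drained(backend)

    def _remove_if_drained(self, backend):
        if backend in self.draining and self.active[backend] == 0:
            del self.active[backend]
            self.draining.discard(backend)
```

For a blue-green switch, the old version's backends would be drained while the new version's backends are added to the pool.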

Downside: another moving part that can fail. What if the proxy server crashes?

Downside: discovery of the proxy server itself. Where do you place the proxy server(s)? What happens when they get rescheduled to another host? How do you discover proxy servers?

Downside: introduces latency overhead


Bridging Reverse Proxies with Docker

interlock

registrator

Interlock by @ehazlett

Discovers new containers through the Docker Events API: on container events, updates NGINX/HAProxy

Plugin model: write your own event handler

[Diagram: docker engine sends start, stop and die events to interlock; interlock gets the container details, updates nginx.conf and sends SIGHUP to nginx.]
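The event-to-config flow in the diagram can be sketched as two small steps: fold each event into a set of backends, then render an NGINX `upstream` block from that set. This is an illustration, not Interlock's code: the event is assumed to be a dict with `status` and `id` keys, and `address_of` is a hypothetical lookup from container id to (host, port). After writing the rendered text to nginx.conf, the proxy would be told to reload (the SIGHUP step).

```python
def apply_event(backends, event, address_of):
    """Return a new backend set after one container event."""
    updated = set(backends)
    if event["status"] == "start":
        updated.add(address_of(event["id"]))
    elif event["status"] in ("stop", "die"):
        updated.discard(address_of(event["id"]))
    return updated


def render_upstream(name, backends):
    """Render an NGINX upstream block listing every backend."""
    lines = [f"upstream {name} {{"]
    for host, port in sorted(backends):
        lines.append(f"    server {host}:{port};")
    lines.append("}")
    return "\n".join(lines) + "\n"
```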

Registrator by @progrium

Discovers new containers through the Docker Events API

Writes service definitions to consul/etcd

[Diagram: docker engine sends start, stop and die events to registrator; registrator gets the containers and saves them to consul.]

You can then use consul-template/confd to update the haproxy/nginx backends list

[Diagram: registrator saves the containers to consul; consul-template watches consul and updates nginx.conf for nginx.]


Mixin’ Tools

Far too many moving parts

How do you deploy these components in an HA setup?

You still have N points of failure & additional latency

The connection draining feature is a lie …unless the orchestrator coordinates with the reverse proxy. Stopping the container will just drop the connections.

Connection draining done right

kube-proxy handles load balancing in Kubernetes.

When you stop a pod, it is not stopped right away.

Remaining open connections stay alive for T (T = grace period, configurable).

Also pre-stop/post-start hooks for containers in pods.

“Zero downtime rolling upgrades in 1M requests/sec”
http://blog.kubernetes.io/2015/11/one-million-requests-per-second-dependable-and-dynamic-distributed-systems-at-scale.html

Port Scanning (nmap)

Port Scanning in Overlay Networks

by Jeff Nickoloff (github.com/allingeek/nmap-sd)

Add connected containers to a network (such as Docker 1.9 overlay driver)

Scan open ports in the network’s subnet periodically (as long as your subnet is small, it’s very reasonable)

Report accessible ports to a file (bind volume)

Refresh reverse proxy config, route the traffic!
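The scanning step can be approximated with plain TCP connects. The sketch below is not how nmap-sd works internally (it uses nmap); it only shows the idea of probing every host/port pair in a small subnet and collecting the ones that accept connections.

```python
import socket


def scan(hosts, ports, timeout=0.2):
    """Return the (host, port) pairs that accept a TCP connection."""
    open_endpoints = []
    for host in hosts:
        for port in ports:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                # connect_ex returns 0 on success instead of raising.
                if sock.connect_ex((str(host), port)) == 0:
                    open_endpoints.append((str(host), port))
    return open_endpoints
```

The hosts could come from something like `ipaddress.ip_network("10.0.0.0/28").hosts()` (an example subnet, not one from the talk). The cost grows with hosts × ports, which is why a small subnet matters.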

DNS (Domain Name System)

Motives for DNS

Started in 1984, roughly at the same time as TCP/IP

Humans suck at remembering IP addresses: google.com → 2a00:1450:4003:806::200e

and IP addresses do not stick around forever

Can this 30-year-old tech save us?

DNS Service Discovery

[Diagram, over two slides: ServiceA asks DNS for the address (addr) of ServiceB, gets IPs back, and then connects to ServiceB using those IPs.]

Intro to DNS Resource Records

Type A/AAAA records: <hostname> → <IP>

$ dig A +short docker.com.
52.7.79.61
52.22.96.108
54.84.192.71

ugly truth: has no port information, can’t support dynamic port-assigned containers :(

Type SRV records: <hostname> → <IP, port, weight>

$ dig SRV +short _database._tcp.local.
1 1 32770 192.168.0.4
1 1 32769 192.168.0.7
1 1 32801 192.168.0.6

ugly truth: SRV is neither used anywhere, nor getting adopted.

new MySQLDriver(“_database._tcp.local”)

ain’t happenin'

Bad News

SRV is cool but not getting any adoption at all.

We are left with A/AAAA records = IP addresses. Works if all your instances are on static ports (such as docker run -p 80:80). When you do dynamic ports (docker run -P), you need to resolve the port from the SRV record:

host, port = resolveSRV(“_database._tcp.local”)
… = new MySQLDriver(host, port)

you don’t want to do this all the time
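To make that burden concrete, here is roughly what each client would have to carry around. The sketch parses answers in the `priority weight port target` shape that `dig SRV +short` prints and then picks one record: lowest priority first, then weighted at random. It does not perform the DNS query itself, and `SrvRecord`, `parse_srv` and `pick_srv` are names invented for this example.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(output):
    """Parse lines of the form 'priority weight port target'."""
    records = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        priority, weight, port, target = parts
        records.append(
            SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))
        )
    return records


def pick_srv(records, rng=random):
    """Choose among the lowest-priority records, weighted by their weight."""
    if not records:
        raise LookupError("no SRV records")
    lowest = min(record.priority for record in records)
    candidates = [record for record in records if record.priority == lowest]
    weights = [record.weight for record in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

The driver would then be handed `record.target` and `record.port`, which is exactly the extra step most client libraries do not take on their own.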

DNS

Advantage: very simple, far fewer moving parts

Disadvantage: goodbye dynamic port allocation

Advantage: reduces load on middleware (DNS TTL)

Disadvantage: some languages* do not obey TTLs

Advantage: uses existing network stack

Disadvantage: no resilient way to do health checks

Advantage: load balancing by shuffling IPs :)
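The TTL points cut both ways: caching answers for their TTL takes load off the DNS server, but a client that caches forever keeps talking to instances that are gone. A minimal sketch of a cache that honours the TTL is below. `resolve` is a hypothetical callable returning `(addresses, ttl_seconds)`, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class TtlCache:
    """Cache DNS answers, but only for as long as their TTL allows."""

    def __init__(self, resolve, clock=time.monotonic):
        self._resolve = resolve  # name -> (addresses, ttl_seconds)
        self._clock = clock
        self._entries = {}  # name -> (addresses, expiry time)

    def lookup(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: no query needed
        addresses, ttl = self._resolve(name)
        self._entries[name] = (addresses, now + ttl)
        return addresses
```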

Mesos-DNS

github.com/mesosphere/mesos-dns

Deploy once, forget about it

Designed for Apache Mesos

Queries /state.json periodically to discover new tasks

[Diagram: mesos-dns syncs task state from the mesos-master nodes and answers DNS queries with DNS records.]

Stateless: easy to replicate and make it HA

Provides an HTTP REST API: for example, write your own SRV router without doing any SRV calls

Many features: IPSec, SOA/SRV/A records, DNS over TCP…

SkyDNS

github.com/skynetservices/skydns2

Very similar to Mesos-DNS.

Closely coupled to etcd.

Really complicated, probably does everything. Kinda hard to set up, too.

Embraces a plugin model, but the only plugin is etcd.

Used by Kubernetes as the default DNS add-on.

wagl

Minimalistic DNS Service Discovery for Docker Swarm

Service Discovery in my dreams…

it comes with the orchestrator,

…or it is “set up and forget about it”,

does not infect the application code with the service discovery concern,

uses a reliable networking stack,

does not have too many moving parts.

wagl


github.com/ahmetalpbalkan/wagl

Inspired by Mesos-DNS, built for Swarm ♥

Install and forget about it

Stateless, easy to replicate

Speaks Docker language, runs inside container

Minimalistic feature set, because I’m lazy

wagl architecture

[Diagram: wagl talks to swarm (events, Get Containers, refresh periodically); a service sends DNS queries to wagl and gets DNS records back.]
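The "refresh periodically" loop is what keeps such a server stateless: the whole record table is rebuilt from the cluster on every pass, so a replica can be restarted or added at any time. The following sketch shows that loop in general terms. It is not wagl's actual code (wagl is a separate program), and `list_records` stands in for whatever call fetches the containers and turns them into a name-to-addresses mapping.

```python
import threading


class RecordTable:
    """In-memory DNS records that are swapped wholesale on each refresh."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    @staticmethod
    def _normalize(name):
        return name.lower().rstrip(".")

    def replace(self, records):
        fresh = {self._normalize(k): list(v) for k, v in records.items()}
        with self._lock:
            self._records = fresh

    def lookup(self, name):
        with self._lock:
            return list(self._records.get(self._normalize(name), []))


def refresh_forever(table, list_records, interval, stop):
    """Rebuild the table every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            table.replace(list_records())
        except Exception:
            pass  # keep serving the previous records if a refresh fails
        stop.wait(interval)
```

`refresh_forever` would normally run in a background thread while the DNS handler calls `lookup()`.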

wagl Placement

[Diagram: master0, master1 and master2 each run a swarm manager and wagl; node0, node1 … nodeN run the workloads (stuff).]

docker run --dns master0 --dns master1 --dns master2 …

Service Naming

Using “Docker Labels”

docker run -p 80:80 \
    -l dns.service=api \
    nginx

http://api.swarm.

More labels…

docker run -p 80:80 \
    -l dns.service=api \
    -l dns.domain=billing \
    nginx

http://api.billing.swarm.
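The label-to-name mapping can be sketched as a small function. This is an illustration of the naming scheme, not wagl's source: it assumes a required `dns.service` label, an optional second label for the domain (called `dns.domain` here), and a fixed `swarm` zone suffix.

```python
def dns_name(labels, zone="swarm"):
    """Build a fully qualified name from container labels, or None if unnamed."""
    service = labels.get("dns.service")
    if not service:
        return None  # containers without a service label get no record
    parts = [service]
    domain = labels.get("dns.domain")
    if domain:
        parts.append(domain)
    parts.append(zone)
    return ".".join(parts) + "."
```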

Features


Only DNS A/SRV records

Natural Load Balancing by shuffling DNS records

External DNS recursion

Works well with Docker TLS authentication
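"Natural load balancing" here just means returning the same set of addresses in a different order on each query, so clients that take the first answer spread out across instances. A minimal sketch, with `answer` as an illustrative name and `records` as a plain name-to-addresses dict:

```python
import random


def answer(records, name, rng=random):
    """Return the addresses for a name in random order."""
    addresses = list(records.get(name, []))
    rng.shuffle(addresses)
    return addresses
```

The spread is only statistical: there is no health checking or load awareness behind it.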

Deploying wagl

Just run:

docker run -d --restart=always --name=dns \
    -p 53:53/udp \
    --link=swarm-master:swarm \
    ahmet/wagl \
    wagl --swarm tcp://swarm:3376

If it can get any easier, it means I have failed.

wagl in Action (demo time)

Embrace & Contribute

Source code: https://github.com/AhmetAlpBalkan/wagl

Where are we headed?

These are just baby steps (expect innovation here)

We need a complete and seamless solution

The solution will not change the application code

A combination of DNS + Reverse Proxy can be it

Watch for what orchestrators are going to adopt

Find more at github.com/AhmetAlpBalkan/wagl

Thank you! Ahmet Alp Balkan, @AhmetAlpBalkan