PostgreSQL High-Availability and Geographic Locality using consul

Upload: sean-chittenden

Post on 16-Apr-2017


TRANSCRIPT

Page 1: PostgreSQL High-Availability and Geographic Locality using consul

PostgreSQL High-Availability and Geographic Locality using consul

Page 2: PostgreSQL High-Availability and Geographic Locality using consul

Sean Chittenden
Engineering, HashiCorp
@[email protected]
https://keybase.io/seanc

Page 3: PostgreSQL High-Availability and Geographic Locality using consul

Quick Demo

Page 4: PostgreSQL High-Availability and Geographic Locality using consul

[Diagram: Consul running in two datacenters, dc1 and dc2, with one PostgreSQL leader and two PostgreSQL followers]

Page 5: PostgreSQL High-Availability and Geographic Locality using consul

CONSUL

HASHICORP

Page 6: PostgreSQL High-Availability and Geographic Locality using consul

Consul solves four central challenges with SOA

- Service Discovery (HTTP + DNS)
- Host & Service Level Health Checks
- Key Value Store (HTTP API)
- Datacenter Aware

Page 7: PostgreSQL High-Availability and Geographic Locality using consul

Consul Installation

Page 8: PostgreSQL High-Availability and Geographic Locality using consul

Overview

1. Introduction to Consul

2. Review of Consul

a. Architecture

b. Agent Functionality

c. Agent Configuration

d. Features

3. Further Reading

Page 9: PostgreSQL High-Availability and Geographic Locality using consul

Introduction

Page 10: PostgreSQL High-Availability and Geographic Locality using consul

Consul powers runtime orchestration

Page 11: PostgreSQL High-Availability and Geographic Locality using consul

1. Service discovery

2. Service registry

3. Key/value store

4. Health checks

Page 12: PostgreSQL High-Availability and Geographic Locality using consul

Glossary

Agent - Long-running daemon on every member of the Consul cluster. The agent is able to run in either client or server mode.

Client - Agent that forwards all RPCs to a server and participates in the LAN gossip pool.

Server - Agent that maintains cluster state, responds to RPC queries, exchanges WAN gossip with other datacenters, and forwards queries to leaders of remote datacenters.

Consensus - Agreement upon the elected leader.

Page 13: PostgreSQL High-Availability and Geographic Locality using consul

Glossary

Gossip - Random node-to-node communication, primarily over UDP, that provides membership, failure detection, and event broadcast information to the cluster. Built on Serf. Consul has both LAN and WAN gossip.

Datacenter - Networking environment that is private, low latency, and high bandwidth. A Consul cluster is run per datacenter, so it's important to have low latency for the gossip protocol.

Page 14: PostgreSQL High-Availability and Geographic Locality using consul

Consul vs. Other Software

- Opinionated framework for service discovery using DNS or HTTP
- Scalable gossip system that links server nodes and clients
- Distributed health checking with edge-triggered updates
- Globally aware with multi-datacenter support
- Operationally simple
- Incorporation into the HashiCorp ecosystem

Page 15: PostgreSQL High-Availability and Geographic Locality using consul

Architecture

Page 16: PostgreSQL High-Availability and Geographic Locality using consul

Single Datacenter

[Diagram: six clients and three servers. Clients reach the servers over RPC, all agents share the LAN gossip pool, and the servers replicate state among themselves.]

Page 17: PostgreSQL High-Availability and Geographic Locality using consul

Multi-Datacenter

[Diagram: the single-datacenter layout (six clients, three replicating servers, RPC, LAN gossip) plus a second group of three replicating servers in another datacenter, linked to the first by WAN gossip.]

Page 18: PostgreSQL High-Availability and Geographic Locality using consul

Raft Introduction

~/src/raft/thesecretlivesofdata/raft

open index.html

~/src/raft/raftscope

open index.html

Page 19: PostgreSQL High-Availability and Geographic Locality using consul

TCP and UDP Ports

[Diagram: clients, consul1.dc1, consul2.dc1, and consulN.dc2, with each link labeled by the ports it uses]

- Client RPC (HTTP): TCP/8500
- RPC: TCP/8400
- DNS: TCP/8600, UDP/8600
- Server RPC: TCP/8300
- LAN Gossip: TCP/8301, UDP/8301
- WAN Gossip: TCP/8302, UDP/8302

Page 20: PostgreSQL High-Availability and Geographic Locality using consul

Agent functionality (client or server)

- RPC, HTTP, DNS APIs

- Health Checks

- Event Execution

- Gossip Participation

- Membership

- Failure detection

Page 21: PostgreSQL High-Availability and Geographic Locality using consul

Agent functionality (server)

- State replication

- Query Handling

- Leader election

- WAN Gossip

Page 22: PostgreSQL High-Availability and Geographic Locality using consul

Failover via DNS

Page 23: PostgreSQL High-Availability and Geographic Locality using consul

DNS Failover

Pro

• Works across L3 boundaries in LAN environments
• Works across L3 boundaries in WAN environments
• Small TTLs
• Workload Distribution
• Clients cache DNS data
• Not subject to spanning-tree

Con

• Requires TCP connections be reset on failover
• Clients can cache stale DNS data

Page 24: PostgreSQL High-Availability and Geographic Locality using consul

Consul Installation

Page 25: PostgreSQL High-Availability and Geographic Locality using consul

Consul Server 1/3

% cat config.json
{
  "acl_datacenter": "lab1",
  "acl_default_policy": "deny",
  "acl_master_token": "rootToken",
  "addresses": {
    "dns": "0.0.0.0",
    "http": "unix:///tmp/.consul.http.sock",
    "https": "0.0.0.0",
    "rpc": "unix:///tmp/.consul.rpc.sock"
  },
  "bootstrap_expect": 3,
  "datacenter": "lab1",
  "data_dir": "./svc/data",
  "disable_remote_exec": true,

Page 26: PostgreSQL High-Availability and Geographic Locality using consul

Consul Server 2/3

  "dns_config": {
    "allow_stale": true,
    "max_stale": "10080m",
    "node_ttl": "60s",
    "service_ttl": {
      "*": "5s",
      "stable-service": "86400s"
    }
  },
  "encrypt": "[ random mime encoded data ]",
  "log_level": "debug",
  "ports": {
    "https": -1
  },
  "server": true,
  "unix_sockets": {
    "mode": "0700"
  }
}
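The configuration sets `bootstrap_expect` to 3, so the cluster bootstraps once three servers have joined. A minimal sketch of the majority arithmetic behind that choice, assuming standard Raft quorum rules; the function names are illustrative and not part of Consul:

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a Raft peer set with `servers` voting members."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """Servers that can be lost while a majority is still reachable."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"servers={n} quorum={quorum_size(n)} tolerates={failure_tolerance(n)}")
```

With three servers the quorum is two, so the cluster should keep electing leaders and committing writes with one server down.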

Page 27: PostgreSQL High-Availability and Geographic Locality using consul

Consul Server 3/3

% cat svc/run
#!/bin/sh --
set -e
exec 2>&1
exec \
  /usr/bin/env -i \
  ./bin/consul agent \
  -config-file=./config.json \
  -config-dir=./conf.d/

% cat svc/log/run
#!/bin/sh --
set -e
exec 2>&1
exec chpst -u _log:_log svlogd ./main

Page 28: PostgreSQL High-Availability and Geographic Locality using consul

Consul Cluster

% consul members
Node  Address              Status  Type    Build     Protocol  DC
vm1   172.16.139.140:8301  alive   server  0.7.0dev  2         lab1
% consul join 172.16.139.139 172.16.139.138
Successfully joined cluster by contacting 2 nodes.
% consul members
Node  Address              Status  Type    Build     Protocol  DC
vm1   172.16.139.140:8301  alive   server  0.7.0dev  2         lab1
vm2   172.16.139.138:8301  alive   server  0.7.0dev  2         lab1
vm3   172.16.139.139:8301  alive   server  0.7.0dev  2         lab1

Page 29: PostgreSQL High-Availability and Geographic Locality using consul

Consul Cluster

% consul info
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 1
build:
    prerelease = dev
    revision = 'fa26d5f
    version = 0.7.0
consul:
    bootstrap = false
    known_datacenters = 2
    leader = false
    leader_addr = 172.16.139.139:8300
    server = true
[snip]

Page 30: PostgreSQL High-Availability and Geographic Locality using consul

Consul Cluster

% consul info
[snip]
raft:
    applied_index = 103339
    commit_index = 103339
    fsm_pending = 0
    last_contact = 82.95803ms
    last_log_index = 103339
    last_log_term = 50663
    last_snapshot_index = 98437
    last_snapshot_term = 2228
    num_peers = 2
    raft_peers = 172.16.139.139:8300,172.16.139.138:8300,172.16.139.140:8300
    state = Follower
    term = 50663
[snip]

Page 31: PostgreSQL High-Availability and Geographic Locality using consul

dnsmasq Config

% cat /usr/local/etc/dnsmasq.conf
local-service
port=53
server=/consul/127.0.0.1#8600
rev-server=172.16.0.0/12,127.0.0.1#8600
server=208.67.222.222
server=208.67.220.220
cache-size=65536
% cat /etc/resolv.conf
search localdomain
nameserver 127.0.0.1

Page 32: PostgreSQL High-Availability and Geographic Locality using consul

Service Discovery

HTTP + DNS

Page 33: PostgreSQL High-Availability and Geographic Locality using consul

Service Discovery

- Nodes, Services, Checks
- Simple registration (JSON)
- DNS Interface
- HTTP API

Page 34: PostgreSQL High-Availability and Geographic Locality using consul

PostgreSQL Service

% hostname
pg002
% cat config.d/pg-db.json
{
  "service": {
    "name": "pg-db",
    "tags": ["follower"],
    "port": 5432,
    "checks": [{
      "id": "pg-alive",
      "notes": "Make sure connect and queries work",
      "script": "/usr/local/bin/check_postgresql",
      "interval": "10s"
    }]
  }
}

Page 35: PostgreSQL High-Availability and Geographic Locality using consul

$ dig follower.pg-db.service.consul

Page 36: PostgreSQL High-Availability and Geographic Locality using consul

$ dig follower.pg-db.service.consul

; <<>> DiG 9.8.3-P1 <<>> follower.pg-db.service.consul
; (3 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 946
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;follower.pg-db.service.consul. IN A

;; ANSWER SECTION:
follower.pg-db.service.consul. 0 IN A 172.16.139.141

Page 37: PostgreSQL High-Availability and Geographic Locality using consul

$ dig follower.pg-db.service.consul SRV

; <<>> DiG 9.8.3-P1 <<>> follower.pg-db.service.consul SRV
; (3 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 480
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;follower.pg-db.service.consul. IN SRV

;; ANSWER SECTION:
follower.pg-db.service.consul. 0 IN SRV 1 1 5432

Page 38: PostgreSQL High-Availability and Geographic Locality using consul

DNS Interface

- Zero Touch

- Randomized Round-Robin DNS

- Filters on Health Checks
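The names queried with `dig` earlier follow Consul's `[tag.]<service>.service[.<datacenter>].<domain>` layout. A small sketch of building such names on the client side; the helper name `service_dns_name` is hypothetical:

```python
from typing import Optional


def service_dns_name(service: str,
                     tag: Optional[str] = None,
                     datacenter: Optional[str] = None,
                     domain: str = "consul") -> str:
    """Build a Consul service lookup name: [tag.]service.service[.dc].domain"""
    labels = []
    if tag:
        labels.append(tag)          # optional tag filter, e.g. "follower"
    labels.extend([service, "service"])
    if datacenter:
        labels.append(datacenter)   # optional: ask a specific datacenter
    labels.append(domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(service_dns_name("pg-db", tag="follower"))
    print(service_dns_name("pg-db", tag="leader", datacenter="dc2"))
```

Because answers are filtered on health checks, a client that resolves the name again after a failure should only be handed instances that are currently passing.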

Page 39: PostgreSQL High-Availability and Geographic Locality using consul

HTTP API

- HTTP API

- Custom Integrations

Page 40: PostgreSQL High-Availability and Geographic Locality using consul

Host & Service Level Health Checks

Page 41: PostgreSQL High-Availability and Geographic Locality using consul

What is a health check?

Any command that returns an exit code:

0 - PASSING
1 - WARNING
_ - FAILING (any other exit code)
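A minimal sketch of that exit-code convention, assuming the three states shown on the slide (Consul itself reports the last one as "critical"). The helper names are illustrative, and the demo command is just a Python one-liner standing in for a real check script:

```python
import subprocess
import sys


def status_from_exit_code(code: int) -> str:
    """Map a check command's exit code to the slide's three states."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    return "failing"  # any other exit code


def run_check(argv):
    """Run a check command and return (status, combined output)."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return status_from_exit_code(completed.returncode), completed.stdout


if __name__ == "__main__":
    # Stand-in for a real check: a child process that exits with code 1.
    status, output = run_check([sys.executable, "-c", "print('swap in use'); raise SystemExit(1)"])
    print(status, output.strip())
```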

Page 42: PostgreSQL High-Availability and Geographic Locality using consul

Health Checks & Monitoring

- Nagios-compatible

- Scalable

- Actionable

- Edge Triggered

Page 43: PostgreSQL High-Availability and Geographic Locality using consul

Creating a check

Use a custom script

% cat conf.d/mem-check.json
{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "script": "/usr/local/bin/mem_check.sh",
    "interval": "10s"
  }
}

Page 44: PostgreSQL High-Availability and Geographic Locality using consul

Creating a check

Use a built-in check type

% cat conf.d/http-check.json
{
  "check": {
    "id": "api",
    "name": "HTTP API on port 4455",
    "http": "http://localhost:4455/_health",
    "interval": "10s",
    "timeout": "1s"
  }
}

Page 45: PostgreSQL High-Availability and Geographic Locality using consul

Traditional Health Checking (pull)

[Diagram, built up over five slides: a central health checking service polls DB 1, DB 2, ... DB N one at a time. It asks one database "Are you healthy?" and hears "Yessir!", then asks the next "What about you?" and hears "Nah".]

Page 50: PostgreSQL High-Availability and Geographic Locality using consul

Traditional Health Checking (pull)

[Diagram: the health checking service sends 1,000's of requests to DB 1, DB 2, ... DB N.]

Page 51: PostgreSQL High-Availability and Geographic Locality using consul

Consul Health Checking (push)

[Diagram: DB 1, DB 2, ... DB N report to Consul only when something changes: "My status has changed".]

Page 52: PostgreSQL High-Availability and Geographic Locality using consul

Consul Health Checking (push)

[Diagram: Consul receives 10's of requests from DB 1, DB 2, ... DB N.]
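The drop from thousands of requests to tens comes from edge triggering: a check still runs locally on every interval, but a report is pushed only when the result differs from the previous one. A toy sketch of that idea; the class is hypothetical and does not mirror Consul's agent internals:

```python
class EdgeTriggeredReporter:
    """Forward a check result only when it differs from the last one seen."""

    def __init__(self):
        self._last = {}   # check id -> most recent status
        self.sent = []    # updates that would be pushed to the servers

    def observe(self, check_id: str, status: str) -> bool:
        """Record a local check result; return True if an update was pushed."""
        if self._last.get(check_id) == status:
            return False  # no change, nothing leaves the node
        self._last[check_id] = status
        self.sent.append((check_id, status))
        return True


if __name__ == "__main__":
    reporter = EdgeTriggeredReporter()
    results = ["passing"] * 4 + ["failing"] * 2 + ["passing"] * 4
    for status in results:
        reporter.observe("pg-alive", status)
    print(f"{len(results)} local check runs, {len(reporter.sent)} updates pushed")
```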

Page 53: PostgreSQL High-Availability and Geographic Locality using consul

Liveness

- No Heartbeats
- Gossip-based Failure Detector built on Serf
- Constant Load

- Constant Load

Page 54: PostgreSQL High-Availability and Geographic Locality using consul

HTTP UI

http://172.16.139.138:8500/ui/#/lab1/services

Page 55: PostgreSQL High-Availability and Geographic Locality using consul

Key Value Store

HTTP API

Page 56: PostgreSQL High-Availability and Geographic Locality using consul

$ curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo
true

Page 57: PostgreSQL High-Availability and Geographic Locality using consul

$ curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo
true

$ curl http://localhost:8500/v1/kv/foo
[
  {
    "CreateIndex": 100,
    "ModifyIndex": 200,
    "Key": "foo",
    "Flags": 0,
    "Value": "YmFy"
  }
]
% echo -n 'bar' | base64
YmFy
% echo -n 'YmFy' | base64 -d ; echo
bar
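The `Value` field comes back base64-encoded, which is why the slide pipes it through `base64 -d`. A sketch of the same decoding step in client code, using the response body from the slide as a literal string rather than a live HTTP call; `decode_kv` is a hypothetical helper:

```python
import base64
import json

# Response body copied from the slide; a real client would read it from
# GET /v1/kv/foo instead of a literal.
RESPONSE_BODY = """
[
  {
    "CreateIndex": 100,
    "ModifyIndex": 200,
    "Key": "foo",
    "Flags": 0,
    "Value": "YmFy"
  }
]
"""


def decode_kv(body: str) -> dict:
    """Return {key: raw bytes} for every entry in a KV read response."""
    decoded = {}
    for entry in json.loads(body):
        value = entry.get("Value")
        # Defensive: treat a missing or null value as empty bytes.
        decoded[entry["Key"]] = base64.b64decode(value) if value else b""
    return decoded


if __name__ == "__main__":
    print(decode_kv(RESPONSE_BODY))
```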

Page 58: PostgreSQL High-Availability and Geographic Locality using consul

% cat <<EOF > acl.anonymous.json
{
  "ID": "anonymous",
  "Name": "Anonymous Token",
  "Type": "client",
  "Rules": "# Default all keys to read-only
key \"\" {
  policy = \"read\"
}

# Default all services to read-only
service \"\" {
  policy = \"read\"
}

# Allow hearing any user event by default.
event \"\" {
  policy = \"read\"
}

Page 59: PostgreSQL High-Availability and Geographic Locality using consul

# Default prepared queries to read-only.
query \"\" {
  policy = \"read\"
}

# Read-only mode for the encryption keyring by default (list only)
keyring = \"read\""
}
EOF
% curl -v -X PUT -d @acl.anonymous.json --unix-socket /tmp/.consul.http.sock 'http://consul/v1/acl/update?token=rootToken'

Page 60: PostgreSQL High-Availability and Geographic Locality using consul

Prepared Queries

Page 61: PostgreSQL High-Availability and Geographic Locality using consul

Use Case

• Multiple instances of a given service exist in multiple datacenters

• Clients can talk to any of them, and always prefer the instances with lowest latency

• Policies can change, so clients should not need to know the details of how to locate a healthy service

Page 62: PostgreSQL High-Availability and Geographic Locality using consul

Prepared Queries

• New query namespace, similar to services

• Register queries to answer for parts of this namespace

• Clients use APIs, or “.query.consul” DNS lookups to run queries

• Magic happens :-)

Page 63: PostgreSQL High-Availability and Geographic Locality using consul

pg-db with Failover

$ curl -X POST -d \
'{
  "Name": "geo-pg-db-follower",
  "Service": {
    "Service": "pg-db",
    "Failover": {
      "NearestN": 3
    },
    "Tags": ["follower"]
  }
}' localhost:8500/v1/query

geo-pg-db-follower.query.consul
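A rough sketch of what `"NearestN": 3` asks for: answer from the local datacenter when it has healthy instances, otherwise try up to N remote datacenters in order of estimated round trip time. The function and the sample data are illustrative assumptions, not Consul's implementation:

```python
def resolve_with_failover(local_dc, healthy_by_dc, rtt_ms_by_dc, nearest_n):
    """Return (datacenter, instances) for the first place with healthy instances.

    healthy_by_dc: datacenter -> list of healthy instance addresses
    rtt_ms_by_dc:  datacenter -> estimated round trip time from local_dc
    """
    local = healthy_by_dc.get(local_dc, [])
    if local:
        return local_dc, local

    # Remote datacenters, nearest first, capped at nearest_n candidates.
    remotes = sorted(
        (dc for dc in rtt_ms_by_dc if dc != local_dc),
        key=lambda dc: rtt_ms_by_dc[dc],
    )
    for dc in remotes[:nearest_n]:
        instances = healthy_by_dc.get(dc, [])
        if instances:
            return dc, instances
    return None, []


if __name__ == "__main__":
    # Hypothetical state: every follower in dc1 is failing its check.
    healthy = {"dc1": [], "dc2": ["10.2.0.7:5432"], "dc3": ["10.3.0.9:5432"]}
    rtt_ms = {"dc1": 0.5, "dc2": 12.0, "dc3": 80.0}
    print(resolve_with_failover("dc1", healthy, rtt_ms, nearest_n=3))
```

The client keeps using one stable name while the choice of datacenter is made on the server side.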

Page 64: PostgreSQL High-Availability and Geographic Locality using consul

PostgreSQL Template

$ curl -X POST -d \
'{
  "Name": "geo-db",
  "Template": {
    "Type": "name_prefix_match",
    "Regexp": "^geo-db-(.*?)-([^\\-]+?)$"
  },
  "Service": {
    "Service": "pg-${match(1)}",
    "Failover": {
      "NearestN": 3,
      "Datacenters": ["dc1", "dc2"]
    },
    "OnlyPassing": true,
    "Tags": ["${match(2)}"]
  }
}' localhost:8500/v1/query

geo-db-customer-leader.query.consul -> leader.pg-customer.service.consul
geo-db-customer-follower.query.consul -> follower.pg-customer.service.consul
geo-db-billing-follower.query.consul -> follower.pg-billing.service.consul
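To see how the template's `Regexp` splits a query name into `${match(1)}` and `${match(2)}`, the pattern can be tried outside Consul. This sketch uses Python's `re` module as a stand-in for Consul's Go regular expressions, which behave the same way for this particular pattern; `expand` is a hypothetical helper that prints the equivalent service lookup name:

```python
import re

# Same pattern as the template's "Regexp" field, with JSON escaping removed.
TEMPLATE_REGEXP = re.compile(r"^geo-db-(.*?)-([^\-]+?)$")


def expand(query_name: str):
    """Map a query name to the tag.service name the template would target."""
    match = TEMPLATE_REGEXP.match(query_name)
    if match is None:
        return None
    database, role = match.group(1), match.group(2)
    # "Service": "pg-${match(1)}", "Tags": ["${match(2)}"]
    return f"{role}.pg-{database}.service.consul"


if __name__ == "__main__":
    for name in ("geo-db-customer-leader",
                 "geo-db-customer-follower",
                 "geo-db-billing-follower"):
        print(f"{name}.query.consul -> {expand(name)}")
```

The last hyphen-separated label becomes the tag and everything between the prefix and that label becomes the database name, so a name such as `geo-db-my-db-leader` would resolve to service `pg-my-db` with tag `leader`.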

Page 65: PostgreSQL High-Availability and Geographic Locality using consul

Catch All Template

$ curl -X POST -d \
'{
  "Name": "",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.full}",
    "Failover": {
      "NearestN": 3
    }
  }
}' localhost:8500/v1/query

*.query.consul

With a single query template, all services can fail over to the nearest healthy service in a different datacenter!

Page 66: PostgreSQL High-Availability and Geographic Locality using consul

Under the Hood: Network Tomography

• Rides on pings that are part of LAN and WAN gossip

• Models networking round trip time using simple physics simulation with masses and springs

• Develops a set of “network coordinates” for round trip time estimation with a simple calculation
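The "simple calculation" can be sketched as a distance between two coordinates. This assumes a Vivaldi-style coordinate with a Euclidean vector, a height term, and an adjustment term, which is how Consul's network coordinate documentation describes it as far as I can tell; the `Coordinate` class and its field names here are illustrative:

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Coordinate:
    vec: Tuple[float, ...]     # position in the Euclidean part of the space (seconds)
    height: float = 0.0        # per-node access-link latency (seconds)
    adjustment: float = 0.0    # per-node correction learned from observations (seconds)


def estimated_rtt(a: Coordinate, b: Coordinate) -> float:
    """Estimate the round trip time between two nodes, in seconds."""
    if len(a.vec) != len(b.vec):
        raise ValueError("coordinates must have the same dimensionality")
    euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a.vec, b.vec)))
    rtt = euclidean + a.height + b.height
    adjusted = rtt + a.adjustment + b.adjustment
    # Only apply the adjustment when it keeps the estimate positive.
    return adjusted if adjusted > 0 else rtt


if __name__ == "__main__":
    dc1 = Coordinate(vec=(0.0, 0.0), height=0.001)
    dc2 = Coordinate(vec=(0.003, 0.004), height=0.001)
    print(f"estimated RTT: {estimated_rtt(dc1, dc2) * 1000:.1f} ms")
```

No extra probing is needed at query time: the estimate is computed from coordinates that gossip has already produced.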

Page 67: PostgreSQL High-Availability and Geographic Locality using consul

Under the Hood: Network Tomography

Page 68: PostgreSQL High-Availability and Geographic Locality using consul

Consul

Conclusion

Page 69: PostgreSQL High-Availability and Geographic Locality using consul

Consul solves four central challenges with SOA

- Service Discovery (HTTP + DNS)
- Host & Service Level Health Checks
- Key Value Store (HTTP API)
- Datacenter Aware

Page 70: PostgreSQL High-Availability and Geographic Locality using consul

Further reading

- Consul vs. Other Software: consul.io/intro/vs/index.html
- Consul Agent: consul.io/docs/agent/basics.html
- Consul Commands: consul.io/docs/commands/index.html
- Consul Internals: consul.io/docs/internals/index.html

Page 71: PostgreSQL High-Availability and Geographic Locality using consul

Questions?