deep dive into cluster manager in couchbase server 4.0: couchbase connect 2015


Post on 11-Aug-2015


DEEP DIVE INTO CLUSTER MANAGER IN 4.0

Dave Finlay, Snr Director Server Engineering, Couchbase

©2015 Couchbase Inc. 2

Cluster Manager

What is this “Cluster Manager” of which you speak?


First: We Need a Cluster!

machine 1 machine 2 machine 3

Ethernet


The “Depth Gauge”

cluster

node

os process

erlang process

The “depth gauge” indicates the current zoom-level as we dive into Cluster Manager


Back to the Cluster

machine 1 machine 2 machine 3

Ethernet

Couchbase Node

Couchbase Node

Couchbase Node

node 1 node 2 node 3

This is the Cluster Manager


What Does the Cluster Manager Do?

manage services

manage data topology

manage replications

provide metadata service

auth / security

REST API

UI

stats, monitoring, simple alerting


Anatomy of a Node

machine 1

babysitter, query, indexer, memcached, ns-server, xdcr, view-engine, other …

The Cluster Manager is babysitter and ns-server

Memcached serves key-value memcached requests & is deployed on “data service” nodes

XDCR, view-engine and some other services are also deployed on “data nodes”

XDCR and view-engine were previously part of ns-server. They’ve been carved out for improved robustness

Indexer is deployed on nodes with the “indexing service”

Query is deployed on “query nodes” and serves N1QL requests

Oh no, it’s the Chaos Monkey!

machine 1

babysitter, query, indexer, memcached, ns-server, xdcr, view-engine, other …

xdcr


Inside ns-server

per-node-&-bucket services

generic distributed facilities

generic local facilities

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services


A Little Bit About Erlang

Developed in the 80’s for use in switches

Such as this: the famous AXD 301 switch, which boasted ridiculous reliability

(Even allowing for hyperbole, it was pretty reliable)


What’s Cool About Erlang?

According to “Four-fold Increase in Productivity and Quality: Industrial-Strength Functional Programming in Telecom-Class Products”, http://www.erlang.se/publications/Ulf_Wiger.pdf


Key Feature of Erlang #1: Processes & Messages

process

Active, “lightweight processes”

Supports “actor model” style of programming

Processes have a mailbox for incoming messages

process Q

process P, mailbox: x y z

Values are immutable (no shared state)

Communication is asynchronous

P ! x, P ! y, P ! z.

“Selective receive”

receive y -> Q ! a end.

[Diagram sequence: process P's mailbox holds x, y, z; P's selective receive takes y and sends a to process Q; x and z remain in P's mailbox. The final frame also shows a message b next to process P.]
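The mailbox behaviour on these slides can be tried out with a minimal sketch along the following lines. The module name, the `done` marker and the `{leftover, ...}` reply are assumptions added for illustration so that the result is deterministic; only the `P ! x, P ! y, P ! z` sends and the `receive y -> Q ! a end` come from the slides.

```erlang
-module(selective_receive_demo).
-export([run/0]).

%% The calling process plays Q. It spawns P, sends x, y, z, and then a
%% hypothetical 'done' marker so that P knows all three messages have arrived.
run() ->
    Q = self(),
    P = spawn(fun() -> p(Q) end),
    P ! x, P ! y, P ! z,
    P ! done,
    Reply = receive a -> a after 1000 -> timeout end,
    Leftover = receive {leftover, L} -> L after 1000 -> timeout end,
    {Reply, Leftover}.

%% P picks y out of its mailbox even though x arrived first, replies with a,
%% then reports whatever is still waiting in its mailbox.
p(Q) ->
    receive y -> Q ! a end,
    receive done -> ok end,
    Q ! {leftover, drain([])}.

%% Empties the mailbox in arrival order without blocking.
drain(Acc) ->
    receive
        Msg -> drain([Msg | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.
```

Calling `selective_receive_demo:run()` from the shell should show that y was consumed while x and z stayed queued in P's mailbox, which is the state the later diagram frames depict.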

Key Feature of Erlang #2: OTP

Framework that provides useful higher-level constructs built from Erlang primitives

Such as gen_server, a generic, stateful, startable and stoppable server
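To make "generic, stateful, startable and stoppable" concrete, here is a minimal gen_server sketch. The `counter_server` module and its API are hypothetical and are not taken from ns_server; the `gen_server` calls themselves are standard OTP (`gen_server:stop/1` needs OTP 18 or later).

```erlang
-module(counter_server).
-behaviour(gen_server).

%% Client API (hypothetical names, for illustration only)
-export([start_link/0, increment/1, value/1, stop/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link() -> gen_server:start_link(?MODULE, 0, []).

%% Asynchronous request: the caller does not wait for a reply.
increment(Pid) -> gen_server:cast(Pid, increment).

%% Synchronous request: the caller blocks until the server replies.
value(Pid) -> gen_server:call(Pid, value).

stop(Pid) -> gen_server:stop(Pid).

%% The server state is just the current count.
init(Initial) -> {ok, Initial}.

handle_call(value, _From, Count) -> {reply, Count, Count}.

handle_cast(increment, Count) -> {noreply, Count + 1}.
```

The state lives only in the arguments passed from one callback to the next, so the "no shared state" rule from the earlier slides still holds.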

supervisor

gen_server1, gen_server2, gen_server3

Also such as supervisor, a process responsible for starting, restarting, stopping and monitoring its child processes
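A supervisor's restart behaviour can be sketched as below. Everything named here (`demo_sup`, the `worker1`..`worker3` ids, `kill_and_wait/2`) is hypothetical, and the children are plain linked processes standing in for the gen_servers in the diagram. The sketch assumes OTP 18 or later for map-based child specs.

```erlang
-module(demo_sup).
-behaviour(supervisor).

-export([start_link/0, start_worker/0, child_pid/2, kill_and_wait/2]).
-export([init/1]).

start_link() -> supervisor:start_link(?MODULE, []).

%% one_for_one: only the child that died is restarted.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Children = [#{id => Id, start => {?MODULE, start_worker, []}}
                || Id <- [worker1, worker2, worker3]],
    {ok, {SupFlags, Children}}.

%% A trivial child: a linked process that waits for a stop message.
start_worker() ->
    {ok, spawn_link(fun worker_loop/0)}.

worker_loop() ->
    receive stop -> ok end.

%% Looks up the current pid of a child by its id.
child_pid(Sup, Id) ->
    {Id, Pid, _Type, _Mods} =
        lists:keyfind(Id, 1, supervisor:which_children(Sup)),
    Pid.

%% Plays the chaos monkey: kills one child, then polls until the
%% supervisor reports a replacement pid for the same id.
kill_and_wait(Sup, Id) ->
    Old = child_pid(Sup, Id),
    exit(Old, kill),
    wait_for_new(Sup, Id, Old, 50).

wait_for_new(_Sup, _Id, _Old, 0) -> timeout;
wait_for_new(Sup, Id, Old, Tries) ->
    case child_pid(Sup, Id) of
        New when is_pid(New), New =/= Old -> {restarted, Old, New};
        _ -> timer:sleep(20), wait_for_new(Sup, Id, Old, Tries - 1)
    end.
```

After `{ok, Sup} = demo_sup:start_link()`, calling `demo_sup:kill_and_wait(Sup, worker2)` should return a `restarted` tuple with two different pids, while the other two children keep running untouched.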

Oh no, it’s the Chaos Monkey!

supervisor

gen_server1, gen_server2, gen_server3

gen_server2


Inside ns-server

ns_server

The ns_server ‘application’.

Applications describe the bundle of code that is started and stopped together.

The application is responsible for starting the top level supervisor: ns_server_cluster_sup

ns_server_cluster_sup supervises all supervisors and, by extension, all Erlang processes in ns_server
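The relationship between an application and its top-level supervisor can be sketched as follows. This is not ns_server's code: `demo_app`, `demo_cluster_sup` and `run/0` are hypothetical names, and the application is loaded from an in-memory spec only so that the sketch runs without a separate `.app` file.

```erlang
-module(demo_app).
-behaviour(application).
-behaviour(supervisor).

-export([start/2, stop/1, init/1, run/0]).

%% application callback: start the top-level supervisor and return its pid.
start(_StartType, _StartArgs) ->
    supervisor:start_link({local, demo_cluster_sup}, ?MODULE, []).

stop(_State) -> ok.

%% supervisor callback: an empty top-level supervisor. In a real system its
%% children would be further supervisors.
init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 3, period => 10}, []}}.

%% Loads, starts and stops the application, reporting whether the top-level
%% supervisor was registered while the application was running.
run() ->
    ok = application:load({application, demo_app,
                           [{description, "demo"}, {vsn, "0.1"},
                            {modules, [demo_app]},
                            {registered, [demo_cluster_sup]},
                            {applications, [kernel, stdlib]},
                            {mod, {demo_app, []}}]}),
    ok = application:start(demo_app),
    Pid = whereis(demo_cluster_sup),
    ok = application:stop(demo_app),
    ok = application:unload(demo_app),
    is_pid(Pid).
```

Stopping the application takes the supervisor down with it, which is the "started and stopped together" property described on the previous slide.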

Children of ns_server_cluster_sup:

ns_config_sup: everything related to managing config, e.g. storing, updating, change notifications

ns_cluster: performs node join / leave requests

ns_server_nodes_sup: main supervisor for the node outside of config mgmt

other services…: other stuff we won’t focus on today

Children of ns_server_nodes_sup:

ns_server_sup: controls main part of supervision hierarchy

other services…


Children of ns_server_sup:

mb_master: responsible for master election (the diagram also shows mb_master_sup and ns_orchestrator)

ns_node_disco_sup: node discovery supervisor

menelaus_sup: supervisor for REST API and web UI app server

ns_server_sup: ns_bucket_sup, other processes

ns_bucket_sup

single_bucket_kv_sup, other processes

ns_memcached_sup

ns_memcached

ns_memcached mediates communication with memcached process and tracks its health

replication manager

dcp_sup

dcp_replicator-node1

dcp_replicator-node2

dcp_sup and the replicators manage inbound (consumer) DCP replication streams

janitor_agent_sup

janitor agent

The janitor agent keeps things nice and clean and provides remote communication façade for bucket management

NS-Server in Action: Rebalance

User clicks “Rebalance” button

menelaus_web on the node passes the rebalance request to ns_orchestrator on the master node

ns_orchestrator state: idle -> “rebalancing”

Orchestrator starts rebalancer (ns_rebalancer)
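The orchestrator's state change can be pictured as a small transition function. This is a sketch only: the slide gives just the idle to “rebalancing” transition and the start of the rebalancer, so the rejection of a second request and the `rebalancer_done` event are assumptions, as are all names below.

```erlang
-module(orchestrator_sketch).
-export([handle/2, demo/0]).

%% handle(State, Event) -> {NewState, Actions}
%% From the slide: a rebalance request while idle moves the orchestrator to
%% 'rebalancing' and starts the rebalancer.
handle(idle, rebalance) ->
    {rebalancing, [start_rebalancer]};
%% Assumption: a second request during a rebalance is rejected.
handle(rebalancing, rebalance) ->
    {rebalancing, [{rejected, in_progress}]};
%% Assumption: the orchestrator returns to idle when the rebalancer finishes.
handle(rebalancing, rebalancer_done) ->
    {idle, []}.

%% Feeds a short event sequence through the transition function and returns
%% the final state with the accumulated actions.
demo() ->
    lists:foldl(fun(Event, {State, Log}) ->
                        {Next, Actions} = handle(State, Event),
                        {Next, Log ++ Actions}
                end,
                {idle, []},
                [rebalance, rebalance, rebalancer_done]).
```

In a real OTP system this kind of logic would normally live inside a state-machine behaviour rather than a bare function; the pure function is used here only to keep the sketch short.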

master node: ns_rebalancer, ns_bucket_mover, ns_single_vbucket_mover1, ns_single_vbucket_mover2

Rebalance computes new vbucket map & starts the bucket mover

Bucket mover in turn starts single vbucket movers
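One way to picture what the bucket mover has to work out is to compare the old and new vbucket maps and list the vbuckets whose placement changed. The sketch below assumes a map is a list of chains indexed by vbucket id, each chain being `[Master | Replicas]`. It does not show how the new map itself is computed, and the module and function names are hypothetical.

```erlang
-module(vbucket_moves_sketch).
-export([moves/2, example/0]).

%% Returns {VBucketId, OldChain, NewChain} for every vbucket whose chain
%% differs between the two maps. Both maps must have the same length.
moves(OldMap, NewMap) ->
    Ids = lists:seq(0, length(OldMap) - 1),
    [{Id, Old, New}
     || {Id, Old, New} <- lists:zip3(Ids, OldMap, NewMap), Old =/= New].

%% Four vbuckets on two nodes; a third node, n3, joins the cluster.
example() ->
    OldMap = [[n1, n2], [n2, n1], [n1, n2], [n2, n1]],
    NewMap = [[n1, n2], [n2, n3], [n3, n1], [n2, n1]],
    moves(OldMap, NewMap).
```

In this example only vbuckets 1 and 2 change, so each of them would get its own single vbucket mover while vbuckets 0 and 3 are left alone.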

master node: ns_single_vbucket_mover

to-node for vbucket: janitor agent, dcp_sup, dcp_replicator, memcached

from-node for vbucket: janitor agent, memcached

begin

Single vbucket mover sends request to new node to begin replicating

wait

Then mover waits until the backfill is done and replication is up-to-date

takeover

Lastly mover converts the backfill replication stream to a “takeover” stream
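The three steps of a single vbucket move (begin, wait, takeover) can be sketched as a short sequence. The callbacks in `Ops` are hypothetical stand-ins for the requests a real mover would send to the janitor agents and replicators; none of the names below are ns_server's.

```erlang
-module(vbucket_mover_sketch).
-export([move/2, demo/0]).

%% Runs the three steps in order. Each callback is a placeholder supplied by
%% the caller; a real mover would talk to other nodes here.
move(VBucket, #{begin_replication := Begin,
                backfill_done := Done,
                takeover := Takeover}) ->
    ok = Begin(VBucket),
    ok = wait_until(fun() -> Done(VBucket) end, 100),
    ok = Takeover(VBucket),
    {moved, VBucket}.

%% Polls Check until it returns true or the tries run out.
wait_until(_Check, 0) -> {error, timeout};
wait_until(Check, Tries) ->
    case Check() of
        true -> ok;
        false -> timer:sleep(10), wait_until(Check, Tries - 1)
    end.

%% Exercises move/2 with fake callbacks that record the order of the steps.
demo() ->
    Self = self(),
    Log = fun(Step) -> Self ! {step, Step}, ok end,
    Ops = #{begin_replication => fun(_VB) -> Log(begin_replication) end,
            backfill_done => fun(_VB) -> Log(check_backfill), true end,
            takeover => fun(_VB) -> Log(takeover) end},
    Result = move(42, Ops),
    {Result, collect([])}.

%% Gathers the recorded steps in the order they were logged.
collect(Acc) ->
    receive
        {step, Step} -> collect([Step | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.
```

The `ok =` matches make the sequence stop with a `badmatch` error as soon as any step fails, so a move that times out during backfill never reaches the takeover step.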


Get Started Today Couchbase Server 4.0 & N1QL

couchbase.com/beta