Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Under the Covers: Couchbase Server Architecture Steve Yen | co-founder, Couchbase

Upload: couchbase

Post on 20-Aug-2015


Page 1: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Under the Covers: Couchbase Server Architecture

Steve Yen | co-founder, Couchbase

version 20141003.2

Page 2: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 2

Agenda

System diagrams
Let there be a bucket
Rebalance

Page 3: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014


Definitions

Page 4: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Definitions

Bucket: a logical container of keys & values

Page 5: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Definitions

Bucket: a logical container of keys & values

Partition: a sub-part or division of a Bucket; a Bucket is made up of multiple Partitions; we can allocate Partitions onto different server nodes

Page 6: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Definitions

Bucket: a logical container of keys & values

Partition: a sub-part or division of a Bucket; a Bucket is made up of multiple Partitions; we can allocate Partitions onto different server nodes

Rebalance: the orchestrated migration of Partitions amongst server nodes in order to spread the load and achieve replication constraints
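To make the three definitions concrete, here is a minimal sketch of how a key could be mapped to a Partition and a Partition to a server node. The partition count, the CRC32 hash, the round-robin allocation, and the node names are all assumptions for illustration; the slides do not specify any of them.

```python
import zlib

NUM_PARTITIONS = 1024  # assumed partition count; not stated on the slides


def partition_for_key(key, num_partitions=NUM_PARTITIONS):
    """Hash a key to a partition id (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def build_partition_map(nodes, num_partitions=NUM_PARTITIONS):
    """Allocate partitions onto nodes round-robin: partition id -> node."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


def node_for_key(key, partition_map):
    """Find the node that currently holds the partition for this key."""
    return partition_map[partition_for_key(key, len(partition_map))]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
partition_map = build_partition_map(nodes)
print(node_for_key("user::1001", partition_map))
```

Because keys map to partitions and only partitions map to nodes, moving a partition (a Rebalance) changes the partition map without rehashing any keys.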

Page 7: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Agenda

System diagrams
Let there be a bucket
Rebalance

Page 8: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a datacenter

a datacenter

Page 9: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a datacenter

load balancer

web app server, web app server, web app server

Page 10: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a datacenter

load balancer

web app server, web app server, web app server

Couchbase Cluster: five Couchbase Server nodes

Page 11: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a couchbase cluster

[Diagram: five Couchbase Server nodes]

Page 12: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a couchbase cluster

[Diagram: five Couchbase Server nodes, each containing a Cluster Manager and a Data Manager]


Page 14: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node

[Diagram: one Couchbase Server node containing a Cluster Manager and a Data Manager]


Page 16: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
godu (golang)
cert gen (golang)
map gen (golang)

Page 17: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
godu (golang)
cert gen (golang)
map gen (golang)
memcached (c/c++): ports 11209, 11210


Page 19: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
memcached (c/c++): ports 11209, 11210
ns-server / view-engine (erlang)
godu (golang)
cert gen (golang)
map gen (golang)

Page 20: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
memcached (c/c++): ports 11209, 11210

Page 21: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
godu (golang)
cert gen (golang)
map gen (golang)
memcached (c/c++): ports 11209, 11210


Page 23: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
godu (golang)
cert gen (golang)
map gen (golang)
memcached (c/c++): ports 11209, 11210
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …

Page 24: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager

ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …

Page 25: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager

ns-server, view-engine

erlang / OTP

Page 26: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager

ns-server, view-engine

erlang / OTP: framework for building reliable, clustered systems

Page 27: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager

ns-server, view-engine

erlang / OTP

Page 28: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014


Inside the Cluster Manager / ns-server

ns-server

Page 29: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic facilities

Page 30: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic facilities

master-only services: services that run only on a single master node. The master will be selected at runtime.

[Diagram: a Couchbase Server node (Cluster Manager / Data Manager)]

Page 31: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
config gossip replication
generic facilities

per-node services: services that run on every node

Examples:
- node heartbeat
- XDCR

Page 32: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic facilities

per-node-&-bucket services: services that run on every node for every bucket

Example:
- per node per bucket stats collection

Page 33: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic facilities

REST admin + client-side JS / admin web UI

Page 34: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic facilities

Page 35: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic distributed facilities
generic local facilities

Page 36: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic distributed facilities
generic local facilities: vclock, uuid, work queue, events, misc

Libraries: vector clocks, work queues, event pub/sub

github.com/couchbase/ns-server/… misc|vclock|event

Page 37: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic distributed facilities
generic local facilities: vclock, uuid, work queue, events, misc; logging (ALE)

ALE: Another Logger for Erlang

“ALE is the best!” “Awesome Logger for Erlang”

github.com/aartamonau/ale

Page 38: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic distributed facilities
generic local facilities: vclock, uuid, work queue, events, misc; logging (ALE); local config store

Local Config Store: simple local storage of configuration data

github.com/couchbase/ns_server … ns_config

Page 39: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic distributed facilities: distributed node discovery
generic local facilities: vclock, uuid, work queue, events, misc; logging (ALE); local config store

Distributed Node Discovery: … when nodes appear & disappear

github.com/couchbase/ns_server … node_disco

Page 40: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic distributed facilities: distributed node discovery; config gossip replication
generic local facilities: vclock, uuid, work queue, events, misc; logging (ALE); local config store

Config Gossip Replication: eventually consistent distributed config; vector clock based

github.com/couchbase/ns_server … ns_config_rep
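As a rough illustration of "eventually consistent, vector clock based" config replication, the sketch below merges two nodes' config stores entry by entry. It is not taken from ns_config_rep: the data layout, the function names, and the tie-break rule for concurrent updates are all assumptions.

```python
def vclock_increment(clock, node):
    """Return a copy of the clock with this node's counter advanced."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def vclock_descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def vclock_merge(a, b):
    """Element-wise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def merge_entry(local, remote):
    """Each entry is (value, clock). Keep the causally newer one."""
    if vclock_descends(local[1], remote[1]):
        return local
    if vclock_descends(remote[1], local[1]):
        return remote
    # Concurrent updates: pick a deterministic winner (assumed tie-break)
    # so that every node converges on the same value.
    value = max(local[0], remote[0], key=repr)
    return (value, vclock_merge(local[1], remote[1]))


def gossip_merge(local_cfg, remote_cfg):
    """Merge a remote config into the local one, key by key."""
    merged = dict(local_cfg)
    for key, remote in remote_cfg.items():
        merged[key] = merge_entry(merged[key], remote) if key in merged else remote
    return merged


# Node A creates a setting; node B receives it and then updates it.
cfg_a = {"buckets": (["default"], vclock_increment({}, "a"))}
cfg_b = gossip_merge({}, cfg_a)
cfg_b["buckets"] = (["default", "beer"], vclock_increment(cfg_b["buckets"][1], "b"))
print(gossip_merge(cfg_a, cfg_b)["buckets"][0])
```

Since each merge only ever moves an entry forward, nodes can exchange configs in any order and still end up with the same state.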

Page 41: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager / ns-server

master-only services
per-node services
per-node-&-bucket services
REST admin
generic distributed facilities: distributed node discovery; config gossip replication
generic local facilities: vclock, uuid, work queue, events, misc; logging (ALE); local config store

Page 42: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager

ns-server, view-engine

erlang / OTP

Page 43: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
godu (golang)
cert gen (golang)
map gen (golang)
memcached (c/c++): ports 11209, 11210
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …

Page 44: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
godu (golang)
cert gen (golang)
map gen (golang)
memcached (c/c++): ports 11209, 11210

Page 45: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager

memcached (c/c++): ports 11209, 11210

Page 46: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager

ports 11209, 11210

libevent

Page 47: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager / libevent

ports 11209, 11210

libevent: high performance cross-platform library for non-blocking network I/O

Page 48: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager / networking

ports 11209, 11210

libevent
networking layer / conn thread pool: thread0, thread1, thread2, thread3

networking layer: listen()’s & accept()’s connections; assigns connection to a worker thread; parses incoming bytes to messages
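The two jobs of the networking layer, assigning connections to worker threads and parsing bytes into messages, can be sketched as follows. The 24-byte header layout is that of the memcached binary protocol; the round-robin assignment policy, the class name, and the message dictionary are assumptions for illustration, and no real sockets or threads are used.

```python
import struct
from itertools import cycle

# 24-byte request header of the memcached binary protocol:
# magic, opcode, key length, extras length, data type, vbucket id,
# total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")


def parse_messages(buffer):
    """Return (complete messages, leftover bytes) for a received buffer."""
    messages, offset = [], 0
    while len(buffer) - offset >= HEADER.size:
        (_magic, opcode, key_len, extras_len, _dtype,
         vbucket, body_len, _opaque, _cas) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + body_len
        if end > len(buffer):
            break  # body not fully received yet; wait for more bytes
        body = buffer[offset + HEADER.size:end]
        messages.append({
            "opcode": opcode,
            "vbucket": vbucket,
            "key": body[extras_len:extras_len + key_len],
            "value": body[extras_len + key_len:],
        })
        offset = end
    return messages, buffer[offset:]


class ConnectionDispatcher:
    """Hands each accepted connection to a worker, round-robin (assumed)."""

    def __init__(self, num_workers=4):
        self._workers = cycle(range(num_workers))
        self.assignment = {}

    def accept(self, conn_id):
        self.assignment[conn_id] = next(self._workers)
        return self.assignment[conn_id]


# A GET (opcode 0x00) for key b"hello" in partition 7, followed by the
# first two bytes of another request that has not fully arrived yet.
get_request = HEADER.pack(0x80, 0x00, 5, 0, 0, 7, 5, 0, 0) + b"hello"
messages, leftover = parse_messages(get_request + b"\x80\x00")
print(messages[0]["key"], messages[0]["vbucket"], len(leftover))
```

Returning the leftover bytes lets the caller keep them in the connection's buffer until the rest of the next message arrives.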

Page 49: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager / engine manager

ports 11209, 11210

libevent
networking layer / conn thread pool: thread0, thread1, thread2, thread3
engine manager

engine manager: loads and manages engines

Page 50: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager / ep-engine

ports 11209, 11210

libevent
networking layer / conn thread pool: thread0, thread1, thread2, thread3
engine manager
ep-engine (couchbase bucket type), ep-engine (couchbase bucket type), ep-engine (couchbase bucket type)
file I/O thread pool


Page 52: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ep-engine

Engine APIs (get, set, del, add, append, DCP, …)

Partition Hash Table (active), Partition Hash Table (replica), Partition Hash Table (active), …

Checkpoints (×3)

User Configured Replica Count = 1

Shared Thread Pool
Reader Threads, Writer Threads, Aux-IO Threads, Non-IO Threads
Flushers, Batch Readers, Data Backfill, Item Pager, Expiry Pager, Checkpoint Manager, Data Replicator, I/O Completion Notifier

Append-only B-Tree Storage Engine
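The slide only names the boxes, so the following is a speculative sketch of how a write might pass through three of them: an in-memory partition hash table, a checkpoint queue, and a flusher that drains the queue into append-only storage. The class names and the exact hand-off between components are assumptions, not a description of the actual ep-engine code.

```python
from collections import deque


class Partition:
    """One partition: an in-memory hash table plus a queue of pending writes."""

    def __init__(self, partition_id, state="active"):
        self.partition_id = partition_id
        self.state = state          # "active" or "replica"
        self.hash_table = {}        # serves reads from memory
        self.checkpoint = deque()   # ordered mutations not yet persisted

    def set(self, key, value):
        self.hash_table[key] = value
        self.checkpoint.append((key, value))

    def get(self, key):
        return self.hash_table.get(key)


class AppendOnlyStore:
    """Stand-in for append-only storage: records are only ever appended."""

    def __init__(self):
        self.log = []

    def append(self, partition_id, key, value):
        self.log.append((partition_id, key, value))


def flush(partition, store):
    """Flusher task: drain the checkpoint queue into the store, in order."""
    flushed = 0
    while partition.checkpoint:
        key, value = partition.checkpoint.popleft()
        store.append(partition.partition_id, key, value)
        flushed += 1
    return flushed


partition = Partition(0)
store = AppendOnlyStore()
partition.set("k1", "v1")
partition.set("k1", "v2")
print(partition.get("k1"), flush(partition, store), len(store.log))
```

In this sketch the write is acknowledged from memory and persisted later, which is why the read sees the latest value before the flusher has run.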

Page 53: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ep-engine

[Diagram: same ep-engine components as on page 52]

Chiyoung Seo’s Deep Dive Session at 10/6 4:20PM in this room

Page 54: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ep-engine

[Diagram: same ep-engine components as on page 52]

Mike Wiederhold’s DCP Session at 10/6 5:10PM in this room

Page 55: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
godu (golang)
cert gen (golang)
map gen (golang)
memcached (c/c++): ports 11209, 11210

Page 56: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
godu (golang)
cert gen (golang)
map gen (golang)
memcached (c/c++): ports 11209, 11210
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …

Page 57: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014


Inside the Cluster Manager

erlang

OTP

ns-server view-engine

Page 58: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014


Inside the Cluster Manager

erlang

OTP

ns-server view-engine

Views!

Next Gen: separate process for view-engine

Page 59: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager

ns-server, view-engine

erlang / OTP

Views!

Next Gen: split this into separate process

Sarath Lakshman’s Views Session at 10/7 11:40AM in this room

Page 60: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Cluster Manager

ns-server, view-engine

erlang / OTP

Views!

Next Gen: split this into separate process

Gerald Sangudi’s N1QL & Indexing at 10/7 10AM, developer track

Page 61: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
godu (golang)
cert gen (golang)
map gen (golang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
memcached (c/c++): ports 11209, 11210

Page 62: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Agenda

System diagrams
Let there be a bucket
Rebalance

Page 63: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
godu (golang)
cert gen (golang)
map gen (golang)
memcached (c/c++): ports 11209, 11210
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …

CREATE BUCKET

Page 64: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

[Diagram: ns-server components (master-only services, per-node services, REST admin, generic distributed facilities, generic local facilities); callout 1]

Page 65: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 1: REST admin layer receives request

[Diagram: ns-server components; callout 1]

Page 66: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 2: BUCKET CREATE is dispatched to the global orchestrator, which checks inputs and rules

[Diagram: ns-server components, now including the global orchestrator; callouts 1-2]

Page 67: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 3: …and then saves the new bucket config to the local config store

[Diagram: ns-server components, including the global orchestrator; callouts 1-3]

Page 68: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 4: New bucket config is gossiped and replicated to other nodes.

[Diagram: ns-server components, including the global orchestrator; callouts 1-4]

Page 69: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 5: On the other nodes… Bucket Supervisor listens for bucket config change events

[Diagram: ns-server components, now including the bucket supervisor; callouts 1-5]

Page 70: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 6: Bucket Supervisor spawns new per-node-&-bucket services

[Diagram: ns-server components, now including per-node-&-bucket services; callouts 1-6]

Page 71: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 4: Meanwhile… concurrently, the orchestrator generates a new partition map

[Diagram: ns-server components, now including partition map gen (callout 4); callouts 1-6]

Page 72: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 5: …and then schedules a run of the Master Janitor

[Diagram: ns-server components, now including the master janitor (callout 5) next to partition map gen (callout 4); callouts 1-6]

Page 73: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 5: …and then schedules a run of the Master Janitor

Master Janitor: looks for messes and cleans them up

[Diagram: ns-server components, including the master janitor (callout 5) next to partition map gen (callout 4); callouts 1-6]

Page 74: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 6: Master Janitor sends commands to Janitor Agents on each node…

[Diagram: ns-server components, now including the janitor agent; partition map gen (callout 4), master janitor (callout 5), callout 6]

Page 75: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 7: …such as to create buckets and partitions on the local data manager (memcached) process…

[Diagram: ns-server components, now including ns-memcached; callouts up to 7]

Page 76: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / CREATE BUCKET

Step 8: …and to set up DCP replication streams between data manager processes

[Diagram: ns-server components, now including the DCP replicator; callouts up to 8]
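The first half of the flow above (validate, save to the local config store, replicate the config, and let each node's Bucket Supervisor react to the change event) can be sketched as below. This is a simplified model, not ns-server code: the class and function names are hypothetical, gossip is replaced by a direct loop over nodes, and the "service" that gets spawned is just a string.

```python
class ConfigStore:
    """Per-node config store that notifies local listeners on every change."""

    def __init__(self):
        self.data = {}
        self.listeners = []

    def set(self, key, value):
        self.data[key] = value
        for listener in self.listeners:
            listener(key, value)


class BucketSupervisor:
    """Listens for bucket config changes and starts per-bucket services."""

    def __init__(self, node, config):
        self.node = node
        self.services = {}
        config.listeners.append(self.on_config_change)

    def on_config_change(self, key, buckets):
        if key != "buckets":
            return
        for name in buckets:
            # Stand-in for spawning per-node-&-bucket services.
            self.services.setdefault(name, f"stats-collector/{self.node}/{name}")


def create_bucket(name, settings, configs):
    """Orchestrator side: check inputs, save locally, then replicate."""
    local = configs[0]
    buckets = dict(local.data.get("buckets", {}))
    if not name or name in buckets:
        raise ValueError(f"invalid or duplicate bucket name: {name!r}")
    buckets[name] = settings
    local.set("buckets", buckets)
    for remote in configs[1:]:  # stand-in for gossip replication
        remote.set("buckets", dict(buckets))


configs = [ConfigStore(), ConfigStore()]
supervisors = [BucketSupervisor(f"node-{i}", cfg) for i, cfg in enumerate(configs)]
create_bucket("default", {"replicas": 1}, configs)
print([sorted(s.services) for s in supervisors])
```

The design point the slides make is that the orchestrator never calls the supervisors directly: it only changes config, and every node reacts to the change event on its own.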

Page 77: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager / CREATE BUCKET

Step A: CREATE BUCKET command received and parsed

[Diagram: memcached (ports 11209, 11210): libevent; networking layer / conn thread pool (thread0, thread1, thread2, thread3); engine manager; callout A]

Page 78: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager / CREATE BUCKET

Step B: …and forwarded to the engine manager

[Diagram: memcached (ports 11209, 11210): libevent; networking layer / conn thread pool (thread0, thread1, thread2, thread3); engine manager; callouts A-B]

Page 79: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager / CREATE BUCKET

Step C: …and engine manager loads a new instance of the required engine

[Diagram: memcached (ports 11209, 11210): libevent; networking layer / conn thread pool (thread0, thread1, thread2, thread3); engine manager; ep-engine (couchbase bucket type); file I/O thread pool; callouts A-C]

Page 80: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside the Data Manager / CREATE BUCKET

Step D: …which can then allocate resources (hashtables, queues, directories, files, etc)

[Diagram: memcached (ports 11209, 11210): libevent; networking layer / conn thread pool (thread0, thread1, thread2, thread3); engine manager; ep-engine (couchbase bucket type); file I/O thread pool; callouts A-D]
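Steps B to D can be pictured as a small registry: the engine manager picks an engine type for the bucket, creates an instance, and the instance allocates its own resources. The sketch below is a loose analogy with hypothetical names; the real engine manager loads engines through memcached's engine interface, which is not modelled here.

```python
class EpEngine:
    """Stand-in for an engine instance serving one couchbase-type bucket."""

    def __init__(self, bucket_name):
        self.bucket_name = bucket_name
        # Resources allocated when the engine instance is created.
        self.hash_tables = {}
        self.write_queue = []

    def create_partition(self, partition_id, state="active"):
        self.hash_tables[partition_id] = {"state": state, "items": {}}


class EngineManager:
    """Loads and manages one engine instance per bucket."""

    ENGINE_TYPES = {"couchbase": EpEngine}  # assumed mapping of bucket types

    def __init__(self):
        self.engines = {}

    def create_bucket(self, name, bucket_type="couchbase"):
        if name in self.engines:
            raise ValueError(f"bucket already exists: {name}")
        engine = self.ENGINE_TYPES[bucket_type](name)
        self.engines[name] = engine
        return engine


manager = EngineManager()
engine = manager.create_bucket("default")
engine.create_partition(0, "active")
engine.create_partition(1, "replica")
print(sorted(manager.engines), sorted(engine.hash_tables))
```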

Page 81: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
godu (golang)
cert gen (golang)
map gen (golang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
memcached (c/c++): ports 11209, 11210

CREATE BUCKET


Page 83: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Agenda

System diagrams
Let there be a bucket
Rebalance

Page 84: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a node / OS processes

Couchbase Server (Cluster Manager / Data Manager)

babysitter (erlang)
godu (golang)
cert gen (golang)
map gen (golang)
ns-server / view-engine (erlang): ports 8091, 8092, 11214, 11215, …
memcached (c/c++): ports 11209, 11210

REBALANCE

Page 85: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / REBALANCE

[Diagram: ns-server components (master-only services, per-node services, per-node-&-bucket services, REST admin, generic distributed facilities, generic local facilities); callout 1]

Page 86: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / REBALANCE

Step 1: REST admin handles the REST call for REBALANCE by calling the global orchestrator

[Diagram: ns-server components; callout 1]

Page 87: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / REBALANCE

Step 2: The global orchestrator does sanity checks, calls the Rebalancer to generate new “balanced” maps, and calls the Master Janitor

[Diagram: ns-server components, now including the global orchestrator, rebalancer, and master janitor; callouts 1-2]

Page 88: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside ns-server / REBALANCE

Step 3: Master Janitor remotely calls Janitor Agents for per-node operations and state changes

[Diagram: ns-server components, now including the janitor agent; callouts 1-3]

Page 89: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 89

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server / REBALANCE

master-only services

REST admin

per-node services

per-node-&-bucket services

master janitor

global orchestrator janitor agent DCP replicator

rebalancer

1

2 3 4

…including stopping and recreating replication streams

Page 90: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 90

All nodes have partitions

Spread the load

Balance

Page 91: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 91

All nodes have partitions

Spread the load

Rack/zone awareness

Balance

Page 92: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 92

All nodes have partitions

Spread the load

Rack/zone awareness

Swap rebalance & failover cases

Balance

Page 93: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 93

All nodes have partitions

Spread the load

Rack/zone awareness

Swap rebalance & failover cases

Clumpiness: out-degree of connections from any node is limited

Balance

Page 94: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 94

When you Failover a node, you still want some balance

2nd Degree Of Balance

Page 95: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 95

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER replica

1 MASTER replica

2 replica MASTER

3 MASTER replica

4 MASTER replica

5 replica MASTER

6 MASTER replica

7 MASTER replica

Page 96: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 96

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER replica

1 MASTER replica

2 replica MASTER

3 MASTER replica

4 MASTER replica

5 replica MASTER

6 MASTER replica

7 MASTER replica

Page 97: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 97

Chaos Shark!!

A B C

0 MASTER replica

1 MASTER replica

2 replica MASTER

3 MASTER replica

4 replica MASTER

5 replica MASTER

6 MASTER replica

7 replica MASTER

BETTER! This map has

2nd degree of balance

Page 98: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 98

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER replica

1 MASTER replica

2 replica MASTER

3 MASTER replica

4 MASTER replica

5 replica MASTER

6 MASTER replica

7 MASTER replica

Page 99: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 99

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER replica

1 MASTER replica

2 replica MASTER

3 MASTER replica

4 MASTER replica

5 replica MASTER

6 MASTER replica

7 MASTER replica

Page 100: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 100

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER MASTER

1 MASTER replica

2 replica MASTER

3 MASTER MASTER

4 MASTER replica

5 replica MASTER

6 MASTER MASTER

7 MASTER replica

Page 101: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 101

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER MASTER

1 MASTER replica

2 replica MASTER

3 MASTER MASTER

4 MASTER replica

5 replica MASTER

6 MASTER MASTER

7 MASTER replica

Server B is now overloaded!

Page 102: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 102

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER replica

1 MASTER replica

2 replica MASTER

3 MASTER replica

4 replica MASTER

5 replica MASTER

6 MASTER replica

7 replica MASTER

BETTER! This map has

2nd degree of balance
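
The failover argument above can be checked with a few lines of code. This is a minimal sketch, not ns-server's algorithm: the two maps are hypothetical (the slide's column layout is not reproduced here), with one replica per partition, and `active_counts_after_failover` is an illustrative helper name.

```python
from collections import Counter

def active_counts_after_failover(vbucket_map, failed_node):
    """Promote the replica of every partition whose master was on
    failed_node, then count MASTER partitions per surviving node."""
    counts = Counter()
    for master, replica in vbucket_map:
        owner = replica if master == failed_node else master
        counts[owner] += 1
    return dict(counts)

# Hypothetical 8-partition maps over nodes A, B, C: (master, replica) pairs.
# In the clumpy map every partition mastered on A has its replica on B.
clumpy_map = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "B"),
              ("B", "A"), ("C", "B"), ("A", "B"), ("B", "C")]

# In the spread map A's replicas are divided between B and C.
spread_map = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C"),
              ("B", "A"), ("C", "B"), ("A", "C"), ("B", "C")]

print(active_counts_after_failover(clumpy_map, "A"))  # B carries 6, C carries 2
print(active_counts_after_failover(spread_map, "A"))  # B and C carry 4 each
```

Both maps are balanced before the failover (3, 3 and 2 masters); only the second stays balanced after node A is failed over.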

Page 103: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 103

The Return of the Chaos Shark!

Page 104: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 104

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER replica

1 MASTER replica

2 replica MASTER

3 MASTER replica

4 replica MASTER

5 replica MASTER

6 MASTER replica

7 replica MASTER

BETTER! This map has

2nd degree of balance

Page 105: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 105

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER replica

1 MASTER replica

2 replica MASTER

3 MASTER replica

4 replica MASTER

5 replica MASTER

6 MASTER replica

7 replica MASTER

BETTER! This map has

2nd degree of balance

Page 106: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 106

When you Failover a node, you still want some balance

2nd Degree Of Balance

A B C

0 MASTER MASTER

1 MASTER replica

2 replica MASTER

3 MASTER MASTER

4 replica MASTER

5 replica MASTER

6 MASTER MASTER

7 replica MASTER

BETTER! This map has

2nd degree of balance

Page 107: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 107

Also, try to minimize partition movements

Finding partition maps that meet all the constraints is hard!

Our search algorithms are far from perfect

Balance

Page 108: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 108

3 phases of migrating a single partition

Page 109: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 109

1) replica building phase (serialized)

2) indexing phase (concurrent)

3) takeover phase (concurrent)

3 phases of migrating a single partition

Page 110: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 110

1) replica building phase (serialized)

2) indexing phase (concurrent)

3) takeover phase (concurrent)

3 phases of migrating a single partition

get most of the data replicated

Page 111: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 111

1) replica building phase (serialized)

2) indexing phase (concurrent)

3) takeover phase (concurrent)

3 phases of migrating a single partition

get most of the data replicated

and, phase #1 is serialized; 1 partition at a time per node,

to avoid crushing I/O, network

Page 112: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 112

1) replica building phase (serialized)

2) indexing phase (concurrent)

3) takeover phase (concurrent)

3 phases of migrating a single partition

get most of the data replicated

and, phase #1 is serialized; 1 partition at a time per node,

to avoid crushing I/O, network

and, ensure #1 persists

before moving onwards for safety

Page 113: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 113

1) replica building phase (serialized)

2) indexing phase (concurrent)

3) takeover phase (concurrent)

3 phases of migrating a single partition

ensure that view queries have consistent results

even in midst of Rebalance

Page 114: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 114

1) replica building phase (serialized)

2) indexing phase (concurrent)

3) takeover phase (concurrent)

3 phases of migrating a single partition

Page 115: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 115

4 Simple Partition States

Partition State    When request arrives from a client …

ACTIVE             process request as normal

PENDING            server blocks the connection

REPLICA            redirect response: you’re accessing the wrong server!

DEAD               redirect response: you’re accessing the wrong server!
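
The state table reads naturally as a dispatch function. A minimal sketch, assuming hypothetical names (`VBucketState`, `handle_request`) and plain strings for the three outcomes; the real data node's types and response codes are not shown in these slides.

```python
from enum import Enum

class VBucketState(Enum):
    ACTIVE = "active"
    PENDING = "pending"
    REPLICA = "replica"
    DEAD = "dead"

def handle_request(state):
    """Return what the server does with a client request for a partition
    that is currently in the given state."""
    if state is VBucketState.ACTIVE:
        return "process"    # process request as normal
    if state is VBucketState.PENDING:
        return "block"      # hold the connection until the state changes
    return "redirect"       # REPLICA and DEAD: wrong server for this partition

for s in VBucketState:
    print(s.name, "->", handle_request(s))
```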

Page 116: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 116

4 Simple Partition States & Partition Takeover [ Server A Server B ]

1) Switch server B’s partition P state to PENDING;

So, any client requests to server B for partition P will block.

.

Partition State    When request arrives from a client …

ACTIVE             process request as normal

PENDING            server blocks the connection

REPLICA            redirect response: you’re accessing the wrong server!

DEAD               redirect response: you’re accessing the wrong server!

Page 117: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 117

4 Simple Partition States & Partition Takeover [ Server A Server B ]

1) Switch server B’s partition P state to PENDING;

So, any client requests to server B for partition P will block.

2) Set up DCP Takeover stream for partition P from server A to server B

Partition State    When request arrives from a client …

ACTIVE             process request as normal

PENDING            server blocks the connection

REPLICA            redirect response: you’re accessing the wrong server!

DEAD               redirect response: you’re accessing the wrong server!

Page 118: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 118

4 Simple Partition States & Partition Takeover [ Server A Server B ]

1) Switch server B’s partition P state to PENDING;

So, any client requests to server B for partition P will block.

2) Set up DCP Takeover stream for partition P from server A to server B

Server A tries to drain data to server B.

Partition State    When request arrives from a client …

ACTIVE             process request as normal

PENDING            server blocks the connection

REPLICA            redirect response: you’re accessing the wrong server!

DEAD               redirect response: you’re accessing the wrong server!

Page 119: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 119

4 Simple Partition States & Partition Takeover [ Server A Server B ]

1) Switch server B’s partition P state to PENDING;

So, any client requests to server B for partition P will block.

2) Set up DCP Takeover stream for partition P from server A to server B

Server A tries to drain data to server B

And, then, atomically, server A will…

send a TAKEOVER message to server B,

and change its partition to DEAD.

Partition State    When request arrives from a client …

ACTIVE             process request as normal

PENDING            server blocks the connection

REPLICA            redirect response: you’re accessing the wrong server!

DEAD               redirect response: you’re accessing the wrong server!

Page 120: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 120

4 Simple Partition States & Partition Takeover [ Server A Server B ]

1) Switch server B’s partition P state to PENDING;

So, any client requests to server B for partition P will block.

2) Set up DCP Takeover stream for partition P from server A to server B

Server A tries to drain data to server B

And, then, atomically, server A will…

send a TAKEOVER message to server B,

and change its partition to DEAD.

So, any client requests to server A will redirect.

Partition State    When request arrives from a client …

ACTIVE             process request as normal

PENDING            server blocks the connection

REPLICA            redirect response: you’re accessing the wrong server!

DEAD               redirect response: you’re accessing the wrong server!

Page 121: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

When server B receives the TAKEOVER message,

server B will atomically switch the state of its partition P from PENDING to ACTIVE.

So, any clients previously blocked at server B will now proceed!

Server B handles the TAKEOVER message
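
The takeover steps can be walked through with a toy simulation. This is a sketch under stated assumptions: the `Server` class, its method names and the string results are hypothetical, server B is assumed to hold a replica before step 1, and the DCP stream itself is not modelled.

```python
class Server:
    """Tracks the state of a single partition P on one server."""

    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.blocked = []   # requests parked while P is PENDING
        self.served = []

    def request(self, req):
        if self.state == "ACTIVE":
            self.served.append(req)
            return "ok"
        if self.state == "PENDING":
            self.blocked.append(req)
            return "blocked"
        return "redirect"   # REPLICA or DEAD

    def on_takeover(self):
        """B's side of the hand-off: become ACTIVE, release blocked clients."""
        self.state = "ACTIVE"
        released, self.blocked = self.blocked, []
        self.served.extend(released)
        return released

a = Server("A", "ACTIVE")
b = Server("B", "REPLICA")

b.state = "PENDING"              # 1) B's partition P goes PENDING
r1 = b.request("get k1")         # a client that reaches B early is blocked
a.state = "DEAD"                 # 2) after draining, A sends TAKEOVER and goes DEAD
released = b.on_takeover()       # B switches PENDING -> ACTIVE
r2 = a.request("get k2")         # A now redirects
r3 = b.request("get k3")         # B now serves
print(r1, released, r2, r3)
```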

Page 122: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 122

1) replica building phase (serialized) ✔

2) indexing phase (concurrent) ✔

3) takeover phase (concurrent) ✔

3 phases of migrating a single partition

Page 123: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 123

Can stop rebalance at any time

with no data loss

Pencils Down Policy

Page 124: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 124

Pencils Down Policy

Can stop (and restart) rebalance at any time

with no data loss

Page 125: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 125

Inside a node / OS processes

babysitter (erlang)

godu (golang)

cert gen (golang)

map gen (golang)

8091, 8092, 11214, 11215, …

ns-server / view-engine (erlang)

11209, 11210

memcached (c/c++)

Cluster Manager

--------------

Data Manager

Couchbase Server

REBALANCE

Page 126: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 126

Inside a node / OS processes

babysitter (erlang)

godu (golang)

cert gen (golang)

map gen (golang)

8091, 8092, 11214, 11215, …

ns-server / view-engine (erlang)

11209, 11210

memcached (c/c++)

Cluster Manager

--------------

Data Manager

Couchbase Server

REBALANCE

Page 127: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 127

Inside a node / OS processes

babysitter (erlang)

godu (golang)

cert gen (golang)

map gen (golang)

8091, 8092, 11214, 11215, …

ns-server / view-engine (erlang)

11209, 11210

memcached (c/c++)

Cluster Manager

--------------

Data Manager

Couchbase Server

Page 128: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a couchbase cluster

[Diagram: five Couchbase Server nodes, each with a Cluster Manager and a Data Manager]

Page 129: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 129

Inside a couchbase cluster

Couchbase Cluster

Couchbase Server

Couchbase Server

Couchbase Server

Couchbase Server

Couchbase Server

Page 130: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 130

System diagrams

Let there be a bucket

Rebalance

Agenda

Page 131: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Under the Covers

Page 132: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

When you look Under the Covers

you’re just going to see more covers

Page 133: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Thanks!

[email protected]

Page 134: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Questions?

[email protected]

Page 135: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Extra Slides

Page 136: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Cluster map

Connect directly to data nodes

accept() loop assigns conn to a thread in a thread pool

Conn sticks to that worker thread for the life of the conn

More conns == more CPU utilization

A client connects

Page 137: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Hash partitioned

Hash(key) => vbucketId

clusterMap[vbucketId] => master & replica nodes for that vbucketId

Given a key, we know where the item lives

CAP => Consistent (later: rebalance shows how we handle

Data distribution
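
The two lookups on this slide can be sketched as follows. Everything concrete here is an assumption for illustration: the CRC32-modulo hash, the 8-entry map and the node names A, B, C are not taken from the slides, and real clients may hash differently and use far more vbuckets.

```python
import zlib

# Toy cluster map: index = vbucketId, value = (master node, replica node).
cluster_map = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C"),
               ("B", "A"), ("C", "B"), ("A", "B"), ("B", "C")]

def vbucket_id(key: bytes, num_vbuckets: int) -> int:
    """Hash(key) => vbucketId (assumed hash: CRC32 modulo vbucket count)."""
    return zlib.crc32(key) % num_vbuckets

def locate(key: bytes):
    """clusterMap[vbucketId] => master & replica nodes for that vbucketId."""
    vb = vbucket_id(key, len(cluster_map))
    master, replica = cluster_map[vb]
    return vb, master, replica

vb, master, replica = locate(b"user::1234")
print(f"vbucket {vb}: master={master}, replica={replica}")
```

Because the mapping is a pure function of the key and the map, any client holding the same cluster map finds the same node without asking a coordinator.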

Page 138: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 138

11209, 11210

Inside memcached / ep-engine

default-engine (memcached bucket type)

ep-engine (couchbase bucket type)

libevent

networking layer / conn thread pool

default-engine (memcached bucket type)

ep-engine (couchbase bucket type)

ep-engine (couchbase bucket type)

engine manager

file I/O thread pool

thread0 thread1 thread2 thread3

X

Y

Page 139: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 139

System diagrams

Let there be a bucket

Rebalance and failover

A client connects

A SET request and durability, replication, views, XDCR

A GET request and background storage reads

Agenda

Page 140: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

bucket (associated with connection)

operation (SET)

vbucketId, key, value-size, value

CAS, flags, expiration

A SET request

Page 141: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Memcached cracks open request into data structure

Asks engine to allocate memory

Reads in bytes into memory buffer

Calls the engine (ep-engine)

Callback / EWOULDBLOCK style code

All network I/O ops are evented (libevent – kqueue, epoll)

In the data node

Page 142: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Shout out to chiyoung here

tcmalloc & jemalloc – dave rigby

Memory management

Page 143: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Hash-tables consulted

Vbucket hashtable (sharded; hashtable growth TBD)

Set the entry into hashtable

Add to end of queues

Persistence, replication, checkpoints

DCP

In ep-engine

Page 144: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

What are all these files?

Flushing dirty items

Sorting

Couchstore

Append only btree; robustness; restarts; SSD friendliness

Btree balance

persistence

Page 145: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

By ep-engine, orchestrated by ns-server

History is available until compaction

Deletion tombstones

ForestDB & SSDs

compaction

Page 146: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

In erlang, view-engine

Run map() function on each document

JS “NIF”

Or delete

Copy on write btree

View Maintenance

Page 147: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Another day

View Queries

Page 148: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Coming soon

Secondary Indexes

Page 149: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

When a memory cache “hit”

When a memory cache “miss”

Eviction / ejection

Separate eviction thread

Separate expiration thread

Schedule a background fetch (bgfetch)

Return EWOULDBLOCK to networking layer

When background I/O read thread gets the item back, notifies worker threads to retry the GET

A GET request
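
The cache-miss path (schedule a bgfetch, return EWOULDBLOCK, retry after notification) can be sketched in a single-threaded toy. The `Engine` class, its method names and the dict standing in for disk are hypothetical; the real engine does this across separate worker and I/O threads.

```python
EWOULDBLOCK = "EWOULDBLOCK"

class Engine:
    def __init__(self, disk):
        self.cache = {}            # in-memory items
        self.disk = disk           # stand-in for persisted items
        self.bgfetch_queue = []    # (key, connection) pairs awaiting a disk read

    def get(self, key, conn):
        if key in self.cache:                   # memory cache "hit"
            return self.cache[key]
        self.bgfetch_queue.append((key, conn))  # "miss": schedule a bgfetch
        return EWOULDBLOCK                      # tell networking layer to wait

    def run_bgfetcher(self):
        """Background I/O step: load queued keys into memory and return the
        connections that should now retry their GET. A key missing from
        disk is cached as None."""
        notified = []
        while self.bgfetch_queue:
            key, conn = self.bgfetch_queue.pop(0)
            self.cache[key] = self.disk.get(key)
            notified.append(conn)
        return notified

engine = Engine(disk={"k": "v"})
first = engine.get("k", "conn-1")      # miss: EWOULDBLOCK
to_retry = engine.run_bgfetcher()      # background read completes
second = engine.get("k", "conn-1")     # retry hits memory
print(first, to_retry, second)
```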

Page 150: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 150

11211

11209, 11210

8091, 8092, 11214, 11215, …

Inside a node / OS processes

babysitter (erlang)

moxi (c)

ns-server / view-engine (erlang)

godu (golang)

memcached (c/c++)

cert gen (golang)

map gen (golang)

Cluster Manager

--------------

Data Manager

Couchbase Server

Page 151: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

On startup: warmup

Page 152: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

observe

Page 153: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Stats everywhere in data node

Stats

Page 154: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 154

Data Manager Architecture

storage interface

DatabaseBucket

11210

Memcached

Storage Engine

DatabaseBucket

DatabaseBucket…

Bucket Engine

Shared Thread Pool

Page 155: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 155

Multiple partitions per hash table

- Each partition is maintained by a linked list of items

- Engine parameter “ht_size” to pass the initial partition size to the database bucket

Multiple locks to synchronize accesses to hash table partitions

- Engine parameter “ht_locks” to pass the number of partition locks to the database bucket

Hash table partitions are dynamically resized by the daemon task “hash table resizer”

- NON-IO thread runs the hash table resizer task periodically

Partition Hash Table
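
A small lock-striped hash table illustrates how the two parameters relate. This is a sketch only: `StripedHashTable` is a hypothetical class, Python lists stand in for the linked lists, the default values are arbitrary, and resizing is left out.

```python
import threading

class StripedHashTable:
    """ht_size hash table partitions guarded by a smaller set of ht_locks
    locks; partition i is protected by lock i % ht_locks."""

    def __init__(self, ht_size=8, ht_locks=2):
        self.partitions = [[] for _ in range(ht_size)]   # one chain per partition
        self.locks = [threading.Lock() for _ in range(ht_locks)]

    def _index(self, key):
        return hash(key) % len(self.partitions)

    def set(self, key, value):
        i = self._index(key)
        with self.locks[i % len(self.locks)]:
            chain = self.partitions[i]
            for n, (k, _) in enumerate(chain):
                if k == key:
                    chain[n] = (key, value)   # overwrite existing item
                    return
            chain.append((key, value))

    def get(self, key):
        i = self._index(key)
        with self.locks[i % len(self.locks)]:
            for k, v in self.partitions[i]:
                if k == key:
                    return v
        return None

table = StripedHashTable(ht_size=8, ht_locks=2)
table.set("K1", "V1")
table.set("K1", "V1b")
print(table.get("K1"), table.get("missing"))
```

Using fewer locks than partitions keeps lock memory small while still letting requests for different partitions proceed in parallel most of the time.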

Page 156: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 156

Partition Hash Table

Key: “K1” Metadata: exp, cas, NRU, … Value: “V1”

Key: “K5” Metadata: exp, cas, NRU, … Value: “V5”

Key: “K100” Metadata: exp, cas, NRU, … Value: “V100” …

Key: “K50” Metadata: exp, cas, NRU, … Value: “V50”

Key: “K3” Metadata: exp, cas, NRU, … Value: “V3”

Key: “70” Metadata: exp, cas, NRU, … Value: “V70” …

Key: “K200” Metadata: exp, cas, NRU, … Value: “V200”

Key: “K150” Metadata: exp, cas, NRU, … Value: “V150”

Key: “30” Metadata: exp, cas, NRU, … Value: “V30” …

Key: “K60” Metadata: exp, cas, NRU, … Value: “V60”

Key: “K20” Metadata: exp, cas, NRU, … Value: “V20”

Key: “130” Metadata: exp, cas, NRU, … Value: “V30” …

.

.

.

Partition 1

Partition 2

Partition 99

Partition 100

Page 157: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 157

Doctors: first, do no harm

Janitors: clean up the mess and don’t make any new messes

every 10 seconds, master janitor broadcasts to janitor agents on every node to “please give me your state”

and, compares “reality” with expected maps & expected states

and, requests state changes & replication streams as needed

(startup case also has ‘enable traffic’ step)

(also, if master janitor sees no vbucket map for a bucket, then gens new map)

Doctors & Janitors
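
The janitor's compare-and-correct pass is a reconciliation of expected state against reported state. A minimal sketch, assuming a hypothetical `plan_state_changes` helper and plain dicts for both sides; the 10-second timer, the broadcast and the replication streams are not modelled.

```python
def plan_state_changes(expected, reported):
    """Compare expected vbucket states with what janitor agents reported.

    Both arguments map node -> {vbucket_id: state}. Returns a list of
    (node, vbucket_id, wanted_state) for every mismatch, in a stable order."""
    changes = []
    for node in sorted(expected):
        actual = reported.get(node, {})
        for vb, wanted in sorted(expected[node].items()):
            if actual.get(vb) != wanted:
                changes.append((node, vb, wanted))
    return changes

expected = {"A": {0: "active", 1: "replica"},
            "B": {0: "replica", 1: "active"}}
reported = {"A": {0: "active", 1: "replica"},
            "B": {0: "replica", 1: "replica"}}   # B never activated vbucket 1

print(plan_state_changes(expected, reported))
```

Only the mismatches are acted on, which matches the "don't make any new messes" rule: nodes already in the expected state are left alone.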

Page 158: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 158

Every 30 seconds

Use local KV stats & view stats and compaction policy

to ask data node to compact relevant KV vbucket db file

or run compaction on view db file

Compactor

Page 159: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 159

replication config documents stored in view-engine replicator DB

replication manager watches replicator DB for config document changes

next-gen in golang coming soon

XDCR

Page 160: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 160

mb_master decides who is master node

and spawns right processes on that node

master election

Page 161: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Intra-cluster replication

Cross-datacenter replication (XDCR)

Views and secondary indexes

Incremental Backup & Restore

3rd party integrations (hadoop, elasticsearch, etc)

Plug Mike’s DCP talk here

DCP

Page 162: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

DCP

DCP replicator

theme: data-node doesn’t connect to outside

need diagram

Replication streams

Page 163: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 163

Bucket Created in REST / web UI (coming soon, more AUTH options)

config entries saved

bucket event handler (on every node) watches for bucket config change events

then creates/stops per bucket supervisor on node

per bucket supervisor on a node…

spawns connections to data-node (ns_memcached)

spawns janitor-agent for the bucket on that node (janitor agent receives requests from master janitor to create vbuckets, change vbucket state, start/stop DCP replication)

spawns per-bucket stats collector & stats archiver

spawns CAPI view manager for view maintenance & queries

Let there be a bucket

Page 164: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 164

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

master election

janitor

auto-failover detector

master tick

global orchestrator janitor agent DCP replicator

bucket supervisor

stats collector/archiver

rebalancer

heart doctor XDCR services

Page 165: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 165

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

heart

Heart

Every 5 seconds

Grabs bucket & task states and broadcasts to entire cluster

Page 166: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 166

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

heart doctor

Doctor

Listens to Heart broadcasts and keeps cache of recent news

Every node has sense of cluster health
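
The Heart and Doctor pair amounts to a per-node cache of the latest heartbeat from every other node. A minimal sketch: the `Doctor` class here is a hypothetical stand-in, and the 15-second staleness threshold is an assumption chosen for illustration, not a value from the slides.

```python
class Doctor:
    """Caches the most recent heartbeat received from each node."""

    def __init__(self, stale_after=15.0):   # assumed threshold, in seconds
        self.stale_after = stale_after
        self.last_seen = {}                  # node -> (timestamp, status)

    def on_heartbeat(self, node, status, now):
        self.last_seen[node] = (now, status)

    def healthy_nodes(self, now):
        """Nodes whose latest heartbeat is recent enough to trust."""
        return sorted(node for node, (ts, _) in self.last_seen.items()
                      if now - ts <= self.stale_after)

doctor = Doctor(stale_after=15.0)
doctor.on_heartbeat("A", {"buckets": "ok"}, now=0.0)
doctor.on_heartbeat("B", {"buckets": "ok"}, now=10.0)
print(doctor.healthy_nodes(now=12.0))   # both heartbeats are recent
print(doctor.healthy_nodes(now=20.0))   # A's heartbeat has gone stale
```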

Page 167: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 167

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

bucket supervisor

heart doctor

Bucket Supervisor

Top of local supervision tree of per-node-&-per-bucket services

Page 168: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 168

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

bucket supervisor

heart doctor

Bucket Supervisor

Top of local supervision tree of per-node-&-per-bucket services

Page 169: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 169

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

bucket supervisor

heart doctor XDCR services

XDCR Services

Manages XDCR streams

Next gen: separate process; golang; more flexible conflict resolution

Page 170: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 170

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

bucket supervisor

heart doctor XDCR services

Page 171: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 171

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

master election

bucket supervisor

heart doctor XDCR services

Master Election

Decides who is the master node

Cluster Manager

--------------

Data Manager

Couchbase Server

Page 172: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 172

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

master election

master tick

bucket supervisor

heart doctor XDCR services

Tick

Broadcasts global tick counter

“lost N ticks”

Page 173: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 173

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

master election

janitor master tick

global orchestrator

bucket supervisor

rebalancer

heart doctor XDCR services

Global Orchestrator

Spawns Rebalancer if needed

Spawns Janitor

Page 174: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 174

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

master election

janitor master tick

global orchestrator

bucket supervisor

rebalancer

heart doctor XDCR services

Rebalancer

Computes new partition maps

Supervises the Rebalance dance steps

Page 175: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 175

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

master election

janitor master tick

global orchestrator

bucket supervisor

rebalancer

heart doctor XDCR services

Master Janitor

Detects any messes and cleans them up

Tries to not make any new messes: conservative

Page 176: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 176

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

master election

janitor master tick

global orchestrator

bucket supervisor

rebalancer

heart doctor XDCR services

Master Janitor

Detects any messes and cleans them up

Tries to not make any new messes: conservative

every 10 seconds… Master Janitor broadcasts to Janitor Agents on every node to “please give me your state”

and, compares reality with expected states

and, requests state changes & replication streams as needed

also, if Master Janitor sees no vbucket map for a bucket, then generates new map

Page 177: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

©2014 Couchbase, Inc. 177

per-node-&-bucket services

generic distributed facilities

generic local facilities

Inside ns-server

vclock, uuid, work queue, events, misc

logging (ALE)

distributed node discovery

master-only services

REST admin

config gossip replication

local config store

per-node services

per-node-&-bucket services

master election

janitor master tick

global orchestrator

bucket supervisor

rebalancer

heart doctor XDCR services

auto-failover detector

Auto Failover Detector

CONSERVATIVELY “presses” the failover button only once

Page 178: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

[Diagram: Inside ns-server, same component diagram, including the janitor and the auto-failover detector.]

Page 179: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

[Diagram: Inside ns-server, same component diagram, now also showing the janitor agent.]

Janitor Agent

Handles commands from Master Janitor

Page 180: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

[Diagram: Inside ns-server, same component diagram, now also showing the stats collector/archiver.]

Stats Collector & Archiver

Page 181: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

[Diagram: Inside ns-server, same component diagram, now also showing the DCP replicator.]

DCP Replicator

Intracluster replication

Page 182: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

[Diagram: Inside ns-server, same component diagram, including the janitor agent, stats collector/archiver, and DCP replicator.]

Page 183: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

[Diagram: Inside ns-server, same component diagram.]

Web UI & REST Admin Service

+ client-side JavaScript (switching to AngularJS)

Page 184: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

[Diagram: Inside ns-server, same component diagram.]

Page 185: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Memcached handles bucket create

Spawns off new ep-engine instance

Separate “apartments”

Buckets share threads, IO mgr

Page 186: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Rebalance

When you press the Rebalance button / API

Cluster Manager computes new map (follows rack/zone rules and seeks balancedness)
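To make "follows rack/zone rules and seeks balancedness" concrete, here is a deliberately naive sketch of a map generator. It is not the Cluster Manager's algorithm. The function name `compute_vbucket_map`, the `node_zones` input, and the single-replica chains are all assumptions chosen for the example.

```python
from typing import Dict, List


def compute_vbucket_map(node_zones: Dict[str, str],
                        num_vbuckets: int) -> List[List[str]]:
    """Return vbucket_map[vb] = [active_node, replica_node].

    Balance: actives are dealt round-robin; each replica goes to the
    least-loaded eligible node.
    Zone rule: the replica is placed in a different zone from the active
    whenever another zone exists.
    """
    nodes = sorted(node_zones)
    replica_load = {n: 0 for n in nodes}
    vbucket_map: List[List[str]] = []
    for vb in range(num_vbuckets):
        active = nodes[vb % len(nodes)]
        candidates = [n for n in nodes if node_zones[n] != node_zones[active]]
        if not candidates:
            # Only one zone: fall back to any other node.
            candidates = [n for n in nodes if n != active]
        if not candidates:
            # Single-node cluster: no replica is possible.
            vbucket_map.append([active])
            continue
        replica = min(candidates, key=lambda n: (replica_load[n], n))
        replica_load[replica] += 1
        vbucket_map.append([active, replica])
    return vbucket_map


if __name__ == "__main__":
    zones = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
    for vb, chain in enumerate(compute_vbucket_map(zones, 8)):
        print(vb, chain)
```

A real map generator also tries to minimize data movement relative to the current map, which this sketch ignores.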

Page 187: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Rebalance

Orchestrator “conducts” the rebalance moves

For each bucket:

Generate new vbucket map

Then, spawn vbucket-movers

A vbucket mover spawns per-vbucket movers

Orchestrator -> Rebalancer -> vb_mover -> single_vbucket_mover -> does the takeover dance, with consistent view index maneuvers

1) replica building phase (bulk of data replicated) (phase #1 is serialized per node, to avoid crushing I/O, network; 1 vbucket at a time per node) (and, make sure #1 persists to disk before moving onwards: safety)

2) indexing phase (concurrent)

3) takeover phase (concurrent)
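The scheduling constraint above (phase 1 one vbucket at a time per node, phases 2 and 3 concurrent) can be sketched with per-node locks. This is an illustration in Python threads, not the Erlang implementation. `run_moves` and the three phase callbacks are hypothetical, and holding the lock on both source and destination is an assumption about what "per node" means.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Move = Tuple[int, str, str]  # (vbucket, source node, destination node)


def run_moves(moves: List[Move],
              build_replica: Callable[[int, str, str], None],
              index: Callable[[int, str], None],
              takeover: Callable[[int, str, str], None],
              max_workers: int = 8) -> List[int]:
    """Run one mover per vbucket; return the vbuckets in input order."""
    node_locks = {}
    for _, src, dst in moves:
        node_locks.setdefault(src, threading.Lock())
        node_locks.setdefault(dst, threading.Lock())

    def single_vbucket_mover(move: Move) -> int:
        vb, src, dst = move
        # Take locks in a fixed global order so two movers cannot deadlock.
        locks = [node_locks[n] for n in sorted({src, dst})]
        for lock in locks:
            lock.acquire()
        try:
            # Phase 1: bulk replica building, assumed to return only once
            # the data is persisted on the destination.
            build_replica(vb, src, dst)
        finally:
            for lock in reversed(locks):
                lock.release()
        index(vb, dst)          # Phase 2: runs concurrently across vbuckets
        takeover(vb, src, dst)  # Phase 3: runs concurrently across vbuckets
        return vb

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(single_vbucket_mover, moves))
```

Releasing the node locks before phases 2 and 3 is what lets indexing and takeover for one vbucket overlap with replica building for the next.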

Page 188: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

4 Simple Partition States

Partition State | When request arrives from a client … | Used …
ACTIVE | process request as normal | during normal operations
PENDING | server blocks the connection | during Rebalance, transferring partition ownership between servers
REPLICA | error response: you’re accessing the wrong server! | to keep Couchbase consistent
DEAD | error response: you’re accessing the wrong server! | to keep Couchbase consistent
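The table maps directly onto a small dispatch function. The enum and function names below are made up for the illustration; only the four states and their three behaviours come from the slide.

```python
from enum import Enum


class PartitionState(Enum):
    ACTIVE = "active"
    PENDING = "pending"
    REPLICA = "replica"
    DEAD = "dead"


class Action(Enum):
    PROCESS = "process request as normal"
    BLOCK = "block the connection"
    WRONG_SERVER = "error response: you're accessing the wrong server"


def on_client_request(state: PartitionState) -> Action:
    """What the server does when a client request hits a partition in `state`."""
    if state is PartitionState.ACTIVE:
        return Action.PROCESS
    if state is PartitionState.PENDING:
        # Ownership is being transferred during Rebalance.
        return Action.BLOCK
    # REPLICA and DEAD both refuse client traffic.
    return Action.WRONG_SERVER


if __name__ == "__main__":
    for s in PartitionState:
        print(s.name, "->", on_client_request(s).value)
```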

Page 189: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

CMD_ENABLE_TRAFFIC

switch from “tmp not ready error” during warmup to “not-my-vbucket” error

Page 190: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

janitor

janitor is a library

the phases of a janitor run…

1) wait until everyone is ready

   a) all states of all vbuckets on all nodes are ready

   b) so, we know all buckets are created, etc

2) change vbucket states and drop old replication streams

3) create new replication streams

so, warmup and bucket creation are treated very similarly
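The ordering of the phases can be sketched as below. The callbacks are hypothetical stand-ins for whatever the janitor library calls. Giving up and leaving the retry to the next run when a node is not ready is an assumption, since the slide only says "wait".

```python
from typing import Callable, Iterable, List


def janitor_run(nodes: Iterable[str],
                is_ready: Callable[[str], bool],
                change_states: Callable[[], None],
                drop_old_streams: Callable[[], None],
                create_new_streams: Callable[[], None]) -> List[str]:
    """Run the janitor phases in order; return the names of the phases that ran."""
    # Phase 1: everyone must be ready (warmup finished, buckets created).
    # This sketch does not block; an unready node ends the run early.
    if any(not is_ready(node) for node in nodes):
        return []

    # Phase 2: fix vbucket states and drop replication streams that no
    # longer match the map.
    change_states()
    drop_old_streams()

    # Phase 3: only now create the streams the map calls for.
    create_new_streams()
    return ["ready", "states+drop", "create"]
```

Because phase 1 treats "still warming up" and "bucket not created yet" the same way, one code path covers both cases.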

Page 191: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

Inside a Couchbase cluster

[Diagram: five Couchbase Server nodes, each containing a Cluster Manager and a Data Manager.]

Page 192: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014

About Me

co-founder, Couchbase

co-founder, Escalate => GE Retail Systems

co-founder, Kiva Software => Netscape Application Server

Approach Software RDBMS => Lotus

Page 193: Under the Covers - Couchbase Server Architecture: Couchbase Connect 2014


fast forward vbucket map & CCCP