Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Anil Kumar, Product Management, Couchbase

Upload: couchbase

Post on 29-Jun-2015


DESCRIPTION

Abstract: Join Anil Kumar for a demo-filled and guided session to learn how to deliver continuously available mission-critical apps across data centers. For today’s mission-critical apps, high availability is no longer a ‘nice to have’ but is essential. Downtime and data loss are unacceptable, resulting in lost revenue. In this session we will cover the wide array of high availability and disaster recovery features available in Couchbase Server.

TRANSCRIPT

Page 1: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Ultra-High Availability & Disaster Recovery with Couchbase Server

Anil Kumar Product Management, Couchbase

Page 2: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

©2014 Couchbase, Inc. 2

About Me

Anil Kumar, Product Manager, Couchbase

[email protected]

@anilkumar1129

Page 3: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Part I - High Availability
  Single node architecture
  Local data redundancy
  Rebalance and failover
  Node recovery

Part II - Disaster Recovery
  Business continuity for “mission-critical” applications
  Geo redundancy
  Backup-Restore for worst case scenario

Demo

Q & A

High-Availability & Disaster Recovery

Page 4: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Part I - High Availability

Page 5: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Couchbase Server – Single Node Architecture

[Diagram: single-node architecture. Each node runs a Cluster Manager and a Data Manager.]

Cluster Manager (Erlang / OTP): heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager. Some components run on each node; others run once per cluster.

Cluster Manager interfaces: HTTP REST management API / Web UI (port 8091), Erlang port mapper (port 4369), Distributed Erlang (ports 21100 - 21199)

Data Manager: Query Engine with Query API (port 8092), Moxi (port 11211, Memcapable 1.0), Memcached (port 11210, Memcapable 2.0) with the Couchbase EP Engine behind the storage interface, Persistence Layer

A single node type is the foundation of the high availability architecture

No Single Point of Failure (SPOF)

Easy scalability
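The port numbers on the architecture slide can be collected into a small lookup table, which is handy when checking firewall rules for a node. The sketch below is only an illustration: the helper names `port_is_open` and `check_node` are made up here, and the check is a plain TCP connect rather than anything Couchbase-specific.

```python
import socket

# Ports as labeled on the architecture slide.
COUCHBASE_PORTS = {
    8091: "HTTP REST management API / Web UI",
    8092: "Query API",
    11210: "Memcached (Memcapable 2.0)",
    11211: "Moxi (Memcapable 1.0)",
    4369: "Erlang port mapper",
}
DISTRIBUTED_ERLANG_PORTS = range(21100, 21200)  # 21100 - 21199 inclusive


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host):
    """Map each well-known port to whether it is reachable on `host`."""
    return {port: port_is_open(host, port) for port in COUCHBASE_PORTS}
```

Calling `check_node` with the address of one of your own nodes returns a dictionary such as `{8091: True, ...}`; any `False` entry points at a blocked or closed port.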

Page 6: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Intra-Cluster Replication – Data Redundancy

RAM to RAM replication

Maximum of 4 copies of data in a cluster (1 active copy plus up to 3 replicas)

Bandwidth optimized through de-duplication (‘de-dup’) of items


Intra-cluster replication is the process of replicating data on multiple servers within a cluster in order to provide data redundancy.
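The ‘de-dup’ idea can be pictured as a queue that keeps only the newest pending mutation per key, so a document updated several times before the queue drains is sent once. This is a toy model, not Couchbase's implementation; the class name `DedupQueue` and the sample keys are invented for the illustration.

```python
from collections import OrderedDict


class DedupQueue:
    """Toy replication queue that keeps only the latest mutation per key."""

    def __init__(self):
        self._pending = OrderedDict()  # key -> latest value

    def enqueue(self, key, value):
        # A newer mutation for a key replaces the older one still in the queue;
        # move_to_end keeps the queue ordered by most recent update.
        self._pending[key] = value
        self._pending.move_to_end(key)

    def drain(self):
        """Return the queued (key, value) pairs and empty the queue."""
        items = list(self._pending.items())
        self._pending.clear()
        return items


queue = DedupQueue()
for version in range(1, 4):
    queue.enqueue("user::42", {"visits": version})  # three updates, same key
queue.enqueue("user::7", {"visits": 1})
batch = queue.drain()  # two items are shipped instead of four
```

Four writes produce a batch of two items, which is where the bandwidth saving comes from.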

Page 7: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Write Operation – Data Redundancy

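[Diagram: write operation. The App Server writes a Doc to the Managed Cache on a node. From the cache the Doc is placed on the Disk Queue, which writes it to Disk, and on the Replication Queue, which performs Memory-to-Memory Replication to another node.]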
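The write path in the diagram can be sketched as a small model: a write lands in RAM first and is then queued separately for disk and for replication. Everything here (the `Node` class, its attribute names, the sample document) is hypothetical and only mirrors the boxes in the diagram.

```python
from collections import deque


class Node:
    """Toy model of one node's write path (names are illustrative)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}                   # managed cache (RAM)
        self.disk = {}                    # persisted documents
        self.disk_queue = deque()         # pending writes to local disk
        self.replication_queue = deque()  # pending copies to a replica node

    def write(self, key, doc):
        # The write lands in RAM first, then is queued for disk and replication.
        self.cache[key] = doc
        self.disk_queue.append(key)
        self.replication_queue.append(key)

    def flush_disk(self):
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]

    def replicate_to(self, replica):
        # Memory-to-memory: the replica receives the doc into its own cache
        # and queues it for its own disk.
        while self.replication_queue:
            key = self.replication_queue.popleft()
            replica.cache[key] = self.cache[key]
            replica.disk_queue.append(key)


active, replica = Node("node1"), Node("node2")
active.write("doc1", {"v": 1})
in_ram_before_flush = "doc1" in active.cache and "doc1" not in active.disk
active.replicate_to(replica)
active.flush_disk()
```

In this model the replica holds the document in memory before either node has necessarily written it to disk, which is the point of RAM-to-RAM replication.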

Page 8: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


(New in 3.0) Database Change Protocol (DCP) – Data Redundancy


DCP is a new streaming replication protocol in Couchbase Server 3.0

High-Performance, Stream-based Protocol

Better Resume-ability after blips and failures

Powers Intra Cluster Replication

Powers Cross Datacenter Replication

Powers Incremental Backup & Restore

Up to 150x Improvement on ReplicateTo latency from 2.5 to 3.0
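"Better resume-ability" in a stream-based protocol generally means the consumer remembers how far it got and asks only for what it missed. The sketch below illustrates that idea with sequence numbers; it is a simplified model under that assumption and not the DCP wire protocol. `Producer`, `Consumer`, and the `limit` parameter used to simulate a dropped connection are all invented for the example.

```python
class Producer:
    """Toy source that keeps an ordered log of (seqno, key, value) mutations."""

    def __init__(self):
        self.log = []

    def mutate(self, key, value):
        seqno = len(self.log) + 1  # sequence numbers start at 1 and only grow
        self.log.append((seqno, key, value))
        return seqno

    def stream(self, start_after=0):
        """Yield every mutation with a sequence number above `start_after`."""
        for entry in self.log[start_after:]:
            yield entry


class Consumer:
    def __init__(self):
        self.data = {}
        self.last_seqno = 0  # high-water mark used to resume

    def consume(self, stream, limit=None):
        for count, (seqno, key, value) in enumerate(stream, start=1):
            self.data[key] = value
            self.last_seqno = seqno
            if limit is not None and count >= limit:
                break  # simulate a dropped connection


producer, consumer = Producer(), Consumer()
for i in range(5):
    producer.mutate(f"k{i}", i)
consumer.consume(producer.stream(), limit=2)          # connection drops after 2 items
resumed = list(producer.stream(consumer.last_seqno))  # only the missing items
consumer.consume(iter(resumed))
```

After the simulated blip, only three of the five mutations are sent again rather than the whole stream.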

Page 9: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

(New in 3.0) Auto Tuning Shared Thread Pool - Durability


Efficient Auto-Tuning Engine
  Detect and allocate threads based on HW resources
  Pool threads for best resource utilization

Improved latency across the board
  Faster Rebalance
  Faster Node Reactivation
  Faster Durability with Writes & PersistTo

Up to 3x better PersistTo latency from 2.5 to 3.0

Page 10: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Rebalance Operation in Couchbase Server – Data Availability

Rebalance redistributes data-partitions (data) around the cluster
  When adding nodes
  When removing nodes
  When nodes have failed over

Aim is to bring the cluster back to optimal health

Data-partitions are moved between nodes automatically

Rebalance happens on an active cluster
  Allows you to expand/shrink without pausing your application
  Client libraries automatically handle the rebalance and redistribute their requests accordingly

Up to 2x faster rebalance under load in 3.0 compared with 2.5.1
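The redistribution of data-partitions can be illustrated with a toy function that spreads partitions evenly across the current node list while moving as few as possible. This is not the algorithm Couchbase Server uses (which also has to place replicas); the function name `rebalance` and the 16-partition example are assumptions made for the sketch.

```python
def rebalance(vbucket_map, nodes):
    """Return a new partition -> node map spread evenly over `nodes`.

    Partitions stay where they are when possible; only the surplus moves.
    """
    target, extra = divmod(len(vbucket_map), len(nodes))
    # The first `extra` nodes may hold one partition more than the rest.
    quota = {node: target + (1 if i < extra else 0) for i, node in enumerate(nodes)}
    new_map, to_move = {}, []
    for vb, node in sorted(vbucket_map.items()):
        if quota.get(node, 0) > 0:
            new_map[vb] = node   # keep in place
            quota[node] -= 1
        else:
            to_move.append(vb)   # owner is over quota or was removed
    for vb in to_move:
        node = next(n for n in nodes if quota[n] > 0)
        new_map[vb] = node
        quota[node] -= 1
    return new_map


before = {vb: f"node{vb % 2 + 1}" for vb in range(16)}  # 2 nodes, 8 partitions each
after = rebalance(before, ["node1", "node2", "node3", "node4"])
moved = [vb for vb in before if before[vb] != after[vb]]
```

Growing from two nodes to four leaves each node with four partitions, and only half of the partitions change owner.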

Page 11: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Failover in Couchbase Server - Fault-tolerance

Failover automatically switches over to the replicas for a given database
  Gracefully under node maintenance
  Immediately under auto-failover

Manual failover for node maintenance
  Can be triggered manually through the Admin-UI/REST/CLI

Automatic failover in case of an unplanned outage (system failures)
  Can be configured through Admin-UI/REST/CLI
  Constraints in place to avoid “split-brain” and false positives
    30 second delay, multiple heartbeat “pings”
    Clusters >=3 nodes
    Only one node down at a time
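The auto-failover constraints listed above can be written down as a single decision function. This is a sketch of the rules as stated on the slide, not of the server's internal logic; `NodeStatus`, `seconds_unresponsive`, and `node_to_auto_failover` are hypothetical names.

```python
from dataclasses import dataclass

AUTO_FAILOVER_DELAY_S = 30   # delay named on the slide
MIN_CLUSTER_SIZE = 3         # "Clusters >=3 nodes"


@dataclass
class NodeStatus:
    name: str
    seconds_unresponsive: float  # 0 when the node answers heartbeats


def node_to_auto_failover(nodes):
    """Return the name of the node to fail over, or None if no action is safe."""
    if len(nodes) < MIN_CLUSTER_SIZE:
        return None
    down = [n for n in nodes if n.seconds_unresponsive > 0]
    if len(down) != 1:
        return None  # nothing is down, or more than one node is down at a time
    candidate = down[0]
    if candidate.seconds_unresponsive < AUTO_FAILOVER_DELAY_S:
        return None  # wait out the delay to avoid reacting to a blip
    return candidate.name
```

For example, a three-node cluster with exactly one node silent for 45 seconds yields that node's name, while a two-node cluster or a cluster with two silent nodes yields `None` and is left for an operator to handle.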

Page 12: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Automatic Failover – “In action”

[Diagram: App Server 1 and App Server 2 each use a Couchbase Client Library that holds a CLUSTER MAP. SERVER 1 to SERVER 5 are shown, each with Active and Replica shard sets.]

SERVER 1: Active Shards 5, 2, 9; Replica Shards 4, 1, 8

SERVER 2: Active Shards 4, 7, 8; Replica Shards 6, 3, 2

SERVER 3: Active Shards 1, 3, 6; Replica Shards 7, 9, 5

SERVER 4 and SERVER 5: Active and Replica shard slots shown without numbers

App servers accessing Shards

Requests to Server 3 fail

Cluster detects server failed

Promotes replicas of Shards to active

Updates cluster map

Requests for docs now go to appropriate server

Typically rebalance would follow
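The sequence above can be traced with the shard layout from the slide for Servers 1 to 3. The sketch is a toy model of replica promotion and cluster-map update; the dictionary layout and the function names `build_cluster_map` and `fail_over` are invented for the illustration.

```python
# Layout taken from the slide: shard numbers held as active/replica per server.
cluster = {
    "server1": {"active": {5, 2, 9}, "replica": {4, 1, 8}},
    "server2": {"active": {4, 7, 8}, "replica": {6, 3, 2}},
    "server3": {"active": {1, 3, 6}, "replica": {7, 9, 5}},
}


def build_cluster_map(cluster):
    """Shard -> server that currently holds the active copy."""
    return {s: name for name, node in cluster.items() for s in node["active"]}


def fail_over(cluster, failed):
    """Drop `failed` and promote surviving replicas of its active shards."""
    lost = cluster.pop(failed)["active"]
    for shard in lost:
        for node in cluster.values():
            if shard in node["replica"]:
                node["replica"].remove(shard)
                node["active"].add(shard)
                break
    return build_cluster_map(cluster)


old_map = build_cluster_map(cluster)
new_map = fail_over(cluster, "server3")
```

After Server 3 is failed over, Shard 1 is served by Server 1 and Shards 3 and 6 by Server 2. The promoted shards no longer have a replica anywhere, which is why a rebalance typically follows.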


Page 13: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Node Recovery – Bring Cluster back to Capacity


A failed-over node can be added back to the cluster

Full recovery – Add back as a fresh node

(New in 3.0) Delta node recovery – Add a failed node back into the cluster incrementally, without having to rebuild the full node.

100s-fold reduction in time to re-add a node from 2.5 to 3.0
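The difference between the two recovery modes comes down to how much data has to be copied to the returning node. The toy comparison below assumes the node kept everything up to a known sequence number; the real feature has more to handle than this, and the function names and numbers are made up for the example.

```python
def full_recovery(active_log):
    """Rebuild from scratch: every mutation is copied to the returning node."""
    return list(active_log)


def delta_recovery(active_log, node_last_seqno):
    """Copy only mutations the returning node has not seen yet."""
    return [m for m in active_log if m[0] > node_last_seqno]


# (seqno, key) pairs; the node went down after seqno 9_990 of 10_000.
active_log = [(seqno, f"doc{seqno}") for seqno in range(1, 10_001)]
full = full_recovery(active_log)
delta = delta_recovery(active_log, node_last_seqno=9_990)
```

In this made-up case the full rebuild copies 10,000 items and the delta path copies 10, which illustrates why re-adding a node can be so much faster when little changed while it was out.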

Page 14: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Rack-Zone Awareness – Rack-Zone Availability

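[Diagram: nine servers grouped into three racks. Rack 1 holds servers 1, 2, 3; Rack 2 holds servers 4, 5, 6; Rack 3 holds servers 7, 8, 9.]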

Grouping of servers into server groups so that each group is on a physically separate rack

Ensures that replica data partitions are not on the same rack as the primary partitions

Servers 1, 2, 3 on Rack 1

Servers 4, 5, 6 on Rack 2

Servers 7, 8, 9 on Rack 3

Cluster has 2 replicas (3 copies of data)

This is a balanced configuration
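The nine-server, three-rack configuration with 2 replicas can be modeled to check the property that matters: no two copies of a partition share a rack. The placement rule below (rotate through the racks) is a simple stand-in chosen for the sketch, not Couchbase's placement algorithm, and the server names `s1` to `s9` and the 18-partition count are assumptions.

```python
RACKS = {
    "rack1": ["s1", "s2", "s3"],
    "rack2": ["s4", "s5", "s6"],
    "rack3": ["s7", "s8", "s9"],
}
RACK_OF = {server: rack for rack, servers in RACKS.items() for server in servers}
NUM_REPLICAS = 2  # 3 copies of the data in total


def place(vbucket):
    """Return [active, replica1, replica2], one server per rack."""
    rack_names = list(RACKS)
    copies = []
    for i in range(NUM_REPLICAS + 1):
        # Rotate through the racks so that no two copies share a rack.
        rack = rack_names[(vbucket + i) % len(rack_names)]
        servers = RACKS[rack]
        copies.append(servers[(vbucket // len(rack_names)) % len(servers)])
    return copies


placement = {vb: place(vb) for vb in range(18)}


def available_after_rack_failure(placement, failed_rack):
    """True if every partition still has a copy outside `failed_rack`."""
    return all(
        any(RACK_OF[s] != failed_rack for s in copies)
        for copies in placement.values()
    )
```

With this placement, `available_after_rack_failure(placement, rack)` holds for each of the three racks, and the active copies are spread evenly across the nine servers.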

Page 15: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


If an entire Server Rack fails, data is still available

If an entire Cloud Zone or a Region fails, data is still available

Rack-Zone Awareness


Page 16: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Couchbase Server provides statistics at multiple levels throughout the cluster.

Used for regular monitoring, capacity planning and to identify the performance characteristics.

Enable email alerts to be raised when a significant error occurs on your Couchbase Server cluster.

Monitoring & Alerting
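A monitoring script typically polls cluster status and turns anything abnormal into an alert message. The sketch below works on a hard-coded sample payload so it runs without a cluster. The `nodes`, `hostname`, and `status` fields are modeled on the node list returned by the REST management API on port 8091, but treat the exact field names as assumptions to verify against your server version; `unhealthy_nodes` and `build_alert` are hypothetical helpers.

```python
import json

# Sample payload standing in for a cluster status response (assumed shape).
SAMPLE_RESPONSE = json.dumps({
    "nodes": [
        {"hostname": "10.0.0.1:8091", "status": "healthy"},
        {"hostname": "10.0.0.2:8091", "status": "unhealthy"},
        {"hostname": "10.0.0.3:8091", "status": "healthy"},
    ]
})


def unhealthy_nodes(payload):
    """Return hostnames whose reported status is not "healthy"."""
    cluster = json.loads(payload)
    return [n["hostname"] for n in cluster["nodes"] if n["status"] != "healthy"]


def build_alert(payload):
    """Return an alert message, or None when every node is healthy."""
    bad = unhealthy_nodes(payload)
    if not bad:
        return None
    return "Couchbase alert: unhealthy node(s): " + ", ".join(bad)


alert = build_alert(SAMPLE_RESPONSE)
```

The returned string could be handed to whatever email or paging mechanism is already in place; the built-in email alerts mentioned on the slide cover the common cases without any scripting.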

Page 17: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Part II – Disaster Recovery

Page 18: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014
Page 19: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Cross Datacenter Replication (XDCR)


Unidirectional Replication

Hot spare / Disaster Recovery

Development/Testing copies

Bidirectional Replication

Datacenter Locality

Multiple Active Masters

Page 20: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Cross Datacenter Replication (XDCR) using DCP

Continuously replicates data from a source cluster to remote clusters, which may be spread across geographies

Supports unidirectional and bidirectional operation

Application can read and write from both clusters (active – active replication)

Automatically handles node addition and removal

Simplified Administration via Admin UI, REST, and CLI

(New in 3.0) Pause and resume XDCR replication
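Pause and resume can be pictured as a replication stream that remembers its position: while paused nothing is shipped, and on resume it continues from where it stopped rather than starting over. The model below is a toy with invented names (`Cluster`, `Replication`, `east`, `west`); it shows one direction only and leaves out conflict handling for the bidirectional case.

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.docs = {}
        self.log = []          # ordered (key, value) mutations written locally

    def write(self, key, value):
        self.docs[key] = value
        self.log.append((key, value))


class Replication:
    """One-way stream from `source` to `target`; use two for bidirectional."""

    def __init__(self, source, target):
        self.source, self.target = source, target
        self.position = 0      # index of the next mutation to ship
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def run(self):
        """Ship pending mutations unless paused; return how many were sent."""
        if self.paused:
            return 0
        pending = self.source.log[self.position:]
        for key, value in pending:
            self.target.docs[key] = value   # applied without re-logging
        self.position += len(pending)
        return len(pending)


east, west = Cluster("east"), Cluster("west")
xdcr = Replication(east, west)
east.write("a", 1)
sent_first = xdcr.run()
xdcr.pause()
east.write("b", 2)
sent_while_paused = xdcr.run()
xdcr.resume()
sent_after_resume = xdcr.run()
```

The write made during the pause is not lost: it is shipped on the first run after `resume()`, and the earlier document is not sent a second time.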


Page 21: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Cross Datacenter Replication (XDCR) – Memory based using DCP

[Diagram: XDCR write path. The App Server writes a Doc to the Managed Cache. From the cache the Doc is placed on the Disk Queue (written to Disk), on the Replication Queue (Memory-to-Memory Replication to another node), and on the XDCR Queue ((New in 3.0) Memory-to-Memory Replication to a remote cluster).]

Up to 4x better XDCR latency between clusters in 3.0 compared with 2.5

Page 22: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014


Backup & Restore – Oops Case

The cbbackup tool provides backup for a running cluster
  Entire Cluster – across all buckets
  Single Node – across all buckets
  Single Node – single bucket
  Supports remote or local access

(New in 3.0) Incremental Backups
  Differential or Cumulative
  Back up only data that changed since the last backup.
  Minimize resource and time consumption during backups.
  Enables more frequent backups

Restore a cluster to the point in time of a differential or cumulative backup
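The two incremental modes differ in their reference point. The sketch below assumes the usual definitions: a differential backup holds changes since the most recent backup of any kind, and a cumulative backup holds changes since the most recent full backup. It is a toy model over sequence numbers, not the cbbackup tool itself; `BackupSet` and `changes_since` are invented names.

```python
def changes_since(log, seqno):
    """Mutations with a sequence number greater than `seqno`."""
    return [m for m in log if m[0] > seqno]


class BackupSet:
    """Toy backup history over a log of (seqno, key) mutations."""

    def __init__(self):
        self.last_full = 0   # highest seqno covered by the last full backup
        self.last_any = 0    # highest seqno covered by the last backup of any kind

    def full(self, log):
        self.last_full = self.last_any = log[-1][0] if log else 0
        return list(log)

    def differential(self, log):
        """Changes since the most recent backup of any kind."""
        out = changes_since(log, self.last_any)
        if out:
            self.last_any = out[-1][0]
        return out

    def cumulative(self, log):
        """Changes since the most recent full backup."""
        out = changes_since(log, self.last_full)
        if out:
            self.last_any = out[-1][0]
        return out


log = [(i, f"doc{i}") for i in range(1, 6)]   # seqnos 1..5
backups = BackupSet()
full = backups.full(log)
log += [(6, "doc6"), (7, "doc7")]
diff1 = backups.differential(log)             # changes after the full backup
log += [(8, "doc8")]
diff2 = backups.differential(log)             # changes after diff1 only
cumulative = backups.cumulative(log)          # everything after the full backup
```

Restoring to the latest point therefore needs either the full backup plus every differential in order, or the full backup plus the latest cumulative alone; the trade-off is smaller backups versus a shorter restore chain.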

Page 23: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Backup & Restore

Page 24: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Demo !!!

Page 25: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Visit – Alex Ma (Deep-dive into XDCR & Rack-zone Awareness)

Visit – Cihan (Couchbase on Azure)

Visit – Kirk (Tuning Couchbase Server)

Related Talks

Page 26: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

DOWNLOAD COUCHBASE SERVER 3.0

www.couchbase.com/download

& give us feedback…


Page 27: Ultra-High Availability & Disaster Recovery with Couchbase Server: Couchbase Connect 2014

Q & A

Anil Kumar

Product Management, Couchbase

[email protected]

@anilkumar1129