Enabling High Availability and Disaster Recovery in Couchbase Server

High Availability / Disaster Recovery, Mel Boulos, Solutions Engineer, Couchbase

Upload: couchbase

Post on 12-Jan-2017


TRANSCRIPT

Page 1: Enabling High Availability and Disaster Recovery in Couchbase Server

High Availability / Disaster Recovery

Mel Boulos Solutions Engineer

Couchbase

Page 2: Enabling High Availability and Disaster Recovery in Couchbase Server

©2015 Couchbase Inc. 3

Next 40 minutes …

Part I - High Availability
– Single node architecture
– Local data redundancy
– Rebalance and failover
– Node recovery

Part II - Disaster Recovery
– Business continuity for “mission-critical” applications
– Geo redundancy
– Backup-Restore for worst-case scenario

Page 3: Enabling High Availability and Disaster Recovery in Couchbase Server

Part I - High Availability

Page 4: Enabling High Availability and Disaster Recovery in Couchbase Server

Couchbase Server – Single Node Architecture

Single node type is the foundation for high availability architecture

No Single Point of Failure (SPOF)

Easy scalability

Diagram: three identical nodes (Couchbase Server 1, 2 and 3). Each node shows a Cluster Manager, Data Service, Index Service, Query Service, Managed Cache, Storage, and its own set of shards.

Page 5: Enabling High Availability and Disaster Recovery in Couchbase Server

Intra-Cluster Replication – Data Redundancy

RAM to RAM replication

Max of 4 copies of data in a Cluster

Bandwidth optimized by de-duplicating (‘de-dup’) items

Intra-cluster replication is the process of replicating data on multiple servers within a cluster in order to provide data redundancy.
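As a rough illustration of the ‘de-dup’ idea: if the same item is updated several times while it waits to be replicated, only its latest version needs to be sent. The sketch below is plain Python, not Couchbase code; `dedup_queue` and the `(key, value)` tuple format are hypothetical names chosen for the example.

```python
def dedup_queue(pending):
    """Collapse a list of (key, value) updates to the latest value per key.

    Keys come out in the order of their most recent update.
    """
    latest = {}
    for key, value in pending:
        # Drop any older entry so the key moves to the end of the dict.
        latest.pop(key, None)
        latest[key] = value
    return list(latest.items())
```

With three queued updates, two of them to the same key, only two items would be sent.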

Page 6: Enabling High Availability and Disaster Recovery in Couchbase Server

Write Operation – Data Redundancy

Diagram labels: Application Server, Managed Cache, Disk, DOC 1, DCP, Indexer.

Caching based on Memcached: the app gets an ACK when the write is successfully in
– RAM, or
– RAM+Replicated, or
– RAM+Persisted, or
– RAM+Replicated+Persisted

DCP based Replication: writes queued to other nodes

Couchstore based Storage: writes queued for storage
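The four ACK conditions on this slide can be modeled in a few lines. This is a minimal sketch in plain Python, not Couchbase SDK code; `Durability`, `WriteState` and `can_ack` are hypothetical names used only to show how the conditions combine.

```python
from dataclasses import dataclass
from enum import Enum


class Durability(Enum):
    """Hypothetical labels for the four ACK conditions on the slide."""
    RAM = "RAM"
    RAM_REPLICATED = "RAM+Replicated"
    RAM_PERSISTED = "RAM+Persisted"
    RAM_REPLICATED_PERSISTED = "RAM+Replicated+Persisted"


@dataclass
class WriteState:
    """Where a single write has landed so far (illustrative model only)."""
    in_ram: bool = False
    replicated: bool = False  # the queued write has reached another node
    persisted: bool = False   # the queued write has reached storage


def can_ack(state: WriteState, level: Durability) -> bool:
    """Return True once the write satisfies the requested condition."""
    if not state.in_ram:
        return False
    needs_replica = level in (Durability.RAM_REPLICATED,
                              Durability.RAM_REPLICATED_PERSISTED)
    needs_disk = level in (Durability.RAM_PERSISTED,
                           Durability.RAM_REPLICATED_PERSISTED)
    replica_ok = state.replicated or not needs_replica
    disk_ok = state.persisted or not needs_disk
    return replica_ok and disk_ok
```

The stricter the level, the longer the application waits for its ACK, because replication and storage are both fed from queues.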

Page 7: Enabling High Availability and Disaster Recovery in Couchbase Server

Database Change Protocol – Data Redundancy

DCP is the new streaming replication protocol in Couchbase Server 3.0

High-Performance, Stream-based Protocol

Better Resume-ability after blips and failures

Ordering

Consistent

Intra-Cluster Replication

Cross Datacenter Replication

Incremental Rebalance

Incremental Backup & Restore

External streams for Change Data Capture (CDC) in future

Incremental Map/Reduce Views

Global Secondary Indexes

Connectors (Kafka, Sqoop, Spark)

Page 8: Enabling High Availability and Disaster Recovery in Couchbase Server

Auto Tuning Shared Thread Pool - Durability

Efficient Auto-Tuning Engine

Detect and allocate threads based on HW resources

Pool threads for best resource utilization

Improved latency across the board

Faster Rebalance

Faster Node Reactivation

Faster Durability with Writes & PersistTo

Page 9: Enabling High Availability and Disaster Recovery in Couchbase Server

Rebalance Operation – Data Availability

Rebalance redistributes data-partitions (data) around the cluster
– When adding nodes
– When removing nodes
– When nodes have failed over

Aim is to bring the cluster back to optimal health

Data-partitions are moved between nodes automatically

Rebalance happens on an active cluster
– Allows you to expand/shrink without pausing your application
– Client libraries automatically handle the rebalance and redistribute their requests accordingly
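To make the redistribution concrete, here is a simplified sketch of spreading partitions evenly over a changed node list. It is an illustration in plain Python, not the algorithm Couchbase Server uses; `rebalance` and the map layout (partition number to node name) are assumptions made for the example.

```python
from collections import defaultdict


def rebalance(partition_map: dict[int, str], nodes: list[str]) -> dict[int, str]:
    """Return a new partition -> node map spread evenly over `nodes`.

    A partition stays on its current node while that node is still in the
    cluster and under its quota; the rest are moved to nodes with room.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    base, extra = divmod(len(partition_map), len(nodes))
    # The first `extra` nodes take one more partition than the others.
    quota = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}
    load = defaultdict(int)
    new_map, to_move = {}, []
    for part in sorted(partition_map):
        owner = partition_map[part]
        if owner in quota and load[owner] < quota[owner]:
            new_map[part] = owner
            load[owner] += 1
        else:
            to_move.append(part)
    for part in to_move:
        target = next(n for n in nodes if load[n] < quota[n])
        new_map[part] = target
        load[target] += 1
    return new_map
```

Adding a fourth node to a three-node map of 12 partitions moves three partitions to the new node and leaves the other nine in place, which is why the cluster can keep serving requests during the operation.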

Page 10: Enabling High Availability and Disaster Recovery in Couchbase Server

Failover Operation - Fault-tolerance

Failover automatically switches over to the replicas for a given database
– Gracefully under node maintenance
– Immediately under auto-failover
– Can be triggered manually through the Admin-UI/REST/CLI

Automatic failover in case of unplanned outages (system failures)
– Can be configured through Admin-UI/REST/CLI
– Constraints in place to avoid “split-brain” and false positives
  – 30 second delay, multiple heartbeat “pings”
  – Clusters >=3 nodes
  – Only one node down at a time
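The three constraints listed above can be read as a single decision rule. The function below is a hedged sketch of that rule in plain Python; `should_auto_failover` and its parameters are hypothetical, and the real cluster manager's checks (such as the heartbeat pings) are not modeled.

```python
def should_auto_failover(cluster_size: int,
                         unresponsive_nodes: int,
                         seconds_unresponsive: float,
                         delay_seconds: float = 30.0) -> bool:
    """Decide whether a single unresponsive node may be failed over."""
    if cluster_size < 3:
        return False  # slide: clusters >= 3 nodes
    if unresponsive_nodes != 1:
        return False  # slide: only one node down at a time
    # slide: 30 second delay before acting, to avoid false positives
    return seconds_unresponsive >= delay_seconds
```

For example, a node that has been silent for 10 seconds in a five-node cluster would not yet be failed over, and two silent nodes would never be failed over automatically.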

Page 11: Enabling High Availability and Disaster Recovery in Couchbase Server

Automatic Failover – “In action”

Diagram: a cluster of Server 1 to Server 5 holding active and replica shards, accessed by app servers through the Couchbase client library and its cluster map. Server 1: active shards 5, 2, 9 and replica shards 4, 1, 8. Server 2: active shards 4, 7, 8 and replica shards 6, 3, 2. Server 3: active shards 1, 3, 6 and replica shards 7, 9, 5.

App servers accessing Shards

Requests to Server 3 fail

Cluster detects server failed

Promotes replicas of Shards to active

Updates cluster map

Requests for docs now go to appropriate server

Typically rebalance would follow
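The failover sequence can be sketched as an update to a cluster map. This is an illustrative model in plain Python, not the Couchbase client library; `ClusterMap`, `fail_over` and the three-server shard layout in `example` are assumptions made for the sketch, with one replica per shard.

```python
from dataclasses import dataclass


@dataclass
class ClusterMap:
    """Shard -> server lookups for active and replica copies."""
    active: dict[int, str]
    replica: dict[int, str]

    def fail_over(self, failed: str) -> list[int]:
        """Promote replicas of shards that were active on `failed`."""
        promoted = []
        for shard, owner in list(self.active.items()):
            if owner == failed:
                # The replica copy becomes the new active copy.
                self.active[shard] = self.replica.pop(shard)
                promoted.append(shard)
        # Replica copies stored on the failed server are lost until a
        # later rebalance recreates them elsewhere.
        for shard in [s for s, n in self.replica.items() if n == failed]:
            del self.replica[shard]
        return sorted(promoted)


example = ClusterMap(
    active={5: "server1", 2: "server1", 9: "server1",
            4: "server2", 7: "server2", 8: "server2",
            1: "server3", 3: "server3", 6: "server3"},
    replica={4: "server1", 1: "server1", 8: "server1",
             6: "server2", 3: "server2", 2: "server2",
             7: "server3", 9: "server3", 5: "server3"},
)
```

Calling `example.fail_over("server3")` promotes shards 1, 3 and 6 on the surviving servers, after which lookups in `example.active` no longer point at the failed server. Several shards are then left without a replica, which is why a rebalance typically follows.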

Page 12: Enabling High Availability and Disaster Recovery in Couchbase Server

Node Recovery – Bring Cluster back to Capacity

Failed-over node can be re-added to the cluster
– Full recovery: add back as a fresh node
– Delta node recovery: add the failed node back into the cluster incrementally, without having to rebuild the full node

Page 13: Enabling High Availability and Disaster Recovery in Couchbase Server

Rack-Zone Awareness – Rack-Zone Availability

Grouping of servers into server groups so that each group is on a physically separate rack

Ensures that replica data partitions are not on the same rack as the primary partitions

Servers 1, 2, 3 on Rack 1

Servers 4, 5, 6 on Rack 2

Servers 7, 8, 9 on Rack 3

Cluster has 2 replicas (3 copies of data)

This is a balanced configuration
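A small sketch can show what rack-aware placement means for the nine-server example. It is plain Python under stated assumptions, not Couchbase's placement logic: `place_replicas` is a hypothetical helper, and it places each replica in a different server group, which is one simple way to keep replicas off the active copy's rack.

```python
def place_replicas(shard: int,
                   active_node: str,
                   groups: dict[str, list[str]],
                   num_replicas: int) -> list[str]:
    """Pick replica nodes for `shard` outside the active node's group."""
    home = next(g for g, members in groups.items() if active_node in members)
    other_groups = [g for g in groups if g != home]
    if num_replicas > len(other_groups):
        raise ValueError("not enough other groups for the requested replicas")
    # Rotate by shard number so replicas spread over each group's nodes.
    return [groups[g][shard % len(groups[g])]
            for g in other_groups[:num_replicas]]


racks = {
    "Rack 1": ["server1", "server2", "server3"],
    "Rack 2": ["server4", "server5", "server6"],
    "Rack 3": ["server7", "server8", "server9"],
}
```

With 2 replicas and three racks, a shard that is active on Rack 1 gets one copy on Rack 2 and one on Rack 3, so losing a whole rack still leaves two copies of the data.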

Page 14: Enabling High Availability and Disaster Recovery in Couchbase Server

Couchbase Server - MDS Architecture (NEW in 4.0)

What is Multi-Dimensional Scalability?

MDS is the architecture that enables independent scaling of data, query and indexing workloads. It also isolates the services from one another to minimize interference.

Independent “zones” for Query, Index and Data Services

Diagram: a Couchbase Cluster with nodes labeled node1 to node8, divided into Query Service, Index Service and Data Service zones.

Page 15: Enabling High Availability and Disaster Recovery in Couchbase Server

Couchbase Server - MDS Architecture (NEW in 4.0)

Page 16: Enabling High Availability and Disaster Recovery in Couchbase Server

Part II – Disaster Recovery

Page 17: Enabling High Availability and Disaster Recovery in Couchbase Server

Cross Datacenter Replication (XDCR)

Unidirectional Replication
– Hot spare / Disaster Recovery
– Development/Testing copies

Bidirectional Replication
– Datacenter Locality
– Multiple Active Masters

Page 18: Enabling High Availability and Disaster Recovery in Couchbase Server

Cross Datacenter Replication (XDCR) using DCP

Continuously replicates data from a source cluster to remote clusters, which may be spread across geographies

Supports unidirectional and bidirectional operation

Application can read and write from both clusters (active-active replication)

Automatically handles node addition and removal

Simplified administration via Admin UI, REST, and CLI

Pause and resume XDCR replication

(NEW in 4.0) Filtering of data on replication stream
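Filtering on the replication stream can be pictured as a predicate applied to each change before it leaves the source cluster. The sketch below assumes the filter is a regular expression matched against document keys; that assumption, the `filter_stream` name and the `(key, document)` tuple format are illustrative and not taken from the slides.

```python
import re
from typing import Iterable, Iterator


def filter_stream(mutations: Iterable[tuple[str, dict]],
                  key_pattern: str) -> Iterator[tuple[str, dict]]:
    """Yield only the (key, document) pairs whose key matches the pattern."""
    matcher = re.compile(key_pattern)
    for key, doc in mutations:
        if matcher.search(key):
            yield key, doc
```

For instance, a pattern of `^user::` would forward only documents whose keys start with `user::`, which keeps unrelated data off the link between datacenters.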

Page 19: Enabling High Availability and Disaster Recovery in Couchbase Server

XDCR – Memory based using DCP

Diagram labels: Application Server, Managed Cache, Disk, DOC 1, Indexer, Intra-Cluster Replication, Cross Datacenter Replication.

Page 20: Enabling High Availability and Disaster Recovery in Couchbase Server

Backup & Restore - Oops

The cbbackup tool provides backup for a running cluster
– Entire cluster, across all buckets
– Single node, across all buckets
– Single node, single bucket
– Supports remote or local access

Page 21: Enabling High Availability and Disaster Recovery in Couchbase Server

Minimize time and resources during backups

Efficient Recovery with Incremental Backup & Restore

• Back up only the data updated since the last backup

• Differential Backups

• Cumulative Backups
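The difference between the two incremental styles is easiest to see on a small change log. The sketch below assumes that a differential backup holds changes since the most recent backup of any kind, and a cumulative backup holds changes since the most recent full backup; the slides do not spell this out, so treat both definitions, the `select_changes` helper and the mode names `"full"`, `"diff"` and `"accu"` as labeled assumptions.

```python
def select_changes(changes, backups, mode):
    """Return the keys a new backup of type `mode` would contain.

    changes: list of (sequence_number, key) in increasing order
    backups: list of (sequence_number, kind) for earlier backups, in order
    mode:    "full", "diff" (differential) or "accu" (cumulative)
    """
    if mode == "full":
        start = 0
    elif mode == "diff":
        # Everything after the most recent backup of any kind.
        start = backups[-1][0] if backups else 0
    elif mode == "accu":
        # Everything after the most recent full backup.
        fulls = [seq for seq, kind in backups if kind == "full"]
        start = fulls[-1] if fulls else 0
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    return [key for seq, key in changes if seq > start]
```

Given changes 1 to 4, a full backup at 2 and a differential at 3, a new differential would hold only change 4 while a cumulative backup would hold changes 3 and 4. Differential backups are smaller, but a restore needs the whole chain; a cumulative backup needs only the last full backup beside it.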

Page 22: Enabling High Availability and Disaster Recovery in Couchbase Server

Thank you.

Questions?