Fault tolerance in Cassandra
DESCRIPTION
A short talk on how Cassandra deals with various failure modes: a discussion of replication and consistency levels and how they can be used to survive many kinds of failure, ending with an explanation of the recovery methods — repair, hinted handoff and read repair.
TRANSCRIPT
Richard Low
[email protected]
@acunu
@richardalow
Cassandra London Meetup, 5 Sept 2011
Fault tolerance in Cassandra
Tuesday, 6 September 2011
Menu
• Failure modes
• Maintaining availability
• Recovery
Failure modes
Failures are the norm
• With more than a few nodes, something goes wrong all the time
• Don’t want to be down all the time
Failure causes
• Hardware failure
• Bug
• Power
• Natural disaster
Failure modes
• Data centre failure
• Node failure
• Disk failure
Failure modes
• Data centre failure
• Node failure
• Disk failure
• Temporary
• Permanent
Failure modes
• Network failure
• One node
• Network partition
• Whole data centre
Failure modes
• Operator failure
• Delete files
• Delete entire database
• Incorrect configuration
Failure modes
• Want a system that can tolerate all the above failures
• Make assumptions about probabilities of multiple events
• Be careful when assuming independence
Solutions
• Do nothing
• Make boxes bullet proof
• Replication
Availability
How do we maintain availability in the presence of failure?
Replication
• Buy cheap nodes and cheap disks
• Store multiple copies of the data
• Don’t care if some disappear
Replication
• What about consistency?
• What if I can’t tolerate out-of-date reads?
• How do we restore a replica?
RF and CL
• Replication factor
• How many copies
• How much failure we can tolerate
• Consistency Level
• How many nodes must be contactable for operation to succeed
Simple example
• Replication factor 3
• Uniform network topology
• Read and write at CL.QUORUM
• Strong consistency
• Available if any one node is down
• Can recover if any two nodes fail
In general
• RF N, reads and writes at CL.QUORUM
• Available if up to ceil(N/2)-1 nodes fail
• Can recover if up to N-1 nodes fail
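The quorum arithmetic on this slide can be checked with a minimal sketch (the function names here are made up for illustration; only the formulas come from the talk):

```python
from math import floor

def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return floor(rf / 2) + 1

def max_failures_available(rf: int) -> int:
    """Nodes that can be down while QUORUM operations still succeed.
    Equals ceil(rf/2) - 1, as stated on the slide."""
    return rf - quorum(rf)

def max_failures_recoverable(rf: int) -> int:
    """Nodes that can be lost while at least one replica survives."""
    return rf - 1

# RF 3: quorum is 2, available with 1 node down, recoverable after 2 failures
print(quorum(3), max_failures_available(3), max_failures_recoverable(3))
# prints: 2 1 2
```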
Multi data centre
• Cassandra knows location of hosts
• Through the snitch
• Can ensure replicas in each DC
• NetworkTopologyStrategy
• => can cope with whole DC failure
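The idea behind NetworkTopologyStrategy can be sketched as a toy placement function: walk the ring from the key's token and take nodes until each data centre has its requested replica count. This is a hypothetical simplification (real Cassandra also considers racks; the ring layout and names below are invented):

```python
def place_replicas(ring, rf_per_dc, key_token):
    """ring: list of (token, node, dc) sorted by token.
    rf_per_dc: {dc: replica count}. Returns the chosen replica nodes."""
    # start from the first node whose token >= key_token, wrapping around
    start = next((i for i, (t, _, _) in enumerate(ring) if t >= key_token), 0)
    remaining = dict(rf_per_dc)
    replicas = []
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if remaining.get(dc, 0) > 0:       # this DC still needs replicas
            replicas.append(node)
            remaining[dc] -= 1
    return replicas

ring = [(0, "a1", "dc1"), (25, "b1", "dc2"), (50, "a2", "dc1"), (75, "b2", "dc2")]
print(place_replicas(ring, {"dc1": 1, "dc2": 1}, 30))
# prints: ['a2', 'b2'] — one replica in each DC, so a whole-DC failure
# still leaves a copy elsewhere
```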
Recovery
Recovery
• Want to maintain replication factor
• Ensures recovery guarantees
• Methods:
• Automatic
• Manual
Automatic
Automatic processes
• Eventually moves replicas towards consistency
• The ‘eventual’ in ‘eventual consistency’
Hinted Handoff
• Hints
• Stored on any node
• When a node is temporarily unavailable
• Delivered when the node comes back
• Can use CL.ANY
• Writes not immediately readable
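The hinted-handoff flow above can be modelled as a toy in-memory sketch (the class and its methods are hypothetical, purely to show the store-then-replay shape):

```python
class Cluster:
    """Toy model: coordinator stores a hint for a down replica and
    delivers it when that node comes back."""
    def __init__(self, nodes):
        self.up = {n: True for n in nodes}
        self.data = {n: {} for n in nodes}
        self.hints = []                    # (target_node, key, value)

    def write(self, replicas, key, value):
        for node in replicas:
            if self.up[node]:
                self.data[node][key] = value
            else:
                self.hints.append((node, key, value))  # store a hint

    def node_recovers(self, node):
        self.up[node] = True
        # deliver any stored hints to the returning node
        for target, key, value in [h for h in self.hints if h[0] == node]:
            self.data[node][key] = value
        self.hints = [h for h in self.hints if h[0] != node]

c = Cluster(["n1", "n2", "n3"])
c.up["n3"] = False
c.write(["n1", "n2", "n3"], "k", "v")      # n3 is down: a hint is stored
c.node_recovers("n3")                      # the hint is replayed
print(c.data["n3"])                        # prints: {'k': 'v'}
```

Note this also shows why a CL.ANY write is not immediately readable: while n3 is down, the value for it exists only as a hint.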
Read Repair
• Since we’ve done a read, we might as well repair any old copies
• Compare values, update any out of sync
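Read repair can be sketched as "newest timestamp wins, stale copies get overwritten". This is a toy per-key model, not Cassandra's actual code:

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (timestamp, value).
    Returns the newest value and overwrites any stale replicas."""
    versions = [r[key] for r in replicas if key in r]
    newest = max(versions)                 # tuple compare: newest timestamp wins
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest                # repair the out-of-date copy
    return newest[1]

r1 = {"k": (2, "new")}
r2 = {"k": (1, "old")}
r3 = {}                                    # this replica missed the write
print(read_with_repair([r1, r2, r3], "k"))  # prints: new
# r2 and r3 now hold (2, "new") as a side effect of the read
```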
Manual
Repair: method
• Ensures a node is up to date
• Run ‘nodetool -h <node> repair’
• Reads through entire data on the node
• Builds a Merkle tree
• Compares with replicas
• Streams differences
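Why a Merkle tree? Comparing one hash per token range, rolled up into a single root, means matching replicas are confirmed cheaply and only differing ranges need to be streamed. A toy sketch (real repair descends the tree rather than scanning all leaves, and this version assumes a power-of-two leaf count):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaf_hashes):
    """Return the tree as a list of levels, leaves first, root last."""
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        levels.append([h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def diff_ranges(a, b):
    """Leaf indices (token ranges) whose data differs between replicas."""
    ta, tb = build_tree(a), build_tree(b)
    if ta[-1] == tb[-1]:
        return []                          # roots match: nothing to stream
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

a = [h(b"r0"), h(b"r1"), h(b"r2"), h(b"r3")]
b = [h(b"r0"), h(b"r1-stale"), h(b"r2"), h(b"r3")]
print(diff_ranges(a, b))                   # prints: [1] — only that range streams
```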
Repair: when
• After node has been down a long time
• After increasing replication factor
• Every 10 days to ensure tombstones are propagated
• Can be used to restore a failed node
Replace a node: method
• Bootstrap a new node with token <old_token>-1
• Tell existing nodes old node is dead
• nodetool remove
Replace a node: when
• Complete node failure
• Cannot replace failed disk
• Corruption
Restore from backup: method
• Stop Cassandra on the node
• Copy SSTables from backup
• Restart Cassandra
• May take a while reading indexes
Restore from backup: when
• Disk failure
• with no RAID rebuild available
• Operator error
• Corruption
• Hacker
Thanks :)
@acunu / @richardalow