Fault tolerance in Cassandra
DESCRIPTION
A short talk on how Cassandra deals with various failure modes: a discussion of replication and consistency levels and how they can be used to survive many kinds of failure, ending with an explanation of the recovery methods — repair, hinted handoff and read repair.
TRANSCRIPT
Richard Low
[email protected]
@acunu
@richardalow
Cassandra London Meetup, 5 Sept 2011
Fault tolerance in Cassandra
Tuesday, 6 September 2011
Menu
• Failure modes
• Maintaining availability
• Recovery
Failure modes
Failures are the norm
• With more than a few nodes, something goes wrong all the time
• Don’t want to be down all the time
Failure causes
• Hardware failure
• Bug
• Power
• Natural disaster
Failure modes
• Data centre failure
• Node failure
• Disk failure
Failure modes
• Data centre failure
• Node failure
• Disk failure
• Temporary
• Permanent
Failure modes
• Network failure
• One node
• Network partition
• Whole data centre
Failure modes
• Operator failure
• Delete files
• Delete entire database
• Incorrect configuration
Failure modes
• Want a system that can tolerate all the above failures
• Make assumptions about probabilities of multiple events
• Be careful when assuming independence
Solutions
• Do nothing
• Make boxes bullet proof
• Replication
Availability
How do we maintain availability in the presence of failure?
Replication
• Buy cheap nodes and cheap disks
• Store multiple copies of the data
• Don’t care if some disappear
Replication
• What about consistency?
• What if I can’t tolerate out-of-date reads?
• How do we restore a replica?
RF and CL
• Replication factor
• How many copies
• How much failure we can tolerate
• Consistency Level
• How many nodes must be contactable for operation to succeed
Simple example
• Replication factor 3
• Uniform network topology
• Read and write at CL.QUORUM
• Strong consistency
• Available if any one node is down
• Can recover if any two nodes fail
In general
• RF N, reads and writes at CL.QUORUM
• Available if up to ceil(N/2)-1 nodes fail
• Can recover if up to N-1 nodes fail
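The quorum arithmetic on this slide can be checked with a minimal sketch (the function names here are made up for illustration; only the formulas come from the talk):

```python
from math import floor

def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return floor(rf / 2) + 1

def max_failures_available(rf: int) -> int:
    """Nodes that can be down while QUORUM operations still succeed.
    Equals ceil(rf/2) - 1, as stated on the slide."""
    return rf - quorum(rf)

def max_failures_recoverable(rf: int) -> int:
    """Nodes that can be lost while at least one replica survives."""
    return rf - 1

# RF 3: quorum is 2, available with 1 node down, recoverable after 2 failures
print(quorum(3), max_failures_available(3), max_failures_recoverable(3))
# prints: 2 1 2
```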
Multi data centre
• Cassandra knows location of hosts
• Through the snitch
• Can ensure replicas in each DC
• NetworkTopologyStrategy
• => can cope with whole DC failure
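The idea behind NetworkTopologyStrategy can be sketched as a toy placement function: walk the ring from the key's token and take nodes until each data centre has its requested replica count. This is a hypothetical simplification (real Cassandra also considers racks; the ring layout and names below are invented):

```python
def place_replicas(ring, rf_per_dc, key_token):
    """ring: list of (token, node, dc) sorted by token.
    rf_per_dc: {dc: replica count}. Returns the chosen replica nodes."""
    # start from the first node whose token >= key_token, wrapping around
    start = next((i for i, (t, _, _) in enumerate(ring) if t >= key_token), 0)
    remaining = dict(rf_per_dc)
    replicas = []
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if remaining.get(dc, 0) > 0:       # this DC still needs replicas
            replicas.append(node)
            remaining[dc] -= 1
    return replicas

ring = [(0, "a1", "dc1"), (25, "b1", "dc2"), (50, "a2", "dc1"), (75, "b2", "dc2")]
print(place_replicas(ring, {"dc1": 1, "dc2": 1}, 30))
# prints: ['a2', 'b2'] — one replica in each DC, so a whole-DC failure
# still leaves a copy elsewhere
```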
Recovery
Recovery
• Want to maintain replication factor
• Ensures recovery guarantees
• Methods:
• Automatic
• Manual
Automatic
Automatic processes
• Eventually moves replicas towards consistency
• The ‘eventual’ in ‘eventual consistency’
Hinted Handoff
• Hints
• Stored on any node
• When a node is temporarily unavailable
• Delivered when the node comes back
• Can use CL.ANY
• Writes not immediately readable
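The hinted-handoff flow above can be modelled as a toy in-memory sketch (the class and its methods are hypothetical, purely to show the store-then-replay shape):

```python
class Cluster:
    """Toy model: coordinator stores a hint for a down replica and
    delivers it when that node comes back."""
    def __init__(self, nodes):
        self.up = {n: True for n in nodes}
        self.data = {n: {} for n in nodes}
        self.hints = []                    # (target_node, key, value)

    def write(self, replicas, key, value):
        for node in replicas:
            if self.up[node]:
                self.data[node][key] = value
            else:
                self.hints.append((node, key, value))  # store a hint

    def node_recovers(self, node):
        self.up[node] = True
        # deliver any stored hints to the returning node
        for target, key, value in [h for h in self.hints if h[0] == node]:
            self.data[node][key] = value
        self.hints = [h for h in self.hints if h[0] != node]

c = Cluster(["n1", "n2", "n3"])
c.up["n3"] = False
c.write(["n1", "n2", "n3"], "k", "v")      # n3 is down: a hint is stored
c.node_recovers("n3")                      # the hint is replayed
print(c.data["n3"])                        # prints: {'k': 'v'}
```

Note this also shows why a CL.ANY write is not immediately readable: while n3 is down, the value for it exists only as a hint.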
Read Repair
• Since we’ve done a read, we might as well repair any old copies
• Compare values, update any out of sync
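Read repair can be sketched as "newest timestamp wins, stale copies get overwritten". This is a toy per-key model, not Cassandra's actual code:

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (timestamp, value).
    Returns the newest value and overwrites any stale replicas."""
    versions = [r[key] for r in replicas if key in r]
    newest = max(versions)                 # tuple compare: newest timestamp wins
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest                # repair the out-of-date copy
    return newest[1]

r1 = {"k": (2, "new")}
r2 = {"k": (1, "old")}
r3 = {}                                    # this replica missed the write
print(read_with_repair([r1, r2, r3], "k"))  # prints: new
# r2 and r3 now hold (2, "new") as a side effect of the read
```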
Manual
Repair: method
• Ensures a node is up to date
• Run ‘nodetool -h <node> repair’
• Reads through entire data on the node
• Builds a Merkle tree
• Compares with replicas
• Streams differences
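Why a Merkle tree? Comparing one hash per token range, rolled up into a single root, means matching replicas are confirmed cheaply and only differing ranges need to be streamed. A toy sketch (real repair descends the tree rather than scanning all leaves, and this version assumes a power-of-two leaf count):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaf_hashes):
    """Return the tree as a list of levels, leaves first, root last."""
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        levels.append([h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def diff_ranges(a, b):
    """Leaf indices (token ranges) whose data differs between replicas."""
    ta, tb = build_tree(a), build_tree(b)
    if ta[-1] == tb[-1]:
        return []                          # roots match: nothing to stream
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

a = [h(b"r0"), h(b"r1"), h(b"r2"), h(b"r3")]
b = [h(b"r0"), h(b"r1-stale"), h(b"r2"), h(b"r3")]
print(diff_ranges(a, b))                   # prints: [1] — only that range streams
```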
Repair: when
• After node has been down a long time
• After increasing replication factor
• Every 10 days to ensure tombstones are propagated
• Can be used to restore a failed node
Replace a node: method
• Bootstrap a new node with token <old_token>-1
• Tell existing nodes old node is dead
• nodetool remove
Replace a node: when
• Complete node failure
• Cannot replace failed disk
• Corruption
Restore from backup: method
• Stop Cassandra on the node
• Copy SSTables from backup
• Restart Cassandra
• May take a while reading indexes
Restore from backup: when
• Disk failure
• with no RAID rebuild available
• Operator error
• Corruption
• Hacker
Thanks :)
@acunu / @richardalow