Building a Distributed Key-Value Store with Cassandra
DESCRIPTION
Slides from my talk at Kiwi PyCon in 2010. Covers why we chose Cassandra, an overview of its features and data model, and how we implemented our application.
TRANSCRIPT
Building a Key-Value Store with Cassandra
Kiwi PyCon 2010
Aaron Morton @aaronmorton
Weta Digital
1
Why Cassandra?
• Part of a larger project started earlier this year to build new systems for code running on the render farm of 35,000 cores
• Larger project goals were Scalability, Reliability, Flexible Schema
2
How about MySQL?
• It works. But...
• Schema changes
• Write redundancy
• Query language mismatch
• So we went looking for the right tool for the job
3
Redis?
• Fast, flexible. But...
• Single core limit
• Replication, but no cluster (itʼs coming)
• Limited support options
4
Couch DB?
• Schema free, scalable (sort of), redundant (sort of). But...
• Single write thread limit
• Replication, but no cluster (itʼs coming)
• Low consistency with asynchronous replication
5
Cassandra?
• Just right, perhaps. Letʼs see...
• Highly available
• Tuneable synchronous replication
• Scalable writes and reads
• Schema free (sort of)
• Lots of new mistakes to be made
6
Availability
• Row data is kept together and replicated around the cluster
• Replication Factor is configurable
• Partitioner determines the position of a row key in the distributed hash table
• Replication Strategy determines where in the cluster to place the replicas
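The roles of the partitioner and the replication strategy can be sketched in a few lines of Python. This is only an illustration, not Cassandra's code: the MD5 hash is in the spirit of the random partitioner, the "next nodes clockwise" rule stands in for the simplest replication strategy, and the four-node ring and the names `token`, `replicas` and `ring` are made up for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # MD5 produces 128-bit values


def token(key):
    """Partitioner: hash a row key to a position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas(key, ring, replication_factor):
    """Replication strategy: the first node at or after the key's token,
    then the following nodes clockwise around the ring."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token(key)) % len(tokens)
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


# Hypothetical four-node cluster with evenly spaced tokens.
ring = {i * RING_SIZE // 4: "node%d" % i for i in range(4)}

print(replicas("fred", ring, 3))
```

With a Replication Factor of 3, every row key maps to three distinct nodes, so any single node can be lost without losing the row.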
7
Consistency
• Each read or write request specifies a Consistency Level
• Individual nodes may be inconsistent with respect to others
• Reads may give consistent results while some nodes have inconsistent values
• The entire cluster will eventually move to a state where there is one version of each value
8
Consistency
• R + W > N
• R = Read Consistency
• W = Write Consistency
• N = Replication Factor
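The rule works because any R replicas read must overlap with the W replicas that acknowledged the latest write whenever R + W > N. A small Python check, with function names chosen for this example only:

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_sees_latest_write(r, w, n):
    """True when every read set of size r must overlap every write set of size w."""
    return r + w > n


n = 3
print(read_sees_latest_write(quorum(n), quorum(n), n))  # quorum reads and writes
print(read_sees_latest_write(1, 1, n))                  # single replica reads and writes
print(read_sees_latest_write(1, n, n))                  # write to all, read from one
```

With N = 3, quorum reads and writes (2 + 2 > 3) overlap, while reading and writing a single replica (1 + 1) does not.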
9
Scale
• Distributed hash table
• Scale throughput and capacity with more nodes, more disk, more memory
• Adding or removing nodes is an online operation
• Gossip based protocol for discovery
10
Data Model
• Column orientated
• Denormalise
• Cassandra is an index building machine
• Simple explanation: a row has a key and stores an ordered hash in one or more Column Families
11
Data Model
• Keyspace
• Row / Key
• Column Family or Super Column Family
• Column
12
Data Model
Row key   User CF                       Posts SCF
Fred      email:[email protected]:04/03   post_1:{title: foo, body: bar}
Bob       email:bob                     post_100:{title: monkeys, body: naughty}
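One way to picture the model is as nested dictionaries, using part of the example rows above. This is a mental model only, not how Cassandra stores data; the `keyspace` variable and the `columns` helper are invented for the sketch.

```python
# Keyspace -> Column Family -> row key -> columns
keyspace = {
    # Column Family: row key -> {column name: value}
    "User": {
        "Bob": {"email": "bob"},
    },
    # Super Column Family: row key -> {super column: {column name: value}}
    "Posts": {
        "Fred": {"post_1": {"title": "foo", "body": "bar"}},
        "Bob": {"post_100": {"title": "monkeys", "body": "naughty"}},
    },
}


def columns(cf_name, row_key):
    """Return a row's columns as (name, value) pairs ordered by column name."""
    return sorted(keyspace[cf_name][row_key].items())


print(columns("User", "Bob"))
print(keyspace["Posts"]["Fred"]["post_1"]["title"])
```

The same row key ("Bob") appears in both Column Families, which is what lets a row act as an ordered hash per Column Family.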
13
API
• Thrift
• Avro (beta)
• Auto generated bindings for many languages
• Stateful connections
• Python wrappers: pycassa, Telephus (Twisted)
14
API
• Client supplied time stamp for all mutations
• Client supplied Consistency Level for all mutations and reads
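The client timestamp is what lets replicas agree on a winner when they hold different versions of a column: the version with the highest timestamp is kept. A rough sketch, assuming microseconds since the epoch as the timestamp unit (a common client convention) and ignoring how ties are broken; `now_micros` and `reconcile` are names made up here.

```python
import time


def now_micros():
    """Client-side timestamp: microseconds since the Unix epoch."""
    return int(time.time() * 1000000)


def reconcile(a, b):
    """Pick the winner of two (value, timestamp) versions: latest timestamp wins."""
    return a if a[1] >= b[1] else b


older = ("bar", 1)
newer = ("baz", 2)
print(reconcile(older, newer))
```

Because the timestamp comes from the client, clocks on the application servers need to be reasonably well synchronised.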
15
API
• insert(key, column_family, super_column, column, value)
• get(key, column_family, super_column, column)
• remove(key, column_family, super_column, column)
16
API
• Slicing columns or super columns
• list of names
• start, finish, count, reversed
• get_slice() to slice one row
• multiget_slice() to slice multiple rows
• get_range_slices() to slice rows and columns
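The two ways of slicing one row can be imitated on a plain dictionary. This is a simplified stand-in for get_slice(), not the Thrift call itself: it assumes string column names compared as text, treats an empty start or finish as "open ended", and names the flag `reverse` to avoid shadowing Python's built-in `reversed`.

```python
def get_slice(row, names=None, start="", finish="", count=100, reverse=False):
    """Return (name, value) pairs from one row, by name list or by range."""
    if names is not None:
        return [(name, row[name]) for name in sorted(names) if name in row]
    result = []
    for name in sorted(row, reverse=reverse):
        if len(result) >= count:
            break
        before_start = name > start if reverse else name < start
        past_finish = name < finish if reverse else name > finish
        if start and before_start:
            continue  # not yet inside the requested range
        if finish and past_finish:
            break     # walked past the end of the range
        result.append((name, row[name]))
    return result


row = {"a": 1, "b": 2, "c": 3, "d": 4}
print(get_slice(row, names=["d", "a"]))
print(get_slice(row, start="b", finish="c"))
print(get_slice(row, count=2, reverse=True))
```

Because columns are kept in order, a range slice with a count is a cheap way to page through a wide row.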
17
API
• Slicing keys
• start key, finish key, count
• Partitioner affects key order
• get_range_slices() to slice rows and columns
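Why the partitioner matters for key slices can be seen by sorting the same keys two ways. The sketch below only imitates the idea: a hash-based partitioner orders rows by the hash of the key, while an order-preserving one orders them by the key itself. The function names and sample keys are invented for the example.

```python
import hashlib


def hashed_order(keys):
    """Row order under a hash-based partitioner: sorted by MD5 of the key."""
    return sorted(keys, key=lambda k: hashlib.md5(k.encode("utf-8")).hexdigest())


def preserved_order(keys):
    """Row order under an order-preserving partitioner: sorted by the key."""
    return sorted(keys)


keys = ["alice", "bob", "carol", "dave"]
print(preserved_order(keys))
print(hashed_order(keys))
```

A start key / finish key range is only meaningful as an alphabetical range in the second case; with hashed ordering it is mainly useful for iterating over all rows.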
18
API
• batch_mutate()
• multiple rows and CFs
• delete or insert / update
• Individual mutations are atomic
• Request is not atomic, no rollback
19
Our Application
• Varnish
• Nginx
• Tornado
• Cassandra, Rabbit MQ
20
Our Application
• Similar to Amazon S3.
• REST API.
• Databases, Buckets, Keys+Values.
21
Our Column Families
• Database (super)
• Bucket (super)
• Bucket Index
• Object
• Object Index (super)
22
Our API
http://db_name.wetafx.co.nz/bucket/key
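Taking such a URL apart might look like the following. It assumes the database name is the first label of the host name and that everything after the bucket is the key; `parse_object_url` is a hypothetical helper, not part of the application shown in the talk.

```python
from urllib.parse import urlsplit


def parse_object_url(url):
    """Split a URL into (database, bucket, key)."""
    parts = urlsplit(url)
    database = parts.hostname.split(".")[0]
    bucket, _, key = parts.path.lstrip("/").partition("/")
    return database, bucket, key


print(parse_object_url("http://db_name.wetafx.co.nz/bucket/key"))
```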
23
PUT Object
• /bucket/object
• batch_mutate()
• one row in Object CF with columns for meta and the body
• one column in ObjectIndex CF row for the bucket
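The shape of that write can be sketched against an in-memory stand-in for the two Column Families. Everything here is illustrative: the `store` dictionary, the simplified `batch_mutate` signature, the "bucket/key" row key layout and the "meta:" column prefix are assumptions, and the index is modelled as a plain (not super) Column Family.

```python
store = {"Object": {}, "ObjectIndex": {}}


def batch_mutate(mutations):
    """Apply {row_key: {cf_name: {column: value}}} to the in-memory store."""
    for row_key, by_cf in mutations.items():
        for cf_name, cols in by_cf.items():
            store[cf_name].setdefault(row_key, {}).update(cols)


def put_object(bucket, key, body, meta):
    """Write the object row and its index column in one batch."""
    object_row = bucket + "/" + key  # assumed row key layout
    cols = {"meta:" + name: value for name, value in meta.items()}
    cols["body"] = body
    batch_mutate({
        object_row: {"Object": cols},            # one row in the Object CF
        bucket: {"ObjectIndex": {key: ""}},      # one column in the bucket's index row
    })


put_object("photos", "cat.jpg", b"...", {"content-type": "image/jpeg"})
print(sorted(store["ObjectIndex"]["photos"]))
```

Writing the index column in the same batch as the object is the denormalisation step: the index row is what makes listing a bucket a single slice.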
24
List Objects
• /bucket_name?start=foo
• get_slice()
• for the bucket row in ObjectIndex CF
• if needed, multiget_slice() to “join” to the Object CF
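Listing can be imitated the same way: slice the bucket's index row from the start name, then optionally look up each object. The dictionaries, the "bucket/key" row key layout and the `list_objects` helper are assumptions made for this sketch rather than the application's real code.

```python
# In-memory stand-ins for the ObjectIndex and Object Column Families.
object_index = {"photos": {"ant.jpg": "", "cat.jpg": "", "dog.jpg": ""}}
objects = {
    "photos/ant.jpg": {"body": b"ant"},
    "photos/cat.jpg": {"body": b"cat"},
    "photos/dog.jpg": {"body": b"dog"},
}


def list_objects(bucket, start="", count=100, with_bodies=False):
    """Slice the bucket's index row, then optionally "join" to the objects."""
    index_row = object_index.get(bucket, {})
    names = sorted(name for name in index_row if name >= start)[:count]
    if not with_bodies:
        return names
    # The "join": one lookup per listed name, as multiget_slice() would do in bulk.
    return [(name, objects[bucket + "/" + name]["body"]) for name in names]


print(list_objects("photos", start="b"))
print(list_objects("photos", start="b", count=1, with_bodies=True))
```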
25
Delete Bucket
• /bucket_name
• get_slice() on ObjectIndex CF
• batch_mutate() to delete Object CF and ObjectIndex CF
• delete Bucket CF row
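The order of those steps matters, because the index row is the only record of which object rows belong to the bucket. A sketch over in-memory dictionaries, where the `store` contents, the "owner" column, the "bucket/key" row key layout and `delete_bucket` itself are all hypothetical:

```python
store = {
    "Bucket": {"photos": {"owner": "fred"}, "docs": {"owner": "bob"}},
    "ObjectIndex": {"photos": {"cat.jpg": "", "dog.jpg": ""}, "docs": {"a.txt": ""}},
    "Object": {
        "photos/cat.jpg": {"body": b"cat"},
        "photos/dog.jpg": {"body": b"dog"},
        "docs/a.txt": {"body": b"a"},
    },
}


def delete_bucket(bucket):
    """Delete a bucket's objects, its index row, then the bucket row itself."""
    # 1. Read the index row to find the object names (get_slice in the talk).
    names = list(store["ObjectIndex"].get(bucket, {}))
    # 2. Delete the object rows and the index row (batch_mutate in the talk).
    for name in names:
        store["Object"].pop(bucket + "/" + name, None)
    store["ObjectIndex"].pop(bucket, None)
    # 3. Finally delete the bucket row.
    store["Bucket"].pop(bucket, None)
    return len(names)


print(delete_bucket("photos"))
print(sorted(store["Object"]))
```

Since a batch is not atomic, a real implementation would need to cope with a failure part way through, for example by making the delete safe to retry.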
26
Thanks
• http://wetafx.co.nz
• http://cassandra.apache.org/
27