building a distributed key-value store with cassandra
Post on 13-May-2015
Building a Key-Value Store with Cassandra
Kiwi PyCon 2010
Aaron Morton @aaronmorton
Weta Digital
1
Why Cassandra?
• Part of a larger project started earlier this year to build new systems for code running on the render farm of 35,000 cores
• Larger project goals were Scalability, Reliability, Flexible Schema
2
How about MySQL ?
• It works. But...
• Schema changes
• Write redundancy
• Query language mismatch
• So we went looking for the right tool for the job
3
Redis ?
• Fast, flexible. But...
• Single core limit
• Replication, but no cluster (it's coming)
• Limited support options
4
Couch DB ?
• Schema free, scalable (sort of), redundant (sort of). But...
• Single write thread limit
• Replication, but no cluster (it's coming)
• Low consistency with asynchronous replication
5
Cassandra ?
• Just right, perhaps. Let's see...
• Highly available
• Tuneable synchronous replication
• Scalable writes and reads
• Schema free (sort of)
• Lots of new mistakes to be made
6
Availability
• Row data is kept together and replicated around the cluster
• Replication Factor is configurable
• Partitioner determines the position of a row key in the distributed hash table
• Replication Strategy determines where in the cluster to place the replicas
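A minimal sketch of how the partitioner and replication strategy fit together, written as a toy hash ring in plain Python. It is not Cassandra's implementation: the `Ring` class, the node names, the evenly spaced tokens, and the "next nodes clockwise" placement rule are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def token(row_key):
    """Map a row key to a position on the ring using MD5 (illustrative choice)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: each node owns the key range ending at its token."""

    def __init__(self, node_tokens, replication_factor):
        # Sort (token, node) pairs so the ring can be walked clockwise.
        self.ring = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.ring]
        self.replication_factor = replication_factor

    def replicas(self, row_key):
        # Partitioner step: find the first node token >= the key token,
        # wrapping to the start of the ring if the key hashes past the last node.
        start = bisect.bisect_left(self.tokens, token(row_key)) % len(self.ring)
        # Replication strategy step: take the next nodes clockwise.
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


# Four hypothetical nodes with evenly spaced tokens over the 128-bit MD5 range.
nodes = {"node-%d" % i: i * (2 ** 128 // 4) for i in range(4)}
ring = Ring(nodes, replication_factor=3)
placement = ring.replicas("Fred")
```

The same key always maps to the same replicas, and changing the Replication Factor only changes how far the walk continues around the ring.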
7
Consistency
• Each read or write request specifies a Consistency Level
• Individual nodes may be inconsistent with respect to others
• Reads may give consistent results while some nodes have inconsistent values
• The entire cluster will eventually move to a state where there is one version of each
8
Consistency
• R + W > N
• R = Read Consistency
• W = Write Consistency
• N = Replication Factor
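The rule above can be checked by brute force. The sketch below (helper names are hypothetical) enumerates every possible set of W replicas that acknowledged a write and every set of R replicas consulted by a read, and confirms that they always share at least one replica exactly when R + W > N.

```python
from itertools import combinations


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def always_overlap(r, w, n):
    """True if every R-replica read set intersects every W-replica write set."""
    replicas = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


# Compare the brute-force answer with the R + W > N rule for small clusters.
rule_holds = all(
    always_overlap(r, w, n) == (r + w > n)
    for n in range(1, 6)
    for r in range(1, n + 1)
    for w in range(1, n + 1)
)
```

With N = 3, quorum reads and quorum writes (R = W = 2) overlap, while R = W = 1 does not, so a read at that level may miss the latest write.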
9
Scale
• Distributed hash table
• Scale throughput and capacity with more nodes, more disk, more memory
• Adding or removing nodes is an online operation
• Gossip based protocol for discovery
10
Data Model
• Column orientated
• Denormalise
• Cassandra is an index-building machine
• Simple explanation: a row has a key and stores an ordered hash in one or more Column Families
11
Data Model
• Keyspace
• Row / Key
• Column Family or Super Column Family
• Column
12
Data Model
Row key   User CF                        Posts SCF
Fred      email: fred@..., dob: 04/03    post_1: {title: foo, body: bar}
Bob       email: bob                     post_100: {title: monkeys, body: naughty}
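One way to picture the example rows is as nested dictionaries. This is only a mental model, assuming plain Python dicts stand in for the keyspace; real column families keep their columns sorted by name, which a dict does not show.

```python
# Keyspace -> column family -> row key -> columns.
keyspace = {
    # Column family: each row maps column names to values.
    "User": {
        "Fred": {"email": "fred@...", "dob": "04/03"},
        "Bob": {"email": "bob"},
    },
    # Super column family: each row maps super column names to
    # a further set of columns.
    "Posts": {
        "Fred": {"post_1": {"title": "foo", "body": "bar"}},
        "Bob": {"post_100": {"title": "monkeys", "body": "naughty"}},
    },
}

# Rows in different column families can share a row key.
fred_email = keyspace["User"]["Fred"]["email"]
bob_post_title = keyspace["Posts"]["Bob"]["post_100"]["title"]
```

Note that rows need not have the same columns: Bob has no `dob`, which is the "schema free (sort of)" property from earlier.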
13
API
• Thrift
• Avro (beta)
• Auto generated bindings for many languages
• Stateful connections
• Python wrappers: pycassa, Telephus (Twisted)
14
API
• Client supplied time stamp for all mutations
• Client supplied Consistency Level for all mutations and reads
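Because the client supplies the timestamp, conflicting versions of a column can be reconciled by keeping the one with the newest timestamp. The sketch below is a simplification under stated assumptions: `client_timestamp` and `reconcile` are hypothetical helpers, microseconds since the epoch is a common convention rather than a requirement, and tie-breaking between equal timestamps is not modelled.

```python
import time


def client_timestamp():
    """Microseconds since the epoch, generated on the client."""
    return int(time.time() * 1000000)


def reconcile(version_a, version_b):
    """Each version is a (value, timestamp) pair; the newest timestamp wins."""
    return version_a if version_a[1] >= version_b[1] else version_b


# Two replicas hold different versions of the same column.
stale = ("fred@old.example", 1000)
fresh = ("fred@new.example", 2000)
winner = reconcile(stale, fresh)
```

One consequence is that client clocks matter: a client with a slow clock can have its writes silently lose to older data.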
15
API
• insert(key, column_family, super_column, column, value)
• get(key, column_family, super_column, column)
• remove(key, column_family, super_column, column)
16
API
• Slicing columns or super columns
• list of names
• start, finish, count, reversed
• get_slice() to slice one row
• multiget_slice() to slice multiple rows
• get_range_slices() to slice rows and columns
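The range form of a column slice can be sketched against a single in-memory row. `slice_columns` is a hypothetical helper, not the Thrift call; it assumes string column names compared lexically, an inclusive finish, and an empty string meaning "unbounded". The `reverse` parameter corresponds to "reversed" above.

```python
def slice_columns(row, start="", finish="", count=100, reverse=False):
    """Return up to `count` (name, value) pairs from `start` to `finish`."""
    result = []
    for name in sorted(row, reverse=reverse):
        if len(result) >= count:
            break
        if not reverse:
            if start and name < start:
                continue  # not yet at the start of the range
            if finish and name > finish:
                break     # walked past the end of the range
        else:
            # In reverse order the range runs downwards from start to finish.
            if start and name > start:
                continue
            if finish and name < finish:
                break
        result.append((name, row[name]))
    return result


row = {"a": 1, "b": 2, "c": 3, "d": 4}
middle = slice_columns(row, start="b", finish="c")
last_two = slice_columns(row, count=2, reverse=True)
```

Slicing the last few columns in reverse is a common way to read "the most recent N" when column names sort by time.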
17
API
• Slicing keys
• start key, finish key, count
• Partitioner affects key order
• get_range_slices() to slice rows and columns
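A short sketch of why the partitioner affects key order. It assumes two idealised cases: an order-preserving partitioner that walks rows in key order, and a hashing partitioner that walks them in order of an MD5 token. The keys and the `md5_token` helper are made up for illustration.

```python
import hashlib


def md5_token(row_key):
    """Token used to order rows under a hashing partitioner (illustrative)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


keys = ["alpha", "bravo", "charlie", "delta"]

# Order-preserving: a key slice returns rows in key order, so a
# start key / finish key range is meaningful to the application.
order_preserving = sorted(keys)

# Hashing: rows come back in token order, which is generally unrelated
# to the order of the keys themselves.
hashed = sorted(keys, key=md5_token)
```

Under a hashing partitioner a key slice is still useful for paging through every row, but not for reading a lexical range of keys.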
18
API
• batch_mutate()
• multiple rows and CFs
• delete or insert / update
• Individual mutations are atomic
• Request is not atomic, no rollback
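To make the last two points concrete, here is a toy model of a batch that fails part way through. Everything in it is an assumption for illustration: the store is a nested dict, the mutation map is simplified to row key -> column family -> columns, `None` stands in for a deletion, and the failure is simulated.

```python
def apply_row(store, row_key, by_cf):
    """Apply all column changes for one row key (treated as the atomic unit)."""
    for cf_name, columns in by_cf.items():
        row = store.setdefault(cf_name, {}).setdefault(row_key, {})
        for name, value in columns.items():
            if value is None:
                row.pop(name, None)  # None stands in for a deletion
            else:
                row[name] = value    # insert or update


def batch_mutate(store, mutation_map, fail_on=None):
    """Apply rows one at a time; a failure leaves earlier rows applied."""
    for row_key, by_cf in mutation_map.items():
        if row_key == fail_on:
            raise RuntimeError("simulated failure at row %r" % row_key)
        apply_row(store, row_key, by_cf)


store = {}
batch = {
    "Fred": {"User": {"email": "fred@example.com"}},
    "Bob": {"User": {"email": "bob@example.com"}},
}
try:
    batch_mutate(store, batch, fail_on="Bob")
except RuntimeError:
    pass  # Fred's row stays written; there is no rollback
```

The practical upshot is that the client should be able to retry the whole batch safely, which is easier when mutations are idempotent.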
19
Our Application
Varnish
Nginx
Tornado
Cassandra Rabbit MQ
20
Our Application
• Similar to Amazon S3.
• REST API.
• Databases, Buckets, Keys+Values.
21
Our Column Families
• Database (super)
• Bucket (super)
• Bucket Index
• Object
• Object Index (super)
22
Our API
http://db_name.wetafx.co.nz/bucket/key
23
PUT Object
• /bucket/object
• batch_mutate()
• one row in Object CF with columns for meta and the body
• one column in ObjectIndex CF row for the bucket
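A rough sketch of the two writes behind a PUT, using a nested dict in place of the cluster. The row-key scheme (`bucket/key`), the `meta:` column prefix, the empty index value, and the `put_object` helper are all hypothetical, and ObjectIndex is flattened to a plain column family for brevity.

```python
def put_object(store, bucket, key, body, meta):
    """Build one mutation map covering both column families, then apply it."""
    object_row_key = "%s/%s" % (bucket, key)  # hypothetical row-key scheme
    columns = dict(("meta:" + name, value) for name, value in meta.items())
    columns["body"] = body
    mutation_map = {
        # One row in the Object CF holding the meta columns and the body.
        object_row_key: {"Object": columns},
        # One column in the bucket's ObjectIndex row, named after the object.
        bucket: {"ObjectIndex": {key: ""}},
    }
    for row_key, by_cf in mutation_map.items():
        for cf_name, cols in by_cf.items():
            store.setdefault(cf_name, {}).setdefault(row_key, {}).update(cols)
    return mutation_map


store = {}
put_object(store, "photos", "cat.jpg", b"...", {"content-type": "image/jpeg"})
```

Sending both writes in one batch keeps the object and its index entry close together in time, though as noted earlier the batch as a whole is not atomic.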
24
List Objects
• /bucket_name?start=foo
• get_slice()
• for the bucket row in ObjectIndex CF
• if needed, multiget_slice() to “join” to the Object CF
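The listing steps might look like this against an in-memory stand-in. The `list_objects` helper, the `bucket/key` row keys, and the sample data are assumptions; the first lookup plays the role of the slice on the bucket's index row and the optional second lookup plays the role of the "join".

```python
def list_objects(store, bucket, start="", count=100, include_objects=False):
    """List object names in a bucket from `start`, optionally with their rows."""
    # Slice the bucket's index row: column names are the object names.
    index_row = store.get("ObjectIndex", {}).get(bucket, {})
    names = [name for name in sorted(index_row) if name >= start][:count]
    if not include_objects:
        return names
    # "Join": fetch the matching rows from the Object CF by key.
    objects = store.get("Object", {})
    return [(name, objects.get("%s/%s" % (bucket, name))) for name in names]


store = {
    "ObjectIndex": {"photos": {"bar": "", "foo": "", "zed": ""}},
    "Object": {
        "photos/bar": {"body": "b"},
        "photos/foo": {"body": "f"},
        "photos/zed": {"body": "z"},
    },
}
names_from_foo = list_objects(store, "photos", start="foo")
```

Because the index row's columns are kept sorted, paging through a large bucket is a matter of passing the last name seen as the next `start`.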
25
Delete Bucket
• /bucket_name
• get_slice() on ObjectIndex CF
• batch_mutate() to delete Object CF and ObjectIndex CF
• delete Bucket CF row
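The delete sequence, sketched against the same kind of in-memory stand-in. The `delete_bucket` helper, the `bucket/key` row keys, and the sample data are hypothetical; a real implementation would page through a large index row rather than read it in one go.

```python
def delete_bucket(store, bucket):
    """Remove a bucket's objects, its index row, and the bucket row itself."""
    # Read the index row to find every object in the bucket.
    index_row = store.get("ObjectIndex", {}).get(bucket, {})
    object_keys = ["%s/%s" % (bucket, name) for name in index_row]
    # Delete the object rows and the index row (one batch in the slides).
    for row_key in object_keys:
        store.get("Object", {}).pop(row_key, None)
    store.get("ObjectIndex", {}).pop(bucket, None)
    # Finally delete the bucket's own row.
    store.get("Bucket", {}).pop(bucket, None)
    return len(object_keys)


store = {
    "Bucket": {"photos": {"owner": "fred"}, "docs": {"owner": "bob"}},
    "ObjectIndex": {"photos": {"cat.jpg": ""}, "docs": {"a.txt": ""}},
    "Object": {"photos/cat.jpg": {"body": "..."}, "docs/a.txt": {"body": "hi"}},
}
deleted = delete_bucket(store, "photos")
```

Deleting the bucket row last means a failure part way through leaves the bucket visible, so the delete can simply be retried.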
26
Thanks
• http://wetafx.co.nz
• http://cassandra.apache.org/
27