Building a Distributed Key-Value Store with Cassandra

Posted on 13-May-2015

DESCRIPTION

Slides from my talk at Kiwi PyCon in 2010. Covers why we chose Cassandra, an overview of its features and data model, and how we implemented our application.

TRANSCRIPT

Building a Key-Value Store with Cassandra

Kiwi PyCon 2010

Aaron Morton @aaronmorton

Weta Digital

Why Cassandra?

• Part of a larger project started earlier this year to build new systems for code running on the render farm of 35,000 cores

• Larger project goals were Scalability, Reliability, Flexible Schema

How about MySQL?

• It works. But...

• Schema changes

• Write redundancy

• Query language mismatch

• So we went looking for the right tool for the job

Redis?

• Fast, flexible. But...

• Single core limit

• Replication, but no cluster (it's coming)

• Limited support options

CouchDB?

• Schema free, scalable (sort of), redundant (sort of). But...

• Single write thread limit

• Replication, but no cluster (it's coming)

• Low consistency with asynchronous replication

Cassandra?

• Just right, perhaps. Let's see...

• Highly available

• Tuneable synchronous replication

• Scalable writes and reads

• Schema free (sort of)

• Lots of new mistakes to be made

Availability

• Row data is kept together and replicated around the cluster

• Replication Factor is configurable

• Partitioner determines the position of a row key in the distributed hash table

• Replication Strategy determines where in the cluster to place the replicas
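The placement ideas above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: the node names and tokens are made up, MD5 stands in for the partitioner's hash, and replicas are simply placed on the next nodes clockwise around the ring.

```python
import bisect
import hashlib

# Hypothetical 4-node ring; tokens are evenly spaced over a 0..2**127 space.
RING_SIZE = 2 ** 127
NODES = sorted((i * RING_SIZE // 4, "node%d" % i) for i in range(4))
TOKENS = [token for token, _ in NODES]


def token_for(row_key):
    """Partitioner: hash the row key to a position on the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for(row_key, replication_factor=3):
    """Replication strategy: take the first node at or after the token,
    then the next nodes clockwise until replication_factor are chosen."""
    start = bisect.bisect_left(TOKENS, token_for(row_key)) % len(NODES)
    return [NODES[(start + i) % len(NODES)][1]
            for i in range(replication_factor)]


print(replicas_for("fred"))
```

The same row key always maps to the same replicas, which is what lets any node route a request without a central lookup.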

Consistency

• Each read or write request specifies a Consistency Level

• Individual nodes may be inconsistent with respect to others

• Reads may give consistent results while some nodes have inconsistent values

• The entire cluster will eventually move to a state where there is one version of each

Consistency

• R + W > N

• R = Read Consistency

• W = Write Consistency

• N = Replication Factor
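A small worked check of the inequality may help. When R + W > N, the set of replicas a read touches must overlap the set a write touched, so at least one replica in every read has the latest write. The helper names below are illustrative only; the quorum formula (a majority of N) is the usual definition.

```python
def overlaps(read_cl, write_cl, replication_factor):
    """True when every read set must intersect every write set."""
    return read_cl + write_cl > replication_factor


def quorum(replication_factor):
    """A majority of the replicas."""
    return replication_factor // 2 + 1


N = 3
for r, w in [(1, 1), (1, N), (quorum(N), quorum(N)), (N, 1)]:
    print("R=%d W=%d N=%d -> consistent reads: %s" % (r, w, N, overlaps(r, w, N)))
```

With N = 3, reading and writing at quorum (2 + 2 > 3) satisfies the rule, while reading and writing one replica each (1 + 1) does not.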

Scale

• Distributed hash table

• Scale throughput and capacity with more nodes, more disk, more memory

• Adding or removing nodes is an online operation

• Gossip based protocol for discovery

Data Model

• Column orientated

• Denormalise

• Cassandra is an index building machine

• Simple explanation: a row has a key and stores an ordered hash in one or more Column Families

Data Model

• Keyspace

• Row / Key

• Column Family or Super Column Family

• Column
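One way to picture this hierarchy is as nested dictionaries. The sketch below is only a mental model: the sample rows and the email address are made up, and a plain dict does not keep columns sorted the way Cassandra does, so the ordering is approximated with sorted().

```python
# keyspace -> column family -> row key -> column name -> value
# A super column family adds one level:
# row key -> super column name -> column name -> value
keyspace = {
    "User": {                      # standard column family
        "fred": {"email": "fred@example.com", "dob": "04/03"},
    },
    "Posts": {                     # super column family
        "fred": {
            "post_1": {"title": "foo", "body": "bar"},
        },
    },
}


def columns_in_order(row):
    """Return (name, value) pairs sorted by column name."""
    return sorted(row.items())


print(columns_in_order(keyspace["User"]["fred"]))
print(keyspace["Posts"]["fred"]["post_1"]["title"])
```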

Data Model

Row key   User CF                        Posts SCF
Fred      email: fred@..., dob: 04/03    post_1: {title: foo, body: bar}
Bob       email: bob                     post_100: {title: monkeys, body: naughty}

API

• Thrift

• Avro (beta)

• Auto generated bindings for many languages

• Stateful connections

• Python wrappers: pycassa, Telephus (Twisted)

API

• Client supplied timestamp for all mutations

• Client supplied Consistency Level for all mutations and reads
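The timestamp matters because it is what decides between two versions of the same column: the one with the higher timestamp wins. A minimal sketch of that rule follows; the function names and sample values are hypothetical, and the tie handling here is an arbitrary simplification.

```python
import time


def now_micros():
    """Client-side timestamp; microseconds since the epoch is a common choice."""
    return int(time.time() * 1e6)


def reconcile(a, b):
    """Pick the winning (value, timestamp) pair: the higher timestamp wins."""
    return a if a[1] >= b[1] else b


older = ("fred@old.example", 1000)
newer = ("fred@new.example", 2000)
print(reconcile(older, newer))
print(now_micros())
```

Because the client supplies the timestamp, clients with badly skewed clocks can make an older write appear newer, so clock synchronisation across clients is worth attention.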

API

• insert(key, column_family, super_column, column, value)

• get(key, column_family, super_column, column)

• remove(key, column_family, super_column, column)
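To make the call shapes concrete, here is an in-memory stand-in that mirrors the three signatures above. It is a toy for illustration, not the Thrift or pycassa API: the class name is made up, and timestamps and consistency levels are left out.

```python
class ToyStore:
    """In-memory stand-in mirroring the insert/get/remove call shapes."""

    def __init__(self):
        # (column_family, key) -> {super_column: {column: value}}
        self.data = {}

    def insert(self, key, column_family, super_column, column, value):
        row = self.data.setdefault((column_family, key), {})
        row.setdefault(super_column, {})[column] = value

    def get(self, key, column_family, super_column, column):
        return self.data[(column_family, key)][super_column][column]

    def remove(self, key, column_family, super_column, column):
        del self.data[(column_family, key)][super_column][column]


store = ToyStore()
store.insert("fred", "Posts", "post_1", "title", "foo")
print(store.get("fred", "Posts", "post_1", "title"))
store.remove("fred", "Posts", "post_1", "title")
```

For a standard column family the super_column argument would simply be None in this sketch.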

API

• Slicing columns or super columns

• list of names

• start, finish, count, reversed

• get_slice() to slice one row

• multiget_slice() to slice multiple rows

• get_range_slices() to slice rows and columns
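The two ways of selecting columns can be sketched against a single row held as a dict. This imitates the semantics only (an empty start or finish means unbounded, and with reversed the start is the higher name); the function names are hypothetical and this is not the Thrift call itself.

```python
def slice_by_names(row, names):
    """Select columns by an explicit list of names."""
    return [(name, row[name]) for name in names if name in row]


def slice_columns(row, start="", finish="", count=100, reversed_=False):
    """Select a range of columns in sorted (or reverse sorted) name order."""
    selected = []
    for name in sorted(row, reverse=reversed_):
        if start and (name > start if reversed_ else name < start):
            continue  # before the start of the range
        if finish and (name < finish if reversed_ else name > finish):
            continue  # past the end of the range
        selected.append((name, row[name]))
    return selected[:count]


row = {"a": 1, "b": 2, "c": 3, "d": 4}
print(slice_by_names(row, ["a", "d"]))
print(slice_columns(row, start="b", finish="c"))
print(slice_columns(row, count=2, reversed_=True))
```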

API

• Slicing keys

• start key, finish key, count

• Partitioner affects key order

• get_range_slices() to slice rows and columns
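Why the partitioner matters for key slices can be shown with a short sketch. With an order-preserving partitioner, rows come back in key order, so a start and finish key select a meaningful range. With a hashing partitioner, rows are ordered by the hash of the key, so a key range is not a lexical range. The sample keys and function names are made up, and MD5 is used here as the hash.

```python
import hashlib

keys = ["alice", "bob", "carol", "dave"]


def md5_token(key):
    """Position of a key under a hashing partitioner."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def key_range(all_keys, start, finish, count):
    """Key slice as it behaves when keys are stored in key order."""
    return [k for k in sorted(all_keys) if start <= k <= finish][:count]


print("key order:  ", sorted(keys))
print("token order:", sorted(keys, key=md5_token))
print("range b..d: ", key_range(keys, "b", "d", 10))
```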

API

• batch_mutate()

• multiple rows and CFs

• delete or insert / update

• Individual mutations are atomic

• Request is not atomic, no rollback
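The last two points are easy to miss, so here is a toy illustration of what "no rollback" means in practice. The function below is a stand-in written for this example, not the real batch_mutate() call: a row key of None simulates a write that fails, and a value of None means delete.

```python
def batch_mutate(store, mutations):
    """Apply (row_key, column, value) mutations one at a time.

    Each mutation is applied on its own; a failure part-way through
    leaves the earlier mutations in place.
    """
    applied = 0
    for row_key, column, value in mutations:
        if row_key is None:            # stand-in for a failed write
            raise ValueError("mutation %d failed" % applied)
        row = store.setdefault(row_key, {})
        if value is None:
            row.pop(column, None)      # delete
        else:
            row[column] = value        # insert / update
        applied += 1
    return applied


store = {}
try:
    batch_mutate(store, [("a", "x", 1), (None, "y", 2), ("c", "z", 3)])
except ValueError as error:
    print(error)
print(store)  # the first mutation is still applied
```

A client therefore needs to be able to retry the whole request safely, which works when the mutations are idempotent.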

Our Application

Varnish

Nginx

Tornado

Cassandra, RabbitMQ

Our Application

• Similar to Amazon S3.

• REST API.

• Databases, Buckets, Keys+Values.

Our Column Families

• Database (super)

• Bucket (super)

• Bucket Index

• Object

• Object Index (super)

Our API

http://db_name.wetafx.co.nz/bucket/key

PUT Object

• /bucket/object

• batch_mutate()

• one row in Object CF with columns for meta and the body

• one column in ObjectIndex CF row for the bucket
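The shape of that single request might look like the sketch below. Several details are assumptions made for illustration: the "bucket/key" row key format, the column names, and the flattening of the ObjectIndex super column family into a plain row. Only the split into one Object row plus one ObjectIndex column comes from the slide.

```python
def put_object_mutations(bucket, key, body, content_type):
    """Build the mutations for one PUT as {column_family: {row_key: columns}}."""
    object_row = "%s/%s" % (bucket, key)   # assumed row key format
    return {
        "Object": {
            # one row holding the metadata columns and the body
            object_row: {"meta:content_type": content_type, "body": body},
        },
        "ObjectIndex": {
            # one column in the bucket's index row, pointing at the object
            bucket: {key: object_row},
        },
    }


mutations = put_object_mutations("photos", "cat.jpg", b"...", "image/jpeg")
for column_family, rows in sorted(mutations.items()):
    print(column_family, sorted(rows))
```

Sending both writes in one batch_mutate() saves a round trip, but since the request is not atomic the index and the object can briefly disagree.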

List Objects

• /bucket_name?start=foo

• get_slice()

• for the bucket row in ObjectIndex CF

• if needed, multiget_slice() to “join” to the Object CF
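A rough sketch of the listing steps, using plain dicts in place of the two column families. The data layout (index columns mapping object name to an Object row key) and all names are assumptions for illustration; the first step plays the role of get_slice() and the optional second step plays the role of the multiget_slice() "join".

```python
def list_objects(object_index, objects, bucket, start="", count=100,
                 with_meta=False):
    """List object names in a bucket, optionally joined to their rows."""
    index_row = object_index.get(bucket, {})
    # Slice the bucket's index row from `start`, in column name order.
    names = [name for name in sorted(index_row) if name >= start][:count]
    if not with_meta:
        return names
    # "Join": fetch each object row named by the index columns.
    return [(name, objects[index_row[name]]) for name in names]


object_index = {"photos": {"bar": "photos/bar", "foo": "photos/foo",
                           "goo": "photos/goo"}}
objects = {"photos/bar": {"body": "1"}, "photos/foo": {"body": "2"},
           "photos/goo": {"body": "3"}}

print(list_objects(object_index, objects, "photos", start="foo"))
print(list_objects(object_index, objects, "photos", start="foo", count=1,
                   with_meta=True))
```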

Delete Bucket

• /bucket_name

• get_slice() on ObjectIndex CF

• batch_mutate() to delete from Object CF and ObjectIndex CF

• delete Bucket CF row
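The three steps can be sketched against plain dicts standing in for the column families. As before, the layout and names are assumptions for illustration, not the application's actual code.

```python
def delete_bucket(store, bucket):
    """Delete a bucket, its index row, and every object the index lists."""
    # 1. Read the bucket's index row to find its objects.
    index_row = store["ObjectIndex"].get(bucket, {})
    # 2. Delete each object row, then the index row itself.
    for object_row in list(index_row.values()):
        store["Object"].pop(object_row, None)
    store["ObjectIndex"].pop(bucket, None)
    # 3. Delete the bucket's own row.
    store["Bucket"].pop(bucket, None)


store = {
    "Bucket": {"photos": {"owner": "fred"}, "docs": {"owner": "bob"}},
    "ObjectIndex": {"photos": {"foo": "photos/foo"}, "docs": {"a": "docs/a"}},
    "Object": {"photos/foo": {"body": "1"}, "docs/a": {"body": "2"}},
}
delete_bucket(store, "photos")
print(store)
```

Reading the index first is what makes the delete possible at all: without it there is no way to find the object rows that belong to the bucket.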

Thanks

• http://wetafx.co.nz

• http://cassandra.apache.org/
