Outside The Box With Apache Cassandra

Outside The Box With Apache Cassandra Eric Evans [email protected] @jericevans Palmetto Open Source Software Conference April 16, 2010

Upload: eric-evans

Post on 17-May-2015

DESCRIPTION

Cassandra presentation given at the 3rd annual Palmetto Open Source Software Conference (POSSCON 2010).

TRANSCRIPT

Page 1: Outside The Box With Apache Cassandra

Outside The Box With Apache Cassandra

Eric Evans [email protected]

@jericevans

Palmetto Open Source Software Conference, April 16, 2010

Cassandra is...

A massively scalable, decentralized, structured data store (aka database).

Outline

1 Background

2 Project History

3 Description

4 Case Studies

5 Roadmap

The Digital Universe

Consolidation

Old Guard

Vertical Scaling Sucks

CAP Theorem (aka Brewer’s Theorem)

Distributed systems cannot provide all three of:

• Consistency

• Availability

• Partition Tolerance

Influential Papers

Dynamo: Amazon’s Highly Available Key-value Store 1

• Voldemort

• Riak

Bigtable: A Distributed Storage System for Structured Data 2

• Hypertable

• HBase

1 http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

2 http://labs.google.com/papers/bigtable-osdi06.pdf

Outline

1 Background

2 Project History

3 Description

4 Case Studies

5 Roadmap

• 7 new committers added

• Dozens of contributors

• 200+ (!) people on IRC

• Hundreds of closed issues (bugs, features, etc)

• 4 major releases; a number of stable point releases

• Graduation to TLP

Outline

1 Background

2 Project History

3 Description

4 Case Studies

5 Roadmap

Cassandra is...

• O(1) DHT

• Eventual consistency

• Tunable trade-offs, consistency vs. availability
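The "O(1) DHT" bullet can be illustrated with a toy token ring. This is only a sketch under stated assumptions: MD5 is used as the hash purely for illustration, and the node names, `token_for`, and `Ring.replicas_for` are hypothetical names, not Cassandra's code. The point is that every node holds the full token list, so finding the owners of a key takes no multi-hop routing (the local lookup is just a binary search).

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key onto the ring (MD5 is an assumption made for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Each node is placed on the ring at the token derived from its name.
        # The whole sorted list is known locally, so no routing hops are needed.
        self.tokens = sorted((token_for(n), n) for n in nodes)

    def replicas_for(self, key, n=3):
        """Return up to n distinct nodes found walking clockwise from the key's token."""
        start = bisect.bisect_left(self.tokens, (token_for(key), ""))
        count = min(n, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas_for("user:42", n=3)
```

Any node can run the same lookup and get the same answer, which is what lets a client talk to whichever node it likes.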

But...

• Values are structured, indexed

• Columns / column families

• Slicing w/ predicates (queries)

Column families

Supercolumn families
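Since the diagrams for these two slides did not survive, here is a rough sketch of the two shapes as nested Python dictionaries. The row keys, column names, and values are made up for illustration; the only structural claim is that a column carries a name, a value, and a timestamp, and that a super column adds one more level of nesting.

```python
# Column family: row key -> {column name -> (value, timestamp)}
users = {
    "user1": {
        "name": ("Example User", 1),
        "email": ("user1@example.com", 1),
    },
}

# Super column family:
# row key -> {super column name -> {column name -> (value, timestamp)}}
addresses = {
    "user1": {
        "home": {"city": ("Example City", 1), "zip": ("00000", 1)},
        "work": {"city": ("Another City", 1)},
    },
}

# Reading one column means walking down the nesting.
name_value, name_timestamp = users["user1"]["name"]
work_city = addresses["user1"]["work"]["city"][0]
```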

Client API

• Thrift (12 different languages!) 3

• High-level client libraries
  • Ruby
  • Perl
  • Python (Twisted too)
  • Scala
  • Java
  • PHP
  • Grails
  • C++

3 http://incubator.apache.org/thrift

Querying

• get(): retrieve by column name

• multiget(): by column name for a set of keys

• get_slice(): by column name, or a range of names
  • returning columns
  • returning super columns

• multiget_slice(): a subset of columns for a set of keys

• get_count(): number of columns or sub-columns

• get_range_slice(): subset of columns for a range of keys
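To make the semantics of the calls listed above concrete, here is a toy in-memory model. It is not the Thrift API: the simplified signatures, the `ROWS` data, and the `names`/`start`/`finish`/`limit` parameters are assumptions for illustration, and the real calls take additional arguments (a consistency level, for example) that are left out.

```python
ROWS = {
    "row1": {"a": 1, "b": 2, "c": 3, "d": 4},
    "row2": {"b": 20, "e": 50},
}


def get(key, column):
    """Retrieve a single column of a single row by name."""
    return ROWS[key][column]


def get_slice(key, names=None, start="", finish="", limit=100):
    """Columns of one row, chosen by explicit names or by an inclusive name range."""
    row = ROWS.get(key, {})
    if names is not None:
        return [(n, row[n]) for n in sorted(names) if n in row]
    selected = [
        (n, v)
        for n, v in sorted(row.items())
        if n >= start and (finish == "" or n <= finish)
    ]
    return selected[:limit]


def multiget_slice(keys, **predicate):
    """Apply the same slice to several rows at once."""
    return {k: get_slice(k, **predicate) for k in keys}


def get_count(key):
    """Number of columns in a row."""
    return len(ROWS.get(key, {}))


middle = get_slice("row1", start="b", finish="c")
shared = multiget_slice(["row1", "row2"], names=["b"])
```

Results come back in column-name order, which is why the choice of column comparator matters.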

Updating

• insert(): add/update column (by key)

• batch_insert(): add/update multiple columns (by key)

• remove(): remove a column

• batch_mutate(): like batch_insert() but can also delete (new for 0.6, deprecates batch_insert())
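The difference between inserting and mutating in batches can be sketched with a toy store. Everything here is a simplification made for illustration: the tuple-based mutation format and the function signatures are hypothetical, deletes simply drop the column rather than leaving a marker behind, and "newest timestamp wins" is assumed as the conflict rule.

```python
store = {}  # row key -> {column name -> (value, timestamp)}


def insert(key, column, value, timestamp):
    """Add or update one column, keeping whichever write has the newest timestamp."""
    row = store.setdefault(key, {})
    current = row.get(column)
    if current is None or timestamp >= current[1]:
        row[column] = (value, timestamp)


def remove(key, column, timestamp):
    """Remove one column unless a newer write already exists."""
    row = store.get(key, {})
    current = row.get(column)
    if current is not None and timestamp >= current[1]:
        del row[column]


def batch_mutate(mutations):
    """Apply inserts and deletes together.

    mutations: {row key: [("insert", column, value, ts) or ("delete", column, ts)]}
    """
    for key, ops in mutations.items():
        for op in ops:
            if op[0] == "insert":
                insert(key, op[1], op[2], op[3])
            elif op[0] == "delete":
                remove(key, op[1], op[2])
            else:
                raise ValueError("unknown mutation: %r" % (op[0],))


batch_mutate({"u1": [("insert", "name", "a", 1), ("insert", "email", "e", 1)]})
batch_mutate({"u1": [("insert", "name", "b", 2), ("delete", "email", 2)]})
insert("u1", "name", "stale", 1)  # older timestamp, so it is ignored
```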

Column comparators

• TimeUUID

• LexicalUUID

• UTF8

• Long

• Bytes

• ...
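Why the comparator matters: the same stored names sort differently depending on how they are interpreted. The sketch below is an illustration in plain Python, not Cassandra's comparator code; the 8-byte big-endian encoding for longs and the hand-built version 1 UUIDs are assumptions chosen to make the difference visible.

```python
import struct
import uuid


def pack_long(n):
    """Encode an integer as 8 big-endian bytes."""
    return struct.pack(">q", n)


def unpack_long(b):
    return struct.unpack(">q", b)[0]


# The same names ordered as raw bytes versus as signed longs.
long_names = [pack_long(n) for n in (2, -1, 10)]
bytes_order = [unpack_long(b) for b in sorted(long_names)]
long_order = [unpack_long(b) for b in sorted(long_names, key=unpack_long)]

# Two version 1 (time-based) UUIDs: `early` has the smaller timestamp,
# but its leading bytes are larger because the low time bits come first.
early = uuid.UUID(fields=(0xFFFFFFFF, 0x0000, 0x1000, 0x80, 0x00, 1))
later = uuid.UUID(fields=(0x00000000, 0x0001, 0x1000, 0x80, 0x00, 1))
raw_order = sorted([early, later], key=lambda u: u.bytes)
time_order = sorted([early, later], key=lambda u: u.time)
```

Ordering by raw bytes puts `later` first, while ordering by the embedded timestamp puts `early` first, which is the kind of ordering a time-based comparator is for.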

Consistency

CAP Theorem: choose any two of Consistency, Availability, or Partition tolerance.

• Zero

• One

• Quorum ((N / 2) + 1)

• All
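The arithmetic behind these levels is small enough to write out. This is a sketch, with N taken to be the replication factor; the `overlaps` check is the usual Dynamo-style R + W > N reasoning rather than something stated on the slide, and the function names are hypothetical.

```python
def quorum(n):
    """Smallest majority of n replicas: (N / 2) + 1 with integer division."""
    return n // 2 + 1


def replicas_required(level, n):
    """How many replicas must respond for each level named on the slide."""
    required = {"ZERO": 0, "ONE": 1, "QUORUM": quorum(n), "ALL": n}
    return required[level]


def overlaps(read_level, write_level, n):
    """True when every read set must intersect every write set (R + W > N)."""
    return replicas_required(read_level, n) + replicas_required(write_level, n) > n


n = 3
quorum_both = overlaps("QUORUM", "QUORUM", n)  # 2 + 2 > 3
one_both = overlaps("ONE", "ONE", n)           # 1 + 1 > 3 is false
```

Lower levels trade that overlap for availability and latency, which is the tunable part.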

About writes...

• Atomic within a column family

• Any node

• Always writeable (hinted hand-off)

• Fast
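"Always writeable" rests on hinted hand-off: a write aimed at a replica that is down is remembered and delivered later. The toy below shows only that idea. The `Node` class, `write`, and `replay_hints` are hypothetical, and keeping the hints on the coordinating node is a simplification made for the sketch.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target node, key, value) writes waiting for delivery


def write(coordinator, replicas, key, value):
    """Write to every live replica; leave a hint for each replica that is down."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
        else:
            coordinator.hints.append((replica, key, value))
    return acks


def replay_hints(coordinator):
    """Deliver stored hints to targets that have come back up."""
    remaining = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


a, b, c = Node("a"), Node("b"), Node("c")
c.up = False
acks = write(a, [a, b, c], "k", "v")  # succeeds even though c is down
missed = "k" not in c.data
c.up = True
replay_hints(a)
```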

Writes

About reads...

• Any node

• Read repair

• Key cache

• Record cache
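Read repair can be sketched in a few lines: compare what the replicas return, answer with the newest version, and push that version to any replica that is behind. This is a toy under stated assumptions: replicas are plain dictionaries, versions are reconciled by timestamp, and `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and bring stale replicas up to date.

    replicas: list of dicts mapping key -> (value, timestamp)
    """
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v[1])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair a stale or missing copy
    return newest[0]


r1 = {"k": ("old", 1)}
r2 = {"k": ("new", 2)}
r3 = {}
value = read_with_repair([r1, r2, r3], "k")
```

After the read, all three replicas hold the same version, so later reads at a low consistency level are less likely to see stale data.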

Reads

Outline

1 Background

2 Project History

3 Description

4 Case Studies

5 Roadmap

Case 1: Digg

Digg is a social news site that allows people to discover and share content from anywhere on the Internet by submitting stories and links, and voting and commenting on submitted stories and links.

Ranked 98th by Alexa.com.

Digg

Problem

• Terabytes of data; high transaction rate (read-dominated)

• Multiple clusters; heavily sharded

• Management nightmare (high effort, error prone)

• Unsatisfied availability requirements (geographic isolation)

Solution

• Currently in production on "Green Badges"

• Cassandra as primary data store RSN

• Datacenter and rack-aware replication

Case 2: Twitter

Twitter is a social networking and microblogging service that enables its users to send and read tweets, text-based posts of up to 140 characters.

Ranked 12th by Alexa.com.

Twitter

MySQL

• Terabytes of data, ~1,000,000 ops/s

• Calls for heavy sharding, light replication

• Schema changes are very difficult (if possible at all)

• Manual sharding is very high effort

• Automated sharding and replication is Hard

Case 3: Facebook

Facebook is a social networking site where users can create a profile, add friends, and send them messages. Users can also join groups organized by location or other points of common interest.

Ranked #2 by Alexa.com.

Inbox Search

• 100 TB

• 160 nodes

• 1/2 billion writes per day (2-year-old number?)

Outline

1 Background

2 Project History

3 Description

4 Case Studies

5 Roadmap

0.6

• batch_mutate command

• authentication (basic)

• new consistency level, ANY

• fat client

• mmapped I/O reads (default on 64-bit JVM)

• improved write concurrency (HH)

• networking optimizations

• row caching

• improved management tools

• per-keyspace replication factor

0.7

• more efficient compactions (row sizes bigger than memory)

• easier (dynamic?) column family changes

• SSTable versioning

• SSTable compression

• support for column family truncation

• improved configuration handling

• remove key range command

• even more improved management tools

• vector clocks w/ server-side conflict resolution

Questions?