cassandra 1.1

©2012 DataStax Apache Cassandra 1.1 Jonathan Ellis / @spyced

Upload: jbellis

Post on 15-Jan-2015

Page 1: Cassandra 1.1

Apache Cassandra 1.1

Jonathan Ellis / @spyced

Page 2: Cassandra 1.1

New features in 1.1

• CQL3

• Global row + key caches

• Fine-grained data storage control

• Row-level isolation

• Concurrent schema changes

• Off-heap cache works on Windows

• "Write survey mode"

• Hadoop improvements

• Stress tool

Page 3: Cassandra 1.1

Modern Cassandra, briefly

• 0.7

    • CREATE COLUMN FAMILY

    • TTL

    • Secondary (column) indexes

• 0.8

    • Counters

    • Automatic memtable tuning

• 1.0

    • Compression

    • Leveled compaction

Page 4: Cassandra 1.1

Global row + key caches

• cassandra.yaml

    • key_cache_size_in_mb (default 2)

    • row_cache_size_in_mb (default 0)

    • Also save periods

• Per-CF: caching=ALL|KEYS_ONLY*|ROWS_ONLY|NONE

• Old CF-level options are ignored

    • row_cache_size, key_cache_size

    • save periods
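The per-CF caching option can be read as a selector over the two global caches. The following is a minimal Python sketch of that mapping, not Cassandra's own code. The function name effective_caches is hypothetical, the defaults are the ones quoted on the slide, and the sketch assumes that a size of 0 disables the corresponding global cache.

```python
# Illustrative only: maps the per-CF `caching` value from the slide to the
# global caches a column family would actually end up using.
CACHING_MODES = {
    "ALL": {"key", "row"},
    "KEYS_ONLY": {"key"},
    "ROWS_ONLY": {"row"},
    "NONE": set(),
}


def effective_caches(caching, key_cache_size_in_mb=2, row_cache_size_in_mb=0):
    """Return the set of global caches in effect for one column family.

    Assumption (not stated on the slide): a global cache sized at 0 MB is
    disabled, so a CF cannot use it regardless of its `caching` setting.
    """
    mode = caching.upper()
    if mode not in CACHING_MODES:
        raise ValueError(f"unknown caching mode: {caching!r}")
    sizes = {"key": key_cache_size_in_mb, "row": row_cache_size_in_mb}
    return {name for name in CACHING_MODES[mode] if sizes[name] > 0}
```

Under these assumptions, a CF with caching=ALL only benefits from the row cache once row_cache_size_in_mb is raised above its default of 0 in cassandra.yaml.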

Page 5: Cassandra 1.1

Data storage

• Old:

    • /var/lib/cassandra/data/Keyspace1/Standard1-hc-1-Data.db

• New:

    • /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-1-Data.db

• (Includes KS in filename for easier bulk loading)
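The difference between the two layouts can be sketched as a pair of path builders. This is a minimal Python illustration. The function and parameter names are hypothetical, and reading "hc", "1" and "Data.db" as format version, generation and component is an interpretation of the example paths rather than something the slide states.

```python
import posixpath


def old_sstable_path(data_dir, keyspace, cf, version, generation, component):
    """Pre-1.1 layout: all CFs of a keyspace share one directory."""
    filename = f"{cf}-{version}-{generation}-{component}"
    return posixpath.join(data_dir, keyspace, filename)


def new_sstable_path(data_dir, keyspace, cf, version, generation, component):
    """1.1 layout: one directory per CF, keyspace name repeated in the filename."""
    filename = f"{keyspace}-{cf}-{version}-{generation}-{component}"
    return posixpath.join(data_dir, keyspace, cf, filename)


old_path = old_sstable_path(
    "/var/lib/cassandra/data", "Keyspace1", "Standard1", "hc", 1, "Data.db")
new_path = new_sstable_path(
    "/var/lib/cassandra/data", "Keyspace1", "Standard1", "hc", 1, "Data.db")
```

Because the new filename carries both keyspace and CF, a file copied out of its directory can still be attributed to the right table, which is the bulk-loading convenience the slide mentions.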

Page 6: Cassandra 1.1

Row-level isolation

• Never see partial updates to a row

• We now have AID from ACID

    • C in ACID != C in CAP
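What "never see partial updates" means for a reader can be illustrated with a small copy-on-write sketch in Python. This shows the guarantee only and makes no claim about how Cassandra implements it. The class IsolatedRow and its methods are hypothetical names.

```python
import threading


class IsolatedRow:
    """Toy row whose readers see either all of an update or none of it."""

    def __init__(self):
        self._columns = {}
        self._write_lock = threading.Lock()

    def update(self, changes):
        # Build the new version off to the side, then publish it with a
        # single reference assignment. The old dict is never mutated.
        with self._write_lock:
            new_version = dict(self._columns)
            new_version.update(changes)
            self._columns = new_version

    def read(self):
        # Returns the currently published version; later updates replace
        # the reference instead of modifying this dict in place.
        return self._columns


row = IsolatedRow()
row.update({"author": "jmadison", "body": "All men ..."})
snapshot = row.read()
row.update({"author": "gmason", "body": "Those gentlemen ..."})
```

A reader holding snapshot keeps seeing a consistent pair of author and body, never the author of one update combined with the body of another.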

Page 7: Cassandra 1.1

Concurrent schema changes

• Fixes http://wiki.apache.org/cassandra/FAQ#schema_disagreement

• Can still have temporary disagreements if you use a new CF before all nodes have it

• Also speeds up adding new nodes

Page 8: Cassandra 1.1

Off-heap cache on Windows

• SerializingCacheProvider (SCP) no longer requires JNA

• SCP is the default starting with 1.0, but in versions < 1.1 it falls back to CLHCP (ConcurrentLinkedHashCacheProvider) if JNA is not present

Page 9: Cassandra 1.1

Write survey mode

• bin/cassandra -Dcassandra.write_survey=true

• Allows experimenting w/ compaction, compression, new versions*

    • isolate node to test reads

Page 10: Cassandra 1.1

Abortable compactions

• nodetool stop <type>

Page 11: Cassandra 1.1

CQL3

• (CQL2 is still default)

• Composite PK support

    • .. slice syntax removed

• ORDER BY syntax conforms to SQL

Page 12: Cassandra 1.1

A simple example

CREATE TABLE tweets (
    tweet_id uuid PRIMARY KEY,
    author varchar,
    body varchar
);

Page 13: Cassandra 1.1

Tweets

tweet_id | author | body

1790 | gwashington | To be prepared for war is one of the most effectual means of preserving peace

1787 | jmadison | All men having power ought to be distrusted to a certain degree

1778 | gmason | Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state

Page 14: Cassandra 1.1

With clustering

CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id)
);

• partition key: user_id

• clustered: tweet_id

Page 15: Cassandra 1.1

Timeline

user_id | tweet_id | author | body

jadams | 1787 | jmadison | All men ...

jadams | 1790 | gwashington | To be prepared ...

ahamilton | 1778 | gmason | Those gentlemen ...

ahamilton | 1790 | gwashington | To be prepared ...

Annotations: clustered (within partition key); not clustered

Page 16: Cassandra 1.1

Timeline, physical layout

jadams

    (1787, author): jmadison

    (1787, body): All men ...

    (1790, author): gwashington

    (1790, body): To be prepared ...

ahamilton

    (1778, author): gmason

    (1778, body): Those gentlemen ...

    (1790, author): gwashington

    (1790, body): To be prepared ...

Non-PK columns contain string literal of column name
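The mapping from CQL3 rows to this physical layout can be sketched in a few lines of Python. This is an illustration of the layout shown on the slide, not Cassandra's storage code. The function name to_physical_layout is hypothetical, and integers stand in for the uuid tweet ids, as they do on the slides.

```python
from collections import defaultdict

# The four logical rows from the Timeline table.
TIMELINE = [
    {"user_id": "jadams", "tweet_id": 1787, "author": "jmadison", "body": "All men ..."},
    {"user_id": "jadams", "tweet_id": 1790, "author": "gwashington", "body": "To be prepared ..."},
    {"user_id": "ahamilton", "tweet_id": 1778, "author": "gmason", "body": "Those gentlemen ..."},
    {"user_id": "ahamilton", "tweet_id": 1790, "author": "gwashington", "body": "To be prepared ..."},
]


def to_physical_layout(rows, partition_key, clustering_key):
    """Group rows by partition key; emit one cell per non-PK column.

    Each cell name is the composite (clustering value, column name), so the
    column name appears as a string literal inside every cell name.
    """
    layout = defaultdict(dict)
    for row in rows:
        for column, value in row.items():
            if column in (partition_key, clustering_key):
                continue
            layout[row[partition_key]][(row[clustering_key], column)] = value
    # Cells within a partition are kept sorted by their composite name.
    return {pk: dict(sorted(cells.items())) for pk, cells in layout.items()}


layout = to_physical_layout(TIMELINE, "user_id", "tweet_id")
```

Four logical rows become two physical rows of four cells each, with tweets for one user stored contiguously in tweet_id order.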

Page 17: Cassandra 1.1

WITH COMPACT

CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id, author)
) WITH COMPACT STORAGE;

• For backwards compatibility

• PRIMARY KEY includes all but one column

Page 18: Cassandra 1.1

jadams

    (1787, jmadison): All men ...

    (1790, gwashington): To be prepared ...

ahamilton

    (1778, gmason): Those gentlemen ...

    (1790, gwashington): To be prepared ...

no “body” literal
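The compact layout can be sketched the same way in Python. This is an illustration of the slide's diagram only. The function name to_compact_layout is hypothetical, and integers again stand in for uuid tweet ids.

```python
# The four logical rows from the Timeline table.
TIMELINE = [
    {"user_id": "jadams", "tweet_id": 1787, "author": "jmadison", "body": "All men ..."},
    {"user_id": "jadams", "tweet_id": 1790, "author": "gwashington", "body": "To be prepared ..."},
    {"user_id": "ahamilton", "tweet_id": 1778, "author": "gmason", "body": "Those gentlemen ..."},
    {"user_id": "ahamilton", "tweet_id": 1790, "author": "gwashington", "body": "To be prepared ..."},
]


def to_compact_layout(rows, partition_key, clustering_keys, value_column):
    """One cell per logical row: the cell name is built from the clustering
    values only, and the single remaining column supplies the cell value."""
    layout = {}
    for row in rows:
        cell_name = tuple(row[key] for key in clustering_keys)
        layout.setdefault(row[partition_key], {})[cell_name] = row[value_column]
    # Cells within a partition are kept sorted by their composite name.
    return {pk: dict(sorted(cells.items())) for pk, cells in layout.items()}


compact = to_compact_layout(TIMELINE, "user_id", ["tweet_id", "author"], "body")
```

Since only one column lies outside the primary key, its name never needs to be written into the cell name, which halves the cell count compared with the non-compact layout.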

Page 19: Cassandra 1.1

Earlier changes

• (1.0.6) Allow CF names to be qualified by keyspace for INSERT, ALTER, DELETE, TRUNCATE

    • INSERT INTO ks.cf (...) VALUES (...)

    • (SELECT was done in 1.0.1)

• (1.0.4) ALTER CF attributes

Page 20: Cassandra 1.1

cqlsh

• SOURCE and CAPTURE commands

• (1.0.8) DESCRIBE COLUMNFAMILIES

Page 21: Cassandra 1.1

The future is CQL (based)

• cqlsh

• performance

    • prepared statements

    • netty-based transport (CASSANDRA-2478)

• What does this mean for pycassa, Hector, et al?

Page 22: Cassandra 1.1

Hadoop Integration

• 2I (secondary index) support*

• Wide row support*

• BulkOutputFormat

• (*Covered in updated WordCount)

Page 23: Cassandra 1.1

Secondary Index support

IndexExpression expr = new IndexExpression(
    ByteBufferUtil.bytes("int4"),
    IndexOperator.EQ,
    ByteBufferUtil.bytes(0));

ConfigHelper.setInputRange(
    job.getConfiguration(),

Page 24: Cassandra 1.1

Wide row support

ConfigHelper.setInputColumnFamily(
    job.getConfiguration(),
    KEYSPACE,
    COLUMN_FAMILY,
    true);

Also: PIG_WIDEROW_INPUT

Page 25: Cassandra 1.1

BulkOutputFormat

job.setOutputFormatClass(BulkOutputFormat.class);

• Compatible w/ CFOF + extra options

    • OUTPUT_LOCATION

    • BUFFER_SIZE_IN_MB

    • STREAM_THROTTLE_MBITS

    • (system default, 64, unlimited)

• Limitation: can’t stream to dead nodes (fix in 1.1.1?)

Page 26: Cassandra 1.1

Stress tool

• tools/bin/stress*

• Insert, read, seq scan, indexed scan, multiget, counter add/get

• CQL

Page 27: Cassandra 1.1

Bonus: What’s new in C* 1.1.1

• Incremental repair by token range

• Support for commitlog archiving and PITR

• Identify and blacklist corrupted SSTables from future compactions

• Open 1 sstableScanner per level for leveled compaction

• More CQL3 improvements (e.g. reversed clustering)

• Fix re-creating Keyspaces/ColumnFamilies with the same name as dropped ones

Page 28: Cassandra 1.1

DataStax Community, with OpsCenter

Page 29: Cassandra 1.1