©2012 DataStax
Apache Cassandra 1.1
Jonathan Ellis / @spyced
New features in 1.1
• CQL3
• Global row + key caches
• Fine-grained data storage control
• Row-level isolation
• Concurrent schema changes
• Off-heap cache works on Windows
• "Write survey mode"
• Hadoop improvements
• Stress tool
Modern Cassandra, briefly
• 0.7
  • CREATE COLUMN FAMILY
  • TTL
  • Secondary (column) indexes
• 0.8
  • Counters
  • Automatic memtable tuning
• 1.0
  • Compression
  • Leveled compaction
Global row + key caches
• cassandra.yaml
  • key_cache_size_in_mb (default 2)
  • row_cache_size_in_mb (default 0)
  • Also the save periods (key_cache_save_period, row_cache_save_period)
• Per-CF: caching=ALL|KEYS_ONLY*|ROWS_ONLY|NONE (*default)
• Old CF-level options are ignored
  • row_cache_size, key_cache_size
  • save periods
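To make the four per-CF caching values concrete, here is a minimal Java sketch. The Caching enum and its fromOption method are hypothetical names for illustration, not Cassandra's API; the sketch only encodes which of the two global caches each value lets a column family use, with KEYS_ONLY as the default.

```java
import java.util.Locale;

// Hypothetical model of the per-CF "caching" option (not Cassandra's own class).
// Each value says which of the two global caches a column family may use.
enum Caching {
    ALL(true, true),
    KEYS_ONLY(true, false), // the default
    ROWS_ONLY(false, true),
    NONE(false, false);

    final boolean usesKeyCache;
    final boolean usesRowCache;

    Caching(boolean usesKeyCache, boolean usesRowCache) {
        this.usesKeyCache = usesKeyCache;
        this.usesRowCache = usesRowCache;
    }

    // Parse an option value case-insensitively; an absent value means KEYS_ONLY.
    // Unknown values throw IllegalArgumentException (from Enum.valueOf).
    static Caching fromOption(String value) {
        if (value == null || value.isEmpty()) {
            return KEYS_ONLY;
        }
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

Because the cache sizes are now global (set in cassandra.yaml), the per-CF option only selects which caches a CF participates in; with the default row_cache_size_in_mb of 0 there is no row cache capacity for ALL or ROWS_ONLY to use.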
Data storage
• Old:
  • /var/lib/cassandra/data/Keyspace1/Standard1-hc-1-Data.db
• New:
  • /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-1-Data.db
• (One directory per column family; the keyspace name is now part of the filename for easier bulk loading)
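The naming change can be sketched as a small path builder. SSTablePaths, oldLayout and newLayout are hypothetical helpers written for this illustration (Cassandra has its own descriptor classes); they only assemble the parts visible in the two example paths: data directory, keyspace, column family, version ("hc"), generation, and component.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Cassandra) that builds SSTable paths in the
// pre-1.1 and 1.1 layouts from the parts shown on the slide.
final class SSTablePaths {
    private SSTablePaths() {}

    // Old: <dataDir>/<ks>/<cf>-<version>-<generation>-<component>.db
    static Path oldLayout(String dataDir, String ks, String cf,
                          String version, int generation, String component) {
        String file = String.join("-", cf, version, Integer.toString(generation), component) + ".db";
        return Paths.get(dataDir, ks, file);
    }

    // New: <dataDir>/<ks>/<cf>/<ks>-<cf>-<version>-<generation>-<component>.db
    static Path newLayout(String dataDir, String ks, String cf,
                          String version, int generation, String component) {
        String file = String.join("-", ks, cf, version, Integer.toString(generation), component) + ".db";
        return Paths.get(dataDir, ks, cf, file);
    }
}
```

On a Unix-like system, newLayout("/var/lib/cassandra/data", "Keyspace1", "Standard1", "hc", 1, "Data") yields the "New" path above. Since the filename alone now names both keyspace and column family, an SSTable copied out of its directory (for example, to feed a bulk load) still identifies where it belongs.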
Row-level isolation
• Readers never see partial updates to a row
• We now have the A, I, and D from ACID
  • The C in ACID is not the C in CAP
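The slide states the guarantee but not the mechanism. One generic way to get "never see partial updates" is copy-on-write: a writer builds a complete new version of the row and publishes it with a single atomic swap. The Java sketch below shows that technique; IsolatedRow is a hypothetical class, not Cassandra's source.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Generic copy-on-write sketch of row-level isolation (not Cassandra's code):
// writers publish a complete new snapshot with one compare-and-set, so readers
// observe either none or all of a mutation's columns, never a mix.
final class IsolatedRow {
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    // Apply every column of the mutation as one atomic step.
    void apply(Map<String, String> mutation) {
        while (true) {
            Map<String, String> current = snapshot.get();
            Map<String, String> next = new TreeMap<>(current);
            next.putAll(mutation);
            if (snapshot.compareAndSet(current, Collections.unmodifiableMap(next))) {
                return;
            }
            // A concurrent writer published first; retry on its newer snapshot.
        }
    }

    // Return an immutable, internally consistent view of the row.
    Map<String, String> read() {
        return snapshot.get();
    }
}
```

A reader holding an older snapshot keeps seeing that version unchanged, while later reads see both "author" and "body" from a two-column mutation together.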
Concurrent schema changes
• Fixes http://wiki.apache.org/cassandra/FAQ#schema_disagreement
• Can still have temporary disagreements if you use a new CF before all nodes have it
• Also speeds up adding new nodes
Off-heap cache on Windows
• SerializingCacheProvider (SCP) no longer requires JNA
• SCP has been the default since 1.0, but before 1.1 it falls back to ConcurrentLinkedHashCacheProvider (CLHCP) if JNA is not present
Write survey mode
• bin/cassandra -Dcassandra.write_survey=true
• Allows experimenting with compaction, compression, and new versions*
  • *Isolate the node to test reads
Abortable compactions
• nodetool stop <type>
CQL3
• (CQL2 is still the default)
• Composite PK support
  • The .. slice syntax was removed
• ORDER BY syntax conforms to SQL
A simple example
CREATE TABLE tweets (
    tweet_id uuid PRIMARY KEY,
    author varchar,
    body varchar);
Tweets
tweet_id | author      | body
1790     | gwashington | To be prepared for war is one of the most effectual means of preserving peace
1787     | jmadison    | All men having power ought to be distrusted to a certain degree
1778     | gmason      | Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state
With clustering
CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id));
(user_id: partition key; tweet_id: clustered)
Timeline
user_id   | tweet_id | author      | body
jadams    | 1787     | jmadison    | All men ...
jadams    | 1790     | gwashington | To be prepared ...
ahamilton | 1778     | gmason      | Those gentlemen ...
ahamilton | 1790     | gwashington | To be prepared ...
(tweet_id: clustered within the partition key; user_id: not clustered)
Timeline, physical layout
jadams
  (1787, author): jmadison
  (1787, body): All men ...
  (1790, author): gwashington
  (1790, body): To be prepared ...
ahamilton
  (1778, author): gmason
  (1778, body): Those gentlemen ...
  (1790, author): gwashington
  (1790, body): To be prepared ...
Each non-PK column becomes a cell whose name contains the column name as a string literal
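The mapping from CQL3 rows to the physical layout above can be modeled in a few lines of Java. This is an illustrative sketch, not Cassandra's storage engine: TimelineLayout and CellName are hypothetical names, tweet_id is simplified to an int instead of a uuid, and partitions are kept in insertion order because real partitions are ordered by token rather than by key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of the storage layout of
//   CREATE TABLE timeline (..., PRIMARY KEY (user_id, tweet_id))
// Simplification: tweet_id is an int here instead of a uuid.
final class TimelineLayout {

    // Composite cell name: (clustering value, CQL column name).
    static final class CellName implements Comparable<CellName> {
        final int tweetId;
        final String column;

        CellName(int tweetId, String column) {
            this.tweetId = tweetId;
            this.column = column;
        }

        // Cells sort by the clustering column first, then by column name.
        @Override
        public int compareTo(CellName other) {
            int byTweet = Integer.compare(tweetId, other.tweetId);
            return byTweet != 0 ? byTweet : column.compareTo(other.column);
        }

        @Override
        public String toString() {
            return "(" + tweetId + ", " + column + ")";
        }
    }

    // Partition key -> sorted cells. Insertion order stands in for token order.
    final Map<String, TreeMap<CellName, String>> partitions = new LinkedHashMap<>();

    // One CQL3 row becomes one cell per non-PK column, and the column name
    // is stored as a string literal inside every cell name.
    void insert(String userId, int tweetId, String author, String body) {
        TreeMap<CellName, String> cells =
                partitions.computeIfAbsent(userId, k -> new TreeMap<>());
        cells.put(new CellName(tweetId, "author"), author);
        cells.put(new CellName(tweetId, "body"), body);
    }
}
```

Inserting the four timeline rows in any order leaves the "jadams" partition's cells sorted as (1787, author), (1787, body), (1790, author), (1790, body), matching the slide.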
WITH COMPACT STORAGE
CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id, author)
) WITH COMPACT STORAGE;
• For backwards compatibility
• All but one column must be part of the PK
jadams
  (1787, jmadison): All men ...
  (1790, gwashington): To be prepared ...
ahamilton
  (1778, gmason): Those gentlemen ...
  (1790, gwashington): To be prepared ...
No "body" literal in the cell names
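For contrast with the non-compact layout, here is the same kind of hypothetical Java model for the COMPACT STORAGE table: every PK column after the partition key becomes part of the cell name, and the single remaining column (body) is the cell value. CompactTimelineLayout and CellName are invented names, and tweet_id is again simplified to an int.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of
//   PRIMARY KEY (user_id, tweet_id, author) ... WITH COMPACT STORAGE
// Simplification: tweet_id is an int here instead of a uuid.
final class CompactTimelineLayout {

    // Cell name made of the clustering columns only: (tweet_id, author).
    static final class CellName implements Comparable<CellName> {
        final int tweetId;
        final String author;

        CellName(int tweetId, String author) {
            this.tweetId = tweetId;
            this.author = author;
        }

        // Sort by tweet_id, then by author.
        @Override
        public int compareTo(CellName other) {
            int byTweet = Integer.compare(tweetId, other.tweetId);
            return byTweet != 0 ? byTweet : author.compareTo(other.author);
        }

        @Override
        public String toString() {
            return "(" + tweetId + ", " + author + ")";
        }
    }

    // Partition key -> sorted cells. Insertion order stands in for token order.
    final Map<String, TreeMap<CellName, String>> partitions = new LinkedHashMap<>();

    // One CQL3 row becomes exactly one cell; no column-name literal is stored.
    void insert(String userId, int tweetId, String author, String body) {
        partitions.computeIfAbsent(userId, k -> new TreeMap<>())
                  .put(new CellName(tweetId, author), body);
    }
}
```

Each CQL3 row now costs one cell instead of two, which is why only one non-PK column can remain: there is exactly one value slot per cell.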
Earlier changes
• (1.0.6) Allow CF names to be qualified by keyspace for INSERT, ALTER, DELETE, TRUNCATE
  • INSERT INTO ks.cf (...) VALUES (...)
  • (SELECT was done in 1.0.1)
• (1.0.4) ALTER CF attributes
cqlsh
• SOURCE and CAPTURE commands
• (1.0.8) DESCRIBE COLUMNFAMILIES
The future is CQL-based
• cqlsh
• Performance
  • Prepared statements
  • Netty-based transport (CASSANDRA-2478)
• What does this mean for pycassa, Hector, et al.?
Hadoop Integration
• Secondary index (2I) support*
• Wide row support*
• BulkOutputFormat
• (*Covered in updated WordCount)
Secondary index support
IndexExpression expr = new IndexExpression(
    ByteBufferUtil.bytes("int4"), IndexOperator.EQ, ByteBufferUtil.bytes(0));
ConfigHelper.setInputRange(
    job.getConfiguration(),
Wide row support
ConfigHelper.setInputColumnFamily(
    job.getConfiguration(), KEYSPACE, COLUMN_FAMILY, true); // true = wide rows
Also (for Pig): PIG_WIDEROW_INPUT
BulkOutputFormat
job.setOutputFormatClass(BulkOutputFormat.class);
• Compatible with ColumnFamilyOutputFormat (CFOF), plus extra options:
  • OUTPUT_LOCATION
  • BUFFER_SIZE_IN_MB
  • STREAM_THROTTLE_MBITS
  • (defaults: system default, 64, unlimited)
• Limitation: can't stream to dead nodes (fix in 1.1.1?)
Stress tool
• tools/bin/stress*
• Insert, read, sequential scan, indexed scan, multiget, counter add/get
• CQL
Bonus: What's new in C* 1.1.1
• Incremental repair by token range
• Support for commitlog archiving and point-in-time recovery (PITR)
• Identify and blacklist corrupted SSTables from future compactions
• Open one SSTableScanner per level for leveled compaction
• More CQL3 improvements (e.g. reversed clustering)
• Fix re-creating keyspaces/column families with the same name as dropped ones
DataStax Community, with OpsCenter