![Page 1: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/1.jpg)
OVERVIEW AND REAL WORLD APPLICATIONS
Cassandra
Jersey Shore Tech Meetup
Nov 13, 2014
![Page 2: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/2.jpg)
You Are Not Here…
*** http://njhalloffame.org/
![Page 3: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/3.jpg)
Agenda
Some Basic Concepts/Overview
New Developments In Cassandra
Basic Data Modeling Concepts
Materialized Views
Secondary Indexes
Counters
Time Series Data
Expiring Data
![Page 4: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/4.jpg)
Cassandra High Level
Cassandra's architecture is based on the combination of two technologies:
Google BigTable – Data Model
Amazon Dynamo – Distributed Architecture
BTW – these mean the same thing ->
Cassandra = C*
![Page 5: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/5.jpg)
Architecture Basics & Terminology
Nodes are single instances of C*
Cluster is a group of nodes
Data is organized by keys (tokens) which are distributed across the cluster
Replication Factor (RF) determines how many copies of each key are kept
Data Center Aware – works well in multi-DC/EC2 etc.
Consistency Level – powerful feature to tune consistency vs. speed vs. availability.
![Page 6: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/6.jpg)
C* Ring
![Page 7: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/7.jpg)
More Architecture
Information on who has what data and who is available is transferred using gossip.
No single point of failure (SPOF); every node can service requests.
Handles Replication and Downed Nodes (within reason)
![Page 8: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/8.jpg)
CAP Theorem
Distributed Systems Law:
Consistency
Availability
Partition Tolerance
(you can only really have two in a distributed system)
Cassandra is AP with Eventual Consistency
![Page 9: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/9.jpg)
Consistency
Cassandra uses the concept of Tunable Consistency, which makes it very powerful and flexible for system needs.
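As a rough illustration of tunable consistency, the sketch below sets the consistency level for a cqlsh session before reading. It assumes a keyspace with a replication factor of 3 is already in use; the table name `consistency_demo` and the sample address are hypothetical. With RF = 3, QUORUM means 2 replicas must respond, so QUORUM writes followed by QUORUM reads overlap on at least one replica (2 + 2 > 3).

```cql
-- Hypothetical table for illustration only.
CREATE TABLE IF NOT EXISTS consistency_demo (
  emailaddress text PRIMARY KEY,
  firstname text
);

-- CONSISTENCY is a cqlsh shell command, not a CQL statement.
-- It applies to the statements that follow in this session.
CONSISTENCY QUORUM

INSERT INTO consistency_demo (emailaddress, firstname)
  VALUES ('alice@example.com', 'alice');

-- Read at QUORUM: a majority of replicas must answer.
SELECT firstname FROM consistency_demo
  WHERE emailaddress = 'alice@example.com';

-- Trade consistency for speed and availability: only one replica must answer.
CONSISTENCY ONE
```

Application drivers usually expose the same setting per statement or per session rather than through a shell command.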
![Page 10: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/10.jpg)
C* Persistence Model
![Page 11: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/11.jpg)
Read Path
![Page 12: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/12.jpg)
Write Path
![Page 13: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/13.jpg)
Data Model Architecture
Keyspace – container of column families (tables). Defines the replication factor (RF), among other settings.
Table – column family. Contains the schema definition.
Row – a “record” identified by a key
Column – a key (the column name) and a value
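A minimal sketch of how the replication factor is attached to a keyspace. The keyspace names and the data center names `dc1` and `dc2` are hypothetical; data center names must match what the cluster's snitch reports.

```cql
-- Single data center: every key is stored on 3 nodes.
CREATE KEYSPACE IF NOT EXISTS meetup_demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Multiple data centers: replica count is set per data center.
CREATE KEYSPACE IF NOT EXISTS meetup_demo_multi_dc
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};

-- Tables created after this statement live in meetup_demo.
USE meetup_demo;
```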
![Page 14: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/14.jpg)
![Page 15: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/15.jpg)
Deletions
Distributed systems present a unique problem for deletes. If Cassandra actually deleted the data while a node was down, that node would never receive the delete notice and would try to re-create the record when it came back online. So…
Tombstone – the data is replaced with a special marker called a tombstone, which works within the distributed architecture.
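A small sketch of a delete that produces a tombstone. The table `tombstone_demo` is hypothetical and a keyspace is assumed to be in use. The `gc_grace_seconds` table option controls how long tombstones are kept before compaction may purge them; 864000 seconds (10 days) is the default, shown here only to make the setting visible.

```cql
CREATE TABLE IF NOT EXISTS tombstone_demo (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text
);

INSERT INTO tombstone_demo (emailaddress, firstname, lastname)
  VALUES ('alice@example.com', 'alice', 'smith');

-- Writes a tombstone for the row instead of removing the data in place.
DELETE FROM tombstone_demo WHERE emailaddress = 'alice@example.com';

-- Tombstones can be purged at compaction only after this many seconds,
-- which gives downed nodes time to come back and learn about the delete.
ALTER TABLE tombstone_demo WITH gc_grace_seconds = 864000;
```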
![Page 16: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/16.jpg)
Keys
Primary Key
Partition Key – identifies a row
Cluster Key – sorting within a row
Using CQL these are defined together as a compound (composite) key
Compound keys are how you implement “wide rows”, the COOL FEATURE!
![Page 17: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/17.jpg)
Single Primary Key
create table users (
user_id UUID PRIMARY KEY,
firstname text,
lastname text,
emailaddress text
);
** Cassandra Data Types
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html
![Page 18: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/18.jpg)
Compound Key
create table users (
emailaddress text,
department text,
firstname text,
lastname text,
PRIMARY KEY (emailaddress, department)
);
Partition Key plus Cluster Key
emailaddress is partition key
department is cluster key
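To make the roles of the two key parts concrete, here is a sketch that uses the same key layout under a hypothetical table name, `users_by_dept`, with made-up sample rows. It assumes a keyspace is already in use.

```cql
CREATE TABLE IF NOT EXISTS users_by_dept (
  emailaddress text,
  department text,
  firstname text,
  lastname text,
  PRIMARY KEY (emailaddress, department)
);

-- Two rows that share a partition key land in the same partition (a wide row).
INSERT INTO users_by_dept (emailaddress, department, firstname, lastname)
  VALUES ('alice@example.com', 'engineering', 'alice', 'smith');
INSERT INTO users_by_dept (emailaddress, department, firstname, lastname)
  VALUES ('alice@example.com', 'sales', 'alice', 'smith');

-- Whole partition, returned sorted by the cluster key (department).
SELECT * FROM users_by_dept WHERE emailaddress = 'alice@example.com';

-- One entry within the partition: partition key plus cluster key.
SELECT * FROM users_by_dept
  WHERE emailaddress = 'alice@example.com' AND department = 'sales';

-- Rejected by default, because the partition key is missing:
-- SELECT * FROM users_by_dept WHERE department = 'sales';
```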
![Page 19: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/19.jpg)
Compound Key
create table users (
emailaddress text,
department text,
country text,
firstname text,
lastname text,
PRIMARY KEY ((emailaddress, department), country)
);
Partition Key plus Cluster Key
emailaddress and department together are the partition key
country is cluster key
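With a composite partition key, both parts are needed to locate the partition. The sketch below mirrors the key layout under a hypothetical table name, `users_by_email_dept`, and assumes a keyspace is in use.

```cql
CREATE TABLE IF NOT EXISTS users_by_email_dept (
  emailaddress text,
  department text,
  country text,
  firstname text,
  lastname text,
  PRIMARY KEY ((emailaddress, department), country)
);

INSERT INTO users_by_email_dept (emailaddress, department, country, firstname, lastname)
  VALUES ('alice@example.com', 'engineering', 'USA', 'alice', 'smith');

-- Both partition key columns are supplied, so the partition can be located.
SELECT * FROM users_by_email_dept
  WHERE emailaddress = 'alice@example.com' AND department = 'engineering';

-- Rejected by default: only half of the partition key is given.
-- SELECT * FROM users_by_email_dept WHERE emailaddress = 'alice@example.com';
```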
![Page 20: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/20.jpg)
New Rules
Writes Are Cheap
Denormalize All You Need
Model Your Queries, Not Data (understand access patterns)
Application Worries About Joins
![Page 21: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/21.jpg)
What’s New In 2.0
Conditional DDL
IF EXISTS or IF NOT EXISTS
Drop Column Support
ALTER TABLE users DROP lastname;
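A short sketch of the conditional DDL forms alongside the column drop. The table name `ddl_demo` is hypothetical and a keyspace is assumed to be in use. The conditional forms make a script safe to rerun, because they do nothing rather than fail when the object already exists or is already gone.

```cql
-- Does nothing if the table already exists.
CREATE TABLE IF NOT EXISTS ddl_demo (
  user_id uuid PRIMARY KEY,
  firstname text,
  lastname text
);

-- Drop a column that is no longer needed.
ALTER TABLE ddl_demo DROP lastname;

-- Does nothing if the table is already gone.
DROP TABLE IF EXISTS ddl_demo;
```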
![Page 22: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/22.jpg)
More New Stuff
Triggers
CREATE TRIGGER myTrigger
ON myTable
USING 'com.thejavaexperts.cassandra.updateevt';
Lightweight Transactions (CAS)
UPDATE users
SET firstname = 'tim'
WHERE emailaddress = '[email protected]'
IF firstname = 'tom';
** Not like an ACID Transaction!!
![Page 23: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/23.jpg)
CAS & Transactions
CAS – compare-and-set operations. In a single, atomic operation, CAS compares the value of a column in the database and applies a modification depending on the result of the comparison.
Consider the performance hit. CAS is (was) considered an anti-pattern.
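The two common shapes of a lightweight transaction can be sketched as follows. The table `cas_demo` and its values are hypothetical, and a keyspace is assumed to be in use. Each conditional statement returns a result row with an `[applied]` column that reports whether the condition held.

```cql
CREATE TABLE IF NOT EXISTS cas_demo (
  emailaddress text PRIMARY KEY,
  firstname text
);

-- Insert only if no row with this key exists yet.
INSERT INTO cas_demo (emailaddress, firstname)
  VALUES ('alice@example.com', 'alice')
  IF NOT EXISTS;

-- Running the same insert again is not applied; [applied] comes back false.
INSERT INTO cas_demo (emailaddress, firstname)
  VALUES ('alice@example.com', 'alice')
  IF NOT EXISTS;

-- Update only if the current value matches the expected one.
UPDATE cas_demo
  SET firstname = 'alicia'
  WHERE emailaddress = 'alice@example.com'
  IF firstname = 'alice';
```

Because every conditional write needs extra coordination between replicas, it is worth reserving these statements for cases that truly need them, such as claiming a unique username.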
![Page 24: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/24.jpg)
Data Modeling… The Basics
Cassandra now feels very familiar to RDBMS/SQL users.
It very nicely hides the underlying data storage model.
You still have all the power of Cassandra; it is all in the key definition.
RDBMS = model data
Cassandra = model access (queries)
![Page 25: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/25.jpg)
Side-Note On Querying
Create table with compound key
Select using ALLOW FILTERING
Counts
Select using IN or =
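The query forms listed on this slide might look like the sketch below. The table `query_demo` is hypothetical and a keyspace is assumed to be in use. Which filters ALLOW FILTERING accepts has changed between Cassandra versions, so treat the last statement as an assumption to check against the version you run.

```cql
CREATE TABLE IF NOT EXISTS query_demo (
  emailaddress text,
  department text,
  firstname text,
  lastname text,
  PRIMARY KEY (emailaddress, department)
);

-- Equality on the partition key.
SELECT * FROM query_demo WHERE emailaddress = 'alice@example.com';

-- IN on the partition key: several partitions in one request.
SELECT * FROM query_demo
  WHERE emailaddress IN ('alice@example.com', 'bob@example.com');

-- Counting rows; this scans the table, so use it sparingly on large data.
SELECT COUNT(*) FROM query_demo;

-- No partition key given, so every partition may be scanned.
-- ALLOW FILTERING states that this cost is accepted.
SELECT * FROM query_demo WHERE department = 'sales' ALLOW FILTERING;
```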
![Page 26: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/26.jpg)
Batch Operations
Saves Network Roundtrips
Can contain INSERT, UPDATE, DELETE
Atomic by default (all or nothing)
Can use timestamp for specific ordering
![Page 27: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/27.jpg)
Batch Operation Example
BEGIN BATCH
INSERT INTO users (emailaddress, firstname, lastname, country) values
('[email protected]', 'brian', 'enochson', 'USA');
INSERT INTO users (emailaddress, firstname, lastname, country) values
('[email protected]', 'tom', 'peters', 'DE');
INSERT INTO users (emailaddress, firstname, lastname, country) values
('[email protected]', 'jim', 'smith', 'USA');
INSERT INTO users (emailaddress, firstname, lastname, country) values
('[email protected]', 'alan', 'rogers', 'USA');
DELETE FROM users WHERE emailaddress = '[email protected]';
APPLY BATCH;
select in cqlsh
List in cassandra-cli with timestamp
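The write timestamps that cassandra-cli lists can also be read from cqlsh with the `WRITETIME` function, and a batch can carry an explicit timestamp. The sketch below assumes a keyspace is in use; the table `batch_demo_users`, the sample rows, and the timestamp value are all hypothetical.

```cql
CREATE TABLE IF NOT EXISTS batch_demo_users (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text,
  country text
);

-- USING TIMESTAMP applies one client-supplied timestamp (microseconds since
-- the epoch; the value here is arbitrary) to every statement in the batch.
BEGIN BATCH USING TIMESTAMP 1415836800000000
  INSERT INTO batch_demo_users (emailaddress, firstname, lastname, country)
    VALUES ('alice@example.com', 'alice', 'smith', 'USA');
  INSERT INTO batch_demo_users (emailaddress, firstname, lastname, country)
    VALUES ('bob@example.com', 'bob', 'jones', 'DE');
APPLY BATCH;

-- WRITETIME works on regular columns, not on primary key columns.
SELECT emailaddress, firstname, WRITETIME(firstname) FROM batch_demo_users;
```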
![Page 28: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/28.jpg)
More Data Modeling…
No Joins
No Foreign Keys
No Third (or any other) Normal Form Concerns
Redundant Data Encouraged. Apps maintain consistency.
![Page 29: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/29.jpg)
Secondary Indexes
Let you define indexes to allow access by columns other than the partition key.
Each node has a local index for its data.
They have uses, but shouldn’t be used all the time without consideration.
We will look at alternatives.
![Page 30: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/30.jpg)
Secondary Index Example
Create a table
Try to select with a column not in the PK
Add Secondary Index
Try the select again (may need to reinsert the data)
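The demo steps above could look roughly like this. The table `index_demo`, the index name, and the sample rows are hypothetical, and a keyspace is assumed to be in use.

```cql
CREATE TABLE IF NOT EXISTS index_demo (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text,
  country text
);

INSERT INTO index_demo (emailaddress, firstname, lastname, country)
  VALUES ('alice@example.com', 'alice', 'smith', 'USA');
INSERT INTO index_demo (emailaddress, firstname, lastname, country)
  VALUES ('bob@example.com', 'bob', 'jones', 'DE');

-- Rejected by default: country is neither in the primary key nor indexed.
-- SELECT * FROM index_demo WHERE country = 'USA';

-- Add a secondary index on the low-cardinality column.
CREATE INDEX index_demo_country_idx ON index_demo (country);

-- The same select is now accepted.
SELECT * FROM index_demo WHERE country = 'USA';
```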
![Page 31: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/31.jpg)
When to use?
Low Cardinality – small number of unique values
High Cardinality – high number of distinct values
Secondary indexes are good for low-cardinality columns such as country codes or department codes, not for high-cardinality columns such as email addresses.
![Page 32: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/32.jpg)
Materialized View
If you want full distribution, you can use what is called the Materialized View pattern.
Remember redundant data is fine.
Model the queries
![Page 33: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/33.jpg)
Materialized View Example
Show a normal table with a compound key and its querying limitations
Create a Materialized View table with a different compound key to support alternate access.
Selects use the partition key.
Secondary indexes are local, not distributed
ALLOW FILTERING can cause performance issues
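A sketch of the pattern as a second, application-maintained table keyed for the alternate query. Both table names and the sample row are hypothetical, and a keyspace is assumed to be in use. The application writes to both tables, here inside one batch so the two writes are applied together.

```cql
-- Base table: lookup by email address.
CREATE TABLE IF NOT EXISTS mv_users_by_email (
  emailaddress text PRIMARY KEY,
  country text,
  firstname text,
  lastname text
);

-- "Materialized view" table: same data, keyed for lookup by country.
CREATE TABLE IF NOT EXISTS mv_users_by_country (
  country text,
  emailaddress text,
  firstname text,
  lastname text,
  PRIMARY KEY (country, emailaddress)
);

-- The application keeps both copies in step.
BEGIN BATCH
  INSERT INTO mv_users_by_email (emailaddress, country, firstname, lastname)
    VALUES ('alice@example.com', 'USA', 'alice', 'smith');
  INSERT INTO mv_users_by_country (country, emailaddress, firstname, lastname)
    VALUES ('USA', 'alice@example.com', 'alice', 'smith');
APPLY BATCH;

-- The alternate access path now uses a partition key, with no index or filtering.
SELECT * FROM mv_users_by_country WHERE country = 'USA';
```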
![Page 34: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/34.jpg)
Counters
Updated in 2.1 and now work in a more distributed and accurate manner.
Table organization, example
How to update, view etc.
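A minimal counter sketch. The table `page_view_counts` and its key values are hypothetical, and a keyspace is assumed to be in use. Counter columns are changed only through UPDATE with an increment or decrement; they cannot be set with INSERT.

```cql
-- Apart from the primary key, every column in a counter table is a counter.
CREATE TABLE IF NOT EXISTS page_view_counts (
  page text PRIMARY KEY,
  views counter
);

-- Increment; the row is created on first use.
UPDATE page_view_counts SET views = views + 1 WHERE page = '/home';
UPDATE page_view_counts SET views = views + 5 WHERE page = '/home';

-- Decrement works the same way.
UPDATE page_view_counts SET views = views - 1 WHERE page = '/home';

SELECT page, views FROM page_view_counts WHERE page = '/home';
```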
![Page 35: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/35.jpg)
Time Series Example…
Time series table model.
Need to consider the interval for event frequency and the wide row size.
Make the thing being tracked plus the unit of time interval the partition key.
![Page 36: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/36.jpg)
Time Series Data
Because of its fast write path, Cassandra is well suited for storing time series data.
The Cassandra wide row is a perfect fit for modeling time series / time based events.
Let’s look at an example….
![Page 37: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/37.jpg)
Event Data
Notice primary key and cluster key.
Insert some data
View in CQL, then in CLI as wide row
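One way to model the event data described in the time series slides is sketched below. The table `sensor_events`, its columns, and the readings are hypothetical, and a keyspace is assumed to be in use. The partition key combines what is tracked (`sensor_id`) with the interval (`day`), which caps how wide any single row can grow; the event time is the cluster key, so events are stored in time order within the partition.

```cql
CREATE TABLE IF NOT EXISTS sensor_events (
  sensor_id text,
  day text,
  event_time timestamp,
  reading double,
  PRIMARY KEY ((sensor_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO sensor_events (sensor_id, day, event_time, reading)
  VALUES ('sensor-1', '2014-11-13', '2014-11-13 09:00:00+0000', 71.5);
INSERT INTO sensor_events (sensor_id, day, event_time, reading)
  VALUES ('sensor-1', '2014-11-13', '2014-11-13 09:05:00+0000', 72.1);
INSERT INTO sensor_events (sensor_id, day, event_time, reading)
  VALUES ('sensor-1', '2014-11-13', '2014-11-13 10:15:00+0000', 70.8);

-- Range slice inside one partition: events between 09:00 and 10:00.
SELECT event_time, reading FROM sensor_events
  WHERE sensor_id = 'sensor-1' AND day = '2014-11-13'
    AND event_time >= '2014-11-13 09:00:00+0000'
    AND event_time <  '2014-11-13 10:00:00+0000';
```

A busier source might call for an hourly bucket instead of a daily one; the right interval depends on event frequency.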
![Page 38: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/38.jpg)
TTL – Self Expiring Data
Another technique is storing data that has a defined lifespan.
For instance session identifiers, temporary passwords etc.
For this Cassandra provides a Time To Live (TTL) mechanism.
![Page 39: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/39.jpg)
TTL Example…
Create table
Insert data using TTL
Can update a specific column with a TTL
Show using selects.
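The steps above might be sketched as follows. The table `ttl_sessions`, its columns, and the values are hypothetical, and a keyspace is assumed to be in use. TTL values are in seconds.

```cql
CREATE TABLE IF NOT EXISTS ttl_sessions (
  session_id text PRIMARY KEY,
  username text,
  temp_password text
);

-- Every column written by this insert expires after one hour.
INSERT INTO ttl_sessions (session_id, username, temp_password)
  VALUES ('abc123', 'alice', 'changeme')
  USING TTL 3600;

-- Give a single column its own, shorter lifespan.
UPDATE ttl_sessions USING TTL 60
  SET temp_password = 'changeme-again'
  WHERE session_id = 'abc123';

-- TTL() returns the seconds remaining for a column, or null if it has no TTL.
SELECT username, TTL(username), TTL(temp_password)
  FROM ttl_sessions
  WHERE session_id = 'abc123';
```

Repeating the last select shows the remaining seconds counting down until the values disappear.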
![Page 40: Cassandra20141113](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559b850b1a28ab29458b470c/html5/thumbnails/40.jpg)
Questions
http://www.thejavaexperts.net/
Email: [email protected]
Twitter: @benochso
G+: https://plus.google.com/+BrianEnochson