A Comprehensive Introduction to Apache Cassandra

Saeid [email protected] 2015

https://twitter.com/saeidzeb


http://zebardast.com


Agenda

● What is NoSQL?
● What is Cassandra?
● Architecture
● Data Model
● Key Features and Benefits
● Hardware
● Directories and Files
● Cassandra Tools
  ○ CQL
  ○ Nodetool
  ○ DataStax Opscenter
● Backup and Restore
● Who’s using Cassandra?

What is NoSQL?

● NoSQL (Not Only SQL)
● Simplicity of Design
● Horizontal Scaling (Scale Out)
  ○ Add as many nodes to the cluster as you wish
  ○ Not supported by all NoSQL databases
● Finer Control over availability
● Data Structure
  ○ Key-Value
  ○ Column-Oriented
  ○ Graph
  ○ Document-Oriented
  ○ etc.

https://en.wikipedia.org/wiki/NoSQL#Types_of_NoSQL_databases

What is Cassandra?

● Since 2008 - Current stable version 2.1.2 (Nov 2014)
● NoSQL
● Distributed
● Open source
● Written in Java
● High performance
● Extremely scalable
● Fault tolerant (i.e., no single point of failure, SPOF)

Architecture Highlights

● Scale out, not up
● Peer-to-Peer, distributed system
  ○ All nodes are the same: masterless, with no SPOF
● Online load balancing and cluster growth
● Understands system/hardware failures
● Custom data replication to ensure fault tolerance
● CAP theorem (Consistency, Availability, Partition tolerance)
  ○ You cannot have all three at the same time
  ○ The tradeoff between consistency and latency is tunable
  ○ Strong Consistency = Increased Latency
● Nodes communicate with each other through the Gossip protocol

Architecture Layers

Core Layer
● Messaging service
● Gossip Failure detection
● Cluster state
● Partitioner
● Replication

Middle Layer
● Commit log
● Memtable
● SSTable
● Indexes
● Compaction

Top Layer
● Tombstones
● Hinted handoff
● Read repair
● Bootstrap
● Monitoring
● Admin tools

Architecture of a write

1. The write is first appended to an on-disk commit log (sequential).
2. After it is written to the commit log, it is sent to the appropriate nodes.
3. Each node that receives the write first records it in a local log, then updates the appropriate Memtables (one for each column family).
  ○ A Memtable is the in-memory representation of data (before the data gets flushed to disk as an SSTable).
  ○ Memtables are flushed to disk when:
    ■ They run out of space
    ■ They hold too many keys (128 is the default)
    ■ A time duration elapses (client provided, no cluster clock)
4. When a Memtable is written out, two files are produced:
  ○ Data File (SSTable)
  ○ Index File (SSTable Index)
5. When all the column families in a commit log have been pushed to disk, the commit log is deleted.
6. Compaction
  ○ Periodically, data files are merge-sorted into a new file.
  ○ Merge keys
  ○ Combine columns
  ○ Discard tombstones
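The write path above can be sketched as a small toy model. This is only an illustration under simplifying assumptions, not Cassandra's actual implementation: the class name `Node`, the `memtable_limit` parameter, and the use of `None` as a tombstone marker are all hypothetical, and real compaction keeps tombstones for a grace period before discarding them.

```python
class Node:
    """Toy model of one node's write path (hypothetical, not Cassandra code)."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only, sequential log
        self.memtable = {}     # in-memory: key -> {column: value}
        self.sstables = []     # flushed, sorted, immutable "files" (oldest first)
        self.memtable_limit = memtable_limit

    def write(self, key, columns):
        self.commit_log.append((key, dict(columns)))        # log first
        self.memtable.setdefault(key, {}).update(columns)   # then memory
        if len(self.memtable) >= self.memtable_limit:       # too many keys
            self.flush()

    def flush(self):
        # Write the memtable out as a key-sorted SSTable, then clear the
        # memtable and the commit log entries it covered.
        self.sstables.append(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.memtable = {}
        self.commit_log = []

    def compact(self):
        # Merge all SSTables into one: newer files overwrite older columns,
        # and tombstoned columns (value None in this sketch) are discarded.
        merged = {}
        for table in self.sstables:
            for key, columns in table:
                merged.setdefault(key, {}).update(columns)
        live = {}
        for key, columns in merged.items():
            kept = {c: v for c, v in columns.items() if v is not None}
            if kept:
                live[key] = kept
        self.sstables = [sorted(live.items(), key=lambda kv: kv[0])]
```

Writing two keys, deleting one of them with a tombstone, and then calling `compact()` leaves a single SSTable that holds only the surviving key.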


Data Model

● [Keyspace][ColumnFamily][Key][Column]
● A keyspace is akin to a database in an RDBMS
● The keyspace is a row-oriented, column structure
● A column family is similar to an RDBMS table
  ○ More flexible/dynamic
● A row in a column family is indexed by its key (Primary Key).
  ○ Cassandra supports up to 2 billion columns per (physical) row.
● Sample code to create a keyspace and a column family:

    CREATE KEYSPACE logs
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

    CREATE TABLE logs.samples (
      node_id text,
      metric text,
      collection_ts timestamp,
      value bigint,
      PRIMARY KEY ((node_id, metric), collection_ts)
    ) WITH CLUSTERING ORDER BY (collection_ts DESC);

Data Model - Primary Keys

● Primary Keys are unique.
● Single Primary Key
  ○ PRIMARY KEY (keyColumn)
● Composite Primary Key
  ○ PRIMARY KEY (myPartitionKey, my1stClusteringKey, my2ndClusteringKey)
● Composite Partitioning Key
  ○ PRIMARY KEY ((my1stPartitionKey, my2ndPartitionKey), myClusteringKey)

Data Model - Time-To-Live (TTL)

● TTL a row
  ○ INSERT INTO users (id, first, last) VALUES ('abc123', 'saeid', 'zeb') USING TTL 3600; // Expires the data in one hour
● TTL a column
  ○ UPDATE users USING TTL 30 SET last = 'zebardast' WHERE id = 'abc123';
● TTL is in seconds.
● A default TTL can also be set at the table level.
● Expired columns/rows are automatically deleted.
● With no TTL specified, columns/values never expire.
● TTL is useful for automatic deletion.
● Re-inserting the same row before it expires overwrites its TTL.
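The TTL rules listed above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `TtlTable` class and passes the current time in explicitly (in seconds) so the behaviour is easy to follow; it does not reflect how Cassandra stores or purges expired cells internally.

```python
class TtlTable:
    """Toy per-column TTL store (hypothetical; times are plain seconds)."""

    def __init__(self):
        # (row_id, column) -> (value, expires_at), expires_at is None for "never"
        self.cells = {}

    def put(self, row_id, column, value, now, ttl=None):
        # Writing a cell again replaces both its value and its TTL.
        expires_at = None if ttl is None else now + ttl
        self.cells[(row_id, column)] = (value, expires_at)

    def get(self, row_id, column, now):
        value, expires_at = self.cells.get((row_id, column), (None, None))
        if expires_at is not None and now >= expires_at:
            return None  # expired cells behave as if they were deleted
        return value
```

With this model, a cell written with `ttl=3600` at time 0 is readable at time 3599 and gone at time 3600, while a cell written without a TTL never expires.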


Partitioners - Consistent hashing

● A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
● A partitioner is a function for deriving a token representing a row from its partition key (typically by hashing).

name      email              gender
Saeid     [email protected]    M
Kamyar    [email protected]    M
Nazanin   [email protected]    F
Masoud    [email protected]    M

Cassandra assigns a hash value to each partition key:

partition key   Murmur3 hash value
Saeid           -2245462676723223822
Kamyar          7723358927203680754
Nazanin         -6723372854036780875
Masoud          1168604627387940318

Cassandra places the data on each node according to the value of the partition key and the range that the node is responsible for:

Node   Start range            End range              Partition key   Hash value
A      -9223372036854775808   -4611686018427387904   Nazanin         -6723372854036780875
B      -4611686018427387903   -1                     Saeid           -2245462676723223822
C      0                      4611686018427387903    Masoud          1168604627387940318
D      4611686018427387904    9223372036854775807    Kamyar          7723358927203680754
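As a rough sketch of the lookup described above, the following maps a token to the node whose range contains it. The hash values are the ones listed for each partition key; the range boundaries are an assumption (four equal, non-overlapping ranges over the signed 64-bit token space), and `node_for_token` is a hypothetical helper, not a Cassandra API. Murmur3 itself is not computed here.

```python
from bisect import bisect_left

# Inclusive upper bound of each node's token range (assumed: four equal
# ranges covering the signed 64-bit token space).
RANGE_ENDS = [-4611686018427387904, -1, 4611686018427387903, 9223372036854775807]
NODES = ["A", "B", "C", "D"]

# Murmur3 hash value of each partition key, as listed in the slides.
TOKENS = {
    "Saeid": -2245462676723223822,
    "Kamyar": 7723358927203680754,
    "Nazanin": -6723372854036780875,
    "Masoud": 1168604627387940318,
}


def node_for_token(token):
    """Return the node whose range contains the token."""
    # bisect_left finds the first range whose end is >= token.
    return NODES[bisect_left(RANGE_ENDS, token)]


placement = {key: node_for_token(token) for key, token in TOKENS.items()}
```

Because the range ends are sorted, the lookup is a binary search, which is the usual way to find the owner of a token on a ring.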

Key Features and Benefits

● Gigabyte to Petabyte scalability
● Linear performance
● No SPOF
● Easy replication / data distribution
● Multi datacenter and cloud capable
● No need for separate caching layer
● Tunable data consistency
● Flexible schema design
● Data compression
● CQL Language (like SQL)
● Support for key languages and platforms
● No need for special hardware or software

Big Data Scalability

● Capable of comfortably scaling to petabytes
● New nodes = linear performance increase
● Add new nodes online

No Single Point of Failure

● All nodes are the same
  ○ Peer-to-Peer - masterless
● Customized replication affords tunable data redundancy
● Read/Write from any node
● Can replicate data among different physical data center racks

Easy Replication / Data Distribution

● Transparently handled by Cassandra
● Multi-data center capable
● Exploits all the benefits of Cloud computing
● Able to do Hybrid Cloud/On-Premise setup

No Need for Caching Software

● Peer-to-Peer architecture
  ○ removes need for special caching layer
● The database cluster uses the memory from all participating nodes to cache the data assigned to each node.
● No irregularities between a memory cache and database are encountered

Tunable Data Consistency

● Choose between strong and eventual consistency
  ○ Depends on the need
● Can be set on a per-operation basis, for both reads and writes.
● Handles multi-data center operations
● Consistency Level (CL)
  ○ ALL = all replicas ack
  ○ QUORUM = a majority (more than 50%) of replicas ack
  ○ ONE = only one replica acks
  ○ Plus more… (see docs)


http://www.datastax.com/documentation/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html
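The arithmetic behind the consistency levels can be written down in a few lines. This is a sketch with hypothetical helper names: `quorum` gives the number of replicas that form a majority for a given replication factor, and `is_strongly_consistent` applies the usual rule that a read is guaranteed to see the latest write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor
```

For example, with a replication factor of 3, QUORUM means 2 replicas; QUORUM reads plus QUORUM writes (2 + 2 > 3) overlap on at least one replica, while ONE reads plus ONE writes (1 + 1) do not, which gives eventual consistency only.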

Flexible Schema

● Dynamic schema design
● Handles structured, semi-structured, and unstructured data.
● Counters are supported
● No offline/downtime for schema changes
● Supports primary and secondary indexes
  ○ Secondary indexes != Relational Indexes (they are for convenience, not speed)

Data Compression

● Uses Google’s Snappy data compression algorithm
● Compresses data on a per column family level
● Internal tests at DataStax show up to 80%+ compression on row data
● No performance penalty
  ○ Some increases in overall performance due to less physical I/O

Locally Distributed

● Client reads or writes to any node
● Node coordinates with others
● Data read or replicated in parallel
● Replication info
  ○ Replication Factor (RF): how many copies of your data are stored
  ○ Each node stores (RF / cluster size) × 100% of the cluster's total data.
  ○ Handy Calculator: http://www.ecyrd.com/cassandracalculator/
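The ownership rule above is simple arithmetic. The sketch below assumes an evenly balanced cluster; `ownership_fraction` is a hypothetical helper name, and the result is capped at 1.0 because a node cannot hold more than all of the data.

```python
def ownership_fraction(replication_factor, cluster_size):
    """Fraction of the cluster's total data stored on each node."""
    return min(1.0, replication_factor / cluster_size)
```

With RF = 3 on a 6-node cluster each node holds half of the data, and with RF = 3 on a 3-node cluster every node holds a full copy.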


Rack Aware

● Cassandra is aware of which rack (or availability zone) each node resides in.
● It will attempt to place each data copy on a different rack.
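One way to picture rack-aware placement is a walk around the ring that prefers racks without a copy yet. The following is a simplified sketch under assumptions, not Cassandra's replication strategy code: `place_replicas`, the `(node, rack)` ring representation, and the node and rack names in the usage note are all hypothetical.

```python
def place_replicas(ring, start, replication_factor):
    """Pick replica nodes, spreading copies across racks where possible.

    ring: list of (node, rack) tuples in token order.
    start: index of the node that owns the token (the first replica).
    """
    ordered = ring[start:] + ring[:start]  # walk the ring from the owner
    chosen, racks_used = [], set()
    # First pass: take only nodes on racks that hold no copy yet.
    for node, rack in ordered:
        if len(chosen) < replication_factor and rack not in racks_used:
            chosen.append(node)
            racks_used.add(rack)
    # Second pass: if there are fewer racks than copies, reuse racks.
    for node, _rack in ordered:
        if len(chosen) < replication_factor and node not in chosen:
            chosen.append(node)
    return chosen
```

For a ring `[("n1", "r1"), ("n2", "r1"), ("n3", "r2"), ("n4", "r2"), ("n5", "r3")]` with three copies starting at `n1`, the walk skips `n2` and `n4` because their racks already hold a copy and selects `n1`, `n3`, and `n5`.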


Data Center Aware

● Active Everywhere - reads/writes in multiple data centers
● Client writes local
● Data syncs across WAN
● Replication Factor per DC
● Different number of nodes per data center

Node Failure

● A single node failure shouldn’t bring down the cluster.
● Replication Factor + Consistency Level = Success

Node Recovery

● When a write is performed and a replica node for the row is unavailable, the coordinator will store a hint locally.
● When the node recovers, the coordinator replays the missed writes.
● Note: a hinted write does not count towards the consistency level.
● Note: you should still run repairs across your cluster.
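Hinted handoff as described above can be sketched with a toy coordinator. Everything here is illustrative and hypothetical (the `Coordinator` class, its `down` set, and the replica names in the usage note); real hints are stored on disk and expire after a configurable window, which this sketch ignores.

```python
class Coordinator:
    """Toy coordinator that stores hints for unavailable replicas."""

    def __init__(self, replicas):
        self.replicas = replicas  # replica name -> dict acting as its storage
        self.down = set()         # names of replicas currently unavailable
        self.hints = []           # pending (replica name, key, value) writes

    def write(self, key, value, required_acks):
        acks = 0
        for name, store in self.replicas.items():
            if name in self.down:
                # Keep a hint locally; it does not count as an ack.
                self.hints.append((name, key, value))
            else:
                store[key] = value
                acks += 1
        return acks >= required_acks

    def recover(self, name):
        # Mark the replica as up again and replay the writes it missed.
        self.down.discard(name)
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.replicas[name][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining
```

With three replicas and one of them down, a write that requires two acks succeeds and a write that requires three fails, because the hint is not an ack. Calling `recover` for the failed replica then replays both missed writes to it.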


Security in Cassandra

● Internal Authentication
  ○ Manages login IDs and passwords inside the database.
● Object Permission Management
  ○ Controls who has access to what and who can do what in the database
  ○ Uses familiar GRANT/REVOKE from relational systems.
● Client to Node Encryption
  ○ Protects data in flight to and from a database

Hardware

● RAM
  ○ The more memory a Cassandra node has, the better its read performance.
    ■ For dedicated hardware, the optimal price-performance sweet spot is 16GB to 64GB; the minimum is 8GB.
    ■ For virtual environments, the optimal range may be 8GB to 16GB; the minimum is 4GB.
● CPU
  ○ More cores are better. Cassandra is built with concurrency in mind.
    ■ For dedicated hardware, 8-core CPU processors are the current price-performance sweet spot.
    ■ For virtual environments, consider using a provider that allows CPU bursting, such as Rackspace.
● Disk
  ○ Cassandra tries to minimize random IO. Minimum of 2 disks. Keep CommitLog and Data (SSTable) on separate spindles. RAID10 or RAID0 as you see fit.
  ○ XFS or ext4.
● Network
  ○ Be sure that your network can handle traffic between nodes without bottlenecks.
    ■ Recommended bandwidth is 1000 Mbit/s (gigabit) or greater.
● More info: Selecting hardware for enterprise implementations...

http://www.datastax.com/documentation/cassandra/2.1/cassandra/planning/architecturePlanningHardware_c.html

Directories and Files

● Configs
  ○ The main configuration file for Cassandra
    ■ /etc/cassandra/cassandra.yaml
  ○ Java Virtual Machine (JVM) configuration settings
    ■ /etc/cassandra/cassandra-env.sh
● Data directories
  ○ /var/lib/cassandra
● Log directory
  ○ /var/log/cassandra
● Environment settings
  ○ /usr/share/cassandra
● Cassandra user limits
  ○ /etc/security/limits.d/cassandra.conf
● More info: Package installation directories...

http://www.datastax.com/documentation/cassandra/2.1/cassandra/reference/referenceInstallLocatePkg_r.html

CQL Language

● Very similar to RDBMS SQL syntax
● Create objects via DDL (e.g. CREATE)
● Core DML commands supported: INSERT, UPDATE, DELETE
● Query data with SELECT
● cqlsh, the Python-based command-line client
  ○ CASSANDRA_PATH/bin/cqlsh
● More info: https://cassandra.apache.org/doc/cql/CQL.html

Nodetool

● A command line interface for managing a cluster.
  ○ CASSANDRA_PATH/bin/nodetool
● Useful commands:
  ○ nodetool info - Display node info (uptime, load, etc.).
  ○ nodetool status [keyspace] - Display cluster info (state, load, etc.).
  ○ nodetool cfstats [keyspace] - Display statistics of column families.
  ○ nodetool tpstats - Display usage statistics of thread pools.
  ○ nodetool netstats - Display network information.
  ○ nodetool repair - Repair one or more column families.
  ○ nodetool rebuild - Rebuild data by streaming from other nodes (similar to bootstrap).
  ○ nodetool drain - Flush Memtables to SSTables on disk and stop accepting writes. Useful before a restart to make startup quick.
  ○ nodetool flush [keyspace [columnfamily]] - Flush one or more column families from the memtable.
  ○ nodetool cfhistograms keyspace columnfamily - Display statistic histograms for a given column family.
  ○ nodetool proxyhistograms - Display statistic histograms for network operations.
  ○ nodetool help - Display help information.

Backup and Restore

● Take Snapshot
  ○ nodetool snapshot
    ■ /var/lib/cassandra/data/keyspace_name/table_name-UUID/snapshots/snapshot_name
  ○ nodetool clearsnapshot
● Restore Procedure
  ○ Shut down the node.
  ○ Clear all files in the commitlog directory (/var/lib/cassandra/commitlog).
  ○ Delete all *.db files in the data_directory_location/keyspace_name/table_name-UUID directory.
  ○ Locate the most recent snapshot folder in this directory:
    ■ data_directory_location/keyspace_name/table_name-UUID/snapshots/snapshot_name
  ○ Copy its contents into this directory:
    ■ data_directory_location/keyspace_name/table_name-UUID
  ○ Start the node.
    ■ Restarting causes a temporary burst of I/O activity and consumes a large amount of CPU resources.
  ○ Run nodetool repair.
● More info: Restoring from a Snapshot...

http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html

DataStax Opscenter

● Visually create new clusters with a few mouse clicks either on premise or in the cloud
● Add, edit, and remove nodes
● Automatically rebalance a cluster
● Control automatic management services including transparent repair
● Manage and schedule backup and restore operations
● Perform capacity planning with historical trend analysis and forecasting capabilities
● Proactively manage all clusters with threshold and timing-based alerts
● Generate reports and diagnostic reports with the push of a button
● Integrate with other enterprise tools via developer API
● More info: http://www.datastax.com/datastax-opscenter

Who’s Using Cassandra?

● Apple
● CERN
● Cisco
● Digg
● Facebook
● IBM
● Instagram
● Mahalo.com
● Netflix
● Rackspace
● Reddit
● SoundCloud
● Spotify
● Twitter
● Zoho
● http://planetcassandra.org/companies/

Where Can I Learn More?

● https://cassandra.apache.org/
● http://planetcassandra.org/
● http://www.datastax.com

Thank you

Saeid [email protected] 2015

Any Questions, Comments?
