Paris Cassandra Meetup: Cassandra for Developers
Cassandra for Developers: DataStax Drivers in Practice
Michaël Figuière, Drivers & Developer Tools Architect
@mfiguiere
© 2014 DataStax, All Rights Reserved.
Cassandra Peer to Peer Architecture
[Diagram: six peer nodes arranged in a ring]
Each node holds replicas of some of the partitions of each table
Every node has the same role: there is no master and no slave
Cassandra Peer to Peer Architecture
[Diagram: a ring of six nodes, three of which hold a replica of the same partition]
Each partition is stored in several Replicas to ensure durability and high availability
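To make replica placement concrete, here is a minimal Python sketch of a token ring with one token per node and SimpleStrategy-like placement. The partition key is hashed to a token, and the first replica is the first node clockwise from that token. The next nodes clockwise hold the other replicas. The `token()` hash (MD5 here), the `Ring` class and the node names are assumptions made for illustration. This is not Cassandra's implementation, which defaults to the Murmur3 partitioner and also supports virtual nodes and datacenter/rack-aware strategies.

```python
import bisect
import hashlib

def token(partition_key):
    # Hypothetical stand-in for the partitioner: the first 8 bytes of an MD5
    # digest read as a signed 64-bit integer. Cassandra's default partitioner
    # (Murmur3Partitioner) uses a different hash over a signed 64-bit range.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

class Ring:
    """Toy token ring: one token per node, SimpleStrategy-like placement."""

    def __init__(self, node_tokens, replication_factor):
        # Sort (token, node) pairs so the ring can be walked clockwise.
        self.entries = sorted((t, n) for n, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]
        self.rf = min(replication_factor, len(self.entries))

    def replicas(self, partition_key):
        # First replica: the first node whose token is >= the key's token,
        # wrapping around to the start of the ring when there is none.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        n = len(self.entries)
        # The remaining replicas are the next nodes clockwise.
        return [self.entries[(start + i) % n][1] for i in range(self.rf)]
```

With a replication factor of 3 on a four-node ring, every key lands on three consecutive nodes. Losing any single node still leaves two copies of each partition.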
Client / Server Communication
[Diagram: four clients connected to a ring of six nodes, three of which are replicas]
Coordinator node: forwards all read/write requests to the corresponding replicas
Tunable Consistency
[Diagram: 3 replicas, all holding the value A]
Tunable Consistency
[Diagram: write 'B' and wait for an acknowledgement from one node; over time the replicas go from A A A to B A A]
Tunable Consistency
R + W < N
[Diagram: write 'B' and wait for an acknowledgement from one node (W = 1), then read waiting for one node to answer (R = 1); with the replicas at B A A, the read can reach a replica that still holds A]
Tunable Consistency
R + W = N
[Diagram: write 'B' and wait for acknowledgements from two nodes (W = 2), then read waiting for one node to answer (R = 1); with the replicas at B B A, the read can still reach the replica that holds A]
Tunable Consistency
R + W > N
[Diagram: write 'B' and wait for acknowledgements from two nodes (W = 2), then read waiting for two nodes to answer (R = 2); the two replicas read always include one that holds B, so the read returns B]
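The three cases above come down to an overlap argument. A read of R replicas is guaranteed to reach at least one of the W replicas that acknowledged the latest write exactly when R + W > N. The brute-force sketch below checks this rule. `stale_read_possible` is a hypothetical helper that models replicas as plain indices. It ignores read repair, hinted handoff and timestamps, so it describes the worst case only.

```python
from itertools import combinations

def stale_read_possible(n, w, r):
    """True if some read of r replicas can miss every one of the w replicas
    that acknowledged the latest write (worst case, before any repair)."""
    replicas = range(n)
    for written in combinations(replicas, w):
        for read in combinations(replicas, r):
            if not set(written) & set(read):   # disjoint: the read sees only old values
                return True
    return False

for w, r in [(1, 1), (2, 1), (2, 2)]:
    print("N=3 W=%d R=%d stale read possible: %s" % (w, r, stale_read_possible(3, w, r)))
```

With N = 3, the first two combinations can return stale data, while W = 2 and R = 2 cannot. This matches the three diagrams.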
Tunable Consistency
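R = W = QUORUM
[Diagram: writes and reads both wait for a quorum of the 3 replicas, so every read overlaps the latest write]
QUORUM = (N / 2) + 1, using integer division (N = 3 gives QUORUM = 2)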
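The QUORUM formula uses integer division. Two quorums of the same N replicas always overlap, because 2 × QUORUM > N. The sketch below evaluates the formula for small replication factors and shows how many replicas can be unavailable while QUORUM requests still succeed. `quorum` and `tolerated_failures` are illustrative helper names.

```python
def quorum(n):
    """QUORUM = (N / 2) + 1, with integer (floor) division."""
    return n // 2 + 1

def tolerated_failures(n):
    # Replicas that can be unavailable while QUORUM reads and writes still succeed.
    return n - quorum(n)

for n in (1, 2, 3, 4, 5, 6):
    # 2 * quorum(n) > n is what guarantees that a QUORUM read overlaps a QUORUM write.
    print(n, quorum(n), tolerated_failures(n), 2 * quorum(n) > n)
```

With N = 3, one replica can be down; with N = 5, two can. An even N such as 4 tolerates no more failures than N = 3.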
Cassandra Query Language (CQL)
• Similar to SQL, mostly a subset
• Without joins, sub-queries, and aggregations
• Primary Key contains:
  • A Partition Key, used to select the partition that will store the Row
  • Some Clustering Columns, used to define how Rows are grouped and sorted on disk
• Supports Collections
• Supports User Defined Types (UDT)
CQL: Create Table
CREATE TABLE users (
  login text,
  name text,
  age int,
  …
  PRIMARY KEY (login)
);
login is the partition key: it is hashed, and the rows are spread over the nodes of the cluster according to that hash
Just like in SQL!
CQL: Clustered Table
CREATE TABLE mailbox (
  login text,
  message_id timeuuid,
  interlocutor text,
  message text,
  PRIMARY KEY ((login), message_id)
);
message_id is a clustering column: all the rows with the same login are grouped together and sorted by message_id on disk
A TimeUUID is a UUID that can be sorted chronologically
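A TimeUUID is a version-1 UUID. Its first fields hold a 60-bit timestamp counting 100-ns intervals since 1582-10-15, and Cassandra orders timeuuid values by that embedded timestamp. The sketch below builds such UUIDs from datetimes with Python's standard `uuid` module and sorts them chronologically. `timeuuid_from_datetime`, `datetime_from_timeuuid` and `chronological` are hypothetical helpers. The fixed `clock_seq` and `node` values are arbitrary choices for the example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def timeuuid_from_datetime(dt, clock_seq=0, node=0):
    """Hypothetical helper: build a version-1 UUID whose timestamp is dt."""
    micros = (dt - UNIX_EPOCH) // timedelta(microseconds=1)
    ts = micros * 10 + UUID_EPOCH_OFFSET          # 60-bit count of 100-ns intervals
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                          # time_low
        (ts >> 32) & 0xFFFF,                      # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,           # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,         # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                         # clock_seq_low
        node,                                     # 48-bit node id
    ))

def datetime_from_timeuuid(u):
    """Recover the embedded timestamp (u.time is the 60-bit field)."""
    return UNIX_EPOCH + timedelta(microseconds=(u.time - UUID_EPOCH_OFFSET) // 10)

def chronological(uuids):
    # Order by the embedded timestamp, not by the raw 128-bit value: time_low
    # comes first in the byte layout, so raw order is not chronological in general.
    return sorted(uuids, key=lambda u: u.time)
```

In a real table, two messages written in the same 100-ns interval still get distinct TimeUUIDs, because the clock sequence and node fields differ.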
CQL: Queries
SELECT * FROM mailbox
WHERE login = 'jdoe'
AND message_id = '2014-09-25 16:00:00';
Get message by user and message_id (date)
SELECT * FROM mailbox
WHERE login = 'jdoe'
AND message_id <= '2014-09-25 16:00:00'
AND message_id >= '2014-09-20 16:00:00';
Get message by user and date interval
WHERE clauses can only constrain primary key columns, and range queries are not possible on the partition key
CQL: Collections
CREATE TABLE users (
  login text,
  name text,
  age int,
  friends set<text>,
  hobbies list<text>,
  languages map<int, text>,
  …
  PRIMARY KEY (login)
);
It’s not possible to use nested collections… yet
set and list have semantics similar to Java's Set and List
Cassandra 2.1: User Defined Type (UDT)
CREATE TABLE users (
  login text,
  …
  street_number int,
  street_name text,
  postcode int,
  country text,
  …
  PRIMARY KEY (login)
);
CREATE TYPE address (
  street_number int,
  street_name text,
  postcode int,
  country text
);
CREATE TABLE users (
  login text,
  …
  location frozen<address>,
  …
  PRIMARY KEY (login)
);
Cassandra 2.1: UDT Insert / Update
INSERT INTO users (login, name, location)
VALUES ('jdoe', 'John DOE', {
  street_number: 124,
  street_name: 'Congress Avenue',
  postcode: 95054,
  country: 'USA'
});
UPDATE users SET location = {
  street_number: 125,
  street_name: 'Congress Avenue',
  postcode: 95054,
  country: 'USA'
} WHERE login = 'jdoe';
Request Pipelining
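[Diagram: a client talking to Cassandra without request pipelining (each request waits for the previous response) and with request pipelining (several requests are sent on the same connection before the responses come back)]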
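Pipelining works because each request frame in the CQL native protocol carries a stream id, which the matching response echoes back. This lets one connection carry many in-flight requests, with responses matched up in whatever order they arrive. The toy Python model below illustrates the bookkeeping. `PipelinedConnection`, `write_frame` and `on_frame` are hypothetical names, not driver internals. The default of 128 ids per connection mirrors protocol versions 1 and 2 and is an assumption to check for your protocol version.

```python
class PipelinedConnection:
    """Toy model of request pipelining over a single connection.

    Each request is tagged with a stream id, several requests can be in flight
    at once, and responses are matched to their request by that id, in
    whatever order they arrive. Hypothetical class, not the drivers' code.
    """

    def __init__(self, write_frame, max_streams=128):
        self.write_frame = write_frame          # callable(stream_id, query): sends without waiting
        self.free_ids = list(range(max_streams))
        self.callbacks = {}                     # in-flight requests: stream id -> callback

    def send(self, query, callback):
        if not self.free_ids:
            raise RuntimeError("all stream ids are in use on this connection")
        stream_id = self.free_ids.pop(0)
        self.callbacks[stream_id] = callback
        self.write_frame(stream_id, query)      # returns before any response arrives
        return stream_id

    def on_frame(self, stream_id, result):
        # Called by the I/O layer for every response frame.
        callback = self.callbacks.pop(stream_id)
        self.free_ids.append(stream_id)         # the id can now be reused
        callback(result)
```

When all stream ids are taken, the connection is saturated. A driver then has to use another connection or wait.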
Notifications
[Diagram: a client and a cluster of nodes, without notifications and with notifications (the nodes push events about cluster changes to the client)]
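With notifications, the client learns about cluster changes from events pushed by the nodes. The sketch below shows how a client-side host list could be kept current from such events. The event and change names follow the native protocol's TOPOLOGY_CHANGE (NEW_NODE, REMOVED_NODE) and STATUS_CHANGE (UP, DOWN) events. `HostRegistry` and its methods are hypothetical, and other events such as MOVED_NODE or SCHEMA_CHANGE are simply ignored here.

```python
class HostRegistry:
    """Hypothetical sketch: a client-side view of the cluster, updated from
    events pushed by the nodes. MOVED_NODE and SCHEMA_CHANGE are ignored."""

    def __init__(self, hosts):
        self.up = set(hosts)
        self.down = set()

    def on_event(self, event_type, change, address):
        if event_type == "TOPOLOGY_CHANGE" and change == "NEW_NODE":
            self.up.add(address)                 # start sending requests to it
        elif event_type == "TOPOLOGY_CHANGE" and change == "REMOVED_NODE":
            self.up.discard(address)             # forget it entirely
            self.down.discard(address)
        elif event_type == "STATUS_CHANGE" and change == "DOWN":
            if address in self.up:
                self.up.discard(address)         # stop routing requests to it
                self.down.add(address)
        elif event_type == "STATUS_CHANGE" and change == "UP":
            self.down.discard(address)
            self.up.add(address)
```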
Asynchronous Driver Architecture
[Diagram: three client threads share one Driver instance, which sends their requests to four nodes]
Asynchronous Driver Architecture
[Diagram: the same setup, with requests numbered 1 to 6 sent by the client threads through the Driver to the nodes]
Failover
[Diagram: client threads send requests 1 to 7 through the Driver; when a node fails, the Driver sends the requests to the remaining nodes]
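Automatic failover can be pictured as walking a query plan, an ordered list of candidate hosts, and moving on to the next host when a connection fails. In this sketch, `execute_with_failover`, `send` and `NoHostAvailable` are hypothetical names. The exception only loosely echoes the drivers' "no host available" error. Only connection errors trigger failover here. Whether to retry after a timeout is a separate decision, which the drivers expose as a retry policy.

```python
class NoHostAvailable(Exception):
    """Raised when every host in the query plan failed (hypothetical)."""

    def __init__(self, errors):
        super().__init__("all hosts tried failed: %s" % sorted(errors))
        self.errors = errors                 # host -> exception

def execute_with_failover(query_plan, send, query):
    errors = {}
    for host in query_plan:                  # hosts in the order chosen by load balancing
        try:
            return send(host, query)
        except ConnectionError as exc:
            errors[host] = exc               # remember why, then try the next host
    raise NoHostAvailable(errors)
```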
DataStax Drivers Highlights
• Asynchronous architecture using non-blocking I/O
• Prepared Statements support
• Automatic Failover
• Node Discovery
• Tunable Load Balancing
  • Round Robin, Latency Awareness, Multi Data Centers, Replica Awareness
• Cassandra Tracing support
• Compression & SSL
DataCenter Aware Balancing
[Diagram: clients and nodes spread over two datacenters, A and B]
Local nodes are queried first; if none are available, the request can be sent to a remote node.
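Datacenter-aware balancing can be sketched as building a query plan per request. The plan lists the local hosts in round-robin order and then the remote hosts as a fallback. `DCAwareRoundRobin`, `query_plan` and the `is_up` callback are hypothetical names. They model the idea behind the drivers' DC-aware policy, not their code.

```python
import itertools

class DCAwareRoundRobin:
    """Sketch: local hosts first (rotated on each call), then remote hosts."""

    def __init__(self, hosts, local_dc):
        # hosts: mapping host -> datacenter name
        self.local = [h for h, dc in hosts.items() if dc == local_dc]
        self.remote = [h for h, dc in hosts.items() if dc != local_dc]
        self.counter = itertools.count()

    def query_plan(self, is_up):
        # Rotate the local hosts so that successive requests start on a different node.
        start = next(self.counter)
        n = len(self.local)
        rotated = [self.local[(start + i) % n] for i in range(n)]
        plan = [h for h in rotated if is_up(h)]
        # Remote hosts are only reached once every live local host has been tried.
        plan += [h for h in self.remote if is_up(h)]
        return plan
```

When the whole local datacenter is down, the plan contains only remote hosts, which is the fallback described on the slide.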
Token Aware Balancing
Nodes that own a replica of the partition key being read or written by the query are contacted first.
[Diagram: a client sends its request directly to one of the three replicas of the partition]
The partition key is inferred from the Prepared Statement's metadata
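Token awareness can be layered on top of another policy. Once the replicas of the statement's partition key are known, they move to the front of the plan the other ("child") policy produced. The remaining hosts follow as a fallback. `token_aware_plan` is a hypothetical helper. Keeping the replicas in the child policy's order is one possible choice among several.

```python
def token_aware_plan(replicas, child_plan):
    """Replicas of the partition key first (in the child policy's order),
    then the other hosts. Replicas missing from child_plan (e.g. down) are skipped."""
    replica_set = set(replicas)
    first = [h for h in child_plan if h in replica_set]
    rest = [h for h in child_plan if h not in replica_set]
    return first + rest
```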
State of DataStax Drivers
Driver     Cassandra 1.2   Cassandra 2.0   Cassandra 2.1
Java       1.0 - 2.1       2.0 - 2.1       2.1
Python     1.0 - 2.1       2.0 - 2.1       2.1
C#         1.0 - 2.1       2.0 - 2.1       2.1
Node.js    1.0             1.0             Later
C++        1.0-beta4       1.0-beta4       Later
Ruby       1.0-beta3       1.0-beta3       Later
Later versions of Cassandra can use earlier Drivers, but some features won’t be supported
DataStax Driver in Practice
Java (Maven):
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>2.1.0</version>
</dependency>
Python:
$ pip install cassandra-driver
C#:
PM> Install-Package CassandraCSharpDriver
Ruby:
$ gem install cassandra-driver --pre
Node.js:
$ npm install cassandra-driver
Connect and Write
Cluster cluster = Cluster.builder()
    .addContactPoints("10.1.2.5", "cassandra_node3")
    .build();
Session session = cluster.connect("my_keyspace");
session.execute(
    "INSERT INTO user (user_id, name, email) VALUES (12345, 'johndoe', '[email protected]')");
The rest of the nodes will be discovered by the driver
A keyspace is just like a schema in the SQL world
Read
ResultSet resultSet = session.execute(
    "SELECT * FROM user WHERE user_id IN (1,8,13)");
List<Row> rows = resultSet.all();
for (Row row : rows) {
    int userId = row.getInt("user_id");   // user_id is an int column, so getString() would fail
    String name = row.getString("name");
    String email = row.getString("email");
}
Actually ResultSet also implements Iterable<Row>
Session is a thread-safe object: create a single instance at startup and share it
Write with Prepared Statements
PreparedStatement insertUser = session.prepare( "INSERT INTO user (user_id, name, email) VALUES (?, ?, ?)");
BoundStatement statement = insertUser.bind(12345, "johndoe", "[email protected]");
statement.setConsistencyLevel(ConsistencyLevel.QUORUM);
session.execute(statement);
Parameters can be named as well
PreparedStatement objects are also thread-safe: prepare each query once at startup and reuse it
BoundStatement is a stateful object and is NOT thread-safe
Consistency Level can be set for each statement
Asynchronous Read
ResultSetFuture future = session.executeAsync(
    "SELECT * FROM user WHERE user_id IN (1,2,3)");
ResultSet resultSet = future.get();
List<Row> rows = resultSet.all();
for (Row row : rows) {
    int userId = row.getInt("user_id");
    String name = row.getString("name");
    String email = row.getString("email");
}
executeAsync() will not block: it returns immediately, unless all the connections are busy
future.get() blocks until the result is available
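The same pattern extends to several requests in flight at once. Start them all without blocking, then wait for each result. The sketch below shows this with Python's standard `concurrent.futures`. `fetch_user` is a hypothetical stand-in for one query, and the thread pool stands in for the driver's non-blocking I/O. The real drivers do not use one thread per request.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_user(user_id):
    """Hypothetical stand-in for one query; the sleep simulates network latency."""
    time.sleep(0.05)
    return {"user_id": user_id, "name": "user%d" % user_id}

def fetch_all(user_ids):
    with ThreadPoolExecutor(max_workers=max(1, len(user_ids))) as pool:
        # submit() returns a future immediately, like executeAsync():
        # all the requests are in flight at the same time.
        futures = [pool.submit(fetch_user, uid) for uid in user_ids]
        # result() blocks until that answer is available, like future.get().
        return [f.result() for f in futures]
```

Because the calls overlap, the total wait is roughly one round trip rather than one per user.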
Asynchronous Read with Callbacks
ResultSetFuture future = session.executeAsync( "SELECT * FROM user WHERE user_id IN (1,2,3)");
future.addListener(new Runnable() {
    public void run() {
        // Process the results here
    }
}, executor);
ResultSetFuture implements Guava’s ListenableFuture
executor = Executors.newCachedThreadPool();
executor = MoreExecutors.sameThreadExecutor();
Use sameThreadExecutor() only if your listener code is trivial and non-blocking, as it will be executed in the I/O thread
…Or any thread pool that you prefer
Query Builder
import static com.datastax.driver.core.querybuilder.QueryBuilder.*;
Statement selectAll = select().all().from("user").where(eq("user_id", userId));
session.execute(selectAll);
Statement insert = insertInto("user")
    .value("user_id", 2)
    .value("name", "johndoe")
    .value("email", "[email protected]");
session.execute(insert);
The static import of QueryBuilder lets you call the DSL methods (select(), insertInto(), eq()...) without the QueryBuilder prefix
Python
import logging
from cassandra.cluster import Cluster
log = logging.getLogger(__name__)
cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'])
session = cluster.connect('mykeyspace')
def handle_success(rows):
    user = rows[0]
    try:
        process_user(user.name, user.age, user.id)
    except Exception:
        log.error("Failed to process user %s", user.id)
        # don't re-raise errors in the callback
def handle_error(exception):
    log.error("Failed to fetch user info: %s", exception)
future = session.execute_async("SELECT * FROM users WHERE user_id=3")
future.add_callbacks(handle_success, handle_error)
It’s also possible to retrieve the result from the future object synchronously, with future.result()
C#
var cluster = Cluster.Builder()
    .AddContactPoints("host1", "host2", "host3")
    .Build();
var session = cluster.Connect("sample_keyspace");
var task = session.ExecuteAsync(statement);
task.ContinueWith((t) =>
{
    var rs = t.Result;
    foreach (var row in rs)
    {
        //Get the values from each row
    }
}, TaskContinuationOptions.OnlyOnRanToCompletion);
Asynchronously execute a query using the TPL
C / C++
CassString query = cass_string_init("SELECT keyspace_name FROM system.schema_keyspaces;");
CassStatement* statement = cass_statement_new(query, 0);
CassFuture* result_future = cass_session_execute(session, statement);
if (cass_future_error_code(result_future) == CASS_OK) {
    const CassResult* result = cass_future_get_result(result_future);
    CassIterator* rows = cass_iterator_from_result(result);
    while (cass_iterator_next(rows)) {
        // Process results
    }
    cass_iterator_free(rows);   /* free the iterator before the result it walks */
    cass_result_free(result);
}
cass_future_free(result_future);
cass_statement_free(statement);
Each structure must be freed with the appropriate function
Node.js
var assert = require('assert');
var cassandra = require('cassandra-driver');
var client = new cassandra.Client({ contactPoints: ['host1', 'h2'], keyspace: 'ks1' });
var query = 'SELECT email, last_name FROM user_profiles WHERE key=?';
client.execute(query, ['guy'], function(err, result) {
  assert.ifError(err);
  console.log('got user profile with email ' + result.rows[0].email);
});
Here we’re using a Parameterized Statement, which is not prepared, but still allows parameters
Ruby
require 'cassandra'
cluster = Cassandra.cluster
session = cluster.connect('system')
future = session.execute_async('SELECT * FROM schema_columnfamilies')
future.on_success do |rows|
  rows.each do |row|
    puts "The keyspace #{row['keyspace_name']} has a table called #{row['columnfamily_name']}"
  end
end
future.join
Register a listener on the future, which will be called when results are available
Object Mapper
• Avoid boilerplate for common use cases
• Map Objects to Statements and ResultSets to Objects
• Do NOT hide Cassandra from the developer
• No “clever tricks” à la Hibernate
• Not JPA compatible, but JPA-ish API
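The two mapping directions in the list above can be sketched without any magic. An object's fields become the columns and values of a parameterized statement, and a row's columns become the object's fields. The sketch uses Python dataclasses. `User`, `insert_statement` and `from_row` are hypothetical names, and the row is a plain dict standing in for a driver row. The real mapper also handles annotations, UDTs and asynchronous variants.

```python
from dataclasses import dataclass, fields

@dataclass
class User:                      # hypothetical entity mirroring a users table
    email: str
    name: str

def insert_statement(table, entity):
    """Map an object to a parameterized INSERT: (query string, bound values)."""
    names = [f.name for f in fields(entity)]
    placeholders = ", ".join("?" for _ in names)
    query = "INSERT INTO %s (%s) VALUES (%s)" % (table, ", ".join(names), placeholders)
    return query, [getattr(entity, n) for n in names]

def from_row(cls, row):
    """Map a row (here a plain dict) back to an object, ignoring unmapped columns."""
    return cls(**{f.name: row[f.name] for f in fields(cls)})
```

Nothing is hidden: the generated query is an ordinary CQL statement that you could write by hand.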
Object Mapper in Practice
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-mapping</artifactId>
  <version>2.1.0</version>
</dependency>
Additional artifact for object mapping
Available from Driver 2.1.0
Basic Object Mapping
CREATE TYPE address (street text, city text, zip int);
CREATE TABLE users (email text PRIMARY KEY, address frozen<address>);
@UDT(keyspace = "ks", name = "address")
public class Address {
    private String street;
    private String city;
    private int zip;
    // getters and setters omitted...
}
@Table(keyspace = "ks", name = "users")
public class User {
    @PartitionKey
    private String email;
    private Address address;
    // getters and setters omitted...
}
Basic Object Mapping
MappingManager manager = new MappingManager(session);
Mapper<User> mapper = manager.mapper(User.class);
User myProfile = mapper.get("[email protected]");
ListenableFuture<Void> saveFuture = mapper.saveAsync(anotherProfile);
mapper.delete("[email protected]");
Mapper, just like Session, is a thread-safe object. Create a singleton at startup.
get() returns a mapped row for the given Primary Key
ListenableFuture from Guava. Completed when the write is acknowledged.
Accessors
@Accessor
interface UserAccessor {
    @Query("SELECT * FROM user_profiles LIMIT :max")
    Result<User> firstN(@Param("max") int limit);
}
UserAccessor accessor = manager.createAccessor(UserAccessor.class);
Result<User> users = accessor.firstN(10);
for (User user : users) {
    System.out.println(user.getAddress().getZip());
}
Result is like ResultSet but specialized for a mapped class…
…so we iterate over it just like we would with a ResultSet
We’re Hiring!
@mfiguiere
Cassandra Tech Day - Paris, November 4th
Cassandra Summit Europe - London, December 3-4th