C* Summit EU 2013: Cassandra Internals

CASSANDRA EU 2013 CASSANDRA INTERNALS Aaron Morton @aaronmorton Co-Founder & Principal Consultant www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License #CassandraEU

Upload: planet-cassandra

Post on 11-May-2015

DESCRIPTION

Speaker: Aaron Morton, Apache Cassandra Committer & Co-Founder/Principal Consultant at The Last Pickle Inc.

Video: http://www.youtube.com/watch?v=efI5fL8eEfo&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=23

From the microsecond your request hits an Apache Cassandra node, there are many code paths, threads and machines involved in storing or fetching your data. This talk will step through the common operations and highlight the code responsible.

Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault-tolerant database. Cluster-wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the log-structured storage engine provides high-performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years.

This talk will step through read and write requests, automatic processes and manual maintenance tasks. I'll discuss the general approach to solving the problem and drill down to the code responsible for implementation. Existing Cassandra users, those wanting to contribute to the project and people interested in Dynamo-based systems will all benefit from this tour of the code base.

TRANSCRIPT

Page 1: C* Summit EU 2013: Cassandra Internals

CASSANDRA EU 2013

CASSANDRA INTERNALS

Aaron Morton @aaronmorton

Co-Founder & Principal Consultant www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License #CassandraEU

Page 2: C* Summit EU 2013: Cassandra Internals

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Apache Cassandra Committer, DataStax MVP, Hector Maintainer, Apache Usergrid Committer.

Based in New Zealand & Austin, TX.

www.thelastpickle.com #CassandraEU

Page 3: C* Summit EU 2013: Cassandra Internals

Architecture Code

Page 4: C* Summit EU 2013: Cassandra Internals

Cassandra Architecture.

[Diagram: Clients, and a node made up of the layers APIs, Cluster Aware, Cluster Unaware, Disk.]

Page 5: C* Summit EU 2013: Cassandra Internals

Cassandra Cluster Architecture.

[Diagram: Clients, and two nodes (Node 1, Node 2), each made up of the layers APIs, Cluster Aware, Cluster Unaware, Disk.]

Page 6: C* Summit EU 2013: Cassandra Internals

Dynamo Cluster Architecture.

[Diagram: Clients, and two nodes (Node 1, Node 2), each made up of the layers APIs, Dynamo, Database, Disk.]

Page 7: C* Summit EU 2013: Cassandra Internals

Architecture API

Dynamo Database

Page 8: C* Summit EU 2013: Cassandra Internals

API Transports.

Thrift
Native Binary

Page 9: C* Summit EU 2013: Cassandra Internals

Thrift Transport.

// Custom TServer implementations
o.a.c.thrift.CustomTThreadPoolServer
o.a.c.thrift.CustomTHsHaServer

Page 10: C* Summit EU 2013: Cassandra Internals

API Transports.

Thrift
Native Binary

Page 11: C* Summit EU 2013: Cassandra Internals

Native Binary Transport.

Beta in Cassandra 1.2, now GA.
Uses Netty.
CQL 3 only.

Page 12: C* Summit EU 2013: Cassandra Internals

o.a.c.transport.Server.run()

// Set up the Netty server
new ExecutionHandler()
new NioServerSocketChannelFactory()
ServerBootstrap.setPipelineFactory()

Page 13: C* Summit EU 2013: Cassandra Internals

o.a.c.transport.Message.Dispatcher.messageReceived()

// Process message from client
ServerConnection.validateNewMessage()
Request.execute()
ServerConnection.applyStateTransition()
Channel.write()

Page 14: C* Summit EU 2013: Cassandra Internals

Messages.

Defined in the Native Binary Protocol: $SRC/doc/native_protocol.spec

Page 15: C* Summit EU 2013: Cassandra Internals

API Services.

JMX
Thrift
CQL 3

Page 16: C* Summit EU 2013: Cassandra Internals

JMX Management Beans.

Spread around the code base.

Interfaces named *MBean

Page 17: C* Summit EU 2013: Cassandra Internals

JMX Management Beans.

Registered with names such as org.apache.cassandra.db:type=StorageProxy
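
The two JMX slides above describe a naming convention (interfaces ending in MBean) and a registration step (an object name of the form domain:type=Name). A minimal sketch of both using only the JDK follows. DemoService, DemoServiceMBean and the com.example.demo domain are hypothetical names chosen for illustration; none of this is Cassandra code.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxSketch {
    // Hypothetical management interface. For a standard MBean the interface
    // must be public and named <ImplementationClass>MBean.
    public interface DemoServiceMBean {
        long getRequestCount(); // exposed as the read-only attribute "RequestCount"
        void reset();           // exposed as an operation
    }

    public static class DemoService implements DemoServiceMBean {
        private final AtomicLong requests = new AtomicLong();

        public void handleRequest() { requests.incrementAndGet(); }
        @Override public long getRequestCount() { return requests.get(); }
        @Override public void reset() { requests.set(0); }
    }

    // Registers the bean under a domain:type=Name object name and returns that name.
    static ObjectName register(DemoService service) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example.demo:type=DemoService");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // keep the sketch re-runnable
            }
            server.registerMBean(service, name);
            return name;
        } catch (Exception e) {
            throw new IllegalStateException("MBean registration failed", e);
        }
    }

    // Reads an attribute back through the MBean server, as a JMX client would.
    static Object readAttribute(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (Exception e) {
            throw new IllegalStateException("attribute read failed", e);
        }
    }

    public static void main(String[] args) {
        DemoService service = new DemoService();
        ObjectName name = register(service);
        service.handleRequest();
        System.out.println(name + " RequestCount=" + readAttribute(name, "RequestCount"));
    }
}
```

Once registered, a tool such as jconsole would list the bean under the chosen domain, which is how names like the one on the slide become visible.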

Page 18: C* Summit EU 2013: Cassandra Internals

API Services.

JMX
Thrift
CQL 3

Page 19: C* Summit EU 2013: Cassandra Internals

o.a.c.thrift.CassandraServer

// Implements Thrift Interface
// Access control
// Input validation
// Mapping to/from Thrift and internal types

Page 20: C* Summit EU 2013: Cassandra Internals

Thrift Interface.

Thrift IDL $SRC/interface/cassandra.thrift

Page 21: C* Summit EU 2013: Cassandra Internals

o.a.c.thrift.CassandraServer.get_slice()

// Get columns for one row
Tracing.begin()
ClientState cState = state()
cState.hasColumnFamilyAccess()
multigetSliceInternal()

Page 22: C* Summit EU 2013: Cassandra Internals

CassandraServer.multigetSliceInternal()

// Get columns for many rows
ThriftValidation.validate*()
// Create ReadCommands
getSlice()

Page 23: C* Summit EU 2013: Cassandra Internals

CassandraServer.getSlice()

// Process ReadCommands
// Return Thrift types
readColumnFamily()
thriftifyColumnFamily()

Page 24: C* Summit EU 2013: Cassandra Internals

CassandraServer.readColumnFamily()

// Process ReadCommands
// Return ColumnFamilies
StorageProxy.read()

Page 25: C* Summit EU 2013: Cassandra Internals

API Services.

JMX
Thrift
CQL 3

Page 26: C* Summit EU 2013: Cassandra Internals

o.a.c.cql3.QueryProcessor

// Prepares and executes CQL3 statements
// Used by Thrift & Native transports
// Access control
// Input validation
// Returns transport.ResultMessage

Page 27: C* Summit EU 2013: Cassandra Internals

CQL3 Grammar.

ANTLR Grammar $SRC/o.a.c.cql3/Cql.g

Page 28: C* Summit EU 2013: Cassandra Internals

o.a.c.cql3.statements.ParsedStatement

// Subclasses generated by ANTLR
// Tracks bound term count
// Prepare CQLStatement
prepare()

Page 29: C* Summit EU 2013: Cassandra Internals

o.a.c.cql3.statements.CQLStatement

checkAccess(ClientState state)
validate(ClientState state)
execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)

Page 30: C* Summit EU 2013: Cassandra Internals

statements.SelectStatement.RawStatement

// Implements ParsedStatement
// Input validation
prepare()

Page 31: C* Summit EU 2013: Cassandra Internals

statements.SelectStatement.execute()

// Create ReadCommands
StorageProxy.read()
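
The CQL 3 slides above name four steps: a parsed statement is prepared into an executable one, which is then access-checked, validated and executed with bound variables. A minimal sketch of that lifecycle follows. Statement, ParsedSelect and process() are hypothetical, heavily simplified stand-ins, and the order of calls in process() is an assumption based on the order the methods appear on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class StatementLifecycleSketch {
    // Hypothetical, simplified counterpart of an executable statement.
    interface Statement {
        void checkAccess(Set<String> grantedTables);
        void validate();
        List<String> execute(List<String> variables);
    }

    // Hypothetical parsed form: knows the target table and how many "?" markers were seen.
    static final class ParsedSelect {
        final String table;
        final int boundTerms;

        ParsedSelect(String table, int boundTerms) {
            this.table = table;
            this.boundTerms = boundTerms;
        }

        // Turns the parsed form into something that can be executed repeatedly.
        Statement prepare() {
            return new Statement() {
                @Override public void checkAccess(Set<String> grantedTables) {
                    if (!grantedTables.contains(table)) {
                        throw new SecurityException("no access to " + table);
                    }
                }

                @Override public void validate() {
                    if (table.isEmpty()) {
                        throw new IllegalArgumentException("missing table name");
                    }
                }

                @Override public List<String> execute(List<String> variables) {
                    if (variables.size() != boundTerms) {
                        throw new IllegalArgumentException("expected " + boundTerms + " bound values");
                    }
                    // Stand-in for building read commands and calling the storage layer.
                    List<String> rows = new ArrayList<>();
                    for (String value : variables) {
                        rows.add(table + ":" + value);
                    }
                    return rows;
                }
            };
        }
    }

    // Runs the steps in the order listed on the slide.
    static List<String> process(ParsedSelect parsed, Set<String> grantedTables, List<String> variables) {
        Statement statement = parsed.prepare();
        statement.checkAccess(grantedTables);
        statement.validate();
        return statement.execute(variables);
    }

    public static void main(String[] args) {
        ParsedSelect parsed = new ParsedSelect("users", 1);
        System.out.println(process(parsed, Collections.singleton("users"), Arrays.asList("42")));
    }
}
```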

Page 32: C* Summit EU 2013: Cassandra Internals

Architecture API

Dynamo Database

Page 33: C* Summit EU 2013: Cassandra Internals

Dynamo Layer.

o.a.c.service
o.a.c.net
o.a.c.dht
o.a.c.gms
o.a.c.locator
o.a.c.stream

Page 34: C* Summit EU 2013: Cassandra Internals

o.a.c.service.StorageProxy

// Cluster wide storage operations
// Select endpoints & check CL available
// Send messages to Stages
// Wait for response
// Store Hints

Page 35: C* Summit EU 2013: Cassandra Internals

o.a.c.service.StorageService

// Ring operations
// Track ring state
// Start & stop ring membership
// Node & token queries

Page 36: C* Summit EU 2013: Cassandra Internals

o.a.c.service.IResponseResolver

preprocess(MessageIn<T> message)
resolve() throws DigestMismatchException

RowDigestResolver
RowDataResolver
RangeSliceResponseResolver

Page 37: C* Summit EU 2013: Cassandra Internals

Response Handlers / Callback.

implements IAsyncCallback<T>

response(MessageIn<T> msg)

Page 38: C* Summit EU 2013: Cassandra Internals

o.a.c.service.ReadCallback.get()

// Wait for blockfor & data response
condition.await(timeout, TimeUnit.MILLISECONDS)
throw ReadTimeoutException()
resolver.resolve()
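
The callback pattern on the two slides above (replies arrive via response(), while the requesting thread blocks in get() until enough have arrived or a timeout expires) can be sketched with a CountDownLatch. Callback and ReplyTimeoutException are hypothetical names, the exception is unchecked only to keep the sketch short, and the real ReadCallback also hands the replies to a resolver, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class BlockingCallbackSketch {
    // Hypothetical unchecked stand-in for a read timeout.
    static class ReplyTimeoutException extends RuntimeException {
        ReplyTimeoutException(String message) { super(message); }
    }

    static final class Callback<T> {
        private final CountDownLatch pending;                  // counts down from blockFor
        private final List<T> replies = new CopyOnWriteArrayList<>();

        Callback(int blockFor) {
            this.pending = new CountDownLatch(blockFor);
        }

        // Called from a messaging thread for each reply that arrives.
        void response(T reply) {
            replies.add(reply);
            pending.countDown();
        }

        // Called by the requesting thread; blocks for at most timeoutMillis.
        List<T> get(long timeoutMillis) {
            try {
                if (!pending.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    throw new ReplyTimeoutException("only " + replies.size() + " replies received");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for replies", e);
            }
            return new ArrayList<>(replies);
        }
    }

    public static void main(String[] args) {
        Callback<String> callback = new Callback<>(2);
        new Thread(() -> callback.response("replica-1")).start();
        new Thread(() -> callback.response("replica-2")).start();
        System.out.println(callback.get(1000));
    }
}
```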

Page 39: C* Summit EU 2013: Cassandra Internals

o.a.c.service.StorageProxy.fetchRows()

getLiveSortedEndpoints()
new RowDigestResolver()
new ReadCallback()
MessagingService.sendRR()
---------------------------------------
ReadCallback.get() // blocking
catch (DigestMismatchException ex)
catch (ReadTimeoutException ex)
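
The fetchRows() outline above resolves replies by digest first and catches a mismatch. A minimal sketch of that idea follows: one replica's full data is compared against digests from the others, and on mismatch the sketch falls back to comparing full copies. All names (Versioned, DigestMismatch, resolveDigests, resolveData) are hypothetical, and the fallback rule of keeping the copy with the newest timestamp is a simplification assumed for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DigestReadSketch {
    // Hypothetical unchecked stand-in for a digest mismatch.
    static class DigestMismatch extends RuntimeException { }

    // A value together with the timestamp of the write that produced it.
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static byte[] digest(Versioned copy) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest((copy.timestamp + ":" + copy.value).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    // Cheap path: succeeds only if every digest matches the full data reply.
    static Versioned resolveDigests(Versioned data, List<byte[]> digests) {
        byte[] expected = digest(data);
        for (byte[] other : digests) {
            if (!Arrays.equals(expected, other)) {
                throw new DigestMismatch();
            }
        }
        return data;
    }

    // Expensive path: compare full copies and keep the newest one.
    static Versioned resolveData(List<Versioned> copies) {
        Versioned newest = copies.get(0);
        for (Versioned copy : copies) {
            if (copy.timestamp > newest.timestamp) {
                newest = copy;
            }
        }
        return newest;
    }

    // The first replica returns data, the rest return digests only.
    static Versioned read(List<Versioned> replicaCopies) {
        Versioned data = replicaCopies.get(0);
        List<byte[]> digests = new ArrayList<>();
        for (Versioned copy : replicaCopies.subList(1, replicaCopies.size())) {
            digests.add(digest(copy));
        }
        try {
            return resolveDigests(data, digests);
        } catch (DigestMismatch mismatch) {
            return resolveData(replicaCopies);
        }
    }

    public static void main(String[] args) {
        List<Versioned> copies = Arrays.asList(new Versioned("old", 1), new Versioned("new", 2));
        System.out.println(read(copies).value);
    }
}
```

The point of the digest path is that only one replica has to ship the full row when the replicas already agree.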

Page 40: C* Summit EU 2013: Cassandra Internals

Dynamo Layer

o.a.c.service
o.a.c.net
o.a.c.dht
o.a.c.gms
o.a.c.locator
o.a.c.stream

Page 41: C* Summit EU 2013: Cassandra Internals

o.a.c.net.MessagingService.Verb <<enum>>

MUTATION
READ
REQUEST_RESPONSE
TREE_REQUEST
TREE_RESPONSE

(And more...)

Page 42: C* Summit EU 2013: Cassandra Internals

o.a.c.net.MessagingService.verbHandlers

new EnumMap<Verb, IVerbHandler>(Verb.class)

Page 43: C* Summit EU 2013: Cassandra Internals

o.a.c.net.IVerbHandler<T>

doVerb(MessageIn<T> message, String id);

Page 44: C* Summit EU 2013: Cassandra Internals

o.a.c.net.MessagingService.verbStages

new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)

Page 45: C* Summit EU 2013: Cassandra Internals

o.a.c.net.MessagingService.receive()

runnable = new MessageDeliveryTask(message, id, timestamp);
StageManager.getStage(message.getMessageType());
stage.execute(runnable);

Page 46: C* Summit EU 2013: Cassandra Internals

o.a.c.net.MessageDeliveryTask.run()

// If droppable and rpc_timeout
MessagingService.incrementDroppedMessages(verb);
return;

MessagingService.getVerbHandler(verb)
verbHandler.doVerb(message, id)
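
The messaging slides above combine two ideas: handlers are looked up in an EnumMap keyed by verb, and a message that has already waited longer than the timeout is counted as dropped instead of being handled. A minimal single-threaded sketch follows. The class, the simplified VerbHandler signature and the choice of which verbs are droppable are assumptions made for illustration only.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class VerbDispatchSketch {
    enum Verb { MUTATION, READ, REQUEST_RESPONSE }

    // Hypothetical, simplified handler: the payload is just a string here.
    interface VerbHandler {
        void doVerb(String payload, String id);
    }

    // Assumption for this sketch: only these verbs may be dropped when stale.
    static final Set<Verb> DROPPABLE = EnumSet.of(Verb.MUTATION, Verb.READ);

    private final Map<Verb, VerbHandler> handlers = new EnumMap<>(Verb.class);
    private final AtomicInteger dropped = new AtomicInteger();
    private final long timeoutMillis;

    VerbDispatchSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void register(Verb verb, VerbHandler handler) {
        handlers.put(verb, handler);
    }

    int droppedCount() {
        return dropped.get();
    }

    // Returns true if the handler ran, false if the message was dropped as stale.
    boolean deliver(Verb verb, String payload, String id, long createdAtMillis, long nowMillis) {
        if (DROPPABLE.contains(verb) && nowMillis - createdAtMillis > timeoutMillis) {
            dropped.incrementAndGet(); // the sender has already given up on this request
            return false;
        }
        VerbHandler handler = handlers.get(verb);
        if (handler == null) {
            throw new IllegalStateException("no handler registered for " + verb);
        }
        handler.doVerb(payload, id);
        return true;
    }

    public static void main(String[] args) {
        VerbDispatchSketch dispatcher = new VerbDispatchSketch(100);
        dispatcher.register(Verb.READ, (payload, id) -> System.out.println("read " + payload + " for message " + id));
        dispatcher.deliver(Verb.READ, "row-1", "1", 0, 50);   // fresh: handled
        dispatcher.deliver(Verb.READ, "row-2", "2", 0, 500);  // stale: dropped
        System.out.println("dropped=" + dispatcher.droppedCount());
    }
}
```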

Page 47: C* Summit EU 2013: Cassandra Internals

Architecture API Layer

Dynamo Layer Database Layer

Page 48: C* Summit EU 2013: Cassandra Internals

Database Layer

o.a.c.concurrent
o.a.c.db
o.a.c.cache
o.a.c.io
o.a.c.trace

Page 49: C* Summit EU 2013: Cassandra Internals

o.a.c.concurrent.StageManager

stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);

getStage(Stage stage)

Page 50: C* Summit EU 2013: Cassandra Internals

o.a.c.concurrent.Stage

READ
MUTATION
GOSSIP
REQUEST_RESPONSE
ANTI_ENTROPY

(And more...)
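
The StageManager and Stage slides above describe one thread pool per stage, looked up through an EnumMap. A minimal sketch of that arrangement with JDK executors follows. The pool sizes, thread names and the runOn() helper are assumptions for illustration and do not reflect Cassandra's configuration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class StageSketch {
    enum Stage { READ, MUTATION, GOSSIP }

    private static final Map<Stage, ExecutorService> STAGES = new EnumMap<>(Stage.class);

    static {
        // Pool sizes are illustrative assumptions, not Cassandra's defaults.
        STAGES.put(Stage.READ, newPool(Stage.READ, 4));
        STAGES.put(Stage.MUTATION, newPool(Stage.MUTATION, 4));
        STAGES.put(Stage.GOSSIP, newPool(Stage.GOSSIP, 1));
    }

    // Builds a fixed pool whose threads are named after the stage they serve.
    private static ExecutorService newPool(Stage stage, int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = task -> {
            Thread thread = new Thread(task, stage + "-" + counter.incrementAndGet());
            thread.setDaemon(true); // do not keep the JVM alive for the sketch
            return thread;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    static ExecutorService getStage(Stage stage) {
        return STAGES.get(stage);
    }

    // Hypothetical helper: runs a task on a stage and waits for its result.
    static <T> T runOn(Stage stage, Callable<T> task) {
        try {
            return getStage(stage).submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException("task failed on stage " + stage, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOn(Stage.READ, () -> Thread.currentThread().getName()));
        System.out.println(runOn(Stage.GOSSIP, () -> Thread.currentThread().getName()));
    }
}
```

Keeping a separate pool per stage means a backlog of one kind of work, reads for example, cannot starve another kind such as gossip.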

Page 51: C* Summit EU 2013: Cassandra Internals

Database Layer.

o.a.c.concurrent
o.a.c.db
o.a.c.cache
o.a.c.io
o.a.c.trace

Page 52: C* Summit EU 2013: Cassandra Internals

o.a.c.db.Table

// Keyspace
open(String table)
getColumnFamilyStore(String cfName)

getRow(QueryFilter filter)
apply(RowMutation mutation, boolean writeCommitLog)

Page 53: C* Summit EU 2013: Cassandra Internals

o.a.c.db.ColumnFamilyStore

// Column Family
getColumnFamily(QueryFilter filter)
getTopLevelColumns(...)

apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

Page 54: C* Summit EU 2013: Cassandra Internals

o.a.c.db.IColumnContainer

addColumn(IColumn column)
remove(ByteBuffer columnName)

ColumnFamily
SuperColumn

(Removed in 2.0)

Page 55: C* Summit EU 2013: Cassandra Internals

o.a.c.db.ISortedColumns

addColumn(IColumn column, Allocator allocator)
removeColumn(ByteBuffer name)

ArrayBackedSortedColumns
AtomicSortedColumns
TreeMapBackedSortedColumns

Page 56: C* Summit EU 2013: Cassandra Internals

o.a.c.db.Memtable

put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)
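
The Memtable slide above shows two operations: put() merges columns into an in-memory row, and flushAndSignal() writes the contents out and signals a latch. A minimal sketch of the idea follows, using sorted concurrent maps. Plain strings stand in for keys and columns, rows are ordered by string comparison rather than by partitioner token, and the flush target is an in-memory list instead of a file; all of these are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;

class MemtableSketch {
    // Row key -> (column name -> value), both kept sorted.
    private final ConcurrentSkipListMap<String, ConcurrentSkipListMap<String, String>> rows =
            new ConcurrentSkipListMap<>();

    // Merges the given columns into the row, creating the row if needed.
    void put(String rowKey, Map<String, String> columns) {
        rows.computeIfAbsent(rowKey, key -> new ConcurrentSkipListMap<>()).putAll(columns);
    }

    // Emits rows in sorted order (a stand-in for writing a sorted file),
    // then signals the latch so a waiting thread knows the flush finished.
    List<String> flushAndSignal(CountDownLatch latch) {
        List<String> flushed = new ArrayList<>();
        for (Map.Entry<String, ConcurrentSkipListMap<String, String>> row : rows.entrySet()) {
            flushed.add(row.getKey() + " " + row.getValue());
        }
        latch.countDown();
        return flushed;
    }

    public static void main(String[] args) {
        MemtableSketch memtable = new MemtableSketch();
        memtable.put("k2", Collections.singletonMap("b", "2"));
        memtable.put("k1", Collections.singletonMap("z", "9"));
        memtable.put("k2", Collections.singletonMap("a", "1"));
        System.out.println(memtable.flushAndSignal(new CountDownLatch(1)));
        // prints [k1 {z=9}, k2 {a=1, b=2}]
    }
}
```

Because the data is already sorted in memory, the flush is a single sequential pass, which is what makes the log-structured write path cheap.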

Page 57: C* Summit EU 2013: Cassandra Internals

o.a.c.db.ReadCommand

getRow(Table table)

SliceByNamesReadCommand
SliceFromReadCommand
RangeSliceCommand

(Additional classes for paging in 2.0)

Page 58: C* Summit EU 2013: Cassandra Internals

o.a.c.db.IDiskAtomFilter

getMemtableColumnIterator(...)
getSSTableColumnIterator(...)

IdentityQueryFilter
NamesQueryFilter
SliceQueryFilter
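
The filter classes listed above differ in how they select columns from a sorted row: by explicit names, or by a contiguous slice. A minimal sketch of those two selection styles over a NavigableMap follows. ColumnFilter, byNames() and slice() are hypothetical names, and the slice here is inclusive at both ends with a count limit, which is an assumption of the sketch; an identity filter would simply return every column.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class ColumnFilterSketch {
    // Hypothetical filter over one row's sorted columns.
    interface ColumnFilter {
        NavigableMap<String, String> apply(NavigableMap<String, String> columns);
    }

    // Selects only the named columns that exist in the row.
    static ColumnFilter byNames(String... names) {
        return columns -> {
            NavigableMap<String, String> result = new TreeMap<>();
            for (String name : names) {
                String value = columns.get(name);
                if (value != null) {
                    result.put(name, value);
                }
            }
            return result;
        };
    }

    // Selects up to count columns from start to finish, both inclusive.
    static ColumnFilter slice(String start, String finish, int count) {
        return columns -> {
            NavigableMap<String, String> result = new TreeMap<>();
            for (Map.Entry<String, String> column : columns.subMap(start, true, finish, true).entrySet()) {
                if (result.size() >= count) {
                    break;
                }
                result.put(column.getKey(), column.getValue());
            }
            return result;
        };
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String name : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(name, name.toUpperCase());
        }
        System.out.println(byNames("b", "d", "x").apply(row)); // prints {b=B, d=D}
        System.out.println(slice("c", "e", 2).apply(row));     // prints {c=C, d=D}
    }
}
```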

Page 59: C* Summit EU 2013: Cassandra Internals

Summary

API: CustomTThreadPoolServer, Message.Dispatcher, CassandraServer, QueryProcessor

ReadCommand

Dynamo: StorageProxy, IResponseResolver, IAsyncCallback, MessagingService, IVerbHandler

Database: Table, ColumnFamilyStore, IDiskAtomFilter

Page 60: C* Summit EU 2013: Cassandra Internals

Thanks.

Page 61: C* Summit EU 2013: Cassandra Internals

Aaron Morton @aaronmorton

www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License