Cassandra Community Webinar: Apache Cassandra Internals

Upload: datastax

Post on 26-Jan-2015

DESCRIPTION

Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault-tolerant database. Cluster-wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the log-structured storage engine provides high-performance reads and writes. All of this is implemented in a Java code base that has matured greatly over the past few years.

In this webinar Aaron Morton will step through read and write requests, automatic processes and manual maintenance tasks. He will also discuss the general approach to solving the problem and drill down to the code responsible for the implementation.

Speaker: Aaron Morton, Apache Cassandra Committer

Aaron Morton is a freelance developer based in New Zealand, and a committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.

TRANSCRIPT

Page 1: Cassandra Community Webinar: Apache Cassandra Internals

CASSANDRA COMMUNITY WEBINARS AUGUST 2013

CASSANDRA INTERNALS

Aaron Morton
@aaronmorton

Co-Founder & Principal Consultant
www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Cassandra Community Webinar: Apache Cassandra Internals

About The Last Pickle

Work with clients to deliver and improve Apache Cassandra based solutions.

Apache Cassandra Committer, DataStax MVP, Hector Maintainer, 6+ years combined Cassandra experience.

Based in New Zealand & Austin, TX.

Page 3: Cassandra Community Webinar: Apache Cassandra Internals

Architecture

Code

Page 4: Cassandra Community Webinar: Apache Cassandra Internals

Cassandra Architecture

Clients

APIs
Cluster Aware
Cluster Unaware

Disk

Page 5: Cassandra Community Webinar: Apache Cassandra Internals

Cassandra Cluster Architecture

Clients

Node 1: APIs, Cluster Aware, Cluster Unaware, Disk
Node 2: APIs, Cluster Aware, Cluster Unaware, Disk

Page 6: Cassandra Community Webinar: Apache Cassandra Internals

Dynamo Cluster Architecture

Clients

Node 1: APIs, Dynamo, Database, Disk
Node 2: APIs, Dynamo, Database, Disk

Page 7: Cassandra Community Webinar: Apache Cassandra Internals

Architecture

API
Dynamo
Database

Page 8: Cassandra Community Webinar: Apache Cassandra Internals

API Transports

Thrift
Native Binary

Page 9: Cassandra Community Webinar: Apache Cassandra Internals

Thrift Transport

// Custom TServer implementations
o.a.c.thrift.CustomTThreadPoolServer
o.a.c.thrift.CustomTNonBlockingServer
o.a.c.thrift.CustomTHsHaServer

Page 10: Cassandra Community Webinar: Apache Cassandra Internals

API Transports

Thrift
Native Binary

Page 11: Cassandra Community Webinar: Apache Cassandra Internals

Native Binary Transport

Beta in Cassandra 1.2

Uses Netty

Enabled with start_native_transport (disabled by default)

Page 12: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.transport.Server.run()

// Set up the Netty server
new ExecutionHandler()
new NioServerSocketChannelFactory()
ServerBootstrap.setPipelineFactory()

Page 13: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.transport.Message.Dispatcher.messageReceived()

// Process message from client
ServerConnection.validateNewMessage()
Request.execute()
ServerConnection.applyStateTransition()
Channel.write()
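
The four calls on this slide form a small per-connection pipeline: validate, execute, update state, write. A minimal Java sketch of that flow follows. It assumes a two-state connection that must see STARTUP before any other request; `DispatcherSketch`, its enums and the in-memory `written` list are hypothetical stand-ins for illustration, not the Cassandra or Netty classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the classes named on the slide.
final class DispatcherSketch {
    enum State { UNINITIALIZED, READY }
    enum Type { STARTUP, QUERY, READY, RESULT }

    // Tracks the protocol state of one client connection.
    static final class Connection {
        State state = State.UNINITIALIZED;
        final List<Type> written = new ArrayList<>(); // stands in for Channel.write()

        void validateNewMessage(Type request) {
            if (state == State.UNINITIALIZED && request != Type.STARTUP)
                throw new IllegalStateException("Unexpected " + request + " before STARTUP");
        }

        void applyStateTransition(Type request, Type response) {
            if (request == Type.STARTUP && response == Type.READY)
                state = State.READY;
        }
    }

    // Stands in for Request.execute(): maps a request to its response type.
    static Type execute(Type request) {
        return request == Type.STARTUP ? Type.READY : Type.RESULT;
    }

    // Same order as the slide: validate, execute, transition, write.
    static void messageReceived(Connection connection, Type request) {
        connection.validateNewMessage(request);
        Type response = execute(request);
        connection.applyStateTransition(request, response);
        connection.written.add(response);
    }
}
```

Keeping validation and state transitions on the connection object means the dispatcher itself stays stateless and can be shared across channels.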

Page 14: Cassandra Community Webinar: Apache Cassandra Internals

Messages

Defined in the Native Binary Protocol

$SRC/doc/native_protocol.spec

Page 15: Cassandra Community Webinar: Apache Cassandra Internals

API Services

JMX
Thrift
CQL 3

Page 16: Cassandra Community Webinar: Apache Cassandra Internals

JMX Management Beans

Spread around the code base.

Interfaces named *MBean

Page 17: Cassandra Community Webinar: Apache Cassandra Internals

JMX Management Beans

Registered with names such as org.apache.cassandra.db:type=StorageProxy
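
As a rough illustration of the *MBean naming convention and of name-based registration, the sketch below registers a standard MBean with the JVM's platform MBean server and reads an attribute back. `DemoProxy`, its `ReadOperations` attribute and any object name passed in are invented for the example; only the `javax.management` and `java.lang.management` calls are real JDK APIs.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxSketch {
    // Standard MBean convention: the management interface is named <Class>MBean
    // and must be public. DemoProxy is hypothetical, not a Cassandra class.
    public interface DemoProxyMBean {
        long getReadOperations();
    }

    public static final class DemoProxy implements DemoProxyMBean {
        private final long reads;
        public DemoProxy(long reads) { this.reads = reads; }
        @Override public long getReadOperations() { return reads; }
    }

    // Registers the bean under the given name, then reads the attribute back
    // the way a JMX client would, and finally unregisters it.
    static long registerAndRead(String name, long reads) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            server.registerMBean(new DemoProxy(reads), objectName);
            try {
                // The getter getReadOperations() is exposed as attribute "ReadOperations".
                return (Long) server.getAttribute(objectName, "ReadOperations");
            } finally {
                server.unregisterMBean(objectName);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An object name follows the `domain:key=value` shape seen on the slide, for example `org.example.sketch:type=DemoProxy` for this hypothetical bean.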

Page 18: Cassandra Community Webinar: Apache Cassandra Internals

API Services

JMX
Thrift
CQL 3

Page 19: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.thrift.CassandraServer

// Implements Thrift Interface
// Access control
// Input validation
// Mapping to/from Thrift and internal types

Page 20: Cassandra Community Webinar: Apache Cassandra Internals

Thrift Interface

Thrift IDL
$SRC/interface/cassandra.thrift

Page 21: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.thrift.CassandraServer.get_slice()

// Get columns for one row
Tracing.begin()
ClientState cState = state()
cState.hasColumnFamilyAccess()
multigetSliceInternal()

Page 22: Cassandra Community Webinar: Apache Cassandra Internals

CassandraServer.multigetSliceInternal()

// Get columns for many rows
ThriftValidation.validate*()
// Create ReadCommands
getSlice()

Page 23: Cassandra Community Webinar: Apache Cassandra Internals

CassandraServer.getSlice()

// Process ReadCommands
// Return Thrift types
readColumnFamily()
thriftifyColumnFamily()

Page 24: Cassandra Community Webinar: Apache Cassandra Internals

CassandraServer.readColumnFamily()

// Process ReadCommands
// Return ColumnFamilies
StorageProxy.read()

Page 25: Cassandra Community Webinar: Apache Cassandra Internals

API Services

JMX
Thrift
CQL 3

Page 26: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.cql3.QueryProcessor

// Prepares and executes CQL3 statements
// Used by Thrift & Native transports
// Access control
// Input validation
// Returns transport.ResultMessage

Page 27: Cassandra Community Webinar: Apache Cassandra Internals

CQL3 Grammar

ANTLR Grammar
$SRC/o.a.c.cql3/Cql.g

Page 28: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.cql3.statements.ParsedStatement

// Subclasses generated by ANTLR
// Tracks bound term count
// Prepare CQLStatement
prepare()

Page 29: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.cql3.statements.CQLStatement

checkAccess(ClientState state)
validate(ClientState state)
execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)
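
To make the parse, prepare, check, validate, execute lifecycle concrete, here is a minimal sketch. `StatementSketch`, `Parsed` and `Statement` are hypothetical, heavily simplified counterparts of ParsedStatement and CQLStatement; users are plain strings, bound values are strings, and the order of calls in `process` is an assumption for illustration rather than a copy of QueryProcessor.

```java
import java.util.List;

final class StatementSketch {
    // Hypothetical, simplified counterpart of CQLStatement.
    interface Statement {
        void checkAccess(String user);
        void validate();
        String execute(List<String> variables);
    }

    // Hypothetical counterpart of ParsedStatement: remembers how many bind
    // markers the parser saw, then produces an executable statement.
    static final class Parsed {
        final String table;
        final int boundTerms;

        Parsed(String table, int boundTerms) {
            this.table = table;
            this.boundTerms = boundTerms;
        }

        Statement prepare() {
            return new Statement() {
                @Override public void checkAccess(String user) {
                    if (user == null) throw new SecurityException("login required");
                }
                @Override public void validate() {
                    if (table.isEmpty()) throw new IllegalArgumentException("missing table name");
                }
                @Override public String execute(List<String> variables) {
                    if (variables.size() != boundTerms)
                        throw new IllegalArgumentException("expected " + boundTerms + " bound values");
                    return "read " + table + " with " + variables;
                }
            };
        }
    }

    // Order assumed here: access control, then validation, then execution.
    static String process(Parsed parsed, String user, List<String> variables) {
        Statement statement = parsed.prepare();
        statement.checkAccess(user);
        statement.validate();
        return statement.execute(variables);
    }
}
```

Separating the parsed form from the prepared form lets the prepared statement be reused with different bound values.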

Page 30: Cassandra Community Webinar: Apache Cassandra Internals

statements.SelectStatement.RawStatement

// Implements ParsedStatement
// Input validation
prepare()

Page 31: Cassandra Community Webinar: Apache Cassandra Internals

statements.SelectStatement.execute()

// Create ReadCommands
StorageProxy.read()

Page 32: Cassandra Community Webinar: Apache Cassandra Internals

Architecture

API
Dynamo
Database

Page 33: Cassandra Community Webinar: Apache Cassandra Internals

Dynamo Layer

o.a.c.service
o.a.c.net
o.a.c.dht
o.a.c.gms
o.a.c.locator
o.a.c.stream

Page 34: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.service.StorageProxy

// Cluster wide storage operations
// Select endpoints & check CL available
// Send messages to Stages
// Wait for response
// Store Hints

Page 35: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.service.StorageService

// Ring operations
// Track ring state
// Start & stop ring membership
// Node & token queries

Page 36: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.service.IResponseResolver

preprocess(MessageIn<T> message)
resolve() throws DigestMismatchException

RowDigestResolver
RowDataResolver
RangeSliceResponseResolver
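
The idea behind a digest resolver can be sketched in a few lines: one replica returns the full row, the others return only a hash, and resolve() fails if any hash disagrees with the data. The class below is a hypothetical simplification, not RowDigestResolver itself; rows are plain strings and MD5 is an assumption made for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DigestResolverSketch {
    // Hypothetical stand-in for the DigestMismatchException named on the slide.
    static final class DigestMismatchException extends Exception {
        DigestMismatchException(String message) { super(message); }
    }

    private String data;                                    // full row from one replica
    private final List<byte[]> digests = new ArrayList<>(); // hashes from the others

    static byte[] digest(String row) {
        try {
            return MessageDigest.getInstance("MD5").digest(row.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is mandatory on every Java platform
        }
    }

    // Roughly the role of preprocess(): collect each response as it arrives.
    void onData(String row) { data = row; }
    void onDigest(byte[] digest) { digests.add(digest); }

    // Returns the data response only if every digest agrees with it.
    String resolve() throws DigestMismatchException {
        if (data == null) throw new IllegalStateException("no data response received");
        byte[] expected = digest(data);
        for (byte[] d : digests)
            if (!Arrays.equals(expected, d))
                throw new DigestMismatchException("replicas disagree");
        return data;
    }
}
```

A mismatch is a signal to the caller, not a failure of the read: the caller can catch it and fall back to comparing full data from each replica.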

Page 37: Cassandra Community Webinar: Apache Cassandra Internals

Response Handlers / Callback

implements IAsyncCallback<T>

response(MessageIn<T> msg)

Page 38: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.service.ReadCallback.get()

// Wait for blockfor & data
condition.await(timeout, TimeUnit.MILLISECONDS)

throw ReadTimeoutException()

resolver.resolve()
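
The blocking wait on this slide can be sketched with a `CountDownLatch`: each replica response counts down, and the caller waits for `blockFor` responses or times out. `ReadCallbackSketch` is a hypothetical simplification; responses are strings, the timeout exception is unchecked to keep the example short, and the resolve step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ReadCallbackSketch {
    // Hypothetical stand-in for ReadTimeoutException; unchecked for brevity.
    static final class ReadTimeoutException extends RuntimeException {
        ReadTimeoutException(String message) { super(message); }
    }

    private final CountDownLatch remaining; // reaches zero after blockFor responses
    private final List<String> responses = new ArrayList<>();

    ReadCallbackSketch(int blockFor) {
        remaining = new CountDownLatch(blockFor);
    }

    // Called by the messaging layer once per replica response.
    void response(String message) {
        synchronized (responses) { responses.add(message); }
        remaining.countDown();
    }

    // Blocks until blockFor responses have arrived or the timeout expires.
    List<String> get(long timeoutMillis) {
        boolean done;
        try {
            done = remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (!done) throw new ReadTimeoutException("fewer than blockFor responses received");
        synchronized (responses) { return new ArrayList<>(responses); }
    }
}
```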

Page 39: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.service.StorageProxy.fetchRows()

getLiveSortedEndpoints()
new RowDigestResolver()
new ReadCallback()
MessagingService.sendRR()
---------------------------------------
ReadCallback.get() # blocking
catch (DigestMismatchException ex)
catch (ReadTimeoutException ex)

Page 40: Cassandra Community Webinar: Apache Cassandra Internals

Dynamo Layer

o.a.c.service
o.a.c.net
o.a.c.dht
o.a.c.gms
o.a.c.locator
o.a.c.stream

Page 41: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.net.MessagingService.Verb <<enum>>

MUTATION
READ
REQUEST_RESPONSE
TREE_REQUEST
TREE_RESPONSE

(And more...)

Page 42: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.net.MessagingService.verbHandlers

new EnumMap<Verb, IVerbHandler>(Verb.class)

Page 43: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.net.IVerbHandler<T>

doVerb(MessageIn<T> message, String id);
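
The verb, handler map and handler interface from the last three slides fit together as a simple enum-keyed dispatch table. A minimal sketch, assuming string payloads and a handler that returns a value so the result is easy to inspect; `VerbDispatchSketch` and its `VerbHandler` are hypothetical simplifications of MessagingService and IVerbHandler<T>:

```java
import java.util.EnumMap;
import java.util.Map;

final class VerbDispatchSketch {
    enum Verb { MUTATION, READ, REQUEST_RESPONSE }

    // Hypothetical simplification of IVerbHandler<T>: the payload is a String.
    interface VerbHandler {
        String doVerb(String message, String id);
    }

    // One handler per verb, stored in an EnumMap as on the verbHandlers slide.
    private final Map<Verb, VerbHandler> verbHandlers = new EnumMap<>(Verb.class);

    void registerVerbHandler(Verb verb, VerbHandler handler) {
        if (verbHandlers.containsKey(verb))
            throw new IllegalStateException("handler already registered for " + verb);
        verbHandlers.put(verb, handler);
    }

    // Looks up the handler for the verb and passes the message to it.
    String receive(Verb verb, String message, String id) {
        VerbHandler handler = verbHandlers.get(verb);
        if (handler == null)
            throw new IllegalArgumentException("no handler for " + verb);
        return handler.doVerb(message, id);
    }
}
```

An `EnumMap` is backed by an array indexed by the enum ordinal, so the lookup on every incoming message is cheap.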

Page 44: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.net.MessagingService.verbStages

new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)

Page 45: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.net.MessagingService.receive()

runnable = new MessageDeliveryTask(message, id, timestamp);

stage = StageManager.getStage(message.getMessageType());

stage.execute(runnable);

Page 46: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.net.MessageDeliveryTask.run()

// If droppable and rpc_timeout has elapsed
MessagingService.incrementDroppedMessages(verb);

MessagingService.getVerbHandler(verb)
verbHandler.doVerb(message, id)
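
The drop check can be sketched as a task that compares the time a message has spent queued against the timeout before doing any work. `DeliveryTaskSketch`, the injected clock and the `Runnable` handler are hypothetical; the sketch only illustrates the "drop if droppable and too old" rule.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class DeliveryTaskSketch {
    final AtomicLong droppedMessages = new AtomicLong();
    private final long rpcTimeoutMillis;

    DeliveryTaskSketch(long rpcTimeoutMillis) {
        this.rpcTimeoutMillis = rpcTimeoutMillis;
    }

    // Builds the task that a stage would run later. The clock is injected so
    // the age check happens when the task runs, not when it is queued.
    Runnable task(boolean droppable, long receivedAtMillis, LongSupplier clock, Runnable handler) {
        return () -> {
            long waited = clock.getAsLong() - receivedAtMillis;
            if (droppable && waited > rpcTimeoutMillis) {
                // The sender has already given up, so doing the work is wasted effort.
                droppedMessages.incrementAndGet();
                return;
            }
            handler.run();
        };
    }
}
```

Only droppable messages are shed; in this sketch a non-droppable message is always handled, however long it waited.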

Page 47: Cassandra Community Webinar: Apache Cassandra Internals

Architecture

API Layer
Dynamo Layer
Database Layer

Page 48: Cassandra Community Webinar: Apache Cassandra Internals

Database Layer

o.a.c.concurrent
o.a.c.db
o.a.c.cache
o.a.c.io
o.a.c.trace

Page 49: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.concurrent.StageManager

stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);

getStage(Stage stage)
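
A stage manager of this shape is essentially one thread pool per enum value. A minimal sketch, assuming fixed-size pools and daemon threads named after their stage; `StageManagerSketch` and the `runOn` helper are hypothetical and exist only to show that work lands on the pool for the requested stage.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class StageManagerSketch {
    enum Stage { READ, MUTATION, GOSSIP }

    private final Map<Stage, ExecutorService> stages = new EnumMap<>(Stage.class);

    StageManagerSketch(int threadsPerStage) {
        for (Stage stage : Stage.values()) {
            // Daemon threads named after the stage, so a thread dump shows where work runs.
            stages.put(stage, Executors.newFixedThreadPool(threadsPerStage, r -> {
                Thread t = new Thread(r, stage + "-stage");
                t.setDaemon(true);
                return t;
            }));
        }
    }

    ExecutorService getStage(Stage stage) {
        return stages.get(stage);
    }

    // Submits a task to the stage's pool and returns the name of the thread that ran it.
    String runOn(Stage stage) {
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return getStage(stage).submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        for (ExecutorService executor : stages.values()) executor.shutdown();
    }
}
```

Separate pools keep one kind of work, such as slow reads, from starving another, such as gossip.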

Page 50: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.concurrent.Stage

READ
MUTATION
GOSSIP
REQUEST_RESPONSE
ANTI_ENTROPY

(And more...)

Page 51: Cassandra Community Webinar: Apache Cassandra Internals

Database Layer

o.a.c.concurrent
o.a.c.db
o.a.c.cache
o.a.c.io
o.a.c.trace

Page 52: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.db.Table

// Keyspace
open(String table)
getColumnFamilyStore(String cfName)

getRow(QueryFilter filter)
apply(RowMutation mutation, boolean writeCommitLog)

Page 53: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.db.ColumnFamilyStore

// Column Family
getColumnFamily(QueryFilter filter)
getTopLevelColumns(...)

apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

Page 54: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.db.IColumnContainer

addColumn(IColumn column)
remove(ByteBuffer columnName)

ColumnFamily
SuperColumn

Page 55: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.db.ISortedColumns

addColumn(IColumn column, Allocator allocator)
removeColumn(ByteBuffer name)

ArrayBackedSortedColumns
AtomicSortedColumns
TreeMapBackedSortedColumns
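
The TreeMap-backed variant is the easiest to picture: columns live in a sorted map keyed by column name. The sketch below is a hypothetical simplification, not TreeMapBackedSortedColumns; values are strings, there is no allocator, and names are ordered by `ByteBuffer`'s natural ordering, which is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class SortedColumnsSketch {
    // Column name -> value, kept sorted by ByteBuffer's natural byte-wise ordering.
    private final TreeMap<ByteBuffer, String> columns = new TreeMap<>();

    static ByteBuffer name(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Adding a column with an existing name replaces the old value.
    void addColumn(String name, String value) {
        columns.put(name(name), value);
    }

    void removeColumn(String name) {
        columns.remove(name(name));
    }

    // Values in column-name order, regardless of insertion order.
    List<String> values() {
        return new ArrayList<>(columns.values());
    }
}
```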

Page 56: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.db.Memtable

put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)

Page 57: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.db.ReadCommand

getRow(Table table)

SliceByNamesReadCommand
SliceFromReadCommand

Page 58: Cassandra Community Webinar: Apache Cassandra Internals

o.a.c.db.IDiskAtomFilter

getMemtableColumnIterator(...)
getSSTableColumnIterator(...)

IdentityQueryFilter
NamesQueryFilter
SliceQueryFilter
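
The difference between a names filter and a slice filter can be shown against a sorted row held in a `NavigableMap`. This is a hypothetical sketch with string column names and values; `byNames` and `bySlice` are invented helpers, and the inclusive bounds and `count` limit are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeSet;

final class ColumnFilterSketch {
    // Names filter: fetch specific columns by name, in sorted name order,
    // silently skipping names that are not present in the row.
    static List<String> byNames(NavigableMap<String, String> row, List<String> names) {
        List<String> out = new ArrayList<>();
        for (String name : new TreeSet<>(names)) {
            String value = row.get(name);
            if (value != null) out.add(value);
        }
        return out;
    }

    // Slice filter: fetch at most `count` columns with start <= name <= finish.
    static List<String> bySlice(NavigableMap<String, String> row, String start, String finish, int count) {
        List<String> out = new ArrayList<>();
        for (String value : row.subMap(start, true, finish, true).values()) {
            if (out.size() == count) break;
            out.add(value);
        }
        return out;
    }
}
```

Both helpers rely on the row being sorted by column name: the names filter does point lookups, while the slice filter walks a contiguous range.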

Page 59: Cassandra Community Webinar: Apache Cassandra Internals

Summary

CustomTThreadPoolServer, Message.Dispatcher
CassandraServer, QueryProcessor
ReadCommand
StorageProxy
IResponseResolver
IAsyncCallback
MessagingService
IVerbHandler
Table, ColumnFamilyStore, IDiskAtomFilter

Layers: API, Dynamo, Database

Page 60: Cassandra Community Webinar: Apache Cassandra Internals

Thanks.

Page 61: Cassandra Community Webinar: Apache Cassandra Internals

Aaron Morton
@aaronmorton

Co-Founder & Principal Consultant
www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License