Cassandra Community Webinar: Apache Cassandra Internals

Posted on 26-Jan-2015

DESCRIPTION

Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault-tolerant database. Cluster-wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the log-structured storage engine provides high-performance reads and writes. All of this is implemented in a Java code base that has matured greatly over the past few years. In this webinar Aaron Morton will step through read and write requests, automatic processes and manual maintenance tasks. He will also discuss the general approach to solving each problem and drill down to the code responsible for the implementation.

Speaker: Aaron Morton, Apache Cassandra Committer. Aaron Morton is a freelance developer based in New Zealand and a committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.
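The description mentions a log-structured storage engine. The toy model below makes the idea concrete, but it is not Cassandra's implementation. It only assumes the general shape of the two paths:

- Write path: append to a commit log, update a sorted in-memory memtable, and flush the memtable to an immutable sorted run (standing in for an SSTable) once it grows too large.
- Read path: check the memtable, then the runs from newest to oldest.

`TinyLsmStore` and its `flushThreshold` parameter are hypothetical names, and the commit log here is only an in-memory list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical toy store illustrating a log-structured write and read path.
final class TinyLsmStore {
    private final List<String> commitLog = new ArrayList<>();           // append-only log (in memory here)
    private TreeMap<String, String> memtable = new TreeMap<>();         // sorted, mutable buffer
    private final List<NavigableMap<String, String>> runs = new ArrayList<>(); // immutable runs, oldest first
    private final int flushThreshold;

    TinyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        commitLog.add(key + "=" + value);  // 1. record the write in the log
        memtable.put(key, value);          // 2. apply it to the memtable
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    // Freeze the current memtable into a read-only sorted run and start a new one.
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        runs.add(Collections.unmodifiableNavigableMap(memtable));
        memtable = new TreeMap<>();
    }

    Optional<String> get(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return Optional.of(value);
        }
        for (int i = runs.size() - 1; i >= 0; i--) {  // newest run first
            value = runs.get(i).get(key);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    int runCount() { return runs.size(); }

    int logLength() { return commitLog.size(); }
}

class TinyLsmDemo {
    public static void main(String[] args) {
        TinyLsmStore store = new TinyLsmStore(2);
        store.put("a", "1");
        store.put("b", "2");  // memtable reaches the threshold and is flushed to a run
        store.put("a", "3");  // the newer value for "a" lives in the fresh memtable
        System.out.println(store.get("a").orElse("missing"));  // prints 3
        System.out.println(store.get("b").orElse("missing"));  // prints 2
        System.out.println(store.runCount());                  // prints 1
    }
}
```

Cassandra reconciles values from the memtable and SSTables by per-column timestamp. The newest-first lookup here gives the same answer only because this toy assumes writes arrive in timestamp order.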

TRANSCRIPT

1. CASSANDRA COMMUNITY WEBINARS, AUGUST 2013: CASSANDRA INTERNALS. Aaron Morton, @aaronmorton, Co-Founder & Principal Consultant, www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.
2. About The Last Pickle: Work with clients to deliver and improve Apache Cassandra based solutions. Apache Cassandra Committer, DataStax MVP, Hector Maintainer, 6+ years combined Cassandra experience. Based in New Zealand & Austin, TX.
3. Architecture; Code
4. Cassandra Architecture (diagram): Clients; APIs; Cluster Aware; Cluster Unaware; Disk
5. Cassandra Cluster Architecture (diagram): Clients; Node 1 and Node 2, each with APIs, Cluster Aware, Cluster Unaware and Disk layers
6. Dynamo Cluster Architecture (diagram): Clients; Node 1 and Node 2, each with APIs, Dynamo, Database and Disk layers
7. Architecture: API, Dynamo, Database
8. API Transports: Thrift, Native Binary
9. Thrift Transport: custom TServer implementations o.a.c.thrift.CustomTThreadPoolServer, o.a.c.thrift.CustomTNonBlockingServer, o.a.c.thrift.CustomTHsHaServer
10. API Transports: Thrift, Native Binary
11. Native Binary Transport: beta in Cassandra 1.2; uses Netty; enabled with start_native_transport (disabled by default)
12. o.a.c.transport.Server.run(): sets up the Netty server with new ExecutionHandler(), new NioServerSocketChannelFactory(), ServerBootstrap.setPipelineFactory()
13. o.a.c.transport.Message.Dispatcher.messageReceived(): processes a message from the client with ServerConnection.validateNewMessage(), Request.execute(), ServerConnection.applyStateTransition(), Channel.write()
14. Messages: defined in the Native Binary Protocol, $SRC/doc/native_protocol.spec
15. API Services: JMX, Thrift, CQL 3
16. JMX Management Beans: spread around the code base; interfaces named *MBean
17. JMX Management Beans: registered with names such as org.apache.cassandra.db:type=StorageProxy
18. API Services: JMX, Thrift, CQL 3
19. o.a.c.thrift.CassandraServer: implements the Thrift interface; access control; input validation; mapping to/from Thrift and internal types
20. Thrift Interface: Thrift IDL, $SRC/interface/cassandra.thrift
21. o.a.c.thrift.CassandraServer.get_slice(): gets columns for one row; Tracing.begin(), ClientState cState = state(), cState.hasColumnFamilyAccess(), multigetSliceInternal()
22. CassandraServer.multigetSliceInternal(): gets columns for many rows; ThriftValidation.validate*(); creates ReadCommands; getSlice()
23. CassandraServer.getSlice(): processes ReadCommands and returns Thrift types; readColumnFamily(), thriftifyColumnFamily()
24. CassandraServer.readColumnFamily(): processes ReadCommands and returns ColumnFamilies; StorageProxy.read()
25. API Services: JMX, Thrift, CQL 3
26. o.a.c.cql3.QueryProcessor: prepares and executes CQL3 statements; used by the Thrift & Native transports; access control; input validation; returns transport.ResultMessage
27. CQL3 Grammar: ANTLR grammar, $SRC/o.a.c.cql3/Cql.g
28. o.a.c.cql3.statements.ParsedStatement: subclasses generated by ANTLR; tracks bound term count; prepare: CQLStatement prepare()
29. o.a.c.cql3.statements.CQLStatement: checkAccess(ClientState state), validate(ClientState state), execute(ConsistencyLevel cl, QueryState state, List variables)
30. statements.SelectStatement.RawStatement: implements ParsedStatement; input validation; prepare()
31. statements.SelectStatement.execute(): creates ReadCommands; StorageProxy.read()
32. Architecture: API, Dynamo, Database
33. Dynamo Layer: o.a.c.service, o.a.c.net, o.a.c.dht, o.a.c.gms, o.a.c.locator, o.a.c.stream
34. o.a.c.service.StorageProxy: cluster-wide storage operations; select endpoints & check CL available; send messages to Stages; wait for response; store Hints
35. o.a.c.service.StorageService: ring operations; track ring state; start & stop ring membership; node & token queries
36. o.a.c.service.IResponseResolver: preprocess(MessageIn message), resolve() throws DigestMismatchException; implementations RowDigestResolver, RowDataResolver, RangeSliceResponseResolver
37. Response Handlers / Callback: implements IAsyncCallback; response(MessageIn msg)
38. o.a.c.service.ReadCallback.get(): waits for blockfor & data; condition.await(timeout, TimeUnit.MILLISECONDS); throw ReadTimeoutException(); resolver.resolve()
39. o.a.c.service.StorageProxy.fetchRows(): getLiveSortedEndpoints(), new RowDigestResolver(), new ReadCallback(), MessagingService.sendRR(); then ReadCallback.get() (blocking); catch (DigestMismatchException ex); catch (ReadTimeoutException ex)
40. Dynamo Layer: o.a.c.service, o.a.c.net, o.a.c.dht, o.a.c.gms, o.a.c.locator, o.a.c.stream
41. o.a.c.net.MessagingService.Verb: MUTATION, READ, REQUEST_RESPONSE, TREE_REQUEST, TREE_RESPONSE (and more...)
42. o.a.c.net.MessagingService.verbHandlers: new EnumMap(Verb.class)
43. o.a.c.net.IVerbHandler: doVerb(MessageIn message, String id);
44. o.a.c.net.MessagingService.verbStages: new EnumMap(MessagingService.Verb.class)
45. o.a.c.net.MessagingService.receive(): runnable = new MessageDeliveryTask(message, id, timestamp); StageManager.getStage(message.getMessageType()); stage.execute(runnable);
46. o.a.c.net.MessageDeliveryTask.run(): if the verb is droppable and rpc_timeout has passed, MessagingService.incrementDroppedMessages(verb); otherwise MessagingService.getVerbHandler(verb), verbHandler.doVerb(message, id)
47. Architecture: API Layer, Dynamo Layer, Database Layer
48. Database Layer: o.a.c.concurrent, o.a.c.db, o.a.c.cache, o.a.c.io, o.a.c.trace
49. o.a.c.concurrent.StageManager: stages = new EnumMap(Stage.class); getStage(Stage stage)
50. o.a.c.concurrent.Stage: READ, MUTATION, GOSSIP, REQUEST_RESPONSE, ANTI_ENTROPY (and more...)
51. Database Layer: o.a.c.concurrent, o.a.c.db, o.a.c.cache, o.a.c.io, o.a.c.trace
52. o.a.c.db.Table (Keyspace): open(String table), getColumnFamilyStore(String cfName), getRow(QueryFilter filter), apply(RowMutation mutation, boolean writeCommitLog)
53. o.a.c.db.ColumnFamilyStore (Column Family): getColumnFamily(QueryFilter filter), getTopLevelColumns(...), apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
54. o.a.c.db.IColumnContainer: addColumn(IColumn column), remove(ByteBuffer columnName); implementations ColumnFamily, SuperColumn
55. o.a.c.db.ISortedColumns: addColumn(IColumn column, Allocator allocator), removeColumn(ByteBuffer name); implementations ArrayBackedSortedColumns, AtomicSortedColumns, TreeMapBackedSortedColumns
56. o.a.c.db.Memtable: put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer), flushAndSignal(CountDownLatch latch, Future context)
57. o.a.c.db.ReadCommand: getRow(Table table); subclasses SliceByNamesReadCommand, SliceFromReadCommand
58. o.a.c.db.IDiskAtomFilter: getMemtableColumnIterator(...), getSSTableColumnIterator(...); implementations IdentityQueryFilter, NamesQueryFilter, SliceQueryFilter
59. Summary (API, Dynamo and Database layers): CustomTThreadPoolServer, Message.Dispatcher, CassandraServer, QueryProcessor, ReadCommand, StorageProxy, IResponseResolver, IAsyncCallback, MessagingService, IVerbHandler, Table, ColumnFamilyStore, IDiskAtomFilter
60. Thanks.
61. Aaron Morton, @aaronmorton, Co-Founder & Principal Consultant
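Slides 34, 35 and 39 mention selecting endpoints and tracking ring state. As a rough illustration of the token-ring idea behind the Dynamo layer, this Java sketch maps a key's token to replicas. It walks the ring clockwise from the first node whose token is greater than or equal to the key's token and takes the next `replicationFactor` distinct nodes. That is the placement idea of a simple, topology-unaware strategy.

Several things are assumptions. `TokenRing` and `naturalEndpoints` are hypothetical names. Each node owns a single `long` token. Liveness checks and proximity sorting (the "live sorted" part of `getLiveSortedEndpoints()`) are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical ring: each node owns one token; the map is sorted by token.
final class TokenRing {
    private final TreeMap<Long, String> tokenToEndpoint = new TreeMap<>();

    void addNode(long token, String endpoint) {
        tokenToEndpoint.put(token, endpoint);
    }

    // Walk clockwise from the first token >= keyToken, wrapping past the
    // highest token back to the lowest, collecting distinct endpoints.
    List<String> naturalEndpoints(long keyToken, int replicationFactor) {
        List<String> walk = new ArrayList<>(tokenToEndpoint.tailMap(keyToken, true).values());
        walk.addAll(tokenToEndpoint.headMap(keyToken, false).values());  // the wrap-around part
        List<String> replicas = new ArrayList<>();
        for (String endpoint : walk) {
            if (replicas.size() == replicationFactor) {
                break;
            }
            if (!replicas.contains(endpoint)) {
                replicas.add(endpoint);
            }
        }
        return replicas;
    }
}

class TokenRingDemo {
    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode(0, "node1");
        ring.addNode(100, "node2");
        ring.addNode(200, "node3");
        System.out.println(ring.naturalEndpoints(50, 2));   // prints [node2, node3]
        System.out.println(ring.naturalEndpoints(250, 2));  // prints [node1, node2]
    }
}
```

A key whose token falls past the highest node token wraps around to the lowest one. That wrap-around is why the ring is drawn as a circle.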
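Slides 34 and 38 describe the coordinator checking that the consistency level (CL) can be met. It then blocks in `ReadCallback.get()` until `blockfor` responses arrive or the timeout expires. The Java sketch below models that wait with a `CountDownLatch`. It assumes the usual `blockFor` rules:

- ONE needs 1 replica.
- QUORUM needs RF / 2 + 1, using integer division.
- ALL needs RF.

`SimpleReadCallback`, the `ConsistencyLevel` enum and the `ReadTimeoutException` defined here are hypothetical stand-ins, not Cassandra's classes. The sketch also ignores the separate requirement that one response carry full data rather than a digest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified consistency levels; blockFor() is how many replicas must answer.
enum ConsistencyLevel {
    ONE, QUORUM, ALL;

    int blockFor(int replicationFactor) {
        switch (this) {
            case ONE: return 1;
            case QUORUM: return replicationFactor / 2 + 1;
            default: return replicationFactor;  // ALL
        }
    }
}

// Hypothetical exception standing in for Cassandra's read timeout.
class ReadTimeoutException extends Exception {
    ReadTimeoutException(int received, int blockFor) {
        super("timed out with " + received + " of " + blockFor + " responses");
    }
}

// Replicas call response(); the coordinator thread blocks in get().
final class SimpleReadCallback {
    private final int blockFor;
    private final CountDownLatch condition;
    private final List<String> responses = new CopyOnWriteArrayList<>();

    SimpleReadCallback(ConsistencyLevel cl, int replicationFactor) {
        this.blockFor = cl.blockFor(replicationFactor);
        this.condition = new CountDownLatch(blockFor);
    }

    // Invoked once per replica reply, possibly from another thread.
    void response(String value) {
        responses.add(value);
        condition.countDown();
    }

    // Wait up to timeoutMillis for blockFor replies, then return a snapshot of them.
    List<String> get(long timeoutMillis) throws ReadTimeoutException, InterruptedException {
        if (!condition.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new ReadTimeoutException(responses.size(), blockFor);
        }
        return new ArrayList<>(responses);
    }
}

class ReadCallbackDemo {
    public static void main(String[] args) throws Exception {
        SimpleReadCallback callback = new SimpleReadCallback(ConsistencyLevel.QUORUM, 3);
        // Simulate two replicas replying from messaging threads.
        new Thread(() -> callback.response("row@replica1")).start();
        new Thread(() -> callback.response("row@replica2")).start();
        System.out.println(callback.get(1_000).size());  // prints 2
    }
}
```

With RF = 3 and QUORUM, `get()` returns as soon as two replicas have answered. With ALL it would keep waiting for the third reply and time out if that reply never came.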
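Slides 36 and 39 show `RowDigestResolver` and `fetchRows()` catching a `DigestMismatchException`. The idea is that one replica returns the full row and the others return only a hash of it. If any hash differs, the coordinator falls back to full data reads.

This Java sketch assumes rows are serialized to strings and hashed with MD5 through `java.security.MessageDigest`. `SimpleDigestResolver` and the exception class defined here are hypothetical, and the fallback read is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical exception signalling that replicas returned different row versions.
class DigestMismatchException extends Exception {
    DigestMismatchException(String message) {
        super(message);
    }
}

final class SimpleDigestResolver {
    // Hash a serialized row; every Java platform is required to provide MD5.
    static byte[] digest(String serializedRow) {
        try {
            return MessageDigest.getInstance("MD5").digest(serializedRow.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // One full data response plus digest-only responses from the other replicas.
    static String resolve(String dataResponse, List<byte[]> digestResponses) throws DigestMismatchException {
        byte[] expected = digest(dataResponse);
        for (byte[] replicaDigest : digestResponses) {
            if (!MessageDigest.isEqual(expected, replicaDigest)) {
                throw new DigestMismatchException("replica digests differ");
            }
        }
        return dataResponse;  // all replicas agree, so the data response is the answer
    }
}

class DigestDemo {
    public static void main(String[] args) {
        byte[] current = SimpleDigestResolver.digest("k1:v2@ts2");
        byte[] stale = SimpleDigestResolver.digest("k1:v1@ts1");
        try {
            SimpleDigestResolver.resolve("k1:v2@ts2", List.of(current, stale));
        } catch (DigestMismatchException e) {
            System.out.println("mismatch: fall back to full data reads");
        }
    }
}
```

Sending digests instead of full rows keeps the common case, where every replica agrees, cheap on the network.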
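Slides 41 to 46 and 49 to 50 describe how an incoming message is routed:

- `MessagingService` keeps `EnumMap`s from each verb to its handler and to its stage.
- `receive()` wraps the message in a delivery task and executes it on that stage.
- The task drops droppable messages that have outlived the RPC timeout.

The Java sketch below mirrors that shape with hypothetical, simplified types. The three-value `Verb` and `Stage` enums, the `VerbHandler` interface, `MiniMessagingService`, the choice of droppable verbs and the injected clock are all assumptions for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Simplified stand-ins for MessagingService.Verb and o.a.c.concurrent.Stage.
enum Verb { MUTATION, READ, REQUEST_RESPONSE }
enum Stage { MUTATION, READ, REQUEST_RESPONSE }

// Hypothetical handler interface loosely modelled on IVerbHandler.doVerb().
interface VerbHandler { void doVerb(Verb verb, String payload, String id); }

final class MiniMessagingService {
    // Assumption: these verbs may be dropped once older than the timeout.
    private static final Set<Verb> DROPPABLE_VERBS = EnumSet.of(Verb.MUTATION, Verb.READ);

    private final Map<Verb, VerbHandler> verbHandlers = new EnumMap<>(Verb.class);
    private final Map<Verb, Stage> verbStages = new EnumMap<>(Verb.class);
    private final Map<Stage, ExecutorService> stages = new EnumMap<>(Stage.class);
    private final AtomicInteger droppedMessages = new AtomicInteger();
    private final LongSupplier clock;
    private final long rpcTimeoutMillis;

    MiniMessagingService(LongSupplier clock, long rpcTimeoutMillis) {
        this.clock = clock;
        this.rpcTimeoutMillis = rpcTimeoutMillis;
        for (Stage stage : Stage.values()) {
            stages.put(stage, Executors.newSingleThreadExecutor());  // one worker per stage here
        }
        verbStages.put(Verb.MUTATION, Stage.MUTATION);
        verbStages.put(Verb.READ, Stage.READ);
        verbStages.put(Verb.REQUEST_RESPONSE, Stage.REQUEST_RESPONSE);
    }

    void registerVerbHandler(Verb verb, VerbHandler handler) {
        verbHandlers.put(verb, handler);
    }

    // Build a delivery task and execute it on the verb's stage.
    void receive(Verb verb, String payload, String id, long timestamp) {
        Runnable deliveryTask = () -> {
            boolean expired = clock.getAsLong() - timestamp > rpcTimeoutMillis;
            if (DROPPABLE_VERBS.contains(verb) && expired) {
                droppedMessages.incrementAndGet();  // stale: count it and skip the handler
                return;
            }
            verbHandlers.get(verb).doVerb(verb, payload, id);
        };
        stages.get(verbStages.get(verb)).execute(deliveryTask);
    }

    int droppedMessages() { return droppedMessages.get(); }

    // Stop every stage and wait for already-queued tasks to finish.
    void shutdown() {
        for (ExecutorService stage : stages.values()) {
            stage.shutdown();
            try {
                stage.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}

class MessagingDemo {
    public static void main(String[] args) {
        MiniMessagingService ms = new MiniMessagingService(System::currentTimeMillis, 10_000);
        ms.registerVerbHandler(Verb.READ, (verb, payload, id) -> System.out.println(verb + " " + id + " " + payload));
        ms.receive(Verb.READ, "row1", "42", System.currentTimeMillis());  // fresh, so the handler runs
        ms.shutdown();
    }
}
```

An `EnumMap` suits these lookups because it is backed by an array indexed by the enum's ordinal. The single-threaded executors here only stand in for each stage's thread pool.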
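Slides 55, 57 and 58 distinguish two ways of reading columns:

- By name: `SliceByNamesReadCommand`, `NamesQueryFilter`.
- As a contiguous range: `SliceFromReadCommand`, `SliceQueryFilter`.

Treating one row as a sorted map from column name to value, a Java sketch of the two filter styles might look like the block below. The `ColumnFilter`, `NamesFilter` and `SliceFilter` types are hypothetical. In this sketch, an empty `start` or `finish` means unbounded, a reversed slice runs from the larger column name down to the smaller, and `count` caps the number of columns returned.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical filter interface: select columns from one row's sorted columns.
interface ColumnFilter {
    Map<String, String> select(NavigableMap<String, String> row);
}

// By-name style: return only the requested column names, in column order.
final class NamesFilter implements ColumnFilter {
    private final Set<String> names;

    NamesFilter(Set<String> names) { this.names = names; }

    public Map<String, String> select(NavigableMap<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (names.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}

// Slice style: a contiguous range of columns, optionally reversed, capped at count.
// An inverted range makes TreeMap throw IllegalArgumentException in this sketch.
final class SliceFilter implements ColumnFilter {
    private final String start, finish;
    private final boolean reversed;
    private final int count;

    SliceFilter(String start, String finish, boolean reversed, int count) {
        this.start = start;
        this.finish = finish;
        this.reversed = reversed;
        this.count = count;
    }

    public Map<String, String> select(NavigableMap<String, String> row) {
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (!start.isEmpty()) view = view.tailMap(start, true);    // skip columns before start
        if (!finish.isEmpty()) view = view.headMap(finish, true);  // stop after finish
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : view.entrySet()) {
            if (out.size() >= count) break;
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}

class FilterDemo {
    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>(Map.of("a", "1", "b", "2", "c", "3", "d", "4"));
        System.out.println(new NamesFilter(Set.of("b", "d")).select(row));     // prints {b=2, d=4}
        System.out.println(new SliceFilter("b", "", false, 2).select(row));   // prints {b=2, c=3}
        System.out.println(new SliceFilter("c", "a", true, 10).select(row));  // prints {c=3, b=2, a=1}
    }
}
```

Because columns are kept sorted within a row, a slice is a single range scan. A names query, by contrast, checks specific columns and can skip everything else.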