Successful Software Development with Apache Cassandra

CASSANDRA-SF 2014 SUCCESSFUL SOFTWARE DEVELOPMENT WITH CASSANDRA Nate McCall @zznate #CassandraSummit Co-Founder & Sr. Technical Consultant Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Upload: zznate

Post on 29-Nov-2014


DESCRIPTION

Adding a new technology to your development process can be challenging, and the distributed nature of Apache Cassandra can make it daunting. However, recent improvements in drivers, utilities and tooling have simplified the process, making it easier than ever before to develop software with Apache Cassandra. In this presentation, we cover essential knowledge for all developers wanting to efficiently create reliable Apache Cassandra-based solutions.

Topics include:
- Language and Driver selection
- Optimizing Driver configuration
- Productive Developer environments using ccm, Vagrant and DataStax DevCenter
- Creating appropriate test data
- Unit testing
- Automated integration testing
- Test optimization with profiles

New and existing users will come away from this presentation with the necessary knowledge to make their next Apache Cassandra project a success.

TRANSCRIPT

Page 1: Successful Software Development with Apache Cassandra

CASSANDRA-SF 2014

SUCCESSFUL SOFTWARE DEVELOPMENT WITH CASSANDRA

Nate McCall

@zznate #CassandraSummit

Co-Founder & Sr. Technical Consultant

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Successful Software Development with Apache Cassandra

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra-based solutions.

Based in New Zealand & USA.

Page 3: Successful Software Development with Apache Cassandra

OVERVIEW

Page 4: Successful Software Development with Apache Cassandra

Overview:

What makes a software development project successful?

Page 5: Successful Software Development with Apache Cassandra

Overview: Successful Software Development

- it ships
- maintainable
- good test coverage
- check out and build

Page 6: Successful Software Development with Apache Cassandra

Overview:

Impedance mismatch: distributed systems development on a laptop.

Page 7: Successful Software Development with Apache Cassandra

GETTING STARTED: FOLLOW THE PATH OF LEAST RESISTANCE

Page 8: Successful Software Development with Apache Cassandra

Getting Started:

JVM-Based if at all Possible.

Page 9: Successful Software Development with Apache Cassandra

Getting Started:

Python Otherwise.

https://github.com/datastax/python-driver

Page 10: Successful Software Development with Apache Cassandra

Getting Started:

C#?

https://github.com/datastax/csharp-driver

Page 11: Successful Software Development with Apache Cassandra

Getting Started:

Ruby?

https://github.com/datastax/ruby-driver

Page 12: Successful Software Development with Apache Cassandra

Getting Started:

ORM? maybe - only if it’s very simple

more later…

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html

Page 13: Successful Software Development with Apache Cassandra

DATA MODELING

Page 14: Successful Software Development with Apache Cassandra

Data Modeling:

… a topic unto itself. But quickly:

Page 15: Successful Software Development with Apache Cassandra

Data Modeling - Quickly

• It’s Hard
• Do research
• #1 performance problem
• Tip: don’t “port” your schema

Page 16: Successful Software Development with Apache Cassandra

DEVELOPER PRODUCTIVITY

Page 17: Successful Software Development with Apache Cassandra

Productivity:

use CQL

Page 18: Successful Software Development with Apache Cassandra

Productivity - Using CQL:

• tools support
• easy tracing (and trace discovery)
• documentation*

*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile

Page 19: Successful Software Development with Apache Cassandra

Productivity:

Use the Java Driver

Page 20: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

• Reference implementation
• Well written, extensive coverage
• open source

https://github.com/datastax/java-driver/

Page 21: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Existing Spring Users: Spring Data Integration

http://projects.spring.io/spring-data-cassandra/

Page 22: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Guice Users: “GuicyFig:”

Archaius + Guice

https://stash.safehaus.org/projects/GFIG/repos/main/browse

Page 23: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Configuration is Similar to Other DB Drivers (with caveats**)

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html

Page 24: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Major Difference: it’s a Cluster!

Page 25: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Two groups of configurations:

• policies
• connections

Page 26: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Three Policy Types:
• load balancing
• reconnection
• retry

Page 27: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Configuration:

Connection Options:
• protocol*
• pooling
• socket

*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec

Page 28: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

Embrace Asynchronicity (but use RxJava)

https://github.com/ReactiveX/RxJava

Page 29: Successful Software Development with Apache Cassandra

Productivity - Java Driver:

A note about User Defined Types (UDTs)

Page 30: Successful Software Development with Apache Cassandra

Productivity - Java Driver - Using UDTs:

Wait.
- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path

* https://issues.apache.org/jira/browse/CASSANDRA-7423

Page 31: Successful Software Development with Apache Cassandra

Productivity:

Tools: DataStax DevCenter

http://www.datastax.com/what-we-offer/products-services/devcenter

Page 32: Successful Software Development with Apache Cassandra

Productivity:

Metrics API for your own code

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/
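As a rough illustration of instrumenting your own code, here is a minimal sketch of a latency timer. It is a stand-in written with the Python standard library only, not the Dropwizard Metrics API linked above; the `Timer` class, its `timed` method and the `snapshot` fields are all hypothetical names chosen for this example.

```python
import time
from contextlib import contextmanager


class Timer:
    """Hypothetical stand-in for a metrics-library timer."""

    def __init__(self):
        self.durations = []  # seconds, one entry per timed block

    @contextmanager
    def timed(self):
        # Record wall-clock duration of the wrapped block, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations.append(time.perf_counter() - start)

    def snapshot(self):
        # Summarize what has been recorded so far.
        count = len(self.durations)
        mean = sum(self.durations) / count if count else 0.0
        return {"count": count, "mean": mean,
                "max": max(self.durations, default=0.0)}


read_timer = Timer()
with read_timer.timed():
    sum(range(100_000))  # placeholder for a data-access call
print(read_timer.snapshot())
```

Wrapping each data-access call this way gives per-operation counts and latencies that can be shipped to whatever collector is in use.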

Page 33: Successful Software Development with Apache Cassandra

Productivity - Instrumentation via Metrics API:

Run Riemann locally

http://riemann.io/

Page 34: Successful Software Development with Apache Cassandra
Page 35: Successful Software Development with Apache Cassandra

Productivity:

Trace Frequently

Page 36: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Trace per query via cqlsh

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

Page 37: Successful Software Development with Apache Cassandra

cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = myapp
   ... AND document_id = foo
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856

Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b

Page 38: Successful Software Development with Apache Cassandra

Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…

Page 40: Successful Software Development with Apache Cassandra

…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592

!!?!
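The jump in the trace above (from 38605 to 9282428 microseconds on the replica) is easy to miss by eye. A minimal sketch for spotting it, assuming the trace rows have been copied out of cqlsh as pipe-separated text in the order activity, timestamp, source, elapsed microseconds; `slowest_step` is a hypothetical helper, not part of cqlsh or any driver.

```python
# Four rows taken from the trace shown on the slides.
TRACE = """\
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
"""


def slowest_step(trace_text):
    """Return (gap_in_microseconds, source, activity) for the largest gap
    between consecutive events recorded on the same source node."""
    last_elapsed = {}          # source address -> previous elapsed value
    worst = (0, None, None)
    for line in trace_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4:
            continue           # skip blank or malformed rows
        activity, _timestamp, source, elapsed = parts
        elapsed = int(elapsed)
        if source in last_elapsed:
            gap = elapsed - last_elapsed[source]
            if gap > worst[0]:
                worst = (gap, source, activity)
        last_elapsed[source] = elapsed
    return worst


gap, source, activity = slowest_step(TRACE)
print(f"{gap / 1e6:.2f}s on {source}: {activity}")
# prints "9.24s on 192.168.1.204: Read 1 live and 2667 tombstoned cells"
```

The gap is attributed to the event that ends it, so here the nine seconds were spent before the replica logged that it had read 1 live and 2667 tombstoned cells.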

Page 43: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Enable traces in the driver

http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html

Page 44: Successful Software Development with Apache Cassandra

Productivity - Tracing:

`nodetool settraceprobability`

Page 45: Successful Software Development with Apache Cassandra

Productivity - Tracing:

…then make sure you try it again with a node down!

Page 46: Successful Software Development with Apache Cassandra

Productivity - Tracing:

Final note on tracing: do it sparingly

Page 47: Successful Software Development with Apache Cassandra

Productivity:

Logging Verbosity can be changed dynamically**

** since 0.4rc1

http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html

Page 48: Successful Software Development with Apache Cassandra

Productivity:

nodetool for developers
• cfstats
• cfhistograms
• proxyhistograms

Page 49: Successful Software Development with Apache Cassandra

Productivity - nodetool - cfstats:

cfstats: per-table statistics about size and performance (single most useful command)

Page 50: Successful Software Development with Apache Cassandra

Productivity - nodetool - cfhistograms:

cfhistograms: column count and partition size vs. latency distribution

Page 51: Successful Software Development with Apache Cassandra

Productivity - nodetool - proxyhistograms:

proxyhistograms: performance of inter-node requests, as seen by the coordinator

Page 52: Successful Software Development with Apache Cassandra

Productivity:

Running Cassandra during development

Page 53: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

Local Cassandra
• easy to setup
• you control it
• but then you control it!

Page 54: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

CCM
• supports multiple versions
• clusters and datacenters
• up/down individual nodes

https://github.com/pcmanus/ccm

Page 55: Successful Software Development with Apache Cassandra

Productivity - Running Cassandra:

Vagrant
• isolated, controlled environment
• configuration mgmt integration
• same CM for production!

http://www.vagrantup.com/

Page 56: Successful Software Development with Apache Cassandra

server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end
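The `initial_token` expression in the Vagrantfile spreads the nodes evenly over the signed 64-bit token range. As a worked example, here is the same arithmetic in Python for three nodes; the function name `initial_tokens` is made up for this sketch, and it assumes the cluster uses a partitioner whose tokens span -2**63 to 2**63 - 1 (as Murmur3Partitioner does).

```python
def initial_tokens(server_count):
    """Evenly spaced tokens over the signed 64-bit range, mirroring
    (2**64 / server_count * i) - 2**63 from the Vagrantfile."""
    step = 2**64 // server_count   # integer division, like Ruby's / on integers
    return [step * i - 2**63 for i in range(server_count)]


for node, token in enumerate(initial_tokens(3), start=1):
    print(f"node{node}: {token}")
# node1: -9223372036854775808
# node2: -3074457345618258603
# node3: 3074457345618258602
```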

Page 59: Successful Software Development with Apache Cassandra

chef.json = {
  :cassandra => {'cluster_name' => 'VerifyCluster',
                 'version' => '2.0.8',
                 'setup_jna' => false,
                 'max_heap_size' => '512M',
                 'heap_new_size' => '100M',
                 'initial_token' => server['initial_token'],
                 'seeds' => "192.168.2.10",
                 'listen_address' => server['ip'],
                 'broadcast_address' => server['ip'],
                 'rpc_address' => server['ip'],
                 'concurrent_reads' => "2",
                 'concurrent_writes' => "2",
                 'memtable_flush_queue_size' => "2",
                 'compaction_throughput_mb_per_sec' => "8",
                 'key_cache_size_in_mb' => "4",
                 'key_cache_save_period' => "0",
                 'native_transport_min_threads' => "2",
                 'native_transport_max_threads' => "4"
  },
}

Page 60: Successful Software Development with Apache Cassandra

ENCAPSULATE ENVIRONMENTS

Page 61: Successful Software Development with Apache Cassandra

Environments:

Configuration Management is Essential

Page 62: Successful Software Development with Apache Cassandra

Environments:

Laptop to Production with NO Manual Modifications!

Page 63: Successful Software Development with Apache Cassandra

TESTING

Page 64: Successful Software Development with Apache Cassandra

Testing:

Use a Naming Scheme

• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
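One detail of this scheme is that every `*PITest.java` name also ends in `ITest.java`, so any tooling that sorts tests by suffix has to check the more specific suffix first. A minimal sketch of such a classifier follows; the `classify` function, the category labels and the file names are hypothetical, chosen only to illustrate the scheme.

```python
def classify(filename):
    """Map a test file name to a suite category using the suffix scheme."""
    # Most specific suffix first: "FooPITest.java" also ends with "ITest.java".
    if filename.endswith("PITest.java"):
        return "parallel-integration"
    if filename.endswith("ITest.java"):
        return "integration"
    if filename.endswith("UnitTest.java"):
        return "unit"
    return None  # not part of the naming scheme


for name in ("UserDaoUnitTest.java", "UserDaoITest.java", "UserDaoPITest.java"):
    print(name, "->", classify(name))
```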

Page 65: Successful Software Development with Apache Cassandra

Testing:

Tip: wildcards on the CLI are not a naming scheme.

Page 66: Successful Software Development with Apache Cassandra

Testing:

Group tests into logical units (“suites”)

Page 67: Successful Software Development with Apache Cassandra

Testing - Suites:

Benefits of Suites:
• share test data
• share Cassandra instance(s)
• build profiles

Page 68: Successful Software Development with Apache Cassandra

<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

Page 70: Successful Software Development with Apache Cassandra

Testing - Suites:

Using annotations for suites in code

Page 71: Successful Software Development with Apache Cassandra
Page 72: Successful Software Development with Apache Cassandra

Testing:

Use Mocks where possible
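A minimal sketch of what this can look like, using Python's standard `unittest.mock` so that no Cassandra instance is needed. The `UserDao` class, the `users` table and the dict-shaped rows are hypothetical; the only assumption about the driver is that the session object exposes an `execute(query, params)` method.

```python
from unittest.mock import Mock


class UserDao:
    """Hypothetical data-access class; `session` is anything with an
    execute(query, params) method, such as a driver session."""

    def __init__(self, session):
        self._session = session

    def email_for(self, user_id):
        rows = self._session.execute(
            "SELECT email FROM users WHERE user_id = %s", (user_id,))
        for row in rows:
            return row["email"]   # first matching row wins
        return None               # no such user


# Unit test body: the mocked session returns canned rows.
session = Mock()
session.execute.return_value = [{"email": "a@example.com"}]
dao = UserDao(session)

assert dao.email_for(42) == "a@example.com"
session.execute.assert_called_once_with(
    "SELECT email FROM users WHERE user_id = %s", (42,))
```

A test like this fits the `*UnitTest` category: it checks the query and the result handling without touching external resources.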

Page 73: Successful Software Development with Apache Cassandra

Testing:

Unit Integration Testing

Page 74: Successful Software Development with Apache Cassandra

Testing:

Verify Assumptions: test failure scenarios explicitly

Page 75: Successful Software Development with Apache Cassandra

Testing - Integration:

Runtime Integrations:
• local
• in-process
• forked-process

Page 76: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

EmbeddedCassandra

Page 77: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

ProcessBuilder to fork Cassandra(s)

Page 78: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

CCMBridge: delegate to CCM

https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java
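The idea of delegating to CCM can be sketched as a thin wrapper that builds `ccm` command lines and hands them to a subprocess. This is an illustration only, not how CCMBridge itself is written; the helper names are hypothetical, the cluster name `test` is arbitrary, and the version `2.0.8` is taken from the Chef configuration earlier in the deck. By default the sketch only prints the commands, so it runs without ccm installed.

```python
import subprocess


def ccm_command(*args):
    """Build an argv list for the ccm command-line tool."""
    return ["ccm", *args]


def create_cluster_commands(name, version, nodes):
    """Commands to create and start a local multi-node cluster."""
    return [
        ccm_command("create", name, "-v", version, "-n", str(nodes)),
        ccm_command("start"),
    ]


def stop_node_command(node):
    """Command to take one node down, e.g. to test failure handling."""
    return ccm_command(node, "stop")


def run_all(commands, dry_run=True):
    for argv in commands:
        if dry_run:
            print(" ".join(argv))            # show what would be executed
        else:
            subprocess.run(argv, check=True)  # raise if ccm reports an error


run_all(create_cluster_commands("test", "2.0.8", 3))
run_all([stop_node_command("node1")])
```

Because the wrapper is plain process invocation, the same calls can be made from a test suite setup step or from a CI job.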

Page 79: Successful Software Development with Apache Cassandra

Testing - Integration - Runtime:

Vagrant: delegate to vagrant cli

Page 80: Successful Software Development with Apache Cassandra

Testing - Integration:

Best Practice: Jenkins should be able to manage your cluster

Page 81: Successful Software Development with Apache Cassandra

Testing - Integration - Best Practices:

Vagrant vs. CCMBridge?

• choice of style, really
• developer integration with CM
• what else is in the architecture?

Page 82: Successful Software Development with Apache Cassandra

Testing:

Load Testing Goals
• reproducible metrics
• catch regressions
• test to breakage point

Page 83: Successful Software Development with Apache Cassandra

Testing - Load Testing:

Stress.java (lots of changes recently)

Page 84: Successful Software Development with Apache Cassandra

Testing - Load Testing:

CassandraJMeter

https://github.com/Netflix/CassJMeter

Page 85: Successful Software Development with Apache Cassandra

Testing - Load Testing:

Workload recording and playback coming soon

https://issues.apache.org/jira/browse/CASSANDRA-6572

Page 86: Successful Software Development with Apache Cassandra

Testing:

Primary testing goal: Don’t let cluster behavior surprise you.

Page 87: Successful Software Development with Apache Cassandra

Summary:
• Go slowly with bite sized chunks
• Segment your tests and use build profiles
• Monitor and Instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures

Page 88: Successful Software Development with Apache Cassandra

Thanks.

Page 89: Successful Software Development with Apache Cassandra

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant www.thelastpickle.com

#CassandraSummit