Presentation given on 9/19/2012 to the Cassandra Boston Users Group


  Cassandra on Openstack and AWS
Thomas Vachon, September 2012
Presenting to the Boston Cassandra Users Group

2. Agenda
  - Current Openstack Implementation
  - Running Cassandra on Openstack
  - Lessons Learned about Cassandra on AWS
  - Connecting Openstack and AWS
  - Connecting Cassandra on Openstack and AWS
  - Questions
  - Show & Tell

3. Introduction: Current Cassandra Implementation (AWS)
  - 9 Cassandra nodes (3 per AZ)
  - Cassandra 1.0.10
  - AWS m1.xlarge
  - 4-drive RAID-0 array
  - Ec2Snitch
  - RF = 3
  - Network topology aware
  Statistics
  - Peak traffic: 724 reads/s with 1,308 writes/s across the cluster
  - 3.5 ms average read latency
  - 1.7 ms average write latency

4. Running Cassandra on Openstack
  - Ec2Snitch doesn't work (it looks to the wrong endpoint)
  - It's hard to guarantee that your instances stay on separate machines within a single zone
  - CPU contention/steal occurs more easily because of KVM and the lack of CPU throttling
  - As always, the faster the hardware, the better the performance
  - Perf test: 5 Cassandra nodes with RF=3 (cassandra-stress)
    - Reads/s: 1,562
    - Writes/s: 3,846
    - Avg latency per op: 7.2 ms
    - Seems to hurt the testing server more than the Cassandra cluster

5. Lessons Learned with Cassandra and AWS
  - Be proactive in adjusting your caches
    - Row cache is a great thing (keep it out of heap)
    - Key cache hit rates dictate whether you should burn memory on them or not
  - KNOW your data and access patterns
  - A slow node is worse than a dead node
  - CPU steal is your mortal enemy

6. Connecting Openstack and AWS
  Two options
  - Public Internet replication (SSL highly recommended)
    - HUGE transfer costs, risky
  - VPC tunnel
    - Static tunnel with ASA
      - ASAs can only connect to one tunnel at a time, even in an HA pair
    - BGP tunnel with routing
      - Each router connects to two endpoints, HSRP between, extremely redundant
  Openstack complexity
  - VLAN tagging
    - If using VLAN tagging in Openstack, your tunnel device needs to participate in the VLAN which is used for VMs (300 by default)

7. Connecting Cassandra
  - Since Ec2Snitch doesn't work in Openstack, RackInferringSnitch must be used
  - Standard multi-datacenter tokenization strategies are required
  - Replication lag is dependent on connectivity and latency
    - Tests from VPC IPSec tunnels in NJ show 8 ms to Ashburn
    - Tests from Ashburn DC datacenters are about 4 ms
  - The biggest problem is the volume of data and a hard cutover
  - We started in EC2, but are migrating to VPC

8. Questions/Suggestions?

9. P.S. - We are HIRING!

10. Show & Tell
  Come see our Openstack cluster
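
Note on "Connecting Cassandra": RackInferringSnitch derives placement from each node's IPv4 address, treating the second octet as the datacenter and the third octet as the rack. The sketch below only illustrates that grouping rule; it is not Cassandra code, and the function names and addresses are hypothetical.

```python
# Illustration of the RackInferringSnitch convention:
#   2nd octet of the IPv4 address -> datacenter
#   3rd octet of the IPv4 address -> rack
# Function names and sample addresses are hypothetical.
from collections import defaultdict


def infer_location(ip):
    """Return (datacenter, rack) as inferred from an IPv4 address string."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address: %r" % ip)
    return octets[1], octets[2]


def group_by_datacenter(ips):
    """Group node addresses by the datacenter the snitch would infer."""
    groups = defaultdict(list)
    for ip in ips:
        datacenter, _rack = infer_location(ip)
        groups[datacenter].append(ip)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical addressing plan: 10.20.x.x for one site, 10.30.x.x for the other.
    nodes = ["10.20.1.15", "10.20.2.16", "10.30.1.17"]
    for datacenter, members in sorted(group_by_datacenter(nodes).items()):
        print("DC", datacenter, "->", members)
```

One consequence of this rule is that the Openstack and VPC address ranges need to differ in the second octet, otherwise nodes at both sites would be inferred to be in the same datacenter.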
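
Note on multi-datacenter tokenization: with the RandomPartitioner used by Cassandra 1.0, a common approach is to space tokens evenly within each datacenter and shift each additional datacenter by a small offset so that no two nodes share a token. The sketch below assumes that approach; the node counts (9 and 5, borrowed from the clusters described in the slides), the offset of 1, and the function name are illustrative assumptions rather than the configuration actually used.

```python
# Sketch: evenly spaced RandomPartitioner tokens per datacenter, with a
# per-datacenter offset to avoid token collisions between datacenters.
# Node counts and the offset value are illustrative assumptions.

RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def datacenter_tokens(node_count, dc_index, dc_offset=1):
    """Return one initial_token value per node for a single datacenter.

    Tokens are spaced evenly around the ring, then shifted by
    dc_index * dc_offset so each datacenter gets distinct values.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        (i * RING_SIZE // node_count + dc_index * dc_offset) % RING_SIZE
        for i in range(node_count)
    ]


if __name__ == "__main__":
    for name, count, index in (("aws", 9, 0), ("openstack", 5, 1)):
        print(name)
        for token in datacenter_tokens(count, index):
            print("  initial_token:", token)
```

Each value would go into the `initial_token` setting of one node's cassandra.yaml. Balancing each datacenter independently keeps load even within a site regardless of how many nodes the other site has.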