gumgum: multi-region cassandra in aws
TRANSCRIPT
CASSANDRA SUMMIT 2015
Mario Lazaro
September 24th 2015
#CassandraSummit2015
MULTI-REGION CASSANDRA IN AWS
WHOAMI
Mario Cerdan Lazaro
Big Data Engineer
Born and raised in Spain
Joined GumGum 18 months ago
About a year and a half of experience with Cassandra
#5 Ad Platform in the U.S
10B impressions / month
2,000 brand-safe premium publisher partners
1B+ global unique visitors
Daily inventory impressions processed - 213M
Monthly image impressions processed - 2.6B
123 employees in seven offices
AGENDA
Old cluster
International Expansion
Challenges
Testing
Modus Operandi
Tips
Questions & Answers
25 Classic nodes cluster
1 Region / 1 Rack hosted in
AWS EC2 US East
Version 2.0.8
Datastax CQL driver
GumGum's metadata including
visitors, images, pages, and ad
performance
Usage: realtime data access
and analytics (MR jobs)
OLD C* CLUSTER - MARCH 2015
OLD C* CLUSTER - REALTIME USE CASE
Billions of rows
Heavy read workload - 60/40 read/write split
TTLs everywhere - Tombstones
Heavy and critical use of counters
RTB - Read Latency constraints
(total execution time ~50 ms)
OLD C* CLUSTER - ANALYTICS USE CASE
Daily ETL jobs to extract / join data
from C*
Hadoop MR jobs
AdHoc queries with Presto
INTERNATIONAL EXPANSION
FIRST STEPS
Start C* test datacenters in US East & EU West and test how C* multi-region works in AWS
Run capacity/performance tests. We expect 3x more traffic in 2015 Q4
FIRST THOUGHTS
Use AWS Virtual Private Cloud (VPC)
Cassandra & VPC present some connectivity challenges
Replicate entire data set with the same number of replicas
TOO GOOD TO BE TRUE ...
CHALLENGES
Problems between Cassandra in EC2 Classic / VPC and the Datastax Java driver
EC2MultiRegionSnitch uses public IPs. EC2 instances do not have an interface with a public IP address, so instances in the same region cannot connect to each other using public IPs.
/**
 * Implementation of {@link AddressTranslater} used by the driver that
 * translates external IPs to internal IPs.
 * @author Mario <[email protected]>
 */
public class Ec2ClassicTranslater implements AddressTranslater {
    private static final Logger LOGGER = LoggerFactory.getLogger(Ec2ClassicTranslater.class);

    private ClusterService clusterService;
    private Cluster cluster;
    private List<Instance> publicDnss;

    @PostConstruct
    public void build() {
        publicDnss = clusterService.getInstances(cluster);
    }

    /**
     * Translates a Cassandra {@code rpc_address} to another address if necessary.
     *
     * @param address the address of a node as returned by Cassandra.
     * @return {@code address} translated IP address of the source.
     */
    public InetSocketAddress translate(InetSocketAddress address) {
        for (final Instance server : publicDnss) {
            if (server.getPublicIpAddress().equals(address.getHostString())) {
                LOGGER.info("IP address: {} translated to {}", address.getHostString(), server.getPrivateIpAddress());
                return new InetSocketAddress(server.getPrivateIpAddress(), address.getPort());
            }
        }
        return null;
    }

    public void setClusterService(ClusterService clusterService) {
        this.clusterService = clusterService;
    }

    public void setCluster(Cluster cluster) {
        this.cluster = cluster;
    }
}
Region-to-region connectivity will use public IPs - trust those IPs or use a software/hardware VPN
Your application needs to connect to C* using private IPs - custom EC2 translator
Datastax Java Driver Load Balancing
Multiple choices: DCAware + TokenAware + ?
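With the 2.x Java driver that combination is a one-liner at cluster build time. A minimal configuration sketch - the contact point and DC name are placeholders, and the DataStax Java driver must be on the classpath:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

// DCAware keeps queries in the local DC; TokenAware then routes each
// request to a replica that owns the partition key.
Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.10")  // placeholder contact point
        .withLoadBalancingPolicy(
                new TokenAwarePolicy(new DCAwareRoundRobinPolicy("us-east-realtime")))
        .build();
```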
CHALLENGES
“Clients in one AZ attempt to always communicate with C* nodes in the same AZ. We call this zone-aware connections. This feature is built into Astyanax, Netflix's C* Java client library.”
Zone Aware Connection:
Webapps in 3 different AZs: 1A, 1B, and 1C
C* datacenter spanning 3 AZs with 3 replicas
CHALLENGES
[Diagram: webapps in AZs 1A, 1B, and 1C, each connecting to the C* replica in its own AZ]
We added it! - Rack/AZ awareness to TokenAware Policy
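The rack/AZ-aware ordering can be sketched in plain Java - class and method names here are illustrative, not the actual driver patch: replicas in the client's own AZ are tried first, with the remaining replicas as fallback.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of AZ-aware replica ordering. In Cassandra terms an
// AWS availability zone maps to a "rack"; replicas.get(i) lives in racks.get(i).
class ZoneAwareOrder {

    // Returns the replicas re-ordered so same-rack (same-AZ) hosts come first.
    static List<String> orderReplicas(List<String> replicas, List<String> racks, String localRack) {
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (int i = 0; i < replicas.size(); i++) {
            (racks.get(i).equals(localRack) ? local : remote).add(replicas.get(i));
        }
        local.addAll(remote);  // fall back to other AZs after local ones
        return local;
    }
}
```

In the real policy this reordering happens inside the load balancing policy's query plan over the driver's Host objects rather than over plain strings.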
CHALLENGES
CHALLENGES
Third Datacenter: Analytics
Do not impact realtime data access
Spark on top of Cassandra
Spark-Cassandra Datastax connector
Replicate specific keyspaces
Fewer nodes with larger disk space
Settings are different
Ex: Bloom filter chance
CHALLENGES
Third Datacenter: Analytics
[Diagram: a Cassandra-only DC serving realtime traffic alongside a Cassandra + Spark DC serving analytics]
CHALLENGES
Upgrade from 2.0.8 to 2.1.5
Counters implementation is buggy in pre-2.1 versions
“My code never has bugs. It just develops random unexpected features”
CHALLENGES
“To choose, or not to choose VNodes. That is the question.” (M. Lazaro, 1990 - 2500)
Previous DC using Classic Nodes
Works with MR jobs
Complexity for adding/removing nodes
Manually managed token ranges
New DCs will use VNodes
Apache Spark + Spark Cassandra Datastax
connector
Easy to add/remove new nodes as traffic
increases
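The classic-tokens vs. vnodes choice comes down to one cassandra.yaml setting per node. A minimal sketch - the token value is a made-up example and 256 is the common default, not necessarily GumGum's exact figure:

```yaml
# Classic nodes (old DC): one manually assigned token per node
# initial_token: 28356863910078205288614550619314017621

# VNodes (new DCs): let Cassandra assign many small ranges per node
num_tokens: 256
```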
TESTING
TESTING
Testing requires creating and modifying many C* nodes
Creating and configuring a C* cluster is a time-consuming / repetitive task
Create a fully automated process for creating/modifying/destroying Cassandra clusters with Ansible
# Ansible settings for provisioning the EC2 instance
---
ec2_instance_type: r3.2xlarge

ec2_count:
  - 0 # How many in us-east-1a ?
  - 7 # How many in us-east-1b ?

ec2_vpc_subnet:
  - undefined
  - subnet-c51241b2
  - undefined
  - subnet-80f085d9
  - subnet-f9138cd2

ec2_sg:
  - va-ops
  - va-cassandra-realtime-private
TESTING - PERFORMANCE
Performance tests using the new Cassandra 2.1 Stress Tool:
Recreate GumGum metadata / schemas
Recreate workload and make it 3 times bigger
Try to find limits / saturate clients
# Keyspace Name
keyspace: stresscql

#keyspace_definition: |
#  CREATE KEYSPACE stresscql WITH replication = {'class': 'NetworkTopologyStrategy'

### Column Distribution Specifications ###
columnspec:
  - name: visitor_id
    size: gaussian(32..32)       # domain names are relatively short
    population: uniform(1..999M) # 10M possible domains to pick
  - name: bidder_code
    cluster: fixed(5)
  - name: bluekai_category_id
  - name: bidder_custom
    size: fixed(32)
  - name: bidder_id
    size: fixed(32)
  - name: bluekai_id
    size: fixed(32)
  - name: dt_pd
  - name: rt_exp_dt
  - name: rt_opt_out

### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1) # Our partition key is the visitor_id
  select: fixed(1)/5   # We have 5 bidder_code per domain
  batchtype: UNLOGGED  # Unlogged batches

## A list of queries you wish to run against the schema
#
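A profile like the one above is fed to the 2.1 stress tool on the command line. A hedged sketch - the profile filename, node address, duration, and thread count are illustrative, not the exact test run:

```shell
# Run the insert operation from the user profile against one contact node
cassandra-stress user profile=visitor-profile.yaml "ops(insert=1)" \
    duration=30m -node 10.0.0.10 -rate threads=200
```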
TESTING - PERFORMANCE
Main worry:
Latency and replication overseas
Use LOCAL_X consistency levels in your client
Only one C* node will contact one C* node in a different DC for sending replicas/mutations
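The latency argument can be made concrete with simple majority arithmetic - plain math, not driver code: with the cluster's eventual RF of 3:3:1, a global QUORUM write needs acks from 4 of 7 replicas and so must wait on a remote DC, while LOCAL_QUORUM in an RF-3 DC needs only 2 local acks.

```java
// Sketch of the ack counts behind QUORUM vs. LOCAL_QUORUM.
class ConsistencyMath {

    // A quorum is a simple majority of the replicas counted.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // Global QUORUM counts replicas across every DC in the cluster.
    static int acksForQuorum(int[] rfPerDc) {
        int total = 0;
        for (int rf : rfPerDc) {
            total += rf;
        }
        return quorum(total);
    }

    // LOCAL_QUORUM only needs a majority of the local DC's replicas.
    static int acksForLocalQuorum(int localRf) {
        return quorum(localRf);
    }
}
```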
TESTING - INSTANCE TYPE
Test all kinds of instance types. We decided to go with r3.2xlarge machines for our cluster:
60 GB RAM
8 Cores
160GB Ephemeral SSD Storage for commit logs and saved caches
RAID 0 over 4 SSD EBS Volumes for data
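A RAID 0 data volume like that is typically assembled with mdadm. A sketch under assumed device names and mount point, not GumGum's exact setup:

```shell
# Stripe four EBS volumes into one data array (device names are examples)
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
mkfs.ext4 /dev/md0
mount /dev/md0 /var/lib/cassandra/data
```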
Performance / cost and the GumGum use case make r3.2xlarge the best option
Disclosure: the I2 instance family is the best if you can afford it
TESTING - UPGRADE
Upgrade C* datacenter from 2.0.8 to 2.1.5
Both versions can cohabit in the same DC
New settings and features tried:
DateTieredCompactionStrategy: compaction for time-series data
Incremental repairs
Counters' new architecture
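DateTieredCompactionStrategy is enabled per table. A hedged CQL sketch - the keyspace and table names are hypothetical, not GumGum's schema:

```sql
-- Switch a time-series table to DTCS (available from C* 2.1)
ALTER TABLE metadata.visitor_events
  WITH compaction = {'class': 'DateTieredCompactionStrategy'};
```

Incremental repair in 2.1 is then invoked with `nodetool repair -inc` instead of a full repair.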
MODUS OPERANDI
MODUS OPERANDI
Sum up
From: One cluster / One DC in US East
To: One cluster / Two DCs in US East and one DC in EU West
MODUS OPERANDI
First step:
Upgrade old cluster snitch from EC2Snitch to
EC2MultiRegionSnitch
Upgrade clients to handle it (aka translators)
Make sure your clients do not lose connection to upgraded C* nodes (DataStax JIRA JAVA-809)
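The snitch change itself is a cassandra.yaml edit plus a rolling restart. A minimal sketch - the address is an example; Ec2MultiRegionSnitch derives the broadcast (public) address from EC2 instance metadata:

```yaml
# cassandra.yaml (per node)
endpoint_snitch: Ec2MultiRegionSnitch
# The node keeps listening on its private interface; the snitch
# broadcasts the public IP to nodes in other regions.
listen_address: 10.0.0.12   # private IP (example)
```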
MODUS OPERANDI
Second step:
Upgrade old datacenter from 2.0.8 to 2.1.5
nodetool upgradesstables (multiple nodes at a time)
Not possible to rebuild a 2.1.X C* node from a 2.0.X C* datacenter. The rebuild fails with:

WARN [Thread-12683] 2015-06-17 10:17:22,845 IncomingTcpConnection.java:91 - UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=XXX
MODUS OPERANDI
Third step:
Start EU West and new US East DCs within the same cluster
Replication factor in new DCs: 0
Use dc_suffix to differentiate new Virginia DC from old one
Clients do not talk to new DCs. Only C* knows they exist
Replication factor to 3 in all new DCs except analytics (RF 1)
Start receiving new data
Nodetool rebuild <old-datacenter>
Old data
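The rebuild step is a plain nodetool invocation run on every node of a new DC; the source DC name below is illustrative:

```shell
# Run on each node of a new DC; the argument names the DC to stream old data from
nodetool rebuild us-east
```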
MODUS OPERANDI
[Diagram: clients still talk only to the old US East DC (RF 3); replication moves from 3:0:0:0 to 3:3:3:1 as US East Realtime, EU West Realtime, and US East Analytics each stream old data via rebuild]
From 39d8f76d9cae11b4db405f5a002e2a4f6f764b1d Mon Sep 17 00:00:00 2001
From: mario <[email protected]>
Date: Wed, 17 Jun 2015 14:21:32 -0700
Subject: [PATCH] AT-3576 Start using new Cassandra realtime cluster
---
 src/main/java/com/gumgum/cassandra/Client.java     | 30 ++++------------------
 .../com/gumgum/cassandra/Ec2ClassicTranslater.java | 30 ++++++++++++++--------
 src/main/java/com/gumgum/cluster/Cluster.java      |  3 ++-
 .../resources/applicationContext-cassandra.xml     | 13 ++++------
 src/main/resources/dev.properties                  |  2 +-
 src/main/resources/eu-west-1.prod.properties       |  3 +++
 src/main/resources/prod.properties                 |  3 +--
 src/main/resources/us-east-1.prod.properties       |  3 +++
 .../CassandraAdPerformanceDaoImplTest.java         |  2 --
 .../asset/cassandra/CassandraImageDaoImplTest.java |  2 --
 .../CassandraExactDuplicatesDaoTest.java           |  2 --
 .../com/gumgum/page/CassandraPageDoaImplTest.java  |  2 --
 .../cassandra/CassandraVisitorDaoImplTest.java     |  2 --
 13 files changed, 39 insertions(+), 58 deletions(-)
MODUS OPERANDI
Start using new Cassandra DCs
RF 3:3:3:1
MODUS OPERANDI
[Diagram: clients switch to the new DCs; US East Realtime, EU West Realtime, and US East Analytics serve traffic while the old DC's replication drops to 0 (RF 0:3:3:1)]
MODUS OPERANDI
[Diagram: the old US East DC is decommissioned; replication goes from 0:3:3:1 to 3:3:1 across US East Realtime, EU West Realtime, and US East Analytics]
TIPS
TIPS - AUTOMATED MAINTENANCE
Maintenance in a multi-region C* cluster:
Ansible + Cassandra maintenance keyspace + email report = zero human intervention!
CREATE TABLE maintenance.history (
    dc text,
    op text,
    ts timestamp,
    ip text,
    PRIMARY KEY ((dc, op), ts)
) WITH CLUSTERING ORDER BY (ts ASC)
    AND bloom_filter_fp_chance=0.010000
    AND caching='{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment=''
    AND dclocal_read_repair_chance=0.100000
    AND gc_grace_seconds=864000
    AND read_repair_chance=0.000000
    AND compaction={'class': 'SizeTieredCompactionStrategy'}
    AND compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX history_kscf_idx ON maintenance.history (kscf);
233-133-65:/opt/scripts/production/groovy$ groovy CassandraMaintenanceCheck.groovy -dc us-east-va-realtime -op compaction -e mario
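A maintenance run then just appends one row per operation to the history table. An illustrative CQL sketch - the values are examples, not a real run:

```sql
-- Record that a compaction ran in a given DC from a given node
INSERT INTO maintenance.history (dc, op, ts, ip)
VALUES ('us-east-va-realtime', 'compaction', dateof(now()), '10.0.0.12');
```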
TIPS - SPARK
Number of workers above number of total C* nodes in analytics
Each worker uses:
1/4 number of cores of each instance
1/3 total available RAM of each instance
Cassandra-Spark connector
SpanBy
.joinWithCassandraTable(:x, :y)
spark.cassandra.output.batch.size.bytes
spark.cassandra.output.concurrent.writes
val conf = new SparkConf()
  .set("spark.cassandra.connection.host", cassandraNodes)
  .set("spark.cassandra.connection.local_dc", "us-east-va-analytics")
  .set("spark.cassandra.connection.factory", "com.gumgum.spark.bluekai.DirectLinkConnectionFactory")
  .set("spark.driver.memory", "4g")
  .setAppName("Cassandra presidential candidates app")
TIPS - SPARK
Create a "translator" if using EC2MultiRegionSnitch
spark.cassandra.connection.factory
SINCE C* IN EU WEST ...
US West Datacenter!
EU West DC US East DC Analytics DC US West DC