GumGum: Multi-Region Cassandra in AWS


Posted on 26-Jan-2017







CASSANDRA SUMMIT 2015
MULTI-REGION CASSANDRA IN AWS

Mario Lazaro
September 24th 2015
#CassandraSummit2015

WHOAMI

- Mario Cerdan Lazaro
- Big Data Engineer
- Born and raised in Spain
- Joined GumGum 18 months ago; about a year and a half of experience with Cassandra

- #5 ad platform in the U.S.
- 10B impressions / month
- 2,000 brand-safe premium publisher partners
- 1B+ global unique visitors
- Daily inventory impressions processed: 213M
- Monthly image impressions processed: 2.6B
- 123 employees in seven offices

AGENDA

- Old cluster
- International expansion
- Challenges
- Testing
- Modus operandi
- Tips
- Questions & answers

OLD C* CLUSTER - MARCH 2015

- 25-node cluster on EC2 Classic instances
- 1 region / 1 rack, hosted in AWS EC2 US East
- Version 2.0.8
- DataStax CQL driver
- Stores GumGum's metadata: visitors, images, pages, and ad performance
- Usage: realtime data access and analytics (MapReduce jobs)

OLD C* CLUSTER - REALTIME USE CASE

- Billions of rows
- Heavy read workload (60/40 read/write)
- TTLs everywhere - tombstones
- Heavy and critical use of counters
- RTB - read latency constraints (total execution time ~50 ms)

OLD C* CLUSTER - ANALYTICS USE CASE

- Daily ETL jobs to extract / join data from C*
- Hadoop MapReduce jobs
- Ad hoc queries with Presto

INTERNATIONAL EXPANSION

FIRST STEPS

- Start C* test datacenters in US East & EU West and test how C* multi-region works in AWS
- Run capacity/performance tests; we expect 3x more traffic in 2015 Q4

FIRST THOUGHTS

- Use AWS Virtual Private Cloud (VPC)
- Cassandra & VPC
present some connectivity challenges
- Replicate the entire dataset with the same number of replicas

TOO GOOD TO BE TRUE ...

CHALLENGES

Problems between Cassandra in EC2 Classic / VPC and the DataStax Java driver:

- EC2MultiRegionSnitch uses public IPs, but EC2 Classic instances have no network interface with a public IP address, so instances in the same region cannot connect to one another over their public IPs.

    // ClusterService and Cluster are GumGum-internal classes; the AWS SDK and
    // DataStax Java driver (2.x) classes are imported explicitly.
    import java.util.List;

    import javax.annotation.PostConstruct;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import;
    import com.datastax.driver.core.policies.AddressTranslater;

    /**
     * Implementation of {@link AddressTranslater} used by the driver to
     * translate external (public) IPs to internal (private) IPs.
     * @author Mario
     */
    public class Ec2ClassicTranslater implements AddressTranslater {
        private static final Logger LOGGER = LoggerFactory.getLogger(Ec2ClassicTranslater.class);

        private ClusterService clusterService;
        private Cluster cluster;
        private List<Instance> publicDnss;

        @PostConstruct
        public void build() {
            publicDnss = clusterService.getInstances(cluster);
        }

        /**
         * Translates a Cassandra {@code rpc_address} to another address if necessary.
         *
         * @param address the address of a node as returned by Cassandra.
         * @return the private address of the node, or {@code null} if it could not be translated.
         */
        public InetSocketAddress translate(InetSocketAddress address) {
            for (final Instance server : publicDnss) {
                if (server.getPublicIpAddress().equals(address.getHostString())) {
                    LOGGER.debug("IP address: {} translated to {}", address.getHostString(), server.getPrivateIpAddress());
                    return new InetSocketAddress(server.getPrivateIpAddress(), address.getPort());
                }
            }
            return null;
        }

        public void setClusterService(ClusterService clusterService) {
            this.clusterService = clusterService;
        }

        public void setCluster(Cluster cluster) {
            this.cluster = cluster;
        }
    }
- Region-to-region connectivity uses public IPs - either trust those IPs (security groups) or use a software/hardware VPN.
- Your application needs to connect to C* using private IPs - hence a custom EC2 address translator.

CHALLENGES - DATASTAX JAVA DRIVER LOAD BALANCING

Multiple choices: DCAware + TokenAware + ?

"Clients in one AZ attempt to always communicate with C* nodes in the same AZ. We call this zone-aware connections. This feature is built into Astyanax, Netflix's C* Java client library."

CHALLENGES - ZONE-AWARE CONNECTIONS

- Webapps in 3 different AZs: 1A, 1B, and 1C
- C* datacenter spanning 3 AZs with 3 replicas

CHALLENGES

We added it!
- Rack/AZ awareness added to the TokenAware policy

CHALLENGES - THIRD DATACENTER: ANALYTICS

- Do not impact realtime data access
- Spark on top of Cassandra
- Spark-Cassandra DataStax connector
- Replicate only specific keyspaces
- Fewer nodes with larger disk space
- Settings are different (e.g., bloom filter false-positive chance)
- The Cassandra-only DC serves realtime; the Cassandra + Spark DC serves analytics

CHALLENGES - UPGRADE FROM 2.0.8 TO 2.1.5

- The counters implementation is buggy in pre-2.1 versions

"My code never has bugs. It just develops random unexpected features."

CHALLENGES - VNODES

"To choose, or not to choose vnodes. That is the question." (M. Lazaro, 1990 - 2500)

- Previous DC uses classic (non-vnode) nodes:
  - Works with MapReduce jobs
  - Complexity when adding/removing nodes
  - Token ranges managed manually
- New DCs will use vnodes:
  - Apache Spark + Spark-Cassandra DataStax connector
  - Easy to add/remove nodes as traffic increases

TESTING

- Testing requires creating and modifying many C* nodes
- Creating and configuring a C* cluster is a time-consuming, repetitive task
- Create a fully automated process for creating/modifying/destroying Cassandra clusters with Ansible

    # Ansible settings for provisioning the EC2 instances
    ---
    ec2_instance_type: r3.2xlarge

    ec2_count:
      - 0 # How many in us-east-1a ?
      - 7 # How many in us-east-1b ?

    ec2_vpc_subnet:
      - undefined
      - subnet-c51241b2
      - undefined
      - subnet-80f085d9
      - subnet-f9138cd2

    ec2_sg:
      - va-ops
      - va-cassandra-realtime-private

TESTING - PERFORMANCE

Performance tests using the new Cassandra 2.1 stress tool:

- Recreate GumGum metadata / schemas
- Recreate the workload and make it 3 times bigger
- Try to find limits / saturate clients

    # Keyspace name
    keyspace: stresscql
    #keyspace_definition: |
    #  CREATE KEYSPACE stresscql WITH replication = {'class': 'NetworkTopologyStrategy'}

    ### Column Distribution Specifications ###
    columnspec:
      - name: visitor_id
        size: gaussian(32..32)
        population: uniform(1..999M)
      - name: bidder_code
        cluster: fixed(5)
      - name: bluekai_category_id
      - name: bidder_custom
        size: fixed(32)
      - name: bidder_id
        size: fixed(32)
      - name: bluekai_id
        size: fixed(32)
      - name: dt_pd
      - name: rt_exp_dt
      - name: rt_opt_out

    ### Batch Ratio Distribution Specifications ###
    insert:
      partitions: fixed(1)  # Our partition key is the visitor_id
      select: fixed(1)/5    # We have 5 bidder_code values per visitor
      batchtype: UNLOGGED   # Unlogged batches

    ## A list of queries you wish to run against the schema

TESTING - PERFORMANCE

Main worry: latency and replication overseas.

- Use LOCAL_X consistency levels in your client
- Only one node in the local DC will contact a single node in a remote DC to forward replicas/mutations

TESTING - INSTANCE TYPE

Test all kinds of instance types.
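An aside on the LOCAL_X advice above: it comes down to quorum arithmetic. The standalone sketch below uses the 3:3:1 replication topology this deck eventually builds, purely for illustration; the class and method names are ours, not the driver's.

```java
// Illustrative quorum arithmetic for LOCAL_X consistency levels (not driver code).
public class QuorumMath {
    // Cassandra's quorum for n replicas: floor(n / 2) + 1
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    public static void main(String[] args) {
        // Assumed topology: two realtime DCs at RF 3, one analytics DC at RF 1.
        int rfUsEast = 3, rfEuWest = 3, rfAnalytics = 1;

        // QUORUM counts replicas across every DC, so acks may have to cross the ocean.
        System.out.println("QUORUM needs " + quorum(rfUsEast + rfEuWest + rfAnalytics) + " of 7 replicas");

        // LOCAL_QUORUM only waits for replicas in the coordinator's local DC.
        System.out.println("LOCAL_QUORUM needs " + quorum(rfUsEast) + " of 3 local replicas");
    }
}
```

With plain QUORUM, 4 of 7 acks are required and a write can end up waiting on a trans-Atlantic round trip; with LOCAL_QUORUM, 2 in-region acks suffice while replication to the other DCs continues in the background.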
We decided to go with r3.2xlarge machines for our cluster:

- 60 GB RAM
- 8 cores
- 160 GB ephemeral SSD storage for commit logs and saved caches
- RAID 0 over 4 SSD EBS volumes for data

Performance / cost and GumGum's use case make r3.2xlarge the best option. Disclosure: the I2 instance family is the best if you can afford it.

TESTING - UPGRADE

Upgrade the C* datacenter from 2.0.8 to 2.1.5:

- Both versions can cohabit in the same DC
- New settings and features tried:
  - DateTieredCompactionStrategy: compaction for time-series data
  - Incremental repairs
  - New counters architecture

MODUS OPERANDI

To sum up - from one cluster with a single DC in US East, to one cluster with two DCs in US East and one DC in EU West.

MODUS OPERANDI - FIRST STEP

- Upgrade the old cluster's snitch from EC2Snitch to EC2MultiRegionSnitch
- Upgrade clients to handle it (i.e., the address translators)
- Make sure your clients do not lose connections to upgraded C* nodes (DataStax JIRA JAVA-809)

MODUS OPERANDI - SECOND STEP

- Upgrade the old datacenter from 2.0.8 to 2.1.5
- nodetool upgradesstables (multiple nodes at a time)
- It is not possible to rebuild a 2.1.X C* node from a 2.0.X C* datacenter; rebuild fails with:

    WARN [Thread-12683] 2015-06-17 10:17:22,845 - UnknownColumnFamilyException reading from socket; closing
    org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=XXX

MODUS OPERANDI - THIRD STEP

- Start the EU West and new US East DCs within the same cluster
- Replication factor in the new DCs: 0
- Use dc_suffix to differentiate the new Virginia DC from the old one
- Clients do not talk to the new DCs.
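Concretely, this third step amounts to bringing the new DCs in empty, then raising replication and streaming the history. A minimal CQL sketch; the keyspace name and datacenter names below are illustrative, not GumGum's actual ones:

```sql
-- 1) New DCs join with zero replicas, so they hold nothing yet:
ALTER KEYSPACE metadata WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'us-east': 3, 'us-east-realtime': 0, 'eu-west-realtime': 0, 'us-east-analytics': 0};

-- 2) Raise replication so the new DCs start receiving new writes
--    (analytics keeps a single replica):
ALTER KEYSPACE metadata WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'us-east': 3, 'us-east-realtime': 3, 'eu-west-realtime': 3, 'us-east-analytics': 1};

-- 3) On each node of a new DC, stream the pre-existing data from the old DC:
--    nodetool rebuild us-east
```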
Only C* knows they exist.
- Set the replication factor to 3 in every new DC except analytics (1)
- Start receiving new data
- nodetool rebuild to stream the old data

MODUS OPERANDI - REBUILD

- Clients still talk only to the old US East DC
- Keyspace replication goes from RF 3:0:0:0 to RF 3:3:3:1 (old US East : US East realtime : EU West realtime : US East analytics)
- Each new DC runs nodetool rebuild against the old DC

MODUS OPERANDI - START USING THE NEW CASSANDRA DCs

    From 39d8f76d9cae11b4db405f5a002e2a4f6f764b1d Mon Sep 17 00:00:00 2001
    From: mario
    Date: Wed, 17 Jun 2015 14:21:32 -0700
    Subject: [PATCH] AT-3576 Start using new Cassandra realtime cluster
    ---
    13 files changed, 39 insertions(+), 58 deletions(-)

- Clients switch to the new DCs: RF 3:3:3:1 becomes RF 0:3:3:1
- Decommission the old US East DC: RF 0:3:3:1 becomes RF 3:3:1

TIPS

TIPS - AUTOMATED MAINTENANCE

Maintenance in a multi-region C* cluster: Ansible + a Cassandra maintenance keyspace + email reports = zero human intervention!

    CREATE TABLE maintenance.history (
        dc text,
        op text,
        ts timestamp,
        ip text,
        PRIMARY KEY ((dc, op), ts)
    ) WITH CLUSTERING ORDER BY (ts ASC)
        AND bloom_filter_fp_chance=0.010000
        AND caching='{"keys":"ALL", "rows_per_partition":"NONE"}'
        AND comment=''
        AND dclocal_read_repair_chance=0.100000
        AND
        gc_grace_seconds=864000
        AND read_repair_chance=0.000000
        AND compaction={'class': 'SizeTieredCompactionStrategy'}
        AND compression={'sstable_compression': 'LZ4Compressor'};

    CREATE INDEX history_kscf_idx ON maintenance.history (kscf);

    233-133-65:/opt/scripts/production/groovy$ groovy CassandraMaintenanceCheck.groovy -dc us-east-va-realtime -op compaction -e mario

TIPS - SPARK

- Number of Spark workers above the total number of C* nodes in the analytics DC
- Each worker uses:
  - 1/4 of the cores of each instance
  - 1/3 of the total available RAM of each instance
- Cassandra-Spark connector: spanBy, joinWithCassandraTable
- spark.cassandra.output.batch.size.bytes
- spark.cassandra.output.concurrent.writes

TIPS - SPARK

Create a "translator" if using EC2MultiRegionSnitch: spark.cassandra.connection.factory

    val conf = new SparkConf()
      .set("spark.cassandra.connection.host", cassandraNodes)
      .set("spark.cassandra.connection.local_dc", "us-east-va-analytics")
      .set("spark.cassandra.connection.factory", "com.gumgum.spark.bluekai.DirectLinkConnectionFactory")
      .set("spark.driver.memory", "4g")
      .setAppName("Cassandra presidential candidates app")

SINCE C* IN EU WEST ...

US West datacenter! (EU West DC, US East DC, Analytics DC, US West DC)

Q&A

GumGum is hiring!
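A closing footnote on the Spark worker-sizing rule above (1/4 of the cores, 1/3 of the RAM of each instance), applied to the deck's r3.2xlarge figures. The class and method names here are illustrative only:

```java
// Illustrative sketch of the deck's Spark worker sizing rule:
// each worker gets 1/4 of an instance's cores and 1/3 of its RAM.
public class SparkWorkerSizing {
    static int coresPerWorker(int instanceCores) {
        return instanceCores / 4;
    }

    static int ramPerWorkerGb(int instanceRamGb) {
        return instanceRamGb / 3;
    }

    public static void main(String[] args) {
        // r3.2xlarge figures from the deck: 8 cores, 60 GB RAM
        System.out.println(coresPerWorker(8) + " cores, " + ramPerWorkerGb(60) + " GB per worker");
        // prints "2 cores, 20 GB per worker"
    }
}
```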