Cassandra@Coursera: AWS deploy and MySQL transition

Posted on 19-Aug-2014

Category: Engineering

DESCRIPTION

Touches on what Coursera aims to get out of Cassandra, what goes into a good deployment, and our experience so far transitioning off MySQL.

TRANSCRIPT

Cassandra @ Coursera
Deploying in AWS, MySQL Transition
Daniel Chia (@DanielJHChia)
Software Engineer, Infrastructure

Overview
Why Cassandra
What goes into a good deployment
MySQL to Cassandra transition experience

110 partners
698 courses
8.5 million learners

A Coursera Course

Your Final Project
This is your chance to apply the course concepts to real-world situations

Identity Verified Certificates

Technical
100% hosted on AWS
Service-oriented architecture
Mix of MySQL and Cassandra for persistence

What do we care about?

We care about
Availability
Scalability
Operational Ease
Latency
(Bonus) Multi-region writes

Availability matters
EBS Outage (2012)
Master: us-east-1a
Slave: us-east-1c

Scalability
Sharded by class
Machine 1: class1, class2, class3, class4, class5
Machine 2: class6, class7, class8, class9, class10
Machine 3: class11, class12, class13, class14, class15

New use-case
Uh-oh, doesn't fit in existing sharding

We care about
Availability
Scalability
Operational Ease
Performance
(Bonus) Multi-region

Try Cassandra!

So we decided to
Cassandra [database XYZ]
"But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid." (Albert Einstein)

Time to deploy Cassandra!
sudo apt-get install dse-full

A good deployment
Machine-level
Cluster-level

Picking a machine: Disk
IOPS, IOPS, IOPS
Latency
Image author: D-Kuru/Wikimedia Commons. Licence: CC-BY-SA-3.0-AT

Picking a machine: CPU
Image author: Mark Sze. Licence: CC BY-NC-ND 2.0

Picking a machine: Memory
Save some for page cache!
Image author: brutalSoCal. Licence: CC BY-NC-ND 2.0

On AWS
Ephemeral disks. Please don't use EBS. Really.
IOPS usually the problem
Instance sizes:
spinning disk: m1.large, m1.xlarge, m2.4xlarge
SSD: m3.xlarge, c3.2xlarge, i2.*

Set up the machine
Lots of documentation / talks about this
Recommended reading: Datastax guide [1]
[1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html

Cluster configuration
[Diagram: nodes A, C, B]

Priam
Care and feeding of Cassandra on AWS
https://github.com/Netflix/Priam

Cluster Topology
We use RF=3
Ring balanced within datacenter
Nodes alternate racks (or AZs)

Cluster Topology (Priam)
Token assignments stored in a database
Can take over a token in the event of node failure

Cluster Topology (Priam)
Priam assigns tokens evenly per region
Alternates AZs within region
[Diagram: az1, az3, az2, az1, az2, az3]

Autoscaling groups
Recover from lost instance
We don't use it for scaling with traffic

Important: Need one ASG per AZ
[Diagram: one ASG, size 9; nodes: east-1a x3, east-1b x3, east-1c x3]
[Diagram: one ASG, size 9; nodes: east-1a x3, east-1b x4, east-1c x2]
[Diagram: ASG-1a size 3, ASG-1b size 3, ASG-1c size 3; nodes: east-1a x3, east-1b x3, east-1c x3]

Backups
Data on ephemeral disks
Guard against application errors
SSTables immutable -> ship to S3
Priam does this

Restore
Have to be able to use your backup
Also useful for QA / test
Priam handles this rather nicely

Deployed!
Time to chill?
Image: https://www.flickr.com/photos/spunkinator/2394514059 (Creative Commons)

Monitoring
"Working / not working" doesn't count.
We have our own custom reporter agent for Datadog
There's pluggable reporter support in 2.0.2 now.

JVM GC woes

All happy now

SSTables Read Histogram

Questions?
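Looking back at the Cluster Topology (Priam) slides, the idea of spacing tokens evenly while alternating AZs can be sketched in a few lines of Python. This is only an illustration, not Priam's implementation: the function names `assign_tokens` and `replica_zones` are hypothetical, the Murmur3Partitioner token range is an assumption, and replica placement is simplified to "the next RF nodes clockwise" rather than the rack-aware logic of a real replication strategy.

```python
from itertools import cycle

# Assumption: Murmur3Partitioner, whose tokens span [-2**63, 2**63 - 1].
MIN_TOKEN = -(2 ** 63)
RING_SIZE = 2 ** 64


def assign_tokens(node_count, zones):
    """Return (token, zone) pairs: tokens evenly spaced, zones alternating."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    if not zones:
        raise ValueError("zones must not be empty")
    zone_cycle = cycle(zones)
    return [
        (MIN_TOKEN + (i * RING_SIZE) // node_count, next(zone_cycle))
        for i in range(node_count)
    ]


def replica_zones(ring, start, rf=3):
    """Zones of the rf consecutive nodes clockwise from index `start`.

    Simplification: a real replication strategy is rack-aware; here we
    only walk the ring in order.
    """
    return [ring[(start + k) % len(ring)][1] for k in range(rf)]


# Hypothetical six-node ring across three zones.
ring = assign_tokens(6, ["az1", "az2", "az3"])
for token, zone in ring:
    print(f"{zone}  {token}")
```

With RF=3, three zones, and a node count that is a multiple of three, any three consecutive nodes in this sketch sit in three different zones, which is the property the alternating layout is after.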
...before we carry on

Transition takes
time
mindset shift
expertise
(some) risk

Our experience
Pick one feature first
Mindset shift
Data modeling consulting
Libraries / Patterns / Data-as-a-service

Pick one feature
Don't go all in on Cassandra with something important right away
Work closely with that team
You probably will make mistakes

Oops!

Mindset shift
Everyone knows SQL
Not everyone knows Cassandra / NoSQL
Need to know queries beforehand

Enrollment Example
Learners enroll into a course
learner (many-to-many) course
Need to keep track of this membership

MySQL Model
CREATE TABLE `courses_learners` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `course_id` INT(11) NOT NULL,
  `learner_id` INT(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c_l` (`learner_id`, `course_id`),
  -- REFERENCES clauses are not shown on the slide
  CONSTRAINT `ref1` FOREIGN KEY (`course_id`) ...,
  CONSTRAINT `ref2` FOREIGN KEY (`learner_id`) ...
)

Cassandra Style
CREATE TABLE courses_by_learner (
  learner_id uuid,
  course_id uuid,
  PRIMARY KEY (learner_id, course_id)
)

Data modeling consulting
Build core team proficient at C* data modeling
Available to consult for trickier use cases

Libraries / Patterns
Abstract away simple (but common) use-cases
Key-value storage
Simple time series
Maybe every developer won't need deep C* knowledge?
More radical: data as a service (e.g. STAASH)
STAASH: https://github.com/Netflix/staash

It's a long road, but we'll get there
Image author: Carissa Rogers. License: CC BY 2.0

Conclusion
Know Cassandra
Know what makes a good deployment
Know that new skills have to be acquired

Questions?
We're hiring! coursera.org/jobs
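
Returning to the enrollment example and the "need to know queries beforehand" point: a rough way to see the mindset shift is to model each query as its own lookup structure. The Python sketch below is an in-memory stand-in, not Cassandra code. `EnrollmentStore` and the `learners_by_course` table are hypothetical (the slides only show `courses_by_learner`), and plain strings stand in for the uuid columns.

```python
from collections import defaultdict


class EnrollmentStore:
    """In-memory stand-in for query-specific tables (illustrative only)."""

    def __init__(self):
        # Mirrors courses_by_learner: partition key learner_id,
        # clustering column course_id.
        self.courses_by_learner = defaultdict(set)
        # Hypothetical second table serving the reverse query.
        self.learners_by_course = defaultdict(set)

    def enroll(self, learner_id, course_id):
        # Write to every table that serves a query (denormalization).
        # Set semantics make a repeated enroll a no-op, similar to an upsert.
        self.courses_by_learner[learner_id].add(course_id)
        self.learners_by_course[course_id].add(learner_id)

    def courses_for(self, learner_id):
        # Single-partition read, returned in sorted (clustering) order.
        return sorted(self.courses_by_learner.get(learner_id, ()))

    def learners_in(self, course_id):
        # Only cheap because the reverse table was planned up front.
        return sorted(self.learners_by_course.get(course_id, ()))


store = EnrollmentStore()
store.enroll("learner-1", "course-b")
store.enroll("learner-1", "course-a")
store.enroll("learner-2", "course-a")
print(store.courses_for("learner-1"))
print(store.learners_in("course-a"))
```

In the MySQL model one table with two indexes answers both questions; in this style each question gets its own table, so the set of queries has to be decided before the schema is.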