Several Nines Cluster Control.pptx

Our guest speaker will be Riaan Nolan of Foodpanda/Hellofood, Rocket Internet’s global online food delivery marketplace, operating in over 40 countries.

Upload: riaan-nolan

Post on 09-Aug-2015


TRANSCRIPT


Page 2: Several Nines Cluster Control.pptx

● eCommerce infrastructure challenges - AIA Case Study (i)

● Provisioning highly available environments across multi-server and multi-AZs
● Building and maintaining configuration management systems such as Puppet
● Enabling self-service infrastructure services to internal dev teams
● Health and performance monitoring
● Elastic scaling & Automating failure handling
● Disaster recovery

http://www.severalnines.com/sites/default/files/AIA_Case_Study.pdf (i)

Page 3: Several Nines Cluster Control.pptx

eCommerce infrastructure challenges - AIA Case Study

Before Cluster Control

● Split brain o.0
  - One node started with an empty gcomm:// address, thank you Puppet :)
● Track back your data.
  - Use anything (call-centre logs, emails sent via AWS SES) to recreate your data.
● Disasters are expensive in time & money for EVERYONE!
  - Everything was gone: time, orders, changes on staging, new products, QA tests, the whole kit.
● Be agile and move quickly NOW!
  - Restore to a trusted backup point, isolate the faulty data, extract everything you can (product SKUs etc.).
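The split-brain bullet turns on one Galera setting: a node whose wsrep_cluster_address is a bare gcomm:// bootstraps a new cluster instead of joining the existing one. Below is a minimal sketch of a pre-flight check that could run against a rendered my.cnf before Puppet restarts MySQL. The function name and the sample config are hypothetical, and the regex only handles the simple key = value form and only the first matching line.

```python
import re

# Matches an uncommented wsrep_cluster_address line and captures whatever
# follows "gcomm://" (an empty capture means no peers were listed).
_GCOMM_RE = re.compile(
    r"^[ \t]*wsrep[_-]cluster[_-]address[ \t]*=[ \t]*[\"']?gcomm://([^\"'\s#]*)",
    re.MULTILINE,
)


def bootstraps_new_cluster(my_cnf_text: str) -> bool:
    """Return True if the config would start the node with an empty gcomm:// address."""
    match = _GCOMM_RE.search(my_cnf_text)
    return match is not None and match.group(1) == ""


if __name__ == "__main__":
    # Hypothetical rendered config, for illustration only.
    rendered = "[mysqld]\nwsrep_cluster_address=gcomm://\n"
    if bootstraps_new_cluster(rendered):
        print("refusing to deploy: empty gcomm:// would bootstrap a new cluster")
```

A check like this would sit in the deploy pipeline as a guard, not as a replacement for reviewing what the Puppet template actually renders.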

What have I learned

Workflow Before Cluster Control

Page 4: Several Nines Cluster Control.pptx

After Cluster Control

eCommerce infrastructure challenges - AIA Case Study

● Look for help!
● If 20% sacrifice can fix 100% of your problems, go!
  - Puppet does not have to control Cluster Control 100% and why should it?
  - Getting a commercial license vs. actually hiring a DBA that knows Puppet & Galera.
● Fix bugs and refine your processes.
  - Experience is never a bad thing, and it’s how you play the cards that you were dealt.
  - Re-create disasters in a sandbox.

What have I learned

Workflow After Cluster Control

Page 5: Several Nines Cluster Control.pptx

Provisioning highly available environments across multi-server and multi-AZs

http://aws.amazon.com/cloudformation

http://aws.amazon.com/cloudformation/aws-cloudformation-templates

https://docs.puppetlabs.com

https://forge.puppetlabs.com

HIERA & FACTER

Puppet Manifests

Hardware Stack

Page 6: Several Nines Cluster Control.pptx

Building and maintaining configuration management systems such as Puppet

HIERA & FACTER

Puppet Manifests

Hardware Stack

https://help.github.com

https://raw.githubusercontent.com/nerdgirl/git-cheatsheet-visual/master/gitcheatsheet.png (Wallpaper)

https://docs.puppetlabs.com

https://forge.puppetlabs.com

Page 7: Several Nines Cluster Control.pptx

Enabling self-service infrastructure services to internal dev teams

WTF!

Page 8: Several Nines Cluster Control.pptx

Enabling self-service infrastructure services to internal dev teams - Seriously o.0

Create End-Points in your Workflow

● Devs and PMs are your most demanding customers. SALE NOW ON!
  - And they are also your most prized possession. Keep them happy!
● They want everything protected with little holes.
  - They will ask you to Htaccess-protect staging, but let their requests from the payment gateways through without Htaccess.
● Keeping them happy could be as simple as an SSH tunnel >>>
  - In your SSH config add something like this:
    LocalForward 33306 MyRDSInstanceReplica.mydomain.com:3306
● A Vagrant or Docker environment.
  - Use your existing Puppet code to spin up Docker or Vagrant instances.
● Create environments for everyone to play in.
  - So what if you already have staging? Create dev, dev2, qa, beta, whatever they need.
● Version control
  - Keep everything under version control, duh!
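For the SSH tunnel bullet, a developer may want to confirm that the forwarded port is actually listening before pointing a MySQL client at it. The sketch below assumes the LocalForward line from the slide (local port 33306) and an already open SSH session. The function name is hypothetical and only the Python standard library is used.

```python
import socket


def tunnel_is_up(host: str = "127.0.0.1", port: int = 33306, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on the forwarded port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("tunnel up" if tunnel_is_up() else "tunnel down: is the SSH session open?")
```

A successful connect only shows that the local end of the forward is listening. It does not prove that the database behind it is reachable.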

THE END.

Page 10: Several Nines Cluster Control.pptx

Elastic scaling & Automating failure handling

"MyFleetAutoScalingGroup" : {
  "Type" : "AWS::AutoScaling::AutoScalingGroup",
  "Properties" : {
    "AvailabilityZones" : { "Fn::GetAZs" : { "Ref" : "AWS::Region" } },
    "LaunchConfigurationName" : { "Ref" : "MyFleetLaunchConfig" },
    "MinSize" : { "Ref" : "InstanceCountMin" },
    "MaxSize" : { "Ref" : "InstanceCountMax" },
    "DesiredCapacity" : { "Ref" : "InstanceCountDesired" },
    "LoadBalancerNames" : [
      { "Ref" : "ProductionPublicElasticLoadBalancer" },
      { "Ref" : "StagingPublicElasticLoadBalancer" }
    ]
  }
}

"MyFleetPublicElasticLoadBalancer" : {
  "Type" : "AWS::ElasticLoadBalancing::LoadBalancer",
  "Properties" : {
    "CrossZone" : true
  }
}

"MyFleetLaunchConfig" : {
  "Type" : "AWS::AutoScaling::LaunchConfiguration",
  "Properties" : {
    "UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
      "#!/bin/bash\n",
      "# e.g. fleetserver-1104eb3a.mydomain.com\n",
      "HOSTNAME=\"fleetserver-$(curl -s http://169.254.169.254/latest/meta-data/instance-id | cut -d '-' -f2).mydomain.com\"\n",
      "echo \"${HOSTNAME}\" > /etc/hostname\n",
      "hostname \"${HOSTNAME}\"\n",
      "apt-get update\n",
      "apt-get install -y puppet knockd\n",
      "knock puppet.mydomain.com 7777 3333\n",
      "puppet agent -tv --server puppet.mydomain.com --waitforcert 300 --configtimeout 300\n"
    ] ] } }
  }
}

node /^fleetserver/ {
  class { '::myfleet': }
}

service { 'myservice':
  ensure => running,
}
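The UserData script and the Puppet node definition are tied together by a naming convention: the host name is built from the EC2 instance id, and Puppet picks the node up because the name starts with "fleetserver". Here is a small sketch of that mapping, handy for checking the convention without launching an instance. Both function names are hypothetical, and the sample instance id is the one from the comment in the slide.

```python
import re


def fleet_hostname(instance_id: str, domain: str = "mydomain.com") -> str:
    """Mirror `cut -d '-' -f2` on the instance id and build the host name."""
    suffix = instance_id.split("-")[1]  # "i-1104eb3a" -> "1104eb3a"
    return f"fleetserver-{suffix}.{domain}"


def matches_fleet_node(hostname: str) -> bool:
    """Apply the same pattern as the Puppet node definition /^fleetserver/."""
    return re.search(r"^fleetserver", hostname) is not None


if __name__ == "__main__":
    name = fleet_hostname("i-1104eb3a")
    print(name, matches_fleet_node(name))
```

Unlike cut, the Python version raises IndexError when the id contains no "-", which is arguably the safer behaviour for a test helper.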

Page 11: Several Nines Cluster Control.pptx

Disaster recovery

"MyRDSInstance" : {
  "Type" : "AWS::RDS::DBInstance",
  "Properties" : {
    "MultiAZ" : "true"
  },
  "DeletionPolicy" : "Snapshot"
}

"Conditions" : {
  "CreateReadReplica" : { "Fn::Equals" : [ { "Ref" : "RDSReadReplica" }, "true" ] }
}

"MyRDSInstanceReplica" : {
  "Type" : "AWS::RDS::DBInstance",
  "Condition" : "CreateReadReplica",
  "Properties" : {
    "SourceDBInstanceIdentifier" : { "Ref" : "MyRDSInstance" }
  }
}

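The read replica on this slide is only created when the CreateReadReplica condition is true. Here is a toy sketch of how that gate behaves, assuming RDSReadReplica is a template parameter holding the string "true" or "false". It is an illustration only, not CloudFormation's own evaluator: it understands nothing beyond Fn::Equals on Ref and literal operands, and both function names are hypothetical.

```python
def condition_holds(template: dict, name: str, params: dict) -> bool:
    """Evaluate a Conditions entry that uses Fn::Equals on Refs and literals only."""
    left, right = template["Conditions"][name]["Fn::Equals"]

    def resolve(value):
        # A {"Ref": "X"} operand is looked up in the supplied parameters.
        if isinstance(value, dict) and "Ref" in value:
            return params[value["Ref"]]
        return value

    return resolve(left) == resolve(right)


def resources_created(template: dict, params: dict) -> list:
    """List logical IDs whose Condition (if any) evaluates to true."""
    created = []
    for logical_id, resource in template["Resources"].items():
        condition = resource.get("Condition")
        if condition is None or condition_holds(template, condition, params):
            created.append(logical_id)
    return created


# Trimmed-down copy of the slide's fragments, properties omitted.
TEMPLATE = {
    "Conditions": {
        "CreateReadReplica": {"Fn::Equals": [{"Ref": "RDSReadReplica"}, "true"]}
    },
    "Resources": {
        "MyRDSInstance": {"Type": "AWS::RDS::DBInstance"},
        "MyRDSInstanceReplica": {
            "Type": "AWS::RDS::DBInstance",
            "Condition": "CreateReadReplica",
        },
    },
}

if __name__ == "__main__":
    print(resources_created(TEMPLATE, {"RDSReadReplica": "true"}))
    print(resources_created(TEMPLATE, {"RDSReadReplica": "false"}))
```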
Page 12: Several Nines Cluster Control.pptx

Re-cap: verb: rēˈkap/ - state again as a summary; recapitulate. "a way of recapping the story"

● Disasters are expensive in time & money for EVERYONE!
  - Everything was gone: time, orders, changes on staging, new products, QA tests, the whole kit.
● Be agile and move quickly NOW!
  - Restore to a trusted backup point, isolate the faulty data, extract everything you can (product SKUs etc.). Never import bad data!
● If 20% sacrifice can fix 100% of your problems, go! Go now!
  - Puppet does not have to control Cluster Control 100% and why should it?
  - Getting a commercial license vs. actually hiring a DBA that knows Puppet & Galera.
● Log everything. Seriously EVERYTHING!
● Devs and PMs are your most demanding customers. SALE NOW ON!
  - And they are also your most prized possession. Keep them happy!
● Use Puppet and, if you can, CloudFormation. Everything must be in code! Living the dream .. <3
  - If your infrastructure, applications and configurations are in code, you are only 1 commit away from a fix.
● Back up all the things. One must back up xXx
  - Backup, BACKUP, BAAACCCKKKUUUPPP!!!

THANK YOU!

Page 13: Several Nines Cluster Control.pptx

Who is using Cluster Control?

Page 14: Several Nines Cluster Control.pptx

THANK YOU