several nines cluster control.pptx

Our guest speaker will be Riaan Nolan of Foodpanda/Hellofood, Rocket Internet’s global online food delivery marketplace, operating in over 40 countries.

● eCommerce infrastructure challenges - AIA Case Study (i)

● Provisioning highly available environments across multi-server and multi-AZs

● Building and maintaining configuration management systems such as Puppet

● Enabling self-service infrastructure services to internal dev teams

● Health and performance monitoring

● Elastic scaling & Automating failure handling

● Disaster recovery

http://www.severalnines.com/sites/default/files/AIA_Case_Study.pdf (i)

eCommerce infrastructure challenges - AIA Case Study

Before Cluster Control

● Split Brain o.0
- One node started with an empty gcomm:// address, thank you Puppet :)

● Track back your Data.
- Use anything (call-centre logs, emails sent via AWS SES) to recreate your Data.

● Disasters are expensive in time & money for EVERYONE!
- Everything was gone: time, orders, changes on staging, new products, QA tests, the whole kit.

● Be agile and move quickly NOW!
- Restore to a trusted backup point, isolate the faulty data, extract everything you can (product SKUs etc.).
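
The split-brain bullet above comes down to a node booting with an empty gcomm:// address, which tells Galera to bootstrap a brand-new cluster instead of joining the existing one. A minimal sketch of a guard that could run before the database service starts, assuming the setting lives in a my.cnf-style file under the wsrep_cluster_address key. The function name and the sample config lines are hypothetical and were not part of the original deployment:

```python
import re

# Matches lines such as: wsrep_cluster_address = "gcomm://10.0.0.1,10.0.0.2"
# Commented-out lines (starting with '#') do not match.
_ADDRESS_RE = re.compile(
    r"^\s*wsrep_cluster_address\s*=\s*[\"']?gcomm://([^\"'\s#]*)",
    re.MULTILINE,
)


def bootstraps_new_cluster(config_text: str) -> bool:
    """Return True if any wsrep_cluster_address line has an empty gcomm:// peer list."""
    peer_lists = _ADDRESS_RE.findall(config_text)
    return any(peers == "" for peers in peer_lists)


# Hypothetical sample configs for illustration only.
safe = 'wsrep_cluster_address = "gcomm://10.0.0.1,10.0.0.2,10.0.0.3"\n'
risky = "wsrep_cluster_address=gcomm://\n"

print(bootstraps_new_cluster(safe))   # False
print(bootstraps_new_cluster(risky))  # True
```

A check like this only catches the literal empty address; it says nothing about whether the listed peers are reachable.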

What have I learned

Workflow Before Cluster Control

After Cluster Control

eCommerce infrastructure challenges - AIA Case Study

● Look for help!

● If a 20% sacrifice can fix 100% of your problems, go!
- Puppet does not have to control Cluster Control 100%, and why should it?
- Getting a commercial license vs. actually hiring a DBA who knows Puppet & Galera.

● Fix bugs and refine your processes.
- Experience is never a bad thing, and it's how you play the cards that you were dealt.
- Re-create disasters in a sandbox.

What have I learned

Workflow After Cluster Control

Provisioning highly available environments across multi-server and multi-AZs

http://aws.amazon.com/cloudformation

http://aws.amazon.com/cloudformation/aws-cloudformation-templates

https://docs.puppetlabs.com

https://forge.puppetlabs.com

[Diagram layers: Hiera & Facter, Puppet Manifests, Hardware Stack]

Building and maintaining configuration management systems such as Puppet

[Diagram layers: Hiera & Facter, Puppet Manifests, Hardware Stack]

https://help.github.com

https://raw.githubusercontent.com/nerdgirl/git-cheatsheet-visual/master/gitcheatsheet.png (Wallpaper)

Enabling self-service infrastructure services to internal dev teams

WTF!

Enabling self-service infrastructure services to internal dev teams - Seriously o.0

Create End-Points in your Workflow

● Devs and PMs are your most demanding customers. SALE NOW ON!
- And they are also your most prized possession. Keep them happy!

● They want everything protected, with little holes.
- They will ask you to Htaccess-protect staging, but to let requests from the Payment Gateways through without Htaccess.

● Keeping them happy could be as simple as an SSH tunnel.
- In your SSH config add something like this:

    LocalForward 33306 MyRDSInstanceReplica.mydomain.com:3306

● A Vagrant or Docker Environment.
- Use your existing Puppet code to spin up Docker or Vagrant instances.

● Create Environments for everyone to play in.
- So what if you already have staging? Create dev, dev2, qa, beta, whatever they need.

● Version Control
- Keep everything under Version Control, Duh!
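
The LocalForward line from the SSH tunnel bullet has to sit inside a Host block to be usable. A minimal sketch that renders such a block per developer or per environment, assuming access goes through a jump host. The alias staging-db and the host bastion.mydomain.com are hypothetical names chosen for illustration; only the LocalForward values come from the slide:

```python
def render_tunnel_host(alias, bastion, local_port, remote_host, remote_port=3306):
    """Render an ssh_config Host block that forwards a local port to a database."""
    return "\n".join([
        f"Host {alias}",
        f"    HostName {bastion}",
        f"    LocalForward {local_port} {remote_host}:{remote_port}",
    ])


stanza = render_tunnel_host(
    alias="staging-db",                 # hypothetical alias
    bastion="bastion.mydomain.com",     # hypothetical jump host
    local_port=33306,
    remote_host="MyRDSInstanceReplica.mydomain.com",
)
print(stanza)
```

With that block in ~/.ssh/config, running ssh -N staging-db keeps the tunnel open, and a MySQL client can then connect to 127.0.0.1 on port 33306.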

THE END.

Health and performance monitoring

http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/WhatIsCloudWatch.html

http://www.elasticsearch.org/overview/kibana

http://newrelic.com

https://www.icinga.org

http://www.severalnines.com/clustercontrol

Elastic scaling & Automating failure handling

"MyFleetAutoScalingGroup" : {
  "Type" : "AWS::AutoScaling::AutoScalingGroup",
  "Properties" : {
    "AvailabilityZones" : { "Fn::GetAZs" : { "Ref" : "AWS::Region" } },
    "LaunchConfigurationName" : { "Ref" : "MyFleetLaunchConfig" },
    "MinSize" : { "Ref" : "InstanceCountMin" },
    "MaxSize" : { "Ref" : "InstanceCountMax" },
    "DesiredCapacity" : { "Ref" : "InstanceCountDesired" },
    "LoadBalancerNames" : [
      { "Ref" : "ProductionPublicElasticLoadBalancer" },
      { "Ref" : "StagingPublicElasticLoadBalancer" }
    ]
  }
}

"MyFleetPublicElasticLoadBalancer" : {
  "Type" : "AWS::ElasticLoadBalancing::LoadBalancer",
  "Properties" : {
    "CrossZone" : true
  }
}

"MyFleetLaunchConfig" : {
  "Type" : "AWS::AutoScaling::LaunchConfiguration",
  "Properties" : {
    "UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
      "#!/bin/bash\n",
      "# e.g fleetserver-1104eb3a.mydomain.com\n",
      "HOSTNAME=\"fleetserver-$(curl -s http://169.254.169.254/latest/meta-data/instance-id | cut -d '-' -f2).mydomain.com\"\n",
      "echo \"${HOSTNAME}\" > /etc/hostname\n",
      "hostname \"${HOSTNAME}\"\n",
      "apt-get update\n",
      "apt-get install -y puppet knockd\n",
      "knock puppet.mydomain.com 7777 3333\n",
      "puppet agent -tv --server puppet.mydomain.com --waitforcert 300 --configtimeout 300\n"
    ] ] } }
  }
}

node /^fleetserver/ {
  class { '::myfleet': }
}

service { 'myservice':
  ensure => running,
}
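
The user data above derives each node's hostname from its EC2 instance ID, and the Puppet node definition then matches on the fleetserver prefix. A minimal sketch of that naming rule, assuming instance IDs of the form i-1104eb3a as in the slide's comment. The function name is hypothetical, and unlike cut, which passes a line without a delimiter through unchanged, this version rejects such input:

```python
import re


def fleet_hostname(instance_id: str, domain: str = "mydomain.com") -> str:
    """Take the second '-'-separated field of the instance ID, like `cut -d '-' -f2`."""
    fields = instance_id.split("-")
    if len(fields) < 2 or not fields[1]:
        raise ValueError(f"unexpected instance ID: {instance_id!r}")
    return f"fleetserver-{fields[1]}.{domain}"


name = fleet_hostname("i-1104eb3a")
print(name)  # fleetserver-1104eb3a.mydomain.com

# The Puppet node regex /^fleetserver/ would pick this host up.
print(bool(re.match(r"^fleetserver", name)))  # True
```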

Disaster recovery

"MyRDSInstance" : {
  "Type" : "AWS::RDS::DBInstance",
  "Properties" : {
    "MultiAZ" : "true"
  },
  "DeletionPolicy" : "Snapshot"
}

"Conditions" : {
  "CreateReadReplica" : { "Fn::Equals" : [ { "Ref" : "RDSReadReplica" }, "true" ] }
}

"MyRDSInstanceReplica" : {
  "Type" : "AWS::RDS::DBInstance",
  "Condition" : "CreateReadReplica",
  "Properties" : {
    "SourceDBInstanceIdentifier" : { "Ref" : "MyRDSInstance" }
  }
}
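
The CreateReadReplica condition means the replica resource only exists when the RDSReadReplica parameter is set to "true". A toy evaluator, written purely for illustration, that shows how the template reads. It handles only Ref and a single Fn::Equals, it is not how CloudFormation itself is implemented, and the function names are hypothetical:

```python
def resolve(value, parameters):
    """Resolve a {"Ref": name} node against the parameters; pass literals through."""
    if isinstance(value, dict) and set(value) == {"Ref"}:
        return parameters[value["Ref"]]
    return value


def condition_holds(condition, parameters):
    """Evaluate a single Fn::Equals condition by comparing its two resolved operands."""
    left, right = condition["Fn::Equals"]
    return resolve(left, parameters) == resolve(right, parameters)


create_read_replica = {"Fn::Equals": [{"Ref": "RDSReadReplica"}, "true"]}

print(condition_holds(create_read_replica, {"RDSReadReplica": "true"}))   # True
print(condition_holds(create_read_replica, {"RDSReadReplica": "false"}))  # False
```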

Re-cap: verb: rēˈkap/ - state again as a summary; recapitulate. "a way of recapping the story"

● Disasters are expensive in time & money for EVERYONE!
- Everything was gone: time, orders, changes on staging, new products, QA tests, the whole kit.

● Be agile and move quickly NOW!
- Restore to a trusted backup point, isolate the faulty data, extract everything you can (product SKUs etc.). Never import bad data!

● If a 20% sacrifice can fix 100% of your problems, go! Go Now!
- Puppet does not have to control Cluster Control 100%, and why should it?
- Getting a commercial license vs. actually hiring a DBA who knows Puppet & Galera.

● Log everything. Seriously, EVERYTHING!

● Devs and PMs are your most demanding customers. SALE NOW ON!
- And they are also your most prized possession. Keep them happy!

● Use Puppet and, if you can, CloudFormation; everything must be in code! Living the dream .. <3
- If your infrastructure, applications and configurations are in code, you are only 1 commit away from a fix.

● Back up all the things. One must back up xXx
- Backup, BACKUP, BAAACCCKKKUUUPPP!!!

THANK YOU!

Who is using Cluster Control?
