Gov 2.0: Scaling, Automation, & Management in the Cloud
Copyright © 2010 Opscode, Inc - All Rights Reserved
Scaling in the Cloud
Speaker: Jesse Robbins, CEO
‣ [email protected]
‣ @jesserobbins
‣ www.opscode.com
Copyright © 2010 Opscode, Inc. – Confidential – Do Not Redistribute
Opscode makes a new kind of Infrastructure Automation, offered as a hosted Service.
•Developers?
•Systems Administrators?
•Executives/Leaders?
http://www.flickr.com/photos/timyates/2854357446/sizes/l/
For Developers...
• Do it yourself.
• The infrastructure is the application (and vice versa).
• You are not a Systems Administrator.
• You need tools.
Sysadmins...
• Say “Yes”.
• You never liked rack and stack that much anyway.
• You have never been more critical.
• Lean into it.
http://covers.oreilly.com/images/9780596007836/lrg.jpg
Lean into it appears courtesy of Cliff Moon, of Dynomite fame: http://twitter.com/moonpolysoft
Executives...
• Not a magic unicorn
• Benefits come from efficiency, not raw Capex
• Has real cultural implications at every level
• You are the biggest asset to success
[Figure: two charts, “Traditional” Operations and Operations - The “Secret Sauce”. Axes: # of Hours and Servers by Week #. Series: Hardware, OS Install, Config, Upkeep; Existing, New.]
(http://radar.oreilly.com/archives/2007/10/operations-advantage.html)
This is the secret of Cloud Computing.
Every other virtue stems from here.
You are 10% Unique
And it’s probably the things you did wrong
Infrastructure is Hard
1999: Inventory, packaged file transfers and desktops
2005: Unattended bare metal servers “very very” hard. 7k Nodes took 5 days w/90 success
2007: Unattended bare metal in under 10 minutes. Fully configured in under 3 mins
2008: Unattended server in 2 minutes. 5000 servers in a week
2010: 10k Nodes in under 5 minutes
Infrastructure is changing
‣ Easier to get (good!)...but harder to manage (bad!)
‣ Demand is dynamic
‣ Developers are crucial to Operations
‣ Web / Cloud services are proliferating...and Enterprise is following along.
‣ Manual configuration no longer a crutch
‣ Few tools to solve a ubiquitous problem
Managing Infrastructure Is Hard (Has Always Been)
Proprietary Solutions (timeline: 1980, 1989, 1999, 2001)
Previous Attempts Typically...
•Solve very little of the problem...
•Reach just a handful of large, enterprise customers
•Require custom implementations with large professional services bills
•Deployed exclusively on-premise
•Acquired by companies with large consulting organizations (IBM, HP, CA)
Google, Amazon, Microsoft built their own tools
but it’s “secret sauce”
everyone else is here
... inexperienced & poorly equipped for the world they must now operate in.
“Cloud”
Cloud = Web = Internet = Useless
Alistair’s mom’s definition
Slide courtesy Alistair Croll - [email protected]
Managed hosting
Virtualization
Private Public
SaaS
PaaS PaaS
IaaS IaaS
If you want to talk clouds, pick one first.
Infrastructure as a Service (IaaS)
Amazon EC2, Rackspace Cloud, Terremark, GoGrid, Joyent (and nearly every private cloud built on XenServer or VMware.)
Slide courtesy Alistair Croll - [email protected]
Dedicated hardware
On-premise private clouds
Virtual private clouds
Third-party public clouds
Slide courtesy Alistair Croll - [email protected]
Always on premise
• Private
• Compliance-enforced
• Need to track and audit
• Legislative
• Data near local computation
Can be done anywhere
• Testing
• Training
• Prototyping
• Batch processing
• Seasonal load
Always in cloud
• Partner access
• Proximity to cloud services (storage, CDN, etc.)
• Massively grid/parallel (genomic, modelling)
Load/pricing engine
Policy engine
Virtual machine (infrastructure cloud)
Compute task (service cloud)
Slide courtesy Alistair Croll - [email protected]
Automation
Bootstrapping
Bootstrapping Approaches
Corp Approvals (Time: 6-8w)
• Good: Known Costs, No Variation. Anything you want, as long as IT pre-approved it.
• Bad: High Waste (Hoarding). Red Tape. Expensive ($/Time). Long lead time.
Agile Corp Approvals (Time: 2-4w)
• Good: Known Costs. Total Hardware Control. Trivial Approvals.
• Bad: Lower Waste. Less Red Tape. Still slow. Expensive ($/Time). Shorter lead time.
Cloud (Time: 5-10m)
• Good: Variable Costs. Highly Adaptable. Minimal lead time. Trivial approvals. No humans needed.
• Bad: Variable Costs. No control over hardware. Must re-train.
curl -O http://brainspl.at/velocity.sh && sh velocity.sh
Configuration
Configuration Approaches
Manual
• Good: You can do anything. Results in an intimate knowledge of the details.
• Bad: Slow. Error Prone (Bus Error!). Non-repeatable. Difficult knowledge transfer.
Ad-Hoc
• Good: More repeatable. Knowledge is dispersed. Built your way, with your model.
• Bad: Rarely idempotent. Hard to collaborate. Brittle. No API.
Infrastructure as Code
• Good: Repeatable. Idempotent. Agile. Sharable. Self documenting.
• Bad: Have to learn how to use it. Hard things remain hard. Not magic. (Yet!)
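To make "Idempotent" concrete, here is a minimal sketch in Python. It is not taken from the talk or from any particular tool; the function name `ensure_line`, the file name, and the setting are hypothetical. The idea is that the code describes a desired state, checks for it, and changes something only when the state is missing, so running it twice is as safe as running it once.

```python
import os
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed, False if it was already
    in the desired state. (Hypothetical helper, for illustration only.)
    """
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    if line in content.splitlines():
        return False  # desired state already holds: do nothing
    with open(path, "a") as f:
        # Avoid gluing the new line onto an unterminated last line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True


if __name__ == "__main__":
    # Hypothetical file and setting, used only to show the behaviour.
    path = os.path.join(tempfile.mkdtemp(), "example.conf")
    print(ensure_line(path, "max_connections = 100"))  # first run changes the file
    print(ensure_line(path, "max_connections = 100"))  # second run is a no-op
```

An ad-hoc script that simply appends the line on every run would be repeatable in name only; the check before the change is what makes the operation idempotent.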
Command and Control
Meatcloud*
• Good: Super flexible. Can do almost anything. Always easy to find someone to blame. Free will.
• Bad: Error Prone. Slow. Expensive to Scale. Not repeatable. Free will.
Ad-Hoc
• Good: More repeatable. Easier to scale. Less error prone (hopefully!)
• Bad: One-off by necessity. Tooling sprawl. Hard to share solutions. Much higher learning curve.
Framework
• Good: One system to learn. Scales well. Paint by numbers. Repeatable. Two-Way.
• Bad: Not everything maps cleanly. Trades depth of knowledge for ease of use.
*Meatcloud appears in this presentation courtesy of Andrew Shafer - http://is.gd/Ega
Lightning Strikes!
[Diagram: Webservers, Webservers, Database Servers; three nodes marked X (DOOM). Surrounding functions: Configuration, Bootstrapping, Command & Control, Monitoring, System Updates, Provisions, Signals, Moar!]
• Monitoring Signals Nanite /node/down Service
• Nanite boots new EC2 Instances, with Chef Role + Attribute
• Nanite removes nodes in Chef
• Provisions Instances, EBS, Elastic IPs
• Chef configures nodes according to assigned
• Chef updates the monitoring system
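The recovery loop on this slide can be pictured with a small in-memory sketch. It is an illustration only: it does not use the real Nanite, Chef, or EC2 APIs, and the `Cluster` class, `boot`, and `handle_node_down` are hypothetical names. It shows the shape of the flow: a node-down signal arrives, the dead node is removed from the registry, a replacement is booted with the same role, and monitoring is updated.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """In-memory stand-in for the node registry and the monitoring system."""
    nodes: dict = field(default_factory=dict)    # node name -> role
    monitored: set = field(default_factory=set)  # node names being watched
    counter: int = 0

    def boot(self, role):
        """Pretend to provision an instance and register it under a role."""
        self.counter += 1
        name = f"{role}-{self.counter}"
        self.nodes[name] = role
        return name


def handle_node_down(cluster, name):
    """React to a node-down signal by replacing the node with one of the same role."""
    role = cluster.nodes.pop(name)       # remove the dead node from the registry
    cluster.monitored.discard(name)      # stop watching the dead node
    replacement = cluster.boot(role)     # boot a replacement with the same role
    cluster.monitored.add(replacement)   # update monitoring for the new node
    return replacement


if __name__ == "__main__":
    cluster = Cluster()
    web = cluster.boot("webserver")
    cluster.monitored.add(web)
    print(handle_node_down(cluster, web))
```

The point of the sketch is that no human appears anywhere in the loop; each step is a function call triggered by the previous one.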
A word about Scaling...
Typical Peak Load
Graphs in this portion of the presentation taken from Theo Schlossnagle: http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes
1. Bring on capacity as traffic ramps up
2. Take down capacity as it ramps down
3. 10-15 Minutes on either side, fully unattended
Atypical Load
1. Hope you know it is coming.
2. Increase capacity in advance.
3. Take down capacity as it ramps down.
No way around Capacity Planning
However, you are still better off!
Capacity Planning is king.
http://www.flickr.com/photos/allspaw/2095439645/sizes/l/
Have a queue?
Does it scale linearly with more resources?
Congratulations - you can auto-scale!
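A rough sketch of why a queue plus linear scaling makes auto-scaling tractable: if each worker drains jobs at a roughly constant rate, the number of workers needed is simple arithmetic on the queue depth. The function name, the parameters, and the min/max bounds below are assumptions for illustration, not something the slides specify.

```python
import math


def workers_needed(queue_depth, jobs_per_worker_per_min, target_drain_minutes,
                   min_workers=1, max_workers=100):
    """Workers required to drain the queue within the target time.

    Assumes throughput scales linearly with the number of workers, which is
    exactly the condition the slide asks about.
    """
    if jobs_per_worker_per_min <= 0 or target_drain_minutes <= 0:
        raise ValueError("rate and target must be positive")
    raw = math.ceil(queue_depth / (jobs_per_worker_per_min * target_drain_minutes))
    # Clamp so the pool never drops to zero or grows without bound.
    return max(min_workers, min(max_workers, raw))


if __name__ == "__main__":
    # 1200 queued jobs, 20 jobs per worker per minute, drain within 10 minutes.
    print(workers_needed(1200, 20, 10))
```

If the work does not scale linearly (for example, every worker contends for one database), this arithmetic stops holding, which is why the slide asks the question first.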
NoSQL
http://www.flickr.com/photos/wingler/3429634150/sizes/l/
CAP Theorem
• Consistency
• Availability
• Partition Tolerance
Pick Two
Most SQL Databases
• Choose Consistency over all
• Availability comes distant second
Web Applications need...
• Availability
• Partition Tolerance
“Global temporal consistency is a fiction”
Christopher Brown
Choosing Consistency for your Web App...
Means failure is global
When you choose Partition Tolerance and Availability...
You fail or succeed for a subset of users
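One simple way to picture "fail or succeed for a subset of users" is to spread users across independent stores, so that losing one store affects only the users mapped to it. The sketch below is an illustration under that assumption; the slides do not prescribe this design, and `shard_for` and `affected_users` are hypothetical names. CRC32 is used only because it is a stable hash available in the standard library.

```python
import zlib


def shard_for(user_id, shard_count):
    """Map a user to a shard with a stable hash (CRC32, purely for illustration)."""
    return zlib.crc32(user_id.encode("utf-8")) % shard_count


def affected_users(user_ids, shard_count, down_shards):
    """Return the users whose shard is currently unavailable."""
    return [u for u in user_ids if shard_for(u, shard_count) in down_shards]


if __name__ == "__main__":
    users = [f"user{i}" for i in range(1000)]
    hit = affected_users(users, shard_count=4, down_shards={0})
    print(f"{len(hit)} of {len(users)} users see the failure")
```

With a single consistent store, the equivalent outage would hit every user at once, which is the "failure is global" case from the previous slide.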
Apologies
• Apologize after the fact for failures
• Better than nothing at all
NoSQL
• Many different tools
• They tweak CAP differently
• CouchDB
• Cassandra
• Redis
• MongoDB