Capacity Planning for fun & profit
Posted on 18-Oct-2014
DESCRIPTION
Capacity Planning for fun & profit, as presented in the 2nd São Paulo Perl Mongers Conference
TRANSCRIPT
Capacity Planning for fun & profit
beyond cacti and top
II São Paulo Perl Workshop
Rodrigo Albani de Campos - @xinu
Agenda
• Capacity planning primer: a tale of discovery
• Metrics
• Queues
• Models
Why Perl ?
• Main reason: I feel comfortable with it
• Ubiquitous and free
• Plenty of stable statistics modules available at CPAN
• Ultimately, it gets the job done
Capacity Planning
• Is just like sex...
• Everyone wants to do it
• Many say they’re doing it
• You always exaggerate how much of it you’re doing
• Most people aren’t actually doing it (despite their best efforts)
• Everybody else seems to be doing more than you
A tale of discovery
There once was a system administrator...
A tale of discovery
• How many servers do we need ?
• Actual capacity ?
• How much memory ?
• What’s the predicted growth ?
• IO capacity ?
Typical Performance Metrics
• Load Average - uptime
• The single most misunderstood metric
• CPU - mpstat
• IO - iostat
• Memory Usage - vmstat
Typical Performance Metrics
Time series charts
I’m looking at you, cacti huggers !
• Time series performance data is useful for:
• Troubleshooting
• Simplistic forecasting
• Finding trends
• Identifying seasonal behavior
• On its own, this is NOT Capacity Planning
Frustration
• Computer systems can be harsh
• Most systems will not scale linearly
• Diminishing returns and lock contention will punch you in the face
• “Oh but I’ve checked cacti and the CPU was 25% idle”
Let’s put it in the Cloud
• We are moving back to a utility computing model
• You’re charged per usage
• Even more important to care about capacity planning !!!
Call the experts
• Cost per MIPS
• IBM System/370 model 158-3 - 1.0 MIPS @ 1.0 MHz - 1972
• Average purchase price: $ 771,000*
• No disks or peripherals included
• Equivalent to $ 4,082,039 in 2011
• Need to squeeze every drop of processing power
* Source: http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3135.html
Queues
The not so typical performance metrics
• 1961 - CTSS was first demonstrated at MIT
• 1965 - Allan Scherr used machine repairman problem to model a time-shared system as part of Project MAC
• Another offspring of Project MAC is Multics
[Diagram: Computer System, with queues for CPU and Disks]
[Diagram: a queue in an Open/Closed Network, annotated with the symbols below]
A   Arrival Count
λ   Arrival Rate (A/T)
W   Time spent in Queue
R   Residence Time (W+S)
S   Service Time
X   System Throughput (C/T)
C   Completed tasks count
Arrival Rate (λ)
• Pretty straightforward
• Requests per second/hour/day
• Not the same as throughput (X)
• Although in a steady state:
• A = C as T →∞
• λ = X
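The definitions above reduce to a few divisions and one sum. A minimal sketch, assuming hypothetical counts taken over an observation period T of 60 seconds (the counts and the queueing time W are made up for illustration; only S matches a figure from the talk):

```perl
use strict;
use warnings;

# Hypothetical measurements over one observation window.
my $T = 60;       # observation period, in seconds (assumed)
my $A = 3909;     # arrivals counted during T (assumed)
my $C = 3909;     # completions counted during T (assumed: steady state, A = C)
my $W = 0.0021;   # mean time spent in queue, in seconds (assumed)
my $S = 0.0159;   # mean service time, in seconds

my $lambda = $A / $T;   # arrival rate
my $X      = $C / $T;   # system throughput
my $R      = $W + $S;   # residence time

printf "lambda = %.3f/s  X = %.3f/s  R = %.4f s\n", $lambda, $X, $R;
```

With A = C the arrival rate and the throughput come out identical, which is the steady-state case described above.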
Service Time (S)
• Time spent in processing
• Web server response time
• Total query time
• IO operation time length
Mythical Performance
• Not gonna happen...
• Don’t believe the vendor’s sales pitch
• “In God we trust, all others must bring data” - William Edwards Deming
[Chart: Service Time (s), from 0 to 0.6, plotted against Arrival Rate (hits/s), from 0 to 45]
How to measure ?
• Apache: %D in mod_log_config
• nginx: $request_time in HttpLogModule
• use Benchmark;
• tcprstat - http://goo.gl/0cbYx
• collectd - http://goo.gl/OXKG7
• metrics - http://goo.gl/gQFVM
• sysstat - http://goo.gl/2aLul
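When the service has no access log to mine, the service time can be measured from inside the code. A minimal sketch using the core Time::HiRes module; `service_time` and `handle_request` are hypothetical names introduced here, and the loop only stands in for real work:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference once and return the elapsed wall-clock seconds.
sub service_time {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    return tv_interval($t0);   # elapsed time since $t0, as a float
}

# Hypothetical stand-in for the work done to serve one request.
sub handle_request {
    my $sum = 0;
    $sum += $_ for 1 .. 100_000;
    return $sum;
}

my $elapsed = service_time(\&handle_request);
printf "service time: %.6f s\n", $elapsed;
```

A single sample is noisy; in practice the timing would be repeated and averaged, or logged per request and aggregated afterwards.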
How to measure ?
[02/Jul/2010:14:00:18... 1863
(the trailing field is the time to serve the request, in μseconds)
my (%arrivalRate, %serviceTimeAcc);
while (<>) {
    my ($date, $svctime) = (m/\[(\S+).+?\s(\d+)$/) or next;
    $arrivalRate{$date}++;
    $serviceTimeAcc{$date} += $svctime;
}
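The per-second hashes built from the log can then be reduced to an average arrival rate and an average service time. A minimal sketch, assuming three made-up log lines in which the timestamp is in brackets and the %D value is the last field (the lines themselves are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical log lines: bracketed timestamp first, %D (microseconds) last.
my @lines = (
    '[02/Jul/2010:14:00:18 -0300] "GET / HTTP/1.1" 200 512 1863',
    '[02/Jul/2010:14:00:18 -0300] "GET /a HTTP/1.1" 200 128 2137',
    '[02/Jul/2010:14:00:19 -0300] "GET /b HTTP/1.1" 200 256 2000',
);

my (%arrivalRate, %serviceTimeAcc);
for (@lines) {
    my ($date, $svctime) = (m/\[(\S+).+?\s(\d+)$/) or next;
    $arrivalRate{$date}++;                 # hits seen in this second
    $serviceTimeAcc{$date} += $svctime;    # microseconds spent in this second
}

my ($hits, $usec) = (0, 0);
$hits += $_ for values %arrivalRate;
$usec += $_ for values %serviceTimeAcc;

my $seconds    = keys %arrivalRate;           # distinct one-second buckets seen
my $avgHits    = $hits / $seconds;            # average hits per second
my $avgSvcTime = $usec / $hits / 1_000_000;   # average service time, in seconds

printf "Average Hits/s = %.3f\nAverage Svc time = %.4f\n", $avgHits, $avgSvcTime;
```

Note that seconds with no hits at all never appear in the hash, so this average is taken over busy seconds only.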
use Chart::Clicker;
[Charts of the measured data, rendered with Chart::Clicker]
Average Hits/s = 65.142
Average Svc time = 0.0159
What to look for ?
• Stretch factor
• Method/Operation
• Geolocation
• Cookies
• Use mod_logio to measure inbound traffic as well
Modeling
Prediction is very difficult, especially if it’s about the future.
Niels Bohr
Capacity planning is about setting expectations. Even wrong expectations are better than no expectations!
Neil J. Gunther - The Guerrilla Manifesto
http://goo.gl/lZKWH
Modeling
• A model is an abstraction of a complex system
• A model allows us to observe phenomena that cannot be easily replicated
Modeling Methods
• Statistics / Trending / Forecasting
• Pros:
• Easy to understand
• Tools readily available
• Cons:
• Hard to create “What-if” scenarios
• Hard to predict contention and bottlenecks
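Trending usually means fitting a line through past measurements and extending it. A minimal sketch of an ordinary least-squares fit in plain Perl; `linear_fit` is a hypothetical helper and the monthly peak values are made up for illustration:

```perl
use strict;
use warnings;

# Ordinary least-squares fit of y = intercept + slope * x.
sub linear_fit {
    my ($xs, $ys) = @_;
    my $n = @$xs;
    my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
    for my $i (0 .. $n - 1) {
        $sx  += $xs->[$i];
        $sy  += $ys->[$i];
        $sxx += $xs->[$i] ** 2;
        $sxy += $xs->[$i] * $ys->[$i];
    }
    my $slope     = ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx ** 2);
    my $intercept = ($sy - $slope * $sx) / $n;
    return ($intercept, $slope);
}

# Hypothetical peak hits/s observed in months 1 through 6.
my @month = (1 .. 6);
my @peak  = (40, 45, 50, 55, 60, 65);

my ($intercept, $slope) = linear_fit(\@month, \@peak);
my $forecast = $intercept + $slope * 12;   # extrapolate to month 12

printf "growth = %.1f hits/s per month, month 12 forecast = %.1f hits/s\n",
    $slope, $forecast;
```

The fit says nothing about what happens when a resource saturates, which is exactly the limitation listed in the cons above.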
Modeling Methods
• Queuing Analysis
• Pros:
• Allows you to make predictions when no production data is available
• Allows you to create “What-if” scenarios
• Cons:
• Sometimes it can be unintuitive
• The math behind it can be difficult
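A small "What-if" can be run with nothing more than the textbook formula for a single M/M/1 queue, R = S / (1 - λS). This is a sketch under that assumption, not the model used in the talk; `mm1_residence` is a hypothetical helper and the arrival rates are made up:

```perl
use strict;
use warnings;

# Residence time of one M/M/1 queue: R = S / (1 - rho), with rho = lambda * S.
sub mm1_residence {
    my ($lambda, $S) = @_;
    my $rho = $lambda * $S;
    die "unstable: utilization $rho >= 1\n" if $rho >= 1;
    return $S / (1 - $rho);
}

my $S = 0.0159;                      # service time, in seconds
for my $lambda (10, 30, 50, 60) {    # hypothetical arrival rates, in hits/s
    printf "lambda=%2d  rho=%.3f  R=%.4f s\n",
        $lambda, $lambda * $S, mm1_residence($lambda, $S);
}
```

The residence time grows slowly at first and then sharply as utilization approaches 1, which a straight trend line would not predict.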
Queues as models
Typical LAMP Stack
[Diagram: Requests flow from Clients through Apache, Application and Database; Replies flow back]
Queues as models
What if ?
[Diagram: the same circuit of Clients, Apache, Application and Database, with a Cache added]
Queues as models
What happens if we use a 15k RPM disk ?
[Diagram: a CPU queue and a 10k RPM Disk queue]
Queues as models
m1.small ? m1.large ? m1.xlarge ?
[Diagram: Memory Bus; Virtual Cores x EC2 CU]
use pdq;
• Available at http://goo.gl/s98wQ (not on CPAN)
• PDQ is a queuing circuit solver by Neil J. Gunther
• There’s a whole book about it http://goo.gl/9MA2c
use pdq;
CreateNode() Define a queuing center
CreateOpen() Define a traffic stream of an open circuit
CreateClosed() Define a traffic stream of a closed circuit
SetDemand() Define the service demand for each of the queuing centers
use pdq;
Node Types
CEN Queuing Center
DLY Delay Center
use pdq;
Service Disciplines
FCFS First-come first-served
LCFS Last-come first-served
ISRV Infinite Server
PSHR Processor Sharing
use pdq;
• Apache Web Server
• Average Network RTD: 0.00921 seconds
• Added as a delay center in the circuit
• Average Arrival Rate: 65.142 hits/s
• Average Service time: 0.0159 seconds
• 128 worker threads
use pdq;
$workload = "httpd";
$httpMaxClient = 128;
pdq::Init("web server");
$arrivalRate = 65.142;
$serviceTime = 0.1159;
$pdq::streams = pdq::CreateOpen($workload, $arrivalRate);
pdq::Report();
Metric Value Unit
------ ----- ----
Workload: "httpd"
Number in system 8.0279 Trans
Mean throughput 65.1420 Trans/Sec
Response time 0.1232 Sec
Stretch factor 1.0626
pdq::Report();
Bounds Analysis:
Max throughput 1104.4003 Trans/Sec
Min response 0.1160 Sec
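Figures close to the report can be reproduced by hand. The sketch below assumes the service demand is split evenly across the 128 workers and that each worker behaves as a simple open queue; that layout is an assumption that happens to fit the reported numbers, not something the slides state.

```perl
use strict;
use warnings;

my $lambda  = 65.142;   # arrival rate, in hits/s
my $S       = 0.1159;   # service time used in the script, in seconds
my $workers = 128;      # worker threads

# Assumption: demand is split evenly across the workers.
my $D       = $S / $workers;                  # service demand per worker
my $rho     = $lambda * $D;                   # utilization of each worker
my $Xmax    = 1 / $D;                         # throughput bound, set by the busiest node
my $R       = $workers * $D / (1 - $rho);     # sum of the per-worker residence times
my $stretch = $R / $S;                        # response time relative to its minimum

printf "Max throughput %.4f  Response time %.4f  Stretch factor %.4f\n",
    $Xmax, $R, $stretch;
```

The stretch factor is simply the response time divided by the time a request would take with no queueing at all; a value close to 1 means requests are barely waiting.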
pdq::Report();
• Average request size: 145 KBytes
• ~ 1160 Kbits
• @1104 transactions / second:
• 1,280,640 Kbits/s ~ 1.28 Gbps
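The bandwidth estimate above is plain arithmetic, sketched here so the numbers can be swapped for your own (the variable names are illustrative, and Gbps is taken as decimal giga):

```perl
use strict;
use warnings;

my $requestKBytes = 145;    # average request size, in KBytes
my $maxThroughput = 1104;   # transactions per second at the throughput bound

my $requestKbits = $requestKBytes * 8;               # Kbits per request
my $kbitsPerSec  = $requestKbits * $maxThroughput;   # Kbits per second
my $gbps         = $kbitsPerSec / 1_000_000;         # decimal Gbps

printf "%d Kbits/s ~ %.2f Gbps\n", $kbitsPerSec, $gbps;
```

In other words, the web tier would saturate a single gigabit link before it reached its own throughput bound.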
Resources and References
• CMG Public Proceedings: http://www.cmg.org/proceedings/
• Measure IT: http://www.cmg.org/measureit/
• Guerrilla Capacity Planning: http://www.perfdynamics.com/Classes/Outlines/guerilla.html
• Performance by Design - Menasce, Dowdy, Almeida - http://amzn.to/mpqfVO
• Capacity Planning for Web Performance: Metrics, Models, and Methods - Daniel Menasce, Virgilio Almeida - http://amzn.to/lOATba
• Capacity Planning for Web Services: Metrics, Models, and Methods - Daniel Menasce, Virgilio Almeida - http://amzn.to/iClpsB
• Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services - Neil Gunther - http://amzn.to/kfrfLK
• The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling - R. K. Jain - http://amzn.to/jqud1I
Any questions ?