capacity planning for fun & profit

Capacity Planning for fun & profit

beyond cacti and top

II São Paulo Perl Workshop

Rodrigo Albani de Campos - @xinu

[email protected]


Agenda

• Capacity planning primer: a tale of discovery

• Metrics

• Queues

• Models

Why Perl ?

• Main reason: I feel comfortable with it

• Ubiquitous and free

• Plenty of stable statistics modules available at CPAN

• Ultimately, it gets the job done

Capacity Planning

• Is just like sex...

• Everyone wants to do it

• Many say they’re doing it

• You always exaggerate how much of it you’re doing

• Most people aren’t actually doing it (despite their best efforts)

• Everybody else seems to be doing more than you

A tale of discovery

There once was a system administrator...

A tale of discovery

• How many servers do we need ?

• How much memory ?

• What’s the actual capacity ?

• What’s the predicted growth ?

• How much IO capacity ?

Typical Performance Metrics

• Load Average - uptime

• The single most misunderstood metric

• CPU - mpstat

• IO - iostat

• Memory Usage - vmstat

Typical Performance Metrics

Time series charts

I’m looking at you, cacti huggers !

• Time series performance data is useful for:

• Troubleshooting

• Simplistic forecasting

• Find trends

• Identify seasonal behavior

• On its own, this is NOT capacity planning

Frustration

• Computer systems can be harsh

• Most systems will not scale linearly

• Diminishing returns and lock contention will punch you in the face

• “Oh but I’ve checked cacti and the CPU was 25% idle”

Let’s put it in the Cloud

• We are moving back to a utility computing model

• You’re charged per usage

• Even more important to care about capacity planning !!!

Call the experts

• Cost per MIPS

• IBM System/370 model 158-3 - 1.0 MIPS @ 1.0 MHz - 1972

• Average purchase price: $ 771,000*

• No disks or peripherals included

• $ 4,082,039 in 2011 dollars

• Need to squeeze every drop of processing power

* Source: http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3135.html


Queues

The not so typical performance metrics

• 1961 - CTSS was first demonstrated at MIT

• 1965 - Allan Scherr used the machine repairman problem to model a time-shared system as part of Project MAC

• Another offspring of Project MAC is Multics

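Computer System

[Diagram: a computer system (CPU, disks) drawn as an open/closed queueing network, labeled with arrivals (A) at rate λ, queue time W, service time S, residence time R, throughput X and completions (C)]

A   Arrival count
λ   Arrival rate (A/T)
W   Time spent in queue
R   Residence time (W+S)
S   Service time
X   System throughput (C/T)
C   Completed tasks count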

Arrival Rate (λ)

• Pretty straightforward

• Requests per second/hour/day

• Not the same as throughput (X)

• Although in a steady state:

• A = C as T →∞

• λ = X
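The definitions above can be sketched in a few lines of Perl. This is only an illustration: the counts and the observation period below are made-up values, not measurements from the talk.

```perl
use strict;
use warnings;

# Hypothetical measurement window: all three numbers are invented.
my $T = 3600;      # observation period, in seconds
my $A = 36_000;    # arrivals counted during T
my $C = 35_990;    # completions counted during T

my $lambda = $A / $T;    # arrival rate (A/T)
my $X      = $C / $T;    # system throughput (C/T)

# In a steady state A and C converge, so this gap shrinks as T grows.
my $gap = abs($A - $C) / $A;

printf "lambda = %.3f req/s, X = %.3f req/s, relative gap = %.5f\n",
    $lambda, $X, $gap;
```

A persistent gap between λ and X over long windows would suggest that requests are piling up or being dropped somewhere.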

Service Time (S)

• Time spent in processing

• Web server response time

• Total query time

• IO operation time length

Mythical Performance

• Not gonna happen...

• Don’t believe the vendor’s sales pitch

• “In God we trust, all others must bring data” - William Edwards Deming

[Chart: Service Time (s), 0 to 0.6, plotted against Arrival Rate (hits/s), 0 to 45]


How to measure ?

• Apache: %D in mod_log_config

• nginx: $request_time in HttpLogModule

• use Benchmark;

• tcprstat - http://goo.gl/0cbYx

• collectd - http://goo.gl/OXKG7

• metrics - http://goo.gl/gQFVM

• sysstat - http://goo.gl/2aLul
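For code you control, the `use Benchmark;` item can be sketched as below. The `handle_request` sub is a hypothetical stand-in for real work, and the iteration count is arbitrary; the point is only that wallclock time divided by iterations gives a rough service time.

```perl
use strict;
use warnings;
use Benchmark qw(:hireswallclock timeit timestr);

# Hypothetical unit of work standing in for a request handler.
sub handle_request {
    my $sum = 0;
    $sum += $_ for 1 .. 10_000;
    return $sum;
}

my $iterations = 200;    # arbitrary choice for the sketch
my $t = timeit($iterations, \&handle_request);

# Wallclock seconds divided by iterations approximates the service time S.
my $service_time = $t->real / $t->iters;

print "Benchmark: ", timestr($t), "\n";
printf "Approximate service time: %.6f s per call\n", $service_time;
```

This measures the code in isolation, so it says nothing about queueing; it is a lower bound on what clients will see under load.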


How to measure ?

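[02/Jul/2010:14:00:18... 1863

The last field (1863) is the time to serve the request, in μseconds.

# Read the access log line by line; skip lines that do not match
while (<>) {
    # $date has one-second resolution, $svctime is the trailing %D field
    my ($date,$svctime) = (m/\[(\S+).+?\s(\d+)$/) or next;
    $arrivalRate{$date}++;
    $serviceTimeAcc{$date} += $svctime;
}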

use Chart::Clicker;

Average Hits/s = 65.142

Average Svc time = 0.0159 s
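A minimal sketch of how such averages could be derived from the per-second hashes. The three log lines are invented for illustration, and averaging only over seconds that saw at least one hit is an assumption of this sketch, not something the slides state.

```perl
use strict;
use warnings;

# Hypothetical access-log lines: timestamp in brackets, %D (microseconds) last.
my @log = (
    '10.0.0.1 - - [02/Jul/2010:14:00:18 -0300] "GET / HTTP/1.1" 200 512 1863',
    '10.0.0.2 - - [02/Jul/2010:14:00:18 -0300] "GET /a HTTP/1.1" 200 128 2137',
    '10.0.0.3 - - [02/Jul/2010:14:00:19 -0300] "GET /b HTTP/1.1" 200 256 3000',
);

my (%arrivalRate, %serviceTimeAcc);
for (@log) {
    my ($date, $svctime) = m/\[(\S+).+?\s(\d+)$/ or next;
    $arrivalRate{$date}++;                 # hits seen in this second
    $serviceTimeAcc{$date} += $svctime;    # microseconds spent in this second
}

my ($hits, $usec) = (0, 0);
$hits += $_ for values %arrivalRate;
$usec += $_ for values %serviceTimeAcc;

my $seconds     = keys %arrivalRate;            # seconds with at least one hit
my $avg_hits    = $hits / $seconds;             # average arrival rate, hits/s
my $avg_svctime = $usec / $hits / 1_000_000;    # average service time, seconds

printf "Average Hits/s = %.3f\nAverage Svc time = %.4f\n",
    $avg_hits, $avg_svctime;
```

The per-second hashes are also what a time series chart would be drawn from, so the same pass over the log serves both purposes.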

What to look for ?

• Stretch factor

• Method/Operation

• Geolocation

• Cookies

• Use mod_logio to measure inbound traffic as well

Modeling

Prediction is very difficult, especially if it’s about the future.

Niels Bohr

Capacity planning is about setting expectations. Even wrong expectations are better than no expectations!

Neil J. Gunther - The Guerrilla Manifesto

http://goo.gl/lZKWH


Modeling

• A model is an abstraction of a complex system

• A model allows us to observe phenomena that cannot be easily replicated

Modeling Methods

• Statistics / Trending / Forecasting

• Pros:

• Easy to understand

• Tools readily available

• Cons:

• Hard to create “What-if” scenarios

• Hard to predict contention and bottlenecks
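As an illustration of the trending approach, here is a plain least-squares line fitted to a week of daily peaks. The data points are invented, and a straight line is the simplest possible assumption; it knows nothing about contention, which is exactly the weakness listed above.

```perl
use strict;
use warnings;

# Hypothetical daily peak hits/s for one week (made-up data).
my @peak = (52, 55, 57, 61, 62, 66, 68);

# Ordinary least-squares fit of y = intercept + slope * x, x = day index.
my $n = @peak;
my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
for my $x (0 .. $n - 1) {
    my $y = $peak[$x];
    $sx  += $x;
    $sy  += $y;
    $sxx += $x * $x;
    $sxy += $x * $y;
}
my $slope     = ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx ** 2);
my $intercept = ($sy - $slope * $sx) / $n;

# Extrapolate to day index 30.
my $forecast = $intercept + $slope * 30;

printf "trend: %.2f + %.2f * day, day 30 forecast: %.1f hits/s\n",
    $intercept, $slope, $forecast;
```

The forecast answers "how much traffic", not "how will the system respond to it"; that second question is where queueing models come in.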

Modeling Methods

• Queuing Analysis

• Pros:

• Allows you to make predictions when no production data is available

• Allows you to create “What-if” scenarios

• Cons:

• Sometimes it can be unintuitive

• The math behind it can be difficult
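A small "what-if" sketch, assuming the simplest textbook case of a single M/M/1 queue, where residence time is R = S / (1 - λS). The service time and arrival rates are hypothetical; the slides do not use this exact model.

```perl
use strict;
use warnings;

# Residence time of a single M/M/1 queue: R = S / (1 - rho), rho = lambda * S.
# Returns nothing (undef in scalar context) when the queue is saturated.
sub mm1_residence {
    my ($lambda, $S) = @_;
    my $rho = $lambda * $S;
    return if $rho >= 1;
    return $S / (1 - $rho);
}

my $S = 0.010;    # hypothetical service time: 10 ms
for my $lambda (10, 50, 80, 90, 99) {    # hypothetical arrival rates, req/s
    my $R = mm1_residence($lambda, $S);
    printf "lambda = %3d req/s  rho = %.2f  R = %.4f s\n",
        $lambda, $lambda * $S, $R;
}
```

Running it shows the unintuitive part: residence time barely moves at low utilization and then climbs steeply as utilization approaches 1, even though the service time never changes.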

Queues as models

Typical LAMP Stack

[Diagram: Clients exchange requests and replies with a chain of queues: Apache, Application, Database]

Queues as models

What if ?

[Diagram: the same circuit (Clients, Apache, Application, Database, requests and replies) with a Cache added]

Queues as models

What happens if we use a 15k RPM disk ?

[Diagram: a CPU queue feeding a 10k RPM disk queue]

Queues as models

m1.small ? m1.large ? m1.xlarge ?

[Diagram: queues for the memory bus and for virtual cores × EC2 CU]

use pdq;

• Available at http://goo.gl/s98wQ (not on CPAN)

• PDQ is a queuing circuit solver by Neil J. Gunther

• There’s a whole book about it http://goo.gl/9MA2c


use pdq;

CreateNode() Define a queuing center

CreateOpen() Define a traffic stream of an open circuit

CreateClosed() Define a traffic stream of a closed circuit

SetDemand() Define the service demand for each of the queuing centers

use pdq;

Node Types

CEN Queuing Center

DLY Delay Center

use pdq;

Service Disciplines

FCFS First-come first-served

LCFS Last-come first-served

ISRV Infinite Server

PSHR Processor Sharing

use pdq;

• Apache Web Server

• Average Network RTD: 0.00921 seconds

• Added as a delay center in the circuit

• Average Arrival Rate: 65.142 hits/s

• Average Service time: 0.0159 seconds

• 128 worker threads

use pdq;

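$workload = "httpd";

$httpMaxClient = 128;     # worker threads

pdq::Init("web server");

$arrivalRate = 65.142;    # hits/s

# Seconds. The slides above quote 0.0159, but the report's bounds
# (128 / 0.1159 = 1104.4 Trans/Sec) match this value.
$serviceTime = 0.1159;

$pdq::streams = pdq::CreateOpen($workload, $arrivalRate);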

pdq::Report();

Metric                 Value    Unit
------                 -----    ----
Workload: "httpd"
Number in system      8.0279    Trans
Mean throughput      65.1420    Trans/Sec
Response time         0.1232    Sec
Stretch factor        1.0626

pdq::Report();

Bounds Analysis:
Max throughput     1104.4003    Trans/Sec
Min response          0.1160    Sec
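The report figures can be cross-checked by hand without PDQ. The sketch below is a guess at a model that happens to reproduce them, not the author's script: it assumes the 0.1159 s demand is spread evenly over 128 independent queueing centers, and that the 0.00921 s network delay is scaled the same way and added as a pure delay. Neither assumption appears on the slides.

```perl
use strict;
use warnings;

# Inputs taken from the code slide.
my $lambda  = 65.142;    # arrival rate, hits/s
my $S       = 0.1159;    # service time, seconds
my $workers = 128;       # httpMaxClient

# ASSUMPTIONS (not stated on the slides): demand split evenly across the
# workers, and the network delay divided by the same factor.
my $D     = $S / $workers;          # service demand per queueing center
my $delay = 0.00921 / $workers;     # assumed delay-center demand

my $rho     = $lambda * $D;                            # utilization per center
my $R       = $workers * $D / (1 - $rho) + $delay;     # response time
my $Rmin    = $workers * $D + $delay;                  # response with no queueing
my $N       = $lambda * $R;                            # Little's law: N = lambda * R
my $Xmax    = 1 / $D;                                  # bound set by the busiest center
my $stretch = $R / $Rmin;                              # response time over its minimum

printf "Number in system %10.4f Trans\n",     $N;
printf "Response time    %10.4f Sec\n",       $R;
printf "Stretch factor   %10.4f\n",           $stretch;
printf "Max throughput   %10.4f Trans/Sec\n", $Xmax;
printf "Min response     %10.4f Sec\n",       $Rmin;
```

Under these assumptions the five values round to the ones in the report, which also shows how the stretch factor relates to the other metrics: it is the response time divided by the minimum response time.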

pdq::Report();

• Average request size: 145 KBytes

• ~ 1160 Kbits

• @1104 transactions / second:

• 1,280,640 Kbits /s ~ 1.28 Gbps
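The bandwidth arithmetic above, written out as a sketch. Both inputs come from the slides; treating 1 Gbps as 1,000,000 Kbits/s (decimal units) is an assumption of this example.

```perl
use strict;
use warnings;

my $request_kbytes = 145;     # average request size, from the slide
my $max_tps        = 1104;    # max throughput bound, rounded as on the slide

my $request_kbits = $request_kbytes * 8;           # Kbits per request
my $kbits_per_sec = $request_kbits * $max_tps;     # aggregate Kbits/s
my $gbps          = $kbits_per_sec / 1_000_000;    # decimal Gbps

printf "%d Kbits/request, %d Kbits/s, ~%.2f Gbps\n",
    $request_kbits, $kbits_per_sec, $gbps;
```

So at the modeled throughput bound the network link, not the worker pool, may become the limiting resource on a 1 Gbps interface.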

Resources and References

• CMG Public Proceedings: http://www.cmg.org/proceedings/

• Measure IT: http://www.cmg.org/measureit/

• Guerrilla Capacity Planning: http://www.perfdynamics.com/Classes/Outlines/guerilla.html


• Performance by Design - Menasce, Dowdy, Almeida - http://amzn.to/mpqfVO

• Capacity Planning for Web Performance: Metrics, Models, and Methods - Daniel Menasce, Virgilio Almeida - http://amzn.to/lOATba

• Capacity Planning for Web Services: Metrics, Models, and Methods - Daniel Menasce, Virgilio Almeida - http://amzn.to/iClpsB


• Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services - Neil Gunther - http://amzn.to/kfrfLK

• The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling - R. K. Jain - http://amzn.to/jqud1I


Any questions ?
