cmg 101 - understanding performance

Post on 13-Dec-2014

DESCRIPTION

Web performance is good, understanding performance is better. This deck covers what you need to understand in order to have IT systems that perform well at a reasonable cost.

TRANSCRIPT

Performance is good, Understanding performance is better

Peter HJ van Eijk, Chairman NLCMG

A non-profit community of professionals

Feb 11, 2012

CMG 101

Computer Cloud Measurement Group

Understand:
• Definitions of availability and response time
• Psychological and business effect of delay/response time. User interfaces, cost of downtime
• Transactions, and their structure
• Waterfall diagrams for transactions and web page downloads
• Performance measures (seconds, bytes, bits per second, IOPS, etc.)
• Reporting measures / metrics
• Visualization of quantitative data, how to
• Resources (CPU, memory, disk, network, software)
• Elementary queuing theory
• Phases in development and how to incorporate performance and capacity (analysis, design, etc.), performance engineering
• Typical free and commercial tools, or at least their functionality
– monitoring, reporting, alerting, analysis, modelling

Availability and Response Time

• Availability: Ability of a Configuration Item or IT Service to perform its agreed Function when required. […] Availability is usually calculated as a percentage.

• Response Time: A measure of the time taken to complete an Operation or Transaction
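The definition says availability is usually calculated as a percentage. A minimal sketch of that calculation, assuming the common form (agreed service time minus downtime, divided by agreed service time); the function name and the example figures are illustrative, not from the slides:

```python
def availability_percent(agreed_service_time: float, downtime: float) -> float:
    """Availability as a percentage of the agreed service time.

    Both arguments must use the same unit (for example hours).
    """
    if agreed_service_time <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime <= agreed_service_time:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return 100.0 * (agreed_service_time - downtime) / agreed_service_time


# Hypothetical example: a 30-day month (720 hours) with 3 hours of downtime.
print(f"{availability_percent(720, 3):.3f}%")
```

Note that the result depends heavily on what counts as agreed service time: planned maintenance windows are often excluded from it.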

Graphs of availability and response time

Psychological and business cost of downtime

€ + $ + £

Sudden surges can kill you

[Figure: pageviews over time, 1 January 2008 to 19 May 2009; vertical axis 0 to 700,000 pageviews; annotation: IceSave failure. Source: SiteStat]

KNMI.nl: pageviews per hour

[Figure: pageviews per hour (0 to 180,000) by hour of day (1 to 23); series: 30-dec and 31-dec; labels: ordinary day, weather alarm day]

Transactions and their structure, waterfall diagrams

[Figure: client/server exchange (Query, Ack, Reply, Ack), annotated with network latency and server turnaround time]

Yslow detail

A single user level transaction decomposes into multiple transactions on components

© Digital Infrastructures


Transactions: from visits to bandwidth

Visits → Pageviews → GET requests → Bandwidth

• Visits: 1.7 visits/sec (6,380 per hour)
• Pageviews: 13 pageviews/sec (47,338 per hour); 7.42 pageviews per visit (according to SiteStat), though lower during the crisis
• GET requests: 140 requests/sec; 79 GET per visit according to the log file and Sitestat; effectively 10.6 (= 79/7.42) GET per pageview; 32 GET for the homepage (according to the browser)
• Bandwidth: 0.95 Mbyte/sec (7.6 Megabit/sec); about 6,800 bytes per request on average
• Data sources: Sitestat measurements, HTTP server logs, page composition via FireBug
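The chain from visits to bandwidth on this slide is plain multiplication, and it can be checked with a few lines. A small sketch using the slide's own ratios; the function and key names are made up for illustration:

```python
def visits_to_bandwidth(visits_per_hour, pageviews_per_visit,
                        gets_per_visit, bytes_per_request):
    """Convert a visit rate into pageview, request and bandwidth rates."""
    visits_per_sec = visits_per_hour / 3600
    pageviews_per_sec = visits_per_sec * pageviews_per_visit
    requests_per_sec = visits_per_sec * gets_per_visit
    bytes_per_sec = requests_per_sec * bytes_per_request
    return {
        "visits_per_sec": visits_per_sec,
        "pageviews_per_sec": pageviews_per_sec,
        "requests_per_sec": requests_per_sec,
        "mbytes_per_sec": bytes_per_sec / 1e6,
        "megabits_per_sec": bytes_per_sec * 8 / 1e6,
    }


# Figures from the slide: 6,380 visits/hour, 7.42 pageviews/visit,
# 79 GET/visit, about 6,800 bytes per request.
rates = visits_to_bandwidth(6380, 7.42, 79, 6800)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the sketch reproduces the slide's rounded values (about 1.7 visits/sec, 13 pageviews/sec, 140 requests/sec, 0.95 Mbyte/sec, 7.6 Megabit/sec), which is a useful sanity check when the numbers come from different measurement sources.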

How to diagnose a problem, where to look? Resource = capacity

[Figure: end-to-end chain of resources. Components shown: (test) client, LAN switches, router/switch (CPE), WAN link, firewall/proxy, load balancer, HTTP front end, MySQL DB, SAN, NAS. Layers shown: users, application, server, network, network lines]

Example breakdowns

After the employee numbers have been requested (there are 373 Janssens), the employment details are requested one at a time (612 in total). On the GBO LAN this leads to a turnaround time of 30 sec (measured).

Based on a 50 msec round trip on the WAN
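Why a chatty design hurts on a WAN can be shown with one multiplication: every sequential request pays the round trip. A rough sketch, assuming each of the 612 detail requests is a separate sequential round trip (the slide does not state the exact number of round trips, so this is an assumption and likely a lower bound) and that LAN latency is negligible by comparison:

```python
def added_latency_seconds(sequential_round_trips: int, rtt_seconds: float) -> float:
    """Extra waiting time caused purely by network round trips."""
    return sequential_round_trips * rtt_seconds


DETAIL_REQUESTS = 612   # from the slide; assumed to be one round trip each
WAN_RTT = 0.050         # 50 msec round trip on the WAN, from the slide

extra = added_latency_seconds(DETAIL_REQUESTS, WAN_RTT)
print(f"extra network delay on the WAN: about {extra:.1f} sec")
```

Under these assumptions the WAN adds roughly half a minute of pure latency on top of the 30 sec measured on the LAN, before any bandwidth limit is taken into account.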

Resource contribution to response time, modeling different resource allocations

Modelling the effect of different network bandwidths on response time

[Figure: stacked bars of response time (0 to 500 sec) for GBO, ICTRO 2Mb, 256K and 64K; components: server time (sec), client time (sec), network time due to delay (sec), network time due to bandwidth (sec)]

Excessive client/server chatter leads to a user interaction time of more than 7 minutes!

How much faster will this be with:
• a very fast network?
• a very fast client?
• a very fast server?
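The question can be answered from the breakdown: making one component infinitely fast removes only that component's share of the response time. A minimal sketch; the component values below are hypothetical placeholders, not the measurements behind the chart:

```python
def best_case_speedup(components: dict, made_fast: str) -> float:
    """Upper bound on speedup if one component's time drops to zero."""
    total = sum(components.values())
    remaining = total - components[made_fast]
    if remaining <= 0:
        return float("inf")
    return total / remaining


# HYPOTHETICAL breakdown in seconds, for illustration only.
example = {
    "server": 20.0,
    "client": 10.0,
    "network_delay": 50.0,
    "network_bandwidth": 20.0,
}

for name in example:
    print(f"very fast {name}: at most {best_case_speedup(example, name):.2f}x faster")
```

The design point is that the largest component sets the ceiling: in this made-up breakdown a faster server buys at most 1.25x, while removing the network delay buys 2x.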

Queuing theory

[Figure: delay factor (0 to 12) versus utilisation (10% to 90%)]

• Response depends on capacity
• At higher loads, congestion can set in

[Figure: actual throughput versus traffic load, showing the perfect line, the congestion region and the sweet spot]
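The shape of the delay-factor curve can be reproduced with the simplest queuing model. The slides do not name the model, so this sketch assumes a single-server M/M/1 queue, where response time equals service time multiplied by 1 / (1 - utilisation):

```python
def delay_factor(utilisation: float) -> float:
    """Response time divided by service time for an M/M/1 queue (assumed model)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in the range [0, 1)")
    return 1.0 / (1.0 - utilisation)


for pct in range(10, 100, 10):
    print(f"{pct:3d}% utilisation -> delay factor {delay_factor(pct / 100):.1f}")
```

Under this assumption the factor is 2 at 50% utilisation and 10 at 90%: the last few percent of utilisation cost far more response time than the first fifty.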

So what was the bottleneck?

• KNMI: static page served from database 1000/sec
• Ministry: very chatty client/server interaction
• DNB: JSP application server serves static content
• Anne Frank: many, large digital assets, no use of CDN
• Hospital information system: client (front-end) code

How to incorporate performance in development and operations

Typical free and commercial tools and their functionality

Functionality
• Monitoring
• Reporting
• Alerting
• Analysis
• Modelling
• Etc …

Example tools
• Nagios
• Cacti
• WatchMouse
• PDQ
• R
• Yslow
• …

CMG 101
• We want to develop a ‘standard’ body of knowledge
– To educate our people
– Speak more of the same language
– Enable tool vendors to more easily express their offerings
• Note: defining what is in the course is not the same as developing a course

Call for Action

• Want to know more?
• Want to collaborate, contribute?
• Want to get a course?
• Want to sponsor?
• Talk to me

Peter HJ van Eijk, @petersgriddle
inbox@peterhjvaneijk.nl, +31 2268 4939
www.nlcmg.nl
NLCMG is a chapter of CMG.org

Some of my performance projects

• KNMI (Weather service): website meltdown after weather emergency (“weeralarm”)

• DNB (Dutch Banks Authority): website meltdown during 2008 financial crisis

• Unnamed Ministry: information system with multi-minute response times

• Crisis.nl: ….
• Anne Frank website: … anticipated surge after major redesign
• Hospital information system: storage sizing

http://zoom.nl/foto/1713577/portret/cloudwatch.html

Achtung alles Lookenspeepers! Nur watchen das Cloud.

What does a financial IT crisis look like?

Fernando’s office (bank’s capacity planner)
