
SHARE August 2011

1

Getting Started in (z/OS) Capacity Planning

(Topics in Capacity Planning)

Ray Wicks

561-236-5846

[email protected]

Biography

Ray has spent most of his career at IBM in the performance analysis and capacity planning end of the business in Poughkeepsie, London, and now at the Washington Systems Center. He is the major contributor to IBM’s internal PA & CP tool, zCP3000. This tool is used extensively by IBM services and technical support staff worldwide to analyze existing zSeries configurations (processor, storage, and I/O) and to make projections for capacity expectations.

Ray has given classes and lectures worldwide. He was a visiting scholar at the University of Maryland where he taught part time at the Honors College.

He won the Computer Measurement Group’s prestigious A.A. Michelson award in 2000. His recent virtual workshop, “Getting Started in Performance Analysis & Capacity Planning,” held for attendees in China and India, was well received.


Trade Marks, Copyrights & Stuff

Many terms are trademarks of different companies and are owned by them.

Some foils that appear in this presentation are not in the handout. This is to prevent you from looking ahead and spoiling my jokes and surprises. Also, some foils were added after I made the handouts.

Abstract

This tutorial is a two-part, introductory-level session designed to introduce the student to the concepts required for Performance Analysis and Capacity Planning.

Emphasis is placed on large processor systems and examples will be largely drawn from z/OS, but the concepts apply to all operating systems and hardware. Topics:

• Conceptual and Perceptual structures for performance analysis and capacity planning
• Using the Forced Flow Law in PA & CP
• Performance Analysis queries for capacity planning
• Processor performance data (ITRRs & MIPS)
• Resource metrics for use in the Balanced System model
• Using the utilization growth process in capacity planning
• A simple view of analytic modeling in CP


Philosophy

In this presentation I have stressed the principles involved in performance analysis and capacity planning rather than the specific development of formulae and hard details in the hardware and software architectures and implementations. The formulae can be found in many textbooks. It is the intuitions behind the formulae which are important. There is a difference between the formula and understanding what the formula is telling you.

The architectures and implementations will change over time. It is knowing how to look at and evaluate a system that counts. If you want hardware and software details, you look to the vendors for that.

Capacity Planning

Capacity Planning ensures that Adequate resources are available for the Workload to complete in an Appropriate time.

• Performance Analysis is Short Term (3-7 Days)
• Capacity Planning is Long Term (6-24 Months)
• What's the Workload using now?
• What’s Appropriate? Adequate?
• Service Level Objective or Service Level Agreement
• Things are ordered by Priority or Importance
• Discretionary Workloads?


CP Questions

Easy
• Do I have enough resource (CPU, I/O, Storage, ...) to do the job today?
• If not, who’s suffering?
• If I get more, who will be helped? How much?
• If I need more, when will it be? How much more?
• Can I use specialized Processing Units?
• What variables should I track?
• Do I have any latent demand?

Harder
• Do I want faster or more CPs?
• How do I establish my growth?
• How do I size a new application?
• What tools should I use?
• Which interval do I model?
• If I reduce the #CPs & keep the MIPS the same, will there be a problem?

CP Flow

• Knowledge Preparation: Understand Environment; Workload Characterization; Business Units; Application Programs; H/W Resources
• Model Preparation: Data Acquisition; Representative; Calibration
• Model Execution: Workload Forecasting
• Modeling Output: Future Requirements; H/W & S/W; Speeds & Feeds; Architecture
• Business Negotiation: Investment & Configuration

Data Model: λ, Traffic, I/O, #clients, etc.

Business Model: Forecast, Cost


Topology

[Figure: network topology with Client, LAN and WAN (IBM 6611 routers), Local Server, Front End Server, and Back End DB Server (z/OS)]

For each node: # Servers? Utilization? Service time? Queuing? Response Time? Traffic? Etc.?

--------- PARTITION DATA ----------------- -- LOGICAL PARTITION PROCESSOR DATA -- -- AVERAGE PROCESSOR UTILIZATION PERCENTAGES -

----MSU---- -CAPPING-- PROCESSOR- ----DISPATCH TIME DATA---- LOGICAL PROCESSORS --- PHYSICAL PROCESSORS --

NAME S WGT DEF ACT DEF WLM% NUM TYPE EFFECTIVE TOTAL EFFECTIVE TOTAL LPAR MGMT EFFECTIVE TOTAL

AQFT A 215 0 522 NO 0.0 11.0 CP 03.59.05.986 03.59.50.403 72.45 72.68 0.06 18.98 19.03

VICTEST A 3 0 3 NO 0.0 5 CP 00.01.24.297 00.01.35.116 0.94 1.06 0.01 0.11 0.13

VMTOOL1 A 315 0 549 NO 0.0 16 CP 04.09.50.974 04.12.27.823 52.05 52.60 0.21 19.83 20.04

AQCF1 A DED 0 65 0.0 1 CP 00.29.59.893 00.29.59.929 99.99 100.0 0.00 2.38 2.38

AQHO A 2 0 3 NO 0.0 5 CP 00.01.23.451 00.01.26.815 0.93 0.96 0.00 0.11 0.11

AQLINX A 1 0 0 NO 0.0 2 CP 00.00.00.728 00.00.00.749 0.02 0.02 0.00 0.00 0.00

HOCF4 A DED 0 65 0.0 1 CP 00.29.59.895 00.29.59.938 99.99 100.0 0.00 2.38 2.38

GDLVM7 A 32 0 80 NO 0.0 3 CP 00.35.50.251 00.36.45.931 39.82 40.85 0.07 2.84 2.92

POKVMXA1 A 32 0 390 NO 0.0 6 CP 02.59.23.002 02.59.25.407 99.66 99.68 0.00 14.24 14.24

*PHYSICAL* 00.06.40.226 0.53 0.53

------------ ------------ ----- ----- -----

TOTAL 12.46.58.480 12.58.12.342 0.89 60.87 61.76

-

LNXVM14 A 4 2 IFL 00.01.30.569 00.01.35.358 2.52 2.65 0.13 2.52 2.65

*PHYSICAL* 00.00.17.265 0.48 0.48

------------ ------------ ----- ----- -----

TOTAL 00.01.30.569 00.01.52.624 0.61 2.52 3.13

-

AQFT A 215 2 IIP 00.00.02.237 00.00.02.271 0.06 0.06 0.00 0.06 0.06

AQHO A 2 2 IIP 00.00.01.500 00.00.01.581 0.04 0.04 0.00 0.04 0.04

*PHYSICAL* 00.00.01.220 0.03 0.03

------------ ------------ ----- ----- -----

TOTAL 00.00.03.737 00.00.05.072

CPU 2097 CPC CAPACITY 2740 SEQUENCE CODE 0000000000019F30

MODEL 742 CHANGE REASON=N/A HIPERDISPATCH=YES

H/W MODEL E56

0---CPU------------------- TIME % ----------------

LOG PROC --I/O INTERRUPTS--

NUM TYPE ONLINE LPAR BUSY MVS BUSY PARKED SHARE % RATE % VIA TPI

0 CP 100.00 76.79 76.77 0.00 100.0HIGH 52.40 34.86

1 CP 100.00 59.44 59.34 0.00 100.0HIGH 148.9 29.38

2 CP 100.00 77.13 77.11 0.00 100.0HIGH 44.24 32.59

3 CP 100.00 71.92 71.90 0.00 100.0HIGH 41.70 33.45

4 CP 100.00 77.94 77.91 0.00 100.0HIGH 49.20 35.32

5 CP 100.00 57.96 57.87 0.00 100.0HIGH 133.1 30.26

6 CP 100.00 78.32 78.30 0.00 100.0HIGH 42.10 32.89

7 CP 100.00 73.52 73.50 0.00 100.0HIGH 37.48 33.56

8 CP 100.00 76.44 76.42 0.00 100.0HIGH 16.37 34.00

9 CP 100.00 76.55 76.53 0.00 100.0HIGH 14.57 33.06

A CP 100.00 73.47 73.45 0.00 100.0HIGH 2446 4.03

TOTAL/AVERAGE 72.68 72.65 1100 3026 9.36

B IIP 100.00 0.10 0.10 0.00 100.0 HIGH

C IIP 100.00 0.02 0.02 0.00 98.2 MED

TOTAL/AVERAGE 0.06 0.06 198.2

Lots of Data

The SAS System

Model: MODEL1
Dependent Variable: CPU

Analysis of Variance

Source    DF  Sum of Squares  Mean Square  F Value  Prob>F
Model      1      9496.97929   9496.97929  648.634  0.0001
Error     71      1039.54729     14.64151
C Total   72     10536.52658

Root MSE   3.82642    R-square  0.9013
Dep Mean  31.40685    Adj R-sq  0.8999
C.V.      12.18340

Parameter Estimates

Variable  DF  Parameter Estimate  Standard Error  T for H0: Parameter=0  Prob > |T|
INTERCEP   1            1.045116      1.27348468                  0.821      0.4146
DASD       1            0.072051      0.00282903                 25.468      0.0001

Num_Samples  Date       Time      Duration  Box_ID   CPU_Model  CPS  ICFS  IFLS  zAAPS  SUPRV  LPAR_Name  SYS
900          3/26/2005  23:45:00  899.984   PLEXS01  9672-R26   2    0     0     0      LPAR   PHYSICAL   PHYSICAL
900          3/26/2005  23:45:00  899.984   PLEXS01  9672-R26   2    0     0     0      zOS    PRODONLE   PPPP
900          3/26/2005  23:45:00  899.984   PLEXS01  9672-R26   2    0     0     0      zOS    PRODBAT    DDDD
900          3/26/2005  23:45:00  899.984   PLEXS01  9672-R26   2    0     0     0      zOS    SPTEST     SSSS
900          3/27/2005  0:00:00   900.053   PLEXS01  9672-R26   2    0     0     0      LPAR   PHYSICAL   PHYSICAL
900          3/27/2005  0:00:00   900.053   PLEXS01  9672-R26   2    0     0     0      zOS    PRODONLE   PPPP
900          3/27/2005  0:00:00   900.053   PLEXS01  9672-R26   2    0     0     0      zOS    PRODBAT    DDDD
900          3/27/2005  0:00:00   900.053   PLEXS01  9672-R26   2    0     0     0      zOS    SPTEST     SSSS
900          3/27/2005  0:15:00   899.919   PLEXS01  9672-R26   2    0     0     0      LPAR   PHYSICAL   PHYSICAL
900          3/27/2005  0:15:00   899.919   PLEXS01  9672-R26   2    0     0     0      zOS    PRODONLE   PPPP
900          3/27/2005  0:15:00   899.919   PLEXS01  9672-R26   2    0     0     0      zOS    PRODBAT    DDDD
900          3/27/2005  0:15:00   899.919   PLEXS01  9672-R26   2    0     0     0      zOS    SPTEST     SSSS
900          3/27/2005  0:30:00   900.001   PLEXS01  9672-R26   2    0     0     0      LPAR   PHYSICAL   PHYSICAL
900          3/27/2005  0:30:00   900.001   PLEXS01  9672-R26   2    0     0     0      zOS    PRODONLE   PPPP
900          3/27/2005  0:30:00   900.001   PLEXS01  9672-R26   2    0     0     0      zOS    PRODBAT    DDDD
900          3/27/2005  0:30:00   900.001   PLEXS01  9672-R26   2    0     0     0      zOS    SPTEST     SSSS
900          3/27/2005  0:45:00   900.025   PLEXS01  9672-R26   2    0     0     0      LPAR   PHYSICAL   PHYSICAL
900          3/27/2005  0:45:00   900.025   PLEXS01  9672-R26   2    0     0     0      zOS    PRODONLE   PPPP
900          3/27/2005  0:45:00   900.025   PLEXS01  9672-R26   2    0     0     0      zOS    PRODBAT    DDDD
900          3/27/2005  0:45:00   900.025   PLEXS01  9672-R26   2    0     0     0      zOS    SPTEST     SSSS
900          3/27/2005  8:00:00   900.026   PLEXS01  9672-R26   2    0     0     0      LPAR   PHYSICAL   PHYSICAL
900          3/27/2005  8:00:00   900.026   PLEXS01  9672-R26   2    0     0     0      zOS    PRODONLE   PPPP
900          3/27/2005  8:00:00   900.026   PLEXS01  9672-R26   2    0     0     0      zOS    PRODBAT    DDDD
900          3/27/2005  8:00:00   900.026   PLEXS01  9672-R26   2    0     0     0      zOS    SPTEST     SSSS
900          3/27/2005  8:15:00   900.01    PLEXS01  9672-R26   2    0     0     0      LPAR   PHYSICAL   PHYSICAL
900          3/27/2005  8:15:00   900.01    PLEXS01  9672-R26   2    0     0     0      zOS    PRODONLE   PPPP
900          3/27/2005  8:15:00   900.01    PLEXS01  9672-R26   2    0     0     0      zOS    PRODBAT    DDDD


Frameworks

Conceptual Framework (Model, Paradigm) What's it supposed to mean? How do things connect?

Perceptual Framework What's it supposed to look like? How do things connect?


A Plumbing Problem

[Figure: three pumps in a piping network, with the total fluid flow ΣF marked at both ends]

Fluid Flow = Forced Flow Law


Conceptual Framework: Forced Flow Law

[Figure: three service centers (Service Center 1, 2, and 3), each with a Waiting queue and a Service stage. One boundary is crossed by 5 transactions/minute and another by 15 transactions/minute.]

If the model is divided in two and the number of transactions crossing the boundaries is as indicated, what would the Forced Flow Law say?
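One way to reason about the question above is the usual operational-analysis statement of the Forced Flow Law: the throughput at a service center equals the system throughput times the number of visits each transaction makes to that center. The sketch below assumes that form and assumes the figure means 5 transactions/minute cross the system boundary while 15/minute cross a service-center boundary; the function names are illustrative, not taken from the presentation.

```python
def center_throughput(system_throughput, visits_per_transaction):
    """Forced Flow Law: X_k = V_k * X (same time unit for both rates)."""
    return system_throughput * visits_per_transaction


def visit_ratio(center_rate, system_rate):
    """Visits per transaction implied by two measured flow rates."""
    if system_rate <= 0:
        raise ValueError("system_rate must be positive")
    return center_rate / system_rate


# Assumed reading of the figure: 5/minute at the system boundary,
# 15/minute at the internal boundary.
visits = visit_ratio(15.0, 5.0)
print(f"Each transaction visits the inner service center {visits:.0f} times")
```

Under that reading, the two flows are not independent: once the visit ratio is known, measuring one rate forces the other.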



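[Figure: three service centers, each with Waiting and Service stages; the total times in the three service centers are 1 second, 15 minutes, and 1 second]

If the total time in the service center is as indicated, where would the users/transactions be?

B.S. Conceptual Framework

[Figure: Balanced System model with nodes Thinking, CPU, Memory, Disk, Disk, and GKWE]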


Conceptual Framework Implications

• Users distribute themselves among nodes in proportion to the time spent at each node.
• The capacity of the System is determined by the slowest node (server).
• The resource usage (transaction rates) at various nodes are in proportion.
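The first implication can be illustrated with the service-center times from the earlier question (1 second, 15 minutes, 1 second). This is a minimal sketch assuming a fixed population that cycles through the three centers once per pass, so that Little's Law puts users at each center in proportion to the time spent there; the population of 100 users is a made-up illustration value.

```python
def distribute_users(total_users, seconds_at_each_node):
    """Split a fixed population across nodes in proportion to time spent."""
    cycle_time = sum(seconds_at_each_node)
    return [total_users * t / cycle_time for t in seconds_at_each_node]


# 1 second, 15 minutes (900 seconds), 1 second; 100 users is hypothetical.
shares = distribute_users(100, [1.0, 900.0, 1.0])
for node, users in zip(("Center 1", "Center 2", "Center 3"), shares):
    print(f"{node}: {users:.2f} users on average")
```

Nearly everyone is found at the slow center, which is also the node that limits the capacity of the system.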

[Figure: Balanced System model with nodes Thinking, CPU, Memory, DASD, DASD, and GKWE]

2010 z/OS B.S. Metrics

• MIPS used
• S = SSCH rate
• DG = DASD gigabytes. The computation is nominal in that it is 2.83/act
• PS = Central Storage configured

Ratio     10%     50%     90%
S/MIPS    1.201   2.349   3.707
DG/MIPS   2.236   6.593   52.539
PS/MIPS   6.766   14.055  37.306

[Figure: Balanced System model (Thinking, CPU, Memory, DASD, DASD, GKWE) annotated with MIPS, PS, S, and DG]


S/MIPS Samples

[Scatter plot: S (SSCH rate) versus MIPS Used, with linear fit y = 2.1215x, R² = 0.8727]

MIPS Used and SSCH are in proportion (correlated). Good linear relationship.

PS/MIPS

[Scatter plot: PS versus MIPS Used, with linear fit y = 15.948x - 4176.7, R² = 0.5913]

MIPS Used and PS are not in proportion (correlated).


DG/MIPS

[Scatter plot: DG versus MIPS Used, with linear fit y = 3.5481x, R² = -0.9825]

DG/MIPS

[Scatter plot: DG and DG Used versus MIPS Used, with linear fits y = 3.5481x, R² = -0.9825 and y = 1.0133x + 344.16, R² = 0.7620]


Archimedes

Give me a fixed point and I can move the world.

Desperate Capacity Planning

What part of a 500 MIPS machine would DB2 use at 100 I/Os per second? How many MIPS for DB2?

S/MIPS ∈ [1.201, 2.349, 3.707]

100/MIPS = 1.201 → MIPS = 83.26
100/MIPS = 2.349 → MIPS = 42.57

Answer: Between 8.5% and 16.6% (????) of a 500 MIPS machine
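The arithmetic on this slide can be sketched as follows. The S/MIPS percentiles are the ones quoted above; the function name and dictionary layout are illustrative assumptions.

```python
# S/MIPS (SSCH rate per MIPS used) percentiles from the 2010 sample.
S_PER_MIPS = {"10%": 1.201, "50%": 2.349, "90%": 3.707}


def mips_for_io_rate(io_per_second, s_per_mips):
    """Invert the ratio S/MIPS to estimate the MIPS behind an I/O rate."""
    return io_per_second / s_per_mips


machine_mips = 500.0
for percentile in ("10%", "50%"):
    mips = mips_for_io_rate(100.0, S_PER_MIPS[percentile])
    share = 100.0 * mips / machine_mips
    print(f"S/MIPS at {percentile}: {mips:.2f} MIPS, {share:.1f}% of the machine")
```

The spread between the two answers is the point of the "(????)": a ratio taken from a population of systems gives a range, not a measurement of this particular DB2 workload.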


More Metrics Defined

• MIPS used
• S = SSCH rate
• DG = DASD gigabytes. The computation is nominal in that it is 2.83/act: DG = Nacts * 2.83
• CS = Central Storage configured
• D as in CS = 4000 + 0.04(MIPS)^D, where CS (or PS) is configured Central Storage (or RAM)
• NNTacts = acts with rate >= 2
• AD = Access Density = S/DG

Full Metrics: 2010, 83 Partitions

Metric      10%     50%     90%
MIPS        403     2004    6247
S           1123    4221    11779
S/MIPS      1.201   2.349   3.707
DG/MIPS     2.236   6.593   52.539
PS/MIPS     6.766   14.055  37.306
D           1.534   1.630   1.894
DASD Resp   0.952   1.865   3.626
DASD Serv   0.681   1.564   2.980
Resp/Serv   1.227   1.827   1.232
Nacts       1305    4084    12105
NNTacts     73      264     882
DASD GB     3693    11558   34256
Used DG     585     2114    7065
AD          0.059   0.360   0.909
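Two of the definitions above are simple enough to check against the table. The sketch below computes the nominal DASD gigabytes from the number of acts (DG = Nacts * 2.83) and the access density (AD = S/DG), using the 50% column as input. The function names are illustrative; note that each table row is a percentile taken on its own, so a value computed from two median inputs need not equal the median of the ratio.

```python
GB_PER_ACT = 2.83  # nominal gigabytes per act, as stated in the definitions


def dasd_gigabytes(n_acts):
    """DG = Nacts * 2.83"""
    return n_acts * GB_PER_ACT


def access_density(ssch_rate, dasd_gb):
    """AD = S / DG (SSCH per second per gigabyte)"""
    return ssch_rate / dasd_gb


dg = dasd_gigabytes(4084)         # 50% Nacts
ad = access_density(4221.0, dg)   # 50% S
print(f"DG = {dg:.0f} GB, AD = {ad:.3f}")
```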


If the Model Works, What Should I See?

[Scatter plot: DASD I/O versus CPU%, with linear fit y = 60.941x + 655.76, R² = 0.8377]

If the model is valid, the Forced Flow Law would prescribe that the variation of CPU service would be proportional to the I/O service. In this graph that translates into a linear relationship (R² > 0.7?).
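A check like this can be run on your own interval data with an ordinary least-squares fit. The sketch below is plain Python with no statistics package; the sample CPU% and DASD I/O values are made up purely for illustration, and the 0.7 threshold is the tentative one suggested above.

```python
def linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical interval samples (not from the presentation).
cpu_pct = [10, 20, 30, 40, 50, 60, 70]
dasd_io = [1300, 1850, 2500, 3100, 3700, 4300, 4950]

slope, intercept, r2 = linear_fit(cpu_pct, dasd_io)
verdict = "roughly proportional" if r2 > 0.7 else "model questionable"
print(f"I/O = {slope:.1f} * CPU% + {intercept:.1f}, R2 = {r2:.3f} ({verdict})")
```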

When It Doesn’t Work

[Scatter plot: DASD I/O versus CPU%, with linear fit y = 37.324x, R² = 0.113]

If the model doesn’t work, you would see a non-linear relationship.


RIOC = F(Workload Composition)

[Figure: two plots by time of day: RIOC versus Time, and CPU% by workload (BATDEV2, BATDEV1, BATPROD, STCLO, STCMD, STCHI, SYSSTC, SYSTEM)]

Compare those intervals in the RIOC plot where the RIOC is stable or unstable with the workload mixture for the same intervals in the CPU% plot. The workload mixture often explains the RIOC variation.

Use of Conceptual Framework like BS Model

• Establishes Relationships: Balanced System, resource ratios
• Builds Expectations: Linear graph
• Highlights Exceptions: Non-Linearity, Outliers
• Generates Questions: Especially if not as expected

But... it may cause you to see Framework (model) interactions that just aren't there!


Another Example

[Chart: Partition GCP MIPS by time of day, for CEC7/SYSJ, CEC7/SYSG, CEC4/SYSK, CEC4/SYSR, CEC4/SYSF, CEC6/SYSE, CEC6/SYSH, CEC6/SYSD]

[Chart: zIIP MIPS by time of day, for CEC7/SYSJ, CEC7/SYSG, CEC4/SYSR, CEC4/SYSF, CEC6/SYSH, CEC6/SYSD]

CEC 6 Util

[Chart: CEC6 GCP Util by time of day, for SYSD, SYSH, SYSE]

[Chart: CEC 6 zIIP Util by time of day, for Potential, SYSD, SYSH]


SYSD zIIP = F(GCP)?

[Scatter plot: zIIP MIPS for SYSD; R² = 0.02]

Workload zIIP = F(GCP)?


Fuzzy Thinkers Do Not Trust Perfection

[Scatter plot: zIIPs versus GCPs, with linear fit y = 1.9259x, R² = 1]

SYSD Workload zIIP = F(GCP)?

[Scatter plot: zIIPs versus GCPs, with linear fit y = 0.0844x, R² = 0.9047, and power fit y = 0.1037x^0.9262, R² = 0.9805]


CEC 7 Util

[Chart: CEC 7 GCP Util by time of day, for SYSG, SYSJ]

[Chart: CEC7 zIIP Util by time of day, for Potential, SYSG, SYSJ]

Another Example

[Scatter plot: I/O versus MIPS, with linear fit y = 2.4545x, R² = 0.3726]


Same Data

[Chart: IO/MIPS by time of day, 0:00 to 23:00]

[Figure: Balanced System model with nodes Thinking, CPU, Memory, DASD, DASD, and GKWE]

Maybe we need a big processor? More engines?????


Future Conceptual Framework?

[Figure: Balanced System model with nodes Thinking, CPU, Non-Volatile Memory, DASD, and GKWE]

Some Capacity Concerns at a Service Center

• Type of server
• Number of servers
• Speed of servers
• Number in queue
• Order in queue
• Service demand
• Type of service
• Population
• Arrival pattern


Z900 Structure

[Figure: z900 structure, a 20 PU MCM with 35 logic chips in total. Cluster 0 and Cluster 1 each have a cache control chip and cache data chips forming a 16 MB L2 shared cache; each PU (PU00 to PU13, hexadecimal) has its own L1. Also shown: memory cards 0 to 3, MBA0 to MBA3 with STIs (333 MByte and 1 GByte), Crypto 0 and Crypto 1, Clock, and ETR.]

I/O cages and features shown: Passat I/O Cage (optional), Cargo I/O Cage, Parallel 3/4 Port, OSA-2 TR, OSA-2 FDDI, ESCON 4 Port, ESCON 16 Port, FICON 2 Port, OSA-E Gb Ethernet, OSA-E Fast Ethernet, OSA-E ATM, ISC-3 1-4 Port, PCI-CC 2 engines, ICB-2 333 MByte, ICB-3 1 GByte.

Ref. SG24-5975

Connections

[Figure: Micro Processor (Adaptor), Buffer, Link, SubChannels]

Every line connecting two boxes in a diagram implies micro processors on each end to do the talking. (What happens if they speak different languages?) Data is moved from a buffer to a micro processor buffer, onto the link, into the micro processor buffer at the other end, and then into a storage buffer.


Z10 Memory – a simple view

[Figure: two books, each with Memory, an L2 Cache, and several CPUs, each CPU with its own L1 and L1.5 cache; PR/SM spans the books. The memory hierarchy, or "Nest", is distinguished from instruction complexity and microprocessor design.]

Reference: John Burg’s presentation at SHARE 3/3/2011

http://www.ibm.com/support/techdocs/atsmastr.nsf/Webindex/TC000066

zEnterprise

[Figure: zEnterprise cache structure. 192MB eDRAM Shared L4 on 2 SC Chips (LRU Cast-Out, CP Stores, Data Fetch Return); PU chips PU0 to PU5, each with a 24MB eDRAM Shared L3; each core with a 1.5MB L2 and its own L1.]


z196 versus z10 hardware comparison

z10 EC
• CPU: 4.4 GHz
• Caches: L1 private 64k i, 128k d; L1.5 private 3 MB; L2 shared 48 MB / book
• Book interconnect: star

z196
• CPU: 5.2 GHz, Out-Of-Order execution
• Caches: L1 private 64k i, 128k d; L2 private 1.5 MB; L3 shared 24 MB / chip; L4 shared 192 MB / book
• Book interconnect: star

[Figure: cache hierarchies. z10: CPU, L1, L1.5, L2 Cache, Memory. z196: CPU, L1, L2, L3 Cache, L4 Cache, Memory.]

Introducing the new Relative Nest Intensity (RNI) metric (SMF 113 Data)

Note these formulas may change in the future.

[Figure: Microprocessor Design (L1) and the Memory Hierarchy or "Nest" (L2LP, L2RP, MEMP). L1MP answers "How often?"; RNI reflects the distribution and latency across technology, that is, how intensely this part of the architecture is utilized.]


Level 1 (L1) Miss Percent

If not from L1, from Where? (SMF 113 Data)

Here's the plot of percent sourcing from different levels of cache. As the sourcing moves from the highest level of cache (percent=L15P) to the slowest memory source (percent=MEMP), the performance degrades. Level 1 cache is the fastest and closest to the processing unit. The sourcing shown in the graph is for data not found in level 1 cache. You can check the level 1 cache miss % by graphing variable L1MP.

• L15P = % sourced from level 1.5 cache
• L2LP = % sourced from level 2 cache, same book
• L2RP = % sourced from level 2 cache, different book
• MEMP = % sourced from memory

Remember that as more and more of the instructions and data have to be fetched from more distant caches, the machine effectively runs slower.


Relative Nest Intensity (RNI)

z10 RNI = (1.0*L2LP + 2.4*L2RP + 7.5*MEMP) / 100

z196 RNI = 1.6*(0.4*L3P + 1.0*L4LP + 2.4*L4RP + 7.5*MEMP) / 100
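The two RNI formulas translate directly into code. The sketch below assumes the inputs are the sourcing percentages (0 to 100) derived from SMF 113 data; how those percentages are extracted is outside its scope, and the function names and the sample inputs are illustrative only.

```python
def rni_z10(l2lp, l2rp, memp):
    """z10 RNI from percent sourced from local L2, remote L2, and memory."""
    return (1.0 * l2lp + 2.4 * l2rp + 7.5 * memp) / 100.0


def rni_z196(l3p, l4lp, l4rp, memp):
    """z196 RNI from percent sourced from L3, local L4, remote L4, and memory."""
    return 1.6 * (0.4 * l3p + 1.0 * l4lp + 2.4 * l4rp + 7.5 * memp) / 100.0


# Hypothetical sourcing percentages, for illustration only.
print(f"z10  RNI = {rni_z10(20.0, 5.0, 3.0):.3f}")
print(f"z196 RNI = {rni_z196(60.0, 20.0, 5.0, 3.0):.3f}")
```

The weights show why memory sourcing dominates: one percentage point sourced from memory counts 7.5 times as much as one sourced from the local shared cache.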

Memory and Workload Characteristics

L1MP       RNI          Workload Hint
< 3%       >= 0.75      AVERAGE
< 3%       < 0.75       LOW
3% to 6%   > 1.0        HIGH
3% to 6%   0.6 to 1.0   AVERAGE
3% to 6%   < 0.6        LOW
> 6%       >= 0.75      HIGH
> 6%       < 0.75       AVERAGE

Note that these are initial values and may change.
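The workload hint lookup can be written as a small function. This is a sketch of the thresholds as listed on this slide, which are described as initial values that may change; treating exactly 3% and exactly 6% as part of the middle band is an assumption, since the slide does not say which band owns the boundaries.

```python
def workload_hint(l1mp, rni):
    """Map L1 miss percent and RNI to LOW / AVERAGE / HIGH."""
    if l1mp < 3.0:
        return "AVERAGE" if rni >= 0.75 else "LOW"
    if l1mp <= 6.0:  # assumption: boundaries belong to the 3% to 6% band
        if rni > 1.0:
            return "HIGH"
        if rni >= 0.6:
            return "AVERAGE"
        return "LOW"
    return "HIGH" if rni >= 0.75 else "AVERAGE"


print(workload_hint(2.0, 0.5))   # low miss rate, low nest intensity
print(workload_hint(4.5, 0.8))   # middle band
print(workload_hint(7.0, 0.9))   # high miss rate, high nest intensity
```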


HiperDispatch

The problem of cache misses is alleviated by controlling the dispatch of LCPs on specific RCPs.

• Keep LCP-RCP on the same book.
• Minimize PR/SM dispatching.
• Keep LCP on same RCP.
• Re-dispatch work units on same processor subset.

zPCR


zPCR

Establish Power value in MIPS?


Processor Power: zPCR

Configuration Input
• Manual
• RMF Listing
• EDF file (CP3KEXTR)

Processor Power: zPCR

Maximum: MIPS available if other partitions are idle, given the logical configuration.

Minimum: MIPS entitled to if other partitions are demanding their fair share (Weight), given the logical configuration.

2097-E56 summary with this logical configuration


Capacity Planning

[Chart: Resource Usage versus Time, starting at t0 and growing toward the SLO threshold, where CP Actions are taken]

Starting from a single point at t0, we project growth until some threshold is reached: a Service Level Objective or Agreement (SLO, SLA).

Then we take action.

SLO or SLA for a Workload Group

Interactive
• For some number of users
• At some threshold transaction rate
• At some threshold Response Time
• At some amount of power & I/O

Batch
• For some number of Jobs
• At some rate
• At some amount of power & I/O
• At some turn-around threshold


Capacity Planning Actions

• Upgrade Hardware: Add CPs (PUs), New Model, Add Another CPC
• Move Workload to another image
• Split Workload and move a piece
• Tune it?
• Continue to Suffer

How Accurate Is It?

[Chart: Prediction versus Time, starting at t0]

Starting from an initial point of maybe dubious accuracy, we apply a growth rate (also dubious) and then recommend actions costing lots of money.


Accuracy

[Two charts: Prediction versus Time, starting at t0]

Accuracy is found in values that are close to the expected curve. This closeness implies an expected bound or variation in reality. So a thicker line makes sense.

Fuzzy Patches

[Two charts: Prediction versus Time from t0 to t, with the prediction p marked at time t]

At time t, is the prediction a precise point p or a fuzzy patch?


Fuzzy Factors

Basis for prediction is a single sample taken from a set of samples with some distribution.

Growth Factor applied may be just better than fiction.

Prediction compounds the fuzz and is itself fuzzy.

Niels Bohr: “Prediction is very hard to do. Especially about the future.”

Analytic Example: Erlang’s M/M/c

[Figure: a queue (Q) feeding the servers (S)]

• c = number of CPs
• U = utilization
• T = traffic, or c*U
• C(c,T) is Erlang's C formula
• E[S] is expected service time

From any queuing theory book: Allen or Jain, for example.

\[ E[RT] = E[S] + E[Q] \]

\[ E[RT] = E[S] + \frac{C(c,T)\,E[S]}{c\,(1-U)} \]

\[ E[RT] = E[S]\left\{1 + \frac{C(c,T)}{c\,(1-U)}\right\} \]


Erlang’s M/M/c: Contention Factor

\[ E[RT] = E[S]\left\{1 + \frac{C(c,T)}{c\,(1-U)}\right\} \]

Here c is the number of CPs, U the utilization, T = c*U the traffic, C(c,T) Erlang's C formula, and E[S] the expected service time. Read the term C(c,T) / (c(1-U)) as the Contention Factor (CF).

• When CF = 0, E[RT] = E[S]
• When CF = 1, E[RT] = 2 * E[S]

M/M/1

With c = 1, Erlang's C formula reduces to C(1,T) = U, so the M/M/c result collapses to:

\[ E[RT] = E[S]\left\{1 + \frac{U}{1-U}\right\} = \frac{E[S]}{1-U} \]

If E[S] = 30 ms and U = 80%, then E[RT] = 30 / (1 - 0.8) = 150 ms.
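The M/M/c response-time formula can be sketched in a few lines. The slides name Erlang's C formula without spelling it out; the textbook form used below (with offered traffic T = c*U) is an addition, and the function names are illustrative.

```python
from math import factorial


def erlang_c(c, traffic):
    """Erlang's C formula: probability that an arrival must queue (M/M/c)."""
    if not 0 <= traffic < c:
        raise ValueError("traffic must satisfy 0 <= T < c")
    top = traffic ** c / factorial(c) * c / (c - traffic)
    bottom = sum(traffic ** k / factorial(k) for k in range(c)) + top
    return top / bottom


def expected_response_time(service_time, c, utilization):
    """E[RT] = E[S] * (1 + C(c,T) / (c * (1 - U))), with T = c * U."""
    traffic = c * utilization
    contention_factor = erlang_c(c, traffic) / (c * (1.0 - utilization))
    return service_time * (1.0 + contention_factor)


# The M/M/1 example: E[S] = 30 ms at 80% busy.
print(f"M/M/1: {expected_response_time(30.0, 1, 0.8):.0f} ms")
# Same service time and utilization, but four servers sharing one queue.
print(f"M/M/4: {expected_response_time(30.0, 4, 0.8):.1f} ms")
```

Holding utilization constant, more servers behind a single queue lowers the contention factor, which is one reason the "faster or more CPs?" question does not have a one-line answer.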


M/M/1 Exercise

Workload   Service Time   Workload Utilization   Perceived Utilization
Hi         0.05 sec       32%                    32%
Medium     0.25 sec       45%                    77%
Low        1.32 Min       12%                    89%

Assume M/M/1.

(1) What is the expected RT for Medium?

(2) At what effective utilization would the response time for Medium exceed 1.2 seconds?

M/M/1 Exercise: Answers

For Medium, ST = 0.25 sec and the perceived utilization is U = 77%.

(1) RT = ST / (1 - U) = 0.25 / (1 - 0.77) = 1.1 seconds

(2) RT = ST / (1 - U), so 1.2 = 0.25 / (1 - U), which gives U = 79%
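Both answers can be checked with the M/M/1 formula and its inverse. A minimal sketch, with illustrative function names; the inputs are the Medium row of the exercise.

```python
def mm1_response_time(service_time, utilization):
    """RT = ST / (1 - U) for an M/M/1 server."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


def mm1_utilization_for(service_time, response_time):
    """Solve RT = ST / (1 - U) for U."""
    return 1.0 - service_time / response_time


rt_medium = mm1_response_time(0.25, 0.77)
u_limit = mm1_utilization_for(0.25, 1.2)
print(f"(1) RT for Medium = {rt_medium:.2f} seconds")
print(f"(2) RT exceeds 1.2 seconds above U = {100 * u_limit:.0f}%")
```

Medium sees 77% because it queues behind Hi as well as itself; only two more points of perceived utilization push it past the 1.2 second mark.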


Simple Capacity Plan

Problem: For the following workload, find the future workload utilization by month if the per annum growth rate is 30%.

            Base    Target
# LCPs      4       6
MIPS        1000    1200
MIPS/LCP    250     200

Base → Target: More MIPS, More Engines, Slower Engines

Workload   Utilization
Hi         40.0
Middle     25.0
Low        20.0
Total      85.0

Growth Computations

Workload   Input (Jun-06)   Projected (Jul-06)
Hi         40.0             40.9
Middle     25.0             25.6
Low        10.0             10.2
Total      75.0             76.7

• PA Growth (per annum), G = 30
• Period Length (in months), L = 1
• Period F = 1.022104451

\[ F = (1 + 0.01\,G)^{L/12} \]

Each projected value is the previous value multiplied by F. Check: 1.022104451^12 = 1.3.
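The growth computation is compound growth applied per period. The sketch below assumes the period factor is F = (1 + 0.01*G)^(L/12), which reproduces the 1.022104451 quoted for G = 30 and L = 1; the function names are illustrative.

```python
def period_factor(pa_growth_pct, period_months):
    """F = (1 + 0.01 * G) ** (L / 12): growth multiplier for one period."""
    return (1.0 + 0.01 * pa_growth_pct) ** (period_months / 12.0)


def project(utilization, pa_growth_pct, periods, period_months=1):
    """Utilization at period 0, 1, ..., periods under compound growth."""
    f = period_factor(pa_growth_pct, period_months)
    return [utilization * f ** n for n in range(periods + 1)]


f = period_factor(30, 1)
print(f"Period F = {f:.9f}")
print("Hi workload:", [round(u, 1) for u in project(40.0, 30, 3)])
```

Twelve monthly steps compound back to the annual figure, so the monthly factor is the twelfth root of 1.3 rather than 1 + 0.30/12.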


Base Utilization

Workload  Jun-07  Jul-07  Aug-07  Sep-07  Oct-07  Nov-07  Dec-07  Jan-08  Feb-08  Mar-08  Apr-08
Hi        40.0    40.9    41.8    42.7    43.7    44.6    45.6    46.6    47.6    48.7    49.8
Middle    25.0    25.6    26.1    26.7    27.3    27.9    28.5    29.1    29.8    30.4    31.1
Low       20.0    20.4    20.9    21.4    21.8    22.3    22.8    23.3    23.8    24.3    24.9
Total     85.0    86.9    88.8    90.8    92.8    94.8    96.9    99.1    101.2   103.5   105.8

• PA Growth: 30
• Period: 1
• Period F: 1.022104
• #CPs: 4

[Chart: CPU% by month, Jun-07 to Apr-08, for Low, Middle, and Hi]
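The table can also be read the other way round: how many months until the total crosses a threshold? A minimal sketch, assuming the same compound monthly growth as the table; the function name is illustrative and the 100% threshold is just an example.

```python
from math import log


def months_to_threshold(current_pct, threshold_pct, pa_growth_pct):
    """Months of compound growth needed to go from current to threshold."""
    monthly_factor = (1.0 + 0.01 * pa_growth_pct) ** (1.0 / 12.0)
    return log(threshold_pct / current_pct) / log(monthly_factor)


months = months_to_threshold(85.0, 100.0, 30)
print(f"Total utilization reaches 100% after about {months:.1f} months")
```

This agrees with the table, where the total passes 100% between the seventh and eighth month after the starting point.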

Base Response Time

Perceived utilization (%)

Workload  Service T  Jun-06  Jul-06  Aug-06  Sep-06  Oct-06  Nov-06  Dec-06  Jan-07  Feb-07  Mar-07  Apr-07
Hi        5          40.0    40.9    41.8    42.7    43.7    44.6    45.6    46.6    47.6    48.7    49.8
Middle    10         65.0    66.4    67.9    69.4    70.9    72.5    74.1    75.7    77.4    79.1    80.9
Low       10         85.0    86.9    88.8    90.8    92.8    94.8    96.9    99.1    101.2   103.5   105.8

Response time (Ms)

Workload  Jun-06  Jul-06  Aug-06  Sep-06  Oct-06  Nov-06  Dec-06  Jan-07  Feb-07  Mar-07  Apr-07
Hi        5.2     5.2     5.2     5.2     5.3     5.3     5.3     5.3     5.4     5.4     5.4
Middle    12.5    12.8    13.1    13.4    13.8    14.3    14.8    15.4    16.1    17.0    18.0
Low       21.5    23.8    27.0    31.7    39.2    52.8    85.6    269.6   0.0     0.0     0.0

[Chart: Response Ms versus CPU% for Hi, Middle, and Low]
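The first column of response times in the Base Response Time table appears to be consistent with Erlang's M/M/c formula evaluated with c = 4 and each workload's perceived (cumulative) utilization. That is an inference from the numbers, not something the slide states, and the sketch below only checks the first month. The Erlang C expression is the textbook form, and the names are illustrative.

```python
from math import factorial


def erlang_c(c, traffic):
    """Erlang's C formula for c servers and offered traffic T = c * U."""
    top = traffic ** c / factorial(c) * c / (c - traffic)
    return top / (sum(traffic ** k / factorial(k) for k in range(c)) + top)


def response_ms(service_ms, c, perceived_util_pct):
    """E[RT] = E[S] * (1 + C(c,T) / (c * (1 - U)))."""
    u = perceived_util_pct / 100.0
    return service_ms * (1.0 + erlang_c(c, c * u) / (c * (1.0 - u)))


# (service time in Ms, perceived utilization in %) for the first month.
BASE = {"Hi": (5.0, 40.0), "Middle": (10.0, 65.0), "Low": (10.0, 85.0)}
first_month = {name: response_ms(st, 4, u) for name, (st, u) in BASE.items()}
for name, rt in first_month.items():
    print(f"{name}: {rt:.1f} Ms")
```

The formula is only defined below 100% utilization, which would explain the 0.0 placeholders in the Low row once its perceived utilization passes 100%.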


Migration Options

• Migrate to same MIPS and more CPs.
• Migrate to same MIPS and fewer CPs.
• Migrate to more MIPS and fewer CPs.
• Migrate to more MIPS and more CPs.

Utilization? Service Time? Evaluation?

Target Utilization

Workload  Jun-07  Jul-07  Aug-07  Sep-07  Oct-07  Nov-07  Dec-07  Jan-08  Feb-08  Mar-08  Apr-08
Hi        33.3    34.1    34.8    35.6    36.4    37.2    38.0    38.8    39.7    40.6    41.5
Middle    20.8    21.3    21.8    22.2    22.7    23.2    23.8    24.3    24.8    25.4    25.9
Low       16.7    17.0    17.4    17.8    18.2    18.6    19.0    19.4    19.9    20.3    20.7
Total     70.8    72.4    74.0    75.6    77.3    79.0    80.8    82.5    84.4    86.2    88.1

• PA Growth: 30
• Period: 1
• Period F: 1.022104
• #CPs: 2

[Chart: CPU% by month, Jun-07 to Apr-08, for Low, Middle, and Hi]


Target Response Time

Perceived utilization (%)

Workload  Service T  Jun-07  Jul-07  Aug-07  Sep-07  Oct-07  Nov-07  Dec-07  Jan-08  Feb-08  Mar-08  Apr-08
Hi        8.333333   33.3    34.1    34.8    35.6    36.4    37.2    38.0    38.8    39.7    40.6    41.5
Middle    16.66667   54.2    55.4    56.6    57.8    59.1    60.4    61.8    63.1    64.5    65.9    67.4
Low       16.66667   70.8    72.4    74.0    75.6    77.3    79.0    80.8    82.5    84.4    86.2    88.1

Response time (Ms)

Workload  Jun-07  Jul-07  Aug-07  Sep-07  Oct-07  Nov-07  Dec-07  Jan-08  Feb-08  Mar-08  Apr-08
Hi        9.4     9.4     9.5     9.5     9.6     9.7     9.7     9.8     9.9     10.0    10.1
Middle    23.6    24.0    24.5    25.0    25.6    26.3    26.9    27.7    28.6    29.5    30.5
Low       33.4    35.0    36.8    38.9    41.4    44.4    47.9    52.3    57.8    65.0    74.7

[Chart: Response Ms versus CPU% for Hi, Middle, and Low]

Compare Utilization

[Chart: Response Ms versus CPU% for HiBase, MidBase, HiTarg, MidTarg, LowBase, LowTarg]


Compare Transaction Rate

[Chart: Response Ms by Period, Jun-06 to Apr-07, for HiBase, MidBase, HiTarg, MidTarg, LowBase, LowTarg]

For each period, Base and Target have same transaction rate.

How to Compare?

[Chart: Response Ms versus CPU% for LowBase and LowTarg, annotated "Same utilization" and "Same TransRate"]


Modeling Restrictions

• At CPU contention points, WLM and IRD in z/OS can restructure the physical & logical configuration.
• At I/O contention points in z/OS, PAVs can be moved around.
• Model arrival rates and service distributions can vary significantly.
• Priority ordering varies under WLM control.
• In Sysplex, transactions could be routed to other partitions.
• Software serialization points are usually not modeled.
• Model output results strictly mimic the model. If you miss something important, the results may not reflect your environment.
• Various models differ greatly in detail. Don’t miss an important service center.

CP Alternatives

ROTs

Trending

Analytic Modeling

Simulation

Benchmark

Difficulty (Time) & Cost ($$$)

Accuracy


Rules of Thumb

ROTs (thresholds) are often quite adequate for CP

Useful for Health Check

Rules of Thumb

• Honor your Father & Mother

• Do unto others as you would have them do unto you.

• Do unto others before they do unto you.

• Keep your CPU%<90%

• Don’t swim soon after eating.

• It is better to give than receive.


Philosophical Remark

We understand a Rule by trying to break it.

Or

Learn the rules so you know how to break them correctly.

All swans are white ≡ There does not exist a swan which is not white

Balanced System ROTs

[Figure: Balanced System model with nodes Thinking, CPU, Memory, DASD, DASD, and GKWE]

MIPS used, memory used, I/O used should be in some proportion.


Expected Bounds

The resource ratio is shown as a bar. If the bar is above the 90%ile line, it means that the value was in the top 10% of the samples reviewed. Similarly, if the bar is below the 10%ile line, the value is in the bottom 10%. Neither is good or bad; it’s a flag to examine the amount of resource available.
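The expected-bounds check amounts to comparing a measured ratio with the 10th and 90th percentile values. The sketch below uses the 2010 Balanced System percentiles quoted earlier in this handout as the bounds; the function name, the wording of the flags, and the sample inputs are illustrative.

```python
# (10th percentile, 90th percentile) from the 2010 z/OS B.S. metrics.
BOUNDS = {
    "S/MIPS": (1.201, 3.707),
    "DG/MIPS": (2.236, 52.539),
    "PS/MIPS": (6.766, 37.306),
}


def flag_ratio(name, value):
    """Say where a measured resource ratio falls against the sample bounds."""
    low, high = BOUNDS[name]
    if value > high:
        return "above 90th percentile"
    if value < low:
        return "below 10th percentile"
    return "within expected bounds"


# Hypothetical measurements for one partition.
for name, value in (("S/MIPS", 2.349), ("DG/MIPS", 60.0), ("PS/MIPS", 5.0)):
    print(f"{name} = {value}: {flag_ratio(name, value)}")
```

A flag in either direction is a prompt to look at the resource, not a verdict.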

Trending

Trending predicts the future if the future looks like the past.

Time Series Trending is complicated.

Trending can answer overall CP questions.


Analytic Modeling

[Figure: Balanced System model with nodes Thinking, CPU, Memory, DASD, DASD, and GKWE]

• Pre-built packages can be fast to solve and relatively easy to use.
• Flow is statistically driven and usually predefined.
• Accuracy? Utilization within 5%; response times within 30%.
• Data acquisition is key.
• Calibration can be tough.
• Custom analytic models are really tough.
• Requires technical staff.
• Services are available.

ROTs & Analytic Modeling

[Chart: Response Time versus System Utilization for Low, Middle, and Hi]

[Chart: CPU% by month, Jun-07 to Apr-08, for Low, Middle, and Hi]

The relationship between Utilization and Server Response is sensitive to the priority of the workload. The utilization that drives response time is the “perceived utilization”. Watch out for logical vs. physical utilization and for single-task workloads.


Simulation

Pre-built packages are slower to solve and can be relatively easy to use. Flow is statistically driven and usually predefined but can be customized (application modeling).

Accuracy?
• Utilization within 5%
• Response times within 30%

Data acquisition is key. Calibration can be tough. Custom models are built from service center building blocks. Simulation languages do exist. Requires specialized staff. Services exist.

[Diagram: system components Thinking, CPU, Memory, and DASD; label GKWE]

Benchmark

• A lot of work in preparation: hardware/software, workload.
• Lots of time.
• It does mimic the running environment the best: software flow & queuing, software usage.
• It's expensive.
• Variations limited by resources.
• Given the resources, the benchmark can be complicated.
• Tests the environment: does it work?


CP Alternatives

[Chart: the CP alternatives (ROTs, Trending, Analytic Model, Simulation, Benchmark) positioned on axes of Difficulty and Accuracy]

What questions do you have? What questions must you answer? Cost of the answer? Cost of getting it wrong? Time line? What happens if you get it wrong?

Things to Remember

• Be aware of the exceptions to the rule.
• A framework helps but it can make you see things that just aren't there.
• Impeccable mathematics does not replace knowledge of the facts.
• Protect yourself.
• Business decisions can override technical issues.
• Sometimes being understood is more important than being very accurate.
• Being "very" accurate may be a luxury of the idle.
• Other than the technicalities, there may be a hidden agenda.


Bibliography - I

The Art of Computer Systems Performance Analysis, by Raj Jain, Wiley. I like this one. It is thorough and complete. A very good reference.

Capacity Planning for Web Performance, by Daniel A. Menascé and Virgilio A. F. Almeida, Prentice Hall. A good book on network structure and terminology and introduction to the topic.

Probability, Statistics, and Queuing Theory, by Arnold O. Allen, Academic Press Inc. This is the classic in queuing theory.

Performance by Design: Computer Capacity Planning by Example, by Daniel A. Menascé, Virgilio A. F. Almeida, and L. W. Dowdy. The web site http://cs.gmu.edu/~menasce/perfbyd/ has a lot of .xls modeling worksheets.

MVS I/O Subsystems, by Gilbert E. Houtekamer and H. Pat Artis, Performance Associates. More than you want to know about the I/O subsystem. A definitive source but a little out of date. Available from Intellimagic or perfassoc.com.

Exploring IBM S/390 Computers, by Jim Hoskins and George Coleman, Maximum Press. A general introduction to S/390 hardware and architecture. (with IBM G326-3006-06)

Statistical Concepts and Methods, by Gouri Bhattacharyya and Richard A. Johnson, John Wiley & Sons.

The Practical Performance Analyst, by Neil J. Gunther, Authors Choice Press. A very good book.

Almost any volume of the Computer Measurement Group (CMG) Proceedings is worth looking at for performance and capacity planning articles. Web Site: http://www.cmg.org/measureit/

Bibliography - II

GC28-1761 MVS™ Planning: Workload Management. A guide to WLM.

SC28-1950 Resource Measurement Facility Report Analysis. A guide to report reading.

SC28-1951 Resource Measurement Facility Performance Management Guide. A good tutorial to get started.

SG24-5975 IBM zSeries 900 Technical Guide. A good hardware architecture and implementation Red Book.

LY28-1042 RMF™ Support for LPAR Management Time. Want to know how LPAR works?

SC28-1187 Large Systems Performance Reference by John Fitch. John goes into detail about the LSPR data.

SG24-4356 System/390® MVS Parallel Sysplex Performance. A good Red Book on Parallel Sysplex RMF reports and data.

SG24-4680 System/390 MVS Parallel Sysplex Capacity Planning. A good Red Book on the function and capacity of Parallel Sysplex.

A great URL for z/Series documents in general: http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/

RMF in particular: http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/r4pdf/rmf.html

For zPCR, search www.ibm.com for “zPCR” & “SoftCap”


Bibliography - III

EXCEL:

Applied Statistics For Engineers and Scientists Using Excel and MINITAB, by David Levine, Patricia Ramsey, Robert Smidt, Prentice Hall. This comes with a CD containing handy Excel Add-Ins.

Excel Data Analysis by Jinjier Simon, Wiley. Nice basic reference concentrating on data presentation.

Other Good Stuff:

The Black Swan: The Impact of the Highly Improbable, by Nassim Nicholas Taleb, Random House. This is an informative and entertaining approach to statistical analysis among other things.

Statistics as Principled Argument, Robert Abelson, Erlbaum Assoc. Publishers, 1995. A good discussion of the use of statistics without the ugly formulae.

Judgment under uncertainty: Heuristics and biases, Kahneman, Slovic, & Tversky, Cambridge University Press. The first chapter alone is worth reading. It’s a summary of the pitfalls with intuitive thinking.

Ray Wicks’ Monographs

CPS document for this presentation and other interesting monographs can be obtained from your favorite IBMer. Go to: ftp://cpstools.washington.ibm.com/zcp3000/win

Look for: Getting Started In CP.

These have been published in cmg.org/measureit.
