SHARE August 2011
Getting Started in (z/OS) Capacity Planning
(Topics in Capacity Planning)
Ray Wicks
561-236-5846
Biography
Ray has spent most of his career at IBM in the performance analysis and capacity planning end of the business, in Poughkeepsie, London, and now at the Washington Systems Center. He is the major contributor to IBM’s internal PA & CP tool, zCP3000. This tool is used extensively by IBM services and technical support staff worldwide to analyze existing zSeries configurations (processor, storage, and I/O) and make projections for capacity expectations.
Ray has given classes and lectures worldwide. He was a visiting scholar at the University of Maryland where he taught part time at the Honors College.
He won the prestigious Computer Measurement Group’s A.A. Michelson Award in 2000. His recent virtual “Getting Started in Performance Analysis & Capacity Planning” workshop sessions, held for attendees in China and India, were well received.
Trade Marks, Copyrights & Stuff
Many terms are trademarks of different companies and are owned by them.
Some foils that appear in this presentation are not in the handout. This is to prevent you from looking ahead and spoiling my jokes and surprises. Also, some foils were added after I made the handouts.
Abstract
This tutorial is a two-part, introductory-level session designed to introduce the student to the concepts required for Performance Analysis and Capacity Planning.
Emphasis is placed on large processor systems, and examples are largely drawn from z/OS, but the concepts apply to all operating systems and hardware. Topics:
Conceptual and perceptual structures for performance analysis and capacity planning
Using the Forced Flow Law in PA & CP
Performance analysis queries for capacity planning
Processor performance data (ITRRs & MIPS)
Resource metrics for use in the Balanced System model
Using the utilization growth process in capacity planning
A simple view of analytic modeling in CP
Philosophy
In this presentation I have stressed the principles involved in performance analysis and capacity planning rather than the specific development of formulae and the hard details of hardware and software architectures and implementations. The formulae can be found in many textbooks. It is the intuition behind a formula that is important: there is a difference between the formula and understanding what the formula is telling you. The architectures and implementations will change over time. What counts is knowing how to look at and evaluate a system. If you want hardware and software details, look to the vendors for those.
Capacity Planning
Capacity Planning ensures that Adequate resources are available for the Workload to complete in an Appropriate time.
Performance Analysis is short term (3-7 days); Capacity Planning is long term (6-24 months).
What's the Workload using now?
What's Appropriate? Adequate? A Service Level Objective or Service Level Agreement.
Things are ordered by Priority or Importance.
Discretionary Workloads?
CP Questions
Easy:
Do I have enough resource (CPU, I/O, storage, ...) to do the job today?
If not, who's suffering?
If I get more, who will be helped? How much?
If I need more, when will it be? How much more?
Can I use specialized processing units?
What variables should I track?
Do I have any latent demand?
Harder:
Do I want faster or more CPs?
How do I establish my growth?
How do I size a new application?
What tools should I use?
Which interval do I model?
If I reduce the #CPs and keep the MIPS the same, will there be a problem?
CP Flow
Knowledge Preparation: understand the environment; workload characterization (business units, application programs, H/W resources)
Model Preparation: data acquisition; representative calibration
Model Execution: workload forecasting
Modeling Output: future requirements; H/W & S/W speeds & feeds; architecture
Business Negotiation: investment & configuration
Data Model: λ, traffic, I/O, #clients, etc.
Business Model: forecast, cost
Topology
[Figure: Client → LAN/WAN (IBM 6611 routers) → Local Server → Front End Server → Back End DB Server (z/OS)]
For each node: # servers? Utilization? Service time? Queuing? Response time? Traffic? Etc.?
--------- PARTITION DATA ----------------- -- LOGICAL PARTITION PROCESSOR DATA -- -- AVERAGE PROCESSOR UTILIZATION PERCENTAGES -
----MSU---- -CAPPING-- PROCESSOR- ----DISPATCH TIME DATA---- LOGICAL PROCESSORS --- PHYSICAL PROCESSORS --
NAME S WGT DEF ACT DEF WLM% NUM TYPE EFFECTIVE TOTAL EFFECTIVE TOTAL LPAR MGMT EFFECTIVE TOTAL
AQFT A 215 0 522 NO 0.0 11.0 CP 03.59.05.986 03.59.50.403 72.45 72.68 0.06 18.98 19.03
VICTEST A 3 0 3 NO 0.0 5 CP 00.01.24.297 00.01.35.116 0.94 1.06 0.01 0.11 0.13
VMTOOL1 A 315 0 549 NO 0.0 16 CP 04.09.50.974 04.12.27.823 52.05 52.60 0.21 19.83 20.04
AQCF1 A DED 0 65 0.0 1 CP 00.29.59.893 00.29.59.929 99.99 100.0 0.00 2.38 2.38
AQHO A 2 0 3 NO 0.0 5 CP 00.01.23.451 00.01.26.815 0.93 0.96 0.00 0.11 0.11
AQLINX A 1 0 0 NO 0.0 2 CP 00.00.00.728 00.00.00.749 0.02 0.02 0.00 0.00 0.00
HOCF4 A DED 0 65 0.0 1 CP 00.29.59.895 00.29.59.938 99.99 100.0 0.00 2.38 2.38
GDLVM7 A 32 0 80 NO 0.0 3 CP 00.35.50.251 00.36.45.931 39.82 40.85 0.07 2.84 2.92
POKVMXA1 A 32 0 390 NO 0.0 6 CP 02.59.23.002 02.59.25.407 99.66 99.68 0.00 14.24 14.24
*PHYSICAL* 00.06.40.226 0.53 0.53
------------ ------------ ----- ----- -----
TOTAL 12.46.58.480 12.58.12.342 0.89 60.87 61.76
-
LNXVM14 A 4 2 IFL 00.01.30.569 00.01.35.358 2.52 2.65 0.13 2.52 2.65
*PHYSICAL* 00.00.17.265 0.48 0.48
------------ ------------ ----- ----- -----
TOTAL 00.01.30.569 00.01.52.624 0.61 2.52 3.13
-
AQFT A 215 2 IIP 00.00.02.237 00.00.02.271 0.06 0.06 0.00 0.06 0.06
AQHO A 2 2 IIP 00.00.01.500 00.00.01.581 0.04 0.04 0.00 0.04 0.04
*PHYSICAL* 00.00.01.220 0.03 0.03
------------ ------------ ----- ----- -----
TOTAL 00.00.03.737 00.00.05.072
CPU 2097 CPC CAPACITY 2740 SEQUENCE CODE 0000000000019F30
MODEL 742 CHANGE REASON=N/A HIPERDISPATCH=YES
H/W MODEL E56
---CPU---   ---------------- TIME % ----------------   LOG PROC     --I/O INTERRUPTS--
NUM  TYPE   ONLINE   LPAR BUSY   MVS BUSY   PARKED     SHARE %      RATE     % VIA TPI
0    CP     100.00   76.79       76.77      0.00       100.0 HIGH   52.40    34.86
1    CP     100.00   59.44       59.34      0.00       100.0 HIGH   148.9    29.38
2    CP     100.00   77.13       77.11      0.00       100.0 HIGH   44.24    32.59
3    CP     100.00   71.92       71.90      0.00       100.0 HIGH   41.70    33.45
4    CP     100.00   77.94       77.91      0.00       100.0 HIGH   49.20    35.32
5    CP     100.00   57.96       57.87      0.00       100.0 HIGH   133.1    30.26
6    CP     100.00   78.32       78.30      0.00       100.0 HIGH   42.10    32.89
7    CP     100.00   73.52       73.50      0.00       100.0 HIGH   37.48    33.56
8    CP     100.00   76.44       76.42      0.00       100.0 HIGH   16.37    34.00
9    CP     100.00   76.55       76.53      0.00       100.0 HIGH   14.57    33.06
A    CP     100.00   73.47       73.45      0.00       100.0 HIGH   2446     4.03
TOTAL/AVERAGE        72.68       72.65                 1100         3026     9.36
B    IIP    100.00   0.10        0.10       0.00       100.0 HIGH
C    IIP    100.00   0.02        0.02       0.00       98.2 MED
TOTAL/AVERAGE        0.06        0.06                  198.2
Lots of Data
The SAS System
Model: MODEL1
Dependent Variable: CPU
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Value   Prob>F
Model       1   9496.97929       9496.97929    648.634   0.0001
Error      71   1039.54729       14.64151
C Total    72   10536.52658
Root MSE   3.82642    R-square   0.9013
Dep Mean   31.40685   Adj R-sq   0.8999
C.V.       12.18340
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP    1   1.045116             1.27348468       0.821                   0.4146
DASD        1   0.072051             0.00282903       25.468                  0.0001
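The derived statistics in a SAS regression listing follow from the sums of squares and degrees of freedom. A minimal Python sketch that recomputes them from the values printed above; the function names are hypothetical, chosen for illustration:

```python
import math

def anova_summary(ss_model, ss_error, df_model, df_error, dep_mean):
    """Recompute the derived statistics of a regression ANOVA table from its sums of squares."""
    ms_model = ss_model / df_model                 # model mean square
    ms_error = ss_error / df_error                 # error mean square (MSE)
    ss_total = ss_model + ss_error                 # corrected total sum of squares
    df_total = df_model + df_error
    root_mse = math.sqrt(ms_error)
    return {
        "F Value": ms_model / ms_error,
        "R-square": ss_model / ss_total,
        # Adjusted R-square compares mean squares, so it penalizes extra parameters
        "Adj R-sq": 1 - ms_error / (ss_total / df_total),
        "Root MSE": root_mse,
        "C.V.": 100 * root_mse / dep_mean,         # coefficient of variation, in percent
    }

def t_value(estimate, std_error):
    """t statistic for H0: parameter = 0."""
    return estimate / std_error

stats = anova_summary(9496.97929, 1039.54729, df_model=1, df_error=71, dep_mean=31.40685)
for name, value in stats.items():
    print(f"{name:9s} {value:10.4f}")
print(f"t(DASD)   {t_value(0.072051, 0.00282903):10.3f}")
```

The large t value for DASD and the small one for the intercept say the same thing as the plot-based checks later in the deck: CPU tracks DASD activity, and the line passes close to the origin.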
Num_Samples Date Time Duration Box_ID CPU_Model CPS ICFS IFLS zAAPS SUPRV LPAR_Name SYS
900 3/26/2005 23:45:00 899.984 PLEXS01 9672-R26 2 0 0 0 LPAR PHYSICAL PHYSICAL
900 3/26/2005 23:45:00 899.984 PLEXS01 9672-R26 2 0 0 0 zOS PRODONLE PPPP
900 3/26/2005 23:45:00 899.984 PLEXS01 9672-R26 2 0 0 0 zOS PRODBAT DDDD
900 3/26/2005 23:45:00 899.984 PLEXS01 9672-R26 2 0 0 0 zOS SPTEST SSSS
900 3/27/2005 0:00:00 900.053 PLEXS01 9672-R26 2 0 0 0 LPAR PHYSICAL PHYSICAL
900 3/27/2005 0:00:00 900.053 PLEXS01 9672-R26 2 0 0 0 zOS PRODONLE PPPP
900 3/27/2005 0:00:00 900.053 PLEXS01 9672-R26 2 0 0 0 zOS PRODBAT DDDD
900 3/27/2005 0:00:00 900.053 PLEXS01 9672-R26 2 0 0 0 zOS SPTEST SSSS
900 3/27/2005 0:15:00 899.919 PLEXS01 9672-R26 2 0 0 0 LPAR PHYSICAL PHYSICAL
900 3/27/2005 0:15:00 899.919 PLEXS01 9672-R26 2 0 0 0 zOS PRODONLE PPPP
900 3/27/2005 0:15:00 899.919 PLEXS01 9672-R26 2 0 0 0 zOS PRODBAT DDDD
900 3/27/2005 0:15:00 899.919 PLEXS01 9672-R26 2 0 0 0 zOS SPTEST SSSS
900 3/27/2005 0:30:00 900.001 PLEXS01 9672-R26 2 0 0 0 LPAR PHYSICAL PHYSICAL
900 3/27/2005 0:30:00 900.001 PLEXS01 9672-R26 2 0 0 0 zOS PRODONLE PPPP
900 3/27/2005 0:30:00 900.001 PLEXS01 9672-R26 2 0 0 0 zOS PRODBAT DDDD
900 3/27/2005 0:30:00 900.001 PLEXS01 9672-R26 2 0 0 0 zOS SPTEST SSSS
900 3/27/2005 0:45:00 900.025 PLEXS01 9672-R26 2 0 0 0 LPAR PHYSICAL PHYSICAL
900 3/27/2005 0:45:00 900.025 PLEXS01 9672-R26 2 0 0 0 zOS PRODONLE PPPP
900 3/27/2005 0:45:00 900.025 PLEXS01 9672-R26 2 0 0 0 zOS PRODBAT DDDD
900 3/27/2005 0:45:00 900.025 PLEXS01 9672-R26 2 0 0 0 zOS SPTEST SSSS
900 3/27/2005 8:00:00 900.026 PLEXS01 9672-R26 2 0 0 0 LPAR PHYSICAL PHYSICAL
900 3/27/2005 8:00:00 900.026 PLEXS01 9672-R26 2 0 0 0 zOS PRODONLE PPPP
900 3/27/2005 8:00:00 900.026 PLEXS01 9672-R26 2 0 0 0 zOS PRODBAT DDDD
900 3/27/2005 8:00:00 900.026 PLEXS01 9672-R26 2 0 0 0 zOS SPTEST SSSS
900 3/27/2005 8:15:00 900.01 PLEXS01 9672-R26 2 0 0 0 LPAR PHYSICAL PHYSICAL
900 3/27/2005 8:15:00 900.01 PLEXS01 9672-R26 2 0 0 0 zOS PRODONLE PPPP
900 3/27/2005 8:15:00 900.01 PLEXS01 9672-R26 2 0 0 0 zOS PRODBAT DDDD
Frameworks
Conceptual Framework (Model, Paradigm) What's it supposed to mean? How do things connect?
Perceptual Framework What's it supposed to look like? How do things connect?
A Plumbing Problem
[Figure: three pumps in a pipe; the total fluid flow ΣF entering equals the total fluid flow ΣF leaving]
ΣF in = ΣF out: the Forced Flow Law
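In queuing-network terms, the plumbing picture is the Forced Flow Law of operational analysis: the throughput at service center k is X_k = V_k · X, where V_k is the number of visits a transaction makes to center k and X is the system throughput. Combined with the Utilization Law U_k = X_k · S_k, it ties every center's load to the same transaction rate. A minimal Python sketch; the visit counts and service times are hypothetical numbers chosen for illustration:

```python
def forced_flow(system_throughput, visits, service_times):
    """Forced Flow Law X_k = V_k * X, then Utilization Law U_k = X_k * S_k, for each center k."""
    result = {}
    for center, v_k in visits.items():
        x_k = v_k * system_throughput          # completions per second at center k
        u_k = x_k * service_times[center]      # fraction of time center k is busy
        result[center] = (x_k, u_k)
    return result

# Hypothetical example: 20 transactions/second, each visiting the CPU 5 times and DASD 4 times
visits = {"CPU": 5, "DASD": 4}
service_times = {"CPU": 0.004, "DASD": 0.010}  # seconds per visit (hypothetical)
for center, (x_k, u_k) in forced_flow(20.0, visits, service_times).items():
    print(f"{center}: X_k = {x_k:.0f}/s, U_k = {u_k:.0%}")
```

If the transaction rate doubles, every X_k doubles with it; that fixed proportion between CPU and I/O activity is what the Balanced System model relies on.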
Conceptual Framework: Forced Flow Law
[Figure: Service Centers 1, 2, and 3, each with a waiting line and a server; the model is divided in two, with 5/minute and 15/minute transactions crossing the boundaries]
If the model is divided in two and the number of transactions crossing the boundaries is as indicated, what would the Forced Flow Law say?
Conceptual Framework
[Figure: three service centers, each with a waiting line and a server; total time in the centers: 1 second, 15 minutes, 1 second]
If the total time in each service center is as indicated, where would the users/transactions be?
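One way to answer the question is Little's Law, N = X · R, applied to each service center: every transaction passes through all three centers at the same rate X, so the number present at each center is proportional to the time spent there. A minimal Python sketch; the 1 second / 15 minutes / 1 second times come from the slide, while the throughput of 2 transactions per second is a hypothetical value:

```python
def population_by_center(throughput, times):
    """Little's Law per center: N_k = X * R_k, with R_k the time spent at center k in seconds."""
    populations = {name: throughput * r for name, r in times.items()}
    total = sum(populations.values())
    shares = {name: n / total for name, n in populations.items()}
    return populations, shares

times = {"center 1": 1.0, "center 2": 15 * 60.0, "center 3": 1.0}   # seconds
populations, shares = population_by_center(2.0, times)
for name in times:
    print(f"{name}: N = {populations[name]:.0f}, share = {shares[name]:.2%}")
```

The shares do not depend on the throughput: about 99.8% (900 of 902 seconds) of the users are in the 15-minute center, whatever the arrival rate.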
B.S. (Balanced System) Conceptual Framework
[Figure: nodes labeled Thinking, GKWE, CPU, Memory, and two Disks]
Conceptual Framework Implications
Users distribute themselves among nodes in proportion to the time spent at each node. The capacity of the system is determined by the slowest node (server). The resource usage (transaction rates) at the various nodes is in proportion.
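The claim that the slowest node determines capacity can be made concrete with the bottleneck bound from operational analysis: each center k has a service demand D_k = V_k · S_k per transaction, and system throughput can never exceed 1 / max_k D_k, because the center with the largest demand reaches 100% utilization first. A minimal Python sketch with hypothetical demands:

```python
def bottleneck(demands):
    """Return the bottleneck center and the throughput bound X_max = 1 / D_max.

    demands maps center name -> service demand per transaction in seconds (D_k = V_k * S_k).
    """
    center = max(demands, key=demands.get)
    return center, 1.0 / demands[center]

# Hypothetical per-transaction demands in seconds
demands = {"CPU": 0.020, "DASD 1": 0.040, "DASD 2": 0.015}
center, x_max = bottleneck(demands)
print(f"Bottleneck: {center}, maximum throughput = {x_max:.1f} transactions/s")
# Utilizations stay in proportion to the demands at any throughput up to the bound
for name, d in demands.items():
    print(f"  {name}: U = {d * x_max:.0%} at X_max")
```

Adding capacity anywhere other than the bottleneck does not raise this bound, which is why a balanced configuration matters.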
2010 z/OS B.S. Metrics
MIPS = MIPS used
S = SSCH rate
DG = DASD gigabytes (the computation is nominal: 2.83 per act)
PS = central storage configured
Ratio      10%     50%      90%
S/MIPS     1.201   2.349    3.707
DG/MIPS    2.236   6.593    52.539
PS/MIPS    6.766   14.055   37.306
[Figure: Balanced System model (Thinking, GKWE, CPU, Memory, DASD) annotated with MIPS, PS, S, and DG]
S/MIPS Samples
[Figure: SSCH rate (S) vs. MIPS used; fitted line y = 2.1215x, R² = 0.8727]
MIPS used and SSCH rate are in proportion (correlated): a good linear relationship.
PS/MIPS
[Figure: central storage (PS) vs. MIPS used; fitted line y = 15.948x - 4176.7, R² = 0.5913]
MIPS used and PS are not in proportion (not well correlated).
DG/MIPS
[Figure: DASD gigabytes (DG) vs. MIPS used; fitted line y = 3.5481x, R² = -0.9825]
[Figure: DG and DG used vs. MIPS used; fitted lines y = 3.5481x (R² = -0.9825) and y = 1.0133x + 344.16 (R² = 0.7620)]
Archimedes
Give me a fixed point and I can move the world.
Desperate Capacity Planning
What part of a 500 MIPS machine would DB2 use at 100 I/Os per second? How many MIPS for DB2?
S/MIPS ∈ [1.201, 2.349, 3.707] (10th, 50th, 90th percentiles)
100/MIPS = 1.201 → MIPS = 83.26
100/MIPS = 2.349 → MIPS = 42.57
Answer: between 8.5% and 16.6% (????) of a 500 MIPS machine
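The arithmetic above can be sketched in Python: divide the known I/O rate by each S/MIPS percentile to get the implied MIPS, then express that as a fraction of the machine. The percentiles are the 2010 B.S. values; the function name is hypothetical:

```python
def mips_from_io(io_rate, s_per_mips):
    """Implied MIPS when the I/O (SSCH) rate is known and an S/MIPS ratio is assumed."""
    return io_rate / s_per_mips

S_PER_MIPS = {"10%": 1.201, "50%": 2.349, "90%": 3.707}   # 2010 z/OS B.S. percentiles
MACHINE_MIPS = 500
for pct, ratio in S_PER_MIPS.items():
    mips = mips_from_io(100, ratio)
    print(f"{pct:>3} S/MIPS={ratio}: {mips:6.2f} MIPS = {mips / MACHINE_MIPS:.1%} of the machine")
```

Carrying the 90th-percentile ratio through as well gives about 27 MIPS (roughly 5.4% of the machine), which shows how wide the range of answers really is when only a ratio is known.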
More Metrics Defined
MIPS = MIPS used
S = SSCH rate
DG = DASD gigabytes. The computation is nominal in that it is 2.83 per act:
DG = Nacts × 2.83
CS = central storage configured
D = the exponent in CS = 4000 + 0.04 × MIPS^D, where CS (or PS) is configured central storage (or RAM)
NNTacts = acts with rate >= 2
AD = access density = S/DG
Full Metrics, 2010 (83 partitions)
Metric      10%     50%     90%
MIPS        403     2004    6247
S           1123    4221    11779
S/MIPS      1.201   2.349   3.707
DG/MIPS     2.236   6.593   52.539
PS/MIPS     6.766   14.055  37.306
D           1.534   1.630   1.894
DASD Resp   0.952   1.865   3.626
DASD Serv   0.681   1.564   2.980
Resp/Serv   1.227   1.827   1.232
Nacts       1305    4084    12105
NNTacts     73      264     882
DASD GB     3693    11558   34256
Used DG     585     2114    7065
AD          0.059   0.360   0.909
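The derived metrics can be computed directly from the definitions above. A minimal Python sketch; the function names are hypothetical, the 2.83-per-act factor and the CS = 4000 + 0.04 × MIPS^D form come from the slide, and solving for D with logarithms is simply the algebraic inverse of that form:

```python
import math

GB_PER_ACT = 2.83   # nominal gigabytes per act, as on the slide

def dasd_gb(nacts):
    """DG = Nacts * 2.83."""
    return nacts * GB_PER_ACT

def access_density(ssch_rate, dg):
    """AD = S / DG: SSCH per second per gigabyte."""
    return ssch_rate / dg

def storage_from_exponent(mips, d):
    """Configured central storage implied by CS = 4000 + 0.04 * MIPS**D."""
    return 4000 + 0.04 * mips ** d

def storage_exponent(cs, mips):
    """Solve CS = 4000 + 0.04 * MIPS**D for D (requires CS > 4000 and MIPS > 1)."""
    return math.log((cs - 4000) / 0.04) / math.log(mips)

print(round(dasd_gb(4084), 1))                 # median Nacts -> DASD GB
print(round(access_density(4221, 11558), 3))   # median S over median DASD GB
```

dasd_gb(4084) reproduces the median DASD GB row (about 11,558). The access density from the two medians comes out near 0.37 rather than the tabulated 0.360, because a percentile of a ratio need not equal the ratio of percentiles.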
If the Model Works, What Should I See?
[Figure: DASD I/O rate vs. CPU%; fitted line y = 60.941x + 655.76, R² = 0.8377]
If the model is valid, the Forced Flow Law prescribes that CPU service varies in proportion to I/O service. In this graph that translates into a linear relationship (R² > 0.7?).
When It Doesn’t Work
[Figure: DASD I/O rate vs. CPU%; fitted line y = 37.324x, R² = 0.113]
If the model doesn't work, you see a non-linear relationship.
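The R² > 0.7 test above is a rule of thumb (note the question mark). A minimal Python sketch of the check, using an ordinary least-squares line with intercept; the interval data and the 0.7 default threshold are illustrative assumptions, not RMF output:

```python
def linear_fit(x, y):
    """Ordinary least squares y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both x and y")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r_squared = sxy * sxy / (sxx * syy)   # square of the Pearson correlation
    return a, b, r_squared

def model_looks_valid(cpu_pct, io_rate, threshold=0.7):
    """Forced Flow Law expectation: the I/O rate should track CPU% linearly."""
    _, _, r2 = linear_fit(cpu_pct, io_rate)
    return r2 > threshold, r2

# Hypothetical interval data
cpu = [20, 30, 40, 50, 60, 70]
io_works = [1900, 2600, 3000, 3800, 4200, 4950]
io_fails = [3000, 1200, 3500, 1500, 2800, 1000]
print(model_looks_valid(cpu, io_works))
print(model_looks_valid(cpu, io_fails))
```

A low R² only says the simple model does not explain the data; finding out why (for example, a changing workload mix) is still the analyst's job.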
RIOC = F(Workload Composition)
[Figure: RIOC by time of day, ranging from 0 to 0.14]
[Figure: CPU% by time of day, by workload: BATDEV2, BATDEV1, BATPROD, STCLO, STCMD, STCHI, SYSSTC, SYSTEM]
Compare the intervals in the RIOC plot where RIOC is stable or unstable with the workload mixture for the same intervals in the CPU% plot. The workload mixture often explains the RIOC variation.
Use of a Conceptual Framework like the BS Model
Establishes relationships: Balanced System, resource ratios
Builds expectations: linear graph
Highlights exceptions: non-linearity, outliers
Generates questions: especially if not as expected
But... it may cause you to see framework (model) interactions that just aren't there!
Another Example
[Figure: Partition GCP MIPS by time of day for CEC7/SYSJ, CEC7/SYSG, CEC4/SYSK, CEC4/SYSR, CEC4/SYSF, CEC6/SYSE, CEC6/SYSH, CEC6/SYSD]
[Figure: zIIP MIPS by time of day for CEC7/SYSJ, CEC7/SYSG, CEC4/SYSR, CEC4/SYSF, CEC6/SYSH, CEC6/SYSD]
CEC 6 Util
[Figure: CEC6 GCP utilization (%) by time of day for SYSD, SYSH, SYSE]
[Figure: CEC 6 zIIP utilization (%) by time of day: Potential, SYSD, SYSH]
SYSD zIIP = F(GCP)?
[Figure: SYSD zIIP MIPS vs. GCP MIPS; R² = 0.02]
Workload zIIP = F(GCP)?
Fuzzy Thinkers Do Not Trust Perfection
[Figure: zIIPs vs. GCPs; fitted line y = 1.9259x, R² = 1]
SYSD Workload zIIP = F(GCP)?
[Figure: SYSD zIIPs vs. GCPs; linear fit y = 0.0844x (R² = 0.9047) and power fit y = 0.1037x^0.9262 (R² = 0.9805)]
CEC 7 Util
[Figure: CEC 7 GCP utilization (%) by time of day for SYSG, SYSJ]
[Figure: CEC7 zIIP utilization (%) by time of day: Potential, SYSG, SYSJ]
Another Example
[Figure: I/O rate vs. MIPS; fitted line y = 2.4545x, R² = 0.3726]
Same Data
[Figure: IO/MIPS by hour of day, 0:00 to 23:00]
Conceptual Framework
[Figure: Balanced System model: Thinking, GKWE, CPU, Memory, two DASD]
Maybe we need a big processor? More engines?????
Future Conceptual Framework?
[Figure: Thinking, GKWE, CPU, Non-Volatile Memory, DASD]
Some Capacity Concerns at a Service Center
Type of server
Number of servers
Speed of servers
Number in queue
Order in queue
Service demand
Type of service
Population
Arrival pattern
z900 Structure
[Figure: z900 20-PU MCM (35 logic chips in total). Two clusters, each with a 16 MB shared L2 cache (cache control chip and cache data chips) and PUs with private L1 caches (Cluster 0: PU00-PU09; Cluster 1: PU0A-PU13); MBA0-MBA3 with STIs; Crypto 0 and Crypto 1; clock; ETR; memory cards 0-3. I/O: Passat I/O cage (optional) and Cargo I/O cage; 333 MByte and 1 GByte STIs; ICB-2 (333 MByte) and ICB-3 (1 GByte); adapters include Parallel 3/4 port, OSA-2 TR, OSA-2 FDDI, ESCON 4 port, ESCON 16 port, FICON 2 port, OSA-E Gb Ethernet, OSA-E Fast Ethernet, OSA-E ATM, ISC-3 1-4 port, and PCI-CC (2 engines).]
Ref. SG24-5975
Connections
[Figure: subchannels → microprocessor (adapter) with buffer → link → microprocessor (adapter) with buffer]
Every line connecting two boxes in a diagram implies microprocessors on each end to do the talking. (What happens if they speak different languages?) Data is moved from a buffer to a microprocessor buffer, onto the link, into the other microprocessor's buffer, and into a storage buffer.
z10 Memory: a simple view
[Figure: two books, each with memory and a shared L2 cache feeding several CPUs, each CPU with private L1.5 and L1 caches; PR/SM spans the books]
The Nest
[Figure: axes labeled Memory Hierarchy or Nest, and Instruction Complexity / Microprocessor Design]
Reference: John Burg’s presentation at SHARE 3/3/2011
http://www.ibm.com/support/techdocs/atsmastr.nsf/Webindex/TC000066
zEnterprise
[Figure: z196 book: 192 MB eDRAM shared L4 on 2 SC chips; six PU chips (PU0-PU5), each with a 24 MB eDRAM shared L3 and four cores, each core with a private L1 and a 1.5 MB L2; flows labeled LRU Cast-Out, CP Stores, and Data Fetch Return]
z196 versus z10 hardware comparison
z10 EC
CPU: 4.4 GHz
Caches: L1 private 64k i, 128k d; L1.5 private 3 MB; L2 shared 48 MB / book; book interconnect: star
z196
CPU: 5.2 GHz; out-of-order execution
Caches: L1 private 64k i, 128k d; L2 private 1.5 MB; L3 shared 24 MB / chip; L4 shared 192 MB / book; book interconnect: star
[Figure: z196 hierarchy: memory, L4 cache, L3 cache per chip, private L2 and L1 per CPU; z10 hierarchy: memory, L2 cache, private L1.5 and L1 per CPU]
Introducing the new Relative Nest Intensity (RNI) metric (SMF 113 data)
Note: these formulas may change in the future.
[Figure: Microprocessor design (L1) is characterized by L1MP: how often? The memory hierarchy or “Nest” (L2LP, L2RP, MEMP) is characterized by the Relative Nest Intensity (RNI): how intensely this part of the architecture is utilized, given the distribution and latency across the technology.]
Level 1 (L1) Miss Percent
If not from L1, from where? (SMF 113 data)
Here's the plot of the percent of sourcing from different levels of cache. As the sourcing moves from the highest level of cache (percent = L15P) to the slowest memory source (percent = MEMP), the performance degrades. Level 1 cache is the fastest and closest to the processing unit. The sourcing shown in the graph is for data not found in level 1 cache. You can check the level 1 cache miss % by graphing variable L1MP.
L15P = % sourced from level 1.5 cache
L2LP = % sourced from level 2 cache, same book
L2RP = % sourced from level 2 cache, different book
MEMP = % sourced from memory
Remember that as more and more of the instructions and data have to be fetched from more distant caches, the machine effectively runs slower.
Relative Nest Intensity (RNI)
z10: RNI = (1.0*L2LP + 2.4*L2RP + 7.5*MEMP) / 100
z196: RNI = 1.6*(0.4*L3P + 1.0*L4LP + 2.4*L4RP + 7.5*MEMP) / 100
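The two RNI formulas transcribe directly into code. A minimal Python sketch; the inputs are the SMF 113 sourcing percentages (for z196, L3P, L4LP, and L4RP are assumed, by analogy with the z10 definitions, to be the percentages sourced from L3, same-book L4, and different-book L4), and the sample values are hypothetical:

```python
def rni_z10(l2lp, l2rp, memp):
    """z10 Relative Nest Intensity; inputs are sourcing percentages (0-100)."""
    return (1.0 * l2lp + 2.4 * l2rp + 7.5 * memp) / 100

def rni_z196(l3p, l4lp, l4rp, memp):
    """z196 Relative Nest Intensity, as given on the slide (the weights may change)."""
    return 1.6 * (0.4 * l3p + 1.0 * l4lp + 2.4 * l4rp + 7.5 * memp) / 100

# Hypothetical sourcing percentages
print(round(rni_z10(l2lp=20, l2rp=5, memp=8), 3))
print(round(rni_z196(l3p=15, l4lp=5, l4rp=1, memp=3), 4))
```

The weights grow with distance from the core, so a few percent sourced from memory (weight 7.5) moves RNI far more than the same percent sourced from the local L2 or L3.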
Memory and Workload Characteristics
L1MP        RNI           Workload Hint
> 6%        >= 0.75       HIGH
> 6%        < 0.75        AVERAGE
3% to 6%    > 1.0         HIGH
3% to 6%    0.6 to 1.0    AVERAGE
3% to 6%    < 0.6         LOW
< 3%        >= 0.75       AVERAGE
< 3%        < 0.75        LOW
Note that these are initial values and may change.
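The table reads naturally as a small decision function. A minimal Python sketch of that lookup; treating the "3% to 6%" and "0.6 to 1.0" bands as inclusive at both ends is an assumption, since the table does not say how boundary values fall:

```python
def workload_hint(l1mp, rni):
    """Map L1 miss percent (L1MP) and RNI to the workload hint in the initial table.

    Boundary handling (3% and 6% inclusive in the middle band, 0.6 and 1.0 inclusive
    in AVERAGE) is an assumption.
    """
    if l1mp < 3:
        return "AVERAGE" if rni >= 0.75 else "LOW"
    if l1mp <= 6:
        if rni > 1.0:
            return "HIGH"
        if rni >= 0.6:
            return "AVERAGE"
        return "LOW"
    return "HIGH" if rni >= 0.75 else "AVERAGE"

for l1mp, rni in [(2.0, 0.8), (4.5, 1.2), (4.5, 0.4), (7.0, 0.5)]:
    print(f"L1MP={l1mp}%, RNI={rni}: {workload_hint(l1mp, rni)}")
```

Keeping the thresholds in one function makes it easy to update them when the initial values change.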
Hiper Dispatch
The problem of cache misses is alleviated by controlling the dispatch of logical CPs (LCPs) on specific real CPs (RCPs):
Keep LCP-RCP on the same book.
Minimize PR/SM dispatching.
Keep LCP on the same RCP.
Re-dispatch work units on the same processor subset.
zPCR
Establish Power value in MIPS?
Processor Power: zPCR
Configuration input: manual, RMF listing, or EDF file (CP3KEXTR)
Processor Power: zPCR
Maximum: MIPS available if other partitions are idle, given the logical configuration.
Minimum: MIPS entitled to if other partitions are demanding their fair share (weight), given the logical configuration.
[Figure: 2097-E56 summary with this logical configuration]
Capacity Planning
[Figure: resource usage vs. time, growing from t0 up to the SLO line, where CP actions are taken]
Starting from a single point at t0, we project growth until some threshold is reached: a Service Level Objective or Agreement (SLO, SLA).
Then we take action.
SLO or SLA for a Workload Group
Interactive
For some number of users
At some threshold transaction rate
At some threshold Response Time
At some amount of power & I/O
Batch
For some number of Jobs
At some rate
At some amount of power & I/O
At some turn-around threshold
Capacity Planning Actions
Upgrade hardware: add CPs (PUs), new model, add another CPC
Move workload to another image
Split workload and move a piece
Tune it?
Continue to suffer
How Accurate Is It?
[Figure: prediction vs. time, starting at t0]
Starting from an initial point of maybe dubious accuracy, we apply a growth rate (also dubious) and then recommend actions costing lots of money.
Accuracy
[Figure: two panels of prediction vs. time from t0]
Accuracy is found in values that are close to the expected curve. This closeness implies an expected bound or variation in reality, so a thicker line makes sense.
Fuzzy Patches
[Figure: prediction vs. time from t0 to t, ending at a precise point p in one panel and at a fuzzy patch around p in the other]
At time t, is the prediction a precise point p or a fuzzy patch?
Fuzzy Factors
Basis for prediction is a single sample taken from a set of samples with some distribution.
Growth Factor applied may be just better than fiction.
Prediction compounds the fuzz and is itself fuzzy.
Niels Bohr: “Prediction is very hard to do. Especially about the future.”
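A minimal Python sketch of how the fuzz compounds: treat both the starting utilization and the annual growth rate as ranges rather than points, and carry both bounds forward with the same compound-growth rule used for the capacity plan. The ranges are hypothetical, and simple interval bounds stand in for a proper statistical treatment:

```python
def projection_band(base_low, base_high, growth_low, growth_high, months):
    """Lower and upper projected usage after `months`, given ranges for the base and annual growth (%)."""
    f_low = (1 + growth_low / 100) ** (months / 12)
    f_high = (1 + growth_high / 100) ** (months / 12)
    return base_low * f_low, base_high * f_high

# Hypothetical: base utilization somewhere in 55-65%, annual growth somewhere in 20-40%
for months in (0, 6, 12, 24):
    low, high = projection_band(55.0, 65.0, 20.0, 40.0, months)
    print(f"{months:2d} months: {low:5.1f} to {high:5.1f}  (width {high - low:4.1f})")
```

The band widens with time even though neither input gets any fuzzier: the point prediction becomes a growing patch.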
Analytic Example: Erlang’s M/M/c
[Figure: a queue (Q) feeding c servers (S)]
c = number of CPs
U = utilization
T = traffic, or c*U
C(c,T) is Erlang's C formula
E[S] is the expected service time
From any queuing theory book: Allen or Jain, for example.
$$E[RT] = E[S] + E[Q]$$
$$E[RT] = E[S] + \frac{C(c,T)\,E[S]}{c(1-U)}$$
$$E[RT] = E[S]\left(1 + \frac{C(c,T)}{c(1-U)}\right)$$
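The M/M/c response time can be evaluated with a few lines of code. A minimal Python sketch; it computes Erlang's C formula through the standard Erlang B recursion (one common way to evaluate it stably, not necessarily how any particular CP tool does it), then applies the last formula above:

```python
def erlang_c(c, traffic):
    """Erlang's C formula: probability that an arrival must wait in an M/M/c queue.

    c is the number of servers (CPs); traffic T = c * U, with U < 1.
    """
    b = 1.0                                   # Erlang B with zero servers
    for k in range(1, c + 1):
        b = traffic * b / (k + traffic * b)   # Erlang B recursion
    u = traffic / c
    return b / (1 - u * (1 - b))              # convert Erlang B to Erlang C

def mmc_response_time(service_time, c, utilization):
    """E[RT] = E[S] * (1 + C(c,T) / (c * (1 - U)))."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    contention = erlang_c(c, c * utilization) / (c * (1 - utilization))
    return service_time * (1 + contention)

print(round(mmc_response_time(30, 1, 0.80), 1))   # one CP, 30 ms service, 80% busy
print(round(mmc_response_time(10, 4, 0.65), 2))   # four CPs, 10 ms service, 65% busy
```

With c = 1, Erlang's C reduces to U and the formula becomes the familiar M/M/1 result E[S]/(1-U), so the first call gives 150 ms. At the same utilization, more servers mean less queuing, which is one reason "faster or more CPs?" is a real question.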
Read $\frac{C(c,T)}{c(1-U)}$ as the Contention Factor (CF). When CF = 0, E[RT] = E[S]; when CF = 1, E[RT] = 2·E[S].
M/M/1 (c = 1)
For a single server, C(1,T) = U, so
$$E[RT] = E[S]\left(1 + \frac{U}{1-U}\right) = \frac{E[S]}{1-U}$$
If E[S] = 30 ms and U = 80%, then E[RT] = 30/(1 - 0.8) = 150 ms.
M/M/1 Exercise
Assume M/M/1.
(1) What is the expected RT for Medium?
(2) At what effective utilization would the response time for Medium exceed 1.2 seconds?
Workload   Service Time   Workload Utilization   Perceived Utilization
Hi         0.05 sec       32%                    32%
Medium     0.25 sec       45%                    77%
Low        1.32 min       12%                    89%
Answers:
(1) RT = ST / (1 - U) = 0.25 / (1 - 0.77) ≈ 1.1 sec
(2) 1.2 = 0.25 / (1 - U) → U ≈ 79%
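Both answers come from RT = ST / (1 - U) and its rearrangement U = 1 - ST / RT, using the perceived utilization, which in the table is the cumulative utilization of Medium and everything of higher priority. A minimal Python sketch; the function names are hypothetical:

```python
def mm1_response_time(service_time, utilization):
    """M/M/1: RT = ST / (1 - U)."""
    return service_time / (1 - utilization)

def mm1_max_utilization(service_time, rt_limit):
    """Utilization at which the M/M/1 response time reaches rt_limit: U = 1 - ST / RT."""
    return 1 - service_time / rt_limit

print(round(mm1_response_time(0.25, 0.77), 2))    # 1.09
print(round(mm1_max_utilization(0.25, 1.2), 3))   # 0.792
```

So Medium is already close to its 1.2-second limit: only about two points of perceived utilization separate 77% from 79%.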
Simple Capacity Plan
Problem: For the following workload, find the future workload utilization by month if the per annum growth rate is 30%.
Base: # LCPs 4, MIPS 1000, MIPS/LCP 250
Target: # LCPs 6, MIPS 1200, MIPS/LCP 200
Base → Target: more MIPS, more engines, slower engines
Workload utilization: Hi 40.0, Middle 25.0, Low 20.0; Total 85.0
Growth Computations (utilization: input and projected)
Workload   Jun-06   Jul-06
Hi         40.0     40.9
Middle     25.0     25.6
Low        10.0     10.2
Total      75.0     76.7
Per annum growth: G = 30
Period length in months: L = 1
Period factor: F = 1.022104451
$$F = (1 + 0.01\,G)^{L/12}$$
Per annum check: $1.022104451^{12} = 1.3$
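The growth computation is easy to script. A minimal Python sketch that applies the period factor month by month and reports the first month a projection crosses a threshold; the function names are hypothetical, and the 90% threshold in the usage is an illustrative stand-in for an SLO (it echoes the CPU% < 90% rule of thumb):

```python
def period_factor(annual_growth_pct, period_months=1):
    """F = (1 + 0.01*G) ** (L/12): growth factor per period of L months."""
    return (1 + 0.01 * annual_growth_pct) ** (period_months / 12)

def project(utilization, annual_growth_pct, periods, period_months=1):
    """Projected utilization for periods 0..periods, compounding F each period."""
    f = period_factor(annual_growth_pct, period_months)
    return [utilization * f ** n for n in range(periods + 1)]

def first_period_over(utilization, annual_growth_pct, threshold, max_periods=120):
    """Index of the first monthly period whose projection exceeds threshold, or None."""
    f = period_factor(annual_growth_pct)
    for n in range(max_periods + 1):
        if utilization * f ** n > threshold:
            return n
    return None

print([round(u, 1) for u in project(85.0, 30, 10)])   # the base Total row
print(first_period_over(85.0, 30, 90.0))              # months until Total passes 90%
```

Starting from a Total of 85.0, the projection passes 90% in the third month, which is the point on the Capacity Planning slide where action is needed.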
Base Utilization (PA growth 30, period 1, period F 1.022104, #CPs 4)
Workload  Jun-07  Jul-07  Aug-07  Sep-07  Oct-07  Nov-07  Dec-07  Jan-08  Feb-08  Mar-08  Apr-08
Hi        40.0    40.9    41.8    42.7    43.7    44.6    45.6    46.6    47.6    48.7    49.8
Middle    25.0    25.6    26.1    26.7    27.3    27.9    28.5    29.1    29.8    30.4    31.1
Low       20.0    20.4    20.9    21.4    21.8    22.3    22.8    23.3    23.8    24.3    24.9
Total     85.0    86.9    88.8    90.8    92.8    94.8    96.9    99.1    101.2   103.5   105.8
[Figure: stacked CPU% by month, Jun-07 to Apr-08, for Low, Middle, Hi]
Base Response Time
CPU% by month:
Workload  Service T  Jun-06  Jul-06  Aug-06  Sep-06  Oct-06  Nov-06  Dec-06  Jan-07  Feb-07  Mar-07  Apr-07
Hi        5          40.0    40.9    41.8    42.7    43.7    44.6    45.6    46.6    47.6    48.7    49.8
Middle    10         65.0    66.4    67.9    69.4    70.9    72.5    74.1    75.7    77.4    79.1    80.9
Low       10         85.0    86.9    88.8    90.8    92.8    94.8    96.9    99.1    101.2   103.5   105.8
Response time (ms) by month:
Hi                   5.2     5.2     5.2     5.2     5.3     5.3     5.3     5.3     5.4     5.4     5.4
Middle               12.5    12.8    13.1    13.4    13.8    14.3    14.8    15.4    16.1    17.0    18.0
Low                  21.5    23.8    27.0    31.7    39.2    52.8    85.6    269.6   0.0     0.0     0.0
[Figure: response time (ms) vs. CPU% for Hi, Middle, Low]
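The Base Response Time rows appear consistent with the M/M/c formula evaluated with c = 4 CPs at each workload's perceived utilization (its own CPU% plus that of all higher-priority workloads, which is what the cumulative CPU% rows show). That reading is inferred from the numbers rather than stated on the slide. A minimal Python sketch that reproduces the first month (Hi 5.2, Middle 12.5, Low 21.5); the function names are hypothetical:

```python
def erlang_c(c, traffic):
    """Probability of waiting in M/M/c (Erlang C), via the Erlang B recursion."""
    b = 1.0
    for k in range(1, c + 1):
        b = traffic * b / (k + traffic * b)
    u = traffic / c
    return b / (1 - u * (1 - b))

def priority_response_times(workloads, c):
    """workloads: list of (name, service_time, utilization_pct) in priority order, highest first.

    Each workload sees the cumulative ("perceived") utilization of itself plus higher priorities.
    Returns name -> response time, or None when the perceived utilization reaches 100%.
    """
    results = {}
    perceived = 0.0
    for name, service_time, util_pct in workloads:
        perceived += util_pct / 100
        if perceived >= 1:
            results[name] = None          # saturated: no steady-state response time
            continue
        contention = erlang_c(c, c * perceived) / (c * (1 - perceived))
        results[name] = service_time * (1 + contention)
    return results

base = [("Hi", 5, 40.0), ("Middle", 10, 25.0), ("Low", 10, 20.0)]
for name, rt in priority_response_times(base, c=4).items():
    print(name, None if rt is None else round(rt, 1))
```

Returning None for a saturated workload corresponds to the 0.0 entries in the Low row once its perceived utilization passes 100%.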
Migration Options
Migrate to same MIPS and more CPs.
Migrate to same MIPS and fewer CPs.
Migrate to more MIPS and fewer CPs.
Migrate to more MIPS and more CPs.
Utilization? Service time? Evaluation?
Target Utilization (PA growth 30, period 1, period F 1.022104, #CPs 2)
Workload  Jun-07  Jul-07  Aug-07  Sep-07  Oct-07  Nov-07  Dec-07  Jan-08  Feb-08  Mar-08  Apr-08
Hi        33.3    34.1    34.8    35.6    36.4    37.2    38.0    38.8    39.7    40.6    41.5
Middle    20.8    21.3    21.8    22.2    22.7    23.2    23.8    24.3    24.8    25.4    25.9
Low       16.7    17.0    17.4    17.8    18.2    18.6    19.0    19.4    19.9    20.3    20.7
Total     70.8    72.4    74.0    75.6    77.3    79.0    80.8    82.5    84.4    86.2    88.1
[Figure: stacked CPU% by month, Jun-07 to Apr-08, for Low, Middle, Hi]
Target Response Time
CPU% by month:
Workload  Service T  Jun-07  Jul-07  Aug-07  Sep-07  Oct-07  Nov-07  Dec-07  Jan-08  Feb-08  Mar-08  Apr-08
Hi        8.333333   33.3    34.1    34.8    35.6    36.4    37.2    38.0    38.8    39.7    40.6    41.5
Middle    16.66667   54.2    55.4    56.6    57.8    59.1    60.4    61.8    63.1    64.5    65.9    67.4
Low       16.66667   70.8    72.4    74.0    75.6    77.3    79.0    80.8    82.5    84.4    86.2    88.1
Response time (ms) by month:
Hi                   9.4     9.4     9.5     9.5     9.6     9.7     9.7     9.8     9.9     10.0    10.1
Middle               23.6    24.0    24.5    25.0    25.6    26.3    26.9    27.7    28.6    29.5    30.5
Low                  33.4    35.0    36.8    38.9    41.4    44.4    47.9    52.3    57.8    65.0    74.7
[Figure: response time (ms) vs. CPU% for Hi, Middle, Low]
Compare Utilization
[Figure: response time (ms) vs. CPU% for HiBase, MidBase, HiTarg, MidTarg, LowBase, LowTarg]
Compare Transaction Rate
[Figure: response time (ms) by period, Jun-06 to Apr-07, for HiBase, MidBase, HiTarg, MidTarg, LowBase, LowTarg]
For each period, Base and Target have the same transaction rate.
How to Compare?
[Figure: response time (ms) vs. CPU% for LowBase and LowTarg, marking the comparison at the same utilization and at the same transaction rate]
Modeling Restrictions
At CPU contention points, WLM and IRD in z/OS can restructure the physical and logical configuration.
At I/O contention points in z/OS, PAVs can be moved around.
Model arrival rates and service distributions can vary significantly.
Priority ordering varies under WLM control.
In a Sysplex, transactions could be routed to other partitions.
Software serialization points are usually not modeled.
Model output results strictly mimic the model. If you miss something important, the results may not reflect your environment.
Various models differ greatly in detail. Don't miss an important service center.
CP Alternatives
[Figure: ROTs, Trending, Analytic Modeling, Simulation, and Benchmark plotted by Difficulty (Time) & Cost ($$$) against Accuracy]
Rules of Thumb
ROTs (thresholds) are often quite adequate for CP
Useful for Health Check
Rules of Thumb
• Honor your Father & Mother
• Do unto others as you would have them do unto you.
• Do unto others before they do unto you.
• Keep your CPU%<90%
• Don’t swim soon after eating.
• It is better to give than receive.
Philosophical Remark
We understand a rule by trying to break it.
Or: learn the rules so you know how to break them correctly.
All swans are white ≡ There does not exist a swan which is not white
Balanced System ROTs
[Figure: Balanced System model: Thinking, GKWE, CPU, Memory, two DASD]
MIPS used, memory used, and I/O used should be in some proportion.
Expected Bounds
The resource ratio is shown as a bar. If the bar is above the 90%ile line, the value was in the top 10% of the samples reviewed. Similarly, if the bar is below the 10%ile line, the value is in the bottom 10%. Neither is good or bad; it's a flag to examine the amount of resource available.
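The bar-chart check amounts to comparing each measured ratio against the 10th and 90th percentile lines. A minimal Python sketch; the bounds are the 2010 z/OS B.S. percentiles, while the measured ratios are hypothetical:

```python
def flag_ratio(value, p10, p90):
    """Flag a resource ratio against 10th/90th percentile bounds; neither flag means good or bad."""
    if value > p90:
        return "above 90%ile: examine"
    if value < p10:
        return "below 10%ile: examine"
    return "within expected bounds"

# 10th and 90th percentiles from the 2010 z/OS B.S. metrics; measured values are hypothetical
BOUNDS = {"S/MIPS": (1.201, 3.707), "DG/MIPS": (2.236, 52.539), "PS/MIPS": (6.766, 37.306)}
measured = {"S/MIPS": 4.1, "DG/MIPS": 7.0, "PS/MIPS": 5.2}
for metric, value in measured.items():
    p10, p90 = BOUNDS[metric]
    print(f"{metric}: {value} -> {flag_ratio(value, p10, p90)}")
```

A flagged ratio is a prompt to look at the amount of that resource, not a verdict.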
Trending
Trending predicts the future if the future looks like the past.
Time Series Trending is complicated.
Trending can answer overall CP questions.
Analytic Modeling
[Figure: Balanced System model: Thinking, GKWE, CPU, Memory, two DASD]
Pre-built packages can be fast to solve and relatively easy to use.
Flow is statistically driven and usually predefined.
Accuracy? Utilization within 5%; response times within 30%.
Data acquisition is key.
Calibration can be tough.
Custom analytic models are really tough.
Requires technical staff.
Services are available.
ROTs & Analytic Modeling
[Figure: response time vs. system utilization for Low, Middle, Hi]
[Figure: stacked CPU% by month, Jun-07 to Apr-08, for Low, Middle, Hi]
The relationship between utilization and server response is sensitive to the priority of the workload. The utilization used in the response-time calculation is the “perceived utilization”. Watch out for logical vs. physical utilization and for single-task workloads.
Simulation
Pre-built packages are slower to solve and can be relatively easy to use.
Flow is statistically driven and usually predefined, but can be customized (application modeling).
Accuracy? Utilization within 5%; response times within 30%.
Data acquisition is key.
Calibration can be tough.
Custom models are built from service center building blocks.
Simulation languages do exist.
Specialized staff.
Services exist.
[Figure: Balanced System model: Thinking, GKWE, CPU, Memory, two DASD]
Benchmark
A lot of work in preparation: hardware/software, workload.
Lots of time.
It mimics the running environment best: software flow & queuing, software usage.
It's expensive.
Variations are limited by resources.
Given the resources, the benchmark can be complicated.
Tests the environment: does it work?
CP Alternatives
[Figure: ROTs, Trending, Analytic Model, Simulation, and Benchmark plotted by Difficulty against Accuracy]
What questions do you have? What questions must you answer?
Cost of the answer? Cost of getting it wrong? Time line?
What happens if you get it wrong?
Things to Remember
• Be aware of the exceptions to the rule.
• A framework helps, but it can make you see things that just aren't there.
• Impeccable mathematics does not replace knowledge of the facts.
• Protect yourself.
• Business decisions can override technical issues.
• Sometimes being understood is more important than being very accurate.
• Being "very" accurate may be a luxury of the idle.
• Other than the technicalities, there may be a hidden agenda.
Bibliography - I
The Art of Computer Systems Performance Analysis, by Raj Jain, Wiley. I like this one. It is thorough and complete. A very good reference.
Capacity Planning for Web Performance, by Daniel A. Menascé and Virgilio A. F. Almeida, Prentice Hall. A good book on network structure and terminology, and an introduction to the topic.
Probability, Statistics, and Queuing Theory, by Arnold O. Allen, Academic Press Inc. This is the classic in queuing theory.
Performance by Design: Computer Capacity Planning by Example, by Daniel A. Menascé, Virgilio A. F. Almeida, and L. W. Dowdy. The web site http://cs.gmu.edu/~menasce/perfbyd/ has a lot of .xls modeling worksheets.
MVS I/O Subsystems, by Gilbert E. Houtekamer and H. Pat Artis, Performance Associates. More than you want to know about the I/O subsystem. A definitive source, but a little out of date. Available from Intellimagic or perfassoc.com.
Exploring IBM S/390 Computers, by Jim Hoskins and George Coleman, Maximum Press. A general introduction to S/390 hardware and architecture (with IBM G326-3006-06).
Statistical Concepts and Methods, by Gouri Bhattacharyya and Richard A. Johnson, John Wiley & Sons.
The Practical Performance Analyst, by Neil J. Gunther, Authors Choice Press. A very good book.
Almost any volume of the Computer Measurement Group (CMG) Proceedings is worth looking at for performance and capacity planning articles. Web site: http://www.cmg.org/measureit/
Bibliography - II
GC28-1761 MVS™ Planning: Workload Management. A guide to WLM.
SC28-1950 Resource Measurement Facility Report Analysis. A guide to report reading.
SC28-1951 Resource Measurement Facility Performance Management Guide. A good tutorial to get started.
SG24-5975 IBM zSeries 900 Technical Guide. A good hardware architecture and implementation Red Book.
LY28-1042 RMF™ Support for LPAR Management Time. Want to know how LPAR works?
SC28-1187 Large Systems Performance Reference, by John Fitch. John goes into detail about the LSPR data.
SG24-4356 System/390® MVS Parallel Sysplex Performance. A good Red Book on Parallel Sysplex RMF reports and data.
SG24-4680 System/390 MVS Parallel Sysplex Capacity Planning. A good Red Book on the function and capacity of Parallel Sysplex.
A great URL for z/Series documents in general: http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/
RMF in particular: http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/r4pdf/rmf.html
For zPCR, search www.ibm.com for “zPCR” & “SoftCap”.
Bibliography - III
Excel:
Applied Statistics for Engineers and Scientists Using Excel and MINITAB, by David Levine, Patricia Ramsey, and Robert Smidt, Prentice Hall. This comes with a CD containing handy Excel add-ins.
Excel Data Analysis, by Jinjier Simon, Wiley. A nice basic reference concentrating on data presentation.
Other Good Stuff:
The Black Swan: The Impact of the Highly Improbable, by Nassim Nicholas Taleb, Random House. This is an informative and entertaining approach to statistical analysis, among other things.
Statistics as Principled Argument, by Robert Abelson, Erlbaum Assoc. Publishers, 1995. A good discussion of the use of statistics without the ugly formulae.
Judgment under Uncertainty: Heuristics and Biases, Kahneman, Slovic, & Tversky, Cambridge University Press. The first chapter alone is worth reading. It’s a summary of the pitfalls of intuitive thinking.
Ray Wicks’ Monographs
The CPS document for this presentation and other interesting monographs can be obtained from your favorite IBMer. Go to: ftp://cpstools.washington.ibm.com/zcp3000/win
Look for: Getting Started In CP.
These have been published in cmg.org/measureit.