Much Ado About CPU

© 2009 IBM Corporation 1 Much Ado About CPU Martin Packer +44-20-8832-5167 [email protected] Twitter: MartinPacker


DESCRIPTION

My regularly updated presentation on z/OS mainframe performance in the CPU area.

TRANSCRIPT

Page 1: Much Ado About CPU


Much Ado About CPU

Martin Packer
+44-20-8832-5167
[email protected]
Twitter: MartinPacker

Page 2: Much Ado About CPU

Abstract

zSeries, System z9 and z10 processors have in recent years introduced a number of capabilities of real value to mainframe customers. These capabilities have, however, required changes in the way we think about CPU management.

This presentation describes these capabilities and how to evolve your CPU management to take them into account. It is based on the author's experience of evolving his reporting to support these changes.

This presentation is substantially enhanced this year.

Page 3: Much Ado About CPU


Agenda

Review of technology

"Traditional" LPAR Configuration and IRD

Coupling Facility CPU

IFL

zAAP and zIIP

z/OS Release 8, 9 and 10 Changes

Soft Capping and Group Capacity Limits

Blocked Workloads

z10 Hiperdispatch

Cool It

I/O Assist Processors (IOPs)

In Conclusion

Backup Foils

Page 4: Much Ado About CPU

Review of Technology

Page 5: Much Ado About CPU

"Characterisable" Engines

ICF CPs
– Run CF code only
– Dynamic Dispatch an option

IFL CPs
– Run Linux only (though often under VM)

z/OS engines

zAAPs
– Run Java code "offloaded" from regular CPs (GCPs)
– Also System XML etc.

zIIPs
– Run certain kinds of DB2 work offloaded from GCPs
– z/OS Global Mirror
– IPSec Encryption

GCPs
– General purpose CPs

"Non-Characterisable" Engines

SAPs
– I/O Assist Processors

Spares
– Fewer in multi-book z9 and z10 machines than in z990

Page 6: Much Ado About CPU

Book-Structured From z990 Onwards

● Connected by rings in z990 and z9
● z10 ensures all books are connected to all books directly
● Data transfers are direct between books via the L2 Cache chip in each book's MCM
● L2 Cache is shared by every PU on the MCM
● Only 1 book in z890, z9 BC and z10 BC models

Page 7: Much Ado About CPU

IRD CPU Management

Weight Management for GCP engines
– Alter weights within an LPAR Cluster
– Shifts of 10% of weight

CP Management
– Vary LOGICAL CPs on and off
– Only for GCP engines

WLM objectives
– Optimise goal attainment
– Optimise PR/SM overhead
– Optimise LPAR throughput

Part of "On Demand" picture
– Ensure you have defined reserved engines
– Make weights sensible to allow shifts to happen

Page 8: Much Ado About CPU


"Traditional" LPAR Configuration and IRD

Page 9: Much Ado About CPU

Some Old Questions

How do we evolve our performance and capacity reporting?

Should we define an LPAR with dedicated engines?
– Or with shared engines?

What should the weights be?
– In total and individually
– And what about in each pool?

How many engines should each LPAR have?
– For dedicated engines:
  • The number is usually fairly obvious
– For shared engines:
  • The number of engines should roughly match the share of the pool
  • Other considerations often apply, though

Page 10: Much Ado About CPU

Increasing Complexity

Installations are increasing the numbers of LPARs on a machine
– Many exceed 10 per footprint
– Expect 20+ soon

And have more logical and physical engines

And increasing the diversity of their LPARs
– Greater incidence of IFLs
– Fast uptake of zIIPs and zAAPs
  • Sometimes meaning 2 engine speeds
– Fewer stand-alone CF configurations

With mergers etc. the number of machines managed by a team is increasing

And stuff's got more dynamic, too

As an aside... shouldn't systems be self-documenting?

Page 11: Much Ado About CPU

IRD

Weights altered dynamically
– But only for GCPs

Numbers of engines altered dynamically
– But only for GCPs
– And not with HiperDispatch turned on

These introduce their own problems:
– Varying weights when doing "share" calculations in reporting
– Fractional engines and varying engines

Number of engines may go down when the machine gets very busy
– This MIGHT be a surprising result
– This is OK if goals are still met
– In the example in the backup foils even the minimum engine count is well above the actual LPAR capacity requirement

Page 12: Much Ado About CPU

CPU Analysis with IRD

z/OS image utilisation becomes less tenable
– How do you compare 90% of 4 engines to 80% of 5?
  • Could happen in neighbouring intervals
  • Answer: 3.6 engines vs 4.0 engines

Capture ratio needs to take into account fractional engines
– And varying at that

Percent of share becomes less meaningful
– As the denominator can vary with varying weights

Stacking up Partition Data Report utilisations still makes sense
– Probably the best way of summarising footprint and z/OS image utilisation
– This is true for all pools, though IRD only relates to GCPs
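The arithmetic behind the "90% of 4 vs 80% of 5" comparison is simple to sketch. The function name below is mine, not from any RMF tooling; it just converts a utilisation percentage into busy-engine units, which stay comparable when IRD varies the logical CP count between intervals.

```python
def engine_equivalents(util_pct: float, online_engines: float) -> float:
    """Convert an image-level utilisation percentage into busy-engine
    units, which remain comparable across intervals with different
    online engine counts."""
    return util_pct / 100.0 * online_engines

# Neighbouring intervals: 90% of 4 engines vs 80% of 5 engines
# -> 3.6 busy engines vs 4.0 busy engines.
```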

Page 13: Much Ado About CPU


Coupling Facility CPU

Page 14: Much Ado About CPU

Internal Coupling Facility (ICF)

Managed out of common Pool 2 in z990
– Out of Pool 5 in z9 and z10
– Pool numbers given in SMF 70 as index into table of labels
– Called "ICF" in both z990 and z9 / z10

Recommendation: Manage in reporting as a separate pool

Follow special CF sizing guidelines
– Especially for takeover situations

Always runs at full speed
– So a good technology match for coupled z/OS images on the same footprint

Another good reason to use ICFs is IC links

Shared ICFs strongly discouraged for Production
– Especially if the CF image has Dynamic Dispatch turned on

Page 15: Much Ado About CPU

ICF ...

R744PBSY and R744PWAI add up to the SMF 70-1 LPAR view of processor busy
• PBSY is CPU time processing requests
• PWAI is CPU time while CFCC is not processing requests but is still using CF cycles
• For Dynamic Dispatch, PWAI is time when not processing CF requests but the logical CP has not yet been taken back by PR/SM
• For dedicated or non-Dynamic-Dispatch cases the sum is constant
• For Dynamic Dispatch the sum can vary

Number of defined processors is the number of CF Processor Data sections in 74-4

PBSY and PWAI can be examined down to Coupling Facility engine level

SMF 74-4 has much more besides CF Utilisation
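Given R744PBSY and R744PWAI, a CF processor utilisation number can be sketched as below. The function names, and the choice to express both sums as a percentage of the RMF interval, are my own illustration rather than RMF's definitions:

```python
def cf_processor_busy_pct(pbsy_secs: float, pwai_secs: float,
                          interval_secs: float) -> float:
    """LPAR view of CF processor busy: R744PBSY + R744PWAI as a
    percentage of the interval. Near-constant for dedicated or
    non-Dynamic-Dispatch CFs; can vary under Dynamic Dispatch."""
    return 100.0 * (pbsy_secs + pwai_secs) / interval_secs

def cf_effective_busy_pct(pbsy_secs: float, interval_secs: float) -> float:
    """Time actually spent processing CF requests (PBSY only)."""
    return 100.0 * pbsy_secs / interval_secs

# A CF engine over a 900-second interval: 300s processing requests,
# 600s holding the logical CP without processing requests ->
# 100% busy from the LPAR view, but only about 33% doing real work.
```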

Page 16: Much Ado About CPU

ICF ...

Need to correlate SMF 70-1 with SMF 74-4 CF Utilisation to get a proper CPU picture

Since z/OS Release 8, 74-4 has the machine serial number
– Allows correlation in most cases
– Partition number added to 74-4 in OA21140
  • Enables correlation with 70-1 when the LPAR name is not the Coupling Facility name

Page 17: Much Ado About CPU

Structure-Level CPU Consumption

CFLEVEL 15 and z/OS R.9

Always 100% Capture Ratio
– Adds up to R744PBSY

Multiple uses:
– Capacity planning for changing request rates
– Examine which structures are large consumers
– Compute CPU cost of a request
  • And compare to service time
  • The interesting number is the "non-CPU" element of service time
    – as we shall see

NOTE:
– Need to collect 74-4 data from all z/OS systems sharing to get the total request rate
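The "CPU cost of a request" computation just described can be sketched as follows. The helper names are mine; the inputs are the structure execution time (which adds up to R744PBSY) and the request counts gathered, as noted above, from the 74-4 records of ALL sharing z/OS systems:

```python
def cpu_per_request_us(exec_time_secs: float, requests: float) -> float:
    """CPU cost of one structure request, in microseconds.
    exec_time_secs: structure execution time for the interval, summed
    across the 74-4 records of all sharing z/OS systems."""
    return exec_time_secs * 1_000_000 / requests

def non_cpu_service_us(service_us: float, cpu_us: float) -> float:
    """The 'non-CPU' element of service time: link, queueing, etc."""
    return service_us - cpu_us

# 0.9s of structure CPU over 100,000 requests is 9 microseconds each;
# against a 14-microsecond sync service time, 5 microseconds are non-CPU.
```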

Page 18: Much Ado About CPU


Structure CPU Experiment

Page 19: Much Ado About CPU

Structure CPU Experiment

Based on
– R744SETM Structure Execution Time
– Sync Request Rate
  • Virtually no Async
– Sync Service Time

One-minute RMF intervals
– Sorted by request rate, increasing

Run was 1-way DB2 Datasharing
– Only really active structures were ISGLOCK and LOCK1

Red lines are CPU time per request
– Blue lines are Service time per request

ISGLOCK "low volume"
– Shows amortization of some fixed-cost effect
  • Wondering also if some "practice effect" affects service times
– CF used IC links

LOCK1 "high volume"
– More reliable for capacity planning
– CF used a mixture of ISC and ICB links

Page 20: Much Ado About CPU

ISGLOCK Requests

[Chart: CPU Time and Service Time per request, in microseconds, vs Requests / Second (0 to 70)]

Page 21: Much Ado About CPU

LOCK1 Requests

[Chart: CPU Time and Service Time per request, in microseconds, vs Requests / Second (750 to 900)]

Page 22: Much Ado About CPU

And From My Travels...

Next chart isn't from the experiment just described
– A real customer system

A Group Buffer Pool

ISC-connected
– Necessary for the customer's estate

Clearly something goes wrong at about 1100 requests / second
– Especially in response-time terms, but also CPU
  • (Coupling Facility not CPU constrained)

Options include
– Managing the request rate to below 1100 / sec
– Working on the request mix
– Infrastructure reconfiguration

Page 23: Much Ado About CPU


Page 24: Much Ado About CPU

IFL

Page 25: Much Ado About CPU

IFL

Integrated Facility for Linux
– Runs Linux
  • Perhaps under VM

"Pool 2" in z990
– Separate "Pool 4" in z9 and z10
– Labeled "IFL"

Can be managed under IRD
– Set velocity goals
– Weight Management only
  • Not CP management

For a good view of utilisation use VM etc. monitors
– Unless shared IFL

Always runs at full speed

Page 26: Much Ado About CPU


zAAP and zIIP

Page 27: Much Ado About CPU

zAAP and zIIP

Must not exceed number of GCPs

Run at full speed, even if GCPs don't

Hardcapping but no softcapping

zAAP: "Pool 2" engines in z990
– Separate Pool 3 in z9 and z10 for zAAPs
– Separate Pool 6 for zIIPs

Not managed by IRD
– Weight is the INITIAL LPAR weight

Page 28: Much Ado About CPU

zAAP and zIIP Management

zAAP-supported workloads can also run on a GCP
– If IFACROSSOVER=YES
– zAAP-supported workload runs on the GCP at the priority of the original workload
  • If IFAHONORPRIORITY=YES
  • OA14131 removes the need for IFACROSSOVER=YES in order to use IFAHONORPRIORITY=YES

zIIP implementation similar to zAAP
– Reporting almost identical in RMF and Type 30
– Simplified management
  • IFAHONORPRIORITY not used
  • "YES" behaviour always

Page 29: Much Ado About CPU

SMF Type 70 and zAAP

(Similar for zIIP)

Field            Section                  Description
SMF70PRF Bit 4   Product                  IFA processors available
SMF70IFA         CPU Control              Number of IFA processors online at end of interval
SMF70TYP         CPU Data                 This engine is an IFA
SMF70CIX         Logical Processor Data   2 if "Pool 2", i.e. IFA, ICF or IFL

Page 30: Much Ado About CPU

SMF Type 72 and zAAP

(Similar for zIIP)

RMF Workload Activity Report

--SERVICE TIMES--
TCB   581.6  <- TCB time (seconds)
SRB     0.0
RCT     0.0
IIT     0.0
HST     0.0
IFA     0.0  <- zAAP time (seconds)

APPL% CP     64.6  <- GCP % of an engine
APPL% IFACP   0.0  <- % of an engine that could have been zAAP but wasn't
APPL% IFA     0.0  <- % of an engine that used a zAAP

Page 31: Much Ado About CPU

SMF Type 72 / 30 and zAAP

(Similar for zIIP)

APPL% IFACP is a subset of APPL% CP

Field R723NFFI is the normalization factor for IFA service time
– Used to convert between real IFA time and normalized IFA time
  • Equivalent time on a GCP
  • Multiply: R723IFAT x R723NFFI / 256 = normalized IFA time

R723IFAU, R723IFCU, R723IFAD state samples
– IFA Using, IFA on CP Using, IFA Delay

SMF 30:
– SMF30CPT includes time spent on a GCP but eligible for zAAP
– SMF30_TIME_ON_IFA is time spent on a zAAP
– SMF30_TIME_IFA_ON_CP is time spent on a GCP but eligible for zAAP
– Other fields to do with enclaves
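The R723NFFI conversion above can be written out directly. A minimal sketch (the function name is mine; the formula is the one quoted on this foil):

```python
def normalised_ifa_time(r723ifat_secs: float, r723nffi: int) -> float:
    """GCP-equivalent zAAP time: R723IFAT x R723NFFI / 256.

    R723NFFI is a scaled normalisation factor; 256 means the zAAPs and
    GCPs run at the same speed, so the time is unchanged."""
    return r723ifat_secs * r723nffi / 256

# Full-speed GCPs: factor 256 leaves 10s of zAAP time as 10s.
# GCPs at half speed: factor 512 makes 10s of zAAP time worth 20s of GCP time.
```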

Page 32: Much Ado About CPU


Page 33: Much Ado About CPU


z/OS Release 8, 9, 10 Changes

Page 34: Much Ado About CPU

z/OS Release 8 Changes

Type 70 Subtype 1
– Engine Counts by Pool
  • Copes with the "large LPAR configuration splits 70-1 record" case
  • PTF to Release 7
– Hardware Model
  • Allows you to figure out how many uncharacterised PUs there are
    – But a z10 E64 with 2 OPTIONAL SAPs probably should be called an "E62"
  • PTF to Release 7; requires certain levels of PR/SM microcode
– Machine Serial Number
  • Correlation with 74-4

Type 74 Subtype 4
– Machine Serial Number
  • Correlation with 70-1 and with peer Coupling Facility
– Structure-Level CPU
  • Requires CFLEVEL=15

Page 35: Much Ado About CPU

z/OS Release 9 Changes

More than 32 engines in an LPAR
– The z9 limit is 54
  • 64 on z10
– GCPs, zIIPs and zAAPs added together

74-4 CPU enhancements
– Whether Dynamic Dispatch is active
– Whether a processor is shared or dedicated
– Processor weight
  • Requires CFLEVEL 14

Page 36: Much Ado About CPU

z/OS Release 10 Changes

All RMF Records
– Whether at least one zAAP was online
– Whether at least one zIIP was online

In Type 70 and retrofitted to supported releases:
– Permanent and Temporary Capacity Models and 3 capacities
– HiperDispatch
  • To be covered in a few minutes

Page 37: Much Ado About CPU


Defined- and Group- Capacity instrumentation

Page 38: Much Ado About CPU

Soft Capping and Group Capacity

Defined Capacity
– A throttle on the rolling 4-hour average MSU consumption of the LPAR
  • When this exceeds the defined capacity, PR/SM softcaps the LPAR
  • Shows up as CPU delay in RMF
– SMF70PMA: Average Adjustment Weight for pricing management
– SMF70NSW: Number of samples when WLM softcaps the partition

Group Capacity
– Similar to Defined Capacity but for groups of LPARs on the same machine
– SMF70GJT: Timestamp when the system joined the Group Capacity group
– SMF70GNM: Group name
– SMF70GMU: Group Capacity MSU limit
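The rolling 4-hour average mechanism can be illustrated with a toy rolling window. This is only a sketch of the principle: WLM's actual soft-capping algorithm is more involved, and the 5-minute-sample assumption here is mine.

```python
from collections import deque

def rolling_4hr_softcapped(msu_samples, defined_capacity_msu,
                           samples_per_4hr=48):
    """Flag samples where the rolling 4-hour average MSU consumption
    exceeds the defined capacity, i.e. where the LPAR would be
    softcapped. Assumes evenly spaced samples (48 x 5 minutes = 4 hours
    by default)."""
    window = deque(maxlen=samples_per_4hr)
    flags = []
    for msu in msu_samples:
        window.append(msu)
        flags.append(sum(window) / len(window) > defined_capacity_msu)
    return flags
```

With a short window for illustration, a burst above the cap keeps the average (and therefore the capping) high until enough quiet samples dilute it.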

Page 39: Much Ado About CPU


Exceeding 8 MSUs (MSU_VS_CAP > 100%) in the morning leads to active capping (SOFTCAPPED > 0%). Note: OCPU and O2 are CPU Queuing numbers

Page 40: Much Ado About CPU


LPAR Table Fragment for Group Capacity

Page 41: Much Ado About CPU

[Chart: Rolling 4-Hour Average MSUs as % of Group Cap, by hour]

– Does something strike you as odd here?

Page 42: Much Ado About CPU


Blocked Workloads

Page 43: Much Ado About CPU

z/OS Release 9 Blocked Workload Support

Rolled back to R.7 and R.8

Blocked workloads:
– Lower-priority work may not get dispatched for an elongated time
– May hold a resource that more important work is waiting for

WLM allows some throughput for blocked workloads
– By dispatching low-importance work from time to time, these "blocked workloads" are no longer blocked
– Helps to resolve resource contention for workloads that have no resource management implemented
– Additional information in WSC flash:
  http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10609

Page 44: Much Ado About CPU

IEAOPT BLWLTRPCT and BLWLINTHD (With OA22443)

BLWLTRPCT: Percentage of the CPU capacity of the LPAR to be used for promotion
– Specified in units of 0.1%
– Default is 5 (= 0.5%)
– Maximum is 200 (= 20%)
– Only spent when sufficiently many dispatchable units need promotion
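The 0.1% units are easy to trip over; a tiny sketch of the conversion (the function name is mine):

```python
def blwltrpct_fraction(blwltrpct: int) -> float:
    """Convert an IEAOPT BLWLTRPCT value (units of 0.1%) into a
    fraction of LPAR CPU capacity usable for promoting blocked work."""
    if not 0 <= blwltrpct <= 200:
        raise ValueError("BLWLTRPCT maximum is 200 (= 20%)")
    return blwltrpct / 1000.0

# Default 5 -> 0.005 (0.5% of capacity); maximum 200 -> 0.2 (20%).
```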

Page 45: Much Ado About CPU


Type 70 CPU Control Section

Type 72-3 Service/Report Class Period Data Section

Page 46: Much Ado About CPU


IBM System z10 EC HiperDispatch

Page 47: Much Ado About CPU

z10 EC HiperDispatch

HiperDispatch – z10 EC unique function
– Dispatcher Affinity (DA) – New z/OS Dispatcher
– Vertical CPU Management (VCM) – New PR/SM Support

Hardware cache optimization occurs when a given unit of work is consistently dispatched on the same physical CPU
– Up until now software, hardware, and firmware have acted independently of each other
– Non-Uniform Memory Access has forced a paradigm change
  • CPUs have different distance-to-memory attributes
  • Memory accesses can take a number of cycles depending upon cache level / local or remote memory accessed

The entire z10 EC hardware / firmware / OS stack now tightly collaborates to manage these effects

Page 48: Much Ado About CPU

z10 EC HiperDispatch – z/OS Dispatcher Functionality

New z/OS Dispatcher
– Multiple dispatching queues
  • Average 4 logical processors per queue
– Tasks distributed amongst queues
– Periodic rebalancing of task assignments
– Generally assign work to the minimum # logicals needed to use weight
  • Expand to use white space on the box
– Real-time on/off switch (parameter in IEAOPTxx)
– May require "tightening up" of WLM policies for important work
  • Priorities are more sensitive with targeted dispatching queues

Page 49: Much Ado About CPU

z10 EC HiperDispatch – z/OS Dispatcher Functionality...

Initialization
– Single HIPERDISPATCH=YES z/OS parameter dynamically activates HiperDispatch (full S/W and H/W collaboration) without IPL
  • With HIPERDISPATCH=YES, IRD management of CPU is turned OFF
– Four Vertical High LPs are assigned to each Affinity Node
– A "Home" Affinity Node is assigned to each address space / task
– zIIP, zAAP and standard CP "Home" Affinity Nodes must be maintained for work that transitions across specialty engines
– Benefit increases as LPAR size increases (i.e. crosses books)

Page 50: Much Ado About CPU

z10 EC HiperDispatch – z/OS Dispatcher Functionality...

Workload Variability Issues
– Short term
  • Dealing with transient utilization spikes
– Intermediate
  • Balancing workload across multiple Affinity Nodes
    – Manages "Home" Book assignment
– Long term
  • Mapping z/OS workload requirements to available physical resources
    – Via dynamic expansion into Vertical Low Logical Processors

Page 51: Much Ado About CPU

z10 EC HiperDispatch – PR/SM Functionality

New PR/SM Support
– Topology information exchanged with z/OS
  • z/OS uses this to construct its dispatching queues
– Classes of logicals
  • High priority allowed to consume weight
– Tight tie of logical processor to physical processor
  • Low priority generally run only to consume white space

Page 52: Much Ado About CPU

z10 EC HiperDispatch – PR/SM Functionality...

Firmware Support (PR/SM, millicode)
– New z/OS-invoked instruction to cause PR/SM to enter "Vertical mode"
  • To assign the vertical LP subset and their associated LP-to-physical-CP mapping
    – Based upon LPAR weight
– Enables z/OS to concentrate its work on fewer vertical processors
  • Key in PR/SM overcommitted environments to reduce the LP competition for physical CP resources
– Vertical LPs are assigned High, Medium, and Low attributes
– Vertical Low LPs shouldn't be used unless there is logical white space within the CEC and demand within the LPAR

Page 53: Much Ado About CPU

z10 EC HiperDispatch Instrumentation

HiperDispatch status
– SMF70HHF bits for Supported, Active, Status Changed

Parked Time
– SMF70PAT in CPU Data Section

Polarization Weight
– SMF70POW in Logical Processor Data Section
  • Highest weight for the LPAR means Vertical High processor
  • Zero weight means Vertical Low processor
  • In between means Vertical Medium processor

Example on next foil
– 2 x Vertical High (VH)
– 1 x Vertical Medium (VM)
– 4 x Vertical Low (VL)
– Because of HiperDispatch, all engines online in the interval are online all the time
  • But there are other engines reserved, so with Online Time = 0
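The SMF70POW classification rule is easy to encode. A sketch (the function name and the example weight values are mine), reproducing the 2 x VH, 1 x VM, 4 x VL pattern:

```python
def classify_polarity(polar_weights):
    """Classify logical processors from SMF70POW polarisation weights:
    within an LPAR, the highest weight is Vertical High (VH), a zero
    weight is Vertical Low (VL), anything in between Vertical Medium."""
    top = max(polar_weights)
    return ["VL" if w == 0 else "VH" if w == top else "VM"
            for w in polar_weights]

# classify_polarity([90, 90, 20, 0, 0, 0, 0])
# -> ['VH', 'VH', 'VM', 'VL', 'VL', 'VL', 'VL']
```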

Page 54: Much Ado About CPU

Depiction Of An LPAR – With HiperDispatch Enabled

[Chart: per logical processor (0 to 6), UNPARKED %, PARKED %, POLAR WEIGHT and I/O %]

Page 55: Much Ado About CPU

HiperDispatch "GA2" Support in RMF – OA21140

SMF70POF Polarisation Indicators, Bits 0,1
– 00 is "Horizontal" or "Polarisation Not Indicated"
– 01 is "Vertical Low"
– 10 is "Vertical Medium"
– 11 is "Vertical High"
– (Bit 2 is whether it changed in the interval)

SMF70Q00 - SMF70Q12 In & Ready counts based on the number of processors online and unparked
– Refinement is to take into account parking and unparking

Also SMF70RNM
– Normalisation factor for zIIP
  • Which happens to be the same for zAAP

Also R744LPN – LPAR Number
– For correlation with SMF 70

(Also zHPF support)

Page 56: Much Ado About CPU

"Cool It" – Cycle Steering

Introduced with z990
– http://www.research.ibm.com/journal/rd/483/goth.html

Refined in later processors
– BOTH frequency and voltage reduction in z9

When cooling is degraded the processor is progressively slowed
– Much better than dying
– Rare event
  • But should not be ignored

WLM policy refreshed
– Admittedly not that helpful message:
  • IWM063I WLM POLICY WAS REFRESHED DUE TO A PROCESSOR SPEED CHANGE
  • Automate it
– SMF70CPA not changed
  • Used as part of SCRT
  • Talk to IBM and consider excluding intervals around such an event
– R723MADJ is changed
  • Al Sherkow's news item shows an example:
    – http://www.sherkow.com/updates/20081014cooling.html

Page 57: Much Ado About CPU

IOPs – I/O Assist Processors

Not documented in Type 70
– Despite being regular engines characterised as IOPs
– NOT a pool

Instrumentation in Type 78-3
– Variable-length Control Section
  • 1 IOP Initiative Queue / Util Data Section per IOP inside it
– Processor Was Busy / Was Idle counts
  • NOT Processor Utilisation as such
  • Suggest stacking the two numbers on a by-hour plot
– I/O Retry counts
  • Channel Path Busy, CU Busy, Device Busy

Machines can be configured with different numbers of IOPs
– Depending on I/O intensiveness of workloads
  • Generally speaking it's only TPF that is said to need extra IOPs
– Analysis can help get this right

Page 58: Much Ado About CPU


In Conclusion

Page 59: Much Ado About CPU

In Conclusion

Be prepared for fractional engines, multiple engine pools, varying weights etc.

Understand the limitations of z/OS Image-Level CPU Utilisation as a number

Consider the value of IRD for complex LPAR setups

Take advantage of Coupling Facility Structure CPU
– For Capacity Planning
– For CF Request Performance Analysis

There's additional instrumentation for Defined- and Group-Capacity limits

z9 and z10 ARE different from z990 – and from each other
– And z10 is evolving

The CPU data model is evolving
– To be more complete
– To be more comprehensible
– To meet new challenges
  • Such as HiperDispatch's Parked Time state
– For example SMF 23 and 113

Page 60: Much Ado About CPU


Backup Foils

Page 61: Much Ado About CPU

SMF Type 70 Subtype 1 Layout

CPU Control Section
– Control information
  • Such as machine type and model (software)

CPU Data Sections
– 1 per logical processor for this z/OS image
  • Count is the number that were ever on in the interval

ASID Data Area Section
– Address space distributions

PR/SM Partition Data Section
– One for each partition
  • Whether active or not

PR/SM Logical Processor Data Section
– One for each logical engine for each partition
  • Includes reserved engines
  • Inactive LPARs have zero sections

CPU Identification Section
– Table containing mnemonics for engine pools and engine counts

Page 62: Much Ado About CPU

Other System z9 Changes

Multiple-speed engines
– Up to 8 slower-speed GCPs for System z9 Business Class

Separate management pools for all engine types
– Using Pools 3, 4, 5 and 6
– Pool 2 obsoleted

zAAP Initial Weight can be different from GCP Initial Weight

More processors in a CEC / in a book:
– S08, S18, S28 and S38: still 12 engines in a book
  • 4 2-engine chips and 2 1-engine chips
  • 2 spares across the entire CEC and 2 SAPs in a book
  • So (12 - 2) * #books - 2, compared to (12 - 2 - 2) * #books
– S54 has 16 in a book
  • 4 2-engine chips
  • 2 spares across the entire CEC and 2 SAPs in a book
  • So (16 - 2) * #books - 2 = 54
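The per-book arithmetic above can be checked in a couple of lines. A sketch with the formula taken straight from this foil; the function name, and the assumption that S08 through S54 step through 1 to 4 books, are my own framing:

```python
def z9_characterisable_pus(pus_per_book: int, saps_per_book: int,
                           books: int, cec_spares: int = 2) -> int:
    """System z9 characterisable engine count: per-book PUs minus
    per-book SAPs, times books, minus the CEC-wide spares --
    the (n - 2) * #books - 2 formula on this foil."""
    return (pus_per_book - saps_per_book) * books - cec_spares

# S38: 12 PUs/book, 2 SAPs/book, 4 books -> (12 - 2) * 4 - 2 = 38
# S54: 16 PUs/book, 2 SAPs/book, 4 books -> (16 - 2) * 4 - 2 = 54
```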

Page 63: Much Ado About CPU


-> CPU Identification Name Section (6)

======================================

#1: +0000: C3D74040 40404040 40404040 40404040 *CP *

+0010: 00320000 * *

#2: +0000: 40404040 40404040 40404040 40404040 * *

+0010: 00000000 * *

#3: +0000: C9C6C140 40404040 40404040 40404040 *IFA *

+0010: 00020000 * *

#4: +0000: C9C6D340 40404040 40404040 40404040 *IFL *

+0010: 00000000 * *

#5: +0000: C9C3C640 40404040 40404040 40404040 *ICF *

+0010: 00000000 * *

#6: +0000: C9C9D740 40404040 40404040 40404040 *IIP *

+0010: 00020000 * *

Page 64: Much Ado About CPU


-> CPU Control Section (1)

==========================

#1: +0000: 20940036 18980000 F7F4F440 40404040 * m q 744 *

+0010: 40404040 40404040 001B0000 000004B6 * ¶*

+0020: 0000011F 00000001 00000000 00000000 * *

+0030: E2F5F440 40404040 40404040 40404040 *S54 *

+0040: 0001C02F E7C77139 1000F0F2 4040F0F0 * { XGÉ 02 00*

+0050: F0F0F0F0 F0F0F0F0 F0F4C2F1 F0C50000 *0000000004B10E *

+0060: 00000000 0000 * *

Page 65: Much Ado About CPU


-> Local Coupling Facility Data Section (1)

===========================================

#1: +0000: C3C6F140 40404040 E2E8E2C4 40404040 *CF1 SYSD *

+0010: 80000000 00000001 00000000 00000000 *Ø *

+0020: 0000000E 00000007 00000007 00000000 * *

+0030: 00000000 44142800 00000000 00000000 * à *

+0040: 00000000 00000000 00000000 00000000 * *

+0050: 00000000 00000000 00000000 00000000 * *

+0060: 00000000 4040F2F0 F8F4C2F1 F6F0F200 * 2084B1602 *

+0070: 0000000E 80C08000 C3C2D740 40404040 * Ø{Ø CBP *

+0080: 40404040 40404040 40404040 40404040 * *

+0090: 40404040 40404040 40404040 40404040 * *

+00A0: F0F0F0F0 F0F0F0F2 F3C1F6C1 *000000023A6A *


Page 66: Much Ado About CPU

A Way of Looking at a Logical Engine – Breaking the RMF Interval Up Into Components

1 Logical CP does not exist:      Interval - SMF70ONT
2 Logical CP not dispatched:      SMF70ONT - SMF70PDT
3 LPAR overhead *:                SMF70PDT - SMF70EDT
4 Logical CP dispatched for work: SMF70EDT

* Other overhead is recorded in PHYSICAL LPAR

With z10 HiperDispatch there's another state: PARKED
– Add to (1) when calculating z/OS CPU Utilisation

NOTE: If HiperDispatch is enabled, Online Time is normally the RMF interval for non-reserved engines
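A sketch of that decomposition as fractions of the interval. The helper name is mine; the PARKED adjustment follows the note above, moving parked time out of "not dispatched" and into "does not exist" for utilisation purposes:

```python
def interval_components(interval, ont, pdt, edt, parked=0.0):
    """Fractions of an RMF interval for one logical engine (all times
    in seconds):
      (1) logical CP does not exist: interval - SMF70ONT, plus parked
          time under HiperDispatch,
      (2) logical CP not dispatched: SMF70ONT - SMF70PDT, less parked,
      (3) LPAR overhead:             SMF70PDT - SMF70EDT,
      (4) dispatched for work:       SMF70EDT."""
    absent = interval - ont + parked
    not_dispatched = ont - pdt - parked
    overhead = pdt - edt
    return tuple(t / interval for t in (absent, not_dispatched, overhead, edt))

# 900s interval, online throughout (ONT=900), parked for 300s,
# dispatched for 600s (PDT), 540s of that doing work (EDT):
# the four fractions sum to 1.0 and component (4) is 0.6.
```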

Page 67: Much Ado About CPU

IRD and Standby Images

In the IRD example (in backup foils) R1 and R3 could be viewed as hot-standby images
– If one fails the other picks up the load
– In fact workload affinities complicate this:
  • Some work has a specific affinity to R1, explaining the imbalance
  • So failover might not work perfectly

Multi-machine hot-standby cases are pretty similar
– It's just that one LPAR suddenly gets much bigger and no others shrink

Other resources need to be taken into account - to sustain the oncoming work
– zIIPs and zAAPs are not "automatically provisioned" by IRD
– IRD will probably do the job for disk I/O
– Real memory needs to be available to support additional work
  • Not managed by IRD
– DB2 Virtual Storage needs to plan for the takeover case
  • Mainly threads but also e.g. buffer pools
– Workload routing
  • In some cases determined by WLM

Page 68: Much Ado About CPU

Example of IRD and LPAR Weights - 3 Systems on a z990

[Chart: LPAR weights (0 to 900) by hour (0 to 23) for systems R3, R1 and E1]

Page 69: Much Ado About CPU

IRD Changing Weights and Engines - 2 LPARs

[Chart: by hour (0 to 23), R1 and R3 weights (150 to 500, left axis) and R1 and R3 logical CP counts (3.5 to 6.5, right axis)]

Page 70: Much Ado About CPU

SMF Type 70 and IRD

Field            Section                  Description
SMF70CNF Bit 6   CPU Data                 This z/OS image's engine n reconfigured during interval
SMF70CNF Bit 7   CPU Data                 This z/OS image's engine n online at end of interval
SMF70BDN         Partition Data           Number of engines defined for this LPAR - both online and reserved
SMF70SPN         Partition Data           LPAR Cluster Name
SMF70ONT         Logical Processor Data   Logical Processor Online Time (only for IRD-capable processors)
SMF70BPS         Logical Processor Data   Traditional Weight (X'FFFF' = reserved)
SMF70VPF Bit 2   Logical Processor Data   Weight has changed during interval
SMF70MIS / MAS   Logical Processor Data   Max and Min Share
SMF70NSI / NSA   Logical Processor Data   Number of samples with share within 10% of Min / Max
SMF70ACS         Logical Processor Data   Accumulated processor share - divide by SMF70DSA to get average share

Page 71: Much Ado About CPU

Online CPs by Hour - "1" means the CP was online all hour (chart based on SMF70ONT)

[Chart: engines 0 to 11 online by hour (0 to 22)]