z/VM Performance with SMT
June 27, 2015, z/VM Workshop
Xenia Tkatschow, z/VM Performance Analysis, xenia@us.ibm.com
© 2013, 2015 IBM Corporation






Agenda

Overview of Architecture Changes

SMTMET Tool (new; available July 1)

CPUMF Tool

Closer Look At Performance Results

Monitor and PERFKIT Changes


Overview of Architecture Changes


z13 Notable Characteristics

Compared to the zEC12, the z13 offers larger caches:

• L1 I-cache is 50% larger
• L1 D-cache is 33% larger
• L2 cache is 100% larger
• L3 cache is 33% larger
• L4 cache is 25% larger

CPU cores are multithreaded

Clock speed is slower than zEC12

Various other changes and improvements (e.g. Branch Prediction)

z/VM exploits multithreading only on IFL cores


Review of Performance Tools CPUMF and SMTMET


CPUMF Display Tool

An EXEC that extracts the CPU Measurement Facility (CPU MF) counters, in which the z Systems CPU records its internal performance experience: metrics such as instructions completed, clock cycles used, and cache misses

http://www.vm.ibm.com/perf/tips/cpumf.html

The process of reducing the CPUMF counters:

Start with a MONWRITE file that contains Domain 5 Record 13 records.

• Command syntax: EXEC CPUMFINT filename MONDATA filemode

• Resultant file: filename CPUMFINT filemode (interim file)

• Command syntax: EXEC CPUMFLOG filename CPUMFINT filemode

• Resultant file: filename $CPUMFLG filemode (final report file)


Sample $CPUMFLG Output

_IntEnd_ LPU Typ ___L1MP___ ___L2P____ ___L3P____ ___L4LP__
>>Mean>>   0 IFL       2.05      87.22      12.71       0.0
>>Mean>>   1 IFL       2.01      87.27      12.66       0.0
>>Mean>>   2 IFL       2.02      87.13      12.80       0.0
>>Mean>>   3 IFL       2.04      87.06      12.86       0.0
>>Mean>>   4 IFL       2.01      87.25      12.68       0.0
>>Mean>>   5 IFL       2.01      87.21      12.72       0.0
>>MofM>>               2.02      87.19      12.74       0.0
>>AllP>>
00:46:02   0 IFL       1.99      87.00      12.93       0.0
00:46:02   1 IFL       1.99      87.04      12.91       0.0
00:46:02   2 IFL       1.96      87.01      12.93       0.0
00:46:02   3 IFL       1.96      86.93      13.01       0.0
00:46:02   4 IFL       1.97      86.95      12.98       0.0
00:46:02   5 IFL       1.99      86.96      12.96       0.0

L1MP – Percentage of instructions that incur an L1 miss

L2P – Percentage of L1 misses sourced from L2
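As a cross-check on the sample output above, the following minimal Python sketch parses the `>>Mean>>` rows and recomputes the `>>MofM>>` line. It assumes that `>>MofM>>` is the plain arithmetic mean of the per-LPU means; that assumption is not stated on the slide, although the sample numbers agree with it. The function names and the whitespace-split parsing are illustrative and are not part of the CPUMF tool.

```python
# Per-LPU mean rows copied from the sample $CPUMFLG output.
MEAN_ROWS = """\
>>Mean>> 0 IFL 2.05 87.22 12.71 0.0
>>Mean>> 1 IFL 2.01 87.27 12.66 0.0
>>Mean>> 2 IFL 2.02 87.13 12.80 0.0
>>Mean>> 3 IFL 2.04 87.06 12.86 0.0
>>Mean>> 4 IFL 2.01 87.25 12.68 0.0
>>Mean>> 5 IFL 2.01 87.21 12.72 0.0
"""


def parse_mean_rows(text):
    """Turn whitespace-separated >>Mean>> rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0] != ">>Mean>>":
            continue  # skip anything that is not a per-LPU mean row
        rows.append({
            "lpu": int(fields[1]),
            "type": fields[2],
            "L1MP": float(fields[3]),  # % of instructions with an L1 miss
            "L2P": float(fields[4]),   # % of L1 misses sourced from L2
            "L3P": float(fields[5]),
            "L4LP": float(fields[6]),
        })
    return rows


def mean_of_means(rows, column):
    """Arithmetic mean of one column across all parsed rows."""
    return sum(row[column] for row in rows) / len(rows)


rows = parse_mean_rows(MEAN_ROWS)
mofm = {col: round(mean_of_means(rows, col), 2) for col in ("L1MP", "L2P", "L3P")}
print(mofm)
```

The recomputed values can be compared by eye with the `>>MofM>>` line of the report.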


Let's Consider a Workload's Cache Footprint

Without SMT, a workload that:

• Stayed well within the zEC12's cache might see only modest improvement on z13, because it gets no help from the increased z13 cache sizes

• Grossly overflows cache on both machines might see no benefit from z13

• Didn't fit well into zEC12 cache but does fit well into the increased caches on z13 might experience the most improvement

Now enable SMT

You have twice as many logical processors competing for the same amount of cache

L1s and L2s are larger on the z13, but with multithreading, two threads of a core share the L1 and the L2. This may change the performance of the L1 and L2 and is very much a function of the workload

Again, CPUMF will come in handy to observe your workload’s behavior with respect to cache.


SMTMET Display Tool

An EXEC that extracts MT metrics from Domain 0 Record 2. Available in our download library. The resultant file includes two reports:

• Per-core-type report

• Per-core report

SMTMET Documentation: http://www.vm.ibm.com/perf/tips/smtmet.html (Available July 1)

The process of reducing the MT metrics:

Start with a MONWRITE file that contains D0 R2 records.

Command Syntax from CMS prompt:

SMTMET filename MONDATA filemode

Resultant file from CMS prompt:

filename $SMTMET filemode


What are all those other numbers on Indicate MT?

Busy time - how often the core was executing instructions during the interval.

Thread density - how often the core was able to run both threads at once, while the core was in use at all.

Productivity - how often the core was completely busy on both threads while the core was in use.

MT Utilization - how much of the maximum core capacity was used.

Capacity factor - a way of looking at the amount of work the multithreaded core was able to accomplish compared to the amount of work a single threaded core could accomplish.

Maximum Capacity factor - how much work could've been accomplished at the current rate, if the core had been kept busier.
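To make the first two definitions concrete, here is a small Python sketch that derives busy time and thread density from how long a core ran with exactly one thread and with both threads during an interval. The inputs, the function names, and the time-weighted-average formula are assumptions chosen to illustrate the wording above. They are not the formulas that CP or SMTMET actually uses.

```python
def core_busy_pct(one_thread_secs, two_thread_secs, interval_secs):
    """Percent of the interval in which the core ran at least one thread."""
    return 100.0 * (one_thread_secs + two_thread_secs) / interval_secs


def thread_density(one_thread_secs, two_thread_secs):
    """Average number of threads running while the core was in use.

    Returns None for a core that was never in use during the interval.
    """
    in_use = one_thread_secs + two_thread_secs
    if in_use == 0:
        return None
    return (one_thread_secs + 2 * two_thread_secs) / in_use


# Hypothetical 30-second interval: 5 s with one thread, 20 s with both.
print(core_busy_pct(5.0, 20.0, 30.0))
print(thread_density(5.0, 20.0))
```

Under this reading, a thread density near 1.0 means the second thread was rarely running, and a value near 2.0 means both threads were almost always running whenever the core was in use.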


D0R2 Per-Core-type Report for file: IDLESYS MONDATA

Interval Core            Sampled    Pct Core   Pct Cap    Pct Max    Pct MT     Average
__Ended_ Type ___Secs___ ___Cores__ Prodctvity __Factor__ _Cap Fct__ Utilztion_ Thread Den
>>Mean>> IFL        90.0        3.0       76.7      124.1      172.1        0.0       1.36
13:51:42 IFL       119.8        4.0       80.6      137.3      175.7        0.0       1.49
13:52:12 IFL       120.1        4.0       80.9      136.1      173.4        0.0       1.49
13:52:42 IFL       120.0        4.0       80.8      135.8      173.7        0.0       1.49
13:53:12 IFL       120.1        4.0       80.3      137.2      177.3        0.0       1.50
13:53:42 IFL       120.0        4.0       82.1      132.1      164.1        0.0       1.48
13:54:12 IFL        60.0        2.0       81.2      134.0      171.9        0.1       1.49
13:54:42 IFL        60.0        2.0       70.0      106.5      167.1        0.1       1.18
13:55:12 IFL        60.0        2.0       59.2      103.2      200.0        0.1       1.12
13:55:42 IFL        60.0        2.0       86.0      101.0      118.2        0.1       1.09
13:56:12 IFL        60.0        2.0       66.0      118.0      200.0        0.1       1.26


D0R2 Per-Core Report for file: IDLESYS MONDATA

Interval Core Core            Pct Core   Pct MT     Average    Pct Core
__Ended_ _ID_ Type ___Secs___ Prodctvity Utilztion_ Thread Den ___Busy___
>>Mean>>  00  IFL        30.0       71.7        0.1       1.20       0.05
>>Mean>>  01  IFL        30.0       78.8        0.1       1.39       0.11
>>Mean>>  02  IFL        30.0        0.0        0.0       1.38       0.01
>>Mean>>  03  IFL        30.0        0.0        0.0       1.40       0.01

13:52:42  00  IFL        30.0 .......... ..........       1.24       0.02
13:52:42  01  IFL        30.0       80.8        0.1       1.55       0.12
13:52:42  02  IFL        29.9 .......... ..........       1.40       0.01
13:52:42  03  IFL        30.0 .......... ..........       1.40       0.01

13:53:12  00  IFL        30.0 .......... ..........       1.25       0.02
13:53:12  01  IFL        30.0       80.3        0.1       1.55       0.12
13:53:12  02  IFL        30.1 .......... ..........       1.39       0.01
13:53:12  03  IFL        30.0 .......... ..........       1.40       0.01

13:53:42  00  IFL        30.0 .......... ..........       1.23       0.02
13:53:42  01  IFL        30.0       82.1        0.1       1.54       0.13
13:53:42  02  IFL        30.0 .......... ..........       1.30       0.01
13:53:42  03  IFL        30.0 .......... ..........       1.40       0.01

13:54:12  00  IFL        29.9 .......... ..........       1.20       0.02
13:54:12  01  IFL        30.0       81.2        0.1       1.55       0.12

13:54:42  00  IFL        30.0       70.0        0.1       1.26       0.10
13:54:42  01  IFL        30.0 .......... ..........       1.07       0.07


Performance Results


Performance Measurements

■ SMT2 Ideal Application

■ Maximum Storage Configuration

■ Maximum Logical Processor Configuration

■ Linux-only mode with Single Processor serialization Application

■ Mitigation 1: Increasing virtual processors

■ Mitigation 2: Increasing servers in workload

■ Linux-only mode with Master Processor Serialization Application

■ z/VM-mode with Master Processor Serialization Application

■ CPU Pooling Workload

■ Live Guest Relocation (LGR) Workload

For more details about the performance results, see: http://www.vm.ibm.com/perf/reports/zvm/html/1q5smt.html


Performance Measurements: SMT2 Ideal Application

------------------------- ---------- ---------- ----------- ------
Run ID                    AMPDGLD0   AMPDGLD1   Delta       Pct
Multithreading (p)        Disabled   Enabled
Logical Processors (p)    4          8          4           100.0
------------------------- ---------- ---------- ----------- ------
ETR (c)                   7396.283   10072.687  2676.404    36.2
ITR (p)                   7383.94    11362.15   3978.21     53.9
Total Util/Proc (p)       95.6       84.6       -11.0       -11.5
AWM avgRT (a)             0.008674   0.006445   -0.002229   -25.7
Client Util (p)           151.381    272.698    121.317     80.1
Server Util (p)           39.254     64.810     25.556      65.1
Virtual CPUs              4          4          0.0         -
Avg Thread Density        na         1.83
------------------------- ---------- ---------- ----------- ------

• Highly parallel activity with no single point of serialization

• Total of 16 virtual processors to drive the 4 logical processors in the non-SMT case and 8 logical processors with SMT

• This configuration demonstrates the value that can be obtained for a workload that has ideal SMT characteristics
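The Delta and Pct columns in the comparison table can be reproduced with simple arithmetic. The Python sketch below assumes Delta is the SMT-enabled value minus the SMT-disabled value and Pct is Delta expressed as a percentage of the SMT-disabled value. The slide does not state this, but the sample figures are consistent with it. The helper name `delta_and_pct` is illustrative.

```python
def delta_and_pct(base, new):
    """Return (new - base, percent change relative to base)."""
    delta = new - base
    return delta, 100.0 * delta / base


# (SMT disabled, SMT enabled) values copied from the AMPDGLD0/AMPDGLD1 table.
METRICS = {
    "ETR": (7396.283, 10072.687),
    "ITR": (7383.94, 11362.15),
    "AWM avgRT": (0.008674, 0.006445),
}

results = {name: delta_and_pct(base, new) for name, (base, new) in METRICS.items()}
for name, (delta, pct) in results.items():
    print(f"{name:<10} delta={delta:.6g} pct={pct:.1f}")
```

A negative percentage is an improvement for response time (AWM avgRT) and a regression for throughput metrics such as ETR and ITR.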


D0R2 Per-Core-type Report for file: AMPDGLD1 MONDATA

Interval Core            Sampled    Pct Core   Pct Cap    Pct Max    Pct MT     Average
__Ended_ Type ___Secs___ ___Cores__ Prodctvity __Factor__ _Cap Fct__ Utilztion_ Thread Den
>>Mean>> IFL       120.0        4.0       93.6      156.4      167.1       86.0       1.83
21:32:02 IFL       120.0        4.0       93.6      159.2      170.1       74.5       1.84
21:32:32 IFL       120.0        4.0       93.6      158.7      169.6       89.7       1.84
21:33:02 IFL       120.0        4.0       93.2      157.7      169.2       89.3       1.83
21:33:32 IFL       119.6        4.0       93.4      158.9      170.1       89.3       1.84
21:34:02 IFL       120.0        4.0       93.3      159.2      170.5       89.4       1.84
21:34:32 IFL       120.0        4.0       93.5      158.9      169.9       89.8       1.84
21:35:02 IFL       120.0        4.0       94.1      161.1      171.1       91.2       1.86
21:35:32 IFL       120.0        4.0       93.4      159.0      170.1       89.8       1.84
21:36:02 IFL       120.0        4.0       93.8      159.5      170.0       90.5       1.85
21:36:32 IFL       120.0        4.0       93.0      158.5      170.4       88.7       1.83
21:37:02 IFL       120.0        4.0       93.4      159.1      170.3       89.7       1.84
21:37:32 IFL       120.0        4.0       93.7      159.4      170.1       90.2       1.85
21:38:02 IFL       120.0        4.0       93.7      159.0      169.6       90.4       1.85


D0R2 Per-Core Report for file: AMPDGLD1 MONDATA

Interval Core Core            Pct Core   Pct MT     Average    Pct Core
__Ended_ _ID_ Type ___Secs___ Prodctvity Utilztion_ Thread Den ___Busy___
>>Mean>>  00  IFL        30.0       93.6       86.0       1.83      92.06
>>Mean>>  01  IFL        30.0       93.5       86.0       1.83      91.92
>>Mean>>  02  IFL        30.0       93.7       86.3       1.83      92.20
>>Mean>>  03  IFL        30.0       93.6       85.9       1.84      91.86

21:32:02  00  IFL        30.0       93.4       74.0       1.84      79.26
21:32:02  01  IFL        30.0       93.1       74.4       1.83      79.91
21:32:02  02  IFL        30.0       93.8       74.2       1.85      79.11
21:32:02  03  IFL        30.0       93.8       75.4       1.85      80.39

21:32:32  00  IFL        30.0       94.1       91.2       1.86      96.86
21:32:32  01  IFL        30.0       93.3       88.8       1.84      95.16
21:32:32  02  IFL        30.0       92.7       88.3       1.82      95.28
21:32:32  03  IFL        30.0       94.0       90.7       1.86      96.42

21:33:02  00  IFL        30.0       92.7       88.6       1.82      95.59
21:33:02  01  IFL        30.0       92.6       88.1       1.82      95.21
21:33:02  02  IFL        30.0       94.1       91.0       1.86      96.71
21:33:02  03  IFL        30.0       93.3       89.3       1.84      95.71
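The AMPDGLD1 numbers suggest two relationships among the reported metrics: productivity is approximately capacity factor divided by maximum capacity factor, and MT utilization is approximately core busy multiplied by productivity. The Python sketch below checks both against a few rows copied from the two reports. These relationships are inferred from the sample data and are an assumption here; the SMTMET documentation remains the authority on how the metrics are defined.

```python
# (productivity %, capacity factor %, max capacity factor %) from the
# per-core-type report, intervals 21:32:02, 21:32:32, 21:33:02.
CORE_TYPE_ROWS = [
    (93.6, 159.2, 170.1),
    (93.6, 158.7, 169.6),
    (93.2, 157.7, 169.2),
]

# (productivity %, MT utilization %, core busy %) for core 00 from the
# per-core report, same three intervals.
CORE_ROWS = [
    (93.4, 74.0, 79.26),
    (94.1, 91.2, 96.86),
    (92.7, 88.6, 95.59),
]


def productivity_from_capacity(cap_factor, max_cap_factor):
    """Assumed relationship: productivity = capacity / maximum capacity."""
    return 100.0 * cap_factor / max_cap_factor


def mt_utilization(core_busy, productivity):
    """Assumed relationship: MT utilization = core busy x productivity."""
    return core_busy * productivity / 100.0


prod_errors = [abs(productivity_from_capacity(cap, max_cap) - prod)
               for prod, cap, max_cap in CORE_TYPE_ROWS]
util_errors = [abs(mt_utilization(busy, prod) - util)
               for prod, util, busy in CORE_ROWS]

print("largest productivity mismatch:", max(prod_errors))
print("largest MT utilization mismatch:", max(util_errors))
```

Both mismatches stay within the rounding of the one-decimal report columns for these rows, which is why the relationships look plausible.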


FCX304 Run 2015/03/04 15:16:28 PRCLOG
Processor Activity, by Time
From 2015/02/14 16:31:32
To 2015/02/14 16:42:02
For 630 Secs 00:10:30 "This is a performance repo
___________________________________________________________________
                                    <--- Percent Busy ----> <-- Ra
          C                     Pct
Interval  P                    Park                          Inst
End Time  U Type PPD Ent. DVID Time Total  User  Syst  Emul  Siml
>>Mean>>  0 IFL  VhD  100 0000    0  84.7  84.5    .2  77.0 30035
>>Mean>>  1 IFL  VhD  100 0000    0  84.3  84.1    .2  76.8 29845
>>Mean>>  2 IFL  VhD  100 0001    0  84.5  84.4    .2  76.8 31053
>>Mean>>  3 IFL  VhD  100 0001    0  84.6  84.4    .2  77.0 30648
>>Mean>>  4 IFL  VhD  100 0002    0  84.5  84.3    .2  77.0 29912
>>Mean>>  5 IFL  VhD  100 0002    0  84.9  84.7    .2  77.5 29667
>>Mean>>  6 IFL  VhD  100 0003    0  84.8  84.6    .2  77.3 29368
>>Mean>>  7 IFL  VhD  100 0003    0  84.7  84.5    .2  77.3 29026
>>Total>  8 IFL  VhD  800 MIX     0 677.0 675.5   1.5 616.6  240k


Performance Measurements: Single Processor Serialization Application

------------------------- ---------- ---------- ----------- ------
Run ID                    APNDGLD0   APNDGLD1   Delta       Pct
Multithreading (p)        Disabled   Enabled
Logical Processors (p)    3          6          3           100.0
------------------------- ---------- ---------- ----------- ------
ETR (c)                   7787.112   5051.547   -2735.565   -35.1
ITR (p)                   7753.26    7891.05    137.79      1.8
Total Util/Proc (p)       95.8       61.1       -34.7       -36.2
AWM avgRT (a)             0.004897   0.007102   0.002205    45.0
Client Util (p)           70.460     95.413     24.953      35.4
Client Virt CPUs (p)      1          1          0           0.0
Server Util (p)           6.194      6.532      0.338       5.5
Thread Density            na         1.28
------------------------- ---------- ---------- ----------- ------

• The client is a one-way guest and becomes the single point of serialization, because with SMT2 it runs on one thread and shares the core with another thread.

• A one-way guest that uses more than 50% of a core can be expected to become processor constrained in the SMT2 environment.

• In this case the client drives the workload, so when the client becomes processor constrained, the whole workload slows down. The External Transaction Rate (ETR) decreased by 35%.

• Not all workloads benefit when SMT2 is enabled.

• We adjusted this workload in the SMT2 environment to help overcome the performance impact.


D0R2 Per-Core-type Report for file: APNDGLD1 MONDATA

Interval Core            Sampled    Pct Core   Pct Cap    Pct Max    Pct MT     Average
__Ended_ Type ___Secs___ ___Cores__ Prodctvity __Factor__ _Cap Fct__ Utilztion_ Thread Den
>>Mean>> IFL        90.0        3.0       90.7      104.1      114.7       86.5       1.28
19:19:53 IFL        90.0        3.0       89.7      104.6      116.5       69.6       1.28
19:20:23 IFL        90.0        3.0       89.8      104.5      116.2       89.8       1.28
19:20:53 IFL        90.0        3.0       90.6      104.1      114.7       90.6       1.28
19:21:23 IFL        90.0        3.0       90.6      104.1      114.9       90.6       1.28
19:21:53 IFL        90.0        3.0       90.6      104.1      114.8       90.6       1.28
19:22:23 IFL        90.0        3.0       90.5      104.2      115.0       90.4       1.28
19:22:53 IFL        90.0        3.0       90.3      104.3      115.3       90.3       1.28
19:23:23 IFL        90.0        3.0       89.9      104.5      116.1       89.9       1.28
19:23:53 IFL        90.0        3.0       90.8      104.1      114.6       90.8       1.28
19:24:23 IFL        90.0        3.0       90.5      104.2      115.0       90.5       1.28


D0R2 Per-Core Report for file: APNDGLD1 MONDATA

Interval Core Core            Pct Core   Pct MT     Average    Pct Core
__Ended_ _ID_ Type ___Secs___ Prodctvity Utilztion_ Thread Den ___Busy___
>>Mean>>  00  IFL        30.0       90.8       86.5       1.27      95.53
>>Mean>>  01  IFL        30.0       91.0       86.8       1.29      95.60
>>Mean>>  02  IFL        30.0       90.4       86.3       1.27      95.49

19:19:53  00  IFL        30.0       86.8       67.3       1.12      77.49
19:19:53  01  IFL        30.0       91.5       70.8       1.41      77.35
19:19:53  02  IFL        30.0       90.7       70.5       1.31      77.74

19:20:23  00  IFL        30.0       87.5       87.4       1.13      99.93
19:20:23  01  IFL        30.0       90.8       90.8       1.32      99.95
19:20:23  02  IFL        30.0       91.2       91.2       1.38      99.95

19:20:53  00  IFL        30.0       90.1       90.1       1.24      99.94
19:20:53  01  IFL        30.0       91.3       91.3       1.32      99.95
19:20:53  02  IFL        30.0       90.5       90.5       1.28      99.94

19:21:23  00  IFL        30.0       90.9       90.9       1.29      99.94
19:21:23  01  IFL        30.0       89.9       89.9       1.25      99.94
19:21:23  02  IFL        30.0       90.8       90.8       1.30      99.95

19:21:53  00  IFL        30.0       90.3       90.3       1.26      99.95
19:21:53  01  IFL        30.0       91.2       91.2       1.32      99.94
19:21:53  02  IFL        30.0       90.3       90.3       1.26      99.94


Mitigation 1: adding more virtual processors

------------------------- ---------- ---------- ----------- ------
Run ID                    APNDGLD1   APNDGLDF   Delta       Pct
Multithreading (p)        Enabled    Enabled
Logical Processors (p)    6          6          0           0.0
------------------------- ---------- ---------- ----------- ------
ETR (c)                   5051.547   8292.568   3241.021    64.2    (throughput)
ITR (p)                   7891.05    8540.06    649.01      8.2
Total Util/Proc (p)       61.1       92.6       31.5        51.6
AWM avgRT (a)             0.007102   0.004388   -0.002714   -38.2   (response time)
Client Util (p)           95.413     143.794    48.381      50.7
Client Virt CPUs (p)      1          2          1           100.0
------------------------- ---------- ---------- ----------- ------

• Each client was given an extra virtual processor, so each became a 2-way

• Client utilization rose from about 95% to about 144%

• Total workload utilization increased and the bottleneck was removed


D0R2 Per-Core-type Report for file: APNDGLDF MONDATA

Interval Core            Sampled    Pct Core   Pct Cap    Pct Max    Pct MT     Average
__Ended_ Type ___Secs___ ___Cores__ Prodctvity __Factor__ _Cap Fct__ Utilztion_ Thread Den
>>Mean>> IFL        90.0        3.0       98.2      152.7      155.5       93.3       1.94
21:28:23 IFL        90.0        3.0       97.7      156.2      159.9       69.7       1.94
21:28:53 IFL        90.0        3.0       97.7      154.6      158.2       97.0       1.94
21:29:23 IFL        90.0        3.0       97.8      155.6      159.1       96.9       1.94
21:29:53 IFL        90.0        3.0       98.2      155.3      157.9       97.8       1.95
21:30:23 IFL        89.7        3.0       98.1      156.1      159.1       97.5       1.95
21:30:53 IFL        90.0        3.0       98.0      155.3      158.3       97.6       1.95
21:31:23 IFL        90.0        3.0       98.0      155.4      158.4       97.4       1.95
21:31:53 IFL        90.0        3.0       98.1      155.4      158.3       97.6       1.95
21:32:23 IFL        90.0        3.0       98.5      156.7      159.0       98.1       1.96
21:32:53 IFL        90.0        3.0       98.5      156.2      158.5       98.0       1.96
21:33:23 IFL        90.0        3.0       98.2      156.3      158.9       97.6       1.95


D0R2 Per-Core Report for file: APNDGLDF MONDATA

Interval Core Core            Pct Core   Pct MT     Average    Pct Core
__Ended_ _ID_ Type ___Secs___ Prodctvity Utilztion_ Thread Den ___Busy___
>>Mean>>  00  IFL        30.0       98.1       93.3       1.94      95.06
>>Mean>>  01  IFL        30.0       98.3       93.4       1.94      95.15
>>Mean>>  02  IFL        30.0       98.1       93.3       1.94      95.11

21:28:23  00  IFL        30.0       97.8       70.0       1.94      71.64
21:28:23  01  IFL        30.0       97.8       69.8       1.94      71.40
21:28:23  02  IFL        30.0       97.4       69.3       1.93      71.22

21:28:53  00  IFL        30.0       97.6       96.8       1.94      99.25
21:28:53  01  IFL        30.0       97.8       97.2       1.94      99.47
21:28:53  02  IFL        30.0       97.6       96.8       1.93      99.26

21:29:23  00  IFL        30.0       97.9       96.9       1.94      99.07
21:29:23  01  IFL        30.0       97.7       96.8       1.94      99.14
21:29:23  02  IFL        30.0       97.9       97.2       1.94      99.28

21:29:53  00  IFL        30.0       98.4       98.0       1.96      99.53
21:29:53  01  IFL        30.0       98.3       97.9       1.96      99.53
21:29:53  02  IFL        30.0       98.0       97.6       1.95      99.52


Mitigation 2: adding more client guests

------------------------- ---------- ---------- ----------- ------
Run ID                    APNDGLD1   APNDGLDD   Delta       Pct
Multithreading (p)        Enabled    Enabled
Logical Processors (p)    6          6          0           0.0
------------------------- ---------- ---------- ----------- ------
ETR (c)                   5051.547   10143.937  5092.390    100.8   (throughput)
ITR (p)                   7891.05    10127.51   2236.46     28.3
Total Util/Proc (p)       61.1       95.6       34.5        56.5
AWM avgRT (a)             0.007102   0.008332   0.001230    17.3    (response time)
Client Users (p)          3          6          3           100.0
Client Virt CPUs (p)      1          1          0           0.0
Client Util (p)           95.413     73.698     -21.715     -22.8
------------------------- ---------- ---------- ----------- ------

• Another adjustment was to add more clients to drive the workload

• Each client remained a one-way and we doubled the number of clients from 3 to 6 (this is another way to add parallelism to the workload)

• The ETR roughly doubled (an increase of 100.8%)

• Workload adjustment should be considered when possible
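The mitigation tables compare each adjusted run with the unadjusted SMT2 run (APNDGLD1). It can also be useful to line up all three SMT2 runs against the original non-SMT run (APNDGLD0). The Python sketch below does that using only ETR values quoted in the tables above; the dictionary layout and function name are illustrative. The comparison is indicative only, since the mitigated runs changed the client configuration and are therefore not the same workload as the baseline.

```python
# ETR values copied from the measurement tables.
BASELINE_ETR = 7787.112  # APNDGLD0, SMT disabled, 3 one-way clients

SMT2_RUNS = {
    "APNDGLD1 (SMT2, unadjusted)": 5051.547,
    "APNDGLDF (SMT2, clients made 2-way)": 8292.568,
    "APNDGLDD (SMT2, 6 one-way clients)": 10143.937,
}


def pct_change(base, new):
    """Percent change of new relative to base."""
    return 100.0 * (new - base) / base


vs_baseline = {run: round(pct_change(BASELINE_ETR, etr), 1)
               for run, etr in SMT2_RUNS.items()}

for run, pct in vs_baseline.items():
    print(f"{run:<38} {pct:+.1f}% ETR vs non-SMT")
```

Read this way, the unadjusted SMT2 run loses throughput relative to non-SMT, while both mitigations end up above the non-SMT baseline.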


D0R2 Per-Core-type Report for file: APNDGLDD MONDATA

Interval Core            Sampled    Pct Core   Pct Cap    Pct Max    Pct MT     Average
__Ended_ Type ___Secs___ ___Cores__ Prodctvity __Factor__ _Cap Fct__ Utilztion_ Thread Den
>>Mean>> IFL        90.0        3.0       99.9      192.7      192.8       95.6       1.99
00:46:02 IFL        90.0        3.0       99.8      189.2      189.5       77.8       2.00
00:46:32 IFL        90.0        3.0       99.9      198.0      198.1       99.8       2.00
00:47:02 IFL        90.0        3.0       99.9      198.1      198.2       99.8       2.00
00:47:32 IFL        90.0        3.0       99.9      197.6      197.7       99.8       2.00
00:48:02 IFL        90.0        3.0       99.9      197.5      197.5       99.8       2.00
00:48:32 IFL        90.0        3.0       99.9      197.8      197.8       99.8       2.00
00:49:02 IFL        90.0        3.0       99.9      197.9      198.0       99.8       2.00
00:49:32 IFL        90.0        3.0       99.9      197.8      197.8       99.8       2.00
00:50:02 IFL        90.0        3.0       99.9      197.6      197.6       99.8       2.00


D0R2 Per-Core Report for file: APNDGLDD MONDATA

Interval Core Core            Pct Core   Pct MT     Average    Pct Core
__Ended_ _ID_ Type ___Secs___ Prodctvity Utilztion_ Thread Den ___Busy___
>>Mean>>  00  IFL        30.0      100.0       95.8       1.99      95.84
>>Mean>>  01  IFL        30.0       99.9       95.6       1.99      95.72
>>Mean>>  02  IFL        30.0       99.9       95.5       2.00      95.59

00:46:02  00  IFL        30.0       99.8       77.7       2.00      77.86
00:46:02  01  IFL        30.0       99.8       78.0       2.00      78.12
00:46:02  02  IFL        30.0       99.8       77.8       2.00      77.93

00:46:32  00  IFL        30.0       99.9       99.8       2.00      99.95
00:46:32  01  IFL        30.0       99.9       99.8       2.00      99.95
00:46:32  02  IFL        30.0       99.9       99.8       2.00      99.94

00:47:02  00  IFL        30.0       99.9       99.8       2.00      99.95
00:47:02  01  IFL        30.0       99.9       99.8       2.00      99.94
00:47:02  02  IFL        30.0       99.9       99.8       2.00      99.95

00:47:32  00  IFL        30.0       99.9       99.8       2.00      99.95
00:47:32  01  IFL        30.0       99.9       99.8       2.00      99.94
00:47:32  02  IFL        30.0       99.9       99.8       2.00      99.95


Performance Measurements: Live Guest Relocation

25 Linux guests relocated while running three workloads

• PING – to simulate network traffic

• BLAST – to simulate I/O

• PFAULT – to simulate referencing storage

Relocation was done synchronously using the SYNC option of the VMRELOCATE command

Results:

• Relocation time increased by 10%

• Quiesce time increased by 26%

• PFAULT (71%) and BLAST (34%) completions increased

• Total number of pages relocated during quiesce increased by 51%

Conclusion:

With SMT2, the BLAST and PFAULT workloads were changing pages more frequently, thus causing more pages to be moved during quiesce time.


Performance Measurements: Conclusion

• Results in measured workloads varied widely.

• Best results were observed for applications having highly parallel activity and no single point of serialization.

• No improvements were observed for applications having a single point of serialization.

• To overcome serialization, workload adjustment should be done where possible.

• Workloads that have a heavy dependency on the z/VM master processor are not good candidates for SMT-2. In z/VM Performance Toolkit, the master processor can be identified from FCX100 CPU and FCX180 SYSCONF.


Performance Measurements: Conclusion

• The multithreading metrics (provided by the SMTMET tool in the $SMTMET file) provide information about how well the cores perform when SMT is enabled. There is no direct relationship with ETR or with transaction response time.

• Measuring workload throughput and response time is the best way to know whether SMT is providing value to the workload.


Performance Monitor and Performance Toolkit Changes


Monitor Changes

New Monitor Record      Name
Domain 5 Record 20      MT CPUMF counters

Changed Monitor Records Name
Domain 0 Record 2       Processor data (per processor)
Domain 0 Record 15      Logical CPU utilization (global)
Domain 0 Record 16      CPU utilization in a logical partition
Domain 0 Record 17      Physical CPU utilization data for LPAR management
Domain 0 Record 19      System data (global)
Domain 0 Record 23      Formal spin lock data (global)
Domain 1 Record 4       System configuration data
Domain 1 Record 5       Processor configuration data (per processor)
Domain 1 Record 16      Scheduler settings
Domain 1 Record 18      CPU capability change
Domain 2 Record 4       Add user to dispatch list
Domain 2 Record 5       Drop user from dispatch list
Domain 2 Record 7       Set SRM changes
Domain 2 Record 13      Add VMDBK to limit list
Domain 2 Record 14      Drop VMDBK from limit list
Domain 4 Record 2       User logoff data
Domain 4 Record 3       User activity data
Domain 4 Record 9       User activity data at transaction end
Domain 5 Record 1       Vary on processor
Domain 5 Record 2       Vary off processor
Domain 5 Record 11      Instruction counts per processor
Domain 5 Record 13      CPU-measurement facility counters
Domain 5 Record 16      Park/unpark decision
Domain 5 Record 17      Real CPU data
Domain 5 Record 19      CPU pool utilization


SMT Ideal Application: Perfkit Screen PRCLOG (FCX304)

SMT Enabled

FCX304  Run 2015/02/15 08:52:10  PRCLOG  Page 56
Processor Activity, by Time
From 2015/02/14 16:31:32  SYSTEMID
To   2015/02/14 16:42:02  CPU 2964-704  SN 12F17
For  630 Secs 00:10:30  "This is a performance report for SYSTEM XYZ"  z/VM V.6.3.0 SLU 0000
___________________________________________________________________________________________________________________________
                                   <--- Percent Busy ----> <-- Rates per Sec. ---> <----- Paging -------> <Co> < Di>
         C                     Pct                                                             Fast  Page <mm> < ag>
Interval P                    Park                          Inst                    <2GB  PGIN Path  Read Msgs X'9C' Core/
End Time U Type PPD Ent. DVID Time Total  User  Syst  Emul  Siml  DIAG  SIGP  SSCH    /s    /s    %    /s   /s    /s Thread
>>Mean>> 0 IFL  VhD  100 0000    0  84.7  84.5    .2  77.0 30035 416.7  1124  34.6    .0    .0 ....    .2   .2    .0 00/0
>>Mean>> 1 IFL  VhD  100 0000    0  84.3  84.1    .2  76.8 29845 447.8  1054   2.0    .0    .0 ....    .0   .0    .0 00/1
>>Mean>> 2 IFL  VhD  100 0001    0  84.5  84.4    .2  76.8 31053 439.6  1098   1.4    .0    .0 ....    .0   .0    .0 01/0
>>Mean>> 3 IFL  VhD  100 0001    0  84.6  84.4    .2  77.0 30648 491.9  1028   1.2    .0    .0 ....    .0   .0    .0 01/1
>>Mean>> 4 IFL  VhD  100 0002    0  84.5  84.3    .2  77.0 29912 535.7  1106   1.7    .0    .0 ....    .0   .0    .0 02/0
>>Mean>> 5 IFL  VhD  100 0002    0  84.9  84.7    .2  77.5 29667 526.1  1029   1.3    .0    .0 ....    .0   .0    .0 02/1
>>Mean>> 6 IFL  VhD  100 0003    0  84.8  84.6    .2  77.3 29368 450.1  1062   2.1    .0    .0 ....    .1   .0    .0 03/0
>>Mean>> 7 IFL  VhD  100 0003    0  84.7  84.5    .2  77.3 29026 566.8  1027   2.0    .0    .0 ....    .0   .0    .0 03/1
>>Total> 8 IFL  VhD  800  MIX    0 677.0 675.5   1.5 616.6  240k  3875  8527  46.2    .0    .0 ....    .2   .3    .0 MIX

[Slide note, Core/Thread column: Core added / thread added]

SMT Disabled

FCX304  Run 2015/02/15 08:52:14  PRCLOG  Page 56
Processor Activity, by Time
From 2015/02/14 16:04:29  SYSTEMID
To   2015/02/14 16:14:59  CPU 2964-704  SN 12F17
For  630 Secs 00:10:30  "This is a performance report for SYSTEM XYZ"  z/VM V.6.3.0 SLU 0000
___________________________________________________________________________________________________________________________
                                   <--- Percent Busy ----> <-- Rates per Sec. ---> <----- Paging -------> <Co> < Di>
         C                     Pct                                                             Fast  Page <mm> < ag>
Interval P                    Park                          Inst                    <2GB  PGIN Path  Read Msgs X'9C' Core/
End Time U Type PPD Ent. DVID Time Total  User  Syst  Emul  Siml  DIAG  SIGP  SSCH    /s    /s    %    /s   /s    /s Thread
>>Mean>> 0 IFL  VhD  100 0000    0  95.7  95.5    .2  88.2 38153 551.3  22.8  37.1    .0    .0 ....    .0   .2    .0 00/0
>>Mean>> 1 IFL  VhD  100 0001    0  95.7  95.5    .2  88.2 37536 492.2  10.3   2.7    .0    .0 ....    .0   .0    .0 01/0
>>Mean>> 2 IFL  VhD  100 0002    0  95.6  95.4    .2  88.0 38178 509.8  74.0   2.9    .0    .0 ....    .1   .0    .0 02/0
>>Mean>> 3 IFL  VhD  100 0003    0  95.5  95.3    .2  87.8 38532 508.4   8.8   4.8    .0    .0 ....    .1   .1    .0 03/0
>>Total> 4 IFL  VhD  400  MIX    0 382.5 381.6    .9 352.1  152k  2062 115.9  47.5    .0    .0 ....    .2   .3    .0 MIX

[Slide note, Core/Thread column: Thread 0 only]
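The PRCLOG rows above are whitespace-separated, so they can be split into named fields and grouped by core for side-by-side comparison. The following is a minimal Python sketch, not a Performance Toolkit interface: the column identifiers and the helper names parse_prclog_row and busy_by_core are hypothetical, and the column order is assumed to match the 22-field FCX304 heading shown on this slide.

```python
# Illustrative field names, in the order of the FCX304 heading on the slide.
# These identifiers are assumptions made for this sketch, not official names.
PRCLOG_COLUMNS = [
    "interval", "cpu", "type", "ppd", "ent", "dvid", "pct_park",
    "total", "user", "syst", "emul",
    "inst_siml", "diag", "sigp", "ssch",
    "pg_lt2gb", "pgin", "fast_path_pct", "page_read",
    "msgs", "diag_x9c", "core_thread",
]


def parse_prclog_row(line):
    """Split one whitespace-separated PRCLOG data row into a dict of strings."""
    fields = line.split()
    if len(fields) != len(PRCLOG_COLUMNS):
        raise ValueError(
            f"expected {len(PRCLOG_COLUMNS)} fields, got {len(fields)}"
        )
    return dict(zip(PRCLOG_COLUMNS, fields))


def busy_by_core(rows):
    """Collect each thread's Total percent-busy value under its core ID."""
    cores = {}
    for row in rows:
        core, thread = row["core_thread"].split("/")
        cores.setdefault(core, {})[thread] = float(row["total"])
    return cores


# Two rows copied from the SMT-enabled report (core 00, threads 0 and 1).
sample = [
    ">>Mean>> 0 IFL VhD 100 0000 0 84.7 84.5 .2 77.0 30035 416.7 1124 34.6 .0 .0 .... .2 .2 .0 00/0",
    ">>Mean>> 1 IFL VhD 100 0000 0 84.3 84.1 .2 76.8 29845 447.8 1054 2.0 .0 .0 .... .0 .0 .0 00/1",
]
rows = [parse_prclog_row(s) for s in sample]
print(busy_by_core(rows))
```

The sketch only regroups the per-thread values; it does not derive a core utilization figure, which the thread percentages alone do not provide.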


SMT Ideal Application: Perfkit Screen SYSCONF (FCX180)

SMT Enabled

FCX180  Run 2015/02/15 08:52:10  SYSCONF
System Configuration, Initial and Changed
From 2015/02/14 16:31:32
To   2015/02/14 16:42:02  CPU 2964-704
For  630 Secs 00:10:30  "This is a performance report for SYSTEM XYZ"  z/VM V.6.3.0
___________________________________________________________________________________________________________________________
:
Data removed to save screen space
:
Multithreading              Enabled    [Slide note: the z/VM system is enabled for SMT]

Server Time Protocol (STP) facility configuration
XRC_TEST enabled            No
XRC_OPTIONAL enabled        No
STP H/W feature installed   No
STP H/W feature enabled     No
STP Timestamping enabled    No
STP Timezone usage enabled  No
STP is active               No
STP is suspended            No
STP susp. message issued    No
STP TOD clock offset        +00:00:00.0000000000

Initial Status on 2015/02/14 at 16:31, Processor 2964-704
                         Total  Conf  Stby  Resvd  Ded  Shrd
Real Proc: Cap 492.0000    103     4     0     99
Sec. Proc: Cap 492.0000     99    99     0      4
Log. IFL : CAF 41            8     4     0      0    4     0

<------- Processor --------> Core/
Num Serial-Nr Type Status    Thread
  0 012F17    IFL  Master    00/0
  1 012F17    IFL  Alternate 00/1
  2 012F17    IFL  Alternate 01/0
  3 012F17    IFL  Alternate 01/1
  4 012F17    IFL  Alternate 02/0
  5 012F17    IFL  Alternate 02/1
  6 012F17    IFL  Alternate 03/0
  7 012F17    IFL  Alternate 03/1

[Slide note: Total of 4 cores and each core has a thread 0 and thread 1 associated with it]

Processor Configuration Mode: LINUX
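In the SYSCONF processor list above, processor numbers 0 through 7 pair up with Core/Thread values 00/0 through 03/1, which is what integer division by the number of threads per core would produce. The Python sketch below reproduces that pattern. It is an observation about this one screen, not a documented rule: the function name core_thread_id is hypothetical, two threads per core is assumed, and core IDs are printed as two decimal digits (all values here are below 10, so the screen does not reveal the real number base).

```python
def core_thread_id(cpu_number, threads_per_core=2):
    """Return a Core/Thread label for a processor number.

    Assumes processors are numbered consecutively across cores, as in the
    SYSCONF listing on this slide. Decimal formatting is an assumption.
    """
    core, thread = divmod(cpu_number, threads_per_core)
    return f"{core:02d}/{thread}"


# Reproduce the eight Num -> Core/Thread pairs shown on the screen.
for cpu in range(8):
    print(cpu, core_thread_id(cpu))
```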


SMT Ideal Application: Perfkit Screen SYSSET (FCX154)

SMT Enabled

FCX154  Run 2015/02/15 08:52:10  SYSSET
System Scheduler Settings, Initial and Changed
From 2015/02/14 16:31:32  SYSTEMID
To   2015/02/14 16:42:02  CPU 2964-704
For  630 Secs 00:10:30  "This is a performance report for SYSTEM XYZ"  z/VM V.6.3.0
___________________________________________________________________________________________________________________________
Initial Scheduler Settings: 2015/02/14 at 16:31:32
DSPSLICE (minor)     10.00 msec.          IABIAS Intensity   90 Percent
Hotshot T-slice      4.000 msec.          IABIAS Duration     2 Minor T-slices
DSPBUF Q1            32767 Openings       STORBUF Q1 Q2 Q3  300 % Main storage
DSPBUF Q1 Q2         32767 Openings       STORBUF Q2 Q3     250 % Main storage
DSPBUF Q1 Q2 Q3      32767 Openings       STORBUF Q3        200 % Main storage
LDUBUF Q1 Q2 Q3        100 % Paging exp.  Max. working set 9999 % Main storage
LDUBUF Q2 Q3            75 % Paging exp.  Loading user       10 Pgrd / T-slice
LDUBUF Q3               60 % Paging exp.  Loading capacity   34 Paging expos.
LIMITHARD algorithm  Consumption
DSPWD method         Reshuffle    [Slide note: z/VM Dispatch Workload Algorithm must be set to Reshuffle for SMT to be enabled]
Polarization         Vertical     [Slide note: Hiperdispatch polarization must be Vertical for SMT to be enabled]
Global Perf. Data    ON

EXCESSUSE:  CP   ......    CPUPAD:  CP   6400%
            ZAAP ......             ZAAP    0%
            IFL  ......             IFL     0%
            ICF  ......             ICF     0%
            ZIIP ......             ZIIP    0%

Multithreading       Enabled
             <--------- Threads ---------->
             H/W Requested System Activated
Max Threads            255                2    [Slide note: Maximum number of threads activated on this z/VM system]
CP core        1       255      1         1    [Slide note: Activated column = MIN (H/W , System)]
IFL core       2       255      2         2
ICF core       2       255      1         1
ZIIP core      2       255      1         1

Changed Scheduler Settings
Date   Time      Changed
.....  ........  No changes processed
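The slide note on the SYSSET screen states that the Activated column is MIN(H/W, System). The Python sketch below checks that rule against the four core-type rows on the screen. The dictionary layout and the function name activated_threads are illustrative assumptions; only the numbers come from the report.

```python
# Thread counts per core type, copied from the FCX154 screen above.
# Keys and structure are hypothetical, chosen for this sketch.
THREADS = {
    "CP":   {"hw": 1, "requested": 255, "system": 1, "activated": 1},
    "IFL":  {"hw": 2, "requested": 255, "system": 2, "activated": 2},
    "ICF":  {"hw": 2, "requested": 255, "system": 1, "activated": 1},
    "ZIIP": {"hw": 2, "requested": 255, "system": 1, "activated": 1},
}


def activated_threads(hw, system):
    """Apply the slide's rule: Activated = MIN(H/W, System)."""
    return min(hw, system)


for core_type, t in THREADS.items():
    computed = activated_threads(t["hw"], t["system"])
    print(f"{core_type:<4} computed={computed} reported={t['activated']}")

# The largest activated value agrees with the 2 shown on the Max Threads line.
print("max activated:", max(t["activated"] for t in THREADS.values()))
```

On this system only the IFL cores end up with two activated threads, which is consistent with the eight IFL logical processors reported on the PRCLOG and SYSCONF screens.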