
Page 1: Performance Monitoring 1 AIX Perf Updates

© 2010 IBM Corporation

AIX Performance Updates & Issues 2011

Session: PE20
Steve Nasypany, [email protected]

IBM Advanced Technical Skills

Page 2: Performance Monitoring 1 AIX Perf Updates

Agenda

� AIX 6.1 & POWER7

– nmon & topas updates

– Perfstat Library updates

– Dynamic Power Save/Active Energy Management

– 4-way Simultaneous Multi-threading

– Virtual Processor Folding

– Utilization Issues

– Enhanced Affinity

– Partition Placement

– 1TB Segment Aliasing

– JFS2 i-node/metadata

– iostat block IO

� AIX 7.1

– Performance tools 1024-way scaling

� Java/WAS Performance, Best Practices Links and Java Performance Advisor

� FCoE adapter performance

� New Free Memory Tool

Page 3: Performance Monitoring 1 AIX Perf Updates

nmon On Demand Recording (ODR)

� New function ideal for benchmarks, proof-of-concepts and problem analysis

� Allows “high-resolution” recordings to be made while in monitoring mode

– Records samples at the interactive monitoring rate

– AIX 5.3 TL12, AIX 6.1 TL05 and AIX 7.1

� Usage

– Start nmon, use “[“ and “]” brackets to start and end a recording

• Records standard background recording metrics, not just what is on screen.

• You can adjust the recorded sampling interval with -s [seconds] on startup; the interactive options “-” and “+” (<shift> +) do NOT change the ODR interval

– Generates a standard nmon recording of the format:

<host>_<YYYYMMDD>_<HHMMSS>.nmon

– Tested with nmon Analyser v33C, and works fine
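A minimal usage sketch (the host name, timestamp and 5-second interval below are illustrative assumptions, not values from this material):

# nmon -s 5
(press "[" to start and "]" to stop the On Demand Recording)
# ls *.nmon
lpar01_20111015_143000.nmon

The file name follows the <host>_<YYYYMMDD>_<HHMMSS>.nmon format described above.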

Page 4: Performance Monitoring 1 AIX Perf Updates

nmon ODR

Page 5: Performance Monitoring 1 AIX Perf Updates

nmon chaos – to be or not to be logical

• Unknown to many customers and the field: the per-CPU metrics in the main nmon panel, and those that are recorded, have always reported logical utilization for active CPUs, whereas the AIX tools (topas, sar, mpstat, vmstat and lparstat) were adjusted in AIX 5.3 to report physical utilization

– The hardware register represents utilization in terms of absolute core consumption by hardware thread context (SMT) – this is called the Processor Utilization Resource Register (PURR)

– Values are relative to physical consumption for that CPU. I.e., 99% busy means nothing without knowing what that CPU's physical consumption is - 99% of 0.01 physc is perfectly normal behavior in shared lpars and not a reason for alarm

– Details of the hardware register are provided in last year's white paper by Saravanan Devendran

http://www.ibm.com/developerworks/wikis/display/WikiPtype/Understanding+Processor+Utilization+on+POWER+Systems+-+AIX

• nmon does provide (and has for many years) global utilization values (EC & VP) adjusted for PURR, and the ‘#’ option that provides ‘physical’ calculations matching the other AIX tools. However, nmon does not provide physical consumption values for each CPU (screens or recording) – it simply notes that “PURR Stats” are active

– To best understand this, run nmon and topas (or topas –L) side by side and note that wherever CPU values are reported, topas reports ‘physc’ (global utilization) or ‘pc’ (per-CPU utilization)

Page 6: Performance Monitoring 1 AIX Perf Updates

nmon chaos – to be or not to be logical

� A number of customers complained that the nmon main screen did not match other AIX tools. Development decided to switch nmon to use the same calculations as the other AIX tools

– These APARs shipped in 2H 2010 - many customers noticed this change and complained

– We discovered some ISV products and customers actually use per-logical CPU recording information for capacity planning. This is a bit of a problem:

• Anyone doing capacity planning in shared environments should be using the global utilization, physical and entitlement metrics reported by all tools, including nmon recording

• But many customers still do not understand that per-CPU metrics are relative to physical values, they are still living in an AIX 5.2 world

– Due to the new complaints, development decided to revert nmon back to the way it was. Those APARs shipped in Q1/2011

� Q4/2011 Updates provide both styles of per-logical CPU consumption in nmon recordings. Physical PURR numbers under PCPU* and PCPU_ALL(global) tags.

• For other updates, see Nigel Griffiths' nmon presentation

Page 7: Performance Monitoring 1 AIX Perf Updates

Dynamic Power Save

� POWER6 & POWER7 have the ability to automatically scale down energy usage based on processor utilization and thermal levels

� Static Power Saver Mode

– Lowers the processor frequency and voltage on a system by a fixed amount

– Reduces the power consumption of the system while still delivering predictable performance

– Percentage of power saved is predetermined and is not user configurable

– Memory also allowed to enter low power state when no access occurs (with supported firmware and DIMMs)

– Workload performance is not impacted, though CPU utilization may increase due to reduced frequency

� Dynamic Power Saver Mode

– Varies processor frequency and voltage based on the utilization of the system's processors

– System utilization is measured based on real-time utilization data

– Supports two modes:

• Favoring System Performance

• Favoring System Power Savings

Page 8: Performance Monitoring 1 AIX Perf Updates

Dynamic Power Save

� Whitepaper: IBM EnergyScale for POWER7 Processor-Based Systems

http://www-03.ibm.com/systems/power/hardware/whitepapers/energyscale7.html

• This capability requires a new type of reporting, where processor utilization is provided at both the current running frequency and the rated frequency

� Reporting is based on Processor Utilization Resource Registers (PURR), the accounting framework. See the Processor Utilization in AIX whitepaper for more details

– Actual based on PURR

– Normalized based on Scaled PURR (SPURR)

– Physical processors reported in each mode

– Adding user, system, idle and wait will equal the total entitlement of the partition from an actual and normalized view

– Shared, uncapped partitions can exceed entitlement

� lparstat (utilization and status): AIX 5.3 TL11 & AIX 6.1 TL04

� Statistics now available in Q4/2011 updates in nmon recordings, under SCPU* and SCPU_ALL tags

Page 9: Performance Monitoring 1 AIX Perf Updates

Dynamic Power Save - lparstat -E

� Idle value in this report has been modified to report the actual entitlement available (available capacity) – so be aware of this and do not directly compare to the legacy lparstat reports (if you go above entitlement, idle will equal 0)

Available Capacity = Idle = Entitlement - (user + sys + wait)

• When the partition is running at a reduced frequency, the available capacity (idle) shown by the two counters differs. The current idle capacity is shown by PURR. The idle value shown by SPURR is (approximately) what the idle capacity would be if the CPU ran at the rated frequency.

#lparstat -E 1

System configuration: type=Dedicated mode=Capped smt=Off lcpu=64 mem=262144MB

Physical Processor Utilisation:

--------Actual-------- ------Normalised------

user sys wait idle freq user sys wait idle

---- ---- ---- ---- --------- ---- ---- ---- ----

47.61 6.610 0.004 9.780 3.9GHz[102%] 48.35 6.714 0.004 8.933

46.24 6.743 0.000 11.02 3.9GHz[102%] 46.96 6.849 0.000 10.19

47.84 6.651 0.000 9.505 3.9GHz[102%] 48.59 6.756 0.000 8.653

Page 10: Performance Monitoring 1 AIX Perf Updates

VIOS Monitoring using topas (AIX 6.1 TL04)

• Run topas -C and press 'v' to show the VIOS Monitoring Panel

� All systems must be at AIX 5.3 TL09, VIOS 2.1 or higher to be monitored

Page 11: Performance Monitoring 1 AIX Perf Updates

VIOS Monitoring using topas

• From the topas VIOS panel, move the cursor to a particular VIOS server and press 'd' to get detailed monitoring for that server

Page 12: Performance Monitoring 1 AIX Perf Updates

topas Remote CEC & Cluster Views

� AIX 6.1 TL-04

� CEC function has been expanded to allow viewing of remote CEC

– topas -C option can attach to remote systems

– All partitions sharing that hardware ID will then be monitored

� Can now pre-define sets of partitions to make up Cluster

– topas –G

– Configuration details at end

– SMIT panels available

� Firewall support in AIX 6.1 TL06 and AIX 7.1

Page 13: Performance Monitoring 1 AIX Perf Updates

topas CEC (legacy –C)

Page 14: Performance Monitoring 1 AIX Perf Updates

topas CEC (new remote function)

• Run “topas -C -o xmtopas=ses10.in.ibm.com”

Page 15: Performance Monitoring 1 AIX Perf Updates

topas Cluster Utilization Panel (topas –G)

A Cluster can be defined as a group of related partitions or nodes. The Cluster utilization view can show the utilization of either an HACMP cluster or a user-defined cluster

Page 16: Performance Monitoring 1 AIX Perf Updates

topas Cluster subcommands

• Press ‘g’ - toggles the global section between brief/detailed listing

• Press ‘d’ and ‘s’ to toggle between dedicated-only and shared-only partition listings

Page 17: Performance Monitoring 1 AIX Perf Updates

topas Cluster (remote function)

Run “topas –G –o xmtopas=ses12.in.ibm.com”

Page 18: Performance Monitoring 1 AIX Perf Updates

Perfstat Library Overview

• Perfstat is a documented system library for collection of AIX performance metrics

– Supported since AIX V5

– 32-bit, 64-bit threadsafe API

– Supports all performance-related resources

– Structures defined in /usr/include/libperfstat.h header file

� Enhanced with additional metrics and new APIs in AIX 6.1 TL07 & AIX 7.1 TL01

– All CPU metrics – global CPU, physical consumption, physical busy, frequency, etc

– Partition configuration metrics, supplement lpar_get_info()

– Support for process level information outside of legacy libc “procsinfo” API

– Detailed disk statistics (read/write and queue service times)

– New API calls for maintaining state data for interval computations (CPU, process, partition)

– Host Fabric Interface (POWER 775 IH blade)

� Examples of API usage in AIX pubs, and installed at /usr/samples/libperfstat

� Older examples at http://www.ibm.com/developerworks/wikis/display/WikiPtype/ryo

Page 19: Performance Monitoring 1 AIX Perf Updates

Libperfstat enhancements

The list of Data Structures added:

� perfstat_cpu_util_t Global CPU Utilization

� perfstat_rawdata_t

� perfstat_process_t Process info

� perfstat_partition_config_t Partition Configuration (lparstat –i, etc)

The Data Structure updated:

� perfstat_disktotal_t Detailed disk statistics

The list of APIs added :

� perfstat_partition_config()

� perfstat_cpu_util()

� perfstat_process()

� perfstat_process_util()
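As a sketch of how the installed samples can be built against the library (the sample file name below is hypothetical; check /usr/samples/libperfstat for the shipped names):

# ls /usr/samples/libperfstat
# cc -o cpu_util_demo cpu_util_demo.c -lperfstat
# ./cpu_util_demo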

Page 20: Performance Monitoring 1 AIX Perf Updates

Data Structure Example - perfstat_cpu_util_t

Page 21: Performance Monitoring 1 AIX Perf Updates

POWER7 Simultaneous Multi-threading Review

� POWER7 processors can run in ST, SMT2, SMT4 modes

– Like POWER6, the SMT threads will dynamically adjust based on workload

– Applications that are single process and/or single threaded may benefit from running in ST mode, particularly if they want to completely consume a single physical core

– Multi-process applications may run better in ST mode if there are fewer application processes than cores

– Multi-threaded and/or multi-process applications (with more threads or processes than cores) will benefit more from running in SMT2 or SMT4 mode

(SMT thread priority diagram: thread 0 is the Primary thread, thread 1 is Secondary, and threads 2 and 3 are Tertiary.)

� POWER7 threads have different priorities with Primary, Secondary and Tertiary instances

� Work will not be assigned to tertiary threads until enough workload exists to drive primary and secondary (same threads in POWER5 & POWER6) threads – typically ~80%

Page 22: Performance Monitoring 1 AIX Perf Updates

POWER7 Processor Utilization

(POWER7 core diagram: execution units FX0, FX1, FP0, FP1, LS0, LS1, BRX, CRL – 4-way SMT)

� Calibrated CPU Utilization

– The Processor Utilization Resource Register (PURR) hardware counters, which are the basis for computing CPU utilization values in all tools, have been modified in POWER7

– Internal hardware counters are calibrated to a variety of commercial workloads to more accurately report real world utilization

– When SMT2 or SMT4 is enabled, a single hardware thread context will no longer be reported as consuming 100% of the core

– The goal is to provide a linear relationship between utilization values and throughput

– Use smtctl –t [1 | 2 | 4] to change SMT mode
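For example, a minimal sketch of switching modes dynamically (smtctl also accepts -w boot to make the change take effect at the next boot):

# smtctl -t 4 -w now     switch the partition to SMT4 immediately
# smtctl                 display the current SMT mode and thread status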

� Whitepaper on SMT: Simultaneous Multi-Threading on POWER7 Processors by Mark Funk

http://www.ibm.com/systems/resources/pwrsysperf_SMT4OnP7.pdf

� Whitepaper on utilization Processor Utilization in AIX by Saravanan Devendran http://www.ibm.com/developerworks/wikis/display/WikiPtype/Understanding+Processor+Utilization+on+POWER+Systems+-+AIX

Page 23: Performance Monitoring 1 AIX Perf Updates

POWER7 Processor Utilization

� Simulating a single threaded process on 1 core, utilization values change

– 1 VP system, using simple shell/perl script cpu hogs

– Some variability between tools and reports depending on implementation

– Nigel’s nstress package has a CPU stress tool, ncpu, but may run more than one thread by default (see command usage)

� Calibrated PURR applies whether running in POWER7 or POWER6 modes on POWER7

– AIX 5.3 and/or POWER6 mode on POWER7 can only support SMT2

� Real world production workloads will involve dozens to thousands of threads, so many users may not notice any difference

(Diagram: reported utilization when a single hardware thread (Htc0) is busy and the remaining threads are idle:
– POWER6 SMT2: 100% busy
– POWER7 SMT2: ~70% busy
– POWER7 SMT4: ~63% busy
When all hardware threads are busy, each mode reports 100% busy.)

Page 24: Performance Monitoring 1 AIX Perf Updates

POWER7 SMT4 Enhancement (AIX 6.1 TL06)

� The original SMT4 algorithm was too aggressive when moving workloads from secondary & tertiary threads to the primary thread when utilization had dropped

– Mechanism is known as idle-shedding

– This impacted OLTP workloads, so it was disabled in favor of a load-based algorithm

� Field experience on large shared-pool systems uncovered scenarios where workload changes did not optimally result in switching from SMT4 to ST mode

– A secondary or tertiary thread would hold work longer than desired when the primary became idle

– The idle-shedding algorithm is now enabled, enhanced to support OLTP workloads and to prevent work from sticking to secondary/tertiary threads

� These are optimizations applied from experiences with “real world” customer workloads on POWER7

� APAR IZ97088

Page 25: Performance Monitoring 1 AIX Perf Updates

POWER7 SMT - 720 Firmware Bug

• There is a Hypervisor dispatch bug where only a single thread, rather than all SMT threads, is evaluated to determine the workload. SMT threads with work are ignored.

� Issue appears in 720_064 and later levels and is fixed in 720_101. It does not exist in 710 or 730 firmware levels.

http://www-304.ibm.com/webapp/set2/sas/f/power5cm/power7.html

• Can impact a single lpar or multiple lpars while others are not affected

• There is no clear diagnosis from the OS level, but the primary complaint from customers is that they see distinct differences in application latency between lpars and have no identifiable resource constraints

– Some data appears to show that all VPs are being dispatched to the primary SMT thread(s), so a normally multi-process or multi-threaded workload is very busy on the primary thread(s) and the secondary/tertiary threads are idle. sar –P ALL or mpstat output for all logical CPUs might show little or no distribution across SMT threads.

Page 26: Performance Monitoring 1 AIX Perf Updates

POWER7 SMT – Application/DB settings

� Where customers have seen performance differences between other systems and POWER7 with SMT4, we have found storage differences or software settings to be the primary factor for any delta. Examples:

– One DB environment used Asynchronous IO, whereas the POWER7 system did not. When the systems were made comparable, the POWER7 system performed as expected.

– Oracle’s CPU_COUNT (init.ora) and PARALLEL_THREADS_PER_CPU parameters determine the factor of parallelism

• Number of Virtual Processors * SMT Setting * PARALLEL_THREADS_PER_CPU

• Our ATS Software Specialists have additional material on these settings for Oracle

� These parameters can have a large impact on performance and must be taken into account before believing something is wrong with SMT
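As a worked example of the parallelism factor above (values are hypothetical): a partition with 8 Virtual Processors running in SMT4 mode and PARALLEL_THREADS_PER_CPU=2 gives 8 * 4 * 2 = 64 as the degree of parallelism the database will assume.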

Page 27: Performance Monitoring 1 AIX Perf Updates

Virtual Processor Folding - Review

� Virtual Processor Folding is the technology which consolidates threads to the minimum number of Virtual Processors (VP) required to support a workload

– Each virtual processor can consume a maximum of one physical processor

– Operating system constantly assesses workload requirements and folds or unfolds VPs as required

– Response to customers allocating excessive VPs vs physical cores available in a shared pool

– Enabled by default since AIX 5.3 ML03

– Also allows dedicated partitions to donate free cycles to shared pools

– Dedicated systems do in fact run under VPs – they are just not enabled for folding by default

– All of the SMT threads associated with a physical core must be quiesced before a VP can be folded

– Technology aids the PowerVM hypervisor to put physical cores into lower energy levels, presuming all the VPs on different partitions within a shared pool associated with a physical core are “foldable”

Page 28: Performance Monitoring 1 AIX Perf Updates

POWER7 Virtual Processor Folding - Algorithm

� Every second, the OS calculates the physical utilization for the last second

– VPs are activated based on utilization thresholds and the vpm_xvcpus tunable setting

– Where the schedo vpm_xvcpus setting is:

• Defaults to 0 (enabled)

• Disabled with -1

• Can be set to a positive whole number to increase the number of active VPs
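A minimal sketch of how these values are typically set with schedo (the values shown are illustrative; -p makes the change persistent across reboots):

# schedo -o vpm_xvcpus=2      keep two additional virtual processors unfolded
# schedo -o vpm_xvcpus=-1     disable folding (not recommended, see the following slides)
# schedo -p -o vpm_xvcpus=0   restore and persist the default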

• Folding is activated and deactivated 1 VP at a time; even when utilization drops to idle, the VPs fold one at a time

� The legacy threshold at which default settings would trigger another VP was a utilization level of ~80%.

– This threshold has changed, and may evolve at any time

– AIX 6.1 TL05 is more aggressive about unfolding Virtual Processors

• Non-IBM ISVs have largely adopted IBM’s recommendations

http://www.oracle.com/technetwork/database/clusterware/overview/rac-aix-system-stability-131022.pdf

Page 29: Performance Monitoring 1 AIX Perf Updates

POWER7 Virtual Processor Folding - Cost

� Disabling folding will result in:

– Overriding optimizations built into the OS schedulers

– All VPs being dispatched to the hypervisor, whether they have work to do or not

– More hypervisor overhead, possible impact on physical resource affinity

� The upside to disabling folding is that it can lead to better performance when lpar(s) are perfectly sized. This typically applies to performance benchmarks and not a mix of real-world, traditional Unix production workloads sharing CPU resources.

– If the lpars are perfectly sized, and the pool is never constrained, customers may not notice much of a performance hit

– When the pool or enough lpars are constrained, excess VPs on other lpars will hurt performance

� Folding is a real-world performance feature

� Disabling folding will adversely impact most environments and may result in IBM Support refusing to analyze a PERFPMR until collections have been performed with restricted tunables reset to their defaults

Page 30: Performance Monitoring 1 AIX Perf Updates

Utilization Issues - 2011

� AIX 6.1 TL05 SP6 defects can inflate physical utilization

– Symptom is high entitlement/physical consumption AND high %Idle – very obviously wrong

– APAR IZ94768 in latest TL06

� Idle kernel proc looping in dispatch cycles

– APAR IV01111: WAITPROC IDLE LOOPING CONSUMES CPU

� Applicable to multi-node systems only, and if encountering problems (see Enhanced Affinity section)

– APAR IV06194: SRAD LOAD BALANCING ISSUES ON SHARED LPARS

� Because newer AIX 6.1 levels are more aggressive at unfolding Virtual Processors, comparisons with older AIX levels may cause confusion

– Noticed in some POWER6 to POWER7 migrations

– Physical consumption is higher, but User and System time are lower (system is more idle)

– Review utilization or physical busy metrics before assuming something is wrong

Page 31: Performance Monitoring 1 AIX Perf Updates

Utilization Issues - 2011

• Many customers do not know that physical busy, the User + System percentage of physical consumption, is reported by lparstat

> lparstat 1

System configuration: type=Shared mode=Uncapped smt=4 lcpu=48 mem=49152MB psize=15 ent=1.20

%user %sys %wait %idle physc %entc lbusy app vcsw phint %nsp

----- ----- ------ ------ ----- ----- ------ --- ----- ----- -----

61.5 1.3 0.0 37.1 1.36 113.2 15.1 11.39 13333 26 95

61.4 1.2 0.0 37.4 1.35 112.2 14.3 11.43 13664 21 96

59.4 3.0 0.0 37.6 1.39 115.9 15.0 11.36 12400 16 95

> lparstat –l 1

System configuration: type=Shared mode=Uncapped smt=4 lcpu=48 mem=49152MB psize=15 ent=1.20

%user %sys %wait %idle physc %entc pbusy app vcsw phint %nsp

----- ----- ------ ------ ----- ----- ------ --- ----- ----- -----

14.6 0.2 0.0 85.2 1.35 112.1 84.7 11.29 13461 10 96

14.1 0.0 0.0 85.9 1.38 115.1 86.4 11.22 12677 32 95

14.0 0.4 0.0 85.6 1.36 113.5 85.4 11.38 13397 16 96

NOTE: –l results in utilization values switching to logical, whereas default shows physical “PURR” utilization

Page 32: Performance Monitoring 1 AIX Perf Updates

POWER7 Prefetch

• POWER prefetch instructions can be used to mask latencies of requests to the memory controller and to fill cache.

– The POWER7 chip can recognize memory access patterns and initiate prefetch instructions automatically.

– How aggressively the hardware prefetches (i.e., how many cache lines are prefetched for a given reference) is controlled by the Data Streams Control Register (DSCR).

� The dscrctl command can be used to query and set the system wide DSCR value

# dscrctl -q

Current DSCR settings:

Data Streams Version = V2.06

number_of_streams = 16

platform_default_pd = 0x5 (DPFD_DEEP)

os_default_pd = 0x0 (DPFD_DEFAULT)

� A system administrator can change the system wide value using the dscrctl command

# dscrctl [-n | -b] –s <value>

Disengage the data prefetch feature : dscrctl -n -s 1

Returning to default: dscrctl –n –s 0

� This is a dynamic system-wide setting. Consult AIX Release Notes and Performance Guide for HPC Applications on IBM Power 755 System for more information

Page 33: Performance Monitoring 1 AIX Perf Updates

POWER7 Enhanced Affinity

� Affinity is a threads relationship with CPU and memory resources

– Large partitions and shared pools will span chips and nodes

– Maintaining proximity to a set of resources provides optimal performance

– POWER7 and AIX 6.1 provide much better instrumentation for affinity enhancements than has been available previously

� Enhanced Affinity Summary

– Memory and CPU resources are localized, and form Affinity Domains

• Resource Allocation Domain (RAD) is a collection of physical resources

• Scheduler Resource Allocation Domain (SRAD)

Collection of system resources that are the basis for most resource allocation and scheduling activities performed by the kernel

– An Affinity Domain (home “node”) is assigned to each thread at startup

• Thread’s private data is affinitized to home node

• Threads may temporarily execute remotely, but will eventually return to their home SRAD

• A single-threaded process's application data heap will be placed on its home SRAD

• Multi-threaded processes will be balanced on SRADs depending upon footprint

Page 34: Performance Monitoring 1 AIX Perf Updates

Enhanced Affinity

• Resource affinity structures are used by the Enhanced Affinity function to help maintain locality between threads and hardware resources. New terms describe the distance between two resources in a two- or three-tier affinity environment (POWER7)

– 2-tier for low-end systems (blades, 710, 720, 730, 740, 750, 755)

Local resources have affinity within a chip

Far resources outside the chip

– 3-tier for multi-node systems (770, 780, 795)

Local resources have affinity within a chip

Near resources share the same node/book

Far resources outside the node/book

� AIX Topology Service

– System Detail Level (SDL) is used to identify local (chip), near (within node) and far (external node)

– The “REF” System Detail Level is used to identify near/far memory boundaries (nodes)

• A new tool, lssrad, displays the hierarchy and topology for memory and the scheduler. When dynamically changing CPU/memory configurations, lssrad output can show the system's balance.

• Affinity metrics can be monitored in dedicated or shared partitions, but a shared partition's layout is not a 1:1 mapping to the physical layout

Page 35: Performance Monitoring 1 AIX Perf Updates

System Topology & lssrad –va (4 node 770)

REF SRAD MEM LCPU

0

0 28250.00 0-31

1 27815.00 32-63

1

2 28233.00 64-95

3 27799.00 96-127

2

4 28281.00 128-159

5 27799.00 160-191

3

6 28016.00 192-223

7 27783.00 224-255

(System diagram: four nodes REF0–REF3; each node contains two POWER7 chips (CPU0–CPU7) and two 32GB memory units (MEM0–MEM7).)

Page 36: Performance Monitoring 1 AIX Perf Updates

Enhanced Affinity: topas -M

� You can select columns to sort on with tab key

Page 37: Performance Monitoring 1 AIX Perf Updates

Enhanced Affinity: mpstat

System configuration: lcpu=8 ent=1.0 mode=Uncapped

cpu cs ics .. S0rd S1rd S2rd S3rd S4rd S5rd ilcs vlcs S3hrd S4hrd S5hrd

0 12344 4498 .. 95.0 0.0 0.0 5.0 0.0 0.0 3 3095 100.0 0.0 0.0

1 112 56 .. 99.6 0.4 0.0 0.0 0.0 0.0 0 139 100.0 0.0 0.0

2 0 0 .. 50.0 50.0 0.0 0.0 0.0 0.0 0 90 100.0 0.0 0.0

3 0 0 .. 0.0 100.0 0.0 0.0 0.0 0.0 0 90 100.0 0.0 0.0

4 12109 4427 .. 94.9 0.0 0.0 5.1 0.0 0.0 1 3053 100.0 0.0 0.0

5 326 163 .. 99.9 0.1 0.0 0.0 0.0 0.0 0 250 100.0 0.0 0.0

6 0 0 .. 0.0 100.0 0.0 0.0 0.0 0.0 0 90 100.0 0.0 0.0

7 0 0 .. 0.0 100.0 0.0 0.0 0.0 0.0 0 90 100.0 0.0 0.0

ALL 24891 9144 .. 95.0 0.0 0.0 5.0 0.0 0.0 4 6897 100.0 0.0 0.0

• mpstat (-a, -d flags) displays logical-CPU SRAD affinity

– Home SRAD redispatch statistics

• S3hrd – local

• S4hrd – near (3-tier only)

• S5hrd – far

Page 38: Performance Monitoring 1 AIX Perf Updates

Enhanced Affinity: svmon

� Global Report

– Affinity domains are represented based on SRADID

Memory information of each SRAD: total, used, free, filecache

Logical CPUs in each SRAD

� Process Report

– Displays the ‘home SRAD’ affinity statistics for the threads of a process

– Also provides an application’s memory placement policies

Page 39: Performance Monitoring 1 AIX Perf Updates

Enhanced Affinity: svmon

# svmon -G -O affinity=on,unit=MB

size inuse free pin virtual available mmode

memory 32768.00 3353.63 29414.37 1850.54 3195.75 29410.62 Ded

pg space 5408.00 10.6

work pers clnt other

pin 896.20 0 0 954.34

in use 3195.75 0.10 157.78

Domain affinity free used total filecache lcpus

0 24011.50 1837.38 25848.88 117.41 0 1 2 3

1 5403.90 560.89 5964.79 26.8 4 5 6 7 8 9 10

# lssrad -va

REF1 SRAD MEM CPU

0

0 25848.88 0-3

1 5964.79 4-10

Page 40: Performance Monitoring 1 AIX Perf Updates

Enhanced Affinity: svmon

# svmon -P 3670212 -O threadaffinity=on

Pid Command Inuse Pin Pgsp Virtual

3670212 rmcd 20334 10793 0 20162

Tid HomeSRAD LocalDisp NearDisp FarDisp

16449773 0 602 672 0

18808987 0 41 41 0

7864593 1 23 0 0

7930141 1 21 0 0

# svmon -P 1 -O affinity=detail

Pid Command Inuse Pin Pgsp Virtual

1 init 18654 10786 0 18636

Domain affinity Npages Percent Private lcpus

1 9914 53.2 31 2 3 4

0 8722 46.8 147 0 1

Page 41: Performance Monitoring 1 AIX Perf Updates

POWER7 Affinity & Partition Placement

• On POWER6 and earlier systems, the Hypervisor (PHYP) minimized the number of affinity domains (books/drawers/chips) per partition

� POWER7 Hypervisor improves affinity by selecting optimized number of domains

–Ensures cores/memory allocated from each domain

� PHYP, AIX 7 and IBM i v7r1m0 changed to support both a primary and secondary affinity domains.

–For 795, chip is primary and book is secondary domain.

–OS enforces affinity in SPLPAR partitions in POWER7 (not done in P5 & P6)

• PHYP, AIX 7 and IBM i v7r1m0 also added support for a home node per shared virtual processor (previously, PHYP internally supported a home node per partition) to improve affinity

� New System Partition Processor Limit (SPPL) gives direction to PHYP whether to contain partitions to minimum domains or spread partitions across multiple domains.

–Applies to shared or dedicated environments

Page 42: Performance Monitoring 1 AIX Perf Updates

Setting System Partition Processor Limit (SPPL) on the HMC

The following section in Managing the HMC infocenter topic provides a reference to System Partition Processor Limit (SPPL):

http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ha1/smProperties.htm

Systems Management > Properties > Advanced tab

Page 43: Performance Monitoring 1 AIX Perf Updates

Placement with Max Partition Size = 32

• Partitions will be contained in a minimum number of nodes.

• If a partition cannot be contained within a single node, then it will be spread across a minimum number of nodes.

(Placement diagram: LPARs of between 8 and 32 VPs are distributed across Nodes 1–8, each packed into as few nodes as possible, with 4, 6 and 8 free cores remaining on some nodes.)

Page 44: Performance Monitoring 1 AIX Perf Updates

Partition Placement/Licensing

� New Firmware (eFW7.3 and later)

� At system power on treat all processors and memory as licensed

� Place all the partitions as optimally as possible from a performance viewpoint

• May require spreading a partition across multiple chips/drawers/books to keep memory and processors together within domains (i.e. try to ensure that if memory comes from a domain there is also a processor from that domain, and vice versa).

� Optimization of other hardware components might also cause spreading of larger partitions across domains (i.e. to provide additional internal bus bandwidth, spread >24 way processor partitions across multiple books)

� Unlicense individual processors that have not been assigned to partitions

� First choice is to unlicense processors that do not have any memory DIMMs connected to the processor

� Second is to spread out the unlicensed processors across the domains such that each domain would have similar number of unlicensed processors

Page 45: Performance Monitoring 1 AIX Perf Updates

Placement with Max Partition > 32 + Licensing

• Partitions of 24 or fewer virtual processors (VPs) are packed into a single node if there is sufficient memory

• Partitions of more than 24 processors are spread across multiple books to allow for additional bandwidth

• Memory would come from the same books where the processors are located

• Licensed memory is a max value across all 8 books, not specific locations

(Placement diagram: LPAR1 (20 VPs), LPAR2 (36 VPs), LPAR3 (64 VPs), LPAR4 (16 VPs), LPAR5 (42 VPs) and LPAR6 (12 VPs, memory across 3 books) placed across Nodes 1–8, with between 2 and 12 unlicensed processors remaining per node.)

Page 46: Performance Monitoring 1 AIX Perf Updates

POWER7 1TB Segment Aliasing

� Workloads with large memory footprints and low spatial locality can perform poorly

– Analysis shows that processor Segment Lookaside Buffer (SLB) faults can take a significant amount of time

– POWER 6

• SLB has 64 entries

• 20 reserved for the kernel

• 44 available for user processes -> yields 11GB of accessible memory

• Many customer workloads do not fit into 11GB

– POWER 7

• SLB has 32 entries - architectural trend towards smaller SLB sizes

• 20 still reserved for the kernel

• 12 available for user processes -> yields 3GB of accessible memory

• Potential for performance regression

Page 47: Performance Monitoring 1 AIX Perf Updates

Example Process Address Space with 1TB Aliasing

Without aliasing: Kernel / program text (3 × 256MB segments) + heap (8 × 256MB segments) + 80GB shared memory region (320 × 256MB segments) = 331 SLB entries, heavy SLB thrashing

With aliasing: Kernel / program text (3 × 256MB segments) + heap (1 × 1TB unshared alias segment) + 80GB shared memory region (1 × 1TB shared alias segment) = 5 SLB entries, no SLB faults

Page 48: Performance Monitoring 1 AIX Perf Updates

POWER7 1TB Segment Aliasing

� Feature allows user applications to use 1TB segments

– 12 SLB entries can now address 12TB of memory

– SLB fault issue no longer relevant

– Immediate performance boost for applications, new and legacy

– Significant changes under the covers

• New address space allocation policy

• Attempts to group address space requests together to facilitate 1TB aliasing.

• Once certain allocation size thresholds have been reached, OS automatically aliases memory with 1TB aliases.

– 256MB segments still exist for handling IO

– Currently only available for 64-bit process shared memory regions

– Default in AIX 7.1, optional in AIX 6.1 TL06

� Use & diagnosis

– Engage ATS Software Specialists, ask for Ralf Schmidt-Dannert’s whitepaper

– Diagnosis techniques require use of trace tools for analysis. Example:

tprof –T 100000000 –skeux sleep 10

– Review sleep.prof file for noticeable cpu % in kernel routine set_smt_pri_user_slb_found()

Page 49: Performance Monitoring 1 AIX Perf Updates

JFS2 i-node cache review

• The maximum sizes of the JFS2 i-node and metadata caches are set by two ioo tunables

– j2_inodeCacheSize = 400 (AIX 6.1)

– j2_metadataCacheSize = 400 (AIX 6.1)

– Both of these default to 400; the units are undocumented and undefined, but we do know that the i-node structure is 1K

� Procfs command is the only simple way to view usage on AIX 6.1

# cat /proc/sys/fs/jfs2/memory_usage

metadata cache: 48476160

inode cache: 161546240

total: 210022400

• Years ago, Ralf Schmidt-Dannert and Doug Ranz investigated the unit settings and determined the following for a 10GB system:

j2_inodeCacheSize or
j2_metadataCacheSize    Maximum i-node cache   Cacheable i-nodes   Maximum metadata cache
100                     250 MB                 250K                100 MB
200                     500 MB                 500K                200 MB
300                     750 MB                 750K                300 MB
400 (default)           1.0 GB                 1000K               400 MB
1000                    2.5 GB                 2500K               1.0 GB
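A hedged sketch of inspecting and changing these tunables with ioo (the value 200 is illustrative; see the caution on the next slide about tuning an active system down):

# ioo -a | grep j2_inodeCacheSize
# ioo -a | grep j2_metadataCacheSize
# ioo -o j2_inodeCacheSize=200 -o j2_metadataCacheSize=200

Per the table above, a value of 200 corresponds to roughly a 500 MB i-node cache and a 200 MB metadata cache.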

Page 50: Performance Monitoring 1 AIX Perf Updates

JFS2 i-node cache review

• The general answer to the question “svmon isn't reporting all my memory, where's the rest of it?” is to check i-node/metadata cache usage

� The behavior of AIX JFS2 i-node and metadata cache can cause problems on customer systems

– Customers may note system memory increasing over time with no clear explanation, vmstat and svmon do not report i-node or metadata usage

– Kernel pinned heap is used for these structures

– Cache usage is not reduced when files are deleted, and remounting filesystems does not release the i-node or metadata

– New i-nodes use new memory until reaching the capacity defined by the tunables. At that point, older entries may be recycled

– These tunables are dynamic, but the general thinking is it is not a good idea to tune an active system down once it has reached these limits. If you are running 10% inode cache on a very large system and want to reduce that, open a PMR or wait for a reboot.

� Undocumented formulas generally appear to allow for the defaults to consume these percentages of real memory:

– Inode cache 10% (AIX 6.1) and 5% (AIX 7.1)

– Metadata cache 4% (AIX 6.1) and 2% (AIX 7.1)

� Weblink provides an online reference for customers:

http://www.ibm.com/developerworks/wikis/display/WikiPtype/AIXJ2inode

Page 51: Performance Monitoring 1 AIX Perf Updates

Support for enhanced iostat metrics

� AIX 7.1 and AIX 6.1 TL06 (SP2)

� The option 'b' provides detailed I/O statistics for block devices. Enabled for root and non-root user

� Block IO stats collection is disabled by default

– root user can enable with: raso -o biostat=1

� Syntax:

iostat -b [block Device1 [block Device2 [...]]] Interval [sample]
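A minimal sketch of enabling collection and sampling one device (hdisk0 and the 2-second/3-sample values are illustrative):

# raso -o biostat=1
# iostat -b hdisk0 2 3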

Page 52: Performance Monitoring 1 AIX Perf Updates

Raso tunable - biostat

Purpose:

Specifies whether block IO device statistics collection should be enabled or not.

Values:

Default: 0

Range: 0, 1

Type: Dynamic

Unit: boolean

Tuning:

This tunable is useful in analyzing performance/utilization of various block IO devices. If this tunable is enabled, we can use iostat -b to show IO statistics for various block IO devices.

Possible Value:

1 : Enabled

0 : Disabled

Page 53: Performance Monitoring 1 AIX Perf Updates

Block IO Device Utilization Report

The Block IO Device report provides statistics on a per-IO-device basis. The report has the following format:

� device Name of the device

� reads Number of read requests over interval

� writes Number of write requests over interval

� bread Number of bytes read over interval

� bwrite Number of bytes written over interval

� rserv Read service time in milliseconds per read over interval

� wserv Write service time in milliseconds per write over interval

� rerr Number of read errors over interval

� werr Number of write errors over interval

Page 54: Performance Monitoring 1 AIX Perf Updates

iostat –b sample output

Page 55: Performance Monitoring 1 AIX Perf Updates

Support for 1024 Logical CPUs – AIX 7.1

� Growth of physical core support and SMT4 for POWER7 will drastically increase the number of logical CPUs on systems

� Presents difficult challenge in analysis of very large partitions

� Processor tools need to support new filtering and output options for analysis

– Sorting and filtering options for sar and mpstat

– Screen freezing, scrolling, paging for topas

– New XML formatted reports (vmstat, iostat, mpstat, sar, lparstat)

Page 56: Performance Monitoring 1 AIX Perf Updates

Support for 1024 Logical CPUs – sar filtering

sar –O option for sorting and filtering

sar [ { -A [ -M ] | [ -a ] [ -b ] [ -c ] [ -d ] [ -k ] [ -m ] [ -q ] [ -r ] [ -u ] [ -v ] [ -w ] [ -y ] [ -M ] } ] [ -P processoridentifier, ... | ALL | RST [ -O {sortcolumn=col_name[,sortorder={asc|desc}][,topcount=n]} ] ] [ [ -@ wparname ] [ -e [YYYYMMDD]hh [ :mm [ :ss ] ] ] [ -f file ] [ -i seconds ] [ -o file ] [ -s [YYYYMMDD]hh [ :mm [ :ss ] ] ] [ -x ] [ Interval [ Number ] ]

-O Options: allows users to specify command options as -O option=value...

Following are the supported options:
sortcolumn = name of the metric in the sar command output
sortorder = [asc|desc] - default value of sortorder is “desc”
topcount = number of CPUs to be displayed in the sorted sar output

To display the output sorted on the cswch/s column with the -w flag, enter the following command:
sar -w -P ALL -O sortcolumn=cswch/s 1 1

To list the top ten CPUs, sorted on the scall/s column, enter the following command:
sar -c -O sortcolumn=scall/s,sortorder=desc,topcount=10 -P ALL 1

Page 57: Performance Monitoring 1 AIX Perf Updates

mpstat –O option for sorting and filtering

mpstat [ { -d | -i | -s | -a | -h } ] [ -w ] [ -O Options ] [ -@ wparname ] [ interval [ count ] ]

-O Options Specifies the command option. -O options=value...

Following are the supported options:

sortcolumn = Name of the metrics in the mpstat command output

sortorder = [asc|desc] - Default value of sortorder is “desc”

topcount = Number of CPUs to be displayed in the mpstat command sorted output

To see the sorted output for the column cs (context switches), enter the following command:

mpstat -d -O sortcolumn=cs

To see the list of the top 10 CPUs, enter the following command:

mpstat -a -O sortcolumn=min,sortorder=desc,topcount=10

Support for 1024 Logical CPUs – mpstat filtering

Page 58: Performance Monitoring 1 AIX Perf Updates

Support for 1024 Logical CPUs - topas

• topas panel freezing: the space bar is used to toggle a screen freeze

• Sort using the left/right arrows to select a column

• Scroll using PgUp/PgDn

Page 59: Performance Monitoring 1 AIX Perf Updates

� XML output for commands lparstat, vmstat, iostat, mpstat, sar

� Default output file name is command_DDMMYYHHMM.xml and is generated in current directory

� User can specify output file name and directory using “–o”

lparstat -X -o /tmp/lparstat_data.xml

� XML schema files are shipped with base OS under /usr/lib/perf

iostat_schema.xsd, lparstat_schema.xsd, mpstat_schema.xsd, sar_schema.xsd, vmstat_schema.xsd

� Currently, the xml output generated by these commands is not validated as per schema. It is up to the application to do this validation

sar [ -X [-o filename]] [interval[count]]

mpstat [ -X [-o filename]] [interval[count]]

lparstat [ -X [-o filename]] [interval[count]]

vmstat [ -X [ -o filename]] [ interval [ count ]]]

iostat [ -X [-o filename]] [interval[count]]

Support for 1024 Logical CPUs – XML output

Page 60: Performance Monitoring 1 AIX Perf Updates

Miscellaneous AIX 6.1 TL06 & AIX 7.1

� CPU Interrupt Disabling

– Minimize interrupt jitter that impacts application performance

– Quiesce external interrupts on a set of logical processors

– Control interface (subroutine, kernel service, command line)

– POWER5 and later systems, dedicated or shared

� Kernel memory pinning

– Default in AIX 7.1, option in AIX 6.1

– vmo vmm_klock_mode tunable

– Should you do this in AIX 6.1? No – you should be talking to SupportLine with problems involving kernel segment(s) being paged before this is done. But there is more flexibility now to protect kernel memory.

� Hot Files Detection Subsystem

– A new subsystem for detecting hot files in JFS2 filesystems

– Currently, only a program interface using ioctl() calls on active file descriptors is provided. Contact us if you are interested in using this interface.

– System header and structures: see /usr/include/sys/hfd.h

• Perfstat and PTX SPMI APIs have been extended to cover the various new technologies supported (Active Memory Expansion, etc)

Page 61: Performance Monitoring 1 AIX Perf Updates

Java & POWER7

� Java 6 SR7 enhanced for POWER7 instructions, pre-fetch and autonomic 64KB page sizes

http://www.ibm.com/developerworks/java/jdk/aix/faqs.html

• Best Practices for Java performance on POWER7

https://www.ibm.com/developerworks/wikis/display/LinuxP/Java+Performance+on+POWER7

� Websphere Application Server (WAS)

– V7 & V8 provide specific exploitation of POWER6 & POWER7 instructions and 64KB page sizes

– V8 includes scaling, footprint reduction, and Java Persistence API (JPA) improvements

� Java Performance Advisor (JPA)

– Provides performance recommendations for Java/WAS applications on AIX

https://www.ibm.com/developerworks/wikis/display/WikiPtype/Java+Performance+Advisor

Page 62: Performance Monitoring 1 AIX Perf Updates

Java Performance Advisor

Page 63: Performance Monitoring 1 AIX Perf Updates

FC over Ethernet – 5708 Adapter

Test                     Direction  Sessions   Single port              Both ports
                                               1500 MTU    9000 MTU     1500 MTU    9000 MTU
TCP STREAM               send       1          870 MB/s    1076 MB/s    1647 MB/s   1311 MB/s
                                    4          1068 MB/s   1111 MB/s    1668 MB/s   1402 MB/s
                         receive    1          785 MB/s    1173 MB/s    1393 MB/s   1015 MB/s
                                    4          925 MB/s    1179 MB/s    1393 MB/s   992 MB/s
                         duplex     1          1439 MB/s   1733 MB/s    2106 MB/s   1712 MB/s
                                    4          1527 MB/s   1756 MB/s    2176 MB/s   1914 MB/s
TCP_Request & Response   1 byte     1          13324 TPS¹               26171 TPS
                         message    150        182062 TPS               237415 TPS

Host: P7 750 4-way, SMT-2, 3.3 GHz, AIX 5.3 TL12, dedicated LPAR, dedicated adapter
Client: P6 570 with two single-port 10 Gb (FC 5769), point-to-point wiring (no ethernet switch)
¹ Single session 1/TPS round trip latency is 75 microseconds, default ISNO settings, no interrupt coalescing

Page 64: Performance Monitoring 1 AIX Perf Updates

FCoE – Ethernet Performance

• Note that receive has the lowest performance

– Lower throughput

– Both ports only get slightly more throughput than a single port as sessions are added

– Cannot provide 10 Gb receive bandwidth on two ports

– A 50% duty cycle should be OK

� Disk IO does better due to larger blocks/buffers

� AIX 6.1 SMT4 should have better throughput

Page 65: Performance Monitoring 1 AIX Perf Updates

New Free Memory Tool

� Memory Tools

– An IBM FTSS maintains a webpage reviewing AIX memory and paging issues

http://www.ibm.com/developerworks/wikis/display/WikiPtype/AIXV53memory

– He has also developed a script that post-processes various memory tool outputs to provide detailed breakdowns of system and user memory usage

• Tool is called pmrmemuse and a wrapper script called showmemsuse

https://www.ibm.com/developerworks/wikis/display/WikiPtype/AIXmemuse

– Website provides usage, output examples and extensive FAQ on the caveats of svmon-based output

Page 66: Performance Monitoring 1 AIX Perf Updates

pmrmemuse – System Summary

                                            Inuse       Pin         Pgsp        Virtual     Inuse      Pin       Pgsp      Virtual   Segment  Process
                                            (4K pages)  (4K pages)  (4K pages)  (4K pages)  (MBs)      (MBs)     (MBs)     (MBs)     count    count
------------------------------------------- ----------  ----------  ----------  ----------  ---------- --------- --------- --------- -------- -------

Free (on AIX free list) 39 548 0 0 0 154.484 0.000 0.000 0.000

Used in AIX kernel but not in any segment 212 304 212304 0 212304 829.312 829.312 0.000 829.312

Used in AIX kernel & extension segments 1279 266 947052 31815 1292591 4997.133 3699.422 124.277 5049.184 337

Used in segments shared by several users 27 528 0 620 27974 107.531 0.000 2.422 109.273 3

Used for clnt (JFS2 & NFS) file cache segs 177 201 0 0 0 692.191 0.000 0.000 0.000 3126

Empty clnt (JFS2 & NFS) file cache segs 0.000 0.000 0.000 0.000 415659

Used for pers (JFS) file cache segments 5 0 0 0 0.020 0.000 0.000 0.000 4

Empty pers (JFS) file cache segs 0.000 0.000 0.000 0.000 5

Used by user wmethods in unshared segments 3945 684 404 742594 4251313 15412.828 1.578 2900.758 16606.691 120 12

Used by user wmethods in shared segments 1 0 0 1 0.004 0.000 0.000 0.004 1 2

Used by user oracle in unshared segments 74 554 5368 98428 173070 291.227 20.969 384.484 676.055 572 24

Used by user oracle in shared segments 627 361 0 479823 760211 2450.629 0.000 1874.309 2969.574 13 89

Used by user root in unshared segments 25 784 6436 11055 36485 100.719 25.141 43.184 142.520 261 108

Used by user root in shared segments 1 0 1 2 0.004 0.000 0.004 0.008 2 4

Used by user wmuser in unshared segments 620 24 273 877 2.422 0.094 1.066 3.426 12 3

Used by user wmuser in shared segments 2 0 1 3 0.008 0.000 0.004 0.012 3 6

….

Used by user daemon in unshared segments 122 4 519 602 0.477 0.016 2.027 2.352 3 1

Used by user flexsens in unshared segments 74 4 76 132 0.289 0.016 0.297 0.516 2 1

Unused work segments 12 706 200 388 13142 49.633 0.781 1.516 51.336 55

Segments found only in svmon -S output 2 065 160 45 2107 8.066 0.625 0.176 8.230 16

Empty work segments 0.000 0.000 0.000 0.000 4327

Empty rmap segments 0.000 0.000 0.000 0.000 10

Empty mmap segments 0.000 0.000 0.000 0.000 110

Empty clnt segs found only in svmon -S otpt 0.000 0 .000 0.000 0.000 2

Empty work segs found only in svmon -S otpt 0.000 0 .000 0.000 0.000 23

Segs not found in svmon -S otpt (see below) 0.000 0. 000 0.000 0.000 49 2

------------------------------------------- -------- -- ---------- ---------- ---------- ---------- --------- - ---------- ---------- -------- -------

Totals (Free + Used): 6425 414 1172020 1365638 6771402 25099.273 4578.203 5334.523 26450.789 424729 258

Page 67: Performance Monitoring 1 AIX Perf Updates

pmrmemuse – System Summary

Totals reported in svmon -G output (for comparison to the summary above):

                                            Inuse       Pin         Pgsp        Virtual     Inuse      Pin       Pgsp      Virtual   Segment  Process
                                            (4K pages)  (4K pages)  (4K pages)  (4K pages)  (MBs)      (MBs)     (MBs)     (MBs)     count    count
------------------------------------------- ----------  ----------  ----------  ----------  ---------- --------- --------- --------- -------- -------

Total real memory 6422 528 0 0 0 25088.000 0.000 0.000 0.000

Free (on AIX free list) 39 548 0 0 0 154.484 0.000 0.000 0.000

Used in AIX kernel but not in any segment 212 304 212304 0 212304 829.312 829.312 0.000 829.312

Used for clnt (JFS2 & NFS) file cache 178 644 0 0 0 697.828 0.000 0.000 0.000

Used for pers (JFS) file cache 5 0 0 0 0.020 0.000 0.000 0.000

Total memory used 6382 980 1172180 1348900 6759780 24933.516 4578.828 5269.141 26405.391

------------------------------------------- -------- -- ---------- ---------- ---------- ---------- --------- - ---------- ---------- -------- -------

Information reported in vmo -a and vmstat -v output (for comparison to summary above):

Value Value

(4K pages) (MBs)

--------------------------------------------------- ----- ---------- ----------

Number of memory pages 6422528 25088.000

Number of lruable pages 6194912 24198.875

minfree (2048) * number of memory pools (6) 12288 48.000

number of free pages 38987 152.293

maxfree (3072) * number of memory pools (6) 18432 72.000

minperm% (3.0%) of number of lruable pages (6194912 ) 185847 725.965

numperm% (2.2%) of number of lruable pages (6194912 ) 136288 532.375

maxperm% (90.0%) of number of lruable pages (619491 2) 5575421 21778.988

numclient% (2.2%) of number of lruable pages (61949 12) 136288 532.375

maxclient% (90.0%) of number of lruable pages (6194 912) 5575421 21778.988

Page 68: Performance Monitoring 1 AIX Perf Updates

pmrmemuse – Detailed Summary by User

                                            Inuse       Pin         Pgsp        Virtual     Inuse      Pin       Pgsp      Virtual   Segment  Process
                                            (4K pages)  (4K pages)  (4K pages)  (4K pages)  (MBs)      (MBs)     (MBs)     (MBs)     count    count
------------------------------------------- ----------  ----------  ----------  ----------  ---------- --------- --------- --------- -------- -------

Used by user wmethods in unshared segments 3945 684 404 742594 4251313 15412.828 1. 578 2900.758 16606.691 120 12

------------------------------------------- -------- -- ---------- ---------- ---------- ---------- --------- - ---------- ---------- -------- -------

Vsid    Esid       Type Description               PSize   Inuse  Pin   Pgsp  Virtual  Comments
------  ---------  ---- ------------------------- -----  ------  ---  -----  -------  -----------------------------
9f5819  -          work mmap source               sm      65536    0      0    65536  65536 x 4 KB = 256.000 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
aeb72b  -          work mmap source               sm      65536    0      0    65536  65536 x 4 KB = 256.000 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
c45442  -          work mmap source               sm      65536    0      0    65536  65536 x 4 KB = 256.000 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
e6afe2  -          work mmap source               sm      65172    0   1959    65536  65172 x 4 KB = 254.578 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
91d991  -          work mmap source               sm      65078    0   4996    65536  65078 x 4 KB = 254.211 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
e75a66  -          work mmap source               sm      65026    0   2435    65536  65026 x 4 KB = 254.008 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
b414b1  -          work mmap source               sm      65021    0   2355    65536  65021 x 4 KB = 253.988 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
c325c7  -          work mmap source               sm      64996    0   2479    65536  64996 x 4 KB = 253.891 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
f1d171  -          work mmap source               sm      64948    0   5770    65536  64948 x 4 KB = 253.703 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
91af93  -          work mmap source               sm      64938    0   2586    65536  64938 x 4 KB = 253.664 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
a626a2  11         work text data BSS heap        sm      48275    0  18642    62762  48275 x 4 KB = 188.574 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
e769e1  -          work mmap source               sm      48028    0  26780    65536  48028 x 4 KB = 187.609 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
a4eca1  13         work text data BSS heap        sm       6517    0    768     6803  6517 x 4 KB = 25.457 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
f0a4f4  9001000a   work shared library data       sm        333    0     70      449  333 x 4 KB = 1.301 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
b73c31  f00000002  work process private           m            5    3      0        5  5 x 64 KB = 0.312 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
ba5e3c  80020014   work USLA heap                 sm         46    0      1       47  46 x 4 KB = 0.180 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
b62632  ffffffff   work application stack         sm          3    0     12       15  3 x 4 KB = 0.012 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
dc2f59  -          work mmap source               sm          0    0      1        1  0 x 4 KB = 0.000 MB, pid=4784560: Inuse=6110.3 MB, pscmd=java
b32437  -          work mmap source               sm      65536    0      0    65536  65536 x 4 KB = 256.000 MB, pid=7798878: Inuse=5361.5 MB, pscmd=java
950f90  -          work mmap source               sm      65536    0    450    65536  65536 x 4 KB = 256.000 MB, pid=7798878: Inuse=5361.5 MB, pscmd=java
a9f3ac  -          work mmap source               sm      65536    0      0    65536  65536 x 4 KB = 256.000 MB, pid=7798878: Inuse=5361.5 MB, pscmd=java
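The Comments column is simply the Inuse count multiplied by the reporting page size: rows with PSize sm are annotated "x 4 KB", while the PSize m row uses "x 64 KB" (5 x 64 KB = 0.312 MB). A tiny Python sketch of that conversion follows; the PAGE_KB mapping and helper name are assumptions inferred from the annotations above, not svmon internals.

    # Reproduces the "Inuse x page-size = MB" figures in the Comments column.
    # The reporting unit per PSize is inferred from the output above
    # ("sm" rows show "x 4 KB", the "m" row shows "x 64 KB") - an
    # illustrative assumption, not svmon code.
    PAGE_KB = {"sm": 4, "m": 64}

    def inuse_mb(inuse_pages: int, psize: str) -> float:
        return inuse_pages * PAGE_KB[psize] / 1024

    print(f"{inuse_mb(65536, 'sm'):.3f} MB")  # 256.000 MB (mmap source segment)
    print(f"{inuse_mb(5, 'm'):.3f} MB")       # 0.312 MB   (process private, 64 KB pages)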

Page 69: Performance Monotoring 1 AIX Perf Updates


Stay Connected & Continue Skills Transfer via IBM Training

• Training paths – What to take, when to take it

• Social media – Join the conversation

• Custom catalog – Create a catalog that meets your interest areas

• RSS feeds – Up-to-date information on the training you need

• IBM Training News – Targeted to your needs

• New to Instructor Led Online (ILO)? Take a free test drive!

• Education Packs – Online discount program for ALL IBM Training courses for your company

Questions? Email Lisa Ryan ([email protected])

Page 70: Performance Monotoring 1 AIX Perf Updates


This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area.

Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 USA.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.

All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.

IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.

IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.

All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.

Revised September 26, 2006

Special notices

Page 71: Performance Monotoring 1 AIX Perf Updates


IBM, the IBM logo, ibm.com AIX, AIX (logo), AIX 6 (logo), AS/400, Active Memory, BladeCenter, Blue Gene, CacheFlow, ClusterProven, DB2, ESCON, i5/OS, i5/OS (logo), IBM Business Partner (logo), IntelliStation, LoadLeveler, Lotus, Lotus Notes, Notes, Operating System/400, OS/400, PartnerLink, PartnerWorld, PowerPC, pSeries, Rational, RISC System/6000, RS/6000, THINK, Tivoli, Tivoli (logo), Tivoli Management Environment, WebSphere, xSeries, z/OS, zSeries, AIX 5L, Chiphopper, Chipkill, Cloudscape, DB2 Universal Database, DS4000, DS6000, DS8000, EnergyScale, Enterprise Workload Manager, General Purpose File System, , GPFS, HACMP, HACMP/6000, HASM, IBM Systems Director Active Energy Manager, iSeries, Micro-Partitioning, POWER, PowerExecutive, PowerVM, PowerVM (logo), PowerHA, Power Architecture, Power Everywhere, Power Family, POWER Hypervisor, Power Systems, Power Systems (logo), Power Systems Software, Power Systems Software (logo), POWER2, POWER3, POWER4, POWER4+, POWER5, POWER5+, POWER6, POWER7, pureScale, System i, System p, System p5, System Storage, System z, Tivoli Enterprise, TME 10, TurboCore, Workload Partitions Manager and X-Architecture are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (®or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml

The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.
UNIX is a registered trademark of The Open Group in the United States, other countries or both.
Linux is a registered trademark of Linus Torvalds in the United States, other countries or both.
Microsoft, Windows and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries or both.
Intel, Itanium, Pentium are registered trademarks and Xeon is a trademark of Intel Corporation or its subsidiaries in the United States, other countries or both.
AMD Opteron is a trademark of Advanced Micro Devices, Inc.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both.
TPC-C and TPC-H are trademarks of the Transaction Performance Processing Council (TPPC).
SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC).
NetBench is a registered trademark of Ziff Davis Media in the United States, other countries or both.
AltiVec is a trademark of Freescale Semiconductor, Inc.
Cell Broadband Engine is a trademark of Sony Computer Entertainment Inc.
InfiniBand, InfiniBand Trade Association and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade Association.
Other company, product and service names may be trademarks or service marks of others.

Revised February 9, 2010

Special notices (cont.)

Page 72: Performance Monotoring 1 AIX Perf Updates


The IBM benchmarks results shown herein were derived using particular, well configured, development-level and generally-available computer systems. Buyers should consult other sources of information to evaluate the performance of systems they are considering buying and should consider conducting application oriented testing. For additional information about the benchmarks, values and systems tested, contact your local IBM office or IBM authorized reseller or access the Web site of the benchmark consortium or benchmark vendor.

IBM benchmark results can be found in the IBM Power Systems Performance Report at http://www.ibm.com/systems/p/hardware/system_perf.html .

All performance measurements were made with AIX or AIX 5L operating systems unless otherwise indicated to have used Linux. For new and upgraded systems, AIX Version 4.3, AIX 5L or AIX 6 were used. All other systems used previous versions of AIX. The SPEC CPU2006, SPEC2000, LINPACK, and Technical Computing benchmarks were compiled using IBM's high performance C, C++, and FORTRAN compilers for AIX 5L and Linux. For new and upgraded systems, the latest versions of these compilers were used: XL C Enterprise Edition V7.0 for AIX, XL C/C++ Enterprise Edition V7.0 for AIX, XL FORTRAN Enterprise Edition V9.1 for AIX, XL C/C++ Advanced Edition V7.0 for Linux, and XL FORTRAN Advanced Edition V9.1 for Linux. The SPEC CPU95 (retired in 2000) tests used preprocessors, KAP 3.2 for FORTRAN and KAP/C 1.4.2 from Kuck & Associates and VAST-2 v4.01X8 from Pacific-Sierra Research. The preprocessors were purchased separately from these vendors. Other software packages like IBM ESSL for AIX, MASS for AIX and Kazushige Goto’s BLAS Library for Linux were also used in some benchmarks.

For a definition/explanation of each benchmark and the full list of detailed results, visit the Web site of the benchmark consortium or benchmark vendor.

TPC                          http://www.tpc.org
SPEC                         http://www.spec.org
LINPACK                      http://www.netlib.org/benchmark/performance.pdf
Pro/E                        http://www.proe.com
GPC                          http://www.spec.org/gpc
VolanoMark                   http://www.volano.com
STREAM                       http://www.cs.virginia.edu/stream/
SAP                          http://www.sap.com/benchmark/
Oracle Applications          http://www.oracle.com/apps_benchmark/
PeopleSoft                   To get information on PeopleSoft benchmarks, contact PeopleSoft directly
Siebel                       http://www.siebel.com/crm/performance_benchmark/index.shtm
Baan                         http://www.ssaglobal.com
Fluent                       http://www.fluent.com/software/fluent/index.htm
TOP500 Supercomputers        http://www.top500.org/
Ideas International          http://www.ideasinternational.com/benchmark/bench.html
Storage Performance Council  http://www.storageperformance.org/results

Revised March 12, 2009

Notes on benchmarks and values

Page 73: Performance Monotoring 1 AIX Perf Updates


Revised March 12, 2009

Notes on HPC benchmarks and values

The IBM benchmarks results shown herein were derived using particular, well configured, development-level and generally-available computer systems. Buyers should consult other sources of information to evaluate the performance of systems they are considering buying and should consider conducting application oriented testing. For additional information about the benchmarks, values and systems tested, contact your local IBM office or IBM authorized reseller or access the Web site of the benchmark consortium or benchmark vendor.

IBM benchmark results can be found in the IBM Power Systems Performance Report at http://www.ibm.com/systems/p/hardware/system_perf.html .

All performance measurements were made with AIX or AIX 5L operating systems unless otherwise indicated to have used Linux. For new and upgraded systems, AIX Version 4.3 or AIX 5L were used. All other systems used previous versions of AIX. The SPEC CPU2000, LINPACK, and Technical Computing benchmarks were compiled using IBM's high performance C, C++, and FORTRAN compilers for AIX 5L and Linux. For new and upgraded systems, the latest versions of these compilers were used: XL C Enterprise Edition V7.0 for AIX, XL C/C++ Enterprise Edition V7.0 for AIX, XL FORTRAN Enterprise Edition V9.1 for AIX, XL C/C++ Advanced Edition V7.0 for Linux, and XL FORTRAN Advanced Edition V9.1 for Linux. The SPEC CPU95 (retired in 2000) tests used preprocessors, KAP 3.2 for FORTRAN and KAP/C 1.4.2 from Kuck & Associates and VAST-2 v4.01X8 from Pacific-Sierra Research. The preprocessors were purchased separately from these vendors. Other software packages like IBM ESSL for AIX, MASS for AIX and Kazushige Goto’s BLAS Library for Linux were also used in some benchmarks.

For a definition/explanation of each benchmark and the full list of detailed results, visit the Web site of the benchmark consortium or benchmark vendor.

SPEC                   http://www.spec.org
LINPACK                http://www.netlib.org/benchmark/performance.pdf
Pro/E                  http://www.proe.com
GPC                    http://www.spec.org/gpc
STREAM                 http://www.cs.virginia.edu/stream/
Fluent                 http://www.fluent.com/software/fluent/index.htm
TOP500 Supercomputers  http://www.top500.org/
AMBER                  http://amber.scripps.edu/
FLUENT                 http://www.fluent.com/software/fluent/fl5bench/index.htm
GAMESS                 http://www.msg.chem.iastate.edu/gamess
GAUSSIAN               http://www.gaussian.com
ANSYS                  http://www.ansys.com/services/hardware-support-db.htm

Click on the "Benchmarks" icon on the left hand side frame to expand. Click on "Benchmark Results in a Table" icon for benchmark results.ABAQUS http://www.simulia.com/support/v68/v68_performance.phpECLIPSE http://www.sis.slb.com/content/software/simulation/index.asp?seg=geoquest&MM5 http://www.mmm.ucar.edu/mm5/MSC.NASTRAN http://www.mscsoftware.com/support/prod%5Fsupport/nastran/performance/v04_sngl.cfmSTAR-CD www.cd-adapco.com/products/STAR-CD/performance/320/index/htmlNAMD http://www.ks.uiuc.edu/Research/namdHMMER http://hmmer.janelia.org/

http://powerdev.osuosl.org/project/hmmerAltivecGen2mod

Page 74: Performance Monotoring 1 AIX Perf Updates


Revised April 2, 2007

Notes on performance estimates

rPerf for AIX

rPerf (Relative Performance) is an estimate of commercial processing performance relative to other IBM UNIX systems. It is derived from an IBM analytical model which uses characteristics from IBM internal workloads, TPC and SPEC benchmarks. The rPerf model is not intended to represent any specific public benchmark results and should not be reasonably used in that way. The model simulates some of the system operations such as CPU, cache and memory. However, the model does not simulate disk or network I/O operations.

• rPerf estimates are calculated based on systems with the latest levels of AIX and other pertinent software at the time of system announcement. Actual performance will vary based on application and configuration specifics. The IBM eServer pSeries 640 is the baseline reference system and has a value of 1.0. Although rPerf may be used to approximate relative IBM UNIX commercial processing performance, actual system performance may vary and is dependent upon many factors including system hardware configuration and software design and configuration. Note that the rPerf methodology used for the POWER6 systems is identical to that used for the POWER5 systems. Variations in incremental system performance may be observed in commercial workloads due to changes in the underlying system architecture.

All performance estimates are provided "AS IS" and no warranties or guarantees are expressed or implied by IBM. Buyers should consult other sources of information, including system benchmarks, and application sizing guides to evaluate the performance of a system they are considering buying. For additional information about rPerf, contact your local IBM office or IBM authorized reseller.

========================================================================

CPW for IBM i

Commercial Processing Workload (CPW) is a relative measure of performance of processors running the IBM i operating system. Performance in customer environments may vary. The value is based on maximum configurations. More performance information is available in the Performance Capabilities Reference at: www.ibm.com/systems/i/solutions/perfmgmt/resource.html