TG11 - SGI Altix UV Tutorial NCSA - PSC - RDAV
Altix UV HW/SW
• SGI Altix UV uses an array of advanced hardware and software features to offload the following from the CPUs:
thread synchronization
data sharing
message-passing overhead
• The system has a rich set of hardware features that allow scalable programming models to be implemented with high efficiency and performance.
SGI MPI
• The SGI MPI software stack includes a number of software components.
SGI MPI Software Stack
• MPI
• XPMEM (cross-process memory mapping)
• GRU development kit
• NUMA tools
• Perfboost
• Perfcatcher
• MPInside
UV HUB
The UV_HUB is a custom ASIC developed by SGI. It implements the NUMAlink5 protocol, memory operations, and the associated atomic operations. It provides the following capabilities:
Cache-coherent global shared memory.
Offloading time-sensitive and data-intensive operations from processors to increase processing efficiency and scaling.
Scalable, reliable, fair interconnect with other blades via NUMAlink5.
Altix UV Blade and HUB
Source: SGI Altix UV 1000 System User’s Guide
UV HUB in detail
• SI (Socket Interface): provides the bridge between the Hub’s LH and RH chip sets and the Intel sockets.
• To communicate with the Intel sockets, the SI implements an Intel proprietary interconnect called CSI (Common System Interface, later marketed as QuickPath Interconnect, QPI).
Source: SGI Altix UV admin manual
UV HUB in detail
• LH (Local Home): manages the directory operations associated with remote memory requests. The LH has a single external memory channel.
• RH (Remote Home):
processes coherent and non-coherent CSI transactions initiated by a local socket to a remote system address.
processes NUMAlink intervention and invalidate requests when remote memory is cached locally by a socket.
• LB (Local Block):
lets system software select, configure, and control the various functions of the UV Hub chip.
provides facilities to monitor, diagnose, and debug hardware states and operations on live systems.
UV HUB Units
• The NUMAlink interconnect
• The Global Reference Unit (GRU)
• The processor interconnect
NUMAlink Interconnect
• Shared-memory, globally addressable system interconnect.
• All physically distributed system memory is mapped into one global address space.
• Peak aggregate bidirectional bandwidth: 15 GB/s.
• 2-3x improvement in MPI latency.
• Special support for block transfers and global operations.
• NUMAlink connects directly into the memory infrastructure of the system, rather than indirectly through an I/O subsystem chip.
Fetch-Op in HUB
• Fetch-op variables on the Hub provide fast synchronization.
• The Fetch-Op AMO (atomic memory operation) helped reduce MPI send/recv latency from 12 to 8 microseconds.
• Used by MPI_Barrier, MPI_Win_fence, and shmem_barrier_all.
[Figure: CPUs connected through the HUB and a router, with the fetch-op variable held in the HUB]
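Barriers built on a fetch-op variable all follow one pattern: every participant atomically increments a shared counter, and the last arrival releases everyone. The sketch below models only that logic. `FetchOpVariable` and `CounterBarrier` are hypothetical names, and a Python lock stands in for the atomicity the hub hardware provides. It implements a centralized sense-reversing barrier on top of fetch-and-add.

```python
import threading
import time


class FetchOpVariable:
    """Hypothetical stand-in for a hub fetch-op variable.

    On Altix UV the HUB performs the atomic update; here a lock
    emulates that atomicity so the logic can run anywhere.
    """

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        # Atomically add delta and return the value seen before the add.
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def load(self):
        with self._lock:
            return self._value

    def store(self, value):
        with self._lock:
            self._value = value


class CounterBarrier:
    """Centralized sense-reversing barrier for n participants."""

    def __init__(self, n):
        self.n = n
        self.count = FetchOpVariable(0)
        self.sense = FetchOpVariable(0)

    def wait(self, local_sense):
        # Each participant flips its private sense for this episode.
        local_sense = 1 - local_sense
        if self.count.fetch_add(1) == self.n - 1:
            # Last arrival: reset the counter first, then release everyone.
            self.count.store(0)
            self.sense.store(local_sense)
        else:
            # Everyone else spins until the global sense matches.
            while self.sense.load() != local_sense:
                time.sleep(0)  # yield the interpreter instead of hogging it
        return local_sense


def demo(n_threads=4, phases=3):
    """Run n_threads through several barrier episodes; return any failures."""
    barrier = CounterBarrier(n_threads)
    arrivals = [FetchOpVariable(0) for _ in range(phases)]
    errors = []

    def worker():
        sense = 0
        for p in range(phases):
            arrivals[p].fetch_add(1)
            sense = barrier.wait(sense)
            # Past the barrier, every thread must have arrived at phase p.
            if arrivals[p].load() != n_threads:
                errors.append(p)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    print("barrier errors:", demo())
```

Resetting the counter before flipping the sense is the key ordering here. A thread released early can then enter the next barrier without seeing a stale count.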
GRU
• Hardware in the Hub for memory-to-memory block transfers and CPU synchronization events.
• It is used by MPI, SHMEM, and UPC.
• External TLB with large page support
• Page initialization
• Scatter/Gather operations
• Update cache for AMOs
GRU API Components
• GRU resource allocators
• GRU memory access functions
• XPMEM address mapping functions
• MPT address mapping functions
MOE
• MOE (MPI Offload Engine) is a set of functionality that offloads MPI communication workload from the CPUs to the Altix UV_HUB ASIC, accelerating common MPI tasks such as barriers and reductions across GSM (global shared memory).
• Similar in concept to a TCP/IP offload engine (TOE), which offloads TCP/IP protocol processing from the system CPUs.
• Frees the CPUs from MPI activity.
• Faster reduction operations.
• Faster barriers and random access.
MPI and MOE
• Accessing the MOE.
• MOE implements atomic memory operations in conjunction with a hardware multicast facility that helps accelerate MPI_Barrier, MPI_Bcast, and MPI_Allreduce.
• Accelerates MPI point-to-point and collective communication.
MOE Advantages
• MOE provides:
MPI message queues
Synchronization primitives
Advanced RDMA capabilities, such as strided and indexed global memory updates
Hardware multicast
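Strided and indexed updates are easiest to understand through their semantics. This plain-Python model is only illustrative. The function names are hypothetical, and a local list stands in for remote global memory. On UV hardware the MOE would carry out the equivalent transfer without a CPU loop.

```python
def strided_put(dest, start, stride, values):
    """Strided update: write values[i] to dest[start + i * stride]."""
    for i, value in enumerate(values):
        dest[start + i * stride] = value


def indexed_put(dest, indices, values):
    """Indexed update (scatter): write values[i] to dest[indices[i]]."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    for index, value in zip(indices, values):
        dest[index] = value


def indexed_get(src, indices):
    """Indexed read (gather): return [src[indices[0]], src[indices[1]], ...]."""
    return [src[index] for index in indices]


if __name__ == "__main__":
    remote = [0] * 10                      # stands in for a remote memory segment
    strided_put(remote, 1, 3, [7, 8, 9])   # touches elements 1, 4, 7
    indexed_put(remote, [0, 9], [5, 6])    # touches elements 0, 9
    print(remote)                          # [5, 7, 0, 0, 8, 0, 0, 9, 0, 6]
    print(indexed_get(remote, [9, 4, 0]))  # [6, 8, 5]
```

In hardware, the benefit is that a single request carries the whole pattern instead of one small remote store per element.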
Determining System Configuration
topology
• topology: displays general information about an SGI Altix system, with a focus on node information.
• Its output includes node counts for blades, node IDs, NASIDs, memory per node, the UV hub, and the partition number.
cpumap
• cpumap: displays the logical CPUs and shows the relationships between them.
• Aspects displayed include hyperthreading, last-level cache sharing, and topology placement.
• It gets its information from /proc/cpuinfo, /sys/devices/system, and /proc/sgi_uv/topology.
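You can approximate cpumap's hyperthreading view from /proc/cpuinfo alone. Logical CPUs that report the same `physical id` and `core id` are hyperthreads of one core. This is a minimal sketch, not how cpumap itself is implemented. It assumes an x86 Linux kernel that exposes those two fields, and the helper names are ours.

```python
from collections import defaultdict


def parse_cpuinfo(text):
    """Return one {field: value} dict per logical CPU in /proc/cpuinfo text."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends one processor's block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def hyperthread_siblings(text):
    """Map (physical id, core id) -> logical CPU numbers sharing that core."""
    groups = defaultdict(list)
    for cpu in parse_cpuinfo(text):
        if "physical id" not in cpu or "core id" not in cpu:
            continue  # some VMs and non-x86 kernels omit these fields
        key = (int(cpu["physical id"]), int(cpu["core id"]))
        groups[key].append(int(cpu["processor"]))
    return dict(groups)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        siblings = hyperthread_siblings(f.read())
    for (socket, core), logical in sorted(siblings.items()):
        print(f"socket {socket} core {core}: logical CPUs {logical}")
```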
cpumap
nodeinfo
nodeinfo
• Hit: the page was allocated on the preferred node (this node).
• Miss: the page was allocated on this node even though the process preferred another node, typically because that node was full.
• Foreign: the page was intended for this node but was allocated on another node, typically because this node was full.
• Interleave: the page was allocated on this node under the interleave policy (numactl -i).
• Local: the page was allocated on this node by a process running on this node.
• Remote: the page was allocated on this node by a process running on another node.
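The Linux kernel exports the same counters for each node in /sys/devices/system/node/nodeN/numastat, as numa_hit, numa_miss, numa_foreign, interleave_hit, local_node and other_node. The sketch below reads them directly. The mapping from kernel counter names to the slide's labels is our own assumption about how nodeinfo presents them, and the helper names are hypothetical.

```python
from pathlib import Path

# Kernel counter name -> label used on the nodeinfo slide (our mapping).
FIELDS = {
    "numa_hit": "Hit",
    "numa_miss": "Miss",
    "numa_foreign": "Foreign",
    "interleave_hit": "Interleave",
    "local_node": "Local",
    "other_node": "Remote",
}


def parse_numastat(text):
    """Turn 'name value' lines into {slide label: page count}."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in FIELDS:
            counters[FIELDS[parts[0]]] = int(parts[1])
    return counters


def all_nodes(root="/sys/devices/system/node"):
    """Read the counters of every nodeN directory under root."""
    result = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        stat_file = node_dir / "numastat"
        if stat_file.exists():
            node_id = int(node_dir.name[len("node"):])
            result[node_id] = parse_numastat(stat_file.read_text())
    return result


if __name__ == "__main__":
    for node, counters in sorted(all_nodes().items()):
        print(node, counters)
```

A steadily growing Miss/Foreign or Remote count on a node is usually the first hint that placement needs attention.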
x86info
pmchart
HW Summary
• /proc/cpuinfo
• /proc/meminfo
• /sys/devices/system/node
• /dev/cpuset/torque/job#
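The /sys/devices/system/node entry is enough to build a small per-node summary of CPUs and memory, similar in spirit to what topology reports. This is a minimal sketch that assumes a Linux kernel exposing nodeN/cpulist and nodeN/meminfo. The helper names are hypothetical.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist such as '0-3,8,10-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memoryless or CPU-less nodes have an empty list
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_summary(root="/sys/devices/system/node"):
    """Return {node id: (cpu list, MemTotal in kB)} for each NUMA node."""
    summary = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        cpus = parse_cpulist((node_dir / "cpulist").read_text())
        mem_kb = None
        for line in (node_dir / "meminfo").read_text().splitlines():
            # Lines look like: "Node 0 MemTotal:       32831820 kB"
            fields = line.split()
            if len(fields) >= 4 and fields[2] == "MemTotal:":
                mem_kb = int(fields[3])
        summary[node_id] = (cpus, mem_kb)
    return summary


if __name__ == "__main__":
    for node, (cpus, mem_kb) in sorted(node_summary().items()):
        print(f"node {node}: cpus {cpus}, MemTotal {mem_kb} kB")
```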
Data Placement Tool
CPU Scheduling
• In a single-processor system, only one process can run at a time.
• CPU scheduling controls how the OS switches the CPU between processes.
• The kernel provides a mechanism for this called time slicing.
• A time slice is the maximum length of time that a process can hold its CPU and execute under its current scheduling policy.
• Each CPU has its own run queue.
Cache Affinity
• Affinity scheduling is a special scheduling discipline used in multiprocessor systems.
• As a process executes, more and more of its data and instruction text is loaded into the processor cache. This creates an “affinity” between the process and that CPU: rescheduling the process on the same CPU lets it reuse the already warm cache.
Data Placement Tool
• NUMA machines have a shared address space: there is a single shared memory space and a single operating system instance.
• Accessing remote memory carries a performance penalty compared with accessing local memory.
• Memory access time varies with the physical address range and with the processing element making the access. NUMAlink is used to access memory across blades/nodes.
• Memory latency is lowest when a processor accesses its local memory.
• The NUMA tools also help you run multiple instances of a serial program in a single job script with better process placement.
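The serial-program case in the last bullet can be sketched with sched_setaffinity, the same Linux system call taskset uses. This is a minimal, Linux-only sketch under stated assumptions. `launch_pinned` is a hypothetical helper. It pins one copy per CPU, round-robin across the CPUs the launching process may use, and it sets no memory policy, so pages follow the kernel's default local (first-touch) placement.

```python
import os
import subprocess
import sys


def launch_pinned(command, copies):
    """Start `copies` instances of `command`, each pinned to its own CPU.

    CPUs are taken in ascending order from this process's allowed set
    (its cpuset/affinity mask), wrapping round-robin if there are fewer
    CPUs than copies. Returns a list of (Popen, captured stdout) pairs.
    """
    allowed = sorted(os.sched_getaffinity(0))
    procs = []
    for i in range(copies):
        cpu = allowed[i % len(allowed)]
        procs.append(
            subprocess.Popen(
                command,
                # Runs in the child between fork and exec, so the new
                # program starts already bound to `cpu`.
                preexec_fn=lambda cpu=cpu: os.sched_setaffinity(0, {cpu}),
                stdout=subprocess.PIPE,
                text=True,
            )
        )
    return [(p, p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    # Each child reports the CPUs it is allowed to run on.
    probe = [sys.executable, "-c",
             "import os; print(sorted(os.sched_getaffinity(0)))"]
    for proc, out in launch_pinned(probe, copies=2):
        print(proc.pid, out.strip())
```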
NUMA API
• The API is called from libcpuset.
cpuset: create, modify, and destroy cpusets.
taskset: run a process on specific physical CPUs.
numactl: control the NUMA policy for processes or shared memory.
dplace: bind processes to specific logical CPUs.
omplace: control the placement of MPI processes and OpenMP threads.
Batch systems: LSF, PBS Pro, Torque, SGE
dlook, dlook-summary, pidstat, cpuset-Q
cpuset
• cpusets work together with sched_setaffinity (CPU binding) and mbind/set_mempolicy (memory binding): those calls can only select CPUs and memory nodes allowed by the task's cpuset.
• Each task has a link to a cpuset structure that specifies the CPUs and memory nodes available for its use.
• All tasks sharing the same placement constraints reference the same cpuset.
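On Linux, a task's cpuset limits are visible in /proc/<pid>/status. The Cpus_allowed_list and Mems_allowed_list fields give the CPUs and memory nodes the task may use. Here is a minimal, Linux-only sketch that inspects the current process. The helper names are ours.

```python
import os


def parse_cpulist(text):
    """Expand a kernel list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    numbers = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            numbers.extend(range(int(lo), int(hi) + 1))
        elif part:
            numbers.append(int(part))
    return numbers


def cpuset_limits(status_text):
    """Extract allowed CPUs and memory nodes from /proc/<pid>/status text."""
    limits = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("Cpus_allowed_list", "Mems_allowed_list"):
            limits[key] = parse_cpulist(value)
    return limits


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        limits = cpuset_limits(f.read())
    print("CPUs allowed:        ", limits["Cpus_allowed_list"])
    print("Memory nodes allowed:", limits["Mems_allowed_list"])
    # The scheduler's view of this thread's CPU set (normally the same list).
    print("sched_getaffinity:   ", sorted(os.sched_getaffinity(0)))
```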
Why Use a cpuset?
• Restrict consumption of designated resources (CPUs) to specified processes/threads.
• Limit run-time variability.
• Memory affinity.
• I/O isolation.
How Are cpusets Used?
• Static cpusets (batch calls shared by queue)
Cpusets are defined by the administrator after system startup.
Users attach processes to the existing cpusets.
Cpusets continue to exist after jobs finish executing.
• Dynamic cpusets
The workload management system (WMS) creates a cpuset when a job requires one.
The WMS attaches the job to the newly created cpuset.
The WMS destroys the cpuset at the end of the job.
cpuset Command Line Options
• cpuset
-c cpuset_name Create CPUSET
-m cpuset_name Modify CPUSET
-x cpuset_name Destroy CPUSET
-d cpuset_name Dump CPUSET attributes
-i cpuset_name -I script Run command
-p cpuset_name List all procs in CPUSET
-a cpuset_name Attach pids to CPUSET
-w pid List CPUSET the PID is attached to
-f filename input config file
Advantages of cpusets
• They improve cache locality and memory access time.
• They make it easier to give each thread in a job equal resources.
This results in performance that is both optimal and repeatable.
taskset
• taskset: restricts execution to the listed set of CPUs. However, processes are still free to move among the listed CPUs.
• It is used to set or retrieve the CPU affinity of a running process, given its PID, or to launch a new command with a given CPU affinity.
• The CPU affinity is represented as a hexadecimal bitmask, with the lowest-order bit corresponding to the first logical CPU:
0x00000001 is processor #0.
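Converting between a CPU list and the hexadecimal mask is plain bit arithmetic. CPU n corresponds to bit n, so the mask is the sum of 2**n over the listed CPUs. This small sketch (the helper names are ours) reproduces the mask values used in the taskset examples.

```python
def mask_to_cpus(mask):
    """List the CPU numbers whose bits are set (bit 0 = CPU 0)."""
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


def cpus_to_mask(cpus):
    """Build a taskset-style affinity mask from CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    print(hex(cpus_to_mask([0])))           # 0x1
    print(mask_to_cpus(0x131))              # [0, 4, 5, 8]
    print(mask_to_cpus(0xA8))               # [3, 5, 7]
    print(hex(cpus_to_mask([0, 4, 5, 8])))  # 0x131
```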
taskset
• taskset does not pin a task to a specific CPU. It only restricts the task so that it does not run on any CPU that is not in the cpulist.
• If you are running an MPI application, do not use taskset; use dplace instead:
mpirun -np 8 dplace -s1 -c 10,11,16-21 ./a.out
or set the MPT placement list through the environment:
export MPI_DSM_CPULIST=10,11,16-21
mpirun -np 8 ./a.out
taskset examples
• taskset 0x1 ./a.out # executes on physical CPU 0
• taskset 0x00131 ./a.out # executes on physical CPUs 0, 4, 5, and 8
• taskset -p 0xa8 14386 # restricts running PID 14386 to physical CPUs 3, 5, and 7
• taskset -c 5 ./a.out # executes a.out on physical CPU 5
• taskset -p 14386 # returns the affinity mask of PID 14386
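A program can make the same sched_getaffinity/sched_setaffinity system calls that taskset makes. This minimal, Linux-only sketch uses Python's os module and mirrors the `-p` forms. The `restrict_to` helper is hypothetical.

```python
import os


def restrict_to(pid, cpus):
    """Like `taskset -p -c <cpus> <pid>`: set the affinity, return the old one.

    pid 0 means the calling thread. As with taskset without -a, only the
    one task (thread) with that ID is affected. Linux only.
    """
    old = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, cpus)
    return old


if __name__ == "__main__":
    first_cpu = min(os.sched_getaffinity(0))
    previous = restrict_to(0, {first_cpu})
    print("now allowed on:", sorted(os.sched_getaffinity(0)))
    # Restore the original mask. Like taskset, this only changes where
    # the task may run; it does not move the task's memory.
    os.sched_setaffinity(0, previous)
    print("restored to:   ", sorted(os.sched_getaffinity(0)))
```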
numactl
• Runs processes with a specific NUMA scheduling or memory placement policy.
• Controls memory placement:
Interleave (round robin across nodes)
Membind (allocate only from the specified node pool)
Preferred node
Local allocation (first touch)
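A toy model makes two of these policies easy to contrast. This is a deliberately simplified sketch, not the kernel's allocator. The function names are hypothetical, pages and nodes are reduced to integers, and capacity limits and fallback are ignored.

```python
def interleave(num_pages, nodes):
    """Interleave policy: page i goes to nodes[i % len(nodes)] (round robin)."""
    return [nodes[i % len(nodes)] for i in range(num_pages)]


def first_touch(touches):
    """Local (first-touch) policy: a page lands on the node of the CPU
    that touches it first; later touches from other nodes do not move it.

    touches: iterable of (page, node_of_touching_cpu) in time order.
    """
    placement = {}
    for page, node in touches:
        placement.setdefault(page, node)
    return placement


if __name__ == "__main__":
    print(interleave(6, [0, 1, 2]))     # [0, 1, 2, 0, 1, 2]
    # A serial initialization loop on node 0 touches every page first,
    # so all pages stay on node 0 even though node 1 later uses half.
    init = [(page, 0) for page in range(4)]
    compute = [(2, 1), (3, 1)]
    print(first_touch(init + compute))  # {0: 0, 1: 0, 2: 0, 3: 0}
```

The second case shows why, under first touch, data should be initialized by the same threads (with the same placement) that later compute on it.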
numactl Command Line Options
• numactl
--interleave=nodes Set a memory interleave policy.
--membind=nodes Only allocate memory from nodes.
--cpunodebind=nodes Only execute the command on the CPUs of nodes.
--physcpubind=cpus Only execute the process on cpus.
numactl examples
• numactl --physcpubind=+0-4,8-12 myapplic arguments # Run myapplic on CPUs 0-4 and 8-12 of the current cpuset.
• numactl --interleave=all bigdatabase arguments # Run bigdatabase with its memory interleaved across all nodes.
• numactl --cpunodebind=0 --membind=0,1 process # Run process on node 0 with memory allocated on nodes 0 and 1.
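The "+" in the first example makes the CPU numbers relative to the current cpuset instead of system-wide. This sketch shows that mapping as we read the numactl manual: relative number i selects the i-th CPU, in ascending order, of the set the process may use. The helpers are hypothetical and are not numactl's code.

```python
import os


def expand(spec):
    """Expand a list such as '0-4,8-12' into [0, 1, 2, 3, 4, 8, ...]."""
    numbers = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            numbers.extend(range(int(lo), int(hi) + 1))
        elif part:
            numbers.append(int(part))
    return numbers


def resolve(spec, allowed):
    """Translate a CPU list into system-wide CPU numbers.

    With a leading '+', each number is an index into the ascending list
    of CPUs the process may use (its cpuset); an index past the end
    raises IndexError. Without '+', numbers are already system-wide.
    """
    if spec.startswith("+"):
        ordered = sorted(allowed)
        return [ordered[i] for i in expand(spec[1:])]
    return expand(spec)


if __name__ == "__main__":
    # "+0" is the lowest-numbered CPU in this process's cpuset.
    print(resolve("+0", os.sched_getaffinity(0)))
```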
numactl --hardware
dplace
• dplace ensures the Linux kernel “pins” a thread (or series of threads) to a specific CPU core within a container. Once pinned, the threads do not migrate.
• By default, it binds processes sequentially, in round-robin fashion, to the logical CPUs of the current cpuset.
• Integrates with MPT (via omplace and environment variables).
• It understands fork, exec, threads, etc.
• Helps ensure optimal performance and minimize run-time variability.
dplace Features
• The default memory allocation policy is node-local (first touch).
• dplace allows processes to be bound to specific logical CPUs (numbered within the cpuset).
• Prevents migration (thread hopping).
• May require knowledge of the application.
• Global load balancing.
dplace Command Line Options
• dplace -c CPU list
-e exact placement
-s skip the first n processes before starting placement
-n only processes with name
-x skip mask
-p placement file
-r replicate shared text to each node
-q list global count
dplace examples
• dplace -c 0-3 ./a.out # places threads on the first four CPUs, starting with CPU 0.
• dplace -c 0-7 -x2 ./a.out # places threads on the first eight CPUs, but uses the skip mask (-x2) to skip the second thread (with Intel OpenMP, this is the lightweight monitor thread).
• mpirun -np 8 dplace -s1 -c 0-7 ./a.out # skips the first process, which is essentially the MPI shepherd; dplace then places the MPI ranks on CPUs 0-7.
numactl and dplace
• Consider a code that runs with 4 threads.
• What is the difference between
numactl -C 0-3 ./a.out
dplace -c 0-3 ./a.out
• With dplace, each thread is bound to a particular CPU. With numactl, the threads are bound to the range of CPUs 0-3 and are free to migrate within that range.
• numactl also has memory-binding options.
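You can reproduce this distinction with per-thread affinity calls. In this Linux-only sketch, each thread either pins itself to one CPU round-robin, as dplace does, or keeps the inherited CPU range, as with numactl. `run_threads` is a hypothetical helper that models only the binding behavior, not how either tool is implemented.

```python
import os
import threading


def run_threads(n, pin_each):
    """Start n threads and report each thread's affinity mask.

    pin_each=True  ~ dplace: thread i binds itself to one CPU (round robin).
    pin_each=False ~ numactl: threads keep the whole inherited CPU range,
                     and the scheduler may move them within it.
    """
    allowed = sorted(os.sched_getaffinity(0))
    masks = [None] * n

    def worker(i):
        if pin_each:
            # On Linux, pid 0 means "the calling thread", so this pins
            # only this thread, not the whole process.
            os.sched_setaffinity(0, {allowed[i % len(allowed)]})
        masks[i] = sorted(os.sched_getaffinity(0))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return masks


if __name__ == "__main__":
    print("pinned:   ", run_threads(4, pin_each=True))
    print("range only:", run_threads(4, pin_each=False))
```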
omplace
• Tool for controlling the placement of MPI processes and OpenMP threads.
-c cpulist: specifies the effective CPU list.
-nt threads: specifies the number of threads per MPI process.
-s skip: the number of processes to skip before placement starts.
-vv verbose: displays the automatically generated placement file in its entirety.
omplace examples
• mpirun -np 2 omplace -nt 4 -vv ./a.out # runs 2 MPI processes with 4 threads per process and displays the generated placement file.
dlook
• Tool for showing process memory maps and CPU usage.
• View address space and page placement.
• Two forms:
dlook [options] pid
dlook [options] <command> [command-args]
• Run an MPI job using mpirun and print the memory map for each thread:
mpirun -np 8 dlook ./a.out
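On a NUMA-enabled Linux kernel, /proc/<pid>/numa_maps gives a rough approximation of dlook's page-placement view. Each mapping lists its resident pages per node as N<node>=<pages>. Here is a minimal sketch, our own helper and not part of dlook, that totals them for the current process.

```python
import re
from collections import Counter

NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def pages_per_node(numa_maps_text):
    """Sum resident pages per NUMA node over all mappings.

    Counts are in units of each mapping's page size (its
    kernelpagesize_kB field), so a huge page counts as one page.
    """
    totals = Counter()
    for line in numa_maps_text.splitlines():
        for token in line.split()[2:]:  # skip the address and policy fields
            match = NODE_FIELD.match(token)
            if match:
                totals[int(match.group(1))] += int(match.group(2))
    return dict(totals)


if __name__ == "__main__":
    try:
        with open("/proc/self/numa_maps") as f:
            print(pages_per_node(f.read()))
    except FileNotFoundError:
        print("numa_maps not available (kernel built without NUMA support)")
```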
Summary
• Use cpumap to determine partitioning and placement.
• Use taskset to restrict a process or process group to a CPU or group of CPUs.
• Use dplace to place a process group onto the system topology.
• For hybrid MPI/OpenMP runs, use omplace for pinning.
• Use numactl to control memory placement.
Tips!!!
• Use dplace, numactl, or cpusets to lock down processes, preventing thread hopping/migration.
• Strong cache affinity reduces cache misses and instruction pipeline flushes.
• Locking down processes also keeps them close to their node-local memory.
• Be aware of data placement.
Heisenberg Principle
• Looking at the system will affect the system.
• Event tracing has the highest impact: strace, gprof.
• PCP and sar have the lowest impact.
• You cannot measure a system without affecting it: top even shows up in its own display.
• PCP uses less than 1% of a CPU.
sar
• sar indicates normal or abnormal system behavior and can point to performance problems and bottlenecks.
• Many people treat sar as a set of performance metrics, but it is not; it is an indicator of what a system is doing!
• PCP and sar simply tell you what to look for.
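sar
• sar -vq to check kernel table sizes (-v) and run-queue length and load averages (-q).
• sar -W to check swapping activity.
• sar -rsW to see how much memory and swap space is left.
• sar -u reports CPU utilization, including the percentage of time spent executing kernel code (%system).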
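The "how much memory and swap is left" numbers come from kernel counters exposed in /proc/meminfo. A minimal sketch, assuming Linux, that reads them directly; the helper names are hypothetical, and unlike sar this gives a single snapshot with no history.

```python
def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Parse /proc/meminfo into {field: value}; values are in kB,
    except the HugePages_* fields, which are page counts."""
    info: dict[str, int] = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts:
                info[key.strip()] = int(parts[0])
    return info


def memory_and_swap_left() -> dict[str, int]:
    """Summarize free memory and swap, in kB."""
    m = read_meminfo()
    return {
        "mem_free_kB": m["MemFree"],
        # MemAvailable (Linux 3.14+) also counts reclaimable page cache.
        "mem_available_kB": m.get("MemAvailable", m["MemFree"]),
        "swap_free_kB": m.get("SwapFree", 0),
        "swap_used_kB": m.get("SwapTotal", 0) - m.get("SwapFree", 0),
    }


if __name__ == "__main__":
    for name, value in memory_and_swap_left().items():
        print(f"{name:>18}: {value}")
```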
top, ps, pstree
• top provides a dynamic, real-time view of a running system.
• Pressing H in top (or starting it with top -H) shows individual threads.
• ps reports a snapshot of the processes currently running on the system. Pipe it through grep <username> to get user-specific information.
• pstree displays a tree of processes.
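A process tree can be rebuilt from each process's parent PID. A rough sketch of that idea, assuming Linux /proc; the function names are hypothetical and this is not pstree's actual implementation.

```python
import os


def read_ppid(pid: int) -> int:
    """Return the parent PID of `pid` from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is parenthesized and may contain spaces,
    # so split after the last ')': the remaining fields start with state, ppid.
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[1])


def process_tree() -> dict[int, list[int]]:
    """Map each parent PID to the sorted list of its child PIDs."""
    children: dict[int, list[int]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            ppid = read_ppid(int(entry))
        except OSError:
            continue  # process exited (or is unreadable) during the scan
        children.setdefault(ppid, []).append(int(entry))
    for kids in children.values():
        kids.sort()
    return children


def print_tree(pid: int, children: dict[int, list[int]], depth: int = 0) -> None:
    """Print `pid` and its descendants as name(pid), indented by depth."""
    try:
        with open(f"/proc/{pid}/comm") as f:
            name = f.read().strip()
    except OSError:
        name = "?"
    print("  " * depth + f"{name}({pid})")
    for child in children.get(pid, []):
        print_tree(child, children, depth + 1)
```

For example, `print_tree(1, process_tree())` prints the tree under init, similar in spirit to `pstree -p`.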
vmstat, mpstat
• vmstat reports information about processes, memory, paging, block I/O, traps, and CPU activity.
• mpstat writes the activities of each available processor to standard output, numbering processors from 0. Global average activities across all processors are also reported.
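Per-processor activity of the kind mpstat reports can be derived from two samples of /proc/stat. A minimal sketch, assuming Linux and the field order documented in proc(5); the helper names and the choice of columns (usr, sys, iowait, idle) are illustrative and do not reproduce mpstat's exact output.

```python
import time

# Field order of each "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_times(path: str = "/proc/stat") -> dict[str, list[int]]:
    """Return {"cpu": [...], "cpu0": [...], ...}; counters are in clock ticks."""
    times: dict[str, list[int]] = {}
    with open(path) as f:
        for line in f:
            if line.startswith("cpu"):
                name, *values = line.split()
                times[name] = [int(v) for v in values]
    return times


def cpu_utilization(interval: float = 1.0) -> dict[str, dict[str, float]]:
    """Sample /proc/stat twice and return per-CPU percentages.

    "cpu" is the all-CPU average; "cpu0", "cpu1", ... are individual CPUs.
    """
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    result: dict[str, dict[str, float]] = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue  # CPU came online between the two samples
        # Clamp at zero: some counters (notably iowait) can step backwards.
        delta = dict(zip(FIELDS, (max(n - o, 0) for n, o in zip(new, old))))
        # guest/guest_nice are already included in user/nice, so they are skipped.
        total = sum(delta.values()) or 1
        result[name] = {
            "usr": 100.0 * (delta["user"] + delta["nice"]) / total,
            "sys": 100.0 * delta["system"] / total,
            "iowait": 100.0 * delta["iowait"] / total,
            "idle": 100.0 * delta["idle"] / total,
        }
    return result
```

The "sys" column here is the kernel-time share that `sar -u` reports as %system.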
mpvis
• mpvis displays a three-dimensional bar chart of CPU utilization. The display is refreshed with new values retrieved from the target host or archive once per sampling interval.
pidstat
• pidstat monitors individual tasks currently being managed by the Linux kernel. Useful options:
-r report page faults and memory utilization
-d report I/O statistics
-u report CPU utilization
-p select the tasks for which statistics are reported
-t display statistics for the threads associated with the selected tasks
• pidstat -t -p 14374 reports statistics for each thread of process 14374.
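Per-thread CPU accounting like `pidstat -t -u` is available in /proc/<pid>/task/<tid>/stat. A rough sketch, assuming Linux and the proc(5) field layout; `thread_cpu_times` is hypothetical, and note it returns cumulative totals since each thread started, whereas pidstat reports rates over an interval.

```python
import os

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second, typically 100


def thread_cpu_times(pid: int) -> dict[int, tuple[str, float, float]]:
    """Return {tid: (thread name, user seconds, system seconds)} for `pid`."""
    result: dict[int, tuple[str, float, float]] = {}
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                data = f.read()
        except OSError:
            continue  # thread exited during the scan
        name = data[data.index("(") + 1 : data.rindex(")")]
        # rest[0] is field 3 (state), so field k of proc(5) is rest[k - 3]:
        # utime (field 14) -> rest[11], stime (field 15) -> rest[12].
        rest = data[data.rindex(")") + 2 :].split()
        utime, stime = int(rest[11]), int(rest[12])
        result[tid] = (name, utime / CLK_TCK, stime / CLK_TCK)
    return result


if __name__ == "__main__":
    for tid, (name, usr, sys_) in sorted(thread_cpu_times(os.getpid()).items()):
        print(f"{tid:>8} {name:<16} usr={usr:.2f}s sys={sys_:.2f}s")
```

Sampling `thread_cpu_times(14374)` twice and subtracting would approximate the per-thread CPU rates that the pidstat example reports.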
cpuset -Q
• Reports information about allocated CPUs, nodes, PIDs, WCHAN, command names, etc.
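Much of this per-task information is also exposed by standard Linux /proc files. A minimal sketch, assuming Linux; the helpers are hypothetical and do not reproduce the output format of the SGI cpuset command. The /proc/<pid>/cpuset file exists only on kernels built with cpuset support.

```python
import os


def parse_list(spec: str) -> set[int]:
    """Expand a kernel list string such as "0-3,8,10-11" into a set of ints."""
    result: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            result.update(range(int(lo), int(hi) + 1))
        elif part:
            result.add(int(part))
    return result


def cpuset_info(pid: int) -> dict[str, str]:
    """Collect a task's name, allowed CPUs/memory nodes, cpuset path and WCHAN."""
    info: dict[str, str] = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("Name", "Cpus_allowed_list", "Mems_allowed_list"):
                info[key] = value.strip()
    for name in ("cpuset", "wchan"):
        try:
            with open(f"/proc/{pid}/{name}") as f:
                info[name] = f.read().strip()
        except OSError:
            info[name] = "?"  # missing (kernel config) or not permitted
    return info


if __name__ == "__main__":
    for key, value in cpuset_info(os.getpid()).items():
        print(f"{key:>18}: {value}")
```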
dlook
• A tool for showing process memory maps and CPU usage.
• Two forms:
dlook [options] pid
dlook [options] <command> [command-args]
• Run an MPI job using mpirun and print the memory map for each MPI process:
mpirun -np 8 dlook a.out
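dlook summarizes where a process's pages physically reside. On Linux, per-node residency for each mapping is also exposed in /proc/<pid>/numa_maps, which is present only on NUMA-enabled kernels. A rough sketch, assuming that file's usual layout of space-separated tokens where `N<node>=<pages>` counts pages on each node and `kernelpagesize_kB` gives the page size; the function names are hypothetical and this is not how dlook is implemented.

```python
import os
import re
from collections import Counter

NODE_TOKEN = re.compile(r"N(\d+)=(\d+)")
PAGE_KB = os.sysconf("SC_PAGE_SIZE") // 1024  # base page size, in kB


def parse_numa_maps_line(line: str) -> Counter:
    """Return {node: kB} for one /proc/<pid>/numa_maps line."""
    tokens = line.split()
    page_kb = PAGE_KB
    for token in tokens:
        if token.startswith("kernelpagesize_kB="):
            page_kb = int(token.split("=", 1)[1])  # differs for huge-page mappings
    per_node: Counter = Counter()
    for token in tokens:
        match = NODE_TOKEN.fullmatch(token)
        if match:
            per_node[int(match.group(1))] += int(match.group(2)) * page_kb
    return per_node


def kb_per_node(pid: int) -> Counter:
    """Sum resident memory per NUMA node over all mappings of `pid`."""
    totals: Counter = Counter()
    with open(f"/proc/{pid}/numa_maps") as f:
        for line in f:
            totals.update(parse_numa_maps_line(line))
    return totals
```

If placement is working as intended, a process pinned to one node should show most of its memory on that node.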