An Analysis of Node Sharing on HPC Clusters using XDMoD/TACC_Stats
Joseph P. White, Ph.D., Scientific Programmer, Center for Computational Research
University at Buffalo, SUNY
XSEDE14, July 13–18, 2014
Technology Audit Service
Outline
• Motivation
• Overview of tools (XDMoD, tacc_stats)
• Background
• Results
• Conclusions
• Discussion
Co-Authors
• Robert L. DeLeon (UB)
• Thomas R. Furlani (UB)
• Steven M. Gallo (UB)
• Matthew D. Jones (UB)
• Amin Ghadersohi (UB)
• Cynthia D. Cornelius (UB)
• Abani K. Patra (UB)
• James C. Browne (UTexas)
• William L. Barth (TACC)
• John Hammond (TACC)
Motivation
• Node sharing benefits:
  – increases throughput by up to 26%
  – increases energy efficiency by up to 22% (Breslow et al.)
• Node sharing disadvantages:
  – resource contention
• Number of cores per node is increasing
• Ulterior motive:
  – prove the toolset
• A. D. Breslow, L. Porter, A. Tiwari, M. Laurenzano, L. Carrington, D. M. Tullsen, and A. E. Snavely. The case for colocation of HPC workloads. Concurrency and Computation: Practice and Experience, 2013. http://dx.doi.org/10.1002/cpe.3187
Tools
• XDMoD
  – NSF-funded open-source tool that provides a wide range of usage and performance metrics on XSEDE systems
  – Web-based interface
  – Powerful charting features
• tacc_stats
  – Low-overhead collection of system-wide performance data
  – Runs on every node of a resource; collects data at job start, job end, and periodically during the job:
    • CPU usage
    • Hardware performance counters
    • Memory usage
    • I/O usage
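Periodic counter samples like these are usually reduced to rates by differencing successive snapshots. A minimal sketch, assuming simple (timestamp, value) pairs; the sample layout here is illustrative, not the actual tacc_stats on-disk format:

```python
# Sketch: derive an average event rate from two timestamped counter
# snapshots, as a collector like tacc_stats records at job start/end.
# The (timestamp, value) sample structure is illustrative only.

def counter_rate(start, end):
    """Average events/sec between two (timestamp, counter_value) samples."""
    (t0, v0), (t1, v1) = start, end
    if t1 <= t0:
        raise ValueError("samples must be time-ordered")
    return (v1 - v0) / (t1 - t0)

# e.g. LLC read misses sampled at job start and job end
rate = counter_rate((1000.0, 2_000_000), (1010.0, 8_000_000))
print(rate)  # 600000.0 misses/s
```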
Data flow
XDMoD Data Sources
Background
• CCR's HPC resource "Rush"
  – 8000+ cores
  – Heterogeneous cluster: 8, 12, 16 or 32 cores per node
  – InfiniBand interconnect
  – Panasas parallel filesystem
  – SLURM resource manager
    • node sharing enabled by default
    • cgroup plugin used to isolate jobs
• Academic computing center: higher percentage of small jobs than large XSEDE resources
• All data from Jan–Feb 2014 (~370,000 jobs)
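A SLURM setup of this kind (shared nodes with cgroup isolation) is typically expressed in slurm.conf; a minimal sketch, assuming a consumable-resources configuration (the talk does not give Rush's actual settings):

```ini
# slurm.conf fragment (illustrative; not Rush's actual configuration)
SelectType=select/cons_res            # schedule individual cores, so nodes can be shared
SelectTypeParameters=CR_Core_Memory   # treat cores and memory as consumable resources
TaskPlugin=task/cgroup                # confine each job's tasks with cgroups

# cgroup.conf fragment
ConstrainCores=yes                    # pin jobs to their allocated cores
ConstrainRAMSpace=yes                 # cap jobs at their allocated memory
```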
Number of jobs by job size
Results
• Exclusive jobs: no other jobs ran concurrently on the allocated node(s) (left-hand side of plots)
• Shared jobs: at least one other job was running on the allocated node(s) (right-hand side)
• Metrics compared:
  – Process memory usage
  – Total OS memory usage
  – LLC read miss rates
  – Job exit status
  – Parallel filesystem bandwidth
  – InfiniBand interconnect bandwidth
Memory usage per core
• (MemUsed - FilePages - Slab) from /sys/devices/system/node/node0/meminfo
[Histograms: memory usage per core (GB), exclusive jobs vs. shared jobs]
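The (MemUsed - FilePages - Slab) quantity can be computed directly from the per-NUMA-node meminfo text; a minimal parser sketch (the sample text below is made up, but the field layout matches the Linux per-node meminfo format, which reports values in kB):

```python
# Sketch: compute (MemUsed - FilePages - Slab) from a Linux per-NUMA-node
# meminfo file such as /sys/devices/system/node/node0/meminfo.

def process_memory_kb(meminfo_text):
    """Return MemUsed - FilePages - Slab (kB) from node meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Lines look like: "Node 0 MemUsed:       12000000 kB"
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "Node":
            fields[parts[2].rstrip(":")] = int(parts[3])
    return fields["MemUsed"] - fields["FilePages"] - fields["Slab"]

# Made-up sample in the per-node meminfo layout
sample = """\
Node 0 MemTotal:       16418276 kB
Node 0 MemUsed:        12000000 kB
Node 0 FilePages:       3000000 kB
Node 0 Slab:             500000 kB
"""
print(process_memory_kb(sample))  # 8500000
```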
Total memory usage per core (4 GB/core nodes)
[Histograms: total memory usage per core (GB), exclusive jobs vs. shared jobs]
Last level cache (LLC) read miss rate per socket
• UNC_LLC_MISS:READ on the Intel Westmere uncore
• Gives an upper-bound estimate of DRAM bandwidth
[Histograms: LLC read miss rate (10^6/s), exclusive jobs vs. shared jobs]
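The upper bound follows from the cache-line size: each LLC read miss fetches one 64-byte line (on Westmere) from DRAM, so miss rate times line size bounds the DRAM read bandwidth from above. A worked sketch:

```python
# Each LLC read miss pulls one cache line from DRAM, so
# miss_rate * line_size is an upper bound on DRAM read bandwidth.
# (64-byte cache lines on Intel Westmere.)

CACHE_LINE_BYTES = 64

def dram_read_bw_upper_bound(llc_miss_rate_per_s):
    """Upper-bound DRAM read bandwidth in bytes/s."""
    return llc_miss_rate_per_s * CACHE_LINE_BYTES

# e.g. a socket sustaining 100e6 misses/s reads at most 6.4 GB/s from DRAM
print(dram_read_bw_upper_bound(100e6))  # 6400000000.0
```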
Job exit status reported by SLURM
[Bar chart: fraction of jobs (0 to 1) with each exit status (Successful, Killed, Failed), exclusive jobs vs. shared jobs]
Panasas parallel filesystem write rate per node
[Histograms: write rate per node (B/s), exclusive jobs vs. shared jobs]
InfiniBand write rate per node
[Histograms: write rate per node, log10(B/s), exclusive jobs vs. shared jobs]
• Peaks truncated: ~45,000 for exclusive jobs, ~80,000 for shared jobs
Conclusions
• Little difference on average between shared and exclusive jobs on Rush
• The majority of jobs use far less of each resource than the maximum available
• Created data collection/processing software that makes it easy to evaluate system usage
Discussion
• Limitations of current work:
  – Unable to determine impact (if any) on job wall time
  – Comparison is of overall average values for jobs
  – Shared-node job statistics are convolved
  – Exit code is not a reliable way to determine failure
Future work
• Use Application Kernels to get a detailed analysis of interference
• Many more metrics now available:
  – FLOPS
  – CPU clock cycles per instruction (CPI)
  – CPU clock cycles per L1D cache load (CPLD)
• Add support for per-job metrics on shared nodes
• Study classes of applications
Questions
• BOF: XDMoD: A Tool for Comprehensive Resource Management of HPC Systems
  – 6:00pm to 7:00pm tomorrow, Room A602
• XDMoD: https://xdmod.ccr.buffalo.edu/
• tacc_stats: http://github.com/TACCProjects/tacc_stats
• Contact: [email protected]
Acknowledgments
• This work is supported by the National Science Foundation under grant numbers OCI 1203560 and OCI 1025159 for the Technology Audit Service (TAS) for XSEDE.