Open-Source Profiling and Analysis Tools for Aurora
Performance studies on Knights Landing

1 August 2016

Rashawn L. Knapp, Supada Laosooksathit, Preeti Suman, Tatyana Mineeva
Intel, Software and Service Group (SSG)
Systems Engineering, Architecture & Runtimes
[email protected], [email protected]



Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice


Open-Source Tools Team: Executive Summary

Introduce team and goals

Tools

Benchmark Suite

Performance Studies on Xeon® Phi™ Coprocessor Knights Landing (KNL)

‐ Description of architecture and study platform

‐ Greater Chicago Area Systems Research 2016 – attended and discussed work

‐ Open|SpeedShop support for Intel compilers and study

‐ CAM-SE with Open|SpeedShop and HPCToolkit

Aurora Preparation

Summary and Next Steps


Open-Source Tools Team: Goals and Purpose

Our Role

‐ Collaborations with Tool Owners

‐ Enable open-source HPC analyzers, ensuring these performance tools run well on Intel’s current and upcoming Xeon Phi platforms

CORAL

‐ Theta - Knights Landing (KNL), 8.5 petaflops (PFLOPS)

‐ Aurora - Knights Hill (KNH), peak 180 PFLOPS, >50,000 compute nodes, >7 PB DRAM and persistent memory


Open-Source Tools Team: Tools and Status

Tools table (Tool, Description, Status), grouped by level

Low-level Foundation

Dyninst (UMD, UW)
Description: Dynamic binary instrumentation tool
Status: HSW EP, KNL, Intel/GCC compilers, Intel MPI. Test suite. Several HSW patches. Versions: 8.2.1 - 9.2.0; Verified: test suite

PAPI (UTK)
Description: Interface to count CPU and off-core performance events
Status: HSW EP, KNL (w/ Intel patch) with Intel and GCC compilers. Test suite completed. Patch to enable off-core HSW events in Component tests (Jul. ’15). Versions: 5.4.1 - 5.4.3; Verified: test suite

High-level Tools

TAU (UO)
Description: Profiling and tracing tool for parallel applications, supports MPI and OpenMP; incorporates Dyninst and PAPI
Status: HSW EP: compilation with Intel and GCC C/C++/Fortran compilers, Intel MPI, MPICH, Dyninst, PAPI. Version: 2.24.1 (HSW); KNL in progress (v 2.25.1)

Score-P (VI-HPS)
Description: Provides a common interface for high-level tools; incorporates Dyninst and PAPI
Status: HSW EP: Intel/GCC compilers, Dyninst, PAPI, Intel MPI/MPICH. Version: 3.0. Compiled but not tested (goal is with TAU)

Open|SpeedShop (Krell Institute)
Description: Dynamic instrumentation tool for Linux: profiling, event tracing for MPI and OpenMP programs; incorporates Dyninst and PAPI
Status: HSW EP, KNL, Intel/GCC compilers, Intel MPI, Dyninst, PAPI. Patch to enable OSS installation with Intel compilers (Q1 ’16). Version: 2.2.*; Verified: benchmark suite

HPCToolkit (Rice)
Description: Lightweight sampling measurement tool for HPC; incorporates Dyninst* and PAPI
Status: HSW EP, KNL, Intel/GCC compilers, Intel MPI. Versions: 5.4.*; Verified: benchmark suite

Darshan (ALCF)
Description: IO monitoring tool
Status: HSW EP, KNL, Intel/GCC compilers, Intel MPI. Versions: 2.3.1, 3.0.1; Verified: benchmark suite

Independent

Valgrind
Description: Base framework for constructing dynamic analysis tools; includes a suite of tools including a debugger, and error detection for memory and pthreads.
Status: HSW EP, KNL, Intel/GCC compilers. Version: 3.10.1; Verified: test suite

memcheck
Description: Detects memory errors: stack, heap, memory leaks, and MPI distributed memory. For C and C++.
Status: HSW EP, KNL, Intel/GCC compilers. Version: 3.10.1; Verified: test suite

helgrind
Description: Pthreads error detection: synchronization, incorrect use of pthreads API, potential deadlocks, data races. C, C++, Fortran
Status: Enabled on HSW EP and KNL with Intel/GCC compilers. Version: 3.10.1; Verified: test suite


Open-Source Tools Team: Benchmark Suite

‐ CORAL benchmarks

‐ Well-known HPC benchmarks (WKHPCB)

Name | Category | Type | Priority | Notes
LSMS | Scalable Science | Computation, Process communication, System scalability | 1 |
CAM-SE | Throughput | Computation, Process communication | 1 |
AMG2013 | Throughput | Computation, Process communication, Memory-access bound | 1 |
UMT2013 | Throughput | Computation, Process communication, Memory-access bound | 1 |
IOR | Skeleton | Process communication, IO | 1 |
STRIDE | Skeleton | Computation, Memory | 1 |
FTQ | Skeleton | Computation | 2 |
HPL | WKHPCB | Computation, Process communication | 1 |
STREAM | WKHPCB | Memory bandwidth | 1 |
HPCG | WKHPCB | Computation, Memory, Process communication | 1 |
NPB | WKHPCB | Computation, Memory, Process communication | 1 | Serial, MPI, OMP, hybrid
HPCC | WKHPCB | Computation, Memory, Process communication | 1 | HPL, Stream


Open-Source Tools Team: KNL Performance Studies

The Knights Landing Processor

Knights Landing (KNL) Highlights

‐ Available either as a main processor or as a co-processor

‐ Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512)

‐ 14-nanometer processor

‐ The chip contains 36 tiles, each with 2 cores, 2 Vector Processing Units (VPUs) per core, and 1 MB of L2 cache; the tiles are interconnected by a 2D mesh.

‐ 16 GB of high-bandwidth Multi-Channel DRAM (MCDRAM) and 6 channels of DDR4

‐ Intel Omni-Path controller to support Intel Omni-Path Architecture (OPA)
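The per-tile figures above imply chip-level totals that are handy when sizing MPI rank and OpenMP thread counts. The sketch below only multiplies out the numbers stated on the slide. The class and method names are illustrative, and shipping parts may expose fewer cores than this arithmetic suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnlChip:
    """Per-tile figures as stated on the slide (names are illustrative)."""
    tiles: int = 36
    cores_per_tile: int = 2
    vpus_per_core: int = 2
    l2_mb_per_tile: int = 1

    @property
    def cores(self) -> int:
        # Total cores = tiles x cores per tile
        return self.tiles * self.cores_per_tile

    @property
    def vpus(self) -> int:
        # Total vector units = cores x VPUs per core
        return self.cores * self.vpus_per_core

    @property
    def l2_mb(self) -> int:
        # Aggregate L2 capacity across all tiles, in MB
        return self.tiles * self.l2_mb_per_tile

    def fp64_lanes(self, vector_bits: int = 512) -> int:
        # AVX-512 registers are 512 bits wide, i.e. 8 double-precision lanes per VPU
        return self.vpus * (vector_bits // 64)


chip = KnlChip()
print(chip.cores, chip.vpus, chip.l2_mb, chip.fp64_lanes())
```

Peak FLOPS is deliberately left out: it would also need the clock frequency and the per-cycle issue rate, neither of which appears on the slide.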

[Figure: Open|SpeedShop pcsamp overhead per benchmark. Left axis: pcsamp overhead, % (0% to 45%). Right axis: benchmark standalone time, s (0 to 800). Horizontal axis: individual benchmarks (NPB W kernels, aobench, collision, mandelbrot, perlin, rtm_stencil, wave2d, vol rend, hpcg, amg, fwq, umt, lsms; serial, TBB, and Cilk variants). Series: Intel O|SS, Intel benchmark; GCC O|SS, Intel benchmark; Intel O|SS, GCC benchmark; GCC O|SS, GCC benchmark; benchmark standalone time, s.]

Open-Source Tools Team: KNL Performance Studies

Open|SpeedShop Enabling

‐ Enabled compilation with the Intel compiler

‐ The official release now includes an Intel-specific compilation option.

‐ No critical bugs were found on KNL

‐ On serial benchmarks, the hotspot-profiling overhead is less than 5% in 90% of cases
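The slides do not define how overhead was computed. A common convention, assumed here, is the relative increase in wall-clock time of the profiled run over the standalone run. The sketch below applies that definition and reports the share of benchmarks under a threshold. The benchmark names and timings are hypothetical placeholders, not measurements from the study.

```python
def overhead_percent(standalone_s: float, profiled_s: float) -> float:
    """Relative slowdown of the profiled run, in percent (assumed definition)."""
    if standalone_s <= 0:
        raise ValueError("standalone time must be positive")
    return (profiled_s - standalone_s) / standalone_s * 100.0


def fraction_below(overheads, threshold_percent: float = 5.0) -> float:
    """Share of runs whose overhead is strictly below the threshold."""
    values = list(overheads)
    if not values:
        raise ValueError("no overhead values given")
    return sum(1 for value in values if value < threshold_percent) / len(values)


# Hypothetical (standalone, profiled) wall-clock times in seconds.
timings = {
    "bench_a": (100.0, 102.0),
    "bench_b": (50.0, 51.0),
    "bench_c": (200.0, 230.0),
    "bench_d": (10.0, 10.4),
}

overheads = {name: overhead_percent(s, p) for name, (s, p) in timings.items()}
share = fraction_below(overheads.values())

for name, value in overheads.items():
    print(f"{name}: {value:.1f}% overhead")
print(f"{share:.0%} of benchmarks are below 5% overhead")
```

Short-running benchmarks make this ratio noisy, which is one reason to read the overhead series alongside the standalone times.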


Open-Source Tools Team: KNL Performance Studies

CAM-SE - HPCToolkit and Open|SpeedShop Profiling

Experiment setup

‐ Purpose: Compare profiling capabilities of open-source tools on CAM-SE

‐ Approach:

  ‐ Open|SpeedShop – pcsamp trials

  ‐ HPCToolkit – hpcrun trials

  ‐ Compare top function reports from the tools

‐ CORAL CAM-SE benchmark

‐ Hardware and software environments

  ‐ Single-node KNL machine

  ‐ Intel and GCC compilation versions

  ‐ Various combinations of number of MPI processes and OpenMP threads
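One simple way to compare top function reports, offered here as an illustration rather than the team's actual procedure, is to take the top N functions by sampled time from each tool and check which names they share. The sketch assumes each report has already been reduced to a mapping from function name to percent of samples. The function names and percentages are hypothetical.

```python
def top_functions(report: dict, n: int = 5) -> list:
    """Return the n function names with the largest share of samples."""
    ranked = sorted(report.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def compare_top(report_a: dict, report_b: dict, n: int = 5) -> dict:
    """Split the two top-n lists into shared and tool-specific entries."""
    top_a = top_functions(report_a, n)
    top_b = top_functions(report_b, n)
    return {
        "shared": [name for name in top_a if name in top_b],
        "only_a": [name for name in top_a if name not in top_b],
        "only_b": [name for name in top_b if name not in top_a],
    }


# Hypothetical per-function sample percentages from each tool.
pcsamp_report = {"f_compute": 40.0, "f_comm": 25.0, "f_io": 10.0, "f_init": 5.0}
hpcrun_report = {"f_compute": 38.0, "f_comm": 20.0, "f_pack": 12.0, "f_io": 9.0}

result = compare_top(pcsamp_report, hpcrun_report, n=3)
print(result)
```

Names may need normalizing before comparison, since Fortran module prefixes, compiler-generated OpenMP outlined regions, and inlining can make the same hotspot appear under different symbols in each tool.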


Open-Source Tools Team: Aurora Preparation

‐ Enable tools in preparation for Aurora

‐ Optimizations to enable better tool performance


Open-Source Tools Team: Summary and Next Steps

Summary

‐ Our KNL enabling work is mostly complete

Next Steps

‐ Complete in-progress KNL performance studies

‐ Complete tool enabling (Score-P with TAU)

‐ Transition to Aurora

Fun Things that Intel offers:

‐ FLOPS calculation for KNL with Intel® Software Development Emulator (Intel® SDE): https://software.intel.com/en-us/articles/calculating-flop-using-intel-software-development-emulator-intel-sde

‐ OpenHPC: aggregation of common ingredients required for Linux HPC cluster deployment: http://www.openhpc.community/

Questions


Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright © 2014, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804
