compiling for sve - microsoft...compiling for sve sve hackathon, sc18 will lovett software...
TRANSCRIPT
Compiling for SVESVE Hackathon, SC18
Will LovettSoftware Architect, HPC Tools
2
Content
• Quick introduction to Arm Compiler for HPC
• Compiler flags
• Pragmas and directives
• Other tips
3
Content
• Quick introduction to Arm Compiler for HPC
• Compiler flags
• Pragmas and directives
• Other tips
4
Arm’s solution for HPC application development and portingCommercial tools for aarch64, x86_64, ppc64 and accelerators
Cross-platform Tools Arm Architecture Tools
DDT MAPFORGE
PERFORMANCEREPORTS
C/C++ & FORTRANCOMPILER
PERFORMANCE LIBRARIES
5
Arm’s solution for HPC application development and portingCommercial tools for aarch64, x86_64, ppc64 and accelerators
Cross-platform Tools Arm Architecture Tools
DDT MAPFORGE
PERFORMANCEREPORTS
C/C++ & FORTRANCOMPILER
PERFORMANCE LIBRARIES
6
Arm’s commercially-supported C/C++/Fortran compiler
Tuned for Scientific Computing, HPC and Enterprise workloads• Processor-specific optimizations for various server-class Arm-based platforms• Optimal shared-memory parallelism using latest Arm-optimized OpenMP runtime
Linux user-space compiler with latest features• C++ 14 and Fortran 2003 language support with OpenMP 4.5• Support for Armv8-A and SVE architecture extension• Based on LLVM and Flang, leading open-source compiler projects
Commercially supported by Arm • Available for a wide range of Arm-based platforms running leading Linux
distributions – RedHat, SUSE and Ubuntu
Compilers tuned for Scientific Computing and HPC
Latest features and performance optimizations
Commercially supported by Arm
7
C/C++ Frontend
Fortran Frontend
Optimizer Armv8-A code-gen
SVE code-gen
Clang based LLVM based
PGI Flang based
Enhanced optimization for Armv8-A and SVE
C/C++ Files (.c/.cpp)
Fortran Files (.f/.f90)
Arm C/C++/Fortran Compiler
Armv8-A binary
SVEbinary
LLVM IR LLVM IRIR Optimizations
Auto-vectorization
LLVM based
LLVM based
Language specific frontend Architecture specific backendLanguage agnostic optimization
Arm Compiler is built on LLVM, Clang and Flang projects
8
Optimized BLAS, LAPACK and FFT
Commercial 64-bit Armv8-A math libraries • Commonly used low-level math routines - BLAS, LAPACK and FFT• Provides FFTW compatible interface for FFT routines• Batched BLAS support
Best-in-class serial and parallel performance• Generic Armv8-A optimizations by Arm• Tuning for specific platforms like Cavium ThunderX2 in collaboration with silicon
vendors
Validated and supported by Arm• Available for a wide range of server-class Arm-based platforms• Validated with NAG’s test suite, a de-facto standard• Available for Arm compiler or GNU
Best in class performance
Validated with NAG test suite
Commercially supportedby Arm
9 © 2018 Arm Limited
What’s new in the compiler
• Updates• GNU 8.2• LLVM 7.0
• Enhanced integration between ArmPL and Arm Compiler• Fortran
• Submodules (F2008)• Directives: IVDEP, NOVECTOR, VECTOR ALWAYS• NINT and DNINT performance
• Improved documentation
10 © 2018 Arm Limited
What’s new in the libraries
• Improved function prototypes• BLAS, CBLAS, LAPACK
• Enhanced libamath
• FFTW MPI interface added• Single and double precision
• Improved Guru interface
• Sparse mat-vec multiplication (SpMV) in CSR format
11
Useful Links
• SVE Hackathon SC18 landing page• https://developer.arm.com/hpc/events/sve-hackathon-sc18
• Arm porting and tuning guides• https://developer.arm.com/products/software-development-tools/hpc/resources/porting-and-tuning
• Arm application recipes (Please contribute to this)• https://gitlab.com/arm-hpc/packages/wikis/categories/allPackages
• Compiler documentation and quick start guides• https://developer.arm.com/products/software-development-tools/hpc/documentation• E.g. gfortran to armflang, ifort to armflang, compiler quick start
• Arm C Language Extensions (ACLE) for SVE• https://developer.arm.com/docs/100987/latest/arm-c-language-extensions-for-sve
• Arm community forum• https://community.arm.com/tools/hpc/f/hpc-user-group
12
Content
• Quick introduction to Arm Compiler for HPC
• Compiler flags
• Pragmas and directives
• Other tips
13 © 2018 Arm Limited
Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast
14 © 2018 Arm Limited
Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast
• armclang for C/C++• armflang for Fortran
15 © 2018 Arm Limited
• Required to enable SVE code generation
Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast
-march=name[+[no]feature]• Targets an architecture profile,
generating generic code that runs on any processor of that architecture.
16 © 2018 Arm Limited
• Produces higher-quality, but less portable code
• Once SVE-based SoCs are available, this will remove the need for-march=armv8-a+sve
Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast
• Enables the compiler to automaticallydetect the CPU it is being run on andoptimize accordingly.
• This supports a range of Arm®v8-A based SoCs, including ThunderX2.
17 © 2018 Arm Limited
• SVE-enabled ArmPL libraries will beavailable in a future release
• SVE vector mathematical functions supported today
Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast
• Enable Arm Performance Libraries(ArmPL)
• Enables optimized versions of the C mathematical functions declared in math.h
• Enables optimized versions of Fortran math intrinsics
• Enables autovectorization of mathematical functions & intrinsics
18 © 2018 Arm Limited
Optimizations enabled by -Ofast:• Assume no infinities• Assume no NaNs• Disable correct handling of errno for
math functions• Enable re-association of FP• Enable reciprocal math optimizations• Disable handling of signed zero• Disable handling of FP traps
Basic flagsarmclang -march=armv8-a+sve -mcpu=native –armpl –Ofast
-Ofast• Aggressive optimization at the expense
of strict language compliance-O3 –ffp-contract=fast• O3 optimizations plus fused FP
instructions (eg FMAs)-O2• Minimum level required for
vectorization
19 © 2018 Arm Limited
LLVM Optimization RemarksLet the compiler tell you how to improve vectorization
Flag Description-Rpass=<regexp> What was optimized.
-Rpass-analysis=<regexp> What was analyzed.
-Rpass-missed=<regexp> What failed to optimize.
For each flag, replace <regexp> with an expression for the type of remarks you wish to view.
Recommended <regexp> queries are:
• -Rpass=\(loop-vectorize\|inline\)• -Rpass-missed=\(loop-vectorize\|inline\)• -Rpass-analysis=\(loop-vectorize\|inline\)
where loop-vectorize will filter remarks regarding vectorized loops, and inline for remarks regarding
inlining.
To enable optimization remarks, pass the following -Rpass options to armclang:
20 © 2018 Arm Limited
LLVM Optimization Remarks example
example.c:8:18: remark: hoisting zext[-Rpass=licm]
for (int i=0;i<K; i++)^
example.c:8:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
for (int i=0;i<K; i++)^ example.c:7:1: remark: 28 instructions in function
armclang -O3 -Rpass=.* -Rpass-analysis=.* example.c
armflang -O3 -Rpass=loop-vectorize example.F90 -gline-tables-onlyexample.F90:21: vectorized loop (vectorization width: 2, interleaved count: 1) [-Rpass=loop-vectorize]
END DO
21 © 2018 Arm Limited
• Use this flag when compilingprocedures that are called fromOpenMP regions, but do not contain OpenMP themselves
Thread-safe recursionarmflang -frecursive
• Allocates all local variables on the stack
• Allows thread-safe recursion
• Applied automatically with -fopenmp
• Can have implications on stack size
• Increase stack with:
ulimit -s unlimited
22
Text 30pt sentence case
Other compiler optionsman armclang
General flags
-g Generate source-level debug information (use with -O0 where possible)
-Wall Enable all warnings.
-w Suppress all warnings.
-fopenmp Enable OpenMP.
Fortran flags
-cpp Preprocess Fortran files. Default for .F, .F90, .F95,…
-module <path> Specifies a directory to place, and search for, module files.
-i8 Set the default kind for INTEGER and LOGICAL to 64bit (i.e. KIND=8).
23
Content
• Quick introduction to Arm Compiler for HPC
• Compiler flags
• Pragmas and directives
• Other tips
24 © 2018 Arm Limited
C pragmas to control vectorization#pragma clang loop <…>
vectorize(assume_safety)• Allows the compiler to assume that there
are no aliasing issues in a loop
vectorize(enable)• Always vectorize, if it’s safe to do so
vectorize(disable)• Disable vectorization
distribute(enable) • Attempt to split a loop, to aid
vectorization
unroll_count(_value_)• Forces a scalar loop to unroll by a given
factor
interleave_count(_value_)• Forces a vectorized loop to be interleaved
by a given factor
25
Fortran directives to control vectorization
!dir$ ivdep
• Allows the compiler to assume that there are no aliasing issues in a loop
!dir$ vector always
• Always vectorize, if it’s safe to do so
!dir$ novector
• Disable vectorization
26
Content
• Quick introduction to Arm Compiler for HPC
• Compiler flags
• Pragmas and directives
• Other tips
27 © 2018 Arm Limited
PGI Fortran specific workarounds
• The Flang in the Arm compiler comes from PGI• However it is not completely the same
• PGI Fortran specific workarounds are still valid for ArmFlang
• However ifdef __PGI not recognised
• Use #ifdef __FLANG
28 © 2018 Arm Limited
Fortran lazy evaluation
if ( some_true_statement .OR. function_with_side_effect() )
• In the Fortran standard lazy vs. strict evaluation is undefined
• ArmFlang always performs lazy evaluation in if statements
Solution:• Remove ambiguous evaluation operations• Don’t write functions with side effects!
29 © 2018 Arm Limited
Fortran Features
• Not all of the (new/esoteric) Fortran features have been implemented• Code may fail at compile time due to unsupported feature
• Full list of supported features:• Fortran 2003
– https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-2003-
status
• Fortran 2008
– https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-2008-
status
30 © 2018 Arm Limited
ArmFlang Fortran Intrinsics
• Some GNU Fortran intrinsics are non-standard• ArmFlang started by implementing standard ones• Some non-standard ones may be missing
• Intrinsics list• https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-intrinsics
• Example:• GETARG – Getting a command line argument – GNU non-standard• GET_COMMAND_ARGUEMENT – Fortran 2003 standard, supported by armflang
• ISNAN, COSD• Compiler specific: mm_prefetch
31 © 2018 Arm Limited
Arm uses a weak memory model
• Some HPC codes have their own threading• Usually based on pthreads• Written to have more control over the threads of
execution and how they synchronize• Some had no problems working with AArch64’s weakly
ordered memory system• Others exhibited issues in multi-threaded modes that
were particularly hard to diagnose without a detailed investigation into how the multi-threaded mode was implemented
– Problems are almost always down to a lock-free thread interaction implementation
– Key symptom: correct operation on a strongly ordered architecture, failure on weakly ordered
Done!
In UseMemory Pool
In UseFree
In UseIn Use
Core 1
Waiting for Memory...
Core 2
Inco
min
g W
rite
Inco
min
g W
rite
Mar
k Fr
ee
In UseMemory Pool
In Use
In UseIn Use
Core 1
Allocated
Core 2
Inco
min
g W
rite
Mar
k Fr
eeIn
com
ing
Writ
e
Weakly
Ordered
In UseDone!
32 © 2018 Arm Limited
Old config.guess Fails to Recognise Arm Architecture
• When building a code with autoconf it can fail to recognise aarch64
• Will fail automatically
• Similar issue to that seen on Power 8
• Solution:
• Pull down new versions of config.guess and config.sub
• If you own the package, or know the owners, ask them to update the config package in their release
• This would help us massively
wget'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD' -O config.guesswget'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD' -O config.sub
33 © 2018 Arm Limited
Libtool Fails to Recognise Arm Compiler
• Build systems which rely on an old libtool it fails to recognise Flang as a compiler• Causes failures when building code
• Solution:• Run sed-script to update the configuration
• Steps:– Update autoconf - Optional– Run configure– Run sed scripts– Run make
• Example: https://gitlab.com/arm-hpc/packages/wikis/packages/fftw• Let package owners know to update the libtool in their package
– Like: https://savannah.gnu.org/patch/index.php?9442
sed -i -e 's#wl=""#wl="-Wl,"#g' libtoolsed -i -e 's#pic_flag=""#pic_flag=" -fPIC -DPIC"#g' libtool
34 © 2018 Arm Limited
Arm Porting Advisor
• Porting advisor for Arm architecture / compiler• https://github.com/arm-hpc/porting-advisor
• Static source code analysis• Python tool to look for common source based porting issues
• Identifies:• Build system issues (config.guess)• Non-Arm assembly• Non-Arm intrinsics
35
Request: Fortran usage survey
Fortran usage survey
Anton Shterenlikht, Standards Officer, BCS Fortran Specialist Group
https://goo.gl/forms/JUFUReOoVUin2m8D2
http://www.fortran.bcs.org/2018/FortranBenefitsSurvey_interimrep_Aug2018.pdf
36
Content
• Quick introduction to Arm Compiler for HPC
• Compiler flags
• Pragmas and directives
• Other tips
• … one more thing
37
Arm Compiler for HPC on your laptop
Instructions:
1. Install docker (https://www.docker.com/)
2. git clone https://github.com/ARM-software/arm-hpc-tools
3. cd arm-hpc-tools
4. make interactive
3838
Thank You!Danke!Merci!��!�����!Gracias!Kiitos!
Confidential © Arm 2017 Limited