Compiling for SVE
SVE Hackathon, SC18
Will Lovett, Software Architect, HPC Tools


Content

• Quick introduction to Arm Compiler for HPC

• Compiler flags

• Pragmas and directives

• Other tips


Arm’s solution for HPC application development and porting
Commercial tools for aarch64, x86_64, ppc64 and accelerators

Cross-platform Tools
• Forge (DDT and MAP)
• Performance Reports

Arm Architecture Tools
• C/C++ & Fortran Compiler
• Performance Libraries


Arm’s commercially supported C/C++/Fortran compiler

Tuned for Scientific Computing, HPC and Enterprise workloads
• Processor-specific optimizations for various server-class Arm-based platforms
• Optimal shared-memory parallelism using the latest Arm-optimized OpenMP runtime

Linux user-space compiler with latest features
• C++14 and Fortran 2003 language support with OpenMP 4.5
• Support for Armv8-A and the SVE architecture extension
• Based on LLVM and Flang, leading open-source compiler projects

Commercially supported by Arm
• Available for a wide range of Arm-based platforms running leading Linux distributions: Red Hat, SUSE and Ubuntu

Summary: compilers tuned for Scientific Computing and HPC; latest features and performance optimizations; commercially supported by Arm.

Arm Compiler is built on LLVM, Clang and Flang projects

[Diagram: Arm C/C++/Fortran Compiler pipeline]
• Language-specific frontend: C/C++ files (.c/.cpp) enter the Clang-based C/C++ frontend; Fortran files (.f/.f90) enter the PGI Flang-based Fortran frontend. Both produce LLVM IR.
• Language-agnostic optimization: the LLVM-based optimizer performs IR optimizations and auto-vectorization, with enhanced optimization for Armv8-A and SVE.
• Architecture-specific backend: the LLVM-based Armv8-A and SVE code generators produce an Armv8-A binary or an SVE binary.

Optimized BLAS, LAPACK and FFT

Commercial 64-bit Armv8-A math libraries
• Commonly used low-level math routines: BLAS, LAPACK and FFT
• Provides an FFTW-compatible interface for FFT routines
• Batched BLAS support

Best-in-class serial and parallel performance
• Generic Armv8-A optimizations by Arm
• Tuning for specific platforms like Cavium ThunderX2 in collaboration with silicon vendors

Validated and supported by Arm
• Available for a wide range of server-class Arm-based platforms
• Validated with NAG’s test suite, a de facto standard
• Available for Arm compiler or GNU

Summary: best-in-class performance; validated with NAG test suite; commercially supported by Arm.

What’s new in the compiler

• Updates
  – GNU 8.2
  – LLVM 7.0
• Enhanced integration between ArmPL and Arm Compiler
• Fortran
  – Submodules (F2008)
  – Directives: IVDEP, NOVECTOR, VECTOR ALWAYS
  – NINT and DNINT performance
• Improved documentation

© 2018 Arm Limited

What’s new in the libraries

• Improved function prototypes
  – BLAS, CBLAS, LAPACK
• Enhanced libamath
• FFTW MPI interface added
  – Single and double precision
• Improved Guru interface
• Sparse mat-vec multiplication (SpMV) in CSR format

Useful Links

• SVE Hackathon SC18 landing page
  – https://developer.arm.com/hpc/events/sve-hackathon-sc18
• Arm porting and tuning guides
  – https://developer.arm.com/products/software-development-tools/hpc/resources/porting-and-tuning
• Arm application recipes (please contribute to this)
  – https://gitlab.com/arm-hpc/packages/wikis/categories/allPackages
• Compiler documentation and quick start guides
  – https://developer.arm.com/products/software-development-tools/hpc/documentation
  – E.g. gfortran to armflang, ifort to armflang, compiler quick start
• Arm C Language Extensions (ACLE) for SVE
  – https://developer.arm.com/docs/100987/latest/arm-c-language-extensions-for-sve
• Arm community forum
  – https://community.arm.com/tools/hpc/f/hpc-user-group

Compiler flags

Basic flags
armclang -march=armv8-a+sve -mcpu=native -armpl -Ofast


• armclang for C/C++
• armflang for Fortran

-march=name[+[no]feature]
• Targets an architecture profile, generating generic code that runs on any processor of that architecture.
• Required to enable SVE code generation

-mcpu=native
• Enables the compiler to automatically detect the CPU it is being run on and optimize accordingly.
• This supports a range of Armv8-A based SoCs, including ThunderX2.
• Produces higher-quality, but less portable code
• Once SVE-based SoCs are available, this will remove the need for -march=armv8-a+sve

-armpl
• Enables Arm Performance Libraries (ArmPL)
• Enables optimized versions of the C mathematical functions declared in math.h
• Enables optimized versions of Fortran math intrinsics
• Enables autovectorization of mathematical functions and intrinsics
• SVE-enabled ArmPL libraries will be available in a future release
• SVE vector mathematical functions are supported today

-Ofast
• Aggressive optimization at the expense of strict language compliance

-O3 -ffp-contract=fast
• O3 optimizations plus fused FP instructions (e.g. FMAs)

-O2
• Minimum level required for vectorization

Optimizations enabled by -Ofast:
• Assume no infinities
• Assume no NaNs
• Disable correct handling of errno for math functions
• Enable re-association of FP
• Enable reciprocal math optimizations
• Disable handling of signed zero
• Disable handling of FP traps
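
To see why the choice of optimization level matters, consider a floating-point reduction. The sketch below is illustrative only: the function name sum_array and the file name reduce.c are assumptions, not taken from the slides. When rounding must follow source order, the additions form a serial chain and the loop usually stays scalar. Allowing FP re-association (one of the relaxations listed for -Ofast) lets the compiler keep several partial sums in vector lanes, at the cost of results that may differ in the last bits.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n doubles. The loop carries a dependence through `total`, so a
 * compiler that must preserve source-order rounding usually keeps it
 * scalar. With FP re-association allowed (e.g. -Ofast) it may vectorize.
 *
 * Possible command lines, using flags shown on the slides:
 *   armclang -O3    -march=armv8-a+sve -c reduce.c
 *   armclang -Ofast -march=armv8-a+sve -c reduce.c
 */
double sum_array(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += x[i];   /* serial chain unless re-association is allowed */
    return total;
}
```

Building the file both ways with -Rpass=loop-vectorize is one way to check whether the reduction was actually vectorized.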

LLVM Optimization Remarks
Let the compiler tell you how to improve vectorization

To enable optimization remarks, pass the following -Rpass options to armclang:

Flag                        Description
-Rpass=<regexp>             What was optimized.
-Rpass-analysis=<regexp>    What was analyzed.
-Rpass-missed=<regexp>      What failed to optimize.

For each flag, replace <regexp> with an expression for the type of remarks you wish to view.

Recommended <regexp> queries are:

• -Rpass=\(loop-vectorize\|inline\)
• -Rpass-missed=\(loop-vectorize\|inline\)
• -Rpass-analysis=\(loop-vectorize\|inline\)

where loop-vectorize filters remarks regarding vectorized loops, and inline filters remarks regarding inlining.

LLVM Optimization Remarks example

armclang -O3 -Rpass=.* -Rpass-analysis=.* example.c

example.c:8:18: remark: hoisting zext [-Rpass=licm]
   for (int i=0;i<K; i++)
                 ^
example.c:8:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   for (int i=0;i<K; i++)
   ^
example.c:7:1: remark: 28 instructions in function

armflang -O3 -Rpass=loop-vectorize example.F90 -gline-tables-only

example.F90:21: vectorized loop (vectorization width: 2, interleaved count: 1) [-Rpass=loop-vectorize]
END DO
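
A small file for experimenting with the remark flags might look like the following sketch. The file name remarks_demo.c and both function names are hypothetical. The first loop has independent iterations and is a natural candidate for vectorization. The second reads the value written by the previous iteration, so it would be expected to show up under -Rpass-missed or -Rpass-analysis rather than -Rpass. The exact remark text depends on the compiler version, so none is reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Try, for example:
 *   armclang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c remarks_demo.c
 */

/* Independent iterations; restrict promises that out and in do not overlap. */
void scale(float *restrict out, const float *restrict in, size_t n, float k)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = k * in[i];
}

/* Loop-carried dependence: a[i] needs a[i - 1] from the previous iteration. */
void running_sum(float *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
```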

Thread-safe recursion
armflang -frecursive

• Allocates all local variables on the stack
• Allows thread-safe recursion
• Applied automatically with -fopenmp
• Use this flag when compiling procedures that are called from OpenMP regions, but do not contain OpenMP themselves
• Can have implications on stack size
• Increase stack with:

ulimit -s unlimited

Other compiler options
man armclang

General flags
-g              Generate source-level debug information (use with -O0 where possible).
-Wall           Enable all warnings.
-w              Suppress all warnings.
-fopenmp        Enable OpenMP.

Fortran flags
-cpp            Preprocess Fortran files. Default for .F, .F90, .F95,…
-module <path>  Specifies a directory to place, and search for, module files.
-i8             Set the default kind for INTEGER and LOGICAL to 64-bit (i.e. KIND=8).

Pragmas and directives

C pragmas to control vectorization
#pragma clang loop <…>

vectorize(assume_safety)
• Allows the compiler to assume that there are no aliasing issues in a loop

vectorize(enable)
• Always vectorize, if it’s safe to do so

vectorize(disable)
• Disable vectorization

distribute(enable)
• Attempt to split a loop, to aid vectorization

unroll_count(<value>)
• Forces a scalar loop to unroll by a given factor

interleave_count(<value>)
• Forces a vectorized loop to be interleaved by a given factor


Fortran directives to control vectorization

!dir$ ivdep

• Allows the compiler to assume that there are no aliasing issues in a loop

!dir$ vector always

• Always vectorize, if it’s safe to do so

!dir$ novector

• Disable vectorization

Other tips

PGI Fortran specific workarounds

• The Flang in the Arm compiler comes from PGI
  – However, it is not completely the same
• PGI Fortran specific workarounds are still valid for ArmFlang
• However, #ifdef __PGI is not recognised
  – Use #ifdef __FLANG instead


Fortran lazy evaluation

if ( some_true_statement .OR. function_with_side_effect() )

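• The Fortran standard does not specify whether logical expressions are evaluated lazily or strictly
• ArmFlang always performs lazy evaluation in if statements, so function_with_side_effect() is not called when the first operand is already true

Solution:
• Remove ambiguous evaluation operations
• Don’t write functions with side effects!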


Fortran Features

• Not all of the (new/esoteric) Fortran features have been implemented
  – Code may fail at compile time due to an unsupported feature
• Full list of supported features:
  – Fortran 2003: https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-2003-status
  – Fortran 2008: https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-2008-status


ArmFlang Fortran Intrinsics

• Some GNU Fortran intrinsics are non-standard
  – ArmFlang started by implementing standard ones
  – Some non-standard ones may be missing
• Intrinsics list
  – https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler/fortran-intrinsics
• Example:
  – GETARG: gets a command line argument; GNU, non-standard
  – GET_COMMAND_ARGUMENT: Fortran 2003 standard, supported by armflang
• ISNAN, COSD
• Compiler specific: mm_prefetch

Arm uses a weak memory model

• Some HPC codes have their own threading
  – Usually based on pthreads
  – Written to have more control over the threads of execution and how they synchronize
• Some had no problems working with AArch64’s weakly ordered memory system
• Others exhibited issues in multi-threaded modes that were particularly hard to diagnose without a detailed investigation into how the multi-threaded mode was implemented
  – Problems are almost always down to a lock-free thread interaction implementation
  – Key symptom: correct operation on a strongly ordered architecture, failure on weakly ordered

[Figure: a memory pool shared by two cores, shown in two panels, the second labelled "Weakly Ordered". Pool blocks are marked "In Use" or "Free". Core 1 labels: "Waiting for Memory...", "Allocated". Core 2 labels: "Incoming Write", "Mark Free", "Done!".]
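
The slides do not show a fix, but one common remedy for this class of bug is to make the required ordering explicit with C11 atomics. The sketch below is only an illustration under that assumption; the mailbox type and its function names are hypothetical. A producer writes a payload and then sets a flag with release semantics, and a consumer reads the flag with acquire semantics before touching the payload. With plain loads and stores instead, a weakly ordered machine may make the flag visible before the payload.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int payload;        /* ordinary data written by the producer */
    atomic_bool ready;  /* flag that publishes the payload */
} mailbox_t;

void mailbox_init(mailbox_t *m)
{
    assert(m != NULL);
    m->payload = 0;
    atomic_init(&m->ready, false);
}

/* Producer: write the data first, then publish with a release store so
 * the payload write cannot become visible after the flag. */
void mailbox_publish(mailbox_t *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above, so a
 * reader that sees ready == true also sees the payload. */
bool mailbox_try_consume(mailbox_t *m, int *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

In a real program mailbox_publish and mailbox_try_consume would run on different threads; the functions are kept thread-agnostic here so the ordering logic can be read on its own.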


Old config.guess Fails to Recognise Arm Architecture

• When building a code with autoconf it can fail to recognise aarch64

• Will fail automatically

• Similar issue to that seen on Power 8

• Solution:

• Pull down new versions of config.guess and config.sub

• If you own the package, or know the owners, ask them to update the config package in their release

• This would help us massively

wget 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD' -O config.guess
wget 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD' -O config.sub

Libtool Fails to Recognise Arm Compiler

• Build systems that rely on an old libtool fail to recognise Flang as a compiler
  – Causes failures when building code
• Solution:
  – Run sed scripts to update the configuration
• Steps:
  – Update autoconf (optional)
  – Run configure
  – Run sed scripts
  – Run make
• Example: https://gitlab.com/arm-hpc/packages/wikis/packages/fftw
• Let package owners know to update the libtool in their package
  – Like: https://savannah.gnu.org/patch/index.php?9442

sed -i -e 's#wl=""#wl="-Wl,"#g' libtool
sed -i -e 's#pic_flag=""#pic_flag=" -fPIC -DPIC"#g' libtool

Arm Porting Advisor

• Porting advisor for Arm architecture / compiler
  – https://github.com/arm-hpc/porting-advisor
• Static source code analysis
  – Python tool to look for common source based porting issues
• Identifies:
  – Build system issues (config.guess)
  – Non-Arm assembly
  – Non-Arm intrinsics


Request: Fortran usage survey

Fortran usage survey

Anton Shterenlikht, Standards Officer, BCS Fortran Specialist Group

https://goo.gl/forms/JUFUReOoVUin2m8D2

http://www.fortran.bcs.org/2018/FortranBenefitsSurvey_interimrep_Aug2018.pdf

… one more thing

Arm Compiler for HPC on your laptop

Instructions:

1. Install docker (https://www.docker.com/)

2. git clone https://github.com/ARM-software/arm-hpc-tools

3. cd arm-hpc-tools

4. make interactive

Thank You! Danke! Merci! Gracias! Kiitos!

Confidential © Arm 2017 Limited