Some experiences for porting application to Intel Xeon Phi

Porting application to Intel Xeon Phi: some experiences. RIKEN Advanced Center for Computing and Communication. 2012/11, Super Computing 2012 @ Intel Booth, Salt Lake City, US. [email protected]. Other side of my face: [email protected] (FreeBSD committer), [email protected] (Apache OpenOffice committer). Thursday, November 15, 2012.

Upload: maho-nakata

Post on 28-Jan-2015

Category:

Technology


DESCRIPTION

Some experiences for porting application to Intel Xeon Phi

TRANSCRIPT

Page 1: Some experiences for porting application to Intel Xeon Phi

Porting application to Intel Xeon Phi: some experiences

RIKEN Advanced Center for Computing and Communication
2012/11 Super Computing 2012 @ Intel Booth, Salt Lake City, US

[email protected]

Other side of my face:
[email protected] (FreeBSD committer)
[email protected] (Apache OpenOffice committer)

Thursday, November 15, 2012

Page 2: Some experiences for porting application to Intel Xeon Phi

Aims of my talk

• Proof of concept:
  - Intel says, “One source base, tuned to many targets”.
  - Is it true or not?
  - My answer is TRUE.

• Only the native model is considered:
  - Just compile with Intel Composer XE 2013 :-)
  - The offload model is extremely demanding for modern, complicated programs.
    - CUDA experts say: to get performance, do everything on the GPU and do not transfer data between CPU and GPU.
    - Modern applications use a lot of external open source / free software packages. Very complex structure!
    - Not realistic!

• Providing porting tips:
  - Gaussian09, povray, sdpa...

Page 3: Some experiences for porting application to Intel Xeon Phi

What is Intel Xeon Phi?

• Intel Xeon Phi is a co-processor, connected via a PCI Express slot.
• Peak performance is 1 TFlops in double precision.
  - Many cores: 64 cores, 4 threads each, 512-bit AVX, 8 GB of GDDR5 RAM...
• We can see it as if there were another cluster of computers inside a Linux box.
  - A Linux micro OS is provided.
• Better programmability
  - x86 based (64-bit)
  - Development tool: Intel Composer XE 2013
    - C, C++, Fortran
    - Compile and run the same code as on the CPU
    - Familiar parallelism: OpenMP, MPI, OpenCL
  - Various programming models
    - MIC centric
    - CPU centric
  - CAUTION: BINARY IS INCOMPATIBLE!
  - Recompile is needed for Xeon Phi!

Page 4: Some experiences for porting application to Intel Xeon Phi

How to build your program on Xeon Phi

• Very easy.
• Just pass the -mmic flag to the compilers:
  - icc -mmic
  - icpc -mmic
  - ifort -mmic
• How to link against optimized BLAS and LAPACK?
  - Just add -mkl
  - Same as in the CPU case.
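A minimal sketch of the commands this slide implies, with the host and Phi builds side by side. Only icc, -mmic and -mkl come from the slide; the helper name build_cmd, the source file name dgemm_test.c and the .mic output suffix are assumptions for illustration. The helper only prints the command, so it can be tried on a machine without the Intel compiler.

```shell
#!/bin/sh
# Print (not run) the compile command for a host or Xeon Phi native build.
# build_cmd, dgemm_test.c and the ".mic" suffix are hypothetical names;
# the -mmic and -mkl flags are the ones named on the slide.
build_cmd() {
    target=$1          # "host" or "mic"
    src=$2             # C source file
    out=${src%.*}      # strip the extension to get the output name
    case $target in
        mic)  echo "icc -mmic -mkl -o $out.mic $src" ;;
        host) echo "icc -mkl -o $out $src" ;;
        *)    echo "unknown target: $target" >&2; return 1 ;;
    esac
}

build_cmd host dgemm_test.c
build_cmd mic dgemm_test.c
```

On a host where Composer XE is installed, the printed line could be executed with eval "$(build_cmd mic dgemm_test.c)".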

Page 5: Some experiences for porting application to Intel Xeon Phi

DGEMM benchmark: sorry, no free lunch, tuning needed.

• DGEMM is a matrix-matrix multiplication routine. It uses almost 100% of CPU performance (if tuned), so it is used for benchmarking.
  - It is not sensitive to memory bandwidth.
• Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
• Do we need some tuning for Intel Xeon Phi?
  - YES. Otherwise only 40% of peak is attained: ~400 GFlops.
  - If tuned, we attain ~816 GFlops.
  - Tuning points: memory allocation, thread affinity
• How to obtain the data?
  - Just malloc and fill with random values
  - No alignment is specified
  - In the CPU’s case this is sufficient, but
  - it is not sufficient for Xeon Phi.
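The slide names thread affinity as a tuning point without giving the settings used. The sketch below shows how such settings are commonly passed to an OpenMP program through environment variables. OMP_NUM_THREADS is the standard OpenMP variable and KMP_AFFINITY is the Intel OpenMP runtime's affinity variable; the chosen values, the 64 x 4 thread count taken from the earlier slide, and the benchmark name dgemm_bench are assumptions, not the author's actual configuration.

```shell
#!/bin/sh
# Hypothetical run-time settings for a native Xeon Phi DGEMM run.
# Every value here is an assumption for illustration only.
CORES=64              # core count quoted on the "What is Intel Xeon Phi?" slide
THREADS_PER_CORE=4    # hardware threads per core quoted on the same slide

# One OpenMP thread per hardware thread
OMP_NUM_THREADS=$((CORES * THREADS_PER_CORE))

# Pin threads to hardware threads and pack them onto neighbouring cores
KMP_AFFINITY=granularity=fine,compact

export OMP_NUM_THREADS KMP_AFFINITY
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_AFFINITY=$KMP_AFFINITY"

# ./dgemm_bench    # hypothetical benchmark binary, built with icc -mmic -mkl
```

The other tuning point, aligned memory allocation, has to be handled inside the benchmark's C source and is not covered by this sketch.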

Page 6: Some experiences for porting application to Intel Xeon Phi

SDPA : How to cheat “configure” part I

• SDPA is a highly efficient semidefinite programming solver.
  - Distributed at http://sdpa.sourceforge.net/ under the GPL.
• On the CPU: ./configure ; make
• But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how to do this?
  - The host and the Phi are almost the same environment...
  - Two-pass strategy: in the first pass, give a dummy “-DMMIC” flag to configure, then replace it with “-mmic” in the generated Makefiles, then compile.

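#!/bin/sh

# Pass 1: run configure as a host build, with a dummy -DMMIC define standing in for -mmic
CC="icc"; export CC
CXX="icpc"; export CXX
FC="ifort"; export FC

CFLAGS="-DMMIC"; export CFLAGS
CXXFLAGS="-DMMIC"; export CXXFLAGS
FFLAGS="-DMMIC"; export FFLAGS

./configure --with-blas="-mkl" --with-lapack="-mkl"

# Pass 2: turn the dummy define into the real cross-compile flag in every generated Makefile
files=$(find ./* -name Makefile)
perl -p -i -e 's/-DMMIC/-mmic/g' $files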
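The second pass of the script above can be tried in isolation. The following is a sketch under stated assumptions: it uses sed with a temporary file instead of the slide's perl one-liner so that it depends only on POSIX tools, and it adds a check that no placeholder is left behind. The function name replace_dummy_flag and the demo directory layout are hypothetical.

```shell
#!/bin/sh
# Replace the dummy -DMMIC define with -mmic in every Makefile under a tree.
# replace_dummy_flag is a hypothetical helper; sed stands in for the slide's perl.
replace_dummy_flag() {
    root=$1
    find "$root" -type f -name Makefile | while IFS= read -r f; do
        # Write to a temporary file first, then move it over the original
        sed 's/-DMMIC/-mmic/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
    # Fail if any Makefile still carries the placeholder
    if find "$root" -type f -name Makefile -exec grep -l -e '-DMMIC' {} + | grep -q .; then
        echo "placeholder -DMMIC still present" >&2
        return 1
    fi
}

# Demo on a throwaway tree that mimics what configure would generate
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'CFLAGS = -O2 -DMMIC\n' > "$demo/Makefile"
printf 'CXXFLAGS = -DMMIC -g\n' > "$demo/src/Makefile"

replace_dummy_flag "$demo"
cat "$demo/Makefile" "$demo/src/Makefile"
```

Quoting each file name, rather than collecting them in an unquoted variable, keeps the loop working for paths that contain spaces.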

Page 7: Some experiences for porting application to Intel Xeon Phi

Povray: how to cheat configure part II

• The Persistence of Vision Raytracer is a high-quality, totally free tool for creating stunning three-dimensional graphics; a famous ray tracing program.

• This part covers how to build Povray 3.7 RC.
  - This version is the first pthread-parallelized Povray.

• It requires some external libraries beyond those provided for Intel Xeon Phi.

Page 8: Some experiences for porting application to Intel Xeon Phi

Povray: how to cheat configure : part II

• Prerequisites
  - boost, zlib, jpeg, tiff and libpng.
  - All libraries should be built for Phi :-( :-( :-(

• How to build boost and zlib: we took the same strategy as for povray.
  - First build and install the host version of boost to /home/maho/HOST, then the Phi version to /home/maho/MIC.
  - Next, build and install the host version of zlib to /home/maho/HOST.
  - Then, build the Phi version as follows:
    - Back up /home/maho/MIC to /home/maho/MIC.org
    - Copy /home/maho/HOST to /home/maho/MIC
    - Run configure for the host and pass the -DMMIC flag in CFLAGS and CXXFLAGS.
      - Be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
    - Remove /home/maho/MIC
    - Rename /home/maho/MIC.org to /home/maho/MIC
    - Replace -DMMIC with -mmic
    - make for the Xeon Phi binary.
    - Done.

• Building tiff and png for Phi is similar to the above procedure.

Page 9: Some experiences for porting application to Intel Xeon Phi

Povray: how to cheat configure : part II

• Prerequisites
  - boost, zlib, jpeg, tiff and libpng.
  - All libraries should be built for Phi :-( :-( :-(

• Strategy: build twice, first for the host, then for Xeon Phi.
  - Build and install the host version of the libraries to /home/maho/HOST
  - Build and install the Phi version of the libraries to /home/maho/MIC
  - actually,

• The final configure for Povray should be done as follows:
  - Back up /home/maho/MIC to /home/maho/MIC.org
  - Copy /home/maho/HOST to /home/maho/MIC
  - Run configure for the host and pass the -DMMIC flag in CFLAGS and CXXFLAGS.
    - Be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
  - Remove /home/maho/MIC
  - Rename /home/maho/MIC.org to /home/maho/MIC
  - Replace -DMMIC with -mmic
  - make for the Xeon Phi binary.
  - Done.
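The back up / copy / configure / restore steps above can be wrapped in one helper so the Phi prefix is always put back, even when configure fails. This is a sketch, not the author's script: the function name with_host_libs_at_mic_prefix is hypothetical, the real configure invocation is not shown, and the demo runs on a throwaway directory instead of /home/maho.

```shell
#!/bin/sh
# Run a command while the Phi prefix temporarily holds the host libraries.
# with_host_libs_at_mic_prefix is a hypothetical helper name.
# The defaults are the paths from the slide; the demo below overrides them.
HOST_PREFIX=${HOST_PREFIX:-/home/maho/HOST}
MIC_PREFIX=${MIC_PREFIX:-/home/maho/MIC}

with_host_libs_at_mic_prefix() {
    # Back up the Phi libraries, then put a copy of the host libraries in their place
    mv "$MIC_PREFIX" "$MIC_PREFIX.org" || return 1
    cp -R "$HOST_PREFIX" "$MIC_PREFIX" || { mv "$MIC_PREFIX.org" "$MIC_PREFIX"; return 1; }
    "$@"                      # e.g. the host-side ./configure run
    status=$?
    # Restore the Phi libraries whether or not the command succeeded
    rm -rf "$MIC_PREFIX"
    mv "$MIC_PREFIX.org" "$MIC_PREFIX"
    return $status
}

# Demo with stand-in library files in a temporary directory
demo=$(mktemp -d)
HOST_PREFIX=$demo/HOST
MIC_PREFIX=$demo/MIC
mkdir -p "$HOST_PREFIX/lib" "$MIC_PREFIX/lib"
echo host > "$HOST_PREFIX/lib/libz.a"
echo mic > "$MIC_PREFIX/lib/libz.a"

seen=$(with_host_libs_at_mic_prefix cat "$MIC_PREFIX/lib/libz.a")
echo "during the swap: $seen"
echo "after the swap: $(cat "$MIC_PREFIX/lib/libz.a")"
```

In a real build, the wrapped command would be the host-side configure run with -DMMIC in CFLAGS and CXXFLAGS, followed by the -DMMIC to -mmic replacement and make.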

Page 10: Some experiences for porting application to Intel Xeon Phi

Gaussian09 Partially Runs on Intel Xeon Phi!

• Gaussian09 is a famous quantum chemistry program package, and it provides state-of-the-art capabilities for electronic structure modeling.

• Very large source code: 1.7 million lines
  - $ cat *F | wc -l
  - 1714217

• Intel Composer XE is not an officially supported compiler.
  - Gaussian Inc. only supports the PGI compiler.
  - Patches are made by M.N. (sorry, we cannot provide the patches to the public).
  - A small set of patches enables us to build:
    - -rw-r--r--. 1 maho users 463 1 30 10:53 2012 patch-bsd+buldg09
    - -rw-r--r--. 1 maho users 692 1 30 10:53 2012 patch-bsd+fsplit.c
    - -rw-r--r-- 1 maho users 5674 10 18 16:41 2012 patch-bsd+i386.make
    - -rw-r--r--. 1 maho users 643 1 30 10:53 2012 patch-bsd+mdutil.F
    - -rw-r--r--. 1 maho users 240 1 30 10:53 2012 patch-bsd+mygau
    - -rw-r--r--. 1 maho users 486 1 30 10:53 2012 patch-bsd+set-mflags
  - The patches are almost the same as the host’s ones.
    - They do little more than add -mmic.
  - Somehow shared libs don’t work??
    - utils.a should be a static library.
    - Intel MKL should also be linked statically.
    - Shared libs of MKL should be located at /lib64? LD_LIBRARY_PATH isn’t parsed?
  - The resultant binaries occupy approximately 2 GB.

Page 11: Some experiences for porting application to Intel Xeon Phi

Gaussian09 Partially Runs on Intel Xeon Phi!

• Just run.
• Still very unstable with -O3
  - l303.exe (just wish yourself luck)
  - l401.exe (should be built with -O0)
  - Passed (only test000.com-test200.com were tried):

test001, 023, 024, 025, 026, 027, 028, 029, 030, 031, 032, 033, 034, 035, 036, 037, 038, 039, 040, 042, 056, 076, 077, 078, 079, 081, 091, 092, 093, 099, 101, 102, 104, 108, 115, 116, 119, 120, 130, 131, 140, 142, 144, 145, 149, 150, 151, 153, 162, 163, 165, 168, 169, 170, 172, 177, 184, 188, 195

Page 12: Some experiences for porting application to Intel Xeon Phi

A packaging system (pkgsrc) porting effort on Intel Phi!!!

• What is pkgsrc?
  - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/

• NAKATA, Maho has over ten years of experience as a FreeBSD ports committer.
• Why pkgsrc?
  - We need MORE software packages on Intel Phi!
  - Currently, HPC program packages depend on other free software packages.
  - RPM and deb are too complex (to me).
  - A native tool chain for Intel Phi is really important.
    - ./configure (autotools) is a good one, but cross building is rarely supported.
    - ./configure looks at some parameters of the host machine.
    - Intel Composer can be used as if it were a native toolkit with a small trick.
  - Highly portable packaging system: works on *BSD (Net, DragonFly, Free), various Linux variants, AIX, Mac OS X

• Status:
  - ./bootstrap : done

• How to get?
  - I’ll provide it ASAP on sourceforge.net or somewhere...

Page 13: Some experiences for porting application to Intel Xeon Phi

Summary and outlook

• We tested Intel Xeon Phi, especially how to build Phi native binaries.
  - “One source base, tuned to many targets” is TRUE!

• We regard Intel Xeon Phi as a small Linux cluster.
  - But there is no binary compatibility between the host and the Phi.

• We provided porting tips: how to build gaussian, povray and sdpa.
• For packages using autotools (./configure) or similar things, our approach requires a two-pass configure to cheat.
  - If configure looks at Phi-specific things like the availability of FMA, then this strategy doesn’t work.
  - Yoshikazu Kamoshida’s strategy is a solution for configure or build systems which require running small programs on the target machine (SWoPP 2012; Development of middleware which facilitate tuning while installation under cross compile environment).

• More packages are needed!
  - Porting NetBSD’s pkgsrc might be a good idea for a cross-compiling environment like Intel Xeon Phi.
  - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/
