Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II

Post on 15-Feb-2017

KAUST Supercomputing Laboratory

Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II

George Markomanolis, Computational Scientist, June 23rd, 2016

Outline

KAUST King Abdullah University of Science and Technology 2

❖  Introduction

❖  Test case

❖  Reveal

Introduction - Components of CrayPat

❖  Module perftools-base

•  pat_build – Instruments the program to be analyzed

•  pat_report – Generates text reports from the performance data captured during program execution and exports data for use in other programs

•  Cray Apprentice2 – A graphical analysis tool that can be used to visualize and explore the performance data captured during program execution

•  Reveal – A graphical source code analysis tool that can be used to correlate performance analysis data with annotated source code listings, to identify key opportunities for optimization (it works only with the Cray compiler)

Case study

❖  Application from the seismic group, related to an acoustic wave solver

•  Why this application? A user asked for it

•  MPI application

•  Tested on 3 nodes with a total of 96 cores on Shaheen II

Prepare for the tutorial

•  Connect to Shaheen II and copy the material:

•  ssh -X username@shaheen.kaust.edu.sa

•  cp /scratch/tmp/model_reveal.tgz .

•  tar zxvf model_reveal.tgz

•  cd model_reveal

•  Reservation name: k1056_141

Reveal

A tool to port your application to the OpenMP programming model

Reveal

❖  Reveal is Cray’s next-generation integrated performance analysis and code optimization tool.

•  Source code navigation using whole-program analysis (data provided by the Cray compilation environment only)

•  Coupling with performance data collected during execution by CrayPAT, to understand which high-level serial loops could benefit from parallelism

•  Enhanced loop mark listing functionality

•  Dependency information for targeted loops

•  Assists users in optimizing code by providing variable scoping feedback and suggested compiler directives

Prepare for Reveal

❖  Load Perftools

•  module unload darshan

•  module load perftools-base/6.3.2

•  module load perftools/6.3.2

❖  Execute the MPI version

•  cd model_reveal

•  make clean

•  make

•  In the submit.sh file, change the account number to your own and submit the job

§  sbatch submit.sh

•  tail -n 10 testdata.XXX.err

§  1m46.361s

Reservation: k1056_141

Prepare the application for Reveal

❖  Compile the version for the Reveal tool

•  make clean -f Makefile_reveal

•  In the Makefile_reveal file

§  $(CC) -h profile_generate -hpl=data.pl -h noomp $< -o $@ $(CFLAGS)

§  ${CC} -h profile_generate -hpl=data.pl -h noomp -c $< CrayData.c

§  Reveal needs the object files, so modify the Makefile if necessary

•  make -f Makefile_reveal

•  The folder data.pl is created in the current folder

•  Instrument your application

§  pat_build -w CrayData.exe

§  The new executable is called CrayData.exe+pat; use it in submit.sh in place of the original executable

Submit the job for Reveal tool

❖  Submit your job script and do not forget the reservation name (--reservation=…)

•  sbatch submit.sh

❖  A performance file (extension .xf) is created; if not, something went wrong in the previous steps

❖  Generate the report and the ap2 file

•  pat_report -o report.txt CrayData.exe+pat+58072-37t.xf

❖  Execute Reveal

•  reveal data.pl CrayData.exe+pat+58072-37t.ap2

Reveal – Loop Performance

Reveal – Scoping

Reveal – Program view

Reveal – Function View

Reveal – Array View

Reveal – Compiler Messages

Reveal – Loop Performance

Reveal – Scoping Tool

Reveal – Scoping Results

Reveal – OpenMP pragmas

Reveal – Inserted OpenMP pragmas

Clean the code from unresolved issues and observe OpenMP pragmas

❖  vim CrayData.c

❖  Remove the lines containing "unresolved", but only if you are sure.

#pragma omp parallel for default(none) \
        private (i1,i2,u) \
        shared (nxpad,nzpad)

#pragma omp parallel for default(none) \
        private (ix,ib,ibz) \
        shared (nxpad,nb,nzpad,bndr,p0) \
        lastprivate (w)

Check an OpenMP pragma and its validation

#pragma omp parallel for default(none) private (ix,ib,ibz) \
        shared (nxpad,nb,nzpad,bndr,p0) \
        lastprivate (w)
for(ix=0; ix<nxpad; ix++) {
    for(ib=0; ib<nb; ib++) {
        w = bndr[nb-ib-1];
        ibz = nzpad-ib-1;
        p0[ix][ib ] *= w; /* top sponge */
        p0[ix][ibz] *= w; /* bottom sponge */
    }
}
for(ib=0; ib<nb; ib++) {
    ibx = nxpad-ib-1;
    for(iz=0; iz<nzpad; iz++) {
        p0[ib ][iz] *= w; /* left sponge */
        p0[ibx][iz] *= w; /* right sponge */
    }
}
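The effect of the lastprivate (w) clause can be checked in isolation. The following is a minimal sketch of the first loop nest only, not the tutorial's actual source: the function name apply_top_bottom_sponge and the choice to pass the arrays and sizes as parameters are assumptions made for illustration. It returns w so that a caller can confirm that, after the loop, w holds the value assigned in the sequentially last iteration, which is what lastprivate is meant to guarantee.

```c
/* Hypothetical helper (not from the tutorial sources): applies the top and
 * bottom sponge to p0[nxpad][nzpad] and returns the final value of w. */
float apply_top_bottom_sponge(float **p0, float *bndr,
                              int nxpad, int nzpad, int nb)
{
    int ix, ib, ibz;
    float w = 1.0f; /* initialised to a neutral taper weight */

    /* If OpenMP is not enabled at compile time, the pragma is ignored
     * and the loop simply runs serially with the same result. */
#pragma omp parallel for default(none) private(ix, ib, ibz) \
        shared(nxpad, nb, nzpad, bndr, p0) \
        lastprivate(w)
    for (ix = 0; ix < nxpad; ix++) {
        for (ib = 0; ib < nb; ib++) {
            w = bndr[nb - ib - 1];
            ibz = nzpad - ib - 1;
            p0[ix][ib]  *= w; /* top sponge */
            p0[ix][ibz] *= w; /* bottom sponge */
        }
    }

    /* w now holds the value from ix = nxpad-1, ib = nb-1, i.e. bndr[0]. */
    return w;
}
```

Each ix iteration touches only its own row p0[ix], so the rows can be processed by different threads without conflicts; ib and ibz are per-thread temporaries, which is why they are scoped private.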

Clean the code from unresolved issues, compile and run

❖  vim CrayData.c

❖  Remove the lines containing "unresolved" if you are sure.

❖  Compile your application with MPI and OpenMP

•  make -f Makefile_omp

•  The new executable is called CrayData_omp.exe

•  Comment out the active srun line in submit.sh and uncomment the next srun call.

•  Also uncomment the line with OMP_NUM_THREADS=2

•  Now we will execute the application with 48 MPI processes (ntasks) and 2 threads per MPI process (cpus-per-task)

•  srun --ntasks=48 --ntasks-per-node=16 --ntasks-per-socket=8 --hint=nomultithread --cpus-per-task=2 ./CrayData_omp.exe

Different cases and results

❖  Results for 2 threads

•  Change accordingly:

§  export OMP_NUM_THREADS=2

§  srun --ntasks=48 --ntasks-per-node=16 --ntasks-per-socket=8 --hint=nomultithread --cpus-per-task=2 ./CrayData_omp.exe

•  51.211s (2.86X)

❖  Results for 4 threads

•  Change accordingly:

§  export OMP_NUM_THREADS=4

§  srun --ntasks=24 --ntasks-per-node=8 --ntasks-per-socket=4 --hint=nomultithread --cpus-per-task=4 ./CrayData_omp.exe

•  24.815s (5.9X)

Different cases and results

❖  Results for 8 threads

•  12.222s (11.98X)

❖  Results for 16 threads

•  Change accordingly:

§  export OMP_NUM_THREADS=16

§  srun --ntasks=6 --ntasks-per-node=2 --ntasks-per-socket=1 --hint=nomultithread --cpus-per-task=16 ./CrayData_omp.exe

•  8.895s (16.45X)
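The srun lines for the different thread counts follow one pattern: the product of MPI tasks and threads per task stays at 96 cores. A minimal sketch of that arithmetic is given below, assuming 3 nodes with 2 sockets of 16 cores each; the socket count is inferred from the srun options rather than stated on the slides, and the type hybrid_layout and the function make_layout are hypothetical names introduced only for illustration.

```c
/* Hypothetical layout record; field names mirror the srun options. */
typedef struct {
    int ntasks;            /* --ntasks */
    int ntasks_per_node;   /* --ntasks-per-node */
    int ntasks_per_socket; /* --ntasks-per-socket */
    int cpus_per_task;     /* --cpus-per-task, same value as OMP_NUM_THREADS */
} hybrid_layout;

/* Fills *out for the given machine shape and thread count.
 * Returns 0 on success, -1 if the thread count does not divide
 * the cores of one socket evenly (or an argument is not positive). */
int make_layout(int nodes, int sockets_per_node, int cores_per_socket,
                int threads, hybrid_layout *out)
{
    if (nodes <= 0 || sockets_per_node <= 0 || cores_per_socket <= 0 ||
        threads <= 0 || cores_per_socket % threads != 0) {
        return -1;
    }
    out->cpus_per_task = threads;
    out->ntasks_per_socket = cores_per_socket / threads;
    out->ntasks_per_node = out->ntasks_per_socket * sockets_per_node;
    out->ntasks = out->ntasks_per_node * nodes;
    return 0;
}
```

Called as make_layout(3, 2, 16, threads, &l), this reproduces the values used for 2, 4 and 16 threads. For 8 threads, whose srun line is not shown on the slide, the same rule would give 12 tasks, 4 per node and 2 per socket.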

The original version was improved 19.19 times

Execution time (in sec.)

•  Original version: 170.67

•  Optimized MPI version: 106.36

•  MPI+OpenMP: 8.895
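The 19.19 figure in the slide title is the ratio of the original version's time (170.67 s) to the MPI+OpenMP time (8.895 s). As a small sketch of that arithmetic, the two helper functions below (hypothetical names, not part of the tutorial material) compute a speedup and convert a minutes-plus-seconds timing such as 1m46.361s into seconds.

```c
/* Speedup of a run taking t_new seconds relative to a baseline of
 * t_base seconds (both must be positive). */
double speedup(double t_base, double t_new)
{
    return t_base / t_new;
}

/* Converts a timing printed as "<minutes>m<seconds>s" into seconds,
 * given its two numeric parts. */
double to_seconds(int minutes, double seconds)
{
    return 60.0 * minutes + seconds;
}
```

With these, speedup(170.67, 8.895) is about 19.19, and to_seconds(1, 46.361) gives 106.361, which matches the 106.36 s reported for the optimized MPI version.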

Validation

Figure panels: Original version; Optimized MPI+OpenMP

Summary

❖  Reveal is an easy-to-use tool

❖  The user should be careful, though, and pay attention to the compiler messages

❖  You can achieve a significant speedup with this tool

❖  We need to investigate more complicated applications

KAUST Supercomputing Laboratory
