parallel programming & intel hpc round table visitjanjust/presentations/pdp-groepsoverleg... ·...

13
J.J. Keijser Nikhef Amsterdam CT/PDP Parallel Programming & Intel HPC Round Table visit Jan Just Keijser 23 April 2014

Upload: hoangkhanh

Post on 18-Feb-2019

217 views

Category:

Documents


0 download

TRANSCRIPT

J.J. KeijserNikhefAmsterdamCT/PDP

Parallel Programming & Intel HPC Round Table visit

Jan Just Keijser23 April 2014

J.J. KeijserNikhefAmsterdamCT/PDP

PPCES 2014:◦ Parallel Programming for Computational

Engineering & Science◦ 10-14 March◦ RWTH Aachen

Intel EMEA HPC Roundtable◦ 8 – 9 April◦ Cambridge, UK◦ I survived sitting in a British car with Tristan

S. driving

J.J. KeijserNikhefAmsterdamCT/PDP

Parallel Programming Course1.Introduction, Parallel Computing Architectures,

Serial Tuning.

2.Message Passing with MPI.

3.Shared Memory Programming with OpenMP

4.OpenMP Programming for the MIC Architecture.

5.GPGPU Programming with OpenACC

J.J. KeijserNikhefAmsterdamCT/PDP

Day 1: serial tuningVery instructiveFound a bug in the Intel C++ 2014 compiler:

Good

10 times slower:

J.J. KeijserNikhefAmsterdamCT/PDP

Day 2&3: MPI & OpenMPMPI a bit boring, although very similar to

OpenMPOpenMP 4.0 looks promising:

◦ Annotate C/C++ or FORTRAN with #pragma's to help parallellisation

◦ New #pragma's “target” and “simd” to explicitly select an offload target and vectorisation

Portland Group (PG) compiler can compile OpenMP 4.0 code to multiple targets:◦ CPU, Xeon Phi, CUDA (Nvidia GPU's),

OpenCL

J.J. KeijserNikhefAmsterdamCT/PDP

Day 4: OpenMP on MICMIC = Intel Xeon PhiFirst time that I have seen code that runs

faster on a single Xeon Phi vs a dual Xeon E5 2697v2

Can use either Intel compiler or PG compiler

Again a bug discovered:◦ Code that is offloaded to a Xeon Phi runs

slower than code that is natively compiled for a Xeon Phi by a factor > 2

J.J. KeijserNikhefAmsterdamCT/PDP

Day 5: OpenACCSimilar to OpenMPMeant as a open standard to be able to

annotate code to compile on GPU'sThus far, only commercial compilers support

this (SGI, Portland Group)

J.J. KeijserNikhefAmsterdamCT/PDP

Intel HPC Round TableNDA meeting, so Intel folks could talk freelyLatest news from Intel on CPUs, Xeon Phi's,

storage, networkingSessions on success stories with Intel stuff The real reason to go is the ability to directly

talk to Intel developers◦ Talked to C++ compiler developer◦ Talked to Xeon Phi developer

J.J. KeijserNikhefAmsterdamCT/PDP

News on next-gen Xeon PhiAlso in socket format – replaces a regular

XeonWill always use special memory for best

performance, so caching data locally remains vital

Based on Atom cores instead of current Pentium-M cores: x86_64 compatible

J.J. KeijserNikhefAmsterdamCT/PDP

GPUs are excellent for performing the Same operation (Instruction) on

Multiple Data elements (SIMD)Vector processing, anyone?

Remember this slide?

From: Massively Parallel Computing with CUDA

J.J. KeijserNikhefAmsterdamCT/PDP

Available GPUs @ NikhefArun: 2 x NVIDIA Tesla M2070

Pleedo: 2 x Intel Xeon Phi 5110P

Plofkip: 2 x NVIDIA Tesla M1060, 1 x NVIDIA GTX580, 1 x AMD Radeon HD7970 GHz

Nesse: Jos Vermaseren's Xeon Phi 7120P

Some engineering workstations now have AMD Firepro W5000's, which are actually quite nice

J.J. KeijserNikhefAmsterdamCT/PDP

Which direction to take?OpenMP 4.0 looks promising

NVIDIA vs Intel vs AMD?◦ Most HPC centres use NVIDIA (cartesius)◦ However, look at raw performance:

J.J. KeijserNikhefAmsterdamCT/PDP