OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions

A. Saà-Garriga, D. Castells-Rufas and J. Carrabina
Centre d’Intel·ligència Ambiental i Accessibilitat de Catalunya (CAIAC)
Universitat Autònoma de Barcelona (UAB)
HIP3ES, 21/01/2014


1 Introduction
2 OMP2HMPP Compiler
3 Results
4 Conclusions

1 Introduction

GPGPUs and Embedded Systems

• One of the main integrated blocks on heterogeneous platforms
• Mali GPUs (embedded systems)
• NVIDIA GPUs in the first 10 machines of the Green500 list (Nov. 2013)
• GPGPUs are potentially useful for speeding up applications, in both classical HPC and EHPC
• GPGPU programming is complex and error-prone because of the programming models and language paradigms involved

Current Programming Workflow

Workflow stages (diagram): New Proposals, Learning, Source Code, Adaptation, Version, Evaluation
• New Proposals: new language, language extensions
• Learning: language syntax, programming paradigms

GPGPU programming can become a hurdle that limits adoption, since the programmer has to learn both the hardware capabilities and the language used to work with them.

Programming Alternatives

Directive-based languages: OpenACC [2], HMPP [3]
• Hide GPU complexity
• No automatic transfer optimization
• New list of directives

New languages / language extensions: OpenMPC [4], hiCUDA [5]
• Hide GPU complexity
• New language

Direct transformations: Par4All [6]
• Hide GPU complexity
• No intermediate language
• No data transfer optimization
• Just C source code transformation

Proposed Programming Workflow

OMP2HMPP
• Hides GPU complexity
• Just one new directive
• Uses an HPC standard as input
• C/C++
• Mercurium infrastructure [J. Balart et al., EWOMP 2004]

Workflow stages (diagram): New Proposals, Learning, Source Code, Adaptation, Version, Evaluation
• New Proposals: new language, language extensions
• Learning: language syntax, programming paradigms

Translation path: OpenMP → OMP2HMPP → HMPP

2 OMP2HMPP Compiler

Generate HMPP Directives

• Callsite
• Codelet
• Group
• Advanced Load
• Delegated Store
• Synchronize

Generate HMPP Directives

OpenMP Block Outlining

#pragma hmpp outlined_block codelet
void outlined_block(int i, int A[10], int C[10])
{
  for(i=...) {
    ...
    C[i]=A[i]*k;
    ...
  }
}

int main()
{
  ...
  A[x]=v;
#pragma hmpp outlined_block callsite
  outlined_block(i,A,C);
  ...
  A[j]=C[j];
}
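To make the outlining step easier to follow, the sketch below places a possible OpenMP input next to its outlined form. It is an illustration only: the array size `N`, the explicit parameter `k`, and the function names `scale_openmp`, `scale_outlined` and `outlining_preserves_result` are assumptions added to make the fragment complete, since the slide elides those details. A compiler without OpenMP or HMPP support ignores the pragmas and runs both versions sequentially.

```c
#include <assert.h>

#define N 10

/* Hypothetical OpenMP input: the parallel loop is the block to be outlined. */
void scale_openmp(int k, const int A[N], int C[N]) {
    int i;
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        C[i] = A[i] * k;
    }
}

/* After outlining: the loop lives in its own function, marked as the codelet. */
#pragma hmpp outlined_block codelet
void outlined_block(int n, int k, const int A[N], int C[N]) {
    int i;
    assert(n <= N);
    for (i = 0; i < n; i++) {
        C[i] = A[i] * k;
    }
}

/* The original block is replaced by a call marked as the callsite. */
void scale_outlined(int k, const int A[N], int C[N]) {
    #pragma hmpp outlined_block callsite
    outlined_block(N, k, A, C);
}

/* Returns 1 if both versions produce the same output for factor k. */
int outlining_preserves_result(int k) {
    int A[N], C1[N], C2[N], i;
    for (i = 0; i < N; i++) {
        A[i] = i;
    }
    scale_openmp(k, A, C1);
    scale_outlined(k, A, C2);
    for (i = 0; i < N; i++) {
        if (C1[i] != C2[i]) {
            return 0;
        }
    }
    return 1;
}
```

Every value the loop body uses (here `k`, `A`, `C` and the bound) becomes a codelet parameter, so the outlined function no longer depends on the scope it was cut from.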

Contextual Information

For each variable used inside an OpenMP block to be transformed, OMP2HMPP analyzes the abstract syntax tree (AST) to identify:

• The next and last access (read or write)
• Where that access is computed (CPU or GPU)
• Whether the operation occurs inside a loop and, if so, which loop
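One way to picture the per-variable information listed above is as a small record. The sketch below is a hypothetical C representation: the type names, field names and the two decision rules are invented for illustration and are not taken from OMP2HMPP or Mercurium. The rules assume that a variable needs an upload when it was last written on the CPU, and a download when it is next read on the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ACCESS_NONE, ACCESS_READ, ACCESS_WRITE } AccessKind;
typedef enum { ON_CPU, ON_GPU } Device;

/* One access to a variable: what kind, where it runs, and in which loop. */
typedef struct {
    AccessKind kind;
    Device where;
    int loop_id; /* id of the enclosing loop, or -1 when outside any loop */
} Access;

/* Context of one variable relative to the block being transformed. */
typedef struct {
    const char *name;
    Access last; /* last access before the block */
    Access next; /* next access after the block */
} VarContext;

/* Assumed rule: upload when the freshest copy was written on the CPU. */
bool needs_upload(const VarContext *v) {
    assert(v != NULL);
    return v->last.kind == ACCESS_WRITE && v->last.where == ON_CPU;
}

/* Assumed rule: download when the CPU reads the variable next. */
bool needs_download(const VarContext *v) {
    assert(v != NULL);
    return v->next.kind == ACCESS_READ && v->next.where == ON_CPU;
}
```

With records like these, an input written by the CPU just before the block reports `needs_upload`, and an output read by the CPU just after the block reports `needs_download`.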

Contextual Information

Data Transfer Optimization
• Advanced Load
• Delegated Store
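The idea behind these two directives can be sketched as follows: the upload of an input is issued right after its last CPU write instead of at the call, and the download of an output is postponed until just before its first CPU read, so that transfers can overlap with other CPU work. The directive spellings (`advancedload`, `delegatedstore`, `synchronize`, `asynchronous`, `args[...]`) follow HMPP conventions but should be checked against the HMPP reference manual; the label `blk`, the helper `unrelated_cpu_work` and the placement shown are assumptions. A compiler without HMPP support ignores the pragmas.

```c
#include <assert.h>

#define N 10

#pragma hmpp blk codelet, args[C].io=out
void blk(int n, int k, const int A[N], int C[N]) {
    int i;
    assert(n <= N);
    for (i = 0; i < n; i++) {
        C[i] = A[i] * k;
    }
}

/* Stand-in for CPU work that touches neither A nor C (hypothetical). */
static int unrelated_cpu_work(int x) {
    return x + 1;
}

int transfer_demo(int k) {
    int A[N], C[N], i, t;

    for (i = 0; i < N; i++) {
        A[i] = i; /* last CPU write to A */
    }

    /* Advanced load: start uploading A as soon as it is final. */
    #pragma hmpp blk advancedload, args[A]
    t = unrelated_cpu_work(k); /* may overlap with the upload */

    /* A real callsite would also carry a clause saying A is already
       on the device; it is omitted here. */
    #pragma hmpp blk callsite, asynchronous
    blk(N, k, A, C);

    t = unrelated_cpu_work(t); /* may overlap with the kernel */

    /* Wait for the codelet, then fetch C just before the CPU reads it. */
    #pragma hmpp blk synchronize
    #pragma hmpp blk delegatedstore, args[C]
    return C[N - 1] + t; /* first CPU read of C */
}
```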

Use of Contextual Information

Data Transfer Optimization (Loops)
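The figures for these slides are not part of the transcript, so the following is only a plausible reading of the loop case: when a callsite sits inside a loop and the CPU does not touch the data between iterations, the transfers can be moved out of the loop. In the sketch, the label `step`, the constants `N` and `STEPS`, and the placement of the directives are assumptions; a compiler without HMPP support ignores the pragmas.

```c
#include <assert.h>

#define N 10
#define STEPS 4

#pragma hmpp step codelet, args[A].io=inout
void step(int n, int A[N]) {
    int i;
    assert(n <= N);
    for (i = 0; i < n; i++) {
        A[i] = A[i] + 1;
    }
}

int loop_demo(void) {
    int A[N], i, s;

    for (i = 0; i < N; i++) {
        A[i] = i; /* last CPU write before the loop */
    }

    /* Upload once, before the loop, instead of once per iteration. */
    #pragma hmpp step advancedload, args[A]
    for (s = 0; s < STEPS; s++) {
        /* A stays on the device between iterations; the clauses that
           suppress the per-call transfers are omitted here. */
        #pragma hmpp step callsite
        step(N, A);
    }
    /* Download once, after the loop, before the CPU reads A. */
    #pragma hmpp step delegatedstore, args[A]

    return A[0] + A[N - 1];
}
```

With the transfers hoisted, the number of host-device copies no longer grows with the iteration count `STEPS`.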

3 Results


Source Code Example


Experimental Results

Tested Architectures

                 B505(1)              B505(2)              B515
Num. Processors  2                    2                    2
Processor        E5640                E5640                E5-2400
Memory           24 GB                24 GB                192 GB
GPU              NVIDIA Tesla M2050   NVIDIA Tesla C2075   NVIDIA Tesla K20

Experimental Results (one chart per platform): B505(1), B505(2), B515

4 Conclusions

Conclusions

• The programmer avoids spending time learning new languages or paradigms.
• On the tested set of problems from Polybench [8], the generated code obtains an average speedup of 113x over the sequential version.
• An average speedup of over 31x compared to the OpenMP version.
• OMP2HMPP gives a solution that rarely differs from the best hand-coded HMPP version.
• OMP2HMPP establishes a GPU parallel code reference point for expert developers who want to refine the parallelization.

Thanks for your attention!