Page 1: OpenMP and OpenACC a Comparison - GTC On Demand...Comparing OpenMP 4.0 Device Constructs to OpenACC 2.0 Author: James Beyer Subject: The talk will briefly introduce two accelerator

C O M P U T E | S T O R E | A N A L Y Z E

OpenMP and OpenACC a Comparison

James Beyer Cray Inc.

4/15/2014 Cray Property 1

Outline

● Related talks here at GTC
● Background
  ● OpenMP
  ● OpenACC
● Important differences (today)
  ● Parallelism
  ● Present_or_*
  ● Scalars
  ● Loops
  ● Unstructured data
  ● Calls (separate compilation units)
  ● Nested parallelism
● What is next
  ● OpenMP
  ● OpenACC

Related talks here at GTC

● S4438 - What's new in OpenACC 2.0 and OpenMP 4.0
● S4474 - Scaling OpenACC Across Multiple GPUs
● S4472 - Performance Analysis and Optimization of OpenACC Applications
● S4514 - Panel on Compiler Directives for Accelerated Computing
● Hangout: OpenACC
● There are more; just search for OpenACC

Background -- OpenMP

● FORTRAN version 1.0 (October 1997)
● Accelerator additions
  ● Proposal submitted Dec 2009
  ● Subcommittee formed Aug 2009
● Cray OpenMP for Accelerators nears release
● Fall 2010 several members form OpenACC working group
● TR1 - Technical Report on Directives for Attached Accelerators (November 2012)
● OpenMP 4.0 (July 2013)

Background -- OpenACC

● PGI releases accelerator directives
● CAPS releases HMPP
● Fall 2010 several members form OpenACC working group
● OpenACC 1.0 (Nov 2011)
● OpenACC 2.0 (June 2013)

Important differences

● Parallelism
● Present_or_*
● Scalars
● Loops
● Calls (separate compilation units)

Parallelism

● OpenACC
  ● “Off-load” and parallel startup tied together
    ● acc parallel
    ● acc kernels
● OpenMP
  ● “Off-load” and parallel startup disconnected
    ● omp target
    ● omp parallel
    ● omp teams

Parallel startup example (Fortran)

OpenACC
    !$acc parallel
    …
    !$acc end parallel
Or
    !$acc kernels
    …
    !$acc end kernels

OpenMP
    !$omp target
    !$omp teams/parallel
    …
    !$omp end teams
    !$omp end target

Parallel startup example (C/C++)

OpenACC
    #pragma acc parallel
    { … }
Or
    #pragma acc kernels
    { … }

OpenMP
    #pragma omp target
    #pragma omp teams/parallel
    { … }
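As a fuller illustration of the C/C++ forms, the sketch below writes the same loop both ways. The function names `scale_acc` and `scale_omp`, the array `x`, and the data clauses are assumptions added for the example; they do not come from the slides. In the OpenACC version a single directive both off-loads the region and starts parallel execution. In the OpenMP version `target` performs the off-load and the nested `teams distribute parallel for` starts the parallelism. A compiler without accelerator support can ignore the pragmas and run the loops serially.

```c
/* Hypothetical example: scale x[0..n-1] by a. */

/* OpenACC: off-load and parallel startup in a single construct. */
void scale_acc(float *x, int n, float a)
{
    #pragma acc parallel loop copy(x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] *= a;
}

/* OpenMP 4.0: target off-loads; teams/parallel start the parallelism. */
void scale_omp(float *x, int n, float a)
{
    #pragma omp target map(tofrom: x[0:n])
    #pragma omp teams distribute parallel for
    for (int i = 0; i < n; ++i)
        x[i] *= a;
}
```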

OpenMP teams vs parallel

● Why two different “parallel” mechanisms?
● Teams
  ● Independent collision domains
  ● Same behavior as OpenACC gangs
  ● Only select directives allowed
● Parallel
  ● A single collision domain
  ● Default if neither is present
  ● All non-accelerator OpenMP directives allowed

Present_or_*

● OpenACC
  ● present_or_* programmer visible
    ● copy, copyin, copyout, create
  ● copy* without present allowed
    ● Error prone
    ● Hard to debug
    ● Little actual savings
● OpenMP
  ● present_or_* not programmer visible
  ● map always implies present test
    ● to, from, tofrom, alloc
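The difference can be sketched in C as follows. The function names, arrays, and loop bodies are hypothetical and chosen only for illustration. In the OpenMP version the inner `map` clauses name data that the enclosing `target data` region already placed on the device, so the implied present test is expected to find it and skip a second allocation or transfer. The OpenACC version spells the same intent out with `present_or_*` and `present` clauses.

```c
/* Hypothetical example: b[i] = 2 * a[i], computed in two off-loaded steps. */

void copy_then_double_omp(const float *a, float *b, int n)
{
    /* Outer data region: a and b are placed on the device once. */
    #pragma omp target data map(to: a[0:n]) map(from: b[0:n])
    {
        /* map implies a present test: a and b are already on the device. */
        #pragma omp target map(to: a[0:n]) map(from: b[0:n])
        #pragma omp teams distribute parallel for
        for (int i = 0; i < n; ++i)
            b[i] = a[i];

        #pragma omp target map(tofrom: b[0:n])
        #pragma omp teams distribute parallel for
        for (int i = 0; i < n; ++i)
            b[i] *= 2.0f;
    }   /* b is copied back to the host here */
}

void copy_then_double_acc(const float *a, float *b, int n)
{
    #pragma acc data copyin(a[0:n]) copyout(b[0:n])
    {
        /* The programmer asks for the present test explicitly. */
        #pragma acc parallel loop present_or_copyin(a[0:n]) present_or_copyout(b[0:n])
        for (int i = 0; i < n; ++i)
            b[i] = a[i];

        #pragma acc parallel loop present(b[0:n])
        for (int i = 0; i < n; ++i)
            b[i] *= 2.0f;
    }
}
```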

Scalars

● OpenACC
  ● Firstprivate by default
  ● User can override
    ● Error prone
  ● Allows implementation to make these kernel arguments
  ● Pointers are “special”
● OpenMP
  ● No such restriction
  ● Pointers are scalars
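A minimal OpenACC sketch of the scalar default, assuming a hypothetical `axpy_acc` routine that is not part of the slides: the scalars `a` and `n` are named in no data clause, so they fall under the firstprivate default, while the pointers `x` and `y` need data clauses that state the extent of the arrays they point to.

```c
/* Hypothetical example: y = a*x + y. */
void axpy_acc(float a, const float *x, float *y, int n)
{
    /* a and n appear in no data clause, so each is treated as
       firstprivate. The pointers x and y are the "special" case:
       the data clauses give the extent of the arrays they point to. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```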

Loops

● OpenACC
  ● One construct “loop”
  ● Multiple parallelism types
  ● “Nested” parallelism implicit
  ● Three levels available
    ● Gang
    ● Worker
    ● Vector
● OpenMP
  ● Three constructs
    ● distribute
    ● do/for
    ● simd
  ● Nested parallelism explicit

Loop examples

OpenACC
    !$acc loop
    do i=1,n
      …
    enddo

OpenMP
    !$omp do or !$omp distribute
    do i=1,n
      …
    enddo
    !$omp end distribute or !$omp end do

Loop examples

OpenACC
    !$acc loop
    do i=1,n
      !$acc loop
      do j=1,m
        …
      enddo
    enddo

OpenMP
    !$omp distribute
    do i=1,n
      !$omp parallel do
      do j=1,m
        …
      enddo
      !$omp end parallel do
    enddo
    !$omp end distribute
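A C rendering of the same nested-loop pattern might look like the sketch below. The function names, the flattened `n` by `m` array `a`, the enclosing `parallel` and `target teams` regions, and the data clauses are assumptions added so the example is complete. The OpenACC version leaves the choice of parallelism level for each loop to the implementation; the OpenMP version states it explicitly, with `distribute` across teams and a `parallel for` inside each team.

```c
/* Hypothetical example: add 1 to every element of an n-by-m array. */

void add_one_acc(float *a, int n, int m)
{
    #pragma acc parallel copy(a[0:n*m])
    {
        #pragma acc loop
        for (int i = 0; i < n; ++i) {
            #pragma acc loop
            for (int j = 0; j < m; ++j)
                a[i * m + j] += 1.0f;
        }
    }
}

void add_one_omp(float *a, int n, int m)
{
    #pragma omp target teams map(tofrom: a[0:n*m])
    #pragma omp distribute
    for (int i = 0; i < n; ++i) {
        /* Nested parallelism is explicit: a parallel region per outer iteration. */
        #pragma omp parallel for
        for (int j = 0; j < m; ++j)
            a[i * m + j] += 1.0f;
    }
}
```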

Loop examples

OpenACC
    !$acc loop gang worker vector
    do i=1,n
      …
    enddo

OpenMP
    !$omp distribute parallel do simd
    do i=1,n
      …
    enddo
    !$omp end distribute parallel do simd
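In C the single-loop, all-levels form could be sketched as below. The function names, arrays, and data clauses are hypothetical. Both versions ask for one loop to be spread over every available level of parallelism: gang, worker, and vector in OpenACC; teams, threads, and SIMD lanes in OpenMP.

```c
/* Hypothetical example: c = a + b, element by element. */

void vadd_acc(const float *a, const float *b, float *c, int n)
{
    #pragma acc parallel loop gang worker vector copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

void vadd_omp(const float *a, const float *b, float *c, int n)
{
    #pragma omp target teams distribute parallel for simd map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```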

Unstructured data

● Separate the move to and the move from parts of data constructs
  ● Enter data
    ● Constructors
  ● Exit data
    ● Destructors
● OpenACC
  ● Added support in 2.0
● OpenMP
  ● Nearing completion of feature
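The constructor and destructor pairing can be sketched with the OpenACC 2.0 `enter data` and `exit data` directives. The `vec` type and the three functions below are hypothetical and exist only for this illustration. The device copy is created in one function and released in another, which a structured data region cannot express.

```c
#include <stdlib.h>

/* Hypothetical container whose device copy lives as long as the object. */
typedef struct {
    float *data;
    int n;
} vec;

/* "Constructor": allocate on the host, then create the device copy. */
vec *vec_create(int n)
{
    vec *v = malloc(sizeof *v);
    if (!v)
        return NULL;
    float *d = malloc((size_t)n * sizeof *d);
    if (!d) {
        free(v);
        return NULL;
    }
    v->data = d;
    v->n = n;
    #pragma acc enter data create(d[0:n])
    return v;
}

/* Use the device copy that the constructor created. */
void vec_fill(vec *v, float value)
{
    float *d = v->data;
    int n = v->n;
    #pragma acc parallel loop present(d[0:n])
    for (int i = 0; i < n; ++i)
        d[i] = value;
    #pragma acc update host(d[0:n])
}

/* "Destructor": release the device copy, then the host memory. */
void vec_destroy(vec *v)
{
    if (!v)
        return;
    float *d = v->data;
    #pragma acc exit data delete(d[0:v->n])
    free(d);
    free(v);
}
```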

Calls

● OpenACC
  ● Routine
    ● Only one type of parallelism allowed
      ● Gang
      ● Worker
      ● Vector
      ● Seq
    ● Hard on user
    ● Easy for implementer
● OpenMP
  ● Declare target
    ● Type of parallelism ignored
    ● Easy on user
    ● Hard for implementer
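A short C sketch of the two approaches, using hypothetical functions that are not from the slides: with OpenACC the user states the level of parallelism the routine is compiled for (`seq` here), while with OpenMP the function is only marked as callable on the device and no level is stated.

```c
/* Hypothetical example: sum of squares with a device-callable helper. */

/* OpenACC: the user must name the level of parallelism (seq here). */
#pragma acc routine seq
float square_acc(float x)
{
    return x * x;
}

/* OpenMP: the function is marked for the device; no level is named. */
#pragma omp declare target
float square_omp(float x)
{
    return x * x;
}
#pragma omp end declare target

float sum_squares_acc(const float *x, int n)
{
    float s = 0.0f;
    #pragma acc parallel loop copyin(x[0:n]) reduction(+:s)
    for (int i = 0; i < n; ++i)
        s += square_acc(x[i]);
    return s;
}

float sum_squares_omp(const float *x, int n)
{
    float s = 0.0f;
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: s) reduction(+:s)
    for (int i = 0; i < n; ++i)
        s += square_omp(x[i]);
    return s;
}
```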

Nested parallelism

● OpenACC
  ● Added in 2.0
  ● Currently no full implementations
    ● Why?
● OpenMP
  ● Parallel inside of teams is allowed
  ● Teams inside of teams is not allowed

What is next

● OpenACC
  ● Tools interfaces
  ● Better user defined type support
  ● …
● OpenMP
  ● What is next
    ● Unstructured data
    ● Declare target deferred_map
    ● Interoperability with accelerated libraries
    ● Multiple devices
    ● User defined type support
