CUDA Fortran Programming Guide and Fortran Programming Guide and Reference Version 2017 | viii PREFACE This document describes CUDA Fortran, a small set of extensions to Fortran that

Download CUDA Fortran Programming Guide and   Fortran Programming Guide and Reference Version 2017 | viii PREFACE This document describes CUDA Fortran, a small set of extensions to Fortran that

Post on 05-Mar-2018

216 views

Category:

Documents

3 download

Embed Size (px)

TRANSCRIPT

<ul><li><p>CUDA FORTRAN PROGRAMMING GUIDE ANDREFERENCE</p><p>Version2017</p></li><li><p>CUDA Fortran Programming Guide and Reference Version2017|ii</p><p>TABLE OF CONTENTS</p><p>Preface............................................................................................................viiiIntended Audience........................................................................................... viiiOrganization................................................................................................... viiiConventions....................................................................................................viiiTerminology..................................................................................................... ixRelated Publications........................................................................................... ix</p><p>Chapter1. Introduction.........................................................................................1Chapter2.Programming Guide............................................................................... 3</p><p>2.1.CUDA Fortran Host and Device Code..................................................................32.2.CUDA Fortran Kernels.................................................................................... 52.3.Thread Blocks............................................................................................. 62.4.Memory Hierarchy........................................................................................ 62.5.Subroutine / Function Qualifiers.......................................................................7</p><p>2.5.1.Attributes(host)...................................................................................... 72.5.2.Attributes(global)....................................................................................72.5.3.Attributes(device)................................................................................... 72.5.4.Restrictions........................................................................................... 8</p><p>2.6.Variable Qualifiers........................................................................................ 82.6.1.Attributes(device)................................................................................... 82.6.2.Attributes(managed)................................................................................ 82.6.3.Attributes(constant).................................................................................92.6.4.Attributes(shared)................................................................................... 92.6.5.Attributes(pinned)...................................................................................92.6.6.Attributes(texture).................................................................................. 9</p><p>2.7.Datatypes in Device Subprograms......................................................................92.8.Predefined Variables in Device Subprograms....................................................... 102.9.Execution Configuration................................................................................102.10.Asynchronous Concurrent Execution................................................................ 11</p><p>2.10.1.Concurrent Host and Device Execution....................................................... 112.10.2.Concurrent Stream Execution.................................................................. 11</p><p>2.11.Kernel Loop Directive................................................................................. 122.11.1. Syntax.............................................................................................. 122.11.2.Restrictions on the CUF kernel directive..................................................... 132.11.3.Summation Example..............................................................................14</p><p>2.12.Using Fortran Modules................................................................................ 142.12.1.Accessing Data from Other Modules...........................................................152.12.2.Call Routines from Other Modules............................................................. 152.12.3.Declaring Device Pointer and Target Arrays..................................................162.12.4.Declaring Textures................................................................................18</p><p>2.13.Building a CUDA Fortran Program...................................................................20</p></li><li><p>CUDA Fortran Programming Guide and Reference Version2017|iii</p><p>2.14.Emulation Mode........................................................................................ 21Chapter3.Reference..........................................................................................22</p><p>3.1.New Subroutine and Function Attributes............................................................223.1.1.Host Subroutines and Functions................................................................. 223.1.2.Global Subroutines.................................................................................223.1.3.Device Subroutines and Functions.............................................................. 233.1.4.Restrictions on Device Subprograms............................................................ 23</p><p>3.2.Variable Attributes...................................................................................... 233.2.1.Device data......................................................................................... 243.2.2.Managed data.......................................................................................243.2.3.Pinned arrays....................................................................................... 253.2.4.Constant data.......................................................................................263.2.5. Shared data......................................................................................... 273.2.6.Texture data........................................................................................ 283.2.7.Value dummy arguments..........................................................................28</p><p>3.3.Allocating Device Memory, Pinned Memory, and Managed Memory..............................293.3.1.Allocating Device Memory........................................................................ 293.3.2.Allocating Device Memory Using Runtime Routines...........................................293.3.3.Allocate Pinned Memory.......................................................................... 303.3.4.Allocating Managed Memory......................................................................303.3.5.Allocating Managed Memory Using Runtime Routines........................................ 30</p><p>3.4.Data transfer between host and device memory.................................................. 313.4.1.Data Transfer Using Assignment Statements.................................................. 313.4.2.Implicit Data Transfer in Expressions........................................................... 323.4.3.Data Transfer Using Runtime Routines......................................................... 32</p><p>3.5. Invoking a kernel subroutine.......................................................................... 333.6.Device code.............................................................................................. 33</p><p>3.6.1.Datatypes Allowed................................................................................. 343.6.2.Built-in variables................................................................................... 343.6.3.Fortran Intrinsics...................................................................................353.6.4.New Intrinsic Functions........................................................................... 363.6.5.Warp-Vote Operations............................................................................. 393.6.6.Atomic Functions...................................................................................403.6.7.Fortran I/O..........................................................................................42</p><p>3.6.7.1.PRINT Example................................................................................ 423.6.8.Shuffle Functions...................................................................................423.6.9.Restrictions..........................................................................................44</p><p>3.7.Host code................................................................................................. 443.7.1.SIZEOF Intrinsic.....................................................................................443.7.2.Overloaded Reduction Intrinsics................................................................. 44</p><p>3.8.Fortran Modules......................................................................................... 453.8.1.Device Modules..................................................................................... 463.8.2.Host Modules........................................................................................48</p></li><li><p>CUDA Fortran Programming Guide and Reference Version2017|iv</p><p>Chapter4.Runtime APIs...................................................................................... 534.1. Initialization.............................................................................................. 534.2.Device Management.....................................................................................53</p><p>4.2.1.cudaChooseDevice................................................................................. 534.2.2.cudaDeviceGetCacheConfig.......................................................................534.2.3.cudaDeviceGetLimit............................................................................... 544.2.4.cudaDeviceGetSharedMemConfig................................................................ 544.2.5.cudaDeviceGetStreamPriorityRange.............................................................544.2.6.cudaDeviceReset................................................................................... 544.2.7.cudaDeviceSetCacheConfig....................................................................... 544.2.8.cudaDeviceSetLimit................................................................................ 554.2.9.cudaDeviceSetSharedMemConfig.................................................................554.2.10.cudaDeviceSynchronize.......................................................................... 554.2.11.cudaGetDevice.................................................................................... 554.2.12.cudaGetDeviceCount............................................................................. 554.2.13.cudaGetDeviceProperties........................................................................ 554.2.14.cudaSetDevice.....................................................................................564.2.15.cudaSetDeviceFlags...............................................................................564.2.16.cudaSetValidDevices..............................................................................56</p><p>4.3.Thread Management.................................................................................... 564.3.1.cudaThreadExit..................................................................................... 564.3.2.cudaThreadSynchronize........................................................................... 56</p><p>4.4.Error Handling........................................................................................... 574.4.1.cudaGetErrorString.................................................................................574.4.2.cudaGetLastError...................................................................................574.4.3.cudaPeekAtLastError...............................................................................57</p><p>4.5.Stream Management.................................................................................... 574.5.1.cudaforGetDefaultStream.........................................................................574.5.2.cudaforSetDefaultStream......................................................................... 584.5.3.cudaStreamAttachMemAsync..................................................................... 584.5.4.cudaStreamCreate................................................................................. 584.5.5.cudaStreamCreateWithFlags......................................................................584.5.6.cudaStreamCreateWithPriority................................................................... 594.5.7.cudaStreamDestroy................................................................................ 594.5.8.cudaStreamGetPriority............................................................................ 594.5.9.cudaStreamQuery.................................................................................. 594.5.10.cudaStreamSynchronize..........................................................................594.5.11.cudaStreamWaitEvent............................................................................ 59</p><p>4.6.Event Management...................................................................................... 604.6.1.cudaEventCreate................................................................................... 604.6.2.cudaEventCreateWithFlags....................................................................... 604.6.3.cudaEventDestroy.................................................................................. 604.6.4.cudaEventElapsedTime............................................................................ 60</p></li><li><p>CUDA Fortran Programming Guide and Reference Version2017|v</p><p>4.6.5.cudaEventQuery.................................................................................... 604.6.6.cudaEventRecord................................................................................... 614.6.7.cudaEventSynchronize.............................................................................61</p><p>4.7.Execution Control....................................................................................... 614.7.1.cudaFuncGetAttributes............................................................................614.7.2.cudaFuncSetCacheConfig......................................................................... 614.7.3.cudaFuncSetSharedMemConfig................................................................... 614.7.4.cudaSetDoubleForDevice.......................................................................... 624.7.5.cudaSetDoubleForHost.............................................................................62</p><p>4.8.Memory Management................................................................................... 624.8.1. cudaFree.............................................................................................624.8.2.cudaFreeArray........</p></li></ul>