CUDA Math API - Rice University · 14 June 2019


  • CUDA MATH API

    vRelease Version | July 2019

    API Reference Manual

  • www.nvidia.com
    CUDA Math API vRelease Version | ii

    TABLE OF CONTENTS

    Chapter 1. Modules

    1.1. Half Precision Intrinsics
      Half Arithmetic Functions, Half2 Arithmetic Functions, Half Comparison Functions, Half2 Comparison Functions, Half Precision Conversion And Data Movement, Half Math Functions, Half2 Math Functions

    1.1.1. Half Arithmetic Functions
      __h2div, __hadd, __hadd_sat, __hdiv, __hfma, __hfma_sat, __hmul, __hmul_sat, __hneg, __hsub, __hsub_sat

    1.1.2. Half2 Arithmetic Functions
      __hadd2, __hadd2_sat, __hfma2, __hfma2_sat, __hmul2, __hmul2_sat, __hneg2, __hsub2, __hsub2_sat

    1.1.3. Half Comparison Functions
      __heq, __hequ, __hge, __hgeu, __hgt, __hgtu, __hisinf, __hisnan,
      __hle, __hleu, __hlt, __hltu, __hne, __hneu

    1.1.4. Half2 Comparison Functions
      __hbeq2, __hbequ2, __hbge2, __hbgeu2, __hbgt2, __hbgtu2, __hble2, __hbleu2, __hblt2, __hbltu2, __hbne2, __hbneu2, __heq2, __hequ2, __hge2, __hgeu2, __hgt2, __hgtu2, __hisnan2, __hle2, __hleu2, __hlt2, __hltu2, __hne2, __hneu2

    1.1.5. Half Precision Conversion And Data Movement
      __float22half2_rn, __float2half, __float2half2_rn, __float2half_rd, __float2half_rn, __float2half_ru, __float2half_rz, __floats2half2_rn, __half22float2, __half2float,
      __half2half2, __half2int_rd, __half2int_rn, __half2int_ru, __half2int_rz, __half2ll_rd, __half2ll_rn, __half2ll_ru, __half2ll_rz, __half2short_rd, __half2short_rn, __half2short_ru, __half2short_rz, __half2uint_rd, __half2uint_rn, __half2uint_ru, __half2uint_rz, __half2ull_rd, __half2ull_rn, __half2ull_ru, __half2ull_rz, __half2ushort_rd, __half2ushort_rn, __half2ushort_ru, __half2ushort_rz, __half_as_short, __half_as_ushort, __halves2half2, __high2float, __high2half, __high2half2, __highs2half2, __int2half_rd, __int2half_rn, __int2half_ru, __int2half_rz, __ll2half_rd, __ll2half_rn, __ll2half_ru, __ll2half_rz, __low2float, __low2half, __low2half2,
      __lowhigh2highlow, __lows2half2, __shfl_down_sync (two overloads), __shfl_sync (two overloads), __shfl_up_sync (two overloads), __shfl_xor_sync (two overloads), __short2half_rd, __short2half_rn, __short2half_ru, __short2half_rz, __short_as_half, __uint2half_rd, __uint2half_rn, __uint2half_ru, __uint2half_rz, __ull2half_rd, __ull2half_rn, __ull2half_ru, __ull2half_rz, __ushort2half_rd, __ushort2half_rn, __ushort2half_ru, __ushort2half_rz, __ushort_as_half

    1.1.6. Half Math Functions
      hceil, hcos, hexp, hexp10, hexp2, hfloor, hlog, hlog10, hlog2, hrcp, hrint, hrsqrt, hsin, hsqrt,
      htrunc

    1.1.7. Half2 Math Functions
      h2ceil, h2cos, h2exp, h2exp10, h2exp2, h2floor, h2log, h2log10, h2log2, h2rcp, h2rint, h2rsqrt, h2sin, h2sqrt, h2trunc

    1.2. Mathematical Functions

    1.3. Single Precision Mathematical Functions
      acosf, acoshf, asinf, asinhf, atan2f, atanf, atanhf, cbrtf, ceilf, copysignf, cosf, coshf, cospif, cyl_bessel_i0f, cyl_bessel_i1f, erfcf, erfcinvf, erfcxf, erff, erfinvf, exp10f, exp2f, expf, expm1f,
      fabsf, fdimf, fdividef, floorf, fmaf, fmaxf, fminf, fmodf, frexpf, hypotf, ilogbf, isfinite, isinf, isnan, j0f, j1f, jnf, ldexpf, lgammaf, llrintf, llroundf, log10f, log1pf, log2f, logbf, logf, lrintf, lroundf, modff, nanf, nearbyintf, nextafterf, norm3df, norm4df, normcdff, normcdfinvf, normf, powf, rcbrtf, remainderf, remquof, rhypotf, rintf,
      rnorm3df, rnorm4df, rnormf, roundf, rsqrtf, scalblnf, scalbnf, signbit, sincosf, sincospif, sinf, sinhf, sinpif, sqrtf, tanf, tanhf, tgammaf, truncf, y0f, y1f, ynf

    1.4. Double Precision Mathematical Functions
      acos, acosh, asin, asinh, atan, atan2, atanh, cbrt, ceil, copysign, cos, cosh, cospi, cyl_bessel_i0, cyl_bessel_i1, erf, erfc, erfcinv, erfcx, erfinv, exp,

      exp10
      exp2
      expm1
      fabs
      fdim
      floor
      fma
      fmax
      fmin
      fmod
      frexp
      hypot
      ilogb
      isfinite
      isinf
      isnan
      j0
      j1
      jn
      ldexp
      lgamma
      llrint
      llround
      log
      log10
      log1p
      log2
      logb
      lrint
      lround
      modf
      nan
      nearbyint
      nextafter
      norm
      norm3d
      norm4d
      normcdf
      normcdfinv
      pow
      rcbrt
      remainder
      remquo


      rhypot
      rint
      rnorm
      rnorm3d
      rnorm4d
      round
      rsqrt
      scalbln
      scalbn
      signbit
      sin
      sincos
      sincospi
      sinh
      sinpi
      sqrt
      tan
      tanh
      tgamma
      trunc
      y0
      y1
      yn

    1.5. Single Precision Intrinsics
      __cosf
      __exp10f
      __expf
      __fadd_rd
      __fadd_rn
      __fadd_ru
      __fadd_rz
      __fdiv_rd
      __fdiv_rn
      __fdiv_ru
      __fdiv_rz
      __fdividef
      __fmaf_rd
      __fmaf_rn
      __fmaf_ru
      __fmaf_rz
      __fmul_rd
      __fmul_rn
      __fmul_ru


      __fmul_rz
      __frcp_rd
      __frcp_rn
      __frcp_ru
      __frcp_rz
      __frsqrt_rn
      __fsqrt_rd
      __fsqrt_rn
      __fsqrt_ru
      __fsqrt_rz
      __fsub_rd
      __fsub_rn
      __fsub_ru
      __fsub_rz
      __log10f
      __log2f
      __logf
      __powf
      __saturatef
      __sincosf
      __sinf
      __tanf

    1.6. Double Precision Intrinsics
      __dadd_rd
      __dadd_rn
      __dadd_ru
      __dadd_rz
      __ddiv_rd
      __ddiv_rn
      __ddiv_ru
      __ddiv_rz
      __dmul_rd
      __dmul_rn
      __dmul_ru
      __dmul_rz
      __drcp_rd
      __drcp_rn
      __drcp_ru
      __drcp_rz
      __dsqrt_rd
      __dsqrt_rn
      __dsqrt_ru
      __dsqrt_rz


      __dsub_rd
      __dsub_rn
      __dsub_ru
      __dsub_rz
      __fma_rd
      __fma_rn
      __fma_ru
      __fma_rz

    1.7. Integer Intrinsics
      __brev
      __brevll
      __byte_perm
      __clz
      __clzll
      __ffs
      __ffsll
      __funnelshift_l
      __funnelshift_lc
      __funnelshift_r
      __funnelshift_rc
      __hadd
      __mul24
      __mul64hi
      __mulhi
      __popc
      __popcll
      __rhadd
      __sad
      __uhadd
      __umul24
      __umul64hi
      __umulhi
      __urhadd
      __usad

    1.8. Type Casting Intrinsics
      __double2float_rd
      __double2float_rn
      __double2float_ru
      __double2float_rz
      __double2hiint
      __double2int_rd
      __double2int_rn
      __double2int_ru


      __double2int_rz
      __double2ll_rd
      __double2ll_rn
      __double2ll_ru
      __double2ll_rz
      __double2loint
      __double2uint_rd
      __double2uint_rn
      __double2uint_ru
      __double2uint_rz
      __double2ull_rd
      __double2ull_rn
      __double2ull_ru
      __double2ull_rz
      __double_as_longlong
      __float2int_rd
      __float2int_rn
      __float2int_ru
      __float2int_rz
      __float2ll_rd
      __float2ll_rn
      __float2ll_ru
      __float2ll_rz
      __float2uint_rd
      __float2uint_rn
      __float2uint_ru
      __float2uint_rz
      __float2ull_rd
      __float2ull_rn
      __float2ull_ru
      __float2ull_rz
      __float_as_int
      __float_as_uint
      __hiloint2double
      __int2double_rn
      __int2float_rd
      __int2float_rn
      __int2float_ru
      __int2float_rz
      __int_as_float
      __ll2double_rd
      __ll2double_rn
      __ll2double_ru


      __ll2double_rz
      __ll2float_rd
      __ll2float_rn
      __ll2float_ru
      __ll2float_rz
      __longlong_as_double
      __uint2double_rn
      __uint2float_rd
      __uint2float_rn
      __uint2float_ru
      __uint2float_rz
      __uint_as_float
      __ull2double_rd
      __ull2double_rn
      __ull2double_ru
      __ull2double_rz
      __ull2float_rd
      __ull2float_rn
      __ull2float_ru
      __ull2float_rz

    1.9. SIMD Intrinsics
      __vabs2
      __vabs4
      __vabsdiffs2
      __vabsdiffs4
      __vabsdiffu2
      __vabsdiffu4
      __vabsss2
      __vabsss4
      __vadd2
      __vadd4
      __vaddss2
      __vaddss4
      __vaddus2
      __vaddus4
      __vavgs2
      __vavgs4
      __vavgu2
      __vavgu4
      __vcmpeq2
      __vcmpeq4
      __vcmpges2
      __vcmpges4


      __vcmpgeu2
      __vcmpgeu4
      __vcmpgts2
      __vcmpgts4
      __vcmpgtu2
      __vcmpgtu4
      __vcmples2
      __vcmples4
      __vcmpleu2
      __vcmpleu4
      __vcmplts2
      __vcmplts4
      __vcmpltu2
      __vcmpltu4
      __vcmpne2
      __vcmpne4
      __vhaddu2
      __vhaddu4
      __vmaxs2
      __vmaxs4
      __vmaxu2
      __vmaxu4
      __vmins2
      __vmins4
      __vminu2
      __vminu4
      __vneg2
      __vneg4
      __vnegss2
      __vnegss4
      __vsads2
      __vsads4
      __vsadu2
      __vsadu4
      __vseteq2
      __vseteq4
      __vsetges2
      __vsetges4
      __vsetgeu2
      __vsetgeu4
      __vsetgts2
      __vsetgts4
      __vsetgtu2

  • www.nvidia.comCUDA Math API vRelease Version | xvi

    __vsetgtu4..................................................................................................237__vsetles2.................................................................................................. 237__vsetles4.................................................................................................. 238__vsetleu2.................................................................................................. 238__vsetleu4.................................................................................................. 238__vsetlts2...................................................................................................239__vsetlts4...................................................................................................239__vsetltu2.................................................................................................. 239__vsetltu4.................................................................................................. 240__vsetne2...................................................................................................240__vsetne4...................................................................................................240__vsub2..................................................................................................... 241__vsub4..................................................................................................... 241__vsubss2................................................................................................... 241__vsubss4................................................................................................... 242__vsubus2...................................................................................................242__vsubus4...................................................................................................242

    Chapter 1. MODULES

    Here is a list of all modules:

    ‣ Half Precision Intrinsics

        ‣ Half Arithmetic Functions
        ‣ Half2 Arithmetic Functions
        ‣ Half Comparison Functions
        ‣ Half2 Comparison Functions
        ‣ Half Precision Conversion And Data Movement
        ‣ Half Math Functions
        ‣ Half2 Math Functions

    ‣ Mathematical Functions

    ‣ Single Precision Mathematical Functions

    ‣ Double Precision Mathematical Functions

    ‣ Single Precision Intrinsics

    ‣ Double Precision Intrinsics

    ‣ Integer Intrinsics

    ‣ Type Casting Intrinsics

    ‣ SIMD Intrinsics

    1.1. Half Precision Intrinsics

    This section describes half precision intrinsic functions that are only supported in device code. To use these functions, include the header file cuda_fp16.h in your program.
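    Because these intrinsics run only in device code, a typical use converts inputs to __half inside a kernel, operates on them, and converts the result back. The sketch below illustrates that pattern with __hadd. The names add_as_half_kernel and add_as_half are hypothetical, the sketch assumes a GPU of compute capability 5.3 or newer (for example, compiled with nvcc -arch=sm_53), and managed memory is used only to keep the host code short.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: out[i] = a[i] + b[i], computed in half precision.
// Assumes compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void add_as_half_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half ha = __float2half(a[i]);         // float -> half, round-to-nearest-even
        __half hb = __float2half(b[i]);
        out[i] = __half2float(__hadd(ha, hb));  // half addition, widened back to float
    }
}

// Hypothetical host helper: stages a and b in managed memory, runs the kernel,
// and copies the n results into out. Returns false if any CUDA call fails.
bool add_as_half(const float* a, const float* b, float* out, int n) {
    if (n <= 0) return false;
    float* buf = nullptr;  // a, b and the results, stored back to back
    if (cudaMallocManaged(&buf, 3 * n * sizeof(float)) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) {
        buf[i] = a[i];
        buf[n + i] = b[i];
    }
    add_as_half_kernel<<<(n + 255) / 256, 256>>>(buf, buf + n, buf + 2 * n, n);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) out[i] = buf[2 * n + i];
    cudaFree(buf);
    return ok;
}
```

    With inputs such as 1.5 and 2.25, which are exactly representable in half precision, the sum 3.75 should come back unchanged; a value like 0.1 is rounded when converted to __half, so results can differ from a float computation.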

    Half Arithmetic Functions

    Half2 Arithmetic Functions

    Half Comparison Functions

    Half2 Comparison Functions

    Half Precision Conversion And Data Movement

    Half Math Functions

    Half2 Math Functions

    1.1.1. Half Arithmetic Functions

    To use these functions, include the header file cuda_fp16.h in your program.

    __device__ __half2 __h2div (const __half2 a, const __half2 b)

    Performs half2 vector division in round-to-nearest-even mode.

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    half2

    ‣ The elementwise division of a by b.

    Description

    Divides half2 input vector a by input vector b in round-to-nearest-even mode.
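    A minimal sketch of __h2div, assuming compute capability 5.3+ and using the hypothetical names h2div_demo_kernel and run_h2div_demo. It packs two lanes with __floats2half2_rn and reads them back with __low2float and __high2float; the operands are chosen so that both quotients (0.25 and 1.5) are exact in half precision.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: divides the lanes (1, 3) by (4, 2) with __h2div.
// Assumes compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void h2div_demo_kernel(float* out) {
    const __half2 num = __floats2half2_rn(1.0f, 3.0f);  // low lane 1, high lane 3
    const __half2 den = __floats2half2_rn(4.0f, 2.0f);  // low lane 4, high lane 2
    const __half2 q = __h2div(num, den);
    out[0] = __low2float(q);   // 1 / 4
    out[1] = __high2float(q);  // 3 / 2
}

// Hypothetical host helper: runs the kernel once and copies both lanes into out.
bool run_h2div_demo(float* out) {
    float* d = nullptr;
    if (cudaMallocManaged(&d, 2 * sizeof(float)) != cudaSuccess) return false;
    h2div_demo_kernel<<<1, 1>>>(d);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < 2; ++i) out[i] = d[i];
    cudaFree(d);
    return ok;
}
```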

    __device__ __half __hadd (const __half a, const __half b)

    Performs half addition in round-to-nearest-even mode.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    half

    ‣ The sum of a and b.

    Description

    Performs half addition of inputs a and b, in round-to-nearest-even mode.

    __device__ __half __hadd_sat (const __half a, const __half b)

    Performs half addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    half

    ‣ The sum of a and b, saturated to [0.0, 1.0].

    Description

    Performs half addition of inputs a and b, in round-to-nearest-even mode, and clamps the result to the range [0.0, 1.0]. NaN results are flushed to +0.0.
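    The saturation rules can be checked with a short sketch. It assumes compute capability 5.3+ (for example, nvcc -arch=sm_53), uses the hypothetical names hadd_sat_demo_kernel and run_hadd_sat_demo, and builds a NaN from the bit pattern 0x7E00 with __ushort_as_half from the conversion functions in cuda_fp16.h. The expected results (1.0, 0.0 and +0.0) follow from the description above.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel exercising the clamping and NaN rules of __hadd_sat.
// Assumes compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void hadd_sat_demo_kernel(float* out) {
    const __half nan_h = __ushort_as_half(0x7E00);  // a quiet-NaN binary16 bit pattern
    out[0] = __half2float(__hadd_sat(__float2half(0.75f), __float2half(0.5f)));    // 1.25 -> 1.0
    out[1] = __half2float(__hadd_sat(__float2half(-0.75f), __float2half(0.25f)));  // -0.5 -> 0.0
    out[2] = __half2float(__hadd_sat(nan_h, __float2half(1.0f)));                  // NaN -> +0.0
}

// Hypothetical host helper: runs the kernel once and copies the three results into out.
bool run_hadd_sat_demo(float* out) {
    float* d = nullptr;
    if (cudaMallocManaged(&d, 3 * sizeof(float)) != cudaSuccess) return false;
    hadd_sat_demo_kernel<<<1, 1>>>(d);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < 3; ++i) out[i] = d[i];
    cudaFree(d);
    return ok;
}
```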

    __device__ __half __hdiv (const __half a, const __half b)

    Performs half division in round-to-nearest-even mode.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    half

    ‣ The result of dividing a by b.

    Description

    Divides half input a by input b in round-to-nearest-even mode.

    __device__ __half __hfma (const __half a, const __half b, const __half c)

    Performs half fused multiply-add in round-to-nearest-even mode.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    c - half. Is only being read.

    Returns

    half

    ‣ The result of the fused multiply-add operation on a, b, and c.

    Description

    Performs half multiplication of inputs a and b, then adds c to the unrounded product, rounding the final result only once in round-to-nearest-even mode.
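    The single rounding is what distinguishes __hfma from a multiply followed by an add. In the sketch below (hypothetical names, compute capability 5.3+ assumed), a = 1 + 2^-6 and c = -(1 + 2^-5). The exact product a * a = 1 + 2^-5 + 2^-12 needs more than the 11 significand bits of half precision, so rounding it on its own would give 1 + 2^-5, and adding c would then yield 0. Because __hfma rounds only the final value, the result should be exactly 2^-12 (0.000244140625).

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: computes a * a + c with a single rounding.
// Assumes compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void hfma_demo_kernel(float* out) {
    const __half a = __float2half(1.015625f);  // 1 + 2^-6, exact in binary16
    const __half c = __float2half(-1.03125f);  // -(1 + 2^-5), exact in binary16
    // The exact value of a * a + c is 2^-12; only that final value is rounded.
    out[0] = __half2float(__hfma(a, a, c));
}

// Hypothetical host helper: runs the kernel once and copies the result into out.
bool run_hfma_demo(float* out) {
    float* d = nullptr;
    if (cudaMallocManaged(&d, sizeof(float)) != cudaSuccess) return false;
    hfma_demo_kernel<<<1, 1>>>(d);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    if (ok) out[0] = d[0];
    cudaFree(d);
    return ok;
}
```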

    __device__ __half __hfma_sat (const __half a, const __half b, const __half c)

    Performs half fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    c - half. Is only being read.

    Returns

    half

    ‣ The result of the fused multiply-add operation on a, b, and c, saturated to [0.0, 1.0].

    Description

    Performs half multiplication of inputs a and b, then adds c to the unrounded product, rounding the final result only once in round-to-nearest-even mode, and clamps the result to the range [0.0, 1.0]. NaN results are flushed to +0.0.

    __device__ __half __hmul (const __half a, const __half b)

    Performs half multiplication in round-to-nearest-even mode.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    half

    ‣ The result of multiplying a and b.

    Description

    Performs half multiplication of inputs a and b, in round-to-nearest-even mode.

    __device__ __half __hmul_sat (const __half a, const __half b)

    Performs half multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    half

    ‣ The result of multiplying a and b, saturated to [0.0, 1.0].

    Description

    Performs half multiplication of inputs a and b, in round-to-nearest-even mode, and clamps the result to the range [0.0, 1.0]. NaN results are flushed to +0.0.

    __device__ __half __hneg (const __half a)

    Negates input half number and returns the result.

    Parameters

    a - half. Is only being read.

    Returns

    half

    ‣ -a, the negation of a.

    Description

    Negates input half number a and returns the result.

    __device__ __half __hsub (const __half a, const __half b)

    Performs half subtraction in round-to-nearest-even mode.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    half

    ‣ The result of subtracting b from a.

    Description

    Subtracts half input b from input a in round-to-nearest-even mode.

    __device__ __half __hsub_sat (const __half a, const __half b)

    Performs half subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    half

    ‣ The result of subtracting b from a, saturated to [0.0, 1.0].

    Description

    Subtracts half input b from input a in round-to-nearest-even mode, and clamps the result to the range [0.0, 1.0]. NaN results are flushed to +0.0.

    1.1.2. Half2 Arithmetic Functions

    To use these functions, include the header file cuda_fp16.h in your program.

    __device__ __half2 __hadd2 (const __half2 a, const __half2 b)

    Performs half2 vector addition in round-to-nearest-even mode.

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    half2

    ‣ The sum of vectors a and b.

    Description

    Performs half2 vector addition of inputs a and b, in round-to-nearest-even mode.

    __device__ __half2 __hadd2_sat (const __half2 a, const __half2 b)

    Performs half2 vector addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    half2

    ‣ The sum of a and b, saturated to [0.0, 1.0].

    Description

    Performs half2 vector addition of inputs a and b, in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.

    __device__ __half2 __hfma2 (const __half2 a, const __half2 b, const __half2 c)

    Performs half2 vector fused multiply-add in round-to-nearest-even mode.

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    c - half2. Is only being read.

    Returns

    half2

    ‣ The result of the elementwise fused multiply-add operation on vectors a, b, and c.

    Description

    Performs half2 vector multiplication of inputs a and b, then adds c to the unrounded products, rounding each result only once in round-to-nearest-even mode.
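    A common use of __hfma2 is processing two elements per instruction. The sketch below computes y = alpha * x + y, with each thread handling one pair of elements packed into a __half2. The names haxpy2_kernel and haxpy2 are hypothetical, the element count is assumed to be even, and compute capability 5.3+ is assumed. For brevity the data stay in float in memory and are packed with __floats2half2_rn and __float2half2_rn; a production kernel would more likely store __half2 arrays directly.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: y = alpha * x + y, two elements per thread packed in __half2.
// Assumes n is even and compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void haxpy2_kernel(float alpha, const float* x, float* y, int n) {
    int i = 2 * (blockIdx.x * blockDim.x + threadIdx.x);  // first element of this thread's pair
    if (i + 1 < n) {
        const __half2 a2 = __float2half2_rn(alpha);            // alpha in both lanes
        const __half2 x2 = __floats2half2_rn(x[i], x[i + 1]);  // low lane x[i], high lane x[i+1]
        const __half2 y2 = __floats2half2_rn(y[i], y[i + 1]);
        const __half2 r = __hfma2(a2, x2, y2);                 // one rounding per lane
        y[i] = __low2float(r);
        y[i + 1] = __high2float(r);
    }
}

// Hypothetical host helper: updates y in place. Returns false for an empty or odd n,
// or if any CUDA call fails.
bool haxpy2(float alpha, const float* x, float* y, int n) {
    if (n <= 0 || n % 2 != 0) return false;
    float* buf = nullptr;  // x followed by y
    if (cudaMallocManaged(&buf, 2 * n * sizeof(float)) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) {
        buf[i] = x[i];
        buf[n + i] = y[i];
    }
    const int pairs = n / 2;
    haxpy2_kernel<<<(pairs + 255) / 256, 256>>>(alpha, buf, buf + n, n);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) y[i] = buf[n + i];
    cudaFree(buf);
    return ok;
}
```

    Packing two values per __half2 halves the number of arithmetic instructions compared with scalar __hfma calls, which is the usual motivation for the half2 variants.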

    __device__ __half2 __hfma2_sat (const __half2 a, const __half2 b, const __half2 c)

    Performs half2 vector fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    c - half2. Is only being read.

    Returns

    half2

    ‣ The result of the elementwise fused multiply-add operation on vectors a, b, and c, saturated to [0.0, 1.0].

    Description

    Performs half2 vector multiplication of inputs a and b, then adds c to the unrounded products, rounding each result only once in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.

    __device__ __half2 __hmul2 (const __half2 a, const __half2 b)

    Performs half2 vector multiplication in round-to-nearest-even mode.

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    half2

    ‣ The result of elementwise multiplication of vectors a and b.

    Description

    Performs half2 vector multiplication of inputs a and b, in round-to-nearest-even mode.

    __device__ __half2 __hmul2_sat (const __half2 a, const __half2 b)

    Performs half2 vector multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    half2

    ‣ The result of elementwise multiplication of vectors a and b, saturated to [0.0, 1.0].

    Description

    Performs half2 vector multiplication of inputs a and b, in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.

    __device__ __half2 __hneg2 (const __half2 a)

    Negates both halves of the input half2 number and returns the result.

    Parameters

    a - half2. Is only being read.

    Returns

    half2

    ‣ a with both halves negated.

    Description

    Negates both halves of the input half2 number a and returns the result.

    __device__ __half2 __hsub2 (const __half2 a, const __half2 b)

    Performs half2 vector subtraction in round-to-nearest-even mode.

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    half2

    ‣ The result of subtracting vector b from a.

    Description

    Subtracts half2 input vector b from input vector a in round-to-nearest-even mode.

    __device__ __half2 __hsub2_sat (const __half2 a, const __half2 b)

    Performs half2 vector subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    half2

    ‣ The result of subtracting vector b from a, saturated to [0.0, 1.0].

    Description

    Subtracts half2 input vector b from input vector a in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.

    1.1.3. Half Comparison Functions

    To use these functions, include the header file cuda_fp16.h in your program.

    __device__ bool __heq (const __half a, const __half b)

    Performs half if-equal comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of if-equal comparison of a and b.

    Description

    Performs half if-equal comparison of inputs a and b. NaN inputs generate false results.

    __device__ bool __hequ (const __half a, const __half b)

    Performs half unordered if-equal comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of unordered if-equal comparison of a and b.

    Description

    Performs half unordered if-equal comparison of inputs a and b. NaN inputs generate true results.
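    The difference between the ordered and unordered forms only shows up when an operand is NaN. A minimal sketch, with hypothetical kernel and helper names, compute capability 5.3+ assumed, and a NaN built from the bit pattern 0x7E00 via __ushort_as_half:

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel contrasting __heq (ordered) with __hequ (unordered).
// Assumes compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void heq_demo_kernel(int* out) {
    const __half one = __float2half(1.0f);
    const __half two = __float2half(2.0f);
    const __half nan_h = __ushort_as_half(0x7E00);  // a quiet-NaN binary16 bit pattern
    out[0] = __heq(one, one);       // ordered, equal numbers: true
    out[1] = __heq(nan_h, nan_h);   // ordered, NaN operand: false
    out[2] = __hequ(nan_h, nan_h);  // unordered, NaN operand: true
    out[3] = __hequ(one, two);      // unordered, unequal numbers: false
}

// Hypothetical host helper: runs the kernel once and copies the four results into out.
bool run_heq_demo(int* out) {
    int* d = nullptr;
    if (cudaMallocManaged(&d, 4 * sizeof(int)) != cudaSuccess) return false;
    heq_demo_kernel<<<1, 1>>>(d);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < 4; ++i) out[i] = d[i];
    cudaFree(d);
    return ok;
}
```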

    __device__ bool __hge (const __half a, const __half b)

    Performs half greater-equal comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of greater-equal comparison of a and b.

    Description

    Performs half greater-equal comparison of inputs a and b. NaN inputs generate false results.

    __device__ bool __hgeu (const __half a, const __half b)

    Performs half unordered greater-equal comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of unordered greater-equal comparison of a and b.

    Description

    Performs half unordered greater-equal comparison of inputs a and b. NaN inputs generate true results.

    __device__ bool __hgt (const __half a, const __half b)

    Performs half greater-than comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of greater-than comparison of a and b.

    Description

    Performs half greater-than comparison of inputs a and b. NaN inputs generate false results.

    __device__ bool __hgtu (const __half a, const __half b)

    Performs half unordered greater-than comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of unordered greater-than comparison of a and b.

    Description

    Performs half unordered greater-than comparison of inputs a and b. NaN inputs generate true results.

    __device__ int __hisinf (const __half a)

    Checks if the input half number is infinite.

    Parameters

    a - half. Is only being read.

    Returns

    int

    ‣ -1 iff a is equal to negative infinity,

    ‣ 1 iff a is equal to positive infinity,

    ‣ 0 otherwise.

    Description

    Checks if the input half number a is infinite.

    __device__ bool __hisnan (const __half a)

    Determines whether the half argument is a NaN.

    Parameters

    a - half. Is only being read.

    Returns

    bool

    ‣ true iff the argument is NaN.

    Description

    Determines whether half value a is a NaN.
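    __hisinf distinguishes the sign of an infinity, while __hisnan only reports whether a value is NaN. The sketch below builds the special values from their binary16 bit patterns (0x7C00 for +infinity, 0xFC00 for -infinity, 0x7E00 for a quiet NaN) with __ushort_as_half; the kernel and helper names are hypothetical and compute capability 5.3+ is assumed.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel classifying special half values with __hisinf and __hisnan.
// Assumes compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void classify_demo_kernel(int* out) {
    const __half pos_inf = __ushort_as_half(0x7C00);  // +infinity in binary16
    const __half neg_inf = __ushort_as_half(0xFC00);  // -infinity in binary16
    const __half nan_h = __ushort_as_half(0x7E00);    // a quiet NaN in binary16
    out[0] = __hisinf(pos_inf);  // 1
    out[1] = __hisinf(neg_inf);  // -1
    out[2] = __hisinf(nan_h);    // 0: a NaN is not infinite
    out[3] = __hisnan(nan_h);    // true
    out[4] = __hisnan(pos_inf);  // false: infinity is not a NaN
}

// Hypothetical host helper: runs the kernel once and copies the five results into out.
bool run_classify_demo(int* out) {
    int* d = nullptr;
    if (cudaMallocManaged(&d, 5 * sizeof(int)) != cudaSuccess) return false;
    classify_demo_kernel<<<1, 1>>>(d);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < 5; ++i) out[i] = d[i];
    cudaFree(d);
    return ok;
}
```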

    __device__ bool __hle (const __half a, const __half b)

    Performs half less-equal comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of less-equal comparison of a and b.

    Description

    Performs half less-equal comparison of inputs a and b. NaN inputs generate false results.

    __device__ bool __hleu (const __half a, const __half b)

    Performs half unordered less-equal comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of unordered less-equal comparison of a and b.

    Description

    Performs half unordered less-equal comparison of inputs a and b. NaN inputs generate true results.

    __device__ bool __hlt (const __half a, const __half b)

    Performs half less-than comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of less-than comparison of a and b.

    Description

    Performs half less-than comparison of inputs a and b. NaN inputs generate false results.

    __device__ bool __hltu (const __half a, const __half b)

    Performs half unordered less-than comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of unordered less-than comparison of a and b.

    Description

    Performs half unordered less-than comparison of inputs a and b. NaN inputs generate true results.

    __device__ bool __hne (const __half a, const __half b)

    Performs half not-equal comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of not-equal comparison of a and b.

    Description

    Performs half not-equal comparison of inputs a and b. NaN inputs generate false results.

    __device__ bool __hneu (const __half a, const __half b)

    Performs half unordered not-equal comparison.

    Parameters

    a - half. Is only being read.

    b - half. Is only being read.

    Returns

    bool

    ‣ The boolean result of unordered not-equal comparison of a and b.

    Description

    Performs half unordered not-equal comparison of inputs a and b. NaN inputs generate true results.
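    The ordered __hne returns false when an operand is NaN, whereas the built-in != on float returns true in that case; __hneu is the variant whose NaN behavior matches !=. A short sketch, with hypothetical kernel and helper names and compute capability 5.3+ assumed:

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel contrasting __hne (ordered) with __hneu (unordered).
// Assumes compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void hne_demo_kernel(int* out) {
    const __half one = __float2half(1.0f);
    const __half two = __float2half(2.0f);
    const __half nan_h = __ushort_as_half(0x7E00);  // a quiet-NaN binary16 bit pattern
    out[0] = __hne(one, two);     // true: ordinary unequal numbers
    out[1] = __hne(nan_h, one);   // false: ordered comparison with a NaN operand
    out[2] = __hneu(nan_h, one);  // true: unordered comparison, like float !=
}

// Hypothetical host helper: runs the kernel once and copies the three results into out.
bool run_hne_demo(int* out) {
    int* d = nullptr;
    if (cudaMallocManaged(&d, 3 * sizeof(int)) != cudaSuccess) return false;
    hne_demo_kernel<<<1, 1>>>(d);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < 3; ++i) out[i] = d[i];
    cudaFree(d);
    return ok;
}
```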

    1.1.4. Half2 Comparison Functions

    To use these functions, include the header file cuda_fp16.h in your program.

    __device__ bool __hbeq2 (const __half2 a, const __half2 b)

    Performs half2 vector if-equal comparison, and returns boolean true iff both half results are true, boolean false otherwise.

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    bool

    ‣ true if both half results of if-equal comparison of vectors a and b are true;

    ‣ false otherwise.

    Description

    Performs half2 vector if-equal comparison of inputs a and b. The bool result is true only if both half if-equal comparisons evaluate to true, and false otherwise. NaN inputs generate false results.

    __device__ bool __hbequ2 (const __half2 a, const __half2 b)

    Performs half2 vector unordered if-equal comparison, and returns boolean true iff both half results are true, boolean false otherwise.

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    bool

    ‣ true if both half results of unordered if-equal comparison of vectors a and b are true;

    ‣ false otherwise.

    Description

    Performs half2 vector unordered if-equal comparison of inputs a and b. The bool result is true only if both half if-equal comparisons evaluate to true, and false otherwise. NaN inputs generate true results.
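    Because __hbeq2 and __hbequ2 reduce the two lane comparisons to a single bool, one NaN lane is enough to change the answer. The sketch uses __halves2half2 to place 1.0 in the low lane and a NaN in the high lane; the kernel and helper names are hypothetical and compute capability 5.3+ is assumed.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel contrasting __hbeq2 (ordered) with __hbequ2 (unordered).
// Assumes compute capability 5.3+ (e.g. nvcc -arch=sm_53).
__global__ void hbeq2_demo_kernel(int* out) {
    const __half nan_h = __ushort_as_half(0x7E00);                    // a quiet NaN
    const __half2 a = __halves2half2(__float2half(1.0f), nan_h);      // low 1, high NaN
    const __half2 c = __floats2half2_rn(1.0f, 3.0f);                  // low 1, high 3
    out[0] = __hbeq2(a, a);                               // false: high lane has a NaN
    out[1] = __hbequ2(a, a);                              // true: NaN lane counts as equal
    out[2] = __hbeq2(__floats2half2_rn(1.0f, 2.0f), c);   // false: high lanes differ
    out[3] = __hbeq2(c, c);                               // true: both lanes equal
}

// Hypothetical host helper: runs the kernel once and copies the four results into out.
bool run_hbeq2_demo(int* out) {
    int* d = nullptr;
    if (cudaMallocManaged(&d, 4 * sizeof(int)) != cudaSuccess) return false;
    hbeq2_demo_kernel<<<1, 1>>>(d);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < 4; ++i) out[i] = d[i];
    cudaFree(d);
    return ok;
}
```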

    __device__ bool __hbge2 (const __half2 a, const __half2 b)

    Performs half2 vector greater-equal comparison, and returns boolean true iff both half results are true, boolean false otherwise.

    Parameters

    a - half2. Is only being read.

    b - half2. Is only being read.

    Returns

    boo