CUDA MATH API
vRelease Version | July 2019
API Reference Manual
www.nvidia.com | CUDA Math API vRelease Version
TABLE OF CONTENTS
Chapter 1. Modules
1.1. Half Precision Intrinsics
    Half Arithmetic Functions
    Half2 Arithmetic Functions
    Half Comparison Functions
    Half2 Comparison Functions
    Half Precision Conversion And Data Movement
    Half Math Functions
    Half2 Math Functions
1.1.1. Half Arithmetic Functions
    __h2div
    __hadd
    __hadd_sat
    __hdiv
    __hfma
    __hfma_sat
    __hmul
    __hmul_sat
    __hneg
    __hsub
    __hsub_sat
1.1.2. Half2 Arithmetic Functions
    __hadd2
    __hadd2_sat
    __hfma2
    __hfma2_sat
    __hmul2
    __hmul2_sat
    __hneg2
    __hsub2
    __hsub2_sat
1.1.3. Half Comparison Functions
    __heq
    __hequ
    __hge
    __hgeu
    __hgt
    __hgtu
    __hisinf
    __hisnan
    __hle
    __hleu
    __hlt
    __hltu
    __hne
    __hneu
1.1.4. Half2 Comparison Functions
    __hbeq2
    __hbequ2
    __hbge2
    __hbgeu2
    __hbgt2
    __hbgtu2
    __hble2
    __hbleu2
    __hblt2
    __hbltu2
    __hbne2
    __hbneu2
    __heq2
    __hequ2
    __hge2
    __hgeu2
    __hgt2
    __hgtu2
    __hisnan2
    __hle2
    __hleu2
    __hlt2
    __hltu2
    __hne2
    __hneu2
1.1.5. Half Precision Conversion And Data Movement
    __float22half2_rn
    __float2half
    __float2half2_rn
    __float2half_rd
    __float2half_rn
    __float2half_ru
    __float2half_rz
    __floats2half2_rn
    __half22float2
    __half2float
    __half2half2
    __half2int_rd
    __half2int_rn
    __half2int_ru
    __half2int_rz
    __half2ll_rd
    __half2ll_rn
    __half2ll_ru
    __half2ll_rz
    __half2short_rd
    __half2short_rn
    __half2short_ru
    __half2short_rz
    __half2uint_rd
    __half2uint_rn
    __half2uint_ru
    __half2uint_rz
    __half2ull_rd
    __half2ull_rn
    __half2ull_ru
    __half2ull_rz
    __half2ushort_rd
    __half2ushort_rn
    __half2ushort_ru
    __half2ushort_rz
    __half_as_short
    __half_as_ushort
    __halves2half2
    __high2float
    __high2half
    __high2half2
    __highs2half2
    __int2half_rd
    __int2half_rn
    __int2half_ru
    __int2half_rz
    __ll2half_rd
    __ll2half_rn
    __ll2half_ru
    __ll2half_rz
    __low2float
    __low2half
    __low2half2
    __lowhigh2highlow
    __lows2half2
    __shfl_down_sync
    __shfl_down_sync
    __shfl_sync
    __shfl_sync
    __shfl_up_sync
    __shfl_up_sync
    __shfl_xor_sync
    __shfl_xor_sync
    __short2half_rd
    __short2half_rn
    __short2half_ru
    __short2half_rz
    __short_as_half
    __uint2half_rd
    __uint2half_rn
    __uint2half_ru
    __uint2half_rz
    __ull2half_rd
    __ull2half_rn
    __ull2half_ru
    __ull2half_rz
    __ushort2half_rd
    __ushort2half_rn
    __ushort2half_ru
    __ushort2half_rz
    __ushort_as_half
1.1.6. Half Math Functions
    hceil
    hcos
    hexp
    hexp10
    hexp2
    hfloor
    hlog
    hlog10
    hlog2
    hrcp
    hrint
    hrsqrt
    hsin
    hsqrt
    htrunc
1.1.7. Half2 Math Functions
    h2ceil
    h2cos
    h2exp
    h2exp10
    h2exp2
    h2floor
    h2log
    h2log10
    h2log2
    h2rcp
    h2rint
    h2rsqrt
    h2sin
    h2sqrt
    h2trunc
1.2. Mathematical Functions
1.3. Single Precision Mathematical Functions
    acosf
    acoshf
    asinf
    asinhf
    atan2f
    atanf
    atanhf
    cbrtf
    ceilf
    copysignf
    cosf
    coshf
    cospif
    cyl_bessel_i0f
    cyl_bessel_i1f
    erfcf
    erfcinvf
    erfcxf
    erff
    erfinvf
    exp10f
    exp2f
    expf
    expm1f
    fabsf
    fdimf
    fdividef
    floorf
    fmaf
    fmaxf
    fminf
    fmodf
    frexpf
    hypotf
    ilogbf
    isfinite
    isinf
    isnan
    j0f
    j1f
    jnf
    ldexpf
    lgammaf
    llrintf
    llroundf
    log10f
    log1pf
    log2f
    logbf
    logf
    lrintf
    lroundf
    modff
    nanf
    nearbyintf
    nextafterf
    norm3df
    norm4df
    normcdff
    normcdfinvf
    normf
    powf
    rcbrtf
    remainderf
    remquof
    rhypotf
    rintf
    rnorm3df
    rnorm4df
    rnormf
    roundf
    rsqrtf
    scalblnf
    scalbnf
    signbit
    sincosf
    sincospif
    sinf
    sinhf
    sinpif
    sqrtf
    tanf
    tanhf
    tgammaf
    truncf
    y0f
    y1f
    ynf
1.4. Double Precision Mathematical Functions
    acos
    acosh
    asin
    asinh
    atan
    atan2
    atanh
    cbrt
    ceil
    copysign
    cos
    cosh
    cospi
    cyl_bessel_i0
    cyl_bessel_i1
    erf
    erfc
    erfcinv
    erfcx
    erfinv
    exp
exp10
exp2
expm1
fabs
fdim
floor
fma
fmax
fmin
fmod
frexp
hypot
ilogb
isfinite
isinf
isnan
j0
j1
jn
ldexp
lgamma
llrint
llround
log
log10
log1p
log2
logb
lrint
lround
modf
nan
nearbyint
nextafter
norm
norm3d
norm4d
normcdf
normcdfinv
pow
rcbrt
remainder
remquo
rhypot
rint
rnorm
rnorm3d
rnorm4d
round
rsqrt
scalbln
scalbn
signbit
sin
sincos
sincospi
sinh
sinpi
sqrt
tan
tanh
tgamma
trunc
y0
y1
yn
1.5. Single Precision Intrinsics
__cosf
__exp10f
__expf
__fadd_rd
__fadd_rn
__fadd_ru
__fadd_rz
__fdiv_rd
__fdiv_rn
__fdiv_ru
__fdiv_rz
__fdividef
__fmaf_rd
__fmaf_rn
__fmaf_ru
__fmaf_rz
__fmul_rd
__fmul_rn
__fmul_ru
__fmul_rz
__frcp_rd
__frcp_rn
__frcp_ru
__frcp_rz
__frsqrt_rn
__fsqrt_rd
__fsqrt_rn
__fsqrt_ru
__fsqrt_rz
__fsub_rd
__fsub_rn
__fsub_ru
__fsub_rz
__log10f
__log2f
__logf
__powf
__saturatef
__sincosf
__sinf
__tanf
1.6. Double Precision Intrinsics
__dadd_rd
__dadd_rn
__dadd_ru
__dadd_rz
__ddiv_rd
__ddiv_rn
__ddiv_ru
__ddiv_rz
__dmul_rd
__dmul_rn
__dmul_ru
__dmul_rz
__drcp_rd
__drcp_rn
__drcp_ru
__drcp_rz
__dsqrt_rd
__dsqrt_rn
__dsqrt_ru
__dsqrt_rz
__dsub_rd
__dsub_rn
__dsub_ru
__dsub_rz
__fma_rd
__fma_rn
__fma_ru
__fma_rz
1.7. Integer Intrinsics
__brev
__brevll
__byte_perm
__clz
__clzll
__ffs
__ffsll
__funnelshift_l
__funnelshift_lc
__funnelshift_r
__funnelshift_rc
__hadd
__mul24
__mul64hi
__mulhi
__popc
__popcll
__rhadd
__sad
__uhadd
__umul24
__umul64hi
__umulhi
__urhadd
__usad
1.8. Type Casting Intrinsics
__double2float_rd
__double2float_rn
__double2float_ru
__double2float_rz
__double2hiint
__double2int_rd
__double2int_rn
__double2int_ru
__double2int_rz
__double2ll_rd
__double2ll_rn
__double2ll_ru
__double2ll_rz
__double2loint
__double2uint_rd
__double2uint_rn
__double2uint_ru
__double2uint_rz
__double2ull_rd
__double2ull_rn
__double2ull_ru
__double2ull_rz
__double_as_longlong
__float2int_rd
__float2int_rn
__float2int_ru
__float2int_rz
__float2ll_rd
__float2ll_rn
__float2ll_ru
__float2ll_rz
__float2uint_rd
__float2uint_rn
__float2uint_ru
__float2uint_rz
__float2ull_rd
__float2ull_rn
__float2ull_ru
__float2ull_rz
__float_as_int
__float_as_uint
__hiloint2double
__int2double_rn
__int2float_rd
__int2float_rn
__int2float_ru
__int2float_rz
__int_as_float
__ll2double_rd
__ll2double_rn
__ll2double_ru
__ll2double_rz
__ll2float_rd
__ll2float_rn
__ll2float_ru
__ll2float_rz
__longlong_as_double
__uint2double_rn
__uint2float_rd
__uint2float_rn
__uint2float_ru
__uint2float_rz
__uint_as_float
__ull2double_rd
__ull2double_rn
__ull2double_ru
__ull2double_rz
__ull2float_rd
__ull2float_rn
__ull2float_ru
__ull2float_rz
1.9. SIMD Intrinsics
__vabs2
__vabs4
__vabsdiffs2
__vabsdiffs4
__vabsdiffu2
__vabsdiffu4
__vabsss2
__vabsss4
__vadd2
__vadd4
__vaddss2
__vaddss4
__vaddus2
__vaddus4
__vavgs2
__vavgs4
__vavgu2
__vavgu4
__vcmpeq2
__vcmpeq4
__vcmpges2
__vcmpges4
__vcmpgeu2
__vcmpgeu4
__vcmpgts2
__vcmpgts4
__vcmpgtu2
__vcmpgtu4
__vcmples2
__vcmples4
__vcmpleu2
__vcmpleu4
__vcmplts2
__vcmplts4
__vcmpltu2
__vcmpltu4
__vcmpne2
__vcmpne4
__vhaddu2
__vhaddu4
__vmaxs2
__vmaxs4
__vmaxu2
__vmaxu4
__vmins2
__vmins4
__vminu2
__vminu4
__vneg2
__vneg4
__vnegss2
__vnegss4
__vsads2
__vsads4
__vsadu2
__vsadu4
__vseteq2
__vseteq4
__vsetges2
__vsetges4
__vsetgeu2
__vsetgeu4
__vsetgts2
__vsetgts4
__vsetgtu2
-
www.nvidia.comCUDA Math API vRelease Version | xvi
__vsetgtu4..................................................................................................237__vsetles2.................................................................................................. 237__vsetles4.................................................................................................. 238__vsetleu2.................................................................................................. 238__vsetleu4.................................................................................................. 238__vsetlts2...................................................................................................239__vsetlts4...................................................................................................239__vsetltu2.................................................................................................. 239__vsetltu4.................................................................................................. 240__vsetne2...................................................................................................240__vsetne4...................................................................................................240__vsub2..................................................................................................... 241__vsub4..................................................................................................... 241__vsubss2................................................................................................... 241__vsubss4................................................................................................... 242__vsubus2...................................................................................................242__vsubus4...................................................................................................242
Chapter 1. MODULES
Here is a list of all modules:
‣ Half Precision Intrinsics
  ‣ Half Arithmetic Functions
  ‣ Half2 Arithmetic Functions
  ‣ Half Comparison Functions
  ‣ Half2 Comparison Functions
  ‣ Half Precision Conversion And Data Movement
  ‣ Half Math Functions
  ‣ Half2 Math Functions
‣ Mathematical Functions
‣ Single Precision Mathematical Functions
‣ Double Precision Mathematical Functions
‣ Single Precision Intrinsics
‣ Double Precision Intrinsics
‣ Integer Intrinsics
‣ Type Casting Intrinsics
‣ SIMD Intrinsics
1.1. Half Precision Intrinsics
This section describes half precision intrinsic functions that are only supported in device code. To use these functions include the header file cuda_fp16.h in your program.
Half Arithmetic Functions
Half2 Arithmetic Functions
Half Comparison Functions
Half2 Comparison Functions
Half Precision Conversion And Data Movement
Half Math Functions
Half2 Math Functions
1.1.1. Half Arithmetic Functions
To use these functions include the header file cuda_fp16.h in your program.
__device__ __half2 __h2div (const __half2 a, const __half2 b)
Performs half2 vector division in round-to-nearest-even mode.
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
half2
‣ The elementwise division of a by b.
Description
Divides half2 input vector a by input vector b in round-to-nearest-even mode.
__device__ __half __hadd (const __half a, const __half b)
Performs half addition in round-to-nearest-even mode.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
half
‣ The sum of a and b.
Description
Performs half addition of inputs a and b, in round-to-nearest-even mode.
__device__ __half __hadd_sat (const __half a, const __half b)
Performs half addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
half
‣ The sum of a and b, with respect to saturation.
Description
Performs half add of inputs a and b, in round-to-nearest-even mode, and clamps the result to range [0.0, 1.0]. NaN results are flushed to +0.0.
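The difference between __hadd and __hadd_sat can be seen with a small program. The following is a minimal sketch, not part of the reference itself: the kernel name add_demo and the float-in/float-out interface are assumptions made for illustration, and the program assumes a device and build target that support half arithmetic (compute capability 5.3 or higher, for example nvcc -arch=sm_53).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Hypothetical demo kernel: adds two values in half precision, with and
// without saturation. Inputs and outputs are float so that all __half
// handling stays in device code.
__global__ void add_demo(float x, float y, float *plain, float *saturated)
{
    const __half a = __float2half(x);
    const __half b = __float2half(y);
    *plain     = __half2float(__hadd(a, b));      // ordinary half sum
    *saturated = __half2float(__hadd_sat(a, b));  // sum clamped to [0.0, 1.0]
}

int main()
{
    float *d_out = nullptr;
    cudaMalloc(&d_out, 2 * sizeof(float));

    add_demo<<<1, 1>>>(0.75f, 0.5f, d_out, d_out + 1);

    float h_out[2] = {0.0f, 0.0f};
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("__hadd: %f  __hadd_sat: %f\n", h_out[0], h_out[1]);
    return 0;
}
```

With the inputs 0.75 and 0.5, both of which are exactly representable in half precision, the plain sum should be 1.25 and the saturated sum should be clamped to 1.0.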
__device__ __half __hdiv (const __half a, const __half b)
Performs half division in round-to-nearest-even mode.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
half
‣ The result of dividing a by b.
Description
Divides half input a by input b in round-to-nearest-even mode.
__device__ __half __hfma (const __half a, const __half b, const __half c)
Performs half fused multiply-add in round-to-nearest-even mode.
Parameters
a - half. Is only being read.
b - half. Is only being read.
c - half. Is only being read.
Returns
half
‣ The result of fused multiply-add operation on a, b, and c.
Description
Performs half multiply on inputs a and b, then performs a half add of the result with c, rounding the result once in round-to-nearest-even mode.
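A typical use of the fused multiply-add is an "a times x plus y" update over an array. The kernel below is a sketch under stated assumptions: the name haxpy and its parameters are hypothetical, the arrays are assumed to already reside in device memory, and the code assumes a target with half arithmetic support (compute capability 5.3 or higher).

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel: y[i] = alpha * x[i] + y[i] in half precision.
// __hfma rounds only once per element, after the multiply and the add.
__global__ void haxpy(int n, __half alpha, const __half *x, __half *y)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = __hfma(alpha, x[i], y[i]);
    }
}
```

A launch such as haxpy<<<(n + 255) / 256, 256>>>(n, alpha, d_x, d_y) would cover n elements, where d_x and d_y are illustrative device pointers.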
__device__ __half __hfma_sat (const __half a, const __half b, const __half c)
Performs half fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
a - half. Is only being read.
b - half. Is only being read.
c - half. Is only being read.
Returns
half
‣ The result of fused multiply-add operation on a, b, and c, with respect to saturation.
Description
Performs half multiply on inputs a and b, then performs a half add of the result with c, rounding the result once in round-to-nearest-even mode, and clamps the result to range [0.0, 1.0]. NaN results are flushed to +0.0.
__device__ __half __hmul (const __half a, const __half b)
Performs half multiplication in round-to-nearest-even mode.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
half
‣ The result of multiplying a and b.
Description
Performs half multiplication of inputs a and b, in round-to-nearest-even mode.
__device__ __half __hmul_sat (const __half a, const __half b)
Performs half multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
half
‣ The result of multiplying a and b, with respect to saturation.
Description
Performs half multiplication of inputs a and b, in round-to-nearest-even mode, and clamps the result to range [0.0, 1.0]. NaN results are flushed to +0.0.
__device__ __half __hneg (const __half a)
Negates input half number and returns the result.
Parameters
a - half. Is only being read.
Returns
half
‣ minus a
Description
Negates input half number and returns the result.
__device__ __half __hsub (const __half a, const __half b)
Performs half subtraction in round-to-nearest-even mode.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
half
‣ The result of subtracting b from a.
Description
Subtracts half input b from input a in round-to-nearest-even mode.
__device__ __half __hsub_sat (const __half a, const __half b)
Performs half subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
half
‣ The result of subtraction of b from a, with respect to saturation.
Description
Subtracts half input b from input a in round-to-nearest-even mode, and clamps the result to range [0.0, 1.0]. NaN results are flushed to +0.0.
1.1.2. Half2 Arithmetic Functions
To use these functions include the header file cuda_fp16.h in your program.
__device__ __half2 __hadd2 (const __half2 a, const __half2 b)
Performs half2 vector addition in round-to-nearest-even mode.
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
half2
‣ The sum of vectors a and b.
Description
Performs half2 vector add of inputs a and b, in round-to-nearest-even mode.
__device__ __half2 __hadd2_sat (const __half2 a, const __half2 b)
Performs half2 vector addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
half2
‣ The sum of a and b, with respect to saturation.
Description
Performs half2 vector add of inputs a and b, in round-to-nearest-even mode, and clamps the results to range [0.0, 1.0]. NaN results are flushed to +0.0.
__device__ __half2 __hfma2 (const __half2 a, const __half2 b, const __half2 c)
Performs half2 vector fused multiply-add in round-to-nearest-even mode.
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
c - half2. Is only being read.
Returns
half2
‣ The result of elementwise fused multiply-add operation on vectors a, b, and c.
Description
Performs half2 vector multiply on inputs a and b, then performs a half2 vector add of the result with c, rounding the result once in round-to-nearest-even mode.
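Because a half2 value packs two half elements, one __hfma2 call updates two array elements at once. The kernel below is an illustrative sketch: the name haxpy2, the parameter names, and the assumption that the data is stored as n2 packed pairs in device memory are not taken from this manual. It also assumes a target with half arithmetic support (compute capability 5.3 or higher).

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel: y2[i] = alpha2 * x2[i] + y2[i] for n2 packed pairs.
// Each thread handles one __half2, i.e. two half elements.
__global__ void haxpy2(int n2, __half2 alpha2, const __half2 *x2, __half2 *y2)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        // Elementwise fused multiply-add on both halves, one rounding each.
        y2[i] = __hfma2(alpha2, x2[i], y2[i]);
    }
}
```

To scale both halves by the same scalar, alpha2 can be built with __float2half2_rn from cuda_fp16.h, which writes the converted value into both halves.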
__device__ __half2 __hfma2_sat (const __half2 a, const __half2 b, const __half2 c)
Performs half2 vector fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
c - half2. Is only being read.
Returns
half2
‣ The result of elementwise fused multiply-add operation on vectors a, b, and c, with respect to saturation.
Description
Performs half2 vector multiply on inputs a and b, then performs a half2 vector add of the result with c, rounding the result once in round-to-nearest-even mode, and clamps the results to range [0.0, 1.0]. NaN results are flushed to +0.0.
__device__ __half2 __hmul2 (const __half2 a, const __half2 b)
Performs half2 vector multiplication in round-to-nearest-even mode.
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
half2
‣ The result of elementwise multiplying the vectors a and b.
Description
Performs half2 vector multiplication of inputs a and b, in round-to-nearest-even mode.
__device__ __half2 __hmul2_sat (const __half2 a, const __half2 b)
Performs half2 vector multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
half2
‣ The result of elementwise multiplication of vectors a and b, with respect to saturation.
Description
Performs half2 vector multiplication of inputs a and b, in round-to-nearest-even mode, and clamps the results to range [0.0, 1.0]. NaN results are flushed to +0.0.
__device__ __half2 __hneg2 (const __half2 a)
Negates both halves of the input half2 number and returns the result.
Parameters
a - half2. Is only being read.
Returns
half2
‣ Returns a with both halves negated.
Description
Negates both halves of the input half2 number a and returns the result.
__device__ __half2 __hsub2 (const __half2 a, const __half2 b)
Performs half2 vector subtraction in round-to-nearest-even mode.
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
half2
‣ The subtraction of vector b from a.
Description
Subtracts half2 input vector b from input vector a in round-to-nearest-even mode.
__device__ __half2 __hsub2_sat (const __half2 a, const __half2 b)
Performs half2 vector subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
half2
‣ The subtraction of vector b from a, with respect to saturation.
Description
Subtracts half2 input vector b from input vector a in round-to-nearest-even mode, and clamps the results to range [0.0, 1.0]. NaN results are flushed to +0.0.
1.1.3. Half Comparison Functions
To use these functions include the header file cuda_fp16.h in your program.
__device__ bool __heq (const __half a, const __half b)
Performs half if-equal comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of if-equal comparison of a and b.
Description
Performs half if-equal comparison of inputs a and b. NaN inputs generate false results.
__device__ bool __hequ (const __half a, const __half b)
Performs half unordered if-equal comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of unordered if-equal comparison of a and b.
Description
Performs half unordered if-equal comparison of inputs a and b. NaN inputs generate true results.
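The ordered and unordered variants differ only in how they treat NaN inputs. The program below is a minimal sketch that feeds a NaN to both __heq and __hequ; the kernel name compare_demo and the float-in/int-out interface are assumptions made for illustration, and the code assumes a device and build target with half comparison support (compute capability 5.3 or higher).

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Hypothetical demo kernel: compares two values with the ordered (__heq)
// and unordered (__hequ) if-equal intrinsics and stores the results as ints.
__global__ void compare_demo(float x, float y, int *ordered, int *unordered)
{
    const __half a = __float2half(x);
    const __half b = __float2half(y);
    *ordered   = __heq(a, b)  ? 1 : 0;  // documented as false for NaN inputs
    *unordered = __hequ(a, b) ? 1 : 0;  // documented as true for NaN inputs
}

int main()
{
    int *d_flags = nullptr;
    cudaMalloc(&d_flags, 2 * sizeof(int));

    // NAN comes from <cmath>; converting it to half yields a half NaN.
    compare_demo<<<1, 1>>>(NAN, 1.0f, d_flags, d_flags + 1);

    int h_flags[2] = {0, 0};
    cudaMemcpy(h_flags, d_flags, sizeof(h_flags), cudaMemcpyDeviceToHost);
    cudaFree(d_flags);

    printf("__heq: %d  __hequ: %d\n", h_flags[0], h_flags[1]);
    return 0;
}
```

Going by the descriptions above, the ordered result should be 0 and the unordered result should be 1 whenever either input is NaN.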
__device__ bool __hge (const __half a, const __half b)
Performs half greater-equal comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of greater-equal comparison of a and b.
Description
Performs half greater-equal comparison of inputs a and b. NaN inputs generate false results.
__device__ bool __hgeu (const __half a, const __half b)
Performs half unordered greater-equal comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of unordered greater-equal comparison of a and b.
Description
Performs half unordered greater-equal comparison of inputs a and b. NaN inputs generate true results.
__device__ bool __hgt (const __half a, const __half b)
Performs half greater-than comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of greater-than comparison of a and b.
Description
Performs half greater-than comparison of inputs a and b. NaN inputs generate false results.
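Comparison intrinsics are often used to select between values. As an illustration only, the kernel below uses __hgt to clamp negative inputs to zero; the name relu_half and its parameters are hypothetical, and the code assumes a target with half comparison support (compute capability 5.3 or higher).

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel: out[i] = in[i] if in[i] > 0, otherwise 0.
__global__ void relu_half(int n, const __half *in, __half *out)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        const __half zero = __float2half(0.0f);
        const __half x = in[i];
        // __hgt is an ordered comparison: it is false when x is NaN,
        // so NaN inputs take the zero branch.
        out[i] = __hgt(x, zero) ? x : zero;
    }
}
```

If NaN inputs should propagate instead, the unordered variant __hgtu would take the other branch for NaN, since it is documented to return true in that case.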
__device__ bool __hgtu (const __half a, const __half b)
Performs half unordered greater-than comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of unordered greater-than comparison of a and b.
Description
Performs half unordered greater-than comparison of inputs a and b. NaN inputs generate true results.
__device__ int __hisinf (const __half a)
Checks if the input half number is infinite.
Parameters
a - half. Is only being read.
Returns
int
‣ -1 iff a is equal to negative infinity,
‣ 1 iff a is equal to positive infinity,
‣ 0 otherwise.
Description
Checks if the input half number a is infinite.
__device__ bool __hisnan (const __half a)
Determine whether half argument is a NaN.
Parameters
a - half. Is only being read.
Returns
bool
‣ true iff argument is NaN.
Description
Determine whether half value a is a NaN.
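__hisinf and __hisnan can be combined to classify values before further processing. The kernel below is a sketch under stated assumptions: the name classify_half, the output encoding (2 for NaN, otherwise the __hisinf result), and the parameter names are all hypothetical. It assumes a target with half precision support for these intrinsics (compute capability 5.3 or higher).

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel: writes one code per element.
//    2 : value is NaN
//   -1 : value is negative infinity
//    1 : value is positive infinity
//    0 : value is finite
__global__ void classify_half(int n, const __half *in, int *kind)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        const __half v = in[i];
        kind[i] = __hisnan(v) ? 2 : __hisinf(v);
    }
}
```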
__device__ bool __hle (const __half a, const __half b)
Performs half less-equal comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of less-equal comparison of a and b.
Description
Performs half less-equal comparison of inputs a and b. NaN inputs generate false results.
__device__ bool __hleu (const __half a, const __half b)
Performs half unordered less-equal comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of unordered less-equal comparison of a and b.
Description
Performs half unordered less-equal comparison of inputs a and b. NaN inputs generate true results.
__device__ bool __hlt (const __half a, const __half b)
Performs half less-than comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of less-than comparison of a and b.
Description
Performs half less-than comparison of inputs a and b. NaN inputs generate false results.
__device__ bool __hltu (const __half a, const __half b)
Performs half unordered less-than comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of unordered less-than comparison of a and b.
Description
Performs half unordered less-than comparison of inputs a and b. NaN inputs generate true results.
__device__ bool __hne (const __half a, const __half b)
Performs half not-equal comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of not-equal comparison of a and b.
Description
Performs half not-equal comparison of inputs a and b. NaN inputs generate false results.
__device__ bool __hneu (const __half a, const __half b)
Performs half unordered not-equal comparison.
Parameters
a - half. Is only being read.
b - half. Is only being read.
Returns
bool
‣ The boolean result of unordered not-equal comparison of a and b.
Description
Performs half unordered not-equal comparison of inputs a and b. NaN inputs generate true results.
1.1.4. Half2 Comparison Functions
To use these functions include the header file cuda_fp16.h in your program.
__device__ bool __hbeq2 (const __half2 a, const __half2 b)
Performs half2 vector if-equal comparison, and returns boolean true iff both half results are true, boolean false otherwise.
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
bool
‣ true if both half results of if-equal comparison of vectors a and b are true;
‣ false otherwise.
Description
Performs half2 vector if-equal comparison of inputs a and b. The bool result is set to true only if both half if-equal comparisons evaluate to true, or false otherwise. NaN inputs generate false results.
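Since __hbeq2 reduces the two per-half comparisons to a single bool, it can be used directly in a branch or stored as a flag. The kernel below is an illustrative sketch: the name pairs_equal, the parameter names, and the layout of the data as n2 packed pairs in device memory are assumptions. It also assumes a target with half comparison support (compute capability 5.3 or higher).

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel: same[i] is true only when both halves of a2[i]
// compare equal to the corresponding halves of b2[i].
__global__ void pairs_equal(int n2, const __half2 *a2, const __half2 *b2,
                            bool *same)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        // Ordered comparison: a NaN in either half makes the result false.
        same[i] = __hbeq2(a2[i], b2[i]);
    }
}
```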
__device__ bool __hbequ2 (const __half2 a, const __half2 b)
Performs half2 vector unordered if-equal comparison, and returns boolean true iff both half results are true, boolean false otherwise.
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
bool
‣ true if both half results of unordered if-equal comparison of vectors a and b are true;
‣ false otherwise.
Description
Performs half2 vector unordered if-equal comparison of inputs a and b. The bool result is set to true only if both half if-equal comparisons evaluate to true, or false otherwise. NaN inputs generate true results.
__device__ bool __hbge2 (const __half2 a, const __half2 b)
Performs half2 vector greater-equal comparison, and returns boolean true iff both half results are true, boolean false otherwise.
Parameters
a - half2. Is only being read.
b - half2. Is only being read.
Returns
boo