Transcript
Page 1: Computer Graphics 3 Lecture 4: GPU Programming

Computer Graphics 3, Lecture 4: GPU Programming

Dr. Benjamin Mora, University of Wales Swansea

Page 2: Computer Graphics 3 Lecture 4: GPU Programming

Content

• Introduction.

• Vertex and Fragment Programs.

• Programming the GPU.

– Assembly Code.

– High Level Languages.

• Example of applications.

• Conclusion.

Page 3: Computer Graphics 3 Lecture 4: GPU Programming

Introduction

Page 4: Computer Graphics 3 Lecture 4: GPU Programming

Introduction

• OpenGL (SGI) shaped the early design of current graphics processors (GPUs).
  – Fixed pipeline.
    • Once the different tests are passed, the fragment color is replaced by the new (textured and interpolated) one.
  – Not realistic enough.
• The graphics pipeline is fed with primitives such as triangles, points, etc., which are rasterized.
• Two main stages:
  – Vertex processing.
  – Fragment (rasterized pixel) processing.
• These two stages have been extended for more realism.

Page 5: Computer Graphics 3 Lecture 4: GPU Programming

Introduction

• Latest evolutions:
  – Unified shaders.
    • Automatic balancing of the graphics units between vertex and fragment programs.
    • The smaller the image, the more CPU- and vertex-bound the program is.
    • The larger the image, the more fragment/pixel-bound the program is.
      – Anti-aliasing and texture filtering parameters also contribute to this.
  – Geometry shaders (discussed separately).

Page 6: Computer Graphics 3 Lecture 4: GPU Programming

Vertex and Fragment Programs

Page 7: Computer Graphics 3 Lecture 4: GPU Programming

Vertex and Fragment Programs

Daniel Weiskopf, Basics of GPU-Based Programming,

http://www.vis.uni-stuttgart.de/vis04_tutorial/vis04_weiskopf_intro_gpu.pdf

Page 8: Computer Graphics 3 Lecture 4: GPU Programming

Vertex and Fragment Programs

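Pipeline diagram: Vertices → Transform and Lighting → Setup → Rasterization → Texture Fetch, Fragment Shading → Tests (z, stencil…) → Frame Buffer Blending.

• Vertex Programs: user-defined vertex processing (in place of Transform and Lighting).
• Fragment Programs: user-defined per-pixel processing (in place of Texture Fetch, Fragment Shading).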

Page 9: Computer Graphics 3 Lecture 4: GPU Programming

Programming the GPU

Page 10: Computer Graphics 3 Lecture 4: GPU Programming

Programming the GPU

• Low-level languages (pseudo-assembler).
  – Help to understand what is possible on the GPU.
  – Large code is a pain to maintain/optimize.
  – May be specific to the graphics card generation/supplier.
• High-level languages.
  – Easier to write.
  – Early compilers were not very good.
  – Code may be more compatible.
    • Loops.

Page 11: Computer Graphics 3 Lecture 4: GPU Programming

Current Low Level Languages (APIs)

• DirectX 9.
  – Vertex shader 2.0.
  – Pixel shader 2.0.
• OpenGL extensions.
  – GL_ARB_vertex_program.
  – GL_ARB_fragment_program.
• Vendor APIs.
  – NVidia vertex and fragment programs.

Page 12: Computer Graphics 3 Lecture 4: GPU Programming

Current High Level Languages (APIs)

• Microsoft, ATI.
  – High Level Shading Language (HLSL).
• NVidia.
  – Cg.
• OpenGL Shading Language.

Page 13: Computer Graphics 3 Lecture 4: GPU Programming

How to use them?

• Assembly programs:
  – Can be loaded (and compiled) at run-time (OpenGL).
  – Several programs can be loaded at once.
    • Applying the suitable rendering style (i.e. program) to every scene primitive.
    • Avoid latency due to pseudo-assembly compilation.
• High-level programs:
  – Must be compiled before run-time.
  – The resulting (pseudo) assembly code can then be used.

Page 14: Computer Graphics 3 Lecture 4: GPU Programming

Vertex Programs

• Vertex Program.
  – Bypasses the T&L unit.
  – GPU instruction set to perform all vertex math.
  – Input: arbitrary vertex attributes.
  – Output: transformed vertex attributes.
    • Homogeneous clip space position (required).
    • Colors (front/back, primary/secondary).
    • Fog coordinate.
    • Texture coordinates.
    • Point size.

Page 15: Computer Graphics 3 Lecture 4: GPU Programming

Vertex Programs

• Customized computation of vertex attributes.
  – Computation of anything that can be interpolated linearly between vertices.
• Limitations:
  – Vertices can neither be generated nor destroyed.
    • Geometry shader for that.
  – No information about topology or ordering of vertices is available.

Page 16: Computer Graphics 3 Lecture 4: GPU Programming

Vertex Programs

• Vertex programs bypass the following OpenGL functionalities:

– Vertex transformations.

• The modelview and projection matrix transformations.

– Normal transformations and normalizations.

– Color material.

– Per-vertex lighting.

– Texture coordinate generation.

– Texture matrix transformations.

– Raster position transformation.

– Client-defined clip planes.

– Per-vertex processing in EXT_point_parameters.

– Per-vertex processing in NV_fog_distance.

– Per-vertex point size computations.

Page 17: Computer Graphics 3 Lecture 4: GPU Programming

Vertex Programs

• What is not replaced?
  – The view frustum clip.
  – Perspective divide (division by w).
  – The viewport transformation.
  – The depth range transformation.
  – Clamping the primary and secondary color to [0,1].
  – Primitive assembly and per-fragment operations.
  – Evaluator (except the AUTO_NORMAL normalization).
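The fixed stages that follow a vertex program can be sketched on the CPU. The Python below is only an illustration, assuming the usual OpenGL conventions (normalized device coordinates in [-1, 1], a viewport given as x, y, width, height, and a default depth range of [0, 1]); the function name clip_to_window is hypothetical.

```python
def clip_to_window(hpos, viewport, depth_range=(0.0, 1.0)):
    """Map a clip-space position (x, y, z, w) to window coordinates."""
    x, y, z, w = hpos
    # Perspective divide: clip space -> normalized device coordinates.
    ndc = (x / w, y / w, z / w)
    # Viewport transformation: NDC x and y -> pixel coordinates.
    vx, vy, width, height = viewport
    win_x = vx + (ndc[0] + 1.0) * 0.5 * width
    win_y = vy + (ndc[1] + 1.0) * 0.5 * height
    # Depth range transformation: NDC z -> [near, far].
    near, far = depth_range
    win_z = near + (ndc[2] + 1.0) * 0.5 * (far - near)
    return (win_x, win_y, win_z)


# The vertex program only has to write the clip-space position (HPOS);
# everything in clip_to_window still happens afterwards in fixed hardware.
print(clip_to_window((1.0, -1.0, 0.0, 2.0), (0, 0, 640, 480)))  # (480.0, 120.0, 0.5)
```

This is why a vertex program must output a homogeneous position rather than pixel coordinates: the division by w and the mapping to the window are applied to whatever it writes.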

Page 18: Computer Graphics 3 Lecture 4: GPU Programming

NV Vertex Programs

• Different versions: 1.0, 1.1, 2.0, 3.0.
• Version 1.0:
  – 12 temporary vector registers (xyzw): R0 to R11.
  – 96 read-only vector registers (xyzw).
    • Specified outside of glBegin/glEnd.
  – 8 matrices.
  – 17 different vertex program instructions.
    • (128 instructions max. inside the program.)
    • 27 in shader 3.0 model.

Page 19: Computer Graphics 3 Lecture 4: GPU Programming

NV Vertex Programs

• Input parameters for the vertices (v[]):

Mnemonic  Number  Typical meaning
OPOS      0       object position
WGHT      1       vertex weight
NRML      2       normal
COL0      3       primary color
COL1      4       secondary color
FOGC      5       fog coordinate
TEX0      8       texture coordinate 0
TEX1      9       texture coordinate 1
TEX2      10      texture coordinate 2
TEX3      11      texture coordinate 3
TEX4      12      texture coordinate 4
TEX5      13      texture coordinate 5
TEX6      14      texture coordinate 6
TEX7      15      texture coordinate 7

Page 20: Computer Graphics 3 Lecture 4: GPU Programming

NV Vertex Programs

• New output values for the vertices (o[]):

Mnemonic  Typical meaning                    Components
HPOS      Homogeneous clip space position    (x,y,z,w)
COL0      Primary color (front-facing)       (r,g,b,a)
COL1      Secondary color (front-facing)     (r,g,b,a)
BFC0      Back-facing primary color          (r,g,b,a)
BFC1      Back-facing secondary color        (r,g,b,a)
FOGC      Fog coordinate                     (f,*,*,*)
PSIZ      Point size                         (p,*,*,*)
TEX0      Texture coordinate set 0           (s,t,r,q)
TEX1      Texture coordinate set 1           (s,t,r,q)
TEX2      Texture coordinate set 2           (s,t,r,q)
TEX3      Texture coordinate set 3           (s,t,r,q)
TEX4      Texture coordinate set 4           (s,t,r,q)
TEX5      Texture coordinate set 5           (s,t,r,q)
TEX6      Texture coordinate set 6           (s,t,r,q)
TEX7      Texture coordinate set 7           (s,t,r,q)

Page 21: Computer Graphics 3 Lecture 4: GPU Programming

NV Vertex Programs

• Vertex program instructions (inputs are scalar "s" or vector "v"; outputs are vector "v" or replicated scalar "ssss"):

OpCode  Inputs  Output            Operation
ARL     s       address register  address register load
MOV     v       v                 move
MUL     v,v     v                 multiply
ADD     v,v     v                 add
MAD     v,v,v   v                 multiply and add
RCP     s       ssss              reciprocal
RSQ     s       ssss              reciprocal square root
DP3     v,v     ssss              3-component dot product
DP4     v,v     ssss              4-component dot product
DST     v,v     v                 distance vector
MIN     v,v     v                 minimum
MAX     v,v     v                 maximum
SLT     v,v     v                 set on less than
SGE     v,v     v                 set on greater than or equal
EXP     s       v (ssss?)         exponential base 2
LOG     s       v (ssss?)         logarithm base 2
LIT     v       v                 light coefficients

Page 22: Computer Graphics 3 Lecture 4: GPU Programming

NV Vertex Programs

• Special instruction manipulation:
  – Use of negated values:
    • MOV R0,-R1;
    • ADD R0,R1,-R2; # R0 <= R1-R2 (vector operation)
  – Registers can be swizzled:
    • MOV R1,R1.wzyx;
    • ADD R1,R1,R1.xzxy;

            x   y   z   w
  Old R1:   1   3   7   11
  New R1:   2   10  8   14   (after ADD R1,R1,R1.xzxy)
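The negation and swizzle modifiers above can be checked with a small CPU-side emulation. This Python sketch is not vendor code; the helper names swizzle, negate and add are hypothetical, and registers are modelled as plain 4-tuples (x, y, z, w).

```python
def swizzle(reg, pattern):
    """Return the components of reg reordered by pattern, e.g. 'wzyx'."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(reg[index[c]] for c in pattern)


def negate(reg):
    """Source negation, as in -R1."""
    return tuple(-c for c in reg)


def add(a, b):
    """Component-wise vector add."""
    return tuple(p + q for p, q in zip(a, b))


r1 = (1, 3, 7, 11)
r2 = (1, 1, 1, 1)

print(add(r1, negate(r2)))            # R1 - R2      -> (0, 2, 6, 10)
print(swizzle(r1, "wzyx"))            # R1.wzyx      -> (11, 7, 3, 1)
print(add(r1, swizzle(r1, "xzxy")))   # R1 + R1.xzxy -> (2, 10, 8, 14)
```

The last line reproduces the old/new R1 values shown on the slide: each output component adds the original component to the one selected by the swizzle.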

Page 23: Computer Graphics 3 Lecture 4: GPU Programming

NV Vertex Programs

• Example: normal normalization.

# v[NRML] = (nx,ny,nz)
#
# R0.xyz = normalize(v[NRML])
# R0.w = 1/sqrt(nx*nx + ny*ny + nz*nz)
#
!!VP1.0
MOV R1, v[NRML];
DP3 R0.w, R1, R1;
RSQ R0.w, R0.w;
MUL R0.xyz, R1, R0.wwww;
# Then use R0 to compute shading...
MOV o[COL0],...
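The same DP3, RSQ, MUL sequence can be traced on the CPU to see what ends up in R0. This Python sketch is only an emulation of the arithmetic, with hypothetical helper names dp3 and normalize_like_vp; it assumes a non-zero input normal.

```python
import math


def dp3(a, b):
    """3-component dot product (the w component is ignored)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize_like_vp(normal):
    """Mirror the vertex program: returns R0 as (nx', ny', nz', 1/length)."""
    r1 = normal                          # MOV R1, v[NRML]
    w = dp3(r1, r1)                      # DP3 R0.w, R1, R1
    w = 1.0 / math.sqrt(w)               # RSQ R0.w, R0.w
    xyz = tuple(c * w for c in r1[:3])   # MUL R0.xyz, R1, R0.wwww
    return xyz + (w,)


r0 = normalize_like_vp((3.0, 0.0, 4.0))
print(r0)  # approximately (0.6, 0.0, 0.8, 0.2)
```

Three instructions are enough because RSQ delivers the reciprocal of the length directly, so no division is needed.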

Page 24: Computer Graphics 3 Lecture 4: GPU Programming

NV Vertex Programs

# Simple specular and diffuse lighting computation with an eye-space normal
!!VP1.0
#
# c[0-3] = modelview projection (composite) matrix
# c[4-7] = modelview inverse transpose
# c[32] = normalized eye-space light direction (infinite light)
# c[33] = normalized constant eye-space half-angle vector (infinite viewer)
# c[35].x = pre-multiplied monochromatic diffuse light color & diffuse material
# c[35].y = pre-multiplied monochromatic ambient light color & diffuse material
# c[36] = specular color
# c[38].x = specular power
#
# outputs homogeneous position and color
#
DP4 o[HPOS].x, c[0], v[OPOS];
DP4 o[HPOS].y, c[1], v[OPOS];
DP4 o[HPOS].z, c[2], v[OPOS];
DP4 o[HPOS].w, c[3], v[OPOS];
DP3 R0.x, c[4], v[NRML];
DP3 R0.y, c[5], v[NRML];
DP3 R0.z, c[6], v[NRML];            # R0 = n' = transformed normal
DP3 R1.x, c[32], R0;                # R1.x = Lpos DOT n'
DP3 R1.y, c[33], R0;                # R1.y = hHat DOT n'
MOV R1.w, c[38].x;                  # R1.w = specular power
LIT R2, R1;                         # Compute lighting values
MAD R3, c[35].x, R2.y, c[35].y;     # diffuse + ambient
MAD o[COL0].xyz, c[36], R2.z, R3;   # + specular
END
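The LIT instruction does most of the work in this program. The Python below is a hedged emulation of its usual definition (output = (1, max(N·L, 0), specular term, 1), with the specular term forced to 0 when N·L is not positive); the function names lit and shade are hypothetical, and a strictly positive specular power is assumed.

```python
def lit(src):
    """Emulate LIT on src = (N.L, N.H, unused, specular power)."""
    n_dot_l = max(src[0], 0.0)
    n_dot_h = max(src[1], 0.0)
    power = min(src[3], 128.0)  # assumption: power > 0; the hardware clamps it near 128
    specular = n_dot_h ** power if n_dot_l > 0.0 else 0.0
    return (1.0, n_dot_l, specular, 1.0)


def shade(n_dot_l, n_dot_h, diffuse, ambient, specular_color, power):
    """Mirror the last three instructions of the vertex program."""
    r2 = lit((n_dot_l, n_dot_h, 0.0, power))               # LIT R2, R1
    r3 = diffuse * r2[1] + ambient                         # MAD R3, c[35].x, R2.y, c[35].y
    return tuple(c * r2[2] + r3 for c in specular_color)   # MAD o[COL0].xyz, c[36], R2.z, R3


print(lit((0.5, 0.5, 0.0, 2.0)))    # (1.0, 0.5, 0.25, 1.0)
print(lit((-0.1, 0.9, 0.0, 8.0)))   # (1.0, 0.0, 0.0, 1.0): surface faces away from the light
```

Packing the clamps and the power function into one instruction is what lets the whole lighting model fit in a handful of vertex program instructions.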

Page 25: Computer Graphics 3 Lecture 4: GPU Programming

NV Fragment Programs

• Similar to the vertex programs.
  – Same way to load programs.
  – Inputs and outputs are different.
  – Different set of instructions.
    • More instructions, but tend to be the same…
• Versions available: 1.0, 2.0, and 4.0.
  – 64 constant vector registers.
  – 32 32-bit floating-point registers or 64 16-bit floating-point registers.

Page 26: Computer Graphics 3 Lecture 4: GPU Programming

NV Fragment Programs

Fragment Program Inputs

Register  Description                        Components
f[WPOS]   Position of the fragment center    (x,y,z,1/w)
f[COL0]   Interpolated primary color         (r,g,b,a)
f[COL1]   Interpolated secondary color       (r,g,b,a)
f[FOGC]   Interpolated fog distance/coord    (z,0,0,0)
f[TEX0]   Texture coordinate (unit 0)        (s,t,r,q)
f[TEX1]   Texture coordinate (unit 1)        (s,t,r,q)
f[TEX2]   Texture coordinate (unit 2)        (s,t,r,q)
f[TEX3]   Texture coordinate (unit 3)        (s,t,r,q)
f[TEX4]   Texture coordinate (unit 4)        (s,t,r,q)
f[TEX5]   Texture coordinate (unit 5)        (s,t,r,q)
f[TEX6]   Texture coordinate (unit 6)        (s,t,r,q)
f[TEX7]   Texture coordinate (unit 7)        (s,t,r,q)

Page 27: Computer Graphics 3 Lecture 4: GPU Programming

NV Fragment Programs

Fragment Program Outputs

Register Name Description

o[COLR] Final RGBA fragment color, fp32 format (color programs)

o[COLH] Final RGBA fragment color, fp16 format (color programs)

o[DEPR] Final fragment depth value, fp32 format

o[TEX0] TEXTURE0 output, fp16 format (combiner programs)

o[TEX1] TEXTURE1 output, fp16 format (combiner programs)

o[TEX2] TEXTURE2 output, fp16 format (combiner programs)

o[TEX3] TEXTURE3 output, fp16 format (combiner programs)

Write access only!

Page 28: Computer Graphics 3 Lecture 4: GPU Programming

NV Fragment Programs

Fragment Program Instruction Set (V2.0)

Instruction Inputs Output Description

ADD[RHX][C][_SAT] v,v v add

COS[RH ][C][_SAT] s ssss cosine

DDX[RH ][C][_SAT] v v derivative relative to x

DDY[RH ][C][_SAT] v v derivative relative to y

DP3[RHX][C][_SAT] v,v ssss 3-component dot product

DP4[RHX][C][_SAT] v,v ssss 4-component dot product

DST[RH ][C][_SAT] v,v v distance vector

EX2[RH ][C][_SAT] s ssss exponential base 2

FLR[RHX][C][_SAT] v v floor

FRC[RHX][C][_SAT] v v fraction

KIL none none conditionally discard fragment

LG2[RH ][C][_SAT] s ssss logarithm base 2

LIT[RH ][C][_SAT] v v compute light coefficients

LRP[RHX][C][_SAT] v,v,v v linear interpolation

MAD[RHX][C][_SAT] v,v,v v multiply and add

MAX[RHX][C][_SAT] v,v v maximum

MIN[RHX][C][_SAT] v,v v minimum

MOV[RHX][C][_SAT] v v move

MUL[RHX][C][_SAT] v,v v multiply

PK2H v ssss pack two 16-bit floats

PK2US v ssss pack two unsigned 16-bit scalars

PK4B v ssss pack four signed 8-bit scalars

PK4UB v ssss pack four unsigned 8-bit scalars

POW[RH ][C][_SAT] s,s ssss exponentiation (x^y)

Page 29: Computer Graphics 3 Lecture 4: GPU Programming

NV Fragment Programs

Fragment Program Instruction Set (V2.0), continued

Instruction Inputs Output Description

RCP[RH ][C][_SAT] s ssss reciprocal

RFL[RH ][C][_SAT] v,v v reflection vector

RSQ[RH ][C][_SAT] s ssss reciprocal square root

SEQ[RHX][C][_SAT] v,v v set on equal

SFL[RHX][C][_SAT] v,v v set on false

SGE[RHX][C][_SAT] v,v v set on greater than or equal

SGT[RHX][C][_SAT] v,v v set on greater than

SIN[RH ][C][_SAT] s ssss sine

SLE[RHX][C][_SAT] v,v v set on less than or equal

SLT[RHX][C][_SAT] v,v v set on less than

SNE[RHX][C][_SAT] v,v v set on not equal

STR[RHX][C][_SAT] v,v v set on true

SUB[RHX][C][_SAT] v,v v subtract

TEX[C][_SAT] v v texture lookup

TXD[C][_SAT] v,v, v v texture lookup w/partials

TXP[C][_SAT] v v projective texture lookup

UP2H[C][_SAT] s v unpack two 16-bit floats

UP2US[C][_SAT] s v unpack two unsigned 16-bit scalars

UP4B[C][_SAT] s v unpack four signed 8-bit scalars

UP4UB[C][_SAT] s v unpack four unsigned 8-bit scalars

X2D[RH ][C][_SAT] v,v,v v 2D coordinate transformation
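The pack and unpack instructions in the table trade precision for storage. As an illustration only, the Python below emulates the PK4UB / UP4UB pair; the byte order (x in the lowest byte) and the round-to-nearest step are assumptions of this sketch and should be checked against the extension specification before relying on them.

```python
def pk4ub(v):
    """Pack four floats, clamped to [0, 1], into one 32-bit integer."""
    packed = 0
    for i, c in enumerate(v):
        clamped = min(max(c, 0.0), 1.0)
        byte = int(clamped * 255.0 + 0.5)   # round to the nearest 8-bit value
        packed |= byte << (8 * i)           # assumed layout: x lowest, w highest
    return packed


def up4ub(packed):
    """Unpack a 32-bit integer into four floats in [0, 1]."""
    return tuple(((packed >> (8 * i)) & 0xFF) / 255.0 for i in range(4))


word = pk4ub((1.0, 0.0, 0.0, 1.0))
print(hex(word))     # 0xff0000ff
print(up4ub(word))   # (1.0, 0.0, 0.0, 1.0)
```

Packing lets one 32-bit output channel carry four low-precision values, which matters when the number of outputs is limited.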

Page 30: Computer Graphics 3 Lecture 4: GPU Programming

NV Fragment Programs

• Simple example: red colouring of the fragments (i.e., rasterized pixels):

!!FP1.0
DEFINE red={1.0,0,0,0};
MOV o[COLR], red;
END

• Simple example: applying single texturing.

!!FP1.0
TEX R0, f[TEX0], TEX0, 2D;   # Last parameter can be 1D, 2D, 3D, RECT
MOV o[COLR], R0;
END

Page 31: Computer Graphics 3 Lecture 4: GPU Programming

NV Fragment Programs

• Useful instructions:
  – LRP: linear interpolation.
  – SIN, COS…
  – SGE, SLT, …: set the comparison flags.
  – KIL: stop the pixel computation.
  – Pack and unpack instructions.
• Most instructions execute in 1 cycle (not counting texture access).
• Most instructions can conditionally update the result according to the comparison flags (e.g., MOV => MOVC).
• Most instructions can clamp the results between 0 and 1.
  – MOV => MOV_SAT.
• Loops are now possible with the latest generation.
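Two of the features listed above, LRP and the _SAT suffix, are easy to state as arithmetic. The Python below is a CPU-side sketch with hypothetical helper names; it assumes LRP computes f*a + (1 - f)*b per component, with the interpolation factor as the first operand.

```python
def saturate(v):
    """What the _SAT suffix does: clamp every component to [0, 1]."""
    return tuple(min(max(c, 0.0), 1.0) for c in v)


def mov(src, sat=False):
    """MOV, or MOV_SAT when sat is True."""
    return saturate(src) if sat else tuple(src)


def lrp(f, a, b):
    """LRP: per-component linear interpolation, f*a + (1 - f)*b."""
    return tuple(fc * ac + (1.0 - fc) * bc for fc, ac, bc in zip(f, a, b))


print(mov((1.5, -0.25, 0.5, 1.0), sat=True))   # (1.0, 0.0, 0.5, 1.0)
print(lrp((0.25,) * 4, (1.0,) * 4, (0.0,) * 4))  # (0.25, 0.25, 0.25, 0.25)
```

Because the clamp is a modifier on the instruction rather than a separate instruction, it comes at no extra instruction count in the program.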

Page 32: Computer Graphics 3 Lecture 4: GPU Programming

(Silly) Limitations

• Most of the limitations are for performance reasons.
• At the fragment level, there is no real possibility to access the frame-buffer in read-write mode.
  – The new pixel value cannot be computed from the old one.
  – Floating-point precision filtering and blending are only available in recent graphics cards (NV 8x00 generation). Previous cards (e.g., GeForce 7800 series) could only filter and blend at FP16 precision.
  – Actual number of registers may be less than the number of logical registers.
    • Slower programs if a large number of registers is used.

Page 33: Computer Graphics 3 Lecture 4: GPU Programming

High Level Languages

• Why?
  – Assembly programming can be tedious when having long assembly shaders.
  – Inefficient or difficult programming and debugging operations.
  – High-level languages are more portable.
• But:
  – Final code may be slower.

Page 34: Computer Graphics 3 Lecture 4: GPU Programming

High Level Languages: Cg Overview

• C for Graphics.
  – Syntax similar to C for easy shader writing.
  – See the Cg manual: http://developer.nvidia.com/object/cg_toolkit.html

• The vertex and fragment programs take specific input vectors and values, and have to return specific outputs.

• Need to declare data structures that will be input and output parameters of a function.

Page 35: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Inputs

• Two kinds of shader inputs:
  – Varying inputs.
    • Inputs that are specific to each entity processed.
      – Vertex: position, normals, etc.
      – Fragment: interpolated values like colors, texture coordinates, etc.
  – Uniform inputs.
    • Values that do not change when streaming vertices.
      – Vertex level: transformation matrix.
      – Fragment level: constant parameters, …

Page 36: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Vertex Program Inputs

• Supported inputs to a Cg vertex program (binding semantics).
  – POSITION.
  – BLENDWEIGHT.
  – NORMAL.
  – TANGENT.
  – BINORMAL.
  – PSIZE.
  – BLENDINDICES.
  – TEXCOORD0–TEXCOORD7.
• Every parameter can be declared as a float array of 1 to 4 components (float, float4, …).
  – float3 myPosition : POSITION;

Page 37: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Vertex Program Inputs

• Example from the Cg User Manual.

struct myinputs {
  float3 myPosition : POSITION;
  float3 myNormal : NORMAL;
  float3 myTangent : TANGENT;
  float refractive_index : TEXCOORD3;
};

outdata foo(myinputs indata) {
  /* ... */
  // Within the program, the parameters are referred to as
  // "indata.myPosition", "indata.myNormal", and so on.
  /* ... */
}

Page 38: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Vertex Program Inputs

• Inputs can be directly specified as function parameters (rather than through a struct).
• Example from the Cg User Manual:

outdata foo(float3 myPosition : POSITION,
            float3 myNormal : NORMAL,
            float3 myTangent : TANGENT,
            float refractive_index : TEXCOORD3) {
  /* ... */
}

Page 39: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Vertex Program Varying Output

• The vertex program output type should match the fragment program's input type.
• The binding semantics help the compiler to associate the vertex output with the fragment input (interoperability).
• The semantics do not actually impose a specific use for those channels.
  – Texture coordinates can be used to specify colors or locations, for example.

Page 40: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Vertex Program Varying Output

• Supported outputs of a vertex program.
  – POSITION.
  – PSIZE.
  – FOG.
  – COLOR0–COLOR1.
  – TEXCOORD0–TEXCOORD7.

Page 41: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Vertex Program Varying Output

• Example from the Cg User Manual:

// Vertex program (inside a Cg file…)
struct myvf {
  float4 pout : POSITION;   // Used for rasterization
  float4 diffusecolor : COLOR0;
  float4 uv0 : TEXCOORD0;
  float4 uv1 : TEXCOORD1;
};

myvf foo(/* ... */) {
  myvf outstuff;
  /* ... */
  return outstuff;
}

Page 42: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Input/Output Interoperability

• Example from the Cg User Manual:

struct myvert2frag {
  float4 pos : POSITION;
  float4 uv0 : TEXCOORD0;
  float4 uv1 : TEXCOORD1;
};

// Vertex program
myvert2frag vertmain(...) {
  myvert2frag outdata;
  /* ... */
  return outdata;
}

// Fragment program
void fragmain(myvert2frag indata) {
  float4 tcoord = indata.uv0;
  /* ... */
}

Page 43: Computer Graphics 3 Lecture 4: GPU Programming

Cg: Fragment Program Varying Output

• Two supported outputs: COLOR and DEPTH.
• Examples:

void main(/* ... */, out float4 color : COLOR, out float depth : DEPTH) {
  /* ... */
  color = diffuseColor * /* ... */;
  depth = /* ... */;
}

float4 main(/* ... */) : COLOR {
  /* ... */
  return diffuseColor * /* ... */;
}

Page 44: Computer Graphics 3 Lecture 4: GPU Programming

Cg: General Coding

• Different types of variables are supported and declarable:
  – float, half (16 bits), fixed (12 bits).
  – int, bool.
  – float1, float4, bool4, bool1, …
  – float1x1, float2x2, …
  – Arrays.
• Can declare auxiliary functions.
• A wide set of functions and operators is also available.

Page 45: Computer Graphics 3 Lecture 4: GPU Programming

Cg: General Coding

• Control flow.
  – if, else, while, for.
• Function definitions and function overloads.
• Arithmetic operators from C.
• Multiplication function.
  – Matrix × Vector, Vector × Matrix, Matrix × Matrix.
• Vector constructor.
• Boolean and comparison operators.
• Swizzle operator.
  – float4 a; => a.xxxx;
• Write mask operator.
  – float4 color = float4(1.0, 1.0, 0.0, 0.0); color.a = 2.0;
• Conditional operator.
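The multiplication function and the write mask can be illustrated without a GPU. In the Python sketch below, the helper names are hypothetical; it assumes the Cg convention that mul(M, v) treats v as a column vector while mul(v, M) treats it as a row vector, and it models a write mask as an update of the named components only.

```python
def mul_matrix_vector(m, v):
    """Like mul(M, v): v is a column vector, the result has one entry per row of M."""
    return tuple(sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m)))


def mul_vector_matrix(v, m):
    """Like mul(v, M): v is a row vector, the result has one entry per column of M."""
    return tuple(sum(v[r] * m[r][c] for r in range(len(v))) for c in range(len(m[0])))


def write_mask(dst, mask, src):
    """Like dst.mask = src: only the components named in mask are overwritten."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3, "r": 0, "g": 1, "b": 2, "a": 3}
    out = list(dst)
    for name, value in zip(mask, src):
        out[index[name]] = value
    return tuple(out)


m = ((1, 2), (3, 4))
print(mul_matrix_vector(m, (1, 1)))                      # (3, 7)
print(mul_vector_matrix((1, 1), m))                      # (4, 6)
print(write_mask((1.0, 1.0, 0.0, 0.0), "a", (2.0,)))     # (1.0, 1.0, 0.0, 2.0)
```

The two products differ unless the matrix is symmetric, so the argument order in mul decides whether a matrix or its transpose is applied.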

Page 46: Computer Graphics 3 Lecture 4: GPU Programming

Cg: General Coding

• Standard nonprojective texture lookup:
  – tex2D (sampler2D tex, float2 s);
  – texRECT (samplerRECT tex, float2 s);
  – texCUBE (samplerCUBE tex, float3 s);
• Standard projective texture lookup:
  – tex2Dproj (sampler2D tex, float3 sq);
  – texRECTproj (samplerRECT tex, float3 sq);
  – texCUBEproj (samplerCUBE tex, float4 sq);
• Math functions:
  – abs, cos, sin, tan, acos, asin, atan, clamp, determinant, exp, log, floor, lerp, min, max, pow, sqrt, normalize, …
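The difference between the nonprojective and projective lookups is one division. The Python below is a minimal sketch, assuming nearest-neighbour sampling, coordinates clamped to the texture, and a texture stored as a row-major list of lists; the function names are hypothetical stand-ins for tex2D and tex2Dproj, not their real implementation.

```python
def tex2d_nearest(texture, s, t):
    """Nearest-neighbour lookup with s, t nominally in [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = min(max(int(s * width), 0), width - 1)
    y = min(max(int(t * height), 0), height - 1)
    return texture[y][x]


def tex2d_proj(texture, sq):
    """Projective lookup: divide s and t by q first, then sample as usual."""
    s, t, q = sq
    return tex2d_nearest(texture, s / q, t / q)


texture = [[0, 1],
           [2, 3]]
print(tex2d_nearest(texture, 0.75, 0.25))       # 1
print(tex2d_proj(texture, (1.5, 1.5, 2.0)))     # 3
```

The third coordinate q plays the same role for texture coordinates as w does for positions, which is what makes projected textures (such as spotlight cookies) possible.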

Page 47: Computer Graphics 3 Lecture 4: GPU Programming

Applications

Page 48: Computer Graphics 3 Lecture 4: GPU Programming

Application: Procedural Texturing

ref: new york university media research lab, http://mrl.nyu.edu/projects/texture/

• Application of textures that are not image based.
  – Combination of noise and various math expressions (Perlin noise).
  – Representation of wood, marble, stone, clouds, waves, bumps…
  – Can be computed at the fragment level.
  – Adds computations, but reduces bandwidth.
  – Suppresses the issue of texturing curved surfaces.
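A procedural texture is just a function of the texture coordinates. The Python below sketches a marble-like pattern; it uses a simple hash-based value noise as a stand-in for Perlin noise (not Perlin's actual algorithm), and every name and constant in it is an illustrative assumption.

```python
import math


def hash_noise(ix, iy):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    n = (ix * 374761393 + iy * 668265263) & 0xFFFFFFFF
    n = ((n ^ (n >> 13)) * 1274126177) & 0xFFFFFFFF
    return ((n ^ (n >> 16)) & 0xFFFFFFFF) / 4294967296.0


def value_noise(x, y):
    """Bilinear interpolation of the lattice values around (x, y)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    top = hash_noise(ix, iy) * (1 - fx) + hash_noise(ix + 1, iy) * fx
    bottom = hash_noise(ix, iy + 1) * (1 - fx) + hash_noise(ix + 1, iy + 1) * fx
    return top * (1 - fy) + bottom * fy


def marble(u, v, frequency=8.0, turbulence=4.0):
    """Marble-like intensity in [0, 1]: sine stripes perturbed by noise."""
    n = value_noise(u * frequency, v * frequency)
    return 0.5 + 0.5 * math.sin(u * frequency * math.pi + turbulence * n)


# Evaluate the pattern at a few texture coordinates; no image is ever fetched.
for u in (0.0, 0.25, 0.5):
    print(round(marble(u, 0.3), 3))
```

In a fragment program the same arithmetic would run once per pixel, which is the trade of extra computation for reduced texture bandwidth mentioned above.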

Page 49: Computer Graphics 3 Lecture 4: GPU Programming

Application: Phong Shading

ref: new york university media research lab, http://mrl.nyu.edu/projects/texture/

• The traditional OpenGL pipeline implements Gouraud (shading) interpolation.
  – Computation of colors and lighting at the vertices, followed by a linear interpolation.
  – Can miss the specular highlights that can occur in the middle of a triangle.
• Phong interpolation is better.
  – Linearly interpolate the normal across the triangle first.
  – Then compute Phong shading from the interpolated normal.

Page 50: Computer Graphics 3 Lecture 4: GPU Programming

Application: Phong Shading

Ian Fergusson, https://www.cis.strath.ac.uk/teaching/ug/classes/52.359/lect13.pdf

Page 51: Computer Graphics 3 Lecture 4: GPU Programming

Application: Phong Shading

• How to realize a Phong interpolation ?

– Pass the normal as a texture coordinate at the vertex level.

– The texture coordinates will be automatically interpolated at the fragment level.

– Normalize the normal in the fragment program first, and then compute a Phong shading.
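The missed-highlight problem can be reproduced with a few lines of arithmetic. The Python below is an illustrative sketch only: it compares the specular term halfway along an edge when colors are interpolated (Gouraud) and when the normal is interpolated and renormalized (Phong). The two vertex normals, the half vector and the power of 32 are made-up values.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lerp(a, b, t):
    return tuple((1 - t) * p + t * q for p, q in zip(a, b))


def specular(normal, half_vector, power):
    n_dot_h = max(sum(n * h for n, h in zip(normal, half_vector)), 0.0)
    return n_dot_h ** power


half_vector = (0.0, 0.0, 1.0)        # light and viewer both along +z
n0 = normalize((-1.0, 0.0, 1.0))     # normal at the first vertex
n1 = normalize((1.0, 0.0, 1.0))      # normal at the second vertex
power = 32

# Gouraud: shade at the vertices, then interpolate the resulting colors.
gouraud_mid = 0.5 * specular(n0, half_vector, power) + 0.5 * specular(n1, half_vector, power)

# Phong: interpolate the normal, renormalize it, then shade per fragment.
phong_mid = specular(normalize(lerp(n0, n1, 0.5)), half_vector, power)

print(gouraud_mid)  # close to 0: the highlight is lost
print(phong_mid)    # close to 1: the highlight appears at the midpoint
```

The renormalization step matters: the interpolated normal is shorter than unit length in general, and raising a too-small dot product to a high power would dim the highlight.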

Page 52: Computer Graphics 3 Lecture 4: GPU Programming

Other Applications

• Bump mapping.
  – Can be done at the vertex or at the fragment level.
• Volume rendering.
  – Use of 3D textures.
• GPGPU.
  – General-purpose computing on graphics processing units.
  – A lot of GFLOPS…
  – Scientific calculations like Fourier transforms.
• Geometry modification (animation, morphing…).

