programmability features of graphics hardware
TRANSCRIPT
GH Programmability
Michael Doggett

Outline
Graphics Hardware
Transform Stage: Vertex Engine, OpenGL ARB vertex program
Fragment Stage: Pixel Pipeline, OpenGL ARB fragment program
Examples: Mandelbrot, FFT, Displacement Mapping
Programming Options: OpenGL Shading Language overview
Computation on GPUs

Graphics Hardware
Hardware pipeline
[Figure: pipeline diagram with stages Surface, Transform, Rasterization, Fragment, FrameBuffer]

Graphics Hardware
Hardware pipeline, based on RADEON 9800
380 MHz
SIMD stages: Transform, Fragment
Highly parallel
4 component vector registers and operations
High computation/bandwidth ratio (arithmetic intensity)
[Figure: pipeline diagram with 4 parallel Transform (T) units, 8 parallel Fragment (F) units, 8 units labeled A, and Memory]

Graphics Hardware
Input: 3D Scene
Typically triangles made up of vertices
1D buffers of vertex data
Vertex and Fragment programs
Textures: random memory access
Output: 2D Image
Color, depth and stencil
[Figure: pipeline diagram (Surface, Transform, Rasterization, Fragment, FrameBuffer) with inputs Vertices, Surface, Programs, Textures and output Image]

Graphics Hardware
Focus on programmable stages: Transform, Fragment
[Figure: pipeline diagram (Surface, Transform, Rasterization, Fragment, FrameBuffer) with inputs Vertices, Surface, Programs, Textures and output Image]

Graphics Hardware Evolution
Year | 2000 | 2001 | 2002
ATI RADEON | 7500 (R100) | 8500 (R200) | 9700 (R300)
DirectX | 7 | 8 | 9
Stages:
Surface | CPU | CPU/GPU | CPU/GPU
Transform | State | VS 1.1, 128 instrs | VS 2.0, 256 instrs, Control Flow
Rasterization | State | State | State
Fragment | State | PS 1.4, 3.12 Fixed point, 12 tex/16 alus | PS 2.0, 16e7 Floating Point, 32 tex/64 alus
FrameBuffer | State | State | State

Surface Generation Stage
Newest stage
DX8 NPatches
State controlled
Geometric calculations based on complex surfaces

Transform Stage
Includes: ModelView Transformation, Vertex Lighting, Perspective Transformation, Tweening/Skinning
Per-vertex operations
Originally microcoded on DSPs or CPU
Controlled by fixed function state
Programmable Vertex Engine: Lindholm et al., SIGGRAPH 2001

Vertex Engine
4 parallel vertex engines
Input: Vertex stream, Constants
Output: Position, Color, Tex Coords
Program (Shader) has up to 256 instructions
[Figure: Vertex Engine block diagram: Vertex Data, Output; labeled 16]

Vertex Engine
Registers
Four component floating point vectors
12 read-write temp registers
Output registers: position, color, fogcoord, pointsize, texcoord
Vertex shader outputs are pixel shader inputs
[Figure: Vertex Engine block diagram: Vertex Data, Address (1), Constant (256), Temp (12), Output]

Vertex Program Instructions
Basic arithmetic operators: ADD, MAD, MUL, SUB
Comparison: MAX, MIN, SGE, SLT
Dot and cross product: DP3, DP4, DPH, XPD
Exponential functions: EX2, EXP, LG2, LOG
Other: ABS, ARL, DST, FLR, FRC, LIT, MOV, POW, RCP, RSQ, SWZ

Register Modifiers
Source swizzle selects which components to use
iPosition.[xyzw][xyzw][xyzw][xyzw], e.g. iPosition.yzxw
Destination mask to select individual components
oColor.{x}{y}{z}{w}
Source negation
-iNormal
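The three modifiers can be mimicked on the CPU to see what each one does to a four component register. This is only an illustrative Python sketch; the helper names (swizzle, negate, masked_write) are hypothetical and not part of any GPU API.

```python
# CPU-side sketch of ARB-style register modifiers (hypothetical helper names).
COMPONENTS = "xyzw"

def swizzle(reg, pattern):
    """Source swizzle: pick components by name, e.g. pattern 'yzxw'."""
    return [reg[COMPONENTS.index(c)] for c in pattern]

def negate(reg):
    """Source negation: flip the sign of every component."""
    return [-v for v in reg]

def masked_write(dest, value, mask):
    """Destination mask: overwrite only the components named in mask."""
    return [value[i] if COMPONENTS[i] in mask else dest[i] for i in range(4)]

i_position = [1.0, 2.0, 3.0, 4.0]
print(swizzle(i_position, "yzxw"))  # [2.0, 3.0, 1.0, 4.0]

# Write the negated source into x and y only; z and w keep their old values.
o_color = masked_write([0.0, 0.0, 0.0, 1.0], negate(i_position), "xy")
print(o_color)  # [-1.0, -2.0, 0.0, 1.0]
```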

Simple Vertex Program
!!ARBvp1.0
ATTRIB iPos = vertex.position;
PARAM mvp[4] = { state.matrix.mvp };
PARAM ambientCol = state.lightprod[0].ambient;
OUTPUT oPos = result.position;
OUTPUT oColor = result.color;
# Transform the vertex to clip coordinates.
DP4 oPos.x, mvp[0], iPos;
DP4 oPos.y, mvp[1], iPos;
DP4 oPos.z, mvp[2], iPos;
DP4 oPos.w, mvp[3], iPos;
# Write out a color.
MOV oColor, ambientCol;
END
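The four DP4 instructions together amount to a 4x4 matrix times vector product, one matrix row per output component. A minimal Python sketch of that equivalence follows; the function names and the example translation matrix are assumptions made for illustration.

```python
def dp4(a, b):
    """Four component dot product, like the DP4 instruction."""
    return sum(x * y for x, y in zip(a, b))

def transform_vertex(mvp_rows, position):
    """One DP4 per matrix row gives one component of the clip-space position."""
    return [dp4(row, position) for row in mvp_rows]

# Hypothetical matrix: a pure translation by (2, 3, 4).
mvp = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(transform_vertex(mvp, [1.0, 1.0, 1.0, 1.0]))  # [3.0, 4.0, 5.0, 1.0]
```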

RADEON 9800 Vertex Shader
DirectX 9.0 Vertex Shader 2.0
Same arithmetic instructions
Constant based control flow capabilities
Loops, branches, subroutines: CALL, LOOP, ENDLOOP, JUMP, JNZ, LABEL, REPEAT, ENDREPEAT, RETURN
16 Integer constants
16 Boolean constants
Loop counter

Rasterization Stage
Includes: Triangle Setup, Viewport Clipping, Viewport Transform, Rasterization
Not programmable, some control through state
High precision vertex parameter interpolators: Position, normal, color, tex coords

Fragment Stage
Includes:
Texturing: arbitrary memory fetch (Gather), fixed point filtering (Point, Linear, Bi-linear, Tri-linear, Anisotropic)
Fragment (Per-Pixel) Lighting
RADEON 9700 introduced floating point
32 Texture and 64 ALU instructions
Co-issue vector and scalar instruction

Pixel Pipeline
8 parallel floating point pixel pipelines
Input: Color, Texcoords
Output: Color, Depth
Output surfaces: 16, 32 bit float; 16 bit fixed
[Figure: Pixel Pipeline (PP) block diagram: inputs Color and Texture Coords (8), output Color]

Pixel Pipeline
Registers
Four component 24 bit floating point: performance and precision
12 read-write temp registers
16 Texture images
Texture instructions: KIL, TEX, TXB, TXP
[Figure: Pixel Pipeline (PP) block diagram: Color, Constant Registers (32), Texture Coords (8), Texture Samplers (16), Temp (12), output Color]

Simple Fragment Program
!!ARBfp1.0
TEMP temp;
ATTRIB tex0 = fragment.texcoord[0]; # input register
ATTRIB col0 = fragment.color;
PARAM half = { 0.5, 0.5, 0.5, 0.5 }; # constant
OUTPUT out = result.color;
# Fetch texture
TEX temp, tex0, texture[0], 2D;
# Modulate and write out color
MUL out, col0, temp;
END
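What the TEX and MUL pair computes per fragment can be sketched on the CPU as a texel fetch followed by a component-wise multiply. The Python below is a rough illustration only: it assumes nearest-texel lookup (real hardware filters according to sampler state), and the function names and the 2x2 texture are made up for the example.

```python
def tex2d_nearest(texture, s, t):
    """Fetch the texel nearest to (s, t) in [0, 1]; texture is rows of RGBA lists.
    Nearest lookup is an assumption of this sketch, not the hardware filter."""
    height, width = len(texture), len(texture[0])
    x = min(int(s * width), width - 1)
    y = min(int(t * height), height - 1)
    return texture[y][x]

def modulate(color, texel):
    """Component-wise multiply, like MUL out, col0, temp."""
    return [c * t for c, t in zip(color, texel)]

# Hypothetical 2x2 RGBA texture.
texture = [
    [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]],
    [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]],
]
texel = tex2d_nearest(texture, 0.75, 0.25)
print(modulate([0.5, 0.5, 0.5, 1.0], texel))  # [0.0, 0.5, 0.0, 1.0]
```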

Computing the Mandelbrot set
Test each point on the complex number plane
X dimension is the real component
Y dimension is the imaginary component
Z' = Z^2 + C
C = starting position on complex plane
Iterate until |Z| > 2
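The iteration can be tried on the CPU with Python's built-in complex numbers. This is a minimal sketch, assuming Z starts at 0 and an arbitrary iteration cap; the function name escape_iterations is hypothetical.

```python
def escape_iterations(c, max_iterations=50):
    """Iterate Z' = Z^2 + C and return the iteration at which |Z| first exceeds 2.
    Returns None if the point has not escaped within max_iterations."""
    z = 0j  # assumed starting value for this sketch
    for i in range(max_iterations):
        z = z * z + c
        if abs(z) > 2.0:
            return i + 1
    return None

print(escape_iterations(0j))       # None (stays at 0, inside the set)
print(escape_iterations(1 + 0j))   # 3    (1, 2, 5: escapes on the third step)
print(escape_iterations(-1 + 0j))  # None (cycles between -1 and 0)
```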

Mandelbrot main loop
MUL pos.xy, curr, curr; # real component
ADD pos.x, pos.x, -pos.y; # x^2 - y^2 + start.x
ADD pos.x, pos.x, start.x;
MUL pos.y, curr.x, curr.y; # imaginary component
MAD pos.y, pos.y, two.x, start.y;
DP3 magnitude, pos, pos; # calculate magnitude
SUB magnitude, magnitude, four; # compare magnitude to 4
CMP escape.x, magnitude, 0, 1;
ADD pos.z, pos.z, escape.x;
MOV curr, pos; # ready next iteration

Mandelbrot multipass demo
Main loop is 11 instructions
Need multiple iterations to get detail
Multipass: Render to texture
Pass 1: Render output to framebuffer
Set framebuffer data as texture
Pass 2 to N: Read in previous result as texture
Pass N+1: Draw texture on screen
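The multipass scheme can be imitated on the CPU with two lists standing in for the two render targets: each pass reads one and writes the other, then the roles swap. The Python below is a sketch under stated assumptions: per-pixel state is (x, y, count), the initial state is zero, and the magnitude test uses only x and y. All names are hypothetical.

```python
def mandelbrot_step(state, start):
    """One pass for one pixel: advance Z and count the pass if |Z|^2 >= 4."""
    x, y, count = state
    sx, sy = start
    new_x = x * x - y * y + sx      # real component
    new_y = 2.0 * x * y + sy        # imaginary component
    escaped = 1 if new_x * new_x + new_y * new_y >= 4.0 else 0
    return (new_x, new_y, count + escaped)

def run_passes(starts, num_passes):
    """Ping-pong between two buffers, one full-screen pass per iteration."""
    front = [(0.0, 0.0, 0) for _ in starts]   # read as "texture"
    back = [None] * len(starts)               # written as "render target"
    for _ in range(num_passes):
        for i, start in enumerate(starts):
            back[i] = mandelbrot_step(front[i], start)
        front, back = back, front             # swap output and texture
    return front

result = run_passes([(0.0, 0.0), (1.0, 0.0)], 4)
print(result)  # [(0.0, 0.0, 0), (26.0, 0.0, 3)]
```

A larger count means the point escaped earlier, which is the value a final pass could map to a color.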
Mandelbrot demo

Render to texture
Render data
Set data as texture input to Fragment stage
Render to second output
Switch output and texture buffer
Ping-pong buffers
[Figure: pipeline diagram (Surface, Transform, Rasterization, Fragment, FrameBuffer) writing Output 1, then Output 2, then Output 3, with the previous output fed back as texture]

Fast Fourier Transform
Transform image into frequency domain
Complex weights for a summation of sine waves
Separable Filter
Cooley and Tukey 65, decimation in time
log2(width) + log2(height) + 2 rendering passes
Ping-pong between two floating point renderable textures
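As a quick check of the pass count formula, the Python sketch below evaluates it for power-of-two image sizes. The function name is hypothetical, and reading the extra 2 as the two scramble passes is an assumption rather than something the slide states.

```python
def fft_pass_count(width, height):
    """Rendering passes for a 2D FFT: log2(width) + log2(height) + 2.
    Both dimensions must be powers of two."""
    for size in (width, height):
        if size < 1 or size & (size - 1) != 0:
            raise ValueError("dimensions must be powers of two")
    log2_width = width.bit_length() - 1
    log2_height = height.bit_length() - 1
    return log2_width + log2_height + 2

print(fft_pass_count(512, 512))  # 20
print(fft_pass_count(256, 128))  # 17
```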

Fast Fourier Transform
Horizontal scramble
dependent texture read using precalculated bit-inverted position
log2(width) butterfly passes
texture read offset, beta
Vertical scramble
log2(height) butterfly passes
More details
ShaderX2 chapter, "Advanced Image Processing with DirectX 9 Pixel Shaders", Mitchell, Ansari, Hart
RNL separable 50x50 gaussian blur filter
Jason Mitchell's course notes from course 7. Real-Time Shading
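The scramble followed by log2(n) butterfly passes is the standard iterative radix-2 decimation-in-time FFT. A one-dimensional CPU sketch in Python is given below for orientation; it is not the shader code from the talk, and the names are hypothetical. Each butterfly pass reads one list and writes another, in the spirit of the ping-pong textures.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (the scramble position)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def fft_1d(samples):
    """Radix-2 decimation-in-time FFT: one scramble, then log2(n) butterfly passes."""
    n = len(samples)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    # Scramble pass: read each input from its bit-reversed position.
    data = [complex(samples[bit_reverse(i, bits)]) for i in range(n)]
    half = 1
    while half < n:
        out = list(data)  # write buffer for this pass
        for start in range(0, n, 2 * half):
            for k in range(half):
                weight = cmath.exp(-2j * cmath.pi * k / (2 * half))
                a = data[start + k]
                b = data[start + k + half] * weight
                out[start + k] = a + b
                out[start + k + half] = a - b
        data = out
        half *= 2
    return data

spectrum = fft_1d([1.0, 2.0, 3.0, 4.0])
print([complex(round(v.real, 6), round(v.imag, 6)) for v in spectrum])
```

For a 2D image the same routine would be applied along rows and then along columns, which is what makes the filter separable.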
FFT demo

Multiple Render Targets
Multiple render targets
4 possible output buffers
All same X, Y and bit resolution
Different formats
ATI_draw_buffers extension
adds result.color[n] where n=0,1,2,3
OUTPUT oColor1 = result.color[1];

Render to texture
pbuffers
OpenGL ARB SuperBuffers
Adds GLmem type
can be used as: framebuffer, texture, vertex array

Displacement mapping using render to vertex array
2 Pass algorithm
1st pass
Displacement fragment program
Output to SuperBuffer Vertex Array using ATI_draw_buffers: Vertex, Normal, TexCoord
2nd pass
Vertex program uses displaced Vertex Buffer
Fragment program does bump mapped lighting
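The first pass can be pictured as computing a new position for every vertex and writing it to a buffer that the second pass reads as a vertex array. The Python sketch below assumes the common form of displacement mapping, moving each vertex along its normal by a sampled height; the slides do not spell out the formula, and the function name and scale parameter are hypothetical.

```python
def displace_vertices(positions, normals, heights, scale=1.0):
    """Pass 1 sketch: move each vertex along its normal by scale * height.
    Displacement along the normal is an assumption of this example."""
    displaced = []
    for position, normal, height in zip(positions, normals, heights):
        displaced.append(tuple(p + scale * height * n
                               for p, n in zip(position, normal)))
    return displaced

positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
heights = [0.5, 0.25]
# The returned list plays the role of the displaced vertex buffer for pass 2.
print(displace_vertices(positions, normals, heights))
# [(0.0, 0.5, 0.0), (1.0, 0.25, 0.0)]
```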
Displacement Mapping Demo

Programming Options
Low or High level: Assembler or High Level Shading languages
Assembler
OpenGL Vertex Program and Fragment Program
DirectX Vertex Shader and Pixel Shader
Shading Languages
the OpenGL Shading Language
DirectX9 HLSL
NVIDIA's Cg
Tools
ATI's RenderMonkey

the OpenGL Shading Language
C like language for programming GPUs
Basically the same language for vertex and fragment shaders
Inputs and outputs similar to Vertex Engine and Pixel Pipeline
different syntax, e.g. gl_TexCoord0, gl_FragColor
Type qualifiers
const // compile-time constant
uniform // constant across primitive
Types: float, int, bool, vec4, ivec3

the OpenGL Shading Language
Structures and arrays
Operators work with 1-4 component types
Flow control
User defined functions
Built-in functions
sin, cos, normalize, noise,

Simple GLSL example: vertex shader
vec4 LightPos = vec4( 0.4, 0.6, 0.6, 0.0 );
vec4 DiffuseColor = vec4( 0.50, 0.15, 0.25, 1.0 );
vec4 LightColor = vec4( 0.8, 0.5, 0.8, 1.0 );
void main(void)
{
    vec3 eyeNormal;
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
    eyeNormal = gl_NormalMatrix * gl_Normal;
    gl_TexCoord0 = vec4( eyeNormal, 0.0 ); // eye space normal
    gl_TexCoord1 = LightPos;
    gl_TexCoord2 = DiffuseColor;
    gl_TexCoord3 = LightColor;
}

Simple GLSL example: fragment shader
const float Shininess = 50.0;
void main (void)
{
    vec3 NNormal;
    vec4 MyColor;
    float Intensity;
    NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormal
    Intensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation, tc1 = light position
    Intensity = max ( Intensity, 0.0 );
    MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse color
    NNormal = abs ( NNormal );
    Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light position
    Intensity = max ( Intensity, 0.0 );
    Intensity = pow ( Intensity, Shininess );
    MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light color
    MyColor.a = gl_TexCoord2.a; // diffuse alpha
    gl_FragColor = MyColor;
}
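To check the shader's arithmetic by hand, the same steps can be mirrored on the CPU. The Python sketch below follows the fragment shader line by line using the constants from the vertex shader example; the function names are hypothetical and the code is only an aid for reasoning about values, not a replacement for the GLSL.

```python
import math

def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize3(v):
    length = math.sqrt(dot3(v, v))
    return [c / length for c in v]

def shade(eye_normal, light_pos, diffuse_color, light_color, shininess=50.0):
    """Mirror of the fragment shader: diffuse term plus a specular-like term."""
    n = normalize3(eye_normal)
    intensity = max(dot3(light_pos, n), 0.0)            # diffuse calculation
    color = [intensity * c for c in diffuse_color]
    n = [abs(c) for c in n]
    intensity = max(dot3(n, light_pos), 0.0) ** shininess
    color = [c + intensity * l for c, l in zip(color, light_color)]
    color[3] = diffuse_color[3]                          # diffuse alpha
    return color

light_pos = [0.4, 0.6, 0.6, 0.0]
diffuse_color = [0.50, 0.15, 0.25, 1.0]
light_color = [0.8, 0.5, 0.8, 1.0]
# A normal pointing along +z gives a diffuse intensity of 0.6.
print(shade([0.0, 0.0, 2.0], light_pos, diffuse_color, light_color))
```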

Computation on GPUs
Fast Computation of Generalized Voronoi Diagrams Using Graphics Hardware
Hoff et al. 99
use polygon scan-conversion and depth comparison
Interactive Multi-Pass Programmable Shading
Peercy 00
Treat the OpenGL architecture as a general SIMD computer
Physically-Based Visual Simulation on Graphics Hardware
Mark Harris et al., GH02
Coupled map lattice stored in a texture
continuous values on a discrete lattice
simulations of convection, reaction-diffusion, and boiling

Computation on GPUs
Simulation and Computation, GH03 session
Simulation of Cloud Dynamics on Graphics Hardware, Harris et al.
A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware, Goodnight et al.
The FFT on a GPU, Moreland et al.
GPUs as Stream Processors, GH03 panel
Data Parallel Computing on Graphics Hardware, Ian Buck
"Brook" stream API
objective is to abstract computational functionality

Computation on GPUs
SIGGRAPH 2003 session, Thursday, 3.45-5.30
Bolz et al., Sparse matrix solvers
Krüger, Westermann, Linear Algebra Operators for GPU Implementation of Numerical Algorithms
Framework for linear algebra operators
Navier-Stokes Demo

Summary
GPU is
highly parallel
very programmable: low and high levels
increasing in performance and maintaining a low cost
driven by a demanding consumer application

Questions?
More information
www.ati.com/[email protected]
22. The OpenGL Shading Language: Monday, Half Day, 1:45 - 5:30 pm
SuperBuffers and other GL extensions: Tuesday, exhibit hall C, 1 - 3 pm
www.opengl.org
Acknowledgements
Evan Hart, Marwan Ansari, Bill Licea-Kane, Arcot Preetham, James Peercy