programmability features of graphics hardware

55
Programmability Features of Graphics Hardware Michael Doggett ATI Research [email protected]

Upload: others

Post on 31-Jan-2022

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Programmability Features of Graphics Hardware

Programmability Features of Graphics Hardware

Michael DoggettATI Research

[email protected]

Page 2: Programmability Features of Graphics Hardware

2GH Programmability

Michael Doggett

OutlineGraphics HardwareTransform Stage

Vertex EngineOpenGL ARB vertex program

Fragment StagePixel Pipeline

OpenGL ARB fragment programExamples

MandelbrotFFTDisplacement Mapping

Programming OptionsOpenGL Shading Language overview

Computation on GPUs

Page 3: Programmability Features of Graphics Hardware

3GH Programmability

Michael Doggett

Graphics Hardware

Hardware pipeline

Rasterization

Surface

FrameBuffer

Fragment

Transform

Page 4: Programmability Features of Graphics Hardware

4GH Programmability

Michael Doggett

Graphics Hardware

Hardware pipelineBased on RADEON 9800

380MHzSIMD stages

TransformFragment

Highly parallel4 component vector registers and operationsHigh computation/bandwidth ratio

Arithmetic intensity

Rasterization

Surface

Fragment

TransformT T T T

F F F FF F F F

FragmentAA A AA A A AMemory

Page 5: Programmability Features of Graphics Hardware

5GH Programmability

Michael Doggett

Graphics Hardware

Input – 3D SceneTypically triangles made up of vertices 1D buffers of vertex data

Rasterization

Surface

FrameBuffer

Fragment

TransformVertices

Surface

Page 6: Programmability Features of Graphics Hardware

6GH Programmability

Michael Doggett

Graphics Hardware

Input – 3D SceneTypically triangles made up of vertices 1D buffers of vertex dataVertex and Fragment programs Rasterization

Surface

FrameBuffer

Fragment

Transform

Programs

Programs

Page 7: Programmability Features of Graphics Hardware

7GH Programmability

Michael Doggett

Graphics Hardware

Input – 3D SceneTypically triangles made up of vertices 1D buffers of vertex dataVertex and Fragment programsTextures

Random memory access

Rasterization

Surface

FrameBuffer

Fragment

Transform

Textures

Page 8: Programmability Features of Graphics Hardware

8GH Programmability

Michael Doggett

Graphics Hardware

Input – 3D SceneTypically triangles made up of vertices 1D buffers of vertex dataVertex and Fragment programsTextures

Random memory access

Output – 2D ImageColor, depth and stencil

Rasterization

Surface

FrameBuffer

Fragment

Transform

Image

Page 9: Programmability Features of Graphics Hardware

9GH Programmability

Michael Doggett

Graphics Hardware

Focus on programmable stages

TransformFragment

Rasterization

Surface

FrameBuffer

Fragment

Transform

Textures

Vertices

Surface

Programs

Programs

Image

Page 10: Programmability Features of Graphics Hardware

10GH Programmability

Michael Doggett

Graphics Hardware Evolution

StateStateStateFrameBuffer

32 tex/ 64 alus16e7 Floating PointPS 2.0

12 tex/16 alus3.12 Fixed pointPS 1.4

StateFragmentStateStateStateRasterization

VS 2.0256 instrsControl Flow

VS 1.1128 instrs

StateTransformCPU/GPUCPU/GPUCPUSurface

Stages987DirectX

9700 (R300)8500 (R200)7500 (R100)ATI RADEON200220012000Year

Page 11: Programmability Features of Graphics Hardware

11GH Programmability

Michael Doggett

Surface Generation Stage

Newest stageDX8 NPatches

State controlledGeometric calculations based on complex surfaces

R

S

FB

F

T

S

Page 12: Programmability Features of Graphics Hardware

12GH Programmability

Michael Doggett

Transform Stage

Includes:ModelView TransformationVertex LightingPerspective TransformationTweening/Skinning

Per-vertex operationsOriginally microcoded on DSPs or CPU

Controlled by fixed function state

Programmable Vertex EngineLindholm et al., SIGGRAPH 2001

R

S

FB

F

T

S

Page 13: Programmability Features of Graphics Hardware

13GH Programmability

Michael Doggett

Vertex Engine

4 parallel vertex enginesInput

Vertex stream Constants

OutputPosition, Color, Tex Coords

Program (Shader) has up to 256 instructions

VertexEngine

Vertex Data

Output

1616

Page 14: Programmability Features of Graphics Hardware

14GH Programmability

Michael Doggett

Vertex EngineRegisters

Four component floating point vectors

12 Read-write temp registersOutput registers

position, color, fogcoord, pointsize, texcoord

Vertex shader outputs are pixel shader inputs

VertexEngine

Vertex Data

Address

Constant

Temp

Output

11

256256

1212

Page 15: Programmability Features of Graphics Hardware

15GH Programmability

Michael Doggett

Vertex Program InstructionsBasic arithmetic operators

ADD, MAD, MUL, SUB

ComparisonMAX, MIN, SGE, SLT

Dot and cross ProductDP3, DP4, DPH, XPD

Exponential functionsEX2, EXP, LG2, LOG

Other ABS, ARL, DST, FLR, FRC, LIT, MOV, POW, RCP, RSQ, SWZ

Page 16: Programmability Features of Graphics Hardware

16GH Programmability

Michael Doggett

Register Modifiers

Source swizzle selects which components to use

iPosition.[xyzw][xyzw][xyzw][xyzw]e.g. iPosition.yzxw

Destination mask to select individual componentoColor.{x}{y}{z}{w}

Source negation-iNormal

Page 17: Programmability Features of Graphics Hardware

17GH Programmability

Michael Doggett

Simple Vertex Program!!ARBvp1.0ATTRIB iPos = vertex.position;PARAM mvp[4] = { state.matrix.mvp };PARAM ambientCol = state.lightprod[0].ambient;OUTPUT oPos = result.position;OUTPUT oColor = result.color;

# Transform the vertex to clip coordinates.DP4 oPos.x, mvp[0], iPos;DP4 oPos.y, mvp[1], iPos;DP4 oPos.z, mvp[2], iPos;DP4 oPos.w, mvp[3], iPos;

# Write out a color.MOV oColor, ambientCol;END

!!ARBvp1.0ATTRIB iPos = vertex.position;PARAM mvp[4] = { state.matrix.mvp };PARAM ambientCol = state.lightprod[0].ambient;OUTPUT oPos = result.position;OUTPUT oColor = result.color;

# Transform the vertex to clip coordinates.DP4 oPos.x, mvp[0], iPos;DP4 oPos.y, mvp[1], iPos;DP4 oPos.z, mvp[2], iPos;DP4 oPos.w, mvp[3], iPos;

# Write out a color.MOV oColor, ambientCol;END

Page 18: Programmability Features of Graphics Hardware

18GH Programmability

Michael Doggett

RADEON 9800 Vertex Shader

DirectX 9.0 Vertex Shader 2.0Same arithmetic instructionsConstant based control flow capabilitiesLoops, branches, subroutines

CALL, LOOP, ENDLOOP, JUMP, JNZ, LABEL, REPEAT, ENDREPEAT, RETURN16 Integer constants16 Boolean constants Loop counter

Page 19: Programmability Features of Graphics Hardware

19GH Programmability

Michael Doggett

Rasterization Stage

Includes:Triangle SetupViewport ClippingViewport TransformRasterization

Not programmable, some control through stateHigh precision vertex parameter interpolators

Position, normal, color, tex coords

R

S

FB

F

T

S

Page 20: Programmability Features of Graphics Hardware

20GH Programmability

Michael Doggett

Fragment Stage

Includes:Texturing

arbitrary memory fetch (Gather)fixed point filtering

Point, Linear, Bi-linear, Tri-linear, Anisotropic Fragment (Per-Pixel) Lighting

RADEON 9700 introduced floating point32 Texture and 64 ALU instructionsCo-issue vector and scalar instruction

R

S

FB

F

T

S

Page 21: Programmability Features of Graphics Hardware

21GH Programmability

Michael Doggett

PP

Pixel Pipeline

8 parallel floating point pixel pipelinesInput

Color, TexcoordsOutput

Color, DepthOutput surfaces

16, 32 bit float16 bit fixed

Color TextureCoords

88

Color

Page 22: Programmability Features of Graphics Hardware

22GH Programmability

Michael Doggett

Pixel Pipeline

RegistersFour component 24bit floating point

performance and precision

12 read-write temp registers16 Texture imagesTexture instructions

KIL, TEX, TXB, TXP

PP

Color

ConstantRegisters

TextureCoords

3232

88

TextureSamplers

1616

Color

Temp

1212

Page 23: Programmability Features of Graphics Hardware

23GH Programmability

Michael Doggett

Simple Fragment Program!!ARBfp1.0 TEMP temp; ATTRIB tex0 = fragment.texcoord[0]; # input registerATTRIB col0 = fragment.color; PARAM half = 0.5, 0.5, 0.5, 0.5; # constant

OUTPUT out = result.color;

#Fetch textureTEX temp, tex0, texture[0], 2D; #modulate and write out colorMUL out, col0, temp; END

!!ARBfp1.0 TEMP temp; ATTRIB tex0 = fragment.texcoord[0]; # input registerATTRIB col0 = fragment.color; PARAM half = 0.5, 0.5, 0.5, 0.5; # constant

OUTPUT out = result.color;

#Fetch textureTEX temp, tex0, texture[0], 2D; #modulate and write out colorMUL out, col0, temp; END

Page 24: Programmability Features of Graphics Hardware

24GH Programmability

Michael Doggett

Computing the Mandelbrot set

Test each point on the complex number planeX dimension is the real componentY dimension is the imaginary componentZ’ = Z2 + CC = starting position on complex planeIterate until Z > 2

Page 25: Programmability Features of Graphics Hardware

25GH Programmability

Michael Doggett

Mandelbrot main loopMUL pos.xy, curr, curr; # real componentADD pos.x, pos.x, -pos.y; # x2 – y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary componentMAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitudeSUB magnitude, magnitude, four; # compare magnitude to 4CMP escape.x, magnitude, 0, 1; ADD pos.z, pos.z, escape.x;MOV curr, pos; # ready next iteration

Page 26: Programmability Features of Graphics Hardware

26GH Programmability

Michael Doggett

Mandelbrot main loop

MUL pos.xy, curr, curr; # real component

ADD pos.x, pos.x, -pos.y; # x2

– y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary component

MAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitude

#

Page 27: Programmability Features of Graphics Hardware

27GH Programmability

Michael Doggett

Mandelbrot main loop

MUL pos.xy, curr, curr; # real component

ADD pos.x, pos.x, -pos.y; # x2

– y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary component

MAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitude

#

Page 28: Programmability Features of Graphics Hardware

28GH Programmability

Michael Doggett

Mandelbrot main loop

MUL pos.xy, curr, curr; # real component

ADD pos.x, pos.x, -pos.y; # x2

– y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary component

MAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitude

#

Page 29: Programmability Features of Graphics Hardware

29GH Programmability

Michael Doggett

Mandelbrot main loop

MUL pos.xy, curr, curr; # real component

ADD pos.x, pos.x, -pos.y; # x2

– y2 + start.xADD pos.x, pos.x, start.x;MUL pos.y, curr.x, curr.y; # imaginary component

MAD pos.y, pos.y, two.x, start.y; DP3 magnitude, pos, pos; # calculate magnitude

#

Page 30: Programmability Features of Graphics Hardware

30GH Programmability

Michael Doggett

Mandelbrot multipass demo

Main loop is 11 instructionsNeed multiple iterations to get detailMultipass

Render to texturePass 1: Render output to framebufferSet framebuffer data as texturePass 2 to N: Read in previous result as texture Pass N+1: Draw texture on screen

Page 31: Programmability Features of Graphics Hardware

31GH Programmability

Michael Doggett

Mandelbrot demo

Page 32: Programmability Features of Graphics Hardware

32GH Programmability

Michael Doggett

Render to texture

Render data

Rasterization

Surface

FrameBuffer

Fragment

Transform

Output 1

Page 33: Programmability Features of Graphics Hardware

33GH Programmability

Michael Doggett

Render to texture

Render dataSet data as texture input to Fragment stage Rasterization

Surface

FrameBuffer

Fragment

Transform

Output 1

Page 34: Programmability Features of Graphics Hardware

34GH Programmability

Michael Doggett

Render to texture

Render dataSet data as texture input to Fragment stageRender to second output

Rasterization

Surface

FrameBuffer

Fragment

Transform

Output 1

Output 2

Page 35: Programmability Features of Graphics Hardware

35GH Programmability

Michael Doggett

Render to texture

Render dataSet data as texture input to Fragment stageRender to second outputSwitch output and texture bufferPing-pong buffers

Rasterization

Surface

FrameBuffer

Fragment

Transform

Output 3

Output 2

Page 36: Programmability Features of Graphics Hardware

36GH Programmability

Michael Doggett

Fast Fourier Transform

Transform image into frequency domainComplex weights for a summation of sine waves

Separable FilterCooley and Tukey 65, decimation in time

log2 (width) + log2 (height) + 2 rendering passesPing-pong between two floating point renderabletextures

Page 37: Programmability Features of Graphics Hardware

37GH Programmability

Michael Doggett

Fast Fourier TransformHorizontal scramble

dependent texture read using precalculated position bit invertlog2(width) butterfly passes

texture read offset, betaVertical scramblelog2(height) butterfly passesMore details

ShaderX2 chapter, “Advanced Image Processing with DirectX 9 Pixel Shaders”, Mitchell, Ansari, Hart RNL seperable 50x50 gaussian blur filter

Jason Mitchell’s course notes from course 7. Real-Time Shading

Page 38: Programmability Features of Graphics Hardware

38GH Programmability

Michael Doggett

FFT demo

Page 39: Programmability Features of Graphics Hardware

39GH Programmability

Michael Doggett

Multiple Render Targets

Multiple render targets4 possible output buffersAll same X, Y and bit resolutionDifferent formats

ATI_draw_buffers extensionadds result.color[n] where n=0,1,2,3

OUTPUT oColor1 = result.color[1];

Page 40: Programmability Features of Graphics Hardware

40GH Programmability

Michael Doggett

Render to texture

pbuffersOpenGL ARB SuperBuffers

Adds GLmem typecan be used as

framebuffertexturevertex array

Page 41: Programmability Features of Graphics Hardware

41GH Programmability

Michael Doggett

Displacement mappingusing render to vertex array

2 Pass algorithm1st pass

Displacement fragment programOutput to SuperBuffer Vertex Array

using ATI_draw_buffersVertex, Normal, TexCoord

2nd pass Vertex program uses displaced Vertex BufferFragment program does bump mapped lighting

Page 42: Programmability Features of Graphics Hardware

42GH Programmability

Michael Doggett

Displacement Mapping Demo

Page 43: Programmability Features of Graphics Hardware

43GH Programmability

Michael Doggett

Programming OptionsLow or High level

Assembler or High Level Shading languagesAssembler

OpenGL Vertex Program and Fragment ProgramDirectX Vertex Shader and Pixel Shader

Shading Languagesthe OpenGL Shading LanguageDirectX9 HLSLNVIDIA’s Cg

ToolsATI’s RenderMonkey

Page 44: Programmability Features of Graphics Hardware

44GH Programmability

Michael Doggett

the OpenGL Shading Language

C like language for programming GPUsBasically the same language for vertex and fragment shadersInputs and outputs similar to Vertex Engine and Pixel Pipeline

different syntax e.g. gl_TexCoord0, gl_FragColorType qualifiers

const // compile-time constantuniform // constant across primitive

Typesfloat, int, bool, vec4, ivec3

Page 45: Programmability Features of Graphics Hardware

45GH Programmability

Michael Doggett

the OpenGL Shading Language

Structures and arraysOperators work with 1-4 component typesFlow controlUser defined functionsBuilt-in functions

sin, cos, normalize, noise,

Page 46: Programmability Features of Graphics Hardware

46GH Programmability

Michael Doggett

Simple GLSL examplevertex shadervec4 LightPos = vec4 ( 0.4, 0.6, 0.6, 0.0 );vec4 DiffuseColor = vec4( 0.50, 0.15, 0.25, 1.0 ); vec4 LightColor = vec4( 0.8, 0.5, 0.8, 1.0 );

void main(void){

vec3 eyeNormal;

gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;eyeNormal = gl_NormalMatrix * gl_Normal;

gl_TexCoord0 = vec4( eyeNormal, 0.0 ); // eye space normalgl_TexCoord1 = LightPos;gl_TexCoord2 = DiffuseColor; gl_TexCoord3 = LightColor;

}

Page 47: Programmability Features of Graphics Hardware

47GH Programmability

Michael Doggett

Simple GLSL examplefragment shaderconst float Shininess = 50.0;void main (void){

vec3 NNormal;vec4 MyColor;float Intensity;

NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormalIntensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation

// tc1 = light positionIntensity = max ( Intensity, 0.0 );MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse colorNNormal = abs ( NNormal );Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light positionIntensity = max ( Intensity, 0.0 );Intensity = pow ( Intensity, Shininess );MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light colorMyColor.a = gl_TexCoord2.a; // diffuse alphagl_FragColor = MyColor;

}

Page 48: Programmability Features of Graphics Hardware

48GH Programmability

Michael Doggett

Simple GLSL examplefragment shaderconst float Shininess = 50.0;void main (void){

vec3 NNormal;vec4 MyColor;float Intensity;

NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormalIntensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation

// tc1 = light positionIntensity = max ( Intensity, 0.0 );MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse colorNNormal = abs ( NNormal );Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light positionIntensity = max ( Intensity, 0.0 );Intensity = pow ( Intensity, Shininess );MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light colorMyColor.a = gl_TexCoord2.a; // diffuse alphagl_FragColor = MyColor;

}

Page 49: Programmability Features of Graphics Hardware

49GH Programmability

Michael Doggett

Simple GLSL examplefragment shaderconst float Shininess = 50.0;void main (void){

vec3 NNormal;vec4 MyColor;float Intensity;

NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormalIntensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation

// tc1 = light positionIntensity = max ( Intensity, 0.0 );MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse colorNNormal = abs ( NNormal );Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light positionIntensity = max ( Intensity, 0.0 );Intensity = pow ( Intensity, Shininess );MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light colorMyColor.a = gl_TexCoord2.a; // diffuse alphagl_FragColor = MyColor;

}

Page 50: Programmability Features of Graphics Hardware

50GH Programmability

Michael Doggett

Simple GLSL examplefragment shaderconst float Shininess = 50.0;void main (void){

vec3 NNormal;vec4 MyColor;float Intensity;

NNormal = normalize( vec3 ( gl_TexCoord0 ) ); // tc0 = eyeNormalIntensity = dot ( vec3 ( gl_TexCoord1 ), NNormal ); // diffuse calculation

// tc1 = light positionIntensity = max ( Intensity, 0.0 );MyColor = vec4( Intensity ) * gl_TexCoord2; // tc2 = diffuse colorNNormal = abs ( NNormal );Intensity = dot ( NNormal, vec3( gl_TexCoord1 ) ); // light positionIntensity = max ( Intensity, 0.0 );Intensity = pow ( Intensity, Shininess );MyColor += vec4( Intensity ) * gl_TexCoord3; // specular light colorMyColor.a = gl_TexCoord2.a; // diffuse alphagl_FragColor = MyColor;

}

Page 51: Programmability Features of Graphics Hardware

51GH Programmability

Michael Doggett

Computation on GPUsFast Computation of Generalized Voronoi Diagrams Using Graphics Hardware

Hoff et al. 99use polygon scan-conversion and depth comparison

Interactive Multi-Pass Programmable ShadingPeercy00Treat the OpenGL architecture as a general SIMD computer

Physically-Based Visual Simulation on Graphics Hardware

Mark Harris et. al., GH02Coupled map lattice stored in a texture

continuous values on a discrete latticesimulations of convection, reaction-diffusion, and boiling

Page 52: Programmability Features of Graphics Hardware

52GH Programmability

Michael Doggett

Computation on GPUsSimulation and Computation, GH03 session

Simulation of Cloud Dynamics on Graphics HardwareHarris et al.

A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware

Goodnight et al. The FFT on a GPU

Moreland et al.

GPUs as Stream Processors, GH03 panelData Parallel Computing on Graphics Hardware, Ian Buck“Brook” stream API

objective is to abstract computational functionality

Page 53: Programmability Features of Graphics Hardware

53GH Programmability

Michael Doggett

Computation on GPUs

SIGGRAPH2003 sessionThursday, 3.45-5.30Bolz et. al., Sparse matrix solversKrüger et. al., Krüger, Westermann

Linear Algebra Operators for GPU Implementation of Numerical AlgorithmsFramework for linear algebra operatorsNavier-Stokes Demo

Page 54: Programmability Features of Graphics Hardware

54GH Programmability

Michael Doggett

Summary

GPU ishighly parallelvery programmable

low and high levels

increasing in performance and maintaining a low costdriven by a demanding consumer application

Page 55: Programmability Features of Graphics Hardware

55GH Programmability

Michael Doggett

Questions ?

More informationwww.ati.com/[email protected] 22. The OpenGL Shading Language

Monday, Half Day, 1:45 - 5:30 pm SuperBuffers and other GL extensions

Tuesday, exhibit hall C, 1 - 3pm.www.opengl.org

AcknowledgementsEvan Hart, Marwan Ansari, Bill Licea-Kane, Arcot Preetham, James Peercy