2007.05.30 jongwon kim
Comparison of next generation graphic processors, Nvidia G80 and ATI R600. 2007.05.30 Jongwon Kim.
Comparison of next generation graphic processors 1
2007.05.30 Jongwon Kim
Comparison of next generation graphic processors, Nvidia G80 and ATI R600
Agenda
• Old and new pipeline
• DirectX 10
• Two competitors in graphic card
• Nvidia G80 architecture
• ATI R600 architecture
• Comparison of G80 and R600
• Q & A
Old pipeline
• The traditional graphics (transform & lighting) pipeline
  ▪ Geometry pipeline
    • Modeling transformation
    • Per-vertex lighting & shading
    • Viewing transformation
    • Projection transformation
    • Clipping
    • Triangle setup
  ▪ Scan-line conversion
  ▪ Rendering/rasterization
    • Triangle setup
    • Texturing, fragment shading
    • Alpha, stencil and depth testing
    • Frame buffer blending
    • Anti-aliasing (optional)
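The geometry stages listed above amount to a chain of 4x4 matrix multiplications followed by a perspective divide. A minimal sketch in plain Python, assuming column vectors, a pure translation as the modeling transform, an identity viewing transform, and a very simple perspective matrix. All function names are illustrative and not taken from any graphics API.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Modeling transformation, assumed here to be a pure translation."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


# Viewing transformation, assumed to be the identity (camera at the origin).
IDENTITY = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]


def simple_perspective(d):
    """Hypothetical minimal projection: sets w = z / d so that the
    later divide scales x and y by d / z."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 1.0 / d, 0]]


def transform_vertex(vertex, model, view, projection):
    """Apply modeling, viewing and projection transforms, then divide by w."""
    v = mat_vec(model, vertex)
    v = mat_vec(view, v)
    v = mat_vec(projection, v)
    w = v[3]
    return [v[0] / w, v[1] / w, v[2] / w]


if __name__ == "__main__":
    p = transform_vertex([1.0, 2.0, 0.0, 1.0],
                         translation(0, 0, 4), IDENTITY,
                         simple_perspective(2.0))
    print(p)
```

Lighting, clipping and triangle setup are left out; the sketch only covers the three transformation steps.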
Pipeline with shaders
• Vertex information input
  ▪ Vertex data
  ▪ High-order primitive tessellation
• Geometry pipeline
  ▪ T&L engine
  ▪ Vertex shaders
  ▪ Viewports and clipping
• Pixel and texture blending
  ▪ Multi-texturing
  ▪ Pixel shaders
  ▪ Fog blending
• Rasterization
  ▪ Alpha, stencil and depth testing
  ▪ Frame buffer blending
The situation today
• No single graphics hardware target
• CPU-bound games and applications
  ▪ Bandwidth and CPU cycles are the bottleneck in multiple areas (physics, AI)
  ▪ Large amount of CPU resources spent directing the GPU
• GPU overly specialized
DirectX 10
• Big changes from DirectX 9
  ▪ No more fixed function
  ▪ State grouping
  ▪ Reduced CPU load
    • Removes overhead from DirectX 9
  ▪ Unified pixel shader and vertex shader
  ▪ Shader model 4.0
  ▪ Geometry shader
    • e.g., generalized displacement mapping
  ▪ Texture array
    • Dynamically indexable in the shader
  ▪ Predicated draw
    • Occlusion queries are processed by the GPU only
  ▪ Stream out
    • Generated geometry can easily be redrawn
  ▪ But supported only on Windows Vista
Texture Arrays, Format Reinterpretation
Stream Output, Resource Views, Input Assembler
Immediate offset on Memory Access, Integer/Bitwise Instructions
Comparison Filtering, Constant Buffers
State Objects, Shared-Exponent HDR Compression (RGBE)
Block-Compressed Formats for bump/normal maps
128 texture slots, 8 Render targets
More interstage communication, Instance, Vertex, Primitive identifiers
Per-primitive Clip distance, Predicated Rendering
Alpha-to-Coverage, Multisample Readback, Better cubemap filtering
Input Assembler…
Pipeline of the DirectX 10
• Input assembler
• Vertex shader 4.0
• Geometry shader
• Rasterizer (scan conversion)
• Pixel shader 4.0
• Output merger
Diagram: Input Assembler, Vertex Shader, Geometry Shader, Rasterizer/Interpolator, Pixel Shader, Output Merger; resources: Vertex Buffer, Index Buffer, Texture, Stream Output, Depth/Stencil, Render Target
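The stage order listed above can be sketched as a chain of small functions. This is a toy model only: the per-stage behaviour (a +1 shift in x, a pass-through geometry shader, one fragment per triangle, a constant color) is made up for illustration, and none of the names correspond to the real Direct3D 10 API.

```python
def input_assembler(vertices, indices):
    """Group indexed vertices into triangles (three indices per triangle)."""
    return [tuple(vertices[i] for i in indices[k:k + 3])
            for k in range(0, len(indices), 3)]


def vertex_shader(vertex):
    """Per-vertex work; here an assumed shift of +1 in x."""
    x, y = vertex
    return (x + 1, y)


def geometry_shader(triangle):
    """Per-primitive work; may emit zero or more primitives.
    This toy version passes the triangle through unchanged."""
    return [triangle]


def rasterizer(triangle):
    """Toy scan conversion: a single fragment at the triangle's centroid."""
    cx = sum(v[0] for v in triangle) / 3
    cy = sum(v[1] for v in triangle) / 3
    return [(cx, cy)]


def pixel_shader(fragment):
    """Per-fragment work; here a constant color."""
    return "red"


def run_pipeline(vertices, indices):
    stream_output = []  # primitives captured after the geometry shader
    render_target = {}  # written by the output merger step
    for tri in input_assembler(vertices, indices):
        tri = tuple(vertex_shader(v) for v in tri)
        for prim in geometry_shader(tri):
            stream_output.append(prim)
            for frag in rasterizer(prim):
                render_target[frag] = pixel_shader(frag)  # output merger
    return stream_output, render_target


if __name__ == "__main__":
    print(run_pipeline([(0, 0), (3, 0), (0, 3)], [0, 1, 2]))
```

Stream output is modeled as a plain list filled right after the geometry shader, which is where the diagram labels place it.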
Shader programming
• What is a shader?
  ▪ A part of the graphics renderer that is responsible for calculating the color of an object
  ▪ A shader can apply transformations to a large set of elements at a time: to every vertex of a model, or to each pixel in an area of the screen
  ▪ The GPU (graphics processing unit) can provide shading functions
  ▪ Shader functions were introduced in OpenGL version 1.5 and in DirectX 8
• Why are shaders needed?
  ▪ Shaders are well suited to parallel processing
  ▪ GPUs have a multi-core design to facilitate parallel processing
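The parallelism argument can be illustrated with a per-pixel function that depends only on its own input, so the calls can run in any order or at the same time. A minimal sketch, assuming a made-up gray-level formula and using a CPU thread pool as a stand-in for GPU shader units; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pixel_shader(coord):
    """Runs once per pixel and depends only on that pixel's coordinates."""
    x, y = coord
    return (x * 10 + y) % 256  # arbitrary illustrative gray level


def screen_coords(width, height):
    """All pixel coordinates of the screen area, row by row."""
    return [(x, y) for y in range(height) for x in range(width)]


def shade_sequential(width, height):
    """Shade one pixel after another."""
    return [pixel_shader(c) for c in screen_coords(width, height)]


def shade_parallel(width, height, workers=4):
    """Shade pixels on several workers; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pixel_shader, screen_coords(width, height)))


if __name__ == "__main__":
    print(shade_parallel(4, 3) == shade_sequential(4, 3))
```

Because no pixel reads another pixel's result, both versions produce the same image.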
Shader model 4.0
• Unified shading architecture
  ▪ Program can assign shader units (stream processors) as vertex or pixel shader
  ▪ New HDR (high dynamic range) format
    • FP16 (64bit), FP24 (96bit), FP32 (128bit)
    • R11G11B10, R9G9B9+5 (half size, same dynamic range)
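The bit counts in the HDR format list can be checked by summing the bits per channel. A small sketch, assuming the FP formats store four channels (RGBA), which is what makes FP16 come out at 64 bits, and assuming the "+5" in R9G9B9+5 is a 5-bit exponent shared by the three 9-bit channels. The table and function names are illustrative.

```python
# Bits per component for each format named on the slide.
FORMATS = {
    "FP16": (16, 16, 16, 16),      # assumed RGBA, 16 bits each
    "FP24": (24, 24, 24, 24),      # assumed RGBA, 24 bits each
    "FP32": (32, 32, 32, 32),      # assumed RGBA, 32 bits each
    "R11G11B10": (11, 11, 10),
    "R9G9B9+5": (9, 9, 9, 5),      # three channels plus an assumed shared exponent
}


def bits_per_pixel(name):
    """Total storage per pixel for the named format."""
    return sum(FORMATS[name])


if __name__ == "__main__":
    for name in FORMATS:
        print(name, bits_per_pixel(name))
```

Under these assumptions both packed formats take 32 bits, half the size of FP16.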
Geometry shader
• Geometry shader
  ▪ In shader model 3.0 the GPU could not create new data; it could only shade existing data
  ▪ The geometry shader can add or remove vertices
  ▪ Displacement mapping, stencil shadow extrusion, point sprite creation, motion blur, etc.
  ▪ Created data moves through stream output and is sent back to the input assembler
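Point sprite creation is a compact way to see vertices being added and removed. A minimal sketch in plain Python, assuming 2D points, a square sprite, and a made-up rule that discards points with negative x. The function names are hypothetical, and real geometry shaders are written in a shading language, not Python.

```python
def geometry_shader(point, half_size):
    """Emit the four corners of a quad for one input point,
    or nothing at all if the point is discarded."""
    x, y = point
    if x < 0:
        return []  # primitive removed: no vertices are emitted
    h = half_size
    # One vertex in, four vertices out (triangle-strip order).
    return [(x - h, y - h), (x + h, y - h),
            (x - h, y + h), (x + h, y + h)]


def run_geometry_stage(points, half_size):
    """Collect everything the geometry shader emits, as stream output would."""
    stream_out = []
    for p in points:
        stream_out.extend(geometry_shader(p, half_size))
    return stream_out


if __name__ == "__main__":
    print(run_geometry_stage([(2, 2), (-1, 0)], 1))
```

The collected list plays the role of the stream output buffer, which could then be fed back in as input for another pass.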
Why unify shader
Figure: separate vertex and pixel shaders under two workloads, Complex Geometry Processing and Complex Pixel Processing. In each case one shader type is fully loaded while the other is only partially loaded.
Unified shader
Figure: Unified Shader Architecture. Vertex, pixel & geometry shader under complex geometry and pixel processing.
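The load-balancing argument behind unification can be put into numbers with a toy model. The sketch below assumes every unit finishes one unit of work per cycle and that scheduling is perfect. The unit counts (4 vertex + 8 pixel versus 12 unified) and the workloads are invented for illustration and do not describe G80 or R600.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def cycles_separate(vertex_work, pixel_work, vertex_units, pixel_units):
    """Dedicated units: the busier shader type determines the frame time."""
    return max(ceil_div(vertex_work, vertex_units),
               ceil_div(pixel_work, pixel_units))


def cycles_unified(vertex_work, pixel_work, total_units):
    """Unified units: any unit can take either kind of work."""
    return ceil_div(vertex_work + pixel_work, total_units)


if __name__ == "__main__":
    workloads = {"geometry-heavy": (120, 24), "pixel-heavy": (12, 240)}
    for name, (v, p) in workloads.items():
        print(name, cycles_separate(v, p, 4, 8), cycles_unified(v, p, 12))
```

In both invented workloads the separate design is held up by its fully loaded side, while the unified pool spreads the same work over all units.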
Two competitors
• NVIDIA Corporation
  ▪ Worldwide leader in GPU technologies
  ▪ Major supplier of PC motherboard chipsets and graphics cards
  ▪ Developed GPUs for game consoles: Xbox and PlayStation 3
  ▪ Well known products
    • RIVA TNT, NVIDIA GeForce series, NVIDIA nForce series
• ATI Technologies
  ▪ Major supplier of GPUs, PC motherboard chipsets and graphics cards
  ▪ Purchased by AMD in October 2006
  ▪ Developed GPUs for game consoles: Nintendo GameCube, Xbox 360, Wii
  ▪ Well known products
    • Mach32, Rage series, Radeon series, Xpress series
G80 and R600
                       Nvidia G80     ATI R600
Release date           Nov-06         May-07
Transistors            681 million    700 million
Die size               -              20x21mm
Core clock             575MHz         742MHz
Shader clock           1350MHz        -
Stream processors      128            320
Shader processing      518.4GFLOPS    475GFLOPS
Memory clock           900MHz         825MHz
Memory interface bus   384bit         512bit
Memory bandwidth       86.4GB/s       105.6GB/s
Memory size            768MB          512MB
Texture fill rate      36.8GT/s       11.9GT/s
Geometry rate          -              742Mtris/s
Fab                    90nm           80nm
ROPs                   24             16
Memory type            GDDR3          GDDR3
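The shader processing figures in the table can be reproduced from the stream processor counts and clocks. The sketch below assumes 3 floating-point operations per clock per G80 stream processor (a multiply-add plus a multiply) and 2 per R600 stream processor (a multiply-add), and uses the 742MHz core clock as the R600 shader clock. These per-clock factors are not stated in the table; they are chosen because they match its numbers.

```python
def shader_gflops(stream_processors, shader_clock_mhz, flops_per_clock):
    """Peak shader throughput in GFLOPS."""
    return stream_processors * shader_clock_mhz * flops_per_clock / 1000.0


if __name__ == "__main__":
    g80 = shader_gflops(128, 1350, 3)   # assumed MAD + MUL per clock
    r600 = shader_gflops(320, 742, 2)   # assumed MAD per clock
    print("G80 :", g80, "GFLOPS")
    print("R600:", round(r600, 2), "GFLOPS")
```

With these assumptions G80 comes out at 518.4 GFLOPS and R600 at roughly 475 GFLOPS, in line with the table.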
Nvidia G80, 8800GTX
ATI R600, HD2900XT
Stream processor unit
• R600 - 320 independent ALUs (with 64 special ALUs)
• G80 - 128 ALUs + 128 special ALUs
Cluster architecture
• R600
  ▪ 320 = 4 clusters * 16 * 5 scalar units
  ▪ 8 threads = 2 arbiters * 4 clusters
• G80
  ▪ 128 = 8 clusters * 16 shader units
  ▪ 16 threads = 2 instruction fetchers * 8 clusters
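The products above are easy to verify mechanically. A short sketch using only the factors given on the slide; the dictionary layout and key names are just one way to write them down.

```python
from math import prod

# Factors exactly as listed on the slide.
CLUSTERS = {
    "R600": {"alus": (4, 16, 5), "threads": (2, 4)},
    "G80": {"alus": (8, 16), "threads": (2, 8)},
}


def totals(chip):
    """Return (ALU count, thread count) for the named chip."""
    layout = CLUSTERS[chip]
    return prod(layout["alus"]), prod(layout["threads"])


if __name__ == "__main__":
    for chip in CLUSTERS:
        print(chip, totals(chip))
```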
Clock war
• R600
  ▪ Core clock is the same as the shader clock, 742MHz
• G80
  ▪ Shader clock is higher than graphics core clock
  ▪ Shader clock 1350MHz
  ▪ Core clock 575MHz
Memory interface
Diagram labels: PCI Express, Ring Stop
• R600
  ▪ 512 bit ring bus
    • Simplifies routing to improve scalability
    • Reduces wire delay
    • Reduces number of repeaters required
    • 105.6GB/s
• G80
  ▪ 384 bit crossbar bus
  ▪ 86.4GB/s
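Both bandwidth figures follow from the bus width and the memory clock. A minimal sketch, assuming GDDR3 transfers data twice per clock (double data rate), taking the memory clocks of 825MHz (R600) and 900MHz (G80) from the comparison table, and counting 1GB as 10^9 bytes.

```python
def memory_bandwidth_gbs(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * transfers_per_clock / 1000.0


if __name__ == "__main__":
    print("R600:", memory_bandwidth_gbs(512, 825), "GB/s")
    print("G80 :", memory_bandwidth_gbs(384, 900), "GB/s")
```

The wider R600 bus more than offsets its lower memory clock, which is where the 105.6GB/s versus 86.4GB/s gap comes from.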
HW tessellation
▪ ATI has provided HW tessellation (TruForm)
▪ Xbox 360 already includes this
▪ DirectX 11 will include it
Nvidia G80 architecture
Nvidia G80 diagram
CUDA thread computing
• Compute unified device architecture
CUDA thread computing
ATI R600 architecture
• Command Processor
• Setup Engine
• Ultra-Threaded Dispatch Processor
• Stream Processing Units
• Texture Units & Caches
• Memory Read/Write Cache & Stream Out Buffer
• Shader Export
• Render Back-Ends
Diagram labels: Command Processor, Setup Engine, Vertex Assembler, Geometry Assembler, Programmable Tessellator, Scan Converter / Rasterizer, Interpolators, Hierarchical Z, Ultra-Threaded Dispatch Processor, Shader Constant Cache, Shader Instruction Cache, Vertex Index Fetch, Vertex Cache, Stream Processing Units, Texture Units, L1 Texture Cache, L2 Texture Cache, Memory Read/Write Cache, Stream Out Buffer, Shader Export, Render Back-Ends, Z/Stencil Cache, Color Cache
ATI R600 diagram
Q&A
– mailto://[email protected]
– Links
  – http://developer.nvidia.com
  – http://www.beyond3d.com
  – http://www.extremetech.com/article2/0,1697,2053309,00.asp
  – http://en.wikipedia.org/wiki/Radeon_R600
  – http://en.wikipedia.org/wiki/GeForce_8_series
  – http://www.beyond3d.com/content/reviews/1/