CSL 859: Advanced Computer Graphics, Dept of Computer Sc. & Engg., IIT Delhi
Adrianne Demo: Skin shader
- 1,400 instructions per pixel
- 15 render passes
- Five bump maps
- Physically-based lighting with sub-surface scattering
- Three skin layers with different scattering properties
- Complex anisotropic hair shader
- Real geometry
- GPU-accelerated character skinning
- Blendshapes
- Sculpt deformers
- Skeletal-driven bump maps
Graphics Pipeline
[Figure: Geometry → Transform → Light → Clip → Setup → Rasterize → Texture → Blend / Z-test → Framebuffer → Picture]
Graphics Pipeline
[Figure: Vertex, Connectivity, Textures (inputs) → Vertex Shader → Primitive Assembly → Clip & Setup → Rasterize → Fragment Shader (reads Texture) → Raster OPs (Blend) → Framebuffer → Picture]
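To make the stage order concrete, here is a toy software sketch of the programmable pipeline for a single triangle. It is not how a GPU implements these stages; it assumes clip-space x, y in [-1, 1], a trivial "vertex shader" that only maps to pixel coordinates, edge-function rasterization at pixel centres, a flat-colour "fragment shader", and a less-than Z-test as the only raster op. All function names are hypothetical.

```python
# Toy pipeline: vertex shader -> primitive assembly -> rasterize ->
# fragment shader -> raster op (Z-test). Assumes one triangle and flat shading.

def vertex_shader(v, width, height):
    """Map a clip-space vertex (x, y, z) to screen space (pixels, y down)."""
    x, y, z = v
    return ((x + 1) * 0.5 * width, (1 - y) * 0.5 * height, z)

def edge(a, b, px, py):
    """Edge function: twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

def rasterize(tri, width, height):
    """Yield (x, y, z) fragments for pixel centres inside the triangle."""
    a, b, c = tri
    area = edge(a, b, c[0], c[1])
    if area == 0:
        return  # degenerate triangle: no fragments
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            # Barycentric weights; dividing by the signed area makes the
            # inside test work for either winding order.
            w0 = edge(b, c, px, py) / area
            w1 = edge(c, a, px, py) / area
            w2 = edge(a, b, px, py) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield x, y, w0 * a[2] + w1 * b[2] + w2 * c[2]

def fragment_shader(x, y, z):
    """Flat red; a real fragment shader would sample textures here."""
    return (255, 0, 0)

def render(vertices, width=8, height=8):
    color = [[(0, 0, 0)] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    tri = [vertex_shader(v, width, height) for v in vertices]  # vertex stage
    # Primitive assembly is trivial here: the three vertices form one triangle.
    for x, y, z in rasterize(tri, width, height):
        rgb = fragment_shader(x, y, z)  # fragment stage
        if z < depth[y][x]:             # raster op: Z-test, nearer fragment wins
            depth[y][x] = z
            color[y][x] = rgb
    return color, depth
```

For example, render([(-1, -1, 0.5), (3, -1, 0.5), (-1, 3, 0.5)]) uses one oversized triangle that covers the whole 8x8 target, so every pixel ends up red at depth 0.5.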
Bottlenecks
- Too many operations: parallelize
- Too many memory accesses: parallelize
[Figure: geometry operations and fragment operations connected through a crossbar (XBAR) to screen tiles]
Parallelization
- Distribute computation to processors (work allocation)
- Distribute texture to memory banks
- Tile screen pixels into memory banks
- Do all processors have access to all memory?
  - Distribute access / replicate data
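A minimal sketch of tiling screen pixels into memory banks, assuming square tiles interleaved over a 2x2 grid of banks. The tile size, bank count, and function names are hypothetical; the point is that a region spanning several tiles spreads its accesses across all banks so they can proceed in parallel.

```python
def bank_of(x, y, tile=4, banks_x=2, banks_y=2):
    """Bank holding pixel (x, y): tiles are interleaved over a banks_x x banks_y grid."""
    tx, ty = x // tile, y // tile
    return (ty % banks_y) * banks_x + (tx % banks_x)

def bank_load(x0, y0, x1, y1, **kw):
    """Count pixels per bank for the screen rectangle [x0, x1) x [y0, y1)."""
    counts = {}
    for y in range(y0, y1):
        for x in range(x0, x1):
            b = bank_of(x, y, **kw)
            counts[b] = counts.get(b, 0) + 1
    return counts
```

An 8x8 region starting at the origin puts 16 pixels in each of the four banks, while a 4x4 region aligned to one tile lands entirely in a single bank. The choice of tile size therefore trades access locality against how evenly small primitives spread their traffic.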
Sorting Taxonomy
- Sort first: allocate each primitive to the processor responsible for only a given area of the screen
- Sort middle: perform geometry ops on whichever processor balances the load, then distribute the results to the processor responsible for that screen area
- Sort last: no screen subdivision; perform geometry and fragment ops on whichever processor balances the load, then composite the results
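Sort last ends with a composition step. A minimal sketch of it, assuming opaque fragments so that a per-pixel depth comparison is enough (translucent fragments would need ordered blending instead). Each renderer is assumed to produce a full-screen colour and depth image, with empty pixels at infinite depth; the function name is hypothetical.

```python
def composite_sort_last(partials):
    """Depth-composite full-screen partial images from several renderers.

    partials: list of (color, depth) pairs, each a height x width grid.
    Assumes opaque fragments; pixels a renderer did not touch have depth inf.
    """
    height, width = len(partials[0][1]), len(partials[0][1][0])
    out_color = [[None] * width for _ in range(height)]
    out_depth = [[float("inf")] * width for _ in range(height)]
    for color, depth in partials:
        for y in range(height):
            for x in range(width):
                if depth[y][x] < out_depth[y][x]:  # keep the nearest fragment
                    out_depth[y][x] = depth[y][x]
                    out_color[y][x] = color[y][x]
    return out_color, out_depth
```

Because the merge keeps the minimum depth at each pixel, the result does not depend on the order in which partial images arrive (barring exact depth ties).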
Memory Considerations
- Highly pipelined: guard against stalls
- Memory bandwidth: how many accesses per second?
- Latency: latency-hiding buffers
- Larger memory atoms, e.g., 32-byte atoms
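Two quick calculations behind these points, offered as a sketch. Sizing a latency-hiding buffer follows Little's law: the number of requests in flight must be at least latency times request rate. The cost of a large memory atom is the fraction of each fetched atom actually used. Both helpers assume aligned accesses, and the numbers in the usage note are illustrative, not measurements of any particular GPU.

```python
import math

def latency_buffer_entries(latency_ns, requests_per_ns):
    """Little's law: outstanding requests needed to keep the memory pipe busy."""
    return math.ceil(latency_ns * requests_per_ns)

def atom_efficiency(bytes_needed, atom_bytes=32):
    """Fraction of fetched bytes actually used, assuming an aligned request."""
    atoms = math.ceil(bytes_needed / atom_bytes)
    return bytes_needed / (atoms * atom_bytes)
```

With an assumed 200 ns memory latency and one request issued per nanosecond, about 200 requests must be buffered to avoid stalling. A 4-byte access inside a 32-byte atom uses only 1/8 of the fetched data, which is why larger atoms pay off only when neighbouring accesses (for example, adjacent pixels or texels) share an atom.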
Graphics Architecture: A Brief History
- Evans & Sutherland
- Ikonas
- UNC Chapel Hill
- Silicon Graphics
- (Mushrooming of smart VGA controllers)
- nVIDIA, AMD
IKONAS
- 32-bit data, 24-bit address bus backbone
- Everything memory mapped
- Host interface = address registers to access anything on the bus
- Frame buffer resolution and timing could be set via control registers
- Graphics processor
  - (Micro)programmable
  - 32-bit integer ALU and 16x16-bit integer multiplier
  - Address counters, loop counters, and a 64-bit instruction word
- Plug-in boards
  - 16-bit graphics processor with 16-pixel-at-once parallel write
  - Microprogrammed 16x16-bit matrix multiplier
  - Microprogrammed floating-point matrix multiplier
  - Hardware Z-buffer
  - Real-time alpha-blend hardware for two RGB images
  - Real-time RGB video frame grabber
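The "everything memory mapped" design can be sketched as a toy bus: each device owns an address range on a 24-bit bus, and the host reaches any of them by loading an address register and then reading or writing a data port. The device names, base addresses, and register layout below are hypothetical and do not reproduce the actual IKONAS address map.

```python
ADDRESS_BITS = 24
ADDRESS_SPACE = 1 << ADDRESS_BITS   # 16,777,216 addressable locations
DATA_MASK = (1 << 32) - 1           # 32-bit data path

class Bus:
    """Toy 24-bit memory-mapped bus; every device owns an address range."""

    def __init__(self):
        self.devices = []  # (name, base, size, storage)

    def attach(self, name, base, size):
        if base < 0 or base + size > ADDRESS_SPACE:
            raise ValueError("device does not fit in the 24-bit address space")
        for _, b, s, _ in self.devices:
            if base < b + s and b < base + size:
                raise ValueError("address range overlaps another device")
        self.devices.append((name, base, size, {}))

    def _decode(self, addr):
        """Find the device whose range contains addr; return (storage, offset)."""
        for _, base, size, store in self.devices:
            if base <= addr < base + size:
                return store, addr - base
        raise KeyError(f"no device mapped at {addr:#08x}")

    def write(self, addr, value):
        store, offset = self._decode(addr)
        store[offset] = value & DATA_MASK

    def read(self, addr):
        store, offset = self._decode(addr)
        return store.get(offset, 0)

class HostInterface:
    """Host side: load an address register, then move data through a port."""

    def __init__(self, bus):
        self.bus = bus
        self.address = 0

    def set_address(self, addr):
        self.address = addr & (ADDRESS_SPACE - 1)

    def write_data(self, value):
        self.bus.write(self.address, value)

    def read_data(self):
        return self.bus.read(self.address)
```

For instance, with a hypothetical frame buffer attached at 0x000000 and control registers at 0xF00000, the host sets the resolution by pointing its address register at a control register and writing a value, using exactly the same mechanism it uses to write pixels.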
Pixel-Planes 5 (1989)
- Up to 32 GPs (i860) and up to 8 Renderers
- 2 GPs per board
- One 128x128 pixel-processor array per Renderer board
Pixel-Planes 5 Renderer
- One board had 64 mini-chips, each with 2 columns of 128 pixel processors (with memory)
- 64 chips x 256 pixel processing elements (PEs) = one 128x128 array
- Each PE has 208 bits of memory; each chip also contains a quadratic expression evaluator (QEE) that computes Ax + By + C + Dx² + Exy + Fy² simultaneously at each pixel
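A sketch of what the QEE provides, assuming each PE sits at pixel (x, y) and keeps an enable bit where the expression is non-negative. The hardware broadcasts the six coefficients and every PE receives its own value at once; the loop below only computes the same numbers serially. A triangle would be the AND of three linear (edge) expressions; a disk needs the quadratic terms. Function names are hypothetical.

```python
def qee(A, B, C, D=0, E=0, F=0, size=128):
    """Q(x, y) = Ax + By + C + Dx^2 + Exy + Fy^2 at every (x, y) of a size x size array."""
    return [[A * x + B * y + C + D * x * x + E * x * y + F * y * y
             for x in range(size)] for y in range(size)]

def enable_mask(values):
    """Per-PE enable bit: keep pixels where Q(x, y) >= 0."""
    return [[v >= 0 for v in row] for row in values]

def disk_coefficients(cx, cy, r):
    """Coefficients with Q >= 0 exactly inside (x - cx)^2 + (y - cy)^2 <= r^2."""
    # Expanding r^2 - (x - cx)^2 - (y - cy)^2 gives these six terms.
    return dict(A=2 * cx, B=2 * cy, C=r * r - cx * cx - cy * cy, D=-1, E=0, F=-1)
```

For example, the half-plane x - 64 >= 0 enables 64 of the 128 columns (8,192 PEs), and the disk of radius 2 centred at (3, 3) enables the 13 pixel positions within that distance.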
Basic Algorithm
- The host application transmits the model database and new-frame requests to the MGP (master GP)
- The screen is divided statically into bins of 128x128 pixels
- The MGP allocates Renderers to screen regions
- The MGP broadcasts database commands to all GPs; the GPs generate Renderer commands for each primitive
- Commands are inserted into the appropriate bins
- The GPs send the bins to the Renderers round-robin
- The Renderers send computed pixels to the frame buffer
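A minimal sketch of the binning step, assuming a primitive goes into every 128x128 bin its screen-space bounding box overlaps (conservative, so some bins may receive triangles that miss them), and a simplified round-robin hand-out of bins to Renderers. The function names and the bounding-box test are assumptions, not Pixel-Planes 5 code.

```python
BIN = 128  # static 128x128-pixel bins, one Renderer array's worth

def bins_for_triangle(tri, screen_w, screen_h, bin_size=BIN):
    """(bx, by) bins overlapped by the triangle's clamped bounding box."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    x0, x1 = max(0, min(xs)), min(screen_w - 1, max(xs))
    y0, y1 = max(0, min(ys)), min(screen_h - 1, max(ys))
    if x0 > x1 or y0 > y1:
        return []  # entirely off-screen
    return [(bx, by)
            for by in range(int(y0) // bin_size, int(y1) // bin_size + 1)
            for bx in range(int(x0) // bin_size, int(x1) // bin_size + 1)]

def bin_primitives(triangles, screen_w, screen_h, bin_size=BIN):
    """Insert each primitive's index into every bin it may touch."""
    bins = {}
    for i, tri in enumerate(triangles):
        for b in bins_for_triangle(tri, screen_w, screen_h, bin_size):
            bins.setdefault(b, []).append(i)
    return bins

def assign_round_robin(bin_keys, num_renderers):
    """Hand bins to Renderers in turn (a simplification of dynamic allocation)."""
    schedule = {r: [] for r in range(num_renderers)}
    for k, b in enumerate(sorted(bin_keys)):
        schedule[k % num_renderers].append(b)
    return schedule
```

A triangle with corners (100, 100), (200, 100), (100, 300) on a 1280x1024 screen spans bin columns 0-1 and rows 0-2, so it is copied into six bins. Those six bins spread over four Renderers as 2, 2, 1, 1.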
SGI RealityEngine
Kurt Akeley, 1993:
"The implementation is near-massively parallel, employing 353 independent processors in its fullest configuration, resulting in a measured fill rate of over 240 million antialiased, texture mapped pixels per second. Rendering performance exceeds 1 million antialiased, texture mapped triangles per second."
RealityEngine Architecture
- Input FIFO, Command Processor
- 6, 8, or 12 Geometry Engines
- 1, 2, or 4 raster boards, each with 5 Fragment Generators (each holding a texture replica) and 80 Image Engines
- 1280x1024 framebuffer, 256 bits/pixel
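A quick consistency check on the quoted figures, as a sketch. Assuming the Fragment Generator and Image Engine counts are per raster board, the fullest configuration gives 1 Command Processor + 12 Geometry Engines + 4 x (5 + 80) = 353 processors, matching the quote. Dividing the quoted fill rate by the quoted triangle rate gives the average triangle size at which the two limits coincide.

```python
def reality_engine_processors(geometry_engines, raster_boards,
                              fgs_per_board=5, ies_per_board=80,
                              command_processors=1):
    """Count independent processors, assuming FG/IE counts are per raster board."""
    return (command_processors + geometry_engines
            + raster_boards * (fgs_per_board + ies_per_board))

def pixels_per_triangle_balance(fill_rate, triangle_rate):
    """Average triangle size (pixels) at which fill and geometry rates match."""
    return fill_rate / triangle_rate
```

By these peak numbers, 240 million pixels/s against 1 million triangles/s balances at about 240 pixels per triangle: scenes of smaller triangles would tend to be geometry-limited and scenes of larger ones fill-limited. The smallest configuration (6 GEs, 1 raster board) comes to 92 processors.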
RealityEngine Algorithm
- FIFO geometry is distributed by the CP to the GEs
- GEs do geometry ops, including setup
- GEs broadcast triangles to the FGs (raster)
- Finely interleaved pixel assignment: FGs distribute fragments to the IEs
- IEs do raster ops; the IEs are the framebuffer
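A sketch of why fine interleaving helps, using a hypothetical mapping for one raster board with 5 FGs x 16 IEs = 80 IEs: each FG owns every fifth pixel column, and within an FG the 16 IEs tile a 2-column x 8-row pattern. The actual RealityEngine interleave may differ; the point is that even a small triangle spreads its fragments over many engines.

```python
NUM_FG = 5
IES_PER_FG = 16  # 5 x 16 = 80 Image Engines on one raster board

def engines_for_pixel(x, y):
    """Hypothetical fine interleave: returns (fragment generator, image engine)."""
    fg = x % NUM_FG                                  # every 5th column per FG
    local_ie = (x // NUM_FG) % 2 + 2 * (y % 8)       # 2 x 8 pattern within an FG
    return fg, fg * IES_PER_FG + local_ie

def ie_load(x0, y0, x1, y1):
    """Pixels assigned to each IE for the screen rectangle [x0, x1) x [y0, y1)."""
    counts = [0] * (NUM_FG * IES_PER_FG)
    for y in range(y0, y1):
        for x in range(x0, x1):
            counts[engines_for_pixel(x, y)[1]] += 1
    return counts
```

Under this mapping a 10x8-pixel block touches every one of the 80 IEs exactly once, and a 3x3 patch already spreads over 9 different IEs, so no single engine becomes a hotspot for small primitives.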
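PC Architecture
[Figure: the CPU connects over the FSB to the NorthBridge; the NorthBridge connects to memory (MEM BUS), to graphics over PCI Express (up to 2.5 Gbps per lane in each direction), and to the SouthBridge, which hosts the PCI BUS and the ATA BUS]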
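A worked calculation for the PCI Express figure, as a sketch. It assumes the first-generation 2.5 Gbps per-lane signaling rate with 8b/10b line encoding (8 data bits per 10 transmitted bits) and ignores packet and protocol overhead, so real transfer rates are somewhat lower.

```python
def pcie1_usable_mb_per_s(lanes, raw_mbps_per_lane=2500):
    """Per-direction data bandwidth (MB/s) for first-generation PCI Express."""
    payload_mbps = raw_mbps_per_lane * 8 // 10  # 8b/10b: 8 data bits per 10 line bits
    return lanes * payload_mbps // 8            # bits -> bytes, one direction
```

One lane carries about 250 MB/s in each direction, so the x16 slot used for graphics carries about 4,000 MB/s each way.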
nVIDIA 8800
- Process: 90 nm
- Die size: 484 mm² (21.5 mm x 22.5 mm), 681 million transistors
- Chip package: flip-chip
- Basic pipeline config (textures / pixels / Z): 32 / 24 / 192
- Memory config: 384-bit, 6x 64-bit (GDDR – GDDR4)
- System interconnect: PCI Express x16
- FSAA: multisampling, supersampling, coverage sampling, transparency; 2x1 / 2x2 / 4x2 (on a 16x16 grid)