CSL 859: Advanced Computer Graphics
Dept of Computer Sc. & Engg.
IIT Delhi

Adrianne Demo

- Skin shader
  - 1,400 instructions per pixel
  - 15 render passes
  - Five bump maps
  - Physically-based lighting with sub-surface scattering
  - Three skin layers with different scattering properties
- Complex anisotropic hair shader
- Real geometry
  - GPU-accelerated character skinning
  - Blendshapes
  - Sculpt deformers
  - Skeletal-driven bump maps

Graphics Pipeline

[Figure: fixed-function pipeline; stages shown: Geometry, Transform, Light, Clip, Setup, Rasterize, Texture, Blend, Z-test, Framebuffer, Picture]

Graphics Pipeline

[Figure: programmable pipeline; inputs Vertex, Connectivity and Textures; stages Vertex Shader, Primitive Assembly, Clip & Setup, Rasterize, Fragment Shader, Texture, Blend, Raster OPs, Framebuffer, Picture]
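
The programmable stages (vertex shader, primitive assembly, clip & setup, rasterize, fragment shader, raster ops) can be illustrated with a toy software model. This is a minimal sketch under stated assumptions: the vertex shader is a hypothetical 2D scale-and-translate, the fragment shader returns a flat colour, and the only raster op is a less-than Z-test with no blending. All names are illustrative, not any real API.

```python
import math
from dataclasses import dataclass


@dataclass
class Vertex:
    x: float
    y: float
    z: float


def vertex_shader(v, scale, offset):
    # Hypothetical transform: uniform 2D scale + translate into screen space.
    return Vertex(v.x * scale + offset, v.y * scale + offset, v.z)


def primitive_assembly(vertices, indices):
    # Connectivity: each consecutive triple of indices forms one triangle.
    return [tuple(vertices[i] for i in indices[k:k + 3])
            for k in range(0, len(indices) - 2, 3)]


def edge(a, b, px, py):
    # Twice the signed area of triangle (a, b, p).
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x)


def rasterize(tri, width, height):
    # Clip & setup: bounding box clamped to the screen, then an edge-function
    # inside test at each pixel centre; yields (x, y, interpolated z).
    a, b, c = tri
    area = edge(a, b, c.x, c.y)
    if area == 0:
        return  # degenerate triangle produces no fragments
    x0 = max(math.floor(min(a.x, b.x, c.x)), 0)
    x1 = min(math.ceil(max(a.x, b.x, c.x)), width)
    y0 = max(math.floor(min(a.y, b.y, c.y)), 0)
    y1 = min(math.ceil(max(a.y, b.y, c.y)), height)
    for y in range(y0, y1):
        for x in range(x0, x1):
            px, py = x + 0.5, y + 0.5
            w0 = edge(b, c, px, py) / area
            w1 = edge(c, a, px, py) / area
            w2 = edge(a, b, px, py) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield x, y, w0 * a.z + w1 * b.z + w2 * c.z


def fragment_shader(color):
    # Flat colour; a real fragment shader would also sample textures here.
    return color


def draw(vertices, indices, colors, width, height, scale=1.0, offset=0.0):
    depth = [[math.inf] * width for _ in range(height)]
    frame = [[None] * width for _ in range(height)]
    screen = [vertex_shader(v, scale, offset) for v in vertices]
    for tri, color in zip(primitive_assembly(screen, indices), colors):
        for x, y, z in rasterize(tri, width, height):
            rgb = fragment_shader(color)
            if z < depth[y][x]:  # raster op: Z-test (less), then write
                depth[y][x] = z
                frame[y][x] = rgb
    return frame
```

Drawing a far triangle and then a nearer triangle with the same footprint leaves the nearer colour in every covered pixel, because the Z-test in the raster-op stage rejects the farther fragments regardless of submission order.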

Bottlenecks

- Too many operations: parallelize
- Too many memory accesses: parallelize

[Figure: GEOMETRY OPERATIONS connected through a crossbar (XBAR) to FRAGMENT OPERATIONS, each working on its own SCREEN TILE]

Parallelization

- Distribute computation to processors: work allocation
- Distribute texture to memory banks
- Tile screen pixels into memory banks
- Do all processors have access to all memory?
  - Distribute access / replicate data
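
Distributing texture and screen pixels over memory banks comes down to a mapping from an address to a bank. The sketch below assumes a hypothetical 4-bank layout. Texels are interleaved on the low bit of u and v, so the 2x2 footprint of any bilinear lookup touches four different banks. Screen pixels are grouped into square tiles, and the banks are staggered diagonally so that neighbouring tiles land in different banks. Bank count, tile size and function names are illustrative assumptions.

```python
from collections import Counter


def texel_bank(u, v):
    # Hypothetical 4-bank texture layout: bank = (v mod 2, u mod 2).
    return (u & 1) | ((v & 1) << 1)


def bilinear_banks(u, v):
    # Banks touched by the 2x2 texel footprint of one bilinear lookup.
    return {texel_bank(u + du, v + dv) for du in (0, 1) for dv in (0, 1)}


def pixel_bank(x, y, tile=4, banks=4):
    # Hypothetical screen tiling: square tiles, banks staggered diagonally so
    # horizontally or vertically adjacent tiles use different banks.
    return (x // tile + y // tile) % banks


def bank_load(width, height, tile=4, banks=4):
    # Number of pixels each bank owns for a width x height screen.
    return Counter(pixel_bank(x, y, tile, banks)
                   for y in range(height) for x in range(width))
```

With this layout every bilinear lookup can fetch its four texels in parallel, one per bank. On a 16x16 screen each of the four banks owns 64 pixels, so the load is balanced.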

Sorting Taxonomy

- Sort first: allocate each primitive to the processor responsible for only a given area of the screen
- Sort middle: perform geometry ops on any processor (load-balanced), then distribute the results to the processor responsible for that screen area
- Sort last: no screen subdivision; perform geometry and fragment ops on any processor (load-balanced), then compose the results
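
The compose step of sort last can be sketched as a per-pixel depth merge. The sketch assumes each renderer has produced a full-screen colour buffer and depth buffer for its own subset of opaque primitives. The buffer representation (lists of rows) and the function name are illustrative assumptions.

```python
import math


def composite(partials):
    # Sort-last compose: each renderer produced a full-screen (color, depth)
    # pair for its own primitives; keep the nearest fragment per pixel.
    height, width = len(partials[0][1]), len(partials[0][1][0])
    out_color = [[None] * width for _ in range(height)]
    out_depth = [[math.inf] * width for _ in range(height)]
    for color, depth in partials:
        for y in range(height):
            for x in range(width):
                if depth[y][x] < out_depth[y][x]:
                    out_depth[y][x] = depth[y][x]
                    out_color[y][x] = color[y][x]
    return out_color, out_depth
```

For opaque surfaces the result does not depend on the order in which partial images are merged, except for exact depth ties. Blended (transparent) fragments would need an ordered composite instead.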

Memory Considerations

- Highly pipelined: guard against stalls
- Memory bandwidth: how many accesses per second?
- Latency: latency-hiding buffers
- Larger memory atoms, e.g., 32-byte atoms
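
The bandwidth and atom-size questions can be turned into rough arithmetic. The following sketch assumes every fragment costs a fixed number of bytes, for example a 4-byte Z read, a 4-byte Z write and a 4-byte colour write. It also assumes memory always moves whole atoms. The screen size, frame rate, depth complexity and byte counts in the example are hypothetical.

```python
def framebuffer_traffic(width, height, fps, depth_complexity, bytes_per_fragment):
    # Bytes/second of framebuffer traffic if every fragment costs the same
    # number of bytes (e.g. Z read + Z write + colour write).
    return width * height * fps * depth_complexity * bytes_per_fragment


def atom_efficiency(useful_bytes, atom_bytes=32):
    # Memory moves whole atoms: fraction of transferred bytes actually used.
    atoms = -(-useful_bytes // atom_bytes)  # ceiling division
    return useful_bytes / (atoms * atom_bytes)
```

With 1280x1024 pixels, 60 fps, depth complexity 4 and 12 bytes per fragment, `framebuffer_traffic` gives 3,774,873,600 bytes/s, about 3.8 GB/s. A lone 4-byte access served from a 32-byte atom uses only 1/8 of the transfer. This is why larger atoms only pay off when neighbouring accesses share the same atom.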

Graphics Architecture: A Brief History

- Evans & Sutherland
- Ikonas
- UNC Chapel Hill
- Silicon Graphics
- (Mushrooming: smart VGA controllers)
- nVIDIA, AMD

IKONAS

- 32-bit data, 24-bit address bus backbone
- Everything memory mapped
  - Host interface = address registers to access anything on the bus
  - Frame buffer resolution and timing could be set via control registers
- Graphics processor
  - (Micro)programmable
  - 32-bit integer ALU and 16x16-bit integer multiplier
  - Address counters, loop counters, and a 64-bit instruction word
- Plug-in boards
  - 16-bit graphics processor with 16-pixel-at-once parallel write
  - Microprogrammed 16x16-bit matrix multiplier
  - Microprogrammed floating-point matrix multiplier
  - Hardware Z-buffer
  - Real-time alpha-blend hardware for two RGB images
  - Real-time RGB video frame grabber
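
"Everything memory mapped" means the host reaches any board by putting an address on the shared bus. The sketch below models this with a 24-bit address space, a 32-bit data path, and a host interface built from an address register plus a data port. The device layout (a frame buffer at 0x000000, control registers at 0xFF0000) and all class and method names are hypothetical. It does not reproduce the actual IKONAS memory map.

```python
ADDRESS_BITS = 24
DATA_MASK = 0xFFFFFFFF  # 32-bit data path


class Bus:
    def __init__(self):
        self.devices = []  # (base, size, storage) triples

    def map(self, base, size, storage):
        # Attach a device's storage (a Python list) at [base, base + size).
        if base < 0 or base + size > 1 << ADDRESS_BITS:
            raise ValueError("device does not fit in the 24-bit address space")
        self.devices.append((base, size, storage))

    def _find(self, addr):
        for base, size, storage in self.devices:
            if base <= addr < base + size:
                return storage, addr - base
        raise ValueError(f"unmapped address {addr:#08x}")

    def write(self, addr, value):
        storage, offset = self._find(addr)
        storage[offset] = value & DATA_MASK

    def read(self, addr):
        storage, offset = self._find(addr)
        return storage[offset]


class HostInterface:
    # Host loads an address register, then reads/writes through a data port.
    def __init__(self, bus):
        self.bus = bus
        self.address = 0

    def set_address(self, addr):
        self.address = addr & ((1 << ADDRESS_BITS) - 1)

    def write_data(self, value):
        self.bus.write(self.address, value)

    def read_data(self):
        return self.bus.read(self.address)
```

In this model, setting the frame-buffer resolution is just another write. The host points the address register at a (hypothetical) control register and stores the new value, exactly as it would store a pixel.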

[Figure: IKONAS, 1981]

Pixel-Planes 5 (1989)

- Up to 32 GPs (i860) and up to 8 Renderers
- 2 GPs per board
- One 128x128 array per board

Pixel-Planes 5 Renderer

- One board had 64 mini-chips, each with 2 columns of 128 pixel processors (with memory)

Renderer

- 64 chips of 256 pixel processing elements (PEs)
- Each PE has 208 bits of memory
- Each chip contains a quadratic expression evaluator (QEE) that computes $Q(x, y) = Ax + By + C + Dx^2 + Exy + Fy^2$ simultaneously at each pixel
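
Because every PE receives $Q(x, y)$ for its own pixel, primitives can be scan-converted by evaluating one expression per edge and keeping a per-pixel enable bit. The sketch below loops over pixels to emulate what the QEE delivers in parallel. A triangle is rasterized as the AND of three linear edge tests (D = E = F = 0), and a disc uses the quadratic terms. The enable-bit formulation and all function names are assumptions made for illustration, not the Pixel-Planes 5 command set.

```python
def evaluate_qee(A, B, C, D=0, E=0, F=0, size=128):
    # Every pixel processor (x, y) receives Ax + By + C + Dx^2 + Exy + Fy^2.
    return [[A * x + B * y + C + D * x * x + E * x * y + F * y * y
             for x in range(size)] for y in range(size)]


def edge_coefficients(x0, y0, x1, y1):
    # Linear expression that is >= 0 on the left of the directed edge
    # (x0, y0) -> (x1, y1), i.e. inside a counter-clockwise triangle.
    A = y0 - y1
    B = x1 - x0
    return A, B, -(A * x0 + B * y0)


def scan_convert_triangle(verts, size=128):
    # Each PE keeps an enable bit; every edge disables pixels outside it.
    enable = [[True] * size for _ in range(size)]
    for i in range(3):
        (x0, y0), (x1, y1) = verts[i], verts[(i + 1) % 3]
        values = evaluate_qee(*edge_coefficients(x0, y0, x1, y1), size=size)
        enable = [[e and v >= 0 for e, v in zip(erow, vrow)]
                  for erow, vrow in zip(enable, values)]
    return enable


def scan_convert_disc(cx, cy, r, size=128):
    # (x - cx)^2 + (y - cy)^2 - r^2 <= 0, expanded into QEE coefficients.
    values = evaluate_qee(-2 * cx, -2 * cy, cx * cx + cy * cy - r * r,
                          D=1, E=0, F=1, size=size)
    return [[v <= 0 for v in row] for row in values]
```

The cost of a primitive depends on the number of expressions broadcast (three for a triangle, one for a disc), not on how many pixels it covers. Note also that 64 chips x 256 PEs = 16,384 = 128 x 128, one PE per pixel of the Renderer's array.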

Basic Algorithm

- Host app transmits the model database and new-frame requests to the MGP
- Screen is divided statically into bins of 128x128 pixels
- MGP allocates Renderers to screen regions
- MGP broadcasts database commands to all GPs
- GPs generate Renderer commands for each primitive
- Commands are inserted into the appropriate bins
- GPs send the bins round-robin
- The Renderers send computed pixels to the frame buffer
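
The binning step can be sketched as follows. Each primitive's screen bounding box is clipped to the screen and turned into the list of 128x128 bins it overlaps. The primitive's command is appended to each of those bins, and the bins are then dealt out to Renderers. The 1280x1024 screen in the example, the static round-robin schedule (a simplification of the MGP's allocation) and all function names are assumptions.

```python
from collections import defaultdict

BIN_SIZE = 128  # static 128x128-pixel bins, as on the slide


def bins_for_bbox(xmin, ymin, xmax, ymax, screen_w, screen_h, bin_size=BIN_SIZE):
    # Inclusive pixel bounding box -> every bin it overlaps, clipped to screen.
    bx0, by0 = max(xmin, 0) // bin_size, max(ymin, 0) // bin_size
    bx1 = min(xmax, screen_w - 1) // bin_size
    by1 = min(ymax, screen_h - 1) // bin_size
    return [(bx, by) for by in range(by0, by1 + 1) for bx in range(bx0, bx1 + 1)]


def bin_commands(bboxes, screen_w, screen_h):
    # GP side: insert each primitive's command into every bin it touches.
    bins = defaultdict(list)
    for prim_id, bbox in enumerate(bboxes):
        for b in bins_for_bbox(*bbox, screen_w, screen_h):
            bins[b].append(prim_id)
    return bins


def assign_round_robin(bin_keys, n_renderers):
    # Simplified static schedule: bins dealt out to Renderers in turn.
    schedule = defaultdict(list)
    ordered = sorted(bin_keys, key=lambda b: (b[1], b[0]))  # row-major order
    for i, b in enumerate(ordered):
        schedule[i % n_renderers].append(b)
    return schedule
```

A primitive that straddles a bin boundary is sent to every bin it touches, so small bins trade better load balance for more duplicated commands.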

SGI RealityEngine

Kurt Akeley, 1993:

"The implementation is near-massively parallel, employing 353 independent processors in its fullest configuration, resulting in a measured fill rate of over 240 million antialiased, texture mapped pixels per second. Rendering performance exceeds 1 million antialiased, texture mapped triangles per second."

RealityEngine Architecture

- Input FIFO, Command Processor
- 6, 8, or 12 Geometry Engines
- 1, 2, or 4 raster boards
  - 5 Fragment Generators per board (each has a texture replica)
  - 80 Image Engines per board
- 1280x1024 framebuffer, 256 bits/pixel
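
The "353 independent processors" figure can be checked against the fullest configuration. This assumes the count covers the command processor (CP), the geometry engines (GEs), and the fragment generators (FGs) and image engines (IEs) on every raster board, with 80 IEs per board. The second line works out the size of a 1280x1024 framebuffer at 256 bits per pixel.

$$1_{\text{CP}} + 12_{\text{GE}} + 4_{\text{boards}} \times (5_{\text{FG}} + 80_{\text{IE}}) = 1 + 12 + 340 = 353$$

$$1280 \times 1024 \times 256\ \text{bits} = 335{,}544{,}320\ \text{bits} = 41{,}943{,}040\ \text{bytes} = 40\ \text{MiB}$$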

RealityEngine Algorithm

- FIFO geometry is distributed by the CP to the GEs
- GEs do geometry ops, including setup
- GEs broadcast triangles to the FGs (raster)
- Finely interleaved pixel assignment
- FGs distribute fragments to the IEs
- IEs do raster ops
- IEs are the framebuffer
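
Fine interleaving spreads even a small triangle over many Image Engines, whereas coarse screen tiles would leave most engines idle. The sketch contrasts the two on a 10x16-pixel region. The 5x16 interleave pattern (one column phase per FG, 16 IEs per FG, 80 engines in total) and the 128x128 coarse tiles are hypothetical choices. They do not reproduce the exact RealityEngine mapping.

```python
from collections import Counter

N_X, N_Y = 5, 16  # hypothetical pattern: 5 x 16 = 80 engines


def fine_engine(x, y):
    # Finely interleaved: adjacent pixels go to different engines.
    return (x % N_X) + N_X * (y % N_Y)


def coarse_engine(x, y, tile=128, tiles_per_row=10):
    # Contrast case: each engine owns one 128x128 screen tile.
    return (x // tile) + tiles_per_row * (y // tile)


def engine_load(engine_of, pixels):
    # Pixels handled by each engine for the given set of covered pixels.
    return Counter(engine_of(x, y) for x, y in pixels)
```

For the 160 pixels of a 10x16 region, the fine mapping gives each of the 80 engines exactly 2 pixels, while the coarse mapping sends all 160 to a single engine.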

RealityEngine

[Figure: RealityEngine, with the GE, FG and IE components labelled]

PC Architecture

[Figure: CPU, NorthBridge and SouthBridge, connected by the FSB, MEM bus, PCI Express, PCI bus and ATA bus]

- PCI Express: up to 2.5 Gbps per lane in each direction
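
The 2.5 Gbps figure is a raw line rate. First-generation PCI Express uses 8b/10b encoding, so every data byte costs 10 bits on the wire. The sketch converts lanes and line rate into usable bytes per second in one direction. The function name and defaults are illustrative, and protocol (packet) overhead is ignored.

```python
def pcie_bytes_per_second(lanes, line_rate_bps=2_500_000_000,
                          payload_bits=8, encoded_bits=10):
    # Per direction. 8b/10b: 10 line bits carry 8 data bits (one byte).
    # Packet/protocol overhead is ignored in this rough estimate.
    return lanes * line_rate_bps * payload_bits // (encoded_bits * 8)
```

One lane gives 250,000,000 bytes/s (250 MB/s) each way. An x16 slot, as used by the 8800 below, gives 4,000,000,000 bytes/s (4 GB/s) each way under the same assumptions.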

nVIDIA 8800

- Process: 90 nm
- Die size: 484 mm² (681 million transistors), 21.5 mm x 22.5 mm
- Chip package: Flip chip
- Basic pipeline config (textures / pixels / Z): 32 / 24 / 192
- Memory config: 384-bit, 6x 64-bit (GDDR to GDDR4)
- System interconnect: PCI Express x16
- FSAA: multisampling, supersampling, coverage sampling, transparency; 2x1 / 2x2 / 4x2 (on a 16x16 grid)

Texture

- Textures per pass: 128
- Texture filtering methods: bilinear, trilinear, 2-16x anisotropic
- Texture compression: DXTC 1-5, 3Dc+
- Fragment processors: 128x FP32 scalar MADD+MUL
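
Peak throughput follows from the table once clock rates are known. The table lists 128 scalar processors, each able to issue a MADD (2 flops) plus a MUL (1 flop) per clock, and a 384-bit memory bus. The clocks are not on the slide. The example uses an assumed 1.35 GHz shader clock and 1.8 GT/s memory data rate, the commonly quoted GeForce 8800 GTX figures, so treat them as assumptions. The function names are hypothetical.

```python
def peak_shader_gflops(processors, shader_clock_ghz, flops_per_clock=3):
    # MADD counts as 2 flops, the co-issued MUL as 1: 3 flops/processor/clock.
    return processors * flops_per_clock * shader_clock_ghz


def peak_memory_gbytes_per_s(bus_width_bits, data_rate_gtps):
    # Bytes moved per transfer across the whole bus times transfers per ns.
    return bus_width_bits / 8 * data_rate_gtps
```

Under those assumptions the chip peaks at about 518.4 GFLOPS (384 flops per clock) and 86.4 GB/s of memory bandwidth. These are upper bounds: the MUL is not always co-issuable, and real traffic rarely uses every transfer slot.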
