![Page 1: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/1.jpg)
Developing Efficient Graphics Software
![Page 2: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/2.jpg)
Intent of Course
• Identify application and hardware interaction
• Quantify and optimize interaction
• Identify efficient software structure
• Balance software and hardware system component use
![Page 3: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/3.jpg)
Outline
• 1:35 Hardware and graphics architecture and performance
• 2:05 Software and system performance
• Break
• 2:55 Software profiling and performance analysis
• 3:20 C/C++ language issues
• 3:50 Graphics techniques and algorithms
• 4:40 Performance hints
![Page 4: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/4.jpg)
Speakers
• Applications Consulting Engineers for SGI
– optimizing, differentiating, graphics
• Keith Cok, Bob Kuehne, Thomas True, Alan Commike
![Page 5: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/5.jpg)
Hardware & Graphics Architecture & Performance
Bob Kuehne, SGI
![Page 6: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/6.jpg)
Course Overview
Why is your application drawing so slowly?
• Could actually be the graphics
• Could be the data traversal
• Could be something entirely different
![Page 7: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/7.jpg)
Tour Guide
Platform architecture & components
• CPU
• Memory
• Graphics
Graphics performance
• Measurements: triangle rate, fill rate, misc.
• Reproduce & maximize
![Page 8: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/8.jpg)
Bottlenecks & Balance
Bottlenecks
• Find them
• Eliminate them (sort of - move them around)
Balance
• Understand hardware architecture
• Fully utilize hardware
![Page 9: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/9.jpg)
Yin & Yang
• “Yin and yang are the two primal cosmic principles of the universe”
• “The best state for everything in the universe is a state of harmony represented by a balance of yin and yang.”
– Skeptic's Dictionary -- http://skepdic.com/yinyang.html
![Page 10: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/10.jpg)
Write Once Run Everywhere?
My application ran fast on that platform! Why is this one so slow?
• Different platforms require different tuning
• Different platforms implement hardware differently
– Macro: Architecture & features
– Micro: Storage capacities, buffers, & caches
– Effect: Bandwidth & latency
![Page 11: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/11.jpg)
Latency & Bandwidth
Definitions:
• Latency: time required to communicate a unit of data
• Bandwidth: data transferred per unit time
Example:
• Latency bottleneck: s t s t s t s t s t s t
• Bandwidth bottleneck: s t t t t t t
Legend: each letter is one unit of time; s: texture setup time; t: texture download time
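The setup/download timeline above can be put into numbers. The following is a minimal sketch, assuming a fixed setup cost per transfer and a constant bandwidth; the `TransferModel` type and both function names are hypothetical and not part of the course material.

```c
#include <stddef.h>

/* Hypothetical cost model: every transfer pays a fixed setup latency,
 * then moves its bytes at a constant bandwidth. */
typedef struct {
    double setup_seconds;      /* latency paid once per transfer */
    double bytes_per_second;   /* sustained bandwidth */
} TransferModel;

/* Total time to move total_bytes split across n_transfers transfers. */
double transfer_time(const TransferModel *m, double total_bytes, size_t n_transfers)
{
    return (double)n_transfers * m->setup_seconds
         + total_bytes / m->bytes_per_second;
}

/* Fraction of the total time spent on setup: near 1.0 suggests a latency
 * bottleneck, near 0.0 suggests a bandwidth bottleneck. */
double setup_fraction(const TransferModel *m, double total_bytes, size_t n_transfers)
{
    double setup = (double)n_transfers * m->setup_seconds;
    return setup / transfer_time(m, total_bytes, n_transfers);
}
```

Many small texture downloads push `setup_fraction` toward 1.0, while one large download pushes it toward 0.0, which matches the two timelines on the slide.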
![Page 12: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/12.jpg)
Platform: Software View
[Diagram: CPU, memory, graphics, i/o, net, misc]
![Page 13: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/13.jpg)
Platform: PCI, AGP
[Diagram: two platform layouts, labeled PCI and AGP. Each shows CPU and Memory joined by glue logic, with Disk, Net, Graphics, and I/O attached; the second layout adds an AGP connection alongside PCI.]
![Page 14: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/14.jpg)
Platform: UMA, Switched Hub
[Diagram: two platform layouts, labeled UMA and switched hub. Each shows CPU, Memory, Disk, Net, Graphics, and I/O; glue logic and PCI also appear.]
![Page 15: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/15.jpg)
Platform: The Points
Why learn about hardware?
• To understand how your app interacts with it
• To best utilize the hardware
• Potentially can use extra hardware features
Where?
• Platform documentation
• Talk with hardware vendor
![Page 16: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/16.jpg)
CPU: Overview
CPU Operation
• Data transferred from main memory to registers
• CPU works on data in registers
Latency
• Registers: 0 (free)
• Level-1 (L1) cache: 1
• Level-2 (L2) cache: 10x L1
• Main memory: 100x L1
[Diagram: CPU, registers (R), L1, L2, Main Memory]
![Page 17: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/17.jpg)
CPU, Cache, and Memory
Caches designed to exploit data locality
• Temporal locality
• Spatial locality
[Diagram: CPU, Registers, L1, L2, Main Memory]
![Page 18: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/18.jpg)
Memory: Cache & Logical Flow
[Flowchart: decisions "In Register?", "In L1?", "In L2?"; actions "Compute", "Copy to Register" (cost 1), "Copy to L1" (cost 10), "Copy to L2" (cost 100)]
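The relative costs in this flow can be combined into an expected cost per access. The sketch below assumes that the costs add up along the miss path (a miss all the way to main memory pays 100 + 10 + 1); that reading, and the function name, are assumptions rather than something the slides state.

```c
/* Relative costs taken from the flow above; treating them as additive
 * along the miss path is an assumption of this sketch. */
enum { COST_L1_TO_REG = 1, COST_L2_TO_L1 = 10, COST_MEM_TO_L2 = 100 };

/* Expected cost of one access whose data is not already in a register.
 * l1_hit: probability that the data is in L1.
 * l2_hit: probability that the data is in L2, given that it missed L1. */
double expected_access_cost(double l1_hit, double l2_hit)
{
    double l1_miss = 1.0 - l1_hit;
    double l2_miss = 1.0 - l2_hit;
    return COST_L1_TO_REG
         + l1_miss * COST_L2_TO_L1
         + l1_miss * l2_miss * COST_MEM_TO_L2;
}
```

Under this model, even a small main-memory miss rate dominates the average, which is why in-cache data matters so much.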
![Page 19: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/19.jpg)
Memory: Cache & Physical Flow
[Diagram: Page, Main Memory, L2 Cache, L1 Cache, Registers, CPU]
![Page 20: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/20.jpg)
Memory: Allocation & Pools
• List elements are often allocated as-needed
– This leads to spatial disparity
• Mitigated by use of application memory management
– Bad: malloc, malloc, malloc, malloc, ...
– Good: pools - pool_init, pool_alloc, ...
• Graphics example:
– Vertices, normals, textures, etc.
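A pool along these lines could look like the sketch below. The names `pool_init` and `pool_alloc` come from the slide, but their signatures, the `Pool` type, and `pool_destroy` are assumptions made for illustration. The sketch ignores size overflow and assumes `elem_size` is a multiple of the element's alignment (true when it is `sizeof` of the element type).

```c
#include <stddef.h>
#include <stdlib.h>

/* Fixed-size-element pool: one malloc up front, then elements are handed
 * out from a contiguous block, so consecutive allocations are adjacent in
 * memory. Signatures are assumptions; only the names follow the slide. */
typedef struct {
    unsigned char *storage;   /* contiguous backing block */
    size_t elem_size;         /* bytes per element */
    size_t capacity;          /* number of elements that fit */
    size_t used;              /* number of elements handed out */
} Pool;

/* Returns 0 on success, -1 if the backing allocation fails. */
int pool_init(Pool *p, size_t elem_size, size_t capacity)
{
    p->storage = malloc(elem_size * capacity);
    p->elem_size = elem_size;
    p->capacity = capacity;
    p->used = 0;
    return p->storage ? 0 : -1;
}

/* Returns the next free element, or NULL when the pool is exhausted. */
void *pool_alloc(Pool *p)
{
    if (p->used == p->capacity)
        return NULL;
    return p->storage + p->elem_size * p->used++;
}

/* Releases every element at once. */
void pool_destroy(Pool *p)
{
    free(p->storage);
    p->storage = NULL;
    p->capacity = p->used = 0;
}
```

Allocating vertices or normals from such a pool keeps them contiguous, so a traversal touches neighbouring cache lines instead of scattered heap blocks.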
![Page 21: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/21.jpg)
Memory: Graphics! Vertex Arrays
[Chart: Vertex Array Cache Behavior. x-axis: Number of Array Vertices; y-axis: Time to Traverse. Series: Platform 0 - Interleaved, Platform 0 - Non-interleaved, Platform 1 - Interleaved, Platform 1 - Non-interleaved.]
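To make the two layouts in the chart concrete, the sketch below contrasts an interleaved vertex with separate per-attribute arrays and computes how far apart the position and normal of one vertex end up. The type and function names are hypothetical, and the back-to-back placement of the separate arrays is an assumption.

```c
#include <stddef.h>

/* Interleaved layout: all attributes of one vertex sit next to each other. */
typedef struct {
    float position[3];
    float normal[3];
} InterleavedVertex;

/* Bytes between the position and the normal of the same vertex in the
 * interleaved layout. A small, constant gap means both are likely to
 * share a cache line. */
size_t interleaved_gap(void)
{
    return offsetof(InterleavedVertex, normal)
         - offsetof(InterleavedVertex, position);
}

/* The same gap when positions and normals live in two arrays of
 * n_vertices elements placed back to back (an assumption about how the
 * arrays were allocated). The gap grows with the size of the data set. */
size_t separate_gap(size_t n_vertices)
{
    return n_vertices * sizeof(float[3]);
}
```

Whether interleaving actually wins depends on the platform, which is the point of the chart: measure both on each target.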
![Page 22: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/22.jpg)
Graphics: Pipe
Stages: FIFO, xf, light, clip, rast, fx, fops
• xf: world to screen
• light: apply light
• clip: clip to view
• rast: convert to pixels
• fx: apply texture, etc.
• fops: test pixel ops
![Page 23: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/23.jpg)
Graphics: Pipe & Akeley Taxonomy
• G - Generate geometric data
• T - Traverse data structures
• X - Transform primitives world to screen
• R - Rasterize triangles to pixels
• D - Display framebuffer on output device
[Diagram: stages G, T, X, R, D]
![Page 24: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/24.jpg)
Graphics: Hardware
4 types of hardware are common
• G-TXRD : all hardware
• GT-XRD :
• GTX-RD :
• GTXR-D : all software
![Page 25: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/25.jpg)
Graphics: Performance
Benchmarks
• “Trust, but verify.” - an ex-president
Definitions
• Triangle rate: speed at which primitives are transformed (X)
• Fill rate: speed at which primitives are rasterized (R)
– Depth complexity: number of times a pixel is filled
Caveats
• Quantization, fastpath
![Page 26: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/26.jpg)
Graphics: Quantization
• Frame quantization is the result of swapbuffers occurring at the next vertical retrace.
– Necessary to avoid image artifacts such as tearing
• Example: 100Hz display refresh
![Page 27: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/27.jpg)
Graphics: Quantization
[Timeline diagram: time axis t0 to t7, where each tn is 1/100 second and each bar is one graphics frame. Rows are labeled 100 Hz, 50 Hz, 50 Hz, 33 Hz, and no-sync 120 Hz.]
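The frame rates in this example follow from rounding each frame's render time up to a whole number of refresh periods. A minimal sketch of that arithmetic, with a hypothetical function name:

```c
/* Effective frame rate when buffer swaps wait for the next vertical
 * retrace. A frame that takes even slightly longer than k refresh
 * periods is displayed after k + 1 periods. */
double quantized_frame_rate(double refresh_hz, double render_seconds)
{
    double period = 1.0 / refresh_hz;
    int periods = (int)(render_seconds / period);
    if ((double)periods * period < render_seconds)
        periods++;            /* round up to the next retrace */
    if (periods < 1)
        periods = 1;          /* cannot swap more often than the display refreshes */
    return refresh_hz / periods;
}
```

On a 100 Hz display, a frame that takes 11 ms is shown at 50 Hz, not 91 Hz, and a 25 ms frame at about 33 Hz. This is why small speedups sometimes produce no visible change and then a sudden jump.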
![Page 28: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/28.jpg)
Graphics: Fastpath
Definition
• Fastpath: the most optimized path through graphics hardware
Example
• fast path: float verts, float norms, ABGR textures, z-test
• less fast path: float verts, float norms, RGBA textures, z-test
![Page 29: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/29.jpg)
Graphics: Fastpath Example
![Page 30: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/30.jpg)
Graphics: Fastpath Points
• Fast path is often synonymous with ideal path.
– Real usage of graphics falls on a continuum.
• Must quantify what hardware can do
– Quality & speed
[Diagram: a continuum between fast path (hardware) and slow path (software), labeled Speed and Quality. Where is your application?]
![Page 31: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/31.jpg)
Graphics Hardware: Testing
Duplicate performance numbers simply:
• Good: build a simple test program
• Better: glPerf - http://www.spec.org
Maximize performance in an app:
• Good: Use fast API extensions
• Better: Create an “is-fast” test, use what is verified as fast
![Page 32: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/32.jpg)
Graphics Hardware: “Is-Fast”
Test each platform to determine fast path
• Once, per-machine, test primitives and modes
– Vertex array format, texture format, display list, etc.
• Store data in database
– Detect hardware changes or time-to-live
• Read data from database at startup
– Check database or re-generate data
![Page 33: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/33.jpg)
Graphics Hardware: “Is-Fast”
Pseudo-code
if ( new_machine() || hardware_changed() ) {
    test_interesting_modes();
    store_in_database();
}
else { // have database entry
    get_performance_data_from_database();
}
// use the modes & primitives that are "fast" when rendering
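The pseudo-code above could be fleshed out roughly as follows. This is a sketch under stated assumptions: the `ModeTest` and `IsFastCache` types, the function names, and the two stand-in measurement functions are all hypothetical, and an in-memory struct stands in for the database. In a real application each measurement callback would draw a fixed workload in one mode and return the elapsed time.

```c
#include <stddef.h>

/* A candidate rendering mode and a callback that measures it. */
typedef struct {
    const char *name;
    double (*measure_seconds)(void);
} ModeTest;

/* Runs every test once and returns the index of the fastest mode,
 * or -1 if the list is empty. */
int find_fastest_mode(const ModeTest *tests, size_t count)
{
    int best = -1;
    double best_time = 0.0;
    for (size_t i = 0; i < count; i++) {
        double t = tests[i].measure_seconds();
        if (best < 0 || t < best_time) {
            best = (int)i;
            best_time = t;
        }
    }
    return best;
}

/* Cached result, standing in for the slide's database. */
typedef struct {
    int valid;          /* 0 until a measurement has been stored */
    int fastest_mode;   /* index into the ModeTest array */
} IsFastCache;

/* Mirrors the pseudo-code: re-test when there is no entry or the hardware
 * changed, otherwise reuse the stored answer. */
int get_fast_mode(IsFastCache *cache, int hardware_changed,
                  const ModeTest *tests, size_t count)
{
    if (!cache->valid || hardware_changed) {
        cache->fastest_mode = find_fastest_mode(tests, count);
        cache->valid = 1;
    }
    return cache->fastest_mode;
}

/* Stand-in measurements so the sketch runs without a graphics context. */
double fake_format_a_seconds(void) { return 0.004; }
double fake_format_b_seconds(void) { return 0.002; }
```

A persistent version would write `IsFastCache` to disk along with a hardware identifier and a time-to-live, as the previous slide suggests.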
![Page 34: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/34.jpg)
Think Globally, Act Locally
Think globally
• Know the platforms & graphics hardware
• Use hardware effectively in your app
• Balance hardware utilization
Act locally
• Use in-cache data
• Understand hardware & graphics fastpaths
• Balance quality vs. performance
![Page 35: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/35.jpg)
Software and System Performance
Thomas J. True, SGI
![Page 36: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/36.jpg)
A Four Step Process
Quantify
System Evaluation
Graphics Analysis
Bottleneck Elimination
![Page 37: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/37.jpg)
Quantify
Characterize
• Application Space
• Primitive Types
• Primitive Counts
• Rendering Characteristics
• Frame Rate
![Page 38: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/38.jpg)
Quantify
Compare
[Chart: Triangle Rate and Fill Rate, My Performance vs. Ideal Performance]
![Page 39: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/39.jpg)
Examine System Configuration
Resources
• Memory
• Disk
Setup
• Display
• Network
![Page 40: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/40.jpg)
Graphics Analysis
Ideal Performance
• Keep graphics pipeline full.
• 100% CPU utilization running application code.
• 100% graphics utilization.
![Page 41: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/41.jpg)
Graphics Analysis
Graphics Bound
[Figure: two gauges, each scaled 0 to 100, labeled Acme Electronics]
![Page 42: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/42.jpg)
Graphics Analysis
Graphics Bound
• Graphics subsystem processes data slower than CPU can feed it.
• Graphics subsystem issues an interrupt which causes the CPU to stall.
• Data processing within application stops until graphics subsystem can again accept data.
![Page 43: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/43.jpg)
Graphics Analysis
Geometry Limited
• Limited by the rate at which vertices can be transformed and clipped.
Fill Limited
• Limited by the rate at which transformed vertices can be rasterized.
![Page 44: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/44.jpg)
Graphics Analysis
CPU Bound
[Figure: two gauges, each scaled 0 to 100, labeled Acme Electronics]
![Page 45: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/45.jpg)
Graphics Analysis
CPU Bound
• CPU at 100% utilization but can’t feed graphics fast enough.
• Graphics subsystem at less than 100% utilization.
• All CPU cycles consumed by data processing.
![Page 46: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/46.jpg)
Graphics Analysis
Determination Techniques
• Remove graphics API calls.
• Shrink graphics window.
• Reduce geometry processing requirements.
• Use system monitoring tool.
![Page 47: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/47.jpg)
Graphics Analysis
[Flowchart: decision procedure for locating the bottleneck, starting at "Start". Steps: remove graphics API calls; remove rendering calls; shrink graphics window; reduce geometry load; use system monitoring tool. Outcomes: graphics performance problem; performance problem not graphics; graphics bound: fill limited; graphics bound: geometry limited; graphics bound: ?; fallen off fast path; excessive or unexpected CPU activity. Each arrow is marked either "frame rate increase" or "no change in frame rate".]
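One plausible reading of this decision procedure, combined with the determination techniques listed earlier, is sketched below. The exact branch order of the original flowchart is not recoverable from the transcript, so the ordering here is an assumption, as are all type and function names and the idea of a percentage threshold for "frame rate increase".

```c
typedef enum {
    NOT_GRAPHICS,        /* look at CPU activity with a system monitoring tool */
    FILL_LIMITED,
    GEOMETRY_LIMITED,
    GRAPHICS_UNKNOWN     /* graphics bound, but neither experiment helped */
} Bottleneck;

/* Frame rates measured in four runs of the same scene (hypothetical inputs). */
typedef struct {
    double baseline_fps;
    double no_graphics_calls_fps;   /* graphics API calls removed */
    double small_window_fps;        /* graphics window shrunk */
    double less_geometry_fps;       /* geometry load reduced */
} Experiments;

/* Treats a change above `threshold` (e.g. 0.10 for 10%) as a real increase. */
static int increased(double before, double after, double threshold)
{
    return after > before * (1.0 + threshold);
}

Bottleneck classify(const Experiments *e, double threshold)
{
    /* No speedup without graphics calls: the problem is elsewhere. */
    if (!increased(e->baseline_fps, e->no_graphics_calls_fps, threshold))
        return NOT_GRAPHICS;
    /* A smaller window touches fewer pixels: speedup means fill limited. */
    if (increased(e->baseline_fps, e->small_window_fps, threshold))
        return FILL_LIMITED;
    /* Fewer vertices to transform: speedup means geometry limited. */
    if (increased(e->baseline_fps, e->less_geometry_fps, threshold))
        return GEOMETRY_LIMITED;
    return GRAPHICS_UNKNOWN;
}
```

Because of frame quantization, these measurements are easier to interpret with retrace synchronization disabled, where the platform allows it.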
![Page 48: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/48.jpg)
Graphics Analysis
Graphics Architecture: GTXR-D
[Figure labeled Acme Electronics]
![Page 49: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/49.jpg)
Graphics Analysis
Graphics Architecture: GTXR-D (aka Dumb Frame Buffer)
• CPU does everything.
• Typically CPU bound.
• To remedy, buy a “real” graphics board.
![Page 50: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/50.jpg)
Graphics Analysis
Graphics Architecture: GTX-RD
[Figure labeled Acme Electronics]
![Page 51: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/51.jpg)
Graphics Analysis
Graphics Architecture: GTX-RD
• Screen space operations performed by graphics.
• Object-space to screen-space transform on host.
• Can easily become CPU bound.
“Roughly 100 single-precision floating point operations are required to transform, light, clip test, project and map an object-space vertex to screen-space.” - K. Akeley & T. Jermoluk
• Beware of fast-path and slow-path issues.
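The figure of roughly 100 floating-point operations per vertex gives a quick upper bound on how many vertices a host CPU can transform. The sketch below is a back-of-the-envelope estimate only; `sustained_flops` is a hypothetical input, and real throughput will be lower once memory traffic and application work are included.

```c
/* Upper bound on host-transformed vertices per second, using the roughly
 * 100 floating-point operations per vertex quoted above. */
double max_vertices_per_second(double sustained_flops)
{
    const double flops_per_vertex = 100.0;
    return sustained_flops / flops_per_vertex;
}

/* Largest vertex count per frame that still meets a target frame rate. */
double max_vertices_per_frame(double sustained_flops, double target_fps)
{
    return max_vertices_per_second(sustained_flops) / target_fps;
}
```

For example, a host sustaining 100 million floating-point operations per second could transform at most one million vertices per second, or 20,000 per frame at 50 frames per second, with nothing left over for the application.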
![Page 52: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/52.jpg)
Graphics Analysis
Graphics Architecture: GTX-RD
• If Graphics Bound:
– Reduce per-pixel operations.
– Reduce depth complexity.
– Use native-format data.
![Page 53: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/53.jpg)
Graphics Analysis
Graphics Architecture: GTX-RD
• If CPU Bound:
– Reduce scene complexity.
– Use more efficient graphics algorithms.
![Page 54: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/54.jpg)
Graphics Analysis
Graphics Architecture: GT-XRD
[Figure labeled Acme Electronics]
![Page 55: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/55.jpg)
Graphics Analysis
Graphics Architecture: GT-XRD
• Transformation and rasterization performed by graphics.
• Can be CPU or graphics bound.
• Beware of fast-path and slow-path issues.
• Subject to host bandwidth limitations.
![Page 56: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/56.jpg)
Graphics Analysis
Graphics Architecture: GT-XRD
• If Graphics Bound:
– Move lighting back to CPU.
– Use native data formats within application.
– Use display lists or vertex arrays.
– Use less expensive lighting modes.
![Page 57: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/57.jpg)
Graphics Analysis
Graphics Architecture: GT-XRD
• If CPU Bound:
– Move lighting from CPU to graphics subsystem.
– Do matrix operations in graphics hardware.
– Profile in search of computational performance issues.
![Page 58: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/58.jpg)
Bottleneck Elimination
Bottlenecks
![Page 59: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/59.jpg)
Bottleneck Elimination
Bottlenecks
• Understanding them is crucial to effective tuning.
• Will always exist; tune to balance.
• Not always a bad thing.
![Page 60: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/60.jpg)
Bottleneck Elimination
Graphics
• Use native graphics formats.
• Remove excessive state changes.
• Package graphics primitives efficiently.
• Use textures that fit in texture cache.
• Don’t use unnecessary rendering modes.
• Decrease depth complexity.
• Cull out excessive geometry.
![Page 61: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/61.jpg)
Bottleneck Elimination
Memory
• Don’t allocate memory in rendering loop.
• Avoid copying and repackaging of graphics data.
• Organize graphics data.
• Avoid memory fragmentation.
![Page 62: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/62.jpg)
Bottleneck Elimination
Memory Bandwidth and Fragmentation
• Independent Triangles: 9 vertices, 504 bytes
• Triangle Strip: 5 vertices, 280 bytes
• Vertex Array: 5 vertices, 280 bytes
Vertex = RGBA + XYZW + XYZ + STR = 56 bytes
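The byte counts on this slide can be reproduced from the vertex layout. The sketch below assumes 4-byte `float` components with no padding; the `Vertex` type and helper names are hypothetical.

```c
#include <stddef.h>

/* One vertex as counted on the slide: RGBA + XYZW + XYZ + STR,
 * 14 single-precision floats in total. */
typedef struct {
    float color[4];      /* RGBA */
    float position[4];   /* XYZW */
    float normal[3];     /* XYZ  */
    float texcoord[3];   /* STR  */
} Vertex;

/* Vertices sent for n triangles drawn independently. */
size_t independent_vertices(size_t n_triangles) { return 3 * n_triangles; }

/* Vertices sent for n triangles in one strip (n >= 1). */
size_t strip_vertices(size_t n_triangles) { return n_triangles + 2; }

/* Bytes of vertex data for a given vertex count. */
size_t vertex_bytes(size_t n_vertices) { return n_vertices * sizeof(Vertex); }
```

For three triangles this gives 9 vertices (504 bytes) when drawn independently and 5 vertices (280 bytes) as a strip, and the gap widens as strips get longer.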
![Page 63: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/63.jpg)
Bottleneck Elimination
Code and Language
• Use native data types.
• Avoid contention for a single shared resource.
• Avoid application bottlenecks in non-graphics code.
• Reduce API call overhead.
![Page 64: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/64.jpg)
Bottleneck Elimination
API Call Overhead
• Independent Triangles: (XYZW + RGBA + XYZ + STR) * 9 vertices: 36 function calls
• Triangle Strips: (XYZW + RGBA + XYZ + STR) * 5 vertices: 20 function calls
• Vertex Array: 5 function calls
• Display List: 1 function call
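The call counts above follow from one API call per attribute per vertex in immediate mode. The sketch below reproduces that arithmetic; reading the vertex-array figure of 5 as four pointer-setup calls plus one draw call is an assumption, and the function names are hypothetical.

```c
#include <stddef.h>

/* Immediate mode issues one call per attribute per vertex
 * (position, color, normal, texture coordinate). */
enum { ATTRIBUTES_PER_VERTEX = 4 };

size_t immediate_mode_calls(size_t n_vertices)
{
    return ATTRIBUTES_PER_VERTEX * n_vertices;
}

/* Vertex arrays: one pointer-setup call per attribute plus one draw call,
 * independent of the vertex count. Interpreting the slide's figure of 5
 * this way is an assumption. */
size_t vertex_array_calls(void)
{
    return ATTRIBUTES_PER_VERTEX + 1;
}

/* Display list: a single call replays the whole list. */
size_t display_list_calls(void)
{
    return 1;
}
```

The immediate-mode cost grows with the vertex count, while the vertex-array and display-list costs stay constant, so the savings scale with the size of the model.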
![Page 65: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/65.jpg)
Conclusion
Performance Tuning: An Iterative Process
Quantify
System Evaluation
Graphics Analysis
Bottleneck Elimination
![Page 66: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/66.jpg)
Conclusion
It’s all about balance!
![Page 67: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/67.jpg)
Profiling and Performance Analysis
Keith Cok, SGI
![Page 68: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/68.jpg)
Profile and Performance Analysis
• Profiling points out the code areas that take the most time
• Imperative for a well-balanced application
• Points out code and system bottlenecks
![Page 69: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/69.jpg)
Two Methods of Software Profiling
Basic block
• A section of code that has one entry and one exit
• Measures ideal time
Statistical sampling
• Interrupts program execution and examines current location
• Measures actual CPU cycles spent executing a line of code
![Page 70: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/70.jpg)
How Do You Profile Code?
• Compile/link with compiler optimizations turned on
– cc foo.c -use_all_optimization_flags ...
• Instrument the code
– Unix: pixie foo.exe -> foo.exe.pixie
– Visual Studio: embedded in tool suite
• Run the application with relevant data sets
– foo.exe.pixie - args -> produces results data file
![Page 71: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/71.jpg)
Profiling: Finding the Hot Spot
Function list, in descending order by exclusive ideal time
excl.% cum.% instructions calls function (dso: file, line)
[1] 10.3% 10.3% 190583064 11484 GL_CreateSurfaceLightmap (foo: gl_rsurf.c, 1293)
[2] 8.9% 19.2% 173920781 3203 S_Update_ (foo: snd_dma.c, 848)
[3] 8.2% 27.4% 145950460 338787 R_RenderBrushPoly (foo: gl_rsurf.c, 641)
[4] 5.9% 33.3% 97798122 1975976 __sin (libm.so: sin.c, 194)
[5] 4.1% 37.4% 82310479 240 GL_LoadTexture (foo: gl_draw.c, 990)
[6] 3.4% 40.8% 50786176 1204269 __glMgrim_Begin (libGLcore.so: mgras_prim.c, 221)
[7] 3.2% 44.0% 58099072 16797 R_DrawAliasModel (foo: gl_rmain.c, 232)
[8] 3.1% 47.1% 53832546 290970 R_RecursiveWorldNode (foo: gl_rsurf.c, 894)
[9] 3.1% 50.2% 43855299 437627 R_CullBox (foo: gl_rlight.c, 313; compiled in gl_rmain.c)
[10] 2.8% 53.0% 44666700 30981 EmitWaterPolys (foo: gl_warp.c, 187)
![Page 72: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/72.jpg)
Profiling: Fixing the Hot Spot
What do you look for?
• Common sub-expressions
• Loop invariant code
• Repeated pointer de-referencing
• Global variables and cache misses
• “Thin” loops
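Two of the items above, loop-invariant code and common sub-expressions, can be illustrated with a minimal sketch. The names `g_scale`, `apply_naive` and `apply_hoisted` are hypothetical and not from the course. Because `v` is a `float *`, the compiler generally has to assume a store to `v[i]` might change the global `g_scale`, so it may not hoist the product on its own.

```c
#include <assert.h>
#include <stddef.h>

float g_scale = 2.0f;   /* hypothetical global read inside the hot loop */

/* Straightforward version: the product g_scale * bias appears twice per
 * iteration and depends on a global that is re-read each time. */
void apply_naive(float *v, size_t n, float bias) {
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * (g_scale * bias) + (g_scale * bias);
}

/* Hand-tuned version: the common sub-expression is computed once,
 * outside the loop, and kept in a local variable. */
void apply_hoisted(float *v, size_t n, float bias) {
    assert(v != NULL || n == 0);
    const float k = g_scale * bias;   /* loop invariant, hoisted */
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * k + k;
}
```

Both functions compute the same values when `v` does not overlap `g_scale`; only the amount of work per iteration differs.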
![Page 73: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/73.jpg)
Profiling Example
    // Code the old way
    19: void old_loop() {
    20:   sum = 0;
    21:   for (i = 0; i < NUM; i++)
    22:     sum += x[i];
    23:   printf("sum = %f\n", sum);
    24: }

    // Code the new way
    27: void new_loop() {
    28:   sum = 0;
    29:   ii = NUM%4;
    30:   for (i = 0; i < ii; i++)
    31:     sum += x[i];
    32:   for (i = ii; i < NUM; i += 4) {
    33:     sum += x[i];
    34:     sum += x[i+1];
    35:     sum += x[i+2];
    36:     sum += x[i+3];
    37:   }
    38:   printf("sum = %f\n", sum);
    39: }
![Page 74: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/74.jpg)
Profiling Example: Profile Results
    Run with old_loop:
         cycles instructions calls function (dso: file, line)
    [1]    6160         6168     1 old_loop (blahdso.so: blahdso.c, 19)
    [2]    4869         8714     1 setup_data (blahdso.so: blahdso.c, 11)

    Run with new_loop:
         cycles instructions calls function (dso: file, line)
    [1]    4869         8714     1 setup_data (blahdso.so: blahdso.c, 11)
    [2]    4625         4891     1 new_loop (blahdso.so: blahdso.c, 27)
![Page 75: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/75.jpg)
Profile Example: Line Analysis
Line list, in descending order by time
    ------------------------------------------------------
    cycles invocations function (dso: file, line)
      4096        1024 old_loop  sum += x[i];
      2061        1024 old_loop  for (i = 0;i < NUM; i++)
       978         256 new_loop  sum += x[i+3];
       968         256 new_loop  sum += x[i+2];
       968         256 new_loop  sum += x[i+1];
       968         256 new_loop  sum += x[i];
       733         256 new_loop  for (i = ii; i < NUM; i +=4)
         7           1 new_loop  ii = NUM%4;
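A rough reading of the line list, assuming NUM = 1024 (inferred from the invocation counts, not stated on the slide):

$$
\text{old\_loop: } \frac{4096 + 2061}{1024} = \frac{6157}{1024} \approx 6.0 \text{ cycles per element}
$$

$$
\text{new\_loop: } \frac{978 + 3 \cdot 968 + 733}{4 \cdot 256} = \frac{4615}{1024} \approx 4.5 \text{ cycles per element}
$$

$$
\text{loop control per element: } \frac{2061}{1024} \approx 2.0 \quad \text{vs.} \quad \frac{733}{1024} \approx 0.7
$$

On these numbers most of the saving comes from loop control (2061 down to 733 cycles), while the additions themselves barely change (4096 down to 3882 cycles).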
![Page 76: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/76.jpg)
Profile and Performance Analysis
Profile Example: Visual C++/Intel
    Function  Percent of  Hit
    Time(s)   Run Time    Count  Function
    ----------------------------------------
    0.410     39.4        1      _old_loop
    0.249     23.9        1      _new_loop
![Page 77: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/77.jpg)
Statistical vs. Basic Block Profile
    void ijk_loop() {   // loops kji and ikj as well
      sum = 0;
      for (i=0;i<YNUM;i++)
        for (j=0;j<YNUM;j++)
          for (k=0;k<YNUM;k++)
            sum += y[i][j][k];
      printf("sum = %f\n",sum);
    }
![Page 78: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/78.jpg)
Basic Block vs. Statistical Sampling
Basic Block:
        Percent   cycles     inst      calls  function
    [1] 25.3%     51141434   37101028  1      ijk_loop  foo.c, 47
    [2] 25.3%     51141434   37101028  1      kji_loop  foo.c, 57
    [3] 25.3%     51141434   37101028  1      ikj_loop  foo.c, 66
Statistical Sampling:
        Percent   Samples    Procedure   Function
    [1] 38.0%     2700       kji_loop    foo.c, 57
    [2] 23.9%     1700       setup_data  foo.c, 15
    [3] 19.7%     1400       ikj_loop    foo.c, 66
    [4] 18.3%     1300       ijk_loop    foo.c, 47
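One plausible reading of these tables is that the three loop orders execute the same instructions but touch memory with different strides, which only a sampling profile can see. The sketch below makes the stride explicit. The array size and the names `fill_ones`, `sum_ijk`, `sum_kji` and `stride_elems` are assumptions for illustration, not the course's code.

```c
#include <assert.h>
#include <stddef.h>

#define YNUM 8   /* small hypothetical size, for illustration only */

static float y[YNUM][YNUM][YNUM];

void fill_ones(void) {
    for (int i = 0; i < YNUM; i++)
        for (int j = 0; j < YNUM; j++)
            for (int k = 0; k < YNUM; k++)
                y[i][j][k] = 1.0f;
}

/* Innermost index k is the last subscript: consecutive iterations touch
 * adjacent floats (stride 1). */
float sum_ijk(void) {
    float sum = 0.0f;
    for (int i = 0; i < YNUM; i++)
        for (int j = 0; j < YNUM; j++)
            for (int k = 0; k < YNUM; k++)
                sum += y[i][j][k];
    return sum;
}

/* Innermost index i is the first subscript: consecutive iterations are
 * YNUM*YNUM floats apart, so each access may land on a new cache line. */
float sum_kji(void) {
    float sum = 0.0f;
    for (int k = 0; k < YNUM; k++)
        for (int j = 0; j < YNUM; j++)
            for (int i = 0; i < YNUM; i++)
                sum += y[i][j][k];
    return sum;
}

/* Distance in elements between neighbours along axis 0 (i), 1 (j) or 2 (k). */
size_t stride_elems(int axis) {
    assert(axis >= 0 && axis <= 2);
    if (axis == 0) return sizeof y[0] / sizeof y[0][0][0];
    if (axis == 1) return sizeof y[0][0] / sizeof y[0][0][0];
    return 1;
}
```

The instruction count is identical for both sums; any run-time difference comes from the memory system.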
![Page 79: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/79.jpg)
Now We Know About Hot Spots...
What do we do next?
• Use compilers to fine-tune code
• Use knowledge of language to optimize
• Hand-tune code
Profiling is fun, hard, and iterative, and it can be highly effective
![Page 80: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/80.jpg)
Compiler and Language Issues
Keith Cok, SGI
Bob Kuehne, SGI
![Page 81: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/81.jpg)
Compiler and Language Issues
Compiler Optimizations:
• Are a compromise between speed and memory space vs. time to compile and link
• An iterative process to discover what does and doesn’t work
• Important to keep at it
![Page 82: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/82.jpg)
Compiler Issues: Trade-Offs
• Trade-offs:
– Round-off vs. needed precision
– Inter-procedural analysis vs. link time
– Pointer aliasing vs. coding constraints
– Optimizing for processor architectures vs. work of multiple binaries (support, test)
• Explore compilers other than your first choice
• Different source code may need different flags
![Page 83: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/83.jpg)
Compiler and Language Issues
Comments on 32 vs. 64 bit code
• Benefits of 64 bit code:
– Increased address space
– Higher precision
• Downsides of 64 bit code:
– Application memory footprint
– Need to port, which can be difficult!
• Performance issues
![Page 84: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/84.jpg)
Language Issues
• Data Management
• Unrolling loops
• Arrays
• Temporary variables
• Pointer aliasing
![Page 85: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/85.jpg)
Language Issues: Data Management
Manipulate data structures efficiently, since graphics IS data
    // Before: key stored after the large member
    typedef struct str {
        struct str *next;
        struct str *prev;
        large_type  foo;
        int         key;
    } str;

    // After: key stored next to the link pointers
    typedef struct str {
        struct str *next;
        struct str *prev;
        int         key;
        large_type  foo;
    } str;
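A minimal sketch of why the member order can matter, assuming a list search that reads only the link and the key. The 256-byte `large_type` and the names `node_far`, `node_near`, `link_after` and `find_key` are hypothetical. With the key next to the pointers, the fields a search touches sit within a few bytes of each other and are likely to share a cache line.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { char payload[256]; } large_type;  /* hypothetical bulky member */

/* Key stored after the bulky member: far from the link pointers. */
typedef struct node_far {
    struct node_far *next;
    struct node_far *prev;
    large_type       foo;
    int              key;
} node_far;

/* Key stored right after the link pointers. */
typedef struct node_near {
    struct node_near *next;
    struct node_near *prev;
    int               key;
    large_type        foo;
} node_near;

/* Insert b directly after a in a doubly linked list. */
void link_after(node_near *a, node_near *b) {
    assert(a != NULL && b != NULL);
    b->prev = a;
    b->next = a->next;
    if (a->next != NULL)
        a->next->prev = b;
    a->next = b;
}

/* Walk the list; only next and key are read for each node. */
const node_near *find_key(const node_near *head, int key) {
    for (const node_near *n = head; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}
```

`offsetof(node_near, key)` is two pointer sizes, while `offsetof(node_far, key)` is at least `sizeof(large_type)` further away.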
![Page 86: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/86.jpg)
Language Issues: Data Management
Pack data efficiently
    typedef struct foo {
        char  aa;   // 8 bits + 24 pad
        float bb;   // 32 bits
        char  cc;   // 8 bits + 24 pad
        float dd;   // 32 bits
        char  ee;   // 8 bits + 24 pad
    } foo_t;        // 160 bits

    typedef struct foo_better {
        float bb;   // 32 bits
        char  aa;   // 8 bits
        char  cc;   // 8 bits
        char  ee;   // 8 bits + 8 pad
        float dd;   // 32 bits
    } foo_better_t; // 96 bits
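The padding can be checked with `sizeof` rather than counted by hand. This is a sketch only: exact sizes depend on the ABI, and the helper `padding_bytes` is a hypothetical name. On a typical platform with 4-byte aligned floats the two structs come out at 20 and 12 bytes, matching the 160 and 96 bits on the slide.

```c
#include <assert.h>
#include <stddef.h>

typedef struct foo {
    char  aa;
    float bb;
    char  cc;
    float dd;
    char  ee;
} foo_t;

typedef struct foo_better {
    float bb;
    char  aa;
    char  cc;
    char  ee;
    float dd;
} foo_better_t;

/* Bytes lost to padding: struct size minus the sum of the member sizes
 * (two floats and three chars in both layouts). */
size_t padding_bytes(size_t struct_size) {
    const size_t payload = 2 * sizeof(float) + 3 * sizeof(char);
    assert(struct_size >= payload);
    return struct_size - payload;
}
```

Calling `padding_bytes(sizeof(foo_t))` and `padding_bytes(sizeof(foo_better_t))` shows how much of each layout is padding on the platform at hand.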
![Page 87: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/87.jpg)
Language Issues: Data Management
Examine your arrays and note their caching behavior
• Break up large arrays into smaller sub-arrays for better memory access patterns
• Understand the implications of data layout and cache behavior
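One common way to work on a large array in smaller sub-arrays is tiling (also called blocking). The sketch below transposes a matrix tile by tile so that reads and writes both stay within a small region at a time. The tile size and the name `transpose_tiled` are assumptions for illustration; a useful tile size depends on the cache of the target machine.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4   /* hypothetical tile edge; tune to the cache in practice */

/* Transpose an n x n row-major matrix from src into dst, one
 * TILE x TILE sub-array at a time. src and dst must not overlap. */
void transpose_tiled(const float *src, float *dst, size_t n) {
    assert(src != NULL && dst != NULL && src != dst);
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t jj = 0; jj < n; jj += TILE) {
            /* Clamp the tile at the matrix edge. */
            size_t i_end = ii + TILE < n ? ii + TILE : n;
            size_t j_end = jj + TILE < n ? jj + TILE : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```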
![Page 88: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/88.jpg)
Language Issues: Loop Unrolling
Profiling Example
    // Code the old way
    19: void old_loop() {
    20:   sum = 0;
    21:   for (i = 0; i < NUM; i++)
    22:     sum += x[i];
    23:   printf("sum = %f\n", sum);
    24: }

    // Code the new way
    27: void new_loop() {
    28:   sum = 0;
    29:   ii = NUM%4;
    30:   for (i = 0; i < ii; i++)
    31:     sum += x[i];
    32:   for (i = ii; i < NUM; i += 4) {
    33:     sum += x[i];
    34:     sum += x[i+1];
    35:     sum += x[i+2];
    36:     sum += x[i+3];
    37:   }
    38:   printf("sum = %f\n", sum);
    39: }
![Page 89: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/89.jpg)
Language Issues: Loop Unrolling
Profile Example: Line Analysis
Line list, in descending order by time
    ------------------------------------------------------
    cycles invocations function
      4096        1024 old_loop  sum += x[i];
      2061        1024 old_loop  for (i = 0;i < NUM; i++)
       978         256 new_loop  sum += x[i+3];
       968         256 new_loop  sum += x[i+2];
       968         256 new_loop  sum += x[i+1];
       968         256 new_loop  sum += x[i];
       733         256 new_loop  for (i = ii; i < NUM; i +=4)
         7           1 new_loop  ii = NUM%4;
![Page 90: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/90.jpg)
Language Issues: Loop Unrolling
Issues with loop unrolling:
• Code complexity
• Clutter
• Compiler may or may not do this
• Flags may affect compiler time spent optimizing
Only “thin” loops gain performance
Use application knowledge to take advantage of loop unrolling
![Page 91: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/91.jpg)
Language Issues: Local temporary variables
Use local temporary variables to avoid repeatedly de-referencing a pointer structure
Instead of:
    x = global_ptr->record_str->a;
    y = global_ptr->record_str->b;
Use:
    tmp = global_ptr->record_str;
    x = tmp->a;
    y = tmp->b;
![Page 92: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/92.jpg)
Language Issues: Using tmp vars for global vars within a function
    // With a local temporary: accumulate in tmp, store once per element
    void tr_point(FLOAT *old_pt, FLOAT *m, FLOAT *new_pt)
    {
        FLOAT *c1, *c2, *c3, *c4, *op, *np, tmp;
        int j;

        c1 = m; c2 = m+4; c3 = m+8; c4 = m+12;
        for (j = 0, np = new_pt; j < 4; j++) {
            op = old_pt;
            tmp  = *op++ * *c1++;
            tmp += *op++ * *c2++;
            tmp += *op++ * *c3++;
            *np++ = tmp + (*op * *c4++);
        }
    }

    // Without the temporary: the same loop accumulates through np
    for (j = 0, np = new_pt; j < 4; j++) {
        op = old_pt;
        *np  = *op++ * *c1++;
        *np += *op++ * *c2++;
        *np += *op++ * *c3++;
        *np++ += *op++ * *c4++;
    }
![Page 93: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/93.jpg)
Language Issues: Pointer Aliasing
• Pointers are aliases when they point to potentially overlapping regions of memory
• If regions never overlap, the compiler may optimize for this case; in general, though, it cannot prove this
• Compiler can't tell when pointers are aliased
• Use the restrict keyword or a compiler option
![Page 94: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/94.jpg)
Language Issues: Pointer Aliasing
[Figure: "in" and "out" buffers shown for two cases, unaliased pointers and aliased pointers]
With unaliased pointers, compilers may use:
– Parallelism
– Pipelining
![Page 95: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/95.jpg)
Language Issues: Pointer Aliasing
    void process_data(float * restrict in,
                      float * restrict out,
                      float gain) {
        int i;
        for (i = 0; i < NSAMPS; i++) {
            out[i] = in[i] * gain;
        }
    }
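A small sketch of what aliasing changes in practice. The functions `add_first` and `add_first_restrict` are hypothetical examples, not part of the course code. Without `restrict`, a store to `out[i]` may change `in[0]`, so the value has to be re-read on every iteration. With `restrict`, the caller promises the buffers do not overlap and the compiler is free to keep `in[0]` in a register.

```c
#include <assert.h>
#include <stddef.h>

/* No restrict: out and in may overlap, so in[0] is re-read each time. */
void add_first(float *out, const float *in, size_t n) {
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] += in[0];
}

/* restrict: the caller guarantees no overlap. Passing overlapping
 * buffers to this version is undefined behavior. */
void add_first_restrict(float * restrict out, const float * restrict in,
                        size_t n) {
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] += in[0];
}
```

Calling `add_first(a, a, 3)` on `{1, 1, 1}` gives `{2, 3, 3}`, because the first store changes the value that later iterations add. That is the case the compiler must allow for unless it is told the pointers are unaliased.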
![Page 96: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/96.jpg)
C++: General Issues
• Language features
– RTTI, safe casts, etc.
• Use const, mutable, volatile, & inline
– hints to compilers
• Object construction
– arrays, default constructors, arguments, etc.
• Method invocation issues
– operators, overloads, conversion, etc.
![Page 97: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/97.jpg)
C++: Virtual Functions
• Good - used to invoke child method when managing base-class handles
• Expensive - incur an additional pointer de-reference
– one, find VTBL; two, find method; then invoke
– bad for caching
• Use when necessary, but not for common objects
– Good for ‘large’ methods that do lots of work
– Bad for ‘small’ methods, like a vertex query
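The two lookups can be made visible with a hand-built table in C. This is an analogy only, not how any particular C++ compiler lays out its objects, and all names (`shape`, `shape_vtbl`, `vertex_count_virtual`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct shape shape;

/* Hand-built "virtual table": one function pointer per method. */
typedef struct {
    int (*vertex_count)(const shape *self);
} shape_vtbl;

struct shape {
    const shape_vtbl *vtbl;   /* each object carries a pointer to its table */
};

int triangle_vertex_count(const shape *self) { (void)self; return 3; }
int quad_vertex_count(const shape *self)     { (void)self; return 4; }

const shape_vtbl triangle_vtbl = { triangle_vertex_count };
const shape_vtbl quad_vtbl     = { quad_vertex_count };

/* Virtual-style call: load the table pointer from the object, load the
 * method pointer from the table, then make an indirect call. */
int vertex_count_virtual(const shape *s) {
    assert(s != NULL && s->vtbl != NULL);
    return s->vtbl->vertex_count(s);
}
```

For a method this small the two loads and the indirect call can outweigh the body itself, and the indirection usually prevents inlining.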
![Page 98: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/98.jpg)
C++: Exceptions & Templates
Exceptions
• Great for error checking
• Performance penalty
– Additional stack information required
Templates
• Great for code re-use
• Memory penalty
– Across libraries, across object files
![Page 99: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/99.jpg)
Code & Language Issues: The End
Balance
• Know your compiler
– Features & performance
• Know your language
– Features & performance
• Know your app
– Features & performance
![Page 100: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/100.jpg)
Idioms and Application Architectures
Alan Commike, SGI
![Page 101: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/101.jpg)
Starting Quote
The best-tuned, most efficient bubble sort is still a bubble sort. Additional tweaking won't improve performance.
Change The Algorithm! - Commike ‘99
![Page 102: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/102.jpg)
Introduction
To write an efficient graphics application, one must:
• Understand the platform
• Use graphics efficiently
• Write good code
Use efficient application structures and algorithms
![Page 103: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/103.jpg)
Outline
• Outline
• Background
• Culling
• Level of Detail (LOD) management
• Application architectures
![Page 104: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/104.jpg)
Application Architectures: Rendering Path
• Application work, culling, LOD, drawing
• Pipelined rendering path
    App  Cull  LOD  Draw
![Page 105: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/105.jpg)
Application Architectures: Rendering Path
• Application work, culling, LOD, drawing
• Pipelined rendering path
    App  Cull  LOD  Draw
         App   Cull LOD  Draw
![Page 106: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/106.jpg)
Application Architectures: Rendering Path
• Application work, culling, LOD, drawing
• Pipelined rendering path
            T0    T1    T2    T3    T4    T5
    Frame0  App   Cull  LOD   Draw
    Frame1        App   Cull  LOD   Draw
    Frame2              App   Cull  LOD   Draw
![Page 107: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/107.jpg)
Application Architectures: Target Frame Rate
A target frame rate attempts to bound the maximum render time
• Control Culling and LOD aggressiveness
• Maintain a constant frame rate
• Achieve an acceptable interactive frame rate
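One way to control culling and LOD aggressiveness is a simple feedback rule driven by the measured frame time. The sketch below is an assumption for illustration, not the course's method. The name `update_lod_bias`, the step size and the slack factor are all hypothetical.

```c
#include <assert.h>

/* lod_bias runs from 0 (full detail) to 1 (coarsest detail).
 * If the last frame missed the target, ask for coarser detail.
 * If it finished well under budget, restore some detail. */
float update_lod_bias(float lod_bias, float frame_ms, float target_ms) {
    assert(target_ms > 0.0f);
    const float step  = 0.125f;  /* hypothetical adjustment per frame */
    const float slack = 0.75f;   /* only restore detail below 75% of budget */

    if (frame_ms > target_ms)
        lod_bias += step;
    else if (frame_ms < slack * target_ms)
        lod_bias -= step;

    /* Clamp to the valid range. */
    if (lod_bias < 0.0f) lod_bias = 0.0f;
    if (lod_bias > 1.0f) lod_bias = 1.0f;
    return lod_bias;
}
```

The dead band between 75% and 100% of the budget keeps the bias from oscillating when the frame time hovers near the target.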
![Page 108: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/108.jpg)
Graphics Idioms
• Culling
– Removing geometry that isn't visible
• Level of Detail Management
– Reducing geometric complexity
![Page 109: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/109.jpg)
Culling
Don’t draw what you can’t see
![Page 110: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/110.jpg)
Culling: Culling Types
Use one. Use all. Pipeline them together.
• View Frustum Culling
• Backface Culling
• Contribution Culling
• Occlusion Culling
![Page 111: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/111.jpg)
Culling: Bounding Volumes
Test against a bounding volume, not individual primitives
• Can be bounding sphere, box, oriented box, or any enclosing volume
• Hierarchical bounding volumes to reduce cull time
• Spheres are fast, boxes are more accurate
– Use a combination of both
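A sketch of why spheres are fast: testing a bounding sphere against a plane takes one dot product and a comparison. The types and the name `classify_sphere` are hypothetical. The planes are assumed to have unit-length normals pointing into the volume being tested against.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Plane n.p + d = 0, unit-length normal n pointing into the volume. */
typedef struct { vec3 n; float d; } plane;

typedef enum { CULL_OUT, CULL_PARTIAL, CULL_IN } cull_result;

/* Classify a bounding sphere against a convex volume bounded by planes. */
cull_result classify_sphere(const plane *planes, size_t count,
                            vec3 center, float radius) {
    assert(planes != NULL && radius >= 0.0f);
    cull_result result = CULL_IN;
    for (size_t i = 0; i < count; i++) {
        /* Signed distance from the sphere center to the plane. */
        float dist = planes[i].n.x * center.x + planes[i].n.y * center.y
                   + planes[i].n.z * center.z + planes[i].d;
        if (dist < -radius)
            return CULL_OUT;         /* entirely behind one plane */
        if (dist < radius)
            result = CULL_PARTIAL;   /* straddles this plane */
    }
    return result;
}
```

A box test against the same planes is tighter but needs more work per plane, which is one reason to try the sphere first and fall back to the box only for partial results.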
![Page 112: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/112.jpg)
Culling: View Frustum
Graphics pipeline clips data that falls outside the View Frustum
If it will be clipped, don’t bother drawing it
![Page 113: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/113.jpg)
Culling: View Frustum Usefulness
• Improves geometry rate
– Culled vertices are not transformed, lit, or clipped
• Improves host download rate
– Less data moved from memory into graphics
• Does not change fill rate
– Triangles outside the View Frustum would not have been drawn anyway
![Page 114: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/114.jpg)
Culling: View Frustum Implementation
• Transform vertices to clip coordinates (in OpenGL multiply by Model-View and Projection matrix)
• Check each vertex against View Frustum
• Geometry is either In, Out, or Partial
• Render In and Partial
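The steps above can be sketched with per-vertex outcodes, assuming the vertices (for example the corners of a bounding box) have already been transformed to OpenGL clip coordinates, where a point is inside when -w <= x, y, z <= w. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } clip_vertex;

typedef enum { FRUSTUM_OUT, FRUSTUM_PARTIAL, FRUSTUM_IN } frustum_result;

/* One bit for each clip plane the vertex lies outside of. */
static unsigned outcode(clip_vertex v) {
    unsigned code = 0;
    if (v.x < -v.w) code |= 1u;
    if (v.x >  v.w) code |= 2u;
    if (v.y < -v.w) code |= 4u;
    if (v.y >  v.w) code |= 8u;
    if (v.z < -v.w) code |= 16u;
    if (v.z >  v.w) code |= 32u;
    return code;
}

/* Out:     every vertex is outside the same plane.
 * In:      no vertex is outside any plane.
 * Partial: anything else (conservative: may include geometry that a
 *          more exact test would reject). */
frustum_result classify_vertices(const clip_vertex *v, size_t count) {
    assert(v != NULL && count > 0);
    unsigned any = 0u, all = ~0u;
    for (size_t i = 0; i < count; i++) {
        unsigned code = outcode(v[i]);
        any |= code;
        all &= code;
    }
    if (all != 0u) return FRUSTUM_OUT;
    if (any == 0u) return FRUSTUM_IN;
    return FRUSTUM_PARTIAL;
}
```

Geometry classified as Out is skipped; In and Partial are sent on to be rendered.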
![Page 115: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/115.jpg)
Culling: Skip the Clip
In software transform systems (GTX-RD), skip the clip
• Partial and In geometry classified
– Pipe renders Partial as usual
– Pipe can render In without a View Frustum clip
• Might be a hint to the renderer
• Can improve geometry rates if not already fill-limited
![Page 116: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/116.jpg)
Culling: Backface
Only half of any closed polyhedron is visible at any one time
Don’t render what you can’t see
![Page 117: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/117.jpg)
Culling: Backface Usefulness
• Improves fill rate when using a native implementation
– Primitives are transformed and lit before culling
• Helps both geometry and fill with an application specific algorithm
– More computationally expensive
– Balance graphics and CPU work
• This may not work well when you can enter closed geometry or need two-sided lighting
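An application-side backface test can be sketched as a single dot product per face, assuming each face stores an outward normal. The names below are hypothetical, and faces seen exactly edge-on are treated as back-facing.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3f;

static float dot3(vec3f a, vec3f b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* A face is back-facing when its outward normal points away from the
 * eye, that is when normal . (point_on_face - eye) >= 0.
 * Returns 1 for back-facing (cull), 0 for front-facing (draw). */
int is_backfacing(vec3f normal, vec3f point_on_face, vec3f eye) {
    assert(dot3(normal, normal) > 0.0f);   /* normal must be non-zero */
    vec3f view = { point_on_face.x - eye.x,
                   point_on_face.y - eye.y,
                   point_on_face.z - eye.z };
    return dot3(normal, view) >= 0.0f;
}
```

Doing this on the CPU saves transform and lighting work for the culled faces, at the cost of one dot product per face, which is the balance between graphics and CPU work mentioned above.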
![Page 118: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/118.jpg)
Lava. Hot!
![Page 119: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/119.jpg)
Random Quote
Try not. Do, or do not. There is no try.
- Yoda ‘80
![Page 120: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/120.jpg)
Culling: Contribution
If it’s too small to make a difference, don’t render it
![Page 121: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/121.jpg)
Culling: Contribution Usefulness
• Improves geometry rate
– Culled vertices are not transformed, lit, or clipped
• Improves host download rate
– Less data moved from memory into graphics
• Does not change fill rate
– Screen space projection already minimal
– Removes few pixels from rasterization stage
![Page 122: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/122.jpg)
Culling: Contribution Implementation
Don’t render items that fall below a size threshold
• Screen space size of bounding volume
• A less computationally expensive approach
– Distance to object combined with some notion of global object size
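A sketch of the cheaper approach, using the bounding-sphere radius as the notion of object size. The parameter `proj_scale` is an assumption: it stands for viewport height in pixels divided by twice the tangent of half the vertical field of view, computed once per view by the caller. The function names are hypothetical.

```c
#include <assert.h>

/* Approximate on-screen radius in pixels of a bounding sphere at the
 * given distance from the eye. */
float projected_radius_px(float radius, float distance, float proj_scale) {
    assert(distance > 0.0f);
    return radius * proj_scale / distance;
}

/* Returns 1 when the object is too small on screen to be worth drawing. */
int contribution_cull(float radius, float distance, float proj_scale,
                      float min_radius_px) {
    if (distance <= radius)
        return 0;   /* eye is inside or touching the volume: always keep */
    return projected_radius_px(radius, distance, proj_scale) < min_radius_px;
}
```

The threshold `min_radius_px` is a tuning value; raising it culls more aggressively and can be tied to the target frame rate.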
![Page 123: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/123.jpg)
Culling: Occlusion
If you can’t see it, don’t draw it
[Figure: front and side views]
![Page 124: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/124.jpg)
Culling: Occlusion Goals
Find the optimal set of occluders that will enable drawing the minimal number of occludees
• Occluders: the visible geometry that hides other geometry
• Occludees: the geometry that is hidden and therefore not visible
• Use general purpose occlusion culling algorithms
• Use application specific spatial knowledge if possible
![Page 125: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/125.jpg)
Culling: Occlusion Culling Usefulness
• Can improve both transform-limited and fill-limited applications
• Computationally expensive
– Beware of time trade-offs
• Possible hardware support
![Page 126: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/126.jpg)
Culling: General Occlusion Culling
• Used for arbitrary scenes
• Can improve both transform-limited and fill-limited applications
• Computationally expensive for arbitrary scenes
![Page 127: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/127.jpg)
Culling: Occlusion Spatial Partitioning
“Cell and Portal” Culling
• Spatial organization leads to Cells and Portals
• Games that move from room to room
• Architectural walkthroughs
![Page 128: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/128.jpg)
LOD: Overview
After culling, need to draw what is left
• Still too much geometry:
– Use multiple Levels of Detail, i.e. multi-resolution objects
• Match geometric complexity to visible on-screen space coverage
• Reduce geometric complexity to maintain target frame rate
![Page 129: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/129.jpg)
LOD: Issues
• Generating LODs:
– Height Fields vs 3D objects
– View-Dependent: nice, but compute intensive
– View-Independent: fast, memory intensive
• Need to decide which LOD level to use
– Not trivial!
• Need smooth transitions between levels
– Geomorphs
![Page 130: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/130.jpg)
LOD: Height Fields
• Generally thought of as infinite terrain
• Specialized algorithms can be used
![Page 131: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/131.jpg)
LOD: 3D Models
• General purpose simplification algorithm
• Can use on height fields also
• Some recent real-time view-dependent algorithms
• Also used for compression
[Figure: the same model at 1024, 256, 64, and 16 triangles]
![Page 132: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/132.jpg)
LOD: When to switch LOD levels
The ability to generate LOD models is not sufficient on its own
• Need to know when to use which LOD level
– Single constant hard metric: distance from eye
– Multiple heuristics: cost, benefit, rankings
• Can bias LODs to ensure frame rate targets are reached
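The single-metric case, with a bias term for frame rate control, might look like the following C sketch. The switch-distance table, the `bias` parameter, and the function name are assumptions for illustration; level 0 is taken to be the most detailed model.

```c
#include <assert.h>
#include <stddef.h>

/* switch_dist[i] is the eye distance at which level i+1 replaces level i.
 * The table must be in ascending order. A bias above 1.0 makes every object
 * look farther away, so coarser levels are chosen sooner; the application
 * could raise it when frames run over budget (hypothetical policy). */
static size_t select_lod(double distance, const double *switch_dist,
                         size_t n_switch, double bias)
{
    assert(bias > 0.0);
    size_t level = 0;
    while (level < n_switch && distance * bias >= switch_dist[level])
        ++level;
    return level;   /* 0 .. n_switch, i.e. n_switch + 1 possible levels */
}
```

With switch distances {10, 50, 200}, an object at distance 30 gets level 1 with a bias of 1.0 and level 2 with a bias of 2.0.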
![Page 133: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/133.jpg)
LOD: Level determination
• Determine system rendering characteristics
• Determine cost of rendering each object
• Render objects with highest benefit while remaining under the target frame rate
Level determination can be time consuming!
“take the time to time the time taken to reduce the rendering time”
![Page 134: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/134.jpg)
Going, and going, and going...
![Page 135: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/135.jpg)
LOD: Determining cost of rendering
Cost is affected by many factors
• Graphics hardware: published benchmarks, startup tests
• Number of vertices: primarily a function of LOD algorithm
• Rendering Quality: lighting, shading, wire frame, anti-aliasing, etc.
• Global Factors: total texture memory, dirty internal state
![Page 136: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/136.jpg)
LOD: Benefit Function
Cost alone is not good enough, need benefit also
• Rendered size of object
• Error tolerance between LOD level and reference model
• Importance in scene
• Frame-to-frame coherency
![Page 137: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/137.jpg)
LOD: The Optimal LODs
For all Objects, at each LOD Level, rendered with each RenderType
Maximize the Benefit function: Benefit(Object, Level, RenderType)
Subject to: Cost(Object, Level, RenderType) <= TargetFrameTime (the time budget implied by the target frame rate)
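One way to read the statement above is as a knapsack-style constrained optimization. The formulation below is a sketch under two assumptions that the slide does not spell out: cost is measured in rendering time and summed over everything drawn, and the limit is the frame time, the reciprocal of the target frame rate. The symbols $S$, $O$, $L$, $R$, $T_{\text{frame}}$, and $f_{\text{target}}$ are introduced here for illustration.

```latex
\max_{S} \; \sum_{(O,L,R) \in S} \mathrm{Benefit}(O, L, R)
\quad \text{subject to} \quad
\sum_{(O,L,R) \in S} \mathrm{Cost}(O, L, R) \;\le\; T_{\text{frame}} = \frac{1}{f_{\text{target}}}
```

Here $S$ is a selection that contains at most one (Level, RenderType) pair per Object.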
![Page 138: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/138.jpg)
LOD: Optimal Optimizations
• Simulated Annealing
• Monte Carlo Simulations
• Simplex Searches
![Page 139: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/139.jpg)
LOD: Optimal Optimizations
Dude, can you spare a few dozen CPUs?
![Page 140: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/140.jpg)
LOD: Trade-offs
Don’t have enough time to run full LOD optimization problem and render the scene
• Simplify cost and benefit functions
• Simplify optimization problem into a ranking of Benefit/Cost
• Use frame-to-frame coherency
• Be sure to consider time taken to calculate LODs
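The Benefit/Cost ranking can be sketched as a greedy pass in C: sort candidates by value per unit cost and accept them until the time budget is used up. The `Candidate` struct, its fields, and the millisecond budget are assumptions for illustration, and the greedy result is an approximation rather than the optimum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int    id;        /* application-defined object id (hypothetical) */
    double cost;      /* estimated render time in ms, must be > 0 */
    double benefit;   /* unitless estimate of visual contribution */
    int    selected;  /* set by pick_within_budget */
} Candidate;

/* Order by benefit/cost, highest first. Cross-multiplying avoids division:
 * x.b / x.c > y.b / y.c  is equivalent to  x.b * y.c > y.b * x.c  for c > 0. */
static int by_value_desc(const void *a, const void *b)
{
    const Candidate *x = a, *y = b;
    double lhs = x->benefit * y->cost;
    double rhs = y->benefit * x->cost;
    return (lhs < rhs) - (lhs > rhs);
}

/* Sorts the candidates in place, marks the ones that fit, and returns
 * the total cost of the selection. */
static double pick_within_budget(Candidate *c, size_t n, double budget_ms)
{
    double used = 0.0;
    qsort(c, n, sizeof *c, by_value_desc);
    for (size_t i = 0; i < n; ++i) {
        assert(c[i].cost > 0.0);
        c[i].selected = (used + c[i].cost <= budget_ms);
        if (c[i].selected)
            used += c[i].cost;
    }
    return used;
}
```

To exploit frame-to-frame coherency, an application might reuse the previous frame's ordering and re-sort only when costs or benefits change noticeably.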
![Page 141: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/141.jpg)
Application Architectures: Multi-Threading
• More stages give more time to cull or generate LODs
• Each stage adds latency
[Figure: pipelined stages over time slots T0 to T5]
Frame 0: App at T0, Cull at T1, LOD at T2, Draw at T3
Frame 1: App at T1, Cull at T2, LOD at T3, Draw at T4
Frame 2: App at T2, Cull at T3, LOD at T4, Draw at T5
![Page 142: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/142.jpg)
Application Architectures: Multi-Threading
• Hard part is data synchronization
• Watch out for memory bloat
![Page 143: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/143.jpg)
Application Architectures: Scene Graphs
A scene graph is the basic data structure holding the description of your scene
• Cull-able, sort-able, and can contain multi-resolution objects
• Hierarchical Bounding Volumes
• Statistics gathering and timing infrastructure
• For large scenes can do memory management and database paging
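To make the hierarchical bounding volume idea concrete, here is a small C sketch of a scene graph node and a culling traversal. Everything in it is an assumption for illustration: the `Node` and `Plane` types, bounding spheres as the volume type, and planes whose inside half-space satisfies n·p + d >= 0. A real traversal would call a draw hook at each surviving leaf; this one only counts them.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float nx, ny, nz, d; } Plane;   /* inside when n.p + d >= 0 */

typedef struct Node {
    float cx, cy, cz, radius;     /* bounding sphere enclosing the whole subtree */
    struct Node **children;       /* NULL for a leaf */
    size_t n_children;            /* 0 for a leaf (drawable geometry) */
} Node;

/* True when the node's sphere lies entirely on the outside of the plane. */
static int sphere_outside(const Node *n, const Plane *p)
{
    float dist = p->nx * n->cx + p->ny * n->cy + p->nz * n->cz + p->d;
    return dist < -n->radius;
}

/* Returns how many leaves survive culling against the given planes. */
static size_t cull_count(const Node *n, const Plane *planes, size_t n_planes)
{
    assert(n != NULL);
    for (size_t i = 0; i < n_planes; ++i)
        if (sphere_outside(n, &planes[i]))
            return 0;             /* one test rejects the whole subtree */
    if (n->n_children == 0)
        return 1;
    size_t visible = 0;
    for (size_t i = 0; i < n->n_children; ++i)
        visible += cull_count(n->children[i], planes, n_planes);
    return visible;
}
```

The payoff is in the early return: a rejected interior node removes all of its descendants without testing them individually.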
![Page 144: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/144.jpg)
Application Architectures: Trade-offs
• Quality
• Speed
• Memory
• Complexity
![Page 145: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/145.jpg)
Conclusion
Most importantly: think about balance!
![Page 146: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/146.jpg)
Performance Hints
Keith Cok, SGI
![Page 147: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/147.jpg)
Performance Hints: Pipeline Management
• Avoid round trips to graphics server
– Cache own state/attribute information
– Avoid pipeline queries (e.g., glGet*)
– Flush buffer efficiently (glFlush vs. glFinish)
• Reduce state changes. Sort by expense. For example, sort geometry by type (triangles, quads, etc.) and then by color
• Eliminate unused attributes
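Caching state on the application side can be as simple as a shadow copy that filters redundant calls, so the application never needs to ask the pipeline what the current value is. The C sketch below is one possible shape; `ColorCache`, `set_color_cached`, and the `forward` callback (a stand-in for whatever wraps the real API call) are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool          valid;     /* false until the first value is set */
    unsigned      color;     /* packed RGBA, layout is application-defined */
    unsigned long submits;   /* how many calls were actually forwarded */
} ColorCache;

/* Forwards the color only when it differs from the cached value.
 * forward may be NULL, in which case only the cache is updated. */
static void set_color_cached(ColorCache *cache, unsigned rgba,
                             void (*forward)(unsigned rgba))
{
    assert(cache != NULL);
    if (cache->valid && cache->color == rgba)
        return;              /* redundant change: skip the API call */
    cache->color = rgba;
    cache->valid = true;
    ++cache->submits;
    if (forward != NULL)
        forward(rgba);
}
```

Sorting geometry by state before drawing makes the filter more effective, because equal values then arrive back to back.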
![Page 148: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/148.jpg)
Performance Hints: Debugging
Detect graphic errors:
#ifdef DEBUG
#define GLEND() glEnd(); \
    { int err; \
      err = glGetError(); \
      if (err != GL_NO_ERROR) \
          printf("%s\n", gluErrorString(err)); \
      assert(err == GL_NO_ERROR); }
#else
#define GLEND() glEnd()
#endif
![Page 149: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/149.jpg)
Performance Hints: Geometry
• Maximize data between glBegin/glEnd
– Sort geometry by type (triangle, quad, etc.) and group them together
– Find best fit for length of glBegin/glEnd pair
• Use strip primitives (GL_TRIANGLE_STRIP...) to reduce geometry data sent to the pipeline
• Avoid GL_POLYGON. Use specific geometric primitives instead (GL_TRIANGLES, GL_QUADS, etc.)
• Use GL_FASTEST with glHint calls where possible
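The saving from strips is easy to quantify: N independent triangles need 3N vertices, while a single strip of N triangles needs N + 2. The C sketch below shows that arithmetic and builds strip order for one row of a regular grid. The function names and the row-major grid indexing are assumptions for illustration, and winding order is not addressed.

```c
#include <assert.h>
#include <stddef.h>

/* Vertices sent for n_tris triangles as independent triangles. */
static size_t verts_independent(size_t n_tris) { return 3 * n_tris; }

/* Vertices sent for n_tris triangles as one triangle strip. */
static size_t verts_strip(size_t n_tris) { return n_tris ? n_tris + 2 : 0; }

/* Writes strip order for the band between grid rows `row` and `row + 1`,
 * where each row has `cols` vertices stored row-major. idx must hold
 * 2 * cols entries. Returns the number of indices written. */
static size_t grid_row_strip(unsigned cols, unsigned row, unsigned *idx)
{
    assert(cols >= 2 && idx != NULL);
    size_t n = 0;
    for (unsigned c = 0; c < cols; ++c) {
        idx[n++] = row * cols + c;         /* vertex on this row */
        idx[n++] = (row + 1) * cols + c;   /* vertex on the next row */
    }
    return n;
}
```

For a row with 4 vertices per edge, the strip sends 8 vertices for 6 triangles, where independent triangles would send 18.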
![Page 150: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/150.jpg)
Performance Hints: Geometry
• Use flat display lists for static geometry. Deep display lists may induce unwanted memory thrashing
• Use API matrix operations instead of your own
• Use texture to simulate complex geometry
• Use vertex arrays. Test vertex, interleaved, and precompiled arrays
![Page 151: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/151.jpg)
Performance Hints: Geometry
• Pass one normal (not 3 or 4) per flat shaded polygon
• Use a data format suitable for quick transfer to the graphics subsystem
• Disable unneeded operations (alpha blending, depth, stencil, dithering, fog, etc.)
![Page 152: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/152.jpg)
Performance Hints: Lighting
• Reduce lighting requirements:
– Use as few lights as possible
– Use directional (infinite) lighting. Use glLightfv(GL_LIGHTn, GL_POSITION, {x,y,z,0}); (w = 0 selects a directional light)
– Use positional lights rather than spot lights
– Use one-sided lighting when possible (be aware of issues associated with normals)
– Don’t change material properties frequently
![Page 153: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/153.jpg)
Performance Hints: Lighting
• Use normalized normal vectors
– Supply unit length vectors
– Don’t enable GL_NORMALIZE
– Don’t scale using model-view matrix
• Pre-multiply geometry, if possible
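Pre-multiplying geometry means baking a static transform into the vertex data once, at load time, so the model-view matrix carries no scale at draw time. A minimal C sketch follows, assuming an affine 4x4 matrix in column-major order (the layout OpenGL uses) and tightly packed xyz floats; `bake_transform` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Applies the affine matrix m (column-major, bottom row assumed 0 0 0 1)
 * to n_verts packed xyz positions in place. */
static void bake_transform(const float m[16], float *xyz, size_t n_verts)
{
    assert(m != NULL && xyz != NULL);
    for (size_t i = 0; i < n_verts; ++i) {
        float *v = xyz + 3 * i;
        float x = v[0], y = v[1], z = v[2];
        v[0] = m[0] * x + m[4] * y + m[8]  * z + m[12];
        v[1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
        v[2] = m[2] * x + m[6] * y + m[10] * z + m[14];
    }
}
```

Positions can be baked this way directly. Normals need separate handling (the inverse transpose of the upper 3x3, followed by renormalization when the transform scales), which this sketch leaves out.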
![Page 154: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/154.jpg)
Performance Hints: Visuals/Pixel Formats
• Pick the correct visual. Use hardware accelerated visuals
• Structure windows and contexts to maximize performance (app may block after context swaps)
• Put GUI elements in overlay planes to avoid unwanted graphics window refreshes
![Page 155: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/155.jpg)
Performance Hints: Buffers
• Turn off depth buffer when possible
• Use HW accelerated off-screen buffer for backing-store
• Use stencil buffer for interactive picking and quick re-render (see course notes for full algorithm)
• Use color/depth buffer data for interactive editing of complex scenes (see course notes for full algorithm)
![Page 156: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/156.jpg)
Performance Hints: Textures
• Be aware of texture sizes
– Reduce texture resolution
– Use texture LOD extension (OpenGL 1.2)
• Use texture objects. Create textures once
• Don’t swap textures frequently, if possible
– Mosaic multiple textures into one large texture
– Sort geometry by texture
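Mosaicking textures means each object's texture coordinates must be remapped into its sub-rectangle of the large texture. The C sketch below shows that remapping; `AtlasRect` and `atlas_uv` are hypothetical names, and the rectangle is assumed to be given in the large texture's normalized [0,1] coordinates.

```c
#include <assert.h>
#include <stddef.h>

/* Sub-rectangle occupied by one original texture inside the mosaic. */
typedef struct { float u0, v0, u1, v1; } AtlasRect;

/* Maps (u, v) in the original texture's [0,1] space to mosaic coordinates. */
static void atlas_uv(const AtlasRect *r, float u, float v,
                     float *out_u, float *out_v)
{
    assert(r != NULL && out_u != NULL && out_v != NULL);
    *out_u = r->u0 + u * (r->u1 - r->u0);
    *out_v = r->v0 + v * (r->v1 - r->v0);
}
```

The remap is typically done once when the geometry is loaded. It does not work with repeating texture coordinates outside [0,1], and filtering near the rectangle edges may pick up texels from neighboring sub-textures, so some padding between them is usually worth considering.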
![Page 157: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/157.jpg)
Performance Hints: Textures
• Use texture as an additional data lookup to simulate more complex data:
– Lighting, geometry, color, clipping, application-space data
• Use glTexSubImage to replace part of a texture rather than creating a whole new texture
• Avoid expensive texture filter modes
• Use texture lookup tables instead of multi-channel textures
![Page 158: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/158.jpg)
Conclusion
Know how your application works within the system
• Don’t let caches, latencies, bandwidths, etc. slow you down
• Know how fast you can go
• Identify system performance characteristics
• Work your compiler
• Get all you can out of the hardware
![Page 159: Developing Efficient Graphics Software](https://reader035.vdocuments.mx/reader035/viewer/2022062816/56814cd7550346895db9db63/html5/thumbnails/159.jpg)
Questions and Answers