![Page 1: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/1.jpg)
Enhancing and Optimizing the Render Cache
Bruce Walter
Cornell Program of Computer Graphics
George Drettakis
REVES/INRIA Sophia-Antipolis
Donald P. Greenberg
Cornell Program of Computer Graphics
![Page 2: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/2.jpg)
Background
Render Cache
• “Interactive Rendering using the Render Cache”, Rendering Workshop 1999
• Goal
- Interactive Rendering
- Exploit frame-to-frame coherence
- Decouple renderer from display framerate
- Reuse “expensive” rendering results
![Page 3: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/3.jpg)
Background
Goal: Interactive rendering
Ray tracing Path tracing
![Page 4: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/4.jpg)
Background
Modified Visual Feedback Loop
[Diagram: user, application, renderer, image, and display, with an asynchronous interface]
![Page 5: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/5.jpg)
Background
Reproject rendered points
Original view New view
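A minimal sketch of reprojecting one cached point into a new view, assuming a simple pinhole camera. The function name, the camera parameters (`cam_pos`, `right`, `up`, `forward`, `focal_px`), and the coordinate conventions are all illustrative assumptions, not taken from the Render Cache implementation.

```python
import math


def reproject(point, cam_pos, right, up, forward, focal_px, width, height):
    """Project a cached 3D world-space point into a new camera's image.

    Returns (px, py, depth), or None if the point is behind the camera
    or lands outside the image. All parameter names are hypothetical.
    """
    # Vector from the new camera position to the cached point.
    d = [p - c for p, c in zip(point, cam_pos)]
    # Express that vector in camera coordinates via dot products.
    x = sum(a * b for a, b in zip(d, right))
    y = sum(a * b for a, b in zip(d, up))
    z = sum(a * b for a, b in zip(d, forward))
    if z <= 0.0:
        return None  # behind the camera
    # Perspective divide, then move the origin to the image centre.
    px = math.floor(width / 2 + focal_px * x / z)
    py = math.floor(height / 2 - focal_px * y / z)
    if not (0 <= px < width and 0 <= py < height):
        return None  # outside the new view
    return px, py, z
```

Moving the camera one unit to the right shifts a point at depth 5 to the left in the image, which is the frame-to-frame reuse the slide illustrates.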
![Page 6: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/6.jpg)
Background
Display process
[Diagram: multiple renderers, the display process stages (Update Points, Project/Z-Buffer, Depth Cull, Interpolate, Sampling), and the output image]
![Page 7: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/7.jpg)
Background
Results after each stage
Projection Depth cull Interpolation
![Page 8: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/8.jpg)
Background
Sampling
[Figure panels: Displayed image, Priority image, Requested pixels]
![Page 9: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/9.jpg)
Related Work
Faster ray engines
• Optimize and parallelize
- E.g., Wald et al.
Hardware-based display
• Mesh-based
- E.g., Tapestry, Holodeck, Tole et al.
• Texture-based
- E.g., Corrective textures
![Page 10: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/10.jpg)
Motivation
Render Cache works well
• Can enable interactive use of higher quality ray-based renderers
… but needs improvement
• Images too small (256x256)
• Gaps often visible during camera motion
• Not fast enough in tracking shading changes
![Page 11: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/11.jpg)
Enhancements
Tiled Z-Buffer
• Better scalability and memory coherence
Larger Interpolation Prefilter
• Can fill larger gaps between points
Predictive Sampling
• Improved quality during camera motion
Point Eviction
• Faster update of shading changes
![Page 12: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/12.jpg)
Enhancements
Code Optimization
• Use of SIMD (MMX/SSE/SSE2)
• Data layout, branch conversions, etc.
Publicly Available
• For evaluation, comparison, or use
- Non-commercial binary release
- URL is in the paper
![Page 13: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/13.jpg)
Memory Coherence
Change from R10K to Pentium 4
• Cache reduced from 4MB to 256KB
• Clock increased from 195MHz to 1.7GHz
- Cache misses much more expensive
Change from 256x256 to 512x512
• Point data ~ 5MB, Image data ~ 3MB
- Much bigger than cache
Projection and Z-Buffer problematic
![Page 14: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/14.jpg)
Projection and Z-Buffer
Point Cloud 5MB
Image - 3MB
Random order memory access
- Read/modify/write operation is memory latency limited
![Page 15: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/15.jpg)
Tiled Projection and Z-Buffer
Point Cloud 5MB
Image - 3MB
Divide image into tiles
- Tiles sized to fit in cache
Tile Buckets - 4MB
![Page 16: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/16.jpg)
Tiled Projection and Z-Buffer
Point Cloud 5MB
Image - 3MB
Project and bucket sort by tile
Tile Buckets - 4MB
![Page 17: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/17.jpg)
Tiled Projection and Z-Buffer
Point Cloud 5MB
Image - 3MB
Z-Buffer each tile separately
Tile Buckets - 4MB
![Page 18: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/18.jpg)
Tiled Projection and Z-Buffer
Point Cloud 5MB
Image - 3MB
Uses more memory and instructions
- But it is faster (25ms instead of 42ms)
Tile Buckets - 4MB
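The two passes described on these slides (project and bucket sort by tile, then Z-buffer each tile separately) can be sketched as follows. This is only an illustration under assumptions: the function name, the tuple layout, and the 64-pixel tile size are hypothetical, and Python will not show the cache benefit that motivates the technique on real hardware.

```python
def tiled_zbuffer(projected, width, height, tile=64):
    """Z-buffer already-projected points in two passes.

    projected: iterable of (px, py, depth, point_id) in image space.
    Returns (depth_buf, id_buf) as flat row-major lists.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    buckets = [[] for _ in range(tiles_x * tiles_y)]

    # Pass 1: bucket sort by tile. Writes go to the end of each bucket,
    # so memory access is sequential per bucket rather than random.
    for px, py, depth, pid in projected:
        buckets[(py // tile) * tiles_x + (px // tile)].append((px, py, depth, pid))

    # Pass 2: Z-buffer each tile separately. All read/modify/write
    # operations for one bucket touch only that tile's pixels.
    depth_buf = [float("inf")] * (width * height)
    id_buf = [-1] * (width * height)
    for bucket in buckets:
        for px, py, depth, pid in bucket:
            i = py * width + px
            if depth < depth_buf[i]:
                depth_buf[i] = depth
                id_buf[i] = pid
    return depth_buf, id_buf
```

The extra bucket storage and the second pass are the "more memory and instructions" the slide mentions; the gain comes from each tile's working set fitting in cache.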
![Page 19: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/19.jpg)
Interpolation Filters
Larger filters
• Fill larger gaps in point data
• Generally more expensive
• Result in more blurring of the image
The previous Render Cache
• Used a 3x3 weighted filter
- Can only fill very small gaps
- Introduces only a small amount of blurring
![Page 20: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/20.jpg)
Prefilter
Add a larger “backup” filter
• Results used only when 3x3 filter fails
• Uses a uniform 7x7 filter
- Can be computed cheaply
• Can fill in much larger gaps
• Does not affect sampling priorities
• Actually executed first then overwritten
- Hence the name “prefilter”
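A rough sketch of the two-filter scheme on a single-channel image, where `None` marks a pixel with no point. Several details are assumptions made for illustration: the 3x3 weights, treating "fails" as "no valid neighbour in the window", and the brute-force loop (a uniform filter would normally be computed with running sums, which is what makes it cheap).

```python
WEIGHTED_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # hypothetical weights
UNIFORM_7X7 = [[1] * 7 for _ in range(7)]


def filter_valid(image, weights):
    """Weighted average over valid (non-None) neighbours.

    Output pixels stay None where the window contains no valid pixel.
    """
    h, w = len(image), len(image[0])
    r = len(weights) // 2
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = weight_sum = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and image[yy][xx] is not None:
                        k = weights[dy + r][dx + r]
                        total += k * image[yy][xx]
                        weight_sum += k
            if weight_sum > 0.0:
                out[y][x] = total / weight_sum
    return out


def interpolate(image):
    """Run the 7x7 prefilter first, then overwrite with 3x3 results."""
    result = filter_valid(image, UNIFORM_7X7)
    fine = filter_valid(image, WEIGHTED_3X3)
    for y, row in enumerate(fine):
        for x, value in enumerate(row):
            if value is not None:
                result[y][x] = value  # the sharper 3x3 result wins
    return result
```

Pixels within one pixel of a point keep the sharper 3x3 result, pixels up to three pixels away fall back to the blurrier 7x7 average, and anything farther stays empty.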
![Page 21: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/21.jpg)
Prefilter
3x3 filter only 7x7 prefilter only Both filters
![Page 22: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/22.jpg)
Predictive Sampling
Sampling is purely reactive
• Helps to guide sparse sampling
• Samples returned in later frame
- Problem when large new regions become visible
Predict large gaps ahead of time
• Project using a predicted camera
• Request samples before they are needed
![Page 23: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/23.jpg)
Predictive Sampling
Projection is expensive
• 47% of original render cache cost
Use simplified projection
• No Z-Buffer
- Only need to find regions with no points
• Reduced resolution
- 1/4 width and height (1/16 the number of pixels)
• Store only 1 byte per pixel
- Occupancy image fits easily in cache
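The simplified projection can be sketched as a low-resolution occupancy image. This assumes the points have already been projected with the predicted camera; the function name, the `scale` parameter, and the gap list it returns are hypothetical. For a 512x512 frame the occupancy image is 128x128 at 1 byte per pixel, or 16 KB, which is well under the 256KB cache mentioned earlier.

```python
def occupancy_gaps(projected_xy, width, height, scale=4):
    """Mark which low-resolution cells receive at least one point.

    projected_xy: iterable of (px, py) pixel coordinates obtained with
    the predicted camera. No depth test is needed, because the goal is
    only to find regions that receive no points at all.
    Returns (occupancy, gaps), where gaps lists the empty cells.
    """
    ow, oh = width // scale, height // scale
    occupancy = bytearray(ow * oh)  # 1 byte per low-resolution pixel
    for px, py in projected_xy:
        if 0 <= px < width and 0 <= py < height:
            occupancy[(py // scale) * ow + (px // scale)] = 1
    gaps = [(i % ow, i // ow) for i, filled in enumerate(occupancy) if not filled]
    return occupancy, gaps
```

Cells in `gaps` are candidates for requesting samples before the predicted view is actually displayed.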
![Page 24: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/24.jpg)
Predictive Sampling
No Prediction With Prediction
Example during rapid camera rotation
![Page 25: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/25.jpg)
Algorithm Overview
[Diagram: multiple renderers, the display process stages (Update Points, Prediction, Project/Sort, Z-Buffer, Depth Cull, Prefilter, Interpolate, Sampling), and the output image]
![Page 26: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/26.jpg)
Point Eviction
Stale data can be worse than no data
• Points may live a long time at high ratios
- Not enough new samples to overwrite old
• Color change detection already exists
- Enhances sampling in regions of change
- Works by aging nearby points
Evict points beyond an age limit
• Speeds image convergence
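A small sketch of age-based eviction, assuming each cached point carries an integer age that grows every frame and grows faster near a detected color change. The `CachedPoint` class, the `changed` index set, and the `extra_age` value are hypothetical; how color changes are detected is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class CachedPoint:
    color: tuple
    age: int = 0


def update_ages(points, changed, max_age, extra_age=8):
    """Age every point by one frame and drop those past the limit.

    changed: set of indices of points near a detected color change;
    these are aged by an additional extra_age so they expire sooner.
    Returns the list of surviving points.
    """
    survivors = []
    for index, point in enumerate(points):
        point.age += 1
        if index in changed:
            point.age += extra_age
        if point.age <= max_age:
            survivors.append(point)  # still fresh enough to display
    return survivors
```

Evicted points leave holes that the sampling stage can then refill with fresh samples, rather than letting stale colors persist.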
![Page 27: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/27.jpg)
SIMD Optimizations
Utilize MMX/SSE/SSE2 instructions
• Project four points at once
• Process R, G, B channels simultaneously
• Add memory prefetches
- Automatic prefetch works well for linear access
• Convert branches to data dependencies
- Compares set masks of zeroes or ones
- Use boolean operations instead of branches
• Roughly a factor of two total speedup
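The branch-to-mask conversion can be illustrated with plain integers standing in for one SIMD lane. This is a sketch of the idea only: `select_nearer` is a hypothetical name, depths and colors are assumed to be integers, and real SSE2 code would apply the same pattern to several lanes at once using compare and bitwise instructions.

```python
def select_nearer(old_depth, old_color, new_depth, new_color):
    """Keep whichever sample is nearer, without an if/else.

    The comparison yields a mask of all ones (-1) or all zeroes (0),
    and boolean operations blend the two candidates.
    """
    mask = -int(new_depth < old_depth)  # -1 if the new sample wins, else 0
    depth = (new_depth & mask) | (old_depth & ~mask)
    color = (new_color & mask) | (old_color & ~mask)
    return depth, color
```

Because both candidates are always computed and blended, the result depends on data rather than on a hard-to-predict branch.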
![Page 28: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/28.jpg)
Results
Ray trace only (1.8 fps) Render Cache (9 fps)
Single 1.7GHz processor - rotating camera
![Page 29: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/29.jpg)
Results
Timing: 62.1 ms (up to 16 fps)
• 512x512 image, render cache only
• 1.7GHz Pentium 4 processor
[Chart: timing breakdown by stage: Update Points, Prediction, Project, Z-Buffer, Depth Cull, Prefilter, Filter / Smooth, Sampling]
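The "up to 16 fps" figure follows directly from the 62.1 ms frame time. The helper below is just that arithmetic; the function name is illustrative.

```python
def frames_per_second(frame_time_ms):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / frame_time_ms


render_cache_fps = frames_per_second(62.1)  # a little over 16 fps
```

This is an upper bound for the display process alone, since the timing is for the render cache only.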
![Page 30: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/30.jpg)
Scalability with Image Size
[Plot: frame size (pixels) against frame time (ms), with the 512x512 and 1200x1200 image sizes marked]
![Page 31: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/31.jpg)
Results
Try it for yourself
• Download publicly available binary
- Includes Render Cache and simple Ray Tracer
- Requires a Pentium 4 and Java Web Start
- Free for evaluation and internal use
- http://www.graphics.cornell.edu/research/interactive/rendercache
Demo
![Page 32: Enhancing and Optimizing the Render Cache](https://reader031.vdocuments.mx/reader031/viewer/2022032708/56812be3550346895d90576d/html5/thumbnails/32.jpg)
The End