Status – Week 226

Victor Moya

Post on 21-Dec-2015


Page 1: Status – Week 226. Victor Moya. Summary: Recursive descent. Hierarchical Z Buffer.

Page 2: Summary

- Recursive descent.
- Hierarchical Z Buffer.

Page 3: Recursive Rasterization

[Figure: block diagram with Triangle Setup, Tile FIFO, four Tile Eval units, HZ Test and Fragment FIFO.]

Page 4: Recursive Rasterization

Tile FIFO:
- Stores the start position of tiles to test/split.
- Each tile has the following information:
  - 4 c values (3 edge, 1 z/w): 4 x 32 bits.
  - Tile level/size: log2(max(maxH, maxV)) bits.
    - Ex: for 2048x2048, 12 bits.
  - Expand bit: if the tile must be expanded.
- For N tile evaluators, it could be arranged as an NxM queue.
- Triangle setup could add 1 tile (the full viewport) or N tiles (reduces the traversal depth by 1).
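A minimal sketch of how a Tile FIFO entry could be modelled in a simulator, using only the fields listed above. The names `TileEntry` and `full_viewport_tile` are hypothetical, and two things are assumptions rather than statements from the slides: that a tile of level L covers (1 << L) pixels per side, and that the initial full-viewport tile is pushed with the expand bit cleared.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TileEntry:
    """One Tile FIFO entry (hypothetical layout based on the listed fields)."""
    c: Tuple[int, int, int, int]  # 3 edge values + 1 z/w value at the tile start position
    level: int                    # assumption: the tile covers (1 << level) pixels per side
    expand: bool                  # expand bit: the tile must be expanded


def full_viewport_tile(c, viewport_size):
    """Build the single start tile that triangle setup could push (the full viewport)."""
    level = (viewport_size - 1).bit_length()  # smallest level whose tile covers the viewport
    return TileEntry(c=tuple(c), level=level, expand=False)  # expand=False is an assumption


tile_fifo = deque()
tile_fifo.append(full_viewport_tile((5, -3, 7, 100), 2048))
```

Pushing N smaller tiles instead of this single tile would start the traversal one level lower, which is the depth reduction mentioned above.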

Page 5: Recursive Rasterization

[Figure: Expand Tile, new tiles generated. Tile labels: "level, no expand" (three tiles) and "level – 1, expand" (one tile). Legend: start sample, generated sample.]

Page 6: Recursive Rasterization

[Figure: No Expand Tile, new tiles. Tile label: "level – 1, expand". Legend: start sample, generated sample.]

Page 7: Recursive Rasterization

Tile evaluator: N = 1
- 1x1 tiles (no subtiles tested at tile evaluator).
- Calculates three new sample positions.
- Tests if any triangle fragment is inside the tile:
  - The tile is outside if all 4 tile corners are negative (outside) for any of the edge equations.
- Performs HZ test:
  - Only top level.
  - Top level and N middle levels.
  - All levels.
- Generates a 2x2 fragment stamp.
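The inside/outside test above can be sketched as follows. This is an illustration under assumptions, not the evaluator's actual logic: edges are given as (a, b, c) coefficients of e = ax + by + c with e >= 0 meaning inside, and the function name `tile_outside` is hypothetical.

```python
def edge_value(a, b, c, x, y):
    """Evaluate the linear edge equation e = a*x + b*y + c."""
    return a * x + b * y + c


def tile_outside(edges, x0, y0, size):
    """True when all 4 tile corners are negative for at least one edge equation."""
    corners = [(x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size)]
    for a, b, c in edges:
        if all(edge_value(a, b, c, x, y) < 0 for x, y in corners):
            return True  # the whole tile lies on the outside of this edge
    return False  # conservative: the tile may still contain no fragment


# Triangle with vertices (0,0), (8,0), (0,8): x >= 0, y >= 0, 8 - x - y >= 0.
triangle = [(1, 0, 0), (0, 1, 0), (-1, -1, 8)]
rejected = tile_outside(triangle, 8, 8, 4)
kept = tile_outside(triangle, 0, 0, 4)
```

The test is conservative: a tile that is not rejected is only a candidate and is refined at the next level.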

Page 8: Recursive Rasterization

Tile Evaluator: N = 1
- 3 x 4 equation evaluators:
  - Linear equations: u = ax + by + c
  - 3 edge equations.
  - 1 z/w parameter equation.
  - Incremental update:
    - c_new = c_start + (a << level) + (b << level).
- 4 x 4 x (e >= 0) tests:
  - Sample inside/outside triangle.
- 4 x Z tests:
  - Against the proper HZ level.
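A small sketch of the incremental update, assuming integer (fixed-point) coefficients and a tile whose side is (1 << level) pixels. The function name `corner_values` is hypothetical. It derives the three new sample values from the start sample with shifts and adds only, which is the point of the formula above.

```python
def corner_values(c_start, a, b, level):
    """Return the equation value at the 4 tile corners from the start sample.

    c_start is u at the tile start position; the tile side is (1 << level).
    """
    dx = a << level  # step of one tile side in x
    dy = b << level  # step of one tile side in y
    return (c_start, c_start + dx, c_start + dy, c_start + dx + dy)


def direct(a, b, c, x, y):
    """Reference evaluation of u = a*x + b*y + c, used only to check the result."""
    return a * x + b * y + c


a, b, c = 3, -2, 5
x0, y0, level = 4, 8, 2
values = corner_values(direct(a, b, c, x0, y0), a, b, level)
```

The last entry is the c_new of the slide: c_start + (a << level) + (b << level), the sample diagonally opposite to the start position.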

Page 9: Recursive Rasterization

[Figure: Equation Evaluator. Inputs C, A and B; two shifters (<<) and an adder (+). Annotations: "Only for N = 2", "Samples NxN".]

Page 10: Recursive Rasterization

Tile Evaluator: N = 2
- 2x2 tile (subtiles/fragments).
- 8 new samples generated.
- 8 x 4 equation evaluators.
- 8 x (e >= 0) tests.
- 8 x Z tests.
- Generates 2x2 or 3x3 fragment stamps.
- EXPAND TILES ARE NO LONGER REQUIRED.

Page 11: Recursive Rasterization

[Figure: HZ test in the tile evaluator. Inputs: Tile Start Position and Level select one value from HZ Level i ("only one value read"); 4 x Z/W values feed four comparators (CMP) whose outputs are combined with an AND to produce "Tile Passes".]
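A hedged sketch of the tile-level HZ test: four corner z/w values are compared against the single value read from the HZ level. The polarity is an assumption, not something the transcript states: larger Z is taken to be farther, and the HZ entry is taken to hold the farthest Z of its block. Under those assumptions a tile can be discarded only when all four corners are farther than the stored value. The name `hz_tile_culled` is hypothetical.

```python
def hz_tile_culled(corner_z, hz_value):
    """True when all four corner depths are farther than the stored HZ value.

    Assumes larger Z means farther and hz_value is the farthest Z of the block.
    Because z/w is linear over the tile, its nearest value is at a corner.
    """
    return all(z > hz_value for z in corner_z)  # 4 compares combined with an AND


culled = hz_tile_culled([0.90, 0.95, 0.92, 0.99], 0.80)
visible = hz_tile_culled([0.90, 0.70, 0.92, 0.99], 0.80)
```

Only one HZ value is read per tile; the four comparisons share it.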

Page 12: Recursive Rasterization

Tile evaluator critical path?
- Z/W parameter evaluation | HZ access.
- Z Compare/Test (>).

But could be pipelined:
- Same throughput.
- Longer latency.
- Larger tile queue?

Page 13: Recursive Rasterization

Each tile evaluator could also work on more than one triangle at a time.

Tile evaluator for 2 triangles: N = 1
- 2 x 3 x 4 equation evaluators.
- 2 x 3 x 4 (e >= 0) tests.
- 2 x 4 Z tests.
- Generates two 2x2 fragment stamps.

Page 14: Recursive Rasterization

Benefits:
- Single access to the HZ hierarchy.
- Increases throughput.
- Shares latency for first fragment.

Problems:
- Overlapping triangles.
- Produces more tiles.
- Produces more fragments per cycle (but that would also happen with N = 2).

Page 15: Recursive Rasterization

[Figure: Simulator Boxes. TriangleSetup, RecursiveRasterization, HierarchicalZ, FragmentFIFO.]

Page 16: Recursive Rasterization

[Figure: Recursive Rasterization Box and Signals. Signals: newTriangle, newTile, newFragment, HZ Update (Tile level).]

Page 17: Hierarchical Z Buffer

Multiple levels:
- Level 0:
  - 1 to 16 registers.
- Level 1:
  - Fixed size ~64 KB.
  - Maps to 4x4, 8x8 or 16x16 tiles (relative to viewport size).
- Z-Buffer.

Add additional level(s) between 0 and 1:
- More memory.
- Latency for updates?
- Comparators?

Page 18: Hierarchical Z Buffer

Z-Buffer could be accessed at the fragment level:
- Good for prefetching?
- But would require N reads.

Provide the full Z cache line:
- Fragment stamps (NxM) map to a single Z cache line.

Page 19: Hierarchical Z Buffer

Update mechanism:
- Write to Z-Buffer.
- At cache misses, pack/compress the cache line and calculate the largest Z value for that line.
- Propagate the Z value of the line upwards.
- Could require a lot of comparisons for top levels.
  - Expensive?
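The cost concern above can be illustrated with a simplified model. This is a sketch under assumptions, not the proposed hardware: the hierarchy is flattened to one dimension, it is stored as a list ordered from finest to coarsest level, each coarser entry covers a fixed number of consecutive finer entries, and every parent is recomputed naively as the maximum of its children. The name `update_hz` is hypothetical.

```python
def update_hz(fine_to_coarse, index, cache_line):
    """Store the max Z of a cache line and propagate it upwards.

    fine_to_coarse[0] is the finest level; returns the number of comparisons
    spent recomputing the parents.
    """
    comparisons = 0
    fine_to_coarse[0][index] = max(cache_line)  # largest Z value of the line
    for k in range(1, len(fine_to_coarse)):
        fan = len(fine_to_coarse[k - 1]) // len(fine_to_coarse[k])
        index //= fan
        children = fine_to_coarse[k - 1][index * fan:(index + 1) * fan]
        fine_to_coarse[k][index] = max(children)  # parent keeps the farthest child
        comparisons += len(children) - 1
    return comparisons


levels = [[1.0] * 8, [1.0] * 2, [1.0]]  # cleared hierarchy: everything at far Z
cost = update_hz(levels, 5, [0.2, 0.4, 0.3, 0.1])
```

In this model a parent only moves closer once every child below it has been written, and the number of comparisons grows with the fan-out of the upper levels, which is why the top levels look expensive.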

Page 20: Hierarchical Z Buffer

Access:
- Tiles:
  - At tile evaluators.
- For fragments:
  - At tile evaluators:
    - Shares hardware.
    - Larger latency for tile evaluators at fragment level.
    - Use HZ Level 1 for tiles larger than a stamp but smaller than a HZ level 1 block.
  - At an HZ test stage before Fragment FIFO:
    - Smaller latency for tile evaluators.
    - Access to Z buffer.

Page 21: Hierarchical Z Buffer

Location:
- Level 0:
  - At tile evaluators (duplicated):
    - Better latency.
    - Broadcast for updates.
  - At HZ separated memory:
    - Worse latency?
    - Shared => multiported!!!!

Page 22: Hierarchical Z Buffer

Location:
- Level 1:
  - At tile evaluators:
    - Too large.
  - At HZ separated on-die memory:
    - Better solution?
    - Must be multiported:
      - 1 access per tile evaluator.
      - Multiple sets?
      - Set conflict?
  - At video memory:
    - For very large HZ buffers or very small precision?
    - Access time?
    - HZ cache?

Page 23: Hierarchical Z Buffer

Location:
- Z-Buffer:
  - On video memory:
    - Compressed.
    - Reduce read/write bandwidth usage.
  - Cache on die:
    - Uncompressed.
  - Hardware packer/unpacker:
    - Used also for the HZ update.

Page 24: Hierarchical Z Buffer

Sizes:
- Level 0:
  - Register-like access.
  - 1 to 2 cycles max.
  - 1 to 32 values.
  - Block size depends on the viewport resolution:
    - 2048x2048:
      - 1 register: 2048x2048 block.
      - 16 registers: 128x128 blocks.
      - 32 registers: 64x64 blocks.

Page 25: Hierarchical Z Buffer

Level 1:
- On die memory:
  - Fixed size.
  - Limited by technology.
  - Estimation:
    - 2048x2048 viewport.
    - 8x8 blocks.
    - 16 bits Z value.
    - 128KB.
- Video memory:
  - Unlimited?
  - Variable size.
  - But larger access time!!!
  - Requires cache.
  - Update!!
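The 128KB estimation for the on-die Level 1 buffer follows from one Z value per block. A short worked check, with the hypothetical helper name `hz_level1_bytes` and a square viewport assumed:

```python
def hz_level1_bytes(viewport, block, z_bits):
    """Storage for one Z value per block over a square viewport, in bytes."""
    blocks_per_side = viewport // block
    return blocks_per_side * blocks_per_side * z_bits // 8


size_8x8 = hz_level1_bytes(2048, 8, 16)      # 256 x 256 blocks x 2 bytes
size_16x16 = hz_level1_bytes(2048, 16, 16)   # 128 x 128 blocks x 2 bytes
```

With 8x8 blocks the result is 131072 bytes (128KB); doubling the block side to 16x16 divides the storage by four.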

Page 26: Hierarchical Z Buffer

Z-Buffer:
- Video memory.
- Unlimited size.
- Large access time.
- 32 bits (with stencil) per value.
- Cache line size:
  - HZ block 8x8 (tiled): 2048 bits.
  - HZ block row/column 8: 256 bits.
  - Less than an HZ block row/column.

Page 27: Hierarchical Z Buffer

Packer/Unpacker:
- Differential ('whatever it is called') compression.
- Calculate max Z value in cache line.
- Z block type:
  - 00: cleared Z line.
  - 01: uncompressed.
  - 10: compression 1.
  - 11: compression 2.
- Two compression levels:
  - 4 bits per value.
  - 16 bits per value.

Page 28: Hierarchical Z Buffer

HZ update hardware:
- If a cache line has the size of a HZ level 1 block:
  - Nothing, just store.
- If a cache line is smaller than a HZ level 1 block:
  - Must be stored in a combine cache.
    - Stores the current largest value for the HZ level 1 block.
    - Stores whether the HZ block has been fully updated.
  - When a combine cache line is full the HZ level 1 can be updated.
  - Use a FIFO policy for the combining cache (only spatial locality).
  - Combining cache size?
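A minimal sketch of such a combine cache, assuming each entry holds a running max Z and a bitmask of the Z cache lines already written for one HZ level 1 block. The class name `CombineCache`, the method `write_line` and the choice to drop an evicted, incomplete entry without updating HZ (which is conservative) are assumptions for illustration.

```python
from collections import OrderedDict


class CombineCache:
    """FIFO combine cache: block id -> [max Z so far, mask of written lines]."""

    def __init__(self, entries, lines_per_block):
        self.entries = entries
        self.full_mask = (1 << lines_per_block) - 1
        self.cache = OrderedDict()  # insertion order gives the FIFO order

    def write_line(self, block, line_index, line_max_z):
        """Record one Z cache line; return the block max Z once the block is complete."""
        if block not in self.cache:
            if len(self.cache) == self.entries:
                self.cache.popitem(last=False)  # FIFO: oldest entry is dropped (assumption)
            self.cache[block] = [line_max_z, 0]
        entry = self.cache[block]
        entry[0] = max(entry[0], line_max_z)   # one Z comparator per entry
        entry[1] |= 1 << line_index            # mark this line as written
        if entry[1] == self.full_mask:
            del self.cache[block]
            return entry[0]                    # value to write into HZ level 1
        return None


cc = CombineCache(entries=2, lines_per_block=8)
results = [cc.write_line(0, i, 100 + i) for i in range(8)]
```

Only the last of the eight writes returns a value, so the HZ level 1 block is updated once, after all of its lines have been seen.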

Page 29: Hierarchical Z Buffer

How to update HZ level 0?
- For a 2048x2048 viewport, each block of a 2x2 level 0 covers 128x128 HZ level 1 blocks.
- Comparing the new HZ level 1 value against all the other HZ level 1 values would require too much hardware or too much time.
- Combining cache for level 0:
  - 4 combining lines (2x2).
  - Stores the farthest Z value written for the level 0 block.
  - Update when the full level 0 block is written.
    - !!! Could never happen !!!

Page 30: Hierarchical Z Buffer

L0 Combine Cache:
- Size:
  - 2x2 HZ Level 0 buffer.
  - 256x256 HZ Level 1 buffer.
    - At 2048x2048 the HZ L1 block is 8x8.
  - 128x128 L1 blocks per L0 block.
    - 16Kbit mask for each L0 combine cache entry !!!!

Page 31: Hierarchical Z Buffer

Solution:
- Add a L0 line combine cache.
  - 128-bit mask per line combine cache entry.
  - Size: 4 entries?
  - When a line is full, update the L0 combine cache.
  - FIFO?
- 128-bit mask per L0 combine cache entry.
  - 1 Z value per entry, 1 Z comparator per entry.
  - No replacement!! Only update.
  - 1Kbit for bitmasks.
  - 128 bits for Z values (16 bits).
  - 8 Z comparators (16 bits).
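A worked check of the mask sizes, for a 2048x2048 viewport, 8x8 L1 blocks and a 2x2 Level 0. The reading that a line entry tracks one row of L1 blocks inside a Level 0 block, and that the Level 0 entry then tracks whole rows, is an interpretation of the slides rather than something they state; `mask_sizes` is a hypothetical helper.

```python
def mask_sizes(viewport, l1_block, l0_grid):
    """Bits needed to track written L1 blocks inside one Level 0 block."""
    l1_per_side = viewport // l1_block        # L1 blocks per viewport side
    l1_per_l0_side = l1_per_side // l0_grid   # L1 blocks per side of one L0 block
    flat_mask = l1_per_l0_side * l1_per_l0_side  # one bit per L1 block
    line_mask = l1_per_l0_side                # one bit per L1 block in a row (interpretation)
    rows_mask = l1_per_l0_side                # one bit per completed row (interpretation)
    return flat_mask, line_mask, rows_mask


flat_mask, line_mask, rows_mask = mask_sizes(2048, 8, 2)
```

The single flat mask needs 16384 bits (16Kbit) per entry, while the two-stage scheme needs only 128 bits per entry at each stage.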

Page 32: Hierarchical Z Buffer

L1 Combine Cache:
- Size:
  - Z cache line: 8 Z values (256 bits).
  - 2048x2048 viewport:
    - 8x8 L1 blocks.
    - 8-bit bitmask per L1 combine cache entry.
  - 4096x4096 viewport:
    - 16x16 L1 blocks.
    - 16-bit bitmask per L1 combine cache entry.
  - 1 Z value and 1 Z comparator per entry.
  - FIFO replacement policy?

Page 33: Hierarchical Z Buffer

- Number of entries: 256
  - 4Kbits for bitmask.
  - 4Kbits for Z values (16 bit).
  - 256 Z comparators.
- Fully associative.

Page 34: Hierarchical Z Buffer

[Figure: Combine Cache. Labels: Z cache line max Z, Level 1 combine cache (lines), Level 1 block Z, HZ Level 1 Buffer, Level 0 combine cache (blocks), Level 0 block Z, HZ Level 0 Buffer.]

Page 35: Hierarchical Z Buffer

Combine cache line: TmpZ | BitMask

- TmpZ stores the max Z value written in the block.
- BitMask stores a mask with the written positions in the block.

Page 36: Hierarchical Z Buffer

Optimized access to HZ from the Tile Evaluators:
- Child tiles can reuse the HZ Z value read for parent tiles.
- Add a Z value at each tile in the Tile Buffer.
- Initialized with the HZ L0 Z value at the proper tile level.
- Reuse the tile Z value until the tile level is the same as HZ L1.
  - At 8x8, for example, for 8x8 HZ L1 blocks.
- Final fragment stamps can be smaller than HZ L1 blocks.
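The reuse scheme can be sketched as a value carried down one traversal path. The assumptions are labelled in the code: tile level L means a (1 << L) pixel side, HZ L1 blocks are 8x8 (level 3) as in the example above, and `child_tile_z`, `descend` and `read_hz_l1` are hypothetical names.

```python
L1_LEVEL = 3  # log2(8): 8x8 HZ L1 blocks (assumed level numbering)


def child_tile_z(child_level, parent_z, read_hz_l1):
    """Z value stored with a child tile in the Tile Buffer."""
    if child_level == L1_LEVEL:
        return read_hz_l1()  # the only HZ L1 fetch on this path
    return parent_z          # other levels reuse the value carried by the parent


def descend(top_level, stamp_level, hz_l0_z, read_hz_l1):
    """Follow one path from the top tile down to a fragment stamp."""
    z = hz_l0_z  # initialised with the HZ L0 value
    for level in range(top_level - 1, stamp_level - 1, -1):
        z = child_tile_z(level, z, read_hz_l1)
    return z


reads = []


def read_hz_l1():
    reads.append(1)
    return 0.25  # hypothetical HZ L1 value for the block being traversed


stamp_z = descend(top_level=11, stamp_level=1, hz_l0_z=1.0, read_hz_l1=read_hz_l1)
```

Along the whole path from a 2048x2048 tile to a 2x2 stamp the HZ L1 buffer is read once, and the stamp is tested against the value fetched at the 8x8 level.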

Page 37: Hierarchical Z Buffer

Optimized access from Tile Evaluators:
- Fetch of HZ L1 is done at 8x8 tile level and reused for the fragment stamp.
- That implies that there is no latency penalty for fragments.
  - In fact it doesn't have to access the HZ L1 buffer at stamp level.
  - 0 cycles for fragment stamp HZ test.
- Reduces accesses to the HZ buffer.

Pages 38 to 45: Rasterization

[Figures only: these eight slides carry the title "Rasterization" and no other text in the transcript.]