Post on 21-Dec-2015

High Dynamic Range

Emeka Ezekwe M11
Christopher Thayer M12
Shabnam Aggarwal M13
Charles Fan M14

Manager: Matthew Russo
04/18/23

Agenda

Project Description: Charles
Marketing: Shabnam
Behavioral Description: Emeka
Design Process: Chris
Floorplan Evolution: Shabnam
Design Specifications: Chris
Layout: Charles
Conclusion: Emeka

Charles Fan

Project Description

High Dynamic Range?
Bright colors are BRIGHT
Dark colors are DARK
Details are seen CLEARLY

Otherwise, colors and lights look distorted and bland

FP HDR format requires 48 bits per pixel
Problem: Too much storage space and memory bandwidth!
Solution: HDR encoding yields 6:1 compression

OUR GOAL: Implement efficient HDR decoding in hardware

6:1 pixel compression
Increases usable storage space 6-fold
Decreases memory bandwidth 6-fold
Effectively increases performance
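The storage side of the 6:1 claim can be checked with a few lines of arithmetic. This is a minimal sketch: the 48 bits per pixel figure is from the slides and 8 bits per pixel is what a 6:1 ratio implies, while the 1024x1024 texture size is only an illustrative assumption.

```python
# Storage for one texture: uncompressed FP HDR vs. encoded.
# The 1024x1024 texture size is an illustrative assumption.
FP_HDR_BPP = 48    # bits per pixel, floating-point HDR (from the slides)
ENCODED_BPP = 8    # bits per pixel implied by 6:1 compression

def texture_bytes(width, height, bits_per_pixel):
    """Return the storage needed for a width x height texture, in bytes."""
    return width * height * bits_per_pixel // 8

raw = texture_bytes(1024, 1024, FP_HDR_BPP)
encoded = texture_bytes(1024, 1024, ENCODED_BPP)
ratio = raw // encoded

print(raw, encoded, ratio)
```

The same ratio applies to the bytes fetched per pixel, which is where the memory bandwidth saving comes from.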

Shabnam Aggarwal

Marketing

AMD’s ATI Mobility Radeon X1900
48-bit floating point HDR
HDR compression is currently NOT supported
Performance hit deters developers

Windows Vista also now requires a high-end GPU to realize its full graphics potential.

Laptops and portable devices are using dedicated processors for graphics.

OLED (Organic Light Emitting Diode) displays are being developed by Sony
Contrast ratio: 1,000,000:1


Our decoder is designed to interface between specially encoded textures stored on the GPU’s memory and one of the GPU’s texture caches that feed into the shader processor.

Each ROP on an ATI GPU can process 4 pixels per clock cycle. We plan for our hardware to decode the texture information for 4 pixels during each clock cycle.

This decoder will allow smaller textures to be stored in the GPU’s memory, which will allow graphics cards to provide the same functions with less memory.

Ultimately, this decoder can provide savings in cost, power consumption, heat dissipation, and size in current graphics cards.

Our HDR Decoder!!


Our HDR Decoder:
Smaller textures stored in GPU’s memory
Same functions, less memory

Savings in:
Cost
Power consumption
Heat dissipation
Size

HDR is the next generation of display technology

Emeka Ezekwe

Behavioral & Algorithmic Description


Algorithmic Description

Encoding
Break texture into 4x4 pixel blocks.
Extract luminance value of each pixel.
Normalize red and blue values and average over each 2x2 block.
Green can be recalculated while decoding.
Allocate more bits to luminance values.
After encoding, a 4x4 block of pixels can be compressed from 48 bpp to 8 bpp.
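The encoding steps above can be sketched in floating point as follows. This is an illustration under stated assumptions, not the team's encoder: the luminance weights (Rec. 709) are assumed because the slides do not give them, the function names are hypothetical, and the bit-level quantization that reaches 8 bpp is left out.

```python
# Assumed luminance weights (Rec. 709). The slides do not say which
# weights the real encoder uses, and bit-level quantization is omitted.
W_R, W_G, W_B = 0.2126, 0.7152, 0.0722

def luminance(r, g, b):
    """Weighted sum of the three channels."""
    return W_R * r + W_G * g + W_B * b

def encode_block(block):
    """Encode a 4x4 block given as rows of (r, g, b) float tuples.

    Returns (lum, chroma): lum is 4x4 per-pixel luminance, chroma is a
    2x2 grid of (red/L, blue/L) averaged over each 2x2 sub-block.
    """
    lum = [[luminance(*px) for px in row] for row in block]
    chroma = []
    for by in range(2):
        chroma_row = []
        for bx in range(2):
            r_sum = b_sum = 0.0
            for dy in range(2):
                for dx in range(2):
                    y, x = 2 * by + dy, 2 * bx + dx
                    r, _, b = block[y][x]
                    l = lum[y][x] or 1.0  # black pixel: avoid dividing by zero
                    r_sum += r / l
                    b_sum += b / l
            chroma_row.append((r_sum / 4, b_sum / 4))
        chroma.append(chroma_row)
    return lum, chroma

def decode_pixel(l, r_norm, b_norm):
    """Rebuild (r, g, b); green is recovered from luminance, not stored."""
    r = r_norm * l
    b = b_norm * l
    g = (l - W_R * r - W_B * b) / W_G
    return r, g, b

# Round trip on a uniformly colored block.
block = [[(1.0, 0.5, 0.25)] * 4 for _ in range(4)]
lum, chroma = encode_block(block)
r, g, b = decode_pixel(lum[0][0], *chroma[0][0])
```

Because luminance is kept per pixel while red and blue are shared across each 2x2 sub-block, brightness detail survives at full resolution and only color detail is averaged.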

Algorithmic Description

Decoding (luminance values)
Reconstruct Lp: 1 logical shift, 1 integer addition
Calculate GQ: 1 integer addition
Calculate final pixel values: 3 floating-point multiplications
Total calculations: 1 logical shift + 2 integer additions + 3 floating-point multiplications

Data Flow

[Figure: data-flow diagram. Labeled blocks: Find G, four "Compute 1 pixel" units, Int to FP, four "Serialize output" units. Registers sit between the stages, including twelve 16-bit registers. Bus widths marked on the diagram: 7, 7, 4, 4, 4, 4, 8.]

Chris Thayer

Design Process

Goal: Speed
400 MHz
4 pixels per cycle, 4 cycles per block

Architectural decisions
No denormal support in floating-point multiplier
Pipelined design
Storing input values
Integer multiplication: Wallace trees, Booth encoding
Critical adders: carry select
Integer to floating-point conversion
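The Booth encoding named under integer multiplication can be illustrated with a small behavioral model. This sketch assumes radix-4 recoding and an 8-bit signed multiplier. The slides state neither the radix nor the operand width, and the function names are hypothetical.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, such
    that sum(d * 4**i) equals the multiplier. `bits` must be even.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier out of range")
    u = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    digits = []
    prev = 0                             # implicit bit below the LSB
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_multiply(multiplicand, multiplier, bits=8):
    """Multiply by summing one partial product per Booth digit."""
    partial_products = [
        (d * multiplicand) << (2 * i)
        for i, d in enumerate(booth_radix4_digits(multiplier, bits))
    ]
    # In hardware a Wallace tree would compress these partial products;
    # a plain sum stands in for it here.
    return sum(partial_products)

print(booth_radix4_digits(7, 8))   # [-1, 2, 0, 0]
print(booth_multiply(-13, 7))      # -91
```

Radix-4 recoding halves the number of partial products compared with one per multiplier bit, which leaves less work for the Wallace tree that sums them.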

Circuit level decisions
Mirror FAs to reduce carry-chain delay
Two different HAs
AOI/OAI gates
Gate sizing along critical paths
Utilize Q and ~Q outputs from registers
Clock buffers built into register blocks
Double/triple strapped VDD and GND
Repeaters to break up long wires
Balanced clock tree
Device folding
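The carry-select adders chosen for the critical paths can be modeled at the bit level. This is a behavioral sketch only: the 16-bit width, the 4-bit block size, and the function names are assumptions, not values from the slides.

```python
def ripple_add(a, b, carry_in, width):
    """Add two `width`-bit values bit by bit; return (sum, carry_out)."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry

def carry_select_add(a, b, width=16, block=4):
    """Carry-select addition: each block precomputes both carry-in cases."""
    if width % block:
        raise ValueError("width must be a multiple of block")
    mask = (1 << block) - 1
    result = 0
    carry = 0
    for lo in range(0, width, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        # Hardware evaluates both candidates in parallel ...
        sum0, cout0 = ripple_add(a_blk, b_blk, 0, block)
        sum1, cout1 = ripple_add(a_blk, b_blk, 1, block)
        # ... and a multiplexer picks one once the real carry arrives.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << lo
    return result, carry

print(carry_select_add(1234, 4321))    # (5555, 0)
print(carry_select_add(0xFFFF, 1))     # (0, 1)
```

The design trades area for speed: every block is built twice, but the carry only has to pass through one multiplexer per block instead of rippling through every full adder.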

Design Process

Verification Process

C Implementation
Structural Verilog
Gate Level Schematic
Layout

Major Modules
Pipeline Stages
Global Signals

Shabnam Aggarwal

Floorplan Evolution

Floorplan Evolution

Chris Thayer

Design Specifications

Delays
Stage one pipeline: 1.8 ns
Stage two pipeline: 1.53 ns
Stage three pipeline: 2.479 ns

Skew
Stage one: x
Stage two: x
Stage three: x

Resulting clock speed: 500 MHz
2 BILLION pixels per second

Size: 442 x 453 microns
Aspect ratio: 1:1.024

Transistors: 42,772
Density: 0.21 T/micron^2
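The reported figures can be cross-checked with a short calculation. All inputs below are taken from the slides. The only assumption is that a stage delay bounds the clock period directly, with skew ignored because the skew values are not filled in. On that assumption the slowest listed stage (2.479 ns) would allow roughly 403 MHz, close to the 400 MHz goal, so the 500 MHz figure presumably rests on timing data not shown here.

```python
stage_delays_ns = [1.8, 1.53, 2.479]   # pipeline stage delays from the slide
PIXELS_PER_CYCLE = 4
CLOCK_MHZ = 500                        # reported clock speed

# Pixel throughput at the reported clock.
pixels_per_second = CLOCK_MHZ * 1_000_000 * PIXELS_PER_CYCLE

# Clock bound implied by the slowest listed stage alone (skew ignored).
slowest_ns = max(stage_delays_ns)
fmax_from_delays_mhz = 1000.0 / slowest_ns

# Area, density, and aspect ratio from the reported dimensions.
width_um, height_um = 442, 453
transistors = 42_772
area_um2 = width_um * height_um
density = transistors / area_um2       # transistors per square micron
aspect = height_um / width_um

print(pixels_per_second)
print(fmax_from_delays_mhz)
print(area_um2, density, aspect)
```

The throughput and density agree with the slide: 500 MHz at 4 pixels per cycle gives 2 billion pixels per second, and 42,772 transistors over 200,226 square microns is about 0.21 per square micron.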

Charles Fan

Layout

Floating Point Multiplier Layout

Pretty beautiful

Floating Point Multiplier Data Flow

Poly Layer

Metal One Layer

Metal Two Layer

Metal Three Layer

Metal Four Layer

Questions?