A 80/20MHz 160mW Multimedia Processor integrated with Embedded DRAM, MPEG-4 Accelerator and 3D Rendering Engine for Mobile Applications Chi-Weon Yoon, Ramchan Woo, Jeonghoon Kook, Se-Joong Lee, Kangmin Lee, Young-Don Bae, In-Cheol Park and Hoi-Jun Yoo Dept. of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Korea

Post on 29-Sep-2020

Page 2: Outline

• Introduction
• System Architecture Overview
• Low Power Block Design
  – 32-bit RISC
  – MPEG-4 Accelerator
  – 3D Rendering Engine
  – Embedded DRAM Frame Buffer
• Features of Test Chip
• Conclusions

Page 3: Requirements for Future Mobile Information Terminals

• Multimedia Signal Processing
  – mp3, 2D image processing, etc.
  – 3D graphics
• Low Power Features
  – Battery-driven products
• Low Cost Solutions
  – Major factor for consumer electronics

Page 4: Target Specifications

3D Image Rendering
• > 2 Mpolygons/sec
• 256 x 256 resolution with 24b true color
• 16b Z-buffering
• Alpha blending
• Double buffering

MPEG-4 Video Decoding
• Simple Profile
• QCIF (176 x 144)
• 15 frames/sec

System Power
• < 200mW

Others
• mp3, etc.

[Figure: Low Power Multimedia Processor connected to an LCD display.]
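As a rough sizing check on the 3D and MPEG-4 targets, the sketch below works out the storage and pixel rate they imply. It assumes one full-resolution buffer per plane with no packing overhead, which the slide does not state; all names are illustrative.

```python
# Storage and pixel rate implied by the target specifications.
# Assumption (not from the slides): one unpadded full-resolution buffer per plane.
WIDTH, HEIGHT = 256, 256   # target 3D resolution
COLOR_BITS = 24            # 24b true color
Z_BITS = 16                # 16b Z-buffering
COLOR_BUFFERS = 2          # double buffering

MBIT = 1024 * 1024

pixels = WIDTH * HEIGHT
color_bits_total = pixels * COLOR_BITS * COLOR_BUFFERS
z_bits_total = pixels * Z_BITS
total_bits = color_bits_total + z_bits_total

print(color_bits_total / MBIT)  # Mb for both color buffers
print(z_bits_total / MBIT)      # Mb for the Z-buffer
print(total_bits / MBIT)        # Mb in total

# MPEG-4 target: luminance pixels per second for QCIF at 15 frames/sec.
qcif_pixels_per_sec = 176 * 144 * 15
print(qcif_pixels_per_sec)
```

Under these assumptions the color buffers need 3 Mb, the Z-buffer 1 Mb, and QCIF decoding touches about 0.38 M luminance pixels per second.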

Page 5: The Proposed Solution

High-Performance CPU
• Large area
• High power consumption

Optimized-Performance CPU + Dedicated H/W + Embedded DRAM
• Distribution of computational load
• Small area
• Programmability
• Circuit-level low power techniques

H/W & S/W mixed solution, optimized at the architecture and circuit levels

Page 6: Architecture Overview

[Block diagram. Labeled blocks: ARM9 with MAC (32b, 80MHz); DLL clock generator; two B.W. Equalizers, each a dual-port SRAM (2KB); MC Accelerator (20MHz) with Frame Buffer, YCrCb to RGB and SAM; 3D Rendering Engine (20MHz) with Frame Buffer + Z-Buffer and SAM; Ext. I/O; Color Out (24b). Bus widths shown: 32b, 128b, 512b, 2048b.]

• Fast / narrow data transaction (32b @ 80MHz)
• Slow / wide data transaction (512b @ 20MHz)
• Data buffering
• On-chip wide bus between logic and eDRAM

Page 7: Multimedia Enhancement in RISC

[Figure: 5-stage pipeline (F, D, EX, MEM, WB) with overlapped instructions. Execution units: REG File, MUL, ALU. The multiplier is drawn as a tree structure with 4:2 adders.]

• 1-cycle 32b x 32b multiplication
• 2-cycle 32b x 32b multiplication and accumulation
• 23% cycle reduction compared with conventional ARM architecture
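To illustrate the idea behind a tree of 4:2 adders, the sketch below reduces the 32 partial products of an unsigned 32b x 32b multiplication with word-level 4:2 compressors, then performs one final carry-propagate addition. It is a generic textbook model, not the chip's multiplier: the slide does not say how partial products are generated (for example whether Booth encoding is used), and the function names are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save adder on whole words: returns (sum, carry) with sum + carry == a + b + c."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def compress_4_2(a, b, c, d):
    """4:2 compressor built from two cascaded 3:2 adders (one common construction)."""
    s, carry = csa(a, b, c)
    return csa(s, carry, d)


def tree_multiply(x, y, width=32):
    """Unsigned multiply via a 4:2 reduction tree. Returns (product, tree_levels)."""
    # One partial product per multiplier bit (no Booth encoding in this sketch).
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows), 4):
            group = rows[i:i + 4]
            group += [0] * (4 - len(group))  # pad the last group with zeros
            reduced.extend(compress_4_2(*group))
        rows = reduced
        levels += 1
    # Final carry-propagate addition of the remaining rows.
    return sum(rows), levels


product, levels = tree_multiply(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(product), levels)
```

Because each level halves the number of rows, 32 partial products need only four compressor levels before the final adder, which is what makes a short multiply latency feasible.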

Page 8: Bandwidth Equalizer

• From RISC: 32b @ 80MHz
• To dedicated H/W: 512 (32)b @ 20MHz
• DP-SRAM (2KB) with flow control between the 32b and 512b sides
• Single-ended for tight bit pitch
• Acts as a row cache

Signals: WBE = Wide Bus Enable, STR = Cache Store, DDO = Direct Data Out

[Circuit diagram. Labels: CELL, BL, BL2, DB, DDO, CS, SAE, BLSA, SE BLSA, WBE, STR.]
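The rate matching can be pictured with a small behavioral model: narrow 32b words arrive from the RISC side and leave as one 512b word on the wide side. This is only a sketch under assumptions. The class name, the FIFO behavior and the little-endian packing order are hypothetical, and the real block is a dual-port SRAM with flow control rather than a Python list.

```python
class BandwidthEqualizer:
    """Toy model: collects 32b writes and hands them out as 512b words."""

    NARROW_BITS = 32
    WIDE_BITS = 512
    WORDS_PER_WIDE = WIDE_BITS // NARROW_BITS  # 16 narrow words per wide word

    def __init__(self):
        self._words = []

    def write_narrow(self, word):
        """80MHz side: store one 32b word (extra high bits are masked off)."""
        self._words.append(word & 0xFFFFFFFF)

    def read_wide(self):
        """20MHz side: return one 512b word, or None if fewer than 16 words are buffered."""
        if len(self._words) < self.WORDS_PER_WIDE:
            return None
        chunk = self._words[:self.WORDS_PER_WIDE]
        self._words = self._words[self.WORDS_PER_WIDE:]
        wide = 0
        for i, w in enumerate(chunk):
            wide |= w << (self.NARROW_BITS * i)  # assumed order: first word in the low bits
        return wide


# Filling one wide word takes 16 cycles at 80MHz, i.e. 4 cycles of the 20MHz clock.
fast_cycles = BandwidthEqualizer.WORDS_PER_WIDE
slow_cycles = fast_cycles * 20 // 80
print(fast_cycles, slow_cycles)
```

The point of the model is the ratio: the narrow port must run for several slow-clock cycles to assemble what the wide port consumes in a single access.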

Page 9: Motion Compensation (MC) Accelerator

[Block diagram. Labeled blocks: Pixel Buffer, MUXing Logic, Pixel ALU #0 to #7, Data Alignment, Half-Pel ALU #0 to #7, MUXing Logic, FB Buffer, Adaptive Fetch Control, and two frame buffers, each with its own FB controller.]

• Parallel operation @ 20MHz
• 128b (16 pixels) and 512b data paths
• Frame Buffer #0: 512b x 128 rows x 9 banks
• Frame Buffer #1: 512b x 128 rows x 9 banks
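For readers unfamiliar with what a half-pel stage computes, the sketch below shows the bilinear half-sample interpolation used in MPEG-4 motion compensation, written as plain software. It illustrates the arithmetic only and says nothing about how the Half-Pel ALUs are built; the function name, argument layout and the list-of-rows frame representation are assumptions.

```python
def half_pel_block(ref, x, y, half_x, half_y, size=8, rounding=0):
    """Predict a size x size block from reference frame `ref` (list of pixel rows).

    (x, y) is the integer part of the block position; half_x / half_y select a
    half-sample offset in each direction. `rounding` is the rounding-control bit (0 or 1).
    The caller must ensure ref has one extra column/row when a half offset is used.
    """
    out = []
    for j in range(size):
        row = []
        for i in range(size):
            a = ref[y + j][x + i]
            if half_x and half_y:
                b = ref[y + j][x + i + 1]
                c = ref[y + j + 1][x + i]
                d = ref[y + j + 1][x + i + 1]
                value = (a + b + c + d + 2 - rounding) >> 2
            elif half_x:
                value = (a + ref[y + j][x + i + 1] + 1 - rounding) >> 1
            elif half_y:
                value = (a + ref[y + j + 1][x + i] + 1 - rounding) >> 1
            else:
                value = a  # full-pel position: plain copy
            row.append(value)
        out.append(row)
    return out


ref = [[10, 20], [30, 40]]
print(half_pel_block(ref, 0, 0, True, True, size=1))
```

Each output pixel depends only on a 2 x 2 neighborhood, so eight pixels can be computed independently in parallel lanes.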

Page 10: Frame Buffer for MCA

• 9 banks (#0 to #8) with 128b I/O
• Sub-wordline structure with partial activation and partial I/O scheme

[Circuit diagram. Labels: GWL driver, SWL drivers, S/A, DB S/A (x32), Partial Activation Control, Partial I/O Control, RXPA Cont, SWDL/GWL, 128b I/O.]

Page 11: Spatial Locality

[Figure: reference blocks for macroblocks at MB Addr = N and MB Addr = N+1, marking the blocks to be reconstructed, the previously used blocks, the newly needed blocks, and the commonly used (re-usable) blocks.]

[Figure: distribution of motion vectors (MVx, MVy) for Class A/B; both axes have ticks at ±4, ±8 and ±16.]

• 70~90% are confined in an 8x8 boundary
• Large spatial locality

Page 12: Distributed Nine-Tiled Block Mapping: Low Power Technique (1)

[Figure: blocks of the frame image are distributed over Bank #0 to Bank #8 (9 banks in 1 macro, BK#0 to BK#8). Tile labels shown: 0000 1111 2222 / 3333 4444 5555 / 6666 7777 8888. A 1-bank structure, which suffers row conflicts, is shown for comparison.]

• A block in a row
• Increasing re-usability
• Minimizing cell core activation in DRAM
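One way to picture a nine-tiled mapping is to assign each block to a bank by its position modulo 3 in each direction, so that neighboring blocks always land in different banks. The slide does not give the actual address mapping, so the formula below is an assumption chosen to match the 3 x 3 tile labels, and the function names are hypothetical.

```python
def bank_of(block_x, block_y):
    """Assumed mapping: bank index 0..8 from block coordinates, repeating every 3 x 3 tile."""
    return (block_y % 3) * 3 + (block_x % 3)


def banks_touched(block_x, block_y, span=2):
    """Banks hit by a span x span group of adjacent blocks starting at (block_x, block_y)."""
    return {
        bank_of(block_x + dx, block_y + dy)
        for dy in range(span)
        for dx in range(span)
    }


# A reference region that straddles four neighboring blocks hits four different banks,
# so no two of its blocks compete for a row in the same bank.
print(sorted(banks_touched(4, 7)))
print(all(len(banks_touched(x, y)) == 4 for x in range(6) for y in range(6)))
```

With a single bank, the same four blocks would sit in different rows of one array and force repeated row activations, which is the row-conflict case the slide contrasts against.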

Page 13: Partial Activation Scheme: Low Power Technique (2)

[Figure: Banks #0 to #3, each divided into segments 0 to 3 and served by GWL drivers, drawn for normal operation and for SAM transfer. The screen area is marked to separate necessary data from unnecessary data, and partial activation is applied to the necessary data.]

• DNTBM + partial activation: up to 31% power reduction compared with the 1-bank structure

Page 14: Adaptive Fetch Control Scheme: Low Power Technique (3)

[Figure: block-by-block reconstruction, with a reference region assembled from four parts (1 + 2 + 3 + 4). The FB Buffer feeds PE #0 to PE #7 through the muxing logic under adaptive fetch control; fetched data is marked as valid data or garbage data.]

• No switching in the datapath

Page 15: 3D Rendering Engine

[Block diagram. Labeled blocks: Bandwidth Equalizer, Polygon Buffer, 1 Edge Processor (left and right edges; R, G, B, X, Y, Z), 8 Pixel Processors (PP0 to PP7), Frame-Buffer Interface (1280b, 20MHz).]

• 1 edge processor and 8 pixel processors
• Calculating 8 pixels/cycle
• Parallel datapath for RGB and Z
  – R/G/B unit: shading, blending
  – Z-unit: depth comparison
  – The old pixel (RGBZ) is updated with the new pixel (RGBZ)
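The per-pixel work of a pixel processor can be sketched in software as a depth comparison followed by alpha blending and an update. This is a generic rasterization model, not the chip's datapath: the "smaller Z is closer" convention, the 8b alpha range and the function name are assumptions, and shading (color interpolation along a span) is left out.

```python
def process_pixel(old, new, alpha):
    """Return the pixel to store, given old and new (r, g, b, z) tuples.

    Colors are 8b per channel, z is a 16b depth, alpha is 0..255.
    Assumed convention: a smaller z means closer to the viewer.
    """
    old_r, old_g, old_b, old_z = old
    new_r, new_g, new_b, new_z = new

    # Z-unit: depth comparison. A hidden pixel leaves the frame buffer unchanged.
    if new_z >= old_z:
        return old

    # R/G/B unit: alpha blending of the new color over the old one.
    def blend(n, o):
        return (n * alpha + o * (255 - alpha)) // 255

    return (blend(new_r, old_r), blend(new_g, old_g), blend(new_b, old_b), new_z)


# Peak pixel rate implied by the slide: 8 pixels per cycle at 20MHz.
pixels_per_sec = 8 * 20_000_000
print(process_pixel((0, 0, 0, 100), (255, 255, 255, 50), 128), pixels_per_sec)
```

The color and depth computations do not depend on each other until the final update, which is why they can run in parallel units.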

Page 16: ViSTA Architecture

• Virtually Spanning 2D Array (ViSTA) Architecture

Previous Work (ISSCC 2000, TP 14.7)
• 8 EPs
• 64 PPs
• Parallel EPs

This Work (ViSTA)
• 1 EP (1/8 scaling)
• 8 PPs
• 8-stage pipelined EP
• Virtually spans the 2D array
• Dynamic bus reconfiguration

[Figure: for the previous work, a 2D array of EP, PP and memory (M) blocks with a control block; for this work, a single EP with PPs, an interface and memory (M) blocks.]

Page 17: Frame Buffer for 3DRE

[Block diagram. Labels: twelve 512kb DRAM blocks, Frame Buffer #0, Frame Buffer #1, Depth Buffer, FBI, SAM, SCLK. Bus widths shown: 256b, 384b, 640b (write and read, from the pixel processors), 768b.]

• Interchangeable double color buffers
• Depth buffer
• 24b true color output
• Concurrent data transfer
• 1280b x 20MHz = 3.2GB/sec
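The bandwidth figure can be checked directly. The sketch below also notes how the bus widths on the diagram add up; reading 384b as the color path, 256b as the depth path and the two 640b paths as read plus write is an interpretation of the labels, not something the slide states.

```python
# Peak frame-buffer bandwidth quoted on the slide.
BUS_BITS = 1280
CLOCK_HZ = 20_000_000

bytes_per_sec = BUS_BITS * CLOCK_HZ // 8
print(bytes_per_sec / 1e9)  # GB/sec (decimal gigabytes)

# Widths shown on the diagram (interpretation is an assumption, see above).
color_bits = 384   # e.g. 16 pixels x 24b color
depth_bits = 256   # e.g. 16 pixels x 16b depth
one_direction = color_bits + depth_bits
both_directions = 2 * one_direction
print(one_direction, both_directions)
```

Under that reading, each direction moves 640b per cycle and the two directions together account for the 1280b total.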

Page 18: Single Bitline Writing Scheme: Low Power Technique (4)

[Waveform figure: WL, BL (real) and /BL (reference) between GND, Vcc/2 and Vcc, with control signals BIS_0 and BIS_1.]

• Single bitline writing: no transitions in /BL
• 20% power reduction in data sensing

Power for an 8Kb cell array with 2K columns (rows in the order shown on the slide):

Periphery & Control   Data Sensing
30.02 mW              19.3 mW
30.47 mW              15.0 mW
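The quoted saving can be related to the four power numbers on this slide. The sketch assumes that the first pair (30.02 mW, 19.3 mW) is the conventional case and the second pair (30.47 mW, 15.0 mW) is single bitline writing, each pair being periphery & control followed by data sensing. That pairing is an inference from the "20% power reduction in data sensing" label, and the variable names are illustrative.

```python
# Assumed pairing of the slide's numbers (see lead-in): (periphery & control, data sensing) in mW.
conventional = (30.02, 19.3)
single_bitline = (30.47, 15.0)

sensing_reduction = (conventional[1] - single_bitline[1]) / conventional[1]
total_reduction = (sum(conventional) - sum(single_bitline)) / sum(conventional)

print(f"data sensing: {sensing_reduction:.1%}")
print(f"array total:  {total_reduction:.1%}")
```

Under this pairing the data-sensing power drops by roughly a fifth, in line with the slide's claim, while the periphery and control power stays almost unchanged, so the reduction for the whole array is smaller.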

Page 19: System Power Consumption

[Bar chart of power (mW), broken down into logic, data I/O and eDRAM macro, for three systems: Conventional Design I (external frame buffer), Conventional Design II (embedded frame buffer), and the proposed system (this work). Values marked on the chart: 1000, 400~700, and 160mW for this work. An annotation reads "By Embedding DRAM".]

Page 20: Die Photograph

[Die photograph. Labeled regions: 3D Rendering Engine, DLL, 32bit RISC, MCA, Bandwidth Equalizer, MC Frame Buffer #1, MC Frame Buffer #2, 3DRE Frame Buffer, 3DRE Z-Buffer, Internal DRAM, SAM, YCrCb to RGB, SAM.]

• 0.18um EML technology with 3-poly, 6-metal
• 240-pin QFP
• 84mm2 (14 x 7 including I/O cells)

Page 21: Chip Features (Physical)

Blocks shown: MC Accelerator, Frame Buffer, 3D Rendering Engine, Frame Buffer, B.W. Equalizer, ARM9

Per-block figures (in the order they appear on the slide):

Clock      Supply   Power     Area
80MHz      1.5V     12mW      1.7mm2
20MHz      1.5V     4.6mW     2.3mm2
20MHz      2.5V     11.7mW    5.25mm2
20MHz      1.5V     36mW      5mm2
80MHz      2.5V     84mW      16.4mm2
80/20MHz   1.5V     4mW       1.6mm2

Subtotals marked on the slide: < 40mW and < 140mW

• I/O cells: 3.3V
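As a consistency check, the six per-block power and area figures on this slide can be summed and compared with the headline numbers (160mW, 84mm2). The sketch uses only the values listed above; which block each figure belongs to does not matter for the totals, and the variable names are illustrative.

```python
# Per-block power (mW) and area (mm2) figures, in the order listed on the slide.
power_mw = [12, 4.6, 11.7, 36, 84, 4]
area_mm2 = [1.7, 2.3, 5.25, 5, 16.4, 1.6]

total_power = round(sum(power_mw), 2)
total_area = round(sum(area_mm2), 2)

print(total_power)  # block power total, mW
print(total_area)   # block area total, mm2
```

The block powers add up to a little over 150mW, close to the 160mW headline, and the block areas cover well under half of the 84mm2 die, which also includes I/O cells and other circuitry not itemized here.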

Page 22: Conclusions

• Low Power Multimedia Processor for Mobile Applications
  – Optimized H/W & S/W mixed system
  – Multimedia signal processing
    • Not only 2D image, but also 3D graphics
  – Low power techniques
    • Distributed Nine-Tiled Block Mapping
    • Partial Activation, Partial I/O Scheme
    • Adaptive Fetch Control Scheme
    • Single Bitline Writing Scheme
• 160mW, 84mm2