H.264 Intra Frame Coder System Design Özgür Taşdizen Microelectronics Program at Sabanci University 4/8/2005

Page 1

H.264 Intra Frame Coder System Design

Özgür Taşdizen

Microelectronics Program at Sabanci University

4/8/2005

Page 2

OUTLINE

• Introduction

• Hardware Architectures For Intra Frame Coder Modules

• Top Level Intra Frame Coder Hardware

• H.264 Intra Frame Coder System

• Conclusions and Future Work

Page 3

H.264 VIDEO CODING STANDARD

[Figure: timeline of video coding standards from 1984 to 2004, grouped by body: ITU-T (H.261, H.263, H.263+, H.263++), MPEG (MPEG-1, MPEG-4) and joint ITU-T / MPEG (H.262 / MPEG-2, H.264 / MPEG-4 Part 10)]

• The latest video coding standard

• Developed jointly by ITU-T and MPEG

• Includes 3 profiles and 14 levels

Page 4

H.264 VIDEO CODING STANDARD

It Provides Significant Performance Gains

90-minute DVD-quality movie (download time at 700 kbps):

Coder | Bandwidth required (Mbps) | Storage utilization (MB) | Download time (minutes)
MPEG-2 | 3.0 | 2025 | 386
MPEG-4 (ASP) | 1.8 | 1234 | 235
H.264 | 1.1 | 727 | 139

Average bit rate savings of H.264 relative to:

MPEG-4 ASP | H.263 HLP | MPEG-2
38.62% | 48.80% | 64.46%
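The columns of the movie table are linked by simple arithmetic: storage is bit rate times running time, and download time is storage divided by the 700 kbps link rate. A minimal sketch, assuming decimal units (1 MB = 8 Mb, 1 Mb = 1000 kb); the function names are illustrative. It reproduces the MPEG-2 row exactly; the other rows agree to within the rounding of the plotted values (for example, 1234 MB corresponds to about 1.83 Mbps).

```python
MOVIE_SECONDS = 90 * 60   # 90-minute movie
LINK_KBPS = 700           # download link rate used on the slide

def storage_mb(bitrate_mbps: float) -> float:
    """Megabytes needed for the whole movie (assumes 1 MB = 8 Mb)."""
    return bitrate_mbps * MOVIE_SECONDS / 8

def download_minutes(size_mb: float) -> float:
    """Minutes needed to transfer size_mb megabytes over the link."""
    kilobits = size_mb * 8 * 1000
    return kilobits / LINK_KBPS / 60

# MPEG-2 row: 3.0 Mbps -> 2025 MB -> about 386 minutes
mpeg2_mb = storage_mb(3.0)
print(round(mpeg2_mb), round(download_minutes(mpeg2_mb)))   # 2025 386
```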

Page 5

H.264 Encoder Block Diagram

[Figure: H.264 encoder block diagram with the blocks Current Frame, Reference Frame, Motion Estimation, Motion Compensation, Intra Prediction, Choose Intra Mode, Mode Decision, Transform, Quant, Reorder, Entropy Coder, Inverse Quant, Inverse Transform, Deblocking Filter and Reconstructed Frame. The prediction is subtracted from the current frame to form the residue and added back after inverse quantization and inverse transform to form the reconstruction. A region of the diagram is labelled Intra Frame Coder.]

Page 7

Transform and Quantization Algorithms

[Figure: transform and quantization data flow linking Residue, VLC and Reconstruction through the Forward Transform, Hadamard Transform, Quantizer, Inverse Quantizer, Inverse Hadamard Transform and Inverse Transform blocks]

Page 8

H.264 Transform Algorithm

[Figure: 4x4 forward integer transform, 4x4 Hadamard transform, 2x2 Hadamard transform and 4x4 inverse integer transform]

• A multiply-free 4x4 integer transform is used; it requires only additions and shifts.

• For 16x16 intra-coded luminance blocks and for 8x8 chrominance blocks, a second transform, the Hadamard transform, is applied to the DC coefficients.
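The multiply-free property can be seen by writing the forward core transform Y = Cf X Cf^T, with Cf the standard H.264 4x4 forward matrix, as butterflies. The following Python sketch is one common way to realise it and is not necessarily the structure of this hardware; the function names are illustrative.

```python
# Forward 4x4 core transform Y = Cf * X * Cf^T, where
# Cf = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]].
# Only additions, subtractions and left shifts by one are used.

def fwd_1d(a, b, c, d):
    """1-D forward core transform of four samples."""
    s03, d03 = a + d, a - d
    s12, d12 = b + c, b - c
    return (s03 + s12,           # row [1  1  1  1]
            (d03 << 1) + d12,    # row [2  1 -1 -2]
            s03 - s12,           # row [1 -1 -1  1]
            d03 - (d12 << 1))    # row [1 -2  2 -1]

def forward_4x4(block):
    """block: 4x4 list of ints (residue). Returns the 4x4 coefficients Y."""
    # Vertical pass on each column: computes Cf * X.
    cols = [fwd_1d(*(block[r][c] for r in range(4))) for c in range(4)]
    cf_x = [[cols[c][r] for c in range(4)] for r in range(4)]
    # Horizontal pass on each row of Cf * X: computes (Cf * X) * Cf^T.
    return [list(fwd_1d(*row)) for row in cf_x]

# A single non-zero sample in the top-left corner spreads to every coefficient.
print(forward_4x4([[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]))
# [[1, 2, 1, 1], [2, 4, 2, 2], [1, 2, 1, 1], [1, 2, 1, 1]]
```

The shift by one replaces the only non-unit entries of Cf (the 2s), which is why no multiplier is needed.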

Page 9

H.264 Transform Algorithm

Block numbering within a macroblock (block -1 holds the 16 luma DC coefficients; blocks 16 and 17 hold the DC coefficients of the two chroma components):

LUMA
 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

CHROMA CB
18 19
20 21

CHROMA CR
22 23
24 25

• The 4x4 forward integer transform is applied to all blocks except -1, 16 and 17.

• The 4x4 Hadamard transform is applied to block -1 if the intra 16x16 mode is selected.

• The 2x2 Hadamard transform is applied to blocks 16 and 17.
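The block numbering on this slide places luma blocks in Z order inside each 8x8 quadrant. The sketch below, with illustrative helper names, maps a luma block index to its position, gathers the 16 DC values into block -1, and applies the unnormalised Hadamard transform (4x4 for luma DC, 2x2 for each chroma DC block). Any normalisation, such as the halving of the 4x4 result used in some descriptions, is deliberately left out.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]
H2 = [[1, 1],
      [1, -1]]

def matmul(a, b):
    """Plain integer matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def luma_block_position(k):
    """(row, col) of luma block k (0..15) in the 4x4 grid of blocks."""
    quadrant, sub = divmod(k, 4)       # 8x8 quadrant, then 4x4 block inside it
    row = 2 * (quadrant // 2) + sub // 2
    col = 2 * (quadrant % 2) + sub % 2
    return row, col

def luma_dc_block(dc_by_index):
    """Arrange the DC coefficients of blocks 0..15 spatially (block -1)."""
    grid = [[0] * 4 for _ in range(4)]
    for k, dc in enumerate(dc_by_index):
        r, c = luma_block_position(k)
        grid[r][c] = dc
    return grid

def hadamard(block, h):
    """Unnormalised Hadamard transform H * block * H (H is symmetric)."""
    return matmul(matmul(h, block), h)

print(luma_dc_block(list(range(16))))   # reproduces the LUMA numbering grid
print(hadamard([[1, 2], [3, 4]], H2))   # [[10, -2], [-4, 0]]
```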

Page 10

Transform Hardware

Register 0 stores: (x0+x4+x8+x12)

Register 1 stores: (x1+x5+x9+x13)

Register 2 stores: (x2+x6+x10+x14)

Register 3 stores: (x3+x7+x11+x15)

Register 4 stores the result of the transform operations.

Pipelining registers are used to increase the maximum clock frequency.

Transform outputs formed from Registers 0 to 3:

(x0+x4+x8+x12) + (x1+x5+x9+x13) + (x2+x6+x10+x14) + (x3+x7+x11+x15)

2*(x0+x4+x8+x12) + (x1+x5+x9+x13) - (x2+x6+x10+x14) - 2*(x3+x7+x11+x15)

(x0+x4+x8+x12) - (x1+x5+x9+x13) - (x2+x6+x10+x14) + (x3+x7+x11+x15)

(x0+x4+x8+x12) - 2*(x1+x5+x9+x13) + 2*(x2+x6+x10+x14) - (x3+x7+x11+x15)
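The register scheme can be sketched directly. Assuming the input block x0..x15 is in raster order (x[4*r + c]), Registers 0 to 3 hold the column sums, and the four expressions above, built only from additions and shifts of those sums, equal the first coefficient row (Y00 to Y03) of Y = Cf X Cf^T. The indexing assumption and the function names are illustrative, not taken from the hardware description.

```python
def column_sums(x):
    """Registers 0-3: reg[c] = x[c] + x[c+4] + x[c+8] + x[c+12]."""
    return [x[c] + x[c + 4] + x[c + 8] + x[c + 12] for c in range(4)]

def first_row(x):
    """The four output expressions, formed from the register contents."""
    r0, r1, r2, r3 = column_sums(x)
    return [r0 + r1 + r2 + r3,
            (r0 << 1) + r1 - r2 - (r3 << 1),
            r0 - r1 - r2 + r3,
            r0 - (r1 << 1) + (r2 << 1) - r3]

print(first_row(list(range(16))))   # [120, -28, 0, -4]
```

Sharing the four column sums across all four outputs is what lets a single set of registers feed the whole row.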

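Page 11

Quantization Hardware

QP ranges from 0 to 51, and $qbits = 15 + \lfloor QP/6 \rfloor$. MF and V are scaling factors that depend on QP mod 6 and on the coefficient position (the (0,0) position is used for DC coefficients).

Quantization

AC coefficients: $|Z_{ij}| = (|W_{ij}| \cdot MF + f) \gg qbits$, $\operatorname{sign}(Z_{ij}) = \operatorname{sign}(W_{ij})$

DC coefficients: $|Z_{ij}| = (|Y_{ij}| \cdot MF + 2f) \gg (qbits + 1)$, $\operatorname{sign}(Z_{ij}) = \operatorname{sign}(Y_{ij})$, where $Y_{ij}$ are the Hadamard-transformed DC coefficients

Inverse Quantization

AC coefficients: $W'_{ij} = Z_{ij} \cdot V \cdot 2^{\lfloor QP/6 \rfloor}$

DC coefficients, where $W_{Q,ij}$ are the DC levels after the inverse Hadamard transform:

If $QP \ge 12$: $W'_{ij} = W_{Q,ij} \cdot V \cdot 2^{\lfloor QP/6 \rfloor - 2}$

Else: $W'_{ij} = \left(W_{Q,ij} \cdot V + 2^{1 - \lfloor QP/6 \rfloor}\right) \gg \left(2 - \lfloor QP/6 \rfloor\right)$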
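A minimal Python sketch of the AC quantiser and inverse quantiser above for one coefficient. The MF and V tables are the values commonly tabulated for H.264 (for example in Richardson's H.264 book), and f = 2^qbits / 3 is the rounding offset often quoted for intra blocks; both are assumptions, since the slides give only the formulas. Function names are illustrative.

```python
# Rows: QP mod 6. Columns: position class 0 (i, j both even),
# 1 (i, j both odd), 2 (otherwise). Assumed standard tables.
MF = [(13107, 5243, 8066), (11916, 4660, 7490), (10082, 4194, 6554),
      (9362, 3647, 5825), (8192, 3355, 5243), (7282, 2893, 4559)]
V = [(10, 16, 13), (11, 18, 14), (13, 20, 16),
     (14, 23, 18), (16, 25, 20), (18, 29, 23)]

def position_class(i, j):
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2

def quantize(w, i, j, qp):
    """|Z| = (|W| * MF + f) >> qbits, sign(Z) = sign(W)."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // 3                 # assumed intra rounding offset
    mf = MF[qp % 6][position_class(i, j)]
    z = (abs(w) * mf + f) >> qbits
    return z if w >= 0 else -z

def dequantize(z, i, j, qp):
    """W' = Z * V * 2**floor(QP/6)."""
    return z * V[qp % 6][position_class(i, j)] << (qp // 6)

print(quantize(100, 0, 0, 0), quantize(100, 0, 0, 6))   # 40 20
```

Raising QP by 6 doubles the quantiser step, so the level halves (40 to 20) while the rescaled value from `dequantize` stays the same (400 in both cases).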

Page 12

Transform and Quantization Hardware

Page 13

Hardware Implementation Results

0.18 µm ASIC implementation:

Design | Critical path delay [ns] | Gate count
Transform part of the datapath | 2.77 | 1978
Datapath | 4.78 | 12773
Datapath + control unit | 4.8 | 23162
Datapath + control unit + input register file + output register file (TQ) | 4.8 | 130505

The 0.18 µm ASIC implementation works at 210 MHz and can code 70 VGA frames per second.

FPGA implementation:

Resource | Excluding I/O register files | Including I/O register files
Function generators | 2497 | 4054
CLB slices | 1249 | 2027
Dffs or latches | 581 | 583
Block multipliers | 1 | 1

The FPGA implementation works at 81 MHz and can code 27 VGA frames per second.

In the worst case, it takes 2500 cycles to complete the TQIQIT (transform, quantization, inverse quantization and inverse transform) operations of a macroblock.
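The quoted frame rates are consistent with a worst-case budget of 2500 cycles per macroblock (16x16 luma plus the two 8x8 chroma blocks): a VGA frame has (640/16) x (480/16) = 1200 macroblocks, so 210 MHz / (2500 x 1200) = 70 frames per second. A small sketch of that arithmetic, with illustrative function names:

```python
def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks in a frame (sizes assumed multiples of 16)."""
    return (width // 16) * (height // 16)

def frames_per_second(clock_hz, cycles_per_mb, width=640, height=480):
    """Frame rate when every macroblock takes cycles_per_mb cycles."""
    return clock_hz / (cycles_per_mb * macroblocks_per_frame(width, height))

print(frames_per_second(210e6, 2500))   # 70.0  (0.18 um ASIC)
print(frames_per_second(81e6, 2500))    # 27.0  (FPGA)
```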

Page 14

Context Adaptive Variable Length Encoder Hardware

1) After prediction, transformation and quantization, blocks are typically sparse, containing mostly zeros.

2) The highest-frequency non-zero coefficients after the zig-zag scan are often sequences of +/-1.

3) The numbers of non-zero coefficients in neighbouring blocks are correlated.

4) The magnitude of non-zero coefficients tends to be higher at the start of the zig-zag scanned array (near the DC coefficient) and lower towards the high frequencies.
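The statistics listed above can be measured on a block before any code tables are involved. This sketch, with illustrative helper names, applies the standard 4x4 frame zig-zag scan, counts the non-zero coefficients, counts the trailing +/-1 values (capped at 3, as CAVLC does), and predicts the non-zero count from the left (nA) and upper (nB) neighbours for the case where both are available; the other availability cases are not covered here.

```python
# Raster indices of a 4x4 block in zig-zag (frame) scan order.
ZIGZAG = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]

def zigzag_scan(block):
    """block: 16 coefficients in raster order -> zig-zag order."""
    return [block[k] for k in ZIGZAG]

def total_coeffs(scanned):
    """Number of non-zero coefficients in the block."""
    return sum(1 for v in scanned if v != 0)

def trailing_ones(scanned):
    """Consecutive +/-1 values at the high-frequency end, capped at 3 (item 2)."""
    count = 0
    for v in reversed([v for v in scanned if v != 0]):
        if abs(v) != 1 or count == 3:
            break
        count += 1
    return count

def predicted_nc(n_a, n_b):
    """Neighbour-based prediction when both neighbours are available (item 3)."""
    return (n_a + n_b + 1) >> 1

scanned = [0, 3, -1, 0, 0, -1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(total_coeffs(scanned), trailing_ones(scanned))   # 5 3
```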

Page 15

Intra Prediction Hardware

[Figure: intra prediction hardware block diagram with a top-level mode controller; datapaths and controllers for the 4x4 luma, 16x16 luma and 8x8 chroma prediction modes; a MUX; neighbouring buffers and internal buffers for reconstructed pixels; address generation hardware; a 384x8 prediction buffer; inputs from the top level; and the output]

• 9 prediction modes for 4x4 luma blocks

• 4 prediction modes for 16x16 luma and 8x8 chroma blocks
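To illustrate what the 4x4 luma datapath computes, here is a sketch of three of the nine 4x4 modes (0: vertical, 1: horizontal, 2: DC), assuming both the upper and left neighbours are available. `above` and `left` are the four reconstructed neighbouring pixels; the names are illustrative. SAD is used only to keep the example short; the mode decision in this design is SATD based.

```python
def predict_vertical(above, left):
    """Mode 0: copy the row of pixels above into every row."""
    return [list(above) for _ in range(4)]

def predict_horizontal(above, left):
    """Mode 1: copy each left-neighbour pixel across its row."""
    return [[left[r]] * 4 for r in range(4)]

def predict_dc(above, left):
    """Mode 2: rounded mean of the 8 neighbours (both sides available)."""
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]

def sad(block, pred):
    """Sum of absolute differences between a block and a prediction."""
    return sum(abs(b - p) for br, pr in zip(block, pred) for b, p in zip(br, pr))

above, left = [10, 20, 30, 40], [1, 2, 3, 4]
print(predict_dc(above, left)[0][0])   # 14
```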

Page 17

Top Level Intra Frame Coder Hardware

[Figure: top-level architecture (input register file, search hardware, pipelining register file, coder hardware and output register file) and a timing chart in which the search hardware and the coder hardware work on successive macroblocks (1st to 4th MB) in a pipeline; time axis in cycles, marked at 4000, 8000, 12000 and 16000]

Clock cycles available per macroblock:

Level | @30 MHz | @40 MHz | @50 MHz | @60 MHz | @70 MHz | @80 MHz
2.0 (CIF @ 30 fps) | 2525 | 3367 | 4208 | 5050 | 5892 | 6734

CIF @ 30 fps requires processing 11880 macroblocks per second (396 macroblocks per frame).
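The table entries follow from dividing the clock frequency by the required macroblock rate. A CIF frame (352 x 288) contains 22 x 18 = 396 macroblocks, so 30 fps needs 396 x 30 = 11880 macroblocks per second; floor division then reproduces every entry of the Level 2.0 row. A short sketch, with illustrative function names:

```python
def mbs_per_second(width, height, fps):
    """Macroblock rate for a given resolution and frame rate."""
    return (width // 16) * (height // 16) * fps

def cycles_per_mb(clock_hz, width=352, height=288, fps=30):
    """Whole clock cycles available per macroblock (CIF @ 30 fps by default)."""
    return clock_hz // mbs_per_second(width, height, fps)

for mhz in (30, 40, 50, 60, 70, 80):
    print(mhz, cycles_per_mb(mhz * 1_000_000))
# 30 2525, 40 3367, 50 4208, 60 5050, 70 5892, 80 6734
```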

Page 18

Search Hardware

[Figure: search hardware block diagram with intra prediction and Hadamard transform units for the luma 4x4, luma 16x16 and chroma 8x8 modes, fed by neighbouring pixels; 384 x 8 and 256 x 8 register files holding the current MB, the predicted MB and the residue; a register for the 16 DC coefficients; a mode decision unit with a QP input; and a MUX]

Page 19

Mode Decision

SATD based mode decision algorithm:

1) Compute the cost of each 4x4 mode and select the 4x4 mode with the lowest cost.

2) Compute the cost of each 16x16 mode and select the 16x16 mode with the lowest cost.

3) Compute the cost of each 8x8 chroma mode and select the 8x8 mode with the lowest cost.

4) Compare the selected 4x4 and 16x16 costs and select the best mode.

5) Start the coder hardware with the selected mode information.

Intra 4x4 vs Intra 16x16 Cost Comparator:

Cycle 1: Register = 8·x

Cycle 2: Register = 16·x

Cycle 3: Register = 24·x

Cycle 4: Register = Cost4x4 + 24·x

Cycle 5: Register = Cost16x16 - (Cost4x4 + 24·x)

[Figure: comparator datapath built from a left shift by 3, a MUX, an add/sub unit and a register, with Cost4x4 and Cost16x16 inputs; signal widths of 9, 18 and 19 bits are marked]
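The five comparator cycles can be sketched with the same shift and add/sub operations. Here x is the 9-bit comparator input from the slide; treating the total Cost4x4 as the sum of the best per-block costs, and choosing Intra 16x16 only when it is strictly cheaper, are assumptions about details the slide does not state. The function names are illustrative.

```python
def best_mode(costs):
    """Steps 1-3: (index, cost) of the cheapest mode; ties keep the lower index."""
    mode = min(range(len(costs)), key=costs.__getitem__)
    return mode, costs[mode]

def compare_4x4_vs_16x16(cost4x4, cost16x16, x):
    """Step 4, following the five comparator cycles."""
    reg = x << 3                    # cycle 1: 8x
    reg += x << 3                   # cycle 2: 16x
    reg += x << 3                   # cycle 3: 24x
    reg += cost4x4                  # cycle 4: Cost4x4 + 24x
    result = cost16x16 - reg        # cycle 5: Cost16x16 - (Cost4x4 + 24x)
    return "I16x16" if result < 0 else "I4x4"   # tie handling assumed

# Hypothetical costs: best 4x4 cost of each of the 16 blocks, summed.
per_block = [[70, 65, 90]] * 16
cost4x4 = sum(best_mode(c)[1] for c in per_block)      # 16 * 65 = 1040
print(compare_4x4_vs_16x16(cost4x4, 1200, 10))         # I16x16
```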

Page 20

High Speed Hadamard Transform Hardware

• Performs the SATD computation

• Requires only 18 cycles for a 4x4 block (z0 to z15)

[Figure: datapath for z0 to z15 built from add/sub units, an input register and pipeline registers]

• 13-bit adders/subtractors

• Two-stage pipeline
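SATD is the sum of absolute values of the Hadamard-transformed residue. The sketch below uses a row pass and a column pass of add/sub butterflies, which loosely mirrors the two add/sub stages in the figure; it is not a model of the 18-cycle schedule. Some encoders scale the SATD (for example by halving it); no scaling is applied here. The comments also show why 13-bit adders suffice for 9-bit residues.

```python
def hadamard_1d(a, b, c, d):
    """4-point Hadamard with rows [1 1 1 1], [1 1 -1 -1], [1 -1 -1 1], [1 -1 1 -1]."""
    s0, s1 = a + b, c + d
    d0, d1 = a - b, c - d
    return s0 + s1, s0 - s1, d0 - d1, d0 + d1

def satd_4x4(residue):
    """residue: 4x4 list of ints (current minus predicted pixels)."""
    # Row pass: 9-bit inputs (-255..255) grow to at most +/-1020.
    rows = [hadamard_1d(*row) for row in residue]
    # Column pass: values reach at most +/-4080, which fits in 13 signed bits.
    cols = [hadamard_1d(*(rows[r][c] for r in range(4))) for c in range(4)]
    return sum(abs(v) for col in cols for v in col)

print(satd_4x4([[255] * 4 for _ in range(4)]))   # 4080
```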

Page 21

Coder Hardware

[Figure: coder hardware block diagram with intra prediction, residue, transform, quant, Hadamard transform (HT), inverse Hadamard transform (IHT), inverse quant, inverse transform, reconstruct and CAVLC, which produces the bitstream; storage includes 384 x 8 register files for the current, predicted and reconstructed MB, a 384 x 9 residue register file, and 384 x 16, 16 x 16 and 192 x 32 register files]
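The register file widths follow from pixel arithmetic. Current and predicted pixels are 8-bit (0 to 255), so their difference lies in -255..255 and needs a 9-bit signed word, which matches the 384 x 9 residue file. Reconstruction adds the decoded residue back and clips to 0..255, so the reconstructed MB fits in 8 bits again. A small sketch with illustrative names:

```python
def residue(current, predicted):
    """Per-pixel difference of two 8-bit pixel lists."""
    return [c - p for c, p in zip(current, predicted)]

def reconstruct(predicted, decoded_residue):
    """Add the decoded residue back and clip to the 8-bit pixel range."""
    return [min(255, max(0, p + r)) for p, r in zip(predicted, decoded_residue)]

def fits_signed(value, bits):
    """True if value is representable as a two's-complement integer of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

print(residue([0, 255], [255, 0]))   # [-255, 255], both fit in 9 signed bits
```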

Page 22

Scheduling of Intra 4x4 modes

Worst-case cycle counts required to complete a 4x4 block: TQIQIT = 100, CAVLC = 120, Residue & Reconstruction = 18, Intra Prediction = 24

[Figure: timeline for the 1st and 2nd 4x4 blocks through intra prediction, residue, TQ, IQIT, reconstruction and CAVLC, with cycle marks at 0, 24, 42, 86, 142, 160, 202, 246, 302 and 320]
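The cycle marks of the intra 4x4 schedule can be reproduced by running the stages of each block back to back: 24 (intra prediction) + 18 (residue) + 100 (TQIQIT) + 18 (reconstruction) = 160 cycles per block. The sketch assumes the next block cannot start until the previous one is reconstructed, since its prediction uses those pixels; that the 100 TQIQIT cycles split into 44 for TQ and 56 for IQIT is read from the marks 42, 86 and 142 and is an assumption. CAVLC (120 cycles) is assumed to run in parallel and is left off the serial path.

```python
STAGES = [("intra prediction", 24), ("residue", 18), ("TQ", 44),
          ("IQIT", 56), ("reconstruction", 18)]

def schedule(num_blocks):
    """(block, stage, start, end) cycle intervals for consecutive 4x4 blocks."""
    t, out = 0, []
    for blk in range(num_blocks):
        for name, cycles in STAGES:
            out.append((blk, name, t, t + cycles))
            t += cycles
    return out

ends = [end for _, _, _, end in schedule(2)]
print(ends)   # [24, 42, 86, 142, 160, 184, 202, 246, 302, 320]
```

Under these assumptions, the 16 luma 4x4 blocks of a macroblock take 16 x 160 = 2560 cycles.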

Page 23

Scheduling of Intra 16x16 modes

[Figure: timeline for the 1st, 2nd and 16th 4x4 blocks through intra prediction, residue, TQ, HT, IQIT, reconstruction and CAVLC; time axis in cycles from 0 to 1040]

Page 24

Implementation Results for H.264 Intra Frame Coder Hardware

• Synthesized at 61.4 MHz and placed & routed at 53.8 MHz.

• The total equivalent gate count is 1,051,458.

Device Utilization for XC2V8000 FPGA:

Resource | Used | Available | Utilization
IOs | 418 | 1108 | 37.73%
Global buffers | 2 | 16 | 12.50%
Function generators | 21404 | 93184 | 22.97%
CLB slices | 10702 | 46592 | 22.97%
Dffs or latches | 3881 | 96508 | 4.02%
Block RAMs | 1 | 168 | 0.60%
Block multipliers | 1 | 168 | 0.60%

Page 26

System Overview

• A PC is used to develop the Verilog modules and debug the system

• A Multi-ICE debugger communicates with the development board

• The development board is used to test the designed hardware

• A color LCD panel is used for visual verification

Page 27

ARM-based Development Platform

[Figure: development platform with labelled components: Logic Tile, Versatile Platform Baseboard, ARM926EJ-S processor based development chip, Xilinx Virtex-II 8000 FPGA and Xilinx Virtex-II 2000 FPGA]

Page 28

Development Chip

Page 29

ARM AMBA 2.0

Page 30

Software Implementation

[Figure: processing flow: capture the image in RGB format; convert it from RGB to YCbCr format with 4:2:0 sampling; partition the image into macroblocks; pass the macroblocks through SRAM to the H.264 intra frame coder hardware and read the results back through SRAM; reconstruct the image in raster-scan order; convert it from YCbCr back to RGB format; display the reconstructed image]

• Matlab and C code were developed

• The ARM AXD tool is used to debug the system

• The C code runs on the ARM926EJ-S processor

• The SRAM available on the Logic Tile is used to store the image data
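The pre-processing steps in the flow can be sketched as follows for a frame held as nested lists of (R, G, B) tuples. This is a Python illustration, not the project's C code. The slides do not specify the colour conversion or the chroma down-sampling filter: the full-range BT.601 (JPEG/JFIF) coefficients and plain 2x2 averaging used here are assumptions, and the function names are illustrative.

```python
def clip8(v):
    """Round and clip to the 8-bit range."""
    return max(0, min(255, round(v)))

def rgb_to_ycbcr(r, g, b):
    # Assumed full-range BT.601 (JPEG/JFIF) coefficients.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clip8(y), clip8(cb), clip8(cr)

def subsample_420(plane):
    """Assumed filter: rounded mean of each 2x2 neighbourhood."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) >> 2
             for x in range(0, w, 2)] for y in range(0, h, 2)]

def convert_frame(rgb):
    """RGB frame -> full-resolution Y plus 4:2:0 Cb and Cr planes."""
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in rgb]
    y = [[p[0] for p in row] for row in ycc]
    cb = subsample_420([[p[1] for p in row] for row in ycc])
    cr = subsample_420([[p[2] for p in row] for row in ycc])
    return y, cb, cr

def macroblocks(luma):
    """Yield (mb_row, mb_col, 16x16 block) in raster order; sizes assumed multiples of 16."""
    for my in range(0, len(luma), 16):
        for mx in range(0, len(luma[0]), 16):
            yield my // 16, mx // 16, [row[mx:mx + 16] for row in luma[my:my + 16]]

print(rgb_to_ycbcr(255, 0, 0))   # (76, 85, 255)
```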

Page 31

Hardware Implementation

• The ARM development board implements tri-state AHB buses.

• An AHB master is designed for reading the image data from, and writing it to, the SRAMs available on the Logic Tile.

• Two SRAM controllers are instantiated in the design as slaves on the AHB M1 and AHB M2 buses.

• The system arbiter controls the multiplexing.

Page 32

Design Flow

[Figure: design flow. Synthesis: the HDL files (Verilog modules) and constraints go through Leonardo Spectrum (compiler and logic optimizer, high effort for speed) to produce a netlist for the XC2V8000. Place and route: the netlist and constraints go through Xilinx Project Navigator (translator, mapper, placer and router, high effort for speed) and the bitstream options to produce the resulting bitstream for the XC2V8000. After synthesis and after place and route, a "Constraints met?" check either continues (Yes) or loops back through a "Modify" step (No).]

Page 34

Conclusions

• The transform-quantization architecture is designed and verified to work at 81 MHz.

• Mode decision, intra prediction and CAVLC are integrated.

• The top-level design is synthesized at 61.4 MHz and placed & routed at 53.8 MHz.

• Device utilization for the XC2V8000 FPGA is approximately 23%, with a total equivalent gate count of 1,051,458.

• The H.264 Intra Frame Coder System is verified to work on an ARM Versatile Platform development board.

Page 35

Future Work

• Implementing header generation functionality

• Further verification by decoding the generated bitstream using an H.264 compliant decoder

• Implementing low-power techniques such as clock gating

• Adding a camera to the system for real-time video capturing and coding

• Developing an ASIC implementation and fabricating a prototype

• Creating a complete H.264 video coding system by integrating motion estimation, motion compensation, deblocking filter, intra vs. inter mode decision and rate control units

Page 36

Thanks

Questions...