ecec 453 image processing architecture

Image Processing Architecture, © 2001-2004 Oleh Tretiak Page 1 Lecture 10 ECEC 453 Image Processing Architecture Lecture 10, 2/17/2004 MPEG-2, Industrial Strength Video Compression and Friends Oleh Tretiak Drexel University



TRANSCRIPT

Page 1: ECEC 453 Image Processing Architecture

Image Processing Architecture, © 2001-2004 Oleh Tretiak, Page 1, Lecture 10

ECEC 453 Image Processing Architecture

Lecture 10, 2/17/2004

MPEG-2, Industrial Strength Video Compression and Friends

Oleh Tretiak

Drexel University

Page 2: ECEC 453 Image Processing Architecture

Lecture Outline
- Basic Video Coding
- Features of MPEG-1
- Features of H.261
- MPEG-2
- Introduction to MPEG-4

Page 3: ECEC 453 Image Processing Architecture

Picture of Layers

[Figure: the layered structure of a coded video sequence. Sequence layer: GOP-1, GOP-2, ..., GOP-N. GOP layer: pictures I B B P B B ... P. Picture layer: Slice-1, Slice-2, ..., Slice-N. Slice layer: macroblocks mb-1, mb-2, ..., mb-n. Macroblock layer: numbered Y blocks plus Cr and Cb blocks. Block layer.]

Page 4: ECEC 453 Image Processing Architecture

Video Compression: Picture Types

Group of Pictures: three picture types
- I: intraframe coding only
- P: predictive coding
- B: bi-directional coding

[Figure: a sequence of pictures numbered 1 to 8, labeled by type (I, P, B).]

Page 5: ECEC 453 Image Processing Architecture

Typical MPEG coding parameters

Typical sequence: IPBBPBBPBBPBBPBB (16 frames)

Picture   Average size (bits)   Compression
I         156000                6.5
P         62000                 16.4
B         15000                 67.6

Here BitsPerFrameU is the number of bits in an uncompressed frame, and C_I, C_P, C_B are the per-type compression ratios from the table.

$$\text{Compression (GOP)} = \frac{\text{BitsPerFrameU} \times N_{\text{FramesPerGOP}}}{\text{BitsPerCodedGOP}}$$

$$\text{BitsPerCodedGOP} = N_{I} \times (\text{Bits/I frame}) + N_{P} \times (\text{Bits/P frame}) + N_{B} \times (\text{Bits/B frame})$$

$$\text{Bits/I frame} = \frac{\text{BitsPerFrameU}}{C_I}, \quad \text{Bits/P frame} = \frac{\text{BitsPerFrameU}}{C_P}, \quad \text{Bits/B frame} = \frac{\text{BitsPerFrameU}}{C_B}$$

$$\text{Compression (GOP)} = \frac{N_{\text{FramesPerGOP}}}{N_{I}/C_I + N_{P}/C_P + N_{B}/C_B} = \frac{16}{1/6.5 + 5/16.4 + 10/67.6} = 26.4$$
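The GOP calculation above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `gop_compression` and the dictionary layout are illustrative assumptions, while the frame sequence and per-type ratios are taken from the slide.

```python
def gop_compression(counts, ratios):
    """Overall compression of a GOP from per-type frame counts and ratios."""
    n_frames = sum(counts.values())
    # A frame of type t costs (uncompressed bits) / ratios[t]; the
    # uncompressed frame size cancels out of the overall ratio.
    relative_cost = sum(counts[t] / ratios[t] for t in counts)
    return n_frames / relative_cost

sequence = "IPBBPBBPBBPBBPBB"                     # typical sequence from the slide
counts = {t: sequence.count(t) for t in "IPB"}    # 1 I, 5 P, 10 B
ratios = {"I": 6.5, "P": 16.4, "B": 67.6}         # per-type compression from the table
gop_ratio = gop_compression(counts, ratios)
print(counts, round(gop_ratio, 1))  # {'I': 1, 'P': 5, 'B': 10} 26.4
```

Because the uncompressed frame size cancels, only the frame counts and the per-type ratios are needed.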

Page 6: ECEC 453 Image Processing Architecture

Block Diagram of MPEG Decoder

[Figure: block diagram of an MPEG decoder, with I frame, P frame, and B frame labeled.]

Page 7: ECEC 453 Image Processing Architecture

Macroblock Coding: I & P
- I pictures (almost like JPEG)
  - Divided into slices and macroblocks
  - No motion compensation
  - Each macroblock can have different quantization
  - DC and AC coded differently, as in JPEG
  - Different coding tables from JPEG
- P pictures
  - Divided into slices and macroblocks
  - Option: no motion compensation
  - Option: can code block as inter or intra (like I picture)
  - Can skip macroblock (replace with previous). Great compression

Page 8: ECEC 453 Image Processing Architecture

Coding Image Blocks
- B pictures
  - Inter or intra?
  - Forward, backward, interpolational?
  - Code block or skip?
  - Quantization step?

Statistics for an image sequence (macroblock counts by picture type)

Macroblock type columns: I, P, B, Zero MV, Skipped, Total
Picture type I: 3300; total 3300
Picture type P: 897, 8587, 5128, 568; total 15180
Picture type B: 60, 7356, 22845, 429; total 30690

Page 9: ECEC 453 Image Processing Architecture

MPEG-1: ‘1.5’ Mbps
- Sample rate reduction in spatial and temporal domains
- Spatial
  - Block-based DCT
  - Huffman coding (no arithmetic coding) of motion vectors and quantized DCT coefficients
  - 352 x 240 pixels, 12 bits per pixel, picture rate 30 pictures per second -> 30.4 Mbps
  - Coded bit stream 1.15 Mbps (must leave bandwidth for audio)
  - Compression 26:1
  - Quality better than VHS!
- Temporal
  - Block-based motion compensation
  - Interframe coding (two kinds)
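The MPEG-1 rate figures can be reproduced with a few lines of arithmetic. This is a sketch, assuming a 352 x 240 picture (the size that yields the quoted 30.4 Mbps); the function name `raw_bit_rate` is illustrative.

```python
def raw_bit_rate(width, height, bits_per_pixel, pictures_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * pictures_per_second

raw_bps = raw_bit_rate(352, 240, 12, 30)   # assumed picture size 352 x 240
coded_bps = 1.15e6                         # coded video rate from the slide
mpeg1_ratio = raw_bps / coded_bps

print(raw_bps / 1e6)          # 30.4128 (Mbps)
print(round(mpeg1_ratio, 1))  # 26.4
```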

Page 10: ECEC 453 Image Processing Architecture

Video Teleconferencing
- Comprehensive standard: H.320
- Components of H.320
  - H.261: Video coding, 64 to 1920 kbits/sec
  - G.722, G.726, G.728: Audio coding from 16 kbits/sec to 64 kbits/sec
  - H.221: Multiplexing of audio and video (frame based rather than packet based)
  - H.230 and H.242: Handshaking and control
  - H.233: Encryption

Page 11: ECEC 453 Image Processing Architecture


Generic Video Telephone System

Page 12: ECEC 453 Image Processing Architecture

H.261 Features
- Common Intermediate Format (CIF)
  - Interoperability between 25 fps and 30 fps countries
  - 352 pix/line, 288 lines, 30 fps noninterlaced
  - Terminal equipment converts frame and line numbers
  - Y Cb Cr components, color sub-sampled by a factor of 2 in both directions
- Coding
  - DCT, 8x8, 4 Y and 2 chrominance blocks per macroblock
  - I and P frames only, P blocks can be skipped
  - Motion compensation optional, only integer compensation
  - (Optional) forward error correction coding
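To make the block structure concrete, the sketch below counts macroblocks and 8x8 blocks in one CIF picture. It assumes the standard CIF luminance size of 352 x 288 and a 16 x 16 luminance area per macroblock (four 8x8 Y blocks, with one Cb and one Cr block after 2:1 subsampling in both directions). The names are illustrative and not from the slides.

```python
MACROBLOCK = 16        # assumed luminance width/height covered by one macroblock
BLOCKS_PER_MB = 4 + 2  # 4 Y blocks + 2 chrominance blocks, as on the slide

def macroblock_grid(width, height):
    """Number of macroblock columns and rows in a picture."""
    return width // MACROBLOCK, height // MACROBLOCK

cols, rows = macroblock_grid(352, 288)   # assumed CIF luminance size
macroblocks = cols * rows
dct_blocks = macroblocks * BLOCKS_PER_MB

print(cols, rows, macroblocks, dct_blocks)  # 22 18 396 2376
```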

Page 13: ECEC 453 Image Processing Architecture

H.261 vs MPEG-1
- Similarities
  - CIF, SIF, non-interlaced
  - DCT technology
- Differences
  - H.261 uses mostly P frames, no B frames
  - H.261 typical bit rates much lower (down to 64 kbits/sec)
  - Low bit rates achieved by reducing frame rate
  - Simpler motion compensation
  - End-to-end coding delay must be low
- Conclusion: Same technology, different design to meet different needs

Page 14: ECEC 453 Image Processing Architecture

MPEG 2^i, i = 0, 1
- History & Goals
- Expanding universe of video coding
- What are MPEG-2 profiles?
- Features of MPEG-2

Page 15: ECEC 453 Image Processing Architecture

MPEG Home
- Official web site
  - (http://www.cselt.it/mpeg/ still works)
  - http://mpeg.telecomitalialab.com/
- Information site
  - http://www.mpeg.org/MPEG/ (unchanged)
- History
  - MPEG-1, the standard for storage and retrieval of moving pictures and audio on storage media (approved Nov. 92)
  - MPEG-2, the standard for digital television (approved Nov. 94)
  - MPEG-4 version 1, the standard for multimedia applications (approved Oct. 98), version 2 (approved Dec. 99)
  - Under development: MPEG-4 versions 3 & 4
  - MPEG-7, the content representation standard for multimedia information search, filtering, management and processing
  - Started: MPEG-21, the multimedia framework

Page 16: ECEC 453 Image Processing Architecture

MPEG Example
- Film on DVD: 8 Gbytes
- Playing time: 2 hours
- Bit rate: 8e9 bytes x 8 bits/byte / 7200 seconds ~ 9 Mbits/sec
- Information? on the web
  - http://www.microsoft.com/windowsxp/moviemaker/expert/digitalvideo.asp
  - ‘Bit Rate Explained Bit rate describes how much information there is per second in a stream of data. You might have seen audio files described as “128–Kbps MP3” or “64–Kbps WMA.” Kbps stands for “kilobytes per second,” ....’
  - Site claims that 64 Kbps WMA is as good as 128 Kbps MP3
  - Ignorance about bits and bytes does not encourage credibility
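The DVD estimate, and the factor of 8 that the quoted web page gets wrong, can be spelled out as follows. This is an illustrative sketch; the function name `average_bit_rate` is an assumption, and the numbers come from the slide.

```python
BITS_PER_BYTE = 8

def average_bit_rate(size_bytes, duration_seconds):
    """Average bit rate (bits per second) of a file played in real time."""
    return size_bytes * BITS_PER_BYTE / duration_seconds

dvd_bps = average_bit_rate(8e9, 2 * 3600)   # 8 Gbytes over 2 hours
print(round(dvd_bps / 1e6, 2))              # 8.89 (Mbits/sec, about 9)

# "128 Kbps" means kilobits per second; in bytes the rate is 8 times smaller.
mp3_bytes_per_second = 128_000 / BITS_PER_BYTE
print(mp3_bytes_per_second)                 # 16000.0
```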

Page 17: ECEC 453 Image Processing Architecture

MPEG-2 Goals
- Compatibility with MPEG-1
- Good picture quality
- Flexibility in input format
- Random access capability (I pictures)
- Capability for fast forward, fast reverse play, stop frame
- Bit stream scalability
- Low delay for 2-way communications (videoconferencing)
- Resilience to bit errors

Page 18: ECEC 453 Image Processing Architecture

MPEG-2 Implications
- No reason to restrict to CCIR 601
  - High resolution can be included (HDTV)
- No single standard can satisfy all requirements
  - Family of standards
- Most applications use a small set of the features
  - Toolkit approach

Page 19: ECEC 453 Image Processing Architecture

MPEG-2 profiles
- A profile is a subset of the entire MPEG-2 bit-stream syntax
  - Simple
  - Main
  - 4:2:2
  - SNR
  - Spatial
  - High
  - Multiview
- Each profile has several levels (resolution quality)
  - Low: MPEG-1
  - Main: CCIR 601
  - High-1440 (Video Editing)
  - High (HDTV)

Page 20: ECEC 453 Image Processing Architecture

Features of MPEG-2
- Support of both non-interlaced and interlaced pictures
- Color handling
  - Y Cb Cr color space
  - Several subsampling schemes are used: 4:2:0, 4:2:2, 4:4:4
- MPEG-2 sequence can be either frames or fields
  - Both frame prediction and field prediction are supported
  - There can be motion between two fields in a frame, so that frame prediction is more tricky
  - In frame prediction, both fields constitute one picture
  - In field prediction, either field in the previous frame or the previous field in this frame can be used as reference
- Robustified coding of motion vectors to protect against bit errors
- Special prediction modes: 16x8, dual-prime

Page 21: ECEC 453 Image Processing Architecture

MPEG-2: DCT and Quantization
- Two quantizers: one for intra blocks and one for non-intra blocks
- Support different quantization blocks for luminance and chrominance
- Scalable bit streams
  - Data partitioning, SNR scalability, temporal scalability, spatial scalability
  - Data partitioning: headers and motion vectors in two bit streams
  - SNR scalability: lower layer provides basic video, other layers provide enhancements. Basic layer sent with robust modulation
  - Spatial scalability: lower layer provides basic resolution (e.g., MPEG-1), upper layer provides detail
  - Temporal scalability: lower layer provides basic (low) frame rate

Page 22: ECEC 453 Image Processing Architecture

MPEG-2: Profiles
- 4:2:2 profile at Main level
  - Two Y blocks for each pair of Cb, Cr blocks
  - Distribution format for video production
  - Robust for several compressions and decompressions
  - 720x608, 30 fps
  - 50 Mbit/sec
  - Luminance full raster, chrominance at full line rate
  - DC precision of intra blocks can be up to 11 bits
- Main (4:2:0) profile at Main level
  - Four Y blocks for each pair of Cb, Cr blocks
  - Intended for broadcast quality (actually, is better)
  - 15 Mbit/sec
- Main profile at Low level
  - Like MPEG-1

Page 23: ECEC 453 Image Processing Architecture

MPEG2 features
- Schemes for ‘frame’ and field coding
  - There are two fields in a frame, T (top) and B (bottom)
  - Either can be first
- Frame prediction for frame pictures
  - What’s there to say?
- Field prediction for field pictures
  - Target macroblock is in one field
  - Prediction pixels come from one field
  - Can be the same or different parity as target field
- Field prediction for frame pictures
- Dual prime for P-pictures
- 16x8 macroblock for field pictures
- Motion vectors coded at half-pel resolution
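Half-pel motion vectors point between pixels, so the decoder must interpolate the reference picture. The sketch below assumes plain bilinear averaging with rounding, which is the usual approach; the slides do not give the formula, and the function name and coordinate convention (positions in half-pixel units) are illustrative.

```python
def half_pel_sample(ref, y2, x2):
    """Reference sample at position (y2/2, x2/2), coordinates in half-pixel units.

    Assumed interpolation: rounded average of the 1, 2, or 4 nearest
    integer-position pixels. The caller must keep the position inside ref.
    """
    y0, x0 = y2 // 2, x2 // 2
    y1 = y0 + (y2 % 2)   # equals y0 when the vertical position is an integer
    x1 = x0 + (x2 % 2)   # equals x0 when the horizontal position is an integer
    total = ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]
    return (total + 2) // 4

ref = [[10, 20],
       [30, 40]]
print(half_pel_sample(ref, 0, 0))  # 10 (integer position)
print(half_pel_sample(ref, 0, 1))  # 15 (halfway between 10 and 20)
print(half_pel_sample(ref, 1, 1))  # 25 (centre of the four pixels)
```

Summing four terms in every case keeps the code to one expression: at integer positions the same pixel is counted four times, and at a one-dimensional half position each of the two neighbours is counted twice.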

Page 24: ECEC 453 Image Processing Architecture

MPEG2 - Alternate Scan

[Figure: two coefficient scan orders for an 8x8 block, zig-zag scan and alternate scan.]
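For reference, the conventional zig-zag order for an n x n block can be generated by walking the anti-diagonals in alternating directions. This is a sketch with an illustrative function name; it covers only the zig-zag scan, since the alternate scan is a fixed table defined in the standard and is not reproduced here.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]

    def key(rc):
        r, c = rc
        diagonal = r + c
        # Odd diagonals are walked top-right to bottom-left (row increasing),
        # even diagonals bottom-left to top-right (row decreasing).
        return (diagonal, r if diagonal % 2 else -r)

    return sorted(coords, key=key)

order = zigzag_order()
print(order[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Low-frequency coefficients come first, so the quantized block usually ends in a long run of zeros.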

Page 25: ECEC 453 Image Processing Architecture

MPEG2 - Subsampling
- Suppose picture is 720x480
  - 4:4:4: luminance and chrominance @ 720x480
  - 4:2:2: luminance @ 720x480, chrominance 360x480
  - 4:2:0: luminance @ 720x480, chrominance 360x240
- Weird terminology
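The three subsampling schemes can be summarized as divisors applied to each chrominance plane. The sketch below is illustrative: the table and function names are assumptions, and 8 bits per sample is assumed when computing the average bits per pixel.

```python
# (horizontal divisor, vertical divisor) applied to each chrominance plane
CHROMA_DIVISORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

def chroma_size(width, height, scheme):
    """Size of each chrominance plane (Cb or Cr) for a given luminance size."""
    dx, dy = CHROMA_DIVISORS[scheme]
    return width // dx, height // dy

def average_bits_per_pixel(scheme, bits_per_sample=8):
    """Average bits per pixel: one Y sample plus two reduced chroma planes."""
    dx, dy = CHROMA_DIVISORS[scheme]
    return bits_per_sample * (1 + 2 / (dx * dy))

for scheme in CHROMA_DIVISORS:
    print(scheme, chroma_size(720, 480, scheme), average_bits_per_pixel(scheme))
# 4:4:4 (720, 480) 24.0
# 4:2:2 (360, 480) 16.0
# 4:2:0 (360, 240) 12.0
```

The 12 bits per pixel for 4:2:0 matches the figure used in the MPEG-1 rate calculation.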

Page 26: ECEC 453 Image Processing Architecture

Low
- Y ~ 352x240
- Cb, Cr ~ 176x120
- 30 pictures per second
- +/- 64 pixel displacement, half pixel resolution

Page 27: ECEC 453 Image Processing Architecture

Main (4:2:0)
- Y ~ 720x480
- Cb, Cr ~ 360x240
- 30 frames per second
- 4:3, 16:9 aspect ratio
- Bitrate 15 Mbps (some applications as low as 5 Mbps)
- Digital television

Page 28: ECEC 453 Image Processing Architecture

High
- Y 1920x1152
- Cb, Cr 960x576
- 60 frames per second
- 80 Mbps
- HDTV
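The level parameters imply how much compression each level needs. The sketch below uses the picture sizes, frame rates, and bit rates quoted for the Main and High levels, and assumes 4:2:0 sampling at 8 bits per sample (12 bits per pixel on average). The dictionary layout and names are illustrative.

```python
# Picture size, frame rate, and coded bit rate as quoted on the level slides
LEVELS = {
    "Main": {"width": 720, "height": 480, "fps": 30, "coded_bps": 15e6},
    "High": {"width": 1920, "height": 1152, "fps": 60, "coded_bps": 80e6},
}
BITS_PER_PIXEL = 12  # assumed: 4:2:0 sampling, 8 bits per sample

def raw_bps(level):
    """Uncompressed bit rate of a level in bits per second."""
    return level["width"] * level["height"] * BITS_PER_PIXEL * level["fps"]

required = {name: raw_bps(p) / p["coded_bps"] for name, p in LEVELS.items()}
for name, ratio in required.items():
    print(name, round(raw_bps(LEVELS[name]) / 1e6, 1), round(ratio, 1))
# Main 124.4 8.3
# High 1592.5 19.9
```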

Page 29: ECEC 453 Image Processing Architecture

Low rate
- Where is it needed?
- How is it done?


Page 31: ECEC 453 Image Processing Architecture

MPEG-4 Multimedia Standard

Thumbnail Description

Page 32: ECEC 453 Image Processing Architecture

What Is Left for MPEG-4?
- Initial goals
  - Coding standards for lower-than-MPEG-1 rates
- Hidden agenda: Incorporate new coding methods
  - Wavelet, fractal
- Revised agenda: Object-based coding
- MPEG-4 Architecture
  - Input to coder consists of audio, video, and stored objects
  - Decoder combines encoded objects with local objects
  - Example: send text by sending character codes, receiver uses character generator.

Page 33: ECEC 453 Image Processing Architecture

Schematic Overview of MPEG-4

[Figure: blocks labeled Audio-Video Objects, Encoder with Stored Objects, two Mux and Demux blocks, Decoder with Stored Objects, and Compositor.]

Page 34: ECEC 453 Image Processing Architecture

MPEG-4 Ideas
- Video Object Plane (VOP)
  - A VOP can be a natural image from a video camera or from a graphics database
  - A VOP can consist of several visual objects. Visual objects do not have to have a rectangular outline (arbitrary shape)
  - A scene consists of several VO’s and VOP’s with appropriate compositing
  - Different VOP’s can have their own motion
- In principle, a visual scene can be decomposed into video objects by segmentation.
- Color and texture can be attributes of visual objects
- A viewer can manipulate VO’s.