05a compression

8/6/2019 05A Compression

1/102

05A-compression.fm 1 15.March.01

Scope

Contents

http:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser Multimedia-Systems:

Compression

Prof. Dr.-Ing. Ralf SteinmetzProf. Dr. Max Mhlhuser

MM: TU Darmstadt - Darmstadt University of Technology,

Dept. of of Computer Science

TK - Telecooperation, Tel.+49 6151 16-3709,

Alexanderstr. 6, D-64283 Darmstadt, Germany, [email protected] Fax. +49 6151 16-3052

RS:TU Darmstadt - Darmstadt University of Technology,

Dept. of Electrical Engineering and Information Technology, Dept. of Computer Science

KOM - Industrial Process and System Communications, Tel.+49 6151 166151,

Merckstr. 25, D-64283 Darmstadt, Germany, [email protected] Fax. +49 6151 166152

GMD -German National Research Center for Information Technologyhttc - Hessian Telemedia Technology Competence-Center e.V
http://goback/


2/102


Scope

Contents

http:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Scope

Usage Applications

Learning & Teaching Design User Interfaces

Services Content

Process-

ing

Docu-

mentsSecurity ...

Synchro-

nization

GroupCommuni-

cations

System

s Databases Programming

Media-Server Operating Systems Communications

Opt. Memories Quality of Service Networks

Basics Computer

Archi-tectures

Compression

Image &

GraphicsAnimation Video Audio
http://goback/


3/102


Scope

Contents

http:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Contents

1. Motivation2. Requirements - General

3. Fundamentals - Categories

4. Source Coding

5. Entropy Coding:

6. Hybrid Coding: Basic Encoding Steps

7. JPEG

8. H.261 and related ITU Standards

9. MPEG-1

10. MPEG-2

11. MPEG-4

12. Wavelets13. Fractal Image Compression

14. Basic Audio and Speech Coding Schemes

15. Conclusion
http://goback/


4/102


Scope

Contents

http:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

1. Motivation

Digital video in computing means for

Text:

1 page with 80 char/line and 64 lines/page and 2 Byte/Char

80 x 64 x 2 x 8 = 80 kBit/page

Image:

24 Bit/Pixel, 512 x 512 Pixel/image 512 x 512 x 24 = 6 MBit/Image

Audio:

CD-quality, samplerate44,1 kHz, 16 Bit/sample

Mono: 44,1 x 16 = 706 kBit/s

Stereo: 1.412 MBit/s Video:

full frames with 1024 x 1024 Pixel/frame, 24 Bit/Pixel, 30 frames/s1024 x 1024 x 24 x 30 = 720 MBit/s

more realistic360 x 240 Pixel/frame = 60 MBit/s

Hence compression is NECESSARY
http://goback/


5/102


Scope

Contents

http:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

2. Requirements - General

high quality

compression

low delay

low complexity (e.g., ease of decoding)efficient implementation (e.g., memory req.)

intrinsic scalability
http://goback/


6/102


Scope

Contents

http:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Requirements

DIALOGUE AND RETRIEVAL mode requirements:

Independence of frame size and video frame rate

Synchronization of audio, video, and other media

DIALOGUE mode requirements:

Compression and decompression in real-time(e.g. 25 frames/s)

End-to-end delay < 150ms

RETRIEVAL mode requirements:

Fast forward and backward data retrieval

Random access within 1/2 s

Software and/or hardware-assisted implementation requirements
http://goback/


7/102


Scope

Contents

http:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

3. Fundamentals - Categories

entropy encodingsource coding

- based on semantic of the data

- often lossy

channel coding

- adaptation to communication channel

- introduction of redundancy

hybrid

coding

- entropy

and

source

coding

entropy coding

- ignoring semantics of the data

- lossless
http://goback/


8/102


Scope

Contents

http:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Categories and Techniques

EntropyCoding

Run-Length CodingHuffman Coding

Arithmetic Coding

Source

Coding

Prediction DPCM

DM

TransformationFFT

DCT

Layered Coding

Bit Position

Subsampling

Sub-Band Coding

Vector Quantization

Hybrid

Coding

JPEG

MPEG

H.261, H.263

proprietary: Quicktime, ...
http://goback/


9/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ech

nik

.tu-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Categories & Techniques, Cont. (1)

Two principal possibilities

1. Entropy Coding: Eliminate Redundancy (thus, lossless)

2. Reduction Coding: Eliminate Irrelevance / Low-Relevance (lossy)

Preparatory Step: Decorrelation - Eliminate Interdependencies this is the essence of source coding

changes "representation" of media

goal usually: reduce dependencies between data

as such, is a preparatory step!!

usually, does not compress

Steps in hybrid coding (often): decorrelation - reduction - entropy coding

often: reduction by quantization

last step: additional compresion without harm

note: literature usually uses terms as in last slide!!

note: reduction coding is "smart deletion", not really "compression"
http://goback/


10/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

echn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Categories and Techniques, Cont. (2)

Major distinction: Symmetric / Asymmetric

Asym. (usually): more effort for compression

o.k. if compression non real-time, "only once" (movie!)

may involve number-crunchers (...owned by content provider)

Symmetric: "required" for real-time, e.g., videoconferencing

in reality, often not 100% symmetricFurther considerations include, e.g.,

Adjustable compression rate? ...quality?

"smooth" bit stream ("isochronous")?

terms: CBR (const. bit rate) vs. VBR (variable bit rate)

may be "over time": e.g., packet size BigSmallSmall BigSmallSmall...

may be simulated w/ loop-back filter plus buffer

"progressive" (mainly: non-continuous media): display-while-download

"streaming": ~ same for video (here, rather an issue of software)

more subtle issues "open" standard?

good "performance" (ratio, speed) for all kinds of media?

bullet-proof, well-understood?

...
http://goback/


11/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

echn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

4. Source Coding

DPCM

DPCM = Differential Pulse-Code Modulation

Assumptions:

Consecutive samples or frames have similar values

Prediction is possible due to existing correlationFundamental Steps:

Incoming sample or frame (pixel or block) is predictedby means of previously processed data

Difference between incoming data and prediction is determined

Difference is quantized

Challenge: optimal predictor

Further predictive coding technique:

Delta modulation (DM): 1 bit as difference signal
http://goback/


12/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

echn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Source Coding: Transformation

Assumptions:

Data in the transformed domain is easier to compress

Related processing is feasible

Example:

FFT: Fast Fourier Transformation

DCT: Discrete Cosine Transformation

Inverse

Fourier Transformation

time domain frequencydomain

Fourier Transformation
http://goback/


13/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

echn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Source Coding: Sub-Band

Assumption:

Some frequency ranges are more important than others

Example:

Application:

vocoder for speech communication

MPEG audio

frequency spectrum of the signal

transformation / codingfrequency
http://goback/


14/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

echn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Entropy Coding: Principle

Entropy (in information theory): information content/ "density"

symbols/words equally likely: high entropy (full of information)

otherwise: lower entropy (suboptimal representation of info, less dense)

Entropy formula:

example: given 4 possible symbols (words) in source code

i) IF all equal p=1/4: H(P)=2; ii) IF p= 1/2, 1/4, 1/8, 1/8 --> H(P)= 1

6

/8 "Entropy coding" means:

mean length of file equals (~almost) entropy

in ii) above, with B=2 (binary):

p=! code length -log2 () = -(-1)=1; p=! 2bits, etc.

GOAL: find code w/ symbol length as close as possible to logB p()

grey levelsp

roba

bility

grey levels

pro

ba

bility

high

Entropy

low

Entropy

grey levelsp

roba

bility

grey levels

pro

ba

bility

high

Entropy

low

Entropy

P( p p B

H ) ( ) ( )log=P( p p B

H ) ( ) ( )log=

note: seems "little information" tous since it is very regular; this is not

covered by entropy formula, yet maybe used for compression (e.g. run length)

here: "little info" because"most of picture is in same gray"
http://goback/


15/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

echn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

5. Entropy Coding:

Run-Length (only marginal relation to entropy)

Assumption:

Long sequences of identical symbols

Example:

Special variant: zero-length encoding

only repetition of zeroes count

in red part above, "symbol" not needed (i.e. "pays" for >2 repetitions)

... A B C E E E E E E D A C B...

compression

... A B C E ! 6 D A C B...

symbol

special flag

number ofoccurrences
http://goback/


16/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

echn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Entropy Coding: Huffman

Basics:

Assumption: some symbols occur more often than others

E.g., character frequencies of the English language

Idea: frequent symbols --> shorter bit strings (cf. Entropy!)

Example: Characters to be encoded: A, B, C, D, E

probability to occur: p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15

probability symbol code

1

0

1

0

1

0

1

0

coding tree

30%

30%

10%

15%

15%

A

B

C

D

E

11

10

011

010

00

100%

40%

60%

25%

step 1: scan all leaves, assign(1,0) to the two with lowest

probability -> intermediate root

steps 2-n: scan current "tops"

(intermediate roots or leaves),

assign (1,0) to the two withlowest probability, -> ...

end: assign codes by descending

tree until leaves, bits encountered

represent code step 2

step 3

step 4
http://goback/


17/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

echn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Entropy Coding: Huffman

Table and example of application to data stream

note: decoder may auto-detect end-of-symbol in bit-stream

Other types of Entropy encoding

Arithmetic Encoding (1)

most direct application of entropy principles!

symbols occupy sub-interval of [0,1) according to their probabilities successive coding of symbols cuts out corresponding sub-interval

in sub-(sub-sub-...) interval chosen so far

last symbol --> last sub-interval

chose "arbitrary" ("short") no. in this subinterval --> transmit/store

requires consideration of "additional" symbol "end-of-word" (below: "!")

10 11 011 010 11 10 00 10 11 00

B A DC BA B E A

symbol code

AB

CD

E

11

10

011010

00

E
http://goback/


18/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinm

etz

,M

.Mhlhuser

Arithmetic Coding Example

symbols a,b,c,d,!; how to encode "bbd!"??

transmit/store one (arbitrary) value X in "red" sub-interval

decoder: X lies in [.1,.6) !1st symbol is "b"; X in [.11, 36) ! 2nd symbol "b"

process continues. until "!" is decoded

0 .1 .6 .7 .9 1

".1#"$$$$$$$$$$ 0.5 $$$$$$$$$$#".1#"$$0.2$$#".1#

0 .1 .6 .7 .9 1

0 .1 .11 .36 .6 .7 .9 1

0 .1 .6 .7 .9 1

0 .1 .6 .7 .9 1

divide [0,1) accordingto probabilities

restrict to b: [0.1,0.6)

b: sub-interval [.1,.6)

of [.1,.6), i.e. [.11,.36)

d: sub-interval [.7,.9)of [.11,.36)

!: sub-interval [.9,1)...

a b c d !
http://goback/


19/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhuser

LZW, Code Books, VLWs, VLIs

LZW (Lempl-Ziv-Welch):

e.g.: how to code "whoiswho"? might be thought of as follows: start: create table entry for w

then: create table entry for h, but also for "wh" (multi-symbol),

then: create table entry for o, but also for "who", (etc. until max-length)

multi-symbol entries which repeat "often" survive, others: over-written

Code-Books: used in many compression schemes 3 basic possibilities: (imagine in above example)

"fixed": all implementations of Codec have (1..n) pre-defined code-books

"pre-computed":

encoder-pass1: compute codebook, store/transmit upfront encoder-pass2: encode (compress) data using codebook

"dynamic": code-book grows / changes during compression (LZW)

needs "same procedure" for encoder, decoder

either: "pieces" of code-book are intertwined with code as they

are generated / changed or: rules are such that (dynamic) codebook contents can be derived

from encoded (compressed) data by decoder

VLWs / VLIs: variable length words / integers

similar to Huffman, but decoder can not detect end-of-symbol

e.g., 1="0", 2="01", 3="11", ... (useful?? see JPEG etc.)
http://goback/


20/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhuser

6. Hybrid Coding: Basic Encoding Steps

audio:

video:lossy

lossless (sometimes lossless)

lossy

lossless

e.g.

- resolution- frame rate

e.g.

- DCT- sub-band

coding

e.g.

- linear- DC, AC

values

e.g.

- runlength- Huffman

data

pre-paration

data

pro-cessing

quanti-

zation

entropy

encoding

source

data

compresse

data
http://goback/


21/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhuser

7. JPEG

JPEG: Joint Photographic Expert Group

International Standard:

For digital compression and coding of continuous-tone still images:

Gray-scale

Color

Since 1992Joint effort of:

ISO/IEC JTC1/SC2/WG10

Commission Q.16 of CCITT SGVIII

Compression rate of 1:10 yields reasonable results
http://goback/


22/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhuser

JPEG

Very general compression scheme

Independence of:

Image resolution

Image and pixel aspect ratio

Color representation

Image complexity and statistical characteristicsWell-defined interchange format of encoded data

Implementation in:

Software only

Software and hardwareMOTION JPEG for video compression

Sequence of JPEG-encoded images
http://goback/


23/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhuser

JPEG - Compression Steps

imagepre-

paration

imagepro-

cessing

quanti-

zation

entropy

encoding

source

image

com-

pressed

image

blockMCU

pixel

predictor

FDCT

runlength

Huffman

Arithm.

MCU: Minimum Coded UnitFDCT: Forward Discrete Cosine Transformation
http://goback/


24/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhuser

JPEG - Image Preparation

Planes:

1 N 255 components Ci (e.g., one plane per color) Different resolution of individual components possible

Pixel resolution:

8 or 12 bit per pixel in lossy modes

2 to 16 bit per pixel in lossless mode

C1

C2

CNYi

Xi

right

bottom

left

* * *

*

*

* *

*

*

*

*

line

topdata units

data units: samples in lossless mode, blocks with 8x8 pixels in other modes
http://goback/


25/102


Scope

Contents

h

ttp:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhuser


Example 4:2:2 YUV, 4:1:1 YUV, and YUV9 Coding

Luminance (Y): brightness

sampling frequency 13.5 MHz

Chrominance (U, V):

color differences sampling frequency 6.75 MHz
http://goback/


26/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhuse

r


Non-interleaved encoding:

Interleaved encoding:

Minimum Coded Unit (MCU):

Combination of interleaved data units of different components

top

rightleft

bottom

* * * * * * *

* * * * * * *

* * * * * * *

* * ** * ** * ** * ** * ** * ** * ** * *

* * ** * ** * ** * *

* * ** * *

* *** **

* *** **

C1 C2 C3
http://goback/


27/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG - Baseline Mode

Baseline mode is mandatory for all JPEG implementations:

Often restricted to certain resolution Often only three planes with predefined color set-up

Image preparation:

Step 1a: Pixel resolution --> multiples of p=8 bit

yields 8 x 8 pixel blocks (data units)

Step 1b: unsigned --> signed integer (prepare for "oscillation" --> sin/cos)

... other steps see below

Step 4a: zigzag linearization (see below)

Steps 4b, c, ...: several entropy coding algorithms applied

tables

1: imagepre-

paration

2: imagepro-

cessing

3. quanti-

zation

4. entropy

encodingsource

image

com-

pressed

image

FDCT tables tables8x8

blocks

I i i U d di f DCT
http://goback/


28/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Stein

me

tz,

M.

Mhlhuse

r

Intuitive Understanding of DCT

Fourier-Transform (& FFT "fast" algorithm) known from 1-dimensional:

cut waveform into pieces (blocks of samples) for each blocks:

interpret as periodic (infinite) oscillating waveform

represent as sum of sin/cos waves ai sin t; i=0...(N-1); same for cos

ai coefficients; a0 = DC (direct current= shift wrt. 0-axis),others: how much of the respective sin or cos wave is part of waveform

i increasing frequencies (usually N = no. of samples in block)

DCT in JPEG etc.:

same idea, but 2-dimensional cos-waves

cut out square blocks from picture (NxN) cos waves all have independent frequencies in horizontal/vertical direction

comparable to smooth hills, # of valleys may differ horiz/vert.

again: interpret sample as periodic (2D) waveform--> represent as sum of (2D) cos wave "hill areas"

why only cos?? trick: picture swapped around axes

--> 4fold size --> picture symmetric to axes --> sin parts become zero

4fold size no problem: 3 parts redundant

axes have double "weight" (pix. row/col. "0") --> factor Cu/Cv in formula

JPEG B li M d I P i
http://goback/


29/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG - Baseline Mode: Image Processing

Forward Discrete Cosine Transformation (FDCT):

with:

cu, cv = , for u, v= 0; else cu, cv = 1

Formula applied to each block for all 0 u, v 7:

Blocks with 8x8 pixel result in 64 DCT coefficients:

1 DC-coefficient S00: basic color of the block

63 AC-coefficients: (likely) zero or near-by zero values

Different significance of the coefficients:

DC: most important AC: less important

Svu

14---C

uC

vs

yx2x 1+( )u

16-------------------------------

2y 1+( )v16

-------------------------------coscos

y 0=

7

x 0=

7

=

1

2-------

JPEG B li M d I P i
http://goback/


30/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG - Baseline Mode: Image Processing

FDCT transforms:

blocks into blocks not pixels into pixels

Example:

Calculation of S00

# # # # # # # #

# # # # # # # ## # # # # # # #

# # # # # # # #

# # # # # # # #

# # # # # # # #

# # # # # # # #

# # # # # # # #

* * * * * * * *

* * * * * * * ** * * * * * * *

* * * * * * * *

* * * * * * * *

* * * * * * * *

* * * * * * * *

* * * * * * * *

JPEG Baseline Mode: Quantization
http://goback/


31/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

form

atik

.tu-d

arms

tadt.de

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG - Baseline Mode: Quantization

Use of quantization tables for the DCT-coefficients:

Map interval of real numbers to one integer number Allows to use different granularity for each coefficient

05

Sc

Co

http://www kom e-technik tu-darmstadt de
http://goback/http://goback/


32/102

A-compressio

n.fm

3215.March.01

cope

ontents

http://www.kom.e technik.tu darmstadt.dehttp://www.tk.informatik.tu-darmstadt.de

R. Steinmetz, M. Mhlhuser

JPE

G:QuantizationEffect

(a)
http://goback/


33/102

JPEG - Baseline Mode: Entropy Coding


34/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG - Baseline Mode: Entropy Coding

63 AC coefficients:

Ordering in zig-zag form

reason: coefficients in lower right corner are likely to be zero Huffman coding of all coefficients:

Transformation into a codewhere amount of bits depends on frequency of respective value

Subsequent runlength coding of zeros

* *** ***** *** ****

* *** ****

* *** ****

* *** ***** *** ***** *** ***** *** ****

AC77AC70

DC

AC07AC01

JPEG: Details of (one possible) Entropy conding
http://goback/


35/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.d

e

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG: Details of (one possible) Entropy conding

Treatment of "zig-zag sequence":

differential coding of DC: DCi stored as "change wrt. DCi-1" assumption: there will rarely be two non-zero AC values in sequence

--> regard seq. as iteration of non-zero AC-values and zero-runlengths--> sometimes, the zero-runlength will have "length zero"

code non-zero AC-values as VLIs --> need to transmit VLI-lengths

(remember: this is not Huffman --> end of code not found by decoder) create pairs (zero-runlength, VLI-length-of-following-non-zero-AC-value)

these pairs are Huffman encoded

the very first "pair" is not a pair, but the VLI-length of the (diff.) DC-value

the block is finally represented as iterationHuffman-encoded pair / VLI-encoded non-zero-AC / Huffman-.... / VLI... / ...preceded by "Huffman-encoded VLI-length / VLI-encoded diff.-DC"

The next two slides give an example of the DCT coding of a 8x8 block

JPEG: Sample Compression of 1 Block: 8x8 Matrices
http://goback/


36/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.d

e

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG: Sample Compression of 1 Block: 8x8 Matrices

1. Typical Pixel Block:

3. Quantization Matrix:

139 144 149 153 155 155 155 155

144 151 153 156 159 156 156 156

150 155 160 163 158 156 156 156

159 161 162 160 160 159 159 159

159 160 161 162 162 155 155 155

161 161 161 161 160 157 157 157

162 162 161 163 162 157 157 157

162 162 161 161 163 158 158 158

16 11 10 16 24 40 51 61

12 12 14 19 26 58 60 55

14 13 16 24 40 57 69 56

16 17 22 29 51 87 80 62

18 22 37 56 68 109 103 77

24 35 55 64 81 104 113 92

49 64 78 87 103 121 120 101

72 92 95 98 112 100 103 99

2. DCT Coefficients:

4. Quantized Result:

235.6 1.0 -12.1 -5.2 2.1 -1.7 -2.7 1.3

-22.6 -17.5 -6.2 -3.2 -2.9 -0.1 0.4 -1.2

-10.9 -9.3 -1.6 1.5 0.2 -0.9 -0.6 -0.1

-7.1 -1.9 0.2 1.5 0.9 -0.1 0.0 0.3

-0.6 -0.8 1.5 1.6 -0.1 -0.7 0.6 1.3

1.8 -0.2 1.6 -0.3 -0.8 1.5 1.0 -1.0

-1.3 -0.4 -0.3 -1.5 -0.5 1.7 1.1 -0.8

-2.6 1.6 -3.8 -1.8 1.9 1.2 -0.6 -0.4

15 0 -1 0 0 0 0 0

-2 -1 0 0 0 0 0 0

-1 -1 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

JPEG: Sample Compression (contd.)
http://goback/


37/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.d

e

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG: Sample Compression (contd.)

assume: last DC value was 18 --> encoded difference is 3

--> only 3, -2, -1 occur as non-zero values.Their VLI-encoding is as follows:

3 11

-2 01

-1 0

This makes the iteration look as follows (VLIs still represented as integers):

(2)(3), (1,2)(-2), (0,1)(-1), (0,1)(-1), (0,1)(-1), (2,1)(-1), (0,0) (


38/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.d

e

R.

Stein

me

tz,

M.

Mhlhuse

r

JPEG 4 Modes of Compression

lossy sequential DCT-based mode

(baseline mode)

expanded lossy DCT-based mode

lossless mode

hierarchical mode

JPEG - Extended Lossy DCT-Based Mode
http://goback/


39/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.d

e

R.

Stein

me

tz,

M.

Mhlhuse

r

J G te ded ossy C ased ode

Pixel resolution 8 to 12 bit

Sequential image display: Top to bottom

Good for small images and fast processing

Progressive image display:

Coarse to fine

Good for large and complicated images
http://goback/


40/102

JPEG - Lossless Mode


41/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.d

e

R.

Stein

me

tz,

M.

Mhlhuse

r

Image preparation:

On pixel basis (2-16 bit/pixel)Image processing:

Selection of a predictor for each pixel

Entropy coding:

Same as lossy mode

Code of chosen predictor and its difference to the actual value

c b

a x

predictioncode012345

67

no predictionx=Ax=Bx=Cx=A+B+Cx=A+((B-C)/2)

x=B+((A-C)/2)x=(A+B)/2

JPEG - Hierarchical Mode
http://goback/


42/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

for

ma

tik

.tu-d

arms

tadt.d

e

R.

Stein

me

tz,

M.

Mhlhuser

Coding of each image with several resolutions:

Image scaling Differential encoding

First, coded with lowest resolution - image A

Coded with increasing horizontal & vertical resolution - image A

Difference between both images is computed - B = A - A (*)

Iteration for higher resolutions

Features:

Requires more storage and higher data rate

Fast decoding process

Used for scalable video Similar to Photo-CD (Kodak, proprietary)

(*) note for all scalable approaches:

relate higher-res version B (or B) to receivers de-codedlower-res version A (to avoid accumulation of quantization errors)

8. H.261 and related ITU Standards
http://goback/


43/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

for

ma

tik

.tu-d

arms

tadt.d

e

R.

Stein

me

tz,

M.

Mhlhuser

Video codec for audiovisual services at p x 64kbit/s

("p-times-sixtyfour", where p means "multiples-of"): CCITT standard from 1990

For ISDN

With p=1,..., 30

Technical issues:

Real-time encoding/decoding

Max. signal delay of 150ms

Constant data rate

Implementation in hardware (main goal) and software

H.261 - Image Preparation
http://goback/


44/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

for

ma

tik

.tu-d

arms

tadt.d

e

R.

Steinme

tz,

M.

Mhlhuser

Fixed source image format

Image components: Luminance signal (Y)

Two color difference signals (Cb,Cr)

Subsampling according to CCIR 601 (4:1:1)

Quarter Common Intermediate Format (QCIF) resolution: Mandatory Y: 176 x 144 pixel ("pruning" 180-->176)

At 29.97 frames/s appr. 9.115 Mbps (uncompressed)

but: encoder may leave out up to 3 frames (--> ~8 fps)

Common Intermediate format (CIF) resolution: Optional

Y: 352 x 288 pixel

At 29.97 frames/s appr. 36.46 Mbps (uncompressed) i.e. ~ 570 * 64kbps

Layered structure:

Block of 8 x 8 pixels

Macroblock of: 4 Y blocks, 1 Cr block, 1 Cb block Group of blocks (GOBs) of 3 x 11 macroblocks

Picture:

QCIF picture: 3 GOBs

CIF picture: 12 GOBs

CIF: 360*288

QCIF

H.261 - Image Compression
http://goback/


45/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

for

ma

tik

.tu-d

arms

tadt.d

e

R.

Steinme

tz,

M.

Mhlhuser

Intraframe coding: yields "reference frame" f0

DCT w/ same quantization factor for all AC values this factor may be adjusted by loopback filter (see below)

Interframe coding, motion estimation:

interframes: f1,f2,f3,... relative to f0 (differential encoding)

in H.261: intraframes rare (bandwidth!, main application videophone)

Search of similar macroblock (16x16) in previous image

Position of this macroblock defines motion vector Search range is up to the implementation:

max. 15 pixel

but: motion vector may also always be 0 ("bad" software encoder)

Frame 1 Frame 2

note about motion vector mv:- mv points "backwards" in time

(pos. of object in f- mv related to block,

not moving object

H.261 - Image Compression
http://goback/


46/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

for

ma

tik

.tu-d

arms

tadt.d

e

R.

Steinme

tz,

M.

Mhlhuser

Interframe coding, further steps:

Results: Difference between similar macroblocks

Motion vector

Difference of macroblocks:

DCT if value higher than a specific threshold (hybrid DPCM/DCT!)

No further processing if value less than this threshold

Motion vector:

Components are coded yielding code words of variable length

Quantization:

Linear Adaptation of step size ("loopback filter") => ~ constant data rate

("leaky bucket": constant 64kbps "drop out";loopback filter: adjust quantization factor if bucket filledabove threshold1 or below threshold 2, respectively)

Further ITU Video Schemes (H.263, H.3xx)
http://goback/


47/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

for

ma

tik

.tu-d

arms

tadt.d

e

R.

Steinme

tz,

M.

Mhlhuser

H.263

extension to H.261 max. bitrate: H.263 approx. 2.5 x H.261; lowest bitrates suitable f. modem

Source Image Formats

Format PixelsH.261 H.263

Encoder Decoder Encoder Decoder

SQCIF 128 x 96 optional required

QCIF 176 x 144 required required

CIF 352 x 144 optional optional

4CIF 704 x 576not defined optional

16CIF 1408 x 1152
http://goback/


48/102

H.320, H.32x Family


49/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.d

e

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.d

e

R.

Steinme

tz,

M.

Mhlhuser

H.320 specifies (as overview) videophone for ISDN

H.310 adapt MPEG 2 for communication over B-ISDN (ATM)

H.321

define videoconferencing terminal for B-ISDN (instead of N-ISDN)

H.322 adapt H.320 for guaranteed QoS LANs (like ISO-Ethernet)

H.323

videoconferencing over non-guaranteed LANs

H.324 Terminal for low bit rate communication (over V.34 Modems)
http://goback/


50/102


51/102

MPEG - Video: Preparation Step


52/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Stei

nme

tz,

M.

Mhlhus

er

Fixed image format

Color subsampling: Y, Cr, Cb 4:2:0

Resolution:

Should be at most 768 x 576 pixel 8 bit/pixel in each layer (i.e., for Y, Cr, Cb)

14 pixel aspect ratios

8 frame rates

No user defined MCU like JPEG

No progressive mode like JPEG

MPEG - Video: Processing Step
http://goback/


53/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Stei

nme

tz,

M.

Mhlhus

er

4 types of frames:

I-frames (intra-coded frames): Like JPEG

Real-time decoding demands

P-frames (predictive coded frames):

Reference to previous I- or P-frames Motion vector

MPEG does not define how to determine the motion vector

difference of similar macroblocks is DCT coded

DC and AC coefficients are runlength coded

B-frames (bi-directional predictive coded frames):

Reference to previous and subsequent (I or P) frames

Interpolation between macro blocks

D-frames (DC-coded frames):

Only DC-coefficients are DCT coded For fast forward and rewind

MPEG - Video Coding
http://goback/


54/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhus

er

Sequence of I-, P-, and B-frames:

Sequence:

Defined by application

E.g., I B B P B B P B B I B B P B B P B B

Order of transmission is different: I P B B ...

I

P

B

BP

B

B

IReferences

t

I-Frames (Intracoded)

P-Frames (Predictive Coded)

B-Frames (Bidirectionally Coded)

(D-Frames (DC Coded))

MPEG - Video: Implications
http://goback/


55/102


Scope

Contents

http:/

/www

.kom

.e-te

chn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhus

er

Random access

at I-frames at P-frames: i.e. decode previous I-frame first

at B-frame: i.e. decode I and P-frames first

Editing

decoded data

loss of quality (encode -> decode -> encode -> ...)

application of all video editing functions

encoded data (previous to entropy encoding)

preservation of quality

transition effects as function in the DCT domain morphing, non-block conform overlay very difficult

encoded data

preservation of quality

today: too complex, if possible, i.e. need for entropy decoding

MPEG - Audio Coding: Fundamentals
http://goback/


56/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhus

er

Masking threshold in the frequence domain

narrowband random noise

depends on frequency

0.02 0.05 0.1 0.2 0.5 1 2 5 10 20frequency (kHz)

SoundPressureLevel(dB)

0

20

40

60

80

fm

= 0.25 1 4 kHz

av

absolute thresholdof hearing

masking

patterns

MPEG - Audio Coding: Fundamentals
http://goback/


57/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

fo

rma

tik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhus

er

Masking in Time Domain

after and before the event

depends on (to some extent) amplitude

-50 50 100 150 ms 0 50 100 150 200

SLT

0

20

40

60

Dt tv

masker

simultaneous-pre- post-masking-

MPEG - Audio Coding
http://goback/


58/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.de

http:/

/www

.tk

.in

fo

rma

tik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhus

er

Yields: heavily asymmetric codecs!!

Audio channel:

Between 32 and 448 kbit/s

In steps of 16 kbit/s

Definition of 3 "layers" of quality:

"higher layer" means "more complex" & "can handle lower layers"

Layer 1: max. 448 Kbit/s (ca. 1:4 compression, e.g. used as PASC in DCC) Layer 2: max. 384 Kbit/s (ca. 1:6-8, common, e.g. as MUSICAM in DAB)

Layer 3: max. 320 Kbit/s (ca. 1:10-12, the famous MP3)

sub-bandcoding

quanti-zation

entropy

psychoacousticalmodel

32coder &

framepacking

controls: how many bits reservedfor which sub-band

MPEG - Audio Coding

S li tibl t di f CD DA d DAT
http://goback/


59/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.

de

http:/

/www

.tk

.in

fo

rma

tik

.tu-d

arms

tadt.de

R.

Steinme

tz,

M.

Mhlhus

er

Sampling compatible to encoding of CD-DA and DAT:

Sampling rates: 32 kHz, 44,1 kHz, 48 kHz

Sampling precision:

16 bit/sample

Audio channels:

Mono (single, 1 channel)

Stereo (2 channels)

dual channel mode (independent, e.g., bilingual)

optional: joint stereo (exploits redundancy and irrelevancy)

Application Example: DAB Digital Audio Broadcasting uses MPEG layer 2 (compression also known as MUSICAM =

(Masking pattern adapted Universal Subband Integrated Coding And Multiplexing)

delays, for VLSI implementation:

max. 30 ms encoding

max. 10 ms decoding

SW codec delays vary for different layers, implementations, computers(rule-of-thumb may be 50/100/150 ms for layer 1/2/3, which makesMP3 rather inappropriate for real-time conversation)

MPEG - Audio and Video Data Streams

A di D t St L
http://goback/


60/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.

de

http:/

/www

.tk

.in

fo

rma

tik

.tu-d

arms

tadt.

de

R.

Steinme

tz,

M.

Mhlhus

er

Audio Data Stream Layers:

1. Frames

2. Audio access units

3. Slots

Video Data Stream Layers:

1. Video sequence layer

2. Group of pictures layer

3. Single picture layer

4. Slice layer

5. Macroblock layer

6. Block layer

10. MPEG-2

Follow-Up MPEG Standards
http://goback/


61/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.

de

http:/

/www

.tk

.in

fo

rma

tik

.tu-d

arms

tadt.

de

R.

Steinme

tz,

M.

Mhlhus

er

Follow Up MPEG Standards

MPEG-2:

Higher data rates for high-quality audio/video

Multiple layers and profiles

MPEG-3

Initially HDTV

MPEG-2 scaled up to subsume MPEG-3

MPEG-4:

Initially, lower data rates for e.g. mobile communication

then: focus coding & additional functionalities based on image contents

MPEG-7 (EC = "experimental core" status):

Content description

Basis for search and retrieval

See section on databasesMPEG-21 (upcoming):

Framework for multimedia business, delivery... whats missing?

maybe eCommerce focus --> e.g., security, watermarking?

MPEG-2

From MPEG 1 to MPEG 2
http://goback/


62/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.

de

http:/

/www

.tk

.in

fo

rma

tik

.tu-d

arms

tadt.

de

R.

Ste

inme

tz,

M.

Mhlhus

er

From MPEG-1 to MPEG-2

Improvement in quality from VCR to TV to HDTV

No CD-ROM based constraints

higher data rates

MPEG-1: about 1.5 Mbit/s

MPEG-2: 2-100 Mbit/s

Evolution

1994: International Standard

Also later known as H.262

Prominent role for digital TV in DVB (digital video broadcasting) commercial MPEG-2 realizations available

MPEG-2 Video

Inclusion of interlaced video format
http://goback/


63/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.

de

http:/

/www

.tk

.in

fo

rma

tik

.tu-d

arms

tadt.

de

R.

Ste

inme

tz,

M.

Mhlhus

er

Inclusion of interlaced video format

Increase resolution, more than CCIR 601Defined as:

5 profiles (simple, main,..)

4 levels (with increasing resolution,...)

Other additional features DCT coefficients may be coded with a non-linear quantization function

MPEG-2 Video: Scaling

Motivation
http://goback/


64/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt.

de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt.

de

R.

Ste

inme

tz,

M.

Mhlhuser

Motivation

analog: continuous decrease in quality if errors occur digital: need for tolerance whenever error occur, i.e scaling

Option: Spatial scaling

reduction of resolution

approach

image sampled with half resolution, then MPEG algorithms applied,output processed with better FEC (base layer)

Image decoded, substracted from original, to difference MPEG algorithmsapplied, output processed with worse FEC (enhanced layer)

Option: Signal to Noise (SNR) scaling noise introduced by

quantization errors and visible block structures

approach

Base layer: DCT output, more significant bits encoded with better FEC

Enhanced layer: DCT output, less significant bits encoded with worse FEC

MPEG-2 Video Profiles und Levels
http://goback/


65/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhuser High Level

1920 pixels/line

1152 lines

80 Mbit/s

100 Mbit/s

High-1440Level

1440 pixels/line1152 lines

60 Mbit/s 60 Mbit/s 80 Mbit/s

Main Level720 pixels/

line576 lines

15 Mbit/

s

15 Mbit/

s

15 Mbit/

s

20 Mbit/s

Low Level352 pixels/

line288 lines

4 Mbit/s 4 Mbit/s

SimpleProfile

MainProfile

SNRScalable

Profile

SpatialScalable

Profile

HighProfile
http://goback/


66/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhuser

LEVELSand

PROFILES

No B-frames B-frames B-frames B-frames B-frames

4:2:0 4:2:0 4:2:0 4:2:04:2:0 or

4:2:2

NotScalable

NotScalable

SNRScalable

SNR

Scalable orSpatial

Scalable

SNR

Scalable orSpatial

Scalable

MPEG-2 Audio

(two modest) extension to MPEG-1 audio:
http://goback/


67/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhuser

( )

1) "low sample rate extension" LSE: 1/2 of all MPEG-1 rates: 16, 22.05, 24kHz

quantization down to 8 bits/sample

2) "multichannel extension": more channels, i.e. up to

5 full bandwidth channels (surround system) left and right front

center (in front)

left and right back

"matrixing": rule for backward compatible conversion --> stereo (x, y = 0.71)

option: +1 "low freq. extension" (LFE) channel for subwoofer

"multilingual extension": 7 more, i.e. up to 12 channels

(multiple languages, commentary)

compatibility with MPEG-1:

all MPEG-1 audio format can be processed by MPEG-2

only 3 MPEG-2 audio codecs do not provide backward compatibility

Left for Stereo Left_f xCenter yLeft_b+ +=

Right for Stereo Right_f xCenter yRigtht_b+ +=

MPEG-2 System

Steps
http://goback/


68/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhuser

p

1. Audio and video combined to Packetized Elementary Stream (PES)2. PES(es) combined to Program Stream or Transport Stream

Program stream:

Error-free environment

Packets of variable length One single stream with one timing reference

Transport stream:

Designed for noisy (lossy) media channels

Multiplex of various programs with one or more time bases Packets of 188 byte length

Conversion between Program and Transport Streams possible

11. MPEG-4

Goals
http://goback/


69/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhuser

MPEG-4 (ISO 14496) originally:

Targeted at systems with very scarce resources

To support applications like

Mobile communication

Videophone and E-mail Max. data rates and dimensions (roughly):

Between 4800 and 64000 bits/s

176 columns x 144 lines x 10 frames/s

Largely covered by H.263, therefore re-orientation: Goal to provide enhanced functionality

to allow for analysis and manipulation of image contents

MPEG-4: Schedule for Standardization

1993 Work started

1997: Committee Draft

1998: Final Committee Draft

1998: Draft International Standard

1999-2000: International Standard

MPEG-4: Goals (cont.)

1: support composite multimedia i.e. find standardized ways to
http://goback/


70/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhuser

Represent units of aural, visual or audiovisual content "audio/visual objects" or AVOs

object coding independent of other objects, surroundings and background Compose these objects together

i.e. creation of compound objects that form audiovisual scenes

Multiplex and synchronize the data associated with AVOs

for transportation over network channels providing QoS (Quality-of-Service)

2: support synthetic objects

computer-gen. (VR), synthesized (txt2speech), model-based ("face")

3: support truly interactive applications (more than play/pause/rewind..)

Interact with the audiovisual scene generated at the decoders site

Rhubarb

Audioobject 1 video objects

12

3

Audioobject 2

Rhubarb

MPEG-4: Scope

Definition of
http://goback/


71/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhuser

System Decoder Model specification for decoder implementations

Description language

binary syntax of an AVOs bitstream representation

scene description information

Corresponding concepts, tools and algorithms,especially for

content-based compression of simple and compound audiovisual objects

manipulation of objects

transmission of objects

random access to objects

animation

scaling

error robustness

e e r

MPEG-4: Scope (cont.)

Targeted bit rates for video and audio:
http://goback/


72/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhu

ser

VLBV core Very Low Bit-rate Video

5 - 64 Kbit/s

image sequences with up to CIF resolution and up to 15 frames/s

Higher-quality video

64 Kbit/s - 4 Mbit/s quality like digital TV

Natural audio coding

2 - 64 Kbit/s

e e r

MPEG-4: Video and Image Encoding

Encoding / decoding of
http://goback/


73/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.info

rma

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhu

ser

Rectangular imagesand video

coding similar toMPEG-1/2

motion prediction

texture coding Images and video of

arbitrary shape

as done inconventional

approach 8x8 DCT or shape-adaptive DCT

plus coding of shape and transparency information

Encoder

Must generate timing information

speed of the encoder clock = time base

desired decoding times and/or expiration times

by using time stamps attached to the stream

Can specify the minimum buffer resources needed for decoding

e er

MPEG-4: Composition of Scenes

Scene description includes:
http://goback/


74/102


Scope

Contents

http:/

/www

.kom

.e-tec

hn

ik.t

u-d

arms

tadt

.de

http:/

/www

.tk

.informa

tik

.tu-d

arms

tadt

.de

R.

Ste

inme

tz,

M.

Mhlhu

se

Tree to define hierarchical relationships between objects

Objects positions in space and time

by converting the objects local coordinate system into a global coordinatesystem

Attribute value selection

e.g. pitch of sound, color, texture, animation parameters

Description based on some VRML concepts

VRML = Virtual Reality Modelling Language

Interaction with scenes e.g. change viewing point, drag object, start/stop streams, select

language

RhubarbRhubarb

primitive AVO

compound object

compound object

05A-compression

Scope

Contents

http://www.kom.e-technik.tu-darmstadt.dehttp://www.tk.informatik.tu-darmstadt.de

R. Steinmetz, M. Mhlhuser
http://goback/http://goback/


75/102

n.fm7515.March.01

MPE

G-4:Exampleofa

Composi

tion

ee er

MPEG-4: Scaling

Three approaches:


76/102


Scope

Contents

http:/

/www

.kom

.e-tec

hn

ik.t

u-d

arms

tadt

.d

http:/

/www

.tk

.informa

tik

.tu-d

arms

tadt

.d

R.

Ste

inme

tz,

M.

Mhlhu

se

Spatial scalability decoder displays textures and visual objects at a reduced spatial resolution

by decoding only a subset of the total bit stream

32 levels max. for textures and still images

3 levels max. for video sequences

Temporal scalability decoder displays video at a reduced temporal resolution

by decoding only a subset of the total bit stream

3 levels max.

Quality scalability

bitstream is parsed into a number of bit stream layers of different bit-rates

either during transmission or in the decoder

subset of the layers still yields a meaningful signal

Spatial and temporal scaling both for

Conventional rectangular display and Objects with arbitrary shape

de

de e

r

MPEG-4: Synthetic Objects

Visual objects:
http://goback/


77/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt.d

http:/

/www

.tk

.informa

tik

.tu-d

arms

tadt.d

R.

Ste

inme

tz,

M.

Mhlhu

se

Human face start object: neutral-expression face

animated via FDPs and/or FAPs

FAP (facial anim param): animate current display

FDP (facial def. param): alternative shape/texture

Mesh + texture mapping: for 2D & 3D meshes 2D mesh may also be used for human face anim., see above

only triangular 2D meshes, vertices may be moved (mv!), texture is warped

e.g. virtual background

Texture coding for view-dependent applications

texture, e.g. virt. background; decoder/encoder loop for "minimal" Xmission

Audio objects:

Text-to-speech

speech generation from given text and prosodic parameters

face animation control Score driven synthesis

music generation from a score

more general than MIDI

Special effects

de

de e

r

MPEG-4: Layered Networking Architecture

Display / Recording
http://goback/


78/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt.d

http:/

/www

.tk

.informa

tik

.tu-d

arms

tadt.d

R.

Ste

inme

tz,

M.

Mhlhu

s

Access Units

Adaptation Layer

FlexMux Layer

Elementary Streams

Multiplexed Streams

Network or Local Storage

e.g. video or audio framesor scene description commands

A/V object data+ stream type info, sync. info, QoS req.,...

e.g. multiple elementary streams

with similar QoS requirements

Flexible Multiplexing

Transport Multiplexing

- only interface specifiedTransMux Layer

- layer itself can be any network,e.g. RTP/UDP/IP, AAL5/ATM

CoDecCoDecCoDecCoDec

Media

Coding / Decoding

de

de

ser

MPEG-4: Layered Networking Architecture (cont.)

DMIF Delivery Multimedia Integration Framework

Allows to establish multiple party sessions
http://goback/


79/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt.d

http:/

/www

.tk

.informa

tik

.tu-d

arms

tadt.d

R.

Ste

inme

tz,

M.

Mhlhu

s Allows to establish multiple party sessions

interaction with

remote interactive peers

broadcast systems

storage systems

establishment of channels with specific QoSs and bandwidths Controls

FlexMux layer

TransMux layer

de

de

ser

MPEG-4: Error Handling

Mobile communication:

Low bit-rate (< 64 Kbps)
http://goback/


80/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt.

http:/

/www

.tk

.informa

tik

.tu-d

arms

tadt.

R.

Ste

inme

tz,

M.

Mhlhu

s Low bit rate (< 64 Kbps)

Error-prone

MPEG-4 concepts for error handling:

Resynchronization

enables receiver to tune in again

based on markers within bitstream Data recovery

enables receiver to reconstruct lost data

encode data in an error-resilient manner

Error concealment enables receiver to bridge gaps in data

e.g. by repeating parts of old frames

.de

.de

ser

12. Wavelets

Motivation
http://goback/


81/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt.

http:/

/www

.tk

.informa

tik

.tu-d

arms

tadt.

R.

Ste

inme

tz,

M.

Mhlhu

s

JPEG / DCT problems:

DCT not applicable to whole image, but only to small blocks block structure becomes visible at high compression ratios

Scaling as add-on additional effort

DCT function is fixed can not be adapted to source data

Improvements by using Wavelets:

Transformation of the whole image

overcomes visible block structures and introduces inherent scaling

Better identification of which data is relevant to human perception

higher compression ratio

.de

.de

ser

Wavelets: Compression / Decompression

Compressor
http://goback/


82/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt

http:/

/www

.tk

.inf

orma

tik

.tu-d

arms

tadt

R.

Ste

inme

tz,

M.

Mhlhu

The same overall structure as for DCT-based algorithms

But: important differences in the transformation step

Quantizer Encoder

Inverse WaveletTransformation

DeQuantizer Decoder

Forward WaveletTransformation

Decompressor

t.de

t.de

ser

Wavelets: Fundamental Idea

Image is transformed into the frequency domain (as in JPEG)
http://goback/


83/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt

http:/

/www

.tk

.inf

orma

tik

.tu-d

arms

tadt

R.

Ste

inme

tz,

M.

Mhlhu But: based on Wavelet functions instead of cosine functions

Advantage: Wavelets are 0 outside a limited interval

Wavelet automatically relates only to a part of the image Image needs not be splitted into blocks

"Frequencies"??? :

Use Wavelet family: {2-j/2*(2-j*x-k)}, j,k Z, being a Wavelet

cosine:

......

Wavelet e.g.:

t.de

t.de

ser

Wavelets: Transformation Steps

"Discrete Wavelet Transformation" (Mallat, 1989)
http://goback/


84/102


Scope

Contents

http:/

/www

.kom

.e-

tec

hn

ik.t

u-d

arms

tadt

http:/

/www

.tk

.inf

orma

tik

.tu-d

arms

tadt

R.

Ste

inme

tz,

M.

Mhlhu Split image recursively by using high and low pass filters

c1

d11

d12

d13

L

H

L

H

L

H

. . .

read bycolumn (vert. op.)

L Low Pass + downsampling

H High Pass + downsampling

line (horiz.

read by

lowerfrequencies

transformedimage withreduced size

higherfrequencies

operations

t.de

t.de

ser

Wavelets: Transformation Steps (cont.)

In each step i:

Three images d i (x=1,2,3):
http://goback/


85/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tad

http:/

/www

.tk

.inf

orma

tik

.tu-d

arms

tad

R.

Ste

inme

tz,

M.

Mhlhu x containing the high frequency parts of the image

representing "details" of the image

submitted to Wavelet transformation

or thrown away in case of scaling

One image ci: containing the lower frequency parts of the image

representing the original image with less details / at a lower resolution

submitted to step i+1

Up to here: 4 images with 1/4 resolution each --> no compression!but again: decorrelation: many coefficients in d-images (close to) zero

Afterwards:

Quantization

Entropy encodingas with DCT

t.de

t.de

ser

Wavelets: DWT compared with DCT

Advantages of DWT over DCT:

No block artefacts
http://goback/


86/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tad

http:/

/www

.tk

.inf

orma

tik

.tu-d

arms

tad

R.

St

einme

tz,

M.

Mhlhu

Inherent scaling

based on the dxi for i=1,2,3,...

Lower time complexity for the transformation

DCT: O(n*logn),

DWT: O(n) (n=number of values to be transformed) Higher flexibility: Wavelet function can be freely chosen

(but: howto choose?)

t.de

t.de

ser

Wavelets: Further Issues

Edge detection reduces high frequencies:

First extract detected edges
http://goback/


87/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tad

http:/

/www

.tk

.informa

tik

.tu-d

arms

tad

R.

St

einme

tz,

M.

Mhlhu

Then apply wavelets to such a filtered image

Application to video:

In-2In-1

Image n

Imt

Compute

differences

...In-1 - In-2

In - In-1

...t

Wavelet

compressor

t.de

t.de

ser

13. Fractal Image Compression

Fractal Geometry was first applied to image generation
http://goback/


88/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tad

http:/

/www

.tk

.informa

tik

.tu-d

arms

tad

R.

St

einme

tz,

M.

Mhlhu

remember "Mandelbrot" images recursive construction of images

infinite granularity (i.e. zoom-in), but compact "image data" (formula)(such forms are called fractals)

Zi

= RealConst. * Zi-1

+ ComplexConst

d

t.de

d

t.de

user

Use of Fractals for Compression??? Overview (1)

observation: self-similarities in natural images(clouds, dunes, beaches: zoom-in reveals similar forms as large image)

idea can nat ral images be described / fractal geometr ??
http://goback/


89/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tad

http:/

/www

.tk

.informa

tik

.tu-d

arms

tad

R.

St

einme

tz,

M.

Mhlh idea: can natural images be described w/ fractal geometry??

first published by Barnsley & Sloan (88), first impl. 89 by Arnaud Joquin

Key #1: Iterated Function Systems IFS:

input (sub-)picture subject to math. transform. of type picture moved, rotated / mirrored, and contracted

--> all transformations are "contractions"

Key #2: Banachs Fixed Point Theorem: apply a set Wimg={Wi} of contractions to an image

after infinitely many applications, a specific image appears

... called "attractor" or "fractal"

this process is independent of initial "start" image!!

human perception: iteration can stop "pretty soon" (finite no. of iterations)

Q: how to find Wimg such that attractor is image-to-be-compressed?

+ f

e

y

x

dc

ba

ad

t.de

ad

t.de

user

Use of Fractals for Compression??? Overview (2)

Key #3: Collages Theorem:

in order to find Wimg

as above: search Wimg

such that image is(almost) transformed into itself!
http://goback/


90/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tad

http:/

/www

.tk

.informa

tik

.tu-d

arms

tad

R.

St

einme

tz,

M.

Mhlh img img(almost) transformed into itself!

First algorithm published (Joaquin):

partition image into (small, non-overlapping) "range blocks"

search (larger, overlapping) "domain blocks" which can be"contracted" into range blocks

for each range block, find domain block and contraction(lots of possibilities!!)

details / simplifications of Joaquin approach see below

ad

t.de

ad

t.de

user

To apply self-similarity: Image Generation

Examples

(from TUD + Univ. Bochum)

for recursive
http://goback/


91/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tad

http:/

/www

.tk

.informa

tik

.tu-d

arms

tad

R.

Steinme

tz,

M.

Mhlh for recursive

contruction of images

Sirpinky triangle

to produce self-

similar structures infinite steps appliedto different sourceimages lead to sameresult

known asSirpinski-triangle

"Grenzwert" alsoknown as attractor

ad

t.de

ad

t.de

huser

To Find Self-Similarities

affine function allows for

translation

rotation
http://goback/


92/102


Scope

Contents

http:/

/www

.kom

.e-t

ec

hn

ik.t

u-d

arms

tad

http:/

/www

.tk

.informa

tik

.tu-d

arms

tad

R.

Steinme

tz,

M.

Mhlh rotation

scaling

brightness (/color) adaptation

IFS:Iterative Function System

ideally completely self-similar

example see right

PIFS:Partitioned Iterative Function System

real images arenot completly self-similar

Wimg?

ad

t.de

ad

t.de

huser

Theoretical Basis

Banachs Fixed Point Theorem:

Let F be a metrical space

Let W: FF be a contractive mapping
http://goback/


93/102


Scope

Contents

http:/

/www

.kom

.e

-tec

hn

ik.t

u-d

arms

t

http:/

/www

.tk

.informa

tik

.tu-d

arms

t

R.

Steinme

tz,

M.

Mhlh Let W: FF be a contractive mapping

i.e. there exists an s, 0


94/102


Scope

Contents

http:/

/www

.kom

.e

-tec

hn

ik.t

u-d

arms

t

http:/

/www

.tk

.informa

tik

.tu-d

arms

t

R.

Steinme

tz,

M.

Mhlh Decompression: Apply Wimg iteratively to any image easy

Stop when error falls below some bound

Error can be calculated by "Collage Theorem"

tad

t.de

tad

t.de

huser

How to Find Wimg? Joaquins Approach

Systematic search based on"Partitioned Iterative Function System (PIFS)"

Partition image into "range blocks" Ri
http://goback/


95/102


Scope

Contents

http:/

/www

.kom

.e

-tec

hn

ik.t

u-d

arms

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

R.

Steinme

tz,

M.

Mhl Partition image into range blocks Ri

8*8 pixel blocks

non-overlapping

Consider all "domain blocks" Dj of double size

16*16 pixel blocks overlapping

Find for each Ri the most similar Dj consider rotations (0o/90o/180o/270o) and mirroring

adapt brightness and contrast of Dj to that of Ri

translation, rotation, mirroring, brightness/contrast adaptationdefine a (partial) affine function

Combine partial functions to Wimg Compression rate? Example: for each (8*8) range block:

contraction factor fixed

3 bit for transformation

16 bit for domain block coordinates

12 bit for brightness/contrast adaptation

--> factor is 8x8x8 : 31= 512:31 (cf. JPEG example)
http://goback/


96/102


97/102

stad

t.de

stad

t.de

hlh

user

14. Basic Audio and Speech Coding Schemes

Voice encoder/decoder: "vocoder"

Background ITU driven activities


98/102


Scope

Contents

http:/

/www

.kom

.e

-tec

hn

ik.t

u-d

arms

http:/

/www

.tk

.in

forma

tik

.tu-d

arms

R.S

teinme

tz,

M.

Mh

g ITU driven activities

G.711: PCM

with 64 kbps

G.722 differential PCM (DPCM)

48, 56, 64 kbps

G.723

Multipulse-maximum Likelihood Quatizer (MP-MLQ): 6,3 kbps

Algebraic Codebook Excitation Linear Prediction (ACELP) 5,3 kbps

application: speech

stad

t.de

stad

t.de

hlh

user

Schemes for Speech Coding

05a compression

Documents