2. Transform Coding

Lecture notes of the Image Compression and Video Compression series, 2005. 2. Transform Coding. Prof. Amar Mukherjee, Weifeng Sun.


Page 1: 2. Transform Coding

Lecture notes of Image Compression and Video

Compression series 2005

2. Transform Coding

Prof. Amar Mukherjee, Weifeng Sun

Page 2: 2. Transform Coding

#2

Topics

Introduction to Image Compression
Transform Coding
Subband Coding, Filter Banks
Haar Wavelet Transform
SPIHT, EZW, JPEG 2000
Motion Compensation
Wireless Video Compression

Page 3: 2. Transform Coding

#3

Transform Coding

Why transform coding? The purpose of a transformation is to convert the data into a form where compression is easier. The transform maps correlated pixels into a representation where they are decorrelated. The new values are usually smaller on average than the original values, so the net effect is to reduce the redundancy of the representation.

For lossy compression, the transform coefficients can then be quantized according to their statistical properties, producing a much more compact representation of the original image data.

Page 4: 2. Transform Coding

#4

Transform Coding Block Diagram

Transmitter: Original Image f(j,k) → Segment into n×n Blocks → Forward Transform F(u,v) → Quantization and Coder F*(u,v) → Channel

Receiver: Channel → Decoder → Inverse Transform → Combine n×n Blocks → Reconstructed Image

Page 5: 2. Transform Coding

#5

How Transform Coders Work

Divide the image into blocks. Here we use 1×2 blocks for illustration; typical transforms operate on 8×8 or 16×16 blocks.

[Figure: 1×2 blocks; each block is a pair of neighboring pixels (x1, x2)]

Page 6: 2. Transform Coding

#6

Joint Probability Distribution

Observe the joint probability distribution (the joint histogram) of the neighboring pixel pairs.

[Figure: joint histogram over (x1, x2); vertical axis is probability]

Page 7: 2. Transform Coding

#7

Pixel Correlation in Image [Amar]

Rotate 45° clockwise:

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} \cos 45^\circ & \sin 45^\circ \\ -\sin 45^\circ & \cos 45^\circ \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$

Source Image: Amar

[Figure: pixel-pair scatter, before rotation and after rotation]

Page 8: 2. Transform Coding

#8

Pixel Correlation Map in [Amar]-- coordinate distribution

Upper: before rotation. Lower: after rotation.

Notice that the variance of Y2 is smaller than the variance of X2. Compression: apply an entropy coder on Y2.

Page 9: 2. Transform Coding

#9

Pixel Correlation in Image [Lenna]

Let’s look at another example

[Figure: pixel-pair scatter, before rotation and after rotation]

Source Image: Lenna

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} \cos 45^\circ & \sin 45^\circ \\ -\sin 45^\circ & \cos 45^\circ \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$

Page 10: 2. Transform Coding

#10

Pixel Correlation Map in [Lenna]-- coordinate distribution

Upper: before rotation. Lower: after rotation.

Notice that the variance of Y2 is smaller than the variance of X2. Compression: apply an entropy coder on Y2.

Page 11: 2. Transform Coding

#11

Rotation Matrix

Rotated 45 degrees clockwise, with rotation matrix A:

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = A \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} \cos 45^\circ & \sin 45^\circ \\ -\sin 45^\circ & \cos 45^\circ \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$

$$A = \begin{bmatrix} \cos 45^\circ & \sin 45^\circ \\ -\sin 45^\circ & \cos 45^\circ \end{bmatrix} = \begin{bmatrix} \tfrac{\sqrt{2}}{2} & \tfrac{\sqrt{2}}{2} \\ -\tfrac{\sqrt{2}}{2} & \tfrac{\sqrt{2}}{2} \end{bmatrix} = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$

Page 12: 2. Transform Coding

#12

Orthogonal/orthonormal Matrix

The rotation matrix is orthogonal: the dot product of a row with itself is nonzero, and the dot product of two different rows is 0.

Furthermore, the rotation matrix is orthonormal: the dot product of a row with itself is 1.

Example:

$$A = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$

Orthogonal: $A_i \cdot A_j = \begin{cases} 0 & \text{if } i \neq j \\ \neq 0 & \text{otherwise} \end{cases}$

Orthonormal: $A_i \cdot A_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$
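Both properties are easy to check numerically. A minimal NumPy sketch, using the 45° rotation matrix from this slide:

```python
import numpy as np

# 45-degree clockwise rotation matrix from the slide
A = (np.sqrt(2) / 2) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])

# Orthonormal: every row has unit length and distinct rows are orthogonal,
# which together mean A @ A.T is the identity matrix.
gram = A @ A.T
assert np.allclose(gram, np.eye(2))

# Consequently the inverse is just the transpose.
assert np.allclose(np.linalg.inv(A), A.T)
print("A is orthonormal; inverse equals transpose")
```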

Page 13: 2. Transform Coding

#13

Reconstruct the Image

Goal: recover X from Y. Since Y = AX, we have X = A⁻¹Y. Because the inverse of an orthonormal matrix is its transpose, A⁻¹ = Aᵀ.

So X = A⁻¹Y = AᵀY. We have the inverse matrix:

$$A^{-1} = A^T = \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ \\ \sin 45^\circ & \cos 45^\circ \end{bmatrix} = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$$

Page 14: 2. Transform Coding

#14

Energy Compaction

The rotation matrix A compacts the energy into Y1. Here "energy" is the sum of squared values, closely related to the variance (http://davidmlane.com/hyperstat/A16252.html).

Given X = [4, 5]ᵀ:

The total energy of X equals that of Y; it is 4² + 5² = 41. The transformation makes Y2 (0.707) very small.

If we discard min{X} = 4, the error is 4²/41 = 0.39. If we discard min{Y} = 0.707, the error is 0.707²/41 = 0.012. Conclusion: we are far more confident discarding min{Y}.

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = A\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$

Variance: $\delta^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu)^2$, where $\mu$ is the mean.

$$\text{If } X = \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \text{ then } Y = AX = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 6.364 \\ 0.707 \end{bmatrix}$$
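The arithmetic on this slide is easy to reproduce. A small NumPy sketch, following the X = [4, 5]ᵀ example:

```python
import numpy as np

A = (np.sqrt(2) / 2) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])
X = np.array([4.0, 5.0])
Y = A @ X                      # approximately [6.364, 0.707]

# Energy (sum of squares) is preserved by the orthonormal transform...
energy_X = np.sum(X**2)        # 41
energy_Y = np.sum(Y**2)        # also 41
assert np.isclose(energy_X, energy_Y)

# ...but is compacted into Y1, so discarding the small coefficient is cheap.
err_drop_minX = min(X)**2 / energy_X    # 16/41, about 0.39
err_drop_minY = min(Y)**2 / energy_Y    # 0.5/41, about 0.012
print(err_drop_minX, err_drop_minY)
```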

Page 15: 2. Transform Coding

#15

Idea of Transform Coding

Transform the input pixels X0,X1,X2,...,Xn-1 into coefficients Y0,Y1,...,Yn-1 (real values)

The coefficients have the property that most of them are near zero. Most of the “energy” is compacted into a few coefficients.

Scalar quantize the coefficients. This is bit allocation: important coefficients should receive more quantization levels.

Entropy encode the quantization symbols.

Page 16: 2. Transform Coding

#16

Forward transform (1D)

Get the sequence Y from the sequence X. Each element of Y is a linear combination of the elements of X.

The elements of the matrix are also called the weights of the linear transform; they should be independent of the data (except for the KLT).

Basis Vectors:

$$\begin{bmatrix} Y_0 \\ Y_1 \\ \vdots \\ Y_{n-1} \end{bmatrix} = \begin{bmatrix} a_{0,0} & \cdots & a_{0,n-1} \\ a_{1,0} & \cdots & a_{1,n-1} \\ \vdots & \ddots & \vdots \\ a_{n-1,0} & \cdots & a_{n-1,n-1} \end{bmatrix} \begin{bmatrix} X_0 \\ X_1 \\ \vdots \\ X_{n-1} \end{bmatrix}$$

$$Y_j = \sum_{i=0}^{n-1} a_{j,i} X_i, \quad j = 0, 1, \ldots, n-1, \qquad \text{i.e.} \qquad Y = AX$$
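Written out, the double-indexed sum is just a matrix-vector product. An illustrative sketch, reusing the 2×2 rotation matrix from earlier slides:

```python
import numpy as np

def forward_transform(A: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Each output Y[j] is the dot product of basis vector j (row j of A) with X."""
    n = len(X)
    Y = np.zeros(n)
    for j in range(n):
        for i in range(n):
            Y[j] += A[j, i] * X[i]
    return Y                      # identical to A @ X

A = (np.sqrt(2) / 2) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])
X = np.array([4.0, 5.0])
Y = forward_transform(A, X)
assert np.allclose(Y, A @ X)
print(Y)
```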

Page 17: 2. Transform Coding

#17

Choosing the Weights of the Basis Vector

The general guideline for choosing the values of A is to make Y0 large while keeping Y1, ..., Yn-1 small. A coefficient Yi will be large if the weights a_ij reinforce the corresponding data items X_j, which requires the weights and the data values to have similar signs. The converse also holds: Yi will be small if the weights and the data values have dissimilar signs.

Page 18: 2. Transform Coding

#18

Extracting Features of Data

Thus, the basis vectors should extract distinct features of the data vectors and must be independent (orthogonal). Note the pattern of +1 and -1 entries in the matrix: they are intended to pick up the low- and high-frequency components of the data. Normally the coefficients decrease in magnitude in the order Y0, Y1, ..., Yn-1, so Y is more amenable to compression than X. Y = AX.

Page 19: 2. Transform Coding

#19

Energy Preserving (1D)

Another consideration in choosing the transform matrix is to conserve energy. For example, take the orthogonal matrix A below.

Energy before transform: 4² + 6² + 5² + 2² = 81. Energy after transform: 17² + 3² + (-5)² + 1² = 324. The energy changed! Solution: scale A by a scale factor. The scaling does not change the fact that most of the energy is concentrated in the low-frequency components.

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}, \qquad X = \begin{bmatrix} 4 \\ 6 \\ 5 \\ 2 \end{bmatrix}, \qquad Y = AX = \begin{bmatrix} 17 \\ 3 \\ -5 \\ 1 \end{bmatrix}$$
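A quick numeric check of the scaling fix, using the 4-point ±1 matrix from this slide (the 1/2 scale factor makes each row unit length):

```python
import numpy as np

A = np.array([[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, -1, 1],
              [1, -1, 1, -1]], dtype=float)
X = np.array([4.0, 6.0, 5.0, 2.0])

Y = A @ X                           # [17, 3, -5, 1]
print(np.sum(X**2), np.sum(Y**2))   # 81 vs 324: energy not preserved

# Scaling by 1/2 makes each row unit length (norm sqrt(4) = 2 before scaling),
# i.e. A/2 is orthonormal, so energy is preserved exactly.
Y_scaled = (A / 2) @ X
assert np.isclose(np.sum(Y_scaled**2), np.sum(X**2))
```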

Page 20: 2. Transform Coding

#20

Energy Preserving, Formal Proof

The sum of the squares of the transformed sequence equals the sum of the squares of the original sequence; most of the energy is concentrated in the low-frequency coefficients.

$$\text{Energy} = \sum_{i=1}^{n} Y_i^2 = Y^T Y = (AX)^T (AX) = X^T A^T A X = X^T X = \sum_{i=1}^{n} X_i^2$$

using $A^T A = I$ for an orthonormal matrix (see page #12).

Page 21: 2. Transform Coding

#21

Why are we interested in orthonormal matrices?

Normally it is computationally expensive to compute the inverse of a matrix. For an orthonormal transformation matrix, the inverse is simply its transpose:

A⁻¹ = Aᵀ

$$A = \frac{1}{\sqrt{8}}\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{bmatrix}, \qquad B = A^{-1} = A^T$$
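The inverse-equals-transpose property can be verified in a few lines. A sketch using the natural (recursive) Hadamard ordering rather than the slide's row ordering; the scaling and the property are the same either way:

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Natural-order +/-1 Hadamard matrix, n a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

A = hadamard(8) / np.sqrt(8)        # 1/sqrt(8) makes the rows orthonormal
assert np.allclose(A @ A.T, np.eye(8))       # rows orthonormal
assert np.allclose(np.linalg.inv(A), A.T)    # inverse equals transpose
print("inverse equals transpose")
```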

Page 22: 2. Transform Coding

#22

Two Dimensional Transform

From the input image I, we get a 4×4 block D. Given the transform matrix A:

2D transformation goes as:

Notice the energy compaction.

$$A = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}, \qquad D = \begin{bmatrix} 4 & 7 & 6 & 9 \\ 6 & 8 & 3 & 6 \\ 5 & 4 & 7 & 6 \\ 2 & 4 & 5 & 9 \end{bmatrix}$$

$$Y = A\,D\,A^T = \begin{bmatrix} 22.75 & -2.75 & 0.75 & -3.75 \\ 1.75 & 3.25 & -0.25 & -1.75 \\ 0.25 & -3.25 & 0.25 & -2.25 \\ 1.25 & -1.25 & -0.75 & 1.75 \end{bmatrix}$$
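The 2-D example can be checked directly. A sketch using the 4×4 block and scaled matrix from this slide:

```python
import numpy as np

A = 0.5 * np.array([[1, 1, 1, 1],
                    [1, 1, -1, -1],
                    [1, -1, -1, 1],
                    [1, -1, 1, -1]], dtype=float)
D = np.array([[4, 7, 6, 9],
              [6, 8, 3, 6],
              [5, 4, 7, 6],
              [2, 4, 5, 9]], dtype=float)

Y = A @ D @ A.T
print(Y)            # the top-left (DC) entry dominates: energy compaction

# Because A is orthonormal, the inverse transform recovers D exactly.
D_back = A.T @ Y @ A
assert np.allclose(D_back, D)
```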

Page 23: 2. Transform Coding

#23

Two Dimensional Transform

Because the transformation matrix is orthonormal, we have $A^T = A^{-1}$. So:

Forward transform: $Y = A X A^{-1} = A X A^T$

Backward transform: $X = A^{-1} Y A = A^T Y A$

Page 24: 2. Transform Coding

#24

Linear Separable transform

Two dimensional transform is simplified as two iterations of one-dimensional transform.

Column-wise transform and row-wise transform.

$$Y_{k,l} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} a_{k,i}\, X_{i,j}\, a_{l,j}$$
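Separability is easy to demonstrate numerically. A sketch with an arbitrary 4×4 test block; `einsum` spells out the double sum, and the separable form does the same work as two matrix products:

```python
import numpy as np

A = 0.5 * np.array([[1, 1, 1, 1],
                    [1, 1, -1, -1],
                    [1, -1, -1, 1],
                    [1, -1, 1, -1]], dtype=float)
X = np.arange(16, dtype=float).reshape(4, 4)   # arbitrary test block

# Direct double sum: Y[k,l] = sum_i sum_j a[k,i] * X[i,j] * a[l,j]
Y_direct = np.einsum('ki,ij,lj->kl', A, X, A)

# Separable computation: one 1-D transform along each axis.
Y_separable = (A @ X) @ A.T
assert np.allclose(Y_direct, Y_separable)
print("separable result matches the double sum")
```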

Page 25: 2. Transform Coding

#25

Transform and Filtering

Consider the orthonormal transform

$$A = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$

If A is applied to a vector of two identical elements, x = [x, x]ᵀ, the transformed sequence is $(\sqrt{2}x, 0)^T$: the "low frequency" or "average" value is $\sqrt{2}x$, and the "high frequency" component is 0 because the signal values do not vary. If x = [3, 1]ᵀ or [3, -1]ᵀ, the output sequences are $(2\sqrt{2}, -\sqrt{2})^T$ and $(\sqrt{2}, -2\sqrt{2})^T$, respectively. Now the high-frequency component is nonzero, and its magnitude is larger for [3, -1]ᵀ, indicating a much larger variation. Thus the two coefficients behave like the outputs of a "low-pass" and a "high-pass" filter, respectively.

Page 26: 2. Transform Coding

#26

Transform and Functional Approximation

A transform is a kind of function approximation: an image is a data set, and any data set is a function. The transform approximates the image function by a combination of simpler, well-defined "waveforms" (basis functions). Not all basis sets are equal in terms of compression, and the DCT and wavelets are computationally easier than the Fourier transform.

Page 27: 2. Transform Coding

#27

Comparison of various transforms

The KLT is optimal in the sense of decorrelation and energy packing. The Walsh-Hadamard Transform is especially easy to implement: its basis functions are either -1 or +1, so only additions and subtractions are necessary.

Page 28: 2. Transform Coding

#28

Two-Dimensional Basis Matrix

The outer product of two vectors V1 and V2 is defined as V1V2ᵀ. For example,

$$V_1 V_2^T = \begin{bmatrix} a \\ b \\ c \end{bmatrix}\begin{bmatrix} d & e & f \end{bmatrix} = \begin{bmatrix} ad & ae & af \\ bd & be & bf \\ cd & ce & cf \end{bmatrix}$$

For a matrix A of size n×n, the outer product $\alpha_{ij}$ of the i-th and j-th rows is defined as

$$\alpha_{ij} = \begin{bmatrix} a_{i,0} \\ a_{i,1} \\ \vdots \\ a_{i,n-1} \end{bmatrix}\begin{bmatrix} a_{j,0} & a_{j,1} & \cdots & a_{j,n-1} \end{bmatrix} = \begin{bmatrix} a_{i,0}a_{j,0} & a_{i,0}a_{j,1} & \cdots & a_{i,0}a_{j,n-1} \\ a_{i,1}a_{j,0} & a_{i,1}a_{j,1} & \cdots & a_{i,1}a_{j,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i,n-1}a_{j,0} & a_{i,n-1}a_{j,1} & \cdots & a_{i,n-1}a_{j,n-1} \end{bmatrix}$$

Page 29: 2. Transform Coding

#29

Outer Product

For example, if

$$A = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$

we have:

$$\alpha_{00} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \qquad
\alpha_{01} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\begin{bmatrix} -1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}$$

$$\alpha_{10} = \frac{1}{2}\begin{bmatrix} -1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix} \qquad
\alpha_{11} = \frac{1}{2}\begin{bmatrix} -1 \\ 1 \end{bmatrix}\begin{bmatrix} -1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$$

Page 30: 2. Transform Coding

#30

Outer Product (2)

We have shown earlier that $X = A^T Y A$. Consider X to be a 2×2 matrix. Carrying out the multiplication with the coefficients $Y = A X A^T$:

$$X = A^T Y A = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} y_{00} & y_{01} \\ y_{10} & y_{11} \end{bmatrix}\frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$

$$= y_{00}\,\frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} + y_{01}\,\frac{1}{2}\begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix} + y_{10}\,\frac{1}{2}\begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix} + y_{11}\,\frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$$

$$= y_{00}\,\alpha_{00} + y_{01}\,\alpha_{01} + y_{10}\,\alpha_{10} + y_{11}\,\alpha_{11}$$
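The decomposition into basis matrices can be tested end to end. A sketch: form the outer products of the rows of A, transform an arbitrary 2×2 block, and rebuild it as the coefficient-weighted sum of basis matrices:

```python
import numpy as np

A = (np.sqrt(2) / 2) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])

# Basis matrices: outer products of pairs of rows of A.
alpha = {(i, j): np.outer(A[i], A[j]) for i in range(2) for j in range(2)}

# Any 2x2 block X is a linear combination of the basis matrices,
# weighted by its transform coefficients Y = A X A^T.
X = np.array([[3.0, 1.0],
              [2.0, 4.0]])
Y = A @ X @ A.T
X_rebuilt = sum(Y[i, j] * alpha[(i, j)] for i in range(2) for j in range(2))
assert np.allclose(X_rebuilt, X)
print("X reconstructed from basis matrices")
```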

Page 31: 2. Transform Coding

#31

Basis Matrix

The quantities α00, α01, α10, α11 are called the basis matrices in 2-D space. In general, the outer products of the rows of an n×n orthonormal matrix form a basis matrix set in two dimensions. The quantity α00 corresponds to the DC coefficient (note that all its elements are 1, indicating an averaging operation); the other basis matrices have alternating values and correspond to the AC coefficients.

Page 32: 2. Transform Coding

#32

Fast Cosine Transform

2D 8X8 basis functions of the DCT: The horizontal frequency of the basis functions increases from left to right and the vertical frequency of the basis functions increases from top to bottom.


Page 33: 2. Transform Coding

#33

Amplitude distribution of the DCTcoefficients

Histograms for 8x8 DCT coefficient amplitudes measured for natural images

The DC coefficient is typically uniformly distributed. The AC coefficients have a Laplacian distribution with zero mean.

Page 34: 2. Transform Coding

#34

Discrete Cosine Transform (DCT)

Conventional image data have reasonably high inter-element correlation. The DCT avoids the generation of spurious spectral components, which is a problem with the DFT, and has a fast implementation that avoids complex algebra.

Page 35: 2. Transform Coding

#35

One-dimensional DCT

The basic idea is to decompose the image into a set of "waveforms", each with a particular spatial frequency. To human eyes, high spatial frequencies are imperceptible, so a good approximation of the image can be created by keeping only the lower frequencies. Consider the one-dimensional case first. Eight arbitrary grayscale values (with range 0 to 255, shown in the next slide) are level-shifted by 128 (as is done by JPEG).

Page 36: 2. Transform Coding

#36

[Figure: an example of 1-D DCT decomposition. Left: the eight image samples f(x) before the DCT; right: the coefficients S(u) after the DCT, both plotted for indices 1 to 8 with amplitude range -150 to 150.]

[Figure: the 8 basis functions for the 1-D DCT, u = 0, 1, ..., 7, each sampled at 8 points with amplitude in [-1, 1].]

One-dimensional DCT

Page 37: 2. Transform Coding

#37

One-dimensional DCT

The waveforms can be denoted as $w_f(\theta) = \cos(f\theta)$, $0 \le \theta \le \pi$, with frequencies f = 0, 1, ..., 7. Each wave is sampled at the 8 points

$$\theta = \frac{\pi}{16}, \frac{3\pi}{16}, \frac{5\pi}{16}, \frac{7\pi}{16}, \frac{9\pi}{16}, \frac{11\pi}{16}, \frac{13\pi}{16}, \frac{15\pi}{16}$$

to form a basis vector. The eight basis vectors constitute a matrix A:

$$A = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0.981 & 0.831 & 0.556 & 0.195 & -0.195 & -0.556 & -0.831 & -0.981 \\
0.924 & 0.383 & -0.383 & -0.924 & -0.924 & -0.383 & 0.383 & 0.924 \\
0.831 & -0.195 & -0.981 & -0.556 & 0.556 & 0.981 & 0.195 & -0.831 \\
0.707 & -0.707 & -0.707 & 0.707 & 0.707 & -0.707 & -0.707 & 0.707 \\
0.556 & -0.981 & 0.195 & 0.831 & -0.831 & -0.195 & 0.981 & -0.556 \\
0.383 & -0.924 & 0.924 & -0.383 & -0.383 & 0.924 & -0.924 & 0.383 \\
0.195 & -0.556 & 0.831 & -0.981 & 0.981 & -0.831 & 0.556 & -0.195
\end{bmatrix}$$
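This matrix is straightforward to generate. A sketch that builds the cosine samples and checks that, with the C(u)/2 scale factors of the next slides, the rows become orthonormal:

```python
import numpy as np

# Rows of A sample cos(u * theta) at theta = (2x+1)*pi/16, for u, x = 0..7.
x = np.arange(8)
u = np.arange(8).reshape(-1, 1)
A = np.cos((2 * x + 1) * u * np.pi / 16)

print(np.round(A, 3))   # matches the matrix on this slide

# With the scale factors C(u)/2 (C(0) = sqrt(2)/2, C(u>0) = 1), the rows
# become orthonormal, which is what makes the DCT invertible by transpose.
scale = np.full(8, 0.5)
scale[0] = np.sqrt(2) / 4          # (1/2) * C(0)
T = scale.reshape(-1, 1) * A
assert np.allclose(T @ T.T, np.eye(8))
```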

Page 38: 2. Transform Coding

#38

Page 39: 2. Transform Coding

#39

One-dimensional DCT

The output of the DCT transform is

$$S(u) = A\,[I(x)]^T$$

where A is the 8×8 transformation matrix defined on the previous slide and I(x) is the input signal. The values S(u) are called the DCT coefficients of the input signal I(x).

Page 40: 2. Transform Coding

#40

One-dimensional FDCT and IDCT with N=8 Sample points

The 1-D DCT in JPEG is defined as follows.

FDCT:

$$S(u) = \frac{C(u)}{2} \sum_{x=0}^{7} I(x)\cos\left[\frac{(2x+1)u\pi}{16}\right], \quad u = 0, 1, \ldots, 7$$

IDCT:

$$I(x) = \sum_{u=0}^{7} \frac{C(u)}{2}\, S(u)\cos\left[\frac{(2x+1)u\pi}{16}\right], \quad x = 0, 1, \ldots, 7$$

where u is the frequency, I(x) is the 1-D sample, S(u) is the 1-D DCT coefficient, and

$$C(u) = \begin{cases} \dfrac{\sqrt{2}}{2} & \text{for } u = 0 \\ 1 & \text{for } u > 0 \end{cases}$$

Page 41: 2. Transform Coding

#41

One-dimensional FDCT and IDCT with N Sample points

The general N-point 1-D DCT is defined as follows.

FDCT:

$$S(u) = \sqrt{\frac{2}{N}}\, C(u) \sum_{x=0}^{N-1} I(x)\cos\left[\frac{(2x+1)u\pi}{2N}\right], \quad u = 0, 1, \ldots, N-1$$

IDCT:

$$I(x) = \sqrt{\frac{2}{N}} \sum_{u=0}^{N-1} C(u)\, S(u)\cos\left[\frac{(2x+1)u\pi}{2N}\right], \quad x = 0, 1, \ldots, N-1$$

where u is the frequency, I(x) is the 1-D sample, S(u) is the 1-D DCT coefficient, and

$$C(u) = \begin{cases} \dfrac{\sqrt{2}}{2} & \text{for } u = 0 \\ 1 & \text{for } u > 0 \end{cases}$$

Page 42: 2. Transform Coding

#42

One-dimensional FDCT and IDCT

As an example, let I(x) = [12, 10, 8, 10, 12, 10, 8, 11]. Then

$$S(u) = A\,[I(x)]^T = [28.6375,\ 0.5712,\ 0.4619,\ 1.757,\ 3.182,\ -1.729,\ 0.191,\ -0.309]$$

If we now apply the IDCT, we get back I(x). We can also quantize the coefficients S(u) and still obtain a very good approximation of I(x). For example,

$$\text{IDCT}(28.6,\ 0.6,\ 0.5,\ 1.8,\ 3.2,\ -1.8,\ 0.2,\ -0.3) = (12.0254,\ 10.0233,\ 7.96054,\ 9.93097,\ 12.0164,\ 9.9932,\ 7.99354,\ 10.9989)$$

while

$$\text{IDCT}(28,\ 0,\ 0,\ 2,\ 3,\ -2,\ 0,\ 0) = (11.236,\ 9.6244,\ 7.6628,\ 9.573,\ 12.347,\ 10.014,\ 8.053,\ 10.684)$$
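The coefficients are easy to reproduce from the 8-point FDCT formula. A sketch using the slide's sample I(x):

```python
import numpy as np

def fdct(I):
    """1-D 8-point FDCT as defined on the previous slides."""
    N = len(I)
    x = np.arange(N)
    S = np.empty(N)
    for u in range(N):
        Cu = np.sqrt(2) / 2 if u == 0 else 1.0
        S[u] = (Cu / 2) * np.sum(I * np.cos((2 * x + 1) * u * np.pi / 16))
    return S

I = np.array([12, 10, 8, 10, 12, 10, 8, 11], dtype=float)
S = fdct(I)
print(np.round(S, 4))   # S[0] is about 28.6375, S[4] about 3.182
```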

Page 43: 2. Transform Coding

#43

Two-dimensional FDCT and IDCT for 8X8 block

The 2-D DCT in JPEG is defined as follows.

FDCT:

$$S(v,u) = \frac{C(v)}{2}\frac{C(u)}{2} \sum_{y=0}^{7}\sum_{x=0}^{7} I(y,x)\cos\left[\frac{(2y+1)v\pi}{16}\right]\cos\left[\frac{(2x+1)u\pi}{16}\right]$$

IDCT:

$$I(y,x) = \sum_{v=0}^{7}\sum_{u=0}^{7} \frac{C(v)}{2}\frac{C(u)}{2}\, S(v,u)\cos\left[\frac{(2y+1)v\pi}{16}\right]\cos\left[\frac{(2x+1)u\pi}{16}\right]$$

where I(y,x) is the 2-D sample, S(v,u) is the 2-D DCT coefficient, and

$$C(u) = \begin{cases} \dfrac{\sqrt{2}}{2} & u = 0 \\ 1 & u > 0 \end{cases} \qquad C(v) = \begin{cases} \dfrac{\sqrt{2}}{2} & v = 0 \\ 1 & v > 0 \end{cases}$$

Page 44: 2. Transform Coding

#44

Fast Cosine Transform

[Figure: the 2-D 8×8 DCT basis functions (see slide #32)]

Page 45: 2. Transform Coding

#45

Two-dimensional FDCT for NXM block

FDCT:

$$S(v,u) = \frac{2}{\sqrt{NM}}\, C(v)\, C(u) \sum_{y=0}^{M-1}\sum_{x=0}^{N-1} I(y,x)\cos\left[\frac{(2y+1)v\pi}{2M}\right]\cos\left[\frac{(2x+1)u\pi}{2N}\right]$$

for u = 0, 1, ..., N-1 and v = 0, 1, ..., M-1, where I(y,x) is the 2-D sample, S(v,u) is the 2-D DCT coefficient, and

$$C(u) = \begin{cases} \dfrac{\sqrt{2}}{2} & u = 0 \\ 1 & u > 0 \end{cases} \qquad C(v) = \begin{cases} \dfrac{\sqrt{2}}{2} & v = 0 \\ 1 & v > 0 \end{cases}$$

Page 46: 2. Transform Coding

#46

Separable Function

The 2-D FDCT is a separable function, because we can express the formula as

$$S(v,u) = \sqrt{\frac{2}{M}}\, C(v) \sum_{y=0}^{M-1} \left\{ \sqrt{\frac{2}{N}}\, C(u) \sum_{x=0}^{N-1} I(y,x)\cos\left[\frac{(2x+1)u\pi}{2N}\right] \right\} \cos\left[\frac{(2y+1)v\pi}{2M}\right]$$

for u = 0, 1, ..., N-1 and v = 0, 1, ..., M-1.

This means that we can compute a 2-D FDCT by first computing a row-wise 1-D FDCT and then performing a column-wise 1-D FDCT on the output. This is possible, as we explained earlier, because the basis matrices are orthonormal.
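The two-pass computation can be sketched directly from the N-point 1-D formula:

```python
import numpy as np

def dct1d(v):
    """N-point 1-D FDCT: S(u) = sqrt(2/N) C(u) sum_x v[x] cos((2x+1)u pi / 2N)."""
    N = len(v)
    x = np.arange(N)
    S = np.empty(N)
    for u in range(N):
        Cu = np.sqrt(2) / 2 if u == 0 else 1.0
        S[u] = np.sqrt(2.0 / N) * Cu * np.sum(
            v * np.cos((2 * x + 1) * u * np.pi / (2 * N)))
    return S

block = np.arange(64, dtype=float).reshape(8, 8)   # arbitrary test block

# Row-wise 1-D FDCT, then column-wise 1-D FDCT on the result.
rows = np.apply_along_axis(dct1d, 1, block)
S = np.apply_along_axis(dct1d, 0, rows)

# For an 8x8 block, the DC term works out to the block sum divided by 8.
assert np.isclose(S[0, 0], block.sum() / 8)
print(S[0, 0])
```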

Page 47: 2. Transform Coding

#47

Two-dimensional IDCT for NXM block

$$I(y,x) = \frac{2}{\sqrt{NM}} \sum_{v=0}^{M-1}\sum_{u=0}^{N-1} C(v)\, C(u)\, S(v,u)\cos\left[\frac{(2y+1)v\pi}{2M}\right]\cos\left[\frac{(2x+1)u\pi}{2N}\right]$$

for x = 0, 1, ..., N-1 and y = 0, 1, ..., M-1.

This function is again separable, and the computation can be done in two steps: a row-wise 1-D IDCT followed by a column-wise 1-D IDCT. A straightforward algorithm to compute both the FDCT and the IDCT needs (for a block of N×N pixels) on the order of O(N³) multiplication/addition operations. After the ground-breaking discovery of the O(N log N) algorithm for the Fast Fourier Transform, several researchers proposed efficient algorithms for DCT computation.

Page 48: 2. Transform Coding

#48

JPEG Introduction - The background

JPEG stands for Joint Photographic Experts Group.
A standard image compression method is needed to enable interoperability of equipment from different manufacturers.
It is the first international digital image compression standard for continuous-tone images (grayscale or color).
The history of JPEG: the selection process.

Page 49: 2. Transform Coding

#49

JPEG Introduction – what’s the objective?

"Very good" or "excellent" compression rate, reconstructed image quality, and transmission rate.
Be applicable to practically any kind of continuous-tone digital source image.
Good complexity.
Have the following modes of operation:

sequential encoding
progressive encoding
lossless encoding
hierarchical encoding

Page 50: 2. Transform Coding

#50

Image compression system:

source image data → encoder → compressed image data → decoder → reconstructed image data

The basic parts of a JPEG encoder:

source image data → encoder model (produces descriptors, guided by model tables) → statistical model (produces symbols) → entropy encoder (guided by entropy coding tables) → compressed image data

JPEG Architecture Standard

Page 51: 2. Transform Coding

#51

JPEG has the following operation modes:

Sequential DCT-based mode
Progressive DCT-based mode
Sequential lossless mode
Hierarchical mode

JPEG entropy coding supports:

Huffman encoding
Arithmetic encoding

JPEG Overview (cont.)

Page 52: 2. Transform Coding

#52

JPEG Baseline System

The JPEG Baseline system is composed of:

Sequential DCT-based mode
Huffman coding

The basic architecture of the JPEG Baseline system:

source image data → (8×8 blocks) FDCT → quantizer (with table specification) → statistical model → entropy encoder (with table specification) → compressed image data

The FDCT and quantizer together form the DCT-based encoder.

Page 53: 2. Transform Coding

#53

The Baseline System

The DCT coefficient values can be regarded as the relative amounts of the 2-D spatial frequencies contained in the 8×8 block.

The upper-left coefficient is called the DC coefficient; it is a measure of the average energy of the block.

The other coefficients are called AC coefficients; the coefficients corresponding to high frequencies tend to be zero or near zero for most natural images.

Page 54: 2. Transform Coding

#54

Quantization Tables in DCT

Human eyes are less sensitive to high frequencies, so we use a different quantization value for each frequency:

Higher frequency: bigger quantization value.
Lower frequency: smaller quantization value.

Each DCT coefficient corresponds to a certain frequency. The human visual system is also much more sensitive to variations in the achromatic (luminance) channel than in the chromatic channels.

Page 55: 2. Transform Coding

#55

The Baseline System – Quantization

Why quantization? To achieve further compression by representing DCT coefficients with no greater precision than is necessary to achieve the desired image quality.

Generally, the high-frequency coefficients have larger quantization values. Quantization drives most coefficients to zero, which makes the compression system efficient, but it is the main source of loss and introduces distortion in the reconstructed image. The quantization step-size parameters for JPEG are given by two quantization matrices, each element being an integer between 1 and 255. The DCT coefficients are divided by the corresponding quantization parameters and rounded to the nearest integer.

Page 56: 2. Transform Coding

#56

$$F'(u,v) = \text{Round}\left(\frac{F(u,v)}{Q(u,v)}\right)$$

F(u,v): original DCT coefficient. F'(u,v): DCT coefficient after quantization. Q(u,v): quantization value.

There are two quantization tables: one for the luminance component and the other for the chrominance component. The JPEG standard does not prescribe the values of the tables; that is left up to the user, but it recommends the two tables below for normal circumstances. These tables were created by analyzing human-perception data and a lot of trial and error.
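The quantize/dequantize round trip can be sketched in a few lines. The coefficient values below are hypothetical; the Q values are typical of the top row of a luminance table:

```python
import numpy as np

def quantize(F, Q):
    """F'(u,v) = Round(F(u,v) / Q(u,v))."""
    return np.round(F / Q).astype(int)

def dequantize(Fq, Q):
    """Decoder side: rescale back; the rounding error is the (lossy) distortion."""
    return Fq * Q

F = np.array([235.0, -22.0, 13.0, 3.0])   # hypothetical DCT coefficients
Q = np.array([16.0, 11.0, 10.0, 16.0])

Fq = quantize(F, Q)        # [15, -2, 1, 0]: small integers, and zeros appear
F_rec = dequantize(Fq, Q)  # [240, -22, 10, 0]: close to F, but not exact
print(Fq, F_rec)
```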

Page 57: 2. Transform Coding

#57

Quantization Tables in DCT

So we have two quantization tables, measured for an "average" observer. Higher frequency: bigger quantization value. Lower frequency: smaller quantization value.

Luminance quantization table:

16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99

Chrominance quantization table:

17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99

Page 58: 2. Transform Coding

#58

Two-dimensional DCT

The image samples are shifted from unsigned integers with range [0, 2ᴺ − 1] to signed integers with range [−2ᴺ⁻¹, 2ᴺ⁻¹ − 1], where N is the sample precision in bits. Thus samples in the range 0 to 255 are converted to the range −128 to 127, and those in the range 0 to 4095 are converted to the range −2048 to 2047. This zero-shift is done in JPEG to reduce the internal precision requirements of the DCT calculations. How do we interpret the DCT coefficients?

The DCT coefficient values can be regarded as the relative amounts of the 2-D spatial frequencies contained in the 8×8 block. F(0,0) is called the DC coefficient; it is a measure of the average energy of the block. The other coefficients are called AC coefficients; the coefficients corresponding to high frequencies tend to be zero or near zero for most images.

The energy is concentrated in the upper-left corner.

Page 59: 2. Transform Coding

#59

Baseline System - DC coefficient coding

After the quantization step, the DC coefficients are encoded separately using differential encoding (DPCM: differential pulse code modulation). The DC difference sequence consists of the DC coefficient of the first block followed by the difference values of the DC coefficients of the succeeding blocks. The average values of succeeding blocks are usually correlated, so the differences are small.

quantized DC coefficients → DPCM → DC differences
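The DPCM step above can be sketched as follows; the DC values are hypothetical quantized coefficients from consecutive 8×8 blocks:

```python
def dpcm_encode(dc_coeffs):
    """Keep the first DC value; each later block stores its difference
    from the previous block's DC value."""
    diffs = [dc_coeffs[0]]
    for prev, cur in zip(dc_coeffs, dc_coeffs[1:]):
        diffs.append(cur - prev)
    return diffs

def dpcm_decode(diffs):
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out

dc = [15, 16, 16, 14, 13]          # hypothetical quantized DC values
encoded = dpcm_encode(dc)          # [15, 1, 0, -2, -1]: small, cheap to code
assert dpcm_decode(encoded) == dc
print(encoded)
```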

Page 60: 2. Transform Coding

#60

Baseline System - AC coefficient coding

AC coefficients are arranged into a zig-zag sequence. This is because most of the significant coefficients are located in the upper left corner of the matrix. The high frequency components are mostly 0’s and can be efficiently coded by run-length encoding.

0 1 5 6 14 15 27 28

2 4 7 13 16 26 29 42

3 8 12 17 25 30 41 43

9 11 18 24 31 40 44 53

10 19 23 32 39 45 52 54

20 22 33 38 46 51 55 60

21 34 37 47 50 56 59 61

35 36 49 48 57 58 62 63

(Horizontal frequency increases from left to right; vertical frequency increases from top to bottom.)
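The zig-zag scan order in the table above can be generated programmatically; a sketch that groups positions by anti-diagonal and alternates direction:

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for d in range(2 * n - 1):
        diag = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if d % 2 == 0:
            diag.reverse()     # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

order = zigzag_order()
print(order[:6])   # (0,0), (0,1), (1,0), (2,0), (1,1), (0,2) as in the table
```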

Page 61: 2. Transform Coding

#61

An Example (Ref: T. Acharya and P. Tsai, "JPEG 2000 Standard for Image Compression", Wiley-Interscience, 2005)

The 8×8 image data block:

110 110 118 118 121 126 131 131
108 111 125 122 120 125 134 135
106 119 129 127 125 127 138 144
110 126 130 133 133 131 141 148
115 116 119 120 122 125 137 139
115 106 99 110 107 116 130 127
110 91 82 101 99 104 120 118
103 76 70 95 92 91 107 106

Page 62: 2. Transform Coding

#62

Page 63: 2. Transform Coding

#63

Baseline System - Statistical modeling

Statistical modeling translates the inputs into a sequence of "symbols" for the Huffman coder to use. Statistical modeling of the DC coefficients produces:

Category: the size class (SSSS).
Magnitude: the amplitude of the difference (additional bits).

Page 64: 2. Transform Coding

#64

Coding the DC coeeficients

The prediction errors are classified into 16 categories, modulo 2¹⁶. Category C contains the range of integers [−(2ᶜ − 1), +(2ᶜ − 1)], excluding the numbers in the range [−(2ᶜ⁻¹ − 1), +(2ᶜ⁻¹ − 1)], which fall in the previous category. The category number is encoded using either a fixed code (SSSS), a unary code (0, 10, 110, ...), or a variable-length Huffman code.
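The category number is simply the number of bits needed for the magnitude of the difference; a one-line sketch:

```python
def dc_category(diff):
    """Category C is the bit length of |diff|: category C covers magnitudes
    2**(C-1) .. 2**C - 1, and category 0 means diff == 0."""
    return abs(diff).bit_length()

# Magnitude 1 falls in category 1; 2..3 in category 2; 4..7 in category 3; ...
cats = [dc_category(d) for d in (0, 1, -1, 3, -6, 7, 8)]
print(cats)   # [0, 1, 1, 2, 3, 3, 4]
```

For instance the prediction error −6 used in the running example falls in category 3.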

Page 65: 2. Transform Coding

#65

Coding the DC/AC coefficients: if the magnitude of the error is positive, it is encoded using 'category' bits with a leading '1'. If it is negative, the 1's complement of the positive number is used. Thus, if the error is −6 and the Huffman code for category 3 is 1110, it gets the code 1110001. The JPEG standard recommends two Huffman code tables for the category numbers: one for luminance DC and the other for chrominance DC (see Salomon, p. 289; they are also described in Annex K, Tables K.3 and K.4, of the standard).

Statistical modeling of the AC coefficients produces:

Run-length: RUN-SIZE = 16 × RRRR + SSSS.
Amplitude: the amplitude of the coefficient (additional bits).

Page 66: 2. Transform Coding

#66

The AC coefficients contain just a few nonzero numbers, with runs of zeros and typically a long run of trailing zeros. For each nonzero number, two values are generated: 1) RRRR, the number of consecutive zeros preceding the nonzero coefficient being encoded; 2) SSSS, the category corresponding to the number of bits required to encode the coefficient's amplitude. The (RRRR, SSSS) pair is coded with a variable-length Huffman code specified in the form of a table (see next slide). The second attribute of the AC coefficient is its amplitude, which is encoded exactly the same way as the DC values. Note that the code (0,0) is reserved for EOB (signifying that the remaining AC coefficients are just a single trailing run of 0's). The code (15,0) is reserved for a maximum run of 16 zeros; if a run is longer than 16, it is broken into runs of 16 plus possibly a final run shorter than 16. JPEG recommends two Huffman code tables for AC coefficients, Tables K.5 and K.6, for luminance and chrominance values.
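The symbol generation described above can be sketched as follows. The input vector is arranged so that it reproduces the (run, size)(value) sequence of the worked example on the next slides:

```python
def ac_symbols(zigzag_ac):
    """Turn zig-zag-ordered AC coefficients into (run, size)(value) symbols.
    size is the JPEG category (bit length of |value|); (0,0) marks EOB."""
    symbols, run = [], 0
    # The trailing run of zeros is represented by a single EOB symbol.
    last = max((i for i, v in enumerate(zigzag_ac) if v != 0), default=-1)
    for v in zigzag_ac[:last + 1]:
        if v == 0:
            run += 1
            if run == 16:              # (15,0) encodes a full run of 16 zeros
                symbols.append(((15, 0), None))
                run = 0
        else:
            symbols.append(((run, abs(v).bit_length()), v))
            run = 0
    symbols.append(((0, 0), None))     # EOB
    return symbols

ac = [-6, 6, -5, 0, 2, 0, -1] + [0] * 5 + [-1, 0, 0, -1, 1] + [0] * 46
syms = ac_symbols(ac)
print(syms)   # (0,3)(-6), (0,3)(6), (0,3)(-5), (1,2)(2), (1,1)(-1), ...
```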

Page 67: 2. Transform Coding

#67

Huffman AC statistical model run-length/amplitude combinations Huffman coding of AC coefficients

Encoding the AC Coefficients

Page 68: 2. Transform Coding

#68

Example (Contd.)

Let us finish our example. The DC coefficient is −6, which has the code 1110001. The AC coefficients are encoded as: (0,3)(−6), (0,3)(6), (0,3)(−5), (1,2)(2), (1,1)(−1), (5,1)(−1), (2,1)(−1), (0,1)(1), (0,0). Table K.5 gives the codes 1010, 00, 100, 1100, 11011, 11100, and 11110101 for the symbol pairs (0,0), (0,1), (0,3), (1,1), (1,2), (2,1), and (5,1), respectively. The variable-length codes for the AC amplitudes 1, −1, 2, −5, 6, and −6 are 1, 0, 10, 010, 110, and 001, respectively. Thus JPEG gives a compressed representation of the example image using only 55 bits:

1110001 100001 100110 100010 1101110 11000 11110100 111000 001 1010

The uncompressed representation would have required 512 bits!

Page 69: 2. Transform Coding

#69

Example to illustrate encoding/decoding

Page 70: 2. Transform Coding

#70

JPEG Progressive Model

Why a progressive model? Quick transmission: the image is built up in coarse-to-fine passes.

First stage: encode a rough but recognizable version of the image. Later stages: refine the image in successive scans until the final image is obtained. There are two ways to do this:

Spectral selection: send the DC and AC coefficients separately.
Successive approximation: send the most significant bits first, then the least significant bits.

Page 71: 2. Transform Coding

#71

JPEG Lossless Model

Sample values → DPCM → descriptors (DPCM: differential pulse code modulation)

Neighbors used for prediction of sample X:

C B
A X

Predictors for lossless coding:

selection value | prediction
0 | no prediction
1 | A
2 | B
3 | C
4 | A + B − C
5 | A + (B − C)/2
6 | B + (A − C)/2
7 | (A + B)/2
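The predictor table can be sketched directly (integer division is shown with `//`; the standard specifies the exact rounding, so treat this as illustrative):

```python
def predict(a, b, c, selection):
    """JPEG lossless predictors; a = left (A), b = above (B), c = above-left (C)."""
    table = {
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return table.get(selection, 0)     # selection 0: no prediction

# The encoder stores x - predict(...); the decoder adds the prediction back.
a, b, c, x = 100, 104, 101, 103        # hypothetical neighborhood
residual = x - predict(a, b, c, 4)     # predictor 4: A + B - C = 103
print(residual)                        # a well-chosen predictor leaves ~0
```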

Page 72: 2. Transform Coding

#72

JPEG Hierarchical Model

The hierarchical model is an alternative to the progressive model (a pyramid). Steps:

Filter and down-sample the original image by the desired multiples of 2 in each dimension.
Encode the reduced-size image using one of the above coding modes.
Use the up-sampled image as a prediction of the original at the next resolution, and encode the difference.
Repeat until the full-resolution image has been encoded.

Page 73: 2. Transform Coding

#73

The Effect of Segmentation

The image samples are grouped into 8×8 blocks. 2-D DCT is applied on each 8×8 blocks.

Because of blocking, the spatial frequencies in the image and the spatial frequencies of the cosine basis functions are not precisely equivalent: according to Fourier's theorem, all the harmonics of the fundamental frequencies would have to be present in the basis functions to be exact. Nonetheless, the relationship between the DCT frequency and the spatial frequency is a close approximation if we take into account the sensitivity of the human eye in detecting contrast at each frequency. The segmentation also introduces what are called "blocking artifacts". These become very pronounced if the DC coefficients vary considerably from block to block. The artifacts appear as edges in the image, and abrupt edges imply high frequency. The effect can be minimized if the nonzero AC coefficients are kept.

Page 74: 2. Transform Coding

#74

Some other transforms

Discrete Fourier Transform (DFT)
Haar Transform
Karhunen-Loève Transform (KLT)
Walsh-Hadamard Transform (WHT)

Page 75: 2. Transform Coding

#75

Discrete Fourier Transform (DFT)

Well known for its connection to spectral analysis and filtering. Extensive study has been done on its fast implementation (O(N log₂ N) for an N-point DFT). It has the disadvantages of requiring storage and manipulation of complex quantities and of creating spurious spectral components due to the assumed periodicity of image blocks.

Page 76: 2. Transform Coding

#76

Haar Transform

A very fast transform, and the easiest wavelet transform. Useful in edge detection, image coding, and image analysis problems. Its energy compaction is only fair, so it is not among the best compression transforms.

2D basis function of Haar transform

Page 77: 2. Transform Coding

#77

Karhunen Loève Transform (KLT)

The Karhunen-Loève Transform (KLT) yields decorrelated transform coefficients. Its basis functions are the eigenvectors of the covariance matrix of the input signal, and it achieves optimum energy concentration. Disadvantages:

The KLT depends on the signal statistics.
The KLT is not separable for image blocks.
The transform matrix cannot be factored into sparse matrices.

Page 78: 2. Transform Coding

#78

Walsh-Hadamard Transform (WHT)

Although far from optimum in an energy-packing sense for typical imagery, its simple implementation (basis functions are either −1 or +1) has made it widely popular.

Transformation matrix:

$$A = \frac{1}{\sqrt{8}}\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{bmatrix}$$

Page 79: 2. Transform Coding

#79

Walsh-Hadamard Transform (WHT)

The Walsh-Hadamard transform requires only additions and subtractions. However, improvements in high-speed signal processing for the DCT have reduced its use.

Page 80: 2. Transform Coding

#80

Transform Coding: Summary

Purpose of the transform:

de-correlation
energy concentration

The KLT is optimum, but it is signal-dependent and hence has no fast algorithm.
The DCT reduces the blocking artifacts that appear with the DFT.
Threshold coding + zig-zag scan + 8×8 block size is widely used today (JPEG, MPEG, ITU-T H.263).
A fast algorithm for the scaled 8-point DCT needs 5 multiplications and 29 additions.
Audio coding: MP3 (MPEG-1 Layer 3) uses the DCT.