transform coding 2005

Upload: cao-huy

Post on 14-Apr-2018

239 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 Transform Coding 2005

    1/80

    Lecture notes of Image Compression and Video

    Compression series 2005

    2. Transform Coding

    Prof. Amar Mukherjee

    Weifeng Sun

  • 7/27/2019 Transform Coding 2005

    2/80

    #2

    Topics

    z Introduction to Image Compression

    z Transform Coding

    z Subband Coding, Filter Banks

    z Haar Wavelet Transformz SPIHT, EZW, JPEG 2000

    z

    Motion Compensationz Wireless Video Compression

  • 7/27/2019 Transform Coding 2005

    3/80

    #3

    Transform Codingz Why transform Coding?

    z Purpose of transformation is to convert the data intoa form where compression is easier. Thistransformation will transform the pixels which arecorrelated into a representation where they aredecorrelated. The new values are usually smaller

    on average than the original values. The net effectis to reduce the redundancy of representation.

    z For lossy compression, the transformcoefficients can now be quantized according to

    their statistical properties, producing a muchcompressed representation of the originalimage data.

  • 7/27/2019 Transform Coding 2005

    4/80

    #4

    Transform Coding Block Diagram

    z Transmitter

    z Receiver

    Segment into

    n*n BlocksForward

    Transform

    Quantization

    and Coder

    Original Image f(j,k) F(u,v) F*(uv)

    Channel

    Combine n*nBlocks InverseTransform Decoder

    Reconstructed Image f(j,k) F(u,v) F*(uv)

  • 7/27/2019 Transform Coding 2005

    5/80

    #5

    How Transform Coders Work

    z Divide the image into 1x2 blocks

    z Typical transforms are 8x8 or 16x16

    x1

    x1

    x2

    x2

  • 7/27/2019 Transform Coding 2005

    6/80

    #6

    Joint Probability Distribution

    z Observe the Joint Probability Distribution

    or the Joint Histogram.

    x1x2

    Probability

  • 7/27/2019 Transform Coding 2005

    7/80

    #7

    Pixel Correlation in Image[Amar]

    z Rotate 45o clockwise

    Source Image: Amar

    Before Rotation After Rotation

    =

    =

    2

    1

    2

    1

    45cos45sin

    45sin45cos

    X

    X

    Y

    YY

    oo

    oo

  • 7/27/2019 Transform Coding 2005

    8/80

    #8

    Pixel Correlation Map in [Amar]

    -- coordinate distribution

    z Upper:Before Rotation

    z Lower:After Rotation

    z Notice thevariance of Y2 issmaller than thevariance of X2.

    z Compression:apply entropycoder on Y2.

  • 7/27/2019 Transform Coding 2005

    9/80

    #9

    Pixel Correlation in Image[Lenna]

    z Lets look at another example

    Before Rotation After Rotation

    Source Image: Lenna

    =

    =

    2

    1

    2

    1

    45cos45sin

    45sin45cos

    X

    X

    Y

    YY

    oo

    oo

  • 7/27/2019 Transform Coding 2005

    10/80

    #10

    Pixel Correlation Map in [Lenna]

    -- coordinate distributionz Upper:

    Before Rotation

    z Lower:After Rotation

    z Notice thevariance of Y2 issmaller than thevariance of X2.

    z Compression:apply entropycoder on Y2.

  • 7/27/2019 Transform Coding 2005

    11/80

    #11

    Rotation Matrix

    z Rotated 45 degrees clockwise

    z Rotation matrix A

    =

    ==

    =

    2

    1

    2

    1

    2

    1

    2222

    2222

    45cos45sin

    45sin45cos

    X

    X

    X

    XAX

    Y

    YY

    oo

    oo

    =

    =

    =

    11

    11

    2

    2

    2222

    2222

    45cos45sin

    45sin45cos

    oo

    oo

    A

  • 7/27/2019 Transform Coding 2005

    12/80

    #12

    Orthogonal/orthonormal Matrixz Rotation matrix is orthogonal.

    z The dot product of a rowwith itself is nonzero.

    z The dot product of different

    rows is 0.

    z Futhermore, the rotationmatrix is orthonormal.

    z The dot product of a row

    with itself is 1.

    z Example:

    =

    11

    11

    2

    2A

    ==

    else0

    if0 jiAA ji

    =

    =else0

    if1 jiAA ji

  • 7/27/2019 Transform Coding 2005

    13/80

    #13

    Reconstruct the Image

    z Goal: recover X from Y.

    z Since Y = AX, so X = A-1Y

    z Because the inverse of an orthonormal matrix

    is its transpose, we have A-1 = AT

    z So, Y = A-1X = ATX

    z We have inverse matrix

    =

    ==11

    11

    2

    2

    45cos45sin

    45sin45cos1oo

    ooT

    AA

  • 7/27/2019 Transform Coding 2005

    14/80

    #14

    Energy Compaction

    z Rotation matrix [A] compacted the energy into Y1.

    z Energy is the variance (http://davidmlane.com/hyperstat/A16252.html)

    z Given:

    z The total variance of X equals to that of Y. It is 41.

    z Transformation makes Y2 (0.707) very small.z If we discard min{X}, we have error 42/41 =0.39

    z If we discard min{Y}, we have error 0.7072/41 =0.012

    z Conclusion: we are more confident to discard min{Y}.

    =

    =

    =

    11

    11

    2

    2

    2

    1

    2

    1A

    X

    XA

    Y

    YY

    meantheiswhere)(

    1

    1 2

    1

    2 =

    =N

    i

    ix

    N

    =

    =

    ==

    =

    0.707

    6.364

    1

    9

    2

    2

    5

    4

    11

    11

    2

    2AXthen,

    5

    4X Y

  • 7/27/2019 Transform Coding 2005

    15/80

    #15

    Idea of Transform Codingz Transform the input pixels X0,X1,X2,...,Xn-1 into

    coefficients Y0,Y1,...,Yn-1 (real values)z The coefficients have the property that most of

    them are near zero.

    z Most of the energy is compacted into a few

    coefficients.

    z Scalar quantize the coefficientz This is bit allocation.

    z Important coefficients should have morequantization levels.

    z Entropy encode the quantization symbols.

  • 7/27/2019 Transform Coding 2005

    16/80

    #16

    Forward transform (1D)

    z Get the sequence Y from the sequence X.

    z Each element of Y is a linear combination ofelements in X.

    The element of the matrix are also called the weight of the linear transform,

    and they should be independent of the data (except for the KLT transform).

    BasisVectors

    0 0,0 0, 1 0

    1 1,0 1, 1 1

    n

    n n n n n

    Y a a X

    Y a a X

    =

    L

    M M O M M

    L

    =

    ==1

    0

    , 1,1,0n

    i

    iijj njXaY L

    AXY =

  • 7/27/2019 Transform Coding 2005

    17/80

    #17

    Choosing the Weights of the Basis

    Vector

    z The general guideline to determine the values

    of A is to make Y0 large, while remainingY1,...,Yn-1 to be small.

    z The value of the coefficient will be large ifweights a

    ij

    reinforce the corresponding dataitems Xj. This requires the weights and thedata values to have similar signs. Theconverse is also true: Yi will be small if the

    weights and the data values to have dissimilarsigns.

  • 7/27/2019 Transform Coding 2005

    18/80

    #18

    Extracting Features of Dataz Thus, the basis vectors should extract distinct

    features of the data vectors and must beindependent orthogonal). Note the pattern ofdistribution of +1 and -1 in the matrix. Theyare intended to pick up the low and high

    frequency components of data.z Normally, the coefficients decrease in the

    order of Y0,Y1,...,Yn-1.

    z So, Y is more amenable to compression thanX.

    AXY =

  • 7/27/2019 Transform Coding 2005

    19/80

    #19

    Energy Preserving (1D)z Another consideration to choose rotation matrix is to conserve

    energy.

    z For example, we have orthogonal matrix

    z Energy before rotation: 42+62+52+22=81

    z Energy after rotation: 172+32+(-5)2+12=324

    z Energy changed!z Solution: scale W by scale factor. The scaling does not change the fact

    that most of the energy is concentrated at the low frequency components.

    =

    11111111

    1111

    1111

    A

    =

    25

    6

    4

    X

    ==

    25

    3

    17

    AXY

  • 7/27/2019 Transform Coding 2005

    20/80

    #20

    Energy Preserving, Formal Proof

    z The sum of the squares

    of the transformedsequence is the same

    as the sum of the

    squares of the originalsequence.

    z Most of the energy are

    concentrated in the lowfrequency coefficients.

    =

    =

    =

    =

    ==

    =

    =

    1

    1

    2

    1

    1

    2

    )(

    )()(

    Energy

    n

    ii

    T

    TT

    TT

    T

    n

    i

    T

    i

    X

    XX

    XAAXAXAX

    AXAX

    YYY

    Orthonormal

    Matrix;See page #12

  • 7/27/2019 Transform Coding 2005

    21/80

    #21

    Why we are interested in the

    orthonormal matrix?

    z Normally, it is

    computationally difficultto get the inverse matrix.

    z The inverse of the

    transformation matrix issimply its transpose.

    A-1=AT

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 11 1 1 1 1 1 1 11

    1 1 1 1 1 1 1 18

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    A

    =

    1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 11

    1 1 1 1 1 1 1 18

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    TB A A

    = = =

  • 7/27/2019 Transform Coding 2005

    22/80

    #22

    Two Dimensional Transform

    z From input Image I, we get D.

    z Given Transform matrix A

    z 2D transformation goes as:

    z Notice the energy compaction.

    =

    1111

    1111

    1111

    1111

    2

    1A

    =

    9542

    6745

    6386

    9674

    D9657

    69364425

    8764

    =

    ==

    75.175.025.125.125.225.025.325.0

    75.125.025.375.1

    75.375.075.275.22

    11111111

    1111

    1111

    2

    1

    95426745

    6386

    9674

    11111111

    1111

    1111

    2

    1TAXAY

  • 7/27/2019 Transform Coding 2005

    23/80

    #23

    Two Dimensional Transform

    z Because transformation matrix is

    orthonormal,z So, we have

    z

    Forward transform

    z Backward transform

    1= AAT

    TAXAAXAY == 1

    YAAYAAXT== 1

  • 7/27/2019 Transform Coding 2005

    24/80

    #24

    Linear Separable transformz Two dimensional transform is simplified as two

    iterations of one-dimensional transform.

    z Column-wise transform and row-wise transform.

    1 1

    , , , ,

    0 0

    N N

    k l k i i j l j

    i j

    Y a X a

    = =

    =

  • 7/27/2019 Transform Coding 2005

    25/80

    #25

    Transform and Filteringz Consider the orthonormal

    transform.

    z If A is used to transform a vector of 2 identicalelements x = [x,x]T, the transformed sequence willindicating the low frequency or the average

    value is and the high frequency component is0 because the signal value do not vary.

    z If x = [3,1]T or [3,-1]T, the output sequence will beand respectively. Now, the high frequencycomponent has positive value and it is bigger for

    [3,-1]T,indicating a much large variation. Thus, thetwo coefficients behave like output of a low-passand a high-pass filters, respectively.

    Tx )0,2(

    =11

    11

    2

    2A

    x2

    T)2,22(T

    )22,2(

  • 7/27/2019 Transform Coding 2005

    26/80

    #26

    Transform and Functional

    Approximation

    z Transform is a kind of function approximation.

    z Image is a data set. Any data set is a function.z Transform is to approximate the image function

    by a combination of simpler, well defined

    waveforms (basis functions).z Not all basis sets are equal in terms of

    compression.

    z DCT and Wavelets are computationally easierthan Fourier.

  • 7/27/2019 Transform Coding 2005

    27/80

    #27

    Comparison of various transforms

    z The KLT is optimal in the sense of decorrelating and energy-packing.

    z Walsh-Hadamard Transform is especially easy for implementation.z basis functions are either -1 or +1, only add/sub is necessary.

  • 7/27/2019 Transform Coding 2005

    28/80

    #28

    Two-Dimensional Basis Matrixz The outer product of two vectors V1 and V2 is

    defined as V1TV2. For example,

    z For a matrix A of size n*n, the outer product ofith row and jth column is defined as

    [ ]

    =

    =

    cfcecd

    bfbebd

    afaead

    fed

    c

    b

    a

    VVT

    21

    [ ]

    =

    =

    1,1,1,1,0,1,

    1,1,1,1,0,1,

    1,0,1,0,0,0,

    1,1,0,

    1,

    1,

    0,

    njnijnijni

    njijiji

    njijiji

    njjj

    ni

    i

    i

    ij

    aaaaaa

    aaaaaa

    aaaaaa

    aaa

    a

    a

    a

    L

    MOMML

    L

    LM

  • 7/27/2019 Transform Coding 2005

    29/80

    #29

    Outer Product

    z For example, if

    z We have:

    =

    11

    11

    2

    2A

    [ ]

    =

    =

    11

    11

    2

    111

    2

    2

    1

    1

    2

    200

    [ ]

    =

    =

    1111

    2111

    22

    11

    22

    01

    [ ]

    =

    =

    11

    11

    2

    111

    2

    2

    1

    1

    2

    210

    [ ]

    =

    =

    11

    11

    2

    111

    2

    2

    1

    1

    2

    211

  • 7/27/2019 Transform Coding 2005

    30/80

    #30

    Outer Product (2)

    z We have shown earlier that

    z Consider X to be a 2*2 matrix:

    YAX T=

    1111101001010000

    11011000

    1101100011011000

    1101100011011000

    11011000

    11011000

    1110

    0100

    1110

    0100

    11

    11

    11

    11

    11

    11

    11

    11

    2

    1

    2

    1

    1111

    21

    11

    11

    11

    11

    2

    1

    yyyy

    yyyy

    yyyyyyyy

    yyyyyyyy

    yyyyyyyy

    yy

    yy

    xx

    xx

    +++=

    +

    +

    +

    =

    ++

    ++++=

    ++=

    =

  • 7/27/2019 Transform Coding 2005

    31/80

    #31

    Basis Matrix

    z The quantities are called the

    basis matrices in 2-D space.z In general, the outer products of a n*n

    orhtonormal matrix form a basis matrix set in 2dimension. The quantity is called the DCcoefficient (note all the elements for the DCcoeeficient are 1, indicating an averageoperation), and other coefficients have

    alternating values and are called ACcoefficients.

    11100100 ,,,

    00

  • 7/27/2019 Transform Coding 2005

    32/80

    #32

    Fast Cosine Transform

    z 2D 8X8 basis functionsof the DCT:

    z The horizontal frequencyof the basis functionsincreases from left toright and the vertical

    frequency of the basisfunctions increases fromtop to bottom.

    .

  • 7/27/2019 Transform Coding 2005

    33/80

    #33

    Amplitude distribution of the DCT

    coefficients

    z Histograms for 8x8 DCT

    coefficient amplitudesmeasured for natural

    images

    z

    DC coefficient is typicallyuniformly distributed.

    z The distribution of the AC

    coefficients have a

    Laplacian distributionwith zero-mean.

  • 7/27/2019 Transform Coding 2005

    34/80

    #34

    Discrete Cosine Transform (DCT)

    z Conventional image data have

    reasonably high inter-element correlation.z DCT avoids the generation of the

    spurious spectral components which is a

    problem with DFT and has a fast

    implementation which avoids complex

    algebra.

  • 7/27/2019 Transform Coding 2005

    35/80

    #35

    One-dimensional DCT

    z The basis idea is to decompose the image intoa set of waveforms, each with a particularspecial frequency.

    z To human eyes, high spatial frequencies areimperceptible and a good approximation of the

    image can be created by keeping only thelower frequencies.

    z Consider the one-dimensional case first. The 8arbitrary grayscale values (with range 0 to 255,

    shown in the next slide) are level shifted by128 (as is done by JPEG).

  • 7/27/2019 Transform Coding 2005

    36/80

    #36

    (b)

    -150

    -100

    -50

    0

    50

    100

    150

    1 2 3 4 5 6 7 8

    x

    f(x)

    (c)

    -150

    -100

    -50

    0

    50

    100

    150

    1 2 3 4 5 6 7 8

    u

    S(u)

    -1

    0

    1

    1 2 3 4 5 6 7 8

    U=7

    Amplitude

    -1

    0

    1

    1 2 3 4 5 6 7 8

    U=6

    Amplitu

    de

    -1

    0

    1

    1 2 3 4 5 6 7 8

    U=5

    Amplitude

    -1

    0

    1

    1 2 3 4 5 6 7 8

    U=4

    Amplitude

    -1

    0

    1

    1 2 3 4 5 6 7 8

    U=3

    Amplitude

    -1

    0

    1

    1 2 3 4 5 6 7 8

    U=2

    Amplitu

    de

    -1

    0

    1

    1 2 3 4 5 6 7 8

    U=1

    Am

    plitude

    -1

    0

    1

    1 2 3 4 5 6 7 8

    U=0

    Amplitude

    Before DCT (image data) After DCT (coefficients

    An example of 1-D DCT decomposition

    The 8 basis functions for 1-D DCT

    One-dimensional DCT

  • 7/27/2019 Transform Coding 2005

    37/80

    #37

    One-dimensional DCT

    z The waveforms cam be denoted as

    with frequencies f=0, 1, , 7. Each wave is sampled at 8 points

    to form a basis vector.

    z The eight basis vector constitutes a matrix A:

    = 0with),cos()( ffw

    16

    15,

    16

    13,

    16

    11,

    16

    9,

    16

    7,

    16

    5,

    16

    3,

    16

    =

    -0.1950.556-0.8310.981-0.9810.831-0.556-0.195

    0.383-0.9240.924-0.383-0.3830.924-0.9240.383

    -0.5560.981-0.195-0.8310.8310.195-0.9810.556

    0.707-0.707-0.7070.7070.707-0.707-0.7070.707

    -0.8310.1950.9810.556-0.556-0.981-0.1950.831

    0.9240.383-0.383-0.924-0.924-0.3830.3830.924

    -0.981-0.831-0.556-0.1950.1950.5560.8310.981

    11111111

  • 7/27/2019 Transform Coding 2005

    38/80

    #38

  • 7/27/2019 Transform Coding 2005

    39/80

    #39

    One-dimensional DCT

    z The output of the DCT transform is:

    where A is the 8*8 transformation matrix

    defined in the previous slide, and I(x) isthe input signal.

    z

    S(u) are called the coefficients for theDCT transform for input signal I(x).

    [ ]TxIAuS )()( =

  • 7/27/2019 Transform Coding 2005

    40/80

    #40

    One-dimensional FDCT and IDCT with N=8

    Sample points

    z The 1-D DCT in JPEG is defined as:

    z FDCT

    z IDCT

    z Where (u is frequency)z I(x) is the 1-D sample

    z S(u) is the 1-D DCT coefficientzAnd

    0,1,..7for]16

    )12(cos[)(

    2

    )()(

    7

    0=

    += = u

    uxxI

    uCuS

    x

    0,1,..,7for]16

    )12(cos[)(

    2

    )()(

    7

    0

    =+

    = =

    xux

    uSuC

    xIu

    >

    ==

    0for1

    0for2

    2

    )(

    u

    uuC

    O di i l FDCT d IDCT ith N

  • 7/27/2019 Transform Coding 2005

    41/80

    #41

    One-dimensional FDCT and IDCT with N

    Sample points

    z The 1-D DCT in JPEG is defined as:

    z FDCT

    z IDCT

    z Where (u is frequency)

    z I(x) is the 1-D sample

    z S(u) is the 1-D DCT coefficient and

    1-1,2,..Nfor]2

    )12(cos[)()(

    2)(

    1

    0

    =+=

    =

    uN

    uxxIuC

    NuS

    N

    x

    1-0,1,..Nfor]2

    )12(cos[)()(

    2)(

    1

    0=

    +

    =

    =xN

    uxuSuCNxI

    n

    u

    >

    ==

    0for1

    0for2

    2

    )(

    u

    uuC

  • 7/27/2019 Transform Coding 2005

    42/80

    #42

    One-dimensional FDCT and IDCT

    z As an example, let I(x)=[12,10,8,10,12,10,8,11]

    z If we now apply IDCT, we will get back I(x).

    We can quantize the coefficient S(u) and still

    obtain a very good approximation of I(x).z For example,

    z While)9989.10,99354.7,9932.9,0164.12,93097.9,96054.7,0233.10,0254.12(

    )3.0,2.0,8.1,2.3,8.1,5.0,6.0,6.28(

    =

    IDCT

    [ ] ].309.0,191.0,729.1,182.3,757.1,4619.0,5712.0,6375.28[)()( ==T

    xIAuS

    )684.10,053.8,014.10,347.12,573.9,6628.7,6244.9,236.11(

    )0,0,2,3,2,0,0,28(

    =

    IDCT

    T di i l FDCT d IDCT f 8X8

  • 7/27/2019 Transform Coding 2005

    43/80

    #43

    Two-dimensional FDCT and IDCT for 8X8

    block

    z The 2-D DCT in JPEG is defined as:

    z FDCT

    z IDCT

    z Wherez I(y,x) is the 2-D sample

    z S(v,u) is the 2-D DCT coefficientzAnd

    ==

    ++=

    7

    0

    7

    0

    ]16

    )12(cos[]

    16

    )12(cos[),(

    2

    )(

    2

    )(),(

    xy

    uxvyxyI

    uCvCuvS

    ==++

    =

    7

    0

    7

    0]16

    )12(

    cos[]16

    )12(

    cos[),(2

    )(

    2

    )(

    ),(uv

    uxvy

    uvS

    uCvC

    xyI

    >

    ==

    0for1

    0for2

    2

    )(

    u

    uuC

    >

    ==

    0for1

    0for2

    2

    )(

    v

    vvC

  • 7/27/2019 Transform Coding 2005

    44/80

    #44

    Fast Cosine Transform

    z 2D 8X8 basis functionsof the DCT:

    z The horizontal frequencyof the basis functionsincreases from left toright and the verticalfrequency of the basisfunctions increases fromtop to bottom.

    .

  • 7/27/2019 Transform Coding 2005

    45/80

    #45

    Two-dimensional FDCT for NXM block

    z Where I(y,x) is the 2-D sample, S(v,u) is the 2-D DCT

    coefficient and

    10,1,...,and10,1,...,for

    ]2

    )12(

    cos[]2

    )12(

    cos[),(2

    )(

    2

    )(2

    ),(

    FDCT

    1

    0

    1

    0

    ==

    ++

    =

    =

    =

    MvN-u

    N

    ux

    M

    vy

    xyI

    uCvC

    NMuvS

    M

    y

    N

    x

    >

    ==

    0for1

    0for22)(

    u

    uuC

    >==

    0for1

    0for2

    2

    )(

    v

    vvC

  • 7/27/2019 Transform Coding 2005

    46/80

    #46

    Separable Function

    z The 2-D FDCT is a separable function because we can express

    the formula as

    10,1,...,and10,1,...,for

    2

    1)(2cos]

    2

    )12(cos[),()(

    2),( ][}{

    1

    0

    )(2

    1

    0

    ==

    ++=

    =

    =

    MvN-u

    M

    uy

    N

    uxxyIvC

    MuvS

    M

    y

    uCN

    N

    x

    This means that we can compute a 2-D FDCT by first computing a row-wise

    1-D FDCT and taking the output and perform on it a column-wise 1-D FDCTtransform. This is possible, as we explained earlier, because the basis

    matrices are orthonormal.

  • 7/27/2019 Transform Coding 2005

    47/80

    #47

    Two-dimensional IDCT for NXM block

    .10,1,...,and10,1,...,for

    ]2

    )12(cos[]

    2

    )12(cos[),()()(

    2),(

    1

    0

    1

    0

    ==

    ++=

    =

    =

    MyNx

    N

    ux

    M

    vyuvSvCuC

    NMxyI

    N

    u

    M

    v

    This function is again separable and the computation can be done in

    two steps: a row-wise 1-D IDCT followed by a column-wise 1-D IDCT.A straightforward algorithm to compute both FDCT and IDCT will need

    ( for a block of NXN pixels) an order ofO(N3) multiplication/ addition

    operations. After the ground-breaking discovery of O(N logN) algorithm

    for Fast Fourier Transform, several researchers proposed efficient

    algorithms for DCT computation.

  • 7/27/2019 Transform Coding 2005

    48/80

    #48

    JPEG Introduction - The background

    z JPEG stands for Joint Photographic Expert Group

    z

    A standard image compression method is needed toenable interoperability of equipment from different

    manufacturer

    z It is the first international digital image compression

    standard for continuous-tone images (grayscale orcolor)

    z The history of JPEG the selection process

  • 7/27/2019 Transform Coding 2005

    49/80

    #49

    JPEG Introduction whats the objective?

    z

    very good or excellent compression rate,reconstructed image quality, transmission rate

    z be applicable to practically any kind of continuous-

    tone digital source image

    z good complexity

    z have the following modes of operations:

    z sequential encoding

    z progressive encodingz lossless encoding

    z hierarchical encoding

  • 7/27/2019 Transform Coding 2005

    50/80

    #50

    encoder decoder Source

    image data

    compressed

    image data

    reconstructed

    image data

    Image compression system

    encoder

    statistical

    model

    entropy

    encoder

    Encoder

    model

    Source

    image data

    compressed

    image datadescriptors symbols

    model

    tables

    entropy

    coding tables

    The basic parts of an JPEG encoder

    JPEG Architecture Standard

  • 7/27/2019 Transform Coding 2005

    51/80

    #51

    JPEG has the following Operation Modes:

    Sequential DCT-based mode

    Progressive DCT-based mode

    Sequential lossless mode

    Hierarchical mode

    JPEG entropy coding supports:

    Huffman encoding

    Arithmetic encoding

    JPEG Overview (cont.)

  • 7/27/2019 Transform Coding 2005

    52/80

    #52

    JPEG Baseline System

    JPEG Baseline system is composed of:

    Sequential DCT-based mode Huffman coding

    The basic architecture of JPEG Baseline system

    Source

    image data

    quantizerentropy

    encoder

    compressed

    image data

    table

    specification

    table

    specification

    88 blocksDCT-based encoder

    statistical

    modelFDCT

  • 7/27/2019 Transform Coding 2005

    53/80

    #53

    The Baseline System

    The DCT coefficient values can be regarded as the relative amounts ofthe 2-D spatial frequencies contained in the 88 block

    the upper-left corner coefficient is called the DC coefficient, which is ameasure of the average of the energy of the block

    Other coefficients are calledAC coefficients, coefficients correspondto high frequencies tend to be zero or near zero for most naturalimages

    DC coefficient

    Q ti ti T bl i DCT

  • 7/27/2019 Transform Coding 2005

    54/80

    #54

    Quantization Tables in DCT

    z Human eyes are less sensitive to highfrequencies.

    z We use different quantization value fordifferent frequencies.z Higher frequency, bigger quantization value.

    z

    Lower frequency, smaller quantization value.z Each DCT coefficient corresponds to a certain

    frequency.

    z High-level HVS is much more sensitive to the

    variations in the achromatic channel than inthe chromatic channels.

  • 7/27/2019 Transform Coding 2005

    55/80

    #55

    The Baseline System Quantization

    z Why quantization? .

    z to achieve further compression by representing DCT

    coefficients with no greater precision than is necessary toachieve the desired image quality

    z Generally, the high frequency coefficients has larger quantizationvalues

    z Quantization makes most coefficients to be zero, it makes the

    compression system efficient, but its the main source that make thesystem lossy and introduces distortion in the reconstructed image.

    z The quantization step-size parameters for JPEG are given by two

    Quantization Matrices, each element of the matrix being an integer

    of size between 1 to 255. The DCT coeeficients are divided by the

    corresponding quantization parameters and rounded to nearestinteger.

  • 7/27/2019 Transform Coding 2005

    56/80

    #56

    )),(

    ),((),('

    vuQ

    vuFRoundvuF =

    F(u,v): original DCT coefficient

    F(u,v): DCT coefficient after quantization

    Q(u,v): quantization value

    There are two quantization tables: one for the luminancecomponent and the other for the Chrominance component. JPEG

    standard does not prescribe the values of the table; it is left upto

    the user, but they recommend the following two tables under

    normal circumstances. These tables have been created by

    analysing data of human perception and a lot of trial and error.

    Quantization Tables in DCT

  • 7/27/2019 Transform Coding 2005

    57/80

    #57

    Quantization Tables in DCT

    z So, we have two quantization tables.z Measured for an average person.

    z Higher frequency, bigger quantization value.

    z Lower frequency, smaller quantization value.

    9910310011298959272

    10112012110387786449

    921131048164553524

    771031096856372218

    6280875129221714

    5669574024161314

    5560582619141212

    6151402416101116

    9999999999999999

    9999999999999999

    9999999999999999

    9999999999999999

    9999999999996647

    9999999999562624

    9999999966262118

    9999999947241817

    Luminance quantization table Chrominance quantization table

    Two dimensional DCT

  • 7/27/2019 Transform Coding 2005

    58/80

    #58

    Two-dimensional DCT

    z The image samples are shifted from unsigned integer with range[0, 2 N-1] to signed integers with range [- 2 N-1, 2 N-1-1]. Thussamples in the range 0-255 are converted in the range -128 to

    127 and those in the range 0 to 4095 are converted in the range -2048 to 2047. This zero-shift done for JPEG to reduce theinternal precision requirements in the DCT calculations.

    z How to interpret the DCT coefficients?z The DCT coefficient values can be regarded as the relative amounts

    of the 2-D spatial frequencies contained in the 88 block.z F(0,0) is called DC coefficient, which is a measure of the average of

    the energy of the block.

    z Other coefficients are called AC coefficients, coefficients correspondto high frequencies tend to be zero or near zero for most images.

    z

    The energy is concentrated in the upper-left corner.

  • 7/27/2019 Transform Coding 2005

    59/80

    #59

    Baseline System - DC coefficient coding

    z After quantization step, the DC coefficient are encoded separatelyby using differential encoding. The DC difference sequence isconstituted by concatenating the DC coefficient of the first block

    followed by the difference values of DC coefficients of thesucceeding blocks. The average values of the succeeding blocksare usually correlated.

    DPCMquantized DCcoefficients

    DC difference

    Differential pulse code modulation

  • 7/27/2019 Transform Coding 2005

    60/80

    #60

    Baseline System - AC coefficient coding

    z AC coefficients are arranged into a zig-zag sequence. This is

    because most of the significant coefficients are located in the

    upper left corner of the matrix. The high frequency componentsare mostly 0s and can be efficiently coded by run-length

    encoding.

    0 1 5 6 14 15 27 28

    2 4 7 13 16 26 29 42

    3 8 12 17 25 30 41 43

    9 11 18 24 31 40 44 53

    10 19 23 32 39 45 52 54

    20 22 33 38 46 51 55 60

    21 34 37 47 50 56 59 61

    35 36 49 48 57 58 62 63

    Horizontal frequency

    Verticalfrequency

    An Example (Ref:T. Acharya, P. Tsai,JPEG 2000

    Standard for Image Compression) Wiley Iterscience

  • 7/27/2019 Transform Coding 2005

    61/80

    #61

    Standard for Image Compression), Wiley-Iterscience,

    2005

    The 8X8 Image Data Block

    110 110 118 118 121 126 131 131108 111 125 122 120 125 134 135

    106 119 129 127 125 127 138 144

    110 126 130 133 133 131 141 148115 116 119 120 122 125 137 139

    115 106 99 110 107 116 130 127

    110 91 82 101 99 104 120 118103 76 70 95 92 91 107 106

  • 7/27/2019 Transform Coding 2005

    62/80

    #62

  • 7/27/2019 Transform Coding 2005

    63/80

    #63

    Baseline System - Statistical modeling

    z Statistical modeling translate the inputs

    to a sequence of symbols for Huffmancoding to use

    z Statistical modeling on DC coefficients:

    z Category: different size (SSSS)

    z Magnitude: amplitude of difference

    (additional bits)

    Coding the DC coeeficients

  • 7/27/2019 Transform Coding 2005

    64/80

    #64

    Coding the DC coeeficients

    The prediction errors are classified into 16 categories in modulo 216. Category

    C contains the range of integers [-(2c

    -1), + (2c

    -1)] excluding the numbers in therange [-(2c-1-1), + (2c-1-1)] which falls in the previous category. The category

    number is encoded using either a fixed code (SSSS), unary code (0, 10, 110, )

    or a variable length Huffman code.

    Coding the DC/AC coefficients

  • 7/27/2019 Transform Coding 2005

    65/80

    #65

    Coding the DC/AC coefficients

    If the magnitude of the error is positive, it is encoded using category

    number of bits with a leading 1. If it is negative, Is complement of

    the positive number is used. Thus, if error is -6 and if the Huffman

    code for category 3 is 1110, it will get a code (1110001). The JPEG

    Standard recommends Huffman two codes for the category

    numbers: one for Luminance DC and the other for Chrominance DC

    ( See Salomon, p.289; it is also described in Annex K ( Table k.3

    and K.4 of the Standard)

    z Statistical modeling on AC coefficients:

    z Run-length: RUN-SIZE=16*RRRR+SSSS

    z Amplitude: amplitude of difference (additional bits)

  • 7/27/2019 Transform Coding 2005

    66/80

    #66

    The AC coefficients contains just a few nonzero numbers, with runs of

    zeros followed by a long run of trailing zeros. For each nonzero

    number, two numbers are generated: 1)16*RRRR which gives the

    number of consecutive zeros preceding the nonzero coefficient being

    encoded, 2)SSSS gives the category corresponding to number of bits

    required to encode the AC coefficient using variable length Huffman

    code. This is specified in the form of a table (see next slide). The

    second attribute of the AC coefficient is its amplitude which is

    encoded exactly the same way as the DC values. Note the code (0,0)

    is reserved for EOB ( signifying that the remaining AC coefficients are

    just a single trailing run of 0s). The code (15,0) is reserved for a

    maximum run of 16 zeros. If it is more than 16, it is broken down into

    runs of 16 plus possibly a single run less than 16. JPEG recommends

    two tables for Huffman codes for AC coefficients, Tables K5 and K6

    for luminance and Chrominance values.

    Encoding the AC Coefficients

  • 7/27/2019 Transform Coding 2005

    67/80

    #67

    Huffman AC statistical model

    run-length/amplitude combinations Huffman coding of AC coefficients

    Encoding the AC Coefficients

    Example (Contd.)

  • 7/27/2019 Transform Coding 2005

    68/80

    #68

    Example (Contd.)

    z Let us finish our example. The DC coeeficient is -6 has a code o

    1110001. The AC coeeficients will be encoded as:

    z (0,3)(-6), (0,3)(6), (0,3)(-5), (1,2)(2), (1,1)(-1), (5,1)(-1),(2,1)(-1),(0,1)(1),(0,0). The Table K.5 gives the codes

    z 1010, 00, 100, 1100, 11011, 11100 and 11110101 for the pairs of

    symbols (0,0), (0,1), (0,3), (1,1), (1,2), (2,1) and (5,1),

    respectively. The variable length code for AC coefficients 1, -1, 2,-5, 6 and -6 are 1, 0, 10, 010,110 and 001, respectively. Thus

    JPEG will give a compressed representation of the example

    image using only 55 bits as:

    z 1110001 100001 100110 100010 1101110 11000 11110100

    111000 001 1010

    z The uncompressed representation would have required 512 bits!

    Example to illustrate encoding/decoding

  • 7/27/2019 Transform Coding 2005

    69/80

    #69

    p g g

    JPEG Progressive Model

  • 7/27/2019 Transform Coding 2005

    70/80

    #70

    JPEG Progressive Model

    z Why progressive model?

    z Quick transmission

    z Image built up in a coarse-to-fine passes

    z First stage: encode a rough but recognizable version of the image

    z Later stage(s): the image refined by successive scans till get the

    final image

    z Two ways to do this:

    z Spectral selection send DC, AC coefficients separately

    z Successive approximationsend the most significant bits first

    and then the least significant bits

    JPEG Lossless Model

  • 7/27/2019 Transform Coding 2005

    71/80

    #71

    JPEG Lossless Model

    DPCMSample values Descriptors

    Differential pulse code modulation

    C B

    A X

    selection value prediction strategy

    7

    0

    1

    2

    3

    (A+B)/2

    no prediction

    A

    B

    C

    Predictors for lossless coding

    B+(A-C)/2

    A+B-C

    A+(B-C)/2

    4

    56

    JPEG Hierarchical Model

  • 7/27/2019 Transform Coding 2005

    72/80

    #72

    JPEG Hierarchical Model

    z Hierarchical model is an alternative of progressive

    model (pyramid)

    z Steps:

    z filter and down-sample the original images by the desired

    number of multiplies of 2 in each dimension

    z Encode the reduced-size image using one of the above

    coding model

    z Use the up-sampled image as a prediction of the origin at

    this resolution, encode the difference

    z Repeat till the full resolution image has been encode

    The Effect of Segmentation

  • 7/27/2019 Transform Coding 2005

    73/80

    #73

    g

    z The image samples are grouped into 88 blocks. 2-D DCT isapplied on each 88 blocks.

    z Because of blocking, the spatial frequencies in the image and thespatial frequencies of the cosine basis functions are not preciselyequivalent. According to Fouriers theorem, all the harmonics ofthe fundamental frequencies must be present in the basisfunctions to be precise. Nonetheless, the relationship between

    the DCT frequency and the spatial frequency is a closeapproximation if we take into account the sensitivity of human eyefor detecting contrast in frequency.

    z The segmentation also introduces what is called the blockingartifacts. This becomes very pronounced if the DC coefficients

    from block to block vary considerably. These artifacts appear asedges in the image, and abrupt edges imply high frequency. Theeffect can be minimized if the non-zero AC coefficients are kept.

    Some other transforms

  • 7/27/2019 Transform Coding 2005

    74/80

    #74

    z Discrete Fourier Transform (DFT)

    z Haar Transformz Karhunen Love Transform (KLT)

    z Walsh-Hadamard Transform (WHT)

    Discrete Fourier Transform (DFT)

  • 7/27/2019 Transform Coding 2005

    75/80

    #75

    ( )

    z Well-known for its connection to spectral

    analysis and filtering.

    z Extensive study done on its fast

    implementation (O(Nlog2N) for N-point DFT).

    z

    Has the disadvantage of storage andmanipulation of complex quantities and

    creation of spurious spectral components due

    to the assumed periodicity of image blocks.

    Haar Transform

  • 7/27/2019 Transform Coding 2005

    76/80

    #76

    z Very fast transform.

    z The easiest wavelet

    transform.z Useful in edge

    detection, imagecoding, and image

    analysis problems.z Energy Compaction

    is fair, not the bestcompression

    algorithms.2D basis function of Haar transform

    Karhunen Love Transform (KLT)

  • 7/27/2019 Transform Coding 2005

    77/80

    #77

    z Karhunen Love Transform (KLT) yieldsdecorrelated transform coefficients.

    z Basis functions are eigenvectors of thecovariance matrix of the input signal.

    z KLT achieves optimum energy concentration.

    z Disadvantages:z KLT dependent on signal statistics

    z KLT not separable for image blocks

    z Transform matrix cannot be factored into sparsematrices

    Walsh-Hadamard Transform (WHT)

  • 7/27/2019 Transform Coding 2005

    78/80

    #78

    z Although far from

    optimum in an

    energy packing

    sense for typical

    imagery, its simple

    implementation(basis functions are

    either -1 or +1) has

    made it widelypopular.

    Transformation matrix:

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 111 1 1 1 1 1 1 18

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    1 1 1 1 1 1 1 1

    A

    =

    Walsh-Hadamard Transform (WHT)

  • 7/27/2019 Transform Coding 2005

    79/80

    #79

    z Walsh-Hadamard transform requires

    adds and subtractsz Use of high speed signal processing has

    reduced its use due to improvements

    with DCT.

    Transform Coding: Summary

  • 7/27/2019 Transform Coding 2005

    80/80

    #80

    z Purpose of transformz de-correlation

    z

    energy concentrationz KLT is optimum, but signal dependent and, hence,

    without a fast algorithm

    z DCT reduces blocking artifacts appeared in DFT

    z

    Threshold coding + zig-zag-scan + 8x8 block size iswidely used todayz JPEG, MPEG, ITU-T H.263.

    z Fast algorithm for scaled 8-DCTz 5 multiplications, 29 additions

    z Audio Codingz MP3 = MPEG 1- Layer 3 uses DCT