
Lossless Image Compression

Yao Wang, Polytechnic Institute of NYU, Brooklyn, NY 11201

With contribution from Zhu Liu. Partly based on A. K. Jain, Fundamentals of Digital Image Processing.


Lecture Outline

• Introduction: need for compression
• Review of basic probability theory
• Binary encoding
  – Fixed length coding
  – Variable length coding
    • Huffman coding
    • Other variable length codes (LZW, arithmetic)
• Runlength coding of bilevel images
  – Fax coding standard
• Lossless predictive coding
  – Coding block diagram
  – Design of linear predictors


Necessity for Signal Compression

• Storage requirements for various uncompressed data types:

  One page of text: 2 KB
  One 640x480 24-bit color still image: 900 KB
  Voice (8 kHz, 8-bit): 8 KB/second
  Audio CD-DA (44.1 kHz, 16-bit): 176 KB/second
  Animation (320x640 pixels, 16-bit color, 16 frames/s): 6.25 MB/second
  Video (720x480 pixels, 24-bit color, 30 frames/s): 29.7 MB/second

• Goal of compression
  – Given a bit rate, achieve the best quality
  – Given an allowed distortion, minimize the data amount
  – (Rate-Distortion or RD tradeoff)


Image Coding Standards by ITU and ISO

• G3, G4: facsimile standards
• JBIG: the next-generation facsimile standard
  – ISO Joint Bi-level Image Experts Group
• JPEG: for coding still images or video frames
  – ISO Joint Photographic Experts Group
• JPEG2000: for coding still images, more efficient than JPEG
• Lossless JPEG: for medical and archiving applications
• MPEGx: audio and video coding standards of ISO
• H.26x: video coding standards of ITU-T
• ITU: International Telecommunication Union
• ISO: International Organization for Standardization


A Typical Compression System

Input samples → Transformation → Transformed parameters → Quantization → Quantized parameters → Binary encoding → Binary bitstream

• Transformation: prediction, transforms, model fitting, ...
• Quantization: scalar Q, vector Q
• Binary encoding: fixed length or variable length (Huffman, arithmetic, LZW)

• Motivation for transformation: to yield a more efficient representation of the original samples.


Review of Basic Probability Theory

• A. Papoulis and S. Pillai, “Probability, Random Variables, and Stochastic Processes”, McGraw-Hill, Inc., 2002. http://www.mhhe.com/engcs/electrical/papoulis/

• Random Variable (RV)*
  – The set, S, of all possible outcomes of a particular experiment is called the sample space for the experiment.
  – An event is any collection of possible outcomes of an experiment, that is, any subset of S.
  – A random variable is a function from a sample space S into the real numbers.

*From G. Casella and R. Berger, “Statistical Inference”, Duxbury Press, 1990.


Examples of Random Variables

• Tossing two coins; X is the number of heads, and Y is the number of tails
  – X and Y take on values in {0, 1, 2}
  – Discrete type
• X is the lifetime of a certain brand of light bulbs
  – X takes on values in [0, +∞)
  – Continuous type


Distribution, Density, and Mass Functions

• The cumulative distribution function (cdf) of a random variable X is defined by
  $F_X(x) = \Pr(X \le x)$, for all $x$.
• If X is a continuous random variable (taking values over a continuous range)
  – $F_X(x)$ is a continuous function.
  – The probability density function (pdf) of X is given by $f_X(x) = \frac{d}{dx} F_X(x)$.
• If X is a discrete random variable (taking a finite number of possible values)
  – $F_X(x)$ is a step function.
  – The probability mass function (pmf) of X is given by $p_X(x) = \Pr(X = x)$, the percentage of time that X = x.


Special Cases

• Binomial (discrete): $P\{X=k\} = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \ldots, n$.
• Poisson distribution: $P\{X=k\} = e^{-a} \dfrac{a^k}{k!}$, $k = 0, 1, \ldots$
• Normal (or Gaussian) $N(\mu, \sigma^2)$: $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$
• Uniform over $(x_1, x_2)$: $f(x) = \dfrac{1}{x_2 - x_1}$ for $x_1 \le x \le x_2$, and 0 otherwise.
• Laplacian $L(\mu, b)$: $f(x) = \dfrac{1}{2b} e^{-|x-\mu|/b}$

Figures are from http://mathworld.wolfram.com


Expected Values

• The expected (or mean) value of a random variable X:
  $E\{X\} = \eta_X = \int_{-\infty}^{\infty} x f_X(x)\,dx$ if X is continuous; $\eta_X = \sum_{x} x P(X = x)$ if X is discrete.
• The variance of a random variable X:
  $\mathrm{Var}\,X = \sigma_X^2 = \int_{-\infty}^{\infty} (x - \eta_X)^2 f_X(x)\,dx$ if X is continuous; $\sigma_X^2 = \sum_{x} (x - \eta_X)^2 P(X = x)$ if X is discrete.
• Mean and variance of common distributions:
  – Uniform over range $(x_1, x_2)$: $E\{X\} = (x_1 + x_2)/2$, $\mathrm{Var}\,X = (x_2 - x_1)^2/12$
  – Gaussian $N(\mu, \sigma^2)$: $E\{X\} = \mu$, $\mathrm{Var}\,X = \sigma^2$
  – Laplace $L(\mu, b)$: $E\{X\} = \mu$, $\mathrm{Var}\,X = 2b^2$
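As a quick numeric check of the uniform and Laplacian moments above, the sketch below (not from the original slides; the parameter values and sample size are arbitrary assumptions) compares sample estimates against the closed-form mean and variance:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200_000  # large enough for roughly 1% accuracy

# Uniform over (x1, x2): mean (x1+x2)/2, variance (x2-x1)^2/12
x1, x2 = 2.0, 6.0
u = rng.uniform(x1, x2, n)
print(u.mean(), (x1 + x2) / 2)        # both ~4.0
print(u.var(), (x2 - x1) ** 2 / 12)   # both ~1.333

# Laplacian L(mu, b): mean mu, variance 2 b^2
mu, b = 1.0, 0.5
lap = rng.laplace(mu, b, n)
print(lap.mean(), mu)                 # both ~1.0
print(lap.var(), 2 * b ** 2)          # both ~0.5
```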


Functions of Random Variable

• $Y = g(X)$
  – Following the example of the lifetime of the bulb, let Y represent the cost of a bulb, which depends on its lifetime X through a relation $Y = g(X)$.
• Expectation of Y:
  $E\{Y\} = \eta_Y = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$ if X is continuous; $\eta_Y = \sum_{x} g(x) P(X = x)$ if X is discrete.
• Variance of Y:
  $\mathrm{Var}\,Y = \sigma_Y^2 = \int_{-\infty}^{\infty} (g(x) - \eta_Y)^2 f_X(x)\,dx$ if X is continuous; $\sigma_Y^2 = \sum_{x} (g(x) - \eta_Y)^2 P(X = x)$ if X is discrete.


Two RVs

• We only discuss discrete RVs here (i.e., X and Y are both discrete).
• The joint probability mass function (pmf) of X and Y is given by
  $p_{XY}(x, y) = \Pr(X = x, Y = y)$
• The conditional probability mass function of X given Y is
  $p_{X|Y}(x|y) = \Pr(X = x \mid Y = y)$
• Important relations:
  $p_{XY}(x, y) = p_{X|Y}(x|y)\, p_Y(y)$
  $p_X(x) = \sum_{y} \Pr(X = x, Y = y)$


Conditional Mean and Covariance

• Conditional mean:
  $\eta_{X|y} = E\{X \mid Y = y\} = \sum_{x} x\, P(X = x \mid Y = y)$
• Correlation:
  $R_{X,Y} = E\{XY\} = \sum_{x}\sum_{y} xy\, P(X = x, Y = y)$
• Correlation matrix:
  $\mathbf{R} = E\left\{ \begin{bmatrix} X \\ Y \end{bmatrix} \begin{bmatrix} X & Y \end{bmatrix} \right\} = \begin{bmatrix} E\{X^2\} & R_{X,Y} \\ R_{X,Y} & E\{Y^2\} \end{bmatrix}$, with $E\{X^2\} = \sigma_X^2 + \eta_X^2$.
• Covariance:
  $C_{X,Y} = E\{(X - \eta_X)(Y - \eta_Y)\} = R_{X,Y} - \eta_X \eta_Y$
• Covariance matrix:
  $\mathbf{C} = E\left\{ \begin{bmatrix} X - \eta_X \\ Y - \eta_Y \end{bmatrix} \begin{bmatrix} X - \eta_X & Y - \eta_Y \end{bmatrix} \right\} = \begin{bmatrix} \sigma_X^2 & C_{X,Y} \\ C_{X,Y} & \sigma_Y^2 \end{bmatrix}$


Multiple RVs

• The definitions for two RVs extend readily to multiple (N > 2) RVs $X_1, X_2, \ldots, X_N$.
• The joint probability mass function (pmf) is given by
  $p(x_1, x_2, \ldots, x_N) = \Pr(X_1 = x_1, X_2 = x_2, \ldots, X_N = x_N)$
• The covariance matrix is
  $\mathbf{C} = E\left\{ \begin{bmatrix} X_1 - \eta_1 \\ X_2 - \eta_2 \\ \vdots \\ X_N - \eta_N \end{bmatrix} \begin{bmatrix} X_1 - \eta_1 & X_2 - \eta_2 & \cdots & X_N - \eta_N \end{bmatrix} \right\} = \begin{bmatrix} \sigma_1^2 & C_{12} & \cdots & C_{1N} \\ C_{21} & \sigma_2^2 & \cdots & C_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ C_{N1} & C_{N2} & \cdots & \sigma_N^2 \end{bmatrix}$


Binary Encoding

• Binary encoding
  – To represent a finite set of symbols using binary codewords.
• Fixed length coding
  – N symbols represented by $\lceil \log_2 N \rceil$ bits each.
• Variable length coding
  – More frequently appearing symbols represented by shorter codewords (Huffman, arithmetic, LZW = zip).
• The minimum number of bits required to represent a source is bounded by its entropy.


Entropy of a Source

• Consider a source of N symbols, $r_n$, $n = 1, 2, \ldots, N$. This source can be considered as a discrete RV with N possible outcomes.
• Suppose the probability of symbol $r_n$ is $p_n$. The self-information of symbol $r_n$ is defined as
  $H_n = -\log_2 p_n$ (bits).
• The entropy of this source, which represents the average information, is defined as
  $H = -\sum_{n=1}^{N} p_n \log_2 p_n$ (bits).
• The self-information indicates the number of bits required to convey a particular outcome ($r_n$) of the underlying RV. The entropy indicates the average number of bits required to specify any possible outcome.
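For concreteness, a minimal Python sketch of the entropy formula (the probability list is an assumption taken from the Huffman example a few slides below):

```python
import math

def entropy(probs):
    """H = -sum_n p_n log2 p_n, in bits; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Source used in the Huffman coding example below: p_n = 36/49, 8/49, 4/49, 1/49
print(entropy([36/49, 8/49, 4/49, 1/49]))  # ~1.16 bits/symbol
```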


Shannon Source Coding Theory

• For a source with $p_n = 2^{-l_n}$, we can design a code such that the length of the codeword for $r_n$ is $l_n = -\log_2 p_n$, and the average bit rate is
  $\bar{l} = \sum_n p_n l_n = -\sum_n p_n \log_2 p_n = H.$


Shannon Source Coding Theory

• For an arbitrary source, a code can be designed so that $-\log_2 p_n \le l_n \le -\log_2 p_n + 1$.
• Since $-\log_2 p_n \le l_n \le -\log_2 p_n + 1$, we have
  $H = -\sum_n p_n \log_2 p_n \;\le\; \bar{l} = \sum_n p_n l_n \;\le\; \sum_n p_n(-\log_2 p_n + 1) = H + 1.$
• This is the Shannon Source Coding Theorem, which states the lower and upper bounds for variable length coding.
• The Shannon theorem only gives the bounds, not the actual way of constructing a code that achieves them.
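A small sketch of these bounds, using the lengths $l_n = \lceil -\log_2 p_n \rceil$ (a Shannon-Fano style assignment; the example probabilities are arbitrary assumptions):

```python
import math

probs = [0.4, 0.3, 0.2, 0.1]
lengths = [math.ceil(-math.log2(p)) for p in probs]  # l_n = ceil(-log2 p_n)

H = -sum(p * math.log2(p) for p in probs)
avg = sum(p * l for p, l in zip(probs, lengths))
kraft = sum(2.0 ** -l for l in lengths)  # <= 1, so a prefix code with these lengths exists

print(lengths, round(H, 3), avg, kraft)  # [2, 2, 3, 4] 1.846 2.4 0.6875
print(H <= avg <= H + 1)                 # True: the Shannon bounds hold
```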


Huffman Coding

• A method to realize the Shannon bound for any given source
• Procedure of Huffman coding
  – Step 1: Arrange the symbol probabilities $p_n$ in decreasing order and consider them as leaf nodes of a tree.
  – Step 2: While there is more than one node:
    • Find the two nodes with the smallest probability and arbitrarily assign 1 and 0 to these two nodes.
    • Merge the two nodes to form a new node whose probability is the sum of the two merged nodes.
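A compact sketch of this procedure in Python (an assumption of these notes, not code from the lecture); it keeps a min-heap of nodes and prepends a bit to every codeword in the two merged subtrees:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman codebook {symbol: bitstring} from {symbol: probability}."""
    tiebreak = count()  # unique counter so the heap never compares the dict payloads
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable node: prepend bit '0'
        p1, _, c1 = heapq.heappop(heap)  # next least probable: prepend bit '1'
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]
```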


Example of Huffman Coding

Symbol  Prob   Codeword  Length
"2"     36/49  "1"       1
"3"     8/49   "01"      2
"1"     4/49   "001"     3
"0"     1/49   "000"     3

(Tree construction: merge "0" (1/49) and "1" (4/49) into a node of probability 5/49; merge it with "3" (8/49) into 13/49; merge that with "2" (36/49) into the root.)

$\bar{l} = \frac{36}{49}\cdot 1 + \frac{8}{49}\cdot 2 + \frac{4}{49}\cdot 3 + \frac{1}{49}\cdot 3 = \frac{67}{49} \approx 1.4$ bits; $H = -\sum p_n \log_2 p_n = 1.16$ bits.


Example of Huffman Coding

[Figure: Huffman tree for eight symbols b(0)-b(7), built by repeatedly merging the two least probable nodes until the root probability reaches 1.00.]

Resulting codewords: b(2): 00, b(1): 10, b(3): 010, b(4): 111, b(0): 1100, b(5): 1101, b(6): 0110, b(7): 0111.

Average bit rate $\bar{l} = \sum_n p_n l_n = 2.71$ bits; entropy $H = -\sum_n p_n \log_2 p_n = 2.68$ bits.


Disadvantage of Huffman Coding

• At least one bit has to be used for each symbol.
• Solution
  – Vector Huffman coding: to obtain higher compression, we can treat each group of M symbols as one entity and give each group a codeword based on the group probability (M-th order Huffman coding).
  – Consider the M symbols as the outcome of M RVs.
  – The joint entropy of a pair of variables (X, Y) with a joint distribution p(x, y) is
    $H(X,Y) = -\sum_x \sum_y p(x,y) \log_2 p(x,y)$
  – More generally,
    $H(X_1, \ldots, X_M) = -\sum_{x_1} \cdots \sum_{x_M} p(x_1, \ldots, x_M) \log_2 p(x_1, \ldots, x_M)$
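As a sketch of the joint-entropy formula (the symbol probabilities are the ones used in the vector Huffman example that follows; for independent symbols, H(X,Y) = 2 H(X)):

```python
import math

def joint_entropy(pxy):
    """H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y), with p given as {(x, y): prob}."""
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

px = {"a": 2/3, "b": 1/6, "c": 1/6}
pxy = {(x, y): px[x] * px[y] for x in px for y in px}  # independent symbols
Hx = -sum(p * math.log2(p) for p in px.values())
print(joint_entropy(pxy), 2 * Hx)  # equal for independent X, Y (~2.50 bits)
```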


Example of Vector Huffman Coding (1)

• Consider an alphabetic source of three symbols, a, b, c, with
  $p(a) = \frac{2}{3},\quad p(b) = \frac{1}{6},\quad p(c) = \frac{1}{6}.$
• Assume the symbols are independent, so $p(xy) = p(x)\,p(y)$:

  p(aa) = 2/3 · 2/3 = 4/9    p(ab) = 2/3 · 1/6 = 1/9    p(ac) = 2/3 · 1/6 = 1/9
  p(ba) = 1/6 · 2/3 = 1/9    p(bb) = 1/6 · 1/6 = 1/36   p(bc) = 1/6 · 1/6 = 1/36
  p(ca) = 1/6 · 2/3 = 1/9    p(cb) = 1/6 · 1/6 = 1/36   p(cc) = 1/6 · 1/6 = 1/36


Example of Vector Huffman Coding (2)

• Design a Huffman code which uses a separate codeword for each symbol:

  Symbol  Probability  Codeword
  a       2/3          "1"
  b       1/6          "01"
  c       1/6          "00"

  (Tree: merge b and c into a node of probability 1/3, then merge with a.)

$\bar{l} = \frac{2}{3}\cdot 1 + \frac{1}{6}\cdot 2 + \frac{1}{6}\cdot 2 = \frac{4}{3} \approx 1.33$ bits/symbol.


Example of Vector Huffman Coding (3)

• Design a Huffman code which uses a separate codeword for each group of two symbols:

  Pair  Probability  Codeword
  aa    4/9          "1"
  ab    1/9          "011"
  ac    1/9          "0101"
  ba    1/9          "0100"
  bb    1/36         "00111"
  bc    1/36         "00110"
  ca    1/9          "000"
  cb    1/36         "00101"
  cc    1/36         "00100"

  (Intermediate node probabilities in the tree: 1/18, 1/18, 1/9, 2/9, 2/9, 3/9, 5/9.)

$\bar{l} = \frac{1}{2}\left( \frac{4}{9}\cdot 1 + \frac{1}{9}\cdot 3 + \frac{1}{9}\cdot 4 + \frac{1}{9}\cdot 4 + \frac{1}{36}\cdot 5 + \frac{1}{36}\cdot 5 + \frac{1}{9}\cdot 3 + \frac{1}{36}\cdot 5 + \frac{1}{36}\cdot 5 \right) = \frac{46}{36} \approx 1.27$ bits/symbol.
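Feeding these pair probabilities to the huffman_code sketch from the earlier slide reproduces the per-symbol rate (individual codewords may differ from the table, but every Huffman code for a given source has the same average length):

```python
px = {"a": 2/3, "b": 1/6, "c": 1/6}
pairs = {x + y: px[x] * px[y] for x in px for y in px}  # independent pair source

code = huffman_code(pairs)  # huffman_code() from the Huffman coding sketch above
avg_per_symbol = sum(pairs[s] * len(code[s]) for s in pairs) / 2
print(avg_per_symbol)  # 23/18 ~ 1.28 bits/symbol, vs 4/3 ~ 1.33 for scalar coding
```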


• Exercise: explain how the joint entropy H(X,Y) relates to H(X) and H(Y|X).


Conditional Entropy

• If (X, Y) ~ p(x, y), then the conditional entropy H(Y|X) is defined as
  $H(Y|X) = \sum_x p(x) H(Y|X=x) = -\sum_x p(x) \sum_y p(y|x) \log_2 p(y|x) = -\sum_x \sum_y p(x,y) \log_2 p(y|x)$
• 1st order conditional Huffman coding
  – Assume there are N symbols $\{a_i\}$, $i = 1, \ldots, N$; we have to build N different Huffman coders $C_i$, based on $p(a_j|a_i)$.


Example of Conditional Huffman Coding

A source with three symbols A = {a1, a2, a3} has the following conditional probability distribution:

$p\{a_j \mid a_i\} = \begin{cases} 1/2 & \text{if } i = j \\ 1/4 & \text{otherwise} \end{cases}$

Given the previous symbol $a_i$, the coder $C_i$ merges the two 1/4 nodes first and assigns: $a_i|a_i$ (probability 1/2) → "1"; the other two symbols (probability 1/4 each) → "01" and "00". The decoder asks "what is the previous symbol?" and decodes the current input with the corresponding table $C_i$.

Average codeword length: $\bar{l} = \frac{1}{2}\cdot 1 + 2\cdot\frac{1}{4}\cdot 2 = 1.5$ bits.

Conditional entropy, with $p(a_i) = 1/3$ for all i:
$H(X|a_i) = -\sum_j p(a_j|a_i)\log_2 p(a_j|a_i) = -\frac{1}{2}\log_2\frac{1}{2} - 2\cdot\frac{1}{4}\log_2\frac{1}{4} = 1.5$
$H = \sum_i p(a_i) H(X|a_i) = 3 \times \frac{1}{3} \times 1.5 = 1.5$ bits.
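A short numeric check of this example (a sketch assuming the uniform marginal p(a_i) = 1/3 stated above):

```python
import math

p_prev = [1/3, 1/3, 1/3]                               # p(a_i)
p_cond = [[1/2 if i == j else 1/4 for j in range(3)]   # p(a_j | a_i)
          for i in range(3)]

# H(X|X_prev) = -sum_i p(a_i) sum_j p(a_j|a_i) log2 p(a_j|a_i)
H = -sum(p_prev[i] * sum(p * math.log2(p) for p in p_cond[i]) for i in range(3))

# Each conditional Huffman table uses lengths 1, 2, 2 for probabilities 1/2, 1/4, 1/4
avg = sum(p_prev[i] * (1/2 * 1 + 1/4 * 2 + 1/4 * 2) for i in range(3))

print(H, avg)  # both 1.5: the conditional code exactly meets the entropy bound
```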


Other Variable Length Coding Methods

• Huffman coding
  – Assigns variable-length codewords to a fixed-length sequence of symbols.
• LZW coding (Lempel, Ziv, and Welch)
  – Assigns fixed-length codewords to variable-length sequences of source symbols.
  – Does not require a priori knowledge of the symbol probabilities (universal code).
  – Not as efficient as Huffman for a given distribution.
• Arithmetic coding
  – Assigns variable-length codewords to variable-length sequences of source symbols.
  – More efficient than Huffman coding (bit rate closer to the source entropy).
  – Requires significantly more computation.


Significance of Compression for Facsimile

• For a size A4 document (digitized to 1728x2376 pixels)
  – Without compression
    • Group 1 Fax: 6 min/page
    • Group 2 Fax: 3 min/page
  – With compression
    • Group 3 Fax: 1 min/page
• Fax became popular only after the transmission time was cut down to below 1 min/page.


Runlength Coding of Bi-Level Images

• 1D Runlength Coding:
  – Count the lengths of white and black runs alternately.
  – Represent the last runlength using "EOL".
  – Code the white and black runlengths using different codebooks (Huffman coding).
• 2D Runlength Coding:
  – Use the relative address from the last transition in the line above.
  – Used in facsimile coding (G3, G4).


Example of 1-D Runlength Coding

[Figure: a bi-level image with the 1-D run-length representation of each line; e.g. the first rows read
2 4 2 1 6 4 2 1 2 4,
2 1 5 1 6 1 5 1 5 1,
2 5 6 5 5, ...]
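A minimal 1-D run-length encoder sketch (the "every line starts with a white run" convention is an assumption consistent with the homework later in these slides):

```python
from itertools import groupby

def runlengths(row, first="W"):
    """1-D run-length representation of one scan line of a bi-level image.
    Emits alternating run lengths, starting with a (possibly zero-length) white run."""
    runs = [(v, len(list(g))) for v, g in groupby(row)]
    if runs and runs[0][0] != first:
        runs.insert(0, (first, 0))  # line begins with black: emit a zero-length white run
    return [n for _, n in runs] + ["EOL"]

line = "WWBBBBWWBWWWWWW"
print(runlengths(line))  # [2, 4, 2, 1, 6, 'EOL'], matching the start of the first row above
```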


Example Continued

• Design the Huffman coding table for all possible runlength symbols

• Do on the board


Example of 2-D Runlength Coding

• Relative address coding (RAC) is based on the principle of tracking the binary transitions that begin and end each black and white run.

c is the current transition, e is the last transition in the same line, and c′ is the first similar transition past e in the previous line.

If ec ≤ cc′, then d = ec; if cc′ < ec, then d = cc′ (the shorter of the two distances is coded).


CCITT Group 3 and Group 4 Facsimile Coding Standard – the READ Code

• READ: Relative Element Address Designate
  – The first line in every K lines is coded using 1D runlength coding, and the following K-1 lines are coded using 2D runlength coding.
• 1D RLC is used for every K-th line to suppress propagation of transmission errors.
• The Group 4 method is designed for more secure transmission channels, such as leased data lines where the bit error rate is very low.


Predictive Coding

• Motivation
  – The value of a current pixel usually does not change rapidly from those of adjacent pixels. Thus it can be predicted quite accurately from the previous samples.
  – The prediction error will have a non-uniform distribution, centered mainly near zero, and can be specified with fewer bits than required for the original sample values, which usually have a uniform distribution.


Lossless Predictive Coding System

Encoder: the prediction error e = f − f_p is formed by subtracting the prediction f_p from the input sample f, and e is binary encoded into the bitstream B_e. The encoder also adds e back to f_p (recovering f) and feeds it through a delay (frame store) to the predictor, so the encoder's predictor state matches the decoder's.

Decoder: B_e is binary decoded into e, and the output is f̂ = e + f_p, with an identical predictor/delay (frame store) loop. Since the decoder forms the same prediction, f̂ = f exactly and the system is lossless.
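A minimal sketch of this closed-loop system, assuming a one-sample "previous pixel" predictor and integer-valued input (so encoder and decoder predictions match exactly and reconstruction is lossless):

```python
def encode(samples):
    """Prediction error e = f - f_p, where f_p is the previous reconstructed sample."""
    errors, prev = [], 0  # predictor state starts at 0 on both sides
    for f in samples:
        errors.append(f - prev)
        prev = f  # lossless case: the reconstructed value equals the input
    return errors

def decode(errors):
    """Reconstruct f = e + f_p with the identical predictor."""
    samples, prev = [], 0
    for e in errors:
        prev = e + prev
        samples.append(prev)
    return samples

f = [100, 101, 103, 103, 102, 99]
e = encode(f)
print(e)               # [100, 1, 2, 0, -1, -3]: small values clustered near zero
assert decode(e) == f  # exact reconstruction
```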


Linear Predictor

• Let $f_0$ represent the current pixel, and $f_k$, $k = 1, 2, \ldots, K$, the previous pixels that are used to predict $f_0$. For example, if $f_0 = f(m,n)$, then $f_k = f(m-i, n-j)$ for certain $i, j \ge 0$. A linear predictor is
  $\hat{f}_0 = \sum_{k=1}^{K} a_k f_k.$
• The $a_k$ are called linear prediction coefficients, or simply prediction coefficients.
• The key problem is how to determine the $a_k$ so that a certain criterion is satisfied.


LMMSE Predictor (1)

• The design criterion is to minimize the mean square error (MSE) of the predictor:
  $\sigma_p^2 = E\{|f_0 - \hat{f}_0|^2\} = E\left\{ \left( f_0 - \sum_{k=1}^{K} a_k f_k \right)^2 \right\}$
• The optimal $a_k$ must satisfy
  $\frac{\partial \sigma_p^2}{\partial a_l} = -2\, E\left\{ \left( f_0 - \sum_{k=1}^{K} a_k f_k \right) f_l \right\} = 0, \quad l = 1, 2, \ldots, K.$
• Let $R(k,l) = E\{f_k f_l\}$. Then
  $\sum_{k=1}^{K} a_k R(k,l) = R(0,l), \quad l = 1, 2, \ldots, K.$


LMMSE Predictor (2)

In matrix form,

$\begin{bmatrix} R(1,1) & R(1,2) & \cdots & R(1,K) \\ R(2,1) & R(2,2) & \cdots & R(2,K) \\ \vdots & \vdots & \ddots & \vdots \\ R(K,1) & R(K,2) & \cdots & R(K,K) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{bmatrix} = \begin{bmatrix} R(0,1) \\ R(0,2) \\ \vdots \\ R(0,K) \end{bmatrix} \;\Rightarrow\; \mathbf{R}\mathbf{a} = \mathbf{r} \;\Rightarrow\; \mathbf{a} = \mathbf{R}^{-1}\mathbf{r}.$

The MSE of this predictor is

$\sigma_p^2 = E\{(f_0 - \hat{f}_0) f_0\} = R(0,0) - \sum_{k=1}^{K} a_k R(k,0) = R(0,0) - \mathbf{r}^T \mathbf{a} = R(0,0) - \mathbf{r}^T \mathbf{R}^{-1} \mathbf{r}.$
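A numeric sketch of the normal equations with numpy (the correlation values are illustrative assumptions matching the two-pixel example on the next slide):

```python
import numpy as np

# Hypothetical correlations for a K = 2 predictor (left and top neighbors),
# normalized by sigma_f^2: R(1,2) = rho_d = 0.9, R(0,1) = R(0,2) = rho = 0.95.
R = np.array([[1.0, 0.90],   # [R(1,1) R(1,2)]
              [0.90, 1.0]])  # [R(2,1) R(2,2)]
r = np.array([0.95, 0.95])   # [R(0,1) R(0,2)]

a = np.linalg.solve(R, r)    # normal equations R a = r
mse = 1.0 - r @ a            # sigma_p^2 / sigma_f^2 = R(0,0) - r^T a, normalized
print(a, mse)                # a = [0.5, 0.5], mse = 0.05
# Matches the isotropic closed form on the next slide: a1 = a2 = rho / (1 + rho_d).
```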


An Example of Predictive Coder

• Assume the current pixel $f_0 = f(m,n)$ is predicted from two pixels, one on the left, $f_1 = f(m, n-1)$, and one on the top, $f_2 = f(m-1, n)$:
  $\hat{f}(m,n) = a_1 f(m, n-1) + a_2 f(m-1, n).$
• Assume the pixel values have zero means. The correlations are
  $R(0,0) = E\{f(m,n)^2\} = R(1,1) = R(2,2) = \sigma_f^2,$
  $R(0,1) = E\{f(m,n)\, f(m,n-1)\} = \rho_h \sigma_f^2,$
  $R(0,2) = E\{f(m,n)\, f(m-1,n)\} = \rho_v \sigma_f^2,$
  $R(1,2) = R(2,1) = E\{f(m,n-1)\, f(m-1,n)\} = \rho_d \sigma_f^2.$
• The normal equations become
  $\begin{bmatrix} 1 & \rho_d \\ \rho_d & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \rho_h \\ \rho_v \end{bmatrix} \;\Rightarrow\; a_1 = \frac{\rho_h - \rho_v \rho_d}{1 - \rho_d^2}, \quad a_2 = \frac{\rho_v - \rho_h \rho_d}{1 - \rho_d^2},$
  with prediction MSE
  $\sigma_p^2 = \sigma_f^2 (1 - a_1 \rho_h - a_2 \rho_v) = \sigma_f^2 \left( 1 - \frac{\rho_h^2 + \rho_v^2 - 2\rho_h \rho_v \rho_d}{1 - \rho_d^2} \right).$
• If the correlation is isotropic ($\rho_h = \rho_v = \rho$), then
  $a_1 = a_2 = \frac{\rho}{1 + \rho_d}, \qquad \sigma_p^2 = \sigma_f^2 \left( 1 - \frac{2\rho^2}{1 + \rho_d} \right).$

Note: when the pixel values have non-zero mean, the above predictor can be applied to mean-shifted values.


Homework (1)

1. Consider a discrete source with an alphabet A = {a1, a2, …, aL}. Compute the entropy of the source for the following two cases:
   (a) The source is uniformly distributed, with p(al) = 1/L, l = 1, 2, …, L.
   (b) For a particular k, p(ak) = 1 and p(al) = 0 for l ≠ k.

2. A source with three symbols A = {a1, a2, a3} has the following probability distributions:
   $p\{a_i\} = 1/3$ for all $i$; $\quad p\{a_j \mid a_i\} = \begin{cases} 2/3 & \text{if } i = j \\ 1/6 & \text{otherwise} \end{cases}$
   (a) Calculate the 1st order entropy, 2nd order entropy, and 1st order conditional entropy. Hint: the probability of a pair of symbols "ai aj" is P(ai)P(aj|ai).
   (b) Design the 1st order, 2nd order, and 1st order conditional Huffman codes for this source. Calculate the resulting bit rate for each case. Compare it to the corresponding lower and upper bounds defined by the entropy. Which method has the lowest bit rate per symbol? How do they compare in complexity?


Homework (2)

3. For the following 4x4 image, which consists of 8 symbols:

   0 1 2 3
   2 3 5 5
   2 5 5 6
   4 4 6 7

   (a) Determine the probability of each symbol based on its occurrence frequency; (b) find its entropy; (c) design a codebook for these symbols using the Huffman coding method. Calculate the average bit rate and compare it to the entropy.

4. For the bi-level image in the figure below: (a) Give its run-length representation in one-dimensional RLC. Assume that each line starts with "W" and mark the end with "EOL". (b) Suppose we want to use the same codewords for the black and white run-lengths; determine the probability of each possible run-length (including the symbol "EOL") and calculate the entropy of this source. (c) Determine the Huffman code for the source containing all possible runlengths, calculate the average bit rate of this code, and compare it to the entropy.

   [Figure: a bi-level image of white and black pixels.]


Homework (3)

5. Assume the current pixel $f_0 = f(m,n)$ is predicted from three pixels, $f_1 = f(m, n-1)$, $f_2 = f(m-1, n)$, $f_3 = f(m-1, n-1)$ ($f_1$ to the left of $f_0$, $f_2$ above it, $f_3$ diagonally above-left). Assume the pixel values have zero means. The correlations are: $R(0,0) = R(1,1) = R(2,2) = R(3,3) = \sigma_f^2$, $R(0,1) = R(2,3) = \rho_h \sigma_f^2$, $R(0,2) = R(1,3) = \rho_v \sigma_f^2$, $R(1,2) = R(0,3) = \rho_d \sigma_f^2$. We want to predict $f_0$ from $f_1, f_2, f_3$ using a linear predictor $\hat{f}_0 = a_1 f_1 + a_2 f_2 + a_3 f_3$. Determine the predictor coefficients $a_1, a_2, a_3$ that minimize the prediction MSE. For arbitrary values of $\rho_h, \rho_v, \rho_d$ you can just write down the equations for $a_1, a_2, a_3$; for the specific case $\rho_h = \rho_v = \rho_d = \rho$, write down the solution explicitly.


Homework (4)

6. Suppose we want to code an image using lossless predictive coding, where a new pixel is predicted by the average of the top and left pixel values. For the following image:

   9  7  6  8
   10 11 9  7
   9  7  8  6
   7  4  6  5

   Determine the predicted image, the prediction error image, and the reconstructed image. Also determine the entropy of the prediction error image, which provides a lower bound on the required bit rate. Only do this for the lower 3x3 region, assuming the top row and leftmost column are known.


Reading

• R. Gonzalez, "Digital Image Processing," Sections 8.1 – 8.4

• A. K. Jain, "Fundamentals of Digital Image Processing," Sections 11.1 – 11.3