Addressing Image Compression Techniques on Current Internet Technologies
Eduardo J. Moreira, Onyeka Ezenwoye
Florida International University
School of Computer Science
CIS 6931 Advanced Topics of Information Processing
Dr. Shu-Ching Chen
Spring 2001
TABLE OF CONTENTS
Introduction
Background
History of Data Compression
Compression Types
Run Length Encoding
Huffman Coding
Arithmetic Coding
JPEG Joint Photographic Experts Group
GIF Graphics Interchange Format
PNG Portable Network Graphic Format (Method 0)
MPEG Moving Picture Experts Group
Conclusion
References
INTRODUCTION
Data compression rests on the premise that when transmitting data, whether images, music, video, or anything else, one can reduce both the size of the data and the time needed to transmit it. With the ever-growing World Wide Web, it is easy to understand the increasing demand for higher transmission speeds and better overall network performance. People have grown accustomed to near-instantaneous gratification, so even the wait for an image to appear on the screen matters. This is one of the most important motivations for data compression, and this paper will focus on some of the most widely used formats and algorithms in the field. First, let us establish the basic definitions.
Packing data means storing data in a format that requires less space than usual; compressing data is the same as packing data. Data compression is the term most often used for packing data, particularly in data communications, where it is especially useful because it lets devices transmit and store the same information in fewer bits. A packed file is simply a file stored in such a compressed format. Many operating systems and applications contain commands that let you pack a file so that it takes up less storage. For example, suppose you have a text file containing ten consecutive space characters. Normally, this would require ten bytes of storage. A program that packs files, however, would replace the spaces with a special space-series character followed by the number of spaces being replaced. In this case, the ten spaces would require only two bytes. This is just one packing technique; there are many others.
Data compression is also widely used in backup utilities,
spreadsheet applications, and database management systems.
Certain types of data, such as bit-mapped graphics, can be
compressed to a small fraction of their normal size. Some
modems automatically pack data before transmitting it across
communications lines. This can produce faster communication
because fewer bytes need to be sent. However, the modem on
the receiving side must be capable of unpacking the data.
In this paper, we will look at the fundamentals of data
compression, while focusing mainly on its application in the
compression of images.
We will discuss the relative strengths and weaknesses of image formats and corresponding compression algorithms well suited to sharing simple graphical images across multiple hardware/operating-system platforms and distributed systems. The specific formats discussed are GIF, JPEG, and PNG; the compression schemes we focus on most are those behind JPEG, GIF, and PNG (method 0).
BACKGROUND
In the past, documents were stored on paper and kept in
filing cabinets. This can be seen to be very inefficient in
terms of storage space and also the time taken to locate and
retrieve information when required. Storing and accessing
documents electronically through computers is now replacing
this traditional method of storing documents. This has
enabled us to manage things more efficiently and
effectively, so that items can be located and information
extracted without undue expense or inconvenience.
In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to the device and decompress it when it is retrieved.
In terms of communications, compressing data at the sending
end and decompressing data at the receiving end can
effectively increase the bandwidth of a digital
communication link.
To put it simply:
“Data compression saves time and money”
Let us give an example of an everyday data compression application: facsimile, better known as fax. Exchanging faxes is one of the inevitable daily tasks of every business, but fax devices would not work without data compression algorithms. The image of your document is not sent over the phone line as is. The fax machine compresses the data before sending it to reduce the time needed to transmit the document, cutting the cost of transmission by a factor of ten or more. Without data compression, a 50-cent fax transmission could cost as much as $5.
Data compression has many uses in modern networking environments. Compressing large files can significantly reduce file transfer times. This might prove useful for retrieving database information within a company's internal network, or for downloading images on a web page. At any given time, the capacity of the Internet to transfer data is fixed; thus, if data can be compressed wherever possible, significant improvements in throughput can be achieved. Many files can also be combined into one compressed document, making them easier to send.
A BRIEF HISTORY OF DATA COMPRESSION
Data compression basically transforms data into a new representation that typically has less redundancy and is based on prediction using some deterministic (but possibly adaptive) model. If data can be predicted with high confidence, it can be compressed very well. The primary goal of any data compression algorithm is the identification and extraction of this source redundancy.
The late 1940s were the early years of information theory; the idea of developing efficient new coding methods was just starting to be fleshed out. Ideas of entropy, information content, and redundancy were explored. One popular notion held that if the probability of symbols in a message were known, there ought to be a way to code the symbols so that the message would take up less space.

The first well-known method for compressing digital signals is now known as Shannon-Fano coding. Shannon and Fano [1948] developed this algorithm at about the same time; it assigns binary code words to the unique symbols that appear within a given data file. While Shannon-Fano coding was a great leap forward, it had the unfortunate luck to be quickly superseded by an even more efficient coding system: Huffman coding.

Huffman coding [1952] shares most characteristics of Shannon-Fano coding. Huffman coding achieves effective data compression by reducing the amount of redundancy in the coding of symbols, and it has been proven to produce the shortest possible average code length among methods that assign a whole number of bits to each symbol.
In the last fifteen years, Huffman coding has been replaced
by arithmetic coding. Arithmetic coding bypasses the idea of
replacing an input symbol with a specific code. It replaces
a stream of input symbols with a single floating-point
output number. More bits are needed in the output number for
longer, complex messages.
Dictionary-based compression algorithms use a completely
different method to compress data. They encode variable-
length strings of symbols as single tokens. The token forms
an index to a phrase dictionary. If the tokens are smaller
than the phrases, they replace the phrases and compression
occurs. Two dictionary-based compression techniques called
LZ77 and LZ78 have been developed. LZ77 is a "sliding
window" technique in which the dictionary consists of a set
of fixed-length phrases found in a "window" into the
previously seen text. LZ78 takes a completely different
approach to building a dictionary. Instead of using fixed-
length phrases from a window into the text, LZ78 builds
phrases up one symbol at a time, adding a new symbol to an
existing phrase when a match occurs.
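To make the phrase-building idea concrete, here is a toy LZ78-style encoder in Python. It is a simplified sketch, not the full specification: output framing and dictionary limits are ignored, but it shows how phrases grow one symbol at a time as matches occur.

```python
def lz78_encode(data: str):
    """Toy LZ78-style encoder: emits (phrase_index, next_symbol) pairs.
    Index 0 means 'no prefix phrase'. Dictionary growth is unbounded here."""
    dictionary = {}          # phrase -> index
    phrase = ""
    output = []
    for symbol in data:
        candidate = phrase + symbol
        if candidate in dictionary:
            phrase = candidate                     # keep extending the current match
        else:
            output.append((dictionary.get(phrase, 0), symbol))
            dictionary[candidate] = len(dictionary) + 1   # add the new phrase
            phrase = ""
    if phrase:                                     # flush any pending phrase
        output.append((dictionary.get(phrase, 0), ""))
    return output

print(lz78_encode("ABABABA"))   # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
```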
These various compression methods will be covered in more
detail in the following sections. Now we’ll discuss the
issue of compression types.
COMPRESSION TYPES
Data compression can be divided into two main types,
lossless and lossy compression.
Lossless compression recovers the exact original data after decompression. It is used mainly for compressing database records, spreadsheets, word processing files, and program files, where exact replication of the original is essential. Examples of lossless techniques include run-length and Huffman codes.
Lossy compression will result in a certain loss of accuracy
in exchange for a substantial increase in compression. Lossy
compression is more effective when used to compress graphic
images and digitized voice where losses outside visual or
aural perception can be tolerated. Most lossy compression
techniques can be adjusted to different quality levels,
gaining higher accuracy in exchange for less effective
compression. Examples of lossy methods are JPEG and MPEG.
RUN LENGTH ENCODING
This lossless compression algorithm uses a quite simple
technique that can achieve compression ratios as good as 8
to 1. The whole idea is based on replacing multiple
occurrences of a symbol with only one copy and a count of
how many times that symbol appears. For example if the
string AAABBBCCCCCCCCCDDDDDEEEEFFFFGGGGG appears in a file
then it will be encoded as 3A3B9C5D4E4F5G.
This coding mechanism can also be used to compress digital images by comparing adjacent pixels and storing only the changes. The run-length encoding technique is very effective when encoding images that contain large amounts of white space or large homogeneous areas. However, because each run is represented with two bytes (a count and a symbol), it does not work well on files in which even a small degree of variation exists among adjacent pixels; in those cases the algorithm can actually increase the file size. This compression technique is used frequently for fax transmissions, which usually contain large amounts of white space that can be removed.
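As a concrete illustration, here is a minimal sketch of this count-plus-symbol scheme in Python. It assumes, as in the example string above, that no run is longer than nine symbols, so each count fits in a single character; it also makes the weakness obvious, since a string with no repeats doubles in size.

```python
from itertools import groupby

def run_length_encode(text: str) -> str:
    """Replace each run of a repeated symbol with '<count><symbol>'.
    Assumes runs are shorter than 10, so each count is one character."""
    return "".join(f"{len(list(group))}{symbol}" for symbol, group in groupby(text))

print(run_length_encode("AAABBBCCCCCCCCCDDDDDEEEEFFFFGGGGG"))  # -> 3A3B9C5D4E4F5G
print(run_length_encode("ABCD"))                               # -> 1A1B1C1D (expands!)
```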
HUFFMAN CODING
Huffman coding is one of the most important early developments in the field of data compression. It uses a simple strategy to achieve, on average, about 25 percent savings, and savings of 50 to 60 percent are possible in many instances when compressing large files. The algorithm creates a variable-length character code, mapping the characters that appear most frequently in a file to the code words with the smallest lengths.
Code words are represented as variable-length binary strings, each mapped to a different character found in the file. A frequency distribution of the characters in the file being compressed must first be created; this distribution is then used to decide which code word is assigned to each symbol. Consider how this concept applies when compressing a file composed of printable ASCII characters. The printable ASCII character set consists of approximately 100 characters, so uniquely representing each one with fixed-length strings requires ceiling(log2 100) = 7 bits. This representation takes no account of how frequently each character appears in the file, so it cannot achieve the desired space savings. Let us look at a simple example of how taking character frequencies into account allows us to save space. Suppose we are trying to compress a file composed of 6 distinct characters with the following distribution: character c appears 100,000 times, d appears 30,000 times, y appears 5,000 times, t appears 1,000 times, r appears 50 times, and z appears 25 times. We can then assign the code word with the shortest length to c, since c appears with the highest frequency. The following code words could be assigned to the file characters:
USING HUFFMAN CODES
Character   Codeword   Bits required = (codeword length) * (frequency in file)
---------------------------------------------------------------
c           0          1 * 100,000 = 100,000
d           101        3 * 30,000  =  90,000
y           100        3 * 5,000   =  15,000
t           111        3 * 1,000   =   3,000
r           1101       4 * 50      =     200
z           1100       4 * 25      =     100
---------------------------------------------------------------
Total bits required:                 208,300
USING FIXED-SIZE STRINGS
Character   Codeword   Bits required = (codeword length) * (frequency in file)
---------------------------------------------------------------
c           000        3 * 100,000 = 300,000
d           001        3 * 30,000  =  90,000
y           010        3 * 5,000   =  15,000
t           011        3 * 1,000   =   3,000
r           100        3 * 50      =     150
z           101        3 * 25      =      75
---------------------------------------------------------------
Total bits required:                 408,225
By using Huffman codes, we save 199,925 bits compared with fixed-size strings. Note that if all characters occurred with the same frequency in a file, Huffman codes would provide no savings.
After considering the savings that Huffman coding can achieve, the question arises: how do we build the required code? The classic scheme starts from the frequency distribution table of the characters appearing in the file and repeatedly merges the two nodes with the lowest frequencies into a new parent node, until a single tree remains. A full binary tree is thus used to represent the optimal code for a file, with data only at the leaves. A zero indicates the left branch and a one the right branch. Characters with the highest frequencies end up closer to the root of the tree than characters with lower frequencies. Once the code words have been defined for each character, encoding the file is simply a matter of concatenating the code words of the characters in the file. Decoding is then a matter of identifying the code word associated with each character and repeating the process over the entire encoded file.
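The construction just described can be sketched in a few lines of Python using a priority queue. The frequencies below are those of the earlier example; note that the code words the algorithm produces may differ from the illustrative table, since that table shows only one possible prefix code, and for this very skewed distribution the true Huffman code is in fact shorter still.

```python
import heapq

def huffman_codes(freqs: dict) -> dict:
    """Build Huffman code words from a {symbol: frequency} table by
    repeatedly merging the two least frequent nodes."""
    # Each heap entry: (frequency, tie_breaker, {symbol: codeword_so_far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)                       # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}          # 0 = left branch
        merged.update({s: "1" + c for s, c in right.items()})   # 1 = right branch
        heapq.heappush(heap, (f1 + f2, count, merged))          # merged node goes back in
        count += 1
    return heap[0][2]

freqs = {"c": 100000, "d": 30000, "y": 5000, "t": 1000, "r": 50, "z": 25}
codes = huffman_codes(freqs)
total_bits = sum(len(code) * freqs[sym] for sym, code in codes.items())
print(codes, total_bits)   # for this distribution the total comes out below 208,300
```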
ARITHMETIC CODING
Arithmetic coding encodes symbols using a non-integer number of bits per symbol. The entire input is represented as a single real number R within a sub-interval of [0, 1) determined by the message (for example, a particular message might map to the interval from 0.37 to 0.46); the larger the input, the more precision is needed for R.

One of the main goals of lossless compression methods is the decomposition of data into a sequence of events; these events must then be encoded in as few bits as possible without losing any information. We want to give shorter code words to events that are more likely, so the data compresses well whenever the more probable events occur. If the distinct events ev1, ev2, ..., evn occur with probability distribution P, the smallest possible expected number of bits needed to encode an event is given by the entropy H(P).
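A minimal sketch of this interval-narrowing idea in Python; the two-symbol alphabet and its probabilities are invented for illustration, and a real coder would add renormalization and a termination rule, which are omitted here. More probable symbols shrink the interval less, so they cost fewer bits.

```python
def arithmetic_interval(message, probs):
    """Narrow [low, high) for each symbol; any number inside the final
    interval identifies the whole message."""
    # Cumulative probability ranges, e.g. {'a': (0.0, 0.6), 'b': (0.6, 1.0)}
    ranges, start = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        lo_frac, hi_frac = ranges[sym]
        low, high = low + span * lo_frac, low + span * hi_frac
    return low, high

print(arithmetic_interval("aab", {"a": 0.6, "b": 0.4}))   # -> (0.216, 0.36)
```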
JPEG – JOINT PHOTOGRAPHIC EXPERTS GROUP
JPEG was developed by the Joint Photographic Experts Group committee to compress gray-scale or color images. The format is well suited to real-world photographs and scenic depictions of nature, but it has not been shown to work well with line drawings, cartoon animation, and similar artwork. The algorithm used to compress and decompress the image makes this a "lossy" compression: the result is of somewhat lower quality than the original. It was designed to exploit the limitations of the human eye's perception of minor changes in color and brightness; that is, it was designed to be viewed by the human eye, not analyzed by machines, which would pick up the changes in a compressed image as small errors and could cause problems if used with image-comparison algorithms. Nevertheless, the amount of compression that can be accomplished by this lossy method is far greater than with lossless methods. One reason JPEG images are well liked is the flexibility one has, by changing compression parameters, to create smaller lower-quality images or larger higher-quality ones. This makes the format useful across a broad range of real-world applications. For example, if the main objective is to make the image as small as possible for transfer through a network, then you simply ask yourself "What is the lowest quality we can accept?" and adjust accordingly.
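As a concrete illustration, most image libraries expose this trade-off as a single quality setting. The sketch below uses the Pillow library; the filenames and quality values are arbitrary examples.

```python
from PIL import Image   # Pillow

img = Image.open("photo.png")              # hypothetical source image
for quality in (90, 50, 20):
    # The same image saved at decreasing JPEG quality settings;
    # file size drops sharply as quality (and fidelity) is reduced.
    img.convert("RGB").save(f"photo_q{quality}.jpg", quality=quality)
```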
Another important aspect of JPEG is that one can trade decoding speed against image quality, by using fast but inexact approximations instead of exact calculations during decompression.

There are several good reasons to use JPEG. The first is, of course, the compression advantage, along with the ability to store 24-bit color per pixel instead of the usual 8-bit color. Moving a 100-KByte image file across a network instead of a 2-MByte full-color file makes a big difference in disk usage and transmission time. There is a drawback in the time it takes to decode and view a JPEG compared with a different format such as GIF; however, the savings in network transmission time is generally greater than the extra time taken to decompress the file. The ability to store 24 bits per pixel (16 million colors) also makes JPEG a great choice for high-quality, high-definition photographs, keeping in mind that no image format can ever duplicate exactly what the human eye sees.
The best-known lossless compression methods achieve about 2:1 compression on average; with JPEG we can achieve 10:1 to 20:1 compression without visible loss, and 30:1 to 50:1 with small to moderate loss of quality. For very low-quality purposes such as previews, one could go as far as a 100:1 ratio. Gray-scale images do not compress by as large a factor as color images do. The human eye is more sensitive to brightness variations than to hue variations, so JPEG can compress the hue data much more heavily than the brightness data. As a result, a compressed gray-scale image is usually only 10-20 percent smaller than a comparable color image, even though the uncompressed gray-scale data is only eight bits per pixel, roughly 30% of the size of a full-color image of the same quality.
Let us now consider some of the different compression modes that are all known as JPEG. The first is baseline JPEG, in which the image is stored as a single top-to-bottom scan. Progressive JPEG, by contrast, divides the image file into a series of scans: the first scan shows the image at a very low quality setting and takes little space, and each subsequent scan delivers a higher-quality rendering. The total storage requirement is roughly the same as baseline, but the viewer sees an approximation of the image much sooner, which makes this mode ideal for transferring images across the World Wide Web over slow modem connections.
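Progressive output is usually just another save-time option; with Pillow, for instance, it can be requested directly (the filenames are again illustrative).

```python
from PIL import Image

img = Image.open("photo.jpg")
# progressive=True writes the multi-scan (progressive) JPEG described above;
# omitting it produces an ordinary baseline, top-to-bottom scan.
img.save("photo_progressive.jpg", progressive=True, quality=75)
```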
GIF – GRAPHICS INTERCHANGE FORMAT
GIF was created as CompuServe's standard for graphic image data exchange. There are two versions (GIF standards), GIF87a and GIF89a, with only minor differences between them. These days GIF is becoming less widely used, largely because of the limited number of colors that can be used to represent image data: an image cannot have a bit depth of more than eight bits per pixel, meaning that a maximum of 256 separate colors can be used in a single GIF image. Sometimes more than one image is stored within a GIF file; this is called an animation, and it is accomplished by writing the images to the screen one after the other with some timing interval between them. This animation method is still widely used throughout the web.
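For illustration, assembling such a multi-image GIF takes only a few lines with a library such as Pillow; the frame files and timing below are hypothetical.

```python
from PIL import Image

frames = [Image.open(f"frame{i}.png").convert("P") for i in range(4)]
# save_all stores every frame in one GIF; duration is the delay per frame
# in milliseconds, and loop=0 means repeat forever.
frames[0].save("animation.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)
```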
Image data within a GIF is compressed using a modified LZW (Lempel-Ziv-Welch) algorithm.
LZW works with indexed images of 256 colors or fewer. It performs best when the image being compressed is composed largely of solid colors across wide portions of the image. Other methods record the actual color value of each pixel; this method records one pixel's color value and then groups together the following pixels that share the same value. GIFs can also be interlaced, allowing progressively clearer images to appear while downloading in a web browser; this lets users start seeing the image without waiting for it to be downloaded completely. The algorithm is a lossless one: the quality of the image is in no way compromised by the compression, and the original data is restored exactly when the file is opened. If you specify fewer than the full 256 colors for an image, you can reduce its file size; when indexing an image you should choose the lowest number of colors that still provides the necessary quality, so that the file is as small as it needs to be.
If we use the enhanced GIF89a format, we can specify one color to be transparent. This allows GIF images to appear with irregular shapes instead of the usual rectangle; it is done by making the border color transparent, giving the image a cut-out effect. Keep in mind that if you choose a transparent color that also appears inside the image, you can distort it.
PNG – PORTABLE NETWORK GRAPHIC FORMAT (method 0)
Currently, method 0 is the only compression method defined for PNG. It is a variant of the Lempel-Ziv 77 (LZ77) algorithm, very similar to the one used by "zip" programs such as WinZip and PKZIP. Because of this, we get the benefit of 15-35 percent higher compression. The algorithm is lossless and works well with PNG's true-color, palette, and grayscale color modes. Comparing JPEG and PNG on true-color and grayscale images, JPEG generally gives better compression performance, and therefore smaller files and faster transmission, while PNG (method 0) offers higher image quality. For all intents and purposes, both JPEG and PNG method 0 appear to have a good future on the ever-growing and changing World Wide Web.
PNG also uses various filtering techniques, which are similar to compression but distinct from it. Filtering is applied to the bytes of data before any compression takes place, and in turn prepares the data for optimal compression. This pre-compression filtering has been shown to work best when applied to true-color images, though each individual image ultimately determines the outcome: one needs to test different filters to see which works best for a particular image.
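To see what such a filter does, here is a sketch of PNG's simplest non-trivial filter, the "Sub" filter (type 1), which replaces each byte of a scanline with its difference from the corresponding byte one pixel to the left. Flat or smoothly varying regions turn into long runs of small values, which the LZ77-style compressor then squeezes very effectively. This is only one of PNG's five filter types, and the scanline below is an invented example.

```python
def sub_filter(scanline: bytes, bytes_per_pixel: int = 3) -> bytes:
    """PNG 'Sub' filter: each byte minus the corresponding byte of the
    previous pixel (modulo 256); the first pixel is left unchanged."""
    out = bytearray()
    for i, b in enumerate(scanline):
        left = scanline[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        out.append((b - left) % 256)
    return bytes(out)

# A smooth red gradient: raw bytes keep changing, filtered bytes are nearly constant.
scanline = bytes([10, 0, 0, 12, 0, 0, 14, 0, 0, 16, 0, 0])
print(list(sub_filter(scanline)))   # [10, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0]
```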
In conclusion, PNG method 0 supports 24-bit color, provides a good transparency scheme, and gives us excellent gamma and interlacing support. These and other strengths, we believe, will give this format lasting staying power.
MPEG – MOVING PICTURE EXPERTS GROUP
MPEG has become one of the most popular video compression formats in use today. It combines several compression techniques, which are discussed in detail below.

MPEG uses three main techniques to achieve compression ratios of up to 30:1. The first is a color conversion, which transforms a 24-bit RGB picture element into a 16- or 12-bit signal. RGB is composed of three colors, red, green, and blue, each encoded with 8 bits giving 256 levels. The original image resolution is kept for luminance, while the chrominance (red- and blue-difference) components are reduced; this exploits the fact that the human eye can detect finer detail in contrast (brightness) than in color.
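As a sketch of what this conversion does for a single pixel, the snippet below uses the commonly quoted ITU-R BT.601 weights; a real encoder would also apply offsets, clipping, and chroma subsampling across whole blocks, all of which are omitted here.

```python
def rgb_to_ycbcr(r: float, g: float, b: float):
    """Convert one RGB pixel (0-255 per channel) to luminance and two chrominance values."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b   # brightness, kept at full resolution
    cb = 0.564 * (b - y)                     # blue-difference chrominance
    cr = 0.713 * (r - y)                     # red-difference chrominance
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))   # pure red: most of the signal lands in Y and Cr
```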
The second technique is the quantized Discrete Cosine Transform (DCT). The transform converts the color and intensity values of a block into an equal number of cosine-frequency coefficients, concentrating the most significant variations in the block toward the upper left-hand corner. A predetermined quantization matrix, whose weights for the different frequencies are governed by their impact on human perception, is then used to divide the DCT matrix element by element. The resulting quantized matrix usually has zeroes in all but a few elements, and the non-zero elements are concentrated in the upper-left corner of the (usually 8x8) block. This block is then fed into a run-length and Huffman encoding stage, which reduces the 128-byte block to the non-zero coefficients, each followed by a count of the zero coefficients before the next significant coefficient, with the sequence padded to its full length with zeros. This step yields significant compression ratios and results in variable-size intraframes.
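The quantization step itself is just an element-wise divide-and-round. The sketch below uses an invented 4x4 coefficient block and quantization matrix rather than the standard 8x8 tables, purely to show how the high-frequency entries collapse to zero.

```python
def quantize(dct_block, q_matrix):
    """Divide each DCT coefficient by its quantization weight and round;
    most high-frequency entries (toward the lower right) collapse to zero."""
    return [[round(c / q) for c, q in zip(row_c, row_q)]
            for row_c, row_q in zip(dct_block, q_matrix)]

# Toy 4x4 example: large low-frequency coefficients in the upper left,
# small high-frequency coefficients elsewhere.
dct_block = [[620, 110, 12, 3],
             [ 95,  40,  5, 1],
             [ 10,   6,  2, 0],
             [  3,   1,  0, 0]]
q_matrix  = [[ 16,  24, 40, 60],
             [ 24,  40, 60, 80],
             [ 40,  60, 80, 99],
             [ 60,  80, 99, 99]]
print(quantize(dct_block, q_matrix))   # only a few non-zero values survive
```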
The third technique is motion estimation and error coding. We assume that most of the blocks in one frame of a scene will also exist in the next frame, though possibly at different locations.

Both MPEG-1 and MPEG-2 use a group-of-pictures size of 15 frames, the first of which is the I-frame, or intraframe. This frame is much larger than the rest and does not use motion estimation; in 30-frames-per-second video such frames occur roughly every half second. P-frames (predictive frames) use motion estimation from the previous I- or P-frame and contain motion vectors plus error-correction information for each macroblock. B-frames (interpolated frames) are interpolated between the non-B-frames on either side: using two motion vectors per macroblock and interpolating between two reference frames decreases the size of the error-correction data, resulting in a smaller frame.
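Motion estimation typically reduces to searching for the best-matching block in the reference frame, with the sum of absolute differences (SAD) as the usual cost measure. Below is a brute-force sketch on toy arrays; real encoders work on much larger frames and use far faster search strategies.

```python
import numpy as np

def best_match(ref_frame, block, search=4):
    """Brute-force block matching: try every offset within +/- search pixels
    of the block's nominal position and keep the one with the smallest sum of
    absolute differences (SAD). The winning (dy, dx) is the motion vector."""
    h, w = block.shape
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = search + dy, search + dx          # candidate top-left corner
            sad = np.abs(ref_frame[y:y + h, x:x + w].astype(int)
                         - block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Toy example: an 8x8 object that moved 2 pixels down and 1 right between frames.
ref_frame = np.zeros((16, 16), dtype=np.uint8)
ref_frame[6:14, 5:13] = 200                   # bright object in the reference frame
block = ref_frame[6:14, 5:13].copy()          # the same object seen in the next frame
print(best_match(ref_frame, block))           # ((2, 1), 0): exact match found
```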
Another format under development, MPEG-7, will provide a standardized set of tools to facilitate the description of multimedia content. Unlike the previous MPEG standards, which focused on the coded representation of audio and visual data, the main focus here is the representation of information about the content, not the content itself. This will in turn provide a much-needed indexing standard, helping with the access and management of multimedia objects, which are growing out of control as more data streams are sent through the Internet and other networks. This volume and complexity is a major reason current indexing, searching, and retrieval algorithms fall short for multimedia resources. Among the objectives to be addressed are Descriptors, which provide feature information about a multimedia resource; Description Schemes, which define relationships between structures of Descriptors; and the Description Definition Language, which defines both Description Schemes and Descriptors. Together these will enable us to describe audiovisual resources regardless of storage, transmission, medium, or technology.
CONCLUSION
With the ever-growing volume of multimedia data being transmitted across private and public networks, it is clear that compression has become inseparable from the data itself. We constantly strive to send more data in a timely manner, even as network capacity struggles to keep up. In the past, text and simple graphics were the norm on both private and public networks; today that has drastically changed, and streaming video, music, and other high-end media are fast becoming the rule rather than the exception, saturating the infrastructure we have in place.

Because of the growing need to send more data faster, data compression is at the forefront of this endeavor. As new compression techniques are developed and implemented, we will see more and more data streams being transmitted, giving the world's networks a more productive foundation for the next generation of multimedia formats.
REFERENCES
S. Golomb, "Run-Length Encodings," IEEE Trans. Inform. Theory, vol. IT-12, July 1966.

W. Pennebaker and J. Mitchell, JPEG Still Image Compression Standard, Van Nostrand Reinhold, New York.

Larry L. Peterson and Bruce S. Davie, Computer Networks - A Systems Approach, ISBN 1-55860-368-9, 1996.

Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest, Introduction to Algorithms, 1990.

Ian H. Witten, Alistair Moffat, and Timothy C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images.

J.F.K. Buford, Multimedia Systems, ACM Press/Addison-Wesley, 1994.

J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable Rate Coding."

D.E. Knuth, "Dynamic Huffman Coding," Journal of Algorithms, June 1985, pp. 163-175.

P.G. Howard and J.S. Vitter, "New Methods for Lossless Image Compression Using Arithmetic Coding," Information Processing and Management 28 (1992).

X. Wu, W.K. Choi, and N.D. Memon, "Lossless Interframe Image Compression," 1998 Data Compression Conference, Mar. 1998.

JPEG home page: http://www.jpeg.org/public/jpeghomepage.htm

MPEG home page: http://www.cselt.it/mpeg/