Addressing Image Compression Techniques on Current Internet Technologies
Eduardo J. Moreira, Onyeka Ezenwoye
Florida International University
School of Computer Science
CIS 6931 Advanced Topics of Information Processing
Dr. Shu-Ching Chen
Spring 2001
TABLE OF CONTENTS
Introduction
Background
History of Data Compression
Compression Types
Run Length Encoding
Huffman Coding
Arithmetic Coding
JPEG Joint Photographic Experts Group
GIF Graphics Interchange Format
PNG Portable Network Graphic Format (Method 0)
MPEG Moving Picture Experts Group
Conclusion
References
INTRODUCTION
Data compression rests on the premise that when transmitting data, whether images, music, video, or anything else, one can reduce both the size of the data and the time needed to transmit it. With the ever-growing World Wide Web, it is easy to understand the increasing demand for higher transmission speeds and better overall network performance. People have grown accustomed to near-instantaneous gratification, so even the wait for an image to appear on the screen matters. This is one of the most important motivations for data compression, and this paper will focus on some of the most widely used formats and algorithms in the field. First, let us establish the basic definitions.
Packing data means storing data in a format that requires less space than usual; compressing data is the same as packing data. Data compression is the term most often used for packing data, particularly in data communications, where it is especially useful because it lets devices transmit and store the same information in fewer bits. A packed file is simply a file stored in such a compressed format. Many operating systems and applications contain commands that let you pack a file so that it takes up less storage. For example, suppose you have a text file containing ten consecutive space characters. Normally, this would require ten bytes of storage. A program that packs files, however, would replace the spaces with a special space-series character followed by the number of spaces being replaced. In this case, the ten spaces would require only two bytes. This is just one packing technique; there are many others.
Data compression is also widely used in backup utilities,
spreadsheet applications, and database management systems.
Certain types of data, such as bit-mapped graphics, can be
compressed to a small fraction of their normal size. Some
modems automatically pack data before transmitting it across
communications lines. This can produce faster communication
because fewer bytes need to be sent. However, the modem on
the receiving side must be capable of unpacking the data.
In this paper, we will look at the fundamentals of data
compression, while focusing mainly on its application in the
compression of images.
We will discuss the relative strengths and weaknesses of image formats and corresponding compression algorithms well suited to sharing simple graphical images across multiple hardware/operating-system platforms and distributed systems. The specific formats discussed are GIF, JPEG, and PNG; the compression schemes we focus on most are those behind JPEG, GIF, and PNG (method 0).
BACKGROUND
In the past, documents were stored on paper and kept in
filing cabinets. This can be seen to be very inefficient in
terms of storage space and also the time taken to locate and
retrieve information when required. Storing and accessing
documents electronically through computers is now replacing
this traditional method of storing documents. This has
enabled us to manage things more efficiently and
effectively, so that items can be located and information
extracted without undue expense or inconvenience.
In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to the device and decompress it when it is retrieved.
In terms of communications, compressing data at the sending
end and decompressing data at the receiving end can
effectively increase the bandwidth of a digital
communication link.
To put it simply:
“Data compression saves time and money”
Let us give an example of an everyday data compression application: facsimile, better known as fax. Exchanging faxes is one of the inevitable daily tasks of every business, but fax devices would not work without data compression algorithms. The image of your document is not sent over the phone line as is. The fax machine compresses the data before sending it to reduce the time needed to transmit the document, cutting the cost of transmission by a factor of ten or more. Without data compression, a 50-cent fax transmission could cost as much as $5.
Data compression has many uses in modern networking environments. Compressing large files can significantly reduce file transfer times. This might prove useful for retrieving database information within a company's internal network, or for downloading images on a web page. At any given time, the capacity of the Internet to transfer data is fixed; thus, if data can be compressed wherever possible, significant improvements in throughput can be achieved. Many files can also be combined into one compressed document, making them easier to send.
A BRIEF HISTORY OF DATA COMPRESSION
Data compression basically transforms data into a new representation that typically has less redundancy and is based on prediction using some deterministic (but possibly adaptive) model. If data can be predicted with high confidence, it can be compressed very well. The primary goal of any data compression algorithm is the identification and extraction of this source redundancy.
The late 1940s were the early years of information theory; the idea of developing efficient new coding methods was just starting to be fleshed out. Ideas of entropy, information content, and redundancy were explored. One popular notion held that if the probability of symbols in a message were known, there ought to be a way to code the symbols so that the message would take up less space.

The first well-known method for compressing digital signals is now known as Shannon-Fano coding. Shannon and Fano [1948] developed this algorithm at about the same time; it assigns binary code words to the unique symbols that appear within a given data file. While Shannon-Fano coding was a great leap forward, it had the unfortunate luck to be quickly superseded by an even more efficient coding system: Huffman coding.

Huffman coding [1952] shares most characteristics of Shannon-Fano coding. Huffman coding achieves effective data compression by reducing the amount of redundancy in the coding of symbols, and it has been proven to produce the shortest possible average code length among methods that assign a whole number of bits to each symbol.
In the last fifteen years, Huffman coding has been replaced
by arithmetic coding. Arithmetic coding bypasses the idea of
replacing an input symbol with a specific code. It replaces
a stream of input symbols with a single floating-point
output number. More bits are needed in the output number for
longer, complex messages.
Dictionary-based compression algorithms use a completely
different method to compress data. They encode variable-
length strings of symbols as single tokens. The token forms
an index to a phrase dictionary. If the tokens are smaller
than the phrases, they replace the phrases and compression
occurs. Two dictionary-based compression techniques called
LZ77 and LZ78 have been developed. LZ77 is a "sliding
window" technique in which the dictionary consists of a set
of fixed-length phrases found in a "window" into the
previously seen text. LZ78 takes a completely different
approach to building a dictionary. Instead of using fixed-
length phrases from a window into the text, LZ78 builds
phrases up one symbol at a time, adding a new symbol to an
existing phrase when a match occurs.
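To make the phrase-building idea concrete, here is a toy LZ78-style encoder in Python. It is a simplified sketch, not the full specification: output framing and dictionary limits are ignored, but it shows how phrases grow one symbol at a time as matches occur.

```python
def lz78_encode(data: str):
    """Toy LZ78-style encoder: emits (phrase_index, next_symbol) pairs.
    Index 0 means 'no prefix phrase'. Dictionary growth is unbounded here."""
    dictionary = {}          # phrase -> index
    phrase = ""
    output = []
    for symbol in data:
        candidate = phrase + symbol
        if candidate in dictionary:
            phrase = candidate                     # keep extending the current match
        else:
            output.append((dictionary.get(phrase, 0), symbol))
            dictionary[candidate] = len(dictionary) + 1   # add the new phrase
            phrase = ""
    if phrase:                                     # flush any pending phrase
        output.append((dictionary.get(phrase, 0), ""))
    return output

print(lz78_encode("ABABABA"))   # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
```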
These various compression methods will be covered in more
detail in the following sections. Now we’ll discuss the
issue of compression types.
COMPRESSION TYPES
Data compression can be divided into two main types,
lossless and lossy compression.
Lossless compression recovers the exact original data after decompression. It is used mainly for compressing database records, spreadsheets, word processing files, and program files, where exact replication of the original is essential. Examples of lossless techniques include run-length and Huffman codes.
Lossy compression will result in a certain loss of accuracy
in exchange for a substantial increase in compression. Lossy
compression is more effective when used to compress graphic
images and digitized voice where losses outside visual or
aural perception can be tolerated. Most lossy compression
techniques can be adjusted to different quality levels,
gaining higher accuracy in exchange for less effective
compression. Examples of lossy methods are JPEG and MPEG.
RUN LENGTH ENCODING
This lossless compression algorithm uses a quite simple
technique that can achieve compression ratios as good as 8
to 1. The whole idea is based on replacing multiple
occurrences of a symbol with only one copy and a count of
how many times that symbol appears. For example if the
string AAABBBCCCCCCCCCDDDDDEEEEFFFFGGGGG appears in a file
then it will be encoded as 3A3B9C5D4E4F5G.
This coding mechanism can also be used to compress digital images by comparing adjacent pixels and storing only the changes. The run-length encoding technique is very effective when encoding images that contain large amounts of white space or large homogeneous areas. However, because each run is represented with two bytes (a count and a symbol), it does not work well on files in which even a small degree of variation exists among adjacent pixels; in those cases the algorithm can actually increase the file size. This compression technique is used frequently for fax transmissions, which usually contain large amounts of white space that can be removed.
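As a concrete illustration, here is a minimal sketch of this count-plus-symbol scheme in Python. It assumes, as in the example string above, that no run is longer than nine symbols, so each count fits in a single character; it also makes the weakness obvious, since a string with no repeats doubles in size.

```python
from itertools import groupby

def run_length_encode(text: str) -> str:
    """Replace each run of a repeated symbol with '<count><symbol>'.
    Assumes runs are shorter than 10, so each count is one character."""
    return "".join(f"{len(list(group))}{symbol}" for symbol, group in groupby(text))

print(run_length_encode("AAABBBCCCCCCCCCDDDDDEEEEFFFFGGGGG"))  # -> 3A3B9C5D4E4F5G
print(run_length_encode("ABCD"))                               # -> 1A1B1C1D (expands!)
```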
HUFFMAN CODING
Huffman coding is one of the most important early developments in the field of data compression. It uses a simple strategy to achieve, on average, about 25 percent savings, and savings of 50 to 60 percent are possible in many instances when compressing large files. The algorithm creates a variable-length character code, mapping the characters that appear most frequently in a file to the code words with the smallest lengths.
Code words are represented as variable-length binary strings, each mapped to a different character found in the file. A frequency distribution of the characters in the file being compressed must first be created; this distribution is then used to decide which code word is assigned to each symbol. Consider how this concept applies when compressing a file composed of printable ASCII characters. The printable ASCII character set consists of approximately 100 characters, so uniquely representing each one with fixed-length strings requires ceiling(log2 100) = 7 bits. This representation takes no account of how frequently each character appears in the file, so it cannot achieve the desired space savings. Let us look at a simple example of how taking character frequencies into account allows us to save space. Suppose we are trying to compress a file composed of 6 distinct characters with the following distribution: character c appears 100,000 times, d appears 30,000 times, y appears 5,000 times, t appears 1,000 times, r appears 50 times, and z appears 25 times. We can then assign the code word with the shortest length to c, since c appears with the highest frequency. The following code words could be assigned to the file characters:
USING HUFFMAN CODES
Character   Codeword   Bits required = (codeword length) * (frequency in file)
---------------------------------------------------------------
c           0          1 * 100,000 = 100,000
d           101        3 * 30,000  =  90,000
y           100        3 * 5,000   =  15,000
t           111        3 * 1,000   =   3,000
r           1101       4 * 50      =     200
z           1100       4 * 25      =     100
---------------------------------------------------------------
Total bits required:                 208,300
USING FIXED-SIZE STRINGS
Character   Codeword   Bits required = (codeword length) * (frequency in file)
---------------------------------------------------------------
c           000        3 * 100,000 = 300,000
d           001        3 * 30,000  =  90,000
y           010        3 * 5,000   =  15,000
t           011        3 * 1,000   =   3,000
r           100        3 * 50      =     150
z           101        3 * 25      =      75
---------------------------------------------------------------
Total bits required:                 408,225
By using Huffman codes, we save 199,925 bits compared with fixed-size strings. Note that if all characters occurred with the same frequency in a file, Huffman codes would provide no savings.
After considering the savings that Huffman coding can achieve, the question arises: how do we build the required code? The classic scheme starts from the frequency distribution table of the characters appearing in the file and repeatedly merges the two nodes with the lowest frequencies into a new parent node, until a single tree remains. A full binary tree is thus used to represent the optimal code for a file, with data only at the leaves. A zero indicates the left branch and a one the right branch. Characters with the highest frequencies end up closer to the root of the tree than characters with lower frequencies. Once the code words have been defined for each character, encoding the file is simply a matter of concatenating the code words of the characters in the file. Decoding is then a matter of identifying the code word associated with each character and repeating the process over the entire encoded file.
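The construction just described can be sketched in a few lines of Python using a priority queue. The frequencies below are those of the earlier example; note that the code words the algorithm produces may differ from the illustrative table, since that table shows only one possible prefix code, and for this very skewed distribution the true Huffman code is in fact shorter still.

```python
import heapq

def huffman_codes(freqs: dict) -> dict:
    """Build Huffman code words from a {symbol: frequency} table by
    repeatedly merging the two least frequent nodes."""
    # Each heap entry: (frequency, tie_breaker, {symbol: codeword_so_far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)                       # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}          # 0 = left branch
        merged.update({s: "1" + c for s, c in right.items()})   # 1 = right branch
        heapq.heappush(heap, (f1 + f2, count, merged))          # merged node goes back in
        count += 1
    return heap[0][2]

freqs = {"c": 100000, "d": 30000, "y": 5000, "t": 1000, "r": 50, "z": 25}
codes = huffman_codes(freqs)
total_bits = sum(len(code) * freqs[sym] for sym, code in codes.items())
print(codes, total_bits)   # for this distribution the total comes out below 208,300
```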
ARITHMETIC CODING
Arithmetic coding encodes symbols using a non-integer number of bits per symbol. The entire input is represented as a single real number R within a sub-interval of [0, 1) determined by the message (for example, a particular message might map to the interval from 0.37 to 0.46); the larger the input, the more precision is needed for R.

One of the main goals of lossless compression methods is the decomposition of data into a sequence of events; these events must then be encoded in as few bits as possible without losing any information. We want to give shorter code words to events that are more likely, so the data compresses well whenever the more probable events occur. If the distinct events ev1, ev2, ..., evn occur with probability distribution P, the smallest possible expected number of bits needed to encode an event is given by the entropy H(P).
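A minimal sketch of this interval-narrowing idea in Python; the two-symbol alphabet and its probabilities are invented for illustration, and a real coder would add renormalization and a termination rule, which are omitted here. More probable symbols shrink the interval less, so they cost fewer bits.

```python
def arithmetic_interval(message, probs):
    """Narrow [low, high) for each symbol; any number inside the final
    interval identifies the whole message."""
    # Cumulative probability ranges, e.g. {'a': (0.0, 0.6), 'b': (0.6, 1.0)}
    ranges, start = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        lo_frac, hi_frac = ranges[sym]
        low, high = low + span * lo_frac, low + span * hi_frac
    return low, high

print(arithmetic_interval("aab", {"a": 0.6, "b": 0.4}))   # -> (0.216, 0.36)
```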
JPEG – JOINT PHOTOGRAPHIC EXPERTS GROUP
JPEG was developed by the Joint Photographic Experts Group committee to compress gray-scale or color images. The format is well suited to real-world photographs and scenic depictions of nature, but it has not been shown to work well with line drawings, cartoon animation, and similar artwork. The algorithm used to compress and decompress the image makes this a "lossy" compression: the result is of somewhat lower quality than the original. It was designed to exploit the limitations of the human eye's perception of minor changes in color and brightness; that is, it was designed to be viewed by the human eye, not analyzed by machines, which would pick up the changes in a compressed image as small errors and could cause problems if used with image-comparison algorithms. Nevertheless, the amount of compression that can be accomplished by this lossy method is far greater than with lossless methods. One reason JPEG images are well liked is the flexibility one has, by changing compression parameters, to create smaller lower-quality images or larger higher-quality ones. This makes the format useful across a broad range of real-world applications. For example, if the main objective is to make the image as small as possible for transfer through a network, then you simply ask yourself "What is the lowest quality we can accept?" and adjust accordingly.
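As a concrete illustration, most image libraries expose this trade-off as a single quality setting. The sketch below uses the Pillow library; the filenames and quality values are arbitrary examples.

```python
from PIL import Image   # Pillow

img = Image.open("photo.png")              # hypothetical source image
for quality in (90, 50, 20):
    # The same image saved at decreasing JPEG quality settings;
    # file size drops sharply as quality (and fidelity) is reduced.
    img.convert("RGB").save(f"photo_q{quality}.jpg", quality=quality)
```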
Another important aspect of JPEG is that one can trade decoding speed against image quality, by using fast but inexact approximations instead of exact calculations during decompression.

There are several good reasons to use JPEG. The first is, of course, the compression advantage, along with the ability to store 24-bit color per pixel instead of the usual 8-bit color. Moving a 100-KByte image file across a network instead of a 2-MByte full-color file makes a big difference in disk usage and transmission time. There is a drawback in the time it takes to decode and view a JPEG compared with a different format such as GIF; however, the savings in network transmission time is generally greater than the extra time taken to decompress the file. The ability to store 24 bits per pixel (16 million colors) also makes JPEG a great choice for high-quality, high-definition photographs, keeping in mind that no image format can ever duplicate exactly what the human eye sees.
The best-known lossless compression methods achieve about 2:1 compression on average; with JPEG we can achieve 10:1 to 20:1 compression without visible loss, and 30:1 to 50:1 with small to moderate loss of quality. For very low-quality purposes such as previews, one could go as far as a 100:1 ratio. Gray-scale images do not compress by as large a factor as color images do. The human eye is more sensitive to brightness variations than to hue variations, so JPEG can compress the hue data much more heavily than the brightness data. As a result, a compressed gray-scale image is usually only 10-20 percent smaller than a comparable color image, even though the uncompressed gray-scale data is only eight bits per pixel, roughly 30% of the size of a full-color image of the same quality.
Let us now consider some of the different compression modes that are all known as JPEG. The first is baseline JPEG, in which the image is stored as a single top-to-bottom scan. Progressive JPEG, by contrast, divides the image file into a series of scans: the first scan shows the image at a very low quality setting and takes little space, and each subsequent scan delivers a higher-quality rendering. The total storage requirement is roughly the same as baseline, but the viewer sees an approximation of the image much sooner, which makes this mode ideal for transferring images across the World Wide Web over slow modem connections.
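Progressive output is usually just another save-time option; with Pillow, for instance, it can be requested directly (the filenames are again illustrative).

```python
from PIL import Image

img = Image.open("photo.jpg")
# progressive=True writes the multi-scan (progressive) JPEG described above;
# omitting it produces an ordinary baseline, top-to-bottom scan.
img.save("photo_progressive.jpg", progressive=True, quality=75)
```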
GIF – GRAPHICS INTERCHANGE FORMAT
GIF was created as CompuServe's standard for graphic image data exchange. There are two versions (GIF standards), GIF87a and GIF89a, with only minor differences between them. These days GIF is becoming less widely used, largely because of the limited number of colors that can be used to represent image data: an image cannot have a bit depth of more than eight bits per pixel, meaning that a maximum of 256 separate colors can be used in a single GIF image. Sometimes more than one image is stored within a GIF file; this is called an animation, and it is accomplished by writing the images to the screen one after the other with some timing interval between them. This animation method is still widely used throughout the web.
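For illustration, assembling such a multi-image GIF takes only a few lines with a library such as Pillow; the frame files and timing below are hypothetical.

```python
from PIL import Image

frames = [Image.open(f"frame{i}.png").convert("P") for i in range(4)]
# save_all stores every frame in one GIF; duration is the delay per frame
# in milliseconds, and loop=0 means repeat forever.
frames[0].save("animation.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)
```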
Image data within a GIF is compressed using a modified LZW (Lempel-Ziv-Welch) algorithm.
LZW works with indexed images of 256 colors or fewer. It performs best when the image being compressed is composed largely of solid colors across wide portions of the image. Other methods record the actual color value of each pixel; this method records one pixel's color value and then groups together the following pixels that share the same value. GIFs can also be interlaced, allowing progressively clearer images to appear while downloading in a web browser; this lets users start seeing the image without waiting for it to be downloaded completely. The algorithm is a lossless one: the quality of the image is in no way compromised by the compression, and the original data is restored exactly when the file is opened. If you specify fewer than the full 256 colors for an image, you can reduce its file size; when indexing an image you should choose the lowest number of colors that still provides the necessary quality, so that the file is as small as it needs to be.
If we use the enhanced GIF89a format, we can specify one color to be transparent. This allows GIF images to appear with irregular shapes instead of the usual rectangle; it is done by making the border color transparent, giving the image a cut-out effect. Keep in mind that if you choose a transparent color that also appears inside the image, you can distort it.
PNG – PORTABLE NETWORK GRAPHIC FORMAT (method 0)
Currently, method 0 is the only compression method defined for PNG. It is a variant of the Lempel-Ziv 77 (LZ77) algorithm, very similar to the one used by "zip" programs such as WinZip and PKZIP. Because of this, we get the benefit of 15-35 percent higher compression. The algorithm is lossless and works well with PNG's true-color, palette, and grayscale color modes. Comparing JPEG and PNG on true-color and grayscale images, JPEG generally gives better compression performance, and therefore smaller files and faster transmission, while PNG (method 0) offers higher image quality. For all intents and purposes, both JPEG and PNG method 0 appear to have a good future on the ever-growing and changing World Wide Web.
PNG also uses various filtering techniques, which are similar to compression but distinct from it. Filtering is applied to the bytes of data before any compression takes place, and in turn prepares the data for optimal compression. This pre-compression filtering has been shown to work best when applied to true-color images, though each individual image ultimately determines the outcome: one needs to test different filters to see which works best for a particular image.
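To see what such a filter does, here is a sketch of PNG's simplest non-trivial filter, the "Sub" filter (type 1), which replaces each byte of a scanline with its difference from the corresponding byte one pixel to the left. Flat or smoothly varying regions turn into long runs of small values, which the LZ77-style compressor then squeezes very effectively. This is only one of PNG's five filter types, and the scanline below is an invented example.

```python
def sub_filter(scanline: bytes, bytes_per_pixel: int = 3) -> bytes:
    """PNG 'Sub' filter: each byte minus the corresponding byte of the
    previous pixel (modulo 256); the first pixel is left unchanged."""
    out = bytearray()
    for i, b in enumerate(scanline):
        left = scanline[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        out.append((b - left) % 256)
    return bytes(out)

# A smooth red gradient: raw bytes keep changing, filtered bytes are nearly constant.
scanline = bytes([10, 0, 0, 12, 0, 0, 14, 0, 0, 16, 0, 0])
print(list(sub_filter(scanline)))   # [10, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0]
```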
In conclusion, PNG method 0 supports 24-bit color, provides a good transparency scheme, and gives us excellent gamma and interlacing support. These and other strengths, we believe, will give this format lasting staying power.
MPEG – MOVING PICTURE EXPERTS GROUP
MPEG has become one of the most popular video compression formats in use today. It combines several compression techniques, which are discussed in detail below.

MPEG uses three main techniques to achieve compression ratios of up to 30:1. The first is a color conversion, which transforms a 24-bit RGB picture element into a 16- or 12-bit signal. RGB is composed of three colors, red, green, and blue, each encoded with 8 bits giving 256 levels. The original image resolution is kept for luminance, while the chrominance (red- and blue-difference) components are reduced; this exploits the fact that the human eye can detect finer detail in contrast (brightness) than in color.
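As a sketch of what this conversion does for a single pixel, the snippet below uses the commonly quoted ITU-R BT.601 weights; a real encoder would also apply offsets, clipping, and chroma subsampling across whole blocks, all of which are omitted here.

```python
def rgb_to_ycbcr(r: float, g: float, b: float):
    """Convert one RGB pixel (0-255 per channel) to luminance and two chrominance values."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b   # brightness, kept at full resolution
    cb = 0.564 * (b - y)                     # blue-difference chrominance
    cr = 0.713 * (r - y)                     # red-difference chrominance
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))   # pure red: most of the signal lands in Y and Cr
```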
The second technique is the quantized Discrete Cosine Transform (DCT). The transform converts the color and intensity values of a block into an equal number of cosine-frequency coefficients, concentrating the most significant variations in the block toward the upper left-hand corner. A predetermined quantization matrix, whose weights for the different frequencies are governed by their impact on human perception, is then used to divide the DCT matrix element by element. The resulting quantized matrix usually has zeroes in all but a few elements, and the non-zero elements are concentrated in the upper-left corner of the (usually 8x8) block. This block is then fed into a run-length and Huffman encoding stage, which reduces the 128-byte block to the non-zero coefficients, each followed by a count of the zero coefficients before the next significant coefficient, with the sequence padded to its full length with zeros. This step yields significant compression ratios and results in variable-size intraframes.
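The quantization step itself is just an element-wise divide-and-round. The sketch below uses an invented 4x4 coefficient block and quantization matrix rather than the standard 8x8 tables, purely to show how the high-frequency entries collapse to zero.

```python
def quantize(dct_block, q_matrix):
    """Divide each DCT coefficient by its quantization weight and round;
    most high-frequency entries (toward the lower right) collapse to zero."""
    return [[round(c / q) for c, q in zip(row_c, row_q)]
            for row_c, row_q in zip(dct_block, q_matrix)]

# Toy 4x4 example: large low-frequency coefficients in the upper left,
# small high-frequency coefficients elsewhere.
dct_block = [[620, 110, 12, 3],
             [ 95,  40,  5, 1],
             [ 10,   6,  2, 0],
             [  3,   1,  0, 0]]
q_matrix  = [[ 16,  24, 40, 60],
             [ 24,  40, 60, 80],
             [ 40,  60, 80, 99],
             [ 60,  80, 99, 99]]
print(quantize(dct_block, q_matrix))   # only a few non-zero values survive
```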
The third technique is motion estimation and error coding. We assume that most of the blocks in one frame of a scene will also exist in the next frame, though possibly at different locations.

Both MPEG-1 and MPEG-2 use a group-of-pictures size of 15 frames, the first of which is the I-frame, or intraframe. This frame is much larger than the rest and does not use motion estimation; in 30-frames-per-second video such frames occur roughly every half second. P-frames (predictive frames) use motion estimation from the previous I- or P-frame and contain motion vectors plus error-correction information for each macroblock. B-frames (interpolated frames) are interpolated between the non-B-frames on either side: using two motion vectors per macroblock and interpolating between two reference frames decreases the size of the error-correction data, resulting in a smaller frame.
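Motion estimation typically reduces to searching for the best-matching block in the reference frame, with the sum of absolute differences (SAD) as the usual cost measure. Below is a brute-force sketch on toy arrays; real encoders work on much larger frames and use far faster search strategies.

```python
import numpy as np

def best_match(ref_frame, block, search=4):
    """Brute-force block matching: try every offset within +/- search pixels
    of the block's nominal position and keep the one with the smallest sum of
    absolute differences (SAD). The winning (dy, dx) is the motion vector."""
    h, w = block.shape
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = search + dy, search + dx          # candidate top-left corner
            sad = np.abs(ref_frame[y:y + h, x:x + w].astype(int)
                         - block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Toy example: an 8x8 object that moved 2 pixels down and 1 right between frames.
ref_frame = np.zeros((16, 16), dtype=np.uint8)
ref_frame[6:14, 5:13] = 200                   # bright object in the reference frame
block = ref_frame[6:14, 5:13].copy()          # the same object seen in the next frame
print(best_match(ref_frame, block))           # ((2, 1), 0): exact match found
```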
Another format under development, MPEG-7, will provide a standardized set of tools to facilitate the description of multimedia content. Unlike the previous MPEG standards, which focused on the coded representation of audio and visual data, the main focus here is the representation of information about the content, not the content itself. This will in turn provide a much-needed indexing standard, helping with the access and management of multimedia objects, which are growing out of control as more data streams are sent through the Internet and other networks. This volume and complexity is a major reason current indexing, searching, and retrieval algorithms fall short for multimedia resources. Among the objectives to be addressed are Descriptors, which provide feature information about a multimedia resource; Description Schemes, which define relationships between structures of Descriptors; and the Description Definition Language, which defines both Description Schemes and Descriptors. Together these will enable us to describe audiovisual resources regardless of storage, transmission, medium, or technology.
CONCLUSION
With the ever-growing volume of multimedia data being transmitted across private and public networks, it is clear that compression has become inseparable from the data itself. We constantly strive to send more data in a timely manner, even as network capacity struggles to keep up. In the past, text and simple graphics were the norm on both private and public networks; today that has drastically changed, and streaming video, music, and other high-end media are fast becoming the rule rather than the exception, saturating the infrastructure we have in place.

Because of the growing need to send more data faster, data compression is at the forefront of this endeavor. As new compression techniques are developed and implemented, we will see more and more data streams being transmitted, giving the world's networks a more productive foundation for the next generation of multimedia formats.
REFERENCES
S. Golomb, "Run-Length Encodings," IEEE Trans. Inform. Theory, vol. IT-12, July 1966.

W. Pennebaker and J. Mitchell, JPEG Still Image Compression Standard, Van Nostrand Reinhold, New York.

Larry L. Peterson and Bruce S. Davie, Computer Networks - A Systems Approach, ISBN 1-55860-368-9, 1996.

Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest, Introduction to Algorithms, 1990.

Ian H. Witten, Alistair Moffat, and Timothy C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images.

J.F.K. Buford, Multimedia Systems, ACM Press/Addison-Wesley, 1994.

J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable Rate Coding."

D.E. Knuth, "Dynamic Huffman Coding," Journal of Algorithms, June 1985, pp. 163-175.

P.G. Howard and J.S. Vitter, "New Methods for Lossless Image Compression Using Arithmetic Coding," Information Processing and Management 28 (1992).

X. Wu, W.K. Choi, and N.D. Memon, "Lossless Interframe Image Compression," 1998 Data Compression Conference, Mar. 1998.

JPEG home page: http://www.jpeg.org/public/jpeghomepage.htm

MPEG home page: http://www.cselt.it/mpeg/