The amount of data we deal with is getting larger. Not only do larger files require more disk space,...

Compression

Upload: isabel-green

Post on 31-Dec-2015


TRANSCRIPT

Page 1: The amount of data we deal with is getting larger  Not only do larger files require more disk space, they take longer to transmit  Many times files

Compression

Page 2:

Data Compression

The amount of data we deal with is getting larger

Not only do larger files require more disk space, they take longer to transmit

Files are often compressed to save space or to speed up transmission

Page 3:

Run Length Encoding

The simplest type of redundancy in a file is long runs of repeated characters:

AAAABBBAABBBBBCCCCCCCC

This string can be represented more compactly by replacing each run with a count followed by a single occurrence of the character:

4A3B2A5B8C

For binary files a refined version of this method can yield dramatic savings
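The run-length idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production encoder: the function name `rle_encode` is made up for this example, and it writes every run as count-then-character (so a single character becomes `1X`), which the slide does not specify.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with its count and the character."""
    # groupby yields one (character, run) pair per maximal run of equal characters.
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(rle_encode("AAAABBBAABBBBBCCCCCCCC"))  # 4A3B2A5B8C
```

Note that text with few repeated characters would grow under this scheme, which is why run-length encoding pays off mainly on data with long runs.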

Page 4:

Variable Length Encoding

Suppose we wish to encode

ABRACADABRA

Instead of using the standard 8 (or 16) bits to represent these letters, why not use 3?

A = 000
B = 001
C = 010
D = 011
R = 100

000 001 100 000 010 000 011 000 001 100 000
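As a quick check of the 3-bit scheme, the sketch below encodes ABRACADABRA with the table from the slide and compares the size against 8 bits per letter. The names `FIXED_CODE` and `encode` are illustrative only.

```python
# 3-bit code table taken from the slide.
FIXED_CODE = {"A": "000", "B": "001", "C": "010", "D": "011", "R": "100"}


def encode(text, code):
    """Look up each letter and join the codewords with blanks for readability."""
    return " ".join(code[ch] for ch in text)


encoded = encode("ABRACADABRA", FIXED_CODE)
bits = len(encoded.replace(" ", ""))  # count bits only, not the blanks

print(encoded)
print(bits, "bits, versus", 8 * len("ABRACADABRA"), "bits at 8 bits per letter")
```

With 11 letters this comes to 33 bits instead of 88.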

Page 5:

We Can Do Better

Why use the same number of bits for each letter?

A = 0
B = 1
C = 01
D = 10
R = 11

0 1 11 0 01 0 10 0 1 11 0

This is not really a code, because decoding depends on the blanks. Without them the bit string is ambiguous:

011100101001110
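To see why the blanks matter, the following sketch enumerates every way the blank-free bit string can be split into codewords of this table. The helper `all_decodings` is a made-up name for this illustration; it simply tries each codeword at the front of the string and recurses on the rest.

```python
# Variable-length table from the slide; some codewords are prefixes of others.
AMBIGUOUS_CODE = {"A": "0", "B": "1", "C": "01", "D": "10", "R": "11"}


def all_decodings(bits, code):
    """Return every letter string whose concatenated codewords equal `bits`."""
    if not bits:
        return [""]
    results = []
    for letter, word in code.items():
        if bits.startswith(word):
            rest = all_decodings(bits[len(word):], code)
            results.extend(letter + tail for tail in rest)
    return results


decodings = all_decodings("011100101001110", AMBIGUOUS_CODE)
print("ABRACADABRA" in decodings)      # the intended message is one reading
print("ABBBAABABAABBBA" in decodings)  # reading every bit as A or B is another
```

Both checks succeed, so a receiver without the blanks cannot tell which message was sent.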

Page 6:

Let's Use a Different Code

A slightly different code:

A = 1
B = 010
C = 000
D = 001
R = 011

Can you decode this without the blanks?

0001010
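One way to work through the question is to read the bits left to right and emit a letter as soon as the bits collected so far match a codeword. The sketch below does exactly that with the table from this slide; `decode` and `PREFIX_CODE` are illustrative names.

```python
# Code table from the slide.
PREFIX_CODE = {"A": "1", "B": "010", "C": "000", "D": "001", "R": "011"}


def decode(bits, code):
    """Greedy left-to-right decoding; correct when no codeword is a prefix of another."""
    lookup = {word: letter for letter, word in code.items()}
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # a complete codeword has been read
            letters.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(letters)


print(decode("0001010", PREFIX_CODE))  # CAB
```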

Page 7:

Let's Re-order

The same code with the letters re-ordered:

A = 1
C = 000
D = 001
B = 010
R = 011

Why can you decode without having the blanks?
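A property worth checking here is whether any codeword is the beginning of another codeword. The sketch below tests that property for this code and, for contrast, for the earlier table with A = 0 and B = 1; the function name `is_prefix_free` is an assumption of this example.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


reordered = {"A": "1", "C": "000", "D": "001", "B": "010", "R": "011"}
earlier = {"A": "0", "B": "1", "C": "01", "D": "10", "R": "11"}

print(is_prefix_free(reordered))  # True
print(is_prefix_free(earlier))    # False
```

When the property holds, the end of each codeword can be recognized as soon as it is read, so no separators are needed.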

Page 8:

Combining Bits

A (5) = 1
C (1) = 000
D (1) = 001
B (2) = 010
R (2) = 011

What do you notice about the number of bits used to represent each character?

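[Figure: binary code tree with branches labeled 0 and 1. From the root, branch 1 leads to leaf A; branch 0 leads to a subtree whose leaves are C (000), D (001), B (010) and R (011).]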
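The numbers in parentheses match how often each letter occurs in ABRACADABRA. As a rough check of what the code buys, the sketch below counts the letters and adds up the bits each one contributes; `CODE`, `counts` and `total_bits` are names chosen for this example.

```python
from collections import Counter

# Code table from the slide.
CODE = {"A": "1", "C": "000", "D": "001", "B": "010", "R": "011"}

counts = Counter("ABRACADABRA")
# Each letter costs (number of occurrences) x (length of its codeword) bits.
total_bits = sum(counts[ch] * len(CODE[ch]) for ch in counts)

for ch in CODE:
    print(ch, counts[ch], CODE[ch])
print(total_bits)  # 23
```

The total is 23 bits, compared with 33 bits for the 3-bit fixed-length code, because the most frequent letter has the shortest codeword.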

Page 9:

Huffman Coding

The general method for finding this code was developed by D. Huffman in 1952

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code

The most common source symbols use shorter strings of bits than less common source symbols

Used in many compression programs

Page 10:

How Does It Work?

Start with your text:

GO GO TIGERS

Build a frequency table

Character  Frequency
G          3
O          2
<SP>       2
T          1
I          1
E          1
R          1
S          1
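The frequency table can be reproduced with Python's `collections.Counter`. This is only a sketch of the counting step; printing the blank as `<SP>` mirrors the slide's notation.

```python
from collections import Counter

freq = Counter("GO GO TIGERS")

# most_common() lists entries from most to least frequent.
for ch, n in freq.most_common():
    print("<SP>" if ch == " " else ch, n)
```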

Page 11:

Build a Tree

Create a subtree from the two characters that appear least often

Merge them in the table into a single entry whose frequency is the sum of the two

Repeat until everything is merged into a single tree
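The merge steps above can be sketched with a priority queue standing in for the table. This is one possible implementation under stated assumptions, not the slides' own code: the name `huffman_code`, the use of `heapq`, the tie-breaking counter, and the choice of 0 for the left branch are all choices made for this example, so the exact codewords may differ from a hand-built tree even though the total length is the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Build a prefix code by repeatedly merging the two least frequent entries."""
    tiebreak = count()  # unique sequence number so the heap never compares nodes
    heap = [(n, next(tiebreak), ch) for ch, n in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a one-symbol text still needs a 1-bit codeword
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # least frequent entry
        n2, _, right = heapq.heappop(heap)  # next least frequent entry
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (left, right)))

    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left subtree, right subtree)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: a single character
            code[node] = prefix

    walk(heap[0][2], "")
    return code


text = "GO GO TIGERS"
code = huffman_code(text)
total_bits = sum(len(code[ch]) for ch in text)
print(code)
print(total_bits)  # 35
```

For this text the encoded length is 35 bits, one fewer than the 36 bits a 3-bit fixed-length code would need for its 8 distinct characters.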