CompSci 201
Huffman Coding and More
Owen Astrachan
Jeff Forbes
December 1, 2018
12/1/17 CompSci 201, Fall 2017, Huff and More 1

W is for …
• World Wide Web
  • www.totallyfun.org
• Wiki
  • Every person’s way to share
• Wifi
  • We need this everyday
• Windows
  • From OS to …

PFTD
• Review of Huffman Compression
  • Key aspects of compression at a high level
  • Decompression with more detail
• Looking at bits and information
  • What's in a .jpeg compared to .mp3
• WOTO and finishing the semester in 201

A tale of two disks
• 10 MB for $2,990 in 1981
• 64 GB for $29.90 in 2018

Lossy v Lossless Compression
• RAW format compared to JPEG format
  • Tradeoffs – another example of "it depends"
• Why do you ZIP files/folders?
  • Upload to Dropbox/Google Drive
• What are advantages of MP3?
  • You were 0-3 years old

Huffman is Optimal
• We create an encoding for each 8-bit character
  • Can’t do better than this on a per-character basis
• Normally ‘A’ is 65 and ‘Z’ is 90 (ASCII/Unicode)
  • A is 01000001 and Z is 01011010
  • Why does this make sense? 8- or 16-bit/char
  • Why doesn’t this make sense?
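
The fixed-width codes quoted on this slide can be reproduced in a few lines. This is only a sketch; the class and method names (`AsciiBitsSketch`, `toEightBits`) are made up for illustration and are not part of the course code.

```java
class AsciiBitsSketch {
    // Render the low 8 bits of a character as a fixed-width binary string.
    static String toEightBits(char c) {
        String bits = Integer.toBinaryString(c & 0xFF);
        // Left-pad with zeros so every character uses exactly 8 bits.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println("A = " + (int) 'A' + " = " + toEightBits('A')); // A = 65 = 01000001
        System.out.println("Z = " + (int) 'Z' + " = " + toEightBits('Z')); // Z = 90 = 01011010
    }
}
```

Every character costs the same 8 bits here regardless of how often it occurs, which is the inefficiency Huffman coding targets.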

Leveraging Redundancy
• If there are 1,000 “A” and 10 “Z” characters …
  • Use fewer bits for “A” and more bits for “Z”
• Huffman treats all A’s equally, no context
  • A as first letter in a file is the same as last letter
• Other compression techniques can do better
  • Faster and better compression, more complex

Summary of Huff Compress
• Count how many times every character occurs
  • Character is 8-bit “chunk”, use .readBits(8)
• Create a Huffman Trie/Tree using a greedy algorithm with a priority queue (PQ)
  • Infrequent chars are far away from root
  • Frequent chars are close to root
• Create encodings from trie to write compressed file
  • Reset/reread file, look up encoding, write out
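
The first step, counting each 8-bit chunk, might look like the sketch below. It assumes the input is already in memory as a `byte[]` rather than being read with the course's `.readBits(8)`; `FrequencyCountSketch` and `countFrequencies` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class FrequencyCountSketch {
    // Count how often each 8-bit value (0..255) occurs in the data.
    static int[] countFrequencies(byte[] data) {
        int[] frequencies = new int[256];
        for (byte b : data) {
            frequencies[b & 0xFF]++; // mask so bytes above 127 index correctly
        }
        return frequencies;
    }

    public static void main(String[] args) {
        byte[] data = "abracadabra".getBytes(StandardCharsets.US_ASCII);
        int[] freq = countFrequencies(data);
        System.out.println("a=" + freq['a'] + " b=" + freq['b'] + " c=" + freq['c']); // a=5 b=2 c=1
    }
}
```

The resulting array is the `frequencies` input that the greedy trie construction consumes.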

Starting to code… what first?
• If you write compress first, how to test?
  • The bits written aren't "readable" as is
  • Shadow-print .writeBits with .println?
• If you write decompress first, how to test?
  • Until you've got a compressed file …
  • We'll provide several compressed files!

Huffman Trie/Tree
[Figure: Huffman trie for a 60-character message. Each leaf is labeled with a character and its count (for example SPACE 11, I 5, N 5, A 4, M 4, G 3, O 3); interior nodes are labeled with the sum of their children, up to 60 at the root.]

From counts to trie via PQ
• After counting the number of occurrences of every 8-bit chunk, create the trie
  • Greedy approach: frequencies[x] = # x's in file

    PriorityQueue<HuffNode> forest = new PriorityQueue<>();
    for (int i = 0; i < 256; i++)
        if (frequencies[i] > 0) // computed elsewhere
            forest.add(new HuffNode(i, frequencies[i]));

    // Greedy step: repeatedly merge the two lowest-weight trees.
    while (forest.size() > 1) {
        HuffNode left = forest.remove();
        HuffNode right = forest.remove();
        forest.add(new HuffNode(-1, left.weight() + right.weight(),
                                left, right));
    }
    HuffNode root = forest.remove();
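
A self-contained variant of the loop above is sketched here so it can be run on its own. `Node` is a hypothetical stand-in for the course's `HuffNode` (whose real constructor and comparison may differ), and `encodedBits` is an added helper that sums weight times depth over the leaves.

```java
import java.util.PriorityQueue;

class GreedyTrieSketch {
    // Hypothetical stand-in for HuffNode: value is -1 for interior nodes.
    static final class Node implements Comparable<Node> {
        final int value, weight;
        final Node left, right;
        Node(int value, int weight, Node left, Node right) {
            this.value = value; this.weight = weight;
            this.left = left; this.right = right;
        }
        @Override public int compareTo(Node other) {
            return Integer.compare(weight, other.weight); // lightest first
        }
    }

    // Repeatedly merge the two lightest trees until a single trie remains.
    static Node build(int[] frequencies) {
        PriorityQueue<Node> forest = new PriorityQueue<>();
        for (int i = 0; i < frequencies.length; i++) {
            if (frequencies[i] > 0) forest.add(new Node(i, frequencies[i], null, null));
        }
        if (forest.isEmpty()) return null; // empty input: no trie
        while (forest.size() > 1) {
            Node left = forest.remove();
            Node right = forest.remove();
            forest.add(new Node(-1, left.weight + right.weight, left, right));
        }
        return forest.remove();
    }

    // Total bits for the encoded data: each leaf contributes weight * depth.
    static int encodedBits(Node node, int depth) {
        if (node == null) return 0;
        if (node.left == null && node.right == null) return node.weight * depth;
        return encodedBits(node.left, depth + 1) + encodedBits(node.right, depth + 1);
    }

    public static void main(String[] args) {
        int[] frequencies = new int[256];
        for (char c : "aaaaabbc".toCharArray()) frequencies[c]++;
        Node root = build(frequencies);
        // prints "8 characters, 11 bits"
        System.out.println(root.weight + " characters, " + encodedBits(root, 0) + " bits");
    }
}
```

With counts a=5, b=2, c=1 the two rarest characters are merged first, so `a` ends up one step from the root and `b` and `c` two steps away.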

Trie used to encode & decode
• Compress: create encodings for each char/leaf
  • Similar to LeafTrails APT
  • Each 8-bit chunk/char mapped to encoding, e.g., in an array with codings[‘A’] == “010101”
• Decompress: Use trie/tree to decompress bits
  • Trie unique to each file, part of compressed file
  • Compress: write trie, Decompress: read trie
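
One way to fill a `codings` array is a recursive walk that records the root-to-leaf path, sketched below. The `Node` class and the tiny hand-built trie are assumptions made for the demo, not the course's `HuffNode` or a trie from the slides.

```java
class EncodingSketch {
    // Hypothetical node type; value is the 8-bit chunk at a leaf, -1 otherwise.
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    // The path of 0s (left) and 1s (right) from the root to a leaf is that leaf's encoding.
    static void fillCodings(Node node, String path, String[] codings) {
        if (node.isLeaf()) {
            codings[node.value] = path;
            return;
        }
        fillCodings(node.left, path + "0", codings);
        fillCodings(node.right, path + "1", codings);
    }

    public static void main(String[] args) {
        // Hand-built trie: 'a' hangs off the root, 'b' and 'c' sit one level deeper.
        Node root = new Node(-1,
                new Node('a', null, null),
                new Node(-1, new Node('b', null, null), new Node('c', null, null)));
        String[] codings = new String[256];
        fillCodings(root, "", codings);
        System.out.println(codings['a'] + " " + codings['b'] + " " + codings['c']); // prints "0 10 11"
    }
}
```

Entries for chunks that never occur stay `null`, which is fine because compress only looks up chunks it actually read.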

Compressing kjv10.txt
Encoding length    # values with this length
3                  1,159,124
4                  1,487,471
5                  712,325
6                  485,333
7                  261,611
8                  84,107
9                  81,467
10                 48,019
11                 21,065
12                 1,863
13                 1,108
14                 664
15                 476
16                 225
17                 71
18                 44
19                 22
20                 11
21                 3
22                 6
23                 6

Uncompression with Huffman
• We need the trie to uncompress
  • 000100100010011001101111
• As we read a bit, what do we do?
  • Go left on 0, go right on 1
  • When do we stop? What to do?
• How do we get the trie?
  • Could store 256 counts, use same code
  • Could store trie: read and write
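
The "go left on 0, go right on 1" walk can be sketched as follows. Bits are modeled as a `String` of '0' and '1' characters, the `Node` type and sample trie are hypothetical, and the loop simply stops when the bits run out; how a real compressed file signals the end is the question the slide leaves open.

```java
class DecodeSketch {
    // Hypothetical node type standing in for the course's HuffNode.
    static final class Node {
        final int value;          // character at a leaf, -1 for interior nodes
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Follow one branch per bit; on reaching a leaf, emit its character and restart at the root.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node current = root;
        for (int i = 0; i < bits.length(); i++) {
            current = (bits.charAt(i) == '0') ? current.left : current.right;
            if (current.isLeaf()) {
                out.append((char) current.value);
                current = root;
            }
        }
        return out.toString();
    }

    // Small hand-built trie with codes a=0, b=10, c=11 (chosen only for this demo).
    static Node sampleTrie() {
        return new Node(-1,
                new Node('a', null, null),
                new Node(-1, new Node('b', null, null), new Node('c', null, null)));
    }

    public static void main(String[] args) {
        System.out.println(decode(sampleTrie(), "010110")); // prints "abca"
    }
}
```

No separators are needed between codes because no code is a prefix of another: every code ends exactly at a leaf.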

Reading and Writing Huff Trie
• Similar to concept/techniques in Tree APTs
• Distinguish interior and leaf nodes
  • In huff we label with 0 and 1 respectively
  • In Tree APT we store "null" explicitly
• 8 4 x 6 x x 12 10 x x 15 x x
  • Number? Read two subtrees
  • x? Return null, no recursion
  • 8 [4 x 6 x x] [12 10 x x 15 x x]
  • 12 [10 x x] [15 x x]
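
The Tree APT style of reading can be sketched as a short recursive method over space-separated tokens. `TreeNode`, `read`, `write`, and `parse` are illustrative names; the Huffman version would read a single bit to tell interior nodes from leaves instead of comparing against "x".

```java
import java.util.Arrays;
import java.util.Iterator;

class TreeReaderSketch {
    static final class TreeNode {
        final int info;
        final TreeNode left, right;
        TreeNode(int info, TreeNode left, TreeNode right) {
            this.info = info; this.left = left; this.right = right;
        }
    }

    // Preorder read: "x" means null (no recursion); a number means read two subtrees.
    static TreeNode read(Iterator<String> tokens) {
        String token = tokens.next();
        if (token.equals("x")) return null;
        TreeNode left = read(tokens);
        TreeNode right = read(tokens);
        return new TreeNode(Integer.parseInt(token), left, right);
    }

    // Preorder write: the inverse of read, storing null explicitly as "x".
    static String write(TreeNode node) {
        if (node == null) return "x";
        return node.info + " " + write(node.left) + " " + write(node.right);
    }

    static TreeNode parse(String text) {
        return read(Arrays.asList(text.split(" ")).iterator());
    }

    public static void main(String[] args) {
        String text = "8 4 x 6 x x 12 10 x x 15 x x";
        TreeNode root = parse(text);
        System.out.println(root.info + " " + root.right.info); // prints "8 12"
        System.out.println(write(root).equals(text));          // prints "true"
    }
}
```

Because read and write visit nodes in the same preorder, writing a tree and reading it back reproduces the original shape.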

Huff WOTO
http://bit.ly/201f17-huff-2
• How does decompress have access to Trie?
• When does decompressing stop, how many bits are written?

Anita Borg: 1949-2003
“Dr. Anita Borg tenaciously envisioned and set about to change the world for women and for technology. … she fought tirelessly for the development technology with positive social and human impact.”
“Anita Borg sought to revolutionize the world and the way we think about technology and its impact on our lives.”
http://www.youtube.com/watch?v=1yPxd5jqz_Q

Decoding a message
[Figure: the Huffman trie from the "Huffman Trie/Tree" slide (leaves labeled with character and count, root weight 60), repeated on slides 18-35 while the bit string below is consumed one bit per slide.]
Remaining bits          Decoded so far
00000100001001101
0000100001001101        G
000100001001101         G
00100001001101          G
0100001001101           G
100001001101            G
00001001101             GO
0001001101              GO
001001101               GO
01001101                GO
1001101                 GO
001101                  GOO
01101                   GOO
1101                    GOO
101                     GOO
01                      GOO
1                       GOOD
01100000100001001101    GOOD
http://bit.ly/201-f17-1129-0

How to Interpret Bits
• What can we tell from file extensions?
  • Foo.class, bar.jpg, file.txt, coolness.mp3
  • How does OS know how to open these?
0000000: cafe babe 0000 0034 001d 0a00 0600 0f09  .......4........
0000010: 0010 0011 0800 120a 0013 0014 0700 1507  ................
0000020: 0016 0100 063c 696e 6974 3e01 0003 2829  .....<init>...()

0000000: ffd8 ffe0 0010 4a46 4946 0001 0200 0064  ......JFIF.....d
0000010: 0064 0000 ffec 0011 4475 636b 7900 0100  .d......Ducky...
0000020: 0400 0000 5d00 00ff ee00 0e41 646f 6265  ....]......Adobe

0000000: 4944 3303 0000 0000 0048 5458 5858 0000  ID3......HTXXX..
0000010: 001a 0000 0045 6e63 6f64 6564 2062 7900  .....Encoded by.
0000020: 4d79 7374 6572 7920 4d65 7468 6f64 5452  Mystery MethodTR
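
The three dumps start with distinctive byte sequences (`cafe babe`, `ffd8 ff`, and the ASCII letters `ID3`), so a program can guess a file's type from its first bytes rather than its extension. The sketch below checks only these three prefixes; `MagicNumberSketch` and its methods are hypothetical names, and real tools recognize many more formats.

```java
class MagicNumberSketch {
    // Guess a file type from its leading bytes, using the prefixes visible in the dumps above.
    static String guessType(byte[] header) {
        if (startsWith(header, 0xCA, 0xFE, 0xBA, 0xBE)) return "Java class file";
        if (startsWith(header, 0xFF, 0xD8, 0xFF)) return "JPEG image";
        if (startsWith(header, 0x49, 0x44, 0x33)) return "MP3 with ID3 tag"; // "ID3" in ASCII
        return "unknown";
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] classHeader = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        System.out.println(guessType(classHeader)); // prints "Java class file"
    }
}
```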

Bits in a .class file
• Does JVM read bit-by-bit? By symbol?
  • Consider file in Hex or Binary, does it matter?
  • Compare Foo.java to Foo.class
0000000: cafe babe 0000 0034 001d 0a00 0600 0f09  .......4........
0000010: 0010 0011 0800 120a 0013 0014 0700 1507  ................
0000020: 0016 0100 063c 696e 6974 3e01 0003 2829  .....<init>...()

0000000: 11001010 11111110 10111010 10111110 00000000 00000000  ......
0000006: 00000000 00110100 00000000 00011101 00001010 00000000  .4....)

PicassoGuernica.jpg
• Viewed using "open .." and via "xxd .."
• Wikimedia "knows" how to display?
0000000: ffd8 ffe0 0010 4a46 4946 0001 0100 0001  ...JFIF......
0000010: 0001 0000 ffdb 0043 0008 0606 0706 0508  ....C........
0000020: 0707 0709 0908 0a0c 140d 0c0b 0b0c 1912  .............

Limits of Compression
• How many values represented with 3 bits?
  • 000, 001, 010, 011, 100, 101, 110, 111
• How many values represented with N bits? 2^N
• Can we compress all of these? Suppose N = 10
  • 2 1-bit files, 4 2-bit files, … 512 9-bit files
  • How many is this in total?
• Is this about lossy or lossless compression?
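
The total the slide asks for can be checked with a short loop. This is a sketch with made-up names (`PigeonholeSketch`, `shorterFiles`); it counts the non-empty bit strings shorter than N bits and compares that with the number of N-bit strings.

```java
class PigeonholeSketch {
    // Number of distinct bit strings with length 1 .. n-1 (2 + 4 + ... + 2^(n-1)).
    static long shorterFiles(int n) {
        long total = 0;
        for (int length = 1; length < n; length++) {
            total += 1L << length; // 2^length strings of this length
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println("files of exactly " + n + " bits: " + (1L << n)); // 1024
        System.out.println("shorter non-empty files: " + shorterFiles(n));    // 1022
    }
}
```

There are 1,024 distinct 10-bit files but only 1,022 shorter ones, so no lossless scheme can map every 10-bit file to a distinct shorter file: at least two inputs would collide and could not both be recovered.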

Measuring Information
• Original Huff explanation at Duke used example:
  • Compress "go go gophers"
char  ASCII  binary   3 bits  Huffman
g     103    1100111  000     00
o     111    1101111  001     01
p     112    1110000  010     1100
h     104    1101000  011     1101
e     101    1100101  100     1110
r     114    1110010  101     1111
s     115    1110011  110     100
sp.   32     0100000  111     101
[Figure: Huffman trie for "go go gophers". The root (13) has children 6 (g 3, o 3) and 7; 7 has children 3 (s 1, space 2) and 4; 4 has children 2 (p 1, h 1) and 2 (e 1, r 1).]
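
To compare the encodings in the table, the sketch below encodes "go go gophers" with the variable-length codes from the table's last column and reports the bit totals for 8-bit, 3-bit, and Huffman coding. The class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

class GophersSketch {
    // Variable-length codes copied from the last column of the table.
    static Map<Character, String> tableCodes() {
        Map<Character, String> codes = new HashMap<>();
        codes.put('g', "00");
        codes.put('o', "01");
        codes.put('p', "1100");
        codes.put('h', "1101");
        codes.put('e', "1110");
        codes.put('r', "1111");
        codes.put('s', "100");
        codes.put(' ', "101");
        return codes;
    }

    // Concatenate the code of every character in the text.
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(codes.get(c));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String text = "go go gophers";
        String bits = encode(text, tableCodes());
        System.out.println("8 bits/char: " + 8 * text.length()); // 104
        System.out.println("3 bits/char: " + 3 * text.length()); // 39
        System.out.println("Huffman:     " + bits.length());     // 37
    }
}
```

On this 13-character string the saving over a fixed 3-bit code is small (37 versus 39 bits); the gap widens as character frequencies become more skewed.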

Autocomplete meets Huff

YAHW
http://bit.ly/201f17-huff-3