DESIGN & ANALYSIS OF ALGORITHM
02 – HASHING (CONTD.)
Informatics Department
Parahyangan Catholic University
ANALOGY
Let's say that you have a drawer full of socks, 20 red socks (all identical) and 12 blue socks, and it is dark in the room. How many socks should you grab to be sure that you have at least one matching pair?
How about 20 red socks, 12 blue socks, and 8 green socks?
How about an unlimited number of red, blue, green, yellow, and purple socks?
ANALOGY
In a city of 2 million people, no one has more than 1.5 million hairs on his/her head. Can you show that at least two people in the city have exactly the same number of hairs on their heads?
PIGEONHOLE PRINCIPLE
In mathematics, the pigeonhole principle states that if n pigeons are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one pigeon.
-- Wikipedia
n = the range of possible keys
m = the size of the hash table
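A minimal sketch of the principle applied to hashing. The hash function h(k) = k mod m and the helper name `find_collisions` are assumptions made for illustration; the slide does not fix either. With 8 distinct keys and only 7 slots, at least one slot must receive more than one key.

```python
from collections import defaultdict


def find_collisions(keys, m):
    """Group keys by slot under the assumed hash h(k) = k mod m."""
    slots = defaultdict(list)
    for k in keys:
        slots[k % m].append(k)
    # Keep only the slots that received more than one key.
    return {idx: ks for idx, ks in slots.items() if len(ks) > 1}


m = 7                        # table size (the pigeonholes)
keys = list(range(10, 18))   # 8 distinct keys (the pigeons), so n > m
collisions = find_collisions(keys, m)
print(collisions)            # {3: [10, 17]}
```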
COLLISION
When the possible key range > table size, two distinct keys k1 and k2 may be mapped to the same index: h(k1) = h(k2)
This condition is known as a collision
A resolution strategy is required
[Figure: keys yellow, orange, red, green, blue, black, white; an unresolved placement is marked "??"]
COLLISION HANDLING :: 3 STRATEGIES
Open addressing: linear probing, quadratic probing, double hashing
Separate chaining
Coalesced hashing
COLLISION HANDLING :: OPEN ADDRESSING
In open addressing, a colliding entry will be placed in a new slot in the same table
[Figure: keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman; table slots J (74), K (75), L (76), M (77), N (78); stored entries John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234; the slot for Kayla Newman is marked "?"]
COLLISION HANDLING :: SEPARATE CHAINING
In separate chaining, colliding entries are stored in a linked list in a separate area
[Figure: keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman; table slots J (74), K (75), L (76), M (77); chained entries John Smith 521-8976, Jane Smith 521-1234, Kenny Baker 418-4165, Lisa Smith 521-5030, Kayla Newman 418-4222]
COLLISION HANDLING :: COALESCED HASHING
Coalesced hashing combines open addressing and separate chaining. It uses linked lists like separate chaining, but the list nodes are stored in empty slots of the same table
[Figure: keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman; table slots J (74), K (75), L (76), M (77), N (78); stored entries John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234, Kayla Newman / 418-4222]
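A rough sketch of coalesced hashing under stated assumptions: the hash is the character code of the key's first letter modulo the table size (suggested by the slot labels J (74), K (75), … in the figure), and a colliding entry is placed in the first free slot found by scanning from the end of the table, which is one common convention that the slide does not specify. The class name `CoalescedHashTable` is hypothetical.

```python
class CoalescedHashTable:
    """Chains of colliding keys are threaded through the table itself."""

    def __init__(self, m):
        self.m = m
        self.keys = [None] * m
        self.values = [None] * m
        self.next = [-1] * m   # index of the next chain entry, -1 = end

    def _hash(self, key):
        # Assumed hash: character code of the first letter, mod table size.
        return ord(key[0]) % self.m

    def insert(self, key, value):
        idx = self._hash(key)
        if self.keys[idx] is None:           # home slot is free
            self.keys[idx], self.values[idx] = key, value
            return idx
        while True:                          # walk to the end of the chain
            if self.keys[idx] == key:        # key already stored: update it
                self.values[idx] = value
                return idx
            if self.next[idx] == -1:
                break
            idx = self.next[idx]
        free = self.m - 1                    # assumed policy: scan from the end
        while free >= 0 and self.keys[free] is not None:
            free -= 1
        if free < 0:
            raise OverflowError("table is full")
        self.keys[free], self.values[free] = key, value
        self.next[idx] = free                # link the new entry into the chain
        return free

    def search(self, key):
        idx = self._hash(key)
        while idx != -1 and self.keys[idx] is not None:
            if self.keys[idx] == key:
                return self.values[idx]
            idx = self.next[idx]
        return None


table = CoalescedHashTable(8)
for name, number in [("John Smith", "521-8976"), ("Jane Smith", "521-1234"),
                     ("Kenny Baker", "418-4165"), ("Lisa Smith", "521-5030"),
                     ("Kayla Newman", "418-4222")]:
    table.insert(name, number)
print(table.search("Kayla Newman"))   # 418-4222
```

With a table of 8 slots, Jane Smith collides with John Smith and is stored in the last slot, linked from John Smith's slot; a search only follows that chain instead of probing arbitrary slots.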
PERFORMANCE ANALYSIS
What are the advantages and disadvantages of the three collision handling methods? How do we compare them? What measurements can we use?
Load Factor: what is the average number of elements stored in a slot?
Probe Number: how many slots do we need to examine before finding an empty slot?
EXAMPLE :: OPEN ADDRESSING
Load factor ≤ 1 (because every slot holds at most 1 element)
Probe number for “Lisa Smith” = ?
Probe number for “Kayla Newman” = ?
[Figure: keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman; table slots J (74), K (75), L (76), M (77), N (78); stored entries John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234, Kayla Newman / 418-4222]
EXAMPLE :: SEPARATE CHAINING
Load factor = # of probes (because colliding elements are stored in a linked list)
Probe number for “Jane Smith” = ?
Probe number for “Kayla Newman” = ?
[Figure: keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman; table slots J (74), K (75), L (76), M (77); chained entries John Smith 521-8976, Jane Smith 521-1234, Kenny Baker 418-4165, Lisa Smith 521-5030, Kayla Newman 418-4222]
What if we insert the new element at the beginning of the list?
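A minimal sketch of separate chaining that also counts probes, so the questions on this slide can be explored. Several things here are assumptions rather than part of the slides: the hash is the character code of the key's first letter (suggested by the slot labels J (74), K (75), …), Python lists stand in for the linked lists, the insertion order of the names is taken from the order they appear in the figure, and the `at_front` flag is a hypothetical switch for inserting at the head of a chain.

```python
class ChainedHashTable:
    """Separate chaining; each slot holds its own list of (key, value)."""

    def __init__(self, m=128, at_front=False):
        self.m = m
        self.at_front = at_front
        self.buckets = [[] for _ in range(m)]

    def _hash(self, key):
        # Assumed hash: character code of the first letter, e.g. 'J' -> 74.
        return ord(key[0]) % self.m

    def insert(self, key, value):
        bucket = self.buckets[self._hash(key)]
        if self.at_front:
            bucket.insert(0, (key, value))   # new element becomes the head
        else:
            bucket.append((key, value))      # new element goes to the tail

    def search(self, key):
        """Return (value, probes); value is None if the key is absent."""
        probes = 0
        for k, v in self.buckets[self._hash(key)]:
            probes += 1
            if k == key:
                return v, probes
        return None, probes


phone_book = [("John Smith", "521-8976"), ("Lisa Smith", "521-5030"),
              ("Kenny Baker", "418-4165"), ("Jane Smith", "521-1234"),
              ("Kayla Newman", "418-4222")]
append_table = ChainedHashTable()
front_table = ChainedHashTable(at_front=True)
for name, number in phone_book:
    append_table.insert(name, number)
    front_table.insert(name, number)

print(append_table.search("Jane Smith"))   # ('521-1234', 2)
print(front_table.search("Jane Smith"))    # ('521-1234', 1)
```

Inserting at the head makes the insertion itself constant time and favors recently added keys, at the cost of more probes for the older keys in the same chain.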
EXAMPLE :: COALESCED HASHING
[Figure: keys John Smith, Lisa Smith, Kenny Baker, and Jane Smith; table slots J (74), K (75), L (76), M (77), N (78); stored entries John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234]
Load factor ≤ 1 (because every slot holds at most 1 element)
What is the advantage of this method?
How many slot(s) must be checked to insert Kenny Baker?
How many slot(s) must be checked to search for Kenny Baker?
OPEN ADDRESSING
In open addressing, a colliding entry will be placed in a new slot in the same table (using hash function h(k,i), where i is the probe number)
There are generally 3 techniques to decide the next slot to be filled: linear probing, quadratic probing, double hashing
The sequence h(k,0), h(k,1), h(k,2), … is called the probe sequence
OPEN ADDRESSING :: LINEAR PROBING
Define
$$h(k, i) = (h'(k) + i) \bmod m$$
where h'(k) is the initial hash function, and i is the probe number for key k
Example: m = 13
k = 5: h'(k) = 5
k = 18: h'(k) = 5 (collision)
    h(k,1) = (5+1) mod 13 = 6
k = 19: h'(k) = 6 (collision)
    h(k,1) = (6+1) mod 13 = 7
k = 31: h'(k) = 5 (collision)
    h(k,1) = (5+1) mod 13 = 6 (collision)
    h(k,2) = (5+2) mod 13 = 7 (collision)
    h(k,3) = (5+3) mod 13 = 8
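The linear probing example can be replayed with a short sketch. It assumes h'(k) = k mod m, which is consistent with the values on the slide (5, 18, and 31 all map to 5; 19 maps to 6) but is not stated there; the function name `insert_linear` is hypothetical.

```python
def insert_linear(table, k):
    """Insert key k using h(k, i) = (h'(k) + i) mod m; return its slot."""
    m = len(table)
    h0 = k % m                     # assumed initial hash h'(k) = k mod m
    for i in range(m):             # i is the probe number
        idx = (h0 + i) % m
        if table[idx] is None:
            table[idx] = k
            return idx
    return None                    # every slot was tried: table is full


table = [None] * 13
placements = {k: insert_linear(table, k) for k in (5, 18, 19, 31)}
print(placements)                  # {5: 5, 18: 6, 19: 7, 31: 8}
```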
OPEN ADDRESSING :: LINEAR PROBING
Suffers from primary clustering
Clusters arise since an empty slot preceded by i non-empty slots gets filled next with probability (i+1)/m
There are only m distinct probe sequences

| idx | Data |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
| 6 | F |
| 7 | |
| 8 | |
| 9 | |
| … | … |

Every k with h(k) between 1 and 6 will be placed in slot 7 (the first empty slot after the cluster)
OPEN ADDRESSING :: QUADRATIC PROBING
Define
$$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$$
where h'(k) is the initial hash function, i is the probe number for key k, and c1 and c2 are constants
OPEN ADDRESSING :: QUADRATIC PROBING
Example: m = 13, c1 = 2, c2 = 3
k = 5: h'(k) = 5
k = 18: h'(k) = 5 (collision)
    h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10
k = 19: h'(k) = 6
k = 31: h'(k) = 5 (collision)
    h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10 (collision)
    h(k,2) = (5 + 2*2 + 3*2^2) mod 13 = 8
k = 32: h'(k) = 6 (collision)
    h(k,1) = (6 + 2*1 + 3*1^2) mod 13 = 11
h(k,1) is not directly next to h'(k), which avoids the primary clustering problem
However, keys with the same h'(k) are re-hashed to the same places. This leads to a milder form of clustering, called secondary clustering (again, there are only m distinct probe sequences)
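The quadratic probing example can be checked with a short sketch. As an assumption not stated on the slide, h'(k) = k mod m is used, which matches the listed values; the function name `insert_quadratic` is hypothetical.

```python
def insert_quadratic(table, k, c1=2, c2=3):
    """Insert k using h(k, i) = (h'(k) + c1*i + c2*i^2) mod m."""
    m = len(table)
    h0 = k % m                         # assumed initial hash h'(k) = k mod m
    for i in range(m):                 # i is the probe number
        idx = (h0 + c1 * i + c2 * i * i) % m
        if table[idx] is None:
            table[idx] = k
            return idx
    return None                        # no free slot found within m probes


table = [None] * 13
placements = {k: insert_quadratic(table, k) for k in (5, 18, 19, 31, 32)}
print(placements)                      # {5: 5, 18: 10, 19: 6, 31: 8, 32: 11}
```

Keys 18 and 31 share h'(k) = 5 and therefore follow the same probe sequence (10, then 8), which is the secondary clustering described above.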
OPEN ADDRESSING :: QUADRATIC PROBING
Observe these 2 cases where h'(k) = 5 and h'(k) = 6 (m = 13, c1 = 2, and c2 = 3)
Note that only slots 0, 5, 6, 8, 9, 10, and 12 can be filled by keys with h'(k) = 5
Only slots 0, 1, 6, 7, 9, 10, and 11 can be filled by keys with h'(k) = 6

| probe # | h(k,i) for h'(k) = 5 | h(k,i) for h'(k) = 6 |
|---|---|---|
| 1 | 10 | 11 |
| 2 | 8 | 9 |
| 3 | 12 | 0 |
| 4 | 9 | 10 |
| 5 | 12 | 0 |
| 6 | 8 | 9 |
| 7 | 10 | 11 |
| 8 | 5 | 6 |
| 9 | 6 | 7 |
| 10 | 0 | 1 |
| 11 | 0 | 1 |
| 12 | 6 | 7 |
| 13 | 5 | 6 |

This suggests that some slots might get filled with higher probability than others.
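The sets of reachable slots can be recomputed with a few lines. This is only a sketch; the helper name `reachable_slots` is hypothetical. Probe numbers 0 to m-1 are enough because the sequence repeats with period m.

```python
def reachable_slots(h0, m=13, c1=2, c2=3):
    """Slots visited by h(k, i) = (h0 + c1*i + c2*i^2) mod m over one period."""
    return sorted({(h0 + c1 * i + c2 * i * i) % m for i in range(m)})


print(reachable_slots(5))   # [0, 5, 6, 8, 9, 10, 12]
print(reachable_slots(6))   # [0, 1, 6, 7, 9, 10, 11]
```

Only 7 of the 13 slots are reachable from each starting position, so an insertion can fail even though the table still has empty slots.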
OPEN ADDRESSING :: QUADRATIC PROBING
The choice of m, c1, and c2 is important
For m = 2^n, a good choice is c1 = c2 = 0.5
For prime m > 2, most choices of c1 and c2 will make h(k, i) distinct for i in [0, (m-1)/2]
Example: m = 2^4 = 16, c1 = c2 = 0.5, h'(k) = 0

| Probe # | h(k,i) | Probe # | h(k,i) |
|---|---|---|---|
| 0 | 0 | 8 | 4 |
| 1 | 1 | 9 | 13 |
| 2 | 3 | 10 | 7 |
| 3 | 6 | 11 | 2 |
| 4 | 10 | 12 | 14 |
| 5 | 15 | 13 | 11 |
| 6 | 5 | 14 | 9 |
| 7 | 12 | 15 | 8 |
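A small sketch to verify the m = 16 example. With c1 = c2 = 0.5 the offset is 0.5·i + 0.5·i^2 = i(i+1)/2, which is always an integer, so integer arithmetic is used here; the function name `triangular_probe_sequence` is hypothetical.

```python
def triangular_probe_sequence(h0, m):
    """Probe sequence for c1 = c2 = 0.5: h(k, i) = (h0 + i*(i+1)/2) mod m."""
    return [(h0 + i * (i + 1) // 2) % m for i in range(m)]


seq = triangular_probe_sequence(0, 16)
print(seq)
# [0, 1, 3, 6, 10, 15, 5, 12, 4, 13, 7, 2, 14, 11, 9, 8]
print(sorted(seq) == list(range(16)))   # True: every slot is visited once
```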
OPEN ADDRESSING :: DOUBLE HASHING
Define
$$h(k, i) = (h_1(k) + i \, h_2(k)) \bmod m$$
where h1(k) is the initial hash function, i is the probe number for key k, and h2(k) is a second hash function, different from h1(k)
Two different keys a and b that initially hash to the same location (h1(a) = h1(b)) will have different probe sequences whenever h2(a) ≠ h2(b)
OPEN ADDRESSING :: DOUBLE HASHING
h2(k) must be relatively prime to m
Examples:
Let m be a power of 2 and h2(k) always return an odd number
Let m be prime and h2(k) always return a positive integer less than m
There are Θ(m^2) distinct probe sequences
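A minimal sketch of double hashing with a prime table size. The two hash functions are assumptions chosen for illustration, not taken from the slides: h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)), so that h2(k) is always a positive integer less than m.

```python
def double_hash_sequence(k, m):
    """Full probe sequence h(k, i) = (h1(k) + i*h2(k)) mod m for i = 0..m-1."""
    h1 = k % m                # assumed first hash function
    h2 = 1 + (k % (m - 1))    # assumed second hash, always in 1..m-1
    return [(h1 + i * h2) % m for i in range(m)]


m = 13
seq_5 = double_hash_sequence(5, m)
seq_18 = double_hash_sequence(18, m)
print(seq_5[:4])    # [5, 11, 4, 10]
print(seq_18[:4])   # [5, 12, 6, 0]
```

Keys 5 and 18 start at the same slot but diverge from the first probe onward, and because m is prime each sequence visits all 13 slots.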
BASIC HASH TABLE OPERATION
INSERT(key, value): we have discussed this a lot
value SEARCH(key): similar to INSERT
DELETE(key): do not delete the value, mark it “deleted” instead (why?)
In separate chaining, all three operations are merely inserting, searching, and deleting in the appropriate linked list
When to stop searching?
INSERTION IN OPEN ADDRESSING

    INSERT(key, value)
    // returns true if key is successfully inserted
    // returns false otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty or marked as deleted)
            table[idx] = (key, value)
            return true
        else
            i = i + 1
    return false
SEARCHING IN OPEN ADDRESSING

    SEARCH(key)
    // returns associated value if key is found
    // returns null otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty)
            return null    // reached an empty slot, so key cannot be in the hash table
        else if (table[idx] not marked as deleted AND table[idx].key == key)
            return table[idx].value    // key found
        else
            i = i + 1    // try the next slot
    return null    // tried all m possible slots and key not found
DELETION IN OPEN ADDRESSING

    DELETE(key)
    // returns associated value if key is found and successfully deleted
    // returns null otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty)
            return null    // reached an empty slot, so key cannot be in the hash table
        else if (table[idx] not marked as deleted AND table[idx].key == key)
            temp = table[idx].value
            mark table[idx] as deleted
            return temp
        else
            i = i + 1    // try the next slot
    return null    // tried all m possible slots and key not found
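The three operations can be combined into one runnable sketch. It is not the lecture's implementation: linear probing with h'(k) = k mod m is assumed for HASHFUNCTION, and the class name `OpenAddressingTable` and the sentinels `EMPTY` and `DELETED` are hypothetical. As in the pseudocode, INSERT does not check whether the key is already present.

```python
EMPTY = object()     # slot that has never held an entry
DELETED = object()   # marker left behind by delete()


class OpenAddressingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key, i):
        # Assumed HASHFUNCTION: linear probing with h'(k) = k mod m.
        return (key % self.m + i) % self.m

    def insert(self, key, value):
        for i in range(self.m):
            idx = self._probe(key, i)
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = (key, value)
                return True
        return False                      # all m slots tried: table is full

    def _find(self, key):
        for i in range(self.m):
            idx = self._probe(key, i)
            slot = self.slots[idx]
            if slot is EMPTY:             # empty slot: key cannot be further on
                return None
            if slot is not DELETED and slot[0] == key:
                return idx
        return None

    def search(self, key):
        idx = self._find(key)
        return None if idx is None else self.slots[idx][1]

    def delete(self, key):
        idx = self._find(key)
        if idx is None:
            return None
        value = self.slots[idx][1]
        self.slots[idx] = DELETED         # mark the slot, do not empty it
        return value


t = OpenAddressingTable(13)
for key, value in [(5, "a"), (18, "b"), (31, "c")]:   # all hash to slot 5
    t.insert(key, value)
t.delete(18)            # slot 6 becomes DELETED, not EMPTY
print(t.search(31))     # c
```

If `delete` reset the slot to empty instead, the search for 31 would stop at slot 6 and wrongly report the key as missing; this is why deleted slots are only marked. A later insert may reuse the marked slot.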