DESIGN & ANALYSIS OF ALGORITHM 02 – HASHING (CONTD.) Informatics Department Parahyangan Catholic University

Upload: christine-burke

Post on 16-Jan-2016



ANALOGY

Let's say that you have a drawer full of socks, 20 red socks (all identical) and 12 blue socks, and it is dark in the room. How many socks should you grab to ensure that you have at least one matching pair?

How about 20 red socks, 12 blue socks, and 8 green socks?

How about an unlimited number of red, blue, green, yellow, and purple socks?


ANALOGY

In a city of 2 million people, no one has more than 1.5 million hairs on his/her head. Can you show that at least two people in the city have exactly the same number of hairs on their heads?


PIGEONHOLE PRINCIPLE

In mathematics, the pigeonhole principle states that if n pigeons are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one pigeon.

-- Wikipedia

n = the range of possible keys
m = the size of the hash table


COLLISION

When the possible key range > table size, two distinct keys k1 and k2 may be mapped to the same index: h(k1) = h(k2)

This condition is known as a collision; a resolution strategy is required.

[Figure: the keys yellow, orange, red, green, blue, black, and white placed into a hash table, with one placement marked "??"]
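The pigeonhole argument can be checked with a few lines of code. The sketch below is only an illustration: the hash function `toy_hash` (sum of character codes modulo the table size) and the table size of 5 are assumptions, not taken from the slides. With 7 color keys and only 5 slots, at least one slot must receive two or more keys, whatever hash function is used.

```python
def toy_hash(key: str, m: int) -> int:
    """Hypothetical hash function: sum of character codes modulo table size m."""
    return sum(ord(ch) for ch in key) % m


def find_collisions(keys, m):
    """Group keys by slot index and keep only the slots holding 2+ keys."""
    buckets = {}
    for key in keys:
        buckets.setdefault(toy_hash(key, m), []).append(key)
    return {idx: ks for idx, ks in buckets.items() if len(ks) > 1}


colors = ["yellow", "orange", "red", "green", "blue", "black", "white"]

# 7 keys, 5 slots: the pigeonhole principle guarantees a non-empty result.
collisions = find_collisions(colors, m=5)
print(collisions)
```

Changing `toy_hash` changes which keys collide, but not the fact that some pair does.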


COLLISION HANDLING: 3 STRATEGIES

Open addressing
- Linear probing
- Quadratic probing
- Double hashing

Separate chaining

Coalesced hashing


COLLISION HANDLING: OPEN ADDRESSING

In open addressing, a colliding entry will be placed in a new slot in the same table

[Figure: the keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman hashed into one table with slots J (74), K (75), L (76), M (77), N (78). Stored entries: John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234. The slot for Kayla Newman is marked "?".]


COLLISION HANDLING: SEPARATE CHAINING

In separate chaining, colliding entries are stored in a linked list in a separate area

[Figure: the keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman hashed into slots J (74), K (75), L (76), M (77); each slot points to a linked list of entries. Entries: John Smith 521-8976, Jane Smith 521-1234, Kenny Baker 418-4165, Lisa Smith 521-5030, Kayla Newman 418-4222.]
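A minimal sketch of separate chaining follows. It rests on several assumptions: the hash is the character code of the first letter (suggested by slot labels such as J (74), but not stated in the slides), the table has 128 slots, and a Python list stands in for each linked list. The class name `ChainedTable` is hypothetical.

```python
class ChainedTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m=128):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _index(self, key):
        # Assumed hash function: character code of the first letter, mod m.
        return ord(key[0]) % self.m

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pos, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[pos] = (key, value)
                return
        bucket.append((key, value))   # append at the end of the chain

    def search(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        for pos, (k, v) in enumerate(bucket):
            if k == key:
                del bucket[pos]       # physically remove the node
                return v
        return None


book = ChainedTable()
for name, phone in [("John Smith", "521-8976"), ("Lisa Smith", "521-5030"),
                    ("Kenny Baker", "418-4165"), ("Jane Smith", "521-1234"),
                    ("Kayla Newman", "418-4222")]:
    book.insert(name, phone)
```

Under this assumed hash, John Smith and Jane Smith share slot 74, and Kenny Baker and Kayla Newman share slot 75, so each of those slots holds a chain of two entries.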


COLLISION HANDLING: COALESCED HASHING

Coalesced hashing combines open addressing and separate chaining. It uses linked lists like separate chaining, but the list nodes are stored in empty slots of the same table

[Figure: the keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman hashed into one table with slots J (74), K (75), L (76), M (77), N (78). Stored entries: John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234, Kayla Newman / 418-4222.]
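The idea can be sketched as follows. This is one common variant, not necessarily the one drawn on the slide: a colliding entry is stored in the last empty slot of the table (scanning from the bottom) and linked to the end of the chain that starts at its home slot. The first-letter hash, the table size of 8, and the class name `CoalescedTable` are assumptions made for illustration.

```python
class CoalescedTable:
    """Coalesced hashing: chains are threaded through slots of one table."""

    def __init__(self, m=8):
        self.m = m
        self.keys = [None] * m    # None marks an empty slot
        self.values = [None] * m
        self.next = [-1] * m      # -1 marks the end of a chain

    def _index(self, key):
        # Assumed hash function: character code of the first letter, mod m.
        return ord(key[0]) % self.m

    def insert(self, key, value):
        idx = self._index(key)
        if self.keys[idx] is None:            # home slot is free
            self.keys[idx], self.values[idx] = key, value
            return True
        while True:                           # walk to the end of the chain
            if self.keys[idx] == key:         # key already present: overwrite
                self.values[idx] = value
                return True
            if self.next[idx] == -1:
                break
            idx = self.next[idx]
        for free in range(self.m - 1, -1, -1):  # last empty slot in the table
            if self.keys[free] is None:
                self.keys[free], self.values[free] = key, value
                self.next[idx] = free         # link the new node to the chain
                return True
        return False                          # table is full

    def search(self, key):
        idx = self._index(key)
        while idx != -1 and self.keys[idx] is not None:
            if self.keys[idx] == key:
                return self.values[idx]
            idx = self.next[idx]              # follow the link, no probing
        return None


t = CoalescedTable()
for name, phone in [("John Smith", "521-8976"), ("Lisa Smith", "521-5030"),
                    ("Kenny Baker", "418-4165"), ("Jane Smith", "521-1234"),
                    ("Kayla Newman", "418-4222")]:
    t.insert(name, phone)
```

With these assumptions, Jane Smith collides with John Smith in slot 2 and is stored in slot 7, and Kayla Newman collides with Kenny Baker in slot 3 and is stored in slot 6. A search follows the `next` links instead of examining unrelated slots.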


PERFORMANCE ANALYSIS

What are the advantages and disadvantages of the three collision handling methods? How do we compare them? What measurements can we use?

Load factor: what is the average number of elements stored in a slot?

Probe number: how many slots do we need to examine before finding an empty slot?
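As a small worked illustration, the load factor is the number of stored elements n divided by the number of slots m. The helper name `load_factor` and the numbers used below are hypothetical.

```python
def load_factor(num_elements: int, num_slots: int) -> float:
    """Average number of elements per slot: n / m."""
    return num_elements / num_slots


# Hypothetical table: 5 entries stored in 13 slots.
alpha = load_factor(5, 13)
print(f"load factor = {alpha:.3f}")
```

In open addressing this ratio cannot exceed 1, since each slot holds at most one element; in separate chaining it can, because a slot may hold a whole list.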


EXAMPLE :: OPEN ADDRESSING

Load factor ≤ 1 (because every slot holds at most 1 element)

Probe number for “Lisa Smith” = ? Probe number for “Kayla Newman” = ?

[Figure: open addressing table with slots J (74), K (75), L (76), M (77), N (78) and keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, Kayla Newman. Stored entries: John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234, Kayla Newman / 418-4222.]


EXAMPLE :: SEPARATE CHAINING

Load factor = average list length, i.e. the average number of probes (because colliding elements are stored in a linked list)

Probe number for “Jane Smith” = ? Probe number for “Kayla Newman” = ?

[Figure: separate chaining table with slots J (74), K (75), L (76), M (77) and keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, Kayla Newman. Linked-list entries: John Smith 521-8976, Jane Smith 521-1234, Kenny Baker 418-4165, Lisa Smith 521-5030, Kayla Newman 418-4222.]

What if we insert the new element at the beginning of the list?


EXAMPLE :: COALESCED HASHING

[Figure: coalesced hashing table with slots J (74), K (75), L (76), M (77), N (78) and keys John Smith, Lisa Smith, Kenny Baker, Jane Smith. Stored entries: John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234.]

Load factor ≤ 1 (because every slot holds at most 1 element)

What is the advantage of this method? How many slots must be checked to insert Kenny Baker? How many slots must be checked to search for Kenny Baker?


OPEN ADDRESSING

In open addressing, a colliding entry is placed in a new slot in the same table (using a hash function h(k,i), where i is the probe number)

There are generally 3 techniques to decide the next slot to be filled:
- linear probing
- quadratic probing
- double hashing

The sequence h(k,0), h(k,1), h(k,2), … is called the probe sequence


OPEN ADDRESSING: LINEAR PROBING

Define

$h(k,i) = (h'(k) + i) \bmod m$

where h'(k) is the initial hash function, and i is the probe number for key k

Example: m = 13
- k = 5: h'(k) = 5
- k = 18: h'(k) = 5 (collision)
  h(k,1) = (5+1) mod 13 = 6
- k = 19: h'(k) = 6 (collision)
  h(k,1) = (6+1) mod 13 = 7
- k = 31: h'(k) = 5 (collision)
  h(k,1) = (5+1) mod 13 = 6 (collision)
  h(k,2) = (5+2) mod 13 = 7 (collision)
  h(k,3) = (5+3) mod 13 = 8
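The example can be replayed in code. The sketch assumes h'(k) = k mod m, which agrees with every value in the example but is not stated explicitly in the slides; the function name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key using h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m assumed.

    Returns (slot index, probe number i) of the slot that received the key.
    """
    m = len(table)
    for i in range(m):
        idx = (key % m + i) % m
        if table[idx] is None:      # None marks an empty slot
            table[idx] = key
            return idx, i
    raise RuntimeError("table is full")


table = [None] * 13
placements = {k: linear_probe_insert(table, k) for k in (5, 18, 19, 31)}
for key, (idx, probes) in placements.items():
    print(f"k = {key}: slot {idx} after probe number {probes}")
```

Note how key 31 needs three extra probes: the occupied run 5, 6, 7 created by the earlier keys is exactly the kind of cluster that linear probing tends to grow.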


OPEN ADDRESSING: LINEAR PROBING

Suffers from primary clustering

Clusters arise since an empty slot preceded by i non-empty slots gets filled next with probability (i+1)/m

There are only m distinct probe sequences

idx   Data
1     A
2     B
3     C
4     D
5     E
6     F
7
8
9
…     …

Every key k with h'(k) between 1 and 6 will be placed in slot 7, the first empty slot


OPEN ADDRESSING: QUADRATIC PROBING

Define

$h(k,i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$

where h'(k) is the initial hash function, i is the probe number for key k, and c1 and c2 are constants


OPEN ADDRESSING: QUADRATIC PROBING

Example: m = 13, c1 = 2, c2 = 3
- k = 5: h'(k) = 5
- k = 18: h'(k) = 5 (collision)
  h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10
- k = 19: h'(k) = 6
- k = 31: h'(k) = 5 (collision)
  h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10 (collision)
  h(k,2) = (5 + 2*2 + 3*2^2) mod 13 = 8
- k = 32: h'(k) = 6 (collision)
  h(k,1) = (6 + 2*1 + 3*1^2) mod 13 = 11

h(k,1) is not directly next to h'(k), which avoids the primary clustering problem

However, keys with the same h'(k) follow the same probe sequence. This leads to a milder form of clustering, called secondary clustering (again, there are only m distinct probe sequences)
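The quadratic probing example can be replayed the same way. As before, h'(k) = k mod m is an assumption that matches the example values, and the function name `quadratic_probe_insert` is hypothetical.

```python
def quadratic_probe_insert(table, key, c1=2, c2=3):
    """Insert key using h(k, i) = (h'(k) + c1*i + c2*i^2) mod m.

    h'(k) = k mod m is assumed. Returns (slot index, probe number i).
    """
    m = len(table)
    for i in range(m):
        idx = (key % m + c1 * i + c2 * i * i) % m
        if table[idx] is None:      # None marks an empty slot
            table[idx] = key
            return idx, i
    raise RuntimeError("no free slot found within m probes")


table = [None] * 13
placements = {k: quadratic_probe_insert(table, k) for k in (5, 18, 19, 31, 32)}
for key, (idx, probes) in placements.items():
    print(f"k = {key}: slot {idx} after probe number {probes}")
```

Keys 18 and 31 share h'(k) = 5 and therefore try slot 10 first, then slot 8: the shared probe sequence is what secondary clustering refers to.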


OPEN ADDRESSING: QUADRATIC PROBING

Observe these 2 cases where h'(k) = 5 and h'(k) = 6 (m = 13, c1 = 2 and c2 = 3)

Note that only slots 0, 5, 6, 8, 9, 10, and 12 can be filled by keys with h'(k) = 5

Only slots 0, 1, 6, 7, 9, 10, and 11 can be filled by keys with h'(k) = 6

probe#   h(k,i) for h'(k) = 5   h(k,i) for h'(k) = 6
1        10                     11
2        8                      9
3        12                     0
4        9                      10
5        12                     0
6        8                      9
7        10                     11
8        5                      6
9        6                      7
10       0                      1
11       0                      1
12       6                      7
13       5                      6

This suggests that some slots might get filled with higher probability than the others.
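The set of reachable slots can be computed directly from the probe formula. The helper name `reachable_slots` is hypothetical; the parameters m = 13, c1 = 2, c2 = 3 are the ones used on this slide.

```python
def reachable_slots(home, m=13, c1=2, c2=3):
    """Sorted list of slots visited by h(k, i) = (home + c1*i + c2*i^2) mod m.

    Probe numbers 0 .. m-1 suffice, because the sequence repeats with period m.
    """
    return sorted({(home + c1 * i + c2 * i * i) % m for i in range(m)})


for home in (5, 6):
    slots = reachable_slots(home)
    print(f"h'(k) = {home}: {len(slots)} of 13 slots reachable -> {slots}")
```

Only 7 of the 13 slots are reachable from each home slot, so an insert can fail even though the table still has empty slots.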


OPEN ADDRESSING: QUADRATIC PROBING

The choice of m, c1, and c2 is important:
- For m = 2^n, a good choice is c1 = c2 = 0.5
- For prime m > 2, most choices of c1 and c2 will make h(k,i) distinct for i in [0, (m-1)/2]

Example: m = 2^4 = 16, c1 = c2 = 0.5, h'(k) = 0

Probe #   h(k,i)   Probe #   h(k,i)
0         0        8         4
1         1        9         13
2         3        10        7
3         6        11        2
4         10       12        14
5         15       13        11
6         5        14        9
7         12       15        8
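With c1 = c2 = 0.5 the offset 0.5·i + 0.5·i² equals the triangular number i(i+1)/2, so the sequence can be computed in integer arithmetic. The sketch below regenerates the table and checks that every slot is visited exactly once; the function name `triangular_probe_sequence` is hypothetical.

```python
def triangular_probe_sequence(home, m):
    """Probe sequence h(k, i) = (home + i*(i+1)/2) mod m for i = 0 .. m-1.

    i*(i+1)/2 is the same as 0.5*i + 0.5*i^2, i.e. c1 = c2 = 0.5.
    """
    return [(home + i * (i + 1) // 2) % m for i in range(m)]


sequence = triangular_probe_sequence(home=0, m=16)
print(sequence)

# For m a power of two, the sequence is a permutation of all slots.
covers_all_slots = sorted(sequence) == list(range(16))
print("visits every slot exactly once:", covers_all_slots)
```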


OPEN ADDRESSING: DOUBLE HASHING

Define

$h(k,i) = (h_1(k) + i \cdot h_2(k)) \bmod m$

where h1(k) is the initial hash function, i is the probe number for key k, and h2(k) is a second hash function different from h1(k)

Two different keys a and b that initially hash to the same location (h1(a) = h1(b)) will have different probe sequences whenever h2(a) ≠ h2(b)


OPEN ADDRESSING: DOUBLE HASHING

h2(k) must be relatively prime to m

Examples:
- Let m be a power of 2 and let h2(k) always return an odd number
- Let m be prime and let h2(k) always return a positive integer less than m

There are Θ(m^2) distinct probe sequences
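A minimal sketch of a double hashing probe sequence, using the second example above (prime m, h2 always in 1 .. m-1). The concrete choices m = 13, h1(k) = k mod m, and h2(k) = 1 + (k mod (m-1)) are assumptions for illustration and do not come from the slides.

```python
M = 13  # assumed prime table size


def h1(k):
    """Assumed initial hash function."""
    return k % M


def h2(k):
    """Assumed second hash function; always in 1 .. M-1, so coprime to prime M."""
    return 1 + (k % (M - 1))


def probe_sequence(k):
    """All M probes h(k, i) = (h1(k) + i * h2(k)) mod M, for i = 0 .. M-1."""
    return [(h1(k) + i * h2(k)) % M for i in range(M)]


# Keys 5 and 18 collide on h1 but then follow different sequences.
print(probe_sequence(5)[:4])
print(probe_sequence(18)[:4])
```

Because h2(k) is relatively prime to M, each probe sequence visits all M slots before repeating, so an insert only fails when the table is actually full.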


BASIC HASH TABLE OPERATION

INSERT(key, value)
- we have discussed this a lot

value SEARCH(key)
- similar to INSERT
- When to stop searching?

DELETE(key)
- do not delete the value, mark it “deleted” instead (why?)

In separate chaining, all three operations are merely inserting, searching, and deleting in the appropriate linked list


INSERTION IN OPEN ADDRESSING

INSERT(key, value)
    // returns true if key is successfully inserted
    // returns false otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty or marked as deleted)
            table[idx] = (key, value)
            return true
        else
            i = i + 1
    return false


SEARCHING IN OPEN ADDRESSING

SEARCH(key)
    // returns associated value if key is found
    // returns null otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty)
            // reached an empty slot, so key must not be in the hash table
            return null
        else if (table[idx] not marked as deleted AND table[idx].key == key)
            return table[idx].value    // key found
        else
            i = i + 1                  // try the next slot
    return null    // tried all m possible slots and key not found


DELETION IN OPEN ADDRESSING

DELETE(key)
    // returns associated value if key is found and successfully deleted
    // returns null otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty)
            // reached an empty slot, so key must not be in the hash table
            return null
        else if (table[idx] not marked as deleted AND table[idx].key == key)
            temp = table[idx].value
            mark table[idx] as deleted
            return temp
        else
            i = i + 1                  // try the next slot
    return null    // tried all m possible slots and key not found
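The three operations can be put together in a small runnable sketch. It assumes integer keys and linear probing with h'(k) = k mod m as HASHFUNCTION; the class name `OpenAddressingTable` and the sentinel objects `EMPTY` and `DELETED` are hypothetical names chosen for illustration.

```python
EMPTY = object()     # sentinel: slot has never been used
DELETED = object()   # sentinel: slot held an entry that was deleted


class OpenAddressingTable:
    def __init__(self, m=13):
        self.m = m
        self.table = [EMPTY] * m

    def _probe(self, key, i):
        # Assumed HASHFUNCTION: linear probing with h'(k) = k mod m.
        return (key % self.m + i) % self.m

    def insert(self, key, value):
        for i in range(self.m):
            idx = self._probe(key, i)
            if self.table[idx] is EMPTY or self.table[idx] is DELETED:
                self.table[idx] = (key, value)
                return True
        return False                      # all m slots tried, table is full

    def search(self, key):
        for i in range(self.m):
            slot = self.table[self._probe(key, i)]
            if slot is EMPTY:
                return None               # empty slot ends the probe sequence
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for i in range(self.m):
            idx = self._probe(key, i)
            slot = self.table[idx]
            if slot is EMPTY:
                return None
            if slot is not DELETED and slot[0] == key:
                self.table[idx] = DELETED  # mark, do not empty the slot
                return slot[1]
        return None


t = OpenAddressingTable()
t.insert(5, "a")     # lands in slot 5
t.insert(18, "b")    # collides at slot 5, lands in slot 6
t.delete(5)          # slot 5 becomes DELETED, not EMPTY
print(t.search(18))  # still found: the search probes past the deleted slot
```

This also answers the "why?" on the deletion slide: if slot 5 were reset to empty, the search for key 18 would stop at slot 5 and wrongly report that the key is absent.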