Hashing
CS 105
HashingSlide 2
Hashing - Introduction
In a dictionary, if the key can also serve as the index into the array that stores the entries, searching for and inserting items is very fast.
Example: Empdata[1000], index = employee ID number
  Search for the employee with employee number 500
  Return: Empdata[500]
  Running time: O(1)
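The direct-indexing idea can be sketched in Java roughly as follows. The Employee class, its fields, and the assumption that ID numbers fall in the range 0 to 999 are illustrative only and do not come from the slides.

```java
// Hypothetical record type; the slides only name the array Empdata.
class Employee {
    final int id;
    final String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class DirectIndexDemo {
    // Assumes employee IDs fall in 0..999, matching Empdata[1000].
    static final Employee[] empData = new Employee[1000];

    // The key is the index, so both operations are single array accesses: O(1).
    static void insert(Employee e) {
        empData[e.id] = e;
    }

    static Employee find(int id) {
        return empData[id]; // null if no employee has this ID
    }

    public static void main(String[] args) {
        insert(new Employee(500, "Ana"));
        System.out.println(find(500).name);
    }
}
```

Note that an ID outside 0..999 would raise an ArrayIndexOutOfBoundsException in this sketch.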
HashingSlide 3
Hash table
Hash table: a data structure, implemented as an array of objects, where the search keys correspond to the array indexes
Insert and find operations involve straightforward array accesses: O(1) time complexity
HashingSlide 4
About hash tables
The first example was relatively easy because the employee number is an integer that can be used directly as an index
Problem 1: possible integer key values might be too large; creating an array that big might be impractical
  Need to map large integer values to smaller array indexes
Problem 2: what if the key is a word spelled with the English alphabet (e.g. last names)?
  Need to map names to integers (indexes)
HashingSlide 5
Large numbers -> small numbers
Hash function - converts a number from a large range into a number from a smaller range (the range of array indices)
Size of array
  Rule of thumb: the array size should be about twice the size of the data set (2s)
  For 50,000 words, use an array of 100,000 elements
HashingSlide 6
Hash function and modulo
Simplest hash function: use the modulo operator (returns the remainder)
  For example, 33 % 10 = 3
General formula:
  LargeNumber % SmallRange
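A minimal Java sketch of the modulo hash function, assuming non-negative integer keys. The class and method names are made up for illustration.

```java
class ModuloHash {
    // Maps a non-negative key into the range 0..tableSize-1.
    // Assumption: key >= 0. For negative keys Java's % can return a
    // negative remainder; Math.floorMod(key, tableSize) avoids that.
    static int hash(int key, int tableSize) {
        return key % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash(33, 10)); // prints 3
    }
}
```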
HashingSlide 7
Hash functions for names
Sum of Digits Method
  Map the letters A-Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.)
  Add up the values of the letters in the word
  For example, “cats” (c=3, a=1, t=20, s=19): 3+1+20+19 = 43
  “cats” will be stored using index = 43
  Can use the modulo operation (%) if you need to map to a smaller array
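The letter-sum method above might look like this in Java. This is a sketch that assumes the word contains only the letters a-z or A-Z; the names letterSum and hash are illustrative.

```java
import java.util.Locale;

class LetterSumHash {
    // Sums the alphabet positions of the letters (a=1, b=2, ..., z=26).
    // Assumption: the word contains only ASCII letters.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    // Optional reduction with modulo when the array is smaller than the sums.
    static int hash(String word, int tableSize) {
        return letterSum(word) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("cats")); // prints 43
        System.out.println(hash("cats", 10));  // prints 3
    }
}
```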
HashingSlide 8
Collisions
Problem: too many words with the same index
  “was”, “tin”, “give”, “tend”, “moan”, “tick” and several other words add to 43
  These are called collisions (cases where two different search keys hash to the same index value)
Can occur even when dealing with integers
  Suppose the size of the hash table is 100
  Keys 158 and 358 hash to the same value (58) when using the modulo hash function
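To see the collisions concretely, the following Java sketch groups the words from this slide by their letter-sum index and checks the two integer keys. The class and helper names are illustrative, and letterSum assumes lowercase a-z only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CollisionDemo {
    // Letter-sum index (a=1 ... z=26); assumes lowercase a-z only.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    // Groups words by index; any list with more than one word is a collision.
    static Map<Integer, List<String>> groupByIndex(String[] words) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String w : words) {
            buckets.computeIfAbsent(letterSum(w), k -> new ArrayList<>()).add(w);
        }
        return buckets;
    }

    public static void main(String[] args) {
        String[] words = {"cats", "was", "tin", "give", "tend", "moan", "tick"};
        System.out.println(groupByIndex(words)); // all seven words land on index 43
        System.out.println(158 % 100);           // prints 58
        System.out.println(358 % 100);           // prints 58
    }
}
```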
HashingSlide 9
Collision resolution policy
Need to know what to do when a collision occurs; i.e., during an insert operation, what if the array slot is already occupied?
Most common policy: go to the next available slot
  “Wrap around” the array if necessary
Consequence: when searching, use the hash function to find the starting slot, but first check whether the element there is the one you are looking for. If not, try the next slots.
  How do you know if the element is not in the array?
HashingSlide 10
Probe sequence
Sequence of indexes that serve as array slots where a key value could be stored
  The first index in the probe sequence is the home position, the value of the hash function
  The next indexes are the alternative slots
Example: suppose the array size is 10, and the hash function is h(K) = K % 10
  The probe sequence for K = 25 is: 5, 6, 7, 8, 9, 0, 1, 2, 3, 4
  Here, we assume the most common collision resolution policy of going to the next slot: p(K,i) = i
Goal: the probe sequence should exhaust the array slots
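The probe sequence from the example can be generated with a few lines of Java. This sketch uses h(K) = K % M and p(K,i) = i as on the slide and assumes non-negative keys; the class name is made up.

```java
import java.util.Arrays;

class ProbeSequence {
    // Hash function from the slide: h(K) = K % M (assumes K >= 0).
    static int h(int key, int m) {
        return key % m;
    }

    // Probe function for the "go to the next slot" policy: p(K, i) = i.
    static int p(int key, int i) {
        return i;
    }

    // Returns all M slots in the order they would be tried for this key.
    static int[] probeSequence(int key, int m) {
        int home = h(key, m);
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) {
            seq[i] = (home + p(key, i)) % m; // wrap around the array
        }
        return seq;
    }

    public static void main(String[] args) {
        // prints [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
        System.out.println(Arrays.toString(probeSequence(25, 10)));
    }
}
```

Because p(K,i) = i visits every offset from 0 to M-1, this sequence touches each slot exactly once, which is the stated goal.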
HashingSlide 11
Recap: hash table operations
Insert object Obj with key value K (M is the array size)
  home <- h(K)
  for i <- 0 to M-1 do
    pos <- (home + p(K,i)) % M
    if HT[pos] is null then
      HT[pos] <- Obj
      return
    else if HT[pos].getKey() = K then
      throw exception “error: duplicate record”  // alternative: overwrite
  throw exception “error: table full”
Finding an object with key value K
  home <- h(K)
  for i <- 0 to M-1 do
    pos <- (home + p(K,i)) % M
    if HT[pos] is null then
      throw exception “not found”
    else if HT[pos].getKey() = K then
      return HT[pos]
  throw exception “not found”
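One way the insert and find operations might be written in Java is sketched below. It assumes integer keys, String values, the next-slot policy p(K,i) = i, and no deletions. The class name, the use of two parallel arrays, and the choice of exception types are illustrative choices, not taken from the slides.

```java
import java.util.NoSuchElementException;

class LinearProbingTable {
    private final Integer[] keys;  // null marks an empty slot
    private final String[] values;
    private final int m;           // array size M

    LinearProbingTable(int capacity) {
        m = capacity;
        keys = new Integer[m];
        values = new String[m];
    }

    private int h(int key) {
        return Math.floorMod(key, m); // like key % m, but never negative
    }

    void insert(int key, String value) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {          // free slot: store here
                keys[pos] = key;
                values[pos] = value;
                return;
            }
            if (keys[pos] == key) {           // same key already stored
                throw new IllegalArgumentException("duplicate record: " + key);
            }
        }
        throw new IllegalStateException("table is full");
    }

    String find(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {
                break;                        // empty slot: key is not stored
            }
            if (keys[pos] == key) {
                return values[pos];
            }
        }
        throw new NoSuchElementException("not found: " + key);
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(100);
        table.insert(158, "first");
        table.insert(358, "second");          // collides with 158, goes to the next slot
        System.out.println(table.find(358));
    }
}
```

The null check comes before the key comparison so that an empty slot is never dereferenced.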
HashingSlide 12
Hash table operations
Note: although insert and find run in O(1) time under typical conditions, the time complexity in the worst case is O(n)
Something to think about: characterize the worst-case scenarios for insert and find
HashingSlide 13
Removing elements
Removing an element from a hash table during a delete operation poses a problem
  If we set the corresponding hash table entry to null, then succeeding find operations might not work properly
  Recall that for the find algorithm, seeing a null means the target element is not found, but in fact the element might be in a later slot
Solution: tombstone
  Arrange it so that deleted entries seem null when inserting, but don’t seem null when searching
  Requires a simple flag on the objects stored
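A rough Java sketch of the tombstone idea follows. It stores integer keys only and keeps the flag in a separate per-slot State array instead of on the stored objects; that layout and all the names are assumptions made for illustration.

```java
import java.util.Arrays;

class TombstoneTable {
    // DELETED is the tombstone: free for inserts, but not a stop sign for searches.
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final int[] keys;
    private final State[] states;
    private final int m;

    TombstoneTable(int capacity) {
        m = capacity;
        keys = new int[m];
        states = new State[m];
        Arrays.fill(states, State.EMPTY);
    }

    private int h(int key) {
        return Math.floorMod(key, m);
    }

    boolean contains(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (states[pos] == State.EMPTY) return false; // true gap: stop
            if (states[pos] == State.OCCUPIED && keys[pos] == key) return true;
            // DELETED slot: keep probing
        }
        return false;
    }

    void insert(int key) {
        if (contains(key)) return; // ignore duplicates in this sketch
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (states[pos] != State.OCCUPIED) { // EMPTY or tombstone: reuse
                keys[pos] = key;
                states[pos] = State.OCCUPIED;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    void remove(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (states[pos] == State.EMPTY) return; // key was never stored
            if (states[pos] == State.OCCUPIED && keys[pos] == key) {
                states[pos] = State.DELETED;        // leave a tombstone
                return;
            }
        }
    }

    public static void main(String[] args) {
        TombstoneTable t = new TombstoneTable(10);
        t.insert(15);
        t.insert(25);                       // collides with 15, stored in the next slot
        t.remove(15);
        System.out.println(t.contains(25)); // prints true
    }
}
```

If remove set the slot back to EMPTY instead, the search for 25 would stop at slot 5 and wrongly report that 25 is missing.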
HashingSlide 14
Hash tables in Java
java.util.Hashtable
Important methods for the Hashtable class
  put(Object key, Object entry)
  Object get(Object key)
  remove(Object key)
  boolean containsKey(Object key)
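A short usage sketch of these four methods. It uses the generic form Hashtable<K, V> from current Java instead of the plain Object signatures listed above; the keys and values are made-up sample data.

```java
import java.util.Hashtable;

class HashtableDemo {
    // Builds a small table mapping employee numbers to names (sample data).
    static Hashtable<Integer, String> build() {
        Hashtable<Integer, String> table = new Hashtable<>();
        table.put(500, "Ana");
        table.put(158, "Ben");
        return table;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> table = build();
        System.out.println(table.get(500));         // prints Ana
        System.out.println(table.containsKey(158)); // prints true
        table.remove(158);
        System.out.println(table.containsKey(158)); // prints false
        System.out.println(table.get(158));         // prints null (key is absent)
    }
}
```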
HashingSlide 15
Summary
Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations
  Caveat: O(n) in the worst case because of the possibility of collisions
Requires a hash function (maps keys to array indices) and a collision resolution policy
  A probe sequence is the sequence of array slots that an object could occupy, given its key
In Java: use the Hashtable class