
Hashing

CS 105

HashingSlide 2

Hashing - Introduction

In a dictionary, if it can be arranged so that the key is also the index into the array that stores the entries, searching and inserting items would be very fast.

Example: Empdata[1000], where the index is the employee ID number. To search for the employee with employee number 500, return Empdata[500]. Running time: O(1).
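The direct-addressing idea can be sketched in Java as follows. This is a minimal sketch that assumes employee IDs fall in the range 0 to 999. The Employee and EmployeeTable classes are hypothetical names introduced only for illustration.

```java
// Direct addressing: the employee ID itself is the array index.
// Employee and EmployeeTable are hypothetical names for this sketch.
class Employee {
    final int id;      // assumed to be in 0..999
    final String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class EmployeeTable {
    // One slot per possible employee ID, like Empdata[1000].
    private final Employee[] empData = new Employee[1000];

    // Insert: store the record at index = employee ID. O(1).
    void insert(Employee e) {
        empData[e.id] = e;
    }

    // Search: one array access; returns null if no employee has this ID. O(1).
    Employee search(int id) {
        return empData[id];
    }
}
```

Both operations are a single array access, which is why they run in O(1) time. The cost is one array slot for every possible key, whether or not it is used.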


Hash table

Hash table: a data structure, implemented as an array of objects, in which the search keys correspond to array indexes.

Insert and find operations involve straightforward array accesses: O(1) time complexity.


About hash tables

In the first example, this was relatively easy because the employee number is an integer.

Problem 1: the possible integer key values might be too large, making an appropriately sized array impractical. We need to map large integer values to smaller array indexes.

Problem 2: what if the key is a word in the English alphabet (e.g., last names)? We need to map names to integers (indexes).


Large numbers -> small numbers

Hash function: converts a number from a large range into a number from a smaller range (the range of array indices).

Size of array: as a rule of thumb, the array size should be about twice the size of the data set (2s). For example, for 50,000 words, use an array of 100,000 elements.


Hash function and modulo

Simplest hash function: use the modulo operator (%), which returns the remainder. For example, 33 % 10 = 3. General formula:

LargeNumber % SmallRange
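Here is a small Java sketch of modulo hashing, combined with the 2s sizing rule of thumb. ModuloHash, hash, and tableSizeFor are hypothetical names. The sketch uses Math.floorMod rather than the % operator so that negative keys still land in the range 0 to SmallRange - 1. The slides do not discuss negative keys; this is an added assumption.

```java
// Modulo hashing plus the "about 2s" table-size rule of thumb.
// ModuloHash, hash, and tableSizeFor are hypothetical names.
class ModuloHash {
    // LargeNumber % SmallRange. Math.floorMod keeps the result in 0..smallRange-1
    // even for negative keys, where Java's % would return a negative remainder.
    static int hash(int largeNumber, int smallRange) {
        return Math.floorMod(largeNumber, smallRange);
    }

    // Rule of thumb: array size about twice the size of the data set.
    static int tableSizeFor(int dataSetSize) {
        return 2 * dataSetSize;
    }
}
```

For example, ModuloHash.hash(33, 10) gives 3, and ModuloHash.tableSizeFor(50000) gives 100000.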


Hash functions for names

Sum of Digits Method: map the letters A-Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.) and add up the values of the letters. For example, for “cats” (c=3, a=1, t=20, s=19): 3 + 1 + 20 + 19 = 43.

“cats” will be stored at index 43. Use the modulo operation (%) if you need to map to a smaller array.
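The Sum of Digits Method for names can be sketched as below. The sketch assumes words contain only the letters A-Z, in either case. SumOfLettersHash, letterSum, and hash are hypothetical names.

```java
import java.util.Locale;

// Sum of Digits Method for names: a=1, b=2, ..., z=26, summed over the word.
// SumOfLettersHash, letterSum, and hash are hypothetical names.
class SumOfLettersHash {
    // Assumes the word contains only the letters A-Z (either case).
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("not a letter A-Z: " + c);
            }
            sum += c - 'a' + 1;   // 'a' -> 1, 'b' -> 2, ...
        }
        return sum;
    }

    // Optionally fold the sum into a smaller array with modulo.
    static int hash(String word, int tableSize) {
        return letterSum(word) % tableSize;
    }
}
```

Here letterSum("cats") returns 43, and hash("cats", 10) returns 3.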


Collisions

Problem: too many words map to the same index. “was”, “tin”, “give”, “tend”, “moan”, “tick”, and several other words also add up to 43. These are called collisions (cases where two different search keys hash to the same index value).

Collisions can occur even with integer keys. Suppose the size of the hash table is 100: keys 158 and 358 both hash to 58 using the modulo hash function.
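To see these collisions concretely, the hypothetical CollisionDemo class below checks both kinds. It applies the letter-sum hash to a list of words, assuming lowercase a-z input. It also tests whether two integer keys collide under modulo hashing.

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates collisions for the letter-sum hash and for modulo hashing.
// CollisionDemo and its methods are hypothetical names; lowercase a-z input assumed.
class CollisionDemo {
    // The letter-sum hash: a=1, b=2, ..., z=26.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    // Returns every word whose letter sum equals the given index.
    static List<String> wordsHashingTo(int index, List<String> words) {
        List<String> hits = new ArrayList<>();
        for (String w : words) {
            if (letterSum(w) == index) {
                hits.add(w);
            }
        }
        return hits;
    }

    // Two different keys collide under modulo hashing if their remainders match.
    static boolean collide(int k1, int k2, int tableSize) {
        return k1 != k2 && k1 % tableSize == k2 % tableSize;
    }
}
```

Given "cats", "was", "tin", "give", "tend", "moan", and "tick", wordsHashingTo(43, ...) returns all seven words. collide(158, 358, 100) is true because both keys leave remainder 58.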


Collision resolution policy

We need to know what to do when a collision occurs, i.e., during an insert operation, what if the array slot is already occupied?

A common policy: go to the next available slot (linear probing), “wrapping around” the array if necessary.

Consequence: when searching, start at the index given by the hash function and check whether the element there is the one you are looking for. If not, try the next slots. How do you know that the element is not in the array?


Probe sequence

The probe sequence is the sequence of array slots (indexes) to which a key value could map. The first index in the probe sequence is the home position, the value of the hash function; the next indexes are the alternative slots.

Example: suppose the array size is 10 and the hash function is h(K) = K % 10. The probe sequence for K = 25 is 5, 6, 7, 8, 9, 0, 1, 2, 3, 4. Here we assume the collision resolution policy of going to the next slot, p(K, i) = i, so the i-th slot probed is (h(K) + p(K, i)) % 10.

Goal: the probe sequence should visit every array slot.
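The following sketch generates the probe sequence under the assumptions of this example: h(K) = K % M, p(K, i) = i, and non-negative keys. ProbeSequence and its methods are hypothetical names.

```java
// Probe sequence for linear probing: home position h(K) = K % M,
// then offsets p(K, i) = i, wrapping around the array with % M.
// ProbeSequence and its methods are hypothetical names; keys assumed non-negative.
class ProbeSequence {
    static int h(int key, int m) {
        return key % m;
    }

    static int p(int key, int i) {
        return i;   // "go to the next slot" policy
    }

    // Returns the M slots visited for key K, in probe order.
    static int[] probeSequence(int key, int m) {
        int[] seq = new int[m];
        int home = h(key, m);
        for (int i = 0; i < m; i++) {
            seq[i] = (home + p(key, i)) % m;
        }
        return seq;
    }
}
```

ProbeSequence.probeSequence(25, 10) returns [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]. This visits every slot exactly once, so p(K, i) = i meets the goal of exhausting the array slots.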


Recap: hash table operations

Insert object Obj with key value K (table size M):

    home <- h(K)
    for i <- 0 to M-1 do
        pos <- (home + p(K, i)) % M
        if HT[pos] is null then
            HT[pos] <- Obj
            return
        else if HT[pos].getKey() = K then
            throw exception “error: duplicate record”   // alternative: overwrite
    throw exception “error: table is full”

Finding an object with key value K:

    home <- h(K)
    for i <- 0 to M-1 do
        pos <- (home + p(K, i)) % M
        if HT[pos] is null then
            throw exception “not found”
        else if HT[pos].getKey() = K then
            return HT[pos]
    throw exception “not found”
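The insert and find operations can be sketched in Java as a linear-probing table over int keys. Record and LinearProbingTable are hypothetical names. Exceptions stand in for the "throw exception" steps, and the table size M is passed to the constructor. In each loop the null check comes before getKey(), so an empty slot is never dereferenced.

```java
import java.util.NoSuchElementException;

// Linear-probing hash table for int keys: h(K) = K mod M, p(K, i) = i.
// Record and LinearProbingTable are hypothetical names for this sketch.
class Record {
    private final int key;
    private final String value;

    Record(int key, String value) {
        this.key = key;
        this.value = value;
    }

    int getKey() { return key; }
    String getValue() { return value; }
}

class LinearProbingTable {
    private final Record[] ht;
    private final int m;   // table size M

    LinearProbingTable(int m) {
        this.m = m;
        this.ht = new Record[m];
    }

    private int h(int key) { return Math.floorMod(key, m); }   // home position
    private int p(int key, int i) { return i; }                // linear probing

    void insert(Record obj) {
        int k = obj.getKey();
        int home = h(k);
        for (int i = 0; i < m; i++) {
            int pos = (home + p(k, i)) % m;
            if (ht[pos] == null) {              // free slot: store the record here
                ht[pos] = obj;
                return;
            } else if (ht[pos].getKey() == k) {
                throw new IllegalStateException("error: duplicate record");
            }
        }
        throw new IllegalStateException("error: table is full");
    }

    Record find(int k) {
        int home = h(k);
        for (int i = 0; i < m; i++) {
            int pos = (home + p(k, i)) % m;
            if (ht[pos] == null) {              // an empty slot ends the search
                throw new NoSuchElementException("not found");
            } else if (ht[pos].getKey() == k) {
                return ht[pos];
            }
        }
        throw new NoSuchElementException("not found");   // probed all M slots
    }
}
```

Consider a table of size 10 holding keys 25 and 35. Key 35 collides with 25 at slot 5 and is stored in slot 6. find(35) therefore probes slots 5 and 6, while find(45) stops at the empty slot 7.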


Hash table operations

Note: although insert and find run in O(1) time under typical conditions, their worst-case time complexity is O(n).

Something to think about: characterize the worst-case scenarios for insert and find.
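One worst-case scenario can be made concrete. Suppose every key has the same home position: then each insertion must probe past all of the previously inserted keys. The hypothetical WorstCaseProbes sketch below counts the slots examined per insertion. It assumes h(K) = K % M, p(K, i) = i, and distinct non-negative keys. It illustrates one scenario, not the only one.

```java
// Counts the slots examined by each insertion under linear probing,
// with h(K) = K % m and p(K, i) = i. WorstCaseProbes is a hypothetical name;
// keys are assumed non-negative and distinct.
class WorstCaseProbes {
    static int[] probesPerInsert(int[] keys, int m) {
        boolean[] occupied = new boolean[m];
        int[] probes = new int[keys.length];
        for (int j = 0; j < keys.length; j++) {
            int home = keys[j] % m;
            int i = 0;
            while (occupied[(home + i) % m]) {   // skip slots already in use
                i++;
                if (i == m) {
                    throw new IllegalStateException("table is full");
                }
            }
            occupied[(home + i) % m] = true;
            probes[j] = i + 1;   // occupied slots skipped, plus the free slot used
        }
        return probes;
    }
}
```

Insert keys 0, 10, 20, up to 90 into a table of size 10. Every key hashes to slot 0, so the counts are 1, 2, ..., 10: the n-th insertion examines n slots. A single insert is therefore O(n). A find for the last key walks the same n slots.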


Removing elements

Removing an element from a hash table during a delete operation poses a problem. If we set the corresponding hash table entry to null, subsequent find operations might not work properly. Recall that in the find algorithm, seeing a null means the target element is not found, but in fact the element might be in a later slot of its probe sequence.

Solution: tombstones. Arrange it so that deleted entries look empty when inserting, but not when searching. This requires a simple flag on the stored objects.
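Below is a sketch of tombstone deletion on top of linear probing, using a boolean flag on each stored entry. TombstoneTable and Entry are hypothetical names; keys are ints and values are Strings. One design choice goes beyond the slide. Insert reuses the first tombstone it passes, but it keeps scanning until a true null so that it can still detect a duplicate key stored further along the probe sequence.

```java
import java.util.NoSuchElementException;

// Linear probing with tombstones. TombstoneTable and Entry are hypothetical names.
class TombstoneTable {
    // A slot is null (never used) or an Entry; deleted == true marks a tombstone.
    private static final class Entry {
        final int key;
        final String value;
        boolean deleted;   // the "simple flag" on each stored object

        Entry(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Entry[] ht;
    private final int m;

    TombstoneTable(int m) {
        this.m = m;
        this.ht = new Entry[m];
    }

    private int pos(int key, int i) {
        return (Math.floorMod(key, m) + i) % m;   // linear probing
    }

    // Insert: a tombstone counts as free, but scanning continues to catch duplicates.
    void insert(int key, String value) {
        int firstFree = -1;
        for (int i = 0; i < m; i++) {
            int p = pos(key, i);
            Entry e = ht[p];
            if (e == null) {
                if (firstFree < 0) firstFree = p;
                break;                        // the key cannot appear further along
            }
            if (e.deleted) {
                if (firstFree < 0) firstFree = p;
            } else if (e.key == key) {
                throw new IllegalStateException("error: duplicate record");
            }
        }
        if (firstFree < 0) throw new IllegalStateException("error: table is full");
        ht[firstFree] = new Entry(key, value);
    }

    // Find: tombstones do not stop the search; only a true null does.
    String find(int key) {
        for (int i = 0; i < m; i++) {
            Entry e = ht[pos(key, i)];
            if (e == null) break;
            if (!e.deleted && e.key == key) return e.value;
        }
        throw new NoSuchElementException("not found");
    }

    // Remove: flag the entry as a tombstone instead of setting the slot to null.
    void remove(int key) {
        for (int i = 0; i < m; i++) {
            Entry e = ht[pos(key, i)];
            if (e == null) break;
            if (!e.deleted && e.key == key) {
                e.deleted = true;
                return;
            }
        }
        throw new NoSuchElementException("not found");
    }
}
```

Suppose key 35 sits in slot 6 because 25 occupied slot 5, and 25 is then removed. find(35) still succeeds because it passes over the tombstone in slot 5. A later insert of 45 reuses that slot.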


Hash tables in Java

java.util.Hashtable

Important methods of the Hashtable<K,V> class (code written before Java 5 uses Object in place of K and V):

V put(K key, V value)
V get(Object key)
V remove(Object key)
boolean containsKey(Object key)
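The following short example uses the four java.util.Hashtable methods listed above. HashtableDemo and the sample employee data are hypothetical. The calls themselves (put, get, remove, containsKey) are standard Hashtable<K,V> methods.

```java
import java.util.Hashtable;

// Exercises put, get, remove, and containsKey on java.util.Hashtable.
// HashtableDemo and the sample employee data are hypothetical.
class HashtableDemo {
    // Builds a table mapping employee numbers to names.
    static Hashtable<Integer, String> employees() {
        Hashtable<Integer, String> table = new Hashtable<>();
        table.put(500, "Alice");   // put returns the previous value for the key, or null
        table.put(158, "Bob");
        table.put(358, "Carol");
        return table;
    }

    // Removes and returns the name for an ID, or returns null if the ID is absent.
    static String removeIfPresent(Hashtable<Integer, String> table, int id) {
        if (!table.containsKey(id)) {
            return null;
        }
        return table.remove(id);   // remove returns the value that was mapped to the key
    }
}
```

For example, employees().get(500) returns "Alice". Calling removeIfPresent(table, 158) returns "Bob", after which containsKey(158) is false. Hashtable rejects null keys and values: put throws NullPointerException. Its Javadoc also recommends java.util.HashMap when a thread-safe implementation is not needed.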


Summary

Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations under typical conditions.

Caveat: O(n) in the worst case because of the possibility of collisions.

Requires a hash function (which maps keys to array indices) and a collision resolution policy.

The probe sequence is the sequence of array slots that an object could occupy, given its key.

In Java: use the Hashtable class.

