hashing1 hashing it’s not just for breakfast anymore!

31
hashing 1 Hashing Hashing It’s not just for breakfast anymore!

Post on 20-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: hashing1 Hashing It’s not just for breakfast anymore!

hashing 1

HashingHashing

It’s not just for breakfast anymore!

Page 2: hashing1 Hashing It’s not just for breakfast anymore!

hashing 2

Hashing: the factsHashing: the facts

• Approach that involves both storing and searching for values

• Behavior is linear in the worst case, but strong competitor with binary searching in the average case

• Hashing makes it easy to add and delete elements, an advantage over binary search (since the latter requires sorted array)

Page 3: hashing1 Hashing It’s not just for breakfast anymore!

hashing 3

Dictionary ADTDictionary ADT

• Previously, we have seen a dictionary ADT implemented as a binary search tree

• A hash table can be used to provide an array-based dictionary implementation

• Abstract properties of dictionary:– every item has a key– to retrieve an item, specify key and retrieval

process fetches associated data

Page 4: hashing1 Hashing It’s not just for breakfast anymore!

hashing 4

Setting up the arraySetting up the array

• One approach to an array-based dictionary would be to create consecutive keys, storing the records so that each key corresponds to its index -- this is the method used in MS Access, for example

• An alternative would be to use an existing attribute of the data to be stored as the key value; this approach is more typical of hashing

Page 5: hashing1 Hashing It’s not just for breakfast anymore!

hashing 5

Setting up the arraySetting up the array

• Use of existing key field presents challenges:– Value may be too large for indexing: e.g. social

security number– No guarantee that individual values will be

close enough together for effective indexing: e.g. last 4 digits of social security numbers of students in a class

Page 6: hashing1 Hashing It’s not just for breakfast anymore!

hashing 6

Solution: hashingSolution: hashing

• Instead of direct use of data field, a function is applied to the original value to produce a valid index: this is called the hash function

• The hash function maps the key to an index that can be used to insert data into the array or to retrieve data based on a given key

• An array that uses hashing for indexing is called a hash table

Page 7: hashing1 Hashing It’s not just for breakfast anymore!

hashing 7

Operations on a hash tableOperations on a hash table

• Inserting an item– calculate hash value (index) from item key– check index to determine if space is open

• if open, insert item

• if not open, collision occurs; search through array for next open slot

– requires some mechanism for recognizing an empty space; can’t just start with uninitialized array

Page 8: hashing1 Hashing It’s not just for breakfast anymore!

hashing 8

Open-address hashingOpen-address hashing

• The insertion scheme just described uses open-address hashing

• In open addressing, collisions are resolved by placing a new item in the next open spot in the array

• Scheme requires that the key field of each array element be initialized to some known value; -1, for example

Page 9: hashing1 Hashing It’s not just for breakfast anymore!

hashing 9

Inserting a New RecordInserting a New RecordInserting a New RecordInserting a New Record

• In order to insert a new record, the key must somehow be converted to an array index.

• The index is called the hash value of the key.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

Number 580625685

Page 10: hashing1 Hashing It’s not just for breakfast anymore!

hashing 10

Inserting a New RecordInserting a New RecordInserting a New RecordInserting a New Record

• Typical hash function – 701 is the number of items in the

array

– Number is the original key value

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

Number 580625685

(Number mod 701)

What is (580625685 mod 701) ? 3

Page 11: hashing1 Hashing It’s not just for breakfast anymore!

hashing 11

Inserting a New RecordInserting a New RecordInserting a New RecordInserting a New Record• The hash value is used for

the location of the new record.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

[3]

Page 12: hashing1 Hashing It’s not just for breakfast anymore!

hashing 12

CollisionsCollisionsCollisionsCollisions• Here is another new record

to insert, with a hash value of 2.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685

Number 701466868

My hashvalue is [2].

Page 13: hashing1 Hashing It’s not just for breakfast anymore!

hashing 13

CollisionsCollisionsCollisionsCollisions• This is called a collision,

because there is already another valid record at [2].

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685

Number 701466868

When a collision occurs,move forward until you

find an empty spot.

When a collision occurs,move forward until you

find an empty spot.

Page 14: hashing1 Hashing It’s not just for breakfast anymore!

hashing 14

CollisionsCollisionsCollisionsCollisions

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

The new record goesin the empty spot.

The new record goesin the empty spot.

Page 15: hashing1 Hashing It’s not just for breakfast anymore!

hashing 15

Operations on a hash tableOperations on a hash table

• Retrieving an item– calculate hash value based on desired key– search array, beginning at calculated index, for

desired data– search is finished when:

• item is found; successful search

• an empty index is encountered; unsuccessful search

Page 16: hashing1 Hashing It’s not just for breakfast anymore!

hashing 16

Searching for a KeySearching for a KeySearching for a KeySearching for a Key• The data that's attached to a

key can be found fairly quickly.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Number 701466868

Page 17: hashing1 Hashing It’s not just for breakfast anymore!

hashing 17

Searching for a KeySearching for a KeySearching for a KeySearching for a Key• Calculate the hash value.• Check that location of the array

for the key.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Not me.

Number 701466868

My hashvalue is [2].

Page 18: hashing1 Hashing It’s not just for breakfast anymore!

hashing 18

Searching for a KeySearching for a KeySearching for a KeySearching for a Key• Keep moving forward until you

find the key, or you reach an empty spot.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Not me.

Number 701466868

My hashvalue is [2].

Not me. Yes!

Page 19: hashing1 Hashing It’s not just for breakfast anymore!

hashing 19

Searching for a KeySearching for a KeySearching for a KeySearching for a Key• When the item is found, the

information can be copied to the necessary location.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Number 701466868

My hashvalue is [2].

Page 20: hashing1 Hashing It’s not just for breakfast anymore!

hashing 20

Operations on a hash tableOperations on a hash table

• Deleting an item:– find index based on hashed key, as with

insertion and retrieval– mark record at index to indicate the spot is open

• can’t use ordinary “empty” designation -- this could interfere with record retrieval

• use alternative “open” designation: indicate the slot is open for insertion, but won’t stop a search

Page 21: hashing1 Hashing It’s not just for breakfast anymore!

hashing 21

Deleting a RecordDeleting a RecordDeleting a RecordDeleting a Record• Records may also be deleted from a hash table.• But the location must not be left as an ordinary

"empty spot" since that could interfere with searches.

• The location must be marked in some special way so that a search can tell that the spot used to have something in it.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

Number 580625685 Number 701466868

Pleasedelete me.

Page 22: hashing1 Hashing It’s not just for breakfast anymore!

Hash tables & Java: a breakfast Hash tables & Java: a breakfast classicclassic

• Java classes have built-in support for hash tables; all classes inherit the hashcode() method from Object

• The hashcode() method returns a pseudorandom number that can be used as input to a hash method

• One caution: if you define an equals() method for a new class, you should also override the hashcode method – this is because equal objects must have equal hashcodes

hashing 22

Page 23: hashing1 Hashing It’s not just for breakfast anymore!

A Table class implementationA Table class implementation

hashing 23

public class Table {private int manyItems;private Object[ ] data; // hash table dataprivate Object[ ] keys;// array of keys parallel to data arrayprivate boolean[ ] hasBeenUsed; // another parallel array used to indicate the status// of each index – if not currently in use, but has// been used previously, an index should not stop// a search operation

Page 24: hashing1 Hashing It’s not just for breakfast anymore!

ConstructorConstructor

hashing 24

public Table(int capacity) { if (capacity <= 0) throw new IllegalArgumentException("Capacity is negative"); keys = new Object[capacity]; data = new Object[capacity]; hasBeenUsed = new boolean[capacity]; }

Page 25: hashing1 Hashing It’s not just for breakfast anymore!

The put method (adds item to The put method (adds item to table)table)

hashing 25

public Object put(Object key, Object element) { int index = findIndex(key); Object answer; if (index != -1) { // The key is already in the table. answer = data[index]; data[index] = element; return answer; }

Page 26: hashing1 Hashing It’s not just for breakfast anymore!

put method continuedput method continued

hashing 26

else if (manyItems < data.length) { // key is not yet in Table. index = hash(key); while (keys[index] != null) index = nextIndex(index); keys[index] = key; data[index] = element; hasBeenUsed[index] = true; manyItems++; return null; } else { // The table is full. throw new IllegalStateException("Table is full."); }}

Page 27: hashing1 Hashing It’s not just for breakfast anymore!

remove methodremove method

hashing 27

public Object remove(Object key) { int index = findIndex(key); Object answer = null; if (index != -1) { answer = data[index]; keys[index] = null; data[index] = null; manyItems--; } return answer;}

Page 28: hashing1 Hashing It’s not just for breakfast anymore!

findIndex methodfindIndex method

hashing 28

private int findIndex(Object key) { int count = 0; int i = hash(key); while (count < data.length && hasBeenUsed[i]) { if (key.equals(keys[i])) return i; count++; i = nextIndex(i); } return -1;}

Page 29: hashing1 Hashing It’s not just for breakfast anymore!

nextIndex & containsKey nextIndex & containsKey methodsmethods

hashing 29

private int nextIndex(int i) { if (i+1 == data.length) return 0; else return i+1;}

public boolean containsKey(Object key) { return findIndex(key) != -1; }

Page 30: hashing1 Hashing It’s not just for breakfast anymore!

hash methodhash method

hashing 30

private int hash(Object key) { return Math.abs(key.hashCode( )) % data.length; }

Page 31: hashing1 Hashing It’s not just for breakfast anymore!

Java’s HashTable classJava’s HashTable class

• The Java API contains a HashTable class in the java.util package

• Unlike the implementation just presented, the API HashTable grows automatically when it approaches capacity

• This has important performance issues; when the array grows, the entire table must be rehashed

hashing 31