hashing o(1) data access (almost) -access, insertion, deletion, updating in constant time (on...
![Page 1: Hashing O(1) data access (almost) -access, insertion, deletion, updating in constant time (on average) but at a price… references: Weiss, Goodrich & Tamassia,](https://reader035.vdocuments.mx/reader035/viewer/2022062906/5a4d1b217f8b9ab0599956b4/html5/thumbnails/1.jpg)
Hashing
O(1) data access (almost)
- access, insertion, deletion, updating in constant time (on average)
- but at a price…
references: Weiss, Goodrich & Tamassia, Main
“associative memory”

Access to data
- O(n): linked list, array
- O(log n): sorted array, search tree
- O(1): array by index
Index access is O(1) because the data location is found by computation, not search.

Computed access to array data
e.g. array of objects arr at memory location 8423400 (4 bytes per element)
address of arr[12]: 8423400 + 4 * 12 = 8423448

Access by Hashing
Hashing applies the same concept at the software level:
- access operations do not search for data keys; they compute data indexes
- index = f(data.key)
- performance is “almost” O(1)

Access example
- student number is the key: s01324092
- i = f(key) = 01324092 % 10000 = 4092
- data for student s01324092 is at location 4092 in the data array: arr[4092].key = “s01324092”
problems
- wasted storage: array must have 10 000 elements
- competition for space: s01324092 and s02894092
- iterated operations are more difficult
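The access example can be tried out with a minimal Java sketch. The class name `StudentIndexDemo` and the method name `indexFor` are illustrative assumptions, not names from the slides; the table size of 10 000 is taken from the example.

```java
// Illustrative sketch; class and method names are assumptions, not from the slides.
class StudentIndexDemo {
    // Drop the leading 's', parse the remaining digits, keep the last four digits.
    static int indexFor(String studentNumber) {
        return Integer.parseInt(studentNumber.substring(1)) % 10000;
    }

    public static void main(String[] args) {
        System.out.println(indexFor("s01324092")); // 4092
        System.out.println(indexFor("s02894092")); // 4092 as well: competition for space
    }
}
```

Both student numbers end in 4092, so they compete for the same array location.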

Access example
key: “s01324092”
i = f(key) = 01324092 % 10000 = 4092
f(“s01324092”) = 4092
[Diagram: key “s01324092” mapped to index 4092 of the data array]

Hashing terminology
- student number is the key: s01324092
- i = f(key) = 01324092 % 10000 = 4092 (f is the hash function)
- data for student s01324092 is at location 4092 in the data array (the hash table): arr[4092].key = “s01324092”
problems
- wasted storage: array must have 10 000 elements
- competition for space (a collision): s01324092 and s02894092

Hashing Fact-of-Life
Collisions are unavoidable.
Solution strategy:
- minimize the number of collisions
- resolve the collisions that do occur

Hash functions
- for a hash table of size n, map key -> {0, …, n-1}
- typical function: key -> integer % n
e.g.
    // student number key
    int hash(String stuNo, int n)
    {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }

Hash function goals
- simple as possible (speed)
- distribute keys uniformly over indices (minimize collisions)
two steps:
1. transform key to integer if necessary (hashCode())
2. restrict integer to range of data array (hash())
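The two steps can be sketched as one small Java method. The class name `TwoStepHash` is hypothetical, and the use of `Math.floorMod` rather than `%` is an assumption added here: `hashCode()` may return a negative integer, and `%` would then give a negative index.

```java
// Illustrative sketch; TwoStepHash is a hypothetical name, not from the slides.
class TwoStepHash {
    static int hash(Object key, int n) {
        // Step 1: transform the key to an integer.
        int code = key.hashCode();
        // Step 2: restrict the integer to the range 0 .. n-1.
        // floorMod (an assumption here) keeps the result non-negative.
        return Math.floorMod(code, n);
    }

    public static void main(String[] args) {
        System.out.println(hash("s01324092", 10000));
        System.out.println(hash(Integer.valueOf(-7), 5)); // 3, not -2
    }
}
```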

Java’s hashCode() method
public int hashCode()
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
- Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
- If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
Returns: a hash code value for this object.

Other hashing methods
- hashCode can be overridden for any class
- hashCode usually should be overridden to:
  - fit actual data
  - improve performance
  - remove dependence on location in memory

Design model
- equals() is based on the key field: a match implies the same record
- hashCode function is also based on the key field: the key is used for access
BUT
- the hash function is also based on the table size
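A record class following this design model might look like the sketch below. The class `Student` and its fields are hypothetical; the point is only that `equals()` and `hashCode()` both depend on the key field and on nothing else, while the table size enters later, in the table's own hash function.

```java
// Illustrative sketch; Student and its fields are hypothetical names.
class Student {
    private final String studentNumber; // the key field
    private final String name;          // non-key data

    Student(String studentNumber, String name) {
        this.studentNumber = studentNumber;
        this.name = name;
    }

    String getName() {
        return name;
    }

    // A match on the key field implies the same record.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Student)) return false;
        return studentNumber.equals(((Student) other).studentNumber);
    }

    // Based on the same key field, so equal objects produce equal hash codes.
    @Override
    public int hashCode() {
        return studentNumber.hashCode();
    }
}
```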

Resizing table
If the table is resized, all data must be re-entered into the new array, not just copied.
e.g.:
    int hash(String stuNo, int n)
    {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }
    hash("s01324092", 10000) => 4092
    hash("s01324092", 6667) => 4026
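A minimal sketch of re-entering the data on a resize follows. The class name `RehashDemo`, the method `resize`, and the use of a plain `String[]` of keys with linear probing are assumptions for illustration; the sketch also assumes the new table has room for every key.

```java
// Illustrative sketch; RehashDemo and resize are hypothetical names.
class RehashDemo {
    static int hash(String stuNo, int n) {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }

    // Re-enter every key into a new array: the index depends on the new size,
    // so copying cell i of the old array to cell i of the new one would be wrong.
    // Assumes newSize is at least the number of stored keys.
    static String[] resize(String[] oldTable, int newSize) {
        String[] newTable = new String[newSize];
        for (String key : oldTable) {
            if (key == null) continue;          // skip empty cells
            int i = hash(key, newSize);
            while (newTable[i] != null) {       // linear probing on collision
                i = (i + 1) % newSize;
            }
            newTable[i] = key;
        }
        return newTable;
    }
}
```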

Resolving collisions
When a collision occurs on insertion:
- internal: store the new element at another location in the table
- external: store the new element outside the table

Linear probing
- sequential search for the next available location to store data when a collision occurs
e.g. hash(key) -> index = 4
if table[4] is occupied, try table[5], then table[6], …, until an empty location is found
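Linear probing insertion can be sketched as below. The class name `LinearProbeInsert`, the hash function `key % table.length`, and the use of -1 to mark an empty cell are assumptions chosen for a small integer example; keys are assumed to be non-negative.

```java
import java.util.Arrays;

// Illustrative sketch; names and the -1 "empty" marker are assumptions.
class LinearProbeInsert {
    static final int EMPTY = -1;

    // Inserts a non-negative key and returns the index where it was stored.
    static int insert(int[] table, int key) {
        int i = key % table.length;             // initial index from the hash function
        int probes = 0;
        while (table[i] != EMPTY) {             // occupied: try the next cell
            if (++probes == table.length) {
                throw new IllegalStateException("Table is full.");
            }
            i = (i + 1) % table.length;         // wrap around at the end of the array
        }
        table[i] = key;
        return i;
    }

    public static void main(String[] args) {
        int[] table = new int[10];
        Arrays.fill(table, EMPTY);
        for (int key : new int[] {89, 18, 49, 58, 9}) {
            insert(table, key);
        }
        System.out.println(Arrays.toString(table));
    }
}
```

With these five keys, 49, 58 and 9 all collide and wrap around to the start of the array.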
Linear probing hash table after each insertion (Weiss)
Data Structures & Problem Solving using JAVA/2E Mark Allen Weiss © 2002 Addison Wesley

The Deletion Problem (Weiss, 2002)
- find(58): succeeds
- delete(89): empties a cell on the probe path to 58
- find(58): fail, because the search stops at the emptied cell

Lazy deletion
Each cell holds a value and a state (a = active, d = deleted); value -1 marks an empty cell.
Operations: find(58), delete(89), find(58), insert(99), find(58)

Initial table:
index   0   1   2   3   4   5   6   7   8   9
value   49  58  9   -1  -1  -1  -1  -1  18  89
state   a   a   a   a   a   a   a   a   a   a

After delete(89):
index   0   1   2   3   4   5   6   7   8   9
value   49  58  9   -1  -1  -1  -1  -1  18  89
state   a   a   a   a   a   a   a   a   a   d

After insert(99):
index   0   1   2   3   4   5   6   7   8   9
value   49  58  9   -1  -1  -1  -1  -1  18  99
state   a   a   a   a   a   a   a   a   a   a

insert criterion: value == -1 OR state == d
continue search criterion: value != -1
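The two criteria can be sketched in Java as follows. The class name `LazyDeletionTable`, the `boolean[] deleted` array standing in for the a/d state, and the hash function `key % size` are assumptions for illustration; the sketch assumes non-negative keys and does not check for duplicate keys on insert.

```java
import java.util.Arrays;

// Illustrative sketch of lazy deletion; names are hypothetical.
class LazyDeletionTable {
    static final int EMPTY = -1;
    final int[] value;
    final boolean[] deleted;   // true = state d, false = state a

    LazyDeletionTable(int size) {
        value = new int[size];
        Arrays.fill(value, EMPTY);
        deleted = new boolean[size];
    }

    // Continue-search criterion: value != -1, so deleted cells do not stop the search.
    int find(int key) {
        int i = key % value.length;
        for (int count = 0; count < value.length && value[i] != EMPTY; count++) {
            if (value[i] == key && !deleted[i]) return i;
            i = (i + 1) % value.length;
        }
        return -1;
    }

    // Insert criterion: value == -1 OR state == d.
    void insert(int key) {
        int i = key % value.length;
        for (int count = 0; count < value.length; count++) {
            if (value[i] == EMPTY || deleted[i]) {
                value[i] = key;
                deleted[i] = false;
                return;
            }
            i = (i + 1) % value.length;
        }
        throw new IllegalStateException("Table is full.");
    }

    // Mark the cell instead of emptying it, so probe paths through it stay intact.
    boolean delete(int key) {
        int i = find(key);
        if (i == -1) return false;
        deleted[i] = true;
        return true;
    }
}
```

Running find(58), delete(89), find(58), insert(99), find(58) against a table holding 89, 18, 49, 58, 9 finds 58 at index 1 every time, and 99 reuses the cell that held 89.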

Linear probing performance
Ideal performance depends on the fraction of the table that is full:
- k items in a table of size n
- probability of insertion collision: k/n = p
- average probes to free space: n/(n-k) or 1/(1-p)
e.g. table half full: 2 probes
BUT…

Linear probing performance
Linear probing for insertion produces primary clustering:
- average probes for insertion: (1 + 1/(1-p)^2)/2
e.g. table half full: 2.5 probes
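To compare the ideal (no clustering) estimate with the linear probing estimate at other load factors, the two expressions can be evaluated directly. The class and method names below are hypothetical; the formulas are 1/(1-p) and (1 + 1/(1-p)^2)/2, which give 2 and 2.5 probes for a half-full table.

```java
// Illustrative sketch; InsertionProbes and its method names are hypothetical.
class InsertionProbes {
    // Ideal case, no clustering: 1 / (1 - p)
    static double unbiased(double p) {
        return 1.0 / (1.0 - p);
    }

    // Linear probing with primary clustering: (1 + 1/(1-p)^2) / 2
    static double linear(double p) {
        return (1.0 + 1.0 / ((1.0 - p) * (1.0 - p))) / 2.0;
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("p=%.2f unbiased=%.2f linear=%.2f%n",
                    p, unbiased(p), linear(p));
        }
    }
}
```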
Illustration of primary clustering in linear probing (b) versus no clustering (a) and the less significant secondary clustering in quadratic probing (c). Long lines represent occupied cells, and the load factor is 0.7.
Data Structures & Problem Solving using JAVA/2E Mark Allen Weiss © 2002 Addison Wesley

Linear probing performance
[Figure: average probes for insertion (y-axis, 0 to 25) versus load factor (x-axis, 0 to 1), with curves for unbiased and linear probing]

Clustering
- primary clustering: from linear probing
- solution: alternate probing actions, e.g. quadratic probing
- constraint: minimize computation of probe

Clustering
- primary clustering: linear probing
- secondary clustering: different probes from different indices (quadratic probing)
- even better: different probes for different keys at same index (secondary hashing)
[Figure: linear probing, inserting 16 into the table below]
index   0   1   2   3   4   5   6   7   8   9
value   49  58  -1  -1  -1  -1  66  26  18  89
state   a   a   a   a   a   a   a   a   a   a
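As a rough sketch of an alternate probing action, the code below generates a quadratic probe sequence. The slides do not give a formula, so the common form (h + k*k) % n for the k-th probe is an assumption here, as are the class and method names.

```java
// Illustrative sketch; the (h + k*k) % n rule is the common textbook form,
// assumed here because the slides do not state a formula.
class QuadraticProbe {
    // Returns the first `count` probe locations for a key whose initial index is h.
    static int[] probeSequence(int h, int n, int count) {
        int[] seq = new int[count];
        for (int k = 0; k < count; k++) {
            seq[k] = (h + k * k) % n;
        }
        return seq;
    }

    public static void main(String[] args) {
        // Key 16 in a table of size 10 starts at index 6.
        System.out.println(java.util.Arrays.toString(probeSequence(6, 10, 5)));
    }
}
```

Starting from index 6 in a table of size 10, the probes visit 6, 7, 0, 5, 2 rather than the consecutive cells 6, 7, 8, 9, 0 that linear probing would try.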

Probing comparison
[Figure: three tables of size 10, all cells in state a, comparing probe strategies]
- linear: key 16; values 49 58 -1 -1 -1 -1 66 26 18 89
- linear vs non-linear: keys 16 and 48; values -1 -1 -1 -1 -1 -1 66 -1 18 89
- secondary hash: keys 16 and 96; values -1 -1 -1 -1 -1 -1 66 -1 18 89

Secondary hashing
- Hash function determines initial index
- Secondary hash function determines step size for probe after collision

Table class (Main p. 571)
    public class Table
    {
        private int manyItems;
        private Object[] keys;
        private Object[] data;
        private boolean[] hasBeenUsed;
        …

constructor
    public Table(int capacity)
    {
        // capacity must be at least 1
        if (capacity <= 0)
            throw new IllegalArgumentException("Capacity must be positive");
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

search for an object by key
    public boolean containsKey(Object key)
    {
        return findIndex(key) != -1;
    }

    private int findIndex(Object key)
    {
        int count = 0;
        int i = hash(key);
        // stop after a full pass or at a cell that has never been used
        while (count < data.length && hasBeenUsed[i])
        {
            if (key.equals(keys[i]))
                return i;
            count++;
            i = nextIndex(i);
        }
        return -1;
    }

wrap around indexing
    private int nextIndex(int i)
    {
        if (i+1 == data.length)
            return 0;
        else
            return i+1;
    }

get an object
    public Object get(Object key)
    {
        int index = findIndex(key);
        if (index == -1)
            return null;
        else
            return data[index];
    }

insert a key and object
    public Object put(Object key, Object element)
    {
        int index = findIndex(key);
        Object answer;
        if (index != -1)
        {   // replace object for key
            answer = data[index];
            data[index] = element;
            return answer;
        }
        else if (manyItems < data.length)
        {   // new key and object
            index = hash(key);
            while (keys[index] != null)
                index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            manyItems++;
            return null;
        }
        else
        {   // table is full
            throw new IllegalStateException("Table is full.");
        }
    }

remove a key and object
    public Object remove(Object key)
    {
        int index = findIndex(key);
        Object answer = null;
        if (index != -1)
        {
            answer = data[index];
            keys[index] = null;
            data[index] = null;
            // hasBeenUsed[index] stays true so later searches probe past this cell
            manyItems--;
        }
        return answer;
    }

Changing probe strategy: double hash
    private int findIndex(Object key)
    {
        int count = 0;
        int i = hash1(key);   // initial index
        int p = hash2(key);   // step size
        while (count < data.length && hasBeenUsed[i])
        {
            if (key.equals(keys[i]))
                return i;
            count++;
            i = nextIndex(i, p);
        }
        return -1;
    }

    private int nextIndex(int i, int p)
    {
        return (i+p) % data.length;
    }

Picking good hash strategies
- division hash functions
  - prime table size (n) is required
  - index is hashCode % n
  - stepSize is 1 + hashCode % (n-2) (Knuth: best if (n-2) is also prime)
- mid-square
  - hashCode^2: take ‘middle’ digits
- multiplicative
  - hashCode * r (0 < r < 1): take fraction digits
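The division strategy can be sketched as a probe-sequence generator. The class and method names are hypothetical, the table size 13 (with n-2 = 11 also prime) is an arbitrary choice for the example, and `Math.floorMod` replaces `%` as an assumption so that negative hash codes still give valid indexes.

```java
import java.util.Arrays;

// Illustrative sketch; DoubleHashProbe is a hypothetical name.
class DoubleHashProbe {
    // index is hashCode % n (floorMod guards against negative hash codes)
    static int index(int hashCode, int n) {
        return Math.floorMod(hashCode, n);
    }

    // stepSize is 1 + hashCode % (n-2), so it lies in 1 .. n-2
    static int stepSize(int hashCode, int n) {
        return 1 + Math.floorMod(hashCode, n - 2);
    }

    // First `count` probe locations for a given hash code in a table of size n.
    static int[] probeSequence(int hashCode, int n, int count) {
        int[] seq = new int[count];
        int i = index(hashCode, n);
        int p = stepSize(hashCode, n);
        for (int k = 0; k < count; k++) {
            seq[k] = i;
            i = (i + p) % n;
        }
        return seq;
    }

    public static void main(String[] args) {
        // 16 and 29 share initial index 3 in a table of size 13 but use different steps.
        System.out.println(Arrays.toString(probeSequence(16, 13, 4))); // [3, 9, 2, 8]
        System.out.println(Arrays.toString(probeSequence(29, 13, 4))); // [3, 11, 6, 1]
    }
}
```

Because n is prime and the step is smaller than n, every step size is relatively prime to n, so a probe sequence eventually reaches every cell.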

External Hashing (Chaining)
- array of linked lists of objects
- for a map, objects contain map entry pairs
[Diagram: key -> hash function -> index into an array (0, 1, 2, 3, …); each array cell heads a linked list of data pairs]

External Hashing (Chaining)
- less sensitive to load factor
- more memory access (list)
- easier to manage
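A minimal chained table might be sketched as below. The class `ChainedTable`, its nested `Entry` class, and the use of `java.util.LinkedList` for the chains are assumptions for illustration, not the slides' implementation.

```java
import java.util.LinkedList;

// Illustrative sketch of external hashing; all names are hypothetical.
class ChainedTable<K, V> {
    // One map entry pair stored in a chain.
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Returns the previous value for the key, or null if the key is new.
    V put(K key, V value) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        chain.add(new Entry<>(key, value)); // a collision just lengthens the chain
        return null;
    }

    V get(K key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```

The table never fills up, which is one reason chaining is less sensitive to load factor: a load above 1 only means longer lists.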

Comparison of Hashing Performance
[Figure: number of comparisons (y-axis, 0 to 6) vs load (x-axis, 0 to 5) for linear probe, double hash, and chaining]

Analysis of performance
Linear probing:
- ½(1 + 1/(1-α)) comparisons for successful search, where α is the load factor (Knuth)
- assumptions: uniform hashing, no deletions
e.g., 1365 entries in table of 1709: α = .80, expect 3 comparisons

Analysis of performance
Double hashing:
- -ln(1-α)/α comparisons for successful search, where α is the load factor (Knuth)
- assumptions: uniform hashing, no deletions
e.g., 1365 entries in table of 1709: α = .80, expect 2 comparisons

Analysis of performance
Chained hashing:
- 1 + α/2 comparisons for successful search, where α is the load factor
- assumptions: uniform hashing
e.g., 1365 entries in table of 1709: α = .80, expect 1.4 comparisons
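The three estimates for a successful search can be checked numerically for the 1365-in-1709 example. The class and method names are hypothetical; the formulas are ½(1 + 1/(1-α)), -ln(1-α)/α, and 1 + α/2.

```java
// Illustrative sketch; SearchCost and its method names are hypothetical.
class SearchCost {
    // Linear probing: 1/2 * (1 + 1/(1 - a))
    static double linearProbing(double a) {
        return 0.5 * (1.0 + 1.0 / (1.0 - a));
    }

    // Double hashing: -ln(1 - a) / a
    static double doubleHashing(double a) {
        return -Math.log(1.0 - a) / a;
    }

    // Chained hashing: 1 + a/2
    static double chaining(double a) {
        return 1.0 + a / 2.0;
    }

    public static void main(String[] args) {
        double a = 1365.0 / 1709.0; // load factor, roughly 0.80
        System.out.printf("linear=%.2f double=%.2f chained=%.2f%n",
                linearProbing(a), doubleHashing(a), chaining(a));
    }
}
```

The results come out close to 3, 2, and 1.4 comparisons respectively.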

Hash table summary
- hash table: array
- computed access into array based on key
- n to 1 relation of keys to indexes: collisions
- collision resolution
  - open-address hashing (linear probing)
  - double hashing
  - chained hashing

JAVA Collections
Interfaces
- Collection
  - List
  - Queue
  - Set
    - SortedSet
- Map
  - SortedMap
Implementations
- array (resizable)
- linked list
- balanced search tree
- hash table
- hash table plus linked list

hashed implementations
- HashSet implements Set
- HashMap implements Map
constructors:
- capacity
- load factor
performance
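The capacity and load factor constructor parameters can be seen in a short sketch using `java.util.HashMap` and `java.util.HashSet`. The class name `HashedCollectionsDemo`, the method names, and the values 16 and 0.75f are illustrative choices, not recommendations from the slides.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch; class name, method names, and sample data are hypothetical.
class HashedCollectionsDemo {
    static Map<String, String> buildMap() {
        // initial capacity 16, load factor 0.75
        Map<String, String> students = new HashMap<>(16, 0.75f);
        students.put("s01324092", "first student");
        students.put("s02894092", "second student");
        return students;
    }

    static Set<String> buildSet() {
        // HashSet takes the same two constructor parameters
        Set<String> numbers = new HashSet<>(16, 0.75f);
        numbers.add("s01324092");
        numbers.add("s01324092"); // duplicate, ignored by the set
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(buildMap().get("s01324092"));
        System.out.println(buildSet().size());
    }
}
```

A lower load factor trades memory for fewer collisions; a higher one does the reverse.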