CPSC 335, Dr. Marina Gavrilova, Computer Science, University of Calgary, Canada
Post on 15-Dec-2015
Standard hashing assumes a file of fixed size.
What if many keys are added or deleted? What if the file size changes significantly?
Then separate techniques are needed. There are two types:
- Directory schemes
- Directoryless schemes
Hash Functions for Extendible Hashing
Keys are stored in buckets.
Each bucket can hold only a fixed number of items.
The index is an extendible table; h(x) hashes a key value x to a bit string, and only a portion of that bit string is used to build the directory.
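Example: a key kn with h(kn) = 11011 is added.
[Figure: a directory table with entries 00, 01, 10, 11 and the buckets of 5-bit pseudokeys they point to, before and after kn is added. Before: buckets b00, b01 and b1, where b1 holds 10011, 11110, 11111. After: b1 is split into b10 and b11, and 11011 is stored with 11110 and 11111.]
Extendible Hashing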
Directory schemes:
- Extendible hashing (Fagin et al. 1979)
- Expandable hashing (Knott 1971)
- Dynamic hashing (Larson 1978)
Directoryless schemes:
- Virtual hashing (Litwin 1978)
Hash Functions for Extendible Hashing
Size of a bucket = maximum number of pseudokeys it can hold (3 in our example).
Once a bucket is full, split it into two.
Two situations are possible:
- The directory remains the same size; only the pointers to the buckets are adjusted.
- The size of the directory grows from 2^k to 2^(k+1), i.e. the directory size can be 1, 2, 4, 8, 16, etc. (8 is shown in the figure).
Doubling the directory does not double the number of buckets: only the full bucket is split, so some directory entries point to the same bucket.
Finally, the bit string h(x) can be used only to build the index, while the actual key is stored in the bucket.
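The two split situations can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's reference solution: pseudokeys are 5-bit integers, the bucket capacity is 3, and each bucket carries a "local depth" field (bookkeeping that the slide does not mention) so the code can tell whether the directory has to double. The names Bucket and ExtendibleHash are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of leading bits shared by all keys here
        self.keys = []                  # pseudokeys stored as ints


class ExtendibleHash:
    def __init__(self, bits=5, capacity=3):
        self.bits = bits          # width of a pseudokey in bits (assumed)
        self.capacity = capacity  # maximum number of pseudokeys per bucket
        self.depth = 0            # the directory has 2**depth entries
        self.directory = [Bucket(0)]

    def _index(self, pseudokey):
        # The leftmost `depth` bits of the pseudokey select the directory entry.
        return pseudokey >> (self.bits - self.depth)

    def insert(self, pseudokey):
        bucket = self.directory[self._index(pseudokey)]
        if pseudokey in bucket.keys:
            return
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(pseudokey)
            return
        if bucket.local_depth == self.depth:
            if self.depth == self.bits:
                raise OverflowError("no bits left to distinguish keys")
            # Situation 2: the directory doubles; each old entry is duplicated,
            # so both copies still point to the same bucket.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.depth += 1
        # Situation 1 (and the second half of situation 2): split the full
        # bucket and adjust the pointers, then retry the insertion.
        self._split(bucket)
        self.insert(pseudokey)

    def _split(self, bucket):
        new_depth = bucket.local_depth + 1
        zero, one = Bucket(new_depth), Bucket(new_depth)
        shift = self.bits - new_depth
        for key in bucket.keys:
            (one if (key >> shift) & 1 else zero).keys.append(key)
        for i, entry in enumerate(self.directory):
            if entry is bucket:
                bit = (i >> (self.depth - new_depth)) & 1
                self.directory[i] = one if bit else zero


table = ExtendibleHash()
for key in (0b00011, 0b00110, 0b00101, 0b01100,
            0b01011, 0b10011, 0b11110, 0b11111):
    table.insert(key)
table.insert(0b11011)  # splits a shared bucket; the directory keeps 4 entries
```

With these keys, entries 10 and 11 share one bucket until 11011 arrives; that insertion only adjusts pointers. Inserting one more key with prefix 00 (for example 00010) would overflow a bucket whose local depth equals the directory depth, so the directory would double to 8 entries.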
Extendible Hashing
[Figure: a directory of size 8 with entries 000, 001, 010, 011, 100, 101, 110, 111.]
1. Use as much space as needed.
2. Input: the file name and the number of words to insert. Use bucket size 128.
3. Use any function h(k) that returns a string of up to 32 bits (an integer type can be used).
4. Bucket: a char array.
5. Main idea: only the first bits of the mask are used for the search.
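Item 3 leaves the choice of h(k) open. One possible choice, shown here only as an illustration, is the 32-bit FNV-1a hash; the slides do not prescribe it, and the helper bit_string is a hypothetical name. The result is an integer that can also be viewed as a string of 32 bits, whose first characters are the bits used for the search.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # standard 32-bit FNV prime


def h(word: str) -> int:
    """32-bit FNV-1a hash of a word (an assumed choice of hash function)."""
    value = FNV_OFFSET_BASIS
    for byte in word.encode("utf-8"):
        value ^= byte
        value = (value * FNV_PRIME) & 0xFFFFFFFF  # keep the result within 32 bits
    return value


def bit_string(word: str) -> str:
    """The hash written as exactly 32 characters of '0' and '1'."""
    return format(h(word), "032b")


# The first i characters of bit_string(word) are the bits used by the directory.
prefix = bit_string("hashing")[:3]
```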
Extendible Hashing
Assume that a hashing technique is applied to a dynamically changing file composed of buckets, and each bucket can hold only a fixed number of items.
Extendible hashing accesses the data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file.
The characteristic feature of extendible hashing is the organization of the index, which is an expandable table.
Extendible Hashing
A hash function applied to a certain key indicates a position in the index and not in the file (or table of keys). Values returned by such a hash function are called pseudokeys.
The file requires no reorganization when data are added to or deleted from it, since these changes are indicated in the index.
Only one hash function h is used, but depending on the size of the index, only a portion of the address h(K) is utilized.
A simple way to achieve this effect is to look at the address as a string of bits from which only the i leftmost bits are used.
The number i is the depth of the directory. In figure 1(a) (in the next slide), the depth is equal to two.
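Taking the i leftmost bits is a single shift. The sketch below assumes the pseudokey is held as an integer of a known width (32 bits by default); the function name directory_index is hypothetical.

```python
def directory_index(pseudokey: int, depth: int, width: int = 32) -> int:
    """Return the `depth` leftmost bits of a `width`-bit pseudokey."""
    if not 0 <= depth <= width:
        raise ValueError("depth must be between 0 and width")
    mask = (1 << width) - 1  # discard anything above `width` bits
    return (pseudokey & mask) >> (width - depth)


# With the 5-bit pseudokey 11011 and depth 2, the directory entry is 11.
entry = directory_index(0b11011, depth=2, width=5)
```

Increasing the depth by one appends the next bit of the pseudokey to the index, which is why doubling the directory leaves the relative order of existing entries unchanged.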
Extendible Hashing
Expandable Hashing
The idea is similar to extendible hashing, but a binary tree is used to store the index on the buckets.
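A tree-shaped index of this kind can be sketched as a binary tree whose internal nodes test one bit of the pseudokey and whose leaves are buckets. This is an illustrative sketch only, not Knott's published algorithm: it assumes distinct 5-bit integer pseudokeys and a bucket capacity of 3, and the names Leaf, Node and TreeIndex are hypothetical.

```python
class Leaf:
    def __init__(self):
        self.keys = []  # a bucket of pseudokeys


class Node:
    def __init__(self, zero, one):
        self.children = [zero, one]  # subtree for bit 0, subtree for bit 1


class TreeIndex:
    def __init__(self, bits=5, capacity=3):
        self.bits, self.capacity = bits, capacity
        self.root = Leaf()

    def _bit(self, key, level):
        # Bit number `level` of the key, counted from the left.
        return (key >> (self.bits - 1 - level)) & 1

    def find_bucket(self, key):
        node, level = self.root, 0
        while isinstance(node, Node):
            node = node.children[self._bit(key, level)]
            level += 1
        return node

    def insert(self, key):
        parent, side, node, level = None, 0, self.root, 0
        while isinstance(node, Node):
            parent, side = node, self._bit(key, level)
            node = node.children[side]
            level += 1
        if key not in node.keys:
            node.keys.append(key)
        # Replace an overflowing leaf by an internal node with two new leaves.
        while len(node.keys) > self.capacity:
            zero, one = Leaf(), Leaf()
            for k in node.keys:
                (one if self._bit(k, level) else zero).keys.append(k)
            branch = Node(zero, one)
            if parent is None:
                self.root = branch
            else:
                parent.children[side] = branch
            # Only the larger half can still overflow; continue with it.
            parent = branch
            side = 1 if len(one.keys) > len(zero.keys) else 0
            node = branch.children[side]
            level += 1


index = TreeIndex()
for key in (0b00011, 0b00110, 0b00101, 0b01100, 0b01011,
            0b10011, 0b11110, 0b11111, 0b11011):
    index.insert(key)
```

Unlike the directory table, the tree grows only along the path of the bucket that overflowed, at the cost of following one pointer per bit during a search.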
Dynamic Hashing
Multiple binary trees are used. Outcome:
- The search is shortened.
- The key determines which tree to search.
Expandable & Dynamic Hashing
Larson's method: the index is simplified and represented as a set of binary trees.
The height of each tree is limited.
h(x) is searched in all trees. Time: with m trees and at most k keys in each, m * lg k overall.
Advantage: shorter search time in the index file.
Dynamic Hashing
Litwin's virtual hashing: buckets are expanded in a linear fashion.
They are stored contiguously in memory.
No table is needed, and the procedure is simple.
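How an address can be computed without a table is easiest to see in code. The sketch below uses the addressing rule of linear hashing, a closely related directoryless scheme also due to Litwin; it may differ in detail from the virtual hashing method named on the slide. It assumes integer keys, uses the remainder (rightmost bits) rather than the leftmost bits, and lets a bucket temporarily exceed its capacity. The class name LinearHashFile is hypothetical.

```python
class LinearHashFile:
    def __init__(self, initial_buckets=2, capacity=3):
        self.n0 = initial_buckets
        self.capacity = capacity
        self.level = 0  # number of completed doubling rounds
        self.split = 0  # index of the next bucket to be split
        self.buckets = [[] for _ in range(initial_buckets)]  # contiguous storage

    def address(self, key):
        # No directory: the bucket number is computed from the key alone.
        a = key % (self.n0 * 2 ** self.level)
        if a < self.split:  # this bucket was already split in the current round
            a = key % (self.n0 * 2 ** (self.level + 1))
        return a

    def insert(self, key):
        bucket = self.buckets[self.address(key)]
        bucket.append(key)
        if len(bucket) > self.capacity:
            self._split_next()

    def _split_next(self):
        # Buckets are split in linear order, not necessarily the one that overflowed.
        old_keys = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])  # the file grows by exactly one bucket
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:
            self.level += 1
            self.split = 0
        for key in old_keys:
            self.buckets[self.address(key)].append(key)


f = LinearHashFile()
for key in range(20):
    f.insert(key)
```

The file grows one bucket at a time, so no doubling step and no pointer table are needed; the price is that an overflowing bucket may have to wait until the split pointer reaches it.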
Virtual Hashing
Summary
Extendible hashing advantages:
- The initially allocated space can grow indefinitely.
- Locating the bucket to which a key belongs requires only a very fast bit comparison.
- The bucket size can be chosen flexibly, and buckets can be stored on disk or accessed in remote memory.
Extendible hashing disadvantages:
- Increased algorithm complexity.
- Extra memory overhead to store the index inside the bucket.