CPSC 335
Intermediate Information Structures
Computer Science
University of Calgary
Canada
LECTURE 5
POLYA
and
DYNAMIC AND EXTENDIBLE HASHING
Jon Rokne
Modified from Marina’s lectures.
POLYA
• UNDERSTANDING THE PROBLEM
• First.
• You have to understand the problem.
• What is the unknown? What are the data? What is the condition?
• Is it possible to satisfy the condition? Is the condition sufficient to determine the unknown? Or is it insufficient? Or redundant? Or contradictory?
• Draw a figure. Introduce suitable notation.
• Separate the various parts of the condition. Can you write them down?
G. Polya, How to Solve It
• DEVISING A PLAN
• Second.
• Find the connection between the data and the unknown. You may be obliged to consider auxiliary problems if an immediate connection cannot be found. You should obtain eventually a plan of the solution.
• Have you seen it before? Or have you seen the same problem in a slightly different form?
• Do you know a related problem? Do you know a theorem that could be useful?
• Look at the unknown! And try to think of a familiar problem having the same or a similar unknown.
• Here is a problem related to yours and solved before. Could you use it? Could you use its result? Could you use its method? Should you introduce some auxiliary element in order to make its use possible?
• Could you restate the problem? Could you restate it still differently? Go back to definitions.
• If you cannot solve the proposed problem try to solve first some related problem. Could you imagine a more accessible related problem? A more general problem? A more special problem? An analogous problem?
• Could you solve a part of the problem? Keep only a part of the condition, drop the other part; how far is the unknown then determined, how can it vary? Could you derive something useful from the data? Could you think of other data appropriate to determine the unknown? Could you change the unknown or data, or both if necessary, so that the new unknown and the new data are nearer to each other?
• Did you use all the data? Did you use the whole condition? Have you taken into account all essential notions involved in the problem?
• CARRYING OUT THE PLAN
• Third.
• Carry out your plan.
• Carrying out your plan of the solution, check each step. Can you see clearly that the step is correct? Can you prove that it is correct?
• LOOKING BACK
• Fourth.
• Examine the solution obtained.
• Can you check the result? Can you check the argument?
• Can you derive the solution differently? Can you see it at a glance?
• Can you use the result, or the method, for some other problem?
Applying Polya’s method

OUTLINE
• Extendible hashing
• Expandable and dynamic hashing
• Virtual hashing
• Summary
Hash Functions for Extendible Hashing
• Standard hashing assumes a fixed file size.
• What if we add or delete many keys, so that the file size changes significantly?
• Then we need separate techniques, of two types:
  - Directory schemes
  - Directoryless schemes
Extendible Hashing
• Keys are stored in buckets.
• Each bucket can hold only a fixed number of items.
• The index is an extendible table: h(x) hashes a key value x to a bit string (bit map), and only a portion of that bit string is used to build the directory.
[Figure: adding a key kn with h(kn) = 11011. The directory (Table) has entries 00, 01, 10, 11. Before: bucket b00 holds 00011, 00110, 00101; b01 holds 01100, 01011; b1, shared by entries 10 and 11, holds 10011, 11110, 11111. After: b1 is split into b10 (10011) and b11 (11011, 11110, 11111).]
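The split in the figure can be sketched in a few lines. This is a minimal illustration, assuming pseudokeys are written as bit strings; the helper name `split_bucket` is hypothetical. Each key of the full bucket goes to the new bucket selected by its bit right after the bucket's label.

```python
def split_bucket(keys, label):
    """Split the bucket with this label into label+"0" and label+"1".

    Keys are pseudokeys written as bit strings; each one goes to the
    bucket selected by its bit immediately after the label.
    """
    d = len(label)
    return {
        label + "0": [k for k in keys if k[d] == "0"],
        label + "1": [k for k in keys if k[d] == "1"],
    }


# Bucket b1 from the figure plus the new key kn (h(kn) = 11011).
print(split_bucket(["10011", "11110", "11111", "11011"], "1"))
# {'10': ['10011'], '11': ['11110', '11111', '11011']}
```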
Hash Functions for Extendible Hashing
• Directory schemes
  - Extendible hashing (Fagin et al. 1979)
  - Expandable hashing (Knott 1971)
  - Dynamic hashing (Larson 1978)
• Directoryless schemes
  - Virtual hashing (Litwin 1978)
Extendible Hashing
• Size of a bucket = maximum number of pseudokeys it can hold (3 in our example).
• Once a bucket is full, it is split into two. Two situations are possible:
  - The directory remains the same size, and only the pointers to the bucket are adjusted (possible when several directory entries point to the full bucket).
  - The size of the directory grows from $2^k$ to $2^{k+1}$, i.e. the directory size can be 1, 2, 4, 8, 16, etc. (8 is shown in the figure). Doubling the directory does not by itself create buckets: the number of buckets stays the same, so some directory entries point to the same bucket.
• Finally, the bit string h(K) is used only to build the index; the actual key is stored in the bucket.
[Figure: a directory of size 8, with entries 000, 001, 010, 011, 100, 101, 110, 111.]
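Both split situations appear together in the following minimal in-memory sketch. It is not the textbook's code: the class names, the 32-bit pseudokey width, and indexing the directory by the leftmost `depth` bits of h(key) are assumptions made for illustration.

```python
class Bucket:
    """A bucket with a local depth (the length of its label) and its keys."""

    def __init__(self, depth):
        self.depth = depth
        self.keys = []


class ExtendibleHash:
    """Minimal extendible hashing sketch.

    Assumptions: h maps a key to a non-negative integer below 2**BITS, and
    the directory is indexed by the leftmost `depth` bits of h(key).
    """

    BITS = 32

    def __init__(self, bucket_size, h):
        self.bucket_size = bucket_size
        self.h = h
        self.depth = 0                  # directory depth: 2**depth entries
        self.directory = [Bucket(0)]

    def _index(self, key):
        # Keep only the leftmost `depth` bits of the pseudokey.
        return self.h(key) >> (self.BITS - self.depth)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.bucket_size:
                bucket.keys.append(key)
                return
            self._split(bucket)         # bucket full: split it, then retry

    def _split(self, bucket):
        if bucket.depth == self.depth:
            # Only one entry points here: the directory grows from 2**k to
            # 2**(k+1) entries; new entry i copies old entry i >> 1.
            if self.depth == self.BITS:
                raise OverflowError("too many keys share one pseudokey")
            self.directory = [b for b in self.directory for _ in range(2)]
            self.depth += 1
        # Now several entries point to the bucket: those whose next bit
        # is 1 are redirected to a new bucket; the directory size is kept.
        bucket.depth += 1
        new = Bucket(bucket.depth)
        shift = self.depth - bucket.depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> shift) & 1:
                self.directory[i] = new
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)


# 5-bit pseudokeys from the earlier example, moved into the top 5 of 32 bits.
table = ExtendibleHash(3, lambda k: k << 27)
for k in [0b00011, 0b00110, 0b00101, 0b01100, 0b01011,
          0b10011, 0b11110, 0b11111, 0b11011]:
    table.insert(k)
print(table.depth)                                 # 2
print([bin(k) for k in table.directory[2].keys])   # ['0b10011']
```

With bucket size 3, the last insertion (11011) hits the shared bucket for prefix 1 while the directory already has depth 2, so only pointers are adjusted; earlier insertions into the 0 side are the ones that double the directory.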
Extendible Hashing
1. Use as much space as needed.
2. Input the file name and the number of words to insert. Use a bucket size of 128.
3. Use any function h(k) that returns a string of up to 32 bits (an integer type can be used).
4. Bucket: a char array.
5. Main idea: only the FIRST (leftmost) bits of h(k) are used for the search.
Extendible Hashing
Assume that a hashing technique is applied to a dynamically changing file composed of buckets, and each bucket can hold only a fixed number of items.
Extendible hashing accesses the data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file. The characteristic feature of extendible hashing is the organization of the index, which is an expandable table.
Extendible Hashing
• A hash function applied to a key indicates a position in the index, not in the file (or table of keys). Values returned by such a hash function are called pseudokeys.
• The file requires no reorganization when data are added to or deleted from it, since these changes are indicated in the index.
• Only one hash function h can be used, but depending on the size of the index, only a portion of the address h(K) is utilized.
• A simple way to achieve this effect is to look at the address as a string of bits from which only the i leftmost bits are used.
The number i is the depth of the directory. In Figure 1(a) (on the next slide), the depth is equal to two.
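Reading the i leftmost bits of h(K) is a single right shift. A sketch, assuming h(K) is stored as a non-negative integer below 2^width (32 bits by default); the helper name `directory_index` is hypothetical.

```python
def directory_index(pseudokey: int, depth: int, width: int = 32) -> int:
    """Directory cell for a pseudokey: its `depth` leftmost bits out of `width`."""
    if not 0 <= depth <= width:
        raise ValueError("depth must lie between 0 and width")
    return pseudokey >> (width - depth)


# Depth 2, as in Figure 1(a): only the first two bits of each pseudokey matter.
print(directory_index(0b11011, 2, width=5))   # 3, i.e. cell 11
print(directory_index(0b00101, 2, width=5))   # 0, i.e. cell 00
```

Depth 0 maps every key to the single cell 0, which is the state of an empty extendible hash table.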
Extendible Hashing
Figure 1. An example of extendible hashing (Drozdek Textbook)
Extendible Hashing insertion/deletion examples
Suppose that we are using an extendible hash table with bucket size 2, and suppose that our hash function H is such that
H(ANT) = 1110…     H(DOG) = 0101…     H(PIG) = 1001…
H(BEAR) = 0010…    H(ELK) = 1000…     H(RAT) = 0000…
H(CAT) = 1010…     H(GORN) = 1010…    H(WOLF) = 0111…
H(COW) = 0001…     H(MOOSE) = 0001…
Each bucket has an associated label (or signature) indicating which cells in the directory point to it: namely, all those having an index whose binary representation has the label as a prefix.
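The prefix rule can be checked mechanically. The sketch below keeps only the four bits of H listed above (the elided bits are ignored) and uses a starting table inferred from the solutions that follow, since the slide's figure is not part of this transcript: buckets 000 = {RAT, COW}, 001 = {BEAR}, 01 = {DOG}, and 1 = {CAT, ELK}. The helper names are hypothetical.

```python
from itertools import product

# Leading bits of H from the slide; the elided bits ("…") are ignored.
H = {"ANT": "1110", "DOG": "0101", "PIG": "1001", "BEAR": "0010",
     "ELK": "1000", "RAT": "0000", "CAT": "1010", "GORN": "1010",
     "WOLF": "0111", "COW": "0001", "MOOSE": "0001"}
# Starting table inferred from the solutions: bucket label -> keys.
INITIAL = {"000": ["RAT", "COW"], "001": ["BEAR"], "01": ["DOG"], "1": ["CAT", "ELK"]}


def find_bucket(table, key):
    """Label of the unique bucket whose label is a prefix of H(key)."""
    return next(lab for lab in table if H[key].startswith(lab))


def cells_for_label(label, depth):
    """Directory cells (bit strings of length `depth`) that point to this bucket."""
    return [label + "".join(bits) for bits in product("01", repeat=depth - len(label))]


depth = max(len(lab) for lab in INITIAL)        # 3 for the starting table
for lab in INITIAL:
    print(lab, cells_for_label(lab, depth))
print(find_bucket(INITIAL, "WOLF"))             # 01
```

Because the labels are prefix-free and together cover every bit string, each directory cell belongs to exactly one bucket.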
For each of the following operations, apply it to the hash table above (not to the result of applying the previous operations) and show the hash table that results. (a) Insert WOLF. (b) Insert ANT. (c) Insert GORN. (d) Delete DOG. (e) Delete RAT. (f) Delete CAT. (g) Insert MOOSE.
SOLUTIONS: (a) Insert WOLF. WOLF fits quite nicely alongside DOG in the bucket with label 01. (Illustration omitted.)
(b) Insert ANT. This causes overflow of the bucket with label 1, and thus that bucket is split into buckets with labels 10 and 11, into which CAT and ELK are placed appropriately, after which we attempt to insert ANT again. Because 10 is a prefix of both H(CAT) and H(ELK), both of these animals are placed into the bucket with label 10, leaving the 11 bucket empty. Insertion of ANT now goes smoothly, as it belongs in the 11 bucket.
(c) Insert GORN. This causes overflow of the bucket with label 1, and thus that bucket is split into buckets with labels 10 and 11, into which CAT and ELK are placed appropriately, after which we attempt to insert GORN again. Because 10 is a prefix of both H(CAT) and H(ELK), both of these animals are placed into the bucket with label 10, leaving the 11 bucket empty. Attempting to insert GORN leads to splitting the 10 bucket into buckets with label 100 and 101. ELK is placed into the former and CAT into the latter. Attempting to insert GORN once again, we find room for him in the 101 bucket.
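Insertions (a) to (c) can be replayed with a short sketch under the same assumptions as before: only the four listed bits of H, and a starting table inferred from the solutions. The table is stored as a map from bucket label to keys, so the directory depth is simply the longest label; the function name `insert` is illustrative.

```python
BUCKET_SIZE = 2
# Only the four bits shown on the slide are modeled; the elided bits are ignored,
# so this sketch cannot split past a label of length 3.
H = {"ANT": "1110", "DOG": "0101", "PIG": "1001", "BEAR": "0010",
     "ELK": "1000", "RAT": "0000", "CAT": "1010", "GORN": "1010",
     "WOLF": "0111", "COW": "0001", "MOOSE": "0001"}
# Starting table inferred from the solutions: bucket label -> keys.
INITIAL = {"000": ["RAT", "COW"], "001": ["BEAR"], "01": ["DOG"], "1": ["CAT", "ELK"]}


def insert(table, key):
    """Return a copy of `table` with `key` inserted, splitting overflowing buckets."""
    table = {label: list(keys) for label, keys in table.items()}
    while True:
        label = next(lab for lab in table if H[key].startswith(lab))
        if len(table[label]) < BUCKET_SIZE:
            table[label].append(key)
            return table
        # Overflow: split into label+"0" and label+"1" using the next bit of H.
        keys, d = table.pop(label), len(label)
        table[label + "0"] = [k for k in keys if H[k][d] == "0"]
        table[label + "1"] = [k for k in keys if H[k][d] == "1"]


for animal in ["WOLF", "ANT", "GORN"]:
    result = insert(INITIAL, animal)
    print(animal, dict(sorted(result.items())), "depth:", max(map(len, result)))
```

The same function also reproduces (g): inserting MOOSE splits 000 into 0000 and 0001, and the longest label grows to four bits, which is exactly when the directory has to double.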
(d) Delete DOG. Remove DOG from the 01 bucket. As there are no sibling buckets with which to combine it, we simply leave the 01 bucket empty. (Only a bucket with label 00 could be a "sibling" to the bucket with label 01, and there is no such bucket.) (Illustration omitted.)
(e) Delete RAT. Remove RAT from the 000 bucket. As the 000 and 001 buckets are "siblings" and the total number of entries in the two of them is now two, we can merge them into a 00 bucket containing COW and BEAR. Because now the maximum length of any bucket's label is two, we can halve the size of the directory, making its depth two. (In real life, we probably wouldn't merge two buckets unless the resulting bucket were somewhat less than full, because otherwise the resulting bucket would be likely to undergo a split in the near future.)
(f) Delete CAT. Remove CAT from the 1 bucket. There is no sibling bucket, so that is all we can do. (Illustration omitted.)
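Deletions (d) to (f) follow one merge rule: a bucket is combined with its sibling (the bucket whose label differs only in the last bit) when that sibling exists and their keys fit in one bucket, and the directory depth is the longest remaining label. A sketch under the same assumptions as the insertion example, with hypothetical names:

```python
BUCKET_SIZE = 2
# Only the four bits shown on the slide are modeled; the elided bits are ignored.
H = {"ANT": "1110", "DOG": "0101", "PIG": "1001", "BEAR": "0010",
     "ELK": "1000", "RAT": "0000", "CAT": "1010", "GORN": "1010",
     "WOLF": "0111", "COW": "0001", "MOOSE": "0001"}
# Starting table inferred from the solutions: bucket label -> keys.
INITIAL = {"000": ["RAT", "COW"], "001": ["BEAR"], "01": ["DOG"], "1": ["CAT", "ELK"]}


def delete(table, key):
    """Return a copy of `table` without `key`, merging siblings that fit together."""
    table = {label: list(keys) for label, keys in table.items()}
    label = next(lab for lab in table if H[key].startswith(lab))
    table[label].remove(key)
    while label:
        # The sibling label differs from this one only in its last bit.
        sibling = label[:-1] + ("1" if label[-1] == "0" else "0")
        if sibling not in table or len(table[label]) + len(table[sibling]) > BUCKET_SIZE:
            break                       # no sibling bucket, or the merge would overflow
        merged = table.pop(label) + table.pop(sibling)
        label = label[:-1]
        table[label] = merged
    return table


def depth(table):
    """Directory depth = length of the longest bucket label."""
    return max(len(lab) for lab in table)


for animal in ["DOG", "RAT", "CAT"]:
    result = delete(INITIAL, animal)
    print(animal, dict(sorted(result.items())), "depth:", depth(result))
```

This sketch merges whenever the combined keys fit, as the examples do; the note in (e) suggests a practical implementation would merge only when the result is somewhat less than full.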
(g) Insert MOOSE. This causes overflow of the bucket with label 000. Because this bucket has depth 3, which equals the directory depth (DIR_DEPTH), we double the size of the directory, making each entry in the new directory point to the correct bucket. Then we split the overflowing bucket into buckets with labels 0000 and 0001, into which RAT and COW are placed appropriately. Then we attempt once more to insert MOOSE. This time, MOOSE fits nicely alongside COW in the 0001 bucket.
http://www.cosc.brocku.ca/~efoxwell/2P03/slides/Week12Slides.pdf (the next 6 slides)
Expandable & Dynamic Hashing
Expandable Hashing
• A similar idea to extendible hashing, but a binary tree is used to store the index on the buckets.
Dynamic Hashing
• Multiple binary trees are used. Outcome:
  - The search is shortened.
  - Based on the key, select which tree to search.
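For expandable hashing, the index can be pictured as a binary tree whose leaves are buckets: a search follows the bits of h(K) from the root until it reaches a leaf. A minimal sketch; the `Node` class and the bucket names (b00, b01, b1, b10, b11, as in the extendible hashing example) are illustrative, not Knott's exact formulation.

```python
class Node:
    """Internal node of a binary-tree index: bit 0 goes left, bit 1 goes right."""

    def __init__(self, zero, one):
        self.zero = zero
        self.one = one


def find_bucket(root, bits):
    """Follow successive bits of h(K) from the root until a bucket (leaf) is reached."""
    node, i = root, 0
    while isinstance(node, Node):
        node = node.one if bits[i] == "1" else node.zero
        i += 1
    return node


# Before the split, bucket b1 covers every key starting with 1.
before = Node(Node("b00", "b01"), "b1")
# Splitting b1 turns only that leaf into an internal node.
after = Node(Node("b00", "b01"), Node("b10", "b11"))

print(find_bucket(before, "11011"))  # b1
print(find_bucket(after, "11011"))   # b11
```

Unlike a directory of size $2^k$, the tree needs no duplicate pointers: an unsplit bucket such as b1 is simply a leaf closer to the root.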
Dynamic Hashing
• Larson's method.
• The index is simplified: it is represented as a set of binary trees.
• The height of each tree is limited.
• h(x) is searched in ALL trees.
• Time: with m trees holding at most k keys each, the overall cost is $m \lg k$.
• Advantage: shorter search time in the index file.
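Following the previous slide's rule that the key selects which tree to search, a search can first pick one tree and then walk it bit by bit. In this sketch, `h0` (selects the tree) and `h1` (produces the bit string for the walk) are hypothetical stand-ins, and the two tiny trees are made up for illustration.

```python
class Node:
    """Internal node of one tree in the forest: bit 0 goes left, bit 1 goes right."""

    def __init__(self, zero, one):
        self.zero = zero
        self.one = one


def search(forest, key, h0, h1):
    """Pick one tree with h0, then walk it with the bits produced by h1."""
    node = forest[h0(key) % len(forest)]
    bits, i = h1(key), 0
    while isinstance(node, Node):
        node = node.one if bits[i] == "1" else node.zero
        i += 1
    return node


def h0(key):
    return key                       # hypothetical: selects the tree


def h1(key):
    return format(key, "08b")        # hypothetical: 8-bit string for the walk


forest = [Node("t0-b0", "t0-b1"), "t1-b"]   # two trees of height at most 1
print(search(forest, 128, h0, h1))  # tree 0, first bit 1 -> t0-b1
print(search(forest, 4, h0, h1))    # tree 0, first bit 0 -> t0-b0
print(search(forest, 7, h0, h1))    # tree 1 -> t1-b
```

Limiting the height of each tree bounds the number of bits consumed per search.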
Virtual Hashing
• Litwin’s virtual hashing: buckets are expanded in a linear fashion.
• They are stored contiguously in memory.
• No table (directory) is needed, and the procedure is simple.
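Expanding buckets "in a linear fashion" is easiest to see in linear hashing, a later directoryless scheme also due to Litwin. The sketch below uses its address rule only as a stand-in, not as the exact virtual hashing algorithm. It assumes the file started with n0 buckets, has been doubled `level` times, and that buckets 0 to next_split - 1 of the current round have already been split; each split appends one bucket at the end, so buckets stay contiguous and no directory is needed.

```python
def bucket_address(key: int, level: int, next_split: int, n0: int) -> int:
    """Linear-hashing-style address for a non-negative integer key (sketch).

    Buckets 0 .. next_split-1 of the current round have already been split,
    so keys that land there are re-addressed with the next, doubled modulus.
    """
    addr = key % (n0 * 2 ** level)
    if addr < next_split:
        addr = key % (n0 * 2 ** (level + 1))
    return addr


# Four initial buckets, bucket 0 already split: the file holds buckets 0..4.
for key in [4, 5, 6, 8]:
    print(key, "->", bucket_address(key, level=0, next_split=1, n0=4))
# 4 -> 4, 5 -> 1, 6 -> 2, 8 -> 0
```

The address is computed arithmetically from two counters (level and next_split), which is what lets such a scheme do without an index table.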
Summary
• Extendible hashing advantages:
  - The initially allocated space can grow indefinitely.
  - Locating the bucket where a key belongs requires only very fast bit comparisons.
  - The bucket size can be chosen very flexibly, and buckets can be stored on disk or accessed in remote memory.
• Extendible hashing disadvantages:
  - Increased algorithm complexity.
  - Extra memory overhead to store the index, and the index information kept inside each bucket.