Data Structures and Algorithms
Hashing
First Year
M. B. Fayek
CUFE 2010
Hashing
1. What is Hashing?
2. Problems in hashing
3. Collision Resolution Strategies
1. What is Hashing?
Hashing is a quick and efficient searching technique.
So far, efficiency of search depended on the number of comparisons.
In hashing the keys themselves point directly to records by applying a hashing function.
All possible key values are mapped into addresses in the hash table.
The hashing function is used for search as well as for storing.
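The idea that one function serves both storing and searching can be sketched as follows. This is a minimal illustration, not the lecture's own code: the names TABLE_SIZE, hash_key, store and search are hypothetical, the hashing function is assumed to be a simple remainder, and collisions are deliberately ignored at this stage.

```python
TABLE_SIZE = 7  # hypothetical, deliberately small table size

# One slot per address; None marks an empty slot.
table = [None] * TABLE_SIZE


def hash_key(key: int) -> int:
    """Map an integer key to a table address (assumed: remainder of division)."""
    return key % TABLE_SIZE


def store(key: int, record: str) -> None:
    """Store the record at the address computed from its key (collisions ignored)."""
    table[hash_key(key)] = (key, record)


def search(key: int):
    """Recompute the address with the same function and check the slot found there."""
    entry = table[hash_key(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


store(15, "record for 15")  # 15 % 7 == 1
store(23, "record for 23")  # 23 % 7 == 2
```

A search needs no key comparisons along a path: `search(15)` goes straight to address 1, while `search(8)` also lands on address 1, finds a different key there, and reports nothing.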
1. What is Hashing?
The hash table is sequential and contiguous.
Each slot is called a bucket.
Buckets may hold more than one key.
1. What is Hashing?
Hashing methods:
  Direct and Subtraction
  Modulo-division (or division remainder) using list size (prime, why?)
  Digit extraction
  Midsquare
  Folding (fold shift, fold boundary)
  Pseudo random (seed)
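Four of the methods listed above can be sketched in a few lines each. The function names, parameters and sample keys are assumptions chosen for illustration; the slides do not give concrete definitions. On the "prime, why?" question: a prime list size tends to spread keys more evenly, because keys that share a common factor with a non-prime list size fall into only a fraction of the addresses.

```python
def modulo_division(key: int, list_size: int) -> int:
    """Address = remainder of key divided by the list size."""
    return key % list_size


def digit_extraction(key: int, positions: tuple) -> int:
    """Build the address from selected digits (positions are 1-based, from the left)."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


def midsquare(key: int, width: int) -> int:
    """Square the key and take `width` digits from the middle of the result."""
    squared = str(key * key)
    start = (len(squared) - width) // 2
    return int(squared[start:start + width])


def fold_shift(key: int, part_len: int, list_size: int) -> int:
    """Cut the key into parts of `part_len` digits, add the parts, reduce to table range."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % list_size


print(modulo_division(121267, 307))         # 121267 = 307 * 395 + 2 -> 2
print(digit_extraction(379452, (1, 3, 4)))  # digits 3, 9, 4 -> 394
print(midsquare(9452, 4))                   # 9452**2 = 89340304 -> 3403
print(fold_shift(123456789, 3, 1000))       # 123 + 456 + 789 = 1368 -> 368
```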
Problems in Hashing
Collision occurs whenever a hash function maps two distinct keys to the same bucket.
The hashing function must generate bucket addresses quickly and efficiently, with minimum collisions.
As the domain of keys is usually larger than the number of buckets, collisions are very likely to happen no matter how efficient the hashing function is.
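A small experiment makes the collision problem concrete. The sketch below assumes a modulo-division hashing function and a hypothetical helper named find_collisions; the key values are arbitrary.

```python
from collections import defaultdict


def find_collisions(keys, list_size):
    """Group keys by address and return only the addresses hit by more than one key."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % list_size].append(key)
    return {addr: ks for addr, ks in buckets.items() if len(ks) > 1}


# Five distinct keys, seven addresses: there is room for all of them,
# yet they crowd into two addresses.
collisions = find_collisions([15, 22, 9, 30, 44], 7)
print(collisions)  # {1: [15, 22], 2: [9, 30, 44]}
```

Even with more addresses than keys, distinct keys can share an address, which is why a resolution strategy is needed.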
3. Collision Resolution Strategies
Definitions:
  Load factor = num of elements in list / list size
  Clustering (primary, secondary)
3. Collision Resolution Strategies
Open Addressing: (using prime area)
  Probing (Linear, quadratic)
  Double Hashing
  Pseudo-random
  Key offset
Linked Lists (Separate Chaining)
(Bucket Hashing)
Re-hashing
3. Collision Resolution Strategies
Open Addressing:
  Probing:
    Linear Probing: Search at constant intervals from collision (typically 1)
    Quadratic Probing: Search at quadratically increasing intervals, i.e. collision function f(i) = i^2; i.e. on collision searching the 1st, 4th, 9th, ... location
Linear Probing
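Linear and quadratic probing differ only in the collision function f(i), so both can share one insertion routine. The following is a sketch under assumptions: the home address is key % list size, the table stores bare integer keys, and the names probe_sequence, linear, quadratic and insert are hypothetical.

```python
def probe_sequence(home, list_size, step, max_probes):
    """Yield the addresses home + step(i) for i = 1, 2, ..., wrapping around the table."""
    for i in range(1, max_probes + 1):
        yield (home + step(i)) % list_size


def linear(i):
    """Collision function for linear probing with interval 1: f(i) = i."""
    return i


def quadratic(i):
    """Collision function for quadratic probing: f(i) = i^2."""
    return i * i


def insert(table, key, step):
    """Place key at its home address, or at the first free address in the probe sequence."""
    list_size = len(table)
    home = key % list_size
    if table[home] is None:
        table[home] = key
        return home
    for addr in probe_sequence(home, list_size, step, list_size):
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("no free slot found along the probe sequence")


# 22, 33, 44 and 55 all have home address 0 in a table of size 11.
linear_table = [None] * 11
print([insert(linear_table, k, linear) for k in (22, 33, 44)])            # [0, 1, 2]

quadratic_table = [None] * 11
print([insert(quadratic_table, k, quadratic) for k in (22, 33, 44, 55)])  # [0, 1, 4, 9]
```

With linear probing the colliding keys pile up in adjacent slots, which is the primary clustering mentioned in the definitions; quadratic probing spreads them out, although keys with the same home address still follow the same probe sequence.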
3. Collision Resolution Strategies
Open Addressing
  Double Hashing: Apply a second hashing function and probe at the obtained offsets from the original address:
    hash2(x), 2*hash2(x), 3*hash2(x), . . .
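Double hashing can be sketched as below. The slides do not specify the second function; the choice hash2(x) = 1 + x % (list size - 1) is an assumption made here because it never returns 0, and a prime list size is assumed so that the probe sequence can reach every slot. The names hash1, hash2 and insert_double are hypothetical.

```python
def hash1(key, list_size):
    """Primary hashing function: gives the home address."""
    return key % list_size


def hash2(key, list_size):
    """Second hashing function (assumed form): gives a step size that is never 0."""
    return 1 + key % (list_size - 1)


def insert_double(table, key):
    """Probe home, home + hash2, home + 2*hash2, ... until a free slot is found."""
    list_size = len(table)
    home = hash1(key, list_size)
    step = hash2(key, list_size)
    for i in range(list_size):
        addr = (home + i * step) % list_size
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 11                 # 11 is prime
print(insert_double(table, 22))     # home 0 is free -> 0
print(insert_double(table, 33))     # home 0 taken, step 1 + 33 % 10 = 4 -> 4
print(insert_double(table, 44))     # home 0 taken, step 1 + 44 % 10 = 5 -> 5
```

Keys 33 and 44 share a home address but use different step sizes, so they do not follow the same probe sequence.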
3. Collision Resolution Strategies
Linked lists (Separate Chaining):
  Separate chaining (may be modified by keeping the chain sorted!)
  Modified Hash Table (by eliminating the first probe, hence the hash table becomes an array of records instead of an array of pointers to records)
Linked List (Separate Chaining)
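A minimal sketch of separate chaining with sorted chains follows. As a simplifying assumption, each chain is a Python list standing in for a linked list, and keys are plain integers hashed by remainder; the class name ChainedHashTable and its methods are hypothetical.

```python
import bisect


class ChainedHashTable:
    """Hash table in which each address holds a sorted chain of keys."""

    def __init__(self, list_size=7):
        # One independent, initially empty chain per address.
        self.buckets = [[] for _ in range(list_size)]

    def _chain(self, key):
        """Return the chain that the key hashes to."""
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        """Insert the key into its chain at the position that keeps the chain sorted."""
        chain = self._chain(key)
        i = bisect.bisect_left(chain, key)
        if i == len(chain) or chain[i] != key:  # skip duplicates
            chain.insert(i, key)

    def contains(self, key):
        """Search only the one chain the key hashes to."""
        chain = self._chain(key)
        i = bisect.bisect_left(chain, key)
        return i < len(chain) and chain[i] == key


table = ChainedHashTable(7)
for key in (22, 15, 8, 9):
    table.insert(key)
print(table.buckets[1])  # 22, 15 and 8 all hash to 1 -> [8, 15, 22]
```

Colliding keys never spill into other addresses, so a search only ever examines the keys that share its address.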
3. Collision Resolution Strategies
Rehashing:
  When table becomes too full, operations will start taking too long
  Solution: Build another hashing table of about double size + associated hashing function and scan down entire original hash table, re-inserting each key into the new table
Figure: successful search, unsuccessful search
3. Collision Resolution Strategies
Rehashing: When is the table too full?
  Rehash when table is half full
  Rehash when an insertion fails
  When table reaches a certain load factor (best)
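Rehashing on a load-factor threshold might look like the sketch below. Several things here are assumptions rather than part of the slides: the load factor is taken as stored keys divided by list size, the threshold is 0.5, collisions are resolved by linear probing, and the new size is 2 * old size + 1 as one reading of "about double". The class name RehashingTable is hypothetical.

```python
class RehashingTable:
    """Open-addressing table (linear probing) that grows when it gets too full."""

    MAX_LOAD = 0.5  # hypothetical threshold

    def __init__(self, list_size=5):
        self.slots = [None] * list_size
        self.count = 0

    def load_factor(self):
        """Assumed definition: number of stored keys divided by list size."""
        return self.count / len(self.slots)

    @staticmethod
    def _place(slots, key):
        """Linear-probe for the key's slot; return False if the key is already stored."""
        addr = key % len(slots)
        while slots[addr] is not None:
            if slots[addr] == key:
                return False
            addr = (addr + 1) % len(slots)
        slots[addr] = key
        return True

    def insert(self, key):
        if self._place(self.slots, key):
            self.count += 1
        if self.load_factor() > self.MAX_LOAD:
            self._rehash()

    def _rehash(self):
        """Build a table of about double size and re-insert every key from the old one."""
        new_slots = [None] * (2 * len(self.slots) + 1)
        for key in self.slots:  # scan down the entire original table
            if key is not None:
                self._place(new_slots, key)  # addresses change with the new size
        self.slots = new_slots


table = RehashingTable(5)
for key in (5, 10, 15):
    table.insert(key)
print(len(table.slots))  # third insert pushes the load factor to 0.6 -> table grows to 11
```

Because the table grows before it fills up, the probing loop in `_place` always finds a free slot.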
End of Hashing
Probing
Definition:
  Each calculation of an address and test for success is known as probing
Key offset collision resolution
  Offset = key / list size (integer quotient)
  Address = (Offset + old address) % list size
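The two formulas above translate directly into code. The function name key_offset_probe and the sample values are assumptions for illustration; the division is taken to be integer division.

```python
def key_offset_probe(key, old_address, list_size):
    """Compute the next address to try after a collision at old_address."""
    offset = key // list_size  # integer quotient of key and list size
    # Caveat: if offset is a multiple of list_size, the address does not change.
    return (offset + old_address) % list_size


# Key 166702 collides at address 1 in a list of size 307:
# offset = 166702 // 307 = 543, new address = (543 + 1) % 307 = 237.
print(key_offset_probe(166702, 1, 307))  # 237
```

Because the offset depends on the key itself, two keys that collide at the same address usually continue along different probe paths.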