Data Structures and Algorithms: Hashing, First Year

M. B. Fayek, CUFE 2010


Page 1: Data Structures and Algorithms  Hashing First Year

Data Structures and Algorithms

Hashing

First Year

M. B. Fayek

CUFE 2010

Page 2: Data Structures and Algorithms  Hashing First Year

Hashing

1. What is Hashing?

2. Problems in hashing

3. Collision Resolution Strategies

Page 3: Data Structures and Algorithms  Hashing First Year

1. What is Hashing?

Hashing is a quick and efficient searching technique.

So far, efficiency of search depended on the number of comparisons.

In hashing the keys themselves point directly to records by applying a hashing function.

All possible key values are mapped into the hash table.

The hashing function is used for search as well as for storing.
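The point that one function serves both storing and searching can be sketched as follows. This is a minimal illustration, not the lecture's own code: the table size of 7, the names `hash_address`, `store`, and `search`, and the choice of modulo-division are all assumptions, and collisions are deliberately ignored here.

```python
TABLE_SIZE = 7  # assumed size, chosen only for illustration

def hash_address(key: int) -> int:
    """Map a key to a slot index (modulo-division is one possible choice)."""
    return key % TABLE_SIZE

table = [None] * TABLE_SIZE

def store(key: int, record: str) -> None:
    # Collisions are ignored in this sketch: a later key overwrites an earlier one.
    table[hash_address(key)] = (key, record)

def search(key: int):
    # The same function that placed the record tells us where to look.
    slot = table[hash_address(key)]
    if slot is not None and slot[0] == key:
        return slot[1]
    return None

store(15, "record A")  # 15 % 7 == 1
store(20, "record B")  # 20 % 7 == 6
```

No comparisons against other keys are needed: `search(15)` goes straight to slot 1.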

Page 4: Data Structures and Algorithms  Hashing First Year

1. What is Hashing?

The hash table is sequential and contiguous.

Each slot is called a bucket.

Buckets may hold more than one key.

Page 5: Data Structures and Algorithms  Hashing First Year

1. What is Hashing?

Hashing methods:

Direct and Subtraction

Modulo-division (or division remainder) using list size (prime, why?)

Digit extraction

Midsquare

Folding (fold shift, fold boundary)

Pseudo random (seed)
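Several of the methods listed above can be sketched in a few lines each. The function names, the digit positions, and the pseudo-random constants `a = 17` and `c = 7` are illustrative assumptions, not values given in the slides. On the "prime, why?" question, a common argument is that a prime list size shares no factors with regular patterns in the keys, so keys tend to spread more evenly across addresses.

```python
def modulo_division(key: int, list_size: int) -> int:
    """Division remainder: the address is key mod list size."""
    return key % list_size

def digit_extraction(key: int, positions: tuple) -> int:
    """Build the address from selected digits (1-based positions from the left)."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))

def midsquare(key: int, n_digits: int) -> int:
    """Square the key and take n_digits from the middle of the result."""
    squared = str(key * key)
    start = (len(squared) - n_digits) // 2
    return int(squared[start:start + n_digits])

def fold_shift(key: int, part_len: int) -> int:
    """Split the key into parts of part_len digits, add them, keep the low-order digits."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % (10 ** part_len)

def pseudo_random(key: int, list_size: int, a: int = 17, c: int = 7) -> int:
    """Use the key as the seed of a linear formula (a and c are assumed constants)."""
    return (a * key + c) % list_size
```

For example, `fold_shift(123456789, 3)` adds 123 + 456 + 789 = 1368 and keeps the last three digits, 368.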

Page 6: Data Structures and Algorithms  Hashing First Year

Hashing

1. What is Hashing?

2. Problems in hashing

3. Collision Resolution Strategies

Page 7: Data Structures and Algorithms  Hashing First Year

Problems in Hashing

Collision occurs whenever a hash function maps two distinct keys to the same bucket.

The hashing function must generate bucket addresses quickly and efficiently, with minimum collisions.

As the domain of keys is usually larger than the number of buckets, collisions are very likely to happen no matter how efficient the hashing function is.
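A small sketch makes the last point concrete: with more distinct keys than buckets, at least two keys must share a bucket, whatever function is used. The helper name `find_collisions` and the use of modulo-division are assumptions made for illustration.

```python
from collections import defaultdict

def find_collisions(keys, num_buckets):
    """Group distinct keys by bucket address; return buckets holding more than one key."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % num_buckets].append(key)
    return {addr: ks for addr, ks in buckets.items() if len(ks) > 1}

# Ten distinct keys, seven buckets: some bucket must receive at least two keys.
collisions = find_collisions(range(10), 7)
```

Here keys 0 and 7, 1 and 8, and 2 and 9 collide pairwise.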

Page 8: Data Structures and Algorithms  Hashing First Year

Hashing

1. What is Hashing?

2. Problems in hashing

3. Collision Resolution Strategies

Page 9: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Definitions:

Load factor = number of elements in list / list size

Clustering (primary, secondary)
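As a quick numeric check, the load factor can be computed as below, taking it in the usual sense of stored elements divided by the number of slots allocated. The function and parameter names are assumptions for illustration.

```python
def load_factor(num_elements: int, list_size: int) -> float:
    """Fraction of the table that is occupied (stored elements / allocated slots)."""
    return num_elements / list_size

half_full = load_factor(5, 10)  # a table with 10 slots holding 5 keys
```

A value near 1 means the table is almost full, which is when collisions and clustering become most costly.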

Page 10: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Open Addressing: (using prime area)

  Probing (Linear, quadratic)

  Double Hashing

  Pseudo-random

  Key offset

Linked Lists (Separate Chaining)

(Bucket Hashing)

Re-hashing

Page 11: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Open Addressing:

Probing:

Linear Probing: Search at constant intervals from collision (typically 1)

Quadratic Probing: Search at quadratically increasing intervals, i.e. collision function f(i) = i^2; i.e. on collision searching 1st, 4th, 9th, ... location
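The two probe sequences can be sketched as generators that share one insertion routine. This is an illustrative sketch: the names, the modulo-division home address, and the wrap-around with `% list_size` are assumptions not stated on the slide.

```python
def linear_probe_sequence(home: int, list_size: int, step: int = 1):
    """Yield home+step, home+2*step, ... wrapped around the table."""
    for i in range(1, list_size):
        yield (home + i * step) % list_size

def quadratic_probe_sequence(home: int, list_size: int):
    """Yield home+1, home+4, home+9, ... (f(i) = i^2) wrapped around the table."""
    for i in range(1, list_size):
        yield (home + i * i) % list_size

def insert(table, key, probe_sequence):
    """Place key at its home address, or at the first free probed address."""
    home = key % len(table)
    if table[home] is None:
        table[home] = key
        return home
    for addr in probe_sequence(home, len(table)):
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("no free slot found")

linear_table = [None] * 11
quadratic_table = [None] * 11
# 22, 33, 44 and 55 all have home address 0 in a table of 11 slots.
linear_slots = [insert(linear_table, k, linear_probe_sequence) for k in (22, 33, 44, 55)]
quadratic_slots = [insert(quadratic_table, k, quadratic_probe_sequence) for k in (22, 33, 44, 55)]
```

With linear probing the colliding keys pile up in adjacent slots, while quadratic probing spreads them further from the home address. Note that quadratic probing does not always visit every slot, so the insert can fail even when the table is not full.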

Page 12: Data Structures and Algorithms  Hashing First Year

Linear Probing

Page 13: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Open Addressing: (using prime area)

  Probing (Linear, quadratic)

  Double Hashing

  Pseudo-random

  Key offset

Linked Lists (Separate Chaining)

(Bucket Hashing)

Re-hashing

Page 14: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Open Addressing

Double Hashing: Apply a second hashing function and probe at the obtained address:

hash2(x), 2*hash2(x), 3*hash2(x), ...
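A minimal sketch of double hashing follows, probing at offsets of hash2(x), 2*hash2(x), 3*hash2(x) from the home address. The particular second function, `1 + key % (list_size - 1)`, is an assumption chosen so that the step is never zero; the slides do not specify one.

```python
def hash1(key: int, list_size: int) -> int:
    """Primary hashing function: gives the home address."""
    return key % list_size

def hash2(key: int, list_size: int) -> int:
    """Assumed second hashing function; it must never return 0."""
    return 1 + (key % (list_size - 1))

def double_hash_insert(table, key):
    """Probe home, home + hash2, home + 2*hash2, ... until a free slot is found."""
    size = len(table)
    home = hash1(key, size)
    step = hash2(key, size)
    for i in range(size):
        addr = (home + i * step) % size
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("no free slot found")

table = [None] * 11
slots = [double_hash_insert(table, k) for k in (22, 33, 44, 55)]
```

All four keys share home address 0, yet each gets a different step (3, 4, 5, 6), so they do not follow the same probe path. A prime list size helps here too, since every step is then guaranteed to reach all slots.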

Page 15: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Open Addressing: (using prime area)

  Probing (Linear, quadratic)

  Double Hashing

  Pseudo-random

  Key offset

Linked Lists (Separate Chaining)

(Bucket Hashing)

Re-hashing

Page 16: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Linked lists (Separate Chaining):

Separate chaining (may be modified by keeping the chain sorted!)

Modified Hash Table (by eliminating the first probe, hence the hash table becomes an array of records instead of an array of pointers to records)
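Separate chaining with sorted chains might look like the sketch below. For brevity each chain is a Python list kept in order with `bisect` rather than a hand-built linked list; that substitution, the class name, and the default size of 7 are assumptions for illustration.

```python
import bisect

class ChainedHashTable:
    """Each bucket holds a sorted chain of the keys that hash to it."""

    def __init__(self, list_size: int = 7):
        self.buckets = [[] for _ in range(list_size)]

    def _address(self, key: int) -> int:
        return key % len(self.buckets)

    def insert(self, key: int) -> None:
        chain = self.buckets[self._address(key)]
        pos = bisect.bisect_left(chain, key)
        if pos == len(chain) or chain[pos] != key:  # skip duplicates
            chain.insert(pos, key)

    def search(self, key: int) -> bool:
        chain = self.buckets[self._address(key)]
        pos = bisect.bisect_left(chain, key)
        return pos < len(chain) and chain[pos] == key

chained = ChainedHashTable()
for k in (15, 8, 1):  # all three keys hash to bucket 1
    chained.insert(k)
```

Keeping the chain sorted lets an unsuccessful search stop as soon as a larger key is reached instead of walking the whole chain.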

Page 17: Data Structures and Algorithms  Hashing First Year

Linked List (Separate Chaining)

Page 18: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Open Addressing: (using prime area)

  Probing (Linear, quadratic)

  Double Hashing

  Pseudo-random

  Key offset

Linked Lists (Separate Chaining)

(Bucket Hashing)

Re-hashing

Page 19: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Rehashing:

When the table becomes too full, operations will start taking too long.

Solution: Build another hash table of about double the size, with an associated hashing function, and scan down the entire original hash table, inserting each entry into the new one.

[Figure: successful search vs. unsuccessful search]

Page 20: Data Structures and Algorithms  Hashing First Year

3. Collision Resolution Strategies

Rehashing: When is the table too full?

Rehash when the table is half full

Rehash when an insertion fails

Rehash when the table reaches a certain load factor (best)
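The load-factor policy can be sketched as follows. The threshold of 0.75, the growth rule of `2 * size + 1`, the use of linear probing, and all names are assumptions made for illustration; the slides only say "about double" and "a certain load factor".

```python
class RehashingTable:
    """Open-addressing table that rehashes once a load-factor threshold is crossed."""

    MAX_LOAD = 0.75  # assumed threshold, not a value from the slides

    def __init__(self, list_size: int = 5):
        self.slots = [None] * list_size
        self.count = 0

    @staticmethod
    def _place(slots, key) -> bool:
        """Linear-probe key into slots; return False if it is already present."""
        size = len(slots)
        addr = key % size
        while slots[addr] is not None:
            if slots[addr] == key:
                return False
            addr = (addr + 1) % size
        slots[addr] = key
        return True

    def insert(self, key: int) -> None:
        # Grow first if this insertion would push the load factor past the threshold.
        if (self.count + 1) / len(self.slots) > self.MAX_LOAD:
            self._rehash()
        if self._place(self.slots, key):
            self.count += 1

    def _rehash(self) -> None:
        """Build a table of about double size and re-insert every old entry."""
        old = self.slots
        self.slots = [None] * (2 * len(old) + 1)
        for key in old:
            if key is not None:
                self._place(self.slots, key)

rt = RehashingTable()
for k in (5, 10, 15, 20):
    rt.insert(k)
```

The fourth insertion would give a load factor of 4/5, so the table grows from 5 to 11 slots before the key is placed. Because the list size changed, every key is hashed again and may land at a different address.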

Page 21: Data Structures and Algorithms  Hashing First Year

End of Hashing

Page 22: Data Structures and Algorithms  Hashing First Year

Probing

Definition:

Each calculation of an address and test for success is known as probing.

Page 23: Data Structures and Algorithms  Hashing First Year

Key offset collision resolution

Offset = key / list size

Address = (Offset + old address) % list size
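The two formulas above can be turned into a short probe-sequence generator. It assumes the division is integer division and that the home address comes from modulo-division; the function name and the `max_probes` parameter are hypothetical.

```python
def key_offset_sequence(key: int, list_size: int, max_probes: int):
    """Yield successive addresses: new address = (offset + old address) % list size."""
    offset = key // list_size   # integer division is assumed
    address = key % list_size   # assumed home address (modulo-division)
    for _ in range(max_probes):
        address = (offset + address) % list_size
        yield address

# key 166702 with list size 307: offset 543, home address 1
probes = list(key_offset_sequence(166702, 307, 2))
```

Since the offset depends on the key, two keys that collide at the same home address usually follow different probe paths. One caveat of this sketch: if the offset is a multiple of the list size, the address never changes, so a practical version would need a fallback step.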