DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI


Post on 30-Dec-2015




REVIEW
We have investigated the following ADTs:
• LISTS
  – Array
  – Linked List
• STACKS
• QUEUE
• TREES
  – Binary Trees
  – Binary Search Trees
  – AVL Trees

What about their running times?


Running times of important operations

              insertion   deletion    find
Array         O(n)        O(n)        O(n)
Linked list   O(1)        O(n)        O(n)
Tree          O(log n)    O(log n)    O(log n)

Can we decrease the running times more?


ROAD MAP
HASHING
• General Idea
• Hash Function
• Separate Chaining
• Open Addressing
• Rehashing


Hashing
• Hashing: implementation of hash tables
• Hash table: an array of elements of fixed size TableSize
• Search is performed on a part of the item: the key
• Each key is mapped into a number in the range 0 to TableSize-1
  – Used as the array index
• Mapping is done by the hash function
  – Simple to compute
  – Should ensure that any two distinct keys get different cells
• How to perform insert, delete and find operations in O(1) time?


An ideal hash table
• Each key is mapped to a different index!
• Not always possible
  – many keys, finite indexes
• Aim for an even distribution
• Considerations:
  – Choose a hash function
  – Decide what to do when two keys hash to the same value
  – Decide on table size


Hash function
If keys are integers
• Hash function returns Key mod TableSize
• Ex: TableSize = 10, Keys = 120, 330, 1000: all three keys hash to cell 0
• TableSize should be prime
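The effect of the table size can be checked with a minimal sketch. The name hashInt and the prime 13 are assumptions chosen for illustration; the slides only state Key mod TableSize and the keys 120, 330, 1000.

```cpp
#include <cassert>

// Hash an integer key into the range 0 .. tableSize-1.
// hashInt is a hypothetical name; the slides only give "Key mod TableSize".
int hashInt( int key, int tableSize )
{
    assert( tableSize > 0 );
    int h = key % tableSize;
    // In C++ the remainder of a negative key is negative, so fold it back.
    return h < 0 ? h + tableSize : h;
}
```

With tableSize = 10, the keys 120, 330 and 1000 all land in cell 0. With tableSize = 13 they land in cells 3, 5 and 12. A prime size does not rule out collisions, it only avoids the systematic ones caused by keys sharing a factor with the table size.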


Hash function
If keys are strings
• Add the ASCII values of the characters
• Poor choice if TableSize is large and the number of characters is small
  – Ex: TableSize = 10000 and number of characters in a key = 8
  – The largest ASCII value is 127, so hashVal is at most 127 * 8 = 1016 < 10000; only cells 0 to 1016 can be used

    int hash( const string & key, int tableSize )
    {
        int hashVal = 0;

        // Sum the character codes of the key
        for( int i = 0; i < key.length( ); i++ )
            hashVal += key[ i ];

        return hashVal % tableSize;
    }
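A small sketch of the additive hash shows a second weakness besides the limited range: the sum ignores character order, so anagrams always collide. The name additiveHash and the table size 10007 are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Additive string hash: sum of the character codes, reduced mod tableSize.
// additiveHash is a hypothetical name for the routine shown on the slide.
int additiveHash( const std::string & key, int tableSize )
{
    assert( tableSize > 0 );
    int hashVal = 0;
    for( unsigned char ch : key )   // unsigned char keeps every code non-negative
        hashVal += ch;
    return hashVal % tableSize;
}
```

For example, additiveHash( "abc", 10007 ) and additiveHash( "cba", 10007 ) both give 97 + 98 + 99 = 294.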


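Hash function
If keys are strings
• Use all characters: $\sum_{i=0}^{KeySize-1} Key[KeySize-i-1] \cdot 32^i$
• Problem: for long keys the early characters do not count, because repeated multiplication by 32 shifts them out of the machine word
• Use only some number of characters
  – Ex: use the characters in odd positions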
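The shifting-out effect can be made concrete with a sketch that evaluates the sum by Horner's rule in 32-bit unsigned arithmetic. The name polyHash32, the fixed 32-bit width and the table size used below are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Evaluate sum Key[KeySize-i-1] * 32^i by Horner's rule.
// Arithmetic is done in 32-bit unsigned integers, so overflow wraps mod 2^32.
// polyHash32 is a hypothetical name, not a routine from the slides.
std::uint32_t polyHash32( const std::string & key, std::uint32_t tableSize )
{
    assert( tableSize > 0 );
    std::uint32_t hashVal = 0;
    for( unsigned char ch : key )
        hashVal = ( hashVal << 5 ) + ch;   // multiply by 32, then add the character
    return hashVal % tableSize;
}
```

In an 8-character key the first character is multiplied by 32^7 = 2^35, which is 0 modulo 2^32. Two 8-character keys that differ only in their first character, such as "axxxxxxx" and "bxxxxxxx", therefore receive the same hash value.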


Hash function
If keys are strings
• Use the first three characters: key[0] + 27 * key[1] + 729 * key[2]
• If the keys are not random, some part of the table is not used.

    int hash( const string & key, int tableSize )
    {
        // Assumes the key has at least three characters
        return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
    }


A good hash function

    int hash( const string & key, int tableSize )
    {
        int hashVal = 0;

        // Horner's rule with multiplier 37, using every character
        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key[ i ];

        hashVal %= tableSize;
        if( hashVal < 0 )          // overflow may have produced a negative value
            hashVal += tableSize;

        return hashVal;
    }
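The routine above relies on int overflow wrapping around to a negative value, which standard C++ leaves undefined for signed integers. A hedged variant, assuming unsigned arithmetic is acceptable, avoids the issue because unsigned overflow is defined to wrap. The name hash37 is hypothetical, and for long keys its results can differ from those of the signed version.

```cpp
#include <cassert>
#include <string>

// Multiplier-37 string hash computed in unsigned arithmetic.
// hash37 is a hypothetical name; this is a variant of the slide's routine.
unsigned int hash37( const std::string & key, unsigned int tableSize )
{
    assert( tableSize > 0 );
    unsigned int hashVal = 0;
    for( unsigned char ch : key )
        hashVal = 37 * hashVal + ch;   // wraps modulo 2^N, never negative
    return hashVal % tableSize;        // already in 0 .. tableSize-1
}
```

For the key "ab" and tableSize = 101 the value is ( 97 * 37 + 98 ) mod 101 = 3687 mod 101 = 51.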


Collision
• Main programming detail is collision resolution
• If an element being inserted hashes to the same value as an already inserted element, there is a collision.
• There are several methods to deal with this problem
  – Separate chaining
  – Open addressing


Separate Chaining Hash Table
• Keep a list of all elements that hash to the same value
• TableSize = 10 is not a good choice because it is not prime
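The idea can be sketched with standard containers. This is a minimal illustration for non-negative integer keys, not the List-based class declared in these notes; ChainedIntTable, chainLength and cell are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Separate chaining for non-negative int keys: one std::list per cell.
class ChainedIntTable
{
  public:
    explicit ChainedIntTable( std::size_t tableSize ) : lists( tableSize )
    {
        assert( tableSize > 0 );
    }

    bool contains( int x ) const
    {
        const std::list<int> & chain = lists[ cell( x ) ];
        return std::find( chain.begin( ), chain.end( ), x ) != chain.end( );
    }

    void insert( int x )                 // duplicates are ignored
    {
        if( !contains( x ) )
            lists[ cell( x ) ].push_front( x );
    }

    void remove( int x )
    {
        lists[ cell( x ) ].remove( x );
    }

    std::size_t chainLength( std::size_t index ) const
    {
        return lists[ index ].size( );
    }

  private:
    std::size_t cell( int x ) const      // hash(x) = x mod TableSize
    {
        return static_cast<std::size_t>( x ) % lists.size( );
    }

    std::vector<std::list<int>> lists;
};
```

Inserting the squares 0, 1, 4, ..., 81 into a table of size 10 puts 1 and 81 in the list of cell 1 and 4 and 64 in the list of cell 4, while cells 2, 3, 7 and 8 stay empty.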


Type declaration for separate chaining hash table

    template <class HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( const HashedObj & notFound, int size = 101 );
        HashTable( const HashTable & rhs )
          : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), theLists( rhs.theLists ) { }

        const HashedObj & find( const HashedObj & x ) const;

        void makeEmpty( );
        void insert( const HashedObj & x );
        void remove( const HashedObj & x );

        const HashTable & operator=( const HashTable & rhs );

      private:
        vector<List<HashedObj> > theLists;   // The array of Lists
        const HashedObj ITEM_NOT_FOUND;
    };

    int hash( const string & key, int tableSize );
    int hash( int key, int tableSize );


    // Construct the hash table.
    template <class HashedObj>
    HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
      : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) )
    {
    }

    // Make the hash table logically empty.
    template <class HashedObj>
    void HashTable<HashedObj>::makeEmpty( )
    {
        for( int i = 0; i < theLists.size( ); i++ )
            theLists[ i ].makeEmpty( );
    }

    // Deep copy.
    template <class HashedObj>
    const HashTable<HashedObj> &
    HashTable<HashedObj>::operator=( const HashTable<HashedObj> & rhs )
    {
        if( this != &rhs )
            theLists = rhs.theLists;
        return *this;
    }


    // Remove item x from the hash table.
    template <class HashedObj>
    void HashTable<HashedObj>::remove( const HashedObj & x )
    {
        theLists[ hash( x, theLists.size( ) ) ].remove( x );
    }

    // Find item x in the hash table.
    // Return the matching item, or ITEM_NOT_FOUND if it is not present.
    template <class HashedObj>
    const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
    {
        ListItr<HashedObj> itr;
        itr = theLists[ hash( x, theLists.size( ) ) ].find( x );
        if( itr.isPastEnd( ) )
            return ITEM_NOT_FOUND;
        else
            return itr.retrieve( );
    }


    // Insert item x into the hash table.
    // If the item is already present, do nothing.
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        List<HashedObj> & whichList = theLists[ hash( x, theLists.size( ) ) ];
        ListItr<HashedObj> itr = whichList.find( x );

        if( itr.isPastEnd( ) )
            whichList.insert( x, whichList.zeroth( ) );   // insert at the front
    }


Analysis
• Let λ be the load factor of a hash table: λ = number of elements / TableSize
• λ is the average length of a list
• Successful find: about 1 + λ/2 comparisons + time to evaluate the hash function
• Unsuccessful find and insert: λ comparisons + time to evaluate the hash function
• Good choice: λ ≈ 1
• Disadvantage of separate chaining: memory has to be allocated and deallocated for the list nodes!


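Open Addressing
• On a collision, try alternate cells h0(x), h1(x), h2(x), … until an empty cell is found
• hi(x) = (hash(x) + F(i)) mod TableSize, with F(0) = 0
• All elements are stored in the table itself, so the load factor λ must be < 1
• Good choice: λ < 0.5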


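Linear Probing
• F is a linear function of i
  – F(i) = i
• Insert keys {89, 18, 49, 58, 69} (the positions below correspond to hash(x) = x mod 10)
• When 49 is inserted a collision occurs
  – Put it into the next available spot, cell 0
• 58 collides with 18, 89 and 49 before an empty cell is found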
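The insertion sequence can be traced with a short sketch. It assumes hash(x) = x mod TableSize with TableSize = 10, non-negative distinct keys, and -1 as the marker for an empty cell; linearProbeInsert is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insert the keys one by one with linear probing, F(i) = i.
// Returns the final table; -1 marks an empty cell.
std::vector<int> linearProbeInsert( const std::vector<int> & keys, int tableSize )
{
    assert( tableSize > 0 );
    assert( keys.size( ) <= static_cast<std::size_t>( tableSize ) );

    std::vector<int> table( tableSize, -1 );
    for( int key : keys )
    {
        int pos = key % tableSize;             // h0(x)
        while( table[ pos ] != -1 )            // occupied: try the next cell
            pos = ( pos + 1 ) % tableSize;     // wrap around at the end
        table[ pos ] = key;
    }
    return table;
}
```

For the keys {89, 18, 49, 58, 69} the result is 49 in cell 0, 58 in cell 1, 69 in cell 2, 18 in cell 8 and 89 in cell 9. The three late arrivals pile up behind one another, which is the primary clustering effect.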


Linear Probing
• Problem: it is not easy to delete an element
  – The element may have caused a collision before, so removing it would cut off later elements
  – Mark the element as deleted instead
• Problem: primary clustering


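Linear Probing

Analysis: expected number of probes, where λ is the load factor
• Unsuccessful search and insert (U&I): $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
• Successful search (S): $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$

Problem: Primary Clustering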
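To get a feeling for how quickly the cost of linear probing grows with the load factor, the two expected probe counts, ½(1 + 1/(1−λ)²) for unsuccessful search and insert and ½(1 + 1/(1−λ)) for successful search, can be evaluated directly. The function names are hypothetical.

```cpp
#include <cassert>

// Expected probes for an unsuccessful search or an insertion
// under linear probing: 1/2 * ( 1 + 1 / (1 - lambda)^2 ).
double linearProbesUnsuccessful( double lambda )
{
    assert( lambda >= 0.0 && lambda < 1.0 );
    const double d = 1.0 - lambda;
    return 0.5 * ( 1.0 + 1.0 / ( d * d ) );
}

// Expected probes for a successful search
// under linear probing: 1/2 * ( 1 + 1 / (1 - lambda) ).
double linearProbesSuccessful( double lambda )
{
    assert( lambda >= 0.0 && lambda < 1.0 );
    return 0.5 * ( 1.0 + 1.0 / ( 1.0 - lambda ) );
}
```

At λ = 0.5 the values are 2.5 and 1.5 probes, at λ = 0.75 they are 8.5 and 2.5, and at λ = 0.9 roughly 50.5 and 5.5.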


Quadratic Probing
• F(i) is a quadratic function
• Ex: F(i) = $i^2$


Quadratic Probing
• When 49 collides with 89, the next position attempted is one cell away
• 58 collides at position 8. The cell one away is tried, and another collision occurs. It is inserted into the cell $2^2 = 4$ away
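The same trace can be reproduced with a sketch of quadratic probing. It assumes hash(x) = x mod TableSize with TableSize = 10, the keys {89, 18, 49, 58, 69} from the linear probing example, non-negative distinct keys, and -1 as the empty marker; quadraticProbeInsert is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Insert the keys one by one with quadratic probing, F(i) = i * i.
// Returns the final table; -1 marks an empty cell.
std::vector<int> quadraticProbeInsert( const std::vector<int> & keys, int tableSize )
{
    assert( tableSize > 0 );

    std::vector<int> table( tableSize, -1 );
    for( int key : keys )
    {
        const int home = key % tableSize;          // h0(x)
        int pos = home;
        for( int i = 1; table[ pos ] != -1; i++ )
        {
            // Quadratic probing can miss empty cells; stop instead of looping forever.
            assert( i < tableSize );
            pos = ( home + i * i ) % tableSize;    // hi(x) = (hash(x) + i^2) mod TableSize
        }
        table[ pos ] = key;
    }
    return table;
}
```

The result is 49 in cell 0, 58 in cell 2, 69 in cell 3, 18 in cell 8 and 89 in cell 9, so the colliding keys are spread out more than with linear probing.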


Quadratic Probing
• Solves the primary clustering problem
• All empty cells may not be accessed
  – The probe sequence may loop around full cells
  – Hash table not full, but empty space not found
• Theorem: If the table size is prime and the load factor λ < 0.5, a new element can always be inserted.
• Problem: Secondary clustering


Type declaration for open addressing hash table

    template <class HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( const HashedObj & notFound, int size = 101 );
        HashTable( const HashTable & rhs )
          : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), array( rhs.array ),
            currentSize( rhs.currentSize ) { }

        const HashedObj & find( const HashedObj & x ) const;

        void makeEmpty( );
        void insert( const HashedObj & x );
        void remove( const HashedObj & x );

        const HashTable & operator=( const HashTable & rhs );

        enum EntryType { ACTIVE, EMPTY, DELETED };


Type declaration for open addressing hash table (continued)

      private:
        struct HashEntry
        {
            HashedObj element;
            EntryType info;

            HashEntry( const HashedObj & e = HashedObj( ), EntryType i = EMPTY )
              : element( e ), info( i ) { }
        };

        vector<HashEntry> array;
        int currentSize;
        const HashedObj ITEM_NOT_FOUND;

        bool isActive( int currentPos ) const;
        int findPos( const HashedObj & x ) const;
        void rehash( );
    };


    // Construct the hash table.
    template <class HashedObj>
    HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
      : ITEM_NOT_FOUND( notFound ), array( nextPrime( size ) )
    {
        makeEmpty( );
    }

    // Make the hash table logically empty.
    template <class HashedObj>
    void HashTable<HashedObj>::makeEmpty( )
    {
        currentSize = 0;
        for( int i = 0; i < array.size( ); i++ )
            array[ i ].info = EMPTY;
    }


    // Find item x in the hash table.
    // Return the matching item, or ITEM_NOT_FOUND if it is not present.
    template <class HashedObj>
    const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return array[ currentPos ].element;
        else
            return ITEM_NOT_FOUND;
    }

    // Method that performs quadratic probing resolution.
    // Return the position where the search for x terminates.
    template <class HashedObj>
    int HashTable<HashedObj>::findPos( const HashedObj & x ) const
    {
        int collisionNum = 0;
        int currentPos = hash( x, array.size( ) );

        while( array[ currentPos ].info != EMPTY &&
               array[ currentPos ].element != x )
        {
            currentPos += 2 * ++collisionNum - 1;   // compute the ith probe
            if( currentPos >= array.size( ) )
                currentPos -= array.size( );
        }
        return currentPos;
    }
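The update currentPos += 2 * ++collisionNum - 1 in findPos avoids computing a square on every probe. A short derivation, using F(i) = i² from the quadratic probing slides and writing h_i for h_i(x), shows why consecutive probe positions differ by 2i − 1:

```latex
\begin{align*}
F(i) - F(i-1) &= i^2 - (i-1)^2 = 2i - 1, \\
h_i &= \bigl(h_{i-1} + (2i - 1)\bigr) \bmod \mathit{TableSize}.
\end{align*}
```

As long as the increment 2i − 1 stays below the table size, which holds for the first TableSize / 2 probes, a single subtraction of array.size( ) is enough to bring the position back into range, so no mod operation is needed.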


    // Return true if currentPos exists and is active.
    template <class HashedObj>
    bool HashTable<HashedObj>::isActive( int currentPos ) const
    {
        return array[ currentPos ].info == ACTIVE;
    }

    // Remove item x from the hash table.
    // The cell is only marked as DELETED so that probe sequences stay intact.
    template <class HashedObj>
    void HashTable<HashedObj>::remove( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            array[ currentPos ].info = DELETED;
    }

    // Insert routine with quadratic probing
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;                       // x is already present

        array[ currentPos ] = HashEntry( x, ACTIVE );
    }


    // Deep copy.
    template <class HashedObj>
    const HashTable<HashedObj> &
    HashTable<HashedObj>::operator=( const HashTable<HashedObj> & rhs )
    {
        if( this != &rhs )
        {
            array = rhs.array;
            currentSize = rhs.currentSize;
        }
        return *this;
    }


Double Hashing
• Use a second hash function: F(i) = i * hash2(x)
• Poor example:
  – hash1(x) = X mod 10
  – hash2(x) = X mod 9
  – TableSize = 10
• If X = 99, what happens? hash2(99) = 0, so every probe lands on the same cell
• Requirement: hash2(x) ≠ 0 for any X


Double Hashing
• Good choice: hash2(x) = R - (X mod R)
• R is a prime and R < TableSize


Double Hashing

hash2(x) = 7 – (X mod 7)
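A sketch of double hashing with hash2(x) = R - (x mod R) and R = 7 follows. The slide does not restate the table or the keys, so TableSize = 10, hash1(x) = x mod 10 and the keys {89, 18, 49, 58, 69} from the earlier probing examples are assumptions; doubleHashInsert is a hypothetical name and -1 marks an empty cell.

```cpp
#include <cassert>
#include <vector>

// Insert the keys one by one with double hashing, F(i) = i * hash2(x),
// where hash2(x) = r - (x mod r). Returns the final table; -1 marks an empty cell.
std::vector<int> doubleHashInsert( const std::vector<int> & keys, int tableSize, int r )
{
    assert( tableSize > 0 && r > 0 && r < tableSize );

    std::vector<int> table( tableSize, -1 );
    for( int key : keys )
    {
        const int step = r - ( key % r );          // hash2(x), always in 1 .. r
        int pos = key % tableSize;                 // hash1(x)
        for( int i = 0; table[ pos ] != -1; i++ )
        {
            // Stop instead of looping forever if no empty cell is reachable.
            assert( i < tableSize );
            pos = ( pos + step ) % tableSize;      // advance by hash2(x) each probe
        }
        table[ pos ] = key;
    }
    return table;
}
```

Under these assumptions 49 moves 7 cells from cell 9 to cell 6, 58 moves 5 cells from cell 8 to cell 3, and 69 moves 1 cell from cell 9 to cell 0. Each colliding key gets its own step size, so the keys do not follow one another's probe sequence.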


Analysis
Random collision resolution, where λ is the load factor
• Probes are independent
• No clustering problem

Unsuccessful search and insert
• Number of probes until an empty cell is found
• Fraction of cells that are empty = 1 - λ
• Expected number of probes = 1 / (1 - λ)

Successful search
• P(X) = number of probes when the element X was inserted
• The average over all N elements, (1/N) ∑ P(X), is approximately
  $\frac{1}{\lambda}\int_0^{\lambda}\frac{1}{1-x}\,dx = \frac{1}{\lambda}\ln\frac{1}{1-\lambda}$
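The two expressions for random collision resolution, 1/(1−λ) for unsuccessful search and insert and (1/λ) ln(1/(1−λ)) for successful search, can be evaluated numerically with a small sketch. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected probes for an unsuccessful search or an insertion
// with random collision resolution: 1 / (1 - lambda).
double randomProbesUnsuccessful( double lambda )
{
    assert( lambda >= 0.0 && lambda < 1.0 );
    return 1.0 / ( 1.0 - lambda );
}

// Expected probes for a successful search
// with random collision resolution: (1 / lambda) * ln( 1 / (1 - lambda) ).
double randomProbesSuccessful( double lambda )
{
    assert( lambda > 0.0 && lambda < 1.0 );
    return std::log( 1.0 / ( 1.0 - lambda ) ) / lambda;
}
```

At λ = 0.5 an unsuccessful search takes 2 probes on average and a successful one about 1.39, compared with 2.5 and 1.5 given by the linear probing formulas at the same load factor.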


Rehashing
• If the load factor λ gets large, the number of probes increases.
• Operations start taking too long and insertions might fail
• Solution: rehashing with a larger TableSize (usually about twice as large)
• When to rehash
  – if λ > 0.5
  – if an insertion fails


Rehashing Example
• Elements 13, 15, 24 and 6 are inserted into an open addressing hash table of size 7
• H(X) = X mod 7
• Linear probing is used to resolve collisions


Rehashing Example
• If 23 is inserted, the table is over 70 percent full.
• A new table is created
• 17 is the first prime at least twice as large as the old table size, so Hnew(X) = X mod 17
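The example can be replayed with a minimal sketch, assuming non-negative integer keys, linear probing and -1 as the empty marker. probeInsert, rehashInto and EMPTY_CELL are hypothetical names, and the new size 17 is passed in directly rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY_CELL = -1;

// Insert key with linear probing; H(X) = X mod table.size( ).
// Assumes the table still has at least one empty cell.
void probeInsert( std::vector<int> & table, int key )
{
    std::size_t pos = static_cast<std::size_t>( key ) % table.size( );
    while( table[ pos ] != EMPTY_CELL )
        pos = ( pos + 1 ) % table.size( );
    table[ pos ] = key;
}

// Build a larger table and re-insert every element of the old one,
// scanning the old table from cell 0 upwards.
std::vector<int> rehashInto( const std::vector<int> & oldTable, std::size_t newSize )
{
    assert( newSize > oldTable.size( ) );
    std::vector<int> newTable( newSize, EMPTY_CELL );
    for( int key : oldTable )
        if( key != EMPTY_CELL )
            probeInsert( newTable, key );
    return newTable;
}
```

After inserting 13, 15, 24, 6 and 23 into a table of size 7, the cells 0, 1, 2, 3 and 6 hold 6, 15, 23, 24 and 13. Rehashing into a table of size 17 places 6 in cell 6, 23 in cell 7, 24 in cell 8, 13 in cell 13 and 15 in cell 15.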


Rehashing
• Rehashing is an expensive operation
  – Running time is O(N)
• Rehashing frees the programmer from worrying about table size
• Amortized analysis: average over N operations
  – Operations take O(1) time


    // Insert routine with quadratic probing
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;                       // x is already present

        array[ currentPos ] = HashEntry( x, ACTIVE );

        // Rehash when the table becomes more than half full
        if( ++currentSize > array.size( ) / 2 )
            rehash( );
    }

    // Expand the hash table.
    template <class HashedObj>
    void HashTable<HashedObj>::rehash( )
    {
        vector<HashEntry> oldArray = array;

        // Create a new, empty table about twice as large
        array.resize( nextPrime( 2 * oldArray.size( ) ) );
        for( int j = 0; j < array.size( ); j++ )
            array[ j ].info = EMPTY;

        // Copy the active elements over
        currentSize = 0;
        for( int i = 0; i < oldArray.size( ); i++ )
            if( oldArray[ i ].info == ACTIVE )
                insert( oldArray[ i ].element );
    }
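The constructors and rehash( ) call nextPrime, which is not defined in these notes. One possible implementation, assuming it should return the smallest prime greater than or equal to its argument, is sketched below with simple trial division; isPrime is a hypothetical helper.

```cpp
#include <cassert>

// Trial-division primality test; adequate for table sizes.
bool isPrime( int n )
{
    if( n < 2 )
        return false;
    if( n % 2 == 0 )
        return n == 2;
    for( int d = 3; d <= n / d; d += 2 )   // d <= n / d avoids overflow of d * d
        if( n % d == 0 )
            return false;
    return true;
}

// Smallest prime >= n (assumed behaviour of the nextPrime used in these notes).
int nextPrime( int n )
{
    assert( n >= 0 );
    if( n <= 2 )
        return 2;
    if( n % 2 == 0 )
        n++;                               // only odd candidates above 2
    while( !isPrime( n ) )
        n += 2;
    return n;
}
```

With this definition nextPrime( 14 ) is 17, matching the rehashing example, and a table created with the default size 101 grows to nextPrime( 202 ) = 211 on its first rehash.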