hashing and hash tables

Upload: adil-raja

Post on 08-May-2015

Category:

Engineering


DESCRIPTION

A Lecture about hashing and hash tables.

TRANSCRIPT

Page 1: Hashing and Hash Tables

Introduction Hashing Techniques Applications

HASHING

Muhammad Adil Raja

Page 2: Hashing and Hash Tables

OUTLINE

1 INTRODUCTION

2 HASHING TECHNIQUES

3 APPLICATIONS

Page 5: Hashing and Hash Tables

HASHING

The idea of hashing is to distribute the entries of a dataset across an array of buckets.
Given a key, the algorithm computes an index that suggests where an entry can be found:
    index = f(key, array_size)
Often this is done in two steps:
    hash = hashfunc(key)
    index = hash % array_size
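
The two steps can be sketched in Python as follows. This is only an illustration: the slide does not say which hash function is used, so CRC-32 from the standard library is an assumption chosen because it gives the same value on every run, and `bucket_index` is a hypothetical name.

```python
import zlib


def hashfunc(key: str) -> int:
    """Step 1: map the key to an integer, independently of the table size."""
    # CRC-32 is an assumption of this sketch, not the lecture's choice.
    return zlib.crc32(key.encode("utf-8"))


def bucket_index(key: str, array_size: int) -> int:
    """Step 2: reduce the hash to a valid index in the range [0, array_size)."""
    return hashfunc(key) % array_size


for key in ("apple", "banana", "cherry"):
    print(key, "->", bucket_index(key, 8))
```

Keeping the two steps separate means the table can be resized without changing the hash function; only the modulus changes.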

Page 6: Hashing and Hash Tables

WHAT IS HASHING

A Hash Table
A data structure to implement an associative array.
A structure that can map keys to values.
Uses a hash function to compute an index into an array of buckets or slots from which the correct value can be found.

Page 7: Hashing and Hash Tables

HASHING TECHNIQUES

Separate Chaining.
Open Addressing.
Coalesced Hashing.
...

Page 8: Hashing and Hash Tables

HASH FUNCTION

A good hash function is crucial for good hash table performance.
It can be difficult to achieve.
A basic expectation is that the function provides a uniform distribution of hash values.
A non-uniform distribution increases the number of collisions and the cost of resolving them.
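
The effect of a non-uniform distribution can be seen by counting how many keys land in each bucket. The sketch below compares two hypothetical hash functions (neither comes from the lecture): `poor_hash` looks only at the first character, while `better_hash` is a simple polynomial hash in which every character contributes.

```python
from collections import Counter


def poor_hash(key: str) -> int:
    # Uses only the first character, so keys sharing a prefix always collide.
    return ord(key[0])


def better_hash(key: str) -> int:
    # Polynomial hash: every character influences the result.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_sizes(hash_fn, keys, num_buckets):
    """Return the number of keys that fall into each bucket."""
    counts = Counter(hash_fn(k) % num_buckets for k in keys)
    return [counts.get(i, 0) for i in range(num_buckets)]


keys = [f"user{i}" for i in range(1000)]
poor = bucket_sizes(poor_hash, keys, 16)
better = bucket_sizes(better_hash, keys, 16)
print("largest bucket, poor hash:  ", max(poor))
print("largest bucket, better hash:", max(better))
```

Because every key starts with the same letter, the poor function sends all 1000 keys to a single bucket, while the polynomial hash spreads them over many buckets.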

Page 9: Hashing and Hash Tables

COLLISION RESOLUTION

Collisions are practically unavoidable.
Birthday problem: with keys hashed at random, a collision becomes likely long before the table is full.
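
The birthday problem can be made concrete with a short calculation. The sketch below assumes an ideal hash function that places each key uniformly at random; `collision_probability` is a hypothetical helper name.

```python
def collision_probability(n: int, k: int) -> float:
    """Probability that at least two of n keys share one of k buckets,
    assuming each key is placed uniformly at random."""
    if n > k:
        return 1.0  # pigeonhole principle: more keys than buckets
    p_no_collision = 1.0
    for i in range(n):
        # The (i+1)-th key must avoid the i buckets already taken.
        p_no_collision *= (k - i) / k
    return 1.0 - p_no_collision


# The classic case: 23 people and 365 possible birthdays.
print(round(collision_probability(23, 365), 4))
```

With only 23 keys in 365 buckets the probability of at least one collision already exceeds one half, which is why a hash table needs a collision resolution strategy even when it is mostly empty.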

Page 10: Hashing and Hash Tables

SEPARATE CHAINING

Every bucket is independent and maintains a list of entries with the same index.
Time for hash table operations depends on the time to find the bucket (constant) plus the time for the list operation.
The technique is also called open hashing or closed addressing.
In a good hash table every bucket has very few entries.
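
A minimal sketch of separate chaining is shown below. It is not the lecture's implementation: the class name `ChainedHashTable` is hypothetical, Python's built-in `hash` stands in for the hash function, and each chain is a Python list rather than a linked list.

```python
class ChainedHashTable:
    """Hash table in which every bucket keeps its own list of (key, value) pairs."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Finding the bucket takes constant time.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key: extend the chain

    def get(self, key, default=None):
        # The cost is proportional to the length of this one chain.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def remove(self, key) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable(num_buckets=4)
table.put("a", 1)
table.put("b", 2)
table.put("a", 3)
print(table.get("a"), table.get("b"), table.get("missing"))
```

Every operation first selects a bucket and then scans only that bucket's chain, so performance depends on the chains staying short.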

Page 11: Hashing and Hash Tables

SEPARATE CHAINING

FIGURE: Separate chaining (image not included in the transcript).

Page 12: Hashing and Hash Tables

SEPARATE CHAINING WITH LINKED LISTS

Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms.
They can use simple hash functions that are unsuitable for other methods.
The cost of a table operation depends on the size of the bucket selected for the desired key.
The worst case is when all the entries are inserted into the same bucket, so a lookup may have to scan every entry.

Page 13: Hashing and Hash Tables

SEPARATE CHAINING WITH OTHER DATA STRUCTURES

AVL Trees.
BSTs.
Dynamic Arrays.

Page 14: Hashing and Hash Tables

TIME COMPLEXITY MEASURES

TABLE: Time Complexity Measures

                  Guarantee                     Average Case
Implementation    Search    Insert    Delete    Search      Insert      Delete
Unordered Array   N         N         N         N/2         N/2         N/2
Ordered Array     lg N      N         N         lg N        N/2         N/2
Unordered List    N         N         N         N/2         N           N/2
Ordered List      N         N         N         N/2         N/2         N/2
BST               N         N         N         1.39 lg N   1.39 lg N   ?
Randomized BST    7 lg N    7 lg N    7 lg N    1.39 lg N   1.39 lg N   1.39 lg N

Page 15: Hashing and Hash Tables

OPEN ADDRESSING (CLOSED HASHING)

All entry records are stored in the bucket array itself.
Insertion of a new entry: the buckets are examined, starting from the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found.
Searching: the buckets are scanned in the same sequence until the target entry is found, or an unused slot is found, which indicates that there is no such key in the table.
Open Addressing: refers to the fact that the location (address) of an entry is not determined solely by its hash value.
Closed Hashing: not to be confused with open hashing or closed addressing, names reserved for separate chaining.
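
The insertion and search procedures above can be sketched as follows, using linear probing with a step of 1 as the probe sequence. The class name `LinearProbingTable` is hypothetical and Python's built-in `hash` is an assumption. Deletion is deliberately left out, because emptying a slot would break the "stop at an unused slot" rule and needs extra bookkeeping.

```python
class LinearProbingTable:
    """Open addressing: all entries live in the slot array itself."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity: int = 8):
        self.slots = [self._EMPTY] * capacity

    def _probe(self, key):
        # Start at the hashed-to slot, then step by 1, wrapping around.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key, value):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY or slot[0] == key:
                self.slots[i] = (key, value)
                return
        raise OverflowError("table is full")

    def search(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY:
                return default  # unused slot reached: key is not in the table
            if slot[0] == key:
                return slot[1]
        return default  # every slot examined without a match


table = LinearProbingTable(capacity=4)
table.insert(0, "a")
table.insert(4, "b")  # collides with key 0, so it moves on to the next slot
print(table.search(4), table.search(8))
```

The second insertion shows why the address is not fixed by the hash alone: key 4 hashes to slot 0 but ends up in slot 1.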

Page 16: Hashing and Hash Tables

PROBE SEQUENCES

Linear Probing – A fixed interval between probes (usually 1).
Quadratic Probing – Interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original computation.
Double Hashing – Interval between probes is computed by another hash function.
Drawback: The number of stored entries cannot exceed the number of slots in the bucket array.
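
The three probe sequences can be compared by listing the slots each one visits. The functions below are a sketch with hypothetical names; the quadratic variant uses the triangular-number offsets i(i+1)/2, which is one common choice of quadratic polynomial and not necessarily the one intended in the lecture.

```python
def linear_probe(h: int, size: int, step: int = 1):
    """Fixed interval between probes."""
    return [(h + i * step) % size for i in range(size)]


def quadratic_probe(h: int, size: int):
    """Offsets 0, 1, 3, 6, 10, ... so the interval grows by one each probe."""
    return [(h + i * (i + 1) // 2) % size for i in range(size)]


def double_hash_probe(h1: int, h2: int, size: int):
    """Interval h2 comes from a second hash function.
    h2 must be non-zero and share no factor with size to reach every slot."""
    return [(h1 + i * h2) % size for i in range(size)]


print(linear_probe(3, 8))          # [3, 4, 5, 6, 7, 0, 1, 2]
print(quadratic_probe(3, 8))       # [3, 4, 6, 1, 5, 2, 0, 7]
print(double_hash_probe(3, 5, 8))  # [3, 0, 5, 2, 7, 4, 1, 6]
```

All three sequences visit every slot of an 8-slot table exactly once, but in different orders, which changes how entries cluster after collisions.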

Page 17: Hashing and Hash Tables

OPEN ADDRESSING

FIGURE: Open addressing (image not included in the transcript).

Page 18: Hashing and Hash Tables

LOAD FACTOR – A KEY STATISTIC

Number of entries divided by the number of buckets: n/k.
If this grows too large the hash table becomes slow.
The variance of the number of entries per bucket is also important.
Suppose two tables each have 1000 entries and 1000 buckets.
One has exactly one entry in each bucket and the other has all the entries in one bucket.
Both have the same load factor, but hashing is not working in the second table.
A low load factor is not especially beneficial.
As the load factor approaches 0, the proportion of unused areas in the hash table increases.
This does not necessarily reduce the search cost.
This results in wasted memory.
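
The two-table example can be checked numerically. The sketch below represents each table only by its list of bucket sizes; the helper names are hypothetical, and the search cost assumes separate chaining with each bucket's list scanned from the front.

```python
from statistics import pvariance


def load_factor(bucket_sizes):
    """Number of entries divided by number of buckets (n/k)."""
    return sum(bucket_sizes) / len(bucket_sizes)


def mean_successful_search_cost(bucket_sizes):
    """Average number of entries examined to find a stored key."""
    n = sum(bucket_sizes)
    # A bucket holding s entries costs 1 + 2 + ... + s = s(s+1)/2 in total.
    return sum(s * (s + 1) / 2 for s in bucket_sizes) / n


even = [1] * 1000              # one entry in every bucket
skewed = [1000] + [0] * 999    # all entries in a single bucket

for name, sizes in (("even", even), ("skewed", skewed)):
    print(name, load_factor(sizes), pvariance(sizes),
          mean_successful_search_cost(sizes))
```

Both tables have a load factor of 1, yet the variance and the average search cost differ enormously, which is why the load factor alone does not describe how well a table performs.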

Page 19: Hashing and Hash Tables

HOW DROPBOX KNOWS YOU ARE SHARING COPYRIGHTED STUFF

Dropbox checks the hash of a shared file against a banned list, and blocks the share if there is a match.
With a properly implemented hash function, running the same exact file through the algorithm twice will return the same identifier both times, but changing a file even slightly completely changes the hash.
This identifier can be used to tell you if a file is exactly the same as another file, but it is a one way street.
The hash couldn’t tell you what that original file is, without you already knowing or having a copy of the file to compare it to.

Page 20: Hashing and Hash Tables

DROPBOX

FIGURE: Dropbox (image not included in the transcript).

Page 21: Hashing and Hash Tables

DROPBOX

When you upload a file to Dropbox, two things happen to it: a hash is generated, and then the file gets encrypted to keep any unauthorized user (be it a hacker or a Dropbox employee) who somehow stumbles upon it sitting on Dropbox’s servers from easily being able to open it up.
After a DMCA complaint is verified by Dropbox’s legal team, Dropbox adds that file’s hash to a big blacklist of hashes known to be those corresponding to files they can’t legally allow to be shared.
When you share a link to a file, it checks that file’s hash against the blacklist.
If the file you are sharing is the exact same file that a copyright holder complained about, it is blocked from being shared with others.
If it is something else (a new file, or even a modified version of the same file) a hash-based anti-infringement system should not have any idea what it is looking at.
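
The blacklist check can be illustrated with a few lines of Python. This is a sketch of the general idea only: the lecture does not say which hash algorithm or data structures Dropbox uses, so SHA-256 from `hashlib`, the class `ShareFilter` and its method names are all assumptions.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Identifier for a file's exact contents (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


class ShareFilter:
    """Hypothetical stand-in for a hash blacklist."""

    def __init__(self):
        self.banned_hashes = set()  # a set is itself backed by a hash table

    def ban(self, data: bytes):
        # Only the hash is stored, not the file.
        self.banned_hashes.add(file_hash(data))

    def may_share(self, data: bytes) -> bool:
        return file_hash(data) not in self.banned_hashes


banned_file = b"contents of a file named in a complaint"
modified_file = banned_file + b"!"

share_filter = ShareFilter()
share_filter.ban(banned_file)
print(share_filter.may_share(banned_file))    # exact copy: blocked
print(share_filter.may_share(modified_file))  # changed by one byte: not recognised
```

The exact copy is rejected while the one-byte variant passes, matching the point that a hash-based system recognises only identical files.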

Page 22: Hashing and Hash Tables

SUBTREE CACHING (IN SYMBOLIC REGRESSION)

FIGURE: Crossover of expression trees built from functions and terminals, with subtrees selected randomly for crossover. Parents: x * (tan y + z) and log (x + yz). Offspring: x * ((x + yz) + z) and log (tan y).

Page 23: Hashing and Hash Tables

SUBTREE CACHING

Every subtree is evaluated and cached, along with its evaluation.
As a new tree arrives, its subtrees are supposed to be evaluated recursively.
Before evaluation, the cache is checked for an evaluation of a matching subtree.
If found, the cached evaluation is reused.
If not found, the new subtree is evaluated and its evaluation is stored in the cache.
This improves performance by saving time on unnecessary evaluations.
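
A minimal sketch of subtree caching is given below. It assumes expression trees are written as nested tuples such as `("+", "x", ("*", "y", "z"))`, that the variables keep fixed values while the cache is in use, and that a Python dictionary (itself a hash table) serves as the cache; the class `CachingEvaluator` and the `FUNCTIONS` table are hypothetical names.

```python
import math

# Function nodes supported by this sketch.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "tan": math.tan,
    "log": math.log,
}


class CachingEvaluator:
    def __init__(self, variables):
        self.variables = variables  # terminal name -> value
        self.cache = {}             # subtree -> its evaluation
        self.evaluations = 0        # function nodes actually computed

    def evaluate(self, tree):
        if isinstance(tree, str):            # terminal: a variable
            return self.variables[tree]
        if isinstance(tree, (int, float)):   # terminal: a constant
            return tree
        if tree in self.cache:               # matching subtree seen before
            return self.cache[tree]
        op, *args = tree
        value = FUNCTIONS[op](*(self.evaluate(a) for a in args))
        self.evaluations += 1
        self.cache[tree] = value
        return value


ev = CachingEvaluator({"x": 2.0, "y": 0.5, "z": 3.0})
parent1 = ("*", "x", ("+", ("tan", "y"), "z"))
parent2 = ("log", ("+", "x", ("*", "y", "z")))
# Offspring obtained by swapping the subtrees (tan y) and (x + y*z).
child1 = ("*", "x", ("+", ("+", "x", ("*", "y", "z")), "z"))
child2 = ("log", ("tan", "y"))

for tree in (parent1, parent2, child1, child2):
    ev.evaluate(tree)
print(ev.evaluations)
```

The four trees contain 12 function nodes in total, but only 9 are computed, because the swapped subtrees were already cached when the parents were evaluated.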

Page 24: Hashing and Hash Tables

THANK YOU