LECTURE 11: WHY I LIKE HASH CSC 213 – Large Scale Programming

Upload: cornelia-lloyd

Post on 16-Jan-2016

Today’s Goal

- Consider what will be important when searching
  - Why search in the first place? What is its purpose?
  - What should we expect & handle when searching?
  - What factors matter to our users (and ourselves)?
- (Besides source of bad jokes) What is hashing?
  - Why is it important for searching? How can it help?
  - What are critical factors of a good hash function?
  - Commonly-used hash function example examined

Keys To Map & Dictionary

1. Used to convert the key into value
2. Values cannot share a key and be in same Map
3. In searching, failure is normal, not exceptional

Entry ADT

- Needs 2 pieces: what we have & what we want
  - First part is the key: data used in search
  - Item we want is value; the second part of an Entry
- Implementations must define 2 methods
  - key() & value() return appropriate item
  - Usually includes setValue() but NOT setKey()
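
The Entry ADT described above might be sketched in Java as follows. This is only an illustration: the method names key(), value(), and setValue() come from the slide, while the SimpleEntry class and the choice to have setValue() return the old value are assumptions.

```java
// Hypothetical sketch of the Entry ADT; not taken from a specific library.
interface Entry<K, V> {
    K key();                 // what we have: data used in the search
    V value();               // what we want: the item associated with the key
    V setValue(V newValue);  // value may change; returns the previous value (assumption)
    // Deliberately no setKey(): the key identifies the Entry.
}

// One possible implementation, for illustration only.
final class SimpleEntry<K, V> implements Entry<K, V> {
    private final K key;     // final, so the key cannot change after construction
    private V value;

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override public K key() { return key; }

    @Override public V value() { return value; }

    @Override public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```

Making the key field final is one simple way to enforce the "NOT setKey()" rule at compile time.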

SEQUENCE-Based Map

[Figure 1: SEQUENCE’s perspective of the MAP: it holds POSITIONs, each storing an element]

[Figure 2: Outside view of the MAP and how it is stored: POSITIONs storing ENTRYs]

[Figure 3: MAP implementation’s view of data and storage: POSITIONs whose elements are ENTRYs]

Emergency

Please hold while the machine searches 1,000,000 records for your location

Map Performance

- In all seriousness, can be matter of life-or-death
  - 911 operators immediately need addresses
  - Google’s search performance in TB/s
  - O(log n) time too slow for these uses
- Would love to use arrays
  - Get O(1) time to add, remove, or look up data
  - This HUGE array needs massive RAM purchase

Monster Amounts of RAM

- Java requires using int as array index
  - Limited by int and by RAM available in a machine
  - Integer.MAX_VALUE = 2,147,483,647
  - 8,200,000,000 pages in Google’s index (2005)
  - In US, possible phone numbers = 10,000,000,000
- Must do more for O(1) array usage time

As with all life’s problems we turn to hash

Hashing To The Rescue

- Hash function turns key into int from 0 – N-1
  - Result is usable as index for an array
  - Specific for key’s type; cannot be reused
- Store the Entrys in array (“HASH TABLE”)
  - (Great name for shop in Amsterdam, too)
  - Begin by computing key’s hash value
  - Result is array index for that Entry
- Now it is possible to use array for O(1) time!

Hash Table Example

Example shows table of Entry<Long,String>

- Simple hash function is h(x) = x mod 10,000
  - x is/from Entry’s key
  - h(x) computes index to use
  - Always is mod array length
- Not all locations used
  - Holes will appear in array
  - Empties: set to null -or- use sentinel value

Hash Table (• marks an empty slot)

Index   Entry key     Entry value
0       •
1       0256120001    “Jay Doe”
2       9811010002    “Bob Doe”
3       •
4       4512290004    “Jill Roe”
⁞       ⁞
9997    •
9998    2007519998    “Rhi Smith”
9999    •
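
As a rough illustration of the example above, the sketch below computes h(x) = x mod 10,000 for the four keys in the table. The class name ModHashDemo and the use of a plain String array (storing only the values rather than full Entry objects) are simplifying assumptions, and keys are assumed to be non-negative.

```java
// Sketch of the slide's example: 10,000 slots indexed by h(x) = x mod 10,000.
final class ModHashDemo {
    static final int TABLE_LENGTH = 10_000;

    // h(x): always mod the array length. Assumes key >= 0.
    static int h(long key) {
        return (int) (key % TABLE_LENGTH);
    }

    public static void main(String[] args) {
        String[] table = new String[TABLE_LENGTH];  // unused slots stay null ("holes")

        // Key 0256120001 is written without its leading zero:
        // a leading 0 would make a Java integer literal octal.
        table[h(256120001L)] = "Jay Doe";     // index 1
        table[h(9811010002L)] = "Bob Doe";    // index 2
        table[h(4512290004L)] = "Jill Roe";   // index 4
        table[h(2007519998L)] = "Rhi Smith";  // index 9998

        System.out.println(table[4]);         // prints Jill Roe
        System.out.println(table[3]);         // prints null (empty slot)
    }
}
```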

When We Use Hash

- Hash key to find index
  - First step for most calls
  - get(): need index to check
  - put(): add at that index
  - remove(): index to set null
- Then check key at index
  - Many keys possible at one index
  - Still a Map, so results known
  - If the keys you find are not the same, cannot treat them as the same!
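
The two steps above (hash to find the index, then check the key stored there) might look like the following sketch. TinyHashMap is a hypothetical class, not the course's implementation. It assumes non-negative long keys, uses null to mark an empty slot, and simply overwrites the slot on put(), so it does not handle two different keys that hash to the same index.

```java
// Minimal sketch: hash first, then compare keys before trusting the slot.
final class TinyHashMap {
    private final long[] keys;
    private final String[] values;   // null marks an empty slot

    TinyHashMap(int length) {
        keys = new long[length];
        values = new String[length];
    }

    // First step for every call: turn the key into an array index.
    private int index(long key) {
        return (int) (key % keys.length);
    }

    // Returns the value for key, or null when absent (failure is normal, not exceptional).
    String get(long key) {
        int i = index(key);
        if (values[i] != null && keys[i] == key) {
            return values[i];
        }
        return null;  // slot empty, or occupied by a DIFFERENT key
    }

    // Stores the pair at the hashed index; returns the old value only if the key matched.
    String put(long key, String value) {
        int i = index(key);
        String old = (values[i] != null && keys[i] == key) ? values[i] : null;
        keys[i] = key;
        values[i] = value;
        return old;
    }

    // Sets the slot to null, but only when it really holds this key.
    String remove(long key) {
        int i = index(key);
        if (values[i] == null || keys[i] != key) {
            return null;
        }
        String old = values[i];
        values[i] = null;
        return old;
    }
}
```

For example, with a table of length 10,000, the keys 4512290004 and 10004 both hash to index 4, so get(10004) must return null even though the slot is occupied.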

Properties of Good Hash

To really be useful, hash must have properties:

- Reliable
- FAST
- Use entire table

- Make good brownies

Reliability of Hash Function

- Implement Map with a hash table
  - To use Entry, get key to easily look up its index
  - Hash must always compute same index for that key

Speed of Hash Function

- Hash must be computed on each access
  - Goal: O(1) efficiency by using an array
  - Efficiency of array wasted if hash is slow
- If O(1) computation performed by hash function
  - It is possible to perform get in O(1) time
  - O(1) time for put & remove could also occur
  - None of this is guaranteed; many problems can occur

Use Entire Table Important

- Hashing takes lots of space because array is used
  - When creating, make array big enough to hold all data
  - Can copy to larger array, but this is not an O(1) operation
  - Use prime number lengths, but these quickly get large
- Good hash spreads out Entrys equally across entire table
  - Further apart they are spread, easier to find opening

Hash Function Analogy

[Figure: analogy image, labeled “Hash function” and “Hash table”]


Examples of Bad Hash

- h(x) = 0
  - Reliable, fast, little use of table
- h(x) = random.nextInt()
  - Unreliable, fast, uses entire table
- h(x) = current index -or- free index
  - Reliable, slow, uses entire table
- h(x) = x^{34} + 2x^{33} + 24x^{32} + 10x^{31} + …
  - Reliable, moderate, too large

Incredibly Bad Hash

- Using only part of key & not whole thing
  - No matter what, inevitably, you will guess wrong

[Figure labels: “Part used for hash”, “Part that matters”]


Good Hash

- Hash must first turn key into int
  - Easy for numbers, but rarely that simple in real life
- For a String, could add value of each character
  - Would hash to same index “spot”, “pots”, “stop”
- Instead we usually use polynomial code:

    (x_0 * a^{k-1}) + (x_1 * a^{k-2}) + … + (x_{k-2} * a^1) + x_{k-1}

“spot” = (‘s’ * a^3) + (‘p’ * a^2) + (‘o’ * a^1) + (‘t’ * a^0)

“stop” = (‘s’ * a^3) + (‘t’ * a^2) + (‘o’ * a^1) + (‘p’ * a^0)
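
To make the contrast concrete, here is a small sketch comparing the "add each character" idea with a polynomial code computed term by term. The class and method names are hypothetical, the caller picks the constant a, and a long is used so that short words do not overflow; long keys would still overflow, which this sketch ignores.

```java
// Sketch: summing characters collides on anagrams; a polynomial code does not.
final class StringCodes {
    // Adds the value of each character, so letter order is ignored.
    static int sumCode(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Polynomial code, term by term: x_0*a^(k-1) + x_1*a^(k-2) + ... + x_(k-1).
    // Recomputes each power from scratch, which is the slow approach.
    static long polynomialCode(String s, int a) {
        int k = s.length();
        long total = 0;
        for (int i = 0; i < k; i++) {
            long power = 1;
            for (int j = 0; j < k - 1 - i; j++) {
                power *= a;                     // builds a^(k-1-i)
            }
            total += s.charAt(i) * power;
        }
        return total;
    }

    public static void main(String[] args) {
        // Same letters, same sum: all three words share one code.
        System.out.println(sumCode("spot") == sumCode("pots"));   // prints true
        System.out.println(sumCode("spot") == sumCode("stop"));   // prints true
        // Position now matters, so the codes differ.
        System.out.println(polynomialCode("spot", 33) == polynomialCode("stop", 33)); // prints false
    }
}
```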

Good, Fast Hash

- Polynomial codes good, but very slow
  - Major bummer since we use hash for its speed
  - Cause of slowdown: computing a^n takes n operations
  - Horner’s method better by piggybacking work

Slow Approach:
“spot” = (‘s’ * a^3) + (‘p’ * a^2) + (‘o’ * a^1) + (‘t’ * a^0)

Horner’s Method:
“spot” = ‘t’ + (a * (‘o’ + (a * (‘p’ + (a * ‘s’)))))
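
Horner's method can be written as a single loop that reuses the running total instead of recomputing each power. The sketch below is one possible version: the class and method names are hypothetical, and it uses int arithmetic, so long strings silently wrap around on overflow.

```java
// Sketch of Horner's method: one multiply and one add per character.
final class HornerHash {
    static int hornerCode(String s, int a) {
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            // Multiplying the running total by a raises every earlier term
            // by one power, so no power of a is ever computed separately.
            code = a * code + s.charAt(i);
        }
        return code;  // int arithmetic wraps silently on overflow
    }

    public static void main(String[] args) {
        System.out.println(hornerCode("spot", 33));
        System.out.println(hornerCode("stop", 33));
    }
}
```

With a = 31 this loop computes the same value as Java's built-in String.hashCode(), which is documented as this polynomial evaluated in int arithmetic.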

Compression

- Hash’s only use is computing array indices
  - Useless if larger than table’s length: no index exists!
  - When a=33, “spot” hashes to 4,258,502
  - Some hashes are incalculable (like “triskaidekaphobia”): the value overflows an int
- To compress result, work like array-based queue

    hash = (result + length) % length

  - % returns the modulus (the remainder from division)
  - Serves exact same purpose: keeps index within limits
  - In Java, % keeps the sign of result, so this only works while result >= -length
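
A compression step along these lines could be sketched as below. Note the swap: instead of the slide's (result + length) % length, this version uses Math.floorMod, which returns a value from 0 to length-1 for any int, including the large negative values that overflow can produce. The class and method names are hypothetical.

```java
// Sketch of compression: map any int hash code to a valid index 0..length-1.
final class Compress {
    // The slide's formula; valid only while result >= -length.
    static int slideCompress(int result, int length) {
        return (result + length) % length;
    }

    // Alternative using Math.floorMod, which is never negative for length > 0.
    static int compress(int result, int length) {
        return Math.floorMod(result, length);
    }

    public static void main(String[] args) {
        int length = 10;
        System.out.println(slideCompress(-5, length));   // prints 5
        System.out.println(slideCompress(-25, length));  // prints -5 (not a usable index)
        System.out.println(compress(-25, length));       // prints 5
    }
}
```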

Before Next Lecture…

- Continue working on week #4 assignment
  - Due at usual time Tues. so may want to get cracking
- Start thinking of designs & CRC cards for project
  - Due in 10 days as projects completed in stages
- Read sections 9.2.1 & 9.2.5 – 9.2.7 of the book
  - Consider better ways of handling this situation: