chapter 10 maps, hash tables, and skip listsjdenny/courses/prior/16-17...โ€ขassume you have a hash...

51
CHAPTER 10 MAPS, HASH TABLES, AND SKIP LISTS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN JAVA, GOODRICH, TAMASSIA AND GOLDWASSER (WILEY 2016) 0 1 2 3 4 451-229-0004 981-101-0002 025-612-0001

Upload: others

Post on 24-Jan-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

  • CHAPTER 10MAPS, HASH TABLES, AND SKIP LISTSACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH

    DATA STRUCTURES AND ALGORITHMS IN JAVA, GOODRICH, TAMASSIA AND

    GOLDWASSER (WILEY 2016)

    0

    1

    2

    3

    4 451-229-0004

    981-101-0002

    025-612-0001

  • MAPS

    โ€ข A map models a searchable collection of key-value entries

    โ€ข The main operations of a map are for searching, inserting, and deleting items

    โ€ข Multiple entries with the same key are not allowed

    โ€ข Applications:โ€ข address book

    โ€ข student-record database

    โ€ข Often called associative containers

  • THE MAP ADT

    โ€ข get(k): if the map ๐‘€ has an entry with key ๐‘˜, return its associated value; else, return null

    โ€ข put(k, v): insert entry (๐‘˜, ๐‘ฃ) into the map ๐‘€; if key ๐‘˜ is not already in ๐‘€, then return null; else, return old value associated with ๐‘˜

    โ€ข remove(k): if the map๐‘€ has an entry with key ๐‘˜, remove it from ๐‘€ and return its associated value; else, return null

    โ€ข size(), isEmpty()

    โ€ข entrySet(): return an iterable collection of the entries in ๐‘€

    โ€ข keySet(): return an iterable collection of the keys in ๐‘€

    โ€ข values(): return an iterator of the values in ๐‘€

  • EXAMPLE

    โ€ข Operation Output Mapโ€ข isEmpty() true ร˜โ€ข put(5,A) null (5,A)โ€ข put(7,B) null (5,A),(7,B)โ€ข put(2,C) null (5,A),(7,B),(2,C)โ€ข put(8,D) null (5,A),(7,B),(2,C),(8,D)โ€ข put(2,E) C (5,A),(7,B),(2,E),(8,D)โ€ข get(7) B (5,A),(7,B),(2,E),(8,D)โ€ข get(4) null (5,A),(7,B),(2,E),(8,D)โ€ข get(2) E (5,A),(7,B),(2,E),(8,D)โ€ข size() 4 (5,A),(7,B),(2,E),(8,D)โ€ข remove(5) A (7,B),(2,E),(8,D)โ€ข remove(2) E (7,B),(8,D)โ€ข get(2) null (7,B),(8,D)โ€ข isEmpty() false (7,B),(8,D)

  • LIST-BASED MAP

    โ€ข We can implement a map with an unsorted list

    โ€ข Store the entries in arbitrary order

    โ€ข Complexity of get, put, remove?

    โ€ข ๐‘‚(๐‘›) on put, get, and remove

    tailheader nodes/positions

    entries

    9 c 6 c 5 c 8 c

  • DIRECT ADDRESS TABLE MAP IMPLEMENTATION

    โ€ข A direct address table is a map in whichโ€ข The keys are in the range [0, ๐‘]โ€ข Stored in an array ๐‘‡ of size ๐‘โ€ข Entry with key ๐‘˜ stored in ๐‘‡[๐‘˜]

    โ€ข Performance:โ€ข put(k, v), get(k), and remove(k) all take ๐‘‚(1) time

    โ€ข Space - requires space ๐‘‚(๐‘), independent of ๐‘›, the number of entries stored in the map

    โ€ข The direct address table is not space efficient unless the range of the keys is close to the number of elements to be stored in the map, i.e., unless ๐‘› is close to ๐‘.

  • SORTED MAP

    โ€ข A Sorted Map supports the usual map operations, but also maintains an order

    relation for the keys.

    โ€ข Naturally supports โ€ข Sorted search tables - store dictionary in

    an array by non-decreasing order of the

    keys

    โ€ข Utilizes binary search

    โ€ข Sorted Map ADT adds the following functionality to a map

    โ€ข firstEntry(), lastEntry() โ€“return iterators to entries with the smallest

    and largest keys, respectively

    โ€ข ceilingEntry(k), floorEntry(k) โ€“ return an iterator

    to the least/greatest key value greater

    than/less than or equal to ๐‘˜

    โ€ข lowerEntry(k), higherEntry(k) โ€“ return an

    iterator to the greatest/least key value

    less than/greater than ๐‘˜

    โ€ข etc

  • EXAMPLE OF ORDERED MAP: BINARY SEARCH

    โ€ข Binary search performs operation get(k) on an ordered search table

    โ€ข similar to the high-low game

    โ€ข at each step, the number of candidate items is halved

    โ€ข terminates after a logarithmic number of steps

    โ€ข Exampleget(7)

    1 3 4 5 7 8 9 11 14 16 18 19

    1 3 4 5 7 8 9 11 14 16 18 19

    1 3 4 5 7 8 9 11 14 16 18 19

    1 3 4 5 7 8 9 11 14 16 18 19

    0

    0

    0

    0

    ml h

    ml h

    ml h

    l=m =h

  • SIMPLE MAP IMPLEMENTATION SUMMARY

    put(k, v) get(k) Space

    Unsorted list ๐‘‚(๐‘›) ๐‘‚(๐‘›) ๐‘‚(๐‘›)

    Direct Address Table ๐‘‚(1) ๐‘‚(1) ๐‘‚(๐‘)

    Sorted Search Table

    (Naturally supported Sorted Map)

    ๐‘‚(๐‘›) ๐‘‚(log ๐‘›) ๐‘‚(๐‘›)

  • DEFINITIONS

    โ€ข A set is an unordered collection of elements, without duplicates that typically supports efficient membership tests.

    โ€ข Elements of a set are like keys of a map, but without any auxiliary values.

    โ€ข A multiset (also known as a bag) is a set-like container that allows duplicates.

    โ€ข A multimap (also known as a dictionary) is similar to a traditional map, in that it associates values with keys; however, in a multimap the same key can be

    mapped to multiple values.

    โ€ข For example, the index of a book maps a given term to one or more locations at which the term occurs.

  • SET ADT

  • GENERIC MERGING

    โ€ข Generalized merge of two sorted lists A and B

    โ€ข Template method genericMerge

    โ€ข Auxiliary methodsโ€ข aIsLess

    โ€ข bIsLess

    โ€ข bothAreEqual

    โ€ข Runs in ๐‘‚(๐‘›๐ด + ๐‘›๐ต) time provided the auxiliary methods run in ๐‘‚(1) time

    Algorithm genericMerge(A, B)

    Input: Sets A,B as sorted lists

    Output: Set ๐‘†1. ๐‘† โ† โˆ…2. while ยฌ๐ด.isEmpty()โˆง ยฌ๐ต.isEmpty() do3. ๐‘Ž โ† ๐ด.first(); ๐‘ โ† ๐ต.first()4. if ๐‘Ž < ๐‘5. aIsLess(a, S) //generic action6. ๐ด.removeFirst();7. else if ๐‘ < ๐‘Ž8. bIsLess(b, S) //generic action9. ๐ต.removeFirst()10. else //๐‘Ž = ๐‘11. bothAreEqual(a, b, S) //generic action12. ๐ด.removeFirst(); ๐ต.removeFirst()13.while ยฌ๐ด.isEmpty() do14. aIsLess(A.first(), S); ๐ด.eraseFront()15.while ยฌ๐ต.isEmpty() do16. bIsLess(B.first(), S); ๐ต.removeFirst()17.return ๐‘†

  • USING GENERIC MERGE FOR SET OPERATIONS

    โ€ข Any of the set operations can be implemented using a generic merge

    โ€ข For example:

    โ€ข For intersection: only copy elements that are duplicated in both list

    โ€ข For union: copy every element from both lists except for the duplicates

    โ€ข All methods run in linear time

  • MULTIMAP

    โ€ข A multimap is similar to a map, except that it can store multiple entries with the same key

    โ€ข We can implement a multimap ๐‘€ by means of a map ๐‘€โ€ฒ

    โ€ข For every key ๐‘˜ in ๐‘€, let ๐ธ(๐‘˜) be the list of entries of ๐‘€ with key ๐‘˜

    โ€ข The entries of ๐‘€โ€ฒ are the pairs (๐‘˜, ๐ธ(๐‘˜))

  • MULITMAPS

  • HASH TABLES

  • INTUITIVE NOTION OF A MAP

    โ€ข Intuitively, a map ๐‘€ supports the abstraction of using keys as indices with a syntax such as ๐‘€ ๐‘˜ .

    โ€ข As a mental warm-up, consider a restricted setting in which a map with ๐‘› items uses keys that are known to be integers in a range from 0 to ๐‘ โˆ’ 1, for some

    ๐‘ โ‰ฅ ๐‘›.

  • MORE GENERAL KINDS OF KEYS

    โ€ข But what should we do if our keys are not integers in the range from 0 to ๐‘โ€“1?

    โ€ข Use a hash function to map general keys to corresponding indices in a table.

    โ€ข For instance, the last four digits of a Social Security number.

    01234 451-229-0004

    981-101-0002

    025-612-0001

    โ€ฆ

  • HASH TABLES

    โ€ข A Hash function โ„Ž ๐‘˜ โ†’ [0,๐‘ โˆ’ 1]

    โ€ข The integer โ„Ž(๐‘˜) is referred to as the hash value of key ๐‘˜

    โ€ข Example - โ„Ž ๐‘˜ = ๐‘˜ mod ๐‘ could be a hash function for integers

    โ€ข Hash tables consist ofโ€ข A hash function โ„Ž

    โ€ข Array ๐ด of size ๐‘ (either to an element itself or to a โ€œbucketโ€)

    โ€ข Goal is to store elements ๐‘˜, ๐‘ฃ at index ๐‘– = โ„Ž ๐‘˜

  • ISSUES WITH HASH TABLES

    โ€ข Issues

    โ€ข Collisions - some keys will map to the same index of H (otherwise we have a Direct Address Table).

    โ€ข Chaining - put values that hash to same location in a linked list (or a โ€œbucketโ€)

    โ€ข Open addressing - if a collision occurs, have a method to select another location in the table.

    โ€ข Load factor

    โ€ข Rehashing

  • EXAMPLE

    โ€ข We design a hash table for a Map storing items (SSN, Name), where

    SSN (social security number) is a

    nine-digit positive integer

    โ€ข Our hash table uses an array of size๐‘ = 10,000 and the hash function

    โ„Ž ๐‘˜ = last four digits of ๐‘˜

    0

    1

    2

    3

    4

    9997

    9998

    9999

    โ€ฆ451-229-0004

    981-101-0002

    200-751-9998

    025-612-0001

  • HASH FUNCTIONS

    โ€ข A hash function is usually specified as the composition of two functions:

    โ€ข Hash code:โ„Ž1: keys integers

    โ€ข Compression function:โ„Ž2: integers [0, ๐‘ โˆ’ 1]

    โ€ข The hash code is applied first, and the compression function is applied

    next on the result, i.e.,

    โ„Ž ๐‘˜ = โ„Ž2 โ„Ž1 ๐‘˜

    โ€ข The goal of the hash function is to โ€œdisperseโ€ the keys in an apparently

    random way

  • HASH CODES

    โ€ข Memory address:โ€ข We reinterpret the memory address of the

    key object as an integer

    โ€ข Good in general, except for numeric and string keys

    โ€ข Integer cast:โ€ข We reinterpret the bits of the key as an

    integer

    โ€ข Suitable for keys of length less than or equal to the number of bits of the integer type (e.g.,

    byte, short, int and float in C++)

    โ€ข Component sum:โ€ข We partition the bits of the key into

    components of fixed length (e.g., 16 or

    32 bits) and we sum the components

    (ignoring overflows)

    โ€ข Suitable for numeric keys of fixed length greater than or equal to the

    number of bits of the integer type (e.g.,

    long and double in C++)

  • HASH CODES

    โ€ข Polynomial accumulation:โ€ข We partition the bits of the key into a

    sequence of components of fixed length

    (e.g., 8, 16 or 32 bits)

    ๐‘Ž0๐‘Ž1โ€ฆ๐‘Ž๐‘›โˆ’1

    โ€ข We evaluate the polynomial๐‘ ๐‘ง = ๐‘Ž0 + ๐‘Ž1๐‘ง + ๐‘Ž2๐‘ง

    2 +โ‹ฏ+ ๐‘Ž๐‘›โˆ’1๐‘ง๐‘›โˆ’1

    at a fixed value z, ignoring overflows

    โ€ข Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a

    set of 50,000 English words)

    โ€ข Cyclic Shift:

    โ€ข Like polynomial accumulation except use bit shifts instead of multiplications

    and bitwise or instead of addition

    โ€ข Can be used on floating point numbers as well by converting the

    number to an array of characters

  • COMPRESSION FUNCTIONS

    โ€ข Division:

    โ€ข โ„Ž2 ๐‘˜ = ๐‘˜ mod ๐‘

    โ€ข The size N of the hash table is usually chosen to be a prime

    โ€ข The reason has to do with number theory and is beyond the scope of this

    course

    โ€ข Multiply, Add and Divide (MAD):

    โ€ข โ„Ž2 ๐‘˜ = ๐‘Ž๐‘˜ + ๐‘ mod ๐‘

    โ€ข ๐‘Ž and ๐‘ are nonnegative integers such that

    ๐‘Ž mod ๐‘ โ‰  0

    โ€ข Otherwise, every integer would map to the same value ๐‘

  • COLLISION RESOLUTION WITHSEPARATE CHAINING

    โ€ข Collisions occur when different elements are mapped to the same

    cell

    โ€ข Separate Chaining: let each cell in the table point to a linked list of

    entries that map there

    โ€ข Chaining is simple, but requires additional memory outside the table

    0

    1

    2

    3

    4 451-229-0004 981-101-0004

    025-612-0001

  • EXERCISESEPARATE CHAINING

    โ€ข Assume you have a hash table ๐ป with ๐‘ = 9 slots (๐ด[0 โˆ’ 8]) and let the hash function be โ„Ž ๐‘˜ = ๐‘˜ mod ๐‘

    โ€ข Demonstrate (by picture) the insertion of the following keys into a hash table with collisions resolved by chaining

    โ€ข 5, 28, 19, 15, 20, 33, 12, 17, 10

  • COLLISION RESOLUTION WITHOPEN ADDRESSING - LINEAR PROBING

    โ€ข In Open addressing the colliding item is placed in a different cell of the table

    โ€ข Linear probing handles collisions by placing the colliding item in the next (circularly)

    available table cell. So the ๐‘–th cell checked is:โ„Ž ๐‘˜, ๐‘– = โ„Ž ๐‘˜ + ๐‘– mod ๐‘

    โ€ข Each table cell inspected is referred to as a โ€œprobeโ€

    โ€ข Colliding items lump together, causing future collisions to cause a longer probe sequence

    โ€ข Example:

    โ€ข โ„Ž ๐‘˜ = ๐‘˜ mod 13

    โ€ข Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

    0 1 2 3 4 5 6 7 8 9 10 11 12

    41 18 44 59 32 22 31 73

    0 1 2 3 4 5 6 7 8 9 10 11 12

  • SEARCH WITH LINEAR PROBING

    โ€ข Consider a hash table A that uses linear probing

    โ€ข get(k)โ€ข We start at cell โ„Ž ๐‘˜

    โ€ข We probe consecutive locations until one of the following occurs

    โ€ข An item with key ๐‘˜ is found, or

    โ€ข An empty cell is found, or

    โ€ข ๐‘ cells have been unsuccessfully probed

    Algorithm get(k)

    1. ๐‘– โ† โ„Ž ๐‘˜2. ๐‘ โ† 03. repeat4. ๐‘ โ† ๐ด ๐‘–5. if ๐‘ โ‰  โˆ…6. return ๐‘›๐‘ข๐‘™๐‘™7. else if ๐‘.key()= ๐‘˜8. return ๐‘9. else10. ๐‘– โ† ๐‘– + 1 mod ๐‘11. ๐‘ โ† ๐‘ + 112. until ๐‘ = ๐‘13. return ๐‘›๐‘ข๐‘™๐‘™

  • UPDATES WITH LINEAR PROBING

    โ€ข To handle insertions and deletions, we introduce a special object, called

    DEFUNCT, which replaces deleted

    elements

    โ€ข remove(k)โ€ข We search for an item with key ๐‘˜

    โ€ข If such an item ๐‘˜, ๐‘ฃ is found, we replace it with the special item DEFUNCT

    โ€ข Else, we return null

    โ€ข put(k, v)

    โ€ข We start at cell โ„Ž(๐‘˜)

    โ€ข We probe consecutive cells until one of the following occurs

    โ€ข A cell ๐‘– is found that is either empty or stores DEFUNCT, or

    โ€ข ๐‘ cells have been unsuccessfully probed

  • EXERCISEOPEN ADDRESSING โ€“ LINEAR PROBING

    โ€ข Assume you have a hash table ๐ป with ๐‘ = 11 slots (๐ด[0 โˆ’ 10]) and let the hash function be โ„Ž ๐‘˜ = ๐‘˜ mod ๐‘

    โ€ข Demonstrate (by picture) the insertion of the following keys into a hash table with collisions resolved by linear probing.

    โ€ข 10, 22, 31, 4, 15, 28, 17, 88, 59

  • COLLISION RESOLUTION WITHOPEN ADDRESSING โ€“ QUADRATIC PROBING

    โ€ข Linear probing has an issue with clustering

    โ€ข Another strategy called quadratic probing uses a hash functionโ„Ž ๐‘˜, ๐‘– = โ„Ž ๐‘˜ + ๐‘–2 mod ๐‘

    for ๐‘– = 0, 1,โ€ฆ ,๐‘ โˆ’ 1

    โ€ข This can still cause secondary clustering

  • COLLISION RESOLUTION WITHOPEN ADDRESSING - DOUBLE HASHING

    โ€ข Double hashing uses a secondary hash function โ„Ž2(๐‘˜) and handles collisions by placing an item in the first available cell of

    the series

    โ„Ž ๐‘˜, ๐‘– = โ„Ž1 ๐‘˜ + ๐‘–โ„Ž2 ๐‘˜ mod ๐‘

    for ๐‘– = 0, 1, โ€ฆ , ๐‘ โˆ’ 1

    โ€ข The secondary hash function โ„Ž2 ๐‘˜ cannot have zero values

    โ€ข The table size ๐‘ must be a prime to allow probing of all the cells

    โ€ข Common choice of compression map for the secondary hash function:

    โ„Ž2 ๐‘˜ = ๐‘ž โˆ’ ๐‘˜ mod ๐‘ž

    where

    โ€ข ๐‘ž < ๐‘

    โ€ข ๐‘ž is a prime

    โ€ข The possible values for โ„Ž2 ๐‘˜ are 1, 2, โ€ฆ , ๐‘ž

  • PERFORMANCE OF HASHING

    โ€ข In the worst case, searches, insertions and removals on a hash table take ๐‘‚ ๐‘› time

    โ€ข The worst case occurs when all the keys inserted into the map collide

    โ€ข The load factor ๐›ผ = ๐‘›๐‘

    affects the performance of

    a hash table

    โ€ข Assuming that the hash values are like random numbers, it can be shown that the expected number

    of probes for an insertion with open addressing is1

    1 โˆ’ ๐›ผ=1

    1 โˆ’ ๐‘› ๐‘=1

    ๐‘ โˆ’ ๐‘› ๐‘

    =๐‘

    ๐‘ โˆ’ ๐‘›

    โ€ข The expected running time of all the Map ADT operations in a hash table is ๐‘‚ 1

    โ€ข In practice, hashing is very fast provided the load factor is not close to 100%

    โ€ข Applications of hash tablesโ€ข Small databases

    โ€ข Compilers

    โ€ข Browser caches

  • UNIFORM HASHING ASSUMPTION

    โ€ข The probe sequence of a key ๐‘˜ is the sequence of slots probed when looking for ๐‘˜

    โ€ข In open addressing, the probe sequence is โ„Ž ๐‘˜, 0 , โ„Ž ๐‘˜, 1 , โ€ฆ , โ„Ž ๐‘˜,๐‘ โˆ’ 1

    โ€ข Uniform Hashing Assumption

    โ€ข Each key is equally likely to have any one of the ๐‘! permutations of {0, 1, โ€ฆ ,๐‘ โˆ’ 1} as is probe sequence

    โ€ข Note: Linear probing and double hashing are far from achieving Uniform Hashing

    โ€ข Linear probing: ๐‘ distinct probe sequences

    โ€ข Double Hashing: ๐‘2 distinct probe sequences

  • PERFORMANCE OF UNIFORM HASHING

    โ€ข Theorem: Assuming uniform hashing and an open-address hash table with load

    factor ๐›ผ =๐‘›

    ๐‘< 1, the expected number of probes in an unsuccessful search is

    at most 1

    1โˆ’๐›ผ.

    โ€ข Exercise: compute the expected number of probes in an unsuccessful search in

    an open address hash table with ๐›ผ =1

    2, ๐›ผ =

    3

    4, and ๐›ผ =

    99

    100.

  • ON REHASHING

    โ€ข Keeping the load factor low is vital for performance

    โ€ข When resizing the table:

    โ€ข Reallocate space for the array (of size that is a prime)

    โ€ข Design a new hash function (new parameters) for the new array size (practically, change the mod)

    โ€ข For each item you reinsert into the table rehash

  • SUMMARY MAPS (SO FAR)

    put(k, v) get(k) Space

    Unsorted list ๐‘‚(๐‘›) ๐‘‚(๐‘›) ๐‘‚(๐‘›)

    Direct Address Table ๐‘‚(1) ๐‘‚(1) ๐‘‚(๐‘)

    Sorted Search Table

    (Naturally supported Sorted Map)

    ๐‘‚(๐‘›) ๐‘‚(log ๐‘›) ๐‘‚(๐‘›)

    Hashing

    (chaining)๐‘‚๐‘›

    ๐‘๐‘‚๐‘›

    ๐‘๐‘‚(๐‘› + ๐‘)

    Hashing

    (open addressing)๐‘‚1

    1 โˆ’๐‘›๐‘

    ๐‘‚1

    1 โˆ’๐‘›๐‘

    ๐‘‚(๐‘)

  • SKIP LISTS

    +-

    S0

    S1

    S2

    S3

    +- 10 362315

    +- 15

    +- 2315

  • RANDOMIZED ALGORITHMS

    โ€ข A randomized algorithm controls its execution through random selection (e.g., coin tosses)

    โ€ข It contains statements like:๐‘ โ† randomBit()

    if ๐‘ = 0

    do somethingโ€ฆ

    else //๐‘ = 1

    do something elseโ€ฆ

    โ€ข Its running time depends on the outcomes of the โ€œcoin tossesโ€

    โ€ข Through probabilistic analysis we can derive the expected running time of a randomized algorithm

    โ€ข We make the following assumptions in the analysis:โ€ข the coins are unbiased

    โ€ข the coin tosses are independent

    โ€ข The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs

    when all the coin tosses give โ€œheadsโ€)

    โ€ข We use a randomized algorithm to insert items into a skip list to insert in expected ๐‘‚(log ๐‘›)โ€“time

    โ€ข When randomization is used in data structures they are referred to as probabilistic data structures

  • WHAT IS A SKIP LIST?

    โ€ข A skip list for a set S of distinct (key, element) items is a series of lists ๐‘†0, ๐‘†1, โ€ฆ , ๐‘†โ„Ž

    โ€ข Each list ๐‘†๐‘– contains the special keys +โˆž and โˆ’โˆž

    โ€ข List ๐‘†0 contains the keys of ๐‘† in non-decreasing order

    โ€ข Each list is a subsequence of the previous one, i.e.,๐‘†0 โŠ‡ ๐‘†1 โŠ‡ โ‹ฏ โŠ‡ ๐‘†โ„Ž

    โ€ข List ๐‘†โ„Ž contains only the two special keys

    โ€ข Skip lists are one way to implement the Ordered Map ADT

    โ€ข Java applet

    56 64 78 +31 34 44- 12 23 26

    +-

    +31-

    64 +31 34- 23

    S0

    S1

    S2

    S3

    http://www.cs.umd.edu/class/spring2002/cmsc420-0401/demo/SkipList2/

  • IMPLEMENTATION

    โ€ข We can implement a skip list with quad-nodes

    โ€ข A quad-node stores:โ€ข (Key, Value)

    โ€ข links to the nodes before, after, below, and above

    โ€ข Also, we define special keys +โˆž and โˆ’โˆž, and we modify the key comparator to handle them

    x

    quad-node

  • SEARCH - GET(K)

    โ€ข We search for a key ๐‘˜ in a skip list as follows:โ€ข We start at the first position of the top list โ€ข At the current position ๐‘, we compare ๐‘˜ with ๐‘ฆ โ† ๐‘. next(). key()๐‘ฅ = ๐‘ฆ: we return ๐‘. next(). value()๐‘ฅ > ๐‘ฆ: we scan forward๐‘ฅ < ๐‘ฆ: we drop down

    โ€ข If we try to drop down past the bottom list, we return null

    โ€ข Example: search for 78

    +-

    S0

    S1

    S2

    S3

    +31-

    64 +31 34- 23

    56 64 78 +31 34 44- 12 23 26

  • EXERCISESEARCH

    โ€ข We search for a key ๐‘˜ in a skip list as follows:โ€ข We start at the first position of the top list โ€ข At the current position ๐‘, we compare ๐‘˜ with ๐‘ฆ โ† ๐‘. next(). key()๐‘ฅ = ๐‘ฆ: we return ๐‘. next(). value()๐‘ฅ > ๐‘ฆ: we scan forward๐‘ฅ < ๐‘ฆ: we drop down

    โ€ข If we try to drop down past the bottom list, we return NO_SUCH_KEY

    โ€ข Ex 1: search for 64: list the (๐‘†๐‘–, node) pairs visited and the return value

    โ€ข Ex 2: search for 27: list the (๐‘†๐‘–, node) pairs visited and the return value

    +-

    S0

    S1

    S2

    S3

    +31-

    64 +31 34- 23

    56 64 78 +31 34 44- 12 23 26

  • INSERTION - PUT(K, V)

    โ€ข To insert an item (๐‘˜, ๐‘ฃ) into a skip list, we use a randomized algorithm:โ€ข We repeatedly toss a coin until we get tails, and we denote with ๐‘– the number of times the coin came up heads

    โ€ข If ๐‘– โ‰ฅ โ„Ž, we add to the skip list new lists ๐‘†โ„Ž+1, โ€ฆ , ๐‘†๐‘–+1 each containing only the two special keys

    โ€ข We search for ๐‘˜ in the skip list and find the positions ๐‘0, ๐‘1, โ€ฆ , ๐‘๐‘– of the items with largest key less than ๐‘˜ in each list ๐‘†0, ๐‘†1, โ€ฆ , ๐‘†๐‘–

    โ€ข For ๐‘– โ† 0,โ€ฆ , ๐‘–, we insert item (๐‘˜, ๐‘ฃ) into list ๐‘†๐‘– after position ๐‘๐‘–

    โ€ข Example: insert key 15, with ๐‘– = 2

    +-

    S0

    S1

    S2

    S3

    +- 10 362315

    +- 15

    +- 2315

    +- 10 36

    +-

    23

    23 +-

    S0

    S1

    S2

    p0

    p1

    p2

  • DELETION - REMOVE(K)

    โ€ข To remove an item with key ๐‘˜ from a skip list, we proceed as follows:

    โ€ข We search for ๐‘˜ in the skip list and find the positions ๐‘0, ๐‘1, โ€ฆ , ๐‘๐‘– of the items with key ๐‘˜, where position ๐‘๐‘– is in list ๐‘†๐‘–

    โ€ข We remove positions ๐‘0, ๐‘1, โ€ฆ , ๐‘๐‘– from the lists ๐‘†0, ๐‘†1, โ€ฆ , ๐‘†๐‘–

    โ€ข We remove all but one list containing only the two special keys

    โ€ข Example: remove key 34

    - +4512

    - +

    23

    23- +

    S0

    S1

    S2

    - +

    S0

    S1

    S2

    S3

    - +4512 23 34

    - +34

    - +23 34p0

    p1

    p2

  • SPACE USAGE

    โ€ข The space used by a skip list depends on the random bits used by each invocation of

    the insertion algorithm

    โ€ข We use the following two basic probabilistic facts:

    โ€ข Fact 1: The probability of getting ๐‘–consecutive heads when flipping a coin is

    1

    2๐‘–

    โ€ข Fact 2: If each of ๐‘› items is present in a set with probability ๐‘, the expected size of the set is ๐‘›๐‘

    โ€ข Consider a skip list with ๐‘› itemsโ€ข By Fact 1, we insert an item in list ๐‘†๐‘– with

    probability 1

    2๐‘–

    โ€ข By Fact 2, the expected size of list ๐‘†๐‘– is ๐‘›

    2๐‘–

    โ€ข The expected number of nodes used by the skip list is

    ๐‘–=0

    โ„Ž๐‘›

    2๐‘–= ๐‘›

    ๐‘–=0

    โ„Ž1

    2๐‘–< 2๐‘›

    โ€ข Thus the expected space is ๐‘‚ 2๐‘›

  • HEIGHT

    โ€ข The running time of find ๐‘˜ , put ๐‘˜, ๐‘ฃ , and erase ๐‘˜ operations are affected by the height โ„Ž of the skip list

    โ€ข We show that with high probability, a skip list with ๐‘› items has height ๐‘‚ log ๐‘›

    โ€ข We use the following additional probabilistic fact:

    โ€ข Fact 3: If each of ๐‘› events has probability ๐‘, the probability that at least one event occurs

    is at most ๐‘›๐‘

    โ€ข Consider a skip list with ๐‘› itemsโ€ข By Fact 1, we insert an item in list ๐‘†๐‘– with

    probability 1

    2๐‘–

    โ€ข By Fact 3, the probability that list ๐‘†๐‘– has at least one item is at most

    ๐‘›

    2๐‘–

    โ€ข By picking ๐‘– = 3 log ๐‘›, we have that the probability that ๐‘†3 log ๐‘› has at least one

    item is

    at most ๐‘›

    23 log ๐‘›=๐‘›

    ๐‘›3=1

    ๐‘›2

    โ€ข Thus a skip list with ๐‘› items has height at most 3 log ๐‘› with probability at least 1 โˆ’

    1

    ๐‘›2

  • SEARCH AND UPDATE TIMES

    โ€ข The search time in a skip list is proportional toโ€ข the number of drop-down steps

    โ€ข the number of scan-forward steps

    โ€ข The drop-down steps are bounded by the height of the skip list and thus are ๐‘‚ log ๐‘›expected time

    โ€ข To analyze the scan-forward steps, we use yet another probabilistic fact:

    โ€ข Fact 4: The expected number of coin tosses required in order to get tails is 2

    โ€ข When we scan forward in a list, the destination key does not belong to a higher list

    โ€ข A scan-forward step is associated with a former coin toss that gave tails

    โ€ข By Fact 4, in each list the expected number of scan-forward steps is 2

    โ€ข Thus, the expected number of scan-forward steps is ๐‘‚(log ๐‘›)

    โ€ข We conclude that a search in a skip list takes ๐‘‚ log ๐‘› expected time

    โ€ข The analysis of insertion and deletion gives similar results

  • EXERCISE

    โ€ข You are working for ObscureDictionaries.com a new online start-up which specializes in sci-fi languages. The CEO wants your team to describe a data structure which will

    efficiently allow for searching, inserting, and deleting new entries. You believe a skip

    list is a good idea, but need to convince the CEO. Perform the following:

    โ€ข Illustrate insertion of โ€œX-wingโ€ into this skip list. Randomly generated (1, 1, 1, 0).

    โ€ข Illustrate deletion of an incorrect entry โ€œEnterpriseโ€

    โ€ข Argue the complexity of deleting from a skip list

    - +YodaBoba Fett

    - +

    Enterprise

    Enterprise- +

    S0

    S1

    S2

  • SUMMARY

    โ€ข A skip list is a data structure for dictionaries that uses a randomized

    insertion algorithm

    โ€ข In a skip list with ๐‘› items

    โ€ข The expected space used is ๐‘‚ ๐‘›

    โ€ข The expected search, insertion and deletion time is ๐‘‚(log ๐‘›)

    โ€ข Using a more complex probabilistic analysis, one can show that these

    performance bounds also hold with

    high probability

    โ€ข Skip lists are fast and simple to implement in practice