Dictionaries I

Data Structures and Parallelism

Warm Up

Write a recurrence to represent the running time of this code

CSE 373 SU 18 - ROBBIE WEBER

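int Mystery(int n){
    // Base case: constant work
    if(n <= 5)
        return 1;
    // Nested loops: n * n iterations, each doing constant work
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            System.out.println("hi");
        }
    }
    // One recursive call on an input of half the size
    return n*Mystery(n/2);
}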
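One possible way to write the recurrence for the warm-up, as a sketch: the constants $c_0$, $c_1$, $c_2$ are placeholder names introduced here, not values from the slides. The nested loops contribute the $n^2$ term, and there is a single recursive call on input $n/2$ (the multiplication by `n` is constant work, not extra calls).

```latex
T(n) =
\begin{cases}
  c_0 & \text{when } n \le 5 \\
  T\left(\frac{n}{2}\right) + c_1 n^2 + c_2 & \text{otherwise}
\end{cases}
```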

Announcements

Project 1 is due Friday at 11:59 PM.

-Remember both partners use the late day(s) if you submit late.

P1: for HashTrieMap, be sure you’re updating the size field we gave you.

-We haven’t given you tests for everything!

-Remember that the tests we give you are not exhaustive; passing 90% of the tests on IntelliJ does not mean you would get 90% on the project.

Start thinking about potential P2 partners!

Outline

One more unrolling example

Two new (old?) ADTs

-Dictionaries

Review BSTs

Intro AVL trees

Solving Recurrences III

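$$T(n) = \begin{cases} 5 & \text{when } n \le 4 \\ 3T\left(\frac{n}{4}\right) + cn^2 & \text{otherwise} \end{cases}$$

[Figure: recursion tree. The root does $cn^2$ work; every node has three children, each on an input one quarter the size; the leaves each do 5 work.]

Answer the following questions:

1. What is the input size on level $i$? $\frac{n}{4^i}$

2. How many calls on level $i$? $3^i$

3. How much work on level $i$? $3^i \cdot c\left(\frac{n}{4^i}\right)^2 = \left(\frac{3}{16}\right)^i cn^2$

4. What is the last level? When $\frac{n}{4^i} = 4$, so $i = \log_4 n - 1$.

5. A. How much work for each leaf node? 5

B. How many base case calls? $3^{\log_4 n - 1} = \frac{3^{\log_4 n}}{3} = \frac{n^{\log_4 3}}{3}$

Power of a log: $x^{\log_b y} = y^{\log_b x}$

Work done at base: $\frac{5}{3} n^{\log_4 3}$

| Level ($i$) | Number of Nodes | Work per Node | Work per Level |
| 0 | $1$ | $cn^2$ | $cn^2$ |
| 1 | $3$ | $c\left(\frac{n}{4}\right)^2$ | $\frac{3}{16} cn^2$ |
| 2 | $3^2$ | $c\left(\frac{n}{4^2}\right)^2$ | $\left(\frac{3}{16}\right)^2 cn^2$ |
| $i$ | $3^i$ | $c\left(\frac{n}{4^i}\right)^2$ | $\left(\frac{3}{16}\right)^i cn^2$ |
| Base $= \log_4 n - 1$ | $3^{\log_4 n - 1}$ | $5$ | $\frac{5}{3} n^{\log_4 3}$ |

6. Combining it all together…

$$T(n) = \sum_{i=0}^{\log_4 n - 2} \left(\frac{3}{16}\right)^i cn^2 + \frac{5}{3} n^{\log_4 3}$$

7. Simplify…

Factoring out a constant: $\sum_{i=a}^{b} c f(i) = c \sum_{i=a}^{b} f(i)$

$$T(n) = cn^2 \sum_{i=0}^{\log_4 n - 2} \left(\frac{3}{16}\right)^i + \frac{5}{3} n^{\log_4 3}$$

Finite geometric series: $\sum_{i=0}^{n-1} x^i = \frac{x^n - 1}{x - 1}$

Closed form:

$$T(n) = cn^2 \cdot \frac{\left(\frac{3}{16}\right)^{\log_4 n - 1} - 1}{\frac{3}{16} - 1} + \frac{5}{3} n^{\log_4 3}$$

If we’re trying to prove upper bound…

Infinite geometric series: $\sum_{i=0}^{\infty} x^i = \frac{1}{1 - x}$ when $-1 < x < 1$

$$T(n) \le cn^2 \sum_{i=0}^{\infty} \left(\frac{3}{16}\right)^i + \frac{5}{3} n^{\log_4 3}$$

$$T(n) \le cn^2 \cdot \frac{1}{1 - \frac{3}{16}} + \frac{5}{3} n^{\log_4 3}$$

$$T(n) \in O(n^2)$$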
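The closed form can be sanity-checked numerically. The sketch below evaluates the recurrence $T(n) = 3T(n/4) + cn^2$ (with $T(n) = 5$ for $n \le 4$) directly and compares it with the closed form and with the infinite-series upper bound. It assumes $c = 1$ and only looks at powers of 4; the class and method names are made up for this illustration.

```java
// Numeric check of the recursion-tree result. Assumes c = 1 and n a power of 4.
class RecurrenceCheckSketch {
    static final double C = 1.0; // assumed constant for the cn^2 term

    // Direct evaluation: T(n) = 5 when n <= 4, otherwise 3T(n/4) + c n^2.
    static double t(long n) {
        if (n <= 4) {
            return 5;
        }
        return 3 * t(n / 4) + C * n * n;
    }

    // log base 4 of n, computed with integer division (exact for powers of 4).
    static int log4(long n) {
        int k = 0;
        for (long m = n; m > 1; m /= 4) {
            k++;
        }
        return k;
    }

    // Closed form: c n^2 * ((3/16)^(log4 n - 1) - 1) / (3/16 - 1) + (5/3) n^(log4 3).
    static double closedForm(long n) {
        int k = log4(n);
        double r = 3.0 / 16.0;
        double recursiveLevels = C * n * n * (Math.pow(r, k - 1) - 1) / (r - 1);
        double baseLevel = 5.0 / 3.0 * Math.pow(3, k); // n^(log4 3) = 3^(log4 n)
        return recursiveLevels + baseLevel;
    }

    // Upper bound from the infinite geometric series: 1 / (1 - 3/16) = 16/13.
    static double upperBound(long n) {
        return C * n * n * (16.0 / 13.0) + 5.0 / 3.0 * Math.pow(3, log4(n));
    }

    public static void main(String[] args) {
        for (long n = 4; n <= 4096; n *= 4) {
            System.out.println(n + ": T(n)=" + t(n)
                    + " closed=" + closedForm(n) + " bound=" + upperBound(n));
        }
    }
}
```

For example, $T(16) = 3 \cdot 5 + 16^2 = 271$, which matches the closed form $256 + \frac{5}{3} \cdot 9$.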

Our Next ADT

Dictionary ADT

state: Set of (key, value) pairs

behavior:

insert(key, value) – inserts (key, value) pair. If key was already in dictionary, overwrites the previous value.

find(key) – returns the stored value associated with key.

delete(key) – removes the key and its value from the dictionary.

Real world intuition:

keys: words

values: definitions

Dictionaries are often called “maps”.
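As a rough illustration of the Dictionary ADT, the sketch below writes the three operations as a Java interface and backs it with `java.util.HashMap`. The names `DictionarySketch` and `HashMapDictionary` are hypothetical and are not the course's own interfaces; `HashMap` is used only as a convenient stand-in, not as one of the implementations this lecture goes on to design.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface mirroring the three ADT operations.
interface DictionarySketch<K, V> {
    V find(K key);               // stored value for key, or null if absent
    void insert(K key, V value); // overwrites any previous value for key
    void delete(K key);          // removes key and its value
}

// Stand-in implementation that delegates to java.util.HashMap.
class HashMapDictionary<K, V> implements DictionarySketch<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public V find(K key) {
        return map.get(key);
    }

    public void insert(K key, V value) {
        map.put(key, value); // put replaces the old value if key already exists
    }

    public void delete(K key) {
        map.remove(key);
    }
}
```

Inserting the same key twice keeps only the second value, which is the overwrite behavior the ADT describes.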

Our Next ADT

Set ADT

state: Set of elements

behavior:

insert(element) – inserts element into the set.

find(element) – returns true if element is in the set, false otherwise.

delete(element) – removes element from the set.

Usually implemented as a dictionary with values “true” or “false”.

Later in the course we’ll want more complicated set operations like union(set1, set2).
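A minimal sketch of a set built on a dictionary with boolean values, assuming `java.util.HashMap` as the underlying dictionary. The class name `DictionarySetSketch` and the simple `union` shown here are illustrative only; the course may define these operations differently.

```java
import java.util.HashMap;
import java.util.Map;

// A set stored as a dictionary from element to "true".
class DictionarySetSketch<E> {
    private final Map<E, Boolean> present = new HashMap<>();

    boolean find(E element) {
        // Absent keys count as "false".
        return present.getOrDefault(element, false);
    }

    void insert(E element) {
        present.put(element, true);
    }

    void delete(E element) {
        present.remove(element);
    }

    // Illustrative union: a new set containing every element of a and of b.
    static <E> DictionarySetSketch<E> union(DictionarySetSketch<E> a,
                                            DictionarySetSketch<E> b) {
        DictionarySetSketch<E> result = new DictionarySetSketch<>();
        for (E e : a.present.keySet()) {
            result.insert(e);
        }
        for (E e : b.present.keySet()) {
            result.insert(e);
        }
        return result;
    }
}
```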

Uses of Dictionaries

Dictionaries show up all the time.

There are too many applications to really list all of them:

-Phonebooks

-Indexes

-Databases

-Operating System memory management

-The internet (DNS)

Any time you want to organize information for easy retrieval.

We’re going to design three completely different implementations of Dictionaries – they have that many different uses.

Simple Dictionary Implementations

What are the worst case running times for each operation if you have 𝑛 (key, value) pairs?

Assume the arrays do not need to be resized.

Think about what happens if a repeat key is inserted!

| | Insert | Find | Delete |
| Unsorted Linked List | Θ(𝑛) | Θ(𝑛) | Θ(𝑛) |
| Unsorted Array | Θ(𝑛) | Θ(𝑛) | Θ(𝑛) |
| Sorted Linked List | Θ(𝑛) | Θ(𝑛) | Θ(𝑛) |
| Sorted Array | Θ(𝑛) | Θ(log 𝑛) | Θ(𝑛) |
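To make the sorted-array row concrete, here is a minimal sketch with `int` keys and `String` values (both chosen only for illustration). `find` uses binary search, which gives Θ(log 𝑛); `insert` and `delete` may have to shift up to 𝑛 entries, which gives Θ(𝑛). The class name is hypothetical, and, as in the slide, resizing is left out.

```java
import java.util.Arrays;

// Sorted-array dictionary sketch: keys[0..size) is kept in increasing order.
class SortedArrayDictionarySketch {
    private final int[] keys;
    private final String[] values;
    private int size = 0;

    SortedArrayDictionarySketch(int capacity) {
        keys = new int[capacity];
        values = new String[capacity];
    }

    // Theta(log n): binary search over the occupied prefix.
    String find(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? values[i] : null;
    }

    // Theta(n) worst case: a new smallest key shifts every entry right.
    void insert(int key, String value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {            // repeat key: overwrite the old value
            values[i] = value;
            return;
        }
        if (size == keys.length) {
            throw new IllegalStateException("full; resizing is out of scope here");
        }
        int at = -(i + 1);       // insertion point reported by binarySearch
        System.arraycopy(keys, at, keys, at + 1, size - at);
        System.arraycopy(values, at, values, at + 1, size - at);
        keys[at] = key;
        values[at] = value;
        size++;
    }

    // Theta(n) worst case: removing the smallest key shifts every entry left.
    boolean delete(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i < 0) {
            return false;
        }
        System.arraycopy(keys, i + 1, keys, i, size - i - 1);
        System.arraycopy(values, i + 1, values, i, size - i - 1);
        size--;
        values[size] = null;     // drop the stale reference
        return true;
    }

    int size() {
        return size;
    }
}
```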

Aside: Lazy Deletion

Lazy Deletion: A general way to make delete() more efficient.

Don’t remove the entry from the structure, just “mark” it as deleted.

Benefits:

-Much simpler to implement

-More efficient (no need to shift values on every single delete)

Drawbacks:

-Extra space:

  -For the flag

  -More drastically, data structure grows with all insertions, not with the current number of items.

-Sometimes makes other operations more complicated.

We can do slightly better with lazy deletion. Let 𝑚 be the total number of elements ever inserted (even if later lazily deleted).

| | Insert | Find | Delete |
| Unsorted Linked List | Θ(𝑚) | Θ(𝑚) | Θ(𝑚) |
| Unsorted Array | Θ(𝑚) | Θ(𝑚) | Θ(𝑚) |
| Sorted Linked List | Θ(𝑚) | Θ(𝑚) | Θ(𝑚) |
| Sorted Array | Θ(𝑚) | Θ(log 𝑚) | Θ(log 𝑚) |
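A small sketch of lazy deletion on a sorted array, assuming distinct `int` keys supplied already sorted (a simplification made for this example). Deleting is just a binary search plus setting a flag, so it costs Θ(log 𝑚), but the array never shrinks. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Lazy deletion sketch: entries are marked as deleted instead of being removed.
class LazySortedArraySketch {
    private final int[] keys;        // sorted, distinct keys
    private final boolean[] deleted; // the "mark": true means lazily deleted

    LazySortedArraySketch(int[] sortedKeys) {
        keys = sortedKeys.clone();
        deleted = new boolean[keys.length];
    }

    // A key counts as present only if it is stored and not marked.
    boolean find(int key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 && !deleted[i];
    }

    // Theta(log m): no shifting, just flip the flag.
    boolean delete(int key) {
        int i = Arrays.binarySearch(keys, key);
        if (i < 0 || deleted[i]) {
            return false;
        }
        deleted[i] = true;
        return true;
    }

    // Space still reflects everything ever stored, not the live items.
    int slotsInUse() {
        return keys.length;
    }
}
```

Note how `find` has to check the flag as well: this is one small instance of lazy deletion making other operations more complicated.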

A Better Implementation

What about BSTs?

Keys will have to be comparable…

| | Insert | Find | Delete |
| Average | Θ(log 𝑛) | Θ(log 𝑛) | |
| Worst | Θ(𝑛) | Θ(𝑛) | |

Let’s talk about how to implement delete.

Deletion from BSTs

Deleting will have three steps:

-Finding the element to delete

-Removing the element

-Restoring the BST property

Deletion – Easy Cases

What if the element to delete:

-Is a leaf?

-Has exactly one child?

Deleting a leaf:

Just get rid of it.

Example: Delete(7)

Deleting a node with one child:

Delete the node.

Connect its parent and child.

Example: Delete(4)

Deletion – The Hard Case

What happens if the node to delete has two children?

What if we try Delete(7)?

What can we replace it with?

6 or 8: the biggest thing in the left subtree or the smallest thing in the right subtree.
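The three deletion cases can be sketched as follows. This is a minimal BST with `int` keys and no values, written for illustration rather than taken from the course code; for the two-children case it uses the smallest key in the right subtree (using the biggest key in the left subtree would work equally well).

```java
// BST deletion sketch covering the leaf, one-child, and two-children cases.
class BstDeleteSketch {
    static class Node {
        int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    static Node insert(Node root, int key) {
        if (root == null) {
            return new Node(key);
        }
        if (key < root.key) {
            root.left = insert(root.left, key);
        } else if (key > root.key) {
            root.right = insert(root.right, key);
        }
        return root; // repeat keys are ignored in this sketch
    }

    // Returns the root of the subtree after removing key (if present).
    static Node delete(Node root, int key) {
        if (root == null) {
            return null;                       // key not found
        }
        if (key < root.key) {
            root.left = delete(root.left, key);
        } else if (key > root.key) {
            root.right = delete(root.right, key);
        } else if (root.left == null) {
            return root.right;                 // leaf, or only a right child
        } else if (root.right == null) {
            return root.left;                  // only a left child
        } else {
            Node successor = root.right;       // smallest key in right subtree
            while (successor.left != null) {
                successor = successor.left;
            }
            root.key = successor.key;          // copy it up, then remove the copy
            root.right = delete(root.right, successor.key);
        }
        return root;
    }

    // In-order traversal; a valid BST yields its keys in sorted order.
    static String keys(Node root) {
        StringBuilder out = new StringBuilder();
        inOrder(root, out);
        return out.toString().trim();
    }

    private static void inOrder(Node node, StringBuilder out) {
        if (node == null) {
            return;
        }
        inOrder(node.left, out);
        out.append(node.key).append(' ');
        inOrder(node.right, out);
    }
}
```

The replacement node has no left child by construction, so removing it always falls into one of the easy cases.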

What about BSTs?

Keys will have to be comparable.

| | Insert | Find | Delete |
| Average | Θ(log 𝑛) | Θ(log 𝑛) | Θ(log 𝑛) |
| Worst | Θ(𝑛) | Θ(𝑛) | Θ(𝑛) |

We’re in the same position we were in for heaps. BSTs are great on average, but we need to avoid the worst case.

Avoiding the Worst Case

Take I:

Let’s require the tree to be complete.

It worked for heaps!

What goes wrong:

When we insert, we’ll break the completeness property.

Insertions always add a new leaf, but you can’t control where.

Can we fix it?

Not easily :/

Take II:

Here are some other requirements you might try. Could they work? If not, what can go wrong?

Root Balanced: The root must have the same number of nodes in its left and right subtrees.

Recursively Balanced: Every node must have the same number of nodes in its left and right subtrees.

Root Height Balanced: The left and right subtrees of the root must have the same height.
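One way to see what can go wrong with the two root-only requirements is to build a tree shaped like two long chains hanging off the root. The sketch below does that with key-less nodes, since only the shape matters here; the class and method names are made up for this illustration.

```java
// A tree that is Root Balanced and Root Height Balanced, yet has linear height.
class BalanceCounterexampleSketch {
    static class Node {
        Node left, right;
    }

    // Builds a chain of k nodes that only ever goes left (or only right).
    static Node chain(int k, boolean goLeft) {
        Node head = null;
        for (int i = 0; i < k; i++) {
            Node n = new Node();
            if (goLeft) {
                n.left = head;
            } else {
                n.right = head;
            }
            head = n;
        }
        return head;
    }

    // Root with a left-going chain and a right-going chain of k nodes each.
    static Node twoChains(int k) {
        Node root = new Node();
        root.left = chain(k, true);
        root.right = chain(k, false);
        return root;
    }

    static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }

    // Height of an empty tree is -1, so a single node has height 0.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }
}
```

With `twoChains(5)` the tree has 11 nodes, both subtrees of the root have 5 nodes and height 4, and the overall height is 5, roughly 𝑛/2. Recursively Balanced fails for the opposite reason: it is so strict that it can only be satisfied when 𝑛 is one less than a power of two.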

Take III:

The AVL condition

AVL condition: For every node, the height of its left subtree and right subtree differ by at most 1.

This actually works. To convince you it works, we have to check:

1. Such a tree must have height 𝑂(log 𝑛).

2. We must be able to maintain this property when inserting/deleting.
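The AVL condition can be stated directly as a recursive check. The sketch below is a straightforward, not especially efficient, version on key-less nodes: it recomputes heights at every node, whereas a real AVL tree would typically store a height in each node. The names are hypothetical, and the convention that an empty tree has height -1 is an assumption chosen so that a single node has height 0.

```java
// Checks whether every node's subtrees differ in height by at most 1.
class AvlCheckSketch {
    static class Node {
        Node left, right;

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    // Empty tree has height -1; a single node has height 0.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    static boolean isAvl(Node n) {
        if (n == null) {
            return true;
        }
        boolean balancedHere = Math.abs(height(n.left) - height(n.right)) <= 1;
        return balancedHere && isAvl(n.left) && isAvl(n.right);
    }
}
```

A chain of three nodes fails the check at its top node (heights 1 and -1), while a node with a single child passes (heights 0 and -1).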

Bounding the Height

Suppose you have a tree of height ℎ, meeting the AVL condition.

What is the minimum number of nodes in the tree?

If ℎ = 0, then 1 node.

If ℎ = 1, then 2 nodes.

In general?


In general, let $N(h)$ be the minimum number of nodes in a tree of height $h$, meeting the AVL requirement.

$$N(h) = \begin{cases} 1 & \text{if } h = 0 \\ 2 & \text{if } h = 1 \\ N(h-1) + N(h-2) + 1 & \text{otherwise} \end{cases}$$

We can try unrolling or recursion trees.

When unrolling we’ll quickly realize:

-Something with Fibonacci numbers is going on.

-It’s really hard to exactly describe the pattern.

The real solution (using deep math magic beyond this course) is

$$N(h) \ge \phi^h - 1 \quad \text{where } \phi = \frac{1 + \sqrt{5}}{2} \approx 1.62$$
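The recurrence for the minimum node count is easy to evaluate, which gives a quick numeric check of the bound. The sketch below computes $N(h)$ iteratively and compares it with $\phi^h - 1$; the class and method names are illustrative only.

```java
// Evaluates N(h) = N(h-1) + N(h-2) + 1 with N(0) = 1, N(1) = 2.
class AvlMinNodesSketch {
    static final double PHI = (1 + Math.sqrt(5)) / 2; // about 1.62

    static long minNodes(int h) {
        if (h == 0) {
            return 1;
        }
        long twoBack = 1; // N(0)
        long oneBack = 2; // N(1)
        for (int i = 2; i <= h; i++) {
            long current = oneBack + twoBack + 1;
            twoBack = oneBack;
            oneBack = current;
        }
        return oneBack;
    }

    // True when N(h) >= phi^h - 1 for this particular h.
    static boolean boundHolds(int h) {
        return minNodes(h) >= Math.pow(PHI, h) - 1;
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 10; h++) {
            System.out.println("N(" + h + ") = " + minNodes(h));
        }
    }
}
```

The first few values are 1, 2, 4, 7, 12, 20, each one less than a Fibonacci number. Since a tree with 𝑛 nodes and height ℎ has $n \ge N(h) \ge \phi^h - 1$, taking logs gives $h \le \log_\phi(n + 1)$, which is the $O(\log n)$ height the AVL condition needs.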
