dictionaries i data structures and parallelism

32
Dictionaries I Data Structures and Parallelism

Upload: others

Post on 27-Feb-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dictionaries I Data Structures and Parallelism

Dictionaries I Data Structures and

Parallelism

Page 2: Dictionaries I Data Structures and Parallelism

Warm Up

Write a recurrence to represent the running time of this code

CSE 373 SU 18 - ROBBIE WEBER 2

int Mystery(int n){

if(n <= 5)

return 1;

for(int i=0; i<n; i++){

for(int j=0; j<n; j++){

System.out.println(β€œhi”);

}

}

return n*Mystery(n/2);

}

Page 3: Dictionaries I Data Structures and Parallelism

Announcements

Project 1 is due Friday at 11:59 PM.-Remember both partners use the late day(s) if you submit late.

P1: for HashTrieMap, be sure you’re updating the size field we gave you.-We haven’t given you tests for everything!

-Remember that the tests we give you are not exhaustive; passing 90% of the tests on IntelliJ does not mean you would get 90% on the project.

Start thinking about potential P2 partners!

Page 4: Dictionaries I Data Structures and Parallelism

Outline

One more unrolling example

Two new (old?) ADTs-Dictionaries

-Sets

Review BSTs

Intro AVL trees

Page 5: Dictionaries I Data Structures and Parallelism

Solving Recurrences III

CSE 373 SU 18 - ROBBIE WEBER 5

𝑐n2

𝑐n

16

2

𝑐n

16

2

𝑐n

16

2

𝑐n

16

2

𝑐n

16

2

𝑐n

16

2

𝑐n

16

2

𝑐n

16

2

𝑐n

16

2

… …… … …… … …… … …… … …… … …… … …… … …… … ……

5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

Answer the following

questions:

1. What is input size on

level 𝑖?2. Number of nodes at

level 𝑖?3. Work done at

recursive level 𝑖?4. Last level of tree?

5. Work done at base

case?

6. What is sum over all

levels?

𝑇 𝑛 =5 π‘€β„Žπ‘’π‘› 𝑛 ≀ 4

3𝑇𝑛

4+ 𝑐𝑛2 π‘œπ‘‘β„Žπ‘’π‘Ÿπ‘€π‘–π‘ π‘’

𝑐n

4

2

𝑐n

4

2

𝑐n

4

2c𝑛

4

2

c𝑛

4

2

𝑐𝑛

4

2

𝑐𝑛2

Page 6: Dictionaries I Data Structures and Parallelism

Solving Recurrences III

CSE 373 SU 18 - ROBBIE WEBER 6

Level (i)Number of

Nodes

Work per

Node

Work per

Level

0 1 𝑐𝑛2 𝑐𝑛2

1 3 𝑐𝑛

4

2 3

16𝑐𝑛2

2 32 𝑐𝑛

42

2 3

16

2

𝑐𝑛2

𝑖 3𝑖 𝑐𝑛

4𝑖

2 3

16

𝑖

𝑐𝑛2

Base =

log4𝑛 βˆ’ 13log4 π‘›βˆ’1 5

5

3𝑛log4 3

1. Input size on level 𝑖?

2. How many calls on level 𝑖?

3. How much work on level 𝑖?

4. What is the last level?

5. A. How much work for each leaf node?

B. How many base case calls?

𝑛

4𝑖𝑐

𝑛

4𝑖

2

When 𝑛

4𝑖= 4β†’ log4 𝑛 βˆ’ 1

5

𝑇 𝑛 =5 π‘€β„Žπ‘’π‘› 𝑛 ≀ 4

3𝑇𝑛

4+ 𝑐𝑛2 π‘œπ‘‘β„Žπ‘’π‘Ÿπ‘€π‘–π‘ π‘’

6. Combining it all together…

3𝑖𝑐𝑛

4𝑖

2

=3

16

𝑖

𝑐𝑛2

𝑇 𝑛 =

𝑖=0

log4 𝑛 βˆ’23

16

𝑖

𝑐𝑛2 +5

3𝑛log43

3log4 π‘›βˆ’1 =3log4 𝑛

3 power of a log

π‘₯log𝑏 𝑦 = 𝑦log𝑏 π‘₯

=𝑛log4 3

3

3𝑖

Page 7: Dictionaries I Data Structures and Parallelism

Solving Recurrences III

CSE 373 SU 18 - ROBBIE WEBER 7

𝑇 𝑛 =

𝑖=0

log4 𝑛 βˆ’23

16

𝑖

𝑐𝑛2 +5

3𝑛log43

𝑇 𝑛 ≀ 𝑐𝑛21

1 βˆ’316

+5

3𝑛log43

𝑇 𝑛 = 𝑐𝑛2316

log4 π‘›βˆ’1

βˆ’ 1

316 βˆ’ 1

+5

3𝑛log43

𝑇 𝑛 ∈ 𝑂(𝑛2)

𝑖=π‘Ž

𝑏

𝑐𝑓(𝑖) = 𝑐

𝑖=π‘Ž

𝑏

𝑓(𝑖)

factoring out a

constant𝑇 𝑛 = 𝑐𝑛2

𝑖=0

log4 𝑛 βˆ’23

16

𝑖

+5

3𝑛log43

𝑖=0

π‘›βˆ’1

π‘₯𝑖 =π‘₯𝑛 βˆ’ 1

π‘₯ βˆ’ 1

finite geometric series

𝑖=0

∞

π‘₯𝑖 =1

1 βˆ’ π‘₯

infinite geometric

series

when -1 < x < 1

If we’re trying to prove upper bound…

𝑇 𝑛 ≀ 𝑐𝑛2

𝑖=0

∞3

16

𝑖

+ 4𝑛log43

Closed form:

7. Simplify…

Page 8: Dictionaries I Data Structures and Parallelism

𝑇 𝑛 =5 π‘€β„Žπ‘’π‘› 𝑛 ≀ 4

3𝑇𝑛

4+ 𝑐𝑛2 π‘œπ‘‘β„Žπ‘’π‘Ÿπ‘€π‘–π‘ π‘’

Page 9: Dictionaries I Data Structures and Parallelism

𝑇 𝑛 =5 π‘€β„Žπ‘’π‘› 𝑛 ≀ 4

3𝑇𝑛

4+ 𝑐𝑛2 π‘œπ‘‘β„Žπ‘’π‘Ÿπ‘€π‘–π‘ π‘’

Page 10: Dictionaries I Data Structures and Parallelism

Our Next ADT

Dictionary ADT

find(key) – returns the stored value

associated with key.

state

behaviorSet of (key, value) pairs

delete(key) – removes the key and its value

from the dictionary.

insert(key, value) – inserts (key, value) pair.

If key was already in dictionary, overwrites

the previous value.

Real world intuition:

keys: words

values: definitions

Dictionaries are

often called β€œmaps”

Page 11: Dictionaries I Data Structures and Parallelism

Our Next ADT

Set ADT

find(element) – returns true if element is in

the set, false otherwise.

state

behaviorSet of elements

delete(key) – removes the key and its value

from the dictionary.

insert(element) – inserts element into the

set.

Usually implemented

as a dictionary with

values β€œtrue” or β€œfalse”

Later in the course

we’ll want more

complicated set

operations like

union(set1, set2)

Page 12: Dictionaries I Data Structures and Parallelism

Uses of Dictionaries

Dictionaries show up all the time.

There are too many applications to really list all of them:-Phonebooks

-Indexes

-Databases

-Operating System memory management

-The internet (DNS)

-…

Any time you want to organize information for easy retrieval.

We’re going to design three completely different implementations of Dictionaries – they have that many different uses.

Page 13: Dictionaries I Data Structures and Parallelism

Simple Dictionary Implementations

Insert Find Delete

Unsorted Linked List

Unsorted Array

Sorted Linked List

Sorted Array

What are the worst case running times for each operation if you have 𝑛(key, value) pairs.

Assume the arrays do not need to be resized.

Think about what happens if a repeat key is inserted!

Page 14: Dictionaries I Data Structures and Parallelism

Simple Dictionary Implementations

Insert Find Delete

Unsorted Linked List Θ(𝑛) Θ(𝑛) Θ(𝑛)

Unsorted Array Θ(𝑛) Θ(𝑛) Θ 𝑛

Sorted Linked List Θ(𝑛) Θ(𝑛) Θ(𝑛)

Sorted Array Θ(𝑛) Θ(log 𝑛) Θ(𝑛)

What are the worst case running times for each operation if you have 𝑛(key, value) pairs.

Assume the arrays do not need to be resized.

Think about what happens if a repeat key is inserted!

Page 15: Dictionaries I Data Structures and Parallelism

Aside: Lazy Deletion

Lazy Deletion: A general way to make delete() more efficient.

Don’t remove the entry from the structure, just β€œmark” it as deleted.

Benefits:-Much simpler to implement

-More efficient (no need to shift values on every single delete)

Drawbacks:-Extra space:

-For the flag

-More drastically, data structure grows with all insertions, not with the current number of items.

-Sometimes makes other operations more complicated.

Page 16: Dictionaries I Data Structures and Parallelism

Simple Dictionary Implementations

Insert Find Delete

Unsorted Linked List Θ(π‘š) Θ(π‘š) Θ(π‘š)

Unsorted Array Θ(π‘š) Θ(π‘š) Θ π‘š

Sorted Linked List Θ(π‘š) Θ(π‘š) Θ(π‘š)

Sorted Array Θ(π‘š) Θ(logπ‘š) Θ(logπ‘š)

We can do slightly better with lazy deletion, let π‘š be the total number of

elements ever inserted (even if later lazily deleted)

Think about what happens if a repeat key is inserted!

Page 17: Dictionaries I Data Structures and Parallelism

A Better Implementation

What about BSTs?

Keys will have to be comparable…

Insert Find Delete

Average

Worst

Page 18: Dictionaries I Data Structures and Parallelism

A Better Implementation

What about BSTs?

Keys will have to be comparable…

Insert Find Delete

Average Θ(log 𝑛) Θ(log 𝑛)

Worst Θ(𝑛) Θ(𝑛)

Let’s talk about how to implement delete.

Page 19: Dictionaries I Data Structures and Parallelism

Deletion from BSTs

Deleting will have three steps:-Finding the element to delete

-Removing the element

-Restoring the BST property

Page 20: Dictionaries I Data Structures and Parallelism

Deletion – Easy Cases

What if the elements to delete is:-A leaf?

-Has exactly one child?

6

72

84

9

Page 21: Dictionaries I Data Structures and Parallelism

Deletion – Easy Cases

What if the elements to delete is:-A leaf?

-Has exactly one child?6

72

84

9

Deleting a leaf:

Just get rid of it.

Delete(7)

Page 22: Dictionaries I Data Structures and Parallelism

Deletion – Easy Cases

What if the elements to delete is:-A leaf?

-Has exactly one child?

6

72

84

9

Deleting a node with one

child:

Delete the node

Connect its parent and child

Delete(4)

Page 23: Dictionaries I Data Structures and Parallelism

Deletion – The Hard Case

What happens if the node to delete has two children?

4

52

73

9

8 106

What if we try

Delete(7)?

What can we replace it

with?

6 or 8

The biggest thing in left

subtree or smallest thing in

right subtree.

Page 24: Dictionaries I Data Structures and Parallelism

A Better Implementation

What about BSTs?

Keys will have to be comparable.

Insert Find Delete

Average Θ(log 𝑛) Θ(log 𝑛)

Worst Θ(𝑛) Θ(𝑛)

Page 25: Dictionaries I Data Structures and Parallelism

A Better Implementation

What about BSTs?

Keys will have to be comparable.

Insert Find Delete

Average Θ(log 𝑛) Θ(log 𝑛) Θ(log 𝑛)

Worst Θ(𝑛) Θ(𝑛) Θ(𝑛)

We’re in the same position we were in for heaps

BSTs are great on average,

but we need to avoid the worst case.

Page 26: Dictionaries I Data Structures and Parallelism

Avoiding the Worst Case

Take I:

Let’s require the tree to be complete.

It worked for heaps!

What goes wrong:

When we insert, we’ll break the completeness property.

Insertions always add a new leaf, but you can’t control where.

Can we fix it?

Not easily :/

Page 27: Dictionaries I Data Structures and Parallelism

Avoiding the Worst Case

Take II:

Here are some other requirements you might try. Could they work? If not what can go wrong?

Root Balanced: The root must have the same number of nodes in its

left and right subtrees

Recursively Balanced: Every node must have the same number of

nodes in its left and right subtrees.

Root Height Balanced: The left and right subtrees of the root must

have the same height.

Page 28: Dictionaries I Data Structures and Parallelism

Avoiding the Worst Case

Take III:

The AVL condition

This actually works. To convince you it works, we have to check:1. Such a tree must have height 𝑂(log 𝑛) .

2. We must be able to maintain this property when inserting/deleting

AVL condition: For every node, the height of its left subtree and

right subtree differ by at most 1.

Page 29: Dictionaries I Data Structures and Parallelism

Bounding the Height

Suppose you have a tree of height β„Ž, meeting the AVL condition.

What is the minimum number of nodes in the tree?

If β„Ž = 0, then 1 node

If β„Ž = 1, then 2 nodes.

In general?

AVL condition: For every node, the height of its left subtree and

right subtree differ by at most 1.

Page 30: Dictionaries I Data Structures and Parallelism

Bounding the Height

In general, let 𝑁() be the minimum number of nodes in a tree of height β„Ž, meeting the AVL requirement.

𝑁 β„Ž = ቐ1 if β„Ž = 02 if β„Ž = 1

𝑁 β„Ž βˆ’ 1 + 𝑁 β„Ž βˆ’ 2 + 1 otherwise

Page 31: Dictionaries I Data Structures and Parallelism

Bounding the Height

𝑁 β„Ž = ቐ1 if β„Ž = 02 if β„Ž = 1

𝑁 β„Ž βˆ’ 1 + 𝑁 β„Ž βˆ’ 2 + 1 otherwise

We can try unrolling or recursion trees.

Page 32: Dictionaries I Data Structures and Parallelism

Bounding the Height

𝑁 β„Ž = ቐ1 if β„Ž = 02 if β„Ž = 1

𝑁 β„Ž βˆ’ 1 + 𝑁 β„Ž βˆ’ 2 + 1 otherwise

When unrolling we’ll quickly realize:-Something with Fibonacci numbers is going on.

-It’s really hard to exactly describe the pattern.

The real solution (using deep math magic beyond this course) is

𝑁 β„Ž β‰₯ πœ™β„Ž βˆ’ 1 where πœ™ is 1+ 5

2β‰ˆ 1.62