LECTURE 41: SEMESTER REVIEW

CSC 213 – Large Scale Programming

Final Exam

Tues., May 10 from 10:15 – 12:15 in OM 200
  Plan on exam taking full 2 hours
  If major problem, come talk to me ASAP
Exam covers material from entire semester
  Open-book & open-note so bring what you’ve got
  My handouts, solutions, & computers are not allowed
  Cannot collaborate with a neighbor on the exam
  Problems will be in a similar style to 2 midterms
Lab mastery: 2:45 – 3:45 on Thurs., May 12 in OM119

Contemplative

Always Using Imagination

Most Important Trait

Critical Property of Test

All good tests FAIL

Loop Testing: Simple Loops

Loop executed at most n times, try inputs that:
  Skip loop entirely
  Make 1 pass through the loop
  Make 2 passes through the loop
  Make m passes through the loop, where (m < n)
  If possible, n-1, n, & (n+1) passes through the loop
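The list of pass counts above can be generated mechanically. A minimal sketch, assuming a hypothetical helper named passCounts and picking m = n / 2 as the "some m < n" case (both are illustrative choices, not from the lecture):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class LoopTestCases {
    // Returns the pass counts worth testing for a loop that runs at most n times.
    static List<Integer> passCounts(int n) {
        TreeSet<Integer> counts = new TreeSet<>();   // sorted, no duplicates
        counts.add(0);        // skip loop entirely
        counts.add(1);        // 1 pass
        counts.add(2);        // 2 passes
        counts.add(n / 2);    // some m with m < n (assumed choice of m)
        counts.add(n - 1);
        counts.add(n);
        counts.add(n + 1);    // one past the limit, to check it is handled
        counts.removeIf(c -> c < 0);
        return new ArrayList<>(counts);
    }

    public static void main(String[] args) {
        System.out.println(passCounts(10));
    }
}
```

Each returned count still needs an input that actually drives the loop that many times; the helper only enumerates which cases to cover.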

Indexed File Format

Split information into two (or more) files
  Data file uses fixed-size records to store data
  Index files contain search terms & location where record starts
Fixed-size records usually used in data file
  Each record will use exactly that much space
  Extra space wasted if the value is smaller
  But limits data size, cannot get more space
  Makes it far easier to reuse space & rebuild index

Entry ADT

Needs 2 pieces: what we have & what we want
  First part is the key: data used in search
  Item we want is value; the second part of an Entry
Implementations must define 2 methods
  key() & value() return appropriate item
  Usually includes setValue() but NOT setKey()
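A minimal sketch of an Entry implementation. The class name ReviewEntry and the choice to have setValue() return the old value are assumptions for illustration; only key(), value(), and setValue() come from the slide.

```java
// Sketch of an Entry: the key is fixed at creation, the value may be replaced.
class ReviewEntry<K, V> {
    private final K key;   // final, so there is no setKey(): changing a key would break searches
    private V value;

    ReviewEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    K key() { return key; }

    V value() { return value; }

    // Replaces the value and returns the old one (the return type is an assumption).
    V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```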

What is a MAP?

At simplest level, Map is collection of Entrys
  key-value pairs serve as the basic data in a Map
  size() & isEmpty() work at level of Entry
Searchable data stored using Maps
  put() adds an Entry so key is mapped to the value
  get() retrieves value associated with key from Map
  remove() deletes entire Entry
At most one value per key using a Map
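The Map operations above can be sketched with an unordered list of key-value pairs. The class name ListMap, the nested Pair class, and the convention of returning null for a missing key are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a Map backed by an unordered list of key-value pairs.
class ListMap<K, V> {
    private static class Pair<K, V> {
        final K key;
        V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<Pair<K, V>> entries = new ArrayList<>();

    int size() { return entries.size(); }

    boolean isEmpty() { return entries.isEmpty(); }

    // Returns the value mapped to key, or null if the key is absent.
    V get(K key) {
        for (Pair<K, V> p : entries) {
            if (p.key.equals(key)) return p.value;
        }
        return null;
    }

    // Maps key to value; an existing key has its value replaced so keys stay unique.
    V put(K key, V value) {
        for (Pair<K, V> p : entries) {
            if (p.key.equals(key)) {
                V old = p.value;
                p.value = value;
                return old;
            }
        }
        entries.add(new Pair<>(key, value));
        return null;
    }

    // Deletes the whole Entry for key and returns its value, or null if absent.
    V remove(K key) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).key.equals(key)) {
                return entries.remove(i).value;
            }
        }
        return null;
    }
}
```

Every operation scans the list, so each takes O(n) time in the worst case; the point of the sketch is the one-value-per-key rule enforced in put().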

Dictionary ADT

DICTIONARY ADT very similar to MAP
  Hold searchable data in each of these ADTs
  Both data structures are collections of Entrys
  Convert key to value using either concept
DICTIONARY can have multiple values to one key
  1 value for key is still legal option

Dictionary ADT

[Figure: dictionary entry for “pantsless”]
Also many Entrys with same key but different value
[Figure: two entries for “cool”]

Using Hash Properly

Normally, table holds one Entry per index
  Need to be smarter when keys collide
Efficiency is critical
  If we do not care, use List-based approach
Several common schemes used to provide speed
  Each form of probing has strengths & weaknesses
Must consider bad hash effects before using
  If this O(n) time unacceptable, use other leafy plant
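As one illustration of a probing scheme, here is a minimal sketch of linear probing. The class name ProbingTable, the String/int types, and having put() return the number of slots examined are all assumptions for the example. Removal is left out because it needs extra bookkeeping.

```java
// Sketch of an open-addressing hash table that resolves collisions by linear probing.
class ProbingTable {
    private final String[] keys;
    private final int[] values;
    private int size;

    ProbingTable(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Index where a key would live if nothing collided with it.
    private int home(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Stores the pair and returns how many slots were examined (1 means no collision).
    int put(String key, int value) {
        int i = home(key);
        for (int probes = 1; probes <= keys.length; probes++) {
            if (keys[i] == null || keys[i].equals(key)) {
                if (keys[i] == null) size++;
                keys[i] = key;
                values[i] = value;
                return probes;
            }
            i = (i + 1) % keys.length;   // collision: try the next slot
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the value for key, or null if the key is not in the table.
    Integer get(String key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) return null;
            if (keys[i].equals(key)) return values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }

    int size() { return size; }
}
```

The strings "Aa" and "BB" have the same hashCode() in Java, so inserting both forces a collision. With a bad hash every key collides, and each operation degrades to the O(n) scan the slide warns about.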

Binary Search Trees

Implements a BinaryTree for searching
  Map or Dictionary will be ADT exposed
  Data organized to make usage efficient (maybe)
Strict ordering maintained in tree
  Nodes to the left are smaller
  Larger keys in right child of node
  Equal values not specified
  No problem, just be consistent
[Figure: example binary search tree rooted at 6]

BST Performance

Search, insert, & remove take O(h) time
  h is height of tree
Height’s best case is complete tree at O(log n)
O(n) height for linked list is BST’s worst case
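A minimal sketch showing how insertion order changes the height h. The class name IntBST, the int keys, and sending equal keys to the right are assumptions; any consistent rule for equal keys would do.

```java
// Sketch of an unbalanced BST of int keys; equal keys are consistently sent right.
class IntBST {
    private static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    void insert(int key) { root = insert(root, key); }

    private Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        return n;
    }

    // Walks one root-to-leaf path, so the cost is O(h).
    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key) return true;
            n = (key < n.key) ? n.left : n.right;
        }
        return false;
    }

    // Height counted in nodes on the longest root-to-leaf path (empty tree is 0).
    int height() { return height(root); }

    private int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }
}
```

Inserting 4, 2, 6, 1, 3, 5, 7 gives a complete tree of height 3, while inserting 1 through 7 in sorted order gives a chain of height 7 that behaves like a linked list.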

AVL Tree Definition

Fancy type of BST
  O(log n) time provided
  For this, needs more info
Keep tree balanced by…
  Checking heights of kids
  Only let differ by 0 or 1
Fix larger differences by
  Shifts nodes in the BST
  For balance maintenance: Trinode Restructuring
[Figure: example AVL tree rooted at 6; node heights are shown in blue]
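The height check on a node's kids can be sketched as follows. This only tests the balance rule and does not perform trinode restructuring. The names AvlCheck, balancedHeight, and isBalanced are hypothetical, and an empty subtree is assumed to have height 0.

```java
// Sketch: checks the AVL height-balance rule on a plain binary tree.
class AvlCheck {
    static class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the subtree height, or -1 if any node's kids differ in height by more than 1.
    static int balancedHeight(Node n) {
        if (n == null) return 0;
        int l = balancedHeight(n.left);
        int r = balancedHeight(n.right);
        if (l < 0 || r < 0 || Math.abs(l - r) > 1) return -1;
        return 1 + Math.max(l, r);
    }

    static boolean isBalanced(Node root) {
        return balancedHeight(root) >= 0;
    }
}
```

A real AVL tree would store each node's height (the extra info) rather than recomputing it, so the check after an insert or remove costs O(1) per node on the path to the root.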

Building a SplayTree

Another approach which builds upon BST
  Not an AVLTree, however, but a new BST subclass

Concept Behind SplayTree

Splay trees do NOT maintain balance
  Recently used nodes clustered near top of BST
  Most recently accessed nodes take O(1) time
  Other nodes may need O(n) time to access
Usually very efficient, but provides no guarantees

Red-Black Tree

Root Property: Root node painted black
External Property: Leaves are painted black
Internal Property: Red nodes’ children are black
Depth Property: Leaves have identical black depth
  Number of black ancestors for the node
[Figure: example red-black tree rooted at 9]

Map & Dictionary ADT

Implementation        Searching   Adding      Removing
Ordered List          O(log n)    O(n)        O(n)
Unordered List        O(n)        O(n)/O(1)   O(n)
Hash                  O(n)        O(n)        O(n)
  if lucky/good       O(1)        O(1)        O(1)
BST                   O(n)        O(n)        O(n)
AVL / balanced        O(log n)    O(log n)    O(log n)
Splay (expected)      O(log n)    O(log n)    O(log n)
Splay (worst-case)    O(n)        O(n)        O(n)

Sorting is a Dance

Merge Sort Execution Tree

Show steps used to sort all of the data

7 2 9 4 3 8 6 1 → 1 2 3 4 6 7 8 9
  7 2 9 4 → 2 4 7 9
    7 2 → 2 7
      7 → 7
      2 → 2
    9 4 → 4 9
      9 → 9
      4 → 4
  3 8 6 1 → 1 3 6 8
    3 8 → 3 8
      3 → 3
      8 → 8
    6 1 → 1 6
      6 → 6
      1 → 1
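Each node of the execution tree corresponds to one call in a recursive merge sort. A minimal sketch, assuming int arrays and a hypothetical class named MergeSortDemo:

```java
import java.util.Arrays;

class MergeSortDemo {
    // Returns a sorted copy of data; the input array is left unchanged.
    static int[] mergeSort(int[] data) {
        if (data.length <= 1) return data;   // 0 or 1 items: already sorted
        int mid = data.length / 2;           // divide blindly in half
        int[] left = mergeSort(Arrays.copyOfRange(data, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(data, mid, data.length));

        // Merge: all comparisons happen here, between the two sorted halves.
        int[] merged = new int[data.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            merged[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) merged[k++] = left[i++];
        while (j < right.length) merged[k++] = right[j++];
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mergeSort(new int[] {7, 2, 9, 4, 3, 8, 6, 1})));
    }
}
```

Taking from the left half on ties (the `<=`) keeps equal items in their original order, which makes this version stable.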

Quick Sort

Divide: Partition by pivot
  L has values <= p
  G uses values >= p
Recur: Sort L and G
Conquer: Merge L, p, G
[Figure: sequence partitioned around pivot p into L and G]
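The three steps can be sketched in place on an array. Using the last element as the pivot and this particular partition loop are assumptions made for the example; the lecture does not say how the pivot is chosen.

```java
import java.util.Arrays;

class QuickSortDemo {
    // Sorts a[lo..hi] in place.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = a[hi];        // pivot: last element (an assumed, simple choice)
        int boundary = lo;    // a[lo..boundary-1] is L, the values <= p
        for (int i = lo; i < hi; i++) {
            if (a[i] <= p) {
                int t = a[i]; a[i] = a[boundary]; a[boundary] = t;
                boundary++;
            }
        }
        // Put the pivot between L and G; everything right of it is > p.
        int t = a[hi]; a[hi] = a[boundary]; a[boundary] = t;
        quickSort(a, lo, boundary - 1);   // recur on L
        quickSort(a, boundary + 1, hi);   // recur on G
        // Nothing to merge: L, p, G already sit in order within the array.
    }

    static void quickSort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    public static void main(String[] args) {
        int[] data = {7, 2, 9, 4, 3, 8, 6, 1};
        quickSort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

With this pivot choice, already-sorted input gives the worst split every time, which is why a pivot near the middle value is preferred.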

Quick Sort v. Merge Sort

Quick Sort
  Divide data around pivot
    Want pivot to be near middle
    All comparisons occur here
  Conquer with recursion
    Does not need extra space
  Merge usually done already
    Data already sorted!

Merge Sort
  Divide data blindly in half
    Always gets even split
    No comparisons performed!
  Conquer with recursion
    Needs* to use other arrays
  Merge combines solutions
    Compares from (sorted) halves

Bucket & Radix Sort

Sort data written as tuple of enumerable data
  Consumption of wine overall, in liters
  Annual per capita consumption of liters
Sort one place in tuple using bucket sort
  Uses 1 bucket per value that could be enumerated
  When there are ties, preserve relative ordering
Repeat stable sorts to perform radix sort
  Must preserve relative ordering, like bucket sort
  From least to most important sort each tuple place

Radix-Sort In Action

List of 4-bit integers sorted using RADIX-SORT

Input:               1001  0010  1101  0001  1110
After bit 0 (least): 0010  1110  1001  1101  0001
After bit 1:         1001  1101  0001  0010  1110
After bit 2:         1001  0001  0010  1101  1110
After bit 3 (most):  0001  0010  1001  1101  1110
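The passes can be reproduced with a stable two-bucket sort applied to one bit at a time, from least to most important. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RadixSortDemo {
    // Stable bucket sort on one bit: one bucket per possible value (0 or 1).
    static List<Integer> bucketSortByBit(List<Integer> data, int bit) {
        List<Integer> zeros = new ArrayList<>();
        List<Integer> ones = new ArrayList<>();
        for (int x : data) {
            if (((x >> bit) & 1) == 0) zeros.add(x);
            else ones.add(x);
        }
        zeros.addAll(ones);   // ties keep their relative order within each bucket
        return zeros;
    }

    // Radix sort: repeat the stable sort from the least to the most important bit.
    static List<Integer> radixSort(List<Integer> data, int bits) {
        List<Integer> result = data;
        for (int bit = 0; bit < bits; bit++) {
            result = bucketSortByBit(result, bit);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(0b1001, 0b0010, 0b1101, 0b0001, 0b1110);
        for (int x : radixSort(input, 4)) {
            System.out.println(String.format("%4s", Integer.toBinaryString(x)).replace(' ', '0'));
        }
    }
}
```

No two elements are ever compared with each other, which is how radix sort avoids the comparison-based lower bound.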

Lower Bound on Sorting

Smallest number of comparisons is tree’s height
  Decision tree sorting n elements has n! leaves
  At least log(n!) height needed for this many leaves
  As we saw, log(n!) grows like n log n, so height is at least that
So comparing data needs at least n log n time (up to a constant)!
  Practical lower bound, but cheating can do better
  Need enumerable tuples - cannot always cheat
“If you believe radix hypothesis” it takes O(n) time
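For reference, one standard way to see why log(n!) grows like n log n. This is a sketch of the usual argument, not necessarily the derivation used in lecture: keep only the larger half of the terms for the lower bound, and replace every term by the largest for the upper bound.

```latex
\log_2(n!) \;=\; \sum_{i=1}^{n} \log_2 i
  \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i
  \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2}
\qquad\text{and}\qquad
\log_2(n!) \;\le\; n \log_2 n
```

Both bounds are constant multiples of n log n for large n, so the decision tree's height, and with it the worst-case number of comparisons, is proportional to n log n.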

[Figure: example graph with vertices John, David, Paul, brown.edu, cs.brown.edu, math.brown.edu, cox.net, att.net, qwest.net, cslab1a, cslab1b]

Graph Applications

Electronic circuits
Transportation networks
Databases
Packing suitcases
Finding terrorists
Scheduling college’s exams
Assigning classes to rooms
Garbage collection
Coloring countries on a map
Playing minesweeper

Edge List Structure

Simplest Graph
  Space efficient
  No change to use with directed or undirected
Fields
  Sequence of vertices
  Sequence of edges
[Figure: graph with vertices u, v, w, z and edges a, b, c, d, stored as a sequence of vertices and a sequence of edges]

Adjacency-List Implementation

Vertex has Sequence of Edges
  Edges still refer to Vertex
Ideas in Edge-List serve as base
  Extends Vertex
  Add Position reference to speed removal
[Figure: vertices u, v, w and edges a, b; each vertex holds a sequence of its incident edges]

Adjacency Matrix Structure

Edge-List structure still used as base
Vertex stores int
  Index found in matrix
Adjacency matrix in Graph class
  null if not adjacent -or- Edge incident to both vertices
Undirected edges stored in both array locations
Directed edges only in array from source to target
[Figure: vertices u, v, w with indices 0, 1, 2 and edges a, b stored in a 3 x 3 matrix]

Asymptotic Performance

n vertices & m edges, no self-loops

                    Edge-List   Adjacency-List        Adjacency-Matrix
Space               n + m       n + m                 n^2
incidentEdges(v)    m           deg(v)                n + deg(v)
areAdjacent(v,w)    m           min(deg(v), deg(w))   1
insertVertex(o)     1           1                     n^2
insertEdge(v,w,o)   1           1                     1
removeVertex(v)     m           deg(v)                n^2
removeEdge(e)       1           1                     1
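To make the adjacency-matrix column concrete, here is a minimal sketch in which vertices are plain int indices and edges are String labels. Both simplifications, and the class name MatrixGraph, are assumptions; the course's Graph class stores Vertex and Edge objects.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an adjacency matrix: cell [i][j] holds the edge, or null if not adjacent.
class MatrixGraph {
    private final String[][] matrix;

    MatrixGraph(int vertexCount) {
        matrix = new String[vertexCount][vertexCount];
    }

    // Undirected edges are stored in both array locations.
    void insertUndirectedEdge(int v, int w, String edge) {
        matrix[v][w] = edge;
        matrix[w][v] = edge;
    }

    // Directed edges are stored only from source to target.
    void insertDirectedEdge(int source, int target, String edge) {
        matrix[source][target] = edge;
    }

    // One array lookup, so O(1).
    boolean areAdjacent(int v, int w) {
        return matrix[v][w] != null;
    }

    // Scans row v, so it costs time proportional to n; only edges stored in that row are found.
    List<String> incidentEdges(int v) {
        List<String> result = new ArrayList<>();
        for (String edge : matrix[v]) {
            if (edge != null) result.add(edge);
        }
        return result;
    }
}
```

The n^2 entries for insertVertex and removeVertex come from having to rebuild the whole matrix at a new size, which this fixed-size sketch sidesteps.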

Just Messing With You

Taking up time just to keep you from:

Graphs Solve Many Problems…

Understand how it works & what it does:
  DFS finds connected components in tree form
  Connected vertices using minimal hops using BFS
  Dijkstra’s minimizes weight to each vertex
  Weight of edge total minimized with Prim-Jarnik
  Topological sort schedules vertices (when possible)
  Can compute reachability with Floyd-Warshall
Given problem, which algorithm would solve it?
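As one example from the list, BFS reaches every connected vertex using the fewest hops. A minimal sketch, assuming vertices are numbered 0 to n-1 and the graph is given as adjacency lists of ints (a hypothetical representation chosen for brevity):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class BfsDemo {
    // Returns the minimal hop count from start to each vertex, or -1 if unreachable.
    static int[] hops(List<List<Integer>> adjacency, int start) {
        int[] dist = new int[adjacency.size()];
        Arrays.fill(dist, -1);                 // -1 marks "not yet visited"
        Queue<Integer> queue = new ArrayDeque<>();
        dist[start] = 0;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.remove();            // vertices leave in order of hop count
            for (int w : adjacency.get(v)) {
                if (dist[w] == -1) {           // first visit is by a shortest route
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }
}
```

Hop counts ignore edge weights; when edges carry weights, Dijkstra's algorithm is the one to reach for instead.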

Graphs Solve Many Problems…

But Not All Problems…

Cost of Accessing Memory

How long memory access takes is also important
  Will make a major difference in time program takes
  Easy memory aid to remember how this works:

Beer

Multi-Way Search Tree

Nodes contain multiple elements
  Tree grows up with leaves always at same level
Each internal node:
  At least 2 children
  Has 1 fewer Entrys than children
  Entrys sorted from smallest to largest
[Figure: example tree with root keys 11 24 above nodes with keys 2 6 8, 15, and 27 30]

Hints for Studying

Will NOT require memorizing:
  ADT’s methods
  Node implementations
  Big-Oh time proofs
  (Anything else you think of)

Hints for Studying

You should know (& be ready to look up):
  How ADTs & algorithms work (trace & big ideas)
  For each ADT implementation, its pros & cons
  Where & why each ADT would be used
  For each method what it does & what it returns
  Big-Oh complexity & impact for important methods

Studying For the Exam

1. What does the ADT/implementation do?

Where in the real-world is this found?

2. How is the ADT, search tree, or sort used?

How would we apply it to solve a problem?

How is it used and why?

3. What is necessary for implementation?

Given implementation, why do we do it like that?

What tradeoffs does this implementation make?

“Subtle” Hint

Do NOT bother with memorization
Be able to access & use information quickly
