
Page 1: CS305/503, Spring 2009 Binary Trees

CS305/503, Spring 2009
Binary Trees

Michael Barnathan

Page 2: CS305/503, Spring 2009 Binary Trees

The Midterm
• Grades were generally good. There was a 5-point curve.
• Remember, only the higher of the midterm and final counts. If you’re not happy with your exam grade, taking the final can only raise it.
• Let’s review each question.

Page 3: CS305/503, Spring 2009 Binary Trees

Here’s what we’ll be learning:
• Data structures:
  – Binary Trees.
  – Binary Search Trees.
• Theory:
  – Tree traversals.
    • Preorder.
    • Inorder.
    • Postorder.
  – Balanced and complete trees.
  – Recursion on Binary Trees.

Page 4: CS305/503, Spring 2009 Binary Trees

Linear Structures
• Arrays, Linked Lists, Stacks, and Queues are linear data structures.
  – Even circularly linked lists.
• One element follows another: there is always just one “next” element.
• As we mentioned, these usually yield recurrences of the form T(n) = T(n-1) + f(n).
• What if we violate this assumption? What if a structure had two next elements?
  – In a way, we started with the most restrictive structure (arrays), which we are progressively relaxing.

Page 5: CS305/503, Spring 2009 Binary Trees

Frankenstein’s Data Structures

[Figure: nodes 1 through 7 arranged in a tree shape, with small array fragments attached (Data 10, 21, 44 at Index 0, 1, 2; Data 5, 10 at Index 1, 2; Data 42 at Index 2) to illustrate trying to give a cell more than one successor.]

It hardly even makes sense to talk about a node having more than one successor in an array. What would data[2] be here? It’s not clear.

Linked lists make more sense, but what is 1->next now? We need more information.

Page 6: CS305/503, Spring 2009 Binary Trees

Binary Trees
• There are now two next nodes, not one.
• Clearly, we need two pointers to model them.
• So let’s rotate that list…
• Let’s call the pointers “left” and “right”.
• This structure is known as a binary tree.
  – Binary: branches into two nodes.
    • Higher-order trees exist too; we’ll talk about these later.
  – Why we call it a tree should be obvious.

[Figure: the list bent into a binary tree. Root 1 has children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7. The two pointers are labeled Left and Right.]

Page 7: CS305/503, Spring 2009 Binary Trees

Nomenclature
• Some from botany, some from genealogy.
• Root: The “highest” node in the tree; that is, the one without a parent.
  – Almost all tree algorithms start at the root.
• Child/Subtree: A node one level below the current node. Traversing the left or right pointers will bring you to a node’s “left” or “right” child.
• Leaf: A node at the “bottom” of the tree; i.e. one without children (or really with two null children).
• Parent: The node one level above.
• Siblings: Nodes with the same parent.
• “Complete” tree: a tree where every node has either 0 or 2 children and all leaves are at the same level. (Basically, it’s fully “filled in”.)

Page 8: CS305/503, Spring 2009 Binary Trees

Recursive Definition

• Binary trees have a nice recursive definition:
• A binary tree is a value, a left binary tree, and a right binary tree (sketched as a class below).
• Thus, each individual node is itself a tree.
• Base case: the empty tree.
  – Leaves’ left and right children are both empty.
  – We usually represent this with nulls.
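A minimal sketch of that definition as code (an added sketch, not from the slides; the field names match the node.value, node.left, and node.right used by the traversal code on the following pages):

// A binary tree node: a value plus a left and a right subtree.
// null stands for the empty tree (the base case above).
public class BinaryTree {
    int value;
    BinaryTree left;
    BinaryTree right;

    public BinaryTree(int value) {
        this.value = value;
        this.left = null;     // a new node starts out as a leaf
        this.right = null;
    }
}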

Page 9: CS305/503, Spring 2009 Binary Trees

Node Access

• All you must store is a reference to the root.
• You can get to the rest of the nodes by traversing the tree.
• Example: Accessing 5.
• There’s a problem.
  – Anyone see it?

[Figure: the example tree (root 1 with children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7), with the Root reference pointing at node 1.]

Page 10: CS305/503, Spring 2009 Binary Trees

Traversal
• We have no way of knowing where 5 is.
• In order to find it, we need to check every node in the tree.
• So what’s the complexity of access?
  – “Check every” should ring alarm bells by now.
• This is a nonlinear data structure, however, so we have more than one way to traverse.
• There are three common tree traversals:
  – Preorder, inorder, and postorder.
• There are some more exotic ones, too: traversals based on pointer inversion, threaded traversal, Robson traversal…
• Since binary trees are recursive structures, tree algorithms are usually recursive. Traversals are no exception.
• Remember how we reversed the output of printTo(n) by moving the output above or below the recursive call?
  – It turns out you can change the order of the traversal in the same way.

Page 11: CS305/503, Spring 2009 Binary Trees

Traversals and “Visiting”

• You can do anything to a node inside of a tree traversal algorithm!

• You certainly can search for a value.
• But you can also output its value, modify its value, insert a node there, etc.
• This generic action is simply called “visiting” the node when discussing traversals. (A short illustration follows below.)
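As an added illustration (not from the slides; it assumes the BinaryTree node class sketched after Page 8), here is what “visiting” looks like when the visit is just printing, and how moving it relative to the two recursive calls produces the three orders discussed next:

// The three traversal orders differ only in where visit() happens
// relative to the two recursive calls.
void printPreorder(BinaryTree node) {
    if (node == null) return;            // base case: the empty tree
    visit(node);                         // visit before either recursive call
    printPreorder(node.left);
    printPreorder(node.right);
}

void printInorder(BinaryTree node) {
    if (node == null) return;
    printInorder(node.left);
    visit(node);                         // visit in between the two calls
    printInorder(node.right);
}

void printPostorder(BinaryTree node) {
    if (node == null) return;
    printPostorder(node.left);
    printPostorder(node.right);
    visit(node);                         // visit after both calls
}

void visit(BinaryTree node) {
    System.out.println(node.value);      // here, "visiting" just means printing
}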

Page 12: CS305/503, Spring 2009 Binary Trees

Preorder Traversal
• If I gave you a binary tree and asked you to search for an element, how would you go about it?
  – You would check the value.
  – You would search the left subtree.
  – You would search the right subtree.
• This is how preorder traversal works.
  – We check/output/do something with the node value.
  – We recurse on the left subtree.
  – We recurse on the right subtree.
• We stop once we run out of nodes.
• This is also called depth-first traversal, because it first focuses on traversing down specific nodes before broadly visiting others.
  – Like taking a lot of CS courses first before satisfying a core curriculum.
  – Preorder traversal is a special case of depth-first search on data structures called graphs, which we will discuss soon.

Page 13: CS305/503, Spring 2009 Binary Trees

Preorder Traversal

BinaryTree preorder(BinaryTree node, int targetvalue) {
    if (node == null)                    // Base case.
        return null;
    else if (node.value == targetvalue)  // Found it.
        return node;

    BinaryTree lchild = preorder(node.left, targetvalue);  // Traverse left.
    if (lchild != null)
        return lchild;                   // Found on the left.

    return preorder(node.right, targetvalue);               // Traverse right.
}

Page 14: CS305/503, Spring 2009 Binary Trees

Preorder Traversal: Illustration

[Figure: the example tree. Root 1 has children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7.]

Order: [1 2 4 5 3 6 7]
The root is always the first node to be visited.

Page 15: CS305/503, Spring 2009 Binary Trees

Inorder Traversal
• We have two recursive calls in the preorder traversal: left and right.
• In preorder, we checked the node before calling either of them.
• In an inorder traversal, we check in-between the two calls.
• We dive down all the way on the left before outputting, then we visit the right.
  – To use recursive stack language, we output after popping on the left but before pushing on the right.
• There is no inherent advantage to choosing one traversal over another on a regular binary tree unless you deliberately want a certain ordering.
• However, inorder traversal is important on a variation of the binary tree. More on that in just a moment.

Page 16: CS305/503, Spring 2009 Binary Trees

Inorder Traversal

BinaryTree inorder(BinaryTree node, int targetvalue) {
    if (node == null)                    // Base case.
        return null;

    // All we did was swap the order of these two lines.
    BinaryTree lchild = inorder(node.left, targetvalue);  // Traverse left.
    if (node.value == targetvalue)       // Found it.
        return node;

    if (lchild != null)
        return lchild;                   // Found on the left.

    return inorder(node.right, targetvalue);               // Traverse right.
}

Page 17: CS305/503, Spring 2009 Binary Trees

Inorder Traversal: Illustration

[Figure: the example tree. Root 1 has children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7.]

Order: [4 2 5 1 6 3 7]
The root is always the middle node.

Page 18: CS305/503, Spring 2009 Binary Trees

Postorder Traversal

• The obvious next step: output after both recursive calls.

• This causes the algorithm to dive down to the bottom of the tree and output/visit the node when going back up.
  – Similar to what we did in printTo(), actually.
  – We are outputting on the pop.

Page 19: CS305/503, Spring 2009 Binary Trees

Postorder Traversal

BinaryTree postorder(BinaryTree node, int targetvalue) {
    if (node == null)                    // Base case.
        return null;

    BinaryTree lchild = postorder(node.left, targetvalue);   // Traverse left.
    BinaryTree rchild = postorder(node.right, targetvalue);  // Traverse right.
    if (node.value == targetvalue)       // Found it.
        return node;

    if (lchild != null)
        return lchild;                   // Found on the left.

    // Found on the right or not at all.
    return rchild;
}

Page 20: CS305/503, Spring 2009 Binary Trees

Postorder Traversal: Illustration

[Figure: the example tree. Root 1 has children 2 and 3; node 2 has children 4 and 5; node 3 has children 6 and 7.]

Order: [4 5 2 6 7 3 1]
The root is always the last node to be visited.

Page 21: CS305/503, Spring 2009 Binary Trees

CRUD: Binary Trees.
• Insertion: ?
• Access: ?
• Updating an element: ?
• Deleting an element: ?
• Search/Traversal: O(n).

• All three traversals are linear: they visit every node in sequence.
  – They each just follow different sequences.
  – You can search by traversing, so search is also O(n).
• How long would it take to access a node, though?
  – If I knew I wanted the left child’s left child, how many pointers would I need to follow to get to it?

Page 22: CS305/503, Spring 2009 Binary Trees

Tree Height.
• To analyze worst-case access, we need to talk about tree height.
• The height of a tree is the number of vertical levels it contains, not including the root level. (A short recursive sketch follows below.)
• Or you can think of it as the number of times you’d have to traverse down the tree to get from the root to the lowest leaf node.
• Nodes in the tree are said to have a depth, based on how many vertical levels they are down from the root.
  – The root itself has a depth of 0.
  – The root’s children have a depth of 1.
  – Their children have a depth of 2…
  – Etc.
• The height is thus also the depth of the lowest node.
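A brief added sketch of computing the height under this convention (not from the slides; it uses the node class sketched after Page 8). A lone root has height 0, and the empty tree is treated as -1 so the recursion works out:

// Height = the number of levels below the root. A single node has height 0;
// the empty (null) tree counts as -1.
int height(BinaryTree node) {
    if (node == null)
        return -1;
    return 1 + Math.max(height(node.left), height(node.right));
}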

Page 23: CS305/503, Spring 2009 Binary Trees

Height

[Figure: the example tree with its levels labeled. Root 1 is at Depth = 0; nodes 2 and 3 are at Depth = 1; leaves 4, 5, 6, 7 are at Depth = 2.]

Height = 2
Remember, don’t count the root level.

Page 24: CS305/503, Spring 2009 Binary Trees

Height Balance
• A tree is considered balanced (or height-balanced) if the depths of the highest and lowest leaves differ by no more than 1.
• This turns out to be an important property because it forms a lower bound on the access time of the tree and lets us find the height.
• Question: If we have n nodes in a balanced binary tree, what is the height of the tree?
  – floor(log2 n). (See the short derivation below.)
  – Note that we had 7 nodes in the previous tree, but a height of 2. The tree was full; adding an 8th node would take the height to 3.
• The time to access a node depends on the height, thus we know it is O(log n) on a balanced tree.
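For reference, here is a short derivation of that bound (an added sketch, not from the slides). Level d of a binary tree holds at most 2^d nodes, so a full tree of height h has

n = \sum_{d=0}^{h} 2^d = 2^{h+1} - 1, \qquad \text{hence} \qquad h = \log_2(n+1) - 1 = \lfloor \log_2 n \rfloor .

A balanced tree stays within one level of this, which is where the floor(log2 n) above comes from; for the slide’s example, n = 7 gives floor(log2 7) = 2.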

Page 25: CS305/503, Spring 2009 Binary Trees

Degeneracy
• Trees with only left or right pointers degenerate into linked lists.
  – Which gives you another perspective on why Quicksort became quadratic with everything on one side of the pivot.
  – Access on linked lists is O(n).
• Performance gets worse even as we approach this condition, so we want to keep trees balanced.

[Figure: two degenerate trees built from the nodes 1, 2, 3, each node having a single child, forming chains that are effectively linked lists.]

Page 26: CS305/503, Spring 2009 Binary Trees

CRUD: Balanced Binary Trees.
• Insertion: ?
• Access: O(log n).
• Updating an element: ?
• Deleting an element: ?
• Search/Traversal: O(n).

• Once we know where to insert, insertion is simple.
  – Just add a new leaf there: O(1).
• However, discovering where to insert is a bit trickier.
  – Anywhere that a null child used to be will work.
  – We don’t want to upset the balance of the tree.
  – A good strategy is to traverse down the tree based on the value of each node. This creates a partitioning at each level.

Page 27: CS305/503, Spring 2009 Binary Trees

Binary Tree Insertion

void insert(BinaryTree root, BinaryTree newtree) {
    // This can only happen now if the user passes in an empty tree.
    // (Note: in Java this assignment is local to the call, so the caller
    //  won't see the new root; a truly empty tree needs separate handling.)
    if (root == null)
        root = newtree;                      // Empty. Insert the root.
    else if (newtree.value < root.value) {   // Go left if <.
        if (root.left == null)               // Found a place to insert.
            root.left = newtree;
        else
            insert(root.left, newtree);      // Keep traversing.
    } else {                                 // Go right if >=.
        if (root.right == null)
            root.right = newtree;            // Found a place to insert.
        else
            insert(root.right, newtree);     // Keep traversing.
    }
}

Page 28: CS305/503, Spring 2009 Binary Trees

Insertion Analysis

• This is similar to a traversal, but guided by the value of the node.

• We choose left or right based on whether the node is < or >=.

• We split into one subproblem of size n/2 each time we traverse.
  – What recurrence would we have for this?
  – What would be the solution? (A sketch follows below.)
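A sketch of the answer this is leading toward (assuming the tree stays balanced, so each step discards about half of the remaining nodes):

T(n) = T(n/2) + O(1) \quad \Rightarrow \quad T(n) = O(\log n) .

Compare this to the T(n) = T(n-1) + f(n) form that the linear structures on Page 4 gave us.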

Page 29: CS305/503, Spring 2009 Binary Trees

CRUD: Balanced Binary Trees.
• Insertion: O(log n).
• Access: O(log n).
• Updating an element: O(1).
• Deleting an element: ?
• Search/Traversal: O(n).

• If we’re already at the element we need to update, we can just change the value, thus O(1).
  – Note that we can say the same for insertion, but finding a place to put the node is usually considered part of it.
• Deletion is quite complex, on the other hand.
  – If there are no children, just remove the node – O(1).
  – If there is one child, just replace the node with its child – O(1).
  – If there are two… well, that’s the tricky case.

Page 30: CS305/503, Spring 2009 Binary Trees

Deletion

• If we need to delete a node with two children, we need to find a suitable node to replace it with.

• One good choice is the inorder successor of the node, which will be the leftmost node in the right subtree of the node we’re deleting.
  – Inorder successor meaning the next node in an inorder traversal.

• So our course is clear: inorder traverse, stop at the next node, swap.

Page 31: CS305/503, Spring 2009 Binary Trees

Deletion

void deleteWithTwoSubtrees(BinaryTree targetnode) {
    if (targetnode == null)              // Deleting a null is a no-op.
        return;

    // Find the inorder successor and its parent.
    BinaryTree inorder_succ;
    BinaryTree inorder_parent = targetnode;
    for (inorder_succ = targetnode.right; inorder_succ.left != null; inorder_succ = inorder_succ.left)
        inorder_parent = inorder_succ;   // Keep track of the parent.

    // Set the value of the target node to that of the inorder successor…
    targetnode.value = inorder_succ.value;

    // Delete the inorder successor (here’s why we needed the parent),
    // reattaching its right subtree, if any (it has no left subtree by construction).
    if (inorder_parent == targetnode)
        inorder_parent.right = inorder_succ.right;  // Successor was the right child itself.
    else
        inorder_parent.left = inorder_succ.right;
}

Page 32: CS305/503, Spring 2009 Binary Trees

CRUD: Balanced Binary Trees.

• Insertion: O(log n).
• Access: O(log n).
• Updating an element: O(1).
• Deleting an element: O(log n).

• Search/Traversal: O(n).

• Finding the inorder successor requires time proportional to the height of the tree. If the tree is balanced, this is O(log n).

Page 33: CS305/503, Spring 2009 Binary Trees

CRUD: Unbalanced Binary Trees.
• Insertion: O(n).
• Access: O(n).
• Updating an element: O(1).
• Deleting an element: O(1).
• Search/Traversal: O(n).

• The worst sort of unbalanced tree is just a linked list.
• The deletion algorithm would always hit the second case (only one child), so we’d never experience O(log n) behavior…
• But the insertion algorithm is not as efficient as that of a linked list.
  – Unless we check for this condition explicitly, in which case we get O(1).

Page 34: CS305/503, Spring 2009 Binary Trees

Binary Search Trees

• Binary Search Trees (BSTs) capture the notion of “splitting into two”.

• Or, to use the Quicksort term, partitioning.
  – The value of a node is the pivot.
  – The left tree contains elements < the pivot.
  – The right tree contains elements >= the pivot.

• They are simply binary trees that are kept sorted in the manner stated above.

Page 35: CS305/503, Spring 2009 Binary Trees

BSTs: What do they entail?

• Like priority queues and sorted arrays, binary search trees are inherently sorted containers.

• This means inserting a sequence of elements and then reading them back will get them out in sorted order (illustrated below).
  – Ah, but this time we have three ways to read them back out. All three can’t give us the same order.
  – The elements of an inorder traversal are sorted in binary search trees.
• It also means that we’ll have to do some extra work to ensure that this guarantee is true.
  – But will this work influence the asymptotic performance?
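A small added illustration (not from the slides; it uses the node class sketched after Page 8, the insert routine from Page 27, and the printInorder sketch after Page 11). Inserting values out of order and reading them back inorder yields them sorted:

// Build a BST out of order, then read it back with an inorder traversal.
BinaryTree root = new BinaryTree(4);
insert(root, new BinaryTree(6));
insert(root, new BinaryTree(2));
insert(root, new BinaryTree(1));
insert(root, new BinaryTree(3));
printInorder(root);   // prints 1 2 3 4 6: sorted. Preorder or postorder would not be.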

Page 36: CS305/503, Spring 2009 Binary Trees

A Binary Search Tree

[Figure: a binary search tree. Root 4 has children 2 and 6; node 2 has children 1 and 3; node 6 has children 5 and 7.]

Inorder traversal: [1 2 3 4 5 6 7]
< on the left, >= on the right.

Page 37: CS305/503, Spring 2009 Binary Trees

CRUD: Binary Search Trees.
• Insertion: O(log n).
• Access: O(log n).
• Updating an element: ?
• Deleting an element: O(log n).
• Search: O(log n).
• Traversal: O(n).

• Search and traversal are no longer the same operation!
  – Traversal is analogous to linear search: look at every element, one at a time, and try to find the target.
  – Search on a BST is analogous to binary search: the data is sorted around the value of the node we’re at, so it guides us to eliminate half of the remaining elements at each step.
  – Just like other unsorted containers, we have to traverse to search a standard binary tree. And like other sorted containers, a BST lets us do a binary search.
• Remember, BSTs are sorted in an inorder traversal.
  – Therefore, the deletion algorithm we previously specified will preserve the ordering.

Page 38: CS305/503, Spring 2009 Binary Trees

Access on a BST
• Use the same strategy we used in binary search (sketched in code below):
  – Compare the node.
  – If the target value is less than the node’s value, go left (eliminates the right subtree).
  – If the target value is greater than the node’s value, go right (eliminates the left subtree).
  – If it’s equal, we’ve found the target.
  – If we hit a NULL, the target isn’t in the tree.
• This exhibits the same performance: O(log n).
  – If the tree is balanced. In the degenerate case, we are binary searching a linked list, which is O(n).
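A minimal sketch of that access strategy in code (an added sketch, not from the slides; it assumes the BinaryTree node class sketched after Page 8):

// Binary search on a BST: each comparison eliminates one whole subtree.
BinaryTree find(BinaryTree node, int targetvalue) {
    if (node == null)                          // Hit a null: not in the tree.
        return null;
    if (targetvalue < node.value)
        return find(node.left, targetvalue);   // Eliminates the right subtree.
    else if (targetvalue > node.value)
        return find(node.right, targetvalue);  // Eliminates the left subtree.
    else
        return node;                           // Equal: found the target.
}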

Page 39: CS305/503, Spring 2009 Binary Trees

Insertion on a BST

• The algorithm I gave you for insertion was actually the BST insertion algorithm as well.
  – That was one of the reasons why I chose that strategy, although it does result in a fairly balanced tree if the data distribution is uniform.
• In order to keep elements partitioned around the pivot, we need to traverse left when the new element has a value < the pivot and right when it’s >=.
• It was O(log n) before, and it still is.

Page 40: CS305/503, Spring 2009 Binary Trees

Deletion on a BST

• I also gave you the BST deletion algorithm.
• As the inorder traversal is in sorted order, the inorder successor is the next element after the one we’re deleting in sorted order.
• If we replace the element we’re deleting with the next element in the sequence, the sequence is still sorted.
  – e.g., [ 1 3 5 8 13 ] after deleting 3 -> [ 1 5 8 13 ].
• It was O(log n) before, and it still is.

Page 41: CS305/503, Spring 2009 Binary Trees

Updating a BST
• Ah, here’s something different.
• Updating unsorted containers is usually a constant-time operation, while updating sorted containers usually takes longer.
• When we change the value of a node in a BST, we may be required to change the node’s position in the tree to preserve the ordering.
  – This is why updating sorted containers is usually a slow operation.
• No one seems to want to deal with updating these, so most sources (including your textbook) just define it as “delete and reinsert”.
  – Which fully works and is very simple to do.
  – Don’t be afraid to do “quick and dirty” things if they don’t harm your performance.
• So does this harm performance?
  – Insertion is O(log n).
  – Deletion is O(log n).
  – Unless we can update in O(1) on a BST (we can’t), then no.

Page 42: CS305/503, Spring 2009 Binary Trees

CRUD: Binary Search Trees.
• Insertion: O(log n).
• Access: O(log n).
• Updating an element: O(log n).
• Deleting an element: O(log n).
• Search: O(log n).
• Traversal: O(n).

• This is the ultimate compromise data structure.
  – Arrays, Lists, Stacks, and Queues all did some things in constant time and other things in linear time.
  – This does everything (except traversal, which is inherently a linear operation) in logarithmic time.
  – But remember, logarithmic time isn’t much worse than constant.
  – So these are pretty good data structures.
• As usual, there’s a catch…

Page 43: CS305/503, Spring 2009 Binary Trees

The Importance of Balance
• Every operation on a tree begins to degenerate when balance is lost.
  – And in the worst case, you end up with a less efficient linked list.
• Keeping the tree balanced is thus important.
  – There is one who is prophesied to bring balance to the Force, but I don’t think that includes your trees.
  – So the burden falls on you, my young padawan.
• Since BSTs are the structural analogue of Quicksort, you may have an idea of what insertion sequence will produce the worst case.
  – Yep, sorted or inverse-sorted, just as in Quicksort (see the sketch below).
• Most data is not arranged like this already, and on average, BSTs stay fairly well balanced.
• But this is enough of a problem that various self-balancing structures have been invented. We will discuss these next week.
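A quick added illustration of that worst case (not from the slides; it uses the node class sketched after Page 8 and the insert routine from Page 27). Inserting already-sorted values produces the degenerate shape from Page 25:

// Every new value is >= everything already in the tree, so each node becomes
// the right child of the previous one: a right-leaning chain, i.e. a linked list.
BinaryTree root = new BinaryTree(1);
insert(root, new BinaryTree(2));   // becomes 1's right child
insert(root, new BinaryTree(3));   // becomes 2's right child
insert(root, new BinaryTree(4));   // becomes 3's right child; height grows by one per insert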

Page 44: CS305/503, Spring 2009 Binary Trees

A General Note

• Although I put numbers in most of my examples, any sort of data can go in these.
  – Strings, Objects, Employees.
• Caveat: When using Java’s sorted containers, make sure your class implements Comparable (a small example follows below).
  – Java doesn’t give you a BinaryTree class outright, but it does give you TreeSet and TreeMap.
  – TreeMap in particular is very neat; check it out.
  – We’ll do some things with these in Thursday’s lab.
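A small, self-contained example of that caveat (an added sketch, not from the slides), using java.util.TreeSet and java.util.TreeMap with a hypothetical Employee class:

import java.util.TreeMap;
import java.util.TreeSet;

// TreeSet and TreeMap keep their elements/keys sorted, so the element type
// must implement Comparable (or you must supply a Comparator).
class Employee implements Comparable<Employee> {
    String name;
    Employee(String name) { this.name = name; }

    public int compareTo(Employee other) {
        return this.name.compareTo(other.name);   // sort employees by name
    }
}

public class TreeDemo {
    public static void main(String[] args) {
        TreeSet<Employee> staff = new TreeSet<Employee>();
        staff.add(new Employee("Barnathan"));
        staff.add(new Employee("Ada"));
        for (Employee e : staff)
            System.out.println(e.name);           // prints Ada, then Barnathan: sorted, not insertion order

        TreeMap<Integer, String> byId = new TreeMap<Integer, String>();
        byId.put(3, "Charlie");
        byId.put(1, "Alice");
        System.out.println(byId.firstKey());      // prints 1: keys stay sorted too
    }
}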

Page 45: CS305/503, Spring 2009 Binary Trees

“In all my life I’ll never see / a thing so beautiful as a tree.”

• The study of trees goes very deep.
  – We’ve just scratched the surface.
  – We’ll come back to self-balancing trees, heaps, and perhaps splay trees.
• The lesson:
  – Ideas are universal. They can come from your study. They can come from outside of your study. They can come from nature. They can come from anywhere.
• Next class: Linear-time sorting, B+ trees.