Trees in Data Structures


8/9/2019 Trees in DataStructures


    By Julienne Walker

    License: Public Domain

Trees are among the oldest and most heavily used data structures in computer programming. At the most general, trees are simply undirected graphs called free trees. However, the most common type of tree is a binary search tree, because it allows for efficient sorting and searching of data, which are the two most heavily performed operations in software. Binary search trees are so common that when someone talks about trees, they usually mean a binary search tree. Henceforth, when I say tree, I really mean binary search tree to save on space and in doing so avoid carpal tunnel. Also, when I say binary search tree, I mean binary search tree, which could be confusing, but bear with me. :-) This tutorial will discuss, often in painful detail, the basic concept of trees.

I'll do my best to cover the important and practical areas (always providing real, working code) without dipping into theory except where absolutely necessary. I have no desire to add yet another dry textbook description of trees, so I'll focus on what you can use in the real world without having to get your PhD first. If you're a graduate student looking for lemmas and proofs, this tutorial is not for you. I'm neither qualified nor interested in filling page after page with mathematical mumbo jumbo. If you want to know what these binary thingies are or have a vague idea but just don't quite get it, you've come to the right place! If you know what trees are and have used them in real or test code, this tutorial can help you take your knowledge further by offering variations and alternatives that you may not have considered before.

I'm a programmer, not a computer scientist. I make a living from writing code, not theories. As such, I have a strong opinion about useless theory. Sure, it makes you look smart, but it's not accessible to the average Joe, and in the end most theory really makes no difference in practice. I want my writings to have the widest possible usefulness, so I write for people like me, who don't give a hoot about the theory unless it's important for a decision, but care about writing great code. I also write like a programmer, with puns, off-color jokes, and various social defects included. I like to think of this as adding character to the text, and it's a ton of fun to read too. :-)

A few notes on the code are in order before we start. All of the code is written in C. I chose C because I can fit the meat of the topics into each example without a great deal of framework. C++ or Java seem to be good choices on the surface because they're so hyped to be better than Jolt! cola these days, but to do it properly requires the classes to be written using good style for those languages, and that adds a lot of unnecessary clutter. Since I'm talking about trees and not the details of how to write a good container class, I cut out the fluff by using C instead because the necessary framework is minimal. C is also a good base for the common languages these days. C++ and Java are based off of C, so the syntax is similar enough that a reasonably competent programmer can do a good translation.

I don't dumb down my code. It's my belief that contriving useless examples that everyone can understand is silly and helps nobody. I expect you to know C well enough to follow the examples, but I'll explain anything that isn't immediately obvious. All of the code is written with a production library in mind, driving home that the intended readers are people who want working code that they can actually use! I've seen too many tutorials that give code with syntax errors, logical errors, impractical constructs, or simplifying assumptions that make the code useless everywhere but in that tutorial. Such is not the case here. Not only do I allow you to cut code out of the tutorial and get it to work with minimal effort, I encourage it!

Everything in this tutorial is in the public domain. That means you can quote me without worry, take cool stuff from the code without fear, copy the entire contents of the tutorial and post it somewhere else, and even copy the code verbatim into a library that you intend to sell without asking permission first! Naturally the originals will be the most up-to-date, but I impose no restrictions on use elsewhere. Taking my code and calling it your own is also allowed, but it's bad karma. I trust that you're honest enough to give me credit where credit is due. Tutorials are hard work, and so is writing good code.

    Concept

Okay, we're among friends now, right? You guys are programmers or aspiring programmers, so naturally you've been in a situation where you need to save a lot of data. It might be something like generating a histogram of unique words in a file, or sorting a huge and unknown number of IP addresses gathered from a socket, or even a simple database of email contacts. Well, most likely you ended up using a linked list, or even (gasp!) an array, both of which are totally unsuited for all of the above jobs. What we need is a data structure that keeps everything sorted without bending over backward, is easy to insert and delete efficiently, and is easy to search efficiently. Neither arrays nor linked lists are a good fit, so they're poor solutions.


Well, we know that a binary search is fast because it divides the number of items that need to be searched in half at each step. The only downside is that binary search requires the list to be sorted. Let's try something crazy and actually store the items in the list just like they would be found if we did a binary search for each. That is, the start of the data structure would be the middle item. If we move to the left then we see the middle of the left subset. Basically we're taking a linear list and changing it to an explicit binary search structure:

    [0][1][2][3][4][5][6][7][8][9]

    becomes

    5

    / \

    2 8

    / \ / \

    1 4 7 9

    / / /

    0 3 6

This is the basic idea behind a binary search tree. It's called a tree because it branches like one (duh!), even though it grows down instead of up like normal trees.

Maybe we should be calling them binary search roots instead, since that's easier to link to a common depiction of a tree's roots in your typical high school textbook. Sadly, trees are too well entrenched in computer science, so renaming them would be a bad idea.

If you start a search at the highest node in the tree, called (confusingly!) the root, then you can search for any item in the tree just by testing whether or not that item is less than the root or greater than the root. Say we're looking for 3. 3 is less than 5, so we move to the left and call 2 the root. 3 is greater than 2, so we move to the right and call 4 the root. 3 is less than 4, so we move left and call 3 the root. 3 is equal to 3, so we found what we were looking for. You can follow that pattern to look for any item in the tree.

Okay, but what about an unsuccessful search? What if the item isn't in the tree? It's simple! Just do a search like we did above, and if you walk off the bottom of the tree then the search was unsuccessful. Well, maybe not so simple, because how do we know when we walked off the bottom of the tree? There's a special node called a leaf that doesn't contain a valid item and tells us that we went too far in a search. With my convention of using a tilde ('~') for the leaf, the tree would look like this:

    5

    / \

    2 8

    / \ / \

    1 4 7 9

    / \ / \ / \ / \

    0 ~ 3 ~ 6 ~ ~ ~

    / \ / \ / \

    ~ ~ ~ ~ ~ ~

That's harder to draw with my ASCII art, so I won't use it often, but it's easy to see now that you can stop the search when you reach a leaf. Just for future reference, let's go through some terminology. We know that the top of the tree is called the root, but since trees can be defined recursively, every node in the tree can also be called a root (the root of the subtree). If a node is the root of a subtree then it has a parent that points to it. So 2 is the parent of 1, and 5 is the parent of 2. Looking from the other direction, 2 is a child of 5 and 1 is a child of 2. This all falls back to the family tree connection, so don't be surprised if you hear things like grandparent, great grandparent, and the like when discussing trees.

Nodes that have two children are called internal nodes, and nodes that have one or fewer children are called external nodes. A leaf is a leaf. The height of a tree (or subtree, because trees are recursive) is the number of nodes from the root to a leaf, not including the root. In the above diagram, the height of 8 is 2 because the longest path to a leaf has two nodes in it, 7 and 6. Sometimes you see the height including the root as well, so the height of 8 could also be 3. Either way is correct as long as it's consistently used.

Okay, so it's easy to search efficiently in a binary search tree. But what about easily inserting and deleting nodes? That doesn't seem easy at first glance. Well, inserting into a tree is both simple and efficient if you look at it from the perspective of an unsuccessful search. If you search for the new item that you know isn't there, you can replace the leaf you're sure to get to with a new node that has that item. You can also handle duplicates with an extra special case that either treats a duplicate as less than or greater than itself. Either way works, and the duplicates will line up down the path.

Deletion is harder (but not as much as many people would like you to think), so we'll save it for now and look at it in detail a bit later. But what about keeping the data sorted? Since every node's left children are less than or equal to it, and every node's right children are greater than or equal to it, the items are kept in sorted order. Sometimes it's a little tricky to actually take advantage of that order because the next item isn't necessarily in an adjacent node. Consider 2 and 3 being separated by 4, and 4 and 5 being separated by 2. We'll look at several ways of exploiting this structure to get what we want, but for now just remember that it really is sorted!
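As a quick preview of one way to exploit that order (this sketch is mine, not the tutorial's, and jsw_inorder is my own name in the tutorial's style): an inorder traversal visits the left subtree, then the node, then the right subtree, and the items come out sorted even though neighbors in sorted order aren't adjacent nodes. The node structure mirrors the one defined a little further down.

```c
#include <stddef.h>

struct jsw_node {
  int data;
  struct jsw_node *link[2]; /* link[0] is left, link[1] is right */
};

/* Visit the left subtree, then the node, then the right subtree;
   on a binary search tree this appends the items in sorted order */
void jsw_inorder ( struct jsw_node *root, int *out, int *n )
{
  if ( root != NULL ) {
    jsw_inorder ( root->link[0], out, n );
    out[(*n)++] = root->data;
    jsw_inorder ( root->link[1], out, n );
  }
}
```

Even a tree where 2 and 3 sit on opposite sides of their parent still yields 2, 3, 4 in order.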

So trees meet all of our requirements. They're easy to search efficiently, items are sorted by default so no extra work is needed there, it's easy to insert new items, and if you take my word for it, deletion isn't that bad either. Okay, so we know what we want, now let's try to put concept to code. The first thing we need is a structure for a node. It's also convenient to treat a tree as a whole, so we'll use a second structure that just holds the root of the tree:

    struct jsw_node {
      int data;
      struct jsw_node *link[2];
    };

    struct jsw_tree {
      struct jsw_node *root;
    };

Okay, this is a typical self-referential data structure. If you know how a linked list works, there shouldn't be any problem. A structure can contain a pointer to itself, so it's easy to link together two instances by assigning the address of one to the pointer member of the other. The only confusing part should be the link array. Yes, I could have used two pointers called left and right, but as you'll see shortly, the operations on a binary search tree are symmetric. By using an array and a boolean value as the index for that array, we can avoid unnecessary repetition of code with only a minor increase in complexity. Once you get used to the idiom, it's actually simpler than the usual left/right idiom. We'll simply use a null pointer for a leaf, since that's the most common solution and it's pretty easy to test for.
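Later code in the tutorial calls a make_node helper that is described but never shown. Here's a hedged sketch of what it presumably does, based on the tutorial's own description (allocate a node, store the item, and set both links to the null-pointer leaf):

```c
#include <stdlib.h>

struct jsw_node {
  int data;
  struct jsw_node *link[2]; /* link[0] is left, link[1] is right */
};

struct jsw_tree {
  struct jsw_node *root;
};

/* Sketch of the tutorial's unshown helper: allocate a node,
   store the item, and null both links so they act as leaves */
struct jsw_node *make_node ( int data )
{
  struct jsw_node *node = malloc ( sizeof *node );

  if ( node != NULL ) {
    node->data = data;
    node->link[0] = node->link[1] = NULL;
  }

  return node;
}
```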

With those preliminaries out of the way, we can go straight to the meat of the tutorial. We'll start with searching, since you already know how it works. That way you can see how the operations are performed, with code to strike home all of the details.

    Search

Okay, we know that a search starts at the root and moves either left or right depending on whether the item we're searching for is greater than or less than the item at the root (ignoring a tree that allows duplicates for now). Then we go left or right and do the same thing with the next node. Since we're basically treating each subtree as if it were a unique tree, and each step does the same thing, we can do all of this recursively. The code is so simple, it's almost frightening:

    int jsw_find_r ( struct jsw_node *root, int data )
    {
      if ( root == NULL )
        return 0;
      else if ( root->data == data )
        return 1;
      else {
        int dir = root->data < data;
        return jsw_find_r ( root->link[dir], data );
      }
    }

    int jsw_find ( struct jsw_tree *tree, int data )
    {
      return jsw_find_r ( tree->root, data );
    }

The base cases for the recursion are a successful and unsuccessful search. If the root is a null pointer, we've reached a leaf and can return a failure code. If the item contained in the root matches the item we're searching for, we can return a success code. Otherwise, we test whether the item we're looking for is less than or greater than the root and move left or right, respectively. The return codes are pushed all the way back up the recursion, so jsw_find can return what it gets. 0 means the item wasn't found and 1 means that it was. The point of jsw_find is convenience. Now a user can just say jsw_find ( tree, data ) and treat the tree as a black box instead of grabbing the root with jsw_find_r ( tree->root, data ).

Okay, I don't imagine that this code is a big stumbling block for you, and if it is then you're probably not quite ready for binary search trees. However, the magic with dir and the link array is almost always confusing to people who are new to it, so I'll go over how it works. We know that link[0] is the left subtree and link[1] is the right subtree because I told you before, so to go left we need to make sure that the comparison results in 0, and to go right we need to make sure that the result of the comparison is 1. That's a problem because the most obvious test, data < root->data, gives us 1, and that's the wrong direction because we want to go left. So to get things to work the correct way, we reverse the test and ask if root->data < data. This makes sure that if data is less than root->data, we get 0 and move in the correct direction.
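To make the idiom concrete, here's a tiny sketch of my own (jsw_direction is a made-up name, not from the tutorial) that shows how the reversed comparison maps straight to a link index:

```c
/* The reversed test maps "go left" to 0 and "go right" to 1,
   so the result can index the link array directly */
int jsw_direction ( int root_data, int search_data )
{
  return root_data < search_data; /* 0 = take link[0], 1 = take link[1] */
}
```

Searching for 3 under a root of 5 gives 5 < 3, which is 0, so we correctly take link[0] to the left.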

While some things are easier with trees when you use recursion, many people prefer non-recursive solutions for a number of good reasons. Searching a tree without recursion is even simpler than with recursion, removes the problem of stack overflow, and has a tendency to execute faster and with less memory use:

    int jsw_find ( struct jsw_tree *tree, int data )
    {
      struct jsw_node *it = tree->root;

      while ( it != NULL ) {
        if ( it->data == data )
          return 1;
        else {
          int dir = it->data < data;
          it = it->link[dir];
        }
      }

      return 0;
    }

Some notes on duplicate items are in order. These search algorithms will find the first match, but if you want to search for duplicates, such as counting the duplicates, it's a little harder. In such a case you would probably want to return a pointer to the data, and also save a pointer to the node that matched, so that you can restart the search where you left off last. That way you can find subsequent matches past the first. The most common trees disallow duplicates, so for the most part we'll assume that in this tutorial. It's also easier on me because I don't have to write extra code for the functions that work with duplicates. ;-)

    Insertion

A lot of the stuff you want to do with a tree involves searching; that's why trees are so efficient. A search will ideally divide the problem in half at each step, so it's a good idea to take advantage of that for other operations too. For example, insertion into a tree is a variation of an unsuccessful search. You search for an item that's not in the tree, then replace the first leaf you get to with a new node that contains that item:

    Find 7

    5

    / \

    2 8

    / \ / \

    1 4 ~ 9

    Not found, insert at leaf

    5

    / \

    2 8

    / \ / \

    1 4 7 9

The code to pull this off is just a simple variation of the search algorithm, but the recursive code takes advantage of recursion to simplify updates. Here's the recursive code to insert into a binary search tree (make_node simply allocates memory, assigns the data, and sets the links to NULL):

    struct jsw_node *jsw_insert_r ( struct jsw_node *root, int data )
    {
      if ( root == NULL )
        root = make_node ( data );
      else if ( root->data == data )
        return root;
      else {
        int dir = root->data < data;
        root->link[dir] = jsw_insert_r ( root->link[dir], data );
      }

      return root;
    }

    int jsw_insert ( struct jsw_tree *tree, int data )
    {
      tree->root = jsw_insert_r ( tree->root, data );
      return 1;
    }

The only difference between jsw_insert_r and jsw_find_r is what you return for the base cases. However, it might not be immediately obvious how the recursive update works. As we ride the wave of recursion back up the tree, we re-assign the next link down the path to the return value of jsw_insert_r. This way we can be sure that any changes made to the lower parts of the tree can be recognized by the higher parts of the tree. That's a common pitfall, by the way. Always remember to reset the parent of a node that changes so that your changes stick.

Since we only insert at a leaf, and leaves are at the bottom of the tree, trees grow downward, which makes the whole tree concept confusing and tempts us again to think of trees as roots. Let's insert a few numbers into a new tree to see how this works:

    Insert 5

    5

    Insert 2

    5

    /

    2

    Insert 8

    5

    / \

    2 8

    Insert 4

    5

    / \

    2 8

    \

    4

    Insert 9

    5

    / \

    2 8

    \ \

    4 9

    Insert 3

    5

    / \

    2 8


    \ \

    4 9

    /

    3

    Insert 1

    5

    / \

    2 8

    / \ \

    1 4 9

    /

    3

As with jsw_find, jsw_insert can be implemented without recursion as well. Just like the recursive and non-recursive search functions, non-recursive insertion has several advantages over recursive insertion. For large trees, you don't have to worry about hitting some arbitrary limit for the depth of the recursion. A loop is almost always faster than the equivalent behavior of recursive function calls, and it saves on local variable space as well. Sometimes recursion is just the best option because of its simplicity, but in my opinion, many of the non-recursive solutions for trees aren't that much more complicated. It's usually worth the effort to do an implementation without recursion.

The only real difference between the non-recursive search and non-recursive insertion is that insertion needs to be careful not to actually get to a leaf. Because a leaf is a null pointer, there's no way to determine which node in the tree we came from, and it's impossible to link the new node into the tree. That defeats the purpose of insertion since we can't make the change stick. So the loop tests the next link before going to it, and if it's a leaf, we break out of the loop and tack a new node onto that link. Since there's no next link that points to the root of the tree itself, we treat that as a special case. There are ways to get around the special case of an empty tree, but I think it's cleaner this way:

    int jsw_insert ( struct jsw_tree *tree, int data )
    {
      if ( tree->root == NULL )
        tree->root = make_node ( data );
      else {
        struct jsw_node *it = tree->root;
        int dir;

        for ( ; ; ) {
          dir = it->data < data;

          if ( it->data == data )
            return 0;
          else if ( it->link[dir] == NULL )
            break;

          it = it->link[dir];
        }

        it->link[dir] = make_node ( data );
      }

      return 1;
    }

Since the conditions for exiting the loop are between the code to choose a direction and the code to move in that direction, an infinite loop is used. Sure, it's possible to set up a proper loop, but I don't think those methods are as clean in this case, and since I'm the author, my opinion rules for my tutorial, so there. :-) Note that in all of these implementations, you can allow duplicates simply by removing the test for equality. Everything will just work, and duplicates will fall underneath each other.
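As a sanity check, here's a self-contained sketch that feeds the walkthrough's items (5, 2, 8, 4, 9, 3, 1) through the non-recursive insertion and search above. make_node is my own minimal stand-in for the helper the tutorial describes but doesn't show:

```c
#include <stdlib.h>

struct jsw_node {
  int data;
  struct jsw_node *link[2];
};

struct jsw_tree {
  struct jsw_node *root;
};

/* Minimal stand-in for the tutorial's unshown make_node helper */
static struct jsw_node *make_node ( int data )
{
  struct jsw_node *node = malloc ( sizeof *node );

  if ( node != NULL ) {
    node->data = data;
    node->link[0] = node->link[1] = NULL;
  }

  return node;
}

/* Non-recursive insertion, as in the tutorial; 0 means duplicate */
int jsw_insert ( struct jsw_tree *tree, int data )
{
  if ( tree->root == NULL )
    tree->root = make_node ( data );
  else {
    struct jsw_node *it = tree->root;
    int dir;

    for ( ; ; ) {
      dir = it->data < data;

      if ( it->data == data )
        return 0;
      else if ( it->link[dir] == NULL )
        break;

      it = it->link[dir];
    }

    it->link[dir] = make_node ( data );
  }

  return 1;
}

/* Non-recursive search, as in the tutorial */
int jsw_find ( struct jsw_tree *tree, int data )
{
  struct jsw_node *it = tree->root;

  while ( it != NULL ) {
    if ( it->data == data )
      return 1;

    it = it->link[it->data < data];
  }

  return 0;
}
```

After those seven inserts the root should hold 5 with 2 and 8 as its children, exactly as the diagrams above show.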

    Deletion

The ease with which we could insert into a tree might lull you into a false sense of security. Well, if all you need to do is search and insert then everything is fine, but most of us at some point find the need to remove an existing node from the tree. This isn't as easy as inserting into a tree because this time we don't have the option of where to remove the node. If it's an external node then the process is about as simple as inserting a node. However, if it's an internal node, things get tricky.

To remove an external node, all you have to do is replace it with its non-leaf child, or a leaf if it doesn't have any children. There are three simple cases for that. If the node has a left child, replace it with the left child, because the node is external and the right child is sure to be a leaf. If the node has a right child, this is the symmetric case for if the node has a left child. If the node has no children, just pick one, because they're both leaves. Be sure to reset the parent to point to the child, which unlinks the node from the tree:

    Left child

    p p

    \ \

    x -> c

    / \ / \

    c ~ * *

    / \

    * *

    Right child

    p p

    \ \

    x -> c

    / \ / \

    ~ c * *

    / \

    * *

    No children

    p p

    \ \

    x -> ~

    / \

    ~ ~

The key element to success for this is making sure that the parent's link is updated to point to either c or a leaf. If this isn't done then x won't be successfully removed from the tree, and any subsequent freeing of memory will cause issues. Fortunately, as long as we have access to the parent node, these three cases can be lumped together with a single block of code. It's probably the ugliest thing I can think of for basic trees, because you need to test which child of x is non-null and which child of p x is, with our funky little reverse test:

    p->link[p->link[1] == x] = x->link[x->link[0] == NULL];

Cool, huh? It's like removal for a linked list: we replace p's next link with x's next link, thus removing x from the chain so that we can free it safely. The test for x->link[0] against NULL will give us 1 if the left link is null (the second case), and we'll use the right link to replace x. If the right link is also null, x had no children (the third case). If the test gives us 0, the left link is not null (the first case). Checking x against p's right link does the same thing but in a different way. If x is p's right link, the test will give us 1, which is correct. Otherwise, the test will result in 0, and since x isn't a leaf, we can be sure that it's p's left child. All done with one line (we'll handle the special case of deleting the tree's root in a bit). This is the simple case for deletion. It gets harder when we want to delete a node with two children.
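If the one-liner still feels like magic, this self-contained sketch of mine wires up the "left child" and "no children" pictures by hand and applies the line (unlink_external is my own name for it, not the tutorial's):

```c
#include <stddef.h>

struct jsw_node {
  int data;
  struct jsw_node *link[2];
};

/* Unlink external node x from its parent p using the combined
   reverse tests described in the text */
void unlink_external ( struct jsw_node *p, struct jsw_node *x )
{
  p->link[p->link[1] == x] = x->link[x->link[0] == NULL];
}
```

In the "left child" picture, p's right link ends up pointing at c; in the "no children" picture, p's link becomes a leaf (null pointer), exactly as the diagrams show.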

    5

    / \

    2 8

    / \ / \

    1 4 7 9


Let's delete 5 from the tree above, just pretend that it's not the tree's root. ;-) We can't simply replace it with one of its children because that would cause some uncomfortable problems, most notably what do we do with the extra child that would be floating around? Okay, what about attaching 5's left subtree to 7's left link and replacing 5 with 8?

    8

    / \

    7 9

    /

    2

    / \

    1 4

That maintains the rule that left nodes are less and right nodes are greater, but it's not exactly trivial to do surgery like that, and we can do better with less work and avoid changing the structure around so much. How about instead of moving around subtrees, we just replace 5 with 4 or 7? These are the inorder predecessor and successor, respectively. All of the work is in finding one of those nodes, because then all you need to do in most trees is copy the data of one to the other, then remove the predecessor or successor that you replaced it with. In this tutorial, we'll use the successor, but it really makes no difference which you choose:

    Copy 7 to 5

    7

    / \

    2 8

    / \ / \

    1 4 7 9

    Remove the external 7

    7

    / \

    2 8

    / \ / \

    1 4 ~ 9

The nice thing about this trick (it's called deletion by copying) is that we can take the hard case of deleting a node with two children and turn it into a case of deleting a node with only one child. Wait, how does that work though? What if the successor has two children?

The inorder predecessor of a node never has a right child, and the inorder successor never has a left child. So we can be sure that the successor always has at most one child, and the left child is a leaf. Following from this, we have an easy way of finding the successor of any node, provided it's in the right subtree of that node. Simply walk once to the right, then go all the way to the left. Likewise, to find the predecessor, you walk once to the left and then all the way to the right.
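That successor walk can be sketched on its own (jsw_successor is my own name, not the tutorial's; it assumes the node has a right subtree, which is exactly the situation in the deletion algorithm): one step right, then left until the next left link is a leaf.

```c
#include <stddef.h>

struct jsw_node {
  int data;
  struct jsw_node *link[2];
};

/* Find the inorder successor of a node that has a right subtree:
   one step right, then left as far as possible */
struct jsw_node *jsw_successor ( struct jsw_node *node )
{
  struct jsw_node *succ = node->link[1];

  while ( succ->link[0] != NULL )
    succ = succ->link[0];

  return succ;
}
```

On the tree in the diagrams, the successor of 5 is 7 (right to 8, then left), and the successor of 8 is 9 (right to 9, which has no left child).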

Now, this works for finding a successor down the tree, but what if we want to know the successor of 4? Well, that's harder, but we won't ever see that situation in the basic deletion algorithm, so we can set it aside for now. We have enough foundation in place to actually go about removing a node from the tree. First we search for the node to delete, just as we did with insertion, except this time we'll also save the parent as we go down. When we get to the found item (we just return failure if we reach a leaf), we test to see how many children it has and do either the simple case or the hard case depending on that. Here's the code:

    int jsw_remove ( struct jsw_tree *tree, int data )
    {
      if ( tree->root != NULL ) {
        struct jsw_node *p = NULL, *succ;
        struct jsw_node *it = tree->root;
        int dir;

        for ( ; ; ) {
          if ( it == NULL )
            return 0;
          else if ( it->data == data )
            break;

          dir = it->data < data;
          p = it;
          it = it->link[dir];
        }

        if ( it->link[0] != NULL && it->link[1] != NULL ) {
          p = it;
          succ = it->link[1];

          while ( succ->link[0] != NULL ) {
            p = succ;
            succ = succ->link[0];
          }

          it->data = succ->data;
          p->link[p->link[1] == succ] = succ->link[1];

          free ( succ );
        }
        else {
          dir = it->link[0] == NULL;

          if ( p == NULL )
            tree->root = it->link[dir];
          else
            p->link[p->link[1] == it] = it->link[dir];

          free ( it );
        }
      }

      return 1;
    }

The search is almost identical to non-recursive insertion, except this time we don't need to split the direction and movement code, and we also set p to it before setting it to it->link[dir]. This saves the parent, and if p is NULL when the search is over, we're looking at the special case of deleting the root of the tree.

Let's look at the two cases after the search is over. In the first case, if the node has two children, we need to find the inorder successor. As you already know, to do that we simply move right once, then left until the next left is a leaf. During this process, we also take the parent with us so that at the end of this walk down the tree, succ is the successor and p is the successor's parent. We then copy the successor's data to the node we want to delete, but actually remove the successor instead. Notice that the parent's next link is always given succ->link[1] in this case, because we know that the left link is a leaf. The reverse test p->link[1] == succ is basically the same thing as we've been doing. It results in 0 if succ is p's left link and 1 if succ is p's right link. Free the unlinked node and we're done.

In the second case, the node we want to delete has only one child; it's an external node. This is the easy case because we don't need to find the successor; we can just cut to the part where we remove the node directly. We pick which side is not a leaf, and give that subtree to p as its replacement link. However, if p is NULL then we're trying to delete the root of the tree. That can be an issue because any changes we make have to be assigned to tree->root or they won't be saved, so we treat that as a special case. This special case doesn't exist if the node we want to delete has two children, because even if it's the root of the tree, we'll be replacing its value instead of actually unlinking it.

Now, while this method is relatively short and easy, it's somewhat of a naive approach. We're solving the problem directly when we could be working smarter. Notice how both cases actually end up removing an external node, but they're still separate cases because we need to find the successor for one of them. Let's try something different. Instead of stopping when we find the node we want to delete, let's save it and keep going all of the way down to an external node, since we'll do that anyway. Then when we get to the bottom, we copy the current node's data into the saved node, and remove the current node:

int jsw_remove ( struct jsw_tree *tree, int data )
{
  if ( tree->root != NULL ) {
    struct jsw_node head = {0};
    struct jsw_node *it = &head;
    struct jsw_node *p, *f = NULL;
    int dir = 1;

    it->link[1] = tree->root;

    while ( it->link[dir] != NULL ) {
      p = it;
      it = it->link[dir];
      dir = it->data <= data;

      if ( it->data == data )
        f = it;
    }

    if ( f != NULL ) {
      f->data = it->data;
      p->link[p->link[1] == it] = it->link[it->link[0] == NULL];
      free ( it );
    }

    tree->root = head.link[1];
  }

  return 1;
}

This function introduces two tricks to avoid special cases. The first trick is a dummy root so that the root of the tree always has a parent. The second trick is saving the found node so that we can copy the data when we get to an external node. Now we can avoid the special case of deleting the root of the tree, and the special case of finding the successor. The code is much shorter and more elegant. Notice that we also test whether it->data is less than or equal to data, because we want to keep going even if we found a matching item, and we want to go to the right to get to the successor of that item.

We can delete from a binary search tree with recursion as well, but it's not as clean. The following function follows a similar approach as the smarter deletion described above. When we find the node we want to delete, we then find the successor and copy the data. Then we trick the function by changing the data we're searching for to the successor's (also testing for less than or equal because there will be a match after copying the successor's data). The recursion continues until we get to the successor, at which point we do a simple case deletion. Then we ride the recursion wave back up to make sure that all of the changes stick:

struct jsw_node *jsw_remove_r ( struct jsw_node *root, int data )
{
  if ( root != NULL ) {
    int dir;

    if ( root->data == data ) {
      if ( root->link[0] != NULL && root->link[1] != NULL ) {
        struct jsw_node *succ = root->link[1];

        while ( succ->link[0] != NULL )
          succ = succ->link[0];

        data = succ->data;
        root->data = data;
      }
      else {
        struct jsw_node *save = root;

        root = root->link[root->link[0] == NULL];
        free ( save );

        return root;
      }
    }

    dir = root->data <= data;
    root->link[dir] = jsw_remove_r ( root->link[dir], data );
  }

  return root;
}

int jsw_remove ( struct jsw_tree *tree, int data )
{
  tree->root = jsw_remove_r ( tree->root, data );
  return 1;
}

    Destruction

At some point you'll get sick and tired of a tree, and you'll want to kill it. But you don't want memory leaks, so you want to remove every node and free its memory correctly. Sure, you could call jsw_remove a bunch of times, but that's hackish. A better way is to traverse every node in the tree and delete them all in one shot. Ignoring the details of traversal for now, because we'll discuss it shortly, this can be done with a postorder traversal, and the recursive solution is trivial:

void jsw_destroy_r ( struct jsw_node *root )
{
  if ( root != NULL ) {
    jsw_destroy_r ( root->link[0] );
    jsw_destroy_r ( root->link[1] );
    free ( root );
  }
}

void jsw_destroy ( struct jsw_tree *tree )
{
  jsw_destroy_r ( tree->root );
}

It can also be done without using recursion, but the non-recursive postorder traversal isn't exactly easy. However, you don't need to do a postorder traversal to destroy a tree. Not if you're willing to change the structure to suit your needs, that is. ;-) If we have a tree where every left link is a leaf, it's easy to walk down the tree and delete every node: just save the right link, delete the node, then go to the saved link:

  0
 / \
~   1
   / \
  ~   2
     / \
    ~   3
       / \
      ~   4
         / \
        ~   5
           / \
          ~   ~

The trick is forcing this particular structure from any possible tree. We can go about it with an operation called a rotation. A rotation can be either left or right. A left rotation takes the right child of a node and makes it the parent, with the node becoming the new left child. A right rotation is the symmetric inverse of a left rotation:

Right rotation

    3             1
   / \           / \
  1   4   ->    0   3
 / \               / \
0   2             2   4

Notice how 2 moves from the left subtree to the right subtree. The only nodes that are changed in the rotation are 1 and 3: 3's left link becomes 1's right link, and 1's right link becomes 3. A left rotation on the second tree would result in the first tree, thus proving that rotations are symmetric. So, if we delete a node when there's no left link and do a right rotation when there is, we can be sure that we'll see and delete every node in the tree. The code to do this is surprisingly short:

void jsw_destroy ( struct jsw_tree *tree )
{
  struct jsw_node *it = tree->root;
  struct jsw_node *save;

  while ( it != NULL ) {
    if ( it->link[0] != NULL ) {
      /* Right rotation */
      save = it->link[0];
      it->link[0] = save->link[1];
      save->link[1] = it;
    }
    else {
      save = it->link[1];
      free ( it );
    }

    it = save;
  }
}


Okay, in case you didn't notice, this section was the end of the foundation sections and the beginning of the intermediate sections. I've divided the tutorial up so that the easy stuff is at the beginning, because I've been told that my last tutorial on binary search trees was complicated. From here on out, I'm going to assume that you have a good foundation and pretty much know what you're doing. The next section will cover traversal, which can get pretty heady, but I'll try to keep it light. Following traversal, we'll talk about parent pointers and threading, which can be considered the advanced parts of the tutorial. Last, some comments on performance properties and hints about what comes after the basics, and then you'll be done with this tutorial. :-)

    Traversal

Now that we have the code to build a binary search tree, we can do things with it. Useful things. Fun things. Naturally, you can use the search functions defined previously to determine if an item is in the tree, but it would be nice to verify that the insertion functions work properly. It would also be nice to be able to perform an operation on every item in the tree, such as printing the items to standard output or a file. This brings us to the field of tree traversal.

You would think that visiting every node in a tree would be simple and consist of one, maybe two different cases, right? Wrong! There are actually N! (N factorial) different ways to traverse a binary search tree of N nodes, but most of them are useless. Of the many ways to traverse a binary search tree, we'll be looking at two categories: breadth-first, where we look at the levelorder traversal as an example, and depth-first, where we play with preorder, inorder, and postorder traversals. We'll also look at a more flexible stepwise traversal that you would find in a good tree library.

Depth-first traversals begin by moving as far left or right as possible before backtracking. They then move up one link and move left or right again. This process is repeated until all nodes have been visited. As expected, the depth-first traversals can be written recursively because the movement is based on stack behavior. The question, of course, is when do the nodes get visited? Since we can only move in one of two directions, we can determine that there are only six possible ways to traverse and visit in a depth-first traversal. Because each operation can be one of move left, move right, or visit, the following variants are available:

1. visit, move left, move right
2. visit, move right, move left
3. move left, visit, move right
4. move right, visit, move left
5. move left, move right, visit
6. move right, move left, visit

Of these six, three are common enough to be given standardized names: traversal 1 is referred to as preorder because the node is visited first, traversal 3 is called inorder because it results in a traversal in the sorted order of the node values, and finally, traversal 5 is called postorder because the node is visited after both movements. Each of these can be written using a short and elegant recursive algorithm:

void jsw_preorder_r ( struct jsw_node *root )
{
  if ( root != NULL ) {
    printf ( "%d\n", root->data );
    jsw_preorder_r ( root->link[0] );
    jsw_preorder_r ( root->link[1] );
  }
}

void jsw_preorder ( struct jsw_tree *tree )
{
  jsw_preorder_r ( tree->root );
}

void jsw_inorder_r ( struct jsw_node *root )
{
  if ( root != NULL ) {
    jsw_inorder_r ( root->link[0] );
    printf ( "%d\n", root->data );
    jsw_inorder_r ( root->link[1] );
  }
}

void jsw_inorder ( struct jsw_tree *tree )
{
  jsw_inorder_r ( tree->root );
}

void jsw_postorder_r ( struct jsw_node *root )
{
  if ( root != NULL ) {
    jsw_postorder_r ( root->link[0] );
    jsw_postorder_r ( root->link[1] );
    printf ( "%d\n", root->data );
  }
}

void jsw_postorder ( struct jsw_tree *tree )
{
  jsw_postorder_r ( tree->root );
}

Let's look at an example. Given the following tree, we'll look at the results of each traversal. The preorder traversal visits a node and then moves, so the nodes would be visited in the order 5, 3, 2, 7, 6, 8. An inorder traversal moves as far left as possible before visiting the node, so the order would be 2, 3, 5, 6, 7, 8. Postorder traversal would result in 2, 3, 6, 8, 7, 5.

    5
   / \
  3   7
 /   / \
2   6   8

Okay, so we can print out all of the nodes in a tree, but let's do something fun just to take away the monotony. Instead of just getting all of the values, let's actually look at the structure of the tree. You can do this easily with an inorder traversal, and it prints out the tree rotated 90 degrees counter-clockwise:

void jsw_structure_r ( struct jsw_node *root, int level )
{
  int i;

  if ( root == NULL ) {
    for ( i = 0; i < level; i++ )
      putchar ( '\t' );
    puts ( "~" );
  }
  else {
    jsw_structure_r ( root->link[1], level + 1 );

    for ( i = 0; i < level; i++ )
      putchar ( '\t' );
    printf ( "%d\n", root->data );

    jsw_structure_r ( root->link[0], level + 1 );
  }
}

void jsw_structure ( struct jsw_tree *tree )
{
  jsw_structure_r ( tree->root, 0 );
}

Notice how this is basically the same thing as jsw_inorder, except instead of just printing the value of the node, we also print a number of tabs that corresponds to the level of the node. We also print leaves, just so that you can tell the tree ended properly. So while the recursive traversals may seem like kiddie toys, they can be very powerful when used creatively. See what you can come up with. :-)

Despite these wonderfully concise solutions, it can be prudent to take the non-recursive approach to traversal. These traversals are more difficult because of the dual recursive calls. If you're the kind of person who reads ahead to try and trick the teacher (shame on you!), then you may have noticed similarities between the preorder traversal and the levelorder traversal if the queue were replaced with a stack. If not, don't worry about it. Believe it or not, that's precisely what needs to be done to write an iterative preorder traversal. Wow:

void jsw_preorder ( struct jsw_tree *tree )
{
  struct jsw_node *it = tree->root;
  struct jsw_node *up[50];
  int top = 0;

  if ( it == NULL )
    return;

  up[top++] = it;

  while ( top != 0 ) {
    it = up[--top];

    printf ( "%d\n", it->data );

    if ( it->link[1] != NULL )
      up[top++] = it->link[1];
    if ( it->link[0] != NULL )
      up[top++] = it->link[0];
  }
}

Inorder traversal is harder. We need to walk to the left without losing any of the right links or any of the parents. This implies at least several loops: one to save the links for backtracking, one to visit the saved links, and another to manage successive branches. Fortunately, while the logic is rather complex, the code is surprisingly simple:

void jsw_inorder ( struct jsw_tree *tree )
{
  struct jsw_node *it = tree->root;
  struct jsw_node *up[50];
  int top = 0;

  while ( it != NULL ) {
    while ( it != NULL ) {
      if ( it->link[1] != NULL )
        up[top++] = it->link[1];

      up[top++] = it;
      it = it->link[0];
    }

    it = up[--top];

    while ( top != 0 && it->link[1] == NULL ) {
      printf ( "%d\n", it->data );
      it = up[--top];
    }

    printf ( "%d\n", it->data );

    if ( top == 0 )
      break;

    it = up[--top];
  }
}

The outer loop continues until it is a null pointer. This could be due to an empty tree at the very beginning, or because there are no more nodes left on the stack. You'll notice that the end of the outer loop is careful to break out when the stack is empty, so that the algorithm actually stops. The first inner loop handles the saving of right links and parents while moving down the left links. The second inner loop handles the visiting of parents. Lastly, the final call to printf takes care of lingering right links. A diagram of the execution, using the same tree as the recursive traversals, is in order.

    save 7, stack = { 7 }

    save 5, stack = { 5, 7 }

    save 3, stack = { 3, 5, 7 }

    save 2, stack = { 2, 3, 5, 7 }

    visit 2, stack = { 3, 5, 7 }

    visit 3, stack = { 5, 7 }

    visit 5, stack = { 7 }

    pop 7, stack = {}

save 8, stack = { 8 }

save 7, stack = { 7, 8 }

    save 6, stack = { 6, 7, 8 }

    visit 6, stack = { 7, 8 }

    visit 7, stack = { 8 }

    pop 8, stack = {}

    save 8, stack = { 8 }

    visit 8, stack = {}

The most difficult of the non-recursive depth-first traversals is postorder. The difficulty comes from trying to work out a way to visit items on a lower level while still saving the parents and visiting them at the right time. I invariably find myself using a stack and helper counts, where 0 means save the left link, 1 means save the right link, and 2 means visit the top of the stack. This solution is convenient because it fits comfortably into my scheme for using boolean values to determine whether to go left or right:

void jsw_postorder ( struct jsw_tree *tree )
{
  struct {
    struct jsw_node *p;
    int n;
  } up[50], it;
  int top = 0, dir;

  /* An empty tree has nothing to visit */
  if ( tree->root == NULL )
    return;

  up[top].p = tree->root;
  up[top++].n = 0;

  while ( top != 0 ) {
    it = up[--top];

    if ( it.n != 2 ) {
      dir = it.n++;
      up[top++] = it;

      if ( it.p->link[dir] != NULL ) {
        up[top].p = it.p->link[dir];
        up[top++].n = 0;
      }
    }
    else
      printf ( "%d\n", it.p->data );
  }
}

The code is short, but incredibly opaque. A diagram of the execution helps a great deal in figuring out what the algorithm is really doing. It's not as hard as it looks, I swear!

    push 5:0, stack = { 5:0 }

    increment, stack = { 5:1 }

    push 3:0, stack = { 3:0, 5:1 }

    increment, stack = { 3:1, 5:1 }

    push 2:0, stack = { 2:0, 3:1, 5:1 }

    increment, stack = { 2:1, 3:1, 5:1 }

    increment, stack = { 2:2, 3:1, 5:1 }

    visit 2:2, stack = { 3:1, 5:1 }

    increment, stack = { 3:2, 5:1 }

    visit 3:2, stack = { 5:1 }

    increment, stack = { 5:2 }

    push 7:0, stack = { 7:0, 5:2 }

    increment, stack = { 7:1, 5:2 }

    push 6:0, stack = { 6:0, 7:1, 5:2 }

    increment, stack = { 6:1, 7:1, 5:2 }

    increment, stack = { 6:2, 7:1, 5:2 }

    visit 6:2, stack = { 7:1, 5:2 }

    increment, stack = { 7:2, 5:2 }

    push 8:0, stack = { 8:0, 7:2, 5:2 }

    increment, stack = { 8:1, 7:2, 5:2 }

    increment, stack = { 8:2, 7:2, 5:2 }

    visit 8:2, stack = { 7:2, 5:2 }

    visit 7:2, stack = { 5:2 }

    visit 5:2, stack = {}

Levelorder traversal looks at the tree as a stack of levels, where each level consists of all nodes with the same depth, and moves along each item in one level before moving on to the next level. The most common implementation starts at the root and traverses each level from left to right. For example, in the following tree, a levelorder traversal would visit the items in the order 5, 3, 7, 2, 6, 8. Note that this isn't the only way to do a levelorder traversal, but because it's the most common, we'll use it as our example for discussion. So there.

    5
   / \
  3   7
 /   / \
2   6   8

The levelorder traversal is one of the few algorithms on a binary search tree that can't be written recursively without bending in all kinds of awkward, and probably uncomfortable, directions. For those that are familiar with recursion, you know that a recursive solution can be simulated through the use of a stack. Levelorder traversal requires the effects of a queue, however, so recursion isn't very practical.

The algorithm itself is very simple in theory, and it follows the same logic as preorder traversal, except with a queue. For each node, the left and right links are pushed onto the queue. The item is then visited (visiting an item is simply performing an operation on that item, such as printing it to standard output). The next item to visit can be found by popping the first item off of the queue. The following is an example using the tree above:

    save 5, queue = { 5 }

    visit 5, queue = {}

    save 3, queue = { 3 }

    save 7, queue = { 7, 3 }

    visit 3, queue = { 7 }

    save 2, queue = { 2, 7 }

    visit 7, queue = { 2 }


    save 6, queue = { 6, 2 }

    save 8, queue = { 8, 6, 2 }

    visit 2, queue = { 8, 6 }

    visit 6, queue = { 8 }

    visit 8, queue = {}

Once you understand what's going on, the algorithm is short and sweet. It's basically the preorder traversal with a queue instead of a stack. I used a simple array-based rotating queue with front and back indices, but beyond that it's a really simple function:

void jsw_levelorder ( struct jsw_tree *tree )
{
  struct jsw_node *it = tree->root;
  struct jsw_node *q[50];
  int front = 0, back = 0;

  if ( it == NULL )
    return;

  q[front++] = it;

  while ( front != back ) {
    it = q[back++];

    printf ( "%d\n", it->data );

    if ( it->link[0] != NULL )
      q[front++] = it->link[0];
    if ( it->link[1] != NULL )
      q[front++] = it->link[1];
  }
}

Since this is a tutorial about trees and not queues, I'll try to restrain myself from describing how it works, and we can simply trust that it does. Or you can test my code to make sure that I'm not just pulling your leg. And I have been known to make mistakes from time to time, so it never hurts to double check me. ;-)

All of these traversals are fine for what they do, but they're not usually fine for what we want to do. Or at least what I want to do. What I want to do is this:

int *x = first ( tree );

while ( x != NULL ) {
  printf ( "%d\n", *x );
  x = next ( tree );
}

That's hard to do with both the recursive and non-recursive traversals that we've looked at so far, because we need to save the state of the last step in the traversal. The easiest way to go about this is a separate traversal structure that holds the information that we need. Because we need to save nodes that are further up the tree, a stack is needed, just like the depth-first traversals. We also need to keep track of which node is the current node in the traversal, so we'll go with the following structure (as usual, the size of the stack is pretty arbitrary, but it shouldn't be less than the expected height of the tree):

struct jsw_trav {
  struct jsw_node *up[50]; /* Stack */
  struct jsw_node *it;     /* Current node */
  int top;                 /* Top of stack */
};

Now for the fun part. Since the most common traversal by far is an inorder traversal, we'll do that one. It's by no means the only traversal we can do in a stepwise manner, but accessing the items in sorted order is probably the most useful traversal, especially since it's pretty easy to add both forward and backward movement. To start off an inorder traversal, we need to find the smallest node, or the node that's farthest to the left. So we write a simple function called jsw_first, which initializes a jsw_trav instance, moves the current node to the smallest item, and saves the path:

int *jsw_first ( struct jsw_trav *trav, struct jsw_tree *tree )
{
  trav->it = tree->root;
  trav->top = 0;

  if ( trav->it != NULL ) {
    while ( trav->it->link[0] != NULL ) {
      trav->up[trav->top++] = trav->it;
      trav->it = trav->it->link[0];
    }
  }

  if ( trav->it != NULL )
    return &trav->it->data;
  else
    return NULL;
}

Notice how a pointer to the item is returned instead of the item itself. This makes it easier to test for boundaries, because when the traversal is finished, we can simply return a null pointer. That fits much more comfortably into the desirable traversal loop that I gave above. jsw_first is a simple function, and we could modify it to be jsw_last by simply changing the 0's to 1's, to go right instead of left.

Now for the hard part. We need to perform a single step in the inorder traversal, starting from the smallest node. The code to do this is relatively simple. If the current node has a right link, we find the inorder successor down the tree and update the stack accordingly. If the current node doesn't have a right link, we need to find the inorder successor up the tree, which involves popping the stack as long as we've already visited the right link. If the stack is empty, we terminate the traversal by setting the current node to NULL.

int *jsw_next ( struct jsw_trav *trav )
{
  if ( trav->it->link[1] != NULL ) {
    trav->up[trav->top++] = trav->it;
    trav->it = trav->it->link[1];

    while ( trav->it->link[0] != NULL ) {
      trav->up[trav->top++] = trav->it;
      trav->it = trav->it->link[0];
    }
  }
  else {
    struct jsw_node *last;

    do {
      if ( trav->top == 0 ) {
        trav->it = NULL;
        break;
      }

      last = trav->it;
      trav->it = trav->up[--trav->top];
    } while ( last == trav->it->link[1] );
  }

  if ( trav->it != NULL )
    return &trav->it->data;
  else
    return NULL;
}

Now I can do what I want to do, with only minor changes, and it only took boat loads of extra thought to make a stepwise traversal from a non-recursive traversal. Aren't you glad that I did it for you? :-)

struct jsw_trav it;
int *x = first ( &it, tree );

while ( x != NULL ) {
  printf ( "%d\n", *x );
  x = next ( &it );
}

All in all, a stepwise traversal isn't much more difficult than a non-recursive traversal, and it's so much more flexible it's not even funny. You can do any number of traversals stepwise, but like I said before, inorder is by far the most common. Play around with the non-recursive traversals and see if you can make them all stepwise. If you can, I can safely say that you understand the concept. :-)

    Parent Pointers

The biggest problem with binary search trees is when you need to do something that involves walking back up the tree, such as a traversal (or later, balancing). Unless we get clever, we need to use either an explicit stack to save the path down, or an implicit stack through recursion. However, sometimes it makes sense to get clever. For example, with recursion we might hit an arbitrary limit, and there's no way to determine what that limit is in a portable way. With the explicit stack, we've been setting the arbitrary limit because we were using a stack that doesn't grow on its own.

Now, you could use a nice stack library that grows as needed, or you could build the ability to move back up the tree into the tree itself. As it turns out, the latter is more common than the former, and by far the most common solution is an extra link for every node that points up to the parent. These are called, cleverly, parent pointers:


struct jsw_node {
  int data;
  struct jsw_node *up;
  struct jsw_node *link[2];
};

struct jsw_tree {
  struct jsw_node *root;
};

Insertion into a tree with parent pointers is pretty simple. Only a small bit of code needs to be added to the function to set the parent pointer correctly. However, care needs to be taken to give the root's parent pointer a leaf value so we can test for it. All of this is done in make_node, which allocates memory for a new node, assigns the data, and sets all links to NULL:

int jsw_insert ( struct jsw_tree *tree, int data )
{
  if ( tree->root == NULL )
    tree->root = make_node ( data );
  else {
    struct jsw_node *it = tree->root;
    int dir;

    for ( ; ; ) {
      dir = it->data < data;

      if ( it->data == data )
        return 0;
      else if ( it->link[dir] == NULL )
        break;

      it = it->link[dir];
    }

    it->link[dir] = make_node ( data );
    it->link[dir]->up = it;
  }

  return 1;
}

Since the parent of the new node is it, we have no trouble setting the link. As long as you have access to the parent, setting a parent pointer is triviality itself. Of course, that makes a recursive solution slightly more complicated than it would be without parent pointers.

Deletion from a tree with parent pointers is also slightly more complicated, but we're also spared the need for an extra variable to save the parent. In this case, special care needs to be taken in updating the parent pointer, because it's possible that the replacement node is a leaf! This can happen with both the tree's root and further down the tree, so these are two special cases:

int jsw_remove ( struct jsw_tree *tree, int data )
{
  if ( tree->root != NULL ) {
    struct jsw_node head = {0};
    struct jsw_node *it = &head;
    struct jsw_node *f = NULL;
    int dir = 1;

    it->link[1] = tree->root;
    tree->root->up = &head;

    while ( it->link[dir] != NULL ) {
      it = it->link[dir];
      dir = it->data <= data;

      if ( it->data == data )
        f = it;
    }

    if ( f != NULL ) {
      int dir = it->link[0] == NULL;

      f->data = it->data;
      it->up->link[it->up->link[1] == it] = it->link[dir];

      if ( it->link[dir] != NULL )
        it->link[dir]->up = it->up;

      free ( it );
    }

    tree->root = head.link[1];
    if ( tree->root != NULL )
      tree->root->up = NULL;
  }

  return 1;
}

Finally we come to traversal. This is the whole reason for adding parent pointers. Now we don't need that kludgy stack to move toward a successor up the tree, we can just follow the parent pointers. jsw_trav and jsw_first are almost identical, differing only in that they don't use a stack:

struct jsw_trav {
  struct jsw_node *it;
};

int *jsw_first ( struct jsw_trav *trav, struct jsw_tree *tree )
{
  trav->it = tree->root;

  if ( trav->it != NULL ) {
    while ( trav->it->link[0] != NULL )
      trav->it = trav->it->link[0];
  }

  if ( trav->it != NULL )
    return &trav->it->data;
  else
    return NULL;
}

The big differences are in jsw_next, where we now remove the use of the stack and simply follow a parent pointer when upward movement is needed. The logic is basically the same, and the changes are minimal. Not a bad trade for removing that stack, is it?

int *jsw_next ( struct jsw_trav *trav )
{
  if ( trav->it->link[1] != NULL ) {
    trav->it = trav->it->link[1];

    while ( trav->it->link[0] != NULL )
      trav->it = trav->it->link[0];
  }
  else {
    for ( ; ; ) {
      if ( trav->it->up == NULL || trav->it == trav->it->up->link[0] ) {
        trav->it = trav->it->up;
        break;
      }

      trav->it = trav->it->up;
    }
  }

  if ( trav->it != NULL )
    return &trav->it->data;
  else
    return NULL;
}

    Right Threading

Parent pointers are useful in many ways, but the extra overhead per node can be prohibitive. A very clever solution was devised where the leaves of a tree can be reused, so that instead of pointing to null, they point to the inorder successor or predecessor of the external node. This is called a threaded tree. Now the overhead of an extra pointer goes away and becomes the overhead of a little flag that tells us whether a link is a thread or a real link. Why the flag? Because we'll get stuck in an infinite loop if we can't differentiate between a legitimate link and a thread. One flag is needed per link in the tree, so a fully threaded binary search tree would need two flags. A more common solution only uses one thread for the right link and simply leaves the left links as they are in a normal tree:

struct jsw_node {
  int data;
  int thread;
  struct jsw_node *link[2];
};

struct jsw_tree {
  struct jsw_node *root;
};


In the following tree built using the above structures, every right link that would normally be a leaf is now a link to the inorder successor, and every left link that would normally be a leaf is still a leaf. Notice how we can now get directly from 4 to 6 instead of working our way up past 3 through a parent pointer or a stacked path. That's the primary benefit of threaded trees, and as you'll see shortly, it simplifies the stepwise traversal drastically. Also notice how a search for 5 would just go around in a circle from 6 to 3 to 4 and back to 6 again. That's generally considered undesirable, hence the flag.

          6
        /   \
      3       7
     / \     / \
    2   4   ~   8

  (~ is a leaf; left links not drawn, such as those of 2, 4, and 8, are also leaves)
  threads (right links): 2 -> 3, 4 -> 6, 8 -> ~

Threaded trees are not very common, and the most common of them only allows threads for the right links, as shown above. That permits an inorder traversal in ascending sorted order, but to traverse in descending sorted order, we still need one of the other solutions (parent pointers, recursion, or a stack). These are called right threaded trees, because only the right side is threaded, and the code is more complicated than with parent pointers. The search algorithm must be changed to handle a thread: reaching a thread counts the same as reaching a leaf, so as to avoid that nasty little endless loop problem:

int jsw_find ( struct jsw_tree *tree, int data )
{
  struct jsw_node *it = tree->root;
  int dir;

  if ( it == NULL ) /* an empty tree has nothing to find */
    return 0;

  for ( ; ; ) {
    dir = it->data < data;

    if ( it->data == data )
      return 1;
    else if ( dir == 1 && it->thread == 1 )
      break;
    else if ( it->link[dir] == NULL )
      break;

    it = it->link[dir];
  }

  return 0;
}

Because the symmetry of the tree has been broken, we now need to consider the differences between moving to the right (where we might see a thread but never a leaf) and moving to the left (where we might see a leaf but never a thread). Insertion follows the same search pattern, but the actual insertion needs to make the same distinction, because if the new node is inserted as a right link, it needs to take over the thread of its parent before being linked into the tree. If the new node is inserted as a left link, its thread needs to be set to the parent. Consider inserting 5 into the threaded tree shown above. 5 goes to the right link of 4, but 4's right link is a thread. To maintain a proper threaded tree, we need to shift that thread down to 5 so that 5's right link points to 6. On the other hand, if we were to insert 0, it would be placed on the left link of 2. But we still need to set 0's right thread to point to the inorder successor, which would be 2:

Insert 5, save thread to 6

          6
        /   \
      3       7
     / \     / \
    2   4   ~   8
         \
          5

  threads: 2 -> 3, 5 -> 6, 8 -> ~


Insert 0, new thread to 2

          6
        /   \
      3       7
     / \     / \
    2   4   ~   8
   /     \
  0       5

  threads: 0 -> 2, 2 -> 3, 5 -> 6, 8 -> ~

As long as you follow these two rules for insertion, building a threaded tree is easy because you only have to make very localized changes for it to work. The code to do this isn't terribly complicated, but it can be confusing if you're not familiar with threading because it does something different for the right and left links. make_node still allocates memory, assigns the data, and sets the links to NULL, but for a threaded tree it also sets the flag to 1 because a new node is always an external node with no children. Notice also that the rightmost node in the tree will have a thread to a leaf, which might affect other operations on the tree. Here's the code:

int jsw_insert ( struct jsw_tree *tree, int data )
{
  if ( tree->root == NULL )
    tree->root = make_node ( data );
  else {
    struct jsw_node *it = tree->root, *q;
    int dir;

    for ( ; ; ) {
      dir = it->data < data;

      if ( it->data == data )
        return 0;
      else if ( dir == 1 && it->thread == 1 )
        break;
      else if ( it->link[dir] == NULL )
        break;

      it = it->link[dir];
    }

    q = make_node ( data );

    if ( dir == 1 ) {
      q->link[1] = it->link[1];
      it->thread = 0;
    }
    else
      q->link[1] = it;

    it->link[dir] = q;
  }

  return 1;
}

Instead of setting the value of the new node directly to the parent, we save it first in a temporary variable. That way the current thread of the parent isn't lost if the new node is going to be the parent's right link. All in all, the process of inserting into a threaded tree isn't that tough, as long as you understand the concept. :-)
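The article never shows the threaded version of make_node, so here's a sketch of what the description above implies; the struct layout is the one from earlier, and the details beyond "links NULL, flag set to 1" are my assumption rather than the author's exact code:

```c
#include <stdlib.h>

struct jsw_node {
  int data;
  int thread;
  struct jsw_node *link[2];
};

/* Sketch: a new node is always external, so its thread flag starts
   at 1. The insertion code then points link[1] at the inorder
   successor (or at the parent's old thread target). */
struct jsw_node *make_node ( int data )
{
  struct jsw_node *node = malloc ( sizeof *node );

  if ( node != NULL ) {
    node->data = data;
    node->thread = 1;
    node->link[0] = node->link[1] = NULL;
  }

  return node;
}
```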

Deletion from a threaded tree follows the same pattern, but there are four cases for actually unlinking the node. If the node we want to remove has no children and is the left link of its parent, we can just set the left link of the parent to be a leaf. If it's the right link of its parent and it has no children, we need to set the right link of the parent to the right link of the node so as to save the thread.

Remove 0, replace with leaf

  Before:

          6
        /   \
      3       7
     / \     / \
    2   4   ~   8
   /     \
  0       5

  threads: 0 -> 2, 2 -> 3, 5 -> 6, 8 -> ~

  After:

          6
        /   \
      3       7
     / \     / \
    2   4   ~   8
         \
          5

  threads: 2 -> 3, 5 -> 6, 8 -> ~

Remove 5, save thread

  Before:

          6
        /   \
      3       7
     / \     / \
    2   4   ~   8
         \
          5

  threads: 2 -> 3, 5 -> 6, 8 -> ~

  After:

          6
        /   \
      3       7
     / \     / \
    2   4   ~   8

  threads: 2 -> 3, 4 -> 6, 8 -> ~

If the node we want to remove has a child (remember that it's an external node, so it'll only have one child, if any), and that child is to the right, we simply replace it with the child since no thread changes would be needed. On the other hand, if the child is to the left, we need to give the node's thread to its child before replacing it with the child. It's kind of tricky, but you'll see that it's worth it when we do a stepwise traversal:

Remove 4, replace with 5

  Before:

          6
        /   \
      3       7
     / \     / \
    2   4   ~   8
   /     \
  0       5

  threads: 0 -> 2, 2 -> 3, 5 -> 6, 8 -> ~

  After:

          6
        /   \
      3       7
     / \     / \
    2   5   ~   8
   /
  0

  threads: 0 -> 2, 2 -> 3, 5 -> 6, 8 -> ~

Remove 2, save thread

  Before:

          6
        /   \
      3       7
     / \     / \
    2   5   ~   8
   /
  0

  threads: 0 -> 2, 2 -> 3, 5 -> 6, 8 -> ~

  After:

          6
        /   \
      3       7
     / \     / \
    0   5   ~   8

  threads: 0 -> 3, 5 -> 6, 8 -> ~

The code to implement these cases isn't terribly complicated, but it can be confusing at first. As usual, I'll use the elegant deletion because it's shorter and prettier. Pretty is good. I'm well known for taking the aesthetic appearance of my code very, very seriously. Of course, that doesn't mean I don't also take correctness very, very, very seriously. ;-)

int jsw_remove ( struct jsw_tree *tree, int data )
{
  if ( tree->root != NULL ) {
    struct jsw_node head = {0};
    struct jsw_node *it = &head;
    struct jsw_node *q, *p, *f = NULL;
    int dir = 1;

    it->link[1] = tree->root;

    while ( it->link[dir] != NULL ) {
      if ( dir == 1 && it->thread == 1 )
        break;

      p = it;
      it = it->link[dir];
      dir = it->data < data;

      if ( it->data == data )
        f = it;
    }

    if ( f != NULL ) {
      q = it->link[it->link[0] == NULL];
      dir = p->link[1] == it;
      f->data = it->data;

      if ( p == q )
        p->link[0] = NULL;
      else if ( it->link[0] == NULL && it->thread ) {
        p->thread = 1;
        p->link[1] = it->link[1];
      }
      else if ( it->link[0] == NULL )
        p->link[dir] = q;
      else {
        q->thread = it->thread;
        q->link[1] = it->link[1];
        p->link[dir] = q;
      }

      free ( it );
    }

    tree->root = head.link[1];
  }

  return 1;
}

Okay, so how do these statements match the cases described above? Well, since p is the parent and q is the child we want to replace with, if p and q are the same node, then q is it's right link, and that link is a thread pointing back up to p. That amounts to the case where we remove 0 in the above diagrams. If the left link of the node we want to delete is a leaf and the right link is a thread, it matches the case where we remove 5. If the left link of the node we want to delete is null and the right link is not a thread, we've hit the same case as we did when removing 4. Finally, the last case covers removing 2 in the diagrams. With a good understanding of what happens at each case, the code is pretty simple.

The stepwise traversal is simplified greatly, because to find the inorder successor up the tree, we only have to follow a link. All of the tricky gobbledygook for "moving up, but only as long as we haven't gone to the right before" is gone. The following is pretty much the holy grail of inorder traversals (with the exception that being able to go both ways using a fully threaded tree would be cooler). It's a shame we had to go to such lengths to build a tree that allows it:

struct jsw_trav {
  struct jsw_node *it;
};

int *jsw_first ( struct jsw_trav *trav, struct jsw_tree *tree )
{
  trav->it = tree->root;

  if ( trav->it != NULL ) {
    while ( trav->it->link[0] != NULL )
      trav->it = trav->it->link[0];
  }

  if ( trav->it != NULL )
    return &trav->it->data;
  else
    return NULL;
}

int *jsw_next ( struct jsw_trav *trav )
{
  if ( trav->it->thread == 0 ) {
    trav->it = trav->it->link[1];

    if ( trav->it != NULL ) {
      while ( trav->it->link[0] != NULL )
        trav->it = trav->it->link[0];
    }
  }
  else
    trav->it = trav->it->link[1];

  if ( trav->it != NULL )
    return &trav->it->data;
  else
    return NULL;
}

Parent pointers are more common than threaded trees simply because it's easier to visualize what's happening. Right threaded trees are the most common threaded tree, but you can also have a symmetrically threaded tree that's similar, but allows threads in both directions. The logic is similar enough that we only covered the more common variant, so as to minimize my medical bills when I get carpal tunnel from writing these tutorials. ;-)

    Performance


Some theory is useful for trees. You have to know how well they'll work in practice, and the analyses done by people smarter than I am are beneficial in that respect. When insertions and deletions are made at random, the average case performance of a binary search tree is O(log N): a base 2 logarithm, the number of times you can halve N before reaching 1. That's really good, by the way. If a search is O(log N), then you shrink the number of potential matches by about half with each step. So a search in a binary search tree is fast when insertions and deletions are random.

Note that I keep saying when insertions and deletions are random. That's me covering my butt from nitpickers who would love to burn me for making an absolute statement, when I have so much fun doing the same to them. In reality, binary search trees have a really nasty worst case if every node in the tree is external. The easiest way to obtain and visualize this worst case is to insert numbers in ascending sorted order:

    0

    \

    1

    \

    2

    \

    3

    \

    4

    \

    5

Instead of that wonderful O(log N) performance, we basically have a glorified linked list, and the performance degrades to O(N). This is called a degenerate tree, and it can occur with a sequence of exceptionally unlucky deletions and insertions that are impossible to predict. That's not exactly conducive to confidence in your trees. A tree that thinks about this and tries to correct it is called a balanced tree. None of the trees discussed in this tutorial are balanced trees, so the only way to stack the deck in your favor is to make sure that insertions and deletions are random. That minimizes the chances of hitting a degenerate case.

There are ways to guarantee near optimal performance, either with every operation or amortized over a bunch of operations, but those ways can be very complicated. I've written tutorials on a few of the more common and simpler ones, so I won't repeat myself here. It's also possible to globally rebalance a tree. One way is to copy the tree into an array of nodes, then do something like a binary search on the array, recursively choosing the middle node in the array and inserting it into a new tree. That results in a well balanced tree. The above degenerate tree would look like this after such a rebalancing effort:

    3

    / \

    1 5

    / \ /

    0 2 4
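The copy-and-rebuild idea is simple enough to sketch. The helper names below (jsw_flatten, jsw_rebuild) are my own invention, not part of the article's code, and the sketch assumes a plain, unthreaded node as in the earlier sections:

```c
#include <stddef.h>

struct jsw_node {
  int data;
  struct jsw_node *link[2];
};

/* Save every node into an array in sorted order with an inorder walk */
void jsw_flatten ( struct jsw_node *it, struct jsw_node **save, int *n )
{
  if ( it != NULL ) {
    jsw_flatten ( it->link[0], save, n );
    save[(*n)++] = it;
    jsw_flatten ( it->link[1], save, n );
  }
}

/* Recursively pick the middle of each range as the subtree root, like
   a binary search in reverse. Rounding the middle up turns the
   degenerate 0..5 chain into the tree shown above, with 3 as the root. */
struct jsw_node *jsw_rebuild ( struct jsw_node **save, int lo, int hi )
{
  int mid;

  if ( lo > hi )
    return NULL;

  mid = lo + ( hi - lo + 1 ) / 2;
  save[mid]->link[0] = jsw_rebuild ( save, lo, mid - 1 );
  save[mid]->link[1] = jsw_rebuild ( save, mid + 1, hi );

  return save[mid];
}
```

A full rebalance is then just flatten into a scratch array, rebuild, and reassign the root. The catch is the O(N) scratch space, which is one reason in-place schemes like DSW are attractive.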

This rebalancing scheme is pretty easy to imagine and implement, but it's not very efficient. Another good way to globally rebalance a tree is to change the structure into the worst possible case as shown above (only temporarily though!), then repeatedly perform what's called a rotation at every other node down the tree. I'll leave an implementation up to you (it's called DSW, if you want to google it), but the process would look something like this:

    0


    3

    \

    4