CPSC 211 Data Structures & Implementations (c) Texas A&M University [ 0 ]
About These Slides
These slides were developed by
Prof. Jennifer Welch
Department of Computer Science
Texas A&M University
College Station, TX
[email protected]
during Spring 1999. Comments and suggestions for improvements are welcome.
What are Data Structures?
Data structures are ways to organize data (information). Examples:
• simple variables:
• objects:
• arrays:
• linked lists:
Typically, algorithms go with the data structures to manipulate the data (e.g., the methods of a class).
This course will cover some more complicated data structures:
• how
• what
Abstract Data Types
An abstract data type (ADT) defines
•
•
Similar to a
This course will cover
• specifications of
• pros and cons of
• how the
Specific ADTs
The ADTs to be studied (and some sample applications) are:
•
•
•
•
•
How Does C Fit In?
Although data structures are universal (can be implemented in any programming language), this course will use Java and C:
•
•
We will learn how to gain the advantages of
Reasons to learn C:
• learn
• useful
• ubiquitous and
• Unix
• C code can be very
• very efficient
Other Topics
Course will emphasize good software development practice:
•
•
•
•
Course will touch on several more advanced computer science topics that appear later in the curriculum, and fit in with our topics this semester:
•
•
•
Principles of Computer Science
Computer Science is like:
• engineering:
• science:
• math:
However, CS studies
Recurring concepts in computer science are:
• layers, hierarchies, information-hiding, abstraction, interfaces
• efficiency, tradeoffs, resource usage
• reliability, affordability, correctness
Introduction to Data Structures
Data structures are one of the enduring principles in computer science. Why?
1. Data structures are based on the notion of information hiding:
2. A number of data structures are useful in a wide range of applications.
Efficiency Considerations
Since these data structures are so widespread, it’s important to implement them efficiently. Measures of efficiency:
•
•
in
•
•
We will study tradeoffs, such as
•
•
Efficiency will be measured using
•
•
Asymptotic Analysis
Actual (wall-clock) time of a program is affected by:
•
•
•
•
•
•
Instead of wall-clock time, look at the pattern of the program’s behavior as the problem size increases. This is called asymptotic analysis.
Big-Oh Notation
Big-oh notation is used to capture the generic
From a practical point of view, you can get the big-oh notation for a function by
1.
2.
Which terms are lower order than others? In increasing order:
Examples:
• $4302 =$
• $n^3 + n \log n + n^5 + n =$
• $34n^3 - 2n \log n + 0.0004n^5 + 5.2n =$
See Appendix B, Section 4 of Standish, or CPSC 311, for mathematical definitions and justifications.
Why Multiplicative Constants are Unimportant
An example showing how multiplicative constants become unimportant as n gets very large:
n          $1000 \log n$          $0.0001 \cdot n^2$
2
256
4096
8192
16,384
32,768
1,048,576
Big-oh notation is not always appropriate! If your program is working on small input sizes,
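The empty columns of the table above can be filled in mechanically. The following is a minimal sketch, not part of the original slides; the class and method names are hypothetical, and it assumes the logarithm is base 2, which the slide does not state.

```java
class GrowthTable {
    // Assumption: log means log base 2 (the slide does not give the base).
    static double log2(double n) {
        return Math.log(n) / Math.log(2.0);
    }

    // The two functions from the table header.
    static double logCost(double n)  { return 1000.0 * log2(n); }
    static double quadCost(double n) { return 0.0001 * n * n; }

    public static void main(String[] args) {
        long[] sizes = {2, 256, 4096, 8192, 16384, 32768, 1048576};
        System.out.printf("%12s %16s %16s%n", "n", "1000 log n", ".0001 * n^2");
        for (long n : sizes) {
            System.out.printf("%12d %16.1f %16.1f%n", n, logCost(n), quadCost(n));
        }
    }
}
```

Under the base-2 assumption, 1000 log n is still the larger value at n = 8192 (13000 versus about 6711), but by n = 16,384 the quadratic term has overtaken it (14000 versus about 26844), despite its much smaller constant.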
Generic Steps
How can you figure out the running time of an algorithm without implementing it, running it on various inputs, plotting the results, and fitting a curve to the data? And even if you did that, how would you know you fit the right curve?
We count generic steps of the algorithm. Each generic step that we count should be
Classifying an assignment statement as a generic step is
Classifying a statement “sort the entire array” as a generic step is
Stack vs. Heap
Memory used by an executing program is partitioned:
• the stack:
– When a method begins executing, a piece of the stack (stack frame) is devoted to it.
– There is an entry in the stack frame for
  •
  •
  •
– For variables of primitive type, the data itself is stored
  For variables of object type,
– When the method finishes, the method’s stack frame is
• the heap: Dynamically allocated memory goes here, including the actual data for objects. Lifetime is
Stack Frames Example
[Figure: stack contents, bottom to top, after each event]
(start)         main
main calls p    main, p
p calls q       main, p, q
q returns       main, p
p calls r       main, p, r
r calls s       main, p, r, s
s returns       main, p, r
r returns       main, p
p returns       main
Objects
An object is an entity (e.g., a ball) that has
• state:
• behavior:
A class is the
Analogy: a class is like an
an object is like an
• class defines important
• construction is required to
• many objects/houses can be created
Data Abstraction
The class concept supports
Similar principles apply as for procedural abstraction:
• group
• group
• separate the issue of
• separate the issue of
References
The class of an object is its
Objects are declared differently than are variables of primitive types.
Suppose there is a class called Person.
int total;
Person neighbor;
• Declaration of total allocates storage on the
• Declaration of neighbor allocates storage on the
Creating Objects
A constructor is a special method of the class that
When a constructor is called,
• storage space is allocated
• each object gets
• the object’s state is
The name of the constructor for class X is X(). Ex:
neighbor = new Person();
The operator new must be put in front of the call to the constructor.
Summary: Declaring a variable of an object type produces
Creating Objects (cont’d)
You can combine the declaration and initialization:
Person neighbor = new Person();
just as you can for primitive types:
int total = 25;
Object Assignment & Aliases
The meaning of assignment is different for objects than it is for primitive types.
int num1 = 5;
int num2 = 12;
num2 = num1;
At the end, num2 holds 5.
Person neighbor = new Person(); // creates object 1
Person friend = new Person(); // creates object 2
friend = neighbor;
At the end, friend and neighbor both refer to object 1 (they are aliases of each other) and nothing refers to object 2 (it is inaccessible).
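One way to observe the difference is to change something after the assignment and see who notices. This is a small sketch only: the name field, its accessors, and the AliasDemo class are hypothetical additions for illustration and are not part of the slides' Person class.

```java
class Person {
    private String name;   // hypothetical field, for illustration only
    Person(String name) { this.name = name; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

class AliasDemo {
    // Returns the name seen through neighbor after renaming through friend.
    static String renameThroughAlias() {
        Person neighbor = new Person("Ann");   // creates object 1
        Person friend = new Person("Bob");     // creates object 2
        friend = neighbor;                     // both now refer to object 1
        friend.setName("Carol");               // changes object 1
        return neighbor.getName();             // the change is visible via neighbor
    }

    // Primitive assignment copies the value, so later changes do not propagate.
    static int copyPrimitive() {
        int num1 = 5;
        int num2 = 12;
        num2 = num1;   // num2 now holds its own copy of 5
        num1 = 99;     // does not affect num2
        return num2;
    }
}
```

Here renameThroughAlias() returns "Carol" because there is only one object behind both names, while copyPrimitive() returns 5.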
Data Abstraction Revisited
As a rule of thumb, referring to instance variables outside the class is
For instance, the implementor of the Person class might decide to store the age
In this case, getAgeInYears must change:
Code that got the age using this method need not change, but code that got the age using .age directly
Moral:
Public vs. Private
You can tailor the ability to access methods and variables from outside the class, using visibility modifiers.
• public: the variable or method can
• private: the variable or method can
Visibility modifiers go at the beginning of the line that declares the variable or method. Ex:
public static void main(...
private int age;
Rules of thumb:
• make instance variables
• make instance methods that are part of the public interface of the class
• make instance methods that help with internal work of a class
Public vs. Private (cont’d)
Instance variables should be accessible only indirectly via public “get” and “set” methods. Ex:
getAgeInYears()
Group together all the private variables/methods, and all the public ones when you format your program.
Specification vs. Implementation
Users of a class should rely only on the specification of the class. They are allowed to
• declare
• create
• invoke
Implementors of a class should
• define
• hide
• protect
• feel free to
Inheritance
Inheritance lets a programmer derive a new class from an existing class. New class can
• use
• modify
• have
Thus inheritance promotes software reuse. It is a defining characteristic of
Terminology:
• Class A is derived from (or, inherits from) another class B
• A is called subclass or child class.
• B is called superclass or parent class.
Benefits of Inheritance
Inheritance is particularly useful in large software projects:
•
– saves
– provides
– supports
•
•
•
Costs of Inheritance
•
– Usually this disadvantage is outweighed by
– Once system is working,
•
•
Inheritance in Java
To declare that a class is a subclass of another class:
class <child-class> extends <parent-class> {
    ... // define the child-class
}
• child class inherits
• child class inherits
• child class does NOT inherit
• child class does NOT inherit
Inherited variables and methods can be used in the child class
Inheritance is a one-way street!
Protected Visibility
• private:
• public:
This makes it dangerous to inherit variables, since normally instance variables should not be made accessible outside the class.
The solution is
• protected:
Overriding Methods
When a child class defines a method with the same name and signature (sequence of parameters) as the parent, the child’s version overrides the parent’s version. Useful when
Polymorphism means that
These are not necessarily the same, since a variable can refer to any object whose class is a descendant of the variable’s class.
When in doubt, draw a memory diagram!
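A short sketch of overriding in action, using hypothetical classes Animal and Dog that are not from the slides. The variable is declared with the parent type but refers to a child object.

```java
class Animal {
    String speak() { return "..."; }
}

class Dog extends Animal {
    @Override
    String speak() { return "Woof"; }   // same name and signature: overrides Animal.speak
}

class OverrideDemo {
    static String demo() {
        Animal a = new Dog();   // variable's class is Animal, object's class is Dog
        return a.speak();       // Java picks the method based on the object's class
    }
}
```

Calling OverrideDemo.demo() yields "Woof": the version that runs is chosen by the class of the object, not by the declared class of the variable.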
Abstract Classes — Motivation
Consider a database for a veterinarian to keep track of medical and billing information for each patient.
• Each patient is someone’s pet (e.g., dog, bird).
• Some aspects of the vet’s business are independent of the particular species (e.g., billing, owner info).
• Some aspects depend critically on the species (e.g., the vaccination schedule, diet recommendations).
An obvious organization is to have a
Note that it does not make sense to create a Pet object:
The Pet class is used to
Rules for Abstract Classes and Methods
• Only instance methods can be declared
• Any class with an abstract method must be declared
• A class may be declared abstract
• An abstract class cannot
• A non-abstract subclass of an abstract class must
• If a subclass of an abstract class does not implement all of the abstract methods that it inherits, then
Since an abstract class cannot be instantiated, its variables and methods are not directly used. But they can be
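As an illustration of the veterinarian example, here is one possible arrangement. It is a sketch under assumptions: the owner field, the vaccinationSchedule method, and the placeholder return strings are invented for this example and do not come from the slides.

```java
abstract class Pet {
    private final String owner;               // species-independent information
    Pet(String owner) { this.owner = owner; }
    String getOwner() { return owner; }       // shared by every kind of pet

    abstract String vaccinationSchedule();    // species-dependent: no body here
}

class Dog extends Pet {
    Dog(String owner) { super(owner); }
    @Override
    String vaccinationSchedule() { return "dog schedule"; }    // placeholder text
}

class Bird extends Pet {
    Bird(String owner) { super(owner); }
    @Override
    String vaccinationSchedule() { return "bird schedule"; }   // placeholder text
}
```

Writing new Pet("Lee") would be rejected by the compiler, but a variable of type Pet can refer to a Dog or a Bird, and getOwner() is written once for both.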
Declaring an Interface
An interface is an abstract class taken to the extreme. It is like an abstract class in which
interface <interface name> {
    <constant declarations> // public final
    <abstract method declarations> // public abstract
}
An interface provides
• a collection of
• a collection of
For example:
Implementing an Interface
The syntax for “inheriting from” (called implementing) an interface I is:
class B implements I { ... }
For example:
The class Account
• can access the
• must provide an implementation of
Abstract Classes vs. Interfaces
• An abstract class can be used as a repository of
• A class can implement
• Both abstract classes and interfaces can be used to
Object-Oriented Design
The design of a software system is an iterative process.
• choose
• develop
• previous step may indicate that
• develop
• etc.
As the design matures, objects are abstracted into classes:
• group
• put
• determine
Initial design effort focuses on the overall structure of the program. The algorithms for the methods are specified using pseudocode. Actual coding begins
Deciding on Objects and Classes
Make some guesses about what the objects in the system are and try to arrange them into groups (which will be the classes). Although you should put serious thought into this, don’t try to do this perfectly on the first pass.
Rule of Thumb:
Later you may need
As you come up with the objects, some details (variables and methods) will be obvious. Document these and test them out with scenarios:
A scenario is a
Linked List
Linked lists are useful when
Linked lists are an example of
Separate blocks of storage are
Linked representations are an important alternative to
Many key abstract data types (lists, stacks, queues, sets, trees, tables) can be represented with either
Important to understand the
Pointers
Pointers in Java are called
However, you cannot
Linear Linked Lists
The list consists of a series of
Each node contains
•
•
To realize this idea in Java:
• each
• class
–
–
• another class
Linear Linked Lists (cont’d)
Here is a diagram of the heap:
Space complexity:
Linked List Example — Node Class
For a linked list of books, first define a class that represents individual list elements (nodes).
The type of the link variable is the same as the class being defined:
Linked List Example — List Class
Then define a class that represents
•
•
•
Linked List Operations
What should be the operations on a linked list?
•
  –
  –
  –
•
  –
  –
  –
•
Add some instance methods to the BookList class:
Using a Linked List
Example:
Inserting at the Front of a Linked List
Pseudocode:
1.
2.
In Java (assuming the parameter is not null):
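A minimal sketch of how insertion at the front might be written. The class name BookNode, the title field, and the method name insertFront are hypothetical; only BookList and the variable first appear in the slides.

```java
class BookNode {
    String title;     // hypothetical data field
    BookNode link;    // reference to the next node, or null at the end of the list
    BookNode(String title) { this.title = title; }
}

class BookList {
    BookNode first;   // null when the list is empty

    // Inserts newNode at the front of the list; assumes newNode is not null.
    void insertFront(BookNode newNode) {
        newNode.link = first;   // step 1: the new node points at the old first node
        first = newNode;        // step 2: the list now starts at the new node
    }
}
```

The same two assignments also handle the empty list: first is null, so the new node's link becomes null and the node becomes the only element.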
Inserting at the Front of a Linked List (cont’d)
What happens if we do step 1 and step 2 in the opposite order?
Time Complexity:
Inserting at the End of a Linked List
First, assume the list is empty (i.e., first equals null).
1.
2.
Now, assume the list is not empty (i.e., first does not equal null).
1.
2.
How do we do step 1?
Inserting at the End of a Linked List (cont’d)
Time Complexity:
Using a Last Pointer
To improve running time, keep a pointer to the last node in the list class, as well as the first node.
Time Complexity:
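A sketch of insertion at the end when a last reference is maintained. The names Node, ListWithLast, data, and insertEnd are hypothetical stand-ins and are not taken from the slides.

```java
class Node {
    int data;     // hypothetical payload
    Node link;    // next node, or null at the end of the list
    Node(int data) { this.data = data; }
}

class ListWithLast {
    Node first;   // null when the list is empty
    Node last;    // null when the list is empty

    // Inserts newNode at the end without traversing the list.
    void insertEnd(Node newNode) {
        newNode.link = null;
        if (first == null) {
            first = newNode;        // empty list: the new node is also the first node
        } else {
            last.link = newNode;    // the old last node now points at the new node
        }
        last = newNode;             // in both cases the new node becomes the last node
    }
}
```

In this sketch the number of steps does not depend on the length of the list, because no loop is needed to find the end.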
Using a Last Pointer (cont’d)
Deleting Last Node from Linked List
Suppose we want to delete the node at the end of the list and return the deleted node.
First, let’s handle the boundary conditions:
• If the list is empty,
• If the list has only one element
Deleting Last Node from Linked List (cont’d)
Suppose the list has at least two elements. First attempt:
1.
2.
3.
...
return this
Step 1 can be done as before.
What about step 2?
Deleting Last Node from Linked List (cont’d)
Time Complexity:
Would it help to keep a last pointer?
Linked Lists Pitfalls
• Check that a link is not null before following it! Example:
• Mark end of list
• Be careful with boundary cases!
• Draw memory diagrams!
• Don’t lose access to needed objects!
Linked Lists vs. Arrays
Space complexity:
Time Complexity (n data items):
Operation     | singly linked | singly linked, last ptr | doubly linked | doubly linked, last ptr | array
insert front  |
insert end    |
delete first  |
delete last   |
search        |
Linked Lists vs. Arrays (cont’d)
Suppose the items in the sequence are in sorted order. Then data items must be inserted in the correct place. But perhaps this will make searching for an item easier. Break the insertion process into two parts:
1. search
2. insert
Operation     | singly linked | singly linked, last ptr | doubly linked | doubly linked, last ptr | array
search        |
insert        |
Linked Lists vs. Arrays (cont’d)
Tradeoff:
• linked list:
– insert is
– search is
because nodes
• arrays:
– insert is
– search is
because nodes
Binary search cannot be used on
Later we will see some other data structures that try to
Other Linked Structures
We don’t have to restrict ourselves to just having one link instance variable per node. We can get arbitrarily complicated linked structures.
Some of the more common and useful ones are:
• doubly linked list:
• rings:
• trees:
• general graphs:
Recursion
Idea of recursion is closely related to the principle of
• Figure out how to
• Assume you have a
• Figure out how to
This is also an application of
Rules for recursive programs:
• There must be
• Recursive call(s) must
Stack Frames for Recursive Methods
When a recursive method is executed,
Example: The factorial of n, represented n!, is calculated as $n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1$.
To compute n!:
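A minimal recursive sketch of the computation. The method name fact matches the call used in these slides; the enclosing class name and the choice of long as the return type are assumptions made here.

```java
class Factorial {
    // Computes n! recursively; assumes n >= 0 and that the result fits in a long.
    static long fact(int n) {
        if (n <= 1) {
            return 1;               // stopping case: 0! and 1! are both 1
        }
        return n * fact(n - 1);     // recursive call on a smaller problem
    }
}
```

Each call waits for the call below it to return, so computing fact(4) creates frames for fact(4), fact(3), fact(2), and fact(1) before any multiplication happens; the result is 24.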
Stack Frames for Factorial Example
Stack frames when calling fact(4):
Reversing a Linked List Recursively
To find a recursive solution, break the problem down into a smaller problem. Let the list consist of nodes $x_1, x_2, \ldots, x_n$.
One idea:
1. Reverse
2. Put
Step 1 solves a smaller problem; step 2 does a little more work to solve the larger problem.
(A similar idea:
1. Reverse
2. Put
Stopping case?
Reversing a Linked List Recursively (cont’d)
abstract class Node {
    Node link;
}
class LinkedList {
    Node first;
    ...
    void reverseList() {
        first = reverse(first);
    }
}
reverseList is an instance method that
Note a common occurrence:
Reversing a Linked List Recursively (cont’d)
• reverse takes as a parameter
• reverse returns
Concatenating Two Lists
Method concat appends the list starting with node b to the end of the list starting with node a. It returns a reference to the first node in the resulting list.
Time Complexity: To reverse a list of n nodes takes
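The following is one possible way to write reverse in terms of concat, offered as a sketch rather than the slides' own code. It assumes a concrete Node class with an int payload (the slides declare Node as abstract with only a link), and the class name ListReverser is hypothetical.

```java
class Node {
    int data;     // hypothetical payload
    Node link;    // next node, or null at the end of the list
    Node(int data) { this.data = data; }
}

class ListReverser {
    // Appends list b to the end of list a; returns the first node of the result.
    static Node concat(Node a, Node b) {
        if (a == null) return b;
        Node cur = a;
        while (cur.link != null) {   // walk to the last node of a
            cur = cur.link;
        }
        cur.link = b;
        return a;
    }

    // Reverses the list starting at head; returns the new first node.
    static Node reverse(Node head) {
        if (head == null || head.link == null) {
            return head;                        // stopping case: 0 or 1 nodes
        }
        Node rest = head.link;
        head.link = null;                       // detach the first node
        return concat(reverse(rest), head);     // reversed rest, then the old first node
    }
}
```

In this particular sketch every level of the recursion calls concat, which walks the whole reversed remainder, so the total number of link traversals grows on the order of n^2 for a list of n nodes.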
Figure for Reversing a Linked List Recursively
Reversing an Array Recursively
Let A be an array of size n. To reverse A, we must change which indexes are occupied by which data, so that at the end:
• A[0] contains
• A[1] contains
• etc.
We can follow the ideas from the linked list:
1. save
2. recursively cause
3. store
This breaks the problem of size n down into a subproblem of size $n-1$.
Stopping case:
Reversing an Array Recursively (cont’d)
The following reverses the elements of A starting at index start:
The top level call is:
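Since the slide's code is not reproduced here, the sketch below shows one way the save / recurse / store steps could be realized. The class name, the use of an int array, and the exact stopping test are assumptions.

```java
class ArrayReverser {
    // Reverses A, handling the elements from index start onward.
    // Every frame saves its element on the way down; the stores happen
    // on the way back up, after all saves have been made.
    static void reverse(int[] A, int start) {
        int n = A.length;
        if (start >= n) {
            return;                     // stopping case: nothing left to save
        }
        int saved = A[start];           // 1. save this element
        reverse(A, start + 1);          // 2. recursively handle the rest
        A[n - 1 - start] = saved;       // 3. store it in its mirrored position
    }

    // Top-level call: start at index 0.
    static void reverse(int[] A) {
        reverse(A, 0);
    }
}
```

For A = {1, 2, 3, 4}, the call reverse(A, 0) leaves A as {4, 3, 2, 1}. The saved values live in the stack frames, so this version uses one frame per array element.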
Figure for Reversing an Array Recursively
Towers of Hanoi
Towers of Hanoi is an example of a problem that is much easier to solve using recursion than not using recursion.
• There are 3 pegs and n disks, all of different sizes
• Initially all disks are on the start peg, stacked in decreasing size, with largest on bottom and smallest on top.
• We must move all the disks to the end peg
• The third peg
Example: n = 2. Solution is:
1. Move
2. Move
3. Move
For larger n, it becomes difficult to figure out.
Recursive Solution to Towers of Hanoi
Using recursion can help. Suppose someone gives us a method M to move $n-1$ disks. We can use it to solve the problem for n disks as follows:
1. Move
2. Move
3. Move
Steps 1 and 3 will be done
Stopping case?
Figure for Towers of Hanoi
Recursive Solution to Towers of Hanoi (cont’d)
The output of the program will be a list of instructions.
To call this method, suppose you have 4 disks and you want to use peg 1 as the start peg, peg 3 as the finish peg, and peg 2 as the spare peg:
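A sketch of a recursive solver that produces the list of instructions as a string. The names Hanoi, move, and solve, the parameter order, and the wording of each instruction are all hypothetical, since the slides' own method is not shown.

```java
class Hanoi {
    // Appends the instructions for moving n disks from peg start to peg finish,
    // using peg spare as temporary storage. Disk 1 is the smallest.
    static void move(int n, int start, int finish, int spare, StringBuilder out) {
        if (n == 0) {
            return;                                  // stopping case: nothing to move
        }
        move(n - 1, start, spare, finish, out);      // get the top n-1 disks out of the way
        out.append("Move disk ").append(n)
           .append(" from peg ").append(start)
           .append(" to peg ").append(finish).append('\n');
        move(n - 1, spare, finish, start, out);      // put the n-1 disks on top of disk n
    }

    // Convenience wrapper that returns all instructions as one string.
    static String solve(int n, int start, int finish, int spare) {
        StringBuilder out = new StringBuilder();
        move(n, start, finish, spare, out);
        return out.toString();
    }
}
```

With these assumed names, the call for the situation described above would be Hanoi.solve(4, 1, 3, 2), which produces 15 instructions.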
Time Complexity of Towers of Hanoi Solution
Time Complexity: Asymptotically proportional to the number of
Each instantiation of the method
To count the number of instantiations, draw a
Number of vertices in the tree is
Therefore time complexity is
Parsing Arithmetic Expressions
An important part of a compiler is the parser, which checks whether
An important part of this problem is to check whether
• a + (b * (x/y))
• a + + b/z
• (a)) * c
To simplify the problem:
• Assume that the operands are
• Only consider operators
The correct syntax for arithmetic expressions can be described using
A Grammar for Arithmetic Expressions
Sample Rules: (| means “or”)
1.
2.
3.
Here are some derivations:
Recursive Parsing Algorithm
Idea is to try to obtain an expression from the input. To do this, try to obtain from the input
•
•
•
To obtain a term from the input (starting at the current position), try to obtain
•
•
•
To obtain a factor from the input (starting at the current position), try to obtain
•
•
Recursive Parsing Algorithm (cont’d)
At the top level:
boolean valid(String input) {
    String remainder = getExpr(input);
    return ((remainder != null) &&
            (remainder.length() == 0));
}
getExpr recognizes an expression at the beginning of input and returns the rest of the string, which will be the empty string if nothing is left over. If a syntax error is encountered, it returns null. (Does not handle white space in the input.)
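To make the contract of getExpr concrete, here is a sketch of a complete recognizer. The grammar is an assumption, since the rules are not reproduced in this transcript: an expression is a term optionally followed by + or - and another expression, a term is a factor optionally followed by * or / and another term, and a factor is a single letter or a parenthesized expression. The names getTerm, getFactor, and ExprParser are hypothetical.

```java
class ExprParser {
    static boolean valid(String input) {
        String remainder = getExpr(input);
        return remainder != null && remainder.length() == 0;
    }

    // expr -> term | term + expr | term - expr   (assumed grammar)
    static String getExpr(String s) {
        String rest = getTerm(s);
        if (rest == null) return null;
        if (rest.length() > 0 && (rest.charAt(0) == '+' || rest.charAt(0) == '-')) {
            return getExpr(rest.substring(1));
        }
        return rest;
    }

    // term -> factor | factor * term | factor / term   (assumed grammar)
    static String getTerm(String s) {
        String rest = getFactor(s);
        if (rest == null) return null;
        if (rest.length() > 0 && (rest.charAt(0) == '*' || rest.charAt(0) == '/')) {
            return getTerm(rest.substring(1));
        }
        return rest;
    }

    // factor -> letter | ( expr )   (assumed grammar)
    static String getFactor(String s) {
        if (s.length() == 0) return null;              // ran out of input
        char c = s.charAt(0);
        if (Character.isLetter(c)) return s.substring(1);
        if (c == '(') {
            String rest = getExpr(s.substring(1));
            if (rest != null && rest.length() > 0 && rest.charAt(0) == ')') {
                return rest.substring(1);              // consume the closing parenthesis
            }
        }
        return null;                                   // syntax error
    }
}
```

Like the slide's version, this sketch does not skip white space, so inputs must be written without spaces, for example valid("a+(b*(x/y))"). Each helper follows the same convention as getExpr: it returns the unconsumed rest of the string, or null on a syntax error.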
Recursive Parsing Algorithm (cont’d)
Abstract Data Types
An abstract data type(ADT) defines entities that have
•
•
ADTs provide the benefits of
There is a strict separation between
This separation facilitates
ADTs are easily achieved in
ADT Example: Priority Queue Specification
The priority queue ADT is useful in many situations. Here is its specification:
• The state is
• The operations on a priority queue are:
–
–
–
Note that there is no operation to
Example applications:
• Pay
• Provide
Using a Priority Queue to Sort a List of Integers
Even without knowing anything about how a priority queue might be implemented, we can take advantage of its operations to solve other problems.
For example, to sort a list of numbers:
• Insert
• Successively
• Store
Implementing a Priority Queue with an Array
Implementing a Priority Queue with a Linked List
Pseudocode:
• To insert an element:
• To remove the highest priority element:
– Scan
– When
Time is
Asymptotic running times are
Time to sort is
Can we do things faster by keeping the array, or linked list, elements in sorted order?
Warning:
Implementing a PQ with a Sorted Array
Keep the array elements in increasing order of priority. (If highest priority is smallest element, then elements will be in decreasing order.)
Pseudocode:
• To insert an element:
• To remove the highest priority element:
Implementing a PQ with a Sorted Linked List
Pseudocode:
• To insert an element:
• To remove the highest priority element:
Asymptotic times are
Generic PQ Implementation Using Java
To avoid rewriting the priority queue implementation for every different kind of element (integer, double, String, user-defined classes, etc.), we can use Java’s interface feature.
All that is required is
Using the ComparisonKey Interface
• Change the specification of the PriorityQueue class to consist of a collection of
• Any class that
• Define a class called PQItem that
• sortPQ, the sorting algorithm that uses a priority queue, can
Generic Implementation of PQ with Array
class PriorityQueue {
    private ComparisonKey[] A =
        new ComparisonKey[100]; // int -> CK
    private int next;
    PriorityQueue() {
        next = 0;
    }
    public void insert(ComparisonKey x) { // int -> CK
        A[next] = x;
        next++;
    }
    public ComparisonKey remove() { // int -> CK
        ComparisonKey high = A[0]; // int -> CK
        int highLoc = 0;
        for (int cur = 1; cur < next; cur++) {
            if (high.compareTo(A[cur]) ==
                    ComparisonKey.LOWER) { // use compareTo method
                high = A[cur];
                highLoc = cur;
            }
        }
        A[highLoc] = A[next-1];
        next--;
        return high;
    }
}
Implementing the Generic PQItem
Here is a possible PQItem class for integers. Note
For a PQItem class for strings:
• make
• make
• the method
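Because the interface and the integer PQItem are not reproduced in this transcript, the following sketch shows one shape they could take. Everything beyond the names ComparisonKey, PQItem, compareTo, and LOWER is an assumption, including the other constants, their numeric values, and the key field.

```java
// Assumed shape of the interface; the slides only show compareTo and LOWER in use.
interface ComparisonKey {
    int LOWER = -1;    // hypothetical constant values
    int EQUAL = 0;
    int HIGHER = 1;

    // Reports whether this key is LOWER than, EQUAL to, or HIGHER than other.
    int compareTo(ComparisonKey other);
}

class PQItem implements ComparisonKey {
    private final int key;   // hypothetical field holding the integer priority

    PQItem(int key) { this.key = key; }
    int getKey() { return key; }

    public int compareTo(ComparisonKey other) {
        int otherKey = ((PQItem) other).key;   // assumes other is also a PQItem
        if (key < otherKey) return LOWER;
        if (key > otherKey) return HIGHER;
        return EQUAL;
    }
}
```

With this definition, the test high.compareTo(A[cur]) == ComparisonKey.LOWER in the array-based remove method is true exactly when A[cur] holds a larger integer than the current candidate, so the largest integer is treated as the highest priority.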
Generic PQItem’s (cont’d)
This approach is particularly powerful since we can
Suppose the items are
One form of priority might be
Another form might be
All those decisions will be encapsulated inside the
Sorting with Generic PQ
Finally, here is the sorting algorithm:
void sortPQ (ComparisonKey[] A) {
    int n = A.length;
    PriorityQueue pq =
        new PriorityQueue();
    for (int i = 0; i < n; i++)
        pq.insert(A[i]);
    for (int i = 0; i < n; i++)
        A[i] = pq.remove();
}
The only difference from before is
IMPORTANT TO NOTICE:
• The PriorityQueue class
• The sortPQ method
Importance of Modularity and Information Hiding
Why is it valuable to be able to do these kinds of things?
The public/private visibility modifiers of Java, and the discipline of not making the internal details be available outside are forms of
Information hiding promotes modular programming: you can
The key to abstraction is
Compiling and Running a C Program in Unix
Simple scenario in which your program is in a single file: Suppose you want to name your program test.
1. edit
2. compile
3. if
4. run
5. if
Structure of a C Program
A C program is a list of
Every C program must contain
Functions are
• The
• For
•
• The \n is
• Comments
A Useful Library
See the Reek book (especially Chapter 16) for a description of what you can do with built-in libraries. In addition to stdio.h,
• stdlib.h lets you use functions for, e.g.,
–
–
–
–
• math.h provides
• string.h has
Printf
The function printf is used to print to the standard output (screen):
• It can take a
• The first argument must
• The first argument might
• A
• Following the first argument is a
Example:
Output is:
Variables and Arithmetic Expressions
The main numeric data types that we will use are:
•
•
•
Variables are declared and manipulated in arithmetic expressions pretty much as in Java. For instance,
However, in C,
Reading from the Keyboard
The function scanf reads in data from the keyboard.
• scanf takes a
• The first argument is
• Each
• After the first argument is a
• The subsequent arguments must each be
• The code for an
When you run this program, it will wait for you to enter two integers, and then continue. The integers can be on the same line separated by a space, or on two lines.
Functions
Functions in C are pretty much like methods in Java (dealing only with primitive types). Example:
#include <stdio.h>
double times2 (double x) {
    x = 2*x;
    return x;
}
int main () {
    double y = 301.4;
    printf("Original value is %f; final value is %f.\n",
           y, times2(y));
    return 0;
}
• Functions must be
• As in Java, parameters are
• As in Java, if the function does not return any value,
• Parameters and local variables of functions
Recursive Functions
Recursion is essentially the same as in Java.
The only difference is if you have mutually recursive functions, also called indirect recursion: for instance, if function A calls function B, while B calls A.
Then you have a problem with the requirement that functions be defined before they are used.
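You can get around this problem with function prototypes: a prototype declares a function's return type, name, and parameter types before its full definition appears.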
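A minimal sketch of mutual recursion made legal by a prototype follows. The functions is_even and is_odd are hypothetical examples, not taken from the slides.

```c
/* Prototype: tells the compiler about is_odd before its definition. */
int is_odd(unsigned int n);

/* is_even calls is_odd, which is defined later in the file. */
int is_even(unsigned int n) {
    if (n == 0) return 1;
    return is_odd(n - 1);
}

/* is_odd calls is_even, which has already been defined above. */
int is_odd(unsigned int n) {
    if (n == 0) return 0;
    return is_even(n - 1);
}
```

Without the prototype on the first line, the call to is_odd inside is_even would refer to a function the compiler has not yet seen.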
Global Variables and Constants
C also provides global variables.
� A global variable is defined
� A global variable can be used
Generally, global variables that can be changed are frowned upon, as contributing to errors. However, global variables are very appropriate for constants. Constants are defined using macros:
Boolean Expressions
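• The operators to compare two values are the same as in Java: ==, !=, <, <=, >, >=
• However, instead of returning a boolean value, they return an int: 1 if the comparison is true and 0 if it is false.
• Actually, C interprets any nonzero value as true and zero as false.
Thus the analog in C of a boolean expression in Java is any expression that produces a scalar (typically integer) value.
As in Java, boolean expressions can be operated on with &&, ||, and !. Some examples:
• (10 == 3) evaluates to 0 (false)
• !(10 == 3) evaluates to 1 (true)
• !( (x < 4) || (y == 5) ): if x is 10 and y is 5, then this evaluates to 0, because (y == 5) is 1, so the || expression is 1 and its negation is 0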
If Statements and Loops
Given the preceding interpretation of "boolean expression", the following statements are the same in C as in Java:
�
�
�
�
Since Boolean expressions are essentially integers, you can have a for statement like this in C:
    for (int count = 99; count; count--) {
        ...
    }
• count is initialized to 99
• the loop is executed as long as count is nonzero
• count is decremented after each iteration
• This loop is executed 99 times
Switch
C has a switch statement that is like that in Java:
    switch ( <integer-expression> ) {
        case <integer-constant-1> :
            <statements-for-case-1>
            break;
        case <integer-constant-2> :
            <statements-for-case-2>
            break;
        ...
        default :
            <default-statements>
    }
Don’t forget the break statements!
The integer expression must produce a value belonging to any of the integral data types (various size integers and characters).
Enumerations
This is something neat that Java does not have.
An enumeration is a way to give
For instance, suppose you need to have some codes in your program to indicate whether a library book is checked in, checked out, or lost. Instead of
    #define CHECKED_IN 0
    #define CHECKED_OUT 1
    #define LOST 2
you can use an enumeration declaration:
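A minimal sketch of such an enumeration declaration, assuming the same three names as the #define version. The helper status_name is hypothetical and only shows the constants in use.

```c
/* Anonymous enumeration: CHECKED_IN is 0, CHECKED_OUT is 1, LOST is 2. */
enum { CHECKED_IN, CHECKED_OUT, LOST };

/* Hypothetical helper: map a status code to a printable name. */
const char* status_name(int status) {
    switch (status) {
        case CHECKED_IN:  return "checked in";
        case CHECKED_OUT: return "checked out";
        case LOST:        return "lost";
        default:          return "unknown";
    }
}
```

The compiler assigns consecutive integer values starting at 0, so the codes match the three #define lines without having to number them by hand.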
Using an Enumeration in a Switch Statement
    int status;
    /* some code to give status a value */
    switch (status) {
        case CHECKED_IN :
            /* handle a checked in book */
            break;
        case CHECKED_OUT :
            /* handle a checked out book */
            break;
        case LOST :
            /* handle a lost book */
            break;
    }
Enumeration Data Type
You can give a name to an enumeration and thus create an enumeration data type. The syntax is:
enum <name-of-enum-type> <actual enumeration>
For example:
enum book_status { CHECKED_IN, CHECKED_OUT, LOST };
Why bother to do this?
Type Synonyms
The enumeration type is our first example of a user defined type in C.
It's rather unpleasant to have to carry around the word enum all the time for this type.
Instead, you can give a name to this type you have created, and subsequently just use that type, without having to keep repeating enum. For example:
Structures
C also gives you a way to create more general types of your own, as structures. These are essentially like objects in Java, if you just consider the instance variables. A structure groups together related data items that can be of different types.
The syntax to define a structure is:
Storage on the Stack
The statement
struct student stu;
causes the entire stu structure to be stored
Using typedef with Structures
When using the structure type, you have to carry along the word struct.
To avoid this, you can use a
A more concise way to do this is:
Now you can create a Student variable:
Using a Structure
You can access the pieces of a structure using dot notation (analogous to accessing instance variables of an object in Java):
You can also have the entire struct on either the left or the right side of the assignment operator:
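A minimal sketch of both ideas, assuming a Student type with an int age and a double grade_point as described on later slides. The function copy_is_independent is a hypothetical demonstration, not part of the slides.

```c
/* Assumed layout of the Student type used throughout these slides. */
typedef struct {
    int age;
    double grade_point;
} Student;

/* Hypothetical demo: returns 1 if assignment copied the whole struct
   and the copy is independent of the original. */
int copy_is_independent(void) {
    Student a, b;
    a.age = 20;              /* dot notation on individual fields */
    a.grade_point = 3.4;
    b = a;                   /* copies every field of a into b */
    b.age = 21;              /* changes b only; a is untouched */
    return a.age == 20 && b.age == 21 && b.grade_point == a.grade_point;
}
```

Unlike assigning one object reference to another in Java, the assignment b = a produces two separate structures.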
Figure for Copying a Structure
Passing a Structure to a Function
Structures can be passed as parameters to functions:
Then you can call the function:
But if you put the following line of code after the printf in print_info:
Returning a Structure From a Function
You can return a structure from a function also. Suppose you have the following function:
Now you can call the function:
Figure for Returning a Structure from a Function
The copying of formal parameters and return values can be avoided by
Arrays
To define an array:
For example:
� Unlike Java,
� Unlike Java,
� Unlike Java,
� As in Java,
� As in Java,
Arrays (cont’d)
Two things you CAN do:
� If you have an array of structures,
• You can declare a two-dimensional array (and higher): e.g.,
Two things you CANNOT do:
�
�
We’ll see how to accomplish these tasks
Pointers in C
Pointers are used in C to
� circumvent
– copying of parameters and return values
– lasting changes
� access
� allow
For each data type T,
For instance,
declares iptr to be of type "pointer to int". iptr refers to a
Actually, most C programmers write it as:
Addresses and Indirection
Computer memory is
Each variable is
The address of the variable is
� iptr refers to
� *iptr refers to
Applying the * operator is called
The Address-Of Operator
We saw the & operator in scanf. It produces the address of the variable it is applied to.
    int i;
    int* iptr;
    i = 55;
    iptr = &i;
    *iptr = *iptr + 1;
The last line gets the data out of the location whose address is in iptr, adds 1 to that data, and stores the result back in the location whose address is in iptr.
Comparing Indirection and Address-Of Operators
As a rule of thumb:
� Indirection:
– It CANNOT
– It CAN
� Address-Of:
– It CAN
– It CANNOT
Pointers and Structures
Remember the struct type Student, which has an int age and a double grade_point:
    Student stu;
    Student* sptr;
    sptr = &stu;
To access variables of the structure:
There is a “shorthand” for this notation:
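A minimal sketch of the two notations, assuming the Student type has the int age and double grade_point fields named above. The function set_fields is a hypothetical name used only for illustration.

```c
/* Assumed layout of the Student type used throughout these slides. */
typedef struct {
    int age;
    double grade_point;
} Student;

/* Both statements write a field of the structure that sptr points to. */
void set_fields(Student* sptr) {
    (*sptr).age = 20;          /* dereference first, then select the field */
    sptr->grade_point = 3.4;   /* shorthand for (*sptr).grade_point */
}
```

The parentheses in (*sptr).age are needed because the dot operator binds more tightly than the * operator.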
Passing Pointer Variables as Parameters
You can pass pointer variables as parameters.
    void printAge(Student* sp) {
        printf("Age is %i", sp->age);
    }
When this function is called,
1. a Student* variable:
or
2. apply the & operator to a Student variable:
C still uses call by value to pass pointer parameters, but because they are pointers, what gets copied are the addresses, not the structures they point to.
Data coming in to the function is not copied.
Passing Pointer Variables as Parameters (cont’d)
Now we can
    void changeAge(Student* sp, int newAge) {
        sp->age = newAge;
    }
You can also
Old initialize with copying:
    Student initialize(int old, double gpa) {
        Student st;
        st.age = old;
        st.grade_point = gpa;
        return st;
    }
More efficient initialize using pointers:
Passing Pointer Variables as Parameters (cont’d)
Using pointers is an optimization in the previous case. But it is
    void swapAges(Student* sp1, Student* sp2) {
        int temp;
        temp = sp1->age;
        sp1->age = sp2->age;
        sp2->age = temp;
    }
To call this function:
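A minimal sketch of a call, showing both ways of supplying a Student* argument. The Student layout and the wrapper demo_swap are assumptions made for illustration; swapAges is repeated so that the block stands on its own.

```c
/* Assumed layout of the Student type used throughout these slides. */
typedef struct {
    int age;
    double grade_point;
} Student;

void swapAges(Student* sp1, Student* sp2) {
    int temp;
    temp = sp1->age;
    sp1->age = sp2->age;
    sp2->age = temp;
}

/* Hypothetical demo: returns 1 if the ages were exchanged. */
int demo_swap(void) {
    Student s1 = {19, 3.0};
    Student s2 = {22, 3.5};
    Student* p2 = &s2;       /* a pointer variable holding s2's address */
    swapAges(&s1, p2);       /* pass an address directly, or a pointer variable */
    return s1.age == 22 && s2.age == 19;
}
```

Passing s1 and s2 by value instead would swap only the function's private copies, leaving the caller's structures unchanged.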
Pointers and Arrays
The name of an array is
It is a
To reference array elements, you can use
�or
�
What is going on with the pointer notation?
� a refers to
� *a refers to
� a+1 refers to
� *(a+1) refers to
Pointers and Arrays (cont’d)
You can also refer to array elements with
For example,
    int a[5];
    int* p;
    p = a;    /* p = &a[0]; is same */
� p refers to
� *p refers to
� p+1 refers to
� *(p+1) refers to
Since p is a non-constant pointer, you can also
Warning: NO BOUNDS CHECKING IS DONE IN C!
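A minimal sketch of pointer notation for arrays, using two hypothetical helper functions. Both stay inside the n elements they are given, since C itself will not stop a pointer from running past the end.

```c
/* Sum the n ints starting at a, walking a pointer instead of indexing. */
int sum_with_pointer(const int* a, int n) {
    int total = 0;
    const int* p;
    for (p = a; p < a + n; p++) {   /* p++ advances by one int, not one byte */
        total += *p;
    }
    return total;
}

/* Same sum using *(a + i), which means exactly the same as a[i]. */
int sum_with_offsets(const int* a, int n) {
    int total = 0;
    int i;
    for (i = 0; i < n; i++) {
        total += *(a + i);
    }
    return total;
}
```

Either function can be called with an array name, for example sum_with_pointer(a, 5) for int a[5], because the array name is passed as the address of its first element.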
Passing an Array as a Parameter
To pass an array to a function:
    void printAllAges(int a[], int n) {
        int i;
        for (i = 0; i < n; i++) {
            printf("%i \n", a[i]);
        }
    }
The “array” parameter indicates
Alternative definition:
    void printAllAges(int* p, int n) {
        int i;
        for (i = 0; i < n; i++) {
            printf("%i \n", *p);
            p++;
        }
    }
The formal array parameter is a
You can call the function like this:
Dynamic Memory Allocation in Java
JavaThat means that
This happens whenever
In Java there is strict distinction between
Every variable is either
� memory for variables is
This memory
� memory for variables of primitive type
� memory that holds the actual contents of an objectis
This memory goes away
Dynamic Memory Allocation in C
In C,
Every type has the possibility of being allocated statically (on the stack) or dynamically (on the heap).
To allocate space statically, you
Space is allocated
To allocate space dynamically, use
� It takes one integer parameter indicating the
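Use the sizeof operator to get the length;
• It returns a
The pointer has type void*. These slides cast it to the appropriate type; the cast is required in C++ but optional in C, where void* converts implicitly to other object pointer types. If malloc fails to allocate the space, it returns NULL.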
malloc Example
To dynamically allocate space for an int:
    int* p;
    p = (int*) malloc(sizeof(int));   /* cast result to int* */
    if (p == NULL) {                  /* to be on the safe side */
        printf("malloc failed!");
    } else {
        *p = 33;
        printf("%i", *p);
    }
Normally, you don't need to allocate a single integer at a time. Typically, you would use malloc to:
� allocate
� allocate
Another malloc Example
To dynamically allocate space for a structure:
    Student* sptr;
    sptr = (Student*) malloc(sizeof(Student));
    sptr->age = 20;
    sptr->grade_point = 3.4;
Allocating a Linked List Node Dynamically
For a singly linked list of students, use this type:
    typedef struct Stu_Node {
        int age;
        double grade_point;
        struct Stu_Node* link;
    } StuNode;
To allocate a node for the list:
To insert the node pointed to by sptr after the node pointed to by some other pointer, say cur:
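A minimal sketch of allocating a node and splicing it into the list, using the StuNode type defined above. The helper names make_node and insert_after are hypothetical.

```c
#include <stdlib.h>

typedef struct Stu_Node {
    int age;
    double grade_point;
    struct Stu_Node* link;
} StuNode;

/* Hypothetical helper: allocate a node; returns NULL if malloc fails. */
StuNode* make_node(int age, double gpa) {
    StuNode* sptr = (StuNode*) malloc(sizeof(StuNode));
    if (sptr != NULL) {
        sptr->age = age;
        sptr->grade_point = gpa;
        sptr->link = NULL;
    }
    return sptr;
}

/* Insert the node pointed to by sptr right after the node pointed to by cur. */
void insert_after(StuNode* cur, StuNode* sptr) {
    sptr->link = cur->link;   /* new node takes over cur's old successor */
    cur->link = sptr;         /* cur now points to the new node */
}
```

The order of the two assignments in insert_after matters: setting cur->link first would lose the only pointer to the rest of the list.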
Allocating an Array Dynamically
To allocate an array dynamically,
    int i;
    int* p;
    p = (int*) malloc(100*sizeof(int));   /* 100 elt array */
    /* now p points to the beginning of the array */
    for (i = 0; i < 100; i++)             /* initialize the array */
        p[i] = 0;                         /* access the elements */
Similarly, you can allocate an array of structures:
Deallocating Memory Dynamically
When memory is allocated using malloc,
You can get
    void sub() {
        int *p;
        p = (int*) malloc(100*sizeof(int));
        return;
    }
Although the space for the pointer variable p goes away when sub finishes executing,
But they are completely useless after sub is done,
If you had wanted them to be accessible outside of sub,
Using free
To deallocate memory when you are through with it,
It takes as an argument a
and returns nothing. The result of free is that all the space starting at the designated location will be
In the function void sub above, just before the return, you should say:
DO NOT DO THE FOLLOWING:
Saving Space with Arrays of Pointers
Suppose you need an array of structures, where each structure is fairly large. But you are not sure at compile time how big the array needs to be.
1. Allocate
2. Find out
3. Allocate
Array of Pointers Example
To implement with the usual Student struct:
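One plausible reading of the array-of-pointers idea is sketched below: the array holds only small Student* slots, and each large structure is allocated separately once the needed count n is known. The names make_roster and free_roster are hypothetical, and the Student layout is the assumed one from earlier slides.

```c
#include <stdlib.h>

typedef struct {
    int age;
    double grade_point;
} Student;

/* Allocate an array of n Student pointers, then one Student per slot.
   Returns NULL if any allocation fails (after releasing what was allocated). */
Student** make_roster(int n) {
    int i, j;
    Student** roster = (Student**) malloc(n * sizeof(Student*));
    if (roster == NULL) return NULL;
    for (i = 0; i < n; i++) {
        roster[i] = (Student*) malloc(sizeof(Student));
        if (roster[i] == NULL) {
            for (j = 0; j < i; j++) free(roster[j]);
            free(roster);
            return NULL;
        }
        roster[i]->age = 0;
        roster[i]->grade_point = 0.0;
    }
    return roster;
}

/* Free each Student first, then the array of pointers itself. */
void free_roster(Student** roster, int n) {
    int i;
    for (i = 0; i < n; i++) free(roster[i]);
    free(roster);
}
```

An element is reached with roster[i]->age. Every malloc is matched by exactly one free, and the structures are freed before the array that points to them.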
Information Hiding in C
Java provides support for information hiding by
�
�
Advantages of data abstraction, including the use of constructor and accessor (set and get) functions:
� push
� easier
� easy
� easy
C does not provide the same level of compiler support as Java, but you can achieve the same effect with some
Information Hiding in C (cont’d)
A “constructor” in C would be a function that
� calls
� initializes
� returns
For example:
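A minimal sketch of such a "constructor", assuming the usual Student layout. The name and signature constructStudent match the version that appears on a later slide; the NULL check is an addition.

```c
#include <stdlib.h>

typedef struct {
    int age;
    double grade_point;
} Student;

/* Allocate a Student on the heap, initialize it, and return a pointer to it.
   Returns NULL if malloc fails. */
Student* constructStudent(int age, double gpa) {
    Student* sptr = (Student*) malloc(sizeof(Student));
    if (sptr == NULL) return NULL;
    sptr->age = age;
    sptr->grade_point = gpa;
    return sptr;
}
```

The caller owns the returned structure and is responsible for passing it to free when it is no longer needed.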
Information Hiding in C (cont’d)
The analog of a Java instance method in C would be a function whose first parameter is the "object" to be operated on.
You can write set and get functions in C:
Information Hiding in C (cont’d)
You can use the set and get functions to swap the ages for two student objects:
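A minimal sketch of accessor functions and a swap built on them. The names getAge, setAge, and swapAgesViaAccessors are hypothetical; the slides' own set and get functions may be named differently.

```c
typedef struct {
    int age;
    double grade_point;
} Student;

/* Accessors: the "object" to operate on is the first parameter. */
int getAge(const Student* sp) {
    return sp->age;
}

void setAge(Student* sp, int newAge) {
    sp->age = newAge;
}

/* Swap ages using only the accessors, never touching sp->age directly. */
void swapAgesViaAccessors(Student* sp1, Student* sp2) {
    int temp = getAge(sp1);
    setAge(sp1, getAge(sp2));
    setAge(sp2, temp);
}
```

If the representation of age inside Student later changes, only getAge and setAge need to be rewritten; swapAgesViaAccessors stays as it is.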
When should you provide set and get functions and when should you not? They obviously impose some overhead in terms of additional function calls.
Strings in C
� There is no explicit string type in C.
• A string in C is an array of characters that is terminated with the null character.
� The length
� The null character
� A sequence of characters enclosed in double quotes
Strings in C (cont’d)
� You can also declare a
To initialize name, do not assign to a string literal! Instead, either
� Access elements using the brackets notation:
    char firstLetter;
    name[3] = 'a';
    firstLetter = name[0];
    namePtr[3] = 'b';
    firstLetter = namePtr[0];
Passing Strings to and from Functions
To pass a string into a function or return one from a function, you must
Passing in a string:
Returning a string:
You can call these functions like this:
Reading in a String from the User
To read in a string from the user, call:
scanf("%s", name);
• Notice the use of %s in scanf. The corresponding data must be a
• scanf reads a string from the input stream up to
• The letters are read into
• You must make sure that you have a large enough array to hold the string. How much space is needed?
• If you don't have enough space, whatever follows the array will be
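One common safeguard against running off the end of the array is a field width in the format. The sketch below uses the hypothetical helper read_short_name with sscanf (the string-reading sibling of scanf) so it can be checked without a keyboard; the same "%9s" format works with scanf.

```c
#include <stdio.h>

/* Hypothetical helper: read one word of at most 9 characters into name,
   which must have room for 10 chars (9 letters plus the terminating '\0').
   Returns 1 if a word was read. */
int read_short_name(const char* input, char name[10]) {
    /* the width 9 stops the conversion before the array overflows */
    return sscanf(input, "%9s", name);
}
```

The width is always one less than the array size, because the null character that ends the string needs a slot too.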
String Manipulation Functions
There are some useful string manipulation functions provided for you in C. These include:
• strlen, which takes a string as an argument and returns the length of the string, not counting the null character at the end. I.e., it counts how many characters it encounters before reaching '\0'.
• strcpy, which takes two strings as arguments and copies its second argument to its first argument.
First, to use them, you need to include headers for the string handling library:
#include <string.h>
To demonstrate the use of strlen and strcpy, suppose you want to add a name component to the Student structure and change the constructor so that it asks the user interactively for the name:
String Manipulation Functions Example
    typedef struct {
        char* name;
        int age;
        double grade_point;
    } Student;
    Student* constructStudent(int age, double gpa) {
        char inputBuffer[100];   /* read name into this */
        Student* sptr;
        sptr = (Student*) malloc(sizeof(Student));
        sptr->age = age;
        sptr->grade_point = gpa;
        /* here's the new part: */
        printf("Enter student's name: ");
        /* width 99 keeps the input within the 100-char buffer */
        scanf("%99s", inputBuffer);
        /* allocate just enough space for the name */
        sptr->name = (char*) malloc(
            (strlen(inputBuffer) + 1)*sizeof(char) );
        /* copy name into new space */
        strcpy(sptr->name, inputBuffer);
        return sptr;
    }
When the constructor returns, inputBuffer goes away. The space allocated for the Student object holds an int, a double, and a pointer to just enough separately allocated space for the actual name.
Other Kinds of Character Arrays
Not every character array has to be used to represent a string. You may want a character array that holds all possible letter grades, for instance:
    char grades[5];
    grades[0] = 'A';
    grades[1] = 'B';
    grades[2] = 'C';
    grades[3] = 'D';
    grades[4] = 'F';
In this case, there is no reason for the last array entry to be the null character, and in fact, it is not.
File Input and Output
File I/O is much simpler than in Java.
� Include
� Declare
� Call
� Writing to a file is done with
� Reading from a file is done with
� Call
File I/O Example
    /* to use the built in file functions */
    #include <stdio.h>
    int main(void) {
        /* create a pointer to a struct called FILE; */
        /* it is system dependent */
        FILE* fp;
        char line[80];
        int i;
        /* open the file for writing */
        fp = fopen("testfile", "w");
        /* write into the file */
        fprintf(fp, "Line %i ends \n", 1);
        fprintf(fp, "Line %i ends \n", 2);
        /* close the file */
        fclose(fp);
        /* open the file for reading */
        fp = fopen("testfile", "r");
        /* read six strings from the file */
        for (i = 1; i < 7; i++) {
            fscanf(fp, "%s", line);
            printf("got from the file: %s \n", line);
        }
        /* close the file */
        fclose(fp);
        return 0;
    }
Motivation for Stacks
Some examples of last-in, first-out (LIFO) behavior:
� Web browser’s
� Text editors
� The most recent pending method/function call
� To evaluate an arithmetic expression,
A stack is a sequence of elements, to which elements can be added (push) and removed (pop):
Specifying an ADT with an Abstract State
We would like a specification to be as independent of any particular implementation as possible.
But since people naturally think in terms of state, a popular way to specify an ADT is
Specifying the Stack ADT with an Abstract State
1. A stack’s state is modeled as
2. Initially the state of the stack is
3. The effect of a push(x) operation is to
4. The effect of a pop operation is to
Specifying an ADT with Operation Sequences
But a purist might complain that a state-based specification is, implicitly, suggesting a particular implementation. To be even more abstract, one can specify an ADT
For instance:
� push(a) pop(a):
� pop(a):
� push(a) push(b) push(c) pop(c) pop(b) push(d) pop(d):
� push(a) push(b) pop(a):
Additional Stack Operations
Other operations that you sometimes want to provide:
� peek:
� size:
� empty:
Balanced Parentheses
Recursive definition of a sequence of parentheses that is balanced:
� the sequence
� if the sequence
According to this definition:
� ( ) :
� ( ( ) ( ( ) ) ) :
� ( ( ) ) ) ( ) :
� ( ) ) ( :
Algorithm to Check for Balanced Parentheses
Key observations:
1. There must be
2. In any prefix, the number of
Pseudocode:
Java Method to Check for Balanced Parentheses
Using java.util.Stack class (which manipulates objects):
    import java.util.*;
    boolean isBalanced(char[] parens) {
        Stack S = new Stack();
        try { // pop might throw an exception
            for (int i = 0; i < parens.length; i++) {
                if (parens[i] == '(')
                    S.push(new Character('('));
                else
                    S.pop(); // discard popped object
            }
            return S.empty();
        }
        catch (EmptyStackException e) {
            return false;
        }
    }
Checking for Multiple Kinds of Balanced Parens
Suppose there are 3 different kinds of parentheses: ( and ), [ and ], { and }.
Modify the program:
    boolean isBalanced3(char[] parens) {
        Stack S = new Stack();
        try {
            for (int i = 0; i < parens.length; i++) {
                if (leftParen(parens[i])) // ( or [ or {
                    S.push(new Character(parens[i]));
                else {
                    char leftp = ((Character)S.pop()).charValue();
                    if (!match(leftp, parens[i])) return false;
                }
            }
            return S.empty();
        } // end try
        catch (EmptyStackException e) {
            return false;
        }
    }
Multiple Kinds of Parentheses (cont’d)
    boolean leftParen(char c) {
        return ((c == '(') || (c == '[') || (c == '{'));
    }
    boolean match(char lp, char rp) {
        if ((lp == '(') && (rp == ')')) return true;
        if ((lp == '[') && (rp == ']')) return true;
        if ((lp == '{') && (rp == '}')) return true;
        return false;
    }
Postfix Expressions
We normally write arithmetic expressions using infix notation:
Another way to write arithmetic expressions is to use postfix notation:
For example,
� 3 4 + is same as
� 1 2 - 5 - 6 5 / + is same as
One advantage of postfix is that
For instance,
� (1 + 2) * 3 becomes
� 1 + (2 * 3) becomes
Using a Stack to Evaluate Postfix Expressions
Pseudocode:
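One way to make the stack-based evaluation concrete is the following C sketch. It assumes a well-formed postfix expression with non-negative numeric operands separated by white space and at most 64 pending operands; the function name eval_postfix and the fixed-size array stack are illustrative choices, not taken from the slides.

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluate a postfix expression such as "1 2 + 3 *".
   Operands are pushed; each operator pops two operands and pushes the result.
   Assumes well-formed input (no error checking is done). */
double eval_postfix(const char* expr) {
    double stack[64];
    int top = 0;                       /* number of operands on the stack */
    const char* p = expr;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;                       /* skip white space between tokens */
        } else if (isdigit((unsigned char)*p)) {
            char* end;
            stack[top++] = strtod(p, &end);   /* push the operand */
            p = end;
        } else {
            double y = stack[--top];   /* second operand was pushed last */
            double x = stack[--top];
            switch (*p) {
                case '+': stack[top++] = x + y; break;
                case '-': stack[top++] = x - y; break;
                case '*': stack[top++] = x * y; break;
                case '/': stack[top++] = x / y; break;
            }
            p++;
        }
    }
    return stack[top - 1];             /* the final result is the only item left */
}
```

Note the order of the two pops: the right-hand operand comes off the stack first, which matters for - and /.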
StringTokenizer Class
Java's StringTokenizer class is very helpful to break up the input string into operators and operands — called
• Create a StringTokenizer object out of the input string. It
• Use instance method hasMoreTokens to test
• Use instance method nextToken to
� Second argument to constructor indicates that,
� Third argument to constructor indicates that
Java Method to Evaluate Postfix Expressions
    public static double evalPostFix(String postfix)
            throws EmptyStackException {
        Stack S = new Stack();
        StringTokenizer parser = new StringTokenizer
            (postfix, " \n\t\r+-*/", true);
        while (parser.hasMoreTokens()) {
            String token = parser.nextToken();
            char c = token.charAt(0);
            if (isOperator(c)) {
                double y = ((Double)S.pop()).doubleValue();
                double x = ((Double)S.pop()).doubleValue();
                switch (c) {
                    case '+':
                        S.push(new Double(x+y)); break;
                    case '-':
                        S.push(new Double(x-y)); break;
                    case '*':
                        S.push(new Double(x*y)); break;
                    case '/':
                        S.push(new Double(x/y)); break;
                } // end switch
            } // end if
            else if (!isWhiteSpace(c)) // token is operand
                S.push(Double.valueOf(token));
        } // end while
        return ((Double)S.pop()).doubleValue();
    }
Evaluating Postfix (cont’d)
    public static boolean isOperator(char c) {
        return ( (c == '+') || (c == '-') ||
                 (c == '*') || (c == '/') );
    }
    public static boolean isWhiteSpace(char c) {
        return ( (c == ' ') || (c == '\n') ||
                 (c == '\t') || (c == '\r') );
    }
Does not
Does no
Implementing a Stack with an Array
Since Java supplies a Stack class, why bother?
Idea:
Issues for Java implementation:
� elements in the array are to be of type
� throw exception if
� dynamically increase the size of the array to avoid
To handle the last point, we’ll do the following:
� initially,
� if array is full and a push occurs,
Implementing a Stack with an Array in Java
    class Stack {
        private Object[] A;
        private int next;
        public Stack () {
            A = new Object[16];
            next = 0;
        }
        public void push(Object obj) {
            if (next == A.length) {
                // array is full, double its size
                Object[] newA = new Object[2*A.length];
                for (int i = 0; i < next; i++) // copy
                    newA[i] = A[i];
                A = newA; // old A can now be garbage collected
            }
            A[next] = obj;
            next++;
        }
        public Object pop() throws EmptyStackException {
            if (next == 0)
                throw new EmptyStackException();
            else {
                next--;
                return A[next];
            }
        }
Implementing a Stack with an Array in Java (cont’d)
        public boolean empty() {
            return (next == 0);
        }
        public Object peek() throws EmptyStackException {
            if (next == 0)
                throw new EmptyStackException();
            else
                return A[next-1];
        }
    } // end Stack class
    class EmptyStackException extends Exception {
        public EmptyStackException() {
            super();
        }
    }
Time Performance of Array Implementation
� push:
� pop:
� empty:
� peek:
Implementing a Stack with a Linked List in Java
Idea:
    class StackNode {
        Object item;
        StackNode link;
    }
    class Stack {
        private StackNode top; // first node in list, the top
        public Stack () {
            top = null;
        }
        public void push(Object obj) {
            StackNode node = new StackNode();
            node.item = obj;
            node.link = top;
            top = node;
        }
Implementing a Stack with a Linked List in Java (cont'd)
        public Object pop() throws EmptyStackException {
            if (top == null)
                throw new EmptyStackException();
            else {
                StackNode temp = top;
                top = top.link;
                return temp.item;
            }
        }
        public boolean empty() {
            return (top == null);
        }
        public Object peek() throws EmptyStackException {
            if (top == null)
                throw new EmptyStackException();
            else
                return top.item;
        }
    }
Time Performance of Linked List Implementation
� push:
� pop:
� empty:
� peek:
Interchangeability of Implementations
If you have done things right, you can:
• write a program using the built-in Stack class
• compile and run that program
• then make available your own Stack class, using the array implementation (e.g., put Stack.class in the same directory)
• WITHOUT CHANGING OR RECOMPILING YOUR PROGRAM, run your program; it will use the local Stack implementation and will still be correct!
• then replace the array-based Stack.class file with your own linked-list-based Stack.class file
• again, WITHOUT CHANGING OR RECOMPILING YOUR PROGRAM, run your program; it will use the local Stack implementation and will still be correct!
Motivation for Queues
Some examples of first-in, first-out (FIFO) behavior:
�
�
�
A queue is a
Specifying the Queue ADT
Using the abstract state style of specification:
� The state of a queue is modeled as a
� Initially the state of the queue is the
� The effect of an enqueue(x) operation is to
� The effect of a dequeue operation is to
Specifying the Queue ADT (cont’d)
Alternative specification using allowable sequences would give some rules (an "algebra"). Some specific examples:
� enqueue(a) dequeue(a):
� dequeue(a):
• enqueue(a) enqueue(b) enqueue(c) dequeue(a) enqueue(d) dequeue(b):
� enqueue(a) enqueue(b) dequeue(b):
Other popular queue operations:
�
�
�
Applications of Queues in Operating Systems
The text discusses some applications of queues in operating systems:
• to buffer data coming from a running process going to a printer:
• a printer may be shared between several computers that are networked together.
Application of Queues in Discrete Event Simulators
A simulation program is a program that mimics, or "simulates", the behavior of some complicated real-world situation, such as
�
�
�
These systems are typically too complicated to be modeled exactly mathematically, so instead, they are simulated: events take place in them according to some random number generator. For instance,
� at random times,
� at random times,
� at random times,
Using a Queue to Convert Infix to Postfix
First attempt: Assume infix expression is
For example:
• (((22/7) + 4) * (6 - 2))
• (7 - (((2 * 3) + 5) * (8 - (4/2))))
Pseudocode:
Converting Infix to Postfix (cont’d)
Examples:
• (((22/7) + 4) * (6 - 2))
Q:
S:
• (7 - (((2 * 3) + 5) * (8 - (4/2))))
Q:
S:
Converting Infix to Postfix with Precedence
It is too restrictive to require parentheses around everything.
Instead, precedence conventions tell
For instance, 4 * 3 + 2 equals
We need to modify the above algorithm to handle operator precedence.
�
�
�
Converting Infix to Postfix with Precedence (cont’d)
    create queue Q to hold postfix expression
    create stack S to hold operators not yet added to the postfix expression
    while there are more tokens do
        get next token t
        if t is a number then enqueue t on Q
        else if S is empty then push t on S
        else if t is ( then push t on S
        else if t is ) then
            while top of S is not ( do
                pop S and enqueue result on Q
            endwhile
            pop S    // get rid of ( that ended while
        else    // t is a real operator and S is not empty
            // ( is treated as having lower precedence than every operator
            while S is not empty and prec(t) <= prec(top of S) do
                pop S and enqueue result on Q
            endwhile
            push t on S
        endif
    endwhile
    while S is not empty do
        pop S and enqueue result on Q
    endwhile
    return Q
Converting Infix to Postfix with Precedence (cont’d)
For example:
• (22/7 + 4) * (6 - 2)
Q:
S:
• 7 - (2 * 3 + 5) * (8 - 4/2)
Q:
S:
Implementing a Queue with an Array
State is represented with:
• array A
• integer head that holds
• integer tail that holds
Operation implementations:
� enqueue(x):
� dequeue(x):
� empty:
� peek:
� size:
Problem:
Implementing a Queue with a Circular Array
Wrap around to reuse the vacated space at the beginning of the array in a circular fashion, using the mod operator %.
� enqueue(x):
� dequeue(x):
� empty:
The problem is that
Expanding Size of Queue Dynamically
To avoid the overflow problem in the circular array implementation of a queue, use the same idea as for the array implementation of a stack:
If the array is discovered to be full during an enqueue,
� allocate
� copy
� enqueue
� free
One complication with the queue, though, is that the contents of the queue might be in two sections:
1. from
2. then from
Copying the new array must take this into account.
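A minimal C sketch of a circular-array queue that doubles its storage when full follows. It is an illustration under stated assumptions, not the slides' implementation: it stores ints, and it tracks head plus a count of elements rather than head and tail, which is one common way to tell a full array from an empty one. All names are hypothetical.

```c
#include <stdlib.h>

typedef struct {
    int* A;        /* storage for the elements */
    int capacity;  /* length of A */
    int head;      /* index of the front element */
    int count;     /* number of elements currently stored */
} Queue;

/* Returns 1 on success, 0 if malloc fails. capacity must be at least 1. */
int queue_init(Queue* q, int capacity) {
    q->A = (int*) malloc(capacity * sizeof(int));
    q->capacity = capacity;
    q->head = 0;
    q->count = 0;
    return q->A != NULL;
}

/* Returns 1 on success, 0 if a needed reallocation fails. */
int queue_enqueue(Queue* q, int x) {
    if (q->count == q->capacity) {
        /* full: allocate double the space and copy in queue order,
           starting at head and wrapping around the old array */
        int i;
        int* bigger = (int*) malloc(2 * q->capacity * sizeof(int));
        if (bigger == NULL) return 0;
        for (i = 0; i < q->count; i++)
            bigger[i] = q->A[(q->head + i) % q->capacity];
        free(q->A);
        q->A = bigger;
        q->capacity = 2 * q->capacity;
        q->head = 0;
    }
    q->A[(q->head + q->count) % q->capacity] = x;
    q->count++;
    return 1;
}

/* Removes and returns the front element; caller must ensure count > 0. */
int queue_dequeue(Queue* q) {
    int x = q->A[q->head];
    q->head = (q->head + 1) % q->capacity;
    q->count--;
    return x;
}
```

The copy loop handles the two-section case uniformly: indexing with (head + i) % capacity visits the elements from head to the end of the old array and then from index 0 onward, so they land in the new array in queue order starting at index 0.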
Performance of Circular Array
Performance of the circular array implementation of a queue:
� Time:
� space:
Implementing a Queue with a Linked List
State representation:
• Data items are kept in
• Pointer head points to
• Pointer tail points to
Operation implementations:
• To enqueue an item,
• To dequeue an item,
Implementing a Queue with a Linked List (cont’d)
class Queue {
    private QueueNode head;
    private QueueNode tail;

    public Queue() {
        head = null;
        tail = null;
    }

    public boolean empty() {
        return (head == null);
    }

    public void enqueue(Object obj) {
        QueueNode node = new QueueNode(obj);
        if (empty()) {
            head = node;
            tail = node;
        } else {
            tail.link = node;
            tail = node;
        }
    }

    // continued on next slide
Implementing a Queue with a Linked List (cont’d)
    // continued from previous slide

    public Object dequeue() {
        if (empty())
            return null;    // or throw an EmptyQueueException
        else {
            Object returnItem = head.item;
            head = head.link;    // remove first node from list
            if (head == null)    // fix tail pointer if needed
                tail = null;
            return returnItem;
        }
    }
}
Every operation always takes constant time.
Motivation for the List ADT
This ADT is good for modeling
Some sample applications:
•
•
•
Specifying the List ADT
The state of a list object is
Typical operations on a list are:
• create:
• empty:
• length:
• select(i):
• replace(i,x):
• delete(x):
• insert(x):
Implementing the List ADT
Array implementation:
• Keep a counter
• To select or replace at some location,
• To insert at some location, shift the later items down.
• To delete at some location,
Linked list implementation:
• Keep a count of
• To select, replace, delete or insert an item,
Comparing the Times of List Implementations
Time for various operations, on a list of n data items:
list operation    singly linked list    array
empty
length
select(i)
replace(i)
delete(i)
insert(i)
The time for insert in an array assumes no overflow occurs. If overflow occurs,
Comparing the Space of List Implementations
Space requirements:
• If the array holds pointers to the items, then there is the space overhead of
• If the array holds the items themselves, then there is the space overhead of
• In both kinds of arrays, there is also the overhead of
• If you use a linked list, then the space overhead is for
To quantify the space tradeoffs between the array of items and linked list representations:
• Let p be the number of
• Let q be the number of
• Let m be the number of
Comparing the Space (cont’d)
To hold n items,
• the array representation uses
• the linked list representation uses
The tradeoff point is when
• When n < q·m/(p + q),
• When n > q·m/(p + q),
• When the item size, q, is much larger than the pointer size, p,
• When the item size, q, is closer to the pointer size, p,
Generalized Lists
A generalized list is
Example: (a, b, (c, (d, e), f), g, (h, i)).
There are five elements in the (top level) list:
1.
2.
3.
4.
5.
Items which are not lists are called atoms (they cannot be further subdivided).
Sample Java Code for Generalized List
class Node {
    Object item;
    Node link;
    Node(Object obj) { item = obj; }
}

class GenList {
    private Node first;

    GenList() { first = null; }

    void insert(Object newItem) {
        Node node = new Node(newItem);
        node.link = first;
        first = node;
    }

    void print() {
        System.out.print("( ");
        Node node = first;
        while (node != null) {
            if (node.item instanceof GenList)
                ((GenList) node.item).print();
            else
                System.out.print(node.item);
            node = node.link;
            if (node != null) System.out.print(", ");
        }
        System.out.print(" )");
    }
}
Sample Java Code (cont’d)
Notice:
• o instanceof C returns true if
– object o
– object o
– object o
– object o
• casts node.item to type GenList, if appropriate
• recursive call of the GenList method print
• implicit use of the toString method of every class, in the call to System.out.print
(Don’t confuse the print method of System.out with the print method we are defining for class GenList.)
Sample Java Code (cont’d)
How do we know that print is well-defined and won’t get into an infinite loop?
The print method is recursive and uses a while loop.
• The while loop
• If an item is not a generalized list, then it
• If an item is itself a generalized list, then
• The while loop stops when
Each recursive call takes you deeper into the nesting of the generalized list.
• Assume
• The stopping case for the recursion is
• Each recursive call takes you closer to a stopping case.
Generalized List Pitfalls
Warning! If there is a cycle in the generalized list, print will go into an infinite loop. For instance:
Be careful about shared sublists. For instance,
Application of Generalized Lists: LISP
Generalized lists are
• highly
• good for applications where
• the key structuring paradigm in
LISP is a functional language:
Each function call is represented as a list, with the name of the function coming first, and the arguments coming after it:
LISP-like Approach to Arithmetic Expressions
Apply this approach to evaluating arithmetic expressions:
Use prefix notation (as opposed to postfix), with parentheses to delimit the sublists:
Strings and StringBuffers
Java differentiates between
There are no methods that change an existing String.
If you want to change the characters in a string, use a StringBuffer. Some key features are:
• change
• append
• insert
The StringBuffer class can be implemented using an array of characters. The ideas are not complicated.
The Heap
When you use new or malloc to dynamically allocate some space, the run-time system handles the mechanics of actually finding the required free space of the necessary size.
When you make an object inaccessible (in Java) or use free (in C), again the run-time system handles the mechanics of reclaiming the space.
We are now going to look at HOW one could implement dynamic allocation of objects from the heap. The reasons are:
•
•
•
What is the Heap?
The heap is an area of memory used to store objects that will be dynamically allocated and deallocated.
Memory can be viewed as one long array of memory locations, where the address of a memory location is the index of the location in the array.
Thus we can view the heap as
Contiguous locations in the heap (array) are grouped together into
When a request arrives to allocate n bytes, the system
• finds
• allocates
• returns
Blocks are classified as either
Initially,
Heap Data Structures
Once blocks are allocated, the heap might get chopped up into alternating allocated and free blocks of varying sizes.
We need a way to locate all the free blocks.
This will be done by keeping the free blocks in a
The linked list is implemented using
Each block has some
Allocation
When a request arrives to allocate n bytes,
There are two strategies for choosing the block to use:
•
•
If the block found is bigger than n, then
If the block found is exactly of size n, then
If no block large enough is found, then
Deallocation
When a block is deallocated, as a first cut, simply insertthe block at the front of the free list.
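The allocation and first-cut deallocation steps can be simulated with a small Java sketch. It is a toy model under several assumptions that do not come from the slides: the class name FreeListHeap is hypothetical, the free list is a java.util.LinkedList of {address, size} pairs, the block-choosing strategy shown is "use the first block that is large enough", header overhead is ignored, and the sizes of allocated blocks are remembered in a map instead of in block headers.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Map;

class FreeListHeap {
    private final LinkedList<int[]> freeList = new LinkedList<>();   // entries are {address, size}
    private final Map<Integer, Integer> sizeOf = new HashMap<>();    // allocated address -> size

    FreeListHeap(int heapSize) { freeList.add(new int[] {0, heapSize}); }

    // Returns the address of a block of n bytes, or -1 if no free block is big enough.
    int alloc(int n) {
        ListIterator<int[]> it = freeList.listIterator();
        while (it.hasNext()) {
            int[] b = it.next();
            if (b[1] >= n) {
                if (b[1] == n) it.remove();                      // exact fit: unlink the block
                else it.set(new int[] {b[0] + n, b[1] - n});     // split: remainder stays free
                sizeOf.put(b[0], n);
                return b[0];
            }
        }
        return -1;
    }

    // First cut at deallocation: put the block at the front of the free list.
    // Assumes addr was returned by alloc and has not been freed already.
    void free(int addr) {
        int n = sizeOf.remove(addr);
        freeList.addFirst(new int[] {addr, n});
    }
}
```

Replaying the operation sequence from this slide on a 100-byte heap (p := alloc(10), q := alloc(20), free(p), r := alloc(40), free(q)) leaves 60 free bytes in this model, yet a later alloc(50) fails because no single free block is that large.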
[Figure: state of the heap and of the free list after each of the operations p := alloc(10), q := alloc(20), free(p), r := alloc(40), free(q).]
Fragmentation
[Figure: state of the heap and of the free list after free(q).]
Problem with previous example: If a request comes in for 30 bytes, the system will check the free list, and find
Coalescing
A solution to fragmentation is to
• physical neighbor:
• virtual neighbor:
To facilitate this operation, we will need additional space overhead in the header, and it will also help to keep “footer” information at the end of each block to:
• make
• indicate
• replicate
More Insidious Fragmentation
[Figure: state of the heap and of the free list after free(q).]
However, coalescing will not accommodate a request for
Compaction
The solution to this problem is called
The difficulty though is that
Master Pointers
A solution is to use
• A special area of the heap contains
• The addresses
• The address returned by the allocate procedure is
• The contents of a master pointer
• But the user,
Master Pointers (cont’d)
[Figure: the master pointers area, with entries for p, q, and r, next to the rest of the heap.]
Costs:
• Additional
• Additional
• Unpredictable
Garbage Collection
The above discussion of deallocation assumes the memory allocation algorithm is somehow informed about which blocks are no longer in use:
• In C, this is done
• In Java,
This process is part of garbage collection:
•
•
One of the challenging aspects of garbage collection is how to
Trees
Important terminology:
Some uses of trees:
• model
• model
• a clever implementation of
•
Trees (cont’d)
Some more terms:
• path:
• length of path:
• height of a node:
• height of tree:
• depth (or level) of a node:
• depth of tree:
Fact: The depth of a tree equals the height of the tree.
Binary Trees
Binary tree: a tree in which
Complete binary tree: tree in which
Important Facts:
• A complete binary tree with L levels contains
• A complete binary tree with n nodes has
Binary Trees (cont’d)
Leftmost binary tree: like a complete binary tree, except that
however, all leaves at bottom level are
Important Facts:
• A leftmost binary tree with L levels contains
• A leftmost binary tree with n nodes has
Binary Heap
Now suppose that there is a data item, called a key, inside each node of a tree.
A binary heap (or min-heap) is a
• leftmost binary tree
• satisfies the
Do not confuse this use of “heap” with its usage in memory management!
Important Fact: The same set of keys
There is no
Using a Heap to Implement a Priority Queue
To implement the priority queue operation insert(x):
1.
2.
3.
Time:
To implement the priority queue operation remove():
Tricky part is how to remove the root without messing up the tree structure.
1.
2.
3.
Time:
Using a Heap to Implement a PQ (cont’d)
PQ operation    sorted array or linked list    unsorted array or linked list    heap
insert
remove (min)
No longer have the severe tradeoffs of the array and linked list representations of priority queue.
Heap Sort
Recall the sorting algorithm that used a priority queue:
1. insert the elements to be sorted, one by one, into a priority queue.
2. remove the elements, one by one, from the priority queue; they will come out in sorted order.
If the priority queue is implemented with a heap, the running time is
Linked Structure Implementation of Heap
To implement a heap with a linked structure, each node of the tree will be represented with an object containing
•
•
•
•
To find the next available location for insert, or the rightmost node on the bottom level for remove, in constant time,
•
•
Then keep a
Array Implementation of Heap
Fortunately, there’s a nifty way to implement a heap using an array, based on an interesting observation: If you number the nodes in a leftmost binary tree, starting at the root and going across levels and down levels, you see a pattern:
              1
          2       3
        4   5   6   7
       8 9
• Node number i has left child
• Node number i has right child
• If 2*i > n, then i has no
• If 2*i + 1 > n, then i has no
• Therefore, node number i is a leaf if
• The parent of node i is
• Next available location for insert is index
• Rightmost node on the bottom level is index
Array Implementation of Heap (cont’d)
Representation consists of
• array A[1..max] (ignore location 0)
• integer n, which is initially 0, holding number of elements in heap
To implement insert(x) (ignoring overflow):
    n := n+1           // make a new leaf node
    A[n] := x          // new node’s key is initially x
    cur := n           // start bubbling x up
    parent := cur/2
    while (parent != 0) && A[parent] > A[cur] do
        // current node is not the root and its key
        // has not found final resting place
        swap A[cur] and A[parent]
        cur := parent      // move up a level in the tree
        parent := cur/2
    endwhile
Array Implementation of Heap (cont’d)
To implement remove (ignoring underflow):
    minKey := A[1]     // smallest key, to be returned
    A[1] := A[n]       // replace root’s key with key in
                       // rightmost leaf on bottom level
    n := n-1           // delete rightmost leaf on bottom level
    cur := 1           // start bubbling down key in root
    Lchild := 2*cur
    Rchild := 2*cur + 1
    while (Lchild <= n) && (A[minChild()] < A[cur]) do
        // current node is not a leaf and its key has
        // not found final resting place
        m := minChild()    // choose the child before swapping changes the keys
        swap A[cur] and A[m]
        cur := m           // move down a level in the tree
        Lchild := 2*cur
        Rchild := 2*cur + 1
    endwhile
    return minKey

minChild():    // returns index of child w/ smaller key
    min := Lchild
    if (Rchild <= n) && (A[Rchild] < A[Lchild]) then
        // node has a right child and it is smaller
        min := Rchild
    endif
    return min
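The insert and remove pseudocode can be sketched in Java as below. This is an illustrative version under assumptions of its own: keys are ints, the class name MinHeap is hypothetical, and, like the pseudocode, it does not check for overflow or underflow.

```java
class MinHeap {
    private final int[] a;   // a[1..n] holds the keys; a[0] is ignored
    private int n = 0;       // number of elements in the heap

    MinHeap(int max) { a = new int[max + 1]; }

    int size() { return n; }

    void insert(int x) {                 // overflow is not checked
        a[++n] = x;                      // new leaf holds x
        int cur = n;
        while (cur > 1 && a[cur / 2] > a[cur]) {   // bubble x up
            swap(cur, cur / 2);
            cur = cur / 2;
        }
    }

    int remove() {                       // underflow is not checked
        int minKey = a[1];
        a[1] = a[n--];                   // move rightmost bottom leaf to the root
        int cur = 1;
        while (2 * cur <= n) {           // cur has at least a left child
            int child = 2 * cur;
            if (child + 1 <= n && a[child + 1] < a[child]) child++;   // pick smaller child
            if (a[child] >= a[cur]) break;   // key has found its resting place
            swap(cur, child);
            cur = child;                 // move down a level
        }
        return minKey;
    }

    private void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

Inserting a batch of keys and then calling remove repeatedly returns them in increasing order, which is the idea behind heap sort.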
Binary Tree Traversals
Now consider any kind of binary tree with data in the nodes, not just leftmost binary trees.
In many applications, we need to traverse a tree: “visit” each node exactly once. When the node is visited, some computation can take place, such as printing the key.
There are three popular kinds of traversals, differing in the order in which each node is visited in relation to the order in which its left and right subtrees are visited:
• inorder traversal:
• preorder traversal:
• postorder traversal:
Binary Tree Traversals (cont’d)
preorder(x):
    if x is not empty then
        visit x
        preorder(leftchild(x))
        preorder(rightchild(x))
inorder(x):
    if x is not empty then
        inorder(leftchild(x))
        visit x
        inorder(rightchild(x))
postorder(x):
    if x is not empty then
        postorder(leftchild(x))
        postorder(rightchild(x))
        visit x
[Figure: example binary tree with nodes labeled a through i.]
• preorder:
• inorder:
• postorder:
Binary Tree Traversals (cont’d)
These traversals are particularly interesting when the binary tree is a parse tree for an arithmetic expression:
• Postorder traversal results in the
• Preorder gives
• Does inorder give
[Figure: parse tree with * at the root, + and - as its children, and numbers at the leaves.]
• preorder:
• inorder:
• postorder:
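To experiment with the three traversals on a parse tree, a small Java sketch follows. The class names ExprNode and Traversals are hypothetical, and the sample tree for (5 + 3) * (1 - 2) is an assumption chosen for this example; the placement of operands in the slide's figure may differ. Here "visiting" a node simply appends its label to a string.

```java
class ExprNode {
    final String label;
    final ExprNode left, right;

    ExprNode(String label, ExprNode left, ExprNode right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }

    ExprNode(String label) { this(label, null, null); }   // leaf node
}

class Traversals {
    static String preorder(ExprNode t) {
        if (t == null) return "";
        return t.label + " " + preorder(t.left) + preorder(t.right);
    }

    static String inorder(ExprNode t) {
        if (t == null) return "";
        return inorder(t.left) + t.label + " " + inorder(t.right);
    }

    static String postorder(ExprNode t) {
        if (t == null) return "";
        return postorder(t.left) + postorder(t.right) + t.label + " ";
    }

    // Assumed sample: the parse tree of (5 + 3) * (1 - 2).
    static ExprNode sample() {
        return new ExprNode("*",
                new ExprNode("+", new ExprNode("5"), new ExprNode("3")),
                new ExprNode("-", new ExprNode("1"), new ExprNode("2")));
    }
}
```

On this sample, postorder produces 5 3 + 1 2 - * and preorder produces * + 5 3 - 1 2, while inorder produces 5 + 3 * 1 - 2, which has lost the grouping that the tree structure carried.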
Representation of a Binary Tree
The most straightforward representation for an (arbitrary) binary tree is a linked structure, where each node has
•
•
•
Notice that the array representation used for a heap will not work, because the structure of the tree is not necessarily very regular.
class TreeNode {
    Object data;       // data in the node
    TreeNode left;     // left child
    TreeNode right;    // right child

    // constructor goes here...

    void visit() {
        // what to do when node is visited
    }
}
Representation of a Binary Tree (cont’d)
class Tree {
    TreeNode root;
    // other information...

    void preorderTraversal() {
        preorder(root);
    }

    void preorder(TreeNode t) {
        if (t != null) {    // stopping case for recursion
            t.visit();      // user-defined visit method
            preorder(t.left);
            preorder(t.right);
        }
    }
}
But we haven’t yet talked about how you actually MAKE a binary tree. We’ll do that next, when we talk about
Dictionary ADT Specification
So far, we’ve seen the abstract data types
•
•
•
•
Another useful ADT is a dictionary (or table). The abstract state of a dictionary is a
The main operations are:
•
•
•
Some additional operations are:
• find the
• find the
•
Dictionary ADT Applications
The dictionary (or table) ADT is
For instance, student records at a university can be kept in a dictionary data structure:
• When a new student enrolls,
• When a student graduates,
• When information about a student needs to be updated,
• Once the search has located the record for that student,
• When information about a student needs to be retrieved,
The world is full of information databases, many of them extremely large (imagine what the IRS has).
When the number of elements gets very large,
Dictionary Implementations
We will study a number of implementations:
Search Trees
•
• :
–
–
–
•
Hash Tables
•
•
Binary Search Tree
Recall the heap ordering property for binary heaps:
Another ordering property is the binary search tree property: for each node x,
• all keys in the left subtree of x
• all keys in the right subtree of x
A binary search tree (BST) is
Searching in a BST
To search for a particular key in a binary search tree, we take advantage of the binary search tree property:
search(x,k):    // x is node where search starts
-----------     // k is key searched for
    if x is null then    // stopping case for recursion
        return "not found"
    else if k = the key of x then
        return x
    else if k < the key of x then
        return search(leftchild(x),k)     // recursive call
    else    // k > the key of x
        return search(rightchild(x),k)    // recursive call
    endif
The top level call has x equal to
In the previous tree, the search path for 17 is
and the search path for 21 is
Running Time: If BST is a chain, then
Searching in a BST (cont’d)
Iterative version of search:
search(x,k):
------------
    while x != null do
        if k = the key of x then
            return x
        else if k < the key of x then
            x := leftchild(x)
        else    // k > the key of x
            x := rightchild(x)
        endif
    endwhile
    return "not found"
As in the recursive version,
The comparison of the search key with the node key tells you at each level
Running Time:
Searching in a Balanced BST
If the tree is a complete binary tree, then the depth is
and thus the search time is
Binary trees with O(log n) depth are considered balanced: there is balance between
You can have binary trees that are
so that the depth is
but might have a larger constant hidden in the big-oh.
As an aside, a binary heap does not have
Since nodes at the same level of the heap have no particular ordering relationship to each other, you will need to
Inserting into a BST
To insert a key k into a binary search tree,
Then
insert(x,k):
-----------
    if x = null then
        make a new node containing k
        return new node
    else if k = the key of x then
        return x    // key already exists; leave the tree unchanged
    else if k < the key of x then
        leftchild(x) := insert(leftchild(x),k)
        return x
    else    // k > the key of x
        rightchild(x) := insert(rightchild(x),k)
        return x
    endif
Insert called on node x
unless x is null, in which case
As a result, a child of a node
Running Time:
Inserting into a BST (cont’d)
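As a concrete illustration, here is a minimal Java sketch of recursive insert together with iterative search. The class names BstNode and Bst are hypothetical, keys are assumed to be ints, and in this sketch a duplicate key simply leaves the tree unchanged.

```java
class BstNode {
    final int key;
    BstNode left, right;

    BstNode(int key) { this.key = key; }
}

class Bst {
    BstNode root;

    void insert(int k) { root = insert(root, k); }

    // Returns the root of the subtree after inserting k into it.
    private BstNode insert(BstNode x, int k) {
        if (x == null) return new BstNode(k);          // found the empty spot
        if (k < x.key) x.left = insert(x.left, k);
        else if (k > x.key) x.right = insert(x.right, k);
        return x;                                      // duplicate: tree unchanged
    }

    // Iterative search: returns the node holding k, or null if k is absent.
    BstNode search(int k) {
        BstNode x = root;
        while (x != null && x.key != k) {
            x = (k < x.key) ? x.left : x.right;
        }
        return x;
    }
}
```

Note how each recursive call hands back the subtree root, so the parent's child pointer is reassigned on the way back up; only the assignment at the bottom of the search path actually changes anything.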
Finding Min and Max in Binary Search Tree
Fact: The smallest key in a binary search tree is found by
Running Time:
Guess how to find the largest key and how long it takes.
Min is
and max is
Printing a BST in Sorted Order
Cute tie-in between tree traversals and BST’s.
Theorem: Inorder traversal of a binary search tree visits the nodes
Inorder traversal on previous tree gives:
Proof: Let’s look at some small cases and then use induction for the general case.
Case 1:
Case 2:
Case n: Suppose true for trees of size
Consider a tree of size
Printing a BST in Sorted Order (cont’d)
L contains at most
and R contains at most
Inorder traversal:
• prints out
• then prints out
• then prints out
□
Running Time:
Tree Sort
Does previous theorem suggest yet another sorting algorithm to you?
Tree Sort: Insert all the keys
then do an
Running Time:
since each of the n inserts takes
Finding Successor in a BST
The successor of a node x in a BST is
Case 1: If x has a right child, then the successor of x is the
follow x’s right pointer, then follow left pointers until there are no more.
Path to find successor of 19 is
Finding Successor in a BST (cont’d)
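[Figure: binary search tree holding the keys 4, 10, 13, 16, 17, 19, 20, 22, 26, 27, with 19 at the root and 10 and 22 as its children.]
Case 2: If x does not have a right child, then find the
Path to find successor of 17 is
If you never find an ancestor that is larger than x’s key, then
Path to try to find successor of 27 is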
Running Time:
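The two cases for finding a successor can be sketched in Java using parent pointers. Everything named here is an assumption for illustration: the classes PNode and BstSuccessor are hypothetical, keys are distinct ints, and Case 2 is coded as "climb while we are coming up from a right child", which is one common way to phrase the ancestor search.

```java
class PNode {
    final int key;
    PNode left, right, parent;

    PNode(int key) { this.key = key; }
}

class BstSuccessor {
    // Iterative insert that also maintains parent pointers; returns the root.
    static PNode insert(PNode root, int k) {
        PNode n = new PNode(k);
        if (root == null) return n;
        PNode x = root;
        while (true) {
            if (k < x.key) {
                if (x.left == null) { x.left = n; n.parent = x; return root; }
                x = x.left;
            } else {
                if (x.right == null) { x.right = n; n.parent = x; return root; }
                x = x.right;
            }
        }
    }

    static PNode find(PNode x, int k) {
        while (x != null && x.key != k) x = (k < x.key) ? x.left : x.right;
        return x;
    }

    static PNode successor(PNode x) {
        if (x.right != null) {             // Case 1: go right once, then left as far as possible
            PNode y = x.right;
            while (y.left != null) y = y.left;
            return y;
        }
        PNode y = x.parent;                // Case 2: climb until we arrive from a left child
        while (y != null && x == y.right) {
            x = y;
            y = y.parent;
        }
        return y;                          // null means x held the largest key
    }
}
```

With the keys 19, 10, 22, 4, 16, 20, 27, 13, 17, 26 inserted in that (assumed) order, the successor of 19 is 20, the successor of 17 is 19, and 27 has no successor.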
Finding Predecessor in a BST
The predecessor of a node x in a BST is the node whose
To find it,
Case 1: If x has a left child, then the predecessor of x
follow x’s left pointer, then follow right pointers until there are no more.
Case 2: If x does not have a left child, then find the lowest ancestor of x
(I.e., follow parent pointers from x until reaching a key smaller than x’s.)
If you never find an ancestor that is smaller than x’s key, then
Running Time:
Deleting a Node from a BST
Case 1: x is a leaf. Then
Case 2: x has only one child. Then
Case 3: x has two children. Use the same strategy as binary heap: Instead of removing the root node,
1. Find
2. Delete
3. Replace
Running Time:
Deleting a Node from a BST (cont’d)
Balanced Search Trees
We would like to come up with a way to keep a binary search tree “balanced”, so that the depth is
and thus the running time for the BST operations will be
There are a number of schemes that have been devised. We will briefly look at a few of them.
They all require much more complicated algorithms for insertion and deletion, in order to
The algorithms for searching, finding min, max, predecessor or successor, are essentially the same as for
The next few slides give the main idea for the definitions of the trees, but not why the definitions give O(log n) depth, and not how the algorithms for insertion and deletion work.
AVL Trees
An AVL tree is a binary search tree such that for each node, the heights of the left and right subtrees of the node
Theorem: The depth of an AVL tree is
When inserting or deleting a node in an AVL tree, if you detect that the AVL tree property has been violated, then you
Red-Black Trees
A red-black tree is a binary search tree in which
• every “real” node is given
• every node is colored
– every leaf node is
– if a node is red, then both its children are
– every path from a node to a leaf contains
From a fixed node, all paths from that node to a leaf differ in length by
Theorem: The depth of a red-black tree is
Insert and delete algorithms are quite involved.
B-Trees
The AVL tree and red-black tree allowed some variation in
An alternative idea is to make sure that all root-to-leaf paths have
and allow
The definition of a B-tree uses a parameter m:
• every leaf
• the root has
• every non-root node has
Keys are placed into nodes like this:
• Each non-leaf node has
• Each leaf node has
• The keys within a node are
B-Trees (cont’d)
And we require the extended search tree property:
• For each node x, the i-th key in x is
and is
B-trees are extensively used in the real world, for instance, database applications. In practice,
Theorem: The depth of a B-tree is
Insert and delete algorithms are quite involved.
Tries
In the previous search trees, each key is
except for their
For some kinds of keys, one key might be a
For example, if the keys are strings, then the key “at”is a prefix of the key “atlas”.
The next kind of tree takes advantage of
to store them more efficiently.
A trie is a (not necessarily binary) tree in which
• each node corresponds to
• prefix for each node
The trie storing “a”, “ale”, “ant”, “bed”, “bee”, “bet”:
Inserting into a Trie
To insert into a trie:
insert(x,s):    // x is node, s is string to insert
------------
    if length(s) = 0 then
        mark x as holding a complete key
    else
        c := first character in s
        if no outgoing edge from x is labeled with c then
            create a new child node of x
            label the edge to the new child node with c
            put the edge in the correct sorted order
                among all of x’s outgoing edges
        endif
        x := child of x reached by edge labeled c
        s := result of removing first character from s
        insert(x,s)
    endif
Start the recursion
To insert “an” and “beep”:
[Figure: the trie storing “a”, “ale”, “ant”, “bed”, “bee”, “bet”.]
Searching in a Trie
To search in a trie:
search(x,s):    // x is node, s is string to search for
------------
    if length(s) = 0 then
        if x holds a complete key then return x
        else return null    // s is not in the trie
    else
        c := first character in s
        if no outgoing edge from x is labeled with c then
            return null     // s is not in the trie
        else
            x := child of x reached by edge labeled c
            s := result of removing first character from s
            return search(x,s)
        endif
    endif
Start the recursion
To search for “art” and “bee”:
[Figure: the trie storing “a”, “ale”, “ant”, “bed”, “bee”, “bet”.]
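A compact Java sketch of trie insert and search is shown below. It departs from the pseudocode in ways that are assumptions of this example: it loops over the characters instead of recursing, search returns a boolean rather than a node, the class name Trie is hypothetical, and a java.util.TreeMap is used at each node so that outgoing edges stay in sorted order.

```java
import java.util.TreeMap;

class Trie {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();   // edges in sorted order
        boolean isKey;   // true if the path from the root spells a complete key
    }

    private final Node root = new Node();

    void insert(String s) {
        Node x = root;
        for (char c : s.toCharArray()) {
            // follow the edge labeled c, creating the child if it is missing
            x = x.children.computeIfAbsent(c, k -> new Node());
        }
        x.isKey = true;   // mark x as holding a complete key
    }

    boolean search(String s) {
        Node x = root;
        for (char c : s.toCharArray()) {
            x = x.children.get(c);
            if (x == null) return false;   // no edge labeled c: s is not in the trie
        }
        return x.isKey;   // a prefix of a stored key is not itself a key
    }
}
```

After inserting “a”, “ale”, “ant”, “bed”, “bee”, “bet”, a search for “bee” succeeds, while searches for “art” and for the bare prefix “be” fail.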
Hash Table Implementation of Dictionary ADT
Another implementation of the Dictionary ADT is a
Hash tables support the operations
•
•
•
with
This is a significant advantage over even balanced search trees, which have average times of
The disadvantage of hash tables is that
and printing all elements in sorted order takes
Main Idea of Hash Table
Main idea: exploit the random access feature of arrays: the i-th entry of array A can be accessed
Simple example: Suppose all keys are in the range
Then store elements in an array A with
Initialize all entries to some empty indicator.
• To insert x with key k:
• To search for key k:
• To delete element with key k:
All times are
But this idea does not scale well.
Hash Functions
Suppose
• elements are
• school has
• keys are
Since there are 1 billion possible SSN’s, we need an array of length 1 billion. And most of it will be wasted, since only a 40,000/1,000,000,000 = 1/25,000 fraction is nonempty.
Instead, we need a way to
Let M be the size of the array we are willing to provide.
Use a hash function, h, to
Then h maps key values to integers in the range
Simple Hash Function Example
Suppose keys are integers. Let the hash function be h(k) = k mod M. Notice that this always gives you something in the range
• To insert x with key k:
• To search for element with key k:
• To delete element with key k:
All times are
assuming the hash function can be computed in constant time.
The key to making this work is to
Collisions
In reality, any hash function will have collisions: when two different keys
This is inevitable, since the hash function is squashing down a large domain into a small range.
For example, if h(k) = k mod M, then
since they both hash to
What should you do when you have a collision? Two common solutions are
1.
2.
Chaining
Keep all data items that hash to the same array location in a
• to insert element x with key k:
• to search for element with key k:
• to delete element with key k:
Worst case times, assuming computing h is constant:
• insert:
• search and delete:
Worst case is if all n elements
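A minimal Java sketch of a hash table with chaining follows. The details are assumptions made for this example rather than anything prescribed by the slides: the class name ChainedHashTable is hypothetical, keys are ints hashed with h(k) = k mod M, each chain is a java.util.LinkedList, and insert does not check whether the key is already present.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTable {
    private static final class Entry {
        final int key;
        final Object value;

        Entry(int key, Object value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry>> table = new ArrayList<>();
    private final int m;   // number of array locations (M)

    ChainedHashTable(int m) {
        this.m = m;
        for (int i = 0; i < m; i++) table.add(new LinkedList<>());
    }

    private int h(int k) { return Math.floorMod(k, m); }   // k mod M, never negative

    // Insert at the front of the chain; assumes k is not already in the table.
    void insert(int k, Object x) { table.get(h(k)).addFirst(new Entry(k, x)); }

    // Walk the one chain that k hashes to; returns null if k is absent.
    Object search(int k) {
        for (Entry e : table.get(h(k))) {
            if (e.key == k) return e.value;
        }
        return null;
    }

    // Returns true if an element with key k was found and removed.
    boolean delete(int k) { return table.get(h(k)).removeIf(e -> e.key == k); }
}
```

With M = 10, the keys 12 and 22 collide at location 2 and end up in the same chain; search and delete only ever scan that one chain.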
Good Hash Functions for Chaining
Intuition: Hash function should
More formally:
Impractical to check in practice since
For example: Suppose the symbol table in a compiler is implemented with a hash table. The compiler writer cannot know in advance which variable names will appear in each program to be compiled.
Heuristics are used to approximate this condition:
Good Hash Functions for Chaining (cont’d)
Some issues to consider in choosing a hash function:
• Exploit
For symbol table example, take into account the kinds of variable names that people often choose (e.g., x1).
• Hash function should depend on
For example: if the keys are English words, it is not a good idea to hash on the first letter, since many words begin with S and few with X.
Average Case Analysis of Chaining
Define the load factor of a hash table with M entries and n keys to be α = n/M.
Assume a hash function that is ideal for chaining
Fact: Average length of each linked list is α.
The average running time for chaining:
• Insert:
• Unsuccessful Search: O(1) time to compute h(k); α items, on average, in the linked list are checked until discovering that k is not present.
• Successful Search: O(1) time to compute h(k); on average, the key being sought is in the middle of the linked list, so α/2 comparisons are needed to find k.
• Delete:
For these times to be O(1), α must be O(1), so n cannot be too much larger than M.
Open Addressing
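With this scheme, there are no linked lists. Instead, every element is stored in the array itself.
If there is a collision, you have to probe the table –
You must pick a pattern that you will use to probe the table.
The simplest pattern is to start at h(k) and then check h(k) + 1, h(k) + 2, and so on, wrapping around at the end of the table.
This is called linear probing.
If h(k) = 7, the probe sequence will be 7, 8, 9, ..., M − 1, 0, 1, ..., 6.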
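A small sketch of linear probing, assuming the k mod M hash function from the earlier slide and a table that stores bare keys. The function names and the table size of 10 in the usage note are illustrative choices, not taken from the slides.

```python
def linear_probe_sequence(start, m):
    """Slots visited by linear probing, beginning at `start` in a table of size m."""
    return [(start + i) % m for i in range(m)]

def insert(table, k):
    """Place key k in the first empty slot of its probe sequence; return the slot."""
    m = len(table)
    for slot in linear_probe_sequence(k % m, m):
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("table is full")
```

In a 10-slot table, the keys 7, 17, 27 and 37 all hash to 7 and end up in slots 7, 8, 9 and 0, in that order.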
Clustering
A problem with linear probing:
If an insert probe sequence begins in a cluster,
•
•
To reduce clustering, change the probe increment to skip over some locations, so locations are not checked in linear order.
There are various schemes for how to choose the increments; in fact, the increment to use can be
Clustering (cont’d)
If the probe sequence starts at 7 and the probe increment is 4, then the probe sequence will be 7, 11, 15, 19, ... (each taken mod the table size).
Warning! The probe increment must be relatively prime to the table size;
otherwise you will not search all locations.
For example, suppose you have table size 9 and increment 3. You will only search three of the nine locations.
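The warning can be checked directly. The helper below is a hypothetical name introduced for this sketch; it lists which slots a fixed-increment probe sequence ever reaches.

```python
from math import gcd

def reachable_slots(start, increment, m):
    """Sorted list of distinct slots visited when probing with a fixed increment."""
    return sorted({(start + i * increment) % m for i in range(m)})
```

With table size 9 and increment 3, `reachable_slots(0, 3, 9)` yields only slots 0, 3 and 6, because gcd(3, 9) = 3. With increment 4, gcd(4, 9) = 1 and all nine slots are reached.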
Double Hashing
Even when “non-linear” probing is used, it is still true that
To get around this problem, use double hashing:
1. One hash function, h1, is used to determine where to start probing.
2. A second hash function, h2, is used to determine the probe increment.
If the hash functions are chosen properly,
Double Hashing Example
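Let h1(k) = k mod 13 and h2(k) = 1 + (k mod 11).
• To insert 14: start probing at 14 mod 13 = 1. Probe increment is 1 + (14 mod 11) = 4. Probe sequence is 1, 5, 9, 0, 4, ...
• To insert 27: start probing at 27 mod 13 = 1. Probe increment is 1 + (27 mod 11) = 6. Probe sequence is 1, 7, 0, 6, ...
• To search for 18: start probing at 18 mod 13 = 5. Probe increment is 1 + (18 mod 11) = 8. Probe sequence is 5, 0, 8, 3, ...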
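The probe sequences for this example can be computed mechanically. The sketch below uses the two hash functions given on the slide; the function name `probe_sequence` and the table size parameter `m = 13` (matching the modulus of h1) are assumptions.

```python
def h1(k):
    return k % 13            # where to start probing

def h2(k):
    return 1 + (k % 11)      # probe increment, always between 1 and 11

def probe_sequence(k, m=13):
    """Slots examined for key k under double hashing, in probing order."""
    return [(h1(k) + i * h2(k)) % m for i in range(m)]
```

Keys 14 and 27 both start at slot 1 but use different increments (4 and 6), so their sequences diverge after the first probe. Because 13 is prime, every increment from 1 to 11 is relatively prime to the table size and each sequence visits all 13 slots.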
Deleting with Open Addressing
Open addressing has another complication:
• to insert:
• to search:
Suppose we use linear probing. Consider this sequence:
• Insert k1, where h(k1) = 3, at location 3.
• Insert k2, where h(k2) = 3, at location 4.
• Insert k3, where h(k3) = 3, at location 5.
• Delete k2 from location 4 by setting location 4 to empty.
• Search for k3. The search probes location 3, then finds location 4 empty and stops, wrongly concluding that k3 is not present.
Solution: when an element is deleted, instead of marking the slot as empty, mark it with a special “deleted” value.
Then the search algorithm needs to continue searching if it finds one of those slots.
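One way to sketch this fix with linear probing is shown below. The class name, the sentinel objects `EMPTY` and `DELETED`, and the choice to let inserts reuse deleted slots are assumptions of this sketch rather than details from the slides.

```python
EMPTY = object()      # slot has never held a key
DELETED = object()    # slot held a key that was later removed

class LinearProbingTable:
    def __init__(self, m, h):
        self.m = m
        self.h = h                      # hash function supplied by the caller
        self.slots = [EMPTY] * m

    def _probe(self, k):
        start = self.h(k)
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, k):
        # Simplification: does not check whether k is already present.
        for s in self._probe(k):
            if self.slots[s] is EMPTY or self.slots[s] is DELETED:
                self.slots[s] = k
                return s
        raise RuntimeError("table is full")

    def search(self, k):
        for s in self._probe(k):
            if self.slots[s] is EMPTY:
                return None             # a truly empty slot ends the search
            if self.slots[s] is not DELETED and self.slots[s] == k:
                return s
            # DELETED or a different key: keep probing
        return None

    def delete(self, k):
        s = self.search(k)
        if s is not None:
            self.slots[s] = DELETED     # do not reset to EMPTY
```

Replaying the sequence from the slide with a hash function that sends every key to 3: after deleting k2 from location 4, the search for k3 passes over the deleted marker and still finds k3 at location 5.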
Good Hash Functions for Open Addressing
An ideal hash function for open addressing would satisfy an even stronger property than that for chaining, namely:
This is even harder to achieve in practice than the ideal property for chaining.
A good approximation is double hashing with this scheme:
•
Generalizes the earlier example.
Average Case Analysis of Open Addressing
In this situation, the load factor α = n/M is always less than 1:
Assume that there is always at least one empty slot.
Assume that the hash function ensures that each key is equally likely to have each permutation of {0, 1, ..., M − 1} as its probe sequence.
Average case running times:
• Unsuccessful Search: 1/(1 − α)
• Insert:
• Successful Search:
• Delete:
The reasoning behind these formulas requires more sophisticated probability than for chaining.
Sanity Check for Open Addressing Analysis
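The time for searches should increase as the load factor increases.
The formula for unsuccessful search is 1/(1 − α).
• As n gets closer to M, α gets closer to 1,
• so 1 − α gets closer to 0,
• so 1/(1 − α) gets larger.
At the extreme, when n = M − 1, the formula 1/(1 − α) = M, meaning that an unsuccessful search is expected to make about as many probes as there are slots in the table.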
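The extreme case can be verified by direct substitution. This short derivation assumes only the load factor α = n/M and the unsuccessful-search formula 1/(1 − α) used in this analysis:

```latex
n = M - 1
\;\Longrightarrow\;
\alpha = \frac{n}{M} = \frac{M-1}{M}
\;\Longrightarrow\;
1 - \alpha = \frac{1}{M}
\;\Longrightarrow\;
\frac{1}{1-\alpha} = M .
```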
Sorting
• Insertion Sort:
– Consider
– Shift
– Insert
– Worst-case time is O(n^2).
• Treesort:
– Insert
– Then do
– For a basic BST, worst-case time is O(n^2), but average time is O(n log n).
– For a balanced BST, worst-case time is O(n log n), although the code is more complicated.
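As a reminder of how insertion sort proceeds, here is a minimal sketch. It is a standard formulation of the algorithm, not necessarily the exact version covered earlier in the course, and the function name is an assumption.

```python
def insertion_sort(items):
    """Return a sorted copy of `items` using insertion sort."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]              # consider the next unsorted element
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]         # shift larger elements one place right
            j -= 1
        a[j + 1] = current          # insert the element into the gap
    return a
```

On input already in decreasing order, every element is shifted past all the elements before it, which gives the quadratic worst case.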
Sorting (cont’d)
• Heapsort:
– Insert
– Then
– Worst-case time is O(n log n).
• Mergesort: Apply the idea of divide and conquer:
– Split
– Recursively
– Recursively
– Then
– Worst-case time is O(n log n);
however, it requires more space.
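A minimal mergesort sketch follows, written in the usual split, recurse, recurse, merge form. The function name and the decision to return a new list (which makes the extra space explicit) are assumptions of this sketch.

```python
def merge_sort(items):
    """Return a sorted copy of `items` using mergesort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])      # recursively sort the first half
    right = merge_sort(items[mid:])     # recursively sort the second half
    merged = []                         # extra space proportional to the input
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```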
Object-Oriented Software Engineering
References:
• Standish textbook, Appendix C
• Developing Java Software, by Russel Winder and Graham Roberts, John Wiley & Sons, 1998 (ch 8-9).
Outline of material:
•
•
•
•
•
•
•
•
•
Small Scale vs. Large Scale Programming
Programming in the small: programs done by
whose length is
Programming in the large: projects consisting of
and producing
Obviously the complications are much greater here.
The field of software engineering is mostly oriented toward programming in the large.
However, the principles still hold (although simplified) for programming in the small. It’s worth understanding these principles so that
•
•
Object-Oriented Software Engineering
Software engineering studies
Object-oriented software engineering uses
Why object-oriented?
• use of abstractions to
• benefits of encapsulation to
• power of inheritance to
Experience has shown that object-oriented software engineering
• helps create robust reliable programs with
• promotes the development of programs by
Object-Oriented Software Engineering (cont’d)
Solutions to specific problems tend to be fragile and short-lived:
To minimize effects of requirement changes
instead of just focusing on
Usually the problem domain is fairly stable, whereas a
If you capture the problem domain as the core of your design, then the code is likely to be
More traditional structured programming tends to lead to a
Object-Oriented Software Engineering (cont’d)
In OO analysis and design, identify
and model them as
Leads to
• go downwards to
• go upwards to
This approach tends to lead to
and
For instance, when the requirements change, you may have all the basic abstractions right but you
Aim for
which are specialized by inheritance to provide
Software Life Cycle
• inception:
– requirements:
• elaboration:
– analysis:
– design:
– identify reuse:
• implementation
–
–
–
• testing
• delivery and maintenance
Software Life Cycle (cont’d)
Lifecycle is not followed linearly;
An ideal way to proceed is by
• implement
• review
• decide
• proceed
• continue
This supports
letting you try alternatives and
Requirements
Decide what the program is supposed to do
Harder than it sounds.
Ask the user
•
•
Involve the user in reviewing the requirements when they are produced and the prototypes developed.
Typically, requirements are organized
Helpful to construct scenarios, which describe
Requirements (cont’d)
An example scenario to look up a phone number:
1. select
2. enter
3.
4. program computes, to
(do NOT specify the data structure to be used at this level)
5.
Construct as many scenarios as needed until you feel comfortable, and have gotten feedback from the user, that
This part of the software life cycle is no different for object-oriented software engineering than for non-object-oriented.
Object-Oriented Analysis and Design
Main objective:
Analysis and design are two ends of a spectrum: Analysis focuses more on the
while design focuses more on the
For large scale projects, there might be a real distinction: for example,
might be required to implement
For small scale projects, there is typically no distinction between analysis and design:
Object-Oriented Analysis and Design (cont’d)
To decide on the classes:
• Study
Look for nouns in the requirements:
These will probably turn into
and/or
See how the requirements specify interactions between things (e.g., each student has a GPA, each course has a set of enrolled students).
• Use an analysis method:
(Particularly aimed at large scale projects.)
An Example OO Analysis Method
CRC (Class, Responsibility, Collaboration): It clearly identifies the Classes, what the Responsibilities are of each class, and how the classes Collaborate (interact).
In the CRC method, you draw class diagrams:
• each class is
–
–
–
• if class 1 is a subclass of class 2, then
• if an object of class 1 is part of (an instance variable of) class 2, then
• if objects of class 1 need to communicate with objects of class 2, then
The arrows and lines can be annotated to indicate the number of objects involved, the role they play, etc.
CRC Example
To model a game with several players who take turns throwing a cup containing dice, in which some scoring system is used to determine the best score:
This is a diagram of the classes, not the objects. Object diagrams are trickier since
Double-check that the class diagram is consistent with requirements scenarios.
Object-Oriented Analysis and Design (cont’d)
While fleshing out the design, after identifying what the different methods of the classes should be, figure out
This means deciding what
Do not fall in love with one particular solution (such as the first one that occurs to you). Generate
and then try to
Do not commit to a particular solution too early in the process. Concentrate on
The use of ADTs assists in this aspect.
Verification and Correctness Proofs
Part of the design includes
You should have some convincing argument as to why these algorithms are correct.
In many cases, it will be obvious:
•
•
But sometimes you might be coming up with your own algorithm, or
In these cases, it’s important to check what you are doing!
Verification and Correctness Proofs (cont’d)
The Standish book describes one particular way to prove correctness of small programs, or program fragments. The important lessons are:
• It is possible to
• Formalisms can help you to
• Spending a lot of time thinking about your program, no matter what formalism, will
• These approaches are impossible to do
For large programs, there are research efforts aimed at
i.e., programs that
Generally automatic verification is slow and cumbersome, and requires some specialized skills.
Verification and Correctness Proofs (cont’d)
An alternative approach to program verification is
Instead of trying to verify actual code,
• Represent the algorithm in
• then
Of course, you might make a mistake when translating your pseudocode into Java, but the proving will be much more manageable than the verification.
Implementation
The design is now fleshed out to the level of code:
•
•
•
•
As the code is written, document the key design decisions, implementation choices, and any non-obvious aspects of the code.
Software reuse: Use library classes as appropriate (e.g., Stack, Vector, Date, Hashtable). Kinds of reuse:
•
•
•
But sometimes modifications can be more time consuming than starting from scratch.
Testing and Debugging: The Limitations
Testing cannot prove that your program is correct.
It is impossible to test a program on every single input, so
Even if you could apply some kind of program verification to your program,
And in fact, how do you know that your requirements
However, testing still serves a worthwhile, pragmatic purpose.
Test Cases, Plans and Logs
Run the program on various test cases. Test cases should
More specifically,
• test on
• test on
• test on
Organize your test cases according to a
Purposes:
• make it clear
• ensure that
Results of running a set of tests is a
After fixing a bug, you must
(Winder and Roberts call this the Principle of Maximum Paranoia.)
Kinds of Testing
Unit testing:
•
•
Integration testing:
Two approaches to integration testing:
Bottom-up testing
Then progress to the next level up: those methods and classes that only use the bottom level ones already tested. Use a driver to test combinations of the bottom two layers.
Proceed until
Kinds of Testing (cont’d)
Top down testing proceeds in the opposite direction, making
Reasons to do top down testing:
• to allow software development to
• if you have modules that are mutually dependent, e.g., X uses Y, Y uses Z, and Z uses X. You can
Other Approaches to Debugging
In addition to testing, another approach to debugging a program is to
A third approach is called a
Some companies give your (group’s) code to another group, whose job is to try to make your code break!
Maintenance and Documentation
Maintenance includes:
•
•
•
•
Most often, the person (or people) doing the maintenance are NOT the one(s) who originally wrote the program.
There are (at least) two kinds of documentation, both of which need to be updated during maintenance:
• internal documentation,
• external documentation,
Maintenance and Documentation (cont’d)
In addition to good documentation, a clean and easily modifiable structure is needed for effective maintenance,
If changes are made in an ad hoc, kludgey way (either because the maintainer does not understand the underlying design or because the design is poor), the program will
Trying to fix one problem causes something else to break, so in desperation you put in some jumps (spaghetti code) to try to avoid this, etc.
Eventually it may be better to replace the program with
Measurement and Tuning
Experience has shown:
•
•
These observations suggest that optimizing your program can pay big benefits, but that it is smarter to
How can you figure out where your program is spending its time?
• use a tool called an
•
Measurement and Tuning (cont’d)
Things you can do to speed up a program:
• find
• replace
• replace
• take advantage of
Don’t do things that are stupidly slow in your program from the beginning.
On the other hand, don’t go overboard in supposed optimizations (that might hurt readability) unless you
Software Reuse and Bottom-up Programming
The bottom line from section C.7 in Standish is:
• the effort required to build software is
• making use of reusable components can
So it makes lots of sense to try to reuse software. Of course, there are costs associated with reuse:
•
•
Using lots of reusable components leads to more bottom-up, rather than top down, programming. Or perhaps, more appropriately,
Design Patterns
As you gain experience, you will learn to recognize good and bad design and build up
Why not try to exploit other people’s experience in this area as well?
A design pattern captures a component of a complete design that has been observed to
It provides both a solution to a problem and information about them.
There is a growing literature on design patterns, especially for object oriented programming. It is worthwhile to become familiar with it. For instance, search the WWW for “design pattern” and see what you get.
File Structures
A file is
Why on mass storage?
•
•
•
The data is subdivided into
Each record contains a number of
One (or more) field is the
Issue:
We will discuss sequential files, indexed files, and hashed files.
Sequential Files
Records are conceptually organized in
The actual storage might or might not be sequential:
• On a tape,
• On a disk,
Convenient way to batch (group together) a number of updates:
• Store the
• Sort the
• Scan through
Not a convenient organization for accessing a particular record quickly.
Indexed Files
Sequential search is even slower on disk/tape than in main memory. Try to improve performance using
An index for a file is a
Typically the key field is
The index can be organized as a list, a search tree, a hash table, etc.
To find a particular record:
•
•
•
Multiple indexes, one per key field, allow
Hashed Files
An alternative to storing the index as a hash table is to
Instead, hash on the key to find the address of the desired record and
The usual hashing considerations arise.
Databases
A database is
•
•
Example: Collection of student records can be viewed as a database to be used by:
•
•
•
•
The advantages of consolidating the data:
•
•
•
Database System Organization
The “software architecture” of a database system is
• End user calls application software to access the data. End user thinks of data
• Application software calls database management system (DBMS) software. The application software has a
• DBMS deals with the
As usual, the advantages of layering are that
Communication with a Database
Databases usually provide a useful and powerful interface for obtaining information from them. So far, we’ve just seen requests of the form:
•
•
•
But suppose you’d like to print out the names of all students who are freshmen and either have a 4.0 GPA or whose names start with X.
There are ways to conceptually organize the data to allow such queries to be answered efficiently, using what are called
• The application software communicates with
• The DBMS must
Database Integrity
Data in a database is typically
•
•
Thus it must
Data can be corrupted if
Example of corrupted data:
• T1 transfers
• T2 inventories
Suppose this sequence of events occurs:
• T1 subtracts
• T2 gets the
• T2 gets the
• T1 adds
T2’s total balance is
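The interleaving above can be replayed step by step. The specifics are not given on the slide, so this sketch assumes that T1 transfers money between two hypothetical accounts A and B and that T2 adds up both balances; the account names and amounts are invented for illustration.

```python
def interleaved_run(balance_a=500, balance_b=300, amount=100):
    """Replay the four-step interleaving; return (total seen by T2, true total)."""
    balances = {"A": balance_a, "B": balance_b}   # hypothetical accounts

    balances["A"] -= amount        # T1 subtracts the amount from A
    seen_a = balances["A"]         # T2 gets the balance of A (already reduced)
    seen_b = balances["B"]         # T2 gets the balance of B (not yet increased)
    balances["B"] += amount        # T1 adds the amount to B

    t2_total = seen_a + seen_b
    true_total = balances["A"] + balances["B"]
    return t2_total, true_total
```

With the default values, T2 reports a total that is short by exactly the amount in transit, even though no money was lost: each transaction is correct on its own, and only the interleaving produces the inconsistent view.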
DB Serializability
To prevent transactions from interfering with each other, the DBMS should
This property is called serializability.
The DBMS does not have to (and should not) actually make the transactions run serially, but if there is a potential conflict,
One solution is two-phase locking:
• Before accessing any data item, the transaction must
• Only one transaction at a time can
• If another transaction already has the lock, then
• After accessing all the data items,
Committing and Aborting a Transaction
Two-phase locking can lead to deadlock, e.g.:
•
•
•
•
The DBMS must periodically check for deadlock, and if one is discovered, it must abort one of the transactions involved.
If the aborted transaction has already made changes to the database, the DBMS must
• either
• don’t actually
Once the transaction has successfully completed, then it is committed.
Artificial Intelligence
Goal: Develop machines that
•
•
and proceed “intelligently”
•
•
•
Distinct but related goals:
1.
2.
3.
8-Puzzle Example
Given a 3-by-3 box that holds 8 tiles, numbered 1 through 8. One tile is missing. The goal is to start with the tiles scrambled and
We will try to solve this problem by a machine that has
• a gripper,
• a video camera,
• a computer,
• a “finger”,
Ideas from mechanical engineering can be used to implement the gripper and the finger. We will talk about how to “see” where the tiles are, and how to decide how to move the tiles.
Computer Vision
It is not enough to simply store the image obtained from the camera. The program must be
• figure out which parts of the image are the salient objects, called
• and then recognize the objects by comparing them to known symbols, called
For the 8-puzzle, this problem can be highly simplified:
• always expect the digits to
•
•
•
But in general this is a very difficult problem and one where there has been extensive research.
Reasoning
How can the program solve the puzzle?
One solution is to
For example, if the input is
then the solution is to
But in this case there are approximately 9! = 362,880 different inputs, some of which require a long sequence of moves to solve, and it would require a lot of space.
Plus, someone would have to figure out all the answers in advance.
Production Systems
Instead, have the program figure out the solution. One approach is the production system.
First, consider the state graph of the problem:
•
•
Here is a tiny piece of the state graph for the 8-puzzle:
Identify the
The control system figures out how to
Solving a Production System
We must find a path through the state graph from
Luckily, finding paths in graphs is
One way is to build a search tree (not to be confused with a binary search tree), which
Two solutions are breadth-first search and depth-first search.
Breadth-First Search
Build the search tree in a breadth-first manner:
• The root
• The next level
• The next level
For example: [Figure: breadth-first search tree for an 8-puzzle instance]
But the search tree grows exponentially.
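A minimal breadth-first search over 8-puzzle states is sketched below. Several details are assumptions made for this sketch: a state is a tuple of nine numbers read row by row with 0 for the missing tile, the goal layout `GOAL` is an arbitrary choice, and already-seen states are skipped (so the search explores the state graph rather than building the full tree).

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)    # assumed goal layout; 0 marks the gap

def neighbors(state):
    """All states reachable by sliding one tile into the gap."""
    gap = state.index(0)
    row, col = divmod(gap, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            other = r * 3 + c
            s = list(state)
            s[gap], s[other] = s[other], s[gap]
            yield tuple(s)

def bfs(start, goal=GOAL):
    """Return a shortest list of states from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()        # oldest state first: level by level
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

Because states are taken from the queue level by level, the first time the goal is removed the path found is a shortest one. The cost is the one noted above: the number of states held grows quickly with the depth of the solution.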
Depth-First Search
Another approach is
Pursue more promising paths to greater depths and consider other options only if
To implement this idea, we need some criterion to decide which paths are promising, or appear to be promising.
Such criteria are called heuristics. A heuristic is
We need something quantitative so we can
Heuristic for 8-Puzzle
For the 8-puzzle example, our intuitive rule of thumb is to
A quantitative heuristic measure is:
For instance, if the input is
then the heuristic measure is
This heuristic has two desirable properties:
1. it is a
2. it is
Using a Heuristic in Depth-First Search
• Repeatedly
• Choose the
• Generate
• Continue
In the 8-puzzle example above:
• Generate the root. Its heuristic measure is
• Generate all children of the root. They have measures
• Choose the leaf with measure 2 and generate all its children. They have measures
• Choose the leaf with measure 1 and generate all its children. They have measures
In this depth-first search, we only had to generate 9 states, instead of
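The idea of always expanding the leaf with the smallest heuristic measure can be sketched with a priority queue. The heuristic itself is an assumption here: the sketch uses the sum, over all tiles, of the number of moves each tile is from its goal position, which may or may not be the measure intended on the previous slide. The state encoding and the goal layout are also assumptions.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)    # assumed goal layout; 0 marks the gap

def measure(state, goal=GOAL):
    """Assumed heuristic: total distance of every tile from its goal position."""
    total = 0
    for pos, tile in enumerate(state):
        if tile != 0:
            g = goal.index(tile)
            total += abs(pos // 3 - g // 3) + abs(pos % 3 - g % 3)
    return total

def neighbors(state):
    gap = state.index(0)
    row, col = divmod(gap, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            other = r * 3 + c
            s = list(state)
            s[gap], s[other] = s[other], s[gap]
            yield tuple(s)

def heuristic_search(start, goal=GOAL):
    """Expand the most promising leaf first; return how many states were generated."""
    leaves = [(measure(start, goal), start)]
    seen = {start}
    while leaves:
        _, state = heapq.heappop(leaves)       # leaf with the smallest measure
        if state == goal:
            return len(seen)
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(leaves, (measure(nxt, goal), nxt))
    return None
```

For a start state two moves from the goal, this generates only a handful of states, because less promising leaves are never expanded. A search guided this way is not guaranteed to find a shortest solution.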
Other Applications of Production Systems
Many problems can be formulated as production systems. In addition to the 8-puzzle,
You can even model the process of drawing logical conclusions from a set of given facts as a production system. In this case,
• each state is
• a production/rule/move corresponds to
For instance, part of the state graph might be:
since there is a rule of logic that says: Given the facts
1.
2.
then you can deduce that
Some Other Areas of AI
Neural Networks: Try to take advantage of the power of parallelism (multiprocessor computer architectures) using a paradigm that (roughly) follows the model of
Robotics: Hardware and software working together, e.g., automated manufacturing. Great interest in having machines explore and function in uncontrolled and unpredictable environments, such as
•
•
•
Expert Systems: Combine domain-specific knowledge from human experts with
For example:
•
•
Time Complexity of an Algorithm
Time complexity of an algorithm: the function T(n) that describes the
Given a particular algorithm, discover this function by attacking the problem from two directions:
• find an upper bound U(n) on the function T(n), i.e., convince ourselves that the algorithm will
• find a lower bound L(n) on the function T(n), i.e., convince ourselves that, for each n, there is
Try to find the smallest U and the largest L, so that T is squeezed in between and has no room to hide.
Time Complexity of an Algorithm (cont’d)
(a) No execution on an input of size n0 takes
(b) The slowest execution on all inputs of size n0 takes
(c) At least one execution on an input of size n0 takes
Time Complexity of Heapsort
Let T(n) be the time complexity of heapsort.
First cut at upper bound:
First cut at lower bound:
Refined argument for upper bound: each heap operation never
Refined argument for lower bound: Describe a particular input that
On input n, n − 1, n − 2, ..., 3, 2, 1, running time is at least
Thus T(n) now precisely identified as
Time Complexity of a Problem
Time complexity of a problem: the time complexity for
To show that a problem has time complexity T(n):
• Identify a
• Then prove
Example: Sorting problem has time complexity O(n log n).
•
• It can be proved that
Problems can be classified by their time complexity. Harder problems are considered to be those
The Class P
All problems (not algorithms) whose time complexity is at most some polynomial are said to be in the class P.
Example:
Not all problems are in P.
Example: Consider the problem of listing all permutations of the integers 1 through n.
• Output size is
• Thus running time is
• n! is larger than 2^n (for n ≥ 4), thus
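A short sketch makes the size of the output concrete. It uses Python's standard `itertools.permutations`; the wrapper name `list_permutations` is an assumption introduced here.

```python
from itertools import permutations
from math import factorial

def list_permutations(n):
    """Return every ordering of the integers 1 through n."""
    return list(permutations(range(1, n + 1)))

def output_sizes(max_n):
    """Pairs (n, n!, 2**n) showing how the permutation count outgrows 2**n."""
    return [(n, factorial(n), 2 ** n) for n in range(1, max_n + 1)]
```

Any algorithm for this problem must at least write out all n! permutations, so no clever method can bring the running time below that bound.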
NP-Complete Problems
There is an important class of problems that
These problems are called NP-complete.
These problems have the following characteristic:
•
•
Many real-world problems in science, math, engineering, operations research, etc. are NP-complete.
Traveling Salesman Problem
An example NP-complete problem is the traveling salesman problem (TSP).
Given a set of cities and the distances between them, determine an order in which to
A candidate solution for TSP is
To check whether the allowed mileage is exceeded, add up the distances between adjacent cities in the listing, which will take
But the total number of different candidate solutions is
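The contrast between checking one candidate and trying all of them can be sketched as follows. The representation is an assumption: cities are numbered 0 to n − 1, `dist` is a matrix of pairwise distances, a candidate is a listing of all cities, and `budget` stands for the allowed mileage.

```python
from itertools import permutations

def tour_length(order, dist):
    """Add up the distances between adjacent cities in the listing."""
    return sum(dist[order[i]][order[i + 1]] for i in range(len(order) - 1))

def within_budget(order, dist, budget):
    # Checking one candidate takes a single pass over the listing.
    return tour_length(order, dist) <= budget

def brute_force(dist, budget):
    """Try every listing of the cities; return one within budget, or None."""
    n = len(dist)
    for order in permutations(range(n)):     # n! candidate listings
        if within_budget(order, dist, budget):
            return order
    return None
```

Verifying a single listing is fast, but the brute-force search may have to examine every listing before it can answer "no".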
P vs. NP
Imagine an (unrealistically) powerful model of computation in which the computer first makes a lucky guess (a nondeterministic choice) as to a candidate solution in constant time, and then behaves as an ordinary computer and verifies the solution.
Problems solvable on this computer in polynomial time are said to be in the class NP.
NP includes
Having polynomial running time on this funny computer would not seem to ensure polynomial running time on a real computer.
That is, it seems likely that
But no one has yet been able to prove P ≠ NP. This has been an outstanding open question in CS since the 1970s.
Computability Theory
Complexity theory focuses on
Computability theory focuses on
We will focus on computing (mathematical) functions, with inputs and outputs.
We would like to know if there exist functions that
Church-Turing Thesis
First, we have to decide what constitutes an algorithm.
• Assembly languages have
• High-level languages have
•
Church-Turing thesis: (“thesis” means “conjecture”) Anything that can reasonably be considered an algorithm can be
A Turing machine is a
Thus, for theoretical purposes,
Computing Functions
Some sample functions:
• f(n) = 3:
• f(n) = 2n:
• f(n) = sin n:
There exist non-computable functions, functions whose input/output relationships are so complicated that there is no
We will assume
• your
• with a
• only consider
Goedel Number of a Program
Here is a way to convert a program into an integer.
•
•
Conversely, any integer can be converted
• Most of the time,
• Sometimes it
• Rarely,
• More rarely,
Use this numbering scheme to
An Uncomputable Function
Define a function h called the
• If the program with Goedel number n halts when its input is n, then
• If the program with Goedel number n does not halt when its input is n, then
Theorem: h is uncomputable
Proof: Assume in contradiction that h is computable. Then
Define another program I (which will be in the listing):
1. n
2. run program H
3. let x be
4. if x = 0 then
5. else
An Uncomputable Function (cont’d)
Let n_I be the Goedel number of I.
Case 1:
Case 2:
Thus the hypothetical program H
□
Another way to view this result is that