ADA Notes
Introduction
The word "algorithm" comes from the name of the ninth-century scholar Abu Ja'far Muhammad ibn Musa al-Khwarizmi. Roughly speaking, an algorithm can be defined in several ways:
An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
An algorithm is a finite step-by-step procedure to achieve a required result.
An algorithm is a sequence of computational steps that transform the input into the output.
An algorithm is a sequence of operations performed on data that have to be organized in data structures.
An algorithm is an abstraction of a program to be executed on a physical machine (model of Computation).
The most famous algorithm in history dates well before the time of the ancient Greeks: this is Euclid's algorithm for calculating the greatest common divisor of two integers.
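As a quick illustration, Euclid's algorithm can be sketched in Python (the function name is ours):

```python
def gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b).

    The gcd is unchanged by this step, and b strictly decreases,
    so the loop terminates with gcd(a, 0) = a.
    """
    while b != 0:
        a, b = b, a % b
    return a
```

For example, gcd(48, 18) runs through (48, 18) → (18, 12) → (12, 6) → (6, 0) and returns 6.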
The Classic Multiplication Algorithm
1. Multiplication, the American way:
Multiply the multiplicand one after another by each digit of the multiplier taken from right to left.
2. Multiplication, the English way:
Multiply the multiplicand one after another by each digit of the multiplier taken from left to right.
Algorithmics is the branch of computer science that consists of designing and analyzing computer algorithms.
The “design” pertains to
The description of the algorithm at an abstract level by means of a pseudo-language, and
A proof of correctness, that is, that the algorithm solves the given problem in all cases.
The “analysis” deals with performance evaluation (complexity analysis).
We start by defining the model of computation, which is usually the Random Access Machine (RAM) model, but other models of computation such as PRAM can be used. Once the model of computation has been defined, an algorithm can be described using a simple language (or pseudo-language) whose syntax is close to programming languages such as C or Java.
Algorithm's Performance
Two important ways to characterize the effectiveness of an algorithm are its space complexity and time complexity. The time complexity of an algorithm concerns determining an expression for the number of steps needed as a function of the problem size. Since the step-count measure is somewhat coarse, one does not aim at obtaining an exact step count. Instead, one attempts only to get asymptotic bounds on the step count. Asymptotic analysis makes use of the O (Big Oh) notation. Two other notational constructs used by computer scientists in the analysis of algorithms are Θ (Big Theta) notation and Ω (Big Omega)
notation. The performance evaluation of an algorithm is obtained by totaling the number of occurrences of each operation when running the algorithm. The performance of an algorithm is evaluated as a function of the input size n and is considered modulo a multiplicative constant.
The following notations are commonly used in performance analysis to characterize the complexity of an algorithm.
Θ-Notation (Same order)
This notation bounds a function to within constant factors. We say f(n) = Θ(g(n)) if there exist positive constants n0, c1 and c2 such that to the right of n0 the value of f(n) always lies between c1g(n) and c2g(n) inclusive.
O-Notation (Upper Bound)
This notation gives an upper bound for a function to within a constant factor. We write f(n) = O(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n) always lies on or below cg(n).
Ω-Notation (Lower Bound)
This notation gives a lower bound for a function to within a constant factor. We write f(n) = Ω(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n) always lies on or above cg(n).
Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives the upper bound on the number of operations (or running time) performed by an algorithm when the input size is n.
There are two interpretations of upper bound.
Worst-case Complexity
The running time for any input of a given size will be no greater than the upper bound; the bound is actually reached for some worst-case input of that size.
Average-case Complexity
The running time for any given size input will be the average number of operations over all problem instances for a given size.
Because it is quite difficult to estimate the statistical behavior of the input, most of the time we content ourselves with worst-case behavior. Most of the time, the complexity g(n) is approximated by its family O(f(n)), where f(n) is one of the following functions: n (linear complexity), log n (logarithmic complexity), n^a where a ≥ 2 (polynomial complexity), a^n (exponential complexity).
Optimality
Once the complexity of an algorithm has been estimated, the question arises whether this algorithm is optimal. An algorithm for a given problem is optimal if its complexity reaches the lower bound over all the algorithms solving this problem. For example, any algorithm solving the “intersection of n segments” problem will execute at least n^2 operations in the worst case, even if it does nothing but print the output. This is abbreviated by saying that the problem has Ω(n^2) complexity. If one finds an O(n^2) algorithm that solves this problem, it will be optimal and of complexity Θ(n^2).
Reduction
Another technique for estimating the complexity of a problem is the transformation of problems, also called problem reduction. As an example, suppose we know a lower bound for a problem A, and that we would like to estimate a lower bound for a problem B. If we can transform A into B by a transformation step whose cost is less than that of solving A, then B has the same lower bound as A.
The convex hull problem nicely illustrates the reduction technique. A lower bound for the convex hull problem is established by reducing the sorting problem (complexity: Θ(n log n)) to the convex hull problem.
Mathematics for Algorithmic
Set
A set is a collection of different things (distinguishable or distinct objects) represented as a unit. The objects in a set are called its elements or members. If an object x is a member of a set S, we write x ∈ S. On the other hand, if x is not a member of S, we write x ∉ S. A set cannot contain the same object more than once, and its elements are not ordered.
For example, consider the set S = {7, 21, 57}. Then 7 ∈ {7, 21, 57} and 8 ∉ {7, 21, 57}, or equivalently, 7 ∈ S and 8 ∉ S.
We can also describe a set containing elements according to some rule. We write
{n : rule about n}
Thus, {n : n = m^2 for some m ∈ N} denotes the set of perfect squares.
Set Cardinality
The number of elements in a set is called the cardinality or size of the set, denoted |S| or sometimes n(S). Two sets have the same cardinality if their elements can be put into a one-to-one correspondence. It is easy to see that the cardinality of the empty set is zero, i.e., |∅| = 0.
Multiset
If we do want to take the number of occurrences of members into account, we call the group a multiset. For example, {7} and {7, 7} are identical as sets, but {7} and {7, 7} are different as multisets.
Infinite Set
A set that contains infinitely many elements. For example, the set of negative integers, the set of integers, etc.
Empty Set
The set containing no members, denoted ∅ or {}.
Subset
For two sets A and B, we say that A is a subset of B, written A ⊆ B, if every member of A is also a member of B. Formally, A ⊆ B if
x ∈ A implies x ∈ B,
written x ∈ A => x ∈ B.
Proper Subset
Set A is a proper subset of B, written A ⊂ B, if A is a subset of B and not equal to B.
That is, A is a proper subset of B if A ⊆ B but A ≠ B.
Equal Sets
The sets A and B are equal, written A = B, if each is a subset of the other. Rephrased: let A and B be sets; A = B if A ⊆ B and B ⊆ A.
Power Set
Let A be a set. The power set of A, written P(A) or 2^A, is the set of all subsets of A. That is, P(A) = {B : B ⊆ A}. For example, consider A = {0, 1}. The power set of A is {{}, {0}, {1}, {0, 1}}. The set of all pairs (2-tuples) whose elements are 0 and 1 is {(0, 0), (0, 1), (1, 0), (1, 1)}.
Disjoint Sets
Let A and B be sets. A and B are disjoint if A ∩ B = ∅.
Union of Sets
The union of A and B, written A ∪ B, is the set we get by combining all elements in A and B into a single set. That is,
A ∪ B = {x : x ∈ A or x ∈ B}.
For two finite sets A and B, we have the identity
|A ∪ B| = |A| + |B| - |A ∩ B|
We can conclude
|A ∪ B| ≤ |A| + |B|
That is,
if |A ∩ B| = 0 then |A ∪ B| = |A| + |B|, and if A ⊆ B then |A| ≤ |B|
Intersection of Sets
The intersection of sets A and B, written A ∩ B, is the set of elements that are both in A and in B. That is,
A ∩ B = {x : x ∈ A and x ∈ B}.
Partition of a Set
A collection {Si} of nonempty sets forms a partition of a set S if
i. the sets are pairwise disjoint, that is, i ≠ j implies Si ∩ Sj = ∅, and
ii. their union is S, that is, S = ∪i Si.
In other words, {Si} forms a partition of S if each element of S appears in exactly one Si.
Difference of Sets
Let A and B be sets. The difference of A and B is
A - B = {x : x ∈ A and x ∉ B}.
For example, let A = {1, 2, 3} and B = {2, 4, 6, 8}. The set difference A - B = {1, 3}, while B - A = {4, 6, 8}.
Complement of a Set
All sets under consideration are subsets of some large set U, called the universal set. Given a universal set U, the complement of A, written A', is the set of all elements under consideration that are not in A.
Formally, let A be a subset of the universal set U. The complement of A in U is
A' = U - A
OR
A' = {x : x ∈ U and x ∉ A}.
For any set A ⊆ U, we have the following laws:
i. A'' = A
ii. A ∩ A' = ∅
iii. A ∪ A' = U
Symmetric difference
Let A and B be sets. The symmetric difference of A and B is
A ⊕ B = {x : x ∈ A or x ∈ B, but not both}
Therefore,
A ⊕ B = (A ∪ B) - (A ∩ B)
As an example, consider the two sets A = {1, 2, 3} and B = {2, 4, 6, 8}. The symmetric difference A ⊕ B = {1, 3, 4, 6, 8}.
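These operations map directly onto Python's built-in set type; a small sketch (variable names ours) checking the identities from this section:

```python
# The set operations of this section, using Python's built-in set type.
A = {1, 2, 3}
B = {2, 4, 6, 8}

union = A | B            # the union A ∪ B
intersection = A & B     # the intersection A ∩ B
difference = A - B       # the difference A - B
symmetric = A ^ B        # the symmetric difference of A and B

# |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(union) == len(A) + len(B) - len(intersection)
# The symmetric difference equals (A ∪ B) - (A ∩ B)
assert symmetric == union - intersection == {1, 3, 4, 6, 8}
```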
Sequences
A sequence of objects is a list of objects in some order. For example, the sequence 7, 21, 57 would be written as (7, 21, 57). In a set the order does not matter, but in a sequence it does.
Hence, (7, 21, 57) ≠ (57, 7, 21), but {7, 21, 57} = {57, 7, 21}. Repetition is not permitted in a set but is permitted in a sequence, so the sequence (7, 7, 21, 57) is different from the sequence (7, 21, 57), while {7, 7, 21, 57} is the same set as {7, 21, 57}.
Tuples
Finite sequences are often called tuples. For example,
(7, 21)        2-tuple or pair
(7, 21, 57)    3-tuple
(a1, ..., ak)  k-tuple
An ordered pair of two elements a and b is denoted (a, b) and can be defined as (a, b) = {a, {a, b}}.
Cartesian Product or Cross Product
If A and B are two sets, the cross product of A and B, written A×B, is the set of all pairs in which the first element is a member of A and the second element is a member of B. Formally,
A×B = {(a, b) : a ∈ A, b ∈ B}.
For example, let A = {1, 2} and B = {x, y, z}. Then A×B = {(1, x), (1, y), (1, z), (2, x), (2, y), (2, z)}.
When A and B are finite sets, the cardinality of their product is
|A×B| = |A| · |B|
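A quick check of the cardinality identity with Python's itertools.product (variable names ours):

```python
from itertools import product

A = {1, 2}
B = {'x', 'y', 'z'}

# A × B = {(a, b) : a ∈ A, b ∈ B}
AxB = set(product(A, B))
assert (1, 'x') in AxB
assert len(AxB) == len(A) * len(B)   # |A × B| = |A| · |B|
```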
n-tuples
The cartesian product of n sets A1, A2, ..., An is the set of n-tuples
A1 × A2 × ... × An = {(a1, ..., an) : ai ∈ Ai, i = 1, 2, ..., n}
whose cardinality is
|A1 × A2 × ... × An| = |A1| · |A2| ··· |An|
if all sets are finite. We denote an n-fold cartesian product over a single set A by the set
A^n = A × A × ... × A
whose cardinality is
|A^n| = |A|^n if A is finite.
Greedy Introduction
Greedy algorithms are simple and straightforward. They are shortsighted in their approach in the sense that they make decisions on the basis of information at hand, without worrying about the effect these decisions may have in the future. They are easy to invent, easy to implement and, most of the time, quite efficient. Many problems cannot be solved correctly by the greedy approach. Greedy algorithms are used to solve optimization problems.
Greedy Approach
Greedy Algorithm works by making the decision that seems most promising at any moment; it never reconsiders this decision, whatever situation may arise later.
As an example consider the problem of "Making Change".
Coins available are:
dollars (100 cents)
quarters (25 cents)
dimes (10 cents)
nickels (5 cents)
pennies (1 cent)
Problem: Make change for a given amount using the smallest possible number of coins.
Informal Algorithm
Start with nothing.
At every stage, without passing the given amount,
add the largest coin available to the coins already chosen.
Formal Algorithm
Make change for n units using the least possible number of coins.
MAKE-CHANGE (n)
    C ← {100, 25, 10, 5, 1}    // constant: the coin denominations
    S ← {}                     // set that will hold the solution
    sum ← 0                    // sum of the items in the solution set
    WHILE sum ≠ n DO
        x ← largest item in C such that sum + x ≤ n
        IF no such item THEN
            RETURN "No Solution"
        S ← S ∪ {one coin of value x}
        sum ← sum + x
    RETURN S
Example: Make change for $2.89 (289 cents); here n = 289 and the solution contains 2 dollars, 3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the largest coin without worrying about the consequences. Moreover, it never changes its mind: once a coin has been included in the solution set, it remains there.
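A Python sketch of MAKE-CHANGE (names are ours; because the denominations are scanned in decreasing order, "largest item such that sum + x ≤ n" reduces to an inner loop per denomination):

```python
def make_change(n, coins=(100, 25, 10, 5, 1)):
    """Greedy change-making: always add the largest coin that fits."""
    solution = []
    total = 0
    for c in coins:                # denominations in decreasing order
        while total + c <= n:      # take coin c while it does not pass n
            solution.append(c)
            total += c
    return solution if total == n else None   # None: no solution
```

For the example, make_change(289) returns two 100s, three 25s, one 10 and four 1s.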
Characteristics and Features of Problems solved by Greedy Algorithms
To construct the solution in an optimal way, the algorithm maintains two sets: one contains the chosen items, the other contains the rejected items.
The greedy algorithm consists of four (4) functions:
A function that checks whether chosen set of items provide a solution.
A function that checks the feasibility of a set.
The selection function tells which of the candidates is the most promising.
An objective function, which does not appear explicitly, gives the value of a solution.
Structure of a Greedy Algorithm
Initially the set of chosen items (the solution set) is empty.
At each step, an item is added to the solution set using the selection function.
IF the set would no longer be feasible THEN
reject the item under consideration (it is never considered again).
ELSE IF the set is still feasible THEN
add the current item.
Definitions of feasibility
A feasible set (of candidates) is promising if it can be extended to produce not merely a solution, but an optimal solution to the problem. In particular, the empty set is always promising. (Why? Because an optimal solution always exists.)
Unlike Dynamic Programming, which solves the subproblems bottom-up, a greedy strategy usually progresses in a top-down fashion, making one greedy choice after another, reducing each problem to a smaller one.
Greedy-Choice Property
The "greedy-choice property" and "optimal substructure" are the two ingredients in a problem that lend themselves to a greedy strategy.
Greedy-Choice Property
It says that a globally optimal solution can be arrived at by making a locally optimal choice.
Knapsack Problem
Statement: A thief robbing a store can carry a maximum weight of W in his knapsack. There are n items; the ith item weighs wi and is worth vi dollars. Which items should the thief take?
There are two versions of problem
Fractional knapsack problem
The setup is the same, but the thief can take fractions of items, meaning that the items can be broken into smaller pieces, so that the thief may decide to carry only a fraction xi of item i, where 0 ≤ xi ≤ 1.
Exhibits the greedy-choice property.
A greedy algorithm exists.
Exhibits the optimal-substructure property.
0-1 knapsack problem
The setup is the same, but the items may not be broken into smaller pieces, so the thief may decide either to take an item or to leave it (binary choice), but may not take a fraction of an item.
Does not exhibit the greedy-choice property.
No greedy algorithm exists.
Exhibits the optimal-substructure property.
Only a dynamic programming algorithm exists.
Dynamic-Programming Solution
to the 0-1 Knapsack Problem
Let i be the highest-numbered item in an optimal solution S for W pounds. Then S' = S - {i} is an optimal solution for W - wi pounds, and the value of the solution S is vi plus the value of the subproblem.
We can express this fact in the following formula: define c[i, w] to be the value of the solution for items 1, 2, . . . , i and maximum weight w. Then
c[i, w] = 0                                   if i = 0 or w = 0
c[i, w] = c[i-1, w]                           if wi > w
c[i, w] = max{vi + c[i-1, w-wi], c[i-1, w]}   if i > 0 and w ≥ wi
This says that the value of the solution for i items either includes the ith item, in which case it is vi plus a subproblem solution for (i-1) items and the weight excluding wi, or does not include the ith item, in which case it is a subproblem solution for (i-1) items and the same weight. That is, if the thief picks item i, the thief takes vi value and can choose from items 1, 2, . . . , i-1 up to the weight limit w - wi, getting c[i-1, w-wi] additional value. On the other hand, if the thief decides not to take item i, the thief can choose from items 1, 2, . . . , i-1 up to the weight limit w, getting c[i-1, w] value. The better of these two choices should be made.
Although the problem is different, the above formula for c is similar to the LCS formula: boundary values are 0, and other values are computed from the input and "earlier" values of c. So the 0-1 knapsack algorithm is like the LCS-length algorithm given in the CLR book for finding a longest common subsequence of two sequences.
The algorithm takes as input the maximum weight W, the number of items n, and the two sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . , wn>. It stores the c[i, j] values in a table, that is, a two-dimensional array c[0 . . n, 0 . . W], whose entries are computed in row-major order. That is, the first row of c is filled in from left to right, then the second row, and so on. At the end of the computation, c[n, W] contains the maximum value the thief can put into the knapsack.
Dynamic-0-1-knapsack (v, w, n, W)
    FOR w = 0 TO W DO
        c[0, w] = 0
    FOR i = 1 TO n DO
        c[i, 0] = 0
        FOR w = 1 TO W DO
            IF wi ≤ w THEN
                IF vi + c[i-1, w-wi] > c[i-1, w] THEN
                    c[i, w] = vi + c[i-1, w-wi]
                ELSE
                    c[i, w] = c[i-1, w]
            ELSE
                c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, W] and tracing backwards where the optimal values came from. If c[i, w] = c[i-1, w], item i is not part of the solution, and we continue tracing with c[i-1, w]. Otherwise item i is part of the solution, and we continue tracing with c[i-1, w-wi].
Analysis
This dynamic-0-1-knapsack algorithm takes Θ(nW) time, broken up as follows: Θ(nW) time to fill the c-table, which has (n+1)·(W+1) entries, each requiring Θ(1) time to compute, and O(n) time to trace the solution, because the tracing process starts in row n of the table and moves up one row at each step.
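A direct Python transcription of the recurrence and the traceback just described (0-indexed items; names ours):

```python
def knapsack_01(v, w, W):
    """0-1 knapsack: c[i][j] = best value using items 1..i with capacity j."""
    n = len(v)
    c = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, W + 1):
            if w[i - 1] <= j:
                c[i][j] = max(v[i - 1] + c[i - 1][j - w[i - 1]], c[i - 1][j])
            else:
                c[i][j] = c[i - 1][j]
    # Trace back: if c[i][j] != c[i-1][j], item i is part of the solution.
    items, j = [], W
    for i in range(n, 0, -1):
        if c[i][j] != c[i - 1][j]:
            items.append(i - 1)       # 0-indexed item number
            j -= w[i - 1]
    return c[n][W], items
```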
Greedy Solution to the
Fractional Knapsack Problem
There are n items in a store. For i = 1, 2, . . . , n, item i has weight wi > 0 and worth vi > 0. The thief can carry a maximum weight of W pounds in a knapsack. In this version of the problem the items can be broken into smaller pieces, so the thief may decide to carry only a fraction xi of object i, where 0 ≤ xi ≤ 1.
Item i contributes xiwi to the total weight in the knapsack, and xivi to the value of the load.
In symbols, the fractional knapsack problem can be stated as follows:
maximize Σ(i=1 to n) xi·vi subject to the constraint Σ(i=1 to n) xi·wi ≤ W
It is clear that an optimal solution must fill the knapsack exactly, for otherwise we could add a fraction of one of the remaining objects and increase the value of the load. Thus in an optimal solution Σ(i=1 to n) xi·wi = W.
Greedy-fractional-knapsack (w, v, W)
    FOR i = 1 TO n DO
        x[i] = 0
    weight = 0
    WHILE weight < W DO
        i = best remaining item       // largest v[i]/w[i]
        IF weight + w[i] ≤ W THEN
            x[i] = 1
            weight = weight + w[i]
        ELSE
            x[i] = (W - weight) / w[i]
            weight = W
    RETURN x
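A Python sketch of this procedure (names ours; instead of selecting the "best remaining item" inside the loop, the items are sorted by value/weight ratio once up front):

```python
def fractional_knapsack(v, w, W):
    """Greedy fractional knapsack: fill by decreasing v[i]/w[i] ratio."""
    order = sorted(range(len(v)), key=lambda i: v[i] / w[i], reverse=True)
    x = [0.0] * len(v)        # x[i] = fraction of item i taken
    weight = 0.0
    for i in order:
        if weight + w[i] <= W:
            x[i] = 1.0                     # take the whole item
            weight += w[i]
        else:
            x[i] = (W - weight) / w[i]     # take a fraction; knapsack is full
            break
    return x, sum(x[i] * v[i] for i in range(len(v)))
```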
Analysis
If the items are already sorted into decreasing order of vi / wi, then
the while-loop takes a time in O(n);
Therefore, the total time including the sort is in O(n log n).
If we keep the items in a heap with the largest vi/wi at the root, then
creating the heap takes O(n) time, and
each iteration of the while-loop takes O(log n) time (since the heap property must be restored after the removal of the root).
Although this data structure does not alter the worst case, it may be faster if only a small number of items are needed to fill the knapsack.
One variant of the 0-1 knapsack problem arises when the order of the items sorted by increasing weight is the same as their order sorted by decreasing value.
The optimal solution to this variant is to sort the items by value in decreasing order, then pick the most valuable item, which is also the lightest, provided its weight is less than the remaining capacity, and deduct its weight from the remaining capacity.
The second item to pick is the most valuable item among those remaining. Keep following the same strategy until the thief cannot carry any more items (due to weight).
Proof
One way to proof the correctness of the above algorithm is to prove the greedy choice property and optimal substructure property. It consist of two steps. First, prove that there exists an optimal solution begins with the greedy choice given
above. The second part prove that if A is an optimal solution to the original
problem S, then A - a is also an optimal solution to the problem S - s where
a is the item thief picked as in the greedy choice and S - s is the subproblem after the first greedy choice has been made. The second part is easy to prove since the more valuable items have less weight.
Note that if v` / w` , is not it can replace any other because w` < w, but it
increases the value because v` > v. □
Theorem The fractional knapsack problem has the greedy-choice property.
Proof: Let the ratio v'/w' be maximal. This supposition implies that v'/w' ≥ v/w for any pair (v, w), so (v'/w')·w ≥ v for any (v, w). Now suppose a solution does not contain the full w' weight of the best-ratio item. Then replacing an amount of the weight of any other item with more of w' will improve the value. □
An Activity Selection Problem
Activity selection is the problem of scheduling a resource among several competing activities.
Problem Statement
Given a set S of n activities, with si the start time and fi the finish time of the ith activity, find a maximum-size set of mutually compatible activities.
Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj) do not overlap, that is, i and j are compatible if si ≥ fj or sj ≥ fi.
Greedy Algorithm for Selection Problem
I. Sort the input activities by increasing finishing time. f1 ≤ f2 ≤ . . . ≤ fn
II. Call GREEDY-ACTIVITY-SELECTOR (s, f)
n = length[s]
A = {1}
j = 1
FOR i = 2 TO n DO
    IF si ≥ fj THEN
        A = A ∪ {i}
        j = i
RETURN A
Operation of the algorithm
Let 11 activities be given, S = {p, q, r, s, t, u, v, w, x, y, z}, with start and finish times for the proposed activities (1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13) and (12, 14).
A = {p}             initialization at line 2
A = {p, s}          line 6, 1st iteration of the FOR loop
A = {p, s, w}       line 6, 2nd iteration of the FOR loop
A = {p, s, w, z}    line 6, 3rd iteration of the FOR loop
Out of the FOR loop, return A = {p, s, w, z}
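The same selection can be reproduced with a short Python sketch of GREEDY-ACTIVITY-SELECTOR (0-indexed; names ours; activities assumed already sorted by finish time):

```python
def greedy_activity_selector(s, f):
    """s[i], f[i]: start and finish times, sorted by increasing f[i]."""
    A = [0]               # always select the first activity
    j = 0                 # index of the last selected activity
    for i in range(1, len(s)):
        if s[i] >= f[j]:  # activity i is compatible with activity j
            A.append(i)
            j = i
    return A

# The 11 activities from the example, sorted by finish time
s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
assert greedy_activity_selector(s, f) == [0, 3, 7, 10]   # p, s, w, z
```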
Analysis
Part I requires O(n lg n) time (use merge sort or heap sort).
Part II requires Θ(n) time, assuming that the activities were already sorted in Part I by their finish times.
Correctness
Note that greedy algorithms do not always produce optimal solutions, but GREEDY-ACTIVITY-SELECTOR does.
Theorem: Algorithm GREEDY-ACTIVITY-SELECTOR produces a solution of maximum size for the activity-selection problem.
Proof Idea: Show that the activity-selection problem satisfies
the greedy-choice property, and
the optimal-substructure property.
Proof
Let S = {1, 2, . . . , n} be the set of activities. Since the activities are in order by finish time, activity 1 has the earliest finish time.
Suppose A ⊆ S is an optimal solution, and let the activities in A be ordered by finish time.
Suppose the first activity in A is k.
If k = 1, then A begins with the greedy choice and we are done (or, to be very precise, there is nothing to prove here).
If k ≠ 1, we want to show that there is another optimal solution B that begins with the greedy choice, activity 1.
Let B = (A - {k}) ∪ {1}. Because f1 ≤ fk, the activities in B are disjoint, and since B has the same number of activities as A, i.e., |A| = |B|, B is also optimal.
Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem. If A is an optimal solution to the original problem S, then A' = A - {1} is an optimal solution to the activity-selection problem S' = {i ∈ S : si ≥ f1}.
Why? Because if we could find a solution B' to S' with more activities than A', adding 1 to B' would yield a solution B to S with more activities than A, thereby contradicting the optimality of A. □
As an example, consider the following problem: given a set of activities, schedule them all using a minimal number of lecture halls. In order to determine which activity should use which lecture hall, the algorithm uses GREEDY-ACTIVITY-SELECTOR to calculate the activities in the first lecture hall. If there are activities yet to be scheduled, a new lecture hall is selected and GREEDY-ACTIVITY-SELECTOR is called again. This continues until all activities have been scheduled.
LECTURE-HALL-ASSIGNMENT (s, f)
    n = length[s]
    FOR i = 1 TO n DO
        HALL[i] = NIL
    k = 1
    WHILE (NOT empty(s)) DO
        HALL[k] = GREEDY-ACTIVITY-SELECTOR (s, f, n)
        k = k + 1
    RETURN HALL
The following changes can be made in GREEDY-ACTIVITY-SELECTOR (s, f) (see CLR).
GREEDY-ACTIVITY-SELECTOR (s, f, n)
    j = first(s)
    A = {j}
    FOR i = j + 1 TO n DO
        IF s[i] ≠ "-" THEN
            IF s[i] ≥ f[j] THEN
                A = A ∪ {i}
                s[i] = "-"        // mark activity i as scheduled
                j = i
    RETURN A
Correctness
The algorithm can be shown to be correct and optimal. For a contradiction, assume the number of lecture halls is not optimal, that is, the algorithm allocates more halls than necessary. Then there exists a set of activities B that has been wrongly allocated: an activity b belonging to B that has been allocated to hall H[i] should optimally have been allocated to H[k]. This implies that the activities for lecture hall H[k] were not allocated optimally, which contradicts the fact that GREEDY-ACTIVITY-SELECTOR produces the optimal set of activities for a particular lecture hall.
Analysis
In the worst case, the number of lecture halls required is n. GREEDY-ACTIVITY-SELECTOR runs in Θ(n) time, so the running time of this algorithm is O(n^2).
Two important Observations
Choosing the activity of least duration will not always produce an optimal solution. For example, given the set of activities {(3, 5), (6, 8), (1, 4), (4, 7), (7, 10)}, either (3, 5) or (6, 8) will be picked first, which will prevent the optimal solution {(1, 4), (4, 7), (7, 10)} from being found.
Choosing the activity with the least overlap will not always produce an optimal solution either. For example, given the set of activities {(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10), (0, 3), (0, 2), (7, 10), (8, 10)}, the one with the least overlap with other activities is (4, 6), so it will be picked first. But that would prevent the optimal solution {(0, 1), (1, 5), (5, 9), (9, 10)} from being found.
Dynamic-Programming Algorithm for the Activity-Selection Problem
Huffman Codes
A Huffman code is a technique for compressing data. Huffman's greedy algorithm looks at the frequency of occurrence of each character and represents it as a binary string in an optimal way.
Example
Suppose we have a file consisting of 100,000 characters that we want to compress. The characters in the file occur with the following frequencies.
              a       b       c       d       e       f
Frequency   45,000  13,000  12,000  16,000   9,000   5,000
Consider the problem of designing a "binary character code" in which each character is represented by a unique binary string.
Fixed Length Code
A fixed-length code needs 3 bits to represent six (6) characters.
                    a       b       c       d       e       f
Frequency         45,000  13,000  12,000  16,000   9,000   5,000
Fixed-length code   000     001     010     011     100     101
This method requires 300,000 bits to code the entire file.
How do we get 300,000?
The total number of characters is 45,000 + 13,000 + 12,000 + 16,000 + 9,000 + 5,000 = 100,000.
Each character is assigned a 3-bit codeword => 3 × 100,000 = 300,000 bits.
Conclusion
The fixed-length code requires 300,000 bits while the variable-length code requires 224,000 bits.
=> Saving of approximately 25%.
Prefix Codes
A prefix code is one in which no codeword is a prefix of any other codeword. The reason prefix codes are desirable is that they simplify encoding (compression) and decoding.
Can we do better?
A variable-length code can do better by giving frequent characters short codewords and infrequent characters long codewords.
                       a       b       c       d       e       f
Frequency            45,000  13,000  12,000  16,000   9,000   5,000
Variable-length code    0     101     100     111    1101    1100
Character 'a' occurs 45,000 times; each occurrence is assigned a 1-bit codeword: 1 × 45,000 = 45,000 bits.
Characters b, c, d occur 13,000 + 12,000 + 16,000 = 41,000 times; each occurrence is assigned a 3-bit codeword: 3 × 41,000 = 123,000 bits.
Characters e, f occur 9,000 + 5,000 = 14,000 times; each occurrence is assigned a 4-bit codeword: 4 × 14,000 = 56,000 bits.
This implies that the total is 45,000 + 123,000 + 56,000 = 224,000 bits.
Encoding: Concatenate the codewords representing each character of the file.
String  Encoding
TEA     10 00 010
SEA     011 00 010
TEN     10 00 110
Example: From the variable-length code table, we encode the 3-character file abc as:

a    b    c
0    101  100  =>  0.101.100 = 0101100
Decoding
Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous. To decode (translate back to the original characters), identify the initial codeword, translate it back to its character, remove it from the encoded file, and repeat this parsing. For example, using the variable-length code table, the string 001011101 parses uniquely as 0.0.101.1101, which decodes to aabe.
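This parsing loop can be sketched in Python. Holding the codewords in a dictionary (rather than walking a tree) is an illustrative simplification of this sketch; the table is the variable-length code from the example.

```python
# A sketch of prefix-code decoding; the code table is the
# variable-length code from the example above.
CODE = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
DECODE = {bits: ch for ch, bits in CODE.items()}  # invert the table

def decode(bits):
    """Repeatedly strip the unique codeword at the front of the string."""
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in DECODE:          # a prefix code guarantees at most one match
            out.append(DECODE[buf])
            buf = ''
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return ''.join(out)

print(decode('001011101'))  # -> aabe
```

Because no codeword is a prefix of another, the first match found while scanning left to right is always the correct one, so no backtracking is needed.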
The representation of "decoding process" is binary tree, whose leaves are characters. We interpret the binary codeword for a character as path from the root to that character, where 0 means "go to the left child" and 1 means "go to
the right child". Note that an optimal code for a file is always represented by a full binary tree, in which every nonleaf node has two children.
Theorem A Binary tree that is not full cannot correspond to an optimal prefix code.
Proof Let T be a binary tree corresponding to a prefix code such that T is not full. Then there must exist an internal node, say x, that has only one child, y. Construct another binary tree, T`, which has the same leaves as T, and in which every leaf has the same depth as in T except for the leaves in the subtree rooted at y; those leaves have depth one less in T`, which implies T cannot correspond to an optimal prefix code. To obtain T`, simply merge x and y into a single node z, where z is a child of the parent of x (if a parent exists) and z is a parent of any children of y. Then T` has the desired properties: it corresponds to a code on the same alphabet, and the leaves in the subtree rooted at y in T have depth in T` strictly less (by one) than their depth in T. This completes the proof. □
Character             a       b       c       d       e      f
Frequency             45,000  13,000  12,000  16,000  9,000  5,000
Fixed-length code     000     001     010     011     100    101
Variable-length code  0       101     100     111     1101   1100
The fixed-length code is not optimal since its binary tree is not full.
Figure: tree for the fixed-length code (not full).
Figure: tree for the variable-length code; this is an optimal prefix code because the tree is a full binary tree.
From now on, we consider only full binary trees.
If C is the alphabet from which the characters are drawn, then the tree for an optimal prefix code has exactly |C| leaves (one for each letter) and exactly |C| - 1 internal nodes. Given a tree T corresponding to a prefix code, we can compute the number of bits required to encode a file. For each character c in C, let f(c) be the frequency of c and let dT(c) denote the depth of c's leaf in T. Note that dT(c) is also the length of c's codeword. The number of bits required to encode the file is

B(T) = Σ_{c in C} f(c) dT(c)

which we define as the cost of the tree T.
For example, the cost of the above tree (in thousands) is

B(T) = Σ_{c in C} f(c) dT(c) = 45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4 = 224

Therefore, the cost of the tree corresponding to the optimal prefix code is 224 × 1,000 = 224,000 bits.
Constructing a Huffman code
A greedy algorithm can construct an optimal prefix code, called a Huffman code. The algorithm builds the tree T corresponding to the optimal code in a
bottom-up manner. It begins with a set of |C| leaves and performs |C| - 1 "merging" operations to create the final tree.
Data structure used: priority queue Q

HUFFMAN(C)
    n = |C|
    Q = C
    for i = 1 to n - 1 do
        z = Allocate-Node()
        x = left[z] = EXTRACT-MIN(Q)
        y = right[z] = EXTRACT-MIN(Q)
        f[z] = f[x] + f[y]
        INSERT(Q, z)
    return EXTRACT-MIN(Q)
Analysis
Q is implemented as a binary heap.
Line 2 (Q = C) can be performed using BUILD-HEAP (p. 145, CLR) in O(n) time.
The FOR loop executes n - 1 times, and each heap operation requires O(lg n) time,
=> the FOR loop contributes (n - 1) · O(lg n)
=> O(n lg n).
Thus the total running time of HUFFMAN on a set of n characters is O(n lg n).
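The procedure can be sketched in Python, using the standard-library heapq module as the priority queue. The tuple-based tree representation and the tie-breaking counter are assumptions of this sketch, not part of the pseudocode above; the frequencies are the a-f example (in thousands).

```python
# A runnable sketch of HUFFMAN using heapq as the priority queue.
import heapq
from itertools import count

def huffman(freq):
    """Return {char: codeword} for an optimal prefix code."""
    tiebreak = count()  # keeps heap comparisons away from the tree tuples
    # each entry: (frequency, tiebreaker, tree); a leaf is just its character
    q = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(q)                      # BUILD-HEAP, O(n)
    for _ in range(len(freq) - 1):        # |C| - 1 merging operations
        fx, _, x = heapq.heappop(q)       # EXTRACT-MIN
        fy, _, y = heapq.heappop(q)       # EXTRACT-MIN
        heapq.heappush(q, (fx + fy, next(tiebreak), (x, y)))
    _, _, root = q[0]
    codes = {}
    def walk(node, path):                 # 0 = left child, 1 = right child
        if isinstance(node, str):
            codes[node] = path or '0'
        else:
            walk(node[0], path + '0')
            walk(node[1], path + '1')
    walk(root, '')
    return codes

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman(freq)
cost = sum(freq[ch] * len(codes[ch]) for ch in freq)
print(cost)  # cost of the optimal tree: 224 (thousands of bits)
```

Different tie-breaking can produce different codewords, but every Huffman tree for these frequencies has the same optimal cost, 224.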
Operation of the Algorithm
An optimal Huffman code for the following set of frequencies: a:1 b:1 c:2 d:3 e:5 f:8 g:13 h:21. Note that the frequencies are the Fibonacci numbers.
Since there are 8 letters in the alphabet, the initial queue size is n = 8, and 7 merge steps are required to build the tree. The final tree represents the optimal prefix code.
Figure
The codeword for a letter is the sequence of the edge labels on the path from the root to the letter. Thus, the optimal Huffman code is as follows:
h : 1
g : 1 0
f : 1 1 0
e : 1 1 1 0
d : 1 1 1 1 0
c : 1 1 1 1 1 0
b : 1 1 1 1 1 1 0
a : 1 1 1 1 1 1 1
As we can see, the tree is one long limb with leaves hanging off it. This is true for Fibonacci weights in general, because the Fibonacci recurrence

F_{i+1} = F_i + F_{i-1}

implies that Σ_{j=0}^{i} F_j = F_{i+2} - 1.

To prove this, write F_j as F_{j+1} - F_{j-1} and sum from 0 to i; the sum telescopes, using the convention F_{-1} = 0.
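A small runnable check of this identity, under the convention used here (F_0 = F_1 = 1, F_{-1} = 0):

```python
# Verify sum_{j=0}^{i} F_j = F_{i+2} - 1 for small i.
def fib(n):
    a, b = 1, 1          # F_0 = 1, F_1 = 1 under this convention
    for _ in range(n):
        a, b = b, a + b
    return a

for i in range(10):
    total = sum(fib(j) for j in range(i + 1))
    assert total == fib(i + 2) - 1
print("identity holds for i = 0..9")
```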
Correctness of Huffman Code Algorithm
Proof Idea
Step 1: Show that this problem satisfies the greedy choice property, that is, if a greedy choice is made by Huffman's algorithm, an optimal solution remains possible.
Step 2: Show that this problem has an optimal substructure property, that is, an optimal solution to Huffman's algorithm contains optimal solutions to subproblems.
Step 3: Conclude correctness of Huffman's algorithm using step 1 and step 2.
Lemma - Greedy Choice Property Let C be an alphabet in which each character c has frequency f[c]. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.
Proof Idea
Take the tree T representing an optimal prefix code and transform T into a tree T` representing another optimal prefix code such that the characters x and y appear as sibling leaves of maximum depth in T`. If we can do this, then their codewords will have the same length and differ only in the last bit.
Figures
Proof
Let b and c be sibling leaves of maximum depth in the tree T. Without loss of generality, assume that f[b] ≤ f[c] and f[x] ≤ f[y]. Since f[x] and f[y] are the two lowest leaf frequencies, in order, and f[b] and f[c] are arbitrary frequencies, in order, we have f[x] ≤ f[b] and f[y] ≤ f[c]. As shown in the figures above, exchange the positions of the leaves to get first T` (swapping x and b) and then T`` (swapping y and c). By the formula B(T) = Σ_{c in C} f(c) dT(c), the difference in cost between T and T` is

B(T) - B(T`) = f[x]dT(x) + f[b]dT(b) - f[x]dT`(x) - f[b]dT`(b)
             = f[x]dT(x) + f[b]dT(b) - f[x]dT(b) - f[b]dT(x)
             = (f[b] - f[x]) (dT(b) - dT(x))
             = (non-negative)(non-negative) ≥ 0
Two Important Points
The quantity f[b] - f[x] is non-negative because x is a minimum-frequency leaf in the tree T, and the quantity dT(b) - dT(x) is non-negative because b is a leaf of maximum depth in T. Similarly, exchanging y and c does not increase the cost, which implies that B(T`) - B(T``) ≥ 0. Together these facts imply that B(T``) ≤ B(T), and since T is optimal by supposition, B(T``) = B(T). Therefore, T`` is optimal, and in T`` the characters x and y are sibling leaves of maximum depth, from which the greedy-choice property follows. This completes the proof. □
Lemma - Optimal Substructure Property Let T be a full binary tree representing an optimal prefix code over an alphabet C, where frequency f[c] is defined for each character c in C. Consider any two characters x and y that appear as sibling leaves in the tree T, and let z be their parent. Then, considering z as a character with frequency f[z] = f[x] + f[y], the tree T` = T - {x, y} represents an optimal prefix code for the alphabet C` = (C - {x, y}) ∪ {z}.
Proof Idea
Figure
Proof
We show that the cost B(T) of tree T can be expressed in terms of the cost B(T`) of tree T`, by considering the component costs in the equation B(T) = Σ f(c) dT(c). For each c in C - {x, y}, we have dT(c) = dT`(c), and hence f[c]dT(c) = f[c]dT`(c). Since dT(x) = dT(y) = dT`(z) + 1, we have

f[x]dT(x) + f[y]dT(y) = (f[x] + f[y]) (dT`(z) + 1)
                      = f[z]dT`(z) + f[x] + f[y]

and therefore B(T) = B(T`) + f[x] + f[y].
If T` represents a non-optimal prefix code for C`, then there exists a tree T`` whose leaves are the characters in C` such that B(T``) < B(T`). Now, if x and y are added to T`` as children of z, then we get a prefix code for the alphabet C with cost B(T``) + f[x] + f[y] < B(T), contradicting the optimality of T. This implies that the tree T` must be optimal for the alphabet C`. □
Theorem Procedure HUFFMAN produces an optimal prefix code.
Proof
Let S be the set of integers n ≥ 2 for which the HUFFMAN procedure produces a tree representing an optimal prefix code for frequencies f and an alphabet C with |C| = n. If C = {x, y}, then HUFFMAN produces one of the following optimal trees.
figure
This clearly shows that 2 is a member of S. Next, assume that n belongs to S and show that n + 1 also belongs to S.
Let C be an alphabet with |C| = n + 1. By the lemma 'greedy choice property', there exists an optimal code tree T for the alphabet C. Without loss of generality, if x and y are the characters with minimal frequencies, then (a) x and y are at maximal depth in the tree T, and (b) x and y have a common parent z.
Suppose that T` = T - {x, y} and C` = (C - {x, y}) ∪ {z}. Then by the lemma 'optimal substructure property' (step 2), the tree T` is an optimal code tree for C`. Since |C`| = n and n belongs to S, the HUFFMAN procedure produces an optimal code tree T* for C`. Now let T** be the tree obtained from T* by attaching x and y as leaves of z.
Without loss of generality, T** is the tree constructed for C by the HUFFMAN procedure. Now suppose HUFFMAN selects a and b from the alphabet C in its first step
so that f[a] = f[x] and f[b] = f[y]. Then the tree constructed by HUFFMAN can be altered as in the proof of the lemma 'greedy choice property' to give an equivalent tree with x and y as siblings of maximum depth. Since T` and T* are both optimal for C`, we have B(T`) = B(T*), and also B(T**) = B(T). Why? Because

B(T**) = B(T*) - f[z]dT*(z) + (f[x] + f[y]) (dT*(z) + 1)
       = B(T*) + f[x] + f[y]

Since the tree T is optimal for the alphabet C, so is T**. And T** is the tree constructed by the HUFFMAN procedure. And this completes the proof. □
Theorem The total cost of a tree for a code can be computed as the sum, over all internal nodes, of the combined frequencies of the two children of the node.
Proof
Let T be a full binary tree with n leaves. We apply induction on the number of leaves in T. When n = 2 (the case n = 1 is trivially true), there are two leaves x and y (say) with the same parent z, and the cost of T is

B(T) = f[x]dT(x) + f[y]dT(y)
     = f[x] + f[y]                      since dT(x) = dT(y) = 1
     = f[child1 of z] + f[child2 of z]

Thus, the statement of the theorem is true for n = 2. Now suppose n > 2, and also suppose that the theorem is true for trees with n - 1 leaves.
Let c1 and c2 be two sibling leaves in T with the same parent p. Let T` be the tree obtained by deleting c1 and c2, so that p becomes a leaf of T` with frequency f[p] = f[c1] + f[c2]. By the induction hypothesis,

B(T`) = Σ_{leaves l` in T`} f[l`]dT`(l`)
      = Σ_{internal nodes i` in T`} (f[child1 of i`] + f[child2 of i`])

Using this information, we calculate the cost of T:

B(T) = Σ_{leaves l in T} f[l]dT(l)
     = Σ_{l ≠ c1, c2} f[l]dT(l) + f[c1](dT(c1) - 1) + f[c2](dT(c2) - 1) + f[c1] + f[c2]
     = Σ_{leaves l` in T`} f[l`]dT`(l`) + f[c1] + f[c2]
     = Σ_{internal nodes i` in T`} (f[child1 of i`] + f[child2 of i`]) + f[c1] + f[c2]
     = Σ_{internal nodes i in T} (f[child1 of i] + f[child2 of i])

Thus the statement is true. And this completes the proof. □
The question is whether Huffman's algorithm can be generalized to handle ternary codewords, that is, codewords using the symbols 0, 1, and 2. Restated, the question is whether some generalized version of Huffman's algorithm yields optimal ternary codes. Basically, the algorithm is similar to the binary-code example given in the CLR textbook: pick the three nodes (not two) that have the least frequencies, form a new node with frequency equal to the sum of these three frequencies, and repeat the procedure. However, when the number of nodes is an even number, a full ternary tree is not possible, so we take care of this by inserting a null node with zero frequency.
Correctness
The proof is immediate from the greedy choice property and the optimal substructure property. In other words, the proof is similar to the correctness proof of Huffman's algorithm in CLR.
Spanning Tree and
Minimum Spanning Tree
Spanning Trees
A spanning tree of a graph is any tree that includes every vertex in the graph. A little more formally, a spanning tree of a graph G is a subgraph of G that is a tree and contains all the vertices of G. An edge of a spanning tree is called a branch; an edge in the graph that is not in the spanning tree is called a chord. We construct a spanning tree whenever we want to find a simple, cheap and yet efficient way to connect a set of terminals (computers, cities, factories, etc.). Spanning trees are important for the following reasons.
Spanning trees construct a sparse subgraph that tells us a lot about the original graph.
Spanning trees are very important in designing efficient routing algorithms.
Some hard problems (e.g., the Steiner tree problem and the traveling salesman problem) can be solved approximately by using spanning trees.
Spanning trees have wide applications in many areas, such as network design.
Greedy Spanning Tree Algorithm
One of the most elegant spanning tree algorithms that I know of is as follows:
Examine the edges in graph in any arbitrary sequence.
Decide whether each edge will be included in the spanning tree.
Note that each time a step of the algorithm is performed, one edge is examined. If there is only a finite number of edges in the graph, the algorithm must halt after a finite number of steps. Thus, the time complexity of this
algorithm is clearly O(n), where n is the number of edges in the graph.
Some important facts about spanning trees are as follows:
Any two vertices in a tree are connected by a unique path.
Let T be a spanning tree of a graph G, and let e be an edge of G not in T. Then T + e contains a unique cycle.
Lemma The number of spanning trees in the complete graph Kn is n^(n-2).
Greediness It is easy to see that this algorithm has the property that each edge is examined at most once. Algorithms, like this one, which examine each entity at most once and decide its fate once and for all during that examination are called greedy algorithms. The obvious advantage of greedy approach is that we do not have to spend time reexamining entities.
Consider the problem of finding a spanning tree with the smallest possible weight or the largest possible weight, respectively called a minimum spanning tree and a maximum spanning tree. It is easy to see that if a graph possesses a spanning tree, it must have a minimum spanning tree and also a maximum spanning tree. These spanning trees can be constructed by performing the
spanning tree algorithm (e.g., above mentioned algorithm) with an appropriate ordering of the edges.
Minimum Spanning Tree Algorithm Perform the spanning tree algorithm (above) by examining the edges in order of nondecreasing weight (smallest first, largest last). If two or more edges have the same weight, order them arbitrarily.
Maximum Spanning Tree Algorithm Perform the spanning tree algorithm (above) by examining the edges in order of nonincreasing weight (largest first, smallest last). If two or more edges have the same weight, order them arbitrarily.
Minimum Spanning Trees
A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edge weights sum to the minimum. In other words, an MST is a tree formed from a subset of the edges of a given undirected graph, with two properties:
it spans the graph, i.e., it includes every vertex of the graph.
it is a minimum, i.e., the total weight of all the edges is as
low as possible.
Let G = (V, E) be a connected, undirected graph, where V is the set of vertices (nodes) and E is the set of edges. Each edge has a given nonnegative length.
Problem Find a subset T of the edges of G such that all the vertices remain connected when only the edges T are used, and the sum of the lengths of the edges in T is as small as possible.
Let G` = (V, T) be the partial graph formed by the vertices of G and the edges in T. [Note: a connected graph with n vertices must have at least n-1 edges, AND more than n-1 edges implies at least one cycle.] So n-1 is the minimum number of edges in T. Hence, if G` is connected and T has more than n-1
edges, we can remove at least one of these edges without disconnecting G` (choose an edge that is part of a cycle). This will decrease the total length of the edges in T.
To restate: since a connected graph of n nodes must have at least n-1 edges, if G` is connected and T has more than n-1 edges, then T contains at least one cycle. Remove an edge that is part of the cycle from T without disconnecting G`. This will decrease the total length of the edges in T; therefore, the new solution is preferable to the old one.
Thus, a T with n vertices and more than n-1 edges cannot be an optimal solution. It follows that an optimal T must have exactly n-1 edges, and since G` is connected, it must be a tree. This G` is called a minimum spanning tree (MST).
Kruskal's Algorithm
In Kruskal's algorithm, the selection function chooses edges in increasing order of length without worrying too much about their connection to previously chosen edges, except that an edge is never accepted if it would form a cycle. The result is a forest of trees that grows until all the trees in the forest (all the components) merge into a single tree.
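A minimal sketch of Kruskal's algorithm in Python. The edge format (weight, u, v) over vertices 0 . . n-1 and the union-find structure used for cycle detection are assumptions of this sketch.

```python
# Kruskal's algorithm: sort edges, accept each edge unless it
# would close a cycle (detected with a disjoint-set structure).
def kruskal(n, edges):
    parent = list(range(n))

    def find(v):                      # find the root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    mst = []
    for w, u, v in sorted(edges):     # edges in nondecreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # accept only if no cycle is formed
            parent[ru] = rv           # merge the two components
            mst.append((w, u, v))
    return mst

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
tree = kruskal(4, edges)
print(sum(w for w, _, _ in tree))  # -> 6, the total MST weight
```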
Prim's Algorithm
This algorithm was first proposed by Jarnik, but is typically attributed to Prim. It starts from an arbitrary vertex (the root) and at each stage adds a new branch (edge) to the tree already constructed; the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy in the sense that at each step the partial spanning tree is augmented with an edge that is the smallest among all possible adjacent edges.
MST-PRIM
Input: a weighted, undirected graph G = (V, E, w)
Output: a minimum spanning tree T

T = {}
Let r be an arbitrarily chosen vertex from V.
U = {r}
WHILE |U| < n DO
    Find u in U and v in V - U such that the edge (u, v) is a smallest edge between U and V - U
    T = T ∪ {(u, v)}
    U = U ∪ {v}
Analysis
The algorithm spends most of its time finding the smallest edge, so the running time depends on how we search for this edge.

Straightforward method: find the smallest edge by searching the adjacency lists of the vertices in V. In this case, each iteration costs O(m) time, yielding a total running time of O(mn).
Binary heap: by using binary heaps, the algorithm runs in O(m log n) time.

Fibonacci heap: by using Fibonacci heaps, the algorithm runs in O(m + n log n) time.
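MST-PRIM with a binary heap can be sketched in Python using the standard-library heapq module. The adjacency-list format {vertex: [(weight, neighbor), ...]} is an assumption of this sketch, and stale heap entries are skipped lazily instead of using a DECREASE-KEY operation.

```python
# Prim's algorithm: grow the tree from a root, always taking the
# smallest edge leaving the set U of vertices already in the tree.
import heapq

def prim(graph, root):
    in_tree = {root}                          # the set U
    tree = []                                 # edges chosen so far
    heap = [(w, root, v) for w, v in graph[root]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)         # smallest candidate edge
        if v in in_tree:
            continue                          # stale entry: v already in U
        in_tree.add(v)
        tree.append((w, u, v))
        for w2, x in graph[v]:                # new edges leaving U
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return tree

graph = {
    0: [(1, 1), (4, 2)],
    1: [(1, 0), (3, 2), (2, 3)],
    2: [(4, 0), (3, 1), (5, 3)],
    3: [(2, 1), (5, 2)],
}
tree = prim(graph, 0)
print(sum(w for w, _, _ in tree))  # -> 6, the total MST weight
```

The lazy-deletion variant keeps up to m entries in the heap, matching the O(m log n) bound quoted above for binary heaps.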
Dijkstra's Algorithm (Shortest Path)
Consider a directed graph G = (V, E).
Problem Determine the length of the shortest path from the source to each of the other nodes of the graph. This problem can be solved by a greedy algorithm often called Dijkstra's algorithm.
The algorithm maintains two sets of vertices, S and C. At every stage, the set S contains those vertices that have already been selected, and the set C contains all the other vertices; hence we have the invariant property V = S ∪ C. When the algorithm starts, S contains only the source vertex, and when the algorithm halts, S contains all the vertices of the graph and the problem is solved. At each step, the algorithm chooses the vertex in C whose distance to the source is least and adds it to S.
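A sketch of Dijkstra's algorithm in Python with a binary heap. The adjacency-list format {u: [(weight, v), ...]} is an assumption of this sketch, and the set named `done` plays the role of S.

```python
# Dijkstra: repeatedly select the unprocessed vertex closest to the
# source, then relax the edges leaving it.
import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    done = set()                        # the set S of selected vertices
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)      # vertex in C closest to the source
        if u in done:
            continue                    # stale heap entry
        done.add(u)
        for w, v in graph[u]:           # relax edges leaving u
            if v not in done and d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

graph = {
    'a': [(10, 'b'), (3, 'c')],
    'b': [(2, 'd')],
    'c': [(4, 'b'), (8, 'd')],
    'd': [],
}
dist = dijkstra(graph, 'a')
print(dist)  # shortest distances from 'a' to every vertex
```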
Divide-and-Conquer Algorithm
Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing the problem into smaller subproblems hoping that the solutions of the subproblems are easier to find and then composing the partial solutions into the solution of the original problem.
A little more formally, the divide-and-conquer paradigm consists of the following major phases:
Breaking the problem into several sub-problems that are similar to the original problem but smaller in size,
Solving the sub-problems recursively (successively and independently), and then
Combining the solutions of the subproblems to create a solution to the original problem.
Binary Search (simplest application of divide-and-conquer)
Binary search is an extremely well-known instance of the divide-and-conquer paradigm. Given an ordered array of n elements, the basic idea of binary search is that, for a given element, we "probe" the middle element of the array. We continue in either the lower or the upper segment of the array, depending on the outcome of the probe, until we reach the required (given) element.
Problem Let A[1 . . n] be an array sorted in non-decreasing order; that is, A[i] ≤ A[j] whenever 1 ≤ i ≤ j ≤ n. Let q be the query point. The problem consists of finding q in the array A. If q is not in A, then find the position where q might be inserted. Formally, find the index i such that 1 ≤ i ≤ n + 1 and A[i-1] < q ≤ A[i].
Sequential Search
Look sequentially at each element of A until either we reach the end of the array A or find an item no smaller than q.
Sequential search for 'q' in array A
for i = 1 to n do
    if A[i] ≥ q then
        return index i
return n + 1
Analysis
This algorithm clearly takes θ(r) time, where r is the index returned. This is Ω(n) in the worst case and O(1) in the best case.
If the elements of the array A are distinct and the query point q is indeed in the array, then the loop executes (n + 1)/2 times on average. On average (as well as in the worst case), sequential search therefore takes θ(n) time.
Binary Search
Look for q either in the first half or in the second half of the array A. Compare q to the element in the middle of the array: let k = ⌊n/2⌋. If q ≤ A[k], then search in A[1 . . k]; otherwise search A[k+1 . . n] for q.

Binary search for q in subarray A[i . . j], with the promise that A[i-1] < q ≤ A[j]:

if i = j then
    return i (the index)
k = ⌊(i + j)/2⌋
if q ≤ A[k] then
    return BinarySearch(A[i . . k], q)
else
    return BinarySearch(A[k+1 . . j], q)
Analysis
Binary search can be accomplished in logarithmic time in the worst case, i.e., T(n) = θ(log n). Note that this version of binary search takes logarithmic time even in the best case.
Iterative Version of Binary Search
Iterative binary search for q in array A[1 . . n]:

if q > A[n] then
    return n + 1
i = 1
j = n
while i < j do
    k = ⌊(i + j)/2⌋
    if q ≤ A[k] then
        j = k
    else
        i = k + 1
return i (the index)
Analysis
The analysis of Iterative algorithm is identical to that of its recursive counterpart.
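The iterative pseudocode can be sketched as a Python function, adapted to 0-based indexing (so the "not found" return value is len(a) rather than n + 1):

```python
# Iterative binary search: return the smallest index i with
# q <= a[i], or len(a) if q is larger than every element.
def binary_search(a, q):
    if not a or q > a[-1]:
        return len(a)
    i, j = 0, len(a) - 1
    while i < j:
        k = (i + j) // 2
        if q <= a[k]:
            j = k          # q lies in a[i..k]
        else:
            i = k + 1      # q lies in a[k+1..j]
    return i

a = [2, 3, 5, 7, 11, 13]
print(binary_search(a, 7))   # -> 3
print(binary_search(a, 6))   # -> 3  (insertion point for 6)
print(binary_search(a, 17))  # -> 6  (past the end)
```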
Dynamic Programming Algorithms
Dynamic programming is a stage-wise search method suitable for optimization problems whose solutions may be viewed as the result of a sequence of decisions. The most attractive property of this strategy is that during the search for a solution it avoids full enumeration by pruning early those partial decision solutions that cannot possibly lead to an optimal solution. In many practical situations, this strategy hits the optimal solution in a polynomial number of decision steps. However, in the worst case, such a strategy may end up performing full enumeration.
Dynamic programming takes advantage of the duplication of subproblems and arranges to solve each subproblem only once, saving the solution (in a table or the like) for later use. The underlying idea of dynamic programming is to avoid calculating the same thing twice, usually by keeping a table of known results of subproblems. Unlike divide-and-conquer, which solves the subproblems top-down, dynamic programming is a bottom-up technique.
Bottom-up means:
i. Start with the smallest subproblems.
ii. Combining their solutions, obtain the solutions to subproblems of increasing size.
iii. Continue until we arrive at the solution of the original problem.
The Principle of Optimality
Dynamic programming relies on a principle of optimality. This principle states that in an optimal sequence of decisions or choices, each subsequence must also be optimal. For example, in the matrix chain multiplication problem, not only is the value we are interested in optimal, but all the other entries in the table also represent optimal values.
The principle can be restated as follows: the optimal solution to a problem is a combination of optimal solutions to some of its subproblems.
The difficulty in turning the principle of optimality into an algorithm is that it is not usually obvious which subproblems are relevant to the problem under consideration.
Dynamic-Programming Solution
to the 0-1 Knapsack Problem
Problem Statement A thief robbing a store can carry a maximal weight of W in their knapsack. There are n items; the ith item weighs wi and is worth vi dollars. Which items should the thief take?
There are two versions of problem
Fractional knapsack problem The setup is the same, but the thief can take fractions of items, meaning that the items can be broken into smaller pieces, so that the thief may decide to carry only a fraction xi of item i, where 0 ≤ xi ≤ 1.
0-1 knapsack problem The setup is the same, but the items may not be broken into smaller pieces, so thief may decide either to take an item or to leave it
(binary choice), but may not take a fraction of an item.
Fractional knapsack problem
Exhibits the greedy choice property.
A greedy algorithm exists.
Exhibits the optimal substructure property.
0-1 knapsack problem
Does not exhibit the greedy choice property.
No greedy algorithm exists.
Exhibits the optimal substructure property.
Only a dynamic programming algorithm exists.
Dynamic-Programming Solution to the 0-1 Knapsack Problem
Let i be the highest-numbered item in an optimal solution S for W pounds. Then S` = S - {i} is an optimal solution for W - wi pounds, and the value of the solution S is vi plus the value of the subproblem solution S`.
We can express this fact in the following formula: define c[i, w] to be the value of the solution for items 1, 2, . . . , i and maximum weight w. Then

c[i, w] = 0   if i = 0 or w = 0
c[i, w] = c[i-1, w]                             if i > 0 and wi > w
c[i, w] = max(vi + c[i-1, w-wi], c[i-1, w])     if i > 0 and w ≥ wi
This says that the value of a solution for i items either includes item i, in which case it is vi plus a subproblem solution for (i - 1) items and the weight excluding wi, or does not include item i, in which case it is a subproblem solution for (i - 1) items and the same weight. That is, if the thief picks item i, the thief takes vi in value and can choose from items 1, 2, . . . , i - 1 up to the weight limit w - wi, getting c[i-1, w-wi] additional value. On the other hand, if the thief decides not to take item i, the thief can choose from items 1, 2, . . . , i - 1 up to the weight limit w, getting c[i-1, w] value. The better of these two choices should be made.
The above formula for c is similar to the LCS formula: boundary values are 0, and other values are computed from the input and "earlier" values of c. So the 0-1 knapsack algorithm is like the LCS-length algorithm given in CLR for finding a longest common subsequence of two sequences.
The algorithm takes as input the maximum weight W, the number of items n,
and the two sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . , wn>. It stores the c[i, j] values in a table, that is, a two-dimensional array c[0 . . n, 0 . . W], whose entries are computed in row-major order. That is, the first row of c is filled in from left to right, then the second row, and so on. At the end of the computation, c[n, W] contains the maximum value the thief can pack into the knapsack.
Dynamic-0-1-knapsack (v, w, n, W)
FOR w = 0 TO W DO
    c[0, w] = 0
FOR i = 1 TO n DO
    c[i, 0] = 0
    FOR w = 1 TO W DO
        IF wi ≤ w THEN
            IF vi + c[i-1, w-wi] > c[i-1, w] THEN
                c[i, w] = vi + c[i-1, w-wi]
            ELSE
                c[i, w] = c[i-1, w]
        ELSE
            c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, W] and tracing backwards where the optimal values came from. If c[i, w] = c[i-1, w], item i is not part of the solution, and we continue tracing with c[i-1, w]. Otherwise, item i is part of the solution, and we continue tracing with c[i-1, w-wi].
Analysis
This dynamic-0-1-knapsack algorithm takes θ(nW) time, broken up as follows: θ(nW) time to fill the c-table, which has (n + 1)·(W + 1) entries, each requiring θ(1) time to compute, and O(n) time to trace the solution, because the tracing process starts in row n of the table and moves up one row at each step.
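The table-filling loop and the backward trace can be sketched in Python, using 0-based indexing instead of the notes' 1-based arrays; the three items used to exercise it are a made-up example.

```python
# 0-1 knapsack DP: c[i][cap] is the best value achievable using
# items 0..i-1 under capacity cap, then trace back the chosen items.
def knapsack_01(v, w, W):
    n = len(v)
    c = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for cap in range(1, W + 1):
            if w[i - 1] <= cap:
                c[i][cap] = max(v[i - 1] + c[i - 1][cap - w[i - 1]],
                                c[i - 1][cap])
            else:
                c[i][cap] = c[i - 1][cap]
    # trace backwards to recover the chosen items
    take, cap = [], W
    for i in range(n, 0, -1):
        if c[i][cap] != c[i - 1][cap]:   # item i is part of the solution
            take.append(i - 1)
            cap -= w[i - 1]
    return c[n][W], sorted(take)

best, items = knapsack_01(v=[60, 100, 120], w=[10, 20, 30], W=50)
print(best, items)  # -> 220 [1, 2]
```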
Dynamic-Programming Algorithm
for the Activity-Selection Problem
Activity selection is the problem of scheduling a resource among several competing activities.
Problem Statement Given a set S of n activities, with si the start time and fi the finish time of the ith activity, find the maximum-size set of mutually compatible activities.
Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj) do not overlap; that is, i and j are compatible if si ≥ fj or sj ≥ fi.
Dynamic-Programming Algorithm
The finish times are in a sorted array f[i], and the start times are in an array s[i]. The array m[i] will store the value mi, where mi is the size of the largest set of mutually compatible activities among activities {1, 2, . . . , i}. Let
BINARY-SEARCH(f, s) return the largest index i in the sorted array f such that f[i] ≤ s; that is, f[i] ≤ s < f[i + 1].
for i = 1 to n do
    m[i] = max(m[i-1], 1 + m[BINARY-SEARCH(f, s[i])])

We have P[i] = 1 if activity i is in the optimal selection, and P[i] = 0 otherwise.

i = n
while i > 0 do
    if m[i] = m[i-1] then
        P[i] = 0
        i = i - 1
    else
        P[i] = 1
        i = BINARY-SEARCH(f, s[i])
Analysis
The running time of this algorithm is O(n lg n) because of the binary search, which takes O(lg n) time per activity, as opposed to the O(n) running time of the greedy algorithm. Like the greedy algorithm, it assumes that the activities are already sorted by increasing finish time.
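The DP above can be sketched in Python, using the standard-library bisect module as BINARY-SEARCH; returning only the size m[n] (not the selection array P) is a simplification of this sketch.

```python
# m[i] = max(m[i-1], 1 + m[BINARY-SEARCH(f, s_i)]), with activities
# sorted by finish time; bisect_right finds the largest j with f[j] <= s.
import bisect

def max_compatible(activities):
    acts = sorted(activities, key=lambda a: a[1])   # sort by finish time
    f = [fin for _, fin in acts]
    m = [0] * (len(acts) + 1)          # m[i]: best among the first i
    for i in range(1, len(acts) + 1):
        s = acts[i - 1][0]
        # number of earlier activities finishing by time s
        j = bisect.bisect_right(f, s, 0, i - 1)
        m[i] = max(m[i - 1], 1 + m[j])
    return m[len(acts)]

acts = [(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9),
        (9, 10), (0, 3), (0, 2), (7, 10), (8, 10)]
print(max_compatible(acts))  # -> 4, e.g. (0,1), (1,5), (5,9), (9,10)
```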
Amortized Analysis
In amortized analysis, the time required to perform a sequence of data structure operations is averaged over all operations performed. Amortized analysis can be used to show that the average cost of an operation is small, if one averages over a sequence of operations, even though a single operation might be expensive. Unlike average-case analysis over a probability distribution of inputs, amortized analysis guarantees the 'average' performance of each operation in the worst case.
CLR covers the three most common techniques used in amortized analysis. The main difference between them is the way the cost is assigned.
1. Aggregate Method
o Computes an upper bound T(n) on the total cost of a sequence of n operations.
2. Accounting Method
o Overcharge some operations early in the sequence. This 'overcharge' is used later in the sequence to pay for operations that are charged less than they actually cost.
3. Potential Method
o Maintain the credit as the potential energy to pay for future operations.
Aggregate Method
Aggregate Method Characteristics
It computes the worst case time T(n) for a sequence of n operations.
The amortized cost is T(n)/n per operation.
It gives the average performance of each operation in the worst case.
This method is less precise than other methods, as all operations are assigned the same cost.
Application 1: Stack operations
In the following pseudocode, the operation STACK-EMPTY returns TRUE if there are no objects currently on the stack, and FALSE otherwise.
MULTIPOP(s, k)
    while (.NOT. STACK-EMPTY(s) and k ≠ 0) do
        POP(s)
        k = k - 1
Analysis
i. The worst-case cost of a single MULTIPOP is O(n), so n successive calls to MULTIPOP would appear to cost O(n2). This O(n2) bound is unfair, because each item can in fact be popped at most once for each time it is pushed.
ii. In a sequence of n mixed operations, the total number of pops (including those performed inside MULTIPOP) is at most the total number of pushes, which is at most n. Since the cost of each PUSH and POP is O(1), the cost of n stack operations is O(n). Therefore, the amortized cost of an operation is the average: O(n)/n = O(1).
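The aggregate argument can be illustrated with a small runnable sketch; the random operation mix is an arbitrary choice made for this illustration.

```python
# Over any sequence of n stack operations, total pops never exceed
# total pushes, so the total work is O(n) even with MULTIPOP.
import random

class CountingStack:
    def __init__(self):
        self.items = []
        self.pushes = 0
        self.pops = 0

    def push(self, x):
        self.items.append(x)
        self.pushes += 1

    def multipop(self, k):
        while self.items and k > 0:   # MULTIPOP from the pseudocode
            self.items.pop()
            self.pops += 1
            k -= 1

random.seed(1)
s = CountingStack()
n = 1000
for _ in range(n):
    if random.random() < 0.5:
        s.push(0)
    else:
        s.multipop(random.randrange(10))
# each item is popped at most once per push
print(s.pops <= s.pushes <= n)  # -> True
```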
Application 2: Binary Counter
We use an array A[0 . . k-1] of bits, where length[A] = k, as the counter. A binary number x that is stored in the counter has its lowest-order bit in A[0] and its highest-order bit in A[k-1], so that

x = Σ_{i=0}^{k-1} A[i] · 2^i

Initially, x = 0, and thus A[i] = 0 for i = 0, 1, . . ., k-1. To add 1 (modulo 2^k) to the value in the counter, use the following pseudocode.

INCREMENT (A)
i = 0
while i < length[A] and A[i] = 1
    do A[i] = 0
       i = i + 1
if i < length[A]
    then A[i] = 1
A single execution of INCREMENT takes O(k) time in the worst case, when the array A contains all 1's. Thus, a sequence of n INCREMENT operations on an initially zero counter takes O(nk) time in the worst case. This bound is correct but not tight.
Amortized Analysis
We can tighten the analysis to get a worst-case cost for a sequence of n INCREMENTs by observing that not all bits flip each time INCREMENT is called:

Bit A[0] flips n times (on every call).
Bit A[1] flips ⌊n/2⌋ times (on every other call).
Bit A[2] flips ⌊n/4⌋ times.
. . .
Bit A[i] flips ⌊n/2^i⌋ times.

In general, for i = 0, 1, . . ., ⌊lg n⌋, bit A[i] flips ⌊n/2^i⌋ times in a sequence of n INCREMENT operations on an initially zero counter. For i > ⌊lg n⌋, bit A[i] never flips at all. The total number of flips in the sequence is thus

Σ_{i=0}^{⌊lg n⌋} ⌊n/2^i⌋ < n Σ_{i=0}^{∞} 1/2^i = 2n

Therefore, the worst-case time for a sequence of n INCREMENT operations on an initially zero counter is O(n), so the amortized cost of each operation is O(n)/n = O(1).
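The flip-counting argument can be verified directly. A Python sketch (names mine, not from the notes) that counts the actual flips over n INCREMENTs of a k-bit counter:

```python
# Sketch: total bit flips over n INCREMENTs of a k-bit counter.
# Bit A[i] flips floor(n / 2^i) times, so the total is < 2n.

def increment(A):
    """CLR-style INCREMENT; returns the number of bits flipped."""
    i = 0
    while i < len(A) and A[i] == 1:
        A[i] = 0                 # reset a run of trailing 1's
        i += 1
    flips = i
    if i < len(A):
        A[i] = 1                 # set at most one bit to 1
        flips += 1
    return flips

k, n = 16, 1000
A = [0] * k
total = sum(increment(A) for _ in range(n))
assert total < 2 * n             # amortized cost per INCREMENT is O(1)
```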
Accounting Method
In this method, we assign charges to the different operations, with some operations charged more or less than they actually cost. In other words, we assign artificial charges to the different operations.

Any overcharge for an operation on an item is stored (in a bank account) reserved for that item.

Later, a different operation on that item can pay for its cost with the credit stored for that item.

The balance in the (bank) account is not allowed to become negative.

The sum of the amortized costs for any sequence of operations is then an upper bound on the actual total cost of these operations.

The amortized cost of each operation must be chosen wisely, so that each operation is paid for at or before the time its cost is incurred.
Application 1: Stack Operation
Recall the actual costs of stack operations were:
PUSH (s, x)        1
POP (s)            1
MULTIPOP (s, k)    min(k, s)
The amortized cost assignments are
PUSH        2
POP         0
MULTIPOP    0
Observe that the amortized cost of each operation is O(1). We must show that one can pay for any sequence of stack operations by charging the amortized costs.
The two units of cost collected for each PUSH are used as follows:

1 unit is used to pay the cost of the PUSH itself.

1 unit is collected in advance to pay for a potential future POP.
Therefore, for any sequence of n PUSH, POP, and MULTIPOP operations, the total amortized cost, and hence the total actual cost, is O(n).

As an example, consider a sequence of n operations performed on a data structure, where the ith operation costs i if i is an exact power of 2, and 1 otherwise. The accounting method determines the amortized cost per operation as follows. Let the amortized cost per operation be 3. The credit Ci after the ith operation is

Ci = Σ_{j=1}^{i} 3 - (actual cost of the first i operations)
   = 3i - (2^{⌊lg i⌋+1} + i - ⌊lg i⌋ - 2)

If i = 2^k, where k ≥ 0, then

Ci = 3i - (2^{k+1} + i - k - 2) = k + 2

If i = 2^k + j, where k ≥ 0 and 1 ≤ j ≤ 2^k, then

Ci = 3i - (2^{k+1} + i - k - 2) = 2j + k + 2

Since k ≥ 0 and j ≥ 1, the credit Ci is always greater than zero. Hence, the total amortized cost 3n, which is O(n), is an upper bound on the total actual cost, and the amortized cost of each operation is O(n)/n = O(1).
As another example, consider a sequence of stack operations on a stack whose size never exceeds k. After every k operations, a copy of the entire stack is made. We must show that the cost of n stack operations, including copying the stack, is O(n), by assigning suitable amortized costs to the various stack operations. There are, of course, many ways to assign amortized costs to stack operations. One way is:

PUSH 4, POP 0, MULTIPOP 0, STACK-COPY 0.

Every time we PUSH, we pay 1 dollar (unit) to perform the actual operation and put 1 dollar in the bank. That leaves us with 2 dollars, which are placed on the pushed element x. When we POP the element x off the stack, one of the two dollars pays for the POP operation and the other dollar is again put into the bank. The money in the bank is used to pay for the STACK-COPY operations. Since every operation deposits a dollar, after k operations there are at least k dollars in the bank, and since the stack size never exceeds k, there is enough money in the bank to pay for each STACK-COPY. The cost of n stack operations, including copying the stack, is therefore O(n).
Application 2: Binary Counter
We observed in the aggregate method that the running time of an INCREMENT operation on a binary counter is proportional to the number of bits flipped. We shall use this running time as our cost here.

For the amortized analysis, charge an amortized cost of 2 dollars to set a bit to 1. When a bit is set, use 1 dollar (out of the 2 dollars charged) to pay for actually setting the bit, and place the other dollar on the bit as credit, so that when the bit is later reset to zero, we need not charge anything.
The amortized cost of the pseudocode INCREMENT can now be evaluated:
INCREMENT (A)
1. i = 0
2. while i < length[A] and A[i] = 1
3.     do A[i] = 0
4.        i = i + 1
5. if i < length[A]
6.     then A[i] = 1
Within the while loop, the cost of resetting the bits is paid for by the dollars on the bits that are reset. At most one bit is set, in line 6 above, and therefore the amortized cost of an INCREMENT operation is at most 2 dollars (units). Thus, for n INCREMENT operations, the total amortized cost is O(n), which bounds the total actual cost.
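The credit invariant (every 1 bit carries exactly one stored dollar) can be checked mechanically. A Python sketch, where the `credit` array is my bookkeeping device and not part of CLR:

```python
# Sketch: verify the accounting argument for the binary counter.
# Charge $2 when a bit is set to 1: $1 pays for the set, $1 stays on
# the bit as credit and later pays for resetting it to 0.

def increment_with_credit(A, credit):
    """One INCREMENT; returns the actual number of bits flipped."""
    flips, i = 0, 0
    while i < len(A) and A[i] == 1:
        A[i] = 0
        flips += 1
        credit[i] -= 1            # reset paid by the dollar on the bit
        assert credit[i] == 0     # the balance never goes negative
        i += 1
    if i < len(A):
        A[i] = 1
        flips += 1
        credit[i] += 1            # store $1 of the $2 charge on the bit
    return flips

A, credit = [0] * 8, [0] * 8
n = 100
actual = sum(increment_with_credit(A, credit) for _ in range(n))
assert actual <= 2 * n            # total amortized charge bounds actual cost
```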
Consider a Variant
Let us implement a binary counter as a bit vector so that any sequence of n INCREMENT and RESET operations takes O(n) time on an initially zero counter. The goal here is not only to increment the counter but also to reset it to zero, that is, to make all bits in the binary counter zero. A new field, max[A], holds the index of the high-order 1 in A. Initially, max[A] is set to -1. max[A] is then updated appropriately whenever the counter is incremented (or reset). To bound the cost of RESET, we limit it to an amount that can be covered by earlier INCREMENTs.
INCREMENT (A)
1. i = 0
2. while i < length[A] and A[i] = 1
3.     do A[i] = 0
4.        i = i + 1
5. if i < length[A]
6.     then A[i] = 1
7.          if i > max[A]
8.             then max[A] = i
9.     else max[A] = -1
Note that lines 7, 8 and 9 are added in the CLR algorithm of binary counter.
RESET(A)
for i = 0 to max[A]
    do A[i] = 0
max[A] = -1
For the counter in CLR we assumed that it costs 1 dollar to flip a bit. In addition, we assume that 1 dollar is needed to update max[A].

Setting and resetting of bits work exactly as for the binary counter in CLR: pay 1 dollar to set a bit to 1, and place another dollar on the same bit as credit, so that the credit on each bit pays to reset that bit during later increments. In addition, use 1 dollar to update max[A], and if max[A] increases, place 1 dollar as credit on the new high-order 1. (If max[A] does not increase, that dollar is simply wasted.)

Since RESET manipulates only bits at positions up to max[A], and every 1 bit seen by RESET has a dollar of credit on it, the zeroing of bits by RESET can be completely paid for by the credit stored on the bits. We need just one more dollar to pay for resetting max[A]. Thus, charging 4 dollars for each INCREMENT and 1 dollar for each RESET is sufficient, so a sequence of n INCREMENT and RESET operations takes O(n) amortized time.
Potential Method
This method stores pre-payments as potential or potential energy that can be released to pay for future operations. The stored potential is associated with the entire data structure rather than specific objects within the data structure.
Notation:
D0 is the initial data structure (e.g., stack)
Di is the data structure after the ith operation.
ci is the actual cost of the ith operation.
The potential function Ψ maps each Di to its potential value Ψ(Di).
The amortized cost ^ci of the ith operation w.r.t potential function Ψ is defined by
^ci = ci + Ψ(Di) - Ψ (Di-1) --------- (1)
The amortized cost of each operation is therefore
^ci = [actual operation cost] + [change in potential].
By equation (1), the total amortized cost of the n operations is

Σ_{i=1}^{n} ^ci = Σ_{i=1}^{n} (ci + Ψ(Di) - Ψ(Di-1))
              = Σ_{i=1}^{n} ci + Σ_{i=1}^{n} Ψ(Di) - Σ_{i=1}^{n} Ψ(Di-1)
              = Σ_{i=1}^{n} ci + Ψ(Dn) - Ψ(D0)   ----------- (2)

since the potential terms telescope: every Ψ(Di) with 0 < i < n appears once with each sign.

If we define a potential function Ψ so that Ψ(Dn) ≥ Ψ(D0), then the total amortized cost Σ_{i=1}^{n} ^ci is an upper bound on the total actual cost.
As an example, consider a sequence of n operations performed on a data structure, where the ith operation costs i if i is an exact power of 2, and 1 otherwise. The potential method determines the amortized cost per operation as follows.

Let Ψ(Di) = 2i - 2^{⌊lg i⌋+1} + 1 for i > 0, and Ψ(D0) = 0. Since 2^{⌊lg i⌋+1} ≤ 2i for i > 0, we have Ψ(Di) ≥ 0 = Ψ(D0).

If i = 2^k, where k ≥ 0, then 2^{⌊lg i⌋+1} = 2^{k+1} = 2i and 2^{⌊lg(i-1)⌋+1} = 2^k = i, so

^ci = ci + Ψ(Di) - Ψ(Di-1)
    = i + (2i - 2i + 1) - (2(i-1) - i + 1) = 2

If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then 2^{⌊lg i⌋+1} = 2^{⌊lg(i-1)⌋+1}, so

^ci = ci + Ψ(Di) - Ψ(Di-1)
    = 1 + (2i - 2^{⌊lg i⌋+1} + 1) - (2(i-1) - 2^{⌊lg i⌋+1} + 1) = 3

Because Σ_{i=1}^{n} ^ci = Σ_{i=1}^{n} ci + Ψ(Dn) - Ψ(D0) and Ψ(Dn) ≥ Ψ(D0), the total amortized cost of the n operations is an upper bound on the total actual cost. Therefore, the total amortized cost of the sequence is O(n), and the amortized cost per operation is O(n)/n = O(1).
Application 1- Stack Operations
Define the potential function Ψ on a stack to be the number of objects in the stack. For the empty stack D0, we have Ψ(D0) = 0. Since the number of objects in the stack cannot be negative, the stack Di after the ith operation has nonnegative potential, and thus
Ψ(Di) ≥ 0 = Ψ(D0).
Therefore, the total amortized cost of n operations w.r.t. function Ψ represents an upper bound on the actual cost.
Amortized costs of stack operations are:
PUSH
If the ith operation on a stack containing s objects is a PUSH operation, then the potential difference is

Ψ(Di) - Ψ(Di-1) = (s + 1) - s = 1

In simple words: after a PUSH, the stack holds one more object than before. By equation (1), the amortized cost of this PUSH operation is

^ci = ci + Ψ(Di) - Ψ(Di-1) = 1 + 1 = 2
MULTIPOP
If the ith operation on the stack is MULTIPOP(S, k), then k` = min(k, s) objects are popped off the stack. The actual cost of the operation is k`, and the potential difference is

Ψ(Di) - Ψ(Di-1) = -k`

Why is this negative? Because we are taking items off the stack. Thus, the amortized cost of the MULTIPOP operation is

^ci = ci + Ψ(Di) - Ψ(Di-1) = k` - k` = 0
POP
Similarly, the amortized cost of a POP operation is 0.
Analysis
Since amortized cost of each of the three operations is O(1), therefore, the total amortized cost of n operations is O(n). The total amortized cost of n operations is an upper bound on the total actual cost.
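As a sanity check on equation (1), the following Python sketch (helper name and operation encoding are mine) computes ^ci = ci + Ψ(Di) - Ψ(Di-1) for a small operation sequence, with Ψ = number of objects on the stack:

```python
# Sketch: amortized costs of stack operations under the potential
# function Psi(D) = stack size. PUSH amortizes to 2, MULTIPOP to 0.

def amortized_costs(ops):
    """Return the list of amortized costs c_i + Psi(D_i) - Psi(D_{i-1})."""
    stack, out = [], []
    for op in ops:
        phi_before = len(stack)
        if op[0] == 'push':
            stack.append(op[1])
            actual = 1
        else:                                  # ('multipop', k)
            popped = min(op[1], len(stack))
            del stack[len(stack) - popped:]
            actual = popped                    # actual cost is k' = min(k, s)
        out.append(actual + len(stack) - phi_before)
    return out

ops = [('push', 1), ('push', 2), ('push', 3), ('multipop', 2),
       ('push', 4), ('multipop', 10)]
assert amortized_costs(ops) == [2, 2, 2, 0, 2, 0]
```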
Lemma: If the data structure is a binary heap, then there is a potential function Ψ with Ψ(D) = Θ(n lg n) such that the amortized cost of EXTRACT-MIN is constant.
Proof
We know that the amortized cost ^ci of operation i is defined as
^ci = ci + Ψ(Di) - Ψ(Di-1)
For the heap operations, this gives us
c1lgn = c2lg(n+c3) + Ψ(Di) - Ψ(Di-1) (Insert) ------------(1)
c4 = c5lg(n + c6) + Ψ(Di) - Ψ(Di-1) (EXTRACT-MIN) -----(2)
Consider the potential function Ψ(D) = lg(n!), where n is the number of items in D.
From equation (1), we have
c1 lg n - c2 lg(n + c3) = lg(n!) - lg((n-1)!) = lg n.

This holds if c3 = 0 and c1 = c2 + 1.
From equation (2), we have
c4 - c5 lg(n + c6) = lg(n!) - lg((n+1)!) = -lg(n+1).

This holds if c4 = 0 and c5 = c6 = 1.
Recall that Stirling's approximation gives lg(n!) = Θ(n lg n), so

Ψ(D) = Θ(n lg n)
And this completes the proof.
Application 2: Binary Counter
Define the potential of the counter after the ith INCREMENT operation to be bi, the number of 1's in the counter after the ith operation.

Suppose the ith INCREMENT operation resets ti bits. Its actual cost is then at most ti + 1, because in addition to resetting ti bits it sets at most one bit to 1. The number of 1's in the counter after the ith operation therefore satisfies bi ≤ bi-1 - ti + 1, and the potential difference is

Ψ(Di) - Ψ(Di-1) ≤ (bi-1 - ti + 1) - bi-1 = 1 - ti

Putting this value in equation (1), we get

^ci = ci + Ψ(Di) - Ψ(Di-1) ≤ (ti + 1) + (1 - ti) = 2
If the counter starts at zero, then Ψ(D0) = 0. Since Ψ(Di) ≥ 0 for all i, the total amortized cost of a sequence of n INCREMENT operations is an upper bound on the total actual cost, and so the worst-case cost of n INCREMENT operations is O(n).
If the counter does not start at zero, then there are b0 initial 1's. After n INCREMENT operations the number of 1's is bn, where 0 ≤ b0, bn ≤ k.

Since Σ_{i=1}^{n} ^ci = Σ_{i=1}^{n} ci + Ψ(Dn) - Ψ(D0), we have

Σ_{i=1}^{n} ci = Σ_{i=1}^{n} ^ci + Ψ(D0) - Ψ(Dn)

We have ^ci ≤ 2 for all 1 ≤ i ≤ n. Since Ψ(D0) = b0 and Ψ(Dn) = bn, the total actual cost of n INCREMENT operations is

Σ_{i=1}^{n} ci ≤ Σ_{i=1}^{n} 2 + b0 - bn = 2n - bn + b0

Note that since b0 ≤ k, if we execute at least n = Ω(k) INCREMENT operations, the total actual cost is O(n), no matter what the initial value of the counter is.
Consider an implementation of a queue with two stacks, such that the amortized cost of each ENQUEUE and each DEQUEUE operation is O(1). ENQUEUE pushes an object onto the first stack. DEQUEUE pops an object off the second stack if it is not empty. If the second stack is empty, DEQUEUE transfers all objects from the first stack to the second stack and then pops off the first object. The goal is to show that this implementation has an O(1) amortized cost for each ENQUEUE and DEQUEUE operation.

Let Di denote the state of the stacks after the ith operation, and define Ψ(Di) to be the number of elements in the first stack. Clearly, Ψ(D0) = 0 and Ψ(Di) ≥ Ψ(D0) for all i.

If the ith operation is an ENQUEUE operation, then Ψ(Di) - Ψ(Di-1) = 1. Since the actual cost of an ENQUEUE operation is 1, the amortized cost of an ENQUEUE operation is 2.

If the ith operation is a DEQUEUE, there are two cases to consider.

Case i: The second stack is not empty. In this case we have Ψ(Di) - Ψ(Di-1) = 0 and the actual cost of the DEQUEUE operation is 1.

Case ii: The second stack is empty. In this case, we have Ψ(Di) - Ψ(Di-1) = -Ψ(Di-1) and the actual cost of the DEQUEUE operation is Ψ(Di-1) + 1.

In either case, the amortized cost of the DEQUEUE operation is 1. It follows that each operation has O(1) amortized cost.
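The two-stack queue described above can be sketched as follows (a minimal Python illustration; the class and method names are mine):

```python
# Sketch of the two-stack queue. ENQUEUE pushes onto the first stack
# (inbox); DEQUEUE pops from the second stack (outbox), refilling it
# from the inbox (which reverses the order) only when it is empty.

class TwoStackQueue:
    def __init__(self):
        self.inbox = []    # receives ENQUEUEd items
        self.outbox = []   # serves DEQUEUEs in FIFO order

    def enqueue(self, x):              # amortized O(1)
        self.inbox.append(x)

    def dequeue(self):                 # amortized O(1)
        if not self.outbox:
            # transfer: each element moves inbox -> outbox at most once
            while self.inbox:
                self.outbox.append(self.inbox.pop())
        if not self.outbox:
            raise IndexError("dequeue from empty queue")
        return self.outbox.pop()

q = TwoStackQueue()
for x in (1, 2, 3):
    q.enqueue(x)
assert q.dequeue() == 1
q.enqueue(4)
assert [q.dequeue() for _ in range(3)] == [2, 3, 4]
```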
Dynamic Table
If the allocated space for the table is not enough, we must copy the table into a larger one. Similarly, if a large number of items has been erased from the table, it is a good idea to reallocate the table with a smaller size. Using amortized analysis we shall show that the amortized cost of insertion and deletion is constant, and that the unused space in a dynamic table never exceeds a constant fraction of the total space.
Assume that the dynamic table supports following two operations:
TABLE-INSERT
This operation adds an item to the table by copying it into an unused slot. The cost of an insertion is 1.
TABLE-DELETE
This operation removes an item from the table by freeing a slot. The cost of deletion is 1.
Load Factor
The number of items stored in the table, n, divided by the size of the table, m, is defined as the load factor, denoted α(T) = n/m. The load factor of an empty table (size m = 0) is defined to be 1. A table is full when there are no unused slots, that is, when the number of items stored in the table equals the number of available slots (n = m). In this case the load factor is

α(T) = n/m = 1
Proposed Algorithm
1. Initialize the table size to m = 1.
2. Keep inserting elements as long as the number of items is less than the table size, i.e., n < m.
3. When the table is full (n = m), generate a new table of size 2m and set m ← 2m.
4. Copy the items (by elementary insertions) from the old table into the new one.
5. GOTO step 2.
Analysis
If n elementary insert operations are performed in line 4, the worst-case cost of an operation is O(n), which leads to an upper bound of O(n2) on the total running time for n operations.
Aggregate Analysis
The ith insert operation causes an expansion only when i - 1 is an exact power of 2. Let ci be the cost of the ith insert operation. Then
ci =  i    if i - 1 is an exact power of 2
      1    otherwise
As an example, consider the following illustration.
INSERTION    TABLE-SIZE    COST
1            1             1
2            2             1 + 1
3            4             1 + 2
4            4             1
5            8             1 + 4
6            8             1
7            8             1
8            8             1
9            16            1 + 8
10           16            1
The total cost of n insert operations is therefore
n∑i=1 ci ≤ n + floor(lgn)∑j=0 2j = n + [2floor(lgn)+1 -1]/[2-1] since n∑k=0 xk = [xn+1 -1]/[x-1] = n + 2lgn . 2-1 = n + 2n - 1
- 72 -
ADA Notes
= 3n - 1 < 3n
Therefore, the amortized cost of a single operation is
= Total cost Number of operations= 3n/n= 3
Asymptotically, the amortized cost of insertion into a dynamic table is O(1), the same as for a table of fixed size.
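The aggregate bound of roughly 3n can be reproduced by simulating the doubling table; a Python sketch (the function name is mine):

```python
# Sketch: measure the total cost of n inserts into a doubling table.
# Each elementary insertion costs 1; an expansion re-inserts all items.

def table_insert_costs(n):
    size, num, costs = 1, 0, []
    for _ in range(n):
        c = 1                      # the elementary insertion itself
        if num == size:            # table full: expand to size 2m
            c += num               # re-insert the num existing items
            size *= 2
        num += 1
        costs.append(c)
    return costs

costs = table_insert_costs(1000)
assert sum(costs) < 3 * 1000       # total cost < 3n, so amortized cost 3
```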
Accounting Method
Here we charge 3$ per insertion. Intuitively, each item pays for 3 elementary operations:

1. 1$ for inserting the item itself.
2. 1$ for moving the item (re-inserting it) the next time the table is expanded.
3. 1$ for moving another item (one that has already been moved once) when the table is expanded.
Potential Method
Define a potential function Φ that is 0 immediately after an expansion but potential builds to the table size by the time the table is full.
Φ(T) = 2 . num[T] - size[T]
Immediately after an expansion (but before any insertion) we have, num[T] = size[T]/2
which implies
Φ(T) = 2 . num[T] - 2 num[T] = 0
Immediately before an expansion, we have num[T] = size[T],
which implies
Φ(T) = 2 . num[T] - num[T] = num[T]
The initial value of the potential function is zero, i.e., Φ(T) = 0. Moreover, the table is always at least half full,

i.e., num[T] ≥ size[T]/2, or 2 num[T] ≥ size[T]
which implies
Φ(T) = 2 num[T] - size[T] ≥ 0
That is, Φ(T) is always nonnegative.
Before analyzing the amortized cost of the ith TABLE-INSERT operation, define the following:

numi = number of elements in the table after the ith operation
sizei = size of the table after the ith operation
Φi = potential after the ith operation
Initially, we have num0 = size0 =0 and Φ0 = 0.
If the ith insertion does not trigger an expansion, then sizei = sizei-1, numi = numi-1 + 1, and the amortized cost of the operation is

^ci = ci + Φi - Φi-1
    = 1 + [2 numi - sizei] - [2 numi-1 - sizei-1]
    = 1 + 2 numi - sizei - 2(numi - 1) + sizei
    = 3
If the ith insertion does trigger an expansion (the table size doubles), then sizei = 2 sizei-1 and sizei-1 = numi-1 = numi - 1, and the amortized cost of the operation is

^ci = ci + Φi - Φi-1
    = numi + [2 numi - sizei] - [2 numi-1 - sizei-1]
    = numi + 2 numi - 2(numi - 1) - 2(numi - 1) + (numi - 1)
    = numi + 2 numi - 2 numi + 2 - 2 numi + 2 + numi - 1
    = 3
What is the catch? This shows how the potential builds (from zero) to pay for the table expansion.
Dynamic Table Expansion and Contraction
When the load factor of the table, α(T) = n/m, becomes too small, we want to preserve the following two properties:

1. Keep the load factor of the dynamic table bounded below by a constant.
2. Keep the amortized cost of each dynamic table operation bounded above by a constant.
Proposed Strategy
Event: An item is inserted into a full table.
Action: Double the size of the table, i.e., m ← 2m.

Event: A deletion leaves the table less than half full.
Action: Halve the size of the table, i.e., m ← m/2.
The problem with this strategy is thrashing. We can avoid this problem by allowing the load factor of the table, α(T) = n/m, to drop below 1/2 before contracting it. By contracting the table only when the load factor falls below 1/4, we maintain the lower bound α(T) ≥ 1/4, i.e., the load factor is bounded below by the constant 1/4.
Load Factor
The load factor α(T) of a non-empty table T is defined as the number of items stored in T divided by the size of T (the number of slots in T), i.e.,
α(T) = num[T] / size[T]
Since for an empty table, we have
num[T] = size[T] = 0 and α(T) = 1
which implies that we always have
num[T] = α(T) . size[T]
whether table is empty or not.
Analysis by Potential Method
We start by defining a potential function Φ that is
1. 0 immediately after an expansion and builds as the load factor, α(T) = n/m, increases to 1.
2. 0 immediately after a contraction and builds as the load factor, α(T) = n/m, decreases to 1/4.
Φ(T) = 2 num[T] - size[T]      if α(T) ≥ 1/2
       size[T]/2 - num[T]      if α(T) < 1/2
Note that the potential function is never negative.
Properties of Potential Function
When α(T) = 1/2: num[T]/size[T] = 1/2 implies size[T] = 2 num[T], and the potential is

Φ(T) = 2 num[T] - size[T] = 2 num[T] - 2 num[T] = 0

When α(T) = 1: num[T]/size[T] = 1 implies size[T] = num[T], and the potential is

Φ(T) = 2 num[T] - size[T] = 2 num[T] - num[T] = num[T]

which indicates that the potential can pay for an expansion if an item is inserted.

When α(T) = 1/4: num[T]/size[T] = 1/4 implies size[T] = 4 num[T], and the potential is

Φ(T) = size[T]/2 - num[T] = 4 num[T]/2 - num[T] = num[T]

which indicates that the potential can pay for a contraction if an item is deleted.
Notation
The subscript i is used in the existing notation to denote values after the ith operation. That is to say, ^ci, ci, numi, sizei, αi and Φi indicate values after the ith operation.

Initially, num0 = size0 = Φ0 = 0 and α0 = 1.
Hash Table
Direct-address table
If the keys are drawn from a reasonably small universe U = {0, 1, . . ., m-1} of keys, a solution is to index a table directly by keys. To represent the dynamic set, we use an array, or direct-address table, denoted T[0 . . m-1], in which each slot corresponds to a key in the universe.
Following figure illustrates the approach.
Each key in the universe U corresponds to an index in the table T[0 . . m-1]. Using this approach, all three basic dictionary operations (search, insert and delete) take Θ(1) time in the worst case.
Hash Tables
When the size of the universe is much larger, the same approach (a direct-address table) could still work in principle, but the size of the table would make it impractical. A solution is to map the keys onto a small range using a function called a hash function. The resulting data structure is called a hash table.
With direct addressing, an element with key k is stored in slot k. With hashing, this same element is stored in slot h(k); that is, we use a hash function h to compute the slot from the key. The hash function h maps the universe U of keys into the slots of a hash table T[0 . . m-1]:
h: U → {0, 1, . . ., m-1}
More formally, suppose we want to store a set of size n in a table of size m. The ratio α = n/m is called the load factor, that is, the average number of elements stored per slot. Assume we have a hash function h that maps each key k ∈ U to an integer h(k) ∈ [0 . . m-1]. The basic idea is to store key k in location T[h(k)].
Typically, hash functions generate "random looking" values. For example, the following function usually works well:
h(k) = k mod m where m is a prime number.
What is the point of the hash function? It reduces the range of array indices that need to be handled.
Collision
As keys are inserted into the table, it is possible that two keys may hash to the same table slot. If the hash function distributes the elements uniformly over the table, the number of collisions cannot be too large on average, but the birthday paradox makes it very likely that there will be at least one collision, even for a lightly loaded table.

When a hash function h maps two keys k and j to the same slot, they collide.
There are two basic methods for handling collisions in a hash table: Chaining and Open addressing.
Collision Resolution by Chaining
When there is a collision (two keys hash to the same slot), the incoming key is stored in an overflow area: its record is appended to the end of a linked list for that slot.
Each slot T[j] contains a linked list of all the keys whose hash value is j. For
example, h(k1) = h(kn) and h(k5) = h(k2) = h(k7).
The worst case running time for insertion is O(1).
Deletion of an element x can be accomplished in O(1) time if the lists are doubly linked.
In the worst case behavior of chain-hashing, all n keys hash to the same slot, creating a list of length n. The worst-case time for search
is thus θ(n) plus the time to compute the hash function.
keys: 5, 28, 19, 15, 20, 33, 12, 17, 10
slots: 9
hash function: h(k) = k mod 9

h(5) = 5 mod 9 = 5
h(28) = 28 mod 9 = 1
h(19) = 19 mod 9 = 1
h(15) = 15 mod 9 = 6
h(20) = 20 mod 9 = 2
h(33) = 33 mod 9 = 6
h(12) = 12 mod 9 = 3
h(17) = 17 mod 9 = 8
h(10) = 10 mod 9 = 1
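The example above can be sketched as a small chaining hash table in Python (the class and method names are mine; records are appended at the end of each chain, as described):

```python
# Sketch of collision resolution by chaining with h(k) = k mod 9,
# using the keys from the example above.

class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one list (chain) per slot

    def h(self, k):
        return k % self.m

    def insert(self, k):                      # O(1) worst case
        self.slots[self.h(k)].append(k)

    def search(self, k):                      # O(length of the chain)
        return k in self.slots[self.h(k)]

t = ChainedHashTable(9)
for k in (5, 28, 19, 15, 20, 33, 12, 17, 10):
    t.insert(k)

assert t.slots[1] == [28, 19, 10]   # three keys collide in slot 1
assert t.slots[6] == [15, 33]       # 15 and 33 collide in slot 6
assert t.search(20) and not t.search(99)
```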
A good hash function satisfies the assumption of simple uniform hashing: each element is equally likely to hash into any of the m slots, independently of where any other element has hashed to. Usually it is not possible to check this condition, because one rarely knows the probability distribution according to which the keys are drawn.
In practice, we use heuristic techniques to create a hash function that performs well. One good approach is to derive the hash value in a way that is expected to be independent of any patterns that might exist in the data (the division method).
Most hash functions assume that the universe of keys is the set of natural numbers. Thus, if the keys are not natural numbers, a way must be found to interpret them as natural numbers.
Method for Creating Hash Function
1. The division method.
2. The multiplication method.
3. Universal hashing.
1. The Division Method
Map a key k into one of m slots by taking the remainder of k divided by m. That is, the hash function is
h(k) = k mod m.
Example:
If the table size is m = 12 and the key is k = 100,

then h(100) = 100 mod 12 = 4
Poor choices of m
m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k. Similarly, m = 2^p - 1 may be a poor choice when keys are character strings interpreted in radix 2^p, because permuting the characters of k then does not change its hash value.
Good choice of m

A prime not too close to an exact power of 2.
2. The Multiplication Method
Two step process
Step 1:
Multiply the key k by a constant A, 0 < A < 1, and extract the fractional part of kA.
Step 2:
Multiply kA by m and take the floor of the result.
The hash function using multiplication method is:
h(k) = ⌊m (kA mod 1)⌋

where "kA mod 1" means the fractional part of kA, that is, kA - ⌊kA⌋.

An advantage of this method is that the value of m is not critical, and the method can be implemented easily on most computers.
A reasonable value of the constant A is

(√5 - 1)/2 ≈ 0.618

suggested by Knuth in The Art of Computer Programming.
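A minimal Python sketch of the multiplication method with Knuth's constant (the function name and the floating-point implementation are mine; a production version would typically use fixed-point arithmetic instead of floats):

```python
# Sketch of the multiplication method: h(k) = floor(m * (k*A mod 1))
# with Knuth's suggested constant A = (sqrt(5) - 1) / 2.

import math

A = (math.sqrt(5) - 1) / 2          # ~ 0.6180339887

def h(k, m):
    frac = (k * A) % 1.0            # kA mod 1, the fractional part of kA
    return int(m * frac)            # floor(m * (kA mod 1))

m = 1000                            # the value of m is not critical here
assert all(0 <= h(k, m) < m for k in range(1, 10000))
```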
3. Universal Hashing
Open Addressing
This is another way to deal with collisions.
In this technique all elements are stored in the hash table itself: each table entry contains either an element or NIL. When searching for an element (or an empty slot), we systematically examine slots until we find the element (or an empty slot). There are no lists and no elements stored outside the table, which implies that the table can completely "fill up"; the load factor α can never exceed 1.

An advantage of this technique is that it avoids pointers (pointers need space too). Instead of chasing pointers, we compute the sequence of slots to be examined. To perform an insertion, we successively examine, or probe, the hash table until we find an empty slot. The sequence of slots probed depends upon the key being inserted: to determine which slots to probe, the hash function takes the probe number as a second input. Thus, the hash function becomes
h : U × {0, 1, . . ., m-1} → {0, 1, . . ., m-1}
and the probe sequence
< h(k, 0), h(k, 1), . . . , h(k, m-1) >
in which every slot is eventually considered.
Pseudocode for Insertion
HASH-INSERT (T, k)
i = 0
repeat j ← h(k, i)
       if T[j] = NIL
          then T[j] = k
               return j
          else i = i + 1
until i = m
error "hash table overflow"
Pseudocode for Search
HASH-SEARCH (T, k)
i = 0
repeat j ← h(k, i)
       if T[j] = k
          then return j
       i = i + 1
until T[j] = NIL or i = m
return NIL
Pseudocode for Deletion

Deletion from an open-address table is tricky: we cannot simply store NIL in the deleted slot, because that would terminate probe sequences that pass through it; instead the slot is usually marked with a special DELETED value.

Following are the three techniques used to compute the probe sequences.
1. Linear probing.
2. Quadratic probing.
3. Double hashing.
These techniques guarantee that
< h(k, 0), h(k, 1), . . . , h(k, m-1) >
is a permutation of < 0, 1, . . ., m-1 > for each key k.

However, the requirement of uniform hashing is not met: none of these techniques is capable of generating more than m^2 different probe sequences (instead of the m! that uniform hashing requires).
Uniform Hashing
Each key is equally likely to have any of the m! permutation of < 0, 1, . . . , m-1> as its probe sequence.
Note that uniform hashing generalizes the notion of simple uniform hashing.
1. Linear Probing
This method uses the hash function of the form:
h(k, i) = (h`(k) + i) mod m for i = 0, 1, 2, . . . , m-1
where h` is an auxiliary hash function. Linear probing suffers from the primary clustering problem.
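Linear probing can be sketched as follows (Python; the function names are mine, with h`(k) = k mod m as the auxiliary hash function):

```python
# Sketch of open addressing with linear probing:
# h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m.

def hash_insert(T, k):
    m = len(T)
    for i in range(m):
        j = (k + i) % m           # h(k, i) = (h'(k) + i) mod m
        if T[j] is None:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")

def hash_search(T, k):
    m = len(T)
    for i in range(m):
        j = (k + i) % m
        if T[j] == k:
            return j
        if T[j] is None:          # an empty slot ends the probe sequence
            return None
    return None

T = [None] * 7
for k in (10, 17, 3):             # 10 and 17 both hash to slot 3 (mod 7)
    hash_insert(T, k)

assert hash_search(T, 17) == 4    # bumped one slot: primary clustering
assert hash_search(T, 42) is None
```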
2. Quadratic Probing
This method uses the hash function of the form
h(k, i) = (h`(k) + c1i + c2i2) mod m for i = 0, 1, 2, . . . , m-1
where h` is an auxiliary hash function and c1, c2 ≠ 0 are auxiliary constants.
This method works much better than linear probing.
Quadratic probing suffers a milder form of clustering, called secondary clustering.
3. Double Hashing
This method produces permutations that are very close to random. It uses a hash function of the form

h(k, i) = (h1(k) + i h2(k)) mod m

where h1 and h2 are auxiliary hash functions.
The probe sequence here depends in two ways on the key k, the initial probe position and the offset.
Binary Search Tree
A binary search tree is a binary tree in which each internal node x stores an element such that the elements stored in the left subtree of x are less than or equal to x, and the elements stored in the right subtree of x are greater than or equal to x. This is called the binary-search-tree property.
The basic operations on a binary search tree take time proportional to the height of the tree. For a complete binary tree with n nodes, such operations run in Θ(lg n) worst-case time. If the tree is a linear chain of n nodes, however, the same operations take Θ(n) worst-case time.
The height of the Binary Search Tree equals the number of links from the root node to the deepest node.
Implementation of Binary Search Tree
A binary search tree can be implemented as a linked data structure in which each node is an object with three pointer fields. The three pointer fields left, right and p point to the nodes corresponding to the left child, the right child and the parent, respectively. NIL in any pointer field signifies that there is no corresponding child or parent. The root node is the only node in the BST structure with NIL in its p field.
Inorder Tree Walk
During this type of walk, we visit the root of a subtree between the left subtree visit and right subtree visit.
INORDER-TREE-WALK (x)
if x ≠ NIL
    then INORDER-TREE-WALK (left[x])
         print key[x]
         INORDER-TREE-WALK (right[x])
It takes Θ(n) time to walk a tree of n nodes. Note that the binary-search-tree property allows us to print out all the elements of the binary search tree in sorted order.
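The inorder walk's sorted-output property can be illustrated with a minimal linked BST in Python (the class and function names are mine):

```python
# Sketch: a minimal linked BST whose inorder walk yields sorted keys.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key preserving the binary-search-tree property."""
    if root is None:
        return Node(key)
    if key <= root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(x, out):
    """INORDER-TREE-WALK: left subtree, root, right subtree."""
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out

root = None
for k in (15, 6, 18, 3, 7, 17, 20):
    root = insert(root, k)

assert inorder(root, []) == [3, 6, 7, 15, 17, 18, 20]
```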
Preorder Tree Walk
In which we visit the root node before the nodes in either subtree.
PREORDER-TREE-WALK (x)
  if x ≠ NIL then
    print key[x]
    PREORDER-TREE-WALK (left[x])
    PREORDER-TREE-WALK (right[x])
Postorder Tree Walk
In which we visit the root node after the nodes in its subtrees.
POSTORDER-TREE-WALK (x)
  if x ≠ NIL then
    POSTORDER-TREE-WALK (left[x])
    POSTORDER-TREE-WALK (right[x])
    print key[x]
It takes Θ(n) time to walk (inorder, preorder or postorder) a tree of n nodes.
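The three walks can be sketched directly in Python; the Node class and the sample tree below are my own illustration, not from the notes:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(x, out):            # left subtree, root, right subtree
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)

def preorder(x, out):           # root first
    if x is not None:
        out.append(x.key)
        preorder(x.left, out)
        preorder(x.right, out)

def postorder(x, out):          # root last
    if x is not None:
        postorder(x.left, out)
        postorder(x.right, out)
        out.append(x.key)

# A small BST:     4
#                 / \
#                2   6
#               / \
#              1   3
root = Node(4, Node(2, Node(1), Node(3)), Node(6))
```

On this tree, the inorder walk emits the keys in sorted order, exactly as the BST property promises.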
Binary-Search-Tree property Vs Heap Property
In a heap, a node's key is greater than or equal to both of its children's keys. In a binary search tree, a node's key is greater than or equal to its left child's key and less than or equal to its right child's key. Furthermore, this applies to the entire subtree in the binary-search-tree case. It is very important to note that the heap property does not help print the nodes in sorted order, because this property does not tell us in which subtree the next item is. If the heap property could be used to print the
keys (as we have shown above) in sorted order in O(n) time, this would contradict our known lower bound on comparison sorting.
The last statement implies that since sorting n elements takes Ω(n lg n) time in the worst case in the comparison model, any comparison-based algorithm for
constructing a binary search tree from an arbitrary list of n elements takes Ω(n lg n) time in the worst case.
We can show the validity of this argument (in case you are thinking of beating the
Ω(n lg n) bound) as follows: let c(n) be the worst-case running time for
constructing a binary tree from a set of n elements. Given an n-node BST, the inorder walk of the tree outputs the keys in sorted order (shown above). Since the worst-case running time of any comparison-based sorting algorithm is
Ω(n lg n), we have
c(n) + O(n) = Ω(n lg n)
Therefore, c(n) = Ω(n lg n).
Querying a Binary Search Tree
The most common operation performed on a BST is searching for a key stored in the tree. Other operations are MINIMUM, MAXIMUM, SUCCESSOR and
PREDECESSOR. These operations run in O(h) time, where h is the height of
the tree, i.e., the number of links from the root node to the deepest node.
The TREE-SEARCH (x, k) algorithm searches the tree rooted at x for a node
whose key value equals k. It returns a pointer to the node if it exists; otherwise it returns NIL.
TREE-SEARCH (x, k)
  if x = NIL .OR. k = key[x] then
    return x
  if k < key[x] then
    return TREE-SEARCH (left[x], k)
  else
    return TREE-SEARCH (right[x], k)
Clearly, this algorithm runs in O(h) time where h is the height of the tree.
The iterative version of the above algorithm is very easy to implement.
ITERATIVE-TREE-SEARCH (x, k)
  while x ≠ NIL .AND. k ≠ key[x] do
    if k < key[x]
      then x ← left[x]
      else x ← right[x]
  return x
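A minimal Python sketch of the iterative search (the Node class and sample tree are my own illustration):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def iterative_tree_search(x, k):
    """Follow left/right child pointers until k is found or a NIL link is hit."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x    # the node holding k, or None if k is absent

root = Node(4, Node(2, Node(1), Node(3)), Node(6))
```

Each iteration descends one level, so the loop runs at most h + 1 times on a tree of height h.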
The TREE-MINIMUM (x) algorithm returns a pointer to the node of the tree rooted at x whose key value is the minimum of all keys in the tree. Due to the BST property, the minimum element can always be found by following left child pointers from the root until NIL is encountered.
TREE-MINIMUM (x)
  while left[x] ≠ NIL do
    x ← left[x]
  return x
Clearly, it runs in O(h) time, where h is the height of the tree. Again thanks to the BST property, an element in a binary search tree whose key is the maximum can always be found by following right child pointers from the root until NIL is encountered.
TREE-MAXIMUM (x)
  while right[x] ≠ NIL do
    x ← right[x]
  return x
Clearly, it runs in O(h) time where h is the height of the tree.
The TREE-SUCCESSOR (x) algorithm returns a pointer to the node in the tree whose key value is the next higher than key[x].
TREE-SUCCESSOR (x)
  if right[x] ≠ NIL then
    return TREE-MINIMUM (right[x])
  y ← p[x]
  while y ≠ NIL .AND. x = right[y] do
    x ← y
    y ← p[y]
  return y
Note that the algorithms TREE-MINIMUM, TREE-MAXIMUM, TREE-SUCCESSOR, and TREE-PREDECESSOR never look at the keys.
An inorder tree walk of an n-node BST can be implemented in Θ(n) time by finding the minimum element in the tree with the TREE-MINIMUM (x) algorithm and then making n − 1 calls to TREE-SUCCESSOR (x).
Another way of Implementing Inorder walk on Binary Search Tree
Algorithm
1. Find the minimum element in the tree with TREE-MINIMUM.
2. Make n − 1 calls to TREE-SUCCESSOR.
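The two steps above can be sketched in Python. The nodes here carry a parent pointer p, and the `insert` helper that builds the tree is my own assumption (it is not part of this enumeration algorithm):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def insert(root, key):
    """Assumed helper: BST insertion that maintains parent pointers."""
    z, y, x = Node(key), None, root
    while x is not None:
        y = x
        x = x.left if key < x.key else x.right
    z.p = y
    if y is None:
        return z
    if key < y.key:
        y.left = z
    else:
        y.right = z
    return root

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:                      # successor is min of right subtree
        return tree_minimum(x.right)
    y = x.p                                      # else climb until we leave a left subtree
    while y is not None and x is y.right:
        x, y = y, y.p
    return y

root = None
for k in [4, 2, 6, 1, 3, 5]:
    root = insert(root, k)

# TREE-MINIMUM once, then repeated TREE-SUCCESSOR calls: keys in sorted order.
out, x = [], tree_minimum(root)
while x is not None:
    out.append(x.key)
    x = tree_successor(x)
```

Each tree edge is traversed at most twice over the whole enumeration, which is the Θ(n) bound argued below.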
Let us show that this algorithm runs in Θ(n) time. For a tree T, let mT be the number of edges that are traversed by the above algorithm. The running time of
the algorithm for T is Θ(mT). We make the following claim:
mT is zero if T has at most one node, and 2e − r otherwise, where e is
the number of edges in the tree and r is the length of the path from the root to the node holding the maximum key.
Note that e = n - 1 for any tree with at least one node. This allows us to
prove the claim by induction on e (and therefore, on n).
Base case: Suppose that e = 0. Then either the tree is empty or it consists
only of a single node. So e = r = 0. Therefore, the claim holds.
Inductive step: Suppose e > 0 and assume that the claim holds for all e' <
e. Let T be a binary search tree with e edges. Let x be the root, and T1 and T2
respectively the left and right subtrees of x. Since T has at least one edge,
either T1 or T2 is nonempty. For each i = 1, 2, let ei be the
number of edges in Ti, pi the node holding the maximum key in Ti, and ri the
distance from pi to the root of Ti. Similarly, let e, p and r be the
corresponding values for T. First assume that both T1 and T2 are nonempty. Then e = e1 + e2 + 2, p = p2, and r = r2 + 1. The action of the enumeration is as follows:
Upon being called, TREE-MINIMUM (x) traverses the left branch of x and enters T1.
Once the root of T1 is visited, the edges of T1 are traversed as if T1 were the input tree. This situation lasts until p1 is visited.
When TREE-SUCCESSOR is called from p1, the upward path from p1 to x is traversed and x is discovered to hold the successor.
When TREE-SUCCESSOR is called from x, the right branch of x is taken.
Once the root of T2 is visited, the edges of T2 are traversed as if T2 were the input tree. This situation lasts until p2 is
reached, whereupon the algorithm halts.
By the above analysis, the number of edges that are traversed by the above
algorithm, mT, is
mT = 1 + (2e1 − r1) + (r1 + 1) + 1 + (2e2 − r2) = 2(e1 + e2 + 2) − (r2 + 1) = 2e − r
Therefore, the claim clearly holds for this case.
Next suppose that T2 is empty. Since e > 0, T1 is nonempty. Then e = e1 + 1. Since x does not have a right child, x holds the maximum. Therefore, p = x and r = 0. The action of the enumeration algorithm consists of the first two steps only. Therefore, the number of edges that are traversed by the algorithm in question is
mT = 1 + (2e1 − r1) + (r1 + 1) = 2(e1 + 1) − 0 = 2e − r
Therefore, the claim holds for this case.
Finally, assume that T1 is empty. Then T2 is nonempty. It holds that e = e2 + 1, p = p2, and r = r2 + 1. This time x holds the minimum key, and the action of the enumeration algorithm consists of the last two steps only. Therefore, the number of edges that are traversed by the algorithm is
mT = 1 + (2e2 − r2) = 2(e2 + 1) − (r2 + 1) = 2e − r
Therefore, the claim holds for this case.
The claim is proven. Since e = n − 1, we have mT ≤ 2n. On the other hand, at least
one edge has to be traversed when going from one node to another, so mT ≥ n − 1. Therefore, the running time of the above algorithm is Θ(n).
Consider any binary search tree T, let x be a leaf and let y be its parent. Our goal
is to show that key[y] is
either the smallest key in T larger than key[x]
or the largest key in T smaller than key[x].
Proof: Suppose that x is a left child of y. Since key[y] ≥ key[x], we only
have to show that there is no node z with key[y] > key[z] > key[x]. Assume, to
the contrary, that there is such a z, and choose z so that it holds the smallest key
among such nodes. Note that for every node u ≠ z, x, key[z] ≤ key[u] if and only if
key[x] ≤ key[u]. So if we search for key[z], the search path is identical to that of
key[x] until the path reaches z or x. Since x is a leaf (meaning it has no
children), the search path for key[z] never reaches x. Therefore, z is an ancestor of x.
Since y is the parent of x (it is given, in case you've forgotten!) and y ≠ z, z
has to be a proper ancestor of y. The search for key[x] passes through z, and
since key[x] < key[z] it goes left at z; hence y lies in the left subtree of z,
so key[y] ≤ key[z]. However, we are assuming key[y] > key[z], so this is clearly impossible. Therefore,
there is no such z.
The case when x is a right child of y is easy. Hint: symmetric.
INSERTION
To insert a node into a BST:
1. find a leaf at the appropriate place, and
2. connect the new node to the parent of that leaf.
TREE-INSERT (T, z)
  y ← NIL
  x ← root[T]
  while x ≠ NIL do
    y ← x
    if key[z] < key[x]
      then x ← left[x]
      else x ← right[x]
  p[z] ← y
  if y = NIL
    then root[T] ← z
    else if key[z] < key[y]
      then left[y] ← z
      else right[y] ← z
Like other primitive operations on search trees, this algorithm begins at the root
of the tree and traces a path downward. Clearly, it runs in O(h) time on a tree
of height h.
Sorting
We can sort a given set of n numbers by first building a binary search tree containing these numbers (using the TREE-INSERT procedure repeatedly to insert the numbers one by one) and then printing the numbers by an inorder tree walk.
Analysis
Best-case running time
Printing takes O(n) time and the n insertions cost O(lg n) each (the tree is balanced; half the insertions are at depth lg(n) − 1).
This gives the best-case running time O(n lg n).
Worst-case running time
Printing still takes O(n) time, and the n insertions cost O(n) each (the tree is a single chain of nodes), giving O(n²) overall. More precisely, the insertions
cost 1, 2, 3, . . ., n, which is an arithmetic sequence summing to about n²/2.
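This tree-sort idea can be sketched in Python (a simplified recursive insertion without parent pointers; the function names are my own):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def tree_insert(root, key):
    """Recursive BST insertion; duplicates go to the right subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = tree_insert(root.left, key)
    else:
        root.right = tree_insert(root.right, key)
    return root

def inorder(x, out):
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)

def tree_sort(nums):
    root = None
    for k in nums:          # n insertions: O(lg n) each if balanced, O(n) on a chain
        root = tree_insert(root, k)
    out = []
    inorder(root, out)      # O(n) printing pass
    return out
```

Feeding it already-sorted input produces the degenerate chain, which is exactly the O(n²) worst case described above.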
Deletion
Removing a node from a BST is a bit more complex, since we do not want to
create any "holes" in the tree. If the node has one child, then the child is spliced to the parent of the node. If the node has two children, then its successor has no left child; we copy the successor into the node and delete the successor instead. TREE-DELETE (T, z) removes the node pointed to by z from the tree T. It returns a pointer to the node removed so that the node can be put on a free-node list, etc.
TREE-DELETE (T, z)
  if left[z] = NIL .OR. right[z] = NIL
    then y ← z
    else y ← TREE-SUCCESSOR (z)
  if left[y] ≠ NIL
    then x ← left[y]
    else x ← right[y]
  if x ≠ NIL
    then p[x] ← p[y]
  if p[y] = NIL
    then root[T] ← x
    else if y = left[p[y]]
      then left[p[y]] ← x
      else right[p[y]] ← x
  if y ≠ z
    then key[z] ← key[y]
         (if y has other fields, copy them, too)
  return y
The procedure runs in O(h) time on a tree of height h.
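The same cases (zero/one child: splice; two children: copy the successor and delete it instead) can be sketched in Python. This is a recursive variant of my own, not a literal transcription of the pointer-splicing pseudocode above:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def tree_insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = tree_insert(root.left, key)
    else:
        root.right = tree_insert(root.right, key)
    return root

def tree_min(x):
    while x.left is not None:
        x = x.left
    return x

def tree_delete(root, key):
    """Delete key from the subtree rooted at root; return the new subtree root."""
    if root is None:
        return None
    if key < root.key:
        root.left = tree_delete(root.left, key)
    elif key > root.key:
        root.right = tree_delete(root.right, key)
    else:
        if root.left is None:           # zero or one child: splice it out
            return root.right
        if root.right is None:
            return root.left
        s = tree_min(root.right)        # successor has no left child
        root.key = s.key                # copy the successor into the node ...
        root.right = tree_delete(root.right, s.key)   # ... and delete the successor
    return root

def inorder(x, out):
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)

root = None
for k in [4, 2, 6, 1, 3, 5, 7]:
    root = tree_insert(root, k)
root = tree_delete(root, 2)             # node 2 has two children
keys = []
inorder(root, keys)
```

After deleting 2 (which has two children), the inorder walk still emits the remaining keys in sorted order.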
Graph Algorithms
Graph theory is an area of mathematics that deals with the following types of problems:
Connection problems
Scheduling problems
Transportation problems
Network analysis
Games and Puzzles.
Graph theory has important applications in critical path analysis, social psychology, matrix theory, set theory, topology, group theory, molecular chemistry, and searching.
Those who would like to take a quick tour of essentials of graph theory please go directly to "Graph Theory" from here.
Digraph
A directed graph, or digraph G consists of a finite nonempty set of vertices V, and a finite set of edges E, where an edge is an ordered pair of vertices in V. Vertices are also commonly referred to as nodes. Edges are sometimes referred to as arcs.
As an example, we could define a graph G=(V, E) as follows:
V = {1, 2, 3, 4}
E = {(1, 2), (2, 4), (4, 2), (4, 1)}
Here is a pictorial representation of this graph.
The definition of a graph implies that a graph can be drawn just knowing its vertex-set and its edge-set. For example, our first example has vertex set V and edge set E where V = {1, 2, 3, 4} and E = {(1, 2), (2, 4), (4, 3), (3, 1), (1, 4), (2, 1), (4, 2), (3, 4), (1, 3), (4, 1)}. Notice that each edge seems to be listed twice.
As another example, the following graph G = (V, E) has vertex set V = {1, 2, 3, 4} and edge set E = {(1, 2), (2, 4), (4, 3), (3, 1), (1, 4), (2, 1), (4, 2), (3, 4), (1, 3), (4, 1)}.
We'll quickly cover the following three important topics from an algorithmic perspective.
1. Transpose
2. Square
3. Incidence Matrix
1. Transpose
If graph G = (V, E) is a directed graph, its transpose GT = (V, ET) is the same as graph G with all arrows reversed. We define the transpose of an adjacency matrix A = (aij) to be the adjacency matrix AT = (aTij) given by aTij = aji. In other words, the rows of matrix A become the columns of matrix AT, and the columns of matrix A become the rows of matrix AT. Since in an undirected graph (u, v) and (v, u) represent the same edge, the adjacency matrix A of an undirected graph is its own transpose: A = AT.
Formally, the transpose of a directed graph G = (V, E) is the graph GT = (V, ET), where ET = {(u, v) ∈ V × V : (v, u) ∈ E}. Thus, GT is G with all its edges reversed.
We can compute GT from G in the adjacency matrix representations and adjacency list representations of graph G.
The algorithm for computing GT from G in the adjacency-matrix representation of graph G is:
ALGORITHM MATRIX TRANSPOSE (G, GT)
  for i = 0 to |V[G]| − 1 do
    for j = 0 to |V[G]| − 1 do
      GT(j, i) = G(i, j)
To see why it works, notice that GT(j, i) is set to G(i, j), which is exactly the definition of the transpose. The time complexity is clearly O(V²).
Algorithm for Computing GT from G in Adjacency-List Representation
In this representation, a new adjacency list must be constructed for the transpose of G. Every list in the adjacency-list structure is scanned. While scanning the adjacency list of v (say), if we encounter u, we put v in the adjacency list of u.
ALGORITHM LIST TRANSPOSE [G]
  for u = 1 to |V[G]| do
    for each element v ∈ Adj[u] do
      insert u into the front of Adj[v] of GT
To see why it works, notice that if an edge exists from u to v, i.e., v is in the adjacency list of u, then u is present in the adjacency list of v in the transpose of G.
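A minimal Python sketch of the adjacency-list transpose (the dict-of-lists representation and the sample graph are my own illustration):

```python
def transpose(adj):
    """Transpose of a digraph given as a dict: vertex -> list of out-neighbours."""
    gt = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            gt[v].insert(0, u)   # put u at the front of the transposed list of v
    return gt

# Sample digraph: 1 -> 2, 2 -> 4, 4 -> 2, 4 -> 1; vertex 3 is isolated.
g = {1: [2], 2: [4], 4: [2, 1], 3: []}
gt = transpose(g)
```

Every edge is touched once, so the running time is O(V + E).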
2. Square
The square of a directed graph G = (V, E) is the graph G² = (V, E²) such that (a, b) ∈ E² if and only if, for some vertex c ∈ V, both (a, c) ∈ E and (c, b) ∈ E. That is, G² contains an edge between vertex a and vertex b whenever G contains a path with exactly two edges between vertex a and vertex b.
Algorithms for Computing G2 from G in the Adjacency-List Representation of G
  create a new array Adj', indexed by V[G]
  for each v in V[G] do
    for each u in Adj[v] do
      \\ v has a path of length 2 to each of the neighbors of u
      make a copy of Adj[u] and append it to Adj'[v]
  return Adj'
For each vertex, we must make a copy of at most |E| list elements. The total time is O(|V| * |E|).
Algorithm for Computing G2 from G in the Adjacency-Matrix representation of G.
  for i = 1 to |V[G]| do
    for j = 1 to |V[G]| do
      for k = 1 to |V[G]| do
        c[i, j] = c[i, j] + c[i, k] * c[k, j]
Because of three nested loops, the running time is O(V3).
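The matrix version can be sketched in Python; here the result is kept boolean (0/1), which is one reasonable reading of the pseudocode above:

```python
def graph_square(a):
    """G^2 from an n x n adjacency matrix: edge (i, j) iff some k has i->k and k->j."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if a[i][k] and a[k][j]:
                    c[i][j] = 1     # a path of exactly two edges exists
    return c

# Sample digraph 0 -> 1 -> 2, so the square has exactly one edge: 0 -> 2.
a = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
c = graph_square(a)
```

The three nested loops give the O(V³) running time stated above.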
3. Incidence Matrix
The incidence matrix of a directed graph G = (V, E) is a |V| × |E| matrix B = (bij) such that
bij = −1 if edge j leaves vertex i
bij = 1 if edge j enters vertex i
bij = 0 otherwise
If B is the incidence matrix and BT is its transpose, the diagonal of the product matrix BBT represents the degrees of all the nodes, i.e., if P is the product matrix BBT, then P[i, i] represents the degree of node i.
Specifically we have
BBT(i, j) = ∑e∈E bie bTej = ∑e∈E bie bje
Now,
If i = j, then bie bje = 1 whenever edge e enters or leaves vertex i, and 0 otherwise.
If i ≠ j, then bie bje = −1 when e = (i, j) or e = (j, i), and 0 otherwise.
Therefore
BBT(i, j) = deg(i) = in_deg(i) + out_deg(i), if i = j
BBT(i, j) = −(number of edges connecting i and j), if i ≠ j
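This identity is easy to check numerically; the helper names below are my own, and the sample graph is a directed triangle:

```python
def incidence_matrix(n, edges):
    """B[i][j] = -1 if edge j leaves vertex i, +1 if edge j enters vertex i."""
    b = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        b[u][j] = -1
        b[v][j] = 1
    return b

def bbt(b):
    """The product B * B^T as a plain list-of-lists matrix."""
    n, m = len(b), len(b[0])
    return [[sum(b[i][e] * b[j][e] for e in range(m)) for j in range(n)]
            for i in range(n)]

edges = [(0, 1), (1, 2), (2, 0)]          # directed triangle 0 -> 1 -> 2 -> 0
p = bbt(incidence_matrix(3, edges))
```

Each vertex of the triangle has in-degree 1 and out-degree 1, so every diagonal entry is 2, and every off-diagonal entry is −1 (one edge between each pair).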
Breadth First Search (BFS)
The breadth-first search algorithm is used in
Prim's MST algorithm.
Dijkstra's single source shortest path algorithm.
Like depth-first search, BFS traverses a connected component of a given graph and defines a spanning tree.
Algorithm Breadth First Search
BFS starts at a given vertex, which is at level 0. In the first stage, we visit all vertices at level 1. In the second stage, we visit all vertices at level 2; these new vertices are the ones adjacent to level 1 vertices; and so on. The BFS traversal terminates when every vertex has been visited.
BREADTH FIRST SEARCH (G, S)
Input: A graph G and a vertex S.
Output: Edges labeled as discovery and cross edges in the connected component.
  create a queue Q
  ENQUEUE (Q, S)    // insert S into Q
  while Q is not empty do
    for each vertex v in Q do
      for all edges e incident on v do
        if edge e is unexplored then
          let w be the other endpoint of e
          if vertex w is unexplored then
            mark e as a discovery edge
            insert w into Q
          else
            mark e as a cross edge
BFS labels each vertex by the length of a shortest path (in terms of number of edges) from the start vertex.
Example (CLR)
Steps 1 through 9 (figures omitted).
Starting vertex (node) is S.
Solid edge = discovery edge.
Dashed edge = cross edge (since none of them connects a vertex to one of its ancestors).
As with depth-first search (DFS), the discovery edges form a spanning tree,
which in this case we call the BFS-tree.
BFS is used to solve the following problems:
Testing whether a graph is connected.
Computing a spanning forest of a graph.
Computing, for every vertex in a graph, a path with the minimum number of edges between the start vertex and the current vertex, or reporting that no such path exists.
Computing a cycle in a graph, or reporting that no such cycle exists.
Analysis
Total running time of BFS is O(V + E).
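A minimal Python sketch of BFS computing shortest-path edge counts (the dict-of-lists graph is my own illustration):

```python
from collections import deque

def bfs(adj, s):
    """Shortest-path edge counts from s; unreachable vertices map to None."""
    dist = {v: None for v in adj}
    dist[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if dist[w] is None:          # w not yet discovered
                dist[w] = dist[u] + 1    # w is one level deeper than u
                q.append(w)
    return dist

# Undirected sample graph; 'd' is disconnected from the rest.
g = {'s': ['a', 'b'], 'a': ['s', 'c'], 'b': ['s', 'c'], 'c': ['a', 'b'], 'd': []}
dist = bfs(g, 's')
```

Each vertex is enqueued at most once and each edge examined a constant number of times, which is the O(V + E) bound.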
Bipartite Graph
We define a bipartite graph as follows: a bipartite graph is an undirected graph G = (V, E) in which V can be partitioned into two sets V1 and V2 such that (u, v) ∈ E implies either u ∈ V1 and v ∈ V2, or u ∈ V2 and v ∈ V1. That is, all edges go between the two sets V1 and V2.
In order to determine if a graph G = (V, E) is bipartite, we perform a BFS on it
with a little modification: whenever the BFS is at a vertex u and
encounters a vertex v that is already 'gray', our modified BFS should check to
see if the depths of both u and v are even, or if they are both odd. If either of
these conditions holds, which implies d[u] and d[v] have the same parity, then the graph is not bipartite. Note that this modification does not change the
running time of BFS, which remains O(V + E).
Formally, to check if the given graph is bipartite, the algorithm traverses the graph labeling the vertices 0, 1, or 2, corresponding to unvisited, partition 1 and partition 2 nodes. If an edge is detected between two vertices in the same partition, the algorithm returns 0.
ALGORITHM: BIPARTITE (G, S)
  for each vertex u ∈ V[G] − {s} do
    color[u] = WHITE
    d[u] = ∞
    partition[u] = 0
  color[s] = GRAY
  partition[s] = 1
  d[s] = 0
  Q = [s]
  while queue Q is not empty do
    u = head[Q]
    for each v in Adj[u] do
      if partition[u] = partition[v] then
        return 0
      else if color[v] = WHITE then
        color[v] = GRAY
        d[v] = d[u] + 1
        partition[v] = 3 − partition[u]
        ENQUEUE (Q, v)
    DEQUEUE (Q)
    color[u] = BLACK
  return 1
Correctness
As BIPARTITE (G, S) traverses the graph, it labels the vertices with a partition number consistent with the graph being bipartite. If at any vertex the algorithm detects an inconsistency, it signals this with an invalid return value (0). The partition value
of u will always be valid, as u was enqueued at some point and its
partition was assigned at that point. The partition of v is left unchanged if
it is already set; otherwise it is set to the value opposite to that of vertex u.
Analysis
The lines added to the BFS algorithm take constant time to execute, so the
running time is the same as that of BFS, which is O(V + E).
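A Python sketch of the partition-based check for the component containing s (the sample graphs are my own; the 3 − partition trick flips between partitions 1 and 2 exactly as in the pseudocode):

```python
from collections import deque

def bipartite(adj, s):
    """Return 0 if an edge joins two vertices in the same partition, else 1."""
    partition = {v: 0 for v in adj}    # 0 = unvisited
    partition[s] = 1
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if partition[v] == partition[u]:
                return 0                       # inconsistency: not bipartite
            if partition[v] == 0:
                partition[v] = 3 - partition[u]  # opposite partition
                q.append(v)
    return 1

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # even cycle: bipartite
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}            # odd cycle: not bipartite
```

An even cycle admits a 2-coloring; an odd cycle forces two adjacent vertices into the same partition.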
Diameter of Tree
The diameter of a tree T = (V, E) is the largest of all shortest-path distances
in the tree, given by max[dist(u, v)]. As we have mentioned, BFS can be used to compute, for every vertex in a graph, a path with the minimum number of edges between the start vertex and the current vertex. This makes it quite easy to compute the diameter of a tree: for each vertex in the tree, we use the BFS algorithm to get the shortest paths, and by using a global variable we record
the largest of all shortest paths. This clearly takes O(V(V + E)) time.
ALGORITHM: TREE_DIAMETER (T)
  maxlength = 0
  for s = 0 to |V[T]| − 1 do
    temp = BFS(T, s)
    if maxlength < temp then
      maxlength = temp
  return maxlength
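The same run-BFS-from-every-vertex idea in Python (the helper names and sample trees are my own illustration):

```python
from collections import deque

def bfs_depth(adj, s):
    """Largest shortest-path distance (in edges) from s to any reachable vertex."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return max(dist.values())

def tree_diameter(adj):
    """Run BFS from every vertex and keep the largest distance: O(V(V + E))."""
    return max(bfs_depth(adj, s) for s in adj)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}        # path 0-1-2-3: diameter 3
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}        # star: diameter 2
```

(For trees, two BFS passes from an endpoint of the first pass would also suffice, but the sketch follows the O(V(V + E)) algorithm given in the notes.)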
Depth First Search (DFS)
Depth-first search (DFS) is useful for
finding a path from one vertex to another,
determining whether or not a graph is connected, and
computing a spanning tree of a connected graph.
DFS uses the backtracking technique.
Algorithm Depth First Search
The algorithm starts at a specific vertex S in G, which becomes the current vertex. The algorithm then traverses the graph by any edge (u, v) incident to the current vertex u. If the edge (u, v) leads to an already visited vertex v, then we backtrack to the current vertex u. If, on the other hand, edge (u, v) leads to an unvisited vertex v, then we go to v, and v becomes our current vertex. We proceed in this manner until we reach a "dead end". At this point we start backtracking. The process terminates when backtracking leads back to the start vertex.
Edges that lead to a new vertex are called discovery or tree edges, and edges that lead to an already visited vertex are called back edges.
DEPTH FIRST SEARCH (G, v)
Input: A graph G and a vertex v.
Output: Edges labeled as discovery and back edges in the connected component.
  for all edges e incident on v do
    if edge e is unexplored then
      w ← opposite (v, e)    // the endpoint of e distinct from v
      if vertex w is unexplored then
        mark e as a discovery edge
        recursively call DFS (G, w)
      else
        mark e as a back edge
Example (CLR)
Solid edge = discovery or tree edge.
Dashed edge = back edge.
Each vertex has two time stamps: the first records when the vertex is first discovered, and the second records when the search finishes examining the vertex's adjacency list.
The DFS algorithm is used to solve the following problems:
Testing whether a graph is connected.
Computing a spanning forest of a graph.
Computing a path between two vertices of a graph, or reporting that no such path exists.
Computing a cycle in a graph, or reporting that no such cycle exists.
Analysis
The running time of DFS is Θ(V + E).
Consider vertices u and v in V[G] after a DFS. Suppose vertex v is a descendant of vertex u in
some DFS-tree. Then we have d[u] < d[v] < f[v] < f[u], for the following reasons:
1. Vertex u was discovered before vertex v; and
2. Vertex v was fully explored before vertex u was fully explored.
Note that the converse also holds: if d[u] < d[v] < f[v] < f[u], then vertices u and v
are in the same DFS-tree, and vertex v is a descendant of vertex u.
Suppose vertex u and vertex v are in different DFS-trees, or suppose they are in the same DFS-tree but neither vertex is the descendant of the other. Then one vertex was discovered and fully explored before the other
was discovered, i.e., f[u] < d[v] or f[v] < d[u].
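This "parenthesis" behaviour of discovery/finish times is easy to observe in Python (the sample graph is my own illustration):

```python
def dfs_timestamps(adj):
    """Discovery and finish times; for v a descendant of u: d[u] < d[v] < f[v] < f[u]."""
    d, f = {}, {}
    time = [0]                         # mutable counter shared by the nested function
    def visit(u):
        time[0] += 1
        d[u] = time[0]                 # discovery time
        for v in adj[u]:
            if v not in d:
                visit(v)
        time[0] += 1
        f[u] = time[0]                 # finish time
    for u in adj:
        if u not in d:
            visit(u)
    return d, f

# u -> v -> w is one DFS-tree; x sits in a separate DFS-tree.
g = {'u': ['v'], 'v': ['w'], 'w': [], 'x': []}
d, f = dfs_timestamps(g)
```

The interval [d[v], f[v]] nests inside [d[u], f[u]], while x's interval is disjoint from u's, matching the two cases described above.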
Consider a directed graph G = (V, E). After a DFS of graph G we can put each edge into one of four classes:
A tree edge is an edge in a DFS-tree.
A back edge connects a vertex to an ancestor in a DFS-tree. Note that a self-loop is a back edge.
A forward edge is a nontree edge that connects a vertex to a descendent in a DFS-tree.
A cross edge is any other edge in graph G. It connects vertices in two different DFS-trees, or two vertices in the same DFS-tree neither of which is the ancestor of the other.
Lemma 1 An Edge (u, v) is a back edge if and only if d[v] < d[u] < f[u] < f[v].
Proof
(=> direction) From the definition of a back edge, it connects vertex u to an ancestor vertex v in a DFS-tree. Hence, vertex u is a descendent of vertex v. Corollary 23.7 in the CLR states that vertex u is a proper descendent of vertex v if and only if d[v] < d[u] < f[u] < f[v]. Hence proved forward direction. □
(<= direction) Again by the Corollary 23.7 (CLR), vertex u is a proper descendent of vertex v. Hence if an edge (u, v) exists from u to v then it is an edge connecting a descendent vertex u to its ancestor vertex v. Hence it is a back edge. Hence proved backward direction.
Conclusion: Immediate from both directions.
Lemma 2 An edge (u, v) is a cross edge if and only if d[v] < f[v] < d[u] < f[u].
Proof
First take => direction.
Observation 1 For an edge (u, v), d[u] < f[u] and d[v] < f[v], since any vertex has to be discovered before we can finish exploring it.
Observation 2 From the definition of a cross edge it is an edge which is not a tree edge, forward edge or a backward edge. This
implies that none of the relationships for forward edge [ d[u] < d[v] < f[v] < f[u] ] or back edge [ d[v] < d[u] < f[u] < f[v] ] can hold for a cross edge.
From the above two observations we conclude that the only two possibilities are:
1. d[u] < f[u] < d[v] < f[v], and
2. d[v] < f[v] < d[u] < f[u]
When the cross edge (u, v) is discovered, we must be at vertex u,
and vertex v must be black. The reason is that if v were white, then
edge (u, v) would be a tree edge, and if v were gray, edge (u, v)
would be a back edge. Therefore, d[v] < d[u], and hence possibility (2) holds.
Now take <= direction.
We can prove this direction by eliminating the various possible
edge types that the given relation could describe. If d[v] < f[v] <
d[u] < f[u], then edge (u, v) cannot be a tree or a forward
edge. It also cannot be a back edge, by Lemma 1. Since edge (u, v) is not a tree, forward or back edge, it must be a cross edge (please go above and look again at the definition of a cross edge).
Conclusion: Immediate from both directions.
Just for the hell of it, let's determine whether or not an undirected graph contains a cycle. It is not difficult to see that the algorithm for this problem is very similar to DFS(G), except that when an adjacent vertex is already GRAY, a cycle is detected. While doing this, the algorithm also takes care not to report a cycle when the GRAY vertex is simply the other end of the tree edge from the ancestor to the current vertex.
ALGORITHM DFS_DETECT_CYCLES [G]
  for each vertex u in V[G] do
    color[u] = WHITE
    predecessor[u] = NIL
  time = 0
  for each vertex u in V[G] do
    if color[u] = WHITE then
      DFS_visit(u)
The subalgorithm DFS_visit(u) is as follows:
DFS_visit(u)
  color[u] = GRAY
  d[u] = time = time + 1
  for each v in Adj[u] do
    if color[v] = GRAY and predecessor[u] ≠ v then
      return "cycle exists"
    if color[v] = WHITE then
      predecessor[v] = u
      recursively DFS_visit(v)
  color[u] = BLACK
  f[u] = time = time + 1
Correctness
To see why this algorithm works, suppose the node v to be visited is a gray node. Then there are two possibilities:
1. The node v is the parent node of u, and we are looking back along the tree edge we traversed when visiting u from v. In that case it is not a cycle.
2. The second possibility is that v has already been encountered once during DFS_visit, and what we are traversing now is a back edge; hence a cycle is detected.
Time Complexity
The maximum number of possible edges in the graph G if it does not have a
cycle is |V| − 1. If G has a cycle, then the number of edges exceeds this number. Hence, the algorithm will detect a cycle at the latest at the |V|-th edge, if
not before. Therefore, the algorithm runs in O(V) time.
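The GRAY-neighbour test can be sketched in Python; tracking the tree parent replaces the predecessor array, and the sample graphs are my own illustration:

```python
def has_cycle(adj):
    """DFS on an undirected graph; a GRAY neighbour that is not the tree parent closes a cycle."""
    color = {u: 'WHITE' for u in adj}
    def visit(u, parent):
        color[u] = 'GRAY'
        for v in adj[u]:
            if color[v] == 'GRAY' and v != parent:
                return True               # back edge found: cycle exists
            if color[v] == 'WHITE' and visit(v, u):
                return True
        color[u] = 'BLACK'
        return False
    return any(color[u] == 'WHITE' and visit(u, None) for u in adj)

tree = {0: [1, 2], 1: [0], 2: [0]}                # acyclic
cycle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}         # triangle
```

The `v != parent` check is exactly the "predecessor[u] ≠ v" guard above: without it, every tree edge would be misreported as a cycle.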
Topological Sort
A topological sort of a directed acyclic graph (DAG) G is an ordering of the
vertices of G such that for every edge (ei, ej) of G we have i < j. That is, a
topological sort is a linear ordering of all its vertices such that if DAG G
contains an edge (ei, ej), then ei appears before ej in the ordering. If G is cyclic, then no linear ordering is possible.
In simple words, a topological ordering is an ordering such that any directed
path in DAG G traverses vertices in increasing order.
It is important to note that if the graph is not acyclic, then no linear ordering is possible. That is, we must not have circularities in the directed graph. For example, in order to get a job you need to have work experience, but in order to get work experience you need to have a job (sounds familiar?).
Theorem A directed graph has a topological ordering if and only if it is acyclic.
Proof:
Part 1. If G has a topological ordering, then G is acyclic.
Suppose G has a topological ordering, and suppose, for contradiction, that G has a cycle i0, i1, . . ., ik−1, i0.
Because of the topological ordering, we must have i0 < i1 < . . . < ik−1 < i0, which is clearly impossible. Therefore, G must be acyclic.
Part 2. If G is acyclic, then G has a topological ordering.
Let G be acyclic. Since G is acyclic, it must have a vertex with no incoming edges; let v1 be such a vertex. If we remove v1 from the graph, together with its outgoing edges, the resulting digraph is still acyclic. Hence the resulting digraph also has a vertex with no incoming edges, and we let v2 be such a vertex. By repeating this
process until the digraph becomes empty, we obtain an ordering v1, v2, . . ., vn of the vertices of digraph G. By this construction, if (vi, vj) is an
edge of digraph G, then vi must be deleted before vj can be deleted, and thus i < j. Thus, v1, . . ., vn is a topological sorting.
Algorithm Topological Sort
TOPOLOGICAL_SORT(G)
1. For each vertex, find the finish time by calling DFS(G).
2. Insert each finished vertex into the front of a linked list.
3. Return the linked list.
Example
Given graph G; start node u (diagram omitted).
The total running time of topological sort is Θ(V + E), since DFS(G)
takes Θ(V + E) time and it takes O(1) time to insert each of the |V| vertices onto the front of the linked list.
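The DFS-based procedure can be sketched in Python; a list with front insertion stands in for the linked list, and the sample DAG is my own illustration:

```python
def topological_sort(adj):
    """DFS; push each vertex onto the front of a list as it finishes."""
    order, visited = [], set()
    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.insert(0, u)      # finished vertices go to the front
    for u in adj:
        if u not in visited:
            visit(u)
    return order

# DAG: a -> b -> d, a -> c -> d
dag = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
order = topological_sort(dag)
```

Every edge (u, v) then has u before v in the returned ordering; note this only works on acyclic inputs, as the theorem requires.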