
Winter 2014 Parallel Processing, Fundamental Concepts Slide 1

3 Parallel Algorithm Complexity

Review algorithm complexity and various complexity classes:
• Introduce the notions of time and time/cost optimality
• Derive tools for analysis, comparison, and fine-tuning

Topics in This Chapter

3.1 Asymptotic Complexity

3.2 Algorithm Optimality and Efficiency

3.3 Complexity Classes

3.4 Parallelizable Tasks and the NC Class

3.5 Parallel Programming Paradigms

3.6 Solving Recurrences


3.1 Asymptotic Complexity

Fig. 3.1 Graphical representation of the notions of asymptotic complexity. Each of the three panels plots f(n) against constant multiples of g(n) for n beyond a threshold n0:

f(n) = O(g(n)): f(n) stays below c g(n)
f(n) = Ω(g(n)): f(n) stays above c g(n)
f(n) = Θ(g(n)): f(n) stays between c g(n) and c' g(n)

Examples: 3n log n = O(n^2); ½ n log2 n = Ω(n); 3n^2 + 200n = Θ(n^2)
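The three examples can be spot-checked numerically. The sketch below is only an illustration, not a proof: it assumes that log2 n denotes the base-2 logarithm, and the witness constants c, c' and thresholds n0 are choices made here, not values given on the slide.

```python
import math

def holds_for_all(pred, n0, n_max=10_000):
    """Check pred(n) for every integer n in [n0, n_max] (a finite spot check, not a proof)."""
    return all(pred(n) for n in range(n0, n_max + 1))

# 3n log n = O(n^2): witnesses c = 3, n0 = 1, because log2 n <= n for n >= 1.
big_oh = holds_for_all(lambda n: 3 * n * math.log2(n) <= 3 * n * n, n0=1)

# (1/2) n log2 n = Omega(n): witnesses c = 1, n0 = 4, because log2 n >= 2 for n >= 4.
big_omega = holds_for_all(lambda n: 0.5 * n * math.log2(n) >= n, n0=4)

# 3n^2 + 200n = Theta(n^2): witnesses c = 3, c' = 4, n0 = 200, because 200n <= n^2 for n >= 200.
big_theta = holds_for_all(lambda n: 3 * n * n <= 3 * n * n + 200 * n <= 4 * n * n, n0=200)

print(big_oh, big_omega, big_theta)
```

Each inequality has the form used in the panel descriptions: f(n) bounded by a constant multiple of g(n) once n reaches n0.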


Little Oh, Big Oh, and Their Buddies

Notation            Growth rate             Example of use
f(n) = o(g(n))      strictly less than      T(n) = cn^2 + o(n^2)
f(n) = O(g(n))      no greater than         T(n, m) = O(n log n + m)
f(n) = Θ(g(n))      the same as             T(n) = Θ(n log n)
f(n) = Ω(g(n))      no less than            T(n, m) = Ω(n + m^(3/2))
f(n) = ω(g(n))      strictly greater than   T(n) = ω(log n)


Some Commonly Encountered Growth Rates

Notation            Class name            Notes
O(1)                Constant              Rarely practical
O(log log n)        Double-logarithmic    Sublogarithmic
O(log n)            Logarithmic
O(log^k n)          Polylogarithmic       k is a constant
O(n^a), a < 1                             e.g., O(n^(1/2)) or O(n^(1-ε))
O(n / log^k n)                            Still sublinear
------------------------------------------------------------
O(n)                Linear
------------------------------------------------------------
O(n log^k n)                              Superlinear
O(n^c), c > 1       Polynomial            e.g., O(n^(1+ε)) or O(n^(3/2))
O(2^n)              Exponential           Generally intractable
O(2^(2^n))          Double-exponential    Hopeless!


Complexity History of Some Real Problems

Examples from the book Algorithmic Graph Theory and Perfect Graphs [GOLU04]:

Complexity of determining whether an n-vertex graph is planar:

Exponential    Kuratowski 1930
O(n^3)         Auslander and Porter 1961; Goldstein 1963; Shirey 1969
O(n^2)         Lempel, Even, and Cederbaum 1967
O(n log n)     Hopcroft and Tarjan 1972
O(n)           Hopcroft and Tarjan 1974; Booth and Lueker 1976

A second, more complex example: max network flow, n vertices, e edges. Successive bounds:
ne^2, n^2 e, n^3, n^2 e^(1/2), n^(5/3) e^(2/3), ne log^2 n, ne log(n^2/e), ne + n^(2+ε), ne log_{e/(n log n)} n, ne log_{e/n} n + n^2 log^(2+ε) n


3.2 Algorithm Optimality and Efficiency

Suppose that we have constructed a valid algorithm to solve a given problem of size n in g(n) time, where g(n) is a known function such as n log2 n or n^2, obtained through exact or asymptotic analysis.

A question of interest is whether the algorithm at hand is the best algorithm for solving the problem. In other words, what is the running time f(n) of the fastest algorithm for solving this problem?

Of course, algorithm quality can be judged in many different ways, such as:

• running time
• resource requirements
• simplicity (which affects the cost of development, debugging, and maintenance)
• portability


If we are interested in asymptotic comparison, then because an algorithm with running time g(n) is already known, f(n) = O(g(n)); i.e., for large n, the running time of the best algorithm is upper bounded by c g(n) for some constant c.

If, subsequently, someone develops an asymptotically faster algorithm for solving the same problem, say in time h(n), we conclude that f(n) = O(h(n)).

The process of constructing and improving algorithms thus contributes to the establishment of tighter upper bounds for the complexity of the best algorithm.



Concurrently with the establishment of upper bounds as discussed above, we might work on determining lower bounds on a problem's time complexity.

A lower bound is useful as it tells us how much room for improvement there might be in existing algorithms.



Lower bounds can be established by a number of methods, for example:

1. Showing that, in the worst case, solution of the problem requires data to travel a certain distance or that a certain volume of data must pass through a limited-bandwidth interface. An example of the first kind is the observation that any sorting algorithm on a p-processor square mesh needs at least 2√p - 2 communication steps in the worst case (diameter-based lower bound). The second kind is exemplified by the worst-case linear time required by any sorting algorithm on a binary tree architecture (bisection-based lower bound).

2. Showing that, in the worst case, solution of the problem requires that a certain number of elementary operations be performed. This is the method used for establishing the Ω(n log n) lower bound for comparison-based sequential sorting algorithms.

3. Showing that any instance of a previously analyzed problem can be converted to an instance of the problem under study, so that an algorithm for solving our problem can also be used, with simple pre- and postprocessing steps, to solve the previous problem. A lower bound known for the previous problem then carries over to the problem under study.
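The diameter-based bound in method 1 rests on the fact that a √p × √p mesh without wraparound links has diameter 2√p - 2. The following minimal sketch computes that diameter by breadth-first search for a few small meshes; the function name `mesh_diameter` and the choice to store processors as (row, column) pairs are illustrative assumptions.

```python
from collections import deque

def mesh_diameter(side):
    """Diameter (longest shortest path) of a side x side mesh without wraparound links."""
    def eccentricity(start):
        # Breadth-first search from start; returns the distance to the farthest node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < side and 0 <= nc < side and (nr, nc) not in dist:
                    dist[(nr, nc)] = dist[(r, c)] + 1
                    queue.append((nr, nc))
        return max(dist.values())

    # Check every node rather than assuming that a corner attains the maximum.
    return max(eccentricity((r, c)) for r in range(side) for c in range(side))

for side in (2, 3, 4, 5):
    p = side * side
    print(p, mesh_diameter(side), 2 * side - 2)  # the last two columns agree
```

An item that starts in one corner and belongs in the opposite corner must traverse this many links, which is why no sorting algorithm on the mesh can use fewer communication steps in the worst case.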



Fig. 3.2 Upper and lower bounds may tighten over time.

Upper bounds: Deriving/analyzing algorithms and proving them correct
Lower bounds: Theoretical arguments based on bisection width, and the like

Typical complexity classes (horizontal axis): log n, log^2 n, n / log n, n, n log log n, n log n, n^2. Classes below n are sublinear; classes above n are superlinear.

Improving upper bounds: 1982 Anne's alg. O(n^2); 1988 Bert's alg. O(n log n); 1991 Chin's alg. O(n log log n); 1996 Dana's alg. O(n)
Shifting lower bounds: 1988 Zak's thm. Ω(log n); 1994 Ying's thm. Ω(log^2 n)

Optimal algorithm? Its complexity lies in the remaining gap between the two sets of bounds.



Some Notions of Algorithm Optimality

In the following, n is the problem size and p is the number of processors.

Time optimality (optimal algorithm, for short)
T(n, p) = g(n, p), where g(n, p) is an established lower bound

Cost-time optimality (cost-optimal algorithm, for short)
pT(n, p) = T(n, 1); i.e., redundancy = utilization = 1

Cost-time efficiency (efficient algorithm, for short)
pT(n, p) = Θ(T(n, 1)); i.e., redundancy = utilization = Θ(1)
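To make the cost-time criteria concrete, the sketch below evaluates the ratio pT(n, p) / T(n, 1) for parallel summation. The timing model is an assumption made for this illustration and does not come from the slides: T(n, 1) = n - 1 additions, and T(n, p) = ceil(n/p) - 1 + ceil(log2 p) steps (each processor adds its own block, then the p partial sums are combined in a binary tree).

```python
import math

def t_seq(n):
    """Assumed sequential time: n - 1 additions."""
    return n - 1

def t_par(n, p):
    """Assumed parallel time: local block sums, then a binary-tree combine."""
    return math.ceil(n / p) - 1 + math.ceil(math.log2(p))

def cost_ratio(n, p):
    """p T(n, p) / T(n, 1); bounded by a constant for a cost-time efficient algorithm."""
    return p * t_par(n, p) / t_seq(n)

for k in (4, 8, 16):
    n = 2 ** k
    # With p = n the ratio grows like log2 n; with p = n / log2 n it stays below 2.
    print(n, round(cost_ratio(n, n), 2), round(cost_ratio(n, n // k), 2))
```

Under this model, using p = n processors is fast but not cost-time efficient, whereas p = n / log2 n processors keeps pT(n, p) = Θ(T(n, 1)).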


3.3 Complexity Classes

In complexity theory, problems are divided into several complexity classes according to their running times on a single-processor system (or a deterministic Turing machine, to be more exact).

Problems whose running times are upper bounded by polynomials in n are said to belong to the P class and are generally considered to be tractable, even if the polynomial is of such a high degree that a large problem requires years of computation on the fastest available supercomputer.


Problems for which the best known deterministic algorithm runs in exponential time are intractable. For example, if solving a problem of size n requires the execution of 2^n machine instructions, the running time for n = 100 on a GIPS (giga IPS) processor will be around 400 billion centuries!

A problem of this kind for which, when given a solution, the correctness of the solution can be verified in polynomial time is said to belong to the NP (nondeterministic polynomial) class.
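The 400-billion-century figure can be reproduced with a few lines of arithmetic. The sketch assumes a year of 365.25 days; the variable names are illustrative.

```python
instructions = 2 ** 100               # 2^n machine instructions for n = 100
rate = 10 ** 9                        # one GIPS = 10^9 instructions per second
seconds = instructions / rate

seconds_per_century = 100 * 365.25 * 24 * 3600   # assumes 365.25-day years
centuries = seconds / seconds_per_century

# Perfect speedup over one billion processors divides the time by 10^9.
centuries_with_billion_processors = centuries / 10 ** 9

print(f"{centuries:.2e} centuries on one processor")
print(f"{centuries_with_billion_processors:.0f} centuries on a billion processors")
```

The result is roughly 4 × 10^11 centuries on one processor, and still about 400 centuries with a billion perfectly used processors.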



Figure 3.4. A conceptual view of complexity classes and their relationships



3.4 Parallelizable Tasks and the NC Class

A problem that takes 400 billion centuries to solve on a uniprocessor would still take 400 centuries even if it could be perfectly parallelized over 1 billion processors. Parallel processing is therefore generally of no avail for solving intractable problems such as the hard problems in NP. Again, this statement does not refer to specific instances of a problem but to a general solution for all instances.

Thus, parallel processing is primarily useful for speeding up the execution of problems in P.


Efficiently parallelizable problems in P might be defined as those problems that can be solved in a time period that is at most polylogarithmic in the problem size n, i.e., T(p) = O(log^k n) for some constant k, using no more than a polynomial number p = O(n^l) of processors.

This class of problems, whose definition was proposed by Nicholas Pippenger, was later named Nick's Class (NC) in his honor. The class NC has been extensively studied and forms a foundation for parallel complexity theory.
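Summing n numbers is a simple problem that meets this definition: a binary-tree reduction finishes in about log2 n parallel rounds using at most n/2 processors (k = 1, l = 1). The sketch below only simulates the rounds sequentially to count them; `tree_sum` is a name chosen here for illustration.

```python
def tree_sum(values):
    """Simulate a parallel tree reduction; return (total, number of parallel rounds)."""
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # The pairwise additions of one round are mutually independent, so
        # len(level) // 2 processors could perform them simultaneously.
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # an unpaired element moves up unchanged
        level = paired
        rounds += 1
    return (level[0] if level else 0), rounds

total, rounds = tree_sum(range(1024))
print(total, rounds)  # 1024 inputs need 10 rounds
```

The round count grows as ceil(log2 n), which is polylogarithmic, while the processor count stays polynomial in n.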



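3.5 Parallel Programming Paradigms

Divide and conquer
Decompose problem of size n into smaller problems; solve subproblems independently; combine subproblem results into final answer.
T(n) = Td(n) + Ts + Tc(n), i.e., the time to decompose, plus the time to solve the subproblems, plus the time to combine the results.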
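As one possible illustration of the decompose/solve/combine pattern, the sketch below runs merge sort and also returns a modeled parallel time. The cost model is an assumption made here, not part of the slides: splitting costs one step, merging n items costs n steps, and the two halves are solved at the same time, so Ts is the larger of the two subproblem times.

```python
def merge_sort_with_time(items):
    """Return (sorted list, modeled parallel time) for divide-and-conquer merge sort."""
    n = len(items)
    if n <= 1:
        return list(items), 0

    mid = n // 2
    t_d = 1                                    # Td(n): split into halves (assumed 1 step)

    left, t_left = merge_sort_with_time(items[:mid])
    right, t_right = merge_sort_with_time(items[mid:])
    t_s = max(t_left, t_right)                 # Ts: halves solved independently, in parallel

    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    t_c = n                                    # Tc(n): sequential merge (assumed n steps)

    return merged, t_d + t_s + t_c

print(merge_sort_with_time([5, 2, 7, 1, 8, 3, 6, 4]))
```

With this model the combine step dominates, so the modeled time is linear in n rather than logarithmic; a faster parallel merge would be needed to reduce Tc(n).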

Randomization
When it is impossible or difficult to decompose a large problem into subproblems with equal solution times, one might use random decisions that lead to good results with very high probability.
Example: sorting with random sampling

Approximation
Iterative numerical methods may use approximation to arrive at solution(s).
Example: solving linear systems using Jacobi relaxation. Under proper conditions, the iterations converge to the correct solutions; more iterations yield greater accuracy.
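A minimal sketch of Jacobi relaxation follows. The 2 × 2 system is an example chosen here (it is diagonally dominant, which is one sufficient condition for convergence), and the function name and the all-zero starting vector are likewise assumptions.

```python
def jacobi(a, b, iterations):
    """Approximate the solution of a x = b by Jacobi relaxation, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Each new component depends only on the previous vector, so all n
        # updates are independent and could be computed in parallel.
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x

a = [[4.0, 1.0], [2.0, 5.0]]   # diagonally dominant coefficient matrix
b = [7.0, 17.0]                # exact solution is x = [1, 3]

for iterations in (5, 20, 40):
    x = jacobi(a, b, iterations)
    error = max(abs(x[0] - 1.0), abs(x[1] - 3.0))
    print(iterations, x, error)
```

The printed error shrinks as the iteration count grows, which is the "more iterations, greater accuracy" behavior described above.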


Other randomization methods include:

1. Random search: When a large space must be searched for an element with certain desired properties, and it is known that such elements are abundant, random search can lead to very good average-case performance.

2. Control randomization: To avoid consistently experiencing close to worst-case performance with one algorithm, related to some unfortunate distribution of inputs, the algorithm to be applied for solving a problem, or an algorithm parameter, can be chosen at random.

3. Symmetry breaking: Interacting deterministic processes may exhibit a cyclic behavior that leads to deadlock (akin to two people colliding when they try to exit a room through a narrow door, backing up, and then colliding again). Randomization can be used to break the symmetry and thus the deadlock.
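The narrow-door analogy in item 3 can be simulated with a toy model. Everything in the sketch is an assumption made for illustration: two parties each flip a fair coin per round to decide whether to go or wait, and the conflict is resolved in the first round in which their choices differ. A fixed seed is used only to make the run repeatable.

```python
import random

def rounds_until_resolved(rng, max_rounds=1000):
    """Number of rounds until two symmetric parties make different random choices."""
    for round_no in range(1, max_rounds + 1):
        a_goes = rng.random() < 0.5
        b_goes = rng.random() < 0.5
        if a_goes != b_goes:       # exactly one party goes: symmetry is broken
            return round_no
    return None                    # unresolved within max_rounds (extremely unlikely)

rng = random.Random(2014)
trials = [rounds_until_resolved(rng) for _ in range(10_000)]
average = sum(trials) / len(trials)
print(f"average rounds to break the symmetry: {average:.2f}")
```

Because the choices differ with probability 1/2 in every round, the expected number of rounds is 2, whereas two deterministic parties following the same rule would repeat the collision forever.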



3.6 Solving Recurrences

In all examples below, f(1) = 0 is assumed.

f(n) = f(n/2) + 1        {rewrite f(n/2) as f((n/2)/2) + 1}
     = f(n/4) + 1 + 1
     = f(n/8) + 1 + 1 + 1
     . . .
     = f(n/n) + 1 + 1 + 1 + . . . + 1        (log2 n ones)
     = log2 n = Θ(log n)

This method is known as unrolling.

f(n) = f(n - 1) + n      {rewrite f(n - 1) as f((n - 1) - 1) + n - 1}
     = f(n - 2) + n - 1 + n
     = f(n - 3) + n - 2 + n - 1 + n
     . . .
     = f(1) + 2 + 3 + . . . + n - 1 + n
     = n(n + 1)/2 - 1 = Θ(n^2)
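The two closed forms obtained by unrolling, log2 n and n(n + 1)/2 - 1, can be checked by evaluating the recurrences directly. This is a small sketch; the function names are illustrative, and the first recurrence is evaluated only for n a power of 2.

```python
def f_halving(n):
    """f(n) = f(n/2) + 1 with f(1) = 0, for n a power of 2."""
    return 0 if n == 1 else f_halving(n // 2) + 1

def f_decrement(n):
    """f(n) = f(n - 1) + n with f(1) = 0, evaluated iteratively."""
    total = 0
    for k in range(2, n + 1):
        total += k
    return total

for exponent in range(0, 11):
    assert f_halving(2 ** exponent) == exponent            # log2 n

for n in range(1, 200):
    assert f_decrement(n) == n * (n + 1) // 2 - 1          # n(n + 1)/2 - 1

print("both closed forms agree with the recurrences")
```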


More Examples of Recurrence Unrolling

f(n) = f(n/2) + n
     = f(n/4) + n/2 + n
     = f(n/8) + n/4 + n/2 + n
     . . .
     = f(n/n) + 2 + 4 + . . . + n/4 + n/2 + n
     = 2n - 2 = Θ(n)

f(n) = 2f(n/2) + 1
     = 4f(n/4) + 2 + 1
     = 8f(n/8) + 4 + 2 + 1
     . . .
     = n f(n/n) + n/2 + . . . + 4 + 2 + 1
     = n - 1 = Θ(n)


Still More Examples of Unrolling

f(n) = f(n/2) + log2 n
     = f(n/4) + log2(n/2) + log2 n
     = f(n/8) + log2(n/4) + log2(n/2) + log2 n
     . . .
     = f(n/n) + log2 2 + log2 4 + . . . + log2(n/2) + log2 n
     = 1 + 2 + 3 + . . . + log2 n
     = log2 n (log2 n + 1)/2 = Θ(log^2 n)

f(n) = 2f(n/2) + n
     = 4f(n/4) + n + n
     = 8f(n/8) + n + n + n
     . . .
     = n f(n/n) + n + n + n + . . . + n        (log2 n terms)
     = n log2 n = Θ(n log n)
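The halving recurrences f(n) = f(n/2) + n, 2f(n/2) + 1, f(n/2) + log2 n, and 2f(n/2) + n all share the shape f(n) = a f(n/2) + h(n), so one small evaluator can confirm their closed forms for powers of 2. The helper names `solve` and `ilog2` are assumptions introduced for this sketch.

```python
def ilog2(n):
    """Exact base-2 logarithm of a power of 2."""
    return n.bit_length() - 1

def solve(a, h, n):
    """Evaluate f(n) = a * f(n/2) + h(n) with f(1) = 0, for n a power of 2."""
    return 0 if n == 1 else a * solve(a, h, n // 2) + h(n)

# (a, h(n), closed form obtained by unrolling)
cases = [
    (1, lambda n: n,        lambda n: 2 * n - 2),
    (2, lambda n: 1,        lambda n: n - 1),
    (1, ilog2,              lambda n: ilog2(n) * (ilog2(n) + 1) // 2),
    (2, lambda n: n,        lambda n: n * ilog2(n)),
]

for a, h, closed_form in cases:
    for exponent in range(0, 16):
        n = 2 ** exponent
        assert solve(a, h, n) == closed_form(n)

print("all four closed forms agree with the recurrences")
```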


Master Theorem for Recurrences

Theorem 3.1:

Given f(n) = a f(n/b) + h(n); a, b constant, h arbitrary function,
the asymptotic solution to the recurrence is (with c = log_b a):

f(n) = Θ(n^c)          if h(n) = O(n^(c - ε)) for some ε > 0
f(n) = Θ(n^c log n)    if h(n) = Θ(n^c)
f(n) = Θ(h(n))         if h(n) = Ω(n^(c + ε)) for some ε > 0

Example: f(n) = 2 f(n/2) + 1
a = b = 2; c = log_b a = 1
h(n) = 1 = O(n^(1 - ε))
f(n) = Θ(n^c) = Θ(n)
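For the common special case h(n) = Θ(n^d), choosing among the three cases of Theorem 3.1 reduces to comparing d with c = log_b a. The sketch below handles only that special case; the function name, its parameters, and the output format are assumptions made for illustration.

```python
import math

def master_theorem(a, b, d):
    """Asymptotic solution of f(n) = a f(n/b) + h(n), restricted to h(n) = Theta(n^d)."""
    c = math.log(a, b)                    # c = log_b a
    if math.isclose(d, c):
        return f"Theta(n^{c:g} log n)"    # h(n) = Theta(n^c)
    if d < c:
        return f"Theta(n^{c:g})"          # h(n) = O(n^(c - eps))
    return f"Theta(n^{d:g})"              # h(n) = Omega(n^(c + eps)), so f(n) = Theta(h(n))

print(master_theorem(2, 2, 0))   # f(n) = 2 f(n/2) + 1
print(master_theorem(2, 2, 1))   # f(n) = 2 f(n/2) + n
print(master_theorem(1, 2, 1))   # f(n) = f(n/2) + n
print(master_theorem(1, 2, 0))   # f(n) = f(n/2) + 1
```

These four calls correspond to recurrences solved earlier by unrolling, and the reported classes match those results: Θ(n), Θ(n log n), Θ(n), and Θ(log n), the last one printed as n^0 log n.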


The End
