
Page 1

Adding Parallelism to Undergraduate Algorithms

Computational Models and Algorithms

Page 2

RAM Model for Sequential Computation

• RAM = Random Access Machine
• Serves as a universal model for sequential computation
• Based on the von Neumann architecture
• Allows a cost to be assigned to different operations
  – Constant-time operations include memory accesses; adding, subtracting, or multiplying numbers; incrementing indices; etc.
• Allows a cost to be assigned to loops, based on their content and number of repetitions

2
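To make the cost accounting above concrete, here is a small C sketch (the array contents and the exact charging scheme are illustrative assumptions, not part of the RAM model's definition): it sums n numbers and tallies the constant-time operations the model would charge, and the total grows as roughly 4n + 1, i.e., Theta(n).

#include <stdio.h>

/* RAM-model cost illustration: summing n numbers.
 * Each memory access, arithmetic operation, comparison, and index
 * increment is charged one unit of (constant) time. */
int main(void)
{
    int a[] = {5, 3, 8, 1, 9, 2};     /* hypothetical input */
    int n = 6;
    long sum = 0, ops = 0;

    for (int i = 0; i < n; i++) {     /* n + 1 comparisons, n increments */
        sum += a[i];                  /* n memory accesses, n additions */
        ops += 4;                     /* charge: access + add + compare + increment */
    }
    ops += 1;                         /* the final, failing loop comparison */

    printf("sum = %ld, charged operations = %ld (about 4n + 1)\n", sum, ops);
    return 0;                         /* the loop's total cost is Theta(n) */
}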

Page 3

Benefits of a Common Sequential Model

• Has fostered six decades of prosperity and rapid growth in computer software
• Single-CPU performance grew exponentially as the CPU clock frequency increased
• The laws of physics now limit further increases in the clock frequency of sequential hardware, due to voltage leakage and heat-dissipation problems

3

Page 4

Flynn's Taxonomy for Parallel Computers

• Instruction streams (I) and data streams (D) are classified as being single (S) or multiple (M)
• MIMD: multiple instruction streams and multiple data streams
  – Generally considered to be the most important class
  – Includes most computers currently being built
• SIMD: single instruction stream and multiple data streams
  – Includes the massively parallel computers of earlier years: MPP, Connection Machine, MP-1
• Hybrids of these two major classes exist as well.

4

Page 5

MIMD

• Multiple instruction streams, multiple data streams
• Processors are asynchronous, since they can independently execute different instructions on different data sets
• Communications are handled either
  – through shared memory (multiprocessors), or
  – by use of message passing (multicomputers)
• MIMD computers have been considered by most researchers to include the most important and least restricted computers.

5

Page 6

The SIMD Parallel Model

• SIMD means Single Instruction, Multiple Data
• The instruction stream (IS) processor is also called the control unit and stores the program to be executed
• The IS compiles the program and broadcasts the executable commands to the PEs (processing elements)
• The PEs are very simple processors, essentially ALUs
• The active PEs execute the IS steps synchronously.

6

Page 7

MIMD Parallel Models

• Parallel programming should be as simple as sequential programming
  – Sequential programming is deterministic and predictable
• Unfortunately, there is no common model for parallel execution
• Parallel programming is normally optimized for average-case performance, and typically the worst case is not considered
• Non-determinism is key to achieving high performance and scalable programs on many-core systems

7

Page 8

Problems that Make Parallel Programming Difficult

• Race conditions
• Data dependencies
• Load balancing
• Non-determinism
• Deadlocks
• Dynamic scheduling, normally required for real-time applications
• Numerous NP-hard problems for MIMDs
  – See Garey and Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, and related references
  – These include load balancing and dynamic scheduling

8

Page 9

Consequences

• MIMD software solutions to non-trivial problems will require much more software
• NP-hard problems are typically handled using fast polynomial approximation algorithms
• For large problems, the software required to handle the preceding problems is typically many times larger than the sequential solution to the original problem
• Use of SIMD computation can avoid essentially all of these problems.

9

Page 10

Additional SIMD Properties

• Varying subsets of the PEs can remain idle during execution of portions of the program
• All active processors execute the same instruction synchronously, but on different data
• On memory accesses, all active PEs must access the same location in their local memories
• The data items stored at the same location in different PEs form an array or vector
  – An instruction can act on the entire vector in one step (see the sketch below)

10
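The C sketch below simulates the properties listed on the slide above; it is a software illustration, not ClearSpeed or STARAN code, and all names in it are made up. Each simulated PE holds one slot of a vector field in its own local memory, an activity flag marks which PEs participate, and a single broadcast operation is applied to every active PE in the same logical step.

#include <stdio.h>

#define NUM_PE 8

/* Each simulated PE has its own local memory holding one slot of a vector field. */
typedef struct {
    int active;   /* 1 = executes broadcast instructions, 0 = idle for this step */
    int value;    /* the "same location" in every PE's local memory forms a vector */
} PE;

/* One broadcast step: every active PE applies the same operation to its own data. */
static void broadcast_add(PE pe[], int n, int operand)
{
    for (int i = 0; i < n; i++)       /* the loop only simulates what hardware does in one step */
        if (pe[i].active)
            pe[i].value += operand;
}

int main(void)
{
    PE pe[NUM_PE];
    for (int i = 0; i < NUM_PE; i++) {
        pe[i].active = (i % 2 == 0);  /* activate an arbitrary subset of the PEs */
        pe[i].value  = i * 10;
    }

    broadcast_add(pe, NUM_PE, 5);     /* acts on the entire vector "in one step" */

    for (int i = 0; i < NUM_PE; i++)
        printf("PE%d: active=%d value=%d\n", i, pe[i].active, pe[i].value);
    return 0;
}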

Page 11

How to View a SIMD Machine

• Think of soldiers in formation
• The commander selects certain soldiers as active, for example, the first row
• The commander barks out an order to all active soldiers (e.g., "at ease")
• These soldiers execute the order synchronously
• The remaining soldiers do not execute any orders until they are reactivated.

11

Page 12

12

The SIMD Model

[Diagram: the IS broadcasts to an array of PEs over a network; each PE is paired with its own local memory.]

Page 13

13

Typical Data Structure for SIMD Model

• Make, Color, etc. are fields established by the programmer
• An instruction can act on part or all of a field (i.e., a vector) in one step

[Table from the slide; the IS is shown beside the PE array, and the Model and Price cells are left blank on the slide:]

PE    Busy-idle   Make     Color   Year   Model   Price   On lot
PE1   1           Dodge    red     1994                   1
PE2   0
PE3   1           Ford     blue    1996                   0
PE4   1           Ford     white   1998                   0
PE5   0
PE6   0
PE7   1           Subaru   red     1997                   1

Page 14

14

Associative Computers

• A SIMD computer with a few additional constant-time operations, supported in hardware
• These additional features can normally be supported (less efficiently) in software on traditional SIMDs
• The name "associative" is due to the computer's ability to locate items in the memory of the PEs by content rather than by location.

Page 15

ASC Additional Operations

• Responder processing
  – The IS can detect whether a data test is satisfied by any PE in constant time (the any-responders property)
  – The IS can select an arbitrary responding PE in constant time (the pick-one property)
• Constant-time global operations (across PEs)
  – Compute the logical OR and AND of binary values from a vector field
  – Compute the maximum or minimum of the values in a vector field
  – Associative searches

15
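Below is a minimal C simulation of the responder-processing and global operations just listed. Loops over a PE array stand in for the hardware resolver and reduction network, so the simulation runs in O(number of PEs) rather than constant time, and the struct and field names are illustrative.

#include <stdio.h>

#define NUM_PE 6

typedef struct {
    int responder;   /* set by an associative search */
    int value;       /* one slot of a vector field across the PEs */
} PE;

/* Any-responders: did any PE satisfy the last data test?  (hardware: constant time) */
static int any_responders(const PE pe[], int n)
{
    for (int i = 0; i < n; i++)
        if (pe[i].responder) return 1;
    return 0;
}

/* Pick-one: select an arbitrary responding PE.  (hardware: constant time) */
static int pick_one(const PE pe[], int n)
{
    for (int i = 0; i < n; i++)
        if (pe[i].responder) return i;
    return -1;
}

/* Global minimum of a vector field over all PEs.  (hardware: constant time) */
static int global_min(const PE pe[], int n)
{
    int best = pe[0].value;
    for (int i = 1; i < n; i++)
        if (pe[i].value < best) best = pe[i].value;
    return best;
}

int main(void)
{
    PE pe[NUM_PE] = { {0, 7}, {1, 3}, {0, 9}, {1, 3}, {0, 8}, {0, 5} };

    printf("any responders? %d\n", any_responders(pe, NUM_PE));
    printf("picked PE:      %d\n", pick_one(pe, NUM_PE));
    printf("global minimum: %d\n", global_min(pe, NUM_PE));
    return 0;
}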

Page 16

16

The Associative Search

[The slide repeats the car table from page 13, including the Busy-idle and On-lot bit columns.]

• The IS asks for all cars that are red and on the lot.
• PE1 and PE7 respond by setting a mask bit in their PEs.
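The following C sketch renders this search in software (the row layout of the table is illustrative and only a few fields are filled in; the loop stands in for a single broadcast compare step in hardware): every busy PE tests "red and on the lot" against its own record and sets its mask bit, and PE1 and PE7 come out as the responders.

#include <stdio.h>
#include <string.h>

#define NUM_PE 7

typedef struct {
    int  busy;          /* busy-idle bit: 1 = this PE holds a car record */
    char make[10];
    char color[10];
    int  onlot;
    int  mask;          /* responder bit set by the associative search */
} CarPE;

int main(void)
{
    /* Records loosely following the slide's table (exact row layout is illustrative). */
    CarPE pe[NUM_PE] = {
        {1, "Dodge",  "red",   1, 0},   /* PE1 */
        {0, "",       "",      0, 0},   /* PE2 (idle) */
        {1, "Ford",   "blue",  0, 0},   /* PE3 */
        {1, "Ford",   "white", 0, 0},   /* PE4 */
        {0, "",       "",      0, 0},   /* PE5 (idle) */
        {0, "",       "",      0, 0},   /* PE6 (idle) */
        {1, "Subaru", "red",   1, 0},   /* PE7 */
    };

    /* The IS broadcasts the search "red AND on the lot"; every busy PE tests its own
     * record in the same step and sets its mask bit if it responds. */
    for (int i = 0; i < NUM_PE; i++)
        if (pe[i].busy && strcmp(pe[i].color, "red") == 0 && pe[i].onlot)
            pe[i].mask = 1;

    for (int i = 0; i < NUM_PE; i++)
        if (pe[i].mask)
            printf("PE%d responds: %s %s, on lot\n", i + 1, pe[i].make, pe[i].color);
    return 0;   /* prints PE1 and PE7, as on the slide */
}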

Page 17

17

Non-SIMD Properties of ASC

• Observation: The ASC properties that are unusual for SIMDs are the following constant-time operations:
  – Constant-time responder processing
    • Any-responders?
    • Pick-one
  – Constant-time global operations
    • Logical OR and AND of binary values
    • Maximum and minimum value of numbers
    • Associative searches
• Justification: These timings are justified by implementations using a resolver in the paper by Jin, Baker, & Batcher (listed in the associative references; paper posted).

Page 18

18

Associative Computer Motivation

• The STARAN computer (Goodyear Aerospace, late 1960s to early 1970s) was built for air traffic control (ATC); Kenneth Batcher was its chief architect
  – Its primary purpose was to handle air traffic control
• A second-generation STARAN, called the ASPRO, was built for use by the Navy for a combination of ATC and air defense systems
  – It was heavily used by the Navy for over 10 years
• ATC is essentially a dynamic database problem, with most of its data changing very rapidly
• The additional "non-SIMD" properties allow "flat" dynamic database records to be accessed and changed extremely rapidly (i.e., in constant time).

Page 19

Advantages of ASC Algorithms over SIMD Algorithms

• The non-SIMD properties of ASC simplify the presentation of SIMD algorithms
• ASC algorithms frequently look almost identical to their sequential counterparts
• The ASC associative search often allows linked lists, queues, and sorted lists to be eliminated
  – The ASC algorithm often appears simpler than its sequential counterpart
• The non-SIMD constant-time operations simplify the complexity analysis of ASC algorithms.

19

Page 20

ASC Algorithms Advantages (cont.)

• The use of the non-SIMD constant-time operations eliminates the need to use the interconnection network in many algorithms
  – This makes it easier to design optimal algorithms
  – When the interconnection network is used, the running time normally has to include the diameter of the network as a factor.

20

Page 21

Curricular Advantages of Simpler Algorithms and Programs

• ASC algorithms and programs are very similar to sequential ones
• ASC computation is deterministic and predictable
• ASC algorithms and programs can be merged easily into the current curriculum with minimal effort
• Basic parallelism issues such as Amdahl's law, optimizing the use of processors, etc. can be emphasized
• This results in parallelism being easier to include early in the program
• MIMD issues like thread synchronization, race conditions, load balancing, false sharing, etc. can be postponed until later in the curriculum.

21

Page 22

Creating New ASC Algorithms

• ASC algorithms can often be created by making obvious modifications to one of the standard sequential algorithms for a given problem.
• The majority of PRAM algorithms involve the use of a single instruction stream. These can often be easily converted into an ASC algorithm.
• Potentially useful PRAM references:
  – Selim Akl, Parallel Computation: Models and Methods, Prentice Hall, 1997.
  – Russ Miller and Laurence Boxer, Algorithms Sequential & Parallel: A Unified Approach, Prentice Hall, 2000.
  – Joseph JaJa, An Introduction to Parallel Algorithms, Addison-Wesley, 1992.

22

Page 23

An Associative Program for the Minimal Spanning Tree

• Unlike the version of the MST algorithm discussed later, the objects here are nodes
• This implementation runs on a ClearSpeed SIMD chip, using the Cn language
  – Cn is a parallel extension of ANSI C
• The non-SIMD operations have been efficiently implemented as functions in Cn
  – These are not constant time; they run in O(number of PEs)
• The graph is represented using an adjacency matrix
• The code follows almost exactly the steps of a sequential algorithm for MST.

23

Page 24

24

An Associative Program for the MST Problem using ClearSpeed

Hassan AL-Maksousy, [email protected]

Page 25

25

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.]

[Figure: an undirected weighted graph on the vertices a, b, c, d, e, f with edge weights
a-b = 2, a-c = 8, b-c = 7, b-d = 4, b-e = 3, c-e = 6, c-f = 9, d-e = 3.]

Page 26

26

Data Structure for MST Algorithm

[IS scalars: root = a; next-node is not yet set. The candidate$, current_best$, parent$, and mask$ fields are still empty.]

node$   a$   b$   c$   d$   e$   f$   candidate$   current_best$   parent$
a       ∞    2    8    ∞    ∞    ∞
b       2    ∞    7    4    3    ∞
c       8    7    ∞    ∞    6    9
d       ∞    4    ∞    ∞    3    ∞
e       ∞    3    6    3    ∞    ∞
f       ∞    ∞    9    ∞    ∞    ∞

Page 27

27

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 28

28

Data Structure for MST Algorithm

[IS scalars: root = a, next-node = b.]

node$   a$   b$   c$   d$   e$   f$   candidate$   current_best$   parent$
a       ∞    2    8    ∞    ∞    ∞    no
b       2    ∞    7    4    3    ∞    yes          2               a
c       8    7    ∞    ∞    6    9    yes          8               a
d       ∞    4    ∞    ∞    3    ∞    wait
e       ∞    3    6    3    ∞    ∞    wait
f       ∞    ∞    9    ∞    ∞    ∞    wait

Page 29

29

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 30

30

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 31

31

Data Structure for MST Algorithm

[IS scalars: root = a, next-node = e.]

node$   a$   b$   c$   d$   e$   f$   candidate$   current_best$   parent$
a       ∞    2    8    ∞    ∞    ∞    no
b       2    ∞    7    4    3    ∞    no           2               a
c       8    7    ∞    ∞    6    9    yes          7               b
d       ∞    4    ∞    ∞    3    ∞    yes          4               b
e       ∞    3    6    3    ∞    ∞    yes          3               b
f       ∞    ∞    9    ∞    ∞    ∞    wait

Page 32

32

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 33

33

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 34

34

Data Structure for MST Algorithm

[IS scalars: root = a, next-node = d.]

node$   a$   b$   c$   d$   e$   f$   candidate$   current_best$   parent$
a       ∞    2    8    ∞    ∞    ∞    no
b       2    ∞    7    4    3    ∞    no           2               a
c       8    7    ∞    ∞    6    9    yes          6               e
d       ∞    4    ∞    ∞    3    ∞    yes          3               e
e       ∞    3    6    3    ∞    ∞    no           3               b
f       ∞    ∞    9    ∞    ∞    ∞    wait

Page 35

35

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 36

36

Data Structure for MST Algorithm

[IS scalars: root = a, next-node = c.]

node$   a$   b$   c$   d$   e$   f$   candidate$   current_best$   parent$
a       ∞    2    8    ∞    ∞    ∞    no
b       2    ∞    7    4    3    ∞    no           2               a
c       8    7    ∞    ∞    6    9    yes          6               e
d       ∞    4    ∞    ∞    3    ∞    no           3               e
e       ∞    3    6    3    ∞    ∞    no           3               b
f       ∞    ∞    9    ∞    ∞    ∞    wait

Page 37

37

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 38

38

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 39

39

Data Structure for MST Algorithm

[IS scalars: root = a, next-node = f.]

node$   a$   b$   c$   d$   e$   f$   candidate$   current_best$   parent$
a       ∞    2    8    ∞    ∞    ∞    no
b       2    ∞    7    4    3    ∞    no           2               a
c       8    7    ∞    ∞    6    9    no           6               e
d       ∞    4    ∞    ∞    3    ∞    no           3               e
e       ∞    3    6    3    ∞    ∞    no           3               b
f       ∞    ∞    9    ∞    ∞    ∞    yes          9               c

Page 40

40

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.] (repeated for reference)

Page 41

41

Short Version of Algorithm: ASC-MST-PRIM(root)

1.  Initialize candidates to "waiting"
2.  If there are any finite values in root's field,
3.      set candidate$ to "yes"
4.      set parent$ to root
5.      set current_best$ to the values in root's field
6.  Set root's candidate field to "no"
7.  Loop while some candidate$ contains "yes"
8.      for them
9.          restrict mask$ to mindex(current_best$)
10.     set next_node to a node identified in the preceding step
11.     set its candidate to "no"
12.     if the value in their next_node's field is less than current_best$, then
13.         set current_best$ to the value in next_node's field
14.         set parent$ to next_node
15.     if candidate$ is "waiting" and the value in its next_node's field is finite
16.         set candidate$ to "yes"
17.         set parent$ to next_node
18.         set current_best$ to the value in next_node's field

Page 42

42

Algorithm: ASC-MST-PRIM

• Initially assign any node to root.
• All processors set
  – candidate$ to "wait"
  – current_best$ to ∞
  – the candidate$ field for the root node to "no"
• All processors whose distance d from their node to the root node is finite do
  – Set their candidate$ field to "yes"
  – Set their parent$ field to root
  – Set current_best$ = d.

Page 43

43

Algorithm: ASC-MST-PRIM (cont. 2/3)

• While the candidate field of some processor is "yes",
  – Restrict the active processors to those whose candidate field is "yes" and (for these processors) do
    • Compute the minimum value x of current_best$.
    • Restrict the active processors to those with current_best$ = x and do
      – Pick an active processor, say node y.
        » Set the candidate$ value of node y to "no"
      – Set the scalar variable next_node to y.

Page 44

44

Algorithm: ASC-MST-PRIM (cont. 3/3)

  – If the value z in the next_node column of a processor is less than its current_best$ value, then
    » Set current_best$ to z.
    » Set parent$ to next_node.
  – For all processors, if candidate$ is "waiting" and the distance of its node from next_node y is finite, then
    • Set candidate$ to "yes"
    • Set current_best$ to the distance of its node from y.
    • Set parent$ to y.

Page 45

45

MST in Cn By Hassan AL-Maksousy

//----------------------------------------------------------
// mst.cn
// by Hassan AL-Maksousy
// 2012-02-02
//----------------------------------------------------------

#include <lib_ext.h>
#include <stdiop.h>
#include "asc.h"

struct nodes
{
    char nodeId;
    char candidate;
    int  nodeArray[6];
    int  current_best;
    int  parent;
};

//----------------------------------------------------------
void initialize(poly struct nodes* node)
{
    poly int i;

    // Initialize candidates to 'waiting'
    (*node).candidate = 'w';
    (*node).parent = -1;

    // All active processors set current_best to infinity (32 stands for infinity here)
    (*node).current_best = 32;
    for (i = 0; i < 6; i++)
    {
        (*node).nodeArray[i] = 32;
    }

    if (get_penum() == 0)
    {
        (*node).nodeId = 'a';
        (*node).nodeArray[1] = 2;
        (*node).nodeArray[2] = 8;
    }

    if (get_penum() == 1)
    {
        (*node).nodeId = 'b';
        (*node).nodeArray[0] = 2;
        (*node).nodeArray[2] = 7;
        (*node).nodeArray[3] = 4;
        (*node).nodeArray[4] = 3;
    }

    if (get_penum() == 2)
    {
        (*node).nodeId = 'c';

Page 46

46

        (*node).nodeArray[0] = 8;
        (*node).nodeArray[1] = 7;
        (*node).nodeArray[4] = 6;
        (*node).nodeArray[5] = 9;
    }

    if (get_penum() == 3)
    {
        (*node).nodeId = 'd';
        (*node).nodeArray[1] = 4;
        (*node).nodeArray[4] = 3;
    }

    if (get_penum() == 4)
    {
        (*node).nodeId = 'e';
        (*node).nodeArray[1] = 3;
        (*node).nodeArray[2] = 6;
        (*node).nodeArray[3] = 3;
    }

    if (get_penum() == 5)
    {
        (*node).nodeId = 'f';
        (*node).nodeArray[2] = 9;
    }
} // initialize

//----------------------------------------------------------
void showNodes(poly struct nodes node)
{
    poly char letter[7] = {' ', 'a', 'b', 'c', 'd', 'e', 'f'};

    // Print nodeId, candidate, parent, and current best distance.
    printfp("Node %c: Candidate %c, Parent: %c, Current Best: %d\n"
            , node.nodeId
            , node.candidate
            , letter[node.parent + 1]
            , ((node.current_best < 32) ? node.current_best : 0)
           );
    printf("\n");
} // End of showNodes(poly struct nodes)

//----------------------------------------------------------
int main()
{
    poly int mask = 0;
    // Initially assign any node to root.
    poly int root = 0;      // we choose the root here: node a
    poly int minimum = 0;

MST in Cn By Hassan AL-Maksousy

Page 47

47

    poly struct nodes node;
    poly int nextNode;
    int control = 0;
    int i;
    poly int minCpu = 100;
    poly char letter[6] = {'a', 'b', 'c', 'd', 'e', 'f'};

    if (get_penum() < 6)
    {
        // All processors set
        //   candidate$ to 'wait'
        //   current_best$ to infinity
        initialize(&node);
        showNodes(node);

        // ... and the candidate field for the root node to 'no'
        if (get_penum() == root)
        {
            node.candidate = 'n';
            control += 1;
            printfp("Root = %c\n", letter[root]);
        }

        // All processors whose distance d from their node to the root node is finite do
        if ((node.nodeArray[root] >= 0) &&
            (node.nodeArray[root] != 32))
        {
            // Set their candidate$ field to 'yes'
            node.candidate = 'y';
            // Set their parent$ field to root.
            node.parent = root;
            // Set current_best$ = d
            node.current_best = node.nodeArray[root];
        }

        // While the candidate field of some processor is 'yes'
        while (control < 6)
        {
            // Restrict the active processors to those whose candidate field
            // is 'yes' and (for these processors) do
            if (node.candidate == 'y')
            {
                mask = 1;
                // Compute the minimum value x of current_best$.
                minimum = min_int(node.current_best);
            }

MST in Cn By Hassan AL-Maksousy

Page 48

48

            nextNode = -1;
            showNodes(node);

            // Restrict the active processors to those with current_best$ = x and do
            if ((mask == 1) &&
                (minimum == node.current_best))
            {
                minCpu = get_penum();
                minCpu = min_int(minCpu);
            }

            // pick an active processor, say node y
            if (minCpu == get_penum())
            {
                // Set the candidate$ value of node y to 'no'
                node.candidate = 'n';
                control += 1;
                nextNode = get_penum();
                printfp("Next Node = %c\n", letter[nextNode]);
            }

            // save the nextNode value to all processors
            nextNode = max_int(nextNode);

            if (node.candidate != 'n')
            {
                // If the value z in the next_node column of a processor is less than
                // its current_best$ value, then
                if (node.nodeArray[nextNode] < node.current_best)
                {
                    // Set current_best$ to z.
                    node.current_best = node.nodeArray[nextNode];
                    // Set parent$ to next_node
                    node.parent = nextNode;
                }

                // For all processors, if candidate$ is "waiting" and the distance of its
                // node from next_node y is finite, then
                if ((node.nodeArray[nextNode] < 32) &&
                    (node.candidate == 'w'))
                {
                    // Set candidate$ to "yes"
                    node.candidate = 'y';

MST in Cn By Hassan AL-Maksousy

Page 49

49

                    // Set current_best$ to the distance of its node from y.
                    node.current_best = node.nodeArray[nextNode];
                    // Set parent$ to y.
                    node.parent = nextNode;
                }
            } // endif 'n'

            mask = 0;
            minCpu = 100;
        } // end while

        showNodes(node);
        printf("End of program \n");
    } // endif get_penum()
} // main

// Compiling and running the program

-bash-3.00$ csreset
-bash-3.00$ cscn asc.cn mst.cn
-bash-3.00$ csrun a.csx

Node a: Candidate w, Parent: , Current Best: 0
Node b: Candidate w, Parent: , Current Best: 0
Node c: Candidate w, Parent: , Current Best: 0
Node d: Candidate w, Parent: , Current Best: 0
Node e: Candidate w, Parent: , Current Best: 0
Node f: Candidate w, Parent: , Current Best: 0
Root = a
Node a: Candidate n, Parent: , Current Best: 0
Node b: Candidate y, Parent: a, Current Best: 2
Node c: Candidate y, Parent: a, Current Best: 8
Node d: Candidate w, Parent: , Current Best: 0
Node e: Candidate w, Parent: , Current Best: 0
Node f: Candidate w, Parent: , Current Best: 0
Next Node = b
Node a: Candidate n, Parent: , Current Best: 0
Node b: Candidate n, Parent: a, Current Best: 2
Node c: Candidate y, Parent: b, Current Best: 7
Node d: Candidate y, Parent: b, Current Best: 4
Node e: Candidate y, Parent: b, Current Best: 3
Node f: Candidate w, Parent: , Current Best: 0

MST in Cn By Hassan AL-Maksousy

Page 50

50

Next Node = e

Node a: Candidate n, Parent: , Current Best: 0

Node b: Candidate n, Parent: a, Current Best: 2

Node c: Candidate y, Parent: e, Current Best: 6

Node d: Candidate y, Parent: e, Current Best: 3

Node e: Candidate n, Parent: b, Current Best: 3

Node f: Candidate w, Parent: , Current Best: 0

Next Node = d

Node a: Candidate n, Parent: , Current Best: 0

Node b: Candidate n, Parent: a, Current Best: 2

Node c: Candidate y, Parent: e, Current Best: 6

Node d: Candidate n, Parent: e, Current Best: 3

Node e: Candidate n, Parent: b, Current Best: 3

Node f: Candidate w, Parent: , Current Best: 0

Next Node = c

Node a: Candidate n, Parent: , Current Best: 0

Node b: Candidate n, Parent: a, Current Best: 2

Node c: Candidate n, Parent: e, Current Best: 6

Node d: Candidate n, Parent: e, Current Best: 3

Node e: Candidate n, Parent: b, Current Best: 3

Node f: Candidate y, Parent: c, Current Best: 9

Next Node = f

Node a: Candidate n, Parent: , Current Best: 0

Node b: Candidate n, Parent: a, Current Best: 2

Node c: Candidate n, Parent: e, Current Best: 6

Node d: Candidate n, Parent: e, Current Best: 3

Node e: Candidate n, Parent: b, Current Best: 3

Node f: Candidate n, Parent: c, Current Best: 9

End of program

MST in Cn By Hassan AL-Maksousy

Page 51

THE END

Additional reference slides follow

Page 52

52

Basic Properties of the ASC Model

• Instruction Stream
  – The IS has a copy of the program and can broadcast instructions to cells in unit time
• Cell Properties
  – Each cell consists of a PE and its local memory
  – All cells listen to the IS
  – A cell can be active, inactive, or idle
    • Inactive cells listen to but do not execute IS commands until reactivated
    • Idle cells contain no essential data and are available for reassignment
    • Active cells execute IS commands synchronously

Page 53

53

Basic Properties of ASC

• Responder Processing
  – The IS can detect whether a data test is satisfied by any of its responder cells in constant time (the any-responders property)
  – The IS can select an arbitrary responder in constant time (the pick-one property).

Page 54

54

Basic Properties of ASC

• Constant-Time Global Operations (across PEs)
  – Logical OR and AND of binary values
  – Maximum and minimum of numbers
  – Associative searches
• Communications
  – There are at least two real or virtual networks
    • the PE (or cell) communications network
    • the IS broadcast/reduction network (which could be implemented as two separate networks)

Page 55

55

Basic Properties of ASC

  – The PE communications network is normally supported by an interconnection network
    • e.g., a 2-D mesh
  – The broadcast/reduction network(s) are normally supported by a broadcast network and a reduction network (sometimes combined)
    • See the posted paper by Jin, Baker, & Batcher (listed in the associative references)
• Control Features
  – The PEs, the IS, and the networks all operate synchronously, using the same clock

Page 56

56

Characteristics of Associative Programming

• Consistent use of a style of programming called data-parallel programming
• Consistent use of global associative searching and responder processing
• Usually, frequent use of the constant-time global reduction operations: AND, OR, MAX, MIN
• Broadcast of data using the IS bus allows the use of the PE network to be restricted to large synchronous parallel data movements.

Page 57

57

Characteristics of Associative Programming

• Tabular representation of data (i.e., 2-D arrays)
• Use of searching instead of sorting
• Use of searching instead of pointers
• Use of searching instead of the ordering provided by linked lists, stacks, and queues
• Promotes a highly intuitive programming style that supports high productivity
• Uses structure codes (i.e., numeric representations) to represent data structures such as trees, graphs, embedded lists, and matrices
• Examples of the above are given in
  – the Nov. 1994 IEEE Computer journal paper cited in the references
  – the "Associative Computing" book by Potter, which gives more examples.
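As a small, hypothetical illustration of "use of searching instead of sorting": to process records in increasing key order, an associative program can repeatedly apply the global-minimum and pick-one operations to the still-unprocessed PEs instead of sorting the records first. The C sketch below simulates this with loops; the keys and field names are made up.

#include <stdio.h>
#include <limits.h>

#define NUM_PE 6

typedef struct {
    int done;   /* 1 once this PE's record has been processed */
    int key;    /* the field we would otherwise sort on */
} PE;

int main(void)
{
    PE pe[NUM_PE] = { {0, 42}, {0, 7}, {0, 19}, {0, 3}, {0, 28}, {0, 11} };

    /* Process records in key order without sorting: each pass uses the
     * constant-time global-minimum and pick-one operations (simulated by loops). */
    for (int pass = 0; pass < NUM_PE; pass++) {
        int best = INT_MAX, who = -1;
        for (int i = 0; i < NUM_PE; i++)          /* global MIN over unprocessed PEs */
            if (!pe[i].done && pe[i].key < best) {
                best = pe[i].key;
                who  = i;                         /* pick one of the responders */
            }
        printf("processing PE%d (key %d)\n", who, best);
        pe[who].done = 1;
    }
    return 0;
}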

Page 58

58

Algorithms and Programs Implemented in ASC

• A wide range of algorithms have been implemented in ASC without the use of the PE network:
  – Graph algorithms
    • minimal spanning tree
    • shortest path
    • connected components
  – Computational geometry algorithms
    • convex hull algorithms (Jarvis march, Quickhull, Graham scan, etc.)
    • dynamic hull algorithms

Page 59

59

ASC Algorithms and Programs (not requiring the PE network)

  – String matching algorithms
    • all exact substring matches
    • all exact matches with "don't care" (i.e., wild card) characters
  – Algorithms for NP-complete problems
    • traveling salesperson
    • 2-D knapsack
  – Database management software
    • associative database
    • relational database

Page 60

60

ASC Algorithms and Programs (not requiring a PE network)

  – A two-pass compiler for ASC (not the one we will be using; this compiler runs on an associative computer and uses ASC parallelism)
    • first pass
    • optimization phase
  – Two rule-based inference engines for AI
    • an expert-system OPS-5 interpreter
    • PPL (Parallel Production Language) interpreter
  – A context-sensitive language interpreter
    • (OPS-5 variables force context sensitivity)
  – An associative PROLOG interpreter

Page 61

61

Associative Algorithms & Programs (using a network)

• There are numerous associative algorithms and programs that use a PE network:
  – 2-D knapsack ASC algorithm using a 1-D mesh
  – Image processing algorithms using a 1-D mesh
  – FFT (Fast Fourier Transform) using 1-D nearest-neighbor and Flip networks
  – Matrix multiplication using a 1-D mesh
  – An air traffic control program (using a Flip network connecting PEs to memory)
    • Demonstrated using live data at Knoxville in the mid-1970s
• All but the first were created and/or implemented in assembler for the STARAN at Goodyear Aerospace.

Page 62

62

The MST Problem

• The MST problem assumes the weights are positive and the graph is connected, and seeks to find a minimal spanning tree,
  – i.e., a subgraph that is a tree [1], that includes all nodes (i.e., it spans), and
  – where the sum of the weights on the arcs of the subgraph is the smallest possible weight (i.e., it is minimal).
• Why would an algorithm solving this problem be useful?
• Note: the solution may not be unique.

[1] A tree is a set of points called vertices, together with pairs of distinct vertices called edges, such that (1) there is a sequence of edges called a path from any vertex to any other, and (2) there are no circuits, that is, no paths starting from a vertex and returning to the same vertex.
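For comparison with the ASC presentation that follows, here is a minimal sequential Prim's algorithm in plain C on the adjacency matrix of the Figure 6 graph (a standard textbook formulation, not taken from Baase or from [Potter, Baker, et al.]). The in_tree, best, and parent arrays play the roles that candidate$, current_best$, and parent$ play in the parallel data structure.

#include <stdio.h>

#define N   6
#define INF 1000

int main(void)
{
    /* Adjacency matrix for the graph of Figure 6: vertices a..f. */
    int w[N][N] = {
        /*        a    b    c    d    e    f  */
        /*a*/ { INF,   2,   8, INF, INF, INF },
        /*b*/ {   2, INF,   7,   4,   3, INF },
        /*c*/ {   8,   7, INF, INF,   6,   9 },
        /*d*/ { INF,   4, INF, INF,   3, INF },
        /*e*/ { INF,   3,   6,   3, INF, INF },
        /*f*/ { INF, INF,   9, INF, INF, INF },
    };
    int in_tree[N] = {0}, best[N], parent[N];
    const char *name = "abcdef";

    for (int v = 0; v < N; v++) { best[v] = w[0][v]; parent[v] = 0; }  /* root = a */
    in_tree[0] = 1;

    for (int pass = 1; pass < N; pass++) {
        int next = -1;
        for (int v = 0; v < N; v++)              /* sequential "global minimum" step */
            if (!in_tree[v] && (next < 0 || best[v] < best[next]))
                next = v;
        in_tree[next] = 1;                       /* corresponds to candidate$ = "no" */
        printf("add edge %c-%c (weight %d)\n", name[parent[next]], name[next], best[next]);
        for (int v = 0; v < N; v++)              /* sequential "update" step */
            if (!in_tree[v] && w[next][v] < best[v]) {
                best[v]   = w[next][v];          /* corresponds to current_best$ */
                parent[v] = next;                /* corresponds to parent$ */
            }
    }
    return 0;
}

On this graph the sketch selects the edges a-b, b-e, e-d, e-c, and c-f, which matches the parent$ values printed by the Cn program shown earlier.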

Page 63

63

ASC-MST Algorithm Preliminaries

• Next, a "data structure"-level presentation of Prim's algorithm for the MST is given.
• The data structure used is illustrated in the next two slides.
  – This example is from the Nov. 1994 IEEE Computer paper cited in the references.
• There are two types of variables for the ASC model, namely
  – the parallel variables (i.e., the ones for the PEs), and
  – the scalar variables (i.e., the ones used by the IS).
    • Scalar variables are essentially global variables.
    • Each scalar variable could be replaced by storing its value in every entry of a parallel variable.

Page 64

64

ASC-MST Algorithm Preliminaries (cont.)

• In order to distinguish between variable types here, the names of parallel variables end with a "$" symbol.
• Each step in this algorithm takes constant time.
• One MST edge is selected during each pass through the loop in this algorithm.
• Since a spanning tree has n-1 edges, the running time of this algorithm is O(n) and its cost is O(n²).
  – Cost is defined as (running time) × (number of processors).
• Since the sequential running time of Prim's MST algorithm is O(n²), which is time optimal, this parallel implementation is cost optimal.

Page 65

65

Graph used for Data Structure
Figure 6 in [Potter, Baker, et al.]

[Figure: an undirected weighted graph on the vertices a, b, c, d, e, f with edge weights
a-b = 2, a-c = 8, b-c = 7, b-d = 4, b-e = 3, c-e = 6, c-f = 9, d-e = 3.]

Page 66

66

Data Structure for MST Algorithm

[IS scalars: root = a, next-node = b.]

node$   a$   b$   c$   d$   e$   f$   candidate$   current_best$   parent$
a       ∞    2    8    ∞    ∞    ∞    no
b       2    ∞    7    4    3    ∞    no           2               a
c       8    7    ∞    ∞    6    9    yes          7               b
d       ∞    4    ∞    ∞    3    ∞    yes          4               b
e       ∞    3    6    3    ∞    ∞    yes          3               b
f       ∞    ∞    9    ∞    ∞    ∞    wait

Page 67

67

Short Version of Algorithm: ASC-MST-PRIM(root)

1.  Initialize candidates to "waiting"
2.  If there are any finite values in root's field,
3.      set candidate$ to "yes"
4.      set parent$ to root
5.      set current_best$ to the values in root's field
6.  Set root's candidate field to "no"
7.  Loop while some candidate$ contains "yes"
8.      for them
9.          restrict mask$ to mindex(current_best$)
10.     set next_node to a node identified in the preceding step
11.     set its candidate to "no"
12.     if the value in their next_node's field is less than current_best$, then
13.         set current_best$ to the value in next_node's field
14.         set parent$ to next_node
15.     if candidate$ is "waiting" and the value in its next_node's field is finite
16.         set candidate$ to "yes"
17.         set parent$ to next_node
18.         set current_best$ to the value in next_node's field

Page 68

68

Comments on ASC-MST Algorithm

• The three preceding slides are from Figure 6 in [Potter, Baker, et al.], IEEE Computer, Nov. 1994.
• The preceding slide gives a compact, data-structures-level pseudo-code description of this algorithm
  – The pseudo-code illustrates Potter's use of pronouns (e.g., them, its) and possessive nouns
  – The mindex function returns the index of a processor holding the minimal value
  – This MST pseudo-code is much shorter and simpler than data-structure-level sequential MST pseudo-codes
    • e.g., see one of Baase's textbooks cited in the references
    • The algorithm given in Baase's books is identical to this parallel algorithm, except that it is for a sequential computer
• Next, a more detailed explanation of the algorithm on the preceding slide is given.

Page 69

69

Tracing the 1st Pass of the MST Algorithm on Figure 6
(Put the chart below and Figure 6 on the board)

[Initial contents; IS scalars: root = a, next-node not yet set.]

node$   a$   b$   c$   d$   e$   f$   candidate$   current_best$   parent$
a       ∞    2    8    ∞    ∞    ∞
b       2    ∞    7    4    3    ∞
c       8    7    ∞    ∞    6    9
d       ∞    4    ∞    ∞    3    ∞
e       ∞    3    6    3    ∞    ∞
f       ∞    ∞    9    ∞    ∞    ∞

Page 70

70

Algorithm: ASC-MST-PRIM

• Initially assign any node to root.
• All processors set
  – candidate$ to "wait"
  – current_best$ to ∞
  – the candidate$ field for the root node to "no"
• All processors whose distance d from their node to the root node is finite do
  – Set their candidate$ field to "yes"
  – Set their parent$ field to root
  – Set current_best$ = d.

Page 71

71

Algorithm: ASC-MST-PRIM (cont. 2/3)

• While the candidate field of some processor is "yes",
  – Restrict the active processors to those whose candidate field is "yes" and (for these processors) do
    • Compute the minimum value x of current_best$.
    • Restrict the active processors to those with current_best$ = x and do
      – Pick an active processor, say node y.
        » Set the candidate$ value of node y to "no"
      – Set the scalar variable next_node to y.

Page 72

72

Algorithm: ASC-MST-PRIM (cont. 3/3)

  – If the value z in the next_node column of a processor is less than its current_best$ value, then
    » Set current_best$ to z.
    » Set parent$ to next_node.
  – For all processors, if candidate$ is "waiting" and the distance of its node from next_node y is finite, then
    • Set candidate$ to "yes"
    • Set current_best$ to the distance of its node from y.
    • Set parent$ to y.