project18’s Communication Drawing Design

By: Camilo A. Silva

BIOinformatics

Summer 2008

Objective

• Find out what type of MPI communication design could be used for project18

• Determine which MPI functions could be used to accomplish the above objective

Communication Design

What is needed?

• We need all nodes to have the basic data required to run the program prior to execution

• We need a “master/slave” model

• All data at the end must be collected and sent back to the master node

• Our communication flow and data computation should be dynamic, using all the resources. E.g., if a processor completes a search, it needs to continue with the next data computation independently, without needing to wait for other processors to finish

• There needs to be a communication flow with the master node, which keeps track of the completion status of the computation by gathering information from the slave nodes

• An anti-“deadlock” mechanism must be implemented

Blueprint

The master node is in charge of coordinating the processes and keeping track of the status of each process on each node.

The slave nodes follow the coordination of the master node. Their processes should be independent, and they should report their progress to the master node efficiently.

• At the beginning of the program, all the nodes need to have the essential data required for the program to run.

• At the end of the program, the output of each node needs to be collected into one and sent to the master node for storage and access.

At the beginning…

Let’s assume that all the nodes have all the data needed for the project18 program to run successfully.

1. When the program is run from the GCB cluster or any other, the user needs to indicate which genomes will be compared: genome1 vs. genome2

2. That info will be sent to all nodes with a collective function: MPI_Bcast()

MPI_Bcast(&nameOfGenome1, 20, MPI_CHAR, 0, MPI_COMM_WORLD);

Initialization

The master node will then orchestrate and administer the computation amongst the nodes:

1. Since all nodes have the same data and info, each node will be given a specific range of indexes to process

2. Such indexes are base locations of genome1 to be contrasted with genome2

   1. Here the communication would be point-to-point, because the master node communicates with each node independently

3. Each slave node will compute according to its specified distribution of indexes. The results shall be stored in a text file within each node.

Initialization Example

Genome1=“aaaaaaacccccccgggggggtttttttcccccccaaaaaaagggggggtttttttcccccc…”

6 13 20 27 34 41 48 55 …

This is a visualization of the array of indexes to be distributed to each single node. In this case, we are using a range of seven (7) bases per process.
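As a minimal sketch of how such an index array could be produced, the helper below fills an array with the last 0-based base index of each consecutive block. The function name block_end_indexes and its parameters are hypothetical and do not come from project18; the sketch also assumes the final block is simply clipped to the genome length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from project18): write the last 0-based base index
 * of each consecutive block of `range` bases into `ends`.
 * With range = 7 this yields 6, 13, 20, 27, ...
 * Returns the number of entries written (at most `max_blocks`). */
size_t block_end_indexes(size_t genome_len, size_t range,
                         size_t *ends, size_t max_blocks)
{
    size_t n = 0;
    size_t start;

    assert(ends != NULL);
    if (range == 0)
        return 0;                      /* nothing sensible to distribute */

    for (start = 0; start < genome_len && n < max_blocks; start += range) {
        size_t end = start + range - 1;
        if (end >= genome_len)
            end = genome_len - 1;      /* assumption: clip the final block */
        ends[n++] = end;
    }
    return n;
}
```

Calling block_end_indexes(62, 7, ends, 8) fills ends with 6, 13, 20, 27, 34, 41, 48, 55, the same values shown in the array above.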

In this example, let’s assume that the search range of the discriminating probe is 14. Thus, if node 1 is computing the results for the first seven bases (indexes 0 through 6), the iterations should be as follows:

1. Find pattern “aaaaaaaccccccc” in genome2

2. 2nd pattern “aaaaaacccccccg”

3. Etc… until “acccccccgggggg”
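The iterations above can be sketched as a sliding window over genome1. The two helpers below are hypothetical (probe_at and count_matches are illustrative names, not project18 functions), and the naive strstr() scan only stands in for whatever search project18 actually performs against genome2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the probe of `probe_len` bases that starts at
 * index `start` of `genome` into `out` (which needs probe_len + 1 bytes).
 * Returns 1 on success, 0 if the window would run past the genome's end. */
int probe_at(const char *genome, size_t start, size_t probe_len, char *out)
{
    size_t len;

    assert(genome != NULL && out != NULL);
    len = strlen(genome);
    if (probe_len > len || start > len - probe_len)
        return 0;
    memcpy(out, genome + start, probe_len);
    out[probe_len] = '\0';
    return 1;
}

/* Naive stand-in for the real search: count the (possibly overlapping)
 * occurrences of `probe` in `genome2`. */
size_t count_matches(const char *genome2, const char *probe)
{
    size_t count = 0;
    const char *p;

    assert(genome2 != NULL && probe != NULL);
    if (*probe == '\0')
        return 0;                      /* an empty probe matches nothing here */
    for (p = strstr(genome2, probe); p != NULL; p = strstr(p + 1, probe))
        count++;
    return count;
}
```

For the example genome, probe_at with start values 0, 1 and 6 and a probe length of 14 yields “aaaaaaaccccccc”, “aaaaaacccccccg” and “acccccccgggggg”, matching the first, second and last iterations listed above.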

The results of each node shall be stored on disk as a text file.

Master node as receiver and manager

As some of you may have predicted, the master node will be receiving a lot of communication from all the different nodes.

This type of communication is point-to-point and the function used to accomplish this is MPI_Recv()

The master node acts as a manager. It will be receiving completion codes from each node, and it shall record such completions appropriately.

After recording the status of completion of a node, the master node will be in charge of administering and orchestrating the next process for a node. This will be done by creating a simple algorithm involving int arrays just as shown previously.

Keeping trustworthy accountability

• The master node needs to know the completion status of a process in order to keep account of the completion of each node

• The master node will determine, based on the communication sent by the node, whether all processes were completed.

– This can be done by implementing a simple completion counter in each node that is updated after each search of the discriminating probe. This int counter will be returned to the master node, which will verify that its count is the same as the index range determined.

– Such result could be stored in various formats as explained in the following slide.

• By having an accountable system, the master node will be able to resubmit a job that was not completed or that did not finish
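A minimal sketch of the master-side check, assuming the only information returned by a node is its int completion counter. The names job_result and check_completion are hypothetical, and what happens on resubmission is left open, as it is in the slides.

```c
#include <assert.h>

/* Hypothetical outcome of the master's check on a returned counter. */
enum job_result { JOB_COMPLETE, JOB_RESUBMIT };

/* Compare the counter returned by a node with the index range it was given.
 * A node that finished every search reports exactly `assigned_range`;
 * any other value is treated as an incomplete job to be resubmitted. */
enum job_result check_completion(int reported_count, int assigned_range)
{
    assert(assigned_range > 0);
    return (reported_count == assigned_range) ? JOB_COMPLETE : JOB_RESUBMIT;
}
```

With a range of seven bases, a returned count of 7 is accepted, while a count of 4 marks the job for resubmission.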

Tracking down completion status

int status[][]

range  N1  N2  N3  N4  N5  N6  N7  next
7      6   13  20  27  34  41  48  55
7      6   13  20  27  34  41  48  0

The second row is the completion code. Until a process is completed, it holds the same integer as the respective current process (status[0][X]).

If an error is found, it will receive the value of zero (0).

Let’s assume that N3 was the first one to complete its process. Let’s suppose it completed the searches of its indexes successfully; thus, an int count = 7 shall be returned to the master node and received with MPI_Recv().

Tracking down completion status

Returned by N3: int 7

range  N1  N2  N3  N4  N5  N6  N7  next
7      6   13  55  27  34  41  48  62
7      6   13  55  27  34  41  48  0

if (Check_errors()) { …check on error and determine what to do }

else if (no errors in completion) { report completion and assign new job }

When a process is successfully completed, the data of status[][] is modified accordingly and the next process is dynamically assigned to the node that is ready to compute.
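The update of status[][] can be sketched as follows, assuming the layout shown in the tables: column 0 holds the range, columns 1 to 7 hold N1 to N7, and the last column holds the next index to hand out. The function name record_completion and the column macros are hypothetical, and the sketch does not handle reaching the end of genome1.

```c
#include <assert.h>

#define NODES     7
#define COLS      (NODES + 2)   /* range, N1..N7, next */
#define COL_RANGE 0
#define COL_NEXT  (NODES + 1)

/* Hypothetical master-side update. Row 0 holds each node's current process,
 * row 1 holds its completion code (0 marks an error).
 * Returns 1 if the node was assigned a new block, 0 if an error was recorded. */
int record_completion(int status[2][COLS], int node, int count)
{
    int range;

    assert(node >= 1 && node <= NODES);
    range = status[0][COL_RANGE];

    if (count != range) {
        status[1][node] = 0;    /* incomplete: flag the error, keep the job */
        return 0;
    }

    /* Success: hand the node the next block and advance the "next" index. */
    status[0][node] = status[0][COL_NEXT];
    status[1][node] = status[0][COL_NEXT];
    status[0][COL_NEXT] += range;
    return 1;
}
```

Starting from the first table, record_completion(status, 3, 7) sets N3 to 55 in both rows and moves next to 62, which reproduces the second table.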

Collecting the data

Master Node Using MPI-IO

Issues to consider…

• Bottlenecking and “deadlocking”

What’s the solution?

• Asynchronous communication strategies

• Non-blocking strategies

What’s next?

• Learn about MPI-IO

• Study asynchronous and non-blocking communication in order to prevent bottlenecking and deadlocking.

• Start programming just for fun!
