
Page 1

Memory Efficient Pairwise Genome Alignment Algorithm –
A Small-Scale Application with Grid Potential

Chao “Bill” Xie, Victor Bolet, Art Vandenberg
Georgia State University, Atlanta, GA 30303, USA

February 22–23, 2006, SURA, Washington DC

Page 2

Introduction

• A small-scale application is studied in the grid environment
• Performance is compared across the shared memory, grid, and cluster environments
• A pairwise sequence alignment program is chosen as the small-scale application
• The basic algorithm is modified into a memory-efficient algorithm
• The parallel implementation of pairwise sequence alignment is studied in the different environments
• Based on work done by Nova Ahmed, NMI Integration Testbed

Page 3

Specification of the Distributed Environments

• The shared memory environment is an SGI Origin 2000 machine with 24 CPUs

• The cluster environment at UAB is a Beowulf cluster with 8 homogeneous nodes; each node has four 550 MHz Pentium III processors and 512 MB of RAM

• The grid environment is the same Beowulf cluster with the Globus Toolkit software layer over it

• USC HPC resources were used in Summer 2005

Page 4

The Basic Pairwise Sequence Alignment Algorithm

• A two-dimensional array – the similarity matrix – is built over the two sequences

• A match or mismatch score is calculated for each position in the pair of sequences being aligned

• Dynamic programming is used

[Figure: similarity matrix for Sequence X (GAGAAGAGAC) and Sequence Y (AAGAA…), with boundary cells initialized to 0 and match scores (1, 2, …) filled in by dynamic programming]
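For illustration only, here is a minimal Python sketch of filling such a similarity matrix. The authors' actual implementation is not shown in these slides, and the scoring rule used below (a match extends the diagonal run by 1, everything else stays 0, consistent with the mostly zero matrix pictured above) is an assumption.

def similarity_matrix(x, y):
    # Fill an (len(y)+1) x (len(x)+1) matrix by dynamic programming.
    # Assumed scoring (illustrative only): a match scores the upper-left
    # neighbour + 1; anything else scores 0, so most cells remain zero.
    rows, cols = len(y) + 1, len(x) + 1
    h = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if y[i - 1] == x[j - 1]:
                h[i][j] = h[i - 1][j - 1] + 1
    return h

# Sequences from the slide's example:
for row in similarity_matrix("GAGAAGAGAC", "AAGAA"):
    print(row)

The full matrix needs memory proportional to the product of the sequence lengths, which motivates the reduced-memory algorithm on the next slide.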

Page 5

The Reduced Memory Algorithm

• Keep only nonzero elements of the matrix

• Memory dynamically allocated as required

• New data structure for efficiency
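A minimal sketch of this idea, assuming one dictionary of nonzero cells per row; the slides do not show the authors' actual data structure, so the representation below is purely illustrative.

def sparse_similarity(x, y):
    # Keep only the nonzero cells of the similarity matrix.
    # Each row is a dict {column: score}: memory is allocated only when a
    # nonzero score appears, and only the previous row is needed for the
    # diagonal recurrence.
    prev = {}          # nonzero cells of the previous row
    rows = []          # one dict of nonzero cells per row
    for yc in y:
        cur = {}
        for j, xc in enumerate(x):
            if yc == xc:
                cur[j] = prev.get(j - 1, 0) + 1   # extend the diagonal run
        rows.append(cur)
        prev = cur
    return rows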

The Parallel Method

• The genome sequences are divided among processors
• The similarity matrix is divided among processors

[Figure: the similarity matrix divided among processors P1–P5; as the computation wavefront advances over time, Pi sends its edge values to Pi+1]
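The slides describe, but do not show, the parallel code; the actual program (ar7, submitted with jobtype=mpi on Page 10) is an MPI executable, presumably in C. The sketch below uses mpi4py only to illustrate the scheme in which each processor owns a block of columns and, row by row, passes its edge value to the next processor; the function name and details are assumptions.

from mpi4py import MPI

def parallel_similarity(x, y):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each processor owns a contiguous block of columns of sequence X.
    block = (len(x) + size - 1) // size
    lo, hi = rank * block, min((rank + 1) * block, len(x))

    prev = [0] * (hi - lo)   # this processor's block of the previous row
    for i, yc in enumerate(y):
        # Edge exchange: receive the boundary cell of the previous row from
        # the left neighbour (Pi-1), send ours to the right neighbour (Pi+1).
        left_diag = comm.recv(source=rank - 1, tag=i) if rank > 0 else 0
        if rank < size - 1:
            comm.send(prev[-1] if prev else 0, dest=rank + 1, tag=i)
        cur = []
        for j in range(lo, hi):
            diag = left_diag if j == lo else prev[j - lo - 1]
            cur.append(diag + 1 if yc == x[j] else 0)
        prev = cur
    return prev              # this processor's block of the final row

Because only one boundary value per row crosses processor boundaries, communication stays minimal, which is the property the Conclusion slide credits for the good grid performance. A script like this would be launched with, for example, mpiexec -n 4 python align.py (hypothetical filename).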

Page 6

Results

[Chart: Computation time (seconds) vs. number of processors, genome length 3000 – Grid, Cluster, Shared Memory]
Computation time: Shared Memory, Cluster, and Grid-enabled Cluster environments

[Chart: Computation time (seconds) vs. number of processors, genome length 10000 – Grid, Cluster]
Computation time: Cluster and Grid-enabled Cluster environments

Page 7

Results

[Chart: Speed up vs. number of processors, genome length 3000 – Grid, Cluster, Shared Memory]
Comparison of speed up: Shared Memory, Cluster, and Grid-enabled Cluster environments

[Chart: Speed up vs. number of processors, genome length 10000 – Grid, Cluster]
Comparison of speed up: Cluster and Grid-enabled Cluster environments
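The speed up plotted above is presumably the usual ratio of single-processor time to p-processor time, S(p) = T(1) / T(p); the slides do not state the baseline explicitly. With purely illustrative numbers, a run that takes 400 seconds on one processor and 16 seconds on 20 processors has a speed up of 400 / 16 = 25.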

Page 8

UAB multi-cluster

Comparison of multi-cluster Grid environments
(a) [Chart: Computation time (sec) vs. number of processors – Single Cluster, Single-Clustered Grid, Multi-Clustered Grid]
(b) [Chart: Speed up vs. number of processors – Single Cluster, Single-Clustered Grid, Multi-Clustered Grid]

Page 9

Running Example

04.08.2004 (per Nova Ahmed, UAB Beowulf Cluster: Medusa)

Here are the steps for running the genome alignment program on the grid.

First, a sample program that aligns a very small genome sequence is tested. The genome sequences are t1.txt and t2.txt.

The object file is:

ar7

Page 10

Grid-proxy-init, RSL script, globusrun

1. First, grid-proxy-init is run to create a proxy credential from the grid certificate:

Your identity: /O=Grid/OU=UAB Grid/CN=Nova Ahmed

Enter GRID pass phrase for this identity:

Creating proxy .......................................................

Done

Your proxy is valid until: Fri Apr 9 00:54:24 2004

2. Then create the RSL script genome.rsl that describes the job:

& (count=4)

(executable=/home/nova/ar7)

(jobtype=mpi)

3. Finally, the program is run on the grid using the globusrun command:

globusrun -s -r medusa.lab.ac.uab.edu -f ./genome.rsl

Page 11

Output

------------------------------------

NOVA1

MyId = 1 NumProc = 4

[1 : 1 ->2 2]

[1 : 2 ->13 3]

[1 : 3 ->1 1] [1 : 3 ->11 1]

myid = 1 finished

NOVA1

MyId = 2 NumProc = 4

[2 : 0 ->1 1] [2 : 0 ->11 1]

[2 : 2 ->1 1]

[2 : 3 ->2 2]

[2 : 4 ->2 2] [2 : 4 ->13 3]

[2 : 5 ->1 1] [2 : 5 ->13 3]

myid = 2 finished

NOVA1

MyId = 3 NumProc = 4

[3 : 0 ->11 1] [3 : 0 ->21 1]

[3 : 1 ->2 2]

[3 : 2 ->11 1] [3 : 2 ->31 1]

[3 : 3 ->1 1]

[3 : 4 ->1 1] [3 : 4 ->12 2] [3 : 4 ->21 1]

[3 : 5 ->2 2] [3 : 5 ->12 2] [3 : 5 ->23 3] [3 : 5 ->31 1]

myid = 3 finished

NOVA1

MyId = 0 NumProc = 4

tgatggaggt

gatagg

[0 : 0 ->11 1]

[0 : 2 ->1 1]

[0 : 4 ->11 1]

[0 : 5 ->11 1]

Elapsed time is =0.014624

myid = 0 finished

//----------------------

Running the program using longer genome sequences

a1-1000, a1-2000, a1-3000 compared with

a2-1000, a2-2000, a2-3000

Page 12

USC HPC – Summer 2005

Computation time in the Cluster and Grid environments, varying the number of processors
(a) [Chart: Computation time (sec) vs. number of processors, small set of sequences – Cluster, Grid]
(b) [Chart: Computation time (seconds) vs. number of processors, long set of sequences – Cluster, Grid]

Page 13

USC HPC – Summer 2005

Speed up in the Cluster and Grid environments
(a) [Chart: Speed up vs. number of processors, small set of sequences – Cluster, Grid]
(b) [Chart: Speed up vs. number of processors, long set of sequences – Cluster, Grid]

Page 14

Conclusion

• The grid environment shows performance similar to the cluster environment
• The grid environment adds little overhead
• The shared memory environment has better speedup than the cluster and grid environments
• The shared memory environment shows the limitation of memory for computing large genome sequences
• Small-scale applications (as well as large-scale ones) can run efficiently on a grid
• Distributed applications with minimal communication among the processors will see benefit in a grid environment – perhaps even across multiple clusters

Page 15

Future Work

• Additional work in a SURAgrid environment that includes multiple clusters

• Test data that provides a more computation-intensive challenge for grid environments

• Adapt the application to the grid environment so that it uses less inter-process communication

Page 16

Acknowledgements

• This material is based in part upon work supported by:
  – National Science Foundation under Grant No. ANI-0123937, NMI Integration Testbed Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF)
  – SURA Grant SURA-2005-305, SURAgrid Application Development & Documentation

• Thanks to
  – Nova Ahmed, currently in the Georgia Tech Computer Science PhD program, for the original work carried out as part of the NMI Integration Testbed Program
  – John-Paul Robinson and the University of Alabama at Birmingham for access to the medusa cluster
  – Jim Cotillier and Shelley Henderson, University of Southern California, for access to HPC resources
  – Chao “Bill” Xie, Georgia State Computer Science PhD program, for continuing Nova Ahmed’s work
  – Victor Bolet, Georgia State Information Systems & Technology Advanced Campus Services unit, for support of Georgia State’s SURAgrid nodes
  – John McGee, RENCI.org, for discussions of the approach using Globus