Chao “Bill” Xie, Victor Bolet, Art Vandenberg
Georgia State University, Atlanta, GA 30303, USA
February 22/23, 2006
SURA, Washington DC
Memory Efficient Pairwise Genome Alignment Algorithm –
A Small-Scale Application with Grid Potential
Introduction
• A small-scale application is studied in the grid environment
• Performance is compared across the shared memory, cluster, and grid environments
• A pairwise sequence alignment program is chosen as the small-scale application
• The basic algorithm is modified into a memory-efficient algorithm
• The parallel implementation of pairwise sequence alignment is studied in the different environments
• Based on work done by Nova Ahmed, NMI Integration Testbed
Specification of the Distributed Environments
• The shared memory environment is an SGI ORIGIN 2000 machine with 24 CPUs
• The cluster environment at UAB was a Beowulf cluster with 8 homogeneous nodes, each with four 550 MHz Pentium III processors and 512 MB of RAM
• The grid environment is the same Beowulf cluster with the Globus Toolkit software layer over it
• USC HPC resources were used in Summer 2005
The Basic Pairwise Sequence Alignment Algorithm
• A two-dimensional array, the Similarity Matrix, is indexed by the two sequences
• A match or a mismatch is calculated for each position in the pair of sequences to be matched
• Dynamic programming is used
[Figure: example Similarity Matrix for Sequence X and Sequence Y, with a zero first row and first column and the first few cells filled in; one of the sequences reads G A G A A G A G A C]
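The slides do not state the scoring scheme, so the following is only a minimal sketch of how such a Similarity Matrix can be filled by dynamic programming. The scores (match +1, mismatch -1, gap -1), the floor at zero, and the name similarity_matrix are assumptions made for illustration, not the authors' implementation.

```python
def similarity_matrix(x, y, match=1, mismatch=-1, gap=-1):
    """Fill a (len(y)+1) x (len(x)+1) matrix of local alignment scores.

    Scoring values are assumed for illustration; the slides do not give them.
    Row 0 and column 0 stay at zero, as in the figure.
    """
    rows, cols = len(y) + 1, len(x) + 1
    h = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            # Score for aligning y[i-1] with x[j-1].
            s = match if y[i - 1] == x[j - 1] else mismatch
            h[i][j] = max(
                0,                    # never go below zero
                h[i - 1][j - 1] + s,  # extend the diagonal
                h[i - 1][j] + gap,    # gap in x
                h[i][j - 1] + gap,    # gap in y
            )
    return h


if __name__ == "__main__":
    for row in similarity_matrix("GAGAAGAGAC", "AAGAA"):
        print(" ".join(str(v) for v in row))
```

Each cell depends only on its upper, left, and upper-left neighbors, which is what makes both the reduced-memory variant and the parallel method possible.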
The Reduced Memory Algorithm
• Keep only nonzero elements of the matrix
• Memory dynamically allocated as required
• New data structure for efficiency
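The slides do not describe the new data structure, so the sketch below uses a Python dict keyed by (row, column) as a stand-in. It shows the idea of keeping only nonzero elements and allocating memory as cells are needed; the scoring values and the name sparse_similarity are assumptions for illustration.

```python
def sparse_similarity(x, y, match=1, mismatch=-1, gap=-1):
    """Compute local alignment scores, storing only the nonzero cells.

    The dict stands in for the (undescribed) data structure in the slides.
    A missing key is read as zero, so zero cells cost no memory.
    """
    cells = {}
    for i in range(1, len(y) + 1):
        for j in range(1, len(x) + 1):
            s = match if y[i - 1] == x[j - 1] else mismatch
            score = max(
                0,
                cells.get((i - 1, j - 1), 0) + s,
                cells.get((i - 1, j), 0) + gap,
                cells.get((i, j - 1), 0) + gap,
            )
            if score:  # store the cell only when it is nonzero
                cells[(i, j)] = score
    return cells


if __name__ == "__main__":
    x, y = "GAGAAGAGAC", "AAGAA"
    cells = sparse_similarity(x, y)
    print(len(cells), "of", len(x) * len(y), "cells stored")
```

How much memory this saves depends on how many cells are nonzero, which in turn depends on the sequences and on the scoring scheme.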
The Parallel Method
• The genome sequences are divided among processors
• The Similarity Matrix is divided among processors
[Figure: pipeline over time across processors P1 to P5, showing the part being computed, the computation completed, and Pi sending its edge value to Pi+1]
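The pipeline in the figure can be imitated in a single process to see the order in which work becomes available. This is a sketch under assumptions: the matrix is split into contiguous column blocks (the slides only say it is divided among processors), the scoring values are illustrative, and a shared list stands in for the edge values that the real program passes between processors (the RSL script later in the slides sets jobtype=mpi). The names split_columns and pipelined_matrix are hypothetical.

```python
def split_columns(n_cols, n_procs):
    """Split columns 1..n_cols into n_procs contiguous half-open ranges."""
    base, extra = divmod(n_cols, n_procs)
    ranges, start = [], 1
    for p in range(n_procs):
        width = base + (1 if p < extra else 0)
        ranges.append((start, start + width))
        start += width
    return ranges


def pipelined_matrix(x, y, n_procs, match=1, mismatch=-1, gap=-1):
    """Fill the matrix in pipeline order; return it with the schedule used.

    At time step t, processor p works on row t - p + 1 of its column block,
    so the edge value it needs from processor p - 1 was produced one step
    earlier.
    """
    rows, cols = len(y) + 1, len(x) + 1
    h = [[0] * cols for _ in range(rows)]
    ranges = split_columns(len(x), n_procs)
    schedule = []  # (time step, processor, row) triples
    n_steps = (rows - 1) + (n_procs - 1)
    for t in range(n_steps):
        for p, (lo, hi) in enumerate(ranges):
            i = t - p + 1  # processor p starts p steps after processor 0
            if not 1 <= i < rows:
                continue  # processor p is idle at this step
            for j in range(lo, hi):
                s = match if y[i - 1] == x[j - 1] else mismatch
                h[i][j] = max(0, h[i - 1][j - 1] + s,
                              h[i - 1][j] + gap, h[i][j - 1] + gap)
            schedule.append((t, p, i))
    return h, schedule


if __name__ == "__main__":
    _, schedule = pipelined_matrix("GAGAAGAGAC", "AAGAA", n_procs=3)
    for t, p, i in schedule:
        print("step", t, "processor", p, "row", i)
```

With R rows and P processors the simulated pipeline takes R + P - 1 steps, so the startup delay matters less as the sequences grow longer relative to the processor count.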
Results
[Figure: Computation time: Shared Memory, Cluster, Grid-enabled Cluster environment. Computation time (seconds) vs. number of processors for genome length 3000 (Grid, Cluster, Shared Memory)]
[Figure: Computation time: Cluster, Grid-enabled Cluster environment. Computation time (seconds) vs. number of processors for genome length 10000 (Grid, Cluster)]
[Figure: Comparison of speed up: Shared Memory, Cluster, and Grid-enabled Cluster environment. Speed up vs. number of processors for genome length 3000 (Grid, Cluster, Shared Memory)]
[Figure: Comparison of speed up: Cluster and Grid-enabled Cluster environment. Speed up vs. number of processors for genome length 10000 (Grid, Cluster)]
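The slides plot speed up without defining it. Assuming the conventional definition, in which speed up on p processors is the baseline run time divided by the run time on p processors, it can be derived from a table of timings as sketched below. The timings in the example are made-up placeholders, not the measurements behind the figures, and the function names are hypothetical.

```python
def speedup(times_by_procs, baseline_procs=None):
    """Return {p: T(baseline) / T(p)} for each processor count p.

    Assumes the conventional definition of speed up. If no baseline is
    given, the smallest processor count in the table is used.
    """
    if baseline_procs is None:
        baseline_procs = min(times_by_procs)
    t_base = times_by_procs[baseline_procs]
    return {p: t_base / t for p, t in sorted(times_by_procs.items())}


def efficiency(times_by_procs, baseline_procs=None):
    """Speed up divided by the growth in processor count over the baseline."""
    if baseline_procs is None:
        baseline_procs = min(times_by_procs)
    s = speedup(times_by_procs, baseline_procs)
    return {p: s[p] * baseline_procs / p for p in s}


if __name__ == "__main__":
    # Placeholder timings in seconds, for illustration only.
    timings = {1: 400.0, 2: 200.0, 4: 125.0}
    print(speedup(timings))
    print(efficiency(timings))
```

When the smallest measured run already uses more than one processor, the result is a relative speed up against that run rather than against a serial run.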
Results
UAB multi-cluster
[Figure: Comparison of multi-Cluster Grid environments. (a) Computation time (sec) vs. number of processors; (b) Speedup vs. number of processors. Series: Single Cluster, Single Clustered Grid, Multi Clustered Grid]
Running Example
04.08.2004 (per Nova Ahmed, UAB Beowulf Cluster: Medusa)
Here are the steps for running the genome alignment program on the grid.
First, a sample program that aligns a very small pair of genome sequences is tested. The genome sequences were t1.txt and t2.txt.
The executable is:
ar7
grid-proxy-init, RSL script, globusrun
1. First, grid-proxy-init is run to create a proxy from the grid certificate
Your identity: /O=Grid/OU=UAB Grid/CN=Nova Ahmed
Enter GRID pass phrase for this identity:
Creating proxy .......................................................
Done
Your proxy is valid until: Fri Apr 9 00:54:24 2004
2. Then create the RSL script in genome.rsl to run the job
& (count=4)
(executable=/home/nova/ar7)
(jobtype=mpi)
3. The program is then run on the grid using the globusrun command
globusrun -s -r medusa.lab.ac.uab.edu -f ./genome.rsl
Output
------------------------------------
NOVA1
MyId = 1 NumProc = 4
[1 : 1 ->2 2]
[1 : 2 ->13 3]
[1 : 3 ->1 1] [1 : 3 ->11 1]
myid = 1 finished
NOVA1
MyId = 2 NumProc = 4
[2 : 0 ->1 1] [2 : 0 ->11 1]
[2 : 2 ->1 1]
[2 : 3 ->2 2]
[2 : 4 ->2 2] [2 : 4 ->13 3]
[2 : 5 ->1 1] [2 : 5 ->13 3]
myid = 2 finished
NOVA1
MyId = 3 NumProc = 4
[3 : 0 ->11 1] [3 : 0 ->21 1]
[3 : 1 ->2 2]
[3 : 2 ->11 1] [3 : 2 ->31 1]
[3 : 3 ->1 1]
[3 : 4 ->1 1] [3 : 4 ->12 2] [3 : 4 ->21 1]
[3 : 5 ->2 2] [3 : 5 ->12 2] [3 : 5 ->23 3] [3 : 5 ->31 1]
myid = 3 finished
NOVA1
MyId = 0 NumProc = 4
tgatggaggt
gatagg
[0 : 0 ->11 1]
[0 : 2 ->1 1]
[0 : 4 ->11 1]
[0 : 5 ->11 1]
Elapsed time is =0.014624
myid = 0 finished
//----------------------
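When the same job is repeated with different processor counts, steps 2 and 3 above lend themselves to a small script. The sketch below only builds the RSL text and the globusrun argument list shown in the slides; it does not submit anything. The helper names render_rsl and globusrun_argv are hypothetical, and only the three RSL attributes and the globusrun flags that appear in the slides are used.

```python
import shlex


def render_rsl(count, executable, jobtype="mpi"):
    """Return RSL text with the three attributes used in the slides."""
    return (
        "& (count={})\n"
        "  (executable={})\n"
        "  (jobtype={})\n"
    ).format(count, executable, jobtype)


def globusrun_argv(host, rsl_path):
    """Return the globusrun command from the slides as an argument list."""
    return ["globusrun", "-s", "-r", host, "-f", rsl_path]


if __name__ == "__main__":
    # Values taken from the running example; nothing is submitted here.
    print(render_rsl(4, "/home/nova/ar7"), end="")
    print(shlex.join(globusrun_argv("medusa.lab.ac.uab.edu", "./genome.rsl")))
```

To submit a job, the RSL text would be written to the file named on the command line, and the argument list could then be passed to subprocess.run after grid-proxy-init has created a valid proxy.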
Running the program using longer genome sequences
a1-1000, a1-2000, a1-3000 compared with
a2-1000, a2-2000, a2-3000
USC HPC – Summer 2005
[Figure: Computation time in Cluster and Grid environment varying number of processors. (a) for small set sequences; (b) for long set sequences. Axes: computation time (seconds) vs. number of processors. Series: Cluster, Grid]
USC HPC – Summer 2005
[Figure: Speed up in the Cluster and Grid environments. (a) for small set sequences; (b) for long set sequences. Axes: speed up vs. number of processors. Series: Cluster, Grid]
Conclusion
• Grid environment shows similar performance to cluster environment
• Grid environment adds little overhead
• Shared memory environment has better speedup performance compared to cluster and grid
• Shared memory environment shows the limitation of memory for computing large genome sequences
• Small scale applications (as well as large scale) can run efficiently on a grid
• Distributed applications with minimal communication among the processors will see benefit in a grid environment, perhaps even across multiple clusters
Future Work
• Additional work in a SURAgrid environment that includes multiple clusters
• Test data that provides a more computation intensive challenge for grid environments
• Adapt the application to the grid environment so that it uses less inter-process communication
Acknowledgements
• This material is based in part upon work supported by:
  – National Science Foundation under Grant No. ANI-0123937 - NMI Integration Testbed Program. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF)
  – SURA Grant SURA-2005-305 - SURAgrid Application Development & Documentation
• Thanks to
  – Nova Ahmed, currently Georgia Tech Computer Science PhD program, for original work carried out as part of NMI Integration Testbed Program
  – John-Paul Robinson and University of Alabama at Birmingham for access to medusa cluster
  – Jim Cotillier, Shelley Henderson, University of Southern California, for access to HPC resources
  – Chao “Bill” Xie, Georgia State Computer Science PhD program, for continuing Nova Ahmed’s work
  – Victor Bolet, Georgia State Information Systems & Technology Advanced Campus Services unit, for support of Georgia State’s SURAgrid nodes
  – John McGee, RENCI.org, for discussions of approach using globus