HIGH PERFORMANCE BIOINFORMATICS
Group May 09-06
Bryan McCoy
Kinit Patel
Tyson Williams

Problem/Need Statement
Current ways to solve Bioinformatics problems are either slow or very expensive.
There is a need for a way to reduce cost and still deliver high performance in a computer system that can solve Bioinformatics problems.

What is Bioinformatics?
Genetic sequencing.
Massive amounts of data.
Simple operations, but many of them.
Perfect for distributed computing.

Proposed Solution
Use a cluster of PS3s with their embedded Cell processors.

Cell Broadband Engine
Has 1 central PowerPC-based Power Processor Element (PPE).
Has 8 surrounding Synergistic Processor Elements (SPEs).
The PPE and the 8 SPEs are connected via the Element Interconnect Bus (EIB).
Cell Broadband Engine

Functional Requirements
FR1. Ported applications shall run on the Cell B.E.
FR2. The results returned shall be the same as the original program.
FR3. The applications shall return their runtime.
FR4. The applications shall execute in parallel on multiple Cell B.E.s.

Non-Functional Requirements
NF1. The Cells shall all run the Linux OS.
NF2. The runtimes of the ported applications shall be shorter than those of the original applications.
NF3. The ported applications shall be coded in the C language.

Operating Environment
Use Fedora 9 as the OS, as it is currently supported by the Cell SDK 3.1.
Use the command line for the user interface.
Use the IBM XLC compiler and/or the current GCC compiler.

Market Survey
Results of the survey point to a large speedup for computationally intensive programs.
Dr. Gaurav Khanna at the University of Massachusetts Dartmouth used a cluster of 8 PS3s to replace a supercomputer.
Universitat Pompeu Fabra in Barcelona deployed a BOINC system called PS3GRID in 2007 for collaborative biological computing.

Deliverables
The Source Code.
Compiled Executable.
Runtime Comparisons.
Project Final Report.
Project Poster.
Project Final Presentation.

Work Breakdown Structure
Port Apps to Cluster PS3s
  Problem Definition
    Research Cell/B.E.
    Research Bioperf Suite
    Research Distributed Parallel Algorithms
    Research Previously Done Work
  End Product Design
    Design Requirements
    Design Process
    Design Documents
  Considerations and Selections
    Decide Which Linux to Install
    Decide Which Applications to Port
  End Product Implementation
    Hardware Implementation
    Prototyping Implementation
    Software Implementation
  End Product Testing
    Ensure Correctness of Output Results
    Benchmarking
  Final Documentation and Demonstration
    Create Final Report
    Create Project Poster
    Prepare for Presentation

Costs
Time
  Approximately 555 man hours total.
  Freely donated.
  Total cost $0.
Equipment
  3 PS3s
  Crossbar router
  Provided for us by client.
  Total cost $0.

Resource Requirements
3 PlayStation 3s.
High performance network switch.
Books on distributed computing on Cell.
Time.

Work Schedule
Gantt chart

Risk Assessment
Slow network speed.
Software support.
Limited RAM.
Hardware failure.
Lower quality entertainment hardware.
Limited prior experience.
Software development schedule.

Design
Further divide the application into multiple threads for SPE execution on multiple PS3s, alter the functional logic, and vectorize the code where possible.
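
One way to picture this division of work is a round-robin split of independent search tasks over every SPE in the cluster. The sketch below is illustrative only and is not the project's code: `partition_tasks`, `NODES` and `SPES_PER_NODE` are hypothetical names, and 6 usable SPEs per PS3 is an assumption taken from the 6SPE rows of the results table.

```python
NODES = 3           # PS3s in the cluster (from the slides)
SPES_PER_NODE = 6   # assumption: SPEs usable per PS3


def partition_tasks(tasks, nodes=NODES, spes_per_node=SPES_PER_NODE):
    """Deal tasks round-robin into one bucket per (node, spe) worker."""
    workers = [(n, s) for n in range(nodes) for s in range(spes_per_node)]
    buckets = {worker: [] for worker in workers}
    for i, task in enumerate(tasks):
        buckets[workers[i % len(workers)]].append(task)
    return buckets


# 105 is the number of distinct unrooted binary trees on 6 taxa; the
# integers stand in for partial trees that would seed each worker's search.
buckets = partition_tasks(list(range(105)))
sizes = [len(bucket) for bucket in buckets.values()]
print(min(sizes), max(sizes))
```

A static split like this keeps the workers within one task of each other in count, but not necessarily in runtime, since pruning makes some subtrees far cheaper to search than others.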
Software Decomposition Diagram

System Requirements
SR1. The system shall allow the user to input multiple DNA sequences in FASTA format through a file interface.
SR2. The system shall output all of the most parsimonious trees implied by the input data to the screen.
SR3. The system shall share computational work among the PPE and SPEs available to each client/server process.
SR4. The front-end shall share computational work with available back-end processes.
SR5. The front-end shall be able to connect to at least 2 back-end processes via a high performance router.
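
As an illustration of the input side of SR1, the following is a minimal FASTA reader. It is a sketch under stated assumptions rather than the project's parser: `parse_fasta` is a hypothetical helper, the first word of each header line is taken as the sequence name, and the sample records are made up.

```python
from io import StringIO


def parse_fasta(handle):
    """Return a dict mapping each record name to its sequence string."""
    parts = {}
    name = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first word after '>' as the name,
            # or a generated name if the header is empty.
            fields = line[1:].split()
            name = fields[0] if fields else f"seq{len(parts) + 1}"
            parts[name] = []
        elif name is not None:
            # Sequence data may span several lines; collect the pieces.
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Made-up sample input; a real run would pass an open file instead.
sample = StringIO(">taxonA demo\nACGT\nac\n>taxonB\nACGA\nAC\n")
records = parse_fasta(sample)
print(records)
```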

System Analysis
The key is data flow, which is broken into 3 stages.
DNA sequences are distributed to the PPEs and then down to the SPEs.
Each SPE searches the possible trees for the best parsimony score, using branch and bound to prune the search.
Finally, the results are aggregated back to the main PPE and output.
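
The search stage can be sketched as a small, single-process branch and bound over tree topologies. This is not DNAPenny's implementation: it assumes Fitch parsimony scoring, unambiguous A/C/G/T input of equal length, and trees written as nested tuples. The names `fitch`, `insertions` and `branch_and_bound` are hypothetical, and the four sequences are made up.

```python
CODES = {"A": 1, "C": 2, "G": 4, "T": 8}  # one bit per nucleotide


def fitch(tree, seqs):
    """Return (per-site state sets, number of changes) for a subtree."""
    if not isinstance(tree, tuple):           # leaf: a taxon name
        return [CODES[c] for c in seqs[tree]], 0
    left_sets, left_cost = fitch(tree[0], seqs)
    right_sets, right_cost = fitch(tree[1], seqs)
    cost = left_cost + right_cost
    sets = []
    for a, b in zip(left_sets, right_sets):
        common = a & b
        if common:
            sets.append(common)               # children agree: no change
        else:
            sets.append(a | b)                # children differ: one change
            cost += 1
    return sets, cost


def insertions(tree, taxon):
    """Yield every tree made by attaching `taxon` to one edge of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def branch_and_bound(seqs):
    """Return (best score, list of all trees that reach that score)."""
    names = sorted(seqs)
    if len(names) < 3:
        raise ValueError("need at least 3 sequences")
    best = {"score": float("inf"), "trees": []}

    def extend(rest, next_index):
        tree = (names[0], rest)
        score = fitch(tree, seqs)[1]
        if score > best["score"]:
            return  # bound: adding more taxa can never lower the score
        if next_index == len(names):
            if score < best["score"]:
                best["score"], best["trees"] = score, []
            best["trees"].append(tree)
            return
        for candidate in insertions(rest, names[next_index]):
            extend(candidate, next_index + 1)

    extend((names[1], names[2]), 3)
    return best["score"], best["trees"]


# Made-up aligned sequences for four taxa.
demo = {"a": "AAC", "b": "AAC", "c": "GGC", "d": "GGT"}
score, trees = branch_and_bound(demo)
print(score, trees)
```

Pruning only when a partial tree is strictly worse than the best score so far keeps ties, so every equally parsimonious tree is reported, as SR2 requires.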

Specifications
Input
  DNA sequence files in FASTA format.
Output
  Runtime of the application.
  The most parsimonious phylogenetic tree.
  The parsimony score of the phylogenetic tree.

Specifications
User Interface
  No changes to the user interface.
  Uses a command line interface.

Specifications
Hardware
  3 PlayStation 3s
  High performance crossbar network switch.

Specifications
Software
  Fedora 9 with Linux 2.6.25 kernel for the PowerPC
  IBM Cell SDK 3.1
  IBM XLC 9.0 and GCC 4.3 compilers.
  DNAPenny 3.6.
  Bioperf Suite

Specifications
Testing
  Compare benchmarked runtimes over several iterations and inputs to get averages.
  Compare these runtimes with the previous group’s runtimes on a single Cell processor.
  Compare these runtimes with the previous group’s runtimes on a high performance server.
    ○ Quad-core Intel Xeon 3.0GHz, 6GB RAM.
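
The averaging step of this test plan could look like the sketch below. It is a hypothetical harness, not the group's tooling: `benchmark` is an assumed name, wall-clock time from `time.perf_counter` is an assumed measure, and `sum` over a range stands in for the ported application.

```python
import statistics
import time


def benchmark(func, inputs, repeats=5):
    """Return the mean wall-clock runtime in seconds of func per input."""
    means = {}
    for label, arg in inputs.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            func(arg)
            samples.append(time.perf_counter() - start)
        means[label] = statistics.mean(samples)
    return means


# Placeholder workload and inputs, for illustration only.
averages = benchmark(sum, {"small": range(1_000), "large": range(100_000)}, repeats=3)
for label, seconds in averages.items():
    print(f"{label}: {seconds:.6f} s")
```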

Acknowledgements
May08-24 group
  Kyle Byerly
  Shannon McCormick
  Matt Rohlf
  Bryan Venteicher
Bioperf developers
  David A. Bader, Georgia Tech
  Yue Li, Univ. of Florida
  Tao Li, Univ. of Florida
  Vipin Sachdeva, IBM Austin
Questions?

Previous Results and Projected Results

| Code revision | 4-Way 3.0GHz Machine (seconds) | X Speedup | PlayStation 3 (seconds) | X Speedup |
|---|---|---|---|---|
| dnapenny_orig | 823.568 | 1 | 7793.915 | 1 |
| dnapenny_slimmer | 360.131 | 2.28685673 | 941.981 | 8.273962 |
| parallel_dnapenny_1.0 | 221.432 | 3.71928177 | 780.867 | 9.9811043 |
| supplement_spe_parallel_1SPE | | | 1111.471 | 7.0122522 |
| supplement_spe_parallel_3SPE | | | 443.521 | 17.572821 |
| supplement_spe_parallel_6SPE | | | 277.233 | 28.11323 |
| supplement_parallel_vector_1SPE | | | 260.952 | 29.867236 |
| supplement_parallel_vector_3SPE | | | 153.656 | 50.723141 |
| supplement_parallel_vector_6SPE | | | 130.59 | 59.682326 |
| Cluster with 3 PlayStations (Projected) | | | ~54.8 | ~142.224 |

[Chart: X Speedup (compared to original program running on one PPE), y-axis 0 to 70, versus number of available SPEs + PPE, x-axis 1 to 8. Linear trendline: f(x) = 5.72802144736842 x + 21.9361413947368, R² = 0.887915258548363.]
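
The speedup columns in the table are each baseline runtime divided by the new runtime, which can be checked with a few lines. This is a worked check on a subset of the table's figures; `speedup` and the dictionary keys are illustrative names. The last ratio, projected cluster versus the fastest 4-way run, is an extra comparison and is not stated on the slides.

```python
PS3_BASELINE_S = 7793.915  # dnapenny_orig on one PS3 (from the table)
XEON_BEST_S = 221.432      # parallel_dnapenny_1.0 on the 4-way machine

# Selected PlayStation 3 runtimes in seconds, copied from the table.
ps3_runtimes_s = {
    "dnapenny_slimmer": 941.981,
    "supplement_spe_parallel_6SPE": 277.233,
    "supplement_parallel_vector_6SPE": 130.59,
    "cluster_3_ps3_projected": 54.8,
}


def speedup(baseline_s, runtime_s):
    """Speedup factor: how many times faster than the baseline."""
    return baseline_s / runtime_s


ps3_speedups = {
    name: speedup(PS3_BASELINE_S, t) for name, t in ps3_runtimes_s.items()
}
cluster_vs_xeon = speedup(XEON_BEST_S, ps3_runtimes_s["cluster_3_ps3_projected"])

for name, factor in ps3_speedups.items():
    print(f"{name}: {factor:.2f}x")
print(f"projected cluster vs fastest 4-way run: {cluster_vs_xeon:.2f}x")
```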

Summary
Cost: $0. Equipment provided.
Time: approximately 555 man hours, freely donated.
Results: 4x the performance of a similarly priced system.