Post on 21-Dec-2015
Massively Parallel Solutions for Molecular Sequence Analysis
Bertil Schmidt
School of Computer Engineering, Nanyang Technological University, Singapore
Heiko Schröder
School of Computer Science and Information Technology, RMIT University, Melbourne, Australia
Manfred Schimmler
Institut für Datentechnik und Kommunikationsnetze, TU Braunschweig, Germany
Contents
Motivation
Smith-Waterman Algorithm
Parallelization on the Hybrid Architecture
Parallelization on the Fuzion 150
Performance Evaluation
Conclusion and Future Work
Motivation
Genetic sequence databases are growing exponentially
The growth rate will continue, since multiple concurrent genome projects have begun, with more to come
Motivation
Discovered sequences are analyzed by comparison with databases
Complexity of sequence comparison is proportional to the product of query size times database size
Analysis is too slow on sequential computers
Two possible approaches:
Heuristics, e.g. BLAST, FastA; but the more efficient the heuristic, the worse the quality of the results
Parallel processing: get high-quality results in reasonable time
Full Genome Comparison
Related organisms, but Tuberculosis causes a disease: find common and different parts
16 × 10^6 pairwise sequence comparisons
Many genome-genome comparisons will be required in the near future
[Figure: two proteomes, one with 3918 protein sequences (1,329,298 amino acids) and one with 4289 protein sequences (1,359,008 amino acids)]
Protein Sequence Alignment
BLAST, FastA, Smith-Waterman
GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII
GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV
|||::::| : |::| ||:::||||:|:|||:: ::| |::::
[Figure: BLAST, FastA and Smith-Waterman placed along two axes, search speed (slower to faster) and data quality (lower to higher)]
Smith-Waterman Algorithm
Optimal local alignment of two sequences
Performs an exhaustive search for the optimal local alignment
Complexity O(nm) for sequence lengths n and m
Based on the dynamic programming (DP) algorithm:
  Fill the DP matrix using a substitution (mutation) matrix
  Find the maximal value (score) in the matrix
  Trace back from the score until a 0 value is reached
Smith-Waterman Algorithm
Aligning S1 and S2 of lengths n and m using the recurrences:

$$H(i,j) = \max\{0,\; E(i,j),\; F(i,j),\; H(i-1,j-1) + Sbt(S1_i, S2_j)\}, \quad 1 \le i \le n,\ 1 \le j \le m$$
$$E(i,j) = \max\{H(i,j-1) - \alpha,\; E(i,j-1) - \beta\}$$
$$F(i,j) = \max\{H(i-1,j) - \alpha,\; F(i-1,j) - \beta\}$$
$$H(i,0) = E(i,0) = 0, \qquad H(0,j) = F(0,j) = 0$$

Here Sbt is the substitution matrix, and α and β are the gap penalties.
Calculate three possible ways to extend the alignment:
  by one amino acid (AA) in each sequence
  by one AA in the first sequence, aligned with a gap in the second
  by one AA in the second sequence, aligned with a gap in the first
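The three recurrences can be sketched directly in Python. This is a minimal illustration, not the authors' implementation: the function name `smith_waterman_affine` and its parameters are assumptions, `sbt` is any scoring function supplied by the caller, and `alpha` and `beta` are the two gap penalties of the recurrences.

```python
def smith_waterman_affine(s1, s2, sbt, alpha, beta):
    """Return the best local alignment score using the H/E/F recurrences.

    sbt(x, y) is a caller-supplied substitution score (an assumption here);
    alpha and beta are the gap penalties used in E and F.
    """
    n, m = len(s1), len(s2)
    # Row 0 and column 0 stay at 0, matching the boundary conditions.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[0] * (m + 1) for _ in range(n + 1)]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - alpha, E[i][j - 1] - beta)
            F[i][j] = max(H[i - 1][j] - alpha, F[i - 1][j] - beta)
            H[i][j] = max(0, E[i][j], F[i][j],
                          H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]))
            best = max(best, H[i][j])
    return best


def simple_sbt(x, y):
    # Toy scoring: 2 for a match, -1 for a mismatch.
    return 2 if x == y else -1


print(smith_waterman_affine("ACGT", "ACGT", simple_sbt, 1, 1))  # 8
```

The double loop makes the O(nm) cost visible: every cell of the matrix is filled exactly once.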
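Smith-Waterman Algorithm
Align S1 = ATCTCGTATGATG and S2 = GTCTATCAC with

$$Sbt(x,y) = \begin{cases} 2 & \text{if } x = y \\ -1 & \text{else} \end{cases}, \qquad \alpha = 1,\ \beta = 1$$

With these parameters the recurrence reduces to:

$$H(i,j) = \max\{0,\; H(i-1,j) - 1,\; H(i,j-1) - 1,\; H(i-1,j-1) + Sbt(S1_i, S2_j)\}$$

DP matrix (S1 along the top, S2 down the side):

         A   T   C   T   C   G   T   A   T   G   A   T   G
     0   0   0   0   0   0   0   0   0   0   0   0   0   0
 G   0   0   0   0   0   0   2   1   0   0   2   1   0   2
 T   0   0   2   1   2   1   1   4   3   2   1   1   3   2
 C   0   0   1   4   3   4   3   3   3   2   1   0   2   2
 T   0   0   2   3   6   5   4   5   4   5   4   3   2   1
 A   0   2   1   2   5   5   4   4   7   6   5   6   5   4
 T   0   1   4   3   4   4   4   6   6   9   8   7   8   7
 C   0   0   3   6   5   6   5   5   5   8   8   7   7   7
 A   0   2   2   5   5   5   5   4   7   7   7  10   9   8
 C   0   1   1   4   4   7   6   5   6   6   6   9   9   8

Maximal score: 10. Trace-back path: 10, 8, 9, 7, 5, 3, 4, 2, 0.

Resulting alignment:

A T C T C G T A T G A T G
    G T C - T A T C A C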
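The worked example can be reproduced with a short script. This is a minimal sketch of the simplified recurrence (match 2, mismatch -1, gap penalty 1) with a trace-back. The function name and the tie-breaking order in the trace-back (diagonal first, then a gap in S2, then a gap in S1) are assumptions, not taken from the slides.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Local alignment with a linear gap penalty; returns (score, a1, a2)."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j] - gap, H[i][j - 1] - gap,
                          H[i - 1][j - 1] + s)
            if H[i][j] > best:  # keep the first maximal cell found
                best, bi, bj = H[i][j], i, j
    # Trace back from the maximal score until a 0 value is reached.
    a1, a2 = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


print(smith_waterman("ATCTCGTATGATG", "GTCTATCAC"))
# (10, 'TCGTATGA', 'TC-TATCA')
```

The returned strings are the aligned region only: eight characters of S1 against seven characters of S2 plus one gap.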
Parallel Architectures for Bioinformatics
Embedded massively parallel accelerators:
Systola 1024: PC add-on board with 1024 processors (ISATEC, Germany)
Fuzion 150: 1536 processors on a single chip (Clearspeed Technology, UK)
Parallel Architectures for Bioinformatics
[Figure: hybrid computer, 16 Systola 1024 boards connected through a high-speed Myrinet switch]
Supercomputer performance at low cost
Combines the SIMD and MIMD paradigms within one parallel architecture: a hybrid computer
Previous Applications
Scientific Computing
Volume Visualization
Automatic Visual Quality Control
Cryptography
Computer Tomography
Video Compression
Range of Transforms (Fourier, Wavelet, Hough, Radon)
Computer Graphics
Architecture of Systola 1024
[Figure: Systola 1024 board with ISA, interface processors, RAM NORTH, RAM WEST, controller, program memory and host computer bus]
Instruction Systolic Array (ISA): 32 × 32 mesh of processing elements
Wavefront instruction execution
Instruction Systolic Array
[Figure: Instruction Systolic Array with row selectors, column selectors and a stream of instructions (+, *, -) moving through the array]
Wavefront instruction execution
Fast accumulation operations (e.g. row sum, broadcast, ringshift)
Parallelization of Smith-Waterman
Matrix cells along a single diagonal are computed in parallel
Comparison is performed in A + B - 1 steps on A PEs
[Figure: DP matrix for S1 = ATCTCGTATGATG (length A) and S2 = GTCTATCAC (length B), computed diagonal by diagonal on PEs P1, P2, ..., P13]
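The diagonal schedule can be simulated sequentially to check the step count. In this sketch, which is only an illustration and not the ISA code, each value of `i` plays the role of one PE holding one character of S1, and each pass of the outer loop is one parallel step. The scoring values (2, -1, gap 1) follow the worked example; the function name is an assumption.

```python
def wavefront_smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Fill the DP matrix anti-diagonal by anti-diagonal.

    Returns (best score, number of diagonal steps).
    """
    a, b = len(s1), len(s2)
    H = [[0] * (b + 1) for _ in range(a + 1)]
    best, steps = 0, 0
    for d in range(2, a + b + 1):  # cells with i + j == d form one diagonal
        # Cells on one diagonal depend only on the two previous diagonals,
        # so a parallel machine could compute this inner loop in one step.
        for i in range(max(1, d - b), min(a, d - 1) + 1):
            j = d - i
            s = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j] - gap, H[i][j - 1] - gap,
                          H[i - 1][j - 1] + s)
            best = max(best, H[i][j])
        steps += 1
    return best, steps


print(wavefront_smith_waterman("ATCTCGTATGATG", "GTCTATCAC"))  # (10, 21)
```

For A = 13 and B = 9 the loop runs 13 + 9 - 1 = 21 times, and the score matches the sequential computation.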
Mapping onto Systola 1024
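[Figure: the query sequence a (length equal to 1024) is distributed over the 32 × 32 ISA: the first row holds a0 ... a31, the second row a32 ... a63, ..., the last row a992 ... a1023. The subject sequences b (bk ... b1 b0), followed by c (... c1 c0), are streamed through the array]
Subject sequences can be pipelined with only 1 step delay
k steps for a subject sequence of length k
Efficient routing on the ISA: row ringshift and broadcast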
Performance Evaluation
Scan times in seconds for TrEMBL 14 (351’834 Protein Sequences) for various query sequence lengths
Query sequence length               256    512   1024   2048   4096
Systola 1024, time (s)              294    577   1137   2241   4611
  speedup to PIII 850                 5      6      6      6      6
Cluster of 16 Systolas, time (s)     20     38     73    142    290
  speedup to PIII 850                81     86     91     94     94
Parallel implementation scales linearly with sequence length and number of PCs
Computing time dominates data transfer time
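The scaling claim can be checked with a few lines of arithmetic. The sketch below assumes that the fused figures in the table split into a scan time in seconds followed by a speedup (for example 294 s and a speedup of 5 for query length 256). The variable and function names are illustrative only.

```python
# Scan times in seconds, read from the table under the assumption above.
query_lengths = [256, 512, 1024, 2048, 4096]
single_board_s = [294, 577, 1137, 2241, 4611]
cluster_16_s = [20, 38, 73, 142, 290]


def scaling_report(lengths, t_single, t_cluster, boards=16):
    """Return (length, single/cluster time ratio, ratio per board) rows."""
    rows = []
    for n, t1, tc in zip(lengths, t_single, t_cluster):
        ratio = t1 / tc
        rows.append((n, ratio, ratio / boards))
    return rows


for n, ratio, efficiency in scaling_report(query_lengths, single_board_s,
                                           cluster_16_s):
    print(f"{n:5d}  ratio {ratio:5.1f}  efficiency {efficiency:.2f}")
```

Under this reading, 16 boards are between roughly 14.7 and 15.9 times faster than one board, and the single-board time roughly doubles each time the query length doubles.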
Fuzion 150 Architecture
0.25-µm, single-chip SIMD architecture
1536 PEs @ 200 MHz, 300 GOPS
600 GB/s on-chip, 6.4 GB/s off-chip bandwidth
Multithreading (control units interact via semaphores)
Developed by Clearspeed Technology (UK) for graphics and network processing
[Figure: block diagram with a linear SIMD array of 1536 PEs, each with 2 Kbytes DRAM; FUZION bus; 32-bit EPU (ARC); video I/O; display; instruction fetch; SIMD controller; local memory with 1, 2 or 4 channels (6.4 GB/s); host connection via AGP; Rambus]
Fuzion 150 Architecture
[Figure: the PEs are organized as 6 blocks (Block 0 to Block 5) of 256 PEs each, PE(0,0) ... PE(5,255), attached to the Fuzion bus and local memory. Each PE has an 8-bit ALU, a 32-byte register file and 2 KByte of PE memory (DRAM); it receives instructions and connects to its left PE, its right PE and the block I/O channel]
Mapping onto the Fuzion 150
[Figure: the query sequence a (length equal to 1536) is distributed over the PE blocks: Block 0 holds a0 ... a255, Block 1 holds a256 ... a511, ..., Block 5 holds a1280 ... a1535. The subject sequences b (bk ... b1 b0), followed by c (... c1 c0), are streamed through the array]
No fast global communication
2-step local communication
Subject sequences can be pipelined with only a step delay
Mapping onto the Fuzion 150
Reduce communication time: assign 16 AAs to each PE
Query lengths up to 24576 AAs can be processed within a single pass
Partitioning for query lengths < 24576: each subarray of corresponding size computes the alignment of the same query sequence with different subject sequences
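The partitioning arithmetic can be sketched as follows. The constants come from the slides (1536 PEs, 16 AAs per PE). The function name and the rule that a subarray uses just enough PEs to hold the query are assumptions about how "corresponding size" is meant.

```python
NUM_PES = 1536     # PEs on the Fuzion 150
AAS_PER_PE = 16    # amino acids of the query assigned to each PE


def partition(query_len):
    """Return (PEs per subarray, number of subarrays) for one query length.

    Assumes each subarray uses the smallest number of PEs that holds the
    whole query, and that leftover PEs stay idle.
    """
    if not 1 <= query_len <= NUM_PES * AAS_PER_PE:
        raise ValueError("query does not fit in a single pass")
    pes_per_subarray = -(-query_len // AAS_PER_PE)  # ceiling division
    subarrays = NUM_PES // pes_per_subarray
    return pes_per_subarray, subarrays


print(partition(24576))  # (1536, 1)
print(partition(256))    # (16, 96)
```

Under these assumptions a query of length 256 occupies 16 PEs, so 96 subject sequences could be compared at the same time.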
Performance Evaluation
Scan times in seconds for TrEMBL 14 (351’834 Protein Sequences) for various query sequence lengths
Query sequence length     256    512   1024   2048   4096
Fuzion 150, time (s)       12     22     42     82    162
  speedup to PIII 850     136    151    157    163    165
Parallel implementation scales linearly with sequence length
Computing time dominates data transfer time
Performance Evaluation
Normalized time comparison for a 10 Mbase search on different parallel architectures with different query lengths
[Figure: bar chart of search time in seconds (logarithmic axis, 1 to 100) for SAMBA, Fuzion 150, Kestrel and 16K-PE MasPar, with query lengths 512, 1024 and 2048]
4 times faster than 16K-PE MasPar
6 times faster than Kestrel
5 times faster than SAMBA (special-purpose 3-board architecture)
Performance Evaluation for Full Genome Comparison
Scan times for pairwise protein sequence comparison of Mycobacterium tuberculosis and Escherichia coli

                          Scan time   Speedup to PIII 850
Cluster of Systola 1024   17 min      79
Fuzion 150                11 min      133

Comparison has to be performed for several parameters (substitution matrices, gap penalties)
Mycobacterium smegmatis will be published later this year
Results of the comparison will be interpreted with the Centre for Molecular Cell Biology, NUS, Singapore
Conclusions and Future Work
Demonstrated how fine-grained parallel architectures can be applied efficiently for comparative genomics
Significant runtime savings for genome comparisons and database searching
More discovery is possible at a good price-performance ratio
Other computational biology applications of interest to us:
  ClustalW
  HMM
  Pattern matching algorithms, such as inverted repeats, short tandem repeats, etc.
Availability of accelerators as a special resource in a Grid environment
Contents
Protein Structure
Protein Structure Prediction
Approach based on Local Protein Structure
Refinements
Conclusions and Future Work
Protein Structure
Proteins are large molecules composed of smaller molecules called amino acids
There are 20 kinds of amino acids found in natural proteins
All share a common structure
[Figure: common structure of an amino acid, with the alpha carbon (with attached hydrogen) bonded to the amine group, the carboxyl group and the side chain R]
From Primary to Tertiary Structure
A protein’s 3D shape is determined by its primary amino acid sequence (Anfinsen, 1963)
Predicting tertiary structure from amino acid sequence is an unsolved problem:
  Difficult to model the energies that stabilize a protein molecule
  Conformational search space is enormous
Prediction Methods
Given an amino acid sequence:
  search a set of known folds by aligning the sequence with a template fold representative
  predict the fold that gets the best-scoring alignment
[Figure: a target amino acid sequence (YLAADTYK) is aligned against the template amino acid sequences of a fold library; target/template scores: FISSETCN 7, MEPSSYV 21, TGLIRKN 2]
Prediction Methods
This method is very effective when target and template have >30% sequence identity
Approximately 1/3 of protein sequences can be assigned folds and modeled this way
Our aim is to help determine tertiary structures in cases where no matching sequences can be found
Local structure and prediction
What is local structure?
  It describes the environment of an amino acid
  An amino acid's relationship to its neighbors
We use this information to predict structure from the primary sequence
Dihedral Angles
The 6 atoms in each peptide unit lie in the same plane, and the planes are free to rotate
The structure of a protein is almost totally determined if all angles φ and ψ are known
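A dihedral angle is computed from four consecutive atom positions. The sketch below uses a common vector formulation and is not taken from the slides. The helper names are illustrative, and the usual backbone convention is assumed: φ uses C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, k):
    return tuple(x * k for x in a)


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees around the bond p1-p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real atoms): a quarter turn around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

Applying `dihedral` to every residue of a known structure yields the φ and ψ values that the histograms are built from.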
Idea of our Approach
Stiff / free
Local predictability
Database of sub-chain structures
Reduction of the number of degrees of freedom by 10 reduces the computation time significantly in combination with a global optimization algorithm (e.g. GA or SA)
[Figure: peptide chain showing the back bone (N, C, C atoms), the side chains and the two dihedral angles]
Classification of Dihedral Angles
Selected PDB structures
Dihedral angle extraction
Histogram for each amino acid pair
Classification of each pair as stiff, multiple or flexible
Classification of Dihedral Angles
[Figure: three histograms of frequency against dihedral angle, for the amino acid pairs ALA-ALA, LEU-ARG and GLY-ILE, labelled stiff, multiple and flexible]
Classification of Dihedral Angles
Stiff angles: determine the mean value
Multiple angles: determine a sequence of mean values, one for each peak, in decreasing order of these peaks
Flexible angles: determine the mean value and mark as flexible
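One way to turn such a histogram into a class label is sketched below. The slides do not state the criteria, so every threshold here is a hypothetical choice: 10-degree bins, a bin counts as part of a peak if it holds at least 30% of the tallest bin, and a peak wider than 3 bins is treated as flexible. Plain arithmetic means are used, ignoring the wrap-around at ±180 degrees.

```python
from collections import Counter
from statistics import mean


def classify_angles(angles, bin_width=10, peak_frac=0.3, max_stiff_bins=3):
    """Return (label, means); all thresholds are illustrative assumptions."""
    bins = [int((a + 180) // bin_width) for a in angles]
    counts = Counter(bins)
    threshold = peak_frac * max(counts.values())
    significant = sorted(b for b, c in counts.items() if c >= threshold)
    # Group runs of adjacent significant bins into peaks.
    groups = []
    for b in significant:
        if groups and b == groups[-1][-1] + 1:
            groups[-1].append(b)
        else:
            groups.append([b])
    if any(len(g) > max_stiff_bins for g in groups):
        return "flexible", [mean(angles)]
    # (size, mean) per peak, largest peak first.
    stats = []
    for g in groups:
        members = [a for a, b in zip(angles, bins) if b in g]
        stats.append((len(members), mean(members)))
    stats.sort(reverse=True)
    means = [m for _, m in stats]
    return ("stiff" if len(groups) == 1 else "multiple"), means


print(classify_angles([-60] * 30 + [60] * 20))  # ('multiple', [-60, 60])
```

For a multiple angle the means come back with the largest peak first, matching the "decreasing order of these peaks" rule.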
Prediction based on Classification
Given a sequence of amino acids, find the subsequences in which all angles are of type stiff
predict structure of these subsequences, using the mean values of the corresponding histograms
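Finding the stiff subsequences amounts to scanning adjacent amino acid pairs. The sketch below assumes one-letter amino acid codes and a lookup table `pair_class` that maps each adjacent pair to its class. The table contents in the example are invented for illustration and are not measured data.

```python
def stiff_segments(sequence, pair_class):
    """Return maximal subsequences whose adjacent pairs are all stiff.

    pair_class maps a two-letter pair such as "AL" to "stiff", "multiple"
    or "flexible"; pairs missing from the table count as not stiff.
    """
    segments = []
    start = 0
    for k in range(len(sequence) - 1):
        if pair_class.get(sequence[k:k + 2]) != "stiff":
            if k > start:  # at least one stiff pair since the last break
                segments.append(sequence[start:k + 1])
            start = k + 1
    if len(sequence) - 1 > start:
        segments.append(sequence[start:])
    return segments


# Hypothetical classification table, for illustration only.
toy_classes = {"AL": "stiff", "LG": "stiff", "GI": "flexible", "IR": "stiff"}
print(stiff_segments("ALGIR", toy_classes))  # ['ALG', 'IR']
```

Each returned segment could then be built from the mean angles of its pairs.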
Prediction based on Classification
Part of a protein predicted with this method (backbone of a helix, original structure on the left, predicted structure on the right)
Successfully predicted certain stiff structures of subsequences up to the length of 15
Refinement of the method
For multiple angles, consider sequences of length 3 or 4:
  extract sequences (C,A,B,D) and determine the histogram of the angles φ and ψ related to the peptide chain between A and B
  if the histogram for amino acids (A,B) is multiple, check whether the angle for (C,A,B,D) is stiff
  with longer subsequences the number of occurrences of these sequences drops dramatically
Refinement of the method
For multiple angles:
  if an amino acid sequence has only a small number of multiple angles, it is possible to try all combinations of possible peaks
  many combinations lead to collisions in part of the protein, and thus can be eliminated
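Trying all combinations of peaks is a Cartesian product followed by a filter. In the sketch below, `peaks_per_angle` lists the candidate mean values for each angle, and `collides` is a caller-supplied predicate. A real collision test would need the 3D coordinates, so the predicate in the example is only a placeholder.

```python
from itertools import product


def candidate_conformations(peaks_per_angle, collides):
    """Enumerate every combination of peaks and drop the colliding ones."""
    return [combo for combo in product(*peaks_per_angle)
            if not collides(combo)]


# Toy input: two multiple angles with two peaks each, one stiff angle.
toy_peaks = [[-60, 60], [-120], [-60, 140]]


def toy_collides(combo):
    # Placeholder rule for illustration; not a geometric test.
    return combo[0] == 60 and combo[2] == 140


print(len(candidate_conformations(toy_peaks, toy_collides)))  # 3
```

The number of combinations is the product of the peak counts, which is why this only works for a small number of multiple angles.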
Conclusion and Future Work
Presented a method to predict stiff structures of subsequences up to a certain length
Presented a refinement of the method to handle multiple angles
Future work:
  How to handle flexible angles?
  Using the local prediction as an input for a global optimization method, e.g. based on Simulated Annealing