
Massively Parallel Solutions for Molecular Sequence Analysis

Bertil Schmidt, School of Computer Engineering, Nanyang Technological University, Singapore

Heiko Schröder, School of Computer Science and Information Technology, RMIT University, Melbourne, Australia

Manfred Schimmler, Institut für Datentechnik und Kommunikationsnetze, TU Braunschweig, Germany

Contents

Motivation

Smith-Waterman Algorithm

Parallelization on the Hybrid Architecture

Parallelization on the Fuzion 150

Performance Evaluation

Conclusion and Future Work

Motivation

Genetic sequence databases are growing exponentially

The growth rate will continue, since multiple concurrent genome projects have begun, with more to come

Discovered sequences are analyzed by comparison with databases

Complexity of sequence comparison is proportional to the product of query size times database size

Analysis is too slow on sequential computers

Two possible approaches:

Heuristics, e.g. BLAST and FastA; but the more efficient the heuristic, the worse the quality of the results

Parallel processing: get high-quality results in reasonable time

Full Genome Comparison

The organisms are related, but Tuberculosis causes a disease: find the common and the different parts

16 × 10^6 pairwise sequence comparisons

Many genome-genome comparisons will be required in the near future

3918 protein sequences (1,329,298 amino acids)

4289 protein sequences (1,359,008 amino acids)
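The number of pairwise comparisons follows directly from the two protein counts on this slide. A minimal arithmetic sketch (the variable names are illustrative, not from the slides):

```python
# Protein counts of the two genomes, as given on the slide.
proteins_first = 3918
proteins_second = 4289

# Every protein of one genome is compared with every protein of the other.
pairs = proteins_first * proteins_second
print(f"{pairs:,} pairwise comparisons")
```

The product is on the order of 10^7, which matches the magnitude quoted on the slide.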

Protein Sequence Alignment

GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII

|||::::| : |::| ||:::||||:|:|||:: ::| |::::

GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV

[Figure: BLAST, FastA, and Smith-Waterman placed on two axes, search speed (slower to faster) and data quality (lower to higher)]

Smith-Waterman Algorithm

Optimal local alignment of two sequences

Performs an exhaustive search for the optimal local alignment

Complexity O(nm) for sequence lengths n and m

Based on the 'dynamic programming' (DP) algorithm:

Fill the DP matrix using a substitution (mutation) matrix

Find the maximal value (score) in the matrix

Trace back from the score until a 0 value is reached

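Smith-Waterman Algorithm

Aligning S1 and S2 of length n and m using the recurrences, for 1 ≤ i ≤ n and 1 ≤ j ≤ m:

$$H(i,j) = \max\{0,\; E(i,j),\; F(i,j),\; H(i-1,j-1) + Sbt(S1_i, S2_j)\}$$

$$E(i,j) = \max\{H(i,j-1) - \alpha,\; E(i,j-1) - \beta\}$$

$$F(i,j) = \max\{H(i-1,j) - \alpha,\; F(i-1,j) - \beta\}$$

with the initial values H(i,0) = E(i,0) = 0 and H(0,j) = F(0,j) = 0, where α is the penalty for opening a gap and β the penalty for extending it

Calculate three possible ways to extend the alignment:

by one amino acid (AA) in each sequence

by one AA in the first sequence, aligned with a gap in the second

by one AA in the second sequence, aligned with a gap in the first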
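The recurrences translate almost line by line into code. The following is a minimal score-only sketch, assuming H, E and F are stored as full matrices and the substitution table is passed in as a function; the names `sw_score`, `sbt`, `alpha` and `beta` are illustrative choices, not taken from the slides.

```python
def sw_score(s1, s2, sbt, alpha, beta):
    """Best local alignment score from the H/E/F recurrences."""
    n, m = len(s1), len(s2)
    # Row 0 and column 0 hold the initial values (all zero).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[0] * (m + 1) for _ in range(n + 1)]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Gap along one sequence: open from H or extend an existing E.
            E[i][j] = max(H[i][j - 1] - alpha, E[i][j - 1] - beta)
            # Gap along the other sequence: open from H or extend F.
            F[i][j] = max(H[i - 1][j] - alpha, F[i - 1][j] - beta)
            # Extend by one character in each sequence, or restart at 0.
            diag = H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1])
            H[i][j] = max(0, E[i][j], F[i][j], diag)
            best = max(best, H[i][j])
    return best


def simple_sbt(x, y):
    """Toy substitution function: 2 for a match, -1 otherwise."""
    return 2 if x == y else -1


print(sw_score("ATCTCGTATGATG", "GTCTATCAC", simple_sbt, alpha=1, beta=1))
```

The two nested loops visit every cell once, which is the O(nm) cost stated earlier. A production implementation would keep only the previous row or column instead of three full matrices.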

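Smith-Waterman Algorithm

Align S1 = ATCTCGTATGATG and S2 = GTCTATCAC with

$$Sbt(x,y) = \begin{cases} 2 & \text{if } x = y \\ -1 & \text{else} \end{cases}, \qquad \alpha = 1,\ \beta = 1$$

With these penalties the recurrence simplifies to

$$H(i,j) = \max\{0,\; H(i-1,j-1) + Sbt(S1_i, S2_j),\; H(i-1,j) - 1,\; H(i,j-1) - 1\}$$

DP matrix (rows: characters of S2, columns: characters of S1):

         A  T  C  T  C  G  T  A  T  G  A  T  G
      0  0  0  0  0  0  0  0  0  0  0  0  0  0
   G  0  0  0  0  0  0  2  1  0  0  2  1  0  2
   T  0  0  2  1  2  1  1  4  3  2  1  1  3  2
   C  0  0  1  4  3  4  3  3  3  2  1  0  2  2
   T  0  0  2  3  6  5  4  5  4  5  4  3  2  1
   A  0  2  1  2  5  5  4  4  7  6  5  6  5  4
   T  0  1  4  3  4  4  4  6  6  9  8  7  8  7
   C  0  0  3  6  5  6  5  5  5  8  8  7  7  7
   A  0  2  2  5  5  5  5  4  7  7  7 10  9  8
   C  0  1  1  4  4  7  6  5  6  6  6  9  9  8

Optimal local alignment (score 10), obtained by tracing back from the maximal value:

   S1: T C G T A T G A
   S2: T C - T A T C A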
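The worked example can be reproduced with a short sketch that fills the matrix and then traces back from the maximal score until a 0 is reached. It assumes the scoring stated on the slide (2 for a match, -1 for a mismatch, gap penalty 1 per position); the function name `smith_waterman` and its parameters are illustrative.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Return (score, aligned_s1, aligned_s2) using a linear gap penalty."""
    rows, cols = len(s2) + 1, len(s1) + 1   # rows follow s2, columns s1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s2[i - 1] == s1[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align two characters
                          H[i - 1][j] - gap,       # gap in s1
                          H[i][j - 1] - gap)       # gap in s2
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximal value until a 0 cell is reached.
    a1, a2 = [], []
    i, j = best_pos
    while H[i][j] > 0:
        sub = match if s2[i - 1] == s1[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a1.append(s1[j - 1]); a2.append(s2[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i][j - 1] - gap:
            a1.append(s1[j - 1]); a2.append("-")
            j -= 1
        else:
            a1.append("-"); a2.append(s2[i - 1])
            i -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = smith_waterman("ATCTCGTATGATG", "GTCTATCAC")
print(score)
print(top)
print(bottom)
```

When several predecessors give the same value, this sketch prefers the diagonal move; other tie-breaking rules yield different but equally scored alignments.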

Parallel Architectures for Bioinformatics

Embedded massively parallel accelerators

Systola 1024: PC add-on board with 1024 processors (ISATEC, Germany)

Fuzion 150: 1536 processors on a single chip (Clearspeed Technology, UK)

Parallel Architectures for Bioinformatics

Hybrid Computer

Combines the SIMD and MIMD paradigms within one parallel architecture

Supercomputer performance at low cost

[Figure: hybrid computer built from Systola 1024 boards and a high-speed Myrinet switch]

Previous Applications

Scientific Computing

Volume Visualization

Automatic Visual Quality Control

Cryptography

Computer Tomography

Video Compression

Range of Transforms (Fourier, Wavelet, Hough, Radon)

Computer Graphics

Architecture of Systola 1024

[Figure: board architecture with host computer bus, controller, program memory, interface processors, RAM NORTH, and RAM WEST]

Instruction Systolic Array:

32 × 32 mesh of processing elements

wavefront instruction execution

Instruction Systolic Array

[Figure: processing-element array fed with instructions, row selectors, and column selectors]

Wavefront instruction execution

Fast accumulation operations (e.g. row sum, broadcast, ringshift)

Parallelization of Smith-Waterman

Matrix cells along a single diagonal are computed in parallel

A comparison is performed in A+B−1 steps on A PEs

[Figure: DP matrix for S1 = ATCTCGTATGATG and S2 = GTCTATCAC from the earlier example; each of the 13 columns (one per character of S1) is assigned to one PE, P1 to P13]
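The diagonal-by-diagonal schedule can be simulated sequentially. In the sketch below, A and B are read as the lengths of the two sequences (an interpretation of the slide's symbols), one PE is assumed per character of the first sequence, and the scoring is the one from the earlier example. The inner loop stands in for what the PEs would do simultaneously; the function name `sw_wavefront` is illustrative.

```python
def sw_wavefront(s1, s2, match=2, mismatch=-1, gap=1):
    """Score a local alignment one anti-diagonal at a time.

    Returns (score, steps), where steps counts the anti-diagonals.
    """
    a, b = len(s1), len(s2)
    H = [[0] * (a + 1) for _ in range(b + 1)]
    best, steps = 0, 0
    for d in range(2, a + b + 1):               # cells with i + j == d
        # Every cell on this diagonal depends only on diagonals d-1 and
        # d-2, so all of them could be updated in the same step, one per PE.
        for j in range(max(1, d - b), min(a, d - 1) + 1):
            i = d - j
            sub = match if s2[i - 1] == s1[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            best = max(best, H[i][j])
        steps += 1
    return best, steps


print(sw_wavefront("ATCTCGTATGATG", "GTCTATCAC"))
```

For sequences of length 13 and 9 the loop runs over 13 + 9 − 1 = 21 diagonals, which is the A+B−1 step count stated on the slide.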

Mapping onto Systola 1024

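[Figure: the query characters a0 ... a1023 are stored one per PE, 32 per row (a31 ... a0 in the first row, a63 ... a32 in the second, up to a1023 ... a992 in the last); the subject sequences b0 b1 ... bk, c0 c1 ..., are fed into the array one after another]

a: query sequence (length equal to 1024)

b: subject sequence

Subject sequences can be pipelined with only 1 step delay

k steps for a subject sequence of length k

Efficient routing on the ISA: row ringshift and broadcast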
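The benefit of pipelining can be illustrated with a simple step-count model. This is only a sketch under stated assumptions, not a description of the actual Systola 1024 timing: an isolated comparison is taken to cost A+k−1 steps for a query of length A and a subject of length k, and a pipelined stream is taken to cost one step per subject character, plus the delay between subjects, plus A−1 steps to drain the array. Both function names are hypothetical.

```python
def steps_without_pipelining(query_len, subject_lens):
    """Each subject is compared in isolation: A + k - 1 steps."""
    return sum(query_len + k - 1 for k in subject_lens)


def steps_with_pipelining(query_len, subject_lens, delay=1):
    """Subjects enter back to back, separated by `delay` idle steps."""
    if not subject_lens:
        return 0
    feed = sum(subject_lens) + delay * (len(subject_lens) - 1)
    return feed + query_len - 1     # the last subject still has to drain


lengths = [300, 500]                # hypothetical subject lengths
print(steps_without_pipelining(1024, lengths))
print(steps_with_pipelining(1024, lengths))
```

Under this model the drain cost of A−1 steps is paid once per database scan instead of once per subject sequence, so the cost per subject approaches k steps.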

Performance Evaluation

Scan times in seconds for TrEMBL 14 (351’834 Protein Sequences) for various query sequence lengths

Query sequence length 256 512 1024 2048 4096

Systola 1024 (speedup to PIII 850)

Cluster of 16 Systolas (speedup to PIII 850)

Parallel implementation scales linearly with sequence length and number of PCs

Computing time dominates data transfer time

Fuzion 150 Architecture

0.25-µm, single-chip SIMD architecture

1536 PEs @ 200 MHz

300 GOPS

600 GB/s on-chip, 6.4 GB/s off-chip bandwidth

Multithreading (control units interact via semaphores)

Developed by Clearspeed Technology (UK) for graphics and network processing

[Figure: block diagram with the linear SIMD array (1536 PEs, each with 2 Kbytes DRAM), SIMD controller, instruction fetch, 32-bit EPU (ARC), FUZION bus, video I/O, display, host, AGP, Rambus, and local memory on 1, 2 or 4 channels (6.4 GB/s)]
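The headline figures can be cross-checked with simple arithmetic. The sketch assumes one operation per PE per clock cycle, which is an assumption made here rather than a statement from the slides; the constant names are illustrative.

```python
PES = 1536               # processing elements (from the slide)
CLOCK_MHZ = 200          # clock rate in MHz (from the slide)
PE_MEMORY_KBYTES = 2     # DRAM per PE (from the slide)

# Assumption: every PE completes one operation per cycle.
peak_mops = PES * CLOCK_MHZ
total_pe_memory_kbytes = PES * PE_MEMORY_KBYTES

print(peak_mops / 1000, "GOPS peak")
print(total_pe_memory_kbytes, "KBytes of PE memory in total")
```

Under that assumption the peak rate comes out slightly above the rounded 300 GOPS quoted on the slide.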

Fuzion 150 Architecture

[Figure: six blocks (Block 0 to Block 5) of 256 PEs each, PE(0,0) ... PE(5,255), with local memory; each PE has an 8-bit ALU, a 32-byte register file, and 2 KByte of PE memory (DRAM), receives instructions, and is connected to its left PE, its right PE, and the block I/O channel]

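Mapping onto the Fuzion 150

[Figure: the query characters a0 ... a1535 are stored one per PE, 256 per block (a0 ... a255, a256 ... a511, up to a1280 ... a1535) across Block 0 to Block 5; the subject sequences b0 b1 ... bk, c0 c1 ..., are fed in one after another]

a: query sequence (length equal to 1536)

b: subject sequence

No fast global communication

2-step local communication

Subject sequence can be pipelined with only step delay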

Mapping onto the Fuzion 150

Reduce communication time

Assign 16 AAs to each PE

Query lengths up to 24576 AAs can be processed within a single pass

Partitioning for query lengths < 24576: each subarray of corresponding size computes the alignment of the same query sequence with different subject sequences
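The partitioning rule can be made concrete with a small sketch. It assumes that a subarray may start at any PE and that its size is simply the query length divided by 16, rounded up; real hardware may restrict subarray sizes, for example to block boundaries. The function name `partition` is illustrative.

```python
PES = 1536          # PEs on the chip (from the slides)
AAS_PER_PE = 16     # amino acids assigned to each PE (from the slides)


def partition(query_len):
    """Return (PEs per subarray, number of subarrays) for one query."""
    if not 0 < query_len <= PES * AAS_PER_PE:
        raise ValueError("query does not fit into a single pass")
    pes_needed = -(-query_len // AAS_PER_PE)    # ceiling division
    return pes_needed, PES // pes_needed


for length in (256, 1024, 4096, 24576):
    print(length, partition(length))
```

Each subarray holds a full copy of the query, so the second number is how many subject sequences could be aligned at the same time under this model.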

Performance Evaluation

Scan times in seconds for TrEMBL 14 (351’834 Protein Sequences) for various query sequence lengths

Query sequence length 256 512 1024 2048 4096

Fuzion 150 (speedup to PIII 850)

162165

Parallel implementation scales linearly with sequence length

Computing time dominates data transfer time

Performance Evaluation

Normalized time comparison for a 10 Mbase search on different parallel architectures with different query lengths

[Figure: chart comparing SAMBA, Fuzion 150, Kestrel, and 16K-PE MasPar]

4 times faster than 16K-PE MasPar

6 times faster than Kestrel

5 times faster than SAMBA (special-purpose 3-board architecture)

Performance Evaluation for Full Genome Comparison

Scan times for pairwise protein sequence comparison of Mycobacterium tuberculosis and Escherichia coli

Cluster of Systola 1024: 17 min (speedup to PIII 850: 79)

Fuzion 150: 11 min (speedup to PIII 850: 133)

Comparison has to be performed for several parameters (substitution matrices, gap penalties)

Mycobacterium smegmatis will be published later this year

Results of the comparison will be interpreted with the Centre for Molecular Cell Biology, NUS, Singapore

Conclusions and Future Work

Demonstrated how fine-grained parallel architectures can be applied efficiently for Comparative Genomics

Significant runtime savings for genome comparisons and database searching

More discovery is possible at a good price-performance ratio

Other computational biology applications of interest to us:

ClustalW

HMM

Pattern matching algorithms, such as inverted repeats, short tandem repeats, etc.

Availability of accelerators as a special resource in a Grid environment

Contents

Protein Structure

Protein Structure Prediction

Approach based on Local Protein Structure

Refinements

Conclusions and Future Work

Protein Structure

Proteins are large molecules composed of smaller molecules called amino acids

There are 20 kinds of amino acids found in natural proteins

All share a common structure

[Figure: common structure of an amino acid, showing the R side chain, the carboxyl group, the amine group, and the alpha carbon (with attached hydrogen)]

Protein Structure

From Primary to Tertiary Structure

A protein’s 3D shape is determined by its primary amino acid sequence (Anfinsen, 1963)

Predicting tertiary structure from amino acid sequence is an unsolved problem

Difficult to model the energies that stabilize a protein molecule

Conformational search space is enormous

Prediction Methods

Given an amino acid sequence:

search a set of known folds by aligning the sequence with a template fold representative

predict the fold that gets the best-scoring alignment

[Figure: the target amino acid sequence YLAADTYK is aligned against the templates of a fold library]

Template amino acid sequence: FISSETCN, MEPSSYV, TGLIRKN

Target/template score: 7, 21, 2

Prediction Methods

This method is very effective when target and template have >30% sequence identity

Approximately 1/3 of protein sequences can be assigned folds and modeled this way

Our aim is to help determine tertiary structures in cases where no matching sequence can be found

Local structure and prediction

What is local structure?

It describes the environment of an amino acid: an amino acid's relationship to its neighbors

We use this information to predict structure from the primary sequence

Dihedral Angles

The 6 atoms in each peptide unit lie in the same plane; the dihedral angles φ and ψ are free to rotate

The structure of a protein is almost totally determined if all angles φ and ψ are known
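A dihedral angle is the angle between two planes, each defined by three of four consecutive atoms. The sketch below computes it from Cartesian coordinates with plain Python tuples. By the usual definition, φ of residue i uses the atoms C(i−1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The function and helper names are illustrative, and the sign of the result follows from the order of the cross product chosen here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees around the bond p1-p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)             # unit vector along the axis
    # Project the outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Four points with the outer bonds at a right angle around the z axis.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this function to every residue of a structure yields the (φ, ψ) pairs that the histograms in the following slides are built from.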

Idea of our Approach

Local predictability

Database of sub-chain structures

Reduction of the number of degrees of freedom by 10 reduces the computation time significantly in combination with a global optimization algorithm (e.g. GA or SA)

[Figure: protein chain with backbone and side chains, with angles marked as stiff or free]

Classification of Dihedral Angles

[Figure: processing pipeline from selected PDB structures through dihedral angle extraction to a histogram for each amino acid pair; example histograms of angles (in degrees) for the pairs ALA-ALA, LEU-ARG, and GLY-ILE, with examples labelled multiple and flexible]

Stiff angles: determine the mean value

Multiple angles: determine a sequence of mean values, one for each peak, in decreasing order of these peaks

Flexible angles: determine the mean value and mark as flexible
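One way to turn the three classes into code is to group the observed angles into peaks and look at how many narrow peaks there are. The slides do not state the criteria used, so the gap, share and width thresholds below are hypothetical, as is the function name `classify_angles`. The sketch also ignores the wrap-around at ±180°.

```python
from statistics import mean


def classify_angles(angles, gap=20.0, min_share=0.2, max_width=40.0):
    """Return (label, mean values) for the angles of one amino acid pair."""
    if not angles:
        raise ValueError("no angles given")
    xs = sorted(angles)
    # Start a new cluster wherever two neighbouring values are far apart.
    clusters = [[xs[0]]]
    for x in xs[1:]:
        if x - clusters[-1][-1] > gap:
            clusters.append([x])
        else:
            clusters[-1].append(x)
    # A peak is a cluster holding a sufficient share of all samples.
    peaks = [c for c in clusters if len(c) >= min_share * len(xs)]
    narrow = all(c[-1] - c[0] <= max_width for c in peaks)
    if len(peaks) == 1 and narrow:
        return "stiff", [mean(peaks[0])]
    if len(peaks) >= 2 and narrow:
        peaks.sort(key=len, reverse=True)       # highest peak first
        return "multiple", [mean(c) for c in peaks]
    return "flexible", [mean(xs)]


print(classify_angles([-62, -60, -58, -61, -59]))
print(classify_angles([-60, -62, -58, 120, 122]))
```

For the multiple class the means are returned in decreasing order of peak size, mirroring the rule on the slide.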

Prediction based on Classification

Given a sequence of amino acids, find the subsequences in which all angles are of type stiff

Predict the structure of these subsequences, using the mean values of the corresponding histograms
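Finding the stiff subsequences amounts to scanning the chain for maximal runs in which every adjacent pair is classified as stiff. The sketch assumes the classification is available as a dictionary keyed by amino acid pairs; the table below is invented for illustration and does not reflect real histogram data, and the name `stiff_segments` is hypothetical.

```python
def stiff_segments(residues, pair_class, min_len=2):
    """Return maximal runs of residues whose adjacent pairs are all stiff."""
    segments, start = [], 0
    for k in range(1, len(residues) + 1):
        run_ends = (k == len(residues) or
                    pair_class.get((residues[k - 1], residues[k])) != "stiff")
        if run_ends:
            if k - start >= min_len:
                segments.append(residues[start:k])
            start = k
    return segments


# Invented classification table, for illustration only.
table = {
    ("A", "A"): "stiff",
    ("A", "L"): "stiff",
    ("L", "R"): "multiple",
    ("R", "G"): "stiff",
    ("G", "I"): "flexible",
}
print(stiff_segments("AALRGI", table))
```

Pairs missing from the table are treated as not stiff, so an incomplete database shortens the predicted segments rather than producing wrong ones.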

Prediction based on Classification

Part of a protein predicted with this method (backbone of a helix, original structure on the left, predicted structure on the right)

Successfully predicted certain stiff structures of subsequences up to the length of 15

Refinement of the method

For multiple angles, consider sequences of length 3 or 4:

extract sequences (C,A,B,D) and determine the histogram of the angles φ and ψ related to the peptide chain between A and B

if the histogram for the amino acids (A,B) is multiple, check whether the angle for (C,A,B,D) is stiff

with longer subsequences the number of occurrences of these sequences drops dramatically

Refinement of the method

For multiple angles: if an amino acid sequence has only a small number of multiple edges, it is possible to try all combinations of possible peaks

many combinations lead to collisions in part of the protein, and thus can be eliminated
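Trying all combinations is a Cartesian product over the candidate peaks of each angle, followed by a filter that discards colliding conformations. The sketch assumes each position comes with its list of candidate angles (one entry for stiff positions, several for multiple ones) and takes the collision test as a caller-supplied function; `candidate_conformations` and `collides` are hypothetical names, and the toy collision rule is invented.

```python
from itertools import product


def candidate_conformations(angle_options, collides):
    """Enumerate every combination of peaks and keep the collision-free ones."""
    return [combo for combo in product(*angle_options)
            if not collides(combo)]


# Three positions: one stiff angle and two angles with two peaks each.
options = [[-60], [-60, 120], [-120, 60]]


def toy_collision(combo):
    """Invented rule standing in for a real geometric collision check."""
    return combo[1] == 120 and combo[2] == 60


print(candidate_conformations(options, toy_collision))
```

The number of combinations grows as the product of the peak counts, which is why the approach is limited to sequences with only a few multiple angles.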

Conclusion and Future Work

Presented a method to predict stiff structures of subsequences up to a certain length

Presented a refinement of the method to handle multiple angles

How to handle flexible angles?

Using the local prediction as an input for a global optimization method, e.g. based on Simulated Annealing
