Cluster Computer For Bioinformatics Applications Nile University, Bioinformatics Group. Hisham Adel 2008


2

Done by:
1. Hisham Adel Hassan.

Supervised by:

Dr. Mohamed Aboualhouda

3

Points

• Introduction.
• Cluster and Supercomputers.
• Cluster Types and Advantages.
• Our Cluster.
• Cluster Performance.
• Cluster Computer for Basic Problems.
• General Idea about Sequence Alignment.
• BLAST and Parallel BLAST Algorithm.
• Sequence Alignment and Parallel Sequence Alignment.
• Learned Skills.

4

Introduction


6

Cluster Definition

• A group of computers and servers, connected together, that act like a single system.

• Each system is called a node.

• Each node contains one or more processors, RAM, a hard disk, and a LAN card.

• Nodes work in parallel.

• Performance can be increased by adding more nodes.


10

Cluster types

• Load-balancing cluster (Parallel BLAST).

• Computing cluster (parallel sequence alignment).

• High-availability (HA) cluster.

11

Cluster types: Load Balancing Cluster

[Figure: diagram labelled "Task"]

12

Cluster types: Computing Cluster

[Figure: diagram labelled "Task"]

13

Cluster types: High-availability Clusters

14

Cluster advantages

•Performance.

•Scalability.

•Maintenance.

•Cost.


16

Our Cluster

[Figure: network diagram showing Node 1, Node 2, Node 3, and Node 4 connected to a switch; the label "Internet" appears four times.]

17

Our Cluster specification

Communication: 5-port 10/100 Mbps switch.

Processor and RAM:
- Master node: dual-core processor, 1.86 GHz, 1 GB RAM.
- Node 1: Pentium 4, 1 GB RAM.
- Node 2: Pentium 4, 1 GB RAM.
- Node 3: Pentium 4, 512 MB RAM.

18

Our Cluster specification (cont'd)

Operating system: openSUSE 10.3 (http://software.opensuse.org/)

Message passing: MPICH2 (http://www.mcs.anl.gov/research/projects/mpich2/)


20

Cluster performance is affected by:

1- Node speed.

2- The running program (sequential or parallel).

21

Running Program (sequential)

[Animation over four slides: a single "Working…" label is shown on each of the first three frames.]

Running Program (parallel)

[Animation over three slides: (1) three "Data sent" labels; (2) four "Working…" labels; (3) three "Finished…" labels and three "Results" labels, followed by "Get results…".]
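The "Data sent" step implies that the work is divided among the nodes before they start working. The slides do not say how the division is done; one common choice, assumed here purely for illustration, is block partitioning, where each node receives a contiguous slice of the work items. The function name `chunk_range` and its parameters are hypothetical and are not taken from the author's program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the slides): computes which slice of
 * `total` work items, for example queries, node `rank` out of `nodes`
 * would process. The first `total % nodes` nodes get one extra item,
 * so slice sizes differ by at most one. */
void chunk_range(size_t total, size_t nodes, size_t rank,
                 size_t *start, size_t *count)
{
    assert(nodes > 0 && rank < nodes);
    size_t base = total / nodes;   /* items every node receives */
    size_t extra = total % nodes;  /* leftover items to spread out */
    *count = base + (rank < extra ? 1 : 0);
    *start = rank * base + (rank < extra ? rank : extra);
}
```

In an MPI program, `rank` and `nodes` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`; whether the original program divides its work this way is not stated in the slides.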


29

Sequence Alignment

30

Sequence Alignment

Used to:

1- Compare sequences.

2- Search databases.

31

How to Align Two Sequences

Suppose we have two sequences:

    A A A C G A
    A A T G A

Let match = 1, gap = -1, mismatch = 0.

They can be aligned as:

1-  A A A C G A
    | | | | | |      Score = 3
    A A T _ G A

2-  A A A C _ G A
    | | | | | | |    Score = 1
    A A _ _ T G A
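The two scores above can be checked by summing the per-column values. The following is a minimal sketch, assuming the alignment rows are written without spaces and that `_` marks a gap, as in the slide; the function name `score_alignment` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Scoring values taken from the slide. */
enum { MATCH = 1, MISMATCH = 0, GAP = -1 };

/* Scores two already-aligned rows of equal length, column by column.
 * A column containing '_' in either row counts as a gap. */
int score_alignment(const char *top, const char *bottom)
{
    size_t len = strlen(top);
    assert(strlen(bottom) == len);  /* aligned rows must have equal length */
    int score = 0;
    for (size_t i = 0; i < len; i++) {
        if (top[i] == '_' || bottom[i] == '_')
            score += GAP;
        else if (top[i] == bottom[i])
            score += MATCH;
        else
            score += MISMATCH;
    }
    return score;
}
```

Calling `score_alignment("AAACGA", "AAT_GA")` gives 3 and `score_alignment("AAAC_GA", "AA__TGA")` gives 1, matching the two alignments on the slide.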


33

BLAST (Basic Local Alignment Search Tool)

Searching Databases

34

BLAST Algorithm

[Figure: high-scoring pairs]

35

BLAST search types

BLASTN - Compares a nucleotide query sequence against a nucleotide sequence database.

BLASTP - Compares an amino acid query sequence against a protein sequence database.

TBLASTN - Compares a protein query sequence against a nucleotide sequence database.

BLASTX - Compares a nucleotide query sequence against a protein sequence database.

36

Why do we need BLAST to be parallelized?

37

Our Program: Parallel BLAST

38

Parallel BLAST (cont'd)

Formatdb.c

Nucleotide sequence database: "formatdb -i DATABASE -p F".

Protein sequence database: "formatdb -i DATABASE -p T".
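Which `-p` value to pass to `formatdb` follows from the kind of database each search type expects. As a minimal sketch covering only the four search types listed earlier, the lookup could look like the code below. The function name `formatdb_protein_flag` is hypothetical and this is not the contents of Formatdb.c; the lowercase program names are the ones `blastall -p` accepts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not the author's Formatdb.c): returns the value
 * for formatdb's -p option given a BLAST search type.
 * 'T' = protein database, 'F' = nucleotide database,
 * 0   = search type not covered by this sketch. */
char formatdb_protein_flag(const char *program)
{
    assert(program != NULL);
    /* blastp and blastx search a protein database. */
    if (strcmp(program, "blastp") == 0 || strcmp(program, "blastx") == 0)
        return 'T';
    /* blastn and tblastn search a nucleotide database. */
    if (strcmp(program, "blastn") == 0 || strcmp(program, "tblastn") == 0)
        return 'F';
    return 0;
}
```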

39

Parallel BLAST (cont'd)

Linux_Cluster_BLASTALL.c

"blastall -p SEARCH_TYPE -d DATABASE -i QUERY_FILE -o out.txt"
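The slides show the command line but not how the program assembles it. One way a C program might build this string before running it is sketched below; the function name `build_blastall_cmd` and its parameters are hypothetical, and the code is not taken from Linux_Cluster_BLASTALL.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not the author's code): writes a blastall command
 * line into `buf`. Returns 0 on success and -1 if the buffer is too small.
 * Arguments are inserted verbatim, so names containing spaces or shell
 * metacharacters would need quoting first. */
int build_blastall_cmd(char *buf, size_t size, const char *program,
                       const char *database, const char *query,
                       const char *output)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "blastall -p %s -d %s -i %s -o %s",
                     program, database, query, output);
    /* snprintf reports the length it wanted; detect truncation. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string could then be passed to `system()` from `<stdlib.h>` on each node; whether the original program launches BLAST this way is an assumption.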

40

Results: average of running 1000 queries, 1000 times.

[Figure: chart "Nucleotide-Nucleotide". X axis: database (size): month.htgs (573 MB), drosoph.nt (118.6 MB), igseqnt (67.5 MB), Yeastnt (3.2 MB), mito.nt (3.2 MB), Pdbnt (1.7 MB). Y axis: time (s), 0 to 1.8. Series: 1 node; 3 nodes, query time; 3 nodes, query and communication time.]

41

Results (cont'd): average of running 1000 queries, 1000 times.

[Figure: chart "Amino acid-Amino acid". X axis: database (size): env_nr (1.6 GB), nr (573 MB), SwissProt (160 MB), Pdbaa (20 MB), Yeast.aa (3.2 MB). Y axis: time (s), 0 to 90. Series: 1 node, query time; 3 nodes, query time; 3 nodes, query and communication time.]

42

Results (cont'd): average of running 1000 queries, 1000 times.

[Figure: chart "Amino acid-Nucleotide". X axis: database (size): env_nr (1.6 GB), Swissprot (160 MB), nr (84.7 MB), Pdbaa (20.4 MB), yeast.aa (3.2 MB). Y axis: time (s), 0 to 90. Series: 1 node, query time; 3 nodes, query time only; 3 nodes, query and communication time.]

43

Conclusion about Parallel BLAST

• Performance: better when using the cluster.

• Scalability: adding more nodes decreases the running time.
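The result charts compare the single-node time with the three-node query time and the three-node query plus communication time. A simple way to turn such measurements into the usual speedup and efficiency figures is sketched below. The function names are hypothetical, and no measured values from the charts are assumed.

```c
#include <assert.h>

/* Speedup of the cluster run relative to the single-node run.
 * t_single: time on one node; t_query: query time on the cluster;
 * t_comm: communication time on the cluster (all in seconds). */
double speedup(double t_single, double t_query, double t_comm)
{
    assert(t_single > 0.0 && t_query + t_comm > 0.0);
    return t_single / (t_query + t_comm);
}

/* Parallel efficiency: speedup divided by the number of nodes.
 * A value of 1.0 would mean perfect scaling. */
double efficiency(double s, int nodes)
{
    assert(nodes > 0);
    return s / nodes;
}
```

Counting the communication time in the denominator matters: a run whose query time shrinks with more nodes can still show little overall speedup if communication dominates.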


45

Sequence Alignment: comparing sequences

46

Sequence Alignment

•Introduction.

•Sequence Alignment Benefits.

•Sequence Alignment Types.

47

Needleman-Wunsch Algorithm

48

Why do we need sequence alignment to be parallelized?

49

Parallel Sequence Alignment algorithm

50

Our Sequence Alignment Program

• Pairwise alignment.

• Built using the Needleman-Wunsch algorithm.
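The slides name the Needleman-Wunsch algorithm but do not show the program itself. As a minimal sequential sketch (not the author's code, and not the parallel version), the function below computes the optimal global alignment score by dynamic programming, assuming the scoring values from the earlier alignment example (match = 1, mismatch = 0, gap = -1). The name `nw_score` is hypothetical, and only two rows of the score table are kept in memory, so the alignment itself is not reconstructed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Scoring values assumed from the earlier alignment example. */
enum { MATCH = 1, MISMATCH = 0, GAP = -1 };

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Returns the optimal global alignment score of s and t. */
int nw_score(const char *s, const char *t)
{
    size_t n = strlen(s), m = strlen(t);
    int *prev = malloc((m + 1) * sizeof *prev);
    int *curr = malloc((m + 1) * sizeof *curr);
    assert(prev != NULL && curr != NULL);  /* sketch: no graceful recovery */

    /* Row 0: aligning an empty prefix of s against j characters of t. */
    for (size_t j = 0; j <= m; j++)
        prev[j] = (int)j * GAP;

    for (size_t i = 1; i <= n; i++) {
        curr[0] = (int)i * GAP;  /* i characters of s against nothing */
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1] + (s[i - 1] == t[j - 1] ? MATCH : MISMATCH);
            int up = prev[j] + GAP;        /* gap in t */
            int left = curr[j - 1] + GAP;  /* gap in s */
            curr[j] = max3(diag, up, left);
        }
        int *tmp = prev;  /* the finished row becomes the previous row */
        prev = curr;
        curr = tmp;
    }

    int result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

For the sequences AAACGA and AATGA this returns 3, the score of the first alignment in the earlier example, which is therefore optimal under this scoring.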

51

Learned Skills.

• Using the Linux (openSUSE 10.3) operating system.

• Programming in the C language.

• Cluster computers and how to build one.

• MPICH2 for message passing between nodes.

• LaTeX.

• Teamwork and helping each other.

• Presentation skills.

52

Thank you for your time.

Hisham Adel