Needleman-Wunsch Algorithm

Harshita Bhawsar, M.Sc. Life Science, NIT Rourkela

Upload: harshita-bhawsar

Post on 13-Apr-2017

Category:

Science


TRANSCRIPT

Page 1: Needleman-Wunsch algorithm, Harshita

NEEDLEMAN-WUNSCH ALGORITHM

HARSHITA BHAWSAR, M.SC. LIFE SCIENCE

NIT ROURKELA

Page 2

What is the Needleman-Wunsch algorithm?

The Needleman-Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences.

It performs a global alignment of two sequences. The algorithm was developed by Saul B. Needleman and Christian D. Wunsch and published in 1970. It is an example of dynamic programming, and it was one of the first applications of dynamic programming to the comparison of biological sequences.

Page 3

Even for relatively short sequences there are a great many possible alignments, and assessing each one individually to find the best would take a long time.

The Needleman-Wunsch algorithm saves us the trouble of assessing all of the possible alignments to find the best one.

The N-W algorithm takes time proportional to n^2 to find the best alignment of two sequences that are both n letters long.
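To make the contrast concrete, here is a minimal sketch that counts the gapped alignments of two sequences and compares that number with the number of matrix boxes the algorithm fills. The function name `count_alignments` is hypothetical, and the count assumes that two alignments differing only in the order of adjacent gaps are treated as distinct.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Count gapped alignments of two sequences with n and m letters.

    Assumption: alignments that differ only in the order of adjacent
    gaps are counted separately.
    """
    if n == 0 or m == 0:
        # One sequence is used up: the rest of the other faces only gaps.
        return 1
    # The last column is letter/letter, letter/gap, or gap/letter.
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


for n in (4, 8, 12):
    # number of alignments versus number of boxes in the (n+1) x (n+1) matrix
    print(n, count_alignments(n, n), (n + 1) ** 2)
```

For two four-letter sequences this gives 321 possible alignments, while the matrix has only 25 boxes.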

Page 4

Alignment methods

Alignment: arranging DNA/RNA or protein sequences to identify similarities.

Two types: global and local sequence alignment methods.

Global: Needleman-Wunsch algorithm

Local: Smith-Waterman algorithm

These two dynamic programming alignment algorithms are guaranteed to give OPTIMAL alignments under the chosen scoring scheme.

Page 5

Goals of sequence alignment

Measure the similarity

Observe patterns of sequence conservation between related biological species and variability of sequences over time.

Infer evolutionary relationships.

Page 6

Algorithm

Page 7

Steps

1. Initialization

2. Matrix fill or scoring

3. Traceback and alignment

Page 8

RULES

Put the gap (-) in the first position of each sequence.

Fill the first column and the last row with gap values (the matrix is drawn with the gap row at the bottom).

Every other box takes the largest of three candidates:

Value of box beside + gap value

Value of box bottom + gap value

Diagonal value + {match/mismatch}
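The rule for a single box can be sketched as a small function. The name `cell_value` and its parameter names are hypothetical. The scoring values (match = +1, mismatch = -1, gap = -2) are the ones used in the worked example later in this deck.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # scoring values from the worked example


def cell_value(diagonal: int, beside: int, bottom: int, a: str, b: str) -> int:
    """Return the value of one box from its three already-filled neighbours."""
    from_diagonal = diagonal + (MATCH if a == b else MISMATCH)
    from_beside = beside + GAP   # step from the box beside: a gap is inserted
    from_bottom = bottom + GAP   # step from the box below: a gap is inserted
    return max(from_diagonal, from_beside, from_bottom)


# First box of the example: G against G, neighbours 0 (diagonal), -2 and -2.
print(cell_value(diagonal=0, beside=-2, bottom=-2, a="G", b="G"))  # prints 1
```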

Page 9

Let's see an example. Two sequences will be aligned:

GATC (sequence 1)

GAGC (sequence 2)

Page 10

Initialization

Create a matrix with M + 1 columns and N + 1 rows, where M = length of sequence 1 and N = length of sequence 2.

C
G
A
G
-   0
    -   G   A   T   C

Page 11

Matrix fill

Match = +1; mismatch = -1; gap = -2

Fill the first column and the last row with gap values. Each value is obtained by adding the gap value to the value in the box beside it.

C
G
A
G
-   0  -2  -4
    -   G   A   T   C

Page 12

Match = +1; mismatch = -1; gap = -2

C  -8
G  -6
A  -4
G  -2
-   0  -2  -4  -6  -8
    -   G   A   T   C
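The initialization above can be sketched in a few lines. The function name `init_matrix` is hypothetical. Note one assumption about layout: the code stores the gap row as row 0 (the top of the list), whereas the slide draws it at the bottom.

```python
GAP = -2  # gap value from the example


def init_matrix(seq1: str, seq2: str, gap: int = GAP) -> list:
    """Build an (N + 1) x (M + 1) matrix with the gap row and column filled."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    matrix = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        # gap row: previous box plus the gap value
        matrix[0][j] = matrix[0][j - 1] + gap
    for i in range(1, rows):
        # gap column: previous box plus the gap value
        matrix[i][0] = matrix[i - 1][0] + gap
    return matrix


matrix = init_matrix("GATC", "GAGC")
print(matrix[0])                     # gap row
print([row[0] for row in matrix])    # gap column
```

Both printed lists are 0, -2, -4, -6, -8, matching the filled row and column on the slide.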

Page 13

Scoring parameters: match = +1; mismatch = -1; gap = -2

Value of box beside + gap value

Value of box bottom + gap value

Diagonal value + {match/mismatch}

For the first empty box (G against G): beside = -2 + (-2) = -4; bottom = -2 + (-2) = -4; diagonal = 0 + 1 = 1. The largest value, 1, goes in the box.

C  -8
G  -6
A  -4
G  -2   1
-   0  -2  -4  -6  -8
    -   G   A   T   C

Page 14

Scoring: match = +1; mismatch = -1; gap = -2

C  -8
G  -6
A  -4
G  -2   1  -1  -3  -5
-   0  -2  -4  -6  -8
    -   G   A   T   C

Page 15

Continuing the procedure...

Match = +1; mismatch = -1; gap = -2

C  -8  -5  -2  -1   2
G  -6  -3   0   1  -1
A  -4  -1   2   0  -2
G  -2   1  -1  -3  -5
-   0  -2  -4  -6  -8
    -   G   A   T   C
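The complete matrix fill can be sketched as follows. The function name `nw_matrix` is hypothetical, and the code keeps the gap row as row 0, so the rows are printed in reverse to match the slide's bottom-up drawing.

```python
def nw_matrix(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Fill the scoring matrix: rows follow seq2, columns follow seq1."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # diagonal
                              score[i - 1][j] + gap,       # box below on the slide
                              score[i][j - 1] + gap)       # box beside
    return score


# Print from the last row (C) down to the gap row, as the slide draws it.
for label, row in reversed(list(zip("-GAGC", nw_matrix("GATC", "GAGC")))):
    print(label, row)
```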

Page 16

Traceback step

After scoring is done, the maximum global alignment score is in the final box (the top-right corner here). It may be negative or positive. The traceback step determines the actual alignment(s) that result in the maximum score. In this step we work back towards zero. Since we have kept pointers to all the predecessors, the traceback step is simple.

C  -8  -5  -2  -1   2
G  -6  -3   0   1  -1
A  -4  -1   2   0  -2
G  -2   1  -1  -3  -5
-   0  -2  -4  -6  -8
    -   G   A   T   C

Page 17

We follow the pointers from the final box back to zero. Every step here is diagonal:

2 (C with C) -> 1 (T with G) -> 2 (A with A) -> 1 (G with G) -> 0

Page 18

The optimal alignment is:

GAGC
GATC
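The traceback can be sketched as below. This is one possible implementation, not necessarily the one behind the slides: instead of storing pointers, it re-derives each pointer by checking which neighbour produced the current value, and it returns only one optimal alignment (diagonal steps are tried first). The function name `needleman_wunsch` and the helper `pair` are hypothetical.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return (aligned seq1, aligned seq2, score) for one optimal alignment."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap

    def pair(i, j):
        # match/mismatch value for the letters that meet in box (i, j)
        return match if seq2[i - 1] == seq1[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: start in the final box and walk back to the zero box.
    i, j = rows - 1, cols - 1
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(seq1[j - 1])
            bottom.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append("-")              # gap in sequence 1
            bottom.append(seq2[i - 1])
            i -= 1
        else:
            top.append(seq1[j - 1])
            bottom.append("-")           # gap in sequence 2
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


print(needleman_wunsch("GATC", "GAGC"))  # ('GATC', 'GAGC', 2)
```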

Page 19

Another example: AGC and AACC

For the alignment we need to look at the pointers: a diagonal pointer pairs two letters of the sequences, and a horizontal or vertical pointer inserts a gap.

We get 4 optimal alignments, each scoring -1:

A-GC    AG-C    -AGC    AGC-
AACC    AACC    AACC    AACC

    -   A   G   C
-   0  -2  -4  -6
A  -2   1  -1  -3
A  -4  -1   0  -2
C  -6  -3  -2   1
C  -8  -5  -4  -1
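When several pointers leave the same box, each one leads to a different alignment with the same score. The sketch below follows every tied pointer, using the same scoring values (match = +1, mismatch = -1, gap = -2). The function name `all_optimal_alignments` is hypothetical. For AGC and AACC it reports four alignments, each scoring -1: -AGC, A-GC, AG-C and AGC- against AACC.

```python
def all_optimal_alignments(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return (best score, sorted list of every optimal alignment)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    results = []

    def walk(i, j, top, bottom):
        """Follow every pointer that explains the value in box (i, j)."""
        if i == 0 and j == 0:
            results.append((top, bottom))
            return
        if i > 0 and j > 0:
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                walk(i - 1, j - 1, seq1[j - 1] + top, seq2[i - 1] + bottom)
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            walk(i - 1, j, "-" + top, seq2[i - 1] + bottom)
        if j > 0 and score[i][j] == score[i][j - 1] + gap:
            walk(i, j - 1, seq1[j - 1] + top, "-" + bottom)

    walk(rows - 1, cols - 1, "", "")
    return score[-1][-1], sorted(results)


best, alignments = all_optimal_alignments("AGC", "AACC")
print(best)
for top, bottom in alignments:
    print(top, bottom)
```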

Page 20

Checking

We can also check whether our alignment is right by doing the scoring manually.

E.g.:

GAGC                    A-GC
GATC                    AACC
+1 +1 -1 +1 = 2         +1 -2 -1 +1 = -1

This score must equal the maximum score in the final box of the matrix, where the traceback starts. If it does, the alignment is optimal.
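The manual check can also be written as a short function that scores an alignment column by column. The name `alignment_score` is hypothetical, and the scoring values are the ones used throughout the example.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two aligned strings of equal length, column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # a letter facing a gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("GAGC", "GATC"))  # 2
print(alignment_score("A-GC", "AACC"))  # -1
```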

Page 21