Presented by Liu Qi
Pairwise Sequence Alignment
Why align sequences?
Functional predictions are based on identifying homologues.
This assumes that conservation of sequence implies conservation of function.
BUT function is carried out at the level of proteins (i.e. 3-D structure), while sequence conservation is carried out at the level of DNA (i.e. 1-D sequence).
Some Definitions
An alignment is a mutual arrangement of two sequences which exhibits where the two sequences are similar and where they differ.
An optimal alignment is one that exhibits the most correspondences and the fewest differences. It is the alignment with the highest score, and it may or may not be biologically meaningful.
Methods
• Dot matrix
• Dynamic programming
• Word or k-tuple (heuristic based)
Brief intro of methods
• Dot matrix: all possible matches between sequence residues are found. It is used to compare two sequences to look for regions where they may align, is very useful for finding indels and repeats in sequences, and can be used as a first pass to see if there is any similarity between sequences.
• Dynamic programming: mathematically guaranteed to find the optimal alignment (global or local) between pairs of sequences. It is computationally expensive: the number of steps grows with the product of the two sequence lengths (O(nm) for lengths n and m), even though the number of possible alignments grows exponentially.
• k-tuple (word) methods: used by FASTA and BLAST (previously described). They are much faster than dynamic programming and ideal for database searches; they use heuristics that do not guarantee the optimal alignment but are nevertheless very reliable.
Dot matrix
1 - One sequence is listed along the top of the page and the second sequence along the side.
2 - Move across each row and put a dot in any column where the characters are the same.
3 - Continue for each row until all possible character matches between the sequences are represented by dots.
4 - Diagonal rows of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal).
5 - Isolated dots represent random similarity unrelated to the alignment.
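Steps 1 to 3 above can be sketched in a few lines of Python. This is a minimal illustration, not one of the dot-plot programs listed later; the function names `dot_matrix` and `render` and the sample sequences are hypothetical.

```python
def dot_matrix(top: str, side: str) -> list[list[bool]]:
    """Steps 1-3: grid[i][j] is True (a dot) when side[i] equals top[j]."""
    return [[b == a for a in top] for b in side]


def render(top: str, side: str) -> str:
    """Text plot with `top` across the page and `side` down the left edge."""
    grid = dot_matrix(top, side)
    lines = ["  " + " ".join(top)]
    for ch, row in zip(side, grid):
        lines.append(ch + " " + " ".join("*" if dot else "." for dot in row))
    return "\n".join(lines)


# Hypothetical sequences; diagonal runs of '*' mark similar regions (step 4),
# scattered single '*' are the random matches of step 5.
print(render("GATTACA", "GATACA"))
```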
Dot matrix with noise reduction
Dot matrix
To improve the visualisation of identical regions among sequences, we use sliding windows. Instead of writing down a dot for every character that is common to both sequences,
we compare a number of positions (the window size) and write down a dot whenever there is a minimum number (the stringency) of identical characters.
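A minimal sketch of the window/stringency filter described above. The function name is hypothetical, and placing the dot at the first position of each window pair is an assumption (some programs place it at the window centre instead).

```python
def windowed_dot_matrix(top: str, side: str, window: int, stringency: int) -> list[list[bool]]:
    """Dot at (i, j) when side[i:i+window] and top[j:j+window] share at least
    `stringency` identical characters (position by position)."""
    rows = len(side) - window + 1
    cols = len(top) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            identical = sum(side[i + k] == top[j + k] for k in range(window))
            row.append(identical >= stringency)
        grid.append(row)
    return grid


# Toy example with window=4; real DNA settings would be larger (e.g. 15/10).
for row in windowed_dot_matrix("ACGTACGT", "ACGTACGT", window=4, stringency=4):
    print("".join("*" if dot else "." for dot in row))
```

Lowering `stringency` below `window` tolerates mismatches inside a window, which keeps diagonals of related (not identical) regions visible while still suppressing isolated random dots.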
Dot matrix
Caution is necessary regarding the window size and the stringency value. Generally they take different values for different problems; the optimal values will accentuate the regions of similarity of the two sequences.
• For DNA sequences, usually: sliding window = 15, stringency = 10
• For protein sequences: sliding window = 2 or 3, stringency = 2
Things to be considered
• Scoring matrix (for distance correction)
• Window size
• Threshold
The usefulness of dot plots
• Regions of similarity: diagonals
• Insertions/deletions: gaps
• Can determine intron/exon structure
• Repeats: parallel diagonals
• Inverted repeats: perpendicular diagonals
• Inverted repeats can be used to determine regions of base pairing of RNA molecules
Intra-sequence comparison
• Repeats
• Inverted repeats
• Low complexity
Examples
Sequence: ABRACADABRACAD
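In a dot plot of a sequence against itself the main diagonal is always full; a repeat appears as a second diagonal parallel to it, shifted by the repeat period. The hypothetical helper below measures the longest unbroken run of dots on each off-main diagonal of the ABRACADABRACAD self-comparison.

```python
def longest_runs_by_offset(seq: str) -> dict[int, int]:
    """For each diagonal offset k > 0 (the main diagonal k = 0 is skipped),
    the length of the longest run of dots with seq[i] == seq[i + k]."""
    runs = {}
    for k in range(1, len(seq)):
        best = current = 0
        for i in range(len(seq) - k):
            current = current + 1 if seq[i] == seq[i + k] else 0
            best = max(best, current)
        runs[k] = best
    return runs


runs = longest_runs_by_offset("ABRACADABRACAD")
offset = max(runs, key=runs.get)
print(offset, runs[offset])  # 7 7: the 7-letter unit ABRACAD is repeated
```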
Palindrome
Sequence: ATOYOTA
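A palindrome compared against itself produces an unbroken diagonal perpendicular to the main one (the anti-diagonal), which is the same pattern an inverted repeat gives. A minimal check, with a hypothetical function name:

```python
def antidiagonal(seq: str) -> list[bool]:
    """Dots on the anti-diagonal cells (i, n-1-i) of a self-comparison dot plot."""
    n = len(seq)
    return [seq[i] == seq[n - 1 - i] for i in range(n)]


print(all(antidiagonal("ATOYOTA")))  # True: the anti-diagonal is unbroken
print(all(antidiagonal("ABRACAD")))  # False
```

Equivalently, plotting the sequence against its own reverse turns the anti-diagonal into an ordinary main diagonal.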
Repeats
Drosophila melanogaster SLIT protein against itself
Low complexity
Inter-sequence comparison
• Conserved domains
• Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
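In the dot plot of these two sequences an indel shows up as two diagonal segments lying on different diagonals; the shift between the diagonals equals the length of the insertion. The sketch below (hypothetical helper, minimum run length chosen arbitrarily as 5) finds those segments.

```python
def diagonal_segments(top: str, side: str, min_len: int) -> list[tuple[int, int, int]]:
    """Unbroken diagonal runs of dots: (start in top, start in side, length)."""
    segments = []
    for offset in range(-(len(side) - 1), len(top)):   # offset = top index - side index
        i, run = max(0, -offset), 0
        while True:
            j = i + offset
            inside = i < len(side) and j < len(top)
            if inside and side[i] == top[j]:
                run += 1
            else:
                if run >= min_len:                      # close the run that just ended
                    segments.append((j - run, i - run, run))
                run = 0
                if not inside:
                    break
            i += 1
    return sorted(segments)


seq1 = "DOROTHYCROWFOOTHODGKIN"
seq2 = "DOROTHYHODGKIN"
segs = diagonal_segments(seq1, seq2, min_len=5)
print(segs)                   # [(0, 0, 7), (15, 7, 7)]
(t1, _, n1), (t2, _, _) = segs
print(seq1[t1 + n1:t2])       # CROWFOOT: the stretch missing from seq2
```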
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
• Comparing the reverse complement of an RNA sequence with the sequence itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from Baker's yeast.
• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
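A window match between an RNA and its own reverse complement marks two stretches that can base-pair, i.e. a candidate stem. The sketch below uses a short hypothetical stem-loop rather than the tRNA-Phe sequence, and it assumes Watson-Crick pairs only (G-U wobble pairs are ignored); all names are illustrative.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}   # Watson-Crick only; no G-U wobble


def reverse_complement(rna: str) -> str:
    return "".join(PAIR[base] for base in reversed(rna))


def stem_candidates(rna: str, window: int) -> list[tuple[int, int]]:
    """(i, k) such that rna[i:i+window] can pair with rna[k:k+window]: a dot-plot
    window match between the sequence and its reverse complement."""
    rc = reverse_complement(rna)
    n = len(rna)
    hits = []
    for i in range(n - window + 1):
        for j in range(n - window + 1):
            if rna[i:i + window] == rc[j:j + window]:
                k = n - j - window        # rc[j:j+window] is the reverse complement of rna[k:k+window]
                if i + window <= k:       # report each stem once, 5' arm first, arms not overlapping
                    hits.append((i, k))
    return hits


hairpin = "GGACUUCGGUCC"                  # hypothetical stem-loop, not tRNA-Phe
print(reverse_complement(hairpin))        # GGACCGAAGUCC
print(stem_candidates(hairpin, window=4)) # [(0, 8)]: GGAC pairs with GUCC
```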
Programs for Dot Matrix
• Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
• SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
• Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
• COMPARE, DOTPLOT in GCG
Conclusion
Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by the other, more automated methods. It lets your eyes and brain do the work, which is very efficient.
Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how "optimal" a given alignment is.
Dynamic Programming
Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
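One way to answer "how many different alignments?" is to count paths through the DP grid: every global alignment is one path built from diagonal (residue against residue), vertical and horizontal (residue against gap) steps. The sketch below assumes that counting convention, under which alignments differing only in the order of adjacent gap columns count as distinct; the function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Number of distinct paths from (0, 0) to (n, m) in the DP grid using
    diagonal, vertical and horizontal steps, i.e. distinct global alignments."""
    if n == 0 or m == 0:
        return 1                                # the remaining residues all face gaps
    return (count_alignments(n - 1, m - 1)      # last column: x_n aligned with y_m
            + count_alignments(n - 1, m)        # last column: x_n against a gap
            + count_alignments(n, m - 1))       # last column: y_m against a gap


for n in (1, 2, 3, 10, 100):
    print(n, count_alignments(n, n))
```

For n = 1, 2 and 3 this gives 3, 13 and 63, and for n = 100 the count already has dozens of digits, which is why enumerating alignments is hopeless while dynamic programming only fills (n+1)(m+1) cells.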
Alignment methods with DP
• Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.
• Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
[Figure: a small weighted graph with nodes A to F and numbered edge weights, used to illustrate dynamic programming]
Exercise
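When dynamic programming applies
• Every sub-policy of an optimal policy is itself optimal (optimal substructure).
• No after-effect: the states of earlier stages cannot directly influence future decisions.
• Trading space for time (overlapping subproblems).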
Dynamic Programming
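DP Algorithm for Global Alignment
Two sequences $X = x_1 \ldots x_n$ and $Y = y_1 \ldots y_m$.
Let $F(i,j)$ be the optimal alignment score of $x_1 \ldots x_i$ and $y_1 \ldots y_j$ ($0 \le i \le n$, $0 \le j \le m$), with $F(0,0) = 0$, $F(i,0) = i\,d$ and $F(0,j) = j\,d$. Then
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$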
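The recurrence translates almost line for line into Python. This is a minimal sketch: the function name `global_dp_matrix` and the dictionary `S` are hypothetical, and the usage rebuilds the DP matrix of the AAG/AGC example worked through on the following slides (nucleotide substitution matrix, d = -5).

```python
def global_dp_matrix(x: str, y: str, s: dict[tuple[str, str], int], d: int) -> list[list[int]]:
    """F[i][j] = optimal global score of x[:i] versus y[:j] with linear gap penalty d."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d                    # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * d                    # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[(x[i - 1], y[j - 1])],   # x_i with y_j
                          F[i - 1][j] + d,                            # x_i with a gap
                          F[i][j - 1] + d)                            # y_j with a gap
    return F


BASES = "ACGT"
TABLE = [[2, -7, -5, -7],
         [-7, 2, -7, -5],
         [-5, -7, 2, -7],
         [-7, -5, -7, 2]]
S = {(a, b): TABLE[r][c] for r, a in enumerate(BASES) for c, b in enumerate(BASES)}

# Rows follow AGC (down the side), columns follow AAG (along the top).
for row in global_dp_matrix("AGC", "AAG", S, d=-5):
    print(row)
```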
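DP in equation form
[Figure: cell F(i,j) is computed from its diagonal neighbour F(i-1,j-1) + s(x_i,y_j), its upper neighbour F(i-1,j) + d and its left neighbour F(i,j-1) + d]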
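A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix s below.

Substitution matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

The DP matrix F has AAG along the top and AGC down the side; each cell is filled with the recurrence F(i,j) = max{F(i-1,j-1) + s(x_i,y_j), F(i-1,j) + d, F(i,j-1) + d}.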
A simple example (continued)
Initialise the top-left cell: F(0,0) = 0.
A simple example (continued)
Fill the first row and column with multiples of the gap penalty, F(0,j) = j·d and F(i,0) = i·d:
         -     A     A     G
   -     0    -5   -10   -15
   A    -5
   G   -10
   C   -15
A simple example (continued)
Fill the remaining cells row by row:
         -     A     A     G
   -     0    -5   -10   -15
   A    -5     2    -3    -8
   G   -10    -3    -3    -1
   C   -15    -8    -8    -6
The optimal global score is the bottom-right value, F(3,3) = -6.
Traceback
• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.
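A simple example (traceback)
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Using F(i,j) with row i over AGC and column j over AAG: starting from F(3,3) = -6, the traceback visits F(2,3) = -1 and F(1,2) = -3. From F(1,2) it can continue either diagonally to F(0,1) = -5 or horizontally to F(1,1) = 2, and both routes end at F(0,0) = 0.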
A simple example (result)
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
The two traceback routes give two optimal global alignments, both with score -6:
AAG-    AAG-
-AGC    A-GC
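A minimal sketch of the traceback rules, enumerating every optimal path rather than just one (the function name is hypothetical). Following every tied move can produce exponentially many alignments for long sequences, so this is meant only for small examples like this one.

```python
BASES = "ACGT"
TABLE = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {(a, b): TABLE[r][c] for r, a in enumerate(BASES) for c, b in enumerate(BASES)}


def all_optimal_global(top: str, side: str, s: dict, d: int):
    """Return (score, alignments); each alignment is (aligned top, aligned side)."""
    n, m = len(side), len(top)                 # rows follow `side`, columns follow `top`
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[(side[i - 1], top[j - 1])],
                          F[i - 1][j] + d, F[i][j - 1] + d)

    def trace(i: int, j: int):
        # Walk back toward (0, 0), following every move that reproduces F[i][j].
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[(side[i - 1], top[j - 1])]:
            for t, u in trace(i - 1, j - 1):       # diagonal: one character from each
                yield t + top[j - 1], u + side[i - 1]
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            for t, u in trace(i - 1, j):           # vertical: gap in the top sequence
                yield t + "-", u + side[i - 1]
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            for t, u in trace(i, j - 1):           # horizontal: gap in the left sequence
                yield t + top[j - 1], u + "-"

    return F[n][m], sorted(trace(n, m))


score, alignments = all_optimal_global("AAG", "AGC", S, d=-5)
print(score)                                   # -6
for t, u in alignments:
    print(t, u, sep="\n", end="\n\n")
```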
Exercise
Find the global alignment of X = catgt and Y = acgctg. Scores: gap d = -1, mismatch = -1, match = 2.
Answer
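A minimal sketch for checking an answer to the exercise: Needleman-Wunsch with the exercise's scores and a single traceback. The function name is hypothetical, and the tie-breaking order (diagonal, then vertical, then horizontal) is an arbitrary choice, so the printed alignment is one of several with the same optimal score.

```python
def global_align(x: str, y: str, match: int = 2, mismatch: int = -1, d: int = -1):
    """Needleman-Wunsch with a linear gap penalty; returns (score, aligned x, aligned y)."""
    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = i * d
    for j in range(m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)

    # One traceback from the lower right corner; ties prefer diagonal, then vertical.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = global_align("catgt", "acgctg")
print(score)   # 2
print(ax)
print(ay)
```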
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix and d is the linear gap penalty. With $F(i,0) = F(0,j) = 0$,
$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Local DP in equation form
[Figure: cell F(i,j) is computed from F(i-1,j-1) + s(x_i,y_j), F(i-1,j) + d, F(i,j-1) + d, and 0]
Local alignment
Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
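A minimal Smith-Waterman sketch showing both differences: scores are floored at 0, and traceback starts at the highest cell and stops at the first 0. The function name is hypothetical; when several cells share the highest score, this version keeps the first one found in row-major order. The usage runs the two local examples that follow.

```python
BASES = "ACGT"
TABLE = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {(a, b): TABLE[r][c] for r, a in enumerate(BASES) for c, b in enumerate(BASES)}


def smith_waterman(top: str, side: str, s: dict, d: int):
    """Return (best score, aligned top, aligned side, F); rows follow `side`, columns `top`."""
    n, m = len(side), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]      # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,                       # start a new local alignment here
                          F[i - 1][j - 1] + s[(side[i - 1], top[j - 1])],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_i, best_j = F[i][j], i, j
    # Traceback: begin at the highest score, stop when a cell with score 0 is reached.
    i, j = best_i, best_j
    at, aside = [], []
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[(side[i - 1], top[j - 1])]:
            at.append(top[j - 1])
            aside.append(side[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            at.append("-")
            aside.append(side[i - 1])
            i -= 1
        else:
            at.append(top[j - 1])
            aside.append("-")
            j -= 1
    return best, "".join(reversed(at)), "".join(reversed(aside)), F


print(smith_waterman("AAG", "AGC", S, d=-5)[:3])     # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC", S, d=-5)[:3])  # (6, 'AAG', 'AAG')
```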
A simple example
Find the optimal local alignment of AAG and AGC, using the same substitution matrix s as in the global example and a gap penalty of d = -5.
A simple example (continued)
For local alignment the first row and column are initialised to 0:
         -     A     A     G
   -     0     0     0     0
   A     0
   G     0
   C     0
A simple example (continued)
Fill the remaining cells, replacing any negative value by 0:
         -     A     A     G
   -     0     0     0     0
   A     0     2     2     0
   G     0     0     0     4
   C     0     0     0     0
A simple example (result)
Traceback starts at the highest score in the matrix, F(2,3) = 4, and follows diagonal moves until it reaches 0. The optimal local alignment (score 4) is
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC, using the same substitution matrix and a gap penalty of d = -5. AAG runs along the top and GAAGGC down the side; the first row and column are 0.
Local alignment (continued)
         -     A     A     G
   -     0     0     0     0
   G     0     0     0     2
   A     0     2     2     0
   A     0     2     4     0
   G     0     0     0     6
   G     0     0     0     2
   C     0     0     0     0
The highest score is 6; tracing back diagonally until 0 is reached gives the local alignment
AAG
AAG
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight. For example:
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
End-Space Free Alignment
Base conditions: $F(i,0) = 0$ and $F(0,j) = 0$ for all $i, j$.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Search for the $i$ that maximizes $F(i,m)$ over $1 \le i \le n$, and for the $j$ that maximizes $F(n,j)$ over $1 \le j \le m$. The alignment score is
$$\max\Big\{\max_{1 \le i \le n} F(i,m),\; \max_{1 \le j \le m} F(n,j)\Big\}$$
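A minimal sketch of the end-space free recurrence (hypothetical function name; match = 1, mismatch = -1, gap = -1 as in the exercise that follows). Leading gaps are free because the first row and column are 0; trailing gaps are free because the score is read from the best cell of the last row or column rather than from F(n,m).

```python
def end_space_free_score(x: str, y: str, match: int = 1, mismatch: int = -1, d: int = -1) -> int:
    """Best score when gaps at the beginning and end of the alignment cost nothing."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]          # base conditions: F(i,0) = F(0,j) = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + d, F[i][j - 1] + d)
    best_last_column = max(F[i][m] for i in range(1, n + 1))   # x may overhang the end of y
    best_last_row = max(F[n][j] for j in range(1, m + 1))      # y may overhang the end of x
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))   # 4
```

With these scores the example alignment above is worth 4: five matched columns (c, a, c, t, g) and one internal gap, while the two leading and three trailing gaps are free.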
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y= g a c a c t t g
Questions for thought
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
Affine gap penalty
LETVGYW----L
Gap penalties for the four gap positions: -5 -1 -1 -1
Separate penalties for gap opening and gap extension. This requires modifying the DP algorithm.
Affine gap penalty
• A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters;
  - separated gaps are probably due to distinct mutational events.
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap;
  - a smaller penalty g for extending the gap.
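A minimal sketch comparing the two gap models, using h = -5 and g = -1 from the LETVGYW----L example. It assumes the convention of that example, where the opening penalty h covers the first gap position and g is charged for each further position; the other common convention, h + g·k, simply shifts h by one g. Function names are hypothetical.

```python
def linear_gap_cost(k: int, d: int = -5) -> int:
    """Linear model: every gap position costs d."""
    return k * d


def affine_gap_cost(k: int, h: int = -5, g: int = -1) -> int:
    """Affine model: the first gap position costs h (opening), each further one g (extension)."""
    return 0 if k == 0 else h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1.
print(affine_gap_cost(4), 4 * affine_gap_cost(1))   # -8 -20
print(linear_gap_cost(4), 4 * linear_gap_cost(1))   # -20 -20
```

Only the affine model prefers the single long gap, which matches the argument that one mutational event is more probable than four independent ones.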
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
This needs 3 matrices instead of 1.
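A minimal sketch of one common three-matrix (Gotoh-style) formulation for global alignment with affine gaps, scoring a gap of length k as h + (k - 1)·g. It may differ in detail from the formulation on the slides; for example, here a gap in one sequence may directly follow a gap in the other (each such switch pays a new opening penalty). The function names and the test sequences are hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(x: str, y: str, s, h: int, g: int):
    """Optimal global score with gap cost h + (k - 1) * g for a gap of length k."""
    n, m = len(x), len(y)
    # M: x_i aligned with y_j;  X: x_i aligned with a gap;  Y: y_j aligned with a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = h + (i - 1) * g          # one leading gap in y
    for j in range(1, m + 1):
        Y[0][j] = h + (j - 1) * g          # one leading gap in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = s(x[i - 1], y[j - 1]) + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + h, X[i - 1][j] + g, Y[i - 1][j] + h)   # open or extend
            Y[i][j] = max(M[i][j - 1] + h, Y[i][j - 1] + g, X[i][j - 1] + h)   # open or extend
    return max(M[n][m], X[n][m], Y[n][m])


def match_mismatch(a: str, b: str) -> int:
    return 1 if a == b else -1


# Best: AACCGGTT / AA----TT, i.e. 4 matches (+4) and one gap of length 4 (-5 - 3).
print(affine_global_score("AACCGGTT", "AATT", match_mismatch, h=-5, g=-1))   # -4
```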
Example scores: match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word (k-tuple) methods
• FASTA
• BLAST
Presented By Liu QiPresented By Liu Qi
Why align sequences
Functional predictions based on identifying homologues
Assumesconservation of sequence conservation of
function BUT Function carried out at level of proteins ie3-D structure Sequence conservation carried out at level of DNA1-D sequence
Presented By Liu QiPresented By Liu Qi
Some DefinitionsSome Definitions
An An alignment alignment is a mutual arrangement of is a mutual arrangement of two sequences which exhibits where the two sequences which exhibits where the two sequences are similar and where they two sequences are similar and where they differdiffer
An An optimal alignment optimal alignment is one that exhibits is one that exhibits the most correspondences and the least the most correspondences and the least differences It is the alignment with the differences It is the alignment with the highest score May or may not be highest score May or may not be biologically meaningfulbiologically meaningful
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
MethodsMethods
Dot matrix Dynamic Programming Word k-tuple (heuristic based)
Presented By Liu QiPresented By Liu Qi
Brief intro of methodsBrief intro of methods
dot matrix - all possible matches between sequence residues are foundused to compare two sequences to look for regions where they may align very useful for finding indels and repeats in sequences can be used as afirst pass to see if there is any similarity between sequences
bull dynamic programming - mathematically guaranteed to find optimal alignment (global or local) between pairs of sequences very computationallyexpensive - of steps increases exponentially with sequence length
bull k-tuple (word) methods - used by FASTA and BLAST (previously described) much faster than dynamic programming and ideal for databasesearches uses heuristics that do not guarantee optimal alignment but are nevertheless very reliable
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
1 - one sequence listed along top of page and second sequence listed along the side
2 - move across row and put dot in any column where the character is the same
3 - continue for each row until all possible character matches between thesequences are represented by dots
4 - diagonal rows of dots reveal sequencesimilarity (can also find repeats and invertedrepeats off the main diagonal)
5 - isolated dots represent random similarity unrelated to the alignment
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dot matrix with noise reductionDot matrix with noise reduction
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
To improve visualisation of identical regions among sequences we use sliding windows Instead of writing down a dot for every character that is common in both sequences
We compare a number of positions (window size) and we write down a dot whenever there is minimum number (stringency) of identical characters
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
Caution is necessary regarding the window size and the stringency value Generally they assume different values for different problems The optimal values will accent the regions of similarity of the two sequences
1048698 For DNA sequence usually1048698 Sliding window=15 stringency=10
1048698 For Protein sequence1048698 Sliding window=2 or 3 stringency=2
Presented By Liu QiPresented By Liu Qi
Things to be consideredThings to be considered
Scoring matrix for distance correction
Window size Threshold
Presented By Liu QiPresented By Liu Qi
The useful of Dot plot The useful of Dot plot
Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps
Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals
Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base
pairing of RNA moleculespairing of RNA molecules
Presented By Liu QiPresented By Liu Qi
Intra-sequence comparisonIntra-sequence comparison
RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity
Presented By Liu QiPresented By Liu Qi
ABRACADABRACADABRACADABRACAD
ExamplesExamples
Presented By Liu QiPresented By Liu Qi
palindromepalindromeSequence ATOYOTA
Presented By Liu QiPresented By Liu Qi
RepeatsRepeats
Drosophila melanogaster SLIT protein against itself
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Step 2: fill the first row and column with cumulative gap penalties:
        -    A    A    G
   -    0   -5  -10  -15
   A   -5
   G  -10
   C  -15
Step 3: fill the remaining cells row by row; the optimal global score is F(3,3) = -6:
        -    A    A    G
   -    0   -5  -10  -15
   A   -5    2   -3   -8
   G  -10   -3   -3   -1
   C  -15   -8   -8   -6
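The filling procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not code from the slides: the names s and global_matrix are hypothetical, sequences are assumed to contain only A, C, G and T, and the side sequence (rows) is passed first so that F[i][j] matches the table layout.

```python
from collections.abc import Callable

# Substitution matrix s from the example; rows and columns follow the order A, C, G, T.
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a: str, b: str) -> int:
    """Look up the substitution score s(a, b)."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def global_matrix(x: str, y: str, d: int = -5,
                  score: Callable[[str, str], int] = s) -> list[list[int]]:
    """Fill the global alignment matrix F; x runs down the side, y along the top."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d                      # x_1..x_i against leading gaps
    for j in range(1, m + 1):
        F[0][j] = j * d                      # y_1..y_j against leading gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # diagonal: pair x_i with y_j
                F[i - 1][j] + d,                              # vertical: gap in the top sequence
                F[i][j - 1] + d,                              # horizontal: gap in the side sequence
            )
    return F


if __name__ == "__main__":
    F = global_matrix("AGC", "AAG")          # AGC down the side, AAG along the top
    for row in F:
        print(" ".join(f"{v:4d}" for v in row))
    print("score:", F[-1][-1])
```

Run as a script, it prints the same matrix as the final table and a score of -6. Because the score function is a parameter, the catgt/acgctg exercise can reuse it with score=lambda a, b: 2 if a == b else -1 and d=-1.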
Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
Traceback in the example: starting from F(3,3) = -6, the path visits F(2,3) = -1 and then F(1,2) = -3. From F(1,2) there are two equally good moves, to F(1,1) = 2 or to F(0,1) = -5, and both paths end at F(0,0) = 0.
The two co-optimal global alignments of AAG and AGC, each scoring -6 (two matches at +2 and two gaps at -5), are:
AAG-    AAG-
-AGC    A-GC
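The traceback rules can be sketched as a recursive walk that follows every move consistent with the filled matrix, so the tie at F(1,2) yields both co-optimal alignments. The helper names fill and tracebacks are hypothetical; the substitution matrix and d = -5 are those of the example.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a: str, b: str) -> int:
    return SCORES[BASES.index(a)][BASES.index(b)]


def fill(x: str, y: str, d: int) -> list[list[int]]:
    """Global alignment matrix: x (side) indexes rows, y (top) indexes columns."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        F[i][0] = i * d
    for j in range(1, len(y) + 1):
        F[0][j] = j * d
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    return F


def tracebacks(x: str, y: str, d: int = -5) -> list[tuple[str, str]]:
    """All co-optimal global alignments as (top row, side row) pairs."""
    F = fill(x, y, d)
    results: list[tuple[str, str]] = []

    def walk(i: int, j: int, top: str, side: str) -> None:
        if i == 0 and j == 0:                     # reached the upper-left corner
            results.append((top, side))
            return
        # Diagonal move: one character from each sequence.
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            walk(i - 1, j - 1, y[j - 1] + top, x[i - 1] + side)
        # Vertical move: the side character is aligned to a gap in the top sequence.
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, "-" + top, x[i - 1] + side)
        # Horizontal move: the top character is aligned to a gap in the side sequence.
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, y[j - 1] + top, "-" + side)

    walk(len(x), len(y), "", "")
    return results
```

tracebacks("AGC", "AAG") returns the two alignments above as ("AAG-", "-AGC") and ("AAG-", "A-GC"), top row first. On longer sequences the number of co-optimal paths can grow quickly, so it is common to report just one.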
Exercise
Find the global alignment of X = catgt and Y = acgctg. Scores: d = -1, mismatch = -1, match = 2.
Answer
Local alignment
A single-domain protein may be homologous to a region within a multi-domain protein.
Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences $x$ and $y$; $F$ is the DP matrix, $s$ is the substitution matrix and $d$ is the linear gap penalty:
$$
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\qquad F(0,0) = 0,
$$
and the first row and column are also 0: $F(i,0) = F(0,j) = 0$.
Local DP in equation form
Each cell takes the largest of four values: 0, the diagonal neighbour plus s(x_i, y_j), the neighbour above plus d, and the neighbour to the left plus d.
Local alignment
Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
A simple example
Find the optimal local alignment of AAG and AGC, using the same substitution matrix as in the global example and a gap penalty of d = -5.
Step 1: the first row and column are set to 0:
        -    A    A    G
   -    0    0    0    0
   A    0
   G    0
   C    0
Step 2: fill the remaining cells; any negative value is replaced by 0:
        -    A    A    G
   -    0    0    0    0
   A    0    2    2    0
   G    0    0    0    4
   C    0    0    0    0
Traceback starts at the highest score, F(2,3) = 4, and continues until it reaches the 0 in F(0,1), giving the optimal local alignment (score 4):
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC, using the same substitution matrix and a gap penalty of d = -5. AAG is written along the top and GAAGGC down the side, and the first row and column are set to 0.
Local alignment
Filled matrix for AAG (top) and GAAGGC (side):
        -    A    A    G
   -    0    0    0    0
   G    0    0    0    2
   A    0    2    2    0
   A    0    2    4    0
   G    0    0    0    6
   G    0    0    0    2
   C    0    0    0    0
The highest score, F(4,3) = 6, traces back diagonally to the 0 in F(1,0), giving the local alignment AAG/AAG (score 6).
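Combining the two differences from global alignment (scores floored at 0, traceback from the maximum until a 0 is reached), a minimal Smith-Waterman sketch in Python might look like this. The function name and the (score, top row, side row) return convention are assumptions for illustration, and ties for the maximum are broken by keeping the first cell found in row order.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a: str, b: str) -> int:
    return SCORES[BASES.index(a)][BASES.index(b)]


def smith_waterman(x: str, y: str, d: int = -5) -> tuple[int, str, str]:
    """Best local alignment of x (side) and y (top) as (score, top row, side row)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,                    # no score is negative
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:                  # remember the highest cell
                best, bi, bj = F[i][j], i, j
    # Trace back from the highest cell until a 0 is reached.
    top, side = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            top.append(y[j - 1])
            side.append(x[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top.append("-")
            side.append(x[i - 1])
            i -= 1
        else:
            top.append(y[j - 1])
            side.append("-")
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(side))
```

smith_waterman("AGC", "AAG") gives (4, "AG", "AG") and smith_waterman("GAAGGC", "AAG") gives (6, "AAG", "AAG"), matching the two worked examples.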
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight, for example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all $i, j$, $F(i,0) = 0$ and $F(0,j) = 0$.
Recurrence relation:
$$
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
$$
Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.
Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.
Define the alignment score as $\max\{F(n, j^*), F(i^*, m)\}$.
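A minimal sketch of the end-space free variant in Python, assuming the simple match/mismatch/gap scores of the exercise as defaults (the function name end_space_free is hypothetical). Only the base conditions and the final maximisation differ from global alignment:

```python
def end_space_free(x: str, y: str, match: int = 1, mismatch: int = -1,
                   d: int = -1) -> int:
    """End-space free alignment score of x (rows) and y (columns)."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0                                  # everything is an end gap, which is free
    # Base conditions F(i,0) = F(0,j) = 0: leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free too: take the best cell in the last column or last row.
    best_in_last_column = max(F[i][m] for i in range(1, n + 1))
    best_in_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_in_last_column, best_in_last_row)
```

For example, end_space_free("cac", "gacac") returns 3: the leading "ga" of the second sequence is skipped for free and cac matches cac exactly.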
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions to think about
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
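One possible answer to the second question, sketched in Python (the function name lcs is ours): the LCS follows the same DP pattern as global alignment, and its length equals the global alignment score with match = 1 and mismatch = gap = 0, because every matched column of an alignment belongs to a common subsequence. A traceback from the last cell recovers one LCS.

```python
def lcs(a: str, b: str) -> str:
    """One longest common subsequence of a and b, found by dynamic programming."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1        # extend by a matching character
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from L[n][m] to recover one LCS.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

For example, lcs("AAG", "AGC") returns "AG".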
Affine gap penalty
LETVGYW----L (the four gap positions score -5, -1, -1, -1)
Separate penalties for gap opening and gap extension
This requires modifying the DP algorithm
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted or deleted a stretch of characters;
- separated gaps are probably due to distinct mutational events.
A linear gap penalty function treats these cases the same, so it is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap;
- a smaller penalty g for extending the gap.
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Three matrices are needed instead of one.
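One common way to realise the three matrices is Gotoh's formulation, sketched here in Python as an illustration rather than the slides' exact formulas: M holds alignments ending in an aligned pair, X those ending with x_i against a gap, and Y those ending with y_j against a gap. Following the LETVGYW----L example, a gap of length k is assumed to score gap_open + (k - 1) * gap_extend (e.g. -5, -1, -1, -1); the function name, parameter names and the match = 1, mismatch = -1 defaults are assumptions.

```python
NEG_INF = float("-inf")


def affine_global(x: str, y: str, match: int = 1, mismatch: int = -1,
                  gap_open: int = -5, gap_extend: int = -1) -> float:
    """Global alignment score with an affine gap penalty, using three matrices.

    M[i][j]: best score ending with x_i aligned to y_j
    X[i][j]: best score ending with x_i aligned to a gap
    Y[i][j]: best score ending with y_j aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0                                   # empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                sub = match if x[i - 1] == y[j - 1] else mismatch
                M[i][j] = sub + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:
                # Open a new gap (after a pair or a gap in the other sequence) or extend one.
                X[i][j] = max(M[i - 1][j] + gap_open,
                              Y[i - 1][j] + gap_open,
                              X[i - 1][j] + gap_extend)
            if j > 0:
                Y[i][j] = max(M[i][j - 1] + gap_open,
                              X[i][j - 1] + gap_open,
                              Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

affine_global("AAAA", "AA") returns -4.0 (two matches plus one gap of length 2), whereas a linear penalty of -5 per gap position would give -8. Setting gap_open equal to gap_extend recovers the linear-gap recurrence.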
match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
FASTA
BLAST
Dot matrix
1 - List one sequence along the top of the page and the second sequence along the side.
2 - Move across each row and put a dot in any column where the character is the same.
3 - Continue for each row until all possible character matches between the sequences are represented by dots.
4 - Diagonal rows of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal).
5 - Isolated dots represent random similarity unrelated to the alignment.
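Steps 1 to 5 can be sketched in a few lines of Python. The function name dot_matrix and the use of "*" for a dot and "." for an empty cell are choices made for this illustration.

```python
def dot_matrix(top: str, side: str) -> list[str]:
    """Text dot plot: one row per character of `side`, one column per character of `top`."""
    rows = []
    for b in side:
        # Put a dot wherever the row character equals the column character.
        rows.append("".join("*" if a == b else "." for a in top))
    return rows


if __name__ == "__main__":
    seq = "ABRACADABRACAD"
    print("  " + seq)
    for ch, row in zip(seq, dot_matrix(seq, seq)):
        print(ch + " " + row)
```

Plotting ABRACADABRACAD against itself shows the main diagonal plus a parallel off-diagonal run where ABRACAD repeats, along with scattered isolated dots from single-letter coincidences.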
Dot matrix with noise reduction
Dot matrix
To improve visualisation of identical regions among sequences, we use sliding windows: instead of writing down a dot for every character that is common to both sequences, we compare a number of positions (window size) and write down a dot whenever there is a minimum number (stringency) of identical characters.
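A sketch of the sliding-window filter in Python, assuming identities are counted position by position along each diagonal and the dot is drawn at the first position of the window (programs differ on this; some mark the window centre). The function name is hypothetical; window and stringency play the roles described above.

```python
def windowed_dot_matrix(top: str, side: str, window: int, stringency: int) -> list[str]:
    """Dot plot with noise reduction by a sliding window.

    A dot is placed at (row j, column i) when side[j:j+window] and top[i:i+window]
    have at least `stringency` identical characters at corresponding positions.
    """
    rows = []
    for j in range(len(side) - window + 1):
        row = []
        for i in range(len(top) - window + 1):
            identical = sum(1 for k in range(window) if top[i + k] == side[j + k])
            row.append("*" if identical >= stringency else ".")
        rows.append("".join(row))
    return rows
```

windowed_dot_matrix("ACGTT", "ACGTA", 3, 3) keeps only the two fully identical windows, giving ["*..", ".*.", "..."]; lowering the stringency to 2 also admits the partial match GTT/GTA.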
Dot matrix
Caution is necessary regarding the window size and the stringency value. Generally they take different values for different problems, and the optimal values will accentuate the regions of similarity between the two sequences.
For DNA sequences, usually: sliding window = 15, stringency = 10
For protein sequences: sliding window = 2 or 3, stringency = 2
Things to be considered
Scoring matrix for distance correction
Window size
Threshold
Uses of the dot plot
Regions of similarity: diagonals
Insertions/deletions: gaps
Can determine intron/exon structure
Repeats: parallel diagonals
Inverted repeats: perpendicular diagonals
Inverted repeats can be used to determine regions of base pairing of RNA molecules
Intra-sequence comparison
Repeats
Inverted repeats
Low complexity
Examples
ABRACADABRACAD
Palindrome
Sequence: ATOYOTA
Repeats
Drosophila melanogaster SLIT protein against itself
Low complexity
Inter-sequence comparison
Conserved domains
Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
RNA comparisons of the reverse complement of a sequence to itself can often be very informative.
Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
The sequence and structure of this molecule are also known; the illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
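The reverse-complement self-comparison can be sketched in Python. A dot at (i, p) means base i can pair with base n-1-p, so a run of dots along a diagonal is a candidate stem. The names reverse_complement and stems are hypothetical, only Watson-Crick pairs are considered (G-U wobble pairs are ignored), and no minimum loop length is enforced, so this is far cruder than a folding algorithm.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}   # Watson-Crick pairs only (assumption)


def reverse_complement(rna: str) -> str:
    return "".join(PAIR[base] for base in reversed(rna))


def stems(rna: str, min_len: int = 3) -> list[tuple[int, int, int]]:
    """Candidate stems as (i_start, j_end, length): rna[i_start+k] pairs with rna[j_end-k]."""
    rc = reverse_complement(rna)
    n = len(rna)
    # dots[i][p] is True when rna[i] equals rc[p], i.e. rna[i] can pair with rna[n-1-p].
    dots = [[rna[i] == rc[p] for p in range(n)] for i in range(n)]
    found = []
    for i in range(n):
        for p in range(n):
            # Only start counting at the first dot of a diagonal run.
            if not dots[i][p] or (i > 0 and p > 0 and dots[i - 1][p - 1]):
                continue
            length = 0
            while i + length < n and p + length < n and dots[i + length][p + length]:
                length += 1
            j_end = n - 1 - p
            # Keep runs whose 5' strand lies entirely before the 3' strand (no overlap).
            if length >= min_len and i + length - 1 < j_end - (length - 1):
                found.append((i, j_end, length))
    return found
```

For the toy hairpin GGGAAAUCCC, stems("GGGAAAUCCC") returns [(0, 9, 4)], i.e. GGGA pairs with UCCC around an AA loop; the mirror-image run on the other side of the plot is filtered out.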
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
Programs for Dot Matrix
Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
COMPARE and DOTPLOT in GCG
Conclusion
Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find with the other, more automated methods. It lets your eyes and brain do the work, which is very efficient.
Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how "optimal" a given alignment is.
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Some DefinitionsSome Definitions
An An alignment alignment is a mutual arrangement of is a mutual arrangement of two sequences which exhibits where the two sequences which exhibits where the two sequences are similar and where they two sequences are similar and where they differdiffer
An An optimal alignment optimal alignment is one that exhibits is one that exhibits the most correspondences and the least the most correspondences and the least differences It is the alignment with the differences It is the alignment with the highest score May or may not be highest score May or may not be biologically meaningfulbiologically meaningful
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
MethodsMethods
Dot matrix Dynamic Programming Word k-tuple (heuristic based)
Presented By Liu QiPresented By Liu Qi
Brief intro of methodsBrief intro of methods
dot matrix - all possible matches between sequence residues are foundused to compare two sequences to look for regions where they may align very useful for finding indels and repeats in sequences can be used as afirst pass to see if there is any similarity between sequences
bull dynamic programming - mathematically guaranteed to find optimal alignment (global or local) between pairs of sequences very computationallyexpensive - of steps increases exponentially with sequence length
bull k-tuple (word) methods - used by FASTA and BLAST (previously described) much faster than dynamic programming and ideal for databasesearches uses heuristics that do not guarantee optimal alignment but are nevertheless very reliable
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
1 - one sequence listed along top of page and second sequence listed along the side
2 - move across row and put dot in any column where the character is the same
3 - continue for each row until all possible character matches between thesequences are represented by dots
4 - diagonal rows of dots reveal sequencesimilarity (can also find repeats and invertedrepeats off the main diagonal)
5 - isolated dots represent random similarity unrelated to the alignment
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dot matrix with noise reductionDot matrix with noise reduction
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
To improve visualisation of identical regions among sequences we use sliding windows Instead of writing down a dot for every character that is common in both sequences
We compare a number of positions (window size) and we write down a dot whenever there is minimum number (stringency) of identical characters
Dot matrix
Caution is necessary when choosing the window size and the stringency value. Suitable values differ from problem to problem; well-chosen values accentuate the regions of similarity between the two sequences.
• For DNA sequences, typically: sliding window = 15, stringency = 10
• For protein sequences: sliding window = 2 or 3, stringency = 2
Things to be considered
• Scoring matrix for distance correction
• Window size
• Threshold
Uses of the dot plot
• Regions of similarity: diagonals
• Insertions/deletions: gaps
• Can determine intron/exon structure
• Repeats: parallel diagonals
• Inverted repeats: perpendicular diagonals. Inverted repeats can be used to determine regions of base pairing in RNA molecules.
Intra-sequence comparison
• Repeats
• Inverted repeats
• Low complexity
Examples
ABRACADABRACAD
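The parallel diagonals produced by a repeat can be located by scanning each diagonal of the plot for runs of consecutive dots. A sketch, assuming a minimum run length as the noise filter; `diagonal_runs` is a hypothetical helper, not part of any dot-plot program named in this lecture.

```python
def diagonal_runs(top, side, min_len):
    """Return (i, j, length) for each maximal diagonal run of matches
    side[i+k] == top[j+k] whose length is at least min_len."""
    runs = []
    n, m = len(side), len(top)
    for offset in range(-(n - 1), m):          # offset = j - i identifies a diagonal
        i, j = max(0, -offset), max(0, offset)
        length = 0
        while i < n and j < m:
            if side[i] == top[j]:
                length += 1
            else:
                if length >= min_len:
                    runs.append((i - length, j - length, length))
                length = 0
            i += 1
            j += 1
        if length >= min_len:                    # run that reaches the matrix edge
            runs.append((i - length, j - length, length))
    return runs


seq = "ABRACADABRACAD"
print(diagonal_runs(seq, seq, 4))  # [(7, 0, 7), (0, 0, 14), (0, 7, 7)]
```

Besides the main diagonal, the two runs of length 7 sit 7 positions off it, which is the length of the repeated unit ABRACAD.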
Palindrome
Sequence: ATOYOTA
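When a palindrome such as ATOYOTA is compared with itself, the plot contains a line perpendicular to the main diagonal, running from the top-right to the bottom-left corner. A minimal check of that line; `anti_diagonal` is a hypothetical name.

```python
def dot_matrix(top, side):
    """Cell [i][j] is True when side[i] == top[j]."""
    return [[a == b for b in top] for a in side]


def anti_diagonal(grid):
    """Cells (i, n-1-i) of a square grid: the line perpendicular to the main diagonal."""
    n = len(grid)
    return [grid[i][n - 1 - i] for i in range(n)]


seq = "ATOYOTA"
print(all(anti_diagonal(dot_matrix(seq, seq))))  # True: ATOYOTA reads the same backwards
```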
Repeats
Drosophila melanogaster SLIT protein against itself
Low complexity
Inter-sequence comparison
• Conserved domains
• Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
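The same diagonal-run scan (the hypothetical `diagonal_runs` helper, redefined here so the block is self-contained) shows how the deletion appears in the plot: the diagonal breaks and resumes on a shifted diagonal, and the shift equals the length of the missing segment.

```python
def diagonal_runs(top, side, min_len):
    """Return (i, j, length) for each maximal diagonal run of matches of length >= min_len."""
    runs = []
    n, m = len(side), len(top)
    for offset in range(-(n - 1), m):          # offset = j - i identifies a diagonal
        i, j = max(0, -offset), max(0, offset)
        length = 0
        while i < n and j < m:
            if side[i] == top[j]:
                length += 1
            else:
                if length >= min_len:
                    runs.append((i - length, j - length, length))
                length = 0
            i += 1
            j += 1
        if length >= min_len:
            runs.append((i - length, j - length, length))
    return runs


seq1 = "DOROTHYCROWFOOTHODGKIN"   # along the top
seq2 = "DOROTHYHODGKIN"           # down the side
runs = diagonal_runs(seq1, seq2, 4)
print(runs)                        # [(0, 0, 7), (7, 15, 7)]
shift = runs[1][1] - runs[1][0]    # 15 - 7 = 8, the length of CROWFOOT
```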
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
• Comparing the reverse complement of an RNA sequence with the sequence itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known; the illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
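The reverse-complement self-comparison can be sketched as follows: a run of dots between a sequence and its own reverse complement marks two segments that could base-pair, i.e. a candidate stem. Assumptions: only Watson-Crick pairs (A-U, G-C) are counted, so the G-U wobble pairs found in real tRNAs are ignored; the toy hairpin is made up for illustration and is not the tRNA-Phe sequence; `candidate_stems` is a hypothetical name.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick pairs only (assumption)


def reverse_complement(rna):
    return "".join(PAIR[base] for base in reversed(rna))


def candidate_stems(rna, min_len):
    """Return (start, partner, length): rna[start + k] can pair with
    rna[partner - k] for k in range(length)."""
    rc = reverse_complement(rna)
    n = len(rna)
    stems = []
    for offset in range(-(n - 1), n):            # diagonals of the rna-vs-rc plot
        i, j = max(0, -offset), max(0, offset)
        length = 0
        while True:
            inside = i < n and j < n
            if inside and rna[i] == rc[j]:
                length += 1
            else:
                if length >= min_len:
                    start = i - length
                    partner = n - 1 - (j - length)  # rc[j0] is the complement of rna[n-1-j0]
                    if start < partner:             # each stem shows up twice; keep one copy
                        stems.append((start, partner, length))
                length = 0
                if not inside:
                    break
            i += 1
            j += 1
    return stems


hairpin = "GGGCAAAAGCCC"                  # hypothetical toy hairpin
print(reverse_complement(hairpin))        # GGGCUUUUGCCC
print(candidate_stems(hairpin, 4))        # [(0, 11, 4)]
```

The result says that positions 0-3 (GGGC) can pair with positions 11-8 (CCCG read backwards), enclosing the AAAA loop.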
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
Programs for Dot Matrix
• Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
• SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
• Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
• COMPARE, DOTPLOT in GCG
Conclusion
Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats, which are more difficult to find by the other, more automated methods. It lets your eyes and brain do the work, which is very efficient.
Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score indicating how 'optimal' a given alignment is.
Dynamic Programming
Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
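A quick way to see why trying every alignment is hopeless: the last column of an alignment pairs two characters, puts a character of X over a gap, or puts a gap over a character of Y, so the number of alignments satisfies the recurrence below, which gives the Delannoy numbers. This sketch counts alignments in which the order of adjacent gaps in opposite sequences matters (an assumption about what "different" means). It is itself a tiny dynamic program: `lru_cache` stores each overlapping subproblem once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of distinct alignment paths for sequences of lengths n and m."""
    if n == 0 or m == 0:
        return 1                                  # only one way: all gaps
    return (count_alignments(n - 1, m - 1)        # last column pairs two characters
            + count_alignments(n - 1, m)          # last column: character of X over a gap
            + count_alignments(n, m - 1))         # last column: gap over a character of Y


for length in (1, 2, 3, 10):
    print(length, count_alignments(length, length))
```

The counts for equal lengths 1, 2, 3, 4 are 3, 13, 63, 321 and keep growing exponentially, while the DP below fills only (n + 1)(m + 1) cells.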
Alignment methods with DP
• Global alignment - Needleman-Wunsch (1970): finds the best-scoring alignment over the entire length of both sequences (with simple match scoring, this maximizes the number of matches).
• Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
[Figure: a small graph with nodes A to F and numbered edge weights; the layout is not recoverable from the transcript.]
Exercise
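Conditions for applying dynamic programming
• Optimal substructure: every sub-policy of an optimal policy is itself optimal.
• No after-effect: once the state of a stage is known, the states of earlier stages cannot directly affect future decisions.
• Trading space for time (overlapping subproblems).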
DP Algorithm for Global Alignment
Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.
Let $F(i, j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$), where $s$ is the substitution matrix and $d$ is the linear gap penalty. Then
$$F(0,0)=0,\qquad F(i,j)=\max\begin{cases}F(i-1,j-1)+s(x_i,y_j)\\ F(i-1,j)+d\\ F(i,j-1)+d\end{cases}$$
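The recurrence translates directly into a table-filling loop. A minimal sketch, assuming the DNA scoring matrix and gap penalty d = -5 used in the worked example on the following slides; rows follow the sequence written down the side, columns the sequence along the top, and `global_matrix` is a hypothetical name.

```python
BASES = "ACGT"
SCORES = [[2, -7, -5, -7],
          [-7, 2, -7, -5],
          [-5, -7, 2, -7],
          [-7, -5, -7, 2]]
# S[(a, b)]: substitution score for aligning base a with base b
S = {(a, b): SCORES[p][q] for p, a in enumerate(BASES) for q, b in enumerate(BASES)}


def global_matrix(top, side, s=S, d=-5):
    """Fill the Needleman-Wunsch matrix F for `side` (rows) against `top` (columns)."""
    n, m = len(side), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d                      # only vertical moves reach column 0
    for j in range(1, m + 1):
        F[0][j] = j * d                      # only horizontal moves reach row 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[side[i - 1], top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    return F


for row in global_matrix("AAG", "AGC"):
    print(row)
```

Each cell depends only on its three already-filled neighbours, so the loop takes time proportional to n x m.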
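DP in equation form
Each cell F(i, j) is filled from three neighbours: the diagonal cell F(i-1, j-1) plus s(x_i, y_j), the cell above, F(i-1, j), plus d, and the cell to the left, F(i, j-1), plus d. The largest of the three is kept.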
A simple example
Scoring matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 (AAG is written along the top of the DP matrix, AGC down the side).
A simple example
Initialisation: F(0, 0) = 0.
A simple example
First row and column: each of these cells has only one predecessor, so F(i, 0) = i·d and F(0, j) = j·d:
            A     A     G
       0   -5   -10   -15
  A   -5
  G  -10
  C  -15
A simple example
            A     A     G
       0   -5   -10   -15
  A   -5    2    -3    -8
  G  -10   -3    -3    -1
  C  -15   -8    -8    -6
The optimal global alignment score is the lower-right entry, F(3, 3) = -6.
Traceback
• Start from the lower-right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.
A simple example
For AAG (top) and AGC (side) with d = -5, the traceback passes through F(3, 3) = -6, F(2, 3) = -1 and F(1, 2) = -3, then reaches F(0, 0) = 0 either through F(1, 1) = 2 or through F(0, 1) = -5.
A simple example
The two traceback paths give two optimal alignments, each with score -6:
AAG-      AAG-
-AGC      A-GC
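The traceback rules can be sketched as a recursive walk that follows every move consistent with the stored scores, which recovers both co-optimal alignments above. This is an illustrative sketch with the same assumed scoring as the worked example; `all_optimal_global` is a hypothetical name, and on long sequences the number of co-optimal paths can grow very quickly, so practical tools usually report just one.

```python
BASES = "ACGT"
SCORES = [[2, -7, -5, -7],
          [-7, 2, -7, -5],
          [-5, -7, 2, -7],
          [-7, -5, -7, 2]]
S = {(a, b): SCORES[p][q] for p, a in enumerate(BASES) for q, b in enumerate(BASES)}


def all_optimal_global(top, side, s=S, d=-5):
    """Return the optimal global score and every co-optimal alignment as (top_row, side_row)."""
    n, m = len(side), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[side[i - 1], top[j - 1]],
                          F[i - 1][j] + d, F[i][j - 1] + d)

    alignments = []

    def walk(i, j, top_rev, side_rev):
        # Characters are collected from the end, so reverse them at the corner.
        if i == 0 and j == 0:
            alignments.append(("".join(reversed(top_rev)), "".join(reversed(side_rev))))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[side[i - 1], top[j - 1]]:
            walk(i - 1, j - 1, top_rev + [top[j - 1]], side_rev + [side[i - 1]])  # diagonal
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, top_rev + [top[j - 1]], side_rev + ["-"])  # horizontal: gap in left sequence
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, top_rev + ["-"], side_rev + [side[i - 1]])  # vertical: gap in top sequence

    walk(n, m, [], [])
    return F[n][m], alignments


score, alignments = all_optimal_global("AAG", "AGC")
print(score)
for top_row, side_row in alignments:
    print(top_row)
    print(side_row)
    print()
```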
Exercise
Find the global alignment of X = catgt and Y = acgctg.
Scoring: d = -1, mismatch = -1, match = 2.
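To check the optimal score for this exercise, the same recurrence can be run with match/mismatch scoring. This sketch keeps only two rows of F at a time, because the score alone does not need the whole matrix (the traceback still does); `global_score` is a hypothetical name.

```python
def global_score(x, y, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of x against y with a linear gap penalty `gap`."""
    prev = [j * gap for j in range(len(y) + 1)]        # row 0 of F
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * len(y)                  # F(i, 0) = i * gap
        for j in range(1, len(y) + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


print(global_score("catgt", "acgctg"))
```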
Answer
[The answer slide's figure is not preserved in the transcript.]
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually, an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.
$$F(i,0)=F(0,j)=0,\qquad F(i,j)=\max\begin{cases}0\\ F(i-1,j-1)+s(x_i,y_j)\\ F(i-1,j)+d\\ F(i,j-1)+d\end{cases}$$
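A minimal Smith-Waterman sketch of this recurrence, with traceback from the highest cell until a 0 is reached. It assumes the same DNA scoring matrix and d = -5 as the worked examples that follow; `smith_waterman` is a hypothetical name, and ties for the highest score are broken by taking the first one found in row order.

```python
BASES = "ACGT"
SCORES = [[2, -7, -5, -7],
          [-7, 2, -7, -5],
          [-5, -7, 2, -7],
          [-7, -5, -7, 2]]
S = {(a, b): SCORES[p][q] for p, a in enumerate(BASES) for q, b in enumerate(BASES)}


def smith_waterman(top, side, s=S, d=-5):
    """Return (best score, aligned top segment, aligned side segment)."""
    n, m = len(side), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s[side[i - 1], top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_i, best_j = F[i][j], i, j
    # Traceback from the highest cell until a cell with score 0 is reached.
    i, j = best_i, best_j
    top_rev, side_rev = [], []
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[side[i - 1], top[j - 1]]:
            top_rev.append(top[j - 1])
            side_rev.append(side[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top_rev.append("-")
            side_rev.append(side[i - 1])
            i -= 1
        else:
            top_rev.append(top[j - 1])
            side_rev.append("-")
            j -= 1
    return best, "".join(reversed(top_rev)), "".join(reversed(side_rev))


print(smith_waterman("AAG", "AGC"))     # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC"))  # (6, 'AAG', 'AAG')
```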
Local DP in equation form
Each cell F(i, j) is filled from the same three neighbours as in the global case, with 0 as a fourth candidate, so a cell never drops below zero.
Local alignment
Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
A simple example
Scoring matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5 (AAG along the top of the DP matrix, AGC down the side).
A simple example
Initialisation: every cell in the first row and the first column is 0.
A simple example
           A   A   G
       0   0   0   0
  A    0   2   2   0
  G    0   0   0   4
  C    0   0   0   0
A simple example
The highest score in the matrix is 4 (row G, last column G). Tracing back along the diagonal until a 0 is reached gives the optimal local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC, using the same scoring matrix and a gap penalty of d = -5 (AAG along the top, GAAGGC down the side). As before, the first row and column are initialised to 0.
Local alignment
           A   A   G
       0   0   0   0
  G    0   0   0   2
  A    0   2   2   0
  A    0   2   4   0
  G    0   0   0   6
  G    0   0   0   2
  C    0   0   0   0
The highest score is 6; tracing back from it until a 0 is reached gives the local alignment AAG / AAG.
End-Space Free Alignment
Any number of indel operations at the beginning or at the end of the alignment contribute zero weight. For example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
Base conditions: F(i, 0) = 0 for all i and F(0, j) = 0 for all j.
Recurrence relation:
$$F(i,j)=\max\begin{cases}F(i-1,j-1)+s(x_i,y_j)\\ F(i-1,j)+d\\ F(i,j-1)+d\end{cases}$$
Search for the i that maximises F(i, m) over 1 ≤ i ≤ n, and for the j that maximises F(n, j) over 1 ≤ j ≤ m. The alignment score is the larger of the two:
$$\text{score}=\max\Big\{\max_{1\le i\le n}F(i,m),\ \max_{1\le j\le m}F(n,j)\Big\}$$
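A sketch of the end-space free variant, assuming match = 1, mismatch = -1 and gap = -1 (the scoring of the exercise that follows); `end_space_free_score` is a hypothetical name. The only changes from global alignment are the zero first row and column and the final search over the last row and column.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score of the best alignment of x and y in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0                                  # everything is an end gap
    F = [[0] * (m + 1) for _ in range(n + 1)]     # F(i, 0) = F(0, j) = 0: leading gaps free
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trailing gaps are free: take the best cell in the last column or the last row.
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))
```

Under this scoring, the example alignment above scores 4 over its interior (c a c - t g against c a c t t g: five matches and one gap), and the DP also returns 4.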
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions to think about
• Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
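One possible answer to the second question, under the usual definition of a subsequence (characters kept in order, not necessarily contiguous). It is the same kind of dynamic program as global alignment, with match = 1 and no mismatch or gap penalty; `lcs` is a hypothetical name.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # L[i][j] = LCS length of the suffixes a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = 1 + L[i + 1][j + 1]
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk forward through the table to recover one LCS.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(lcs("AGGTAB", "GXTXAYB"))  # GTAB
```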
Affine gap penalty
LETVGYW----L
Scores for the four gap positions: -5 -1 -1 -1
Separate penalties for gap opening and gap extension: here the first gap position scores -5 and each further position -1, so this gap of length 4 scores -8 in total. This requires modifying the DP algorithm.
Affine gap penalty
• A gap of length k is more probable than k gaps of length 1: a single gap may be due to one mutational event that inserted or deleted a stretch of characters, whereas separated gaps are probably due to distinct mutational events.
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms: a penalty h associated with opening a gap, and a smaller penalty g for extending it.
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
We need three matrices instead of one.
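A sketch of the three-matrix scheme for global alignment (Gotoh's formulation). Assumptions: a gap of length k scores open_ + (k - 1) * extend, which reproduces the LETVGYW----L example with open_ = -5 and extend = -1; match/mismatch scoring stands in for a substitution matrix; all names are illustrative. M holds the best score ending in an aligned pair, X ending with a character of x over a gap, and Y ending with a gap over a character of y.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, open_=-5, extend=-1):
    """Global alignment score with affine gaps: a gap of length k scores open_ + (k-1)*extend."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # last column: x[i-1] over y[j-1]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # last column: x[i-1] over a gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # last column: a gap over y[j-1]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = open_ + (i - 1) * extend
    for j in range(1, m + 1):
        Y[0][j] = open_ + (j - 1) * extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap starts from M or the other gap matrix; extending stays in X (or Y).
            X[i][j] = max(M[i - 1][j] + open_, X[i - 1][j] + extend, Y[i - 1][j] + open_)
            Y[i][j] = max(M[i][j - 1] + open_, Y[i][j - 1] + extend, X[i][j - 1] + open_)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("AAAGGGTTT", "AAATTT"))
```

Here the best alignment is AAAGGGTTT over AAA---TTT: six matches and one gap of length 3 (-5 - 1 - 1 = -7) give -1, whereas three separate one-character gaps would cost -15.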
Dynamic Programming for the Affine Gap Penalty Case
Example scoring (the worked matrices are not preserved in the transcript): match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
• FASTA
• BLAST
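The first step shared by word methods, finding short exact word (k-tuple) matches between a query and a database sequence, can be sketched with a hash index of the query's words. This only illustrates the idea: FASTA and BLAST add scoring, extension and statistics that are not shown here, and `word_hits` is a hypothetical name.

```python
from collections import Counter, defaultdict


def word_hits(query, database, k):
    """Return (query_pos, db_pos) for every shared word of length k."""
    index = defaultdict(list)                  # word -> positions in the query
    for q in range(len(query) - k + 1):
        index[query[q:q + k]].append(q)
    hits = []
    for t in range(len(database) - k + 1):
        for q in index.get(database[t:t + k], []):
            hits.append((q, t))
    return hits


hits = word_hits("ACGTAC", "TTACGT", 3)
# Hits that share a diagonal (db_pos - query_pos) hint at a longer shared region,
# much like a diagonal in a dot plot.
diagonals = Counter(t - q for q, t in hits)
print(sorted(hits), diagonals.most_common(1))
```

The two hits on diagonal 2 (ACG and CGT) line up into the shared region ACGT, while the lone hit on diagonal -2 is an isolated word match.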
Presented By Liu QiPresented By Liu Qi
MethodsMethods
Dot matrix Dynamic Programming Word k-tuple (heuristic based)
Presented By Liu QiPresented By Liu Qi
Brief intro of methodsBrief intro of methods
dot matrix - all possible matches between sequence residues are foundused to compare two sequences to look for regions where they may align very useful for finding indels and repeats in sequences can be used as afirst pass to see if there is any similarity between sequences
bull dynamic programming - mathematically guaranteed to find optimal alignment (global or local) between pairs of sequences very computationallyexpensive - of steps increases exponentially with sequence length
bull k-tuple (word) methods - used by FASTA and BLAST (previously described) much faster than dynamic programming and ideal for databasesearches uses heuristics that do not guarantee optimal alignment but are nevertheless very reliable
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
1 - one sequence listed along top of page and second sequence listed along the side
2 - move across row and put dot in any column where the character is the same
3 - continue for each row until all possible character matches between thesequences are represented by dots
4 - diagonal rows of dots reveal sequencesimilarity (can also find repeats and invertedrepeats off the main diagonal)
5 - isolated dots represent random similarity unrelated to the alignment
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dot matrix with noise reductionDot matrix with noise reduction
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
To improve visualisation of identical regions among sequences we use sliding windows Instead of writing down a dot for every character that is common in both sequences
We compare a number of positions (window size) and we write down a dot whenever there is minimum number (stringency) of identical characters
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
Caution is necessary regarding the window size and the stringency value Generally they assume different values for different problems The optimal values will accent the regions of similarity of the two sequences
1048698 For DNA sequence usually1048698 Sliding window=15 stringency=10
1048698 For Protein sequence1048698 Sliding window=2 or 3 stringency=2
Presented By Liu QiPresented By Liu Qi
Things to be consideredThings to be considered
Scoring matrix for distance correction
Window size Threshold
Presented By Liu QiPresented By Liu Qi
The useful of Dot plot The useful of Dot plot
Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps
Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals
Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base
pairing of RNA moleculespairing of RNA molecules
Presented By Liu QiPresented By Liu Qi
Intra-sequence comparisonIntra-sequence comparison
RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity
Presented By Liu QiPresented By Liu Qi
ABRACADABRACADABRACADABRACAD
ExamplesExamples
Presented By Liu QiPresented By Liu Qi
palindromepalindromeSequence ATOYOTA
Presented By Liu QiPresented By Liu Qi
RepeatsRepeats
Drosophila melanogaster SLIT protein against itself
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Local alignment
Two differences with respect to global alignment:
• No score is negative
• Traceback begins at the highest score in the matrix and continues until you reach 0
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
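The two differences listed above are the only changes needed relative to the global recurrence: every cell has a 0 floor, and the traceback runs from the highest cell down to a 0. Below is a minimal, self-contained Python sketch under the same assumptions as the worked examples, namely the slides' DNA substitution matrix and d = -5. The name `smith_waterman` is hypothetical. If several cells tie for the highest score, the sketch uses the first one it finds in row-major order.

```python
# Substitution scores from the slides' DNA matrix (symmetric).
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
SUB = {(a, b): SCORES[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def smith_waterman(top, left, sub=SUB, d=-5):
    """Local alignment with a linear gap penalty d; no cell is allowed below 0.

    Returns (best_score, aligned_top, aligned_left, F).
    """
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + sub[left[i - 1], top[j - 1]],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:  # remember the highest-scoring cell
                best, bi, bj = F[i][j], i, j

    # Traceback from the highest score until a cell with score 0 is reached.
    out_top, out_left = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + sub[left[i - 1], top[j - 1]]:
            out_top.append(top[j - 1])
            out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            out_top.append("-")
            out_left.append(left[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])
            out_left.append("-")
            j -= 1
    return best, "".join(reversed(out_top)), "".join(reversed(out_left)), F


print(smith_waterman("AAG", "AGC")[:3])
print(smith_waterman("AAG", "GAAGGC")[:3])
```

For AAG against AGC, the sketch returns a score of 4 with the alignment AG over AG. For AAG against GAAGGC, it returns a score of 6 with AAG over AAG. Both results match the worked matrices in these slides.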
A simple example
Substitution matrix:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
The first row and column are set to 0, and the rest of the matrix is filled with the local recurrence (AAG along the top, AGC down the side):
     -    A    A    G
-    0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0
Traceback starts at the highest score (4) and stops at 0, giving the local alignment
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC, using the same substitution matrix. Use a gap penalty of d = -5.
     -    A    A    G
-    0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0
The highest score is 6; tracing back from it to 0 gives the local alignment
AAG
AAG
End-Space Free Alignment
Any number of indel operations at the beginning or at the end of the alignment contributes zero weight.
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, F(i,0) = 0 and F(0,j) = 0.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.
Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.
Define the alignment score as $\max\{F(n, j^*),\, F(i^*, m)\}$.
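Below is a minimal Python sketch of the end-space free recurrence above. It assumes the scores from the exercise that follows: match = 1, mismatch = -1 and gap d = -1. The name `end_space_free` is hypothetical. The sketch returns only the score and the cell where the best alignment ends. A traceback would work as in global alignment, except that it stops on reaching row 0 or column 0.

```python
def end_space_free(x, y, match=1, mismatch=-1, d=-1):
    """Score an end-space free alignment of x (rows) against y (columns).

    Returns (score, (i, j)) where (i, j) is the cell in the last row or
    last column at which the best alignment ends.
    """
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + d, F[i][j - 1] + d)
    # Trailing gaps are free too: take the best cell in the last column or last row.
    i_star = max(range(1, n + 1), key=lambda i: F[i][m])
    j_star = max(range(1, m + 1), key=lambda j: F[n][j])
    if F[i_star][m] >= F[n][j_star]:
        return F[i_star][m], (i_star, m)
    return F[n][j_star], (n, j_star)


print(end_space_free("cactgtac", "gacacttg"))
```

For X = cactgtac and Y = gacacttg, the sketch returns (4, (5, 8)). The best alignment ends after x_5 = g, so the trailing tac of X costs nothing, as in the example alignment above.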
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1)
X = c a c t g t a c
Y = g a c a c t t g
Questions to think about
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
Affine gap penalty
LETVGYW----L
Gap penalties along the four gap positions: -5, -1, -1, -1
Separate penalties for gap opening and gap extension
This requires modifying the DP algorithm
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events
A linear gap penalty function treats these cases the same, so it is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
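To make the contrast concrete, here is a small Python sketch of both penalty functions. It follows the LETVGYW----L example in charging the opening penalty h = -5 for the first gap position and the extension penalty g = -1 for each further position. Some texts instead charge h + g·k, so treat the exact form as an assumption. The function names are hypothetical.

```python
def linear_gap_cost(k, d=-5):
    """Linear penalty: each of the k gap positions costs d."""
    return k * d


def affine_gap_cost(k, h=-5, g=-1):
    """Affine penalty (assumed form): h for the first gap position, g for each further one."""
    return 0 if k == 0 else h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1.
print(affine_gap_cost(4), 4 * affine_gap_cost(1))
print(linear_gap_cost(4), 4 * linear_gap_cost(1))
```

Under the affine function, a single gap of length 4 costs -8, while four separate one-position gaps cost -20. The linear function charges -20 in both cases, so it cannot distinguish them.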
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Need 3 matrices instead of 1
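One common way to organise the three matrices, often attributed to Gotoh, is sketched below in Python. M scores alignments that end in an aligned pair, Ix scores those that end with a character of x against a gap, and Iy scores those that end with a character of y against a gap. The lecture's own recurrences were given as images and may differ in detail. This sketch makes the following assumptions:
- a match/mismatch score of match = 1 and mismatch = -1, as in the example that follows;
- h = -5 for the first gap position and g = -1 for each extension;
- a gap in one sequence may directly follow a gap in the other.

The sketch returns only the global score, and `affine_global` is a hypothetical name.

```python
NEG_INF = float("-inf")


def affine_global(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score with affine gaps, using three DP matrices.

    M[i][j]:  x[i-1] aligned to y[j-1]
    Ix[i][j]: x[i-1] aligned to a gap (gap in y)
    Iy[i][j]: y[j-1] aligned to a gap (gap in x)
    h is charged for the first gap position, g for each extension.
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            if i > 0:  # open a new gap in y, or extend the current one
                Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            if j > 0:  # open a new gap in x, or extend the current one
                Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global("AAAA", "AA"))
```

For example, `affine_global("AAAA", "AA")` returns -4. That score comes from one gap of length 2 (-5 - 1) plus two matches, which beats splitting the gap into two separate gaps (-10 + 2).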
Dynamic Programming for the Affine Gap Penalty Case
match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case"
Word k-tup
FASTA
BLAST
Brief intro of methods
• dot matrix - all possible matches between sequence residues are found; used to compare two sequences to look for regions where they may align; very useful for finding indels and repeats in sequences; can be used as a first pass to see if there is any similarity between sequences
• dynamic programming - mathematically guaranteed to find the optimal alignment (global or local) between pairs of sequences; computationally expensive, since the number of steps grows with the product of the two sequence lengths (O(nm)), even though DP avoids enumerating the exponentially many possible alignments
• k-tuple (word) methods - used by FASTA and BLAST (previously described); much faster than dynamic programming and ideal for database searches; use heuristics that do not guarantee the optimal alignment but are nevertheless very reliable
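The common first step of word methods can be sketched in a few lines of Python. The sketch indexes every k-letter word of one sequence, looks up the words of the other sequence, and groups the resulting hits by diagonal. This is only an illustration of the idea, not the FASTA or BLAST algorithm, since both add scoring, extension and statistics. The names `word_hits` and `best_diagonal` are hypothetical.

```python
from collections import defaultdict


def word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where both sequences share a k-letter word."""
    index = defaultdict(list)  # word -> start positions in the subject
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def best_diagonal(hits):
    """Group hits by diagonal (subject_pos - query_pos); return (diagonal, count) of the fullest."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return max(counts.items(), key=lambda item: item[1]) if counts else None


hits = word_hits("DOROTHYHODGKIN", "DOROTHYCROWFOOTHODGKIN", k=3)
print(hits)
print(best_diagonal(hits))
```

With k = 3, the query DOROTHYHODGKIN against DOROTHYCROWFOOTHODGKIN produces three groups of hits:
- five hits on diagonal 0, from the shared DOROTHY;
- five hits on diagonal 8, from the shared HODGKIN;
- one stray hit.

The offset of 8 is the length of the inserted CROWFOOT.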
Dot matrix
1 - one sequence is listed along the top of the page and the second sequence along the side
2 - move across each row and put a dot in any column where the characters are the same
3 - continue for each row until all possible character matches between the sequences are represented by dots
4 - diagonal rows of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal)
5 - isolated dots represent random similarity unrelated to the alignment
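Steps 1 to 5 translate almost directly into code. Below is a minimal Python sketch that marks a dot wherever two characters are identical and prints the plot as text. The names `dot_matrix` and `render` are hypothetical.

```python
def dot_matrix(top, side):
    """Grid with True wherever side[i] == top[j] (steps 1-3 above)."""
    return [[a == b for a in top] for b in side]


def render(top, side):
    """Draw the dot plot as text: '*' for a dot, '.' for no match."""
    grid = dot_matrix(top, side)
    lines = ["  " + top]
    for b, row in zip(side, grid):
        lines.append(b + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("ATOYOTA", "ATOYOTA"))
```

Comparing the palindrome ATOYOTA with itself gives the main diagonal plus a perpendicular anti-diagonal. This is the pattern that the inverted-repeat examples describe.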
Dot matrix with noise reduction
Dot matrix
To improve the visualisation of identical regions among sequences, we use sliding windows. Instead of writing down a dot for every character that is common to both sequences, we compare a number of positions (the window size) and write down a dot whenever there is a minimum number (the stringency) of identical characters.
Dot matrix
Caution is necessary regarding the window size and the stringency value. Generally they take different values for different problems; the optimal values will accentuate the regions of similarity of the two sequences.
• For DNA sequences, usually: sliding window = 15, stringency = 10
• For protein sequences: sliding window = 2 or 3, stringency = 2
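The sliding-window filter can be sketched the same way. This Python version counts identical positions within each window and places the dot at the start of the window. Some programs place it at the window centre instead, so check this detail in any real tool. The name `windowed_dot_matrix` is hypothetical.

```python
def windowed_dot_matrix(top, side, window, stringency):
    """Dot at (i, j) when side[i:i+window] and top[j:j+window] share at least
    `stringency` identical positions (dot placed at the window start)."""
    rows = len(side) - window + 1
    cols = len(top) - window + 1
    grid = [[False] * max(cols, 0) for _ in range(max(rows, 0))]
    for i in range(rows):
        for j in range(cols):
            same = sum(side[i + k] == top[j + k] for k in range(window))
            grid[i][j] = same >= stringency
    return grid


seq_a = "DOROTHYCROWFOOTHODGKIN"
seq_b = "DOROTHYHODGKIN"
filtered = windowed_dot_matrix(seq_a, seq_b, window=3, stringency=3)
plain = windowed_dot_matrix(seq_a, seq_b, window=1, stringency=1)
print(sum(map(sum, filtered)), sum(map(sum, plain)))
```

For this pair, window = 3 and stringency = 3 keep only the 11 dots that lie in exactly matching 3-letter stretches. Setting window = stringency = 1 reproduces the plain dot plot, which keeps every single-character match. For DNA, a reasonable starting point is the values quoted above, such as window = 15 and stringency = 10.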
Things to be considered
Scoring matrix for distance correction
Window size
Threshold
Uses of the dot plot
Regions of similarity: diagonals
Insertions/deletions: gaps
Can determine intron/exon structure
Repeats: parallel diagonals
Inverted repeats: perpendicular diagonals
Inverted repeats can be used to determine regions of base pairing of RNA molecules
Intra-sequence comparison
Repeats
Inverted repeats
Low complexity
ABRACADABRACAD
Examples
Palindrome
Sequence: ATOYOTA
Repeats
Drosophila melanogaster SLIT protein against itself
Low complexity
Inter-sequence comparison
Conserved domains
Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
• RNA comparisons of the reverse complement of a sequence to itself can often be very informative
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast
• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms)
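The reverse-complement comparison can be sketched in Python. The idea is to compare the sequence with its own reverse complement and look for runs of matches. Each run marks a stretch that could pair antiparallel with another part of the molecule, which makes it a candidate stem. This toy version makes two simplifications:
- It assumes Watson-Crick pairs only, although real stems, including those of tRNA-Phe, also contain G-U wobble pairs.
- It uses a made-up hairpin rather than the tRNA-Phe sequence.

The names `reverse_complement` and `stem_seeds` are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(PAIR[base] for base in reversed(rna))


def stem_seeds(rna, window):
    """Return (i, j) where rna[i:i+window] equals reverse_complement(rna)[j:j+window].

    Each hit means the `window` bases starting at i can pair, antiparallel,
    with the stretch of the same length ending at position len(rna) - 1 - j.
    """
    rc = reverse_complement(rna)
    n = len(rna)
    return [
        (i, j)
        for i in range(n - window + 1)
        for j in range(n - window + 1)
        if rna[i:i + window] == rc[j:j + window]
    ]


print(stem_seeds("GGGAAACCC", 3))
```

For the toy hairpin GGGAAACCC with a window of 3, the sketch returns [(0, 0), (6, 6)]. The first hit shows that the GGG at the 5' end can pair with the CCC at the 3' end. The second hit is the same stem seen from the other side.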
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
Programs for Dot Matrix
Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
COMPARE, DOTPLOT in GCG
Conclusion
Advantages: readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods; lets your eyes/brain do the work - VERY EFFICIENT
Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how 'optimal' a given alignment is
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
1 - one sequence listed along top of page and second sequence listed along the side
2 - move across row and put dot in any column where the character is the same
3 - continue for each row until all possible character matches between thesequences are represented by dots
4 - diagonal rows of dots reveal sequencesimilarity (can also find repeats and invertedrepeats off the main diagonal)
5 - isolated dots represent random similarity unrelated to the alignment
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dot matrix with noise reductionDot matrix with noise reduction
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
To improve visualisation of identical regions among sequences we use sliding windows Instead of writing down a dot for every character that is common in both sequences
We compare a number of positions (window size) and we write down a dot whenever there is minimum number (stringency) of identical characters
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
Caution is necessary regarding the window size and the stringency value Generally they assume different values for different problems The optimal values will accent the regions of similarity of the two sequences
1048698 For DNA sequence usually1048698 Sliding window=15 stringency=10
1048698 For Protein sequence1048698 Sliding window=2 or 3 stringency=2
Presented By Liu QiPresented By Liu Qi
Things to be consideredThings to be considered
Scoring matrix for distance correction
Window size Threshold
Presented By Liu QiPresented By Liu Qi
The useful of Dot plot The useful of Dot plot
Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps
Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals
Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base
pairing of RNA moleculespairing of RNA molecules
Presented By Liu QiPresented By Liu Qi
Intra-sequence comparisonIntra-sequence comparison
RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity
Presented By Liu QiPresented By Liu Qi
ABRACADABRACADABRACADABRACAD
ExamplesExamples
Presented By Liu QiPresented By Liu Qi
palindromepalindromeSequence ATOYOTA
Presented By Liu QiPresented By Liu Qi
RepeatsRepeats
Drosophila melanogaster SLIT protein against itself
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic Programming
A simple example
[Figure: a small graph with nodes A, B, C, D, E and F connected by edges with weights between 2 and 9]
Exercise
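Conditions for applying dynamic programming
• Optimal substructure: every sub-policy of an optimal policy is itself optimal.
• No after-effect: the states of earlier stages cannot directly influence future decisions.
• Trading space for time: subproblems overlap, so each one is solved once and its result is stored.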
DP Algorithm for Global Alignment
Two sequences X = x_1...x_n and Y = y_1...y_m.
Let F(i, j) be the optimal alignment score of x_1...x_i and y_1...y_j (0 ≤ i ≤ n, 0 ≤ j ≤ m). With a linear gap penalty d (negative, e.g. d = -5):

$$F(0,0) = 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d$$

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
DP in equation form
Each cell F(i, j) is computed from three neighbouring cells:
F(i-1, j-1) + s(x_i, y_j)   (diagonal move)
F(i-1, j) + d               (vertical move)
F(i, j-1) + d               (horizontal move)
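Below is a minimal Python sketch of the recurrence. It assumes the slides' nucleotide substitution table and a negative linear gap penalty d that is added rather than subtracted. `SCORES`, `s` and `global_matrix` are hypothetical names. Rows follow the left sequence and columns the top sequence, so calling it with AGC (left) and AAG (top) should reproduce the example's filled table.

```python
# Substitution scores from the slides' nucleotide table (symmetric).
SCORES = {
    ("A", "A"): 2, ("A", "C"): -7, ("A", "G"): -5, ("A", "T"): -7,
    ("C", "C"): 2, ("C", "G"): -7, ("C", "T"): -5,
    ("G", "G"): 2, ("G", "T"): -7,
    ("T", "T"): 2,
}


def s(a, b):
    """Look up a substitution score in either order."""
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]


def global_matrix(left, top, d=-5):
    """Fill the Needleman-Wunsch matrix F with a linear gap penalty d.

    F[i][j] is the best score of left[:i] against top[:j].
    """
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d  # left[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * d  # top[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + d,  # vertical: gap in the top sequence
                F[i][j - 1] + d,  # horizontal: gap in the left sequence
            )
    return F


for row in global_matrix("AGC", "AAG"):
    print(row)
```

The matrix takes O(nm) time and space. This quadratic cost is what replaces the exponential number of alignments.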
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and this substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

AAG is written across the top and AGC down the left side. The matrix is filled in three steps:
1. F(0, 0) = 0.
2. The first row and the first column hold multiples of d: 0, -5, -10, -15.
3. Every other cell takes the maximum of its three candidate scores.

           A    A    G
      0   -5  -10  -15
A    -5    2   -3   -8
G   -10   -3   -3   -1
C   -15   -8   -8   -6

The optimal global alignment score is F(3, 3) = -6.
Traceback
• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.

A simple example
Tracing back from F(3, 3) = -6, with AAG on top and AGC on the left, visits these cells:

           A    A    G
      0   -5
A          2   -3
G                   -1
C                   -6

The path runs as follows:
• -6 comes from -1 by a vertical move (C against a gap).
• -1 comes from -3 by a diagonal move (G with G).
• -3 can be reached either from 2 by a horizontal move or from -5 by a diagonal move.

So there are two optimal alignments, each with score -6:

AAG-    AAG-
-AGC    A-GC
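The Python sketch below applies the traceback rules under the same assumptions as before: the slides' substitution table, d = -5, and hypothetical helper names. It does not follow a single arrow. Instead it branches on every move that reproduces the cell's value, so ties produce all co-optimal alignments. Results are returned as (top row, left row) pairs.

```python
# Substitution scores from the slides' nucleotide table (symmetric).
SCORES = {
    ("A", "A"): 2, ("A", "C"): -7, ("A", "G"): -5, ("A", "T"): -7,
    ("C", "C"): 2, ("C", "G"): -7, ("C", "T"): -5,
    ("G", "G"): 2, ("G", "T"): -7,
    ("T", "T"): 2,
}


def s(a, b):
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]


def global_matrix(left, top, d):
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)
    return F


def tracebacks(F, left, top, d, i, j):
    """Every co-optimal path from cell (i, j) back to (0, 0)."""
    if i == 0 and j == 0:
        return [("", "")]
    paths = []
    if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
        # diagonal move: one character from each sequence
        paths += [(t_row + top[j - 1], l_row + left[i - 1])
                  for t_row, l_row in tracebacks(F, left, top, d, i - 1, j - 1)]
    if i > 0 and F[i][j] == F[i - 1][j] + d:
        # vertical move: gap in the top sequence
        paths += [(t_row + "-", l_row + left[i - 1])
                  for t_row, l_row in tracebacks(F, left, top, d, i - 1, j)]
    if j > 0 and F[i][j] == F[i][j - 1] + d:
        # horizontal move: gap in the left sequence
        paths += [(t_row + top[j - 1], l_row + "-")
                  for t_row, l_row in tracebacks(F, left, top, d, i, j - 1)]
    return paths


def optimal_global_alignments(top, left, d=-5):
    F = global_matrix(left, top, d)
    n, m = len(left), len(top)
    return F[n][m], tracebacks(F, left, top, d, n, m)


score, alignments = optimal_global_alignments("AAG", "AGC")
print("score", score)
for top_row, left_row in alignments:
    print(top_row)
    print(left_row)
    print()
```

On long sequences with many ties, listing every co-optimal path can explode combinatorially. In that case it is usually enough to follow one path.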
Exercise
Find the global alignment of X = catgt and Y = acgctg.
Scoring: d = -1, mismatch = -1, match = 2.

Answer
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually, an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

$$F(i,0) = F(0,j) = 0, \qquad F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Local DP in equation form
Each cell F(i, j) takes the largest of four values:
0                           (start a new local alignment here)
F(i-1, j-1) + s(x_i, y_j)   (diagonal move)
F(i-1, j) + d               (vertical move)
F(i, j-1) + d               (horizontal move)
Local alignment
There are two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until a 0 is reached.

Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
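Both differences show up in a short Python sketch of the Smith-Waterman fill. The extra 0 in the max keeps every cell non-negative, and the best cell is tracked so that traceback can start there. The assumptions are the same as before: the slides' substitution table, d = -5, and hypothetical helper names (`local_matrix`).

```python
# Substitution scores from the slides' nucleotide table (symmetric).
SCORES = {
    ("A", "A"): 2, ("A", "C"): -7, ("A", "G"): -5, ("A", "T"): -7,
    ("C", "C"): 2, ("C", "G"): -7, ("C", "T"): -5,
    ("G", "G"): 2, ("G", "T"): -7,
    ("T", "T"): 2,
}


def s(a, b):
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]


def local_matrix(left, top, d=-5):
    """Fill the Smith-Waterman matrix; return F, the best score and its cell."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,  # never negative: a new local alignment may start here
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, best_cell = F[i][j], (i, j)
    return F, best, best_cell


F, best, cell = local_matrix("AGC", "AAG")
for row in F:
    print(row)
print("best", best, "at", cell)
```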
A simple example
Find the optimal local alignment of AAG and AGC. Use the same substitution matrix s and a gap penalty of d = -5. The first row and column are set to 0, and no cell may fall below 0:

           A    A    G
      0    0    0    0
A     0    2    2    0
G     0    0    0    4
C     0    0    0    0

The highest score, 4, is in the G row under the final G. Tracing back from it until a 0 is reached gives the local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC, with the same substitution matrix s and d = -5:

           A    A    G
      0    0    0    0
G     0    0    0    2
A     0    2    2    0
A     0    2    4    0
G     0    0    0    6
G     0    0    0    2
C     0    0    0    0

The highest score, 6, lies in the fourth row (the G of GAAGGC that follows AA). Tracing back diagonally until a 0 is reached aligns AAG with the AAG inside GAAGGC.
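For local alignment, traceback stops at the first 0 rather than at the corner. The minimal Python sketch below works under the same assumptions (slides' table, d = -5, hypothetical names). When moves tie, it prefers diagonal, then vertical, then horizontal, so it returns one optimal local alignment rather than all of them.

```python
# Substitution scores from the slides' nucleotide table (symmetric).
SCORES = {
    ("A", "A"): 2, ("A", "C"): -7, ("A", "G"): -5, ("A", "T"): -7,
    ("C", "C"): 2, ("C", "G"): -7, ("C", "T"): -5,
    ("G", "G"): 2, ("G", "T"): -7,
    ("T", "T"): 2,
}


def s(a, b):
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]


def local_matrix(left, top, d):
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0, F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_cell = F[i][j], (i, j)
    return F, best, best_cell


def local_align(left, top, d=-5):
    """Return (score, top_row, left_row) for one optimal local alignment."""
    F, best, (i, j) = local_matrix(left, top, d)
    top_row, left_row = [], []
    while F[i][j] > 0:  # a positive cell always has i > 0 and j > 0
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            top_row.append(top[j - 1])
            left_row.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top_row.append("-")  # vertical: gap in the top sequence
            left_row.append(left[i - 1])
            i -= 1
        else:
            top_row.append(top[j - 1])  # horizontal: gap in the left sequence
            left_row.append("-")
            j -= 1
    return best, "".join(reversed(top_row)), "".join(reversed(left_row))


print(local_align("GAAGGC", "AAG"))
print(local_align("AGC", "AAG"))
```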
End-Space Free Alignment
Any number of indel operations at the beginning or at the end of the alignment contributes zero weight. In the example below, the leading gaps in X and the trailing gaps in Y are free:
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, F(i, 0) = 0 and F(0, j) = 0.
Recurrence relation:

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$

Search for i such that $F(i,m) = \max_{1 \le i \le n} F(i,m)$.
Search for j such that $F(n,j) = \max_{1 \le j \le m} F(n,j)$.
Define the alignment score as

$$\max\left\{ \max_{1 \le i \le n} F(i,m),\ \max_{1 \le j \le m} F(n,j) \right\}$$
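The base conditions and the final search over the last row and column translate directly into code. Below is a hedged Python sketch. `end_space_free_score` is a hypothetical name, the defaults match the exercise's scoring (match = 1, mismatch = -1, gap = -1), and both sequences are assumed to be non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """End-space free alignment score of non-empty sequences x and y."""
    n, m = len(x), len(y)
    # Base conditions F(i, 0) = F(0, j) = 0: leading gaps cost nothing
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub, F[i - 1][j] + d, F[i][j - 1] + d)
    # Trailing gaps are free: best cell in the last column or the last row
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("AAACCC", "CCCGGG"))
print(end_space_free_score("cactgtac", "gacacttg"))
```

The first call shows the intent: the overlap CCC is scored, and the unmatched AAA and GGG hang off the ends for free. Global alignment would charge those end gaps.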
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions to think about
• Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
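The second question has a standard dynamic-programming solution, sketched here in Python. It can be read as global alignment with match = 1 and with mismatches and gaps scoring 0. `lcs` is a hypothetical name, and when several LCSs exist the traceback returns just one of them.

```python
def lcs(x, y):
    """Longest common subsequence of x and y by dynamic programming."""
    n, m = len(x), len(y)
    # L[i][j] = length of an LCS of x[:i] and y[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from the corner to recover one LCS
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))
print(lcs("ABCBDAB", "BDCABA"))
```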
Affine gap penalty
LETVGYW----L
The four-position gap is scored -5 for opening it and -1 for each of the three extensions (-5, -1, -1, -1).
• Separate penalties for gap opening and gap extension.
• This requires modifying the DP algorithm.
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
• a gap may be due to a single mutational event that inserted or deleted a stretch of characters;
• separated gaps are probably due to distinct mutational events.
A linear gap penalty function treats these cases the same. It is more common to use gap penalty functions involving two terms:
• a penalty h associated with opening a gap;
• a smaller penalty g for extending the gap.
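A tiny Python sketch makes the contrast concrete by comparing one gap of length 4 with four separate gaps of length 1. It follows the convention of the LETVGYW----L example: the first gap position costs h = -5 and each further position costs g = -1. Some texts instead charge h + k·g for a gap of length k. Both function names are hypothetical.

```python
def linear_gap(k, d=-5):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap(k, h=-5, g=-1):
    """Affine penalty: opening costs h, each further position costs g."""
    return 0 if k == 0 else h + (k - 1) * g


# one gap of length 4 versus four separate gaps of length 1
print("affine:", affine_gap(4), "vs", 4 * affine_gap(1))
print("linear:", linear_gap(4), "vs", 4 * linear_gap(1))
```

Under the affine function the single long gap is much cheaper. The linear function cannot tell the two cases apart.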
[Figure: gap penalty functions]

Dynamic Programming for the Affine Gap Penalty Case
We need 3 matrices instead of 1.
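The three matrices can be sketched in Python as a Gotoh-style global alignment:
• M tracks alignments ending in a residue pair.
• Ix tracks alignments ending with x_i against a gap.
• Iy tracks alignments ending with y_j against a gap.

Following the slides' -5, -1, -1, -1 example, the sketch assumes that the first gap position costs h and each further position costs g. It also allows a gap in one sequence to follow a gap in the other directly, which some textbooks forbid. The function name is hypothetical.

```python
NEG = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score with affine gaps, using three DP matrices."""
    n, m = len(x), len(y)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]   # ends with x_i aligned to y_j
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with x_i against a gap
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with y_j against a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # leading gap in y
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # leading gap in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # either open a new gap (h) or extend the current one (g)
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("ACGTACGT", "ACGT"))
print(affine_global_score("ACGTACGT", "ACGT", h=-1, g=-1))  # h == g: linear penalty
```

With h = g the three matrices collapse back to the single-matrix linear case. That is why the extra bookkeeping is needed only for the affine case.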
[Figure: worked example of the affine-gap DP with match = 1, mismatch = -1]
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word (k-tuple) methods
• FASTA
• BLAST
Dot matrix with noise reduction

Dot matrix
To improve the visualisation of identical regions between sequences, we use sliding windows. Instead of writing down a dot for every character that is common to both sequences, we compare a number of positions (the window size). We write down a dot whenever at least a minimum number of those positions (the stringency) hold identical characters.
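The window and stringency idea fits in a few lines of Python. The sketch assumes identity scoring only, with no substitution matrix. It also places the dot at the first position of each window, although some programs place it at the window centre. `dot_plot` and `show` are hypothetical names. The example sequence, ABRACADABRACAD, comes from the examples slide.

```python
def dot_plot(a, b, window=1, stringency=1):
    """Boolean dot matrix: True where a window of `a` and a window of `b`
    share at least `stringency` identical positions (dot at window start)."""
    return [
        [
            sum(a[i + k] == b[j + k] for k in range(window)) >= stringency
            for j in range(len(b) - window + 1)
        ]
        for i in range(len(a) - window + 1)
    ]


def show(a, b, grid):
    """Print the dot matrix with `a` down the side and `b` across the top."""
    print("  " + b)
    for ch, row in zip(a, grid):
        print(ch + " " + "".join("*" if dot else "." for dot in row))


seq = "ABRACADABRACAD"
show(seq, seq, dot_plot(seq, seq))                            # window 1: noisy
show(seq, seq, dot_plot(seq, seq, window=3, stringency=3))    # filtered
```

Raising the window and stringency removes isolated single-character dots. The main diagonal and the parallel diagonal from the repeated ABRACAD remain.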
Dot matrix
Caution is necessary when choosing the window size and the stringency value. They generally take different values for different problems, and well-chosen values will accentuate the regions of similarity between the two sequences.
• For DNA sequences, typically: sliding window = 15, stringency = 10
• For protein sequences: sliding window = 2 or 3, stringency = 2
Things to be considered
• Scoring matrix for distance correction
• Window size
• Threshold
The usefulness of dot plots
• Regions of similarity: diagonals
• Insertions/deletions: gaps; can determine intron/exon structure
• Repeats: parallel diagonals
• Inverted repeats: perpendicular diagonals; can be used to determine regions of base pairing in RNA molecules
Intra-sequence comparison
• Repeats
• Inverted repeats
• Low complexity

Examples
ABRACADABRACAD
Palindrome
Sequence: ATOYOTA
Repeats
Drosophila melanogaster SLIT protein against itself
Low complexity
Inter-sequence comparison
• Conserved domains
• Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
Conserved domains

Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
Comparing the reverse complement of an RNA sequence with the sequence itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known. The illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights, even without complex folding algorithms.
Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself
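The reverse-complement comparison can be sketched in Python. The tRNA-Phe sequence itself is not reproduced in these slides, so the example uses a made-up 12-nt hairpin (GGAC...GUCC) purely for illustration. The sketch assumes Watson-Crick pairs only (A-U, G-C) and ignores G-U wobble pairs. `reverse_complement` and `stem_hits` are hypothetical names. Each window of the sequence that matches a window of its own reverse complement marks a candidate stem, and a hairpin stem shows up twice, once from each side.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick pairs only


def reverse_complement(rna):
    return "".join(PAIR[base] for base in reversed(rna))


def stem_hits(rna, window=4, stringency=4):
    """(i, j) where rna[i:i+window] matches reverse_complement(rna)[j:j+window]."""
    rc = reverse_complement(rna)
    hits = []
    for i in range(len(rna) - window + 1):
        for j in range(len(rc) - window + 1):
            same = sum(rna[i + k] == rc[j + k] for k in range(window))
            if same >= stringency:
                hits.append((i, j))
    return hits


hairpin = "GGACUUUUGUCC"  # hypothetical toy hairpin, not tRNA-Phe
n = len(hairpin)
for i, j in stem_hits(hairpin):
    # rc[j + k] is the complement of hairpin[n - 1 - (j + k)]
    print(f"bases {i}-{i + 3} pair with bases {n - 1 - j}-{n - 4 - j}")
```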
Programs for Dot Matrix
• Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
• SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
• Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
• COMPARE, DOTPLOT in GCG
Conclusion
Advantages: dot plots readily reveal insertions/deletions and direct and inverted repeats, which are more difficult to find with the other, more automated methods. They let your eyes and brain do the work, which is very efficient.
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Dot matrix with noise reductionDot matrix with noise reduction
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
To improve visualisation of identical regions among sequences we use sliding windows Instead of writing down a dot for every character that is common in both sequences
We compare a number of positions (window size) and we write down a dot whenever there is minimum number (stringency) of identical characters
Presented By Liu QiPresented By Liu Qi
Dot matrixDot matrix
Caution is necessary regarding the window size and the stringency value Generally they assume different values for different problems The optimal values will accent the regions of similarity of the two sequences
1048698 For DNA sequence usually1048698 Sliding window=15 stringency=10
1048698 For Protein sequence1048698 Sliding window=2 or 3 stringency=2
Presented By Liu QiPresented By Liu Qi
Things to be consideredThings to be considered
Scoring matrix for distance correction
Window size Threshold
Presented By Liu QiPresented By Liu Qi
The useful of Dot plot The useful of Dot plot
Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps
Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals
Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base
pairing of RNA moleculespairing of RNA molecules
Presented By Liu QiPresented By Liu Qi
Intra-sequence comparisonIntra-sequence comparison
RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity
Presented By Liu QiPresented By Liu Qi
ABRACADABRACADABRACADABRACAD
ExamplesExamples
Presented By Liu QiPresented By Liu Qi
palindromepalindromeSequence ATOYOTA
Presented By Liu QiPresented By Liu Qi
RepeatsRepeats
Drosophila melanogaster SLIT protein against itself
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Even more can be done with RNA
Comparing the reverse complement of an RNA sequence with the sequence itself can often be very informative.
• Consider the following examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known; the illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
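The reverse-complement comparison can be sketched as follows. A diagonal run starting at (i, k) in the plot of an RNA sequence against its reverse complement means that bases i, i+1, ... can pair with bases j, j-1, ..., where j = n - 1 - k, i.e. a candidate stem. This is a minimal sketch under stated assumptions: only Watson-Crick pairs are considered (no G-U wobble), `stems` is a hypothetical helper, and the toy hairpin below is illustrative, not the tRNA-Phe sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick pairs only


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def stems(rna: str, min_len: int = 3) -> list[tuple[int, int, int]]:
    """Candidate stems (i, j, length): rna[i + t] can pair with rna[j - t]."""
    rc = reverse_complement(rna)
    n = len(rna)
    found = []
    for i in range(n):
        for k in range(n):
            # Skip mismatches and cells that continue an earlier diagonal run.
            if rna[i] != rc[k] or (i > 0 and k > 0 and rna[i - 1] == rc[k - 1]):
                continue
            length = 0
            while i + length < n and k + length < n and rna[i + length] == rc[k + length]:
                length += 1
            j = n - 1 - k  # position in rna that pairs with rna[i]
            # Keep each stem once (i < j) and require the two arms not to overlap.
            if length >= min_len and i + length - 1 < j - length + 1:
                found.append((i, j, length))
    return found


hairpin = "GGGAAAACCC"  # hypothetical toy hairpin
print(reverse_complement(hairpin))
print(stems(hairpin))
```

For the toy hairpin the sketch reports one stem, (0, 9, 3): GGG at positions 0-2 pairs with CCC at positions 9-7, closing the AAAA loop.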
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
Programs for Dot Matrix
Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html
Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
COMPARE and DOTPLOT in GCG
Conclusion
Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats, which are more difficult to find with the other, more automated methods.
Lets your eyes/brain do the work: VERY EFFICIENT
Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how 'optimal' a given alignment is.
Dynamic Programming
Answers the question: what is the optimal alignment (the one with the best score) of two sequences?
How many different alignments are there?
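One way to see why exhaustive enumeration is hopeless: every path from the top-left to the bottom-right corner of the DP matrix (using diagonal, vertical and horizontal steps) is one alignment, so the number of alignments obeys the same three-way recurrence as the DP itself (these counts are the Delannoy numbers). The sketch assumes that alignments differing only in the order of adjacent gaps count as distinct; `count_alignments` is a hypothetical name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of monotone paths through an (m+1) x (n+1) DP grid,
    i.e. gapped alignments of sequences of lengths m and n."""
    if m == 0 or n == 0:
        return 1  # the remaining residues can only be aligned against gaps
    return (count_alignments(m - 1, n - 1)   # diagonal: residue against residue
            + count_alignments(m - 1, n)     # vertical: residue against a gap
            + count_alignments(m, n - 1))    # horizontal: gap against a residue


for length in (1, 2, 3, 10, 50):
    print(length, count_alignments(length, length))
```

Two sequences of length 3 already admit 63 alignments, and the count grows exponentially (for two 50-residue sequences it has more than 30 digits), whereas the DP only has to fill (m + 1)(n + 1) cells.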
Alignment methods with DP
Global alignment - Needleman-Wunsch (1970): maximizes the number of matches (more precisely, the alignment score) between the sequences along their entire length.
Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
[Figure: weighted graph with nodes A, B, C, D, E, F]
Exercise
Conditions for applying dynamic programming
Optimal substructure: every sub-policy of an optimal policy is itself optimal
No after-effect: the states of earlier stages cannot directly influence future decisions
Trading space for time (overlapping subproblems)
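DP Algorithm for Global Alignment
Two sequences $X = x_1 \ldots x_n$ and $Y = y_1 \ldots y_m$.
Let $F(i, j)$ be the optimal alignment score of $x_1 \ldots x_i$ and $y_1 \ldots y_j$ ($0 \le i \le n$, $0 \le j \le m$). Then
$$F(i, j) = \max\begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases} \qquad F(0, 0) = 0$$
On the first row and column only the gap term applies, so $F(i, 0) = i\,d$ and $F(0, j) = j\,d$.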
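The recurrence can be turned into code almost line by line. A minimal sketch, assuming the substitution matrix of the worked example and a negative gap score d that is added (as in the recurrence); `global_matrix`, `S` and `S_ROWS` are hypothetical names, and Python's 0-based `x[i - 1]` stands for $x_i$.

```python
BASES = "ACGT"
# Substitution scores s(a, b) from the worked example (rows/columns A, C, G, T).
S_ROWS = [[2, -7, -5, -7],
          [-7, 2, -7, -5],
          [-5, -7, 2, -7],
          [-7, -5, -7, 2]]
S = {(a, b): S_ROWS[r][c] for r, a in enumerate(BASES) for c, b in enumerate(BASES)}


def global_matrix(x: str, y: str, s, d: int) -> list[list[int]]:
    """Fill F where F[i][j] is the best global score of x[:i] against y[:j].

    s(a, b) is the substitution score; d is the (negative) linear gap score.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]          # F(0, 0) = 0
    for i in range(1, n + 1):
        F[i][0] = i * d                                # x[:i] against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * d                                # y[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i with y_j
                          F[i - 1][j] + d,                         # x_i with a gap
                          F[i][j - 1] + d)                         # y_j with a gap
    return F


# Rows follow AGC (left sequence), columns follow AAG (top sequence), d = -5.
for row in global_matrix("AGC", "AAG", lambda a, b: S[(a, b)], d=-5):
    print(row)
```

With these inputs the sketch reproduces the table of the worked example, ending in F(3, 3) = -6. Passing a different `s` (for example `lambda a, b: 2 if a == b else -1`) and `d` lets the same function check the exercises.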
DP in equation form
F(i-1, j-1) --(+ s(x_i, y_j))--> F(i, j)
F(i-1, j) --(+ d)--> F(i, j)
F(i, j-1) --(+ d)--> F(i, j)
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix s below.

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

Step 1: set F(0, 0) = 0.
Step 2: fill the first row and column with multiples of d (0, -5, -10, -15).
Step 3: fill the remaining cells with the recurrence.

              A     A     G
        0    -5   -10   -15
A      -5     2    -3    -8
G     -10    -3    -3    -1
C     -15    -8    -8    -6
Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Cells on the traceback paths:

              A     A     G
        0    -5
A             2    -3
G                        -1
C                        -6

Two optimal alignments (score -6):
AAG-     AAG-
-AGC     A-GC
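The traceback rules above can be written as a recursive search that follows every predecessor cell whose value explains F(i, j), which recovers both optimal alignments. A minimal sketch, assuming the example's substitution matrix and d = -5; `all_global_alignments` is a hypothetical name, and the number of co-optimal paths can grow exponentially for longer sequences.

```python
BASES = "ACGT"
# Substitution scores s(a, b) from the worked example (rows/columns A, C, G, T).
S_ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {(a, b): S_ROWS[r][c] for r, a in enumerate(BASES) for c, b in enumerate(BASES)}


def all_global_alignments(top: str, left: str, d: int = -5):
    """Needleman-Wunsch; rows follow `left`, columns follow `top` (as in the table)."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + S[(left[i - 1], top[j - 1])],
                          F[i - 1][j] + d, F[i][j - 1] + d)

    def trace(i, j):
        """Yield (aligned_top, aligned_left) for every optimal path from (0, 0) to (i, j)."""
        if i == 0 and j == 0:
            yield "", ""
            return
        # Diagonal move: one character from each sequence.
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + S[(left[i - 1], top[j - 1])]:
            for t, l in trace(i - 1, j - 1):
                yield t + top[j - 1], l + left[i - 1]
        # Horizontal move: gap in the left sequence.
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            for t, l in trace(i, j - 1):
                yield t + top[j - 1], l + "-"
        # Vertical move: gap in the top sequence.
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            for t, l in trace(i - 1, j):
                yield t + "-", l + left[i - 1]

    return F[n][m], list(trace(n, m))


score, alignments = all_global_alignments("AAG", "AGC")
print(score)
for aligned_top, aligned_left in alignments:
    print(aligned_top)
    print(aligned_left)
    print()
```

The sketch returns score -6 and the two alignments AAG-/-AGC and AAG-/A-GC; they differ only at cell (1, 2), where both the diagonal and the horizontal predecessor give -3.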
Exercise
Find the global alignment of X = catgt and Y = acgctg.
Scores: d = -1, mismatch = -1, match = 2
Answer
Local alignment
A single-domain protein may be homologous to a region within a multi-domain protein.
Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix and d is the linear gap penalty.
$$F(i, j) = \max\begin{cases} 0 \\ F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases} \qquad F(0, 0) = 0$$
Local DP in equation form
F(i-1, j-1) --(+ s(x_i, y_j))--> F(i, j)
F(i-1, j) --(+ d)--> F(i, j)
F(i, j-1) --(+ d)--> F(i, j)
or F(i, j) = 0
Local alignment
Two differences with respect to global alignment:
- No score is negative.
- Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
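Both differences show up directly in code: a 0 option in the maximum, and a traceback that starts at the best cell and stops at the first 0. A minimal Smith-Waterman sketch, assuming the substitution matrix of the worked examples and d = -5; `smith_waterman` is a hypothetical name, and when scores tie it reports only one alignment (the first maximal cell in row order, preferring diagonal, then horizontal, then vertical moves).

```python
BASES = "ACGT"
# Substitution scores s(a, b) from the worked examples (rows/columns A, C, G, T).
S_ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {(a, b): S_ROWS[r][c] for r, a in enumerate(BASES) for c, b in enumerate(BASES)}


def smith_waterman(top: str, left: str, d: int = -5):
    """Local alignment; rows follow `left`, columns follow `top` (as in the tables)."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,  # start a new local alignment at this cell
                          F[i - 1][j - 1] + S[(left[i - 1], top[j - 1])],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:  # remember the first cell holding the maximum
                best, best_cell = F[i][j], (i, j)

    # Traceback from the highest score; stop as soon as a 0 is reached.
    i, j = best_cell
    top_out, left_out = [], []
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + S[(left[i - 1], top[j - 1])]:
            top_out.append(top[j - 1])
            left_out.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i][j - 1] + d:   # horizontal: gap in the left sequence
            top_out.append(top[j - 1])
            left_out.append("-")
            j -= 1
        else:                              # vertical: gap in the top sequence
            top_out.append("-")
            left_out.append(left[i - 1])
            i -= 1
    return best, "".join(reversed(top_out)), "".join(reversed(left_out)), F


print(smith_waterman("AAG", "AGC")[:3])     # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC")[:3])  # (6, 'AAG', 'AAG')
```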
A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5 (substitution matrix s as in the global example).
Step 1: set the first row and column to 0.
Step 2: fill the remaining cells with the local recurrence (no cell is negative).

              A     A     G
        0     0     0     0
A       0     2     2     0
G       0     0     0     4
C       0     0     0     0

Tracing back from the highest score (4) until a 0 is reached gives the optimal local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

              A     A     G
        0     0     0     0
G       0     0     0     2
A       0     2     2     0
A       0     2     4     0
G       0     0     0     6
G       0     0     0     2
C       0     0     0     0

The highest score, 6, traces back diagonally to 0 and gives the local alignment AAG / AAG.
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contributes zero weight, for example:
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
End-Space Free Alignment
Base conditions: $F(i, 0) = 0$ and $F(0, j) = 0$ for all $i, j$.
Recurrence relation:
$$F(i, j) = \max\begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$
Search for $i$ such that $F(i, m) = \max_{1 \le i' \le n} F(i', m)$.
Search for $j$ such that $F(n, j) = \max_{1 \le j' \le m} F(n, j')$.
Define the alignment score as $\max\{F(n, j), F(i, m)\}$.
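Compared with global alignment, only the base conditions and the choice of the end cell change. A minimal sketch, assuming simple match/mismatch scoring with a linear gap score (the parameters of the exercise below); `end_space_free` is a hypothetical name, x runs down the rows and y across the columns, and it returns the score together with the end cell.

```python
def end_space_free(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """End-space free alignment score and the (i, j) cell where it ends."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F(i, 0) = F(0, j) = 0: leading gaps free
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trailing gaps are free: the alignment may end anywhere on the last column or row.
    best_i = max(range(1, n + 1), key=lambda i: F[i][m])
    best_j = max(range(1, m + 1), key=lambda j: F[n][j])
    if F[best_i][m] >= F[n][best_j]:
        return F[best_i][m], (best_i, m)
    return F[n][best_j], (n, best_j)


print(end_space_free("cactgtac", "gacacttg"))
```

For X = cactgtac and Y = gacacttg this gives score 4 ending at cell (5, 8), which matches the example alignment shown earlier: after g/g the trailing "tac" of X is aligned against free end gaps.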
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions to consider
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
Affine gap penalty
LETVGYW----L
Scores of the four gap positions: -5 -1 -1 -1
Separate penalties for gap opening and gap extension
This requires modifying the DP algorithm
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters;
- separated gaps are probably due to distinct mutational events.
A linear gap penalty function treats these cases the same. It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap;
- a smaller penalty g for extending the gap.
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Needs 3 matrices instead of 1
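A minimal three-matrix sketch (Gotoh-style) for global alignment with affine gaps. Assumptions, since the slides' own formulas are not preserved here: a gap of length k scores `gap_open + (k - 1) * gap_extend`, following the LETVGYW----L example (-5 -1 -1 -1); the matrix names M, X, Y and the function name `affine_global` are this sketch's own; and a gap in one sequence may be followed directly by a gap in the other (some formulations forbid that transition).

```python
NEG_INF = float("-inf")


def affine_global(x: str, y: str, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Global alignment score with affine gaps, using three DP matrices.

    M[i][j]: best score with x[i-1] aligned to y[j-1]
    X[i][j]: best score ending with x[i-1] aligned to a gap
    Y[i][j]: best score ending with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend  # leading gap in y
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend  # leading gap in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,    # open a new gap
                          X[i - 1][j] + gap_extend,  # extend the current gap
                          Y[i - 1][j] + gap_open)    # switch from a gap in x
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global("AT", "AGGGT"))
```

For AT against AGGGT the best alignment keeps the three missing residues in a single gap (A---T over AGGGT): 1 + 1 + (-5 - 1 - 1) = -5. Splitting them into separate gaps would pay the opening penalty more than once.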
Example: match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word (k-tuple) methods
FASTA
BLAST
Dot matrix
To improve the visualization of identical regions between sequences, we use sliding windows. Instead of writing down a dot for every character that is common to both sequences, we compare a number of positions (the window size) and write down a dot whenever there is a minimum number (the stringency) of identical characters.
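The window/stringency filter described above can be sketched in a few lines. Assumptions: the two windows are compared position by position, and the dot is placed at the start of the window (some programs place it at the window centre instead); `windowed_dots` is a hypothetical name.

```python
def windowed_dots(a: str, b: str, window: int, stringency: int) -> set[tuple[int, int]]:
    """Dot at (i, j) when a[i:i + window] and b[j:j + window] agree in at
    least `stringency` aligned positions."""
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            identities = sum(a[i + k] == b[j + k] for k in range(window))
            if identities >= stringency:
                dots.add((i, j))
    return dots


# Self-comparison of a sequence containing the repeat ACGT twice.
print(sorted(windowed_dots("ACGTACGT", "ACGTACGT", window=3, stringency=3)))
```

Raising the stringency removes isolated chance matches and keeps only diagonal runs (here the main diagonal plus the repeat at offset 4); lowering it, or shrinking the window, lets more noise through.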
Caution is necessary regarding the window size and the stringency value. They generally take different values for different problems, and well-chosen values accentuate the regions of similarity between the two sequences.
For DNA sequences, usually: sliding window = 15, stringency = 10
For protein sequences: sliding window = 2 or 3, stringency = 2
Things to be considered
Scoring matrix for distance correction
Window size
Threshold
Uses of the dot plot
Regions of similarity: diagonals
Insertions/deletions: gaps
Can determine intron/exon structure
Repeats: parallel diagonals
Inverted repeats: perpendicular diagonals
Inverted repeats can be used to determine regions of base pairing in RNA molecules
Presented By Liu QiPresented By Liu Qi
Intra-sequence comparisonIntra-sequence comparison
RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity
Presented By Liu QiPresented By Liu Qi
ABRACADABRACADABRACADABRACAD
ExamplesExamples
Presented By Liu QiPresented By Liu Qi
palindromepalindromeSequence ATOYOTA
Presented By Liu QiPresented By Liu Qi
RepeatsRepeats
Drosophila melanogaster SLIT protein against itself
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
The DP needs 3 matrices instead of 1.
Example scores: match = 1, mismatch = -1
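The three-matrix idea can be sketched as below. This follows the widely used Gotoh-style formulation and is not necessarily the exact form used on these slides; it assumes a gap of length k costs gap_open + (k - 1) * gap_extend, with defaults matching the -5/-1 gap example and the match = 1, mismatch = -1 scores. Function and matrix names are illustrative, and only the global score is computed.

```python
NEG_INF = float("-inf")


def affine_global_score(x: str, y: str, match: int = 1, mismatch: int = -1,
                        gap_open: int = -5, gap_extend: int = -1) -> float:
    """Global alignment score with affine gaps, using three DP matrices.

    M[i][j]: best score of x[:i] vs y[:j] ending with x_i aligned to y_j
    X[i][j]: ... ending with x_i aligned to a gap
    Y[i][j]: ... ending with y_j aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):   # x[:i] against one leading gap
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):   # y[:j] against one leading gap
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,    # open a new gap
                          X[i - 1][j] + gap_extend,  # extend the current gap
                          Y[i - 1][j] + gap_open)    # gap switches sequence
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("AAAA", "AA"))
```

For example, aligning AAAA with AA gives -4: two matches plus one gap of length 2 (-5 - 1), which beats two separate gaps (2 - 5 - 5 = -8).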
Exercise
Write the formulas for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
FASTA
BLAST
Dot matrix
Caution is necessary when choosing the window size and the stringency value. Suitable values generally differ from problem to problem; well-chosen values accentuate the regions of similarity between the two sequences.
• For DNA sequences, typically: sliding window = 15, stringency = 10
• For protein sequences: sliding window = 2 or 3, stringency = 2
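The window/stringency filter can be sketched as below. This is a minimal sketch, assuming plain identity scoring (no substitution matrix) and placing each dot at the start of its window, whereas some programs place it at the window centre; the function names are illustrative.

```python
def dot_plot(a: str, b: str, window: int, stringency: int) -> set[tuple[int, int]]:
    """Positions (i, j) where a dot is drawn in a sliding-window dot matrix.

    A dot is placed at (i, j) when a[i:i+window] and b[j:j+window] have at
    least `stringency` identical residues at corresponding positions.
    """
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            identities = sum(a[i + k] == b[j + k] for k in range(window))
            if identities >= stringency:
                dots.add((i, j))
    return dots


def render(a: str, b: str, dots: set[tuple[int, int]]) -> str:
    """Text picture of the matrix: rows follow a, columns follow b."""
    lines = ["  " + " ".join(b)]
    for i, residue in enumerate(a):
        cells = ("*" if (i, j) in dots else "." for j in range(len(b)))
        lines.append(residue + " " + " ".join(cells))
    return "\n".join(lines)


seq = "ABRACADABRACAD"
print(render(seq, seq, dot_plot(seq, seq, window=3, stringency=3)))
```

With window = 3 and stringency = 3, the self-comparison of ABRACADABRACAD shows the main diagonal plus parallel diagonals offset by 7 positions, the repeated ABRACAD.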
Things to be considered
• Scoring matrix for distance correction
• Window size
• Threshold
Uses of the dot plot
• Regions of similarity: diagonals
• Insertions/deletions: gaps
• Can determine intron/exon structure
• Repeats: parallel diagonals
• Inverted repeats: perpendicular diagonals
• Inverted repeats can be used to determine regions of base pairing in RNA molecules
Intra-sequence comparison
• Repeats
• Inverted repeats
• Low complexity
Examples
ABRACADABRACAD
Palindrome
Sequence: ATOYOTA
Repeats
Drosophila melanogaster SLIT protein against itself
Low complexity
Inter-sequence comparison
• Conserved domains
• Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
• Comparing the reverse complement of an RNA sequence with the sequence itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known; the illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
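The reverse-complement comparison can be sketched on a toy hairpin. This is a simplified sketch, assuming exact Watson-Crick matches within fixed-length windows (no G-U wobble pairs, no scoring matrix); the sequence GGGAAACCC is made up for illustration and is not the tRNA-Phe sequence, and the function names are illustrative.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick pairs only


def reverse_complement(rna: str) -> str:
    return "".join(PAIR[base] for base in reversed(rna))


def inverted_repeats(rna: str, window: int) -> list[tuple[int, int]]:
    """Pairs (i, j), i < j, where rna[i:i+window] can base-pair with rna[j:j+window].

    Each window of the sequence is compared with each window of its reverse
    complement; an exact match means the two stretches are complementary and
    could form a stem.
    """
    rc = reverse_complement(rna)
    length = len(rna)
    hits = []
    for i in range(length - window + 1):
        for p in range(length - window + 1):
            if rna[i:i + window] == rc[p:p + window]:
                # rc[p:p+window] is the reverse complement of rna[j:j+window]
                j = length - p - window
                if i < j:
                    hits.append((i, j))
    return hits


hairpin = "GGGAAACCC"  # toy stem-loop, not tRNA-Phe
print(reverse_complement(hairpin), inverted_repeats(hairpin, 3))
```

Here the GGG at the 5' end is reported as able to pair with the CCC at the 3' end, which is the stem of the toy hairpin.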
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
Programs for Dot Matrix
• Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
• SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
• Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
• COMPARE, DOTPLOT in GCG
Conclusion
Advantages
• Readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by the other, more automated methods.
• Lets your eyes and brain do the work, which is very efficient.
Disadvantages
• Most dot matrix computer programs do not show an actual alignment.
• They do not return a score to indicate how "optimal" a given alignment is.
Dynamic Programming
Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
Alignment methods with DP
• Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along their entire length.
• Local alignment: Smith-Waterman (1981), a modification of the dynamic programming algorithm, gives the highest-scoring local match between two sequences.
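The question "how many different alignments?" can itself be answered with a small DP, which shows why enumerating all alignments is hopeless. This sketch assumes two alignments differ whenever their columns differ (so x/- followed by -/y counts separately from -/y followed by x/-); under other conventions the counts are smaller but still grow exponentially. The function name is illustrative; the memoization mirrors the "overlapping subproblems" property of DP.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Number of distinct global alignments of sequences of lengths n and m.

    The last column is x_n/y_m, x_n/- or -/y_m, so the count obeys the same
    three-way recursion as the alignment DP matrix.
    """
    if n == 0 or m == 0:
        return 1  # only one way: remaining characters all face gaps
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


for k in (1, 2, 3, 10, 100):
    print(k, count_alignments(k, k))
```

Two sequences of length 10 already have 8,097,453 alignments, while the DP matrix for them has only 11 × 11 cells.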
Dynamic Programming
A simple example
[Figure: a weighted graph with nodes A, B, C, D, E and F]
Exercise
Conditions for applying dynamic programming
• Optimal substructure: every sub-policy of an optimal policy is itself optimal.
• No after-effect: the states of earlier stages cannot directly influence future decisions.
• Trading space for time (overlapping subproblems).
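DP Algorithm for Global Alignment
Two sequences $X = x_1 \ldots x_n$ and $Y = y_1 \ldots y_m$.
Let $F(i,j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$), where $s$ is the substitution matrix and $d$ the (negative) linear gap penalty.
$$F(0,0) = 0, \quad F(i,0) = i\,d, \quad F(0,j) = j\,d$$
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$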
DP in equation form
$F(i,j)$ is computed from three neighbouring cells: the diagonal cell via $F(i-1,j-1) + s(x_i, y_j)$, the cell above via $F(i-1,j) + d$, and the cell to the left via $F(i,j-1) + d$.
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix:

    A   C   G   T
A   2  -7  -5  -7
C  -7   2  -7  -5
G  -5  -7   2  -7
T  -7  -5  -7   2

Start with F(0,0) = 0 and fill in the first row and column (multiples of d):

          A    A    G
     0   -5  -10  -15
A   -5
G  -10
C  -15

Then fill in each remaining cell with $F(i,j) = \max\{F(i-1,j-1) + s(x_i,y_j),\ F(i-1,j) + d,\ F(i,j-1) + d\}$:

          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
Traceback
• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Cells on the traceback paths from -6 back to 0:

          A    A    G
     0   -5
A         2   -3
G                  -1
C                  -6

Both paths give an optimal alignment with score -6:
AAG-    AAG-
-AGC    A-GC
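The fill and traceback steps can be sketched together in Python. This is a minimal sketch, assuming the DNA substitution matrix and d = -5 from the example; x plays the role of the top sequence and y the left sequence, and the names (s, needleman_wunsch) are illustrative. When several moves tie, it prefers diagonal, then a gap in the left sequence, so it returns only one of the optimal alignments.

```python
BASES = "ACGT"
# Substitution matrix of the example (rows and columns in the order A, C, G, T)
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a: str, b: str) -> int:
    return SCORES[BASES.index(a)][BASES.index(b)]


def needleman_wunsch(x: str, y: str, d: int = -5) -> tuple[int, str, str]:
    """Global alignment with linear gap penalty d; returns (score, top row, left row)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Traceback from F(n, m) to F(0, 0); ties prefer diagonal, then a gap in y.
    top, left = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            top.append(x[i - 1])
            left.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            top.append(x[i - 1])
            left.append("-")
            i -= 1
        else:
            top.append("-")
            left.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(left))


score, top_row, left_row = needleman_wunsch("AAG", "AGC")
print(score)
print(top_row)
print(left_row)
```

With this tie-breaking order the sketch prints -6 followed by AAG- over -AGC; a different order would yield the other optimal alignment, AAG- over A-GC.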
Exercise
Find the global alignment of X = catgt and Y = acgctg.
Scores: d = -1, mismatch = -1, match = 2
Answer
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y: F is the DP matrix, s is the substitution matrix and d is the linear gap penalty.
$$F(0,0) = 0, \quad F(i,0) = F(0,j) = 0$$
$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Local DP in equation form
Each cell is computed from the same three neighbours as in the global case ($F(i-1,j-1) + s(x_i, y_j)$, $F(i-1,j) + d$, $F(i,j-1) + d$), plus a fourth option, 0.
Local alignment
Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
A simple example
Find the optimal local alignment of AAG and AGC, using the same substitution matrix as in the global example and a gap penalty of d = -5.
Initialize the first row and column to 0, then fill in the matrix (no cell may drop below 0):

          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0

Tracing back from the highest score (4) until a 0 is reached gives the local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC, with the same substitution matrix and a gap penalty of d = -5.

          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0

The highest score is 6; tracing back from it gives the local alignment AAG / AAG.
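The two local-alignment differences (a 0 floor, and a traceback from the maximum that stops at 0) can be sketched as follows. This is a minimal sketch, assuming the example's DNA substitution matrix and d = -5; names are illustrative, and if several cells share the maximum it simply traces back from the first one found.

```python
BASES = "ACGT"
# Substitution matrix of the examples (rows and columns in the order A, C, G, T)
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a: str, b: str) -> int:
    return SCORES[BASES.index(a)][BASES.index(b)]


def smith_waterman(x: str, y: str, d: int = -5) -> tuple[int, str, str]:
    """Local alignment with linear gap penalty d; returns (score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,  # start a new local alignment here
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_i, best_j = F[i][j], i, j
    # Traceback starts at the highest score and stops at the first 0.
    ax, ay = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("AAG", "AGC"))
print(smith_waterman("AAG", "GAAGGC"))
```

The two calls reproduce the worked examples: a score of 4 for AG/AG and a score of 6 for AAG/AAG.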
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Things to be consideredThings to be considered
Scoring matrix for distance correction
Window size Threshold
Presented By Liu QiPresented By Liu Qi
The useful of Dot plot The useful of Dot plot
Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps
Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals
Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base
pairing of RNA moleculespairing of RNA molecules
Presented By Liu QiPresented By Liu Qi
Intra-sequence comparisonIntra-sequence comparison
RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity
Presented By Liu QiPresented By Liu Qi
ABRACADABRACADABRACADABRACAD
ExamplesExamples
Presented By Liu QiPresented By Liu Qi
palindromepalindromeSequence ATOYOTA
Presented By Liu QiPresented By Liu Qi
RepeatsRepeats
Drosophila melanogaster SLIT protein against itself
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignment
Find the optimal local alignment of AAG and GAAGGC, using the same substitution matrix and a gap penalty of d = -5. Initialization:
           A    A    G
      0    0    0    0
G     0
A     0
A     0
G     0
G     0
C     0
Filled matrix:
           A    A    G
      0    0    0    0
G     0    0    0    2
A     0    2    2    0
A     0    2    4    0
G     0    0    0    6
G     0    0    0    2
C     0    0    0    0
The highest score is 6; tracing back from it gives the local alignment AAG / AAG.
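The two worked examples can be reproduced with a short Python sketch of the local recurrence (Smith-Waterman with a linear gap penalty). `smith_waterman` and `s_example` are illustrative names, not part of any library. `F[i][j]` indexes x by i, so the stored table is the transpose of the printed one, which does not change any score. When several optimal local alignments exist, the traceback (diagonal first) returns only one of them.

```python
# Substitution matrix from the example slides, rows and columns ordered A, C, G, T.
BASES = "ACGT"
ROWS = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s_example(a, b):
    """Score s(a, b) looked up in the example substitution matrix."""
    return ROWS[BASES.index(a)][BASES.index(b)]


def smith_waterman(x, y, s=s_example, d=-5):
    """Return (best_score, aligned_x, aligned_y) using a linear gap penalty d."""
    n, m = len(x), len(y)
    # First row and column stay 0: a local alignment may start anywhere.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                        # start a new alignment here
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i aligned with y_j
                F[i - 1][j] + d,                          # x_i against a gap
                F[i][j - 1] + d,                          # y_j against a gap
            )
            if F[i][j] > best:
                best, best_i, best_j = F[i][j], i, j
    # Trace back from the highest-scoring cell until a cell with score 0.
    ax, ay = [], []
    i, j = best_i, best_j
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("AAG", "AGC"))     # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC"))  # (6, 'AAG', 'AAG')
```

The only changes from a global aligner are the extra 0 in the maximum and the traceback that starts at the best cell and stops at the first 0.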
End-Space Free Alignment
Any number of indel operations at the beginning or at the end of the alignment contribute zero weight, e.g.:
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Base conditions, for all i and j:
$$F(i,0)=0,\qquad F(0,j)=0$$
Recurrence relation:
$$F(i,j)=\max\begin{cases}F(i-1,j-1)+s(x_i,y_j)\\F(i-1,j)+d\\F(i,j-1)+d\end{cases}$$
Search for the $i$ such that $F(i,m)=\max_{1\le i\le n}F(i,m)$ (best cell in the last column).
Search for the $j$ such that $F(n,j)=\max_{1\le j\le m}F(n,j)$ (best cell in the last row).
Define the alignment score as $\max\{F(i,m),\,F(n,j)\}$; the gaps after that cell are free.
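A minimal Python sketch of the end-space free score, assuming the simple match/mismatch/gap scoring used in the exercise that follows (the function name `end_space_free_score` is illustrative). It applies the base conditions and recurrence above and then takes the best value in the last row or last column; it assumes both sequences are non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, gap=-1):
    """Best alignment score when leading and trailing gaps cost nothing."""
    n, m = len(x), len(y)
    # Base conditions F(i, 0) = 0 and F(0, j) = 0: leading gaps are free.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trailing gaps are free: best cell in the last column or the last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))  # 4
```

On these sequences the best cell is F(5, 8) in the last column, which corresponds to the alignment shown above: five matches and one internal gap give 5 - 1 = 4.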
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y= g a c a c t t g
Questions to think about
- Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
- Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
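One possible answer to the LCS question, sketched in Python. The LCS length equals the best global alignment score when a match scores 1 and mismatches and gaps score 0, which reduces to the recurrence below; the traceback then recovers one LCS. The function name `lcs` is illustrative, and when several LCSs exist only one is returned.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from L[n][m], collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "GAAGGC"))  # AAG
```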
Affine gap penalty
LETVGYW----L   (penalties for the four gap positions: -5, -1, -1, -1)
Separate penalties for gap opening and gap extension.
This requires modifying the DP algorithm.
Affine gap penalty
- A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted or deleted a stretch of characters;
  - separated gaps are probably due to distinct mutational events.
- A linear gap penalty function treats these cases the same.
- It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap;
  - a smaller penalty g for extending the gap.
Gap penalty functions
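Reading the LETVGYW----L illustration, the first gap position costs -5 and each further position -1. Under that reading (an assumption about the convention; some texts charge h + g·k instead), the affine penalty for a gap of length k and the comparison with separate gaps work out as:

$$\gamma(k) = h + (k-1)\,g,\qquad h=-5,\; g=-1$$
$$\gamma(4) = -5 + 3\cdot(-1) = -8 \qquad\text{versus}\qquad 4\,\gamma(1) = 4\cdot(-5) = -20$$

A linear penalty $\gamma(k)=k\,d$ would give $4d$ in both cases, which is exactly the distinction the affine form is meant to capture.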
Dynamic Programming for the Affine Gap Penalty Case
Needs 3 matrices instead of 1.
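A sketch of the three-matrix idea for global alignment (the approach usually attributed to Gotoh), assuming a gap of length k costs h + (k - 1)·g as in the LETVGYW----L illustration and using match = 1 and mismatch = -1 as on the example slide. `M`, `Ix` and `Iy` are illustrative names: M holds the best score of alignments ending in an aligned pair, Ix those ending with x_i against a gap, and Iy those ending with y_j against a gap.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score; a gap of length k costs h + (k - 1) * g."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # ends with x_i aligned to y_j
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i against a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with y_j against a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # leading gap in y
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # leading gap in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Opening a gap costs h; extending the current gap costs g.
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("AGGT", "AT"))  # -4: two matches plus one gap of length 2 (-6)
```

With a linear penalty, one matrix suffices because the cost of a gap column does not depend on its neighbour; with the affine penalty, the extra matrices remember whether the previous column already had a gap.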
[Figure: example with match = 1, mismatch = -1]
Exercise
Write the formulas for "Local Alignment DP for the Affine Gap Penalty Case".
Word (k-tuple) methods
- FASTA
- BLAST
The usefulness of dot plots
- Regions of similarity: diagonals
- Insertions/deletions: gaps
- Can determine intron/exon structure
- Repeats: parallel diagonals
- Inverted repeats: perpendicular diagonals
- Inverted repeats can be used to determine regions of base pairing in RNA molecules
Intra-sequence comparison
- Repeats
- Inverted repeats
- Low complexity
Examples
ABRACADABRACAD
Palindrome
Sequence: ATOYOTA
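A minimal dot-matrix sketch in Python (the names `dot_matrix` and `render` are illustrative). A dot is placed at (i, j) when the word of length `word` starting at position i of the first sequence equals the word starting at position j of the second; longer words filter out isolated chance matches. Comparing ABRACADABRACAD with itself using words of length 3 gives the main diagonal plus two parallel diagonals offset by 7 (the direct repeat ABRACAD), and comparing the palindrome ATOYOTA with itself gives a perpendicular diagonal.

```python
def dot_matrix(a, b, word=1):
    """Return the set of (i, j) where a[i:i+word] == b[j:j+word]."""
    hits = set()
    for i in range(len(a) - word + 1):
        for j in range(len(b) - word + 1):
            if a[i:i + word] == b[j:j + word]:
                hits.add((i, j))
    return hits


def render(a, b, hits):
    """Text dot plot: rows follow a, columns follow b; '*' marks a hit."""
    lines = ["  " + b]
    for i, ch in enumerate(a):
        cells = "".join("*" if (i, j) in hits else "." for j in range(len(b)))
        lines.append(ch + " " + cells)
    return "\n".join(lines)


# Direct repeat: parallel diagonals offset by 7 (ABRACAD occurs twice).
seq = "ABRACADABRACAD"
print(render(seq, seq, dot_matrix(seq, seq, word=3)))

# Palindrome: the self-comparison contains a perpendicular (anti-)diagonal.
pal = "ATOYOTA"
print(render(pal, pal, dot_matrix(pal, pal)))
```

Only the start of each matching word is marked; dedicated tools such as those listed later typically use a sliding window with a match threshold instead of exact word matches.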
Repeats
[Figure: dot plot of the Drosophila melanogaster SLIT protein against itself]
Low complexity
Inter-sequence comparison
- Conserved domains
- Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
- Comparing the reverse complement of an RNA sequence with the sequence itself can often be very informative.
- Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule of baker's yeast.
- The sequence and structure of this molecule are known, so the illustrations show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
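A sketch of the reverse-complement self-comparison, assuming only Watson-Crick pairs (A-U, G-C); real tRNA stems also contain G-U wobble pairs, which this simplification ignores. `reverse_complement_rna` and `stem_hits` are hypothetical helper names, the word length of 4 is an arbitrary choice, and the test sequence is a toy hairpin rather than tRNA-Phe. A hit (i, j) marks a stretch starting at i that can pair with the stretch whose reverse complement starts at j, so the stems of a hairpin appear as diagonal runs.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick pairs only


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (G-U wobble pairs are ignored)."""
    return "".join(PAIR[base] for base in reversed(seq))


def stem_hits(seq, word=4):
    """Dot-matrix hits between seq and its own reverse complement.

    A hit (i, j) means seq[i:i+word] can pair, antiparallel, with
    seq[n-j-word:n-j], where n = len(seq).
    """
    rc = reverse_complement_rna(seq)
    hits = set()
    for i in range(len(seq) - word + 1):
        for j in range(len(rc) - word + 1):
            if seq[i:i + word] == rc[j:j + word]:
                hits.add((i, j))
    return hits


hairpin = "GGGGAAAACCCC"  # toy hairpin: GGGG pairs with CCCC around an AAAA loop
print(sorted(stem_hits(hairpin)))  # [(0, 0), (8, 8)]
```

Each stem shows up twice, once from each strand, because the comparison is symmetric.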
Programs for Dot Matrix
- Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
- SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
- Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
- COMPARE and DOTPLOT in GCG
Conclusion
Advantages:
- Readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by other, more automated methods.
- Lets your eyes and brain do the work, which is very efficient.
Disadvantages:
- Most dot matrix programs do not show an actual alignment.
- They do not return a score to indicate how "optimal" a given alignment is.
Dynamic Programming
Question answered: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are possible?
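The number of possible alignments explains why exhaustive search is impractical. A short sketch counts global alignments of sequences of lengths n and m, assuming every column is either a residue pair or a residue against a gap (so X-/-Y and -X/Y- count as different alignments); these counts are the Delannoy numbers. They grow exponentially, whereas the DP table has only (n + 1)(m + 1) cells. The function name is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of global alignments of sequences of lengths n and m."""
    if n == 0 or m == 0:
        return 1  # the remaining residues can only be aligned against gaps
    return (count_alignments(n - 1, m - 1)  # last column: x_n with y_m
            + count_alignments(n - 1, m)    # last column: x_n with a gap
            + count_alignments(n, m - 1))   # last column: y_m with a gap


for k in (1, 2, 3, 10):
    print(k, count_alignments(k, k))  # 3, 13, 63 for k = 1, 2, 3
```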
Alignment methods with DP
- Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along their entire length.
- Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
[Figure: weighted graph with nodes A to F]
Exercise
Conditions for applying dynamic programming
- Optimal substructure: every sub-policy of an optimal policy is itself optimal.
- No after-effect: the states of earlier stages cannot directly affect future decisions.
- Trading space for time: subproblems overlap, so their stored solutions are reused instead of recomputed.
DP Algorithm for Global Alignment
Two sequences $X = x_1 \ldots x_n$ and $Y = y_1 \ldots y_m$.
Let $F(i,j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$):
$$F(i,j)=\max\begin{cases}F(i-1,j-1)+s(x_i,y_j)\\F(i-1,j)+d\\F(i,j-1)+d\end{cases}$$
with $F(0,0)=0$, $F(i,0)=i\,d$ and $F(0,j)=j\,d$.
DP in equation form
F(i-1, j-1)                 F(i-1, j)
           ↘ s(x_i, y_j)        ↓ d
F(i, j-1)        → d →      F(i, j)
F(i, j) is the maximum of the three incoming values.
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix s below; AAG is written across the top of the DP matrix and AGC down the left.
      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2
Step 1: set F(0, 0) = 0 and fill the first row and column with multiples of d.
           A    A    G
      0   -5  -10  -15
A    -5
G   -10
C   -15
Step 2: fill in the remaining cells with the recurrence.
           A    A    G
      0   -5  -10  -15
A    -5    2   -3   -8
G   -10   -3   -3   -1
C   -15   -8   -8   -6
The optimal global alignment score is F(3, 3) = -6.
Traceback
- Start from the lower right corner and trace back to the upper left.
- Each arrow introduces one character at the end of each aligned sequence.
- A horizontal move puts a gap in the left sequence.
- A vertical move puts a gap in the top sequence.
- A diagonal move uses one character from each sequence.
A simple example: traceback
Cells on the optimal traceback paths from F(3, 3) = -6 back to F(0, 0):
           A    A    G
      0   -5
A          2   -3
G                   -1
C                   -6
The two paths give two optimal global alignments, each scoring -6 (top: AAG, bottom: AGC):
AAG-     AAG-
-AGC     A-GC
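The fill and traceback steps above can be sketched in Python (`needleman_wunsch` and `s_example` are illustrative names). Here x is the sequence written across the top (AAG) and y the one down the left (AGC); `F[i][j]` is stored with i indexing x, i.e. the transpose of the printed table, so a step that consumes only x_i is the slide's horizontal move (gap in the left sequence). Ties are broken diagonal first, which yields the first of the two optimal alignments.

```python
# Substitution matrix from the example slides, rows and columns ordered A, C, G, T.
BASES = "ACGT"
ROWS = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s_example(a, b):
    """Score s(a, b) looked up in the example substitution matrix."""
    return ROWS[BASES.index(a)][BASES.index(b)]


def needleman_wunsch(x, y, s=s_example, d=-5):
    """Return (score, aligned_x, aligned_y) for an optimal global alignment."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d  # x_1..x_i aligned against gaps
    for j in range(1, m + 1):
        F[0][j] = j * d  # y_1..y_j aligned against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Traceback from F(n, m) to F(0, 0); ties prefer the diagonal, then x_i vs gap.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])  # x_i against a gap in y
            ay.append("-")
            i -= 1
        else:
            ax.append("-")       # y_j against a gap in x
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("AAG", "AGC"))  # (-6, 'AAG-', '-AGC')

# The exercise's scoring (match = 2, mismatch = -1, d = -1) via a different s.
print(needleman_wunsch("catgt", "acgctg", s=lambda a, b: 2 if a == b else -1, d=-1))
```

Passing the scoring function as a parameter keeps the same code usable for the substitution-matrix example and for the simple match/mismatch scheme of the exercise.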
Exercise
Find the global alignment of X = catgt and Y = acgctg.
Scores: d = -1, mismatch = -1, match = 2.
Answer
Local alignment
- A single-domain protein may be homologous to a region within a multi-domain protein.
- Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y: F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.
$$F(i,j)=\max\begin{cases}0\\F(i-1,j-1)+s(x_i,y_j)\\F(i-1,j)+d\\F(i,j-1)+d\end{cases}$$
with $F(0,0)=0$ and, more generally, $F(i,0)=F(0,j)=0$.
Local DP in equation form
F(i-1, j-1)                 F(i-1, j)
           ↘ s(x_i, y_j)        ↓ d
F(i, j-1)        → d →      F(i, j)
F(i, j) is the maximum of the three incoming values and 0.
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Intra-sequence comparisonIntra-sequence comparison
RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity
Presented By Liu QiPresented By Liu Qi
ABRACADABRACADABRACADABRACAD
ExamplesExamples
Presented By Liu QiPresented By Liu Qi
palindromepalindromeSequence ATOYOTA
Presented By Liu QiPresented By Liu Qi
RepeatsRepeats
Drosophila melanogaster SLIT protein against itself
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix s:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
DP matrix F (columns: AAG, rows: AGC):
          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0
Optimal local alignment (score 4):
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix as in the previous example.
          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0
The highest score, 6, traces back diagonally to the local alignment AAG / AAG.
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight, for example:
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, $F(i,0) = 0$ and $F(0,j) = 0$.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Search for $i$ such that $F(i,m) = \max_{1 \le i' \le n} F(i',m)$.
Search for $j$ such that $F(n,j) = \max_{1 \le j' \le m} F(n,j')$.
Define the alignment score as $\max\{F(n,j), F(i,m)\}$.
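Here is a minimal Python sketch of the recurrence above. It assumes a simple match/mismatch score and a linear gap penalty; the function name end_space_free_score is hypothetical.
• Leading end gaps are free because row 0 and column 0 are 0.
• Trailing end gaps are free because the score is the best value in the last row or last column.
Both sequences are assumed to be non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, gap=-1):
    """Best end-space free alignment score of x (rows) and y (columns)."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # diagonal
                          F[i - 1][j] + gap,     # gap in y
                          F[i][j - 1] + gap)     # gap in x
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)
```

Take X = cactgtac and Y = gacacttg with match = 1, mismatch = -1 and gap = -1. The function returns 4, the score of the alignment shown above (--cac-tgtac / gacacttg---), which has five matches and one internal gap.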
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions to think about
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
Affine gap penalty
LETVGYW----L (the four gap positions cost -5, -1, -1, -1)
Separate penalties for gap opening and gap extension.
This requires modifying the DP algorithm.
Affine gap penalty
• A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
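The two kinds of gap penalty can be written as formulas. This sketch assumes the common form in which a gap of length k costs the opening penalty h once plus g per position, with both values negative (as the gap penalty d is in the examples):

$$\gamma_{\text{linear}}(k) = d\,k \qquad\qquad \gamma_{\text{affine}}(k) = h + g\,k, \quad |g| < |h|$$

One reading of the LETVGYW----L example is that the first gap position costs -5 and each further one costs -1. The four-position gap then costs $-5 + 3\cdot(-1) = -8$, which is $h = -4$, $g = -1$ in this form. A linear penalty of $d = -5$ per position, as in the earlier examples, would charge $4\cdot(-5) = -20$.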
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
This needs 3 matrices instead of 1.
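With affine gaps one matrix is not enough, because the cost of the next gap position depends on whether a gap is already open. This is a minimal sketch of the usual three-matrix formulation (Gotoh's algorithm) for global alignment. Its assumptions are:
• a gap of length k scores h + g·k;
• the defaults h = -2 and g = -1 are illustrative and do not come from the slides;
• the function name affine_global_score is hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-2, g=-1):
    """Global alignment score where a gap of length k scores h + g * k.

    M[i][j]:  best score of x[:i] vs y[:j] ending with x[i-1] aligned to y[j-1]
    Ix[i][j]: best score ending with x[i-1] aligned to a gap
    Iy[i][j]: best score ending with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + g * i          # x[:i] against one leading gap
    for j in range(1, m + 1):
        Iy[0][j] = h + g * j          # y[:j] against one leading gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Opening a gap costs h + g; extending an open gap costs only g.
            Ix[i][j] = max(M[i - 1][j] + h + g, Ix[i - 1][j] + g, Iy[i - 1][j] + h + g)
            Iy[i][j] = max(M[i][j - 1] + h + g, Iy[i][j - 1] + g, Ix[i][j - 1] + h + g)
    return max(M[n][m], Ix[n][m], Iy[n][m])
```

For example, affine_global_score("ACGT", "AT") returns -2. The alignment ACGT / A--T pays h + 2g = -4 once for a single length-2 gap and gains two matches. Transitions between Ix and Iy are allowed here, so a gap in one sequence directly followed by a gap in the other is also scored; some textbook versions omit them.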
Dynamic Programming for the Affine Gap Penalty Case
match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
FASTA
BLAST
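Word (k-tuple) methods start by finding short exact word matches instead of filling a DP matrix. This is a minimal sketch of that seeding step only. It assumes exact matches of length k and a dictionary index of the target's words; the function names are hypothetical. FASTA and BLAST add scoring, extension of hits and statistics on top of this step.

```python
from collections import defaultdict


def word_hits(query, target, k=3):
    """Return (query_pos, target_pos) pairs where both sequences share a k-letter word."""
    index = defaultdict(list)            # word -> list of start positions in target
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def hits_by_diagonal(hits):
    """Group hits by diagonal (target_pos - query_pos); crowded diagonals suggest alignments."""
    diagonals = defaultdict(list)
    for i, j in hits:
        diagonals[j - i].append((i, j))
    return dict(diagonals)
```

word_hits("ACGTAC", "TTACGTT") finds (0, 2), (1, 3) and (3, 1). The first two lie on the same diagonal (offset 2), which is the kind of cluster a FASTA-style search examines further.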
Examples
Sequence: ABRACADABRACAD
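A dot matrix places a dot wherever residue i of one sequence equals residue j of the other. Here is a minimal, unfiltered sketch; the function names are hypothetical.

```python
def dot_matrix(seq1, seq2):
    """Rows follow seq1, columns follow seq2; '*' marks identical residues."""
    return ["".join("*" if a == b else "." for b in seq2) for a in seq1]


def print_dot_plot(seq1, seq2):
    # Column header first, then one labelled row per residue of seq1.
    print("  " + " ".join(seq2))
    for a, row in zip(seq1, dot_matrix(seq1, seq2)):
        print(a + " " + " ".join(row))
```

For print_dot_plot("ABRACADABRACAD", "ABRACADABRACAD") the main diagonal is full. Two parallel diagonals offset by 7 positions reveal the repeated word ABRACAD.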
Palindrome
Sequence: ATOYOTA
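In a self dot plot, a palindrome appears as a run of dots on an anti-diagonal (i + j constant), perpendicular to the main diagonal. This minimal sketch walks outwards along every anti-diagonal to find the longest such run; the function name is hypothetical.

```python
def longest_antidiagonal_run(seq):
    """Longest palindromic stretch, read off the anti-diagonals of the self dot plot."""
    n = len(seq)
    best = (0, 0)                               # (start, end) of best run, end exclusive
    for diag in range(2 * n - 1):               # cells with i + j == diag
        # Start at the centre: i == j (odd length) or j == i + 1 (even length).
        i, j = diag // 2, diag - diag // 2
        while i >= 0 and j < n and seq[i] == seq[j]:
            i -= 1                              # step outwards along the anti-diagonal
            j += 1
        if j - i - 1 > best[1] - best[0]:
            best = (i + 1, j)
    return seq[best[0]:best[1]]
```

longest_antidiagonal_run("ATOYOTA") returns the whole sequence, centred on Y.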
Repeats
Drosophila melanogaster SLIT protein against itself
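Comparing a long protein such as SLIT with itself gives many isolated chance matches. A common remedy in dot-plot programs is a sliding window with a stringency threshold. The sketch below is a minimal version under that assumption; the window and stringency values and the function name are illustrative.

```python
def filtered_dot_matrix(seq1, seq2, window=3, stringency=2):
    """Dot at (i, j) only if >= `stringency` of the `window` pairs seq1[i+k], seq2[j+k] match.

    Isolated chance matches are suppressed, while repeats, which give runs of
    matches along a diagonal, survive the filter.
    """
    rows = []
    for i in range(len(seq1) - window + 1):
        row = []
        for j in range(len(seq2) - window + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append("*" if matches >= stringency else ".")
        rows.append("".join(row))
    return rows
```

With filtered_dot_matrix("ACGTACGT", "ACGTACGT", window=4, stringency=4), only the main diagonal and the two off-diagonal dots for the repeated ACGT remain.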
Low complexity
Inter-sequence comparison
• Conserved domains
• Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
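An indel shifts the matching residues onto a different diagonal of the dot matrix. This is a minimal sketch that extracts maximal runs of identities along each diagonal; the function name and the min_len threshold are illustrative.

```python
def diagonal_runs(seq1, seq2, min_len=5):
    """Maximal diagonal runs of identical residues as (start1, start2, length) triples."""
    runs = []
    n, m = len(seq1), len(seq2)
    for offset in range(-(m - 1), n):        # offset = i - j identifies a diagonal
        i, j = max(offset, 0), max(-offset, 0)
        length = 0
        while i < n and j < m:
            if seq1[i] == seq2[j]:
                length += 1
            else:
                if length >= min_len:
                    runs.append((i - length, j - length, length))
                length = 0
            i += 1
            j += 1
        if length >= min_len:                # run that reaches the matrix edge
            runs.append((i - length, j - length, length))
    return sorted(runs)
```

For the two names, diagonal_runs returns [(0, 0, 7), (15, 7, 7)]. DOROTHY lies on diagonal 0 and HODGKIN on a diagonal shifted by 15 - 7 = 8, the length of the deleted CROWFOOT.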
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
• Comparing the reverse complement of an RNA sequence with the sequence itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known. The illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
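Comparing an RNA sequence with its own reverse complement turns potential base-paired stems into diagonal runs of the dot matrix. This minimal sketch assumes only Watson-Crick pairs (A-U, G-C) and ignores G-U wobble pairs. The function names are hypothetical, and this is not a folding algorithm.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def stem_candidates(rna, min_len=4):
    """Diagonal runs where rna matches its reverse complement, as lists of (i, partner).

    rna[i] == rc[j] means rna[i] can pair with rna[n - 1 - j].
    """
    rc = reverse_complement(rna)
    n = len(rna)
    stems = []
    for offset in range(-(n - 1), n):          # every diagonal of the n x n matrix
        i, j = max(offset, 0), max(-offset, 0)
        run = []
        while i < n and j < n:
            if rna[i] == rc[j]:
                run.append((i, n - 1 - j))
            else:
                if len(run) >= min_len:
                    stems.append(run)
                run = []
            i += 1
            j += 1
        if len(run) >= min_len:
            stems.append(run)
    return stems
```

stem_candidates("GGGAAAACCC", min_len=3) reports the GGG/CCC stem as [(0, 9), (1, 8), (2, 7)]. It also lists the same stem from the CCC side, because each pairing appears twice in this comparison.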
Programs for Dot Matrix
Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
SIGNAL: httpinnovationswmededuresearchinformaticsres_inf_sightml
Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
COMPARE and DOTPLOT in GCG
Conclusion
Advantages
• Readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by the other, more automated methods.
• Lets your eyes and brain do the work, which is very efficient.
Disadvantages
• Most dot matrix computer programs do not show an actual alignment.
• They do not return a score to indicate how 'optimal' a given alignment is.
Dynamic Programming
Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
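The second question explains why DP is needed: the number of possible alignments grows very quickly. Each global alignment is a path of diagonal, vertical and horizontal moves through the DP matrix, so the count follows the same three-way recursion; these counts are known as Delannoy numbers. The memoised function below (name hypothetical) also illustrates trading space for time.

```python
from functools import lru_cache


@lru_cache(maxsize=None)          # cache results so each (n, m) is computed once
def count_alignments(n, m):
    """Number of distinct global alignments of sequences of lengths n and m."""
    if n == 0 or m == 0:
        return 1                  # only one way: all remaining characters against gaps
    return (count_alignments(n - 1, m - 1)    # diagonal move
            + count_alignments(n - 1, m)      # vertical move
            + count_alignments(n, m - 1))     # horizontal move
```

count_alignments(3, 3) is 63 for two 3-residue sequences such as AAG and AGC. count_alignments(10, 10) is already 8,097,453.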
Alignment methods with DP
• Global alignment (Needleman-Wunsch, 1970): maximizes the number of matches between the sequences along the entire length of the sequences.
• Local alignment (Smith-Waterman, 1981): a modification of the dynamic programming algorithm giving the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
(Figure: a weighted graph with nodes A, B, C, D, E and F.)
Exercise
Conditions for applying dynamic programming
• Optimal substructure: every sub-policy of an optimal policy is itself optimal.
• No after-effect: the states of earlier stages cannot directly influence future decisions.
• Trading space for time (overlapping subproblems).
Dynamic Programming
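DP Algorithm for Global Alignment
Two sequences $X = x_1 \ldots x_n$ and $Y = y_1 \ldots y_m$.
Let $F(i,j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$).
$$F(0,0) = 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d$$
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$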
DP in equation form
$F(i,j)$ is the largest of:
• $F(i-1,j-1) + s(x_i, y_j)$ (diagonal move)
• $F(i-1,j) + d$ (vertical move)
• $F(i,j-1) + d$ (horizontal move)
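This is a minimal sketch of filling F with this recurrence. It assumes the substitution matrix and d = -5 from the example that follows, with the first row and column initialised to multiples of d. The names needleman_wunsch_matrix and SUB are hypothetical.

```python
# Substitution matrix from the example slides (match 2, mismatch -5 or -7).
SUB = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def needleman_wunsch_matrix(top, left, sub=SUB, d=-5):
    """Fill the global-alignment matrix F (rows follow `left`, columns follow `top`)."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d                  # prefix of `left` against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * d                  # prefix of `top` against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub[left[i - 1]][top[j - 1]],  # diagonal
                          F[i - 1][j] + d,                                 # vertical
                          F[i][j - 1] + d)                                 # horizontal
    return F
```

needleman_wunsch_matrix("AAG", "AGC") reproduces the completed example matrix, ending in F(3,3) = -6.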
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix s:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
DP matrix F (columns: AAG, rows: AGC):
          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
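As a check on the table, here are three of its cells worked through the recurrence (row letters from AGC, column letters from AAG):

$$
\begin{aligned}
F(1,1) &= \max\{F(0,0) + s(A,A),\; F(0,1) + d,\; F(1,0) + d\} = \max\{0 + 2,\; -5 - 5,\; -5 - 5\} = 2\\
F(2,3) &= \max\{F(1,2) + s(G,G),\; F(1,3) + d,\; F(2,2) + d\} = \max\{-3 + 2,\; -8 - 5,\; -3 - 5\} = -1\\
F(3,3) &= \max\{F(2,2) + s(C,G),\; F(2,3) + d,\; F(3,2) + d\} = \max\{-3 - 7,\; -1 - 5,\; -8 - 5\} = -6
\end{aligned}
$$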
Traceback
• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.
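The traceback rules translate directly into a recursive walk that follows every move consistent with the filled matrix. This is a minimal sketch; the names are hypothetical. It enumerates all co-optimal alignments, which is fine for short sequences but can blow up for long ones.

```python
# Substitution matrix from the example slides (match 2, mismatch -5 or -7).
SUB = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def all_global_alignments(top, left, sub=SUB, d=-5):
    """Return (score, set of (aligned_top, aligned_left)) for every optimal global alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub[left[i - 1]][top[j - 1]],
                          F[i - 1][j] + d, F[i][j - 1] + d)

    results = set()

    def trace(i, j, a_top, a_left):
        # Alignment strings grow from right to left, starting at the lower right corner.
        if i == 0 and j == 0:
            results.add((a_top, a_left))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub[left[i - 1]][top[j - 1]]:
            trace(i - 1, j - 1, top[j - 1] + a_top, left[i - 1] + a_left)  # diagonal
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            trace(i, j - 1, top[j - 1] + a_top, "-" + a_left)  # horizontal: gap in left
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            trace(i - 1, j, "-" + a_top, left[i - 1] + a_left)  # vertical: gap in top

    trace(n, m, "", "")
    return F[n][m], results
```

all_global_alignments("AAG", "AGC") returns the score -6 and the two alignments AAG-/-AGC and AAG-/A-GC.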
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Cells on the traceback paths:
          A    A    G
     0   -5
A         2   -3
G                  -1
C                  -6
Two optimal alignments (score -6):
AAG-   AAG-
-AGC   A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
palindromepalindromeSequence ATOYOTA
Presented By Liu QiPresented By Liu Qi
RepeatsRepeats
Drosophila melanogaster SLIT protein against itself
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignment
Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2
Initialize the first row and column to 0, then fill the matrix (AAG along the top, AGC down the left side):
       -    A    A    G
  -    0    0    0    0
  A    0    2    2    0
  G    0    0    0    4
  C    0    0    0    0
Tracing back from the highest score (4) until a 0 is reached gives the local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix.
       -    A    A    G
  -    0    0    0    0
  G    0    0    0    2
  A    0    2    2    0
  A    0    2    4    0
  G    0    0    0    6
  G    0    0    0    2
  C    0    0    0    0
The highest score is 6; tracing back from it gives the local alignment AAG/AAG.
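A minimal Python sketch of the Smith-Waterman recurrence and traceback described above, using the example's substitution matrix. When several cells tie for the highest score or several moves tie during traceback, it keeps only the first one it meets, which is a simplifying assumption; `smith_waterman` is a hypothetical name.

```python
from typing import Tuple

BASES = "ACGT"
# Substitution scores from the example, rows and columns in A, C, G, T order.
S = [[2, -7, -5, -7],
     [-7, 2, -7, -5],
     [-5, -7, 2, -7],
     [-7, -5, -7, 2]]


def s(a: str, b: str) -> int:
    return S[BASES.index(a)][BASES.index(b)]


def smith_waterman(top: str, left: str, d: int) -> Tuple[int, str, str]:
    """Return (best score, aligned top segment, aligned left segment)."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:  # remember the first cell holding the highest score
                best, bi, bj = F[i][j], i, j
    # Trace back from the highest cell until a 0 is reached.
    t, l = [], []
    i, j = bi, bj
    while F[i][j] > 0:  # row 0 and column 0 are 0, so i, j >= 1 inside the loop
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            t.append(top[j - 1])
            l.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            t.append("-")
            l.append(left[i - 1])
            i -= 1
        else:
            t.append(top[j - 1])
            l.append("-")
            j -= 1
    return best, "".join(reversed(t)), "".join(reversed(l))


print(smith_waterman("AAG", "AGC", -5))     # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC", -5))  # (6, 'AAG', 'AAG')
```

Compared with a global alignment, the only changes are the extra 0 in the max, the all-zero first row and column, and a traceback that starts at the best cell and stops at 0.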
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, $F(i,0) = 0$ and $F(0,j) = 0$.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Search for i such that $F(i,m) = \max_{1 \le i' \le n} F(i',m)$.
Search for j such that $F(n,j) = \max_{1 \le j' \le m} F(n,j')$.
Define the alignment score as $\max\{F(n,j),\ F(i,m)\}$.
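The end-space free recurrence above can be sketched in Python for a simple match/mismatch scheme. This is an illustrative sketch under stated assumptions: both sequences are non-empty, scoring is a single match and mismatch value instead of a full substitution matrix, and the function name is hypothetical. Only the score is computed.

```python
def end_space_free_score(x: str, y: str, match: int, mismatch: int, gap: int) -> int:
    """End-space free alignment score; assumes x and y are non-empty."""
    n, m = len(x), len(y)
    # Base conditions F(i, 0) = F(0, j) = 0: leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trailing gaps cost nothing: take the best cell in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


score = end_space_free_score("cactgtac", "gacacttg", match=1, mismatch=-1, gap=-1)
```

The only differences from global alignment are the zero base conditions and reading the score from the best cell of the last row or column instead of F(n, m).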
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for discussion
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
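One possible answer to the LCS question, sketched in Python. It is the same dynamic programming scheme as global alignment, with a match worth 1, gaps worth 0, and mismatched pairs not allowed; the function name is hypothetical, and when several LCSs exist it returns just one of them.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from the lower right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


lcs = longest_common_subsequence("ABCBDAB", "BDCABA")  # a common subsequence of length 4
```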
Affine gap penalty
LETVGYW----L
Gap position scores: -5 -1 -1 -1
Separate penalties for gap opening and gap extension. This requires modifying the DP algorithm.
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
– a gap may be due to a single mutational event that inserted/deleted a stretch of characters
– separated gaps are probably due to distinct mutational events
A linear gap penalty function treats these cases the same.
It is more common to use gap penalty functions involving two terms:
– a penalty h associated with opening a gap
– a smaller penalty g for extending the gap
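A small arithmetic check of the point above, in Python. The per-position scores -5, -1, -1, -1 from the LETVGYW----L example are read here as an opening cost of -5 on the first gap position and -1 on each further position. That reading is an assumption; some texts instead write the cost of a length-k gap as h + g·k. The linear penalty d = -1 is chosen only for comparison, and both function names are hypothetical.

```python
from typing import List


def linear_gap_cost(gap_lengths: List[int], d: int = -1) -> int:
    # Every gap position costs d, no matter how the positions are grouped.
    return sum(d * k for k in gap_lengths)


def affine_gap_cost(gap_lengths: List[int], open_: int = -5, extend: int = -1) -> int:
    # The first position of each gap costs open_, each further position costs extend.
    return sum(open_ + extend * (k - 1) for k in gap_lengths)


one_long_gap = [4]              # e.g. the single gap in LETVGYW----L
four_short_gaps = [1, 1, 1, 1]  # the same four positions split into separate gaps
print(linear_gap_cost(one_long_gap), linear_gap_cost(four_short_gaps))  # -4 -4
print(affine_gap_cost(one_long_gap), affine_gap_cost(four_short_gaps))  # -8 -20
```

The linear penalty cannot tell the two cases apart, while the two-term penalty makes one long gap much cheaper than several separate ones.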
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Need 3 matrices instead of 1.
[Worked example figure; scoring: match = 1, mismatch = -1]
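A minimal Python sketch of the three-matrix idea for a global alignment score. It assumes the common formulation with one matrix for alignments ending in an aligned pair (M) and one for each gap direction (X: x_i against a gap, Y: y_j against a gap), with a gap of length k costing open_ + (k - 1)·extend to match the -5, -1, -1, -1 example. It returns only the score (no traceback), and it lets a gap in one sequence be followed directly by a gap in the other, which some formulations forbid. The function name is hypothetical.

```python
from typing import List

NEG = float("-inf")


def affine_global_score(x: str, y: str, match: int, mismatch: int,
                        open_: int, extend: int) -> float:
    """Global alignment score with an affine gap penalty (three DP matrices)."""
    n, m = len(x), len(y)

    def grid() -> List[List[float]]:
        return [[NEG] * (m + 1) for _ in range(n + 1)]

    M, X, Y = grid(), grid(), grid()
    M[0][0] = 0  # empty prefixes
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # x_i aligned to y_j
                s = match if x[i - 1] == y[j - 1] else mismatch
                M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:  # x_i against a gap: open a new gap or extend the current one
                X[i][j] = max(M[i - 1][j] + open_, X[i - 1][j] + extend, Y[i - 1][j] + open_)
            if j > 0:  # y_j against a gap
                Y[i][j] = max(M[i][j - 1] + open_, Y[i][j - 1] + extend, X[i][j - 1] + open_)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("ACGT", "AT", 1, -1, -5, -1))  # -4: A/A, one gap of length 2, T/T
```

Keeping separate gap matrices is what lets the recurrence know whether the previous column already had a gap, so it can charge extend instead of open_.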
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
FASTA
BLAST
Repeats
[Figure: Drosophila melanogaster SLIT protein against itself]
Low complexity
Inter-sequence comparison
Conserved domains; insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
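A minimal text dot plot in Python, showing how the deletion in this example appears as a break and shift in the main diagonal. The sliding window and stringency filter are an assumption borrowed from common dot-plot programs (a window of 1 with stringency 1 gives the plain residue-by-residue matrix), and `dot_plot` is a hypothetical name.

```python
from typing import List


def dot_plot(seq1: str, seq2: str, window: int = 1, stringency: int = 1) -> List[str]:
    """Text dot plot: row i, column j is '*' when the windows starting at
    seq1[i] and seq2[j] share at least `stringency` identical residues."""
    rows = []
    for i in range(len(seq1) - window + 1):
        row = []
        for j in range(len(seq2) - window + 1):
            identical = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append("*" if identical >= stringency else ".")
        rows.append("".join(row))
    return rows


for line in dot_plot("DOROTHYCROWFOOTHODGKIN", "DOROTHYHODGKIN", window=3, stringency=3):
    print(line)
```

With a window of 3 and stringency 3, one diagonal run covers DOROTHY from the top-left corner, and a second run for HODGKIN is shifted down by 8 rows, the length of the CROWFOOT segment missing from Seq2. A single stray dot comes from the word OTH, which also occurs inside CROWFOOTH.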
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
RNA comparisons of the reverse complement of a sequence to itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
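A minimal Python sketch of comparing a sequence with its own reverse complement. Any window that matches a window of the reverse complement can base-pair with another part of the same molecule, which is how helical stems show up as dots. The example is a made-up hairpin, not the tRNA-Phe sequence. Only Watson-Crick pairs are considered (ignoring G-U wobble pairs is a simplification of this sketch), and the function names are hypothetical.

```python
from typing import Set, Tuple

# Watson-Crick partners only; G-U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def complementary_segments(rna: str, window: int) -> Set[Tuple[int, int]]:
    """Pairs (i, k) such that rna[i:i+window] can pair with rna[k:k+window].

    Each pair is a dot in the plot of the sequence against its reverse complement:
    column j of that plot corresponds to the segment starting at k = len(rna) - j - window.
    """
    rc = reverse_complement(rna)
    n = len(rna)
    hits = set()
    for i in range(n - window + 1):
        for j in range(n - window + 1):
            if rna[i:i + window] == rc[j:j + window]:
                hits.add((i, n - j - window))
    return hits


# Hypothetical hairpin: stem GGAC ... GUCC around a UUUU loop.
print(sorted(complementary_segments("GGACUUUUGUCC", 4)))  # [(0, 8), (8, 0)]
```

The stem appears twice, once from each strand's point of view, which is the symmetry seen in a reverse-complement self-comparison plot.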
Programs for Dot Matrix
Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
COMPARE and DOTPLOT in GCG
Conclusion
Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by the other, more automated methods.
Lets your eyes and brain do the work – VERY EFFICIENT.
Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how "optimal" a given alignment is.
Dynamic Programming
Dynamic programming answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
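To make the "how many different alignments" question concrete, here is a small Python sketch that counts alignments. It assumes every distinct path of horizontal, vertical and diagonal moves through the DP matrix is a different alignment (so A-/-B and -A/B- are counted separately). The memoized recursion is itself dynamic programming: each count is built from three overlapping subproblems. The function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Number of distinct paths through the alignment matrix for
    sequences of lengths n and m."""
    if n == 0 or m == 0:
        return 1  # only one way left: all remaining characters against gaps
    return (count_alignments(n - 1, m - 1)   # diagonal move
            + count_alignments(n - 1, m)     # vertical move
            + count_alignments(n, m - 1))    # horizontal move


print(count_alignments(3, 3))    # 63, e.g. AAG against AGC
print(count_alignments(10, 10))  # 8097453
```

The count grows exponentially with sequence length, which is why scoring every alignment separately is impractical.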
Alignment methods with DP
Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences.
Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
[Figure: nodes A to F with numeric weights between 2 and 9]
Exercise
Conditions for applying dynamic programming
Optimal substructure: every sub-policy of an optimal policy is itself optimal.
No after-effect: the states of earlier stages cannot directly influence future decisions.
Trading space for time (overlapping subproblems).
Dynamic Programming
DP Algorithm for Global Alignment
Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.
Let F(i, j) be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$).
$$F(0,0) = 0, \quad F(i,0) = i\,d, \quad F(0,j) = j\,d$$
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
DP in equation form
[Diagram: cell F(i, j) takes the best of F(i-1, j-1) + s(x_i, y_j) from the diagonal, F(i-1, j) + d from above, and F(i, j-1) + d from the left.]
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2
Start with F(0, 0) = 0, fill the first row and column with multiples of d, then fill the rest of the matrix (AAG along the top, AGC down the left side):
       -    A    A    G
  -    0   -5  -10  -15
  A   -5    2   -3   -8
  G  -10   -3   -3   -1
  C  -15   -8   -8   -6
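The matrix-filling step of the global recurrence can be sketched in a few lines of Python, using the example's substitution matrix and d = -5. This is an illustrative sketch (the function name `fill_global` is hypothetical); it computes scores only, and the alignment itself comes from the traceback.

```python
from typing import List

BASES = "ACGT"
# Substitution scores from the example, rows and columns in A, C, G, T order.
S = [[2, -7, -5, -7],
     [-7, 2, -7, -5],
     [-5, -7, 2, -7],
     [-7, -5, -7, 2]]


def fill_global(top: str, left: str, d: int) -> List[List[int]]:
    """Needleman-Wunsch score matrix; rows follow `left`, columns follow `top`."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * d  # top row: a prefix of `top` against gaps only
    for i in range(1, n + 1):
        F[i][0] = i * d  # left column: a prefix of `left` against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = S[BASES.index(left[i - 1])][BASES.index(top[j - 1])]
            F[i][j] = max(F[i - 1][j - 1] + sub, F[i - 1][j] + d, F[i][j - 1] + d)
    return F


for row in fill_global("AAG", "AGC", -5):
    print(row)
# [0, -5, -10, -15]
# [-5, 2, -3, -8]
# [-10, -3, -3, -1]
# [-15, -8, -8, -6]
```

The lower right cell, -6, is the optimal global alignment score.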
Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Low complexityLow complexity
Presented By Liu QiPresented By Liu Qi
Inter sequence comparisonInter sequence comparison
Conserved domainsConserved domains Insertion and deletionInsertion and deletion
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Local alignment
Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
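Here is a minimal Smith-Waterman sketch in Python that shows both differences. Cells are floored at 0, and the traceback starts from the highest cell and stops at the first 0. The substitution matrix and d = -5 come from the examples that follow. `smith_waterman` is a hypothetical name. When several cells share the top score, it simply keeps the first one found in row-major order.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S: Dict[Tuple[str, str], int] = {
    (a, b): ROWS[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)
}


def smith_waterman(top: str, left: str, d: int) -> Tuple[int, str, str]:
    """Return (best local score, aligned top segment, aligned left segment)."""
    n, m = len(left), len(top)
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,  # start a fresh local alignment here
                F[i - 1][j - 1] + S[(left[i - 1], top[j - 1])],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:  # remember the first highest-scoring cell
                best, bi, bj = F[i][j], i, j
    # Traceback from the highest cell until a cell with value 0 is reached.
    top_aln, left_aln = "", ""
    i, j = bi, bj
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + S[(left[i - 1], top[j - 1])]:
            top_aln, left_aln = top[j - 1] + top_aln, left[i - 1] + left_aln
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top_aln, left_aln = "-" + top_aln, left[i - 1] + left_aln
            i -= 1
        else:
            top_aln, left_aln = top[j - 1] + top_aln, "-" + left_aln
            j -= 1
    return best, top_aln, left_aln


print(smith_waterman("AAG", "AGC", d=-5))
print(smith_waterman("AAG", "GAAGGC", d=-5))
```

Because a positive cell can only come from one of the three non-zero options, the `else` branch is always a valid horizontal move.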
A simple example
Substitution matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
The first row and column are set to 0:
        -     A     A     G
  -     0     0     0     0
  A     0
  G     0
  C     0
The remaining cells are filled with the local recurrence:
        -     A     A     G
  -     0     0     0     0
  A     0     2     2     0
  G     0     0     0     4
  C     0     0     0     0
Optimal local alignment:
AG
AG
Local alignment
Substitution matrix s as in the previous example.
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.
        -     A     A     G
  -     0     0     0     0
  G     0     0     0     2
  A     0     2     2     0
  A     0     2     4     0
  G     0     0     0     6
  G     0     0     0     2
  C     0     0     0     0
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contributes zero weight.
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions, for all i and j:
$$F(i,0) = 0, \qquad F(0,j) = 0$$
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Search for $i$ such that $F(i,m) = \max_{1 \le i \le n} F(i,m)$.
Search for $j$ such that $F(n,j) = \max_{1 \le j \le m} F(n,j)$.
Define the alignment score as $\max\{F(n,j),\ F(i,m)\}$ for the $j$ and $i$ found above.
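The base conditions and the final search above translate almost directly into code. This sketch uses the scores from the exercise below (match = 1, mismatch = -1, gap = -1). `end_space_free_score` is a hypothetical name, and it returns only the score, not the alignment.

```python
def end_space_free_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score when gaps before the first and after the last aligned residue are free."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0  # everything is an end gap
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F(i,0) = F(0,j) = 0: leading gaps are free
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trailing gaps are free: the alignment may end anywhere in the last row or column.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))
```

As a hand check, the X/Y alignment shown earlier has five matches and one internal gap, and its end gaps are free. Under these scores it is worth 5 - 1 = 4.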
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
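Here is one possible answer to the second question. The LCS is the alignment DP with match = 1 and mismatch = gap = 0, so the same table-filling recurrence applies. The Python sketch below uses the hypothetical name `lcs`. When several LCSs exist, it returns just one of them.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]  # L[i][j] = LCS length of a[:i] and b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from the bottom-right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))
```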
Affine gap penalty
LETVGYW----L
The four-position gap is scored -5 -1 -1 -1: -5 for opening it and -1 for each extension.
Separate penalties for gap opening and gap extension. This requires modifying the DP algorithm.
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters;
- separated gaps are probably due to distinct mutational events.
A linear gap penalty function treats these cases the same.
It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap;
- a smaller penalty g for extending the gap.
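A short worked comparison of the two penalty shapes follows. The affine convention here is read off the LETVGYW----L example (-5 -1 -1 -1): h for the first gap position and g for each further one. Some texts instead charge h + g·k, which changes only the constants. The function names are hypothetical.

```python
def linear_gap(k: int, d: int = -5) -> int:
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap(k: int, h: int = -5, g: int = -1) -> int:
    """Affine penalty (assumed convention): h for the first gap position, g for each extension."""
    return 0 if k == 0 else h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1.
print("linear:", linear_gap(4), "vs", 4 * linear_gap(1))
print("affine:", affine_gap(4), "vs", 4 * affine_gap(1))
```

Under the linear function both cases cost -20. Under the affine function the single long gap costs -8, so it is preferred, which matches the biological argument above.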
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Needs 3 matrices instead of 1.
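Below is a sketch of the three-matrix idea in the style of Gotoh's algorithm, for global alignment. It assumes the affine convention of the -5 -1 -1 -1 example: h for the first gap position and g for each extension. The matrix names `M`, `Ix` and `Iy` and the function name are assumptions, because the slides' own matrices are not preserved in this transcript. `M` ends in a residue pair, `Ix` ends with $x_i$ against a gap, and `Iy` ends with $y_j$ against a gap. A gap is therefore opened only when moving into `Ix` or `Iy` from another matrix.

```python
NEG = float("-inf")  # score of an impossible state


def gotoh_global(x: str, y: str, match: int = 1, mismatch: int = -1,
                 h: int = -5, g: int = -1) -> float:
    """Global alignment score with affine gaps: a gap of length k scores h + (k - 1) * g."""
    n, m = len(x), len(y)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]   # ends with x_i paired to y_j
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with x_i against a gap
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with y_j against a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # one leading gap in y
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # one leading gap in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Open a new gap (h) from another state, or extend an existing one (g).
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(gotoh_global("AAAA", "AA"))  # one gap of length 2 beats two gaps of length 1
```

The defaults match = 1 and mismatch = -1 follow the example parameters that appear on the next slide. Each cell still costs constant time, so the run time stays proportional to n·m, only with three tables instead of one.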
match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word (k-tuple) methods
FASTA
BLAST
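FASTA and BLAST both start from short exact word (k-tuple) matches. The sketch below shows only that seeding step. It indexes the k-letter words of one sequence, looks up the words of the other, and then groups the hits by diagonal j - i. The real programs add scoring, extension and statistics, all omitted here, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_hits(query: str, subject: str, k: int) -> List[Tuple[int, int]]:
    """All (i, j), 0-based, with query[i:i+k] == subject[j:j+k]."""
    index: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)  # word -> start positions in subject
    hits: List[Tuple[int, int]] = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def hits_by_diagonal(hits: List[Tuple[int, int]]) -> Dict[int, List[Tuple[int, int]]]:
    """Group hits by diagonal j - i; several hits on one diagonal suggest an ungapped match."""
    diagonals: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for i, j in hits:
        diagonals[j - i].append((i, j))
    return dict(diagonals)


hits = word_hits("ACGTAC", "TACGT", k=3)
print(hits)
print(hits_by_diagonal(hits))
```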
Inter-sequence comparison
Conserved domains
Insertion and deletion
Insertion and deletion
Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
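Here is a minimal text dot plot for the two sequences above. It uses a sliding window with a match threshold, the usual way to reduce noise. The values window = 3 and threshold = 3 are arbitrary choices, and the function names are hypothetical. The diagonal of dots breaks and then resumes 8 positions over. That shift is the length of the CROWFOOT segment missing from Seq2.

```python
from typing import Set, Tuple


def dot_matrix(a: str, b: str, window: int = 1, threshold: int = 1) -> Set[Tuple[int, int]]:
    """Cells (i, j) where at least `threshold` of the `window` pairs a[i+k], b[j+k] are identical."""
    dots: Set[Tuple[int, int]] = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            identical = sum(a[i + k] == b[j + k] for k in range(window))
            if identical >= threshold:
                dots.add((i, j))
    return dots


def render(a: str, b: str, dots: Set[Tuple[int, int]]) -> str:
    """Plain-text plot: one row per residue of a, one column per residue of b."""
    lines = ["  " + b]
    for i, residue in enumerate(a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(b)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


seq1 = "DOROTHYCROWFOOTHODGKIN"
seq2 = "DOROTHYHODGKIN"
dots = dot_matrix(seq1, seq2, window=3, threshold=3)
print(render(seq1, seq2, dots))
```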
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
RNA: comparing the reverse complement of a sequence with the sequence itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known; the illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
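Comparing a sequence with its own reverse complement finds stretches that could base-pair into stems. This sketch marks (i, j) when the window starting at i can pair, base by base, with the window ending at j. With `wobble=False`, this is the same as a windowed dot plot against the reverse complement, re-indexed to positions in the original sequence. The tRNA-Phe sequence is not reproduced in the transcript, so the usage line uses a made-up hairpin. The function name is hypothetical.

```python
from typing import List, Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_dots(rna: str, window: int = 3, wobble: bool = False) -> List[Tuple[int, int]]:
    """(i, j), i < j, where rna[i + k] can pair with rna[j - k] for every k < window."""
    pairs = (WATSON_CRICK | WOBBLE) if wobble else WATSON_CRICK
    n = len(rna)
    dots: List[Tuple[int, int]] = []
    for i in range(n):
        for j in range(i + 1, n):
            # The two windows must not overlap: i .. i+window-1 lies before j-window+1 .. j.
            if i + window - 1 < j - window + 1 and all(
                (rna[i + k], rna[j - k]) in pairs for k in range(window)
            ):
                dots.append((i, j))
    return dots


print(stem_dots("GGGAAACCC"))  # toy hairpin: GGG can pair with CCC
```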
Programs for Dot Matrix
Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
COMPARE and DOTPLOT in GCG
Conclusion
Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by the other, more automated methods.
Lets your eyes and brain do the work: VERY EFFICIENT.
Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how "optimal" a given alignment is.
Dynamic Programming
Answers the question: what is the optimal alignment (the one with the best score) of two sequences?
How many different alignments are there?
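For the second question, count alignments in which every column is either a residue pair or a residue against a gap. The last column has three possibilities, which gives a three-term recursion; the resulting counts are the Delannoy numbers. Memoization here is the same space-for-time trade that dynamic programming relies on. Conventions that treat adjacent opposite gaps differently give other counts, so take this as one sketch. The function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Number of global alignments of sequences of lengths n and m."""
    if n == 0 or m == 0:
        return 1  # only gaps remain: exactly one way to finish
    return (count_alignments(n - 1, m - 1)  # last column pairs x_n with y_m
            + count_alignments(n - 1, m)    # x_n against a gap
            + count_alignments(n, m - 1))   # y_m against a gap


for length in (1, 2, 3, 10):
    print(length, count_alignments(length, length))
```

The counts grow exponentially with sequence length. Enumerating alignments one by one is therefore hopeless, which is why the score is computed with a DP table instead.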
Alignment methods with DP
Global alignment - Needleman-Wunsch (1970): finds the highest-scoring alignment over the entire length of both sequences (informally, it maximizes the number of matches).
Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
[Figure: weighted graph with nodes A, B, C, D, E and F]
Exercise
Conditions for applying dynamic programming
Optimal substructure: every sub-policy of an optimal policy is itself optimal.
No aftereffect: the states of earlier stages cannot directly affect future decisions.
Trading space for time (overlapping subproblems).
Dynamic Programming
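DP Algorithm for Global Alignment
Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.
Let $F(i,j)$ be the optimal alignment score of $x_1 \dots x_i$ and $y_1 \dots y_j$ ($0 \le i \le n$, $0 \le j \le m$).
$$F(0,0) = 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d$$
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$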
DP in equation form
Each cell $F(i,j)$ takes the largest of three values: $F(i-1,j-1) + s(x_i, y_j)$ from the diagonal neighbor, $F(i-1,j) + d$ from the cell above, and $F(i,j-1) + d$ from the cell to the left.
A simple example
Substitution matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Start with F(0, 0) = 0:
        -     A     A     G
  -     0
  A
  G
  C
Fill the first row and column with multiples of d:
        -     A     A     G
  -     0    -5   -10   -15
  A    -5
  G   -10
  C   -15
Fill the remaining cells with the recurrence, row by row:
        -     A     A     G
  -     0    -5   -10   -15
  A    -5     2    -3    -8
  G   -10    -3    -3    -1
  C   -15    -8    -8    -6
Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Insertion and deletionInsertion and deletion
Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN
Presented By Liu QiPresented By Liu Qi
Conserved domainsConserved domains
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2
DP matrix F (AAG along the top, AGC down the left side):
       -    A    A    G
  -    0    0    0    0
  A    0    2    2    0
  G    0    0    0    4
  C    0    0    0    0
Optimal local alignment (score 4):
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix as in the previous example.
DP matrix F (AAG along the top, GAAGGC down the left side):
       -    A    A    G
  -    0    0    0    0
  G    0    0    0    2
  A    0    2    2    0
  A    0    2    4    0
  G    0    0    0    6
  G    0    0    0    2
  C    0    0    0    0
The highest score, 6, traces back diagonally to the local alignment AAG / AAG.
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight. For example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, F(i, 0) = 0 and F(0, j) = 0.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j)\\ F(i-1,j) + d\\ F(i,j-1) + d \end{cases}$$
Search for the i that maximizes F(i, m) over 1 ≤ i ≤ n (last column), and for the j that maximizes F(n, j) over 1 ≤ j ≤ m (last row).
Define the alignment score as the larger of the two:
$$\text{score} = \max\Big(\max_{1\le i\le n} F(i,m),\ \max_{1\le j\le m} F(n,j)\Big)$$
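A minimal Python sketch of the end-space free recurrence above, assuming simple match/mismatch/gap scores and non-empty inputs (the function name `end_space_free_score` is hypothetical). Zeros in the first row and column make leading gaps free; taking the best cell of the last row or last column makes trailing gaps free.

```python
def end_space_free_score(x: str, y: str, match: int = 1,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Score of an end-space free alignment of non-empty x (rows) and y (columns)."""
    n, m = len(x), len(y)
    # Base conditions F(i, 0) = F(0, j) = 0: leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # x_i aligned with y_j
                          F[i - 1][j] + gap,    # x_i against a gap
                          F[i][j - 1] + gap)    # y_j against a gap
    # Trailing gaps cost nothing either: best value in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))  # 4
```

With match = 1, mismatch = -1, gap = -1, the example alignment shown above scores 5 matches minus one internal gap = 4, which is the value this sketch returns.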
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
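One possible approach to the LCS question, sketched in Python: it is the global alignment DP with match = 1 and mismatch = gap = 0, so the score counts identical aligned pairs. The function name `lcs` is hypothetical, and when several LCSs exist the traceback returns only one of them.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1  # extend an LCS by a match
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip a character
    # Trace back from L[n][m] to recover the characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AGGTAB", "GXTXAYB"))  # GTAB
```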
Affine gap penalty
LETVGYW----L
(the four gap positions are scored -5, -1, -1, -1)
Separate penalties for gap opening and gap extension. This requires modifying the DP algorithm.
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted or deleted a stretch of characters;
- separated gaps are probably due to distinct mutational events.
A linear gap penalty function treats these cases the same. It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap;
- a smaller penalty g for extending the gap.
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Needs 3 matrices instead of 1.
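Below is a minimal three-matrix sketch in Python for global alignment. It assumes that a gap of length k scores gap_open + (k - 1) * gap_extend. That assumption matches the LETVGYW----L example (-5, -1, -1, -1). The recurrences follow a common Gotoh-style formulation, which may differ in detail from the formulas on the original slides. The names `affine_global_score`, `M`, `X` and `Y` are hypothetical. M holds alignments ending with x_i paired to y_j, X holds those ending with x_i against a gap, and Y holds those ending with y_j against a gap.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Global score where a gap of length k costs gap_open + (k - 1) * gap_extend."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i : y_j
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i : gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with gap : y_j
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend  # one long gap in y
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend  # one long gap in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing the same gap costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("AAAA", "AA"))  # -4: one gap of length 2 (-5 - 1) plus 2 matches
```

One gap of length 2 scores -6, while two separate gaps of length 1 would score -10. The affine scheme therefore prefers the single gap, as the slides argue. When gap_open equals gap_extend, the scheme reduces to the linear gap penalty.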
Example scoring: match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word (k-tuple) methods
FASTA
BLAST
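The core idea behind word (k-tuple) methods can be illustrated with a short Python sketch. The k-letter words of the query are indexed, the subject sequence is scanned for shared words, and each hit is counted on its diagonal. This is a simplified illustration of seeding only. It is not FASTA's or BLAST's actual algorithm, which add scoring, extension and statistics. The function name `kmer_diagonal_hits` is hypothetical.

```python
from collections import Counter, defaultdict


def kmer_diagonal_hits(query: str, subject: str, k: int = 3) -> Counter:
    """Count shared k-letter words per diagonal (subject position - query position)."""
    # Index every k-letter word of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    # Scan the subject; every shared word is a hit on diagonal j - i.
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            diagonals[j - i] += 1
    return diagonals


hits = kmer_diagonal_hits("ACGTAC", "TTACGTAC", k=3)
print(hits.most_common(1))  # [(2, 4)]
```

A diagonal with many hits marks a region worth aligning in detail. Only that region then needs the more expensive dynamic programming, which is why these methods are much faster.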
Conserved domains
Translated DNA and protein comparison: exons and introns
Even more can be done with RNA
RNA comparisons of the reverse complement of a sequence to itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known; the illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
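Here is a small Python sketch of comparing an RNA sequence with its own reverse complement. It builds a windowed dot matrix in which a dot means that at least `threshold` of the `window` residue pairs along that diagonal are identical. The names `reverse_complement` and `dot_matrix` are hypothetical. The sketch assumes Watson-Crick pairing only (A-U, G-C), so G-U wobble pairs are ignored. The toy sequence is an illustrative hairpin, not tRNA-Phe.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick pairs only


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def dot_matrix(a: str, b: str, window: int = 1, threshold: int = 1):
    """dots[i][j] is True when at least `threshold` of the `window`
    residue pairs starting at a[i] and b[j] are identical."""
    rows = len(a) - window + 1
    cols = len(b) - window + 1
    return [
        [sum(a[i + k] == b[j + k] for k in range(window)) >= threshold
         for j in range(cols)]
        for i in range(rows)
    ]


hairpin = "GGCAAAAGCC"  # stem GGC ... GCC around an AAAA loop
dots = dot_matrix(hairpin, reverse_complement(hairpin), window=3, threshold=3)
for row in dots:
    print("".join("*" if dot else "." for dot in row))
```

A dot at (i, j) in this self-comparison means hairpin[i:i+3] can base-pair with a segment read backwards from the 3' end. In the printout the stem therefore shows up as dots at the corners (0, 0) and (7, 7).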
Programs for Dot Matrix
Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
SIGNAL: http://innovations.wmed.edu/research/informatics/res_inf_sig.html
Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
COMPARE, DOTPLOT in GCG
Conclusion
Advantages: Readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by the other, more automated methods. It lets your eyes and brain do the work, which is very efficient.
Disadvantages: Most dot-matrix computer programs do not show an actual alignment, and they do not return a score to indicate how "optimal" a given alignment is.
Dynamic Programming
Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
Alignment methods with DP
Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along their entire length.
Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
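The question "How many different alignments are there?" can be answered with the same kind of table. Every alignment ends with one of three columns: (x_i, y_j), (x_i, -) or (-, y_j). The counts therefore satisfy N(i, j) = N(i-1, j-1) + N(i-1, j) + N(i, j-1), and N = 1 when one sequence is empty. This sketch assumes that adjacent columns such as (x_i, -)(-, y_j) and (-, y_j)(x_i, -) count as different alignments. The function name `count_alignments` is hypothetical.

```python
def count_alignments(n: int, m: int) -> int:
    """Number of alignments of a length-n sequence with a length-m sequence."""
    # N[i][j]: count for prefixes of length i and j; 1 when either is empty.
    N = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The last column is (x_i, y_j), (x_i, -) or (-, y_j).
            N[i][j] = N[i - 1][j - 1] + N[i - 1][j] + N[i][j - 1]
    return N[n][m]


print([count_alignments(k, k) for k in range(1, 5)])  # [3, 13, 63, 321]
print(count_alignments(10, 10))                        # 8097453
```

The count grows exponentially: two sequences of length 10 already have more than eight million alignments. Enumerating them is infeasible, while dynamic programming finds the best one in O(nm) steps.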
Dynamic Programming
A simple example
[Figure: weighted graph with nodes A, B, C, D, E, F]
Exercise
Conditions for applying dynamic programming
Optimal substructure: every sub-policy of an optimal policy is itself optimal.
No after-effect: the states of earlier stages cannot directly influence future decisions.
Trading space for time (overlapping subproblems).
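The "trading space for time" condition can be made concrete with a small Python experiment. The same global alignment recursion is run twice: once as plain recursion, and once with each F(i, j) cached via `functools.lru_cache`. The function names are hypothetical and the scoring (match 1, mismatch -1, gap -1) is an assumption chosen for illustration.

```python
from functools import lru_cache


def naive_global_score(x, y, match=1, mismatch=-1, gap=-1):
    """Plain recursion: the same F(i, j) is recomputed many times."""
    calls = 0

    def F(i, j):
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return (i + j) * gap  # one prefix is empty: all gaps
        s = match if x[i - 1] == y[j - 1] else mismatch
        return max(F(i - 1, j - 1) + s, F(i - 1, j) + gap, F(i, j - 1) + gap)

    return F(len(x), len(y)), calls


def memoized_global_score(x, y, match=1, mismatch=-1, gap=-1):
    """Same recursion, but each F(i, j) is stored once (space traded for time)."""

    @lru_cache(maxsize=None)
    def F(i, j):
        if i == 0 or j == 0:
            return (i + j) * gap
        s = match if x[i - 1] == y[j - 1] else mismatch
        return max(F(i - 1, j - 1) + s, F(i - 1, j) + gap, F(i, j - 1) + gap)

    return F(len(x), len(y)), F.cache_info().currsize


print(naive_global_score("GATTACA", "GCATGCA"))     # (score, number of calls)
print(memoized_global_score("GATTACA", "GCATGCA"))  # (same score, distinct states)
```

Both versions return the same score. The naive version makes tens of thousands of calls, while the cached version stores at most (n + 1)(m + 1) = 64 distinct subproblems. Those overlapping subproblems are exactly what the DP table exploits.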
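DP Algorithm for Global Alignment
Two sequences X = x_1...x_n and Y = y_1...y_m.
Let F(i, j) be the optimal alignment score of x_1...x_i and y_1...y_j (0 ≤ i ≤ n, 0 ≤ j ≤ m).
$$F(0,0) = 0$$
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j)\\ F(i-1,j) + d\\ F(i,j-1) + d \end{cases}$$
On the borders only the gap term applies, so F(i, 0) = i·d and F(0, j) = j·d.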
DP in equation form
Each cell F(i, j) is reached from F(i-1, j-1) by adding s(x_i, y_j), or from F(i-1, j) or F(i, j-1) by adding the gap penalty d; F(i, j) is the maximum of these three values.
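A minimal Python sketch of filling F with the global recurrence above, assuming the slides' substitution matrix and d = -5 for the worked example. The names `global_dp_matrix`, `SLIDE_MATRIX` and `slide_score` are hypothetical. The substitution score is passed as a function, so simple match/mismatch schemes work too.

```python
SLIDE_MATRIX = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def slide_score(a: str, b: str) -> int:
    return SLIDE_MATRIX[a][b]


def global_dp_matrix(x: str, y: str, s, d: int):
    """Fill F for global alignment; x indexes rows (i), y indexes columns (j)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F(0, 0) = 0
    for i in range(1, n + 1):
        F[i][0] = i * d  # x_1..x_i against gaps
    for j in range(1, m + 1):
        F[0][j] = j * d  # y_1..y_j against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # diagonal
                F[i - 1][j] + d,  # gap
                F[i][j - 1] + d,  # gap
            )
    return F


# Rows follow AGC and columns follow AAG, matching the layout of the example table.
for row in global_dp_matrix("AGC", "AAG", slide_score, -5):
    print(row)
```

The four printed rows should match the completed table in the worked example: [0, -5, -10, -15], [-5, 2, -3, -8], [-10, -3, -3, -1], [-15, -8, -8, -6]. The bottom-right cell, -6, is the optimal global score.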
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix s:
       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2
DP matrix F (AAG along the top, AGC down the left side), filled from the upper left:
        -     A     A     G
  -     0    -5   -10   -15
  A    -5     2    -3    -8
  G   -10    -3    -3    -1
  C   -15    -8    -8    -6
Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Cells on the traceback paths:
        -     A     A     G
  -     0    -5
  A           2    -3
  G                      -1
  C                      -6
Optimal alignments (score -6):
AAG-    AAG-
-AGC    A-GC
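The traceback rules above can be sketched in Python by following every move whose score is consistent with the cell it leads to, which enumerates all co-optimal alignments. In this sketch rows follow the left sequence and columns follow the top sequence, as in the slides. It assumes the slides' substitution matrix for the example. The names `all_optimal_global_alignments` and `slide_score` are hypothetical. Enumerating every path can blow up for long sequences, so this is only meant for small examples.

```python
SLIDE_MATRIX = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def slide_score(a: str, b: str) -> int:
    return SLIDE_MATRIX[a][b]


def all_optimal_global_alignments(top: str, left: str, s, d: int):
    """Return (score, [(aligned_top, aligned_left), ...]) for every optimal path."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)

    results = []

    def walk(i, j, rev_top, rev_left):
        # Characters are collected from the end, so reverse them at the corner.
        if i == 0 and j == 0:
            results.append(("".join(reversed(rev_top)), "".join(reversed(rev_left))))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            walk(i - 1, j - 1, rev_top + [top[j - 1]], rev_left + [left[i - 1]])  # diagonal
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, rev_top + [top[j - 1]], rev_left + ["-"])  # horizontal: gap in left
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, rev_top + ["-"], rev_left + [left[i - 1]])  # vertical: gap in top

    walk(n, m, [], [])
    return F[n][m], results


score, alignments = all_optimal_global_alignments("AAG", "AGC", slide_score, -5)
print(score)  # -6
for aligned_top, aligned_left in alignments:
    print(aligned_top)
    print(aligned_left)
```

For AAG and AGC this recovers the two optimal alignments shown above, AAG- / -AGC and AAG- / A-GC. The same function applied to the global-alignment exercise (X = catgt, Y = acgctg, match 2, mismatch -1, d = -1) reports a score of 2.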
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix:

        A    C    G    T
A       2   -7   -5   -7
C      -7    2   -7   -5
G      -5   -7    2   -7
T      -7   -5   -7    2

DP matrix (AAG across the top, AGC down the side; the first row and column are 0):

             A    A    G
        0    0    0    0
A       0    2    2    0
G       0    0    0    4
C       0    0    0    0

Tracing back from the highest score (4) until you reach 0 gives the optimal local alignment:

AG
AG
Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix as in the previous example.

DP matrix (AAG across the top, GAAGGC down the side; the first row and column are 0):

             A    A    G
        0    0    0    0
G       0    0    0    2
A       0    2    2    0
A       0    2    4    0
G       0    0    0    6
G       0    0    0    2
C       0    0    0    0

The highest score, 6, is in the fourth row (the G of GAAG); tracing back from it until you reach 0 gives the optimal local alignment:

AAG
AAG
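As a worked check of the cell holding 6, with i indexing the rows (GAAGGC) and j the columns (AAG), the local recurrence with d = -5 gives:

$$F(4,3) = \max\{0,\; F(3,2) + s(G,G),\; F(3,3) + d,\; F(4,2) + d\} = \max\{0,\; 4 + 2,\; 0 - 5,\; 0 - 5\} = 6$$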
End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight. For example:

X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment

Base conditions: for all i, j,

$$F(i,0) = 0, \qquad F(0,j) = 0$$

Recurrence relation:

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(X_i, Y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$

Search for $i^*$ such that $F(i^*,m) = \max_{1 \le i \le n} F(i,m)$.
Search for $j^*$ such that $F(n,j^*) = \max_{1 \le j \le m} F(n,j)$.
Define the alignment score as $\max\{F(n,j^*),\, F(i^*,m)\}$.
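A minimal Python sketch of the end-space free score, assuming a simple match/mismatch/gap scheme whose defaults are the exercise values (1, -1, -1). It follows the slide: zero first row and column, the usual recurrence, then the best value in the last row or last column. The function name is hypothetical, only the score is returned, and both sequences are assumed non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, gap=-1):
    """End-space free alignment score with a linear gap score.

    Leading gaps are free because F(i, 0) = F(0, j) = 0; trailing gaps are
    free because the score is the best value in the last row or column.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # base conditions: zero borders
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align x_i with y_j
                          F[i - 1][j] + gap,     # x_i against a gap
                          F[i][j - 1] + gap)     # y_j against a gap
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    return max(best_last_row, best_last_col)


print(end_space_free_score("cactgtac", "gacacttg"))  # 4
```

On the example sequences it returns 4, the score of the alignment shown above: five matched columns, one internal gap, and free end gaps.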
Exercise

Align the two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
• Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
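One possible answer to the LCS question, sketched in Python: LCS is the same DP idea with a score of 1 for a match, 0 for skipping a character, and no credit for mismatches. The function name `lcs` is illustrative, and when several LCSs exist the traceback returns just one of them.

```python
def lcs(a, b):
    """Longest common subsequence; L[i][j] is the LCS length of a[:i] and b[:j]."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1            # extend a common subsequence
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip a character of a or b
    # Traceback from the lower right corner to recover one LCS.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))  # AG
```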
Affine gap penalty

LETVGYW----L
(the four gap positions score -5, -1, -1, -1)

Separate penalties for gap opening and gap extension. This requires modifying the DP algorithm.
Affine gap penalty
• A gap of length k is more probable than k gaps of length 1:
  – a gap may be due to a single mutational event that inserted or deleted a stretch of characters;
  – separated gaps are probably due to distinct mutational events.
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  – a penalty h associated with opening a gap;
  – a smaller penalty g for extending the gap.
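One common way to write the two-term penalty, assuming h is charged on the first gap position and g on each further position (this matches the -5, -1, -1, -1 example with h = 5 and g = 1; some texts charge h + gk instead):

$$\gamma(k) = -h - (k-1)\,g, \qquad \gamma(4) = -5 - 3 \cdot 1 = -8$$

For comparison, a linear penalty of d = -5 per position, as in the earlier examples, would score the same length-4 gap as 4 × (-5) = -20.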
Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.

[Figure: worked example with match = 1, mismatch = -1]
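A minimal sketch of the three-matrix idea for global alignment, in the style of Gotoh's algorithm. It assumes match = 1 and mismatch = -1 as on the slide, and h = 5, g = 1 read from the LETVGYW----L example, so a gap of length k scores -(h + (k - 1)g). The names M, X and Y for the three matrices and the function name are illustrative; only the score is returned, without traceback.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=5, g=1):
    """Global alignment score with affine gaps, using three DP matrices.

    M[i][j]: best score with x_i aligned to y_j
    X[i][j]: best score with x_i aligned to a gap
    Y[i][j]: best score with y_j aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(h + (i - 1) * g)  # leading gap of length i in y
    for j in range(1, m + 1):
        Y[0][j] = -(h + (j - 1) * g)  # leading gap of length j in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs h; extending an existing gap of the same kind costs g.
            X[i][j] = max(M[i - 1][j] - h, X[i - 1][j] - g, Y[i - 1][j] - h)
            Y[i][j] = max(M[i][j - 1] - h, Y[i][j - 1] - g, X[i][j - 1] - h)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("ACGT", "AT"))  # -4
```

Here the best alignment is ACGT over A--T: two matches (+2) and one gap of length 2 (-5 - 1), giving -4, whereas two separate gaps would each pay the opening penalty.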
Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
• FASTA
• BLAST
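The word-hit step behind k-tuple methods can be sketched as a k-mer lookup: index the k-letter words of one sequence, look up each word of the other, and count hits per diagonal. This is a simplified illustration with hypothetical function names, not how FASTA or BLAST are actually implemented; both add scoring, extension and statistics on top of the word hits.

```python
from collections import Counter, defaultdict


def kmer_index(seq, k):
    """Map every k-letter word of seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query, target, k):
    """Return (query_pos, target_pos) pairs where the same k-word occurs in both."""
    index = kmer_index(target, k)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def diagonal_counts(hits):
    """Count hits per diagonal (target_pos - query_pos)."""
    return Counter(j - i for i, j in hits)


hits = word_hits("AAG", "GAAGGC", 2)
print(hits)                   # [(0, 1), (1, 2)]
print(diagonal_counts(hits))  # Counter({1: 2})
```

Both word hits lie on diagonal 1, the same offset as the local alignment of AAG with GAAGGC found by DP.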
Even more can be done with RNA

RNA comparisons of the reverse complement of a sequence to itself can often be very informative.
• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
• The sequence and structure of this molecule are also known; the illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
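A minimal sketch of a dot matrix comparing an RNA sequence with its own reverse complement. It uses only Watson-Crick pairs (G-U wobble pairs are ignored), the function names are illustrative, and the hairpin below is a hypothetical toy sequence, not tRNA-Phe. A `*` at row i, column j means the word starting at position i can pair with the word ending at position n-1-j.

```python
# Watson-Crick complements for RNA; G-U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def dot_matrix(seq1, seq2, window=1):
    """Rows of '*' / '.' marking where window-length words of seq1 and seq2 are identical."""
    rows = []
    for i in range(len(seq1) - window + 1):
        row = "".join(
            "*" if seq1[i:i + window] == seq2[j:j + window] else "."
            for j in range(len(seq2) - window + 1)
        )
        rows.append(row)
    return rows


# Hypothetical toy hairpin: the GGG stem can pair with the CCC stem.
hairpin = "GGGAAACCC"
for row in dot_matrix(hairpin, reverse_complement(hairpin), window=3):
    print(row)
```

The two dots in the corners mark the same GGG/CCC stem seen from either end; in a real tRNA each helical stem would show up as a short diagonal run.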
Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself
Programs for Dot Matrix
• Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
• SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html
• Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
• COMPARE and DOTPLOT in GCG
Conclusion

Advantages:
• Readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by the other, more automated methods.
• Lets your eyes and brain do the work, which is very efficient.

Disadvantages:
• Most dot matrix computer programs do not show an actual alignment.
• They do not return a score to indicate how 'optimal' a given alignment is.
Dynamic Programming

DP answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
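The question of how many different alignments exist can itself be answered with DP. A minimal sketch, assuming no alignment column contains two gaps and that adjacent insertions and deletions in different orders count as different alignments; under these assumptions the counts are the Delannoy numbers, which grow exponentially, so enumerating every alignment quickly becomes impractical. The function name is illustrative.

```python
def count_alignments(n, m):
    """Number of distinct global alignments of sequences of lengths n and m.

    The last column is a match/mismatch, a gap in the second sequence,
    or a gap in the first, so N(i, j) = N(i-1, j-1) + N(i-1, j) + N(i, j-1).
    """
    N = [[1] * (m + 1) for _ in range(n + 1)]  # exactly one alignment if a sequence is empty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            N[i][j] = N[i - 1][j - 1] + N[i - 1][j] + N[i][j - 1]
    return N[n][m]


print(count_alignments(3, 3))  # 63, e.g. for AAG versus AGC
```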
Alignment methods with DP
• Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along their entire length.
• Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
Dynamic Programming
A simple example

[Figure: a weighted graph with nodes A to F and numbered edge weights]
Exercise

Conditions for applying dynamic programming
• Optimal substructure: every sub-policy of an optimal policy is itself optimal.
• No after-effect: the states of earlier stages cannot directly affect future decisions.
• Space is traded for time (overlapping subproblems).
Dynamic Programming
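DP Algorithm for Global Alignment

Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.
Let F(i,j) be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$), with substitution matrix s and linear gap score d:

$$F(0,0) = 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d$$

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$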
DP in equation form

Each cell F(i,j) takes the largest of three values: its diagonal neighbour F(i-1,j-1) plus s(x_i, y_j), its upper neighbour F(i-1,j) plus d, or its left neighbour F(i,j-1) plus d.
A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix:

        A    C    G    T
A       2   -7   -5   -7
C      -7    2   -7   -5
G      -5   -7    2   -7
T      -7   -5   -7    2

DP matrix (AAG across the top, AGC down the side). The first row and column hold multiples of d; every other cell follows the recurrence:

             A    A    G
        0   -5  -10  -15
A      -5    2   -3   -8
G     -10   -3   -3   -1
C     -15   -8   -8   -6
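As a worked check of one cell, take the G row and the G column (i = 2 over AGC, j = 3 over AAG) with d = -5:

$$F(2,3) = \max\{F(1,2) + s(G,G),\; F(1,3) + d,\; F(2,2) + d\} = \max\{-3 + 2,\; -8 - 5,\; -3 - 5\} = -1$$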
Traceback
• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.
A simple example

Find the optimal alignment of AAG and AGC (gap penalty d = -5). Tracing back from -6 in the lower right corner passes through -1 and then -3. The cell holding -3 can be reached either diagonally from -5 or horizontally from 2, and both routes end at 0 in the upper left corner, so there are two optimal alignments, each scoring -6:

AAG-    AAG-
-AGC    A-GC
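A minimal Python sketch of global alignment with the traceback rules above, assuming the substitution matrix and d = -5 from the example. The names are illustrative; at ties it prefers the diagonal move, then the vertical move, so it returns only one of the co-optimal alignments.

```python
# Substitution matrix from the example (rows/columns A, C, G, T).
SUBST = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def needleman_wunsch(top, left, d=-5, s=SUBST):
    """Global alignment of `top` (matrix columns) and `left` (matrix rows).

    Returns (score, aligned top sequence, aligned left sequence).
    """
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]],  # diagonal
                          F[i - 1][j] + d,   # vertical: gap in the top sequence
                          F[i][j - 1] + d)   # horizontal: gap in the left sequence
    # Trace back from the lower right corner to the upper left corner.
    a_top, a_left = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]]:
            a_top.append(top[j - 1])
            a_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            a_top.append("-")
            a_left.append(left[i - 1])
            i -= 1
        else:
            a_top.append(top[j - 1])
            a_left.append("-")
            j -= 1
    return F[n][m], "".join(reversed(a_top)), "".join(reversed(a_left))


print(needleman_wunsch("AAG", "AGC"))  # (-6, 'AAG-', '-AGC')
```

Preferring the horizontal move at the tie in the first row would give the other optimal alignment, AAG- over A-GC.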
Exercise

Find the global alignment of X = catgt and Y = acgctg. Scoring: d = -1, mismatch = -1, match = 2.

Answer
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually an alignment that spans the complete length of both sequences is not required.
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Even more can be done with RNAEven more can be done with RNA
RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative
bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast
bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))
Presented By Liu QiPresented By Liu Qi
Structures of Structures of tRNA-PhetRNA-Phe
Presented By Liu QiPresented By Liu Qi
RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
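The following is one possible approach to the LCS question, offered as a sketch rather than the intended model answer. The LCS length uses the same recurrence as global alignment with match = 1, mismatch = 0 and gap = 0. A traceback then recovers one common subsequence of that length. When there are ties, other LCSs of the same length may also exist.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j]: length of an LCS of x[:i] and y[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from L[n][m], collecting matched characters in reverse order.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```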
Affine gap penalty
LETVGYW----L (the four gap positions are penalized -5, -1, -1, -1)
Separate penalties for gap opening and gap extension.
This requires modifying the DP algorithm.
Affine gap penalty
- A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters;
  - separated gaps are probably due to distinct mutational events.
- A linear gap penalty function treats these cases the same.
- It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap;
  - a smaller penalty g for extending the gap.
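The following worked comparison reads the LETVGYW----L example as charging the opening penalty h = 5 for the first gap position and the extension penalty g = 1 for each further position. This reading is an assumption; some texts instead charge h + k·g.

$$w(k) = -\bigl(h + (k-1)\,g\bigr)$$
$$\text{one gap of length 4: } w(4) = -(5 + 3 \cdot 1) = -8$$
$$\text{four separate gaps of length 1: } 4\,w(1) = 4 \cdot (-5) = -20$$

By contrast, a linear penalty of, say, d = -1 per position would score both cases as -4. This is the distinction the slide draws.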
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Need 3 matrices instead of 1.
match = 1, mismatch = -1
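Here is a minimal sketch of the three-matrix idea for global alignment. It assumes the match = 1, mismatch = -1 scores above and scores a gap of length k as gap_open + (k - 1) * gap_extend. The defaults gap_open = -5 and gap_extend = -1 are illustrative and mirror the LETVGYW----L example. The matrix names M, Ix and Iy follow a common textbook presentation of Gotoh's algorithm. The slides' own formulas for this case did not survive extraction, so their notation may differ.

```python
NEG = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Optimal global alignment score with affine gaps, using three DP matrices.

    M[i][j]:  best score of x[:i] vs y[:j] ending with x[i-1] aligned to y[j-1]
    Ix[i][j]: best score ending with x[i-1] aligned to a gap
    Iy[i][j]: best score ending with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # Leading gaps: one gap of length i (or j) down the first column (or along the first row).
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Opening a new gap costs gap_open; continuing the same gap costs gap_extend.
            Ix[i][j] = max(M[i - 1][j] + gap_open,
                           Ix[i - 1][j] + gap_extend,
                           Iy[i - 1][j] + gap_open)
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])
```

For ACCCGT against ACGT the best score is -2, for example from ACCCGT/AC--GT: four matches and one gap of length 2 costing -6. Two separate single gaps would cost -10 instead.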
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
FASTA
BLAST
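FASTA and BLAST both begin by finding short exact word matches between the query and a database sequence, before running any slower alignment. The sketch below shows only that seeding idea, with a hypothetical word length k = 2. It is not the FASTA or BLAST implementation. Those programs add further steps, including neighbourhood words in BLAST, diagonal joining in FASTA, scoring, and extension.

```python
from collections import defaultdict


def word_hits(query, target, k=2):
    """List (query_pos, target_pos) pairs where the same k-letter word occurs in both."""
    # Index every k-letter word of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    # Scan the target and report every exact word match (a "hit").
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            hits.append((i, j))
    return hits


def diagonal_counts(hits):
    """Count hits per diagonal (target_pos - query_pos); dense diagonals suggest similar regions."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return dict(counts)
```

For the query AAG and the target GAAGGC, both hits lie on diagonal 1. That diagonal corresponds to the AAG/AAG local alignment found by dynamic programming.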
Structures of tRNA-Phe
RNA comparisons of the reverse complement of a sequence to itself
Programs for Dot Matrix
Dotlet: httpwwwisrecisb-sibchjavadotletDotlethtml
SIGNAL: httpinnovationswmededuresearchinformatics res_inf_sightml
Dotter: httpwwwcgbkisecgbgroupssonnhammerDotterhtml
COMPARE and DOTPLOT in GCG
Conclusion
Advantages
- Readily reveals the presence of insertions/deletions and of direct and inverted repeats, which are more difficult to find by the other, more automated methods.
- Lets your eyes/brain do the work: very efficient.
Disadvantages
- Most dot matrix computer programs do not show an actual alignment.
- A dot matrix does not return a score to indicate how "optimal" a given alignment is.
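Here is a minimal plain-text dot matrix sketch. It assumes DNA over A, C, G and T; for RNA, replace T with U in the complement table. The `window`/`threshold` filter is an illustrative noise filter, not the option set of Dotlet, Dotter or GCG. Plotting a sequence against its own reverse complement, as on the RNA slide, turns inverted repeats such as the two arms of a hairpin stem into diagonals.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def dot_matrix(x, y, window=1, threshold=1):
    """Rows follow x and columns follow y. A '*' marks (i, j) when the windows of
    `window` residues starting at x[i] and y[j] share at least `threshold` identities."""
    rows = []
    for i in range(len(x) - window + 1):
        row = []
        for j in range(len(y) - window + 1):
            same = sum(x[i + k] == y[j + k] for k in range(window))
            row.append("*" if same >= threshold else ".")
        rows.append("".join(row))
    return rows
```

Usage: `print("\n".join(dot_matrix(seq, reverse_complement(seq), window=4, threshold=4)))` prints a filtered plot of a sequence against its reverse complement.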
Dynamic Programming
Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
Alignment methods with DP
- Global alignment: Needleman-Wunsch (1970) finds the highest-scoring alignment over the entire length of both sequences (in the simplest scoring scheme, the one with the most matches).
- Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
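One way to make "how many different alignments" concrete is to treat two alignments as different whenever their paths through the DP matrix differ. Under that assumption, X-/-Y and -X/Y- count separately. The count then obeys its own DP recurrence and equals the Delannoy number. It grows exponentially with sequence length, which is why enumerating all alignments is impractical.

```python
from math import comb


def count_alignments(n, m):
    """Number of global alignments of sequences of lengths n and m,
    counted as distinct paths of diagonal, down and right steps through the DP grid."""
    # N[i][j]: number of paths from (0, 0) to (i, j); first row and column have one path each.
    N = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            N[i][j] = N[i - 1][j - 1] + N[i - 1][j] + N[i][j - 1]
    return N[n][m]


def count_alignments_closed_form(n, m):
    """The same count via the Delannoy formula: sum over k of C(n, k) * C(m, k) * 2^k."""
    return sum(comb(n, k) * comb(m, k) * 2 ** k for k in range(min(n, m) + 1))
```

Two sequences of length 3, such as AAG and AGC, already have 63 alignments under this counting.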
Dynamic Programming
A simple example
[Figure: a weighted graph with nodes A, B, C, D, E and F]
Exercise
Conditions for applying dynamic programming
- Optimal substructure: every sub-policy of an optimal policy is itself optimal.
- No aftereffect: the states of earlier stages cannot directly influence future decisions.
- Trading space for time (overlapping subproblems).
DP Algorithm for Global Alignment
Two sequences $X = x_1 \ldots x_n$ and $Y = y_1 \ldots y_m$.
Let $F(i,j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$).
$$F(0,0) = 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d$$
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
DP in equation form
Each cell $F(i,j)$ takes the maximum of three candidates computed from its neighbours:
- diagonal: $F(i-1,j-1) + s(x_i,y_j)$
- from above: $F(i-1,j) + d$
- from the left: $F(i,j-1) + d$
A simple example
Substitution matrix s:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
$$F(i,j) = \max\{\,F(i-1,j-1) + s(x_i,y_j),\; F(i-1,j) + d,\; F(i,j-1) + d\,\}$$
DP matrix (AAG across the top, AGC down the side):
          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
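The filled matrix can be checked with a short Python sketch of the global recurrence. The sketch assumes the slide's substitution matrix and d = -5. It passes the side sequence AGC as `x` and the top sequence AAG as `y`. The names are illustrative.

```python
# Substitution matrix s from the slide (rows and columns in the order A, C, G, T).
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
S = {(a, b): SCORES[r][c] for r, a in enumerate(BASES) for c, b in enumerate(BASES)}


def global_dp_matrix(x, y, d=-5):
    """Fill F for global alignment; x runs down the rows and y across the columns."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: a prefix aligned entirely against gaps costs d per character.
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + S[x[i - 1], y[j - 1]],  # diagonal: x_i with y_j
                F[i - 1][j] + d,                          # vertical: x_i against a gap
                F[i][j - 1] + d,                          # horizontal: y_j against a gap
            )
    return F
```

`global_dp_matrix("AGC", "AAG")` gives the rows [0, -5, -10, -15], [-5, 2, -3, -8], [-10, -3, -3, -1] and [-15, -8, -8, -6]. The optimal global score is therefore F(3, 3) = -6.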
Traceback
- Start from the lower right corner and trace back to the upper left.
- Each arrow introduces one character at the end of each aligned sequence.
- A horizontal move puts a gap in the left sequence.
- A vertical move puts a gap in the top sequence.
- A diagonal move uses one character from each sequence.
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Traceback path (AAG across the top, AGC down the side): the path starts at F(3,3) = -6, moves up to F(2,3) = -1 and then diagonally to F(1,2) = -3. From there it splits. One branch goes diagonally to F(0,1) = -5 and then left to F(0,0) = 0. The other goes left to F(1,1) = 2 and then diagonally to F(0,0) = 0.
Optimal global alignments (score -6):
AAG-    AAG-
-AGC    A-GC
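The traceback rules can be sketched in Python by following every arrow that reproduces a cell's value, so that the function returns all co-optimal alignments rather than just one. This is an illustrative sketch: the function and parameter names are hypothetical, `top` is the sequence across the columns, `left` is the sequence down the rows, and the recursion is meant only for short teaching examples.

```python
# Substitution matrix s from the slide, keyed by base pairs.
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
SUBST = {(a, b): SCORES[r][c] for r, a in enumerate(BASES) for c, b in enumerate(BASES)}


def all_global_alignments(top, left, score, d):
    """Return (optimal score, list of (aligned top, aligned left)) over all optimal paths."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)

    results = []

    def walk(i, j, t, l):
        # t and l collect alignment columns from right to left.
        if i == 0 and j == 0:
            results.append(("".join(reversed(t)), "".join(reversed(l))))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            walk(i - 1, j - 1, t + [top[j - 1]], l + [left[i - 1]])  # diagonal
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, t + ["-"], l + [left[i - 1]])  # vertical: gap in top
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, t + [top[j - 1]], l + ["-"])  # horizontal: gap in left

    walk(n, m, [], [])
    return F[n][m], results
```

With AAG as `top`, AGC as `left`, the slide's matrix and d = -5, the function returns score -6 and the two alignments AAG-/-AGC and AAG-/A-GC. For the following exercise (match = 2, mismatch = -1, d = -1), the same function reports an optimal score of 2.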
Exercise
Find the global alignment of X = catgt and Y = acgctg.
Scores: d = -1, mismatch = -1, match = 2.
Answer
Local alignment
- A single-domain protein may be homologous to a region within a multi-domain protein.
- Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.
$$F(0,0) = 0, \qquad F(i,0) = F(0,j) = 0$$
$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Local DP in equation form
Each cell $F(i,j)$ takes the maximum of 0 and three candidates computed from its neighbours:
- diagonal: $F(i-1,j-1) + s(x_i,y_j)$
- from above: $F(i-1,j) + d$
- from the left: $F(i,j-1) + d$
Local alignment
Two differences with respect to global alignment:
- No score is negative.
- Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Programs for Dot MatrixPrograms for Dot Matrix
DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml
SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics
res_inf_sightmlres_inf_sightmlDotter Dotter
httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml
COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Dynamic Programming
Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
How many different alignments are there?
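One way to answer "how many different alignments?" is to count the paths through the DP grid from the upper left corner to the lower right corner, treating every path as a distinct alignment. Counting conventions vary: some texts merge alignments that differ only in the order of adjacent gaps. The count follows the same three-way recurrence as the alignment score, with addition in place of max. The function name is our own.

```python
def count_alignments(n, m):
    """Count grid paths from (0, 0) to (n, m) using diagonal, vertical and horizontal steps."""
    # N[i][j] = number of alignments of the first i and the first j residues.
    # First row and column: exactly one all-gap path.
    N = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            N[i][j] = N[i - 1][j - 1] + N[i - 1][j] + N[i][j - 1]
    return N[n][m]


print(count_alignments(3, 3))
```

`count_alignments(3, 3)` returns 63. The value grows exponentially with sequence length, so listing every alignment quickly becomes impractical, while the DP table needs only (n + 1)(m + 1) cells.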
Alignment methods with DP
Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along their entire length.
Local alignment: Smith-Waterman (1981), a modification of the dynamic programming algorithm, gives the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
[Figure: nodes A to F with numeric weights]
Exercise
Conditions for applying dynamic programming
Optimal substructure: every sub-policy of an optimal policy is itself optimal.
No after-effect: the states of earlier stages cannot directly influence future decisions.
Trading space for time (overlapping subproblems).
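DP Algorithm for Global Alignment
Two sequences X = x_1...x_n and Y = y_1...y_m.
Let F(i, j) be the optimal alignment score of X_1..i and Y_1..j (0 ≤ i ≤ n, 0 ≤ j ≤ m). Then
$$F(i,j)=\max\begin{cases}F(i-1,j-1)+s(x_i,y_j)\\ F(i-1,j)+d\\ F(i,j-1)+d\end{cases}\qquad F(0,0)=0$$
On the first row and column only one neighbour exists, so F(i, 0) = i·d and F(0, j) = j·d.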
DP in equation form
Each cell F(i, j) is derived from three neighbours: the diagonal cell F(i-1, j-1) plus s(x_i, y_j), the cell F(i-1, j) plus d, and the cell F(i, j-1) plus d.
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and this substitution matrix:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
AAG is written across the top of the DP matrix and AGC down the left side.
Initialization: F(0, 0) = 0.
First row and column: each cell adds d = -5 to its only neighbour.
     -    A    A    G
-    0   -5  -10  -15
A   -5
G  -10
C  -15
Filling the rest of the matrix row by row:
     -    A    A    G
-    0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
The optimal global alignment score is the lower right value, -6.
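The fill step above can be reproduced with a short Python sketch of the Needleman-Wunsch recurrence using a linear gap score d. As on the slides, rows follow the left sequence and columns follow the top sequence. The dictionary `DNA` encodes the example substitution matrix. `DNA` and `global_matrix` are illustrative names of our own.

```python
BASES = "ACGT"
_ROWS = [[2, -7, -5, -7],
         [-7, 2, -7, -5],
         [-5, -7, 2, -7],
         [-7, -5, -7, 2]]
# DNA[(a, b)] = substitution score s(a, b) from the example.
DNA = {(a, b): _ROWS[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def global_matrix(top, left, s, d):
    """Fill F for a global alignment: F[i][j] scores left[:i] against top[:j]."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d  # only vertical moves reach the first column
    for j in range(1, m + 1):
        F[0][j] = j * d  # only horizontal moves reach the first row
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[(left[i - 1], top[j - 1])],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    return F


for row in global_matrix("AAG", "AGC", DNA, -5):
    print(row)
```

The printed rows match the filled matrix above, ending in -6.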
Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
For AAG (top) and AGC (left), the traceback visits -6, -1 and -3. From there it continues through either -5 and 0, or 2 and 0. The two branches give two optimal alignments, each scoring -6:
AAG-    AAG-
-AGC    A-GC
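The traceback rules above translate into a short loop. This sketch takes an already filled matrix F; here we use the one from the example, typed in by hand. When several moves reproduce a cell's value, it prefers a diagonal move, then a horizontal one, then a vertical one, so it returns one of the optimal alignments. `DNA` and `traceback` are illustrative names, not taken from the slides.

```python
BASES = "ACGT"
_ROWS = [[2, -7, -5, -7],
         [-7, 2, -7, -5],
         [-5, -7, 2, -7],
         [-7, -5, -7, 2]]
DNA = {(a, b): _ROWS[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def traceback(F, top, left, s, d):
    """Walk from F[n][m] back to F[0][0]; rows follow `left`, columns follow `top`."""
    i, j = len(left), len(top)
    top_out, left_out = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[(left[i - 1], top[j - 1])]:
            # Diagonal move: one character from each sequence.
            top_out.append(top[j - 1])
            left_out.append(left[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and F[i][j] == F[i][j - 1] + d:
            # Horizontal move: gap in the left sequence.
            top_out.append(top[j - 1])
            left_out.append("-")
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            # Vertical move: gap in the top sequence.
            top_out.append("-")
            left_out.append(left[i - 1])
            i -= 1
        else:
            raise ValueError("F does not match the scoring scheme")
    # Characters were collected from the end backwards, so reverse them.
    return "".join(reversed(top_out)), "".join(reversed(left_out))


F = [[0, -5, -10, -15],
     [-5, 2, -3, -8],
     [-10, -3, -3, -1],
     [-15, -8, -8, -6]]
top_aln, left_aln = traceback(F, "AAG", "AGC", DNA, -5)
print(top_aln)
print(left_aln)
```

With this tie-breaking order, the sketch prints AAG- over -AGC. If the horizontal move is preferred at the cell holding -3, the result is the other optimal alignment, AAG- over A-GC.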
Exercise
Find the global alignment of X = catgt and Y = acgctg.
Scoring: d = -1, mismatch = -1, match = 2.
Answer
Local alignment
A single-domain protein may be homologous to a region within a multi-domain protein.
Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix and d is the linear gap penalty.
$$F(i,j)=\max\begin{cases}0\\ F(i-1,j-1)+s(x_i,y_j)\\ F(i-1,j)+d\\ F(i,j-1)+d\end{cases}\qquad F(0,0)=0$$
The first row and column are 0.
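Local DP in equation form
As in the global case, each cell is computed from F(i-1, j-1), F(i-1, j) and F(i, j-1), with one extra option. If all three candidates are negative, the cell is set to 0, and a new local alignment can start there.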
Local alignment
Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
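The two differences listed above are the only changes needed to turn the global fill into a Smith-Waterman sketch: a 0 floor on every cell, and a traceback that starts at the maximum and stops at 0. As before, the substitution dictionary reproduces the example matrix, and all names are illustrative. Ties are broken in favour of diagonal moves, and the first maximum found in row-major order is used.

```python
BASES = "ACGT"
_ROWS = [[2, -7, -5, -7],
         [-7, 2, -7, -5],
         [-5, -7, 2, -7],
         [-7, -5, -7, 2]]
DNA = {(a, b): _ROWS[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def smith_waterman(top, left, s, d):
    """Return (score, top segment, left segment) of one best local alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,  # no score is negative
                          F[i - 1][j - 1] + s[(left[i - 1], top[j - 1])],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:  # remember the first highest-scoring cell
                best, bi, bj = F[i][j], i, j
    # Trace back from the highest cell until a cell holding 0 is reached.
    i, j = bi, bj
    top_out, left_out = [], []
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[(left[i - 1], top[j - 1])]:
            top_out.append(top[j - 1])
            left_out.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i][j - 1] + d:
            top_out.append(top[j - 1])
            left_out.append("-")
            j -= 1
        else:
            top_out.append("-")
            left_out.append(left[i - 1])
            i -= 1
    return best, "".join(reversed(top_out)), "".join(reversed(left_out))


print(smith_waterman("AAG", "AGC", DNA, -5))     # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC", DNA, -5))  # (6, 'AAG', 'AAG')
```

Both calls reproduce the two local alignment examples that follow.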
A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5 and the same substitution matrix as in the global example.
Initialization: the first row and column are 0.
     -    A    A    G
-    0    0    0    0
A    0
G    0
C    0
Filled matrix:
     -    A    A    G
-    0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0
Traceback starts at the highest score, 4, and follows diagonal moves until it reaches 0. The optimal local alignment (score 4) is:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix as in the global example.
     -    A    A    G
-    0    0    0    0
G    0
A    0
A    0
G    0
G    0
C    0
Local alignment
     -    A    A    G
-    0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0
Traceback from the highest score, 6, runs diagonally through 4 and 2 to 0. The optimal local alignment (score 6) therefore aligns AAG with the AAG at positions 2 to 4 of GAAGGC.
End-Space Free Alignment
Any number of indel operations at the beginning or at the end of the alignment contribute zero weight. For example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i and j, F(i, 0) = 0 and F(0, j) = 0.
Recurrence relation:
$$F(i,j)=\max\begin{cases}F(i-1,j-1)+s(x_i,y_j)\\ F(i-1,j)+d\\ F(i,j-1)+d\end{cases}$$
Search for the row i* and the column j* that maximize the last column and the last row, and define the alignment score as the larger of the two:
$$F(i^*,m)=\max_{1\le i\le n}F(i,m),\qquad F(n,j^*)=\max_{1\le j\le m}F(n,j),\qquad \text{score}=\max\{F(i^*,m),\,F(n,j^*)\}$$
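Here is a minimal sketch of the end-space free recurrence. It assumes the simple match/mismatch/gap scores of the exercise below (match = 1, mismatch = -1, gap = -1). Compared with the global version, only the base conditions and the final search change. The function name is hypothetical.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, gap=-1):
    """Best score when gaps before the start or after the end of either sequence are free."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0  # every column would be an end gap
    # Base conditions: F(i, 0) = F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trailing gaps are free: the alignment may end anywhere in the last row or column.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))
```

Under these parameters, the example alignment X = --cac-tgtac, Y = gacacttg--- scores 4: five matches and one internal gap, with the end gaps free. The function's result for that pair is therefore at least 4.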
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
Affine gap penalty
LETVGYW----L
The four-position gap is scored -5 -1 -1 -1: one opening penalty followed by three extension penalties.
Separate penalties are used for gap opening and gap extension.
This requires modifying the DP algorithm.
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
a gap may be due to a single mutational event that inserted or deleted a stretch of characters;
separated gaps are probably due to distinct mutational events.
A linear gap penalty function treats these cases the same.
It is more common to use gap penalty functions involving two terms:
a penalty h associated with opening a gap;
a smaller penalty g for extending the gap.
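The sketch below computes a global alignment score with affine gaps using three matrices, in line with the "need 3 matrices" remark. It assumes that a gap of length k scores h + (k - 1)g, with h and g negative. For LETVGYW----L, h = -5 and g = -1 give -8. The argument `score` is any function returning s(a, b), and all names are our own. Only the score is computed; recovering the alignment would need a traceback through all three matrices.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, score, h, g):
    """Global alignment score where a gap of length k scores h + (k - 1) * g.

    M: x[i-1] aligned to y[j-1]; X: x[i-1] aligned to a gap;
    Y: y[j-1] aligned to a gap. Cells with no valid predecessor stay -inf.
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0  # two empty prefixes align with score 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                M[i][j] = score(x[i - 1], y[j - 1]) + max(
                    M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:
                # Open a new gap (h) or extend the current one (g).
                X[i][j] = max(M[i - 1][j] + h, X[i - 1][j] + g, Y[i - 1][j] + h)
            if j > 0:
                Y[i][j] = max(M[i][j - 1] + h, Y[i][j - 1] + g, X[i][j - 1] + h)
    return max(M[n][m], X[n][m], Y[n][m])


def simple(a, b):
    """Hypothetical scoring: match = 1, mismatch = -1."""
    return 1 if a == b else -1


print(affine_global_score("AAAA", "AA", simple, h=-5, g=-1))
```

For AAAA against AA, the best alignment uses one gap of length 2 rather than two gaps of length 1, so the score is 2 + (-5 - 1) = -4. Setting h = g makes the affine scheme reduce to the linear gap penalty d.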
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
conclusionconclusion
Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods
letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT
DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
ReferenceReference
Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111
Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669
Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Answer what is the optimal alignment of two sequences(the best score)
How many different alignments
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Traceback path through the filled matrix:
           A    A    G
      0   -5
A          2   -3
G                   -1
C                   -6
The cell -3 in row A can be reached both diagonally (from -5, pairing A with A) and horizontally (from 2, with a gap in the left sequence), so there are two optimal alignments, each with score -6:
AAG-    AAG-
-AGC    A-GC
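A minimal traceback sketch, assuming the same scores and layout as the table (left sequence on the rows). It follows every move whose score matches the current cell, ties included, so co-optimal alignments are all returned; `global_alignments` and `PAIR_SCORES` are hypothetical names. Enumerating every tied path can blow up for long sequences, so practical tools usually report a single alignment.

```python
# Slide's substitution matrix; it is symmetric, so each unordered pair is stored once.
PAIR_SCORES = {"AA": 2, "CC": 2, "GG": 2, "TT": 2, "AC": -7, "AG": -5,
               "AT": -7, "CG": -7, "CT": -5, "GT": -7}


def s(a, b):
    """Substitution score s(a, b), looked up in either order."""
    return PAIR_SCORES.get(a + b, PAIR_SCORES.get(b + a))


def global_alignments(left, top, d=-5):
    """Return (optimal score, list of (top_row, left_row)) for every optimal global alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)

    results = []

    def trace(i, j, top_aln, left_aln):
        # Columns are discovered from right to left, so each move prepends one column.
        if i == 0 and j == 0:
            results.append((top_aln, left_aln))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            trace(i - 1, j - 1, top[j - 1] + top_aln, left[i - 1] + left_aln)  # diagonal
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            trace(i, j - 1, top[j - 1] + top_aln, "-" + left_aln)  # horizontal: gap in left
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            trace(i - 1, j, "-" + top_aln, left[i - 1] + left_aln)  # vertical: gap in top

    trace(n, m, "", "")
    return F[n][m], results


score, alignments = global_alignments("AGC", "AAG")
print(score)
for top_row, left_row in alignments:
    print(top_row)
    print(left_row)
    print()
```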
Exercise
Find the global alignment of X = catgt and Y = acgctg. Scoring: d = -1, mismatch = -1, match = 2.
Answer
Local alignment
A single-domain protein may be homologous to a region within a multi-domain protein.
Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.
$$F(0,0) = 0$$
$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Local DP in equation form
$$F(i,j) = \max\{0,\; F(i-1,j-1) + s(x_i, y_j),\; F(i-1,j) + d,\; F(i,j-1) + d\}$$
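Under the same assumptions as the global example (slide substitution matrix, d = -5, left sequence on the rows), the local recurrence differs only in the extra 0 term and in the first row and column staying at 0. A sketch, with the hypothetical helper `sw_matrix`, that also records where the highest score sits:

```python
# Slide's substitution matrix; it is symmetric, so each unordered pair is stored once.
PAIR_SCORES = {"AA": 2, "CC": 2, "GG": 2, "TT": 2, "AC": -7, "AG": -5,
               "AT": -7, "CG": -7, "CT": -5, "GT": -7}


def s(a, b):
    """Substitution score s(a, b), looked up in either order."""
    return PAIR_SCORES.get(a + b, PAIR_SCORES.get(b + a))


def sw_matrix(left, top, d=-5):
    """Fill the local (Smith-Waterman) DP matrix and locate its highest cell."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,  # start afresh instead of letting the score go negative
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    return F, best, best_pos


F, best, pos = sw_matrix("AGC", "AAG")
for row in F:
    print(row)
print(best, pos)
```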
Local alignment
Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
A simple example
Find the optimal local alignment of AAG and AGC, using the same substitution matrix and a gap penalty of d = -5.
Initialization: the first row and column are 0.
           A    A    G
      0    0    0    0
A     0
G     0
C     0
Filling in the matrix (no cell may drop below 0):
           A    A    G
      0    0    0    0
A     0    2    2    0
G     0    0    0    4
C     0    0    0    0
The highest score is 4 (row G, last column). Tracing back diagonally until a 0 is reached gives the optimal local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.
           A    A    G
      0    0    0    0
G     0
A     0
A     0
G     0
G     0
C     0
Local alignment
           A    A    G
      0    0    0    0
G     0    0    0    2
A     0    2    2    0
A     0    2    4    0
G     0    0    0    6
G     0    0    0    2
C     0    0    0    0
The highest score is 6. Tracing back from it until a 0 is reached gives the optimal local alignment:
AAG
AAG
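Traceback for the local case can be sketched as follows, again assuming the slide's substitution matrix and d = -5: start at the highest cell and stop at the first 0. Ties are broken in a fixed order (diagonal, then vertical, then horizontal), so when several local alignments score equally this sketch returns only one; `local_align` is a hypothetical name.

```python
# Slide's substitution matrix; it is symmetric, so each unordered pair is stored once.
PAIR_SCORES = {"AA": 2, "CC": 2, "GG": 2, "TT": 2, "AC": -7, "AG": -5,
               "AT": -7, "CG": -7, "CT": -5, "GT": -7}


def s(a, b):
    """Substitution score s(a, b), looked up in either order."""
    return PAIR_SCORES.get(a + b, PAIR_SCORES.get(b + a))


def local_align(left, top, d=-5):
    """Smith-Waterman: return (score, top_segment, left_segment) for one best local alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0, F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)
            if F[i][j] > best:  # keep the first highest cell in row-major order
                best, bi, bj = F[i][j], i, j

    # Trace back from the highest cell and stop at the first 0.
    i, j = bi, bj
    top_aln, left_aln = "", ""
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):  # diagonal
            top_aln, left_aln = top[j - 1] + top_aln, left[i - 1] + left_aln
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:  # vertical: gap in the top sequence
            top_aln, left_aln = "-" + top_aln, left[i - 1] + left_aln
            i -= 1
        else:  # horizontal: gap in the left sequence
            top_aln, left_aln = top[j - 1] + top_aln, "-" + left_aln
            j -= 1
    return best, top_aln, left_aln


print(local_align("GAAGGC", "AAG"))
print(local_align("AGC", "AAG"))
```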
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight. Example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i and j, F(i,0) = 0 and F(0,j) = 0.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Search for the i such that $F(i,m) = \max_{1 \le i \le n} F(i,m)$.
Search for the j such that $F(n,j) = \max_{1 \le j \le m} F(n,j)$.
Define the alignment score as the larger of these two values:
$$\text{score} = \max\Big\{\max_{1 \le i \le n} F(i,m),\; \max_{1 \le j \le m} F(n,j)\Big\}$$
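A sketch of the end-space free recurrence, assuming simple match/mismatch/gap scores (the exercise's values, used as defaults) in place of a substitution matrix s. `end_space_free` is a hypothetical name, and both sequences are assumed non-empty.

```python
def end_space_free(x, y, match=1, mismatch=-1, gap=-1):
    """Score of the best end-space free alignment of x and y (both non-empty)."""
    n, m = len(x), len(y)
    # Base conditions F(i, 0) = F(0, j) = 0: leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + pair, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trailing gaps are free too: take the best cell in the last column or last row.
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)


print(end_space_free("cactgtac", "gacacttg"))
```

With these parameters the slide's example alignment scores 5 matches minus one interior gap, that is 4, so the function's result for that pair cannot be lower.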
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions to consider
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
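One possible answer to the LCS question, sketched in Python: it is the same table-filling idea with a match adding 1 and everything else adding 0, followed by a traceback that recovers one LCS. `lcs` is a hypothetical name; ties in the traceback are broken arbitrarily, so other LCSs of the same length may exist.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]; row 0 and column 0 are 0.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1  # diagonal move on a match
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip a character of a or b

    # Trace back from the lower right corner, collecting matched characters.
    chars = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            chars.append(a[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(chars))


print(lcs("AAG", "AGC"))
```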
Affine gap penalty
LETVGYW----L
The four gap positions are scored -5, -1, -1, -1: separate penalties for gap opening and gap extension. This requires modifying the DP algorithm.
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
– a gap may be due to a single mutational event that inserted or deleted a stretch of characters;
– separated gaps are probably due to distinct mutational events.
A linear gap penalty function treats these cases the same.
It is more common to use gap penalty functions involving two terms:
– a penalty h associated with opening a gap;
– a smaller penalty g for extending the gap.
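To see why the two cases differ, the two penalty functions can be compared directly. This sketch takes h = 5 and g = 1 from the LETVGYW----L example (-5 for the first gap position, -1 for each extension); some texts instead charge h + g·k for a gap of length k, so the convention here is an assumption. `linear_gap` and `affine_gap` are hypothetical names.

```python
def linear_gap(k, d=-1):
    """Linear gap penalty: each of the k gap positions costs d."""
    return k * d


def affine_gap(k, h=5, g=1):
    """Affine gap penalty: -h for the first gap position, -g for each further one."""
    if k == 0:
        return 0
    return -(h + g * (k - 1))


# One gap of length 4 versus four separate gaps of length 1.
print(linear_gap(4), 4 * linear_gap(1))  # -4 -4: the linear penalty cannot tell them apart
print(affine_gap(4), 4 * affine_gap(1))  # -8 -20: the affine penalty favours the single long gap
```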
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Need 3 matrices instead of 1.
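The recurrences for this case were in slide images that did not survive extraction. One common three-matrix formulation (usually credited to Gotoh) is sketched below for global alignment: M holds alignments ending in a pair of characters, X those ending in a character of x against a gap, and Y those ending in a character of y against a gap. The scores (match 1, mismatch -1, opening -5, extension -1) are assumptions borrowed from nearby slides, and `affine_global` is a hypothetical name.

```python
NEG_INF = float("-inf")


def affine_global(x, y, match=1, mismatch=-1, open_=-5, extend=-1):
    """Global alignment score with affine gaps; a gap of length k costs open_ + extend*(k-1)."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i paired to y_j
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i against a gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with y_j against a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = open_ + extend * (i - 1)  # x[:i] against one leading gap
    for j in range(1, m + 1):
        Y[0][j] = open_ + extend * (j - 1)  # y[:j] against one leading gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = pair + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs open_; continuing the same gap costs only extend.
            X[i][j] = max(M[i - 1][j] + open_, X[i - 1][j] + extend, Y[i - 1][j] + open_)
            Y[i][j] = max(M[i][j - 1] + open_, Y[i][j - 1] + extend, X[i][j - 1] + open_)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global("AAAA", "AA"))
```

For AAAA against AA the single gap of length 2 (cost -6) beats two separate gaps (cost -10), which is exactly the behaviour the affine penalty is meant to produce.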
Dynamic Programming for the Affine Gap Penalty Case
match = 1, mismatch = -1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
FASTA
BLAST
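FASTA and BLAST both begin by finding short exact word (k-tuple) matches, then extend and score them with their own, quite different, heuristics that are not reproduced here. The shared seeding idea can be sketched as: index every k-tuple of the query, scan the subject, and group hits by diagonal (subject position minus query position), since several hits on one diagonal suggest an ungapped matching region. `kmer_index` and `word_hits` are hypothetical names.

```python
from collections import Counter, defaultdict


def kmer_index(seq, k):
    """Map every k-tuple (word) of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for p in range(len(seq) - k + 1):
        index[seq[p:p + k]].append(p)
    return index


def word_hits(query, subject, k=3):
    """Exact k-tuple matches between query and subject as (query_pos, subject_pos) pairs."""
    index = kmer_index(query, k)
    hits = []
    for sp in range(len(subject) - k + 1):
        for qp in index.get(subject[sp:sp + k], []):
            hits.append((qp, sp))
    return hits


hits = word_hits("ACGTACG", "TTACGTT", k=3)
# Hits on the same diagonal (subject_pos - query_pos) line up without gaps.
diagonals = Counter(sp - qp for qp, sp in hits)
print(hits)
print(diagonals)
```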
Dynamic Programming
Dynamic programming answers: what is the optimal alignment of two sequences (the best score)?
How many different alignments are there?
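The counting question can itself be answered by DP. Every alignment ends in one of three column types, so the number of alignments obeys the same three-way recurrence as F, with a sum in place of the maximum. This sketch counts a gap column in one sequence followed by a gap column in the other, and the reverse order, as different alignments (a common but not universal convention); `count_alignments` is a hypothetical name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of distinct global alignments of sequences of lengths n and m."""
    if n == 0 or m == 0:
        return 1  # only one way left: the remaining characters all face gaps
    # The last column is a pair, a gap in the second sequence, or a gap in the first.
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


print([count_alignments(k, k) for k in range(1, 6)])  # [3, 13, 63, 321, 1683]
```

The counts grow exponentially (already 63 for two sequences of length 3, such as AAG and AGC), which is why scoring every alignment one by one is impractical and the DP over prefixes is used instead.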
Alignment methods with DP
Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences.
Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm giving the highest-scoring local match between two sequences.
Dynamic Programming
A simple example
[Figure: diagram with nodes A to F labeled with numbers]
Exercise
Conditions for applying dynamic programming
Optimal substructure: the sub-policies of an optimal policy are always optimal.
No aftereffect: the states of earlier stages cannot directly affect future decisions.
Trading space for time (overlapping subproblems).
Dynamic Programming
DP Algorithm for Global Alignment
Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.
Let F(i,j) be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$).
$$F(0,0) = 0$$
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
DP in equation form
$$F(i,j) = \max\{F(i-1,j-1) + s(x_i, y_j),\; F(i-1,j) + d,\; F(i,j-1) + d\}$$
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix s:
      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2
The DP matrix starts with F(0,0) = 0:
           A    A    G
      0
A
G
C
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Alignment methods with DPAlignment methods with DP
Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences
Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
A simple exampleA simple example
3
4
5
3
6
5 4
2
A
B
C
D
E
F
8
7
9
Presented By Liu QiPresented By Liu Qi
Exercise
Presented By Liu QiPresented By Liu Qi
动态规划的适用条件动态规划的适用条件
一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性
以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix s:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
DP matrix F (AAG across the top, AGC down the side):
          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
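A minimal Python sketch of the fill step, assuming the substitution matrix and d = -5 from this example; `nw_matrix` and its parameter names are hypothetical. Rows follow the side sequence (x in the recurrence) and columns follow the top sequence (y).

```python
# Substitution matrix s from the example (rows and columns in the order A, C, G, T).
BASES = "ACGT"
S = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    return S[BASES.index(a)][BASES.index(b)]


def nw_matrix(top, side, d=-5):
    """Fill the global-alignment DP matrix F (Needleman-Wunsch, linear gap d).

    F[i][j] is the best score of side[:i] against top[:j].
    """
    n, m = len(side), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d  # side[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * d  # top[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(side[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + d,  # vertical: side[i-1] against a gap
                F[i][j - 1] + d,  # horizontal: top[j-1] against a gap
            )
    return F
```

`nw_matrix("AAG", "AGC")` returns the four rows of the matrix above, with -6 in the lower right corner.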
Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Tracing back from -6 in the lower right corner passes through -1 and -3. The cell -3 can be reached either diagonally from -5 or horizontally from 2, so there are two optimal global alignments, each with score -6:
AAG-   AAG-
-AGC   A-GC
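The traceback rules can be sketched in Python as a recursive walk that follows every move consistent with the filled matrix, which recovers both co-optimal alignments. This is an illustrative sketch (the name `all_global_alignments` is hypothetical) assuming the example's substitution matrix and d = -5; it refills the matrix so the block runs on its own, and the number of paths can grow exponentially for longer sequences.

```python
BASES = "ACGT"
S = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]


def s(a, b):
    return S[BASES.index(a)][BASES.index(b)]


def all_global_alignments(top, side, d=-5):
    """Return (score, set of all co-optimal alignments as (top_row, side_row))."""
    n, m = len(side), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                F[i][j] = (i + j) * d  # first row/column: gaps only
            else:
                F[i][j] = max(F[i - 1][j - 1] + s(side[i - 1], top[j - 1]),
                              F[i - 1][j] + d, F[i][j - 1] + d)

    def trace(i, j):
        # Yield (top_row, side_row) for every optimal path from (0, 0) to (i, j).
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(side[i - 1], top[j - 1]):
            for t, u in trace(i - 1, j - 1):  # diagonal: one character from each
                yield t + top[j - 1], u + side[i - 1]
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            for t, u in trace(i, j - 1):  # horizontal: gap in the left (side) sequence
                yield t + top[j - 1], u + "-"
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            for t, u in trace(i - 1, j):  # vertical: gap in the top sequence
                yield t + "-", u + side[i - 1]

    return F[n][m], set(trace(n, m))
```

For AAG and AGC it returns the score -6 together with the two alignments shown above.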
Exercise
Find the global alignment of X = catgt and Y = acgctg. Scoring: d = -1, mismatch = -1, match = 2.
Answer
Local alignment
A single-domain protein may be homologous to a region within a multi-domain protein.
Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.
$$F(i,0)=F(0,j)=0,\qquad F(i,j)=\max\begin{cases}0\\ F(i-1,j-1)+s(x_i,y_j)\\ F(i-1,j)+d\\ F(i,j-1)+d\end{cases}$$
Local DP in equation form
As in the global case, F(i, j) is the best of the diagonal move from F(i-1, j-1) plus s(x_i, y_j) and the two gap moves from F(i-1, j) and F(i, j-1) plus d; in addition, 0 is always an option.
Local alignment
Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
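A minimal Smith-Waterman sketch in Python showing the two differences: the extra 0 in the max, and a traceback that starts at the highest cell and stops at 0. It assumes the same ACGT substitution matrix and d = -5 as the examples; the function name is hypothetical, and when several cells share the highest score it keeps the first one found and follows a single path.

```python
BASES = "ACGT"
S = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]


def s(a, b):
    return S[BASES.index(a)][BASES.index(b)]


def smith_waterman(top, side, d=-5):
    """Return (best local score, aligned top segment, aligned side segment)."""
    n, m = len(side), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,  # difference 1: no score is negative
                F[i - 1][j - 1] + s(side[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j
    # Difference 2: trace back from the highest cell until a 0 is reached.
    t, u = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(side[i - 1], top[j - 1]):
            t.append(top[j - 1])
            u.append(side[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            t.append("-")
            u.append(side[i - 1])
            i -= 1
        else:
            t.append(top[j - 1])
            u.append("-")
            j -= 1
    return best, "".join(reversed(t)), "".join(reversed(u))
```

On the two examples that follow, it gives score 4 with AG/AG for AAG and AGC, and score 6 with AAG/AAG for AAG and GAAGGC.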
A simple example
Find the optimal local alignment of AAG and AGC. Use the substitution matrix from the global example and a gap penalty of d = -5.
DP matrix F (AAG across the top, AGC down the side):
        A   A   G
    0   0   0   0
A   0   2   2   0
G   0   0   0   4
C   0   0   0   0
Tracing back from the highest score, 4, until a 0 is reached gives the local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use the same substitution matrix and a gap penalty of d = -5.
DP matrix F (AAG across the top, GAAGGC down the side):
        A   A   G
    0   0   0   0
G   0   0   0   2
A   0   2   2   0
A   0   2   4   0
G   0   0   0   6
G   0   0   0   2
C   0   0   0   0
The highest score is 6; tracing back from it until a 0 is reached gives the local alignment AAG / AAG.
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight, for example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i and j, F(i, 0) = 0 and F(0, j) = 0.
Recurrence relation:
$$F(i,j)=\max\begin{cases}F(i-1,j-1)+s(x_i,y_j)\\ F(i-1,j)+d\\ F(i,j-1)+d\end{cases}$$
Search for the i such that $F(i,m)=\max_{1\le i\le n}F(i,m)$, and for the j such that $F(n,j)=\max_{1\le j\le m}F(n,j)$. The alignment score is
$$\max\Big(\max_{1\le j\le m}F(n,j),\ \max_{1\le i\le n}F(i,m)\Big)$$
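A minimal Python sketch of the end-space free score, assuming simple match/mismatch scoring and a linear gap penalty (the function name is hypothetical). Compared with global alignment, only the base conditions and the place where the answer is read change; the recurrence is the same.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score of the best end-space free alignment of x and y.

    Leading and trailing gaps in either sequence cost nothing.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0  # everything is an end gap
    F = [[0] * (m + 1) for _ in range(n + 1)]  # base: F(i, 0) = F(0, j) = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Last column: the suffix x_{i+1..n} hangs off the end for free.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    # Last row: the suffix y_{j+1..m} hangs off the end for free.
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)
```

With match = 1, mismatch = -1 and gap = -1, the sequences of the example above score 4, the same as the alignment shown there (5 matches and 1 internal gap, with the end gaps free).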
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions to consider
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
Affine gap penalty
LETVGYW----L
The four gap positions are scored -5, -1, -1, -1: separate penalties for gap opening and gap extension.
This requires modifying the DP algorithm.
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted or deleted a stretch of characters
- separated gaps are probably due to distinct mutational events
A linear gap penalty function treats these cases the same, so it is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
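The two kinds of penalty can be compared with a small Python sketch. The values h = 5 and g = 1 are read off the LETVGYW----L example (-5 for the first gap position, -1 for each extension) and are otherwise assumptions, as are the function names.

```python
def linear_gap_cost(k, d=-1):
    """Linear penalty: every gap position costs d (d < 0), so w(k) = k * d."""
    return k * d


def affine_gap_cost(k, h=5, g=1):
    """Affine penalty: the first gap position costs h, each further one costs g.

    Returns a negative score, w(k) = -h - (k - 1) * g, matching -5 -1 -1 -1.
    """
    if k <= 0:
        return 0
    return -h - (k - 1) * g
```

Under these values a single gap of length 4 scores -8, while four separate gaps of length 1 score 4 × (-5) = -20; a linear penalty gives both cases the same score.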
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Three matrices are needed instead of one.
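The slides' own recurrences for this case are not preserved in this transcript. As a hedged sketch, the common three-matrix formulation (Gotoh's algorithm) can be written in Python as below: M holds alignments ending with x_i aligned to y_j, Ix those ending with x_i against a gap, and Iy those ending with y_j against a gap. The gap convention (-h to open, -g per extension) follows the LETVGYW example; the defaults, the function name, and the choice not to let Ix follow Iy directly (or vice versa) are assumptions that may differ from the lecture's version.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=5, g=1):
    """Optimal global alignment score with an affine gap penalty (three matrices).

    A gap of length k scores -h - (k - 1) * g.
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # x_i aligned to y_j
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # x_i aligned to a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # y_j aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = -h - (i - 1) * g  # x_1..x_i against one leading gap
    for j in range(1, m + 1):
        Iy[0][j] = -h - (j - 1) * g  # y_1..y_j against one leading gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] - h,   # open a new gap
                           Ix[i - 1][j] - g)  # extend the current gap
            Iy[i][j] = max(M[i][j - 1] - h,
                           Iy[i][j - 1] - g)
    return max(M[n][m], Ix[n][m], Iy[n][m])
```

For example, aligning AT with ACGT (match = 1, mismatch = -1, h = 5, g = 1) scores -4, via A--T over ACGT: one gap of length 2 costs -6 and the two matches add +2.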
[Figure: worked example, match = 1, mismatch = -1]
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple
FASTA
BLAST
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
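DP Algorithm for Global Alignment
Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.
Let $F(i,j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$). With $F(0,0) = 0$ and boundary cells $F(i,0) = i\,d$ and $F(0,j) = j\,d$:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j)\\ F(i-1,j) + d\\ F(i,j-1) + d \end{cases}$$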
DP in equation form
Each cell $F(i,j)$ is computed from three neighbours: the diagonal cell $F(i-1,j-1)$ plus $s(x_i, y_j)$, the cell above $F(i-1,j)$ plus $d$, and the cell to the left $F(i,j-1)$ plus $d$.
A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix s:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
DP matrix F (AAG across the top, AGC down the left side):
          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
The optimal global score is F(3, 3) = -6.
Traceback
- Start from the lower right corner and trace back to the upper left.
- Each arrow introduces one character at the end of each aligned sequence.
- A horizontal move puts a gap in the left sequence.
- A vertical move puts a gap in the top sequence.
- A diagonal move uses one character from each sequence.
A simple example (traceback)
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Cells on the traceback paths:
          A    A    G
     0   -5
A         2   -3
G                  -1
C                  -6
Both paths give an optimal alignment with score -6:
AAG-    AAG-
-AGC    A-GC
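The fill and traceback steps can be checked in code. This is a minimal Python sketch that assumes the 4x4 substitution matrix and d = -5 from the example; `needleman_wunsch` and `SUB` are illustrative names. The first argument indexes the rows of F. When several moves tie, the sketch prefers a diagonal move, then a gap in `y`, then a gap in `x`, so it returns only one of the two optimal alignments.

```python
BASES = "ACGT"
# Substitution scores from the example's table (rows and columns in A, C, G, T order).
TABLE = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
SUB = {(a, b): TABLE[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def needleman_wunsch(x, y, sub=SUB, d=-5):
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d          # x[:i] against an all-gap prefix
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub[x[i - 1], y[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Traceback from the lower-right corner to the upper-left corner.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub[x[i - 1], y[j - 1]]:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("AAG", "AGC"))  # (-6, 'AAG-', '-AGC')
```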
Exercise
Find the global alignment of X = catgt and Y = acgctg. Scores: d = -1, mismatch = -1, match = 2.
Answer
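To check the exercise's score, here is a score-only Python sketch; the function name is illustrative. It keeps only the previous and current rows of F, using O(m) memory instead of O(nm). The trade-off is that it recovers the optimal score but not the alignment, which still needs a traceback over the full matrix.

```python
def global_score_two_rows(x, y, match=2, mismatch=-1, d=-1):
    """Needleman-Wunsch score with linear gaps, storing only two rows of F."""
    prev = [j * d for j in range(len(y) + 1)]      # row 0: F(0, j) = j * d
    for i in range(1, len(x) + 1):
        cur = [i * d] + [0] * len(y)               # F(i, 0) = i * d
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + d, cur[j - 1] + d)
        prev = cur
    return prev[-1]


print(global_score_two_rows("catgt", "acgctg"))  # 2
```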
Local alignment
- A single-domain protein may be homologous to a region within a multi-domain protein.
- Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences $x$ and $y$: $F$ is the DP matrix, $s$ is the substitution matrix and $d$ is the linear gap penalty. The first row and column are zero, $F(i,0) = F(0,j) = 0$, and
$$F(i,j) = \max\begin{cases} 0\\ F(i-1,j-1) + s(x_i, y_j)\\ F(i-1,j) + d\\ F(i,j-1) + d \end{cases}$$
Local DP in equation form
Each cell $F(i,j)$ takes the maximum of the same three moves as in the global case (the diagonal cell plus $s(x_i, y_j)$, the cell above plus $d$, the cell to the left plus $d$) and 0.
Local alignment
Two differences with respect to global alignment:
- No score is negative.
- Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5 and the same substitution matrix s as in the global example.
DP matrix F (AAG across the top, AGC down the left side):
          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0
Optimal local alignment (score 4):
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix s as in the global example.
DP matrix F (AAG across the top, GAAGGC down the left side):
          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0
The highest score, 6, traces back to the local alignment AAG / AAG.
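Here is a matching Smith-Waterman sketch in Python; the names are illustrative. It uses the same substitution matrix as the examples and d = -5. It differs from the global DP exactly as the slides state: cells are floored at 0, and the traceback starts at the highest-scoring cell and stops at the first 0.

```python
BASES = "ACGT"
TABLE = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
SUB = {(a, b): TABLE[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def smith_waterman(x, y, sub=SUB, d=-5):
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # row 0 and column 0 stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,                      # start a fresh local alignment
                          F[i - 1][j - 1] + sub[x[i - 1], y[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:                    # remember the highest cell
                best, bi, bj = F[i][j], i, j
    # Trace back from the best cell until a 0 is reached.
    ax, ay = [], []
    i, j = bi, bj
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + sub[x[i - 1], y[j - 1]]:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("AAG", "GAAGGC"))  # (6, 'AAG', 'AAG')
```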
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight, for example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all $i, j$, $F(i,0) = 0$ and $F(0,j) = 0$.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j)\\ F(i-1,j) + d\\ F(i,j-1) + d \end{cases}$$
Search for $i$ such that $F(i,m) = \max_{1 \le i \le n} F(i,m)$.
Search for $j$ such that $F(n,j) = \max_{1 \le j \le m} F(n,j)$.
The alignment score is $\max\{F(n,j),\ F(i,m)\}$ for the $i$ and $j$ found above.
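The three ingredients (zero first row and column, the usual recurrence, and a maximum over the last row and column) fit in a short Python sketch. The function name and the match/mismatch/gap scoring are assumptions, chosen to match the exercise that follows. Inputs are assumed non-empty, since the search ranges start at 1. Under these scores, the example alignment above scores 4: five matches and one interior gap, with all end gaps free.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Alignment score where gaps before the start and after the end of either sequence are free."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]    # F(i, 0) = F(0, j) = 0: leading gaps are free
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + d, F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell in the last column or the last row.
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))  # 4
```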
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
- Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
- Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
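The second question has a possible answer, offered as a sketch rather than the course's intended solution. The LCS length equals the best global alignment score under match = 1, mismatch = 0, gap = 0. The same fill-and-traceback pattern therefore applies, with the matrix storing LCS lengths. The function name `lcs` is illustrative.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Traceback from the lower-right corner, as in global alignment.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(len(lcs("catgt", "acgctg")))  # 3
```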
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Dynamic ProgrammingDynamic Programming
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Initialization: F(0, 0) = 0, and the first row and first column are filled with multiples of the gap penalty, F(0, j) = j·d and F(i, 0) = i·d, giving 0, -5, -10, -15.
Filled DP matrix (AAG along the top, AGC down the left):
        -    A    A    G
   -    0   -5  -10  -15
   A   -5    2   -3   -8
   G  -10   -3   -3   -1
   C  -15   -8   -8   -6
The optimal global alignment score is the lower-right entry, F(3, 3) = -6.
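As a check on the lower-right cell, take the rows as x = AGC and the columns as y = AAG, so x_3 = C and y_3 = G:

$$F(3,3) = \max\begin{cases} F(2,2) + s(C,G) = -3 - 7 = -10 \\ F(2,3) + d = -1 - 5 = -6 \\ F(3,2) + d = -8 - 5 = -13 \end{cases} = -6$$

The maximum comes from the cell above, so the traceback starts with a vertical move.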
Traceback
• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.
A simple example (traceback)
From F(3, 3) = -6 the path moves up to F(2, 3) = -1, then diagonally to F(1, 2) = -3. From there it goes either diagonally to F(0, 1) = -5 and left to F(0, 0) = 0, or left to F(1, 1) = 2 and diagonally to F(0, 0) = 0. The two paths give two optimal alignments, each scoring -6 (top row AAG, bottom row AGC):
AAG-      AAG-
-AGC      A-GC
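The traceback rules above can be sketched in Python. This version follows every move that explains a cell's value, so it returns all co-optimal alignments rather than just one. It is a minimal sketch under the same assumptions as the example: rows hold the left sequence, columns the top sequence, and d is a negative gap score. The function names and the `dna_score` helper are hypothetical.

```python
from typing import Callable, List, Tuple


def dna_score(a: str, b: str) -> int:
    """The example's substitution matrix: 2 match, -5 for A<->G or C<->T, else -7."""
    if a == b:
        return 2
    if {a, b} in ({"A", "G"}, {"C", "T"}):
        return -5
    return -7


def global_align_all(x: str, y: str, d: int,
                     score: Callable[[str, str], int]) -> Tuple[int, List[Tuple[str, str]]]:
    """Needleman-Wunsch with a full traceback.

    x labels the rows (left sequence) and y the columns (top sequence).
    Returns the optimal score and every co-optimal alignment as (top, left) strings.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)

    def trace(i: int, j: int) -> List[Tuple[str, str]]:
        """All optimal alignments of y[:j] (top) and x[:i] (left) ending at cell (i, j)."""
        if i == 0 and j == 0:
            return [("", "")]
        out: List[Tuple[str, str]] = []
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            # Diagonal move: one character from each sequence.
            out += [(top + y[j - 1], left + x[i - 1]) for top, left in trace(i - 1, j - 1)]
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            # Vertical move: gap in the top sequence.
            out += [(top + "-", left + x[i - 1]) for top, left in trace(i - 1, j)]
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            # Horizontal move: gap in the left sequence.
            out += [(top + y[j - 1], left + "-") for top, left in trace(i, j - 1)]
        return out

    return F[n][m], trace(n, m)


print(global_align_all("AGC", "AAG", -5, dna_score))
```

For the example this should print a score of -6 together with the two alignments AAG-/-AGC and AAG-/A-GC. Enumerating every tied path can grow exponentially on long sequences, which is why practical tools usually report a single alignment. For the exercise that follows, pass `lambda a, b: 2 if a == b else -1` as `score` and d = -1.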
Exercise
Find the global alignment of X = catgt and Y = acgctg.
Scoring: d = -1, mismatch = -1, match = 2.
Answer
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.
$$F(0,0) = 0, \qquad F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
with F(i, 0) = F(0, j) = 0 on the first column and row.
Local DP in equation form
[Figure: as in the global case, F(i, j) is computed from the diagonal, upper and left cells, with 0 as an additional option.]
Local alignment
Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
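A minimal Python sketch of the Smith-Waterman recurrence and the two differences listed above. It clamps every cell at 0, remembers the highest cell, and traces back from there until a 0 is reached. It assumes a linear gap score d (negative) and the slides' layout, with the left sequence on the rows and the top sequence on the columns. `dna_score` is a hypothetical helper reproducing the examples' substitution matrix. When there are ties, only one optimal local alignment is returned.

```python
from typing import Callable, List, Tuple


def dna_score(a: str, b: str) -> int:
    """The example's substitution matrix: 2 match, -5 for A<->G or C<->T, else -7."""
    if a == b:
        return 2
    if {a, b} in ({"A", "G"}, {"C", "T"}):
        return -5
    return -7


def local_align(x: str, y: str, d: int,
                score: Callable[[str, str], int]) -> Tuple[int, str, str]:
    """Smith-Waterman with a linear gap score d (negative).

    x labels the rows (left sequence), y the columns (top sequence).
    Returns (best score, top row, left row) for one optimal local alignment.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,                      # start a new local alignment here
                          F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:                    # remember the highest cell
                best, bi, bj = F[i][j], i, j

    # Trace back from the highest cell until a cell with 0 is reached.
    pairs: List[Tuple[str, str]] = []
    i, j = bi, bj
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            pairs.append((y[j - 1], x[i - 1]))    # diagonal move
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            pairs.append(("-", x[i - 1]))         # vertical move: gap in top sequence
            i -= 1
        else:
            pairs.append((y[j - 1], "-"))         # horizontal move: gap in left sequence
            j -= 1
    pairs.reverse()
    return best, "".join(t for t, _ in pairs), "".join(b for _, b in pairs)


print(local_align("AGC", "AAG", -5, dna_score))
print(local_align("GAAGGC", "AAG", -5, dna_score))
```

These calls should give (4, 'AG', 'AG') and (6, 'AAG', 'AAG'), matching the two local-alignment examples that follow. Apart from the 0 option and the traceback start, the code is the same as the global version.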
A simple example
Find the optimal local alignment of AAG and AGC, using the same substitution matrix as in the global example and a gap penalty of d = -5. AAG runs along the top of the DP matrix and AGC down the left side.
Initialization: every cell in the first row and the first column is set to 0.
Filled DP matrix (AAG along the top, AGC down the left):
        -    A    A    G
   -    0    0    0    0
   A    0    2    2    0
   G    0    0    0    4
   C    0    0    0    0
The highest score, 4, is at F(2, 3). Tracing back diagonally until a cell with 0 is reached gives the optimal local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC, using the same substitution matrix and a gap penalty of d = -5. AAG runs along the top of the DP matrix and GAAGGC down the left side.
Filled DP matrix (AAG along the top, GAAGGC down the left):
        -    A    A    G
   -    0    0    0    0
   G    0    0    0    2
   A    0    2    2    0
   A    0    2    4    0
   G    0    0    0    6
   G    0    0    0    2
   C    0    0    0    0
The highest score, 6, is at F(4, 3). Tracing back diagonally to a 0 gives the local alignment AAG/AAG.
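The entry 6 can be checked directly from the local recurrence. Take the rows as x = GAAGGC and the columns as y = AAG, so x_4 = G and y_3 = G:

$$F(4,3) = \max\begin{cases} 0 \\ F(3,2) + s(G,G) = 4 + 2 = 6 \\ F(3,3) + d = 0 - 5 = -5 \\ F(4,2) + d = 0 - 5 = -5 \end{cases} = 6$$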
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight, for example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, F(i, 0) = 0 and F(0, j) = 0.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.
Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.
Define the alignment score as $\max\{F(n, j^*), F(i^*, m)\}$.
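A minimal Python sketch of the end-space free recurrence above. It assumes a linear gap score d (negative) and a substitution function passed in as `score`. The function names are hypothetical, and only the score is returned (no traceback).

```python
from typing import Callable


def end_space_free_score(x: str, y: str, d: int,
                         score: Callable[[str, str], int]) -> int:
    """Alignment score in which gaps before the start or after the end cost nothing.

    Base conditions F(i, 0) = F(0, j) = 0 make leading gaps free; taking the best
    value in the last row or last column makes trailing gaps free.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0
    F = [[0] * (m + 1) for _ in range(n + 1)]     # F(i, 0) = F(0, j) = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    best_last_col = max(F[i][m] for i in range(1, n + 1))  # rest of x hangs off the end
    best_last_row = max(F[n][j] for j in range(1, m + 1))  # rest of y hangs off the end
    return max(best_last_col, best_last_row)


def unit(a: str, b: str) -> int:
    """Scoring: match = 1, mismatch = -1."""
    return 1 if a == b else -1


print(end_space_free_score("ACGT", "CGTA", -1, unit))  # 3: overlap CGT, end gaps free
```

Only the initialization and the place where the answer is read differ from global alignment, so the cost stays O(nm). With match = 1, mismatch = -1 and gap = -1 it could also be used to check the exercise that follows.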
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
• Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
Affine gap penalty
LETVGYW----L   (gap scores: -5 -1 -1 -1)
Separate penalties for gap opening and gap extension.
This requires modifying the DP algorithm.
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
DP Algorithm for Global DP Algorithm for Global AlignmentAlignment
Two sequences X = x1xn and Y = y1ym
F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)
djiF
djiF
yxsjiF
jiF
F
ji
1
1
11
max
000
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
DP in equation formDP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
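Tracing back from -6 through -1 and -3, the path splits (via -5 or via 2) before reaching 0, so there are two optimal alignments, each scoring -6:
AAG-    AAG-
-AGC    A-GC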
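The traceback rules can be sketched as a recursive walk that follows every move consistent with the recurrence; this is how the two co-optimal alignments arise from the tie at the cell holding -3. Same assumptions as the fill sketch (rows = left sequence, columns = top sequence); `global_fill` and `tracebacks` are hypothetical helper names.

```python
# Minimal sketch: global-alignment traceback that collects every co-optimal path.
# Assumptions: rows index the left sequence x, columns the top sequence y,
# linear gap penalty d; helper names are illustrative only.

S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def score(a, b):
    return S[a][b]


def global_fill(x, y, d):
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)
    return F


def tracebacks(F, x, y, d, i, j):
    """Yield (top, left) strings for every optimal path from (0, 0) to (i, j)."""
    if i == 0 and j == 0:
        yield "", ""
        return
    # Diagonal move: one character from each sequence.
    if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
        for top, left in tracebacks(F, x, y, d, i - 1, j - 1):
            yield top + y[j - 1], left + x[i - 1]
    # Horizontal move: consumes y_j and puts a gap in the left sequence.
    if j > 0 and F[i][j] == F[i][j - 1] + d:
        for top, left in tracebacks(F, x, y, d, i, j - 1):
            yield top + y[j - 1], left + "-"
    # Vertical move: consumes x_i and puts a gap in the top sequence.
    if i > 0 and F[i][j] == F[i - 1][j] + d:
        for top, left in tracebacks(F, x, y, d, i - 1, j):
            yield top + "-", left + x[i - 1]


x, y, d = "AGC", "AAG", -5
F = global_fill(x, y, d)
for top, left in tracebacks(F, x, y, d, len(x), len(y)):
    print(top)
    print(left)
    print()
```

Enumerating every path can blow up exponentially on long sequences; a practical implementation would usually store one pointer per cell and report a single alignment.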
Exercise
Find the global alignment of X = catgt and Y = acgctg. Scores: d = -1, mismatch = -1, match = 2.
Answer
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty. The first row and column are initialized to 0: $F(i,0) = F(0,j) = 0$.
Local DP in equation form
$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,\,j-1) + s(x_i, y_j) \\ F(i-1,\,j) + d \\ F(i,\,j-1) + d \end{cases}$$
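A minimal sketch of the local recurrence, assuming the same substitution matrix and layout as the examples (rows = left sequence, columns = top sequence); `local_fill` is an illustrative name. Compared with the global fill, the only changes are the extra 0 in the max and the zero first row and column.

```python
# Minimal sketch: Smith-Waterman (local alignment) DP fill with a linear gap penalty d.
# Assumption: rows index the left sequence x, columns index the top sequence y.

S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def local_fill(x, y, d):
    """Return (F, best_score, best_cell) for the local alignment of x and y."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,  # start a fresh alignment: no score is negative
                F[i - 1][j - 1] + S[x[i - 1]][y[j - 1]],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:  # remember the first highest-scoring cell
                best, best_cell = F[i][j], (i, j)
    return F, best, best_cell


for left in ("AGC", "GAAGGC"):
    F, best, cell = local_fill(left, "AAG", d=-5)
    print(left, "vs AAG: best local score", best, "at cell", cell)
```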
Local alignment
Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
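The two differences listed above (non-negative scores, traceback from the maximum down to 0) can be sketched as follows. Assumptions: same substitution matrix and layout as the examples, and ties are broken in a fixed order (diagonal, then vertical, then horizontal), so only one optimal local alignment is reported; `local_fill` and `local_traceback` are hypothetical names.

```python
# Minimal sketch: Smith-Waterman traceback with a linear gap penalty d.
# Assumptions: rows index the left sequence x, columns the top sequence y;
# ties are broken diagonal > vertical > horizontal, so one path is returned.

S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def local_fill(x, y, d):
    """Fill the local DP matrix; return it with the first highest-scoring cell."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    cell = (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0, F[i - 1][j - 1] + S[x[i - 1]][y[j - 1]],
                          F[i - 1][j] + d, F[i][j - 1] + d)
            if F[i][j] > F[cell[0]][cell[1]]:
                cell = (i, j)
    return F, cell


def local_traceback(F, x, y, d, cell):
    """Walk back from `cell` until a 0 is reached; return (top, left) strings."""
    i, j = cell
    top, left = [], []
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + S[x[i - 1]][y[j - 1]]:
            top.append(y[j - 1])  # diagonal: one character from each sequence
            left.append(x[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top.append("-")  # vertical: gap in the top sequence
            left.append(x[i - 1])
            i -= 1
        else:
            top.append(y[j - 1])  # horizontal: gap in the left sequence
            left.append("-")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(left))


for left_seq in ("AGC", "GAAGGC"):
    F, cell = local_fill(left_seq, "AAG", -5)
    print(local_traceback(F, left_seq, "AAG", -5, cell))
```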
A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5 (same substitution matrix as above).
Completed DP matrix (AAG across the top, AGC down the left):
         -     A     A     G
   -     0     0     0     0
   A     0     2     2     0
   G     0     0     0     4
   C     0     0     0     0
Optimal local alignment (score 4):
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.
Completed DP matrix (AAG across the top, GAAGGC down the left):
         -     A     A     G
   -     0     0     0     0
   G     0     0     0     2
   A     0     2     2     0
   A     0     2     4     0
   G     0     0     0     6
   G     0     0     0     2
   C     0     0     0     0
The highest score, 6, traces back along the diagonal to the local alignment
AAG
AAG
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight, for example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, F(i, 0) = 0 and F(0, j) = 0.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,\,j-1) + s(X_i, Y_j) \\ F(i-1,\,j) + d \\ F(i,\,j-1) + d \end{cases}$$
Search for i such that $F(i,m) = \max_{1 \le i \le n} F(i,m)$, and for j such that $F(n,j) = \max_{1 \le j \le m} F(n,j)$, where n and m are the lengths of X and Y.
Define the alignment score as $\max\{F(n,j),\, F(i,m)\}$, i.e. the best value in the last row or the last column.
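A sketch of the end-space free recurrence, assuming the scoring of the exercise below (match = 1, mismatch = -1, gap = -1) as defaults and non-empty input strings; `end_space_free_score` is a hypothetical name. Only the boundary conditions and the final maximization differ from global alignment.

```python
# Minimal sketch: end-space free (semi-global) alignment score.
# Assumptions: non-empty x and y; x indexes rows (i), y indexes columns (j).

def end_space_free_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    # F(i, 0) = F(0, j) = 0: leading gaps in either sequence cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))
```

For reference, the example alignment given with the definition above (five matches and one internal gap, with the leading and trailing gaps free) scores 5 - 1 = 4 under this scheme.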
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
• Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
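One possible approach to the LCS question, sketched here rather than taken from the slides: LCS is the alignment DP with a match score of 1 and zero-cost mismatches and gaps, so the same table-filling and traceback pattern applies. The function name `lcs` is illustrative.

```python
# Minimal sketch: longest common subsequence by dynamic programming.

def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]; row 0 and column 0 stay 0.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1  # a "match" extends the subsequence
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip a character at no cost
    # Trace back from the bottom-right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("catgt", "acgctg"))
```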
Affine gap penalty
LETVGYW----L
The four gap positions cost -5, -1, -1, -1 (one opening and three extensions, -8 in total).
• Separate penalties for gap opening and gap extension.
• This requires modifying the DP algorithm.
Affine gap penalty
• A gap of length k is more probable than k gaps of length 1:
  – a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  – separated gaps are probably due to distinct mutational events
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  – a penalty h associated with opening a gap
  – a smaller penalty g for extending the gap
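The difference between the two models can be made concrete with a small sketch. The convention that the first gap position costs h and each further position costs g is an assumption read off the LETVGYW----L example (-5, -1, -1, -1); some texts instead write the cost of a length-k gap as h + g·k. Both function names are hypothetical.

```python
# Minimal sketch: linear versus affine gap cost for a gap of length k.

def linear_gap_cost(k, d):
    """Linear gap penalty: each of the k gap positions costs d."""
    return k * d


def affine_gap_cost(k, h, g):
    """Affine gap penalty (assumed convention): first position costs h, each further one g."""
    return 0 if k == 0 else h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1.
print(linear_gap_cost(4, -1), 4 * linear_gap_cost(1, -1))  # -4 -4: linear cannot tell them apart
print(affine_gap_cost(4, -5, -1), 4 * affine_gap_cost(1, -5, -1))  # -8 -20: affine prefers one long gap
```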
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Need 3 matrices instead of 1.
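A sketch of the three-matrix approach for global alignment (the formulation usually credited to Gotoh). Assumptions, none taken from the slides' lost figures: the affine convention of the LETVGYW----L example (opening position costs h, each further position g), match = 1 and mismatch = -1 as in the worked example, and transitions allowed directly between the two gap matrices (some formulations omit these). `affine_global_score` is a hypothetical name.

```python
# Minimal sketch: global alignment score with affine gaps using three DP matrices.
NEG = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """M: x_i aligned to y_j; Ix: x_i against a gap; Iy: y_j against a gap."""
    n, m = len(x), len(y)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # x[:i] against one leading gap
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # y[:j] against one leading gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + h,   # open a gap (x_i against a gap)
                           Ix[i - 1][j] + g,  # extend that gap
                           Iy[i - 1][j] + h)  # switch from the other gap matrix
            Iy[i][j] = max(M[i][j - 1] + h,   # open a gap (y_j against a gap)
                           Iy[i][j - 1] + g,  # extend that gap
                           Ix[i][j - 1] + h)  # switch from the other gap matrix
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("ACGT", "AT", h=-3, g=-1))
```

With h = g the three matrices collapse to the ordinary linear-gap recurrence: for ACGT vs AT, h = g = -1 gives 0, whereas h = -3, g = -1 gives -2 for the alignment ACGT / A--T.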
[Figure: worked affine-gap DP example, match = 1, mismatch = -1]
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5
GG -10-10
CC -15-15 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 -5-5 -10-10 -15-15
AA -5-5 22 -3-3 -8-8
GG -10-10 -3-3 -3-3 -1-1
CC -15-15 -8-8 -8-8 -6-6 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
TracebackTraceback
Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left
Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence
A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence
A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each
sequencesequence
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
Presented By Liu QiPresented By Liu Qi
Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left
Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence
A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence
A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence
A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence
A simple exampleA simple example
AA AA GG
00 -5-5
AA 22 -3-3
GG -1-1
CC -6-6
Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5
AAG- AAG--AGC A-GC
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2
Presented By Liu QiPresented By Liu Qi
AnswerAnswer
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple example
Substitution matrix s:
    A   C   G   T
A   2  -7  -5  -7
C  -7   2  -7  -5
G  -5  -7   2  -7
T  -7  -5  -7   2
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Initialize the first row and column to 0, then fill the matrix with the local DP recurrence (AAG across the top, AGC down the side):
        A   A   G
    0   0   0   0
A   0   2   2   0
G   0   0   0   4
C   0   0   0   0
Tracing back from the highest score (4) until a 0 is reached gives the optimal local alignment:
AG
AG
Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 (same substitution matrix).
        A   A   G
    0   0   0   0
G   0   0   0   2
A   0   2   2   0
A   0   2   4   0
G   0   0   0   6
G   0   0   0   2
C   0   0   0   0
End-Space Free Alignment
Any number of indel operations at the end or at the beginning of the alignment contribute zero weight, e.g.
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, F(i,0) = 0 and F(0,j) = 0.
Recurrence relation:
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(X_i, Y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
Search for $i^*$ such that $F(i^*,m) = \max_{1 \le i \le n} F(i,m)$.
Search for $j^*$ such that $F(n,j^*) = \max_{1 \le j \le m} F(n,j)$.
Define the alignment score as $\max\{F(n,j^*),\, F(i^*,m)\}$, where n and m are the lengths of X and Y.
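A minimal Python sketch of the end-space free recurrence, assuming the scoring of the exercise that follows (match = 1, mismatch = -1, gap d = -1); the function name `end_space_free` is hypothetical. Leading indels are free because row 0 and column 0 are 0; trailing indels are free because the score is read from the best cell in the last row or last column. It returns the score and the aligned core, with the free end gaps left out.

```python
from typing import List, Tuple


def end_space_free(x: str, y: str, match: int = 1, mismatch: int = -1,
                   gap: int = -1) -> Tuple[int, str, str]:
    """End-space free alignment of non-empty x and y.

    Returns (score, aligned core of x, aligned core of y)."""
    n, m = len(x), len(y)

    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    # Base conditions F(i,0) = F(0,j) = 0 make leading indels free.
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Trailing indels are free: best cell in the last column or the last row.
    i_star = max(range(1, n + 1), key=lambda i: F[i][m])
    j_star = max(range(1, m + 1), key=lambda j: F[n][j])
    if F[i_star][m] >= F[n][j_star]:
        i, j = i_star, m
    else:
        i, j = n, j_star
    score = F[i][j]

    # Trace back until row 0 or column 0 is reached; the rest is free end gaps.
    ax: List[str] = []
    ay: List[str] = []
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return score, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(end_space_free("cactgtac", "gacacttg"))
```

On X = cactgtac and Y = gacacttg it returns a score of 4 with the core cac-tg / cacttg, the same columns as the example alignment X = --cac-tgtac, Y = gacacttg---.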
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions to consider
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
Affine gap penalty
LETVGYW----L
Gap scores along the gap: -5 (opening), -1, -1, -1 (extension)
Separate penalties for gap opening and gap extension.
This requires modifying the DP algorithm.
Affine gap penalty
A gap of length k is more probable than k gaps of length 1:
a gap may be due to a single mutational event that inserted or deleted a stretch of characters;
separated gaps are probably due to distinct mutational events.
A linear gap penalty function treats these cases the same, so it is more common to use gap penalty functions involving two terms:
a penalty h associated with opening a gap;
a smaller penalty g for extending the gap.
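To see why a linear penalty cannot tell these cases apart, the sketch below scores one gap of length 4 against four gaps of length 1. It assumes the convention suggested by the LETVGYW----L example (the first gap position costs h = -5, each further position g = -1) and a hypothetical linear penalty d = -1 per position; all values are illustrative.

```python
from typing import Callable, List


def linear_gap(k: int, d: int = -1) -> int:
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap(k: int, h: int = -5, g: int = -1) -> int:
    """Affine penalty (assumed convention): first gap position costs h, each extension g."""
    return 0 if k == 0 else h + (k - 1) * g


def total_penalty(gap_lengths: List[int], penalty: Callable[[int], int]) -> int:
    """Sum a gap penalty function over separate gap runs."""
    return sum(penalty(k) for k in gap_lengths)


one_long_gap = [4]              # one gap of length 4, as in LETVGYW----L
four_short_gaps = [1, 1, 1, 1]  # four separate gaps of length 1

print("linear:", total_penalty(one_long_gap, linear_gap),
      total_penalty(four_short_gaps, linear_gap))
print("affine:", total_penalty(one_long_gap, affine_gap),
      total_penalty(four_short_gaps, affine_gap))
```

The linear penalty scores both cases -4, while the affine penalty prefers the single long gap (-8 against -20).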
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Need 3 matrices instead of 1.
[Figures: affine-gap DP example with match = 1, mismatch = -1]
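One common way to realize the three matrices is Gotoh's formulation for global alignment, sketched below as an illustration rather than the lecture's own formulas: M holds alignments ending in an aligned pair, Ix those ending with x_i against a gap, and Iy those ending with y_j against a gap. The names `affine_global_score`, `M`, `Ix`, `Iy`, the scoring (match = 1, mismatch = -1, as on the slide) and the gap convention (opening position -5, each extension -1, as in the LETVGYW----L example) are assumptions; the sketch returns only the score.

```python
from typing import List

NEG = float("-inf")  # marks states that cannot be reached


def affine_global_score(x: str, y: str, match: int = 1, mismatch: int = -1,
                        gap_open: int = -5, gap_extend: int = -1) -> float:
    """Global alignment score with an affine gap penalty (three-matrix DP)."""
    n, m = len(x), len(y)
    M: List[List[float]] = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix: List[List[float]] = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy: List[List[float]] = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend  # x_1..x_i against one leading gap
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend  # y_1..y_j against one leading gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # x_i aligned with y_j
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # x_i aligned with a gap: open a new gap or extend the current one
            Ix[i][j] = max(M[i - 1][j] + gap_open,
                           Ix[i - 1][j] + gap_extend,
                           Iy[i - 1][j] + gap_open)
            # y_j aligned with a gap: open a new gap or extend the current one
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


if __name__ == "__main__":
    print(affine_global_score("AAAT", "AT"))
```

For AAAT versus AT, the two unmatched characters are cheapest as a single gap of length 2 (-5 - 1), so the score is 1 + 1 - 6 = -4.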
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
Word k-tuple methods
FASTA
BLAST
Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
A simple example
Find the optimal alignment of AAG and AGC, using the substitution matrix above and a gap penalty of d = -5 (AAG across the top, AGC down the side):
          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
Tracing back from the lower right corner (-6) gives two optimal alignments:
AAG-    AAG-
-AGC    A-GC
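The traceback rules can be sketched in Python as a Needleman-Wunsch fill followed by a walk from the lower right corner. The function name `needleman_wunsch`, its argument order (top sequence first, left sequence second) and the dictionary `S` for the substitution matrix are assumptions for illustration; ties are broken in the order diagonal, vertical, horizontal, so only one of the co-optimal alignments is returned.

```python
from typing import Dict, List, Tuple

# Substitution matrix from the examples (rows and columns in A, C, G, T order).
BASES = "ACGT"
TABLE = [[2, -7, -5, -7],
         [-7, 2, -7, -5],
         [-5, -7, 2, -7],
         [-7, -5, -7, 2]]
S: Dict[Tuple[str, str], int] = {
    (a, b): TABLE[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)
}


def needleman_wunsch(top: str, left: str, s: Dict[Tuple[str, str], int],
                     d: int) -> Tuple[int, str, str]:
    """Global alignment with linear gap penalty d; rows follow `left`, columns `top`."""
    n, m = len(left), len(top)
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[left[i - 1], top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Trace back from the lower right corner to the upper left.
    at: List[str] = []
    al: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[left[i - 1], top[j - 1]]:
            # diagonal move: one character from each sequence
            at.append(top[j - 1])
            al.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            # vertical move: gap in the top sequence
            at.append("-")
            al.append(left[i - 1])
            i -= 1
        else:
            # horizontal move: gap in the left sequence
            at.append(top[j - 1])
            al.append("-")
            j -= 1
    return F[n][m], "".join(reversed(at)), "".join(reversed(al))


if __name__ == "__main__":
    print(needleman_wunsch("AAG", "AGC", S, -5))
```

On the example it reproduces the corner score -6 and the alignment AAG- over -AGC; breaking the tie at row A, second column A, toward the horizontal move instead gives the other optimum, AAG- over A-GC.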
Exercise
Find the global alignment of X = catgt and Y = acgctg. Scoring: d = -1, mismatch = -1, match = 2.
Answer
Local alignmentLocal alignment
A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein
Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required
Presented By Liu QiPresented By Liu Qi
Local alignment DPLocal alignment DP
Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution
matrix d is the linear gap penaltymatrix d is the linear gap penalty
0
11
11
max
000
djiFdjiF
yxsjiF
jiF
F
ji
Presented By Liu QiPresented By Liu Qi
Local DP in equation formLocal DP in equation form
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the
matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch
Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all i, j, $F(i, 0) = 0$ and $F(0, j) = 0$.
Recurrence relation:
$$F(i, j) = \max\begin{cases} F(i-1, j-1) + s(X_i, Y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$
Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.
Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.
Define the alignment score as $\max\{F(n, j^*),\ F(i^*, m)\}$.
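A minimal Python sketch of the end-space free recurrence, assuming a linear gap score d < 0 and a simple match/mismatch scheme in place of a full substitution matrix; the function name is illustrative. With match = 1, mismatch = -1 and gap = -1 it reports 4 for X = cactgtac and Y = gacacttg, the same score as the example alignment shown earlier (5 matches, 1 interior gap, free end gaps).

```python
def end_space_free_score(X, Y, match=1, mismatch=-1, d=-1):
    """Best end-space free alignment score of X and Y.

    Gaps at the beginning or end of either sequence cost nothing;
    interior gaps cost d per position (linear penalty, d < 0).
    """
    n, m = len(X), len(Y)
    if n == 0 or m == 0:
        return 0  # every position is an end gap, which is free
    # Base conditions F(i, 0) = F(0, j) = 0 make leading gaps free.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if X[i - 1] == Y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # X_i aligned with Y_j
                          F[i - 1][j] + d,      # X_i against a gap
                          F[i][j - 1] + d)      # Y_j against a gap
    # Trailing gaps are free: take the best value in the last row or column.
    last_row = max(F[n][j] for j in range(1, m + 1))
    last_col = max(F[i][m] for i in range(1, n + 1))
    return max(last_row, last_col)


print(end_space_free_score("cactgtac", "gacacttg"))  # 4
```

Unlike the local recurrence, there is no 0 option in the max, so interior mismatches and gaps are always charged; only the ends are free.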
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Questions for thought
Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
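For the LCS question, one standard approach is a DP closely related to global alignment: the LCS length equals the best global alignment score when a match scores 1 and mismatches and gaps score 0. A minimal Python sketch (the function name `lcs` is illustrative; when several LCSs exist, the traceback returns just one of them):

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Traceback from (n, m), collecting matched characters in reverse.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "GAAGGC"))  # AAG
```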
Affine gap penalty
LETVGYW----L
Gap scores: -5 (open), -1, -1, -1 (extend)
Separate penalties for gap opening and gap extension.
This requires modifying the DP algorithm.
Affine gap penalty
• A gap of length k is more probable than k gaps of length 1:
  – a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  – separated gaps are probably due to distinct mutational events
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  – a penalty h associated with opening a gap
  – a smaller penalty g for extending the gap
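A small Python sketch contrasting the two penalty types for one gap of length 4 versus four separate gaps of length 1. Assumptions: the affine cost of a gap of length k is h + (k - 1)·g, matching the LETVGYW----L example (-5 for the first gap position, -1 for each further one); the linear score d = -2 is chosen only so that both functions charge the same for the single long gap. Some texts write the affine cost as h + g·k instead, which shifts the meaning of h.

```python
def linear_gap_cost(k, d=-2):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap_cost(k, h=-5, g=-1):
    """Affine penalty (assumed convention): h for the first gap position,
    g for each additional one; a gap of length 0 costs nothing."""
    return 0 if k == 0 else h + (k - 1) * g


one_long = affine_gap_cost(4)        # -5 - 1 - 1 - 1
four_short = 4 * affine_gap_cost(1)  # four separate openings
print(one_long, four_short)                        # -8 -20
print(linear_gap_cost(4), 4 * linear_gap_cost(1))  # -8 -8
```

The linear function cannot tell the two situations apart, while the affine one strongly prefers a single long gap.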
Gap penalty functions
Dynamic Programming for the Affine Gap Penalty Case
Three DP matrices are needed instead of one.
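A minimal Python sketch of the three-matrix idea for global alignment. Assumptions not taken from the slides: the matrix names M, Ix, Iy; a gap of length k scores h + (k - 1)·g (first position h = -5, each further position g = -1, as in the LETVGYW----L example); match = 1 and mismatch = -1 as defaults; and direct transitions between Ix and Iy are omitted, a common textbook simplification.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score with an affine gap penalty (three DP matrices).

    M[i][j]:  best score with x[i-1] aligned to y[j-1]
    Ix[i][j]: best score with x[i-1] aligned to a gap
    Iy[i][j]: best score with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # x[:i] against one leading gap
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # y[:j] against one leading gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Open a new gap after a match column (h) or extend the current gap (g).
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("ACGTACGT", "ACGT"))  # -4
```

Aligning ACGTACGT with ACGT scores -4: four matches plus one gap of length 4 (-8). Keeping the gap state in its own matrix is what lets the recurrence charge h only once per gap.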
[Figure: match = 1, mismatch = -1]
Exercise
Write the formula for “Local Alignment DP for the Affine Gap Penalty Case”.
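One possible formulation, offered as a sketch rather than as the slides' own answer. It assumes three matrices M, I_x, I_y, a gap of length k scoring h + (k - 1)g with h, g < 0, and lets a local alignment start at any match column by adding 0 to the M recurrence:

$$
\begin{aligned}
M(i,j) &= \max\begin{cases} 0 \\ M(i-1,j-1) + s(x_i,y_j) \\ I_x(i-1,j-1) + s(x_i,y_j) \\ I_y(i-1,j-1) + s(x_i,y_j) \end{cases} \\
I_x(i,j) &= \max\begin{cases} M(i-1,j) + h \\ I_x(i-1,j) + g \end{cases} \\
I_y(i,j) &= \max\begin{cases} M(i,j-1) + h \\ I_y(i,j-1) + g \end{cases}
\end{aligned}
$$

Boundary values are $M(i,0) = M(0,j) = 0$ and $I_x = I_y = -\infty$ on the boundary. The score is the largest $M(i,j)$, and traceback stops at an M cell equal to 0.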
Word k-tuple methods
FASTA
BLAST
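FASTA and BLAST both start from short exact word (k-tuple) matches. A minimal Python sketch of that first step, under illustrative assumptions: the helper names are hypothetical, k = 2 is chosen only for the tiny example, and the word scoring, extension, and statistics that real tools add are not shown. Hits are grouped by diagonal (target position minus query position), since hits on the same diagonal are consistent with one ungapped alignment.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map every word (k-tuple) of seq to the positions where it starts."""
    index = defaultdict(list)
    for p in range(len(seq) - k + 1):
        index[seq[p:p + k]].append(p)
    return index


def word_hits(query, target, k=2):
    """Exact word matches as (query_position, target_position) pairs."""
    index = kmer_index(target, k)
    hits = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            hits.append((q, t))
    return hits


def hits_by_diagonal(hits):
    """Group hits by diagonal t - q."""
    diagonals = defaultdict(list)
    for q, t in hits:
        diagonals[t - q].append((q, t))
    return dict(diagonals)


hits = word_hits("AAG", "GAAGGC", k=2)
print(hits)                    # [(0, 1), (1, 2)]
print(hits_by_diagonal(hits))  # {1: [(0, 1), (1, 2)]}
```

Both hits fall on diagonal 1, which agrees with the AAG/AAG local alignment found by dynamic programming, but here only a dictionary lookup per word was needed.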
Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually an alignment that spans the complete length of both sequences is not required.
Local alignment DP
Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty (a negative score, e.g. d = -5 in the examples).
$$F(i, 0) = 0, \qquad F(0, j) = 0$$
$$F(i, j) = \max\begin{cases} 0 \\ F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$
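The local recurrence can be sketched in Python. This is a minimal illustration, not the course's reference code: the nested-dict substitution matrix `SUB` copies the values from the "A simple example" slides, and the function name `local_align` and its tie-breaking (first maximum in row-major order, diagonal preferred during traceback) are assumptions.

```python
# Substitution matrix copied from the "A simple example" slides.
SUB = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def local_align(x, y, s=SUB, d=-5):
    """Smith-Waterman local alignment with a linear gap score d (d < 0).

    Rows of F follow x and columns follow y.
    Returns (best_score, aligned_x, aligned_y, F).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F(i, 0) = F(0, j) = 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:  # keep the first maximum in row-major order
                best, bi, bj = F[i][j], i, j

    # Traceback from the highest-scoring cell until a 0 is reached.
    ax, ay = [], []
    i, j = bi, bj
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay)), F


print(local_align("AGC", "AAG")[:3])     # (4, 'AG', 'AG')
print(local_align("GAAGGC", "AAG")[:3])  # (6, 'AAG', 'AAG')
```

Putting AGC (or GAAGGC) first places it on the rows, as in the slides, so the returned F matches the matrices shown in the examples.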
Local DP in equation form
$$F(i, j) = \max\{\,0,\ F(i-1, j-1) + s(x_i, y_j),\ F(i-1, j) + d,\ F(i, j-1) + d\,\}$$
Local alignment
Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman
A simple example
      -   A   A   G
  -
  A
  G
  C
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
AA
GG
CC 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word (k-tuple) methods
FASTA
BLAST
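Word methods share a first step: look up exact length-k words (k-tuples) that occur in both sequences. The Python below sketches that seeding idea only. The names `shared_kmers` and `diagonal_counts` are illustrative. FASTA and BLAST add scoring, extension and significance statistics, none of which are shown. Grouping hits by diagonal (offset j - i) mirrors how FASTA looks for diagonals with many word hits.

```python
from collections import defaultdict


def shared_kmers(query: str, target: str, k: int = 3) -> list[tuple[int, int]]:
    """(query_pos, target_pos) pairs where the same length-k word starts in both."""
    index = defaultdict(list)          # word -> list of start positions in target
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def diagonal_counts(hits: list[tuple[int, int]]) -> dict[int, int]:
    """Count word hits on each diagonal (offset j - i)."""
    counts: dict[int, int] = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return dict(counts)


hits = shared_kmers("AAG", "GAAGGC", k=2)
print(hits)                   # → [(0, 1), (1, 2)]
print(diagonal_counts(hits))  # → {1: 2}
```

For AAG against GAAGGC, both 2-word hits fall on diagonal 1. That is the same offset as the AAG/GAAGGC local alignment worked out later in these slides. The index lookup replaces a full DP over every cell, which is where word methods get their speed.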
A simple example
Substitution scores:
      A   C   G   T
  A   2  -7  -5  -7
  C  -7   2  -7  -5
  G  -5  -7   2  -7
  T  -7  -5  -7   2

Recurrence:
$F(i,j) = \max\{\,0,\ F(i-1,j-1) + s(x_i,y_j),\ F(i-1,j) + d,\ F(i,j-1) + d\,\}$

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

        A   A   G
    0   0   0   0
A   0   2   2   0
G   0   0   0   4
C   0   0   0   0

Tracing back from the maximum cell (score 4) gives the optimal local alignment:
AG
AG
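The fill-and-traceback procedure can be checked with a short Python sketch of Smith-Waterman local alignment with a linear gap penalty. It hard-codes the slide's 4x4 substitution matrix. The names `smith_waterman`, `s`, `BASES` and `SCORES` are illustrative. When several cells tie for the maximum, the sketch keeps the first one found, so it reports one optimal alignment among possibly several.

```python
# The slide's DNA substitution matrix (rows/columns in the order A, C, G, T)
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a: str, b: str) -> int:
    """Substitution score s(a, b) looked up from SCORES."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def smith_waterman(x: str, y: str, d: int = -5):
    """Local alignment with linear gap penalty d (added once per gap position).

    Returns (best_score, aligned_x, aligned_y) for one optimal local alignment.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j
    # Trace back from the best cell until a cell with score 0 is reached
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("AAG", "AGC"))     # → (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC"))  # → (6, 'AAG', 'AAG')
```

The `max(0, ...)` term is what makes the alignment local. It lets a new alignment start at any cell, and the traceback stops as soon as it reaches a 0.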
Local alignment
Substitution scores as in the previous example.
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

        A   A   G
    0   0   0   0
G   0   0   0   2
A   0   2   2   0
A   0   2   4   0
G   0   0   0   6
G   0   0   0   2
C   0   0   0   0

Tracing back diagonally from the maximum cell (score 6) gives the optimal local alignment:
AAG
AAG
End-Space Free Alignment
Any number of indel operations at the beginning or at the end of the alignment contribute zero weight. For example:
X = - - c a c - t g t a c
Y = g a c a c t t g - - -
End-Space Free Alignment
Base conditions: for all $i, j$: $F(i,0) = 0$ and $F(0,j) = 0$, where $n = |X|$ and $m = |Y|$.
Recurrence relation:
$F(i,j) = \max\{F(i-1,j-1) + s(X_i,Y_j),\ F(i-1,j) + d,\ F(i,j-1) + d\}$
Search for $i$ such that $F(i,m) = \max_{1 \le i' \le n} F(i',m)$.
Search for $j$ such that $F(n,j) = \max_{1 \le j' \le m} F(n,j')$.
Define the alignment score as $\max\{F(i,m),\ F(n,j)\}$ for the $i$ and $j$ found above.
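The end-space free recurrence can be sketched in Python as below. The function name is illustrative. The scores default to the exercise's values: match = 1, mismatch = -1, gap d = -1. Only the score is computed. A traceback would start from whichever last-row or last-column cell attains the maximum.

```python
def end_space_free_score(x: str, y: str, match: int = 1, mismatch: int = -1,
                         d: int = -1) -> int:
    """Score of an end-space free alignment of x (length n) and y (length m).

    Leading and trailing gaps are free; each interior gap position adds d.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0  # every column would be an end gap
    # Base conditions F(i, 0) = F(0, j) = 0 make leading gaps free
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell in the last column or last row
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))  # → 4
```

The result 4 matches the example alignment shown for X = cactgtac and Y = gacacttg. That alignment has five matches and one interior gap, and its end gaps cost nothing. Unlike Smith-Waterman, interior cells are not floored at 0, so the alignment must still span from an edge of the matrix to an edge.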
Exercise
Align the two sequences (match = 1, mismatch = -1, gap = -1):
X = c a c t g t a c
Y = g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00
GG 00
CC 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
A simple exampleA simple example
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
AA 00 22 22 00
GG 00 00 00 44
CC 00 00 00 00 11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5
0
AGAG
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00
AA 00
AA 00
GG 00
GG 00
CC 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
Local alignmentLocal alignment
AA CC GG TT
AA 22 -7-7 -5-5 -7-7
CC -7-7 22 -7-7 -5-5
GG -5-5 -7-7 22 -7-7
TT -7-7 -5-5 -7-7 22
AA AA GG
00 00 00 00
GG 00 00 00 22
AA 00 22 22 00
AA 00 22 44 00
GG 00 00 00 66
GG 00 00 00 22
CC 00 00 00 00
11 jiF
jiF jiF 1
1 jiF
d
d ji yxs
Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5
0
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment
any number of indel operations at the end or at the beginning of the alignment contribute zero weight
X= - - c a c - t g t a c
Y= g a c a c t t g - - -
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)
F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)
Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo
Presented By Liu QiPresented By Liu Qi
Word k-tup
FASTAFASTA
BLASTBLAST
Presented By Liu QiPresented By Liu Qi
ExerciseExercise
Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)
X = c a c t g t a c
Y= g a c a c t t g
Presented By Liu QiPresented By Liu Qi
思考题思考题Does a local alignment program always Does a local alignment program always
produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment
Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
LETVGYW----L
-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and
gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm
Presented By Liu QiPresented By Liu Qi
Affine gap penaltyAffine gap penalty
a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a
stretch of characters ndash separated gaps are probably due to distinct mutational events
a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two
terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap
Presented By Liu QiPresented By Liu Qi
Gap penalty functionsGap penalty functions
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
need 3 matrices instead of 1
Presented By Liu QiPresented By Liu Qi
Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
Presented By Liu QiPresented By Liu Qi
match=1 mismatch=-1
Exercise
Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
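Here is one possible answer to this exercise, sketched in Python rather than as formulas. It uses the global three-matrix recurrence with two changes:

- M may restart from 0, i.e. M(i,j) = s(x_i, y_j) + max{0, M(i-1,j-1), Ix(i-1,j-1), Iy(i-1,j-1)}, so a local alignment can begin anywhere.
- The answer is the largest M entry anywhere in the table, not the corner cell.

The parameter names and defaults are assumptions. The gap convention is again gap_open + (k - 1) * gap_extend for a gap of length k.

```python
NEG_INF = float("-inf")


def local_affine(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Best local alignment score with affine gaps (hypothetical helper)."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # x_i aligned to a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # y_j aligned to a gap
    best = 0  # the empty alignment scores 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # the 0 option lets a new local alignment start at (i, j)
            M[i][j] = s + max(0, M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # gaps can only follow an aligned pair or continue a gap, never start one
            Ix[i][j] = max(M[i - 1][j] + gap_open, Ix[i - 1][j] + gap_extend)
            Iy[i][j] = max(M[i][j - 1] + gap_open, Iy[i][j - 1] + gap_extend)
            best = max(best, M[i][j])  # a local alignment ends on an aligned pair
    return best


print(local_affine("AAAACAAAA", "AAAAAAAA"))               # 6
print(local_affine("AAAACAAAA", "AAAAAAAA", gap_open=-1))  # 7
```

With a costly gap opening, the best local alignment mismatches the C. With a cheap gap opening, it places the C against a gap instead.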
Word k-tuple
FASTA
BLAST
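To illustrate the word (k-tuple) idea behind FASTA and BLAST, here is a simplified Python sketch of the seeding step only. It works in three steps:

1. Index every k-letter word of one sequence.
2. Look up the words of the other sequence in that index.
3. Count the hits on each diagonal (i - j), since FASTA-style methods favor diagonals with many hits.

The choice k = 3 and the function names are assumptions. The real programs add scoring and extension, and BLAST also uses similar "neighborhood" words rather than only exact matches.

```python
from collections import defaultdict


def kmer_hits(query, target, k=3):
    """Return (i, j) pairs where query[i:i+k] == target[j:j+k] (hypothetical helper)."""
    index = defaultdict(list)                 # word -> start positions in target
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def hits_per_diagonal(hits):
    """Count word hits on each diagonal, identified by the offset i - j."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[i - j] += 1
    return dict(counts)


hits = kmer_hits("cactgtac", "gacacttg")
print(hits)                     # [(0, 2), (1, 3)]
print(hits_per_diagonal(hits))  # {-2: 2}
```

A lookup table makes finding the shared words roughly linear in the sequence lengths. This is why word methods are much faster than filling a full DP table, at the cost of no longer guaranteeing the optimal alignment.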