![Page 1: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/1.jpg)
Comparison of large sequences
First part:
Alignment of large sequences
![Page 2: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/2.jpg)
Dynamic programming
• Short sequences (up to 10,000 bps) can be aligned using dynamic programming.
• Quadratic cost in space and time.
[Figure: dynamic-programming matrix for accaccacaccacaacgagcata … acctgagcgatat against acc…t, and the resulting alignment acc…agt over acc…a-- with matches (|) and mismatches (x)]
What about genomes?
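As a minimal sketch of the quadratic dynamic programming above, the following assumes global (Needleman-Wunsch) alignment with illustrative scores: +1 for a match, -1 for a mismatch and -1 per gap. The scores and the function name are our assumptions, not the course's. Keeping the whole (n+1) x (m+1) score table is what makes both time and space quadratic.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment: returns (score, aligned_a, aligned_b).

    The whole (len(a)+1) x (len(b)+1) score table is kept, so both time and
    memory grow as O(len(a) * len(b)).
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("accagt", "acca"))
```

On accagt against acca it returns a score of 2 and the alignment accagt / acca--, the same shape as the acc…agt / acc…a-- alignment on the slide.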
![Page 3: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/3.jpg)
Genomic sequences
In which cases can dynamic programming be applied?
• Genomic sequences have millions of base pairs.
• They are about 1,000 times longer than the sequences above.
• With a quadratic cost, the running time is about 1,000,000 times higher! (1 second becomes 11 days; 1 minute becomes 2 years.)
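The scale-up on this slide follows directly from the quadratic cost. A quick check, assuming a 365-day year (the variable names are ours):

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

length_factor = 1_000               # genomes are about 1,000 times longer
time_factor = length_factor ** 2    # quadratic cost: time grows with the square

days = 1 * time_factor / SECONDS_PER_DAY     # what used to take 1 second
years = 60 * time_factor / SECONDS_PER_YEAR  # what used to take 1 minute
print(f"{time_factor:,}x slower: 1 s -> {days:.1f} days, 1 min -> {years:.1f} years")
```

It reports about 11.6 days and 1.9 years, which the slide gives as roughly 11 days and 2 years.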
![Page 4: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/4.jpg)
First assumption
[Figure: Genome A plotted against Genome B]
![Page 5: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/5.jpg)
Realistic assumption?
Unrealistic assumption! More realistic assumption:
[Figure: Genome A plotted against Genome B under the unrealistic assumption and under the more realistic one]
![Page 6: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/6.jpg)
Realistic assumptions?
Unrealistic assumption! More realistic assumption:
[Figure: Genome A plotted against Genome B under the unrealistic assumption and under the more realistic one]
But now, is it a real case?
![Page 7: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/7.jpg)
Preview in a real case
Chlamydia muridarum: 1,084,689 bps; Chlamydia trachomatis: 1,057,413 bps
![Page 8: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/8.jpg)
Preview in a real case
Pyrococcus abyssi: 1,790,334 bps; Pyrococcus horikoshii: 1,763,341 bps
![Page 9: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/9.jpg)
Methodology of an alignment
1st: Identify the portions that can be aligned.
2nd: Make a preview. (Linear cost)
3rd: Make the alignment. (Linear cost)
[Figure: preview of the aligned portions, followed by the alignment itself]
![Page 10: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/10.jpg)
Methodology of an alignment
1st: Identify the portions that can be aligned.
2nd: Make a preview. (Linear cost)
3rd: Make the alignment.
[Figure: preview of the aligned portions, followed by the alignment itself]
?
![Page 11: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/11.jpg)
Preview revisited
… a a t g … c t g …
… c g t g … c c c …
Maximal Unique Matching (MUM)
Connect to MALGEN
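MUM stands for maximal unique match(ing): a substring that occurs exactly once in each sequence and cannot be extended to the left or to the right. The brute-force sketch below only illustrates that definition; it is far slower than the suffix-tree method that follows, and the function names are our own. Positions are 1-based, and the test strings are S1 = ababaabb and S2 = aabaabba from the later slides.

```python
def count_occurrences(text, pattern):
    """Number of (possibly overlapping) occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def naive_mums(a, b, min_len=1):
    """Brute-force maximal unique matches as (start_in_a, start_in_b, length), 1-based."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # extendable to the left, so not maximal
            length = 0
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1  # extend to the right as far as possible
            if length < min_len:
                continue
            match = a[i:i + length]
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i + 1, j + 1, length))
    return mums


print(naive_mums("ababaabb", "aabaabba"))
```

For these two strings the result includes (3, 2, 6): the substring abaabb, starting at S1[3] and S2[2].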
![Page 12: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/12.jpg)
Methodology of an alignment
1st: Identify the portions that can be aligned. How can these portions be determined?
2nd: Make a preview. How can MUMs be found? With suffix trees, in linear cost.
3rd: Make the alignment, with CLUSTALW, TCOFFEE, …
[Figure: preview of the aligned portions, followed by the alignment itself]
![Page 13: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/13.jpg)
Bioinformatics PhD. Course
Second part:
Introducing Suffix trees
![Page 14: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/14.jpg)
Suffix trees
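Given the string ababaas, its suffixes are:
1: ababaas
2: babaas
3: abaas
4: baas
5: aas
6: as
7: s
[Figure: suffix tree of ababaas. From the root, edge a leads to an internal node with edges ba (which branches into baas → leaf 1 and as → leaf 3), as → leaf 5 and s → leaf 6; edge ba leads to an internal node with edges baas → leaf 2 and as → leaf 4; edge s → leaf 7.]
What kind of queries can it answer?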
![Page 15: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/15.jpg)
Applications of Suffix trees
1. Exact string matching
• Does the sequence ababaas contain any occurrence of the patterns abab, aab and ab?
[Figure: suffix tree of ababaas]
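A pattern occurs in the text exactly when it can be spelt from the root of the suffix tree. To keep the sketch short, it uses an uncompacted suffix trie (one character per edge) stored as nested Python dicts instead of a compacted suffix tree; the queries behave the same, but the trie can take quadratic space. Function names are hypothetical.

```python
def build_suffix_trie(text):
    """Uncompacted suffix trie: every suffix of text is spelt from the root."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})  # follow or create the edge for ch
    return root


def occurs(trie, pattern):
    """pattern is a substring of the text iff it can be spelt from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("ababaas")
for p in ("abab", "aab", "ab"):
    print(p, occurs(trie, p))
```

For ababaas the answers are yes for abab and ab, and no for aab.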
![Page 16: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/16.jpg)
Quadratic insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of the suffixes from … have been inserted, and the suffix tree …
![Page 17: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/17.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffix 1: a single leaf, ababaabbs,1]
![Page 18: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/18.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffix 2: leaves ababaabbs,1 and babaabbs,2]
![Page 19: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/19.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: inserting suffix 3 into the tree with leaves ababaabbs,1 and babaabbs,2]
![Page 20: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/20.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffix 3: the edge of leaf 1 is split after aba into baabbs,1 and abbs,3]
![Page 21: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/21.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: inserting suffix 4: the edge of leaf 2 is split after ba]
![Page 22: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/22.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffix 4: node ba with leaves baabbs,2 and abbs,4]
![Page 23: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/23.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: inserting suffix 5: the edge aba is split into a and ba]
![Page 24: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/24.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffix 5: leaf abbs,5 below node a]
![Page 25: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/25.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffixes 1 to 5]
![Page 26: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/26.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: inserting suffix 6: the edge ba below node a is split into b and a]
![Page 27: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/27.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffix 6: leaf bs,6 below node ab]
![Page 28: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/28.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffixes 1 to 6]
![Page 29: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/29.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffix 7: the root edge ba is split into b and a, and leaf bs,7 hangs below node b]
![Page 30: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/30.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: suffix tree after inserting suffix 8: leaf s,8 below node b]
![Page 31: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/31.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: the complete suffix tree of ababaabbs after inserting suffix 9: leaf s,9 below the root]
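The quadratic insertion walk-through above can be sketched as follows. This is our own minimal version, assuming the text ends with a character that appears nowhere else (the final s of ababaabbs plays that role); edges store (start, end) indices into the text rather than copies of the label, and class and function names are hypothetical. Each suffix is spelt from the root, so the worst case is quadratic.

```python
class Node:
    """Suffix-tree node; the edge into it is labelled text[start:end]."""
    def __init__(self, start=0, end=0, leaf=None):
        self.start, self.end = start, end
        self.children = {}   # first character of the child edge -> child node
        self.leaf = leaf     # 1-based suffix number for leaves, None for internal nodes


def build_suffix_tree(text):
    """Quadratic insertion: insert suffixes 1..n one by one, each walked from the root.

    Assumes text ends with a unique terminator, so no suffix is a prefix of
    another and every suffix ends at a leaf.
    """
    n = len(text)
    root = Node()
    for i in range(n):
        node, pos = root, i              # pos: next character of suffix i+1 to spell
        while True:
            child = node.children.get(text[pos])
            if child is None:            # no edge starts with this character: new leaf
                node.children[text[pos]] = Node(pos, n, leaf=i + 1)
                break
            k, edge_len = 0, child.end - child.start
            while k < edge_len and text[child.start + k] == text[pos + k]:
                k += 1
            if k == edge_len:            # whole edge spelt: continue from the child
                node, pos = child, pos + k
                continue
            # Mismatch inside the edge: split it after k characters.
            middle = Node(child.start, child.start + k)
            node.children[text[pos]] = middle
            child.start += k
            middle.children[text[child.start]] = child
            middle.children[text[pos + k]] = Node(pos + k, n, leaf=i + 1)
            break
    return root


def leaves(node, text, prefix=""):
    """Yield (suffix number, path label) for every leaf below node."""
    label = prefix + text[node.start:node.end]
    if node.leaf is not None:
        yield node.leaf, label
    for child in node.children.values():
        yield from leaves(child, text, label)


text = "ababaabbs"
tree = build_suffix_tree(text)
for number, label in sorted(leaves(tree, text)):
    print(number, label)
```

It lists the nine leaves with their full path labels, from 1 ababaabbs down to 9 s, and the root ends up with the three children a, b and s.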
![Page 32: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/32.jpg)
Generalized suffix tree
The suffix tree of several strings is called their generalized suffix tree, and it is the suffix tree of the concatenation of the strings, each followed by its own end marker.
For instance, the generalized suffix tree of ababaabb and aabaat is the suffix tree of ababaabbαaabaatβ.
![Page 33: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/33.jpg)
Generalized suffix tree
Given the suffix tree of ababaabbα:
[Figure: suffix tree of ababaabbα, with leaves 1 to 9]
Construction of the suffix tree of ababaabbαaabaaβ:
![Page 34: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/34.jpg)
Generalized suffix tree
[Figure: suffix tree of ababaabbα, before the suffixes of aabaaβ are inserted]
Construction of the suffix tree of ababaabbαaabaaβ:
![Page 35: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/35.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffix 1 of aabaaβ]
![Page 36: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/36.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffix 1 of aabaaβ]
![Page 37: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/37.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 and 2 of aabaaβ]
![Page 38: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/38.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 and 2 of aabaaβ]
![Page 39: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/39.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 to 3 of aabaaβ]
![Page 40: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/40.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 to 3 of aabaaβ]
![Page 41: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/41.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 to 4 of aabaaβ]
![Page 42: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/42.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 to 4 of aabaaβ]
![Page 43: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/43.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 to 5 of aabaaβ]
![Page 44: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/44.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 to 5 of aabaaβ]
![Page 45: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/45.jpg)
Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the tree after inserting suffixes 1 to 6 of aabaaβ]
![Page 46: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/46.jpg)
Generalized suffix tree
Generalized suffix tree of ababaabbαaabaaβ:
[Figure: the complete generalized suffix tree; leaves ending in α come from ababaabb and leaves ending in β from aabaa]
![Page 47: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/47.jpg)
Applications of Suffix trees
1. Exact string matching
• Does the sequence ababaas contain any occurrence of the patterns abab, aab and ab?
[Figure: suffix tree of ababaas]
![Page 48: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/48.jpg)
Applications of Suffix trees
2. The substring problem for a database of strings DB
• Does DB contain any occurrence of the patterns abab, aab and ab?
[Figure: generalized suffix tree of ababaabbαaabaaβ]
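For the database question, a generalized structure only needs to remember which string each suffix came from. The sketch below is a simplified stand-in for the generalized suffix tree: an uncompacted trie of the suffixes of every string, where each node records the indices of the strings passing through it, so no α/β end markers are needed. The database contents (ababaabb and aabaa, the two strings of the example) and all names are our assumptions.

```python
def build_generalized_trie(strings):
    """Uncompacted generalized suffix trie of several strings.

    Every node is a dict {"next": {char: child}, "ids": set of string indices}
    recording which strings have a suffix passing through that node.
    """
    root = {"next": {}, "ids": set()}
    for sid, s in enumerate(strings):
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node["next"].setdefault(ch, {"next": {}, "ids": set()})
                node["ids"].add(sid)
    return root


def strings_containing(trie, pattern):
    """Indices of the strings in the database that contain pattern."""
    node = trie
    for ch in pattern:
        node = node["next"].get(ch)
        if node is None:
            return set()
    return node["ids"]


db = ["ababaabb", "aabaa"]
trie = build_generalized_trie(db)
for p in ("abab", "aab", "ab"):
    print(p, sorted(strings_containing(trie, p)))
```

abab is found only in the first string, while aab and ab occur in both.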
![Page 49: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/49.jpg)
Applications of Suffix trees
3. The longest common substring of two strings
[Figure: generalized suffix tree of ababaabbαaabaaβ]
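The longest common substring is the deepest node whose subtree contains suffixes of both strings. Here is a sketch under the same simplification (an uncompacted trie with string indices on each node; names are ours), with the builder repeated so the block runs on its own:

```python
def build_generalized_trie(strings):
    """Uncompacted generalized suffix trie; each node records which strings reach it."""
    root = {"next": {}, "ids": set()}
    for sid, s in enumerate(strings):
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node["next"].setdefault(ch, {"next": {}, "ids": set()})
                node["ids"].add(sid)
    return root


def longest_common_substring(s1, s2):
    """Path label of the deepest node reached by suffixes of both strings."""
    root = build_generalized_trie([s1, s2])
    best = ""
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        if len(label) > len(best):
            best = label
        for ch, child in node["next"].items():
            if child["ids"] == {0, 1}:   # only descend into nodes shared by both strings
                stack.append((child, label + ch))
    return best


print(longest_common_substring("ababaabb", "aabaa"))
```

For ababaabb and aabaa it returns abaa; when several substrings tie, it returns one of them.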
![Page 50: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/50.jpg)
Applications of Suffix trees
5. Finding MUMs.
[Figure: generalized suffix tree of ababaabbαaabaaβ]
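MUMs can be read off a generalized suffix tree: a MUM is a node spelt by exactly one suffix of each string, at which those two suffixes branch apart (right-maximal), and whose preceding characters differ (left-maximal). The sketch below applies that test to an uncompacted generalized suffix trie of S1$ and S2#; the end markers $ and # stand in for α and β and are assumed not to occur in the inputs, and the names are ours. The trie has quadratic size, so this shows the criterion, not the linear-time method.

```python
def find_mums(s1, s2):
    """Maximal unique matches as (start_in_s1, start_in_s2, length), 1-based.

    A trie node is reported when (a) exactly two suffixes spell it, one from
    each string, (b) it branches, so the match cannot grow to the right, and
    (c) the preceding characters differ (or a string starts there), so it
    cannot grow to the left.
    """
    texts = [s1 + "$", s2 + "#"]      # end markers assumed absent from s1 and s2
    root = {"next": {}, "starts": []}
    for sid, t in enumerate(texts):
        for i in range(len(t)):
            node = root
            for ch in t[i:]:
                node = node["next"].setdefault(ch, {"next": {}, "starts": []})
                node["starts"].append((sid, i))
    mums = []
    stack = [(root, 0)]               # (node, length of its path label)
    while stack:
        node, depth = stack.pop()
        for child in node["next"].values():
            if {sid for sid, _ in child["starts"]} != {0, 1}:
                continue              # not shared by both strings: nothing below is either
            stack.append((child, depth + 1))
            if len(child["starts"]) != 2 or len(child["next"]) < 2:
                continue              # not unique in both, or still extendable to the right
            (_, i), (_, j) = sorted(child["starts"])
            if i == 0 or j == 0 or s1[i - 1] != s2[j - 1]:
                mums.append((i + 1, j + 1, depth + 1))
    return sorted(mums)


print(find_mums("ababaabb", "aabaabba"))
```

For S1 = ababaabb and S2 = aabaabba the result contains (3, 2, 6), that is, abaabb.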
![Page 51: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/51.jpg)
Third part:
Suffix links
![Page 52: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/52.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
![Page 53: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/53.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
?
![Page 54: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/54.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
?
![Page 55: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/55.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
?
![Page 56: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/56.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
?
![Page 57: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/57.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
?
![Page 58: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/58.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
?
![Page 59: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/59.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
?
![Page 60: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/60.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
?
![Page 61: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/61.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
![Page 62: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/62.jpg)
Suffix links
[Figure: suffix links in the suffix tree of ababaabbα]
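A suffix link goes from the node whose path label is xα, for a single character x, to the node whose path label is α. In a compacted suffix tree the links join internal nodes; the minimal sketch below uses an uncompacted suffix trie of ababaabb instead, where every non-root node has one, because dropping the first character of a substring leaves another substring. Class and function names are hypothetical.

```python
class TrieNode:
    def __init__(self, label):
        self.label = label     # path label from the root
        self.children = {}     # character -> child node
        self.link = None       # suffix link


def build_trie_with_links(text):
    """Uncompacted suffix trie of text plus a suffix link on every non-root node."""
    root = TrieNode("")
    nodes = {"": root}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            if ch not in node.children:
                child = TrieNode(node.label + ch)
                node.children[ch] = child
                nodes[child.label] = child
            node = node.children[ch]
    for label, node in nodes.items():
        if label:
            # label[1:] is also a substring of text, so its node always exists.
            node.link = nodes[label[1:]]
    return root, nodes


root, nodes = build_trie_with_links("ababaabb")
print(nodes["aba"].link.label)
```

The link of the node spelling aba leads to the node spelling ba.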
![Page 63: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/63.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a
![Page 64: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/64.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a
![Page 65: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/65.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a
Unique matchings: aa in S2[1]
![Page 66: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/66.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a
Unique matchings: aab in S2[1], i.e. S1[5..6-7] in S2[1]
![Page 67: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/67.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a
Unique matchings: S1[5..6-7] in S2[1]
![Page 68: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/68.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a
Unique matchings: S1[5..6-7] in S2[1]
![Page 69: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/69.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a b b a
Unique matchings:
S1[5..6-7] in S2[1]
S1[3..6-…] in S2[2]
![Page 70: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/70.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a b b a
Unique matchings:
S1[5..6-7] in S2[1]
S1[3..6-…] in S2[2]
![Page 71: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/71.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a b b a
Unique matchings:
S1[5..6-7] in S2[1]
S1[3..6-…] in S2[2]
![Page 72: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/72.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a b b a
Unique matchings:
S1[5..6-7] in S2[1]
S1[3..6-…] in S2[2]
![Page 73: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/73.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a b b a
Unique matchings:
S1[5..6-7] in S2[1]
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
![Page 74: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/74.jpg)
Traversal using Suffix links
[Figure: suffix tree of S1 = ababaabbα, traversed using suffix links]
Given S2 = a a b a a b b a
Unique matchings:
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
S1[5..8] in S2[4]
S1[6..8] in S2[5]
S1[7..8] in S2[6]
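Following suffix links lets the traversal of S2 drop the first character of the current match instead of restarting at the root. The sketch below computes, for every position j of S2, the length of the longest prefix of S2[j:] that occurs in S1 (often called matching statistics). It assumes an uncompacted suffix trie of S1 with suffix links, does not check uniqueness, and uses names of our own.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.link = None   # suffix link: node of the path label minus its first character


def suffix_trie_with_links(s1):
    """Uncompacted suffix trie of s1 with a suffix link on every non-root node."""
    root = TrieNode()
    by_label = {"": root}
    for i in range(len(s1)):
        node, label = root, ""
        for ch in s1[i:]:
            label += ch
            if ch not in node.children:
                node.children[ch] = TrieNode()
                by_label[label] = node.children[ch]
            node = node.children[ch]
    for label, node in by_label.items():
        if label:
            node.link = by_label[label[1:]]
    return root


def matching_statistics(s1, s2):
    """ms[j] = length of the longest prefix of s2[j:] that occurs in s1."""
    root = suffix_trie_with_links(s1)
    node, end = root, 0        # invariant: node spells s2[j:end]
    ms = []
    for j in range(len(s2)):
        while end < len(s2) and s2[end] in node.children:
            node = node.children[s2[end]]   # extend the match by one character
            end += 1
        ms.append(end - j)
        if end > j:
            node = node.link   # drop the first character instead of restarting at the root
        else:
            end = j + 1        # empty match: start again after position j
    return ms


print(matching_statistics("ababaabb", "aabaabba"))
```

For S1 = ababaabb and S2 = aabaabba it gives 3, 6, 5, 4, 3, 2, 2, 1; the values 6, 5, 4, 3 and 2 at S2 positions 2 to 6 are the lengths of the unique matchings S1[3..8] through S1[7..8] on the slide.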
![Page 75: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/75.jpg)
From UMs to MUMs
Given S1 = a b a b a a b b α and S2 = a a b a a b b a, the unique matchings are:
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
S1[5..8] in S2[4]
S1[6..8] in S2[5]
S1[7..8] in S2[6]
Array of UMs (indexed by start position in S1): 3: 6-8, 4: 6-8, 5: 8, 6: 8, 7: 8; positions 1, 2, 8 and 9 hold none.
MUM: S1[3..6-8] in S2[2]
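The last step keeps only the unique matches that cannot be extended to the left, that is, those whose predecessor pair (i-1, j-1) does not carry a longer match. A minimal sketch, assuming the unique matches are already given as (S1 start, S2 start, length) triples with 1-based positions and that uniqueness was established earlier; the function name is ours:

```python
def ums_to_mums(ums):
    """Keep the unique matches (i, j, length) that are not extendable to the left.

    A match is dropped when the pair (i - 1, j - 1) carries a match at least
    one character longer, i.e. the match is contained in that longer one.
    """
    by_start = {(i, j): length for i, j, length in ums}
    return [(i, j, length) for i, j, length in ums
            if by_start.get((i - 1, j - 1), 0) < length + 1]


ums = [(3, 2, 6), (4, 3, 5), (5, 4, 4), (6, 5, 3), (7, 6, 2)]
print(ums_to_mums(ums))
```

With the five unique matchings of the slide it keeps only (3, 2, 6), the MUM S1[3..8] in S2[2].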
![Page 76: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/76.jpg)
Fourth part:
Linear insertion algorithm
![Page 77: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/77.jpg)
Quadratic insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of the suffixes from … have been inserted, and the suffix tree …
![Page 78: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/78.jpg)
Linear insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of the suffixes from … have been inserted, and the suffix tree …
P2: the string … is the longest string that can be spelt through the tree.
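The linear algorithm applies these invariants to the compacted suffix tree, and implementing it in full is beyond a short example. As a smaller, hedged illustration of the same idea (online insertion, one character at a time, driven by suffix links), here is Ukkonen's online construction of the uncompacted suffix trie. It is not linear, because the trie itself can have quadratic size, and the names are ours.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.link = None   # suffix link (None only for the root)


def online_suffix_trie(text):
    """Ukkonen's online construction of the uncompacted suffix trie.

    After reading text[:i], the trie spells exactly the substrings of text[:i];
    `top` is the node spelling the whole prefix read so far.
    """
    root = Node()
    top = root
    for ch in text:
        r, prev = top, None
        # Walk along the chain of suffix links (longest suffix first), adding a
        # ch-child wherever one is missing.
        while r is not None and ch not in r.children:
            new = Node()
            r.children[ch] = new
            if prev is not None:
                prev.link = new
            prev = new
            r = r.link
        # The last new node links to the root, or to the existing ch-child
        # of the node where the walk stopped.
        prev.link = root if r is None else r.children[ch]
        top = top.children[ch]
    return root


def spells(root, pattern):
    """True if pattern can be spelt from the root, i.e. it is a substring."""
    node = root
    for ch in pattern:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return True


trie = online_suffix_trie("ababaabb")
print(spells(trie, "abaabb"), spells(trie, "bba"))
```

For ababaabb the trie spells abaabb but not bba.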
![Page 79: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/79.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 5 of ababaababb...]
![Page 80: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/80.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 5]
6 7 8
![Page 81: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/81.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 5]
6 7 8
![Page 82: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/82.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 5]
6 7 8 9
![Page 83: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/83.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: the edge of leaf 1 is split after b, and the leaf b...,6 of suffix 6 is added]
6 7 8 9
![Page 84: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/84.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 6]
7 8 9
![Page 85: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/85.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 6]
7 8 9
![Page 86: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/86.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 6]
7 8 9
![Page 87: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/87.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: the edge of leaf 2 is split after b]
7 8 9
![Page 88: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/88.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: the leaf b...,7 of suffix 7 is added]
7 8 …
![Page 89: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/89.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 7]
8 9
![Page 90: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/90.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 7]
8 9
![Page 91: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/91.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 7]
8 9
![Page 92: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/92.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 7]
8 9
![Page 93: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/93.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: below node a, the edge ba is split into b and a]
8 9
![Page 94: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/94.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: the leaf b...,8 of suffix 8 is added]
8 9
![Page 95: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/95.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 8]
9
![Page 96: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/96.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 8]
9
![Page 97: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/97.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: the root edge ba is split into b and a]
9
![Page 98: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/98.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: the leaf b...,9 of suffix 9 is added]
9
![Page 99: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/99.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 9]
9
![Page 100: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/100.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 9]
9
![Page 101: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/101.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
[Figure: suffix tree holding the leaves of suffixes 1 to 9]
9
![Page 102: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/102.jpg)
Index
Suffix arrays
Reference: U. Manber and G. Myers, "Suffix arrays: a new method for on-line string searches".
![Page 103: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/103.jpg)
Suffix arrays
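Given the string ababaa#, its suffixes are:
1: ababaa#
2: babaa#
3: abaa#
4: baa#
5: aa#
6: a#
7: #
… but lexicographically sorted (with # smaller than any letter):
1: # (suffix 7)
2: a# (suffix 6)
3: aa# (suffix 5)
4: abaa# (suffix 3)
5: ababaa# (suffix 1)
6: baa# (suffix 4)
7: babaa# (suffix 2)
The suffix array is therefore 7, 6, 5, 3, 1, 4, 2.
What is the cost? O(n log n) with the Manber and Myers construction; a plain comparison sort needs O(n log n) suffix comparisons, each of which may take O(n) time.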
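A minimal sketch of building the suffix array by sorting the suffixes directly. It relies on Python comparing strings character by character, where # sorts before the letters. This plain sort is not the O(n log n) Manber and Myers method, since each of its O(n log n) comparisons may scan O(n) characters. The function name is ours.

```python
def suffix_array(text):
    """1-based start positions of the suffixes of text, in lexicographic order."""
    return [i + 1 for i in sorted(range(len(text)), key=lambda i: text[i:])]


print(suffix_array("ababaa#"))
```

For ababaa# it returns [7, 6, 5, 3, 1, 4, 2].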
![Page 104: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/104.jpg)
Applications of suffix arrays
1. Exact string matching
• Does the string ababaa# contain any occurrence of the patterns abab, aab and ab?
Sorted suffixes: #, a#, aa#, abaa#, ababaa#, baa#, babaa# (suffix array 7, 6, 5, 3, 1, 4, 2)
Binary search … which is the cost? O(|P| log n)
Can it be improved to O(log n + |P|)?
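Because the suffixes are sorted, all those starting with the pattern P form one contiguous block, so a binary search finds them; each step compares up to |P| characters, which gives O(|P| log n). A sketch assuming 0-based suffix-array entries; the names are ours:

```python
def contains(text, sa, pattern):
    """Binary search over the suffix array sa (0-based starts), O(|P| log n)."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # find the first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text[sa[lo]:sa[lo] + m] == pattern


text = "ababaa#"
sa = sorted(range(len(text)), key=lambda i: text[i:])   # naive suffix array
for p in ("abab", "aab", "ab"):
    print(p, contains(text, sa, p))
```

abab and ab are found; aab is not.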
![Page 105: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/105.jpg)
Fast search with cost O(log n + |P|)
Query:
Invariant properties:
P1: α < query ≤ β
P2: … matches pref(query)
[Figure: suffix array with positions 1, 2, …, n; α and β mark the current bounds]
![Page 106: Comparison of large sequences](https://reader036.vdocuments.mx/reader036/viewer/2022062518/568145ae550346895db2a8c1/html5/thumbnails/106.jpg)
Fast search with cost O(log n + |P|)
Query:
Invariant properties:
P1: α < query ≤ β
P2: … matches pref(query)
[Figure: suffix array with positions 1, 2, …, n; α and β mark the current bounds and γ a position between them]
Algorithm:
If suff(γ) < query then α = γ
else β = γ
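The invariants above can be sketched as follows. We assume 0-based suffix-array entries and virtual bounds α = -1 and β = n, and we track l and r, the lengths of the common prefixes of the query with the suffixes at α and β. Every suffix between the bounds shares at least min(l, r) characters with the query, so comparisons can start there. This simple skip is our illustration: on its own it does not guarantee O(log n + |P|) in the worst case, and Manber and Myers reach that bound with additional precomputed longest-common-prefix information. Names are ours.

```python
def extend_lcp(text, start, p, k):
    """Common-prefix length of text[start:] and p, given that k characters already match."""
    while k < len(p) and start + k < len(text) and text[start + k] == p[k]:
        k += 1
    return k


def find(text, sa, p):
    """Return (found, index), where index is the first slot whose suffix is >= p.

    Invariant P1: suffix(sa[alpha]) < p <= suffix(sa[beta]), with virtual
    bounds alpha = -1 and beta = len(sa). l and r are the common-prefix
    lengths of p with those two suffixes (0 for the virtual bounds).
    """
    alpha, beta = -1, len(sa)
    l = r = 0
    while beta - alpha > 1:
        gamma = (alpha + beta) // 2
        start = sa[gamma]
        k = extend_lcp(text, start, p, min(l, r))  # skip characters known to match
        if k == len(p) or (start + k < len(text) and text[start + k] > p[k]):
            beta, r = gamma, k     # suffix(gamma) >= p
        else:
            alpha, l = gamma, k    # suffix(gamma) < p
    return beta < len(sa) and r == len(p), beta


text = "ababaa#"
sa = sorted(range(len(text)), key=lambda i: text[i:])   # naive suffix array
for p in ("abab", "aab", "ab"):
    print(p, find(text, sa, p))
```

For ababaa#, ab is first found at slot 3 (abaa#) and abab at slot 4 (ababaa#), while aab is reported as absent.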