ADVANCED COMPUTATIONAL BIOLOGY
PROJECT PRESENTATION
Team Members:
Joshua Wu 11174269
Shuyu (Christine) Xu 11161640
OVERVIEW
Project Description
Introduction
Motivation
Bioinformatics Application
Explicit vs Implicit
Problem Analysis
Implement Files
Experimental Results
Conclusion
Possible Future Work
Project Description
Explicit suffix trees: suppose that we want to store explicitly all strings that are edge labels of a suffix tree.
The main question of this project is how much space explicit suffix trees require compared to implicit suffix trees.
Implement a suffix tree algorithm and run it on substrings of real data.
Introduction
Any string of length m can be decomposed into m suffixes, and these suffixes can be stored in a suffix tree.
Setup time: O(m), where m is the length of the string
Search time: O(n), where n is the length of the pattern
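As a rough illustration of the idea, the sketch below stores every suffix in an uncompressed trie built from nested dictionaries and then searches it one character at a time. This is an assumption-laden simplification, not the project's program: the naive construction takes O(m^2) time and space, whereas the O(m) setup time quoted above needs a linear-time algorithm such as Ukkonen's. The function names are hypothetical.

```python
def suffixes(text):
    """Return all suffixes of text, longest first."""
    return [text[i:] for i in range(len(text))]


def build_suffix_trie(text):
    """Insert every suffix into a nested-dict trie (naive, quadratic)."""
    root = {}
    for suffix in suffixes(text):
        node = root
        for ch in suffix:
            # Create the child for ch if it does not exist yet, then descend.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Follow one edge per pattern character: n steps for a pattern of length n."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("ABCABC$")
print(len(suffixes("ABCABC$")))   # 7
print(contains(trie, "CAB"))      # True
print(contains(trie, "CBA"))      # False
```

A string of length 7 yields 7 suffixes, and the search cost depends only on the pattern length, not on the length of the stored string.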
Motivation
"Suffix trees are widely used in the computer field... Recent improvements in the method have cut the memory requirement to 17 bytes per letter, which brings the method to the verge of practicality [for bioinformatics applications]" -- Nat Goodman (Genome Technology).
Bioinformatics Application
1. multiple genome alignment (Michael Hohl et al., 2002)
2. selection of signature oligonucleotides for DNA arrays (Kaderali and Schliep, 2002)
3. identification of sequence repeats (Kurtz and Schleiermacher, 1999)
Explicit vs Implicit
String: ABC$ (positions 1 2 3 4)
Explicit edge labels: ABC$  BC$  C$  $
Implicit edge labels (start,end): 1,4  2,4  3,4  4,4
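The relationship between the two representations can be sketched in a few lines: an implicit label is a pair of 1-based, inclusive positions into the string, and the explicit label is the substring those positions select. The helper name `explicit_label` is hypothetical and only serves this illustration.

```python
def explicit_label(text, start, end):
    """Return the substring for a 1-based, inclusive (start, end) pair."""
    return text[start - 1:end]


text = "ABC$"
implicit_edges = [(1, 4), (2, 4), (3, 4), (4, 4)]

# Expand every index pair into the string it stands for.
explicit_edges = [explicit_label(text, s, e) for s, e in implicit_edges]
print(explicit_edges)   # ['ABC$', 'BC$', 'C$', '$']
```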
Problem Analysis
Best case for both explicit and implicit suffix trees: all characters are different.
The best case is unlikely with DNA inputs, which use only 4 distinct characters.
Worst case: the same character throughout.
Assumptions
In implicit trees, each number is counted as a single unit of memory, regardless of its size (for example, the number 10 counts as 1 unit).
Sequences contain only alphabetic characters.
Example: all different characters (best case)
String: ABCD$ (positions 1 2 3 4 5)
Implicit edge labels: 1,5  2,5  3,5  4,5  5,5
N: string length, N = 5
Memory = 10
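The memory figure for this example follows from the counting assumption stated earlier: two numbers per edge, one unit each. The sketch below reproduces that count from the five index pairs. The explicit cost function is an added assumption for comparison (one unit per stored character), not a figure taken from the slides, and both function names are hypothetical.

```python
def implicit_memory(edges):
    """Two numbers (start, end) per edge, each counted as one unit."""
    return 2 * len(edges)


def explicit_memory(edges):
    """Assumed cost model: one unit per character of each stored label."""
    return sum(end - start + 1 for start, end in edges)


# Implicit edge labels of the suffix tree for ABCD$ (1-based, inclusive).
edges = [(1, 5), (2, 5), (3, 5), (4, 5), (5, 5)]
print(implicit_memory(edges))   # 10
print(explicit_memory(edges))   # 15
```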
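Example
String: ABCABC$ (positions 1 2 3 4 5 6 7)
Implicit edge labels from the root: 1,3  2,3  6,6  7,7
Implicit edge labels below each of the three internal nodes: 4,7  7,7
N: string length, N = 7
Memory = 20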
Example: all same character (worst case)
String: AAAA$ (positions 1 2 3 4 5)
Implicit edge labels, level by level: 1,1  5,5 / 2,2  5,5 / 3,3  5,5 / 4,5  5,5
N: string length
N = 5, 6, 7
Memory = 16, 20, 24
Memory = 4N - 4
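The memory figures in these examples can be checked with a small program. The sketch below is a naive O(m^2) construction of a compressed suffix tree that stores each edge as a pair of 1-based indices, then counts two units per edge as assumed above. It is not the project's implementation (the slides do not show one), and the names `Node`, `build_suffix_tree`, `edges`, and `implicit_memory` are hypothetical. For a string of N - 1 identical characters plus `$`, the tree has N leaf edges and N - 2 internal edges, so 2N - 2 edges and 4N - 4 units.

```python
class Node:
    def __init__(self, start=0, end=0):
        self.start = start    # 1-based index of the first label character
        self.end = end        # 1-based index of the last label character
        self.children = {}    # first character of a child label -> Node


def build_suffix_tree(text):
    """Naive construction; text must end with a unique terminator."""
    if not text or text[-1] in text[:-1]:
        raise ValueError("text must end with a unique terminator such as '$'")
    root, m = Node(), len(text)
    for i in range(m):                    # insert the suffix text[i:]
        node, j = root, i
        while j < m:
            child = node.children.get(text[j])
            if child is None:             # no matching edge: add a leaf
                node.children[text[j]] = Node(j + 1, m)
                break
            k = child.start - 1           # 0-based position inside the label
            while k < child.end and j < m and text[k] == text[j]:
                k += 1
                j += 1
            if k == child.end:            # whole label matched: descend
                node = child
                continue
            # Mismatch inside the label: split the edge after position k.
            mid = Node(child.start, k)
            node.children[text[child.start - 1]] = mid
            child.start = k + 1
            mid.children[text[k]] = child
            mid.children[text[j]] = Node(j + 1, m)
            break
    return root


def edges(root):
    """Collect the (start, end) label of every edge in the tree."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in node.children.values():
            out.append((child.start, child.end))
            stack.append(child)
    return out


def implicit_memory(text):
    """Two numbers per edge, each counted as one unit."""
    return 2 * len(edges(build_suffix_tree(text)))


print(implicit_memory("ABCD$"))      # 10
print(implicit_memory("ABCABC$"))    # 20
print([implicit_memory("A" * (n - 1) + "$") for n in (5, 6, 7)])  # [16, 20, 24]
```

Note that the same label can be written with different index pairs: this sketch labels the `C` edge of ABCABC$ as 3,3 where the slide uses 6,6, and both select the same character.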
Program Input Data
DNA for all kinds of creatures:
Homo Sapiens, Monkeys, Chickens, …
Sample input: Homo Sapiens
cagctcctgagactgctggcatgaaggggagccgtgccctcctgctggtggccctcaccctgttctgcatctgccggatggccacaggggaggacaacgatgagtttttcatggacttcctgcaaacactactggtggggaccccagaggagctctatgaggggaccttgggcaagtacaatgtcaacgaagatgccaaggcagcaatgactgaactcaagtcctgcagagatggcctgcagccaatgcacaaggcggagctggtcaagctgctggtgcaagtgctgggcagtcaggacggtgcctaagtggacctcagacatggctcagccataggacctgccacacaagcagccgtggacacaacgcccactaccacctcccacatggaaatgtatcctcaaaccgtttaatcaataa
Sample result
Sample input 2: plants
EARPIVVGPPPPLSGGLPGTENSDQARDGTLPYTKDRFYLQPLPPTEAAQRAKVSASEILNVKQFIDRKAWPSLQNDLRLRASYLRYDLKTVISAKPKDEKKSLQELTSKLFSSIDNLDHAAKIKSPTEAEKYYGQTVSNINEVLAKLG
Sample output:
Homo Sapiens
Sample Input: Homo Sapiens
atgaaggggagccgtgccctcctgctggtggccctcaccctgttctgcatctgccggatggccacaggggaggacaacgatgagtttttcatggacttcctgcaaacactactggtggggaccccagaggagctctatgaggggaccttgggcaagtacaatgtcaacgaagatgccaaggcagcaatgactgaactcaagtcctgcagagatggcctgcagccaatgcacaaggcggagctggtcaagctgctggtgcaagtgctgggcagtcaggacggtgcctaa
Comparisons: Homo Sapiens
Monkey Virus
Sample Input: Monkey Virus
GGSCFKCGKKGHFAKNCHEHAHNNAEPKVPGLCPRCKRGKHWANECKSKTDNQGNPIPPH
Monkey Virus
Plants
Sample Input: Plants
EARPIVVGPPPPLSGGLPGTENSDQARDGTLPYTKDRFYLQPLPPTEAAQRAKVSASEILNVKQFIDRKAWPSLQNDLRLRASYLRYDLKTVISAKPKDEKKSLQELTSKLFSSIDNLDHAAKIKSPTEAEKYYGQTVSNINEVLAKLG
Plants
Tobacco
Sample input: tobacco
SYSITTPSQFVFLSSAWADPIELINLCTNALGNQFQTQQARTVVQRQFSEVWKPSPQVTVRFPDSDFKVYRYNAVLDPLVTALLGAFDTRNRIIEVENQANPTTAETLDATRRVDDATVAIRSAINNLIVELIRGTGSYNRSSFESSSGLVWTSGPAT
Tobacco
Insects
Sample Input: Insects
DCLSGRYKGPCAVWDNETCRRVCKEEGRSSGHCSPSLKCWCEGC
Insects
Birds
Sample Input: Birds
IDTCRLPSDRGRCKASFERWYFNGRTCAKFIYGGCGGNGNKFPTQEACMKRCAKA
Birds
SARS
Sample Input: SARS
ALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV
SARS
Fish
Sample Input: Fish
GHHHHHHLEDPSGGTPYIGSKISLISKAEIRYEGILYTIDTENSTVALAKVRSFGTEDRPTDRPIAPRDETFEYIIFRGSDIKDLTVCEPPKPIM
Fish
Chicken
Sample Input: Chicken
RVKRVWPLVIRTVIAGYNLYRAIKKK
Chicken
Files
Code
Results
Conclusion
Explicit suffix trees require more space than implicit suffix trees on real data.
Data comparison results: the worst case is DNA input (least variety of characters).
Implicit trees should be used to reduce storage.
[Chart: variety of string vs tree size. X-axis: # of alphabets (ticks 1 to 25). Y-axis: tree size (ticks 0 to 3000).]
Conclusion
Application: it is easier to compare structures for implicit than for explicit suffix trees (number comparisons).
Saves space
Easy to implement
Further improvement?
Possible Future Work
Program speed is too slow
The interface of our program should be improved. (Matlab)
More variety of input
References
Real Data
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=74273665
http://www.rcsb.org/pdb
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&db=nucleotide
Online info
http://en.wikipedia.org/wiki/Suffix_tree
http://marknelson.us/1996/08/01/suffix-trees/
http://homepage.usask.ca/~ctl271/857/suffix_tree.shtml
http://www.cs.uku.fi/~kilpelai/BSA05/lectures/print07.pdf
THANK YOU!