Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2
Post on 20-Dec-2015
![Page 1: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/1.jpg)
Prosite, UCSC Genome Browser, MSAs and Phylogeny
Exercise 2
![Page 2: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/2.jpg)
Turning information into knowledge
The outcome of a sequencing project is masses of raw data
The challenge is to turn this raw data into biological knowledge
A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be passed
![Page 3: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/3.jpg)
From sequence to function
Nature tends to innovate rather than invent
Proteins are composed of functional elements: domains and motifs
Domains are structural units that carry out a certain function
The same domains are shared between different proteins
Motifs are shorter sequences with certain biological activity
![Page 4: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/4.jpg)
http://www.ebi.ac.uk/interpro/
![Page 5: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/5.jpg)
InterPro
An integrated documentation resource for protein families, domains and sites
Groups signatures describing the same protein family or domain
Combines a number of databases that use different methodologies to derive protein signatures:
UniProt: UniProtKB Swiss-Prot, TrEMBL, UniRef, UniParc
prosite: documented DB on domains, families and functional sites
Pfam: a DB of protein families represented by MSAs
![Page 6: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/6.jpg)
InterPro search
![Page 7: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/7.jpg)
![Page 8: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/8.jpg)
![Page 9: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/9.jpg)
http://www.expasy.ch/prosite/
![Page 10: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/10.jpg)
prosite
A method for determining the function of uncharacterized translated protein sequences
Consists of a DB of annotated biologically important sites/patterns/motifs/signatures/fingerprints
![Page 11: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/11.jpg)
prosite
Entries are represented with patterns or profiles
pattern: [AC]-A-[GC]-T-[TC]-[GC]
profile:

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 0.66 | 1 | 0 | 0 | .. |
| T | 0 | 0 | 0 | 1 | .. |
| C | 0.33 | 0 | 0.66 | 0 | .. |
| G | 0 | 0 | 0.33 | 0 | .. |

Profiles are used in prosite when the motif is relatively divergent and is difficult to represent as a pattern
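As a rough illustration of where the numbers in such a profile come from, the sketch below computes per-column frequencies from a few aligned motif instances. The three instances and the names `build_profile` and `profile_score` are made up for this example; real prosite profiles are more elaborate (weighted scores, gaps), so this is only the basic idea.

```python
from collections import Counter


def build_profile(aligned, alphabet="ATCG"):
    """Return {symbol: [frequency in column 1, column 2, ...]}."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = {symbol: [] for symbol in alphabet}
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        for symbol in alphabet:
            profile[symbol].append(counts[symbol] / len(aligned))
    return profile


def profile_score(profile, window):
    """Multiply the column frequencies of the symbols in `window`."""
    score = 1.0
    for col, symbol in enumerate(window):
        score *= profile[symbol][col]
    return score


# Hypothetical aligned instances, chosen to agree with the pattern
# [AC]-A-[GC]-T-[TC] in their five columns.
instances = ["AAGTT", "AACTC", "CACTC"]
profile = build_profile(instances)
for symbol, freqs in profile.items():
    print(symbol, [round(f, 2) for f in freqs])
print(round(profile_score(profile, "AACTC"), 3))
```

A window containing a symbol never seen in a column scores 0 here; that strictness is one reason practical profiles add pseudocounts or use log-odds scores.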
![Page 12: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/12.jpg)
Scanning prosite
Query: sequence
Result: all patterns found in sequence
Query: pattern
Result: all sequences which adhere to this pattern
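Both query directions can be sketched with regular expressions. The code below translates a subset of the prosite pattern syntax (`-` separators, `[..]` allowed residues, `{..}` forbidden residues, `x` for any residue, `(n)` or `(n,m)` repeats, `<` and `>` anchors) and then scans in either direction. The function names and the tiny pattern and sequence collections are hypothetical stand-ins for the real database.

```python
import re

ELEMENT = re.compile(
    r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a prosite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


def scan_sequence(sequence, patterns):
    """Query: sequence. Result: (name, 1-based start, match) per hit."""
    hits = []
    for name, pattern in patterns.items():
        # A lookahead lets overlapping occurrences be reported too.
        regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
        for match in regex.finditer(sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


def sequences_matching(pattern, database):
    """Query: pattern. Result: names of sequences containing it."""
    regex = re.compile(prosite_to_regex(pattern))
    return [name for name, seq in database.items() if regex.search(seq)]


print(scan_sequence("TTAAGTTCGG", {"toy": "[AC]-A-[GC]-T-[TC]-[GC]"}))
print(sequences_matching("N-{P}-[ST]-{P}", {"s1": "MKNGSA", "s2": "MKNPSA"}))
```

Short patterns such as `N-{P}-[ST]-{P}` match a large share of protein sequences by chance, which is why hits for them need to be read with care.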
![Page 13: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/13.jpg)
Patterns with a high probability of occurrence
Entries describing commonly found post-translational modifications or compositionally biased regions.
Found in the majority of known protein sequences
High probability of occurrence
![Page 14: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/14.jpg)
prosite sequence query
![Page 15: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/15.jpg)
![Page 16: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/16.jpg)
prosite pattern query
![Page 17: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/17.jpg)
![Page 18: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/18.jpg)
![Page 19: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/19.jpg)
UCSC Genome Browser
![Page 20: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/20.jpg)
UCSC Genome Browser - Gateway
Reset all settings of previous user
![Page 21: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/21.jpg)
UCSC Genome Browser - Gateway
![Page 22: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/22.jpg)
UCSC Genome Browser - Gateway
![Page 23: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/23.jpg)
UCSC Genome Browser - query results
![Page 24: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/24.jpg)
UCSC Genome Browser - Annotation tracks
Vertebrate conservation
mRNA (GenBank)
RefSeq
UCSC Genes
Base position
Single species compared
SNPs
Repeats
Gene direction
Exon
Intron
UTR
![Page 25: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/25.jpg)
UCSC Gene
![Page 26: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/26.jpg)
UCSC Genome Browser - movement
Zoom x3 + Center
![Page 27: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/27.jpg)
UCSC Genome Browser – Base view
![Page 28: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/28.jpg)
Annotation track options
dense
squish
full
pack
![Page 29: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/29.jpg)
Annotation track options
Another option to toggle between ‘pack’ and ‘dense’ view is to click on the track title
Sickle-cell anemia distr.
Malaria distr.
![Page 30: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/30.jpg)
BLAT
BLAT = BLAST-Like Alignment Tool
BLAT is designed to find similarity of >95% on DNA, >80% for protein
Rapid search by indexing entire genome.
Good for:
1. Finding genomic coordinates of cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
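The "indexing entire genome" idea can be sketched as a k-mer lookup table: the genome is cut into non-overlapping k-mers once, and every overlapping k-mer of the query is then looked up in that table. This is a toy illustration with a made-up genome string and a very small k; the function names are hypothetical, and the real tool does considerably more (larger k, extending and chaining the seed hits into alignments).

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def seed_hits(query, index, k):
    """Look up every overlapping k-mer of the query in the genome index."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits


genome = "ACGTACGTTTGACCA"  # made-up sequence
index = build_index(genome, k=3)
hits = seed_hits("GTTTGA", index, k=3)
print(hits)  # (query position, genome position) pairs

# Hits lying on the same diagonal (genome position minus query position)
# hint at one contiguous matching region.
print({gpos - qpos for qpos, gpos in hits})
```

Because the index is built once per genome, each query costs only a handful of dictionary lookups, which is what makes the search fast.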
![Page 31: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/31.jpg)
BLAT on UCSC Genome Browser
![Page 32: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/32.jpg)
BLAT on UCSC Genome Browser
![Page 33: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/33.jpg)
BLAT Results
![Page 34: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/34.jpg)
BLAT Results
Match
Non-Match (mismatch/indel)
Indel boundaries
![Page 35: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/35.jpg)
BLAT Results
![Page 36: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/36.jpg)
BLAT Results on the browser
![Page 37: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/37.jpg)
Getting DNA sequence of region
![Page 38: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/38.jpg)
Getting DNA sequence of region
![Page 39: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/39.jpg)
Clustal X – A Multiple Alignment Tool
![Page 40: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/40.jpg)
Input: multiple sequence Fasta file
>gi|21536452|ref|NP_002762.2| mesotrypsin preproprotein [Homo sapiens]
MNPFLILAFVGAAVAVPFDDDDKIVGGYTCEENSLPYQVSLNSGSHFCGGSLISEQWVVSAAHCYKTRIQ
VRLGEHNIKVLEGNEQFINAAKIIRHPKYNRDTLDNDIMLIKLSSPAVINARVSTISLPTAPPAAGTECL
ISGWGNTLSFGADYPDELKCLDAPVLTQAECKASYPGKITNSMFCVGFLEGGKDSCQRDSGGPVVCNGQL
QGVVSWGHGCAWKNRPGVYTKVYNYVDWIKDTIAANS
>gi|114051746|ref|NP_001040585.1| protease, serine, 2 [Macaca mulatta]
MNPLLILAFVGVAVAAPFDDDDKIVGGYTCEENSVPYQVSLNSGYHFCGGSLINEQWVVSAAHCYKTRIQ
VRLGEHNIEVLEGTEQFINAAKIIRHPDYDRKTLNNDILLIKLSSPAVINARVSTISLPTAPPAAGAEAL
ISGWGNTLSSGADYPDELQCLEAPVLSQAECEASYPGKITSNMFCVGFLEGGKDSCQGDSGGPVVSNGQL
QGIVSWGYGCAQKNRPGVYTKVYNYVDWIRDTIAANS
>gi|6755891|ref|NP_035775.1| mesotrypsin [Mus musculus]
MNALLILALVGAAVAFPVDDDDKIVGGYTCQENSVPYQVSLNSGYHFCGGSLINDQWVVSAAHCYKTRIQ
VRLGEHNINVLEGNEQFVNAAKIIKHPNFNRKTLNNDIMLLKLSSPVTLNARVATVALPSSCAPAGTQCL
ISGWGNTLSFGVSEPDLLQCLDAPLLPQADCEASYPGKITGNMVCAGFLEGGKDSCQGDSGGPVVCNREL
QGIVSWGYGCALPDNPGVYTKVCNYVDWIQDTIAAN
>gi|6981422|ref|NP_036861.1| protease, serine, 2 [Rattus norvegicus]
MRALLFLALVGAAVAFPVDDDDKIVGGYTCQENSVPYQVSLNSGYHFCGGSLINDQWVVSAAHCYKSRIQ
VRLGEHNINVLEGNEQFVNAAKIIKHPNFDRKTLNNDIMLIKLSSPVKLNARVATVALPSSCAPAGTQCL
ISGWGNTLSSGVNEPDLLQCLDAPLLPQADCEASYPGKITDNMVCVGFLEGGKDSCQGDSGGPVVCNGEL
QGIVSWGYGCALPDNPGVYTKVCNYVDWIQDTIAAN
>gi|27819626|ref|NP_777115.1| pancreatic anionic trypsinogen [Bos taurus]
MHPLLILAFVGAAVAFPSDDDDKIVGGYTCAENSVPYQVSLNAGYHFCGGSLINDQWVVSAAHCYQYHIQ
VRLGEYNIDVLEGGEQFIDASKIIRHPKYSSWTLDNDILLIKLSTPAVINARVSTLALPSACASGSTECL
. . .
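For readers who want to check such a file before loading it into Clustal X, here is a minimal sketch of reading the multi-FASTA layout: a `>` header line followed by one or more sequence lines. The function name `parse_fasta` and the two shortened records are assumptions made for this example, not part of the exercise.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from multi-FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened, made-up records in the same layout as the slide.
example = """>seq1 toy protein [Homo sapiens]
MNPFLILAFV
GAAVAVPF
>seq2 toy protein [Mus musculus]
MNALLILALV
"""

for header, sequence in parse_fasta(example):
    print(header.split()[0], len(sequence))
```

The first word of each header is what alignment programs typically use as the sequence name, so it helps to keep it short and unique.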
![Page 41: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/41.jpg)
One of the options for getting a multiple sequence Fasta file
![Page 42: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/42.jpg)
One of the options for getting a multiple sequence Fasta file
![Page 43: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/43.jpg)
Input: multiple sequence Fasta file
![Page 44: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/44.jpg)
Input: multiple sequence Fasta file
![Page 45: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/45.jpg)
Step 1: Load the sequences
![Page 46: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/46.jpg)
Sequences and conservation view
![Page 47: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/47.jpg)
Step 2: Perform Alignment
![Page 48: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/48.jpg)
Sequences and conservation view
![Page 49: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/49.jpg)
Sequences and conservation view
![Page 50: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/50.jpg)
Step 3: Create tree
![Page 51: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/51.jpg)
Step 4: NJPlot
![Page 52: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/52.jpg)
Step 4: NJPlot
![Page 53: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/53.jpg)
The Newick tree format is used to represent trees as strings
[Figure: a tree with leaves A, C, B and D]
In Newick format: ((A,C),(B,D));
Each pair of parentheses () encloses a clade in the tree, and the commas separate the members of the corresponding clade.
“;” is always the last character
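To make the parentheses-and-commas rule concrete, the sketch below parses a Newick string into nested tuples and lists the clades it encodes. It assumes the simplest form of the format (leaf names only, no branch lengths, bootstrap labels or whitespace); files written by real programs usually carry more than that. The function names are hypothetical.

```python
def parse_newick(text):
    """Parse a names-only Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected text before ';'")
    return tree


def leaf_set(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """The leaf set of every internal node, i.e. every pair of parentheses."""
    if isinstance(node, str):
        return set()
    found = {leaf_set(node)}
    for child in node:
        found |= clades(child)
    return found


tree = parse_newick("((A,C),(B,D));")
print(tree)
print(sorted("".join(sorted(clade)) for clade in clades(tree)))
```

Each of the three pairs of parentheses in the example yields one clade: A+C, B+D and the whole tree.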
![Page 54: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/54.jpg)
How robust is our tree?
![Page 55: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/55.jpg)
How robust is our tree?
We need some statistical way to estimate the confidence in the tree topology
But we don’t know anything about the tree topology distribution or parameters
The only data source we have is our data (MSA)
So, we must rely on our own resources: “pull up by your own bootstraps”
![Page 56: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/56.jpg)
Bootstrap (and jackknife)
![Page 57: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/57.jpg)
Jackknife
1. We create n (typically 100-1000) new MSAs (pseudo-data sets) by randomly sampling half of the characters (random sampling without replacement)
We do not change the number of sequences, just the number of positions!
POS: 5 2 3 1 6
1 : TATTT
2 : CATTT
3 : CACTT
N : AACTT
POS: 1 8 7 4 5
1 : TTTAT
2 : TAACC
3 : TAACC
N : TGGGA
POS: 1 8 3 9 4
1 : TTGTA
2 : TAGAC
3 : TAAAC
N : TGAGG
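The sampling step can be sketched as follows: pick half of the column indices without replacement and keep those columns in every sequence. The ten-column alignment and the name `jackknife_sample` are made up for illustration, and the fixed seed is only there to make the run repeatable.

```python
import random


def jackknife_sample(msa, rng):
    """Keep a random half of the alignment columns, without replacement."""
    k = len(msa[0])
    cols = rng.sample(range(k), k // 2)  # each column at most once
    return cols, ["".join(seq[c] for c in cols) for seq in msa]


# Hypothetical alignment: 4 sequences, 10 positions.
msa = [
    "TATTGCATTA",
    "CATTGCAACA",
    "CACTGCAACA",
    "AACTGAGGGA",
]

rng = random.Random(0)  # fixed seed for repeatability
for _ in range(3):  # n pseudo-data sets; typically 100-1000 in practice
    cols, sample = jackknife_sample(msa, rng)
    print("POS:", [c + 1 for c in cols])
    print(sample)
```

Every pseudo-data set still has four sequences; only the number of positions drops from ten to five.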
![Page 58: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/58.jpg)
Jackknife
2. We reconstruct a tree from each data set, using the same method used for reconstructing the original tree
[Figure: each of the three pseudo-data sets yields a tree over Sp1, Sp2, Sp3 and Sp4]
![Page 59: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/59.jpg)
Back to Jackknife
3. For each node in our original tree, we count the number of times it appeared in the Jackknife analysis
[Figure: the three jackknife trees and the original tree over Sp1, Sp2, Sp3 and Sp4, with node support values 67% and 100%]
In 67% of the data sets, the node Sp1+Sp2 was found
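The counting step might look like the sketch below, with trees written as nested tuples of species names. The topologies are hypothetical, chosen only so that the result reproduces the 67% and 100% shown on the slide. Trees are compared as rooted clades here; for unrooted trees one would compare splits of the leaf set instead.

```python
def leaf_set(tree):
    """All leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree, is_root=True):
    """Leaf sets of all internal nodes except the root, which is trivial."""
    if isinstance(tree, str):
        return set()
    found = set() if is_root else {leaf_set(tree)}
    for child in tree:
        found |= clades(child, is_root=False)
    return found


def node_support(original, replicates):
    """Fraction of replicate trees containing each clade of the original."""
    replicate_clades = [clades(tree) for tree in replicates]
    return {
        clade: sum(clade in rc for rc in replicate_clades) / len(replicates)
        for clade in clades(original)
    }


# Hypothetical topologies (not taken from the slide images).
original = ((("Sp1", "Sp2"), "Sp3"), "Sp4")
replicates = [
    ((("Sp1", "Sp2"), "Sp3"), "Sp4"),
    ((("Sp1", "Sp2"), "Sp3"), "Sp4"),
    ((("Sp1", "Sp3"), "Sp2"), "Sp4"),
]

for clade, value in node_support(original, replicates).items():
    print("+".join(sorted(clade)), "%.0f%%" % (100 * value))
```

The same counting applies unchanged to bootstrap replicates; only the way the pseudo-data sets are sampled differs.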
![Page 60: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/60.jpg)
Bootstrap
The same as jackknife, but instead of sampling K/2 positions, we sample K positions with replacement
![Page 61: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/61.jpg)
Bootstrap
1. Resample K positions n times
POS: 1 2 3 4 5 K
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
N : ACCTA…T
POS: 1 1 2 4 4 K
1 : AATTT…T
2 : AATTT…G
3 : AACTT…T
N : AACTT…T
POS: 4 7 7 8 9 … K
1 : TTTAT…T
2 : TAACC…G
3 : TAACC…T
N : TGGGA…T
POS: 1 5 5 7 8 … K
1 : AGGTA…T
2 : AGGAC…G
3 : AAAAC…A
N : AAAGG…C
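A minimal sketch of the resampling step: draw K column indices with replacement, so some columns appear more than once and others not at all. The four sequences use only the first five columns shown on the slide (the elided columns are left out), and the function names are hypothetical.

```python
import random


def take_columns(msa, cols):
    """Build a new alignment from the given 0-based column indices."""
    return ["".join(seq[c] for c in cols) for seq in msa]


def bootstrap_sample(msa, rng):
    """Draw K columns with replacement, K being the alignment length."""
    k = len(msa[0])
    cols = [rng.randrange(k) for _ in range(k)]
    return cols, take_columns(msa, cols)


# First five columns of the slide's alignment.
msa = ["ATCTG", "ATCTG", "ACTTA", "ACCTA"]

# Positions 1 1 2 4 4 from the slide, written as 0-based indices.
print(take_columns(msa, [0, 0, 1, 3, 3]))
# ['AATTT', 'AATTT', 'AACTT', 'AACTT']

cols, sample = bootstrap_sample(msa, random.Random(0))
print([c + 1 for c in cols], sample)
```

Unlike the jackknife, every pseudo-data set keeps the full length K, which lets the same tree-building settings be reused without adjustment.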
![Page 62: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/62.jpg)
Bootstrap
2. Reconstruct a tree from each data set using the same method used for reconstructing the original tree
[Figure: each of the three resampled data sets yields a tree over Sp1, Sp2, Sp3 and Sp4]
![Page 63: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/63.jpg)
Bootstrap
3. For each node in our original tree, we count the number of times it appeared in the bootstrap analysis
[Figure: the three bootstrap trees and the original tree over Sp1, Sp2, Sp3 and Sp4, with node support values 67% and 100%]
![Page 64: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/64.jpg)
Step 3.5 - Bootstrap
![Page 65: Prosite UCSC Genome Browser MSAs and Phylogeny Exercise 2](https://reader035.vdocuments.mx/reader035/viewer/2022062313/56649d455503460f94a22239/html5/thumbnails/65.jpg)
Bootstrap values on NJPlot
Note: ClustalX saves trees as .ph files; trees with bootstrap values are saved as .phb files
You might have to reopen the tree…