• comments, notes, explanations, or other types of external remarks that can be attached to a document……
• For genomics, functional annotation means attaching biological information to sequences
Presenter
Presentation Notes
Find out the advantage of one over another. BLAST: it is used to compare a novel sequence with those contained in nucleotide and protein databases by aligning the novel sequence with previously characterised genes. The emphasis of this tool is to find regions of sequence similarity, which will yield functional and evolutionary clues about the structure and function of this novel sequence. Regions of similarity detected via this type of alignment tool can be either local, where the similarity is confined to one region of otherwise unrelated sequences, or global, where the similarity extends across the full length of the sequences.
Functional Annotation
[Diagram labels: Structural Annotation; Functional Annotation; Automated; Manual curation; Searches (Nucleotide/Protein Databases); Domain/Motifs; EC Number Assignments; GO; Metabolic Pathways]
Automated Searches
• Search programs can be downloaded and run locally on a Unix system
• Graphical user interfaces are also available, but they normally accept only a limited number of sequences
Homology or similarity based searches
• Local pairwise alignment tools look for any regions of similarity within the proteins that score well.
– BLAST
• fast
• Global pairwise alignment tools take two sequences and attempt to find an alignment of the two over their full lengths.
– Needleman-Wunsch
• finds the best out of all possible alignments
• Multiple alignment tools try to align 3 or more proteins so that the maximal number of amino acids from each protein are matched in the alignment. This may or may not include the full length of some or all of the proteins.
– ClustalW
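The difference between global and local pairwise alignment can be illustrated with a small dynamic-programming sketch. The global score follows Needleman-Wunsch; the local score uses Smith-Waterman, which is the exact algorithm that BLAST approximates with faster heuristics (this is not BLAST itself). The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration only.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for two sequences.

    Global: Needleman-Wunsch, both sequences aligned over their full length.
    Local: Smith-Waterman, best-scoring pair of subsequences.
    Scoring values are illustrative, not taken from any real tool.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # Global matrix: leading gaps are penalised on both edges.
    glob = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    # Local matrix: edges stay at zero and no cell drops below zero.
    loc = [[0] * cols for _ in range(rows)]
    best_local = 0

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    return glob[-1][-1], best_local


if __name__ == "__main__":
    # The shorter sequence matches only the end of the longer one.
    print(alignment_scores("TTTTGATTACA", "GATTACA"))
```

With these settings the local score rewards the shared region alone, while the global score is pulled down by the gaps needed to span the unmatched prefix.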
BLAST Programs
• blastn: Search a nucleotide database using a nucleotide query
• blastp: Search a protein database using a protein query
• blastx: Search a protein database using a translated nucleotide query
• tblastn: Search a translated nucleotide database using a protein query
• tblastx: Search a translated nucleotide database using a translated nucleotide query
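The choice of program from the list above depends only on the query type, the database type, and whether two nucleotide inputs should be compared at the protein level. A minimal sketch of that decision follows; the function name and its parameters are hypothetical helpers and not part of any BLAST distribution.

```python
def choose_blast_program(query, database, compare_as_protein=False):
    """Pick a BLAST program name from the query and database types.

    query, database: "nucleotide" or "protein".
    compare_as_protein: only relevant when both are nucleotide; if True,
    both sides are translated and compared as proteins (tblastx).
    """
    valid = {"nucleotide", "protein"}
    if query not in valid or database not in valid:
        raise ValueError("types must be 'nucleotide' or 'protein'")

    if query == "protein" and database == "protein":
        return "blastp"
    if query == "nucleotide" and database == "protein":
        return "blastx"   # query is translated before the search
    if query == "protein" and database == "nucleotide":
        return "tblastn"  # database is translated before the search
    return "tblastx" if compare_as_protein else "blastn"


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))
```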
Example of BLAST output
• Top row is the search protein (query) and the bottom row is the match protein (subject)
• Middle row is the consensus
• + indicates similar amino acids
• Numbers indicate amino acid position in the sequence
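How the middle row relates to the query and subject rows can be sketched in a few lines. Real BLAST marks a pair with + when its substitution-matrix score is positive; the sketch below replaces the matrix with a small, hand-picked set of residue groups, which is an assumption made purely to keep the example short. The function name and the group list are hypothetical.

```python
# Illustrative similarity groups only; BLAST itself uses a substitution
# matrix (a pair gets "+" when its score is positive).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def midline(query_row, subject_row):
    """Build the middle row of a pairwise alignment display.

    Identical residues are copied, similar residues become "+",
    and everything else (including gaps, written "-") becomes a space.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")

    out = []
    for q, s in zip(query_row, subject_row):
        if q == "-" or s == "-":
            out.append(" ")
        elif q == s:
            out.append(q)
        elif any(q in group and s in group for group in SIMILAR_GROUPS):
            out.append("+")
        else:
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    query, subject = "MKVLA", "MRIL-"
    print(query)
    print(midline(query, subject))
    print(subject)
```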
Domain Search
Hidden Markov Models
• Statistical models of the primary structure consensus of a sequence family
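The idea of a statistical model of a family consensus can be illustrated with a much simpler relative of a profile HMM: a per-column residue profile built from a gap-free multiple alignment. A real profile HMM also has insert and delete states with transition probabilities, which this sketch leaves out. The function names, the pseudocount, and the uniform background are assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment, pseudocount=1.0):
    """Estimate per-column residue probabilities from aligned sequences.

    alignment: equal-length, gap-free sequences from one family.
    A pseudocount keeps unseen residues from getting probability zero.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total
                        for aa in AMINO_ACIDS})
    return profile


def log_odds_score(profile, sequence):
    """Score a sequence against the profile versus a uniform background."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match the profile length")
    background = 1.0 / len(AMINO_ACIDS)
    return sum(math.log2(column[residue] / background)
               for column, residue in zip(profile, sequence))


if __name__ == "__main__":
    family = ["MKV", "MKI", "MRV"]
    profile = column_profile(family)
    print(log_odds_score(profile, "MKV"), log_odds_score(profile, "WWW"))
```

A sequence that resembles the family scores above zero, while an unrelated one scores below zero.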
Pfam
http://pfam.sanger.ac.uk/
• Large collection of protein families represented by multiple sequence alignments and HMMs
• Analyze protein sequences for Pfam matches
• Look at multiple alignments of members of the gene family
InterPro
http://www.ebi.ac.uk/interpro/
• Database of protein families, domains and sites identified in known proteins, which can be applied to new protein sequences
• Collects protein families from other databases such as Pfam, UniProtKB and TIGRFAMs
• Sequence search is done with InterProScan
– Downloadable (runs faster on own server, large sets)
– GUI (limited number of sequences)
Subcellular localization
• SignalP: Predicts the presence and location of signal peptides and their cleavage sites in protein sequences
• TMHMM: Predicts transmembrane helices
• TargetP: Predicts subcellular location based on chloroplast transit peptides and mitochondrial targeting sequences
SignalP Search
http://www.cbs.dtu.dk/services/SignalP/
Sample SignalP Output
CRN2…confirmed with proteomics
Search EC numbers
http://ca.expasy.org/enzyme/
Metabolic Pathways
• Help improve annotation by showing missing genes in essential pathways
• Useful for comparative genomics
KEGG: http://www.genome.jp/kegg/pathway.html
Reactome: http://www.reactome.org
MetaCyc: http://www.metacyc.org
Add lots of others
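Spotting missing genes in a pathway amounts to comparing the enzymes a pathway requires with the enzymes already annotated in the genome. The sketch below does this with EC numbers; the pathway step list is a hand-written toy example (the first steps of glycolysis), not data retrieved from KEGG, Reactome or MetaCyc, and the gene identifiers and function name are hypothetical.

```python
def missing_steps(pathway_ecs, gene_annotations):
    """Return pathway EC numbers that no annotated gene covers.

    pathway_ecs: EC numbers required by the pathway.
    gene_annotations: mapping of gene ID -> list of assigned EC numbers.
    """
    annotated = {ec for ecs in gene_annotations.values() for ec in ecs}
    return sorted(set(pathway_ecs) - annotated)


if __name__ == "__main__":
    # Toy pathway definition: first steps of glycolysis.
    upper_glycolysis = ["2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"]

    # Hypothetical annotation results for a draft genome.
    annotations = {
        "gene_001": ["2.7.1.1"],
        "gene_002": ["5.3.1.9"],
        "gene_003": ["4.1.2.13"],
        "gene_004": [],  # no EC number assigned yet
    }

    print(missing_steps(upper_glycolysis, annotations))
```

A gap reported this way is a candidate for re-examination: the gene may be absent, or it may be present but unannotated or mis-annotated.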
KEGG: Kyoto Encyclopedia of Genes and Genomes
http://www.genome.jp/kegg/pathway.html
First set of terms
These processes are general to all associations
Some initial PAMGO Biological Process terms (included in the initial 35 terms added Jan 2005)
• GO:0052048 interaction with host via secreted substance
• GO:0052044 induction by symbiont of host programmed cell death
• GO:0009405 pathogenesis
[Figure: the terms illustrated for an oomycete and a bacterium]
Why Manual Annotation?
• Combine all search information and evidence
• Manually look through all information
• Add experimental data from literature when available
• Approach conservatively
Setback
• Time-consuming and more expensive