Introduction to Bioinformatics
Alexandra M Schnoes, Univ. California San Francisco
What is Bioinformatics?
Intersection of biology and computers
A broad field
Often means different things to different people
Personal definition: the use of computation for biological investigation and discovery; the process by which you unlock the biological world through the use of computers.
What does one do in Bioinformatics?
(a small sample)
Our Lab: Understanding Protein (Enzyme) Function
What does one do in Bioinformatics?
(a small sample)
Discover new drug targets: computational docking
Atreya, C. E. et al. J. Biol. Chem. 2003;278:14092-14100
Shoichet, B. K. Nature. 2004;432:862-865
What does one do in Bioinformatics?
(a small sample)
Systems biology
sbw.kgi.edu/
www.sbi.uni-rostock.de/research.html
This lab: Nucleotide & Protein Informatics
Sequence analysis:
Finding similar sequences
Multiple sequence alignment
Phylogenetic analysis
Sequence → Structure → Function
Process of Evolution
Sequences change due to mutation, insertion, and deletion
Use Evolutionary Principles to Analyze Sequences
If sequence A and sequence B are similar, then A and B are likely evolutionarily related
If sequences A, B, and C are all similar, but A and B are more similar to each other than either is to C, then A and B are more closely evolutionarily related to each other than to C
Extremely Powerful Idea
1. Start with unknown sequence
2. Find what the unknown is similar to
3. Use information about the known to make predictions about the unknown
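The three steps above amount to a nearest-neighbor lookup: find the known sequence most similar to the unknown one and borrow its annotation. The sketch below is a minimal illustration under stated assumptions: the sequences are already aligned to the same length, similarity is plain percent identity, and the function names and toy database entries are hypothetical, invented only for this example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical residues (inputs must be pre-aligned)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def predict_from_nearest(unknown: str, known: dict[str, tuple[str, str]]) -> tuple[str, str, float]:
    """Return (name, annotation, identity) of the known sequence most similar to `unknown`."""
    best_name = max(known, key=lambda name: percent_identity(unknown, known[name][0]))
    seq, annotation = known[best_name]
    return best_name, annotation, percent_identity(unknown, seq)


# Hypothetical toy "database": names, sequences, and annotations are made up.
known = {
    "protein_1": ("TASSWSYIVE", "hypothetical function X"),
    "protein_2": ("GGGGGGGGGG", "hypothetical function Y"),
}
print(predict_from_nearest("TATSFSYLVG", known))
```

Here the unknown shares 6 of 10 positions with `protein_1`, so its annotation is the one transferred. In practice the search step would use a tool such as BLAST and substitution-matrix scores rather than identity alone.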
How do you know when sequences are similar?
Align two sequences together and score their similarity
TASSWSYIVE
TATSFSYLVG
Use substitution matrices to score the alignment
Substitution Matrices Give a Score for Each Mutation
Many different matrices are available; BLOSUM matrices are standard in the field
BLOSUM62 scoring matrix
http://www.carverlab.org/images/
Scoring: add up the positional scores
Score of 30
TASSWSYIVE
TATSFSYLVG
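Adding up the positional scores for the ungapped alignment above can be sketched as follows. To keep the example self-contained, it hard-codes only the nine BLOSUM62 entries this alignment needs (a hand-copied subset, not the full matrix); for real work, a full matrix can be loaded, for example via Biopython's `Bio.Align.substitution_matrices` module. The names `BLOSUM62_SUBSET`, `pair_score`, and `ungapped_score` are illustrative.

```python
# Hand-copied subset of BLOSUM62: only the residue pairs used in this example.
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "T"): 1, ("S", "S"): 4, ("W", "F"): 1,
    ("Y", "Y"): 7, ("I", "L"): 2, ("V", "V"): 4, ("E", "G"): -2,
}


def pair_score(x: str, y: str) -> int:
    """BLOSUM62 is symmetric, so look the pair up in either order."""
    if (x, y) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(x, y)]
    return BLOSUM62_SUBSET[(y, x)]


def ungapped_score(a: str, b: str) -> int:
    """Sum the substitution score at each aligned position."""
    if len(a) != len(b):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(pair_score(x, y) for x, y in zip(a, b))


print(ungapped_score("TASSWSYIVE", "TATSFSYLVG"))  # 30
```

The column scores are 5 + 4 + 1 + 4 + 1 + 4 + 7 + 2 + 4 - 2 = 30, matching the slide's total.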
TASSWSYIVE
TATSFSYLVG
Score of 1
Additional issues…
Gaps (insertions/deletions) have scoring penalties for opening and for extending a gap
TASSWSYIVE TASSWSYIVE
TATSFLVG TATSF--LVG
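The open/extend scheme for gaps (often called affine gap penalties) can be sketched by scoring the gapped alignment above. The penalty values here are hypothetical choices for illustration (10 to open a gap, 1 for each further gap position), and only the BLOSUM62 entries this alignment needs are hard-coded; real tools let you set their own gap costs.

```python
# Hand-copied subset of BLOSUM62: only the residue pairs used in this example.
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "T"): 1, ("S", "S"): 4,
    ("W", "F"): 1, ("I", "L"): 2, ("V", "V"): 4, ("E", "G"): -2,
}
GAP_OPEN = 10   # hypothetical penalty for the first position of a gap
GAP_EXTEND = 1  # hypothetical penalty for each additional gap position


def pair_score(x: str, y: str) -> int:
    """Look up a symmetric substitution score in either order."""
    return BLOSUM62_SUBSET.get((x, y), BLOSUM62_SUBSET.get((y, x)))


def affine_score(a: str, b: str) -> int:
    """Score a pairwise alignment (with '-' for gaps) using open/extend gap penalties."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in = None  # which sequence the current gap run is in: "a", "b", or None
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("column with a gap in both sequences")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Continuing a gap in the same sequence costs less than opening a new one.
            score -= GAP_EXTEND if side == gap_in else GAP_OPEN
            gap_in = side
        else:
            score += pair_score(x, y)
            gap_in = None
    return score


print(affine_score("TASSWSYIVE", "TATSF--LVG"))  # 8
```

The eight aligned residue pairs contribute 19, and the two-position gap costs 10 + 1 = 11, giving 8. Charging more to open a gap than to extend it favors one longer indel over several scattered short ones.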
How do we find similar sequences?
Start at the National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/
How do we find similar sequences?
Nucleotide Sequence Databases
How do we find similar sequences?
Protein Sequence Databases
How do we find similar sequences?
Similarity search: BLAST (Basic Local Alignment Search Tool)
BLAST is very quick, but:
It finds only local alignments
The alignments aren't great
It produces only pairwise alignments
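To make "local alignment" concrete: BLAST is a fast heuristic for finding high-scoring local alignments, while the Smith-Waterman dynamic-programming algorithm computes the best local alignment score exactly (and more slowly). The sketch below is Smith-Waterman, not BLAST itself, and assumes a simple hypothetical scoring scheme (match +2, mismatch -1, linear gap -2) instead of a substitution matrix with affine gaps.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, using a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # gap in b
            left = H[i][j - 1] + gap   # gap in a
            # Clamping at 0 lets an alignment start anywhere: this is what makes it local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TTTACGTTT", "GGACGTGG"))  # 8: the shared "ACGT"
```

Only the shared region is scored; the unrelated flanks are ignored rather than penalized, which is why local alignment suits searching for a similar domain inside otherwise different sequences.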
Want better alignments? Use multiple alignment
Multiple sequences give a better signal-to-noise ratio
More sequences = better alignment and a more accurate reflection of evolution
ClustalW: commonly used and easy to use
Visualize the Multiple Alignment
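One common way to visualize a multiple alignment is Clustal-style text output, where a line under the alignment marks fully conserved columns with `*`. Clustal also uses `:` and `.` for columns of similar residues, which this minimal sketch omits; it flags only identical, gap-free columns. The function name is illustrative, and the example reuses the two sequences from the scoring slide.

```python
def conservation_line(alignment: list[str]) -> str:
    """Mark fully conserved, gap-free columns with '*' and all others with a space."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    marks = []
    for column in zip(*alignment):
        residues = set(column)
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


rows = ["TASSWSYIVE", "TATSFSYLVG"]
for row in rows:
    print(row)
print(conservation_line(rows))
```

With more sequences in the alignment, columns that stay conserved across all of them stand out more clearly, which is the signal-to-noise benefit of multiple alignment.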
Use the Alignment to Calculate Evolutionary Distances
See ‘how close’ sequences are to each other
Best way to tell what is ‘most similar’
ClustalW can also calculate a simple tree
Taubenberger et al., Nature: 437, 889-893, 2005
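Going from an alignment to distances and then to a tree can be sketched in two steps: compute an uncorrected p-distance (fraction of differing sites) for each pair of aligned sequences, then cluster with UPGMA. This is a simplified illustration: ClustalW's own trees are built with neighbor-joining, and UPGMA is swapped in here only because it is shorter to write. Sequence `B` below is a hypothetical variant made up for the example, and the tree is returned as Newick topology without branch lengths.

```python
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Fraction of compared sites that differ; columns with a gap in either sequence are skipped."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma_newick(seqs: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA; return the topology in Newick format."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of sequences in it
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Merge the closest pair of clusters.
        a, b = sorted(min(dist, key=dist.get))
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # New distance = size-weighted average of the merged clusters' distances.
        for c in sizes:
            dist[frozenset((merged, c))] = (
                dist[frozenset((a, c))] * size_a + dist[frozenset((b, c))] * size_b
            ) / (size_a + size_b)
        dist = {key: d for key, d in dist.items() if a not in key and b not in key}
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


seqs = {"A": "TASSWSYIVE", "B": "TASSWSYIVG", "C": "TATSFSYLVG"}  # B is hypothetical
print(upgma_newick(seqs))  # ((A,B),C);
```

A and B differ at 1 of 10 sites while each differs from C at 3 or 4, so A and B are joined first. The p-distance does not correct for multiple substitutions at the same site, one reason real tree-building is harder than this sketch suggests.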
Caveats!
In reality:
Sequences (even parts of sequences) can evolve at different rates
We don't have a good understanding of the relationship between sequence and function
High sequence identity does not always mean the same function
Getting good alignments and good trees can be very hard
Bioinformatics: Sequence Analysis
1. Start with unknown sequence
2. Find similar sequences
3. Create alignment
4. Create phylogenetic tree
5. Use information about knowns to make predictions about unknown
Mini virus intro:
Often considered 'not alive'
Extremely small (much smaller than a cell)
Cellular parasites
Have a genome but can only reproduce inside a host cell
Different Viruses
RNA & DNA viruses
Both single- and double-stranded
Influenza Virus
Influenza virus (flu)
Small genome: 8 RNA molecules
Evolves quickly: genetic drift, antigenic shift
Influenza Virus (flu) Sequencing
RNA → Reverse Transcriptase → DNA → Sequencing → Genomic Nucleotide Sequence
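The reverse transcription step can be sketched at the sequence level: the DNA made from an RNA template is its complement (with T in place of U), and because the new strand is antiparallel, it reads in reverse when written 5' to 3'. This is a simplified illustration only; it ignores primers, the polymerase chemistry, and the second DNA strand, and the function name and input sequence are hypothetical.

```python
# Base pairing from each RNA base to the DNA base that pairs with it.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the cDNA (written 5'->3') complementary to an RNA written 5'->3'."""
    complement = [RNA_TO_CDNA[base] for base in rna.upper()]
    # The new strand is antiparallel, so reverse it to read 5'->3'.
    return "".join(reversed(complement))


print(first_strand_cdna("AUGGC"))  # GCCAT
```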
Influenza Pandemics
1918 Flu
Killed an estimated 50-100 million people worldwide
Considered one of the deadliest pandemics
Killed many young and healthy people
Influenza A, subtype H1N1
Thought to have derived from avian influenza
Recently reconstituted from recovered human samples
Considerable ethical debate
Avian Influenza
Current fear of a pandemic
High mortality rate (including the young and healthy)
Current concern is Influenza A, subtype H5N1
Still only transmitted by contact with birds
Now present in Asia and in Eastern and Western Europe