Introduction to Bioinformatics
Dr. rer. nat. Gong Jing
Cancer Research Center
Medicine School of Shandong University
2012.11.09
Chapter 4 Phylogenetic Tree
Phylogeny
Evidence from morphological, biochemical, and gene sequence data suggests that all organisms on Earth are genetically related, and that the genealogical relationship of living things can be represented by a vast evolutionary tree, the Tree of Life. The Tree of Life thus represents the phylogeny of organisms.
A phylogeny is a tree representation of the evolutionary history relating the species we are interested in.
How to Study the Evolutionary History
Fossils are the most direct evidence, but the fossil record is scattered, incomplete, and unsystematic.
Comparative morphology and comparative anatomy can establish the general framework of evolution, but many details remain controversial.
Computational molecular evolution: phylogenetic trees. Evolution takes place at the level of molecules: DNA, RNA, and protein.
Basic assumptions:
1) Nucleic acid sequences and protein sequences contain all the information about the evolutionary history of species;
2) Molecular clock: the rate of evolutionary change (for example, the number of amino acid differences) in a given protein is approximately constant over time and across lineages.
=> The more similar two homologous proteins are, the more recently they diverged from their common ancestor.
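The molecular clock assumption can be sketched in a few lines of Python. This is only an illustration: the sequences and the substitution rate below are made up, and the function names `p_distance` and `divergence_time` are hypothetical. The distance used is the plain fraction of differing sites, without any correction for multiple changes at the same site.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def divergence_time(distance: float, rate_per_site_per_year: float) -> float:
    """Time since the common ancestor under a strict molecular clock.

    Both lineages accumulate changes, so distance = 2 * rate * time.
    """
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical aligned protein fragments (made up for illustration).
human = "MKTAYIAKQR"
mouse = "MKTAYLAKQR"
fish = "MRTSYLAKHR"

d_human_mouse = p_distance(human, mouse)  # 1 difference in 10 sites
d_human_fish = p_distance(human, fish)    # 4 differences in 10 sites

# The rate is an assumed value, chosen only to make the arithmetic visible.
assumed_rate = 1e-9
print(divergence_time(d_human_mouse, assumed_rate))
print(divergence_time(d_human_fish, assumed_rate))
```

With a constant rate, the pair with the smaller distance is the pair that diverged more recently, which is the conclusion stated above.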
Homologous genes are genes that derive from a common ancestor.
They come in 3 types of relationship:
Orthologs: homologs separated by speciation, the process in which a common ancestor gives rise to two subgroups that slowly drift away from their common genetic makeup to become distinct species. Orthologs usually have similar functions and structures.
Paralogs: homologs separated by a duplication event, meaning that a gene was duplicated within a genome. One of the duplicates may have kept the original function while the other may have acquired a new function.
Xenologs: “xeno” comes from a Greek word that means “foreigner.” Xenologs result from a lateral transfer between two organisms, that is, a direct DNA transfer between two species. One of the species therefore contains a gene that does not share the history of the genome in which it is inserted. Lateral transfer is frequent among bacteria; it has also been proposed between pathogenic bacteria and humans.
Phylogenetic Tree
What is a phylogenetic tree used for?
For a certain protein/gene, determining the closest relatives of the organism that you’re interested in.
Discovering the function of a new protein/gene.
Retracing the origin of a gene.
Concepts:
leaf / outer node
branch / lineage
inner node
root
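These concepts map naturally onto a small data structure. The sketch below is one possible representation, not a standard one: the class `Node` and the helpers `leaves` and `count_inner_nodes` are hypothetical names, and the example tree is made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a rooted tree; branch_length is the branch leading to it."""
    name: Optional[str] = None
    branch_length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf (outer node) has no children.
        return not self.children


def leaves(node: Node) -> List[Optional[str]]:
    """Names of the leaves below `node`, left to right."""
    if node.is_leaf():
        return [node.name]
    names: List[Optional[str]] = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def count_inner_nodes(node: Node) -> int:
    """Number of inner nodes at or below `node` (the root counts as one)."""
    if node.is_leaf():
        return 0
    return 1 + sum(count_inner_nodes(child) for child in node.children)


# Hypothetical example tree: ((human, mouse), fish)
root = Node(children=[
    Node(children=[Node("human", 0.1), Node("mouse", 0.1)], branch_length=0.3),
    Node("fish", 0.4),
])

print(leaves(root))
print(count_inner_nodes(root))
```

Here the root is simply the one node that has no parent, each inner node stands for a hypothetical ancestor, and each branch is stored as the `branch_length` of the node it leads to.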
Depending on what their branches represent, phylogenetic trees have different names. All these trees represent the same evolutionary relationships:
Cladogram: branch lengths do not mean anything.
Change-based phylogram: branch lengths indicate numbers of evolutionary changes.
Time-based phylogram: inner nodes indicate branching time points.
There are many different ways to represent the information found in a phylogenetic tree.
Branches can be rotated at a node without changing the relationships among the outer nodes.
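One way to check that two drawings show the same relationships is to compare them after ignoring the order of the branches at every node. The sketch below assumes a tree written as nested tuples of leaf names; the function name `canonical` and the example trees are hypothetical.

```python
def canonical(tree):
    """Return a form of `tree` that ignores the order of children at each node.

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return tree
    # Sorting by repr gives a stable order for both leaves and subtrees.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


tree_a = (("human", "mouse"), ("chicken", "fish"))
tree_b = (("fish", "chicken"), ("mouse", "human"))  # tree_a with branches rotated
tree_c = (("human", "chicken"), ("mouse", "fish"))  # a different grouping

same_ab = canonical(tree_a) == canonical(tree_b)
same_ac = canonical(tree_a) == canonical(tree_c)
print(same_ab, same_ac)
```

Rotations leave the canonical form unchanged, whereas regrouping the leaves does not.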
Choosing Right Sequences for the Right Tree
choose right sequences -> do multiple sequence alignment -> build a phylogenetic tree
Should you build the alignment on the protein or on the DNA sequences?
If the DNA sequences are > 70% identical: build a DNA multiple sequence alignment.
If the DNA sequences are < 70% identical and code for proteins: translate them into proteins and build the protein multiple sequence alignment.
If your sequences are too similar at the protein level, you can thread the DNA sequences back onto the protein alignment using pal2nal: http://www.bork.embl.de/pal2nal/.
In practice, unless your sequences are almost identical, it is easier to keep working at the protein level.
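The 70% rule of thumb can be checked with a short script. This is a sketch under stated assumptions: the sequences are already aligned, identity is counted over gap-free columns, and the lowest pairwise identity decides for the whole set. The function names `percent_identity` and `alignment_level` are hypothetical, and the sequences are made up.

```python
from itertools import combinations
from typing import List


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions over the gap-free aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def alignment_level(aligned_dna: List[str], threshold: float = 70.0) -> str:
    """Suggest 'DNA' or 'protein' from the lowest pairwise DNA identity.

    Using the lowest pairwise value is an assumption of this sketch.
    """
    lowest = min(percent_identity(a, b) for a, b in combinations(aligned_dna, 2))
    return "DNA" if lowest > threshold else "protein"


# Hypothetical aligned DNA fragments (made up for illustration).
seq_a = "ATGGCCAAGT"
seq_b = "ATGGCTAAGT"
seq_c = "ATGACTAAAC"

print(alignment_level([seq_a, seq_b]))         # the pair is 90% identical
print(alignment_level([seq_a, seq_b, seq_c]))  # seq_a and seq_c are 60% identical
```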
Paralogs of a large human gene family: the tree tells the story of this gene family.
Orthologs from different species: the tree looks much like a species tree.
Algorithms of Tree Reconstruction
Maximum Parsimony (MP):
Closely related sequences; accurate; practical for 12 sequences or fewer.
Distance (Neighbor Joining, NJ):
Distantly or closely related sequences; not very accurate.
Maximum Likelihood (ML):
Distantly related sequences; very accurate.
Speed (fastest to slowest):
Distance > Maximum Parsimony > Maximum Likelihood
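To give a feel for what maximum parsimony optimizes, the sketch below scores the three possible groupings of four sequences with Fitch's counting procedure and keeps the grouping that needs the fewest changes. It is an illustration only, not how a production program searches tree space; the alignment is made up and the function names `fitch` and `parsimony_score` are hypothetical.

```python
def fitch(tree, column):
    """Return (state_set, changes) for one alignment column on a binary tree.

    `tree` is a leaf name (str) or a pair (left, right);
    `column` maps each leaf name to its character in this column.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_changes = fitch(tree[0], column)
    right_set, right_changes = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed.
        return common, left_changes + right_changes
    # The children disagree: one change is counted at this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Minimum number of changes needed to explain `alignment` on `tree`."""
    names = list(alignment)
    length = len(alignment[names[0]])
    total = 0
    for i in range(length):
        column = {name: alignment[name][i] for name in names}
        total += fitch(tree, column)[1]
    return total


# Hypothetical toy alignment (made up for illustration).
alignment = {
    "A": "AAGCT",
    "B": "AAGCA",
    "C": "GTGCA",
    "D": "GTGGT",
}
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(tree, alignment) for name, tree in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)
```

The number of candidate trees grows very quickly with the number of sequences, which is consistent with the small sequence limit quoted for this method.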
Preparing Your Multiple Sequence Alignment
Before using your MSA for building a tree, you must make sure that it is as accurate as possible.
Computing your multiple sequence alignment:
ClustalW: http://www.ebi.ac.uk/Tools/msa/clustalw2/
MUSCLE: http://www.ebi.ac.uk/Tools/msa/muscle/
T-Coffee: http://tcoffee.crg.cat/
Removing bad columns that affect the tree quality:
1. Make sure there are as many gap-free columns as possible.
2. Remove the extremities of your multiple alignment.
3. Remove the gap-rich regions of your alignment.
4. Be sure to keep the most informative blocks.
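Removing gap-rich columns can also be done programmatically. The following is a minimal sketch, assuming the alignment is held as a dictionary of equal-length strings with `-` as the gap character; the function name `remove_gappy_columns` and its `max_gap_fraction` parameter are hypothetical, and the alignment is made up.

```python
from typing import Dict


def remove_gappy_columns(alignment: Dict[str, str],
                         max_gap_fraction: float = 0.0) -> Dict[str, str]:
    """Keep only columns whose fraction of gaps is at most `max_gap_fraction`."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        gaps = sum(alignment[name][i] == "-" for name in names)
        if gaps / len(names) <= max_gap_fraction:
            keep.append(i)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical toy alignment (made up for illustration).
msa = {
    "seq1": "--MKTAY-IAKQ",
    "seq2": "-AMKTAYLIAKQ",
    "seq3": "GAMRTSY-IAK-",
}
trimmed = remove_gappy_columns(msa)        # gap-free columns only
relaxed = remove_gappy_columns(msa, 0.5)   # tolerate up to half gaps per column
print(trimmed)
print(relaxed)
```

The default keeps only gap-free columns; a looser threshold keeps more columns at the cost of more gaps, so the choice depends on how many informative blocks remain.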
Example alignments (figures) illustrate each of the four steps: in steps 1 to 3 the marked columns are the columns to remove, including the “bad” terminals of the alignment in step 2; in step 4 the marked columns are the columns to keep.
How to Delete Columns with Word
While pressing the Alt key on your keyboard, use the mouse to select entire columns in your alignment.
When you’ve selected everything you want to remove, press the Delete key to remove the selected block.
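The same manual deletion can be scripted once you know which columns to drop. This is a sketch, assuming the alignment is a dictionary of equal-length strings and that columns are given as 1-based, inclusive ranges; the function name `delete_columns` is hypothetical and the alignment is made up.

```python
def delete_columns(alignment, ranges):
    """Delete 1-based, inclusive column ranges from every aligned sequence."""
    drop = set()
    for start, end in ranges:
        drop.update(range(start - 1, end))  # convert to 0-based indices
    return {
        name: "".join(ch for i, ch in enumerate(seq) if i not in drop)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment (made up for illustration).
msa = {
    "seq1": "--MKTAY-IAKQ",
    "seq2": "-AMKTAYLIAKQ",
    "seq3": "GAMRTSY-IAK-",
}
# Drop the two leading columns, column 8, and the last column.
edited = delete_columns(msa, [(1, 2), (8, 8), (12, 12)])
print(edited)
```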
Computing Your Tree
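A guide tree is NOT a phylogenetic tree! It is only a rough clustering that the alignment program builds from pairwise comparisons to decide the order in which the sequences are aligned.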
EMBL ClustalW http://www.ebi.ac.uk/Tools/phylogeny/clustalw2_phylogeny
English Courses for Graduate Students
clustalw.aln
sequences.fasta
This tree is much more accurate than a guide tree!
A phylogram is a phylogenetic tree whose branch lengths are proportional to the amount of character change. In a cladogram, the branch lengths do not represent any change.
Phylogram Tree
Cladogram Tree
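In a tree file, the difference between the two displays comes down to whether branch lengths are used. The sketch below assumes the tree is written in Newick notation, where each branch length follows a colon; the function name `to_cladogram` and the example tree are hypothetical.

```python
import re


def to_cladogram(newick: str) -> str:
    """Remove branch lengths (':<number>') from a Newick string."""
    return re.sub(r":[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", "", newick)


# Hypothetical phylogram in Newick notation (made up for illustration).
phylogram = "((human:0.10,mouse:0.12):0.30,fish:0.45);"
cladogram = to_cladogram(phylogram)
print(cladogram)  # ((human,mouse),fish);
```

The grouping of the sequences is identical in both strings; only the information about the amount of change is dropped.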
Choosing different display options gives different representations of the tree.
Displaying Your Tree
The easiest way to save your tree is to make a screen capture with the print-screen (PrntScr) key on your keyboard. You can then paste this image into your favorite application (PowerPoint, Paint, etc.).
Paste (Ctrl + V) into Windows Paint.
Save the tree file as MyTree.ph.
Phylodendron: http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
Submit MyTree.ph to Phylodendron, then right-click the resulting tree image to save it as MyTree.png.
Files in this workflow: sequences.fasta (sequences), clustalw.aln (alignment), MyTree.ph (tree file), MyTree.png (tree image).
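A tree file can also be inspected as plain text, without a drawing tool. The sketch below assumes the tree file holds a simple Newick string (unquoted names, no comments) and prints it as an indented outline; the function names `parse_newick` and `render` are hypothetical, and the example string is made up rather than read from MyTree.ph.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())  # drop spaces and line breaks
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # skip ":"
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return parse_node()


def render(node, depth=0):
    """Return the tree as indented lines; unnamed inner nodes are shown as '*'."""
    name, length, children = node
    label = name or "*"
    if length is not None:
        label += f" [{length}]"
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical tree (made up for illustration).
tree = parse_newick("((human:0.1,mouse:0.12):0.3,fish:0.45);")
outline = render(tree)
print("\n".join(outline))
```

Each level of indentation corresponds to one step away from the root, and the number in brackets is the length of the branch leading to that node.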