GENE TREES
Abhita Chugh
Phylogenetic tree
Evolutionary tree showing the relationship among various entities that are believed to have a common ancestor
Species tree
• A phylogenetic tree showing the relationship among various species that are believed to have a common ancestor
Species tree
Shows the evolutionary history of a set of species
[Figure: species tree with its speciation nodes labeled]
Gene tree
• A phylogenetic tree that depicts how a single gene has evolved in a group of related species
• For this talk, "evolve" means gene duplication or loss
• Can be constructed over the topology of a species tree
Gene tree
Shows the evolutionary history of a single gene
[Figure: gene tree with speciation nodes and duplication nodes labeled]
Some definitions: Homologs
• Homolog: A gene related to a second gene by descent from a common ancestral DNA sequence
• Two types:
(i) Orthologs
(ii) Paralogs
Orthologs
• Genes in different species that evolved from a common ancestral gene by speciation
• Usually retain the same function
[Figure: human and chimp genes descending from a primate ancestor by speciation]
Paralogs
• Genes related by duplication within a genome
• Often evolve new functions
[Figure: paralogs in primates (chimp, human) and rodents (rat, mouse)]
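The ortholog/paralog distinction can be checked mechanically on a gene tree whose internal nodes are labeled as speciation or duplication events: two genes are orthologs when their most recent common ancestor (LCA) is a speciation node and paralogs when it is a duplication node. A minimal sketch; the `GeneNode` class, the helper names and the small human/chimp tree are hypothetical illustrations, not taken from the talk:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based hashing so nodes can go in a set
class GeneNode:
    """Gene-tree node; internal nodes are labeled "speciation" or "duplication"."""
    name: str
    event: Optional[str] = None
    children: list[GeneNode] = field(default_factory=list)
    parent: Optional[GeneNode] = field(default=None, repr=False)

    def add(self, *kids: GeneNode) -> GeneNode:
        """Attach children and return self, so trees can be built as nested calls."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def lca(a: GeneNode, b: GeneNode) -> GeneNode:
    """First ancestor of b (including b itself) that is also an ancestor of a."""
    above_a = set()
    node = a
    while node is not None:
        above_a.add(node)
        node = node.parent
    node = b
    while node is not None:
        if node in above_a:
            return node
        node = node.parent
    raise ValueError("genes are not in the same tree")


def relationship(a: GeneNode, b: GeneNode) -> str:
    event = lca(a, b).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError("LCA is not a labeled internal node")


# Hypothetical tree: one duplication, then each copy speciates into human and chimp
h1, c1, h2, c2 = (GeneNode(n) for n in ("human_1", "chimp_1", "human_2", "chimp_2"))
root = GeneNode("dup", "duplication").add(
    GeneNode("spec_1", "speciation").add(h1, c1),
    GeneNode("spec_2", "speciation").add(h2, c2),
)
print(relationship(h1, c1))  # orthologs
print(relationship(h1, h2))  # paralogs
print(relationship(h1, c2))  # paralogs
```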
Why are Gene Trees interesting?
• Determine the evolutionary history of a gene family
• Infer gene duplications and losses
• Estimate bounds on when these events occurred
• Determine whether a given pair of homologs is orthologous or paralogous
A gene tree can be constructed over the topology of a species tree
[Figure: humorous slide captioned "PRIMATES INTELLIGENCE"]
No, seriously...
Gene Tree Reconstruction
• Problem: Given a set of sequences from a gene family, find the tree that best explains the data
• Two models:
  – Micro-evolutionary: considers sequence evolution only
  – Macro-evolutionary: considers duplications and losses only; useful but rarely used
Macro-evolutionary Problem
Reconstruction algorithm
• Only macroevolutionary events are considered
• i – number of gene copies a node inherits from its parent
• j – number of gene copies a node sends to its children
• Both i and j range from 1 to m, where m is the maximum multiplicity (copy number) of the gene in any species
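With i and j defined this way, the events charged at a single node follow from the difference between them: sending more copies than were inherited takes j - i duplications, sending fewer takes i - j losses. A minimal sketch, assuming every duplication or loss involves exactly one gene copy; `node_events` is a hypothetical name:

```python
from __future__ import annotations


def node_events(i: int, j: int) -> tuple[int, int]:
    """(duplications, losses) at a node that inherits i copies and sends j copies."""
    if i < 1 or j < 1:
        raise ValueError("i and j range from 1 to m")
    return max(0, j - i), max(0, i - j)


m = 2  # hypothetical maximum multiplicity for this demo
for i in range(1, m + 1):
    for j in range(1, m + 1):
        dups, losses = node_events(i, j)
        print(f"i={i}, j={j}: dups={dups}, losses={losses}")
# i=1, j=1: dups=0, losses=0
# i=1, j=2: dups=1, losses=0
# i=2, j=1: dups=0, losses=1
# i=2, j=2: dups=0, losses=0
```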
Reconstruction algorithm
• The root inherits exactly one gene copy (i = 1)
• For each node v, the dynamic program calculates the minimum duplication/loss (D/L) score of the subtree rooted at v, for all possible values of i and j
Step 1: Annotate minimum-cost tables for all nodes
• cost[i, j] = cost at a node if it inherits i genes and sends j genes
• cost[i] = minimum cost at a node if it inherits i genes
  = minimum of cost[i, j] over all j
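The reduction from cost[i, j] to cost[i] is a minimum over j. A minimal sketch, assuming the table is stored as a Python dict keyed by (i, j); `min_over_j` is a hypothetical helper, and the sample values are the internal-node table from this section's worked example:

```python
from __future__ import annotations


def min_over_j(pair_cost: dict[tuple[int, int], int], m: int) -> dict[int, int]:
    """cost[i] = minimum of cost[i, j] over j = 1..m, for each i = 1..m."""
    return {i: min(pair_cost[i, j] for j in range(1, m + 1)) for i in range(1, m + 1)}


# cost[i, j] of the internal node in the worked example
pair_cost = {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 1}
print(min_over_j(pair_cost, m=2))  # {1: 1, 2: 1}
```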
Leaf tables:
cost[1] = 1, cost[2] = 0
cost[1] = 1, cost[2] = 0
cost[1] = 0, cost[2] = 1
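A leaf has no children, so its table depends only on how many copies the species actually holds: with c copies and i inherited, it needs c - i duplications if c > i and i - c losses if c < i, that is |c - i| events. A minimal sketch under that assumption (`leaf_table` is hypothetical); the copy counts 2, 2 and 1 are inferred from the three leaf tables, not stated on the slides:

```python
from __future__ import annotations


def leaf_table(copies: int, m: int) -> dict[int, int]:
    """cost[i] at a leaf holding `copies` gene copies: |copies - i| duplications or losses."""
    return {i: abs(copies - i) for i in range(1, m + 1)}


for copies in (2, 2, 1):  # copy counts inferred from the three leaf tables
    print(leaf_table(copies, m=2))
# {1: 1, 2: 0}
# {1: 1, 2: 0}
# {1: 0, 2: 1}
```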
Cost of an internal node: cost[i, j] = |i - j| (duplications or losses at the node) + optimal cost of the left subtree + optimal cost of the right subtree, where each child inherits the j copies the node sends
cost[1, 1] = 0 + 0 + 1 = 1
cost[1, 2] = 1 + 1 + 0 = 2
cost[2, 1] = 1 + 0 + 1 = 2
cost[2, 2] = 0 + 1 + 0 = 1
cost[1] = 1, cost[2] = 1
Root:
cost[1, 1] = 0 + 1 + 1 = 2
cost[1, 2] = 1 + 0 + 1 = 2
cost[1] = 2
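Step 1 can be run end to end as a post-order pass: leaf tables are |c - i|, an internal node's cost[i, j] is |i - j| plus both children's cost[j], and cost[i] is the minimum over j. The sketch below is one possible implementation, assuming a binary species tree and events that each affect one copy; the class and function names are hypothetical, the species labels A, B, C are placeholders, the copy counts (2, 1, 2) are inferred from the leaf tables, and the topology (A, (B, C)) is inferred from the order of terms in the slide arithmetic. It reproduces the numbers above:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class SpeciesNode:
    name: str
    copies: int = 0                                # observed gene copies (leaves only)
    left: Optional[SpeciesNode] = None
    right: Optional[SpeciesNode] = None
    cost: dict = field(default_factory=dict)       # cost[i]
    pair_cost: dict = field(default_factory=dict)  # cost[i, j], internal nodes only

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def fill_tables(node: SpeciesNode, m: int) -> None:
    """Annotate cost tables bottom-up (children before parents)."""
    if node.is_leaf:
        for i in range(1, m + 1):
            node.cost[i] = abs(node.copies - i)
        return
    fill_tables(node.left, m)
    fill_tables(node.right, m)
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            # events at this node + each child's optimal cost when it inherits j copies
            node.pair_cost[i, j] = abs(i - j) + node.left.cost[j] + node.right.cost[j]
        node.cost[i] = min(node.pair_cost[i, j] for j in range(1, m + 1))


a, b, c = SpeciesNode("A", copies=2), SpeciesNode("B", copies=1), SpeciesNode("C", copies=2)
inner = SpeciesNode("BC", left=b, right=c)
root = SpeciesNode("root", left=a, right=inner)
fill_tables(root, m=2)
print(inner.pair_cost)  # {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 1}
print(inner.cost)       # {1: 1, 2: 1}
print(root.cost[1])     # 2  (only cost[1] matters at the root)
```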
Step 2: Enumerate all histories from the cost tables
• Maintain 3 variables for each node
• dups = optimal number of duplicated genes
• losses = optimal number of lost genes
• out = optimal number of genes to pass to its children
out = 1, losses = dups = 0
dups = 1, losses = 0
out = 1, losses = dups = 0
dups = 0, losses = 0
dups = 1, losses = 0
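Step 2 can be written as a top-down traversal over the Step 1 tables: at each internal node every j with cost[i, j] equal to cost[i] is an optimal value for out, and dups/losses follow from comparing out with i (or, at a leaf, the observed copy count with i). A hedged sketch that enumerates every optimal history by branching on ties; the names are hypothetical and the example is a placeholder reconstruction (A, (B, C)) of the slides' tree, with copy counts inferred from its leaf tables. On this example it finds two histories of cost 2: the one annotated above (one duplication in each two-copy leaf) and an alternative with a duplication at the root plus a loss in B, because cost[1, 1] and cost[1, 2] tie at the root.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class SpeciesNode:
    name: str
    copies: int = 0
    left: Optional[SpeciesNode] = None
    right: Optional[SpeciesNode] = None
    cost: dict = field(default_factory=dict)
    pair_cost: dict = field(default_factory=dict)


def fill_tables(node: SpeciesNode, m: int) -> None:
    """Step 1 recurrence: cost tables, bottom-up."""
    if node.left is None:
        node.cost = {i: abs(node.copies - i) for i in range(1, m + 1)}
        return
    fill_tables(node.left, m)
    fill_tables(node.right, m)
    node.pair_cost = {(i, j): abs(i - j) + node.left.cost[j] + node.right.cost[j]
                      for i in range(1, m + 1) for j in range(1, m + 1)}
    node.cost = {i: min(node.pair_cost[i, j] for j in range(1, m + 1))
                 for i in range(1, m + 1)}


def histories(node: SpeciesNode, i: int, m: int) -> list[dict[str, dict[str, int]]]:
    """Every optimal history of the subtree when `node` inherits i copies."""
    if node.left is None:
        return [{node.name: {"dups": max(0, node.copies - i),
                             "losses": max(0, i - node.copies)}}]
    result = []
    for j in range(1, m + 1):
        if node.pair_cost[i, j] != node.cost[i]:
            continue  # sending j copies is not optimal for this i
        here = {node.name: {"out": j, "dups": max(0, j - i), "losses": max(0, i - j)}}
        for left_h in histories(node.left, j, m):
            for right_h in histories(node.right, j, m):
                result.append({**here, **left_h, **right_h})
    return result


a, b, c = SpeciesNode("A", copies=2), SpeciesNode("B", copies=1), SpeciesNode("C", copies=2)
root = SpeciesNode("root", left=a, right=SpeciesNode("BC", left=b, right=c))
fill_tables(root, m=2)
for h in histories(root, i=1, m=2):  # the root inherits exactly one copy
    print(h)
```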
Step 3: Build a gene tree to represent the history
• From Step 2: one duplication in humans and one duplication in frogs
• Build the gene tree from this information and the topology of the species tree
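Step 3 can be sketched as a recursion over the species tree: each internal node turns its `out` copies into speciation nodes joining one copy in each child, and any mismatch between copies inherited and copies sent becomes duplication nodes (or loss markers). The counts alone do not say which copy duplicated, so this sketch assumes the first incoming copy takes all duplications. The names, the `event@species` labels and the placeholder tree (A, (B, C)), with A and C as the two-copy leaves, are hypothetical. It uses the Step 2 history with out = 1 at both internal nodes and prints a Newick-style string:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

GeneTree = Union[str, tuple]  # gene name, or (event label, left subtree, right subtree)


@dataclass(eq=False)
class SpeciesNode:
    name: str
    copies: int = 0                      # observed copies (leaves only)
    out: int = 1                         # copies sent to children (internal nodes, from Step 2)
    left: Optional[SpeciesNode] = None
    right: Optional[SpeciesNode] = None


def attach(i: int, outs: list, where: str) -> list:
    """Map the j = len(outs) lineages leaving a node onto the i lineages entering it.

    Assumption: all j - i duplications hit the first incoming copy; surplus
    incoming copies (j < i) end in loss markers.
    """
    j = len(outs)
    if j >= i:
        first = outs[0]
        for extra in outs[1 : j - i + 1]:
            first = (f"dup@{where}", first, extra)
        return [first] + outs[j - i + 1 :]
    return outs + [f"lost@{where}"] * (i - j)


def build(node: SpeciesNode, i: int) -> list:
    """Gene subtrees for the i copies entering `node`."""
    if node.left is None:
        genes = [f"{node.name}_{k + 1}" for k in range(node.copies)]
        return attach(i, genes, node.name)
    lefts, rights = build(node.left, node.out), build(node.right, node.out)
    # each outgoing copy splits into one copy per child species (a speciation node)
    outs = [(f"spec@{node.name}", lefts[k], rights[k]) for k in range(node.out)]
    return attach(i, outs, node.name)


def to_newick(t: GeneTree) -> str:
    if isinstance(t, str):
        return t
    return f"({to_newick(t[1])},{to_newick(t[2])}){t[0]}"


a, b, c = SpeciesNode("A", copies=2), SpeciesNode("B", copies=1), SpeciesNode("C", copies=2)
root = SpeciesNode("root", out=1, left=a, right=SpeciesNode("BC", out=1, left=b, right=c))
(gene_tree,) = build(root, 1)  # the root inherits one copy
print(to_newick(gene_tree) + ";")
# ((A_1,A_2)dup@A,(B_1,(C_1,C_2)dup@C)spec@BC)spec@root;
```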
Hybrid Model