1
Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene
Paralogs: Two or more genes, within the same species, that originated by one or more gene duplication events
(note no regard to function!)
Orthology & Paralogy (etc. etc.)
2
Species tree: A B C D E
Gene tree: A1 B1 C1 D1 E1
Clear case of orthology: each gene 1 in each species is an ortholog of the others - all descended from a single common ancestor
Ancestral Gene 1
Ancestral species
3
Gene tree: A1 B1 C1 D1 E1 (plus duplicates C2 D2)
Ancestral Gene 1
Species tree: A B C D E
Ancestral species
Gene duplication along the species branch leading to C & D
C1 and C2 are paralogs, D1 and D2 are paralogs
What about A1 to C1? To C2?
4
Also now many subtle variants:
Outparalogs: cross-species paralogs (i.e. gene duplication BEFORE speciation)
Inparalogs: lineage-specific duplication (i.e. duplication AFTER speciation)
Ohnolog: duplicates originating from a whole-genome duplication (WGD)
Xenolog: genes related by horizontal gene transfer between species
Orthology & Paralogy (etc. etc.)
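One way to make these definitions concrete is to label each internal node of a gene tree as a speciation or a duplication and look at the last common ancestor of two genes. The sketch below assumes the common tree-based convention (speciation at the last common ancestor = orthologs, duplication = paralogs, including cross-species outparalogs). The tuple encoding, the function names, and the small tree (a simplified version of the duplication example, with B and E left out) are illustrative assumptions, not taken from the slides.

```python
# Gene tree nodes: a leaf is a gene name (str); an internal node is a
# tuple (event, left, right) with event "spec" (speciation) or "dup" (duplication).

def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)

def lca_event(node, gene_x, gene_y):
    """Return the event label at the last common ancestor of two genes."""
    event, left, right = node
    for child in (left, right):
        below = leaves(child)
        if gene_x in below and gene_y in below:
            # Both genes sit under the same child, so the LCA is deeper.
            return lca_event(child, gene_x, gene_y)
    return event

def relationship(tree, gene_x, gene_y):
    """Classify two distinct genes as 'orthologs' or 'paralogs'."""
    if gene_x == gene_y or not {gene_x, gene_y} <= leaves(tree):
        raise ValueError("need two distinct genes that are both in the tree")
    return "orthologs" if lca_event(tree, gene_x, gene_y) == "spec" else "paralogs"

# Assumed toy tree: A splits off first, then a duplication on the branch
# leading to species C and D, then the C/D speciation in each gene copy.
tree = ("spec", "A1", ("dup", ("spec", "C1", "D1"), ("spec", "C2", "D2")))

for pair in [("A1", "C1"), ("A1", "C2"), ("C1", "C2"), ("C1", "D2")]:
    print(pair, relationship(tree, *pair))
```

Under this convention A1 comes out as an ortholog of both C1 and C2, while C1 and D2 are cross-species paralogs (outparalogs), because their last common ancestor is the duplication.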
5
Phenetics vs. Phylogeny
Phenetics: tree based on similarity of characteristics
1. Align proteins & score the alignment (# of identical and 'conserved' amino acids)
2. Build a tree based on sequence similarity
Tree: A1 B1 C1 C2
A1 is more similar to C1 than to C2 -
A1 & C1 are likely (* but not guaranteed!) more similar functionally
Phylogeny: tree based on evolutionary history
Tree: A1 B1 C1 C2
But historically, A1 is equally distant to C1 and C2
1. Requires inferring history across the species
6
Methods of orthology prediction
1. Reciprocal best-BLAST hits (RBH): simplest method
Species A: Gene A1, Gene A2, . . . Gene An
Species B: Gene B1, Gene B2, . . . Gene Bn
1. BLAST Gene A1 against the Species B genome
2. Take the top BLAST hit in Species B (here, B4) and use it as the query against Species A
3. If Gene A1 is the top BLAST hit in the Species A genome, then call A1 & B4 orthologs
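The RBH steps can be sketched in a few lines of Python. This is a minimal illustration that assumes the BLAST searches have already been run and parsed into dictionaries of bit scores; the dictionary layout, the function names, and the scores are made up for the example.

```python
def best_hit(scores):
    """Return the subject with the highest score, or None if there are no hits."""
    if not scores:
        return None
    return max(scores, key=scores.get)

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's top hit.

    a_vs_b maps each Species A gene to {Species B gene: score};
    b_vs_a is the same table for the reverse search.
    """
    pairs = []
    for gene_a, hits in a_vs_b.items():
        gene_b = best_hit(hits)                      # steps 1-2: top hit in B
        if gene_b is None:
            continue
        if best_hit(b_vs_a.get(gene_b, {})) == gene_a:  # step 3: reciprocal check
            pairs.append((gene_a, gene_b))
    return sorted(pairs)

# Hypothetical scores: A1 and B4 pick each other; A2 picks B2, but B2 prefers A3.
a_vs_b = {"A1": {"B4": 310.0, "B2": 95.0}, "A2": {"B2": 120.0}}
b_vs_a = {"B4": {"A1": 305.0}, "B2": {"A3": 200.0, "A2": 118.0}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B4')]
```

A2 is left without an ortholog call here even though it has a hit, because the hit is not reciprocal.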
8
Problems with RBH
* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species can completely obscure orthologous hits
* Orthologs with very low sequence similarity can be missed altogether
12
Methods of orthology prediction
2. Reciprocal Smallest Distance (RSD): slightly more complicated
Species A: Gene A1, Gene A2, . . . Gene An
Species B: Gene B1, Gene B2, . . . Gene Bn
1. BLAST Gene A1 against the Species B genome
2. Take X number of top BLAST hits (user determined)
3. Do a global multiple alignment - throw out proteins with >Y% gapped positions
4. Take the remaining proteins and find the single one with the closest evolutionary distance
5. Final reciprocal BLAST using the remaining gene in Species B as the query against Genome A
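Steps 3 and 4 of RSD (filter by gaps, then pick the smallest distance) can be sketched as below. The sketch assumes the alignments already exist as pairs of equal-length gapped strings, that candidates whose alignment is too gappy are the ones discarded, and it uses a simple Poisson-corrected distance as a stand-in for whatever evolutionary distance a real pipeline would estimate. All names and sequences are hypothetical.

```python
import math

def gap_fraction(aligned_a, aligned_b):
    """Fraction of alignment columns with a gap ('-') in either sequence."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    gapped = sum(1 for x, y in zip(aligned_a, aligned_b) if x == "-" or y == "-")
    return gapped / len(aligned_a)

def poisson_distance(aligned_a, aligned_b):
    """Poisson-corrected distance d = -ln(1 - p) over ungapped columns."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return math.inf
    p = sum(1 for x, y in pairs if x != y) / len(pairs)  # proportion of differences
    return -math.log(1.0 - p) if p < 1.0 else math.inf

def closest_candidate(alignments, max_gap_fraction):
    """Pick the candidate with the smallest distance among those passing the gap filter.

    alignments maps candidate name -> (aligned_query, aligned_candidate).
    """
    best_name, best_dist = None, math.inf
    for name, (query, candidate) in alignments.items():
        if gap_fraction(query, candidate) > max_gap_fraction:
            continue  # step 3: too gappy, throw out
        dist = poisson_distance(query, candidate)
        if dist < best_dist:  # step 4: keep the smallest distance
            best_name, best_dist = name, dist
    return best_name, best_dist

# Hypothetical alignments of query A1 against three Species B candidates.
alignments = {
    "B1": ("MKTAYIAK", "MKTAYIAR"),  # no gaps, 1 difference in 8
    "B2": ("MKTAYIAK", "MK----AK"),  # identical where aligned, but half gaps
    "B3": ("MKTAYIAK", "MRSAYLAR"),  # no gaps, 4 differences in 8
}
print(closest_candidate(alignments, max_gap_fraction=0.25))
```

Without the gap filter B2 would win with a distance of zero, which is why the filtering step comes before the distance comparison. Step 5 would repeat the search in the other direction, starting from the chosen Species B gene.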
13
Problems with RSD
* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species can completely obscure orthologous hits
* Orthologs with very low sequence similarity can be missed altogether
14
3. Newest methods take synteny into account
Methods of orthology prediction
Syntenic = conserved gene/sequence order
Gene A1 A2 A3 A4
Gene B1 B2 B3 B4
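A rough way to picture how synteny adds evidence: for a candidate pair, count how many neighbouring genes in Species A have a similar gene among the neighbours of the candidate in Species B. The sketch below is only an illustration of that idea; the window size, the `homologs` lookup (assumed to come from a prior similarity search), and the gene orders are hypothetical.

```python
def synteny_support(order_a, order_b, gene_a, gene_b, homologs, window=1):
    """Count flanking genes of gene_a that have a homolog flanking gene_b.

    order_a / order_b list genes in chromosomal order; homologs maps a
    Species A gene to the set of similar Species B genes.
    """
    ia, ib = order_a.index(gene_a), order_b.index(gene_b)
    flank_a = [g for i, g in enumerate(order_a) if i != ia and abs(i - ia) <= window]
    flank_b = {g for j, g in enumerate(order_b) if j != ib and abs(j - ib) <= window}
    return sum(1 for g in flank_a if homologs.get(g, set()) & flank_b)

order_a = ["A1", "A2", "A3", "A4"]
order_b = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8"]
# A2 has two similar genes in Species B: B2 (in the conserved block) and B7.
homologs = {"A1": {"B1"}, "A2": {"B2", "B7"}, "A3": {"B3"}, "A4": {"B4"}}

print(synteny_support(order_a, order_b, "A2", "B2", homologs))  # 2
print(synteny_support(order_a, order_b, "A2", "B7", homologs))  # 0
```

Here the conserved neighbourhood favours B2 over B7 as the ortholog of A2, even if the two had similar BLAST scores.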
15
Problems with Synteny-based Methods
* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species are less likely to obscure orthologous hits
* Orthologs with low sequence similarity that are not part of a larger duplication could still be missed
16
Methods of orthology prediction
4. Clusters of Orthologous Groups (COG) approach:
- Addresses the restriction of 1:1 orthologs
- Identifies inparalogs and then identifies orthologous relationships between groups
Species A B C D
Several approaches can assign COGs across many species at once (InParanoid, Fuzzy RB)
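The grouping idea can be illustrated with a deliberately simplified stand-in: treat every pairwise ortholog or inparalog call as a link and merge linked genes into groups (connected components, via union-find). This is not the actual COG or InParanoid procedure, only a sketch of how many-to-many groups fall out of pairwise links; the function name and the links are hypothetical.

```python
def cluster_groups(links):
    """Merge genes joined by pairwise links into groups (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical links: ortholog calls across species plus one inparalog pair (C1, C2).
links = [("A1", "B1"), ("B1", "C1"), ("C1", "C2"), ("A2", "B2")]
print(cluster_groups(links))  # [['A1', 'B1', 'C1', 'C2'], ['A2', 'B2']]
```

The first group contains two genes from Species C, which is exactly the kind of relationship a strict 1:1 method cannot represent.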
17
Why is orthology-paralogy so important?
Allows us to study the history of protein evolution & infer constraints
Gene tree: A1 B1 C1 D1 E1 (plus duplicates C2 D2 and A2)
Ancestral Gene 1
Gene duplication along the species branch leading to C & D (C2, D2)
Separate gene duplication in Species A (A2)
18
19
Receptor | Ligand | Governs
Glucocorticoid Receptor (GR) | Cortisol | Stress Response
Mineralocorticoid Receptor (MR) | Aldosterone (tetrapods), DOC (teleosts) | Electrolyte Homeostasis
* Teleosts don’t make aldosterone
20
Figure 1
Blue = Aldo binding
Red = Cortisol ONLY
21
Two amino-acid changes in AncCR can alter specificity
Blue = DOC
Red = Cortisol
Green = Aldo
S106P likely occurred FIRST, then L111Q
22
Model for evolution of ligand binding & hormone response
1. Ancestral protein could bind Aldo, even though no Aldo present
2. Duplication ~450 mya = redundant receptors
3. Two successive changes in GR = switch to Cortisol Specificity
4. Emergence of Aldosterone Hormone