Orthologs: two genes, each from a different species, that descended from


Upload: ewan

Post on 31-Jan-2016



TRANSCRIPT

Page 1: Orthologs:   Two genes,  each from a different species , that descended from


Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene

Paralogs: Two or more genes, within the same species, that originated by one or more gene duplication events

(note no regard to function!)

Orthology & Paralogy (etc. etc.)

Page 2

SPECIES TREE: A, B, C, D, E (descended from an ancestral species)

GENE TREE: A1, B1, C1, D1, E1 (descended from Ancestral Gene 1)

Clear case of orthology: gene 1 in each species is an ortholog of the others; all descended from a single common ancestor.

Page 3

SPECIES TREE: A, B, C, D, E (descended from an ancestral species)

GENE TREE: A1, B1, C1, D1, E1, C2, D2 (descended from Ancestral Gene 1)

Gene duplication along the species branch leading to C & D

C1 and C2 are paralogs; D1 and D2 are paralogs

What about A1 to C1? To C2?
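One common way to approach the question above is to look at the event at the last common ancestor of two genes in the gene tree: a speciation node suggests orthologs, a duplication node suggests paralogs. The sketch below assumes that convention (it is not stated on the slide). The tree topology, the placement of B1 and E1, and the names `GENE_TREE`, `lca_event` and `relation` are all hypothetical, chosen only to match the genes in the figure.

```python
# Internal node: (event, left, right). Leaf: gene name.
# ASSUMPTION: this topology is illustrative; the slide does not give it.
GENE_TREE = (
    "speciation", "A1",
    ("speciation", "B1",
     ("speciation", "E1",
      ("duplication",                      # duplication on the branch to C & D
       ("speciation", "C1", "D1"),
       ("speciation", "C2", "D2")))),
)


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def lca_event(node, g1, g2):
    """Return the event label at the last common ancestor of g1 and g2."""
    event, left, right = node
    for child in (left, right):
        # Descend while a single child subtree still contains both genes.
        if not isinstance(child, str) and {g1, g2} <= leaves(child):
            return lca_event(child, g1, g2)
    return event


def relation(tree, g1, g2):
    """Classify a gene pair by the event at its last common ancestor."""
    if g1 == g2 or not {g1, g2} <= leaves(tree):
        raise ValueError("need two different genes that are both in the tree")
    return "orthologs" if lca_event(tree, g1, g2) == "speciation" else "paralogs"


for pair in [("A1", "C1"), ("A1", "C2"), ("C1", "C2")]:
    print(pair, relation(GENE_TREE, *pair))
```

Under this convention A1 comes out as an ortholog of both C1 and C2, because the duplication happened after the lineage of species A split off.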

Page 4

Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene

Paralogs: Two or more genes, within the same species, that originated by one or more gene duplication events

(note no regard to function!)

Also now many subtle variants:

Outparalogs: cross-species paralogs (i.e. gene duplication BEFORE speciation)

Inparalogs: lineage-specific duplication (i.e. duplication AFTER speciation)

Ohnolog: duplicates originating from a whole-genome duplication (WGD)

Xenolog: genes related by horizontal gene transfer between species

Orthology & Paralogy (etc. etc.)
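The inparalog/outparalog distinction depends only on whether the duplication came before or after a chosen reference speciation. A minimal sketch, assuming both events have already been dated; the function name `classify_paralogs` and the example ages are hypothetical.

```python
def classify_paralogs(duplication_mya, speciation_mya):
    """Classify a paralog pair relative to one reference speciation event.

    Both arguments are ages in millions of years ago (larger = older).
    """
    if duplication_mya == speciation_mya:
        raise ValueError("order of duplication and speciation is ambiguous")
    if duplication_mya > speciation_mya:
        return "outparalogs"   # duplication BEFORE speciation
    return "inparalogs"        # duplication AFTER speciation


# Hypothetical ages, for illustration only.
print(classify_paralogs(duplication_mya=300, speciation_mya=100))
print(classify_paralogs(duplication_mya=40, speciation_mya=100))
```

The same pair of genes can be inparalogs relative to one speciation and outparalogs relative to a more recent one, so the reference event always has to be stated.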

Page 5

Phenetics vs. Phylogeny

Phenetics: tree based on similarity of characteristics

1. Align proteins & score the alignment (# of identical and ‘conserved’ amino acids)

2. Build a tree based on sequence similarity

Tree tips: A1 B1 C1 C2

A1 is more similar to C1 than to C2, so A1 & C1 are likely (* but not guaranteed!) more similar functionally

Phylogeny: tree based on evolutionary history

Tree tips: A1 B1 C1 C2

But historically, A1 is equally distant from C1 and C2

1. Requires inferring history across the species
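Scoring an alignment by identical and 'conserved' amino acids (step 1 of the similarity-based approach) can be sketched as below. The grouping in `CONSERVED_GROUPS` is a hypothetical simplification; real tools generally use a substitution matrix such as BLOSUM62. The two sequences are made up and assumed to be already aligned, with `-` marking gaps.

```python
# ASSUMPTION: a crude grouping of chemically similar residues, for illustration.
CONSERVED_GROUPS = [
    set("ILVM"), set("FWY"), set("KRH"), set("DE"),
    set("ST"), set("NQ"), set("AG"),
]


def score_alignment(seq1, seq2):
    """Return (identical, conserved, aligned) counts for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    identical = conserved = aligned = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue                      # skip gapped columns
        aligned += 1
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in CONSERVED_GROUPS):
            conserved += 1
    return identical, conserved, aligned


identical, conserved, aligned = score_alignment("MKV-LDE", "MRVALEE")
print(f"{identical} identical, {conserved} conserved, {aligned} aligned columns")
```

Pairwise scores like these feed the similarity tree in step 2; they say nothing by themselves about the order of duplication and speciation events.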

Page 6

Methods of orthology prediction

1. Reciprocal best-BLAST hits (RBH): simplest method

Species A: Gene A1, Gene A2, . . . , Gene An

Species B: Gene B1, Gene B2, . . . , Gene Bn

1. BLAST Gene A1 against the Species B genome
2. Take the top BLAST hit in Species B and use it as the query against Species A
3. If Gene A1 is the top BLAST hit in the Species A genome, then call A1 and the Species B hit (B4 in this example) orthologs
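The RBH procedure can be sketched on precomputed hit tables. This is a minimal illustration, assuming the BLAST searches have already been run and summarised as `{query: {subject: score}}` dictionaries; the gene names, scores, and function names are hypothetical.

```python
def best_hit(scores, query):
    """Return the highest-scoring subject for `query`, or None if no hits."""
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's top hit."""
    pairs = []
    for gene_a in a_vs_b:
        gene_b = best_hit(a_vs_b, gene_a)            # steps 1-2: top hit in B
        if gene_b is not None and best_hit(b_vs_a, gene_b) == gene_a:
            pairs.append((gene_a, gene_b))           # step 3: reciprocal check
    return pairs


# Hypothetical bit scores, for illustration only.
a_vs_b = {
    "A1": {"B4": 250.0, "B2": 90.0},
    "A2": {"B2": 180.0, "B4": 60.0},
    "A3": {"B2": 120.0},
}
b_vs_a = {
    "B4": {"A1": 245.0, "A2": 55.0},
    "B2": {"A2": 175.0, "A3": 118.0},
}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here A3 gets no ortholog call: its top hit B2 points back to A2, which is the kind of case where a duplication can hide the true relationship.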


Page 8

Problems with RBH

* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains

* Gene duplications in one species can completely obscure orthologous hits

* Orthologs with very low sequence similarity can be missed altogether


Page 12

Methods of orthology prediction

2. Reciprocal Smallest Distance (RSD): slightly more complicated

Species A: Gene A1, Gene A2, . . . , Gene An

Species B: Gene B1, Gene B2, . . . , Gene Bn

1. BLAST Gene A1 against the Species B genome
2. Take X number of top BLAST hits (user determined)
3. Do a global multiple alignment; throw out proteins with more than Y% gapped positions
4. Take the remaining proteins and find the single one with the closest evolutionary distance
5. Final reciprocal BLAST using the remaining gene in Species B as the query against Genome A
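The RSD steps can be sketched as a filter followed by a minimum-distance pick and a reciprocal check. This is a rough illustration, not the published RSD implementation. It assumes the gap filter discards candidates whose alignments have a gap fraction above a threshold, and that alignments, distances, and the reverse BLAST result were computed elsewhere. `Candidate`, `rsd_ortholog`, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str            # Species B gene among the top X BLAST hits
    gap_fraction: float  # fraction of gapped positions in its alignment
    distance: float      # estimated evolutionary distance to the query


def rsd_ortholog(query_a, candidates_b, reverse_top_hit, max_gap_fraction=0.2):
    """Return the Species B gene called as ortholog of `query_a`, or None."""
    # Step 3: drop candidates whose alignment is too gappy.
    kept = [c for c in candidates_b if c.gap_fraction <= max_gap_fraction]
    if not kept:
        return None
    # Step 4: keep the single candidate with the smallest distance.
    closest = min(kept, key=lambda c: c.distance)
    # Step 5: reciprocal check against Genome A.
    if reverse_top_hit.get(closest.gene) == query_a:
        return closest.gene
    return None


# Hypothetical values, for illustration only.
candidates = [
    Candidate("B4", gap_fraction=0.05, distance=0.30),
    Candidate("B7", gap_fraction=0.10, distance=0.22),
    Candidate("B9", gap_fraction=0.60, distance=0.05),  # removed by gap filter
]
print(rsd_ortholog("A1", candidates, reverse_top_hit={"B7": "A1"}))
```

Ranking by estimated distance rather than by BLAST score is the main difference from RBH: B4 could be the top BLAST hit here and B7 would still be chosen.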

Page 13

Problems with RSD

* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains

* Gene duplications in one species can completely obscure orthologous hits

* Orthologs with very low sequence similarity can be missed altogether

Page 14

Methods of orthology prediction

3. Newest methods take synteny into account

Syntenic = conserved gene/sequence order

Gene A1 A2 A3 A4

Gene B1 B2 B3 B4
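One simple way to use conserved gene order is to count how many neighbours of a candidate pair are themselves candidate pairs. The sketch below is a hypothetical scoring scheme, not the method of any particular tool; `synteny_support`, the `window` parameter, and the extra gene B9 are assumptions for illustration.

```python
def synteny_support(order_a, order_b, gene_a, gene_b, candidate_pairs, window=1):
    """Count candidate pairs whose members flank gene_a and gene_b.

    order_a / order_b list genes in chromosomal order for each species;
    `window` is how many positions on either side count as neighbours.
    """
    ia, ib = order_a.index(gene_a), order_b.index(gene_b)
    near_a = set(order_a[max(0, ia - window): ia + window + 1]) - {gene_a}
    near_b = set(order_b[max(0, ib - window): ib + window + 1]) - {gene_b}
    return sum(1 for a, b in candidate_pairs if a in near_a and b in near_b)


order_a = ["A1", "A2", "A3", "A4"]
order_b = ["B1", "B2", "B3", "B4", "B9"]   # B9: hypothetical extra copy
pairs = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")}

print(synteny_support(order_a, order_b, "A2", "B2", pairs))  # flanked by matches
print(synteny_support(order_a, order_b, "A2", "B9", pairs))  # no matching flanks
```

A candidate with supporting neighbours (A2 with B2) is preferred over one without (A2 with B9), even when the two have similar sequence scores.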

Page 15

Problems with Synteny-based Methods

* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains

* Gene duplications in one species are less likely to obscure things

* Orthologs with low sequence similarity that are not part of a larger duplication could still be missed

Page 16

Methods of orthology prediction

4. Clusters of Orthologs (COG) approach:
- Addresses the restriction of 1:1 orthologs
- Identifies inparalogs and then identifies orthologous relationships between groups

Species A B C D

Several approaches can assign COGs across many species at once (InParanoid, Fuzzy RB)
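The idea of grouping genes instead of forcing 1:1 pairs can be illustrated by merging pairwise links into clusters. The sketch below uses plain single-linkage clustering with a union-find structure. It is not the COG algorithm, nor what InParanoid does internally; the gene names (including the inparalog `C1b`) and the function `cluster_genes` are hypothetical.

```python
def cluster_genes(links):
    """Group genes connected by any chain of pairwise links.

    `links` is an iterable of (gene, gene) pairs, e.g. ortholog calls
    between species plus inparalog calls within a species.
    """
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]   # path halving
            gene = parent[gene]
        return gene

    for g1, g2 in links:
        parent[find(g1)] = find(g2)               # merge the two groups

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical links, for illustration only.
links = [
    ("A1", "B1"), ("B1", "C1"),   # ortholog calls across species
    ("C1", "C1b"),                # inparalog pair within species C
    ("A2", "B2"),
]
print(cluster_genes(links))
```

Single linkage is the crudest choice: one spurious link merges two unrelated families, which is why real methods add consistency checks before growing a cluster.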

Page 17

Why is orthology-paralogy so important?

Allows us to study the history of protein evolution & infer constraints

GENE TREE: A1, A2, B1, C1, D1, E1, C2, D2 (descended from Ancestral Gene 1)

Gene duplication along the species branch leading to C & D (giving C2, D2)

Separate gene duplication in Species A (giving A2)


Page 19

Receptor: Glucocorticoid Receptor (GR)
Ligand: Cortisol
Governs: Stress Response

Receptor: Mineralocorticoid Receptor (MR)
Ligand: Aldosterone (tetrapods), DOC (teleosts)
Governs: Electrolyte Homeostasis

* Teleosts don’t make aldosterone

Page 20

Figure 1

Blue = Aldo binding
Red = Cortisol ONLY

Page 21

Two amino-acid changes in AncCR can alter specificity

Blue = DOC
Red = Cortisol
Green = Aldo

S106P likely occurred FIRST, then L111Q

Page 22

Model for evolution of ligand binding & hormone response

1. Ancestral protein could bind Aldo, even though no Aldo present
2. Duplication ~450 mya = redundant receptors
3. Two successive changes in GR = switch to Cortisol specificity
4. Emergence of Aldosterone hormone