
Post on 20-Dec-2015


FOG: High-Resolution Fungal Orthologous Groups

René van der Heijden

Project 5.10: Comparative genomics for the prediction of protein function and pathways in Saccharomyces cerevisiae

What is this presentation about?

• What is ‘orthology’?

• Why do we study gene-ancestry/gene-trees (phylogenies)?

• Why high-resolution orthology?

• Automated high-resolution orthology detection

• The FOG database and some applications

Orthology

“This gene in that other species …”

• We don’t have chicken genes!

• They mean: the corresponding gene?

• Why that particular gene?

• Are we sure this actually is the gene?

• Are we sure that all n orthologs are correct?

the line represents a gene in some ancestral species

a long, long time ago, in a land far, far away

there is a speciation event, resulting in two species with the same, orthologous gene

time

one of the genes gets duplicated, resulting in two paralogous genes

another speciation event … but one of the paralogous genes is lost in one of the new species

another speciation event

current set of genes with apparent history

Orthologous genes

orthologs

paralogs

Duplications, Speciations, and Orthology

Two genes in two species are orthologous if they derive from one gene in their last common ancestor

Orthologous genes are likely to have the same function

Detecting orthologous genes

• Usual methods are based on BLAST hit quality, e.g. bi-directional best hit (BBH)

BBH

ortholog
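The bi-directional best hit idea can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the slides: the gene names and scores are made up, and `best_hit` and `bidirectional_best_hits` are hypothetical helper names. It assumes a table of pairwise similarity scores (for example BLAST bit scores) is already available.

```python
# Hypothetical similarity scores: scores[(query, subject)] -> score (higher is better).
# All gene names and values are invented for illustration.
scores = {
    ("a1", "b1"): 250, ("a1", "b2"): 90,
    ("a2", "b1"): 120, ("a2", "b2"): 40,
    ("b1", "a1"): 245, ("b1", "a2"): 118,
    ("b2", "a1"): 88,  ("b2", "a2"): 41,
}


def best_hit(query, candidates, scores):
    """Return the candidate with the highest score for `query`, or None if there is no hit."""
    scored = [(scores.get((query, c), 0), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0]
    return max(scored)[1] if scored else None


def bidirectional_best_hits(genes_a, genes_b, scores):
    """Pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    pairs = []
    for a in genes_a:
        b = best_hit(a, genes_b, scores)
        if b is not None and best_hit(b, genes_a, scores) == a:
            pairs.append((a, b))
    return pairs


bbh_pairs = bidirectional_best_hits(["a1", "a2"], ["b1", "b2"], scores)
print(bbh_pairs)
```

In this toy data only `a1` and `b1` pick each other; `a2` and `b2` both prefer a gene that prefers someone else, so they end up without a BBH partner.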


KOG clusters

• Based on triangle of BBH between genes of three species

• In-paralogs are added

• Triangles are extended by other genes and other species
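The seed step of the triangle approach can be sketched as follows. This is only an illustration of the first bullet, under the assumption that BBH pairs between each pair of species have already been computed; adding in-paralogs and extending triangles are not shown. The function name `bbh_triangles` and all gene names are hypothetical.

```python
def bbh_triangles(bbh_ab, bbh_bc, bbh_ac):
    """Find gene triples (a, b, c) in which every pair is a bi-directional best hit.

    Each argument is a list of BBH pairs between two species,
    e.g. bbh_ab holds (gene_in_A, gene_in_B) tuples.
    """
    ab, bc, ac = dict(bbh_ab), dict(bbh_bc), dict(bbh_ac)
    triangles = []
    for a, b in ab.items():
        c = bc.get(b)
        # The triangle closes only if a's BBH in species C is that same gene c.
        if c is not None and ac.get(a) == c:
            triangles.append((a, b, c))
    return triangles


# Invented BBH pairs for three species A, B and C.
triangles = bbh_triangles(
    bbh_ab=[("a1", "b1"), ("a2", "b2")],
    bbh_bc=[("b1", "c1"), ("b2", "c2")],
    bbh_ac=[("a1", "c1"), ("a2", "c3")],
)
print(triangles)
```

Here `a2`, `b2` and `c2` do not form a triangle because the BBH of `a2` in species C is `c3`, not `c2`.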

KOG statistics

These large KOG clusters must have multiple representatives per species

Low resolution: there must be functional specialization within these clusters!

High-res versus Low-res

• Many,

• Complete, and

• Closely related

genomes

Challenge: Automatic Orthology assignment

Gene Families

• Use PSI-BLAST to recognize (distant) homologs

• Split gene set into families of homologous genes
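Splitting a gene set into families can be sketched as grouping genes that are connected by homology hits. The sketch below assumes single-linkage grouping with a union-find structure, which is one simple choice and not necessarily what was used here; the homolog pairs stand in for PSI-BLAST hits above some threshold and are invented.

```python
def gene_families(genes, homolog_pairs):
    """Group genes into families: two genes share a family if a chain of hits connects them."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for x, y in homolog_pairs:
        parent[find(x)] = find(y)

    families = {}
    for g in genes:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in families.values())


# Invented genes and hits for illustration.
families = gene_families(
    genes=["g1", "g2", "g3", "g4", "g5"],
    homolog_pairs=[("g1", "g2"), ("g2", "g3"), ("g4", "g5")],
)
print(families)
```

Note that `g1` and `g3` land in the same family without a direct hit between them; the link through `g2` is enough.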

Challenge: Promiscuous domains

Multi-domain genes occur very often in eukaryotic genomes

Gene Families

• Promiscuous domains cause genes to be only partially homologous:

– Gene A-B is partially homologous to gene A-C, as is gene B-C

• Merging everything with homologous parts generates gene families that are far too large:

– It is not possible to obtain proper multiple alignments

• A more advanced technique is needed for separating multi-domain genes into gene families
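The chaining effect of promiscuous domains can be made concrete with a toy example. The sketch below assumes that each gene is described by a set of domain labels and that any shared domain counts as partial homology; the gene names, domain labels, and the helper `merge_by_shared_domain` are all hypothetical.

```python
# Invented domain architectures: gene name -> set of domains it contains.
domains = {
    "geneAB": {"A", "B"},
    "geneAC": {"A", "C"},
    "geneBC": {"B", "C"},
    "geneCE": {"C", "E"},
    "geneD": {"D"},
}


def merge_by_shared_domain(domains):
    """Merge genes into one family whenever they share at least one domain, transitively."""
    unassigned = set(domains)
    families = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        family, frontier = {seed}, [seed]
        while frontier:
            gene = frontier.pop()
            # Pull in every remaining gene that shares a domain with the current one.
            linked = {g for g in unassigned if domains[g] & domains[gene]}
            unassigned -= linked
            family |= linked
            frontier.extend(linked)
        families.append(sorted(family))
    return families


merged = merge_by_shared_domain(domains)
print(merged)
```

In this example `geneAB` and `geneCE` end up in the same family although they have no domain in common, which is the kind of over-merging the slide warns about.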

Generating Gene Families

• More advanced technique for the merging of genes into gene families is not functional yet

• Fall back on ‘known’ gene families using KOG:

– Low-resolution orthology assignments for eukaryotes

– Some inclusive families with many genes per species

Some statistics:

• 15 fungal species with 104,440 genes in total

• Divided into 11,020 KOG clusters (gene families)

• Involving 70,867 genes (= 68%)
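The coverage figure can be checked directly from the counts on this slide. The average cluster size is a derived number that does not appear on the slide, and an average hides the large clusters mentioned earlier.

```python
# Counts taken from the slide.
total_genes = 104440
kog_clusters = 11020
genes_in_clusters = 70867

coverage = genes_in_clusters / total_genes            # fraction of genes assigned to a KOG cluster
mean_cluster_size = genes_in_clusters / kog_clusters  # derived here, not stated on the slide

print(f"coverage: {coverage:.1%}")
print(f"mean genes per cluster: {mean_cluster_size:.2f}")
```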

Uncertainty in trees

• Evolutionary noise

– Differing rates of evolution

– Convergent evolution (low complexity, coiled coils)

– Promiscuous domains (recombination, fusion, fission)

• Use of heuristic methods

– Multiple alignment

– Tree making

Reading Gene-Trees

Although genes spec1,1 and spec2,1 are closer relatives, their distance is larger than that between spec1,1 and spec3,1

The tree suggests at least 2 gene losses

Analyze trees … but don’t trust them fully

• Rigid analysis suggests many duplications and losses

• Presume scp branch is wrongly placed!

If this is correct … this can’t be

Three orthologous groups suggesting 15 gene losses

Considering one wrongly placed gene leaves only 2 gene losses

Analyze trees … but don’t trust them fully

• And if we accept wrong placement of branches …

Automatic Orthology assignment

• LOFT: Levels of Orthology From Trees
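One common way to read duplications and speciations off a gene tree is the species-overlap rule: a node whose two subtrees share a species is treated as a duplication, otherwise as a speciation. The sketch below applies that rule to a toy tree and reports gene pairs whose last common node is a speciation. The slides do not state which rule LOFT applies internally, so this illustrates the general idea rather than LOFT itself; the tree, the `species|gene` leaf naming, and the function names are assumptions.

```python
from itertools import product


def species_of(leaf):
    """Leaves are named 'species|gene' (a naming convention assumed for this sketch)."""
    return leaf.split("|")[0]


def leaves(tree):
    """Return all leaf names of a tree given as nested 2-tuples."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def ortholog_pairs(tree):
    """Gene pairs whose last common ancestor node is labelled a speciation."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    left_leaves, right_leaves = leaves(left), leaves(right)
    overlap = {species_of(x) for x in left_leaves} & {species_of(x) for x in right_leaves}
    if not overlap:  # no shared species: treat this node as a speciation
        pairs |= {frozenset(p) for p in product(left_leaves, right_leaves)}
    return pairs


# Toy tree: the root is a duplication (sp1 appears on both sides),
# and each child node is a speciation.
tree = (("sp1|g1", "sp2|g1"), ("sp1|g2", "sp3|g2"))
orthologs = ortholog_pairs(tree)
print(sorted(sorted(p) for p in orthologs))
```

In the toy tree `sp1|g1` and `sp1|g2` are separated by the duplication at the root, so they are reported as neither orthologs of each other nor of the genes on the other side of the root.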

Result

• Collection of genes is split into KOG families

• KOG families are aligned and phylogenetic trees are derived

• Phylogenetic trees are analyzed using LOFT resulting in high-resolution orthology

Result

Can LOFT be trusted?

It seems okay!

Applications

• We now have FOG: a complete set of high resolution orthology assignments for fungi

• We ‘know’ which orthologous genes are present and absent in which species

• Phyletic distribution
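A phyletic distribution can be represented as a presence/absence vector per orthologous group. The sketch below assumes each group is given as the set of species in which it has at least one member; the group names, species labels, and the helper `phyletic_profiles` are placeholders, not data from FOG.

```python
def phyletic_profiles(groups, species):
    """Map each orthologous group to a tuple of 1/0 flags, one per species, in the given order."""
    return {
        name: tuple(int(s in members) for s in species)
        for name, members in groups.items()
    }


# Invented example: which species contain a member of each group.
species = ["sp1", "sp2", "sp3"]
groups = {
    "OG_a": {"sp1", "sp2", "sp3"},
    "OG_b": {"sp1", "sp3"},
}

profiles = phyletic_profiles(groups, species)
for name, profile in profiles.items():
    print(name, profile)
```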

Complex I


Phyletic distribution of mitochondrial orthologous groups

Phylogenetic Tree for Mitochondrial Carrier Proteins

Orthologous group 24 is an uncharacterized mitochondrial carrier

It is present in all fungi, except in Ashbya gossypii

In yeast this is known as YMC1, with unknown function

YMC1: predicted glycine/serine antiporter

• There are three S. cerevisiae genes with the same phyletic distribution:

– a subunit of glycine decarboxylase

– another subunit of glycine decarboxylase

– a gene with unknown function
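Finding genes with the same phyletic distribution amounts to comparing presence/absence vectors. The sketch below assumes such vectors are already available; the names and 0/1 patterns are invented placeholders, not the real distributions of YMC1 or the glycine decarboxylase subunits, and `same_profile` is a hypothetical helper.

```python
def same_profile(query, profiles):
    """Return the names of all other entries whose presence/absence vector equals the query's."""
    target = profiles[query]
    return sorted(
        name for name, profile in profiles.items()
        if name != query and profile == target
    )


# Invented presence/absence vectors over four species (1 = present, 0 = absent).
profiles = {
    "carrier": (1, 1, 0, 1),
    "enzyme_sub1": (1, 1, 0, 1),
    "enzyme_sub2": (1, 1, 0, 1),
    "unknown": (1, 1, 0, 1),
    "other": (1, 1, 1, 1),
}

matches = same_profile("carrier", profiles)
print(matches)
```

An exact match is the strictest criterion; with many species or noisy orthology assignments, a similarity measure over the vectors may be a more forgiving choice.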