Post on 20-Dec-2015
FOG: High-Resolution Fungal Orthologous Groups
René van der Heijden
Project 5.10: Comparative genomics for the prediction of protein function and pathways in
Saccharomyces cerevisiae
What is this presentation about?
• What is ‘orthology’?
• Why do we study gene-ancestry/gene-trees (phylogenies)?
• Why high-resolution orthology?
• Automated high-resolution orthology detection
• The FOG database and some applications
Orthology
“This gene in that other species …”
• We don’t have chicken genes!
• They mean: the corresponding gene?
• Why that particular gene?
• Are we sure this actually is the gene?
• Are we sure that all n orthologs are correct?
The line represents a gene in some ancestral species,
a long, long time ago in a land far, far away.
There is a speciation event, resulting in two species
with the same, orthologous gene.
time
One of the genes gets duplicated, resulting in two paralogous genes.
Another speciation event … but one of the paralogous genes is lost
in one of the new species.
Another speciation event: the current set of genes with its apparent history.
Orthologous genes
orthologs
paralogs
Duplications, Speciations, and Orthology
Two genes in two species are orthologous if they derive from one gene
in their last common ancestor
Orthologous genes are likely to have the same function
Detecting orthologous genes
• Usual methods are based on BLAST hit quality, e.g. bi-directional best hit (BBH)
[Figure labels: BBH, ortholog]
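The bi-directional best hit idea can be sketched in a few lines. This is a minimal illustration, not the pipeline used for FOG: the score tables, gene names, and function names below are hypothetical, and the scores stand in for whatever hit-quality measure a BLAST run would provide.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring hit in the other species."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy scores: species A has genes a1, a2; species B has b1, b2.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0, ("a2", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b1", "a2"): 118.0, ("b2", "a1"): 88.0}

bbh_pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(bbh_pairs)
```

In this toy data a2 and b2 each have a best hit, but it is not reciprocated, so neither is assigned an ortholog.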
KOG clusters
• Based on triangle of BBH between genes of three species
• InParalogs are added
• Triangles are extended by other genes and other species
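The first step, finding a triangle of BBHs across three species, might look like the sketch below. It covers only the seed triangles, not the addition of in-paralogs or the extension step, and the species labels, gene identifiers, and function name are hypothetical.

```python
from itertools import combinations


def bbh_triangles(bbh_pairs):
    """Find gene triples from three different species that are mutually BBHs.

    Genes are (species, gene_id) tuples; bbh_pairs is a list of gene pairs.
    """
    linked = {frozenset(pair) for pair in bbh_pairs}
    genes = sorted({gene for pair in bbh_pairs for gene in pair})
    triangles = []
    for x, y, z in combinations(genes, 3):
        if len({x[0], y[0], z[0]}) < 3:
            continue  # a triangle needs three distinct species
        if all(frozenset(edge) in linked for edge in ((x, y), (y, z), (x, z))):
            triangles.append((x, y, z))
    return triangles


# Hypothetical BBH pairs between three species labeled sp1, sp2, sp3.
pairs = [
    (("sp1", "g1"), ("sp2", "g1")),
    (("sp2", "g1"), ("sp3", "g1")),
    (("sp1", "g1"), ("sp3", "g1")),
    (("sp1", "g2"), ("sp2", "g2")),  # only two species: no triangle
]
triangles = bbh_triangles(pairs)
print(triangles)
```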
KOG statistics
These large KOG clusters must have multiple representatives per species
Low resolution: there must be functional specialization within these clusters!
High-res versus Low-res
• Many,
• Complete, and
• Closely related
genomes
Challenge: Automatic Orthology assignment
Gene Families
• Use PSI-blast to recognize (distant) homologs
• Split gene set into families of homologous genes
Challenge: Promiscuous domains
Multi domain genes occur very often in Eukaryotic genomes
Gene Families
• Promiscuous domains cause genes to be only partially homologous:
– Gene A-B is partially homologous to gene A-C, as is gene B-C
• Merging everything with homologous parts generates far too large gene families:
– Not possible to obtain proper multiple alignments
• A more advanced technique is needed for separating multi-domain genes into gene families
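Why naive merging produces oversized families can be shown with a small single-linkage sketch. The gene names, domain labels, and function name are hypothetical; the point is only that A-B, A-C, and B-C end up in one family although no single domain is shared by all three.

```python
def merge_by_shared_domain(genes):
    """Single-linkage merging: genes sharing any domain join one family."""
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the family representative (with path halving).
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    first_with_domain = {}
    for gene, domains in genes.items():
        for domain in domains:
            if domain in first_with_domain:
                parent[find(gene)] = find(first_with_domain[domain])
            else:
                first_with_domain[domain] = gene

    families = {}
    for gene in genes:
        families.setdefault(find(gene), []).append(gene)
    return sorted(sorted(family) for family in families.values())


# Hypothetical multi-domain genes, named after the domains they carry.
genes = {
    "gene_AB": {"A", "B"},
    "gene_AC": {"A", "C"},
    "gene_BC": {"B", "C"},
    "gene_D": {"D"},
}
families = merge_by_shared_domain(genes)
print(families)
```

With real genomes, a handful of widespread domains can chain many unrelated genes together in this way, which is what makes a proper multiple alignment of the merged family impossible.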
Generating Gene Families
• More advanced technique for the merging of genes into gene families is not functional yet
• Fall back on ‘known’ gene families using KOG:
– Low resolution orthology assignments for Eukaryotes
– Some inclusive families with many genes per species
Some statistics:
• 15 Fungal species with 104,440 genes in total
• Divided into 11,020 KOG clusters (gene families)
• Involving 70,867 genes (= 68%)
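As a quick check of these figures, the arithmetic below recomputes the coverage and derives the mean number of genes per cluster. The mean cluster size is not stated on the slide; it is simply derived here from the quoted totals.

```python
total_genes = 104_440    # genes in the 15 fungal species
kog_clusters = 11_020    # KOG clusters (gene families)
genes_in_kogs = 70_867   # genes assigned to a KOG cluster

coverage = genes_in_kogs / total_genes
mean_cluster_size = genes_in_kogs / kog_clusters

print(f"coverage: {coverage:.1%}")
print(f"mean genes per cluster: {mean_cluster_size:.2f}")
```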
Uncertainty in trees
• Evolutionary noise
– Differing rates of evolution
– Convergent evolution (low complexity, coiled coils)
– Promiscuous domains (recombination, fusion, fission)
• Use of heuristic methods
– Multiple alignment
– Tree making
Reading Gene-Trees
Although genes spec1,1 and spec2,1 are closer relatives, their distance is larger than that between spec1,1 and spec3,1
The tree suggests at least 2 gene losses
Analyze trees … but don’t trust them fully
• Rigid analysis suggests many duplications and losses
• Presume scp branch is wrongly placed!
If this is correct … this can’t be
Three orthologous groups suggesting 15 gene losses
Considering one wrongly placed gene leaves only 2 gene losses
Analyze trees … but don’t trust them fully
• And if we accept wrong placement of branches …
Result
• Collection of genes is split into KOG families
• KOG families are aligned and phylogenetic trees are derived
• Phylogenetic trees are analyzed using LOFT resulting in high-resolution orthology
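The slides do not spell out how a gene tree is turned into orthology assignments. One common heuristic for labeling tree nodes is the species-overlap rule: a node is treated as a duplication when its child branches share a species, and as a speciation otherwise. The sketch below assumes that rule purely for illustration; it is not claimed to be what LOFT does, and the tree, gene labels, and function names are hypothetical.

```python
def species_of(tree):
    """Collect the species under a node.

    Leaves are (species, gene) tuples; internal nodes are lists of subtrees.
    """
    if isinstance(tree, tuple):
        return {tree[0]}
    return set().union(*(species_of(child) for child in tree))


def count_duplications(tree):
    """Count internal nodes whose child branches share at least one species."""
    if isinstance(tree, tuple):
        return 0
    child_species = [species_of(child) for child in tree]
    overlap = any(
        child_species[i] & child_species[j]
        for i in range(len(child_species))
        for j in range(i + 1, len(child_species))
    )
    return int(overlap) + sum(count_duplications(child) for child in tree)


# Hypothetical gene tree: spec1 has two copies, one in each subtree.
tree = [
    [("spec1", "1"), ("spec2", "1")],
    [("spec1", "2"), ("spec3", "1")],
]
duplications = count_duplications(tree)
print(duplications)
```

Under this rule the root of the toy tree is a duplication, so the two subtrees would be read as two separate orthologous groups, each missing one species.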
Applications
• We now have FOG: a complete set of high resolution orthology assignments for fungi
• We ‘know’ which orthologous genes are present and absent in which species
• Phyletic distribution
Orthologous group 24 is an uncharacterized mitochondrial carrier
It is present in all fungi except Ashbya gossypii
In yeast this is known as YMC1, unknown function
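A phyletic distribution is a presence/absence profile of an orthologous group across species. The sketch below shows one way to compute it. Only YMC1 and the absence in Ashbya gossypii come from the slide; the third species label, its gene identifier, and the function name are hypothetical placeholders.

```python
def phyletic_profile(group_members, all_species):
    """Return {species: True/False} for presence of the group in each species.

    group_members is a list of (species, gene) tuples.
    """
    present = {species for species, _gene in group_members}
    return {species: species in present for species in all_species}


all_species = [
    "Saccharomyces cerevisiae",
    "Ashbya gossypii",
    "other_fungus",  # hypothetical placeholder for the remaining species
]
group_24 = [
    ("Saccharomyces cerevisiae", "YMC1"),
    ("other_fungus", "gene_x"),  # hypothetical gene identifier
]

profile = phyletic_profile(group_24, all_species)
absent = [species for species, is_present in profile.items() if not is_present]
print(absent)
```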