Julio César Chávez Galarza
Abril de 2009Min
ho
2009
U
Universidade do Minho
Escola de Ciências
Phylogenetic relationships of the most common pathogenic Candida species inferred by sequence analysis of nuclear genes
Jul
io C
ésar
Chá
vez
Gal
arza
Ph
ylo
ge
ne
tic
rela
tio
nsh
ips
of
the
mo
st c
om
mo
n p
ath
og
en
ic
Can
dida
sp
eci
es
infe
rre
d b
y se
qu
en
ce a
na
lysi
s o
f n
ucl
ea
r g
en
es
Tese de Mestrado Mestrado em Genética Molecular
Trabalho efectuado sob a orientação da Professora Doutora Ana Paula Fernandes Monteiro Sampaio Carvalho
Julio César Chávez Galarza
Abril de 2009
Universidade do Minho
Escola de Ciências
Phylogenetic relationships of the most common pathogenic Candida species inferred by sequence analysis of nuclear genes
É AUTORIZADA A REPRODUÇÃO PARCIAL DESTA TESE,
APENAS PARA EFEITOS DE INVESTIGAÇÃO, MEDIANTE DECLARAÇÃO
ESCRITA DO INTERESSADO, QUE A TAL SE COMPROMETE.
iii
This Thesis was supported by the Programme ALβAN, the European Union Programme of High
Level Scholarships for Latin America, scholarship No E06M103915PE.
v
DEDICATORIA
A Jehová Deus por tudo
Aos meus pais Julio y Claudia pelo seu
amor e apoio em todo momento
vii
AGRADECIMENTOS
Manifesto de maneira muito especial o meu agradecimiento à Prof. Doutora Ana Paula
Sampaio, orientadora deste trabalho, por todo apoio, dedicação, determinação, constante
estímulo, e em particular por todos os ensinamentos que me transmitiu.
À Prof. Doutora Célia Pais, pelo seu apoio, ensinamentos e valiosas críticas deste
trabalho.
À Prof. Cândida Lucas, por toda a sua ajuda que me proporcionou para começar os
meus estudos em Portugal.
Ao Prof. Rui Oliveira, pelos seus conselhos e amizade.
Ao Prof. Felipe Costa, pelo seu apoio e sugestões na realização desta tese.
Ao Prof. Pedro Carvalho pela sua amizade e acolhida quando cheguei a Portugal.
Aos elementos que integram o Departamento de Biologia e, nomeadamente os seus
docentes, investigadores e técnicos, por todo o apoio demonstrado.
Aos colegas que integram o laboratório Yolanda, Alexandra, Marlene, Eugénia, Ana,
Rafael, Raquel, pelo auxilio no trabalho, pela companhia e pela amizade.
Aos colegas do Departamento de Biologia, em especial à Andreia, Sílvia, Célia, Rita,
Sofia, Emi ao Flavio e Jorge, pelos seus conselhos, pela amizade, alegria e longas
tertúlias.
Ao Cristovão pela sua ajuda na edição desta tese.
À família Alves Pinto Pacheco, que me fizeram esquecer o meu Perú, para aprender
querer Portugal.
Aos colegas do mestrado á Dany, Dulcineia, Dulce, Ana, Nádia, ao Marcos e Ricardo,
pelos bons momentos que partilhamos e apoio que me brindaram.
viii
Aos professores da Universidade Nacional de Trujillo-Perú: Dr. Manuel Fernández
Honores e Ms. C. Elmer Alvítez Izquierdo por me terem apoiado e ensinado grandes
valores e em forma muito especial ao Professor Roberto Rodríguez Rodríguez por ter
sido o meu guía durante a vida universitária e modelo de pessoa.
Aos professores Dr. Ricardo Fujita Alarcón do Instituto de Genética e Biologia
Molecular da Universidade Particular San Martín de Porres e Dr. Enrique Pérez do
Instituto de Medicina Tropical da Universidade Cayetano Heredia, pelo apoio
incondicional e confiança colocada em mim.
Aos meus irmãos Gianmarco, Michael, Lizbeth, Pilar, Emilia, Ruth, Rommel, Maira e
Milagros, pelo apoio incondicional.
Ao meu tío Miguel pela sua confiança e apoio incondicional desde o início dos meus
estudos.
Aos meus tíos Ramón, Picolín, Doris, Luz e Carlos pelo seu apoio e confiança que
sempre têm posto em mim.
À família Rodríguez Gonzales: Prof. Roberto, Sra. Rocio, Mireya, Lalo e Brandon, pelo
apoio e amizade nos momentos mais difíceis, este logro é produto da sua confiança
posta em mim.
Às famílias Salgado Rodríguez, Juárez Pita, Rebaza Rodríguez, Massé Miguel, sempre
me fizeram sentir como parte delas.
À Clarisa que sem a sua ajuda tivesse sido difícil este logro.
À Yolanda pelo amor e carinho constante que nunca acabarão.
Do mesmo modo, aos meus amigos do Perú: Carlitos, Eduardo, Miguel, Victor Hugo,
Daniel, Fibi, Ramón, Ludwig, Gabriela, Cecilia, Divián, Omar, Néstor, Marge, Sussy.
ix
RESUMO
As regiões genómicas codificantes são as mais usadas em estudos filogenéticos,
quer através de análises multigénicas, quer filogenómicas. Contudo, a maioria destas
análises usa apenas as regiões com elevada similaridade entre as sequências ortólogas,
não considerando as restantes regiões, as quais nos podem dar informações muito úteis.
Deste modo, o principal objectivo desta tese foi avaliar o uso dos genes da família
MADS-box, RLM1 e MCM1, assim como o gene IFF8, que codifica para uma proteína
GPI, em estudos de filogenia de fungos.
Foram obtidas setenta e seis sequências ortólogas para RLM1 e MCM1 e oito
para IFF8 através da pesquisa em várias bases de dados. O alinhamento das sequências
foi realizado mediante o uso dos programas CLUSTALW e MUSCLE e a análise
filogenética, utilizando os programas PHYML 3.0 e MrBayes 3.2. Os resultados obtidos
usando o factor de transcrição RLM1 indicaram que este apresenta condições favoráveis
para ser incluído em estudos filogenéticos de fungos, uma vez que a topologia obtida
está muito próxima das estabelecidas em outras análises multigénicas e filogenómicas.
O factor de transcrição MCM1 demonstrou limitações para ser utilizado em estudos
filogenéticos a nível do reino, visto que as sequências obtidas apresentam uma grande
variabilidade de tamanho, originando problemas nos alinhamentos. Apesar desta
limitação este gene pode ser utilizado, quer independentemente quer combinado com
outros genes, para resolver filogenias a nível do Filo ou de grupos de categoria inferior,
uma vez que os resultados obtidos dentro do subfilo Saccharomycotina estão de acordo
com os estudos publicados. A utilização do gene IFF8 para inferir filogenia apresentou
grandes limitações uma vez que foram identificados ortólogos deste gene apenas no
grupo CUG e além disso a filogenia obtida não estava de acordo com outros estudos
publicados, em particular no respeitante à posição de C. tropicalis na filogenia de
Candida spp.
É presentemente aceite que no Subfilo Saccharomycotina ocorreu um processo
de duplicação completa do genoma, nos grupos sensu stricto e sensu lato do ‗Complexo
Saccharomyces‘ com conservação de vários genes duplicados. A pesquisa inicial por
sequências ortólogas revelou que os factores de transcrição estudados neste trabalho
estão dentro dos genes duplicados que foram mantidos. Assim, outro objectivo deste
trabalho foi determinar se o gene RLM1 esteve sob selecção positiva no Subfilo
x
Saccharomycotina. Esta análise identificou várias posições onde os aminoácidos estão
possivelmente sob selecção positiva e embora estas substituições tenham sido
observadas em diferentes locais da proteína, não se detectou nenhuma nas regiões
conservadas. Esta observação sugere que a proteína executa uma função importante na
célula que foi mantida durante o processo de divergência de espécies. A presença de
aminoácidos sob selecção positiva, no inicio da região repetitiva do terminal carboxílico
da proteína, bem como a diferença no aminoácido repetido entre as espécies com o
genoma duplicado e as espécies com genoma não duplicado sugeriam que uma mutação
de frameshift seria a responsável pelas alterações observadas. Esta hipótese foi então
testada no Subfilo Saccharomycotina, desenhando as três grelhas de leitura e
reconstruindo a sequência ancestral. Os resultados desta análise confirmaram que uma
mutação de frameshift foi de facto responsável pelas alterações observadas na região
repetitiva com substituição de aminoácidos, diversificando a função deste gene nas
espécies que duplicaram o genoma.
Os principais resultados desta tese foram: (i) a identificação do potencial do
gene RLM1 para estudos de filogenia do reino Fungi, com especial ênfase na filogenia
das espécies do género Candida spp.; e (ii) a identificação do mecanismo molecular
responsável pela alteração da região repetitiva do terminal carboxílico da proteína,
ocorrida no Saccharomyces sensu stricto durante a divergência das espécies após
duplicação do genoma.
xi
ABSTRACT
Coding regions are used to resolve phylogenetic relationships through
multigenic and phylogenomic analyses. However, the majority of these analyses uses
regions with similarity only among orthologue gene sequences and do not take into
account other regions which could give useful information. Thus, the main purpose of
this thesis was to evaluate the use of RLM1 and MCM1 MADS-box transcription
factors, and IFF8, a GPI-anchor protein-coding gene, in fungi phylogeny.
Seventy six putative orthologue sequences for RLM1 and MCM1 and eight for IFF8
were obtained from different fungal databases. Sequence alignments were performed by
using CLUSTALW and MUSCLE and phylogeny was inferred by using PHYML 3.0
and Mr Bayes 3.2. Results obtained from the phylogenetic analysis, using the
transcription factor RLM1, indicated that it presents conditions to be considered within a
multigene analysis, since the obtained fungal phylogeny is closer to the ones established
by multigene and phylogenomic analyses. The transcription factor MCM1 presented
limitations to be used in phylogeny at the kingdom level, because of its variable
sequence sizes. However, despite this limitation this gene can be used to resolve
phylogeny at the phylum or lower clade levels since the results obtained, independently
and/or concatenated with the other genes used in this study, within the subphylum
Saccharomycotina were in agreement with published studies. On the other hand, the use
of IFF8 gene to infer phylogeny is limited and restricted to the CUG group, since other
orthologues were not found within the kingdom Fungi and the results obtained in the
CUG group phylogeny presented conflicts, particularly in the position of Candida
tropicalis which is not in agreement with previously determined relationship in Candida
phylogeny.
It is known that within the subphylum Saccharomycotina the process of genome
duplication has occurred in the ‗Saccharomyces complex‘, groups sensu stricto and
sensu lato, resulting in the duplication of some genes. The initial search for orthologue
sequences showed that the transcription factors studied in this work are within the
duplicated genes that were maintained. Thus, another objective of this work was to
determine if RLM1 was under positive selection to search for an alternative explanation
for the persistence and diversification of gene duplicates. These analyses identified
several amino acid sites under positive selection within the subphylum
xii
Saccharomycotina and although these substitutions were present in different positions,
they were not inside conserved regions, suggesting that the protein plays an important
role in fungi that was maintained during the evolution process of divergence of species.
The presence of amino acids under positive selection at the beginning of Rlm1 C-
terminal repetitive region and the differences in the amino acid under repetition between
species that presented the duplicated genome (WGD) and species with non duplicated
genome was indicative of a possible frameshift mutation. This hypothesis was tested in
Saccharomycotina group by designing the three open reading frames and reconstructing
the ancestral sequence. Results from this analysis showed that amino acid substitution
that occurred during the divergence of species changed this repetitive region and
diversified gene function in WGD species avoiding the loss of the protein function.
The major findings of this work were (i) the identification of the potential use of
RLM1 gene for inferring phylogeny in the kingdom Fungi with special emphasis to
Candida species, and (ii) the observation that this gene evolved within the
Saccharomyces sensu stricto after genome duplication, being the molecular mechanism
responsible for the change observed in the C-terminal of this protein most probably a
frameshift mutation.
xiii
RESUMEN
Las regiones genómicas codificantes son usadas en estudios filogenéticos através
de análisis a niveles multigenicos y filogenómicos. Pero la mayoría de estos análisis
usan solo regiones con similaridad entre secuencias de genes ortólogos y no toma en
cuenta otras regiones las cuales podrían darnos informaciones útiles. Por lo tanto, el
objetivo principal de esta tesis fue evaluar el uso de los genes MADS-box, RLM1 y
MCM1, así como IFF8, un gen codificante de proteína de anclaje GPI, en la filogenia de
hongos.
Se obtuvieron setenta y seis secuencias putativas ortólogas para RLM1 y MCM1,
y ocho para IFF8 a partir de las diferentes bases de datos de hongos. Los alineamientos
de secuencias se realizaron mediante el uso de los programas CLUSTALW y MUSCLE,
y la filogenia fue inferida por medio de los programas PHYML 3.0 y MrBayes 3.2. Los
resultados obtenidos de los análisis filogenéticos, utilizando el factor de transcripción
RLM1 indicaron que este presenta condiciones para ser considerado dentro de análisis
multigénico, ya que la filogenia obtenida para hongos está muy próxima a las
establecidas por otros análisis multigénicos y filogenómicos. El factor de transcripción
MCM1 presentan limitaciones para ser utilizado en filogenia a nivel de reino, debido a
sus secuencias de tamaño variables, dando lugar a problemas en el alineamiento. A
pesar de esta limitación este gen se puede utilizar para resolver filogenias a nivel de Filo
o clados inferiores debido a los resultados obtenidos con su uso, de manera
independiente y/o combinado con el resto de genes utilizados en este estudio, ya que
dentro del Subfilo Saccharomycotina estuvieron de acuerdo con estudios publicados.
Por otro lado, el uso de IFF8 gen para inferir filogenia es limitado y restringido al grupo
CUG, ya que otros ortólogos no se han encontrado en el reino Fungi y los resultados
obtenidos en la filogenia del grupo CUG presentaron conflictos, en particular en la
posición de C. tropicalis que no está de acuerdo con lo que ya se ha determinado en la
filogenia de Candida spp.
Es actualmente aceptado que dentro del subfilo Saccharomycotina el proceso de
duplicación del genoma se ha producido en los grupos sensu stricto y sensu lato del
'Complejo Saccharomyces‘, resultando en la manutención de algunos genes. La
búsqueda inicial de secuencias ortólogas mostró que los factores de transcripción
estudiados en este trabajo están dentro de los genes duplicados que han sido
xiv
mantenidos. Así pues, otro objetivo de este trabajo fue determinar si RLM1 estuvo bajo
selección positiva. Estos análisis identificaron varios sitios donde los aminoácidos
estuvieron posiblemente bajo selección positiva dentro del Subfilo Saccharomycotina y
aunque estas sustituciones estaban presentes en diferentes posiciones, no estuvieron
presentes en las regiones conservadas, lo que sugiere que la proteína ejecuta una
función importante en la célula que fue mantenida durante el proceso de divergencia de
especies. La presencia de aminoácidos bajo selección positiva en el inicio de la region
repetitiva del C-terminal en Rlm1 y la diferencia en el aminoácido bajo repetición entre
las especies con el genoma duplicado y las especies con genoma no duplicado indicaba
un posible frameshift mutation. Asi, esta hipótesis fue testada en el subfilo
Saccharomycotina diseñando tres open reading frames y reconstruyendo la secuencia
ancestral. Los resultados de este análisis mostraron que la substitución de aminoácidos
ocurrida durante la divergencia de especies que alteró esa región fue mediante una
frameshift mutation diversificando la función de este gen en las especies que duplicaron
su genoma.
Los principales resultados de esta tesis son: (i) la identificación del potencial del
gen RLM1 para estudios de filogenia en el reino Fungi, con especial énfasis en la
filogenia de las especies de Candida spp y (ii) la identificación del mecanismo
molecular responsable por el cambio observado en el C-terminal de esta proteína en el
grupo Saccharomyces sensu stricto que alteró el gen durante la divergencia de especies
después del proceso de duplicación de genoma.
xv
INDEX
DEDICATORIA .............................................................................................................. v
AGRADECIMENTOS ................................................................................................. vii
RESUMO ........................................................................................................................ ix
ABSTRACT .................................................................................................................... xi
RESUMEN ................................................................................................................... xiii
ABBREVIATION LIST ............................................................................................. xvii
FIGURE LIST .............................................................................................................. xxi
TABLE LIST .............................................................................................................. xxiii
1. INTRODUCTION ...................................................................................................... 1
1.1. Classification of organisms .................................................................................. 3
1.2. Tree of life ............................................................................................................. 5
1.3. Kingdom Fungi ................................................................................................... 12
1.4. Fungal taxonomy ................................................................................................ 15
1.5. Fungal phylogeny ............................................................................................... 16
1.6. Fungal identification .......................................................................................... 19
1.6.1. Phenotypic methods .................................................................................... 20
1.6.2. Genotypic methods ...................................................................................... 21
1.7. Methods for phylogenetic inference.................................................................. 24
1.7.1. Multiple sequence alignment ...................................................................... 25
1.7.1.1. CLUSTAL series of programs .......................................................... 25
1.7.1.2. MUSCLE ............................................................................................ 26
1.7.2. Major methods for estimating phylogenetic trees and infer phylogeny . 26
1.7.2.1. Distance methods ............................................................................... 26
1.7.2.2. Character-based methods ................................................................. 28
1.8. Adaptive evolution.............................................................................................. 30
2. THESIS BACKGROUND AND OBJECTIVES .................................................... 33
3. MATERIAL AND METHODS ............................................................................... 43
3.1. Phylogenetic analysis ......................................................................................... 45
3.1.1. Data retrieval ............................................................................................... 45
3.1.2. Alignment of sequences ............................................................................... 46
3.1.3. Selection of the best evolutionary model ................................................... 46
xvi
3.1.4. Phylogenetic reconstruction ........................................................................ 46
3.1.5. Interpretation of values considered for nodal support ............................. 47
3.2. Analysis of adaptive evolution ........................................................................... 48
3.3. Analysis of the repetitive region in the subphylum Saccharomycotina ......... 49
4. RESULTS AND DISCUSSION................................................................................ 51
4.1. Compilation of data ............................................................................................ 53
4.2. Phylogeny based on MADS-box transcrition factors ...................................... 54
4.2.1. Rlm1 transcription factor ........................................................................... 54
4.2.2. Mcm1 transcription factor .......................................................................... 60
4.2.3. Placement of fungal phylogeny based on MADS-box transcription factor
within the current literature ................................................................................. 63
4.3. Relationships within the subphylum Saccharomycotina ................................ 69
4.3.1. Phylogeny of the ‘Saccharomyces complex’ ............................................... 69
4.3.2. Phylogenetic relationships amongst Candida species ............................... 74
4.4. Analysis of adaptive evolution ........................................................................... 77
4.5. Event of frameshift mutation in S. cerevisiae RLM1 gene .............................. 82
5. FINAL REMARKS ................................................................................................... 89
6. REFERENCES .......................................................................................................... 95
7. ANNEXES ................................................................................................................ 129
xvii
ABBREVIATION LIST
AIC Akaike Information Criterion
AICc Akaike Information Criterion corrected
ACT1 Actin 1
AFLP Amplified Fragment Length Polymorphism
aLRT Approximate Likelihood Ratio Test
AP-PCR Arbitrarily Primed - PCR
ARG80 Arginine requirement 80
BEB Bayesian Empirical Bayes
BI Bayesian inference
COX2 Cythocrome oxidase 2 gene
CUG Organisms translate serine instead of leucine
CytB Cythocrome oxidase B gene
DHFR Dihydrofolate Reductase gene
DNA Desoxiribonucleic acid
dN Number of non-synonymous changes per non-synonymous site
dS Number of synonymous changes per synonymous site
EF-1α Elongation factor 1 alpha
GPI Glycosylphosphatidylinositol
HYPHY Hypothesis Testing Using Phylogenies
IFF8 Individual proteins file family F 8
ITS Intergenic Transcribed Spacer
LPS Lipopolysaccharide
LRTs Likelihood Ratio Tests
MADS MCM1-AGAMOUS-DEFICIENS-SRF
MAP1 Schizosaccharomyces pombe MCM1 homologue
MCM1 MiniChromosome Maintenance
MCMCMC Metropolis-coupled Markov Chain Monte Carlo
MEC Mechanistical-Empirical Model
MEF2 Myocite Enhancer Factor 2
xviii
MEGA Molecular Evolutionary Genetics Analysis
ML Maximum likelihood
MMR Mismatch repair system
MP Maximum parsimony
mtDNA Mithocondrial Desoxiribonucleic acid
mtLSUrDNA Mithocondrial large subunit ribosomal DNA gene
mtSSUrDNA Mithocondrial small subunit ribosomal DNA gene
MUSCLE Multiple Sequence Comparison by log-Expectation
NJ Neighbor joining
NonWGD Non Whole Genome Duplication
nucLSUrDNA Nuclear large subunit ribosomal DNA gene
nucSSUrDNA Nuclear small subunit ribosomal DNA gene
OM Outer membrane
Omp85 Outer membrane protein family 85
ORF Open Reading Frame
PAML Phylogenetic Analysis by Maximum Likelihood
PAUP Phylogenetic Analysis Using Parsimony
PCR Polymerase chain reaction
PHYML Phylogeny by Maximum Likelihood
PP Posterior Probability
RAPD Randomly Amplified Polymorphic DNA
rDNA ribosomal DNA gene
rRNA ribosomal RNA
RFLP Restriction Fragment Length Polymorphism
RLM1 Resistance to Lethality of MKK1P386 overexpression gene
RPB1 RNA polymerase II largest subunit
RPB2 RNA polymerase II second largest subunit
SAM SRF-Arg80-Mcm1
S-H Shimodaira-Hasegawa suppot
SMP1 Second MEF2-like Protein
xix
SRF Serum Response Factor
TS Thymidylate Synthase gene
Ty elements Transponsable element of yeast
UMC1 Ustilago MCM1 homologue
UPGMA Unweighted Pair Group Method with Aritmetic Mean
WGD Whole Genome Duplication
xxi
FIGURE LIST
Figure 1. Reconstruction of the Tree of Life based on transition analysis(…) ................ 7
Figure 2. Reconstruction of the eukaryote evolutionary tree(…) .................................. 10
Figure 3. Diversity of fungi(…) ..................................................................................... 14
Figure 4. Phylogeny of the kingdom Fungi(…) ............................................................. 19
Figure 5. Schematic representation of Type I (SRF-like) and Type II (MEF2-like)
protein domains in plant, animal and fungi(…) .............................................................. 38
Figure 6. Structure of Type I (SRF) and Type II (MEF2A) MADS domains in complex
with their DNA target (Homo sapiens)(…) .................................................................... 39
Figure 7. GPI anchor structure(…) ................................................................................ 40
Figure 8. Fungal phylogeny based on Rlm1 protein sequences(…) .............................. 56
Figure 9. Fungal phylogeny based on RLM1 DNA sequences(…) ............................... 58
Figure 10. Fungal phylogeny based on Mcm1 protein sequences(…) .......................... 62
Figure 11. Phylogeny of the subphylum Saccharomycotina by using the RLM1(…) ... 70
Figure 12. Phylogeny of the subphylum Saccharomycotina by using the MCM1(…) . 71
Figure 13. Phylogeny of the subphylum Saccharomycotina inferred by using
concatenated alignment of RLM1-MCM1(…) ............................................................... 72
Figure 14. Phylogenetic network reconstructed using a concatenated alignment of
RLM1-MCM1 in the subphylum Saccharomycotina(…) ............................................... 73
Figure 15. Phylogeny of CUG Group based on IFF8(…) ............................................. 75
Figure 16. Subphylum Saccharomycotina phylogeny reconstructed using concatenated
alignment of RLM1-MCM1-IFF8(…) ............................................................................ 75
Figure 17. Subphylum Saccharomycotina phylogenetic network reconstructed using a
concatenated alignment of RLM1-MCM1-IFF8(…) ...................................................... 76
xxii
Figure 18. Conserved regions in RLM1 gene of the subphylum Saccharomycotina:
MADS-box, Acidic region, A-B conserved regions present in this subphylum(…) ...... 78
Figure 19. Region of microsatellites present in some species from subphylum
Saccharomycotina(…) ..................................................................................................... 79
Figure 20. Saccharomyces cerevisiae amino acid sequence showing sites under positive
selection by the MEC model(…) .................................................................................... 81
Figure 21. The three RLM1 open reading frames of Candida albicans, Kluyveromyces
lactis and Saccharomyces cerevisiae visualized with the Virtual Ribosome 1.0(…) ..... 83
Figure 22. (A) Evolution of MADS-box genes based on Alvarez-Buylla et al. (2000)
model, proposing a common origin for Animals, Fungi and Plants and a further
duplication event in the Fungi lineage in some species within the subphylum
Saccharomycotina (WGD group). (B) Comparison of Rlm1 protein/gene domains in S.
cerevisiae and C. albicans, showing the frameshift mutation ocurred after genome
duplication in S. cerevisisae abd WGD species that changed the coding amino acid from
Q to N .............................................................................................................................. 85
Figure 23. Predicted putative ancestral RLM1 gene sequence from which CUG, WGD
and Non-WGD groups diverged(…) ............................................................................... 87
xxiii
TABLE LIST
Table 1. Evolution in the classification of living organisms throughout time. ................ 4
Table 2. The six kingdoms of life .................................................................................... 5
Table 3. Phyla of the kingdom Bacteria ........................................................................... 8
INTRODUCTION
INTRODUCTION
3
1. INTRODUCTION
1.1 Classification of organisms
The classification of living organisms has always been a controversial subject.
Attempts aiming at classifying living organisms were performed from the dawn of
recorded history, however early classification systems were artificial and based
primarily on habit and/or characteristics important to humans (i.e., medicines, food).
The earliest known system of classification, from which current systems of
classifying forms of life descend, comes from the thought presented by the Greek
philosopher and scientist Aristotle (384-322 BC), who published in his metaphysical and
logical works the first known classification of everything whatsoever, or "being". This
was the scheme that gave moderns such words as substance, species and genus and was
retained in a modified and less general form by Linnaeus. Aristotle‘s system classified
all living organisms known at that time as either a plant or an animal on the basis of
movement (air, land, or water), feeding mechanism, growth patterns, mode of
reproduction and possession or lack of red blood cells. Microscopic organisms were
unknown.
In 1735, Swedish naturalist Carolus Linnaeus formalized the use of two latin names
to identify each organism, a system called binomial nomenclature, in his great book the
Systema Naturae that ran through twelve editions during his lifetime. In his work,
nature was divided into three kingdoms: mineral, vegetable and animal (Table1).
Linnaeus grouped closely related organisms and introduced the modern classification
groups: class, order, family, genus, and species.
The german biologist Ernst Haeckel (1866) proposed a third kingdom, Protista, to
include all single-celled organisms. Some taxonomists also placed simple multicellular
organisms, such as seaweeds, in the kingdom Protista. Bacteria were also placed within
Protista as a separate group called Monera.
In 1938, American biologist Herbert Copeland was responsible for proposing the
group Monera as a fourth kingdom, which included only bacteria, separating it from
kingdom Protista. This was the first classification proposal to separate organisms
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
4
without nuclei, the prokaryotes, from organisms with nuclei, called eukaryotes, at the
kingdom level.
Table 1. Evolution in the Classification of Living Organisms throughout time.
In 1969, American biologist Robert Whittaker proposed a fifth kingdom, Fungi,
based on fungi's unique structures and methods of obtaining food. Fungi do not ingest
food as animals do, nor do they make their own food, as plants do; rather, they secrete
digestive enzymes around their food and then absorb it into their cells.
Woese and Fox (1977), divided the kingdom Monera in two new kingdoms:
Eubacteria and Archeobacteria. Then, Carl Woese et al. (1990) proposed a new
category, called Domain, to reflect evidence from nucleic acid studies that more
precisely reveal evolutionary or family relationships. He suggested three domains,
Archaea, Bacteria, and Eukarya, based largely on studies of the small subunit rRNA.
Recently, the system of classification has been modified well supported by
molecular and cellular studies due to zoologist Thomas Cavalier-Smith (1981) with the
proposal of a sixth kingdom of life: the Chromista. In 2004, Chromista was accepted as
kingdom and the term Domain was changed to Empire, with the proposal of two:
Prokaryota and Eukaryota (Table 2).
Linnaeus 1735
2
Kingdoms
Haeckel 1866
3
Kingdoms
Copeland 1938
4
Kingdoms
Whittaker 1969
5
Kingdoms
Woese and Fox 1977
6
Kingdoms
Woese et
al. 1990
3 Domains
Cavalier-
Smith 2004
2 Empires
6
Kingdoms
Vegetabilia Protista Monera Monera Eubacteria Bacteria Prokaryota
Archaebacteria Archea Bacteria
Protista Protista Protista Eukarya Eukaryota
Animalia Plantae Protozoa
Plantae
Fungi Fungi Chromista
Animalia Plantae Plantae Fungi
Animalia Animalia Animalia Plantae
Animalia
INTRODUCTION
5
Table 2. The six kingdoms of life (Cavalier-Smith, 2004)
1.2 Tree of Life
The new insights coming from molecular sequence studies and their integration with
numerous other lines of evidence, genetic, structural and biochemical have allowed an
exact classification of organisms. Based on these facts, researchers are reconstructing
the tree of life, and trying to place correctly the root of the evolutionary tree of all life
what would enable us to deduce rigorously the major characteristics of the last common
ancestor of all. It is probably the most difficult problem of all in phylogenetics, not yet
solved and the most important to solve correctly because the result colours all
interpretations of evolutionary history, influencing ideas of which features are primitive
or derived and which branches are deeper and more ancient than others (Cavalier-Smith,
2002a; Gribaldo and Philippe, 2002).
During a long time, researchers proposed that the root of the tree of life came
from ancestral species of Archaebacteria, giving origin to other groups and ultimately to
the formation of superior multicellular organisms. Other attempts of rooting the tree of
life used protein paralogue trees that theoretically placed the root, but these results were
contradictory due to tree-reconstruction artefacts or to poor resolution (Cavalier-Smith,
2002a). Studies based on ribosome-related and DNA-handling enzymes suggested one
root between Neomura (Eukaryotes plus Archaebacteria) and Eubacteria (Iwabe et al.,
1989), whereas metabolic enzymes place the root within Eubacteria but in contradictory
places (Peretó et al., 2004). Palaeontology shows that eubacteria are much more ancient
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
6
than Eukaryotes (Cavalier-Smith, 1987; Douzery et al., 2004; Roger and Hug, 2006),
with phylogenetic evidence that archaebacteria are sisters not ancestral to Eukaryotes,
implying that the root is not within the Neomura.
The first fully resolved prokaryote tree was inferred, by using 13 major
biological transitions within eubacteria (Figure 1 and Table 3).This tree showed a basal
stem comprising the new infrakingdom Glidobacteria (Chlorobacteria, Hadobacteria,
Cyanobacteria), which is entirely non-flagellate, and two derived branches comprising
the infrakingdom Gracilicutes and the subkingdom Unibacteria that diverged
immediately following the origin of flagella (Cavalier-Smith, 2006a).
Proteasome evolution, a proteic structure responsible for degradating unneeded
or damaged protein, indicated that the universal root would be outside the clade
comprising Neomura and Actinomycetales (proteates), and thus lying within other
eubacteria, contrary to the widespread assumption that it was between Eubacteria and
neomura. Cell wall and flagellar evolution independently located the root outside
Posibacteria (Actinobacteria and Endobacteria), and thus among Negibacteria, with two
membranes, favouring a transition from de Negibacteria to Posibacteria and not from
Posibacteria to Negibacteria as traditionally assumed. Posibacteria were derived from
Eurybacteria and ancestral to Neomura.
Studies based on the comparison of new amino acid insertions in the RNA
polymerase gene strongly favor the monophyly of Gracilicutes which contain
Proteobacteria, Planctobacteria, Sphingobacteria and Spirochaetes (Iyer et al., 2004).
Evolution of the negibacterial outer membrane places the root within Eobacteria
(Hadobacteria and Chlorobacteria both primitive without lipopolysaccharide), as all
phyla possesses the outer membrane β-barrel protein Omp85, and Chlorobacteria, the
only Negibacteria without Omp85, or within Chlorobacteria (Cavalier-Smith, 2006a).
A negibacterial root also fits the fossil record, which shows that Negibacteria are
more than twice as old as Eukaryotes (Cavalier-Smith, 2002a; Cavalier-Smith, 2006b).
Posibacteria, Archaebacteria and eukaryotes were probably all ancestrally heterotrophs,
whereas Negibacteria are likely to have been ancestrally photosynthetic and diversified
by evolving all the known types of photosystems and major antenna pigments.
INTRODUCTION
7
Figure 1. Reconstruction of the Tree of Life based on biological transitions analysis. Thumbnail sketches
show major variants in cell morphology (microtubular skeleton red, peptidogglycan wall
brown, outer membrane blue). The most likely root position is as shown; the possibility that it
may lie within Chlorobacteria instead cannot yet be ruled out. Lowest level groups including or
consisting enterily of photosynthetic organisms are in green or purple. The frequently
misplaced hyperthermophilic eubacteria are in red (Adapted from Cavalier-Smith, 2006a).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
8
Table 3. Phyla of the kingdom Bacteria (Cavalier-Smith, 2006a). *Classification proposed by author.
The last ancestor of all life forms would have been a eubacterium with acyl-ester
membrane lipids, large genome, murein peptidoglycan walls, and with a fully developed
eubacterial biology and cell division. It would be a non-flagellate negibacterium with
two membranes, probably a photosynthetic green non-sulphur bacterium with relatively
primitive secretory machinery, not a heterotrophic posibacterium with one membrane.
As Negibacteria are the only prokaryotes that use sunlight to fix carbon dioxide this is
also the only position that would have allowed the first ecosystems to have been based
on photosynthesis, without which extensive evolution might have been impossible
(Cavalier-Smith, 2002a; 2006a).
INTRODUCTION
9
Ancestral neomura would have had their ancestral origin in an Arabobacteria
(Actinobacteria). This ancestral would have been a thermophilic bacterium with
cotranslational protein secretion, N-linked glicosylation, glycoprotein instead of murein,
histones ¾ instead of DNA gyrase. This neomuran ancestor diverged sharply into two
contrasting lineages. One formed a glycoprotein wall and became hyperthermophilic,
evolving prenyl ether lipids and losing many eubacterial genes, as H1 histones, to form
the Archaebacteria. The other became much more radically changed by using its
glycoproteins as a flexible surface coat, evolving phagotrophy, an endomembrane
system, endoskeleton, be uniciliate, unicentriolar aerobic zooflagellate and nucleus and
enslaving an a-proteobacterium as a protomitochondrion to become the first eukaryote
(Cavalier-Smith, 2000; 2002b).
Eukaryotes comprise the basal kingdom Protozoa and four derived kingdoms:
Animalia, Fungi, Plantae, and Chromista (Cavalier-Smith, 1998). Some thoughtful 19th
century protozoologists suspected that flagellates preceded amoebae, as just asserted.
But until DNA sequencing and molecular systematics burgeoned in the 1980s and
1990s the often more popular idea of Haeckel that amoebae came first and flagellates
were more advanced could not be disproved. However, it is now known that the primary
diversification of eukaryote cells took place among zooflagellates with non-
photosynthetic predatory cells having one or more flagella for swimming, and often also
generating water currents for pulling in preys (Cavalier-Smith, 2006), from which two
groups diverged, the bikonts and the unikonts that incorporate all eukaryotic kingdoms
(Figure 2).
Plantae, Chromista, three protozoan infrakingdoms (Alveolata, Rhizaria,
Excavata), and the small zooflagellate phylum Apusozoa are all clearly ancestrally
biciliate and together constitute a clade designated the bikonts (Cavalier-Smith, 2002b).
Ancestral bikonts had two diverging cilia: the anterior undulated for swimming or prey
attraction, and the posterior for gliding on surfaces in many groups (likely its ancestral
condition), but secondarily adapted for swimming in other groups (or quite often was
lost to make secondary uniciliates, not to be confused with genuine unikonts, which
confusingly are sometimes secondarily biciliate). All major eukaryote groups except
Ciliophora included severely modified organisms that lost cilia; some became highly
amoeboid. The major shared derived character for all groups (not yet clearly
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
10
demonstrated for Rhizaria) is the ciliary transformation in which the anterior
cilium/centriole and its associated roots are always the first formed during the cellular
division by bipartition; in the next cell cycle the cillium often undergo marked changes
in structure and function to become the corresponding posterior organelles (Moestrup,
2000).
Figure 2. Reconstruction of the eukaryote evolutionary tree. Taxa in black comprise the basal kingdom
Protozoa. The thumbnail sketches show the contrasting microtubule skeletons of unikonts and
bikonts in red. The dates highlighted in yellow are for the most ancient fossils known for each
major group. Major innovations that help group higher taxa are shown by bars. Four major cell
enslavements to form cellular chimaeras are shown by heavy arrows (enslaved bacteria) or
dashed arrows (enslaved eukaryote algae) (Cavalier-Smith, 2006c).
An independent derived character for bikonts is the fusion between thymidylate
synthase (TS) and dihydrofolate reductase (DHFR) genes to encode a single
bifunctional chimaeric protein. This fusion appears to have taken place in the common
ancestor of the ancestral bikont after their invertion in the cenancestral eukaryote
(Stechmann and Cavalier-Smith, 2002; 2003).
Based on these facts, a new branching for the bikont group was proposed, in
which the zooflagellate Phylum Apusozoa may be a clade sister of the Excavata. In an
another group, it was observed that the clade chromalveolates was formed by the
kingdom Chromista and the protozoan Infrakingdom Alveolata, which diverged from a
common ancestor that enslaved a red algae and evolved novel plastid protein-targeting
INTRODUCTION
11
machinery via the host rough endoplasmatic reticule (ER) and the enslaved algal plasma
membrane (periplastid membrane). kingdom Plantae, whose ancestor enslaved a
cyanobacteria to form chloroplasts, may be sister of an ancestral to chromalveolates;
Rhizaria and Excavata (jointly Cabozoa) are probably sisters if the formerly green algal
plastid of euglenoids and chloarachneans (Cercozoa) was enslaved in a single event
named symbiogenesis secondary in their common ancestor (Cavalier-Smith, 2002c;
2003).
The TS and DHFR genes are separately translated in Sarcomastigota, Animalia
and Fungi. These three taxa are referred to as unikonts because it is argued that their
common ancestor probably had only a single centriole and cilium per kinetid (Cavalier-
Smith, 2002b). This unikont state is found in most distinct amoebozoan. On the other
hand, unikonts share the same myosin II as in our muscles, being absent from all
bikonts (Cavalier-Smith, 2006c). Unikont is argued to be the ancestral state for
Amoebozoa (Cavalier-Smith, 2002b; Cavalier-Smith et al., 2004) as only myxogastrids
and former protostelids arguably related to them have bicentriolar kinetids. The
bicentriolar state of myxogastrids and many Choanozoa is developmentally different
from that of bikonts, as their anterior cilium remains anterior in successive cell cycles
and does not transform into a posterior one. Thus, it is arguably not homologous to the
bikont state with true ciliary transformation. A second gene fusion involving the first
three enzymes of pyrimidine biosynthesis (carbamoyl phosphate synthase,
dihydrorotase and aspartate carbamoyltransferase) is apparently a shared derived
character for unikonts, absent from bikonts and the ancestral prokaryotes (Stechmann
and Cavalier-Smith, 2003). As this involved two simultaneous fusion events, it is even
less likely to ever have been reversed than the DHFR/TS fusion. If none of these gene
fusions has ever been reversed during evolution, then they together indicate that the root
of the eukaryote tree cannot lie within unikonts or bikonts, but must lie between these
two clades (Stechmann and Cavalier-Smith, 2003).
Earlier structural and molecular evidence would support the idea that Animalia,
Choanozoa and Fungi together form a clade, characterized ancestrally by a single
posterior cilium with a bicentriolar kinetid and flat mitochondrial cristae (Cavalier-
Smith, 1987b). An exclusive grouping of animals and fungi was first noted in
evolutionary trees of small subunit ribosomal RNA analysis (Wainright et al., 1993) and
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
12
proteins such as Ef-1α, Act1 and α-β tubulin (Baldauf and Palmer, 1993). This grouping
was subsequently designated as ‗‗Opisthokont‘‘ (Cavalier-Smith and Chao, 1995) and is
now strongly supported by phylogenetic analyses of all taxonomically well-represented
and well-characterized molecular data sets. These include concatenated multigene
analyses (Baldauf et al., 2000; Bapteste et al. 2001; Lang et al., 2002) as well as many
single-gene trees (Baldauf, 1999; Inagaki and Doolittle, 2000; Van de Peer et al., 2000).
Recent molecular phylogenetic evidence has indicated that animals and fungi evolved
independently from different unicellular protozoan choanozoan ancestors (Steenkamp et
al., 2006; Shalchian-Tabrizi et al., 2008).
The other clade that forms part of the Opisthokonts is the Phylum Choanozoa.
Extant Choanozoa are either a clade that is sister to Animalia or a paraphyletic group
ancestral to them (Cavalier-Smith and Chao, 2003; Rokas et al., 2003), and within the
Choanozoa only the Nucleariid appears to be the closest sister taxon to Fungi
(Steeankamp et al., 2006). A sister relationship between animals and choanoflagellates
at least is supported by the probable gene fusion that generated the receptor tyrosine
kinase from a cytoplasmic kinase and calcium-binding epidermal growth factor in their
common ancestor after it diverged from Amoebozoa (King and Carroll, 2001). The
Opisthokont clade is also supported by a shared derived 11–17 amino acid insertion in
protein synthesis elongation factor EF-1α absent from prokaryotes and all other
eukaryotes (Baldauf, 1999).
1.3 Kingdom Fungi
Fungi share a long history with human civilization. References in Greek literature,
mushroom stones from Mesoamérica dating back to 1000-300 B.C. (Lowy, 1971), and
dried mushrooms of Piptoporus betulinus found in a pouch around a Stone Age man´s
neck in the Alps (Rensberger, 1992) all attest to this long relationship. These make up
one of the major clades of life. Roughly 80000 species of fungi have been described, but
the actual number of species has been estimated at approximately 1.5 million
(Hawksworth, 1991; 2001). This number may yet underestimate the true magnitude of
fungal diversity (Persoh and Rambold, 2003).
For a long time, fungi were regarded as a single kingdom belonging to the
aforementioned five-kingdom scheme. However, the organisms usually considered to be
INTRODUCTION
13
fungi are very complex and diverse. Their recognition, delimitation and typification
have been formidable problems since the discovery of these organisms. They can
present chitin or not, multicellular and filamentous absorptive forms and unicellular
assimilative forms, among others, and can reproduce by different types of propagules or
even by fission (Alexopoulos et al., 1996; Guarro et al., 1999) (Figure 3). Despite this
great variability in form and function, systematic studies based upon morphology and
life cycle produced, by the middle 20th century, a manageable number of major phyla;
and the use of molecular approaches have actually determined seven phyla well defined
within the kingdom Fungi (Hibbet et al., 2007).
Fungi play a critical role impacting nearly all other forms of life in virtually all
ecosystems as either friend or foe. Saprotrophic fungi are important in the environment
in the cycling of nutrients through the decomposition of organic material, especially the
carbon that is sequestered in wood and other plant tissues. Mutualistic symbiont fungi
through relationships with prokaryotes, plant (including algae) and animals have
enabled a diversity of other organisms to exploit novel habitats and resources. Indeed,
the establishment of mycorrhizal associations may be a key factor that enabled plants to
make the transition from aquatic to terrestrial habitats. Other group are pathogenic and
parasitic Fungi that attack virtually all groups of organisms, including bacteria, plants,
other Fungi, and animals, including humans. The economic impact of such fungi is
massive either beneficial by the production of antibiotics or extremely detrimental by
the devastating impacts in plant diseases, mycotoxins and mycosis (Pirozynski and
Malloch, 1975; Moss, 1987; Alexopoulos et al., 1996; Blackwell, 2000).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
14
Figure 3. Diversity of fungi. Microsporidia (Nosema apis (A), Octosporea bayeri (B));
Blastocladiomycota (Flammulina velutipes (C)); Neocallimastigoycota (Caecomyces
communis (D)); Chytridiomycota (Batrachochytrium dendrobatidis (E)); ‗Zygomycota‘-
Mucoromycotina (Phycomyces blakesleeanus (H), Mucor circinelloides (G));
Glomeromycota (Glomus coronatum (J)); Basidiomycota (Agaricus xanthodermus (F),
Cryptococcus neoformans (I), Ustilago maydis (K)); Ascomycota (Microsporum gypseum
(L), Aspergillus niger (M), Candida albicans (N), Blastomyces dermatitidis (O)).
INTRODUCTION
15
1.4 Fungal taxonomy
Historically the monophyletic Fungi have been classified into four phyla:
Chytridiomycota, Zygomycota, Basidiomycota, and Ascomycota (Barr, 1992;
Hawksworth et al., 1995; Alexopoulos et al. 1996). However, current studies have
shown that such a simple classification does not represent the phylogeny of those
organisms, being currently recognized the phyla Microsporidia, Blastocladiomycota,
Neocallimastigomycota, Chytridiomycota, Glomeromycota, Basidiomycota and
Ascomycota. Presently, the phylum named Zygomycota is still to define due to the
pending resolution of relationships among the clades that have traditionally been placed
within this phylum (Hibbet et al., 2007).
The current fungal classification accepts one kingdom, one subkingdom, seven
phyla, ten subphyla, 35 classes, 12 subclasses, and 129 orders. The clade containing the
Ascomycota and Basidiomycota is classified as the subkingdom Dikarya (James et al.,
2006), reflecting the putative synapomorphy of dikaryotic hyphae (Tehler, 1988). All of
the other new names are based on automatically typified teleomorphic names. In
Basidiomycota, the clades formerly called Basidiomycetes, Urediniomycetes, and
Ustilaginomycetes in the last edition of Ainsworth & Bisby‘s Dictionary of the Fungi
are now called the Agaricomycotina, Pucciniomycotina, and Ustilaginomycotina,
respectively (Bauer et al., 2006). This was done to minimize confusion between taxon
names and informal terms (basidiomycetes is a commonly used informal term for all
Basidiomycota) and to refer to the included genera Agaricus (including the cultivated
button mushroom) and Puccinia (which includes barberry-wheat rust). Another
significant change in the Basidiomycota classification has been the inclusion of the
Wallemiomycetes and Entorrhizomycetes as classes incertae sedis within the phylum,
reflecting ambiguity about their higher-level placements (Matheny et al., 2007a).
The most dramatic changes in the classification concern the ‗basal fungal
lineages‘, which include the taxa that have traditionally been placed in the Zygomycota
and Chytridiomycota. These groups have long been recognized to be polyphyletic,
based on analyses of rRNA, EF-1α, and RPB1 (James et al., 2000; Nagahama et al.,
1995; Tanabe et al., 2004, 2005). Recent multilocus analyses now provide the sampling,
resolution, and support necessary to structure new classifications of these early-
diverging groups, although significant questions remain (Liu et al., 2006; Lutzoni et
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
16
al.,2004; James et al., 2006). The Chytridiomycota is retained in a highly restricted
sense, including Chytridiomycetes and Monoblepharidomycetes. The Blastocladiales, a
traditional member of the Chytridiomycota, is now treated as a phylum, the
Blastocladiomycota (James et al., 2007). The Neocallimastigales, whose distinctiveness
from other chytrids has long been recognized, is also elevated to phylum, based on both
morphology and molecular phylogeny (Hibbett et al., 2007). The genera
Caulochytrium, Olpidium, and Rozella, which have traditionally been placed in the
Chytridiomycota, and Basidiobolus, previously classified in the Zygomycota
(Entomophthorales), are not included in any higher taxa in the current classification,
pending more definitive resolutions of their placements. The phylum Zygomycota has
not been accepted in the current classification, and its resolution is pending due to the
relationships among the groups that were traditionally part of this phylum. The
traditional Zygomycota were distributed among the phylum Glomeromycota and four
subphyla incertae sedis, including Mucoromycotina, Kickxellomycotina,
Zoopagomycotina and Entomophthoromycotina (Hibbett et al., 2007). Microsporidia,
unicellular parasites of animals and protists with highly reduced mitochondria, has been
included as a phylum of the Fungi, based on multigenic analyses (Germot et al., 1997;
Hirt et al., 1997; Peyretaillade et al., 1998; Keeling et al., 2000; Gill and Fast, 2006;
James et al., 2006; Liu et al., 2006).
1.5. Fungal phylogeny
Until recently, the fungal phylogenetics has been well served by phenotypic
characters such as morphological comparison, cell wall composition, cytologic testing,
ultrastructure, cellular metabolism and fossil records (Le`John, 1974; Taylor, 1978;
Heath, 1986; Hawksworth et al., 1995). However, higher-level relationships amongst
these groups are less certain and are best elucidated using molecular techniques. More
recently, the advent of cladistic and molecular approaches has changed the existing
situation and provided new insights into fungal evolution. Genotypic analysis offers a
straightforward means of resolving evolutionary questions where phenotypic characters
mentioned are absent or contradictory.
Many loci have been used by mycologists for evolutionary studies at single-gene
level, but few of these loci revealed to be appropriated to resolve relationships among
the main lineages of the Fungi. Even when trees are inferred by using multiple loci, the
INTRODUCTION
17
phylogenetic signal may be limited strongly by the loci selected. Phylogenetic studies
indicate that more than 83.9% of fungal phylogenies are based exclusively on sequences
from the ribosomal RNA tandem repeats. The few protein-coding genes that have been
sequenced for phylogenetic studies of fungi have demonstrated clearly that such genes
can contribute greatly to resolving deep phylogenetic relationships with high support
and/or increase support for topologies inferred by using ribosomal DNA genes (Liu et
al., 1999). Studies using other protein-coding genes such as DNA topoisomerase II,
Actina-1 (ACT1), first large subunit of RNA polymerase (RPB1), Citochrome oxidase 2
(COX2) and Citochrome oxidase B (Cyt B) combined with traditional genes used in
phylogeny are already being used to infer fungal phylogeny (Kurtzman and Robnett,
2003; Diezmman et al., 2004; Lutzoni et al., 2004; Reeb et al. 2004; Wang et al., 2004;
Liu et al., 2006; Matheny et al., 2007b).
Until 2004, genome databases revealed that 21075 ITS, 7990 nucSSU, 5373
nucLSU, 1991 mitSSU, and 349 RPB2 sequences were available and the number is
continuosly increasing (Lutzoni et al., 2004). As impressive as these numbers is the
collective effort to generate DNA sequence data for the Fungi but none of these loci
alone can resolve the fungal tree of life with a satisfactory level of phylogenetic
confidence (Kurtzman and Robnett, 1998; Tehler et al., 2000; Berbee, 2001; Binder and
Hibbett, 2002; Moncalvo et al., 2002; Tehler et al., 2003). Combining sequence data
from multiple loci is an integral part of large scale phylogenetic inference and is central
to assembling the fungal tree of life. However, most published phylogenetic studies
have restricted their analyses to one locus maximizing the number of fungal taxa. To
quantify this observation, 560 publications reporting published fungal phylogenetic
trees were surveyed from 1990 through 2003. Of the 595 trees considered in those
studies, 489 (82.2%) were based on a single locus, only 77 trees were based on two
combined loci, 19 on three combined loci, and 10 on four or more combined loci. Seven
of the latter 10 studies were restricted to closely related species or strains within a
species (Lutzoni et al.; 2004). Exceptions included studies with 93 species representing
most major clades of Homobasidiomycetes (Binder and Hibbett, 2002), with 15 species
representing 10 orders (Binder et al., 2001), and with 45 species representing nine
orders (Hibbett and Binder, 2001).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
18
The largest phylogenetic tree based on one locus included 1551 nucSSU
sequences representing 60 orders (Tehler et al., 2003). The largest multilocus trees
included 162 ITS + ß-tubulin sequences representing a single order of Fungi (Stenroos
et al., 2002); 158 species representing 10 orders based on nucSSU, nucLSU, and
mitSSU (Hibbett et al., 2000); 110 species in a single order sequenced for ITS and
nucLSU (Peterson, 2000); and 108 nucSSU + nucLSU sequences representing 19 orders
of Fungi (Miadlikowska and Lutzoni, 2004). A more recent phylogenetic study directed
toward resolving the fungal tree of life based on six combined loci, included members
from all four traditionally recognized phyla of Fungi (Ascomycota, Basidiomycota,
Chytridiomycota, and Zygomycota), and two new phyla Glomeromycota and
Microsporidia (James et al., 2006).
Although much effort has been invested in defining orders (compared to
families, for example), few studies have focused on resolving relationships among
orders of Fungi: 354 of the 595 trees examined (59.5%) conveyed relationships within
single orders. The largest number of orders considered in a single study (N=62) resulted
in a tree based only on nucSSU data (Tehler et al., 2000). The fungal trees based on
combined data from multiple loci and encompassing the largest number of orders
included 38 species representing 25 orders (Bhattacharya et al., 2000), 52 species
representing 20 orders (Lutzoni et al., 2001), and 108 species representing 19 orders
(Miadlikowska and Lutzoni, 2004). All of these studies focused on ascomycetes and
were based on nucSSU and nucLSU rDNA. An exceptional study covering 16 orders of
fungi (34 species) using a combined analysis of two protein-coding genes (α and ß-
tubulin) was carried out to infer the phylogenetic placement of Microsporidia within
kingdom Fungi (Keeling, 2003). The major fungal phylogenomic study based on 42
complete genomes was carried out by using all conserved regions of genes, resolving
deep phylogenetic relationship of the majority of known taxa. The major drawback of
this study is the low number of sequenced fungal genomes being an obstacle to resolve
basal taxa in some clades (Fitzpatrick et al., 2006).
Figure 4 shows a fungal phylogeny based on genetic and geologic data, which
present the five main major groups of fungal organisms that have their genomes
sequenced or in progress, but uses the previous taxonomic denomination. In this tree, it
is evident that the Ascomycetes and Basidiomycetes are sister groups (subkingdom
INTRODUCTION
19
Dykaria), and that there is a close relationship of Glomeromycetes to the subkingdom
Dykaria, which diverged about 600 myrs ago. The basal groups in the kingdom Fungi
are Chytrids and Zygomycetes. Currently, studies for determining the correct placement
of species in these groups are being carried out, since some of these produced paraphyly
(Chytrids) or poliphyly (Zygomycetes) in the fungal phylogeny. Due to this result, the
species from Zygomycetes have been reclassified (Hibbet et al., 2007).
Figure 4. Phylogeny of the kingdom Fungi. Major fungal groups colored as indicated by text at the right.
Diamonds indicate evolutionary branch points, and their approximate dating (bottom), captured
by fungal genomes sequenced in or in progress (modified Berbee and Taylor, 2001).
1.6. Fungal identification
The identification of fungi to species level and also the distinction of
morphologically identical strains of the same species has been a challenge for the
mycologists. Hence, most species have received only limited study, and the
classification has been mainly traditional and based on readily observable
morphological features. Other features besides morphology, such as susceptibility to
chemicals and antifungal drugs, physiological and biochemical tests, production of
secondary metabolites, ubiquinone systems, fatty acid composition, cell wall
composition, protein composition and molecular approaches, have been used in
classification and also in identification of fungal species (Guarro et al., 1999).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
20
1.6.1. Phenotypic methods
The classification and identification of fungi using phenotypic methods relies
mainly on morphological criteria. The classical light microscopic methods have been
enhanced by Nomarski differential interference contrast, fluorescence, cytochemistry,
and the development of new staining techniques such as those for ascus apical structures
(Romero and Winter, 1988). Unfortunately, during infections most pathogenic fungi
show only the vegetative phase (absence of sporulation), in host tissue only hyphal
elements or other nonspecific structures are observed. Although the pigmentation and
shape of these hyphae and the presence or absence of septa can give us an idea of their
identity, fungal culture is required for accurate identification. For the identification and
classification of these fungi, the type of conidia and conidiogenesis are considered the
most important sets of characteristics to be observed (Cole and Samson, 1979; Guarro et
al., 1999). In the rare instances that opportunistic fungi develop the teleomorph in vitro
(this happens in numerous species of Ascomycota and in a few species of Zygomycota
and Basidiomycota), there are many morphological details associated with sexual
sporulation which can be extremely useful in their classification. The type of fruiting
body and type of ascus (structure that contain ascospores produced after meiosis) are
vital for classification. Shape, color, and the presence of an apical opening (ostiole) in
the fruiting bodies are important features in the recognition of higher taxa (Guarro et al.,
1999). In recent years, morphological techniques have been influenced by modern
procedures, which allow more reliable phenotypic studies to be performed. Numerical
taxonomy, effective statistical packages, and the application of computer facilities to the
development of identification keys offer some solutions and the possibility of a
renaissance of morphological studies.
Besides the use of morphological identification, other approaches have been
proposed to identify fungi, which are: a) Biotyping, which is based on the use of
physiological and biochemical test such as the commercially avaliable kits, API and
VITEK systems, and selective media as CHROMOAGAR have been used to identify
unicellular fungi (San-Millan et al., 1996; Bernal et al., 1998; Graf et al., 2000; Ballesté
et al., 2005), b) Serotyping, a method based on aglutination reaction between antigens
and policlonal antibodies, used for pathogenic yeast typing as Candida spp,
Cryptococcus neoformans at the strain level (Evans et al., 1950; Haesenclever and
Mitchell, 1961), c) Production of secondary metabolites, which consists in the
INTRODUCTION
21
identication of fungi by the metabolites that they produce (Carlile and Watkinson, 1994;
Frisvad, 1994; Whaley and Edwards, 1995), d) Cell wall composition, based on studies
of structure and composition of the cell wall of fungi, delimiting high taxa (Bartnicki-
Garcia, 1987; Carlile and Watkinson, 1994), and e) Isoenzymatic analysis, that enable
the differentiation between species by the electrophoretic pattern of specific enzyme
activity (Busch and Nitscko, 1999; Soll, 2000). All of these techniques have presented
limitations in the identification of fungal species, being some of them restricted to some
species or fungal group. Hence, the use of techniques that included studies at the
genomic level may resolve these disadvantages.
1.6.2. Genotypic methods
Methods based on studies directly in DNA are within the genotypic methods. The
electrophoretic karyotyping is a method, which use pulsed field gel electrophoresis to
assess variations of chromosome sizes and the precise determination of karyotypes,
allowing the discrimination at the strain and species level. The electrophoretic
karyotyping consist in the migration of chromosomes under the influence of electric
fields of alternating orientation to move intact chromosomes according to size through
the agarose gel matrix, which can be visualized by ethidium bromide staining (Guarro et
al., 1999; Taylor et al., 1999). Electrophoretic karyotyping has been used to fingerprint
different fungal species as Candida spp, Micromucor spp, Paracoccidioides
brasiliensis, Pyrenophora spp (Dib et al., 1996; Nogueira et al., 1998; Aragona et al.,
2000; Klempp-Selb et al., 2000; Shin et al., 2001, Nagy et al., 2004).
One of the first DNA fingerprinting methods used to assess strain relatedness and
taxonomy in fungi was restriction fragment analysis, or restriction fragment length
polymorphism (RFLP) comparisons, without probe hybridization. DNA is extracted,
digested with one or more restriction endonucleases to sample short pieces of DNA
sequence. Then, the fragments are separated by electrophoresis in an agarose gel. The
banding pattern of digested DNA is then visualized, usually by staining with ethidium
bromide. The obtained band pattern is based on different fragment lengths determined
by the restriction sites identified by the particular endonuclease employed (Taylor et al.,
1999; Soll, 2000). Any region of DNA can be used for RFLP analysis if variation
(alleles) can be visualized, directly due to multiple copies (mtDNA, rDNA) or indirectly
either by hybridization with a probe or by PCR amplification. RFLPs were the first
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
22
DNA markers used for fungal evolutionary biology. The band patterns can be tabulated
and compared, and phenetic trees constructed (Edel et al., 1996). Critics of RFLP
analysis note that while restriction sites in different individuals are most likely to be
identical by descent that is not the case for missing sites, because it is easier to lose a
site than to gain one. There are many ways in which a restriction endonuclease site can
be lost: any of the several nucleotides in the recognition sequence can be substituted, or
the site can suffer length mutations. Missing sites that have arisen, unknowingly, by
different routes confound evolutionary analysis (Taylor et al., 1999).
Randomly amplified polymorphic DNA (RAPD) analysis or arbitrarily primed PCR
(AP-PCR) analysis is similar to RFLP analysis in that it assays DNA sequence variation
in short regions, but instead of analyzing restriction endonuclease recognition
sequences, it focuses on PCR priming regions (Williams et al., 1990). Using random
primers of approximately 10 bases and a low annealing temperature, amplicons
throughout the genome are targeted and amplified. Amplified products are separated on
an agarose gel and stained with ethidium bromide. When a single random
oligonucleotide primer is used in a reaction, it hybridizes to homologous sequences in
the genome and if the primer hybridizes to sequences on alternative DNA strands within
roughly 3 kb, the DNA region between the two hybridization sites will be amplified.
Criticisms of RAPDs initially focused on reproducibility. RAPD analysis succeeds
because just one nucleotide substitution can allow or prevent priming, and so it is not
surprising that small differences in any aspect will have the same effect of PCR profile.
Even if there were no problems with repeatability, there would still be the concern that
bands of equal electrophoretic mobility may not be homologous and the related concern
that missing bands may not be homologous because they can be lost by several possible
nucleotide substitutions in either PCR priming site as well as by length mutations. There
is also the problem of dominant and null alleles; in haploid organisms both the
dominant (presence) and null (absence) alleles can be scored, but in diploids it is not
possible to distinguish genotypes that are homozygous for the dominant allele from
those that are heterozygous. It is also tempting to score more than one variable band per
RAPD reaction, although they may not be independent (Taylor et al., 1999). Thus,
RAPD has not been found useful for evolutionary studies. Because of the repetitive
character of the target sequences, genetic distances calculated from RAPD could be
INTRODUCTION
23
affected by parology, namely, recombination and duplication events not parallel with
speciation events (Melo et al., 1998).
Amplified fragment length polymorphism (AFLP) analysis is a relatively new
technique which has a discriminatory power that makes it suitable for identification as
well as for strain typing. In short, in AFLP analysis genomic DNA is digested with two
restriction enzymes (EcoRI and MseI) and double-stranded oligonucleotide adapters are
ligated to the fragments. These adapters serve as targets for the primers during PCR
amplification. To increase the specificity, it is possible to elongate the primers at their
3`ends with one to three selective nucleotides. One of the primers is labeled with a
fluorescent dye (Vos et al 1995). The polymorphism revealed by amplified fragment
length polymorphism (AFLP) analysis depends on restriction endonuclease site
differences, just like RFLP. The variable fragments (loci) have two alleles (positive and
null), and so the same concerns raised about null alleles with RFLP and RAPD analysis
apply (Taylor et al., 1999). The advantage of AFLP analysis is that only a limited
amount of DNA is needed since the fragments are PCR amplified producing more than
with RFLP. Furthermore, since stringent annealing temperatures are used during
amplification, the technique is more reproducible and robust than other methods such as
RAPD analysis (Savelkoul et al., 1999). The value and potential of the AFLP technique
in differentiation and identification from strains and/or species and the similarity of
genetic analysis has been demonstrated in yeast ecology and evolutionary studies
(Barros Lopes et al., 1999).
Sequencing is the best method for measuring phylogenetic relationships and
discriminating between species and strains through comparison of the DNA sequences
of a variety of noncoding and coding regions. The underlying assumption is that the
rates at which mutations accumulate in gene regions reflect evolutionary clocks.
Because particular noncoding and coding regions may be highly resistant to change and
therefore may have very slow evolutionary clocks while other coding regions, such as
those of genes involved in pathogenesis, may be far less resistant to change and
therefore may have very fast evolutionary clocks. The studies of rRNA genes were
based on the assumption that they were less prone to biases resulting from selective
pressure (Karl and Avise, 1993). Until recently, cloning and sequencing genes were
slow, technically demanding and expensive, that seemed beyond the technical capacity
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
24
of most medical mycology laboratories. However, the emergence of PCR, automated
DNA-sequencing technologies, and gene data banks have made these studies easier
(Soll, 2000). The sequencing of several genes both, nuclear and mithocondrial, is being
used to infer phylogeny and discriminate fungal species and strains.
In the first sequencing studies, noncoding regions have been selected to infer
phylogeny, using the small subunit ribosomal DNA sequences from eukaryotes and
prokaryotes (Woese and Fox, 1977), allowing the reconstruction of the first tree of life.
Intergenic Transcribed Spacer (ITS) regions are mainly used in fungal taxonomy and
phylogeny, but to resolve the phylogenetic placement of some fungal species that
present an uncertain position new genes are being used as nuclear large and small
subunit ribosomal DNA (nucLSU and nucSSU), mithocondrial large and small
ribosomal DNA (mitLSU and mitSSU), second large subunit of RNA polymerase
(RPB2), α and β tubulin and elongation factor 1 (EF-1), clarifying the divergence of
species (Lutzoni et al., 2004; Bridge et al., 2005; James et al., 2006). Currently, the
sequencing is being performed at the genome level what has enabled to explain better
the phylogenetic relationships in fungi (Fitzpatrick et al., 2006).
1.7. Methods for phylogenetic inference
Previously, the attempt of inferring the phylogeny was based in the identification of
common morphological characters amongst species. Though these data enabled us to
group close species, it did not always resolve deep phylogenetic relationships especially
in organisms with particular characters like fungi that were for long time related with
plants. Currently, the methods for inferring phylogenetic relationships between species
are using amino acid or nucleotide sequences, being these data more reliable because
we can obtain information that is not possible to visualize by morphological analysis,
such as presence and difference among homologue genes, mutations, and duplicated
genes, for instance.
In phylogenetic studies the first step is the single-gene or multigenic homologue
sequence alignment. This procediment will allow the identification of regions with high
homologies and how close the sequences are by using both nucleotides and amino acid
sequences. Several programs have been proposed for alignment of sequences, being the
most used the CLUSTAL series of programs. The obtained alignment is imported into a
INTRODUCTION
25
program of phylogenetic inference, which can use two primary approaches to
phylogenetic tree construction: an algorithmic and a tree-searching. The first uses an
algorithm to construct a tree from the data. The second constructs many trees, and then
the best tree or best set of trees is selected. These two methods are also known as
distance and character-based methods, respectively. Below, the two methods of
alignment as well as methods of phylogenetic construction will be described.
1.7.1. Multiple sequence alignment
1.7.1.1. CLUSTAL series of programs
The Clustal series of programs are widely used in molecular biology for the multiple
alignments of both nucleic acid and protein sequences and for preparing phylogenetic
trees. The first Clustal program was designed specifically to work efficiently on
personal computers, which at that time, had feeble computing power by today‘s
standards. It combined a memory-efficient dynamic programming algorithm with the
progressive alignment strategy. The multiple alignments are built up progressively by a
series of pairwise alignments, following the branching order in a guide tree. The initial
pre-comparison used a rapid word-based alignment algorithm and the guide tree was
constructed by using the Unweighted Pair-Group Method with Arithmetic Mean
(UPGMA) method (Higgins and Sharp, 1988). In 1992, a new release was made, called
ClustalV, which incorporated profile alignments (alignments of existing alignments)
and the facility to generate trees from the multiple alignment by using the Neighbour
Joining (NJ) method, and the user could also test the tree for robustness by using a
simple bootstrap test of tree topology (Higgins et al., 1992). In 1994, the third
generation of the series Clustal W was released. Clustal W incorporated position-
specific gap penalties so that gap penalties can be lowered at hydrophilic residues and
wherever gaps are already introduced into the alignment. The sequence pre-comparison
in Clustal W uses more sensitive dynamic programming, which yields a much better
dendrogram by means of NJ, which improves tree topology and provides a method for
weighting sequences on the basis of their divergences (Thompson et al., 1994). After,
Clustal X was developed, displaying many improvements. Within alignments,
conserved columns are highlighted by using a colour scheme that the user can
customize. Beneath the sequence alignment, Clustal X provides a plot of residue
conservation. Quality-analysis tools that highlight misaligned regions are also available
(Thompson et al., 1997). The popularity of the programs depends on a number of
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
26
factors, including not only the accuracy of the results, but also the robustness,
portability and user-friendliness of the programs.
1.7.1.2. MUSCLE
Multiple sequence comparison by log-expectation (MUSCLE) is a new
computer program for creating multiple alignments of protein sequences or alternatively
nucleotides. This program uses two distance measures for a pair of sequences: a kmer
distance (for an unaligned pair) and the Kimura distance (for an aligned pair). A kmer is
a contiguous subsequence of length k, also known as a word or k-tuple. Related
sequences tend to have more kmers in common than expected by chance. The kmer
distance is derived from the fraction of kmers in common in a compressed amino acid
alphabet. This measure does not require an alignment, giving a significant speed
advantage. Given an aligned pair of sequences, it computes the pairwise identity and
converts it to an additive distance estimate, applying the Kimura correction for multiple
substitutions at a single site. Distance matrices are clustered using UPGMA, which give
slightly improved results over neighbor-joining, despite the expectation that neighbor-
joining will give a more reliable estimate of the evolutionary tree. This can be explained
by assuming that in progressive alignment, the best accuracy is obtained at each node by
aligning the two profiles that have fewest differences, even if they are not evolutionary
neighbors. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of
these sets. Without refinement, MUSCLE achieves average accuracy statistically
indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods
for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min
on a current desktop computer (Edgar, 2004).
1.7.2. Major methods for estimating phylogenetic trees and infer
phylogeny
1.7.2.1. Distance methods
These methods convert aligned sequences into a distance matrix of pairwise
differences (distances) between the sequences. The matrix is used as the data from
which branching order and branch lengths are computed. Within the most used methods
we have the NJ and the UPGMA.
UPGMA is a straightforward example of a clustering method. It starts with grouping
two taxa with the smallest distance between them according to the distance matrix.
INTRODUCTION
27
Then, a new node is added in the midpoint of the two, and the two original taxa are put
on the tree. The distance from the new node to other nodes will be the arithmetic
average. Then, a reduced distance matrix is obtained by replacing two taxa with one
new node. That process is repeated on the new matrix and reiterated until the matrix
consists of a single entry. That set of matrices is then used to build up the tree by
starting at the root and moving out to the first two nodes represented by the last two
clusters. In fact, the outcome phylogenetic tree from UPGMA is not an arbitrary tree.
Instead, it is a special type of tree, which satisfies the molecular clock property. That is,
the total time travelling down a path to the leaves from any node is the same, regardless
the choice of path. Thus, if the original phylogenetic tree from observation does not
have this property, it will not reconcile well with the tree from UPGMA. One necessary
condition for producing a better phylogenetic tree is the ultrametric condition, which
requires that any triangle must be at least equilateral with regard to the weights between
them, what is very unlikely. For that and other reasons, UPGMA is considered a bad
algorithm for construction of phylogenetic trees, since it relies on the rates of evolution
among different lineages to be approximately equal (Sneath and Sokal, 1973; Hall,
2001).
Neighbor Joining is similar to UPGMA in that it manipulates a distance matrix,
reducing it in size at each step, and reconstructing the tree from that series of matrices.
It differs from UPGMA in that it does not construct clusters but directly calculates
distances to internal nodes. In this method, the total sum of the individual distances is
not computed for all or many different topologies, but the examination of different
topologies is imbedded in the algorithm, so that only one final tree is produced. From
the original matrix, NJ starts first with a star phylogeny. The next step is to calculate for
each taxon its net divergence from all other taxa as the sum of the individual distances
from the taxon. It then uses that net divergence to calculate a corrected distance matrix.
NJ then finds the pair of taxa with the lowest corrected distance and calculates the
distance from each of those taxa to the node that joins them, so that we can identify a
pair of neighbors. Once this pair is identified, they are combined as a single unit and
treated as a single sequence in the next step. This process is continued until all
multifurcating nodes are resolved into bifurcating ones. NJ does not assume that all taxa
are equidistant from a root. Unlike UPGMA, NJ method produces unrooted trees. One
of the disadvantages of the NJ method is that it generates only one tree and does not test
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
28
other possible tree topologies. This can cause problems because in many cases in the
initial set up of NJ, there might be multiple equally close pair of neighbors to join,
leading to multiple trees. Since there is no way the algorithm would know which one is
the most optimal tree, choosing a wrong option may produce a suboptimal tree. To
overcome this problem, a generalized NJ method has been developed in which multiple
trees with different initial taxon groupings are generated (Saitou and Nei, 1987; Nei,
1996; Hall, 2001).
1.7.2.2. Character-based methods
These methods are characterized by using multiple alignments directly for
comparison of characters within each column (each site) in the alignment. Three main
methods are applied in phylogeny: maximum parsimony (MP), maximum likelihood
(ML) and Bayesian inference (BI).
The MP method, a given set of nucleotide (or amino acid) sequences are
considered, and the nucleotides (or amino acids) of ancestral sequences for a
hypothetical topology are inferred under the assumption that mutational changes occur
in all directions among the four different nucleotides or the 20 amino acids. The
smallest number of nucleotide substitutions that explain the entire evolutionary process
for the given topology is then computed. This computation is done for all other
topologies, and the topology that requires the smallest number of substitutions is chosen
as the best tree. If there are no multiple substitutions at each site, MP is expected to
generate the correct topology as long as enough parsimony-informative sites are
examined. The disadvantage is that in practice the nucleotide sequences are often
subject to backward and parallel substitutions and this introduces uncertainties in
phylogenetic inference. When the true tree has a special type of topology and branch
lengths MP may generate an incorrect topology even if an infinite number of
nucleotides are examined and this can happen even if the rate of nucleotide substitution
is constant for all evolutionary lineages (Felsenstein, 1978; Takezaki and Nei, 1994).
Furthermore, in parsimony analysis it is difficult to treat the phylogenetic inference in a
statistical framework because there is no natural way to compute the means and
variances of minimum numbers of substitutions obtained by the parsimony procedure.
However, under certain circumstances, MP is quite efficient in obtaining the correct
topology. It should also be noted that MP is the only method that can easily take care of
INTRODUCTION
29
insertions and deletions of nucleotides, which sometimes give important phylogenetic
information (Nei, 1996).
Cavalli-Sforza and Edwards (1967) were the first to propose the idea of using a
ML method for phylogenetic inference from gene frequency data, but they encountered
a number of problems when implementing this method. Later, considering nucleotide
sequence data, an algorithm was developed for constructing a phylogenetic tree by the
ML method (Felsenstein, 1981). In the case of amino acid sequence data, the likelihood
method was proposed by using an empirical transition matrix for 20 different amino
acids (Kishino et al., 1990). Later, this method was extended by using various transition
matrices for nuclear and mitochondrial proteins (Adachi and Hasegawa, 1995; 1996).
The first step is the selection of the substitution model of interest that gives the
instantaneous rates at which each of the four possible nucleotides (or 20 amino acids)
changes to each of the other three possible nucleotides (or 19 amino acids). Alignments
of homologous DNA and amino acid sequences can be examined under a wide range of
evolutive models (JC69, K80, F81, F84, HKY85, TN93 and GTR for nucleotides, and
Dayhoff, JTT, mtREV, WAG and DCMut for amino acids) (Guindon et al., 2005). ML
looks for the tree that, under some model of evolution, maximizes the likelihood of
observing the data that is expressed as log-likelihood. ML program seeks the tree with
the largest log likelihood. The advantages of the ML method are that it allows users to
specify the evolutionary model they want to use, and that the likelihood of the resulting
tree is known. A disadvantage is that ML method is considerably slower than either
Parsimony or NJ, and it is not difficult to exceed the capacity of even the most up-to-
date desktop computer (Hall, 2001).
In the BI, inferences of phylogenies are based upon the posterior probability
(PP) of phylogenetic trees. Bayesian analysis of phylogenies is similar to ML in that the
user postulates a model of evolution and the program searches for the best trees that are
consistent with both the model and the data (the alignment). It differs somewhat from
ML in that while ML seeks the tree that maximizes the probability of observing the data
given by that tree, bayesian analysis seeks the tree that maximizes the probability of the
tree given the data and model for evolution. Unlike ML, that seeks the single most
likely tree, Bayesian analysis searches for the best set of trees. MrBayes program uses
the Metropolis-coupled Markov Chain Monte Carlo (MCMCMC) method, which works
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
30
proposing a new state (new tree) by using a substitution model, and then the acceptance
probability for this state is calculated (Hasting, 1970; Green, 1995). A random number
between 0 and 1 is drawn, and if that number is less than the calculated probability, the
new state is accepted; otherwise the state remains the same. The proposed new state
involves moving a branch and/or changing length of a branch to create a modified tree.
If that new tree is more likely than the existing tree, given the model and the data, it is
more likely to be accepted. This process of proposing and accepting/rejecting new state
is repeated many thousands or millions of times (Hall, 2001; Huelsenbeck and
Ronquist, 2001). Although BI present computational efficiency with respect to the other
character-based methods some questions are unsolved as convergence of Markov chain,
discrepancy between Bayesian probabilities and nonparametric bootstrap test values
(Huelsenbeck et al., 2002).
1.8. Adaptive evolution
Most amino acid substitutions are deleterious and are eventually eliminated from the
population by purifying selection. Some amino acid substitutions are neutral and pure
chance determines whether they will be fixed into the population. In rare instances, an
amino acid substitution is beneficial and is fixed into the population by positive
selection. At the nucleotide level, some base substitutions cause amino acid
replacements and others are silent. The nucleotide position within a codon affects
whether an amino acid replacement takes place when the nucleotide changes. Most
mutations at the first position in a codon result in amino acid replacements; all
mutations at second positions are replacement mutations, but because of the redundancy
of the genetic code, many third position mutations are silent. The number of non-
synonymous changes per non-synonymous site is called dN, and the number of
synonymous changes per synonymous is called dS. When the sequences are compared a
dN/dS ratio<1 is taken as evidence of purifying selection, a ratio=1 is taken as evidence
that mutations were generally neutral, and a dN/dS ratio >1 is taken as evidence of
positive selection. To identify evidence of positive or purifying selection, studies of
adaptive evolution normalize the number of non-synonymous and synonymous
mutations to the number of non-synonymous and synonymous sites in the gene.
Several methods have been proposed to estimate nonsynonymous and synonymous
rate of substitution by comparison of nucleotides and amino acid sequences. The first
INTRODUCTION
31
methods were based on estimating the numbers of synonymous and nonsynonymous
substitutions from comparison of two coding DNA sequences (Miyata and Yasunaga,
1980; Li et al., 1985; Nei and Gojobori, Li, 1993; Pamilo and Bianchi, 1993). The
method of Nei and Gojobori (1986) is the simplest and performs a codon-by-codon
comparison. The model of Jukes and Cantor (1969) is applied to correct multiple
substitutions that have occurred at one site, and it normally produces results very similar
to those obtained by Nei-Gojobori method.
Methods based on a likelihood approach were proposed for determining positive
selection by comparison of multiple sequences, whose models of codon substitution
assume a single ω=dN/dS for all lineages and sites and has been extended to account for
variation of ω, either among lineages or among sites (Goldman and Yang, 1994; Muse
and Gaut, 1994). The lineage-specific models allow for variable ωs among lineages and
are thus suitable for detecting positive selection along lineages. They assume no
variation in ω among sites. As a result, they detect positive selection for a lineage only
if the average dN over all sites is higher than average dS (Muse and Gaut, 1994; Yang,
1998; Yang and Nielsen, 1998). The specific-site models allow the ω ratio to vary
among sites but not among lineages. Positive selection is detected at individual sites
only if the average dN over all lineages is higher than average dS (Nielsen and Yang,
1998; Yang et al., 2000). The model branch-site allows the ω ratio to vary both among
sites and among lineages. This model assumes that positive selection occurs at a few
time points and affects a few amino acids (Yang and Nielsen, 2002).
The codon-based models, based on likelihood approach, include parameters for
transition-transversion bias and for codon frequencies. In the case of the Goldman and
Yang (1994) model it also includes the different replacement probabilities between
amino acids based on the Grantham (1974) physicochemical distance matrix, and
Bayesian models that assume a prior distribution dN/dS ratio (Nielsen and Yang, 1998;
Yang et al., 2000). All methods are based on theoretical assumptions and ignore the
empirical observations that distinct amino acids differ in their replacement rates.
Recently, a codon-based model, named mechanistic-empirical model (MEC), was
published what takes into account all the parameters mentioned above and the
assimilation of empirical amino acid replacement probabilities into the codon-
substitution matrix (Doron-Faigenboim and Pupko, 2007).
THESIS BACKGROUND AND OBJECTIVES
THESIS BACKGROUND AND OBJECTIVES
35
2. THESIS BACKGROUND AND OBJECTIVES
Throughout the last decade, yeasts belonging to the genus Candida have emerged as
major opportunistic pathogens broadly distributed in the nature, being frequently
present as innocuous commensal of the skin, mouth, vagina, and pharynx as well as of
the urinary and gastrointestinal tracts in humans and other warm-blooded animals
(Coleman et al., 1998; Pedrós, 2003). These yeasts present small cells about 4 a 6 um
diameter, hyphaes and pseudohyphae and reproduce majorly by budding. As a relatively
harmless commensal organism C. albicans exists in about 50% of the human population
(Odds, 1988; Calderone, 2002).
Candida species produce infections denomined candidiasis. These infections can
be superficial form particularly in the mucosas (epitelium and endotelium), or
generalized causing systemic candidiasis also designated as candidemia (Gudlaugsson
et al., 2003). These infections have increased over the last two decades owing to the
increase of immunocompromised patients due to hematological disorders, transplants,
chemotherapy, AIDS, among others. They are also frequent in the extremes of age such
as in the elderly and in neonates, particularly the underweight, and in women of
childbearing age (López Martínez et al., 1984; Musial et al., 1988; Horowitz et al.,
1992; Ball et al., 2004; Estrella et al., 2000; Clark et al., 2004; Chen et al., 2005;
Tavanti et al., 2005a). As opportunistic they can present pathogenic character under
determined conditions, becoming an important cause of infection (Coleman et al., 1998;
Pedrós, 2003). The infections caused by yeasts from the genera Candida became the
fourth cause of invasive infections in United States and the eighth in Europe (Jarvis,
1995; Fluit et al., 2000).
Approximately 200 Candida species are known, of which nearly 20 species of
Candida have been identified as etiologic agents of infections, but the list of medically
important yeasts continues to grow (Calderone, 2002). Although Candida albicans has
been, in the past, the most common causative organism of fungemia and disseminated
candidiasis, other Candida species are becoming increasingly more prevalent. A
worldwide study of bloodstream infections from 1997 to 1999 showed that at least 45%
of yeast infections were caused by other Candida species than C. albicans (Pfaller et al.,
2001). The most commonly isolated species, apart from C. albicans, are C. parapsilosis
(20 to 40% of all reported episodes of candidemia), C. tropicalis (10 to 30%), C. krusei
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
36
(10 to 35%), C. glabrata (5 to 40%), C. guilliermondii (2 to 10%) and C. lusitaniae (up
to 8%) (Sandven 2000, Krcmery and Barnes, 2002). The emergence of these non-
albicans species is associated with inherent or acquired resistance to fluconazole in up
to 75% of all C. krusei infections and 35% of all C. glabrata infections; conversely, at
present C. lusitaniae remains highly susceptible to azole antifungal agents but it is
resistant to Amphotericine B (Wingard et al., 1991; Krcmery and Barnes, 2002; Pfaller
et al., 2003). C. dubliniensis has emerged as an opportunistic pathogen closely related to
C. albicans. It shares many diagnostic characteristics with C. albicans but differs with
respect to its epidemiology, virulence, and the ability to develop fluconazole resistance
(Sullivan and Coleman, 1997; Gutierrez et al., 2002). The incidence of C. parapsilosis
is associated with handling central venous catheters and its ability ofgrowing as biofilm
on implanted medical devices (Baillie and Douglas, 1998; Kuhn et al., 2004). C.
tropicalis is related to patients with cancer, particularly due to its increased invasiveness
when the gastrointestinal mucosa is damaged and the intestinal flora is suppressed by
pharmacological treatments (Kullberg and Lashof, 2002).
Other new species have been appearing as C. rugosa, C. famata, C. africana, C.
bracarensis, C. fabianii, C. ciferri, C. membranaefaciens, C. metapsilosis, C.
orthopsilosis, C. pseudohaemulonii and C. pseudorugosa, affecting mainly
immunocompromised patients, taking advantage of host weakness to pass from
commensal to infectious. (Hazen, 1995; Tietz et al., 2001; García-Martos et al., 2004;
Fanci and Pecile, 2005; Tavanti et al., 2005b; Correia et al., 2006; Li et al., 2006;
Sugita et al., 2006; Valenza et al., 2006).
All Candida species belong to the subphylum Saccharomycotina, which is
included in the phylum Ascomycota. Both mithocondrial and nuclear genes have been
used to construct the phylogeny of the subphylum Saccharomycotina, making use of
one gene or using multigenic analysis (Barns et al., 1991; Bowman et al., 1992;
Diezmann et al., 2004; Lutzoni et al., 2004, Lopandic et al., 2005; James et al., 2006).
Others have carried out phylogenomic analysis, with very good results, but some
species still need to resolve their correct place in the phylogeny due to the lack of other
supporting sequenced genomes (Robbertse et al., 2006; Fitzpatrick et al., 2006).
THESIS BACKGROUND AND OBJECTIVES
37
The search of new genes and the evaluation of their usefulness in phylogenetic
studies are very important to understand the phylogenetic relationships and their
potential uses in fungal molecular systematics. Genes that are considered good
candidates for use in phylogeny present specific features as vertical transmission,
ubiquity and slowly evolving sites. Within these new genes we can consider MADS-box
transcription factors and GPI-anchor proteins as potential candidates since they present
all the above characteristics. The MADS-box genes encode a eukaryotic family of
transcriptional regulators involved in diverse and important biological functions and the
GPI-anchor proteins participate in the interaction with the environment.
The MADS-box proteins have been identified in yeasts, plants, insects,
nematodes, lower vertebrates and mammals. These proteins are characterized by
containing a conserved DNA binding and dimerization domain named the MADS-box
(Schwarz-Sommer et al., 1990) after the five founding members of the family: Mcm1
(yeast) (Passmore et al., 1989), Arg80 (yeast) (Dubois et al., 1987) or Agamous (plant)
(Yanofsky et al., 1990), Deficiens (plant) (Sommer et al., 1990) and SRF (human)
(Norman et al., 1988). In animal and fungi, two distinct types of MADS-box genes have
been identified, the SRF-like (type I) and MEF2-like (type II) classes (Shore and
Sharrocks, 1995). These MEF2-like amino acid sequences are more closely related to
most plant MADS-domain sequences, suggesting that at least one gene-duplication
event occurred before the divergence of plants and animals (Alvarez-Buylla et al.,
2000). Thus, both type I and type II MADS-box proteins can be found in each kingdom.
The organization of type I and type II MADS-box proteins is represented in Figure 5.
Type I and type II proteins can be further classified in subfamilies on the basis of shared
sequence similarity between their C-terminal extensions and the highly conserved
MADS domain. These domains are referred to as SAM in fungi and animal SRF-like
proteins, as MEF2 in fungi and animal MEF2-like proteins and as I, K and C in plant
type II proteins (Mueller and Nordheim, 1991; Sharrocks et al., 1993; Riechmann et al.,
1996; Riechmann and Meyerowitz, 1997). MADS-box family members generally
recognize AT rich consensus sequences, with a highly conserved core of 10 bp:
CC(A/T)6GG is the binding site of SRF-like proteins known as CArG box (Treisman,
1990), and CTA(A/T)4TAG is the binding site of MEF2-like proteins (Figure 6)
(Pollock and Treisman, 1991).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
38
Figure 5. Schematic representation of Type I (SRF-like) and Type II (MEF2-like) protein domains in
plant, animal, and fungi. The scale indicates the number of amino acids along the protein. Plant
Type II-like proteins have carboxyl-terminal domains that go beyond 200 amino acids. In plant
Type I-like proteins the ―?‖ indicates carboxyl-terminal domains not well defined yet and of
variable lengths (Based on Alvarez-Buylla et al., 2000).
In the yeast Saccharomyces cerevisiae four MADS-box proteins have been
found: Mcm1 and Arg80 which are related to the human SRF, and Rlm1 and Smp1
which belong to the MEF2-like family (Alvarez-Buylla et al., 2000). Rlm1 controls the
expression of genes required for cell wall integrity, while Smp1 is involved in osmotic
stress response mediated by the Hog1 signal transduction pathway (Watanabe et al.,
1995; Dodou and Treisman, 1997; Nadal Ed et al., 2003). Arg80 is only required for
repression of arginine anabolic genes and induction of arginine catabolic genes
(Messenguy and Dubois, 2000), whereas Mcm1, also involved in the control of arginine
metabolism, playing a pleiotropic role in the cell (Messenguy and Dubois, 1993). Mcm1
is essential for cell viability and controls M/G1 and G2/M cell-cycle-dependent
transcription (Althoefer et al., 1995; McInerny et al., 1997), mating (Jarvis et al., 1989),
minichromosome maintenance (Passmore et al., 1988), recombination (Elble and Tye,
1992), transcription of TY elements (Errede, 1993; Yu and Fassler, 1993) and
osmotolerance (Kuo et al., 1997).
THESIS BACKGROUND AND OBJECTIVES
39
Figure 6. Structure of Type I (SRF) and Type II (MEF2A) MADS domains in complex with their DNA
target (Homo sapiens). (a) Aligned sequences of SRF and MEF2A MADS domains with α and
β secondary structure assignments. (b) The SRF– core DNA complex with one monomer in red
and the other monomer in green (taken from Pellegrini et al., 1995). DNA is in grey, with
upper and lower DNA strands labelled W and C. (c) SRF DNA sequence (18 bp) with the
palindromic CArG recognition element boxed. (d) MEF2A–DNA complex with one monomer
in red and the other monomer in green (adapted from Santelli and Richmond, 2000). DNA is in
grey. (e) MEF DNA sequence (17 bp) in the crystal structure with MEF2 consensus site boxed
(taken from Santelli and Richmond, 2000).
Other MADS-box proteins type I have also been studied, for instance in
Schizosaccharomyces pombe, MAP1 gene encodes a protein required for cell-type-
specific gene expression, in Ustilago maydis, UMC1 encodes a protein that regulates the
expression of pheromone-inducible genes (Yabana and Yamamoto, 1996; Kruger et al.,
1997) and in C. albicans, CaMCM1 is crucial for morphogenesis (Rottmann et al.,
2003). In the case of MADS-box type II, putative RLM1 orthologues were identified in
C. albicans, Paracoccidioides brasiliensis and Aspergillus niger (Damveld et al., 2005;
Fernandes et al., 2005; Bruno et al., 2006).
GPI-anchor proteins (GPiPs) contain a C-terminal signal sequence that allows
the linkage to a glycosylphosphatidylinositol (GPI) anchor. Proteins destined to be GPI
anchored share conserved features: an N-terminal signal sequence for localization to the
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
40
endoplasmic reticulum (ER), a C-terminal hydrophobic domain (9 to 24 residues) for
transient attachment to the ER membrane, and the so-called ω site, where the protein is
cleaved to be ligated to the GPI anchor (Thomas et al., 1990). The ω site is localized 9
to 10 amino acids before the C-terminal hydrophobic domain, and its amino acid
environment determines whether the protein has a high probability to be GPI anchored.
Amino acids at positions ω-1 to ω-11 form a linker region, usually with no charge and
no secondary structure (α-helix or β-sheet). The ω site region is composed of small
amino acids to fit the transamidase protease catalytic pocket, and finally, the spacer
region (ω + 3 to ω + 9) is comparable to the linker, having no charge and being flexible
(Eisenhaber et al., 2004). Shortly after protein synthesis in the ER, the preformed GPI
anchor replaces the C-terminal transmembrane region (Figure 7).
Figure 7. GPI anchor structure. The structure of core GPI anchor which consists of a lipid group (serving
as the membrane anchor), a myoinositol group, an N-acetylglucosamine group, three
mannose groups, and a phosphoethanol phosphoethanolamine group, which ultimately
connects the GPI anchor to the protein via an amide linkage represented in black (Tiede et
al., 1999). The addition of mannose groups and the positioning of side chains like
phosphoethanolamine on the GPI anchor contributes to the variety of GPI anchors identified
so far. The additional gray groups illustrate the side chains added to the GPI anchor in S.
cerevisiae (side chains are specific to each organism).
GPI anchoring is encountered in every eukaryotic cell from unicellular yeast
cells to the highly specialized mammalian cells (Bowman et al., 2006; Ferguson, 1999;
McConville and Menon, 2000). In S. cerevisiae, most are involved in cell wall
compound biosynthesis, flocculation, protease activity, sporulation, and mating (Caro et
al., 1997). These proteins have also been found in A. nidulans, C. albicans, C. glabrata,
THESIS BACKGROUND AND OBJECTIVES
41
Neurospora crassa and Schizosacch. pombe (Caro et al., 1997; de Groot et al., 2003;
Eisenhaber et al., 2003; Eisenhaber et al., 2004; Sundstrom, 2002; Weig et al., 2004). It
is notable that S. cerevisiae and Schizosacch. pombe seem to possess a smaller number
of GpiPs than C. albicans.
IFF8 gene encodes a putative GPI-anchored protein of unknown function that
forms part of the group of genes, also known as the IFF genes (for individual protein
file family F). Analysis of the coding sequences in this family revealed gene products
with a large N-terminal domain of 340 amino acids shared by each member of the
family, with high homology. The proteins diverge strongly after this domain; both in
size and sequence, including its C-terminal hydrophobic domain that allows for linkage
to a glycosylphosphatidylinositol (GPI) anchor (Richard and Plaine, 2007).
Objectives
Currently, protein-coding genes are used to resolve deep level phylogenetic
relationships through multigenic and phylogenomic analysis, but the majority of these
analyses uses regions with similarity only among orthologue gene sequences and does
not take into account the other regions which could give us useful information to infer a
more robust phylogeny. Hence, the main purpose of this thesis was to evaluate the use
of the MADS-box transcription factors in fungi phylogeny and the use of IFF8 gene to
help to resolve the phylogeny in the particular CUG group.
Therefore, the specific aims of the present work were:
* Search for MADS-box and IFF8 ortohologue genes in the kingdom Fungi.
* Analyze the potential use of these genes to infer phylogenetic relationships
according to current proposal in the kingdom Fungi.
* Infer the phylogeny of Candida species based on concatenated alignment of the
previously predicted genes.
* Analyze the putative adaptive evolution of RLM1 gene in the subphylum
Saccharomycotina, particularly in the Whole Genome Duplicate (WGD) group.
* Identify the molecular mechanism that putatively changed RLM1 gene in the
subphylum Saccharomycotina.
MATERIAL AND METHODS
MATERIAL AND METHODS
45
3. MATERIAL AND METHODS
3.1. Phylogenetic analysis
3.1.1. Data retrieval
Sequences used in this study were obtained from several different databases (DB)
and Rlm1 and Mcm1 protein and DNA sequences from Saccharomyces cerevisiae as
well as the Iff8 protein and DNA sequences of Candida albicans were used as
references. The S. cerevisiae Rlm1 and Mcm1 protein/gene sequences were obtained
from the Genbank under accession numbers BAA09658/D63340 and
CAA88409/Z48502, respectively. The Iff8 protein/gene sequences of C. albicans were
obtained from Candida Database orf19.8201.
To obtain both nucleotide and amino acid putative orthologue sequences for the 76
fungal species, the protein sequences referred above were used as reference to carry out
TBLASTN searches (Matrix BLOSUM62, cut-off value E > 10-5
) in the following
databases: Candida DB (http://www.candidagenome.org), Saccharomyces DB
(http://www.yeastgenome.org), Broad Institute (http://www.broad.mit.edu), Joint
Genome Institute (JGI, http://www.jgi.doe.gov), Genolevures (http://cbi.labri.fr), The
Institute for Genomic Research (TIGR, http://www.tigr.org), Washington University St.
Louis (WUSTL, http://www.wustl.edu), National Center for Biotechnology Information
(NCBI, http://www.ncbi.nlm.nih.gov), Sanger Institute (http://www.sanger.ac.uk),
Baylor College of Medicine (http://www.hgsc.bcm.edu), Oklahoma University
(http://www.genome.ou.edu) and Murcia University (http://mucorgen.um.es). To avoid
errors in the following alignments, if partial orthologue sequences were obtained in this
search they were completed with the information from the closest species. Animals and
plants MADS box sequences were used as outgroup.
Both outgroup protein/gene sequences were retrieved from Genbank database under
the following accession numbers, Type II: Homo sapiens MEF2D
NP_005911/NM_005920, Xenopus laevis MEF2 AAI12917/BC112916, Danio rerio
MEF2D AAH98522/BC098522, Drosophila melanogaster DMEF2
NP_477021/NM_057673, Caenorhabditis elegans CEMEF2 NP_492441/NM_060040,
Arabidopsis thaliana APETALA1 NP_177074/NM_105581 and Oryza sativa
AGAMOUS ABG21913/DP000011; Type I: Homo sapiens SRF
NP_003122/NM_003131, Xenopus laevis SRF NP_001095218/NM_001101748, Danio
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
46
rerio MEF2D NP_001103996/NM_001110526, Drosophila melanogaster DSRF
NP_726438/NM_166667, Caenorhabditis elegans NP_492226/NM_059895,
Arabidopsis thaliana AGL36 NP_850880/NM_180549 and Oryza sativa AGL35
BAD81343/AP002070. No outgroup was used in Iff8 protein and gene analyses.
3.1.2 Sequence alignments
The sequences obtained were aligned by using MUSCLE program (Edgar, 2004),
and then imported into MEGA 4.0 software (Tamura et al., 2007) to be manually edited.
The alignments were based on amino acids, therefore nucleotides were aligned in
codons. As some parts of the C-terminal regions were too divergent to be confidently
aligned, a preliminary NJ tree in Clustal W (Thompson et al., 1994) and UPGMA tree
in MUSCLE was produced for each phylum, aligning the conserved C-terminal
domains. After this previous analysis, the alignment of the less-conserved C-terminal
regions from different phyla became much easier and a new and final total alignment
was produced.
3.1.3. Selection of the best evolutionary model to infer phylogeny
The model selection approach, Akaike Information Criterion (AIC), was used to
estimate the best-fit protein evolutionary model for ML. The ProtTest 1.4 program was
used and the best model was selected as the one that presented the greatest likelihood
value, from the 48 models produced by the program (Abascal et al., 2005). The same
approach was used to estimate the best-fit DNA evolutionary model for ML but by
using the jModeltest 1.0 program (Posada, 2008) for likelihood calculations, the best
model was also the one that presented the greatest likelihood value from the 88 models
of evolution produced. For Bayesian inference a model selection approach was also
performed, using the 4 hierarchies for likelihood ratio test to determine the simplest and
most appropriate DNA evolutionary model by using MrModeltest 2.3 (Nylander, 2004)
and PAUP 4.0b10 (Swofford, 2002) programs as well as the required file
Mrmodelblock (anexo I).
3.1.4. Phylogenetic reconstruction
Phylogenetic analysis was performed by ML in PHYML 3.0 program (Guindon and
Gascuel, 2003), for each protein and/or gene and protein and/or gene concatenated
alignments previously obtained. To obtain the phylogenetic tree, the best evolutionary
MATERIAL AND METHODS
47
model determined previously was considered and the branch/nodal supports were
estimated with the approximate likelihood ratio test by using the Shimodaira-Hasegawa
like support option (Anisimova and Gascuel, 2006).
Bayesian inference (BI) was performed by using MrBayes 3.2 program
(Huelsenbeck and Ronquist, 2001) for each alignment previously obtained. As for ML,
to obtain the phylogenetic tree by bayesian inference the best evolutionary model
determined by the previous analysis was also considered and the branch/nodal support,
given as the posterior probability (PP) values in this analysis, was calculated by using
the Metropolis-coupled Markov chain Monte Carlo method after calculating a large
number of different trees. To infer fungal phylogeny using RLM1 gene four chains with
4.000.000 analysis were run. For Saccharomycotina phylogeny inferred with RLM1
gene, four chains with 800.000 analyses were run and for phylogeny inferred with
MCM1 gene four chains with 1.250.000 analyses. Saccharomycotina phylogeny with
RLM1+MCM1 genes was inferred by running four chains with 5.500.000 analyses. To
infer CUG group phylogeny using IFF8 gene four chains with 1.000.000 analyses were
run and by using RLM1+MCM1+IFF8 genes 2.000.000 generations. Trees were
sampled every 100 generations and trees obtained before the convergence of the
Markov chain were not included in the consensus tree. The remaining samples were
used to construct the consensus tree and the branches/nodes probability posterior values
were converted from frequency to percentage, importing into PAUP 4.0b10. In the BI,
these percentages for the consensus tree are the rough equivalent of a ML search with
bootstrapping analysis (Huelsenbeck and Ronquist, 2001).
To visualize the degree of phylogenetic conflict within concatenated alignments a
phylogenetic network was generated by using the NeighborNet method in the
SPLITSTREE 4.1 program (Huson and Bryant, 2006), with 1000 bootstrap analysis.
3.1.5. Interpretation of values considered for nodal support
Previous studies have demonstrated that to infer about the nodal support in
phylogenetic trees the information from posterior probabilities and bootstrap analysis
should be complemented (Alfaro et al., 2003; Douady et al., 2003; Reeb et al., 2004).
Bayesian methods are more efficient in recovering accurate nodal support values, but
same authors have indicated that they could be less conservative than boostrap (Suzuki
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
48
et al., 2002; Alfaro et al., 2003), unlike Shimodaira-Hasegawa (S-H) approach used in
the ML analyses which is more conservative (http://atgc.lirmm.fr/alrt). Therefore, a
combination of both, posterior probabilities and Shimodaira-Hasegawa like support,
were used to assess the level of confidence of a specific node in the phylogenetic DNA
trees. Throughout this work, the following scale was used:
- High (strong) support - PP ≥ 95% and S-H ≥ 0.70;
- Medium (moderate) support - PP ≥ 95% and 0.70 > S-H ≥ 0.50, or PP < 95%
and S-H ≥ 0.70;
- Low (poor or weak) support - PP ≥ 95% and S-H < 0.50, or PP < 95% and 0.70
> S-H ≥ 0.50;
- No support - PP < 95% and S-H < 0.50.
In protein phylogenetic trees Shimodaira-Hasegawa like support (S-H) analyses was
the only considered, using the same scale values.
3.2. Analysis of adaptive evolution
We applied the approach of Yang and coworkers (Yang et al., 2000; Swanson et al.,
2001) to test for amino acids under positive selection in Rlm1 protein. To determine
which model best fitted the data, likelihood ratio tests (LRTs) were performed by
comparing the differences in log likelihood values between two models by using the χ 2
distribution. The codon-substitution models M0, M1a, M2a, M3, M7, and M8 that use a
statistical distribution to describe the random variation among codon sites (ω=dN/dS)
were used (Nielsen and Yang, 1998; Wong et al., 2004; Yang et al., 2005). We used
LRTs to make 3 comparisons to find out whether positive selection has played a role in
the molecular evolution of this gene; (i) the one ratio model (M0) was compared with
the discrete model (M3), (ii) the neutral model (M1a) was compared with the selection
model (M2a), and (iii) the beta model (M7) was compared with the beta and ω model
(M8).
To identify particular sites in the gene that were likely to have evolved under
positive selection the Bayesian Empirical Bayes (BEB) analysis was used (Yang and
Bielawski, 2000; Yang et al., 2005). This estimates the posterior probability (PP) values
at each site. If, a particular codon site presents PP value higher than 95% and ω higher
MATERIAL AND METHODS
49
than 1 that site is inferred to be under positive selection (Nielsen and Yang, 1998; Yang
and Bielawski, 2000; Yang et al., 2005). These calculations were performed by using
the CODEML program inside PAML 4.1 (Yang, 1997) and HYPHY 1.0 program
(Kosakovsky et al. 2005) to analyze positive selection in the MADS-box orthologues
from RLM1 gene in plants, animals and fungi, and in the entire RLM1 gene within the
Saccharomycotina group.
A seventh model, named mechanistic empirical combined (MEC), was further
used to test positive selection in the subphylum Saccharomycotina (Doron-Faigenboim
and Pupko, 2006). This model was statistically tested by comparing the second order
Akaike Information Criterion corrected (AICc) likelihood scores with Model 8a (Wong
et al., 2004). The calculations were performed by using SELECTON 2.4 program (Stern
et al., 2007).
3.3. Analysis of the repetitive region in the subphylum Saccharomycotina
To identify the putative molecular mechanism that contributed to the divergence of
the MADS-box RLM1 gene in the subphylum Saccharomycotina, the C-terminal was
analysed. Three reading frames from the sequences that presented a repetitive region in
the C-terminal were analyzed by using VIRTUAL RIBOSOME 1.1 program
(Wernersson, 2006) to determine the possible amino acids encoded by that region. The
corresponding ancestral RLM1 gene, from which the most recent Saccharomycotina
species reported in this study diverged, was constructed by using MrBayes 3.2 program.
The Saccharomycotina Rlm1 protein sequences were aligned with this ancestral
sequence to identify the presence of an ancestral repetitive region in the C-terminal.
RESULTS AND DISCUSSION
RESULTS AND DISCUSSION
53
4. RESULTS AND DISCUSSION
4.1. Compilation of data
The fungal Rlm1 and Mcm1 orthologue protein/gene sequences identified in this
analysis consisted of 76 putative orthologues. From these, 47 putative Rlm1 orthologue
sequences were directly identified in the databases, 22 were predicted by homology and
7 presented partial sequences (Annex II). Regarding Mcm1, 60 putative orthologue
sequences were found in the database, 14 were predicted by homology and 2 presented
partial sequences (Annex III). In the case of Iff8 protein/gene only 9 putative orthologue
sequences were identified (Annex IV). Other members of MADS-box protein, Smp1
and Arg80 protein orthologue sequences were also identified in Saccharomyces sensu
stricto and Saccharomyces sensu lato groups, but not being used in this study (Annexe
V and VI).
The presence of introns was detected only in RLM1 and MCM1 genes (Annexes II
and III) IFF8 sequences did not contain introns (Annex IV). The number of introns
varied in RLM1 and MCM1 genes, from the lack of them to 8 introns. The higher
variability in the number of introns was observed within Mucoromycotina and
Basidiomycota while the Ascomycota group presented a more constant pattern, from 0
to 2 introns. In view of these facts, a bias towards an intron loss in MADS-box
transcription factors in the kingdom Fungi seems to be occurring, not a gain, which was
evidenced by the absence or presence of only 2 introns in the most recently diverged
phylum Ascomycota, unlike the older phyla that presented a varied number that could
reach eight introns.
In the present work, the results suggested that during the divergence of fungal
species the intron loss would have been toward RML1 3‘ end because in some species
that diverged recently the intron in the 5‘ end inside the MADS-box of this gene is the
only maintained. In MCM1 a bias to lose introns both in 5‘ and in 3‘ ends was
identified, but like in RLM1 gene one intron was maintained inside the MADS-box.
Curiously, the presence of the first intron towards the 5‘ end was inside the MADS-box
a feature present in all the genes with introns. In the case of MCM1 this was also true,
the first intron is also present inside the MADS-box, but not in the 5‘ end since the
MADS-box is in the middle of the gene.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
54
It is known that in intron-rich organisms, introns are evenly distributed within the
coding sequences, but are biased toward the 5‘ end of genes in intron–poor organisms
(Nielsen et al., 2004). Current models attribute the bias to 3‘ end intron loss due to a
poly-adenosine-primed reverse transcription mechanism which was demonstrated in
experiments with intron-containing Ty elements in yeast (Fink, 1987; Boecke et al.,
1985). However, in a study carried out in different fungal species no increased
frequency in intron loss toward the 3‘ end of genes was found, the loss occurred in the
middle of the genes, suggesting either other mutational mechanisms (e.g., reverse
transcription primed internally) or the presence of selective pressure to preferentially
conserve introns near the 5‘ and 3‘ ends of genes (Nielsen et al., 2004). To date, the
mechanism by which introns are inserted or delected from gene loci is not well
understood.
4.2. Phylogeny based on MADS-box transcription factors
4.2.1. Rlm1 transcription factor
Fungal phylogeny based on Rlm1 transcription factor was inferred both by using
amino acids and DNA sequences. The phylogeny based on amino acids sequences was
inferred only by maximum likelihood (ML) by using the PHYML 3.0 program. From
the various models of evolution tested the Jones, Taylor and Thornton (JTT) model was
the one selected, including the options to calculate the proportion of invariant sites, the
gamma distribution and the frequency of amino acids. In the phylogeny based on
nucleotides the General Time Reversible (GTR) model, including the proportion of
invariant sites and the gamma distribution options, were the selected, both for ML
analysis and bayesian inference.
The phylogeny based on protein sequences resolved the Fungi as a clade strongly
supported by a Shimodaira-Hasegawa (S-H) like value of 0.96 in the ML analysis.
However the resolution of Fungi into a clade based on DNA sequences revealed the
existence of a conflict. The S-H value obtained was of 0.24 for ML and of 100% for
bayesian posterior probability (PP), which by the scale value means a low nodal support
(Figure 8 and 9). This result could be due to the low number of DNA RLM1 sequences
from species other than fungi that were used in this study. But, although the low nodal
support in the phylogeny based on DNA sequences, the obtained topology is strongly
supported by previous studies (Wainright et al., 1993; Baldauf and Palmer, 1993).
RESULTS AND DISCUSSION
55
The Mucoromycotina ‗Zygomycota‘ are primarily coenocytic (sometimes producing
cell septa) and undergo sexual reproduction by formation of a thick-walled resting spore
called a zygospore, which is clearly different from the Basidiomycota and Ascomycota
(Dikaryomycota) and other phyla. Morphologicaly the Ascomycota and Basidiomycota
both possess regularly septate hyphae and a dikaryotic life stage but differ in the
structures involved in meiosis and sporulation.
The fungal phylogeny inferred by these analyses clearly placed the 76 fungal
species into one subphyla incertae sedis, the Mucoromycotina ‗Zygomycota‘, and two
well supported phyla, the Basidiomycota and the Ascomycota (Figures 8 and 9). The
Mucoromycotina ‗Zygomycota‘ formed a clade moderately supported by protein (S-H
value of 0.64) as well as by nucleotide analysis (S-H 0.69 ML and 99% PP). Both
Ascomycota and Basidiomycota formed clades highly supported by S-H values of 0.84
and 0.91, respectively with the protein analysis, but these clades were medium or low
supported in the nucleotide ML analysis by S-H of 0.67 and 0.57 and PP values of 98%
and 61%, respectively. Based on the literature a sister relationship between the
Basidiomycota and Ascomycota (the ‗‗Dikaryomycota‘‘) has been proposed, being
recently recognized as the subkingdom Dikarya (James et al., 2006, Hibbett et al.,
2007). The results obtained in these analyses highly support this sister relationship (S-H
= 0.98 ML in protein, S-H= 0.97 ML and PP = 100% in nucleotides).
In the present analysis three species belonging to the subphylum Mucoromycotina
‗Zygomycota‘ were used two of which the Mucor circinelloides and the Rhyzopus
oryzae are sisters by a well-supported node in the phylogenetic analysis using
nucleotide (SH=0.87, PP=99%) and amino acids (SH=0.92) sequences (Figures 8 and
9). This result agrees with the current taxonomic classification in which these two
species form part of the family Mucoraceae (Voigt and Wöstemeyer, 2001). The other
species included in this subphylum, Phycomyces blakeleeanus, member of the family
Phycomycetaceae. The topology obtained is in agreement with previous results since
both the Mucoraceae and Phycomycetaceae families are part of the order Mucorales
within the subphylum Mucoromycotina (Hibbet et al., 2007).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
56
Figure 8. Fungal phylogeny based on Rlm1 protein sequences. This phylogeny was obtained by
maximum likelihood. The nodes are supported by Shimodaira-Hasegawa like support. Species
with partial sequences are: Kluyveromyces waltii, Saccharomyces paradoxus, Ascosphaera
apis, Botrytis cinerea, Chaetomium globosum, Ephicloa festucae, Coccidioides immitis.
RESULTS AND DISCUSSION
57
The phylum Basidiomycota is strongly supported as a monophyletic clade as well as
classes Agaricomycetes (S-H=0.99 ML in protein, S-H=0.97 ML and PP=100% in
nucleotides), Tremellomycetes (S-H=1.0 ML in protein, S-H=1.0 ML and PP=100% in
nucleotides), and Ustilaginomycetes (S-H=0.97 ML in protein, S-H=0.97 ML and
PP=91% in nucleotides), within this phylum by both protein and nucleotide analyses.
The classification of Ustilaginomycetes, Microbotryomycetes, Tremellomycetes and
Agaricomycetes adopted here followed the current taxonomic proposal (Hibbet et al.,
2007). The Ustilaginomycetes, which is a class of the subphylum Ustilaginomycotina,
are represented by one species of the order Ustilaginales, Ustilago maydis, and by
another of the order Malasseziales, Malassezia globosa, class incertae sedis. These two
species received high support as sister groups in protein and nucleotide analyses.
However, in the phylogeny inferred by the bayesian method a topology conflict was
observed with Ustilaginomycetes and Schizosaccharomycetes grouping them as sister
clades with a low support (PP=62%), being the latter a member of the phylum
Ascomycota. The class Microbotryomycetes, which is a member of subphylum
Pucciniomycotina, is only represented by one species, Sporobolomyces roseus, and is
grouped as a sister group of the subphylum Agaricomycotina with high nodal support
(S-H=0.96 ML in protein, S-H=0.86 ML and PP=95% in nucleotides). The classes
Tremellomycetes and Agaricomycetes are members of the subphylum
Agaricomycotina. These two classes form well supported sister clades in protein (S-
H=0.99), but obtained a low support with the nucleotide analysis ((S-H=0.42 ML and
PP=86%). The class Tremellomycetes is represented by two species of the order
Tremellales, Cryptococcus neoformans and Cryptococcus gatti. These two species are
strongly supported as sister taxa by protein (S-H= 1.0) and nucleotide (S-H= 1.0,
PP=100%) analyses. The class Agaricomycetes is represented by four species included
in the orders Agaricales (Coprinus cinereus, Laccaria bicolor), Polyporales (Postia
placenta), and Corticiales (Phanerochaete chrysosporium), being these latter two
considered as subclass incertae sedis. The order Polyporales and Corticiales are
resolved as well-supported sister groups in protein (S-H=0.95) but with a low support in
nucleotide (S-H= 0.0, PP=80%) analyses. The order Agaricales is resolved as the sister
group of the Polysporales/Corticiales group with high nodal support in protein (S-
H=0.99) and nucleotide (S-H=0.97, PP=100%) analyses.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
58
Figure 9. Fungal phylogeny based on RLM1 DNA sequences. Nodes values correspond to posterior
probabilities (left) and Shimodaira-Hasegawa (right). In the maximum likelihood analysis a S-
H value of 0.67 (a) for Schizosaccharomycetes clade within phylum Ascomycota, value of 1.0
(b) for V. polyspora as basal taxon from ‗Saccharomyces complex‘, value of 0.32 (c) for
Ephicloa festucae as basal taxon from Trichoderma group and a value of 0.04 (d) for A. terreus
as basal taxon of A. nidulans- A. niger group was observed. Species with partial sequence are:
Kluyveromyces waltii, Saccharomyces paradoxus, Ascosphaera apis, Botrytis cinerea,
Chaetomium globosum, E. festucae, and Coccidioides immitis.
RESULTS AND DISCUSSION
59
In this study, of the three subphyla recognized within the Ascomycota (Eriksson
et al., 2004, Hibbet et al., 2007), only the Taphrinomycotina presented a conflict
topology, as mentioned before. The species in this subphylum represented a
monophyletic clade with a high nodal support in protein (S-H= 0.99) and nucleotide (S-
H=1.0, PP=100%) analyses. The subphylum Saccharomycotina also presented
monophyly with well-supported nodal values (S-H=0.89 in protein and S-H=0.88,
PP=99% in nucleotide analyses). The subphylum Pezizomycotina was also represented
as a monophyletic clade with high nodal support values (S-H=0.98 in protein and S-
H=0.98, PP= 100% in nucleotide analyses), as well as its classes Leotiomycetes,
Sordariomycetes, Dothideomycetes and Eurotiomycetes (Figures 8 and 9). The class
Leotiomycetes was only represented by the order Helotiales with a high nodal support
(S-H=1.0 in protein and S-H=1.0, PP=100% in nucleotide analyses), being the species
Botrytis cinerea and Sclerotinia sclerotiorum members of this order.
The class Sordariomycetes is represented by the orders Sordariales, Hypocreales,
Phyllacorales – subclass incertae sedis, and the family Magnaporthaceae – order
incertae sedis. The orders Sordariales, Phyllacorales and the family Magnaportheceae
form a group highly supported (S-H=0.96 in protein and S-H= 0.98, PP=100% in
nucleotide analyses) with only a nodal conflict in the placement of Neurospora crassa
with the protein analysis, since this species should form a clade with Chaetomium
globosum and Podospora anserina. The order Hypocreales is a sister clade of the group
Sordariales/Phyllacorales/Magnaporthaceae with a high nodal support (S-H=1.0 in
protein and S-H=1.0, PP=100% in nucleotide analyses) but a conflict was also observed
regarding the placement of Ephicloe festucae (family Clavicipitaceae). This species was
placed as a sister clade with the family Hypocraceae (Trichoderma atroviride, T. reesei,
T. virens) in ML both in protein and in nucleotide analysis, but as a sister clade of the
family Nectriaceae (Nectria haematoccoca, Fusarium graminearum, F. oxysporum, F.
verticillioides) in the BI, although with low nodal support in all analyses (S-H=0.12 ML
in protein, S-H=0.32 ML and PP=63% in nucleotide).
The class Dothideomycetes is represented by the orders Capnodiales and
Pleosporales, which grouped as monophyletic sister clades with high nodal support
(SH=0.99 in protein, S-H=0.97 PP=90% in nucleotide analyses) (Figures 8 and 9). The
class Eurotiomycetes is represented by the orders Onygenales and Eurotiales and
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
60
presented monophyly. In the order Onygenales, the families Ascosphaeraceae
(Ascosphaera apis) and Arthrodermateceae (Microsporum gypseum) are sister clades
with high nodal support in protein (S-H=0.92) and nucleotide (S-H=0.92, PP=99%)
analyses as well as the families Onygenaceae (Uncinocarpus reesii) and Gymnoascaceae
(Coccidioides immitis, Coccidioides posadasii). The family Ajellomycetaceae is
represented by Paracoccidioides brasiliensis, Histoplasma capsulatum and Blastomyces
dermatitidis, forming a highly supported clade by protein (S-H=1.0) and nucleotide (S-
H=0.99, PP=100%) analyses. A nodal conflict was observed regarding the placement of
the group Ascosphaeraceae/Arthrodermateceae, since the protein analysis showed it as a
sister group of the group Onygenaceae/Gymnoascaceae and the nucleotide analyses as a
sister group of the clade Ajellomycetaceae. However, both analyses showed low nodal
suppor by protein (S-H=0.58) and nucleotide (S-H=0.04, PP=67%) analyses. The order
Eurotiales is represented only by the family Trichocomaceae with a high nodal support
(S-H=0.99 in protein, S-H=1.0, PP=100% in nucleotide analyses). Talaromyces
stipitatus and Penicillium marneffei are sister taxa (Figures 8 and 9). The species
Aspergillus clavatus, A. fumigatus and Neosartorya fischeri formed a group with a well
nodal support that is a sister group of the one formed by A. oryzae, A. flavus, A. terreus,
A. nidulans and A. niger, also well supported (Figures 8 and 9). Only regarding A.
terreus placement by bayesian analysis a conflict was observed. This species grouped as
sister taxa of A. oryzae and A. flavus, while in ML analysis it was placed together with
A. nidulans and A. niger. However, both options had low supports. The orders
Leotiomycetes and Sordariomycetes formed sister clades with a well nodal support
while the orders Dothideomycetes and Eurotiomycetes also formed sister clades but
with a moderate support (S-H=0.85 in protein, S-H=0.89, PP=86% in nucleotide
analyses).
4.2.2. Mcm1 transcription factor
In fungal phylogeny based on the analysis of Mcm1 protein/gene only ML of
protein sequences was considered, since the nucleotide analysis showed a contradictory
topology to the one obtained with protein due to the high difference of MCM1 sequence
sizes (Figure 10). BI was not performed.
The phylogeny inferred from Mcm1 protein analysis coincides in the main
points with the one inferred from Rlm1 protein/gene analyses. The Fungi were resolved
RESULTS AND DISCUSSION
61
as a single clade with S-H= 0.82, strongly supported. The major differences in the
overall topology fall on some clades:
-The subphylum Mucoromycotina ‗Zygomycota‘ presented well supported
polyphyly between the families Phycomycetaceae and Mucoraceae, while with Rlm1
presented monophyly.
-Within the phylum Basidiomycota, the species Malassezia globosa formed a
sister group with the family Schizosaccharomycetes from the phylum Ascomycota with
a good support value, contrary to the obtained with Rlm1 protein analysis, that correctly
placed the Schizosaccharomycetes within the Ascomycota.
-In the phylum Ascomycota, the class Saccharomycetes is present as a
monophyletic group, but with a lower nodal support, while with Rlm1 analyses is well
supported.
-Within the subphylum Pezizomycotina, in the class Sordariomycetes (formed
by the orders Sordariales, Phyllacorales) and the family Magnaportheceae presented
Magnaporthe grisea as the basal clade, with a moderate nodal support, contrary to
results found in the phylogeny with Rlm1 protein/gene that presented Verticillium
dahliae as the basal taxon.
-The basal taxon in the family Hypocraceae is Trichoderma atroviride, which
disagrees with the obtained topology with Rlm1 protein analyses that places T. reesei as
the basal taxa.
-In the class Dothideomycetes, the latest taxa of the order Pleosporales is
different regarding the result found in the topology with Rlm1 protein/gene analyses.
With Mcm1 the latest taxa is Alternaria brassicicola and with Rlm1 is Pyrenophora
tritici-repentis, but with Mcm1 analysis the nodal support is very low.
-Within the class Eurotiomycetes Ascosphaera apis was found as basal of all the
orders that constitute this clade, while the Rlm1 analyses placed as a sister taxon to
Microsporum gypseum.
-The clade Arthrodermateceae (Microsporum gypseum) is a sister clade of the
group formed by Onygenaceae/Gymnoascaceae but presented a low nodal support
contrary to results obtained with the nucleotide RLM1 analysis that joined it to the clade
Ajellomycetaceae, also with low support.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
62
Figure 10. Fungal phylogeny based on Mcm1 protein sequences. This phylogeny was obtained by
maximum likelihood. The nodes are supported by Shimodaira-Hasegawa like support.
Species with partial sequence: Kluyveromyces waltii, Blastomyces dermatitidis, Botrytis
cinerea, Cryptococcus gatti.
RESULTS AND DISCUSSION
63
-The order Eurotiales is weakly supported, while the group formed by the taxa
A. niger, A. fumigatus and Neosarterya fischeri is well supported, contrary to results
obtained in Rlm1 phylogeny, where A. clavatus was joined to the two latter taxa
mentioned before.
-The placement of the taxa A. terreus is also different in this analysis, it was
placed as a sister taxon to A. oryzae and A.flavus, but only in the Rlm1 gene/ protein
analyses its placement is not well supported. The same for A. clavatus that with Rlm1
protein/genes analyses formed a highly supported group with Neosartorya fisheri and A.
fumigatus, but with Mcm1 presented an uncertain placement.
4.2.3. Placement of fungal phylogeny based on MADS-box transcription
factor within the current literature
Several phylogenetric studies have demonstrated the close relationship of the
kingdoms Animalia and Fungi (Baldauf and Palmer, 1993; Baldauf et al., 2000; Lang et
al., 2002). These two kingdoms are now known to be a part of a larger group that
includes the phylum Choanozoa, termed the Opisthokonts (Cavalier-Smith and Chao,
1995; Ragan et al., 1996). Together, the Opisthokonts and the phylum Amoebozoa form
the group Unikonts (Cavalier-Smith, 2002b). In the present work the phylum
Choanozoa has not been included but the close relationship between the two kingdoms,
Animalia and Fungi, was observed using species from the kingdom Plantaea as
outgroup. The kingdom Plantae is part of the group Bikonts (Cavalier-Smith, 2002b).
The phylogenetic topology obtained in this study is also in agreement with the topology
obtained in a study that explain the origin of the MADS-box proteins in Animalia,
Fungi, and Plantae (Alvarez-Buylla et al., 2000).
Regarding the kingdom Fungi, previous studies based on multigenic and rDNA
analysis indicated that the phyla Microsporidia, Chytridiomycota,
Neocallimastigomycota, Blastocladiomycota and the subphyla incertae sedis
Mucoromycotina, Zoopagomycotina, Entomophthoromycotina and Kickxellomycotina
are part of the earliest known divergence that took place during fungal evolution (Bruns
et al., 1992; Berbee and Taylor, 1993; Tanabe et al., 2000, Lutzoni et al., 2004;
Blackwell et al., 2006; Fitzpatrick et al., 2006; James et al., 2006). However, due to the
low number of avalaible sequences within all the phyla that represents the kingdom
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
64
Fungi, only sequences from species within Basidiomycota and Ascomycota phyla and
the subphylum Mucoromycotina were used in this study.
The subphylum Mucoromycotina is represented by species that belonged to the
ex-phylum Zygomycota (Hibbet et al., 2007). The results obtained in our work, showed
that the subphylum Mucoromycotina is an early diverging clade (Figures 8 and 9) which
agrees with previous studies. The subphylum Mucoromycotina presented monophyly
with moderate support with Rlm1 sequence analyses, in agreement with phylogenetic
analysis inferred with ACT1, RPB1, RPB2, EF-1α, small and large subunit rDNA genes
(Voigt and Wöstemeyer, 2001; James et al., 2006), and polyphyly in the analysis with
MCM1 gene (Figure 10), like what was found in other studies (Lutzoni et al., 2004). To
resolve this monophyly/polyphyly conflict the inclusion of sequences from other
species within the subphyla Mucoromycotina, as well as from the other subphyla
incertae sedis would be important. The use of additional protein-coding genes would
also help to resolve this conflict.
Basidiomycota includes about 30,000 species of rusts, smuts, yeasts, and
mushroom fungi (Kirk et al., 2001). Most of them are characterized by their meiospores
(basidiospores) on the exterior of typically club-shaped meiosporangia (basidia).
Phylogenetic relationships among the three subphyla of Basidiomycota are still
uncertain. The subphylum Pucciniomycotina is primarily distinguished by containing
the rust fungi (7000 species), which are primarily pathogens of land plants. In our study,
the species representing the subphylum Pucciniomycotina grouped together with the
species of subphylum Agaricomycotina (Figures 8, 9 and 10). This result is not
consistent with previous studies that suggested a sister group relationship with the group
formed by the subphyla Ustilaginomycotina and Agaricomycotina (Lutzoni et al., 2004,
James et al. 2006). The result from our study could be due to the use of only one
representant of the subpylum Puccioniomycotina, the Sporobolomyces roseus, and also
due to long-branch attraction artifact since the species used presented large MADS-box
genes, like the ones observed within the representants of the class Tremellomycetes
from the subphylum Agaricomycotina.
The Ustilaginomycotina includes 1500 species of true smut fungi and yeasts,
most of which cause systemic infections of angiosperm plant hosts. The
RESULTS AND DISCUSSION
65
Agaricomycotina includes almost two-thirds of the known basidiomycetes, including
the vast majority of mushroom forming fungi. Much of the morphological diversity
exemplified in mushroom fruiting bodies is the result of radiations of certain lineages
within the Agaricomycotina, and recovering their relationships with confidence has
proven very difficult (Binder et al., 2005; Moncalvo et al., 2002). In our work, early-
diverging lineages in the Agaricomycotina are strongly supported, which include
parasitic and/or saprotrophic fungi capable of dimorphism or yeast-like phases, in
agreement with previous studies that found the same relationships among species of this
subphylum (Lutzoni et al., 2004; Fitzpatrick et al., 2006, James et al., 2006).
Ascomycota is the largest phylum within the Fungi characterized by the
production of meiospores (ascospores) in specialized sac-shaped meiosporangia (asci),
which may or may not be produced within a sporocarp (ascoma). The majority of the
species studied in our analysis belonged to this phylum. Within the Ascomycota three
main subphyla are recognized, the Taphrinomycotina, the Pezizomycotina and the
Saccharomycotina. Results from this study showed that all subphyla of Ascomycota
were well supported as monophyletic groups in the phylogenetic tree presented in
Figure 9.
In the results obtained with the Mcm1 protein and RLM1 gene (Bayesian
inference), the class Schizosaccharomycetes from the subphylum Taphrinomycotina sits
outside the other subphyla from Ascomycota. Taphrinomycotina has been resolved as
the earliest diverging monophyletic clade (Liu et al., 1999; Fitzpatrick et al., 2006;
James et al., 2006; Liu et al., 2006; Liu et al., 2009). It includes a diverse group of
species that exhibits yeast-like (for example, Pneumocystis carinii) and dimorphic (for
example, Taphrina deformans) growth forms. The incongruence observed in the
placement of the Schizosaccharomycetes could be due to the fact that in the MADS-box
proteins identified for Ustilaginomycotina and Schizosaccharomycotina a high
variability in the C‘-terminal region was observed. This variability caused a conflict in
the topology determined by bayesian inference and maximum likelihood for RLM1 gene
and Mcm1 protein, respectively (Figure 9 and 10), producing the paraphyly observed,
and grouping these species. Another hypothesis to explain the incongruence could be
the wrong and partial assembly of genomes, which would affect the correct sequence of
genes in the databases; for instance the sequence from MCM1 gene of Blastomyces
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
66
dermatitidis was identified, but part of the N‘-terminal was not matching with the query
sequence. Thus, this sequence had to be corrected and completed with the N‘-terminal
from closer species. Other reason for this incongruence could be the robustness of the
method for phylogenetic inference, since the ALRT approach in maximum likelihood
estimates the probability of a branch being correct better than bootstrapping, and is
more conservative than Bayesian posterior probabilities (Anisimova and Gascuel, 2006;
Hall and Salipante, 2007). That could explain why, the class Schizosaccharomycetes
was correctly placed in the analysis with Rlm1 protein and DNA analyses by maximum
likelihood and aLRT test.
The subphylum Saccharomycotina consists of the ‗true yeasts‘, including
bakers‘ yeast (Saccharomyces cerevisiae) and Candida albicans, the most frequently
encountered fungal pathogen of humans, and its monophyly has been demonstrated in
several studies (Lutzoni et al., 2004; Fitzpatrick et al., 2006; James et al. 2006; Suh et
al., 2006). Results obtained in this work are also in agreement with the previous studies
in which three clades, the CUG, the Saccharomyces sensu stricto and the
Saccharomyces sensu lato are clearly shown (Figure 8, 9 and 10).
Pezizomycotina is the largest subphylum of Ascomycota and includes the vast
majority of filamentous and fruit-body-producing species. Within the Pezizomycotina a
number of well-defined classes are observed, namely the Sordariomycetes, the
Leotiomycetes, the Eurotiomycetes and Dothideomycetes whose relationships has been
the subject of many debates in previous studies (Berbee, 1996; Liu et al., 1996). The
topology obtained with Rlm1 and Mcm1 protein/gene in this study indicated that
Leotiomycetes and Sordariomycetes are well supported sister clades (Figure 8, 9 and
10). This result is in agreement with several previous analyses, like the rDNA-based
analysis (Lumbsch et al., 2005), multigenic analyses (James et al., 2006; Spatafora et
al., 2006) and analyses based on complete fungal genomes (Fitzpatrick et al., 2006;
Robbertse et al., 2006).), but is in disagreement with one analysis of four gene
combined dataset that placed the Dothideomycetes as a sister group to the
Sordariomycetes (Lutzoni et al., 2004).
In this study, within the Sordariomycetes, the inferred phylogenetic relationships
amongst the Sordariomycetidae organisms concur with previous phylogenetic studies
RESULTS AND DISCUSSION
67
(Berbee, 2001). However, in the order Sordariales the placement of Magnaporthe grisea
is not well-defined as the analysis with Mcm1 protein considers it as basal taxon, unlike
the analysis with Rlm1 protein/gene considers Verticillium dahliae as basal taxon.
These results disagree with the phylogeny of Sordariomycetes obtained by other studies,
which placed V. dahliae as the sister taxon of the Hypocreales (Zhang et al., 2006).
Likewise, in this study the inferred phylogeny based on Mcm1 protein and RLM1 gene
(Bayesian inference) propose that Ephicloe festucae is a sister taxon of the group
formed by Fusarium species (Figure 9 and 10), and the analysis of Rlm1 protein/gene
(maximum likelihood) placed it as sister taxon of group formed by Trichoderma
species. However, in our study only one species from family Clavicipitaceae was used.
The Dothideomycetes class form a well-supported sister group with the
Eurotiomycetes in the analysis of Rlm1 protein/gene, but not in the analysis of Mcm1
protein (S-H=0.0). The former result agrees with the previous analysis carried out with
four combined genes which also grouped together the Dothideomycetes and the
Eurotiomycetes (Lutzoni et al; 2004). Conversely, a concatenated alignment of 42
fungal genomes inferred that Stagonospor nodorum (a member of the Dothideomycetes)
is more closely related to the Sordariomycetes and Leotiomycetes lineages (Fitzpatrick
et al., 2006). Another study also based on 17 Ascomycota genomes, which used
concatenated alignment, reported conflicting inferences regarding the phylogentic
position of S. nodorum (Robbertse et al., 2006). The phylogenies reconstructed based
on 17 genomes, using NJ and ML methods, inferred a sister group relationship between
S. nodorum and Eurotiomycetes (Robbertse et al., 2006). This work is in accordance
with a supertree inferred based on 42 complete genomes (Fitzpatrick et al., 2006) and
the tree obtained with Rlm1 protein/gene and Mcm1 protein analyses. However, the
same phylogenomic study inferred, by using maximum parsimony, the placement of S.
nodorum at the base of the Pezizomycotina (Robbertse et al., 2006). The topology
inferred in this work clearly showed two sister clades (orders Capnodiales and
Pleosporales), which concurs with previous phylogenetic studies (Schoch et al., 2006).
In this study, within the Eurotiomycetes class there is only a clade corresponding
to the order Onygenales (Histoplasma capsulatum; Blastomyces dermatitidis
Paracoccidioides brasiliensis, Coccidioides immitis, Coccidioides posadassii,
Uncinocarpus reesii, Microsporum gypseum and Ascosphaera apis). The Onygenales
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
68
clade is particularly interesting as it contains mainly animal pathogenic species. Some
of them namely Coccidioides immitis have initially been classified as a protist, but
further research showed it were a fungus and separate studies placed it in three different
divisions of the former group named Eumycota (Rixford and Gilchrist, 1896; Ophuls,
1905; Ciferri and Redaelli, 1936; Baker et al., 1943). Subsequent ribosomal phylogeny
studies (Pan et al., 1994; Bowman et al., 1996) suggested a close phylogenetic
relationship between C. immitis and U. reesii, excluding H. capsulatum. The
phylogenies based on Rlm1 and Mcm1 protein/gene agree with the placement of C.
immitis, C. posadassii and U. reesii as sister taxa, representing two families,
Gymnoascaceae and Onygenaceae. However, a conflict was observed in the placement
of Ascosphaera apis, which formed a clade with Microsporum gypseum with the Rlm1
protein/gene analysis and appears as a basal taxon in Eurotiomycetes, with the Mcm1
protein analysis. The obtained results disagree with the Eurotiomycetes phylogeny
study, in which Ascosphaera apis (Ascosphaeraceae) formed a sister clade in the
Ajellomycetaceae (H. capsulatum, P. brasiliensis and B. dermatitidis) and Microsporum
gypseum, a sister clade of Gymnoascaceae (Geiser et al., 2006). The Eurotiomycetes
branch containing the Eurotiales clade inferred the close relationship among
Talaromyces stipitatus and Penicillium marneffei (Figures 8, 9 and 10). A minor
difference in the Aspergillus clade was observed between the phylogenetic analyses
with Rlm1 and Mcm1 regarding the position of A. nidulans, A. terreus, A. niger and A.
fumigatus (Figure 8, 9 and 10). In this study, Neosatoria fischeri and A. fumigatus
formed well-supported sister taxa, as well as A. oryzae and A. flavus, in accordance to
Fitzpatrick et al.(2006) and James et al. (2006) studies.
Due to the fact that the majority of fungi are still undiscovered, a robust phylogeny
of known taxonomic groups will be essential for placement of unknown species as these
are discovered. As demonstrated for Bacteria and Archaea (Pace, 1997), the Fungi are
likely to harbor many lineages whose discovery is dependent on phylogenetic analyses.
Therefore, it will be necessary to carry out phylogenetic studies including a wider set of
fungal species from other phyla, as the Glomeromycota and the Chytridiomycota, what
will enable to resolve the phylogenetic relationships between misclassified and
misplaced fungal species.
RESULTS AND DISCUSSION
69
4.3. Relationships within the subphylum Saccharomycotina
Due to the relationship that exists between Candida species and ‗Saccharomyces
complex‘, these groups were further analysed by using both single gene and
concatenated gene phylogenies.
4.3.1. Phylogeny of the ‘Saccharomyces complex’
In figures 8, 9 and 10, the subphylum Saccharomycotina is clearly shown as a
monophyletic clade grouping Yarrowia lipolytica, Candida species and ‗Saccharomyces
complex‘ species, in the phylogenetic tree inferred for the kingdom Fungi. The
phylogenies deduced based on single gene or concatenated genes, including only
species belonging to the subphylum Saccharomycotina are present in Figures 11 to 14.
Results from this analysis showed a topology very similar to the previously inferred
topologies that included all other fungal species, by using Rlm1 and Mcm1 genes.
The Non Whole Genome Duplication (Non-WGD) group, formed by
Kluyveromyces lactis, Ashbya gossypii, Saccharomyces kluyveri and Kluyveromyces
waltii, presented monophyly in all analyses, except in the topologies based on MCM1
gene by bayesian inference (Figure 12B). The relationships amongst the Non-WGD
species presented a variable placement of taxa, with their grouping poorly defined
(Figures 11, 12 and 13). These results were not conclusive due to the low number of
taxa within this complex that have available sequences for this study. In other studies,
based on multigenic and phylogenomic analyses, the relationship amongst the Non-
WGD species has been demonstrated showing K. waltii – S. kluyveri and K. lactis – A.
gossypii grouping together (Fitzpatrick et al., 2006; James et al., 2006; Jeffroy et al.,
2006; Liu et al., 2006).
Another study of the ‗Saccharomyces complex‘ based on multigenic analysis,
that used 75 species showed A. gossypii and K. lactis forming basal separated clades
from S. kluyveri and K. waltii, these latter clade with a medium support (Kurtzman and
Robnett, 2003). Another phylogenetic study based on SSU and LSU rDNA sequences
from 64 species of the family Saccharomycetales placed K. lactis as an early clade and
A. gossypi as a basal clade (Suh et al., 2006). Thus, the correct relationship amongst the
Non-WGD species is still to be resolved.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
70
In our study, a noticeable difference occured in the placement of Saccharomyces
castellii in the WGD clade with respect to previous studies. In the analyses based on
RLM1 gene and protein, S. castellii was a well-supported early taxon within the WGD
group (Figure 11).
A B
Figure 11. Phylogeny of the subphylum Saccharomycotina inferred by using RLM1. (A) Protein
phylogeny obtained by maximum likelihood using the evolutive model JTT + I + G with
nodal support by Shimodaira-Hasegawa; (B) Gene phylogeny obtained by using the evolutive
model GTR + I + G. Nodes values correspond to posterior probabilities (left) and maximum
likelihood (right), Shimodaira-Hasegawa like support). Species with partial sequence are
Kluyveromyces waltii, Saccharomyces paradoxus.
However, the analysis based on Mcm1 protein showed a low support of S.
castellii as an early taxon from Saccharomyces sensu stricto group (S. bayanus, S.
kudriavzevii, S. mikatae, S. cerevisiae and S. paradoxus). The analysis with MCM1 gene
indicated a weak monophyly by ML with Saccharomyces sensu lato group (S. castellii,
C. glabrata, Vanderwaltozyma polyspora) and BI presented a medium support as early
taxon from Saccharomyces sensu stricto group (Figure 12).
The Rlm1-Mcm1 concatenated phylogeny placed S. castellii as an early taxon from
Saccharomyces sensu stricto group, both by maximum likelihood and bayesian methods
(Figure 13). Additionally, in the global fungal phylogeny analysis based on Rlm1 and
Mcm1 sequences, S. castellii also presented an inaccurate placement (Figures 11 - 13).
RESULTS AND DISCUSSION
71
Inacurrate placement of S. castellii was also obtained by past multigenic studies, in
which it either competed with C. glabrata to the basal placement in the Saccharomyces
sensu stricto group, or formed a clade together with this group (Diezmann et al., 2004;
Fitzpatrick et al., 2006; James et al., 2006; Jeffroy et al., 2006; Liu et al., 2006).
A B
Figure 12. Phylogeny of the subphylum Saccharomycotina inferred by using MCM1. (A) Protein
phylogeny maximum likelihood using the evolutive model JTT + I + G with nodal support by
Shimodaira-Hasegawa; (B) Gene phylogeny using the evolutive model GTR + I + G. Nodes
values correspond to posterior probabilities (left) and maximum likelihood (right,
Shimodaira-Hasegawa like support). b 0.16 is the support from group formed by S. castelli,
C. glabrata and V.polypsora by maximum likelihood. Species with partial sequence:
Kluyveromyces waltii
However, a study including a higher number of Saccharomycetales species and
based on SSU and LSU rDNA showed that V. polyspora was closer to S. cerevisiae than
S. castellii and C. glabrata (Suh et al., 2006). Another study based on mitochondrial
and nuclear genes as COX1, mit SSU rDNA, EF-1α, rDNA (ITS1, ITS2), nuc LSU
rDNA (D1/D2), placed S. castellii forming a closer clade to Saccharomyces sensu
stricto group, and C. glabrata and V. polyspora forming earlier clades than S. castellii
with respect to Saccharomyces sensu stricto group (Kurtzman and Robnett, 2003).
The sister group relationships amongst the Saccharomyces sensu stricto species
also differed between Rlm1, Mcm1 and concatenated phylogenies (Figures 11 - 13). For
example, the Saccharomycetales Rlm1 protein/gene phylogeny placed S. bayanus and S.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
72
kudriavzevii as sister taxa at the base of the Saccharomyces sensu stricto node and S.
mikatae, S. cerevisiae and S. paradoxus forming a group within it, by ML and BI. But
Mcm1 protein phylogeny placed S. mikatae as basal taxon while MCM1 gene
phylogeny did not define with accuracy the basal taxon between S. bayanus or S.
kudriavzevii.
A B
Figure 13. Phylogeny of the subphylum Saccharomycotina inferred by using concatenated alignment of
RLM1 - MCM1. (A) Protein phylogeny obtained by maximum likelihood with the evolutive
model JTT + I + G and nodal support by Shimodaira-Hasegawa; and (B) Gene phylogeny
obtained using the evolutive model GTR + I + G. Nodes values correspond to posterior
probabilities (left) and maximum likelihood (right, Shimodaira-Hasegawa like support).
Species with partial sequence: Kluyveromyces waltii.
The concatenated alignment of Rlm1 and Mcm1 proteins (Figure 13) inferred a
result similar to Rlm1 gene/protein phylogenies (Figure 12), and the analysis based on
concatenated alignment of genes (RLM1 and MCM1) placed S. bayanus at the base of
the Saccharomyces sensu stricto node and inferred a ladderised topology amongst the
Saccharomyces sensu stricto species (Figure 13). These phylogenetic trees, based on
RLM1 and concatenated genes, agree with supertrees that indicated both ladderised
topology and S. bayanus and S. kudriavzevii as sister taxa (Fitzpatrick et al., 2006,
Jeffroy et al., 2006). The study of the ‗Saccharomyces complex‘ based on multigenic
analysis also showed a ladderised topology in Saccharomyces sensu stricto species,
having S. bayanus as basal taxon (Kurtzman and Robnett, 2003).
RESULTS AND DISCUSSION
73
To analyze the degree of conflicting phylogenetic signal within the concatenated
alignment, a phylogenetic network was constructed and a bootstrap analysis was
performed on this phylogenetic network (Figure 14). It is interesting to note that
numerous alternative splits were obtained (60 in total), however no splits were observed
excluding C. glabrata, V. polyspora and S. castellii from the remaining WGD
organisms, despite these species form clades within the WGD group. These results are
in conflict with the concatenated phylogeny (Figure 13), which inferred that C. glabrata
and V. polyspora formed a well-defined clade, and S. castellii was the basal species
from the remaining WGD organisms. The syntenic information clearly shows that S.
castellii diverged from the Saccharomyces sensu stricto lineage before C. glabrata
(Scanell et al., 2006). Therefore topologies that place C. glabrata as outgroup to the
Saccharomyces sensu stricto lineage and S. castellii or vice-versa are unreliable
(Scanell et al., 2006; Fitzpatrick et al., 2006; Suh et al., 2006), like the ones observed in
our study with the simple and concatenated genes. It is known that systematic bias can
influence the resulting tree (Phillips et al., 2004). Hence, a closer scrutiny would be
necessary.The incongruences found in these analyses suggest that the use of additional
Saccharomycotina taxa would be important to help solve the placement of species
belonging to these groups. In addition, it is likely that stochastic errors and erroneous
inferences could be eradicated with the addition of extra genome data, when it becomes
available.
Figure 14. Phylogenetic network reconstructed using a concatenated alignment of RLM1 – MCM1 genes
in the subphylum Saccharomycotina. The NeighborNet method was used to infer splits within
the alignment (1000 bootstrap nodal support).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
74
4.3.2. Phylogenetic relationship amongst Candida species
The group of organisms that translate mainly CUG as serine instead of leucine has
been designated as the CUG group (Kawaguchi et al., 1989; Santos and Tuite, 1995).
This codon reassignment has been proposed to have occurred about 170 million years
ago (Massey et al., 2003). Rlm1, Mcm1 and concatenated topologies inferred a robust
monophyletic clade containing organisms within the CUG group (Figures 11 - 13). The
topology analysis showed that there are four putative CUG sub-clades, the first
containing C. lusitaniae, the second C. guilliermondii and Debaromyces hansenii, the
third Pichia stipitis and the fourth containing C. tropicalis, C. albicans, C. dubliniensis,
C. parapsilosis, and Lodderomyces longisporus. These topologies have also been
recognized in other phylogenetic studies inferred by rRNA genes (Suh et al., 2006).
Some phenotypic and morphologic characteristics could correlate with the topology
observed since C. lusitaniae and C. guilliermondii are haploid yeasts, and their
teleomorphic states present sexual reproduction (Wickerham and Burton, 1954;
Rodriguez de Miranda, 1979; Young et al., 2000). D. hansenii and Pichia stipitis are
homothallic, with a fused mating locus (Dujon et al., 2004; Fabre et al., 2005). In
contrast, members of the fourth sub-clade have at best a cryptic sexual cycle and have
never been observed to undergo meiosis (Hull and Johnson, 1999; Hull et al., 2000;
Magee and Magee, 2000; Pujol et al., 2004; Logue et al., 2005).
The phylogeny of Candida species was also investigated in this study by using a
further analysis with Iff8 protein and gene (Figure 15). The resultant topologies placed
L. elongisporus within the asexual clade with high support, which is in agreement with
other phylogenetic studies based on multigenic and phylogenomic analyses (James et
al., 1994; Fitzpatrick et al., 2006; Diezmann et al., 2004).
The CUG specific topology based on three concatenated proteins/genes, RLM1,
MCM1 and IFF8, suggested that D. hansenii and C. guilliermondii are sister taxa, as
they are grouped together with high support by maximum likelihood and bayesian
inference, excluding C. lusitaniae and P. stipitis (Figure 16). Topologies obtained with
Rlm1 protein/gene and RLM1-MCM1 concatenated genes presented conflicts in the
placement of P. stipitis, being included within C. guilliermondii – Debaryomyces
hansenii clade (Figure 11 and 13A).
RESULTS AND DISCUSSION
75
A B
Figure 15. Phylogeny of CUG Group based on IFF8. (A) Protein phylogeny obtained by maximum
likelihood, using the JTT + I + G evolutive model with nodal support by Shimodaira-
Hasegawa; (B) Gene phylogeny obtained by using the evolutive model used was GTR + I +
G. Nodes values correspond to posterior probabilities (left) and maximum likelihood (right,
Shimodaira-Hasegawa like support). *In the maximum likelihood analysis, C. tropicalis,
C.parapsilosis and L.elongisporus form a clade with 0.41 S-H and C.parapsilosis-
L.elongisporus are sister taxa with 0.98 S-H. The species mentioned before with C.albicans
and C.dubliniensis form a clade well supported 0.97S-H.
A B
Figure 16. Subphylum Saccharomycotina phylogeny reconstructed using concatenated alignment of
RLM1 - MCM1 – IFF8. (A) Protein phylogeny obtained by maximum likelihood with the
evolutive model JTT + I + G and Shimodaira-Hasegawa like nodal support; (B) Genes
phylogeny obtained by using the evolutive model used was GTR + I + G. Nodes values
correspond to posterior probabilities (left) and maximum likelihood (right, Shimodaira-
Hasegawa like support).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
76
Other studies have placed C. lusitaniae in a clade with C. guilliermondii, and
inferred a closer relationship between the two with Debaryomyces species (Daniel et al.,
2001; Diezmann et al., 2004). These results raise interesting questions regarding the
sexual status of the Candida species. It is possible that the "asexual" species are in fact
fully sexual. C. albicans, C. dubliniensis and C. tropicalis have been observed to mate
(Soll et al., 1988; Pujol et al., 2004), and in addition C. albicans genome contains most
of the required genes for the development of meiosis (Tzung et al., 2001). In contrast
the evidence that L. elongisporus reproduces sexually is based on the appearance of
asci, with one (or sometimes two) spores (Van der Walt, 1966).
The phylogenetic network constructed based on the three concatenated genes
(Figure 17) corroborated the CUG-specific concatenated tree regarding the grouping of
C. guilliermondii and D. hansenii as sister taxa, excluding C. lusitianiae and P.stipitis,
but when the network was inferred by two concatenated genes C. guilliermondii, D.
hansenii and P. stipitis formed sister taxa (Figure 14). This analysis showed a topology
conflict with the phylogeny based on Iff8 protein and gene for L. elongisporus group
with C. parapsilosis (Figure 15). As expected there was no conflict for the grouping of
C. albicans and C. dubliniensis, illustrating their high genotypic similarity (Sullivan et
al., 1995), and those species formed with C. tropicalis a highly supported clade, which
agree with other recent published studies (Fitzpatrick et al., 2006; James et al., 2006;
Suh et al., 2006).
Figure 17. Subphylum Saccharomycotina phylogenetic network reconstructed using a concatenated
alignment of RLM1 – MCM1 – IFF8 genes. The NeighborNet method was used to infer splits
within the alignment (1000 bootstrap nodal support).
RESULTS AND DISCUSSION
77
4.4. Analysis of adaptive evolution
Genome duplication has occurred in the WGD group as previously described (Kellis
et al., 2004). Understanding the fate of duplicates is fundamental to clarify mechanisms
of genetic redundancy and the link between gene family diversification and phenotypic
evolution. Models to identify possible sites of positive selection can provide alternative
explanations for the persistence and diversification of gene duplicates.
The detection of an excess in the ratio of the rate of nonsynonymous (dN) over the
synonymous (dS) substitutions (dN/dS, also denoted ω) is a nonambiguous indicator of
positive selection at the codon level. Positive selection (ω>1) is considered as the
selection of fixing advantageous mutations, and this term is used interchangeably with
molecular adaptation and adaptive molecular evolution. Purifying selection (ω<1) is
considered as the natural selection against deleterious mutations and the term is used
interchangeably with negative selection or selective constraints. Several likelihood-
based tests were used to search for evidence of positive selection using the PAML 4.1
and HYPHY 1.0 packages. Codon-based substitution models have been widely used to
identify amino acid sites under positive selection in comparative analysis of protein-
coding DNA sequences.
From the three genes analyzed in this study, the RLM1 was the one that gave better
phylogenetic results. In this view, the sequence of RLM1 was analyzed at the codon
level to detect both positively selected and purified selected amino acid sites by using
the complex models M3, M2a, an M8 and their corresponding null models M0, M1a
and M7, respectively.
First, the RLM1 sequences from animals, plants and fungi were used to infer
positive selection by testing the six different codon sites models implemented in the
PAML 4.1 package (Annexes VII). According to Nielsen and Yang, (1998) in the
analyses of sequences with high variability the gaps introduced in the alignment should
be removed. Thus the first analysis, in which all animals, plants and fungi RLM1
sequences were tested, only the MADS-box was considered. Results showed no
evidence for positive selection in the MADS-box type II domain, suggesting that this
sequence was extremely conserved within the fungi. Then, this same analysis was
performed in the complete RLM1 gene but only for the subphylum Saccharomycotina to
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
78
avoid the presence of a high number of gaps and to analyze closer species. This new
analysis was performed by using the HYPHY 1.0 package because it consider the
presence of gaps calculating a dN/dS value for those positions (Annex VIII). To better
analyze this sequence the conserved moieties were identified. In the alignment of
Saccharomycotina Rlm1 proteins four conserved moities were recognized, the first
motif is the MADS-box conserved region at the N-terminal, the second is an acidic
region close to MADS-box, third and fourth motif were named A and B, respectively
(Figure 18).
Figure 18. Conserved regions in RLM1 gen of the subphylum Saccharomycotina: MADS-box, Acidic
region, A-B Conserved regions present in this subphylum.
MADS-BOX
Acidic region A
B
RESULTS AND DISCUSSION
79
At the C-terminal of Rlm1 a repetitive region was identified in Candida albicans,
Candida dubliniensis, Kluyveromyces lactis and Saccharomyces sensu stricto group
(Figure 19).
Results obtained by applying the models to identify sites of positive selection
showed exactly the same evidence, no positive selection not only in the MADS-box
domain but also in the all RLM1 sequence, confirming the previous findings. The
significance of the values obtained with the complex models against their corresponding
null models was tested by using a Likelihood Ratio Test (LRT), according to Anisimova
et al. (2001) (Annex IX).
The LRT test confirmed the results supporting no positive selection. Overall, the
results from maximum-likelihood models of codon evolution indicated that the
evolution of MADS-box from fungi and Saccharomycotina RLM1 gene are mainly
under purifying selection, suggesting that the expressed protein plays an important role
in the fungi that must be maintained.
Figure 19. Region of microsatellites present in some species from subphylum Saccharomycotina.
Alignment was performed by using MEGA 4.0 with ClustalW.
A new codon-based model the Mechanistic Empirical Combined (MEC), based
on the junction of all parameters used by the traditional models mentioned before and a
new empirical amino acid replacement model was recently described (Doron-
Faigenboim and Pupko, 2006). This MEC model was also tested on the
Saccharomycotina sequences to detect sites in Rlm1 under positive selection. By means
of this new model it was possible to identify sites under positive selection in the
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
80
Saccharomycotina Rlm1 protein. The protein sequence from S. cerevisiae was used as
pattern to identify those sites (Figure 20).
The MEC model produced a loglikelihood value of -38544.7 that is higher than -
39363.5 obtained for M8a (used as the null model) which indicates the presence of sites
under positive selection. To test the significancy of the MEC model result, an AICc
(Akaike Information Criterion correction) was performed and the value of AICc for the
MEC model (77099.41602) was lower than the one obtained for the M8a model
(78735.01068), which means that the hypothesis of positive selection can be accepted.
Several sites in the Rlm1 sequence were identified as putatively under positive
selection. It is curious to observe that 57% (20 out of 35) of the places under positive
selection were observed in the first part of the protein, right after the MADS-Box and
before the region A. Between the MADS-box and the acidic region 12% of the amino
acid residues were identified as being under positive selection, between the acidic
region and the region A 8%, between the A and B region 1.7%, and between the B and
the repetitive region 10% (Figure 20).
A particular fact was the observation of amino acids under positive selection
near the microsatellite region. Taking into consideration this repetitive region, we can
observe that in C. albicans, C. dubliniensis and K. lactis the amino acid under repetition
is glutamine, coded by CAA, while in the group of Saccharomyces sensu stricto /S.
cerevisiae is asparagine, encoded by AAC. This observation showed that amino acid
substitutions that would have occurred during the divergence of species, due to events
of mutations and natural selection changed this microsatellite region and probably
diversified gene function in WGD species, avoiding the loss of protein function.
Numerous lines of evidence have demonstrated that genomic distribution of
repetitive DNA, particularly of microsatellites, is nonrandom presumably because of
their effects on chromatin organization, regulation of gene activity, recombination,
DNA replication, cell cycle, mismatch repair (MMR) system, etc. (Li et al., 2002).
Microsatellites may provide an evolutionary advantage of fast adaptation to new
environments as evolutionary tuning knobs (Kashi et al., 1997; Trifonov, 2003). Due to
the presence of trinucleotide microsatellites within protein coding regions tracts of
RESULTS AND DISCUSSION
81
repeated amino acid are present in some proteins and different amino acid repeats are
concentrated in different classes of proteins. In yeast proteins, the most abundant amino
acid repeats are Gln, Asn, Asp, Glu, and Ser (Richard and Dujon 1997; Alba et al.,
1999). In this work within Saccharomycotina Rlm1 microsatellite region tracts of poly-
Gln and poly-Asn have been found (Figure 19). The change in the amino acid tract at
the microsatellite region, from Gln (CAA) to Asn (AAC) could be due to a possible
frameshift mutation that occurred in the gene before the microsatellite region. Since,
this kind of mutations has been already observed in MADS-Box genes, exactly at the C-
terminal (Kramer at al. 2006). Thus, the microsatellite repetitive region of this gene was
further analysed.
Figure 20. Saccharomyces cerevisiae amino acid sequence showing sites under positive selection by the
MEC model. Conserved regions: MADS-box (blue), Acidic region (orange), Region A
(purple), Region B (green) and microsatellites (red).
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
82
4.5. Event of frameshift mutation in Saccharomyces cerevisiae RLM1
gene
In the sequence alignment of Saccharomycotina Rlm1 proteins four conserved
moities were recognized, and at the C-terminal a repetitive region was identified in C.
albicans, C. dubliniensis, Kluyveromyces lactis and Saccharomyces sensu stricto group,
which seems to be under positive selection. It is difficult to determine the molecular
events that changed the amino acid residues in the sites identified as under positive
selection but from the nucleotide sequence analysis, the molecular event that could have
changed the tract at the repetitive region might have been a frameshift mutation.
Additionally, this type of mutations has been already observed at the C- terminal of
MADS-box genes, as previously mentioned, and thus this hypothesis was tested by
analyzing the different RML1 reading frames.
The three open reading frames from C. albicans, C. dubliniensis, K. lactis and S.
cerevisiae were analyzed to find all the potential repetitive amino acids in the C-
terminal region of this gene. The first reading frame showed a poli- Glutamine (Gln–Q)
tract in the C-terminal of C. albicans, C. dubliniensis and K. lactis, and a poli-
Asparagine (Asn–N) in the C - terminal of S. cerevisiae (Figure 21). These reading
frames are the actual coding frames of the mentioned yeast species. The S. cerevisiae
second reading frame showed a translation to Threonine (The–T), while the third
reading frame encoded a poli-Glutamine tract, just like C. albicans, C. dubliniensis and
K. lactis (Figure 21).
The mutational mechanisms known to contribute to microsatellite polymorphism
and that could affect the reading frame in the microsatellite region are the following: i)
DNA polymerase slippage during DNA replication (Tachida and Izuka, 1992) and ii)
recombination (unequal crossover or gene conversion) between DNA strands (Harding
et al., 1992; Li et al., 2002). Microsatellites studies indicated that trinucleotide repeats
in coding sequences of many organisms are selected against possible frameshift
mutation since these mutations are generally considered detrimental by causing
nonfunctional transcripts and /or protein, through possible insertion of premature stop
codons (Ohno, 1970, Tóth et al., 2000; Wren et al., 2000; Cordeiro et al., 2001).
Although trinucleotide repeats are under selective pressure to avoid frameshift
mutations, exceptional situations have been described when a protein is temporarily
RESULTS AND DISCUSSION
83
freed from selective pressure. If selective pressure is relieved, for example, through a
second copy of a gene, this duplicate can compensate for the possible loss of function
caused by the frameshift mutation and enable such mutations to lead to functional
divergence (Raes and Van de Peer, 2005).
Figure 21. The three RLM1 open reading frames of Candida albicans, Kluyveromyces lactis and
Saccharomyces cerevisisae visualized with the Virtual Ribosome 1.0. The microsatellite
region, as well as the most likely position to insert a nucleotide and change S. cerevisisae
reading frame, avoiding the presence of stop codons, are indicated
Genomic duplication has been proposed as an advantageous path to evolutionary
innovation, because duplicated genes can supply genetic raw material for the emergence
of new functions through the forces of mutation and natural selection (Ohno, 1970).
Such duplication can involve individual genes, genomic segments or whole genomes
and coordinated duplication of an entire genome may allow for large-scale adaptation to
new environments.
In this work, the phylogeny analysis clearly inferred two groups that suffered
genome duplication: Saccharomyces sensu strict and sensu lato and it was in the
Saccharomyces sensu stricto group that the amino acid encoded by trinucleotide repeats
suffered alteration. The genome duplication of S. cerevisisae was firstly proposed by the
locations of paralogues that revealed the existence of ancestral duplication blocks
(Wolfe and Shields, 1997; Langkjaer et al., 2003). The model WGD was then proposed
Nucleotide insertion
C. albicans
K. lactis
S. cerevisiae
3
2
1
3
2
1
3
2
1
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
84
and corroboted by syntenic and genomic studies (Wolfe and Shields, 1997; Seoighe and
Wolfe, 1999, Kellis et al., 2004). Those studies postulated that the WGD event occurred
after the divergence of S. cerevisiae and K. lactis, being also identified a group of Non-
WGD species represented by K. lactis, A. gossypii, S. kluyveri and K. waltii (Byrne and
Wolfe, 2006). In accordance to the inferred phylogenies firstly the CUG group diverged
from the Non WGD – WGD groups, and then the Non WGD – WGD groups separated.
Therefore, the microsatellite within RLM1 must have been present in same ancestral
taxon within Saccharomycotina, evolving during the speciation process, and changing
in recent species, possibly due to a frameshift mutation, as this work is indicating.
Additionally, frameshift mutations have been detected in MADS-box transcription
factor families of plant, causing mutations exactly in the C–terminal and yielding
functional diversification (Lachtman, 1998, Kramer et al., 2006).
In the present study two members of the MADS- box transcription factor family
were studied, the Rlm1 and the Mcm1. However, in S. cerevisiae two other members of
this family were identified, and this species presents four MADS- box in total: the Smp1
whose function is involved in regulating the response to osmotic stress (de Nadal et al.,
2003) and with orthologues in the WGD group and none in the Non WGD and CUG
groups; and the Arg80 whose function is involved in the regulation of arginine-
responsive genes (Dubois et al., 1987), being identified in S. cerevisiae and in the WGD
group (Annexes V and VI). Recently, the complete genomes of Zygosaccharomyces
rouxii and Kluyveromyces thermotolerans have been sequenced, showing that Z. rouxii
presents four MADS-box transcription factors, like the WGD group, while K.
thermotolerans presents just two MADS-box transcription factors, like Non-WGD
group. Therefore, we could infer that the genome duplication in Saccharomycotina
would have occurred in taxa that presented the four MADS-box transcription factors.
Previous studies suggested that at least one MADS-box gene was present in the
common ancestor of plants, animals, and fungi, and that probably the duplication that
gave rise to the animal MADS- box type II and type I genes occurred after animals
diverged from plants, but before fungi diverged from animals (Theissen et al., 1996).
However, Alvarez-Buylla et al. (2000) proposed that at least one ancestral MADS- box
gene duplicated in the common ancestor of the major eukaryotic kingdoms, more than a
billion years ago, to give rise to the distinct type I (SRF-like) and type II (MEF2-like)
RESULTS AND DISCUSSION
85
lineages found in plants, fungi, and animals today. Figure 22 presents a scheme of the
MADS-box genes evolution model based on Alvarez-Buylla et al. (2000).
Saccharomyces cervisiae Candida albicans
B)
Figure 22. A) Evolution of MADS-box genes based on the Alvarez-Buylla et al.(2000) model, proposing
a common origin for Animals, Fungi and Plants and a further duplication event in the Fungi
lineage in some species within the subphylum of Saccharomycotina (WGD group). B)
Comparison of Rlm1 protein/gene domains in S. cerevisiae and C. albicans, showing the
frameshift mutations occurred after genome duplication in S. cerevisiae and WGD species
that changed the coding amino acid from Q to N.
Plants
Lineage
Animal and Fungi
Lineages
Animals
Lineage
Fungi
Lineage
WGD – Saccharomycotina
Lineage
Common
Ancestral
1 billion
years ago
(Based on
Alvarez-Buylla et
al., 2000)
APETALA,
PISTILLATA,
DEFICIENS,
CAULIFLOWER,
FRUITFULL,
AGAMOUS and
SQUAMOSA
MEF2-like and
SRF-like
isoforms
A)
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
86
Animal and fungal MADS-box sequences type II are more closely related to most
plant MADS-box sequences type II than to animal MADS-box sequences type I,
suggesting the occurrence of at least one gene-duplication event before the divergence
of plants and animals. This model also proposes that after the divergence of animal-
fungi and plant lineages both types of MADS-box genes were transferred to each
lineage. In the plant lineages different evolutive mechanisms such as gene duplication,
frameshift mutation, and alternative splicing, occurred, producing a variety of MADS-
box transcription factor genes, while in fungi and animals each diverged with both
MADS-box types. In animals the alternative splicing has been reported as the main
evolutive mechanism that changed these genes. Fungi lineages maintained both MADS-
box types represented by RLM1 and MCM1 genes, but within the ‗Saccharomyces
complex‘ in subphylum Saccharomycotina, with the whole genome duplication event
occurring in Saccharomyces sensu stricto and sensu lato clades, the genes SMP1 and
ARG80 appeared, that are paralogues to Rlm1 and Mcm1, respectively. After this
genome duplication the coded amino acid in the repetitive region of RLM1 gene within
the Saccharomyces sensu stricto group changed. This was due to a frameshift mutation
to encode Asn instead of Gln as in the Non-WGD and CUG groups, that represent more
basal clades in the subphylum Saccharomycotina.
The putative ancestral RLM1 gene was predicted to try to confirm the microsatellite
codon present in the commom ancestor from which CUG, Non-WGD and WGD groups
diverged (Figure 23). This ancestral gene presented the main features characteristic of
the MADS-box transcription factor that has been previously identified namely, the
MADS-box, the acidic region and the regions A and B (Figure 18), but the repetitive
region was not observed. However, in the C-teminal of the protein this predicted
putative ancestral gene/protein showed a low accuracy percentage (59.9% - 43%),
presenting 194 unreliable sites of 1407. These unreliable sites are present in different
regions other than conserved ones, indicating a large variability on the unreliable
regions that were mainly replaced by amino acids with similar properties. On the other
hand, the conserved regions will be a primordial in the function of the protein and thus,
their presence in basal taxa in the subphylum Saccharomycotina.
Several reports indicate that transcription factors and protein kinases are
significantly associated with acidic and polar amino acid repeats, whereas Ser repeats
RESULTS AND DISCUSSION
87
are significantly associated with membrane transporter proteins (Alba et al., 1999).
Rlm1 is a transcription factor and the repetitive tracts observed are poly-Gln and poly-
Asn, which agrees with the previous refered works.
M G R R K I E I Q P I T D D R N R T V T F I K R K A G L F K
5' ATGGGTAGAAGAAAGATTGAAATTCAGCCGATCACGGACGATCGAAACCGGACAGTGACGTTTATCAAGCGTAAGGCCGGTCTATTCAAA 90
>>>.......................................................................................
* * *
K A H E L A V L C Q V D V A V I I F G N N N K L Y E F S S V
5' AAGGCTCACGAGTTGGCTGTGCTTTGCCAGGTAGATGTTGCGGTGATCATCTTTGGCAACAACAATAAGCTGTACGAGTTCTCCTCTGTT 180
............)))......................................................)))..................
*
D T N E L I K R Y Q K N M P H E M K A P E D Y G D Y K K K K
5' GACACAAACGAGTTGATTAAAAGATACCAGAAAAACATGCCGCACGAAATGAAGGCGCCTGAGGACTACGGAGACTACAAGAAGAAAAAA 270
............))).....................>>>.........>>>.......................................
* * * * *
H V G D D R P T A H F N V N N N N D D D D D D D D D E D D D
5' CATGTGGGTGATGATAGACCAACGGCACATTTTAATGTTAATAACAATAATGACGATGATGATGACGATGATGATGATGAAGATGATGAT 360
..........................................................................................
* * * * * * * * * * * * *
D T D N D T P D A N P K D K N E T E K K T Q N Q T P K E L N
5' GATACTGATAACGATACTCCTGATGCCAATCCTAAAGATAAAAACGAGACGGAAAAAAAAACTCAAAATCAAACTCCTAAGGAGCTGAAC 450
....................................................................................)))...
* * * * * * * * * * * * * * * *
H P Q Q P P P P Q H T S Q Q Q V Q K P E A P K D Q P A P P T
5' CATCCTCAGCAACCCCCGCCGCCTCAACATACTTCTCAACAACAAGTACAAAAACCAGAAGCACCTAAAGACCAACCCGCGCCGCCGACA 540
..........................................................................................
* * * * * * * * * * * * *
P P N H T H K N P D T M N Q R P E P P V Q I P T D V Q T N H
5' CCTCCAAACCACACACACAAGAACCCGGATACTATGAACCAAAGACCTGAACCGCCAGTTCAAATTCCAACTGATGTCCAGACTAATCAT 630
.................................>>>......................................................
* * * * * * * * * * * * * * * * * *
Q N D N A K T I T A I H T H K Q K N Q K N T K N K N N K N N
5' CAGAACGATAATGCTAAAACCATTACGGCCATTCACACCCATAAACAAAAAAATCAAAAGAATACTAAGAATAAAAATAATAAAAATAAT 720
..........................................................................................
* * * * * * * * * * * * * * * * * * * * * * * *
L S N Q N H I N S E T T P T T N Y S A S I K T P D S S N H T
5' CTAAGTAATCAGAATCATATTAATAGTGAAACTACTCCAACTACCAATTATTCTGCATCGATCAAAACACCTGATTCATCAAACCATACT 810
..........................................................................................
* * * * * * * * * * * * * *
L P I P T T T K E L P P T P V T A T A P G L P I S K N N P Y
5' TTGCCAATACCAACAACAACTAAAGAACTACCTCCTACTCCAGTTACTGCAACTGCTCCTGGATTACCAATAAGTAAAAATAATCCATAT 900
))).......................................................................................
* * * * * * * * *
F F G S E P P P Q V S P P Y P S I T P I L Q H E I P Q Q D P
5' TTTTTTGGATCCGAACCTCCACCTCAAGTTTCTCCTCCATATCCTTCTATTACACCTATACTTCAACATGAAATTCCTCAACAAGATCCA 990
..........................................................................................
* * * * * * * * * *
P P Q P V G Q H T P Q Y Q Q T T Q T D T N N K K K L P P P H
5' CCGCCGCAACCGGTAGGACAACATACACCTCAATACCAACAAACTACTCAAACAGATACTAACAATAAGAAAAAATTACCACCACCACAT 1080
..........................................................................................
* * * * * * * * * * * * * *
P N L N H P Q N N A A P I P P P E T I E H N P T G E E Q T P
5' CCTAATCTAAATCATCCACAAAATAATGCGGCACCAATACCTCCACCTGAAACAATTGAACATAATCCTACTGGTGAAGAACAAACGCCA 1170
..........................................................................................
* * * * * * * * * * * * * * * * * *
I S G L P S R Y V N D M F P S P S N F Y A P Q D W P T G M T
5' ATATCAGGACTACCATCGCGATATGTTAACGACATGTTTCCATCTCCATCTAACTTCTACGCTCCTCAAGATTGGCCCACTGGTATGACA 1260
.................................>>>................................................>>>...
* * *
P I N A N M P Q Y V A G T I P A G G P K K E Q T R T R K K T
5' CCCATTAATGCTAATATGCCTCAATACGTTGCGGGTACGATTCCTGCAGGAGGTCCTAAGAAAGAACAAACTCGAACTCGCAAAAAAACA 1350
...............>>>........................................................................
* * * * * * * * * * * * * * * * * * *
K G N D K T D N K E D N N K E K K R K
5' AAGGGAAATGATAAAACGGATAATAAGGAAGACAATAATAAAGAAAAAAAGAGAAAA 1407
.........................................................
* * * * * * * * * * * * * *
Figure 23. Predicted putative ancestral RLM1 gene sequence from which CUG, WGD and Non-WGD
groups diverged. MADS.box (red), Acidic region (orange), conserved region A (yellow) and
conserved region B (green). *unreliable sites.
Earlier investigations speculated that eukaryotes incorporating more DNA
repeats, might provide a molecular device for faster adaptation to environmental
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
88
stresses (Kashi et al., 1997; Marcotte et al. 1999; Wren et al., 2000; Li et al. 2002;
Trifonov, 2003). This speculation has been supported by an increasing number of
experiments. For instance, in S. cerevisiae, microsatellites are overrepresented among
ORFs encoding for regulatory proteins (e.g., transcription factors and protein kinases)
rather than for structural ones, indicating the role of microsatellites as a factor
contributing to fast evolution of adaptive phenotypes (Young et al., 2000). In
prokaryotes, microsatellites are not as abundant as in eukaryotes. Most of the
microsatellites in bacteria are located in virulence genes and/or regulatory regions, and
they affect pathogenesis and bacterial adaptive behavior (Hood et al., 1996; Peak et al.,
1996; van Belkum et al., 1998; Field and Wills, 1998). The contingency genes
containing microsatellites show high mutation rates, allowing the bacterium to act
swiftly on deleterious environmental conditions, showing the role of these repetitive
DNA sequences in natural selection (Moxon et al. 1994). In C. albicans, it has been
demonstrated that the microsatellite region in the 3‘end of RLM1 gene, denominated
CAI, presents high polymorphism (Sampaio et al., 2003, 2009). Additionally, the
analysis of CAI polymorphism in strains from recurrent infections revealed
microevolutionary changes at CAI, suggesting a putative involvement of RLM1 in
evolutive and/or adaptative response of C. albicans strains during recurrence (Sampaio
et al. 2005).
The C-terminal of proteins from the MADS box family is necessary for dimerization
and required for transcripticional activation (Messenguy and Dubois, 2003). It is known
that their regulatory specificity depends on accessory factors and in many cases the
cofactor with which the protein interacts specifies which genes are regulated, when they
are regulated and if these genes are transcriptionally activated or repressed (Shore and
Sharrocks, 1995). Moreover, Sampaio et al., (2009) demonstrated that the CAI
repetitive region confers a high genetic variability to RLM1 gene which is reflected in
strain susceptibility to different stress conditions. Thus, and although not evident in the
putative ancestral sequence, this regions seemed to be important for the protein
function. After genome duplication the coded amino acid in the repetitive region of
RLM1 gene within the Saccharomyces sensu stricto group changed, possibly due to a
frameshift mutation to encode Asn instead of Gln. This change certainly created
additional /new function for this protein.
FINAL REMARKS
FINAL REMARKS
91
5. FINAL REMARKS
Presently, the search of protein-encoding genes that present the desired
characteristics to infer a robust phylogeny is a challenge. Several genes have been used
to construct phylogenetic trees of different species among which the ribosomal genes
were used to infer most of the phylogenetic relationships among the different kingdoms
of life. In recent years, the use of concatenated gene alignments (multigene analysis)
and of complete genome analysis has become the best approach for resolving
phylogenetic relationships among species, whose placements were not well determined
by single-gene approach. However, the concatenated alignments of genes have only
been based in conserved regions, which can lead to a loss of genetic information that
might be important to resolve certain species placement that even with a large number
of genes may not be possible to infer accurate position.
In this work we used three nuclear protein-encoding genes, RLM1 and MCM1,
belonging to the family of MADS-box transcription factors, and Iff8 a GPI-anchor
protein, to infer phylogeny in the kingdom Fungi, using the entire gene sequence in
contrast with the previously mentioned studies. Orthologue sequences for the two genes,
RLM1 and MCM1, were identified in 76 species from the kingdom Fungi that have their
genome sequenced, and 8 putative orthologue IFF8 genes in the group CUG.
Results obtained from the phylogenetic analysis using the transcription factor RLM1
indicated that it presents conditions to be considered within a multigene analysis, since
the obtained phylogeny is closer to the ones established by multigene and phylogenomic
analyses. The transcription factor Mcm1 presented limitations to be used in phylogeny
at the kingdom level, because of its variable size sequences, leading to possible errors in
the alignment. This problem could be overcome with extra sequences of other species
that would help to infer a more accurate phylogeny and still be used at the kingdom
level. Despite this limitation this gene can be used to resolve phylogeny at the phylum
or lower clades level, because its use, independently and/or concatenated, to infer
phylogeny within the subphylum Saccharomycotina was in agreement with published
studies. On the other hand, the use of IFF8 gene to infer phylogeny is limited and
restricted to the CUG group, since other orthologues were not found within the kingdom
Fungi and the results obtained in CUG group phylogeny presented conflicts, particularly
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
92
in the position of C. tropicalis that is not in agreement with what has already been
determined as its correct relationship in Candida phylogeny.
It is known that, within subphylum Saccharomycotina, the process of genome
duplication has occurred in the ‗Saccharomyces complex‘, in known groups as sensu
stricto and sensu lato, resulting in duplication of some genes. The transcription factors
studied in this work are within the duplicated genes that were maintained.
Understanding the fate of duplicates is fundamental to clarify mechanisms of genetic
redundancy and the link between gene family diversification and phenotypic evolution.
Thus, in this study, the RLM1 gene was analyzed particularly within the subphylum
Saccharomycotina to identify possible sites of positive selection that could provide
alternative explanations for the persistence and diversification of this gene. The analysis
of adaptive evolution within the subphylum Saccharomycotina, by using the
conventional evolutive models, indicated no positive selection in the RLM1 gene
(purifying selection), suggesting that this protein plays an important role in these
organisms. However, a more complete model, the MEC, identified positive selection in
several amino acid sites, indicating nonsynonymous substitutions in some amino acid
positions. Although these substitutions were present in different positions, these
positions were not inside conserved regions, suggesting that these changes occurred due
to different molecular events during the evolution process of divergence of species. A
particular situation that was further analyzed in this study was the observation of the
presence of amino acids under positive selection at the beginning of the repetitive
region, a trinucleotide microsatellite, and differences in the amino acid under repetition
between the species that presented the duplicated the genome and the non duplicated.
The S. cerevisiae Rlm1 protein presents a common microsatellite region in C.
albicans, C. dubliniensis and K. lactis in the C-terminal of the protein, but there is one
peculiarity about this repetitive region. The amino acid encoded by this trinucleotide
microsatellite in S. cerevisiae is the Asn, and in the other species is Gln. The closer
analyses of the different reading frame of RLM1 gene lead to the conclusion that the
most likely molecular mechanism that changed the amino acids at the microsatellite
region was a frameshift mutation by insertion of nucleotides and not an alteration of
nucleotides in the codons, as it is generally the case in nonsynonymous substitutions.
This mechanism of frameshift mutation prevented the insertion of stop codons in the C-
FINAL REMARKS
93
terminal, avoiding the loss of this region in the protein. This change of amino acids in
the microsatellites of S. cerevisiae RLM1 gene should have brought structural and
functional alterations to the protein.
Another peculiarity of this gene is the absence of these microsatellites in basal
species, within the subphylum Saccharomycotina, indicating that their evolution has
occurred during the divergence of species. This was further confirmed by the
observation that the predicted ancestral RLM1 gene presented the main features
characteristic of the MADS-box transcription factor that has been previously identified
namely the MADS-box, the acidic region and the regions A and B, but the repetitive
region was absent. This absence would be related with the role that the microsatellites
play in the functionality of these proteins in some species and adaptation to different
environmental conditions. Thus, the process of duplication, and genome reorganization
influenced the restructuring of the S. cerevisiae RLM1 gene, decreasing the selective
pressure on this gene and enabling a frameshift mutation, which shifted the reading of
codons from Gln to Asn. Additionally, this process of gene duplication affected not only
RLM1 gene but also MCM1 gene in the Saccharomyces sensu stricto and
Saccharomyces sensu lato groups. This observation points to the potential use of
MADS-box transcription factors in the studies of identification and process of genomes
duplication within subphylum Saccharomycotina, due to the presence of two or four
MADS-box genes in these yeast species.
The major finding of this work were (i) the identification of the potential use of
RLM1 gene for inferring phylogeny in the kingdom Fungi with special emphasis to
Candida species, and (ii) the observation that this gene evolved within the
Saccharomyces sensu strict group after genome duplication, being the molecular
mechanism responsible for the change observed in the C-terminal of this protein most
probably a frameshift mutation.
REFERENCES
REFERENCES
97
6. REFERENCES
Abascal, F., Zardoya, R., and Posada, D. (2005). ProtTest: Selection of best-fit
models of protein evolution. Bioinformatics. 21(9): 2104-2105.
Adachi, J., and Hasegawa, M. (1995). MOLPHY: Programs for Molecular
Phylogenetics. Tokyo: Inst. Statist. Math.
Alba, M. M., Santibáñez-Koref, M. F., and Hancock, J. M. (1999). Amino acid
reiterations in yeast are overrepresented in particular classes of proteins and
show evidence of a slippage-like mutational process. J Mol Evol. 49: 789–797.
Alexopoulos, C.J., Mims, C.W., and Blackwell, M. (1996). Introductory
Micology. 4th
edition. John Wiley & Sons Inc., New York, EE.UU.
Alfaro, M. E., Zoller, S., and Lutzoni, F. (2003). Bayes or bootstrap? A
simulation study comparing the performance of Bayesian Markov chain Monte
Carlo sampling and bootstrapping in assessing phylogenetic confidence.
Molecular Biology and Evolution. 20: 255–266.
Althoefer, H., Schleiffer, A., Wassmann, K., Nordheim, A., and Ammerer, G.
(1995). Mcm1 is required to coordinate G2-specific transcription in
Saccharomyces cerevisiae. Mol Cell Biol. 15: 5917–5928.
Alvarez-Buylla, E. R., Pelaz, S., Liljegren, S. J., Gold, S. E., Burgeff, C., Ditta,
G. S., Ribas de Pouplana, L., Martinez-Castilla, L., and Yanofsky, M. F. (2000).
An ancestral MADS-box gene duplication occurred before the divergence of
plants and animals. Proc Natl Acad Sci. 97: 5328–5333.
Anisimova, M., Bielawski, J. P., and Yang, Z. (2001). Accuracy and power of
the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol
Evol, 18(8): 1585-1592.
Anisimova, M., Bielawski, J. P., and Yang, Z. (2002). Accuracy and power of
Bayes prediction of amino acid sites under positive selection. Mol Biol Evol. 19:
950-958.
Anisimova, M., and Gascuel, O. (2006). Approximate Likelihood-Ratio Test for
branches: A fast, accurate and powerful alternative. Systematic Biology. 55(4):
539-552.
Aragona, M., Montigiani, M., and Porta-Puglia, A. (2000). Electrophoretic
karyotypes of the phytopathogenic Pyrenophora graminea and P. teres.
Mycological Research. 104: 853-857.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
98
Baillie, G. S., and Douglas, L. J. (1998). Effect of growth rate on resistance of
Candida albicans biofilms to antifungal agents. Antimicrob Agents Chemother.
42: 1900–1905.
Baldauf, S. L. (1999). A search for the origins of animals and fungi: comparing
and combining molecular data. American Naturalist. 154: S178-S188.
Baldauf, S. L., and Palmer, J. D. (1993). Animals and fungi are each other‘s
closest relatives: congruent evidence from multiple proteins. Proc Natl Acad Sci.
90: 11558–11562.
Baldauf, S. L., Roger, A. J., Wenk-Sierfert, I., and Doolittle, W. F. (2000). A
kingdom-level phylogeny of eukaryotes based on combined protein data.
Science. 290: 972–977.
Ball, L. M., Bes, M. A., Theelen, B., Boekhout, T., Egeler, R., and Kuijper, E. J.
(2004). Significance of amplified fragment length polymorphism in
identification and epidemiological examination of Candida species colonization
in children undergoing allogeneic stem cell transplantation. J Clin Microbiol.
42(4): 1673-1679.
Ballesté, R, Arteta, Z., Fernández, N., Mier, C., Mousqués, N., Xavier, B.,
Cabrera, M. J., Acosta, G., Combol, A., and Gezuele, E. (2005). Evaluación del
medio cromógeno CHROMagar CandidaTM para la identificación de levaduras
de interés médico. Rev Med Uruguay. 21: 186-193.
Baker, E. E., Mrak, M., and Smith, C. E. (1943). The morphology, taxonomy,
and distribution of Coccidioides irnmitis Rixford and Gilchrist 1896. Farlowia.
1: 199-244.
Bapteste, E., Brinkmann, H., Lee, J. A., Moore, D. V., Sensen, C. W., Gordon,
P., Duruflé, L., Gaasterland, T., Lopez, P., Müller, M., and Philippe, H. (2001).
The analysis of 100 genes supports the grouping of three highly divergent
amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci.
99: 1414 – 1419.
Barr, D. J. S. (1992). Evolution and kingdoms of organisms from the perspective
of a mycologist. Mycologia. 84: 1–11.
Barros Lopes, M., Rainieri, S., Henschke, P.A., and Langridge, P. (1999). AFLP
fingerprinting for analysis of yeast genetic variation. Int J Syst Bacteriol. 49:
915-924.
REFERENCES
99
Bartnicki-Garcia, S. (1987). The cell wall in fungal evolution., p. 389-403. In A.
D. M. Rayner, C. M. Brasier, and D. Moore (ed.), Evolutionary biology of the
fungi. Cambridge University Press, New York, N. Y.
Bhattacharya, D., Lutzoni, F., Reeb, V., Simon, D., Nason, J., and Fernandez, F.
(2000). Widespread occurrence of spliceosomal introns in the rDNA genes of
Ascomycetes. Molecular Biology and Evolution. 17: 1971–1984.
Bauer, R., Begerow, D., Sampaio, J.P., Weiß, M., and Oberwinkler, F. (2006).
The simple-septate basidiomycetes: a synopsis. Mycological Progress. 5: 41–66.
Beckmann, J. S., and Weber, J. L. (1992). Survey of human and rat
microsatellites. Genomics. 12: 627–631.
Berbee, M. L. (1996). Loculoascomycete origins and evolution of filamentous
ascomycete morphology based on 18S rDNA gene sequence data. Molecular
Biology and Evolution. 13: 462–470.
Berbee, M. L. (2001). The phylogeny of plant and animal pathogens in the
Ascomycota. Physiology and Molecular Plant Pathology. 59: 165–187.
Berbee, M. L., and Taylor, J. W. (1993). Dating the evolutionary radiations of
the true fungi. Canadian Journal of Botany. 71: 1114–1127.
Bernal, S., Mazuelos, E. M., Chavez, M., Coronilla, J., and Valverde, A. (1998).
Evaluation of the new API Candida system for identification of the most
clinically important yeast species. Diagnostic Microbiology and Infectuos
Diseases. 32(3): 217-221.
Binder, M., and Hibbett, D. S. (2002). Higher-level phylogenetic relationships of
Homobasidiomycetes (mushroom-forming fungi) inferred from four rDNA
regions. Molecular Phylogenetics and Evolution. 22: 76–90.
Binder, M., Hibbett, D. S, and Molitoris, H. P. (2001). Phylogenetic
relationships of the marine gasteromycete Nia vibrissa. Mycologia. 93: 679–
688.
Binder, M., Hibbett, D. S., Larsson, K-H., Larsson, E., Langer, E., and Langer,
G. (2005). The phylogenetic distribution of resupinate forms across the major
clades of mushroom-forming fungi (Homobasidiomycetes). Systematics and
Biodiversity. 3(2): 113-157.
Blackwell, M. (2000). Evolution. Terrestrial life-fungal from the start?. Science.
293: 1129-1133.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
100
Blackwell, M., Hibbett, D. S., Taylor, J., and Spatafora, J. W. (2006). Research
coordination networks: a phylogeny for kingdom Fungi (Deep Hypha).
Mycologia. 98(6): 829-837.
Boeke, J. D., Garfinkel, D. J., Styles, C. A., and Fink, G. R. (1985). Ty elements
transpose through an RNA intermediate. Cell. 40: 491–500.
Botterel, F., Cesterke, C., Costa, C., and Bretagne, S. (2001). Analysis of
microsatellite markers of Candida albicans used for rapid typing. J Clin
Microbiol. 39: 4076-4081.
Bowman, B. H., White, T. J., and Taylor, J. W. (1996). Human pathogeneic
fungi and their close nonpathogenic relatives. Mol Phylogenet Evol. 6(1): 89-96.
Bowman, S. M., Piwowar, A., Al Dabbous, M., Vierula, J., and Free, S. J.
(2006). Mutational analysis of the glycosylphosphatidylinositol (GPI) anchor
pathway demonstrates that GPI-anchored proteins are required for cell wall
biogenesis and normal hyphal growth in Neurospora crassa. Eukaryot Cell 5:
587–600.
Bretagne, S., Costa, J. M., Besmond, C., Carsique, R., and Calderone, R. (1997).
Microsatellite polymorphism in the promoter sequence of the elongation factor 3
gene of Candida albicans as the basis for a typing system. J Clin Microbiol. 35:
1777-1780.
Bridge, P. D., Spooner, B. M., and Roberts, P. J. (2005). The impact of
molecular data in fungal systematics. Adv Bot Res. 42: 33 – 67.
Bruno, V. M., Kalachikov, S., Subaran, R., Nobile, C. J., Kyratsous, C., and
Mitchell, A. P. (2006). Control of the C. albicans cell wall damage response by
transcriptional regulatorCas5. Plos Pathogens. 2(3): e21.
Bruns, T. D., Vilgalys, R., Barns, S. M., Gonzalez, D., Hibbett, D. S., Lane, D.
J., Simon, L., Stickel, S., Szaro, T. M., Weisburg, W. G., and Sogin, M. L.
(1992). Evolutionary relationships within the fungi: analyses of nuclear small
subunit RNA sequences. Molecular Phylogenetics and Evolution. 1: 231–241.
Busch, U., and Nitschko, H. (1999). Methods for the differentiation of
microorganisms. Journal of Chromatography B. 722: 263 - 278.
Byrne, K. P., and Wolfe, K. H. (2006). Visualizing syntenic relationships among
the hemiascomycetes with the Yeast Gene Order Browser.Nucleic Acids Res.
34: D452-D455.
REFERENCES
101
Calderone, R. A. (2002). Candida and Candidiasis. ASM Press. Washington,
DC, EE.UU.
Carlile, M. J., and Watkinson, S. C. (1994). The Fungi. Academic Press,Ltd.,
London, United Kingdom.
Cavalier-Smith, T. (1981). Eukaryote kingdoms: seven or nine?. Biosystems.
14(3-4): 461-481.
Cavalier-Smith, T. (1987a). The origin of eukaryotic and archaebacterial cells.
Ann N Y Acad Sci. 503: 17-54.
Cavalier-Smith, T. (1987b). The origin of Fungi and pseudofungi. In: Rayner A.
D. M., Brasier C. M. and Moore D. (eds): Evolutionary Biology of the Fungi,
pp. 339–353. Cambridge University Press.
Cavalier-Smith, T. (1998). A revised six-kingdom system of life. Biol. Rev.
Camb. Philos. Soc. 73: 203–266.
Cavalier-Smith, T. (2000). Flagellate megaevolution: the basis for eukaryote
diversification. In: Green J. R. and Leadbeater B. S. C. (eds): The Flagellates,
pp. 361-390. Taylor and Francis. London.
Cavalier-Smith, T. (2002a). The neomuran origin of archaebacteria, the
negibacterial root of the universal tree and bacterial megaclassification. Int J
Syst Evol Microbiol. 52(Pt 1): 7-76.
Cavalier-Smith, T. (2002b). The phagotrophic origin of eukaryotes and
phylogenetic classification of Protozoa. Int J Syst Evol Microbiol. 52(part2):
297-354.
Cavalier-Smith, T. (2002c). Chloroplast evolution: Secundary symbiogenesis
and multiple losses. Curr Biol. 12: R62-R64.
Cavalier-Smith, T. (2003). Protist phylogeny and the high-level classification of
Protozoa. Europ J Protistol. 39: 338-348.
Cavalier-Smith, T. (2004). Only six kingdoms of life. Proc R Soc Lond B. 271:
1251-1262.
Cavalier-Smith, T. (2006a). Rooting of tree of life by transition analysis.
Biology Direct. 1: 19.
Cavalier-Smith, T. (2006b). Cell evolution and earth history: stasis and
revolution. Phil Trans Roy Soc B. 361: 969-1006.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
102
Cavalier-Smith, T. (2006c). Protozoa: the most abundant predators on earth.
Microbiology today. 166-169.
Cavalier-Smith, T., and Chao, E. E. (1995). The opalozoan Apusomonas is
related to the common ancestor of animals, fungi, and choanoflagellates. Proc R
Soc Lond B. 261: 1–9.
Cavalier-Smith, T., and Chao, E. E. (2003). Phylogeny of Choanozoa,
Apusozoa, and other Protozoa and early eukaryote megaevolution. J Mol Evol.
56: 540–563.
Cavalier-Smith T., Chao E. E., and Oates B. (2004): Molecular phylogeny of
Amoebozoa and the evolutionary significance of the unikont Phalansterium. Eur
J Protistol. 40: 21-48.
Cavalli-Sforza, L. L., and Edwards, A. W. F. (1967). Phylogenetic analysis:
models and estimation procedures. Am J Hum Genet. 19: 122–257.
Chapela, I. H. (1991). Spore size revisited: analysis of spore populations using
automated particle size. Sydowia. 43: 1–14.
Chen, T. C., Chen, Y. H., Tsai, J. J., Peng, C. F., Lu, P. L., Chang, K., Hsieh, H.
C., and Chen, T .P. (2005). Epidemiologic analysis and antifungal susceptibility
of Candida blood isolates in Southern Taiwan. Journal of Microbiology and
Immunological Infectious. 38: 200-210.
Chistiakov, D. A., Hellemans, B., and Volckaert, F. A. M. (2006).
Microsatellites and their genomic distribution, evolution, function and
applications: A review with special reference to fish genetics. Aquaculture. 255:
1-29.
Ciferri, R., and Redaelli, P. (1936). Morfologia, biologia e posizione sistemática
di Coccidioides immitis stiles e delle sue varieta, con notizie sul granuloma
coccidioide. R Accad Ital. 7: 399-474.
Clark, T. A., Slavinski, S. A, Morgan, J., Lott, T., Arthington-Skaggs, B. A.,
Brandt, M. E., Webb, R. M., Currier, M., Flowers, R. H., Fridkin, S. K., and
Hajjeh, R. A. (2004). Epidemiologic and molecular characterization of an
outbreak of Candida parapsilosis bloodstream infections in a community
hospital. J Clin Microbiol. 42(10): 4468-4472.
Cole, G. T., Samson, R. A. (1979). Patterns of development in conidial fungi.
Pittman, London, United Kingdom.
REFERENCES
103
Coleman, D. C., Rinaldi, M. G., Haynes, K. A., Rex, J. H., Summerbell, R. C.,
Anaissie, E. J., Li, A., and Sullivan, D. J. (1998). Importance of Candida species
other than Candida albicans as opportunistic pathogens. Medical Mycology. 36:
156–165.
Copeland, H. (1956). The Classification of Lower Organisms. Palo Alto: Pacific
Books.
Caro, L. H., Tettelin, H., Vossen, J. H., Ram, A. F., van den Ende, H., and Klis,
F. M. (1997). In silico identification of glycosyl-phosphatidylinositol-anchored
plasma-membrane and cell wall proteins of Saccharomyces cerevisiae. Yeast.
13: 1477–1489.
Cordeiro, G. M., Casu, R., McIntyre, C. L., Manners, J. M., and Henry, R. J.
(2001). Microsatellite markers from sugarcane (Saccharum spp) ESTs across
transferable to erianthus and sorghum. Plant Sci. 160: 1115–1123.
Correia, A., Sampaio, P., James, S., and Pais, C. (2006). Candida bracarensis
sp. nov., a novel anamorphic yeast species phenotypically similar to Candida
glabrata. Int J Syst Evol Microbiol. 56: 313-317.
Damveld, R. A., Arentshorst, M., Franken, A., vanKuyk, P. A., Klis, F. M., van
den Hondel, C. A. M. J. J., and Ram, A. F. J. (2005). The Aspergillus niger
MADS-box transcription factor RlmA is required for cell wall reinforcement in
response to cell wall stress. Molecular Microbiology. 58(1): 305-319.
Daniel, H. M., Sorrell, T. C., and Meyer, W. (2001). Partial sequence analysis of
the actin gene and its potential for studying the phylogeny of Candida species
and their teleomorphs. Int J Syst Evol Microbiol. 51(Pt 4): 1593-1606.
De Groot, P. W., de Boer, A. D., Cunningham, J., Dekker, H. L., de Jong, L.,
Hellingwerf, K. J., de Koster, C., and Klis, F. M. (2004). Proteomic analysis of
Candida albicans cell walls reveals covalently bound carbohydrate-active
enzymes and adhesins. Eukaryot Cell, 3: 955–965.
De Nadal., E., Casadomé, L., and Posas, F. (2003). Targeting the MEF2-like
transcription factor Smp1 by the stress-activated Hog1 mitogen-activated protein
kinase. Mol Cel Biol. 23(1): 229-237.
Dib, J.C., Dube, M., Kelly, C., Rinaldi, M.G., and Patterson, J. E. (1996).
Evaluation of pulsed-field gel electrophoresis as a typing system for Candida
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
104
rugosa: comparison of karyotype and restriction fragment length
polymorphisms. J Clin Microbiol. 34: 1494–1496.
Diezmann, S., Cox, C. J., Schönian, G., Vilgalys, R., and Mitchell, T. (2004).
Phylogeny and evolution of medical species of Candida and related taxa: A
multigenic analysis. J Clin Microbiol. 42(12): 5624-5635.
Dodou, E., and Treisman, R. (1997). The Saccharomyces cerevisiae MADS-box
transcription factor Rlm1 is a target for the Mpk1 mitogen-activated protein
kinase pathway. Mol Cell Biol. 17: 1848– 1859.
Doron-Faigenboim, A., and Pupko, T. 2006. A combined empirical and
mechanistic codon model. Mol Biol Evol. 24(2): 388-397.
Douady, C. J., Delsuc, F., Boucher, Y., Doolittle, W. F., and Douzery, E. J. P.
(2003). Comparison of the Bayesian and maximum likelihood bootstrap
measures of phylogenetic reliability. Molecular Biology and Evolution. 20: 248–
254.
Douzery, E. J., Snell, E. A., Bapteste, E., Delsuc, F., and Philippe, H. (2004) The
timing of eukaryotic evolution: does a relaxed molecular clock reconcile
proteins and fossils? Proc Natl Acad Sci. 101(43): 15386-15391.
Dubois, E., Bercy, J., and Messenguy, F. (1987). Characterization of two genes,
ARGRI and ARGRIII required for specific regulation of arginine metabolism in
yeast. Mol Gen Genet. 207: 142–148.
Dujon, B., Sherman, D., Fischer, G., Durrens, P., Casaregola, S., Lafontaine, I.,
De Montigny, J., Marck, C., Neuveglise, C., Talla, E., Goffard, N., Frangeul, L.,
Aigle, M., Anthouard, V., Babour, A., Barbe, V., Barnay, S., Blanchin, S.,
Beckerich, J. M., Beyne, E., Bleykasten, C., Boisrame, A., Boyer, J., Cattolico,
L., Confanioleri, F., De Daruvar, A., Despons, L., Fabre, E., Fairhead, C., Ferry-
Dumazet, H., Groppi, A., Hantraye, F., Hennequin, C., Jauniaux, N., Joyet, P.,
Kachouri, R., Kerrest, A., Koszul, R., Lemaire, M., Lesur, I., Ma, L., Muller, H.,
Nicaud, J. M., Nikolski, M., Oztas, S., Ozier-Kalogeropoulos, O., Pellenz, S.,
Potier, S., Richard, G. F., Straub, M. L., Suleau, A., Swennen, D., Tekaia, F.,
Wesolowski-Louvel, M., Westhof, E., Wirth, B., Zeniou-Meyer, M., Zivanovic,
I., Bolotin-Fukuhara, M., Thierry, A., Bouchier, C., Caudron, B., Scarpelli, C.,
Gaillardin, C., Weissenbach, J., Wincker, P., and Souciet, J. L. (2004). Genome
evolution in yeasts. Nature. 430(6995): 35-44.
REFERENCES
105
Edel, V., Steinberg, C., Gautheron, N., and Alabouvette, C. (1996). Evaluation
of restriction analysis of polymerase chain reaction (PCR)-amplified ribosomal
DNA for the identification of Fusarium species. Mycol Res. 101: 179–187.
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy
and high throughput. Nucleic Acids Research. 32(5): 1792-1797.
Eisenhaber, B., Maurer-Stroh, S., Novatchkova, M., Schneider, G., and
Eisenhaber, F. (2003). Enzymes and auxiliary factors for GPI lipid anchor
biosynthesis and post-translational transfer to proteins. Bioessays. 25: 367–385.
Eisenhaber, B., Schneider, G., Wildpaner, M., and Eisenhaber, F. (2004). A
sensitive predictor for potential GPI lipid modification sites in fungal protein
sequences and its application to genome-wide studies for Aspergillus nidulans,
Candida albicans, Neurospora crassa, Saccharomyces cerevisiae and
Schizosaccharomyces pombe. J Mol Biol. 337: 243–253.
Elble, R., and Tye, B. K. (1992). Chromosome loss, hyperrecombination, and
cell cycle arrest in a yeast mcm1 mutant. Mol Biol Cell. 3: 971 – 980.
Ellegren, H. (2000). Microsatellite mutations in the germline: implications for
evolutionary inference. Trends Genet. 16: 551–558.
Eriksson, O. E., Baral, H. O., Currah, R. S., Hansen, K., Kurtzman, C. P.,
Rambold, G., and Laessøe, T (eds). (2004). Outline of Ascomycota— 2004.
Myconet, 10: 1–99.
Errede, B. (1993). MCM1 binds to a transcriptional control element in Ty1. Mol
Cell Biol. 13: 57 – 62.
Estrella, M. C., Mellado, E., Guerra, T. M. D., Monzón, A., and Tudela, J. L. R.
(2000). Susceptibility of fluconazole-resistant clinical isolates of Candida spp.
to echinocandin LY303366, itraconazole and amphotericin B. Journal of
Antimicrobial Chemotherapy. 46: 475-477.
Evans, E. E. (1950). The antigenic composition of Cryptococcus neoformans. I.
A serologic classification by means of the capsular and agglutination reactions. J
Immunol, 64: 423 – 430.
Fabre, E., Muller, H., Therizols, P., Lafontaine, I., Dujon, B., and Fairhead, C.
(2005). Comparative genomics in hemiascomycete yeasts: evolution of sex,
silencing, and subtelomeres. Mol Biol Evol. 22(4): 856-873.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
106
Fanci, R., and Pecile, P. (2005). Central venous catheter-related infection due to
Candida membranaefaciens, a new opportunistic azole-resistant yeast in a
cancer patient: a case report and a review of literature. Mycoses, 48: 357-359.
Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will
be positively misleading. Syst Zool. (27): 401– 410.
Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum
likelihood approach. J Mol Evol. 17: 368–376.
Ferguson, M. A. (1999). The structure, biosynthesis and functions of
glycosylphosphatidylinositol anchors, and the contributions of trypanosome
research. J Cell Sci. 112: 2799–2809.
Fernandes, L., Araújo, M. A. M., Amaral, A., Reis, V. C. B., Martins, N. F., and
Felipe, M. S. (2005). Cell signalling pathways in Paracoccidoides brasiliensis –
inferred from comparisons with other fungi. Genet Mol Res. 4(2): 216-231.
Field, D., and Wills, C. (1996). Long, polymorphic microsatellites in simple
organisms. Proc R Soc London Ser B. 263: 209–251.
Fink, G. R. (1987). Pseudogenes in yeast? Cell. 49: 5–6.
Fitzpatrick, D., Logue, M., Stajich, J., and Butler, G. (2006) A fungal phylogeny
bosed on 42 complete genomes derived from supertree and combined gene
analysis. BMC Evolutionary Biology. 6: 99.
Fluit, A. C., Jones, M. E., Schmitz, F-J., Acar, J., Gupta, R., Verhoef, J.,
SENTRY participants Groupl. (2000). Antimicrobial susceptibility and
frequency of occurrence of clinical blood isolates in Europe from the SENTRY
Antimicrobial Surveillance Program, 1997 and 1998. Clin Infect Dis. 30: 454–
60.
Frisvad, J. C. (1994). Classification of organisms by secondary metabolites,p.
303-321. In D. L. Hawsworth (ed.), The identification and characterization of
pest organisms. CAB International, Wallingford, United Kingdom.
García-Martos, P., Ruiz-Aragón, J., Garcia-Agudo, L., Saldarreaga, A., Lozano,
M., and Marín, P. (2004). Aislamiento de Candida ciferri en un paciente
inmunodeficiente. Rev Iberoam Micol. 21: 85-86.
Germot, A., Philipe, H., and Le Guyader, H. (1997). Evidence for loss of
mitochondria in microsporidia from a mitochondrial HSP70 in Nosema locustae.
Molecular and Biochemical Parasitology. 87: 159–168.
REFERENCES
107
Geiser, D. M., Gueidan, C., Miadlikowska, J., Lutzoni, F., Kauff, F., Hofstetter,
V., Fraker, E., Schoch, C. L., Tibell, L., Untereiner, W. A., and Aptroot, A.
(2006). Eurotiomycetes: Eurotiomycetidae and Chaetothyriomycetidae.
Mycologia, 98(6): 1053-1064.
Gill, E. E., and Fast, N. M. (2006). Assessing the microsporidia–fungi
relationship: combined phylogenetic analysis of eight genes. Gene. 375: 103–
109.
Graf, B., Adam, T., Zill, E., and Göbel, U. B. (2000). Evaluation of the VITEK 2
system for rapid identification of yeasts and yeast-like organisms. Journal of
Clinical Microbiology, 38(5): 1782-1785.
Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation
and Bayesian model determination. Biometrika. 82: 711-732.
Gribaldo, S., and Philippe, H. (2002). Ancient phylogenetic relationships. Theor
Popul Biol. 61(4): 391-408.
Guarro, J., Gené, J., and Stchigel, A. 1999. Developments in fungal taxonomy.
Clin Microbiol Rev. 12(3): 454-500.
Gudlaugsson, O., Gillespie, S., Lee, K., Van de Berg, J., Hu, J., Messer, S.,
Herwaldt, L., Pfaller, M., and Diekema, D. (2003). Attributable mortality of
nosocomial candidemia. Clin Infect Dis. 37: 1172-1177.
Guindon, S., and Gascuel, O. (2003). A simple, fast, and accurate algorithm to
estimate large phylogenies by maximum likelihood. Systematic Biology. 52(5):
696-704.
Guindon, S., Lethiec, F., Duroux, P., and Gascuel, O. (2005). PHYML Online –
a web server for fast maximum likelihood-based phylogenetic inference. Nucleic
Acids Research. 33: W557-W559.
Gur-Arie, R., Cohen, C., Eitan, L., Shelef, E., Hallerman, E., and Kashi, Y.
(2000). Abundance, non-random genomic distribution, nucleotide composition,
and polymorphism of simple sequence repeats in E. coli. Genome Res. 10: 61–
70.
Gutierrez, J., Morales, P., Gonzalez, M. A, and Quindos, G. (2002). Candida
dubliniensis, a new fungal pathogen. J. Basic Microbiol. 42: 207–227.
Haeckel, E. (1866). Generelle Morphologie der Organismen. Reimer, Berlin.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
108
Hall, B. G. (2008). Phylogenetic trees made easy: A how to manual for
molecular biologists. Third Edition. Sinauer Associates, Inc. Sunderland,
Massachusetts, EE UU.
Harding, R. M., Boyce, A. J., and Clegg, J. B. (1992). The evolution of tandemly
repetitive DNA: recombination rules. Genetics. 132: 847–859.
Hasenclever, H. F., and Mitchell, W. O. (1961). Antigenic studies of Candida. I.
Observations of two antigenic groups in Candida albicans. J Bacteriol. 82: 570
– 573.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains
and their applications. Biometrika. 57: 97-109.
Hawksworth, D. L. (1991). The fungal dimension of biodiversity-magnitude,
significance, and conservation. Mycological Research. 95: 641.
Hawksworth, D. L. (2001). The magnitude of fungal diversity: the 1-5 million
species estimate revisited. Mycological Research. 105: 1422-1433.
Hawksworth, D. L., Kirk, P. M., Sutton, B. C., and Pegler, D. N. (1995).
Ainsworth & Bisby‘s Dictionary of the Fungi. 8th ed. CAB International,
Wallingford, Oxon, United Kingdom.
Hazen, K. C. (1995). New and emerging yeast pathogens. Clin Microbiol
Reviews. 8: 462–478.
Heath, I. B. (1986). Nuclear division: a marker for protist phylogeny? Progress
in Protistology. 1: 115-162.
Hibbett, D. S., and Binder, M. (2001). Evolution of marine mushrooms.
Biological Bulletin. 201: 319–322.
Hibbett, D. S., Gilbert, L. B., and Donoghue, M.J. (2000). Evolutionary
instability of ectomycorrhizal symbiosis in basidiomycetes. Nature. 407: 506–
508.
Hibbett, D. S., Binder, M., Bischoff, J. F., Blackwell, M., Cannon, P. F.,
Eriksson, O. E., Huhndorf, S., James, T., Kirk, P. M., Lücking, R., Lumbsch, T.,
Lutzoni, F., Matheny, P. B., Mclaughlin, D. J., Powell, M. J., Redhead, S.,
Schoch, C. L., Spatafora, J. W., Stalpers, J. A., Vilgalys, R., Aime, M. C.,
Aptroot, A., Bauer, R., Begerow, D., Benny, G. L., Castlebury, L. A., Crous, P.
W., Dai, Y-C., Gams, W., Geiser, D. M., Griffith, G. W., Gueidan, C.,
Hawksworth, D. L., Hestmark, G., Hosaka, K., Humber, R. A., Hyde, K.,
REFERENCES
109
Ironside, J. E., Kõljalg, U., Kurtzman, C. P., Larsson, K-H., Lichtwardt, R.,
Longcore, J., Midlikowska, J., Miller, A., Moncalvo, J-M., Mozley-Standridge,
S., Oberwinkler, F., Parmasto, E., Reeb, V., Rogers, J. D., Roux, C., Ryvarden,
L., Sampaio, J. P., Schüßler, A., Sugiyama, J., Thorn, R. G., Tibell, L.,
Untereiner, W. A., Walker, C., Wang, Z., Weir, A., Weiß, M., White, M. M.,
Winka, K., Yao, Y-J., and Zhang, N. (2007). A higher-level phylogenetic
classification of the Fungi. Mycological Research III. 509-247.
Hirt, R. P., Healy, B., Vossbrinck, C. R., Canning, E. U., and Embley, T. M.
(1997). A mitochondrial Hsp70 orthologue in Vairimorpha necatrix: molecular
evidence that Microsporidia once contained mitochondria. Current Biology. 7:
995–998.
Hood, D. W., Deadman, M. E., Jennings, M. P., Bisercic, M., Fleischmann, R.
D., Venter, J. C., and Moxon, E. R. (1996). DNA repeats identify novel
virulence genes in Haemophilus influenzae. Proc Natl Acad Sci. 93: 11121–
11125.
Horowitz, B. J., Giaquinta, D., and Ito, S. (1992). Evolving pathogens in
vulvovaginal candidiasis: Implications for patient care. J Clin Pharmacol. 32(3):
248-255.
Huelsenbeck, J. P., and Ronquist, F. (2001). MrBayes: Bayesian inference of
phylogenetic trees. Bioinformatics Applications Note. 17(8): 754-755.
Helsenbeck, J. P., Larget, B., Miller, R. E., and Ronquist, F. (2002). Potential
applications and pitfalls of bayesian inference of phylogeny. Syst Biol. 51(5):
673 – 688.
Hull, C. M., and Johnson, A. D. (1999). Identification of a mating type-like
locus in the asexual pathogenic yeast Candida albicans. Science. 285(5431):
1271-1275.
Hull, C. M., Raisner, R. M., and Johnson, A. D. (2000). Evidence for mating of
the "asexual" yeast Candida albicans in a mammalian host. Science. 289(5477):
307-310.
Huson, D., and Bryant, D. (2006). Application of Phylogenetic Networks in
evolutionary studies. Mol Biol Evol. 23(2): 254-267.
Inagaki, Y., and Doolittle, W. F. (2000). Evolution of the eukaryotic translation
termination system: origins of release factors. Mol Biol Evol. 17: 882 – 889.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
110
Iyer, L. M., Koonin, E. V., and Aravind L. (2004). Evolution of bacterial RNA
polymerase: implications for large-scale bacterial phylogeny, domain accretion,
and horizontal gene transfer. Gene. 335: 73-88.
Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S., and Miyata, T. (1989).
Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred
from phylogenetic trees of duplicated genes. Proc Natl Acad Sci. 86(23): 9355-
9359.
James, T. Y., Porter, D., Leander, C. A., Vilgalys, R., and Longcore, J. E.
(2000). Molecular phylogenetics of the Chytridiomycota supports the utility of
ultrastructural data in chytrid systematics. Canadian Journal of Botany. 78: 336–
350.
James, T. Y., Kauff, F., Schoch, C. L., Matheny, P. B., Hofstetter, V., Cox, C.,
Celio, G., Gueidan, C., Fraker, E., Miädlikowska, J., Lumbsch, H. T., Rauhut,
A., Reeb, V., Arnold, E. A., Amtoft, A., Stajich, J. E., Hosaka, K., Sung, G-H.,
Johnson, D., O‘Rourke, B., Crockett, M., Binder, M., Curtis, J. M., Slot, J. C.,
Wang, Z., Wilson, A. W., Schüßler, A., Longcore, J. E., O‘Donnell, K., Mozley-
Standridge, S., Porter, D., Letcher, P. M., Powell, M. J., Taylor, J. W., White,
M. M., Griffith, G. W., Davies, D. R., Humber, R. A., Morton, J., Sugiyama, J.,
Rossman, A. Y., Rogers, J. D., Pfister, D. H., Hewitt, D., Hansen, K.,
Hambleton, S., Shoemaker, R. A., Kohlmeyer, J., Volkmann-Kohlmeyer, B.,
Spotts, R. A., Serdani, M., Crous, P. W., Hughes, K. W., Matsuura, K., Langer,
E., Langer, G., Untereiner, W. A., Lücking, R., Büdel, B., Geiser, D. M.,
Aptroot, A., Diederich, P., Schmitt, I., Schultz, M., Yahr, R., Hibbett, D. S.,
Lutzoni, F., McLaughlin, D., Spatafora, J., and Vilgalys, R. (2006).
Reconstructing the early evolution of the fungi using a six gene phylogeny.
Nature. 443: 818–822.
James, T. Y., Letcher, P. M., Longcore, J. E., Mozley-Standridge, S. E., Porter,
D., Powell, M. J., Griffith, G. W., and Vilgalys, R. (2007). A molecular
phylogeny of the flagellated fungi (Chytridiomycota) and a proposal for a new
phylum (Blastocladiomycota). Mycologia. 98: 860–871.
Jarvis, W. R. (1995). Epidemiology of nosocomial fungal infections, with
emphasis on Candida species. Clin Infect Dis. 20: 1526–1530.
REFERENCES
111
Jarvis, E. E., Clark, K. L., and Sprague Jr., G. F. (1989). The yeast transcription
activator PRTF, a homolog of the mammalian serum response factor, is encoded
by the MCM1 gene. Genes Dev. 3: 936–945.
Jeffroy, O., Brinkmann, H., Delsuc, F., and Philippe, H. (2006). Phylogenomics:
the beginning of incongruence? Trends Genet. 22(4): 225-231.
Karl, S. A., and Avise, J. C. (1993). PCR-based assays of mendelian
polymorphisms from anonymous single-copy nuclear DNA-techniques and
applications for population genetics. Mol Biol Evol. 10: 342–361.
Kashi, Y., King, D., and Soller, M. (1997). Simple sequence repeats as a source
of quantitative genetic variation. Trends Genet. 13: 74–78.
Kawaguchi, Y., Honda, H., Taniguchi-Morimura, J., and Iwasaki, S. (1989). The
codon CUG is read as serine in an asporogenic yeast Candida cylindracea.
Nature. 341(6238): 164-166.
Keeling, P. J. (2003). Congruent evidence from alpha-tubulin and beta-tubulin
gene phylogenies for a zygomycete origin of microsporidia. Fungal Genetics and
Biology. 38: 298–309.
Keeling, P. J., Luker, M. A., and Palmer, J. D. (2000). Evidence from beta-
tubulin phylogeny that Microsporidia evolved from within the fungi. Molecular
Biology and Evolution. 17: 23–31.
King, N., and Carroll, S. B. (2001): A receptor tyrosine kinase from
choanoflagellates: molecular insights into early animal evolution. Proc. Natl
Acad. Sci. 98: 15032–15037.
Kirk, P. M., Cannon, P. F., David, J. C., and Stalpers, J. A. (eds). (2001).
Ainsworth & Bisby‘s; Dictionary of the Fungi 60 (CAB International,
Wallingford, UK).
Klempp-Selb, B., Rimek, D., and Kappe, R. (2000). Karyotyping of Candida
albicans and Candida glabrata from patients with Candida sepsis. Mycoses. 43:
159-163.
Kohn, L. M. 1992. Developing new characters for fungal systematics: an
experimental approach for determining the rank of resolution. Mycologia. 84:
139–153.
Kosakovsky, S. L., Frost, S. D. W., and Muse, S. V. (2005). HyPhy: Hypothesis
testing using phylogenies. Bioinformatics. 21(5): 676–9.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
112
Kramer, E. M., Su, H-J., Wu, C-C., and Hu, J-M. (2006). A simplified
explanation for the frameshift mutation that created novel C-terminal motif in
the APETALA3 gene lineage. BMC Evolutionary Biology. 6: 30.
Krcmery, V., and Barnes, A. J. (2002). Non-albicans Candida spp. causing
fungaemia: pathogenicity and antifungal resistance. J Hosp Infect. 50: 243–260.
Kruger, J., Aichinger, C., Kahmann, R., and Bolker, M. (1997). A MADS-box
homologue in Ustilago maydis regulates the expression of pheromoneinducible
genes but is nonessential. Genetics. 147: 1643– 1652.
Kuhn, D. M., Mukherjee, P. K., Clark, T. A., Pujol, C., Chandra, J., Hajjeh, R.
A., Warnock, D. W., Soll, D. R., and Ghannoum, M. A. (2004). Candida
parapsilosis characterization in an outbreak setting. Emerg Infect Dis. 10: 1074–
1081.
Kullberg, B. J., and Lashof, A. M. L. O. (2002). Epidemiology of opportunistic
invasive mycoses. European Journal of Medical Research. 7: 183-191.
Kuo, M. H., Nadeau, E. T., and Grayhack, E. J. (1997). Multiple phosphorylated
forms of the Saccharomyces cerevisiae Mcm1 protein include an isoform
induced in response to high salt concentrations. Mol Cell Biol. 17: 819–832.
Kurtzman, C. P., and Robnett, C. J. (1998). Identification and phylogeny of
ascomycetous yeasts from analysis of nuclear large subunit (26S) ribosomal
DNA partial sequences. Antonie van Leeuwenhoek, 73: 331–371.
Kurtzman, C. P., and Robnett, C. J. (2003). Phylogenetic relationships among
yeasts of the ‗Saccharomyces complex‘ determined from multigene sequence
analyses. FEMS Yeast Res. 3: 417–432.
Latchman, D. S. (1998). Eukaryotic transcription factors, Academic Press.
Lang, B. F., O‘Kelly, C., Nerad, T., Gray, M. W., and Burger, G. (2002). The
closest unicellular relatives of animals. Curr Biol. 12: 1773–1778.
Langkjaer, R. B., Cliften, P. F., Johnston, M., and Piskur, J. (2003). Yeast
genome duplication was followed by asynchronous differentiation of duplicated
genes. Nature. 421: 848–852.
Linnaei, C. (1735). Systema naturae sive Regna Tria Naturae Systematice
proposita per Classes, Ordines, Genera et Species. Lugduni Batavorum: apud T.
Hakk.
REFERENCES
113
Le´John, H. B. (1974). Biochemical parameters of fungal phylogenetics. Evol
Biol. 7: 79–125.
Li, J., Xu, J., and Bai, F. (2006). Candida pseudorugosa sp. nov., a novel yeast
species from sputum. J Clin Microbiol. 44(12): 4486-4490.
Li, Y-C., Korol, A. B., Fahima, T., Beiles, A., and Nevo, E. (2002).
Microsatellites: genomic distribution, putative functions and mutational
mechanisms. Molecular Ecology. 11: 2453-2465.
Li, Y-C., Korol, A.B., Fahima, T., and Nevo, E. (2004). Microsatellites within
genes: structure, function, and evolution. Mol Biol Evol. 21: 991–1007.
Liu, Y. J., Whelen, S., and Hall, B. D. (1999). Phylogenetic relationships among
ascomycetes: evidence from an RNA polymerase II subunit. Molecular Biology
and Evolution. 16: 1799–1808.
Liu, Y. J., Hodson, M. C., and Hall, B. D. (2006). Loss of the flagellum
happened only once in the fungal lineage: phylogenetic structure of kingdom
Fungi inferred from RNA polymerase II subunit genes. BMC Evolutionary
Biology. 6: 74.
Liu, Y. J., Leigh, J. W., Brinkmann, H., Cushion, M. T., Rodriguez-Ezpeleta, N.,
Philippe, H., and Lang, B. F. (2009). Phylogenomic analysis support the
monophyly of Taphrinomycotina, including Schizosaccharomyces fission
yeasts. Mol Biol Evol. 26: 27-34.
Logue, M. E., Wong, S., Wolfe, K. H., and Butler, G. (2005). A genome
sequence survey shows that the pathogenic yeast Candida parapsilosis has a
defective MTLa1 allele at its mating type locus. Eukaryot Cell. 4(6): 1009-1017.
López Martínez, R., Ruiz Sánchez, J., and Vértiz Chávez, E. (1984). Vaginal
candidiasis. Mycopathologia. 85: 167-170.
Lowy B. 1971. New records of mushroom stones from Guatemala. Mycologia.
63: 983-993.
Lumbsch, H. T., Schmitt, I., Lindemuth, R., Miller, A., Mangold, A., Fernandez,
F., and Huhndorf, S. (2005). Performance of four ribosomal DNA regions to
infer higher-level phylogenetic relationships of inoperculate euascomycetes
(Leotiomyceta). Mol Phylogenet Evol. 34(3):512-524.
Lutzoni, F., Pagel, M., and Reeb, V. (2001). Major fungal lineages are derived
from lichen symbiotic ancestors. Nature. 411: 937–940.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
114
Lutzoni, F., Kauff, F., Cox, C. J., McLaughlin, D., Celio, G., Dentinger, B.,
Padamsee, M., Hibbett, D. S., James, T. Y., Baloch, E., Grube, M., Reeb, V.,
Hofstetter, V., Schoch, C., Arnold, A. E., Miädlikowska, J., Spatafora, J.,
Johnson, D., Hambleton, S., Crockett, M., Shoemaker, R., Sung, G-H., Lücking,
R., Lumbsch, T., O‘Donnell, K., Binder, M., Diederich, P., Ertz, D., Gueidan,
C., Hansen, K., Harris, R. C., Hosaka, K., Lim, Y-W., Matheny, B., Nishida, H.,
Pfister, D., Rogers, J., Rossman, A., Schmitt, I., Sipman, H., Stone, J.,
Sugiyama, J., Yahr, R., and Vilgalys, R. (2004). Assembling the fungal tree of
life: progress, classification, and evolution of subcellular traits. American
Journal of Botany. 91: 1446–1480.
Magee, B. B., and Magee, P. T. (1987). Electrophoretic karyotypes and
chromosome numbers in Candida species. J Gen Microbiol. 133: 425–430.
Magee, B. B., and Magee, P. T. (2000). Induction of mating in Candida albicans
by construction of MTLa and MTLalpha strains. Science. 289(5477): 310-313.
Marcotte, E. M., Pellegrini, M., Yeates, T. O., and Eisenberg, D. (1999). A
census of protein repeats. J Mol Biol. 293: 151–160.
Massey, S. E., Moura, G., Beltrao, P., Almeida, R., Garey, J. R., Tuite, M. F.,
and Santos, M. A. (2003). Comparative evolutionary genomics unveils the
molecular mechanism of reassignment of the CTG codon in Candida spp.
Genome Res. 13(4): 544-557.
Matheny, P. B., Gossman, J. A., Zalar, P., Arun Kumar, T. K., and Hibbett, D. S.
(2007a). Resolving the phylogenetic position of the Wallemiomycetes: an
enigmatic major lineage of Basidiomycota. Canadian Journal of Botany. 84:
1794–1805.
Matheny, P. B., Wang, Z., Binder, M., Curtis, J. M., Lim, Y. W., Nilsson, R. H.,
Hughes, K. W., Hofstetter, V., Ammirati, J. F., Schoch, C. L., Langer, G. E.,
McLaughlin, D. J., Wilson, A. W., Frøslev, T., Ge, Z. W., Kerrigan, R. W., Slot,
J. C., Vellinga, E. C., Liang, Z. L., Baroni, T. J., Fischer, M., Hosaka, K.,
Matsuura, K., Seidl, M. T., Vaura, J., and Hibbett, D. S. (2007b). Contributions
of Rpb2 and Tef1 to the phylogeny of mushrooms and allies (Basidiomycota,
Fungi). Molecular Phylogenetics and Evolution. 43: 430-451.
Mayer, V. W., and Aguilera, A. (1990). High levels of chromosome instability in
polyploids of Saccharomyces cerevisiae. Mutat Res. 231: 177–186 (1990).
REFERENCES
115
McConville, M. J., and Menon, A. K. (2000). Recent developments in the cell
biology and biochemistry of glycosylphosphatidylinositol lipids (review). Mol
Membr Biol. 17: 1–16.
McInerny, C. J., Partridge, J. F., Mikesell, G. E., Creemer, D. P., and Breeden,
L. L. (1997). A novel Mcm1-dependent element in the SWI4, CLN3, CDC6, and
CDC47 promoters activates M/G1-specific transcription. Genes Dev. 11: 1277–
1288.
Melo, A., de Almeida, L., Colombo, A., and Briones, M. (1998). Evolutionary
distances and identification of Candida species in clinical isolates by randomly
amplified polymorphic DNA (RAPD). Mycopathologia. 142: 57-66.
Messenguy, F., and Dubois, E. (1993). Genetic evidence for a role for MCM1 in
the regulation of arginine metabolism in Saccharomyces cerevisiae. Mol Cell
Biol. 13: 2586– 2592.
Messenguy, F., and Dubois, E. (2000). Regulation of arginine metabolism in
Saccharomyces cerevisiae: a network of specific and pleiotropic proteins in
response to multiple environmental signals. Food Technol Biotechnol. 38: 277–
285.
Metzgar, D., van Belkum, A., Field, D., Haubrich, R., and Wills, C. (1998).
Random amplification of polymorphic DNA and microsatellite genotyping of
pre- and posttreatment isolates of Candida spp. from human immunodeficiency
virus-infected patients on different fluconazole regimens. J Clin Microbiol. 36:
2308-2313.
Metzgar, D., Bytof, J., and Wills, C. (2000). Selection against frameshift
mutations limits microsatellite expansion in coding DNA. Genome Res. 10: 72–
80.
Miadlikowska, J., and Lutzoni, F. (2004). Phylogenetic classification of
peltigeralean fungi (Peltigerales, Ascomycota) based on ribosomal RNA small
and large subunits. American Journal of Botany. 91: 449–464.
Micales, J. A., Bonde, M. R., and Peterson, G. L. (1986). The use of isozyme
analysis in fungal taxonomy and genetics. Mycotaxon. 27: 405-409.
Moestrup, Ø. (2000): The flagellate cytoskeleton: introduction of a general
terminology for microtubular roots in protists. In: Leadbeater B. S. and Green J.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
116
C. (eds): The flagellates: unity, diversity and evolution, pp. 69–94. Taylor and
Francis, London.
Moncalvo, J. M., Vilgalys, R., Redhead, S. A., Johnson, J. E., James, T. Y.,
Aime, M. C., Hofstetter, V., Verduin; S. J. W., Larsson, E., Baroni, T. J., Thorn,
R.G., Jacobsson, S., Clémençon, H., and Miller, jr, O. K. (2002). One hundred
and seventeen clades of euagarics. Molecular Phylogenetics and Evolution. 23:
357–400.
Moss, M. O. (1987). Fungal biotechnology roundup. Mycologist, 21: 55-58.
Moxon, E. R., Rainey, P. B., Nowak, M. A., and Lenski, R. E. (1994). Adaptive
evolution of highly mutable loci in pathogenic bacteria. Curr Biol. 4: 24–33.
Mueller, C. G., and Nordheim, A. (1991). A protein domain conserved between
yeast MCM1 and human SRF directs ternary complex formation. EMBO J. 10:
4219– 4229.
Musial, C. E., Cockerill, F. R., and Roberts, G. D. (1988). Fungal infections of
the immunocompromised host: Clinical and laboratory aspects. Clin Microbiol
Rev. 1: 349–364.
Nagahama, T., Sato, H., Shimazu, M., and Sugiyama, J. (1995). Phylogenetic
divergence of the entomophthoralean fungi: evidence from nuclear 18S
ribosomal RNA gene sequences. Mycologia. 87: 203–209.
Nagy, Á., Pesti, M., Galgóczy, M., and Vágvölyi, C. (2004). Electrophoretic
karyotype of two Micromucor species. Journal of Basic Microbiology. 44(1):
36-41.
Nadal Ed, E., Casadome, L., and Posas, F. (2003). Targeting the MEF2-like
transcription factor Smp1 by the stress-activated Hog1 mitogen-activated protein
kinase. Mol Cell Biol. 23: 229– 237.
Nei, M. (1996). Phylogenetic analysis in molecular evolutionary genetics. Annu
Rev Genet. 30: 371-403.
Nielsen, R., and Yang, Z. (1998). Likelihood models for detecting positively
selected amino acid sites and applications to the HIV-1 envelope gene. Genetics.
148: 929–936.
Nielsen, C. B., Friedman, B., Birren, B., Burgue, C. B., and Galagan, J. E.
(2004). Patterns of intron gain and loss in Fungi. Plos Biology. 2(12): e422.
REFERENCES
117
Nogueira, M. I., Silva, P., Galindo, I., Ramirez, J. L., Arruda, R., and Franco, J.
(1998). Electrophoretic karyotypes and genome sizing of the pathogenic fungus
Paracoccidioides brasiliensis. J Clin Microbiol. 36(3): 742-747.
Norman, C., Runswick, M., Pollock, R., and Treisman, R. (1988). Isolation and
properties of cDNA clones encoding SRF, a transcription factor that binds to the
c-fos serum response element. Cell. 55: 989– 1003.
Nylander, J. A. A. (2004). MrModeltest v2. Program distributed by the author.
Evolutionary Biology Centre, Uppsala University.
Odds, F. C. (1988). Candida and Candidosis: A Review and Bibliography. 2 ed.
Baillière Tindall, London.
Ohno, S. (1970). Evolution by Gene Duplication (Allen and Unwin, London).
Ophuls, M. D. (1905). Further observations on a pathogenic mould formerly
described as a protozoon (Coccidioides immitis, Coccidioides pyrogenes). J Exp
Med. 6: 443-485.
Pace, N. R. (1997). A molecular view of microbial diversity and the biosphere.
Science. 276: 734–740.
Pan, S., Sigler, L., and Cole, G. T. (1994). Evidence for a phylogenetic
connection between Coccidioides immitis and Uncinocarpus reesii
(Onygenaceae). Microbiology. 140 (Pt 6): 1481-1494.
Passmore, S., Maine, G. T., Elble, R., Christ, C., and Tye, B. K. (1988).
Saccharomyces cerevisiae protein involved in plasmid maintenance is necessary
for mating of MAT alpha cells. J Mol Biol. 204: 593– 606.
Passmore, S., Elble, R., and Tye, B. K.. (1989). A protein involved in
minichromosome maintenance in yeast binds a transcriptional enhancer
conserved in eukaryotes. Genes Dev. 3: 921– 935.
Peak, I. R. A., Jennings, M. P., Hood, D. W., Bisercic, M., and Moxon, E. R.
(1996). Tetrameric repeat units associated with virulence factor phase variation
in Haemophilus also occur in Neiserria spp. and Moraxella catarrhalis. FEMS
Microbiol Lett. 137: 109–114.
Pedrós, M. (2003). Clonación y caracterización de una hidrofobina de clase II
(CaHPB) de Candida albicans. Tesis Doctoral. Facultad de Farmacia.
Universitat de Valencia. Valencia, España. 215 pag.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
118
Pellegrini, L., Tan, S., and Richmond, T. J. (1995). Structure of serum response
factor core bound to DNA. Nature. 376: 490– 498.
Peretó, J., López-García, P., and Moreira, D. (2004). Ancestral lipid biosynthesis
and early membrane evolution. Trends Biochem Sci. 29: 469-477.
Persoh, D., and Rambold, G. (2003). Fungal diversity in the host lichen Letharia
vulpine. In J. Stadler, I. Hensen, S. Klotz, and H. Feldmann (eds.),
Biodiversity—from patterns to processes. Kurzfassungen der Beiträge zur 33.
Jahrestagung der Gesellschaft für Ökologie, 161. Verhandlungen der
Gesellschaft für Ökologie, Halle/Saale, Germany.
Peterson, S. W. (2000). Phylogenetic relationships in Aspergillus based on
rDNA sequences analysis. In R. A. Samson and J. I. Pitt (eds.), Integration of
modern taxonomic methods for Penicillium and Aspergillus classification.
Harwood Academic Publishers, Amsterdam, Netherlands.
Peyretaillade, E., Broussolle, V., Peyret, P., Méténier, G., Gouy, M., Vivarès, C.
P. (1998). Microsporidia, amitochondrial protists, possess a 70-kDa heat shock
protein gene of mitochondrial evolutionary origin. Molecular Biology and
Evolution. 15: 683–689.
Pfaller, M. A., Diekema, D. J., Jones, R. N., Sader, H. S., Fluit, A. C., Hollis, R.
J., and Messer, S. A. (2001). International surveillance of bloodstream infections
due to Candida species: frequency of occurrence and in vitro susceptibilities to
fluconazole, ravuconazole, and voriconazole of isolates collected from 1997
through 1999 in the SENTRY antimicrobial surveillance program. J Clin
Microbiol. 39: 3254–3259.
Pfaller, M. A., Diekema, D. J., Messer, S. A., Boyken, L., Hollis, R. J., and
Jones, R. N. (2003). In vitro activities of voriconazole, posaonazole, and four
licensed systemic antifungal agents against Candida species infrequently
isolated from blood. J Clin Microbiol. 41: 78–83.
Phillips, M. J., Delsuc, F., and Penny, D. (2004). Genome-scale phylogeny and
the detection of systematic biases. Mol Biol Evol. 21(7): 1455-1458.
Pirozynski, K. A., and Malloch, D. (1975). The origin of land plants: a matter of
mycotropism. BioSystems. 6: 153–164.
REFERENCES
119
Pollock, R., and Treisman, R. (1991). Human SRF-related proteins:
DNAbinding properties and potential regulatory targets. Genes Dev. 5: 2327–
2341.
Polonelli, L., Conti, S., Magliani, W., and Horace, G. (1989). Biotyping of
pathogenic fungi by the killer system and with monoclonal antibodies.
Mycopathologia. 107(1): 17 – 23.
Posada, D. (2008). jModelTest: Phylogeny model averaging. Mol Biol Evol.
25(7): 1253-1256.
Pujol, C., Daniels, K. J., Lockhart, S. R., Srikantha, T., Radke, J. B., Geiger, J.,
and Soll, D. R. (2004). The closely related species Candida albicans and
Candida dubliniensis can mate. Eukaryot Cell. 3(4): 1015-1027.
Raes J., and van de Peer, Y. (2005). Functional divergenceof proteins through
frameshift mutations. Trends in Genetics. 21(8): 428-431.
Ragan, M. A., Goggins, C. L., Cawthorn, R. J., Cerenius, L., Jamieson, A. V. C.,
Plourde, S. M., Rand, T. G., Söderhäll, K., and Gutell, R. R. (1996). A novel
clade of protistan parasites near the animal-fungal divergence. Proc Natl Acad
Sci. 93: 11907–11912.
Reeb, V., Lutzoni, F., and Roux, C. (2004). Contribution of RPB2 to multilocus
phylogenetic studies of the euascomycetes (Pezizomycotina, Fungi) with special
emphasis on the lichen-forming Acarosporaceae and evolution of polyspory.
Molecular Phylogenetics and Evolution. 32: 1036-1060.
Ribeiro, E., Pereira, C., Takaki, R., and Höfling, J.F. (2000). Grouping oral
Candida species by multilocus enzyme electrophoresis. Int J Syst Evol
Microbiol. 50: 1343-1349.
Richard, G.–F., and Dujon, B. (1997). Trinucleotide repeats in yeast. Res
Microbiol, 148: 731–744.
Riechmann, J. L., and Meyerowitz, E. M. (1997). Determination of floral organ
identity by Arabidopsis MADS domain homeotic proteins AP1, AP3, PI, and
AG is independent of their DNA-binding specificity. Mol Biol Cell. 8: 1243–
1259.
Riechmann, J. L., Krizek, B. A., and Meyerowitz, E. M. (1996). Dimerization
specificity of Arabidopsis MADS domain homeotic proteins APETALA1,
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
120
APETALA3, PISTILLATA, and AGAMOUS. Proc Natl Acad Sci. 93: 4793 –
4798.
Rixford, E., and Gilchrist, C. (1896). Two cases of protozoan (coccidioidal)
infection of the skin and other organs. Johns Hopkins Hospital Report. 1: 209-
268.
Rensberger, B. (1992). The Iceman: Now the research is on ice. J NIIT Res. 4:
25–27.
Rodrigues de Miranda, L. (1979). Clavispora, a new yeast genus of the
Saccharomycetales. Antonie Van Leeuwenhoek. 45(3): 479-483.
Roger, A. J, and Hug, L. A. (2006). The origin and diversification of eukaryotes:
problems with molecular phylogenetics and molecular clock estimation. Philos
Trans R Soc Lond B Biol Sci. 361: 1039-1054.
Rokas, A., King, N., Finnerty, J., and Carroll, S. B. (2003). Conflicting
phylogenetic signals at the base of the metazoan tree. Evol. Dev. 5: 346–359.
Romero, A. J., and Minter, D. W. (1988). Fluorescence microscopy: an aid to the
elucidation of ascomycete structures. Trans Br Mycol Soc. 90: 457–470.
Rottmann, M., Dieter, S., Brunner, H., and Rupp, S. (2003). A screen in
Saccharomyces cerevisiae identified CaMCM1, an essential gene in Candida
albicans crucial for morphogenesis. Mol Microbiol. 47: 943–959.
Saitou, N., and Nei, M. (1987). The neighbor-joining method: a new method for
reconstructing phylogenetic trees. Mol Biol Evol. 4: 406–425.
Sampaio, P., Gusmão, L., Alves, C., Pina-Vaz, C., Amorim, A., and Pais, C.
(2003). Highly polymorphic microsatellite for identification of Candida albicans
strains. J Clin Microbiol. 41(2): 552-557.
Sampaio, P., Gusmão, L., Correia, A., Alves, C., Rodrigues, A., Pina-Vaz, C.,
Amorim, A., and Pais, C. (2005). New microsatellite multiplex PCR for Candida
albicans strain typing reveals microevolutionary changes. J Clin Microbiol.
43(8): 3869-3876.
Sampaio, P., Nogueira, E., Loureiro, A. S., Delgado-Silva, Y, Correia, A., Pais,
C. (2009). Increased number of glutamine repeats in the C-terminal of Candida
albicans Rlm1p enhances the resistant to stress agents. Antonie van
Leeuwehoek. In press.
REFERENCES
121
Sandven, P. (2000). Epidemiology of candidemia. Rev Iberoam Micol. 17: 73–
81.
San-Millan, R., Ribacoba, L., Ponton, J., and Quindos, G. (1996). Evaluation of
a commercial medium for identification of Candida species. J Clin Microbiol
Infect Dis. 15: 153 – 158.
Santos, M. A., and Tuite, M. F. (1995). The CUG codon is decoded in vivo as
serine and not leucine in Candida albicans. Nucleic Acids Res. 23(9): 1481-
1486.
Santelli, E., and Richmond, T.J. (2000). Crystal structure of MEF2A core bound
to DNA at 1.5 A resolution. J Mol Biol. 297: 437– 449.
Savelkoul, P. H. M., Aarts, H. J. M., de Haas, J., Dijkshoorn, L., Duim, B.,
Otsen, M., Rademaker, J. L. W., Schouls, L., and Lenstra, J. A. 1999. Amplified
fragment length polymorphism analysis: the state of an art. J Clin Microbiol. 37:
3083–3091.
Scannell, D. R., Byrne, K. P., Gordon, J. L., Wong, S., and Wolfe, K. H. (2006).
Multiple rounds of speciation associated with reciprocal gene loss in polyploid
yeasts. Nature. 440(7082): 341-345.
Shalchian-Tabrizi, K., Minge, M. A., Espelund, M., Orr, R., Ruden, T.,
Jakobsen, K. S., and Cavalier-Smith, T. (2008). Multigene phylogeny of
Choanozoaand the Origin of Animals. Plos ONE. 3(5): e2098.
Schoch, C. L., Shoemaker, R. A., Seifert, K. A. Hambleton, S., Spatafora, J. W.,
and Crous, P. W. (2006). A multigene phylogeny of the Dothideomycetes using
four nuclear loci. Mycologia. 98(6): 1041-1052.
Schüßler, A., Schwarzott, D., and Walker, C. (2001). A new fungal phylum, the
Glomeromycota: phylogeny and evolution. Mycologycal Research. 105: 1413-
1421.
Schwarz-Sommer, Z., Huijser, P., Nacken, W., Saedler, H., and Sommer, H.
(1990). Genetic control of flower development by homeotic genes in
Antirrhinum majus. Science. 250: 931– 936.
Seifert, K. A., Wingfield, B. D., and Wingfield, M. J. 1995. A critique of DNA
sequence analysis in the taxonomy of filamentous Ascomycetes and
ascomycetous anamorphs. Can J Bot. 73 (Suppl. 1): S760–S767.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
122
Seoighe, C., and Wolfe, K. H. (1999). Updated map of duplicated regions in the
yeast genome. Gene. 238: 253–261.
Sharrocks, A. D., Gille, H., and Shaw, P. E. (1993). Identification of amino acids
essential for DNA binding and dimerization in p67SRF: implications for a novel
DNA-binding motif. Mol Cell Biol. 13: 123– 132.
Shemer, R., Weissman, Z., Hashman, N., and Kornitzer, D. 2001. A highly
polymorphic degenerate microsatellite for molecular strain typing of Candida
krusei. Microbiology. 147: 2021-2028.
Shin, J. H., Shin, D. H., Song, J. W., Kee, S. J., Suh, S. P., and Ryang, D. W.
(2001). Electrophoretic karyotype analysis of sequential Candida parapsilosis
isolates from patients with persistent or recurrent fungemia. J Clin Microbiol.
39(4): 1258-1263.
Sneath, P. H. A., and Sokal, R. R. (1973). Numerical Taxonomy. W.H. Freeman
and Company, San Francisco, pp 230-234.
Soll, D. R., Staebell, M., Langtimm, C.J., Pfaller, M., Hicks, J., and Rao, T.V.G.
(1988). Multiple Candida strains in the course of a single systemic infection. J.
Clin. Microbiol. 26: 1448-1459.
Soll, D. (2000). The ins and out of DNA fingerprinting the infectious fungi. Clin
Microbiol Rev. 13(2): 332-370.
Sommer, H., Beltran, J. P., Huijser, P., Pape, H., Lonnig, W. E., Saedler, H., and
Schwarz-Sommer, Z. (1990). Deficiens, a homeotic gene involved in the control
of flower morphogenesis in Antirrhinum majus: the protein shows homology to
transcription factors. EMBO J. 9: 605– 613.
Spatafora, J. W., Sung, G-H., Johnson, D., Hesse, C., O‘Rourke, B., Serdani, M.,
Spotts, R., Lutzoni, F., Hofstetter, V., Miadlikowska, J., Reeb, V., Gueidan, C.,
Fraker, E., Lumbsch, T., Lücking, R., Schmitt, I., Hosaka, K., Aptroot, A.,
Roux, C., Miller, A. N., Geiser, D. M., Hafellner, J., Hestmark, G., Arnold, A.
E., Büdel, B., Rauhut, A., Hewitt, D., Untereiner, W. A., Cole, M. S.,
Scheidegger, C., Schultz, M., Sipman, H., and Schoch, C. L. (2006). A five-gene
phylogeny of Pezizomycotina. Mycologia. 98(6): 1018-1028.
Stechmann, A., and Cavalier-Smith, T. (2002). Rooting the eukaryote tree by
using a derived gene fusion. Science. 297: 89–91.
REFERENCES
123
Stechmann, A., and Cavalier-Smith, T. (2003). The root of the eukaryote tree
pinpointed. Curr Biol. 13: R665–R666.
Steenkamp, E. T., Wright, J., and Baldauf, S. (2006). The protistan origins of
Animals and Fungi. Mol Biol Evol. 23(1): 93 -106.
Stenroos, S., Hyvönen, J., Myllys, L., Thell, A., and Ahti, T. (2002). Phylogeny
of the genus Cladonia s. lat. (Cladoniaceae, Ascomycetes) inferred from
molecular, morphological, and chemical data. Cladistics. 18: 237–278.
Stern, A., Doron-Faigenboim, A., Erez, E., Martz, E., Bacharach, E., and Pupko,
T. (2007). Selecton 2007: advanced models for detecting positive and purifying
selection using a Bayesian inference approach. Nucleic Acids Research. 35:
W506-W511.
Sugita, T., Takashima, M., Poonwan, N., and Mekha, N. (2006). Candida
pseudohaemulonii an amphotericin B- and azole-resistant yeast species, isolated
from the blood of a patient from Thailand. Microbiol Immunol. 50(6): 469-473.
Suh, S-O., Blackwell, M., Kurtzman, C., and Lachance, M-A. (2006).
Phylogenetics of Saccharomycetales, the ascomycete yeasts. Mycologia. 98(6):
1006-1017.
Sullivan, D., and Coleman, D. (1997). Candida dubliniensis: an emerging
opportunistic pathogen. Curr Top Med Mycol. 8: 15–25.
Sullivan, D. J., Westerneng, T. J., Haynes, K. A., Bennet, D. E., and Coleman,
D. C. (1995). Candida dubliniensis sp. nov.: Phenotypic and molecular
characterization of a novel species associated with oral candidosis in HIV-
infected individuals. Microbiology. 141: 1507-1521.
Sundstrom, P. (2002). Adhesion in Candida spp. Cell Microbiol. 4: 461–469.
Suzuki, Y., Glazko, G. V., and Nei, M. (2002). Overcredibility of molecular
phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci. 99:
16138–16143.
Swanson, W. J., Yang, Z., Wolfner, M., and Aquado, C. F. (2001). Positive
Darwinian selection drives the evolution of several female reproductive proteins
in mammals. Proc Natl Acad Sci. 98: 2509–2514.
Swofford, D. L. (2002). PAUP: Phylogenetic analysis using parsimony (*and
other methods). Sunderland, Massachusetts, Sinauer Associates.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
124
Tachida, H., and Izuka, M. (1992). Persistence of repeated sequences that evolve
by replication slippage. Genetics. 131: 471–478.
Takezaki, N., and Nei, M. (1994). Inconsistency of the maximum parsimony
method when the rate of nucleotide substitution is constant. J Mol Evol. 39:
210–218.
Tamura, K., Dudley, J., Nei, M., and Kumar, S. (2007). MEGA4: Molecular
Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular
Biology and Evolution. 24: 1596-1599.
Tanabe, Y., O‘Donnell, K., Saikawa, M., and Sugiyama, J. (2000). Molecular
phylogeny of parasitic Zygomycota (Dimargaritales, Zoopagales) based on
nuclear small subunit ribosomal DNA sequences. Molecular Phylogenetics and
Evolution. 16: 253–262.
Tanabe, Y., Saikawa, M., Watanabe, M. M., and Sugiyama, J. (2004). Molecular
phylogeny of Zygomycota based on EF-1 and RPB1 sequences: limitations and
utility of alternative markers to rDNA. Molecular Phylogenetics and Evolution.
30: 438–449.
Tanabe, Y., Watanabe, M. M., and Sugiyama, J. (2005). Evolutionary
relationships among basal fungi (Chytridiomycota and Zygomycota): Insights
from molecular phylogenetics. Journal of General and Applied Microbiology.
51: 267–276.
Tateishi, T., Murayama, S. Y., Otsuka, F., and Yamaguchi, H. (1996).
Karyotyping by PFGE of clinical isolates of Sporothrix schenckii. FEMS
Immunol Med Microbiol. 13: 147–154.
Tavanti, A., Davidson, A., Johnson, E., Maiden, M., Shaw, D., Gow, N., and
Odds, F. (2005a). Multilocus sequence typing for differentiation of strains of
Candida tropicalis. J Clin Microbiol. 43(11): 5593-5600.
Tavanti, A., Davidson, A., Gow, N., Maiden, M., and Odds, F. (2005b). Candida
orthopsilosis and Candida metapsilosis spp. nov. to replace Candida
parapsilosis groups II and III. J Clin Microbiol. 43(1): 284-292.
Taylor, F. J. R. (1978). Problems in the development of an explicit hypothetical
phylogeny of the lower eukaryotes. BioSystems. 10: 67–89.
REFERENCES
125
Taylor, J. W., Geiser, D. M., Burt, A., and Koufopanou, V. (1999). The
evolutionary biology and population genetics underlying fungal strain typing.
Clin Microbiol Rev. 12(1): 126-146.
Tehler, A. (1988). A cladistic outline of the Eumycota. Cladistics. 4: 227–277.
Tehler, A., Farris, J. S., Lipscomb, D. L., Källerrsjö, M. (2000). Phylogenetic
analyses of the fungi based on large rDNA data sets. Mycologia. 92: 459–474.
Tehler, A., Little, D. P., and Farris, J. S. 2003. The full-length phylogenetic tree
from 1551 ribosomal sequences of chitinous fungi, Fungi. Mycological
Research. 107: 901–916.
Theissen, G., Kim, J., and Saedler, H. 1996. Classification and phylogeny of the
MADS-box multigene family suggest defined roles of MADS-box gene
subfamilies in the morphological evolution of eukaryotes. J Mol Evol. 43: 484-
516.
Thomas, J. R., Dwek, R. A., and Rademacher, T. W. (1990). Structure,
biosynthesis, and function of glycosylphosphatidylinositols. Biochemistry. 29:
5413–5422.
Thompson, J. D., Higgins, D. G., and Gibson T. J. (1994). CLUSTALW:
improving the sensitivity of progressive multiple sequence alignment through
sequence weighting, position-specific gap penalties and weight matrix choice.
Nucleic Acids Res. 22: 4673–4680.
Tiede, A., Bastisch, I., Schubert, J., Orlean, P., and Schmidt, R. E. (1999).
Biosynthesis of glycosylphosphatidylinositols in mammals and unicellular
microbes. Biol Chem. 380: 503–523.
Tietz, H., Hopp, M., Schmalreck, A., Sterry, W., and Czaika, V. (2001).
Candida africana sp. nov. a new human pathogen or a variant of Candida
albicans?. Mycoses. 44: 437-445.
Toth, G., Gaspari, Z., and Jurka, J. (2000). Microsatellites in different eukaryotic
genomes: survey and analysis. Genome Res. 10: 967–981.
Treisman, R. (1990). The SRE: a growth factor responsive transcriptional
regulator. Semin. Cancer Biol. 1: 47– 58.
Trifonov, E. N. (2003). Tuning function of tandemly repeating sequences: a
molecular device for fast adaptation. Pp. 1–24 in S. P. Wasser, ed., Evolutionary
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
126
theory and processes: nodern horizons, papers in honor of Eviatar Nevo. Kluwer
Academic Publishers. Amsterdam, The Netherlands.
Tzung, K. W., Williams, R. M., Scherer, S., Federspiel, N., Jones, T., Hansen,
N., Bivolarevic, V., Huizar, L., Komp, C., Surzycki, R., Tamse, R., Davis, R.
W., and Agabian, N. (2001). Genomic evidence for a complete sexual cycle in
Candida albicans. Proc Natl Acad Sci. 98(6): 3249-3253.
Valenza, G., Valenza, R., Brederlau, J., Frosch, M., and Kurzai, O. (2006).
Identification of Candida fabianii as a cause of letal septicaemia. Mycoses. 49:
331-334.
Van Belkum, A., Scherer, S., van Alphen, L., and Verbrugh, H. (1998). Short-
sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev. 62:
275–293.
Van de Peer, Y., Baldauf, S. L., Doolittle, W. F., and Meyer, A. (2000). An
updated and comprehensive rRNA phylogeny of (crown) eukaryotes based on
rate-calibrated evolutionary distances. J Mol Evol. 51: 565 – 576.
Van der Walt, J. P. (1966). Lodderomyces, a new genus of the
Saccharomycetaceae. Antonie Van Leeuwenhoek 1966. 32(1): 1-5.
Voigt, K., and Wöstemeyer, J. (2001). Phylogeny and origin of 82 zygomycetes
from all 54 genera of the Mucorales and Mortierellales based on combined
analysis of actin and translation elongation factor EF-1α genes. Gene. 270: 113-
120.
Vos, P., Hogers, R., Bleeker, M., Reijans, M., Van de Lee, T., Hornes, M.,
Frijters, A., Pot, J., Peleman, J., Kuiper, M., and Zabeau, M. (1995). AFLP: a
new technique for DNA fingerprinting. Nucleic Acids Res. 23: 4407-4414.
Wainright, P. O., Hinkle, G., Sogin, M. L., and Stickel, S. K. (1993).
Monophyletic origins of the Metazoa: an evolutionary link with fungi. Science.
260: 340 – 342.
Wang, Z.; Binder, M.; Dai, Y.C.; and Hibbett, D.S. 2004. Phylogenetic
relationships of Sparassis inferred from nuclear and mitochondrial ribosomal
DNA and RNA polymerase sequences. Mycologia. 96: 1013-1027.
Watanabe, Y., Irie, K., and Matsumoto, K. (1995). Yeast RLM1 encodes a
serum response factor-like protein that may function downstream of the Mpk1
(Slt2) mitogen-activated protein kinase pathway. Mol Cell Biol. 15: 5740–5749.
REFERENCES
127
Weig, M., Jansch, L., Gross, U., De Koster, C. G., Klis, F. M., and De Groot, P.
W. (2004). Systematic identification in silico of covalently bound cell wall
proteins and analysis of protein-polysaccharide linkages of the human pathogen
Candida glabrata. Microbiology. 150: 3129–3144.
Wernersson, R. (2006). Virtual Ribosome-a comprehensive DNA translation
tool with support for integration of sequence feature annotation. Nucleic Acids
Research. 34: W385-W388.
Whalley, A. J. S., and Edwards, R. L. (1995). Secondary metabolites and
systematic arrangement within the Xylariaceae. Can J Bot. 73(Suppl. 1): S802-
S810.
Wickerham, L. J., and Burton, K. A. (1954). A clarification of the relationship of
Candida guilliermondii to other yeasts by a study of their mating types. J
Bacteriol. 68(5): 594-597.
Williams, J. G. K., Kubelik, A. R., Livak, K. J., Rafalski, J. A., and Tingey, S.
V. (1990). DNA polymorphisms amplified by arbitrary primers are useful as
genetic markers. Nucleic Acids Research. 18: 6531-6535.
Whittaker, R. (1969). New concepts of kingdoms of organisms. Science. 163:
150–160.
Wingard, J. R., Mertz, W. G., Rinaldi, M. G., Johnson, T. R., Karp, J. E., and
Saral, R. (1991). Increase in Candida krusei infection among patients with bone
marrow transplant and neutropenia treated prophylactically with fluconazole. N
Engl J Med. 325: 1274–1277.
Wolfe, K. H., and Shields, D. C. (1997). Molecular evidence for an ancient
duplication of the entire yeast genome. Nature. 387: 708–713.
Woese, C., and Fox, G. (1977). Phylogenetic structure of the prokaryotic
domain: The primary kingdoms. Proc Natl Acad Sci. 74(11): 5088 – 5090.
Woese, C.; Kandler, O.; and Wheelis, M. (1990). Towards a Natural System of
Organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl
Acad Sci. 87(12): 4576-4579.
Wong, W. S. W., Yang, Z., Goldman, N., and Nielsen, R. (2004). Accurancy and
power of statistical methods for detecting adaptive evolution in protein coding
sequences and for identifying positive selective sites. Genetics. 168: 1041–1051.
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
128
Wren, J. D., Forgacs, E., Fondon, J. W., 3rd, Pertsemlidis, A., Cheng, S. Y.,
Gallardo, T., Williams, R. S., Shohet, R. V., Minna, J. D., and Garner, H. R.
(2000). Repeat polymorphisms within gene regions: phenotypic and
evolutionary implications. Am J Hum Genet. 67: 345–356.
Yabana, N., and Yamamoto, M. (1996). Schizosaccharomyces pombe map1+
encodes a MADS-box-family protein required for cell-type-specific gene
expression. Mol Cell Biol. 16: 3420 – 3428.
Yanofsky, M. F., Ma, H., Bowman, J. L., Drews, G. N., Feldmann, K. A., and
Meyerowitz, E. M. (1990). The protein encoded by the Arabidopsis homeotic
gene agamous resembles transcription factors. Nature. 346: 35– 39.
Yang, Z. (1997). PAML: a program package for phylogenetic analysis by
maximum likelihood. Comput Appl Biosci. 13: 555–556.
Yang, Z., and Bielawski, J. P. (2000). Statistical methods for detecting
molecular adaptation. Trends Ecol Evolut. 15: 496–503.
Yang, Z., Wong, W. S., and Nielsen, R. (2005) Bayes empirical Bayes inference
of amino acid sites under positive selection. Mol Biol Evol. 22: 1107–1118.
Yang, Z., Nielsen, R., Goldman, N., and Pedersen, A. M. K, (2000). Codon
substitution models for heterogeneous selection pressure at amino acid sites.
Genetics. 155: 431–449.
Young, L. Y., Lorenz, M. C., and Heitman, J. (2000). A STE12 homolog is
required for mating but dispensable for filamentation in Candida lusitaniae.
Genetics. 155(1): 17-29.
Young, E. T., Sloan, J. S., and van Riper, K. (2000). Trinucleotide repeats are
clustered in regulatory genes in Saccharomyces cerevisiae. Genetics. 154: 1053–
1068.
Yu, G., and Fassler, J. S. (1993). SPT13 (GAL11) of Saccharomyces cerevisiae
negatively regulates activity of the MCM1 transcription factor in Ty1 elements.
Mol Cell Biol. 13: 63 – 71.
Zhang, N., Clastebury, L. A., Miller, A. N., Huhndorf, S. M., Schoch, C. L.,
Seifert, K. A., Rossman, A. Y., Rogers, J. D., Kohlmeyer, J., Volkmann-
Kohlmeyer, B., and Sung. G-H. (2006). An overview of the systematic of the
Sordariomycetes based on a four-gene phylogeny. Mycologia. 98(6): 1076-
1087.
ANNEXES
ANNEXES
131
7. ANNEXES
ANNEX I
MrModelblock
#NEXUS
[Johan Nylander 2004-07-01]
[! ***** MrModeltest block -- Modified from MODELTEST 3.0 *****]
[The following command will calculate a NJ tree using the JC69 model of evolution]
BEGIN PAUP;
Log file= mrmodelfit.log replace;
DSet distance=JC objective=ME base=equal rates=equal pinv=0
subst=all negbrlen=setzero;
NJ showtree=no breakties=random;
End;
[!
***** BEGIN TESTING 24 MODELS OF EVOLUTION ***** ]
BEGIN PAUP;
Default lscores longfmt=yes; [Workaround for the bug in PAUP 4b10]
Set criterion=like;
[!
** Model 1 of 24 * Calculating JC **]
lscores 1/ nst=1 base=equal rates=equal pinv=0
scorefile=mrmodel.scores replace;
[!
** Model 2 of 24 * Calculating JC+I **]
lscores 1/ nst=1 base=equal rates=equal pinv=est
scorefile=mrmodel.scores append;
[!
** Model 3 of 24 * Calculating JC+G **]
lscores 1/ nst=1 base=equal rates=gamma shape=est pinv=0
scorefile=mrmodel.scores append;
[!
** Model 4 of 24 * Calculating JC+I+G **]
lscores 1/ nst=1 base=equal rates=gamma shape=est pinv=est
scorefile=mrmodel.scores append;
[!
** Model 5 of 24 * Calculating F81 **]
lscores 1/ nst=1 base=est rates=equal pinv=0
scorefile=mrmodel.scores append;
[!
** Model 6 of 24 * Calculating F81+I **]
lscores 1/ nst=1 base=est rates=equal pinv=est
scorefile=mrmodel.scores append;
[!
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
132
** Model 7 of 24 * Calculating F81+G **]
lscores 1/ nst=1 base=est rates=gamma shape=est pinv=0
scorefile=mrmodel.scores append;
[!
** Model 8 of 24 * Calculating F81+I+G **]
lscores 1/ nst=1 base=est rates=gamma shape=est pinv=est
scorefile=mrmodel.scores append;
[!
** Model 9 of 24 * Calculating K80 **]
lscores 1/ nst=2 base=equal tratio=est rates=equal pinv=0
scorefile=mrmodel.scores append;
[!
** Model 10 of 24 * Calculating K80+I **]
lscores 1/ nst=2 base=equal tratio=est rates=equal pin=est
scorefile=mrmodel.scores append;
[!
** Model 11 of 24 * Calculating K80+G **]
lscores 1/ nst=2 base=equal tratio=est rates=gamma shape=est pinv=0
scorefile=mrmodel.scores append;
[!
** Model 12 of 24 * Calculating K80+I+G **]
lscores 1/ nst=2 base=equal tratio=est rates=gamma shape=est pinv=est
scorefile=mrmodel.scores append;
[!
** Model 13 of 24 * Calculating HKY **]
lscores 1/ nst=2 base=est tratio=est rates=equal pinv=0
scorefile=mrmodel.scores append;
[!
** Model 14 of 24 * Calculating HKY+I **]
lscores 1/ nst=2 base=est tratio=est rates=equal pinv=est
scorefile=mrmodel.scores append;
[!
** Model 15 of 24 * Calculating HKY+G **]
lscores 1/ nst=2 base=est tratio=est rates=gamma shape=est pinv=0
scorefile=mrmodel.scores append;
[!
** Model 16 of 24 * Calculating HKY+I+G **]
lscores 1/ nst=2 base=est tratio=est rates=gamma shape=est pinv=est
scorefile=mrmodel.scores append;
[!
** Model 17 of 24 * Calculating SYM **]
lscores 1/ nst=6 base=equal rmat=est rclass= (a b c d e f) rates=equal
pinv=0
scorefile=mrmodel.scores append;
[!
** Model 18 of 24 * Calculating SYM+I **]
lscores 1/ nst=6 base=equal rmat=est rates=equal pinv=est
scorefile=mrmodel.scores append;
[!
** Model 19 of 24 * Calculating SYM+G **]
ANNEXES
133
lscores 1/ nst=6 base=equal rmat=est rates=gamma shape=est pinv=0
scorefile=mrmodel.scores append;
[!
** Model 20 of 24 * Calculating SYM+I+G **]
lscores 1/ nst=6 base=equal rmat=est rates=gamma shape=est pinv=est
scorefile=mrmodel.scores append;
[!
** Model 21 of 24 * Calculating GTR **]
lscores 1/ nst=6 base=est rmat=est rates=equal pinv=0
scorefile=mrmodel.scores append;
[!
** Model 22 of 24 * Calculating GTR+I **]
lscores 1/ nst=6 base=est rmat=est rates=equal pinv=est
scorefile=mrmodel.scores append;
[!
** Model 23 of 24 * Calculating GTR+G **]
lscores 1/ nst=6 base=est rmat=est rates=gamma shape=est pinv=0
scorefile=mrmodel.scores append;
[!
** Model 24 of 24 * Calculating GTR+I+G **]
lscores 1/ nst=6 base=est rmat=est rates=gamma shape=est pinv=est
scorefile=mrmodel.scores append;
Log stop=yes;
END;
134
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
Annex II. Fungal species where were identified ortologue RLM1 genes. Taxonomy, Nº intron, gene location and database.
Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center
Phycomyces blakeleeanus Mucoromycotina 5 Scaffold 1: 2877572 – 2879654 (-) Joint Genome Institute
Mucor circinelloides Mucoromycotina 3 Scaffold 2: 32637 – 34277 (-) Broad Institute, Murcia University
Rhizopus oryzae Mucoromycotina 4 Supercontig 2: 4147507 – 4149079 (-) Broad Institute
Ustilago maydis Ustilagomycetes - Contig 191: 196940 – 198718 (+) Broad Institute
Malassezia globosa Malasseziales - Sf 7.1: 204470 – 205939 (+) NCBI
Sporobolomyces roseus Microbotryomycetes 2 Scaffold 10: 85788 – 88407 (+) Joint Genome Institute
Cryptococcus neoformans Tremellomycetes 4 Chromosome 2: 1360064 - 1362029 (-) Broad Institute, Genome Sciences
Center Canada, Duke Center for
Genome Research, Stanford Genome
Technology
Cryptococcus gatti Tremellomycetes 4 Supercontig 16: 244132 – 246064 (+) Broad Institute
Laccaria bicolor Agaricomycetes 4 Scaffold 20: 147515 - 149297 (+) Joint Genome Institute
Coprinus cinereus Agaricomycetes 7 Contig 317: 81588 – 83599 (-)‖ Broad Institute
Phanerochaete chrysosporium Agaricomycetes 4 Scaffold 3: 1745299 – 1747266 (+) Joint Genome Institute
Postia placenta Agaricomycetes 8 Scaffold 174: 171333 – 174904 (+) Joint Genome Institute
Schizosaccharomyces japonicus Schizosaccharomycetes - Supercontig 5: 941790 - 942323 (+) Broad Institute
Schizosaccharomyces pombe Schizosaccharomycetes - Chromosome 2: 3626639-3627757 (+) Sanger Institute
Schizosaccharomyces octosporus Schizosaccharomycetes - Supercontig 2: 351924 – 352948 (+) Broad Institute
Yarrowia lipolytica Saccharomycetes 1 Chromosome F: 2507846 – 2509480 (-) Génolevures
Candida lusitaniae Saccharomycetes - Supercontig 5: 493096 - 494358 (-) Broad Institute
Candida guilliermondii Saccharomycetes - Supercontig 3: 493468 - 494775 (+) Broad Institute
ANNEXES
135
Pichia stipitis Saccharomycetes - Chromosome 1: 1519668 – 1521421 (+) Joint Genome Institute
Debaryomyces hansenii Saccharomycetes - Chromosome 2: 237182 - 238732 (+) Génolevures
Lodderomyces elongisporus Saccharomycetes - Supercontig 6: 1060953 - 1063316 (+) Broad Institute
Candida parapsilosis Saccharomycetes - Contig 005806: 536977 – 539238 (-) Sanger Institute
Candida tropicalis Saccharomycetes - Supercontig 1: 49905 - 51623 (-) Broad Institute, Génolevures
Candida albicans Saccharomycetes - Chromosome 4: 253044 – 254879 (+) Stanford Genome Technology Center,
Sanger Institute and Broad Institute
Candida dubliniensis Saccharomycetes - Chromosome 4: 265932 – 267662 (+) Sanger Institute
Kluyveromyces lactis Saccharomycetes - Chromosome E: 2138785 – 2140983 (+) Génolevures
Ashbya gossypii Saccharomycetes 1 Chromosome 7: 1125892 - 1127486 (-) Biozentrum and Syngenta
Saccharomyces kluyveri Saccharomycetes 1 Contig 4.8: 312314 – 314049 (-) Washington University Genome
Sequencing Center, Génolevures
Kluyveromyces waltii Saccharomycetes 1 Contig 62: 28422 – 29929 (+) Broad Institute
Candida glabrata Saccharomycetes - Chromosome H: 556498 – 558378 (+) Génolevures
Vanderwaltozyma polyspora Saccharomycetes - Contig 1048b: 36130 – 38160 (-) Trinity College Dublin, Ireland
Saccharomyces castellii Saccharomycetes - Contig 664: 5363 - 7027 (-) Washington University Genome
Sequencing Center
Saccharomyces bayanus Saccharomycetes - Contig 737: 12450 – 14405 (+) Washington University Genome
Sequencing Center, Broad Institute,
Génolevures
Saccharomyces kudriavzevii Saccharomycetes - Contig 02.2091: 16474 – 18432 (-) Washington University Genome
Sequencing Center
Saccharomyces mikatae Saccharomycetes - Contig 34: 13635 – 15605 (-) Washington University Genome
Sequencing Center, Broad Institute
136
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
Saccharomyces cerevisiae Saccharomycetes - Chromosome 16: 379117 – 379117 (-) Stanford Genome Technology Center,
Sanger Institute, Broad Institute
Saccharomyces paradoxus Saccharomycetes - Contig 387: 39147 – 40522 (-) Broad Institute
Botrytis cinerea Leotiomycetes 2* Supercontig 4: 131839 – 133900 (+) Broad Institute and Syngenta AG,
Genoscope
Sclerotinia sclerotiorum Leotiomycetes 2 Supercontig 7: 1397346 - 1399404 (-) Broad Institute
Magnoporthe grisea Sordariomycetes 2 Supercontig 27: 2198205 - 2200477 (+) Broad Institute
Verticillium dahliae Sordariomycetes 2 Supercontig 2: 411495 - 413529 (-) Broad Institute
Podospora anserina Sordariomycetes 1 Chromosome 1_SC1: 2337785 – 2339800 (+) Genoscope
Chaetomium globosum Sordariomycetes 3* Supercontig 1: 4237276 - 4239345 (+) Broad Institute
Neurospora crassa Sordariomycetes 2 Contig 6: 745962 - 748345 (-) Broad Institute
Nectria haematococca Sordariomycetes 2 Contig 1: 4240418 – 4242558 (+) Joint Genome Institute
Fusarium oxysporum Sordariomycetes 2 Supercontig 6: 882635 - 884756 (-) Broad Institute
Fusarium verticillioides Sordariomycetes 2 Supercontig 4: 851617 - 853730 (-) Broad Institute and Syngenta AG
Fusarium graminearum Sordariomycetes 2 Supercontig 6: 1217222 – 1219438 (+) Broad Institute
Ephicloa festucae Sordariomycetes 2* Contig 09290: 7786 – 8664 (+); Contig 10180:
1- 1318 (+)
Oklahoma University
Trichoderma reesei Sordariomycetes 2 Contig 1: 484378 – 486398 (+) Joint Genome Institute
Trichoderma virens Sordariomycetes 2 Locus ABDF01000081: 60794 – 62787 (-) Joint Genome Institute
Trichoderma atroviride Sordariomycetes 2 Locus ABDG01000180: 45972 – 47983 (-) Joint Genome Institute
Alternaria brassicicola Dothideomycetes 2 Contig 2.99: 20908 – 22912 (+) Washington University
Mycosphaerella fijiensis Dothideomycetes 2 Scaffold 11: 1715901 – 1717838 (+) Joint Genome Institute
Mycosphaerella graminicola Dothideomycetes 2 Scaffold 5: 2832610 – 2834512 (-) Joint Genome Institute
Pyrenophora tritici-repentis Dothideomycetes 2 Supercontig 8: 1656744 – 1658773 (+) Broad institute
ANNEXES
137
Stagonospora nodorum Dothideomycetes 2 Supercontig 36: 142061 – 144139 (-) Broad Institute, International
Stagonospora nodorum Genomics
Consortium
Cochliobolus heterostrophus Dothideomycetes 2 Scaffold 3: 15429 - 17447 (-) Joint Genome Insttitute
Coccidioides immitis Eurotiomycetes 2 Chromosome 4: 837325 – 839343 (-) Broad Institute
Coccidioides posadasii Eurotiomycetes 2 Chromosome 5: 696037 – 698131 (+) TIGR and Broad Institute
Blastomyces dermatitidis Eurotiomycetes 2 Contig 5.48: 866 – 3141 (-) Washington University
Histoplasma capsulatum Eurotiomycetes 2 Contig 1161: 255583 – 257713 (+) Broad institute, Washington University
Genome Sequencing Center
Paracoccidioides brasiliensis Eurotiomycetes 2 Supercontig 2: 1922491 - 1924634 (-) Broad Institute
Uncinocarpus reesii Eurotiomycetes 2 Supercontig 5: 1771068 – 1772993 (+) Broad Institute
Ascosphaera apis Eurotiomycetes 1* Contig 6709: 1 – 1360 (-); Contig 28: 3916 –
5847 (-)
Baylor College of Medicine
Microsporum gypseum Eurotiomycetes 1 Supercontig 8: 169115 – 170914 (-) Broad Institute
Talaromyces stipitatus Euriotiomycetes 2 Locus ABAS01000017: 261779 – 263672 (+) TIGR
Penicillium marneffei Euriotiomycetes 2 Locus ABAR01000011: 131838 – 133723 (-) TIGR
Neosartorya fischeri Eurotiomycetes 2 Contig 574: 1982352 - 1984281 (-) TIGR
Aspergillus clavatus Eurotiomycetes 2 Contig 83: 930562 – 932534 (+) TIGR
Aspergillus fumigatus Eurotiomycetes 2 Chromosome 3: 2185740 - 2187668 ( +) Sanger Institute and TIGR
Aspergillus oryzae Eurotiomycetes 2 Supercontig 2: 3771013 - 3773030 (-) Broad Institute
Aspergillus terreus Eurotiomycetes 2 Supercontig 2: 1754327 - 1756250 (-) Broad Institute
Aspergillus flavus Eurotiomycetes 2 Contig 1: 3827471 – 3829494 (-) TIGR
Aspergillus nidulans Euriotiomycetes 2 Contig 51: 528863 - 530802 (+) Broad Institute and Monsanto
Aspergillus niger Eurotiomycetes 2 Contig 2: 3360920 – 3362951 (+) Joint Genome Institute
138
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
Annex III. Fungal species where were identified ortologue MCM1 genes. Taxonomy, Nº intron, gene location and database.
Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center
Phycomyces blakeleeanus Mucoromycotina 3 Scaffold 1: 2124764 – 2126294 (+) Joint Genome Institute
Mucor circinelloides Mucoromycotina 2 Scaffold 4: 3273504 – 3274607 (-) Broad Institute, Murcia University
Rhizopus oryzae Mucoromycotina 3 Supercontig 4: 1692147-1693115 (-) Broad Institute
Ustilago maydis Ustilagomycetes - Contig 43: 64341-65705 (+) Broad Institute
Malassezia globosa Malasseziales 1 Sf 19.1: 27642 – 28805 (+) NCBI
Sporobolomyces roseus Microbotryomycetes 5 Scaffold 5: 1185071 – 1188512 (-) Joint Genome Institute
Cryptococcus neoformans Tremellomycetes 4 Chromosome 13: 39688-42628 (+) Broad Institute, Genome Sciences
Center Canada, Duke Center for
Genome Research, Stanford Genome
Technology
Cryptococcus gatti Tremellomycetes 3 Supercontig 14: 629109 – 632178 (-) Broad Institute
Laccaria bicolor Agaricomycetes 2 Scaffold 3: 1957445 - 1958697 (+) Joint Genome Institute
Coprinus cinereus Agaricomycetes 2 Contig 175: 86341-87583 (+) Broad Institute
Phanerochaete chrysosporium Agaricomycetes 1 Scaffold 8: 889662 – 890764 (-) Joint Genome Institute
Postia placenta Agaricomycetes 1 Scaffold 93: 242447 – 243724 (+) Joint Genome Institute
Schizosaccharomyces japonicus Schizosaccharomycetes 2 Supercontig 1: 436734-438566 (+) Broad Institute
Schizosaccharomyces pombe Schizosaccharomycetes 2 Chromosome 1: 5292795-5294344 (+) Sanger Institute
Schizosaccharomyces octosporus Schizosaccharomycetes 2 Supercontig 1: 2428314-2429843 (+) Broad Institute
Yarrowia lipolytica Saccharomycetes 1 Chromosome C: 913297 – 914639 (-) Génolevures
Candida lusitaniae Saccharomycetes - Supercontig 5: 694152-694868 (-) Broad Institute
Candida guilliermondii Saccharomycetes - Supercontig 1: 710234-710938 (+) Broad Institute
ANNEXES
139
Pichia stipitis Saccharomycetes - Chromosome 5: 1545724 – 1546455 (-) Joint Genome Institute
Debaryomyces hansenii Saccharomycetes - Chromosome F: 1872703 - 1873428 (+) Génolevures
Lodderomyces elongisporus Saccharomycetes - Supercontig 10: 288713-289579 (-) Broad Institute
Candida parapsilosis Saccharomycetes - Contig 005806: 325917 – 326762 (+) Sanger Institute
Candida tropicalis Saccharomycetes - Supercontig 7: 104018-104854 (+) Broad Institute, Génolevures
Candida albicans Saccharomycetes - Chromosome 7: 188023-188811 (-) Stanford Genome Technology Center,
Sanger Institute and Broad Institute
Candida dubliniensis Saccharomycetes - Chromosome 7: 207792 - 208493 (-) Sanger Institute
Kluyveromyces lactis Saccharomycetes - Chromosome F: 2474742 – 2475452 (-) Génolevures
Ashbya gossypii Saccharomycetes - Chromosome 2: 561146 - 561766 (-) Biozentrum and Syngenta
Saccharomyces kluyveri Saccharomycetes - Contig 5.10: 2448 – 3053 (+) Washington University Genome
Sequencing Center, Génolevures
Kluyveromyces waltii Saccharomycetes - Contig 206: 1 – 307 (-); Contig 205: 7934 –
8165 (-)
Broad Institute
Candida glabrata Saccharomycetes - Chromosome F: 618976 – 619632 (-) Génolevures
Vanderwaltozyma polyspora Saccharomycetes - Contig 1073b: 43103 – 43882 (-) Trinity College Dublin, Ireland
Saccharomyces castellii Saccharomycetes - Contig 649: 39880 - 40560 (-) Washington University Genome
Sequencing Center
Saccharomyces bayanus Saccharomycetes - Contig 579: 25293 – 26144 (-) Washington University Genome
Sequencing Center, Broad Institute,
Génolevures
Saccharomyces kudriavzevii Saccharomycetes - Contig 02.820: 794 – 1651 (-) Washington University Genome
Sequencing Center
Saccharomyces mikatae Saccharomycetes - Contig 578: 4300 – 5130 (+) Washington University Genome
140
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
Sequencing Center, Broad Institute
Saccharomyces cerevisiae Saccharomycetes - Chromosome 13: 353870 – 354730 (+) Stanford Genome Technology Center,
Sanger Institute, Broad Institute
Saccharomyces paradoxus Saccharomycetes - Contig 178: 11877 – 12719 (-) Broad Institute
Botrytis cinerea Leotiomycetes 2* Supercontig 165: 67693 – 68463 (-) Broad Institute and Syngenta AG,
Genoscope
Sclerotinia sclerotiorum Leotiomycetes 2 Supercontig 6: 1981916-1982762 (+) Broad Institute
Magnoporthe grisea Sordariomycetes 2 Supercontig 23: 1178204-1179169 (-) Broad Institute
Verticillium dahliae Sordariomycetes 2 Supercontig 3: 440666-441518 (+) Broad Institute
Podospora anserina Sordariomycetes 2 Chromosome 1_SC4: 1831500 – 1832359 (+) Genoscope
Chaetomium globosum Sordariomycetes 2 Supercontig 3: 3648631 - 3649505 (-) Broad Institute
Neurospora crassa Sordariomycetes 2 Contig 39: 300711 - 301579 (+) Broad Institute
Nectria haematococca Sordariomycetes 2 Contig 1: 1904784 – 1905808 (+) Joint Genome Institute
Fusarium oxysporum Sordariomycetes 2 Supercontig 3: 1396623 - 1397638 (-) Broad Institute
Fusarium verticillioides Sordariomycetes 2 Supercontig 2: 1298954 - 1299971 (-) Broad Institute and Syngenta AG
Fusarium graminearum Sordariomycetes 2 Supercontig 5: 973273 – 974303 (-) Broad Institute
Ephicloa festucae Sordariomycetes 2 Contig 03502: 2589 – 3725 (-) Oklahoma University
Trichoderma reesei Sordariomycetes 2 Contig 1: 293329 – 294473 (+) Joint Genome Institute
Trichoderma virens Sordariomycetes 2 Locus ABDF01000238: 119788 – 120837 (-) Joint Genome Institute
Trichoderma atroviride Sordariomycetes 2 Locus ABDG01000087 : 9802 – 10882 (-) Joint Genome Institute
Alternaria brassicicola Dothideomycetes 2 Contig 7.54: 1842 – 2668 (+) Washington University
Mycosphaerella fijiensis Dothideomycetes 2 Scaffold 5: 691696 – 692478 (+) Joint Genome Institute
Mycosphaerella graminicola Dothideomycetes 2 Scaffold 7: 131733 – 132524 (-) Joint Genome Institute
Pyrenophora tritici-repentis Dothideomycetes 2 Supercontig 5: 186836-187667 (+) Broad institute
ANNEXES
141
Stagonospora nodorum Dothideomycetes 2 Supercontig 11: 1046577 – 1047421 (+) Broad Institute, International
Stagonospora nodorum Genomics
Consortium
Cochliobolus heterostrophus Dothideomycetes 2 Scaffold 2: 392134 - 393018 (-) Joint Genome Insttitute
Coccidioides immitis Eurotiomycetes 2 Chromosome 3: 322326 – 323200 (-) Broad Institute
Coccidioides posadasii Eurotiomycetes 2 Chromosome 2: 790014 – 790884 (-) TIGR and Broad Institute
Blastomyces dermatitidis Eurotiomycetes 2 Contig 5593.1: 398 – 885 (-) Washington University
Histoplasma capsulatum Eurotiomycetes 2 Supercontig 1.3: 251153 – 252083 (+) Broad institute, Washington University
Genome Sequencing Center
Paracoccidioides brasiliensis Eurotiomycetes 2 Supercontig 12: 450142-451180 (+) Broad Institute
Uncinocarpus reesii Eurotiomycetes 2 Supercontig 2: 4576943-4577833 (+) Broad Institute
Ascosphaera apis Eurotiomycetes 1 Contig 2371: 17 – 538 (-); Contig 6891: 66 –
1133 (+)
Baylor College of Medicine
Microsporum gypseum Eurotiomycetes 2 Supercontig 1: 3737424-3738317 (+) Broad Institute
Talaromyces stipitatus Euriotiomycetes 2 Locus ABAS01000004: 938690 – 939519 (+) TIGR
Penicillium marneffei Euriotiomycetes 2 Locus ABAR01000013 : 274356 – 275187 (+) TIGR
Neosartorya fischeri Eurotiomycetes 2 Contig 582: 357403-358305 (-) TIGR
Aspergillus clavatus Eurotiomycetes 2 Contig 85: 946418-947301 (+) TIGR
Aspergillus fumigatus Eurotiomycetes 2 Chromosome 6: 357383-358283 –() Sanger Institute and TIGR
Aspergillus oryzae Eurotiomycetes 2 Supercontig 17: 221157-222072 (-) Broad Institute
Aspergillus terreus Eurotiomycetes 2 Supercontig 10: 1111982-1112918 (+) Broad Institute
Aspergillus flavus Eurotiomycetes 2 Contig 9: 471345-472255 (-) TIGR
Aspergillus nidulans Euriotiomycetes 2 Contig 159: 33815-34722 (-) Broad Institute and Monsanto
Aspergillus niger Eurotiomycetes 2 Contig 9: 222598 - 223521 (-) Joint Genome Institute
142
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
Annex IV. Fungal species where identified orthologue IFF8 genes. Taxonomy, Nº de intron, gene location and database.
Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center
Candida lusitaniae Saccharomycetes - Supercontig 7: 753415 - 755439 (+) Broad Institute
Candida guilliermondii Saccharomycetes - Supercontig 2: 36309 - 39698 (+) Broad Institute
Pichia stipitis Saccharomycetes - Chromosome 1: 307099 – 311559 (+) Joint Genome Institute
Debaryomyces hansenii Saccharomycetes - Chromosome B: 107946 - 112736 (-) Génolevures
Lodderomyces elongisporus Saccharomycetes - Contig 1.123: 280245 - 287281 (-) Broad Institute
Candida parapsilosis Saccharomycetes - Contig 005807: 1470770 – 1475785 (+) Sanger Institute
Candida tropicalis Saccharomycetes - Supercontig 10: 128215 - 134073 (+) Broad Institute, Génolevures
Candida albicans Saccharomycetes - Chromosome 5: 164521 - 166665 (-) Stanford Genome Technology Center,
Sanger Institute and Broad Institute
Candida dubliniensis Saccharomycetes - Chromosome 5: 187922 - 189868 (+) Sanger Institute
ANNEXES
143
Annex V. Fungal species where were identified ortologue SMP1 genes. Taxonomy, Nº intron, gene location and database.
Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center
Candida glabrata Saccharomycetes - Chromosome M: 657290 – 659269 (-) Génolevures
Vanderwaltozyma polyspora Saccharomycetes - Contig 397: 8567 – 10588 (-) Trinity College Dublin, Ireland
Saccharomyces castellii Saccharomycetes - Contig 721: 27914 - 29635 (-) Washington University Genome
Sequencing Center
Saccharomyces bayanus Saccharomycetes - Contig 3: 5420 – 6778 (+) Washington University Genome
Sequencing Center, Broad Institute,
Génolevures
Saccharomyces kudriavzevii Saccharomycetes - Contig 02.1807: 8672 – 9925 (+) Washington University Genome
Sequencing Center
Saccharomyces mikatae Saccharomycetes - Contig 100: 1952 – 3295 (+) Washington University Genome
Sequencing Center, Broad Institute
Saccharomyces cerevisiae Saccharomycetes - Chromosome 2: 594859 – 593501 (+) Stanford Genome Technology Center,
Sanger Institute, Broad Institute
Saccharomyces paradoxus Saccharomycetes - Contig 210: 35931 – 37187 (-) Broad Institute
144
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
Annex VI. Fungal species where were identified ortologue ARG80 genes. Taxonomy, Nº intron, gene location and database.
Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center
Candida glabrata Saccharomycetes - Chromosome F: 620431 – 621027 (-) Génolevures
Vanderwaltozyma polyspora Saccharomycetes - Contig 1073b: 45942 – 46289 (-) Trinity College Dublin, Ireland
Saccharomyces castellii Saccharomycetes - Contig 709: 49895 - 50539 (-) Washington University Genome
Sequencing Center
Saccharomyces bayanus Saccharomycetes - Contig 579: 26917 – 27447 (-) Washington University Genome
Sequencing Center, Broad Institute,
Génolevures
Saccharomyces kudriavzevii Saccharomycetes - Contig 02.1514: 585 – 1130 (-) Washington University Genome
Sequencing Center
Saccharomyces mikatae Saccharomycetes - Contig 2531: 1149 – 1703 (-) Washington University Genome
Sequencing Center, Broad Institute
Saccharomyces cerevisiae Saccharomycetes - Chromosome 13: 352602 – 353135 (+) Stanford Genome Technology Center,
Sanger Institute, Broad Institute
Saccharomyces paradoxus Saccharomycetes - Contig 178: 13466 – 14017 (-) Broad Institute
ANNEXES
145
ANNEX VII
Likelihood value, parameter estimates, and sites under positive selection as inferred under the six models
proposed to calculate ω over codons in MADS-box of orthologue RLM1 genes. (PAML 4.1)
Model l Estimates of parameters dN/dS Positively
Selected sites
One-ratio (M0)
-8907.095177
ω = 0.0185
0.0185
None
Neutral (M1a)
-8881.920467
p: 0.97143 0.02857
w: 0.01692 1.00000
0.0450
Not allowed
Selection (M2a)
-8881.920467
p: 0.97143 0.00817 0.02040
w: 0.01692 1.00000 1.00000
0.0450
None
Free-ratio (M3)
-8595.963248
p: 0.35665 0.42127 0.22208
w: 0.00122 0.01527 0.06434
0.0212
None
Beta (M7)
-8575.525557
p= 0.42029 q= 13.55916
0.0283
Not allowed
Beta & w (M8)
-8575.526257
p0= 0.99999 p= 0.42029 q= 13.55916
(p1= 0.00001) w= 1.00000
0.0283
None
Phylogenetic relationships of Candida species inferred by sequence analysis of nuclear genes
146
ANNEX VIII
Likelihood value, parameter estimates, and sites under positive selection as inferred under the six models
proposed to calculate ω over codons in Saccharomycotina RLM1 gene. (HYPHY 1.0)
Model l Estimates of parameters dN/dS Positively
Selected sites
One-ratio (M0)
-39464.90296
ω = 0.13616
0.13616
None
Neutral (M1a)
-40768.67683
p: 0.9999343 0.0000657
w: 1.00000 1.00000
1.000
Not allowed
Selection (M2a)
-39216.75175
p: 0.66539 0.33461 0.00000
w: 0.13860 1.00000 2.98229
0.42684
None
Free-ratio (M3)
-38569.21552
p: 0.11323 0.46676 0.42001
w: 0.01243 0.10299 0.28431
0.16889
None
Beta (M7)
-38570.03576
p= 0.96651 q= 4.56744
0.17465
Not allowed
Beta & w (M8)
-38570.03383
p0= 0.99999 p= 0.96573 q= 4.56035
(p1= 0.00001) w= 1.05491
0.17476
None
ANNEXES
147
ANNEX IX
Likelihood ratio test statistics for comparing site specific models for MADS-box and Saccharomycotina
RLM1 gene.
Analysis (DF)
M0 vs M3 (4)
M1a vs M2a (2)
M7 vs M8 (2)
MADS-box
622.26* 0 0.0014
Saccharomycotina
Rlm1 gene
1791.37* 3103.85* 0.0039
p value p> 0.05 p>0.05 p>0.05 Degrees of freedom (DF), corresponding to the difference of the number of parameters in the models, are
reported in brackets, significant values are marked by *.