
Julio César Chávez Galarza

April 2009

Universidade do Minho

Escola de Ciências

Phylogenetic relationships of the most common pathogenic Candida species inferred by sequence analysis of nuclear genes


Master's Thesis

Master's in Molecular Genetics

Work carried out under the supervision of Professor Ana Paula Fernandes Monteiro Sampaio Carvalho

Julio César Chávez Galarza

April 2009

Universidade do Minho

Escola de Ciências

Phylogenetic relationships of the most common pathogenic Candida species inferred by sequence analysis of nuclear genes


PARTIAL REPRODUCTION OF THIS THESIS IS AUTHORIZED, FOR RESEARCH PURPOSES ONLY, UPON A WRITTEN DECLARATION BY THE INTERESTED PARTY, WHO UNDERTAKES TO COMPLY WITH THIS CONDITION.


This thesis was supported by the ALβAN Programme, the European Union Programme of High Level Scholarships for Latin America, scholarship No. E06M103915PE.


DEDICATION

To Jehovah God, for everything

To my parents, Julio and Claudia, for their love and support at all times


ACKNOWLEDGEMENTS

I express my very special thanks to Prof. Ana Paula Sampaio, supervisor of this work, for all her support, dedication, determination and constant encouragement, and in particular for everything she taught me.

To Prof. Célia Pais, for her support, teaching and valuable criticism of this work.

To Prof. Cândida Lucas, for all the help she gave me to begin my studies in Portugal.

To Prof. Rui Oliveira, for his advice and friendship.

To Prof. Felipe Costa, for his support and suggestions during the preparation of this thesis.

To Prof. Pedro Carvalho, for his friendship and welcome when I arrived in Portugal.

To the members of the Department of Biology, in particular its teaching staff, researchers and technicians, for all the support shown.

To my laboratory colleagues Yolanda, Alexandra, Marlene, Eugénia, Ana, Rafael and Raquel, for their help with the work, their company and their friendship.

To my colleagues in the Department of Biology, especially Andreia, Sílvia, Célia, Rita, Sofia, Emi, Flavio and Jorge, for their advice, friendship, cheerfulness and long conversations.

To Cristovão, for his help in editing this thesis.

To the Alves Pinto Pacheco family, who made me forget my Peru so that I could learn to love Portugal.

To my master's classmates Dany, Dulcineia, Dulce, Ana, Nádia, Marcos and Ricardo, for the good times we shared and the support they gave me.

To the professors of the Universidade Nacional de Trujillo-Perú, Dr. Manuel Fernández Honores and Ms. C. Elmer Alvítez Izquierdo, for having supported me and taught me great values, and most especially to Professor Roberto Rodríguez Rodríguez, for having been my guide throughout university life and a role model.

To Dr. Ricardo Fujita Alarcón of the Instituto de Genética e Biologia Molecular of the Universidade Particular San Martín de Porres and Dr. Enrique Pérez of the Instituto de Medicina Tropical of the Universidade Cayetano Heredia, for their unconditional support and the trust they placed in me.

To my siblings Gianmarco, Michael, Lizbeth, Pilar, Emilia, Ruth, Rommel, Maira and Milagros, for their unconditional support.

To my uncle Miguel, for his trust and unconditional support since the beginning of my studies.

To my uncles and aunts Ramón, Picolín, Doris, Luz and Carlos, for the support and trust they have always placed in me.

To the Rodríguez Gonzales family: Prof. Roberto, Mrs. Rocio, Mireya, Lalo and Brandon, for their support and friendship in the most difficult moments; this achievement is the product of the trust they placed in me.

To the Salgado Rodríguez, Juárez Pita, Rebaza Rodríguez and Massé Miguel families, who always made me feel like one of their own.

To Clarisa, without whose help this achievement would have been difficult.

To Yolanda, for her constant love and affection, which will never end.

Likewise, to my friends in Peru: Carlitos, Eduardo, Miguel, Victor Hugo, Daniel, Fibi, Ramón, Ludwig, Gabriela, Cecilia, Divián, Omar, Néstor, Marge, Sussy.


RESUMO

Coding genomic regions are the most widely used in phylogenetic studies, whether through multigene or phylogenomic analyses. However, most of these analyses use only the regions with high similarity among orthologous sequences and disregard the remaining regions, which can provide very useful information. The main objective of this thesis was therefore to evaluate the use of the MADS-box family genes RLM1 and MCM1, as well as the IFF8 gene, which encodes a GPI protein, in fungal phylogeny studies.

Seventy-six orthologous sequences for RLM1 and MCM1 and eight for IFF8 were obtained by searching several databases. The sequences were aligned with the programs CLUSTALW and MUSCLE, and the phylogenetic analysis was performed with PHYML 3.0 and MrBayes 3.2. The results obtained with the transcription factor RLM1 indicated that it is well suited for inclusion in fungal phylogenetic studies, since the topology obtained is very close to those established in other multigene and phylogenomic analyses. The transcription factor MCM1 showed limitations for use in phylogenetic studies at the kingdom level, since the sequences obtained vary greatly in length, causing alignment problems. Despite this limitation, this gene can be used, either on its own or combined with other genes, to resolve phylogenies at the phylum level or in lower-ranking groups, since the results obtained within the subphylum Saccharomycotina agree with published studies. The use of the IFF8 gene to infer phylogeny showed major limitations, since orthologues of this gene were identified only in the CUG group and, moreover, the phylogeny obtained did not agree with other published studies, particularly regarding the position of C. tropicalis in the phylogeny of Candida spp.

It is currently accepted that, in the subphylum Saccharomycotina, a whole-genome duplication took place in the sensu stricto and sensu lato groups of the 'Saccharomyces complex', with several duplicated genes being retained. The initial search for orthologous sequences revealed that the transcription factors studied in this work are among the duplicated genes that were retained. Another objective of this work was therefore to determine whether the RLM1 gene was under positive selection in the subphylum Saccharomycotina. This analysis identified several positions where amino acids are possibly under positive selection, and although these substitutions were observed at different sites of the protein, none was detected in the conserved regions. This observation suggests that the protein performs an important function in the cell that was maintained during species divergence. The presence of amino acids under positive selection at the start of the repetitive region of the protein's carboxyl terminus, together with the difference in the repeated amino acid between species with a duplicated genome and species with a non-duplicated genome, suggested that a frameshift mutation was responsible for the observed changes. This hypothesis was then tested in the subphylum Saccharomycotina by designing the three reading frames and reconstructing the ancestral sequence. The results of this analysis confirmed that a frameshift mutation was indeed responsible for the changes observed in the repetitive region, with amino acid substitution, diversifying the function of this gene in the species that duplicated their genome.

The main results of this thesis were: (i) the identification of the potential of the RLM1 gene for phylogeny studies of the kingdom Fungi, with special emphasis on the phylogeny of Candida species; and (ii) the identification of the molecular mechanism responsible for the change in the repetitive region of the protein's carboxyl terminus, which occurred in Saccharomyces sensu stricto during species divergence after genome duplication.


ABSTRACT

Coding regions are used to resolve phylogenetic relationships through multigene and phylogenomic analyses. However, most of these analyses use only the regions that are similar among orthologous gene sequences and do not take into account other regions, which could provide useful information. Thus, the main purpose of this thesis was to evaluate the use of the MADS-box transcription factor genes RLM1 and MCM1, and of IFF8, a gene coding for a GPI-anchored protein, in fungal phylogeny.

Seventy-six putative orthologous sequences for RLM1 and MCM1 and eight for IFF8 were obtained from different fungal databases. Sequences were aligned with CLUSTALW and MUSCLE, and phylogeny was inferred with PHYML 3.0 and MrBayes 3.2. The phylogenetic analysis based on the transcription factor RLM1 indicated that it is suitable for inclusion in multigene analyses, since the fungal phylogeny obtained is close to those established by multigene and phylogenomic analyses. The transcription factor MCM1 showed limitations for use in phylogeny at the kingdom level because of its variable sequence lengths. Despite this limitation, the gene can be used to resolve phylogeny at the phylum or lower clade levels, since the results obtained within the subphylum Saccharomycotina, independently and/or concatenated with the other genes used in this study, agreed with published studies. On the other hand, the use of the IFF8 gene to infer phylogeny is limited and restricted to the CUG group, since no other orthologues were found within the kingdom Fungi, and the CUG group phylogeny obtained showed conflicts, particularly in the position of Candida tropicalis, which does not agree with previously determined relationships in Candida phylogeny.
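The concatenation of gene alignments mentioned in this paragraph can be illustrated with a minimal sketch. It assumes each gene alignment is held as a mapping from taxon name to aligned sequence; the function name `concatenate_alignments`, the taxon labels and the toy sequences are hypothetical and are not taken from the thesis data. Taxa missing from a gene (as happens with IFF8 outside the CUG group) are padded with gaps here, which is one common convention and not necessarily the procedure followed in this work.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix keyed by taxon.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    A taxon absent from a gene receives a run of gaps of that gene's width.
    """
    # Collect every taxon that appears in at least one gene alignment.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        # All sequences inside one alignment must have the same length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, "-" * width)
    return supermatrix


# Toy data only (hypothetical sequences, not real RLM1 or IFF8 alignments).
gene_a = {"C_albicans": "MGR-K", "S_cerevisiae": "MGRRK"}
gene_b = {"C_albicans": "AT-"}  # gene found in one taxon only
combined = concatenate_alignments([gene_a, gene_b])
```

In this toy case the taxon lacking the second gene ends up with three trailing gap characters, so both rows of the combined matrix have the same length.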

It is known that, within the subphylum Saccharomycotina, genome duplication occurred in the sensu stricto and sensu lato groups of the 'Saccharomyces complex', resulting in the duplication of some genes. The initial search for orthologous sequences showed that the transcription factors studied in this work are among the duplicated genes that were retained. Thus, another objective of this work was to determine whether RLM1 was under positive selection, in search of an alternative explanation for the persistence and diversification of gene duplicates. These analyses identified several amino acid sites under positive selection within the subphylum Saccharomycotina. Although these substitutions were present at different positions, none lay inside conserved regions, suggesting that the protein plays an important role in fungi that was maintained during species divergence. The presence of amino acids under positive selection at the beginning of the Rlm1 C-terminal repetitive region, together with the difference in the repeated amino acid between species with a duplicated genome (WGD) and species with a non-duplicated genome, was indicative of a possible frameshift mutation. This hypothesis was tested in the Saccharomycotina by designing the three open reading frames and reconstructing the ancestral sequence. The results showed that the amino acid substitution that occurred during species divergence changed this repetitive region and diversified gene function in WGD species, avoiding loss of protein function.
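The idea of reading a repetitive region in three frames can be made concrete with a small sketch. It is an illustration under stated assumptions, not the tool used in the thesis: the helper names `translate` and `three_frames` are hypothetical, and the input is a toy CAA repeat, not the real RLM1 sequence. The standard genetic code is built from the usual compact 64-letter string; the optional `cug_as_serine` flag reflects the CUG-clade reassignment of the CTG codon from leucine to serine.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0, cug_as_serine=False):
    """Translate `dna` starting at offset `frame` (0, 1 or 2)."""
    code = dict(STANDARD_CODE)
    if cug_as_serine:
        code["CTG"] = "S"  # CUG-clade yeasts read this codon as serine
    dna = dna.upper()
    protein = []
    # Step through complete codons only; unknown codons become "X".
    for i in range(frame, len(dna) - 2, 3):
        protein.append(code.get(dna[i:i + 3], "X"))
    return "".join(protein)


def three_frames(dna, **kwargs):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame, **kwargs) for frame in range(3)]


# Toy repeat (hypothetical): CAA encodes Q; shifted by one base it reads AAC (N).
toy_repeat = "CAA" * 4
frames = three_frames(toy_repeat)
```

For this toy repeat, frame 0 reads glutamine codons while frame 1 reads asparagine codons, which is the kind of Q-to-N change a single-base frameshift inside such a repeat would produce.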

The major findings of this work were (i) the identification of the potential of the RLM1 gene for inferring phylogeny in the kingdom Fungi, with special emphasis on Candida species, and (ii) the observation that this gene evolved within Saccharomyces sensu stricto after genome duplication, the molecular mechanism responsible for the change observed in the C-terminus of this protein most probably being a frameshift mutation.


RESUMEN

Coding genomic regions are used in phylogenetic studies through analyses at the multigene and phylogenomic levels. However, most of these analyses use only regions with similarity among orthologous gene sequences and do not take into account other regions, which could provide useful information. Therefore, the main objective of this thesis was to evaluate the use of the MADS-box genes RLM1 and MCM1, as well as IFF8, a gene encoding a GPI-anchor protein, in fungal phylogeny.

Seventy-six putative orthologous sequences for RLM1 and MCM1 and eight for IFF8 were obtained from different fungal databases. Sequence alignments were carried out with the programs CLUSTALW and MUSCLE, and phylogeny was inferred with the programs PHYML 3.0 and MrBayes 3.2. The results of the phylogenetic analyses using the transcription factor RLM1 indicated that it is suitable for consideration within multigene analyses, since the phylogeny obtained for fungi is very close to those established by other multigene and phylogenomic analyses. The transcription factor MCM1 shows limitations for use in phylogeny at the kingdom level because of its variable sequence lengths, which give rise to alignment problems. Despite this limitation, this gene can be used to resolve phylogenies at the phylum level or in lower clades, since the results obtained with it within the subphylum Saccharomycotina, independently and/or combined with the other genes used in this study, agreed with published studies. On the other hand, the use of the IFF8 gene to infer phylogeny is limited and restricted to the CUG group, since no other orthologues have been found in the kingdom Fungi and the results obtained in the CUG group phylogeny showed conflicts, particularly in the position of C. tropicalis, which does not agree with what has already been determined in the phylogeny of Candida spp.

It is currently accepted that, within the subphylum Saccharomycotina, genome duplication occurred in the sensu stricto and sensu lato groups of the 'Saccharomyces complex', resulting in the retention of some genes. The initial search for orthologous sequences showed that the transcription factors studied in this work are among the duplicated genes that have been retained. Another objective of this work was therefore to determine whether RLM1 was under positive selection. These analyses identified several sites where amino acids were possibly under positive selection within the subphylum Saccharomycotina, and although these substitutions were present at different positions, they were not present in the conserved regions, which suggests that the protein performs an important function in the cell that was maintained during species divergence. The presence of amino acids under positive selection at the start of the C-terminal repetitive region of Rlm1, and the difference in the repeated amino acid between species with a duplicated genome and species with a non-duplicated genome, indicated a possible frameshift mutation. This hypothesis was therefore tested in the subphylum Saccharomycotina by designing three open reading frames and reconstructing the ancestral sequence. The results of this analysis showed that the amino acid substitution that occurred during species divergence and altered this region arose through a frameshift mutation, diversifying the function of this gene in the species that duplicated their genome.

The main results of this thesis are: (i) the identification of the potential of the RLM1 gene for phylogeny studies in the kingdom Fungi, with special emphasis on the phylogeny of Candida species, and (ii) the identification of the molecular mechanism responsible for the change observed in the C-terminus of this protein in the Saccharomyces sensu stricto group, which altered the gene during species divergence after genome duplication.


INDEX

DEDICATION
ACKNOWLEDGEMENTS
RESUMO
ABSTRACT
RESUMEN
ABBREVIATION LIST
FIGURE LIST
TABLE LIST

1. INTRODUCTION
1.1. Classification of organisms
1.2. Tree of life
1.3. Kingdom Fungi
1.4. Fungal taxonomy
1.5. Fungal phylogeny
1.6. Fungal identification
1.6.1. Phenotypic methods
1.6.2. Genotypic methods
1.7. Methods for phylogenetic inference
1.7.1. Multiple sequence alignment
1.7.1.1. CLUSTAL series of programs
1.7.1.2. MUSCLE
1.7.2. Major methods for estimating phylogenetic trees and inferring phylogeny
1.7.2.1. Distance methods
1.7.2.2. Character-based methods
1.8. Adaptive evolution

2. THESIS BACKGROUND AND OBJECTIVES

3. MATERIAL AND METHODS
3.1. Phylogenetic analysis
3.1.1. Data retrieval
3.1.2. Alignment of sequences
3.1.3. Selection of the best evolutionary model
3.1.4. Phylogenetic reconstruction
3.1.5. Interpretation of values considered for nodal support
3.2. Analysis of adaptive evolution
3.3. Analysis of the repetitive region in the subphylum Saccharomycotina

4. RESULTS AND DISCUSSION
4.1. Compilation of data
4.2. Phylogeny based on MADS-box transcription factors
4.2.1. Rlm1 transcription factor
4.2.2. Mcm1 transcription factor
4.2.3. Placement of fungal phylogeny based on MADS-box transcription factor within the current literature
4.3. Relationships within the subphylum Saccharomycotina
4.3.1. Phylogeny of the 'Saccharomyces complex'
4.3.2. Phylogenetic relationships amongst Candida species
4.4. Analysis of adaptive evolution
4.5. Event of frameshift mutation in S. cerevisiae RLM1 gene

5. FINAL REMARKS

6. REFERENCES

7. ANNEXES


ABBREVIATION LIST

AIC Akaike Information Criterion

AICc Akaike Information Criterion corrected

ACT1 Actin 1

AFLP Amplified Fragment Length Polymorphism

aLRT Approximate Likelihood Ratio Test

AP-PCR Arbitrarily Primed - PCR

ARG80 Arginine requirement 80

BEB Bayesian Empirical Bayes

BI Bayesian inference

COX2 Cytochrome oxidase 2 gene

CUG Organisms that translate the CUG codon as serine instead of leucine

CytB Cytochrome b gene

DHFR Dihydrofolate Reductase gene

DNA Deoxyribonucleic acid

dN Number of non-synonymous changes per non-synonymous site

dS Number of synonymous changes per synonymous site

EF-1α Elongation factor 1 alpha

GPI Glycosylphosphatidylinositol

HYPHY Hypothesis Testing Using Phylogenies

IFF8 Individual proteins file family F 8

ITS Internal Transcribed Spacer

LPS Lipopolysaccharide

LRTs Likelihood Ratio Tests

MADS MCM1-AGAMOUS-DEFICIENS-SRF

MAP1 Schizosaccharomyces pombe MCM1 homologue

MCM1 MiniChromosome Maintenance

MCMCMC Metropolis-coupled Markov Chain Monte Carlo

MEC Mechanistic-Empirical Combination model

MEF2 Myocyte Enhancer Factor 2

MEGA Molecular Evolutionary Genetics Analysis

ML Maximum likelihood

MMR Mismatch repair system

MP Maximum parsimony

mtDNA Mitochondrial deoxyribonucleic acid

mtLSUrDNA Mitochondrial large subunit ribosomal DNA gene

mtSSUrDNA Mitochondrial small subunit ribosomal DNA gene

MUSCLE Multiple Sequence Comparison by log-Expectation

NJ Neighbor joining

NonWGD Non Whole Genome Duplication

nucLSUrDNA Nuclear large subunit ribosomal DNA gene

nucSSUrDNA Nuclear small subunit ribosomal DNA gene

OM Outer membrane

Omp85 Outer membrane protein family 85

ORF Open Reading Frame

PAML Phylogenetic Analysis by Maximum Likelihood

PAUP Phylogenetic Analysis Using Parsimony

PCR Polymerase chain reaction

PHYML Phylogeny by Maximum Likelihood

PP Posterior Probability

RAPD Randomly Amplified Polymorphic DNA

rDNA ribosomal DNA gene

rRNA ribosomal RNA

RFLP Restriction Fragment Length Polymorphism

RLM1 Resistance to Lethality of MKK1P386 overexpression gene

RPB1 RNA polymerase II largest subunit

RPB2 RNA polymerase II second largest subunit

SAM SRF-Arg80-Mcm1

S-H Shimodaira-Hasegawa support

SMP1 Second MEF2-like Protein

SRF Serum Response Factor

TS Thymidylate Synthase gene

Ty elements Transposable elements of yeast

UMC1 Ustilago MCM1 homologue

UPGMA Unweighted Pair Group Method with Arithmetic Mean

WGD Whole Genome Duplication


FIGURE LIST

Figure 1. Reconstruction of the Tree of Life based on transition analysis(…) ................ 7

Figure 2. Reconstruction of the eukaryote evolutionary tree(…) .................................. 10

Figure 3. Diversity of fungi(…) ..................................................................................... 14

Figure 4. Phylogeny of the kingdom Fungi(…) ............................................................. 19

Figure 5. Schematic representation of Type I (SRF-like) and Type II (MEF2-like)

protein domains in plant, animal and fungi(…) .............................................................. 38

Figure 6. Structure of Type I (SRF) and Type II (MEF2A) MADS domains in complex

with their DNA target (Homo sapiens)(…) .................................................................... 39

Figure 7. GPI anchor structure(…) ................................................................................ 40

Figure 8. Fungal phylogeny based on Rlm1 protein sequences(…) .............................. 56

Figure 9. Fungal phylogeny based on RLM1 DNA sequences(…) ............................... 58

Figure 10. Fungal phylogeny based on Mcm1 protein sequences(…) .......................... 62

Figure 11. Phylogeny of the subphylum Saccharomycotina by using the RLM1(…) ... 70

Figure 12. Phylogeny of the subphylum Saccharomycotina by using the MCM1(…) . 71

Figure 13. Phylogeny of the subphylum Saccharomycotina inferred by using

concatenated alignment of RLM1-MCM1(…) ............................................................... 72

Figure 14. Phylogenetic network reconstructed using a concatenated alignment of

RLM1-MCM1 in the subphylum Saccharomycotina(…) ............................................... 73

Figure 15. Phylogeny of CUG Group based on IFF8(…) ............................................. 75

Figure 16. Subphylum Saccharomycotina phylogeny reconstructed using concatenated

alignment of RLM1-MCM1-IFF8(…) ............................................................................ 75

Figure 17. Subphylum Saccharomycotina phylogenetic network reconstructed using a

concatenated alignment of RLM1-MCM1-IFF8(…) ...................................................... 76

Page 23: Julio César Chávez Galarza...Manifesto de maneira muito especial o meu agradecimiento à Prof. Doutora Ana Paula Sampaio, orientadora deste trabalho, por todo apoio, dedicação,

xxii

Figure 18. Conserved regions in RLM1 gene of the subphylum Saccharomycotina:

MADS-box, Acidic region, A-B conserved regions present in this subphylum(…) ...... 78

Figure 19. Region of microsatellites present in some species from subphylum

Saccharomycotina(…) ..................................................................................................... 79

Figure 20. Saccharomyces cerevisiae amino acid sequence showing sites under positive

selection by the MEC model(…) .................................................................................... 81

Figure 21. The three RLM1 open reading frames of Candida albicans, Kluyveromyces

lactis and Saccharomyces cerevisiae visualized with the Virtual Ribosome 1.0(…) ..... 83

Figure 22. (A) Evolution of MADS-box genes based on Alvarez-Buylla et al. (2000)

model, proposing a common origin for Animals, Fungi and Plants and a further

duplication event in the Fungi lineage in some species within the subphylum

Saccharomycotina (WGD group). (B) Comparison of Rlm1 protein/gene domains in S.

cerevisiae and C. albicans, showing the frameshift mutation that occurred after genome

duplication in S. cerevisiae and WGD species and changed the coding amino acid from

Q to N .............................................................................................................................. 85

Figure 23. Predicted putative ancestral RLM1 gene sequence from which CUG, WGD

and Non-WGD groups diverged(…) ............................................................................... 87

TABLE LIST

Table 1. Evolution in the classification of living organisms throughout time. ................ 4

Table 2. The six kingdoms of life .................................................................................... 5

Table 3. Phyla of the kingdom Bacteria ........................................................................... 8

INTRODUCTION

1. INTRODUCTION

1.1 Classification of organisms

The classification of living organisms has always been a controversial subject. Attempts to classify living organisms have been made since the dawn of recorded history; however, early classification systems were artificial and based primarily on habit and/or characteristics important to humans (e.g., medicines, food).

The earliest known system of classification, from which current systems of

classifying forms of life descend, comes from the thought presented by the Greek

philosopher and scientist Aristotle (384-322 BC), who published in his metaphysical and

logical works the first known classification of everything whatsoever, or "being". This

was the scheme that gave moderns such words as substance, species and genus and was

retained in a modified and less general form by Linnaeus. Aristotle's system classified

all living organisms known at that time as either a plant or an animal on the basis of

movement (air, land, or water), feeding mechanism, growth patterns, mode of

reproduction and possession or lack of red blood cells. Microscopic organisms were

unknown.

In 1735, the Swedish naturalist Carolus Linnaeus formalized the use of two Latin names to identify each organism, a system called binomial nomenclature, in his great book Systema Naturae, which ran through twelve editions during his lifetime. In this work, nature was divided into three kingdoms: mineral, vegetable and animal (Table 1). Linnaeus grouped closely related organisms and introduced the modern classification groups: class, order, family, genus, and species.

The German biologist Ernst Haeckel (1866) proposed a third kingdom, Protista, to

include all single-celled organisms. Some taxonomists also placed simple multicellular

organisms, such as seaweeds, in the kingdom Protista. Bacteria were also placed within

Protista as a separate group called Monera.

In 1938, the American biologist Herbert Copeland proposed the group Monera as a fourth kingdom, which included only bacteria, separating it from the kingdom Protista. This was the first classification proposal to separate organisms without nuclei, the prokaryotes, from organisms with nuclei, the eukaryotes, at the kingdom level.

In 1969, the American biologist Robert Whittaker proposed a fifth kingdom, Fungi, based on the unique structures of fungi and their methods of obtaining food. Fungi do not ingest food as animals do, nor do they make their own food as plants do; rather, they secrete digestive enzymes around their food and then absorb it into their cells.

Woese and Fox (1977) divided the kingdom Monera into two new kingdoms: Eubacteria and Archaebacteria. Then, Woese et al. (1990) proposed a new category, called Domain, to reflect evidence from nucleic acid studies that reveals evolutionary or family relationships more precisely. They suggested three domains, Archaea, Bacteria, and Eukarya, based largely on studies of the small subunit rRNA.

More recently, the classification system was modified by the zoologist Thomas Cavalier-Smith (1981), who, supported by molecular and cellular studies, proposed a sixth kingdom of life: the Chromista. In 2004, Chromista was accepted as a kingdom and the category Domain was replaced by Empire, of which two were proposed: Prokaryota and Eukaryota (Table 2).

Table 1. Evolution in the classification of living organisms throughout time.

Linnaeus 1735 (2 kingdoms): Vegetabilia, Animalia

Haeckel 1866 (3 kingdoms): Protista, Plantae, Animalia

Copeland 1938 (4 kingdoms): Monera, Protista, Plantae, Animalia

Whittaker 1969 (5 kingdoms): Monera, Protista, Fungi, Plantae, Animalia

Woese and Fox 1977 (6 kingdoms): Eubacteria, Archaebacteria, Protista, Fungi, Plantae, Animalia

Woese et al. 1990 (3 domains): Bacteria, Archaea, Eukarya

Cavalier-Smith 2004 (2 empires, 6 kingdoms): Prokaryota (Bacteria); Eukaryota (Protozoa, Chromista, Fungi, Plantae, Animalia)

Table 2. The six kingdoms of life (Cavalier-Smith, 2004)

1.2 Tree of Life

New insights from molecular sequence studies, integrated with numerous other lines of genetic, structural and biochemical evidence, have allowed a more accurate classification of organisms. On this basis, researchers are reconstructing the tree of life and trying to place its root correctly, which would enable us to deduce rigorously the major characteristics of the last common ancestor of all life. Rooting the tree is probably the most difficult problem in phylogenetics. It is not yet solved, and it is the most important to solve correctly because the result colours all interpretations of evolutionary history, influencing ideas of which features are primitive or derived and which branches are deeper and more ancient than others (Cavalier-Smith, 2002a; Gribaldo and Philippe, 2002).

For a long time, researchers proposed that the root of the tree of life lay among ancestral species of Archaebacteria, which gave rise to the other groups and ultimately to higher multicellular organisms. Other attempts to root the tree of life used protein paralogue trees, which can in theory locate the root, but the results were contradictory because of tree-reconstruction artefacts or poor resolution (Cavalier-Smith, 2002a). Studies based on ribosome-related and DNA-handling enzymes suggested a root between Neomura (eukaryotes plus archaebacteria) and Eubacteria (Iwabe et al., 1989), whereas metabolic enzymes place the root within Eubacteria, but at contradictory positions (Peretó et al., 2004). Palaeontology shows that eubacteria are much more ancient than eukaryotes (Cavalier-Smith, 1987; Douzery et al., 2004; Roger and Hug, 2006), and phylogenetic evidence indicates that archaebacteria are sisters of, not ancestral to, eukaryotes, implying that the root is not within the Neomura.

The first fully resolved prokaryote tree was inferred by using 13 major biological transitions within eubacteria (Figure 1 and Table 3). This tree showed a basal stem comprising the new infrakingdom Glidobacteria (Chlorobacteria, Hadobacteria, Cyanobacteria), which is entirely non-flagellate, and two derived branches, the infrakingdom Gracilicutes and the subkingdom Unibacteria, that diverged immediately following the origin of flagella (Cavalier-Smith, 2006a).

The evolution of the proteasome, a protein complex responsible for degrading unneeded or damaged proteins, indicated that the universal root lies outside the clade comprising Neomura and Actinomycetales (proteates), and thus within other eubacteria, contrary to the widespread assumption that it lies between Eubacteria and Neomura. Cell wall and flagellar evolution independently located the root outside Posibacteria (Actinobacteria and Endobacteria), and thus among Negibacteria, which have two membranes, favouring a transition from Negibacteria to Posibacteria and not from Posibacteria to Negibacteria as traditionally assumed. Posibacteria were derived from Eurybacteria and are ancestral to Neomura.

Studies based on the comparison of new amino acid insertions in the RNA polymerase gene strongly favour the monophyly of Gracilicutes, which contain Proteobacteria, Planctobacteria, Sphingobacteria and Spirochaetes (Iyer et al., 2004). Evolution of the negibacterial outer membrane places the root within Eobacteria (Hadobacteria and Chlorobacteria, both primitively without lipopolysaccharide): as all phyla possessing the outer membrane β-barrel protein Omp85 are very probably derived, the root lies either between them and Chlorobacteria, the only Negibacteria without Omp85, or within Chlorobacteria (Cavalier-Smith, 2006a).

A negibacterial root also fits the fossil record, which shows that Negibacteria are

more than twice as old as Eukaryotes (Cavalier-Smith, 2002a; Cavalier-Smith, 2006b).

Posibacteria, Archaebacteria and eukaryotes were probably all ancestrally heterotrophic,

whereas Negibacteria are likely to have been ancestrally photosynthetic and diversified

by evolving all the known types of photosystems and major antenna pigments.

Figure 1. Reconstruction of the Tree of Life based on biological transitions analysis. Thumbnail sketches show major variants in cell morphology (microtubular skeleton red, peptidoglycan wall brown, outer membrane blue). The most likely root position is as shown; the possibility that it may lie within Chlorobacteria instead cannot yet be ruled out. Lowest level groups including or consisting entirely of photosynthetic organisms are in green or purple. The frequently misplaced hyperthermophilic eubacteria are in red (Adapted from Cavalier-Smith, 2006a).

Table 3. Phyla of the kingdom Bacteria (Cavalier-Smith, 2006a). *Classification proposed by author.

The last common ancestor of all life forms would have been a eubacterium with acyl-ester membrane lipids, a large genome, murein peptidoglycan walls, and fully developed eubacterial biology and cell division. It would have been a non-flagellate negibacterium with two membranes, probably a photosynthetic green non-sulphur bacterium with relatively primitive secretory machinery, not a heterotrophic posibacterium with one membrane. As Negibacteria are the only prokaryotes that use sunlight to fix carbon dioxide, this is also the only root position that would have allowed the first ecosystems to be based on photosynthesis, without which extensive evolution might have been impossible (Cavalier-Smith, 2002a; 2006a).

The neomuran ancestor would have arisen from a member of the Arabobacteria (Actinobacteria). This ancestor would have been a thermophilic bacterium with cotranslational protein secretion, N-linked glycosylation, glycoprotein instead of murein, and H3/H4 histones instead of DNA gyrase. It diverged sharply into two contrasting lineages. One formed a glycoprotein wall and became hyperthermophilic, evolving prenyl ether lipids and losing many eubacterial genes, such as those for H1 histones, to form the Archaebacteria. The other changed much more radically: it used its glycoproteins as a flexible surface coat; evolved phagotrophy, an endomembrane system, an endoskeleton and a nucleus; became a uniciliate, unicentriolar, aerobic zooflagellate; and enslaved an α-proteobacterium as a protomitochondrion to become the first eukaryote (Cavalier-Smith, 2000; 2002b).

Eukaryotes comprise the basal kingdom Protozoa and four derived kingdoms: Animalia, Fungi, Plantae, and Chromista (Cavalier-Smith, 1998). Some thoughtful 19th-century protozoologists suspected that flagellates preceded amoebae. However, until DNA sequencing and molecular systematics burgeoned in the 1980s and 1990s, Haeckel's often more popular idea that amoebae came first and flagellates were more advanced could not be disproved. It is now known that the primary diversification of eukaryote cells took place among zooflagellates, non-photosynthetic predatory cells with one or more flagella used for swimming and often also for generating water currents to draw in prey (Cavalier-Smith, 2006). From these, two groups diverged, the bikonts and the unikonts, which together encompass all eukaryotic kingdoms (Figure 2).

Plantae, Chromista, three protozoan infrakingdoms (Alveolata, Rhizaria, Excavata), and the small zooflagellate phylum Apusozoa are all clearly ancestrally biciliate and together constitute a clade designated the bikonts (Cavalier-Smith, 2002b). Ancestral bikonts had two diverging cilia: the anterior one undulated for swimming or prey attraction, whereas the posterior one was used for gliding on surfaces in many groups (probably its ancestral condition) but was secondarily adapted for swimming in other groups, or quite often lost to produce secondary uniciliates (not to be confused with genuine unikonts, which, confusingly, are sometimes secondarily biciliate). All major eukaryote groups except Ciliophora include severely modified organisms that lost their cilia; some became highly amoeboid. The major shared derived character for all bikont groups (not yet clearly demonstrated for Rhizaria) is ciliary transformation, in which the anterior cilium/centriole and its associated roots are always the first formed during cell division by bipartition; in the next cell cycle they often undergo marked changes in structure and function to become the corresponding posterior organelles (Moestrup, 2000).

Figure 2. Reconstruction of the eukaryote evolutionary tree. Taxa in black comprise the basal kingdom

Protozoa. The thumbnail sketches show the contrasting microtubule skeletons of unikonts and

bikonts in red. The dates highlighted in yellow are for the most ancient fossils known for each

major group. Major innovations that help group higher taxa are shown by bars. Four major cell

enslavements to form cellular chimaeras are shown by heavy arrows (enslaved bacteria) or

dashed arrows (enslaved eukaryote algae) (Cavalier-Smith, 2006c).

An independent derived character for bikonts is the fusion of the thymidylate synthase (TS) and dihydrofolate reductase (DHFR) genes to encode a single bifunctional chimaeric protein. This fusion appears to have taken place in the common ancestor of the bikonts, after the inversion of these genes in the cenancestral eukaryote (Stechmann and Cavalier-Smith, 2002; 2003).

Based on these facts, a new branching order for the bikonts was proposed, in which the zooflagellate phylum Apusozoa may be sister to the Excavata. In another group, the chromalveolate clade comprises the kingdom Chromista and the protozoan infrakingdom Alveolata, which diverged from a common ancestor that enslaved a red alga and evolved novel plastid protein-targeting machinery via the host rough endoplasmic reticulum (ER) and the enslaved algal plasma membrane (periplastid membrane). The kingdom Plantae, whose ancestor enslaved a cyanobacterium to form chloroplasts, may be sister or ancestral to chromalveolates. Rhizaria and Excavata (jointly Cabozoa) are probably sisters if the formerly green algal plastid of euglenoids and chlorarachneans (Cercozoa) was enslaved in a single secondary symbiogenesis event in their common ancestor (Cavalier-Smith, 2002c; 2003).

The TS and DHFR genes are separately translated in Sarcomastigota, Animalia and Fungi. These three taxa are referred to as unikonts because their common ancestor is argued to have had only a single centriole and cilium per kinetid (Cavalier-Smith, 2002b). This unikont state is found in most amoebozoan lineages. In addition, unikonts share the same myosin II as is found in our muscles, which is absent from all bikonts (Cavalier-Smith, 2006c). The unikont condition is argued to be the ancestral state for Amoebozoa (Cavalier-Smith, 2002b; Cavalier-Smith et al., 2004), as only myxogastrids and the former protostelids arguably related to them have bicentriolar kinetids. The bicentriolar state of myxogastrids and many Choanozoa is developmentally different from that of bikonts, as their anterior cilium remains anterior in successive cell cycles and does not transform into a posterior one. Thus, it is arguably not homologous to the bikont state with true ciliary transformation. A second gene fusion, involving the first three enzymes of pyrimidine biosynthesis (carbamoyl phosphate synthase, dihydroorotase and aspartate carbamoyltransferase), is apparently a shared derived character for unikonts, absent from bikonts and the ancestral prokaryotes (Stechmann and Cavalier-Smith, 2003). As this involved two simultaneous fusion events, it is even less likely ever to have been reversed than the DHFR/TS fusion. If none of these gene fusions has ever been reversed during evolution, then together they indicate that the root of the eukaryote tree cannot lie within unikonts or bikonts, but must lie between these two clades (Stechmann and Cavalier-Smith, 2003).

Earlier structural and molecular evidence supports the idea that Animalia, Choanozoa and Fungi together form a clade, characterized ancestrally by a single posterior cilium with a bicentriolar kinetid and flat mitochondrial cristae (Cavalier-Smith, 1987b). An exclusive grouping of animals and fungi was first noted in evolutionary trees based on small subunit ribosomal RNA (Wainright et al., 1993) and on proteins such as EF-1α, Act1 and α- and β-tubulin (Baldauf and Palmer, 1993). This grouping was subsequently designated "Opisthokont" (Cavalier-Smith and Chao, 1995) and is now strongly supported by phylogenetic analyses of all taxonomically well-represented and well-characterized molecular data sets. These include concatenated multigene analyses (Baldauf et al., 2000; Bapteste et al., 2001; Lang et al., 2002) as well as many single-gene trees (Baldauf, 1999; Inagaki and Doolittle, 2000; Van de Peer et al., 2000). Recent molecular phylogenetic evidence indicates that animals and fungi evolved independently from different unicellular choanozoan ancestors (Steenkamp et al., 2006; Shalchian-Tabrizi et al., 2008).

The other clade that forms part of the opisthokonts is the phylum Choanozoa. Extant Choanozoa are either a clade that is sister to Animalia or a paraphyletic group ancestral to them (Cavalier-Smith and Chao, 2003; Rokas et al., 2003), and within the Choanozoa the nucleariids appear to be the closest sister taxon to Fungi (Steenkamp et al., 2006). A sister relationship between animals and choanoflagellates, at least, is supported by the probable gene fusion that generated the receptor tyrosine kinase from a cytoplasmic kinase and a calcium-binding epidermal growth factor in their common ancestor after it diverged from Amoebozoa (King and Carroll, 2001). The opisthokont clade is also supported by a shared derived 11–17 amino acid insertion in the protein synthesis elongation factor EF-1α, which is absent from prokaryotes and all other eukaryotes (Baldauf, 1999).

1.3 Kingdom Fungi

Fungi share a long history with human civilization. References in Greek literature, mushroom stones from Mesoamerica dating back to 1000-300 B.C. (Lowy, 1971), and dried mushrooms of Piptoporus betulinus found in a pouch around a Stone Age man's neck in the Alps (Rensberger, 1992) all attest to this long relationship. Fungi make up one of the major clades of life. Roughly 80,000 species of fungi have been described, but the actual number of species has been estimated at approximately 1.5 million (Hawksworth, 1991; 2001), and even this number may underestimate the true magnitude of fungal diversity (Persoh and Rambold, 2003).

For a long time, fungi were regarded as a single kingdom within the aforementioned five-kingdom scheme. However, the organisms usually considered to be fungi are very complex and diverse, and their recognition, delimitation and typification have been formidable problems since their discovery. They may or may not contain chitin, include multicellular filamentous absorptive forms and unicellular assimilative forms, among others, and can reproduce by different types of propagules or even by fission (Alexopoulos et al., 1996; Guarro et al., 1999) (Figure 3). Despite this great variability in form and function, systematic studies based on morphology and life cycle had produced, by the middle of the 20th century, a manageable number of major phyla, and molecular approaches have now established seven well-defined phyla within the kingdom Fungi (Hibbett et al., 2007).

Fungi play a critical role in virtually all ecosystems, affecting nearly all other forms of life as either friend or foe. Saprotrophic fungi are important for nutrient cycling through the decomposition of organic material, especially the carbon sequestered in wood and other plant tissues. Mutualistic symbiotic fungi, through relationships with prokaryotes, plants (including algae) and animals, have enabled a diversity of other organisms to exploit novel habitats and resources. Indeed, the establishment of mycorrhizal associations may have been a key factor enabling plants to make the transition from aquatic to terrestrial habitats. Another group comprises pathogenic and parasitic fungi that attack virtually all groups of organisms, including bacteria, plants, other fungi, and animals, including humans. The economic impact of fungi is massive, whether beneficial, through the production of antibiotics, or extremely detrimental, through plant diseases, mycotoxins and mycoses (Pirozynski and Malloch, 1975; Moss, 1987; Alexopoulos et al., 1996; Blackwell, 2000).

Figure 3. Diversity of fungi. Microsporidia (Nosema apis (A), Octosporea bayeri (B)); Blastocladiomycota (Flammulina velutipes (C)); Neocallimastigomycota (Caecomyces communis (D)); Chytridiomycota (Batrachochytrium dendrobatidis (E)); 'Zygomycota'-Mucoromycotina (Phycomyces blakesleeanus (H), Mucor circinelloides (G)); Glomeromycota (Glomus coronatum (J)); Basidiomycota (Agaricus xanthodermus (F), Cryptococcus neoformans (I), Ustilago maydis (K)); Ascomycota (Microsporum gypseum (L), Aspergillus niger (M), Candida albicans (N), Blastomyces dermatitidis (O)).

1.4 Fungal taxonomy

Historically, the monophyletic Fungi have been classified into four phyla: Chytridiomycota, Zygomycota, Basidiomycota, and Ascomycota (Barr, 1992; Hawksworth et al., 1995; Alexopoulos et al., 1996). However, recent studies have shown that such a simple classification does not reflect the phylogeny of these organisms, and the phyla currently recognized are Microsporidia, Blastocladiomycota, Neocallimastigomycota, Chytridiomycota, Glomeromycota, Basidiomycota and Ascomycota. At present, the phylum Zygomycota remains undefined, pending resolution of the relationships among the clades that have traditionally been placed within it (Hibbett et al., 2007).

The current fungal classification accepts one kingdom, one subkingdom, seven

phyla, ten subphyla, 35 classes, 12 subclasses, and 129 orders. The clade containing the

Ascomycota and Basidiomycota is classified as the subkingdom Dikarya (James et al.,

2006), reflecting the putative synapomorphy of dikaryotic hyphae (Tehler, 1988). All of

the other new names are based on automatically typified teleomorphic names. In

Basidiomycota, the clades formerly called Basidiomycetes, Urediniomycetes, and

Ustilaginomycetes in the last edition of Ainsworth & Bisby's Dictionary of the Fungi

are now called the Agaricomycotina, Pucciniomycotina, and Ustilaginomycotina,

respectively (Bauer et al., 2006). This was done to minimize confusion between taxon

names and informal terms (basidiomycetes is a commonly used informal term for all

Basidiomycota) and to refer to the included genera Agaricus (including the cultivated

button mushroom) and Puccinia (which includes barberry-wheat rust). Another

significant change in the Basidiomycota classification has been the inclusion of the

Wallemiomycetes and Entorrhizomycetes as classes incertae sedis within the phylum,

reflecting ambiguity about their higher-level placements (Matheny et al., 2007a).

The most dramatic changes in the classification concern the 'basal fungal

lineages', which include the taxa that have traditionally been placed in the Zygomycota

and Chytridiomycota. These groups have long been recognized to be polyphyletic,

based on analyses of rRNA, EF-1α, and RPB1 (James et al., 2000; Nagahama et al.,

1995; Tanabe et al., 2004, 2005). Recent multilocus analyses now provide the sampling,

resolution, and support necessary to structure new classifications of these early-diverging

groups, although significant questions remain (Liu et al., 2006; Lutzoni et al., 2004;

James et al., 2006). The Chytridiomycota is retained in a highly restricted

sense, including Chytridiomycetes and Monoblepharidomycetes. The Blastocladiales, a

traditional member of the Chytridiomycota, is now treated as a phylum, the

Blastocladiomycota (James et al., 2007). The Neocallimastigales, whose distinctiveness

from other chytrids has long been recognized, is also elevated to phylum, based on both

morphology and molecular phylogeny (Hibbett et al., 2007). The genera

Caulochytrium, Olpidium, and Rozella, which have traditionally been placed in the

Chytridiomycota, and Basidiobolus, previously classified in the Zygomycota

(Entomophthorales), are not included in any higher taxa in the current classification,

pending more definitive resolutions of their placements. The phylum Zygomycota has

not been accepted in the current classification, pending resolution of the

relationships among the groups that were traditionally part of this phylum. The

traditional Zygomycota were distributed among the phylum Glomeromycota and four

subphyla incertae sedis, including Mucoromycotina, Kickxellomycotina,

Zoopagomycotina and Entomophthoromycotina (Hibbett et al., 2007). Microsporidia,

unicellular parasites of animals and protists with highly reduced mitochondria, has been

included as a phylum of the Fungi, based on multigenic analyses (Germot et al., 1997;

Hirt et al., 1997; Peyretaillade et al., 1998; Keeling et al., 2000; Gill and Fast, 2006;

James et al., 2006; Liu et al., 2006).

1.5 Fungal phylogeny

Until recently, fungal phylogenetics was well served by phenotypic characters such as morphology, cell wall composition, cytological tests, ultrastructure, cellular metabolism and the fossil record (LéJohn, 1974; Taylor, 1978; Heath, 1986; Hawksworth et al., 1995). However, higher-level relationships among fungal groups are less certain and are best elucidated using molecular techniques. The advent of cladistic and molecular approaches has changed this situation and provided new insights into fungal evolution. Genotypic analysis offers a straightforward means of resolving evolutionary questions where the phenotypic characters mentioned above are absent or contradictory.

Many loci have been used by mycologists for evolutionary studies at single-gene

level, but few of these loci revealed to be appropriated to resolve relationships among

the main lineages of the Fungi. Even when trees are inferred by using multiple loci, the

phylogenetic signal may be strongly limited by the loci selected. Phylogenetic studies

indicate that more than 83.9% of fungal phylogenies are based exclusively on sequences

from the ribosomal RNA tandem repeats. The few protein-coding genes that have been

sequenced for phylogenetic studies of fungi have demonstrated clearly that such genes

can contribute greatly to resolving deep phylogenetic relationships with high support

and/or increase support for topologies inferred by using ribosomal DNA genes (Liu et

al., 1999). Other protein-coding genes, such as DNA topoisomerase II, actin-1 (ACT1),

the largest subunit of RNA polymerase II (RPB1), cytochrome oxidase 2 (COX2) and

cytochrome b (Cyt B), combined with the genes traditionally used in phylogeny, are

already being used to infer fungal phylogeny (Kurtzman and Robnett,

2003; Diezmann et al., 2004; Lutzoni et al., 2004; Reeb et al., 2004; Wang et al., 2004;

Liu et al., 2006; Matheny et al., 2007b).

By 2004, sequence databases held 21075 ITS, 7990 nucSSU, 5373

nucLSU, 1991 mitSSU, and 349 RPB2 sequences, and the number is

continuously increasing (Lutzoni et al., 2004). The collective effort to generate DNA

sequence data for the Fungi is as impressive as these numbers, but none of these loci

alone can resolve the fungal tree of life with a satisfactory level of phylogenetic

confidence (Kurtzman and Robnett, 1998; Tehler et al., 2000; Berbee, 2001; Binder and

Hibbett, 2002; Moncalvo et al., 2002; Tehler et al., 2003). Combining sequence data

from multiple loci is an integral part of large scale phylogenetic inference and is central

to assembling the fungal tree of life. However, most published phylogenetic studies

have restricted their analyses to one locus while maximizing the number of fungal taxa. To

quantify this observation, 560 publications reporting published fungal phylogenetic

trees were surveyed from 1990 through 2003. Of the 595 trees considered in those

studies, 489 (82.2%) were based on a single locus, only 77 trees were based on two

combined loci, 19 on three combined loci, and 10 on four or more combined loci. Seven

of the latter 10 studies were restricted to closely related species or strains within a

species (Lutzoni et al., 2004). Exceptions included studies with 93 species representing

most major clades of Homobasidiomycetes (Binder and Hibbett, 2002), with 15 species

representing 10 orders (Binder et al., 2001), and with 45 species representing nine

orders (Hibbett and Binder, 2001).

The largest phylogenetic tree based on one locus included 1551 nucSSU

sequences representing 60 orders (Tehler et al., 2003). The largest multilocus trees

included 162 ITS + β-tubulin sequences representing a single order of Fungi (Stenroos

et al., 2002); 158 species representing 10 orders based on nucSSU, nucLSU, and

mitSSU (Hibbett et al., 2000); 110 species in a single order sequenced for ITS and

nucLSU (Peterson, 2000); and 108 nucSSU + nucLSU sequences representing 19 orders

of Fungi (Miadlikowska and Lutzoni, 2004). A more recent phylogenetic study directed

toward resolving the fungal tree of life based on six combined loci, included members

from all four traditionally recognized phyla of Fungi (Ascomycota, Basidiomycota,

Chytridiomycota, and Zygomycota), and two new phyla Glomeromycota and

Microsporidia (James et al., 2006).

Although much effort has been invested in defining orders (compared to

families, for example), few studies have focused on resolving relationships among

orders of Fungi: 354 of the 595 trees examined (59.5%) conveyed relationships within

single orders. The largest number of orders considered in a single study (N=62) resulted

in a tree based only on nucSSU data (Tehler et al., 2000). The fungal trees based on

combined data from multiple loci and encompassing the largest number of orders

included 38 species representing 25 orders (Bhattacharya et al., 2000), 52 species

representing 20 orders (Lutzoni et al., 2001), and 108 species representing 19 orders

(Miadlikowska and Lutzoni, 2004). All of these studies focused on ascomycetes and

were based on nucSSU and nucLSU rDNA. An exceptional study covering 16 orders of

fungi (34 species) using a combined analysis of two protein-coding genes (α- and

β-tubulin) was carried out to infer the phylogenetic placement of Microsporidia within

kingdom Fungi (Keeling, 2003). The main fungal phylogenomic study, based on 42

complete genomes, was carried out by using all conserved regions of genes, resolving

deep phylogenetic relationships among the majority of known taxa. The major drawback of

this study is the low number of sequenced fungal genomes, which is an obstacle to resolving

basal taxa in some clades (Fitzpatrick et al., 2006).

Figure 4 shows a fungal phylogeny based on genetic and geologic data, which

presents the five major groups of fungal organisms whose genomes have been

sequenced or are in progress, but uses the previous taxonomic nomenclature. In this tree, it

is evident that the Ascomycetes and Basidiomycetes are sister groups (subkingdom

Dikarya), and that there is a close relationship of Glomeromycetes to the subkingdom

Dikarya, which diverged about 600 Myr ago. The basal groups in the kingdom Fungi

are Chytrids and Zygomycetes. Currently, studies to determine the correct placement

of species in these groups are being carried out, since some of them produced paraphyly

(Chytrids) or polyphyly (Zygomycetes) in the fungal phylogeny. As a result, the

species from Zygomycetes have been reclassified (Hibbett et al., 2007).

Figure 4. Phylogeny of the kingdom Fungi. Major fungal groups colored as indicated by text at the right.

Diamonds indicate evolutionary branch points, and their approximate dating (bottom), captured

by fungal genomes sequenced or in progress (modified from Berbee and Taylor, 2001).

1.6. Fungal identification

The identification of fungi to species level and also the distinction of

morphologically identical strains of the same species have been a challenge for

mycologists. Hence, most species have received only limited study, and the

classification has been mainly traditional and based on readily observable

morphological features. Other features besides morphology, such as susceptibility to

chemicals and antifungal drugs, physiological and biochemical tests, production of

secondary metabolites, ubiquinone systems, fatty acid composition, cell wall

composition, protein composition and molecular approaches, have been used in

classification and also in identification of fungal species (Guarro et al., 1999).

1.6.1. Phenotypic methods

The classification and identification of fungi using phenotypic methods relies

mainly on morphological criteria. The classical light microscopic methods have been

enhanced by Nomarski differential interference contrast, fluorescence, cytochemistry,

and the development of new staining techniques such as those for ascus apical structures

(Romero and Winter, 1988). Unfortunately, during infections most pathogenic fungi

show only the vegetative phase (absence of sporulation); in host tissue, only hyphal

elements or other nonspecific structures are observed. Although the pigmentation and

shape of these hyphae and the presence or absence of septa can give us an idea of their

identity, fungal culture is required for accurate identification. For the identification and

classification of these fungi, the type of conidia and conidiogenesis are considered the

most important sets of characteristics to be observed (Cole and Samson, 1979; Guarro et

al., 1999). In the rare instances that opportunistic fungi develop the teleomorph in vitro

(this happens in numerous species of Ascomycota and in a few species of Zygomycota

and Basidiomycota), there are many morphological details associated with sexual

sporulation which can be extremely useful in their classification. The type of fruiting

body and type of ascus (the structure that contains the ascospores produced after meiosis) are

vital for classification. Shape, color, and the presence of an apical opening (ostiole) in

the fruiting bodies are important features in the recognition of higher taxa (Guarro et al.,

1999). In recent years, morphological techniques have been influenced by modern

procedures, which allow more reliable phenotypic studies to be performed. Numerical

taxonomy, effective statistical packages, and the application of computer facilities to the

development of identification keys offer some solutions and the possibility of a

renaissance of morphological studies.

Besides morphological identification, other approaches have been

proposed to identify fungi: a) Biotyping, which is based on

physiological and biochemical tests; commercially available kits such as the API and

VITEK systems, and selective media such as CHROMagar, have been used to identify

unicellular fungi (San-Millan et al., 1996; Bernal et al., 1998; Graf et al., 2000; Ballesté

et al., 2005), b) Serotyping, a method based on the agglutination reaction between antigens

and polyclonal antibodies, used for typing pathogenic yeasts such as Candida spp. and

Cryptococcus neoformans at the strain level (Evans et al., 1950; Hasenclever and

Mitchell, 1961), c) Production of secondary metabolites, which consists of the

identification of fungi by the metabolites that they produce (Carlile and Watkinson, 1994;

Frisvad, 1994; Whaley and Edwards, 1995), d) Cell wall composition, based on studies

of the structure and composition of the fungal cell wall, which delimit higher taxa (Bartnicki-

Garcia, 1987; Carlile and Watkinson, 1994), and e) Isoenzymatic analysis, which enables

the differentiation of species by the electrophoretic pattern of specific enzyme

activity (Busch and Nitscko, 1999; Soll, 2000). All of these techniques have

limitations in the identification of fungal species, and some of them are restricted to certain

species or fungal groups. Hence, the use of techniques that include studies at the

genomic level may overcome these disadvantages.

1.6.2. Genotypic methods

Genotypic methods are those based on the direct study of DNA.

Electrophoretic karyotyping uses pulsed-field gel electrophoresis to

assess variations in chromosome size and to determine karyotypes precisely,

allowing discrimination at the strain and species level. In this method,

intact chromosomes migrate through the agarose gel matrix according to size

under the influence of electric fields of alternating orientation,

and can be visualized by ethidium bromide staining (Guarro et

al., 1999; Taylor et al., 1999). Electrophoretic karyotyping has been used to fingerprint

different fungal species, such as Candida spp., Micromucor spp., Paracoccidioides

brasiliensis and Pyrenophora spp. (Dib et al., 1996; Nogueira et al., 1998; Aragona et al.,

2000; Klempp-Selb et al., 2000; Shin et al., 2001; Nagy et al., 2004).

One of the first DNA fingerprinting methods used to assess strain relatedness and

taxonomy in fungi was restriction fragment analysis, or restriction fragment length

polymorphism (RFLP) comparisons, without probe hybridization. DNA is extracted and

digested with one or more restriction endonucleases to sample short pieces of DNA

sequence. Then, the fragments are separated by electrophoresis in an agarose gel. The

banding pattern of digested DNA is then visualized, usually by staining with ethidium

bromide. The obtained band pattern is based on different fragment lengths determined

by the restriction sites identified by the particular endonuclease employed (Taylor et al.,

1999; Soll, 2000). Any region of DNA can be used for RFLP analysis if variation

(alleles) can be visualized, directly due to multiple copies (mtDNA, rDNA) or indirectly

either by hybridization with a probe or by PCR amplification. RFLPs were the first

DNA markers used for fungal evolutionary biology. The band patterns can be tabulated

and compared, and phenetic trees constructed (Edel et al., 1996). Critics of RFLP

analysis note that, while restriction sites in different individuals are most likely to be

identical by descent, that is not the case for missing sites, because it is easier to lose a

site than to gain one. There are many ways in which a restriction endonuclease site can

be lost: any of the several nucleotides in the recognition sequence can be substituted, or

the site can suffer length mutations. Missing sites that have arisen, unknowingly, by

different routes confound evolutionary analysis (Taylor et al., 1999).
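
The site-loss argument above can be illustrated with a small in-silico digest. This is only a sketch: the two sequences are invented for illustration, the function name `digest` is hypothetical, and the enzyme is assumed to be EcoRI (recognition sequence GAATTC, cutting after the first G) acting on a linear molecule.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`; the cut falls `cut_offset` bases into the site."""
    cuts = []
    start = 0
    while True:
        pos = sequence.find(site, start)
        if pos == -1:
            break
        cuts.append(pos + cut_offset)
        start = pos + 1
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Invented example sequences (assumption): the mutant carries a single
# substitution (A -> G) inside the second recognition site.
wild_type = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
mutant = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"

print(digest(wild_type, "GAATTC", 1))  # [5, 14, 7]
print(digest(mutant, "GAATTC", 1))     # [5, 21]
```

A single substitution removes one site and merges two bands into one. Any of the six positions in the site could have produced the same banding pattern, which is why shared absences are weaker evidence of common descent than shared presences.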

Randomly amplified polymorphic DNA (RAPD) analysis or arbitrarily primed PCR

(AP-PCR) analysis is similar to RFLP analysis in that it assays DNA sequence variation

in short regions, but instead of analyzing restriction endonuclease recognition

sequences, it focuses on PCR priming regions (Williams et al., 1990). Using random

primers of approximately 10 bases and a low annealing temperature, amplicons

throughout the genome are targeted and amplified. Amplified products are separated on

an agarose gel and stained with ethidium bromide. When a single random

oligonucleotide primer is used in a reaction, it hybridizes to homologous sequences in

the genome, and if the primer hybridizes to sequences on opposite DNA strands within

roughly 3 kb, the DNA region between the two hybridization sites will be amplified.

Criticisms of RAPDs initially focused on reproducibility. RAPD analysis succeeds

because just one nucleotide substitution can allow or prevent priming, and so it is not

surprising that small differences in any aspect of the reaction will have the same effect on the PCR profile.

Even if there were no problems with repeatability, there would still be the concern that

bands of equal electrophoretic mobility may not be homologous and the related concern

that missing bands may not be homologous because they can be lost by several possible

nucleotide substitutions in either PCR priming site as well as by length mutations. There

is also the problem of dominant and null alleles; in haploid organisms both the

dominant (presence) and null (absence) alleles can be scored, but in diploids it is not

possible to distinguish genotypes that are homozygous for the dominant allele from

those that are heterozygous. It is also tempting to score more than one variable band per

RAPD reaction, although they may not be independent (Taylor et al., 1999). Thus,

RAPD has not been found useful for evolutionary studies. Because of the repetitive

character of the target sequences, genetic distances calculated from RAPD could be

affected by paralogy, namely, recombination and duplication events not parallel with

speciation events (Melo et al., 1998).

Amplified fragment length polymorphism (AFLP) analysis is a relatively new

technique which has a discriminatory power that makes it suitable for identification as

well as for strain typing. In short, in AFLP analysis genomic DNA is digested with two

restriction enzymes (EcoRI and MseI) and double-stranded oligonucleotide adapters are

ligated to the fragments. These adapters serve as targets for the primers during PCR

amplification. To increase the specificity, it is possible to elongate the primers at their

3′ ends with one to three selective nucleotides. One of the primers is labeled with a

fluorescent dye (Vos et al., 1995). The polymorphism revealed by AFLP

analysis depends on restriction endonuclease site

differences, just like RFLP. The variable fragments (loci) have two alleles (positive and

null), and so the same concerns raised about null alleles with RFLP and RAPD analysis

apply (Taylor et al., 1999). The advantage of AFLP analysis is that only a limited

amount of DNA is needed, since the fragments are amplified by PCR, which yields more product than

RFLP analysis. Furthermore, since stringent annealing temperatures are used during

amplification, the technique is more reproducible and robust than other methods such as

RAPD analysis (Savelkoul et al., 1999). The value and potential of the AFLP technique

in the differentiation and identification of strains and/or species and in the analysis of

genetic similarity have been demonstrated in yeast ecology and evolutionary studies

(Barros Lopes et al., 1999).

Sequencing is the best method for measuring phylogenetic relationships and

discriminating between species and strains through comparison of the DNA sequences

of a variety of noncoding and coding regions. The underlying assumption is that the

rates at which mutations accumulate in gene regions reflect evolutionary clocks.

Particular noncoding and coding regions may be highly resistant to change and

therefore may have very slow evolutionary clocks, while other coding regions, such as

those of genes involved in pathogenesis, may be far less resistant to change and

therefore may have very fast evolutionary clocks. Studies of rRNA genes were

based on the assumption that they were less prone to biases resulting from selective

pressure (Karl and Avise, 1993). Until recently, cloning and sequencing genes were

slow, technically demanding and expensive, and seemed beyond the technical capacity

of most medical mycology laboratories. However, the emergence of PCR, automated

DNA-sequencing technologies, and gene data banks has made these studies easier

(Soll, 2000). The sequencing of several genes, both nuclear and mitochondrial, is being

used to infer phylogeny and discriminate fungal species and strains.

In the first sequencing studies, non-protein-coding regions were selected to infer

phylogeny, using the small subunit ribosomal DNA sequences from eukaryotes and

prokaryotes (Woese and Fox, 1977), allowing the reconstruction of the first tree of life.

Internal Transcribed Spacer (ITS) regions are mainly used in fungal taxonomy and

phylogeny, but to resolve the phylogenetic placement of some fungal species whose

position is uncertain, additional genes are being used, such as nuclear large and small

subunit ribosomal DNA (nucLSU and nucSSU), mitochondrial large and small subunit

ribosomal DNA (mitLSU and mitSSU), the second largest subunit of RNA polymerase II

(RPB2), α- and β-tubulin and elongation factor 1 (EF-1), clarifying the divergence of

species (Lutzoni et al., 2004; Bridge et al., 2005; James et al., 2006). Currently,

sequencing is being performed at the genome level, which has made it possible to explain

the phylogenetic relationships in fungi better (Fitzpatrick et al., 2006).

1.7. Methods for phylogenetic inference

Previously, attempts to infer phylogeny were based on the identification of

common morphological characters amongst species. Though these data made it possible to

group close species, they did not always resolve deep phylogenetic relationships, especially

in organisms with particular characters like fungi, which were for a long time grouped with

plants. Currently, methods for inferring phylogenetic relationships between species

use amino acid or nucleotide sequences. These data are more reliable because

they provide information that cannot be obtained by morphological analysis,

such as the presence of and differences among homologous genes, mutations, and duplicated

genes.

In phylogenetic studies the first step is the alignment of homologous sequences from a

single gene or from multiple genes. This procedure allows the identification of regions with high

similarity and shows how close the sequences are, using either nucleotide or amino acid

sequences. Several programs have been proposed for sequence alignment, the

most widely used being the CLUSTAL series of programs. The obtained alignment is imported into a

program for phylogenetic inference, which can use two primary approaches to

phylogenetic tree construction: an algorithmic approach and a tree-searching approach. The first uses an

algorithm to construct a tree from the data. The second constructs many trees, and then

the best tree or best set of trees is selected. These two approaches are also known as

distance and character-based methods, respectively. Below, two methods of

alignment as well as the methods of phylogenetic tree construction are described.

1.7.1. Multiple sequence alignment

1.7.1.1. CLUSTAL series of programs

The Clustal series of programs is widely used in molecular biology for the multiple

alignment of both nucleic acid and protein sequences and for preparing phylogenetic

trees. The first Clustal program was designed specifically to work efficiently on

personal computers, which at that time had feeble computing power by today's

standards. It combined a memory-efficient dynamic programming algorithm with the

progressive alignment strategy. The multiple alignments are built up progressively by a

series of pairwise alignments, following the branching order in a guide tree. The initial

pre-comparison used a rapid word-based alignment algorithm and the guide tree was

constructed by using the Unweighted Pair-Group Method with Arithmetic Mean

(UPGMA) (Higgins and Sharp, 1988). In 1992, a new release was made, called

ClustalV, which incorporated profile alignments (alignments of existing alignments)

and the facility to generate trees from the multiple alignment by using the Neighbour

Joining (NJ) method, and the user could also test the tree for robustness by using a

simple bootstrap test of tree topology (Higgins et al., 1992). In 1994, the third

generation of the series, Clustal W, was released. Clustal W incorporated position-

specific gap penalties so that gap penalties can be lowered at hydrophilic residues and

wherever gaps are already introduced into the alignment. The sequence pre-comparison

in Clustal W uses more sensitive dynamic programming, and a much better

dendrogram is built by means of NJ, which improves tree topology and provides a method for

weighting sequences on the basis of their divergence (Thompson et al., 1994). Later,

Clustal X was developed, with many improvements. Within alignments,

conserved columns are highlighted by using a colour scheme that the user can

customize. Beneath the sequence alignment, Clustal X provides a plot of residue

conservation. Quality-analysis tools that highlight misaligned regions are also available

(Thompson et al., 1997). The popularity of the programs depends on a number of

factors, including not only the accuracy of the results, but also the robustness,

portability and user-friendliness of the programs.
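
The pairwise step that progressive alignment repeats along the guide tree can be sketched with a global alignment score computed by dynamic programming in linear memory. This is a minimal illustration, not the Clustal implementation: the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for readability.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style global alignment score of strings a and b.

    Only two rows of the dynamic programming table are kept at any time,
    so memory grows with len(b) rather than len(a) * len(b).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against nothing
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("ACGT", "AGT"))   # 1
```

Scores like these, computed for every pair of sequences, are what a progressive aligner converts into distances to build its guide tree. Real programs use substitution matrices and affine or position-specific gap penalties rather than the flat values assumed here.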

1.7.1.2. MUSCLE

Multiple sequence comparison by log-expectation (MUSCLE) is a new

computer program for creating multiple alignments of protein or, alternatively,

nucleotide sequences. This program uses two distance measures for a pair of sequences: a kmer

distance (for an unaligned pair) and the Kimura distance (for an aligned pair). A kmer is

a contiguous subsequence of length k, also known as a word or k-tuple. Related

sequences tend to have more kmers in common than expected by chance. The kmer

distance is derived from the fraction of kmers in common in a compressed amino acid

alphabet. This measure does not require an alignment, giving a significant speed

advantage. Given an aligned pair of sequences, MUSCLE computes the pairwise identity and

converts it to an additive distance estimate, applying the Kimura correction for multiple

substitutions at a single site. Distance matrices are clustered using UPGMA, which gives

slightly improved results over neighbor-joining, despite the expectation that neighbor-

joining will give a more reliable estimate of the evolutionary tree. This can be explained

by assuming that in progressive alignment, the best accuracy is obtained at each node by

aligning the two profiles that have fewest differences, even if they are not evolutionary

neighbors. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of

the benchmark sets tested. Without refinement, MUSCLE achieves average accuracy statistically

indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods

for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min

on a current desktop computer (Edgar, 2004).
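
The two distance measures described above can be sketched as follows. This is a simplified illustration under stated assumptions, not MUSCLE's code: the function names are hypothetical, the kmer distance is computed on the plain alphabet (no compressed alphabet), it is taken as one minus the fraction of shared kmers relative to the shorter sequence, and the Kimura correction is taken in its common closed form d = -ln(1 - D - D²/5), where D is the fraction of differing aligned residues.

```python
import math
from collections import Counter


def kmer_distance(x, y, k=3):
    """1 - (shared kmers / kmers in the shorter sequence); needs no alignment."""
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("sequences must be at least k residues long")
    count_x = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    count_y = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    # Counter intersection keeps the minimum count of each shared kmer.
    shared = sum((count_x & count_y).values())
    return 1.0 - shared / denom


def kimura_distance(aligned_x, aligned_y):
    """Kimura-corrected distance from an aligned pair ('-' marks a gap)."""
    pairs = [(a, b) for a, b in zip(aligned_x, aligned_y) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differing = sum(a != b for a, b in pairs) / len(pairs)
    # The closed form is only defined while the log argument stays positive
    # (roughly differing < 0.85); beyond that a lookup table would be needed.
    return -math.log(1.0 - differing - differing * differing / 5.0)


print(kmer_distance("ACDEFG", "ACDEFG"))
print(kimura_distance("ACDEFGHIKL", "ACDEFGHIMM"))
```

The kmer distance only counts words, which is why it can be computed before any alignment exists. The Kimura distance needs an aligned pair, and it grows faster than the raw fraction of differences because it compensates for multiple substitutions at the same site.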

1.7.2. Major methods for estimating phylogenetic trees and inferring phylogeny

1.7.2.1. Distance methods

These methods convert aligned sequences into a distance matrix of pairwise

differences (distances) between the sequences. The matrix is used as the data from

which branching order and branch lengths are computed. The most widely used of these

methods are NJ and UPGMA.
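
As a minimal sketch of the conversion from an alignment to a distance matrix, the example below uses the uncorrected proportion of differing sites (p-distance). The choice of p-distance, the function names and the toy alignment are assumptions made for illustration; in practice a corrected distance based on a substitution model is normally used.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared


def distance_matrix(alignment):
    """Map every ordered pair of sequence names to its p-distance."""
    names = list(alignment)
    return {(p, q): p_distance(alignment[p], alignment[q]) for p in names for q in names}


# Toy alignment (invented for illustration).
alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACG-ACTA"}
matrix = distance_matrix(alignment)
print(matrix[("A", "B")])  # 0.125
```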

UPGMA is a straightforward example of a clustering method. It starts by grouping

the two taxa with the smallest distance between them according to the distance matrix.

Then, a new node is added at the midpoint of the two, and the two original taxa are put

on the tree. The distance from the new node to other nodes will be the arithmetic

average. Then, a reduced distance matrix is obtained by replacing two taxa with one

new node. That process is repeated on the new matrix and reiterated until the matrix

consists of a single entry. That set of matrices is then used to build up the tree by

starting at the root and moving out to the first two nodes represented by the last two

clusters. In fact, the tree produced by UPGMA is not an arbitrary tree.

Instead, it is a special type of tree, which satisfies the molecular clock property. That is,

the total time travelling down a path to the leaves from any node is the same, regardless

of the choice of path. Thus, if the tree underlying the observed data does not

have this property, it will not reconcile well with the tree from UPGMA. A necessary

condition for UPGMA to recover the correct tree is the ultrametric condition, which

requires that, for any three taxa, the two largest of the three pairwise distances be equal

(every triangle of distances is isosceles with its two longest sides equal), which is very

unlikely with real data. For that and other reasons, UPGMA is considered a poor

algorithm for the construction of phylogenetic trees, since it relies on the rates of evolution

among different lineages being approximately equal (Sneath and Sokal, 1973; Hall,

2001).
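
The UPGMA procedure described above can be sketched in a few lines. This is an illustrative implementation under assumptions: the function name `upgma`, the input format (a dictionary keyed by pairs of taxon names) and the Newick-style output string are choices made here, and the four-taxon distance matrix is invented and deliberately ultrametric.

```python
import itertools


def upgma(names, dist):
    """Cluster taxa by UPGMA; return (Newick string, height of the root).

    `dist` maps pairs of names, e.g. ("A", "B"), to distances.
    """
    d = {frozenset(pair): float(value) for pair, value in dist.items()}
    size = {name: 1 for name in names}      # number of taxa in each cluster
    height = {name: 0.0 for name in names}  # node height (leaves sit at 0)
    active = list(names)
    while len(active) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(itertools.combinations(active, 2), key=lambda p: d[frozenset(p)])
        h = d[frozenset((a, b))] / 2.0  # new node lies at the midpoint
        new = f"({a}:{h - height[a]:.3f},{b}:{h - height[b]:.3f})"
        size[new] = size[a] + size[b]
        height[new] = h
        active = [c for c in active if c not in (a, b)]
        for c in active:
            # Distance to the new cluster: size-weighted arithmetic mean.
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[new]
        active.append(new)
    return active[0] + ";", height[active[0]]


# Invented ultrametric distances: every triple has its two largest values equal.
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
newick, root_height = upgma(["A", "B", "C", "D"], example)
print(newick)       # (D:5.000,(C:3.000,(A:1.000,B:1.000):2.000):2.000);
print(root_height)  # 5.0
```

In the output every leaf is at the same total distance (5.0) from the root, which is the molecular clock property. With non-ultrametric input the same code still returns a tree with that property, and this is precisely where UPGMA can misrepresent the data.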

Neighbor Joining is similar to UPGMA in that it manipulates a distance matrix,

reducing it in size at each step, and reconstructing the tree from that series of matrices.

It differs from UPGMA in that it does not construct clusters but directly calculates

distances to internal nodes. In this method, the total sum of the individual distances is

not computed for all or many different topologies, but the examination of different

topologies is embedded in the algorithm, so that only one final tree is produced. From

the original matrix, NJ starts first with a star phylogeny. The next step is to calculate for

each taxon its net divergence from all other taxa as the sum of the individual distances

from the taxon. It then uses that net divergence to calculate a corrected distance matrix.

NJ then finds the pair of taxa with the lowest corrected distance and calculates the

distance from each of those taxa to the node that joins them, so that we can identify a

pair of neighbors. Once this pair is identified, they are combined as a single unit and

treated as a single sequence in the next step. This process is continued until all

multifurcating nodes are resolved into bifurcating ones. NJ does not assume that all taxa

are equidistant from a root. Unlike UPGMA, the NJ method produces unrooted trees. One

of the disadvantages of the NJ method is that it generates only one tree and does not test

other possible tree topologies. This can cause problems because in many cases in the

initial set up of NJ, there might be multiple equally close pairs of neighbors to join,

leading to multiple trees. Since the algorithm has no way of knowing which one is

the optimal tree, choosing a wrong option may produce a suboptimal tree. To

overcome this problem, a generalized NJ method has been developed in which multiple

trees with different initial taxon groupings are generated (Saitou and Nei, 1987; Nei,

1996; Hall, 2001).
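
The NJ steps described above (net divergence, corrected distances, joining the closest pair, reducing the matrix) can be sketched as follows. The function name, the input format and the labels `node1`, `node2`, ... for internal nodes are assumptions made for this illustration; the five-taxon matrix is a small additive example commonly used in textbook walkthroughs of the algorithm. Ties are resolved by taking the first pair encountered, which is one arbitrary choice among those discussed in the text.

```python
import itertools


def neighbor_joining(names, dist):
    """Return the edges (child, parent, branch length) of an unrooted NJ tree.

    `dist` maps pairs of names, e.g. ("a", "b"), to distances.
    """
    d = {frozenset(pair): float(value) for pair, value in dist.items()}
    active = list(names)
    edges = []
    counter = 0
    while len(active) > 2:
        n = len(active)
        # Net divergence: sum of distances from each taxon to all others.
        r = {i: sum(d[frozenset((i, j))] for j in active if j != i) for i in active}
        # Corrected distance; the pair with the lowest value is joined.
        i, j = min(
            itertools.combinations(active, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        length_i = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        length_j = dij - length_i
        counter += 1
        node = f"node{counter}"
        edges += [(i, node, length_i), (j, node, length_j)]
        active = [k for k in active if k not in (i, j)]
        for k in active:
            d[frozenset((node, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        active.append(node)
    i, j = active
    edges.append((i, j, d[frozenset((i, j))]))
    return edges


example = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
for child, parent, length in neighbor_joining(list("abcde"), example):
    print(child, parent, length)
```

In this example the second iteration has two pairs with the same lowest corrected distance, so a different tie-breaking rule would join a different pair first. Here both choices lead to the same unrooted tree, but with real data ties can yield different topologies, which is the weakness noted above.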

1.7.2.2. Character-based methods

These methods are characterized by using multiple alignments directly for

comparison of characters within each column (each site) in the alignment. Three main

methods are applied in phylogeny: maximum parsimony (MP), maximum likelihood

(ML) and Bayesian inference (BI).

In the MP method, a given set of nucleotide (or amino acid) sequences is

considered, and the nucleotides (or amino acids) of ancestral sequences for a

hypothetical topology are inferred under the assumption that mutational changes occur

in all directions among the four different nucleotides or the 20 amino acids. The

smallest number of nucleotide substitutions that explain the entire evolutionary process

for the given topology is then computed. This computation is done for all other

topologies, and the topology that requires the smallest number of substitutions is chosen

as the best tree. If there are no multiple substitutions at any site, MP is expected to

generate the correct topology as long as enough parsimony-informative sites are

examined. The disadvantage is that in practice the nucleotide sequences are often

subject to backward and parallel substitutions and this introduces uncertainties in

phylogenetic inference. When the true tree has a special type of topology and branch

lengths MP may generate an incorrect topology even if an infinite number of

nucleotides are examined and this can happen even if the rate of nucleotide substitution

is constant for all evolutionary lineages (Felsenstein, 1978; Takezaki and Nei, 1994).

Furthermore, in parsimony analysis it is difficult to treat the phylogenetic inference in a

statistical framework because there is no natural way to compute the means and

variances of minimum numbers of substitutions obtained by the parsimony procedure.

However, under certain circumstances, MP is quite efficient in obtaining the correct

topology. It should also be noted that MP is the only method that can easily take care of

insertions and deletions of nucleotides, which sometimes give important phylogenetic

information (Nei, 1996).
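
The counting step of MP (the smallest number of substitutions required by a given topology) can be sketched as follows. This is an illustrative implementation of the Fitch small-parsimony algorithm for rooted binary trees written as nested tuples; the toy alignment and the function names are assumptions made for the example and do not come from the programs used in this thesis.

```python
def fitch_score(tree, states):
    """Minimum number of substitutions at one site for a fixed topology.

    tree is a nested tuple of leaf names, e.g. (("A", "B"), ("C", "D"));
    states maps each leaf name to its character at the site.
    """
    def visit(node):
        if isinstance(node, str):  # leaf: its state set is the observed character
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:  # the children agree: no extra substitution is needed
            return common, left_cost + right_cost
        # the children disagree: take the union and count one substitution
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_length(tree, alignment):
    """Sum the per-site Fitch scores over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical four-taxon alignment with three sites.
toy = {"A": "AAA", "B": "AGA", "C": "GAG", "D": "GGG"}
topologies = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for topology in topologies:
    print(topology, parsimony_length(topology, toy))
```

The topology with the smallest length would be retained as the MP tree. Real programs search the tree space heuristically, because the number of possible topologies grows very quickly with the number of taxa.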

Cavalli-Sforza and Edwards (1967) were the first to propose the idea of using a

ML method for phylogenetic inference from gene frequency data, but they encountered

a number of problems when implementing this method. Later, considering nucleotide

sequence data, an algorithm was developed for constructing a phylogenetic tree by the

ML method (Felsenstein, 1981). In the case of amino acid sequence data, the likelihood

method was proposed by using an empirical transition matrix for 20 different amino

acids (Kishino et al., 1990). Later, this method was extended by using various transition

matrices for nuclear and mitochondrial proteins (Adachi and Hasegawa, 1995; 1996).

The first step is the selection of a substitution model, which gives the instantaneous rates at which each of the four possible nucleotides (or 20 amino acids) changes to each of the other three nucleotides (or 19 amino acids). Alignments of homologous DNA and amino acid sequences can be examined under a wide range of evolutionary models (JC69, K80, F81, F84, HKY85, TN93 and GTR for nucleotides, and Dayhoff, JTT, mtREV, WAG and DCMut for amino acids) (Guindon et al., 2005). ML looks for the tree that, under a given model of evolution, maximizes the likelihood of observing the data, which is expressed as a log-likelihood; an ML program therefore seeks the tree with the largest log-likelihood. The advantages of the ML method are that it allows users to specify the evolutionary model they want to use, and that the likelihood of the resulting tree is known. A disadvantage is that the ML method is considerably slower than either parsimony or NJ, and it is not difficult to exceed the capacity of even the most up-to-date desktop computer (Hall, 2001).
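
As a minimal illustration of how a substitution model turns an alignment into a log-likelihood, the sketch below treats the simplest possible case: two sequences joined by a single branch under the JC69 model. The sequences are hypothetical, and a real ML program evaluates the same kind of quantity over whole trees and richer models.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of a gap-free pairwise alignment under JC69.

    d is the branch length in expected substitutions per site (d > 0).
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # probability that a site keeps its state
    p_diff = 0.25 - 0.25 * e  # probability of one specific different state
    log_l = 0.0
    for x, y in zip(seq1, seq2):
        # 0.25 is the equilibrium frequency of the first nucleotide
        log_l += math.log(0.25 * (p_same if x == y else p_diff))
    return log_l


def jc69_distance(seq1, seq2):
    """Closed-form ML estimate of d under JC69 (requires p < 0.75)."""
    p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


s1 = "ACGTACGTAC"  # hypothetical sequences
s2 = "ACGTACGTTT"
d_hat = jc69_distance(s1, s2)
print(d_hat, jc69_log_likelihood(s1, s2, d_hat))
```

The branch length returned by `jc69_distance` is the value of d at which `jc69_log_likelihood` is largest, which is the sense in which ML "seeks the largest log-likelihood".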

In BI, inferences of phylogenies are based upon the posterior probability (PP) of phylogenetic trees. Bayesian analysis of phylogenies is similar to ML in that the user postulates a model of evolution and the program searches for the best trees that are consistent with both the model and the data (the alignment). It differs from ML in that, while ML seeks the tree that maximizes the probability of observing the data given the tree, Bayesian analysis seeks the tree that maximizes the probability of the tree given the data and the model of evolution. Unlike ML, which seeks the single most likely tree, Bayesian analysis searches for the best set of trees. The MrBayes program uses the Metropolis-coupled Markov Chain Monte Carlo (MCMCMC) method, which works

by proposing a new state (a new tree) using a substitution model, and then calculating the acceptance probability for this state (Hastings, 1970; Green, 1995). A random number between 0 and 1 is drawn, and if that number is less than the calculated probability, the new state is accepted; otherwise the state remains the same. The proposed new state involves moving a branch and/or changing the length of a branch to create a modified tree. If that new tree is more likely than the existing tree, given the model and the data, it is more likely to be accepted. This process of proposing and accepting or rejecting new states is repeated many thousands or millions of times (Hall, 2001; Huelsenbeck and Ronquist, 2001). Although BI is computationally efficient compared with the other character-based methods, some questions remain unresolved, such as the convergence of the Markov chain and the discrepancy between Bayesian posterior probabilities and nonparametric bootstrap values (Huelsenbeck et al., 2002).
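
The propose/accept cycle described above can be sketched in a few lines of Python. This is a deliberately simplified, single-chain Metropolis sampler over three hypothetical "trees" with known posterior probabilities; it assumes a symmetric proposal (so no Hastings correction is needed) and omits the heated chains that make the method Metropolis-coupled. It is not the MrBayes implementation.

```python
import math
import random


def metropolis_accept(log_post_current, log_post_proposed, rng):
    """Accept the proposal with probability min(1, posterior ratio)."""
    log_ratio = log_post_proposed - log_post_current
    if log_ratio >= 0:  # the proposed state is at least as probable
        return True
    # draw a number in [0, 1) and compare it with the acceptance probability
    return rng.random() < math.exp(log_ratio)


def sample_chain(log_post, start, n_steps, seed=0):
    """Run one chain and return the fraction of steps spent in each state."""
    rng = random.Random(seed)
    states = sorted(log_post)
    current = start
    visits = {s: 0 for s in states}
    for _ in range(n_steps):
        # symmetric proposal: any other state, chosen uniformly at random
        proposed = rng.choice([s for s in states if s != current])
        if metropolis_accept(log_post[current], log_post[proposed], rng):
            current = proposed
        visits[current] += 1
    return {s: visits[s] / n_steps for s in states}


# Hypothetical posterior probabilities of three candidate trees.
toy_posterior = {"T1": math.log(0.7), "T2": math.log(0.2), "T3": math.log(0.1)}
frequencies = sample_chain(toy_posterior, start="T3", n_steps=20000, seed=1)
print(frequencies)
```

After enough steps, the visit frequencies approximate the posterior probabilities, which is why the proportion of sampled trees containing a clade can be read as the posterior probability of that clade.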

1.8. Adaptive evolution

Most amino acid substitutions are deleterious and are eventually eliminated from the population by purifying selection. Some amino acid substitutions are neutral, and chance alone determines whether they become fixed in the population. In rare instances, an amino acid substitution is beneficial and is fixed in the population by positive selection. At the nucleotide level, some base substitutions cause amino acid replacements and others are silent. The position of a nucleotide within a codon affects whether an amino acid replacement takes place when the nucleotide changes. Most mutations at the first position of a codon result in amino acid replacements and all mutations at second positions are replacement mutations but, because of the redundancy of the genetic code, many third-position mutations are silent. The number of non-synonymous changes per non-synonymous site is called dN, and the number of synonymous changes per synonymous site is called dS. When sequences are compared, a dN/dS ratio < 1 is taken as evidence of purifying selection, a ratio = 1 as evidence that mutations were generally neutral, and a ratio > 1 as evidence of positive selection. To identify evidence of positive or purifying selection, studies of adaptive evolution therefore normalize the numbers of non-synonymous and synonymous mutations by the numbers of non-synonymous and synonymous sites in the gene.

Several methods have been proposed to estimate nonsynonymous and synonymous rates of substitution by comparing nucleotide and amino acid sequences. The first

methods were based on estimating the numbers of synonymous and nonsynonymous substitutions from the comparison of two coding DNA sequences (Miyata and Yasunaga, 1980; Li et al., 1985; Nei and Gojobori, 1986; Li, 1993; Pamilo and Bianchi, 1993). The method of Nei and Gojobori (1986) is the simplest and performs a codon-by-codon comparison. The model of Jukes and Cantor (1969) is applied to correct for multiple substitutions that have occurred at one site, and it normally produces results very similar to those obtained by the Nei-Gojobori method.
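
The site-counting step behind dN and dS can be sketched as follows. For each codon, the number of synonymous sites is taken as the sum, over its three positions, of the fraction of single-nucleotide changes that leave the amino acid unchanged. The sketch makes two simplifying assumptions that should be read as such: it uses the standard genetic code (in CUG-clade yeasts the codon CTG is read as serine, so the table would need adjusting), and it counts changes to stop codons as nonsynonymous. The Jukes-Cantor correction is included as a separate helper.

```python
import math

BASES = "TCAG"
# Standard genetic code in TCAG order (an assumption; see the note above).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites (between 0 and 3) in one sense codon."""
    amino = CODON_TABLE[codon]
    total = 0.0
    for pos in range(3):
        same = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == amino:
                same += 1
        total += same / 3.0  # fraction of the three possible changes
    return total


def count_sites(cds):
    """Return (S, N): synonymous and nonsynonymous sites of a coding sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    s = sum(synonymous_sites(c) for c in codons)
    return s, 3 * len(codons) - s


def jukes_cantor(p):
    """Correct a proportion of differences p (< 0.75) for multiple hits."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(count_sites("ATGCTA"))
```

Given the numbers of synonymous and nonsynonymous differences between two sequences (Sd and Nd, not computed here), the proportions pS = Sd/S and pN = Nd/N would be passed through `jukes_cantor` to obtain dS and dN.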

Methods based on a likelihood approach were proposed for detecting positive selection by comparison of multiple sequences. The original models of codon substitution assume a single ω = dN/dS for all lineages and sites, and they have since been extended to account for variation of ω either among lineages or among sites (Goldman and Yang, 1994; Muse and Gaut, 1994). The lineage-specific models allow ω to vary among lineages and are thus suitable for detecting positive selection along lineages. They assume no variation in ω among sites. As a result, they detect positive selection for a lineage only if the average dN over all sites is higher than the average dS (Muse and Gaut, 1994; Yang, 1998; Yang and Nielsen, 1998). The site-specific models allow the ω ratio to vary among sites but not among lineages. Positive selection is detected at individual sites only if the average dN over all lineages is higher than the average dS (Nielsen and Yang, 1998; Yang et al., 2000). The branch-site model allows the ω ratio to vary both among sites and among lineages. This model assumes that positive selection occurs at a few time points and affects a few amino acids (Yang and Nielsen, 2002).

The codon-based models that rely on a likelihood approach include parameters for transition-transversion bias and for codon frequencies. The Goldman and Yang (1994) model also includes different replacement probabilities between amino acids, based on the Grantham (1974) physicochemical distance matrix, and there are Bayesian models that assume a prior distribution for the dN/dS ratio (Nielsen and Yang, 1998; Yang et al., 2000). All these methods are based on theoretical assumptions and ignore the empirical observation that distinct amino acids differ in their replacement rates. Recently, a codon-based model named the mechanistic-empirical model (MEC) was published that takes into account all the parameters mentioned above and incorporates empirical amino acid replacement probabilities into the codon-substitution matrix (Doron-Faigenboim and Pupko, 2007).

2. THESIS BACKGROUND AND OBJECTIVES

Throughout the last decade, yeasts belonging to the genus Candida have emerged as major opportunistic pathogens. They are broadly distributed in nature and are frequently present as innocuous commensals of the skin, mouth, vagina, and pharynx, as well as of the urinary and gastrointestinal tracts, in humans and other warm-blooded animals (Coleman et al., 1998; Pedrós, 2003). These yeasts have small cells, about 4 to 6 µm in diameter, form hyphae and pseudohyphae, and reproduce mainly by budding. As a relatively harmless commensal organism, C. albicans exists in about 50% of the human population (Odds, 1988; Calderone, 2002).

Candida species cause infections termed candidiasis. These infections can be superficial, particularly in the mucosae (epithelium and endothelium), or generalized, causing systemic candidiasis, also designated as candidemia (Gudlaugsson et al., 2003). These infections have increased over the last two decades owing to the growing number of immunocompromised patients due to hematological disorders, transplants, chemotherapy and AIDS, among others. They are also frequent at the extremes of age, such as in the elderly and in neonates, particularly the underweight, and in women of childbearing age (López Martínez et al., 1984; Musial et al., 1988; Horowitz et al., 1992; Ball et al., 2004; Estrella et al., 2000; Clark et al., 2004; Chen et al., 2005; Tavanti et al., 2005a). As opportunists, they can become pathogenic under certain conditions, becoming an important cause of infection (Coleman et al., 1998; Pedrós, 2003). Infections caused by yeasts of the genus Candida have become the fourth cause of invasive infections in the United States and the eighth in Europe (Jarvis, 1995; Fluit et al., 2000).

Approximately 200 Candida species are known, of which nearly 20 have been identified as etiologic agents of infections, but the list of medically important yeasts continues to grow (Calderone, 2002). Although Candida albicans has been, in the past, the most common causative organism of fungemia and disseminated candidiasis, other Candida species are becoming increasingly prevalent. A worldwide study of bloodstream infections from 1997 to 1999 showed that at least 45% of yeast infections were caused by Candida species other than C. albicans (Pfaller et al., 2001). The most commonly isolated species, apart from C. albicans, are C. parapsilosis (20 to 40% of all reported episodes of candidemia), C. tropicalis (10 to 30%), C. krusei

(10 to 35%), C. glabrata (5 to 40%), C. guilliermondii (2 to 10%) and C. lusitaniae (up to 8%) (Sandven, 2000; Krcmery and Barnes, 2002). The emergence of these non-albicans species is associated with inherent or acquired resistance to fluconazole in up to 75% of all C. krusei infections and 35% of all C. glabrata infections; conversely, at present C. lusitaniae remains highly susceptible to azole antifungal agents but is resistant to amphotericin B (Wingard et al., 1991; Krcmery and Barnes, 2002; Pfaller et al., 2003). C. dubliniensis has emerged as an opportunistic pathogen closely related to C. albicans. It shares many diagnostic characteristics with C. albicans but differs with respect to its epidemiology, virulence, and ability to develop fluconazole resistance (Sullivan and Coleman, 1997; Gutierrez et al., 2002). The incidence of C. parapsilosis is associated with the handling of central venous catheters and with its ability to grow as a biofilm on implanted medical devices (Baillie and Douglas, 1998; Kuhn et al., 2004). C. tropicalis is associated with cancer patients, particularly because of its increased invasiveness when the gastrointestinal mucosa is damaged and the intestinal flora is suppressed by pharmacological treatments (Kullberg and Lashof, 2002).

Other species have been emerging as pathogens, such as C. rugosa, C. famata, C. africana, C. bracarensis, C. fabianii, C. ciferri, C. membranaefaciens, C. metapsilosis, C. orthopsilosis, C. pseudohaemulonii and C. pseudorugosa. They mainly affect immunocompromised patients, taking advantage of host weakness to pass from commensal to infectious (Hazen, 1995; Tietz et al., 2001; García-Martos et al., 2004; Fanci and Pecile, 2005; Tavanti et al., 2005b; Correia et al., 2006; Li et al., 2006; Sugita et al., 2006; Valenza et al., 2006).

All Candida species belong to the subphylum Saccharomycotina, which is included in the phylum Ascomycota. Both mitochondrial and nuclear genes have been used to construct the phylogeny of the subphylum Saccharomycotina, using either a single gene or multigenic analysis (Barns et al., 1991; Bowman et al., 1992; Diezmann et al., 2004; Lutzoni et al., 2004; Lopandic et al., 2005; James et al., 2006). Others have carried out phylogenomic analyses with very good results, but the correct placement of some species in the phylogeny remains unresolved owing to the lack of other supporting sequenced genomes (Robbertse et al., 2006; Fitzpatrick et al., 2006).

The search for new genes and the evaluation of their usefulness in phylogenetic studies are very important for understanding phylogenetic relationships and for their potential use in fungal molecular systematics. Genes that are considered good candidates for use in phylogeny present specific features such as vertical transmission, ubiquity and slowly evolving sites. Among these new genes, we can consider those encoding MADS-box transcription factors and GPI-anchor proteins as potential candidates, since they present all the above characteristics. The MADS-box genes encode a eukaryotic family of transcriptional regulators involved in diverse and important biological functions, and the GPI-anchor proteins participate in the interaction with the environment.

The MADS-box proteins have been identified in yeasts, plants, insects, nematodes, lower vertebrates and mammals. These proteins are characterized by a conserved DNA-binding and dimerization domain named the MADS-box (Schwarz-Sommer et al., 1990) after the five founding members of the family: Mcm1 (yeast) (Passmore et al., 1989), Arg80 (yeast) (Dubois et al., 1987) or Agamous (plant) (Yanofsky et al., 1990), Deficiens (plant) (Sommer et al., 1990) and SRF (human) (Norman et al., 1988). In animals and fungi, two distinct types of MADS-box genes have been identified, the SRF-like (type I) and MEF2-like (type II) classes (Shore and Sharrocks, 1995). The MEF2-like amino acid sequences are more closely related to most plant MADS-domain sequences, suggesting that at least one gene-duplication event occurred before the divergence of plants and animals (Alvarez-Buylla et al., 2000). Thus, both type I and type II MADS-box proteins can be found in each kingdom. The organization of type I and type II MADS-box proteins is represented in Figure 5. Type I and type II proteins can be further classified into subfamilies on the basis of shared sequence similarity between their C-terminal extensions and the highly conserved MADS domain. These domains are referred to as SAM in fungal and animal SRF-like proteins, as MEF2 in fungal and animal MEF2-like proteins, and as I, K and C in plant type II proteins (Mueller and Nordheim, 1991; Sharrocks et al., 1993; Riechmann et al., 1996; Riechmann and Meyerowitz, 1997). MADS-box family members generally recognize AT-rich consensus sequences with a highly conserved core of 10 bp: CC(A/T)6GG, known as the CArG box, is the binding site of SRF-like proteins (Treisman, 1990), and CTA(A/T)4TAG is the binding site of MEF2-like proteins (Figure 6) (Pollock and Treisman, 1991).
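
The two 10 bp consensus cores given above translate directly into regular expressions. The following Python sketch scans one strand of a DNA sequence for them; the example sequences are hypothetical, and the function name is illustrative only.

```python
import re

CARG_BOX = re.compile(r"CC[AT]{6}GG")     # CC(A/T)6GG, SRF-like (type I) site
MEF2_SITE = re.compile(r"CTA[AT]{4}TAG")  # CTA(A/T)4TAG, MEF2-like (type II) site


def find_sites(sequence, pattern):
    """Return the 0-based start positions of all matches, overlaps included."""
    seq = sequence.upper()
    # A zero-width lookahead lets overlapping occurrences be reported too.
    lookahead = re.compile("(?=" + pattern.pattern + ")")
    return [m.start() for m in lookahead.finditer(seq)]


print(find_sites("GGCCATATATGGAA", CARG_BOX))
print(find_sites("ttCTAAAAATAGcc", MEF2_SITE))
```

Only the given strand is scanned here. The CArG core is palindromic only for some A/T arrangements, so a complete search would also scan the reverse complement.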

Figure 5. Schematic representation of Type I (SRF-like) and Type II (MEF2-like) protein domains in plants, animals, and fungi. The scale indicates the number of amino acids along the protein. Plant Type II-like proteins have carboxyl-terminal domains that extend beyond 200 amino acids. In plant Type I-like proteins, the "?" indicates carboxyl-terminal domains that are not yet well defined and are of variable length (based on Alvarez-Buylla et al., 2000).

In the yeast Saccharomyces cerevisiae, four MADS-box proteins have been found: Mcm1 and Arg80, which are related to the human SRF, and Rlm1 and Smp1, which belong to the MEF2-like family (Alvarez-Buylla et al., 2000). Rlm1 controls the expression of genes required for cell wall integrity, while Smp1 is involved in the osmotic stress response mediated by the Hog1 signal transduction pathway (Watanabe et al., 1995; Dodou and Treisman, 1997; de Nadal et al., 2003). Arg80 is only required for repression of arginine anabolic genes and induction of arginine catabolic genes (Messenguy and Dubois, 2000), whereas Mcm1, which is also involved in the control of arginine metabolism, plays a pleiotropic role in the cell (Messenguy and Dubois, 1993). Mcm1 is essential for cell viability and controls M/G1 and G2/M cell-cycle-dependent transcription (Althoefer et al., 1995; McInerny et al., 1997), mating (Jarvis et al., 1989), minichromosome maintenance (Passmore et al., 1988), recombination (Elble and Tye, 1992), transcription of Ty elements (Errede, 1993; Yu and Fassler, 1993) and osmotolerance (Kuo et al., 1997).

Figure 6. Structure of Type I (SRF) and Type II (MEF2A) MADS domains in complex with their DNA

target (Homo sapiens). (a) Aligned sequences of SRF and MEF2A MADS domains with α and

β secondary structure assignments. (b) The SRF–core DNA complex with one monomer in red

and the other monomer in green (taken from Pellegrini et al., 1995). DNA is in grey, with

upper and lower DNA strands labelled W and C. (c) SRF DNA sequence (18 bp) with the

palindromic CArG recognition element boxed. (d) MEF2A–DNA complex with one monomer

in red and the other monomer in green (adapted from Santelli and Richmond, 2000). DNA is in

grey. (e) MEF DNA sequence (17 bp) in the crystal structure with MEF2 consensus site boxed

(taken from Santelli and Richmond, 2000).

Other type I MADS-box proteins have also been studied. For instance, in Schizosaccharomyces pombe the MAP1 gene encodes a protein required for cell-type-specific gene expression, in Ustilago maydis UMC1 encodes a protein that regulates the expression of pheromone-inducible genes (Yabana and Yamamoto, 1996; Kruger et al., 1997), and in C. albicans CaMCM1 is crucial for morphogenesis (Rottmann et al., 2003). In the case of type II MADS-box proteins, putative RLM1 orthologues were identified in C. albicans, Paracoccidioides brasiliensis and Aspergillus niger (Damveld et al., 2005; Fernandes et al., 2005; Bruno et al., 2006).

GPI-anchor proteins (GpiPs) contain a C-terminal signal sequence that allows their linkage to a glycosylphosphatidylinositol (GPI) anchor. Proteins destined to be GPI anchored share conserved features: an N-terminal signal sequence for localization to the

endoplasmic reticulum (ER), a C-terminal hydrophobic domain (9 to 24 residues) for transient attachment to the ER membrane, and the so-called ω site, where the protein is cleaved to be ligated to the GPI anchor (Thomas et al., 1990). The ω site is located 9 to 10 amino acids before the C-terminal hydrophobic domain, and its amino acid environment determines whether the protein has a high probability of being GPI anchored. The amino acids at positions ω-1 to ω-11 form a linker region, usually with no charge and no secondary structure (α-helix or β-sheet). The ω site region is composed of small amino acids that fit the catalytic pocket of the transamidase, and finally the spacer region (ω+3 to ω+9) is comparable to the linker, being uncharged and flexible (Eisenhaber et al., 2004). Shortly after protein synthesis in the ER, the preformed GPI anchor replaces the C-terminal transmembrane region (Figure 7).

Figure 7. GPI anchor structure. The core GPI anchor consists of a lipid group (serving as the membrane anchor), a myoinositol group, an N-acetylglucosamine group, three mannose groups, and a phosphoethanolamine group, which ultimately connects the GPI anchor to the protein via an amide linkage, represented in black (Tiede et al., 1999). The addition of mannose groups and the positioning of side chains such as phosphoethanolamine on the GPI anchor contribute to the variety of GPI anchors identified so far. The additional gray groups illustrate the side chains added to the GPI anchor in S. cerevisiae (side chains are specific to each organism).

GPI anchoring is encountered in every eukaryotic cell, from unicellular yeast cells to highly specialized mammalian cells (Bowman et al., 2006; Ferguson, 1999; McConville and Menon, 2000). In S. cerevisiae, most GPI-anchored proteins are involved in the biosynthesis of cell wall compounds, flocculation, protease activity, sporulation, and mating (Caro et al., 1997). These proteins have also been found in A. nidulans, C. albicans, C. glabrata,

Neurospora crassa and Schizosaccharomyces pombe (Caro et al., 1997; de Groot et al., 2003; Eisenhaber et al., 2003; Eisenhaber et al., 2004; Sundstrom, 2002; Weig et al., 2004). It is notable that S. cerevisiae and Schizosaccharomyces pombe seem to possess a smaller number of GpiPs than C. albicans.

The IFF8 gene encodes a putative GPI-anchored protein of unknown function and is part of a group of genes known as the IFF genes (for individual protein file family F). Analysis of the coding sequences in this family revealed gene products with a large, highly homologous N-terminal domain of 340 amino acids shared by every member of the family. The proteins diverge strongly after this domain, both in size and in sequence, including the C-terminal hydrophobic domain that allows linkage to a glycosylphosphatidylinositol (GPI) anchor (Richard and Plaine, 2007).

Objectives

Currently, protein-coding genes are used to resolve deep-level phylogenetic relationships through multigenic and phylogenomic analyses, but the majority of these analyses use only regions that are similar among orthologous gene sequences and do not take into account the other regions, which could provide useful information to infer a more robust phylogeny. Hence, the main purpose of this thesis was to evaluate the use of the MADS-box transcription factors in fungal phylogeny and the use of the IFF8 gene to help resolve the phylogeny of the CUG group in particular.

Therefore, the specific aims of the present work were to:

* Search for MADS-box and IFF8 orthologue genes in the kingdom Fungi.

* Analyze the potential use of these genes to infer phylogenetic relationships according to the current proposal for the kingdom Fungi.

* Infer the phylogeny of Candida species based on a concatenated alignment of the previously predicted genes.

* Analyze the putative adaptive evolution of the RLM1 gene in the subphylum Saccharomycotina, particularly in the Whole Genome Duplication (WGD) group.

* Identify the molecular mechanism that putatively changed the RLM1 gene in the subphylum Saccharomycotina.

3. MATERIAL AND METHODS

3.1. Phylogenetic analysis

3.1.1. Data retrieval

The sequences used in this study were obtained from several different databases (DB). The Rlm1 and Mcm1 protein and DNA sequences from Saccharomyces cerevisiae, as well as the Iff8 protein and DNA sequences from Candida albicans, were used as references. The S. cerevisiae Rlm1 and Mcm1 protein/gene sequences were obtained from GenBank under accession numbers BAA09658/D63340 and CAA88409/Z48502, respectively. The Iff8 protein/gene sequences of C. albicans were obtained from the Candida Database (orf19.8201).

To obtain both nucleotide and amino acid putative orthologue sequences for the 76 fungal species, the protein sequences referred to above were used as queries in TBLASTN searches (BLOSUM62 matrix, E-value cut-off of 10^-5) in the following databases: Candida DB (http://www.candidagenome.org), Saccharomyces DB (http://www.yeastgenome.org), Broad Institute (http://www.broad.mit.edu), Joint Genome Institute (JGI, http://www.jgi.doe.gov), Genolevures (http://cbi.labri.fr), The Institute for Genomic Research (TIGR, http://www.tigr.org), Washington University St. Louis (WUSTL, http://www.wustl.edu), National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov), Sanger Institute (http://www.sanger.ac.uk), Baylor College of Medicine (http://www.hgsc.bcm.edu), Oklahoma University (http://www.genome.ou.edu) and Murcia University (http://mucorgen.um.es). To avoid errors in the subsequent alignments, partial orthologue sequences obtained in this search were completed with information from the closest species. Animal and plant MADS-box sequences were used as outgroups.
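
As an illustration of how an E-value cut-off is applied to search results, the sketch below filters hits written in the standard 12-column BLAST tabular layout (query, subject, identity, length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score). It assumes that hits with E-values at or below the threshold are the ones retained, and the hit lines are hypothetical. The searches in this work were run on the database servers listed above, so this is only a sketch of the filtering logic, not a record of the procedure followed.

```python
def filter_hits(tabular_lines, max_evalue=1e-5):
    """Keep hits whose E-value is at or below max_evalue, best hit first."""
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            hits.append((query, subject, evalue, bitscore))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical hits; the contig names are invented for the example.
example = [
    "# TBLASTN, query Rlm1",
    "Rlm1\tcontig_12\t78.5\t200\t43\t0\t1\t200\t5012\t5611\t3e-90\t310.0",
    "Rlm1\tcontig_07\t31.0\t58\t40\t0\t3\t60\t120\t293\t0.002\t38.1",
    "Rlm1\tcontig_33\t45.2\t93\t51\t0\t1\t93\t880\t1158\t1e-20\t95.5",
]
for hit in filter_hits(example):
    print(hit)
```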

Both outgroup protein/gene sequences were retrieved from Genbank database under

the following accession numbers, Type II: Homo sapiens MEF2D

NP_005911/NM_005920, Xenopus laevis MEF2 AAI12917/BC112916, Danio rerio

MEF2D AAH98522/BC098522, Drosophila melanogaster DMEF2

NP_477021/NM_057673, Caenorhabditis elegans CEMEF2 NP_492441/NM_060040,

Arabidopsis thaliana APETALA1 NP_177074/NM_105581 and Oryza sativa

AGAMOUS ABG21913/DP000011; Type I: Homo sapiens SRF

NP_003122/NM_003131, Xenopus laevis SRF NP_001095218/NM_001101748, Danio

rerio MEF2D NP_001103996/NM_001110526, Drosophila melanogaster DSRF

NP_726438/NM_166667, Caenorhabditis elegans NP_492226/NM_059895,

Arabidopsis thaliana AGL36 NP_850880/NM_180549 and Oryza sativa AGL35

BAD81343/AP002070. No outgroup was used in Iff8 protein and gene analyses.

3.1.2 Sequence alignments

The sequences obtained were aligned with the MUSCLE program (Edgar, 2004) and then imported into MEGA 4.0 (Tamura et al., 2007) for manual editing. The alignments were based on amino acids; nucleotides were therefore aligned as codons. As some parts of the C-terminal regions were too divergent to be confidently aligned, a preliminary NJ tree in Clustal W (Thompson et al., 1994) and a UPGMA tree in MUSCLE were produced for each phylum from an alignment of the conserved C-terminal domains. After this preliminary analysis, the alignment of the less-conserved C-terminal regions from different phyla became much easier, and a new, final total alignment was produced.
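
Aligning nucleotides "as codons" means that the gaps of the protein alignment are transferred to the coding sequence in blocks of three. A minimal Python sketch of this threading step is given below; the function name and the toy sequences are illustrative, and the coding sequence is assumed to be in frame, without its stop codon, and to correspond exactly to the ungapped protein.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Build a codon alignment row from an aligned protein and its CDS."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(ch != gap for ch in aligned_protein)
    if len(cds) % 3 != 0 or n_residues != len(codons):
        raise ValueError("CDS length does not match the ungapped protein")
    row, k = [], 0
    for ch in aligned_protein:
        if ch == gap:
            row.append(gap * 3)    # one protein gap becomes three nucleotide gaps
        else:
            row.append(codons[k])  # copy the codon that encodes this residue
            k += 1
    return "".join(row)


print(thread_codons("M-KL", "ATGAAACTG"))
```

Because gaps are only ever inserted between codons, the reading frame is preserved, which is what later codon-based analyses (for example dN/dS estimation) require.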

3.1.3. Selection of the best evolutionary model to infer phylogeny

Model selection based on the Akaike Information Criterion (AIC) was used to estimate
the best-fit protein evolutionary model for ML. The ProtTest 1.4 program was used, and
the best model was taken to be the one with the greatest likelihood value among the 48
models evaluated by the program (Abascal et al., 2005). The same approach was used to
estimate the best-fit DNA evolutionary model for ML, using the jModeltest 1.0 program
(Posada, 2008) for the likelihood calculations; the best model was again the one with
the greatest likelihood value among the 88 models of evolution evaluated. For Bayesian
inference, model selection was also performed, using the four hierarchies of the
likelihood ratio test to determine the simplest adequate DNA evolutionary model with
the MrModeltest 2.3 (Nylander, 2004) and PAUP 4.0b10 (Swofford, 2002) programs,
together with the required Mrmodelblock file (Annex I).
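For reference, the AIC of a model is 2k - 2lnL, where k is the number of free parameters and lnL the maximized log-likelihood, and the preferred model is the one with the lowest AIC. The sketch below shows the calculation only; the log-likelihood values and parameter counts are hypothetical and are not results of this work.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: (lnL, number of extra free parameters of the model).
# Branch lengths are shared by all candidates and are ignored here for simplicity.
candidates = {
    "JTT": (-12345.6, 0),
    "JTT+G": (-12100.2, 1),
    "JTT+I+G": (-12099.9, 2),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 1))
```

In this toy case the model with the highest raw likelihood (JTT+I+G) is not the one with the lowest AIC, because the small gain in likelihood does not offset the penalty for the extra parameter.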

3.1.4. Phylogenetic reconstruction

Phylogenetic analysis was performed by ML in the PHYML 3.0 program (Guindon and
Gascuel, 2003) for each protein and/or gene alignment, and for each concatenated
alignment, previously obtained. To obtain the phylogenetic tree, the best evolutionary


model determined previously was used and the branch/nodal supports were estimated with
the approximate likelihood ratio test (aLRT), using the Shimodaira-Hasegawa-like
support option (Anisimova and Gascuel, 2006).

Bayesian inference (BI) was performed with the MrBayes 3.2 program (Huelsenbeck and
Ronquist, 2001) for each alignment previously obtained. As for ML, the best
evolutionary model determined in the previous analysis was used to obtain the
phylogenetic tree, and the branch/nodal support, given in this analysis as posterior
probability (PP) values, was calculated with the Metropolis-coupled Markov chain Monte
Carlo method after sampling a large number of different trees. To infer fungal
phylogeny using the RLM1 gene, four chains were run for 4,000,000 generations. For
Saccharomycotina phylogeny, four chains were run for 800,000 generations with the RLM1
gene, for 1,250,000 generations with the MCM1 gene, and for 5,500,000 generations with
the RLM1+MCM1 genes. To infer the CUG group phylogeny, four chains were run for
1,000,000 generations with the IFF8 gene and for 2,000,000 generations with the
RLM1+MCM1+IFF8 genes. Trees were sampled every 100 generations, and trees obtained
before convergence of the Markov chain were not included in the consensus tree. The
remaining samples were used to construct the consensus tree, and the posterior
probability values of the branches/nodes were converted from frequencies to
percentages by importing the trees into PAUP 4.0b10. In BI, these percentages for the
consensus tree are the rough equivalent of an ML search with bootstrap analysis
(Huelsenbeck and Ronquist, 2001).
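The conversion of sampled trees into nodal percentages can be sketched as follows. This is only an illustration of the principle, not the MrBayes or PAUP implementation: each sampled tree is reduced to its set of clades, the samples taken before convergence are discarded, and the frequency of each clade among the remaining samples is expressed as a percentage. The burn-in fraction of 0.25 and the toy samples are assumptions made for the example.

```python
from collections import Counter


def clade_support(samples, burnin_fraction=0.25):
    """Percentage of post-burn-in samples containing each clade.

    samples: list of trees, each given as a set of clades,
             where a clade is a frozenset of taxon names.
    """
    kept = samples[int(len(samples) * burnin_fraction):]  # drop pre-convergence samples
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: 100.0 * n / len(kept) for clade, n in counts.items()}


# Toy chain (hypothetical): 8 sampled trees, the first 2 fall in the burn-in.
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
samples = [{AC}] * 2 + [{AB, CD}] * 3 + [{AB}] * 3
support = clade_support(samples)
print(support[AB], support[CD])
```

With sampling every 100 generations, a run of 4,000,000 generations yields on the order of 40,000 sampled trees per run before the burn-in is removed.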

To visualize the degree of phylogenetic conflict within the concatenated alignments, a
phylogenetic network was generated with the NeighborNet method in the SPLITSTREE 4.1
program (Huson and Bryant, 2006), with 1000 bootstrap replicates.

3.1.5. Interpretation of values considered for nodal support

Previous studies have shown that, to assess nodal support in phylogenetic trees, the
information from posterior probabilities and bootstrap analysis should be combined
(Alfaro et al., 2003; Douady et al., 2003; Reeb et al., 2004). Bayesian methods are
more efficient in recovering accurate nodal support values, but some authors have
indicated that they could be less conservative than bootstrap (Suzuki


et al., 2002; Alfaro et al., 2003), unlike the Shimodaira-Hasegawa (S-H) approach used
in the ML analyses, which is more conservative (http://atgc.lirmm.fr/alrt). Therefore,
a combination of posterior probabilities and Shimodaira-Hasegawa-like support was used
to assess the level of confidence of a specific node in the phylogenetic DNA trees.
Throughout this work, the following scale was used:

- High (strong) support - PP ≥ 95% and S-H ≥ 0.70;

- Medium (moderate) support - PP ≥ 95% and 0.70 > S-H ≥ 0.50, or PP < 95% and S-H ≥ 0.70;

- Low (poor or weak) support - PP ≥ 95% and S-H < 0.50, or PP < 95% and 0.70 > S-H ≥ 0.50;

- No support - PP < 95% and S-H < 0.50.

In protein phylogenetic trees, only the Shimodaira-Hasegawa-like support (S-H) was
considered, using the same scale values.
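The combined scale used for the DNA trees can be written as a small decision function. The sketch below is only a restatement of the scale given above; the function name `classify_support` is hypothetical, PP is given as a percentage and S-H as a value between 0 and 1.

```python
def classify_support(pp_percent: float, sh: float) -> str:
    """Classify a node from its posterior probability (%) and S-H-like support."""
    pp_high = pp_percent >= 95
    if sh >= 0.70:
        return "high" if pp_high else "medium"
    if sh >= 0.50:
        return "medium" if pp_high else "low"
    return "low" if pp_high else "none"


# Examples: a node with PP = 100% and S-H = 0.24 is classed as low support,
# and a node with PP = 99% and S-H = 0.69 as medium support.
print(classify_support(100, 0.24), classify_support(99, 0.69))
```

Requiring both measures to be high before a node is called strongly supported guards against the tendency of posterior probabilities to be less conservative than the S-H-like values.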

3.2. Analysis of adaptive evolution

We applied the approach of Yang and coworkers (Yang et al., 2000; Swanson et al., 2001)
to test for amino acids under positive selection in the Rlm1 protein. To determine
which model best fitted the data, likelihood ratio tests (LRTs) were performed by
comparing twice the difference in log-likelihood values between two models with the χ²
distribution. The codon-substitution models M0, M1a, M2a, M3, M7 and M8, which use a
statistical distribution to describe the variation of ω = dN/dS among codon sites, were
used (Nielsen and Yang, 1998; Wong et al., 2004; Yang et al., 2005). We used LRTs to
make three comparisons to find out whether positive selection has played a role in the
molecular evolution of this gene: (i) the one-ratio model (M0) was compared with the
discrete model (M3), (ii) the neutral model (M1a) was compared with the selection model
(M2a), and (iii) the beta model (M7) was compared with the beta and ω model (M8).
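The three likelihood ratio tests can be sketched as below. The statistic is 2(lnL1 - lnL0), compared with a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. The degrees of freedom used here (4 for M0 vs M3 with three site classes, 2 for M1a vs M2a and for M7 vs M8) follow the usual convention for these comparisons and are stated as an assumption; the log-likelihood values are hypothetical and are not results of this work.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form below only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i        # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def lrt(lnl_null: float, lnl_alt: float, df: int):
    """Return the LRT statistic 2*(lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods: (null model, alternative model, lnL0, lnL1, df).
comparisons = [
    ("M0", "M3", -5230.4, -5101.7, 4),
    ("M1a", "M2a", -5120.3, -5110.3, 2),
    ("M7", "M8", -5105.9, -5098.2, 2),
]
for null, alt, lnl0, lnl1, df in comparisons:
    stat, p = lrt(lnl0, lnl1, df)
    print(f"{null} vs {alt}: 2dlnL = {stat:.1f}, df = {df}, p = {p:.3g}")
```

A significant test in comparisons (ii) or (iii) indicates that a model allowing a class of sites with ω > 1 fits the data better than the corresponding null model.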

To identify particular sites in the gene that were likely to have evolved under
positive selection, the Bayes Empirical Bayes (BEB) analysis was used (Yang and
Bielawski, 2000; Yang et al., 2005). This estimates the posterior probability (PP)
at each site. If a particular codon site presents a PP value higher than 95% and ω higher


than 1, that site is inferred to be under positive selection (Nielsen and Yang, 1998;
Yang and Bielawski, 2000; Yang et al., 2005). These calculations were performed with
the CODEML program in PAML 4.1 (Yang, 1997) and the HYPHY 1.0 program (Kosakovsky et
al., 2005) to analyze positive selection in the MADS-box orthologues of the RLM1 gene
in plants, animals and fungi, and in the entire RLM1 gene within the Saccharomycotina
group.
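The site-level criterion can be expressed as a simple filter. The sketch below assumes that the per-site posterior probabilities and ω estimates have already been extracted from the program output into tuples; it does not parse CODEML or HYPHY files, and the values shown are hypothetical.

```python
def positively_selected_sites(sites, pp_cutoff=0.95):
    """Return codon positions with PP above the cutoff and omega above 1.

    sites: iterable of (codon_position, posterior_probability, omega_estimate).
    """
    return [pos for pos, pp, omega in sites if pp > pp_cutoff and omega > 1.0]


# Hypothetical per-site values (position, PP, omega).
beb_sites = [
    (12, 0.991, 3.42),   # passes both criteria
    (57, 0.930, 2.80),   # omega > 1 but PP too low
    (101, 0.970, 0.85),  # PP high but omega <= 1
]
print(positively_selected_sites(beb_sites))
```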

A seventh model, the mechanistic empirical combined (MEC) model, was further used to
test for positive selection in the subphylum Saccharomycotina (Doron-Faigenboim and
Pupko, 2006). This model was statistically tested by comparing its second-order
(corrected) Akaike Information Criterion (AICc) score with that of Model 8a (Wong et
al., 2004). The calculations were performed with the SELECTON 2.4 program (Stern et
al., 2007).

3.3. Analysis of the repetitive region in the subphylum Saccharomycotina

To identify the putative molecular mechanism that contributed to the divergence of the
MADS-box RLM1 gene in the subphylum Saccharomycotina, the C-terminal region was
analysed. The three reading frames of the sequences that presented a repetitive region
in the C-terminal were analysed with the VIRTUAL RIBOSOME 1.1 program (Wernersson,
2006) to determine the possible amino acids encoded by that region. The corresponding
ancestral RLM1 gene, from which the most recent Saccharomycotina species reported in
this study diverged, was reconstructed with the MrBayes 3.2 program. The
Saccharomycotina Rlm1 protein sequences were aligned with this ancestral sequence to
identify the presence of an ancestral repetitive region in the C-terminal.
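Translation in the three forward reading frames can be illustrated with the following sketch, which is not the VIRTUAL RIBOSOME implementation. It uses the standard genetic code by default; the optional `overrides` argument is a hypothetical convenience for alternative codes, for example the reassignment of CTG (CUG) to serine in the CUG group of yeasts. The input sequences are toy examples.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frames(dna: str, overrides=None):
    """Translate a DNA sequence in the three forward reading frames."""
    table = {**STANDARD, **(overrides or {})}
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        # Codons with ambiguous bases are reported as "X".
        frames.append("".join(table.get(c, "X") for c in codons))
    return frames


print(translate_frames("ATGGCCTAA"))
print(translate_frames("ATGCTGTAA", overrides={"CTG": "S"})[0])
```

Comparing the three translations of a repetitive region shows which amino acid repeats each frame would encode, and therefore how a frameshift could change the composition of the C-terminal.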


RESULTS AND DISCUSSION


4. RESULTS AND DISCUSSION

4.1. Compilation of data

The fungal Rlm1 and Mcm1 orthologue protein/gene sequences identified in this analysis
comprised 76 putative orthologues each. Of the putative Rlm1 orthologue sequences, 47
were identified directly in the databases, 22 were predicted by homology and 7 were
partial sequences (Annex II). For Mcm1, 60 putative orthologue sequences were found in
the databases, 14 were predicted by homology and 2 were partial sequences (Annex III).
For the Iff8 protein/gene, only 9 putative orthologue sequences were identified (Annex
IV). Orthologue sequences of other MADS-box proteins, Smp1 and Arg80, were also
identified in the Saccharomyces sensu stricto and Saccharomyces sensu lato groups, but
were not used in this study (Annexes V and VI).

Introns were detected only in the RLM1 and MCM1 genes (Annexes II and III); IFF8
sequences did not contain introns (Annex IV). The number of introns in the RLM1 and
MCM1 genes varied from none to eight. The highest variability in intron number was
observed within Mucoromycotina and Basidiomycota, whereas the Ascomycota presented a
more constant pattern, with 0 to 2 introns. These observations suggest a bias towards
intron loss, rather than gain, in MADS-box transcription factors in the kingdom Fungi:
the most recently diverged phylum, Ascomycota, has at most two introns, unlike the
older phyla, in which the number varies and can reach eight.

In the present work, the results suggest that during the divergence of fungal species
intron loss in RLM1 proceeded toward the 3′ end, because in some recently diverged
species the intron at the 5′ end, inside the MADS-box of this gene, is the only one
maintained. In MCM1, a bias to lose introns at both the 5′ and 3′ ends was identified
but, as in the RLM1 gene, one intron was maintained inside the MADS-box. Curiously,
the first intron towards the 5′ end lay inside the MADS-box in all the genes with
introns. This was also true of MCM1, where the first intron is also inside the
MADS-box, but not at the 5′ end, since the MADS-box is in the middle of the gene.


It is known that introns are evenly distributed within the coding sequences of
intron-rich organisms, but are biased toward the 5′ end of genes in intron-poor
organisms (Nielsen et al., 2004). Current models attribute this bias to intron loss at
the 3′ end through a poly-adenosine-primed reverse transcription mechanism, which was
demonstrated in experiments with intron-containing Ty elements in yeast (Fink, 1987;
Boecke et al., 1985). However, a study carried out in different fungal species found
no increased frequency of intron loss toward the 3′ end of genes; instead, loss
occurred in the middle of the genes, suggesting either other mutational mechanisms
(e.g., internally primed reverse transcription) or selective pressure to
preferentially conserve introns near the 5′ and 3′ ends of genes (Nielsen et al.,
2004). To date, the mechanism by which introns are inserted into or deleted from gene
loci is not well understood.

4.2. Phylogeny based on MADS-box transcription factors

4.2.1. Rlm1 transcription factor

Fungal phylogeny based on the Rlm1 transcription factor was inferred from both amino
acid and DNA sequences. The phylogeny based on amino acid sequences was inferred only
by maximum likelihood (ML), using the PHYML 3.0 program. Of the various models of
evolution tested, the Jones, Taylor and Thornton (JTT) model was selected, with the
options to estimate the proportion of invariant sites, the gamma distribution and the
amino acid frequencies. For the phylogeny based on nucleotides, the General Time
Reversible (GTR) model, with the proportion of invariant sites and gamma distribution
options, was selected for both ML analysis and Bayesian inference.

The phylogeny based on protein sequences resolved the Fungi as a clade strongly
supported by a Shimodaira-Hasegawa (S-H) like value of 0.96 in the ML analysis.
However, the resolution of Fungi into a clade based on DNA sequences revealed a
conflict: the S-H value was 0.24 for ML, while the Bayesian posterior probability (PP)
was 100%, which on the scale used here means low nodal support (Figures 8 and 9). This
result could be due to the low number of RLM1 DNA sequences from species other than
fungi used in this study. Despite the low nodal support in the phylogeny based on DNA
sequences, the topology obtained is strongly supported by previous studies (Wainright
et al., 1993; Baldauf and Palmer, 1993).


The Mucoromycotina 'Zygomycota' are primarily coenocytic (sometimes producing cell
septa) and undergo sexual reproduction by forming a thick-walled resting spore called
a zygospore, which clearly distinguishes them from the Basidiomycota and Ascomycota
(Dikaryomycota) and from other phyla. Morphologically, the Ascomycota and
Basidiomycota both possess regularly septate hyphae and a dikaryotic life stage, but
differ in the structures involved in meiosis and sporulation.

The fungal phylogeny inferred by these analyses clearly placed the 76 fungal species
into one subphylum incertae sedis, the Mucoromycotina 'Zygomycota', and two
well-supported phyla, the Basidiomycota and the Ascomycota (Figures 8 and 9). The
Mucoromycotina 'Zygomycota' formed a clade moderately supported by protein (S-H value
of 0.64) as well as by nucleotide analysis (S-H=0.69 ML and PP=99%). Both Ascomycota
and Basidiomycota formed clades highly supported in the protein analysis, with S-H
values of 0.84 and 0.91, respectively, but these clades received medium or low support
in the nucleotide analyses, with S-H values of 0.67 and 0.57 and PP values of 98% and
61%, respectively. A sister relationship between the Basidiomycota and Ascomycota (the
'Dikaryomycota') has been proposed in the literature, and the group has recently been
recognized as the subkingdom Dikarya (James et al., 2006; Hibbett et al., 2007). The
results obtained in these analyses highly support this sister relationship (S-H=0.98
ML in protein, S-H=0.97 ML and PP=100% in nucleotides).

In the present analysis, three species belonging to the subphylum Mucoromycotina
'Zygomycota' were used, two of which, Mucor circinelloides and Rhizopus oryzae, are
sisters joined by a well-supported node in the phylogenetic analyses using nucleotide
(S-H=0.87, PP=99%) and amino acid (S-H=0.92) sequences (Figures 8 and 9). This result
agrees with the current taxonomic classification, in which these two species form part
of the family Mucoraceae (Voigt and Wöstemeyer, 2001). The other species included in
this subphylum, Phycomyces blakesleeanus, is a member of the family Phycomycetaceae.
The topology obtained agrees with previous results, since both the Mucoraceae and
Phycomycetaceae families are part of the order Mucorales within the subphylum
Mucoromycotina (Hibbett et al., 2007).


Figure 8. Fungal phylogeny based on Rlm1 protein sequences. This phylogeny was obtained by
maximum likelihood. Node values are Shimodaira-Hasegawa-like supports. Species with
partial sequences are: Kluyveromyces waltii, Saccharomyces paradoxus, Ascosphaera apis,
Botrytis cinerea, Chaetomium globosum, Epichloë festucae and Coccidioides immitis.


The phylum Basidiomycota is strongly supported as a monophyletic clade by both protein
and nucleotide analyses, as are the classes Agaricomycetes (S-H=0.99 ML in protein,
S-H=0.97 ML and PP=100% in nucleotides), Tremellomycetes (S-H=1.0 ML in protein,
S-H=1.0 ML and PP=100% in nucleotides) and Ustilaginomycetes (S-H=0.97 ML in protein,
S-H=0.97 ML and PP=91% in nucleotides) within this phylum. The classification of
Ustilaginomycetes, Microbotryomycetes, Tremellomycetes and Agaricomycetes adopted here
follows the current taxonomic proposal (Hibbett et al., 2007). The Ustilaginomycetes, a
class of the subphylum Ustilaginomycotina, are represented by one species of the order
Ustilaginales, Ustilago maydis, and by one of the order Malasseziales (class incertae
sedis), Malassezia globosa. These two species received high support as sister groups
in protein and nucleotide analyses. However, in the phylogeny inferred by the Bayesian
method a topological conflict was observed: the Ustilaginomycetes and the
Schizosaccharomycetes, the latter a member of the phylum Ascomycota, were grouped as
sister clades with low support (PP=62%). The class Microbotryomycetes, a member of the
subphylum Pucciniomycotina, is represented by only one species, Sporobolomyces roseus,
and is placed as a sister group of the subphylum Agaricomycotina with high nodal
support (S-H=0.96 ML in protein, S-H=0.86 ML and PP=95% in nucleotides). The classes
Tremellomycetes and Agaricomycetes are members of the subphylum Agaricomycotina. These
two classes form well-supported sister clades in the protein analysis (S-H=0.99), but
obtained low support in the nucleotide analyses (S-H=0.42 ML and PP=86%). The class
Tremellomycetes is represented by two species of the order Tremellales, Cryptococcus
neoformans and Cryptococcus gattii, which are strongly supported as sister taxa by
protein (S-H=1.0) and nucleotide (S-H=1.0, PP=100%) analyses. The class Agaricomycetes
is represented by four species from the orders Agaricales (Coprinus cinereus, Laccaria
bicolor), Polyporales (Postia placenta) and Corticiales (Phanerochaete chrysosporium),
the latter two orders being considered subclass incertae sedis. The orders Polyporales
and Corticiales are resolved as well-supported sister groups in the protein analysis
(S-H=0.95), but with low support in the nucleotide analyses (S-H=0.0, PP=80%). The
order Agaricales is resolved as the sister group of the Polyporales/Corticiales group
with high nodal support in protein (S-H=0.99) and nucleotide (S-H=0.97, PP=100%)
analyses.


Figure 9. Fungal phylogeny based on RLM1 DNA sequences. Node values correspond to posterior
probabilities (left) and Shimodaira-Hasegawa-like supports (right). In the maximum
likelihood analysis, S-H values of 0.67 (a) for the Schizosaccharomycetes clade within
the phylum Ascomycota, 1.0 (b) for V. polyspora as basal taxon of the 'Saccharomyces
complex', 0.32 (c) for Epichloë festucae as basal taxon of the Trichoderma group and
0.04 (d) for A. terreus as basal taxon of the A. nidulans-A. niger group were observed.
Species with partial sequences are: Kluyveromyces waltii, Saccharomyces paradoxus,
Ascosphaera apis, Botrytis cinerea, Chaetomium globosum, E. festucae and Coccidioides
immitis.


In this study, of the three subphyla recognized within the Ascomycota (Eriksson et al.,
2004; Hibbett et al., 2007), only the Taphrinomycotina presented a topological
conflict, as mentioned before. The species in this subphylum formed a monophyletic
clade with high nodal support in protein (S-H=0.99) and nucleotide (S-H=1.0, PP=100%)
analyses. The subphylum Saccharomycotina was also monophyletic, with well-supported
nodal values (S-H=0.89 in protein and S-H=0.88, PP=99% in nucleotide analyses). The
subphylum Pezizomycotina was likewise resolved as a monophyletic clade with high nodal
support values (S-H=0.98 in protein and S-H=0.98, PP=100% in nucleotide analyses), as
were its classes Leotiomycetes, Sordariomycetes, Dothideomycetes and Eurotiomycetes
(Figures 8 and 9). The class Leotiomycetes was represented only by the order
Helotiales, to which the species Botrytis cinerea and Sclerotinia sclerotiorum belong,
with high nodal support (S-H=1.0 in protein and S-H=1.0, PP=100% in nucleotide
analyses).

The class Sordariomycetes is represented by the orders Sordariales, Hypocreales and
Phyllachorales (subclass incertae sedis) and by the family Magnaporthaceae (order
incertae sedis). The orders Sordariales and Phyllachorales and the family
Magnaporthaceae form a highly supported group (S-H=0.96 in protein and S-H=0.98,
PP=100% in nucleotide analyses), with a single nodal conflict in the placement of
Neurospora crassa in the protein analysis, since this species should form a clade with
Chaetomium globosum and Podospora anserina. The order Hypocreales is a sister clade of
the Sordariales/Phyllachorales/Magnaporthaceae group with high nodal support (S-H=1.0
in protein and S-H=1.0, PP=100% in nucleotide analyses), but a conflict was also
observed in the placement of Epichloë festucae (family Clavicipitaceae). This species
was placed as a sister clade of the family Hypocreaceae (Trichoderma atroviride, T.
reesei, T. virens) by ML in both protein and nucleotide analyses, but as a sister clade
of the family Nectriaceae (Nectria haematococca, Fusarium graminearum, F. oxysporum, F.
verticillioides) by BI, although with low nodal support in all analyses (S-H=0.12 ML
in protein, S-H=0.32 ML and PP=63% in nucleotide).

The class Dothideomycetes is represented by the orders Capnodiales and Pleosporales,
which grouped as monophyletic sister clades with high nodal support (S-H=0.99 in
protein, S-H=0.97, PP=90% in nucleotide analyses) (Figures 8 and 9). The class
Eurotiomycetes is represented by the orders Onygenales and Eurotiales and


presented monophyly. In the order Onygenales, the families Ascosphaeraceae
(Ascosphaera apis) and Arthrodermataceae (Microsporum gypseum) are sister clades with
high nodal support in protein (S-H=0.92) and nucleotide (S-H=0.92, PP=99%) analyses, as
are the families Onygenaceae (Uncinocarpus reesii) and Gymnoascaceae (Coccidioides
immitis, Coccidioides posadasii). The family Ajellomycetaceae is represented by
Paracoccidioides brasiliensis, Histoplasma capsulatum and Blastomyces dermatitidis,
which form a clade highly supported by protein (S-H=1.0) and nucleotide (S-H=0.99,
PP=100%) analyses. A nodal conflict was observed in the placement of the
Ascosphaeraceae/Arthrodermataceae group: the protein analysis showed it as a sister
group of the Onygenaceae/Gymnoascaceae group, and the nucleotide analyses as a sister
group of the Ajellomycetaceae clade. However, nodal support was low in both the protein
(S-H=0.58) and nucleotide (S-H=0.04, PP=67%) analyses. The order Eurotiales is
represented only by the family Trichocomaceae, with high nodal support (S-H=0.99 in
protein, S-H=1.0, PP=100% in nucleotide analyses). Talaromyces stipitatus and
Penicillium marneffei are sister taxa (Figures 8 and 9). The species Aspergillus
clavatus, A. fumigatus and Neosartorya fischeri formed a well-supported group that is
sister to the group formed by A. oryzae, A. flavus, A. terreus, A. nidulans and A.
niger, which is also well supported (Figures 8 and 9). A conflict was observed only in
the placement of A. terreus: in the Bayesian analysis this species grouped as a sister
taxon of A. oryzae and A. flavus, while in the ML analysis it was placed together with
A. nidulans and A. niger. However, both options had low support. The classes
Leotiomycetes and Sordariomycetes formed sister clades with good nodal support, while
the classes Dothideomycetes and Eurotiomycetes also formed sister clades, but with
moderate support (S-H=0.85 in protein, S-H=0.89, PP=86% in nucleotide analyses).

4.2.2. Mcm1 transcription factor

In the fungal phylogeny based on the Mcm1 protein/gene, only the ML analysis of protein
sequences was considered, since the nucleotide analysis gave a topology that
contradicted the one obtained with protein because of the large differences in MCM1
sequence length (Figure 10). BI was not performed.

The phylogeny inferred from Mcm1 protein analysis coincides in the main

points with the one inferred from Rlm1 protein/gene analyses. The Fungi were resolved


as a single, strongly supported clade (S-H=0.82). The major differences in the
overall topology concern the following clades:

-The subphylum Mucoromycotina 'Zygomycota' presented well-supported polyphyly between
the families Phycomycetaceae and Mucoraceae, whereas with Rlm1 it was monophyletic.

-Within the phylum Basidiomycota, the species Malassezia globosa formed a sister group
with the class Schizosaccharomycetes from the phylum Ascomycota with a good support
value, contrary to the Rlm1 protein analysis, which correctly placed the
Schizosaccharomycetes within the Ascomycota.

-In the phylum Ascomycota, the class Saccharomycetes is present as a monophyletic
group, but with lower nodal support than in the Rlm1 analyses, where it is well
supported.

-Within the subphylum Pezizomycotina, the group of the class Sordariomycetes formed by
the orders Sordariales and Phyllachorales and the family Magnaporthaceae presented
Magnaporthe grisea as the basal clade, with moderate nodal support, contrary to the
phylogeny with the Rlm1 protein/gene, which presented Verticillium dahliae as the basal
taxon.

-The basal taxon in the family Hypocreaceae is Trichoderma atroviride, which disagrees
with the topology obtained with the Rlm1 protein analyses, which places T. reesei as
the basal taxon.

-In the class Dothideomycetes, the most recently diverged taxon of the order
Pleosporales differs from the one found in the topology with the Rlm1 protein/gene
analyses: with Mcm1 it is Alternaria brassicicola and with Rlm1 it is Pyrenophora
tritici-repentis, although with Mcm1 the nodal support is very low.

-Within the class Eurotiomycetes, Ascosphaera apis was found to be basal to all the
orders that constitute this clade, while the Rlm1 analyses placed it as a sister taxon
to Microsporum gypseum.

-The clade Arthrodermataceae (Microsporum gypseum) is a sister clade of the
Onygenaceae/Gymnoascaceae group, but with low nodal support, contrary to the nucleotide
RLM1 analysis, which joined it to the Ajellomycetaceae clade, also with low support.


Figure 10. Fungal phylogeny based on Mcm1 protein sequences. This phylogeny was obtained by
maximum likelihood. Node values are Shimodaira-Hasegawa-like supports. Species with
partial sequences are: Kluyveromyces waltii, Blastomyces dermatitidis, Botrytis cinerea
and Cryptococcus gattii.


-The order Eurotiales is weakly supported, while the group formed by the taxa A. niger,
A. fumigatus and Neosartorya fischeri is well supported, contrary to the Rlm1
phylogeny, where A. clavatus was joined to the latter two taxa.

-The placement of the taxon A. terreus is also different in this analysis: it was
placed as a sister taxon to A. oryzae and A. flavus, and only in the Rlm1 gene/protein
analyses is its placement not well supported. Likewise, A. clavatus formed a highly
supported group with Neosartorya fischeri and A. fumigatus in the Rlm1 protein/gene
analyses, but had an uncertain placement with Mcm1.

4.2.3. Placement of fungal phylogeny based on MADS-box transcription factor within the current literature

Several phylogenetic studies have demonstrated the close relationship of the kingdoms
Animalia and Fungi (Baldauf and Palmer, 1993; Baldauf et al., 2000; Lang et al., 2002).
These two kingdoms are now known to be part of a larger group, termed the
Opisthokonts, that also includes the phylum Choanozoa (Cavalier-Smith and Chao, 1995;
Ragan et al., 1996). Together, the Opisthokonts and the phylum Amoebozoa form the group
Unikonts (Cavalier-Smith, 2002b). In the present work the phylum Choanozoa was not
included, but the close relationship between the two kingdoms, Animalia and Fungi, was
observed using species from the kingdom Plantae as outgroup. The kingdom Plantae is
part of the group Bikonts (Cavalier-Smith, 2002b). The phylogenetic topology obtained
in this study also agrees with the topology obtained in a study that explains the
origin of the MADS-box proteins in Animalia, Fungi and Plantae (Alvarez-Buylla et al.,
2000).

Regarding the kingdom Fungi, previous studies based on multigene and rDNA analyses
indicated that the phyla Microsporidia, Chytridiomycota, Neocallimastigomycota and
Blastocladiomycota and the subphyla incertae sedis Mucoromycotina, Zoopagomycotina,
Entomophthoromycotina and Kickxellomycotina are part of the earliest known divergences
that took place during fungal evolution (Bruns et al., 1992; Berbee and Taylor, 1993;
Tanabe et al., 2000; Lutzoni et al., 2004; Blackwell et al., 2006; Fitzpatrick et al.,
2006; James et al., 2006). However, due to the low number of available sequences
across all the phyla that represent the kingdom


Fungi, only sequences from species within Basidiomycota and Ascomycota phyla and

the subphylum Mucoromycotina were used in this study.

The subphylum Mucoromycotina is represented by species that belonged to the

ex-phylum Zygomycota (Hibbet et al., 2007). The results obtained in our work, showed

that the subphylum Mucoromycotina is an early diverging clade (Figures 8 and 9) which

agrees with previous studies. The subphylum Mucoromycotina presented monophyly

with moderate support in the Rlm1 sequence analyses, in agreement with phylogenetic

analyses inferred with ACT1, RPB1, RPB2, EF-1α, small and large subunit rDNA genes

(Voigt and Wöstemeyer, 2001; James et al., 2006), and polyphyly in the analysis with the

MCM1 gene (Figure 10), as was found in other studies (Lutzoni et al., 2004). To

resolve this monophyly/polyphyly conflict the inclusion of sequences from other

species within the subphylum Mucoromycotina, as well as from the other subphyla

incertae sedis would be important. The use of additional protein-coding genes would

also help to resolve this conflict.

Basidiomycota includes about 30,000 species of rusts, smuts, yeasts, and

mushroom fungi (Kirk et al., 2001). Most of them are characterized by their meiospores

(basidiospores) on the exterior of typically club-shaped meiosporangia (basidia).

Phylogenetic relationships among the three subphyla of Basidiomycota are still

uncertain. The subphylum Pucciniomycotina is primarily distinguished by containing

the rust fungi (7000 species), which are primarily pathogens of land plants. In our study,

the species representing the subphylum Pucciniomycotina grouped together with the

species of subphylum Agaricomycotina (Figures 8, 9 and 10). This result is not

consistent with previous studies that suggested a sister group relationship with the group

formed by the subphyla Ustilaginomycotina and Agaricomycotina (Lutzoni et al., 2004;

James et al., 2006). The result from our study could be due to the use of only one

representative of the subphylum Pucciniomycotina, Sporobolomyces roseus, and also

to a long-branch attraction artifact, since the species used presented large MADS-box

genes, like the ones observed in the representatives of the class Tremellomycetes

from the subphylum Agaricomycotina.

The Ustilaginomycotina includes 1500 species of true smut fungi and yeasts,

most of which cause systemic infections of angiosperm plant hosts. The

Agaricomycotina includes almost two-thirds of the known basidiomycetes, including

the vast majority of mushroom forming fungi. Much of the morphological diversity

exemplified in mushroom fruiting bodies is the result of radiations of certain lineages

within the Agaricomycotina, and recovering their relationships with confidence has

proven very difficult (Binder et al., 2005; Moncalvo et al., 2002). In our work, early-

diverging lineages in the Agaricomycotina are strongly supported, which include

parasitic and/or saprotrophic fungi capable of dimorphism or yeast-like phases, in

agreement with previous studies that found the same relationships among species of this

subphylum (Lutzoni et al., 2004; Fitzpatrick et al., 2006, James et al., 2006).

Ascomycota is the largest phylum within the Fungi, characterized by the

production of meiospores (ascospores) in specialized sac-shaped meiosporangia (asci),

which may or may not be produced within a sporocarp (ascoma). The majority of the

species studied in our analysis belonged to this phylum. Within the Ascomycota three

main subphyla are recognized, the Taphrinomycotina, the Pezizomycotina and the

Saccharomycotina. Results from this study showed that all subphyla of Ascomycota

were well supported as monophyletic groups in the phylogenetic tree presented in

Figure 9.

In the results obtained with the Mcm1 protein and RLM1 gene (Bayesian

inference), the class Schizosaccharomycetes from the subphylum Taphrinomycotina sits

outside the other subphyla from Ascomycota. Taphrinomycotina has been resolved as

the earliest diverging monophyletic clade (Liu et al., 1999; Fitzpatrick et al., 2006;

James et al., 2006; Liu et al., 2006; Liu et al., 2009). It includes a diverse group of

species that exhibit yeast-like (for example, Pneumocystis carinii) and dimorphic (for

example, Taphrina deformans) growth forms. The incongruence observed in the

placement of the Schizosaccharomycetes could be due to the fact that in the MADS-box

proteins identified for Ustilaginomycotina and Schizosaccharomycetes a high

variability in the C-terminal region was observed. This variability caused a conflict in

the topology determined by Bayesian inference and maximum likelihood for the RLM1 gene

and Mcm1 protein, respectively (Figures 9 and 10), producing the paraphyly observed

and grouping these species. Another hypothesis to explain the incongruence could be

the incorrect or partial assembly of genomes, which would affect the correctness of gene

sequences in the databases; for instance, the MCM1 gene sequence of Blastomyces

dermatitidis was identified, but part of the N-terminal region did not match the query

sequence. Thus, this sequence had to be corrected and completed with the N-terminal region

from closely related species. Another reason for this incongruence could be the robustness of the

method for phylogenetic inference, since the aLRT approach in maximum likelihood

estimates the probability of a branch being correct better than bootstrapping, and is

more conservative than Bayesian posterior probabilities (Anisimova and Gascuel, 2006;

Hall and Salipante, 2007). That could explain why the class Schizosaccharomycetes

was correctly placed in the analysis with Rlm1 protein and DNA analyses by maximum

likelihood and aLRT test.

The subphylum Saccharomycotina consists of the ‘true yeasts’, including

baker’s yeast (Saccharomyces cerevisiae) and Candida albicans, the most frequently

encountered fungal pathogen of humans, and its monophyly has been demonstrated in

several studies (Lutzoni et al., 2004; Fitzpatrick et al., 2006; James et al., 2006; Suh et

al., 2006). Results obtained in this work are also in agreement with the previous studies

in which three clades, the CUG, the Saccharomyces sensu stricto and the

Saccharomyces sensu lato are clearly shown (Figures 8, 9 and 10).

Pezizomycotina is the largest subphylum of Ascomycota and includes the vast

majority of filamentous and fruit-body-producing species. Within the Pezizomycotina a

number of well-defined classes are observed, namely the Sordariomycetes, the

Leotiomycetes, the Eurotiomycetes and the Dothideomycetes, whose relationships have been

the subject of many debates in previous studies (Berbee, 1996; Liu et al., 1996). The

topology obtained with Rlm1 and Mcm1 protein/gene in this study indicated that

Leotiomycetes and Sordariomycetes are well-supported sister clades (Figures 8, 9 and

10). This result is in agreement with several previous analyses, like the rDNA-based

analysis (Lumbsch et al., 2005), multigenic analyses (James et al., 2006; Spatafora et

al., 2006) and analyses based on complete fungal genomes (Fitzpatrick et al., 2006;

Robbertse et al., 2006), but disagrees with one analysis of a four-gene

combined dataset that placed the Dothideomycetes as a sister group to the

Sordariomycetes (Lutzoni et al., 2004).

In this study, within the Sordariomycetes, the inferred phylogenetic relationships

amongst the Sordariomycetidae organisms concur with previous phylogenetic studies

(Berbee, 2001). However, in the order Sordariales the placement of Magnaporthe grisea

is not well defined: the analysis with Mcm1 protein places it as the basal taxon, whereas

the analysis with Rlm1 protein/gene places Verticillium dahliae as the basal taxon.

These results disagree with the phylogeny of Sordariomycetes obtained by other studies,

which placed V. dahliae as the sister taxon of the Hypocreales (Zhang et al., 2006).

Likewise, in this study the inferred phylogeny based on Mcm1 protein and RLM1 gene

(Bayesian inference) proposes that Epichloe festucae is a sister taxon of the group

formed by Fusarium species (Figures 9 and 10), whereas the analysis of Rlm1 protein/gene

(maximum likelihood) placed it as a sister taxon of the group formed by Trichoderma

species. However, in our study only one species from the family Clavicipitaceae was used.

The class Dothideomycetes forms a well-supported sister group with the

Eurotiomycetes in the analysis of Rlm1 protein/gene, but not in the analysis of Mcm1

protein (S-H=0.0). The former result agrees with the previous analysis carried out with

four combined genes which also grouped together the Dothideomycetes and the

Eurotiomycetes (Lutzoni et al., 2004). Conversely, a concatenated alignment of 42

fungal genomes inferred that Stagonospora nodorum (a member of the Dothideomycetes)

is more closely related to the Sordariomycetes and Leotiomycetes lineages (Fitzpatrick

et al., 2006). Another study, based on 17 Ascomycota genomes and using a

concatenated alignment, reported conflicting inferences regarding the phylogenetic

position of S. nodorum (Robbertse et al., 2006). The phylogenies reconstructed based

on 17 genomes, using NJ and ML methods, inferred a sister group relationship between

S. nodorum and Eurotiomycetes (Robbertse et al., 2006). This work is in accordance

with a supertree inferred based on 42 complete genomes (Fitzpatrick et al., 2006) and

the tree obtained with Rlm1 protein/gene and Mcm1 protein analyses. However, the

same phylogenomic study inferred, by using maximum parsimony, the placement of S.

nodorum at the base of the Pezizomycotina (Robbertse et al., 2006). The topology

inferred in this work clearly showed two sister clades (orders Capnodiales and

Pleosporales), which concurs with previous phylogenetic studies (Schoch et al., 2006).

In this study, within the Eurotiomycetes class there is only a clade corresponding

to the order Onygenales (Histoplasma capsulatum, Blastomyces dermatitidis,

Paracoccidioides brasiliensis, Coccidioides immitis, Coccidioides posadasii,

Uncinocarpus reesii, Microsporum gypseum and Ascosphaera apis). The Onygenales

clade is particularly interesting as it contains mainly animal pathogenic species. One

of them, Coccidioides immitis, was initially classified as a protist, but

further research showed it was a fungus, and separate studies placed it in three different

divisions of the former group named Eumycota (Rixford and Gilchrist, 1896; Ophuls,

1905; Ciferri and Redaelli, 1936; Baker et al., 1943). Subsequent ribosomal phylogeny

studies (Pan et al., 1994; Bowman et al., 1996) suggested a close phylogenetic

relationship between C. immitis and U. reesii, excluding H. capsulatum. The

phylogenies based on Rlm1 and Mcm1 protein/gene agree with the placement of C.

immitis, C. posadasii and U. reesii as sister taxa, representing two families,

Gymnoascaceae and Onygenaceae. However, a conflict was observed in the placement

of Ascosphaera apis, which formed a clade with Microsporum gypseum in the Rlm1

protein/gene analysis but appeared as a basal taxon in Eurotiomycetes in the Mcm1

protein analysis. The obtained results disagree with the Eurotiomycetes phylogeny

study, in which Ascosphaera apis (Ascosphaeraceae) formed a sister clade in the

Ajellomycetaceae (H. capsulatum, P. brasiliensis and B. dermatitidis) and Microsporum

gypseum, a sister clade of Gymnoascaceae (Geiser et al., 2006). The Eurotiomycetes

branch containing the Eurotiales clade showed a close relationship between

Talaromyces stipitatus and Penicillium marneffei (Figures 8, 9 and 10). A minor

difference in the Aspergillus clade was observed between the phylogenetic analyses

with Rlm1 and Mcm1 regarding the position of A. nidulans, A. terreus, A. niger and A.

fumigatus (Figures 8, 9 and 10). In this study, Neosartorya fischeri and A. fumigatus

formed well-supported sister taxa, as did A. oryzae and A. flavus, in accordance with

the studies of Fitzpatrick et al. (2006) and James et al. (2006).

Due to the fact that the majority of fungi are still undiscovered, a robust phylogeny

of known taxonomic groups will be essential for placement of unknown species as these

are discovered. As demonstrated for Bacteria and Archaea (Pace, 1997), the Fungi are

likely to harbor many lineages whose discovery is dependent on phylogenetic analyses.

Therefore, it will be necessary to carry out phylogenetic studies including a wider set of

fungal species from other phyla, such as the Glomeromycota and the Chytridiomycota, which

will make it possible to resolve the phylogenetic relationships between misclassified and

misplaced fungal species.

4.3. Relationships within the subphylum Saccharomycotina

Due to the relationship that exists between Candida species and the ‘Saccharomyces

complex’, these groups were further analysed by using both single gene and

concatenated gene phylogenies.
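
The concatenation step mentioned above can be illustrated with a minimal Python sketch. It assumes each gene alignment is held as a dictionary mapping species names to aligned sequences, and that a species missing from one gene is padded with gap characters; the function name, the padding choice and the toy sequences are all hypothetical and are not taken from the software or data used in this study.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments into one supermatrix keyed by species.

    alignments: list of dicts mapping species name -> aligned sequence.
    A species absent from a gene is padded with gap characters (an
    assumption of this sketch, not a documented step of the study).
    """
    species = sorted(set().union(*(a.keys() for a in alignments)))
    concatenated = {name: "" for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for name in species:
            # Append this gene's block, or an all-gap block if the species lacks it.
            concatenated[name] += aln.get(name, gap * width)
    return concatenated


# Toy, made-up sequences (not data from this study).
rlm1 = {"C_albicans": "ATGGCT", "S_cerevisiae": "ATGGCA", "K_waltii": "ATG---"}
mcm1 = {"C_albicans": "TTGAAA", "S_cerevisiae": "TTGAAG"}
supermatrix = concatenate_alignments([rlm1, mcm1])
```

Every row of the resulting supermatrix has the same length, which is the property tree-building programs generally require of a concatenated alignment.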

4.3.1. Phylogeny of the ‘Saccharomyces complex’

In Figures 8, 9 and 10, the subphylum Saccharomycotina is clearly shown as a

monophyletic clade grouping Yarrowia lipolytica, Candida species and ‘Saccharomyces

complex’ species, in the phylogenetic tree inferred for the kingdom Fungi. The

phylogenies deduced based on single gene or concatenated genes, including only

species belonging to the subphylum Saccharomycotina, are presented in Figures 11 to 14.

Results from this analysis showed a topology very similar to the previously inferred

topologies that included all other fungal species, by using Rlm1 and Mcm1 genes.

The Non Whole Genome Duplication (Non-WGD) group, formed by

Kluyveromyces lactis, Ashbya gossypii, Saccharomyces kluyveri and Kluyveromyces

waltii, presented monophyly in all analyses, except in the topologies based on MCM1

gene by Bayesian inference (Figure 12B). The relationships amongst the Non-WGD

species presented a variable placement of taxa, with their grouping poorly defined

(Figures 11, 12 and 13). These results were not conclusive due to the low number of

taxa within this complex that have available sequences for this study. In other studies,

based on multigenic and phylogenomic analyses, the relationship amongst the Non-

WGD species has been demonstrated showing K. waltii – S. kluyveri and K. lactis – A.

gossypii grouping together (Fitzpatrick et al., 2006; James et al., 2006; Jeffroy et al.,

2006; Liu et al., 2006).

Another study of the ‘Saccharomyces complex’ based on multigenic analysis

that used 75 species showed A. gossypii and K. lactis forming basal clades separated

from S. kluyveri and K. waltii, the latter clade with medium support (Kurtzman and

Robnett, 2003). Another phylogenetic study based on SSU and LSU rDNA sequences

from 64 species of the order Saccharomycetales placed K. lactis as an early clade and

A. gossypii as a basal clade (Suh et al., 2006). Thus, the correct relationship amongst the

Non-WGD species is still to be resolved.

In our study, a noticeable difference occurred in the placement of Saccharomyces

castellii in the WGD clade with respect to previous studies. In the analyses based on

RLM1 gene and protein, S. castellii was a well-supported early taxon within the WGD

group (Figure 11).

Figure 11. Phylogeny of the subphylum Saccharomycotina inferred by using RLM1. (A) Protein

phylogeny obtained by maximum likelihood using the evolutionary model JTT + I + G with

nodal support by Shimodaira-Hasegawa; (B) Gene phylogeny obtained by using the evolutionary

model GTR + I + G. Node values correspond to posterior probabilities (left) and maximum

likelihood (right, Shimodaira-Hasegawa-like support). Species with partial sequences:

Kluyveromyces waltii and Saccharomyces paradoxus.

However, the analysis based on Mcm1 protein showed low support for S.

castellii as an early taxon of the Saccharomyces sensu stricto group (S. bayanus, S.

kudriavzevii, S. mikatae, S. cerevisiae and S. paradoxus). The analysis with the MCM1 gene

indicated a weak monophyly by ML with the Saccharomyces sensu lato group (S. castellii,

C. glabrata, Vanderwaltozyma polyspora), while BI gave medium support for it as an early

taxon of the Saccharomyces sensu stricto group (Figure 12).

The Rlm1-Mcm1 concatenated phylogeny placed S. castellii as an early taxon of the

Saccharomyces sensu stricto group, both by maximum likelihood and Bayesian methods

(Figure 13). Additionally, in the global fungal phylogeny analysis based on Rlm1 and

Mcm1 sequences, S. castellii also presented an inaccurate placement (Figures 11 - 13).

Inaccurate placement of S. castellii was also obtained in past multigenic studies, in

which it either competed with C. glabrata for the basal placement in the Saccharomyces

sensu stricto group, or formed a clade together with this group (Diezmann et al., 2004;

Fitzpatrick et al., 2006; James et al., 2006; Jeffroy et al., 2006; Liu et al., 2006).

Figure 12. Phylogeny of the subphylum Saccharomycotina inferred by using MCM1. (A) Protein

phylogeny obtained by maximum likelihood using the evolutionary model JTT + I + G with nodal support by

Shimodaira-Hasegawa; (B) Gene phylogeny obtained using the evolutionary model GTR + I + G. Node

values correspond to posterior probabilities (left) and maximum likelihood (right,

Shimodaira-Hasegawa-like support). b 0.16 is the support for the group formed by S. castellii,

C. glabrata and V. polyspora by maximum likelihood. Species with partial sequence:

Kluyveromyces waltii.

However, a study including a higher number of Saccharomycetales species and

based on SSU and LSU rDNA showed that V. polyspora was closer to S. cerevisiae than

S. castellii and C. glabrata (Suh et al., 2006). Another study based on mitochondrial

and nuclear genes such as COX1, mit SSU rDNA, EF-1α, rDNA (ITS1, ITS2), nuc LSU

rDNA (D1/D2), placed S. castellii forming a closer clade to Saccharomyces sensu

stricto group, and C. glabrata and V. polyspora forming earlier clades than S. castellii

with respect to Saccharomyces sensu stricto group (Kurtzman and Robnett, 2003).

The sister group relationships amongst the Saccharomyces sensu stricto species

also differed between Rlm1, Mcm1 and concatenated phylogenies (Figures 11 - 13). For

example, the Saccharomycetales Rlm1 protein/gene phylogeny placed S. bayanus and S.

kudriavzevii as sister taxa at the base of the Saccharomyces sensu stricto node and S.

mikatae, S. cerevisiae and S. paradoxus forming a group within it, by ML and BI. However, the

Mcm1 protein phylogeny placed S. mikatae as the basal taxon, while the MCM1 gene

phylogeny did not clearly resolve whether S. bayanus or S.

kudriavzevii is the basal taxon.

Figure 13. Phylogeny of the subphylum Saccharomycotina inferred by using concatenated alignment of

RLM1 - MCM1. (A) Protein phylogeny obtained by maximum likelihood with the evolutionary

model JTT + I + G and nodal support by Shimodaira-Hasegawa; and (B) Gene phylogeny

obtained using the evolutionary model GTR + I + G. Node values correspond to posterior

probabilities (left) and maximum likelihood (right, Shimodaira-Hasegawa-like support).

Species with partial sequence: Kluyveromyces waltii.

The concatenated alignment of Rlm1 and Mcm1 proteins (Figure 13) inferred a

result similar to Rlm1 gene/protein phylogenies (Figure 11), and the analysis based on

concatenated alignment of genes (RLM1 and MCM1) placed S. bayanus at the base of

the Saccharomyces sensu stricto node and inferred a ladderised topology amongst the

Saccharomyces sensu stricto species (Figure 13). These phylogenetic trees, based on

RLM1 and concatenated genes, agree with supertrees that indicated both ladderised

topology and S. bayanus and S. kudriavzevii as sister taxa (Fitzpatrick et al., 2006;

Jeffroy et al., 2006). The study of the ‘Saccharomyces complex’ based on multigenic

analysis also showed a ladderised topology in Saccharomyces sensu stricto species,

having S. bayanus as basal taxon (Kurtzman and Robnett, 2003).

To analyze the degree of conflicting phylogenetic signal within the concatenated

alignment, a phylogenetic network was constructed and a bootstrap analysis was

performed on this phylogenetic network (Figure 14). It is interesting to note that

numerous alternative splits were obtained (60 in total); however, no splits were observed

excluding C. glabrata, V. polyspora and S. castellii from the remaining WGD

organisms, even though these species form clades within the WGD group. These results are

in conflict with the concatenated phylogeny (Figure 13), which inferred that C. glabrata

and V. polyspora formed a well-defined clade, and S. castellii was the basal species

from the remaining WGD organisms. The syntenic information clearly shows that S.

castellii diverged from the Saccharomyces sensu stricto lineage before C. glabrata

(Scanell et al., 2006). Therefore topologies that place C. glabrata as outgroup to the

Saccharomyces sensu stricto lineage and S. castellii or vice-versa are unreliable

(Scanell et al., 2006; Fitzpatrick et al., 2006; Suh et al., 2006), like the ones observed in

our study with the single and concatenated genes. It is known that systematic bias can

influence the resulting tree (Phillips et al., 2004). Hence, a closer scrutiny would be

necessary. The incongruences found in these analyses suggest that the use of additional

Saccharomycotina taxa would be important to help solve the placement of species

belonging to these groups. In addition, it is likely that stochastic errors and erroneous

inferences could be eradicated with the addition of extra genome data, when it becomes

available.

Figure 14. Phylogenetic network reconstructed using a concatenated alignment of RLM1 – MCM1 genes

in the subphylum Saccharomycotina. The NeighborNet method was used to infer splits within

the alignment (1000 bootstrap nodal support).

4.3.2. Phylogenetic relationship amongst Candida species

The group of organisms that translate mainly CUG as serine instead of leucine has

been designated as the CUG group (Kawaguchi et al., 1989; Santos and Tuite, 1995).

This codon reassignment has been proposed to have occurred about 170 million years

ago (Massey et al., 2003). Rlm1, Mcm1 and concatenated topologies inferred a robust

monophyletic clade containing organisms within the CUG group (Figures 11 - 13). The

topology analysis showed that there are four putative CUG sub-clades, the first

containing C. lusitaniae, the second C. guilliermondii and Debaryomyces hansenii, the

third Pichia stipitis and the fourth containing C. tropicalis, C. albicans, C. dubliniensis,

C. parapsilosis, and Lodderomyces elongisporus. These topologies have also been

recognized in other phylogenetic studies inferred by rRNA genes (Suh et al., 2006).

Some phenotypic and morphologic characteristics could correlate with the topology

observed since C. lusitaniae and C. guilliermondii are haploid yeasts, and their

teleomorphic states present sexual reproduction (Wickerham and Burton, 1954;

Rodriguez de Miranda, 1979; Young et al., 2000). D. hansenii and Pichia stipitis are

homothallic, with a fused mating locus (Dujon et al., 2004; Fabre et al., 2005). In

contrast, members of the fourth sub-clade have at best a cryptic sexual cycle and have

never been observed to undergo meiosis (Hull and Johnson, 1999; Hull et al., 2000;

Magee and Magee, 2000; Pujol et al., 2004; Logue et al., 2005).
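
The CUG reassignment that defines this group can be made concrete with a short Python sketch. It builds the standard genetic code and a variant in which CTG (CUG in mRNA) is read as serine, then translates the same toy sequence with both. Treating CUG as always serine is a simplification made here for illustration, since the text above says these organisms translate it mainly as serine; the sequence is made up and the names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}
# CUG-group variant: identical except that CTG encodes serine (assumed
# to be unambiguous in this sketch).
CUG_GROUP_CODE = dict(STANDARD_CODE, CTG="S")


def translate(dna, code):
    """Translate a coding sequence codon by codon; unknown codons become X."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


toy_sequence = "ATGCTGTTGTAA"  # made-up example: Met, CTG, Leu, stop
standard_protein = translate(toy_sequence, STANDARD_CODE)
cug_group_protein = translate(toy_sequence, CUG_GROUP_CODE)
```

Only the second residue differs between the two translations, which is why gene sequences from CUG-group species need the alternative table before protein alignments are built.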

The phylogeny of Candida species was also investigated in this study by using a

further analysis with Iff8 protein and gene (Figure 15). The resultant topologies placed

L. elongisporus within the asexual clade with high support, which is in agreement with

other phylogenetic studies based on multigenic and phylogenomic analyses (James et

al., 1994; Fitzpatrick et al., 2006; Diezmann et al., 2004).

The CUG specific topology based on three concatenated proteins/genes, RLM1,

MCM1 and IFF8, suggested that D. hansenii and C. guilliermondii are sister taxa, as

they are grouped together with high support by maximum likelihood and Bayesian

inference, excluding C. lusitaniae and P. stipitis (Figure 16). Topologies obtained with

Rlm1 protein/gene and RLM1-MCM1 concatenated genes presented conflicts in the

placement of P. stipitis, being included within C. guilliermondii – Debaryomyces

hansenii clade (Figures 11 and 13A).

Figure 15. Phylogeny of the CUG group based on IFF8. (A) Protein phylogeny obtained by maximum

likelihood, using the JTT + I + G evolutionary model with nodal support by Shimodaira-

Hasegawa; (B) Gene phylogeny obtained by using the evolutionary model GTR + I +

G. Node values correspond to posterior probabilities (left) and maximum likelihood (right,

Shimodaira-Hasegawa-like support). *In the maximum likelihood analysis, C. tropicalis,

C. parapsilosis and L. elongisporus form a clade with 0.41 S-H support, and C. parapsilosis and

L. elongisporus are sister taxa with 0.98 S-H support. These species form, together with C. albicans

and C. dubliniensis, a well-supported clade (0.97 S-H).

Figure 16. Subphylum Saccharomycotina phylogeny reconstructed using concatenated alignment of

RLM1 - MCM1 - IFF8. (A) Protein phylogeny obtained by maximum likelihood with the

evolutionary model JTT + I + G and Shimodaira-Hasegawa-like nodal support; (B) Gene

phylogeny obtained by using the evolutionary model GTR + I + G. Node values

correspond to posterior probabilities (left) and maximum likelihood (right, Shimodaira-

Hasegawa-like support).

Other studies have placed C. lusitaniae in a clade with C. guilliermondii, and

inferred a closer relationship between the two with Debaryomyces species (Daniel et al.,

2001; Diezmann et al., 2004). These results raise interesting questions regarding the

sexual status of the Candida species. It is possible that the "asexual" species are in fact

fully sexual. C. albicans, C. dubliniensis and C. tropicalis have been observed to mate

(Soll et al., 1988; Pujol et al., 2004), and in addition the C. albicans genome contains most

of the required genes for the development of meiosis (Tzung et al., 2001). In contrast,

the evidence that L. elongisporus reproduces sexually is based on the appearance of

asci, with one (or sometimes two) spores (Van der Walt, 1966).

The phylogenetic network constructed based on the three concatenated genes

(Figure 17) corroborated the CUG-specific concatenated tree regarding the grouping of

C. guilliermondii and D. hansenii as sister taxa, excluding C. lusitaniae and P. stipitis,

but when the network was inferred from two concatenated genes, C. guilliermondii, D.

hansenii and P. stipitis formed sister taxa (Figure 14). This analysis showed a topology

conflict with the phylogeny based on Iff8 protein and gene regarding the grouping of L. elongisporus

with C. parapsilosis (Figure 15). As expected, there was no conflict for the grouping of

C. albicans and C. dubliniensis, illustrating their high genotypic similarity (Sullivan et

al., 1995), and those species formed with C. tropicalis a highly supported clade, which

agrees with other recently published studies (Fitzpatrick et al., 2006; James et al., 2006;

Suh et al., 2006).

Figure 17. Subphylum Saccharomycotina phylogenetic network reconstructed using a concatenated

alignment of RLM1 – MCM1 – IFF8 genes. The NeighborNet method was used to infer splits

within the alignment (1000 bootstrap nodal support).

4.4. Analysis of adaptive evolution

Genome duplication has occurred in the WGD group as previously described (Kellis

et al., 2004). Understanding the fate of duplicates is fundamental to clarify mechanisms

of genetic redundancy and the link between gene family diversification and phenotypic

evolution. Models to identify possible sites of positive selection can provide alternative

explanations for the persistence and diversification of gene duplicates.

The detection of an excess in the ratio of the rate of nonsynonymous (dN) over the

synonymous (dS) substitutions (dN/dS, also denoted ω) is an unambiguous indicator of

positive selection at the codon level. Positive selection (ω>1) is considered as

selection that fixes advantageous mutations, and this term is used interchangeably with

molecular adaptation and adaptive molecular evolution. Purifying selection (ω<1) is

considered as the natural selection against deleterious mutations and the term is used

interchangeably with negative selection or selective constraints. Several likelihood-

based tests were used to search for evidence of positive selection using the PAML 4.1

and HYPHY 1.0 packages. Codon-based substitution models have been widely used to

identify amino acid sites under positive selection in comparative analysis of protein-

coding DNA sequences.

Of the three genes analyzed in this study, RLM1 was the one that gave the best

phylogenetic results. Accordingly, the sequence of RLM1 was analyzed at the codon

level to detect amino acid sites under both positive and purifying selection by using

the complex models M3, M2a and M8 and their corresponding null models M0, M1a

and M7, respectively.

First, the RLM1 sequences from animals, plants and fungi were used to infer

positive selection by testing the six codon site models implemented in the

PAML 4.1 package (Annex VII). According to Nielsen and Yang (1998), in

analyses of highly variable sequences the gaps introduced in the alignment should

be removed. Thus, in the first analysis, in which all animal, plant and fungal RLM1

sequences were tested, only the MADS-box was considered. Results showed no

evidence for positive selection in the MADS-box type II domain, suggesting that this

sequence was extremely conserved within the fungi. Then, this same analysis was

performed in the complete RLM1 gene but only for the subphylum Saccharomycotina to

avoid the presence of a high number of gaps and to analyze closer species. This new

analysis was performed with the HYPHY 1.0 package because it accounts for the

presence of gaps, calculating a dN/dS value for those positions (Annex VIII). To better

analyze this sequence, the conserved moieties were identified. In the alignment of

Saccharomycotina Rlm1 proteins four conserved moieties were recognized: the first

motif is the conserved MADS-box region at the N-terminus, the second is an acidic

region close to the MADS-box, and the third and fourth motifs were named A and B, respectively

(Figure 18).
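
The gap removal recommended for highly variable sequences can be sketched as follows. This is a minimal illustration, assuming a codon alignment stored as a dictionary of equal-length strings; the function name and the toy sequences in the usage note are hypothetical, and this is not the procedure implemented in PAML.

```python
def strip_gapped_codons(alignment: dict[str, str], gap: str = "-") -> dict[str, str]:
    """Drop every codon column in which at least one sequence contains a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    (length,) = lengths
    if length % 3:
        raise ValueError("codon alignment length must be a multiple of 3")
    # Start positions of the codon columns that are gap-free in all sequences
    keep = [
        i for i in range(0, length, 3)
        if all(gap not in seq[i:i + 3] for seq in alignment.values())
    ]
    return {
        name: "".join(seq[i:i + 3] for i in keep)
        for name, seq in alignment.items()
    }
```

With three toy sequences `ATGAAA---TTT`, `ATG---CCCTTT` and `ATGAAACCCTTT`, only the first and last codon columns are free of gaps, so each sequence is reduced to `ATGTTT`.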

Figure 18. Conserved regions in the RLM1 gene of the subphylum Saccharomycotina: MADS-box, acidic

region, and conserved regions A and B present in this subphylum.

At the C-terminal of Rlm1 a repetitive region was identified in Candida albicans,

Candida dubliniensis, Kluyveromyces lactis and Saccharomyces sensu stricto group

(Figure 19).

Results obtained by applying the models to identify sites of positive selection

showed the same outcome: no positive selection was detected, either in the MADS-box

domain or in the entire RLM1 sequence, confirming the previous findings. The

significance of the values obtained with the complex models against their corresponding

null models was tested by using a Likelihood Ratio Test (LRT), according to Anisimova

et al. (2001) (Annex IX).
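
A likelihood ratio test of a complex model against its null model can be sketched as below. The log-likelihood values in the usage note are hypothetical (the values of this study are in Annex IX), and the degrees of freedom are the ones commonly used for these comparisons (2 for M1a vs. M2a and for M7 vs. M8, 4 for M0 vs. M3 with three site classes), stated here as an assumption. The chi-square tail probability uses the closed form that exists for even degrees of freedom, so no external library is needed.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x; 2m) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return the LRT statistic 2*(lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)
```

For hypothetical log-likelihoods of -1000.0 (null) and -997.0 (complex) with 2 degrees of freedom, the statistic is 6.0 and the p-value is exp(-3), about 0.0498. A complex model that does not fit significantly better than its null model, as reported here for RLM1, gives a small statistic and a large p-value.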

The LRT confirmed the results, supporting the absence of positive selection. Overall, the

results from maximum-likelihood models of codon evolution indicated that the

fungal MADS-box and the Saccharomycotina RLM1 gene have evolved mainly

under purifying selection, suggesting that the expressed protein plays an important role

in the fungi that must be maintained.

Figure 19. Region of microsatellites present in some species from subphylum Saccharomycotina.

Alignment was performed by using MEGA 4.0 with ClustalW.

A new codon-based model, the Mechanistic Empirical Combined (MEC) model, which

combines the parameters used by the traditional models mentioned above with an

empirical amino acid replacement model, was recently described

(Doron-Faigenboim and Pupko, 2006). This MEC model was also tested on the

Saccharomycotina sequences to detect sites in Rlm1 under positive selection. By means

of this new model it was possible to identify sites under positive selection in the

Saccharomycotina Rlm1 protein. The protein sequence from S. cerevisiae was used as

the reference to identify those sites (Figure 20).

The MEC model produced a log-likelihood value of -38544.7, which is higher than the

-39363.5 obtained for M8a (used as the null model), which indicates the presence of sites

under positive selection. To test the significance of the MEC model result, the AICc

(corrected Akaike Information Criterion) was calculated, and the AICc value for the

MEC model (77099.41602) was lower than the one obtained for the M8a model

(78735.01068), which means that the hypothesis of positive selection can be accepted.
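
The AICc comparison can be reproduced with the standard formula AICc = -2 lnL + 2k + 2k(k+1)/(n-k-1). In the sketch below the log-likelihoods are the ones reported above, but the number of free parameters (k = 5 for MEC, k = 4 for M8a) and the sample size (n = 3750) are assumptions: they are not stated in the text and were chosen only because they closely reproduce the reported AICc values.

```python
def aicc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Corrected Akaike Information Criterion for small samples."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("n_obs must exceed n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


# k and n are assumed values, not taken from the thesis
mec = aicc(-38544.7, n_params=5, n_obs=3750)
m8a = aicc(-39363.5, n_params=4, n_obs=3750)
mec_preferred = mec < m8a  # the model with the lower AICc is preferred
```

Under these assumptions the MEC model has the lower AICc, in agreement with the comparison reported above.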

Several sites in the Rlm1 sequence were identified as putatively under positive

selection. It is noteworthy that 57% (20 out of 35) of the sites under positive

selection were located in the first part of the protein, right after the MADS-box and

before the region A. Between the MADS-box and the acidic region 12% of the amino

acid residues were identified as being under positive selection, between the acidic

region and the region A 8%, between the A and B region 1.7%, and between the B and

the repetitive region 10% (Figure 20).

A particular observation was the presence of amino acids under positive selection

near the microsatellite region. In this repetitive region, the repeated amino acid in

C. albicans, C. dubliniensis and K. lactis is glutamine, encoded by CAA, while in the

Saccharomyces sensu stricto group, including S. cerevisiae, it is asparagine, encoded

by AAC. This observation suggests that amino acid substitutions that occurred during

the divergence of species, due to mutation and natural selection, changed this

microsatellite region and probably diversified gene function in WGD species, avoiding

the loss of protein function.

Numerous lines of evidence have demonstrated that genomic distribution of

repetitive DNA, particularly of microsatellites, is nonrandom presumably because of

their effects on chromatin organization, regulation of gene activity, recombination,

DNA replication, cell cycle, mismatch repair (MMR) system, etc. (Li et al., 2002).

Microsatellites may provide an evolutionary advantage of fast adaptation to new

environments as evolutionary tuning knobs (Kashi et al., 1997; Trifonov, 2003). Due to

the presence of trinucleotide microsatellites within protein-coding regions, tracts of

repeated amino acids are present in some proteins and different amino acid repeats are

concentrated in different classes of proteins. In yeast proteins, the most abundant amino

acid repeats are Gln, Asn, Asp, Glu, and Ser (Richard and Dujon 1997; Alba et al.,

1999). In this work, tracts of poly-Gln and poly-Asn were found within the microsatellite

region of Saccharomycotina Rlm1 (Figure 19). The change in the amino acid tract at

the microsatellite region, from Gln (CAA) to Asn (AAC), could be due to a

frameshift mutation that occurred in the gene before the microsatellite region, since

this kind of mutation has already been observed in MADS-box genes, precisely at the

C-terminus (Kramer et al., 2006). Thus, the microsatellite repetitive region of this gene was

further analysed.
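
Tracts of trinucleotide repeats such as the CAA and AAC runs discussed above can be located with a simple regular expression. This is a minimal sketch and not the method used in this work; the function name, the minimum of five repeats and the toy sequence in the usage note are assumptions made for illustration.

```python
import re


def find_trinucleotide_tracts(seq: str, min_repeats: int = 5) -> list[tuple[int, str, int]]:
    """Return (start, repeat unit, number of repeats) for each perfect tract."""
    # One trinucleotide followed by at least (min_repeats - 1) exact copies
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_repeats - 1))
    tracts = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        tracts.append((match.start(), unit, len(match.group(0)) // 3))
    return tracts
```

For the toy sequence `ATGGCT` followed by six copies of `CAA` and `TGA`, the function reports one tract starting at position 6 (0-based) with unit `CAA` repeated six times. The search is not aware of the reading frame, so the reported unit depends on where the match starts, and homopolymer runs such as `AAA` are also reported.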

Figure 20. Saccharomyces cerevisiae amino acid sequence showing sites under positive selection by the

MEC model. Conserved regions: MADS-box (blue), Acidic region (orange), Region A

(purple), Region B (green) and microsatellites (red).

4.5. Event of frameshift mutation in Saccharomyces cerevisiae RLM1 gene

In the sequence alignment of Saccharomycotina Rlm1 proteins four conserved

moieties were recognized, and at the C-terminus a repetitive region was identified in C.

albicans, C. dubliniensis, Kluyveromyces lactis and the Saccharomyces sensu stricto group,

which seems to be under positive selection. It is difficult to determine the molecular

events that changed the amino acid residues at the sites identified as under positive

selection, but from the nucleotide sequence analysis, the molecular event that could have

changed the tract at the repetitive region might have been a frameshift mutation.

Additionally, this type of mutation has already been observed at the C-terminus of

MADS-box genes, as previously mentioned, and thus this hypothesis was tested by

analyzing the different RLM1 reading frames.

The three reading frames from C. albicans, C. dubliniensis, K. lactis and S.

cerevisiae were analyzed to find all the potential repetitive amino acids in the

C-terminal region of this gene. The first reading frame showed a poly-glutamine (Gln–Q)

tract at the C-terminus of C. albicans, C. dubliniensis and K. lactis, and a

poly-asparagine (Asn–N) tract at the C-terminus of S. cerevisiae (Figure 21). These reading

frames are the actual coding frames in the yeast species mentioned. The S. cerevisiae

second reading frame translated to a poly-threonine (Thr–T) tract, while the third

reading frame encoded a poly-glutamine tract, just like C. albicans, C. dubliniensis and

K. lactis (Figure 21).
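
The effect of the reading frame on a trinucleotide repeat can be checked with a short translation sketch. It uses the standard genetic code and idealized, perfect repeats rather than the actual RLM1 sequences; the function names are hypothetical, and the sketch does not reproduce the Virtual Ribosome output. The reassignment of CTG in the CUG group is not modeled, which does not affect the codons involved here.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate complete codons from the first base; unknown codons give X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frames(seq: str) -> list[str]:
    """Translate the forward strand in reading frames 1, 2 and 3."""
    return [translate(seq[offset:]) for offset in range(3)]
```

For six copies of `AAC`, the three frames translate to poly-Asn, poly-Thr and poly-Gln, respectively, which is the pattern described above for S. cerevisiae. For six copies of `CAA` the first frame gives poly-Gln, as in C. albicans, C. dubliniensis and K. lactis.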

The mutational mechanisms known to contribute to microsatellite polymorphism

and that could affect the reading frame in the microsatellite region are the following: i)

DNA polymerase slippage during DNA replication (Tachida and Izuka, 1992) and ii)

recombination (unequal crossover or gene conversion) between DNA strands (Harding

et al., 1992; Li et al., 2002). Microsatellite studies indicate that trinucleotide repeats

in the coding sequences of many organisms are selected against possible frameshift

mutations, since these mutations are generally considered detrimental, causing

nonfunctional transcripts and/or proteins through the possible introduction of premature stop

codons (Ohno, 1970; Tóth et al., 2000; Wren et al., 2000; Cordeiro et al., 2001).

Although trinucleotide repeats are under selective pressure to avoid frameshift

mutations, exceptional situations have been described when a protein is temporarily

freed from selective pressure. If selective pressure is relieved, for example, through a

second copy of a gene, this duplicate can compensate for the possible loss of function

caused by the frameshift mutation and enable such mutations to lead to functional

divergence (Raes and Van de Peer, 2005).

Figure 21. The three RLM1 reading frames of Candida albicans, Kluyveromyces lactis and

Saccharomyces cerevisiae visualized with the Virtual Ribosome 1.0. The microsatellite

region, as well as the most likely position to insert a nucleotide and change the S. cerevisiae

reading frame, avoiding the presence of stop codons, are indicated.

Genomic duplication has been proposed as an advantageous path to evolutionary

innovation, because duplicated genes can supply genetic raw material for the emergence

of new functions through the forces of mutation and natural selection (Ohno, 1970).

Such duplication can involve individual genes, genomic segments or whole genomes

and coordinated duplication of an entire genome may allow for large-scale adaptation to

new environments.

In this work, the phylogenetic analysis clearly recovered two groups that underwent

genome duplication, Saccharomyces sensu stricto and sensu lato, and it was in the

Saccharomyces sensu stricto group that the amino acid encoded by the trinucleotide repeats

was altered. The genome duplication of S. cerevisiae was first proposed on the basis of the

locations of paralogues, which revealed the existence of ancestral duplication blocks

(Wolfe and Shields, 1997; Langkjaer et al., 2003). The WGD model was then proposed

and corroborated by syntenic and genomic studies (Wolfe and Shields, 1997; Seoighe and

Wolfe, 1999; Kellis et al., 2004). Those studies postulated that the WGD event occurred

after the divergence of S. cerevisiae and K. lactis, and also identified a group of Non-WGD

species represented by K. lactis, A. gossypii, S. kluyveri and K. waltii (Byrne and

Wolfe, 2006). According to the inferred phylogenies, the CUG group first diverged

from the Non-WGD and WGD groups, and then the Non-WGD and WGD groups separated.

Therefore, the microsatellite within RLM1 must have been present in an ancestral

taxon within Saccharomycotina, evolving during the speciation process, and changing

in recent species, possibly due to a frameshift mutation, as this work indicates.

Additionally, frameshift mutations have been detected in MADS-box transcription

factor families of plants, causing changes precisely at the C-terminus and yielding

functional diversification (Lachtman, 1998; Kramer et al., 2006).

In the present study two members of the MADS-box transcription factor family

were studied, Rlm1 and Mcm1. However, in S. cerevisiae two other members of

this family have been identified, so this species has four MADS-box proteins in total: Smp1,

which is involved in regulating the response to osmotic stress (de Nadal et al.,

2003), with orthologues in the WGD group and none in the Non-WGD and CUG

groups; and Arg80, which is involved in the regulation of arginine-responsive

genes (Dubois et al., 1987), identified in S. cerevisiae and in the WGD

group (Annexes V and VI). Recently, the complete genomes of Zygosaccharomyces

rouxii and Kluyveromyces thermotolerans have been sequenced, showing that Z. rouxii

has four MADS-box transcription factors, like the WGD group, while K.

thermotolerans has just two MADS-box transcription factors, like the Non-WGD

group. Therefore, we could infer that the genome duplication in Saccharomycotina

would have occurred in taxa that presented the four MADS-box transcription factors.

Previous studies suggested that at least one MADS-box gene was present in the

common ancestor of plants, animals, and fungi, and that probably the duplication that

gave rise to the animal MADS-box type II and type I genes occurred after animals

diverged from plants, but before fungi diverged from animals (Theissen et al., 1996).

However, Alvarez-Buylla et al. (2000) proposed that at least one ancestral MADS-box

gene duplicated in the common ancestor of the major eukaryotic kingdoms, more than a

billion years ago, to give rise to the distinct type I (SRF-like) and type II (MEF2-like)

lineages found in plants, fungi, and animals today. Figure 22 presents a scheme of the

MADS-box genes evolution model based on Alvarez-Buylla et al. (2000).

Figure 22. A) Evolution of MADS-box genes based on the Alvarez-Buylla et al. (2000) model, proposing

a common origin for Animals, Fungi and Plants and a further duplication event in the Fungi

lineage in some species within the subphylum Saccharomycotina (WGD group). B)

Comparison of Rlm1 protein/gene domains in S. cerevisiae and C. albicans, showing the

frameshift mutation that occurred after genome duplication in S. cerevisiae and WGD species

and changed the encoded amino acid from Q to N.

[Figure 22, panel A labels: common ancestor (1 billion years ago, based on Alvarez-Buylla et al., 2000); plant lineage (APETALA, PISTILLATA, DEFICIENS, CAULIFLOWER, FRUITFULL, AGAMOUS and SQUAMOSA); animal and fungi lineages; animal lineage; fungi lineage; WGD – Saccharomycotina lineage; MEF2-like and SRF-like isoforms.]

Animal and fungal MADS-box sequences type II are more closely related to most

plant MADS-box sequences type II than to animal MADS-box sequences type I,

suggesting the occurrence of at least one gene-duplication event before the divergence

of plants and animals. This model also proposes that after the divergence of animal-

fungi and plant lineages both types of MADS-box genes were transferred to each

lineage. In the plant lineages different evolutionary mechanisms, such as gene duplication,

frameshift mutation and alternative splicing, occurred, producing a variety of MADS-box

transcription factor genes, while fungi and animals each diverged with both

MADS-box types. In animals, alternative splicing has been reported as the main

evolutionary mechanism that changed these genes. Fungal lineages maintained both MADS-box

types, represented by the RLM1 and MCM1 genes, but within the 'Saccharomyces

complex' in the subphylum Saccharomycotina, with the whole genome duplication event

occurring in the Saccharomyces sensu stricto and sensu lato clades, the genes SMP1 and

ARG80 appeared, which are paralogues of RLM1 and MCM1, respectively. After this

genome duplication the amino acid encoded in the repetitive region of the RLM1 gene within

the Saccharomyces sensu stricto group changed. This was possibly due to a frameshift mutation

that led it to encode Asn instead of Gln as in the Non-WGD and CUG groups, which represent more

basal clades in the subphylum Saccharomycotina.

The putative ancestral RLM1 gene was predicted in an attempt to confirm the microsatellite

codon present in the common ancestor from which the CUG, Non-WGD and WGD groups

diverged (Figure 23). This ancestral gene presented the main features characteristic of

the MADS-box transcription factor that were previously identified, namely the

MADS-box, the acidic region and the regions A and B (Figure 18), but the repetitive

region was not observed. However, at the C-terminus of the protein this predicted

putative ancestral gene/protein showed a low accuracy percentage (59.9% - 43%),

presenting 194 unreliable sites out of 1407. These unreliable sites are located in

regions other than the conserved ones, indicating high variability in the unreliable

regions, where residues were mainly replaced by amino acids with similar properties. On the other

hand, the conserved regions are likely to be essential for the function of the protein, which would explain

their presence in basal taxa of the subphylum Saccharomycotina.

Several reports indicate that transcription factors and protein kinases are

significantly associated with acidic and polar amino acid repeats, whereas Ser repeats

are significantly associated with membrane transporter proteins (Alba et al., 1999).

Rlm1 is a transcription factor and the repetitive tracts observed are poly-Gln and poly-Asn,

which agrees with the works referred to above.

M G R R K I E I Q P I T D D R N R T V T F I K R K A G L F K

5' ATGGGTAGAAGAAAGATTGAAATTCAGCCGATCACGGACGATCGAAACCGGACAGTGACGTTTATCAAGCGTAAGGCCGGTCTATTCAAA 90

>>>.......................................................................................

* * *

K A H E L A V L C Q V D V A V I I F G N N N K L Y E F S S V

5' AAGGCTCACGAGTTGGCTGTGCTTTGCCAGGTAGATGTTGCGGTGATCATCTTTGGCAACAACAATAAGCTGTACGAGTTCTCCTCTGTT 180

............)))......................................................)))..................

*

D T N E L I K R Y Q K N M P H E M K A P E D Y G D Y K K K K

5' GACACAAACGAGTTGATTAAAAGATACCAGAAAAACATGCCGCACGAAATGAAGGCGCCTGAGGACTACGGAGACTACAAGAAGAAAAAA 270

............))).....................>>>.........>>>.......................................

* * * * *

H V G D D R P T A H F N V N N N N D D D D D D D D D E D D D

5' CATGTGGGTGATGATAGACCAACGGCACATTTTAATGTTAATAACAATAATGACGATGATGATGACGATGATGATGATGAAGATGATGAT 360

..........................................................................................

* * * * * * * * * * * * *

D T D N D T P D A N P K D K N E T E K K T Q N Q T P K E L N

5' GATACTGATAACGATACTCCTGATGCCAATCCTAAAGATAAAAACGAGACGGAAAAAAAAACTCAAAATCAAACTCCTAAGGAGCTGAAC 450

....................................................................................)))...

* * * * * * * * * * * * * * * *

H P Q Q P P P P Q H T S Q Q Q V Q K P E A P K D Q P A P P T

5' CATCCTCAGCAACCCCCGCCGCCTCAACATACTTCTCAACAACAAGTACAAAAACCAGAAGCACCTAAAGACCAACCCGCGCCGCCGACA 540

..........................................................................................

* * * * * * * * * * * * *

P P N H T H K N P D T M N Q R P E P P V Q I P T D V Q T N H

5' CCTCCAAACCACACACACAAGAACCCGGATACTATGAACCAAAGACCTGAACCGCCAGTTCAAATTCCAACTGATGTCCAGACTAATCAT 630

.................................>>>......................................................

* * * * * * * * * * * * * * * * * *

Q N D N A K T I T A I H T H K Q K N Q K N T K N K N N K N N

5' CAGAACGATAATGCTAAAACCATTACGGCCATTCACACCCATAAACAAAAAAATCAAAAGAATACTAAGAATAAAAATAATAAAAATAAT 720

..........................................................................................

* * * * * * * * * * * * * * * * * * * * * * * *

L S N Q N H I N S E T T P T T N Y S A S I K T P D S S N H T

5' CTAAGTAATCAGAATCATATTAATAGTGAAACTACTCCAACTACCAATTATTCTGCATCGATCAAAACACCTGATTCATCAAACCATACT 810

..........................................................................................

* * * * * * * * * * * * * *

L P I P T T T K E L P P T P V T A T A P G L P I S K N N P Y

5' TTGCCAATACCAACAACAACTAAAGAACTACCTCCTACTCCAGTTACTGCAACTGCTCCTGGATTACCAATAAGTAAAAATAATCCATAT 900

))).......................................................................................

* * * * * * * * *

F F G S E P P P Q V S P P Y P S I T P I L Q H E I P Q Q D P

5' TTTTTTGGATCCGAACCTCCACCTCAAGTTTCTCCTCCATATCCTTCTATTACACCTATACTTCAACATGAAATTCCTCAACAAGATCCA 990

..........................................................................................

* * * * * * * * * *

P P Q P V G Q H T P Q Y Q Q T T Q T D T N N K K K L P P P H

5' CCGCCGCAACCGGTAGGACAACATACACCTCAATACCAACAAACTACTCAAACAGATACTAACAATAAGAAAAAATTACCACCACCACAT 1080

..........................................................................................

* * * * * * * * * * * * * *

P N L N H P Q N N A A P I P P P E T I E H N P T G E E Q T P

5' CCTAATCTAAATCATCCACAAAATAATGCGGCACCAATACCTCCACCTGAAACAATTGAACATAATCCTACTGGTGAAGAACAAACGCCA 1170

..........................................................................................

* * * * * * * * * * * * * * * * * *

I S G L P S R Y V N D M F P S P S N F Y A P Q D W P T G M T

5' ATATCAGGACTACCATCGCGATATGTTAACGACATGTTTCCATCTCCATCTAACTTCTACGCTCCTCAAGATTGGCCCACTGGTATGACA 1260

.................................>>>................................................>>>...

* * *

P I N A N M P Q Y V A G T I P A G G P K K E Q T R T R K K T

5' CCCATTAATGCTAATATGCCTCAATACGTTGCGGGTACGATTCCTGCAGGAGGTCCTAAGAAAGAACAAACTCGAACTCGCAAAAAAACA 1350

...............>>>........................................................................

* * * * * * * * * * * * * * * * * * *

K G N D K T D N K E D N N K E K K R K

5' AAGGGAAATGATAAAACGGATAATAAGGAAGACAATAATAAAGAAAAAAAGAGAAAA 1407

.........................................................

* * * * * * * * * * * * * *

Figure 23. Predicted putative ancestral RLM1 gene sequence from which CUG, WGD and Non-WGD

groups diverged. MADS-box (red), acidic region (orange), conserved region A (yellow) and

conserved region B (green). * unreliable sites.

Earlier investigations speculated that the incorporation of more DNA

repeats in eukaryotes might provide a molecular device for faster adaptation to environmental

stresses (Kashi et al., 1997; Marcotte et al. 1999; Wren et al., 2000; Li et al. 2002;

Trifonov, 2003). This speculation has been supported by an increasing number of

experiments. For instance, in S. cerevisiae, microsatellites are overrepresented among

ORFs encoding regulatory proteins (e.g., transcription factors and protein kinases)

rather than for structural ones, indicating the role of microsatellites as a factor

contributing to fast evolution of adaptive phenotypes (Young et al., 2000). In

prokaryotes, microsatellites are not as abundant as in eukaryotes. Most of the

microsatellites in bacteria are located in virulence genes and/or regulatory regions, and

they affect pathogenesis and bacterial adaptive behavior (Hood et al., 1996; Peak et al.,

1996; van Belkum et al., 1998; Field and Wills, 1998). The contingency genes

containing microsatellites show high mutation rates, allowing the bacterium to respond

swiftly to deleterious environmental conditions, showing the role of these repetitive

DNA sequences in natural selection (Moxon et al., 1994). In C. albicans, it has been

demonstrated that the microsatellite region at the 3' end of the RLM1 gene, designated

CAI, is highly polymorphic (Sampaio et al., 2003, 2009). Additionally, the

analysis of CAI polymorphism in strains from recurrent infections revealed

microevolutionary changes at CAI, suggesting a putative involvement of RLM1 in the

evolutionary and/or adaptive response of C. albicans strains during recurrence (Sampaio

et al., 2005).

The C-terminus of proteins from the MADS-box family is necessary for dimerization

and required for transcriptional activation (Messenguy and Dubois, 2003). It is known

that their regulatory specificity depends on accessory factors and in many cases the

cofactor with which the protein interacts specifies which genes are regulated, when they

are regulated and whether these genes are transcriptionally activated or repressed (Shore and

Sharrocks, 1995). Moreover, Sampaio et al. (2009) demonstrated that the CAI

repetitive region confers high genetic variability on the RLM1 gene, which is reflected in

strain susceptibility to different stress conditions. Thus, although not evident in the

putative ancestral sequence, this region seems to be important for protein

function. After genome duplication the amino acid encoded in the repetitive region of the

RLM1 gene within the Saccharomyces sensu stricto group changed, possibly due to a

frameshift mutation, to encode Asn instead of Gln. This change probably created

an additional or new function for this protein.

5. FINAL REMARKS

Presently, the search for protein-encoding genes that present the desired

characteristics to infer a robust phylogeny is a challenge. Several genes have been used

to construct phylogenetic trees of different species, among which the ribosomal genes

were used to infer most of the phylogenetic relationships among the different kingdoms

of life. In recent years, the use of concatenated gene alignments (multigene analysis)

and of complete genome analysis has become the best approach for resolving

phylogenetic relationships among species whose placements were not well determined

by single-gene approaches. However, the concatenated gene alignments have only

been based on conserved regions, which can lead to a loss of genetic information that

might be important to resolve the placement of certain species whose accurate position

may not be inferred even with a large number of genes.
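
The concatenation of single-gene alignments into one multigene alignment can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the cited studies or in this work: each gene alignment is a dictionary of equal-length strings keyed by taxon, a taxon missing from a gene is filled with gap characters, and the function name and toy data are hypothetical.

```python
def concatenate_alignments(
    genes: dict[str, dict[str, str]], missing: str = "-"
) -> tuple[dict[str, str], dict[str, tuple[int, int]]]:
    """Join per-gene alignments into a supermatrix and record gene boundaries."""
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = {}
    start = 0
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences of {gene} are not aligned")
        (length,) = lengths
        for taxon in taxa:
            # Taxa without this gene receive a block of gap characters
            supermatrix[taxon] += aln.get(taxon, missing * length)
        partitions[gene] = (start, start + length)  # half-open column range
        start += length
    return supermatrix, partitions
```

With a toy RLM1 alignment of two taxa (`ATG`, `ATA`) and a toy MCM1 alignment containing only the first taxon (`CC`), the result is `ATGCC` and `ATA--`, with RLM1 occupying columns 0 to 3 and MCM1 columns 3 to 5. Keeping the partition boundaries allows each gene to be analyzed with its own substitution model.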

In this work we used three nuclear protein-encoding genes, RLM1 and MCM1,

belonging to the family of MADS-box transcription factors, and IFF8, encoding a GPI-anchored

protein, to infer phylogeny in the kingdom Fungi, using the entire gene sequence in

contrast with the previously mentioned studies. Orthologous sequences for the two genes

RLM1 and MCM1 were identified in 76 species from the kingdom Fungi whose

genomes have been sequenced, and 8 putative IFF8 orthologues were identified in the CUG group.

Results obtained from the phylogenetic analysis using the transcription factor RLM1

indicated that it is suitable for inclusion in a multigene analysis, since

the obtained phylogeny is close to the ones established by multigene and phylogenomic

analyses. The transcription factor MCM1 presented limitations for use in phylogeny

at the kingdom level, because of the variable length of its sequences, which can lead to errors in

the alignment. This problem could be overcome with extra sequences of other species

that would help to infer a more accurate phylogeny and still be used at the kingdom

level. Despite this limitation this gene can be used to resolve phylogeny at the phylum

or lower clades level, because its use, independently and/or concatenated, to infer

phylogeny within the subphylum Saccharomycotina was in agreement with published

studies. On the other hand, the use of IFF8 gene to infer phylogeny is limited and

restricted to the CUG group, since other orthologues were not found within the kingdom

Fungi and the results obtained in CUG group phylogeny presented conflicts, particularly

in the position of C. tropicalis that is not in agreement with what has already been

determined as its correct relationship in Candida phylogeny.

It is known that, within the subphylum Saccharomycotina, genome

duplication has occurred in the 'Saccharomyces complex', in the groups known as sensu

stricto and sensu lato, resulting in the duplication of some genes. The transcription factors

studied in this work are among the duplicated genes that were maintained.

Understanding the fate of duplicates is fundamental to clarify mechanisms of genetic

redundancy and the link between gene family diversification and phenotypic evolution.

Thus, in this study, the RLM1 gene was analyzed particularly within the subphylum

Saccharomycotina to identify possible sites of positive selection that could provide

alternative explanations for the persistence and diversification of this gene. The analysis

of adaptive evolution within the subphylum Saccharomycotina, by using the

conventional evolutive models, indicated no positive selection in the RLM1 gene

(purifying selection), suggesting that this protein plays an important role in these

organisms. However, a more complete model, the MEC, identified positive selection in

several amino acid sites, indicating nonsynonymous substitutions in some amino acid

positions. Although these substitutions were present in different positions, these

positions were not inside conserved regions, suggesting that these changes occurred due

to different molecular events during the evolution process of divergence of species. A

particular situation that was further analyzed in this study was the observation of the

presence of amino acids under positive selection at the beginning of the repetitive

region, a trinucleotide microsatellite, and differences in the amino acid under repetition

between the species that presented the duplicated the genome and the non duplicated.
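For reference, codon-based tests of selection such as those mentioned here are conventionally summarized by the ratio of the nonsynonymous to the synonymous substitution rate. The notation below is the standard textbook definition, given only as an aid to the reader; the symbols are not taken from this work. A gene-wide estimate below 1 is compatible with a few individual sites having a value above 1, which is how purifying selection on the gene as a whole and positive selection at particular sites can both be reported.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```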

The S. cerevisiae Rlm1 protein shares with C. albicans, C. dubliniensis and K. lactis a
microsatellite region in the C-terminal of the protein, but this repetitive region has
one peculiarity. The amino acid encoded by the trinucleotide microsatellite is Asn in
S. cerevisiae and Gln in the other species. Closer analysis of the different reading
frames of the RLM1 gene led to the conclusion that the most likely molecular mechanism
that changed the amino acids at the microsatellite region was a frameshift mutation by
insertion of nucleotides, and not an alteration of nucleotides within the codons, as is
generally the case in nonsynonymous substitutions. This frameshift mechanism avoided
the introduction of stop codons in the C-terminal, and hence the loss of this region of
the protein. This change of amino acids in the microsatellite of the S. cerevisiae RLM1
gene is likely to have brought structural and functional alterations to the protein.
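The frameshift argument can be illustrated with a toy sequence. The sketch below is not the RLM1 sequence: the upstream bases, the repeat length and the inserted dinucleotide are all hypothetical. It assumes the Gln repeat is encoded by CAA, because in the standard genetic code a CAA repeat read one base out of frame gives AAC (Asn), whereas a CAG repeat would give AGC (Ser) or GCA (Ala). The text states only that nucleotides were inserted; the two-nucleotide insertion used here is one assumed way of producing that frame.

```python
# Toy codon table: only the codons that occur in this example, plus the stops.
CODON_TABLE = {
    "ATG": "M", "GCC": "A", "CAA": "Q", "AAC": "N", "ACA": "T",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(dna):
    """Translate complete codons from the first base.

    Codons missing from the toy table are shown as '?'.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return "".join(CODON_TABLE.get(codon, "?") for codon in codons)


upstream = "ATGGCC"        # hypothetical in-frame upstream sequence (Met-Ala)
repeat = "CAA" * 6         # hypothetical CAA microsatellite (poly-Gln)

original = upstream + repeat
# Hypothetical insertion of two nucleotides upstream of the repeat:
# every codon downstream is now read one base further along the repeat.
shifted = upstream + "GC" + repeat

print(translate(original))  # MAQQQQQQ
print(translate(shifted))   # MAANNNNN
```

In this toy case the shifted frame turns the Gln tract into an Asn tract and contains no stop codon, so the region downstream of the insertion is still translated.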

Another peculiarity of this gene is the absence of these microsatellites in basal
species within the subphylum Saccharomycotina, indicating that they evolved during the
divergence of species. This was further confirmed by the observation that the predicted
ancestral RLM1 gene presented the main features previously identified as characteristic
of the MADS-box transcription factor, namely the MADS-box, the acidic region and regions
A and B, but lacked the repetitive region. This absence may be related to the role that
microsatellites play in the functionality of these proteins in some species and in
adaptation to different environmental conditions. Thus, the process of duplication and
genome reorganization influenced the restructuring of the S. cerevisiae RLM1 gene,
decreasing the selective pressure on this gene and enabling a frameshift mutation, which
shifted the reading of codons from Gln to Asn. Additionally, this process of gene
duplication affected not only the RLM1 gene but also the MCM1 gene in the Saccharomyces
sensu stricto and Saccharomyces sensu lato groups. This observation points to the
potential use of MADS-box transcription factors in studies that identify and
characterize genome duplication within the subphylum Saccharomycotina, given the
presence of two or four MADS-box genes in these yeast species.

The major findings of this work were (i) the identification of the potential use of the
RLM1 gene for inferring phylogeny in the kingdom Fungi, with special emphasis on Candida
species, and (ii) the observation that this gene evolved within the Saccharomyces sensu
stricto group after genome duplication, the molecular mechanism responsible for the
change observed in the C-terminal of this protein being most probably a frameshift
mutation.

6. REFERENCES

Abascal, F., Zardoya, R., and Posada, D. (2005). ProtTest: Selection of best-fit

models of protein evolution. Bioinformatics. 21(9): 2104-2105.

Adachi, J., and Hasegawa, M. (1995). MOLPHY: Programs for Molecular

Phylogenetics. Tokyo: Inst. Statist. Math.

Alba, M. M., Santibáñez-Koref, M. F., and Hancock, J. M. (1999). Amino acid

reiterations in yeast are overrepresented in particular classes of proteins and

show evidence of a slippage-like mutational process. J Mol Evol. 49: 789–797.

Alexopoulos, C.J., Mims, C.W., and Blackwell, M. (1996). Introductory

Mycology. 4th edition. John Wiley & Sons Inc., New York, USA.

Alfaro, M. E., Zoller, S., and Lutzoni, F. (2003). Bayes or bootstrap? A

simulation study comparing the performance of Bayesian Markov chain Monte

Carlo sampling and bootstrapping in assessing phylogenetic confidence.

Molecular Biology and Evolution. 20: 255–266.

Althoefer, H., Schleiffer, A., Wassmann, K., Nordheim, A., and Ammerer, G.

(1995). Mcm1 is required to coordinate G2-specific transcription in

Saccharomyces cerevisiae. Mol Cell Biol. 15: 5917–5928.

Alvarez-Buylla, E. R., Pelaz, S., Liljegren, S. J., Gold, S. E., Burgeff, C., Ditta,

G. S., Ribas de Pouplana, L., Martinez-Castilla, L., and Yanofsky, M. F. (2000).

An ancestral MADS-box gene duplication occurred before the divergence of

plants and animals. Proc Natl Acad Sci. 97: 5328–5333.

Anisimova, M., Bielawski, J. P., and Yang, Z. (2001). Accuracy and power of

the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol

Evol. 18(8): 1585-1592.

Anisimova, M., Bielawski, J. P., and Yang, Z. (2002). Accuracy and power of

Bayes prediction of amino acid sites under positive selection. Mol Biol Evol. 19:

950-958.

Anisimova, M., and Gascuel, O. (2006). Approximate Likelihood-Ratio Test for

branches: A fast, accurate and powerful alternative. Systematic Biology. 55(4):

539-552.

Aragona, M., Montigiani, M., and Porta-Puglia, A. (2000). Electrophoretic

karyotypes of the phytopathogenic Pyrenophora graminea and P. teres.

Mycological Research. 104: 853-857.

Baillie, G. S., and Douglas, L. J. (1998). Effect of growth rate on resistance of

Candida albicans biofilms to antifungal agents. Antimicrob Agents Chemother.

42: 1900–1905.

Baldauf, S. L. (1999). A search for the origins of animals and fungi: comparing

and combining molecular data. American Naturalist. 154: S178-S188.

Baldauf, S. L., and Palmer, J. D. (1993). Animals and fungi are each other's

closest relatives: congruent evidence from multiple proteins. Proc Natl Acad Sci.

90: 11558–11562.

Baldauf, S. L., Roger, A. J., Wenk-Siefert, I., and Doolittle, W. F. (2000). A

kingdom-level phylogeny of eukaryotes based on combined protein data.

Science. 290: 972–977.

Ball, L. M., Bes, M. A., Theelen, B., Boekhout, T., Egeler, R., and Kuijper, E. J.

(2004). Significance of amplified fragment length polymorphism in

identification and epidemiological examination of Candida species colonization

in children undergoing allogeneic stem cell transplantation. J Clin Microbiol.

42(4): 1673-1679.

Ballesté, R., Arteta, Z., Fernández, N., Mier, C., Mousqués, N., Xavier, B.,

Cabrera, M. J., Acosta, G., Combol, A., and Gezuele, E. (2005). Evaluación del

medio cromógeno CHROMagar Candida™ para la identificación de levaduras

de interés médico. Rev Med Uruguay. 21: 186-193.

Baker, E. E., Mrak, M., and Smith, C. E. (1943). The morphology, taxonomy,

and distribution of Coccidioides immitis Rixford and Gilchrist 1896. Farlowia.

1: 199-244.

Bapteste, E., Brinkmann, H., Lee, J. A., Moore, D. V., Sensen, C. W., Gordon,

P., Duruflé, L., Gaasterland, T., Lopez, P., Müller, M., and Philippe, H. (2001).

The analysis of 100 genes supports the grouping of three highly divergent

amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci.

99: 1414 – 1419.

Barr, D. J. S. (1992). Evolution and kingdoms of organisms from the perspective

of a mycologist. Mycologia. 84: 1–11.

Barros Lopes, M., Rainieri, S., Henschke, P.A., and Langridge, P. (1999). AFLP

fingerprinting for analysis of yeast genetic variation. Int J Syst Bacteriol. 49:

915-924.

Bartnicki-Garcia, S. (1987). The cell wall in fungal evolution, p. 389-403. In A.

D. M. Rayner, C. M. Brasier, and D. Moore (ed.), Evolutionary biology of the

fungi. Cambridge University Press, New York, N. Y.

Bhattacharya, D., Lutzoni, F., Reeb, V., Simon, D., Nason, J., and Fernandez, F.

(2000). Widespread occurrence of spliceosomal introns in the rDNA genes of

Ascomycetes. Molecular Biology and Evolution. 17: 1971–1984.

Bauer, R., Begerow, D., Sampaio, J.P., Weiß, M., and Oberwinkler, F. (2006).

The simple-septate basidiomycetes: a synopsis. Mycological Progress. 5: 41–66.

Beckmann, J. S., and Weber, J. L. (1992). Survey of human and rat

microsatellites. Genomics. 12: 627–631.

Berbee, M. L. (1996). Loculoascomycete origins and evolution of filamentous

ascomycete morphology based on 18S rDNA gene sequence data. Molecular

Biology and Evolution. 13: 462–470.

Berbee, M. L. (2001). The phylogeny of plant and animal pathogens in the

Ascomycota. Physiology and Molecular Plant Pathology. 59: 165–187.

Berbee, M. L., and Taylor, J. W. (1993). Dating the evolutionary radiations of

the true fungi. Canadian Journal of Botany. 71: 1114–1127.

Bernal, S., Mazuelos, E. M., Chavez, M., Coronilla, J., and Valverde, A. (1998).

Evaluation of the new API Candida system for identification of the most

clinically important yeast species. Diagnostic Microbiology and Infectious

Diseases. 32(3): 217-221.

Binder, M., and Hibbett, D. S. (2002). Higher-level phylogenetic relationships of

Homobasidiomycetes (mushroom-forming fungi) inferred from four rDNA

regions. Molecular Phylogenetics and Evolution. 22: 76–90.

Binder, M., Hibbett, D. S, and Molitoris, H. P. (2001). Phylogenetic

relationships of the marine gasteromycete Nia vibrissa. Mycologia. 93: 679–

688.

Binder, M., Hibbett, D. S., Larsson, K-H., Larsson, E., Langer, E., and Langer,

G. (2005). The phylogenetic distribution of resupinate forms across the major

clades of mushroom-forming fungi (Homobasidiomycetes). Systematics and

Biodiversity. 3(2): 113-157.

Blackwell, M. (2000). Evolution. Terrestrial life-fungal from the start? Science.

293: 1129-1133.

Blackwell, M., Hibbett, D. S., Taylor, J., and Spatafora, J. W. (2006). Research

coordination networks: a phylogeny for kingdom Fungi (Deep Hypha).

Mycologia. 98(6): 829-837.

Boeke, J. D., Garfinkel, D. J., Styles, C. A., and Fink, G. R. (1985). Ty elements

transpose through an RNA intermediate. Cell. 40: 491–500.

Botterel, F., Cesterke, C., Costa, C., and Bretagne, S. (2001). Analysis of

microsatellite markers of Candida albicans used for rapid typing. J Clin

Microbiol. 39: 4076-4081.

Bowman, B. H., White, T. J., and Taylor, J. W. (1996). Human pathogenic

fungi and their close nonpathogenic relatives. Mol Phylogenet Evol. 6(1): 89-96.

Bowman, S. M., Piwowar, A., Al Dabbous, M., Vierula, J., and Free, S. J.

(2006). Mutational analysis of the glycosylphosphatidylinositol (GPI) anchor

pathway demonstrates that GPI-anchored proteins are required for cell wall

biogenesis and normal hyphal growth in Neurospora crassa. Eukaryot Cell. 5:

587–600.

Bretagne, S., Costa, J. M., Besmond, C., Carsique, R., and Calderone, R. (1997).

Microsatellite polymorphism in the promoter sequence of the elongation factor 3

gene of Candida albicans as the basis for a typing system. J Clin Microbiol. 35:

1777-1780.

Bridge, P. D., Spooner, B. M., and Roberts, P. J. (2005). The impact of

molecular data in fungal systematics. Adv Bot Res. 42: 33 – 67.

Bruno, V. M., Kalachikov, S., Subaran, R., Nobile, C. J., Kyratsous, C., and

Mitchell, A. P. (2006). Control of the C. albicans cell wall damage response by

transcriptional regulator Cas5. PLoS Pathogens. 2(3): e21.

Bruns, T. D., Vilgalys, R., Barns, S. M., Gonzalez, D., Hibbett, D. S., Lane, D.

J., Simon, L., Stickel, S., Szaro, T. M., Weisburg, W. G., and Sogin, M. L.

(1992). Evolutionary relationships within the fungi: analyses of nuclear small

subunit RNA sequences. Molecular Phylogenetics and Evolution. 1: 231–241.

Busch, U., and Nitschko, H. (1999). Methods for the differentiation of

microorganisms. Journal of Chromatography B. 722: 263 - 278.

Byrne, K. P., and Wolfe, K. H. (2006). Visualizing syntenic relationships among

the hemiascomycetes with the Yeast Gene Order Browser. Nucleic Acids Res.

34: D452-D455.

Calderone, R. A. (2002). Candida and Candidiasis. ASM Press. Washington,

DC, USA.

Carlile, M. J., and Watkinson, S. C. (1994). The Fungi. Academic Press, Ltd.,

London, United Kingdom.

Cavalier-Smith, T. (1981). Eukaryote kingdoms: seven or nine? Biosystems.

14(3-4): 461-481.

Cavalier-Smith, T. (1987a). The origin of eukaryotic and archaebacterial cells.

Ann N Y Acad Sci. 503: 17-54.

Cavalier-Smith, T. (1987b). The origin of Fungi and pseudofungi. In: Rayner A.

D. M., Brasier C. M. and Moore D. (eds): Evolutionary Biology of the Fungi,

pp. 339–353. Cambridge University Press.

Cavalier-Smith, T. (1998). A revised six-kingdom system of life. Biol. Rev.

Camb. Philos. Soc. 73: 203–266.

Cavalier-Smith, T. (2000). Flagellate megaevolution: the basis for eukaryote

diversification. In: Green J. R. and Leadbeater B. S. C. (eds): The Flagellates,

pp. 361-390. Taylor and Francis. London.

Cavalier-Smith, T. (2002a). The neomuran origin of archaebacteria, the

negibacterial root of the universal tree and bacterial megaclassification. Int J

Syst Evol Microbiol. 52(Pt 1): 7-76.

Cavalier-Smith, T. (2002b). The phagotrophic origin of eukaryotes and

phylogenetic classification of Protozoa. Int J Syst Evol Microbiol. 52(Pt 2):

297-354.

Cavalier-Smith, T. (2002c). Chloroplast evolution: Secondary symbiogenesis

and multiple losses. Curr Biol. 12: R62-R64.

Cavalier-Smith, T. (2003). Protist phylogeny and the high-level classification of

Protozoa. Europ J Protistol. 39: 338-348.

Cavalier-Smith, T. (2004). Only six kingdoms of life. Proc R Soc Lond B. 271:

1251-1262.

Cavalier-Smith, T. (2006a). Rooting of tree of life by transition analysis.

Biology Direct. 1: 19.

Cavalier-Smith, T. (2006b). Cell evolution and earth history: stasis and

revolution. Phil Trans Roy Soc B. 361: 969-1006.

Cavalier-Smith, T. (2006c). Protozoa: the most abundant predators on earth.

Microbiology today. 166-169.

Cavalier-Smith, T., and Chao, E. E. (1995). The opalozoan Apusomonas is

related to the common ancestor of animals, fungi, and choanoflagellates. Proc R

Soc Lond B. 261: 1–9.

Cavalier-Smith, T., and Chao, E. E. (2003). Phylogeny of Choanozoa,

Apusozoa, and other Protozoa and early eukaryote megaevolution. J Mol Evol.

56: 540–563.

Cavalier-Smith, T., Chao, E. E., and Oates, B. (2004). Molecular phylogeny of

Amoebozoa and the evolutionary significance of the unikont Phalansterium. Eur

J Protistol. 40: 21-48.

Cavalli-Sforza, L. L., and Edwards, A. W. F. (1967). Phylogenetic analysis:

models and estimation procedures. Am J Hum Genet. 19: 122–257.

Chapela, I. H. (1991). Spore size revisited: analysis of spore populations using

automated particle size. Sydowia. 43: 1–14.

Chen, T. C., Chen, Y. H., Tsai, J. J., Peng, C. F., Lu, P. L., Chang, K., Hsieh, H.

C., and Chen, T. P. (2005). Epidemiologic analysis and antifungal susceptibility

of Candida blood isolates in Southern Taiwan. Journal of Microbiology,

Immunology and Infection. 38: 200-210.

Chistiakov, D. A., Hellemans, B., and Volckaert, F. A. M. (2006).

Microsatellites and their genomic distribution, evolution, function and

applications: A review with special reference to fish genetics. Aquaculture. 255:

1-29.

Ciferri, R., and Redaelli, P. (1936). Morfologia, biologia e posizione sistemática

di Coccidioides immitis stiles e delle sue varieta, con notizie sul granuloma

coccidioide. R Accad Ital. 7: 399-474.

Clark, T. A., Slavinski, S. A, Morgan, J., Lott, T., Arthington-Skaggs, B. A.,

Brandt, M. E., Webb, R. M., Currier, M., Flowers, R. H., Fridkin, S. K., and

Hajjeh, R. A. (2004). Epidemiologic and molecular characterization of an

outbreak of Candida parapsilosis bloodstream infections in a community

hospital. J Clin Microbiol. 42(10): 4468-4472.

Cole, G. T., and Samson, R. A. (1979). Patterns of development in conidial fungi.

Pitman, London, United Kingdom.

Coleman, D. C., Rinaldi, M. G., Haynes, K. A., Rex, J. H., Summerbell, R. C.,

Anaissie, E. J., Li, A., and Sullivan, D. J. (1998). Importance of Candida species

other than Candida albicans as opportunistic pathogens. Medical Mycology. 36:

156–165.

Copeland, H. (1956). The Classification of Lower Organisms. Palo Alto: Pacific

Books.

Caro, L. H., Tettelin, H., Vossen, J. H., Ram, A. F., van den Ende, H., and Klis,

F. M. (1997). In silico identification of glycosyl-phosphatidylinositol-anchored

plasma-membrane and cell wall proteins of Saccharomyces cerevisiae. Yeast.

13: 1477–1489.

Cordeiro, G. M., Casu, R., McIntyre, C. L., Manners, J. M., and Henry, R. J.

(2001). Microsatellite markers from sugarcane (Saccharum spp.) ESTs cross

transferable to erianthus and sorghum. Plant Sci. 160: 1115–1123.

Correia, A., Sampaio, P., James, S., and Pais, C. (2006). Candida bracarensis

sp. nov., a novel anamorphic yeast species phenotypically similar to Candida

glabrata. Int J Syst Evol Microbiol. 56: 313-317.

Damveld, R. A., Arentshorst, M., Franken, A., vanKuyk, P. A., Klis, F. M., van

den Hondel, C. A. M. J. J., and Ram, A. F. J. (2005). The Aspergillus niger

MADS-box transcription factor RlmA is required for cell wall reinforcement in

response to cell wall stress. Molecular Microbiology. 58(1): 305-319.

Daniel, H. M., Sorrell, T. C., and Meyer, W. (2001). Partial sequence analysis of

the actin gene and its potential for studying the phylogeny of Candida species

and their teleomorphs. Int J Syst Evol Microbiol. 51(Pt 4): 1593-1606.

De Groot, P. W., de Boer, A. D., Cunningham, J., Dekker, H. L., de Jong, L.,

Hellingwerf, K. J., de Koster, C., and Klis, F. M. (2004). Proteomic analysis of

Candida albicans cell walls reveals covalently bound carbohydrate-active

enzymes and adhesins. Eukaryot Cell. 3: 955–965.

De Nadal, E., Casadomé, L., and Posas, F. (2003). Targeting the MEF2-like

transcription factor Smp1 by the stress-activated Hog1 mitogen-activated protein

kinase. Mol Cell Biol. 23(1): 229-237.

Dib, J.C., Dube, M., Kelly, C., Rinaldi, M.G., and Patterson, J. E. (1996).

Evaluation of pulsed-field gel electrophoresis as a typing system for Candida

rugosa: comparison of karyotype and restriction fragment length

polymorphisms. J Clin Microbiol. 34: 1494–1496.

Diezmann, S., Cox, C. J., Schönian, G., Vilgalys, R., and Mitchell, T. (2004).

Phylogeny and evolution of medical species of Candida and related taxa: A

multigenic analysis. J Clin Microbiol. 42(12): 5624-5635.

Dodou, E., and Treisman, R. (1997). The Saccharomyces cerevisiae MADS-box

transcription factor Rlm1 is a target for the Mpk1 mitogen-activated protein

kinase pathway. Mol Cell Biol. 17: 1848– 1859.

Doron-Faigenboim, A., and Pupko, T. (2006). A combined empirical and

mechanistic codon model. Mol Biol Evol. 24(2): 388-397.

Douady, C. J., Delsuc, F., Boucher, Y., Doolittle, W. F., and Douzery, E. J. P.

(2003). Comparison of the Bayesian and maximum likelihood bootstrap

measures of phylogenetic reliability. Molecular Biology and Evolution. 20: 248–

254.

Douzery, E. J., Snell, E. A., Bapteste, E., Delsuc, F., and Philippe, H. (2004). The

timing of eukaryotic evolution: does a relaxed molecular clock reconcile

proteins and fossils? Proc Natl Acad Sci. 101(43): 15386-15391.

Dubois, E., Bercy, J., and Messenguy, F. (1987). Characterization of two genes,

ARGRI and ARGRIII required for specific regulation of arginine metabolism in

yeast. Mol Gen Genet. 207: 142–148.

Dujon, B., Sherman, D., Fischer, G., Durrens, P., Casaregola, S., Lafontaine, I.,

De Montigny, J., Marck, C., Neuveglise, C., Talla, E., Goffard, N., Frangeul, L.,

Aigle, M., Anthouard, V., Babour, A., Barbe, V., Barnay, S., Blanchin, S.,

Beckerich, J. M., Beyne, E., Bleykasten, C., Boisrame, A., Boyer, J., Cattolico,

L., Confanioleri, F., De Daruvar, A., Despons, L., Fabre, E., Fairhead, C., Ferry-

Dumazet, H., Groppi, A., Hantraye, F., Hennequin, C., Jauniaux, N., Joyet, P.,

Kachouri, R., Kerrest, A., Koszul, R., Lemaire, M., Lesur, I., Ma, L., Muller, H.,

Nicaud, J. M., Nikolski, M., Oztas, S., Ozier-Kalogeropoulos, O., Pellenz, S.,

Potier, S., Richard, G. F., Straub, M. L., Suleau, A., Swennen, D., Tekaia, F.,

Wesolowski-Louvel, M., Westhof, E., Wirth, B., Zeniou-Meyer, M., Zivanovic,

I., Bolotin-Fukuhara, M., Thierry, A., Bouchier, C., Caudron, B., Scarpelli, C.,

Gaillardin, C., Weissenbach, J., Wincker, P., and Souciet, J. L. (2004). Genome

evolution in yeasts. Nature. 430(6995): 35-44.

Edel, V., Steinberg, C., Gautheron, N., and Alabouvette, C. (1996). Evaluation

of restriction analysis of polymerase chain reaction (PCR)-amplified ribosomal

DNA for the identification of Fusarium species. Mycol Res. 101: 179–187.

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy

and high throughput. Nucleic Acids Research. 32(5): 1792-1797.

Eisenhaber, B., Maurer-Stroh, S., Novatchkova, M., Schneider, G., and

Eisenhaber, F. (2003). Enzymes and auxiliary factors for GPI lipid anchor

biosynthesis and post-translational transfer to proteins. Bioessays. 25: 367–385.

Eisenhaber, B., Schneider, G., Wildpaner, M., and Eisenhaber, F. (2004). A

sensitive predictor for potential GPI lipid modification sites in fungal protein

sequences and its application to genome-wide studies for Aspergillus nidulans,

Candida albicans, Neurospora crassa, Saccharomyces cerevisiae and

Schizosaccharomyces pombe. J Mol Biol. 337: 243–253.

Elble, R., and Tye, B. K. (1992). Chromosome loss, hyperrecombination, and

cell cycle arrest in a yeast mcm1 mutant. Mol Biol Cell. 3: 971 – 980.

Ellegren, H. (2000). Microsatellite mutations in the germline: implications for

evolutionary inference. Trends Genet. 16: 551–558.

Eriksson, O. E., Baral, H. O., Currah, R. S., Hansen, K., Kurtzman, C. P.,

Rambold, G., and Laessøe, T (eds). (2004). Outline of Ascomycota— 2004.

Myconet. 10: 1–99.

Errede, B. (1993). MCM1 binds to a transcriptional control element in Ty1. Mol

Cell Biol. 13: 57 – 62.

Estrella, M. C., Mellado, E., Guerra, T. M. D., Monzón, A., and Tudela, J. L. R.

(2000). Susceptibility of fluconazole-resistant clinical isolates of Candida spp.

to echinocandin LY303366, itraconazole and amphotericin B. Journal of

Antimicrobial Chemotherapy. 46: 475-477.

Evans, E. E. (1950). The antigenic composition of Cryptococcus neoformans. I.

A serologic classification by means of the capsular and agglutination reactions. J

Immunol. 64: 423–430.

Fabre, E., Muller, H., Therizols, P., Lafontaine, I., Dujon, B., and Fairhead, C.

(2005). Comparative genomics in hemiascomycete yeasts: evolution of sex,

silencing, and subtelomeres. Mol Biol Evol. 22(4): 856-873.

Fanci, R., and Pecile, P. (2005). Central venous catheter-related infection due to

Candida membranaefaciens, a new opportunistic azole-resistant yeast in a

cancer patient: a case report and a review of literature. Mycoses. 48: 357-359.

Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will

be positively misleading. Syst Zool. 27: 401–410.

Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum

likelihood approach. J Mol Evol. 17: 368–376.

Ferguson, M. A. (1999). The structure, biosynthesis and functions of

glycosylphosphatidylinositol anchors, and the contributions of trypanosome

research. J Cell Sci. 112: 2799–2809.

Fernandes, L., Araújo, M. A. M., Amaral, A., Reis, V. C. B., Martins, N. F., and

Felipe, M. S. (2005). Cell signalling pathways in Paracoccidioides brasiliensis –

inferred from comparisons with other fungi. Genet Mol Res. 4(2): 216-231.

Field, D., and Wills, C. (1996). Long, polymorphic microsatellites in simple

organisms. Proc R Soc London Ser B. 263: 209–251.

Fink, G. R. (1987). Pseudogenes in yeast? Cell. 49: 5–6.

Fitzpatrick, D., Logue, M., Stajich, J., and Butler, G. (2006). A fungal phylogeny

based on 42 complete genomes derived from supertree and combined gene

analysis. BMC Evolutionary Biology. 6: 99.

Fluit, A. C., Jones, M. E., Schmitz, F-J., Acar, J., Gupta, R., Verhoef, J.,

SENTRY participants Group. (2000). Antimicrobial susceptibility and

frequency of occurrence of clinical blood isolates in Europe from the SENTRY

Antimicrobial Surveillance Program, 1997 and 1998. Clin Infect Dis. 30: 454–

60.

Frisvad, J. C. (1994). Classification of organisms by secondary metabolites, p.

303-321. In D. L. Hawksworth (ed.), The identification and characterization of

pest organisms. CAB International, Wallingford, United Kingdom.

García-Martos, P., Ruiz-Aragón, J., Garcia-Agudo, L., Saldarreaga, A., Lozano,

M., and Marín, P. (2004). Aislamiento de Candida ciferri en un paciente

inmunodeficiente. Rev Iberoam Micol. 21: 85-86.

Germot, A., Philippe, H., and Le Guyader, H. (1997). Evidence for loss of

mitochondria in microsporidia from a mitochondrial HSP70 in Nosema locustae.

Molecular and Biochemical Parasitology. 87: 159–168.

Geiser, D. M., Gueidan, C., Miadlikowska, J., Lutzoni, F., Kauff, F., Hofstetter,

V., Fraker, E., Schoch, C. L., Tibell, L., Untereiner, W. A., and Aptroot, A.

(2006). Eurotiomycetes: Eurotiomycetidae and Chaetothyriomycetidae.

Mycologia. 98(6): 1053-1064.

Gill, E. E., and Fast, N. M. (2006). Assessing the microsporidia–fungi

relationship: combined phylogenetic analysis of eight genes. Gene. 375: 103–

109.

Graf, B., Adam, T., Zill, E., and Göbel, U. B. (2000). Evaluation of the VITEK 2

system for rapid identification of yeasts and yeast-like organisms. Journal of

Clinical Microbiology. 38(5): 1782-1785.

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation

and Bayesian model determination. Biometrika. 82: 711-732.

Gribaldo, S., and Philippe, H. (2002). Ancient phylogenetic relationships. Theor

Popul Biol. 61(4): 391-408.

Guarro, J., Gené, J., and Stchigel, A. (1999). Developments in fungal taxonomy.

Clin Microbiol Rev. 12(3): 454-500.

Gudlaugsson, O., Gillespie, S., Lee, K., Van de Berg, J., Hu, J., Messer, S.,

Herwaldt, L., Pfaller, M., and Diekema, D. (2003). Attributable mortality of

nosocomial candidemia. Clin Infect Dis. 37: 1172-1177.

Guindon, S., and Gascuel, O. (2003). A simple, fast, and accurate algorithm to

estimate large phylogenies by maximum likelihood. Systematic Biology. 52(5):

696-704.

Guindon, S., Lethiec, F., Duroux, P., and Gascuel, O. (2005). PHYML Online –

a web server for fast maximum likelihood-based phylogenetic inference. Nucleic

Acids Research. 33: W557-W559.

Gur-Arie, R., Cohen, C., Eitan, L., Shelef, E., Hallerman, E., and Kashi, Y.

(2000). Abundance, non-random genomic distribution, nucleotide composition,

and polymorphism of simple sequence repeats in E. coli. Genome Res. 10: 61–

70.

Gutierrez, J., Morales, P., Gonzalez, M. A, and Quindos, G. (2002). Candida

dubliniensis, a new fungal pathogen. J. Basic Microbiol. 42: 207–227.

Haeckel, E. (1866). Generelle Morphologie der Organismen. Reimer, Berlin.

Hall, B. G. (2008). Phylogenetic trees made easy: A how to manual for

molecular biologists. Third Edition. Sinauer Associates, Inc. Sunderland,

Massachusetts, USA.

Harding, R. M., Boyce, A. J., and Clegg, J. B. (1992). The evolution of tandemly

repetitive DNA: recombination rules. Genetics. 132: 847–859.

Hasenclever, H. F., and Mitchell, W. O. (1961). Antigenic studies of Candida. I.

Observations of two antigenic groups in Candida albicans. J Bacteriol. 82: 570

– 573.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains

and their applications. Biometrika. 57: 97-109.

Hawksworth, D. L. (1991). The fungal dimension of biodiversity-magnitude,

significance, and conservation. Mycological Research. 95: 641.

Hawksworth, D. L. (2001). The magnitude of fungal diversity: the 1-5 million

species estimate revisited. Mycological Research. 105: 1422-1433.

Hawksworth, D. L., Kirk, P. M., Sutton, B. C., and Pegler, D. N. (1995).

Ainsworth & Bisby's Dictionary of the Fungi. 8th ed. CAB International,

Wallingford, Oxon, United Kingdom.

Hazen, K. C. (1995). New and emerging yeast pathogens. Clin Microbiol

Reviews. 8: 462–478.

Heath, I. B. (1986). Nuclear division: a marker for protist phylogeny? Progress

in Protistology. 1: 115-162.

Hibbett, D. S., and Binder, M. (2001). Evolution of marine mushrooms.

Biological Bulletin. 201: 319–322.

Hibbett, D. S., Gilbert, L. B., and Donoghue, M. J. (2000). Evolutionary

instability of ectomycorrhizal symbiosis in basidiomycetes. Nature. 407: 506–

508.

Hibbett, D. S., Binder, M., Bischoff, J. F., Blackwell, M., Cannon, P. F.,

Eriksson, O. E., Huhndorf, S., James, T., Kirk, P. M., Lücking, R., Lumbsch, T.,

Lutzoni, F., Matheny, P. B., McLaughlin, D. J., Powell, M. J., Redhead, S.,

Schoch, C. L., Spatafora, J. W., Stalpers, J. A., Vilgalys, R., Aime, M. C.,

Aptroot, A., Bauer, R., Begerow, D., Benny, G. L., Castlebury, L. A., Crous, P.

W., Dai, Y-C., Gams, W., Geiser, D. M., Griffith, G. W., Gueidan, C.,

Hawksworth, D. L., Hestmark, G., Hosaka, K., Humber, R. A., Hyde, K.,

Ironside, J. E., Kõljalg, U., Kurtzman, C. P., Larsson, K-H., Lichtwardt, R.,

Longcore, J., Miadlikowska, J., Miller, A., Moncalvo, J-M., Mozley-Standridge,

S., Oberwinkler, F., Parmasto, E., Reeb, V., Rogers, J. D., Roux, C., Ryvarden,

L., Sampaio, J. P., Schüßler, A., Sugiyama, J., Thorn, R. G., Tibell, L.,

Untereiner, W. A., Walker, C., Wang, Z., Weir, A., Weiß, M., White, M. M.,

Winka, K., Yao, Y-J., and Zhang, N. (2007). A higher-level phylogenetic

classification of the Fungi. Mycological Research. 111: 509–547.

Hirt, R. P., Healy, B., Vossbrinck, C. R., Canning, E. U., and Embley, T. M.

(1997). A mitochondrial Hsp70 orthologue in Vairimorpha necatrix: molecular

evidence that Microsporidia once contained mitochondria. Current Biology. 7:

995–998.

Hood, D. W., Deadman, M. E., Jennings, M. P., Bisercic, M., Fleischmann, R.

D., Venter, J. C., and Moxon, E. R. (1996). DNA repeats identify novel

virulence genes in Haemophilus influenzae. Proc Natl Acad Sci. 93: 11121–

11125.

Horowitz, B. J., Giaquinta, D., and Ito, S. (1992). Evolving pathogens in

vulvovaginal candidiasis: Implications for patient care. J Clin Pharmacol. 32(3):

248-255.

Huelsenbeck, J. P., and Ronquist, F. (2001). MrBayes: Bayesian inference of

phylogenetic trees. Bioinformatics. 17(8): 754-755.

Huelsenbeck, J. P., Larget, B., Miller, R. E., and Ronquist, F. (2002). Potential

applications and pitfalls of Bayesian inference of phylogeny. Syst Biol. 51(5):

673 – 688.

Hull, C. M., and Johnson, A. D. (1999). Identification of a mating type-like

locus in the asexual pathogenic yeast Candida albicans. Science. 285(5431):

1271-1275.

Hull, C. M., Raisner, R. M., and Johnson, A. D. (2000). Evidence for mating of

the "asexual" yeast Candida albicans in a mammalian host. Science. 289(5477):

307-310.

Huson, D., and Bryant, D. (2006). Application of Phylogenetic Networks in

evolutionary studies. Mol Biol Evol. 23(2): 254-267.

Inagaki, Y., and Doolittle, W. F. (2000). Evolution of the eukaryotic translation

termination system: origins of release factors. Mol Biol Evol. 17: 882 – 889.

Iyer, L. M., Koonin, E. V., and Aravind, L. (2004). Evolution of bacterial RNA

polymerase: implications for large-scale bacterial phylogeny, domain accretion,

and horizontal gene transfer. Gene. 335: 73-88.

Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S., and Miyata, T. (1989).

Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred

from phylogenetic trees of duplicated genes. Proc Natl Acad Sci. 86(23): 9355-

9359.

James, T. Y., Porter, D., Leander, C. A., Vilgalys, R., and Longcore, J. E.

(2000). Molecular phylogenetics of the Chytridiomycota supports the utility of

ultrastructural data in chytrid systematics. Canadian Journal of Botany. 78: 336–

350.

James, T. Y., Kauff, F., Schoch, C. L., Matheny, P. B., Hofstetter, V., Cox, C.,

Celio, G., Gueidan, C., Fraker, E., Miadlikowska, J., Lumbsch, H. T., Rauhut,

A., Reeb, V., Arnold, E. A., Amtoft, A., Stajich, J. E., Hosaka, K., Sung, G-H.,

Johnson, D., O'Rourke, B., Crockett, M., Binder, M., Curtis, J. M., Slot, J. C.,

Wang, Z., Wilson, A. W., Schüßler, A., Longcore, J. E., O'Donnell, K., Mozley-

Standridge, S., Porter, D., Letcher, P. M., Powell, M. J., Taylor, J. W., White,

M. M., Griffith, G. W., Davies, D. R., Humber, R. A., Morton, J., Sugiyama, J.,

Rossman, A. Y., Rogers, J. D., Pfister, D. H., Hewitt, D., Hansen, K.,

Hambleton, S., Shoemaker, R. A., Kohlmeyer, J., Volkmann-Kohlmeyer, B.,

Spotts, R. A., Serdani, M., Crous, P. W., Hughes, K. W., Matsuura, K., Langer,

E., Langer, G., Untereiner, W. A., Lücking, R., Büdel, B., Geiser, D. M.,

Aptroot, A., Diederich, P., Schmitt, I., Schultz, M., Yahr, R., Hibbett, D. S.,

Lutzoni, F., McLaughlin, D., Spatafora, J., and Vilgalys, R. (2006).

Reconstructing the early evolution of the fungi using a six gene phylogeny.

Nature. 443: 818–822.

James, T. Y., Letcher, P. M., Longcore, J. E., Mozley-Standridge, S. E., Porter,

D., Powell, M. J., Griffith, G. W., and Vilgalys, R. (2007). A molecular

phylogeny of the flagellated fungi (Chytridiomycota) and a proposal for a new

phylum (Blastocladiomycota). Mycologia. 98: 860–871.

Jarvis, W. R. (1995). Epidemiology of nosocomial fungal infections, with

emphasis on Candida species. Clin Infect Dis. 20: 1526–1530.

Jarvis, E. E., Clark, K. L., and Sprague Jr., G. F. (1989). The yeast transcription

activator PRTF, a homolog of the mammalian serum response factor, is encoded

by the MCM1 gene. Genes Dev. 3: 936–945.

Jeffroy, O., Brinkmann, H., Delsuc, F., and Philippe, H. (2006). Phylogenomics:

the beginning of incongruence? Trends Genet. 22(4): 225-231.

Karl, S. A., and Avise, J. C. (1993). PCR-based assays of mendelian

polymorphisms from anonymous single-copy nuclear DNA-techniques and

applications for population genetics. Mol Biol Evol. 10: 342–361.

Kashi, Y., King, D., and Soller, M. (1997). Simple sequence repeats as a source

of quantitative genetic variation. Trends Genet. 13: 74–78.

Kawaguchi, Y., Honda, H., Taniguchi-Morimura, J., and Iwasaki, S. (1989). The

codon CUG is read as serine in an asporogenic yeast Candida cylindracea.

Nature. 341(6238): 164-166.

Keeling, P. J. (2003). Congruent evidence from alpha-tubulin and beta-tubulin

gene phylogenies for a zygomycete origin of microsporidia. Fungal Genetics and

Biology. 38: 298–309.

Keeling, P. J., Luker, M. A., and Palmer, J. D. (2000). Evidence from beta-

tubulin phylogeny that Microsporidia evolved from within the fungi. Molecular

Biology and Evolution. 17: 23–31.

King, N., and Carroll, S. B. (2001). A receptor tyrosine kinase from

choanoflagellates: molecular insights into early animal evolution. Proc. Natl

Acad. Sci. 98: 15032–15037.

Kirk, P. M., Cannon, P. F., David, J. C., and Stalpers, J. A. (eds). (2001).

Ainsworth & Bisby's Dictionary of the Fungi 60 (CAB International,

Wallingford, UK).

Klempp-Selb, B., Rimek, D., and Kappe, R. (2000). Karyotyping of Candida

albicans and Candida glabrata from patients with Candida sepsis. Mycoses. 43:

159-163.

Kohn, L. M. (1992). Developing new characters for fungal systematics: an

experimental approach for determining the rank of resolution. Mycologia. 84:

139–153.

Kosakovsky Pond, S. L., Frost, S. D. W., and Muse, S. V. (2005). HyPhy: Hypothesis

testing using phylogenies. Bioinformatics. 21(5): 676–9.

Kramer, E. M., Su, H-J., Wu, C-C., and Hu, J-M. (2006). A simplified

explanation for the frameshift mutation that created novel C-terminal motif in

the APETALA3 gene lineage. BMC Evolutionary Biology. 6: 30.

Krcmery, V., and Barnes, A. J. (2002). Non-albicans Candida spp. causing

fungaemia: pathogenicity and antifungal resistance. J Hosp Infect. 50: 243–260.

Kruger, J., Aichinger, C., Kahmann, R., and Bolker, M. (1997). A MADS-box

homologue in Ustilago maydis regulates the expression of pheromone-inducible

genes but is nonessential. Genetics. 147: 1643– 1652.

Kuhn, D. M., Mukherjee, P. K., Clark, T. A., Pujol, C., Chandra, J., Hajjeh, R.

A., Warnock, D. W., Soll, D. R., and Ghannoum, M. A. (2004). Candida

parapsilosis characterization in an outbreak setting. Emerg Infect Dis. 10: 1074–

1081.

Kullberg, B. J., and Lashof, A. M. L. O. (2002). Epidemiology of opportunistic

invasive mycoses. European Journal of Medical Research. 7: 183-191.

Kuo, M. H., Nadeau, E. T., and Grayhack, E. J. (1997). Multiple phosphorylated

forms of the Saccharomyces cerevisiae Mcm1 protein include an isoform

induced in response to high salt concentrations. Mol Cell Biol. 17: 819–832.

Kurtzman, C. P., and Robnett, C. J. (1998). Identification and phylogeny of

ascomycetous yeasts from analysis of nuclear large subunit (26S) ribosomal

DNA partial sequences. Antonie van Leeuwenhoek. 73: 331–371.

Kurtzman, C. P., and Robnett, C. J. (2003). Phylogenetic relationships among

yeasts of the 'Saccharomyces complex' determined from multigene sequence

analyses. FEMS Yeast Res. 3: 417–432.

Latchman, D. S. (1998). Eukaryotic transcription factors, Academic Press.

Lang, B. F., O'Kelly, C., Nerad, T., Gray, M. W., and Burger, G. (2002). The

closest unicellular relatives of animals. Curr Biol. 12: 1773–1778.

Langkjaer, R. B., Cliften, P. F., Johnston, M., and Piskur, J. (2003). Yeast

genome duplication was followed by asynchronous differentiation of duplicated

genes. Nature. 421: 848–852.

Linnaei, C. (1735). Systema naturae sive Regna Tria Naturae Systematice

proposita per Classes, Ordines, Genera et Species. Lugduni Batavorum: apud T.

Haak.

LéJohn, H. B. (1974). Biochemical parameters of fungal phylogenetics. Evol

Biol. 7: 79–125.

Li, J., Xu, J., and Bai, F. (2006). Candida pseudorugosa sp. nov., a novel yeast

species from sputum. J Clin Microbiol. 44(12): 4486-4490.

Li, Y-C., Korol, A. B., Fahima, T., Beiles, A., and Nevo, E. (2002).

Microsatellites: genomic distribution, putative functions and mutational

mechanisms. Molecular Ecology. 11: 2453-2465.

Li, Y-C., Korol, A.B., Fahima, T., and Nevo, E. (2004). Microsatellites within

genes: structure, function, and evolution. Mol Biol Evol. 21: 991–1007.

Liu, Y. J., Whelen, S., and Hall, B. D. (1999). Phylogenetic relationships among

ascomycetes: evidence from an RNA polymerase II subunit. Molecular Biology

and Evolution. 16: 1799–1808.

Liu, Y. J., Hodson, M. C., and Hall, B. D. (2006). Loss of the flagellum

happened only once in the fungal lineage: phylogenetic structure of kingdom

Fungi inferred from RNA polymerase II subunit genes. BMC Evolutionary

Biology. 6: 74.

Liu, Y. J., Leigh, J. W., Brinkmann, H., Cushion, M. T., Rodriguez-Ezpeleta, N.,

Philippe, H., and Lang, B. F. (2009). Phylogenomic analyses support the

monophyly of Taphrinomycotina, including Schizosaccharomyces fission

yeasts. Mol Biol Evol. 26: 27-34.

Logue, M. E., Wong, S., Wolfe, K. H., and Butler, G. (2005). A genome

sequence survey shows that the pathogenic yeast Candida parapsilosis has a

defective MTLa1 allele at its mating type locus. Eukaryot Cell. 4(6): 1009-1017.

López Martínez, R., Ruiz Sánchez, J., and Vértiz Chávez, E. (1984). Vaginal

candidiasis. Mycopathologia. 85: 167-170.

Lowy, B. (1971). New records of mushroom stones from Guatemala. Mycologia.

63: 983-993.

Lumbsch, H. T., Schmitt, I., Lindemuth, R., Miller, A., Mangold, A., Fernandez,

F., and Huhndorf, S. (2005). Performance of four ribosomal DNA regions to

infer higher-level phylogenetic relationships of inoperculate euascomycetes

(Leotiomyceta). Mol Phylogenet Evol. 34(3): 512-524.

Lutzoni, F., Pagel, M., and Reeb, V. (2001). Major fungal lineages are derived

from lichen symbiotic ancestors. Nature. 411: 937–940.

Lutzoni, F., Kauff, F., Cox, C. J., McLaughlin, D., Celio, G., Dentinger, B.,

Padamsee, M., Hibbett, D. S., James, T. Y., Baloch, E., Grube, M., Reeb, V.,

Hofstetter, V., Schoch, C., Arnold, A. E., Miadlikowska, J., Spatafora, J.,

Johnson, D., Hambleton, S., Crockett, M., Shoemaker, R., Sung, G-H., Lücking,

R., Lumbsch, T., O'Donnell, K., Binder, M., Diederich, P., Ertz, D., Gueidan,

C., Hansen, K., Harris, R. C., Hosaka, K., Lim, Y-W., Matheny, B., Nishida, H.,

Pfister, D., Rogers, J., Rossman, A., Schmitt, I., Sipman, H., Stone, J.,

Sugiyama, J., Yahr, R., and Vilgalys, R. (2004). Assembling the fungal tree of

life: progress, classification, and evolution of subcellular traits. American

Journal of Botany. 91: 1446–1480.

Magee, B. B., and Magee, P. T. (1987). Electrophoretic karyotypes and

chromosome numbers in Candida species. J Gen Microbiol. 133: 425–430.

Magee, B. B., and Magee, P. T. (2000). Induction of mating in Candida albicans

by construction of MTLa and MTLalpha strains. Science. 289(5477): 310-313.

Marcotte, E. M., Pellegrini, M., Yeates, T. O., and Eisenberg, D. (1999). A

census of protein repeats. J Mol Biol. 293: 151–160.

Massey, S. E., Moura, G., Beltrao, P., Almeida, R., Garey, J. R., Tuite, M. F.,

and Santos, M. A. (2003). Comparative evolutionary genomics unveils the

molecular mechanism of reassignment of the CTG codon in Candida spp.

Genome Res. 13(4): 544-557.

Matheny, P. B., Gossman, J. A., Zalar, P., Arun Kumar, T. K., and Hibbett, D. S.

(2007a). Resolving the phylogenetic position of the Wallemiomycetes: an

enigmatic major lineage of Basidiomycota. Canadian Journal of Botany. 84:

1794–1805.

Matheny, P. B., Wang, Z., Binder, M., Curtis, J. M., Lim, Y. W., Nilsson, R. H.,

Hughes, K. W., Hofstetter, V., Ammirati, J. F., Schoch, C. L., Langer, G. E.,

McLaughlin, D. J., Wilson, A. W., Frøslev, T., Ge, Z. W., Kerrigan, R. W., Slot,

J. C., Vellinga, E. C., Liang, Z. L., Baroni, T. J., Fischer, M., Hosaka, K.,

Matsuura, K., Seidl, M. T., Vauras, J., and Hibbett, D. S. (2007b). Contributions

of Rpb2 and Tef1 to the phylogeny of mushrooms and allies (Basidiomycota,

Fungi). Molecular Phylogenetics and Evolution. 43: 430-451.

Mayer, V. W., and Aguilera, A. (1990). High levels of chromosome instability in

polyploids of Saccharomyces cerevisiae. Mutat Res. 231: 177–186.

McConville, M. J., and Menon, A. K. (2000). Recent developments in the cell

biology and biochemistry of glycosylphosphatidylinositol lipids (review). Mol

Membr Biol. 17: 1–16.

McInerny, C. J., Partridge, J. F., Mikesell, G. E., Creemer, D. P., and Breeden,

L. L. (1997). A novel Mcm1-dependent element in the SWI4, CLN3, CDC6, and

CDC47 promoters activates M/G1-specific transcription. Genes Dev. 11: 1277–

1288.

Melo, A., de Almeida, L., Colombo, A., and Briones, M. (1998). Evolutionary

distances and identification of Candida species in clinical isolates by randomly

amplified polymorphic DNA (RAPD). Mycopathologia. 142: 57-66.

Messenguy, F., and Dubois, E. (1993). Genetic evidence for a role for MCM1 in

the regulation of arginine metabolism in Saccharomyces cerevisiae. Mol Cell

Biol. 13: 2586– 2592.

Messenguy, F., and Dubois, E. (2000). Regulation of arginine metabolism in

Saccharomyces cerevisiae: a network of specific and pleiotropic proteins in

response to multiple environmental signals. Food Technol Biotechnol. 38: 277–

285.

Metzgar, D., van Belkum, A., Field, D., Haubrich, R., and Wills, C. (1998).

Random amplification of polymorphic DNA and microsatellite genotyping of

pre- and posttreatment isolates of Candida spp. from human immunodeficiency

virus-infected patients on different fluconazole regimens. J Clin Microbiol. 36:

2308-2313.

Metzgar, D., Bytof, J., and Wills, C. (2000). Selection against frameshift

mutations limits microsatellite expansion in coding DNA. Genome Res. 10: 72–

80.

Miadlikowska, J., and Lutzoni, F. (2004). Phylogenetic classification of

peltigeralean fungi (Peltigerales, Ascomycota) based on ribosomal RNA small

and large subunits. American Journal of Botany. 91: 449–464.

Micales, J. A., Bonde, M. R., and Peterson, G. L. (1986). The use of isozyme

analysis in fungal taxonomy and genetics. Mycotaxon. 27: 405-409.

Moestrup, Ø. (2000). The flagellate cytoskeleton: introduction of a general

terminology for microtubular roots in protists. In: Leadbeater B. S. and Green J.

C. (eds): The flagellates: unity, diversity and evolution, pp. 69–94. Taylor and

Francis, London.

Moncalvo, J. M., Vilgalys, R., Redhead, S. A., Johnson, J. E., James, T. Y.,

Aime, M. C., Hofstetter, V., Verduin, S. J. W., Larsson, E., Baroni, T. J., Thorn,

R. G., Jacobsson, S., Clémençon, H., and Miller Jr., O. K. (2002). One hundred

and seventeen clades of euagarics. Molecular Phylogenetics and Evolution. 23:

357–400.

Moss, M. O. (1987). Fungal biotechnology roundup. Mycologist. 21: 55-58.

Moxon, E. R., Rainey, P. B., Nowak, M. A., and Lenski, R. E. (1994). Adaptive

evolution of highly mutable loci in pathogenic bacteria. Curr Biol. 4: 24–33.

Mueller, C. G., and Nordheim, A. (1991). A protein domain conserved between

yeast MCM1 and human SRF directs ternary complex formation. EMBO J. 10:

4219– 4229.

Musial, C. E., Cockerill, F. R., and Roberts, G. D. (1988). Fungal infections of

the immunocompromised host: Clinical and laboratory aspects. Clin Microbiol

Rev. 1: 349–364.

Nagahama, T., Sato, H., Shimazu, M., and Sugiyama, J. (1995). Phylogenetic

divergence of the entomophthoralean fungi: evidence from nuclear 18S

ribosomal RNA gene sequences. Mycologia. 87: 203–209.

Nagy, Á., Pesti, M., Galgóczy, M., and Vágvölgyi, C. (2004). Electrophoretic

karyotype of two Micromucor species. Journal of Basic Microbiology. 44(1):

36-41.

de Nadal, E., Casadomé, L., and Posas, F. (2003). Targeting the MEF2-like

transcription factor Smp1 by the stress-activated Hog1 mitogen-activated protein

kinase. Mol Cell Biol. 23: 229– 237.

Nei, M. (1996). Phylogenetic analysis in molecular evolutionary genetics. Annu

Rev Genet. 30: 371-403.

Nielsen, R., and Yang, Z. (1998). Likelihood models for detecting positively

selected amino acid sites and applications to the HIV-1 envelope gene. Genetics.

148: 929–936.

Nielsen, C. B., Friedman, B., Birren, B., Burge, C. B., and Galagan, J. E.

(2004). Patterns of intron gain and loss in Fungi. PLoS Biology. 2(12): e422.

Nogueira, M. I., Silva, P., Galindo, I., Ramirez, J. L., Arruda, R., and Franco, J.

(1998). Electrophoretic karyotypes and genome sizing of the pathogenic fungus

Paracoccidioides brasiliensis. J Clin Microbiol. 36(3): 742-747.

Norman, C., Runswick, M., Pollock, R., and Treisman, R. (1988). Isolation and

properties of cDNA clones encoding SRF, a transcription factor that binds to the

c-fos serum response element. Cell. 55: 989– 1003.

Nylander, J. A. A. (2004). MrModeltest v2. Program distributed by the author.

Evolutionary Biology Centre, Uppsala University.

Odds, F. C. (1988). Candida and Candidosis: A Review and Bibliography. 2nd ed.

Baillière Tindall, London.

Ohno, S. (1970). Evolution by Gene Duplication (Allen and Unwin, London).

Ophuls, M. D. (1905). Further observations on a pathogenic mould formerly

described as a protozoon (Coccidioides immitis, Coccidioides pyrogenes). J Exp

Med. 6: 443-485.

Pace, N. R. (1997). A molecular view of microbial diversity and the biosphere.

Science. 276: 734–740.

Pan, S., Sigler, L., and Cole, G. T. (1994). Evidence for a phylogenetic

connection between Coccidioides immitis and Uncinocarpus reesii

(Onygenaceae). Microbiology. 140 (Pt 6): 1481-1494.

Passmore, S., Maine, G. T., Elble, R., Christ, C., and Tye, B. K. (1988).

Saccharomyces cerevisiae protein involved in plasmid maintenance is necessary

for mating of MAT alpha cells. J Mol Biol. 204: 593– 606.

Passmore, S., Elble, R., and Tye, B. K. (1989). A protein involved in

minichromosome maintenance in yeast binds a transcriptional enhancer

conserved in eukaryotes. Genes Dev. 3: 921– 935.

Peak, I. R. A., Jennings, M. P., Hood, D. W., Bisercic, M., and Moxon, E. R.

(1996). Tetrameric repeat units associated with virulence factor phase variation

in Haemophilus also occur in Neisseria spp. and Moraxella catarrhalis. FEMS

Microbiol Lett. 137: 109–114.

Pedrós, M. (2003). Clonación y caracterización de una hidrofobina de clase II

(CaHPB) de Candida albicans. Doctoral thesis. Facultad de Farmacia,

Universitat de Valencia. Valencia, Spain. 215 pp.

Pellegrini, L., Tan, S., and Richmond, T. J. (1995). Structure of serum response

factor core bound to DNA. Nature. 376: 490– 498.

Peretó, J., López-García, P., and Moreira, D. (2004). Ancestral lipid biosynthesis

and early membrane evolution. Trends Biochem Sci. 29: 469-477.

Persoh, D., and Rambold, G. (2003). Fungal diversity in the host lichen Letharia

vulpina. In J. Stadler, I. Hensen, S. Klotz, and H. Feldmann (eds.),

Biodiversity—from patterns to processes. Kurzfassungen der Beiträge zur 33.

Jahrestagung der Gesellschaft für Ökologie, 161. Verhandlungen der

Gesellschaft für Ökologie, Halle/Saale, Germany.

Peterson, S. W. (2000). Phylogenetic relationships in Aspergillus based on

rDNA sequences analysis. In R. A. Samson and J. I. Pitt (eds.), Integration of

modern taxonomic methods for Penicillium and Aspergillus classification.

Harwood Academic Publishers, Amsterdam, Netherlands.

Peyretaillade, E., Broussolle, V., Peyret, P., Méténier, G., Gouy, M., and Vivarès, C.

P. (1998). Microsporidia, amitochondrial protists, possess a 70-kDa heat shock

protein gene of mitochondrial evolutionary origin. Molecular Biology and

Evolution. 15: 683–689.

Pfaller, M. A., Diekema, D. J., Jones, R. N., Sader, H. S., Fluit, A. C., Hollis, R.

J., and Messer, S. A. (2001). International surveillance of bloodstream infections

due to Candida species: frequency of occurrence and in vitro susceptibilities to

fluconazole, ravuconazole, and voriconazole of isolates collected from 1997

through 1999 in the SENTRY antimicrobial surveillance program. J Clin

Microbiol. 39: 3254–3259.

Pfaller, M. A., Diekema, D. J., Messer, S. A., Boyken, L., Hollis, R. J., and

Jones, R. N. (2003). In vitro activities of voriconazole, posaconazole, and four

licensed systemic antifungal agents against Candida species infrequently

isolated from blood. J Clin Microbiol. 41: 78–83.

Phillips, M. J., Delsuc, F., and Penny, D. (2004). Genome-scale phylogeny and

the detection of systematic biases. Mol Biol Evol. 21(7): 1455-1458.

Pirozynski, K. A., and Malloch, D. (1975). The origin of land plants: a matter of

mycotropism. BioSystems. 6: 153–164.

Pollock, R., and Treisman, R. (1991). Human SRF-related proteins:

DNA-binding properties and potential regulatory targets. Genes Dev. 5: 2327–

2341.

Polonelli, L., Conti, S., Magliani, W., and Horace, G. (1989). Biotyping of

pathogenic fungi by the killer system and with monoclonal antibodies.

Mycopathologia. 107(1): 17 – 23.

Posada, D. (2008). jModelTest: Phylogeny model averaging. Mol Biol Evol.

25(7): 1253-1256.

Pujol, C., Daniels, K. J., Lockhart, S. R., Srikantha, T., Radke, J. B., Geiger, J.,

and Soll, D. R. (2004). The closely related species Candida albicans and

Candida dubliniensis can mate. Eukaryot Cell. 3(4): 1015-1027.

Raes, J., and van de Peer, Y. (2005). Functional divergence of proteins through

frameshift mutations. Trends in Genetics. 21(8): 428-431.

Ragan, M. A., Goggins, C. L., Cawthorn, R. J., Cerenius, L., Jamieson, A. V. C.,

Plourde, S. M., Rand, T. G., Söderhäll, K., and Gutell, R. R. (1996). A novel

clade of protistan parasites near the animal-fungal divergence. Proc Natl Acad

Sci. 93: 11907–11912.

Reeb, V., Lutzoni, F., and Roux, C. (2004). Contribution of RPB2 to multilocus

phylogenetic studies of the euascomycetes (Pezizomycotina, Fungi) with special

emphasis on the lichen-forming Acarosporaceae and evolution of polyspory.

Molecular Phylogenetics and Evolution. 32: 1036-1060.

Ribeiro, E., Pereira, C., Takaki, R., and Höfling, J.F. (2000). Grouping oral

Candida species by multilocus enzyme electrophoresis. Int J Syst Evol

Microbiol. 50: 1343-1349.

Richard, G.-F., and Dujon, B. (1997). Trinucleotide repeats in yeast. Res

Microbiol. 148: 731–744.

Riechmann, J. L., and Meyerowitz, E. M. (1997). Determination of floral organ

identity by Arabidopsis MADS domain homeotic proteins AP1, AP3, PI, and

AG is independent of their DNA-binding specificity. Mol Biol Cell. 8: 1243–

1259.

Riechmann, J. L., Krizek, B. A., and Meyerowitz, E. M. (1996). Dimerization

specificity of Arabidopsis MADS domain homeotic proteins APETALA1,

APETALA3, PISTILLATA, and AGAMOUS. Proc Natl Acad Sci. 93: 4793 –

4798.

Rixford, E., and Gilchrist, C. (1896). Two cases of protozoan (coccidioidal)

infection of the skin and other organs. Johns Hopkins Hospital Report. 1: 209-

268.

Rensberger, B. (1992). The Iceman: Now the research is on ice. J NIH Res. 4:

25–27.

Rodrigues de Miranda, L. (1979). Clavispora, a new yeast genus of the

Saccharomycetales. Antonie Van Leeuwenhoek. 45(3): 479-483.

Roger, A. J., and Hug, L. A. (2006). The origin and diversification of eukaryotes:

problems with molecular phylogenetics and molecular clock estimation. Philos

Trans R Soc Lond B Biol Sci. 361: 1039-1054.

Rokas, A., King, N., Finnerty, J., and Carroll, S. B. (2003). Conflicting

phylogenetic signals at the base of the metazoan tree. Evol. Dev. 5: 346–359.

Romero, A. J., and Minter, D. W. (1988). Fluorescence microscopy: an aid to the

elucidation of ascomycete structures. Trans Br Mycol Soc. 90: 457–470.

Rottmann, M., Dieter, S., Brunner, H., and Rupp, S. (2003). A screen in

Saccharomyces cerevisiae identified CaMCM1, an essential gene in Candida

albicans crucial for morphogenesis. Mol Microbiol. 47: 943–959.

Saitou, N., and Nei, M. (1987). The neighbor-joining method: a new method for

reconstructing phylogenetic trees. Mol Biol Evol. 4: 406–425.

Sampaio, P., Gusmão, L., Alves, C., Pina-Vaz, C., Amorim, A., and Pais, C.

(2003). Highly polymorphic microsatellite for identification of Candida albicans

strains. J Clin Microbiol. 41(2): 552-557.

Sampaio, P., Gusmão, L., Correia, A., Alves, C., Rodrigues, A., Pina-Vaz, C.,

Amorim, A., and Pais, C. (2005). New microsatellite multiplex PCR for Candida

albicans strain typing reveals microevolutionary changes. J Clin Microbiol.

43(8): 3869-3876.

Sampaio, P., Nogueira, E., Loureiro, A. S., Delgado-Silva, Y., Correia, A., and Pais,

C. (2009). Increased number of glutamine repeats in the C-terminal of Candida

albicans Rlm1p enhances the resistance to stress agents. Antonie van

Leeuwenhoek. In press.

Sandven, P. (2000). Epidemiology of candidemia. Rev Iberoam Micol. 17: 73–

81.

San-Millan, R., Ribacoba, L., Ponton, J., and Quindos, G. (1996). Evaluation of

a commercial medium for identification of Candida species. J Clin Microbiol

Infect Dis. 15: 153 – 158.

Santos, M. A., and Tuite, M. F. (1995). The CUG codon is decoded in vivo as

serine and not leucine in Candida albicans. Nucleic Acids Res. 23(9): 1481-

1486.

Santelli, E., and Richmond, T. J. (2000). Crystal structure of MEF2A core bound

to DNA at 1.5 Å resolution. J Mol Biol. 297: 437–449.

Savelkoul, P. H. M., Aarts, H. J. M., de Haas, J., Dijkshoorn, L., Duim, B.,

Otsen, M., Rademaker, J. L. W., Schouls, L., and Lenstra, J. A. (1999). Amplified

fragment length polymorphism analysis: the state of an art. J Clin Microbiol. 37:

3083–3091.

Scannell, D. R., Byrne, K. P., Gordon, J. L., Wong, S., and Wolfe, K. H. (2006).

Multiple rounds of speciation associated with reciprocal gene loss in polyploid

yeasts. Nature. 440(7082): 341-345.

Shalchian-Tabrizi, K., Minge, M. A., Espelund, M., Orr, R., Ruden, T.,

Jakobsen, K. S., and Cavalier-Smith, T. (2008). Multigene phylogeny of

Choanozoa and the Origin of Animals. PLoS ONE. 3(5): e2098.

Schoch, C. L., Shoemaker, R. A., Seifert, K. A., Hambleton, S., Spatafora, J. W.,

and Crous, P. W. (2006). A multigene phylogeny of the Dothideomycetes using

four nuclear loci. Mycologia. 98(6): 1041-1052.

Schüßler, A., Schwarzott, D., and Walker, C. (2001). A new fungal phylum, the

Glomeromycota: phylogeny and evolution. Mycological Research. 105: 1413-

1421.

Schwarz-Sommer, Z., Huijser, P., Nacken, W., Saedler, H., and Sommer, H.

(1990). Genetic control of flower development by homeotic genes in

Antirrhinum majus. Science. 250: 931– 936.

Seifert, K. A., Wingfield, B. D., and Wingfield, M. J. (1995). A critique of DNA

sequence analysis in the taxonomy of filamentous Ascomycetes and

ascomycetous anamorphs. Can J Bot. 73 (Suppl. 1): S760–S767.

Seoighe, C., and Wolfe, K. H. (1999). Updated map of duplicated regions in the

yeast genome. Gene. 238: 253–261.

Sharrocks, A. D., Gille, H., and Shaw, P. E. (1993). Identification of amino acids

essential for DNA binding and dimerization in p67SRF: implications for a novel

DNA-binding motif. Mol Cell Biol. 13: 123–132.

Shemer, R., Weissman, Z., Hashman, N., and Kornitzer, D. (2001). A highly

polymorphic degenerate microsatellite for molecular strain typing of Candida

krusei. Microbiology. 147: 2021-2028.

Shin, J. H., Shin, D. H., Song, J. W., Kee, S. J., Suh, S. P., and Ryang, D. W.

(2001). Electrophoretic karyotype analysis of sequential Candida parapsilosis

isolates from patients with persistent or recurrent fungemia. J Clin Microbiol.

39(4): 1258-1263.

Sneath, P. H. A., and Sokal, R. R. (1973). Numerical Taxonomy. W.H. Freeman

and Company, San Francisco, pp 230-234.

Soll, D. R., Staebell, M., Langtimm, C.J., Pfaller, M., Hicks, J., and Rao, T.V.G.

(1988). Multiple Candida strains in the course of a single systemic infection. J.

Clin. Microbiol. 26: 1448-1459.

Soll, D. (2000). The ins and outs of DNA fingerprinting the infectious fungi. Clin

Microbiol Rev. 13(2): 332-370.

Sommer, H., Beltran, J. P., Huijser, P., Pape, H., Lonnig, W. E., Saedler, H., and

Schwarz-Sommer, Z. (1990). Deficiens, a homeotic gene involved in the control

of flower morphogenesis in Antirrhinum majus: the protein shows homology to

transcription factors. EMBO J. 9: 605–613.

Spatafora, J. W., Sung, G-H., Johnson, D., Hesse, C., O'Rourke, B., Serdani, M.,

Spotts, R., Lutzoni, F., Hofstetter, V., Miadlikowska, J., Reeb, V., Gueidan, C.,

Fraker, E., Lumbsch, T., Lücking, R., Schmitt, I., Hosaka, K., Aptroot, A.,

Roux, C., Miller, A. N., Geiser, D. M., Hafellner, J., Hestmark, G., Arnold, A.

E., Büdel, B., Rauhut, A., Hewitt, D., Untereiner, W. A., Cole, M. S.,

Scheidegger, C., Schultz, M., Sipman, H., and Schoch, C. L. (2006). A five-gene

phylogeny of Pezizomycotina. Mycologia. 98(6): 1018-1028.

Stechmann, A., and Cavalier-Smith, T. (2002). Rooting the eukaryote tree by

using a derived gene fusion. Science. 297: 89–91.

Stechmann, A., and Cavalier-Smith, T. (2003). The root of the eukaryote tree

pinpointed. Curr Biol. 13: R665–R666.

Steenkamp, E. T., Wright, J., and Baldauf, S. (2006). The protistan origins of

Animals and Fungi. Mol Biol Evol. 23(1): 93-106.

Stenroos, S., Hyvönen, J., Myllys, L., Thell, A., and Ahti, T. (2002). Phylogeny

of the genus Cladonia s. lat. (Cladoniaceae, Ascomycetes) inferred from

molecular, morphological, and chemical data. Cladistics. 18: 237–278.

Stern, A., Doron-Faigenboim, A., Erez, E., Martz, E., Bacharach, E., and Pupko,

T. (2007). Selecton 2007: advanced models for detecting positive and purifying

selection using a Bayesian inference approach. Nucleic Acids Research. 35:

W506-W511.

Sugita, T., Takashima, M., Poonwan, N., and Mekha, N. (2006). Candida

pseudohaemulonii an amphotericin B- and azole-resistant yeast species, isolated

from the blood of a patient from Thailand. Microbiol Immunol. 50(6): 469-473.

Suh, S-O., Blackwell, M., Kurtzman, C., and Lachance, M-A. (2006).

Phylogenetics of Saccharomycetales, the ascomycete yeasts. Mycologia. 98(6):

1006-1017.

Sullivan, D., and Coleman, D. (1997). Candida dubliniensis: an emerging

opportunistic pathogen. Curr Top Med Mycol. 8: 15–25.

Sullivan, D. J., Westerneng, T. J., Haynes, K. A., Bennett, D. E., and Coleman,

D. C. (1995). Candida dubliniensis sp. nov.: Phenotypic and molecular

characterization of a novel species associated with oral candidosis in HIV-

infected individuals. Microbiology. 141: 1507-1521.

Sundstrom, P. (2002). Adhesion in Candida spp. Cell Microbiol. 4: 461–469.

Suzuki, Y., Glazko, G. V., and Nei, M. (2002). Overcredibility of molecular

phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci. 99:

16138–16143.

Swanson, W. J., Yang, Z., Wolfner, M., and Aquadro, C. F. (2001). Positive

Darwinian selection drives the evolution of several female reproductive proteins

in mammals. Proc Natl Acad Sci. 98: 2509–2514.

Swofford, D. L. (2002). PAUP: Phylogenetic analysis using parsimony (*and

other methods). Sunderland, Massachusetts, Sinauer Associates.

Tachida, H., and Iizuka, M. (1992). Persistence of repeated sequences that evolve

by replication slippage. Genetics. 131: 471–478.

Takezaki, N., and Nei, M. (1994). Inconsistency of the maximum parsimony

method when the rate of nucleotide substitution is constant. J Mol Evol. 39:

210–218.

Tamura, K., Dudley, J., Nei, M., and Kumar, S. (2007). MEGA4: Molecular

Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular

Biology and Evolution. 24: 1596-1599.

Tanabe, Y., O'Donnell, K., Saikawa, M., and Sugiyama, J. (2000). Molecular

phylogeny of parasitic Zygomycota (Dimargaritales, Zoopagales) based on

nuclear small subunit ribosomal DNA sequences. Molecular Phylogenetics and

Evolution. 16: 253–262.

Tanabe, Y., Saikawa, M., Watanabe, M. M., and Sugiyama, J. (2004). Molecular

phylogeny of Zygomycota based on EF-1α and RPB1 sequences: limitations and

utility of alternative markers to rDNA. Molecular Phylogenetics and Evolution.

30: 438–449.

Tanabe, Y., Watanabe, M. M., and Sugiyama, J. (2005). Evolutionary

relationships among basal fungi (Chytridiomycota and Zygomycota): Insights

from molecular phylogenetics. Journal of General and Applied Microbiology.

51: 267–276.

Tateishi, T., Murayama, S. Y., Otsuka, F., and Yamaguchi, H. (1996).

Karyotyping by PFGE of clinical isolates of Sporothrix schenckii. FEMS

Immunol Med Microbiol. 13: 147–154.

Tavanti, A., Davidson, A., Johnson, E., Maiden, M., Shaw, D., Gow, N., and

Odds, F. (2005a). Multilocus sequence typing for differentiation of strains of

Candida tropicalis. J Clin Microbiol. 43(11): 5593-5600.

Tavanti, A., Davidson, A., Gow, N., Maiden, M., and Odds, F. (2005b). Candida

orthopsilosis and Candida metapsilosis spp. nov. to replace Candida

parapsilosis groups II and III. J Clin Microbiol. 43(1): 284-292.

Taylor, F. J. R. (1978). Problems in the development of an explicit hypothetical

phylogeny of the lower eukaryotes. BioSystems. 10: 67–89.

Taylor, J. W., Geiser, D. M., Burt, A., and Koufopanou, V. (1999). The

evolutionary biology and population genetics underlying fungal strain typing.

Clin Microbiol Rev. 12(1): 126-146.

Tehler, A. (1988). A cladistic outline of the Eumycota. Cladistics. 4: 227–277.

Tehler, A., Farris, J. S., Lipscomb, D. L., and Källersjö, M. (2000). Phylogenetic

analyses of the fungi based on large rDNA data sets. Mycologia. 92: 459–474.

Tehler, A., Little, D. P., and Farris, J. S. (2003). The full-length phylogenetic tree

from 1551 ribosomal sequences of chitinous fungi, Fungi. Mycological

Research. 107: 901–916.

Theissen, G., Kim, J., and Saedler, H. (1996). Classification and phylogeny of the

MADS-box multigene family suggest defined roles of MADS-box gene

subfamilies in the morphological evolution of eukaryotes. J Mol Evol. 43: 484-516.

Thomas, J. R., Dwek, R. A., and Rademacher, T. W. (1990). Structure,

biosynthesis, and function of glycosylphosphatidylinositols. Biochemistry. 29:

5413–5422.

Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994). CLUSTAL W:

improving the sensitivity of progressive multiple sequence alignment through

sequence weighting, position-specific gap penalties and weight matrix choice.

Nucleic Acids Res. 22: 4673–4680.

Tiede, A., Bastisch, I., Schubert, J., Orlean, P., and Schmidt, R. E. (1999).

Biosynthesis of glycosylphosphatidylinositols in mammals and unicellular

microbes. Biol Chem. 380: 503–523.

Tietz, H., Hopp, M., Schmalreck, A., Sterry, W., and Czaika, V. (2001).

Candida africana sp. nov., a new human pathogen or a variant of Candida

albicans? Mycoses. 44: 437-445.

Toth, G., Gaspari, Z., and Jurka, J. (2000). Microsatellites in different eukaryotic

genomes: survey and analysis. Genome Res. 10: 967–981.

Treisman, R. (1990). The SRE: a growth factor responsive transcriptional

regulator. Semin. Cancer Biol. 1: 47–58.

Trifonov, E. N. (2003). Tuning function of tandemly repeating sequences: a

molecular device for fast adaptation. Pp. 1–24 in S. P. Wasser, ed., Evolutionary

theory and processes: modern horizons, papers in honor of Eviatar Nevo. Kluwer

Academic Publishers. Amsterdam, The Netherlands.

Tzung, K. W., Williams, R. M., Scherer, S., Federspiel, N., Jones, T., Hansen,

N., Bivolarevic, V., Huizar, L., Komp, C., Surzycki, R., Tamse, R., Davis, R.

W., and Agabian, N. (2001). Genomic evidence for a complete sexual cycle in

Candida albicans. Proc Natl Acad Sci. 98(6): 3249-3253.

Valenza, G., Valenza, R., Brederlau, J., Frosch, M., and Kurzai, O. (2006).

Identification of Candida fabianii as a cause of lethal septicaemia. Mycoses. 49: 331-334.

Van Belkum, A., Scherer, S., van Alphen, L., and Verbrugh, H. (1998). Short-

sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev. 62:

275–293.

Van de Peer, Y., Baldauf, S. L., Doolittle, W. F., and Meyer, A. (2000). An

updated and comprehensive rRNA phylogeny of (crown) eukaryotes based on

rate-calibrated evolutionary distances. J Mol Evol. 51: 565–576.

Van der Walt, J. P. (1966). Lodderomyces, a new genus of the

Saccharomycetaceae. Antonie Van Leeuwenhoek. 32(1): 1-5.

Voigt, K., and Wöstemeyer, J. (2001). Phylogeny and origin of 82 zygomycetes

from all 54 genera of the Mucorales and Mortierellales based on combined

analysis of actin and translation elongation factor EF-1α genes. Gene. 270: 113-120.

Vos, P., Hogers, R., Bleeker, M., Reijans, M., Van de Lee, T., Hornes, M.,

Frijters, A., Pot, J., Peleman, J., Kuiper, M., and Zabeau, M. (1995). AFLP: a

new technique for DNA fingerprinting. Nucleic Acids Res. 23: 4407-4414.

Wainright, P. O., Hinkle, G., Sogin, M. L., and Stickel, S. K. (1993).

Monophyletic origins of the Metazoa: an evolutionary link with fungi. Science.

260: 340–342.

Wang, Z., Binder, M., Dai, Y. C., and Hibbett, D. S. (2004). Phylogenetic

relationships of Sparassis inferred from nuclear and mitochondrial ribosomal

DNA and RNA polymerase sequences. Mycologia. 96: 1013-1027.

Watanabe, Y., Irie, K., and Matsumoto, K. (1995). Yeast RLM1 encodes a

serum response factor-like protein that may function downstream of the Mpk1

(Slt2) mitogen-activated protein kinase pathway. Mol Cell Biol. 15: 5740–5749.

Weig, M., Jansch, L., Gross, U., De Koster, C. G., Klis, F. M., and De Groot, P.

W. (2004). Systematic identification in silico of covalently bound cell wall

proteins and analysis of protein-polysaccharide linkages of the human pathogen

Candida glabrata. Microbiology. 150: 3129–3144.

Wernersson, R. (2006). Virtual Ribosome-a comprehensive DNA translation

tool with support for integration of sequence feature annotation. Nucleic Acids

Research. 34: W385-W388.

Whalley, A. J. S., and Edwards, R. L. (1995). Secondary metabolites and

systematic arrangement within the Xylariaceae. Can J Bot. 73(Suppl. 1): S802-S810.

Wickerham, L. J., and Burton, K. A. (1954). A clarification of the relationship of

Candida guilliermondii to other yeasts by a study of their mating types. J

Bacteriol. 68(5): 594-597.

Williams, J. G. K., Kubelik, A. R., Livak, K. J., Rafalski, J. A., and Tingey, S.

V. (1990). DNA polymorphisms amplified by arbitrary primers are useful as

genetic markers. Nucleic Acids Research. 18: 6531-6535.

Whittaker, R. (1969). New concepts of kingdoms of organisms. Science. 163:

150–160.

Wingard, J. R., Merz, W. G., Rinaldi, M. G., Johnson, T. R., Karp, J. E., and

Saral, R. (1991). Increase in Candida krusei infection among patients with bone

marrow transplant and neutropenia treated prophylactically with fluconazole. N

Engl J Med. 325: 1274–1277.

Wolfe, K. H., and Shields, D. C. (1997). Molecular evidence for an ancient

duplication of the entire yeast genome. Nature. 387: 708–713.

Woese, C., and Fox, G. (1977). Phylogenetic structure of the prokaryotic

domain: The primary kingdoms. Proc Natl Acad Sci. 74(11): 5088–5090.

Woese, C., Kandler, O., and Wheelis, M. (1990). Towards a Natural System of

Organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl

Acad Sci. 87(12): 4576-4579.

Wong, W. S. W., Yang, Z., Goldman, N., and Nielsen, R. (2004). Accuracy and

power of statistical methods for detecting adaptive evolution in protein coding

sequences and for identifying positive selective sites. Genetics. 168: 1041–1051.

Wren, J. D., Forgacs, E., Fondon, J. W., 3rd, Pertsemlidis, A., Cheng, S. Y.,

Gallardo, T., Williams, R. S., Shohet, R. V., Minna, J. D., and Garner, H. R.

(2000). Repeat polymorphisms within gene regions: phenotypic and

evolutionary implications. Am J Hum Genet. 67: 345–356.

Yabana, N., and Yamamoto, M. (1996). Schizosaccharomyces pombe map1+

encodes a MADS-box-family protein required for cell-type-specific gene

expression. Mol Cell Biol. 16: 3420–3428.

Yanofsky, M. F., Ma, H., Bowman, J. L., Drews, G. N., Feldmann, K. A., and

Meyerowitz, E. M. (1990). The protein encoded by the Arabidopsis homeotic

gene agamous resembles transcription factors. Nature. 346: 35–39.

Yang, Z. (1997). PAML: a program package for phylogenetic analysis by

maximum likelihood. Comput Appl Biosci. 13: 555–556.

Yang, Z., and Bielawski, J. P. (2000). Statistical methods for detecting

molecular adaptation. Trends Ecol Evolut. 15: 496–503.

Yang, Z., Wong, W. S., and Nielsen, R. (2005). Bayes empirical Bayes inference

of amino acid sites under positive selection. Mol Biol Evol. 22: 1107–1118.

Yang, Z., Nielsen, R., Goldman, N., and Pedersen, A. M. K. (2000). Codon

substitution models for heterogeneous selection pressure at amino acid sites.

Genetics. 155: 431–449.

Young, L. Y., Lorenz, M. C., and Heitman, J. (2000). A STE12 homolog is

required for mating but dispensable for filamentation in Candida lusitaniae.

Genetics. 155(1): 17-29.

Young, E. T., Sloan, J. S., and van Riper, K. (2000). Trinucleotide repeats are

clustered in regulatory genes in Saccharomyces cerevisiae. Genetics. 154: 1053–1068.

Yu, G., and Fassler, J. S. (1993). SPT13 (GAL11) of Saccharomyces cerevisiae

negatively regulates activity of the MCM1 transcription factor in Ty1 elements.

Mol Cell Biol. 13: 63–71.

Zhang, N., Castlebury, L. A., Miller, A. N., Huhndorf, S. M., Schoch, C. L.,

Seifert, K. A., Rossman, A. Y., Rogers, J. D., Kohlmeyer, J., Volkmann-

Kohlmeyer, B., and Sung, G-H. (2006). An overview of the systematics of the

Sordariomycetes based on a four-gene phylogeny. Mycologia. 98(6): 1076-1087.

7. ANNEXES

ANNEX I

MrModelblock

#NEXUS

[Johan Nylander 2004-07-01]

[! ***** MrModeltest block -- Modified from MODELTEST 3.0 *****]

[The following command will calculate a NJ tree using the JC69 model of evolution]

BEGIN PAUP;

Log file= mrmodelfit.log replace;

DSet distance=JC objective=ME base=equal rates=equal pinv=0

subst=all negbrlen=setzero;

NJ showtree=no breakties=random;

End;

[!

***** BEGIN TESTING 24 MODELS OF EVOLUTION ***** ]

BEGIN PAUP;

Default lscores longfmt=yes; [Workaround for the bug in PAUP 4b10]

Set criterion=like;

[!

** Model 1 of 24 * Calculating JC **]

lscores 1/ nst=1 base=equal rates=equal pinv=0

scorefile=mrmodel.scores replace;

[!

** Model 2 of 24 * Calculating JC+I **]

lscores 1/ nst=1 base=equal rates=equal pinv=est

scorefile=mrmodel.scores append;

[!

** Model 3 of 24 * Calculating JC+G **]

lscores 1/ nst=1 base=equal rates=gamma shape=est pinv=0

scorefile=mrmodel.scores append;

[!

** Model 4 of 24 * Calculating JC+I+G **]

lscores 1/ nst=1 base=equal rates=gamma shape=est pinv=est

scorefile=mrmodel.scores append;

[!

** Model 5 of 24 * Calculating F81 **]

lscores 1/ nst=1 base=est rates=equal pinv=0

scorefile=mrmodel.scores append;

[!

** Model 6 of 24 * Calculating F81+I **]

lscores 1/ nst=1 base=est rates=equal pinv=est

scorefile=mrmodel.scores append;

[!

** Model 7 of 24 * Calculating F81+G **]

lscores 1/ nst=1 base=est rates=gamma shape=est pinv=0

scorefile=mrmodel.scores append;

[!

** Model 8 of 24 * Calculating F81+I+G **]

lscores 1/ nst=1 base=est rates=gamma shape=est pinv=est

scorefile=mrmodel.scores append;

[!

** Model 9 of 24 * Calculating K80 **]

lscores 1/ nst=2 base=equal tratio=est rates=equal pinv=0

scorefile=mrmodel.scores append;

[!

** Model 10 of 24 * Calculating K80+I **]

lscores 1/ nst=2 base=equal tratio=est rates=equal pinv=est

scorefile=mrmodel.scores append;

[!

** Model 11 of 24 * Calculating K80+G **]

lscores 1/ nst=2 base=equal tratio=est rates=gamma shape=est pinv=0

scorefile=mrmodel.scores append;

[!

** Model 12 of 24 * Calculating K80+I+G **]

lscores 1/ nst=2 base=equal tratio=est rates=gamma shape=est pinv=est

scorefile=mrmodel.scores append;

[!

** Model 13 of 24 * Calculating HKY **]

lscores 1/ nst=2 base=est tratio=est rates=equal pinv=0

scorefile=mrmodel.scores append;

[!

** Model 14 of 24 * Calculating HKY+I **]

lscores 1/ nst=2 base=est tratio=est rates=equal pinv=est

scorefile=mrmodel.scores append;

[!

** Model 15 of 24 * Calculating HKY+G **]

lscores 1/ nst=2 base=est tratio=est rates=gamma shape=est pinv=0

scorefile=mrmodel.scores append;

[!

** Model 16 of 24 * Calculating HKY+I+G **]

lscores 1/ nst=2 base=est tratio=est rates=gamma shape=est pinv=est

scorefile=mrmodel.scores append;

[!

** Model 17 of 24 * Calculating SYM **]

lscores 1/ nst=6 base=equal rmat=est rclass= (a b c d e f) rates=equal

pinv=0

scorefile=mrmodel.scores append;

[!

** Model 18 of 24 * Calculating SYM+I **]

lscores 1/ nst=6 base=equal rmat=est rates=equal pinv=est

scorefile=mrmodel.scores append;

[!

** Model 19 of 24 * Calculating SYM+G **]

lscores 1/ nst=6 base=equal rmat=est rates=gamma shape=est pinv=0

scorefile=mrmodel.scores append;

[!

** Model 20 of 24 * Calculating SYM+I+G **]

lscores 1/ nst=6 base=equal rmat=est rates=gamma shape=est pinv=est

scorefile=mrmodel.scores append;

[!

** Model 21 of 24 * Calculating GTR **]

lscores 1/ nst=6 base=est rmat=est rates=equal pinv=0

scorefile=mrmodel.scores append;

[!

** Model 22 of 24 * Calculating GTR+I **]

lscores 1/ nst=6 base=est rmat=est rates=equal pinv=est

scorefile=mrmodel.scores append;

[!

** Model 23 of 24 * Calculating GTR+G **]

lscores 1/ nst=6 base=est rmat=est rates=gamma shape=est pinv=0

scorefile=mrmodel.scores append;

[!

** Model 24 of 24 * Calculating GTR+I+G **]

lscores 1/ nst=6 base=est rmat=est rates=gamma shape=est pinv=est

scorefile=mrmodel.scores append;

Log stop=yes;

END;
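
The block above only writes likelihood scores for the 24 models; choosing among them is a separate step. The Python sketch below illustrates one common criterion, the Akaike information criterion (AIC = 2k + 2(-lnL)), using the usual count of free substitution-model parameters (branch lengths excluded). It is an illustration only: the function names are hypothetical, the -lnL values in the demo are made up, and it does not read the mrmodel.scores file, whose layout is not shown here.

```python
from __future__ import annotations

# Free substitution-model parameters (branch lengths excluded) for the six
# base models scored in the PAUP block above.
BASE_PARAMS = {"JC": 0, "F81": 3, "K80": 1, "HKY": 4, "SYM": 5, "GTR": 8}


def n_params(model: str) -> int:
    """Count free parameters for a model name such as 'GTR+I+G'."""
    base, *extras = model.split("+")
    # +I adds the proportion of invariable sites, +G the gamma shape.
    return BASE_PARAMS[base] + len(extras)


def aic(neg_lnl: float, k: int) -> float:
    """Akaike information criterion from a -lnL score and k parameters."""
    return 2.0 * neg_lnl + 2.0 * k


def rank_models(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, AIC) pairs sorted from best (lowest AIC) to worst."""
    table = [(m, aic(neg_lnl, n_params(m))) for m, neg_lnl in scores.items()]
    return sorted(table, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical -lnL values, for illustration only (not thesis results).
    example = {"JC": 5100.0, "HKY+G": 5000.0, "GTR+I+G": 4998.0}
    for model, value in rank_models(example):
        print(f"{model:8s} AIC = {value:.1f}")
```

In this made-up example the richer GTR+I+G model has the best likelihood, but its extra parameters are penalised, so HKY+G ranks first.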

Annex II. Fungal species in which orthologous RLM1 genes were identified: taxonomy, number of introns, gene location and database.

Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center

Phycomyces blakeleeanus Mucoromycotina 5 Scaffold 1: 2877572 – 2879654 (-) Joint Genome Institute

Mucor circinelloides Mucoromycotina 3 Scaffold 2: 32637 – 34277 (-) Broad Institute, Murcia University

Rhizopus oryzae Mucoromycotina 4 Supercontig 2: 4147507 – 4149079 (-) Broad Institute

Ustilago maydis Ustilaginomycetes - Contig 191: 196940 – 198718 (+) Broad Institute

Malassezia globosa Malasseziales - Sf 7.1: 204470 – 205939 (+) NCBI

Sporobolomyces roseus Microbotryomycetes 2 Scaffold 10: 85788 – 88407 (+) Joint Genome Institute

Cryptococcus neoformans Tremellomycetes 4 Chromosome 2: 1360064 - 1362029 (-) Broad Institute, Genome Sciences Center Canada, Duke Center for Genome Research, Stanford Genome Technology

Cryptococcus gattii Tremellomycetes 4 Supercontig 16: 244132 – 246064 (+) Broad Institute

Laccaria bicolor Agaricomycetes 4 Scaffold 20: 147515 - 149297 (+) Joint Genome Institute

Coprinus cinereus Agaricomycetes 7 Contig 317: 81588 – 83599 (-) Broad Institute

Phanerochaete chrysosporium Agaricomycetes 4 Scaffold 3: 1745299 – 1747266 (+) Joint Genome Institute

Postia placenta Agaricomycetes 8 Scaffold 174: 171333 – 174904 (+) Joint Genome Institute

Schizosaccharomyces japonicus Schizosaccharomycetes - Supercontig 5: 941790 - 942323 (+) Broad Institute

Schizosaccharomyces pombe Schizosaccharomycetes - Chromosome 2: 3626639-3627757 (+) Sanger Institute

Schizosaccharomyces octosporus Schizosaccharomycetes - Supercontig 2: 351924 – 352948 (+) Broad Institute

Yarrowia lipolytica Saccharomycetes 1 Chromosome F: 2507846 – 2509480 (-) Génolevures

Candida lusitaniae Saccharomycetes - Supercontig 5: 493096 - 494358 (-) Broad Institute

Candida guilliermondii Saccharomycetes - Supercontig 3: 493468 - 494775 (+) Broad Institute

Pichia stipitis Saccharomycetes - Chromosome 1: 1519668 – 1521421 (+) Joint Genome Institute

Debaryomyces hansenii Saccharomycetes - Chromosome 2: 237182 - 238732 (+) Génolevures

Lodderomyces elongisporus Saccharomycetes - Supercontig 6: 1060953 - 1063316 (+) Broad Institute

Candida parapsilosis Saccharomycetes - Contig 005806: 536977 – 539238 (-) Sanger Institute

Candida tropicalis Saccharomycetes - Supercontig 1: 49905 - 51623 (-) Broad Institute, Génolevures

Candida albicans Saccharomycetes - Chromosome 4: 253044 – 254879 (+) Stanford Genome Technology Center, Sanger Institute and Broad Institute

Candida dubliniensis Saccharomycetes - Chromosome 4: 265932 – 267662 (+) Sanger Institute

Kluyveromyces lactis Saccharomycetes - Chromosome E: 2138785 – 2140983 (+) Génolevures

Ashbya gossypii Saccharomycetes 1 Chromosome 7: 1125892 - 1127486 (-) Biozentrum and Syngenta

Saccharomyces kluyveri Saccharomycetes 1 Contig 4.8: 312314 – 314049 (-) Washington University Genome Sequencing Center, Génolevures

Kluyveromyces waltii Saccharomycetes 1 Contig 62: 28422 – 29929 (+) Broad Institute

Candida glabrata Saccharomycetes - Chromosome H: 556498 – 558378 (+) Génolevures

Vanderwaltozyma polyspora Saccharomycetes - Contig 1048b: 36130 – 38160 (-) Trinity College Dublin, Ireland

Saccharomyces castellii Saccharomycetes - Contig 664: 5363 - 7027 (-) Washington University Genome Sequencing Center

Saccharomyces bayanus Saccharomycetes - Contig 737: 12450 – 14405 (+) Washington University Genome Sequencing Center, Broad Institute, Génolevures

Saccharomyces kudriavzevii Saccharomycetes - Contig 02.2091: 16474 – 18432 (-) Washington University Genome Sequencing Center

Saccharomyces mikatae Saccharomycetes - Contig 34: 13635 – 15605 (-) Washington University Genome Sequencing Center, Broad Institute

Saccharomyces cerevisiae Saccharomycetes - Chromosome 16: 379117 – 379117 (-) Stanford Genome Technology Center, Sanger Institute, Broad Institute

Saccharomyces paradoxus Saccharomycetes - Contig 387: 39147 – 40522 (-) Broad Institute

Botrytis cinerea Leotiomycetes 2* Supercontig 4: 131839 – 133900 (+) Broad Institute and Syngenta AG, Genoscope

Sclerotinia sclerotiorum Leotiomycetes 2 Supercontig 7: 1397346 - 1399404 (-) Broad Institute

Magnaporthe grisea Sordariomycetes 2 Supercontig 27: 2198205 - 2200477 (+) Broad Institute

Verticillium dahliae Sordariomycetes 2 Supercontig 2: 411495 - 413529 (-) Broad Institute

Podospora anserina Sordariomycetes 1 Chromosome 1_SC1: 2337785 – 2339800 (+) Genoscope

Chaetomium globosum Sordariomycetes 3* Supercontig 1: 4237276 - 4239345 (+) Broad Institute

Neurospora crassa Sordariomycetes 2 Contig 6: 745962 - 748345 (-) Broad Institute

Nectria haematococca Sordariomycetes 2 Contig 1: 4240418 – 4242558 (+) Joint Genome Institute

Fusarium oxysporum Sordariomycetes 2 Supercontig 6: 882635 - 884756 (-) Broad Institute

Fusarium verticillioides Sordariomycetes 2 Supercontig 4: 851617 - 853730 (-) Broad Institute and Syngenta AG

Fusarium graminearum Sordariomycetes 2 Supercontig 6: 1217222 – 1219438 (+) Broad Institute

Epichloë festucae Sordariomycetes 2* Contig 09290: 7786 – 8664 (+); Contig 10180: 1 – 1318 (+) Oklahoma University

Trichoderma reesei Sordariomycetes 2 Contig 1: 484378 – 486398 (+) Joint Genome Institute

Trichoderma virens Sordariomycetes 2 Locus ABDF01000081: 60794 – 62787 (-) Joint Genome Institute

Trichoderma atroviride Sordariomycetes 2 Locus ABDG01000180: 45972 – 47983 (-) Joint Genome Institute

Alternaria brassicicola Dothideomycetes 2 Contig 2.99: 20908 – 22912 (+) Washington University

Mycosphaerella fijiensis Dothideomycetes 2 Scaffold 11: 1715901 – 1717838 (+) Joint Genome Institute

Mycosphaerella graminicola Dothideomycetes 2 Scaffold 5: 2832610 – 2834512 (-) Joint Genome Institute

Pyrenophora tritici-repentis Dothideomycetes 2 Supercontig 8: 1656744 – 1658773 (+) Broad Institute

Stagonospora nodorum Dothideomycetes 2 Supercontig 36: 142061 – 144139 (-) Broad Institute, International Stagonospora nodorum Genomics Consortium

Cochliobolus heterostrophus Dothideomycetes 2 Scaffold 3: 15429 - 17447 (-) Joint Genome Institute

Coccidioides immitis Eurotiomycetes 2 Chromosome 4: 837325 – 839343 (-) Broad Institute

Coccidioides posadasii Eurotiomycetes 2 Chromosome 5: 696037 – 698131 (+) TIGR and Broad Institute

Blastomyces dermatitidis Eurotiomycetes 2 Contig 5.48: 866 – 3141 (-) Washington University

Histoplasma capsulatum Eurotiomycetes 2 Contig 1161: 255583 – 257713 (+) Broad Institute, Washington University Genome Sequencing Center

Paracoccidioides brasiliensis Eurotiomycetes 2 Supercontig 2: 1922491 - 1924634 (-) Broad Institute

Uncinocarpus reesii Eurotiomycetes 2 Supercontig 5: 1771068 – 1772993 (+) Broad Institute

Ascosphaera apis Eurotiomycetes 1* Contig 6709: 1 – 1360 (-); Contig 28: 3916 – 5847 (-) Baylor College of Medicine

Microsporum gypseum Eurotiomycetes 1 Supercontig 8: 169115 – 170914 (-) Broad Institute

Talaromyces stipitatus Eurotiomycetes 2 Locus ABAS01000017: 261779 – 263672 (+) TIGR

Penicillium marneffei Eurotiomycetes 2 Locus ABAR01000011: 131838 – 133723 (-) TIGR

Neosartorya fischeri Eurotiomycetes 2 Contig 574: 1982352 - 1984281 (-) TIGR

Aspergillus clavatus Eurotiomycetes 2 Contig 83: 930562 – 932534 (+) TIGR

Aspergillus fumigatus Eurotiomycetes 2 Chromosome 3: 2185740 - 2187668 (+) Sanger Institute and TIGR

Aspergillus oryzae Eurotiomycetes 2 Supercontig 2: 3771013 - 3773030 (-) Broad Institute

Aspergillus terreus Eurotiomycetes 2 Supercontig 2: 1754327 - 1756250 (-) Broad Institute

Aspergillus flavus Eurotiomycetes 2 Contig 1: 3827471 – 3829494 (-) TIGR

Aspergillus nidulans Eurotiomycetes 2 Contig 51: 528863 - 530802 (+) Broad Institute and Monsanto

Aspergillus niger Eurotiomycetes 2 Contig 2: 3360920 – 3362951 (+) Joint Genome Institute

Annex III. Fungal species in which orthologous MCM1 genes were identified: taxonomy, number of introns, gene location and database.

Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center

Phycomyces blakeleeanus Mucoromycotina 3 Scaffold 1: 2124764 – 2126294 (+) Joint Genome Institute

Mucor circinelloides Mucoromycotina 2 Scaffold 4: 3273504 – 3274607 (-) Broad Institute, Murcia University

Rhizopus oryzae Mucoromycotina 3 Supercontig 4: 1692147-1693115 (-) Broad Institute

Ustilago maydis Ustilaginomycetes - Contig 43: 64341-65705 (+) Broad Institute

Malassezia globosa Malasseziales 1 Sf 19.1: 27642 – 28805 (+) NCBI

Sporobolomyces roseus Microbotryomycetes 5 Scaffold 5: 1185071 – 1188512 (-) Joint Genome Institute

Cryptococcus neoformans Tremellomycetes 4 Chromosome 13: 39688-42628 (+) Broad Institute, Genome Sciences Center Canada, Duke Center for Genome Research, Stanford Genome Technology

Cryptococcus gattii Tremellomycetes 3 Supercontig 14: 629109 – 632178 (-) Broad Institute

Laccaria bicolor Agaricomycetes 2 Scaffold 3: 1957445 - 1958697 (+) Joint Genome Institute

Coprinus cinereus Agaricomycetes 2 Contig 175: 86341-87583 (+) Broad Institute

Phanerochaete chrysosporium Agaricomycetes 1 Scaffold 8: 889662 – 890764 (-) Joint Genome Institute

Postia placenta Agaricomycetes 1 Scaffold 93: 242447 – 243724 (+) Joint Genome Institute

Schizosaccharomyces japonicus Schizosaccharomycetes 2 Supercontig 1: 436734-438566 (+) Broad Institute

Schizosaccharomyces pombe Schizosaccharomycetes 2 Chromosome 1: 5292795-5294344 (+) Sanger Institute

Schizosaccharomyces octosporus Schizosaccharomycetes 2 Supercontig 1: 2428314-2429843 (+) Broad Institute

Yarrowia lipolytica Saccharomycetes 1 Chromosome C: 913297 – 914639 (-) Génolevures

Candida lusitaniae Saccharomycetes - Supercontig 5: 694152-694868 (-) Broad Institute

Candida guilliermondii Saccharomycetes - Supercontig 1: 710234-710938 (+) Broad Institute

Pichia stipitis Saccharomycetes - Chromosome 5: 1545724 – 1546455 (-) Joint Genome Institute

Debaryomyces hansenii Saccharomycetes - Chromosome F: 1872703 - 1873428 (+) Génolevures

Lodderomyces elongisporus Saccharomycetes - Supercontig 10: 288713-289579 (-) Broad Institute

Candida parapsilosis Saccharomycetes - Contig 005806: 325917 – 326762 (+) Sanger Institute

Candida tropicalis Saccharomycetes - Supercontig 7: 104018-104854 (+) Broad Institute, Génolevures

Candida albicans Saccharomycetes - Chromosome 7: 188023-188811 (-) Stanford Genome Technology Center, Sanger Institute and Broad Institute

Candida dubliniensis Saccharomycetes - Chromosome 7: 207792 - 208493 (-) Sanger Institute

Kluyveromyces lactis Saccharomycetes - Chromosome F: 2474742 – 2475452 (-) Génolevures

Ashbya gossypii Saccharomycetes - Chromosome 2: 561146 - 561766 (-) Biozentrum and Syngenta

Saccharomyces kluyveri Saccharomycetes - Contig 5.10: 2448 – 3053 (+) Washington University Genome Sequencing Center, Génolevures

Kluyveromyces waltii Saccharomycetes - Contig 206: 1 – 307 (-); Contig 205: 7934 – 8165 (-) Broad Institute

Candida glabrata Saccharomycetes - Chromosome F: 618976 – 619632 (-) Génolevures

Vanderwaltozyma polyspora Saccharomycetes - Contig 1073b: 43103 – 43882 (-) Trinity College Dublin, Ireland

Saccharomyces castellii Saccharomycetes - Contig 649: 39880 - 40560 (-) Washington University Genome Sequencing Center

Saccharomyces bayanus Saccharomycetes - Contig 579: 25293 – 26144 (-) Washington University Genome Sequencing Center, Broad Institute, Génolevures

Saccharomyces kudriavzevii Saccharomycetes - Contig 02.820: 794 – 1651 (-) Washington University Genome Sequencing Center

Saccharomyces mikatae Saccharomycetes - Contig 578: 4300 – 5130 (+) Washington University Genome Sequencing Center, Broad Institute

Saccharomyces cerevisiae Saccharomycetes - Chromosome 13: 353870 – 354730 (+) Stanford Genome Technology Center, Sanger Institute, Broad Institute

Saccharomyces paradoxus Saccharomycetes - Contig 178: 11877 – 12719 (-) Broad Institute

Botrytis cinerea Leotiomycetes 2* Supercontig 165: 67693 – 68463 (-) Broad Institute and Syngenta AG, Genoscope

Sclerotinia sclerotiorum Leotiomycetes 2 Supercontig 6: 1981916-1982762 (+) Broad Institute

Magnaporthe grisea Sordariomycetes 2 Supercontig 23: 1178204-1179169 (-) Broad Institute

Verticillium dahliae Sordariomycetes 2 Supercontig 3: 440666-441518 (+) Broad Institute

Podospora anserina Sordariomycetes 2 Chromosome 1_SC4: 1831500 – 1832359 (+) Genoscope

Chaetomium globosum Sordariomycetes 2 Supercontig 3: 3648631 - 3649505 (-) Broad Institute

Neurospora crassa Sordariomycetes 2 Contig 39: 300711 - 301579 (+) Broad Institute

Nectria haematococca Sordariomycetes 2 Contig 1: 1904784 – 1905808 (+) Joint Genome Institute

Fusarium oxysporum Sordariomycetes 2 Supercontig 3: 1396623 - 1397638 (-) Broad Institute

Fusarium verticillioides Sordariomycetes 2 Supercontig 2: 1298954 - 1299971 (-) Broad Institute and Syngenta AG

Fusarium graminearum Sordariomycetes 2 Supercontig 5: 973273 – 974303 (-) Broad Institute

Epichloë festucae Sordariomycetes 2 Contig 03502: 2589 – 3725 (-) Oklahoma University

Trichoderma reesei Sordariomycetes 2 Contig 1: 293329 – 294473 (+) Joint Genome Institute

Trichoderma virens Sordariomycetes 2 Locus ABDF01000238: 119788 – 120837 (-) Joint Genome Institute

Trichoderma atroviride Sordariomycetes 2 Locus ABDG01000087 : 9802 – 10882 (-) Joint Genome Institute

Alternaria brassicicola Dothideomycetes 2 Contig 7.54: 1842 – 2668 (+) Washington University

Mycosphaerella fijiensis Dothideomycetes 2 Scaffold 5: 691696 – 692478 (+) Joint Genome Institute

Mycosphaerella graminicola Dothideomycetes 2 Scaffold 7: 131733 – 132524 (-) Joint Genome Institute

Pyrenophora tritici-repentis Dothideomycetes 2 Supercontig 5: 186836-187667 (+) Broad Institute

Stagonospora nodorum Dothideomycetes 2 Supercontig 11: 1046577 – 1047421 (+) Broad Institute, International Stagonospora nodorum Genomics Consortium

Cochliobolus heterostrophus Dothideomycetes 2 Scaffold 2: 392134 - 393018 (-) Joint Genome Institute

Coccidioides immitis Eurotiomycetes 2 Chromosome 3: 322326 – 323200 (-) Broad Institute

Coccidioides posadasii Eurotiomycetes 2 Chromosome 2: 790014 – 790884 (-) TIGR and Broad Institute

Blastomyces dermatitidis Eurotiomycetes 2 Contig 5593.1: 398 – 885 (-) Washington University

Histoplasma capsulatum Eurotiomycetes 2 Supercontig 1.3: 251153 – 252083 (+) Broad Institute, Washington University Genome Sequencing Center

Paracoccidioides brasiliensis Eurotiomycetes 2 Supercontig 12: 450142-451180 (+) Broad Institute

Uncinocarpus reesii Eurotiomycetes 2 Supercontig 2: 4576943-4577833 (+) Broad Institute

Ascosphaera apis Eurotiomycetes 1 Contig 2371: 17 – 538 (-); Contig 6891: 66 – 1133 (+) Baylor College of Medicine

Microsporum gypseum Eurotiomycetes 2 Supercontig 1: 3737424-3738317 (+) Broad Institute

Talaromyces stipitatus Eurotiomycetes 2 Locus ABAS01000004: 938690 – 939519 (+) TIGR

Penicillium marneffei Eurotiomycetes 2 Locus ABAR01000013: 274356 – 275187 (+) TIGR

Neosartorya fischeri Eurotiomycetes 2 Contig 582: 357403-358305 (-) TIGR

Aspergillus clavatus Eurotiomycetes 2 Contig 85: 946418-947301 (+) TIGR

Aspergillus fumigatus Eurotiomycetes 2 Chromosome 6: 357383-358283 (-) Sanger Institute and TIGR

Aspergillus oryzae Eurotiomycetes 2 Supercontig 17: 221157-222072 (-) Broad Institute

Aspergillus terreus Eurotiomycetes 2 Supercontig 10: 1111982-1112918 (+) Broad Institute

Aspergillus flavus Eurotiomycetes 2 Contig 9: 471345-472255 (-) TIGR

Aspergillus nidulans Eurotiomycetes 2 Contig 159: 33815-34722 (-) Broad Institute and Monsanto

Aspergillus niger Eurotiomycetes 2 Contig 9: 222598 - 223521 (-) Joint Genome Institute


Annex IV. Fungal species in which orthologous IFF8 genes were identified. Taxonomy, number of introns, gene location and database.

Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center

Candida lusitaniae Saccharomycetes - Supercontig 7: 753415 - 755439 (+) Broad Institute

Candida guilliermondii Saccharomycetes - Supercontig 2: 36309 - 39698 (+) Broad Institute

Pichia stipitis Saccharomycetes - Chromosome 1: 307099 – 311559 (+) Joint Genome Institute

Debaryomyces hansenii Saccharomycetes - Chromosome B: 107946 - 112736 (-) Génolevures

Lodderomyces elongisporus Saccharomycetes - Contig 1.123: 280245 - 287281 (-) Broad Institute

Candida parapsilosis Saccharomycetes - Contig 005807: 1470770 – 1475785 (+) Sanger Institute

Candida tropicalis Saccharomycetes - Supercontig 10: 128215 - 134073 (+) Broad Institute, Génolevures

Candida albicans Saccharomycetes - Chromosome 5: 164521 - 166665 (-) Stanford Genome Technology Center, Sanger Institute and Broad Institute

Candida dubliniensis Saccharomycetes - Chromosome 5: 187922 - 189868 (+) Sanger Institute
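The gene locations in these annex tables all follow the pattern "sequence name: start – end (strand)". As a minimal sketch, the Python snippet below parses one such location string and derives the span length. The function name `parse_location` and the returned field names are hypothetical, and the length assumes the coordinates are 1-based and inclusive, which the annex does not state.

```python
import re

# Pattern for strings such as "Supercontig 7: 753415 - 755439 (+)".
# Accepts a plain hyphen or an en dash between the two coordinates.
LOCATION_RE = re.compile(
    r"^(?P<seqid>.+?):\s*(?P<start>\d+)\s*[-–]\s*(?P<end>\d+)\s*\((?P<strand>[+-])\)$"
)


def parse_location(text):
    """Parse a single 'name: start - end (strand)' location string.

    Hypothetical helper for illustration only. The length is computed
    under the assumption of 1-based, inclusive coordinates.
    """
    match = LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    start = int(match["start"])
    end = int(match["end"])
    return {
        "seqid": match["seqid"].strip(),
        "start": start,
        "end": end,
        "strand": match["strand"],
        "length": abs(end - start) + 1,
    }


if __name__ == "__main__":
    for location in (
        "Supercontig 7: 753415 - 755439 (+)",   # Candida lusitaniae, Annex IV
        "Chromosome 5: 164521 - 166665 (-)",    # Candida albicans, Annex IV
    ):
        print(parse_location(location))
```

Rows that list two contigs separated by a semicolon would need to be split on ";" before being passed to this helper.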


Annex V. Fungal species in which orthologous SMP1 genes were identified. Taxonomy, number of introns, gene location and database.

Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center

Candida glabrata Saccharomycetes - Chromosome M: 657290 – 659269 (-) Génolevures

Vanderwaltozyma polyspora Saccharomycetes - Contig 397: 8567 – 10588 (-) Trinity College Dublin, Ireland

Saccharomyces castellii Saccharomycetes - Contig 721: 27914 - 29635 (-) Washington University Genome Sequencing Center

Saccharomyces bayanus Saccharomycetes - Contig 3: 5420 – 6778 (+) Washington University Genome Sequencing Center, Broad Institute, Génolevures

Saccharomyces kudriavzevii Saccharomycetes - Contig 02.1807: 8672 – 9925 (+) Washington University Genome Sequencing Center

Saccharomyces mikatae Saccharomycetes - Contig 100: 1952 – 3295 (+) Washington University Genome Sequencing Center, Broad Institute

Saccharomyces cerevisiae Saccharomycetes - Chromosome 2: 594859 – 593501 (+) Stanford Genome Technology Center, Sanger Institute, Broad Institute

Saccharomyces paradoxus Saccharomycetes - Contig 210: 35931 – 37187 (-) Broad Institute


Annex VI. Fungal species in which orthologous ARG80 genes were identified. Taxonomy, number of introns, gene location and database.

Genus/species Taxonomy N° Intron Chromosome, Supercontig or Contig Sequencing Center

Candida glabrata Saccharomycetes - Chromosome F: 620431 – 621027 (-) Génolevures

Vanderwaltozyma polyspora Saccharomycetes - Contig 1073b: 45942 – 46289 (-) Trinity College Dublin, Ireland

Saccharomyces castellii Saccharomycetes - Contig 709: 49895 - 50539 (-) Washington University Genome Sequencing Center

Saccharomyces bayanus Saccharomycetes - Contig 579: 26917 – 27447 (-) Washington University Genome Sequencing Center, Broad Institute, Génolevures

Saccharomyces kudriavzevii Saccharomycetes - Contig 02.1514: 585 – 1130 (-) Washington University Genome Sequencing Center

Saccharomyces mikatae Saccharomycetes - Contig 2531: 1149 – 1703 (-) Washington University Genome Sequencing Center, Broad Institute

Saccharomyces cerevisiae Saccharomycetes - Chromosome 13: 352602 – 353135 (+) Stanford Genome Technology Center, Sanger Institute, Broad Institute

Saccharomyces paradoxus Saccharomycetes - Contig 178: 13466 – 14017 (-) Broad Institute


ANNEX VII

Likelihood value, parameter estimates, and sites under positive selection as inferred under the six models proposed to calculate ω over codons in the MADS-box of orthologous RLM1 genes (PAML 4.1).

Model | ℓ | Estimates of parameters | dN/dS | Positively selected sites

One-ratio (M0) | -8907.095177 | ω = 0.0185 | 0.0185 | None

Neutral (M1a) | -8881.920467 | p: 0.97143 0.02857; ω: 0.01692 1.00000 | 0.0450 | Not allowed

Selection (M2a) | -8881.920467 | p: 0.97143 0.00817 0.02040; ω: 0.01692 1.00000 1.00000 | 0.0450 | None

Free-ratio (M3) | -8595.963248 | p: 0.35665 0.42127 0.22208; ω: 0.00122 0.01527 0.06434 | 0.0212 | None

Beta (M7) | -8575.525557 | p = 0.42029, q = 13.55916 | 0.0283 | Not allowed

Beta & ω (M8) | -8575.526257 | p0 = 0.99999, p = 0.42029, q = 13.55916; (p1 = 0.00001) ω = 1.00000 | 0.0283 | None
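For the models with discrete site classes (M1a, M2a, M3), the dN/dS column is consistent with a proportion-weighted average of the class-specific ω values. The Python sketch below illustrates that arithmetic using the M1a and M3 estimates from the table. The function name `mean_omega` is hypothetical, and the sketch is only a check on the tabulated numbers, not a description of how PAML computes them internally. It does not cover the beta-based models M7 and M8.

```python
def mean_omega(proportions, omegas):
    """Proportion-weighted mean of site-class omega values.

    Hypothetical helper: proportions[i] is the estimated fraction of
    codon sites in class i and omegas[i] is that class's dN/dS ratio.
    """
    if len(proportions) != len(omegas):
        raise ValueError("proportions and omegas must have the same length")
    return sum(p * w for p, w in zip(proportions, omegas))


if __name__ == "__main__":
    # Estimates copied from the MADS-box table (Annex VII).
    m1a = mean_omega([0.97143, 0.02857], [0.01692, 1.00000])
    m3 = mean_omega([0.35665, 0.42127, 0.22208], [0.00122, 0.01527, 0.06434])
    print(f"M1a mean omega: {m1a:.4f}")
    print(f"M3  mean omega: {m3:.4f}")
```

Rounded to four decimals, both values agree with the dN/dS entries reported for M1a and M3.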


ANNEX VIII

Likelihood value, parameter estimates, and sites under positive selection as inferred under the six models proposed to calculate ω over codons in the Saccharomycotina RLM1 gene (HYPHY 1.0).

Model | ℓ | Estimates of parameters | dN/dS | Positively selected sites

One-ratio (M0) | -39464.90296 | ω = 0.13616 | 0.13616 | None

Neutral (M1a) | -40768.67683 | p: 0.9999343 0.0000657; ω: 1.00000 1.00000 | 1.000 | Not allowed

Selection (M2a) | -39216.75175 | p: 0.66539 0.33461 0.00000; ω: 0.13860 1.00000 2.98229 | 0.42684 | None

Free-ratio (M3) | -38569.21552 | p: 0.11323 0.46676 0.42001; ω: 0.01243 0.10299 0.28431 | 0.16889 | None

Beta (M7) | -38570.03576 | p = 0.96651, q = 4.56744 | 0.17465 | Not allowed

Beta & ω (M8) | -38570.03383 | p0 = 0.99999, p = 0.96573, q = 4.56035; (p1 = 0.00001) ω = 1.05491 | 0.17476 | None


ANNEX IX

Likelihood ratio test statistics for comparing site-specific models for the MADS-box and the Saccharomycotina RLM1 gene.

Analysis (DF) | M0 vs M3 (4) | M1a vs M2a (2) | M7 vs M8 (2)

MADS-box | 622.26* | 0 | 0.0014

Saccharomycotina RLM1 gene | 1791.37* | 3103.85* | 0.0039

p value | p>0.05 | p>0.05 | p>0.05

Degrees of freedom (DF), corresponding to the difference in the number of parameters between the models, are reported in brackets; significant values are marked by *.
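The statistics in this annex can be recomputed from the log-likelihoods in Annexes VII and VIII as 2(ℓ_alternative − ℓ_null), compared against a chi-square distribution with the stated degrees of freedom. The Python sketch below is one way to do that using only the standard library. The names `lrt_statistic` and `chi2_sf_even_df` are hypothetical, and the closed-form tail probability used here is valid only for even degrees of freedom, which covers the DF values of 2 and 4 in this annex.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio test statistic: 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    Hypothetical helper; not valid for odd df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Log-likelihoods copied from Annex VII (MADS-box) and Annex VIII (Saccharomycotina).
LOG_LIKELIHOODS = {
    "MADS-box": {
        "M0": -8907.095177, "M1a": -8881.920467, "M2a": -8881.920467,
        "M3": -8595.963248, "M7": -8575.525557, "M8": -8575.526257,
    },
    "Saccharomycotina": {
        "M0": -39464.90296, "M1a": -40768.67683, "M2a": -39216.75175,
        "M3": -38569.21552, "M7": -38570.03576, "M8": -38570.03383,
    },
}

# (null model, alternative model, degrees of freedom) as listed in this annex.
COMPARISONS = [("M0", "M3", 4), ("M1a", "M2a", 2), ("M7", "M8", 2)]

if __name__ == "__main__":
    for dataset, lnl in LOG_LIKELIHOODS.items():
        for null, alt, df in COMPARISONS:
            stat = lrt_statistic(lnl[null], lnl[alt])
            p_value = chi2_sf_even_df(stat, df)
            print(f"{dataset}: {null} vs {alt} (df={df}) 2*dlnL={stat:.4f} p={p_value:.3g}")
```

For the MADS-box M7 vs M8 comparison the recomputed statistic is slightly negative, because the M8 log-likelihood in Annex VII is marginally lower than that of M7; its magnitude matches the tabulated 0.0014, and the sketch treats any non-positive statistic as p = 1.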