evolutionary appearance of genes encoding proteins associated with box h/aca snornas: cbf5p in...

11
2342–2352 Nucleic Acids Research, 2000, Vol. 28, No. 12 © 2000 Oxford University Press Evolutionary appearance of genes encoding proteins associated with box H/ACA snoRNAs: Cbf5p in Euglena gracilis, an early diverging eukaryote, and candidate Gar1p and Nop10p homologs in archaebacteria Yoh-ichi Watanabe and Michael W. Gray* Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada Received February 22, 2000; Revised and Accepted April 24, 2000 DDBJ/EMBL/GenBank accession nos AF234319, AF234320 ABSTRACT A reverse transcription–polymerase chain reaction (RT–PCR) approach was used to clone a cDNA encoding the Euglena gracilis homolog of yeast Cbf5p, a protein component of the box H/ACA class of snoRNPs that mediate pseudouridine formation in eukaryotic rRNA. Cbf5p is a putative pseudouridine synthase, and the Euglena homolog is the first full- length Cbf5p sequence to be reported for an early diverging unicellular eukaryote (protist). Phylogenetic analysis of putative pseudouridine synthase sequences confirms that archaebacterial and eukaryotic (including Euglena) Cbf5p proteins are specifically related and are distinct from the TruB/Pus4p clade that is responsible for formation of pseudouridine at position 55 in eubacterial (TruB) and eukaryotic (Pus4p) tRNAs. Using a bioinformatics approach, we also identified archaebacterial genes encoding candidate homologs of yeast Gar1p and Nop10p, two additional proteins known to be associated with eukaryotic box H/ACA snoRNPs. These observations raise the possibility that pseudouridine formation in archaebacterial rRNA may be dependent on analogs of the eukaryotic box H/ACA snoRNPs, whose evolutionary origin may therefore predate the split between Archaea (archae- bacteria) and Eucarya (eukaryotes). Database searches further revealed, in archaebacterial and some eukaryotic genomes, two previously unrecognized groups of genes (here designated ‘PsuX’ and ‘PsuY’) distantly related to the Cbf5p/TruB gene family. INTRODUCTION In eukaryotes, cytoplasmic rRNA species other than 5S rRNA are transcribed as part of a longer precursor (pre-rRNA), from which the mature rRNAs are processed. In most cases, the pre- rRNA, which includes external and internal transcribed spacers, undergoes endonucleolytic cleavage to yield large subunit (LSU) rRNAs (5.8S and 26–28S) and a small subunit (SSU) rRNA. To date, the mechanisms of eukaryotic pre-rRNA processing have been well studied only in relatively late diverging species such as yeast (Saccharomyces cerevisiae) and vertebrate animals. These investigations have provided information about the complexity and order of processing events, as well as the involvement of numerous cis-elements and trans-acting factors (for recent reviews, see 1–4). In the protist phylum Euglenozoa, which includes euglenid protozoa such as Euglena gracilis and kinetoplastid protozoa such as Trypanosoma spp, the LSU rRNA is further fragmented into smaller pieces (a total of 14 in the case of E.gracilis; 5). Euglena gracilis LSU rRNA also contains a substantially higher proportion of O 2-methylnucleosides than its counterparts in yeast and vertebrates (6; M.N.Schnare, personal communi- cation). As in other eukaryotes, the highly fragmented euglenid LSU rRNA is derived from a pre-rRNA that includes the SSU rRNA sequence (7,8); however, information about trans-acting factors and details about the rRNA maturation pathway in E.gracilis are still quite limited (8,9). Although questions about the precise phylogenetic position of the Euglenozoa within the domain Eucarya (eukaryotes) have been raised by recent protein phylogenies (10–13), this phylum is generally considered to represent an early-branching eukaryotic lineage (14). In this context, studies of ribosome biogenesis in members of the Euglenozoa should provide insights into the evolutionary origin of the system that processes eukaryotic rRNA. The box H/ACA snoRNAs constitute one of the two major classes of snoRNA in eukaryotes (15). Box H/ACA snoRNAs display a secondary structure that features two hairpin motifs and two conserved sequence blocks, box H (ANANNA, in the hinge region between the two hairpin structures) and box ACA (ACANNN, at the 3end) (15–18). The box H/ACA snoRNAs participate in either rRNA processing (e.g. yeast snR30) (19) or formation of pseudouridine (5-ribosyluracil, Ψ) in rRNA. The latter function relies on complementarity within ‘pseudo- uridine pockets’ to short regions flanking the modification sites in rRNA (16,20). Some box H/ACA snoRNAs (e.g. yeast *To whom correspondence should be addressed. Tel: +1 902 494 2521; Fax: +1 902 494 1355; Email: [email protected]

Upload: independent

Post on 26-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

2342–2352 Nucleic Acids Research, 2000, Vol. 28, No. 12 © 2000 Oxford University Press

Evolutionary appearance of genes encoding proteinsassociated with box H/ACA snoRNAs: Cbf5p in Euglenagracilis, an early diverging eukaryote, and candidateGar1p and Nop10p homologs in archaebacteriaYoh-ichi Watanabe and Michael W. Gray*

Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry andMolecular Biology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada

Received February 22, 2000; Revised and Accepted April 24, 2000 DDBJ/EMBL/GenBank accession nos AF234319, AF234320

ABSTRACT

A reverse transcription–polymerase chain reaction(RT–PCR) approach was used to clone a cDNAencoding the Euglena gracilis homolog of yeastCbf5p, a protein component of the box H/ACA classof snoRNPs that mediate pseudouridine formation ineukaryotic rRNA. Cbf5p is a putative pseudouridinesynthase, and the Euglena homolog is the first full-length Cbf5p sequence to be reported for an earlydiverging unicellular eukaryote (protist). Phylogeneticanalysis of putative pseudouridine synthase sequencesconfirms that archaebacterial and eukaryotic (includingEuglena) Cbf5p proteins are specifically related and aredistinct from the TruB/Pus4p clade that is responsiblefor formation of pseudouridine at position 55 ineubacterial (TruB) and eukaryotic (Pus4p) tRNAs.Using a bioinformatics approach, we also identifiedarchaebacterial genes encoding candidate homologsof yeast Gar1p and Nop10p, two additional proteinsknown to be associated with eukaryotic box H/ACAsnoRNPs. These observations raise the possibilitythat pseudouridine formation in archaebacterialrRNA may be dependent on analogs of the eukaryoticbox H/ACA snoRNPs, whose evolutionary origin maytherefore predate the split between Archaea (archae-bacteria) and Eucarya (eukaryotes). Database searchesfurther revealed, in archaebacterial and some eukaryoticgenomes, two previously unrecognized groups ofgenes (here designated ‘PsuX’ and ‘PsuY’) distantlyrelated to the Cbf5p/TruB gene family.

INTRODUCTION

In eukaryotes, cytoplasmic rRNA species other than 5S rRNAare transcribed as part of a longer precursor (pre-rRNA), fromwhich the mature rRNAs are processed. In most cases, the pre-rRNA, which includes external and internal transcribed spacers,

undergoes endonucleolytic cleavage to yield large subunit(LSU) rRNAs (5.8S and 26–28S) and a small subunit (SSU)rRNA. To date, the mechanisms of eukaryotic pre-rRNAprocessing have been well studied only in relatively latediverging species such as yeast (Saccharomyces cerevisiae)and vertebrate animals. These investigations have providedinformation about the complexity and order of processingevents, as well as the involvement of numerous cis-elementsand trans-acting factors (for recent reviews, see 1–4).

In the protist phylum Euglenozoa, which includes euglenidprotozoa such as Euglena gracilis and kinetoplastid protozoasuch as Trypanosoma spp, the LSU rRNA is further fragmentedinto smaller pieces (a total of 14 in the case of E.gracilis; 5).Euglena gracilis LSU rRNA also contains a substantiallyhigher proportion of O2′-methylnucleosides than its counterpartsin yeast and vertebrates (6; M.N.Schnare, personal communi-cation). As in other eukaryotes, the highly fragmented euglenidLSU rRNA is derived from a pre-rRNA that includes the SSUrRNA sequence (7,8); however, information about trans-actingfactors and details about the rRNA maturation pathway inE.gracilis are still quite limited (8,9).

Although questions about the precise phylogenetic positionof the Euglenozoa within the domain Eucarya (eukaryotes)have been raised by recent protein phylogenies (10–13), thisphylum is generally considered to represent an early-branchingeukaryotic lineage (14). In this context, studies of ribosomebiogenesis in members of the Euglenozoa should provideinsights into the evolutionary origin of the system that processeseukaryotic rRNA.

The box H/ACA snoRNAs constitute one of the two majorclasses of snoRNA in eukaryotes (15). Box H/ACA snoRNAsdisplay a secondary structure that features two hairpin motifsand two conserved sequence blocks, box H (ANANNA, in thehinge region between the two hairpin structures) and box ACA(ACANNN, at the 3′ end) (15–18). The box H/ACA snoRNAsparticipate in either rRNA processing (e.g. yeast snR30) (19)or formation of pseudouridine (5-ribosyluracil, Ψ) in rRNA.The latter function relies on complementarity within ‘pseudo-uridine pockets’ to short regions flanking the modificationsites in rRNA (16,20). Some box H/ACA snoRNAs (e.g. yeast

*To whom correspondence should be addressed. Tel: +1 902 494 2521; Fax: +1 902 494 1355; Email: [email protected]

Nucleic Acids Research, 2000, Vol. 28, No. 12 2343

snR10) (20,21) are involved in both nucleolytic processing andpost-transcriptional modification. Box H/ACA snoRNAs havebeen identified in so-called ‘crown’ eukaryotes such asanimals, plants and yeast (reviewed in 22,23; see also 18,24–26)as well as in a ciliate protozoon, Tetrahymena thermophila (27).

In S.cerevisiae, box H/ACA snoRNAs form a complex withthe proteins Gar1p, Cbf5p, Nhp2p and Nop10p (4,28–31).Cbf5p, originally identified as a centromere/microtubule-bindingprotein (32), is an essential protein in yeast, implicated inrRNA processing (33) and stability of box H/ACA snoRNAs(4). Cbf5p shares sequence similarity with the TruB/Pus4pfamily of tRNA Ψ55 synthases (34–38) and, in associationwith box H/ACA snoRNAs, is believed to act as an rRNA Ψsynthase. Eukaryotic Cbf5p homologs have been identifiedand characterized in rat (NAP57; 34), human (dyskerin;39,40), Drosophila [Nop60B (41) or Minifly protein (26)] andfungi (42). Genes for archaebacterial Cbf5p homologs havealso been identified (43–47) but their function is currently unclear.

As yet, there is no published information about box H/ACAsnoRNP proteins in unicellular eukaryotes (protists), especiallyones thought to be early branching members of the domainEucarya. As part of an on-going investigation of rRNAprocessing and modification in the early diverging protist,

E.gracilis, we have cloned and characterized a full-lengthcDNA encoding the Cbf5p protein of this organism. In thecourse of this study, we also identified candidate genesencoding archaebacterial homologs of two of the proteins(Gar1p and Nop10p) known to be associated with eukaryoticbox H/ACA snoRNPs, as well as genes specifying two novelprotein families related to Cbf5p and TruB.

MATERIALS AND METHODS

RNA and DNA

Total cellular DNA and RNA from E.gracilis was prepared asdescribed by Breckenridge et al. (48). Poly(A)+ RNA wasselected from total RNA using the PolyATtract system(Promega, Madison, WI).

Oligonucleotides

Oligoribonucleotide P-1R (5′-AAUAAAGCGGCCGCGGAUC-CAA-3′) was purchased from Dalton Chemical LaboratoriesInc. (North York, ON, Canada). Oligodeoxyribonucleotides(Table 1) were obtained from Gibco BRL (Burlington, ON,Canada) or ID Labs Biotechnology, Inc. (London, ON, Canada).

Table 1. Oligonucleotides used for cloning of Cbf5p cDNA and genomic sequences

a3′ end blocked with enzymatically added dideoxy C (ddC), rather than an amine group as in ref. 56.bCorresponds to positions 1469 to 1496 of the cDNA sequence.cCorresponds to positions 1511 to 1537 of the cDNA sequence.

Primer Sequence (5′ to 3′) Corresponding peptide sequence

P-4 5′-AATAAAGCGGCCGCGGATCCAA-3′

dTP-4 5′-AATAAAGCGGCCGCGGATCCAAT16-3′

P-16 5′-AATAAAGCGGCCGCGGATCCAAT16(A/G/C)N-3′

P-22 5′-CTAATACGACTCACTATAGGGCTCGAGCGGCCGCCCGGGCAGGT-3′

P-23ddC 5′-pACCTGCCddC-3′a

P-24 5′-GGATCCTAATACGACTCACTATAGGGC-3′

P-25 5′-AATAGGGCTCGAGCGGC-3′

P-55 5′-GGAGCTCAATAAAGCGGCCGC-3′

P-70 5′-AGTCA(T/C)GA(A/G)GTN(T/G)CNTGG-3′ SHEVV(A/S)W (sense)

P-73 5′-ACTTTTGGN(A/T)C(C/A/G)A(A/G)NGTNCC-3′ GTL(D/V)PKV (antisense)

P-77 5′-GCT(A/G)T(T/C/G)TG(T/C)TA(T/C)GGNGCNAA-3′ A(I/V)CYGAK (sense)

P-79 5′-ATNGC(C/T)TCNCC(T/C)TTNGTNGT-3′ TTKGEA(I/V) (antisense)

P-85 5′-TTGAAGCGAATCCTGAAGGTGGAGAAGAC-3′

P-87 5′-GATGACCTCGTTGACCTCAATGCCGT-3′

P-86 5′-AAGGTGGAGAAGACTGGGCACGCA-3′

P-88 5′-CAATGCCGTTCTCAAAGCGGAGGAGT-3′

P-95 5′-CCAATCGGGTGGCTCTCTCAATGCAGA-3′

P-99 5′-CTCGGCATCGCCCTCATGACCA-3′

P-107 5′-CTGCGGCAGCAACGAGGAGGCGATGAA-3′

P-163 5′-GGCATAGCTTCAGGCCGTAGGTCATTCA-3′b

P-164 5′-CCCCTGGAACGTGCGGTGCTGCTGCTT-3′c

2344 Nucleic Acids Research, 2000, Vol. 28, No. 12

Generation of Cbf5p cDNA sequence by reversetranscription–polymerase chain reaction (RT–PCR)

RT of a fraction enriched in poly(A)+ RNA (∼500 ng) wascarried out using Superscript II reverse transcriptase (RNaseH–; Gibco BRL) and modified oligo(dT) primers (Table 1)dTP-4 for degenerate PCR or P-16 for 3′ RACE (rapid ampli-fication of cDNA ends) following the supplier’s protocol. Theoligo-capping method of Maruyama and Sugano (49) was usedin 5′ RACE experiments. In brief, a fraction enriched inpoly(A)+ RNA was treated with calf intestinal alkaline phos-phatase (New England Biolabs, Beverly, MA) followed bytobacco acid pyrophosphatase (Epicentre Technologies,Madison, WI). Using T4 RNA ligase (Amersham-PharmaciaBiotech, Cleveland, OH), the 3′ end of RNA oligo P-1R wasjoined to the 5′ end of the RNA originally possessing the capstructure. The resulting RNA was subjected to RT as describedabove using P-87 (Table 1) as a primer.

PCR mixtures were prepared as described by Baskaran et al.(50) using Pfu DNA polymerase (1.5 U/ml, Stratagene, LaJolla, CA) but substituting Taq DNA polymerase (25 U/ml ofreaction mix, Gibco BRL) for the KlenTaq1 polymerase.Initially, two internal portions of Cbf5p sequence were amplifiedby degenerate PCR using modified touchdown protocols (51)with the primer pairs P-70/P-73 and P-77/P-79 (Table 1; degenerateRT–PCR, Fig. 1). These PCR products were purified by non-denaturing polyacrylamide gel electrophoresis, re-amplifiedusing a conventional PCR cycle, then cloned and sequenced asoutlined below. New primers (P-85 and P-87, then P-86 and P-88in the nested reaction), designed on the basis of the sequenceinformation obtained, were used to connect the two portions(internal RT–PCR, Fig. 1). For 5′ RACE, the combination P-55+ P-87 was followed by P-4 + P-95 in the nested reaction(Fig. 1). For 3′ RACE, P-85 + P-55, then P-86 + P-4 were usedin the nested reaction (3′ RACE I, Fig. 1).

To verify poly(A) addition site(s) more precisely, an addi-tional round of 3′ RACE was carried out using as templatedTP-4-primed cDNA together with primers specific forsequence further downstream (P-99 and P-107, respectively, inprimary and nested PCR) and anchor-specific primers (P-4 andP-55, respectively, in primary and nested reactions; 3′ RACEII, Fig. 1). Ten clones selected in this way were sequenced.

Nested PCR products were purified by agarose gel electro-phoresis, cloned (52), then sequenced by the dideoxy method

(53) with modifications (48,54,55). To minimize the possibilityof ‘PCR mutation’, at least three independent clones from eachreaction were sequenced on both strands. The P-86 to P-88portion of the 3′ RACE I products was not fully analyzedbecause this region overlapped parts of other RT–PCR products.

Genomic PCR

To examine the 3′ terminal region of the E.gracilis Cbf5p geneencompassing the region that includes sequence heterogeneitiesnoted during cDNA analysis (see Results), a PCR walkingstrategy (56) was used with minor modifications. In brief,E.gracilis total DNA was digested separately with a number ofrestriction enzymes that generate blunt ends. A partiallydouble-stranded adapter (P-22 and P23ddC annealed together;see table 1 and figure 1 of ref. 56) was ligated to both ends ofthe resulting restriction fragments using T4 DNA ligase. Theligation products were then used as templates in PCR withdifferent combinations of cDNA- and adapter-specific primers(P-163 and P-24 in the primary reaction, P-164 and P-25 in thesecondary reaction; Table 1). One of the PCR products (∼420 bp),amplified from template prepared from DraI-digested DNA,was purified, cloned and sequenced.

Northern hybridization

The fraction enriched in poly(A)+ RNA (500 ng) was electro-phoresed in a 1% agarose–0.4 M formaldehyde gel (57), thentransferred to GeneScreen Plus nylon membrane (NEN,Boston, MA) according to the protocol of Chomczynski andMackey (58). A cloned cDNA fragment (P-86 to P-88) wasamplified by PCR using a corresponding plasmid clone astemplate DNA. The amplification product was purified byagarose gel electrophoresis, then labeled by a random primingprotocol (59) using [α-32P]dATP. Hybridization wasperformed in 0.5 M sodium phosphate buffer (pH 7.2), 7%SDS and 1 mM EDTA (60) at 65°C overnight. The membranewas immersed in 40 mM sodium phosphate buffer, 5% SDSand 1 mM EDTA at 65°C for 5 min, then 40 mM sodium phos-phate buffer, 1% SDS and 1 mM EDTA at 65°C for 20 min(3×), and finally subjected to autoradiography with an intensifyingscreen at –75°C.

Southern hybridization

Total cellular DNA (5 µg) was hydrolyzed with restrictionenzymes and the products were separated by electrophoresis ina 0.7% agarose gel, then transferred to GeneScreen Plus asdescribed by Chomczynski (61). The cDNA fragment wasamplified by PCR and labeled as above. Hybridization andautoradiography were carried out as described above.

Analysis of sequence data

Sequence data were assembled and analyzed using MacDNASISversion 3.5 (Hitachi Software, Yokohama, Japan). Searches ofpublic domain nucleotide and protein sequence databases werecarried out by Gapped-BLAST (BLASTP or TBLASTN; 62)through the NCBI web server (www.ncbi.nlm.nih.gov ) usingthe default options unless otherwise specified. Sequencealignments were generated with Clustal W version 1.7 (63)followed by manual modification. Based on the alignment,181 amino acid positions were selected for phylogenetic analysis,with positions of insertion and deletion omitted. Phylogenetictrees were constructed using the quartet puzzling and

Figure 1. Strategy for RT–PCR cloning of E.gracilis Cbf5p cDNA. Primers(see Table 1) are shown as arrowheads at positions corresponding to their targetbinding sites.

Nucleic Acids Research, 2000, Vol. 28, No. 12 2345

maximum likelihood methods of protein phylogeny in thePUZZLE 4.0 program (64). The JTT-F + γ model of aminoacid substitutions was assumed in the analysis (64–66). Rateheterogeneity among sites was approximated by a discretegamma distribution (with four categories).

RESULTS

Assembly of a cDNA sequence encoding E.gracilis Cbf5p

As summarized in Figure 1, an RT–PCR approach was used toassemble a complete cDNA sequence encoding EuglenaCbf5p. Initially, highly degenerate primers were designed onthe basis of conserved peptide motifs identified in an alignment ofeukaryotic homologs of yeast Cbf5p. Two of the primer setssuccessfully amplified short cDNA fragments that appeared tospecify portions of Euglena Cbf5p (degenerate PCR in Fig. 1).These partial cDNA sequences were used in further primerdesign to connect the initial two sequences, after whichadditional sets of primers were synthesized as necessary toallow completion of the sequence by RACE techniques.

The 5′ end of the cDNA was obtained by an oligo-cappingmethod (49) that in theory should yield the authentic 5′ end ofthe corresponding mature mRNA. In E.gracilis, as in thetrypanosomatid protozoa, most mRNAs acquire a common5′ terminal sequence as a result of a trans-splicing event thatcovalently joins the 5′ portion of a small, separately tran-scribed RNA, the spliced leader (SL) RNA, to different mRNAtranscripts (67). In our hands, the oligo-capping procedureyielded an SL sequence having two additional 5′ nucleotides(AC) compared with previously reported sequences (67,68).Known gene sequences for the Euglena SL RNA do have ACat the corresponding positions (68); thus, the Euglena SLsequence very likely is 28 nt long, beginning with the sequence5′-ACAC...

The SL RNA features an as yet uncharacterized methyl-guanosine 5′ cap structure, as suggested by analyses(Y.Watanabe, unpublished results) using an anti-TMG anti-body that in our hands reacts with monomethyl- as well astrimethylguanosine (69). Previous studies of SL sequences inEuglena had employed direct reverse transcriptase sequencing,with the enzyme evidently stopping at the third or fourthpositions from the 5′ end, presumably as a result of post-transcriptional modifications in the SL sequence (70; see also71, but no sequencing gel shown). The protocol used in thepresent study may allow reverse transcriptase extensionthrough these modifications to some degree; alternatively,reverse transcriptase readthrough may reflect undermodificationat the 5′ end of the Euglena SL RNA, as observed in the caseof the trypanosomatid SL (72–74). The latter RNA has acharacteristic cap4 structure in which the first four 5′ nucleotidesare methylated in the base and/or sugar (O2′-methyl) moieties(72,75). Our results and those of previous studies suggest thatEuglena SL RNA likely harbors modifications in its first four5′ nucleotides. Thus, Euglena and kinetoplastid protozoa maybe similar not only in possessing trans-splicing but also inhaving an SL RNA that is extensively modified at its 5′ end.Recently, a 28-nt SL beginning with 5′-ACUC... and alsopossibly containing a 5′ cap structure with extensive modificationswas characterized in the colorless euglenid, Entosiphonsulcatum (76).

In 3′ RACE experiments that examined cDNA synthesizedusing an oligo(dT) primer, we obtained evidence of heterogeneityat the poly(A) addition site (Fig. 2). 3′ RACE analysis alsorevealed a likely sequence heterogeneity at a single position(Fig. 2): at this site, four clones had C whereas five clones hadT, with one clone lacking the corresponding region altogether(see below). It seemed unlikely that these differences could beattributed to RT or PCR artifacts because they were observedin clones generated in independent 3′ RACE experiments (datanot shown). On the other hand, sequencing of the genomicPCR product comprising the 3′ end of the cDNA indicated thatthe position in question is T in the gene sequence (three out of

Figure 2. cDNA sequence encoding E.gracilis Cbf5p, displayed with thededuced amino acid sequence. Locations of the trans-spliced leader sequence,TruB motifs I and II, and PUA domain (36,84) and the C-terminal KKE repeatare indicated. Alternative polyadenylation sites determined by 3′ RACE (seeFig. 1) are denoted by asterisks. Because in the gene sequence A residuesimmediately follow one of the deduced polyadenylation sites, the position ofthis site was assigned tentatively.

2346 Nucleic Acids Research, 2000, Vol. 28, No. 12

three independent clones sequenced). The poly(A) additionsite in the one ‘deleted’ clone is ambiguous due to the possibleoccurrence of A residues immediately 5′ to the poly(A) tail; allother clones had a pyrimidine residue at the poly(A) junction.

Genomic Southern analysis (data not shown) combined withthe results of genomic PCR make it unlikely that there aremultiple copies of the Cbf5p gene in E.gracilis nuclear DNA,although detailed Cbf5p gene structure has not yet beeninvestigated. Northern blot analysis (not shown) revealed aCbf5p mRNA of ∼1.7 kb, consistent with the size of thecomplete cDNA sequence plus about 100 3′ terminal A residues.

Characteristics of the Euglena Cbf5p sequence

The E.gracilis Cbf5p cDNA sequence contains an openreading frame of 467 residues specifying a protein of predictedmolecular mass 52 392 Da (Fig. 2). Like the homologousfungal sequences, Euglena Cbf5p lacks an identifiablepositively charged N-terminal extension that in metazoanCbf5p sequences is assumed to represent a nuclear localizationsignal. However, Euglena Cbf5p does possess the C-terminalKKE/D repeats that are characteristic of all known eukaryoticCbf5p homologs. These repeats display microtubule-bindingactivity in vitro and are essential for viability in S.cerevisiae(32) (although not in another yeast, Kluyveromyces lactis; 42).

In Escherichia coli Ψ synthases acting on rRNA and tRNA,a highly conserved aspartate residue is essential for activity(77–80), and a possible catalytic role for this residue via its β-carbonyl group has been proposed (77,81). A recent crystallo-graphic study supports this idea (82). A motif that includes thisconserved Asp (the TruB motif II) has been identified not onlyamong members of the TruB/Pus4p family of tRNA Ψ55synthases (37) but also in Cbf5p (36), and the functionalimportance of this Asp in both TruB (80) and Cbf5p (83) hasnow been demonstrated. A partial alignment of eukaryotic andarchaebacterial Cbf5p homologs, encompassing highlyconserved regions (motifs I and II) characteristic of Ψ synthases,is shown in Figure 3 (a full alignment of the corresponding Cbf5psequences is available upon request). Exceptional conservationof the motif II sequence (including the functionally critical Aspresidue) is evident among Cbf5p homologs, including theE.gracilis one.

An unusual arrangement of the Cbf5p gene is seen inAeropyrum pernix, the only member of the archaebacterialkingdom Crenarchaeota for which a complete genomesequence is available (the other completely sequenced archae-bacterial genomes being from members of the Euryarchaeota).In this case, the Cbf5p homolog is encoded by two partiallyoverlapping open reading frames and the motif II aspartateresidue (Asp60) is replaced by glutamate (47). In mutagenesisexperiments with the E.coli TruA tRNA Ψ synthase, Huanget al. (77) found that Glu at this position could not substitutefunctionally for the conserved Asp.

Phylogenetic relationships

Phylogenetic analysis of eukaryotic and archaebacterial Cbf5phomologs (Fig. 4) suggests that Euglena diverged from themain branch of eukaryotic evolution earlier than the othereukaryotes for whom Cbf5p sequences are currently available.However, confirmation of this point will require a morediverse collection of eukaryotic Cbf5p sequences, includingadditional protist ones. This phylogenetic analysis also

Figure 3. A portion of the Cbf5p sequence alignment showing the regionencompassing TruB motifs I and II. Motifs are adapted from Koonin (36),with highly conserved residues boxed. The position of the conserved,functionally important Asp (see text) is marked with an asterisk. Sequencesshown above the Euglena sequence are eukaryotic, sequences below arearchaebacterial. GenBank accession numbers are: S.cerevisiae, AAA34473;K.lactis, AAC64862; Candida albicans, AAB94297; Emericella nidulans,AAB94296; Sartorya (Aspergillus) fumigata, AAB94298; Schizosaccharomycespombe, CAB10131; D.melanogaster, AAC97117; C.elegans, CAB07244;Homo sapiens, AAB94299; Rattus norvegicus, P40615; A.fulgidus, AAB90995;M.jannaschii, AAB98132; Pyrococcus abyssi, CAB49444; Pyrococcus horikoshii,O59357; M.thermoautotrophicum, O26140. The A.pernix sequence is a fusionof two peptide sequences (BAA79973 and BAA79974) encoded by separateopen reading frames (nucleotide sequence AP000060.1). In the completealignment (available from the authors on request), non-Cbf5p-homologousregions of these two peptides were excluded.

Figure 4. Maximum likelihood phylogenetic tree of selected Cbf5p/TruB familysequences. Sequences are as listed in Figure 3, with GenBank accessionnumbers for additional sequences as follows: Thermotoga maritima,AAD35938; Aquifex aeolicus, AAC06885; Bacillus subtilis, CAB13539;E.coli, AAC76200; S.cerevisiae (Pus4p), P48567; S.pombe (Pus4p),CAA20692. Numbers on branches are the quartet puzzling support valuesfrom 1000 puzzling steps (64).

Nucleic Acids Research, 2000, Vol. 28, No. 12 2347

demonstrates that archaebacterial and eukaryotic (includingEuglena) Cbf5p sequences comprise two distinct branches of asingle clade to the exclusion of the affiliated TruB (eubacterial)and Pus4p (eukaryotic) sequences, which form a separateclade. These affinities are supported by the presence of uniqueinsertions found only in TruB and yeast Pus4p sequences andby the absence of an apparent pseudouridine synthase andarchaeosine (PUA) domain in Pus4p and most eubacterialTruB sequences (84).

Potential homologs of box H/ACA snoRNP proteins inarchaebacterial genomes

The known presence in archaebacterial genomes of genesencoding homologs of Cbf5p and Nhp2p (another box H/ACAsnoRNP-specific protein) prompted us to search for genes thatmight specify Gar1p and Nop10p, the remaining two proteins

recently identified as components of box H/ACA snoRNPs(30,31). In a TBLASTN search of sequenced archaebacterialgenomes using the yeast Gar1p sequence, a potential Methano-bacterium homolog was identified (E value 0.004). Using thisMethanobacterium sequence as query, putative homologs fromother archaebacterial genomes were identified (E valuesranging from 10–12 to 0.43 for Aeropyrum). Figure 5A presentsan alignment of the N-terminal portion of inferred archae-bacterial and eukaryotic Gar1p sequences. The archaebacterialversions are predicted to have shorter N- and C-terminalregions than their eukaryotic homologs, with sequence identitylimited to short blocks dispersed throughout the sequence. Inthis regard, it has been shown in the case of S.cerevisiae Gar1pthat Gly–Arg repeats at both N- and C-termini are not essentialfor growth (85), with the protein produced by in vitro translationable to interact with box H/ACA snoRNAs through its internal

Figure 5. (A) Alignment of the N-terminal portion of putative Gar1p protein sequences from archaebacteria and some eukaryotes. GenBank accession numbers forthese sequences are: P.abyssi, CAB49230; M.jannaschii, P81312; A.pernix, BAA79764; M.thermoautotrophicum, AAB85384; Encephalitozoon cuniculi,CAA07263; S.cerevisiae, P28007; D.melanogaster, S49193; Arabidopsis thaliana, AAF00626. The sequences of P.horikoshii, A.fulgidus, C.parvum and H.sapiens(the latter two being partial sequences from EST data) are from open reading frames in nucleotide sequences AP000007.1, AE001014, AA532317 and AA308727,respectively. (B) Alignment of putative Nop10p sequences from archaebacteria and some eukaryotes. Accession numbers are: P.abyssi, CAB49761; P.horikoshii,translation of an open reading frame in AP000004; A.fulgidus, O29724; M.thermoautotrophicum, O27362; M.jannaschii, P81303; A.thaliana, AAD25649;Trypanosoma brucei, translation from AA681026 (EST data). The sequences of S.cerevisiae and H.sapiens Nop10p are from (31) whereas the Aeropyrum Nop10psequence is from an open reading frame that begins with TTG in the nucleotide sequence AP000059. Highly conserved residues are shown in white on a blackbackground whereas gray shading indicates conservative substitutions.

2348 Nucleic Acids Research, 2000, Vol. 28, No. 12

core region (86). Thus, the shorter archaebacterial Gar1pcandidates could well be functional.

Using the yeast Nop10p sequence in a TBLASTN search ofcomplete archaebacterial genome sequences, we identifiedpossible homologs of this protein; E values ranged from 10–5 to0.84 (except for 4.9 in the case of Methanococcus), withcorresponding values between 10–16 and 10–10 obtained usingthe Pyrococcus Nop10p homologs (which have the identicalprotein sequence) as query. Amino acid sequence identityamong the putative archaebacterial and eukaryotic Nop10phomologs (Fig. 5B) is somewhat greater than in the case ofGar1p, with N-terminal and C-terminal regions being the mostdivergent. However, in contrast to Gar1p, the putative Nop10psequences from archaebacteria and eukaryotes are virtually thesame length.

Examination of sequenced archaebacterial genomes revealedthat in all cases, the putative Nop10p gene is physically linkedto the gene encoding the homolog of eukaryotic translationinitiation factor IF2-α, and sometimes ribosomal proteingenes, as well, possibly in the same operon (Fig. 7A). Further,genes for some candidate archaebacterial Gar1p homologs (inMethanococcus, Methanobacterium, Archaeoglobus andAeropyrum) are physically linked to genes for archaebacterialhomologs of eukaryotic transcription factor IIB (TFIIB)(Fig. 7B). In Archaea, transcription of both rRNA and mRNArequires TFIIB (87). Furthermore, the candidate AeropyrumGar1p gene is in the same transcriptional orientation (and somay be part of the same operon) as the gene for ribosomalprotein S8E (Fig. 7B). The organization of these putativearchaebacterial Gar1p and Nop10p genes suggests that theirexpression may be co-regulated with components of thetranscription and translation machineries in these organisms.

Identification in archaebacteria and some eukaryotes ofgenes encoding a novel group of proteins related to theCbf5p/TruB gene family

In an attempt to identify additional protein sequences related toknown Ψ synthases, we conducted BLAST searches using asquery an internal portion of the E.gracilis Cbf5p sequence(positions G66 to A271) that excluded Cbf5p-specific N- andC-terminal regions. In particular, the query sequence lacked thePUA domain, which has been proposed as a possible RNA-bindingdomain in a broad range of known or putative RNA-bindingproteins (84). In addition to the expected Cbf5p/TruB familysequences (E values ranging from 10–86 to 0.45 except for avalue of 5.1 in the case of S.cerevisiae Pus4p), this searchdetected a previously uncharacterized Archaeoglobussequence (GenBank accession no. AAB90092) at an E value of0.052. Using this sequence in Gapped-BLAST searches (62),additional novel Cbf5p/TruB-related sequences (here designated‘PsuX’) were detected (Fig. 6A). In BLASTP or TBLASTNsearches, we identified PsuX sequences with high statisticalsignificance (E values <10–20) in all completely sequencedarchaebacterial genomes, as well as detecting possible orthologousfull-length sequences in the animals Caenorhabditis elegansand Drosophila melanogaster and a partial sequence in aprotist, Giardia intestinalis; on the other hand, no PsuX-relatedsequences were evident in the yeast (S.cerevisiae) or eubacterialgenomes. BLAST searches of EST databases suggested thatmammals and plants also express a PsuX homolog (data notshown). After four rounds of iteration using the Archaeoglobus

fulgidus sequence as query in a more sensitive PSI-BLAST(62) search and with the low-complexity filter program SEG(88) off, authentic Cbf5p/TruB family sequences but nopseudo-positives appeared with a high degree of significance(E values <10–8).

The Cbf5p/TruB-homologous portion of the PsuX proteinsequence does not contain a readily identifiable TruB motif II(Fig. 6A) which, as noted above, includes the functionally criticalAsp residue. However, alignment of PsuX protein sequences didreveal a highly conserved stretch, (A/S)GRED(V/I)D(A/V)R(M/T/V)LG (positions 183–194 in the A.fulgidus sequence)(Fig. 6A), that is likely a functionally important region. Thisconserved stretch resembles known Ψ synthase motifs (77) andincludes two Asp residues (Fig. 6B). In Figure 6B, the PsuX2alignment follows the DxxxxG pattern proposed for the TruB/RluA/RsuA superfamily (79).

Another notable feature of PsuX sequences is the presence ofCX2C motifs within the N-terminal region. Some of these motifs(e.g., C4X2C7, C20X2C23, C139X2C142 and C147X2C150 inA.fulgidus) are well conserved among the aligned sequences(not shown), although the C139X2C142 and C147X2C150motifs are shared only among A.fulgidus, Methanococcusjannaschii and Methanobacterium thermoautotrophicum. TheCX2C motif is frequently found in metal-binding domains ofvarious nucleic acid-binding proteins (89). Pus1p, a yeasttRNA Ψ synthase, contains a zinc ion essential for functionand tRNA binding, and potential zinc-binding elements similarto CX2C motifs have been proposed for this protein (90).

In M.jannaschi and M.thermoautotrophicum, the PsuX geneis physically linked to the gene encoding ribosomal proteinL21E and (in the case of M.thermoautotrophicum) the gene foran archaebacterial homolog of the signal recognition particleGTPase, Ffh/SRP54, as well (Fig. 7C). Furthermore, in thegenome of Pyrococcus spp, genes encoding PsuX and ahomolog of eukaryotic initiation factor IF2-α are arrayed in ahead-to-head fashion (Fig. 7C). These observations suggest thepossibility in at least some archaebacteria of co-regulation ofthe expression of PsuX genes and of genes encoding proteinsrelated to translation, as suggested above for some of thearchaebacterial candidate Gar1p and Nop10p homologs.

DISCUSSION

On the basis of cDNA analysis, the Euglena Cbf5p mRNA hasa 71-nt 5′ untranslated leader, including a 28-nt trans-splicedleader that is two nucleotides longer at the 5′ end than previouslyreported for the Euglena SL (67,68). A cluster of polyadenylationsites occurs between 122 and 136 nt downstream of an inferredUAG termination codon. However, no clear polyadenylationsignals are evident.

The E.gracilis Cbf5p sequence is the first reported exampleof a protist homolog of this key rRNA modification protein.The Euglena branch is the earliest one in a eukaryotic Cbf5pphylogenetic tree, consistent with the early divergence ofEuglena in phylogenies based on rRNA sequence comparisons;however, this placement should be considered tentative giventhe highly biased nature of the current Cbf5p database which,with the exception of Euglena Cbf5p, consists exclusively ofanimal and fungal sequences. In any event, Euglena Cbf5pbranches robustly within the eukaryotic sub-tree, distinct fromboth archaebacterial Cbf5p and eubacterial/eukaryotic TruB/Pus4p

Nucleic Acids Research, 2000, Vol. 28, No. 12 2349

sequences. Euglena Cbf5p has all of the conserved motifscharacteristic of Ψ synthases, as well as the distinctive C-terminalKKE repeat motif found only in eukaryotic Cbf5p sequences.

Our finding of a Cbf5p sequence in Euglena as well as thepresence of box H/ACA snoRNAs in Tetrahymena (27)strongly indicates that protists possess a box H/ACA snoRNP-based system for rRNA processing and pseudouridylylation ofrRNA. This conclusion is supported by the presence in publicdatabases of partial protist cDNA sequences encoding box H/ACAsnoRNP proteins. These protein sequences include Gar1p,Cbf5p and Nhp2p from an apicomplexan, Cryptosporidiumparvum (GenBank accession nos AA532317, AA532324 andAA224694, respectively), and Nop10p from kinetoplastidprotozoa, Trypanosoma spp (AA681026 and AA952384). Incontrast, eubacteria use a set of site-specific (sometimes multisite-specific) rRNA Ψ synthases, namely RluA, B, C, D and E(comprising the RluA family) and RsuA (78,79,91–97).

The situation with respect to rRNA pseudouridylylation inarchaebacteria is unclear at present. Based on the relativelysmall number of Ψ residues in the rRNA of the crenarchaeote,Sulfolobus solfataricus (98), and the apparent absence of aGar1p homolog in archaebacterial genomes, Lafontaine and

Tollervey (99) suggested that archaebacteria may have asnoRNA-independent system of Ψ formation in rRNA. On theother hand, as noted recently (79,84), genes for the eubacterial-type RluA and RsuA families of Ψ synthases are not apparentin any of the completely sequenced archaebacterial genomes.Moreover, although the LSU rRNA of Sulfolobus acido-caldarius contains only six Ψ residues (compared with 9 and55, respectively, in E.coli and human LSU rRNA), none ofthese archaebacterial Ψ residues is specifically shared witheubacteria to the exclusion of eukaryotes (100); rather, threeare at unique positions, two are shared with both eubacteriaand eukaryotes, and one is shared only with eukaryotes. Ourreport of archaebacterial sequences encoding candidatehomologs of Gar1p and Nop10p, in conjunction with priorevidence of archaebacterial Cbf5p and Nhp2p homologs(30,31,101,102), lends additional support to the propositionthat a snoRNA-based system operates in Archaea to generateΨ in rRNA. However, biochemical characterization of thesearchaebacterial proteins will be required to verify theirproposed function.

Table 2 summarizes the known distribution of Ψ synthaseswithin the three domains of life. In addition to one or more

Figure 6. (A) Alignment of the C-terminal portion of the novel PsuX protein sequence family (the full-length alignment is available upon request). The overliningdenotes a highly conserved stretch of the PsuX alignment that contains two Asp residues and is reminiscent of a Ψ synthase motif (B). GenBank accession numbersfor the PsuX sequences are: P.horikoshii, BAA30059; P.abyssi, CAB49759; M.jannaschii, Q60346; M.thermoautotrophicum, AAB85800; A.fulgidus, AAB90092;A.pernix, BAA79514; C.elegans, CAB60423; D.melanogaster (translation of open reading frame in GenBank accession number AC005334). The portions of theA.pernix and E.gracilis Cbf5p sequences that display similarity to the PsuX sequences are shown at the bottom of the alignment; these Cbf5p sequences correspondto K57 to L132 of accession number BAA79974 (A.pernix) and K127 to L205 (E.gracilis; see Fig. 2). Shading of residues is as described in Figure 5. (B) Alignmentof different classes of Ψ synthase motifs (77) plus the overlined PsuX stretch shown in (A). Highly conserved positions are shown, with non-conserved positionsindicated by ‘x’. Because the putative PsuX motif contains two Asp residues that might correspond to the catalytic Asp (indicated by the asterisk) in known Ψsynthases, two slightly different alignments are shown.

2350 Nucleic Acids Research, 2000, Vol. 28, No. 12

members of the site-specific RluA and RsuA sub-families ofrRNA Ψ synthases, eubacteria generally contain one TruA geneand (with the exception of Mycoplasma spp and Helicobacterpylori) one TruB gene (79). The TruA and TruB Ψ synthasescatalyze formation of Ψ at positions 38–40 and 55, respectively, intRNA [in E.coli, the RluA synthase is a dual-specificityenzyme that carries out pseudouridylylation at position 32 intRNA as well as position 746 in the LSU (23S) rRNA (91)].The RluA- and RsuA-type synthases bear no statisticallysignificant similarity to the Cbf5p/TruB superfamily, althoughthey do possess short sequence motifs diagnostic of Ψsynthases in general.

Convincing Cbf5p and TruA homologs have been identifiedin those archaebacterial genomes that have been completelysequenced; however, no genes that are specifically related toTruB have been reported. If, as suggested here, Cbf5p functionsas an rRNA Ψ synthase in archaebacteria, these observationsraise the question of how Ψ55, which is known to occur inarchaebacterial tRNAs (103), is formed. This may be anothersituation in which a single Ψ synthase (in this case, Cbf5p orTruA) possesses dual specificity, acting on tRNA as well asrRNA, or at different sites in tRNA. Alternatively, genes forother (tRNA-specific) Ψ synthases may exist in archaebacterialgenomes but be too divergent to be recognized in databasesearches by the methods currently available. A third possibilityis that the novel PsuX family reported here plays a role in Ψ55synthesis in tRNA, as its organismal distribution might suggest(Table 2). For example, although Pus4p orthologs are notdetectable in sequenced archaebacterial genomes or in theC.elegans and D.melanogaster genome sequences, PsuXhomologs are present in these genomes. Conversely, the yeast(S.cerevisiae) genome apparently lacks PsuX homologs butdoes encode a tRNA Ψ55 synthase (Pus4p).

Lafontaine and Tollervey (99) have suggested that the Cbf5pgene might have originated via duplication of an ancestralTruB-like gene (probably) after the separation of the domainsEucarya and Archaea. However, because the archaebacterial‘TruB’ homolog is more closely related to the Cbf5p class than

to the TruB sub-family per se, it is likely that any duplicationof a TruB-like gene would have pre-dated the postulatedarchaebacterial–eukaryotic split. In fact, the accumulatingevidence does suggest that a transition in the rRNA Ψ synthesizingmachinery may have occurred in an archaebacterial–eukaryoticcommon ancestor, from the simpler, site-specific, eubacterialtype of RluA-RsuA Ψ synthases to the more complex,snoRNP-dependent type present in eukaryotes and (assuggested here) archaebacteria. Alternatively (but perhaps lesslikely in view of the complexity factor), the snoRNP-basedCbf5p Ψ synthase system may have been ancestral, with retentionof this system in the archaebacterial–eukaryotic line but with ashift to an RluA/RsuA-based system in eubacteria. A thirdpossibility is the direct evolution of a TruB-like, tRNA-specificactivity into a Cbf5p-like, rRNA-specific enzyme (perhapsinitially retaining specificity for tRNA as well), again presumablyin an archaebacterial–eukaryotic common ancestor. Currentdata do not allow us to distinguish among these possibilities.

This evolutionary picture is complicated by the presence insome eukaryotes of a TruB ortholog (Pus4p in yeast) havingtRNA Ψ55 specificity. It is possible that this gene traces itsorigin to an ancestral, duplicated TruB-like gene, with one ofthe duplicates diverging to give the Cbf5p family and the otherretaining TruB structure and function. However, this scenariowould require that the TruB gene was lost in the archaebacteriallineage but retained (as a Pus4p homolog) in the eukaryoticlineage. Also, one must account for the fact that eukaryoticPus4p sequences are substantially more closely related toeubacterial TruB sequences than to either archaebacterial oreukaryotic Cbf5p sequences (Fig. 4). Thus, if duplication of aTruB-like gene had occurred in an archaebacterial–eukaryoticcommon ancestor, one of the duplicates would have had tohave diverged so radically that the evolutionary descendents ofthis duplication (Cbf5p and Pus4p in eukaryotes) no longerbear evidence of a specifically shared common ancestry.Further clouding this issue is the novel PsuX family describedhere, which is distantly related to both the Cbf5p and TruBsubfamilies.

Table 2. Global distribution of genes for known Ψ synthases, the TruB/Cbf5p-related protein PsuX, and box H/ACA snoRNP proteins within the three domains of life

aThought to act on mitochondrial rRNA S.cerevisiae (79,93).bSome eubacteria do have genes for homologs of eukaryotic Nhp2p (101,102).cFor summary of gene organization, see text and Figure 7.dHomologous gene is missing in some eubacteria (see text).ePsuY, a distant relative of TruB, is also encoded by the H.sapiens, D.melanogaster and C.elegans genomes (see text).fFor description of gene organization, see text and Figure 7.gRef. 106.

Gene family Bacteria Archaea Eucarya

RluA, RsuA yes (for rRNA or rRNA/tRNA) no apparent homologs yes (RluA only; for organellar rRNA?a)

Cbf5p no apparent homologs yes yes (for cytoplasmic rRNA)

Other box H/ACA snoRNPproteins(Gar1p, Nhp2p, Nop10p)

no apparent homologsb yesc yes

TruB yesd (for tRNA) no apparent homologs yes (Pus4p) (S.cerevisiae for tRNA, mammals for ?)e

PsuX no apparent homologs yesf no (S.cerevisiae)yes (C.elegans, D.melanogaster, mammals, plants, G.intestinalis)

TruA yes (for tRNA) yes yes (for tRNA or tRNA/snRNAg)

Nucleic Acids Research, 2000, Vol. 28, No. 12 2351

An alternative possibility to account for the origin for Pus4pis lateral transfer of the TruB gene, e.g. from the eubacteria-likeendosymbiont that gave rise to mitochondria. This scenariowould account for the fact that eukaryotic Pus4p is moresimilar to eubacterial TruB than to eukaryotic Cbf5p (31,84).The phylogenetic placement of Pus4p sequences (Fig. 4) isconsistent with a direct eubacterial ancestry, although evidenceof a specific α-proteobacterial ancestry, as expected for a mito-chondrial origin (104), is not apparent. Again, this may largelybe a sampling limitation, with only two full-length Pus4psequences available at the moment. With regard to a possibleendosymbiotic origin of the Pus4p gene, we note that thePus4p protein catalyzes formation of Ψ55 in mitochondrial aswell as cytosolic tRNAs in yeast (38).

After submission of this manuscript, the nearly completegenome sequence of D.melanogaster was published (105).Using E.coli Ψ synthase sequences in a BLASTP searchagainst the database of predicted Drosophila protein sequences,we detected (at E values <10–4) two and three putative homologs,respectively, of RluA and TruA, but no RsuA counterpart.Also, in addition to the Cbf5p (Nop60Bp/minifly) and PsuX

orthologs already discussed, we encountered a thirdDrosophila protein sharing a low level (E = 2 × 10–3) ofsequence similarity with E.coli TruB, namely the CB7849gene product (AAF57283.1). The latter sequence detectsclosely related proteins encoded by the human (AAD20059.1;E = 4 × 10–28) and C.elegans (AAF60570.1; E = 5 × 10–9)genomes. The human homolog of this protein group (for whichwe propose the name ‘PsuY’) was previously detected as aTruB homolog (TRUB2/HUMAN in figure 5 of ref. 79);however, no conserved candidate TruB motif II is evident inthese PsuY sequences.

ACKNOWLEDGEMENTS

We thank Dr Murray N. Schnare and Michael Charette forvaluable discussion and advice, and other members of the GrayLab for critical comment. M.W.G., who is a Fellow in theProgram in Evolutionary Biology, Canadian Institute forAdvanced Research, gratefully acknowledges salary andinteraction support from the CIAR. This work was funded byoperating grant MT-11212 from the Medical Research Councilof Canada to M.W.G.

REFERENCES

1. Eichler,D.C. and Craig,N. (1994) Prog. Nucleic Acid Res. Mol. Biol., 49,197–239.

2. Venema,J. and Tollervey,D. (1995) Yeast, 11, 1629–1650.3. Morrissey,J.P. and Tollervey,D. (1995) Trends Biochem. Sci., 20, 78–82.4. Lafontaine,D.L.J., Bousquet-Antonelli,C., Henry,Y., Caizergues-Ferrer,M.

and Tollervey,D. (1998) Genes Dev., 12, 527–537.5. Schnare,M.N. and Gray,M.W. (1990) J. Mol. Biol., 215, 73–83.6. Gray,M.W. and Schnare,M.N. (1996) In Zimmermann,R.A. and

Dahlberg,A.E. (eds), Ribosomal RNA: Structure, Evolution, Processing,and Function in Protein Biosynthesis. CRC Press, Inc., Boca Raton, FL,pp. 49–69.

7. Schnare,M.N., Cook,J.R. and Gray,M.W. (1990) J. Mol. Biol., 215, 85–91.8. Greenwood,S.J. and Gray,M.W. (1998) Biochim. Biophys. Acta, 1443,

128–138.9. Greenwood,S.J., Schnare,M.N. and Gray,M.W. (1996) Curr. Genet., 30,

338–346.10. Edlind,T.D., Li,J., Visvesvara,G.S., Vodkin,M.H., McLaughlin,G.L. and

Katiyar,S.K. (1996) Mol. Phylogenet. Evol., 5, 359–367.11. Keeling,P.J. and Doolittle,W.F. (1996) Mol. Biol. Evol., 13, 1297–1305.12. Li,J., Katiyar,S.K., Hamelin,A., Visvesvara,G.S. and Edlind,T.D. (1996)

Mol. Biochem. Parasitol., 78, 289–295.13. Roger,A.J., Sandblom,O., Doolittle,W.F. and Philippe,H. (1999)

Mol. Biol. Evol., 16, 218–233.14. Sogin,M.L. (1991) Curr. Opin. Genet. Dev., 1, 457–463.15. Balakin,A.G., Smith,L. and Fournier,M.J. (1996) Cell, 86, 823–834.16. Ganot,P., Bortolin,M.-L. and Kiss,T. (1997) Cell, 89, 799–809.17. Ganot,P., Caizergues-Ferrer,M. and Kiss,T. (1997) Genes Dev., 11, 941–956.18. Selvamurugan,N., Joost,O.H., Haas,E.S., Brown,J.W., Galvin,N.J. and

Eliceiri,G.L. (1997) Nucleic Acids Res., 25, 1591–1596.19. Morrissey,J.P. and Tollervey,D. (1993) Mol. Cell. Biol., 13, 2469–2477.20. Ni,J., Tien,A.L. and Fournier,M.J. (1997) Cell, 89, 565–573.21. Tollervey,D. (1987) EMBO J., 6, 4169–4175.22. Smith,C.M. and Steitz,J.A. (1997) Cell, 89, 669–672.23. Tollervey,D. and Kiss,T. (1997) Curr. Opin. Cell Biol., 9, 337–342.24. Olivas,W.M., Muhlrad,D. and Parker,R. (1997) Nucleic Acids Res., 25,

4619–4625.25. Leader,D.J., Clark,G.P., Watters,J., Beven,A.F., Shaw,P.J. and

Brown,J.W.S. (1997) EMBO J., 16, 5742–5751.26. Giordano,E., Peluso,I., Senger,S. and Furia,M. (1999) J. Cell Biol., 144,

1123–1133.27. Nielsen,H., Ørum,H. and Engberg,J. (1992) FEBS Lett., 307, 337–342.28. Lübben,B., Fabrizio,P., Kastner,B. and Lührmann,R. (1995) J. Biol.

Chem., 270, 11549–11554.

Figure 7. (A) Organization of the genes encoding archaebacterial homologs ofNop10p (filled rectangles). IF2-α, homolog of eukaryotic translation initiationfactor 2-α; L44E and S27E, ribosomal protein genes. (B) Organization of thegenes encoding archaebacterial homologs of Gar1p (filled rectangles). TFIIB,homolog of eukaryotic transcription factor IIB; APE0788, unidentified openreading frame; S8E, ribosomal protein gene. (C) Organization of the genesencoding putative archaebacterial PsuX orthologs (filled rectangles). SRP54/Ffh,homolog of eukaryotic signal recognition particle GTPase; L21E, ribosomalprotein gene. Direction of transcription is indicated by arrows above the rectangles.Note that the sizes of genes and spacers are not drawn to scale, and the regionshown may not represent the entire operon.

2352 Nucleic Acids Research, 2000, Vol. 28, No. 12

29. Bousquet-Antonelli,C., Henry,Y., Gélugne,J.-P., Caizergues-Ferrer,M.and Kiss,T. (1997) EMBO J., 16, 4770–4776.

30. Henras,A., Henry,Y., Bousquet-Antonelli,C., Noaillac-Depeyre,J.,Gélugne,J.-P. and Caizergues-Ferrer,M. (1998) EMBO J., 17, 7078–7090.

31. Watkins,N.J., Gottschalk,A., Neubauer,G., Kastner,B., Fabrizio,P.,Mann,M. and Lührmann,R. (1998) RNA, 4, 1549–1568.

32. Jiang,W., Middleton,K., Yoon,H.-J., Fouquet,C. and Carbon,J. (1993)Mol. Cell. Biol. 13, 4884–4893.

33. Cadwell,C., Yoon,H.-J., Zebarjadian,Y. and Carbon,J. (1997) Mol. Cell. Biol.,17, 6175–6183.

34. Meier,U.T. and Blobel,G. (1994) J. Cell Biol., 127, 1505–1514[published erratum appears in (1998) J. Cell Biol., 140, 447].

35. Nurse,K., Wrzesinski,J., Bakin,A., Lane,B.G. and Ofengand,J. (1995)RNA, 1, 102–112.

36. Koonin,E.V. (1996) Nucleic Acids Res., 24, 2411–2415.37. Gustafsson,C., Reid,R., Greene,P.J. and Santi,D.V. (1996) Nucleic Acids Res.,

24, 3756–3762.38. Becker,H.F., Motorin,Y., Planta,R.J. and Grosjean,H. (1997)

Nucleic Acids Res., 25, 4493–4499.39. Heiss,N.S., Knight,S.W., Vulliamy,T.J., Klauck,S.M., Wiemann,S.,

Mason,P.J., Poustka,A. and Dokal,I. (1998) Nature Genet., 19, 32–38.40. Mitchell,J.R., Wood,E. and Collins,K. (1999) Nature, 402, 551–555.41. Phillips,B., Billin,A.N., Cadwell,C., Buchholz,R., Erickson,C., Merriam,J.R.,

Carbon,J. and Poole,S.J. (1998) Mol. Gen. Genet., 260, 20–29.42. Winkler,A.A., Bobok,A., Zonneveld,B.J.M., Steensma,H.Y. and

Hooykaas,P.J.J. (1998) Yeast, 14, 37–48.43. Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G.,

Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D. et al. (1996)Science, 273, 1058–1073.

44. Smith,D.R., Doucette-Stamm,L.A., Deloughery,C., Lee,H., Dubois,J.,Aldredge,T., Bashirzadeh,R., Blakely,D., Cook,R., Gilbert,K. et al.(1997) J. Bacteriol., 179, 7135–7155.

45. Klenk,H.-P., Clayton,R.A., Tomb,J.-F., White,O., Nelson,K.E.,Ketchum,K.A., Dodson,R.J., Gwinn,M., Hickey,E.K., Peterson,J.D. et al.(1997) Nature, 390, 364–370.

46. Kawarabayasi,Y., Sawada,M., Horikawa,H., Haikawa,Y., Hino,Y.,Yamamoto,S., Sekine,M., Baba,S., Kosugi,H., Hosoyama,A. et al. (1998)DNA Res., 5, 55–76.

47. Kawarabayasi,Y., Hino,Y., Horikawa,H., Yamazaki,S., Haikawa,Y.,Jin-no,K., Takahashi,M., Sekine,M., Baba,S., Ankai,A. et al. (1999)DNA Res., 6, 83–101.

48. Breckenridge,D.G., Watanabe,Y., Greenwood,S.J., Gray,M.W. andSchnare,M.N. (1999) Proc. Natl Acad. Sci. USA, 96, 852–856.

49. Maruyama,K. and Sugano,S. (1994) Gene, 138, 171–174.50. Baskaran,N., Kandpal,R.P., Bhargava,A.K., Glynn,M.W., Bale,A. and

Weissman,S.M. (1996) Genome Res., 6, 633–638.51. Don,R.H., Cox,P.T., Wainwright,B.J., Baker,K. and Mattick,J.S. (1991)

Nucleic Acids Res., 19, 4008.52. Marchuk,D., Drumm,M., Saulino,A. and Collins,F.S. (1991)

Nucleic Acids Res. 19, 1154.53. Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl Acad. Sci.

USA, 74, 5463–5467.54. Mytelka,D.S. and Chamberlin,M.J. (1996) Nucleic Acids Res., 24, 2774–2781.55. Frank,R., Müller,D. and Wolff,C. (1981) Nucleic Acids Res., 9, 4967–4979.56. Siebert,P.D., Chenchik,A., Kellog,D.E., Lukyanov,K.A. and

Lukyanov,S.A. (1995) Nucleic Acids Res., 23, 1087–1088.57. Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G.,

Smith,J.A. and Struhl,K. (1987) Current Protocols in Molecular Biology.Greene Publishing Associates and Wiley-Interscience, New York, NY.

58. Chomczynski,P. and Mackey,K. (1994) Anal. Biochem., 221, 303–305.59. Sambrook,J., Fritsch,E.F. and Maniatis,T., (1989) Molecular Cloning:

A Laboratory Manual, 2nd Edn. Cold Spring Harbor Laboratory Press,Cold Spring Harbor, NY.

60. Church,G.M. and Gilbert,W. (1984) Proc. Natl Acad. Sci. USA, 81,1991–1995.

61. Chomczynski,P. (1992) Anal. Biochem., 201, 134–139.62. Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W.

and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.63. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res.,

22, 4673–4680.64. Strimmer,K. and von Haeseler,A. (1996) Mol. Biol. Evol., 13, 964–969.

65. Jones,D.T., Taylor,W.R. and Thornton,J.M. (1992) Comput. Appl. Biosci.,8, 275–282.

66. Ota,T. and Nei,M. (1994) J. Mol. Evol., 38, 642–643.67. Tessier,L.-H., Keller,M., Chan,R.L., Fournier,R., Weil,J.-H. and

Imbault,P. (1991) EMBO J., 10, 2621–2625.68. Keller,M., Tessier,L.H., Chan,R.L., Weil,J.H. and Imbault,P. (1992)

Nucleic Acids Res., 20, 1711–1715.69. Schnare,M.N. and Gray,M.W. (1999) J. Biol. Chem., 274, 23691–23694.70. Montandon,P.-E. and Stutz,E. (1990) Nucleic Acids Res., 18, 75–82.71. Chan,R.L., Keller,M., Canaday,J., Weil,J.-H. and Imbault,P. (1990)

EMBO J., 9, 333–338.72. Perry,K.L., Watkins,K.P. and Agabian,N. (1987) Proc. Natl Acad. Sci.

USA, 84, 8190–8194.73. McNally,K.P. and Agabian,N. (1992) Mol. Cell. Biol., 12, 4844–4851.74. Lücke,S., Xu,G.-L., Palfi,Z., Cross,M., Bellofatto,V. and Bindereif,A.

(1996) EMBO J., 15, 4380–4391.75. Bangs,J.D., Crain,P.F., Hashizume,T., McCloskey,J.A. and

Boothroyd,J.C. (1992) J. Biol. Chem., 267, 9805–9815.76. Ebel,C., Frantz,C., Paulus,F. and Imbault,P. (1999) Curr. Genet., 35, 542–550.77. Huang,L., Pookanjanatavip,M., Gu,X. and Santi,D.V. (1998)

Biochemistry, 37, 344–351.78. Raychaudhuri,S., Niu,L., Conrad,J., Lane,B.G. and Ofengand,J. (1999)

J. Biol. Chem., 274, 18880–18886.79. Conrad,J., Niu,L., Rudd,K., Lane,B.G. and Ofengand,J. (1999) RNA, 5,

751–763.80. Ramamurthy,V., Swann,S.L., Paulson,J.L., Spedaliere,C.J. and

Mueller,E.G. (1999) J. Biol. Chem., 274, 22225–22230.81. Gu,X., Liu,Y. and Santi,D.V. (1999) Proc. Natl Acad. Sci. USA, 96,

14270–14275.82. Foster,P.G., Huang,L., Santi,D.V. and Stroud,R.M. (2000) Nature Struct.

Biol., 7, 23–27.83. Zebarjadian,Y., King,T., Fournier,M.J., Clarke,L. and Carbon,J. (1999)

Mol. Cell. Biol., 19, 7461–7472.84. Aravind,L. and Koonin,E.V. (1999) J. Mol. Evol., 48, 291–302.85. Girard,J.P., Bagni,C., Caizergues-Ferrer,M., Amalric,F. and Lapeyre,B.

(1994) J. Biol. Chem., 269, 18499–18506.86. Bagni,C. and Lapeyre,B. (1998) J. Biol. Chem., 273, 10868–10873.87. Qureshi,S.A., Bell,S.D. and Jackson,S.P. (1997) EMBO J., 16, 2927–2936.88. Wootton,J.C. and Federhen,S. (1996) Methods Enzymol., 266, 554–571.89. Berg,J.M. (1986) Science, 232, 485–487.90. Arluison,V., Houtondji,C., Robert,B. and Grosjean,H. (1998)

Biochemistry, 37, 7268–7276.91. Wrzesinski,J., Nurse,K., Bakin,A., Lane,B.G. and Ofengand,J. (1995)

RNA, 1, 437–448.92. Wrzesinski,J., Bakin,A., Nurse,K., Lane,B.G. and Ofengand,J. (1995)

Biochemistry, 34, 8904–8913.93. Ofengand,J. and Fournier,M.J. (1998) In Grosjean,H. and Benne,R. (eds),

Modification and Editing of RNA. ASM Press, Washington, DC, pp. 229–253.94. Huang,L., Ku,J., Pookanjanatavip,M., Gu,X., Wang,D., Greene,P.J. and

Santi,D.V. (1998) Biochemistry, 37, 15951–15957.95. Conrad,J., Sun,D., Englund,N. and Ofengand,J. (1998) J. Biol. Chem.,

273, 18562–18566.96. Raychaudhuri,S., Conrad,J., Hall,B.G. and Ofengand,J. (1998) RNA, 4,

1407–1417.97. Niu,L., Lane,B.G. and Ofengand,J. (1999) Biochemistry, 38, 629–635

[published erratum appears in (1999) Biochemistry, 38, 14117].98. Noon,K.R., Bruenger,E. and McCloskey,J.A. (1998) J. Bacteriol., 180,

2883–2888.99. Lafontaine,D.L.J. and Tollervey,D. (1998) Trends Biochem. Sci., 23, 383–388.100. Massenet,S., Ansmant,I., Motorin,Y. and Branlant,C. (1999) FEBS Lett.,

462, 94–100.101. Kolodrubetz,D. and Burgum,A. (1991) Yeast, 7, 79–90.102. Koonin,E.V., Bork,P. and Sander,C. (1994) Nucleic Acids Res., 22, 2166–2167.103. Sprinzl,M., Horn,C., Brown,M., Ioudovitch,A. and Steinberg,S. (1998)

Nucleic Acids Res., 26, 148–153.104. Gray,M.W., Burger,G. and Lang,B.F. (1999) Science, 283, 1476–1481.105. Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D.,

Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al.(2000) Science, 287, 2185–2195.

106. Massenet,S., Motorin,Y., Lafontaine,D.L.J., Hurt,E.C., Grosjean,H. andBranlant,C. (1999) Mol. Cell. Biol., 19, 2142–2154.